Washington University in St. Louis
John M. Olin Business School
Performance Evaluation in Call Centers:
An Investigation into the Use of Analytics Tools
Prepared by Hossam Abuelwafa
Candidate for MS in Supply Chain Management
Committee Members
Professor Sergio Chayet
Professor Amr Farahat (Research Advisor)
Associate Dean Gregory Hutchings
December 15, 2014
2 | Performance Evaluation in Call Centers
Abstract
This research explores the use of analytics tools in evaluating the performance of inbound call centers, both
aggregate call center performance and call center agent performance. The definition of performance in a call center
depends on the company’s value proposition, which makes defining performance in general terms a challenge.
Call center performance is also multi-dimensional, which calls for more capable tools that can express overall
performance on a single scale. In addition, since the call center industry is closely intertwined with outsourcing
agreements, outsiders to call center operations often have to evaluate the performance of “outsourcing
destinations”, which calls for a tool that is fair and does not require much subject-matter knowledge. The research
explores Data Envelopment Analysis (DEA) and linear regression as candidate analytics tools for tackling the
performance evaluation challenge in call centers. DEA and linear regression were applied to real call center and
agent performance data - coming from two different call centers in the Middle East and North Africa (MENA) region - to
illustrate the insights that each methodology can bring, as well as the strengths and weaknesses of each
analytics tool in tackling the performance evaluation challenge in call centers.
The main contributions of this research can be summarized as follows:
This research represents a novel application of DEA to the call center performance evaluation
challenge. As far as we know, DEA has not been applied to this problem before.
In the analysis chapters of this research, we apply DEA to “Decision Making Units” (DMUs) that are
the same entity observed over time. This use of DEA is quite uncommon in the DEA literature.
Since the majority of call center research involves queueing analysis, the call center data
available from previous research was designed to meet queueing-analysis needs (Gans, Koole
and Mandelbaum), which is very different from what is needed for studying performance evaluation.
This research therefore provides real call center performance data from two different call centers
in the MENA region.
Last but not least, this research offers an explicit comparison between DEA and linear regression in
terms of their fitness for use in call center performance evaluation.
After conducting the different analyses, we conclude that, of the two analytics tools explored
- DEA and linear regression - DEA is the better fit for performance evaluation in call centers at both
levels of analysis: aggregate call center performance and individual call center agent performance.
The research closes with an important question that can act as an area for future research: how can
the call center agent’s experience be incorporated into the DEA analysis while still rendering accurate results?
Table of Contents
Abstract ..................................................................................................................................................................................................... 2
Chapter 1: Call Center Operations ...................................................................................................................................................... 4
1.1 Overview of the Call Center industry ...................................................................................................................................... 4
1.1.1 Different Types of Call Centers ......................................................................................................................................... 5
1.1.2 Different organizational structures of Call Centers ...................................................................................................... 5
1.1.3 Different roles in Call Centers ........................................................................................................................................... 7
1.2 Outsourcing in call centers ........................................................................................................................................................ 9
1.3 Performance Measurement in Call Centers ......................................................................................................................... 10
1.3.1 Aggregate Call Center Performance ............................................................................................................................... 10
1.3.2 Call Center Agent Performance ....................................................................................................................................... 11
Chapter 2: Analytics as a Benchmarking tool for Call Centers .................................................................................................... 14
2.1 Benchmarking as a platform for Performance Evaluation in call centers ..................................................................... 14
2.2 Linear Regression as an Empirical Benchmarking Analytics tool ................................................................................... 16
2.3 DEA as an Empirical Benchmarking Analytics tool ............................................................................................................ 19
Chapter 3: Case study background and data description ............................................................................................................. 24
3.1 Company Description ............................................................................................................................................................... 24
3.2 Company A - Dataset 1: Call Center’s Aggregate Performance ......................................................................................... 25
3.4 Company B - Dataset 2: Agent’s overall performance ........................................................................................................ 26
Chapter 4: Aggregate Performance Tracking ................................................................................................................................. 28
4.1 Introduction ............................................................................................................................................................................... 28
4.2 Preliminary Data Analysis ....................................................................................................................................................... 29
4.3 Theoretical Benchmarking “Queueing Analysis” ................................................................................................................ 36
4.4 Empirical Self-Benchmarking - I “Multiple Regression” – Dataset 1 ............................................................................... 39
4.5 Empirical Self-Benchmarking - II “Data Envelopment Analysis” – Dataset 1 ................................................................ 41
4.6 Summary of Findings ................................................................................................................................................................ 49
Chapter 5: Agent Performance Assessment .................................................................................................................................... 50
5.1 Introduction ............................................................................................................................................................................... 50
5.2 Preliminary Data Analysis ....................................................................................................................................................... 51
5.3 Absolute Benchmarking “Performance Targets” ................................................................................................................ 56
5.4 Empirical Peer Benchmarking – I “Multiple Regression” – Dataset 2 ............................................................................. 57
5.5 Empirical Peer Benchmarking – II “Data Envelopment Analysis” – Dataset 2 .............................................................. 60
5.6 Summary of findings ................................................................................................................................................................. 63
Chapter 6: Conclusions and future research opportunities ........................................................................................................ 64
Appendix ................................................................................................................................................................................................ 66
References ............................................................................................................................................................................................. 82
Chapter 1: Call Center Operations
1.1 Overview of the Call Center industry
The Goal behind this chapter
In this chapter, we provide a brief tutorial on the call center industry, in order to help the reader
understand the background of the analysis to be done in later chapters. The chapter starts with a brief
description of call centers, the different types of call centers, and the functional roles involved in call center
operations. Then we shift gears and talk about outsourcing in call centers, explaining common
forms of outsourcing agreements through the lens of call centers. Finally, we discuss some of the important
performance metrics involved in call center performance evaluation at both the aggregate call center level and the
call center agent level. By the end of this chapter, we hope the reader will have an understanding of
call center operations sufficient to act as background for the performance measurement analysis that
follows in later chapters.
The Call Center Business
Let’s start with the definition of “call center” in the Oxford dictionary: a call center (noun)
is “an office set up to handle a large volume of telephone calls, especially for taking orders and providing customer
service”. It is not entirely clear when exactly call centers started, but the call centers we know today emerged
hand in hand with the invention of the “Automated Call Distributor” (ACD) (Call Center Helper Magazine). An ACD uses
computerized technology and algorithms to filter incoming calls and assign the right calls to the right call center
agents, based on pre-set rules. For example, if a call center wants to promote familiarity between customers
and call center agents, it usually sets the ACD to route a customer’s call to the agent they most recently
spoke to, provided that agent is free at the time the customer calls. Before ACD technology was invented,
a human operator usually transferred calls manually to the various agents handling customer
inquiries.
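To make the idea concrete, the familiarity-based routing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record format and the `route_call` function are hypothetical and do not correspond to any real ACD product.

```python
# Sketch of a familiarity-based ACD routing rule (hypothetical illustration;
# real ACD products implement far richer rule sets).
def route_call(customer_id, last_agent, free_agents):
    """Pick an agent for an incoming call.

    Prefer the agent the customer most recently spoke to, provided that
    agent is currently free; otherwise fall back to any free agent, and
    return None if nobody is free (the call then waits in the queue).
    """
    preferred = last_agent.get(customer_id)
    if preferred in free_agents:
        return preferred
    return next(iter(free_agents), None)

# Example: customer "C1" last spoke to agent "A2", who happens to be free,
# so the ACD routes the call back to "A2" to promote familiarity.
free = {"A1", "A2"}
history = {"C1": "A2"}
chosen = route_call("C1", history, free)
```

The fallback rule here (any free agent) stands in for whatever default policy a real ACD applies, such as longest-idle-agent routing.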
In the early 2000s, the global call center industry grew drastically, reaching $40.1 billion in
2003; most of that growth was in the communications industry (18.5%), followed closely by the outsourcing
industry (15.6%) (Datamonitor). The industry employed almost 5 million call center agents by the end of
2003 (Datamonitor). Thanks to enabling technologies such as cloud computing, call centers
nowadays have the flexibility to operate virtually anywhere in the world, which has generated
serious savings in an industry mainly considered a “cost center” by many companies. Moreover, as the
pressure on call center management to be more efficient increases, the majority of call center managers see
“improved analytics” as a key innovation area that will benefit their call centers (Dimension Data).
1.1.1 Different Types of Call Centers
As similar as call centers might seem, they differ considerably in the services they offer.
Enabled by technological advancements, call centers offer a wide range of services that can fit the needs of many
business processes and industries. The rule of thumb here is “if it can be done remotely, it can be done in a call
center”. Since the industry applications of call centers are too many to count, call centers are usually classified
by the type of calls they engage in, as follows:
Outbound call centers - These call centers mainly place outbound calls, i.e., they call
customers. Examples of outbound call centers include:
- Telemarketing campaigns, in which call centers call customers to create brand awareness and/or
promote certain limited-time offers
- Telesales, which has the purpose of generating a sale on the phone, whether by cold-calling new
customers or generating more sales from the existing customer base.
Inbound call centers - These are the main focus of this research. They mainly receive calls from their
customers, though they sometimes also place outbound calls to follow up with customers. Examples of
applications here are:
- Technical support, in which customers call the company to receive over-the-phone technical service for
the products they purchased.
- Airline booking, in which customers call an airline call center to inquire about a flight status, make a
booking, or manage their tickets and/or luggage.
- Banking Service, which is used by many banking customers to manage their bank accounts, transfer
funds, and inquire about other banking services over the phone.
Omni-channel call centers - Technological advancements have made it possible to interact with
customers through many different platforms, not just the phone. In addition to conventional
phone lines, omni-channel call centers interact with their customers through many different
communication channels such as live chat, email, text messaging, etc.
1.1.2 Different organizational structures of Call Centers
Another main difference between call centers is the way they organize their operations to fit the nature
of the services they offer and of the industry in which they operate. As call centers vary in size, services,
and location, different organizational structures are needed to best manage these differences. The
organizational structure can vary along several dimensions:
Specialized versus Pooled call centers
- Pooled Call Centers - Some call centers prefer to have “generic agents”, in which case all customer
inquiries are answered by whichever generic agent first picks up the phone. This is usually better
when the services do not require much technical knowledge, because it enhances a very important
metric called “First Call Resolution” (FCR), which measures the percentage of handled calls
in which the customer’s inquiry was resolved on the first call.
- Specialized Call Centers - Other, more technical call centers are obliged to follow a “specialized
queue” structure, in which every group of agents with similar technical knowledge is organized in a
separate department (aka a “queue”). When customers call, they are greeted by the “Interactive Voice
Response” (IVR) unit, which asks the customer a few questions to determine which queue they need to
join; the ACD then routes the customer to the right queue based on the IVR’s instructions.
Flat versus Multi-layered Call Centers
- Flat Call Centers - Some call centers are flat in the sense that the frontline (aka “first line”) agents have
all the technical tools and authority they need to resolve customer issues, with the exception of very
rare cases in which supervisors need to jump in and take over. Such a case is regarded as an
“escalation”, a term that refers to a customer complaint escalated to a higher decision
authority such as a manager or a supervisor.
- Multi-layered Call Centers - On the other hand, some call centers with more technical needs, in an
attempt to use their most valuable resources (such as engineers) very efficiently, group
these resources in a higher tier usually called the “second line”. Second-line agents receive service
requests from first-line agents who encounter technical problems well beyond their knowledge or the
power/resources given to them, and then call the customer directly to help
resolve the technical issue.
Physical versus Virtual Call Centers
- Physical Call Centers - These are the usual call centers, located in a normal
office building or a small office, depending on the size of the call center.
- Virtual Call Centers - These call centers do not operate in a physical workspace like “physical
call centers”; rather, they employ agents who work from home with equipment that the
call center has set up for them. This model is more common in outbound call centers, due to the flexible
nature of outbound calls (the agent decides when to call), which fits a work-from-home
lifestyle better. Hence, virtual call centers enjoy the savings of running virtual operations. Also, enabled by
cloud computing technology, virtual call centers gain access to a new demographic of call
center agents (e.g., stay-at-home parents), a demographic that is more stable in nature, which helps
reduce the very high turnover rate in the industry.
Centralized versus Decentralized Call Centers
- Centralized Call Centers - Some call centers that believe in “pooling effects” for reducing overall costs
prefer one huge call center in which all their call center operations are nested. This allows
them to enjoy economies of scale in every aspect of running the call center; on the other hand, from
a business risk management perspective, it makes them very susceptible to business disruptions.
- Decentralized Call Centers - Other call centers have multiple, relatively smaller locations; whenever
a customer calls, the call can be routed either to the call center closest to their location or to the least
busy call center.
1.1.3 Different roles in Call Centers
Going deeper into call center operations, we need to explore the various roles played by different call
center functions in running a successful call center. Given the wide spectrum of call centers, a
certain degree of variability exists among call centers in role names and exact role
definitions, but we will try to be as general as possible in our descriptions. The main roles are as follows:
1- Human Resource Administration - This function is responsible for planning the human resource needs of
the call center and for recruiting, screening, and hiring new call center agents. Many call centers, especially in
India, have developed technical “aptitude tests” that measure the generic core skills (e.g., problem
solving, mathematics, communication) needed to work in a call center. These aptitude tests speed
up the hiring process by eliminating the need for further technical evaluation. Human
resource administration also supervises the application of the company’s policies and procedures (e.g., dress
code), supervises generic employee performance management, and is responsible for payroll management.
2- Training and Development - The training function is responsible for training newly hired agents on
technical procedures and knowledge, as well as company-specific culture, policy, and other job-related
matters. It is also responsible for training current agents on any changes in technology, policy, or
procedures. In case a new or existing agent underperforms, the training department is there to
support.
3- Operations Management - This function is carried out by team supervisors, real-time managers (RTMs),
account managers, and all the call center agents. Call center agents are grouped into teams, which makes
the task of managing agent performance easier. Every team is led by a “team supervisor” who is
held accountable for the “key performance indicators” (KPIs) of his/her team. If a call center agent
underperforms on specific KPIs, the team supervisor agrees with the agent on a performance correction plan
that both of them sign. The actions taken by a supervisor against underperformance are usually ranked as
follows:
Verbal warning
Performance plan
First written warning
Second written warning
Retraining (sometimes)
Termination of the call center agent
The role of the real-time manager is simple: he/she is responsible for monitoring the call queue at all
times and maintaining schedule and break adherence, which means that everyone is logging in at the right time and
taking their breaks at the right time as scheduled. The RTM has the power to adjust breaks instantly to respond to
unexpected surges in call volume, and to bring all “off-queue” agents (performing
off-queue tasks) back to the queue; in some extreme cases he/she can even resort to making the team supervisors - who
were agents one day - log in and take some calls to help reduce the queue length. To sum up, the RTM is the individual
responsible for “service level”, a main call center metric that we explain later in more detail.
Last but not least, in case the call center is an outsourcing destination (i.e., it handles more than one call center
account), the role of “account manager” becomes necessary. The account manager is accountable for the overall
performance of the whole account and is held accountable directly by the “client” company, the company that
outsourced its call center. His/her role is similar to that of a team supervisor in the sense that he/she manages the
KPIs of every team supervisor and takes the actions necessary to fix underperformance problems.
4- Workforce Management - The main responsibility of workforce management is to create call center
personnel shift schedules in a way that meets planned human resource needs on a day-to-day basis. If a call
center agent needs to adjust his/her breaks, he/she emails workforce management; if the agent
needs to take a leave, he/she talks to them as well. In short, they are responsible for anything that has to do
with schedules and workforce planning.
5- Quality Management - Quality trainers are responsible for issuing the quality score for each call center
agent every week/month. The score is produced by evaluating the quality of a certain number of calls for every
agent each week/month. The quality score serves as a main ingredient of the agent’s scorecard, which we
will touch upon later (see Appendix 1.2 for a sample scorecard). Quality trainers are also responsible
for coordinating with the training team to support the training of call center agents on all quality-related
matters, whether when underperformance is detected or during new-hire orientation training.
6- IT Management (Help Desk) - This is usually a call center within the call center that serves all
employees having technical difficulties with their login credentials, terminals, or equipment.
7- Facility and Fleet Management - This function is responsible for maintaining the building in which the call
center is located, in addition to organizing employee trips to and from the building using a
hired or owned fleet of buses.
1.2 Outsourcing in call centers
As we mentioned before, the pressure on call centers to be more efficient has led the call center industry to
be one of the earliest and most aggressive adopters of outsourcing. The second highest market segment in call
centers is the outsourcing industry (Datamonitor). Over the years, the two industries “Call Center and Outsourcing”
have been very closely intertwined to the extent that to non-experts outsourcing usually meant call centers. This
sub-section represents a very good opportunity to lay down a brief description of outsourcing in the call center
industry. Let’s start by defining the different parties involved in a usual call center outsourcing agreement:
Client Company – This is the company that wants to outsource its call center function to another
professional company to take care of it. This company’s customers will be served in the new call center for
an exchange for a fee paid by the client company.
Outsourcing Destination - This is the company offering the call center management service
to the client company in return for a fee.
In any service industry, including call centers, planning to match call demand with the right
supply of call center agents to handle that demand is very challenging. As a result, many companies use
outsourcing contracts as a way to better match supply and demand. Let us examine the different types of contracts
associated with call center outsourcing (Aksin, Vericourt and Karaesmen):
1- Pay-for-Capacity Contract: In this type of contract, the client company rents a fixed capacity at the
outsourcing destination’s call center for a fixed fee. This is usually used when the client company wishes to
outsource the “predictable” portion of its call demand, while keeping a smaller in-house call center
to absorb the fluctuations in call demand. For example, if a client company knows that it usually receives over
3,000 calls a week, it might outsource enough capacity to meet those 3,000 calls, while keeping a small in-
house call center to absorb the calls in excess of the 3,000 expected. This type of contract is much
more economical to maintain because it is easier for the outsourcing destination to plan for capacity.
2- Pay-for-Job Contract: This contract type is the exact reverse of the previous one: the client
company keeps the “predictable” portion of its call demand in-house, while outsourcing the
“excess” to an outsourcing destination for a variable fee that depends on the amount of excess handled.
This is usually more costly to maintain, because the outsourcing destination has to keep enough
“safety stock” of agents to cope with the unpredictability of the excess call demand.
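As a rough illustration of the cost trade-off between the two contract types, the following sketch compares them over a stream of weekly call volumes, using the 3,000-call example above. All fee figures (the fixed fee, the in-house cost per call, and the variable per-call fee) are invented for illustration and are not taken from any real contract.

```python
# Hypothetical cost comparison of the two outsourcing contract types.
# All fee parameters are illustrative assumptions, not real contract terms.

def pay_for_capacity_cost(weekly_demands, base_calls=3000, fixed_fee=9000.0,
                          in_house_cost_per_call=4.0):
    """Outsource the predictable base_calls/week for a fixed fee; absorb
    any excess in a small in-house call center at a per-call cost."""
    total = 0.0
    for d in weekly_demands:
        excess = max(0, d - base_calls)
        total += fixed_fee + excess * in_house_cost_per_call
    return total

def pay_for_job_cost(weekly_demands, base_calls=3000, in_house_weekly_cost=8000.0,
                     variable_fee_per_call=5.0):
    """Keep the predictable base_calls/week in-house; pay the outsourcing
    destination a variable fee per excess call (priced higher per call,
    since the destination must hold 'safety stock' of agents)."""
    total = 0.0
    for d in weekly_demands:
        excess = max(0, d - base_calls)
        total += in_house_weekly_cost + excess * variable_fee_per_call
    return total

weekly_calls = [3100, 2950, 3400, 3050]  # illustrative weekly demand
cost_capacity = pay_for_capacity_cost(weekly_calls)
cost_job = pay_for_job_cost(weekly_calls)
```

Under these invented parameters, which contract is cheaper depends on how large and how volatile the excess demand is, mirroring the trade-off described in the text.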
1.3 Performance Measurement in Call Centers
In this sub-section, we will explore many of the previously mentioned and unexplained call center
performance “terms”. The idea is that this chapter will help the reader understand the critical aspects of a call
center performance that needs to be measured and managed. Since our analysis chapters will be divided equally
into “Call Center Performance” and “Call Center Agent Performance”, it only makes sense to do the same here. We
will start our conversation with the definition of performance in an “inbound” call center, and the different metrics
that are normally used to measure that performance. Then, we will define “agent performance” in an inbound call
center setting, and explore all the relevant metrics based on that definition.
1.3.1 Aggregate Call Center Performance
As we can see by now, the call center industry is a highly competitive, fast-paced industry, in which being
efficient isn’t a luxury. In an environment like that, the quality of our decisions depend heavily on the accuracy of
our information, which is mainly derived from performance data. Hence, the ability to accurately and meaningfully
evaluate the performance in call centers is in fact a determining factor to the ability of the call center management
to make the right decisions to drive efficiencies and enhance service quality, thus have better chances of survival
and growth in the marketplace. Let us start by defining performance on a call center level.
Call center performance can be defined in terms of the tasks the call center is supposed to carry out. The basic
tasks of an inbound call center are usually as follows:
1- Answer the majority of customer calls in a timely fashion, given a certain wait-time threshold.
2- Provide appropriate service quality to customers, where the appropriateness of service is defined
by the client company’s strategic positioning or value proposition.
3- Provide that quality service fast enough to respect the value of the customer’s time; in addition,
service that takes too long is very costly to maintain.
Now, let’s explore various aggregate performance metrics measured at the call center level, and relate
each of them to one or more of the tasks mentioned above. The metrics are as follows:
Service level – The percentage of calls received that were answered within a
given threshold of time (e.g., 20 seconds) from when the call was received. This metric is one of the
most important in the call center business, to the extent that some call center outsourcing
contracts - called “Service Level Agreements” (SLAs) – tie the compensation of the outsourcing
destination to the service level achieved. This metric measures effectiveness in executing “task
1” above, and can be reported at the “queue level” or the “call center level”. The
orientation of this metric is “the higher, the better”
Average Handling Time (AHT), aggregated to the call center level – This metric combines three
averages: it represents the average time taken by an average agent to serve an average customer.
As we mentioned before, many call centers adopt the specialized
organizational structure, which means that every queue employs a different kind of agent, receives
a different kind of call, and each kind of call requires a different average handling time. Thus,
looking at AHT at the call center level gives us an idea of the average time the average
customer will spend in our call center. This metric measures effectiveness in executing “task 3”
above. The orientation of this metric is “the lower, the better”
Abandoned rate (%), beyond 5 seconds - This metric represents the percentage of callers in a given
day/week/month who called the call center, waited on the line for more than 5
seconds, and then hung up (i.e., reneged). We count only calls abandoned after 5 seconds
because a hang-up within 5 seconds usually reflects a technical problem or a customer
error rather than the length of the queue. This metric represents
the magnitude of failure in accomplishing “task 1” above. The orientation of this metric is “the
lower, the better”
Average Speed to Answer (ASA) - This metric also relates to “task 1”. It represents the average
“wait time” that callers spend on the line before being handled by a call center agent. The
orientation of this metric is “the lower, the better”
Customer Satisfaction Survey (CSAT) score “aggregated to call center level” - This metric is usually
the only metric that gives a sense of service quality (from a customer perspective) on the call center
level “i.e. task 2”. The CSAT is the survey that callers are usually asked to take after finishing a
call with a call center agent. The survey asks them to rate the different dimensions of the customer
service experience. After that, all the CSAT data is aggregated into a single number that reflects
how satisfied the customers are with the call center. The orientation of this metric is “the higher,
the better”
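To make these definitions concrete, here is a minimal sketch of how the time-based metrics above (service level, AHT, abandoned rate, and average speed to answer) could be computed from raw call records. The record fields, and the 20-second and 5-second thresholds, are illustrative assumptions, not any vendor’s actual schema.

```python
# Sketch: aggregate call center metrics from raw call records.
# Field names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CallRecord:
    wait_sec: float      # time in queue before answer or hang-up
    handle_sec: float    # talk time; 0.0 if the call was abandoned
    answered: bool       # False means the caller hung up (reneged)

def aggregate_metrics(calls, sl_threshold=20, abandon_floor=5):
    answered = [c for c in calls if c.answered]
    # Abandons within `abandon_floor` seconds are treated as technical
    # problems or customer errors and excluded, per the definition above.
    valid = [c for c in calls if c.answered or c.wait_sec > abandon_floor]
    abandoned = [c for c in valid if not c.answered]
    return {
        # Service Level: share of answered calls picked up within threshold
        "service_level": sum(c.wait_sec <= sl_threshold for c in answered) / len(answered),
        # AHT: average talk time across answered calls
        "aht_sec": sum(c.handle_sec for c in answered) / len(answered),
        # Abandoned rate (5 seconds)
        "abandon_rate": len(abandoned) / len(valid),
        # Average Speed to Answer
        "asa_sec": sum(c.wait_sec for c in answered) / len(answered),
    }
```

Note how the metric orientations discussed above are visible here: a call center would want the first value high and the remaining three low.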
As we mentioned before, these metrics are some of the most popular metrics used in inbound call center
operations. As we can see, different metrics can measure the same task from various angles, and managers
usually favor some metrics over others. Let us now examine the performance definition on the “agent level”.
1.3.2 Call Center Agent Performance
Call center performance is, at the end of the day, the result of all the individual agents’ performance
added together; hence the importance of measuring and evaluating call center agents’ performance. Inbound call center
agents’ jobs, as different as they are, all share the same basic tasks that have to be completed to ensure
successful call center operations. These tasks are:
1- The agent has to be present at the right time for his/her scheduled shift
2- During the shift, the agent has to stick to the scheduled break times
3- The agent has to work an amount of hours at least equal to his/her scheduled working hours
4- Call center agents must manage their time well on the call, in order to respect the caller’s time and to
give equal attention to other callers
5- The agent has to serve the customer with high quality, where quality is broken down into specific tasks by
the quality management team (See Appendix 1.1 for a sample quality tasks breakdown)
These are the main basic tasks that an agent has to perform for the call center to succeed overall.
Any deviation from the tasks above caused by one or more agents will negatively affect the call center’s overall
performance metrics discussed in the previous section. Now, let us examine a group of common agent performance
metrics used in most call centers:
Agent Absenteeism (%) “No-shows” – This metric represents the percentage of scheduled shifts that the
agent missed relative to all the shifts that he/she was scheduled to attend. The importance of
this metric can hardly be overstated, because if the agent isn’t there to begin with, none of
the other tasks can be achieved. In addition, call centers usually plan well ahead of time to make
sure that they have enough resources “i.e. agents” to meet the forecasted call demand. This means
that an absent agent means a lost portion of service level, longer waiting times for most customers,
and a more stressful shift for all of his/her colleagues. As a result, there is very low tolerance for
absenteeism in call centers in general. This metric relates to “task 1” mentioned above. The
orientation of this metric is “the lower, the better”
Agent Adherence (%) – This metric represents the percentage of time a certain agent was actively
logged in “i.e. working” relative to the exact time he/she was scheduled to work. For example, if an
agent was scheduled for a break of 15 minutes and decided to take 20 minutes instead, his/her
adherence will go down as a result. Similarly, if an agent was scheduled to take a break or start
his/her shift at 12:00 pm, and he/she decided to delay the break or the shift start to 12:30 pm
without coordinating with the workforce management team, these 30 minutes will be reflected
negatively in that agent’s adherence percentage even if he/she plans to stay an extra 30 minutes at
the end of the shift. For inbound call centers, 30 minutes in the middle of the day are completely
different from 30 minutes at the end of the shift, as the demand pattern is completely different.
Accordingly, call centers also have very low tolerance for consistent underperformance in
schedule adherence because of its direct effect on service level. This metric relates to “task 2”
above. The orientation of this metric is “the higher, the better”
Agent Conformance (%) – Conformance to schedule is very close to adherence, yet very different.
Adherence cares about whether the agent was logged in at the exact times “e.g. 1:39 pm” he/she was
scheduled to be logged in, while conformance cares about the total amount of hours the agent has
worked. So, if an agent was scheduled to work for 8 hours and ended up working for only 7.5 hours,
his/her conformance score will be affected negatively. On the other hand, if an agent logged in for
more time than he/she was scheduled for, his/her conformance will be more than 100%. This metric
relates to “task 3” above, and its orientation is “the higher, the better”
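The distinction between adherence and conformance can be made concrete with a short sketch. The interval representation (minutes from midnight) and the helper names below are assumptions for illustration, not a workforce-management system’s API.

```python
# Sketch: adherence vs. conformance from scheduled and worked intervals.
# Each interval is a (start_min, end_min) pair in minutes from midnight.
def overlap(a, b):
    """Minutes of overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def adherence(scheduled, worked):
    """Share of scheduled minutes during which the agent was logged in."""
    sched_total = sum(e - s for s, e in scheduled)
    matched = sum(overlap(s_iv, w_iv) for s_iv in scheduled for w_iv in worked)
    return matched / sched_total

def conformance(scheduled, worked):
    """Total worked minutes over total scheduled minutes (timing ignored)."""
    sched_total = sum(e - s for s, e in scheduled)
    worked_total = sum(e - s for s, e in worked)
    return worked_total / sched_total
```

For an agent scheduled 9:00 am to 5:00 pm who actually worked 9:30 am to 5:30 pm, conformance is 100% (eight full hours worked) while adherence is only 93.75%, which is exactly the shifted-30-minutes scenario described above.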
Agent AHT – This metric measures the average duration of the agent’s calls. This metric is the call
center’s frontline defense against losing service level, because when call center staffing decisions
are made, planners use the AHT to calculate each agent’s service capacity. Hence, they can decide how
many agents to schedule for each shift given the forecasted call demand. Of course, call centers need
safety capacity to account for AHT variations, so they usually overestimate AHT and end up with a
slightly lower utilization than desired; that is normal, because this acts as their safety
stock against unexpected surges in call demand. This metric relates to “task 4” above, and is
oriented as “the lower, the better”.
After-call work (ACW) – This metric represents the total time an agent has spent in an ACW status
during his/her shift. The ACW status prevents an agent from getting new calls while he/she is
wrapping up the required work from the previous call. Since an agent in ACW status is not
taking any calls, and thus not contributing to the service level, call centers prefer that agents not
use ACW casually. That is why many call center agents are trained to multi-task so that they can save
some valuable ACW time. This metric relates to “task 4” above, and is oriented as “the lower, the
better”
Quality Score (%) – This is the score given to an agent by a “quality coach” from the quality
management team after listening to a sample of the agent’s calls during the week/month. The
score is based on a rubric that is already known to the agent; the rubric represents the call center’s
understanding of a good customer experience, which optimally should be based on marketing
surveys and research (See Appendix 1.1 for an example of a quality rubric). This metric relates to
“task 5” above, and is oriented as “the higher, the better”.
A last comment on quality is that the quality score (%) is an internal measure of quality, while the only
external quality measure remains the CSAT survey, which isn’t always available at the agent level. Some
call centers do look at it on the agent level, but it is more common to see it on the call center level. Last but not
least, the agent performance metrics mentioned above are usually grouped into 3 main categories in agents’
scorecards (See Appendix 1.2 for a sample of a real scorecard for a Telecommunications company in the MENA
region); these categories are Agent’s Productivity, Quality, and Punctuality.
Chapter 2: Analytics as a Benchmarking tool for Call Centers
2.1 Benchmarking as a platform for Performance Evaluation in call centers
The Goal behind this Chapter
This chapter aims at identifying the possible routes for call centers to evaluate their performance without
the use of analytics. Then, the chapter moves into a description of the analytics tools that will be deployed in this
research, specifically “Linear Regression” and “Data Envelopment Analysis” (DEA), as an alternative to some
conventional benchmarking routes that will be explored in the first half of the chapter. The analytics tools
discussion will involve a technical description of the models to be used in the upcoming analysis chapters.
Benchmarking in Performance Evaluation
So far we have been talking about the definition of performance from a functional perspective, which
means that if the job is done then we have accomplished our goal, regardless of the costs involved. But as the
pressure on call centers to become more efficient increases, call centers need to focus on cost-conscious
operations. In other words, the race is not “who will get the job done?” but rather “who gets the job done
cheapest?”. This is also very consistent with the outsourcing expansion in the call center industry that we are
witnessing today. Outsourcing destinations are able to win the client companies’ call centers because they can do
the job cheaper. It is very important to understand that, from a business profitability perspective, call centers should not
focus on achieving perfect scores on all the metrics; rather, they should focus on achieving the level of service that
adds value in terms of the company’s value proposition without overproducing on non-value-adding metrics. For
example, a 100% service level is quite costly to maintain in terms of staffing needs, while customers may not
care if they wait a little bit on the phone, especially if the company is positioned as a low-cost leader rather
than a high-service provider. So, if a call center strives to achieve a 100% service level even though its
customers don’t see that as added value, this will be regarded as overproduction on a metric that adds no
value to those specific customers. To sum up, performance isn’t absolute: as important as defining
performance is, we also need to define the “profitable” levels of performance for the call center. As a result, in order
to develop an understanding of the “profitable” range of performance, call centers need to benchmark their
performance against either (1) Absolute Benchmarks or (2) Empirical Benchmarks.
Absolute Benchmarking
For the purposes of this research, this type of benchmarking involves defining the call center’s target
performance by comparing it to an “Absolute Benchmark”. This benchmark may very well be (1) Performance
Targets that were developed as a result of a deep understanding of the company’s value proposition as well as the
customers’ perspective on “value added services”. These performance targets, as we will explain in more detail in
chapter 5, are considered a reasonable definition of “profitable” performance ranges that are customized in every
sense to that specific call center. It may also be (2) Theoretical Benchmarks, which are synthesized by using
theoretical models such as queueing theory to produce a theoretical benchmark of performance based on the call
center’s means of production, or input parameters, such as “staffing level, call demand, AHT, etc.”. These two
categories of absolute benchmarks are very useful in the sense that they are relatively easier and more feasible to
produce and deploy, but they run the risk of becoming obsolete too often in the case of “Performance Targets”, or of
being too impractical in the case of “Theoretical Benchmarks”. Both of these benchmark types will be discussed in
separate sections in chapters 4 & 5.
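As an illustration of the “Theoretical Benchmarks” category, the following sketch computes a benchmark service level from the classic Erlang C queueing model, using staffing level, call demand, and AHT as the input parameters mentioned above. The thesis does not prescribe a specific queueing model; Erlang C is an assumed, commonly used choice for inbound call centers.

```python
# Sketch: a theoretical service-level benchmark via the Erlang C model.
import math

def erlang_c_service_level(calls_per_sec, aht_sec, agents, threshold_sec=20):
    """Theoretical probability a call is answered within the threshold."""
    a = calls_per_sec * aht_sec            # offered load in Erlangs
    if agents <= a:
        return 0.0                          # unstable queue: demand exceeds capacity
    # Erlang C: probability that an arriving call has to wait at all
    num = a**agents / math.factorial(agents) * agents / (agents - a)
    den = sum(a**k / math.factorial(k) for k in range(agents)) + num
    p_wait = num / den
    # Service level within the threshold (exponential wait-time tail)
    return 1 - p_wait * math.exp(-(agents - a) * threshold_sec / aht_sec)
```

A call center can compare its measured service level against this theoretical value for its actual staffing, demand, and AHT; however, as noted above, such theoretical benchmarks can be too impractical because real operations violate the model’s assumptions.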
Absolute benchmarks are usually mounted on a “Scorecard” platform (Kaplan and Norton) in the case of a
multidimensional definition of performance. The scorecard helps develop a single-scale definition of
performance, which is simply a weighted average of the performance scores on each individual scale of the
multidimensional performance. Talking about scorecards takes us immediately to a discussion of the “optimal weights”
to be assigned to the various dimensions of performance. Later, in the analysis chapters, we will discuss the
weighting issue, and we will also try to provide an alternative workaround through the use of DEA.
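A scorecard of this kind can be sketched as a weighted average of per-metric attainment against targets. The metric names, targets, weights, and the capping rule below are all illustrative assumptions; choosing the weights is exactly the “optimal weights” issue just mentioned.

```python
# Sketch: a weighted-average scorecard collapsing a multi-dimensional
# performance definition into one number. All inputs are illustrative.
def scorecard_score(actuals, targets, weights, higher_is_better):
    """Weighted average of per-metric attainment (actual vs. target)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    total = 0.0
    for m, w in weights.items():
        # Orient every metric as "higher is better" before averaging
        ratio = (actuals[m] / targets[m]) if higher_is_better[m] else (targets[m] / actuals[m])
        total += w * min(ratio, 1.0)  # cap attainment at 100% per metric
    return total
```

Capping attainment at 100% reflects the overproduction point made earlier: exceeding a target on one metric should not mask a shortfall on another.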
Empirical Benchmarking
From our research perspective, empirical benchmarking involves comparing the call center’s definition of
performance to other similar entities. These entities can fall under two main categories:
1- Industry benchmarking - in which the compared entities will be other call centers. Call centers then will
have to pay close attention to the comparability of the dataset used. This kind of benchmarking is usually
done through industry consulting firms that collect data from various companies in the same industry and
then sell the descriptive statistics of the collected data to be used by companies again for benchmarking.
The need for using consultants stems from the fact that companies feel more comfortable dealing with a
third party that promises to keep the confidentiality of their data.
2- Internal Benchmarking – the compared entities here are internal in the sense that the company is either
comparing some internal units/personnel to others “Peer benchmarking”, or comparing itself as a whole to
itself over time “Self benchmarking”.
a. Peer benchmarking “Agent Level” - In case of inbound call center agents’ performance, these entities
can be other agents of similar or different queues in the same company. But similar to industry
benchmarking, the company needs to ensure the comparability of different agents in terms of
factors that affect the agents’ performance such as “Experience”.
b. Self-benchmarking: Call centers can benchmark to their own performance over time, but they also
need to control for the changes in their “input parameters”/”factors of production” over time, such
as “staffing level” for example.
First, industry benchmarking means looking at the competition and seeing what their definition of “profitable”
performance targets is. We would like to think that a similar customer base implies a similar “value
proposition”, but we know this is not always the case; that is why call centers need to be careful in selecting
competitors with a similar “value proposition” to benchmark against. Let us examine the reasonableness of industry
benchmarking for call centers. Without going into a discussion of the merits and the beneficial insights brought by
industry benchmarking, here are some of the challenges associated with industry benchmarking:
Many call centers, especially competitors, would prefer not to share performance data.
Call centers, even if they share the industry and the customer segments, are still going to be very
different in terms of their staff, company culture, policies, and technology infrastructure (including
the CRM software used). These differences and many more can make a performance target that fits
a competitor’s environment unreasonable in terms of another call center’s environment
Using professional consulting companies that do benchmarking studies in certain industries is quite
expensive when compared to the other alternative routes to benchmarking
Now, in regard to the second type of benchmarking, i.e. internal benchmarking: in this research, we
suggest the use of analytics tools, specifically linear regression and DEA, to carry out both kinds of internal
benchmarking in a more effective and efficient fashion. In the next section, we will explore both methodologies to
act as a background to our analysis that will be presented in chapters 4 & 5.
2.2 Linear Regression as an Empirical Benchmarking Analytics tool
In this section, we will try to briefly explain the concept behind the “Linear Regression” tool in simple
English, as well as provide a brief explanation on how to use it and interpret its report. In addition, we will provide
a brief tutorial of 6 steps on how this tool can be used as a performance evaluation tool in the call center context.
Understanding Linear regression
The linear regression tool is used to estimate a linear relationship between the variable under study
(the “Dependent Variable”) and other variables that are believed to affect it (the “Independent Variables”). If we have
a single independent variable, we are using “Simple Regression”, while if we have multiple independent variables
the method is called “Multiple Regression”. However, we can only have a single dependent variable in each
relationship being estimated by linear regression. In addition to estimating the relationship between dependent
and independent variables, linear regression provides information on the statistical significance of the various
parameters “i.e. 𝛼, 𝛽, 𝜎2” of the linear model, in the form of p-values. A small p-value “i.e. less than 0.05” for a
model parameter informs the researcher that the corresponding independent variable is statistically significant in
determining the dependent variable’s value. In other words, it confirms or rejects our understanding of which
independent variables affect the dependent variable being studied. The method commonly used to estimate the
linear model is called the “Least Squares” method, which chooses the line that minimizes the sum of
squared deviations of the data from the regression line. These deviations of the data from the regression line,
which are normal, are represented by the error term “epsilon”, which is assumed to be normally
distributed with a mean of 0 and a variance of 𝜎2. A typical multiple regression model consists of the following:
Dependent variable (y)
Independent variables (X1, X2, etc…)
Intercept (𝛼)
Slope for every independent variable (𝛽1, 𝛽2, etc…)
Error term (𝜀)
The Model formulation is depicted as follows:
𝑦 = 𝛼 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑛𝑥𝑛 + 𝜀,  𝜀 ~ 𝑁(0, 𝜎2)
Where “n” is the number of independent variables in the model
Regression model estimation report
A regression model usually starts with a researcher’s prior knowledge of what variable needs to be studied
“Dependent variable” and what factors might affect that variable “Independent variables”. After estimating the
regression model, the researcher is informed of the direction and magnitude of the relationships between the
dependent variable and the various independent variables by looking at the values of “Coefficients”, which are the
slopes (𝛽), in the regression model report. In other words, the estimated regression model, along with its
significance report, helps adjust the researcher’s prior knowledge by highlighting the most statistically significant
independent variables through the p-value of each independent variable’s coefficient. In addition, the regression
model also provides an estimate of the overall ability of the model to explain the variability in the dependent
variable (y). This estimate is represented by the “R-square” and “adjusted R-square” values. These are very
important values to look for in a regression estimation report. In the chapters to follow, as we carry out our
regression analysis, we will see examples of regression model estimation reports generated by Excel®.
Choosing the right variables
Before we choose the variables, we need to understand that a researcher’s prior knowledge of variables is
much more important than statistical significance values in regression reports, especially when we are using
regression to predict the expected performance range of various entities. The reason is that the statistical
significance is affected by whether the sample is representative or not. For example, suppose we are trying to study
the relationship between “service level” as a dependent variable and “staffing level” as an independent variable.
Although these two variables are clearly connected, it may be the case that the data sample was collected at a
time when many newly hired employees were working; they weren’t yet efficient, so staffing level ended up
having a statistically insignificant effect on service level. To sum up, if regression is being used for prediction (as in
performance evaluation), then the statistical significance values aren’t the only thing to consider, unlike when it
is being used for analyzing past data.
Now, a dependent variable in a call center context is usually a performance metric that we are interested in
studying. Service level or AHT are very good examples of a dependent variable when evaluating the performance of
the call center as a whole, while Quality score can be an example of a dependent variable in a model studying the
performance of call center agents.
Using Multiple Regression as a Performance Evaluation tool
In the analysis chapters to follow, we intend to use multiple regression as a performance evaluation tool.
We will use it in the following fashion:
Step one: Decide on the outcome variable to be studied “y” and the proper predictors “x” of that
outcome using prior subject-matter knowledge.
Step two: Collect a representative sample of data of all the variables involved in the analysis.
Step three: Estimate the regression model parameters. “we used Excel®”
Step four: Use the estimated model parameters especially “alpha & Beta” to calculate the expected
value of the outcome “y-hat” based on the model estimation
Step five: Compare the values of the outcome “y” collected from the data, to the expected outcome
values “y-hat” by the model. And compute “percentage deviations from model estimate”.
𝑃𝑒𝑟𝑐𝑒𝑛𝑡𝑎𝑔𝑒 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑓𝑟𝑜𝑚 𝑀𝑜𝑑𝑒𝑙 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑒 = ((𝑦𝑖 − ŷ𝑖) / ŷ𝑖) × 100
Step six: Rank the compared entities based on their “Percentage Deviation from Model Estimate”
from largest to smallest.
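The six steps above can be sketched as follows; numpy’s least-squares routine stands in here for the Excel® estimation used in the thesis, and the data values are made up for illustration. Note that for an outcome oriented as “the lower, the better”, the ranking direction in step six would be reversed.

```python
# Sketch: regression-based performance evaluation (steps three to six).
import numpy as np

def rank_by_deviation(X, y):
    """Fit y = alpha + beta1*x1 + ... and rank DMUs by % deviation from y-hat."""
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # [alpha, beta1, ...]
    y_hat = A @ coef                               # step four: expected outcome
    pct_dev = (y - y_hat) / y_hat * 100            # step five: % deviation
    order = np.argsort(-pct_dev)                   # step six: largest first
    return coef, pct_dev, order
```

DMUs with a large positive deviation outperform what the model expects given their inputs; those with a large negative deviation underperform.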
In the analysis chapters, we will illustrate the use of linear regression in evaluating different entities, or as
we will call them from now on “Decision Making Units” (DMUs). We will follow the exact same steps mentioned
above, and will report the results for the various DMUs based on that analysis. After exploring the use of multiple
regression in evaluating performance in call centers, we will provide a detailed review of the effectiveness of this
analytics tool for that specific use in chapter 6, after we conclude our analysis. To summarize, let’s take a look at the
following Exhibit.
Exhibit 2.1 Summary of “Linear regression” as a tool for empirical benchmarking
2.3 DEA as an Empirical Benchmarking Analytics tool
In this section we will try to explain the concept of “Data Envelopment Analysis” as a linear program. After
that we will explain the needs of DEA in terms of data formatting, and then we will lay down the formulation of
both models we are planning to use in the analysis chapters.
Understanding Data Envelopment Analysis
Since its inception, DEA has been widely used in measuring operational efficiency in many industries, such
as the airline industry (Sarkis) (Lapre and Scudder), the military (Sun), and education (Beasley) (Agasisti and
Johnes). Its ease of use and minimal assumptions have made it very popular for evaluating performance through
benchmarking. Other interesting uses of DEA include supplier evaluation and development programs
(Forker and Mendez) and evaluating distribution centers’ performance (Ross and Droge). Moreover, DEA is very
flexible in the sense that many variations of DEA have been created to fit different users’ needs in performance
evaluation.
DEA is a methodology that computes the efficiency of every DMU in converting “inputs” into “outputs”,
which is very similar to a production line. Inputs represent the resources and/or factors affecting performance,
while outputs are the performance levels as measured by various performance metrics. For example, the call
volume a call center gets can be considered an input, while the CSAT score is an output because it is a measure of
the produced performance. The DEA model, for the purposes of this research, is a linear program that aims to maximize the
efficiency scores of the various DMUs being evaluated. Unlike linear regression, DEA isn’t trying to fit a line at the
center of the data to compare the DMUs to; rather, DEA flies over the data and looks at the top performers to create
an efficient frontier that every DMU is then compared to and given an appropriate efficiency score (Cooper, Seiford
and Zhu). Efficiency scores are 100% for efficient DMUs, while they are less than 100% for less efficient DMUs.
As we mentioned before, DEA is a linear program that tries to maximize the efficiency scores of all DMUs
within the established constraints. However, DEA is given the freedom to choose the weights given to each input
and each output. So, if a certain DMU performs badly on a certain output, DEA can choose to place less weight, or
even no weight at all, on that output, to make the DMU look as good as it can. Having DEA try to maximize the
DMUs’ scores through the choice of weights works as an alternative to having each DMU defend its performance
and build a case around which inputs and outputs it thinks it deserves to be evaluated on. In other words, it
gives the benefit of the doubt, which is very useful if the evaluator is an outsider to the process being evaluated.
Some might see DEA as too lax on DMUs, but in some situations, having DEA take the side of the DMUs helps
increase the perceived fairness of the process. Regardless, if the user requires a firmer evaluation, DEA offers
the flexibility to impose weights or ranges of weights on each input or output; this variation of DEA is called “weight
restricted DEA” (Sunnetci and Benneyan).
Formatting data correctly for DEA
As we have established so far, DEA requires data on both the inputs and the outputs of the evaluated
performance in order to carry out the analysis. DEA calculates efficiency scores, i.e. the ability to produce
as much as possible (which is measured by outputs) with the least resources possible (which is measured by
inputs). As a result, inputs and outputs should be formatted properly as follows:
Orientation of inputs - From a DMU’s efficiency score standpoint, if all DMUs have the same level of
output, then the DMU with the lowest input will be considered the most efficient. Hence, for DEA,
inputs should be oriented as “the lower, the better”, better here meaning the DMU’s efficiency score.
Orientation of outputs – From the same perspective, if all DMUs have the same level of input,
then the DMU with the highest level of output will be considered the most efficient. Hence, for DEA,
outputs should be oriented as “the higher, the better”, again in terms of the DMU’s efficiency score.
A good example of a correctly formatted input is the “staffing level” of a call center, which is the number of
agents working in the call center at a given point in time. The lower the staffing level, assuming the same level of
output across DMUs, the better the DMU’s score will be, because this means that the DMU used fewer resources
“i.e. agents” to produce the same output. On the other hand, a good example of an incorrectly formatted input is
“call volume”. The lower the call volume of a DMU, assuming the same level of output across DMUs, the lower the
efficiency score should be, because this means that the DMU has produced that level of output in a less intense
environment of call volume compared to other DMUs. In the latter case, we can fix the problem by using the
inverse of call volume, the “inter-arrival time”, which is the average time between the arrivals of customer calls.
This way, inter-arrival time is oriented correctly to fit the nature of inputs “i.e. the lower, the better”.
Now, a good example of a correctly oriented output is “service level”, because at the same input level, the
DMU that has the highest service level will be the most efficient. On the other hand, an output metric like “AHT” is
an example of an incorrectly formatted output, because the higher the AHT gets, the worse the service; this
means that at a fixed level of input, the DMU with the highest AHT will be the most inefficient. To solve this
problem, we can also take the inverse of AHT, which is “service capacity”: the number of customer calls
that can be handled in a unit of time (e.g. an hour) at that level of AHT. This way, the DMU with the highest service
capacity, assuming the same input level, will be the most efficient “i.e. the higher, the better”.
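The two inversion fixes described above amount to taking reciprocals; a minimal sketch (the function names are illustrative):

```python
# Sketch: re-orienting incorrectly formatted DEA inputs/outputs by
# taking the reciprocal, as described above.
def interarrival_time_sec(call_volume, period_sec):
    """Input fix: calls received in a period -> avg seconds between arrivals."""
    return period_sec / call_volume   # lower volume -> longer gaps ("lower, the better")

def service_capacity_per_hour(aht_sec):
    """Output fix: AHT in seconds -> calls handleable per hour."""
    return 3600 / aht_sec             # lower AHT -> higher capacity ("higher, the better")
```

For example, 720 calls received in one hour correspond to a 5-second inter-arrival time, and an AHT of 300 seconds corresponds to a service capacity of 12 calls per hour.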
The following Exhibit summarizes the main points in “Data Envelopment Analysis”.
Exhibit 2.2 Summary of DEA as a tool for empirical benchmarking
DEA weight-unrestricted model formulation
In the analysis chapters we will use two main variations of DEA. The first one in each analysis chapter will
be the baseline model, for which we will use “weight unrestricted DEA”, in which DEA is given the freedom to
choose the input and output weights that maximize each DMU’s efficiency score. After that we will run another
iteration with “weight restricted DEA”, in which we provide a desired weight level for each output. To
help familiarize the reader with the technical underpinnings of DEA, let’s take a look at the formulation of the
“weight unrestricted DEA” linear program:
Model parameters (weight unrestricted DEA)
Inputs: Let x_ik denote the value of input i for DMU k
Outputs: Let y_jk denote the value of output j for DMU k
Weights for Inputs: Let u_i denote the weight of every input i
Weights for Outputs: Let v_j denote the weight of every output j
Efficiency Scores: Let E_k denote the efficiency score for every DMU k
Let n denote the number of inputs and m the number of outputs. DMU 1 denotes the DMU currently under evaluation; the linear program is re-solved once for each DMU, each time treating the evaluated DMU as DMU 1.
Objective Function (weight unrestricted DEA)
maximize: E_1 = Σ_{j=1}^{m} v_j y_{j1}
Model Constraints (weight unrestricted DEA)
DMU Constraints:
Σ_{i=1}^{n} u_i x_{ik} − Σ_{j=1}^{m} v_j y_{jk} ≥ 0,  for every DMU k
Inputs Constraint:
Σ_{i=1}^{n} u_i x_{i1} = 1
Non-Negativity Constraints:
𝑢𝑖, 𝑣𝑗 ≥ 0
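The weight-unrestricted program above can be sketched as follows, solved once per DMU. The use of scipy.optimize.linprog is an assumed tooling choice (the thesis does not prescribe a solver); X[k][i] and Y[k][j] hold input i and output j of DMU k, and the DMU passed as k plays the role of “DMU 1” in the formulation.

```python
# Sketch: the weight-unrestricted (CCR) DEA linear program above,
# solved for one DMU at a time with scipy's LP solver.
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, k):
    """Efficiency score of DMU k, in (0, 1]."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, m = X.shape[1], Y.shape[1]
    # Decision variables z = [u_1..u_n, v_1..v_m]; linprog minimizes,
    # so negate the weighted-output objective.
    c = np.concatenate([np.zeros(n), -Y[k]])
    # Inputs constraint for the DMU under evaluation: sum_i u_i*x_ik = 1
    A_eq = [np.concatenate([X[k], np.zeros(m)])]
    b_eq = [1.0]
    # DMU constraints, for every DMU: sum_j v_j*y_jk' - sum_i u_i*x_ik' <= 0
    A_ub = np.hstack([-X, Y])
    b_ub = np.zeros(X.shape[0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    return -res.fun   # linprog's default bounds already enforce u, v >= 0
```

Re-running dea_efficiency for each k yields the full set of scores; efficient DMUs score 1.0 (i.e. 100%).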
DEA weight-restricted model formulation
Now, in regard to the “weight restricted DEA” model formulation: it is the exact same model, except
that we add one more parameter and one more set of constraints. The additions are as follows:
Additional Model parameter (weight-restricted DEA)
Assigned Weights: Let wj denote the assigned weights for every output j
Additional Model Constraint (weight-restricted DEA)
Weight Constraints:
v_j ≥ w_j Σ_{l=1}^{m} v_l,  for every output j
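In a solver that expects constraints in “A_ub · z ≤ b” form, the weight constraint above can be rewritten as w_j · Σ v_l − v_j ≤ 0 and appended as one extra row per output. The variable layout [u_1…u_n, v_1…v_m] is an assumption about how the base program is coded:

```python
# Sketch: extra A_ub rows implementing v_j >= w_j * sum_l(v_l),
# rewritten as w_j*sum_l(v_l) - v_j <= 0, one row per output.
import numpy as np

def weight_restriction_rows(n_inputs, assigned_weights):
    """Build the additional constraint rows for weight-restricted DEA."""
    m = len(assigned_weights)
    rows = []
    for j, w_j in enumerate(assigned_weights):
        row = np.zeros(n_inputs + m)
        row[n_inputs:] = w_j          # w_j times every v_l
        row[n_inputs + j] -= 1.0      # ... minus v_j itself
        rows.append(row)
    return np.array(rows)
```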
Finally, we would like to end this chapter with a figure that represents the scope of this research in terms of
the performance evaluation methodology and analytics tools involved.
Exhibit 2.3 Research Scope Summary
Chapter 3: Case study background and data description
In this chapter, we introduce the datasets that will be used in the analysis in chapters 4 and 5. We have used
two datasets extracted from two different call centers operating in the MENA region. We refer to the two
companies in which the call centers operate anonymously as (Company A, and Company B). The datasets are
identified as follows:
Dataset 1: Weekly aggregated call center performance for 24 weeks of operations (Company A)
Dataset 2: Agents’ performance aggregated over a three-month period (Company B)
3.1 Company Description
Company A
Dataset 1 came from a fairly new company (started in 2012) that operates in the online retailing business.
Company A’s call center operations are hosted in-house, unlike Company B’s, which we will discuss next. Company A,
like any other big retailer, needs to provide pre-sale and after-sale service to help customers with their purchases.
It also needs a returns department that manages product returns, in addition to a sales team that works to
promote the company and calls up customers to inform them of new offers and run subscriptions. The company also
offers its services in multiple languages (Arabic and English). Moreover, in an attempt to push transaction costs down,
Company A’s Customer Care department adopted an omni-channel approach, using email, live chat,
and social networking, in addition to the conventional telephone lines, to handle customers’ inquiries. As a
result, Company A’s call center is quite mixed in terms of the call center services offered (inbound and outbound). But
in this research we are mainly interested in Company A’s inbound call center performance, so we will focus
on the Customer Care department only and exclude any telesales or telemarketing divisions from our analysis.
Company A’s Customer Care department is divided into mainly 5 queues, each specializes in a unique set of
services (initially they were 6 queues, but queue 3 got merged with queue 1 beginning from Week 30 of
operations). Queues 1 & 2 handle customers in Arabic language, while Queues 4 through 6 handle English-speaking
customers. What makes Company A’s operations unusual is the fact that some of the call center agents circulate on
different queues every week in an attempt to maximize agent utilization. This necessitates the “cross training” of
those agents being circulated, which can be costly. But given that this is a startup, and call demand is still hard to
predict, maximizing utilization through cross training is a reasonable strategy. It is also worth mentioning that
Company A’s call center operates on three shifts, but they overlap for most of the working day. So, for simplicity
purposes, we will assume that all agents are there all the time. They operate for 12 hours a day, from 9 am to 9 pm,
and they are open 7 days a week.
Company B
Dataset 2 was obtained from a call center outsourcing services company. Company B provides call center services to other companies that wish to outsource their call center operations. Company B is well known in its field and also operates in the MENA region.
Company B's call center floor hosts many accounts for different client companies. Most of Company B's agents are paid a fixed salary, and many of them have been with the company for a relatively long time. Company B is one of the biggest call centers in its region and has a reputation as one of the best places a call center agent can work, mainly due to its loyalty to its good employees.
Dataset 2, which we collected from Company B, comes from a fairly new inbound account that started around June 2014. This account, like all the others, is operated and staffed by Company B's management, while the client can only specify criteria for selection and hiring, and in some cases require an interview with the agents before they are approved.
Next, we describe each of the two datasets in detail.
3.2 Company A - Dataset 1: Call Center’s Aggregate Performance
In Dataset 1, we captured the inbound call center's aggregate performance over a period of 24 weeks of operation, after excluding one week (Week 31) in which performance was considered an outlier due to low call volume. The inbound call center's aggregate performance is represented by two main metrics: (1) the Average Handling Time (AHT) across all agents per week, and (2) the service level offered by the call center for that week. In addition, we obtained data on the call center's staffing level per week, as well as the call volume per week. To sum up, Dataset 1 has the following fields for 24 weeks of operations:
Weekly aggregate AHT
Weekly Service level
Weekly Staffing level
Weekly Call Volume
The dataset looks as follows:
Exhibit 3.1 Company A's raw data (Dataset 1)

Week | Staffing level (Agents per Week) | Call Volume (Customers per Week) | AHT (mins) | Service Level (%)
Week 14 | 18 | 7122 | 4.33 | 23.67%
Week 15 | 18 | 9664 | 4.72 | 7.00%
Week 16 | 18 | 10857 | 5.19 | 5.15%
Week 17 | 18 | 8264 | 4.23 | 16.00%
Week 18 | 18 | 10064 | 3.99 | 15.26%
Week 19 | 25 | 9663 | 5.72 | 6.91%
Week 20 | 25 | 8287 | 4.85 | 20.96%
Week 21 | 25 | 9813 | 5.39 | 11.23%
Week 22 | 25 | 10262 | 4.47 | 12.13%
Week 23 | 27 | 11511 | 4.61 | 7.84%
Week 24 | 27 | 10552 | 4.65 | 7.81%
Week 25 | 27 | 8968 | 4.67 | 10.68%
Week 26 | 33 | 7914 | 4.63 | 18.40%
Week 27 | 36 | 6168 | 4.67 | 34.74%
Week 28 | 36 | 7340 | 4.47 | 32.22%
Week 29 | 36 | 7676 | 4.30 | 40.75%
Week 30 | 36 | 10403 | 4.22 | 21.74%
Week 32 | 39 | 10121 | 4.27 | 42.46%
Week 33 | 39 | 9689 | 4.52 | 35.70%
Week 34 | 39 | 9487 | 4.66 | 61.75%
Week 35 | 45 | 8201 | 4.66 | 66.85%
Week 36 | 45 | 8106 | 4.70 | 59.71%
Week 37 | 50 | 8538 | 4.47 | 52.76%
Week 38 | 64 | 8137 | 4.80 | 64.08%

Week 31 was the outlier that was removed from the sample.
3.4 Company B - Dataset 2: Agent's overall performance
In this dataset, we obtained agent performance data, aggregated over a three-month period, for 30 agents who work in different queues of the same account. The data consists of:
Queue AHT target
AHT per agent, aggregated over the three-month period
Quality score per agent, aggregated over the three-month period
Agents' employment date
Agent absenteeism for the three-month period
Agent adherence for the three-month period
This dataset captures the three main dimensions of call center performance:
1- Agent Productivity: represented here by AHT
2- Agent Quality: represented here by the quality score
3- Agent Punctuality: captured by the agent absenteeism and adherence metrics
Exhibit 3.2 Company B's raw data (Dataset 2)

Agent Name | Queue | Employment date | Queue AHT target | AHT | Quality | Agent Absenteeism | Agent Adherence
Agent 1 | Queue 1 | 6-Jul-14 | 2.88 | 2.92 | 95.2% | 8.2% | 98.8%
Agent 2 | Queue 1 | 11-Feb-14 | 2.88 | 2.87 | 94.3% | 17.9% | 99.9%
Agent 3 | Queue 1 | 24-Jul-14 | 2.88 | 3.43 | 83.3% | 10.0% | 99.7%
Agent 4 | Queue 1 | 3-Apr-14 | 2.88 | 2.80 | 91.1% | 13.8% | 99.5%
Agent 5 | Queue 1 | 9-Jun-14 | 2.88 | 3.10 | 94.6% | 6.5% | 100.0%
Agent 6 | Queue 1 | 24-Jul-14 | 2.88 | 3.70 | 86.6% | 15.8% | 99.2%
Agent 7 | Queue 1 | 6-Jul-14 | 2.88 | 3.08 | 98.7% | 9.1% | 99.3%
Agent 8 | Queue 1 | 12-Jul-14 | 2.88 | 3.22 | 89.2% | 26.9% | 100.0%
Agent 9 | Queue 1 | 11-Feb-14 | 2.88 | 3.03 | 98.9% | 11.9% | 99.5%
Agent 10 | Queue 1 | 12-Jul-14 | 2.88 | 3.22 | 90.3% | 8.2% | 99.7%
Agent 11 | Queue 1 | 18-May-14 | 2.88 | 3.15 | 92.5% | 5.6% | 96.4%
Agent 12 | Queue 1 | 9-Jun-14 | 2.88 | 3.77 | 98.4% | 32.0% | 99.3%
Agent 13 | Queue 1 | 9-Jun-14 | 2.88 | 2.83 | 90.9% | 13.8% | 96.7%
Agent 14 | Queue 1 | 9-Jun-14 | 2.88 | 2.73 | 99.6% | 15.8% | 99.7%
Agent 15 | Queue 1 | 21-Jun-14 | 2.88 | 2.63 | 99.8% | 11.9% | 100.0%
Agent 16 | Queue 1 | 6-Jul-14 | 2.88 | 4.02 | 86.6% | 10.0% | 99.9%
Agent 17 | Queue 1 | 24-Apr-14 | 2.88 | 2.87 | 91.6% | 30.7% | 96.7%
Agent 18 | Queue 1 | 8-May-14 | 2.88 | 3.85 | 90.9% | 22.2% | 99.9%
Agent 19 | Queue 1 | 12-Jul-14 | 2.88 | 3.58 | 90.0% | 8.2% | 99.9%
Agent 20 | Queue 1 | 6-Jul-14 | 2.88 | 3.27 | 95.7% | 15.8% | 99.3%
Agent 21 | Queue 2 | 12-Mar-13 | 3.38 | 4.52 | 95.9% | 2.6% | 99.0%
Agent 22 | Queue 2 | 16-Apr-14 | 3.38 | 3.02 | 100.0% | 4.7% | 97.3%
Agent 23 | Queue 2 | 3-Jun-14 | 3.38 | 4.45 | 84.8% | 14.7% | 99.5%
Agent 24 | Queue 2 | 8-Jan-14 | 3.38 | 4.80 | 92.0% | 12.2% | 99.0%
Agent 25 | Queue 3 | 3-Jun-14 | 3.53 | 4.08 | 70.0% | 9.1% | 98.6%
Agent 26 | Queue 3 | 14-May-14 | 3.53 | 3.55 | 91.7% | 4.0% | 95.8%
Agent 27 | Queue 3 | 24-Jun-14 | 3.53 | 4.10 | 100.0% | 25.8% | 93.6%
Agent 28 | Queue 3 | 1-Aug-14 | 3.53 | 4.02 | 81.0% | 11.4% | 94.9%
Agent 29 | Queue 3 | 3-Jun-14 | 3.53 | 3.33 | 90.9% | 5.4% | 99.1%
Agent 30 | Queue 3 | 3-Jun-14 | 3.53 | 2.80 | 57.8% | 3.3% | 97.2%

Both datasets above are analyzed in the next two chapters. We will start with the first level of analysis, which is the call center's overall performance, aka "aggregate performance". Aggregate performance looks at call center metrics aggregated across all queues and accounts. These metrics are designed to give their user a feel for the call center's ability to carry out its basic tasks, which are (1) answering the customers, (2) serving the customers fast enough, and (3) doing so with an appropriate level of quality.
Chapter 4: Aggregate Performance Tracking
4.1 Introduction
This chapter examines aggregate call center performance from a client company's perspective. Thus, we assume an outsourced call center situation rather than an in-house call center. Aggregate call center performance means: "the call center metrics that measure the ability of the call center to carry out its most basic tasks, aggregated across all queues". In our analysis, taking the client perspective, we look at call center performance over time (i.e. time series analysis), as opposed to analyzing call center performance at a single point in time (i.e. a snapshot). The client company cares mainly about the following aspects of performance over time:
How many of the calling customers have been answered in a reasonable time period?
What is the average service time?
What is the quality of service they are receiving?
The answer to the first question is well captured by the "service level" metric, which is the percentage of customers answered within a given threshold of time (e.g. 20 seconds) out of the total number of customers calling that day/week/month. Client companies value the service level metric so much that they have created what is called a "service level agreement" (SLA) with outsourcing destinations. SLAs allow clients to tie the outsourcing destination's compensation to the service level it achieves at the end of each period, and thus help align the incentives of the outsourcing destination with those of the client company, which wishes to see service level go up. Client companies also expect the outsourcing destination to become more efficient every year, so they usually reduce the value of the contract by a certain percentage annually. Service level will be included in our aggregate analysis later in this chapter.
The second question is answered by looking at the "Average Handling Time" (AHT) metric, which shows the average service time experienced by customers answered on that day/week/month. AHT is considered one of the most common metrics in the call center business, due to its direct effect on queue length, which in turn affects service level: the higher the AHT, the lower the agents' average capacity to handle customers, so the queue grows longer, more customers stay on the line beyond the desired threshold, and service level drops. AHT will also be included in our aggregate analysis later in this chapter.
Last but not least, the quality question can be answered in two ways: (1) by looking at the quality scores given by the internal quality staff (working for the outsourcing destination); or (2) if the client prefers a third party to do the evaluation, by using a customer satisfaction survey (CSAT), which simply asks the customer, after finishing the call with the customer service representative, to rate the various aspects of the customer service experience (e.g. on a scale from 1 to 10). Quality is usually measured at the agent level, and sometimes, when CSAT is used, the call center can aggregate the data into an overall call center service quality score. However, due to the lack of data about aggregate quality in Dataset 1, we will not be able to include quality as a dimension in our aggregate analysis.
This analysis is challenging because considering individual metrics such as service level, AHT, or quality independently, or without considering the inputs used to produce them, does not provide a holistic view of call center performance. For example, a client might find the service level improving over the past 3 months while the outsourcing destination has billed them for double the number of agents, which suggests the outsourcing destination might not be managing its resources efficiently. Or the AHT and service level might look promising while service quality has deteriorated. To summarize, the main challenges in this analysis are as follows:
There are multiple outputs (i.e. performance metrics) involved in the analysis, some of which might be in tension with each other (e.g. AHT and quality), so one cannot easily draw a meaningful conclusion just by looking at one output, or even by looking at all outputs separately.
Looking at the outputs alone, without the costs associated with producing them, is misleading. For example, the outsourcing destination can improve service level by hiring more agents rather than by managing its existing agents more efficiently.
Having multiple outputs and multiple inputs adds much more complexity to the analysis.
For these reasons, aggregate call center performance analysis from a client company's perspective is very challenging without the right analytics tool. In the next section, we will see how informative Dataset 1 can be to a client without the use of analytics (i.e. preliminary data analysis). In the sections that follow, we will apply different analytics tools to see if we can extract a more meaningful picture than the one achieved in the preliminary analysis.
4.2 Preliminary Data Analysis
In this section we will try to look at the data presented in dataset 1, while assuming a client perspective to
the performance evaluation problem over time. We will try to extract the most we can from the dataset without the
use of any analytics. In later sections, we will try to compare and contrast the results we get from this section to the
results achieved by the use of analytics. For standardization purposes, we will transform two items from dataset 1
to be friendlier to the DEA analysis in the following sections. We will transform (1) AHT, which is an output (i.e.
performance metric), to its inverse format “Service Capacity”. The reason is that we need it to be in a “the more, the
better” format, to fit the needs of the DEA analysis. (2) Call Volume, which is an input (i.e. differentiating variable),
will be transformed to “inter-arrival time” – its inverse – because inputs are needed to be in a “the lower, the
better” format for the purpose of the DEA analysis. Now, with our data oriented correctly, let us start by examining
the various outputs and inputs in dataset 1. The metrics involved in our analysis are as follows:
Outputs (performance metrics) – orientation is "the higher, the better":
(1) Service level percentage (weekly)
(2) Customer service representative (CSR) service capacity per hour (weekly) – the reciprocal of average weekly handling time (AHT) multiplied by 60 minutes per hour – used to reflect that the higher the capacity, the lower the AHT, the faster the service, and thus the better the performance.
Inputs (differentiating variables) – orientation is "the lower, the better":
(1) Staffing level (# of CS agents staffed each week)
(2) Inter-arrival time in minutes (weekly) – the reciprocal of call volume multiplied by 12 hours of operation per day, 7 days a week, and 60 minutes per hour – used to reflect that the bigger the call volume, the smaller the inter-arrival time, and the harder it is to maintain a good service level with a fixed number of staff.
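The two reorientations can be sketched in a few lines. This is our own illustrative code with assumed helper names; the 12 hours/day, 7 days/week operating schedule is taken from the text.

```python
# Operating schedule from the text: 12 hours/day (9 am to 9 pm), 7 days a week.
WEEKLY_MINUTES = 12 * 7 * 60  # 5040 operating minutes per week

def service_capacity_per_hour(aht_minutes):
    """Output reorientation: reciprocal of AHT, in customers per agent-hour."""
    return 60.0 / aht_minutes

def interarrival_minutes(weekly_call_volume):
    """Input reorientation: reciprocal of call volume, in minutes per call."""
    return WEEKLY_MINUTES / weekly_call_volume

# Week 14 of Dataset 1: AHT 4.33 min and 7122 calls give roughly the
# 13.87 customers/hour and 0.71 minutes shown in Exhibit 4.1.
```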
Our analysis extends from Week 14 of Company A's operations to Week 38, with the exception of Week 31, which, as mentioned above, is an outlier that had to be eliminated from the analysis. After the necessary transformations, the data looks as follows:
Exhibit 4.1 Dataset 1 in analysis-ready formatting
(Staffing level and inter-arrival time are the inputs; service capacity and service level are the outputs.)

Week | Staffing level (# of Agents) | Inter-arrival time (mins) | CSR Service Capacity per hour | Service Level (%)
Week 14 | 18 | 0.71 | 13.87 | 23.67%
Week 15 | 18 | 0.52 | 12.71 | 7.00%
Week 16 | 18 | 0.46 | 11.57 | 5.15%
Week 17 | 18 | 0.61 | 14.17 | 16.00%
Week 18 | 18 | 0.50 | 15.05 | 15.26%
Week 19 | 25 | 0.52 | 10.48 | 6.91%
Week 20 | 25 | 0.61 | 12.37 | 20.96%
Week 21 | 25 | 0.51 | 11.13 | 11.23%
Week 22 | 25 | 0.49 | 13.42 | 12.13%
Week 23 | 27 | 0.44 | 13.02 | 7.84%
Week 24 | 27 | 0.48 | 12.91 | 7.81%
Week 25 | 27 | 0.56 | 12.85 | 10.68%
Week 26 | 33 | 0.64 | 12.96 | 18.40%
Week 27 | 36 | 0.82 | 12.86 | 34.74%
Week 28 | 36 | 0.69 | 13.42 | 32.22%
Week 29 | 36 | 0.66 | 13.95 | 40.75%
Week 30 | 36 | 0.48 | 14.22 | 21.74%
Week 32 | 39 | 0.50 | 14.06 | 42.46%
Week 33 | 39 | 0.52 | 13.27 | 35.70%
Week 34 | 39 | 0.53 | 12.87 | 61.75%
Week 35 | 45 | 0.61 | 12.88 | 66.85%
Week 36 | 45 | 0.62 | 12.75 | 59.71%
Week 37 | 50 | 0.59 | 13.42 | 52.76%
Week 38 | 64 | 0.62 | 12.50 | 64.08%
We will start our preliminary data analysis by graphing the outputs and looking at their descriptive statistics and trends over time, and then do the same for the inputs, to see whether a meaningful picture emerges.
Descriptive Statistics:
Outputs:
Exhibit 4.2 Outputs graphed (Dataset 1)
CSR Service Capacity per hour
Service capacity per hour doesn't show a general direction across the period of analysis. It fluctuated significantly from Week 14 through Week 22, after which the fluctuation toned down. The all-time high in service capacity per hour was achieved in Week 18, at approximately 15 customers per hour, and the all-time low occurred at Week 19, at approximately 10.5 customers per hour. The steepest incremental decrease was at Week 19, where the service capacity dropped by almost 4.5 customers per hour. The average service capacity throughout the period of analysis was approximately 13.1 customers per hour.
Service Level
Service level started at 23.67% in Week 14 and fluctuated without a general direction until Week 24. At Week 25 an uptrend began, peaking at an all-time high of 66.85% at Week 35. The largest incremental increase happened during Week 34, when service level improved by 26.05 percentage points in one week to reach 61.75%. Service level closed strongly at Week 38, at 64.08%. The all-time low for this period of analysis was 5.15%, at Week 16. The average service level throughout the period of analysis was 28.16%.
Relating both metrics
The output metrics are weakly correlated (correlation coefficient 0.16823). Therefore, the fluctuation in service level doesn't seem to be explained much by service capacity per hour, or vice versa.
Inputs:
Exhibit 4.3 Inputs graphed (Dataset 1)
Staffing level
Since Company A's operations are in the start-up stage, the Customer Care department's staffing level increases over time as sales and the customer base grow. It started at Week 14 with only 18 agents; over the period of analysis (24 weeks), staffing grew by roughly 2.5 times its initial value to reach 64 agents by Week 38. The largest incremental increase took place in the last week, with 14 customer service agents added in a single week!
Inter-arrival time
Inter-arrival time fluctuates with no general direction, with the exception of the large increase (meaning a decrease in call volume) from Week 24 until Week 27, during which inter-arrival time increased by approximately 0.11 minutes on average each week. The biggest incremental change was at Week 17, where inter-arrival time increased by about 0.15 minutes in a single week. The all-time high was 0.82 minutes, at Week 27, while the all-time low was about 0.44 minutes, at Week 23. The average inter-arrival time throughout the period of analysis was 0.57 minutes.
Relating both metrics
These two metrics have a weak positive correlation of 0.2635.
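The pairwise correlations used in this section can be reproduced with numpy. The sketch below uses only one Exhibit 4.1 row per staffing level (Weeks 14, 19, 23, 26, 27, 32, 35, 37, 38), so the coefficients will differ slightly from those reported for the full 24-week sample.

```python
import numpy as np

# One week per staffing level, taken from Exhibit 4.1.
staffing     = np.array([18, 25, 27, 33, 36, 39, 45, 50, 64], dtype=float)
interarrival = np.array([0.71, 0.52, 0.44, 0.64, 0.82, 0.50, 0.61, 0.59, 0.62])
capacity     = np.array([13.87, 10.48, 13.02, 12.96, 12.86, 14.06, 12.88, 13.42, 12.50])
service_lvl  = np.array([0.2367, 0.0691, 0.0784, 0.1840, 0.3474, 0.4246, 0.6685, 0.5276, 0.6408])

# Rows/columns follow the variable order above, as in Exhibit 4.5.
corr = np.corrcoef(np.vstack([staffing, interarrival, capacity, service_lvl]))
```

Even on this subset, the staffing level vs. service level entry comes out strongly positive, consistent with the Exhibit 4.5 matrix.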
As a result of our descriptive statistics, we can see clearly that there is no agreement on which week of operations was the best. The rankings based on our two outputs, "Service level" and "Service Capacity", are as follows:
Exhibit 4.4 Summary of ranking by outputs (Dataset 1)

 | Service level | Service Capacity
Best Week | Week 35 | Week 18
Second Best Week | Week 34 | Week 30
Worst Week | Week 19 | Week 19

We can see that the only agreement is on the worst week, which is probably coincidental. In general, no unified definition of "overall performance" has emerged yet.
Now, after looking at outputs and inputs separately, let us examine the possibility of combining some of the four variables to reach more meaningful conclusions. But first we need to look at the relationships between the variables.
Correlation matrix:
The correlation matrix shows a strong positive correlation between staffing level and service level. This is expected: service level is inversely related to the time customers spend waiting on the line, which in turn is inversely related to the number of servers (staff) available; hence service level moves together with staffing level. The other moderately strong relationship is between inter-arrival time and service level, which is also intuitive: as inter-arrival time increases, call volume decreases, making a higher service level easier to achieve.
Exhibit 4.5 Correlation Matrix (Dataset 1)

 | Staffing level (# of Agents) | Inter-arrival time (mins) | CSR Service Capacity per hour | Service Level (%)
Staffing level (# of Agents) | 1.00 | | |
Inter-arrival time (mins) | 0.26 | 1.00 | |
CSR Service Capacity per hour | 0.02 | 0.11 | 1.00 |
Service Level (%) | 0.85 | 0.40 | 0.17 | 1.00

Using ratios to combine variables:
A common way of combining multiple variables into a more meaningful representation of performance is ratios. For example, retailers use "sales per square foot" to examine sales performance after controlling for retail space as a differentiator between retail chains. This combines an output metric (sales) with an input resource used to produce it (retail space). Applying the same methodology to our problem can be very beneficial, as it tells a better story than the one we have so far.
Agent contribution to service level: This ratio represents the average contribution to service level made by each agent every week. It is obtained by dividing the weekly service level in Dataset 1 by the weekly staffing level, so its units are percentage points per agent. The trend of this ratio already tells a different story from the individual metrics "service level" and "staffing level". For example, the all-time-high service level (66.85%) was achieved at Week 35 with a staffing level of 45 agents, yet by the "agent contribution to service level" ratio, Week 34, with only a 61.75% service level, is the all-time high (1.6% per agent versus Week 35's 1.5% per agent), because it achieved that level with only 39 agents.
Exhibit 4.6 Agent Contribution to Service level graphed (Dataset 1)
Agent's Share of Successful Calls (ASSC): This ratio represents the average number of customers handled successfully (i.e. within the service level threshold) by each agent each week. It is calculated by converting inter-arrival time back to its original form, call volume, multiplying call volume by service level to obtain the number of calls handled successfully, and dividing by the number of agents available that week (i.e. staffing level). This metric combines 3 of the 4 variables and paints a slightly different picture than service level alone, especially after Week 34. On various occasions (especially towards the end of the period), service level overestimates the performance of some weeks, whereas ASSC, because it adjusts for inputs, starts penalizing overall performance as the hiring frenzy picks up in the last few weeks. It is worth mentioning that Week 34 dominates all other weeks on this ratio, which is consistent with our findings from the "agent contribution to service level" ratio.
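Both ratios are simple enough to sketch directly. The function names below are our own; the Week 34 and Week 35 figures are taken from Exhibit 3.1.

```python
def agent_contribution(service_level, staffing):
    """Service-level percentage points contributed per agent per week."""
    return service_level / staffing

def assc(call_volume, service_level, staffing):
    """Agent's Share of Successful Calls: calls answered within the
    threshold (call_volume * service_level) per available agent."""
    return call_volume * service_level / staffing

# Week 34 vs Week 35 (Exhibit 3.1 values): Week 34 wins on both ratios
# despite its lower raw service level, because it used only 39 agents.
week34 = agent_contribution(0.6175, 39)  # ~1.6% per agent
week35 = agent_contribution(0.6685, 45)  # ~1.5% per agent
```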
Exhibit 4.7 ASSC metric graphed with Service level (Dataset 1)
As we can see, combining variables into sensible ratios is very beneficial: it gives a clearer and more holistic picture of real performance, one that would otherwise stay hidden in the folds of single-variable analysis. To summarize the differences, let us look at the rankings of the weeks of operation in Dataset 1 produced by the different metrics and ratios.
36 | P e r f o r m a n c e E v a l u a t i o n i n C a l l C e n t e r s
Exhibit 4.8 Summary of Preliminary Analysis rankings (Dataset 1)

 | Service Level | Service Capacity | Agent Contribution to Service Level | ASSC
Best Week | Week 35 | Week 18 | Week 34 | Week 34
Second Best Week | Week 34 | Week 30 | Week 35 | Week 35
Worst Week | Week 19 | Week 19 | Week 19 | Week 19

Although we couldn't reach a unified ranking, we can see that deploying ratios yields a more holistic picture of performance. We therefore conclude that ratio analysis is very beneficial, as it combines different metrics into a single, more meaningful ratio, but it has the following drawbacks:
Ratio analysis cannot combine all 4 variables and still produce a meaningful new metric
Ratio analysis might require some subject-matter knowledge of which metrics affect which, or how they are connected, which cannot be assumed from a client company's perspective
From this preliminary data analysis, we can get a sense of what the optimal analytics tool should produce:
It should give a holistic view of performance, including ALL variables in the analysis
It should produce a unified ranking of the different weeks of operation in terms of overall performance
It should be fair to the outsourcing destination without requiring much knowledge of the call center's internal operations
In the next few sub-sections, we explore different analytics tools and analyze the strengths and weaknesses of each in tackling this particular performance evaluation challenge. We examine two main framework approaches to using analytics:
1. Benchmarking against a theoretical yardstick, in which we use analytics to compare overall performance to what it should have been from a theoretical standpoint, based on queueing theory
2. Benchmarking against an empirical yardstick, in which we use analytics to compare Company A to itself over time and develop a ranking on that basis; here we experiment specifically with multiple regression and Data Envelopment Analysis as candidate analytics tools
4.3 Theoretical Benchmarking "Queueing Analysis"
In an attempt to answer the "right tool" question, we first examine the benchmarking methodology: since the client has very little knowledge of the call center's operations, benchmarking can compensate for the client's lack of subject-matter knowledge by providing a yardstick for comparison.
In this analysis, we attempt to use queueing theory to produce the optimal weekly service level (i.e. the theoretical benchmark), given the weekly staffing level, inter-arrival time, and service capacity, using an Excel template produced by Professor John O. McClain of Cornell University.
The Excel® template is available for free use and includes two different files: the first (QueueTransient.xlsx) calculates M/M/c models assuming a transient queue (fit for a system that is just starting), and the second (Queue.xlsx) calculates M/M/c models assuming a steady-state queue (fit for a system that has been running for quite a while). We decided to use the steady-state template, since Company A's call center has been operational for almost 2 years now. The steady-state template accommodates the needs of different call center types through different sheets, each covering a different option, as follows:
1- Finite queue sheet, fit for situations in which the queue length is limited. This can represent a call center with limited queue capacity (i.e. trunk lines): if all trunk lines are occupied, the next callers get a busy signal and cannot join the queue until at least one trunk line is freed. Trunk lines are freed when waiting customers are connected to a free agent's terminal.
2- Infinite queue sheet, which suits situations where the queue length is virtually unlimited, or where information about queue length limitations isn't available. These two sheets (finite and infinite queues) assume exponentially distributed service times and inter-arrival times. This takes us to the last sheet.
3- Queue simulation sheet, which allows changing the value of the coefficient of variation: set to 1, it represents exponentially distributed service times, while other values can represent other service-time distributions (inter-arrival times remain exponentially distributed). This allows flexibility in choosing distributions to fit different call center patterns.
For the problem at hand, we decided to start with the "Infinite queue" sheet, since we lack proper data about Company A's trunk lines (i.e. queue capacity). The template requires 3 pieces of data to calculate the theoretical service level for each week:
1- Number of servers, which we have in Dataset 1 for each week as the staffing level
2- Arrival rate per hour, which is simply the call volume divided by the number of hours of operation, which we also have in Dataset 1
3- Service capacity of each server, which we also have in Dataset 1 as the CSR service capacity per hour
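As a cross-check on the template's output, the steady-state M/M/c service level can also be computed directly with the standard Erlang C formula. This is our own illustrative implementation, not Professor McClain's template, and the 20-second answer threshold is an assumed value.

```python
import math

def mmc_service_level(c, lam, mu, t_sec):
    """Steady-state M/M/c: fraction of callers answered within t_sec seconds,
    given c agents, lam arrivals/hour, and mu calls/hour per agent."""
    a = lam / mu                      # offered load in Erlangs
    if a >= c:
        return 0.0                    # unstable queue: no steady state
    # Erlang C probability that an arriving caller has to wait
    top = a**c / math.factorial(c) * c / (c - a)
    p_wait = top / (sum(a**k / math.factorial(k) for k in range(c)) + top)
    # conditional waiting time is exponential with rate (c*mu - lam) per hour
    return 1.0 - p_wait * math.exp(-(c * mu - lam) * t_sec / 3600.0)

# Week 14 of Dataset 1: 18 agents, 7122 calls over 84 operating hours,
# and capacity 60/4.33 customers per hour per agent.
sl = mmc_service_level(18, 7122 / 84.0, 60 / 4.33, 20)
```

With Week 14's offered load at roughly a third of the 18 agents' nominal capacity, the model predicts essentially no waiting and a service level of nearly 100%, consistent with the finding below that every week benchmarks at 100%.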
We conducted the analysis for each of the 24 weeks, but unfortunately the results weren't very helpful, because the theoretical service level came out as 100% for all 24 weeks. This result wasn't too surprising given the M/M/c model's assumptions. Considering those assumptions, we believe the following are the possible reasons the queueing model is a poor tool for this problem, divided into "assumption-related shortcomings" and "modelling or data shortcomings":
Assumption-related shortcomings
The sheet used ("Infinite queue") assumes exponentially distributed service times, which might not be the case for Company A's call center.
The sheet, as part of the Queue.xlsx workbook, assumes a steady-state queue, which might not hold for Company A's queue given its start-up nature (some companies take years to reach a steady-state queue).
Since Company A has 5 different queues, the M/M/c queueing model used in the "Infinite queue" sheet might not be the right model for this particular problem: M/M/c assumes a single waiting line for all servers, while Company A might have 5 separate waiting lines, one in front of each of its 5 queues.
Some customers might be transferred multiple times between queues to perform multiple transactions in one phone call, leading to a larger processing time for that customer and more waiting time for other customers. This isn't captured by the M/M/c model used.
Modelling/data shortcomings
The sheet used ("Infinite queue") assumes infinite queue size, which is not the case for Company A's call center, but we couldn't get the data regarding their trunk lines.
The model doesn't account for the "after-call work" (ACW) status that an agent might use to finish the forms that need to be filled after a call, or any other call-related tasks; ACW eats into the agent's service time.
The queueing model doesn't capture the different shifts of Company A's call center agents, since not all agents are available for all the operating hours.
The model doesn't capture the fact that agents take breaks, some of which are variable in length, conditional on the queue traffic.
All these factors are possible reasons why every theoretical service level came out as 100%, which made it very difficult to use these results as a benchmark for the client company. We therefore wanted to try the "Queue simulation" template, which gives more freedom to change the service-time distribution, but that template requires knowledge of the queue capacity (i.e., the number of trunk lines), which we do not know for Company A. For these reasons we had to give up on queueing theory as an analytical tool for producing a theoretical benchmark for the problem at hand.
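For reference, the theoretical service level that such a template computes can be sketched with the standard Erlang-C formula for an M/M/c queue. This is a generic sketch of the textbook formula, not the internals of the actual "Queue.xlsx" workbook, and all rates shown are hypothetical:

```python
from math import exp, factorial

def erlang_c_wait_prob(arrival_rate, service_rate, servers):
    """Erlang-C probability that an arriving call has to wait (M/M/c queue)."""
    a = arrival_rate / service_rate          # offered load in Erlangs
    rho = a / servers                        # utilization; must be < 1 for steady state
    if rho >= 1:
        return 1.0                           # unstable queue: everyone waits
    top = a ** servers / (factorial(servers) * (1 - rho))
    bottom = sum(a ** k / factorial(k) for k in range(servers)) + top
    return top / bottom

def service_level(arrival_rate, service_rate, servers, threshold):
    """P(wait <= threshold): the theoretical service level for an M/M/c queue."""
    pw = erlang_c_wait_prob(arrival_rate, service_rate, servers)
    return 1 - pw * exp(-(servers * service_rate - arrival_rate) * threshold)

# Hypothetical week: 2 calls per minute, 1 call per minute per agent, 3 agents,
# 30-second answer threshold (0.5 minutes).
sl = service_level(2, 1, 3, 0.5)
```

With heavily overstaffed weeks plugged in, the waiting probability collapses toward zero, which is one mechanism by which every week can come out at a 100% theoretical service level.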
It is now clear that this problem requires a more sophisticated, yet more practical, tool of analysis, one that can look at several variables at the same time and provide a good picture of performance over time. Let us examine multiple linear regression as a possible tool to carry out this analysis.
4.4 Empirical Self-Benchmarking - I “Multiple Regression” – Dataset 1
The reason we considered regression analysis is its data-driven nature, which makes it much more practical than the previous analytics tool in the sense that it uses the call center's own data to estimate a model that sets the expectation for each entity under evaluation. As we mentioned in Chapter 2, we will use the "six steps" to evaluate aggregate overall performance using regression over the period of 24 weeks (Dataset 1). But first, we need to define the following:
1. DMU: in this analysis, we will look at the call center’s aggregate overall performance per week, which is
defined by both Service Capacity and Service level. So, each DMU will represent a week of operations in
Company A.
2. Independent variables (inputs): our independent variables are (1) weekly staffing level of the call center and (2) weekly average inter-arrival time.
3. Dependent variables (outputs): our dependent variables are (1) average Service Capacity per agent per hour and (2) Service level (%).
Now, given the limitations of multiple regression (we can have multiple independent variables, but only one dependent variable per model), we need a separate model for each of the two dependent variables, "Service Capacity" and "Service Level". The models are formulated as follows:
Service Capacity (y1) = α1 + β1 · Staffing level (x1) + β2 · Inter-arrival time (x2) + ε1
Service level (y2) = α2 + β3 · Staffing level (x1) + β4 · Inter-arrival time (x2) + ε2
With the models formulated, we use Excel to estimate the parameters of both models (see Appendix 4.1 for the model estimation reports). After estimating the parameters, we use both models to calculate the expected value of each output [Model 1 = "Service Capacity", Model 2 = "Service Level"], and then compute the percentage deviation of each week's actual output from the output estimated by the regression model (see Appendix 4.2 for detailed results). We now discuss the conclusions that we can derive from the results of both regression models.
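The estimation and deviation steps can be sketched as follows. The weekly numbers below are invented for illustration; the actual figures come from Dataset 1 and the Appendix 4.1 estimation reports:

```python
import numpy as np

# Hypothetical weekly observations (not the thesis's actual Dataset 1):
staffing = np.array([30, 32, 35, 33, 37, 40, 38, 36])                 # agents per week
interarrival = np.array([12.0, 11.0, 9.0, 10.0, 8.0, 7.0, 8.0, 9.0])  # seconds between calls
service_capacity = np.array([15.2, 15.8, 17.1, 16.3, 17.9, 18.8, 18.1, 17.4])

# Design matrix with an intercept column: y = a + b1*x1 + b2*x2 + e
X = np.column_stack([np.ones(len(staffing)), staffing, interarrival])
coef, *_ = np.linalg.lstsq(X, service_capacity, rcond=None)

# Expected output per week, and each week's percentage deviation from it
fitted = X @ coef
pct_deviation = (service_capacity - fitted) / fitted * 100  # positive = better than expected
```

Ranking the weeks by `pct_deviation` reproduces the kind of "deviation from the model's estimate" comparison shown in Exhibits 4.9 and 4.10.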
Model 1 "Service Capacity": regression paints a picture that we have not seen so far. It combined 3 of the 4 variables in Dataset 1 to produce a very meaningful aggregate performance evaluation. Week 18 dominates the other weeks in this ranking (as opposed to Weeks 34 and 35 in the preliminary data analysis), while the worst week appears to be Week 19, which is consistent with our findings from the preliminary data analysis.
Exhibit 4.9 Percentage Deviation from Model 1’s estimate graphed (Dataset 1)
Model 2 "Service level": the results from this model are quite interesting, because they also oppose the initial rankings suggested by the ratios in the preliminary analysis: Week 18, not Week 34 or 35, is the dominating week here. The results from this model are fairly reliable, as the model has an adjusted R-squared of 74%. The worst week here is still Week 19, though.
Exhibit 4.10 Percentage Deviation from Model 2’s estimate graphed (Dataset 1)
Looking at these results, one can see the value brought by the comprehensiveness of analytics when compared with the results of the preliminary analysis. The picture painted by regression is also much more elaborate and practical than that of the queueing analysis (which declared every week inefficient!). Here, regardless of what theory says, regression takes a data-sensitive approach, setting expectations based on the dataset's own performance.
                           | Best Week(s) | Second Best Week(s) | Worst Week(s)
Model 1 (Service Capacity) | Week 18      | Week 30             | Week 21
Model 2 (Service level)    | Week 18      | Week 14             | Week 24
Exhibit 4.11 Summary of post-regression rankings (Dataset 1)
As appealing as this is, we are unfortunately still unable to answer the simple questions "which week is best overall?", "which week is second best?", and so on. We merely got lucky that Week 18 dominates both output metrics (Service Capacity and Service level), so we can name the best week, but not the second best, and so on. Multiple linear regression is a very useful analytics tool, but as a tool for this specific performance evaluation challenge it has the following drawbacks:
With multiple performance metrics, regression cannot produce a single overall evaluation of performance, since each model can take only one dependent variable. Regression will therefore always result in multiple rankings.
Regression is a very widely used analytical tool, but not all relationships are linear, and where they are not, multiple linear regression is not the right tool. A clear example arises in Dataset 2, where experience appears as an input: experience has diminishing returns over time, so its effect is not linear, which is why we could not use linear regression with experience as an input in Dataset 2.
Overall, regression analysis is an improvement over the less comprehensive and less accurate preliminary data analysis, and a more practical approach than queueing theory's overly theoretical benchmark. But it still lacks the ability to give a single definition of overall performance across multiple performance output metrics (i.e., Service Capacity and Service level).
4.5 Empirical Self-Benchmarking - II “Data Envelopment Analysis” – Dataset 1
Given the nature of this problem, we needed a tool that satisfies the multi-dimensional nature of the overall call center performance evaluation challenge. Although using DEA to evaluate performance over time is an uncommon application of DEA, we chose DEA because:
(1) DEA is multi-dimensional in the sense that it can handle multiple outputs (Performance metrics) and
multiple inputs. This attribute is central to our analysis from a client company perspective, since the client
usually faces multiple dimensions of performance outputs, as well as multiple input parameters.
(2) DEA produces a single efficiency score to indicate the efficiency of every DMU in producing outputs, given
the input parameters. These efficiency scores are comparable in every sense because they were created
after considering the different input levels, which means that the client will have a much better picture of
how well the outsourcing destination’s team is managing call center operations given the varying input
levels every week.
(3) Since, as a client company, we do not know much about the outsourcing destination's internal processes, we needed a tool that is fair to the outsourcing destination in the sense that it grants the highest possible efficiency score over any "weights" assigned to inputs. This is equivalent to letting the outsourcing destination's management team defend themselves and build their argument around the inputs they believe had the greatest effect on performance output. As imprecise as it might seem, this option is very useful when you have little knowledge of internal operations, because it gives the benefit of the doubt.
(4) It is also fair in the sense that it is data-sensitive: it does not compare DMUs' performance to an absolute optimum or an external benchmark, but to other efficient DMUs. In essence, the client is comparing the outsourcing destination's performance to its own best self. This is also very useful when you have little knowledge of the industry; for example, the client does not know what a fair increase in service level would be from adding 3 more employees or from reducing call volume by 20%.
(5) Last but not least, DEA is a very simple technique to learn and apply, which suits a client's non-technical needs.
DEA’s first iteration (unrestricted weights):
Now, in order to start with the DEA analysis, we need to define the following:
Decision making unit (DMU) – For the purpose of this analysis, we decided to use “weekly” aggregate
performance of the call center as the decision making unit.
Inputs – we chose “Staffing level” and “inter-arrival time” to be our inputs, since the first input (Staffing
level) reflects management’s “hiring and firing” decisions on a weekly basis, and the second input (inter-
arrival time) reflects management’s ability to plan workforce to match the variable call volume. These two
inputs are chosen as inputs because they are up to the client to change, so they are considered input
parameters to the outsourcing destination. Of course the outsourcing destination can suggest increasing
the number of CS agents, but it is up to the client to approve or disapprove.
Outputs – we went with “Service level” and “CSR Service Capacity per hour” as the outputs of our model.
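The thesis carries out the DEA computation in a spreadsheet, but the unrestricted model behind these scores can be sketched as the standard CCR multiplier linear program, solved once per DMU. This is a generic sketch using SciPy, not the exact spreadsheet setup, and the two-DMU example in the usage note is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(inputs, outputs):
    """CCR efficiency scores (multiplier form) for each DMU.

    inputs:  (n_dmus, n_inputs) array, "the lower the better"
    outputs: (n_dmus, n_outputs) array, "the higher the better"
    For each DMU k: maximize u.y_k subject to v.x_k = 1,
    u.y_j - v.x_j <= 0 for every DMU j, and u, v >= 0.
    """
    X, Y = np.asarray(inputs, float), np.asarray(outputs, float)
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for k in range(n):
        # Decision variables: [u_1..u_s, v_1..v_m]; linprog minimizes, so negate
        c = np.concatenate([-Y[k], np.zeros(m)])
        A_ub = np.hstack([Y, -X])            # ratio constraints for every DMU
        b_ub = np.zeros(n)
        A_eq = np.concatenate([np.zeros(s), X[k]]).reshape(1, -1)  # v.x_k = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (s + m), method="highs")
        scores.append(-res.fun)
    return scores
```

For instance, with two DMUs that consume identical inputs but where the first produces twice the outputs, the first scores 1.0 and the second 0.5, since each DMU is free to pick the weights most favorable to itself.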
After conducting the DEA analysis, the efficiency scores of the 24 DMUs were calculated along with the weights placed on each output (see Appendix 4.3 for detailed efficiency scores). As the chart below shows, DEA's first-iteration results confirm our previous findings from the preliminary data analysis and the multiple regression analysis: both Week 34 and Week 18 are efficient, but there are other efficient weeks as well, which is a new finding. The DMUs highlighted in green are the top-tier weeks, with scores from 90% to 100%; those highlighted in yellow fall between 70% and 89.99%; and those highlighted in red score below 70%.
Exhibit 4.12 DEA’s first iteration (unrestricted weights) efficiency scores graphed (Dataset 1)
The picture painted by DEA here is quite holistic, because it defines overall performance on a single, fully comparable scale. Looking at each week, we can now see how efficiently the outsourcing destination used the deployed resources to achieve the performance metrics (outputs). As a client company, we can therefore have a more insightful conversation with the outsourcing destination and discuss every underperforming week without having to know much about how they operate. For example, in Week 19 performance was very low even though 7 additional CS agents were hired that week (why was that?). This is exactly the kind of question to put to the outsourcing destination's call center operations management. Perhaps the reason for the drop in Week 19 is that the newly hired agents were still in training, which means a higher AHT and thus a lower service level.
Now, as impressive as this is, DEA in this iteration had the following pitfalls:
For every DMU, DEA assigned whatever output weights maximized that DMU's efficiency score. This is good when you consider fairness to the outsourcing destination, since DEA takes their side to compensate for our lack of knowledge about their internal operations, but it renders the DMU scores somewhat incomparable, because the outputs have been weighted differently for each DMU.
Most scores fall in the middle and upper tiers, which does not seem natural. We call this "efficiency score inflation".
To avoid these pitfalls, we carried out another iteration of DEA in which we restrict the weights placed on outputs to certain values. Since any such values are somewhat arbitrary, we try several weight recipes to illustrate the concept; we are not arguing by any means that the chosen weights are optimal.
DEA’s second iteration (weight-restricted outputs):
Taking a client's perspective on this performance evaluation challenge, we need to follow closely the needs of a client company analyzing its outsourced call center's overall performance over time. The client wants both service level and Service Capacity, and most likely considers them equally important. Unrestricted DEA, as in the previous iteration, will attempt to maximize each DMU's efficiency score even at the expense of one or more of the outputs. For example, in Week 15, despite the fact that the service level dropped from 23.67% in the previous week to only 7.00%, DEA managed to secure a score of 84.43% (middle-tier band) for Week 15. It did so simply by placing no weight on service level and all the output weight on CSR Service Capacity. Also, since the inter-arrival time decreased dramatically in Week 15, the week got away with a high efficiency score.
As we can see from the previous DEA iteration (Appendix 4.3), DEA almost always weighs one output much more heavily than the other in order to maximize the DMU's efficiency score. To avoid this score inflation, we simply add a constraint on the output weights. For simplicity, in the base case we assume that service level and Service Capacity are equally important to the client, so the output weights are set equal. In total we try 3 different scenarios:
(1) Scenario one: the client cares about both outputs (Service level and Service Capacity) equally, so we place equal weights on both.
(2) Scenario two: the client's interest is in providing fast service, so all the weight is placed on CSR Service Capacity.
(3) Scenario three: the client cares only about service level, so all the output weight is placed on service level.
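A much-simplified sketch of the restricted idea: with the client's weights fixed on both sides, each week's score is its weighted-output to weighted-input ratio, rescaled so the best week scores 100%. Note that genuine weight-restricted DEA still optimizes any remaining free weights via a linear program, so this fixed-ratio version is only an illustration of the concept, and the data in the usage example is hypothetical:

```python
def fixed_weight_scores(inputs, outputs, in_w, out_w):
    """Weighted-output / weighted-input ratios, rescaled so the best DMU = 1.0.

    inputs, outputs: lists of per-DMU value lists; in_w, out_w: fixed weights.
    """
    ratios = [
        sum(w * y for w, y in zip(out_w, outs)) /
        sum(w * x for w, x in zip(in_w, ins))
        for ins, outs in zip(inputs, outputs)
    ]
    best = max(ratios)
    return [r / best for r in ratios]

# Three client scenarios expressed as output-weight choices:
equal_focus = (0.5, 0.5)       # scenario one: both outputs matter equally
capacity_focus = (1.0, 0.0)    # scenario two: all weight on Service Capacity
service_level_focus = (0.0, 1.0)  # scenario three: all weight on service level
```

Running the same data through the three weight choices shows directly how a week that looked efficient under one scenario can collapse under another, which is the effect Exhibit 4.13 documents.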
After completing the second iteration of analysis with differing weights on outputs, the new efficiency scores are as
follows:
Exhibit 4.13 DEA’s first and second iterations’ efficiency scores (Dataset 1)
As we can see, the efficiency scores of all DMUs changed dramatically as a result of changing the output weights. This shows how flexibly DEA incorporates the changing needs of different users, or of the same user over time. Constraining the output weights has also treated the "efficiency score inflation" symptom, giving a much more representative view of the outsourcing destination's performance over time. It is also worth observing that the unrestricted model is still conservative to some extent: the truly inefficient DMUs in the unrestricted analysis remain inefficient in the weight-restricted models.
DMUs    | Efficiency Scores (unrestricted) | Efficiency Scores (weight-restricted, equal weights) | Efficiency Scores (weight-restricted, CSR Service Capacity focus) | Efficiency Scores (weight-restricted, Service Level focus)
Week 14 | 100.00% | 92.76% | 92.13% | 83.07%
Week 15 | 84.43% | 84.04% | 84.43% | 24.54%
Week 16 | 82.89% | 82.43% | 82.89% | 18.07%
Week 17 | 95.50% | 94.27% | 94.16% | 56.13%
Week 18 | 100.00% | 100.00% | 100.00% | 53.55%
Week 19 | 66.85% | 66.62% | 66.85% | 17.47%
Week 20 | 74.21% | 68.10% | 67.64% | 52.95%
Week 21 | 72.09% | 72.09% | 72.09% | 28.37%
Week 22 | 90.88% | 90.78% | 90.88% | 30.65%
Week 23 | 98.90% | 98.50% | 98.90% | 18.33%
Week 24 | 89.90% | 89.54% | 89.90% | 18.27%
Week 25 | 76.07% | 75.94% | 76.07% | 24.99%
Week 26 | 68.59% | 67.97% | 67.70% | 35.21%
Week 27 | 68.52% | 53.24% | 52.36% | 60.96%
Week 28 | 70.61% | 65.94% | 65.04% | 56.53%
Week 29 | 81.56% | 72.01% | 70.67% | 71.49%
Week 30 | 99.30% | 98.17% | 97.67% | 38.61%
Week 32 | 100.00% | 95.77% | 93.91% | 73.35%
Week 33 | 89.48% | 86.29% | 84.88% | 59.05%
Week 34 | 100.00% | 83.61% | 80.59% | 100.00%
Week 35 | 93.82% | 72.58% | 69.70% | 93.89%
Week 36 | 84.27% | 70.71% | 68.24% | 83.81%
Week 37 | 87.35% | 77.81% | 75.62% | 76.90%
Week 38 | 89.01% | 69.88% | 67.15% | 89.01%
The applications of this result can be numerous. For example, the client can agree with the outsourcing destination on a specific efficiency score threshold to act as the minimum acceptable performance each week; if they fall below the threshold, they are penalized in some way, and if the poor performance persists, the whole account can be withdrawn from them.
In order to realize the extent of variability in evaluation introduced by changing the weights of outputs,
let’s take a look at the following chart.
Exhibit 4.14 DEA’s first and second iterations’ efficiency scores summarized in graph (Dataset 1)
After conducting both DEA iterations, we can see how DEA was able to tackle almost all the shortcomings of the previous analytics tools. DEA provided a single ranking that a client company can use to evaluate the overall performance of its outsourced call center over time, and the process did not require much subject-matter knowledge about running call centers.
The following table summarizes the different rankings achieved by the various preliminary and analytics tools explored in this chapter.
Exhibit 4.15 Summary of different rankings achieved by various analyses (Dataset 1)
This shows two main findings:
Performance is dependent on how it is defined
DEA was the only tool capable of defining overall performance on a single scale.
Further analysis with DEA – accommodating queue differences:
Look back at the DEA model's inputs used so far, especially the "inter-arrival time" input, which is the inverse of call volume. This metric is tricky to include, because it carries a hidden assumption whenever the company being investigated has multiple queues in its inbound call center. With multiple queues, each queue receives a different type of call; each call type has its own level of difficulty, which dictates a different AHT, so each queue has its own definition of AHT. As a result, using call volume aggregated across all queues as an input assumes that the proportion of calls coming into the different queues is the same every week. For Dataset 1 from Company A this is not true: we examined the call proportions coming into the different queues each week across the 24 weeks and found that they fluctuate considerably (see Appendices 4.4 and 4.5 for the queue data and the queue call proportions chart).
There are multiple ways of accommodating this assumption in the DEA to reflect the different queues, among them:
Conduct the DEA analysis at the queue level, eliminating the need to use aggregated call volume. The call types are then homogeneous and there are no more hidden assumptions. Although this method is a clear workaround for the problem, it might not fit the needs of a client interested in the aggregated overall performance of the call center rather than separate queues.
Another approach is to add an input reflecting the "Service Capacity expectation" each week based on the varying queue call proportions. This input is calculated as a SUMPRODUCT of the grand-AHT-per-queue array (the grand AHT being the average AHT per queue over the 24-week period) and the call-proportions-per-queue array. The result is an "AHT expectation" in minutes for each week, but for the purpose of input orientation ("the lower, the better") we need to transform the AHT expectation into its inverse, "Service Capacity expectation", and multiply it by 60 minutes per hour to get "Service Capacity expectation per hour". It is worth mentioning that we will
                    | Agent Contribution to Service Level ratio | ASSC Ratio | Regression Model 1 (Service Capacity) | Regression Model 2 (Service level) | DEA (unrestricted) | DEA (equal weights) | DEA (Service Capacity focus) | DEA (Service level focus)
Best Week(s)        | Week 34 | Week 34 | Week 18 | Week 18 | Weeks 14, 18, 32, and 34 | Week 18 | Week 18 | Week 34
Second Best Week(s) | Week 35 | Week 35 | Week 30 | Week 14 | Week 30 | Week 23 | Week 23 | Week 35
Worst Week(s)       | Week 19 | Week 19 | Week 21 | Week 24 | Week 19 | Week 27 | Week 27 | Week 19
still need to eliminate Week 31 from the dataset, as it is still an outlier (see Appendix 4.6 for the "Service Capacity expectation per hour" dataset).
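The SUMPRODUCT step can be sketched as follows. The per-queue grand AHTs and the weekly call mix below are made up for illustration; the actual values are in Appendices 4.4 through 4.6:

```python
# Hypothetical grand AHT per queue (minutes) and one week's call-mix proportions
# across Company A's 5 queues (illustrative values only):
grand_aht = [4.0, 6.0, 3.0, 5.0, 7.0]        # average AHT per queue over the period
call_mix  = [0.30, 0.20, 0.25, 0.15, 0.10]   # this week's call proportions (sum to 1)

# SUMPRODUCT of the two arrays gives the week's "AHT expectation" in minutes...
aht_expectation = sum(aht * p for aht, p in zip(grand_aht, call_mix))

# ...and its inverse, scaled by 60 minutes per hour, gives the input-oriented
# "Service Capacity expectation per hour" ("the lower the AHT, the better")
capacity_expectation = 60 / aht_expectation
```

Repeating this for each week's call mix yields the extra input column used in the third and fourth DEA iterations.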
We now illustrate the second suggestion by running two more iterations of DEA after including the "Service Capacity expectation" column as an input in Dataset 1 (see Appendix 4.7 for DEA's third-iteration efficiency scores, unrestricted). The results are as follows (see Appendix 4.8 for DEA's fourth-iteration results charted):
Exhibit 4.16 DEA’s third and fourth iterations’ (Accommodating Queue differences) efficiency scores
(Dataset 1)
As we can see, some DMUs' efficiency scores changed as a result of the inclusion of the new input. All the changes are in the DMUs' favor, of course, because otherwise DEA would simply place no weight on the new "Service Capacity expectation per hour" column. To sum up, this was an illustration of one possible method of accommodating queue differences while using DEA.
DMUs    | Efficiency Scores (unrestricted) | Efficiency Scores (weight-restricted, equal weights) | Efficiency Scores (weight-restricted, CSR Service Capacity focus) | Efficiency Scores (weight-restricted, Service Level focus)
Week 14 | 100.00% | 92.85% | 92.22% | 83.07%
Week 15 | 85.41% | 85.01% | 85.41% | 24.54%
Week 16 | 82.89% | 82.43% | 82.89% | 18.07%
Week 17 | 96.03% | 94.98% | 94.88% | 56.13%
Week 18 | 100.00% | 100.00% | 100.00% | 53.55%
Week 19 | 68.58% | 68.22% | 68.58% | 17.47%
Week 20 | 80.76% | 80.02% | 79.61% | 52.95%
Week 21 | 72.09% | 72.09% | 72.09% | 28.37%
Week 22 | 90.88% | 90.78% | 90.88% | 30.65%
Week 23 | 98.90% | 98.50% | 98.90% | 18.33%
Week 24 | 89.90% | 89.54% | 89.90% | 18.27%
Week 25 | 79.42% | 79.14% | 79.42% | 24.99%
Week 26 | 78.68% | 78.63% | 78.56% | 35.21%
Week 27 | 68.52% | 77.37% | 76.37% | 60.96%
Week 28 | 80.97% | 79.90% | 79.10% | 56.53%
Week 29 | 92.43% | 89.96% | 88.66% | 71.49%
Week 30 | 100.00% | 100.00% | 100.00% | 38.61%
Week 32 | 100.00% | 100.00% | 99.14% | 73.35%
Week 33 | 94.20% | 94.08% | 93.57% | 59.05%
Week 34 | 100.00% | 92.98% | 90.61% | 100.00%
Week 35 | 100.00% | 94.57% | 91.82% | 100.00%
Week 36 | 96.91% | 92.81% | 90.55% | 89.17%
Week 37 | 99.10% | 96.74% | 95.07% | 80.63%
Week 38 | 96.51% | 91.41% | 88.81% | 95.49%
4.6 Summary of Findings
                                        | Preliminary Data Analysis: Individual variables | Preliminary Data Analysis: Ratios | Theoretical Benchmarking: Queueing Analysis | Empirical Benchmarking: Linear Regression | Empirical Benchmarking: DEA
Can it combine multiple variables?      | No | Yes, but only if they produce a meaningful ratio | It can only combine Staffing level, Service Capacity, and Call volume | Yes, if they have a single dependent variable | Yes
Can it combine multiple output metrics? | No | Yes, but only if they produce a meaningful ratio | No | No | Yes
Does it require subject-matter knowledge? | No | Yes | Yes | No | Yes
Does it provide a single-scale definition of overall performance? | No | No | No | No | Yes
Is it fair to the outsourcing destination (given that we don't know much about their operations)? | No | Depends | No, because it depends on theory, which might differ from reality | Yes, to some extent, because it is data-sensitive | Yes, because it gives the benefit of the doubt
Exhibit 4.17 Summary of findings on Call Center aggregate performance analysis
Chapter 5: Agent Performance Assessment

5.1 Introduction
In this chapter we discuss the call center agent performance evaluation challenge. We assume the perspective of an inbound call center supervisor who is interested in evaluating the performance of his or her team; based on the evaluation results, the supervisor makes hiring, firing, training, and promotion decisions. We start by conducting a preliminary data analysis of "Dataset 2" from Company B, which acts as a baseline analysis. Then we apply both analytics tools, linear regression and DEA respectively, to the same dataset.
Inbound call center agent performance is usually defined in terms of three main dimensions:
1- Productivity, which focuses on how the agent manages his or her time during the shift. This dimension is represented by metrics such as AHT, ACW, and hold time. Call centers focus on productivity as a means of achieving the service level efficiently.
2- Quality, which focuses on the agent's performance during any interaction with customers, whether on or off the call. For example, the quality dimension uses metrics such as the quality scores given by internal quality coaches; CSAT scores can be another measure of quality.
3- Punctuality, which focuses on the agent's attendance, adherence to schedule and breaks, and conformance to the scheduled shift length. The main difference between adherence and conformance is that adherence concerns how closely the agent keeps to his or her login, logout, and scheduled break times, while conformance looks at the agent's commitment to the scheduled shift length: for example, an agent scheduled to work 9 hours who works only 8 is penalized on the conformance metric.
As a result, inbound call center supervisors constantly monitor these 3 categories of agent performance for each agent. When an agent underperforms in at least one of these dimensions, the supervisor is entitled to take any of the following actions:
Give a verbal warning to the agent
Make the agent sign an "action plan" to correct the poor performance, for which the agent will be held responsible if he or she does not follow through
Ask the training department to retrain the agent
Make the agent sign a warning letter
Fire the agent
Hence, the call center agent performance evaluation challenge is difficult from an inbound call center supervisor's perspective because:
Agent performance is highly multi-dimensional
The performance parameters, such as the difficulty of calls or the nature of customers, change very rapidly, which means that performance needs to be redefined quite often
Call center agents often vary in experience because of the industry's high turnover rate, so accounting for experience differences in measuring performance is not straightforward
It is not clear what defines a good performance target
The weights of the various performance dimensions in agents' scorecards are arbitrary, and there is no clear methodology for determining optimal weights
Agent performance needs to be defined on a single overall performance scale (i.e., there should be only one ranking of agents based on their overall performance)
Performance evaluation needs to be done in a way that supports training and development
Performance evaluation needs to be done in a way that supports training and development
In this chapter, we explore the usefulness of various analytics tools in meeting the needs of the agent performance evaluation challenge. We start with an attempt to use preliminary data analysis to see how far we can go without analytics; in the sections that follow, we compare and contrast the results with and without the use of the various analytical tools.
5.2 Preliminary Data Analysis
In this section, we attempt to analyze Dataset 2 without the use of analytics. We will use the findings of this section as a baseline when assessing the yield from deploying the various analytics tools.
For the purpose of standardization, we will transform some of the fields in Dataset 2 to fit the needs of the DEA analysis later. We perform the following transformations:
Dataset 2 is aggregated over a three-month period, so we use the day in the middle of the three months to calculate "Agent Experience" in months, using the DATEDIF() function in Excel
The fields "Queue AHT target" and "AHT" are transformed into their inverses, "Queue Service Capacity target per hour" and "CSR Service Capacity per hour" respectively
Agent absenteeism is converted into its complement, "Agent attendance", by deducting agent absenteeism (%) from 1
The fields "Agent Attendance" and "Agent Adherence" are combined into one field, their weighted average, which we call "Agent Punctuality"
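The four transformations can be sketched as small helper functions. The equal weighting inside "Agent Punctuality" is an assumption made here for illustration, as is the example data in the comments; the thesis does not state the weights it used:

```python
from datetime import date

def experience_months(hire_date, midpoint):
    """Whole months between hire date and the period midpoint,
    mimicking Excel's DATEDIF(start, end, "m")."""
    months = (midpoint.year - hire_date.year) * 12 + (midpoint.month - hire_date.month)
    if midpoint.day < hire_date.day:   # last partial month doesn't count
        months -= 1
    return months

def service_capacity_per_hour(aht_minutes):
    """Inverse of AHT, scaled to customers served per hour."""
    return 60 / aht_minutes

def attendance(absenteeism):
    """Complement of the absenteeism rate."""
    return 1 - absenteeism

def punctuality(attendance_pct, adherence_pct, w_attendance=0.5):
    """Weighted average of attendance and adherence.
    Equal weights are an illustrative assumption, not the thesis's choice."""
    return w_attendance * attendance_pct + (1 - w_attendance) * adherence_pct
```

For example, an agent hired on January 15 evaluated at an August 15 midpoint has 7 months of experience, and a 3-minute AHT maps to a service capacity of 20 customers per hour.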
Before we extract the data, we have a final change that we need to make, but first, let us take a look at the
suggested inputs and outputs for this agent performance evaluation challenge:
Outputs (Performance metrics) - orientation "the higher, the better":
CSR Service Capacity per hour
Quality
Punctuality
Inputs (Differentiating variables) - orientation "the lower, the better":
Agent Experience (months)
Queue Service Capacity target per hour
We think all these inputs and outputs are sound. However, we have a technical concern about the soundness of "Experience" as an input to this analysis: both analytics tools we chose are linear in nature, meaning they assume linear relationships between inputs and outputs. Experience, in general, follows a curve with diminishing returns over time, and hence is not linear. For that reason, we have to discard experience as a possible input at this level of analysis. After applying that change, Dataset 2 looks as follows:
Exhibit 5.1 Dataset 2 in analysis-ready formatting

Agent name | Queue name | Queue Service Capacity target per hour (input) | CSR Service Capacity per hour (output) | Quality (%) (output) | Punctuality (%) (output)
Agent 1  | Queue 1 | 20.81 | 20.57 | 95.2% | 95.3%
Agent 2  | Queue 1 | 20.81 | 20.93 | 94.3% | 91.0%
Agent 3  | Queue 1 | 20.81 | 17.48 | 83.3% | 94.8%
Agent 4  | Queue 1 | 20.81 | 21.43 | 91.1% | 92.9%
Agent 5  | Queue 1 | 20.81 | 19.35 | 94.6% | 96.8%
Agent 6  | Queue 1 | 20.81 | 16.22 | 86.6% | 91.7%
Agent 7  | Queue 1 | 20.81 | 19.46 | 98.7% | 95.1%
Agent 8  | Queue 1 | 20.81 | 18.65 | 89.2% | 86.5%
Agent 9  | Queue 1 | 20.81 | 19.78 | 98.9% | 93.8%
Agent 10 | Queue 1 | 20.81 | 18.65 | 90.3% | 95.8%
Agent 11 | Queue 1 | 20.81 | 19.05 | 92.5% | 95.4%
Agent 12 | Queue 1 | 20.81 | 15.93 | 98.4% | 83.7%
Agent 13 | Queue 1 | 20.81 | 21.18 | 90.9% | 91.5%
Agent 14 | Queue 1 | 20.81 | 21.95 | 99.6% | 91.9%
Agent 15 | Queue 1 | 20.81 | 22.78 | 99.8% | 94.1%
Agent 16 | Queue 1 | 20.81 | 14.94 | 86.6% | 95.0%
Agent 17 | Queue 1 | 20.81 | 20.93 | 91.6% | 83.0%
Agent 18 | Queue 1 | 20.81 | 15.58 | 90.9% | 88.8%
Agent 19 | Queue 1 | 20.81 | 16.74 | 90.0% | 95.8%
Agent 20 | Queue 1 | 20.81 | 18.37 | 95.7% | 91.7%
Agent 21 | Queue 2 | 17.73 | 13.28 | 95.9% | 98.2%
Agent 22 | Queue 2 | 17.73 | 19.89 | 100.0% | 96.3%
Agent 23 | Queue 2 | 17.73 | 13.48 | 84.8% | 92.4%
Agent 24 | Queue 2 | 17.73 | 12.50 | 92.0% | 93.4%
Agent 25 | Queue 3 | 16.98 | 14.69 | 70.0% | 94.8%
Agent 26 | Queue 3 | 16.98 | 16.90 | 91.7% | 95.9%
Agent 27 | Queue 3 | 16.98 | 14.63 | 100.0% | 83.9%
Agent 28 | Queue 3 | 16.98 | 14.94 | 81.0% | 91.7%
Agent 29 | Queue 3 | 16.98 | 18.00 | 90.9% | 96.8%
Agent 30 | Queue 3 | 16.98 | 21.43 | 57.8% | 96.9%
Next is our preliminary data analysis for dataset 2, with some descriptive statistics.
Descriptive statistics:
Since dataset 2 is a snapshot of different DMUs at the same point in time, a line chart would not be a good
representation, so let us examine a histogram of each output to get an idea of how variable it is.
Exhibit 5.2 CSR Service Capacity (output) graphed
CSR Service Capacity per hour
As we can see in Exhibit 5.2, this metric is dominated by "Agent 15", followed by "Agent 14". The average
service capacity across the sample is 17.99 customers per hour, while the lowest agent on this metric is "Agent 24" with
12.5 customers per hour.
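These summary figures can be reproduced directly from Exhibit 5.1. A minimal sketch (using only a handful of the 30 capacity values, so the sample mean below differs from the full-sample 17.99):

```python
# Identify the dominating and lowest agents on one output and compute the
# sample mean, using a few capacity values copied from Exhibit 5.1.
capacity = {"Agent 14": 21.95, "Agent 15": 22.78, "Agent 24": 12.50, "Agent 30": 21.43}

best = max(capacity, key=capacity.get)    # agent dominating the metric
worst = min(capacity, key=capacity.get)   # lowest agent on the metric
mean = sum(capacity.values()) / len(capacity)
print(best, worst, round(mean, 2))        # Agent 15 and Agent 24 on this subset
```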
Quality
In Exhibit 5.3 below, we can see that the quality metric is dominated by two agents, "Agents 22 and 27",
while several other agents are very close to the top, such as "Agents 14 and 15" with quality scores of 99.6%
and 99.8% respectively. The average quality score among the group is 91%, which is quite high.
However, our lowest performer on this metric is "Agent 30" with a quality score of 57.8%.
Exhibit 5.3 Quality (output) graphed
Exhibit 5.4 Punctuality (output) graphed
Punctuality
From Exhibit 5.4 above, we can see the variability introduced by this hybrid output (it combines
attendance and adherence). The group is dominated by "Agent 21" followed by "Agent 30", which is surprising
given agent 30's poor quality score. The average punctuality score is 92.83%, and the lowest performer on this
metric is "Agent 17".
Correlation Matrix:
The correlation matrix shows a moderate positive correlation between "CSR Service Capacity per hour" and
"Queue Service Capacity target per hour". This is expected: as the queue target changes, the difficulty of
calls changes, which means that service capacity should change as well. The surprising relationship is the positive
correlation between "Quality" and "Queue Service Capacity target", which is quite counter-intuitive because one
would expect quality in more demanding queues (i.e. in terms of speed of service) to be compromised; apparently
this is not the case for this sample of agents. The last important relationship to highlight is the negative
correlation between "Quality" and "Punctuality", which is also surprising, because one would expect agents who
care enough to show up on time to also care about the quality of their service. Apparently, that is not always true!
Exhibit 5.5 Correlation Matrix on Dataset 2
Using ratios to combine variables – Dataset 2:
In the previous chapter, we used ratios as a means to combine inputs and outputs into more
meaningful trends than single variables present independently. This is very challenging with
dataset 2, since we have only one input and the outputs are hard to combine into a meaningful
ratio. Regardless, we will look at the deviation of "Service Capacity" from its "Queue
target".
Percentage deviation from target Service Capacity: This isn't a true ratio; rather, it is a comparison
between an input, "Queue Service Capacity target per hour" – which can be considered a
performance target – and the output "CSR Service Capacity per hour". We look at the
percentage deviation of each agent's service capacity from target. As we can see, Agent 30 – not
agent 15 or 14 as we expected before – dominates the whole group, followed by agent
22. This confirms the value of combining different inputs and outputs together.
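This deviation can be computed directly from the columns of Exhibit 5.1. A minimal sketch (the agent figures below are copied from the exhibit):

```python
def pct_deviation_from_target(capacity: float, target: float) -> float:
    """Percentage by which an agent's service capacity exceeds (or falls
    short of) the queue's service capacity target."""
    return (capacity - target) / target * 100.0

# Agent 30 (queue 3, target 16.98) vs. Agent 15 (queue 1, target 20.81)
dev_agent30 = pct_deviation_from_target(21.43, 16.98)  # about +26.2%
dev_agent15 = pct_deviation_from_target(22.78, 20.81)  # about +9.5%
print(round(dev_agent30, 1), round(dev_agent15, 1))
```

Relative to its own queue target, Agent 30's raw capacity of 21.43 outranks Agent 15's higher absolute 22.78, which is exactly why combining input and output reorders the ranking.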
                                          Queue target   CSR Capacity   Quality (%)   Punctuality (%)
Queue Service Capacity target per hour    1.00
CSR Service Capacity per hour             0.48           1.00
Quality (%)                               0.39           0.15           1.00
Punctuality (%)                           -0.20          0.05           -0.21         1.00
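A matrix like Exhibit 5.5 boils down to pairwise Pearson correlations. A self-contained sketch (the five illustrative rows are taken from Exhibit 5.1, so the coefficient printed will not match the full-sample 0.39 exactly):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Queue target vs. quality for five agents from Exhibit 5.1
target  = [20.81, 20.81, 17.73, 16.98, 16.98]
quality = [0.952, 0.833, 1.000, 0.578, 1.000]
print(round(pearson(target, quality), 2))
```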
Exhibit 5.6 Percentage Deviation from Target Service Capacity graphed (Dataset 2)
As a result of this comparison metric, we obtained our first ranking of agents in terms of
their productivity relative to their queue productivity targets. Next, we move into a discussion of using
performance targets in agent scorecards as an "absolute benchmark".
5.3 Absolute Benchmarking “Performance Targets”
Unlike the queueing theory used in chapter 4, we are not aware of a theory that predicts call center
agents' performance and provides an optimum against which to compare our agents. Call centers face the same
problem: they need an internally developed benchmark against which to compare their agents' performance in terms of
productivity, quality, and punctuality. Finding such a benchmark isn't easy, because once targets are picked, the
nature of call center calls and their challenges changes considerably over time, which requires revisiting the
performance targets quite often to adjust them to the new challenges. And unfortunately, the process of constantly
revisiting targets is very costly, susceptible to subjectivity, and time consuming!
For that reason, companies should ideally start with a blank sheet (i.e. no expectations for performance) and
run a pilot period in which they monitor performance very closely in order to define what a good target
should be. As we mentioned earlier, companies should focus on performance from the customer perspective to
make sure that they do not overproduce on non-value-adding metrics. This way, performance targets will be
custom tailored to the call center's specific set of conditions. Even then, call center management should constantly
listen to their agents and revisit these targets from time to time to see if they need adjustment; otherwise,
performance targets will become obsolete, agents will feel they are unrealistic, and agents will stop caring about
them!
Now, after setting the right performance targets, the next question is "what weight should be placed on each
performance dimension?" There is no single right answer, because the weights are an extension of the company's
strategy and positioning. For example, if a company is strategically positioned around its high-quality service,
it should stress quality by weighing it heavily in the agent's scorecard. To illustrate what the different weights
look like in a real scorecard, see Appendix 1.2 for sample scorecard weights and calculation methods from a major
telecommunications inbound call center in the MENA region.
To sum up, absolute benchmarking through internally developed performance targets is widely used in the
call center industry because of its perceived fairness, ease of use, and flexibility. On the other hand, it
becomes obsolete quite quickly if not revisited frequently. In addition, performance targets that aren't based on
solid research tend to be arbitrary, which renders performance evaluation efforts inaccurate. That is why
we don't recommend the use of performance targets unless there is enough solid data to support them.
As a result, in the following sections we will explore analytics tools that can serve as an alternative
path to measuring performance without "Performance Targets".
5.4 Empirical Peer Benchmarking – I “Multiple Regression” – Dataset 2
In this section we will apply linear multiple regression to dataset 2, to illustrate how it can be used to
evaluate individual agents' performance. We need to start by defining the dependent and independent variables.
But before we do so, let us remind ourselves of the analysis parameters:
1. DMU: In this analysis, we are looking at the performance of 30 agents in Company B (Dataset 2). So, the
DMU here is each agent of the 30 agents.
2. Independent variables (inputs): our single independent variable will be “Queue Service Capacity target per
hour”
3. Dependent variables (outputs): our dependent variables are “CSR Service Capacity per hour”, “Quality”, and
“Punctuality”.
For this analysis to take place, we will need three separate models, since linear regression can accommodate only a
single output per model. Hence, the model formulation for our three models is as follows:

CSR Service Capacity per hour (y1) = α1 + β1 · Queue Service Capacity target per hour (x) + ε1
Quality (y2) = α2 + β2 · Queue Service Capacity target per hour (x) + ε2
Punctuality (y3) = α3 + β3 · Queue Service Capacity target per hour (x) + ε3
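Each of these three models is a one-regressor OLS fit, which can be computed without a solver. A hedged sketch showing Model 1 only, on five illustrative rows from Exhibit 5.1 (the thesis estimated all three models on all 30 rows in Excel):

```python
def ols_simple(x, y):
    """Least-squares fit of y = alpha + beta * x for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))
    alpha = my - beta * mx
    return alpha, beta

# Model 1: CSR Service Capacity per hour vs. queue target (five agents)
x = [20.81, 20.81, 17.73, 16.98, 16.98]   # Queue Service Capacity target per hour
y = [20.57, 17.48, 19.89, 21.43, 14.63]   # CSR Service Capacity per hour
alpha, beta = ols_simple(x, y)
fitted = [alpha + beta * xi for xi in x]
# Percentage deviation from the model estimate, as plotted in Exhibit 5.7
deviation = [(yi - fi) / fi * 100 for yi, fi in zip(y, fitted)]
```

Models 2 and 3 reuse `ols_simple` unchanged, swapping in quality and punctuality as the dependent variable.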
Our tool for estimating these models was Excel® (see Appendix 5.1 for model estimation reports). After
estimating the models successfully (see Appendix 5.2 for model results), we can discuss the conclusions that
we can extract from the model results, as follows:
Model 1 "Service Capacity": regression confirms our findings from the ratio analysis about
service capacity (i.e. the productivity aspect of the agents' performance). We can clearly see that
"Agent 30" still dominates this territory, followed by "Agents 22 and 15", while "Agent 24" remains
the poorest performer on this metric.
Exhibit 5.7 Percentage deviation from Model 1’s estimate graphed (Dataset 2)
Model 2 "Quality": The model shows that "Agent 27" is the dominating agent on the quality
metric, a very clear distinction, unlike the multiple agents hovering around the top (i.e.
Agents 14, 15 and 22) that the preliminary data analysis suggested. However, the model confirms the
preliminary data analysis's finding on the lowest performer, "Agent 30".
Exhibit 5.8 Percentage deviation from Model 2’s estimate graphed (Dataset 2)
Model 3 "Punctuality": In this model, we can also see the change brought by incorporating an input.
Instead of "Agent 21" dominating the group as the preliminary study suggested, regression
shows that "Agent 5" is now the dominating agent, followed closely by "Agent 21". Another
surprising result is that "Agent 27" has become the lowest performer in this group, rather than
"Agent 17".
Exhibit 5.9 Percentage deviation from Model 3's estimate graphed (Dataset 2)
After analyzing our findings from using regression as an analytics tool for benchmarking call center agent
performance, we can clearly see the value that regression brings even when we have only one input, as is the case
with dataset 2. Next, we will examine DEA's ability to analyze dataset 2.
5.5 Empirical Peer Benchmarking – II “Data Envelopment Analysis” – Dataset 2
As important as AHT is to operations managers, everyone is even more concerned about quality. You can be
the best agent at handling customers' calls as quickly as possible, but are you doing it right? Are you spending
enough time on the various aspects of quality the company defines for its customers? For example: are
you educating the customer about the company's service? More importantly, are you preserving customers'
privacy by confirming the phone password before sharing any account information? Are you directing customers
to online self-service in order to help reduce workload in the future? (See Appendix 1.1 for a sample
quality checklist.)
As a result, it is very uncommon (but it might happen!) to find an agent famous for their low AHT, although
they will be appreciated by their supervisor. On the other hand, almost all agents become well known to
their bosses for "quality service". Sometimes customers who receive good service want to do the agent a favor
and ask to talk to the agent's supervisor to praise the agent's service; this is usually called a "thank you!"
call, and bosses often email the whole customer service department, mentioning the agent by name, when an agent
gets one. As a result, we think that DEA is necessary for combining the three different outputs (Service
Capacity, Quality, and Punctuality) into a single holistic scale on which the different agents in dataset 2 can
be rated.
The DMUs in this analysis are the individual agents; dataset 2 contains 30 of them. For each agent
we analyze three main metrics: (1) Service Capacity, which represents the agent's productivity, (2) Quality, and (3)
Punctuality, which was calculated as the average of attendance and adherence. These metrics are the
outputs of our model, with the only available input being the "Queue Service Capacity target".
Unfortunately, since DEA is also a linear program, we have to discard "Experience" as an input. We
could run DEA with experience, but we would end up over-punishing the experienced agents at the expense of the
newly hired agents. So, we decided to run the DEA model with "Queue Service Capacity target" as the only input. It
is also worth mentioning that even if we had no inputs at all, we could still run an "output only" DEA (Lovell and
Pastor) (see Exhibit 5.1 for the input and output data in dataset 2). To summarize, our model is as follows:
DMU: each of the 30 agents
Inputs: Queue Service Capacity target per hour
Outputs: Productivity (Service Capacity), Quality, and Punctuality (average of adherence and attendance)
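For intuition, the unrestricted model with a single input can be read as: find the output weights that make a DMU's (weighted output ÷ input) score as large as possible relative to the best such score in the sample. Below is a hedged sketch that approximates this by scanning a coarse weight grid instead of solving the exact linear program (the thesis solved the LP proper):

```python
from itertools import product

def dea_efficiency(inputs, outputs, k, step=0.05):
    """Approximate weight-unrestricted CCR-style efficiency of DMU k.

    inputs:  one input value per DMU; outputs: one tuple of outputs per DMU.
    Scaling the weight vector leaves the score ratio unchanged, so we can
    restrict the search to weights summing to 1 and take the best ratio of
    DMU k's score to the sample maximum. A coarse grid stands in for the LP,
    so the result is a lower bound on the true efficiency.
    """
    n_out = len(outputs[0])
    steps = int(round(1 / step))
    best = 0.0
    for raw in product(range(steps + 1), repeat=n_out):
        if sum(raw) != steps:           # keep only weights summing to 1
            continue
        w = [r * step for r in raw]
        scores = [sum(wi * y for wi, y in zip(w, ys)) / x
                  for x, ys in zip(inputs, outputs)]
        best = max(best, scores[k] / max(scores))
    return best
```

With a single output this reduces to comparing output/input ratios: `dea_efficiency([1, 1], [(2,), (4,)], 0)` returns 0.5, since the first DMU produces half as much per unit of input as the best DMU.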
DEA’s first iteration (Unrestricted weights) – Output only:
The value of the weight-unrestricted model was illustrated in the last chapter, when we wanted to be
fair to the outsourcing destination. That made sense then because we didn't know much about their internal
operations. Here, this isn't the case, as the supervisor has access to everything that the agent does. That's why
we use the weight-unrestricted model as a baseline only, or as a way to show the improvement brought by the
next iterations. We think the weight-unrestricted DEA model is too conservative for internal
use. The results can be summarized as follows (see Appendix 5.3 for details on efficiency scores and associated
weights):
Exhibit 5.10 DEA’s first iteration’s (unrestricted weights) efficiency scores graphed (Dataset 2)
As we can see, five agents got away with an efficiency score of 100%, which was expected due to the lack of
weight restrictions. Even though this model was weight-unrestricted, we can still clearly see that there are
many inefficient agents. In the next iteration, we will run DEA with a specific set of weights, but in order to allow
some degree of freedom, we will not make the weights add up to one. The choice of weights was somewhat arbitrary; we
went with the same weights as in Appendix 1.2. But again, our discussion here isn't about DEA weight selection;
rather, it is to illustrate the use of the tool in the call center agent performance evaluation challenge.
DEA’s second iteration (weight restricted) – Output only:
As we said before, the chosen weights were as follows:
Productivity: 20%
Quality: 40%
Punctuality: 20%
This leaves 20% of freedom for DEA to assign to whichever output maximizes the DMU's efficiency
score. After running the analysis, the results were as follows:
Exhibit 5.11 DEA's second iteration's (restricted weights) efficiency scores (Dataset 2)
(The weights chosen by DEA were identical for every agent: Service Capacity 20%, Quality 60%, Punctuality 20%.)

Agent Name   Queue name   Efficiency Score
Agent 1      Queue 1      82.45%
Agent 2      Queue 1      83.43%
Agent 3      Queue 1      70.76%
Agent 4      Queue 1      84.85%
Agent 5      Queue 1      78.32%
Agent 6      Queue 1      66.72%
Agent 7      Queue 1      79.03%
Agent 8      Queue 1      75.05%
Agent 9      Queue 1      80.10%
Agent 10     Queue 1      75.48%
Agent 11     Queue 1      77.02%
Agent 12     Queue 1      66.68%
Agent 13     Queue 1      83.93%
Agent 14     Queue 1      87.44%
Agent 15     Queue 1      90.36%
Agent 16     Queue 1      62.51%
Agent 17     Queue 1      82.87%
Agent 18     Queue 1      64.93%
Agent 19     Queue 1      69.00%
Agent 20     Queue 1      74.92%
Agent 21     Queue 2      68.02%
Agent 22     Queue 2      94.65%
Agent 23     Queue 2      67.26%
Agent 24     Queue 2      64.25%
Agent 25     Queue 3      73.53%
Agent 26     Queue 3      85.42%
Agent 27     Queue 3      76.55%
Agent 28     Queue 3      75.78%
Agent 29     Queue 3      89.91%
Agent 30     Queue 3      100.00%
As we can see, this is quite an improvement over the last iteration; we get a more realistic view of agents'
performance. Agent 30 dominates the whole group despite his/her poor performance in
Quality, due to superior performance in service capacity and punctuality with a lower input than most of
the group (queue 3). We can also see that DEA uses the 20% degree of freedom identically across the whole
sample, which means that this specific recipe of weights maximizes the efficiency scores of all the DMUs in this
sample.
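The mechanics of the second iteration can be sketched in the same spirit. In the sketch below, outputs are first scaled by their sample maxima (so the weight floors act like comparable virtual shares), each output receives at least its floor, and the free 20% goes to whichever output raises the DMU's score most; scores are then normalized so the best DMU gets 100%. Both the scaling choice and the per-DMU free-share assignment are our simplifying assumptions, not the exact weight-restricted LP solved above:

```python
def restricted_scores(inputs, outputs, floors, free=0.2):
    """Sketch of weight-restricted, output-weighted scoring (simplified DEA).

    outputs: one tuple of output values per DMU; floors: minimum weight share
    per output (e.g. [0.2, 0.4, 0.2] with free=0.2 left to assign freely).
    NOTE: scaling each output by its sample maximum is an assumption made so
    that weight shares are comparable across differently scaled outputs.
    """
    n = len(outputs[0])
    maxes = [max(row[r] for row in outputs) for r in range(n)]
    scaled = [[row[r] / maxes[r] for r in range(n)] for row in outputs]
    raw = []
    for ys, x in zip(scaled, inputs):
        best_r = max(range(n), key=lambda r: ys[r])  # free share goes here
        w = [f + (free if r == best_r else 0.0) for r, f in enumerate(floors)]
        raw.append(sum(wi * y for wi, y in zip(w, ys)) / x)
    top = max(raw)
    return [s / top for s in raw]
```

Run on the rows of Exhibit 5.1, this reproduces the flavor of Exhibit 5.11: one DMU anchors the frontier at 100% and every other agent is scored relative to it.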
5.6 Summary of findings
In this chapter, we saw the value of the analytics tools in defining overall performance on a
single scale, especially DEA. Although dataset 2 was particularly hard for linear regression because it had only one
input and multiple outputs, we were still able to see the improvements brought by its use, not to mention
the picture painted by DEA, which is the dominant analytics tool in multi-dimensional performance
environments. So, getting back to the needs of the operations supervisor, let's summarize what
we've learned about the various analytical methods used in this chapter.
(Individual variables and ratios fall under Preliminary Data Analysis; Performance Targets under Absolute
Benchmarking; Linear Regression and DEA under Empirical Benchmarking.)

                                                   Individual   Ratios   Performance   Linear       DEA
                                                   variables             Targets       Regression
Can it combine multiple variables?                 No           No       No            Yes          Yes
Can it combine multiple output metrics?            No           No       No            No           Yes
Could it provide a single definition of overall
agent performance on a single scale?               No           No       No            No           Yes
Could it accommodate experience
as a differentiator?                               No           No       No            No           No

Exhibit 5.12 Summary of findings on agent performance analysis
Chapter 6: Conclusions and future research opportunities
In this chapter we will summarize our comparison between linear regression and DEA as possible analytics
tools for the two performance evaluation challenges we examined earlier. Then we will engage in a qualitative
discussion on the topics that we think can act as future research opportunities.
Conclusions:
After applying the suggested analytics methods (i.e. DEA and linear regression) to the two evaluation
challenges explored in this research – the call center's performance over time and the agents' overall
performance – we summarize our findings about each as follows:
Linear regression
This widely used analytics tool brought some very beneficial insights to our analyses; let us name a few:
It provided information about the direction, magnitude, and even the significance of the
relationships between the various inputs and outputs. This helps users understand exactly what
the data is saying, and in turn shape their prior knowledge of the inputs and outputs into a more
informed posterior understanding that is based on real data.
It also combined all the inputs in each analysis against each output separately, which produced a much
more meaningful picture of the outputs after controlling for the inputs, as we showed on several
occasions in the analysis chapters.
On the other hand, linear regression has a major shortcoming when applied to these two performance
evaluation challenges:
It could not incorporate multiple outputs into the same model. We ended up with more meaningful, but still
separate, scales of performance for each output, whereas our analysis needed a single scale of performance
in order to judge different DMUs holistically.
Overall, we see linear regression as an analytics tool that is perfect for blending multiple inputs with a single
output, especially because the data it provides in the model estimation report helps shape the user's
understanding of the relationships. But when it comes to multi-dimensional performance, linear regression might
not be the best tool.
Data Envelopment Analysis
DEA was a very useful tool in both of these analyses, especially the first one (i.e. aggregate performance
tracking). The main reasons are as follows:
DEA was able to produce a single scale of overall performance that combined all of the inputs and outputs
together. This was a perfect fit for the nature of our challenges, which made DEA the lead analytics tool for
these performance evaluation challenges.
The flexibility brought by the choice between "weight unrestricted" and "weight restricted" DEA models
was very valuable, in the sense that it allowed us to vary the degree of firmness we wanted in our
results. If we prefer more conservative results that give the benefit of the doubt to the various DMUs, we
should use the weight-unrestricted DEA model. On the other hand, if we would like a more unified weight
system across all DMUs, then weight-restricted DEA is the answer. This flexibility meets the needs of
different users: we think that weight-unrestricted DEA is a better fit for external users who have
minimal information about internal operations and/or the relationships between the various inputs and outputs,
while weight-restricted DEA is a better fit for internal use, where a more unified
measure is needed to reflect the greater knowledge the user has about performance and to promote
fairness in the internal performance evaluation process.
DEA is very easy to learn and deploy
On the other hand, DEA showed some shortcomings in tackling the two performance evaluation challenges in
this research. These shortcomings were as follows:
DEA could not incorporate obviously non-linear inputs such as "Agent Experience". This is a real
shortcoming of the methodology, but we think that there are many ways to linearize non-linear inputs.
DEA doesn't provide any information about the relationships between inputs and outputs, which suggests
that it assumes the user's prior knowledge of those relationships is sufficient. This might be true in the
case of internal use, but external users might not have the right understanding of the relationships between
inputs and outputs, which can lead to very inaccurate results if the tool is used haphazardly! As a result,
we think that DEA requires some knowledge of the relationships between the different inputs and outputs,
although it doesn't require much knowledge of the fine details of operations in a specific call center.
There are some question marks over the comparability, or fairness, of weight-unrestricted DEA models,
in the sense that the DMUs have different weights on inputs and outputs, which doesn't seem fair to some.
But we agree that in some situations it is very useful due to its more conservative nature.
Overall, we think that of the two methodologies, DEA is the dominating analytics tool that best meets
the needs of both of these challenges. However, we would like to stress the importance of careful use of DEA in this
application, since it doesn't correct the user if the tool is used incorrectly.
Future research opportunities:
In this research, we have tried to use the data we obtained in the best way possible to test both analytics
tools, and we came to conclusions about which tool is more fitting for these two specific performance evaluation
challenges (i.e. self-benchmarking and peer benchmarking). However, we faced some obstacles
that we couldn't tackle in this research due to time and project scope constraints; these obstacles should serve as
potential future research opportunities in the field of performance evaluation in call centers. These research
opportunities are:
We think that "Agent experience" can be successfully incorporated into the DEA analysis of agent
performance. Since DEA requires inputs and outputs to be linearly related, we would like to find a
way of turning agent experience into a linear variable that reflects either the agent's experience or the
amount of training or knowledge that he/she has, so that it can serve as a differentiator among agents.
Due to the uncommon nature of "weight unrestricted" DEA in evaluating performance, we would like to
investigate the psychological effect of using DEA in evaluation, on both the evaluator and the
DMU being evaluated. Will it make DMUs more motivated to perform? Will they understand how it works
and find a way around it? How can it be used to better align agents' motivation with that of the call center and
the client company?
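On the first of these opportunities, one candidate linearization (purely an illustrative assumption, not something validated in this research) is a concave transform of tenure, so that equal increments of the transformed variable represent roughly equal marginal gains:

```python
from math import log

def linearized_experience(months: float) -> float:
    """Map raw tenure (months) onto a diminishing-returns scale.

    The log(1 + months) form is an illustrative assumption: early months add
    a lot of skill, later months add progressively less, so a linear tool
    (regression or DEA) can consume the transformed value directly.
    """
    return log(1.0 + months)

# A year of extra tenure matters more to a rookie than to a veteran:
rookie_gain = linearized_experience(13) - linearized_experience(1)
veteran_gain = linearized_experience(61) - linearized_experience(49)
```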
Customer Centric/Business Attributes (rubric columns: Error, Error Reason, Error Type)
Asked repetitive questions
Asking the customer for an information already mentioned before
Didn’t confirm customer's understanding
Didn’t keep the conversation on track
Didn’t use acknowledgment listening
Didn’t wait for customer's confirmation
Inappropriate wordings used to fill the dead air
Interrupted the customer
Let the customer repeat the information
Not concentrated
Not confident and hesitant
Not patient
Over confident
Used mute button
Didn’t follow the correct conference protocol
Didn’t follow the correct transfer protocol
Didn't ask the customer for permission
Didn't give the reason for hold
Didn't thank the customer for hold
Didn't use the hold statement
Didn't wait for customer permission
Take the steps on notes if more than 3 steps
Follows proper call sequence then proceed accordingly with relevant information
Warm Greeting & Closing
Used language not match with the customer
Didn’t absorb customer anger
Didn’t allow customer to vent completely
Didn’t offer a sincere apology showing understanding of the situation
Not able to handle stress
Not empathetic
Not understanding
Showed understanding with over reacting
Didn’t offer extra assistance
Didn’t offer extra assistance in a willing way
Didn’t ask for customer name
Didn’t explain the reason for verification (when needed)
Uses of the Transitional phrase
Verified customer's data while no need for it
Verified mobile/Land Line number while no need for it
Didn’t verify mobile/Land line number
Didn’t verify mobile/Land line number properly
Didn’t ask customer for his mobile number/Land line
Didn't address the customer with his/her name using available data
Didn’t repeat mobile/Land Line number after the customer
Welcoming the customer
Didn't ask for customer permission to talk at the beginning of the call
Didn't ask for customer permission to talk at all
Didn't introduce himself at all
Didn't mention his name
Didn't mention the company name
Didn't explain any reason for the call
Explains wrong reason for the call
Addressed customer by wrong name
Didn’t address customer by his/her formal name (title)
Didn’t address customer by his/her name at all
Didn’t Follow with customer on time / as promised
Didn’t follow up with the customer when required
Didn’t make security verification
Didn’t verify customer address
Didn’t verify customer birth date
Didn’t verify customer contact numbers
Didn’t verify customer ID number
Released customer personal data
Verified customer's data while no need
Going extra mile to solve the customer’s problem and ability to retain the customer
Way of education / Satisfaction confirmed
Tariff advisory
Extra relevant information
What it is for customer interest
Cross and up selling
Profit to the organization
Attribute categories and error types (right-hand columns of the rubric):
Exceeding: Extra Mile / Revenue Opportunity (Outstanding/Unique), covering going the extra mile and cross/up-selling revenue opportunities
Maintain Confidentiality: Security verification – End User Critical Error
Addressing customer by formal name: Professional personalization – Non Critical Error
Follow up when required / Escalating or directing the customer to the correct channels – End User Critical Error
Staff members are attentive:
  Asking for customer permission to talk and/or introducing yourself – Non Critical Error
  Explains reason for the call – Non Critical Error
  Standard verification – Non Critical Error
  Controls the call well – Non Critical Error
  Offers extra assistance – Non Critical Error
  Offers a sincere apology showing understanding of the situation & displaying empathy – Non Critical Error
Appendix 1.1 Sample of Quality rubric “checklist” for a Major Telecommunications company in the MENA region:
1.2 Sample of Agent Scorecard for a Major Telecommunications company in the MENA region:
(Columns: Item | 2014 Target | 2014 Weight | Way of Calculation / Grading)

Core Job KPI (70%):
TRIM | Target: Above Competition | Weight: 10%
Service Level | Target: 80/20 | Weight: 5%
Repetitive Callers % | Target: 4% | Weight: 5%
AHT Inbound | Target: 210 | Weight: 5% | Grading: <=210 full score; 210-230 prorated; >230 lose score; <=200-210 score 110%; 190-200 score 120%
Hold % | Target: 2% | Weight: 5% | Grading: <=2% full score; 2.5%-3% prorated; >3% lose score; <=1%-2% score 110%; >1% score 120%
Not Ready (Personal/Business Related/ACW) | Target: Total 7% | Weight: 10% | Grading: <=7% full score; 7%-9% prorated; >9% lose score; <=5% score 110%; >5% score 120%
Quality Assurance | Target: 98% C / 95% NC | Weight: 20% | Deductions: 1 EUC -7.5%; 1 BC -5%; 1 NC -2.5% | Grading: <=99% score 110%; >100% score 120%
SPV Calls Observations | Target: 98% C / 95% NC | Weight: 5% | Deductions: 1 EUC -5%; 1 BC -5%; 1 NC -2.5% | Grading: <=99% score 110%; >100% score 120%
Rejection | Target: 2% | Weight: 5% | Go or No Go
Revenue Loss Mistakes | Target: Zero Revenue Loss | Weight: 5% | Deductions by loss amount: 50-100: 1%; 100-300: 2%; 300-500: 5%; 500-1000: 10%; 1000-2000: 15%; 2000-2500: 17%; 2500-3000: 19%; 3000-3500: 21%; 3500-4000: 23%
Dropped Calls vs Call Backs | Target: make call backs for 75% of dropped calls | Weight: 5% | Grading: >=75% get 5%; 65%-75% get 2%; <65% lose score; <=90% score 110%; >100% score 120%
Conformance | Target: 100% | Weight: 5% | Go or No Go
Adherence | Target: 99% | Weight: 5% | Go or No Go
Absenteeism | Target: 0 | Weight: 10% | 1 day = -5%

Global KPI (Corp / End User): 20%
Category totals: Productivity 20%; Quality 40%; Punctuality 20%. Total: 100%
4.1 Regression Models estimation reports
Model 1: Service Capacity

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.114541652
R Square            0.01311979
Adjusted R Square   -0.080868801
Standard Error      1.044698859
Observations        24

ANOVA
             df   SS            MS         F          Significance F
Regression   2    0.304694057   0.152347   0.1395892  0.870515725
Residual     21   22.91930984   1.091396
Total        23   23.2240039

                              Coefficients   Standard Error   t Stat     P-value     Lower 95%      Upper 95%
Intercept                     12.32151756    1.409568637      8.741339   1.933E-08   9.390159107    15.25287601
Staffing level (# of Agents)  -0.001044493   0.019298804      -0.054122  0.9573494   -0.041178552   0.039089567
Inter-arrival time (mins)     1.299324032    2.492550919      0.521283   0.6076239   -3.884219369   6.482867433

Model 2: Service Level
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.872916887
R Square 0.761983892
Adjusted R Square 0.739315692
Standard Error 0.104435151
Observations 24
ANOVA
df SS MS F Significance F
Regression 2 0.733250102 0.366625 33.6146614 2.84686E-07
Residual 21 0.229040715 0.010907
Total 23 0.962290816
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept -0.408218984 0.140909997 -2.89702 0.008622716 -0.701257364 -0.115180604
Staffing level (# of Agents) 0.01408381 0.001929239 7.300191 3.45934E-07 0.010071739 0.018095882
Inter-arrival time (mins) 0.418075245 0.249172217 1.677857 0.108197086 -0.100106747 0.936257237
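The coefficient estimates above feed directly into the expected-value columns of the next section. A minimal Python sketch for Week 14; the tiny discrepancies versus the tabulated 13.22, 14.11%, and 4.89% arise because the printed inputs are rounded:

```python
# Plug the fitted coefficients reported above back into the two linear models
# to reproduce the "expected" columns of the Section 4.2 tables.
def predict(intercept, b_staffing, b_interarrival, staffing, inter_arrival):
    """Prediction from a fitted two-regressor model: intercept + b1*x1 + b2*x2."""
    return intercept + b_staffing * staffing + b_interarrival * inter_arrival

# Week 14 inputs (18 agents, 0.71 min inter-arrival time, from Section 4.2).
capacity = predict(12.32151756, -0.001044493, 1.299324032, 18, 0.71)       # ~13.2 calls/hr
service_level = predict(-0.408218984, 0.01408381, 0.418075245, 18, 0.71)   # ~0.14 (14%)

# Percentage deviation from the model estimate, as tabulated in Section 4.2.
deviation = 13.87 / capacity - 1   # observed 13.87 vs. expected -> ~4.9%
```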
4.2 Regression model analysis results
Model 1: Service Capacity
Intercept Staffing level inter-arrival time
12.321518 -0.001044493 1.299324032
Week | Staffing level (# of Agents) | Inter-arrival time (mins) | CSR Service Capacity per hour | Expected CSR Service Capacity per hour | Percentage deviation from model estimate
Week 14 18 0.71 13.87 13.22 4.89%
Week 15 18 0.52 12.71 12.98 -2.09%
Week 16 18 0.46 11.57 12.91 -10.38%
Week 17 18 0.61 14.17 13.10 8.25%
Week 18 18 0.50 15.05 12.95 16.21%
Week 19 25 0.52 10.48 12.97 -19.21%
Week 20 25 0.61 12.37 13.09 -5.50%
Week 21 25 0.51 11.13 12.96 -14.14%
Week 22 25 0.49 13.42 12.93 3.73%
Week 23 27 0.44 13.02 12.86 1.20%
Week 24 27 0.48 12.91 12.91 -0.05%
Week 25 27 0.56 12.85 13.02 -1.32%
Week 26 33 0.64 12.96 13.11 -1.18%
Week 27 36 0.82 12.86 13.35 -3.63%
Week 28 36 0.69 13.42 13.18 1.89%
Week 29 36 0.66 13.95 13.14 6.17%
Week 30 36 0.48 14.22 12.91 10.15%
Week 32 39 0.50 14.06 12.93 8.73%
Week 33 39 0.52 13.27 12.96 2.44%
Week 34 39 0.53 12.87 12.97 -0.78%
Week 35 45 0.61 12.88 13.07 -1.51%
Week 36 45 0.62 12.75 13.08 -2.52%
Week 37 50 0.59 13.42 13.04 2.93%
Week 38 64 0.62 12.50 13.06 -4.27%
Model 2: Service Level
Intercept Staffing level inter-arrival time
-0.408219 0.01408381 0.418075245
Week | Staffing level (# of Agents) | Inter-arrival time (mins) | Service Level (%) | Expected Service Level (%) | Percentage deviation
Week 14 18 0.71 23.67% 14.11% 67.72%
Week 15 18 0.52 7.00% 6.33% 10.46%
Week 16 18 0.46 5.15% 3.94% 30.79%
Week 17 18 0.61 16.00% 10.03% 59.55%
Week 18 18 0.50 15.26% 5.47% 179.23%
Week 19 25 0.52 6.91% 16.19% -57.31%
Week 20 25 0.61 20.96% 19.81% 5.79%
Week 21 25 0.51 11.23% 15.86% -29.19%
Week 22 25 0.49 12.13% 14.92% -18.69%
Week 23 27 0.44 7.84% 15.51% -49.48%
Week 24 27 0.48 7.81% 17.17% -54.53%
Week 25 27 0.56 10.68% 20.70% -48.39%
Week 26 33 0.64 18.40% 32.28% -43.01%
Week 27 36 0.82 34.74% 44.04% -21.11%
Week 28 36 0.69 32.22% 38.59% -16.50%
Week 29 36 0.66 40.75% 37.33% 9.16%
Week 30 36 0.48 21.74% 30.13% -27.84%
Week 32 39 0.50 42.46% 34.92% 21.57%
Week 33 39 0.52 35.70% 35.85% -0.42%
Week 34 39 0.53 61.75% 36.32% 70.03%
Week 35 45 0.61 66.85% 48.25% 38.54%
Week 36 45 0.62 59.71% 48.55% 22.99%
Week 37 50 0.59 52.76% 54.28% -2.79%
Week 38 64 0.62 64.08% 75.21% -14.80%
4.3 DEA results
DEA first iteration results
DMU | Efficiency Score (unrestricted) | Weight on CSR Service Capacity per hour | Weight on Service Level
Week 14 100.00% 5.00% 95.00%
Week 15 84.43% 100.00% 0.00%
Week 16 82.89% 100.00% 0.00%
Week 17 95.50% 7.00% 93.00%
Week 18 100.00% 100.00% 0.00%
Week 19 66.85% 100.00% 0.00%
Week 20 74.21% 2.00% 98.00%
Week 21 72.09% 100.00% 0.00%
Week 22 90.88% 100.00% 0.00%
Week 23 98.90% 100.00% 0.00%
Week 24 89.90% 100.00% 0.00%
Week 25 76.07% 100.00% 0.00%
Week 26 68.59% 23.00% 77.00%
Week 27 68.52% 2.00% 98.00%
Week 28 70.61% 8.00% 92.00%
Week 29 81.56% 2.00% 98.00%
Week 30 99.30% 23.00% 77.00%
Week 32 100.00% 23.00% 77.00%
Week 33 89.48% 23.00% 77.00%
Week 34 100.00% 0.00% 100.00%
Week 35 93.82% 0.00% 100.00%
Week 36 84.27% 2.00% 98.00%
Week 37 87.35% 7.00% 93.00%
Week 38 89.01% 0.00% 100.00%
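Each week's unrestricted score lets that week (the DMU) pick the output weights most favorable to it, which is why the weight columns vary row by row. A minimal pure-Python sketch of one such formulation — an output-only radial DEA over the two outputs, using a five-week subset of the Section 4.6 data. The thesis's exact DEA specification may include inputs or other refinements, so scores from this sketch need not match the table row for row:

```python
from itertools import combinations

# DMU outputs (CSR service capacity per hour, service level), taken from the
# Section 4.6 dataset for Weeks 14-18 only, to keep the example small.
outputs = {
    "Week 14": (13.87, 0.2367),
    "Week 15": (12.71, 0.0700),
    "Week 16": (11.57, 0.0515),
    "Week 17": (14.17, 0.1600),
    "Week 18": (15.05, 0.1526),
}

def dea_output_only(outputs):
    """Unrestricted output-only radial DEA:
    score_k = max over u >= 0 of u . y_k, subject to u . y_j <= 1 for all j.
    With two outputs the LP optimum lies at a vertex of the feasible region,
    so we enumerate vertices instead of calling an LP solver."""
    ys = list(outputs.values())
    # Axis vertices: all weight on a single output.
    verts = [(1.0 / max(y[0] for y in ys), 0.0),
             (0.0, 1.0 / max(y[1] for y in ys))]
    # Vertices where two DMU constraints are tight: u . y_a = u . y_b = 1.
    for (a1, a2), (b1, b2) in combinations(ys, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue
        u1, u2 = (b2 - a2) / det, (a1 - b1) / det
        if u1 >= 0 and u2 >= 0 and all(u1 * y1 + u2 * y2 <= 1 + 1e-9
                                       for y1, y2 in ys):
            verts.append((u1, u2))
    return {k: max(u1 * y1 + u2 * y2 for u1, u2 in verts)
            for k, (y1, y2) in outputs.items()}

scores = dea_output_only(outputs)
# Week 18 (highest capacity) and Week 14 (highest service level) come out
# efficient, mirroring the 100% scores those two weeks receive above.
```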
4.4 Queue Data
Columns: Week, then Call Volume and AHT for each of Queue 1 through Queue 6.
Week 14 2596 4.32 1799 4.98 2603 3.81 51 3.23 46 4.51 27 7.69
Week 15 3725 4.64 2460 5.34 3263 4.30 79 4.09 69 5.03 68 4.47
Week 16 4300 4.92 2602 5.92 3670 4.66 93 4.04 95 7.17 97 6.67
Week 17 3424 3.94 1883 5.38 2749 3.76 82 5.44 73 3.94 53 3.70
Week 18 3828 4.05 2391 3.92 3570 3.92 110 3.79 90 5.21 75 3.82
Week 19 3773 5.78 2447 6.60 3102 5.04 169 3.65 115 4.52 57 4.35
Week 20 2784 5.17 2187 5.17 3069 4.43 82 4.26 92 4.72 73 3.34
Week 21 2733 5.56 2540 7.18 4296 4.45 85 3.41 85 1.83 74 2.55
Week 22 2792 3.80 2596 5.50 4611 4.43 51 3.16 100 3.25 112 3.69
Week 23 3235 4.85 2811 6.42 5036 3.99 121 2.96 140 3.09 168 2.29
Week 24 3148 4.80 2405 5.91 4604 4.42 84 1.84 129 3.15 182 2.14
Week 25 2658 4.48 2114 5.97 3889 4.40 72 1.78 111 3.28 124 1.78
Week 26 2200 5.45 2013 5.79 3397 4.01 37 1.59 114 2.30 153 1.22
Week 27 1620 5.04 1563 4.94 2809 4.44 49 2.45 51 3.16 76 2.15
Week 28 1596 4.78 1996 4.84 3589 4.17 31 2.96 74 3.01 54 2.78
Week 29 2870 4.69 2310 4.31 2357 4.00 52 1.77 27 1.62 60 1.81
Week 30 7422 4.21 2841 4.21 0 0 25 3.70 45 5.09 70 4.83
Week 31 3435 3.82 969 5.21 0 0 27 2.55 23 4.95 30 2.70
Week 32 6954 4.12 2978 4.93 0 0 53 4.29 77 3.35 59 3.65
Week 33 6689 4.21 2823 5.45 0 0 56 3.96 72 4.72 49 4.16
Week 34 6628 4.19 2695 6.45 0 0 42 2.98 61 5.61 61 2.47
Week 35 5213 4.27 2859 5.86 0 0 33 4.82 53 4.19 43 4.11
Week 36 5264 4.48 2680 5.59 0 0 61 3.84 55 3.76 46 2.80
Week 37 5694 4.16 2700 5.19 0 0 30 5.62 48 3.09 66 2.90
Week 38 5380 4.31 2651 6.38 0 0 27 4.12 44 3.09 35 2.73
4.5 Queue call-proportions chart
(chart not reproduced in the text version)
4.6 Service Capacity Expectation per hour Dataset
Week | Staffing level (# of Agents) | Inter-arrival time (mins) | CSR Service Capacity Expectation per hour | CSR Service Capacity per hour | Service Level (%)
Week 14 18 0.71 14.59 13.87 23.67%
Week 15 18 0.52 14.42 12.71 7.00%
Week 16 18 0.46 14.49 11.57 5.15%
Week 17 18 0.61 14.49 14.17 16.00%
Week 18 18 0.50 14.61 15.05 15.26%
Week 19 25 0.52 14.37 10.48 6.91%
Week 20 25 0.61 14.63 12.37 20.96%
Week 21 25 0.51 15.08 11.13 11.23%
Week 22 25 0.49 15.19 13.42 12.13%
Week 23 27 0.44 15.18 13.02 7.84%
Week 24 27 0.48 15.23 12.91 7.81%
Week 25 27 0.56 15.17 12.85 10.68%
Week 26 33 0.64 15.09 12.96 18.40%
Week 27 36 0.82 15.24 12.86 34.74%
Week 28 36 0.69 15.38 13.42 32.22%
Week 29 36 0.66 14.08 13.95 40.75%
Week 30 36 0.48 12.49 14.22 21.74%
Week 32 39 0.50 12.45 14.06 42.46%
Week 33 39 0.52 12.45 13.27 35.70%
Week 34 39 0.53 12.47 12.87 61.75%
Week 35 45 0.61 12.31 12.88 66.85%
Week 36 45 0.62 12.37 12.75 59.71%
Week 37 50 0.59 12.39 13.42 52.76%
Week 38 64 0.62 12.36 12.50 64.08%
4.7 DEA third iteration – Efficiency scores (unrestricted)
DMU | Efficiency Score (unrestricted) | Weight on CSR Service Capacity per hour | Weight on Service Level
Week 14 100.00% 7.00% 93.00%
Week 15 85.41% 100.00% 0.00%
Week 16 82.89% 100.00% 0.00%
Week 17 96.03% 7.00% 93.00%
Week 18 100.00% 100.00% 0.00%
Week 19 68.58% 100.00% 0.00%
Week 20 80.76% 14.00% 86.00%
Week 21 72.09% 100.00% 0.00%
Week 22 90.88% 100.00% 0.00%
Week 23 98.90% 100.00% 0.00%
Week 24 89.90% 100.00% 0.00%
Week 25 79.42% 100.00% 0.00%
Week 26 78.68% 37.00% 63.00%
Week 27 68.52% 2.00% 98.00%
Week 28 80.97% 14.00% 86.00%
Week 29 92.43% 14.00% 86.00%
Week 30 100.00% 100.00% 0.00%
Week 32 100.00% 63.00% 37.00%
Week 33 94.20% 63.00% 37.00%
Week 34 100.00% 0.00% 100.00%
Week 35 100.00% 0.00% 100.00%
Week 36 96.91% 19.00% 81.00%
Week 37 99.10% 19.00% 81.00%
Week 38 96.51% 19.00% 81.00%
4.8 DEA fourth iteration – weight restricted – results charted
(charts not reproduced in the text version)
5.1 Regression Model estimation reports:
Model 1: Service Capacity
Model 2: Quality
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.475770044
R Square 0.226357135
Adjusted R Square 0.198727033
Standard Error 2.569468021
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 54.08765213 54.08765213 8.192410317 0.007875183
Residual 28 184.8606455 6.602165911
Total 29 238.9482976
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept 2.263454857 5.514808657 0.410432165 0.68461279 -9.033118581 13.5600283
Queue target Service Capacity per hour 0.801049863 0.279868295 2.86223869 0.007875183 0.227765648 1.374334078
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.387681016
R Square 0.15029657
Adjusted R Square 0.119950019
Standard Error 0.084678424
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 0.035512828 0.035512828 4.952673844 0.03427946
Residual 28 0.200772192 0.007170435
Total 29 0.23628502
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept 0.504402038 0.181743964 2.775344098 0.009714394 0.132116404 0.876687672
Queue target Service Capacity per hour 0.020525943 0.009223234 2.225460367 0.03427946 0.001633003 0.039418882
Model 3: Punctuality
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.195946224
R Square 0.038394923
Adjusted R Square 0.004051884
Standard Error 0.040414899
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 0.001826073 0.001826073 1.117982698 0.29939071
Residual 28 0.045734193 0.001633364
Total 29 0.047560266
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept 1.019717839 0.086741859 11.75577565 2.41601E-12 0.842035195 1.197400484
Queue target Service Capacity per hour -0.004654462 0.00440202 -1.057347009 0.29939071 -0.013671591 0.004362666
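Each report above is a one-regressor least-squares fit of an agent metric on the queue's target service capacity. The closed-form OLS estimates behind such a summary can be sketched in a few lines of Python; the points below are synthetic, exactly-linear data used only to sanity-check the formulas, not the thesis's agent data:

```python
# Closed-form simple (one-regressor) ordinary least squares.
def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# The three queue targets from the tables above as x; y constructed as
# 2 + 0.8x so the fit should recover those coefficients exactly.
xs = [16.98, 17.73, 20.81]
ys = [2 + 0.8 * x for x in xs]
intercept, slope = ols_fit(xs, ys)
```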
5.2 Regression Models results:
Model 1: Service Capacity
Intercept Queue Service Capacity target
2.263454857 0.801049863
Agent Name | Queue target Service Capacity per hour | CSR Service Capacity per hour | Expected Service Capacity | Percentage deviation from model estimate
Agent 1 20.81 20.57 18.93 8.7%
Agent 2 20.81 20.93 18.93 10.6%
Agent 3 20.81 17.48 18.93 -7.7%
Agent 4 20.81 21.43 18.93 13.2%
Agent 5 20.81 19.35 18.93 2.2%
Agent 6 20.81 16.22 18.93 -14.3%
Agent 7 20.81 19.46 18.93 2.8%
Agent 8 20.81 18.65 18.93 -1.5%
Agent 9 20.81 19.78 18.93 4.5%
Agent 10 20.81 18.65 18.93 -1.5%
Agent 11 20.81 19.05 18.93 0.6%
Agent 12 20.81 15.93 18.93 -15.9%
Agent 13 20.81 21.18 18.93 11.9%
Agent 14 20.81 21.95 18.93 15.9%
Agent 15 20.81 22.78 18.93 20.3%
Agent 16 20.81 14.94 18.93 -21.1%
Agent 17 20.81 20.93 18.93 10.6%
Agent 18 20.81 15.58 18.93 -17.7%
Agent 19 20.81 16.74 18.93 -11.6%
Agent 20 20.81 18.37 18.93 -3.0%
Agent 21 17.73 13.28 16.47 -19.3%
Agent 22 17.73 19.89 16.47 20.8%
Agent 23 17.73 13.48 16.47 -18.1%
Agent 24 17.73 12.50 16.47 -24.1%
Agent 25 16.98 14.69 15.87 -7.4%
Agent 26 16.98 16.90 15.87 6.5%
Agent 27 16.98 14.63 15.87 -7.8%
Agent 28 16.98 14.94 15.87 -5.9%
Agent 29 16.98 18.00 15.87 13.4%
Agent 30 16.98 21.43 15.87 35.1%
(Queue grouping: Agents 1-20 in Queue 1; Agents 21-24 in Queue 2; Agents 25-30 in Queue 3.)
Model 2: Quality
Intercept Queue Service Capacity target
0.504402038 0.020525943
Agent Name | Queue target Service Capacity per hour | Quality | Expected Quality | Percentage deviation from model estimate
Agent 1 20.81 95.2% 93.2% 2.2%
Agent 2 20.81 94.3% 93.2% 1.3%
Agent 3 20.81 83.3% 93.2% -10.5%
Agent 4 20.81 91.1% 93.2% -2.2%
Agent 5 20.81 94.6% 93.2% 1.5%
Agent 6 20.81 86.6% 93.2% -7.1%
Agent 7 20.81 98.7% 93.2% 5.9%
Agent 8 20.81 89.2% 93.2% -4.2%
Agent 9 20.81 98.9% 93.2% 6.2%
Agent 10 20.81 90.3% 93.2% -3.0%
Agent 11 20.81 92.5% 93.2% -0.7%
Agent 12 20.81 98.4% 93.2% 5.7%
Agent 13 20.81 90.9% 93.2% -2.4%
Agent 14 20.81 99.6% 93.2% 6.9%
Agent 15 20.81 99.8% 93.2% 7.1%
Agent 16 20.81 86.6% 93.2% -7.1%
Agent 17 20.81 91.6% 93.2% -1.7%
Agent 18 20.81 90.9% 93.2% -2.4%
Agent 19 20.81 90.0% 93.2% -3.4%
Agent 20 20.81 95.7% 93.2% 2.7%
Agent 21 17.73 95.9% 86.8% 10.4%
Agent 22 17.73 100.0% 86.8% 15.2%
Agent 23 17.73 84.8% 86.8% -2.4%
Agent 24 17.73 92.0% 86.8% 5.9%
Agent 25 16.98 70.0% 85.3% -17.9%
Agent 26 16.98 91.7% 85.3% 7.5%
Agent 27 16.98 100.0% 85.3% 17.2%
Agent 28 16.98 81.0% 85.3% -5.0%
Agent 29 16.98 90.9% 85.3% 6.6%
Agent 30 16.98 57.8% 85.3% -32.3%
(Queue grouping: Agents 1-20 in Queue 1; Agents 21-24 in Queue 2; Agents 25-30 in Queue 3.)
Model 3: Punctuality
Intercept Queue Service Capacity target
1.019717839 -0.004654462
Agent Name | Queue target Service Capacity per hour | Punctuality | Expected Punctuality | Percentage deviation from model estimate
Agent 1 20.81 95.3% 0.92 3.2%
Agent 2 20.81 91.0% 0.92 -1.4%
Agent 3 20.81 94.8% 0.92 2.8%
Agent 4 20.81 92.9% 0.92 0.6%
Agent 5 20.81 96.8% 0.92 4.9%
Agent 6 20.81 91.7% 0.92 -0.6%
Agent 7 20.81 95.1% 0.92 3.1%
Agent 8 20.81 86.5% 0.92 -6.2%
Agent 9 20.81 93.8% 0.92 1.7%
Agent 10 20.81 95.8% 0.92 3.8%
Agent 11 20.81 95.4% 0.92 3.4%
Agent 12 20.81 83.7% 0.92 -9.4%
Agent 13 20.81 91.5% 0.92 -0.9%
Agent 14 20.81 91.9% 0.92 -0.4%
Agent 15 20.81 94.1% 0.92 1.9%
Agent 16 20.81 95.0% 0.92 2.9%
Agent 17 20.81 83.0% 0.92 -10.1%
Agent 18 20.81 88.8% 0.92 -3.7%
Agent 19 20.81 95.8% 0.92 3.8%
Agent 20 20.81 91.7% 0.92 -0.6%
Agent 21 17.73 98.2% 0.94 4.8%
Agent 22 17.73 96.3% 0.94 2.8%
Agent 23 17.73 92.4% 0.94 -1.4%
Agent 24 17.73 93.4% 0.94 -0.4%
Agent 25 16.98 94.8% 0.94 0.7%
Agent 26 16.98 95.9% 0.94 1.9%
Agent 27 16.98 83.9% 0.94 -10.8%
Agent 28 16.98 91.7% 0.94 -2.5%
Agent 29 16.98 96.8% 0.94 3.0%
Agent 30 16.98 96.9% 0.94 3.0%
(Queue grouping: Agents 1-20 in Queue 1; Agents 21-24 in Queue 2; Agents 25-30 in Queue 3.)
5.3 DEA first iteration efficiency scores and weights:
Agent Name | Efficiency Score | Weight on Service Capacity | Weight on Quality | Weight on Punctuality
(weights chosen by DEA)
Agent 1 86.47% 14% 86% 0%
Agent 2 87.45% 14% 86% 0%
Agent 3 79.90% 0% 0% 100%
Agent 4 88.42% 14% 86% 0%
Agent 5 83.94% 2% 23% 75%
Agent 6 77.49% 0% 49% 51%
Agent 7 84.13% 0% 49% 51%
Agent 8 78.99% 14% 86% 0%
Agent 9 84.64% 14% 86% 0%
Agent 10 81.92% 2% 23% 75%
Agent 11 82.61% 2% 23% 75%
Agent 12 81.38% 1% 99% 0%
Agent 13 87.55% 14% 86% 0%
Agent 14 91.85% 14% 86% 0%
Agent 15 94.61% 14% 86% 0%
Agent 16 80.01% 0% 0% 100%
Agent 17 86.89% 14% 86% 0%
Agent 18 78.04% 0% 49% 51%
Agent 19 80.77% 0% 49% 51%
Agent 20 81.42% 0% 66% 34%
Agent 21 98.93% 0% 49% 51%
Agent 22 100.00% 14% 86% 0%
Agent 23 91.37% 0% 0% 100%
Agent 24 94.47% 0% 49% 51%
Agent 25 97.80% 0% 0% 100%
Agent 26 99.91% 0% 49% 51%
Agent 27 100.00% 1% 99% 0%
Agent 28 94.71% 0% 0% 100%
Agent 29 100.00% 2% 23% 75%
Agent 30 100.00% 100% 0% 0%
(Queue grouping: Agents 1-20 in Queue 1; Agents 21-24 in Queue 2; Agents 25-30 in Queue 3.)