Washington University in St. Louis
John M. Olin Business School
Performance Evaluation in Call Centers:
An Investigation into the Use of Analytics Tools
Prepared by Hossam Abuelwafa
Candidate for MS in Supply Chain Management
Committee Members
Professor Sergio Chayet
Professor Amr Farahat (Research Advisor)
Associate Dean Gregory Hutchings
December 15, 2014
2 | Performance Evaluation in Call Centers
Abstract
This research explores the use of analytics tools in evaluating the performance of inbound call centers, both
aggregate call center performance and call center agent performance. The definition of performance in a call center
depends on the company’s value proposition, which makes defining performance in general terms a challenge.
Call center performance is also multi-dimensional, which calls for more capable tools that can express overall
performance on a single scale. In addition, since the call center industry is closely intertwined with outsourcing
agreements, outsiders to call center operations often have to evaluate the performance of “outsourcing
destinations”, which calls for a tool that is fair and does not require much subject-matter knowledge. The research
explores Data Envelopment Analysis (DEA) and linear regression as candidate analytics tools for tackling the
performance evaluation challenge in call centers. DEA and linear regression were applied to real call center and
agent performance data - coming from two different call centers in the Middle East and North Africa (MENA) region - to
illustrate the insights that each methodology can bring, as well as the strengths and weaknesses of each
analytics tool in tackling the performance evaluation challenge in call centers.
The main contributions of this research can be summarized as follows:
This research represents a novel application of DEA to the call center performance evaluation
challenge. As far as we know, DEA has not been applied to this problem before.
In the analysis chapters of this research, we apply DEA to “Decision Making Units” (DMUs) that are
the same entity observed over time. This use of DEA is quite uncommon in the DEA literature.
Since the majority of call center research involves queueing analysis, the call center data
available from previous research was designed to meet queueing-analysis needs (Gans, Koole
and Mandelbaum), which is very different from what is needed for studying performance evaluation.
This research therefore provides real call center performance data from two different call centers
in the MENA region.
Last but not least, this research offers an explicit comparison between DEA and linear regression in
terms of their fitness for use in call center performance evaluation.
After conducting the different analyses, we conclude that, of the two analytics tools explored
- DEA and linear regression - DEA is the better fit for performance evaluation in call centers at both
levels of analysis: aggregate call center performance and individual call center agent performance.
The research closes with an important question that can act as an area for future research: how can
the call center agent’s experience be incorporated into the DEA analysis while still rendering accurate results?
Table of Contents
Abstract ..................................................................................................................................................................................................... 2
Chapter 1: Call Center Operations ...................................................................................................................................................... 4
1.1 Overview of the Call Center industry ...................................................................................................................................... 4
1.1.1 Different Types of Call Centers ......................................................................................................................................... 5
1.1.2 Different organizational structures of Call Centers ...................................................................................................... 5
1.1.3 Different roles in Call Centers ........................................................................................................................................... 7
1.2 Outsourcing in call centers ........................................................................................................................................................ 9
1.3 Performance Measurement in Call Centers ......................................................................................................................... 10
1.3.1 Aggregate Call Center Performance ............................................................................................................................... 10
1.3.2 Call Center Agent Performance ....................................................................................................................................... 11
Chapter 2: Analytics as a Benchmarking tool for Call Centers .................................................................................................... 14
2.1 Benchmarking as a platform for Performance Evaluation in call centers ..................................................................... 14
2.2 Linear Regression as an Empirical Benchmarking Analytics tool ................................................................................... 16
2.3 DEA as an Empirical Benchmarking Analytics tool ............................................................................................................ 19
Chapter 3: Case study background and data description ............................................................................................................. 24
3.1 Company Description ............................................................................................................................................................... 24
3.2 Company A - Dataset 1: Call Center’s Aggregate Performance ......................................................................................... 25
3.4 Company B - Dataset 2: Agent’s overall performance ........................................................................................................ 26
Chapter 4: Aggregate Performance Tracking ................................................................................................................................. 28
4.1 Introduction ............................................................................................................................................................................... 28
4.2 Preliminary Data Analysis ....................................................................................................................................................... 29
4.3 Theoretical Benchmarking “Queueing Analysis” ................................................................................................................ 36
4.4 Empirical Self-Benchmarking - I “Multiple Regression” – Dataset 1 ............................................................................... 39
4.5 Empirical Self-Benchmarking - II “Data Envelopment Analysis” – Dataset 1 ................................................................ 41
4.6 Summary of Findings ................................................................................................................................................................ 49
Chapter 5: Agent Performance Assessment .................................................................................................................................... 50
5.1 Introduction ............................................................................................................................................................................... 50
5.2 Preliminary Data Analysis ....................................................................................................................................................... 51
5.3 Absolute Benchmarking “Performance Targets” ................................................................................................................ 56
5.4 Empirical Peer Benchmarking – I “Multiple Regression” – Dataset 2 ............................................................................. 57
5.5 Empirical Peer Benchmarking – II “Data Envelopment Analysis” – Dataset 2 .............................................................. 60
5.6 Summary of findings ................................................................................................................................................................. 63
Chapter 6: Conclusions and future research opportunities ........................................................................................................ 64
Appendix ................................................................................................................................................................................................ 66
References ............................................................................................................................................................................................. 82
Chapter 1: Call Center Operations
1.1 Overview of the Call Center industry
The Goal behind this chapter
In this chapter, we provide a brief tutorial on the call center industry, in order to help the reader
understand the background of the analysis to be done in later chapters. The chapter starts with a brief
description of call centers, the different types of call centers, and the functional roles involved in call center
operations. Then we shift gears and talk about outsourcing in call centers, explaining common
forms of outsourcing agreements through the lens of call centers. Finally, we discuss some of the important
performance metrics involved in call center performance evaluation at both the aggregate call center level and the
call center agent level. By the end of this chapter, we hope the reader will have an understanding of
call center operations sufficient to act as background for the performance measurement analysis that
follows in later chapters.
The Call Center Business
Let’s start with the definition of “call center” in the Oxford dictionary: a call center (noun)
is “an office set up to handle a large volume of telephone calls, especially for taking orders and providing customer
service”. It is not entirely clear when exactly call centers started, but the call centers we know today emerged
hand in hand with the invention of the “Automated Call Distributor” (ACD) (Call Center Helper Magazine). An ACD uses
computerized technology and algorithms to filter incoming calls and assign the right calls to the right call center
agents, based on pre-set rules. For example, if a call center wants to promote familiarity between customers
and call center agents, it usually sets the ACD to route a customer’s call to the agent they most recently
spoke to, provided that agent is free at the time the customer calls. Before ACD technology was invented,
a human operator usually transferred calls manually to the various agents handling customer
inquiries.
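To make the idea concrete, the familiarity-based routing rule described above can be sketched as follows. This is a minimal illustration under stated assumptions: the record format and the `route_call` function are hypothetical and do not correspond to any real ACD product.

```python
# Sketch of a familiarity-based ACD routing rule (hypothetical illustration;
# real ACD products implement far richer rule sets).
def route_call(customer_id, last_agent, free_agents):
    """Pick an agent for an incoming call.

    Prefer the agent the customer most recently spoke to, provided that
    agent is currently free; otherwise fall back to any free agent, and
    return None if nobody is free (the call then waits in the queue).
    """
    preferred = last_agent.get(customer_id)
    if preferred in free_agents:
        return preferred
    return next(iter(free_agents), None)

# Example: customer "C1" last spoke to agent "A2", who happens to be free,
# so the ACD routes the call back to "A2" to promote familiarity.
free = {"A1", "A2"}
history = {"C1": "A2"}
chosen = route_call("C1", history, free)
```

The fallback rule here (any free agent) stands in for whatever default policy a real ACD applies, such as longest-idle-agent routing.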
In the early 2000s, the global call center industry grew drastically, reaching $40.1 billion in
2003; most of that growth was in the communications industry (18.5%), followed closely by the outsourcing
industry (15.6%) (Datamonitor). The industry employed almost 5 million call center agents by the end of
2003 (Datamonitor). Thanks to enabling technologies such as cloud computing, call centers
nowadays have the flexibility to operate virtually anywhere in the world, which has generated
serious savings in an industry mainly considered a “cost center” by many companies. Moreover, as the
pressure on call center management to be more efficient increases, the majority of call center managers see
“improved analytics” as a key innovation area that will benefit their call centers (Dimension Data).
1.1.1 Different Types of Call Centers
As similar as call centers might seem, they differ considerably in the services they offer.
Enabled by technological advancements, call centers offer a wide range of services that can fit the needs of many
business processes and industries. The rule of thumb here is “if it can be done remotely, it can be done in a call
center”. Since the industry applications of call centers are too many to count, call centers are usually classified
by the type of calls they engage in, as follows:
Outbound call centers - These call centers mainly place outbound calls, i.e., they call
customers. Examples of outbound call centers include:
- Telemarketing campaigns, in which call centers call customers to create brand awareness and/or
promote certain limited-time offers
- Telesales, which has the purpose of generating a sale on the phone, whether by cold-calling new
customers or generating more sales from the existing customer base.
Inbound call centers - These are the main focus of this research. They mainly receive calls from their
customers, though they sometimes also place outbound calls to follow up with customers. Examples of
applications here are:
- Technical support, in which customers call the company to receive over-the-phone technical service for
the products they purchased.
- Airline booking, in which customers call an airline call center to inquire about a flight status, make a
booking, or manage their tickets and/or luggage.
- Banking Service, which is used by many banking customers to manage their bank accounts, transfer
funds, and inquire about other banking services over the phone.
Omni-channel call centers - Technological advancements have made it possible to interact with
customers through many different platforms, not just the phone. In addition to conventional
phone lines, omni-channel call centers interact with their customers through many different
communication channels such as live chat, email, text messaging, etc.
1.1.2 Different organizational structures of Call Centers
Another main difference between call centers is the way they organize their operations to fit the nature
of the services they offer and of the industry in which they operate. As call centers vary in size, services,
and location, different organizational structures are needed to best manage these differences. The
organizational structure can vary along several dimensions:
Specialized versus Pooled call centers
- Pooled Call Centers - Some call centers prefer to have “generic agents”, in which case all customer
inquiries are answered by whichever generic agent first picks up the phone. This is usually better
when the services do not require much technical knowledge, because it enhances a very important
metric called “First Call Resolution” (FCR), which measures the percentage of handled calls
in which the customer’s inquiry was resolved on the first call.
- Specialized Call Centers - Other, more technical call centers are obliged to follow a “specialized
queue” structure, in which every group of agents with similar technical knowledge is organized in a
separate department (aka a “queue”). When customers call, they are greeted by the “Interactive Voice
Response” (IVR) unit, which asks the customer a few questions to determine which queue they need to
join; the ACD then routes the customer to the right queue based on the IVR’s instructions.
Flat versus Multi-layered Call Centers
- Flat Call Centers - Some call centers are flat in the sense that the frontline (aka “first line”) agents have
all the technical tools and authority they need to resolve customer issues, with the exception of very
rare cases in which supervisors need to jump in and take over. Such a case is regarded as an
“escalation”, a term that refers to a customer complaint escalated to a higher decision
authority such as a manager or a supervisor.
- Multi-layered Call Centers - On the other hand, some call centers with more technical needs, in an
attempt to use their most valuable resources (such as engineers) very efficiently, group
these resources in a higher tier usually called the “second line”. Second-line agents receive service
requests from first-line agents who encounter technical problems well beyond their knowledge or the
power/resources given to them, and then call the customer directly to help
resolve the technical issue.
Physical versus Virtual Call Centers
- Physical Call Centers - These are the usual call centers, located in a normal
office building or a small office, depending on the size of the call center.
- Virtual Call Centers - These call centers do not operate in a physical workspace like “physical
call centers”; rather, they employ agents who work from home with equipment that the
call center has set up for them. This model is more common in outbound call centers, due to the flexible
nature of outbound calls (the agent decides when to call), which fits a work-from-home
lifestyle better. Hence, virtual call centers enjoy the savings of running virtual operations. Also, enabled by
cloud computing technology, virtual call centers gain access to a new demographic of call
center agents (e.g., stay-at-home parents), a demographic that is more stable in nature, which helps
reduce the very high turnover rate in the industry.
Centralized versus Decentralized Call Centers
- Centralized Call Centers - Some call centers that believe in “pooling effects” for reducing overall costs
prefer one huge call center in which all their call center operations are nested. This allows
them to enjoy economies of scale in every aspect of running the call center; on the other hand, from
a business risk management perspective, it makes them very susceptible to business disruptions.
- Decentralized Call Centers - Other call centers have multiple, relatively smaller locations; whenever
a customer calls, the call can be routed either to the call center closest to their location or to the least
busy call center.
1.1.3 Different roles in Call Centers
Going deeper into call center operations, we need to explore the various roles played by different call
center functions in running a successful call center. Given the wide spectrum of call centers, a
certain degree of variability exists among call centers in role names and exact role
definitions, but we will try to be as general as possible in our descriptions. The main roles are as follows:
1- Human Resource Administration - This function is responsible for planning the human resource needs of
the call center and for recruiting, screening, and hiring new call center agents. Many call centers, especially in
India, have developed technical “aptitude tests” that measure the generic core skills (e.g., problem
solving, mathematics, communication) needed to work in a call center. These aptitude tests speed
up the hiring process by eliminating the need for further technical evaluation. Human
resource administration also supervises the application of the company’s policies and procedures (e.g., dress
code), supervises generic employee performance management, and is responsible for payroll management.
2- Training and Development - The training function is responsible for training newly hired agents on
technical procedures and knowledge, as well as company-specific culture, policy, and other job-related
matters. It is also responsible for training current agents on any changes in technology, policy, or
procedures. In case a new or existing agent underperforms, the training department is there to
support.
3- Operations Management - This function is carried out by team supervisors, real-time managers (RTMs),
account managers, and all the call center agents. Call center agents are grouped into teams, which makes
the task of managing agent performance easier. Every team is led by a “team supervisor” who is
held accountable for the “key performance indicators” (KPIs) of his/her team. If a call center agent
underperforms on specific KPIs, the team supervisor agrees with the agent on a performance correction plan
that both of them sign. The actions taken by a supervisor against underperformance are usually ranked as
follows:
Verbal warning
Performance plan
First written warning
Second written warning
Retraining (sometimes)
Termination of the call center agent
The role of the real-time manager is simple: he/she is responsible for monitoring the call queue at all
times and maintaining schedule and break adherence, which means that everyone is logging in at the right time and
taking their breaks at the right time as scheduled. The RTM has the power to adjust breaks instantly to respond to
unexpected surges in call volume, and to bring all “off-queue” agents (performing
off-queue tasks) back to the queue; in some extreme cases he/she can even resort to making the team supervisors - who
were agents one day - log in and take some calls to help reduce the queue length. To sum up, the RTM is the individual
responsible for “service level”, a main call center metric that we explain later in more detail.
Last but not least, in case the call center is an outsourcing destination (i.e., it handles more than one call center
account), the role of “account manager” becomes necessary. The account manager is accountable for the overall
performance of the whole account and is held accountable directly by the “client” company, the company that
outsourced its call center. His/her role is similar to that of a team supervisor in the sense that he/she manages the
KPIs of every team supervisor and takes the actions necessary to fix underperformance problems.
4- Workforce Management - The main responsibility of workforce management is to create call center
personnel shift schedules in a way that meets planned human resource needs on a day-to-day basis. If a call
center agent needs to adjust his/her breaks, he/she emails workforce management; if the agent
needs to take a leave, he/she talks to them as well. In short, they are responsible for anything that has to do
with schedules and workforce planning.
5- Quality Management - Quality trainers are responsible for issuing the quality score for each call center
agent every week/month. The score is produced by evaluating the quality of a certain number of calls for every
agent each week/month. The quality score serves as a main ingredient of the agent’s scorecard, which we
will touch upon later (see Appendix 1.2 for a sample scorecard). Quality trainers are also responsible
for coordinating with the training team to support the training of call center agents on all quality-related
matters, whether when underperformance is detected or during new-hire orientation training.
6- IT Management (Help Desk) - This is usually a call center within the call center that serves all
employees having technical difficulties with their login credentials, terminals, or equipment.
7- Facility and Fleet Management - This function is responsible for maintaining the building in which the call
center is located, in addition to organizing employee trips to and from the building using a
hired or owned fleet of buses.
1.2 Outsourcing in call centers
As we mentioned before, the pressure on call centers to be more efficient has led the call center industry to
be one of the earliest and most aggressive adopters of outsourcing. The second highest market segment in call
centers is the outsourcing industry (Datamonitor). Over the years, the two industries “Call Center and Outsourcing”
have been very closely intertwined to the extent that to non-experts outsourcing usually meant call centers. This
sub-section represents a very good opportunity to lay down a brief description of outsourcing in the call center
industry. Let’s start by defining the different parties involved in a usual call center outsourcing agreement:
Client Company – This is the company that wants to outsource its call center function to another
professional company to take care of it. This company’s customers will be served in the new call center for
an exchange for a fee paid by the client company.
Outsourcing Destination - This is the company offering the call center management service
to the client company in return for a fee.
In any service industry, including call centers, planning to match call demand with the right
supply of call center agents to handle that demand is very challenging. As a result, many companies use
outsourcing contracts as a way to better match supply and demand. Let us examine the different types of contracts
associated with call center outsourcing (Aksin, Vericourt and Karaesmen):
1- Pay-for-Capacity Contract: In this type of contract, the client company rents a fixed capacity at the
outsourcing destination’s call center for a fixed fee. This is usually used when the client company wishes to
outsource the “predictable” portion of its call demand, while keeping a smaller in-house call center
to absorb the fluctuations in call demand. For example, if a client company knows that it usually receives over
3,000 calls a week, it might outsource enough capacity to meet those 3,000 calls, while keeping a small in-
house call center to absorb the calls in excess of the 3,000 expected. This type of contract is much
more economical to maintain because it is easier for the outsourcing destination to plan for capacity.
2- Pay-for-Job Contract: This contract type is the exact reverse of the previous one: the client
company keeps the “predictable” portion of its call demand in-house, while outsourcing the
“excess” to an outsourcing destination for a variable fee that depends on the amount of excess handled.
This is usually more costly to maintain, because the outsourcing destination has to keep enough
“safety stock” of agents to cope with the unpredictability of the excess call demand.
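As a rough illustration of the cost trade-off between the two contract types, the following sketch compares them over a stream of weekly call volumes, using the 3,000-call example above. All fee figures (the fixed fee, the in-house cost per call, and the variable per-call fee) are invented for illustration and are not taken from any real contract.

```python
# Hypothetical cost comparison of the two outsourcing contract types.
# All fee parameters are illustrative assumptions, not real contract terms.

def pay_for_capacity_cost(weekly_demands, base_calls=3000, fixed_fee=9000.0,
                          in_house_cost_per_call=4.0):
    """Outsource the predictable base_calls/week for a fixed fee; absorb
    any excess in a small in-house call center at a per-call cost."""
    total = 0.0
    for d in weekly_demands:
        excess = max(0, d - base_calls)
        total += fixed_fee + excess * in_house_cost_per_call
    return total

def pay_for_job_cost(weekly_demands, base_calls=3000, in_house_weekly_cost=8000.0,
                     variable_fee_per_call=5.0):
    """Keep the predictable base_calls/week in-house; pay the outsourcing
    destination a variable fee per excess call (priced higher per call,
    since the destination must hold 'safety stock' of agents)."""
    total = 0.0
    for d in weekly_demands:
        excess = max(0, d - base_calls)
        total += in_house_weekly_cost + excess * variable_fee_per_call
    return total

weekly_calls = [3100, 2950, 3400, 3050]  # illustrative weekly demand
cost_capacity = pay_for_capacity_cost(weekly_calls)
cost_job = pay_for_job_cost(weekly_calls)
```

Under these invented parameters, which contract is cheaper depends on how large and how volatile the excess demand is, mirroring the trade-off described in the text.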
1.3 Performance Measurement in Call Centers
In this sub-section, we will explore many of the previously mentioned and unexplained call center
performance “terms”. The idea is that this chapter will help the reader understand the critical aspects of a call
center performance that needs to be measured and managed. Since our analysis chapters will be divided equally
into “Call Center Performance” and “Call Center Agent Performance”, it only makes sense to do the same here. We
will start our conversation with the definition of performance in an “inbound” call center, and the different metrics
that are normally used to measure that performance. Then, we will define “agent performance” in an inbound call
center setting, and explore all the relevant metrics based on that definition.
1.3.1 Aggregate Call Center Performance
As we can see by now, the call center industry is a highly competitive, fast-paced industry, in which being
efficient isn’t a luxury. In an environment like that, the quality of our decisions depend heavily on the accuracy of
our information, which is mainly derived from performance data. Hence, the ability to accurately and meaningfully
evaluate the performance in call centers is in fact a determining factor to the ability of the call center management
to make the right decisions to drive efficiencies and enhance service quality, thus have better chances of survival
and growth in the marketplace. Let us start by defining performance on a call center level.
Call center performance can be defined in terms of the tasks the call center is supposed to carry out. The basic
tasks of an inbound call center are usually as follows:
1- Answer the majority of customer calls in a timely fashion, given a certain wait-time threshold.
2- Provide appropriate service quality to customers, where the appropriateness of service is defined
by the client company’s strategic positioning or value proposition.
3- Provide that quality service fast enough to respect the value of the customer’s time; in addition,
service that takes too long is very costly to maintain.
Now, let’s explore various aggregate performance metrics measured at the call center level, and relate
each of them to one or more of the tasks mentioned above. The metrics are as follows:
Service level – The percentage of calls received that were answered within a
given threshold of time (e.g., 20 seconds) from when the call was received. This metric is one of the
most important in the call center business, to the extent that some call center outsourcing
contracts - called “Service Level Agreements” (SLAs) – tie the compensation of the outsourcing
destination to the service level achieved. This metric measures effectiveness in executing “task
1” above, and can be reported at the “queue level” or the “call center level”. The
orientation of this metric is “the higher, the better”
Average Handling Time (AHT), aggregated to the call center level – This metric combines three
averages: it represents the average time taken by an average agent to serve an average customer.
As we mentioned before, many call centers adopt the specialized
organizational structure, which means that every queue employs a different kind of agent, receives
a different kind of call, and each kind of call requires a different average handling time. Thus,
looking at AHT at the call center level gives us an idea of the average time the average
customer will spend in our call center. This metric measures effectiveness in executing “task 3”
above. The orientation of this metric is “the lower, the better”
Abandoned rate (%), beyond 5 seconds - This metric represents the percentage of callers in a given
day/week/month who called the call center, waited on the line for more than 5
seconds, and then hung up (i.e., reneged). We count only calls abandoned after 5 seconds
because a hang-up within 5 seconds usually reflects a technical problem or a customer
error rather than the length of the queue. This metric represents
the magnitude of failure in accomplishing “task 1” above. The orientation of this metric is “the
lower, the better”
Average Speed to Answer (ASA) - This metric also relates to “task 1”. It represents the average
“wait time” that callers spend on the line before being handled by a call center agent. The
orientation of this metric is “the lower, the better”
Customer Satisfaction Survey (CSAT) score “aggregated to call center level” - This metric is usually
the only metric that gives a sense of service quality (from a customer perspective) on the call center
level “i.e. task 2”. The CSAT is the survey that callers are usually asked to take after finishing a
call with a call center agent. The survey asks them to rate the different dimensions of the customer
service experience. After that, all the CSAT data is aggregated into a single number that reflects
how satisfied the customers are with the call center. The orientation of this metric is “the higher,
the better”
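To make these definitions concrete, here is a minimal sketch of how the time-based metrics above (service level, AHT, abandoned rate, and average speed to answer) could be computed from raw call records. The record fields, and the 20-second and 5-second thresholds, are illustrative assumptions, not any vendor’s actual schema.

```python
# Sketch: aggregate call center metrics from raw call records.
# Field names and thresholds are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class CallRecord:
    wait_sec: float      # time in queue before answer or hang-up
    handle_sec: float    # talk time; 0.0 if the call was abandoned
    answered: bool       # False means the caller hung up (reneged)

def aggregate_metrics(calls, sl_threshold=20, abandon_floor=5):
    answered = [c for c in calls if c.answered]
    # Abandons within `abandon_floor` seconds are treated as technical
    # problems or customer errors and excluded, per the definition above.
    valid = [c for c in calls if c.answered or c.wait_sec > abandon_floor]
    abandoned = [c for c in valid if not c.answered]
    return {
        # Service Level: share of answered calls picked up within threshold
        "service_level": sum(c.wait_sec <= sl_threshold for c in answered) / len(answered),
        # AHT: average talk time across answered calls
        "aht_sec": sum(c.handle_sec for c in answered) / len(answered),
        # Abandoned rate (5 seconds)
        "abandon_rate": len(abandoned) / len(valid),
        # Average Speed to Answer
        "asa_sec": sum(c.wait_sec for c in answered) / len(answered),
    }
```

Note how the metric orientations discussed above are visible here: a call center would want the first value high and the remaining three low.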
As we mentioned before, these metrics are some of the most popular metrics used in inbound call center
operations. As we can see, different metrics can measure the same task from various angles, and managers
usually favor some metrics over others. Let us now examine the performance definition on the “agent level”.
1.3.2 Call Center Agent Performance
Call center performance is, at the end of the day, the result of all the individual agents’ performance
added together; hence the importance of measuring and evaluating call center agents’ performance. Inbound call center
agents’ jobs, as different as they are, all share the same basic tasks that have to be completed to ensure
successful call center operations. These tasks are:
1- The agent has to be present at the right time for his/her scheduled shift
2- During the shift, the agent has to stick to the scheduled break times
3- The agent has to work an amount of hours at least equal to his/her scheduled working hours
4- Call center agents must manage their time well on the call, in order to respect the caller’s time and to
give equal attention to other callers
5- The agent has to serve the customer with high quality, where quality is broken down into specific tasks by
the quality management team (See Appendix 1.1 for a sample quality tasks breakdown)
These are the main basic tasks that an agent has to perform for the call center to succeed overall.
Any deviation from the tasks above caused by one or more agents will negatively affect the call center’s overall
performance metrics discussed in the previous section. Now, let us examine a group of common agent performance
metrics used in most call centers:
Agent Absenteeism (%) “No-shows” – This metric represents the percentage of scheduled shifts that the
agent missed relative to all the shifts that he/she was scheduled to attend. The importance of
this metric can hardly be overstated, because if the agent isn’t there to begin with, none of
the other tasks can be achieved. In addition, call centers usually plan well ahead of time to make
sure that they have enough resources “i.e. agents” to meet the forecasted call demand. This means
that an absent agent means a lost portion of service level, longer waiting times for most customers,
and a more stressful shift for all of his/her colleagues. As a result, there is very low tolerance for
absenteeism in call centers in general. This metric relates to “task 1” mentioned above. The
orientation of this metric is “the lower, the better”
Agent Adherence (%) – This metric represents the percentage of time a certain agent was actively
logged in “i.e. working” relative to the exact time he/she was scheduled to work. For example, if an
agent was scheduled for a break of 15 minutes and decided to take 20 minutes instead, his/her
adherence will go down as a result. Similarly, if an agent was scheduled to take a break or start
his/her shift at 12:00 pm, and he/she decided to delay the break or the shift start to 12:30 pm
without coordinating with the workforce management team, these 30 minutes will be reflected
negatively in that agent’s adherence percentage even if he/she plans to stay an extra 30 minutes at
the end of the shift. For inbound call centers, 30 minutes in the middle of the day are completely
different from 30 minutes at the end of the shift, as the demand pattern is completely different.
Accordingly, call centers also have very low tolerance for consistent underperformance in
schedule adherence because of its direct effect on service level. This metric relates to “task 2”
above. The orientation of this metric is “the higher, the better”
Agent Conformance (%) – Conformance to schedule is very close to adherence, yet very different.
Adherence cares about whether the agent was logged in at the exact times “e.g. 1:39 pm” he/she was
scheduled to be logged in, while conformance cares about the total amount of hours the agent has
worked. So, if an agent was scheduled to work for 8 hours and ended up working for only 7.5 hours,
his/her conformance score will be affected negatively. On the other hand, if an agent logged in for
more time than he/she was scheduled for, his/her conformance will be more than 100%. This metric
relates to “task 3” above, and its orientation is “the higher, the better”
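The distinction between adherence and conformance can be made concrete with a short sketch. The interval representation (minutes from midnight) and the helper names below are assumptions for illustration, not a workforce-management system’s API.

```python
# Sketch: adherence vs. conformance from scheduled and worked intervals.
# Each interval is a (start_min, end_min) pair in minutes from midnight.
def overlap(a, b):
    """Minutes of overlap between two (start, end) intervals."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]))

def adherence(scheduled, worked):
    """Share of scheduled minutes during which the agent was logged in."""
    sched_total = sum(e - s for s, e in scheduled)
    matched = sum(overlap(s_iv, w_iv) for s_iv in scheduled for w_iv in worked)
    return matched / sched_total

def conformance(scheduled, worked):
    """Total worked minutes over total scheduled minutes (timing ignored)."""
    sched_total = sum(e - s for s, e in scheduled)
    worked_total = sum(e - s for s, e in worked)
    return worked_total / sched_total
```

For an agent scheduled 9:00 am to 5:00 pm who actually worked 9:30 am to 5:30 pm, conformance is 100% (eight full hours worked) while adherence is only 93.75%, which is exactly the shifted-30-minutes scenario described above.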
Agent AHT – This metric measures the average duration of the agent’s calls. This metric is the call
center’s frontline defense against losing service level, because when call center staffing decisions
are made, planners use the AHT to calculate each agent’s service capacity. Hence, they can decide how
many agents to schedule for each shift given the forecasted call demand. Of course, call centers need
safety capacity to account for AHT variations, so they usually overestimate AHT and end up with a
slightly lower utilization than desired; that is normal, because this acts as their safety
stock against unexpected surges in call demand. This metric relates to “task 4” above, and is
oriented as “the lower, the better”.
After-call work (ACW) – This metric represents the total time an agent has spent in an ACW status
during his/her shift. The ACW status prevents an agent from getting new calls while he/she is
wrapping up the required work from the previous call. Since an agent in ACW status is not
taking any calls, and thus not contributing to the service level, call centers prefer that agents not
use ACW casually. That is why many call center agents are trained to multi-task so that they can save
some valuable ACW time. This metric relates to “task 4” above, and is oriented as “the lower, the
better”
Quality Score (%) – This is the score given to an agent by a “quality coach” from the quality
management team after listening to a sample of the agent’s calls during the week/month. The
score is based on a rubric that is already known to the agent; the rubric represents the call center’s
understanding of a good customer experience, which optimally should be based on marketing
surveys and research (See Appendix 1.1 for an example of a quality rubric). This metric relates to
“task 5” above, and is oriented as “the higher, the better”.
A last comment on quality is that the quality score (%) is an internal measure of quality, while the only
external quality measure remains the CSAT survey, which isn’t always available at the agent level. Some
call centers do look at it on the agent level, but it is more common to see it on the call center level. Last but not
least, the agent performance metrics mentioned above are usually grouped into 3 main categories in agents’
scorecards (See Appendix 1.2 for a sample of a real scorecard for a Telecommunications company in the MENA
region); these categories are Agent’s Productivity, Quality, and Punctuality.
Chapter 2: Analytics as a Benchmarking tool for Call Centers
2.1 Benchmarking as a platform for Performance Evaluation in call centers
The Goal behind this Chapter
This chapter aims at identifying the possible routes for call centers to evaluate their performance without
the use of analytics. Then, the chapter moves into a description of the analytics tools that will be deployed in this
research, specifically “Linear Regression” and “Data Envelopment Analysis” (DEA), as an alternative to some
conventional benchmarking routes that will be explored in the first half of the chapter. The analytics tools
discussion will involve a technical description of the models to be used in the upcoming analysis chapters.
Benchmarking in Performance Evaluation
So far we have been talking about the definition of performance from a functional perspective, which
means that if the job is done then we have accomplished our goal, regardless of the costs involved. But as the
pressure on call centers to become more efficient increases, call centers need to focus on cost-conscious
operations. In other words, the race is not “who will get the job done?” but rather “who gets the job done
cheapest?”. This is also very consistent with the outsourcing expansion in the call center industry that we are
witnessing today. Outsourcing destinations are able to win the client companies’ call centers because they can do
the job cheaper. It is very important to understand that, from a business profitability perspective, call centers should not
focus on achieving perfect scores on all the metrics; rather, they should focus on achieving the level of service that
adds value in terms of the company’s value proposition without overproducing on non-value-adding metrics. For
example, a 100% service level is quite costly to maintain in terms of staffing needs, while customers may not
care if they wait a little bit on the phone, especially if the company is positioned as a low-cost leader rather
than a high-service provider. So, if a call center strives to achieve a 100% service level even though its
customers don’t see that as added value, this will be regarded as overproduction on a metric that adds no
value to those specific customers. To sum up, performance isn’t absolute: as important as defining
performance is, we also need to define the “profitable” levels of performance for the call center. As a result, in order
to develop an understanding of the “profitable” range of performance, call centers need to benchmark their
performance against either (1) Absolute Benchmarks or (2) Empirical Benchmarks.
Absolute Benchmarking
For the purposes of this research, this type of benchmarking involves defining the call center’s target
performance by comparing it to an “Absolute Benchmark”. This benchmark may very well be (1) Performance
Targets that were developed as a result of a deep understanding of the company’s value proposition as well as the
customers’ perspective on “value added services”. These performance targets, as we will explain in more detail in
chapter 5, are considered a reasonable definition of “profitable” performance ranges that are customized in every
sense to that specific call center. It may also be (2) Theoretical Benchmarks, which are synthesized by using
theoretical models such as queueing theory to produce a theoretical benchmark of performance based on the call
center’s means of production, or input parameters, such as “staffing level, call demand, AHT, etc.”. These two
categories of absolute benchmarks are very useful in the sense that they are relatively easier and more feasible to
produce and deploy, but they run the risk of becoming obsolete too often in the case of “Performance Targets”, or of
being too impractical in the case of “Theoretical Benchmarks”. Both of these benchmark types will be discussed in
separate sections in chapters 4 & 5.
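As an illustration of the “Theoretical Benchmarks” category, the following sketch computes a benchmark service level from the classic Erlang C queueing model, using staffing level, call demand, and AHT as the input parameters mentioned above. The thesis does not prescribe a specific queueing model; Erlang C is an assumed, commonly used choice for inbound call centers.

```python
# Sketch: a theoretical service-level benchmark via the Erlang C model.
import math

def erlang_c_service_level(calls_per_sec, aht_sec, agents, threshold_sec=20):
    """Theoretical probability a call is answered within the threshold."""
    a = calls_per_sec * aht_sec            # offered load in Erlangs
    if agents <= a:
        return 0.0                          # unstable queue: demand exceeds capacity
    # Erlang C: probability that an arriving call has to wait at all
    num = a**agents / math.factorial(agents) * agents / (agents - a)
    den = sum(a**k / math.factorial(k) for k in range(agents)) + num
    p_wait = num / den
    # Service level within the threshold (exponential wait-time tail)
    return 1 - p_wait * math.exp(-(agents - a) * threshold_sec / aht_sec)
```

A call center can compare its measured service level against this theoretical value for its actual staffing, demand, and AHT; however, as noted above, such theoretical benchmarks can be too impractical because real operations violate the model’s assumptions.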
Absolute benchmarks are usually mounted on a “Scorecard” platform (Kaplan and Norton) in the case of a
multidimensional definition of performance. The scorecard helps develop a single-scale definition of
performance, which is simply a weighted average of the performance scores on each individual scale of the
multidimensional performance. Talking about scorecards takes us immediately to a discussion of the “optimal weights”
to be assigned to the various dimensions of performance. Later, in the analysis chapters, we will discuss the
weighting issue, and we will also try to provide an alternative workaround through the use of DEA.
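A scorecard of this kind can be sketched as a weighted average of per-metric attainment against targets. The metric names, targets, weights, and the capping rule below are all illustrative assumptions; choosing the weights is exactly the “optimal weights” issue just mentioned.

```python
# Sketch: a weighted-average scorecard collapsing a multi-dimensional
# performance definition into one number. All inputs are illustrative.
def scorecard_score(actuals, targets, weights, higher_is_better):
    """Weighted average of per-metric attainment (actual vs. target)."""
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights must sum to 1"
    total = 0.0
    for m, w in weights.items():
        # Orient every metric as "higher is better" before averaging
        ratio = (actuals[m] / targets[m]) if higher_is_better[m] else (targets[m] / actuals[m])
        total += w * min(ratio, 1.0)  # cap attainment at 100% per metric
    return total
```

Capping attainment at 100% reflects the overproduction point made earlier: exceeding a target on one metric should not mask a shortfall on another.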
Empirical Benchmarking
From our research perspective, empirical benchmarking involves comparing the call center’s definition of
performance to other similar entities. These entities can fall under two main categories:
1- Industry benchmarking - in which the compared entities will be other call centers. Call centers then will
have to pay close attention to the comparability of the dataset used. This kind of benchmarking is usually
done through industry consulting firms that collect data from various companies in the same industry and
then sell the descriptive statistics of the collected data to be used by companies again for benchmarking.
The need for using consultants stems from the fact that companies feel more comfortable dealing with a
third party that promises to keep the confidentiality of their data.
2- Internal Benchmarking – the compared entities here are internal in the sense that the company is either
comparing some internal units/personnel to others “Peer benchmarking”, or comparing itself as a whole to
itself over time “Self benchmarking”.
a. Peer benchmarking “Agent Level” - In case of inbound call center agents’ performance, these entities
can be other agents of similar or different queues in the same company. But similar to industry
benchmarking, the company needs to ensure the comparability of different agents in terms of
factors that affect the agents’ performance such as “Experience”.
b. Self-benchmarking: Call centers can benchmark to their own performance over time, but they also
need to control for the changes in their “input parameters”/”factors of production” over time, such
as “staffing level” for example.
First, industry benchmarking means looking at the competition and seeing what their definition of “profitable”
performance targets is. We would like to think that a similar customer base implies a similar “value
proposition”, but we know this is not always the case; that is why call centers need to be careful in selecting
competitors with a similar “value proposition” to benchmark against. Let us examine the reasonableness of industry
benchmarking for call centers. Without going into a discussion of the merits and the beneficial insights brought by
industry benchmarking, here are some of the challenges associated with industry benchmarking:
Many call centers, especially competitors, would prefer not to share performance data.
Call centers, even if they share the industry and the customer segments, are still going to be very
different in terms of their staff, company culture, policies, and technology infrastructure (including
the CRM software used). These differences and many more can make a performance target that fits
a competitor’s environment unreasonable in terms of another call center’s environment
Using professional consulting companies that do benchmarking studies in certain industries is quite
expensive when compared to the other alternative routes to benchmarking
Now, in regard to the second type of benchmarking, i.e. internal benchmarking: in this research, we
suggest the use of analytics tools, specifically linear regression and DEA, to carry out both kinds of internal
benchmarking in a more effective and efficient fashion. In the next section, we will explore both methodologies to
act as a background to our analysis that will be presented in chapters 4 & 5.
2.2 Linear Regression as an Empirical Benchmarking Analytics tool
In this section, we will try to briefly explain the concept behind the “Linear Regression” tool in simple
English, as well as provide a brief explanation on how to use it and interpret its report. In addition, we will provide
a brief tutorial of 6 steps on how this tool can be used as a performance evaluation tool in the call center context.
Understanding Linear regression
The linear regression tool is used to estimate a linear relationship between the variable under study
(the “Dependent Variable”) and other variables that are believed to affect it (the “Independent Variables”). If we have
a single independent variable, we are using “Simple Regression”, while if we have multiple independent variables
the method is called “Multiple Regression”. However, we can only have a single dependent variable in each
relationship being estimated by linear regression. In addition to estimating the relationship between dependent
and independent variables, linear regression provides information on the statistical significance of the various
parameters “i.e. 𝛼, 𝛽, 𝜎2” of the linear model, in the form of p-values. A small p-value “i.e. less than 0.05” for a
model parameter informs the researcher that the corresponding independent variable is statistically significant in
determining the dependent variable’s value. In other words, it confirms or rejects our understanding of which
independent variables affect the dependent variable being studied. The method commonly used to estimate the
linear model is called the “Least Squares” method, which chooses the line that minimizes the sum of
squared deviations of the data from the regression line. These deviations of the data from the regression line,
which are normal, are represented by the error term “epsilon”, which is assumed to be normally
distributed with a mean of 0 and a variance of 𝜎2. A typical multiple regression model consists of the following:
Dependent variable (y)
Independent variables (X1, X2, etc…)
Intercept (𝛼)
Slope for every independent variable (𝛽1, 𝛽2, etc…)
Error term (𝜀)
The Model formulation is depicted as follows:
𝑦 = 𝛼 + 𝛽1𝑥1 + 𝛽2𝑥2 + ⋯ + 𝛽𝑛𝑥𝑛 + 𝜀,  𝜀 ~ 𝑁(0, 𝜎2)
Where “n” is the number of independent variables in the model
Regression model estimation report
A regression model usually starts with a researcher’s prior knowledge of what variable needs to be studied
“Dependent variable” and what factors might affect that variable “Independent variables”. After estimating the
regression model, the researcher is informed of the direction and magnitude of the relationships between the
dependent variable and the various independent variables by looking at the values of “Coefficients”, which are the
slopes (𝛽), in the regression model report. In other words, the estimated regression model, along with its
significance report, helps adjust the researcher’s prior knowledge by highlighting the most statistically significant
independent variables through the p-value of each independent variable’s coefficient. In addition, the regression
model also provides an estimate of the overall ability of the model to explain the variability in the dependent
variable (y). This estimate is represented by the “R-square” and “adjusted R-square” values. These are very
important values to look for in a regression estimation report. In the chapters to follow, as we carry out our
regression analysis, we will see examples of regression model estimation reports generated by Excel®.
Choosing the right variables
Before we choose the variables, we need to understand that a researcher’s prior knowledge of variables is
much more important than statistical significance values in regression reports, especially when we are using
regression to predict the expected performance range of various entities. The reason is that the statistical
significance is affected by whether the sample is representative or not. For example, suppose we are trying to study
the relationship between “service level” as a dependent variable and “staffing level” as an independent variable.
Although these two variables are clearly connected, it may be the case that the data sample was collected at a
time when many newly hired employees were working; they weren’t yet efficient, so staffing level ended up
having a statistically insignificant effect on service level. To sum up, if regression is being used for prediction (as in
performance evaluation), then the statistical significance values aren’t the only thing to consider, unlike when it
is being used for analyzing past data.
Now, a dependent variable in a call center context is usually a performance metric that we are interested in
studying. Service level or AHT are very good examples of a dependent variable when evaluating the performance of
the call center as a whole, while Quality score can be an example of a dependent variable in a model studying the
performance of call center agents.
Using Multiple Regression as a Performance Evaluation tool
In the analysis chapters to follow, we intend to use multiple regression as a performance evaluation tool.
We will use it in the following fashion:
Step one: Decide on the outcome variable to be studied “y” and the proper predictors “x” of that
outcome using prior subject-matter knowledge.
Step two: Collect a representative sample of data of all the variables involved in the analysis.
Step three: Estimate the regression model parameters. “we used Excel®”
Step four: Use the estimated model parameters especially “alpha & Beta” to calculate the expected
value of the outcome “y-hat” based on the model estimation
Step five: Compare the values of the outcome “y” collected from the data, to the expected outcome
values “y-hat” by the model. And compute “percentage deviations from model estimate”.
𝑃𝑒𝑟𝑐𝑒𝑛𝑡𝑎𝑔𝑒 𝐷𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 𝑓𝑟𝑜𝑚 𝑀𝑜𝑑𝑒𝑙 𝐸𝑠𝑡𝑖𝑚𝑎𝑡𝑒 = ((𝑦𝑖 − ŷ𝑖) / ŷ𝑖) × 100
Step six: Rank the compared entities based on their “Percentage Deviation from Model Estimate”
from largest to smallest.
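The six steps above can be sketched as follows; numpy’s least-squares routine stands in here for the Excel® estimation used in the thesis, and the data values are made up for illustration. Note that for an outcome oriented as “the lower, the better”, the ranking direction in step six would be reversed.

```python
# Sketch: regression-based performance evaluation (steps three to six).
import numpy as np

def rank_by_deviation(X, y):
    """Fit y = alpha + beta1*x1 + ... and rank DMUs by % deviation from y-hat."""
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # [alpha, beta1, ...]
    y_hat = A @ coef                               # step four: expected outcome
    pct_dev = (y - y_hat) / y_hat * 100            # step five: % deviation
    order = np.argsort(-pct_dev)                   # step six: largest first
    return coef, pct_dev, order
```

DMUs with a large positive deviation outperform what the model expects given their inputs; those with a large negative deviation underperform.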
In the analysis chapters, we will illustrate the use of linear regression in evaluating different entities, or as
we will call them from now on “Decision Making Units” (DMUs). We will follow the exact same steps mentioned
above, and will report the results for the various DMUs based on that analysis. After exploring the use of multiple
regression in evaluating performance in call centers, we will provide a detailed review of the effectiveness of this
analytics tool for that specific use in chapter 6, after we conclude our analysis. To summarize, let’s take a look at the
following Exhibit.
Exhibit 2.1 Summary of “Linear regression” as a tool for empirical benchmarking
2.3 DEA as an Empirical Benchmarking Analytics tool
In this section we will try to explain the concept of “Data Envelopment Analysis” as a linear program. After
that we will explain the needs of DEA in terms of data formatting, and then we will lay down the formulation of
both models we are planning to use in the analysis chapters.
Understanding Data Envelopment Analysis
Since its inception, DEA has been widely used in measuring operational efficiency in many industries, such
as the airline industry (Sarkis) (Lapre and Scudder), the military (Sun), and education (Beasley) (Agasisti and
Johnes). Its ease of use and minimal assumptions have made it very popular for evaluating performance through
benchmarking. Other interesting uses of DEA include supplier evaluation and development programs
(Forker and Mendez) and evaluating distribution centers’ performance (Ross and Droge). Moreover, DEA is very
flexible in the sense that many variations of DEA have been created to fit different users’ needs in performance
evaluation.
DEA is a methodology that computes the efficiency of every DMU in converting “inputs” into “outputs”,
which is very similar to a production line. Inputs represent the resources and/or factors affecting performance,
while outputs are the performance levels as measured by various performance metrics. For example, the call
volume a call center gets can be considered an input, while the CSAT score is an output because it is a measure of
the produced performance. The DEA model, for the purposes of this research, is a linear program that aims to maximize the
efficiency scores of the various DMUs being evaluated. Unlike linear regression, DEA isn’t trying to fit a line at the
center of the data to compare the DMUs to; rather, DEA flies over the data and looks at the top performers to create
an efficient frontier that every DMU is then compared to and given an appropriate efficiency score (Cooper, Seiford
and Zhu). Efficiency scores are 100% for efficient DMUs, while they are less than 100% for less efficient DMUs.
As we mentioned before, DEA is a linear program that tries to maximize the efficiency scores of all DMUs
within the established constraints. However, DEA is given the freedom to choose the weights given to each input
and each output. So, if a certain DMU performs badly on a certain output, DEA can choose to place less weight, or
even no weight at all, on that output, to make the DMU look as good as it can. Having DEA try to maximize the
DMUs’ scores through the choice of weights works as an alternative to having each DMU defend its performance
and build a case around which inputs and outputs it thinks it deserves to be evaluated on. In other words, it
gives the benefit of the doubt, which is very useful if the evaluator is an outsider to the process being evaluated.
Some might see DEA as too lax on DMUs, but in some situations, having DEA take the side of the DMUs helps
increase the perceived fairness of the process. Regardless, if the user requires a firmer evaluation, DEA offers
the flexibility to impose weights or ranges of weights on each input or output; this variation of DEA is called “weight
restricted DEA” (Sunnetci and Benneyan).
Formatting data correctly for DEA
As we have established so far, DEA requires data on both the inputs and the outputs of the evaluated
performance in order to carry out the analysis. DEA calculates efficiency scores, i.e. the ability to produce
as much as possible (which is measured by outputs) with the least resources possible (which is measured by
inputs). As a result, inputs and outputs should be formatted properly as follows:
Orientation of inputs - From a DMU’s efficiency score standpoint, if all DMUs have the same level of
output, then the DMU with the lowest input will be considered the most efficient. Hence, for DEA,
inputs should be oriented as “the lower, the better”, better here meaning the DMU’s efficiency score.
Orientation of outputs – From the same perspective, if all DMUs have the same level of input,
then the DMU with the highest level of output will be considered the most efficient. Hence, for DEA,
outputs should be oriented as “the higher, the better”, again in terms of the DMU’s efficiency score.
A good example of a correctly formatted input is the “staffing level” of a call center, which is the number of
agents working in the call center at a given point in time. The lower the staffing level, assuming the same level of
output across DMUs, the better the DMU’s score will be, because this means that the DMU used fewer resources
“i.e. agents” to produce the same output. On the other hand, a good example of an incorrectly formatted input is
“call volume”. The lower the call volume of a DMU, assuming the same level of output across DMUs, the lower the
efficiency score should be, because this means that the DMU has produced that level of output in a less intense
environment of call volume compared to other DMUs. In the latter case, we can fix the problem by using the
inverse of call volume, the “inter-arrival time”, which is the average time between the arrivals of customer calls.
This way, inter-arrival time is oriented correctly to fit the nature of inputs “i.e. the lower, the better”.
Now, a good example of a correctly oriented output is “service level”, because at the same input level, the
DMU that has the highest service level will be the most efficient. On the other hand, an output metric like “AHT” is
an example of an incorrectly formatted output, because the higher the AHT gets, the worse the service; this
means that at a fixed level of input, the DMU with the highest AHT will be the most inefficient. To solve this
problem, we can also take the inverse of AHT, which is “service capacity”: the number of customer calls
that can be handled in a unit of time (e.g. an hour) at that level of AHT. This way, the DMU with the highest service
capacity, assuming the same input level, will be the most efficient “i.e. the higher, the better”.
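The two inversion fixes described above amount to taking reciprocals; a minimal sketch (the function names are illustrative):

```python
# Sketch: re-orienting incorrectly formatted DEA inputs/outputs by
# taking the reciprocal, as described above.
def interarrival_time_sec(call_volume, period_sec):
    """Input fix: calls received in a period -> avg seconds between arrivals."""
    return period_sec / call_volume   # lower volume -> longer gaps ("lower, the better")

def service_capacity_per_hour(aht_sec):
    """Output fix: AHT in seconds -> calls handleable per hour."""
    return 3600 / aht_sec             # lower AHT -> higher capacity ("higher, the better")
```

For example, 720 calls received in one hour correspond to a 5-second inter-arrival time, and an AHT of 300 seconds corresponds to a service capacity of 12 calls per hour.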
The following Exhibit summarizes the main points in “Data Envelopment Analysis”.
Exhibit 2.2 Summary of DEA as a tool for empirical benchmarking
DEA weight-unrestricted model formulation
In the analysis chapters we will use two main variations of DEA. The first one in each analysis chapter will
be the baseline model, for which we will use “weight unrestricted DEA”, in which DEA is given the freedom to
choose the input and output weights that maximize each DMU’s efficiency score. After that we will run another
iteration with “weight restricted DEA”, in which we provide a desired weight level for each output. To
help familiarize the reader with the technical underpinnings of DEA, let’s take a look at the formulation of the
“weight unrestricted DEA” linear program:
Model parameters (weight unrestricted DEA)
Inputs: Let x_ik denote the value of input i for DMU k
Outputs: Let y_jk denote the value of output j for DMU k
Weights for Inputs: Let u_i denote the weight of every input i
Weights for Outputs: Let v_j denote the weight of every output j
Efficiency Scores: Let E_k denote the efficiency score for every DMU k
Let n denote the number of inputs and m the number of outputs. DMU 1 denotes the DMU currently under evaluation; the linear program is re-solved once for each DMU, each time treating the evaluated DMU as DMU 1.
Objective Function (weight unrestricted DEA)
maximize: E_1 = Σ_{j=1}^{m} v_j y_{j1}
Model Constraints (weight unrestricted DEA)
DMU Constraints:
Σ_{i=1}^{n} u_i x_{ik} − Σ_{j=1}^{m} v_j y_{jk} ≥ 0,  for every DMU k
Inputs Constraint:
Σ_{i=1}^{n} u_i x_{i1} = 1
Non-Negativity Constraints:
𝑢𝑖, 𝑣𝑗 ≥ 0
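The weight-unrestricted program above can be sketched as follows, solved once per DMU. The use of scipy.optimize.linprog is an assumed tooling choice (the thesis does not prescribe a solver); X[k][i] and Y[k][j] hold input i and output j of DMU k, and the DMU passed as k plays the role of “DMU 1” in the formulation.

```python
# Sketch: the weight-unrestricted (CCR) DEA linear program above,
# solved for one DMU at a time with scipy's LP solver.
import numpy as np
from scipy.optimize import linprog

def dea_efficiency(X, Y, k):
    """Efficiency score of DMU k, in (0, 1]."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    n, m = X.shape[1], Y.shape[1]
    # Decision variables z = [u_1..u_n, v_1..v_m]; linprog minimizes,
    # so negate the weighted-output objective.
    c = np.concatenate([np.zeros(n), -Y[k]])
    # Inputs constraint for the DMU under evaluation: sum_i u_i*x_ik = 1
    A_eq = [np.concatenate([X[k], np.zeros(m)])]
    b_eq = [1.0]
    # DMU constraints, for every DMU: sum_j v_j*y_jk' - sum_i u_i*x_ik' <= 0
    A_ub = np.hstack([-X, Y])
    b_ub = np.zeros(X.shape[0])
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq)
    return -res.fun   # linprog's default bounds already enforce u, v >= 0
```

Re-running dea_efficiency for each k yields the full set of scores; efficient DMUs score 1.0 (i.e. 100%).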
DEA weight-restricted model formulation
Now, in regard to the “weight restricted DEA” model formulation: it is the exact same model, except
that we add one more parameter and one more set of constraints. The additions are as follows:
Additional Model parameter (weight-restricted DEA)
Assigned Weights: Let wj denote the assigned weights for every output j
Additional Model Constraint (weight-restricted DEA)
Weight Constraints:
v_j ≥ w_j Σ_{l=1}^{m} v_l,  for every output j
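In a solver that expects constraints in “A_ub · z ≤ b” form, the weight constraint above can be rewritten as w_j · Σ v_l − v_j ≤ 0 and appended as one extra row per output. The variable layout [u_1…u_n, v_1…v_m] is an assumption about how the base program is coded:

```python
# Sketch: extra A_ub rows implementing v_j >= w_j * sum_l(v_l),
# rewritten as w_j*sum_l(v_l) - v_j <= 0, one row per output.
import numpy as np

def weight_restriction_rows(n_inputs, assigned_weights):
    """Build the additional constraint rows for weight-restricted DEA."""
    m = len(assigned_weights)
    rows = []
    for j, w_j in enumerate(assigned_weights):
        row = np.zeros(n_inputs + m)
        row[n_inputs:] = w_j          # w_j times every v_l
        row[n_inputs + j] -= 1.0      # ... minus v_j itself
        rows.append(row)
    return np.array(rows)
```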
Finally, we would like to end this chapter with a figure that represents the scope of this research in terms of
the performance evaluation methodology and analytics tools involved.
Exhibit 2.3 Research Scope Summary
Chapter 3: Case study background and data description
In this chapter, we introduce the datasets that will be used in the analysis in chapters 4 and 5. We have used
two datasets extracted from two different call centers operating in the MENA region. We refer to the two
companies in which the call centers operate anonymously as (Company A, and Company B). The datasets are
identified as follows:
Dataset 1: Weekly aggregated call center performance for 24 weeks of operations (Company A)
Dataset 2: Agents’ performance aggregated over a three-month period (Company B)
3.1 Company Description
Company A
Dataset 1 came from a fairly new company (started in 2012) that operates in the online retailing business.
Company A’s call center operations are hosted in-house, unlike Company B’s, which we will discuss next. Company A,
like any other big retailer, needs to provide pre-sale and after-sale service to help customers with their purchases.
It also needs a returns department that manages product returns, in addition to a sales team that works to
promote the company and calls up customers to inform them of new offers and run subscriptions. The company also
offers its services in multiple languages (Arabic and English). Moreover, in an attempt to push transaction costs down,
Company A’s Customer Care department adopted an omni-channel approach, using email, live chat,
and social networking, in addition to the conventional telephone lines, to handle customers’ inquiries. As a
result, Company A’s call center is quite mixed in terms of the call center services offered (inbound and outbound). But
in this research we are mainly interested in Company A’s inbound call center performance, so we will focus
on the Customer Care department only and exclude any telesales or telemarketing divisions from our analysis.
Company A’s Customer Care department is divided into mainly 5 queues, each specializes in a unique set of
services (initially they were 6 queues, but queue 3 got merged with queue 1 beginning from Week 30 of
operations). Queues 1 & 2 handle customers in Arabic language, while Queues 4 through 6 handle English-speaking
customers. What makes Company A’s operations unusual is the fact that some of the call center agents circulate on
different queues every week in an attempt to maximize agent utilization. This necessitates the “cross training” of
those agents being circulated, which can be costly. But given that this is a startup, and call demand is still hard to
predict, maximizing utilization through cross training is a reasonable strategy. It is also worth mentioning that
Company A’s call center operates on three shifts, but they overlap for most of the working day. So, for simplicity
purposes, we will assume that all agents are there all the time. They operate for 12 hours a day, from 9 am to 9 pm,
and they are open 7 days a week.
Company B
Dataset 2 was obtained from a call center outsourcing services company. Company B provides call center services to other companies that wish to outsource their call center operations. Company B is well known in its field and also operates in the MENA region.
Company B's call center floor hosts many accounts for different client companies. Most of Company B's agents are paid a fixed salary, and many of them have been with the company for a relatively long time. Company B is one of the biggest call centers in its region and has a reputation as one of the best places a call center agent can work, mainly due to its loyalty to its good employees.
Dataset 2, which we collected from Company B, comes from a fairly new inbound account that started around June 2014. This account, like all the others, is operated and staffed by Company B's management, while the client can only specify criteria for selection and hiring, and in some cases require an interview with the agents before they are approved.
Next, we describe each of the two datasets in detail.
3.2 Company A - Dataset 1: Call Center’s Aggregate Performance
In Dataset 1, we captured the inbound call center's aggregate performance over a period of 24 weeks of operation, after excluding one week (Week 31) in which performance was considered an outlier due to low call volume. The inbound call center's aggregate performance is represented by two main metrics: (1) the Average Handling Time (AHT) across all agents per week, and (2) the service level offered by the call center for that week. In addition, we obtained data on the call center's staffing level per week, as well as the call volume per week. To sum up, Dataset 1 has the following fields for 24 weeks of operations:
Weekly aggregate AHT
Weekly Service level
Weekly Staffing level
Weekly Call Volume
The dataset looks as follows:
Exhibit 3.1 Company A's raw data (Dataset 1)

Week | Staffing level (Agents per Week) | Call Volume (Customers per Week) | AHT (mins) | Service Level (%)
Week 14 | 18 | 7122 | 4.33 | 23.67%
Week 15 | 18 | 9664 | 4.72 | 7.00%
Week 16 | 18 | 10857 | 5.19 | 5.15%
Week 17 | 18 | 8264 | 4.23 | 16.00%
Week 18 | 18 | 10064 | 3.99 | 15.26%
Week 19 | 25 | 9663 | 5.72 | 6.91%
Week 20 | 25 | 8287 | 4.85 | 20.96%
Week 21 | 25 | 9813 | 5.39 | 11.23%
Week 22 | 25 | 10262 | 4.47 | 12.13%
Week 23 | 27 | 11511 | 4.61 | 7.84%
Week 24 | 27 | 10552 | 4.65 | 7.81%
Week 25 | 27 | 8968 | 4.67 | 10.68%
Week 26 | 33 | 7914 | 4.63 | 18.40%
Week 27 | 36 | 6168 | 4.67 | 34.74%
Week 28 | 36 | 7340 | 4.47 | 32.22%
Week 29 | 36 | 7676 | 4.30 | 40.75%
Week 30 | 36 | 10403 | 4.22 | 21.74%
Week 32 | 39 | 10121 | 4.27 | 42.46%
Week 33 | 39 | 9689 | 4.52 | 35.70%
Week 34 | 39 | 9487 | 4.66 | 61.75%
Week 35 | 45 | 8201 | 4.66 | 66.85%
Week 36 | 45 | 8106 | 4.70 | 59.71%
Week 37 | 50 | 8538 | 4.47 | 52.76%
Week 38 | 64 | 8137 | 4.80 | 64.08%

Week 31 was the outlier that was removed from the sample.
3.4 Company B - Dataset 2: Agent's overall performance
In this dataset, we obtained agent performance data, aggregated over a three-month period, for 30 agents who work in different queues of the same account. The data consists of:
Queue AHT target
AHT per agent, aggregated over the three-month period
Quality score per agent, aggregated over the three-month period
Agents' employment date
Agent absenteeism for the three-month period
Agent adherence for the three-month period
This dataset captures the three main dimensions of call center performance:
1- Agent Productivity: represented here by AHT
2- Agent Quality: represented here by the quality score
3- Agent Punctuality: captured by the agent absenteeism and adherence metrics
Exhibit 3.2 Company B's raw data (Dataset 2)

Agent Name | Queue | Employment date | Queue AHT target | AHT | Quality | Agent Absenteeism | Agent Adherence
Agent 1 | Queue 1 | 6-Jul-14 | 2.88 | 2.92 | 95.2% | 8.2% | 98.8%
Agent 2 | Queue 1 | 11-Feb-14 | 2.88 | 2.87 | 94.3% | 17.9% | 99.9%
Agent 3 | Queue 1 | 24-Jul-14 | 2.88 | 3.43 | 83.3% | 10.0% | 99.7%
Agent 4 | Queue 1 | 3-Apr-14 | 2.88 | 2.80 | 91.1% | 13.8% | 99.5%
Agent 5 | Queue 1 | 9-Jun-14 | 2.88 | 3.10 | 94.6% | 6.5% | 100.0%
Agent 6 | Queue 1 | 24-Jul-14 | 2.88 | 3.70 | 86.6% | 15.8% | 99.2%
Agent 7 | Queue 1 | 6-Jul-14 | 2.88 | 3.08 | 98.7% | 9.1% | 99.3%
Agent 8 | Queue 1 | 12-Jul-14 | 2.88 | 3.22 | 89.2% | 26.9% | 100.0%
Agent 9 | Queue 1 | 11-Feb-14 | 2.88 | 3.03 | 98.9% | 11.9% | 99.5%
Agent 10 | Queue 1 | 12-Jul-14 | 2.88 | 3.22 | 90.3% | 8.2% | 99.7%
Agent 11 | Queue 1 | 18-May-14 | 2.88 | 3.15 | 92.5% | 5.6% | 96.4%
Agent 12 | Queue 1 | 9-Jun-14 | 2.88 | 3.77 | 98.4% | 32.0% | 99.3%
Agent 13 | Queue 1 | 9-Jun-14 | 2.88 | 2.83 | 90.9% | 13.8% | 96.7%
Agent 14 | Queue 1 | 9-Jun-14 | 2.88 | 2.73 | 99.6% | 15.8% | 99.7%
Agent 15 | Queue 1 | 21-Jun-14 | 2.88 | 2.63 | 99.8% | 11.9% | 100.0%
Agent 16 | Queue 1 | 6-Jul-14 | 2.88 | 4.02 | 86.6% | 10.0% | 99.9%
Agent 17 | Queue 1 | 24-Apr-14 | 2.88 | 2.87 | 91.6% | 30.7% | 96.7%
Agent 18 | Queue 1 | 8-May-14 | 2.88 | 3.85 | 90.9% | 22.2% | 99.9%
Agent 19 | Queue 1 | 12-Jul-14 | 2.88 | 3.58 | 90.0% | 8.2% | 99.9%
Agent 20 | Queue 1 | 6-Jul-14 | 2.88 | 3.27 | 95.7% | 15.8% | 99.3%
Agent 21 | Queue 2 | 12-Mar-13 | 3.38 | 4.52 | 95.9% | 2.6% | 99.0%
Agent 22 | Queue 2 | 16-Apr-14 | 3.38 | 3.02 | 100.0% | 4.7% | 97.3%
Agent 23 | Queue 2 | 3-Jun-14 | 3.38 | 4.45 | 84.8% | 14.7% | 99.5%
Agent 24 | Queue 2 | 8-Jan-14 | 3.38 | 4.80 | 92.0% | 12.2% | 99.0%
Agent 25 | Queue 3 | 3-Jun-14 | 3.53 | 4.08 | 70.0% | 9.1% | 98.6%
Agent 26 | Queue 3 | 14-May-14 | 3.53 | 3.55 | 91.7% | 4.0% | 95.8%
Agent 27 | Queue 3 | 24-Jun-14 | 3.53 | 4.10 | 100.0% | 25.8% | 93.6%
Agent 28 | Queue 3 | 1-Aug-14 | 3.53 | 4.02 | 81.0% | 11.4% | 94.9%
Agent 29 | Queue 3 | 3-Jun-14 | 3.53 | 3.33 | 90.9% | 5.4% | 99.1%
Agent 30 | Queue 3 | 3-Jun-14 | 3.53 | 2.80 | 57.8% | 3.3% | 97.2%

Both datasets above are analyzed in the next two chapters. We will start with the first level of analysis, which is the call center's overall performance, aka "aggregate performance". Aggregate performance looks at call center metrics aggregated across all queues and accounts. These metrics are designed to give their user a feel for the call center's ability to carry out its basic tasks, which are (1) answering the customers, (2) serving the customers fast enough, and (3) doing so with an appropriate level of quality.
Chapter 4: Aggregate Performance Tracking
4.1 Introduction
This chapter examines aggregate call center performance from a client company's perspective. Thus, we assume an outsourced call center situation rather than an in-house call center. Aggregate call center performance means: "the call center metrics that measure the ability of the call center to carry out its most basic tasks, aggregated across all queues". In our analysis, taking the client perspective, we look at call center performance over time (i.e. time series analysis), as opposed to analyzing call center performance at a single point in time (i.e. a snapshot). The client company cares mainly about the following aspects of performance over time:
How many of the calling customers have been answered in a reasonable time period?
What is the average service time?
What is the quality of service they are receiving?
The answer to the first question is well captured by the "service level" metric, which is the percentage of customers answered within a given threshold of time (e.g. 20 seconds) out of the total number of customers calling that day/week/month. Client companies value the service level metric so much that they have created what is called a "service level agreement" (SLA) with outsourcing destinations. SLAs allow clients to tie the outsourcing destination's compensation to the service level it achieves at the end of each period, and thus help align the incentives of the outsourcing destination with those of the client company, which wishes to see service level go up. Client companies also expect the outsourcing destination to become more efficient every year, so they usually reduce the value of the contract by a certain percentage annually. Service level will be included in our aggregate analysis later in this chapter.
The second question is answered by looking at the "Average Handling Time" (AHT) metric, which shows the average service time experienced by customers answered on that day/week/month. AHT is considered one of the most common metrics in the call center business, due to its direct effect on queue length, which in turn affects service level: the higher the AHT, the lower the agents' average capacity to handle customers, so the queue grows longer, more customers stay on the line beyond the desired threshold, and service level drops. AHT will also be included in our aggregate analysis later in this chapter.
Last but not least, the quality question can be answered in two ways: (1) by looking at the quality scores given by the internal quality staff (working for the outsourcing destination); or (2) if the client prefers a third party to do the evaluation, by using a customer satisfaction survey (CSAT), which simply asks the customer, after finishing the call with the customer service representative, to rate the various aspects of the customer service experience (e.g. on a scale from 1 to 10). Quality is usually measured at the agent level, and sometimes, when CSAT is used, the call center can aggregate the data into an overall call center service quality score. However, due to the lack of data about aggregate quality in Dataset 1, we will not be able to include quality as a dimension in our aggregate analysis.
This analysis is challenging because considering individual metrics such as service level, AHT, or quality independently, or without considering the inputs used to produce them, does not provide a holistic view of call center performance. For example, a client might find the service level improving over the past 3 months while the outsourcing destination has billed them for double the number of agents, which suggests the outsourcing destination might not be managing its resources efficiently. Or the AHT and service level might look promising while service quality has deteriorated. To summarize, the main challenges in this analysis are as follows:
There are multiple outputs (i.e. performance metrics) involved in the analysis, some of which might be in tension with each other (e.g. AHT and quality), so one cannot easily draw a meaningful conclusion just by looking at one output, or even by looking at all outputs separately.
Looking at the outputs alone, without the costs associated with producing them, is misleading. For example, the outsourcing destination can improve service level by hiring more agents rather than by managing its existing agents more efficiently.
Having multiple outputs and multiple inputs adds much more complexity to the analysis.
For these reasons, aggregate call center performance analysis from a client company's perspective is very challenging without the right analytics tool. In the next section, we will see how informative Dataset 1 can be to a client without the use of analytics (i.e. preliminary data analysis). In the sections that follow, we will apply different analytics tools to see if we can extract a more meaningful picture than the one achieved in the preliminary analysis.
4.2 Preliminary Data Analysis
In this section we will try to look at the data presented in dataset 1, while assuming a client perspective to
the performance evaluation problem over time. We will try to extract the most we can from the dataset without the
use of any analytics. In later sections, we will try to compare and contrast the results we get from this section to the
results achieved by the use of analytics. For standardization purposes, we will transform two items from dataset 1
to be friendlier to the DEA analysis in the following sections. We will transform (1) AHT, which is an output (i.e.
performance metric), to its inverse format “Service Capacity”. The reason is that we need it to be in a “the more, the
better” format, to fit the needs of the DEA analysis. (2) Call Volume, which is an input (i.e. differentiating variable),
will be transformed to “inter-arrival time” – its inverse – because inputs are needed to be in a “the lower, the
better” format for the purpose of the DEA analysis. Now, with our data oriented correctly, let us start by examining
the various outputs and inputs in dataset 1. The metrics involved in our analysis are as follows:
Outputs (performance metrics) – orientation is "the higher, the better":
(1) Service level percentage (weekly)
(2) Customer service representative (CSR) service capacity per hour (weekly) – the reciprocal of average weekly handling time (AHT) multiplied by 60 minutes per hour – used to reflect that the higher the capacity, the lower the AHT, the faster the service, and thus the better the performance.
Inputs (differentiating variables) – orientation is "the lower, the better":
(1) Staffing level (# of CS agents staffed each week)
(2) Inter-arrival time in minutes (weekly) – the reciprocal of call volume multiplied by 12 hours of operation per day, 7 days a week, and 60 minutes per hour – used to reflect that the bigger the call volume, the smaller the inter-arrival time, and the harder it is to maintain a good service level with a fixed number of staff.
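The two reorientations can be sketched in a few lines. This is our own illustrative code with assumed helper names; the 12 hours/day, 7 days/week operating schedule is taken from the text.

```python
# Operating schedule from the text: 12 hours/day (9 am to 9 pm), 7 days a week.
WEEKLY_MINUTES = 12 * 7 * 60  # 5040 operating minutes per week

def service_capacity_per_hour(aht_minutes):
    """Output reorientation: reciprocal of AHT, in customers per agent-hour."""
    return 60.0 / aht_minutes

def interarrival_minutes(weekly_call_volume):
    """Input reorientation: reciprocal of call volume, in minutes per call."""
    return WEEKLY_MINUTES / weekly_call_volume

# Week 14 of Dataset 1: AHT 4.33 min and 7122 calls give roughly the
# 13.87 customers/hour and 0.71 minutes shown in Exhibit 4.1.
```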
Our analysis extends from Week 14 of Company A's operations to Week 38, with the exception of Week 31, which, as mentioned above, is an outlier that had to be eliminated from the analysis. After the necessary transformations, the data looks as follows:
Exhibit 4.1 Dataset 1 in analysis-ready formatting
(Staffing level and inter-arrival time are the inputs; service capacity and service level are the outputs.)

Week | Staffing level (# of Agents) | Inter-arrival time (mins) | CSR Service Capacity per hour | Service Level (%)
Week 14 | 18 | 0.71 | 13.87 | 23.67%
Week 15 | 18 | 0.52 | 12.71 | 7.00%
Week 16 | 18 | 0.46 | 11.57 | 5.15%
Week 17 | 18 | 0.61 | 14.17 | 16.00%
Week 18 | 18 | 0.50 | 15.05 | 15.26%
Week 19 | 25 | 0.52 | 10.48 | 6.91%
Week 20 | 25 | 0.61 | 12.37 | 20.96%
Week 21 | 25 | 0.51 | 11.13 | 11.23%
Week 22 | 25 | 0.49 | 13.42 | 12.13%
Week 23 | 27 | 0.44 | 13.02 | 7.84%
Week 24 | 27 | 0.48 | 12.91 | 7.81%
Week 25 | 27 | 0.56 | 12.85 | 10.68%
Week 26 | 33 | 0.64 | 12.96 | 18.40%
Week 27 | 36 | 0.82 | 12.86 | 34.74%
Week 28 | 36 | 0.69 | 13.42 | 32.22%
Week 29 | 36 | 0.66 | 13.95 | 40.75%
Week 30 | 36 | 0.48 | 14.22 | 21.74%
Week 32 | 39 | 0.50 | 14.06 | 42.46%
Week 33 | 39 | 0.52 | 13.27 | 35.70%
Week 34 | 39 | 0.53 | 12.87 | 61.75%
Week 35 | 45 | 0.61 | 12.88 | 66.85%
Week 36 | 45 | 0.62 | 12.75 | 59.71%
Week 37 | 50 | 0.59 | 13.42 | 52.76%
Week 38 | 64 | 0.62 | 12.50 | 64.08%
We will start our preliminary data analysis by graphing the outputs and looking at their descriptive statistics and trends over time, and then do the same for the inputs, to see whether a meaningful picture emerges.
Descriptive Statistics:
Outputs:
Exhibit 4.2 Outputs graphed (Dataset 1)
CSR Service Capacity per hour
Service capacity per hour doesn't show a general direction across the period of analysis. It fluctuated significantly from Week 14 through Week 22, after which the fluctuation toned down. The all-time high in service capacity per hour was achieved in Week 18, at approximately 15 customers per hour, and the all-time low occurred at Week 19, at approximately 10.5 customers per hour. The steepest incremental decrease was at Week 19, where the service capacity dropped by almost 4.5 customers per hour. The average service capacity throughout the period of analysis was approximately 13.1 customers per hour.
Service Level
Service level started at 23.67% in Week 14 and fluctuated without a general direction until Week 24. At Week 25 an uptrend began, peaking at an all-time high of 66.85% at Week 35. The largest incremental increase happened during Week 34, when service level improved by 26.05 percentage points in one week to reach 61.75%. Service level closed strongly at Week 38, at 64.08%. The all-time low for this period of analysis was 5.15%, at Week 16. The average service level throughout the period of analysis was 28.16%.
Relating both metrics
The output metrics are weakly correlated (correlation coefficient 0.16823). Therefore, the fluctuation in service level doesn't seem to be explained much by service capacity per hour, or vice versa.
Inputs:
Exhibit 4.3 Inputs graphed (Dataset 1)
Staffing level
Since Company A's operations are in the start-up stage, the Customer Care department's staffing level increases over time as sales and the customer base grow. It started at Week 14 with only 18 agents; over the period of analysis (24 weeks), staffing grew by roughly 2.5 times its initial value to reach 64 agents by Week 38. The largest incremental increase took place in the last week, with 14 customer service agents added in a single week!
Inter-arrival time
Inter-arrival time fluctuates with no general direction, with the exception of the large increase (meaning a decrease in call volume) from Week 24 until Week 27, during which inter-arrival time increased by approximately 0.11 minutes on average each week. The biggest incremental change was at Week 17, where inter-arrival time increased by about 0.15 minutes in a single week. The all-time high was 0.82 minutes, at Week 27, while the all-time low was about 0.44 minutes, at Week 23. The average inter-arrival time throughout the period of analysis was 0.57 minutes.
Relating both metrics
These two metrics have a weak positive correlation of 0.2635.
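The pairwise correlations used in this section can be reproduced with numpy. The sketch below uses only one Exhibit 4.1 row per staffing level (Weeks 14, 19, 23, 26, 27, 32, 35, 37, 38), so the coefficients will differ slightly from those reported for the full 24-week sample.

```python
import numpy as np

# One week per staffing level, taken from Exhibit 4.1.
staffing     = np.array([18, 25, 27, 33, 36, 39, 45, 50, 64], dtype=float)
interarrival = np.array([0.71, 0.52, 0.44, 0.64, 0.82, 0.50, 0.61, 0.59, 0.62])
capacity     = np.array([13.87, 10.48, 13.02, 12.96, 12.86, 14.06, 12.88, 13.42, 12.50])
service_lvl  = np.array([0.2367, 0.0691, 0.0784, 0.1840, 0.3474, 0.4246, 0.6685, 0.5276, 0.6408])

# Rows/columns follow the variable order above, as in Exhibit 4.5.
corr = np.corrcoef(np.vstack([staffing, interarrival, capacity, service_lvl]))
```

Even on this subset, the staffing level vs. service level entry comes out strongly positive, consistent with the Exhibit 4.5 matrix.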
As a result of our descriptive statistics, we can see clearly that there is no agreement on which week of operations was the best. The rankings based on our two outputs, "Service level" and "Service Capacity", are as follows:
Exhibit 4.4 Summary of ranking by outputs (Dataset 1)

 | Service level | Service Capacity
Best Week | Week 35 | Week 18
Second Best Week | Week 34 | Week 30
Worst Week | Week 19 | Week 19

We can see that the only agreement is on the worst week, which is probably coincidental. In general, no unified definition of "overall performance" has emerged yet.
Now, after looking at outputs and inputs separately, let us examine the possibility of combining some of the four variables to reach more meaningful conclusions. But first we need to look at the relationships between the variables.
Correlation matrix:
The correlation matrix shows a strong positive correlation between staffing level and service level. This is expected: service level is inversely related to the time customers spend waiting on the line, which in turn is inversely related to the number of servers (staff) available; hence service level moves together with staffing level. The other moderately strong relationship is between inter-arrival time and service level, which is also intuitive: as inter-arrival time increases, call volume decreases, making a higher service level easier to achieve.
Exhibit 4.5 Correlation Matrix (Dataset 1)

 | Staffing level (# of Agents) | Inter-arrival time (mins) | CSR Service Capacity per hour | Service Level (%)
Staffing level (# of Agents) | 1.00 | | |
Inter-arrival time (mins) | 0.26 | 1.00 | |
CSR Service Capacity per hour | 0.02 | 0.11 | 1.00 |
Service Level (%) | 0.85 | 0.40 | 0.17 | 1.00

Using ratios to combine variables:
A common way of combining multiple variables into a more meaningful representation of performance is ratios. For example, retailers use "sales per square foot" to examine sales performance after controlling for retail space as a differentiator between retail chains. This combines an output metric (sales) with an input resource used to produce it (retail space). Applying the same methodology to our problem can be very beneficial, as it tells a better story than the one we have so far.
Agent contribution to service level: This ratio represents the average contribution to service level made by each agent every week. It is obtained by dividing the weekly service level in Dataset 1 by the weekly staffing level, so its units are percentage points per agent. The trend of this ratio already tells a different story from the individual metrics "service level" and "staffing level". For example, the all-time-high service level (66.85%) was achieved at Week 35 with a staffing level of 45 agents, yet by the "agent contribution to service level" ratio, Week 34, with only a 61.75% service level, is the all-time high (1.6% per agent versus Week 35's 1.5% per agent), because it achieved that level with only 39 agents.
Exhibit 4.6 Agent Contribution to Service level graphed (Dataset 1)
Agent's Share of Successful Calls (ASSC): This ratio represents the average number of customers handled successfully (i.e. within the service level threshold) by each agent each week. It is calculated by converting inter-arrival time back to its original form, call volume, multiplying call volume by service level to obtain the number of calls handled successfully, and dividing by the number of agents available that week (i.e. staffing level). This metric combines 3 of the 4 variables and paints a slightly different picture than service level alone, especially after Week 34. On various occasions (especially towards the end of the period), service level overestimates the performance of some weeks, whereas ASSC, because it adjusts for inputs, starts penalizing overall performance as the hiring frenzy picks up in the last few weeks. It is worth mentioning that Week 34 dominates all other weeks on this ratio, which is consistent with our findings from the "agent contribution to service level" ratio.
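Both ratios are simple enough to sketch directly. The function names below are our own; the Week 34 and Week 35 figures are taken from Exhibit 3.1.

```python
def agent_contribution(service_level, staffing):
    """Service-level percentage points contributed per agent per week."""
    return service_level / staffing

def assc(call_volume, service_level, staffing):
    """Agent's Share of Successful Calls: calls answered within the
    threshold (call_volume * service_level) per available agent."""
    return call_volume * service_level / staffing

# Week 34 vs Week 35 (Exhibit 3.1 values): Week 34 wins on both ratios
# despite its lower raw service level, because it used only 39 agents.
week34 = agent_contribution(0.6175, 39)  # ~1.6% per agent
week35 = agent_contribution(0.6685, 45)  # ~1.5% per agent
```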
Exhibit 4.7 ASSC metric graphed with Service level (Dataset 1)
As we can see, combining variables into sensible ratios is very beneficial: it gives a clearer and more holistic picture of real performance, one that would otherwise stay hidden in the folds of single-variable analysis. To summarize the differences, let us look at the rankings of the weeks of operation in Dataset 1 produced by the different metrics and ratios.
36 | P e r f o r m a n c e E v a l u a t i o n i n C a l l C e n t e r s
Exhibit 4.8 Summary of Preliminary Analysis rankings (Dataset 1)

 | Service Level | Service Capacity | Agent Contribution to Service Level | ASSC
Best Week | Week 35 | Week 18 | Week 34 | Week 34
Second Best Week | Week 34 | Week 30 | Week 35 | Week 35
Worst Week | Week 19 | Week 19 | Week 19 | Week 19

Although we couldn't reach a unified ranking, we can see that deploying ratios yields a more holistic picture of performance. We therefore conclude that ratio analysis is very beneficial, as it combines different metrics into a single, more meaningful ratio, but it has the following drawbacks:
Ratio analysis cannot combine all 4 variables and still produce a meaningful new metric
Ratio analysis might require some subject-matter knowledge of which metrics affect which, or how they are connected, which cannot be assumed from a client company's perspective
From this preliminary data analysis, we can get a sense of what the optimal analytics tool should produce:
It should give a holistic view of performance, including ALL variables in the analysis
It should produce a unified ranking of the different weeks of operation in terms of overall performance
It should be fair to the outsourcing destination without requiring much knowledge of the call center's internal operations
In the next few sub-sections, we explore different analytics tools and analyze the strengths and weaknesses of each in tackling this particular performance evaluation challenge. We examine two main framework approaches to using analytics:
1. Benchmarking against a theoretical yardstick, in which we use analytics to compare overall performance to what it should have been from a theoretical standpoint, based on queueing theory
2. Benchmarking against an empirical yardstick, in which we use analytics to compare Company A to itself over time and develop a ranking on that basis; here we experiment specifically with multiple regression and Data Envelopment Analysis as candidate analytics tools
4.3 Theoretical Benchmarking "Queueing Analysis"
In an attempt to answer the "right tool" question, we first examine the benchmarking methodology: since the client has very little knowledge of the call center's operations, benchmarking can compensate for the client's lack of subject-matter knowledge by providing a yardstick for comparison.
In this analysis, we attempt to use queueing theory to produce the optimal weekly service level (i.e. the theoretical benchmark), given the weekly staffing level, inter-arrival time, and service capacity, using an Excel template produced by Professor John O. McClain of Cornell University.
The Excel® template is available for free use and includes two different files: the first (QueueTransient.xlsx) calculates M/M/c models assuming a transient queue (fit for a system that is just starting), and the second (Queue.xlsx) calculates M/M/c models assuming a steady-state queue (fit for a system that has been running for quite a while). We decided to use the steady-state template, since Company A's call center has been operational for almost 2 years now. The steady-state template accommodates the needs of different call center types through different sheets, each covering a different option, as follows:
1- Finite queue sheet, fit for situations in which the queue length is limited. This can represent a call center with limited queue capacity (i.e. trunk lines): if all trunk lines are occupied, the next callers get a busy signal and cannot join the queue until at least one trunk line is freed. Trunk lines are freed when waiting customers are connected to a free agent's terminal.
2- Infinite queue sheet, which suits situations where the queue length is virtually unlimited, or where information about queue length limitations isn't available. These two sheets (finite and infinite queues) assume exponentially distributed service times and inter-arrival times. This takes us to the last sheet.
3- Queue simulation sheet, which allows changing the value of the coefficient of variation: set to 1, it represents exponentially distributed service times, while other values can represent other service-time distributions (inter-arrival times remain exponentially distributed). This allows flexibility in choosing distributions to fit different call center patterns.
For the problem at hand, we decided to start with the "Infinite queue" sheet, since we lack proper data about Company A's trunk lines (i.e. queue capacity). The template requires 3 pieces of data to calculate the theoretical service level for each week:
1- Number of servers, which we have in Dataset 1 for each week as the staffing level
2- Arrival rate per hour, which is simply the call volume divided by the number of hours of operation, which we also have in Dataset 1
3- Service capacity of each server, which we also have in Dataset 1 as the CSR service capacity per hour
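As a cross-check on the template's output, the steady-state M/M/c service level can also be computed directly with the standard Erlang C formula. This is our own illustrative implementation, not Professor McClain's template, and the 20-second answer threshold is an assumed value.

```python
import math

def mmc_service_level(c, lam, mu, t_sec):
    """Steady-state M/M/c: fraction of callers answered within t_sec seconds,
    given c agents, lam arrivals/hour, and mu calls/hour per agent."""
    a = lam / mu                      # offered load in Erlangs
    if a >= c:
        return 0.0                    # unstable queue: no steady state
    # Erlang C probability that an arriving caller has to wait
    top = a**c / math.factorial(c) * c / (c - a)
    p_wait = top / (sum(a**k / math.factorial(k) for k in range(c)) + top)
    # conditional waiting time is exponential with rate (c*mu - lam) per hour
    return 1.0 - p_wait * math.exp(-(c * mu - lam) * t_sec / 3600.0)

# Week 14 of Dataset 1: 18 agents, 7122 calls over 84 operating hours,
# and capacity 60/4.33 customers per hour per agent.
sl = mmc_service_level(18, 7122 / 84.0, 60 / 4.33, 20)
```

With Week 14's offered load at roughly a third of the 18 agents' nominal capacity, the model predicts essentially no waiting and a service level of nearly 100%, consistent with the finding below that every week benchmarks at 100%.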
We conducted the analysis for each of the 24 weeks, but unfortunately the results weren't very helpful, because the theoretical service level came out as 100% for all 24 weeks. This result wasn't too surprising given the M/M/c model's assumptions. Considering those assumptions, we believe the following are the possible reasons the queueing model is a poor tool for this problem, divided into "assumption-related shortcomings" and "modelling or data shortcomings":
Assumption-related shortcomings
The sheet used ("Infinite queue") assumes exponentially distributed service times, which might not be the case for Company A's call center.
The sheet, as part of the Queue.xlsx workbook, assumes a steady-state queue, which might not hold for Company A's queue given its start-up nature (some companies take years to reach a steady-state queue).
Since Company A has 5 different queues, the M/M/c queueing model used in the "Infinite queue" sheet might not be the right model for this particular problem: M/M/c assumes a single waiting line for all servers, while Company A might have 5 separate waiting lines, one in front of each of its 5 queues.
Some customers might be transferred multiple times between queues to perform multiple transactions in one phone call, leading to a larger processing time for that customer and more waiting time for other customers. This isn't captured by the M/M/c model used.
Modelling/data shortcomings
The sheet used ("Infinite queue") assumes infinite queue size, which is not the case for Company A's call center, but we couldn't get the data regarding their trunk lines.
The model doesn't account for the "after-call work" (ACW) status that an agent might use to finish the forms that need to be filled after a call, or any other call-related tasks; ACW eats into the agent's service time.
The queueing model doesn't capture the different shifts of Company A's call center agents, since not all agents are available for all the operating hours.
The model doesn't capture the fact that agents take breaks, some of which are variable in length, conditional on the queue traffic.
All these factors are possible reasons why every theoretical service level came out as 100%, which made it very difficult to use these results as a benchmark for the client company. We therefore wanted to try the "Queue simulation" template, which gives more freedom to change the service-time distribution, but that template requires knowledge of the queue capacity (i.e., the number of trunk lines), which we do not know for Company A. For these reasons we had to give up on queueing theory as an analytical tool for producing a theoretical benchmark for the problem at hand.
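For reference, the theoretical service level that such a template computes can be sketched with the standard Erlang-C formula for an M/M/c queue. This is a generic sketch of the textbook formula, not the internals of the actual "Queue.xlsx" workbook, and all rates shown are hypothetical:

```python
from math import exp, factorial

def erlang_c_wait_prob(arrival_rate, service_rate, servers):
    """Erlang-C probability that an arriving call has to wait (M/M/c queue)."""
    a = arrival_rate / service_rate          # offered load in Erlangs
    rho = a / servers                        # utilization; must be < 1 for steady state
    if rho >= 1:
        return 1.0                           # unstable queue: everyone waits
    top = a ** servers / (factorial(servers) * (1 - rho))
    bottom = sum(a ** k / factorial(k) for k in range(servers)) + top
    return top / bottom

def service_level(arrival_rate, service_rate, servers, threshold):
    """P(wait <= threshold): the theoretical service level for an M/M/c queue."""
    pw = erlang_c_wait_prob(arrival_rate, service_rate, servers)
    return 1 - pw * exp(-(servers * service_rate - arrival_rate) * threshold)

# Hypothetical week: 2 calls per minute, 1 call per minute per agent, 3 agents,
# 30-second answer threshold (0.5 minutes).
sl = service_level(2, 1, 3, 0.5)
```

With heavily overstaffed weeks plugged in, the waiting probability collapses toward zero, which is one mechanism by which every week can come out at a 100% theoretical service level.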
It is now clear that this problem requires a more sophisticated, yet more practical, tool of analysis, one that can look at several variables at the same time and provide a good picture of performance over time. Let us examine multiple linear regression as a possible tool to carry out this analysis.
4.4 Empirical Self-Benchmarking - I “Multiple Regression” – Dataset 1
The reason we considered regression analysis is its data-driven nature, which makes it much more practical than the previous analytics tool in the sense that it uses the call center's own data to estimate a model that sets the expectation for each entity under evaluation. As we mentioned in Chapter 2, we will use the "six steps" to evaluate aggregate overall performance using regression over the period of 24 weeks (Dataset 1). But first, we need to define the following:
1. DMU: in this analysis, we will look at the call center’s aggregate overall performance per week, which is
defined by both Service Capacity and Service level. So, each DMU will represent a week of operations in
Company A.
2. Independent variables (inputs): our independent variables are (1) weekly staffing level of the call center and (2) weekly average inter-arrival time.
3. Dependent variables (outputs): our dependent variables are (1) average Service Capacity per agent per hour and (2) Service level (%).
Now, given the limitations of multiple regression (we can have multiple independent variables, but only one dependent variable per model), we need a separate model for each of the two dependent variables, "Service Capacity" and "Service Level". The models are formulated as follows:
Service Capacity (y1) = α1 + β1 · Staffing level (x1) + β2 · Inter-arrival time (x2) + ε1
Service level (y2) = α2 + β3 · Staffing level (x1) + β4 · Inter-arrival time (x2) + ε2
With the models formulated, we use Excel to estimate the parameters of both models (see Appendix 4.1 for the model estimation reports). After estimating the parameters, we use both models to calculate the expected value of each output [Model 1 = "Service Capacity", Model 2 = "Service Level"], and then compute the percentage deviation of each week's actual output from the output estimated by the regression model (see Appendix 4.2 for detailed results). We now discuss the conclusions that we can derive from the results of both regression models.
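The estimation and deviation steps can be sketched as follows. The weekly numbers below are invented for illustration; the actual figures come from Dataset 1 and the Appendix 4.1 estimation reports:

```python
import numpy as np

# Hypothetical weekly observations (not the thesis's actual Dataset 1):
staffing = np.array([30, 32, 35, 33, 37, 40, 38, 36])                 # agents per week
interarrival = np.array([12.0, 11.0, 9.0, 10.0, 8.0, 7.0, 8.0, 9.0])  # seconds between calls
service_capacity = np.array([15.2, 15.8, 17.1, 16.3, 17.9, 18.8, 18.1, 17.4])

# Design matrix with an intercept column: y = a + b1*x1 + b2*x2 + e
X = np.column_stack([np.ones(len(staffing)), staffing, interarrival])
coef, *_ = np.linalg.lstsq(X, service_capacity, rcond=None)

# Expected output per week, and each week's percentage deviation from it
fitted = X @ coef
pct_deviation = (service_capacity - fitted) / fitted * 100  # positive = better than expected
```

Ranking the weeks by `pct_deviation` reproduces the kind of "deviation from the model's estimate" comparison shown in Exhibits 4.9 and 4.10.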
Model 1 "Service Capacity": regression paints a picture that we have not seen so far. It combined 3 of the 4 variables in Dataset 1 to produce a very meaningful aggregate performance evaluation. Week 18 dominates the other weeks in this ranking (as opposed to Weeks 34 and 35 in the preliminary data analysis), while the worst week appears to be Week 19, which is consistent with our findings from the preliminary data analysis.
Exhibit 4.9 Percentage Deviation from Model 1’s estimate graphed (Dataset 1)
Model 2 "Service level": the results from this model are quite interesting, because they also oppose the initial rankings suggested by the ratios in the preliminary analysis: Week 18, not Week 34 or 35, is the dominating week here. The results from this model are fairly reliable, as the model has an adjusted R-squared of 74%. The worst week here is still Week 19, though.
Exhibit 4.10 Percentage Deviation from Model 2’s estimate graphed (Dataset 1)
Looking at these results, one can see the value brought by the comprehensiveness of analytics when compared with the results of the preliminary analysis. The picture painted by regression is also much more elaborate and practical than that of the queueing analysis (which declared every week inefficient!). Here, regardless of what theory says, regression takes a data-sensitive approach, setting expectations based on the dataset's own performance.
                           | Best Week(s) | Second Best Week(s) | Worst Week(s)
Model 1 (Service Capacity) | Week 18      | Week 30             | Week 21
Model 2 (Service level)    | Week 18      | Week 14             | Week 24
Exhibit 4.11 Summary of post-regression rankings (Dataset 1)
As appealing as this is, we are unfortunately still unable to answer the simple questions "which week is best overall?", "which week is second best?", and so on. We merely got lucky that Week 18 dominates both output metrics (Service Capacity and Service level), so we can name the best week, but not the second best, and so on. Multiple linear regression is a very useful analytics tool, but as a tool for this specific performance evaluation challenge it has the following drawbacks:
With multiple performance metrics, regression cannot produce a single overall evaluation of performance, since each model can take only one dependent variable. Regression will therefore always result in multiple rankings.
Regression is a very widely used analytical tool, but not all relationships are linear, and where they are not, multiple linear regression is not the right tool. A clear example arises in Dataset 2, where experience appears as an input: experience has diminishing returns over time, so its effect is not linear, which is why we could not use linear regression with experience as an input in Dataset 2.
Overall, regression analysis is an improvement over the less comprehensive and less accurate preliminary data analysis, and a more practical approach than queueing theory's overly theoretical benchmark. But it still lacks the ability to give a single definition of overall performance across multiple performance output metrics (i.e., Service Capacity and Service level).
4.5 Empirical Self-Benchmarking - II “Data Envelopment Analysis” – Dataset 1
Given the nature of this problem, we needed a tool that satisfies the multi-dimensional nature of the overall call center performance evaluation challenge. Although using DEA to evaluate performance over time is an uncommon application of DEA, we chose DEA because:
(1) DEA is multi-dimensional in the sense that it can handle multiple outputs (Performance metrics) and
multiple inputs. This attribute is central to our analysis from a client company perspective, since the client
usually faces multiple dimensions of performance outputs, as well as multiple input parameters.
(2) DEA produces a single efficiency score to indicate the efficiency of every DMU in producing outputs, given
the input parameters. These efficiency scores are comparable in every sense because they were created
after considering the different input levels, which means that the client will have a much better picture of
how well the outsourcing destination’s team is managing call center operations given the varying input
levels every week.
(3) Since, as a client company, we do not know much about the outsourcing destination's internal processes, we needed a tool that is fair to the outsourcing destination in the sense that it grants the highest possible efficiency score over any "weights" assigned to inputs. This is equivalent to letting the outsourcing destination's management team defend themselves and build their argument around the inputs they believe had the greatest effect on performance output. As imprecise as it might seem, this option is very useful when you have little knowledge of internal operations, because it gives the benefit of the doubt.
(4) It is also fair in the sense that it is data-sensitive: it does not compare DMUs' performance to an absolute optimum or an external benchmark, but to other efficient DMUs. In essence, the client is comparing the outsourcing destination's performance to its own best self. This is also very useful when you have little knowledge of the industry; for example, the client does not know what a fair increase in service level would be from adding 3 more employees or from reducing call volume by 20%.
(5) Last but not least, DEA is a very simple technique to learn and apply, which suits a client's non-technical needs.
DEA’s first iteration (unrestricted weights):
Now, in order to start with the DEA analysis, we need to define the following:
Decision making unit (DMU) – For the purpose of this analysis, we decided to use “weekly” aggregate
performance of the call center as the decision making unit.
Inputs – we chose “Staffing level” and “inter-arrival time” to be our inputs, since the first input (Staffing
level) reflects management’s “hiring and firing” decisions on a weekly basis, and the second input (inter-
arrival time) reflects management’s ability to plan workforce to match the variable call volume. These two
inputs are chosen as inputs because they are up to the client to change, so they are considered input
parameters to the outsourcing destination. Of course the outsourcing destination can suggest increasing
the number of CS agents, but it is up to the client to approve or disapprove.
Outputs – we went with “Service level” and “CSR Service Capacity per hour” as the outputs of our model.
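The thesis carries out the DEA computation in a spreadsheet, but the unrestricted model behind these scores can be sketched as the standard CCR multiplier linear program, solved once per DMU. This is a generic sketch using SciPy, not the exact spreadsheet setup, and the two-DMU example in the usage note is hypothetical:

```python
import numpy as np
from scipy.optimize import linprog

def dea_ccr_efficiency(inputs, outputs):
    """CCR efficiency scores (multiplier form) for each DMU.

    inputs:  (n_dmus, n_inputs) array, "the lower the better"
    outputs: (n_dmus, n_outputs) array, "the higher the better"
    For each DMU k: maximize u.y_k subject to v.x_k = 1,
    u.y_j - v.x_j <= 0 for every DMU j, and u, v >= 0.
    """
    X, Y = np.asarray(inputs, float), np.asarray(outputs, float)
    n, m = X.shape
    s = Y.shape[1]
    scores = []
    for k in range(n):
        # Decision variables: [u_1..u_s, v_1..v_m]; linprog minimizes, so negate
        c = np.concatenate([-Y[k], np.zeros(m)])
        A_ub = np.hstack([Y, -X])            # ratio constraints for every DMU
        b_ub = np.zeros(n)
        A_eq = np.concatenate([np.zeros(s), X[k]]).reshape(1, -1)  # v.x_k = 1
        res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                      bounds=[(0, None)] * (s + m), method="highs")
        scores.append(-res.fun)
    return scores
```

For instance, with two DMUs that consume identical inputs but where the first produces twice the outputs, the first scores 1.0 and the second 0.5, since each DMU is free to pick the weights most favorable to itself.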
After conducting the DEA analysis, the efficiency scores of the 24 DMUs were calculated along with the weights placed on each output (see Appendix 4.3 for detailed efficiency scores). As the chart below shows, DEA's first-iteration results confirm our previous findings from the preliminary data analysis and the multiple regression analysis: both Week 34 and Week 18 are efficient, but there are other efficient weeks as well, which is a new finding. The DMUs highlighted in green are the top-tier weeks, with scores from 90% to 100%; those highlighted in yellow fall between 70% and 89.99%; and those highlighted in red score below 70%.
Exhibit 4.12 DEA’s first iteration (unrestricted weights) efficiency scores graphed (Dataset 1)
The picture painted by DEA here is quite holistic, because it defines overall performance on a single, fully comparable scale. Looking at each week, we can now see how efficiently the outsourcing destination used the deployed resources to achieve the performance metrics (outputs). As a client company, we can therefore have a more insightful conversation with the outsourcing destination and discuss every underperforming week without having to know much about how they operate. For example, in Week 19 performance was very low even though 7 additional CS agents were hired that week (why was that?). This is exactly the kind of question to put to the outsourcing destination's call center operations management. Perhaps the reason for the drop in Week 19 is that the newly hired agents were still in training, which means a higher AHT and thus a lower service level.
Now, as impressive as this is, DEA in this iteration had the following pitfalls:
For every DMU, DEA assigned whatever output weights maximized that DMU's efficiency score. This is good when you consider fairness to the outsourcing destination, since DEA takes their side to compensate for our lack of knowledge about their internal operations, but it renders the DMU scores somewhat incomparable, because the outputs have been weighted differently for each DMU.
Most scores fall in the middle and upper tiers, which does not seem natural. We call this "efficiency score inflation".
To avoid these pitfalls, we carried out another iteration of DEA in which we restrict the weights placed on outputs to certain values. Since any such values are somewhat arbitrary, we try several weight recipes to illustrate the concept; we are not arguing by any means that the chosen weights are optimal.
DEA’s second iteration (weight-restricted outputs):
Taking a client's perspective on this performance evaluation challenge, we need to follow closely the needs of a client company analyzing its outsourced call center's overall performance over time. The client wants both service level and Service Capacity, and most likely considers them equally important. Unrestricted DEA, as in the previous iteration, will attempt to maximize each DMU's efficiency score even at the expense of one or more of the outputs. For example, in Week 15, despite the fact that the service level dropped from 23.67% in the previous week to only 7.00%, DEA managed to secure a score of 84.43% (middle-tier band) for Week 15. It did so simply by placing no weight on service level and all the output weight on CSR Service Capacity. Also, since the inter-arrival time decreased dramatically in Week 15, the week got away with a high efficiency score.
As we can see from the previous DEA iteration (Appendix 4.3), DEA almost always weighs one output much more heavily than the other in order to maximize the DMU's efficiency score. To avoid this score inflation, we simply add a constraint on the output weights. For simplicity, in the base case we assume that service level and Service Capacity are equally important to the client, so the output weights are set equal. In total we try 3 different scenarios:
(1) Scenario one: the client cares about both outputs (Service level and Service Capacity) equally, so we place equal weights on both.
(2) Scenario two: the client's interest is in providing fast service, so all the weight is placed on CSR Service Capacity.
(3) Scenario three: the client cares only about service level, so all the output weight is placed on service level.
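A much-simplified sketch of the restricted idea: with the client's weights fixed on both sides, each week's score is its weighted-output to weighted-input ratio, rescaled so the best week scores 100%. Note that genuine weight-restricted DEA still optimizes any remaining free weights via a linear program, so this fixed-ratio version is only an illustration of the concept, and the data in the usage example is hypothetical:

```python
def fixed_weight_scores(inputs, outputs, in_w, out_w):
    """Weighted-output / weighted-input ratios, rescaled so the best DMU = 1.0.

    inputs, outputs: lists of per-DMU value lists; in_w, out_w: fixed weights.
    """
    ratios = [
        sum(w * y for w, y in zip(out_w, outs)) /
        sum(w * x for w, x in zip(in_w, ins))
        for ins, outs in zip(inputs, outputs)
    ]
    best = max(ratios)
    return [r / best for r in ratios]

# Three client scenarios expressed as output-weight choices:
equal_focus = (0.5, 0.5)       # scenario one: both outputs matter equally
capacity_focus = (1.0, 0.0)    # scenario two: all weight on Service Capacity
service_level_focus = (0.0, 1.0)  # scenario three: all weight on service level
```

Running the same data through the three weight choices shows directly how a week that looked efficient under one scenario can collapse under another, which is the effect Exhibit 4.13 documents.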
After completing the second iteration of analysis with differing weights on outputs, the new efficiency scores are as
follows:
Exhibit 4.13 DEA’s first and second iterations’ efficiency scores (Dataset 1)
As we can see, the efficiency scores of all DMUs changed dramatically as a result of changing the output weights. This shows how flexibly DEA incorporates the changing needs of different users, or of the same user over time. Constraining the output weights has also treated the "efficiency score inflation" symptom, giving a much more representative view of the outsourcing destination's performance over time. It is also worth observing that the unrestricted model is still conservative to some extent: the truly inefficient DMUs in the unrestricted analysis remain inefficient in the weight-restricted models.
DMUs    | Efficiency Scores (unrestricted) | Efficiency Scores (weight-restricted, equal weights) | Efficiency Scores (weight-restricted, CSR Service Capacity focus) | Efficiency Scores (weight-restricted, Service Level focus)
Week 14 | 100.00% | 92.76% | 92.13% | 83.07%
Week 15 | 84.43% | 84.04% | 84.43% | 24.54%
Week 16 | 82.89% | 82.43% | 82.89% | 18.07%
Week 17 | 95.50% | 94.27% | 94.16% | 56.13%
Week 18 | 100.00% | 100.00% | 100.00% | 53.55%
Week 19 | 66.85% | 66.62% | 66.85% | 17.47%
Week 20 | 74.21% | 68.10% | 67.64% | 52.95%
Week 21 | 72.09% | 72.09% | 72.09% | 28.37%
Week 22 | 90.88% | 90.78% | 90.88% | 30.65%
Week 23 | 98.90% | 98.50% | 98.90% | 18.33%
Week 24 | 89.90% | 89.54% | 89.90% | 18.27%
Week 25 | 76.07% | 75.94% | 76.07% | 24.99%
Week 26 | 68.59% | 67.97% | 67.70% | 35.21%
Week 27 | 68.52% | 53.24% | 52.36% | 60.96%
Week 28 | 70.61% | 65.94% | 65.04% | 56.53%
Week 29 | 81.56% | 72.01% | 70.67% | 71.49%
Week 30 | 99.30% | 98.17% | 97.67% | 38.61%
Week 32 | 100.00% | 95.77% | 93.91% | 73.35%
Week 33 | 89.48% | 86.29% | 84.88% | 59.05%
Week 34 | 100.00% | 83.61% | 80.59% | 100.00%
Week 35 | 93.82% | 72.58% | 69.70% | 93.89%
Week 36 | 84.27% | 70.71% | 68.24% | 83.81%
Week 37 | 87.35% | 77.81% | 75.62% | 76.90%
Week 38 | 89.01% | 69.88% | 67.15% | 89.01%
The applications of this result can be numerous. For example, the client can agree with the outsourcing destination on a specific efficiency score threshold to act as the minimum acceptable performance each week; if they fall below the threshold, they are penalized in some way, and if the poor performance persists, the whole account can be withdrawn from them.
In order to realize the extent of variability in evaluation introduced by changing the weights of outputs,
let’s take a look at the following chart.
Exhibit 4.14 DEA’s first and second iterations’ efficiency scores summarized in graph (Dataset 1)
After conducting both DEA iterations, we can see how DEA was able to tackle almost all the shortcomings of the previous analytics tools. DEA provided a single ranking that a client company can use to evaluate the overall performance of its outsourced call center over time, and the process did not require much subject-matter knowledge about running call centers.
The following table summarizes the different rankings achieved by the various preliminary and analytics tools explored in this chapter.
Exhibit 4.15 Summary of different rankings achieved by various analyses (Dataset 1)
This shows two main findings:
Performance is dependent on how it is defined
DEA was the only tool capable of defining overall performance on a single scale.
Further analysis with DEA – accommodating queue differences:
Look back at the DEA model's inputs used so far, especially the "inter-arrival time" input, which is the inverse of call volume. This metric is tricky to include, because it carries a hidden assumption whenever the company being investigated has multiple queues in its inbound call center. With multiple queues, each queue receives a different type of call; each call type has its own level of difficulty, which dictates a different AHT, so each queue has its own definition of AHT. As a result, using call volume aggregated across all queues as an input assumes that the proportion of calls coming into the different queues is the same every week. For Dataset 1 from Company A this is not true: we examined the call proportions coming into the different queues each week across the 24 weeks and found that they fluctuate considerably (see Appendices 4.4 and 4.5 for the queue data and the queue call proportions chart).
There are multiple ways of accommodating this assumption in the DEA to reflect the different queues, among them:
Conduct the DEA analysis at the queue level, eliminating the need to use aggregated call volume. The call types are then homogeneous and there are no more hidden assumptions. Although this method is a clear workaround for the problem, it might not fit the needs of a client interested in the aggregated overall performance of the call center rather than separate queues.
Another approach is to add an input reflecting the "Service Capacity expectation" each week based on the varying queue call proportions. This input is calculated as a SUMPRODUCT of the grand-AHT-per-queue array (the grand AHT being the average AHT per queue over the 24-week period) and the call-proportions-per-queue array. The result is an "AHT expectation" in minutes for each week, but for the purpose of input orientation ("the lower, the better") we need to transform the AHT expectation into its inverse, "Service Capacity expectation", and multiply it by 60 minutes per hour to get "Service Capacity expectation per hour". It is worth mentioning that we will
                    | Agent Contribution to Service Level ratio | ASSC Ratio | Regression Model 1 (Service Capacity) | Regression Model 2 (Service level) | DEA (unrestricted) | DEA (equal weights) | DEA (Service Capacity focus) | DEA (Service level focus)
Best Week(s)        | Week 34 | Week 34 | Week 18 | Week 18 | Weeks 14, 18, 32, and 34 | Week 18 | Week 18 | Week 34
Second Best Week(s) | Week 35 | Week 35 | Week 30 | Week 14 | Week 30 | Week 23 | Week 23 | Week 35
Worst Week(s)       | Week 19 | Week 19 | Week 21 | Week 24 | Week 19 | Week 27 | Week 27 | Week 19
still need to eliminate Week 31 from the dataset, as it is still an outlier (see Appendix 4.6 for the "Service Capacity expectation per hour" dataset).
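The SUMPRODUCT step can be sketched as follows. The per-queue grand AHTs and the weekly call mix below are made up for illustration; the actual values are in Appendices 4.4 through 4.6:

```python
# Hypothetical grand AHT per queue (minutes) and one week's call-mix proportions
# across Company A's 5 queues (illustrative values only):
grand_aht = [4.0, 6.0, 3.0, 5.0, 7.0]        # average AHT per queue over the period
call_mix  = [0.30, 0.20, 0.25, 0.15, 0.10]   # this week's call proportions (sum to 1)

# SUMPRODUCT of the two arrays gives the week's "AHT expectation" in minutes...
aht_expectation = sum(aht * p for aht, p in zip(grand_aht, call_mix))

# ...and its inverse, scaled by 60 minutes per hour, gives the input-oriented
# "Service Capacity expectation per hour" ("the lower the AHT, the better")
capacity_expectation = 60 / aht_expectation
```

Repeating this for each week's call mix yields the extra input column used in the third and fourth DEA iterations.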
We now illustrate the second suggestion by running two more iterations of DEA after including the "Service Capacity expectation" column as an input in Dataset 1 (see Appendix 4.7 for DEA's third-iteration efficiency scores, unrestricted). The results are as follows (see Appendix 4.8 for DEA's fourth-iteration results charted):
Exhibit 4.16 DEA’s third and fourth iterations’ (Accommodating Queue differences) efficiency scores
(Dataset 1)
As we can see, some DMUs' efficiency scores changed as a result of the inclusion of the new input. All the changes are in the DMUs' favor, of course, because otherwise DEA would simply place no weight on the new "Service Capacity expectation per hour" column. To sum up, this was an illustration of one possible method of accommodating queue differences while using DEA.
DMUs    | Efficiency Scores (unrestricted) | Efficiency Scores (weight-restricted, equal weights) | Efficiency Scores (weight-restricted, CSR Service Capacity focus) | Efficiency Scores (weight-restricted, Service Level focus)
Week 14 | 100.00% | 92.85% | 92.22% | 83.07%
Week 15 | 85.41% | 85.01% | 85.41% | 24.54%
Week 16 | 82.89% | 82.43% | 82.89% | 18.07%
Week 17 | 96.03% | 94.98% | 94.88% | 56.13%
Week 18 | 100.00% | 100.00% | 100.00% | 53.55%
Week 19 | 68.58% | 68.22% | 68.58% | 17.47%
Week 20 | 80.76% | 80.02% | 79.61% | 52.95%
Week 21 | 72.09% | 72.09% | 72.09% | 28.37%
Week 22 | 90.88% | 90.78% | 90.88% | 30.65%
Week 23 | 98.90% | 98.50% | 98.90% | 18.33%
Week 24 | 89.90% | 89.54% | 89.90% | 18.27%
Week 25 | 79.42% | 79.14% | 79.42% | 24.99%
Week 26 | 78.68% | 78.63% | 78.56% | 35.21%
Week 27 | 68.52% | 77.37% | 76.37% | 60.96%
Week 28 | 80.97% | 79.90% | 79.10% | 56.53%
Week 29 | 92.43% | 89.96% | 88.66% | 71.49%
Week 30 | 100.00% | 100.00% | 100.00% | 38.61%
Week 32 | 100.00% | 100.00% | 99.14% | 73.35%
Week 33 | 94.20% | 94.08% | 93.57% | 59.05%
Week 34 | 100.00% | 92.98% | 90.61% | 100.00%
Week 35 | 100.00% | 94.57% | 91.82% | 100.00%
Week 36 | 96.91% | 92.81% | 90.55% | 89.17%
Week 37 | 99.10% | 96.74% | 95.07% | 80.63%
Week 38 | 96.51% | 91.41% | 88.81% | 95.49%
4.6 Summary of Findings
                                        | Preliminary Data Analysis: Individual variables | Preliminary Data Analysis: Ratios | Theoretical Benchmarking: Queueing Analysis | Empirical Benchmarking: Linear Regression | Empirical Benchmarking: DEA
Can it combine multiple variables?      | No | Yes, but only if they produce a meaningful ratio | It can only combine Staffing level, Service Capacity, and Call volume | Yes, if they have a single dependent variable | Yes
Can it combine multiple output metrics? | No | Yes, but only if they produce a meaningful ratio | No | No | Yes
Does it require subject-matter knowledge? | No | Yes | Yes | No | Yes
Does it provide a single-scale definition of overall performance? | No | No | No | No | Yes
Is it fair to the outsourcing destination (given that we don't know much about their operations)? | No | Depends | No, because it depends on theory, which might differ from reality | Yes, to some extent, because it is data-sensitive | Yes, because it gives the benefit of the doubt
Exhibit 4.17 Summary of findings on Call Center aggregate performance analysis
Chapter 5: Agent Performance Assessment

5.1 Introduction
In this chapter we discuss the call center agent performance evaluation challenge. We assume the perspective of an inbound call center supervisor who is interested in evaluating the performance of his or her team; based on the evaluation results, the supervisor makes hiring, firing, training, and promotion decisions. We start by conducting a preliminary data analysis of "Dataset 2" from Company B, which acts as a baseline analysis. Then we apply both analytics tools, linear regression and DEA respectively, to the same dataset.
Inbound call center agent performance is usually defined in terms of three main dimensions:
1- Productivity, which focuses on how the agent manages his or her time during the shift. This dimension is represented by metrics such as AHT, ACW, and hold time. Call centers focus on productivity as a means of achieving the service level efficiently.
2- Quality, which focuses on the agent's performance during any interaction with customers, whether on or off the call. For example, the quality dimension uses metrics such as the quality scores given by internal quality coaches; CSAT scores can be another measure of quality.
3- Punctuality, which focuses on the agent's attendance, adherence to schedule and breaks, and conformance to the scheduled shift length. The main difference between adherence and conformance is that adherence concerns how closely the agent keeps to his or her login, logout, and scheduled break times, while conformance looks at the agent's commitment to the scheduled shift length: for example, an agent scheduled to work 9 hours who works only 8 is penalized on the conformance metric.
As a result, inbound call center supervisors constantly monitor these 3 categories of agent performance for each agent. When an agent underperforms in at least one of these dimensions, the supervisor is entitled to take any of the following actions:
Give a verbal warning to the agent
Make the agent sign an "action plan" to correct the poor performance, for which the agent will be held responsible if he or she does not follow through
Ask the training department to retrain the agent
Make the agent sign a warning letter
Fire the agent
Hence, the call center agent performance evaluation challenge is difficult from an inbound call center supervisor's perspective because:
Agent performance is highly multi-dimensional
The performance parameters, such as the difficulty of calls or the nature of customers, change very rapidly, which means that performance needs to be redefined quite often
Call center agents often vary in experience because of the industry's high turnover rate, so accounting for experience differences in measuring performance is not straightforward
It is not clear what defines a good performance target
The weights of the various performance dimensions in agents' scorecards are arbitrary, and there is no clear methodology for determining optimal weights
Agent performance needs to be defined on a single overall performance scale (i.e., there should be only one ranking of agents based on their overall performance)
Performance evaluation needs to be done in a way that supports training and development
Performance evaluation needs to be done in a way that supports training and development
In this chapter, we explore the usefulness of various analytics tools in meeting the needs of the agent performance evaluation challenge. We start with an attempt to use preliminary data analysis to see how far we can go without analytics; in the sections that follow, we compare and contrast the results with and without the use of the various analytical tools.
5.2 Preliminary Data Analysis
In this section, we attempt to analyze Dataset 2 without the use of analytics. We will use the findings of this section as a baseline when assessing the yield from deploying the various analytics tools.
For the purpose of standardization, we will transform some of the fields in Dataset 2 to fit the needs of the DEA analysis later. We perform the following transformations:
Dataset 2 is aggregated over a three-month period, so we use the day in the middle of the three months to calculate "Agent Experience" in months, using the DATEDIF() function in Excel
The fields "Queue AHT target" and "AHT" are transformed into their inverses, "Queue Service Capacity target per hour" and "CSR Service Capacity per hour" respectively
Agent absenteeism is converted into its complement, "Agent attendance", by deducting agent absenteeism (%) from 1
The fields "Agent Attendance" and "Agent Adherence" are combined into one field, their weighted average, which we call "Agent Punctuality"
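The four transformations can be sketched as small helper functions. The equal weighting inside "Agent Punctuality" is an assumption made here for illustration, as is the example data in the comments; the thesis does not state the weights it used:

```python
from datetime import date

def experience_months(hire_date, midpoint):
    """Whole months between hire date and the period midpoint,
    mimicking Excel's DATEDIF(start, end, "m")."""
    months = (midpoint.year - hire_date.year) * 12 + (midpoint.month - hire_date.month)
    if midpoint.day < hire_date.day:   # last partial month doesn't count
        months -= 1
    return months

def service_capacity_per_hour(aht_minutes):
    """Inverse of AHT, scaled to customers served per hour."""
    return 60 / aht_minutes

def attendance(absenteeism):
    """Complement of the absenteeism rate."""
    return 1 - absenteeism

def punctuality(attendance_pct, adherence_pct, w_attendance=0.5):
    """Weighted average of attendance and adherence.
    Equal weights are an illustrative assumption, not the thesis's choice."""
    return w_attendance * attendance_pct + (1 - w_attendance) * adherence_pct
```

For example, an agent hired on January 15 evaluated at an August 15 midpoint has 7 months of experience, and a 3-minute AHT maps to a service capacity of 20 customers per hour.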
Before we extract the data, we have a final change that we need to make, but first, let us take a look at the
suggested inputs and outputs for this agent performance evaluation challenge:
Outputs (Performance metrics) - orientation "the higher, the better":
CSR Service Capacity per hour
Quality
Punctuality
Inputs (Differentiating variables) - orientation "the lower, the better":
Agent Experience (months)
Queue Service Capacity target per hour
We think all these inputs and outputs are sound. However, we have a technical concern about the soundness of "Experience" as an input to this analysis: both analytics tools we chose are linear in nature, meaning they assume linear relationships between inputs and outputs. Experience, in general, follows a curve with diminishing returns over time, and hence is not linear. For that reason, we have to discard experience as a possible input at this level of analysis. After applying that change, Dataset 2 looks as follows:
Exhibit 5.1 Dataset 2 in analysis-ready formatting

Agent name | Queue name | Queue Service Capacity target per hour (input) | CSR Service Capacity per hour (output) | Quality (%) (output) | Punctuality (%) (output)
Agent 1  | Queue 1 | 20.81 | 20.57 | 95.2% | 95.3%
Agent 2  | Queue 1 | 20.81 | 20.93 | 94.3% | 91.0%
Agent 3  | Queue 1 | 20.81 | 17.48 | 83.3% | 94.8%
Agent 4  | Queue 1 | 20.81 | 21.43 | 91.1% | 92.9%
Agent 5  | Queue 1 | 20.81 | 19.35 | 94.6% | 96.8%
Agent 6  | Queue 1 | 20.81 | 16.22 | 86.6% | 91.7%
Agent 7  | Queue 1 | 20.81 | 19.46 | 98.7% | 95.1%
Agent 8  | Queue 1 | 20.81 | 18.65 | 89.2% | 86.5%
Agent 9  | Queue 1 | 20.81 | 19.78 | 98.9% | 93.8%
Agent 10 | Queue 1 | 20.81 | 18.65 | 90.3% | 95.8%
Agent 11 | Queue 1 | 20.81 | 19.05 | 92.5% | 95.4%
Agent 12 | Queue 1 | 20.81 | 15.93 | 98.4% | 83.7%
Agent 13 | Queue 1 | 20.81 | 21.18 | 90.9% | 91.5%
Agent 14 | Queue 1 | 20.81 | 21.95 | 99.6% | 91.9%
Agent 15 | Queue 1 | 20.81 | 22.78 | 99.8% | 94.1%
Agent 16 | Queue 1 | 20.81 | 14.94 | 86.6% | 95.0%
Agent 17 | Queue 1 | 20.81 | 20.93 | 91.6% | 83.0%
Agent 18 | Queue 1 | 20.81 | 15.58 | 90.9% | 88.8%
Agent 19 | Queue 1 | 20.81 | 16.74 | 90.0% | 95.8%
Agent 20 | Queue 1 | 20.81 | 18.37 | 95.7% | 91.7%
Agent 21 | Queue 2 | 17.73 | 13.28 | 95.9% | 98.2%
Agent 22 | Queue 2 | 17.73 | 19.89 | 100.0% | 96.3%
Agent 23 | Queue 2 | 17.73 | 13.48 | 84.8% | 92.4%
Agent 24 | Queue 2 | 17.73 | 12.50 | 92.0% | 93.4%
Agent 25 | Queue 3 | 16.98 | 14.69 | 70.0% | 94.8%
Agent 26 | Queue 3 | 16.98 | 16.90 | 91.7% | 95.9%
Agent 27 | Queue 3 | 16.98 | 14.63 | 100.0% | 83.9%
Agent 28 | Queue 3 | 16.98 | 14.94 | 81.0% | 91.7%
Agent 29 | Queue 3 | 16.98 | 18.00 | 90.9% | 96.8%
Agent 30 | Queue 3 | 16.98 | 21.43 | 57.8% | 96.9%
Next is our preliminary data analysis for dataset 2, with some descriptive statistics.
Descriptive statistics:
Since dataset 2 is a snapshot of different DMUs at the same point in time, a line chart would not be a good
representation, so let us examine a histogram of each output to get an idea of how variable it is.
Exhibit 5.2 CSR Service Capacity (output) graphed
CSR Service Capacity per hour
As we can see in Exhibit 5.2, this metric is dominated by "Agent 15", followed by "Agent 14". The average
service capacity across the sample is 17.99 customers per hour, while the lowest agent on this metric is "Agent 24" with
12.5 customers per hour.
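These summary figures can be reproduced directly from Exhibit 5.1. A minimal sketch (using only a handful of the 30 capacity values, so the sample mean below differs from the full-sample 17.99):

```python
# Identify the dominating and lowest agents on one output and compute the
# sample mean, using a few capacity values copied from Exhibit 5.1.
capacity = {"Agent 14": 21.95, "Agent 15": 22.78, "Agent 24": 12.50, "Agent 30": 21.43}

best = max(capacity, key=capacity.get)    # agent dominating the metric
worst = min(capacity, key=capacity.get)   # lowest agent on the metric
mean = sum(capacity.values()) / len(capacity)
print(best, worst, round(mean, 2))        # Agent 15 and Agent 24 on this subset
```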
Quality
In Exhibit 5.3 below, we can see that the quality metric is dominated by two agents, "Agents 22 and 27",
while several other agents are very close to the top, such as "Agents 14 and 15" with quality scores of 99.6%
and 99.8% respectively. The average quality score among the group is 91%, which is quite high.
However, our lowest performer on this metric is "Agent 30" with a quality score of 57.8%.
Exhibit 5.3 Quality (output) graphed
Exhibit 5.4 Punctuality (output) graphed
Punctuality
From Exhibit 5.4 above, we can see the variability introduced by this hybrid output (it combines
attendance and adherence). The group is dominated by "Agent 21" followed by "Agent 30", which is surprising
given agent 30's poor quality score. The average punctuality score is 92.83%, and the lowest performer on this
metric is "Agent 17".
Correlation Matrix:
The correlation matrix shows a moderate positive correlation between "CSR Service Capacity per hour" and
"Queue Service Capacity target per hour". This is expected: as the queue target changes, the difficulty of
calls changes, which means that service capacity should change as well. The surprising relationship is the positive
correlation between "Quality" and "Queue Service Capacity target", which is quite counter-intuitive because one
would expect quality in more demanding queues (i.e. in terms of speed of service) to be compromised; apparently
this is not the case for this sample of agents. The last important relationship to highlight is the negative
correlation between "Quality" and "Punctuality", which is also surprising, because one would expect agents who
care enough to show up on time to also care about the quality of their service. Apparently, that is not always true!
Exhibit 5.5 Correlation Matrix on Dataset 2
Using ratios to combine variables – Dataset 2:
In the previous chapter, we used ratios as a means to combine inputs and outputs into more
meaningful trends than single variables present independently. This is very challenging with
dataset 2, since we have only one input and the outputs are hard to combine into a meaningful
ratio. Regardless, we will look at the deviation of "Service Capacity" from its "Queue
target".
Percentage deviation from target Service Capacity: This isn't a true ratio; rather, it is a comparison
between an input, "Queue Service Capacity target per hour" – which can be considered a
performance target – and the output "CSR Service Capacity per hour". We look at the
percentage deviation of each agent's service capacity from target. As we can see, Agent 30 – not
agent 15 or 14 as we expected before – dominates the whole group, followed by agent
22. This confirms the value of combining different inputs and outputs together.
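This deviation can be computed directly from the columns of Exhibit 5.1. A minimal sketch (the agent figures below are copied from the exhibit):

```python
def pct_deviation_from_target(capacity: float, target: float) -> float:
    """Percentage by which an agent's service capacity exceeds (or falls
    short of) the queue's service capacity target."""
    return (capacity - target) / target * 100.0

# Agent 30 (queue 3, target 16.98) vs. Agent 15 (queue 1, target 20.81)
dev_agent30 = pct_deviation_from_target(21.43, 16.98)  # about +26.2%
dev_agent15 = pct_deviation_from_target(22.78, 20.81)  # about +9.5%
print(round(dev_agent30, 1), round(dev_agent15, 1))
```

Relative to its own queue target, Agent 30's raw capacity of 21.43 outranks Agent 15's higher absolute 22.78, which is exactly why combining input and output reorders the ranking.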
                                          Queue target   CSR Capacity   Quality (%)   Punctuality (%)
Queue Service Capacity target per hour    1.00
CSR Service Capacity per hour             0.48           1.00
Quality (%)                               0.39           0.15           1.00
Punctuality (%)                           -0.20          0.05           -0.21         1.00
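A matrix like Exhibit 5.5 boils down to pairwise Pearson correlations. A self-contained sketch (the five illustrative rows are taken from Exhibit 5.1, so the coefficient printed will not match the full-sample 0.39 exactly):

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Queue target vs. quality for five agents from Exhibit 5.1
target  = [20.81, 20.81, 17.73, 16.98, 16.98]
quality = [0.952, 0.833, 1.000, 0.578, 1.000]
print(round(pearson(target, quality), 2))
```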
Exhibit 5.6 Percentage Deviation from Target Service Capacity graphed (Dataset 2)
As a result of this comparison metric, we obtained our first ranking of agents in terms of
their productivity relative to their queue productivity targets. Next, we move into a discussion of using
performance targets in agent scorecards as an "absolute benchmark".
5.3 Absolute Benchmarking “Performance Targets”
Unlike the queueing theory used in chapter 4, we are not aware of a theory that predicts call center
agents' performance and provides an optimum against which to compare our agents. Call centers face the same
problem: they need an internally developed benchmark against which to compare their agents' performance in terms of
productivity, quality, and punctuality. Finding such a benchmark isn't easy, because once targets are picked, the
nature of call center calls and their challenges changes considerably over time, which requires revisiting the
performance targets quite often to adjust them to the new challenges. And unfortunately, the process of constantly
revisiting targets is very costly, susceptible to subjectivity, and time consuming!
For that reason, companies should ideally start with a blank sheet (i.e. no expectations for performance) and
run a pilot period in which they monitor performance very closely in order to define what a good target
should be. As we mentioned earlier, companies should focus on performance from the customer perspective to
make sure that they do not overproduce on non-value-adding metrics. This way, performance targets will be
custom tailored to the call center's specific set of conditions. Even then, call center management should constantly
listen to their agents and revisit these targets from time to time to see if they need adjustment; otherwise,
performance targets will become obsolete, agents will feel they are unrealistic, and agents will stop caring about
them!
Now, after setting the right performance targets, the next question is "what weight should be placed on each
performance dimension?" There is no single right answer, because the weights are an extension of the company's
strategy and positioning. For example, if a company is strategically positioned around its high-quality service,
it should stress quality by weighing it heavily in the agent's scorecard. To illustrate what the different weights
look like in a real scorecard, see Appendix 1.2 for sample scorecard weights and calculation methods from a major
telecommunications inbound call center in the MENA region.
To sum up, absolute benchmarking through internally developed performance targets is widely used in the
call center industry because of its perceived fairness, ease of use, and flexibility. On the other hand, it
becomes obsolete quite quickly if not revisited frequently. In addition, performance targets that aren't based on
solid research tend to be arbitrary, which renders performance evaluation efforts inaccurate. That is why
we don't recommend the use of performance targets unless there is enough solid data to support them.
As a result, in the following sections we will explore analytics tools that can serve as an alternative
path to measuring performance without "Performance Targets".
5.4 Empirical Peer Benchmarking – I “Multiple Regression” – Dataset 2
In this section we will apply linear multiple regression to dataset 2, to illustrate how it can be used to
evaluate individual agents' performance. We need to start by defining the dependent and independent variables.
But before we do so, let us remind ourselves of the analysis parameters:
1. DMU: In this analysis, we are looking at the performance of 30 agents in Company B (Dataset 2). So, the
DMU here is each agent of the 30 agents.
2. Independent variables (inputs): our single independent variable will be “Queue Service Capacity target per
hour”
3. Dependent variables (outputs): our dependent variables are “CSR Service Capacity per hour”, “Quality”, and
“Punctuality”.
For this analysis to take place, we will need three separate models, since linear regression can accommodate only a
single output per model. Hence, the model formulation for our three models is as follows:

CSR Service Capacity per hour (y1) = α1 + β1 · Queue Service Capacity target per hour (x) + ε1
Quality (y2) = α2 + β2 · Queue Service Capacity target per hour (x) + ε2
Punctuality (y3) = α3 + β3 · Queue Service Capacity target per hour (x) + ε3
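Each of these three models is a one-regressor OLS fit, which can be computed without a solver. A hedged sketch showing Model 1 only, on five illustrative rows from Exhibit 5.1 (the thesis estimated all three models on all 30 rows in Excel):

```python
def ols_simple(x, y):
    """Least-squares fit of y = alpha + beta * x for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    beta = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
            / sum((xi - mx) ** 2 for xi in x))
    alpha = my - beta * mx
    return alpha, beta

# Model 1: CSR Service Capacity per hour vs. queue target (five agents)
x = [20.81, 20.81, 17.73, 16.98, 16.98]   # Queue Service Capacity target per hour
y = [20.57, 17.48, 19.89, 21.43, 14.63]   # CSR Service Capacity per hour
alpha, beta = ols_simple(x, y)
fitted = [alpha + beta * xi for xi in x]
# Percentage deviation from the model estimate, as plotted in Exhibit 5.7
deviation = [(yi - fi) / fi * 100 for yi, fi in zip(y, fitted)]
```

Models 2 and 3 reuse `ols_simple` unchanged, swapping in quality and punctuality as the dependent variable.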
Our tool for estimating these models was Excel® (see Appendix 5.1 for model estimation reports). After
estimating the models successfully (see Appendix 5.2 for model results), we can discuss the conclusions that
we can extract from the model results, as follows:
Model 1 "Service Capacity": regression confirms our findings from the ratio analysis about
service capacity (i.e. the productivity aspect of the agents' performance). We can clearly see that
"Agent 30" still dominates this territory, followed by "Agents 22 and 15", while "Agent 24" remains
the poorest performer on this metric.
Exhibit 5.7 Percentage deviation from Model 1’s estimate graphed (Dataset 2)
Model 2 "Quality": The model shows that "Agent 27" is the dominating agent on the quality
metric, a very clear distinction, unlike the multiple agents hovering around the top (i.e.
Agents 14, 15 and 22) that the preliminary data analysis suggested. However, the model confirms the
preliminary data analysis's finding on the lowest performer, "Agent 30".
Exhibit 5.8 Percentage deviation from Model 2’s estimate graphed (Dataset 2)
Model 3 "Punctuality": In this model, we can also see the change brought by incorporating an input.
Instead of "Agent 21" dominating the group as the preliminary study suggested, regression
shows that "Agent 5" is now the dominating agent, followed closely by "Agent 21". Another
surprising result is that "Agent 27" has become the lowest performer in this group, rather than
"Agent 17".
Exhibit 5.9 Percentage deviation from Model 3's estimate graphed (Dataset 2)
After analyzing our findings from using regression as an analytics tool for benchmarking call center agent
performance, we can clearly see the value that regression brings even when we have only one input, as is the case
with dataset 2. Next, we will examine DEA's ability to analyze dataset 2.
5.5 Empirical Peer Benchmarking – II “Data Envelopment Analysis” – Dataset 2
As important as AHT is to operations managers, everyone is even more concerned about quality. You can be
the best agent at handling customers' calls as quickly as possible, but are you doing it right? Are you spending
enough time on the various aspects of quality the company defines for its customers? For example: are
you educating the customer about the company's service? More importantly, are you preserving customers'
privacy by confirming the phone password before sharing any account information? Are you directing customers
to online self-service in order to help reduce workload in the future? (See Appendix 1.1 for a sample
quality checklist.)
As a result, it is very uncommon (but it might happen!) to find an agent famous for their low AHT, although
they will be appreciated by their supervisor. On the other hand, almost all agents become well known to
their bosses for "quality service". Sometimes customers who receive good service want to do the agent a favor
and ask to talk to the agent's supervisor to praise the agent's service; this is usually called a "thank you!"
call, and bosses often email the whole customer service department, mentioning the agent by name, when an agent
gets one. As a result, we think that DEA is necessary for combining the three different outputs (Service
Capacity, Quality, and Punctuality) into a single holistic scale on which the different agents in dataset 2 can
be rated.
The DMUs in this analysis are the individual agents; dataset 2 contains 30 of them. For each agent
we analyze three main metrics: (1) Service Capacity, which represents the agent's productivity, (2) Quality, and (3)
Punctuality, which was calculated as the average of attendance and adherence. These metrics are the
outputs of our model, with the only available input being the "Queue Service Capacity target".
Unfortunately, since DEA is also a linear program, we have to discard "Experience" as an input. We
could run DEA with experience, but we would end up over-punishing the experienced agents at the expense of the
newly hired agents. So, we decided to run the DEA model with "Queue Service Capacity target" as the only input. It
is also worth mentioning that even if we had no inputs at all, we could still run an "output only" DEA (Lovell and
Pastor) (see Exhibit 5.1 for the input and output data in dataset 2). To summarize, our model is as follows:
DMU: each of the 30 agents
Inputs: Queue Service Capacity target per hour
Outputs: Productivity (Service Capacity), Quality, and Punctuality (average of adherence and attendance)
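For intuition, the unrestricted model with a single input can be read as: find the output weights that make a DMU's (weighted output ÷ input) score as large as possible relative to the best such score in the sample. Below is a hedged sketch that approximates this by scanning a coarse weight grid instead of solving the exact linear program (the thesis solved the LP proper):

```python
from itertools import product

def dea_efficiency(inputs, outputs, k, step=0.05):
    """Approximate weight-unrestricted CCR-style efficiency of DMU k.

    inputs:  one input value per DMU; outputs: one tuple of outputs per DMU.
    Scaling the weight vector leaves the score ratio unchanged, so we can
    restrict the search to weights summing to 1 and take the best ratio of
    DMU k's score to the sample maximum. A coarse grid stands in for the LP,
    so the result is a lower bound on the true efficiency.
    """
    n_out = len(outputs[0])
    steps = int(round(1 / step))
    best = 0.0
    for raw in product(range(steps + 1), repeat=n_out):
        if sum(raw) != steps:           # keep only weights summing to 1
            continue
        w = [r * step for r in raw]
        scores = [sum(wi * y for wi, y in zip(w, ys)) / x
                  for x, ys in zip(inputs, outputs)]
        best = max(best, scores[k] / max(scores))
    return best
```

With a single output this reduces to comparing output/input ratios: `dea_efficiency([1, 1], [(2,), (4,)], 0)` returns 0.5, since the first DMU produces half as much per unit of input as the best DMU.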
DEA’s first iteration (Unrestricted weights) – Output only:
The value of the weight-unrestricted model was illustrated in the last chapter, when we wanted to be
fair to the outsourcing destination. That made sense then because we didn't know much about their internal
operations. Here, this isn't the case, as the supervisor has access to everything that the agent does. That's why
we use the weight-unrestricted model as a baseline only, or as a way to show the improvement brought by the
next iterations. We think the weight-unrestricted DEA model is too conservative for internal
use. The results can be summarized as follows (see Appendix 5.3 for details on efficiency scores and associated
weights):
Exhibit 5.10 DEA’s first iteration’s (unrestricted weights) efficiency scores graphed (Dataset 2)
As we can see, five agents got away with an efficiency score of 100%, which was expected due to the lack of
weight restrictions. Even though this model was weight-unrestricted, we can still clearly see that there are
many inefficient agents. In the next iteration, we will run DEA with a specific set of weights, but in order to allow
some degree of freedom, we will not make the weights add up to one. The choice of weights was somewhat arbitrary; we
went with the same weights as in Appendix 1.2. But again, our discussion here isn't about DEA weight selection;
rather, it is to illustrate the use of the tool in the call center agent performance evaluation challenge.
DEA’s second iteration (weight restricted) – Output only:
As we said before, the chosen weights were as follows:
Productivity: 20%
Quality: 40%
Punctuality: 20%
This leaves 20% of freedom for DEA to assign to whichever output maximizes the DMU's efficiency
score. After running the analysis, the results were as follows:
Exhibit 5.11 DEA's second iteration's (restricted weights) efficiency scores (Dataset 2)
(The weights chosen by DEA were identical for every agent: Service Capacity 20%, Quality 60%, Punctuality 20%.)

Agent Name   Queue name   Efficiency Score
Agent 1      Queue 1      82.45%
Agent 2      Queue 1      83.43%
Agent 3      Queue 1      70.76%
Agent 4      Queue 1      84.85%
Agent 5      Queue 1      78.32%
Agent 6      Queue 1      66.72%
Agent 7      Queue 1      79.03%
Agent 8      Queue 1      75.05%
Agent 9      Queue 1      80.10%
Agent 10     Queue 1      75.48%
Agent 11     Queue 1      77.02%
Agent 12     Queue 1      66.68%
Agent 13     Queue 1      83.93%
Agent 14     Queue 1      87.44%
Agent 15     Queue 1      90.36%
Agent 16     Queue 1      62.51%
Agent 17     Queue 1      82.87%
Agent 18     Queue 1      64.93%
Agent 19     Queue 1      69.00%
Agent 20     Queue 1      74.92%
Agent 21     Queue 2      68.02%
Agent 22     Queue 2      94.65%
Agent 23     Queue 2      67.26%
Agent 24     Queue 2      64.25%
Agent 25     Queue 3      73.53%
Agent 26     Queue 3      85.42%
Agent 27     Queue 3      76.55%
Agent 28     Queue 3      75.78%
Agent 29     Queue 3      89.91%
Agent 30     Queue 3      100.00%
As we can see, this is quite an improvement over the last iteration; we get a more realistic view of agents'
performance. Agent 30 dominates the whole group despite his/her poor performance in
Quality, due to superior performance in service capacity and punctuality with a lower input than most of
the group (queue 3). We can also see that DEA uses the 20% degree of freedom identically across the whole
sample, which means that this specific recipe of weights maximizes the efficiency scores of all the DMUs in this
sample.
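The mechanics of the second iteration can be sketched in the same spirit. In the sketch below, outputs are first scaled by their sample maxima (so the weight floors act like comparable virtual shares), each output receives at least its floor, and the free 20% goes to whichever output raises the DMU's score most; scores are then normalized so the best DMU gets 100%. Both the scaling choice and the per-DMU free-share assignment are our simplifying assumptions, not the exact weight-restricted LP solved above:

```python
def restricted_scores(inputs, outputs, floors, free=0.2):
    """Sketch of weight-restricted, output-weighted scoring (simplified DEA).

    outputs: one tuple of output values per DMU; floors: minimum weight share
    per output (e.g. [0.2, 0.4, 0.2] with free=0.2 left to assign freely).
    NOTE: scaling each output by its sample maximum is an assumption made so
    that weight shares are comparable across differently scaled outputs.
    """
    n = len(outputs[0])
    maxes = [max(row[r] for row in outputs) for r in range(n)]
    scaled = [[row[r] / maxes[r] for r in range(n)] for row in outputs]
    raw = []
    for ys, x in zip(scaled, inputs):
        best_r = max(range(n), key=lambda r: ys[r])  # free share goes here
        w = [f + (free if r == best_r else 0.0) for r, f in enumerate(floors)]
        raw.append(sum(wi * y for wi, y in zip(w, ys)) / x)
    top = max(raw)
    return [s / top for s in raw]
```

Run on the rows of Exhibit 5.1, this reproduces the flavor of Exhibit 5.11: one DMU anchors the frontier at 100% and every other agent is scored relative to it.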
5.6 Summary of findings
In this chapter, we saw the value of the analytics tools in defining overall performance on a
single scale, especially DEA. Although dataset 2 was particularly hard for linear regression because it had only one
input and multiple outputs, we were still able to see the improvements brought by its use, not to mention
the picture painted by DEA, which is the dominant analytics tool in multi-dimensional performance
environments. So, getting back to the needs of the operations supervisor, let's summarize what
we've learned about the various analytical methods used in this chapter.
(Individual variables and ratios fall under Preliminary Data Analysis; Performance Targets under Absolute
Benchmarking; Linear Regression and DEA under Empirical Benchmarking.)

                                                   Individual   Ratios   Performance   Linear       DEA
                                                   variables             Targets       Regression
Can it combine multiple variables?                 No           No       No            Yes          Yes
Can it combine multiple output metrics?            No           No       No            No           Yes
Could it provide a single definition of overall
agent performance on a single scale?               No           No       No            No           Yes
Could it accommodate experience
as a differentiator?                               No           No       No            No           No

Exhibit 5.12 Summary of findings on agent performance analysis
Chapter 6: Conclusions and future research opportunities
In this chapter we will summarize our comparison between linear regression and DEA as possible analytics
tools for the two performance evaluation challenges we examined earlier. Then we will engage in a qualitative
discussion on the topics that we think can act as future research opportunities.
Conclusions:
After applying the suggested analytics methods (i.e. DEA and linear regression) to the two evaluation
challenges explored in this research – the call center's performance over time and the agents' overall
performance – we summarize our findings about each as follows:
Linear regression
This widely used analytics tool brought some very beneficial insights to our analyses; let us name a few:
It provided information about the direction, magnitude, and even the significance of the
relationships between the various inputs and outputs. This helps users understand exactly what
the data is saying, and in turn shape their prior knowledge of the inputs and outputs into a more
informed posterior understanding that is based on real data.
It also combined all the inputs in each analysis against each output separately, which produced a much
more meaningful picture of the outputs after controlling for the inputs, as we showed on several
occasions in the analysis chapters.
On the other hand, linear regression has a major shortcoming when applied to these two performance
evaluation challenges:
It could not incorporate multiple outputs into the same model. We ended up with more meaningful, but still
separate, scales of performance for each output, whereas our analysis needed a single scale of performance
in order to judge different DMUs holistically.
Overall, we see linear regression as an analytics tool that is perfect for blending multiple inputs with a single
output, especially because the data it provides in the model estimation report helps shape the user's
understanding of the relationships. But when it comes to multi-dimensional performance, linear regression might
not be the best tool.
Data Envelopment Analysis
DEA was a very useful tool in both of these analyses, especially the first one (i.e. aggregate performance
tracking). The main reasons are as follows:
DEA was able to produce a single scale of overall performance that combined all of the inputs and outputs
together. This was a perfect fit for the nature of our challenges, which made DEA the lead analytics tool for
these performance evaluation challenges.
The flexibility brought by the choice between "weight unrestricted" and "weight restricted" DEA models
was very valuable, in the sense that it allowed us to vary the degree of firmness we wanted in our
results. If we prefer more conservative results that give the benefit of the doubt to the various DMUs, we
should use the weight-unrestricted DEA model. On the other hand, if we would like a more unified weight
system across all DMUs, then weight-restricted DEA is the answer. This flexibility meets the needs of
different users: we think that weight-unrestricted DEA is a better fit for external users who have
minimal information about internal operations and/or the relationships between the various inputs and outputs,
while weight-restricted DEA is a better fit for internal use, where a more unified
measure is needed to reflect the greater knowledge the user has about performance and to promote
fairness in the internal performance evaluation process.
DEA is very easy to learn and deploy
On the other hand, DEA showed some shortcomings in tackling the two performance evaluation challenges in
this research. These shortcomings were as follows:
DEA could not incorporate obviously non-linear inputs such as "Agent Experience". This is a real
shortcoming of the methodology, but we think that there are many ways to linearize non-linear inputs.
DEA doesn't provide any information about the relationships between inputs and outputs, which suggests
that it assumes the user's prior knowledge of those relationships is sufficient. This might be true in the
case of internal use, but external users might not have the right understanding of the relationships between
inputs and outputs, which can lead to very inaccurate results if the tool is used haphazardly! As a result,
we think that DEA requires some knowledge of the relationships between the different inputs and outputs,
although it doesn't require much knowledge of the fine details of operations in a specific call center.
There are some question marks over the comparability, or fairness, of weight-unrestricted DEA models,
in the sense that the DMUs have different weights on inputs and outputs, which doesn't seem fair to some.
But we agree that in some situations it is very useful due to its more conservative nature.
Overall, we think that of the two methodologies, DEA is the dominating analytics tool that best meets
the needs of both of these challenges. However, we would like to stress the importance of careful use of DEA in this
application, since it doesn't correct the user if the tool is used incorrectly.
Future research opportunities:
In this research, we have tried to use the data we obtained in the best way possible to test both analytics
tools, and we came to conclusions about which tool is more fitting for these two specific performance evaluation
challenges (i.e. self-benchmarking and peer benchmarking). However, we faced some obstacles
that we couldn't tackle in this research due to time and project scope constraints; these obstacles should serve as
potential future research opportunities in the field of performance evaluation in call centers. These research
opportunities are:
We think that "Agent experience" can be successfully incorporated into the DEA analysis of agent
performance. Since DEA requires inputs and outputs to be linearly related, we would like to find a
way of turning agent experience into a linear variable that reflects either the agent's experience or the
amount of training or knowledge that he/she has, so that it can serve as a differentiator among agents.
Due to the uncommon nature of "weight unrestricted" DEA in evaluating performance, we would like to
investigate the psychological effect of using DEA in evaluation, on both the evaluator and the
DMU being evaluated. Will it make DMUs more motivated to perform? Will they understand how it works
and find a way around it? How can it be used to better align agents' motivation with that of the call center and
the client company?
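On the first of these opportunities, one candidate linearization (purely an illustrative assumption, not something validated in this research) is a concave transform of tenure, so that equal increments of the transformed variable represent roughly equal marginal gains:

```python
from math import log

def linearized_experience(months: float) -> float:
    """Map raw tenure (months) onto a diminishing-returns scale.

    The log(1 + months) form is an illustrative assumption: early months add
    a lot of skill, later months add progressively less, so a linear tool
    (regression or DEA) can consume the transformed value directly.
    """
    return log(1.0 + months)

# A year of extra tenure matters more to a rookie than to a veteran:
rookie_gain = linearized_experience(13) - linearized_experience(1)
veteran_gain = linearized_experience(61) - linearized_experience(49)
```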
Customer Centric/Business Attributes (rubric columns: Error, Error Reason, Error Type)
Asked repetitive questions
Asking the customer for an information already mentioned before
Didn’t confirm customer's understanding
Didn’t keep the conversation on track
Didn’t use acknowledgment listening
Didn’t wait for customer's confirmation
Inappropriate wordings used to fill the dead air
Interrupted the customer
Let the customer repeat the information
Not concentrated
Not confident and hesitant
Not patient
Over confident
Used mute button
Didn’t follow the correct conference protocol
Didn’t follow the correct transfer protocol
Didn't ask the customer for permission
Didn't give the reason for hold
Didn't thank the customer for hold
Didn't use the hold statement
Didn't wait for customer permission
Take the steps on notes if more than 3 steps
Follows proper call sequence then proceed accordingly with relevant information
Warm Greeting & Closing
Used language not match with the customer
Didn’t absorb customer anger
Didn’t allow customer to vent completely
Didn’t offer a sincere apology showing understanding of the situation
Not able to handle stress
Not empathetic
Not understanding
Showed understanding with over reacting
Didn’t offer extra assistance
Didn’t offer extra assistance in a willing way
Didn’t ask for customer name
Didn’t explain the reason for verification (when needed)
Uses of the Transitional phrase
Verified customer's data while no need for it
Verified mobile/Land Line number while no need for it
Didn’t verify mobile/Land line number
Didn’t verify mobile/Land line number properly
Didn’t ask customer for his mobile number/Land line
Didn't address the customer with his/her name using available data
Didn’t repeat mobile/Land Line number after the customer
Welcoming the customer
Didn't ask for customer permission to talk at the beginning of the call
Didn't ask for customer permission to talk at all
Didn't introduce himself at all
Didn't mention his name
Didn't mention the company name
Didn't explain any reason for the call
Explains wrong reason for the call
Addressed customer by wrong name
Didn’t address customer by his/her formal name (title)
Didn’t address customer by his/her name at all
Didn’t Follow with customer on time / as promised
Didn’t follow up with the customer when required
Didn’t make security verification
Didn’t verify customer address
Didn’t verify customer birth date
Didn’t verify customer contact numbers
Didn’t verify customer ID number
Released customer personal data
Verified customer's data while no need
Going extra mile to solve the customer’s problem and ability to retain the customer
Way of education / Satisfaction confirmed
Tariff advisory
Extra relevant information
What it is for customer interest
Cross and up selling
Profit to the organization
Attribute categories and error types (right-hand columns of the rubric):
Exceeding: Extra Mile / Revenue Opportunity (Outstanding/Unique), covering going the extra mile and cross/up-selling revenue opportunities
Maintain Confidentiality: Security verification – End User Critical Error
Addressing customer by formal name: Professional personalization – Non Critical Error
Follow up when required / Escalating or directing the customer to the correct channels – End User Critical Error
Staff members are attentive:
  Asking for customer permission to talk and/or introducing yourself – Non Critical Error
  Explains reason for the call – Non Critical Error
  Standard verification – Non Critical Error
  Controls the call well – Non Critical Error
  Offers extra assistance – Non Critical Error
  Offers a sincere apology showing understanding of the situation & displaying empathy – Non Critical Error
Appendix 1.1 Sample of Quality rubric “checklist” for a Major Telecommunications company in the MENA region:
1.2 Sample of Agent Scorecard for a Major Telecommunications company in the MENA region:
(Columns: Item | 2014 Target | 2014 Weight | Way of Calculation / Grading)

Core Job KPI (70%):
TRIM | Target: Above Competition | Weight: 10%
Service Level | Target: 80/20 | Weight: 5%
Repetitive Callers % | Target: 4% | Weight: 5%
AHT Inbound | Target: 210 | Weight: 5% | Grading: <=210 full score; 210-230 prorated; >230 lose score; <=200-210 score 110%; 190-200 score 120%
Hold % | Target: 2% | Weight: 5% | Grading: <=2% full score; 2.5%-3% prorated; >3% lose score; <=1%-2% score 110%; >1% score 120%
Not Ready (Personal/Business Related/ACW) | Target: Total 7% | Weight: 10% | Grading: <=7% full score; 7%-9% prorated; >9% lose score; <=5% score 110%; >5% score 120%
Quality Assurance | Target: 98% C / 95% NC | Weight: 20% | Deductions: 1 EUC -7.5%; 1 BC -5%; 1 NC -2.5% | Grading: <=99% score 110%; >100% score 120%
SPV Calls Observations | Target: 98% C / 95% NC | Weight: 5% | Deductions: 1 EUC -5%; 1 BC -5%; 1 NC -2.5% | Grading: <=99% score 110%; >100% score 120%
Rejection | Target: 2% | Weight: 5% | Go or No Go
Revenue Loss Mistakes | Target: Zero Revenue Loss | Weight: 5% | Deductions by loss amount: 50-100: 1%; 100-300: 2%; 300-500: 5%; 500-1000: 10%; 1000-2000: 15%; 2000-2500: 17%; 2500-3000: 19%; 3000-3500: 21%; 3500-4000: 23%
Dropped Calls vs Call Backs | Target: make call backs for 75% of dropped calls | Weight: 5% | Grading: >=75% get 5%; 65%-75% get 2%; <65% lose score; <=90% score 110%; >100% score 120%
Conformance | Target: 100% | Weight: 5% | Go or No Go
Adherence | Target: 99% | Weight: 5% | Go or No Go
Absenteeism | Target: 0 | Weight: 10% | 1 day = -5%

Global KPI (Corp / End User): 20%
Category totals: Productivity 20%; Quality 40%; Punctuality 20%. Total: 100%
4.1 Regression Models estimation reports
Model 1: Service Capacity

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.114541652
R Square            0.01311979
Adjusted R Square   -0.080868801
Standard Error      1.044698859
Observations        24

ANOVA
             df   SS            MS         F          Significance F
Regression   2    0.304694057   0.152347   0.1395892  0.870515725
Residual     21   22.91930984   1.091396
Total        23   23.2240039

                              Coefficients   Standard Error   t Stat     P-value     Lower 95%      Upper 95%
Intercept                     12.32151756    1.409568637      8.741339   1.933E-08   9.390159107    15.25287601
Staffing level (# of Agents)  -0.001044493   0.019298804      -0.054122  0.9573494   -0.041178552   0.039089567
Inter-arrival time (mins)     1.299324032    2.492550919      0.521283   0.6076239   -3.884219369   6.482867433

Model 2: Service Level
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.872916887
R Square 0.761983892
Adjusted R Square 0.739315692
Standard Error 0.104435151
Observations 24
ANOVA
df SS MS F Significance F
Regression 2 0.733250102 0.366625 33.6146614 2.84686E-07
Residual 21 0.229040715 0.010907
Total 23 0.962290816
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept -0.408218984 0.140909997 -2.89702 0.008622716 -0.701257364 -0.115180604
Staffing level (# of Agents) 0.01408381 0.001929239 7.300191 3.45934E-07 0.010071739 0.018095882
Inter-arrival time (mins) 0.418075245 0.249172217 1.677857 0.108197086 -0.100106747 0.936257237
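The coefficient estimates above feed directly into the expected-value columns of the next section. A minimal Python sketch for Week 14; the tiny discrepancies versus the tabulated 13.22, 14.11%, and 4.89% arise because the printed inputs are rounded:

```python
# Plug the fitted coefficients reported above back into the two linear models
# to reproduce the "expected" columns of the Section 4.2 tables.
def predict(intercept, b_staffing, b_interarrival, staffing, inter_arrival):
    """Prediction from a fitted two-regressor model: intercept + b1*x1 + b2*x2."""
    return intercept + b_staffing * staffing + b_interarrival * inter_arrival

# Week 14 inputs (18 agents, 0.71 min inter-arrival time, from Section 4.2).
capacity = predict(12.32151756, -0.001044493, 1.299324032, 18, 0.71)       # ~13.2 calls/hr
service_level = predict(-0.408218984, 0.01408381, 0.418075245, 18, 0.71)   # ~0.14 (14%)

# Percentage deviation from the model estimate, as tabulated in Section 4.2.
deviation = 13.87 / capacity - 1   # observed 13.87 vs. expected -> ~4.9%
```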
4.2 Regression model analysis results
Model 1: Service Capacity
Intercept Staffing level inter-arrival time
12.321518 -0.001044493 1.299324032
Week | Staffing level (# of Agents) | Inter-arrival time (mins) | CSR Service Capacity per hour | Expected CSR Service Capacity per hour | Percentage deviation from model estimate
Week 14 18 0.71 13.87 13.22 4.89%
Week 15 18 0.52 12.71 12.98 -2.09%
Week 16 18 0.46 11.57 12.91 -10.38%
Week 17 18 0.61 14.17 13.10 8.25%
Week 18 18 0.50 15.05 12.95 16.21%
Week 19 25 0.52 10.48 12.97 -19.21%
Week 20 25 0.61 12.37 13.09 -5.50%
Week 21 25 0.51 11.13 12.96 -14.14%
Week 22 25 0.49 13.42 12.93 3.73%
Week 23 27 0.44 13.02 12.86 1.20%
Week 24 27 0.48 12.91 12.91 -0.05%
Week 25 27 0.56 12.85 13.02 -1.32%
Week 26 33 0.64 12.96 13.11 -1.18%
Week 27 36 0.82 12.86 13.35 -3.63%
Week 28 36 0.69 13.42 13.18 1.89%
Week 29 36 0.66 13.95 13.14 6.17%
Week 30 36 0.48 14.22 12.91 10.15%
Week 32 39 0.50 14.06 12.93 8.73%
Week 33 39 0.52 13.27 12.96 2.44%
Week 34 39 0.53 12.87 12.97 -0.78%
Week 35 45 0.61 12.88 13.07 -1.51%
Week 36 45 0.62 12.75 13.08 -2.52%
Week 37 50 0.59 13.42 13.04 2.93%
Week 38 64 0.62 12.50 13.06 -4.27%
Model 2: Service Level
Intercept Staffing level inter-arrival time
-0.408219 0.01408381 0.418075245
Week | Staffing level (# of Agents) | Inter-arrival time (mins) | Service Level (%) | Expected Service Level (%) | Percentage deviation
Week 14 18 0.71 23.67% 14.11% 67.72%
Week 15 18 0.52 7.00% 6.33% 10.46%
Week 16 18 0.46 5.15% 3.94% 30.79%
Week 17 18 0.61 16.00% 10.03% 59.55%
Week 18 18 0.50 15.26% 5.47% 179.23%
Week 19 25 0.52 6.91% 16.19% -57.31%
Week 20 25 0.61 20.96% 19.81% 5.79%
Week 21 25 0.51 11.23% 15.86% -29.19%
Week 22 25 0.49 12.13% 14.92% -18.69%
Week 23 27 0.44 7.84% 15.51% -49.48%
Week 24 27 0.48 7.81% 17.17% -54.53%
Week 25 27 0.56 10.68% 20.70% -48.39%
Week 26 33 0.64 18.40% 32.28% -43.01%
Week 27 36 0.82 34.74% 44.04% -21.11%
Week 28 36 0.69 32.22% 38.59% -16.50%
Week 29 36 0.66 40.75% 37.33% 9.16%
Week 30 36 0.48 21.74% 30.13% -27.84%
Week 32 39 0.50 42.46% 34.92% 21.57%
Week 33 39 0.52 35.70% 35.85% -0.42%
Week 34 39 0.53 61.75% 36.32% 70.03%
Week 35 45 0.61 66.85% 48.25% 38.54%
Week 36 45 0.62 59.71% 48.55% 22.99%
Week 37 50 0.59 52.76% 54.28% -2.79%
Week 38 64 0.62 64.08% 75.21% -14.80%
4.3 DEA results
DEA first iteration results
DMU | Efficiency Score (unrestricted) | Weight on CSR Service Capacity per hour | Weight on Service Level
Week 14 100.00% 5.00% 95.00%
Week 15 84.43% 100.00% 0.00%
Week 16 82.89% 100.00% 0.00%
Week 17 95.50% 7.00% 93.00%
Week 18 100.00% 100.00% 0.00%
Week 19 66.85% 100.00% 0.00%
Week 20 74.21% 2.00% 98.00%
Week 21 72.09% 100.00% 0.00%
Week 22 90.88% 100.00% 0.00%
Week 23 98.90% 100.00% 0.00%
Week 24 89.90% 100.00% 0.00%
Week 25 76.07% 100.00% 0.00%
Week 26 68.59% 23.00% 77.00%
Week 27 68.52% 2.00% 98.00%
Week 28 70.61% 8.00% 92.00%
Week 29 81.56% 2.00% 98.00%
Week 30 99.30% 23.00% 77.00%
Week 32 100.00% 23.00% 77.00%
Week 33 89.48% 23.00% 77.00%
Week 34 100.00% 0.00% 100.00%
Week 35 93.82% 0.00% 100.00%
Week 36 84.27% 2.00% 98.00%
Week 37 87.35% 7.00% 93.00%
Week 38 89.01% 0.00% 100.00%
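Each week's unrestricted score lets that week (the DMU) pick the output weights most favorable to it, which is why the weight columns vary row by row. A minimal pure-Python sketch of one such formulation — an output-only radial DEA over the two outputs, using a five-week subset of the Section 4.6 data. The thesis's exact DEA specification may include inputs or other refinements, so scores from this sketch need not match the table row for row:

```python
from itertools import combinations

# DMU outputs (CSR service capacity per hour, service level), taken from the
# Section 4.6 dataset for Weeks 14-18 only, to keep the example small.
outputs = {
    "Week 14": (13.87, 0.2367),
    "Week 15": (12.71, 0.0700),
    "Week 16": (11.57, 0.0515),
    "Week 17": (14.17, 0.1600),
    "Week 18": (15.05, 0.1526),
}

def dea_output_only(outputs):
    """Unrestricted output-only radial DEA:
    score_k = max over u >= 0 of u . y_k, subject to u . y_j <= 1 for all j.
    With two outputs the LP optimum lies at a vertex of the feasible region,
    so we enumerate vertices instead of calling an LP solver."""
    ys = list(outputs.values())
    # Axis vertices: all weight on a single output.
    verts = [(1.0 / max(y[0] for y in ys), 0.0),
             (0.0, 1.0 / max(y[1] for y in ys))]
    # Vertices where two DMU constraints are tight: u . y_a = u . y_b = 1.
    for (a1, a2), (b1, b2) in combinations(ys, 2):
        det = a1 * b2 - a2 * b1
        if abs(det) < 1e-12:
            continue
        u1, u2 = (b2 - a2) / det, (a1 - b1) / det
        if u1 >= 0 and u2 >= 0 and all(u1 * y1 + u2 * y2 <= 1 + 1e-9
                                       for y1, y2 in ys):
            verts.append((u1, u2))
    return {k: max(u1 * y1 + u2 * y2 for u1, u2 in verts)
            for k, (y1, y2) in outputs.items()}

scores = dea_output_only(outputs)
# Week 18 (highest capacity) and Week 14 (highest service level) come out
# efficient, mirroring the 100% scores those two weeks receive above.
```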
4.4 Queue Data
Columns: Week, then Call Volume and AHT for each of Queue 1 through Queue 6.
Week 14 2596 4.32 1799 4.98 2603 3.81 51 3.23 46 4.51 27 7.69
Week 15 3725 4.64 2460 5.34 3263 4.30 79 4.09 69 5.03 68 4.47
Week 16 4300 4.92 2602 5.92 3670 4.66 93 4.04 95 7.17 97 6.67
Week 17 3424 3.94 1883 5.38 2749 3.76 82 5.44 73 3.94 53 3.70
Week 18 3828 4.05 2391 3.92 3570 3.92 110 3.79 90 5.21 75 3.82
Week 19 3773 5.78 2447 6.60 3102 5.04 169 3.65 115 4.52 57 4.35
Week 20 2784 5.17 2187 5.17 3069 4.43 82 4.26 92 4.72 73 3.34
Week 21 2733 5.56 2540 7.18 4296 4.45 85 3.41 85 1.83 74 2.55
Week 22 2792 3.80 2596 5.50 4611 4.43 51 3.16 100 3.25 112 3.69
Week 23 3235 4.85 2811 6.42 5036 3.99 121 2.96 140 3.09 168 2.29
Week 24 3148 4.80 2405 5.91 4604 4.42 84 1.84 129 3.15 182 2.14
Week 25 2658 4.48 2114 5.97 3889 4.40 72 1.78 111 3.28 124 1.78
Week 26 2200 5.45 2013 5.79 3397 4.01 37 1.59 114 2.30 153 1.22
Week 27 1620 5.04 1563 4.94 2809 4.44 49 2.45 51 3.16 76 2.15
Week 28 1596 4.78 1996 4.84 3589 4.17 31 2.96 74 3.01 54 2.78
Week 29 2870 4.69 2310 4.31 2357 4.00 52 1.77 27 1.62 60 1.81
Week 30 7422 4.21 2841 4.21 0 0 25 3.70 45 5.09 70 4.83
Week 31 3435 3.82 969 5.21 0 0 27 2.55 23 4.95 30 2.70
Week 32 6954 4.12 2978 4.93 0 0 53 4.29 77 3.35 59 3.65
Week 33 6689 4.21 2823 5.45 0 0 56 3.96 72 4.72 49 4.16
Week 34 6628 4.19 2695 6.45 0 0 42 2.98 61 5.61 61 2.47
Week 35 5213 4.27 2859 5.86 0 0 33 4.82 53 4.19 43 4.11
Week 36 5264 4.48 2680 5.59 0 0 61 3.84 55 3.76 46 2.80
Week 37 5694 4.16 2700 5.19 0 0 30 5.62 48 3.09 66 2.90
Week 38 5380 4.31 2651 6.38 0 0 27 4.12 44 3.09 35 2.73
4.5 Queue call-proportions chart
(chart not reproduced in the text version)
4.6 Service Capacity Expectation per hour Dataset
Week | Staffing level (# of Agents) | Inter-arrival time (mins) | CSR Service Capacity Expectation per hour | CSR Service Capacity per hour | Service Level (%)
Week 14 18 0.71 14.59 13.87 23.67%
Week 15 18 0.52 14.42 12.71 7.00%
Week 16 18 0.46 14.49 11.57 5.15%
Week 17 18 0.61 14.49 14.17 16.00%
Week 18 18 0.50 14.61 15.05 15.26%
Week 19 25 0.52 14.37 10.48 6.91%
Week 20 25 0.61 14.63 12.37 20.96%
Week 21 25 0.51 15.08 11.13 11.23%
Week 22 25 0.49 15.19 13.42 12.13%
Week 23 27 0.44 15.18 13.02 7.84%
Week 24 27 0.48 15.23 12.91 7.81%
Week 25 27 0.56 15.17 12.85 10.68%
Week 26 33 0.64 15.09 12.96 18.40%
Week 27 36 0.82 15.24 12.86 34.74%
Week 28 36 0.69 15.38 13.42 32.22%
Week 29 36 0.66 14.08 13.95 40.75%
Week 30 36 0.48 12.49 14.22 21.74%
Week 32 39 0.50 12.45 14.06 42.46%
Week 33 39 0.52 12.45 13.27 35.70%
Week 34 39 0.53 12.47 12.87 61.75%
Week 35 45 0.61 12.31 12.88 66.85%
Week 36 45 0.62 12.37 12.75 59.71%
Week 37 50 0.59 12.39 13.42 52.76%
Week 38 64 0.62 12.36 12.50 64.08%
4.7 DEA third iteration – Efficiency scores (unrestricted)
DMU | Efficiency Score (unrestricted) | Weight on CSR Service Capacity per hour | Weight on Service Level
Week 14 100.00% 7.00% 93.00%
Week 15 85.41% 100.00% 0.00%
Week 16 82.89% 100.00% 0.00%
Week 17 96.03% 7.00% 93.00%
Week 18 100.00% 100.00% 0.00%
Week 19 68.58% 100.00% 0.00%
Week 20 80.76% 14.00% 86.00%
Week 21 72.09% 100.00% 0.00%
Week 22 90.88% 100.00% 0.00%
Week 23 98.90% 100.00% 0.00%
Week 24 89.90% 100.00% 0.00%
Week 25 79.42% 100.00% 0.00%
Week 26 78.68% 37.00% 63.00%
Week 27 68.52% 2.00% 98.00%
Week 28 80.97% 14.00% 86.00%
Week 29 92.43% 14.00% 86.00%
Week 30 100.00% 100.00% 0.00%
Week 32 100.00% 63.00% 37.00%
Week 33 94.20% 63.00% 37.00%
Week 34 100.00% 0.00% 100.00%
Week 35 100.00% 0.00% 100.00%
Week 36 96.91% 19.00% 81.00%
Week 37 99.10% 19.00% 81.00%
Week 38 96.51% 19.00% 81.00%
4.8 DEA fourth iteration – weight restricted – results charted
(charts not reproduced in the text version)
5.1 Regression Model estimation reports:
Model 1: Service Capacity
Model 2: Quality
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.475770044
R Square 0.226357135
Adjusted R Square 0.198727033
Standard Error 2.569468021
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 54.08765213 54.08765213 8.192410317 0.007875183
Residual 28 184.8606455 6.602165911
Total 29 238.9482976
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept 2.263454857 5.514808657 0.410432165 0.68461279 -9.033118581 13.5600283
Queue target Service Capacity per hour 0.801049863 0.279868295 2.86223869 0.007875183 0.227765648 1.374334078
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.387681016
R Square 0.15029657
Adjusted R Square 0.119950019
Standard Error 0.084678424
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 0.035512828 0.035512828 4.952673844 0.03427946
Residual 28 0.200772192 0.007170435
Total 29 0.23628502
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept 0.504402038 0.181743964 2.775344098 0.009714394 0.132116404 0.876687672
Queue target Service Capacity per hour 0.020525943 0.009223234 2.225460367 0.03427946 0.001633003 0.039418882
Model 3: Punctuality
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.195946224
R Square 0.038394923
Adjusted R Square 0.004051884
Standard Error 0.040414899
Observations 30
ANOVA
df SS MS F Significance F
Regression 1 0.001826073 0.001826073 1.117982698 0.29939071
Residual 28 0.045734193 0.001633364
Total 29 0.047560266
Coefficients | Standard Error | t Stat | P-value | Lower 95% | Upper 95%
Intercept 1.019717839 0.086741859 11.75577565 2.41601E-12 0.842035195 1.197400484
Queue target Service Capacity per hour -0.004654462 0.00440202 -1.057347009 0.29939071 -0.013671591 0.004362666
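Each report above is a one-regressor least-squares fit of an agent metric on the queue's target service capacity. The closed-form OLS estimates behind such a summary can be sketched in a few lines of Python; the points below are synthetic, exactly-linear data used only to sanity-check the formulas, not the thesis's agent data:

```python
# Closed-form simple (one-regressor) ordinary least squares.
def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return mean_y - slope * mean_x, slope

# The three queue targets from the tables above as x; y constructed as
# 2 + 0.8x so the fit should recover those coefficients exactly.
xs = [16.98, 17.73, 20.81]
ys = [2 + 0.8 * x for x in xs]
intercept, slope = ols_fit(xs, ys)
```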
5.2 Regression Models results:
Model 1: Service Capacity
Intercept Queue Service Capacity target
2.263454857 0.801049863
Agent Name | Queue target Service Capacity per hour | CSR Service Capacity per hour | Expected Service Capacity | Percentage deviation from model estimate
Agent 1 20.81 20.57 18.93 8.7%
Agent 2 20.81 20.93 18.93 10.6%
Agent 3 20.81 17.48 18.93 -7.7%
Agent 4 20.81 21.43 18.93 13.2%
Agent 5 20.81 19.35 18.93 2.2%
Agent 6 20.81 16.22 18.93 -14.3%
Agent 7 20.81 19.46 18.93 2.8%
Agent 8 20.81 18.65 18.93 -1.5%
Agent 9 20.81 19.78 18.93 4.5%
Agent 10 20.81 18.65 18.93 -1.5%
Agent 11 20.81 19.05 18.93 0.6%
Agent 12 20.81 15.93 18.93 -15.9%
Agent 13 20.81 21.18 18.93 11.9%
Agent 14 20.81 21.95 18.93 15.9%
Agent 15 20.81 22.78 18.93 20.3%
Agent 16 20.81 14.94 18.93 -21.1%
Agent 17 20.81 20.93 18.93 10.6%
Agent 18 20.81 15.58 18.93 -17.7%
Agent 19 20.81 16.74 18.93 -11.6%
Agent 20 20.81 18.37 18.93 -3.0%
Agent 21 17.73 13.28 16.47 -19.3%
Agent 22 17.73 19.89 16.47 20.8%
Agent 23 17.73 13.48 16.47 -18.1%
Agent 24 17.73 12.50 16.47 -24.1%
Agent 25 16.98 14.69 15.87 -7.4%
Agent 26 16.98 16.90 15.87 6.5%
Agent 27 16.98 14.63 15.87 -7.8%
Agent 28 16.98 14.94 15.87 -5.9%
Agent 29 16.98 18.00 15.87 13.4%
Agent 30 16.98 21.43 15.87 35.1%
(Queue grouping: Agents 1-20 in Queue 1; Agents 21-24 in Queue 2; Agents 25-30 in Queue 3.)
Model 2: Quality
Intercept Queue Service Capacity target
0.504402038 0.020525943
Agent Name | Queue target Service Capacity per hour | Quality | Expected Quality | Percentage deviation from model estimate
Agent 1 20.81 95.2% 93.2% 2.2%
Agent 2 20.81 94.3% 93.2% 1.3%
Agent 3 20.81 83.3% 93.2% -10.5%
Agent 4 20.81 91.1% 93.2% -2.2%
Agent 5 20.81 94.6% 93.2% 1.5%
Agent 6 20.81 86.6% 93.2% -7.1%
Agent 7 20.81 98.7% 93.2% 5.9%
Agent 8 20.81 89.2% 93.2% -4.2%
Agent 9 20.81 98.9% 93.2% 6.2%
Agent 10 20.81 90.3% 93.2% -3.0%
Agent 11 20.81 92.5% 93.2% -0.7%
Agent 12 20.81 98.4% 93.2% 5.7%
Agent 13 20.81 90.9% 93.2% -2.4%
Agent 14 20.81 99.6% 93.2% 6.9%
Agent 15 20.81 99.8% 93.2% 7.1%
Agent 16 20.81 86.6% 93.2% -7.1%
Agent 17 20.81 91.6% 93.2% -1.7%
Agent 18 20.81 90.9% 93.2% -2.4%
Agent 19 20.81 90.0% 93.2% -3.4%
Agent 20 20.81 95.7% 93.2% 2.7%
Agent 21 17.73 95.9% 86.8% 10.4%
Agent 22 17.73 100.0% 86.8% 15.2%
Agent 23 17.73 84.8% 86.8% -2.4%
Agent 24 17.73 92.0% 86.8% 5.9%
Agent 25 16.98 70.0% 85.3% -17.9%
Agent 26 16.98 91.7% 85.3% 7.5%
Agent 27 16.98 100.0% 85.3% 17.2%
Agent 28 16.98 81.0% 85.3% -5.0%
Agent 29 16.98 90.9% 85.3% 6.6%
Agent 30 16.98 57.8% 85.3% -32.3%
(Queue grouping: Agents 1-20 in Queue 1; Agents 21-24 in Queue 2; Agents 25-30 in Queue 3.)
Model 3: Punctuality
Intercept Queue Service Capacity target
1.019717839 -0.004654462
Agent Name | Queue target Service Capacity per hour | Punctuality | Expected Punctuality | Percentage deviation from model estimate
Agent 1 20.81 95.3% 0.92 3.2%
Agent 2 20.81 91.0% 0.92 -1.4%
Agent 3 20.81 94.8% 0.92 2.8%
Agent 4 20.81 92.9% 0.92 0.6%
Agent 5 20.81 96.8% 0.92 4.9%
Agent 6 20.81 91.7% 0.92 -0.6%
Agent 7 20.81 95.1% 0.92 3.1%
Agent 8 20.81 86.5% 0.92 -6.2%
Agent 9 20.81 93.8% 0.92 1.7%
Agent 10 20.81 95.8% 0.92 3.8%
Agent 11 20.81 95.4% 0.92 3.4%
Agent 12 20.81 83.7% 0.92 -9.4%
Agent 13 20.81 91.5% 0.92 -0.9%
Agent 14 20.81 91.9% 0.92 -0.4%
Agent 15 20.81 94.1% 0.92 1.9%
Agent 16 20.81 95.0% 0.92 2.9%
Agent 17 20.81 83.0% 0.92 -10.1%
Agent 18 20.81 88.8% 0.92 -3.7%
Agent 19 20.81 95.8% 0.92 3.8%
Agent 20 20.81 91.7% 0.92 -0.6%
Agent 21 17.73 98.2% 0.94 4.8%
Agent 22 17.73 96.3% 0.94 2.8%
Agent 23 17.73 92.4% 0.94 -1.4%
Agent 24 17.73 93.4% 0.94 -0.4%
Agent 25 16.98 94.8% 0.94 0.7%
Agent 26 16.98 95.9% 0.94 1.9%
Agent 27 16.98 83.9% 0.94 -10.8%
Agent 28 16.98 91.7% 0.94 -2.5%
Agent 29 16.98 96.8% 0.94 3.0%
Agent 30 16.98 96.9% 0.94 3.0%
(Queue grouping: Agents 1-20 in Queue 1; Agents 21-24 in Queue 2; Agents 25-30 in Queue 3.)
5.3 DEA first iteration efficiency scores and weights:
Agent Name | Efficiency Score | Weight on Service Capacity | Weight on Quality | Weight on Punctuality
(weights chosen by DEA)
Agent 1 86.47% 14% 86% 0%
Agent 2 87.45% 14% 86% 0%
Agent 3 79.90% 0% 0% 100%
Agent 4 88.42% 14% 86% 0%
Agent 5 83.94% 2% 23% 75%
Agent 6 77.49% 0% 49% 51%
Agent 7 84.13% 0% 49% 51%
Agent 8 78.99% 14% 86% 0%
Agent 9 84.64% 14% 86% 0%
Agent 10 81.92% 2% 23% 75%
Agent 11 82.61% 2% 23% 75%
Agent 12 81.38% 1% 99% 0%
Agent 13 87.55% 14% 86% 0%
Agent 14 91.85% 14% 86% 0%
Agent 15 94.61% 14% 86% 0%
Agent 16 80.01% 0% 0% 100%
Agent 17 86.89% 14% 86% 0%
Agent 18 78.04% 0% 49% 51%
Agent 19 80.77% 0% 49% 51%
Agent 20 81.42% 0% 66% 34%
Agent 21 98.93% 0% 49% 51%
Agent 22 100.00% 14% 86% 0%
Agent 23 91.37% 0% 0% 100%
Agent 24 94.47% 0% 49% 51%
Agent 25 97.80% 0% 0% 100%
Agent 26 99.91% 0% 49% 51%
Agent 27 100.00% 1% 99% 0%
Agent 28 94.71% 0% 0% 100%
Agent 29 100.00% 2% 23% 75%
Agent 30 100.00% 100% 0% 0%
(Queue grouping: Agents 1-20 in Queue 1; Agents 21-24 in Queue 2; Agents 25-30 in Queue 3.)