Torrent pharmaceuticals limited ; suchit

Project Report on

Developing statistical models of demand forecasting for domestic trade market operations of Torrent Pharmaceuticals Ltd.

In partial fulfillment of requirements of

Master of Business Administration (2006-08)

Submitted by

SUCHIT SHAH

Roll no. 51

MBA-I

SUBMITTED TO AES PGIBM

Undertaken at Torrent Pharmaceuticals Limited

ACKNOWLEDGEMENT

I hereby take this opportunity to express my gratitude to Mr. K G Ramchandran, General Manager Human Resources, for allowing me to undertake my summer training at a well-reputed organization like Torrent Pharmaceuticals Limited, and to Ms. Miti Randeri and Ms. Mallika Priyadarshini, Assistant Manager HR, for taking care of all our official requirements.

I express my sincere thanks and gratitude to Mr. Vipul Patel, General Manager Supply Chain Management, and Mr. Chandan Chatterjee, AGM Supply Chain Management, for guiding and encouraging me to carry out my project work successfully.

I wish to convey my deepest regards and thanks to my project guides, Mr. Bhavesh Nainani, Manager DPC, and Mr. Deep Vyas, Manager DPC, for all their help, timely guidance and feedback in spite of a very busy schedule. They always managed to find time to sit with me and provide the necessary guidelines and ideas.

I also wish to express my sincere thanks to Mr. Bhavin Shah, Assistant Manager DPC, and Mr. Jayant Nikhare, Assistant Manager DPC, for their help and support in every possible way.

Finally, I wish to thank all the staff of the Supply Chain Management Department for their kind cooperation during the tenure of my project.

Suchit Shah
Summer Trainee

May-July (2007)
AES Post Graduate Institute of Business Management
Ahmedabad

TABLE OF CONTENTS

Executive Summary
Torrent Group: Overview
Mission, Vision & Values
Objective
Salient Features
Project Constraint
Assumptions made during the project
Project Overview
Benefits Expected
Demand Forecasting: Introduction
The basic steps in a forecasting task
Company network
Demand Planning @SCM Dept
Exponential Smoothing
Triple Exponential Smoothing
Multiple Regressions with ‘n’ factors
Multiple Regression with MS Excel
LINEST function
Fitting Multiple Regression Model
Methodology
Regional level forecasting
Findings
Recommendations
Future scopes of the model
Limitations
References

Executive Summary

This report first studies how the planning system works at the Domestic Demand Planning Cell (DPC), SCM dept., Torrent Pharmaceuticals Ltd., in order to become acquainted with its systems and processes. It examines the various reports prepared by the DPC: what type of data are maintained, and in what form. An exploratory analysis is carried out to understand the existing demand forecasting procedure.

In the initial phase, the report studies the product basket. Products are primarily classified according to their sales behavior. The report then attempts to identify and define all the factors that may directly or indirectly affect the sales of an SKU.

After becoming acquainted with the product basket and its behavior, the report defines the problem. Demand forecasting is the process of determining what products are needed where, when, and in what quantities. Sales need to be forecast in a way that shows less fluctuation. The report explores the topics relevant to demand forecasting.

The present forecasting system is well structured and well defined, yet it has not produced accurate forecasts. It cannot quantify the fluctuation in actual sales. That is why the need to develop a statistical model arises. The report then explores the alternative models available.

With 24 months of sales data, a triple exponential smoothing model is applied under its assumptions. It did not produce the desired output and was therefore rejected.

Actual sales are affected by many parameters, all of which should be accounted for and reflected in the forecast. Given these complexities, the report applies a multiple regression model with the shortlisted parameters. It takes all the assumptions of multiple regression for granted. Under the model, sales are treated as the dependent variable and the affecting parameters as the independent variables.

A database is built to hold the data for the included parameters. Data are collected from SAP and ORG-Marg. Primary sales and institutional sales are obtained from internal data sources; tertiary sales are obtained from ORG data.

Using MS Excel (2003), a multiple regression is fitted to the data and forecasts are generated for future months. Forecast accuracy is calculated as per the DPC methodology. These results are compared with the results of the existing system. The model shows a significant increase in forecast accuracy.

A model of demand forecasting at the regional level is also built, but it could be neither analyzed nor validated due to time constraints. It is recommended that the model be implemented for demand forecasting, with personnel intervention and the addition of due insights.

Torrent Group: Overview

It all began with the inspired efforts of one enterprising individual, Shri U N Mehta, when he ventured on his own to create history in the Indian pharmaceutical industry by successfully implementing the concept of niche marketing. With the launch of Trinicalm Plus, an effective tranquilizer, the foundation of the company was laid as ‘Trinity Laboratories’, which was later renamed ‘Torrent’. Today Torrent is one of the leading pharmaceutical companies of India. Torrent is a multifaceted and dynamic group dedicated to transforming life by serving two of its most critical needs: healthcare and energy.

In the power sector, the Torrent Group remains the most experienced private sector player in the state of Gujarat. Torrent has just launched a mega project, the 1100 MW SUGEN CCPP. Being set up at an investment of Rs. 3096 crores, it is a backward integration move of Torrent Power to secure a reliable source of supply for its Ahmedabad and Surat distribution areas.

The project is strategically located. It is close to the River Tapi, National Highway No. 8, and gas supply infrastructure comprising LNG terminals and main gas trunk lines. The plant would comprise 3 advanced-class gas turbines with a high operating efficiency. The environmental and social impact of this project is minimal due to the use of eco-friendly natural gas.

The flagship company of Torrent group, Torrent Pharmaceuticals Limited, is a dominant player in the therapeutic areas of cardiovascular (CV) and central nervous system (CNS) and has achieved significant presence in gastro-intestinal, diabetology, anti-infective and pain management segments.

To cater to new niche segments and sharpen its focus among customers, Torrent Pharma has 11 marketing divisions, each catering to a defined therapeutic segment. Torrent Pharma’s competitive advantage as a manufacturer stems from its world-class manufacturing facilities. Its manufacturing facilities at Indrad, Gujarat, comply with USFDA, WHO, cGMP, MHRA and TGA norms and have received ISO 9001, ISO 14001, OHSAS 18001 (Occupational Health and Safety Management System) and ISO/IEC 17025 certifications.

To cater to its growth requirements, Torrent Pharma commissioned a new state-of-the-art formulations manufacturing facility at Baddi, Himachal Pradesh, in November 2005. The facility has a capacity to manufacture 3600 million tablets, 400 million capsules and 18 million oral liquid bottles per annum, and would cater to the domestic formulations requirement.

Torrent has a modern and well-equipped state-of-the-art R&D Centre, built with an investment of US $ 40 million. It is manned by more than 525 highly qualified scientists, with a combined experience of over 2500 scientific man-years in Drug Discovery and Development. Torrent Pharma has earmarked 9% of sales year-after-year for R&D advancement.

In the International operations arena, Torrent Pharma exports to more than 50 countries around the world with over 1000 product registrations. The international business has been broadly divided into five zones- USA, Latin America, Russia and CIS, Western Europe and CEE and Rest of the World (ROW). For its export excellence in International Business, Torrent Pharma has won several prestigious export awards.

Torrent Pharma is now gearing up to enter the advanced highly regulated international markets. Torrent Pharma has incorporated Zao Torrent Pharma in Russia, Torrent Do Brasil Ltda in Brazil, Torrent Pharma GmbH in Germany, Torrent Pharma Inc. in USA and Torrent Pharma Philippines Inc. in Philippines. These wholly owned subsidiaries will become a springboard for entry into several regulated and less regulated international markets.

TORRENT PHARMACEUTICALS LIMITED

Mission: We commit ourselves to total customer care by delivering world-class products and services.

Vision: To be the leader in the pharmaceutical industry.

Values: A set of core values continues to guide us through the process of transforming the conglomerate into a high-performing and caring organization for our customers, employees, shareholders and society.

- Improving quality of life of our customers, as we believe quality is a way of life.
- Creating value for our shareholders, for the trust bestowed on us.
- Building an empowered and ethical Torrent family, as the foundation for a bright future.
- Responsibility towards the society and environment, as we owe our existence to them.
- Being innovative in solutions, for being different, counts.
- Striving for excellence in whatever we do, to follow the exclusive path to leadership.
- Flexibility and speed shall be our oars for navigating the turbulent seas.

Objective of Project

To develop a statistical model of demand forecasting for domestic trade market operations of Torrent Pharmaceuticals Ltd.

- At gross level
- At regional level

Salient Features

The salient features of the project are as follows:

- It aims to improve the existing demand forecasting process by using a statistical tool.
- It tries to cover all the quantitative and qualitative factors that affect actual sales.
- It takes uncertain fluctuations into consideration and captures them.
- It discusses product-specific sales behavior.
- It makes the whole forecasting procedure a dynamic one.
- It reveals a clear picture of the pharma sector, from the drug-specific to the macro level.

Project Constraints

- The project is based on the tertiary sales made available from ORG-Marg data, which may contain inaccuracy to some extent.
- The project does not include secondary sales data.
- The project considers only those parameters for which data are available.
- The project tries to estimate the future values of the parameters.

Assumptions Made during the project

- The data collected are accurate.
- Future estimates of the parameters are true.
- The parameters taken into consideration are least correlated.
- Data collection horizon ranges from May’05 to June’05.

Project overview

To study how demand planning works at the SCM dept.

To develop a statistical tool for demand forecasting

Identifying and defining the parameters

Applying various statistical tools

Matching output of the model with the past sales

Comparison with existing system and suggested system

Checking the robustness of the tool

Comparison with existing system and suggested system

Implementation, if all the criteria are fulfilled for all or partial numbers of SKUs

Benefits expected

- Minimization of overstocking
- Reducing the gap between orders and actual sales
- No opportunity loss, which may result in growth
- Better inventory control and hence better cash flow
- Better utilization of resources
- Dispatch efficiency
- Smooth operation flow from demand planning to order execution
- Prior planning of recruitment and changes in workforce
- Proper allocation of promotional budget

DEMAND FORECASTING: a brief overview

Introduction

What does the word forecast mean?

The word “fore” means “watch out” in golf and is shouted as a warning to anyone who could potentially be in the path of a misplaced golf ball. The word “cast”, to an angler, means “throw out”. Putting the two words together gives “forecast”, that is, “watch out and throw out”.

Forecast- An estimate of future demand. A forecast can be determined by mathematical means using historical data. It can be created subjectively by using estimates from informal sources, or it can represent a combination of both techniques.

Forecasting involves making projections about future performance on the basis of historical and current data.

Forecast methods can be divided into history-based and future-based ones. History-based demand forecasts are analytic methods based on consumption statistics. They can be further divided into mathematical and graphic methods.

Future-based demand forecasts use already existing information about future demand e.g. offers, confirmed orders in a contracting phase and interviews on customer behavior. (Schönsleben, 1998) In this study, conditional variance models are used for quantifying the demand process uncertainty. The uncertainty can for example be dependent on the level of demand, the previous shocks and the historic level of the variance process.

Understanding customer demand is key for any manufacturer to make and keep sufficient inventory so that customer orders can be correctly met. The discipline that helps a supply chain forecast and plan well is called demand planning.

Accurate and timely demand plans are a vital component of an effective supply chain. Inaccurate demand forecasts typically result in supply imbalances. Although revenue forecast accuracy is important for corporate planning, forecast accuracy at the SKU level is critical for proper allocation of resources.

Types of forecasting

- Quantitative forecasting is used when sufficient quantitative information is available.
- Qualitative forecasting is used when little quantitative information is available, but sufficient qualitative knowledge exists.

Quantitative forecasting can be applied when three conditions exist:

1. Information about the past is available.
2. This information can be quantified in the form of numerical data.
3. It can be assumed that some aspects of the past pattern will continue into the future.

Under quantitative forecasting methods, there are two major types of forecasting models:

- Explanatory models
- Time series forecasting

Explanatory models assume that the variable to be forecast exhibits an explanatory relationship with one or more independent variables.

Time series forecasting deals with the past data only. It makes no attempt to discover the factors affecting the behavior of the series. The objective of time series forecasting methods is to discover the pattern in the historical data series and extrapolate that pattern into the future.

Forecast Management

Forecast management is the process of making, checking, correcting and using forecasts. It also includes determination of the forecast horizon. While designing a forecasting system, the policy issues of what to forecast, why the forecast is needed, and who does the forecasting must be addressed. A forecast is meaningful only in relation to planning and decision making in some area of business application. Thus, an important aspect of any forecasting system is knowing and planning how it will be used in business planning, budgeting, and the operational aspects of master scheduling and inventory planning. Different attributes of the forecasting system are of varying levels of concern and interest to people in each of these areas.

The basic steps in a forecasting task

Where quantitative data is available, forecasting is a sequential process of five steps.

Step 1: Problem definition
Step 2: Gathering information
Step 3: Preliminary (exploratory) analysis
Step 4: Choosing and fitting models
Step 5: Using and evaluating a forecasting model

Problem definition

Defining the problem involves developing a deep understanding of how the forecasts will be used, who requires the forecasts, and how the forecasting function fits within the organization. It is worth spending time talking to everyone who will be involved in collecting data, maintaining databases, and using the forecasts for future planning. A forecaster has a great deal of work to do to define the forecasting problem properly before any answers can be provided. One needs to know exactly what products are stored, who uses them, how long it takes to produce each item, what level of unsatisfied demand the company is prepared to bear, and so on.

Gathering information

The information available can be mainly of two types:

1. Statistical data
2. The accumulated judgment and expertise of key personnel

Exploratory analysis

Simple statistics such as the mean, standard deviation, correlation, minimum, maximum and percentiles are calculated for each set of data. Where more than one series of historical data is available, descriptive statistics can be used for exploration.

The purpose of doing this at this stage is to get a feel for the data. Do they follow consistent patterns? Is there evidence of the presence of business cycles? Are there any outliers in the data that need to be explained by those with expert knowledge? How strong are the relationships among the variables available for analysis?
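The statistics named above can be sketched in a few lines of Python. This is only an illustration: the two series are made-up monthly figures, and the helper names `describe` and `pearson` are hypothetical, not part of the DPC's tooling.

```python
import statistics


def describe(series):
    """Return simple descriptive statistics for one numeric series."""
    q1, median, q3 = statistics.quantiles(series, n=4)  # quartile cut points
    return {
        "mean": statistics.mean(series),
        "stdev": statistics.stdev(series),  # sample standard deviation
        "min": min(series),
        "max": max(series),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def pearson(x, y):
    """Pearson correlation between two equally long series."""
    mx, my = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


# Illustrative (not actual) monthly figures for one product.
sales = [120, 135, 128, 150, 162, 158]
orders = [130, 140, 133, 155, 170, 160]

summary = describe(sales)
r = pearson(sales, orders)
print(summary)
print(f"correlation between sales and orders: {r:.3f}")
```

A correlation close to 1 between two series, as in this made-up example, would suggest that they move together and that one may help explain the other.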

Choosing and fitting models

After the exploratory analysis, it becomes clear how the data should be handled: what pattern and behavior are being observed, and what affects actual sales. This is the stage at which one can choose the model to be fitted. One interprets the characteristics of the actual past data and matches the assumptions of the candidate models with the data. After choosing the model, one should fit it to the data and, if necessary, modify it accordingly.

Using and evaluating a forecasting model

After the model has been fitted to the actual data, inferences can be drawn and forecasts for future periods can be generated from the model. The model should be checked by holding back one month of actual data, generating the forecast for that month, and comparing the forecast with the held-back data. Forecast effectiveness (forecast accuracy) should then be calculated. If it is better than that of the present system, the model should be used.
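The hold-back check described above can be sketched as follows. The sales figures are illustrative, and `mean_of_last_three` is a hypothetical stand-in for whichever fitted model is being evaluated; the deviation is expressed relative to the forecast, the same convention the DPC uses when it tracks forecast accuracy.

```python
def holdout_check(history, forecaster):
    """Hold back the last observation, forecast it from the rest, and
    return (forecast, actual, deviation relative to the forecast)."""
    train, actual = history[:-1], history[-1]
    forecast = forecaster(train)
    return forecast, actual, (actual - forecast) / forecast


def mean_of_last_three(train):
    # Hypothetical placeholder model: average of the three latest months.
    return sum(train[-3:]) / 3


monthly_sales = [500, 520, 510, 530, 540, 550]  # illustrative figures
forecast, actual, deviation = holdout_check(monthly_sales, mean_of_last_three)
print(f"forecast={forecast:.1f} actual={actual} deviation={deviation:+.2%}")
```

Any model that can be written as a function of the training history can be passed in place of the placeholder, so the same check can compare the existing system with a proposed one.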

SCM @ TORRENT

Supply Chain Management (SCM) is the management of the entire value-added chain, from the supplier through the manufacturer to the retailer and the final customer. It coordinates almost all the departments of the company, linking them and smoothing the whole system.

SCM has three primary goals:

- Reduce inventory
- Increase the transaction speed by exchanging data in real time
- Increase sales by implementing customer requirements more efficiently

Planning done at SCM is the indicator for all the other departments, i.e. production, finance, marketing and HR.

Torrent’s supply chain management is divided into two divisions:

- Domestic operations division
- International operations division

The domestic operations division is in turn divided into the following:

- C&FA Cell
- Demand Planning Cell
- Indrad Warehouse
- Zirakpur Warehouse
- PPC, Indrad & Baddi

The supply chain management department is well equipped with the necessary infrastructure, including modern software and hardware.

To handle the company’s large multipoint network across India and the rest of the world, SAP is implemented at the TORRENT SCM dept. The MM (Material Management) module and the PP (Production Planning) module are used by the department personnel.

Microsoft Excel is used extensively in the department. Various MIS reports are prepared in MS Excel from SAP data.

Company network

Torrent’s corporate office is based at Ahmedabad, and all supporting activities are conducted from this head office (HO). Two plants are situated at Indrad (Gujarat) and Baddi (Himachal Pradesh). Most of the domestic requirement is served by the Baddi plant. The company has set up a warehouse at Zirakpur (Punjab), where products produced at the Baddi PPC are stored. All dispatches are made from the warehouse.

The company has 25 Carrying and Forwarding Agents (C&FAs) across India. Each C&FA is responsible for primary sales in its allocated region. C&FAs sell the products to stockists: they collect orders from the stockists and pass them on to the supply chain department at HO. All these activities are coordinated by the Supply Chain Department.

The company engages mainly in two types of selling:

- Trade sales
- Institutional sales

Trade sales are made through the C&FA channel, while institutional sales are sales to institutions such as hospitals, the railways and the army.

[Figure: Company network. Nodes: SCM@HO, Baddi PPC, Indrad PPC, Indrad Warehouse, Zirakpur Warehouse, C&FAs, Institutions, Stockists, Retailers, Customer. Flow labels: Demand planning, Stock transfer (warehouse to C&FA, inter C&FA), Primary sales, Secondary sales, Tertiary sales.]

Products

The company has a product basket consisting of 500+ products. It produces products in the form of tablets, capsules, liquids and injections. Each product is allocated a unique 7-digit product code.

From a marketing point of view, there are 11 divisions, and the drugs are allocated to these divisions accordingly: Sensa, Mind, Axon, Neuron, Azuca, Psycan, Omega, Delta, Prima, Vista, Alfa.

These divisions are in turn classified into three groups, PVA, APOD and SMAN, where:

PVA = Prima, Vista, Alpha (Anti-Infective segment)
APOD = Azuca, Psycan, Omega, Delta (Cardiology and Diabetology)
SMAN = Sensa, Mind, Axon, Neuron (Central Nervous System)

Product Classification

These products show different behaviors in selling quantity. Accordingly, they can be classified as:

- Matured (stable) products
- Seasonal products
- New products

Matured products are products that have been in the market for a long time. They follow a clear, stable pattern and do not show significant deviation, so their fluctuation can be understood.

E.g. Nikoran 5, Deplatt tab, Antidep

Seasonal products are products that show particular seasonal behavior: sales go up in a particular season, i.e. in particular months. With more than one cycle (i.e. more than one year) of data, the size of the hike due to the season can also be estimated.

E.g. Quintor Infusion is a product that has shown high sales in the months of April and May.
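One simple way to estimate the seasonal hike from more than one yearly cycle is a ratio-to-average seasonal index. The sketch below is an assumption for illustration, not the method used in this project: the function name `seasonal_indices` is hypothetical, the two years of figures are made up, and the calculation ignores any underlying trend.

```python
def seasonal_indices(sales_by_year):
    """sales_by_year: list of 12-element lists, one per year (Jan..Dec).
    Returns 12 indices; an index above 1 means that month sells more
    than an average month."""
    n_years = len(sales_by_year)
    month_avg = [
        sum(year[m] for year in sales_by_year) / n_years for m in range(12)
    ]
    overall_avg = sum(month_avg) / 12
    return [avg / overall_avg for avg in month_avg]


# Illustrative figures: a product that peaks in April and May.
year_1 = [100, 100, 100, 200, 180, 100, 100, 100, 100, 100, 100, 100]
year_2 = [110, 110, 110, 220, 198, 110, 110, 110, 110, 110, 110, 110]

indices = seasonal_indices([year_1, year_2])
for month, index in zip(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
    indices,
):
    print(f"{month}: {index:.2f}")
```

In this made-up example the April index is about 1.74, meaning April sales run roughly 74% above an average month.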

New products are products launched within the last 6 months. Their behavior is not easy to estimate: with so little data, one cannot capture the trend or the amount of deviation, so the fluctuation in selling quantity is hard to capture.

E.g. Rimofit and Rimoslim are products launched in May’07.

Demand Planning @SCM Dept.

Demand planning is the process through which an organization generates a forecast of market demand for its products on a regular basis. This allows the organization to calculate a historically based statistical forecast for each point (that is, each part number/warehouse combination). Some key output variables include demand in pieces, demand in customer orders, pieces per customer order, standard (forecast) deviation, and pieces per deviation.

At Torrent, there is a separate demand planning cell under the SCM dept., which conducts demand planning on the basis of a 4 months rolling plan. Under the rolling plan, planning is done 4 months prior to the corresponding month. Planning includes demand forecasting, production planning, supply planning and dispatch planning.

The demand plan is first given by the marketing department and is then reviewed by the demand planning cell. For every product in each division, the demand plan is reviewed and corrected if needed.

On the 20th day of every month M, demand planning is done by the demand planning cell for the month M+3.

Once the demand plan is decided, it is executed by the related departments of the company in a sequential and structured manner. All other planning, such as production planning, financial planning, dispatch planning and procurement planning, is made accordingly.

The company produces most of its products at its in-house facilities, i.e. the Indrad and Baddi plants. For certain products, the company has P2P and LLM arrangements.

P2P is a principal-to-principal arrangement, in which products made by other companies are marketed by TORRENT. The drug license and manufacturing license must be held by the manufacturing company; Torrent need not hold them. Approximately 230 products are received through P2P.

LLM is loan license manufacturing, in which TORRENT uses the plant and facilities of other companies. Torrent must hold the drug license and manufacturing license for the particular drug. Approximately 38 products are received through LLM.

Forecasting for products that are not produced in-house is a very complex task, as they have a longer lead time than products produced in-house.

The demand planning cell deals with these arrangements. It is responsible for getting the products in time and for planning their demand and dispatches at the right point in time at the least cost.

Due to certain circumstances, it is not possible to execute all the orders received from the stockists. There are situations when stock cannot be connected as per the order.

This generally happens due to situations such as non-availability of raw material, machine breakdown, transportation problems, or sudden excess demand.

In some cases, it can be known in advance that a particular product may not be available for the coming month. That product is then declared a Non Available Product, abbreviated as NAP. Orders for such a product could have been genuine sales with proper demand planning.

At the beginning of every month, every aspect of the past month is analyzed and justified. Reports such as the Gap report, NAP report, inventory analysis report and connectivity report are prepared.

Planning horizon: 4 months rolling plan

Month     July         August    September    October
Index     M            M+1       M+2          M+3
Status    Solid rock   Solid     Slushy       Liquid

(Slushy and Liquid months form the tentative plan.)

Let’s consider the month of June’07 as a reference point. According to the 4 months rolling plan, in the month of June demand planning is made for the coming 4 months. As it is a continuous process, a new month is added every month, and the status of the subsequent months changes.

The next two consecutive months are considered frozen. That means that in the month ‘M-1’, the demand plan is fixed for the next two months, i.e. ‘M’ and ‘M+1’. A plan with Solid rock or Solid status cannot be changed.

The planning for the 3rd and 4th months is tentative.

The status of these months is Slushy and Liquid. In the tentative plan, demand can be changed as per the constraints.

In the same manner, the status of months ‘M’ and ‘M+1’ was tentative in ‘M-3’.

The status of every month changes on arrival at the new month.
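The status assignment of the rolling plan can be sketched as a small function. The status names and the June’07 example come from the description above; the function names `add_months` and `rolling_plan` are hypothetical and serve only as an illustration.

```python
from datetime import date

# Status of the four planned months M, M+1, M+2, M+3 (first two are frozen).
STATUSES = ["Solid rock", "Solid", "Slushy", "Liquid"]


def add_months(d, n):
    """Return the first day of the month n months after d."""
    index = d.year * 12 + (d.month - 1) + n
    return date(index // 12, index % 12 + 1, 1)


def rolling_plan(planning_month):
    """Return [(month, status)] for the four months following the
    month in which the planning is done."""
    return [
        (add_months(planning_month, k + 1), status)
        for k, status in enumerate(STATUSES)
    ]


plan = rolling_plan(date(2007, 6, 1))
for month, status in plan:
    print(month.strftime("%B %Y"), "-", status)
```

Run for June 2007, this lists July as Solid rock, August as Solid, September as Slushy and October as Liquid; running it again for July 2007 shifts every status forward by one month, which is the rolling behavior described above.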

Existing system of forecasting

Torrent’s domestic operations work on a make-to-stock basis. Products are manufactured before orders are received at the C&FAs, so sales must be forecast in advance.

The optimum quantity should be produced to serve the market.

The existing forecasting system at Torrent does not use any specific statistical tool. Forecasting is based on past data: statistics such as average, minimum and maximum sales, together with orders, are taken into consideration.

A division-wise demand plan is prepared by the marketing department on the basis of field targets. This plan is reviewed by the demand planning cell, and the demand plan is then made according to the schedule of the rolling plan.

Demands (forecasts) are generally predicted on the basis of past data. The behavior of recent months, along with the general trend, is considered. Field targets given to the sales force are also taken into consideration. These are the quantitative data.

Certain factors such as epidemics, seasonal effects and some visible factors are also taken care of. Visible factors include competitors’ moves, market behavior and authoritarian factors. These factors are qualitative data, which should be quantified in a particular manner.

Considering all these factors, forecasts are put forward.

The present system relies mainly on judgment; no particular statistical tool is applied. It has therefore not been able to capture all these factors precisely. Fluctuations cannot be quantified in the proper proportion, and there may be bias in the estimation and quantification of these parameters.

All of this results in forecasts that do not match actual sales closely. Poor forecast accuracy results in:

- Dispatch inefficiencies
- Loss of genuine sales
- High inventory, and hence blockage of working capital
- High lead time

Good forecast accuracy is a must. Forecast accuracy at present is low and needs to be improved. Hence there is a need to develop a system (model) that takes care of all the concerned factors. All the factors need to be understood and properly quantified, including how a single factor affects different SKUs in different ways.

The demand planning cell prepares a file named CODIS (Correlation among Orders, Demand, Inventory and Sales). For every product, SAP provides data on the demand, the orders received, the sales, and the total availability.

This file is used to analyze the actual scenario, i.e. to what extent orders are executed. The % variation of sales to demand and the % variation of orders to demand are calculated. These show how close demand is to, or how far it is from, actual sales and orders.
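The two variation measures can be sketched as below. The figures are the first three months (June to August ’05) of the Alprax 0.5 data tabulated later in this report; the function name `pct_variation` is hypothetical and used only for illustration.

```python
def pct_variation(actual, demand):
    """Relative deviation of an actual figure (sales or orders)
    from the planned demand."""
    return (actual - demand) / demand


# June, July and August '05 figures for Alprax 0.5 (units).
demand = [950010, 900120, 850000]
orders = [878172, 755927, 795987]
sales = [562903, 499151, 525658]

sales_var = [round(100 * pct_variation(s, d), 2) for s, d in zip(sales, demand)]
orders_var = [round(100 * pct_variation(o, d), 2) for o, d in zip(orders, demand)]

print("sales vs demand (%): ", sales_var)
print("orders vs demand (%):", orders_var)
```

The results reproduce the first three points of the two series plotted in the tracking graph for this product: -40.75%, -44.55% and -38.16% for sales, and -7.56%, -16.02% and -6.35% for orders.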

The graphs on the next two pages show, first, the status of orders, demand, sales and stock, and second, the % variation of demand to sales and the % variation of demand to orders, with the corresponding trend lines.

The graph on the next page is for the product Alprax 0.5 tabs, which contains the molecule Alprazolam, belonging to the class of tranquilizers.

[Figure: Forecast Accuracy Jun'05 - May'07. Y-axis: Quantity (Units). Series: Demand, Orders, Sales, TA @ CFA. Data table below.]

Month        Demand    Orders    Sales     TA @ CFA
June '05     950010    878172    562903    1431263
July '05     900120    755927    499151    1449613
Aug '05      850000    795987    525658    1413207
Sept '05     860000    885868    590579    1364700
Oct '05      772000    665098    443185    1322452
Nov '05      855000    749917    499145    1577114
Dec '05      820000    773628    515552    1412617
Jan '06      730000    691974    459476    1138279
Feb '06      750000    616661    411027    933865
March '06    700000    686613    432582    1319114
April '06    850000    940231    613461    1727856
May '06      800000    913461    604213    1282504
June '06     900000    962695    621979    1235664
July '06     950000    839190    557786    1275416
Aug '06      900000    855761    568361    1552737
Sept '06     900130    741450    476396    767560
Oct '06      670000    490813    486713    687486
Nov '06      600130    538989    535429    387987
Dec '06      500130    492931    491120    792249
Jan '07      500000    534135    527857    261937
Feb '07      450000    558444    533434    667944
Mar '07      375000    428911    420533    610092
Apr '07      525000    693125    668147    893657
May '07      525000    607044    580078    904710
June '07     550000    596129    574958    734085

[Figure: Tracking Forecast Accuracy. X-axis: Month. Y-axis: % Variation. Series1 and Series2, each with a linear trend line. Data table below; Series1 corresponds to the % variation of sales to demand and Series2 to the % variation of orders to demand.]

Month        Series1    Series2
June '05     -40.75%    -7.56%
July '05     -44.55%    -16.02%
Aug '05      -38.16%    -6.35%
Sept '05     -31.33%    3.01%
Oct '05      -42.59%    -13.85%
Nov '05      -41.62%    -12.29%
Dec '05      -37.13%    -5.66%
Jan '06      -37.06%    -5.21%
Feb '06      -45.20%    -17.78%
March '06    -38.20%    -1.91%
April '06    -27.83%    10.62%
May '06      -24.47%    14.18%
June '06     -30.89%    6.97%
July '06     -41.29%    -11.66%
Aug '06      -36.85%    -4.92%
Sept '06     -47.07%    -17.63%
Oct '06      -27.36%    -26.74%
Nov '06      -10.78%    -10.19%
Dec '06      -1.80%     -1.44%
Jan '07      5.57%      6.83%
Feb '07      18.54%     24.10%
Mar '07      12.14%     14.38%
Apr '07      27.27%     32.02%
May '07      10.49%     15.63%

The graph above shows the % variation of sales and of orders relative to demand.
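The % variation figures plotted above can be reproduced with a short calculation. The following is a minimal sketch, assuming (as the quoted figures suggest) that Series1 is the variation of sales from demand and Series2 the variation of orders from demand; the function name pct_variation is chosen here for illustration only.

```python
def pct_variation(actual, demand):
    """Percent variation of an actual figure (sales or orders) from demand."""
    return (actual - demand) / demand * 100.0


# First three months (June '05 to Aug '05) of the Alprax 0.5 data quoted above.
demand = [950010, 900120, 850000]
orders = [878172, 755927, 795987]
sales = [562903, 499151, 525658]

sales_vs_demand = [round(pct_variation(s, d), 2) for s, d in zip(sales, demand)]
orders_vs_demand = [round(pct_variation(o, d), 2) for o, d in zip(orders, demand)]

print(sales_vs_demand)   # variation of sales from demand, in percent
print(orders_vs_demand)  # variation of orders from demand, in percent
```

For these three months the results agree with the first three values of the two plotted series.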


Forecast accuracy at TORRENT with the present system

A forecast is said to be accurate if sales fall within 90% to 110% of the forecast.

At TORRENT, the demand planning cell calculates forecast accuracy at the beginning of each month for the previous month. Forecast accuracy is calculated at both the gross level and the C&FA level. Actual sales during the last month are compared with the projected demand for the corresponding product and month, and the deviation of actual sales from the demand is calculated. Consider a product ‘X’ whose actual sales are ‘Yt’ and whose forecast is ‘Ft’. The deviation is then calculated by the formula (Yt-Ft)/Ft, which gives the % deviation of sales from demand.

At TORRENT, a range is defined for specifying forecast accuracy. A forecast is considered a HIT if actual sales fall within +/-10% of it; otherwise it is a MISS. MS Excel is used for the purpose. The present system has produced less accurate results, so there is a need to work on demand forecasting. The present system is efficient for stable, fast-moving and mature products. It takes care of products that have shown high skewness due to promotions and schemes, and it can estimate well the sales of products that are about to be launched. The demand planning cell also interacts with the marketing team about product behavior on line extension, and does a good job of estimating the fluctuations from forthcoming events that can be known in advance.
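The HIT/MISS rule described above can be sketched in a few lines. This is only an illustration of the +/-10% rule and the (Yt-Ft)/Ft formula; the function names and the sample figures are hypothetical, and the actual calculation at TORRENT is done in MS Excel.

```python
def deviation(actual_sales, forecast):
    """Deviation of actual sales Yt from the forecast Ft: (Yt - Ft) / Ft."""
    return (actual_sales - forecast) / forecast


def is_hit(actual_sales, forecast, tolerance=0.10):
    """A forecast is a HIT when sales lie within +/- tolerance of the forecast."""
    return abs(deviation(actual_sales, forecast)) <= tolerance


def hit_rate(pairs, tolerance=0.10):
    """Share of (actual, forecast) pairs that are HITs."""
    hits = sum(is_hit(y, f, tolerance) for y, f in pairs)
    return hits / len(pairs)


# Hypothetical (actual sales, forecast) pairs for four SKUs.
sample = [(105, 100), (125, 100), (92, 100), (80, 100)]
print(hit_rate(sample))
```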

Present system has its own unique features. System is well defined and well designed. It is a foolproof system.

Defining parameters

Parameters are the factors that directly or indirectly affect actual sales. These factors need to be identified: what types of factors affect actual sales? Factors with a direct effect and those with an indirect effect should both be explored. This exploration yields a list of parameters, which then needs to be sorted to find those with a significant impact on sales. Statistical methods are available to check the significance of the various parameters for actual sales. The following are factors that can affect actual sales.


Trend
Total availability of the SKU
Seasonal factors, i.e. for months
Promotions and schemes
Price sensitivity
Market share
Market growth of the SKU
Market growth of the molecule
Market growth of the brand
Market growth of the molecule class
Market growth expected by the organizations
Additional duties, taxes levied by government
Introduction of new drugs by competitor in the same segment
Line extension by company
Introduction of new drugs by company in the same segment
Regional factor
Drugs with same molecule
Same products with different power
Institutional sales
Sales force
Field targets
Secondary sales
No. of stockists
Tertiary sales
No. of retailers
Miscellaneous factors (epidemic, billing channel, government factors, availability, orders etc.)

The factors stated above can have a significant impact on actual sales. The parameters must be selected properly to obtain accurate and close forecasts.

Matching the results

After an appropriate model is developed, it should be applied to the past data to produce forecasts for past periods. These forecasts should be compared with the actual past data to verify the reliability and validity of the model. Various other statistical tools can be used for the same purpose.

Comparing it with the present system

After developing the models, it is necessary to compare them with the present system, to see whether they give better results than the present one. The comparison should cover various aspects: the model should give reliable and consistent results. Does it have an impact on inventory level? Does it have an impact on profitability? Can it make the whole system smoother?


Is it Robust?

The model should give accurate results in any situation; if it gives proper forecasts in any situation, it should be implemented. The model should capture fluctuations and should react to adjustments made in anticipation of certain factors. The model has to be robust: it should be flexible towards changes and react accordingly.

Implementation

After inspecting all the criteria, one should validate the model. If it gives reliable, consistent and precise results and has a significant impact on the topics of concern, then it should be implemented and used for the future.

Statistical tool

Tools which can be considered are:
Time series
Exponential smoothing
Multiple regression

Many forecasting methods are based on the concept that when an underlying pattern exists in a data series, that pattern can be distinguished from randomness by smoothing (averaging) past values. The effect of smoothing is to eliminate randomness so the pattern can be broken down into sub patterns that identify each component of the time series separately. Such a breakdown can frequently aid in better understanding the behavior of the series, which facilitates improved accuracy in forecasting. Time series decomposes the data in to the sub patterns. It analyzes the data and separates the effects of the components.

Data = pattern + error = f (trend-cycle, seasonality, error)

But Torrent has a product basket of 500+ products, each of which behaves differently, and several factors affect the overall picture. Simple time series decomposition alone is not enough, as it only captures trend, seasonality and error. To analyze and determine the level, trend and seasonality followed by the data, Triple Exponential Smoothing is applied. On the basis of the assumptions and methodology of the model, the model is fitted to the past data, and forecasts for the coming period are obtained on that basis.

Data Availability

Monthly primary sales data are available for the past 24 months, i.e. from June’05 to May’07. The data cover two complete cycles, which is the minimum requirement for applying triple exponential smoothing. Primary sales are the sales made through the CFA channels. They also include institutional sales, which are to be removed later. Data for institutional sales are obtained from SAP as a dump for the same period.


Exponential Smoothing

Exponential smoothing is an extension of the moving average method and uses a weighted moving average. In this method, weights are allocated to the past data and the recent data. It is a class of methods that imply exponentially decreasing weights as the observations get older, so that recent values are given relatively more weight in forecasting than older observations.

Triple Exponential Smoothing (Holt-Winters multiplicative model)

Holt’s method of exponential smoothing was extended by Winters (1960) to capture seasonality. It considers (1) Deseasonalized level (2) Trend (growth) (3) Seasonality

Let us denote:
Original data, i.e. monthly sales: Yt
Deseasonalized factor (level): Rt
Trend factor (growth factor): Gt
Seasonal factor: St
Forecast: Ft

As monthly data is available for 24 months, from June’05 to May’07, we have two complete cycles. The 3rd column of the table given on the next page shows these data.

To get the level and trend, one should apply linear regression. In the linear regression equation Y = a + bX:
Y = actual sales
a = intercept (Rt)
b = growth (Gt)

After getting the deseasonalized level and growth factor, the seasonal factor is calculated:

Seasonal factor = (Actual sales of the corresponding month) / (Forecasted sales for the same month by linear regression)

This gives the seasonal factor. If it is greater than 1, it shows proportionally higher sales due to the season. If it is less than 1, it shows proportionally lower sales due to the season.
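The regression and seasonal-factor steps just described can be sketched as follows. This is an illustrative outline rather than the calculation actually performed for the report: it assumes the months are numbered X = 1, 2, ..., n, the function names are hypothetical, and the four sales values are made up for the example.

```python
def fit_line(y):
    """Ordinary least squares fit of Y = a + b*X with X = 1..n.

    Returns (a, b): the intercept (level) and the slope (growth).
    """
    n = len(y)
    xs = range(1, n + 1)
    mean_x = sum(xs) / n
    mean_y = sum(y) / n
    sxy = sum((x - mean_x) * (v - mean_y) for x, v in zip(xs, y))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def seasonal_factors(y):
    """Actual sales divided by the regression value for the same month."""
    a, b = fit_line(y)
    return [v / (a + b * x) for x, v in enumerate(y, start=1)]


# Hypothetical monthly sales, for illustration only.
sales = [11, 11, 15, 15]
a, b = fit_line(sales)
factors = seasonal_factors(sales)
print(a, b)
print(factors)  # values above 1 mean sales above the trend line
```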

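The equations for the Holt-Winters method are as follows:

Level: $R_t = \alpha \frac{Y_t}{S_{t-s}} + (1-\alpha)(R_{t-1} + G_{t-1})$

Trend: $G_t = \beta (R_t - R_{t-1}) + (1-\beta) G_{t-1}$

Seasonal: $S_t = \gamma \frac{Y_t}{R_t} + (1-\gamma) S_{t-s}$

Forecast: $F_{t+x} = (R_t + G_t x) S_{t-s+x}$

where s is the length of the seasonal cycle (12 months here) and x is the number of periods ahead being forecast.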
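One update step of these smoothing equations can be sketched as below. The sketch assumes the standard multiplicative Holt-Winters form (sales divided by the seasonal factor of one cycle earlier in the level equation, and by the new level in the seasonal equation); the function names and the numbers in the usage lines are hypothetical.

```python
def holt_winters_step(y_t, level_prev, trend_prev, season_prev,
                      alpha, beta, gamma):
    """One update of level Rt, trend Gt and seasonal factor St.

    season_prev is the seasonal factor for the same month one cycle earlier.
    """
    level = alpha * y_t / season_prev + (1 - alpha) * (level_prev + trend_prev)
    trend = beta * (level - level_prev) + (1 - beta) * trend_prev
    season = gamma * y_t / level + (1 - gamma) * season_prev
    return level, trend, season


def forecast_ahead(level, trend, season_for_target, steps_ahead):
    """Forecast x periods ahead: (Rt + Gt * x) times the target month's factor."""
    return (level + trend * steps_ahead) * season_for_target


# Hypothetical values, with all smoothing constants set to 0.5.
level, trend, season = holt_winters_step(
    y_t=120, level_prev=100, trend_prev=2, season_prev=1.2,
    alpha=0.5, beta=0.5, gamma=0.5,
)
print(level, trend, season)
print(forecast_ahead(level, trend, season_for_target=1.0, steps_ahead=2))
```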

Here α, β, γ are the smoothing constants, with 0 < α, β, γ < 1. These values are chosen by the forecaster to suit the data, so there can be a bias in initializing the smoothing constants. It has been observed that α = β = γ = 0.5 gives favorable results. To remove the initialization bias, the method is modified so that it gives the same results as the above calculation. The modified method is as follows. Rather than using the smoothing equations above for the level, trend and seasonal factors, one fixes the level and trend factors as obtained by the linear regression and holds them constant for every month, i.e. for the past months as well as the coming months. For the seasonal indices of future months, one takes the average of the seasonal factors of the corresponding months in the past cycles. This makes the calculations easy when all the smoothing constants equal 0.5.

Given below are the forecasts for the two drugs Nikoran 5 Mg tab and Torleva 500. The last column indicates the % variation between the forecast and the actual sales. For the past months the variation is small, mostly within +/-10%.
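The modified method can be sketched as follows: seasonal indices are the averages of the seasonal factors for the same month across the past cycles, and the forecast multiplies the fixed regression line by the index of the target month. The function names are hypothetical, and the sketch assumes the past data cover a whole number of cycles.

```python
def seasonal_indices(factors, season_length=12):
    """Average the seasonal factors of the same month across past cycles."""
    cycles = len(factors) // season_length
    return [
        sum(factors[c * season_length + m] for c in range(cycles)) / cycles
        for m in range(season_length)
    ]


def forecast_fixed_trend(a, b, indices, t):
    """Forecast for period t (1-based) from the fixed line a + b*t."""
    return (a + b * t) * indices[(t - 1) % len(indices)]


# June seasonal factors of Nikoran 5 Mg Tab for the two past cycles
# (June '05 and June '06), treated here as a one-month "season".
june_index = seasonal_indices([1.072542, 1.092522], season_length=1)[0]
print(june_index)
```

With the two June factors reported for Nikoran 5 Mg Tab, the average comes to 1.082532, which is the seasonal index shown for June '07 in the table.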

Nikoran 5 Mg Tab

Rt (level) = 108762.3, Gt (trend) = 420.073

Month        Sales (Yt)   Yt^ (deseasonalized factor)   Seasonal factor   Seasonal indices   Forecasted demand (Ft)   % variation
June '05     117586       109633                        1.072542          -                  118193.3                 -0.51651
July '05     106675       109991.6                      0.969846          -                  113104.7                 -6.02741
Aug '05      110095       110350.3                      0.997687          -                  113275.7                 -2.88902
Sept '05     108912       109205.4                      0.997314          -                  108723.6                 0.173017
Oct '05      97303        111067.5                      0.876071          -                  101156                   -3.95981
Nov '05      111810       112921.2                      0.99016           -                  110736.4                 0.96022
Dec '05      100237       111784.7                      0.896697          -                  106306.8                 -6.05549
Jan '06      116430       112143.3                      1.038225          -                  119867.4                 -2.95235
Feb '06      106153       112501.9                      0.943566          -                  107077.4                 -0.87078
March '06    87058        119773.7                      0.726854          -                  79559.33                 8.613413
April '06    147642       113219.2                      1.304037          -                  139645.1                 5.416415
May '06      126534       113577.8                      1.114074          -                  123260.7                 2.586905
June '06     124478       113936.4                      1.092522          -                  123650.3                 0.664976
July '06     125046       114295                        1.094063          -                  118306.7                 5.389457
Aug '06      120342       113375.1                      1.06145           -                  118465.6                 1.559224
Sept '06     111741       115012.2                      0.971557          -                  113686                   -1.74062
Oct '06      109466       115370.9                      0.948818          -                  105755.5                 3.389605
Nov '06      115732       115729.5                      1.000022          -                  106138.8                 8.289126
Dec '06      112807       112057.2                      1.006691          -                  111104.2                 1.509473
Jan '07      128082       116446.7                      1.09992           -                  125256.5                 2.206027
Feb '07      112052       116805.3                      0.959306          -                  111873.4                 0.159366
Mar '07      79875        117163.9                      0.681737          -                  83109.6                  -4.04958
Apr '07      136233       117522.5                      1.159207          -                  145853.6                 -7.06184
May '07      124027       117881.2                      1.052136          -                  128720.5                 -3.78424
June '07     -            118239.8                      -                 1.082532           129107.2                 -
July '07     -            118598.4                      -                 1.031955           123508.7                 -
Aug '07      -            118957                        -                 1.029568           123655.6                 -
Sept '07     -            119315.6                      -                 0.984436           118648.4                 -
Oct '07      -            119674.2                      -                 0.912445           110355.1                 -
Nov '07      -            120032.8                      -                 0.995091           120768.7                 -
Dec '07      -            120391.5                      -                 0.951694           115901.6                 -
Jan '08      -            120750.1                      -                 1.069072           130645.6                 -
Feb '08      -            121108.7                      -                 0.951436           116669.5                 -
Mar '08      -            121467.3                      -                 0.704296           86659.9                  -
Apr '08      -            121825.9                      -                 1.231622           152062.1                 -
May '08      -            122184.5                      -                 1.083105           134180.3                 -

Torleva 500

Rt (level) = 7530.833, Gt (trend) = 232.21

Month        Sales (Yt)   Yt^ (deseasonalized factor)   Seasonal factor   Seasonal indices   Forecasted demand (Ft)   % variation
June '05     5580         7763.043                      0.71879           -                  7073.096                 -26.758
July '05     8250         7995.253                      1.031862          -                  8739.312                 -5.93105
Aug '05      8250         8227.463                      1.002739          -                  8242.473                 0.091238
Sept '05     9165         8459.673                      1.083375          -                  8108.181                 11.53103
Oct '05      8280         8691.883                      0.952613          -                  7685.767                 7.176727
Nov '05      9310         8924.093                      1.043243          -                  9216.261                 1.006867
Dec '05      9835         9156.303                      1.074123          -                  9020.762                 8.278982
Jan '06      9820         9388.513                      1.045959          -                  9863.728                 -0.4453
Feb '06      7760         9620.723                      0.806592          -                  8007.905                 -3.19465
March '06    6212         9852.933                      0.630472          -                  6937.039                 -11.6716
April '06    15723        10085.14                      1.559026          -                  14403.85                 8.389929
May '06      12660        10317.35                      1.227059          -                  11451.72                 9.544068
June '06     11641        10549.56                      1.103458          -                  9611.962                 17.4301
July '06     12445        10781.77                      1.154263          -                  11785.15                 5.30211
Aug '06      11024        11013.98                      1.000909          -                  11034.08                 -0.0914
Sept '06     9374         11246.19                      0.833527          -                  10778.92                 -14.9874
Oct '06      9365         11478.4                       0.81588           -                  10149.74                 -8.37947
Nov '06      11971        11710.61                      1.022235          -                  12094.01                 -1.02756
Dec '06      10704        11942.82                      0.896271          -                  11766.03                 -9.92184
Jan '07      12848        12175.03                      1.055274          -                  12791.29                 0.441369
Feb '07      10647        12407.24                      0.858128          -                  10327.29                 3.002793
Mar '07      9829         12639.45                      0.777644          -                  8898.912                 9.462696
Apr '07      16700        12871.66                      1.297424          -                  18383.63                 -10.0816
May '07      13010        13103.87                      0.992836          -                  14544.61                 -11.7956
June '07     -            -                             -                 0.911124           12150.83                 -
July '07     -            -                             -                 1.093063           14830.99                 -
Aug '07      -            -                             -                 1.001824           13825.68                 -
Sept '07     -            -                             -                 0.958451           13449.67                 -
Oct '07      -            -                             -                 0.884246           12613.71                 -
Nov '07      -            -                             -                 1.032739           14971.76                 -
Dec '07      -            -                             -                 0.985197           14511.3                  -
Jan '08      -            -                             -                 1.050617           15718.86                 -
Feb '08      -            -                             -                 0.83236            12646.68                 -
Mar '08      -            -                             -                 0.704058           10860.78                 -
Apr '08      -            -                             -                 1.428225           22363.41                 -
May '08      -            -                             -                 1.109948           17637.5                  -

Domstal Tab

Rt (level) = 908464.975, Gt (trend) = -19904.11

Month        Sales (y)   Y^ (deseasonalized factor)   Seasonal factor   Seasonal indices   Forecasted demand   % variation
June '05     1601814     888560.8633                  1.802             -                  2159685             -35%
July '05     1127094     868656.752                   1.297             -                  1150926             -2%
Aug '05      754860      848752.6407                  0.889             -                  1247370             -65%
Sept '05     780661      828848.5294                  0.941             -                  486365              38%
Oct '05      366338      808944.4181                  0.452             -                  314832              14%
Nov '05      519499      789040.3068                  0.658             -                  980747              -89%
Dec '05      1153666     769136.1955                  1.499             -                  1007405             13%
Jan '06      93867       749232.0842                  0.125             -                  232877              -148%
Feb '06      161767      729327.9729                  0.221             -                  241359              -49%
March '06    207495      709423.8616                  0.292             -                  273507              -32%
April '06    551717      689519.7503                  0.800             -                  680936              -23%
May '06      521824      669615.639                   0.779             -                  849166              -63%
June '06     1987064     649711.5277                  3.058             -                  1579151             21%
July '06     851742      629807.4164                  1.352             -                  834463              2%
Aug '06      1250256     609903.3051                  2.049             -                  896345              28%
Sept '06     136720      589999.1938                  0.231             -                  346209              -153%
Oct '06      185576      570095.0825                  0.325             -                  221874              -20%
Nov '06      1005490     550190.9712                  1.827             -                  683866              32%
Dec '06      593723      530286.8599                  1.119             -                  694563              -17%
Jan '07      253332      510382.7486                  0.496             -                  158637              37%
Feb '07      215842      490478.6372                  0.440             -                  162316              25%
Mar '07      225209      470574.5259                  0.478             -                  181422              19%
Apr '07      529518      450670.4146                  1.174             -                  445060              16%
May '07      756852      430766.3033                  1.756             -                  546272              28%
June '07     -           -                            -                 2.43054243         998618              -
July '07     -           -                            -                 1.32494925         518000              -
Aug '07      -           -                            -                 1.46965034         545320              -
Sept '07     -           -                            -                 0.58679561         206053              -
Oct '07      -           -                            -                 0.38918846         128917              -
Nov '07      -           -                            -                 1.24296128         386986              -
Dec '07      -           -                            -                 1.30978815         381721              -
Jan '08      -           -                            -                 0.31082059         84398               -
Feb '08      -           -                            -                 0.33093342         83273               -
Mar '08      -           -                            -                 0.38553344         89338               -
Apr '08      -           -                            -                 0.98755149         209184              -
May '08      -           -                            -                 1.26813932         243377              -

A graph showing correlation among forecast, sales, orders, stock available for the Domstal tab.


Forecast accuracy June'05-May'07 (sales & orders) (chart; vertical axis from 0 to 3000000; series: sales, forecasted demand (sales), orders, forecasted demand (order))

Month        Sales      Forecasted demand (sales)   Orders     Forecasted demand (order)
June '05     1601814    2159685                     2025287    2477246
July '05     1127094    1150926                     1328359    1292622
Aug '05      754860     1247370                     782084     1315619
Sept '05     780661     486365                      781741     495389
Oct '05      366338     314832                      371538     330171
Nov '05      519499     980747                      526339     1086651
Dec '05      1153666    1007405                     1163220    1051971
Jan '06      93867      232877                      93867      253379
Feb '06      161767     241359                      161863     269189
March '06    207495     273507                      207591     292252
April '06    551717     680936                      556421     751399
May '06      521824     849166                      525592     951116
June '06     1987064    1579151                     2037925    1723485
July '06     851742     834463                      864498     889079
Aug '06      1250256    896345                      1256448    893926
Sept '06     136720     346209                      140196     332245
Oct '06      185576     221874                      191010     218369
Nov '06      1005490    683866                      1073078    708007
Dec '06      593723     694563                      603124     674449
Jan '07      253332     158637                      260148     159645
Feb '07      215842     162316                      232799     166439
Mar '07      225209     181422                      228318     177034
Apr '07      529518     445060                      560602     445103
May '07      756852     546272                      795742     549776

A graph showing the variation of forecast to sales and orders


Tracking forecasting accuracy (chart; horizontal axis: month; vertical axis: % variation, from -250% to 100%; linear trend lines are drawn for % variation sales Vs Demand and % variation orders Vs Demand)

Month        % variation sales Vs Demand   % variation orders Vs Demand   % variation sales Vs past forecast   % variation orders Vs past forecast
June '05     -35%                          -22%                           52%                                  62%
July '05     -2%                           3%                             -46%                                 -24%
Aug '05      -65%                          -68%                           -59%                                 -53%
Sept '05     38%                           37%                            -28%                                 -28%
Oct '05      14%                           11%                            -118%                                -115%
Nov '05      -89%                          -106%                          -54%                                 -52%
Dec '05      13%                           10%                            -4%                                  -3%
Jan '06      -148%                         -170%                          -220%                                -220%
Feb '06      -49%                          -66%                           -24%                                 -24%
March '06    -32%                          -41%                           13%                                  13%
April '06    -23%                          -35%                           0%                                   1%
May '06      -63%                          -81%                           -5%                                  -5%
June '06     21%                           15%                            30%                                  31%
July '06     2%                            -3%                            -41%                                 -39%
Aug '06      28%                           29%                            -12%                                 -11%
Sept '06     -153%                         -137%                          -47%                                 -43%
Oct '06      -20%                          -14%                           -8%                                  -5%
Nov '06      32%                           34%                            10%                                  16%
Dec '06      -17%                          -12%                           -136%                                -132%
Jan '07      37%                           39%                            41%                                  42%
Feb '07      25%                           29%                            31%                                  36%
Mar '07      19%                           22%                            33%                                  34%
Apr '07      16%                           21%                            6%                                   11%
May '07      28%                           31%                            34%                                  37%

For the above SKU, the past forecasts show large fluctuations. Moreover, this model rests on some basic assumptions and hence has limitations:


It needs data for two complete cycles, but TORRENT has many products that were launched more recently. The method therefore fails for products with less data.
The method considers only three parameters (level, trend and seasonality), which is too few, as many other factors are likely to affect actual sales. So the method will not be able to give accurate results.
The method may also contain certain biases, as the smoothing constants are initialized by the forecaster.

So it is not advisable to continue with triple exponential smoothing for forecasting. A more robust, flexible and inclusive model needs to be chosen and fitted to the data.

Need of another method

Another method must be applied, one that can include every parameter affecting actual sales. It should be:

A method which is adjustable to any change regarding the parameters.
One which gives very significant results.
One which gives elaborate explanations about the steps taken.
One which gives less error.
One which increases the forecast accuracy and effectiveness to a significant level.
An inclusive method: later, when a new parameter is identified, it should be able to consider it.

Multiple Regressions with ‘n’ factors

General Purpose

The general purpose of multiple regression (the term was first used by Pearson, 1908) is to learn more about the relationship between several independent or predictor variables and a dependent or criterion variable.

Overview

Multiple regression, a time-honored technique going back to Pearson's 1908 use of it, is employed to account for (predict) the variance in an interval dependent, based on linear combinations of interval, dichotomous, or dummy independent variables. Multiple regression can establish that a set of independent variables explains a proportion of the variance in a dependent variable at a significant level (through a significance test of R²), and can establish the relative predictive importance of the independent variables (by comparing beta weights). Power terms can be added as independent variables to explore curvilinear effects. Cross-product terms can be added as independent variables to explore interaction effects. One can test the significance of the difference of two R²'s to determine if adding an independent variable to the model helps significantly. Using hierarchical regression, one can see how much variance in the dependent can be explained by one or a set of new independent variables, over and above that explained by an earlier set. Of course, the estimates (b coefficients and constant) can be used to construct a prediction equation and generate predicted scores on a variable for further analysis.

The multiple regression equation takes the form y = b1x1 + b2x2 + ... + bnxn + c. The b's are the regression coefficients, representing the amount the dependent variable y changes when the corresponding independent changes 1 unit. The c is the constant,


where the regression line intercepts the y axis, representing the amount the dependent y will be when all the independent variables are 0. The standardized versions of the b coefficients are the beta weights, and the ratio of the beta coefficients is the ratio of the relative predictive power of the independent variables. Associated with multiple regression is R², the multiple correlation, which is the percent of variance in the dependent variable explained collectively by all of the independent variables.

Multiple regression shares all the assumptions of correlation: linearity of relationships, the same level of relationship throughout the range of the independent variable ("homoscedasticity"), interval or near-interval data, absence of outliers, and data whose range is not truncated. In addition, it is important that the model being tested is correctly specified. The exclusion of important causal variables or the inclusion of extraneous variables can change markedly the beta weights and hence the interpretation of the importance of the independent variables.

Key Terms and Concepts

The regression equation takes the form Y = b0 + b1*x1 + b2*x2 + e, where Y is the true dependent, the b's are the regression coefficients for the corresponding x (independent) terms, b0 is the constant or intercept, and e is the error term reflected in the residuals.
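To make the regression equation concrete, the following sketch estimates the constant and the b coefficients by least squares, solving the normal equations with plain Gaussian elimination. It is an illustration only: the function names and the small data set are hypothetical, and in practice the report relies on MS Excel or SPSS for this step.

```python
def solve_linear_system(a, b):
    """Solve a * x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_multiple_regression(rows, y):
    """Return [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn."""
    design = [[1.0] + list(r) for r in rows]  # leading 1 for the constant
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * v for d, v in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical data generated from y = 3 + 2*x1 - 1*x2 (no error term).
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 5)]
y = [3, 6, 5, 8, 8]
coefficients = fit_multiple_regression(rows, y)
print(coefficients)  # approximately [3, 2, -1]
```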

Sometimes this is expressed more simply as y = b1*x1 + b2*x2 + c, where y is the estimated dependent and c is the constant (which includes the error term). Equations such as that above, with no interaction effects (see below), are called main effects models. In MS Excel, select Tools, Data Analysis, Regression. In SPSS, select Analyze, Regression, Linear; select your dependent and independent variables; click Statistics; select Estimates, Confidence Intervals, Model Fit; continue; OK.

Predicted values, also called fitted values, are the values of each case based on using the regression equation for all cases in the analysis. In SPSS, dialog boxes use the term PRED to refer to predicted values and ZPRED to refer to standardized predicted values. Click the Save button in SPSS to add and save these as new variables in your dataset.

Adjusted predicted values are the values of each case based on using the regression equation for all cases in the analysis except the given case.

Residuals are the difference between the observed values and those predicted by the regression equation.

Interaction effects are sometimes called moderator effects because the interacting third variable which changes the relation between two original variables is a moderator variable which moderates the original relationship. For instance, the relation between income and conservatism may be moderated depending on the level of education.

The regression coefficient, b, is the average amount the dependent increases when the independent increases one unit and other independents are held constant. Put another way, the b coefficient is the slope of the regression line: the larger the b, the steeper the slope, the more the dependent changes for each unit change in the independent. The b coefficient is the unstandardized simple regression coefficient for the case of one independent. When there are two or more independents, the b


coefficient is a partial regression coefficient, though it is common simply to call it a "regression coefficient" also. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked to get the b coefficients (the default).

b coefficients compared to partial correlation coefficients. The b coefficient is a semi-partial coefficient, in contrast to partial coefficients as found in partial correlation. The partial coefficient for a given independent variable removes the variance explained by control variables from both the independent and the dependent, then assesses the remaining correlation. In contrast, a semi-partial coefficient removes the variance only from the independent. That is, where semi-partial coefficients look at the total variance of the dependent variable, partial coefficients look at the variance in the dependent after variance accounted for by control variables is removed. Thus the b coefficients, as semi-partial coefficients, reflect the unique (independent) contributions of each independent variable to explaining the total variance in the dependent variable.

Dynamic inference is drawing the interpretation that the dependent changes b units because the independent changes one unit. That is, one assumes that there is a change process (a dynamic) which directly relates unit changes in x to b changes in y. This assumption implies two further assumptions which may or may not be true: (1) b is stable for all subsamples or the population (cross-unit invariance) and thus is not an artificial average which is often unrepresentative of particular groups; and (2) b is stable across time when later re-samples of the population are taken (cross-time invariance).

t-tests are used to assess the significance of individual b coefficients, specifically testing the null hypothesis that the regression coefficient is zero. A common rule of thumb is to drop from the equation all variables not significant at the .05 level or better. Note that restricted variance of the independent variable in the particular sample at hand can be a cause of a finding of no significance. Like all significance tests, the t-test assumes randomly sampled data. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked to get t and the significance of b.

Level-importance is the b coefficient times the mean for the corresponding independent variable. The sum of the level importance contributions for all the independents, plus the constant, equals the mean of the dependent variable. Achen (1982: 72) notes that the b coefficient may be conceived as the "potential influence" of the independent on the dependent, while level importance may be conceived as the "actual influence." This contrast is based on the idea that the higher the b, the more y will change for each unit increase in the independent, but the lower the mean for the given independent, the fewer actual unit changes will be expected. By taking both the magnitude of b and the magnitude of the mean value into account, level importance is a better indicator of expected actual influence of the independent on the dependent. Level importance is not computed by SPSS.

The beta weights are the regression (b) coefficients for standardized data. Beta is the average amount the dependent increases when the independent increases one standard deviation and other independent variables are held constant. If an independent variable has a beta weight of .5, this means that when other independents are held constant, the dependent variable will increase by half a standard deviation (.5 also) for each increase of one standard deviation in that independent. The ratio of the beta weights is the ratio of the estimated unique predictive importance of the independents. Note that the betas will change if variables or interaction terms are added or deleted from the equation. Reordering the variables without adding or deleting will not affect the beta weights. That is, the beta weights help assess the unique importance of the independent variables relative to the given model embodied in the regression equation. Note that adding or subtracting variables from the model can cause the b and


beta weights to change markedly, possibly leading the researcher to conclude that an independent variable initially perceived as unimportant is actually an important variable. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked to get the beta coefficients (the default).

Note that the betas reflect the unique contribution of each independent variable. Joint contributions contribute to R-square but are not attributed to any particular independent variable. The result is that the betas may underestimate the importance of a variable which makes strong joint contributions to explaining the dependent variable but which does not make a strong unique contribution. Thus when reporting relative betas, one must also report the correlation of the independent variable with the dependent variable, to acknowledge if it has a strong correlation with the dependent variable.

Standardized means that for each datum the mean is subtracted and the result divided by the standard deviation. The result is that all variables have a mean of 0 and a standard deviation of 1. This enables comparison of variables of differing magnitudes and dispersions. Only standardized b-coefficients (beta weights) can be compared to judge relative predictive power of independent variables. Note some authors use "b" to refer to sample regression coefficients, and "beta" to refer to regression coefficients for population data. They then refer to "standardized beta" for what is simply called the "beta weight" here.

Correlation: Pearson's r² is the percent of variance in the dependent explained by the given independent when (unlike the beta weights) all other independents are allowed to vary. The result is that the magnitude of r² reflects not only the unique covariance it shares with the dependent, but uncontrolled effects on the dependent attributable to covariance the given independent shares with other independents in the model. A rule of thumb is that multicollinearity may be a problem if a correlation is > .90 or several are > .7 in the correlation matrix formed by all the independents.

The intercept, variously expressed as e, c, or x-sub-0, is the estimated Y value when all the independents have a value of 0. Sometimes this has real meaning and sometimes it doesn’t; that is, sometimes the regression line cannot be extended beyond the range of observations, either back toward the Y axis or forward toward infinity. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Estimates is checked to get the intercept, labeled the "constant" (the default). MS EXCEL allows the researcher to check a box to not have an intercept. This is equivalent to forcing the regression line to run through the origin. In rare cases the researcher may know the relation is linear and that the dependent variable is zero when all the independents are zero, in which case the option may be selected.

R², also called multiple correlation or the coefficient of multiple determination, is the percent of the variance in the dependent explained uniquely or jointly by the independents. R-squared can also be interpreted as the proportionate reduction in error in estimating the dependent when knowing the independents. That is, R² reflects the number of errors made when using the regression model to guess the value of the dependent, in ratio to the total errors made when using only the dependent's mean as the basis for estimating all cases. Mathematically, R² = (1 - (SSE/SST)), where SSE = error sum of squares = SUM ((Yi - EstYi) squared), where Yi is the actual value of Y for the ith case and EstYi is the regression prediction for the ith case; and where SST = total sum of squares = SUM ((Yi - MeanY) squared). The "residual sum of squares" in SPSS output is SSE and reflects regression error. Thus R-square is 1 minus regression error


http://www2.chass.ncsu.edu/garson/PA765/correl.htm

as a percent of total error and will be 0 when regression error is as large as it would be if you simply guessed the mean for all cases of Y. Put another way, the regression sum of squares/total sum of squares = R-square, where the regression sum of squares = total sum of squares - residual sum of squares. In SPSS, Analyze, Regression, Linear; click the Statistics button; make sure Model fit is checked to get R2. Maximizing R2 by adding variables is inappropriate unless variables are added to the equation for sound theoretical reason. At an extreme, when n-1 variables are added to a regression equation, R2 will be 1, but this result is meaningless. Adjusted R2 is used as a conservative reduction to R2 to penalize for adding variables and is required when the number of independent variables is high relative to the number of cases or when comparing models with different numbers of independentsStandard Error of Estimate (SEE), confidence intervals, and prediction intervals. Confidence intervals around the mean are discussed in the section on significance. In regression, however, the confidence refers to more than one thing. Note the confidence and prediction intervals will improve (narrow) if sample size is increased, or the confidence level is decreased (ex., from 95% to 90%). For large samples, SEE approximates the standard error of a predicted value. SEE is the standard deviation of the residuals. In a good model, SEE will be markedly less than the standard deviation of the dependent variable. In a good model, the mean of the dependent variable will be greater than 1.96 times SEE. The confidence interval of the regression coefficient. Based on t-tests, the confidence interval is the plus/minus range around the observed sample regression coefficient, within which we can be, say, 95% confident the real regression coefficient for the population regression lies. Confidence limits are relevant only to random sample datasets. 
If the confidence interval includes 0, then there is no significant linear relationship between x and y. We then do not reject the null hypothesis that x is independent of y. In SPSS, Analyze, Regression, Linear; click Statistics; check Confidence Limits to get t and confidence limits on b.

The confidence interval of y (the dependent variable) is also called the standard error of mean prediction. Some 95 times out of a hundred, the true mean of y will be within the confidence limits around the observed mean of n sampled cases. That is, the confidence interval is the upper and lower bounds for the mean predicted response. Note the confidence interval of y deals with the mean, not an individual case of y. Moreover, the confidence interval is narrower than the prediction interval, which deals with individual cases. Note a number of textbooks do not distinguish between confidence and prediction intervals and confound this difference. In SPSS, select Analyze, Regression, Linear; click Save; under "Prediction intervals" check "Mean" and under "Confidence interval" set the confidence level you want (ex., 95%). Note SPSS calls this a prediction interval for the mean.

The prediction interval of y. For the 95% confidence limits, the prediction interval on a fitted value is the estimated value plus or minus 1.96 times SQRT(SEE + S2y), where S2y is the standard error of the mean prediction. Prediction intervals are upper and lower bounds for the prediction of the dependent variable for a single case. Thus some 95 times out of a hundred, a case with the given values on the independent variables would lie within the computed prediction limits. The prediction interval will be wider (less certain) than the confidence interval, since it deals with an interval estimate of cases, not means. In SPSS, select Analyze, Regression, Linear; click Save; under "Prediction intervals" check "Individual" and under "Confidence interval" set the confidence level you want (ex., 95%).
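The fit measures discussed above (R2, adjusted R2 and SEE) can be sketched in a few lines of Python. This is only an illustration: the function name `fit_statistics` and the sales figures are hypothetical, and the model is assumed to contain k independents plus a constant, so the residual degrees of freedom are n - k - 1.

```python
import math

def fit_statistics(actual, fitted, k):
    """Return (R2, adjusted R2, SEE) for a model with k independents and a constant."""
    n = len(actual)
    mean_y = sum(actual) / n
    ss_total = sum((y - mean_y) ** 2 for y in actual)            # total sum of squares
    ss_resid = sum((y - f) ** 2 for y, f in zip(actual, fitted))  # residual sum of squares
    r2 = 1 - ss_resid / ss_total
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)                 # penalty for added variables
    see = math.sqrt(ss_resid / (n - k - 1))                       # standard error of estimate
    return r2, adj_r2, see

# Hypothetical actual sales and model forecasts (illustrative numbers only).
actual = [100.0, 120.0, 130.0, 150.0, 170.0, 180.0]
fitted = [104.0, 116.0, 133.0, 149.0, 166.0, 182.0]

r2, adj_r2, see = fit_statistics(actual, fitted, k=1)
print(round(r2, 4), round(adj_r2, 4), round(see, 4))
```

Adjusted R2 comes out below R2 whenever k is at least 1 and R2 is below 1, which is the conservative reduction described in the text.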

http://www2.chass.ncsu.edu/garson/PA765/regress.htm#adjusted%23adjusted

F test: The F test is used to test the significance of R, which is the same as testing the significance of R2, which is the same as testing the significance of the regression model as a whole. If prob(F) < .05, then the model is considered significantly better than would be expected by chance and we reject the null hypothesis of no linear relationship of y to the independents. F is a function of R2, the number of independents and the number of cases. F is computed with k and (n - k - 1) degrees of freedom, where k = number of terms in the equation not counting the constant. F = [R2/k]/[(1 - R2)/(n - k - 1)]. In MS EXCEL, the F test appears in the ANOVA table, which is part of regression output. Note that the F test is too lenient for the stepwise method of estimating regression coefficients and an adjustment to F is recommended.

Outliers are data points which lie outside the general linear pattern of which the midline is the regression line. A rule of thumb is that outliers are points whose standardized residual is greater than 3.3 (corresponding to the .001 alpha level). The removal of outliers from the data set under analysis can at times dramatically affect the performance of a regression model. Outliers should be removed if there is reason to believe that other variables not in the model explain why the outlier cases are unusual -- that is, these cases need a separate model. Alternatively, outliers may suggest that additional explanatory variables need to be brought into the model (that is, the model needs respecification). Another alternative is to use robust regression, whose algorithm gives less weight to outliers but does not discard them.

Multicollinearity is the intercorrelation of independent variables. R2's near 1 violate the assumption of no perfect collinearity, while high R2's increase the standard error of the beta coefficients and make assessment of the unique role of each independent difficult or impossible.
While simple correlations tell something about multicollinearity, the preferred method of assessing multicollinearity is to regress each independent on all the other independents.

Assumptions

Proper specification of the model: If relevant variables are omitted from the model, the common variance they share with included variables may be wrongly attributed to those variables, and the error term is inflated. If causally irrelevant variables are included in the model, the common variance they share with included variables may be wrongly attributed to the irrelevant variables. The more the correlation of the irrelevant variable(s) with other independents, the greater the standard errors of the regression coefficients for these independents. Omission and irrelevancy can both affect substantially the size of the b and beta coefficients. This is one reason why it is better to use regression to compare the relative fit of two models rather than to seek to establish the validity of a single model.

Linearity. Regression analysis is a linear procedure. To the extent nonlinear relationships are present, conventional regression analysis will underestimate the relationship. That is, R-square will underestimate the variance explained overall and the betas will underestimate the importance of the variables involved in the non-linear relationship. Substantial violation of linearity thus means regression results may be more or less unusable. Minor departures from linearity will not substantially affect the interpretation of regression output. Checking that the linearity assumption is met is an essential research task when use of regression models is contemplated.

Nonlinear transformations. When nonlinearity is present, it may be possible to remedy the situation through use of exponential or interactive terms. Nonlinear transformation of selected variables may be a pre-processing step, but beware that this runs the danger of overfitting the model to what are, in fact, chance variations in the data. Power and other transform terms should be added only if there is a theoretical reason to do so. Adding such terms runs the risk of introducing multicollinearity in the model. A guard against this is to use centering when introducing power terms (subtract the mean from each score). Correlation and unstandardized b coefficients will not change as the result of centering.

Partial regression plots are often used to assess nonlinearity. These are simply plots of each independent on the x axis against the dependent on the y axis. Curvature in the pattern of points in a partial regression plot shows if there is a nonlinear relationship between the dependent and any one of the independents taken individually. Note, however, that whereas partial regression plots are preferred for illuminating cases with high leverage, partial residual plots (below) are preferred for illuminating nonlinearities.
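The effect of centering on a power term, mentioned above as a guard against multicollinearity, can be illustrated with a small sketch. The data and the helper name `pearson` are hypothetical; the point is only that x and its square are highly correlated before centering and much less so afterwards.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical independent variable (illustrative values only).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Correlation between x and its square without centering.
raw_corr = pearson(x, [v ** 2 for v in x])

# Centering: subtract the mean from each score before squaring.
mean_x = sum(x) / len(x)
centered = [v - mean_x for v in x]
centered_corr = pearson(centered, [v ** 2 for v in centered])

print(round(raw_corr, 3), round(centered_corr, 3))
```

For these evenly spaced values the centered correlation is zero because the data are symmetric about their mean; with real data it would usually be reduced rather than removed.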

Simple residual plots also show nonlinearity but do not distinguish monotone from nonmonotone nonlinearity. These are usually plots of standardized residuals against standardized estimates of Y, the dependent variable. The plot should show a random pattern, with no nonlinearity or heteroscedasticity. In jargon, this will show the error vector is orthogonal to the estimate vector. Non-linearity is, of course, shown when points form a curve. Non-normality is shown when points are not equally above and below the Y axis 0 line. Non-homoscedasticity is shown when points form a funnel or other shape showing variance differs as one moves along the Y axis.

Non-recursivity. The dependent cannot also be a cause of one or more of the independents. This is also called the assumption of non-simultaneity or absence of joint dependence. Violation of this assumption causes regression estimates to be biased and means significance tests will be unreliable.

No overfitting. The researcher adds variables to the equation while hoping that adding each significantly increases R-squared. However, there is a temptation to add too many variables just to increase R-squared by trivial amounts. Such overfitting trains the model to fit noise in the data rather than true underlying relationships. Subsequent application of the model to other data may well see substantial drops in R-squared. Cross-validation is a strategy to avoid overfitting. Under cross-validation, a sample (typically 60% to 80%) is taken for purposes of training the model, then the hold-out sample (the other 20% to 40%) is used to test the stability of R-squared. This may be done iteratively for each alternative model until stable results are achieved.

Unbounded data are an assumption. That is, the regression line produced by OLS can be extrapolated in both directions but is meaningful only within the upper and lower natural bounds of the dependent. Data are not censored, sample selected, or truncated.
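The hold-out form of cross-validation described under "No overfitting" can be sketched as follows. Everything here is an assumption made for illustration: the data are invented, a single independent is used to keep the arithmetic short, and the first 70% of periods are used for training because the series is time ordered (a random split is the other common choice).

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = m*x + b with one independent."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    m = sxy / sxx
    return m, mean_y - m * mean_x

def r_squared(x, y, m, b):
    """R-squared of the line (m, b) evaluated on the cases (x, y)."""
    mean_y = sum(y) / len(y)
    ss_resid = sum((yi - (m * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_resid / ss_total

# Hypothetical data: ten periods of a demand driver x and sales y.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [12, 15, 21, 24, 31, 33, 41, 44, 49, 56]

split = len(x) * 7 // 10                      # 70% training, 30% hold-out
m, b = fit_line(x[:split], y[:split])         # coefficients come from training data only
r2_train = r_squared(x[:split], y[:split], m, b)
r2_holdout = r_squared(x[split:], y[split:], m, b)

print(round(r2_train, 3), round(r2_holdout, 3))
```

A large drop from the training R-squared to the hold-out R-squared would be the warning sign of overfitting that the text describes.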
There are as many observations of the independents as for the dependents. Collapsing an interval variable into fewer categories leads to attenuation and will reduce R2. Absence of perfect multicollinearity. When there is perfect multicollinearity, there is no unique regression solution. Perfect multicollinearity occurs if independents are linear functions of each other (ex., age and year of birth), when the researcher creates dummy variables for all values of a categorical variable rather than leaving one out, and when there are fewer observations than variables. Absence of high partial multicollinearity. When there is high but imperfect multicollinearity, a solution is still possible but as the independents increase in correlation with each other, the standard errors of the regression coefficients will become inflated. High multicollinearity does not bias the estimates of the coefficients,

only their reliability. This means that it becomes difficult to assess the relative importance of the independent variables using beta weights. It also means that a small number of discordant cases potentially can affect results strongly. The importance of this assumption depends on the type of multicollinearity. In the discussion below, the term "independents" refers to variables on the right-hand side of the regression equation other than control variables. Normally distributed residual error: Error, represented by the residuals, should be normally distributed for each set of values of the independents. A histogram of standardized residuals should show a roughly normal curve. An alternative for the same purpose is the normal probability plot, with the observed cumulative probabilities of occurrence of the standardized residuals on the Y axis and of expected normal probabilities of occurrence on the X axis, such that a 45-degree line will appear when observed conforms to normally expected. The F test is relatively robust in the face of small to medium violations of the normality assumption. The central limit theorem assumes that even when error is not normally distributed, when sample size is large, the sampling distribution of the b coefficient will still be normal. Therefore violations of this assumption usually have little or no impact on substantive conclusions for large samples, but when sample size is small, tests of normality are important. Additivity. Likewise, regression does not account for interaction effects, although interaction terms (usually products of standardized independents) may be created as additional variables in the analysis. As in the case of adding nonlinear transforms, adding interaction terms runs the danger of overfitting the model to what are, in fact, chance variations in the data. Such terms should be added only when there are theoretical reasons for doing so. 
That is, significant but small interaction effects from interaction terms not added on a theoretical basis may be artifacts of overfitting. Such artifacts are unlikely to be replicable on other datasets. Homoscedasticity: The researcher should test to assure that the residuals are dispersed randomly throughout the range of the estimated dependent. Put another way, the variance of residual error should be constant for all values of the independent(s). If not, separate models may be required for the different ranges. Also, when the homoscedasticity assumption is violated "conventionally computed confidence intervals and conventional t-tests for OLS estimators can no longer be justified" (Berry, 1993: 81). However, moderate violations of homoscedasticity have only minor impact on regression estimates (Fox, 2005: 516). No outliers. Outliers are a form of violation of homoscedasticity. Detected in the analysis of residuals and leverage statistics, these are cases representing high residuals (errors) which are clear exceptions to the regression explanation. Outliers can affect regression coefficients substantially. The set of outliers may suggest/require a separate explanation. Some computer programs allow an option of listing outliers directly, or there may be a "case wise plot" option which shows cases more than 2 s.d. from the estimate. To deal with outliers, the researcher may remove them from analysis and seek to explain them on a separate basis, or transforms may be used which tend to "pull in" outliers. These include the square root, logarithmic, and inverse (x = 1/x) transforms. Reliability: Reliability is reduced by measurement error and, since all variables have some measurement error, by having a large number of independent variables. To the extent there is random error in measurement of the variables, the regression coefficients will be attenuated. 
To the extent there is systematic error in the measurement of the variables, the regression coefficients will be simply wrong. (In contrast to OLS regression, structural equation modeling involves explicit modeling of measurement error, resulting in coefficients which, unlike regression coefficients, are unbiased by measurement error.) Note measurement error terms are not to be confused with residual error of estimate, discussed below.

Population error is uncorrelated with each of the independents. This is the "assumption of mean independence": that the mean error is independent of the x independent variables. This is a critical regression assumption which, when violated, may lead to substantive misinterpretation of output. The (population) error term, which is the difference between the actual values of the dependent and those estimated by the population regression equation, should be uncorrelated with each of the independent variables. Since the population regression line is not known for sample data, the assumption must be assessed by theory. Specifically, one must be confident that the dependent is not also a cause of one or more of the independents, and that the variables not included in the equation are not causes of Y and correlated with the variables which are included. Either circumstance would violate the assumption of uncorrelated error. One common type of correlated error occurs due to selection bias with regard to membership in the independent variable "group" (representing membership in a treatment vs. a comparison group): measured factors such as gender, race, education, etc., may cause differential selection into the two groups and also can be correlated with the dependent variable. When there is correlated error, conventional computation of standard deviations, t-tests, and significance are biased and cannot be used validly.
Note that residual error -- the difference between observed values and those estimated by the sample regression equation -- will always be uncorrelated with the independents, and therefore the lack of correlation of the residuals with the independents is not a valid test of this assumption.

Independent observations (absence of autocorrelation) leading to uncorrelated error terms. Current values should not be correlated with previous values in a data series. This is often a problem with time series data, where many variables tend to increment over time such that knowing the value of the current observation helps one estimate the value of the previous observation. Spatial autocorrelation can also be a problem when units of analysis are geographic units and knowing the value for a given area helps one estimate the value of the adjacent area. That is, each observation should be independent of each other observation if the error terms are not to be correlated, which would in turn lead to biased estimates of standard deviations and significance.

Having accepted all the assumptions and understood the technicalities of the multiple regression model, it was unanimously decided that a multiple regression model should be used, because demand for pharma products is affected by various parameters to a greater or lesser degree. Certain steps therefore had to be taken to construct the multiple regression model for demand forecasting. First of all, suitable software had to be selected to apply the multiple regression model to the product basket of 500+ products. It was found that MS Excel can apply multiple regression with a certain number of parameters. Let us first learn how to use the multiple regression function in MS Excel.

Multiple Regression with MS Excel

To do regression in Excel, you need the Analysis ToolPak add-in to be installed in Excel. This was an option when you installed Excel, but you might not have selected it. If you didn't install it, Excel will ask you for the CD when you try to add the ToolPak.

Check that the add-in is installed, and added-in, by choosing Add-ins from the tools menu (as shown below).

Then ensure that "Analysis ToolPak" is selected, as shown below.

You can now use the data analysis functions in Excel, which include multiple regression. The example that we will work through is taken from dataset 6.1b in the book "Applying regression and correlation". To get to the data analysis function in Excel, you select the Tools menu, and then choose Data Analysis.

This gives the following dialog. Click on Regression and then click OK.

The following dialog appears:

Here, we tell Excel about the data that we would like to analyze. The first box is the input Y range, where we tell Excel about our dependent variable. The dependent variable must be a column, 1 cell wide and N cells long (where N is the number of individuals that we are analyzing). In the dataset we are using, the dependent variable is An, which is the column that goes from cell D1 to cell D41. You can either type this information in directly as D1:D41, or you can select the appropriate data from the spreadsheet.

Because we have included row 1, which includes the variable name, we are going to have to tell Excel this, by clicking on the "Labels" checkbox.

The next stage is to input the independent variables. The independent variables must be a block of data, of k columns (where k is the number of independent variables) and N rows (where N is still the number of people). In the dataset we are using we have three independent variables: hassles, hassles2 and hassles3. (These represent the linear, quadratic and cubic effects of hassles - we are analyzing a non-linear relationship here.) These are held in rows 1 - 41 of columns A, B and C. Again, we can type in A1:C41 or select the data from the spreadsheet - it will have the same effect.

Next we tell Excel where we want the results to be written. It is best to ask for a new sheet - you don't want to accidentally overwrite some of your precious data, and have to go to all of the effort of restoring it from a backup, do you? (You do have a backup, don't you?)

We can ask for residuals and standardized residuals to be saved - these will be new columns of numbers created in the new spreadsheet.

Two types of graphs will be drawn automatically if you ask for them. A residual plot will draw scatter plots of each independent variable on the x-axis, and the residual on the y-axis. A line fit plot will draw scatter plots of each independent variable on the x-axis, and the predicted and actual values of the dependent variable on the y-axis. You cannot, as far as I have been able to determine, automatically have a scatter plot with the predicted values on the x-axis, and the residuals on the y-axis (although you can calculate these values and save them). You can also request a normal probability plot. This appears to be a plot of the dependent variable, which is a curious thing to plot - regression analysis does not assume normal distribution of the dependent variable. The usual plot of this type would be the residuals, but this is not possible in Excel.

The dialog box now looks like this:

So, finally, we click OK.

And we get a lot of output, written to a new sheet. A note about this output: output from analysis in Excel is usually "live", that is to say, the data are linked to the output. If you change the data, you will change the output. This is not the case for this type of output in Excel. The results of the analysis are "dead" and will not change.

Regression Statistics

The first part of the output is the regression statistics. These are standard statistics which are given by most programs.

ANOVA

The ANOVA table comes next. This gives a test of significance of the R2. Note that Excel uses scientific notation by default, so when it says 2.22E-08 it means 2.22 * 10^-8 (i.e. 0.0000000222).
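As a rough check on the ANOVA table, the F statistic can be recomputed from R2 with the formula F = [R2/k]/[(1 - R2)/(n - k - 1)] quoted earlier in this section. The sketch below assumes 40 cases and 3 independents, matching the worked example (cells D2:D41 and the three hassles terms); the R2 value of 0.65 is purely hypothetical, and the function name `f_statistic` is illustrative.

```python
import math

def f_statistic(r2, n, k):
    """F = [R2/k] / [(1 - R2)/(n - k - 1)], with k and n - k - 1 degrees of freedom."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

# Hypothetical R2; n and k follow the worked example (40 cases, 3 independents).
f_value = f_statistic(r2=0.65, n=40, k=3)
print(round(f_value, 2))

# Excel's scientific notation is an ordinary number once parsed.
p_value = float("2.22E-08")
print(math.isclose(p_value, 0.0000000222), p_value < 0.05)
```

Turning F into a probability needs the F distribution, which is not in the Python standard library, so the sketch stops at the statistic itself.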

The layout of the summary output given by the regression function in MS EXCEL is shown below.

Summary Output

Regression Statistics
Multiple R
R Square
Adjusted R Square
Standard Error
Observations

ANOVA
             df   SS   MS   F   Significance F
Regression
Residual
Total

             Coefficients   Standard Error   t Stat   P-value   Lower 95%   Upper 95%   Lower 95.0%   Upper 95.0%
Intercept
X1
X2
X3
X4
X5
X6
X7
…

Coefficients

The next stage is the coefficients. (Note that here I have converted the numbers to 2 decimal places to save space.) It gives the coefficient for each parameter, including the intercept (the constant). The standard errors and the t-values follow (the t-value is the coefficient divided by the standard error). Next comes the p-value associated with the variable, and the confidence intervals of the parameter estimates (Excel gave these to me twice, even though I didn't ask for them).

RESIDUAL OUTPUT

Observation   Ft (forecast)   Residuals (Yt-Ft)
1             98559.34        704.6626
2             108155.6        -280.247
3             116368.6        -281.312
4             123269.4        -62.7746
5             94083.14        -83.1435
6             110911.2        -241.879
7             102224          -57.3114
8             107602.8        10.49856
9             95990.37        -69.035
10            85130.35        9.64867
11            157103.7        102.9695
12            144048.8        -76.1371
13            119017.1        140.9353
14            129806.9        -32.2203
15            112633.6        76.05105
16            112319.5        -5.18338
17            134574.3        176.6968
18            121418.5        125.4875
19            89674.86        88.14016
20            112742.9        -179.894
21            98022.22        117.7834
22            74801.29        -205.29
23            127936          -25.0449
24            94617.27        11.72969
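The RESIDUAL OUTPUT columns above can be worked with directly. The sketch below copies the 24 forecasts and residuals from that table, recovers the actual values as Yt = Ft + residual, and standardizes the residuals by dividing by their sample standard deviation. That divisor is an assumption (one common convention, not necessarily the exact formula Excel uses), and the 2 s.d. cut-off is the casewise rule of thumb mentioned earlier in this section.

```python
import statistics

# Ft (forecast) and Residuals (Yt - Ft), copied from the RESIDUAL OUTPUT table.
forecast = [98559.34, 108155.6, 116368.6, 123269.4, 94083.14, 110911.2,
            102224, 107602.8, 95990.37, 85130.35, 157103.7, 144048.8,
            119017.1, 129806.9, 112633.6, 112319.5, 134574.3, 121418.5,
            89674.86, 112742.9, 98022.22, 74801.29, 127936, 94617.27]
residual = [704.6626, -280.247, -281.312, -62.7746, -83.1435, -241.879,
            -57.3114, 10.49856, -69.035, 9.64867, 102.9695, -76.1371,
            140.9353, -32.2203, 76.05105, -5.18338, 176.6968, 125.4875,
            88.14016, -179.894, 117.7834, -205.29, -25.0449, 11.72969]

# Actual value for each observation: Yt = Ft + (Yt - Ft).
actual = [f + r for f, r in zip(forecast, residual)]

# Standardized residuals (assumed convention: divide by the sample s.d.).
sd = statistics.stdev(residual)
standardized = [r / sd for r in residual]

# Observation numbers whose residual is more than 2 s.d. from zero.
flagged = [i + 1 for i, z in enumerate(standardized) if abs(z) > 2]
print(flagged)
```

Under these assumptions only the first observation stands out, which is the kind of case the outlier discussion suggests examining separately.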

Residuals

The final part of the output is the residual information. The observation in the left-hand column is the case number - although Excel never told us about this, it has labeled the first person Observation 1, the second Observation 2, etc. (Note that this is NOT the original row number - Observation 1 was row 2.) The predicted anxiety score is the score that was predicted from the regression equation. The residual is the raw residual - that is, the difference between the predicted score and the actual score on the dependent variable. The final value is the standardized residual (the residuals adjusted to ensure that they have a standard deviation of 1; they have a mean of zero already).

Graphs

Finally we will have a quick look at the graphs. The first graph is an example of the residual plots - it has hassles on the x-axis and the unstandardized residual on the y-axis.

The second graph shows the predicted and actual anxiety scores plotted against hassles3.

By using MS Excel it is possible to apply the Multiple Regression function, as stated above.

Limitation of Regression function

The Regression function writes its results to a sheet that does not change; this is known as a dead sheet. This does not fit our criteria. A dynamic function is needed whose output changes as the data change. The data are switched from one SKU to another through a validation list, but the Regression function does not give an output that changes with the SKU, so it is not possible to create a summary output for the entire product basket.

Hence, another function is used to get the changing output. A function called LINEST (Linear Estimation) is used.

LINEST

Calculates the statistics for a line by using the "least squares" method to calculate a straight line that best fits your data, and returns an array that describes the line. Because this function returns an array of values, it must be entered as an array formula.

The equation for the line is:

y = mx + b or
y = m1x1 + m2x2 + ... + b (if there are multiple ranges of x-values)

where the dependent y-value is a function of the independent x-values. The m-values are coefficients corresponding to each x-value, and b is a constant value. Note that y, x, and m can be vectors. The array that LINEST returns is {mn,mn-1,...,m1,b}. LINEST can also return additional regression statistics.

Syntax

LINEST(known_y's,known_x's,const,stats)

Known_y's is the set of y-values you already know in the relationship y = mx + b.

If the array known_y's is in a single column, then each column of known_x's is interpreted as a separate variable.

If the array known_y's is in a single row, then each row of known_x's is interpreted as a separate variable.

Known_x's is an optional set of x-values that you may already know in the relationship y = mx + b.

The array known_x's can include one or more sets of variables. If only one variable is used, known_y's and known_x's can be ranges of any shape, as long as they have equal dimensions. If more than one variable is used, known_y's must be a vector (that is, a range with a height of one row or a width of one column).

If known_x's is omitted, it is assumed to be the array {1,2,3,...} that is the same size as known_y's.

Const is a logical value specifying whether to force the constant b to equal 0.

If const is TRUE or omitted, b is calculated normally.

If const is FALSE, b is set equal to 0 and the m-values are adjusted to fit y = mx.

Stats is a logical value specifying whether to return additional regression statistics.

If stats is TRUE, LINEST returns the additional regression statistics, so the returned array is {mn,mn-1,...,m1,b;sen,sen-1,...,se1,seb;r2,sey;F,df;ssreg,ssresid}.

If stats is FALSE or omitted, LINEST returns only the m-coefficients and the constant b.

The additional regression statistics are as follows.

Statistic Description

se1,se2,...,sen The standard error values for the coefficients m1,m2,...,mn.

seb The standard error value for the constant b (seb = #N/A when const is FALSE).

r2 The coefficient of determination. Compares estimated and actual y-values, and ranges in value from 0 to 1. If it is 1, there is a perfect correlation in the sample— there is no difference between the estimated y-value and the actual y-value. At the other extreme, if the coefficient of determination is 0, the regression equation is not helpful in predicting a y-value. For information about how r2 is calculated, see "Remarks" later in this topic.

sey The standard error for the y estimate.

F The F statistic or the F-observed value. Use the F statistic to determine whether the observed relationship between the dependent and independent variables occurs by chance.

df The degrees of freedom. Use the degrees of freedom to help you find F-critical values in a statistical table. Compare the values you find in the table to the F statistic returned by LINEST to determine a confidence level for the model. For information about how df is calculated, see "Remarks" later in this topic. Example 4 below shows use of F and df.

ssreg The regression sum of squares.

ssresid The residual sum of squares. For information about how ssreg and ssresid are calculated, see "Remarks" later in this topic.

The following illustration shows the order in which the additional regression statistics are returned.

Statistics given by function

coeff(n)       coeff(n-1)   coeff(n-2)   coeff(n-3)   ……
se(n)          se(n-1)      se(n-2)      se(n-3)      ……
coeff of det   S.E
F stats        d.f.
SS reg         SS resid
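To make the layout above concrete, here is a minimal Python sketch that produces the same set of statistics by ordinary least squares. It is not Excel's algorithm: the function name `linest_like`, the helper `invert` and the data are hypothetical, and the treatment of const = FALSE (total sum of squares taken about zero, df = n - k) is an assumption based on my reading of Excel's documented behaviour.

```python
import math

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix given as a list of lists."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def linest_like(y, xs, const=True):
    """xs holds one row per observation, one value per independent variable."""
    rows = [list(r) + ([1.0] if const else []) for r in xs]
    n, p = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    inv = invert(xtx)
    beta = [sum(inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    centre = sum(y) / n if const else 0.0      # assumption: about zero when const is False
    ss_total = sum((yi - centre) ** 2 for yi in y)
    ss_reg = ss_total - ss_resid
    k = p - 1 if const else p                  # number of independents
    df = n - p                                 # residual degrees of freedom
    se_y = math.sqrt(ss_resid / df)
    se = [se_y * math.sqrt(inv[i][i]) for i in range(p)]
    return {
        # Reverse order, as in the layout above: coeff(n), ..., coeff(1), constant.
        "coefficients": beta[:k][::-1] + [beta[-1] if const else 0.0],
        "std_errors": se[:k][::-1] + [se[-1] if const else None],
        "r2": ss_reg / ss_total, "se_y": se_y,
        "F": (ss_reg / k) / (ss_resid / df), "df": df,
        "ss_reg": ss_reg, "ss_resid": ss_resid,
    }

# Hypothetical data built as y = 2*x1 + 3*x2 + 1 plus a small disturbance.
xs = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 6], [6, 5]]
y = [10, 9, 17, 16, 30, 29]
result = linest_like(y, xs)
print([round(c, 4) for c in result["coefficients"]], result["df"])
```

Because the function is recomputed every time it is called, its output changes with the data, which is the "live" behaviour the Regression tool lacks.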

Fitting Multiple Regression Model

At the SCM department, DPC deals with a vast and scattered product basket. The product basket contains various drugs in the form of tablets, capsules, vials and bottles. Various drugs are combinations of different molecules, and products belong to different molecule classes. A number of parameters that can affect actual sales have been identified through discussion; each parameter has to be checked for its impact on actual sales.

The question is which parameters to include in the model as independent parameters. The significance and validity of each parameter should be checked, and only after these criteria have been settled should a decision be taken as to which parameters are included.

ASSUMPTIONS made

Parameters taken into consideration are least correlated.
The multiple regression model follows all the assumptions of correlation.
The data collected are accurate.
Future estimates of the parameters are true.
There is no intercept considered.

Data Sources

SAP data files

SAP data files are the files extracted from SAP. SAP contains all the data regarding sales, orders, availability, field targets, institutional sales and more, and it holds past data in every form in which it is needed. These data were fed into SAP in the past, so SAP data files are used as the internal source of data.

ORG-MARG DATA

ORG-MARG is a market research company. They collect sales data from retail counters. The data collected by ORG are product specific, company specific, industry specific and market specific.

The data used for the project are for the pharmaceuticals sector.

TORRENT is a subscriber to the ORG data and uses it for market research and analysis.

There is a separate cell at TORRENT which deals with the ORG data. ORG-MARG updates the data every month for the most recent past month. ORG-MARG has dedicated software which is used to get the data in the form in which it is needed. ORG data is available on a market basis.

The available data follow the hierarchy shown below.

ORG data is available on a monthly as well as a yearly basis, in units (strips) and in value. ORG also gives the company market share, company market growth, molecule growth, molecule class growth, and the company's share in the particular sector. It also provides year-wise statistics showing how much market share the company has and how much it has gained or lost. ORG provides the data a month later, i.e. in the month of June it provides the data for the month of May.

The format of the data provided by ORG in value terms (Rs lac) is given below.

Market: Pharmaceuticals
Molecule class: Tranquilizers
Molecule: Alprazolam
Pack wise: Alprax .5 tab, Alprax Sr 0.5 Tab
Strength wise: Alprax (0.5) (all the products consisting 0.5 strength)
Brand: Alprax

TC IV   DESC                     MAT ~ 04/2007   MAT ~ 04/2007   MTH ~ 05/2005   MTH ~ 03/2007   MTH ~ 04/2007
PRODUCT DESC                     LC-RUPEES       LC-RUPEES %     LC-RUPEES       LC-RUPEES       LC-RUPEES
A02B1   RANITIDINE ORAL SOLIDS   1,852,554,623   2.05            18.29           141,721,577     180,572,584
ZINET   ZINETAC GSK              675,064,933     36.44           11.89           54,307,493      58,729,261
150MG   150MG                    547,763,664     81.14           7.28            43,551,236      47,357,767
300MG   300MG                    127,301,269     18.86           8.00            10,756,257      11,371,494
ACILO   ACILOC CI6               500,304,897     27.01           4.31            38,059,151      60,039,507
150MG   150MG                    420,820,418     84.11           9.65            31,135,774      51,767,882
300MG   300MG                    79,484,479      15.89           9.30            6,923,377       8,271,625
RANTA   RANTAC UNQ               286,799,348     15.48           11.49           21,292,366      26,842,265
150MG   150MG                    238,862,933     83.29           15.75           18,051,574      22,356,936
300MG   300MG                    47,936,415      16.71           10.05           3,240,792       4,485,329
HISTA   HISTAC RBY               124,060,012     6.70            55.97           8,075,763       11,001,086
150MG   150MG                    93,556,557      75.41           22.51           5,946,048       8,238,910
300MG   300MG                    30,503,455      24.59           21.35           2,129,715       2,762,176
HISTA   HISTAC EVT RBY           99,640,931      5.38            26.22           7,430,115       9,476,606
150MG   150MG                    99,640,931      100.00          23.28           7,430,115       9,476,606
R-LOC   R-LOC ZYC                59,206,884      3.20            23.28           4,961,553       5,714,628
150MG   150MG                    59,069,063      99.77           37.68           4,917,631       5,684,890
75MG    75MG                     137,821         0.23            37.36           43,922          29,738
RANIT   RANITIN TNT              46,791,612      2.53            999.00          3,296,745       3,812,735
150MG   150MG                    39,532,356      84.49           32.42           2,827,519       3,211,397
300MG   300MG                    7,259,256       15.51           30.22           469,226         601,338
A02B4   FAMOTIDINE ORAL SOLIDS   246,582,354     0.27            -4.50           18,445,030      23,006,964
TOPCI   TOPCID TNT               68,367,597      27.73           4.51            5,080,466       7,097,347
40MG    40MG                     44,835,931      65.58           7.55            3,298,159       4,778,015
20MG    20MG                     23,531,666      34.42           -0.82           1,782,307       2,319,332
10MG    10MG                     0               0.00            -100.00         0               0
FAMTA   FAMTAC NPL               67,354,774      27.32           -9.38           5,456,090       6,733,760
40MG    40MG                     38,800,381      57.61           -5.97           3,352,932       4,184,010
20MG    20MG                     28,554,393      42.39           -13.64          2,103,158       2,549,750
FAMOC   FAMOCID SPI              50,311,246      20.40           -6.70           3,794,967       4,534,748
40MG    40MG                     32,702,278      65.00           0.19            2,439,442       3,049,632
20MG    20MG                     17,608,968      35.00           -17.28          1,355,525       1,485,116
FAMOT   FAMOTIN USV              27,931,644      11.33           -0.39           1,756,530       2,118,098
40MG    40MG                     18,104,595      64.82           2.85            1,133,154       1,366,555
20MG    20MG                     9,827,049       35.18           -5.85           623,376         751,543

The format of the data provided by ORG in units (strips) is given below.

TC IV DESC / PRODUCT DESC | MAT ~ 04/2007 (UN-T.UNITS) | MAT ~ 04/2007 (UN-T.UNITS) | MAT ~ 04/2007 (UN-T.UNITS, %+ ~ 04/2006) | MTH ~ 03/2007 (UN-T.UNITS) | MTH ~ 04/2007 (UN-T.UNITS)

A02B1 RANITIDINE ORAL SOLIDS 367,296,921 10.61 7.56 27,597,697 34,621,248

ZINET ZINETAC GSK 141,026,575 38.40 10.46 11,356,739 12,307,377

150MG 150MG 123,651,047 87.68 10.80 9,853,256 10,714,723

300MG 300MG 17,375,528 12.32 8.11 1,503,483 1,592,654

ACILO ACILOC CI6 78,049,549 21.25 -8.65 5,840,357 9,380,343

150MG 150MG 70,099,274 89.81 -8.68 5,146,839 8,550,860

300MG 300MG 7,950,275 10.19 -8.39 693,518 829,483

RANTA RANTAC UNQ 56,514,201 15.39 6.10 3,552,385 4,421,561

150MG 150MG 50,184,524 88.80 2.20 3,179,974 3,892,181

300MG 300MG 6,329,677 11.20 52.20 372,411 529,380

HISTA HISTAC RBY 24,846,757 6.76 22.12 1,604,564 2,198,173

150MG 150MG 20,790,438 83.67 21.35 1,321,357 1,830,862

300MG 300MG 4,056,319 16.33 26.22 283,207 367,311

HISTA HISTAC EVT RBY 30,734,286 8.37 23.78 2,411,993 3,043,047

150MG 150MG 30,734,286 100.00 23.78 2,411,993 3,043,047

R-LOC R-LOC ZYC 13,366,513 3.64 39.56 1,199,132 1,381,871

150MG 150MG 13,336,874 99.78 39.25 1,189,686 1,375,476

75MG 75MG 29,639 0.22 999.00 9,446 6,395

RANIT RANITIN TNT 9,773,764 2.66 31.62 692,404 795,535

150MG 150MG 8,804,571 90.08 30.22 629,758 715,250

300MG 300MG 969,193 9.92 45.86 62,646 80,285

A02B4 FAMOTIDINE ORAL SOLIDS 75,748,514 75,748,514 2.19 5,773,081 7,153,647

TOPCI TOPCID TNT 18,543,500 18,543,500 24.48 1,382,388 1,909,048

40MG 40MG 10,167,576 10,167,576 54.83 747,989 1,083,447

20MG 20MG 8,375,924 8,375,924 45.17 634,399 825,601

10MG 10MG 0 0 0.00 0 0

FAMTA FAMTAC NPL 26,738,199 26,738,199 35.30 2,185,117 2,710,002

40MG 40MG 12,648,727 12,648,727 47.31 1,090,297 1,378,652

20MG 20MG 14,089,472 14,089,472 52.69 1,094,820 1,331,350

FAMOC FAMOCID SPI 13,873,695 13,873,695 18.32 1,049,915 1,237,173

40MG 40MG 7,517,099 7,517,099 54.18 560,708 701,036

20MG 20MG 6,356,596 6,356,596 45.82 489,207 536,137

FAMOT FAMOTIN USV 7,280,208 7,280,208 9.61 458,596 552,992

40MG 40MG 3,795,479 3,795,479 52.13 237,549 286,492

20MG 20MG 3,484,729 3,484,729 47.87 221,047 266,500

ORG data also provides strength-wise data, i.e. the sales of each strength of a brand. It also shows the competitors’ status: in the data provided to us, we have the positions of the top 20 companies in a particular market segment, along with TORRENT.


ORG data uses several indicators in its own terminology. Sometimes the name of a drug in the ORG data differs from its name in the SAP dump files.

ORG data is classified in several ways. It is available by molecule (therapeutic class), by molecule class, and on a zone and regional basis. For the secondary objective of the project, the regional ORG data is used.

The actual problem was to create a database that could be used for the proposed model. This database should be easy to understand, easy to read, and easy to update as new months are added.

To create the database, the ORG data must be matched with the SAP data. Both must have common indicators that MS Excel can identify easily. Inserting common indicators into data with thousands of rows and hundreds of columns is not easy, and combining the whole data set is a major task.

What is to be made? How should it be? On what assumptions should it be made?

These questions must be answered before proceeding, because of the complexity of the task, i.e. demand forecasting.

The question addressed is that of building a statistical model that uses the ample data available and covers all of it. The model must give output for multiple products. It should not address only one product, nor should it differ for every class or division. The model must behave like software: user-friendly, flexible and robust.

The task is to forecast demand for a product basket of 500+ products. The model must have features that allow it to be used by others, and it must give outputs without wasting the time of the personnel concerned.

Certain assumptions must be settled before building it, such as which parameters are included, and that it must work on the basis of a 4-month rolling plan. The objective of the model can be put this way:

The model must give the demand for the month ‘M+3’, referring to the month ‘M’, with X% accuracy.


In the month of JUNE’07, it should give the forecast of demand for the month of OCTOBER’07 with a specified level of accuracy.

The forecast should be made for trade sales. The effect of institutional sales must be removed.

Methodology

Sales is taken as the dependent variable ‘Y’, while the other factors are taken as the independent variables X1, X2, X3… It is assumed that all these variables, dependent and independent, satisfy the basic assumptions of the multiple regression model.

Several factors affect sales directly or indirectly. It is not advisable to include all of them as parameters; doing so can harm the model.

One should filter out the factors that actually explain the variation in the sales data. In other words, a parameter must be statistically significant in explaining that variation. Only those factors that significantly drive the fluctuations in the data should be selected.

To separate out those factors, various factors were included as independent variables and then removed if they were found not to be significant. To check significance, recall the summary output of the model. An R2 value of X% means that all the parameters together explain X% of the variation in monthly sales. So the first criterion for considering parameters is the value of R2, which should be high enough to be acceptable. No specific value of R2 is prescribed as a reference, but it should be more than 90%. Our first objective is to get a high and significant R2 value. After adding the related parameters, the desired value of R2 was obtained.

But a high R2 value alone is not enough. After this first check, it is necessary to check the validity of each individual parameter. To check the significance of an individual parameter, one should look at its P-value in the summary output.

The P-value reflects the predictive power of that parameter: the smaller the P-value, the higher the predictive power of the independent variable.

Getting a high R2 value by adding more parameters is not what matters. It is much more important that each individual parameter in the output has high predictive power.


With a larger number of parameters, the assumptions of the multiple regression technique are more likely to be violated. So only parameters with reasonable predictive power should be considered.

In this report, the value ‘1-P’ is treated as the proportion of predictive power. Suppose that for a specific SKU the P-value obtained is 0.62, i.e. 62%. Then that parameter is taken to have a predictive power of 38% for the variation in actual sales. As a rule of thumb, it is good to have a P-value no higher than 0.15; that is, a parameter must have at least 85% predictive power. For this task the rule has been modified: the decision to include a parameter is taken on the basis of several SKUs. It is not advisable to remove or include a parameter by checking the output of one SKU; the parameter should have a considerable impact on the product basket as a whole.
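The screening across several SKUs can be sketched in Python as follows. This is only an illustration: the report does not state how many SKUs must pass the 0.15 rule of thumb, so the cut-off `min_share` and all the p-values shown are hypothetical.

```python
def share_significant(p_values, p_max=0.15):
    """Fraction of SKUs whose p-value for a parameter is at or below p_max."""
    if not p_values:
        raise ValueError("at least one p-value is required")
    return sum(1 for p in p_values if p <= p_max) / len(p_values)


def keep_parameter(p_values, p_max=0.15, min_share=0.5):
    # min_share is a hypothetical cut-off; the report does not state one.
    return share_significant(p_values, p_max) >= min_share


# Hypothetical p-values of one parameter across five SKUs.
p_by_sku = {"SKU A": 0.02, "SKU B": 0.62, "SKU C": 0.10, "SKU D": 0.15, "SKU E": 0.40}
share = share_significant(list(p_by_sku.values()))
decision = keep_parameter(list(p_by_sku.values()))
print(share, decision)
```

Looking at the share of SKUs that pass, rather than at one SKU, mirrors the idea that a parameter should matter for the product basket as a whole.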

Parameters included

With the help of company officials, a list of the factors affecting sales was made. The parameters to be included in the model were shortlisted according to data availability and feasibility. Each factor was considered carefully, and every aspect of it was checked, analyzed and moderated. Based on the outcome of this brainstorming, the parameters were included as independent factors affecting actual past sales.

Each factor’s feasibility and impact, and the future scope for including it, are discussed below one by one.

Time reference

As a time horizon of 24 months is considered, a time reference is necessary. The months from June’05 to May’07 are assigned the ranks 1 to 24. Since the data is a monthly time series, the time reference is included as a parameter. Its P-value was also observed to validate this, and every time a low P-value was obtained, which signifies high predictive power.

Trend

The trend is also considered in the regression. It gives the actual picture of the pattern followed by the SKU, so it is also included as a parameter.

Seasonality

Pharmaceutical products show sudden increases and sudden decreases in sales during particular periods. This is due to the seasons and is known as the effect of seasonality. Almost all the products show seasonality in particular months to a greater or lesser degree, so it is necessary to include it. Seasonality is the most important contributing factor. Several drugs are season-specific and are used in that season only. The seasonal factor is quantified and considered as one of the parameters.


Promotions and schemes

Promotions and schemes are factors that push sales up. Certain drugs, such as Quintor, Lopamide and Diclogesic, have shown very high increases due to schemes and promotions. Every company has its own strategy for these two factors, and promotional budgets vary from company to company. These factors affect actual sales in a very irregular manner, so it is wise to add them as independent parameters.

Market share

Market share is also included as a parameter. It gives the picture of the whole market, showing the company’s share with reference to the whole market.

Market growth of the SKU

By considering the market growth of the SKU, we can estimate future market growth and check the actual growth or degrowth of the SKU. From the market share of the SKU, one can project its market share in future.

Market growth of the brand

This gives the growth of the brand. A brand contains various SKUs, and brand growth is considered to capture the interaction between them: one SKU may be degrowing while others are not, or vice versa. To capture this comparative picture, the market growth of the brand is considered as a parameter.

Market growth of the molecule and molecule class

Market growth of the molecule gives the whole picture of the market for that particular molecule. By considering the molecule, we can capture the effect of competitors’ products, since a molecule can have a number of brands. To capture the whole competitive scenario, molecule growth is considered as a parameter.

The molecule class gives the picture of the whole segment. By considering the molecule class, we can capture macro factors such as government decisions, patent-related issues, additional duties and taxes levied by the government, and so capture a rise or decline in the particular segment.

Field targets

Field targets are the targets given to the marketing sales force. They should have an effect on actual sales, so forecasts should take them into account. Field targets were considered as a parameter, but could not be included because too little data was available.

Tertiary sales

Tertiary sales are the sales to the consumer from the retailer’s counter. They should have a significant impact on primary sales and must be considered as a parameter. Tertiary sales data are collected by ORG-MARG.


Field force

The field force always has a significant impact on actual sales and must be considered as a factor. It was considered as a parameter, but the data was not available division-wise, so it was later excluded.

After enough brainstorming and checking the statistical significance of all the factors, the 8 factors given below are included as independent parameters in the multiple regression model.

Construction of the multiple regression model using Excel

An Excel file containing the proposed model is prepared. This file is made by combining the SAP data and the ORG data.

The model form defined for fitting is

Y = b0 + b1*X1 + b2*X2 + … + b8*X8 + e

where
Y = Actual sales data, without institutional sales, with NAP quantity
X1 = Time reference (month no.)
X2 = Tertiary sales
X3 = Market share of the SKU
X4 = SKU growth
X5 = Brand growth
X6 = Molecule growth
X7 = Molecule class growth
X8 = Seasonal factor

Output Sheet


The output sheet contains the names of the SKUs marketed by Torrent, each with its unique product code, and the name of the division to which each SKU belongs.

e.g.

ADCEF CAPS., 5X10 C, SL 5000505 Prima

ALPRAX 0.5, 15 TABLETS, SALE 5000798 Vista

ANTIDEP 25 TAB 5000020 Neuron

AZUKON M TAB 5000030 Delta

AZULIX -1 5000035 Azuca

BALACOL CAP 5000039 Alfa

CARBATOL 100 TAB 5000049 Mind

CLOBATOR 10, SALE 5000660 Axon

DILZEM 30 TAB 5000065 Psycan

DILZEM SR TAB 5000071 Omega

FELIZ S 20 TAB 5000113 Sensa

Then, in a cell, a drop-down list is created by using the Validation function in the Data tab of MS Excel. With this list, one can pull up product-specific data.

How is the validation list prepared?

1. Select a cell in the Excel file.
2. Go to Data - Validation. A dialogue box opens; in it, go to Settings.
3. Select List, then specify the range where the data is placed. In our case, we give the range containing the SKU names.

After this operation, we have a list of all the SKUs in a single cell.

In another cell, the product code allocated to that SKU is fetched by using the VLOOKUP function. Once the product name and code are in the sheet, the summary statistics of sales and orders are also put in the output sheet. From these, one can see the average of sales and orders and the maximum and minimum statistics.

Then the list of months is put in a column. As the data is monthly, a month-year format is followed, e.g. April’07, May’07.

The variables are put in successive columns. The sequence of the columns is as follows:

Month | Primary sales | Month no. | Tertiary Sales | Market Share | SKU Growth | Brand Growth | Molecule Growth | Molecule Class Growth | Forecast

with one row per month, e.g. APRIL’07.

Excel considers only data in adjacent columns, so no blank column or cell should fall within the range. The output sheet is prepared with the past data of all the independent parameters and of the dependent variable.

A database needs to be created for this purpose. As we need 24 months of data for 500+ products and 9 variables, a database that can be used as a reference is necessary.

Database

There are 8 independent parameters and one dependent variable, as listed above. The data for each product has to be stored and modified as needed so that it can be used for the purpose. The actual challenge is to create a database that can easily be updated when new data arrives. This database must be on a monthly basis. A lot of work has gone into combining the data of the ORG and SAP files. The preparation of the database is described step by step below.

Actual primary sales data, without institutional sales, with NAP quantity


Sales data are taken from the file named CODIS (Correlation of Orders, Demand, Inventory and Sales). This file contains data on sales, orders, demand and inventory for the period from June’05 to June’07.

Primary sales data are obtained in this way. As per our objective, the model is to be built for trade operations only. The primary sales obtained from CODIS also include institutional sales, which we are not interested in, so institutional sales must be removed from the data. To remove them, we first need the institutional sales of every SKU for every month.

The institutional sales data are obtained from SAP. SAP provides the data for the whole year for each SKU, date-wise. This data needs to be modified, because monthly data is needed but the data obtained is daily. A given SKU may have institutional sales on more than one day in a month, and these must be added up to get the monthly institutional sales. Each SKU’s data carries the name of the SKU and a unique product code.

The dates were converted to a month format. Then a pivot table was made to sum the entries of the same SKU for the same month.

The pivot table gives the institutional sales for every month. In the output sheet, a separate column is made for institutional sales, and in that column the data is placed against the corresponding month. This can be done by using the VLOOKUP function.
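The pivot step, summing date-wise institutional sales into monthly totals per SKU, can be sketched in Python as below. The report itself uses an Excel pivot table; the function name `monthly_totals` and the quantities are hypothetical, and only the product codes are taken from the output sheet example.

```python
from collections import defaultdict
from datetime import date


def monthly_totals(records):
    """Sum date-wise quantities into (product_code, year, month) totals."""
    totals = defaultdict(int)
    for product_code, sale_date, quantity in records:
        totals[(product_code, sale_date.year, sale_date.month)] += quantity
    return dict(totals)


# Hypothetical date-wise institutional sales: (product code, date, units).
records = [
    ("5000798", date(2007, 4, 3), 1200),
    ("5000798", date(2007, 4, 19), 800),
    ("5000798", date(2007, 5, 7), 500),
    ("5000020", date(2007, 4, 11), 300),
]
institutional = monthly_totals(records)
print(institutional[("5000798", 2007, 4)])
```

Keying the totals by product code and month plays the role of the lookup value that VLOOKUP later uses to place each total against the right month.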

NAP (Non Available Product)

Under certain circumstances, some products may not be available for the next month. These products are known as NAP at Torrent. The NAP quantity is the quantity that could not be made available for sale, and which could have resulted in genuine sales. It is therefore treated as sales and added to the primary sales. DPC prepares the NAP report every month, so the NAP quantity for every month is easy to obtain. A NAP sheet is prepared and put in the file. Again by using the VLOOKUP function, we get a column for the NAP quantity; adding it to the primary sales gives the primary sales with NAP quantity.

Thus we have the dependent variable, i.e. primary sales without institutional sales, with NAP quantity.
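As a minimal sketch, the dependent variable can be written as primary sales minus institutional sales plus NAP quantity for each SKU and month. The function name, the key layout and all the figures below are hypothetical; a missing institutional or NAP entry is assumed to mean zero.

```python
def trade_sales_with_nap(primary, institutional, nap):
    """Primary sales minus institutional sales plus NAP quantity, per key."""
    return {
        key: units - institutional.get(key, 0) + nap.get(key, 0)
        for key, units in primary.items()
    }


# Hypothetical monthly figures keyed by (product code, month label).
primary = {("5000798", "Apr'07"): 50000, ("5000798", "May'07"): 48000}
institutional = {("5000798", "Apr'07"): 2000}
nap = {("5000798", "May'07"): 1500}

y = trade_sales_with_nap(primary, institutional, nap)
print(y)
```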

Tertiary sales

Tertiary sales are available from the ORG data in the format shown earlier.

The basic task is to match them with the primary sales data. The product descriptions used by ORG and by Torrent often differ, so a common indicator for both data sets is a basic requirement.

The product codes assigned by TPL are used as the indicator. In the ORG data sheet, the product codes are entered manually against the corresponding products. After the product codes are added, tertiary sales (in units) are placed in the output sheet by using the VLOOKUP function.
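The matching by product code can be sketched in Python with a dictionary lookup, which is roughly what VLOOKUP does in the sheet. The ORG descriptions, the unit figures and the function name are hypothetical; the mapping stands in for the codes entered manually in the ORG sheet.

```python
# Hypothetical mapping from ORG product descriptions to TPL product codes,
# standing in for the codes entered manually in the ORG sheet.
org_to_tpl_code = {
    "ALPRAX 0.5 TAB": "5000798",
    "DILZEM 30 TAB": "5000065",
}

# Hypothetical ORG tertiary sales (units) for one month, keyed by ORG description.
org_units = {"ALPRAX 0.5 TAB": 41250, "DILZEM 30 TAB": 18700, "UNMAPPED PRODUCT": 900}


def tertiary_by_code(org_units, org_to_tpl_code):
    """Re-key ORG units by TPL product code; collect descriptions with no code."""
    by_code, unmatched = {}, []
    for description, units in org_units.items():
        code = org_to_tpl_code.get(description)
        if code is None:
            unmatched.append(description)
        else:
            by_code[code] = by_code.get(code, 0) + units
    return by_code, unmatched


by_code, unmatched = tertiary_by_code(org_units, org_to_tpl_code)
print(by_code, unmatched)
```

Reporting the unmatched descriptions separately makes it easy to see which products still need a code entered by hand.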

ORG also assigns a unique code to every product. Every month, new data must be fed into the sheet to update it. To update the file, one should refer to the unique codes given by ORG: at the beginning of a new month, the new data arrive along with the description and code, and the VLOOKUP function is applied on this code. Updating the file for a new month is a 5-minute exercise.

Time reference

A regression on time-series data needs a time reference. So month numbers are assigned to the months, starting from June’05.
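The month numbering can be sketched as a small Python helper, assuming June’05 is month 1 as stated for the 24-month horizon. The function name and its parameters are hypothetical.

```python
def month_number(year, month, start_year=2005, start_month=6):
    """Rank of a month counted from the start month (June 2005 -> 1)."""
    number = (year - start_year) * 12 + (month - start_month) + 1
    if number < 1:
        raise ValueError("month is before the start of the data")
    return number


print(month_number(2005, 6), month_number(2007, 5))
```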

Market share

Market share = (No. of units sold of the SKU) / (No. of units sold of the SKU’s molecule in the total market)

For example,

Market share of ALPRAX 0.5 = (No. of units sold of ALPRAX 0.5) / (No. of units sold of ALPRAZOLAM)

To carry out this division, the corresponding molecule data is needed for each SKU, in the same layout as the SKU data.

In the ORG data, the molecule name is generally given in a cell above the cell containing the SKU’s name, so the division formula cannot conveniently be applied directly.

This was resolved by using various Excel formulae together with some manual work.

Once the molecule data is in the same row as the SKU, the division is carried out and the market share is derived.

A column for market share is then created in the output sheet.
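The step of bringing the molecule total into the same row as each SKU and then dividing can be sketched in Python as below. The row layout (a molecule row followed by its SKU rows) follows the description of the ORG data; the function name, the level labels and the unit figures are hypothetical.

```python
def market_shares(rows):
    """rows: ordered (level, name, units); each 'molecule' row precedes its SKUs."""
    shares = {}
    molecule_units = None
    for level, name, units in rows:
        if level == "molecule":
            molecule_units = units  # carry the molecule total down to its SKUs
        elif level == "sku":
            if not molecule_units:
                raise ValueError(f"no molecule total available for {name}")
            shares[name] = units / molecule_units
    return shares


# Hypothetical units for one month.
rows = [
    ("molecule", "ALPRAZOLAM", 1000),
    ("sku", "ALPRAX 0.5", 250),
    ("sku", "ALPRAX 0.25", 100),
]
shares = market_shares(rows)
print(shares)
```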

SKU growth (monthly basis)

SKU growth is calculated on the basis of quantity sold.

SKU growth (for month N+1) = (No. of units sold of the SKU in month N+1) / (No. of units sold of the same SKU in month N) - 1

This is easy to calculate, and gives the SKU growth column in the output sheet.
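A minimal Python sketch of the month-on-month growth calculation follows, assuming growth is the ratio of consecutive months’ units minus one. The same function would apply to brand, molecule and molecule class totals. The function name and the unit figures are hypothetical, and months with no previous value (or a zero previous value) are returned as None.

```python
def growth_series(units):
    """Month-on-month growth: units[n+1] / units[n] - 1; None where undefined."""
    growth = [None]  # no previous month for the first entry
    for previous, current in zip(units, units[1:]):
        growth.append(current / previous - 1 if previous else None)
    return growth


# Hypothetical monthly units of one SKU.
g = growth_series([100, 110, 99])
print(g)
```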


Brand growth

In the ORG sheet, a brand name is put against every product code. The brand name is taken from the Material Master, which holds all the information about every TPL product.

SKU CODE BRAND

VELOZ 20 TAB 5000438 Veloz

VELOZ 10 TAB 5000439 Veloz

DOMSTAL RD CAP 5000080 Domstal

DOMSTAL RD C, SL(MALEATE) 5000406 Domstal

DOMSTAL RD SR, 10X10 CAPSULES, SALE 5000825 Domstal

EUREPA 1.0 TAB 5000106 Eurepa



With a brand name against every SKU, a pivot table is prepared for the brands. The pivot table gives the total units for each brand, in the same format as the SKU units sold.

Brand growth (for month N+1) = (No. of units sold of the brand during month N+1) / (No. of units sold of the same brand during month N) - 1

This is again obtained by simple division, and is put in the output sheet under the Brand growth column.

Molecule growth

Each product has its own molecule. To get the molecule data, a pivot table is prepared for the molecules, and molecule growth is then calculated according to the formula:

Molecule growth (for month N+1) = (No. of units sold of the molecule during month N+1) / (No. of units sold of the same molecule during month N) - 1

Molecule class growth

Each molecule belongs to a certain class. The molecules under one molecule class are shown below, e.g.

Molecule class: ANTIPEPTIC ULCERANTS


Molecule RANITIDINE ORAL SOLIDS

Molecule RANITIDINE ORAL LIQUIDS

Molecule RANITIDINE INJECTABLES

Molecule FAMOTIDINE ORAL SOLIDS

Molecule FAMOTIDINE ORAL LIQUIDS

Molecule FAMOTIDINE INJECTABLES

Molecule OMEPRAZOLE ORAL SOLIDS

Molecule OMEPRAZOLE ORAL LIQUIDS

Molecule LANSOPRAZOLE

Molecule PANTOPRAZOLE SOLIDS

Molecule PANTOPRAZOLE INJ.

Molecule COMBIPACK+ANTIINFEC

Molecule ESOMEPRAZOLE

Molecule RABEPRAZOLE

Molecule OTH. ANTI-PEPT.ULCE.SOL.

Molecule SUCRALFATE.

Molecule OMEPRAZOLE.INJ

Molecule OTH. ANTI-PEPT.ULCE.INJ.

Molecule OTH. ANTI-PEPT.ULCE.LIQ.

Molecule OMEPRAZ.+ DOMPERID.

Molecule RANITID.+ DOMPERID.

Molecule LANSOPR.+ DOMPERID.

Molecule PANTOPR.+ DOMPERID.

Molecule RABEPRA.+ DOMPERID.

Molecule ESOMEPR.+ DOMPERID.

Molecule RABEPRAZOLE INJ.

Molecule class growth (for month N+1) = (No. of units sold of the molecule class during month N+1) / (No. of units sold of the same molecule class during month N) - 1

ORG gives the data for each molecule class along with the molecules belonging to it. A separate sheet is made for the molecule class, and the molecule class growth column is filled in from this data.

Seasonal factor

To calculate the seasonal factor, a deseasonalized estimate of sales is needed first. This is obtained by linear regression of sales on the month number, which captures only the trend and the intercept. Applying it gives the fitted value Y^ for each month, which shows no seasonal impact.

Seasonal factor = (Actual sales for month N) / (Sales forecast by linear regression for month N)

Suppose the actual sales of a product are 11000, while the sales forecast by linear regression are 10000. Then the seasonal factor = 11000 / 10000 = 1.1. This can be interpreted as a 10% rise in sales due to the season. We have data for two full cycles, so we have seasonal factors for two years.

How is the seasonal factor calculated in Excel?

Seasonal factor = (Actual sales Y for month N) / FORECAST(X, known Y’s, known X’s)
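The calculation can be sketched in Python as below: fit a straight line of sales on the month number by least squares, then divide each month’s actual sales by its fitted value. This is meant to mirror what the Excel FORECAST function does for in-sample months; the function names and the sales figures are hypothetical.

```python
def linear_trend(xs, ys):
    """Least-squares intercept and slope of ys on xs (simple linear regression)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def seasonal_factors(month_numbers, sales):
    """Actual sales divided by the linear-trend value for each month."""
    intercept, slope = linear_trend(month_numbers, sales)
    return [y / (intercept + slope * x) for x, y in zip(month_numbers, sales)]


# Hypothetical monthly sales for month numbers 1 to 6.
months = [1, 2, 3, 4, 5, 6]
sales = [9500, 10400, 9900, 11200, 10100, 11500]
factors = seasonal_factors(months, sales)
print([round(f, 2) for f in factors])
```

A factor above 1 means the month sold more than the trend alone would suggest, and a factor below 1 means it sold less.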

The primary sales data of Nexpro RD 40 Cap are given below. The seasonal factor St is calculated as Y/Ft, where Ft is the forecast by linear regression. For the months after June’07, only the projected seasonal factor is shown.

Month | Primary sales without insti., with NAP | Forecast by linear regression | Seasonal factor (St)
May’05 | | |
June’05 | 83,399 | 83109.9 | 1.00
July’05 | 81,680 | 83430.2 | 0.98
Aug’05 | 83,437 | 83750.5 | 1.00
Sept’05 | 64,362 | 84070.8 | 0.77
Oct’05 | 85,145 | 84391 | 1.01
Nov’05 | 79,934 | 84711.3 | 0.94
Dec’05 | 81,277 | 85031.6 | 0.96
Jan’06 | 88,265 | 85351.9 | 1.03
Feb’06 | 68,110 | 85672.1 | 0.80
March’06 | 56,950 | 85992.4 | 0.66
April’06 | 110,115 | 86312.7 | 1.28
May’06 | 107,381 | 86632.9 | 1.24
June’06 | 117,940 | 86953.2 | 1.36
July’06 | 109,300 | 87273.5 | 1.25
Aug’06 | 104,270 | 87593.8 | 1.19
Sept’06 | 98,595 | 87914 | 1.12
Oct’06 | 94,140 | 88234.3 | 1.07
Nov’06 | 91,240 | 88554.6 | 1.03
Dec’06 | 70,325 | 88874.9 | 0.79
Jan’07 | 81,815 | 89195.1 | 0.92
Feb’07 | 74,930 | 89515.4 | 0.84
Mar’07 | 56,101 | 89835.7 | 0.63
Apr’07 | 102,663 | 90156 | 1.14
May’07 | 91,660 | 90476.2 | 1.02
June’07 | 93,705 | 90796.5 | 1.03
July’07 | | | 1.12
Aug’07 | | | 1.09
Sept’07 | | | 0.94
Oct’07 | | | 1.04
Nov’07 | | | 0.99
Dec’07 | | | 0.87
Jan’08 | | | 0.98
Feb’08 | | | 0.82
Mar’08 | | | 0.64

Once the seasonal factors for the available data are obtained, estimates are needed for the coming months. Certain formulae were devised for this. For example, the seasonal factor for October’07 is

St(October’07) = (St(October’06) + St(October’05)) / 2

When data are available for two cycles or more

If two cycles are available, the seasonal factor for a particular month is taken as the average of the factors of the corresponding months. With three cycles, one should likewise take the average of the corresponding months.

When data are available for 1 cycle

With data for one cycle, i.e. 12 months, the formula changes to

St = 0.8*(seasonal factor of the corresponding month of last year) + 0.1*(average of the seasonal factors of the immediately preceding 6 months) + 0.1*(average of the seasonal factors of the past 6 to 12 months)

When data are available for less than 1 cycle

In this case, the seasonal factor is simply taken as the average of the seasonal factors of the preceding 6 months.

St = average of the seasonal factors of the preceding 6 months
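The three rules can be put together in one Python function, sketched below for the month immediately following the available history. Two points are assumptions made for the sketch: "past 6 to 12 months" is read as the months 7 to 12 back, and with fewer than 6 months of history the average is taken over whatever is available. The function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def next_seasonal_factor(history):
    """Projected seasonal factor for the month right after `history`.

    `history` holds past seasonal factors in chronological order.
    """
    n = len(history)
    if n == 0:
        raise ValueError("at least one past seasonal factor is required")
    if n >= 24:
        # Two or more cycles: average the corresponding month of each cycle.
        cycles = n // 12
        return mean([history[-12 * k] for k in range(1, cycles + 1)])
    if n >= 12:
        # One cycle: weighted blend of last year's month and recent averages.
        return (0.8 * history[-12]
                + 0.1 * mean(history[-6:])
                + 0.1 * mean(history[-12:-6]))
    # Less than one cycle: average of the preceding (up to) 6 months.
    return mean(history[-6:])


print(next_seasonal_factor([1.0] * 12 + [1.2] * 12))
```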


Multiple regression output sheet and summary sheet

The output sheet of the model and the summary output of the data by the LINEST function are given on the next page.

The sheet contains the following information:

SKU name
Product code
Division name
Brand name
Molecule name
Molecule class name
Launch date
Orders and sales statistics

The data shown is for the specific SKU selected in the corresponding cell. The data horizon ranges from May’05 to June’07. The columns contain the following, in sequence:

Month, from May’05 to Mar’08
Time reference, i.e. month number
Primary sales without institutional sales, with NAP quantity
Market share
SKU growth
Brand growth
Molecule growth
Molecule class growth
Seasonality
Forecasts based on 2 years’ data
Forecasts based on 1 year’s data
Forecasts based on 9 months’ data
Forecasts based on 7 months’ data
Forecasts based on 6 months’ data


Summary output by LINEST for NEXPRO RD 40 CAP (statistics given by the function)

Column order: Seasonal factor | Molecule class growth | Molecule growth | Brand growth | SKU growth | Market share | Tertiary sales (ORG) | Month no. | Intercept

In each block, the first row gives the coefficients and the second row their standard errors, in the column order above. The third, fourth and fifth rows give two statistics each; their remaining cells are #N/A.

Multiple regression (25 months)
Coefficients: 85702.81 | 1189.07 | -4.14E+03 | -2137.77 | 4148.39 | -37629.80 | 0.06 | 292.98 | 0.00
Standard errors: 561.98 | 1191.25 | 3118.64 | 2144.92 | 1547.88 | 6164.03 | 0.02 | 12.49 | #N/A
Coeff. of det. | S.E.: 1.00 | 401.69
F stat | d.f.: 1.45E+05 | 16.00
SS reg | SS resid: 1.87E+11 | 2.6E+06

Multiple regression (13 months)
Coefficients: 88499.87 | -1762.17 | -4709.18 | -173.08 | 4024.04 | -39244.52 | 0.03 | 301.71 | 0.00
Standard errors: 1085.93 | 1634.58 | 3105.55 | 2210.94 | 1404.08 | 8862.80 | 0.01 | 31.43 | #N/A
Coeff. of det. | S.E.: 1.00 | 276.90
F stat | d.f.: 186686 | 5.00
SS reg | SS resid: 1.E+11 | 3.83E+05

Third block
Coefficients: 94107.04 | -5754.60 | 1284.85 | 6148.97 | -2083.76 | -13635.59 | -0.07 | 196.13 | 0.00
Standard errors: 562.60 | 519.19 | 1400.03 | 807.66 | 893.63 | 3358.84 | 0.01 | 13.64 | #N/A
Coeff. of det. | S.E.: 1.00 | 52.13
F stat | d.f.: 3547642 | 2.00
SS reg | SS resid: 8.E+10 | 5434.78

Fourth block
Coefficients: 98034.48 | -5249.61 | 2166.28 | 5800.22 | -2839.62 | 0.00 | -0.14 | 155.25 | 0.00
Standard errors: 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | #N/A
Coeff. of det. | S.E.: 1.00 | 0.00
F stat | d.f.: #NUM! | 0.00
SS reg | SS resid: 4.77E+10 | 0.00

Fifth block
Coefficients: 98743.70 | -6460.18 | 0.00 | 6597.00 | -1817.73 | 0.00 | -0.16 | 201.99 | 0.00
Standard errors: 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | #N/A
Coeff. of det. | S.E.: 1.00 | 0.00
F stat | d.f.: #NUM! | 0.00
SS reg | SS resid: 4.E+10 | 0.00

Layout of the statistics returned by the function:

coeff(n) | coeff(n-1) | coeff(n-2) | coeff(n-3) | …
se(n) | se(n-1) | se(n-2) | se(n-3) | …
coeff of det | S.E.
F stats | d.f.
SS reg | SS resid

where coeff = coefficient, se = standard error, coeff of det = coefficient of determination, d.f. = degrees of freedom, SS reg = sum of squares due to regression (variance), SS resid = sum of squares due to residuals (variance).


Getting the summary output

To get the summary output (multiple regression output), the LINEST function is used. Once all the variables (independent and dependent) are in adjacent columns, LINEST can be applied. All the independent variables must be placed together in adjacent columns, with the dependent variable placed before or after them.

The format can be seen in the table on the previous page.

Applying the LINEST function

There are 8 parameters in our model.

1. A range of 5 rows and 9 columns (one column per parameter plus one for the intercept, as in the summary output shown earlier) is selected in the output sheet.
2. By pressing F2, the first cell is opened for writing the formula.
3. The formula for the LINEST function is entered: =LINEST(Input Y range, Input X range, FALSE, TRUE). FALSE is given so that no intercept is fitted; TRUE is given to get the other statistics along with the coefficients. Since LINEST returns an array, the formula is confirmed with Ctrl+Shift+Enter.

The table format of the summary output is shown in the sheet above.

By this we can have the coefficients of all the parameters in the output.These coefficients are multiplied with the relevant cofactor of the corresponding parameter.
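The calculation behind this step can be sketched outside Excel. The following is a minimal illustration, not the sheet's actual implementation: it fits a least-squares regression without an intercept and returns the same kinds of statistics that LINEST reports. The function names and the demo data are hypothetical, and the statistics use the uncentered total sum of squares, which is an assumption about how a no-intercept fit is summarized. Coefficients are returned in the order of the X columns, whereas LINEST lists them in reverse order.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("X columns are collinear; drop one of them")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def linest_no_intercept(y, X):
    """Least-squares fit of y on the columns of X with the intercept forced to zero."""
    n, k = len(y), len(X[0])
    if n <= k:
        raise ValueError("need more observations than parameters")
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    inv = invert(xtx)
    coeff = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    fitted = [sum(c * v for c, v in zip(coeff, row)) for row in X]
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_total = sum(yi ** 2 for yi in y)  # uncentered, because there is no intercept
    ss_reg = ss_total - ss_resid
    df = n - k
    se_y = math.sqrt(ss_resid / df)
    return {
        "coeff": coeff,
        "se_coeff": [se_y * math.sqrt(inv[i][i]) for i in range(k)],
        "r2": ss_reg / ss_total,
        "se_y": se_y,
        "df": df,
        "ss_reg": ss_reg,
        "ss_resid": ss_resid,
    }


# Hypothetical demo data: two parameters, five months.
X_demo = [[1, 2], [2, 1], [3, 4], [4, 3], [5, 5]]
y_demo = [8.1, 6.9, 18.2, 16.8, 25.0]
demo = linest_no_intercept(y_demo, X_demo)
print(demo["coeff"], demo["se_y"], demo["df"])
```

With the report's model, each row of X would hold the 8 parameter values for one month and y the primary sales of that month.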

The available data vary in length: for some products we have 2 years of data, for others 1 year or less.

To handle products with less data, the LINEST function is applied over several data ranges and a summary output is obtained for each range. These are shown in the output sheet and the summary output sheet.

Forecast Y = (coeff. * Month no.) + (coeff. * Tertiary sales) + (coeff. * Market share) + (coeff. * SKU growth) + (coeff. * Brand growth) + (coeff. * Molecule growth) + (coeff. * Molecule class growth) + (coeff. * Seasonal factor)

Here each "coeff." is the respective coefficient that the summary output gives for that individual parameter.

Applying this equation to the past months gives their forecasts, from which the residuals can be calculated. The LINEST function also provides statistics about the fit, including the standard error, which is meaningful under the assumptions of the multiple regression model. The standard error is the value used to identify outliers.

If a residual is greater than 2*standard error, that particular forecast is considered an outlier. No such outlier was found in any case, which supports the validity of the model.

However, to cover the error component, 2*standard error is added to our forecast.

Why is 2*standard error considered? For normally distributed residuals, a band of +/- 2*standard error covers about 95.5% of the residuals. One can therefore expect the residuals of future forecasts to fall within +/- 2*standard error.
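The outlier rule and the 95.5% figure can be checked with a short sketch. The function names are hypothetical. The sample figures are four months of actual primary sales and forecasts, together with a standard error of 117.89636, borrowed from the NIKORAN 10 MG TAB regional example later in the report; the coverage calculation assumes normally distributed residuals.

```python
import math


def flag_outliers(actual, forecast, std_error, multiple=2):
    """Return the positions where |actual - forecast| exceeds multiple * std_error."""
    limit = multiple * std_error
    return [i for i, (a, f) in enumerate(zip(actual, forecast)) if abs(a - f) > limit]


def normal_coverage(multiple):
    """Share of a normal distribution lying within +/- multiple standard deviations."""
    return math.erf(multiple / math.sqrt(2))


# Aug '05 to Nov '05, Punjab/Haryana, NIKORAN 10 MG TAB (values as printed in the report).
actual = [6955, 6000, 5915, 6230]
forecast = [7008, 5987, 6043, 6401]
outliers = flag_outliers(actual, forecast, std_error=117.89636)
print(outliers)                      # positions of outlying months, if any
print(round(normal_coverage(2), 4))  # share of residuals expected within +/- 2 standard errors
```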

To cover the future residuals, 2*standard error is added to our forecast equation. The equation for forecasting sales therefore becomes:

Forecast Y = (coeff. * Month no.) + (coeff. * Tertiary sales) + (coeff. * Market share) + (coeff. * SKU growth) + (coeff. * Brand growth) + (coeff. * Molecule growth) + (coeff. * Molecule class growth) + (coeff. * Seasonal factor) + 2*standard error
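As a worked illustration of this equation, the sketch below applies it to one month. The coefficients are those printed for the 22-month regression of NIKORAN 10 MG TAB (Punjab/Haryana) later in the report, and the parameter values are the Aug '05 row of the same sheet. Two assumptions are made here: that LINEST lists the coefficients in reverse order of the parameter columns, and that percentages are entered as fractions. The function and dictionary names are hypothetical.

```python
def forecast(coefficients, values, std_error=0.0, buffer_multiple=2):
    """Sum of coefficient * parameter value, plus buffer_multiple * std_error."""
    if set(coefficients) != set(values):
        raise ValueError("coefficients and values must cover the same parameters")
    base = sum(coefficients[name] * values[name] for name in coefficients)
    return base + buffer_multiple * std_error


# Assumed mapping of the printed LINEST row (reverse column order) to parameters.
coefficients = {
    "month_no": 49.0647227,
    "tertiary_sales": -0.949342,
    "market_share": 0.0,
    "sku_growth": 223.309255,
    "brand_growth": 59.5043904,
    "molecule_growth": 130.175197,
    "molecule_class_growth": 213.64955,
    "seasonal_factor": 6999.49418,
}

# Aug '05 parameter values as printed (rounded) in the sheet.
aug05 = {
    "month_no": 3,
    "tertiary_sales": 474,
    "market_share": 0.0536,
    "sku_growth": -0.3140,
    "brand_growth": 0.3335,
    "molecule_growth": -1.0,
    "molecule_class_growth": 0.2013,
    "seasonal_factor": 1.06,
}

base = forecast(coefficients, aug05)                          # equation without the buffer
buffered = forecast(coefficients, aug05, std_error=117.89636)  # equation with 2*standard error
print(round(base), round(buffered))
```

Because the inputs are the rounded values printed in the sheet, the unbuffered result lands within about half a percent of the 7008 shown there rather than reproducing it exactly.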

Estimation of future values for independent parameters

We have data available from June’05 to May’07, and forecasts of primary sales are needed for the months that follow. The forecasted sales are the dependent variable, which depends on the values of the independent parameters. Actual values of all the parameters are available only up to May’07, so to forecast a future month one must first estimate the future values of the parameters from the past data. Formulae were developed for this purpose: each parameter's value is estimated by its formula and then used in the demand forecast.

At the top of the output sheet, one can see the formulae used to estimate the values of the independent parameters for the coming month.


The following formulae were worked out to estimate the future values of the parameters:

Parameter value for the month ‘N’ = 0.7*(average of last 6 months) + 0.3*(average of same quarter last year)

Parameter value for the month ‘N’ = Average of last 6 months

Parameter value for the month ‘N’ = Average of last quarter

Whichever of these formulae is most suitable should be used; the formula used differs from parameter to parameter. After considerable brainstorming, specific formulae were chosen for forecasting the month of June’07.
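The three estimation formulae can be sketched as follows. The function names are hypothetical. The sample figures are the tertiary sales of NIKORAN 10 MG TAB in Punjab/Haryana printed later in the report, and treating April to June '06 as the "same quarter last year" for June '07 is an assumption, since the report does not define the quarter boundaries.

```python
from statistics import mean


def blended_estimate(last_six_months, same_quarter_last_year, recent_weight=0.7):
    """0.7 * (average of last 6 months) + 0.3 * (average of same quarter last year)."""
    if len(last_six_months) != 6:
        raise ValueError("expected exactly six monthly values")
    return (recent_weight * mean(last_six_months)
            + (1 - recent_weight) * mean(same_quarter_last_year))


def last_six_average(history):
    """Average of the last 6 months of a monthly history."""
    return mean(history[-6:])


def last_quarter_average(history):
    """Average of the last 3 months of a monthly history."""
    return mean(history[-3:])


# Tertiary sales, Dec '06 to May '07, and Apr to Jun '06 (as printed in the report).
recent = [616, 583, 674, 538, 603, 716]
same_quarter_last_year = [868, 705, 655]  # assumed quarter definition

print(blended_estimate(recent, same_quarter_last_year))
print(last_six_average(recent))
print(last_quarter_average(recent))
```

Which of the three results is used for a given parameter remains a judgment call, as the text above notes.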

To validate the results, the data for recent months were held back. By June’07, real data were available for all the parameters up to May’07 and could have been used for forecasting. Instead, the estimated parameter values were used to derive the forecasts for June’07. This showed favorable results.

With the multiple regression model, we have 5 forecasts available for each SKU for each month.

Why 5 forecasts?

This is done to cover all the products under the model, including products for which less data is available. Having forecasts based on different time periods gives the forecaster a better picture of a product's forecast. While reviewing, it is possible to tell whether a product relies on the longer data horizon or on the more recent data. The forecaster can then use expertise to select the most suitable forecast.


Regional level forecasting

The regional level forecasting model is prepared in the same manner as the gross level forecasting model. All the assumptions and all the parameters are the same as in gross level forecasting.

For this, the region-wise primary sales data are fetched from SAP, and the region-wise tertiary data are collected from ORG.

The names of some regions differ between these two files and need to be made identical.

The list below gives the regions whose names differ between the two files.

Region as per the SAP terminology                   Region as per the ORG terminology   Common name used
Punjab,Haryana,Chandigarh,Jammu Kashmir             Punjab/Haryana                      Punjab/Haryana
Bihar, Zarkhand                                     Bihar                               Bihar
Assam,Mizoram,Nagaland,Manipur,Arunachal Pradesh    Assam                               Assam
Madhya Pradesh,Chhatisgarh                          Madhya Pradesh                      Madhya Pradesh
Chennai, Pondichery                                 Tamil Nadu                          Tamil Nadu
Mumbai, Maharashtra                                 Maharashtra                         Mumbai

Other region names, which were already identical, are left unchanged and used as available.

Once the primary sales are available month-wise, SKU-wise and region-wise, a unique key is made from the product code and the region name. Using this unique key, a pivot table is prepared for the product-wise, region-wise data.

To get the data for every parameter in the desired form, unique keys are likewise made from the parameter and the region name. These unique keys are used to look up the statistical values in the database.
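The key-building and pivoting steps can be sketched as below. This is an illustration under stated assumptions, not the workbook's logic: the mapping covers only some of the regions listed above, the key is taken to be the product code followed by the upper-cased common region name (matching the u key 5000173PUNJAB/HARYANA shown in the regional sheet), and the sales records and function names are hypothetical.

```python
from collections import defaultdict

# Partial SAP-to-common region mapping, taken from the list of differing names.
SAP_TO_COMMON = {
    "Punjab,Haryana,Chandigarh,Jammu Kashmir": "Punjab/Haryana",
    "Bihar, Zarkhand": "Bihar",
    "Assam,Mizoram,Nagaland,Manipur,Arunachal Pradesh": "Assam",
    "Madhya Pradesh,Chhatisgarh": "Madhya Pradesh",
}


def common_region(sap_name):
    """Translate a SAP region name to the common name; identical names pass through."""
    name = sap_name.strip()
    return SAP_TO_COMMON.get(name, name)


def unique_key(product_code, region):
    """Product code followed by the upper-cased common region name (assumed format)."""
    return f"{product_code}{common_region(region).upper()}"


def pivot_sales(records):
    """Total quantity per (unique key, month), a stand-in for the pivot table."""
    totals = defaultdict(float)
    for product_code, region, month, quantity in records:
        totals[(unique_key(product_code, region), month)] += quantity
    return dict(totals)


# Hypothetical primary sales records: (product code, SAP region, month, quantity).
records = [
    (5000173, "Punjab,Haryana,Chandigarh,Jammu Kashmir", "May'07", 4000),
    (5000173, "Punjab,Haryana,Chandigarh,Jammu Kashmir", "May'07", 3220),
    (5000173, "Bihar, Zarkhand", "May'07", 1500),
]
pivot = pivot_sales(records)
print(pivot)
```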

Two validation lists are prepared in two separate cells: one cell holds a validation list of SKUs and the other a validation list of region names.


All the parameters used in the gross level model are also used here, and forecasting is performed using the same function.

With the regional level forecasting model, one can obtain the forecast for a particular SKU in a particular region.

The model's forecasts for the past months match the actual past data closely, and it shows a high R2 value, which again points to the reliability of the model.

However, due to the time constraint, it was not possible to analyze the outcomes of the regional forecasting model or to compare its forecasts with actual data, so validation of the model could not be performed.

It is therefore not advisable to rely on the regional model without first checking its trustworthiness.


Group: APOD
Division: Psycan
Product code: 5000173
Product Name: NIKORAN 10 MG TAB
Region: PUNJAB/HARYANA
u key: 5000173PUNJAB/HARYANA
Molecule/Brand ukey: NikoranPUNJAB/HARYANA
Mclass ukey: POTASSIUM CHANNEL OPENERSPUNJAB/HARYANA

St error (one per forecast column): 235.7927102   159.878166   114.9744   0

Month      Primary  Month  Tertiary  Market   SKU       Brand     Molecule  Molecule class  Seasonal  Forecast  Forecast  Forecast  Forecast
           sales    no     sales     share    growth    growth    growth    growth          factor
May'05     7320
June '05   8150     1      563       #VALUE!                                                1.27
July '05   5250     2      691       #VALUE!  23%                                           0.81
Aug '05    6955     3      474       5.36%    -31.40%   33.35%    -100.00%  20.13%          1.06      7008
Sept'05    6000     4      462       5.23%    -2.53%    -17.18%   -100.00%  -0.30%          0.91      5987
Oct '05    5915     5      378       4.28%    -18.18%   -19.16%   -7.56%    -8.59%          0.89      6043
Nov '05    6230     6      489       5.53%    29.37%    -4.82%    11.74%    -12.37%         0.93      6401
Dec'05     5455     7      590       6.67%    20.65%    -3.91%    -19.82%   12.97%          0.81      5494
Jan '06    7250     8      415       4.69%    -29.66%   13.41%    -3.33%    8.91%           1.07      7428
Feb '06    7150     9      931       10.53%   124.34%   3.98%     1.50%     5.37%           1.05      7166
March '06  6225     10     556       6.29%    -40.28%   7.66%     21.97%    -21.90%         0.90      6181
April '06  6800     11     868       9.82%    56.12%    -5.02%    21.28%    19.51%          0.98      6762
May '06    6750     12     705       7.97%    -18.78%   13.78%    4.06%     -4.06%          0.97      6637
June '06   7690     13     655       7.41%    -7.09%    5.20%     -20.28%   10.17%          1.09      7639      7757
July '06   9010     14     510       5.77%    -22.14%   -10.56%   23.01%    -22.00%         1.27      9017      9037
Aug '06    5635     15     819       9.26%    60.59%    -13.72%   14.29%    4.93%           0.79      5633      5662
Sept'06    9260     16     758       8.57%    -7.45%    32.03%    -9.95%    11.81%          1.29      9085      9188      9278
Oct'06     5720     17     578       6.54%    -23.75%   0.70%     5.10%     5.64%           0.79      5775      5699      5720
Nov '06    11685    18     575       6.50%    -0.52%    -18.18%   -16.98%   -4.25%          1.60      11498     11690     11676
Dec'06     6810     19     616       6.97%    7.13%     -2.02%    6.36%     -0.57%          0.93      6854      6833      6875      6810
Jan'07     6300     20     583       6.59%    -5.36%    -17.58%   -4.96%    12.57%          0.85      6384      6346      6303      6300
Feb'07     8255     21     674       7.62%    15.61%    -3.90%    13.21%    -0.13%          1.11      8193      8184      8213      8255
Mar'07     6450     22     538       6.08%    -20.18%   -16.58%   -18.40%   -12.84%         0.86      6479      6446      6449      6450
Apr'07     6310     23     603       6.82%    12.08%    29.34%    10.47%    1.79%           0.84      6465      6390      6321      6310
May'07     7220     24     716       8.10%    18.74%    -11.42%   0.82%     -0.13%          0.95      7179      7141      7183      7220
June'07    7090     25     631       7.03%    4.67%     -3.69%    -1.59%    1.54%           1.18      8890      8887      8792      9005
July '07            26     607       7.04%    4.26%     -3.97%    -0.66%    0.40%           1.04      7984      7939      7828      8071
Aug '07             27     634       7.12%    5.86%     -1.71%    -1.30%    -0.18%          0.93      7216      7140      7048      7230
Sept'07             28     673       7.03%    4.24%     -1.34%    0.44%     0.27%           1.10      8435      8378      8274      8478
Oct'07              29     645       7.19%    8.31%     1.20%     -0.22%    0.37%           0.84      6711      6598      6483      6693
Nov '07             30     615       7.25%    7.68%     -3.49%    0.31%     -0.99%          1.27      9764      9736      9535      10023
Dec'07              31     627       7.11%    5.84%     -2.17%    -1.13%    -1.09%          0.87      7009      6883      6722      7049
Jan'08              32     630       7.12%    6.03%     -1.91%    0.79%     0.22%           0.96      7701      7581      7385      7786
Feb'08              33     631       7.14%    6.33%     -1.57%    -0.28%    0.05%           1.08      8567      8467      8241      8721
Mar'08              34     625       7.14%    6.40%     -1.55%    -0.41%    0.07%           0.88      7257      7106      6879      7335


Multiple regression with 22 months data

6999.49418 213.64955 130.175197 59.5043904 223.309255 0 -0.949342 49.0647227
122.780004 289.07747 101.682141 182.658392 105.769277 0 0.236504 4.96445932
0.99981663 117.89636 #N/A #N/A #N/A #N/A #N/A #N/A
11683.7445 15 #N/A #N/A #N/A #N/A #N/A #N/A
1136791582 208493.26 #N/A #N/A #N/A #N/A #N/A #N/A

Multiple regression with 12 months data

7222.83534 197.21526 -144.76964 79.1722886 380.927131 0 -1.025748 39.9117252
112.37055 371.74543 218.079519 163.416148 143.725691 0 0.279087 5.8750729
0.9999553 79.939083 #N/A #N/A #N/A #N/A #N/A #N/A
15979.7021 5 #N/A #N/A #N/A #N/A #N/A #N/A
714800824 31951.285 #N/A #N/A #N/A #N/A #N/A #N/A


7100.23421 -274.8171 -207.9687 -54.6672434 478.379199 0 -0.19953 21.1117289
109.416008 369.19228 275.697148 133.057019 170.508045 0 0.3778399 8.29341088
0.99998782 57.487201 #N/A #N/A #N/A #N/A #N/A #N/A
23461.9905 2 #N/A #N/A #N/A #N/A #N/A #N/A
542756740 6609.5566 #N/A #N/A #N/A #N/A #N/A #N/A


7489.4905 -106.92 0 -420.556145 1032.13099 0 -2.176247 59.4340853
0 0 0 0 0 0 0 0
1 0 #N/A #N/A #N/A #N/A #N/A #N/A
#NUM! 0 #N/A #N/A #N/A #N/A #N/A #N/A
287758125 0 #N/A #N/A #N/A #N/A #N/A #N/A

Statistics given by function

coeff(n)       coeff(n-1)   coeff(n-2)   coeff(n-3)   ……
se(n)          se(n-1)      se(n-2)      se(n-3)      ……
coeff of det   S.E
F stats        d.f.
SS reg         SS resid

where: coeff = coefficient, se = standard error, coeff of det = coefficient of determination, S.E = standard error of the estimate, d.f. = degrees of freedom, SS reg = sum of squares due to regression (explained variation), SS resid = sum of squares due to residuals (unexplained variation)
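The statistics in each block are tied together by simple identities, which gives a quick consistency check. The sketch below recomputes the coefficient of determination, the standard error and the F statistic of the 22-month regression from its printed SS reg, SS resid and d.f. It assumes 7 estimated coefficients (22 observations minus 15 degrees of freedom, with the zero coefficient treated as a dropped parameter); the variable names are hypothetical.

```python
import math

# Printed values for "Multiple regression with 22 months data".
ss_reg = 1136791582.0
ss_resid = 208493.26
df = 15
n_obs = 22

k = n_obs - df  # assumed number of estimated coefficients (no intercept)

r_squared = ss_reg / (ss_reg + ss_resid)     # coefficient of determination
std_error = math.sqrt(ss_resid / df)         # standard error of the estimate
f_stat = (ss_reg / k) / (ss_resid / df)      # F statistic

print(round(r_squared, 8), round(std_error, 5), round(f_stat, 2))
```

The recomputed values agree with the printed 0.99981663, 117.89636 and 11683.7445 to within rounding, and twice the standard error gives the 235.79 shown in the St error row of the forecast sheet.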


Measuring forecast accuracy for forecast by model

As discussed before, DPC has a specific method for measuring forecast accuracy. A sheet is prepared for calculating it. For every SKU, the marketing demand for the month of June’07 is entered, and the forecast accuracy for June’07 is calculated. The method of measuring the forecast accuracy of the marketing demand is as follows.

% sale to demand = Sales / Demand

A SKU is considered a HIT if its % sale to demand falls within the range of 90% to 110%; otherwise it is marked as a MISS.

Forecast accuracy = No. of Hits / Total no. of SKUs for which demand is put forward

The model demand is then entered in the same test sheet, and the forecast accuracy is measured again by the same method. The same exercise is done for the whole product basket, and the forecast accuracy report is prepared. Forecast accuracy is measured for each group, i.e. PVA, APOD and SMAN. These results are compared with the forecast accuracy of the marketing demand.
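The HIT/MISS method can be expressed in a few lines. This is a minimal sketch with hypothetical function names and made-up sales and demand figures; treating a SKU with zero demand as a MISS is an assumption, since the report does not cover that case.

```python
def is_hit(sales, demand, low=0.90, high=1.10):
    """HIT when sales / demand falls within 90% to 110%; zero demand counts as a MISS."""
    if demand <= 0:
        return False
    return low <= sales / demand <= high


def forecast_accuracy(sku_rows):
    """No. of hits divided by the total no. of SKUs for which demand is put forward."""
    rows = list(sku_rows)
    if not rows:
        raise ValueError("no SKUs with demand put forward")
    hits = sum(1 for sales, demand in rows if is_hit(sales, demand))
    return hits / len(rows)


# Hypothetical (sales, demand) pairs for four SKUs in one month.
june_rows = [(95, 100), (120, 100), (105, 100), (50, 100)]
accuracy = forecast_accuracy(june_rows)
print(f"{accuracy:.0%}")
```

Running the same function once on the marketing demand and once on the model demand gives the two figures that the report compares.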

FINDINGS

Increase in Forecast Accuracy

Forecast accuracy for JUNE'07 (considering MARCH'07 as a reference point)

Product group   Data used (in months)   By marketing demand   By model demand
PVA             20                      31%                   36%
PVA             12                      30%                   33%
PVA             9                       30%                   35%
PVA             7                       29%                   38%
PVA             Whole data              28%                   32%
APOD            20                      48%                   74%
APOD            12                      46%                   68%
APOD            9                       46%                   60%
APOD            7                       46%                   60%
APOD            Whole data              43%                   49%
SMAN            20                      44%                   53%
SMAN            12                      36%                   41%
SMAN            9                       34%                   36%
SMAN            7                       34%                   36%
SMAN            Whole data              33%                   38%


The above table compares the forecast accuracy of the marketing demand and of the model demand for the month of June’07, taking March’07 as the reference point. In March’07 the maximum data available is 20 months.

The model shows an increase in accuracy for every group and every data horizon. To check how the length of history affects the model, forecasts are made using data over different horizons.

Forecasts are first based on the 20-month data and their forecast accuracy is calculated. The same exercise is repeated for 12 months, 9 months and 7 months of data. Forecast accuracy is also calculated for the whole data, meaning that for each product the demand is used from whichever data horizon is available.

New products, stable products and seasonal products are all covered under the whole data.

Reduction in over sales

For every month, the marketing department gives a demand plan for every group. The demand plan states the demand for every product, and its value is calculated at the PTS (price to stockist).

We filtered the products that were oversold, i.e. the products for which more quantity was sold than the demand plan provided. Over sales mean that inventory meant for the next month is used for the sales of the present month, which creates problems for the next month's supply planning. Against the marketing demand plan, over sales amount to Rs. 287.50 Lac. When the same data are reviewed against the demand plan given by the multiple regression model, over sales total Rs. 65.43 Lac.

Thus there is a large difference in the value of over sales between the two demand plans.

The model also shows a reduction in the number of SKUs where over sales are observed.

Following the marketing demand, over sales were observed in 74 products, while following the model, over sales are observed in only 47 products.


Over sales for the month of June'07 (figures in Rs. Lacs)

Marketing demand

Group   No. of SKUs   Demand plan value   Actual sales value   Over sales than demand given   % to demand plan
PVA     26            625.90              858.49               232.59                         37%
APOD    17            122.41              154.98               32.57                          27%
SMAN    31            67.36               89.70                22.34                          33%

Model demand

Group   No. of SKUs   Demand plan value   Actual sales value   Over sales than demand given   % to demand plan
PVA     9             79.86               101.96               22.10                          28%
APOD    18            64.60               87.83                23.23                          36%
SMAN    20            107.7646            127.8905             20.13                          19%
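The over sales columns follow directly from the demand plan value and the actual sales value of the oversold SKUs. The sketch below recomputes them from the figures in the table above; the dictionary layout and function names are hypothetical.

```python
# (demand plan value, actual sales value) of oversold SKUs, in Rs. Lacs, from the table.
marketing = {"PVA": (625.90, 858.49), "APOD": (122.41, 154.98), "SMAN": (67.36, 89.70)}
model = {"PVA": (79.86, 101.96), "APOD": (64.60, 87.83), "SMAN": (107.7646, 127.8905)}


def over_sales(plan):
    """Per group: value sold beyond the demand plan and its share of the plan."""
    result = {}
    for group, (planned, actual) in plan.items():
        excess = actual - planned
        result[group] = {"over_sales": excess, "pct_to_plan": excess / planned}
    return result


def total_over_sales(plan):
    """Sum of over sales across all groups."""
    return sum(row["over_sales"] for row in over_sales(plan).values())


for name, plan in (("Marketing demand", marketing), ("Model demand", model)):
    for group, row in over_sales(plan).items():
        print(name, group, round(row["over_sales"], 2), f"{row['pct_to_plan']:.0%}")

print(round(total_over_sales(marketing), 2))
```

The marketing demand total reproduces the Rs. 287.50 Lac quoted in the text.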

Postponement of production

[Figure: No. of SKUs with Over Sales, by division (PVA, APOD, SMAN); series: by marketing demand plan and by model demand plan]


Product name                        Value in Lac Rs.
ADCEF CAPS., 5x10                   66.98948
LAMITOR DT 50 TAB                   36.42165
ENSELIN 4 TAB                       30.98331
ENSELIN 2 TAB                       30.88783
TOZAAR 25 TAB                       27.72641
OLEPTAL DT 300 TAB                  25.43204
TOPCEF 50 DT TAB                    24.17513
TIDOMET FORTE (DE) TAB              21.58802
SERENATA 50 TAB                     20.47776
DICLOMAX INJECTION, 25 mg, 3 ml     18.88192
TOPCEF INSTA-USE,SL                 16.0351
VASOTRATE 20 TAB                    12.70533
BETACARD 100 TAB                    9.438405
THIORIL 50 TAB                      8.35996
VASOTRATE 30 OD TAB                 8.114095
DOMSTAL MPS TAB                     6.450026
CARBATOL CR 200 TAB                 3.8868
VASOTRATE 10 TAB                    3.030534
DROXYL ORAL SUSPEN                  2.71025
Total                               374.2941

Batches of products which could have been scheduled for the next month, unblocking working capital

The total availability on 1st June, 2007 is taken into account to calculate this report. Along with the total availability, the marketing demand for the months of June and July is taken into consideration. The quantity of products received during June’07 is also taken into account. According to production planning, May receipts are used to serve July’07. Along with all these data, the model demand for June’07 and July’07 is also considered. With these data, the following calculation is done.

Excess quantity by Mktg. demand = Total availability @ June opening - Mktg. demand for June’07 - Mktg. demand for July’07

The excess quantity by model demand is calculated in the same manner. This calculation shows that, for a number of products, there was no need to produce batches during the month of May; their production could have been postponed to the next month. A list of these products is prepared.


The value is obtained from the PTS value and batch size of each SKU and the number of batches produced. The total value of the production which could have been postponed is Rs. 3.74 Crores, had the model demand been used.
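The excess quantity and postponed value calculations can be sketched as below. All figures are hypothetical, as are the function names. Treating a non-negative excess under the model demand as the condition for postponing a batch is an assumption; the report does not spell out the exact decision rule.

```python
def excess_quantity(opening_availability, demand_june, demand_july):
    """Total availability at June opening minus the demand for June and July."""
    return opening_availability - demand_june - demand_july


def postponable_value(batches_produced, batch_size, pts_per_unit):
    """Value at PTS of the batches that need not have been produced yet."""
    return batches_produced * batch_size * pts_per_unit


# Hypothetical SKU: quantities in units, PTS in Rs. per unit.
opening = 100_000
mktg_june, mktg_july = 60_000, 55_000
model_june, model_july = 45_000, 40_000

excess_mktg = excess_quantity(opening, mktg_june, mktg_july)     # shortfall under marketing demand
excess_model = excess_quantity(opening, model_june, model_july)  # surplus under model demand

# Assumed rule: if stock already covers the model demand, the May batches could wait.
if excess_model >= 0:
    value = postponable_value(batches_produced=2, batch_size=10_000, pts_per_unit=12.5)
    print(excess_mktg, excess_model, value)
```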

Findings of the project

In the APOD and SMAN groups, the model has shown favorable results against the actual data.

In the PVA group, the results depend more on recent data than on the full 24 months of data.

The number of oversold products can be reduced by using the model for demand forecasting, and production planning can be made easier.

Some production could have been postponed to the next month. This would have unblocked the corresponding amount for a month, which could have been spent elsewhere on some crucial project.

Recommendations


The model should be used for SKU (gross) level forecasting, with due personnel intervention: the demand given by the model should be used after adding the planners' insights.

For critical SKUs, which show high skewness due to promotion, the model demand should be reviewed only for periods when there is no scheme; it should not be reviewed when a scheme is running.

Future scopes of the model

New parameters can be added later, if identified. Parameters which can be added later, subject to data availability:

Secondary sales
No. of stockists
No. of MRs (division specific)

If data is made available, the regional level model can be extended up to the C&F agents level.

Marketing efforts can be optimized
Production planning can be enhanced up to a satisfactory level
Efficient dispatch planning
Software for forecasting can be implemented

Other uses of the model

Data interpretation
Can be used for taking decisions related to product continuation/competitors
Can be used for setting field targets
Can be used for deciding promotional strategies
Can be used for recruitment of contract labor used in the warehouse/shop floor
Inventory levels can be maintained, lessening hassles

Limitations

In practical situations, the basic assumptions of the multiple regression model are violated, hence it cannot give the exact output
The model does not give reliable results when less data is available
The model gives more importance to seasonality
The model can't tackle a sudden hike or a sudden steep fall
MS Excel doesn't allow more than 15 variables
It doesn’t work for the whole product basket
The model doesn’t give significant results for products which show high skewness due to promotion and scheme

Limitations due to human bias

Possibly wrong selection of parameters
Improper estimation of future values of independent parameters
Discrepancies in the data
Improper quantification of qualitative data

Abbreviation used

TPL - Torrent Pharmaceuticals Ltd.
SCM - Supply Chain Management
DPC - Demand Planning Cell
SKU - Stock Keeping Unit
PVA - Prima, Vista, Alpha
APOD - Azuca, Psycan, Omega, Delta
SMAN - Sensa, Mind, Axon, Neuron
CODIS - Correlation of orders, demand, inventories, sales

References

Websites (only a few are mentioned)

www.demandplanning.net
www.chass.ncsu.edu
www.torrentpharma.com

Books

(1) Forecasting: Methods and Applications, Spyros Makridakis, Steven C. Wheelwright and Rob J. Hyndman, third edition, Wiley
(2) Microsoft Excel: Data Analysis and Business Modeling, Wayne L. Winston, Prentice-Hall India, 2004
