Operations on egrocery


egrocery in statistical view


Page 1: Operations on egrocery

OPERATION

1) INTRODUCTION

a) Before starting the analysis, we prepare three types of data set:

1) Master data set: all the subsets combined.

2) E-grocery subset: basic details of the subjects who use e-grocery, with questions 20 to 32 only.

3) Non-e-grocery subset: basic details of the subjects who do not use e-grocery, with questions 33 to 42.

b) If we try to get data on the subject’s

2) OBJECTIVE

The data can be analysed from several perspectives, but the two main objectives are:

a) To analyse the people who use e-grocery: their satisfaction level, and whether the e-grocery firms need to develop their service.

b) To analyse the people who do not use e-grocery: compare them with e-grocery users and identify the reasons (factors) that keep them from using e-grocery, thereby detecting the areas to be developed and people's understanding of e-grocery.

3) THINGS TO BE KNOWN

A. Questions 1 to 19 are denoted ‘X’, the independent variables. (Analyses can also be made within these questions, but for the broader perspective we denote them this way.)

B. Questions 20 to 32 are the dependent variables ‘Y1’ for the first subset (e-grocery), and questions 33 to 42 are the dependent variables ‘Y2’ for the second subset (non-e-grocery).

4) OPERATION

At the initial level we can start with three approaches:

a) Factor analysis in R, to identify which variables in the data turn out to matter most and so narrow down the analysis (this would be an informal approach).

b) Either after the former step or directly, dependency tests: the chi-square test (X2), the G2 (likelihood-ratio) test, and linear trend analysis for ordinal variables. For these tests we take Y1 and Y2 as the dependent factors and X as the independent factors, so there will be n dependency tests. They show which factors influence the decision made by the subjects on a particular question. (Note: these tests can be conducted only between two categorical variables.)

c) Similarly, odds and odds ratios (here the dependent variable should have only two options).
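As a rough illustration of the dependency tests in b), the sketch below computes the Pearson X2 and likelihood-ratio G2 statistics for a two-way table in plain Python. The counts are made up for illustration (they are not survey results), and the function names are hypothetical. Both statistics are compared with a chi-square distribution on (rows - 1)(columns - 1) degrees of freedom; for 1 degree of freedom the 5% critical value is about 3.84.

```python
import math

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def pearson_x2(table):
    """Pearson chi-square statistic: sum of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, exp)
               for o, e in zip(obs_row, exp_row))

def likelihood_ratio_g2(table):
    """Likelihood-ratio statistic: 2 * sum of observed * log(observed / expected)."""
    exp = expected_counts(table)
    return 2 * sum(o * math.log(o / e)
                   for obs_row, exp_row in zip(table, exp)
                   for o, e in zip(obs_row, exp_row) if o > 0)

def degrees_of_freedom(table):
    return (len(table) - 1) * (len(table[0]) - 1)

# Made-up counts: rows = gender (male, female), columns = satisfied (yes, no)
table = [[30, 20], [20, 30]]
x2 = pearson_x2(table)
g2 = likelihood_ratio_g2(table)
df = degrees_of_freedom(table)
print(x2, g2, df)
```

In practice a statistics package would also return the p-value; this sketch only shows how the two statistics are built from the same expected counts.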

Page 2: Operations on egrocery

Dependency test

Of the many possible combinations for dependency tests, the interesting ones would be:

1) Take Y1 as satisfaction (Q21) vs. Q1-12 (gender, marital status, etc.). Continuous variables such as age and monthly income can be converted into categorical variables by grouping. Here we use the chi-square and G-square tests. If dependency is detected, we can go further and compute standardized Pearson residuals to find the cells that show lack of fit, which helps narrow down the categories with the strongest dependency. For example, in the table of satisfaction level by place of living, if the cell "strongly agree" for south Chennai has the highest positive standardized residual, then people from south Chennai are more satisfied with e-grocery than independence would predict.

2) Q22 vs. Q11, Q12, Q2: this might show whether the subjects' feelings about security are affected by their age, their proficiency with the internet, or the amount of time they spend on the internet.

3) Q35 vs. education level and age: both variables can be treated as ordinal, so a linear trend analysis can be carried out. It shows whether the degree of insecurity about identity theft increases as education level or age increases (note: using a one-sided hypothesis). The same can be done for Q23 and Q41 vs. income level.

4) Q25 vs. Q32: this might help firms decide on payment modes.

5) Q33 vs. place of living: this shows which areas are less well served by e-grocery. For example, north Chennai might not have many e-grocery stores, which may delay delivery, so more warehouses or stores may be needed in north Chennai.
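The standardized Pearson step in item 1 can be sketched as follows. Each residual is (observed - expected) divided by its estimated standard error under independence; a common rule of thumb treats absolute values above about 2 or 3 as a sign of lack of fit in that cell. The table below is invented for illustration only, and the function name is hypothetical.

```python
import math

def standardized_residuals(table):
    """Standardized Pearson residual for every cell of a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    result = []
    for i, row in enumerate(table):
        out_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Estimated variance of (observed - expected) under independence
            variance = expected * (1 - row_totals[i] / n) * (1 - col_totals[j] / n)
            out_row.append((observed - expected) / math.sqrt(variance))
        result.append(out_row)
    return result

# Made-up counts: rows = place of living (south, central, north),
# columns = satisfied with e-grocery (yes, no)
table = [[40, 10], [25, 25], [10, 40]]
residuals = standardized_residuals(table)
for area, row in zip(["south", "central", "north"], residuals):
    print(area, [round(r, 2) for r in row])
```

With these invented counts the "south / yes" cell has a large positive residual and the "north / yes" cell an equally large negative one, which is the pattern the text describes.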

Thus, rather than carrying out only the important tests, if we take different combinations of independent and dependent variables and pick out the significant dependencies, we may be able to detect the areas that have been developed and those still to be developed, who the target customers are, and who can be made customers. If the variables are ordinal, this gives much more understanding.
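For the ordinal case in item 3, one common form of linear trend test assigns scores to the row and column categories, computes their correlation r over the table, and uses M2 = (n - 1) r^2, which is compared with a chi-square distribution on 1 degree of freedom. The sketch below assumes equally spaced scores 1, 2, 3 and uses invented counts; both are assumptions made for illustration, not part of the survey.

```python
import math

def linear_trend(table, row_scores, col_scores):
    """Return (r, M2) for a two-way table with ordinal row and column scores."""
    n = sum(sum(row) for row in table)
    col_totals = [sum(col) for col in zip(*table)]
    mean_u = sum(u * sum(row) for u, row in zip(row_scores, table)) / n
    mean_v = sum(v * c for v, c in zip(col_scores, col_totals)) / n
    cov = var_u = var_v = 0.0
    for u, row in zip(row_scores, table):
        for v, count in zip(col_scores, row):
            cov += count * (u - mean_u) * (v - mean_v)
            var_u += count * (u - mean_u) ** 2
            var_v += count * (v - mean_v) ** 2
    r = cov / math.sqrt(var_u * var_v)
    return r, (n - 1) * r ** 2

# Made-up counts: rows = education level (low, medium, high),
# columns = insecurity about identity theft (low, medium, high)
table = [[20, 10, 5], [10, 15, 10], [5, 10, 20]]
r, m2 = linear_trend(table, row_scores=[1, 2, 3], col_scores=[1, 2, 3])
n = sum(sum(row) for row in table)
z = math.sqrt(n - 1) * r  # signed root of M2, usable for a one-sided test
print(round(r, 3), round(m2, 2), round(z, 2))
```

The signed statistic z keeps the direction of the trend, which is what the one-sided hypothesis mentioned in item 3 needs.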

Page 3: Operations on egrocery

Odds, odds ratio

1) At the basic level we can find the odds that a person with a specific characteristic shops at e-grocery or not. For example, take shopping at e-grocery as "success" and consider gender: we can find the odds of a male shopping online and the odds of a female shopping online. The odds ratio then compares the odds of a female shopping online with the odds of a male shopping online.

2) Marginal and conditional odds ratios: the odds ratio in the example above is a marginal odds ratio. If we add another factor (X), say sector, we are conditioning on the factor ‘sector’. This gives a three-dimensional table, from which we can find the odds of a woman from the IT sector choosing e-grocery compared with a man from the IT sector choosing e-grocery, and similarly for the other sectors. This helps identify who the potential customers are and which customers remain to be covered, and so narrows down the marketing strategy as well. Other factors affecting the choice of e-grocery can be used in the same way.
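A minimal sketch of the difference between conditional and marginal odds ratios, using invented counts for two sectors (the sector labels and all numbers are assumptions for illustration). Each 2x2 table has rows female and male and columns "shops online" and "does not".

```python
def odds(successes, failures):
    """Odds of success: number of successes per failure."""
    return successes / failures

def odds_ratio(table):
    """table = [[a, b], [c, d]], rows = groups, columns = (success, failure)."""
    (a, b), (c, d) = table
    return odds(a, b) / odds(c, d)

# Made-up counts per sector: [[female online, female not], [male online, male not]]
by_sector = {
    "IT": [[30, 10], [40, 20]],
    "non-IT": [[10, 30], [10, 40]],
}

# Conditional odds ratios: one per level of the conditioning factor (sector)
conditional = {sector: odds_ratio(t) for sector, t in by_sector.items()}

# Marginal odds ratio: collapse the three-way table over sector first
marginal_table = [[sum(t[i][j] for t in by_sector.values()) for j in range(2)]
                  for i in range(2)]
marginal = odds_ratio(marginal_table)

print(conditional, marginal)
```

With these counts the conditional odds ratios are 1.5 (IT) and about 1.33 (non-IT), while the marginal one is 1.2, so collapsing over sector changes the picture. This is why the conditional tables are worth examining separately.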

Binary logistic regression model

1) Take Y as choosing e-grocery (success) and independent variables such as Q1-12 and Q14-18, and build a binary logistic regression model. The model helps interpret how the factors ‘X’ affect the probability of success, i.e. of choosing e-grocery. For example, a negative slope coefficient for age means that the odds of choosing e-grocery fall as age increases; a positive slope coefficient for income means that the odds rise as income increases, i.e. high-income people prefer e-grocery.
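To make the interpretation of the slope concrete, here is a small sketch that fits a one-predictor logistic regression by plain gradient ascent on the log-likelihood. The ages and choices are invented, the centring and scaling of age is an arbitrary choice, and a real analysis would use a statistics package with all the X variables; this only shows how the sign of the slope maps to odds.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by gradient ascent on the log-likelihood."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)  # gradient contribution of one subject
            g0 += err
            g1 += err * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

# Made-up data: age and whether the subject chooses e-grocery (1) or not (0)
ages = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
chooses = [1, 1, 1, 0, 1, 0, 1, 0, 0, 0]
xs = [(age - 42.5) / 10 for age in ages]  # centred, in units of 10 years

b0, b1 = fit_logistic(xs, chooses)
odds_multiplier = math.exp(b1)  # factor applied to the odds per extra 10 years
p_young = sigmoid(b0 + b1 * (25 - 42.5) / 10)
p_old = sigmoid(b0 + b1 * (60 - 42.5) / 10)
print(round(b1, 3), round(odds_multiplier, 3), round(p_young, 3), round(p_old, 3))
```

Here the fitted slope is negative, so exp(b1) is below 1: each additional 10 years of age multiplies the odds of choosing e-grocery by that factor.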

Multinomial logistic regression model

1) Here Y has more than two categories. For example, take Q31 (medium of online shopping) as Y, choose suitable X, and build a multinomial logistic regression model. There are three categories: PC, tablet and mobile. Treating one category as the reference, we get coefficients for the other two, i.e. the probabilities of a person choosing tablet or mobile can be found from the coefficients, and the interpretation is similar to the binary case above. This helps firms with advertising: for example, if people use mobiles and tablets more, there should be a proper application (for Android or other platforms) and good connectivity.

2) Similarly, take Q32 (medium of payment) as the dependent variable. This helps on the financial front: for example, firms might give offers for online payment, or shopping points for payment by card.

3) Taking the most common category of Y as the baseline (reference) and building the model with the remaining categories gives the baseline-category logit model.
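The baseline-category idea can be sketched as follows: each non-reference category gets its own intercept and slope, the reference category has its linear predictor fixed at 0, and the probabilities follow from those. The coefficients below are invented purely for illustration (they are not estimates from any data), and PC is assumed to be the reference category.

```python
import math

def baseline_category_probs(x, coefs, baseline="pc"):
    """Category probabilities from baseline-category logit coefficients.

    coefs maps each non-baseline category to (intercept, slope), so that
    log(P(category) / P(baseline)) = intercept + slope * x.
    """
    linear = {cat: a + b * x for cat, (a, b) in coefs.items()}
    denom = 1.0 + sum(math.exp(v) for v in linear.values())
    probs = {cat: math.exp(v) / denom for cat, v in linear.items()}
    probs[baseline] = 1.0 / denom
    return probs

# Hypothetical coefficients for one predictor x (for example, a coded age group)
coefs = {"tablet": (0.2, -0.3), "mobile": (1.0, -0.5)}
probs = baseline_category_probs(x=1.0, coefs=coefs)
print({cat: round(p, 3) for cat, p in probs.items()})
```

The log of P(mobile) / P(PC) recovers the linear predictor for mobile (here 1.0 - 0.5 = 0.5), which is how each coefficient is read: as an effect on the odds of that category relative to the reference.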

Page 4: Operations on egrocery

Ordinal logistic regression model

1) This is the same as the multinomial logistic regression model, except that the dependent categories are ordinal (there is an inherent ranking). We can build 10 ordinal logistic regression models for the e-grocery subset, taking Q21-30 as dependent variables, and 10 ordinal logistic regression models for the non-e-grocery subset, taking Q33-42 as dependent variables.

Ordinal response: cumulative category logit

1) The cumulative category logit gives a better interpretation for ordinal responses. In the example above, Y takes the values 1) strongly disagree, 2) disagree, 3) neutral, 4) agree, 5) strongly agree. The first cumulative logit is the log of the probability of "strongly disagree" divided by the sum of the probabilities of the remaining categories; the next is the log of the sum of the first two categories divided by the sum of categories 3, 4 and 5; and so on. These logits help interpretation because they respect the ranking of opinions on the given question.
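As a small illustration, the sample cumulative logits for one five-point question can be computed directly from the response counts. The counts below are invented, and the sketch assumes every category has at least one response so that no logit is undefined.

```python
import math

def cumulative_logits(counts):
    """log(P(Y <= j) / P(Y > j)) for j = 1 .. k-1, from ordered category counts."""
    total = sum(counts)
    logits = []
    running = 0
    for count in counts[:-1]:
        running += count
        logits.append(math.log(running / (total - running)))
    return logits

# Made-up counts: strongly disagree, disagree, neutral, agree, strongly agree
counts = [10, 20, 30, 25, 15]
logits = cumulative_logits(counts)
print([round(v, 3) for v in logits])
```

Five categories give four cumulative logits, and they are always non-decreasing because each one adds a category to the numerator.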

Ordinal response: adjacent category logit

1) Here each logit compares only two neighbouring categories: the first compares the probabilities of "strongly disagree" and "disagree", the next "disagree" and "neutral", and so on. The adjacent category logit is most suitable when the interest is in such comparisons between neighbouring categories.
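The sample adjacent-category logits can be computed the same way. The sketch below uses the convention log(P(Y = j + 1) / P(Y = j)); some texts use the reciprocal, which only flips the sign. The counts are invented and assumed to be positive in every category.

```python
import math

def adjacent_category_logits(counts):
    """log(P(Y = j + 1) / P(Y = j)) for each pair of neighbouring categories."""
    return [math.log(upper / lower) for lower, upper in zip(counts, counts[1:])]

# Made-up counts: strongly disagree, disagree, neutral, agree, strongly agree
counts = [10, 20, 30, 25, 15]
adjacent = adjacent_category_logits(counts)
print([round(v, 3) for v in adjacent])
```

Summing the adjacent logits gives the log ratio of the last category to the first, so the full set carries the same information as the baseline-category logits, just arranged by neighbours.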

Other than these analyses, if we have the database of an e-grocery firm, we can do a time-series analysis of the number of products bought at different times, conditioning on factors from the questionnaire.

Further, we can try multivariate analysis (more than one Y), using clustering, structural equation modelling, or decision trees.

Note: the data sets should be restructured to suit the tool being used for the analysis.