A study of Data Quality and Analytics


Transcript of A study of Data Quality and Analytics

Page 1: A study of Data Quality and Analytics


Research Topic

A Study of Data Quality and Analytics

Experimental work

• Predictive Modeling - Linear vs. Nonlinear
• GARCH (Generalized Autoregressive Conditional Heteroskedasticity) model application on time series data
• GARCH vs. ANN with Heteroskedasticity
• Deployment of Predictive Model using PMML (Predictive Model Markup Language)

Page 2: A study of Data Quality and Analytics


Literature Survey

Areas of focus:
▪ Predictive Analytics - Various Applications of Predictive Models
▪ PMML

Resources:
▪ IEEE Computer Society, Transaction publications
▪ International Journal for Research and Application
▪ International Institute of Forecasters
▪ ACM Journals / Transactions

Status:
▪ Literature survey - about 85% complete
▪ Relevant publications extracted: 75+
▪ Further survey - Deployment of Model using PMML


Page 3: A study of Data Quality and Analytics


Publications

Application of R Programming for Forecasting Day-ahead Electricity Demand - International Journal of Computer Science Issues, Vol. 9, Issue 6, No. 1, Nov 2012

Mining of Time Series Data for Forecasting Day and Night Variances in Electricity Demand - National Conference on Business Analytics and Business Intelligence, Institute of Public Enterprise, Jan 2013

Forecasting of Electricity Demand using SARIMA and Feed Forward Neural Network Models - accepted for publication in International Journal of Research in Computer Application and Management

Page 4: A study of Data Quality and Analytics


Application of R for GARCH

▪ Evaluate GARCH and ARIMA models for forecasting day-ahead electricity demand
▪ Data - Daily power consumption data
▪ Develop testing procedure for GARCH using R programming

Data collection, Data cleaning, Setup the environment, Evaluate Predictive Models, Analysis


Page 5: A study of Data Quality and Analytics


GARCH and SARIMA Modeling

▪ Evaluate GARCH and SARIMA models for forecasting day and night variances in electricity demand
▪ Data - Hourly power consumption data
▪ GARCH forecasting has lower RMSE (Root Mean Square Error) than SARIMA forecasting

Data collection, Data cleaning, Setup the environment, Evaluate Predictive Models, Analysis
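
The RMSE comparison reported on this slide can be illustrated with a minimal sketch. The study itself used R; Python is used here only for illustration. The function name `rmse` and all demand and forecast values below are hypothetical, chosen to show the calculation, and are not results from the study.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical hourly demand values and two competing forecasts
# (illustrative numbers only, not data from the study).
actual = [100.0, 110.0, 120.0, 130.0]
forecast_a = [102.0, 108.0, 121.0, 127.0]
forecast_b = [95.0, 115.0, 112.0, 138.0]

# The model with the smaller RMSE tracks the held-out data more closely.
print("RMSE A:", rmse(actual, forecast_a))
print("RMSE B:", rmse(actual, forecast_b))
```

Because RMSE squares each error before averaging, a few large misses penalize a model more than many small ones, which matters when demand has occasional spikes.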


Page 6: A study of Data Quality and Analytics


Neural Networks and SARIMA Modeling

▪ Evaluate SARIMA and Neural Network (NN) models for forecasting monthly electricity demand
▪ Data - Monthly power consumption data
▪ The fitted SARIMA model has a smaller RMSE (Root Mean Square Error) than the fitted NN model, whereas NN forecasting has a smaller RMSE than SARIMA forecasting

Data collection, Data cleaning, Setup the environment, Evaluate Predictive Models, Analysis


Page 7: A study of Data Quality and Analytics


Knowledge areas

Predictive methods and techniques:
▪ Linear Regression - ARMA, ARIMA, SARIMA
▪ Non-linear - Neural Networks, GARCH

Tools:
▪ R Project, IBM SPSS

Data - Power consumption, stock exchange data

PMML - Predictive Model Markup Language
▪ Model Deployment using PMML

Page 8: A study of Data Quality and Analytics


Work plan – Next 6 months

Evaluate the GARCH model for comparing the share price performance of 3 companies

Prototype Development for the Deployment of Predictive model using PMML

Page 9: A study of Data Quality and Analytics

WHAT IS ARCH?

Autoregressive Conditional Heteroskedasticity

Predictive (conditional)
Uncertainty (heteroskedasticity)
That fluctuates over time (autoregressive)
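
As a hedged illustration of these three terms, the simplest case, ARCH(1), can be written as below. The notation is assumed here and does not come from the slides: r_t is a zero-mean return (or demand residual), z_t is a standardized shock, and omega and alpha are model parameters.

```latex
\begin{align}
r_t &= \sigma_t z_t, \qquad z_t \sim \text{i.i.d.}(0, 1) \\
\sigma_t^2 &= \omega + \alpha\, r_{t-1}^2, \qquad \omega > 0,\ \alpha \ge 0
\end{align}
```

The variance is conditional because it is a forecast made from yesterday's information, heteroskedastic because it changes from period to period, and autoregressive because it depends on the series' own past squared values.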

Page 10: A study of Data Quality and Analytics

FROM THE SIMPLE ARCH GREW: GARCH

GENERALIZED ARCH (Bollerslev) a most important extension

Tomorrow’s variance is predicted to be a weighted average of the
▪ Long run average variance
▪ Today’s variance forecast
▪ The news (today’s squared return)
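
The weighted average on this slide corresponds to the one-step GARCH(1,1) forecast, sketched below in Python for illustration only (the study used R). The function name `garch11_next_variance`, the parameter names `alpha` and `beta`, and all numeric values are assumptions made for this example, not estimates from the study.

```python
def garch11_next_variance(long_run_var, today_var, today_return, alpha, beta):
    """One-step-ahead GARCH(1,1) variance written as a weighted average.

    Weights: (1 - alpha - beta) on the long-run average variance,
    beta on today's variance forecast, and alpha on the news
    (today's squared return). The three weights sum to 1.
    """
    if alpha < 0 or beta < 0 or alpha + beta >= 1:
        raise ValueError("need alpha >= 0, beta >= 0 and alpha + beta < 1")
    gamma = 1.0 - alpha - beta  # weight on the long-run average variance
    return gamma * long_run_var + beta * today_var + alpha * today_return ** 2

# Illustrative parameter values only (hypothetical, not fitted to any data).
next_var = garch11_next_variance(
    long_run_var=1.0, today_var=1.5, today_return=2.0, alpha=0.1, beta=0.8
)
print("Forecast variance for tomorrow:", next_var)
```

With a large `beta`, the forecast stays close to today's variance, so volatility shocks die out slowly; a large `alpha` makes the forecast react more sharply to the latest squared return.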

Page 11: A study of Data Quality and Analytics


Thank you