SCALABLE SCHEDULING OF UPDATES IN STREAMING DATA WAREHOUSES


The goal of a streaming warehouse is to propagate new data across all the relevant tables and views as quickly as possible. The external sources push append-only data streams into the warehouse with a wide range of inter-arrival times.

SCALABLE SCHEDULING OF UPDATES IN STREAMING DATA WAREHOUSES

PROJECT REPORT

Submitted by

T.MANUJA

Register No: 075002620035

in partial fulfillment for the award of the degree

of MASTER OF SCIENCE in INFORMATION TECHNOLOGY

VIVEKANANDHA INSTITUTE OF ENGINEERING AND TECHNOLOGY FOR WOMEN, TIRUCHENGODE-637 205

JULY 2012

VIVEKANANDHA INSTITUTE OF ENGINEERING AND TECHNOLOGY FOR WOMEN, TIRUCHENGODE-637 205

Department of Applied Science

PROJECT WORK

JULY 2012

This is to certify that the project entitled

SCALABLE SCHEDULING OF UPDATES IN STREAMING DATA WAREHOUSES

is the bonafide record of project work done by

T.MANUJA

Register No: 075002620035

of M.Sc (INFORMATION TECHNOLOGY) during the year 2011-2012.

Project Guide: Mr.M.DINESH, M.Sc., Lecturer, Department of Applied Science.

Head of the Department: Mr.K.P.MOHAN, M.Sc., M.Phil., Lecturer, Department of Applied Science.

Submitted for the Project Viva-Voce examination held on ______________

Internal Examiner        External Examiner

DECLARATION

I affirm that the project work titled SCALABLE SCHEDULING OF UPDATES IN STREAMING DATA WAREHOUSES, being submitted in partial fulfillment for the award of M.Sc. (INFORMATION TECHNOLOGY), is the original work carried out by me. It has not formed part of any other project work submitted for the award of any degree or diploma, either in this or any other university.

(Signature of the Candidate) T.MANUJA, Register No: 075002620035

I certify that the declaration made above by the candidate is true.

Signature of the Guide, Mr.M.DINESH, M.Sc., Lecturer, Department of Applied Science.

ACKNOWLEDGEMENT

I would like to take this opportunity to thank the people who have helped me make this project a reality.

I wish to express my respectful thanks to the Chairman & Secretary, Prof.Dr.M.KARUNANITHI, B.Pharm., M.S., Ph.D., D.Litt., VIVEKANANDHA EDUCATIONAL INSTITUTIONS, for providing extraordinary infrastructure. I express my sincere thanks to the Principal, Dr.V.SUBRAMANIA BHARATHI, B.E(Struct)., M.E., Ph.D., M.I.E., M.I.C.I., M.I.S.T.E., for his kind encouragement to do this project in the IT industry. I thank the Head, Department of Applied Science, Mr.K.P.MOHAN, M.Sc., M.Phil., for his encouragement, valuable suggestions and support in doing this project. I would like to thank the internal Project Guide, Mr.M.DINESH, M.Sc., Lecturer, Department of Applied Science, for the kind co-operation and support rendered in making the project a success.

I would like to thank the External Project Guide, Mr.R.Rajesh, M.Sc., Team Leader, SPIRO SOFTWARE SOLUTIONS, for permitting me to do the project in the company and for giving his full support to complete it.

I would like to express my sincere thanks to all other faculty members of the Department of Applied Science for their active and kind guidance and useful advice on my project. Above all, I would like to express my sincere gratitude and thanks to my Parents and Friends for their valuable comments and suggestions for making this work a success.

ABSTRACT

This project develops a scheduling framework that handles the complications encountered by a stream warehouse: view hierarchies and priorities, data consistency, inability to preempt updates, heterogeneity of update jobs caused by different inter-arrival times and data volumes among different sources, and transient overload. Whereas traditional data warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. The project discusses update scheduling in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. The streaming warehouse update problem is modeled as a scheduling problem in which jobs correspond to processes that load new data into tables and the objective is to minimize data staleness over time. At time t, if a table has been updated with information up to some earlier time r, its staleness is t minus r. A novel feature of the framework is that scheduling decisions do not depend on properties of update jobs such as deadlines, but rather on the effect of update jobs on data staleness. Finally, the project presents a suite of update scheduling algorithms and extensive simulation experiments that map out the factors affecting their performance.
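The staleness definition can be illustrated with a minimal Python sketch. The `Table` class, the `priority` weight, and the greedy `pick_next` rule are hypothetical names introduced only for illustration; they are not the scheduling algorithms of this report. The sketch only shows how a decision can be driven by staleness rather than by job deadlines.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    freshness: float       # time r up to which the table has been loaded
    priority: float = 1.0  # hypothetical weight; higher means more important


def staleness(table, now):
    """Staleness at time `now` (t): t minus r, clamped so it is never negative."""
    return max(0.0, now - table.freshness)


def total_weighted_staleness(tables, now):
    """Sum of priority-weighted staleness over all tables at time `now`."""
    return sum(t.priority * staleness(t, now) for t in tables)


def pick_next(tables, now):
    """Illustrative greedy rule (an assumption, not the report's method):
    choose the table whose weighted staleness is currently largest."""
    return max(tables, key=lambda t: t.priority * staleness(t, now))
```

For example, with a table loaded up to time 90 at priority 1 and another loaded up to time 95 at priority 3, calling `pick_next` at time 100 selects the second table, because its weighted staleness (3 × 5) exceeds that of the first (1 × 10).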