SCALABLE SCHEDULING OF UPDATES IN STREAMING DATA WAREHOUSES
PROJECT REPORT
Submitted by
T.MANUJA
Register No: 075002620035
in partial fulfillment for the award of the degree
of MASTER OF SCIENCE in INFORMATION TECHNOLOGY
VIVEKANANDHA INSTITUTE OF ENGINEERING AND TECHNOLOGY FOR WOMEN, TIRUCHENGODE-637 205
JULY 2012
VIVEKANANDHA INSTITUTE OF ENGINEERING AND TECHNOLOGY FOR WOMEN, TIRUCHENGODE-637 205
Department of Applied Science
PROJECT WORK
JULY 2012
This is to certify that the project entitled
SCALABLE SCHEDULING OF UPDATES IN STREAMING DATA WAREHOUSES
is the bonafide record of project work done by
T.MANUJA
Register No: 075002620035
of M.Sc (INFORMATION TECHNOLOGY) during the year 2011-2012.
-----------------------
Project Guide
Mr.M.DINESH, M.Sc., Lecturer, Department of Applied Science.
-------------------------
Head of the Department
Mr.K.P.MOHAN, M.Sc., M.Phil., Lecturer, Department of Applied Science.
Submitted for the Project Viva-Voce examination held on______________
------------------------
Internal Examiner
-----------------------
External Examiner
DECLARATION
I affirm that the project work titled SCALABLE SCHEDULING OF UPDATES IN STREAMING DATA WAREHOUSES, being submitted in partial fulfillment for the award of M.Sc. (INFORMATION TECHNOLOGY), is the original work carried out by me. It has not formed part of any other project work submitted for the award of any degree or diploma, either in this or any other university.
(Signature of the Candidate) T.MANUJA. Register No: 075002620035
I certify that the declaration made above by the candidate is true.
Signature of the Guide, Mr.M.DINESH, M.Sc., Lecturer, Department of Applied Science.
ACKNOWLEDGEMENT
I would like to take this opportunity to thank the people who have helped me make this project a reality.
I wish to express my respectful thanks to the Chairman & Secretary, Prof.Dr.M.KARUNANITHI, B.Pharm., M.S., Ph.D., D.Litt., VIVEKANANDHA EDUCATIONAL INSTITUTIONS, for providing an extraordinary infrastructure. I express my sincere thanks to the Principal, Dr.V.SUBRAMANIA BHARATHI, B.E(Struct)., M.E., Ph.D., M.I.E., M.I.C.I., M.I.S.T.E., for his kind encouragement to do this project in the IT industry. I thank the Head, Department of Applied Science, Mr.K.P.MOHAN, M.Sc., M.Phil., for his encouragement, valuable suggestions and support in doing this project. I would like to thank the internal Project Guide, Mr.M.DINESH, M.Sc., Lecturer, Department of Applied Science, for the kind co-operation and support rendered in making this project a success.
I would like to thank the External Project Guide, Mr.R.Rajesh, M.Sc., Team Leader, SPIRO SOFTWARE SOLUTIONS, for permitting me to do the project at the company and for giving his full support to complete it.
I would like to express my sincere thanks to all other faculty members of the Department of Applied Science for their active and kind guidance and useful advice on my project. Above all, I would like to express my sincere gratitude and thanks to my Parents and Friends for their valuable comments and suggestions for making this work a success.
ABSTRACT
This project discusses update scheduling in streaming data warehouses, which combine the features of traditional data warehouses and data stream systems. Whereas traditional data warehouses are typically refreshed during downtimes, streaming warehouses are updated as new data arrive. The streaming warehouse update problem is modeled as a scheduling problem in which jobs correspond to processes that load new data into tables and the objective is to minimize data staleness over time. At time t, if a table has been updated with information up to some earlier time r, its staleness is t minus r. The project proposes a scheduling framework that handles the complications encountered by a stream warehouse: view hierarchies and priorities, data consistency, inability to preempt updates, heterogeneity of update jobs caused by different inter-arrival times and data volumes among different sources, and transient overload. A novel feature of the framework is that scheduling decisions do not depend on properties of update jobs such as deadlines, but rather on the effect of update jobs on data staleness. Finally, a suite of update scheduling algorithms is presented, together with extensive simulation experiments that map out the factors affecting their performance.
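The staleness definition above (t minus r) and the idea of choosing jobs by their effect on staleness can be illustrated with a minimal Python sketch. This is not one of the algorithms evaluated in the report; the names `Table`, `staleness` and `pick_next`, the `priority` weight and the greedy selection rule are assumptions introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Table:
    name: str
    freshness: float       # r: time up to which the table's data has been loaded
    priority: float = 1.0  # hypothetical weight, not defined in the abstract


def staleness(table, now):
    """Staleness at time `now` is now - r, as defined in the abstract."""
    return max(0.0, now - table.freshness)


def pick_next(tables, pending, now):
    """Greedy illustration: pick the pending update job that removes the most
    priority-weighted staleness. `pending` maps a table name to the time up to
    which new data has already arrived for that table (an assumed input)."""
    def gain(table):
        # Freshness the table would have after the update, capped at `now`.
        new_r = min(pending[table.name], now)
        return table.priority * max(0.0, new_r - table.freshness)

    candidates = [t for t in tables if t.name in pending]
    return max(candidates, key=gain) if candidates else None
```

For example, with table A loaded up to time 90 and table B (priority 2) loaded up to time 50, at time 100 their staleness is 10 and 50 respectively. If data for B has arrived only up to time 70, updating B still removes more weighted staleness than updating A, so this sketch selects B. The decision uses no job deadline, only the change in staleness.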