Data Warehousing & Data Mining: A Review of Data Warehousing and Data Mining, by Aqeel Al-Jishi, Nick Farley, and Masaki Osada

  • Slide 1
  • Data Warehousing & Data Mining
      • Aqeel Al-Jishi, Nick Farley, Masaki Osada
      • A review of Data Warehousing and Data Mining
  • Slide 2
  • Data Warehousing Defined
      • Data Warehouse: the main repository of an organization's historical data, its corporate memory. It contains the raw material for management's decision support system.
      • Data Mining (Knowledge Discovery): the process of analyzing data from different perspectives and summarizing it into useful information.
  • Slide 3
  • History
      • In the late 1980s to early 1990s, distinct computer databases were created.
      • These databases were designed to meet the data analysis needs that operational systems were failing to support.
      • Operational systems failed for many reasons:
          • Long report generation times
          • Inability to handle the load; they were not optimized for reporting
          • Many organizations had multiple operational systems, which made reporting a nightmare
          • Custom applications were required for reporting, which slowed reporting and increased costs
  • Slide 4
  • Goals of Data Warehousing
      • Make information easily accessible
      • Provide endless views and combinations of data (slicing and dicing)
      • Return query results with minimal wait time
      • Be adaptive and resilient to change
      • Be designed with the correct users in mind (business users and management)
      • Keep information secure while still allowing access to authorized insiders
  • Slide 5
  • Common Issues
      • Problems with data acquisition may arise
      • 80% of the time spent building a data warehouse goes to extracting, cleaning, and loading data
      • Errors in the data can be rampant:
          • Incomplete data (missing fields)
          • Incorrect data (wrong calculations)
          • Readability issues (strange formatting)
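The three error types listed above can each be checked mechanically during the cleaning step. The following is a minimal sketch, not a production cleaning pipeline: the row layout, the field names (`order_id`, `quantity`, `unit_price`, `total`, `order_date`), and the expected ISO date format are all assumptions invented for illustration.

```python
from datetime import datetime

# Hypothetical source rows; field names and values are invented for illustration.
rows = [
    {"order_id": 1, "quantity": 2, "unit_price": 5.0, "total": 10.0, "order_date": "2004-03-01"},
    {"order_id": 2, "quantity": 1, "unit_price": 4.0, "total": None, "order_date": "2004-03-02"},
    {"order_id": 3, "quantity": 3, "unit_price": 2.0, "total": 7.0, "order_date": "2004-03-03"},
    {"order_id": 4, "quantity": 1, "unit_price": 9.0, "total": 9.0, "order_date": "03/04/04"},
]

REQUIRED = ("order_id", "quantity", "unit_price", "total", "order_date")


def check_row(row):
    """Return a list of problem labels found in one source row."""
    problems = []
    # Incomplete data: a required field is absent or empty.
    missing = [f for f in REQUIRED if row.get(f) in (None, "")]
    if missing:
        problems.append("incomplete: " + ", ".join(missing))
    # Incorrect data: the stored total disagrees with quantity * unit_price.
    if not missing and abs(row["quantity"] * row["unit_price"] - row["total"]) > 1e-9:
        problems.append("incorrect: total")
    # Readability issue: the date is not in the assumed ISO format (YYYY-MM-DD).
    date = row.get("order_date")
    if date:
        try:
            datetime.strptime(date, "%Y-%m-%d")
        except ValueError:
            problems.append("format: order_date")
    return problems


# Map each order to its problems, and keep only rows with none for loading.
report = {row["order_id"]: check_row(row) for row in rows}
clean = [row for row in rows if not report[row["order_id"]]]
```

In this sketch only order 1 passes all three checks; a real load process would more likely repair or quarantine the flagged rows than discard them.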
  • Slide 6
  • Differing Design Views, Relational
      • Relational model (Bill Inmon)
      • The data warehouse is but one part of the business intelligence system
      • An enterprise has one data warehouse, and data marts source their information from it
      • Uses third normal form (3NF) to store information in the database
  • Slide 7
  • Differing Design Views, Dimensional
      • Dimensional model (Ralph Kimball)
      • The data warehouse is the conglomerate of all data marts within the enterprise
      • Uses a star or snowflake schema to emulate a multidimensional database
      • Overall, information is NOT normalized in the database
  • Slide 8
  • Terminology
      • Data Mart: a subset of data from the data warehouse, typically used when the broad scope of the data warehouse isn't needed. Business departments commonly create, use, and alter their own data marts.
      • Level of Granularity: the amount and level of detail of the data brought into the data warehouse during acquisition.
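To make the level of granularity concrete, the sketch below rolls hypothetical daily sales records up to a monthly grain. The record layout and the function name `to_monthly_grain` are assumptions for illustration only.

```python
from collections import defaultdict

# Hypothetical daily-grain sales records: (ISO date, store, units sold).
daily_sales = [
    ("2004-03-01", "Boston", 12),
    ("2004-03-02", "Boston", 8),
    ("2004-03-02", "Hartford", 5),
    ("2004-04-01", "Boston", 10),
]


def to_monthly_grain(records):
    """Aggregate daily records to one row per (month, store)."""
    totals = defaultdict(int)
    for date, store, units in records:
        month = date[:7]  # "YYYY-MM" prefix of the ISO date
        totals[(month, store)] += units
    return dict(totals)


monthly_sales = to_monthly_grain(daily_sales)
```

The coarser grain stores fewer rows, but the daily detail cannot be recovered from it afterwards, which is why the grain has to be decided at acquisition time.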
  • Slide 9
  • Dimensional Model: Dimension Table
      • A table with a single-part primary key and descriptive attribute columns
      • Describes the business entities of an enterprise, represented as hierarchical, categorical information such as time, departments, locations, and products
  • Slide 10
  • Dimensional Model: Fact Table
      • A table with numeric performance measures (metrics), characterized by a composite key
      • The elements of the composite key are foreign keys that reference the dimension tables
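A small star schema shows how the two table types fit together. This is a sketch using Python's built-in `sqlite3` module; the table names (`dim_date`, `dim_location`, `fact_sales`), the columns, and the figures are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: single-part primary key plus descriptive attributes.
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, city TEXT, region TEXT);

-- Fact table: a numeric measure keyed by a composite of dimension foreign keys.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    units_sold INTEGER,
    PRIMARY KEY (date_key, location_key)
);

INSERT INTO dim_date VALUES (1, 'March', 2004), (2, 'April', 2004);
INSERT INTO dim_location VALUES (1, 'Boston', 'New England'),
                                (2, 'Hartford', 'New England'),
                                (3, 'Chicago', 'Midwest');
INSERT INTO fact_sales VALUES (1, 1, 120), (1, 2, 80), (1, 3, 200), (2, 1, 90);
""")

# Units sold in New England in March, broken down by city.
rows = conn.execute("""
    SELECT l.city, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_date AS d ON d.date_key = f.date_key
    JOIN dim_location AS l ON l.location_key = f.location_key
    WHERE d.month = 'March' AND l.region = 'New England'
    GROUP BY l.city
    ORDER BY l.city
""").fetchall()
conn.close()
```

The query filters on dimension attributes and sums the measure from the fact table; changing the `WHERE` and `GROUP BY` columns gives a different view of the same facts.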
  • Slide 11
  • What is Data Mining?
      • Basic concept: getting useful information out of a large amount of data
      • In the business world: getting business intelligence out of a large amount of information
      • What is the difference between business intelligence and information?
  • Slide 12
  • Evolutionary Step: Business Question; Enabling Technologies; Characteristics
      • Data Collection (1960s)
          • Business question: "What was my total revenue in the last five years?"
          • Enabling technologies: computers, tapes, disks
          • Characteristics: retrospective, static data delivery
      • Data Access (1980s)
          • Business question: "What were unit sales in New England last March?"
          • Enabling technologies: relational databases, Structured Query Language
          • Characteristics: retrospective, dynamic data delivery at record level
      • Data Warehousing & Decision Support (1990s)
          • Business question: "What were unit sales in New England last March? Drill down to Boston."
          • Enabling technologies: on-line analytic processing, multidimensional databases
          • Characteristics: retrospective, dynamic data delivery at multiple levels
      • Data Mining (Emerging Today)
          • Business question: "What's likely to happen to Boston unit sales next month? Why?"
          • Enabling technologies: advanced algorithms, multiprocessor computers, massive databases
          • Characteristics: prospective, proactive information delivery
  • Slide 13
  • Focus of Data Mining
      • Predict future trends and behaviors using past information
      • Prospective analysis
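One of the simplest forms of prospective analysis is to fit a trend to past values and extrapolate one step ahead. The sketch below uses an ordinary least-squares line, chosen here purely for illustration; the monthly figures are hypothetical, and real data mining tools use far richer models.

```python
def fit_line(ys):
    """Least-squares line through (0, ys[0]), (1, ys[1]), ...; returns (slope, intercept)."""
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


monthly_units = [100, 110, 120, 130]  # hypothetical unit sales for the past four months
slope, intercept = fit_line(monthly_units)

# Extrapolate the fitted line to the next, not yet observed, month.
next_month = slope * len(monthly_units) + intercept
```

With these perfectly linear figures the forecast for the next month is 140 units; noisy real data would give only a rough estimate.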
  • Slide 14
  • Uniqueness of Data Mining
      • Heuristic in nature
      • Capable of finding hidden patterns users would never think of
      • Modeling is the key technology for finding patterns
      • Self-guiding
  • Slide 15
  • An Example of Modeling
      • Diagram labels: Yesterday, Today, Tomorrow; Static Information (known); Dynamic Information (known); Target
      • Build a model based on known combinations of static and dynamic information to see whether it indicates the outcome of a piece of dynamic information in the future.
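The modeling idea can be sketched with a deliberately simple stand-in: a lookup table that records, for each known combination of static and dynamic information, the outcome that most often followed it. This majority-vote table is an assumption made for illustration, not the technique the slide had in mind, and the customer segments and outcomes are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical history: (static info, earlier dynamic info, later dynamic info).
history = [
    ("student", "bought", "bought"),
    ("student", "bought", "bought"),
    ("student", "skipped", "skipped"),
    ("retiree", "bought", "skipped"),
    ("retiree", "bought", "skipped"),
    ("retiree", "bought", "bought"),
]


def build_model(records):
    """Map each (static, dynamic) combination to its most frequent later outcome."""
    counts = defaultdict(Counter)
    for static, before, after in records:
        counts[(static, before)][after] += 1
    return {combo: c.most_common(1)[0][0] for combo, c in counts.items()}


def predict(model, static, current, default=None):
    """Predict the future outcome for a combination; unseen combinations get `default`."""
    return model.get((static, current), default)


model = build_model(history)
prediction = predict(model, "retiree", "bought")
unseen = predict(model, "retiree", "skipped")
```

Here the model predicts that a retiree who bought before will skip next time, because that outcome followed two of the three matching records; a combination never seen in the history yields no prediction.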
  • Slide 16
  • Real-World Examples of Data Mining
      • Diapers and beer
      • VISA: fraud detection
      • Blockbuster Entertainment: personalized recommendation
      • Suppliers to Wal-Mart: sales prediction
      • NBA teams: strategy analysis
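The diapers-and-beer example is usually told as a market basket analysis story, in which items that are bought together are found by counting co-occurrences. The sketch below computes the standard support, confidence, and lift measures for one rule; the baskets are invented for illustration and do not come from any real retailer.

```python
# Hypothetical shopping baskets, one set of items per transaction.
baskets = [
    {"diapers", "beer", "chips"},
    {"diapers", "beer"},
    {"diapers", "milk"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)


def confidence(antecedent, consequent, baskets):
    """Of the baskets containing `antecedent`, the fraction that also contain `consequent`."""
    return support(antecedent | consequent, baskets) / support(antecedent, baskets)


def lift(antecedent, consequent, baskets):
    """Confidence relative to the baseline popularity of `consequent`."""
    return confidence(antecedent, consequent, baskets) / support(consequent, baskets)


rule_support = support({"diapers", "beer"}, baskets)
rule_confidence = confidence({"diapers"}, {"beer"}, baskets)
rule_lift = lift({"diapers"}, {"beer"}, baskets)
```

A lift above 1 suggests that buying diapers makes beer more likely than its baseline popularity alone would predict, which is the kind of hidden pattern the example is meant to illustrate.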
  • Slide 17
  • Diagram labels: ERP, CRM, SCM; ERP; Data Warehouse; Data Mart; DM
  • Slide 18
  • Questions?
  • Slide 19
  • Sources
      • Books:
          • Claudia Imhoff, Nicholas Galemmo, and Jonathan G. Geiger. Mastering Data Warehouse Design: Relational and Dimensional Techniques. Wiley Publishing, Inc., Indianapolis, Indiana, 2003.
          • Ralph Kimball and Margy Ross. The Data Warehouse Toolkit: The Complete Guide to Dimensional Modeling, second edition. Wiley and Sons, Inc., New York, NY, 2002.
          • Hamparsum Bozdogan (ed.). Statistical Data Mining and Knowledge Discovery. CRC Press LLC, Boca Raton, Florida, 2004.
          • Stephan Kudyba and Richard Hoptroff. Data Mining and Business Intelligence: A Guide to Productivity. Idea Group Publishing, London, England, 2001.
          • Jeffrey A. Hoffer, et al. Modern Systems Analysis and Design, fourth edition. Pearson Education, Inc., Upper Saddle River, New Jersey, 2005, 2002.
          • Michael Kifer, Arthur Bernstein, and Philip Lewis. Database Systems: An Application-Oriented Approach. Pearson Education, Inc., New York, New York, 2005.
      • Electronic:
          • http://www.1keydata.com/datawarehousing/inmon-kimball.html
          • http://www.fortunecity.com/skyscraper/oracle/699/orahtml/dbmsmag/9807d05.html
          • http://en.wikipedia.org/wiki/Data_warehouse
          • http://en.wikipedia.org/wiki/Dimension_(data_warehouse)
          • http://en.wikipedia.org/wiki/Data_mining
          • http://www.anderson.ucla.edu/faculty/jason.frand/teacher/technologies/palace/datamining.htm
          • http://www.dwinfocenter.org
          • http://www.thearling.com/text/dmwhite/dmwhite.htm
          • http://www.learndatamodeling.com/dimension.htm
          • http://www.learndatamodeling.com/fact.htm