IS 466 ADVANCE TOPICS IN INFORMATION SYSTEMS 1 Chapter 2 Information Collection, Processing &...

  • Slide 1
  • IS 466 ADVANCE TOPICS IN INFORMATION SYSTEMS 1, Chapter 2: Information Collection, Processing & Retrieval, Part 1: Big Data Analytics. 1436/1437 Semester II. Dr. Sapiah Sakri, Assistant Professor, [email protected]
  • Slide 2
  • OPENING SCENARIO
  • Slide 3
  • LEARNING OBJECTIVES: By the end of the lecture, students should be able to: recognize what Big Data Analytics is and what its challenges are; recognize new models and/or techniques in Big Data Analytics; recognize examples of how Big Data Analytics is used today; recognize the benefits of Big Data Analytics.
  • Slide 4
  • WHAT ARE WE GOING TO UNDERSTAND? What is Big Data? Huge amounts of data: are we ready? How did we end up here? To whom does it matter? What are the concerns? Tools and technologies. Is Big Data the same as Hadoop? Moving towards analytics. Applications.
  • Slide 5
  • INTRODUCTION: Big data is the term for a collection of data sets so large and complex that they become difficult to process using on-hand database management tools or traditional data processing applications. The challenges we face with DBMS tools and other technologies include capture, curation, storage, search, sharing, transfer, analysis, and visualization.
  • Slide 6
  • SIMPLE TO START: What is the largest file size you have dealt with so far? What movies, files or streaming video have you used? What have you observed? What is the maximum download speed you get? A simple computation: how much time does it take just to transfer the data?
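The "simple computation" can be sketched in a few lines. This is a minimal sketch under stated assumptions. It uses decimal units (1 TB = 10^12 bytes) and an ideal link running at its full nominal speed with no protocol overhead. The sizes and speeds are illustrative and are not taken from the slides.

```python
# Idealized transfer-time estimate. Assumptions (illustrative, not from the slides):
# decimal units, a link running at its full nominal speed, no protocol overhead.
TB = 10**12  # bytes in a decimal terabyte
PB = 10**15  # bytes in a decimal petabyte


def transfer_time_seconds(size_bytes: int, link_bits_per_second: float) -> float:
    """Seconds needed to move size_bytes over a link (bytes are converted to bits)."""
    return size_bytes * 8 / link_bits_per_second


if __name__ == "__main__":
    scenarios = [
        ("1 TB over 100 Mbit/s", TB, 100e6),
        ("1 PB over 1 Gbit/s", PB, 1e9),
    ]
    for label, size, speed in scenarios:
        secs = transfer_time_seconds(size, speed)
        print(f"{label}: {secs:,.0f} s = {secs / 3600:,.1f} h = {secs / 86400:,.1f} days")
```

Under these assumptions, 1 TB over a 100 Mbit/s connection takes about 22 hours. 1 PB over 1 Gbit/s takes roughly 93 days, and that is before any processing starts.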
  • Slide 7
  • WHAT IS BIG DATA? Every day, we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.
  • Slide 8
  • HUGE AMOUNT OF DATA: There are huge volumes of data in the world. From the beginning of recorded time until 2003, we created 5 billion gigabytes (5 exabytes) of data. In 2011, the same amount was created every two days. In 2013, the same amount of data was created every 10 minutes.
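The growth figures can be checked with a few lines of arithmetic. This is a minimal sketch that assumes decimal units (1 GB = 10^9 bytes, 1 EB = 10^18 bytes); the helper name is hypothetical. It confirms that 5 billion gigabytes is 5 exabytes. It also shows that producing this amount every two days corresponds to 2.5 × 10^18 bytes (2.5 quintillion bytes) per day, the daily figure quoted on slide 7.

```python
# Daily data-production rates implied by the slide's figures (decimal units assumed).
GB = 10**9   # bytes in a decimal gigabyte
EB = 10**18  # bytes in a decimal exabyte

# Slide baseline: 5 billion gigabytes created from the beginning of recorded time to 2003.
BASELINE_BYTES = 5 * 10**9 * GB


def bytes_per_day(amount_bytes: int, period_minutes: int) -> float:
    """Daily rate implied by producing amount_bytes once every period_minutes."""
    minutes_per_day = 24 * 60
    return amount_bytes * minutes_per_day / period_minutes


if __name__ == "__main__":
    print(f"Baseline: {BASELINE_BYTES / EB:.0f} EB")
    rate_2011 = bytes_per_day(BASELINE_BYTES, 2 * 24 * 60)  # same amount every two days
    rate_2013 = bytes_per_day(BASELINE_BYTES, 10)           # same amount every 10 minutes
    print(f"2011: {rate_2011 / EB:.1f} EB per day")
    print(f"2013: {rate_2013 / EB:.1f} EB per day")
```

The 2013 figure works out to 720 EB per day, a 288-fold increase over the 2011 rate.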
  • Slide 9
  • THE SOCIAL LAYER IN AN INSTRUMENTED, INTERCONNECTED WORLD: 2+ billion people on the Web by the end of 2011; 30 billion RFID tags today (1.3 billion in 2005); 4.6 billion camera phones worldwide; hundreds of millions of GPS-enabled devices sold annually; 76 million smart meters in 2009, 200 million by 2014; 12+ TB of tweet data every day; 25+ TB of log data every day; ? TB of data every day.
  • Slide 10
  • HUGE AMOUNT OF DATA
  • Slide 11
  • Slide 12
  • WEB 2.0 IS DATA-DRIVEN
  • Slide 13
  • THE WORLD OF DATA-DRIVEN APPLICATIONS
  • Slide 14
  • CHARACTERISTICS OF BIG DATA: Variety (collectively analyzing the broadening variety); Velocity (responding to the increasing velocity); Volume (cost-efficiently processing the growing volume); Veracity (establishing the veracity of big data sources). Figure callouts: 30 billion RFID sensors and counting; 1 in 3 business leaders don't trust the information they use to make decisions; a chart from 2010 to 2020 marked 50x and 35 ZB; 80% of the world's data is unstructured.
  • Slide 15
  • CHARACTERISTICS OF BIG DATA
  • Slide 16
  • EXAMPLES. Volume: Enterprises are awash with ever-growing data of all types, easily amassing terabytes, even petabytes, of information. For example: turn 12 terabytes of Tweets created each day into improved product sentiment analysis; convert 350 billion annual meter readings to better predict power consumption. Velocity: Sometimes 2 minutes is too late. For time-sensitive processes such as catching fraud, big data must be used as it streams into your enterprise in order to maximize its value. For example: scrutinize 5 million trade events created each day to identify potential fraud; analyze 500 million daily call detail records in real time to predict customer churn faster. The latest I have heard is that even a 10-nanosecond delay is too much. Variety: Big data is any type of data, structured and unstructured, such as text, sensor data, audio, video, click streams, log files and more. New insights are found when analyzing these data types together. For example: monitor hundreds of live video feeds from surveillance cameras to target points of interest; exploit the 80% data growth in images, video and documents to improve customer satisfaction.
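The velocity point is about using data as it streams in rather than after it has been stored. A small single-pass scorer can illustrate it. This is a minimal sketch, not a fraud model. The rule, the function and class names, and the sample events are all hypothetical. The rule flags a trade more than `k` standard deviations above that account's own running mean, after a `warmup` number of trades. Running statistics are kept with Welford's online algorithm, so each event is scored and then discarded without storing history. For scale, 5 million trade events per day averages about 58 events per second.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, Iterator, Tuple


@dataclass
class RunningStats:
    """Welford's online algorithm: mean and variance updated one value at a time."""
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0  # running sum of squared deviations from the mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def std(self) -> float:
        """Sample standard deviation (0.0 until at least two values are seen)."""
        return (max(self.m2, 0.0) / (self.n - 1)) ** 0.5 if self.n > 1 else 0.0


def flag_suspicious(
    events: Iterable[Tuple[str, float]], k: float = 3.0, warmup: int = 5
) -> Iterator[Tuple[str, float]]:
    """Yield (account, amount) for trades far above that account's history so far.

    Hypothetical rule: each event is compared with the statistics of the events
    seen *before* it, then folded into them, so the decision is made in one pass.
    """
    stats = defaultdict(RunningStats)  # one running summary per account
    for account, amount in events:
        s = stats[account]
        sd = s.std()
        if s.n >= warmup and sd > 0 and amount > s.mean + k * sd:
            yield account, amount
        s.update(amount)


if __name__ == "__main__":
    # Hypothetical event stream: (account id, trade amount)
    stream = [("A", 100), ("A", 110), ("A", 95), ("A", 105), ("A", 98),
              ("B", 20), ("A", 5000), ("B", 22)]
    for account, amount in flag_suspicious(stream):
        print(f"review account {account}: trade of {amount}")
```

Because the scorer is a generator, it can sit directly on an unbounded event source and emit alerts while the stream is still arriving.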
  • Slide 17
  • FINALLY: Big data is similar to small data, but bigger. Having bigger data, however, requires different approaches (techniques, tools and architecture) with the aim of solving new problems, or solving old problems in a better way.
  • Slide 18
  • WHY BIG DATA? Key enablers for the appearance and growth of big data are: increased storage capacity; increased processing power; availability of data.
  • Slide 19
  • TEN COMMON BIG DATA PROBLEMS
  • Slide 20
  • THE BIG DATA OPPORTUNITY
  • Slide 21
  • INDUSTRIES ARE EMBRACING BIG DATA
  • Slide 22
  • TO WHOM DOES IT MATTER? Financial services; the business community (new tools, new capabilities, new infrastructure, new business models, etc.); health services.
  • Slide 23
  • BIG DATA EXPLORATION: VALUE. Diagram: file systems, relational data, content management, email, CRM, supply chain, ERP, RSS feeds, cloud and custom sources feed the Data Explorer, which serves applications and users. Find, visualize and understand all big data to improve business knowledge. Value: greater efficiencies in business processes; new insights from combining and analyzing data types in new ways; new business models, with resulting increased market presence and revenue.
  • Slide 24
  • BIG DATA EXPLORATION: CUSTOMER EXAMPLE (airline manufacturer). Exploring 4 TB to drive point business solutions (supplier portal, call center, etc.); a single point of data fusion for all employees to use; reduced costs and improved operational performance for the business. Key questions to ask: Can you navigate and explore all enterprise and external content in a single user interface? Can you quickly identify areas of data risk? Do you have a logical starting point for your big data initiatives? Can you separate the noise from useful content? Can you perform data exploration on large and complex data? Can you find insights in new or unstructured data types (e.g., social media and email)?
  • Slide 25
  • ENHANCED 360 VIEW OF THE CUSTOMER: NEEDS. A deeper understanding of customer sentiment from both internal and external sources. Extending existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources. Increasing customer loyalty and satisfaction by understanding what meaningful actions are needed. Challenge: getting the right information to the right people so they can give customers what they need, solve problems, cross-sell and up-sell.
  • Slide 26
  • ENHANCED 360 CUSTOMER VIEW: CUSTOMER EXAMPLE. Create Facebook; identify 200+ different customer profiles to help in fulfillment and marketing efforts; leverage new data types in customer analysis. How are you driving consistency across your information assets when representing your customers, clients, partners, etc.? How can a complete view of the customer enhance your line-of-business users and result in better business outcomes? Key questions to ask: Can you identify and deliver all data as it relates to a customer, product or competitor to those who need it? Can you gather insights about your customers from social data, surveys, support emails, etc.? Can you combine your structured and unstructured data to run analytics? Product starting point: InfoSphere MDM Server, Data Explorer, BigInsights. (Confidential, Internal Use Only)
  • Slide 27
  • OPERATIONS ANALYSIS: NEEDS. Benefits: gain real-time visibility into operations, customer experience, transactions and behavior; proactively plan to increase operational efficiency; analyze a variety of machine data for improved business results. Business challenges: complexity and rapid growth of machine data; difficulty capturing the small fraction of machine data needed for better decisions; inability to analyze machine data and combine it with enterprise data for a full-view analysis; identifying and investigating anomalies; monitoring end-to-end infrastructure to proactively avoid service degradation or outages.
  • Slide 28
  • OPERATIONS ANALYSIS: VALUE & DIAGRAM. Machine Data Accelerator: raw logs and machine data; indexing and search; statistical modeling; root cause analysis; federated navigation and discovery; real-time analysis; only store what is needed.
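The "indexing and search" step over raw logs can be illustrated with an inverted index. This is a minimal, in-memory sketch. The sample log lines, the tokenization rule and the function names are hypothetical, and production tools index distributed data at far larger scale. Each token maps to the set of lines containing it, so a query intersects small sets instead of rescanning every log line.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

TOKEN = re.compile(r"[a-z0-9_]+")  # hypothetical tokenizer: lower-case alphanumeric runs

# Hypothetical raw machine-log lines
LOGS = [
    "10:00:01 INFO web01 request served in 120ms",
    "10:00:02 ERROR db02 connection timeout",
    "10:00:03 WARN web01 slow response 950ms",
    "10:00:04 ERROR web01 upstream db02 timeout",
]


def build_index(lines: List[str]) -> Dict[str, Set[int]]:
    """Inverted index: map each lower-cased token to the numbers of the lines containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for line_no, line in enumerate(lines):
        for token in TOKEN.findall(line.lower()):
            index[token].add(line_no)
    return index


def search(index: Dict[str, Set[int]], lines: List[str], *terms: str) -> List[str]:
    """Return the lines containing every query term (AND semantics), in log order."""
    if not terms:
        return []
    postings = [index.get(term.lower(), set()) for term in terms]
    hits = set.intersection(*postings)
    return [lines[i] for i in sorted(hits)]


if __name__ == "__main__":
    idx = build_index(LOGS)
    for line in search(idx, LOGS, "error", "timeout"):
        print(line)
```

A query such as `error timeout` returns both timeout errors. This gives a starting point for root cause analysis, because `db02` appears in both lines.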
  • Slide 29
  • OPERATIONAL ANALYSIS. Capabilities: Hadoop and stream computing. Intelligent infrastructure management: log analytics, energy bill forecasting, energy consumption optimization, anomalous energy usage detection, and presence-aware energy management. Optimized building energy consumption with centralized monitoring; automated preventive and corrective maintenance.
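Anomalous energy usage detection can start from a simple, robust outlier rule over meter readings. This is a minimal sketch that assumes hourly kWh readings for one meter. It uses the modified z-score, which measures distance from the median in units of the median absolute deviation (MAD), so a single spike does not distort the baseline the way it would distort a mean. The threshold of 3.5, the function name and the sample readings are illustrative assumptions, not the method used by the products on this slide.

```python
from statistics import median
from typing import List


def mad_anomalies(readings: List[float], threshold: float = 3.5) -> List[int]:
    """Return the indices of readings whose modified z-score exceeds threshold.

    Modified z-score = 0.6745 * (x - median) / MAD, where MAD is the median
    absolute deviation from the median (robust to the outliers being sought).
    """
    if not readings:
        return []
    med = median(readings)
    mad = median(abs(x - med) for x in readings)
    if mad == 0:
        # Degenerate case: most readings are identical; flag anything different.
        return [i for i, x in enumerate(readings) if x != med]
    return [i for i, x in enumerate(readings)
            if abs(0.6745 * (x - med) / mad) > threshold]


if __name__ == "__main__":
    # Hypothetical hourly consumption (kWh) for one building meter
    hourly_kwh = [1.2, 1.1, 1.3, 1.2, 1.4, 9.8, 1.3, 1.2]
    print(mad_anomalies(hourly_kwh))  # prints [5]
```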
  • Slide 30
  • DATA WAREHOUSE AUGMENTATION: NEEDS. Integrate big data and data warehouse capabilities to increase operational efficiency. Need to leverage a variety of data and extend the warehouse infrastructure. Optimized storage, maintenance and licensing costs by migrating rarely used data to Hadoop; reduced storage costs through smart processing of streaming data; improved warehouse performance by determining what data to feed into it. Structured, unstructured and streaming data sources required for deep analysis; low latency requirements (hours, not weeks or months); required query access to data.
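Migrating rarely used data to Hadoop can be sketched as a simple tiering rule based on last access time. This is a minimal sketch. The 90-day cutoff, the `TableStats` record, and the table names and sizes are hypothetical, and a real migration would also weigh query patterns, compliance rules and load costs. Tables untouched since the cutoff become candidates for offloading; the rest stay in the warehouse.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Tuple


@dataclass(frozen=True)
class TableStats:
    """Hypothetical per-table usage record gathered from warehouse metadata."""
    name: str
    size_gb: float
    last_accessed: date


def plan_tiering(
    tables: List[TableStats], today: date, cold_after_days: int = 90
) -> Tuple[List[str], List[str], float]:
    """Split tables into (keep in warehouse, offload to Hadoop, GB offloaded)."""
    cutoff = today - timedelta(days=cold_after_days)
    hot = [t.name for t in tables if t.last_accessed >= cutoff]
    cold = [t for t in tables if t.last_accessed < cutoff]
    return hot, [t.name for t in cold], sum(t.size_gb for t in cold)


if __name__ == "__main__":
    catalog = [
        TableStats("sales_current", 500.0, date(2016, 2, 28)),
        TableStats("sales_2009", 2000.0, date(2015, 6, 1)),
        TableStats("clickstream_raw", 3500.0, date(2015, 11, 15)),
        TableStats("customers", 50.0, date(2016, 3, 1)),
    ]
    keep, offload, freed_gb = plan_tiering(catalog, today=date(2016, 3, 1))
    print("keep in warehouse:", keep)
    print("offload to Hadoop:", offload, f"({freed_gb:,.0f} GB)")
```

Keeping the rule as a pure function over metadata makes it easy to preview a migration plan before any data is moved.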
  • Slide 31
  • DATA WAREHOUSE AUGMENTATION: CUSTOMER EXAMPLE. Creates a pre-processing hub and performs ad hoc analysis: a Hadoop-based landing zone used to store, manage and analyze structured, semi-structured and multi-structured data before moving it to the warehouse. Benefits: data warehouse optimized for workload and performance. Utilized InfoSphere BigInsights and InfoSphere DataStage. Key questions to ask: Are you drowning in very large data sets (TBs to PBs) that are difficult and costly to store? Are you able to utilize and store new data types? Are you facing rising maintenance/licensing costs? Do you use your warehouse environment as a repository for all data? Do you have a lot of cold, or low-touch, data driving up costs or slowing performance? Do you want to perform analysis of data in motion to determine what should be stored in the warehouse? Do you want to perform data exploration on all data? Are you using your data for new types of analytics?
  • Slide 32
  • WHAT DO REVENUES LOOK LIKE?
  • Slide 33
  • WHAT DOES BIG DATA TRIGGER? From Big Data and the Web: Algorithms for Data Intensive Scalable Computing, Ph.D. thesis, Gianmarco
  • Slide 34
  • DEALING WITH BIG DATA IS HARD when the operations on data are complex. For example, simple counting is not a complex problem, but modeling and reasoning with data of different kinds can get extremely complex. The good news with big data: often, because of the vast amount of data, modeling techniques can get simpler (e.g., smart counting can replace complex model-based analytics), as long as we can deal with the scale.
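The slide does not name a specific "smart counting" technique. One well-known example is the Count-Min sketch, used here purely as an illustration. It estimates how often each item occurs in a stream using a fixed-size table of counters, however many items arrive. Each item is hashed into one counter per row, and the estimate is the minimum across rows. It can therefore overcount because of hash collisions, but it never undercounts. The width and depth defaults and the SHA-256-based row hashing are assumptions for this sketch.

```python
import hashlib
from typing import Iterator, List


class CountMinSketch:
    """Approximate frequency counts in fixed memory (depth rows x width counters)."""

    def __init__(self, width: int = 2000, depth: int = 5) -> None:
        self.width = width
        self.depth = depth
        self.table: List[List[int]] = [[0] * width for _ in range(depth)]

    def _columns(self, item: str) -> Iterator[int]:
        """One counter position per row, from a row-salted SHA-256 hash of the item."""
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.width

    def add(self, item: str, count: int = 1) -> None:
        """Increment the item's counter in every row."""
        for row, col in enumerate(self._columns(item)):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        """Upper-bound estimate: the smallest counter the item maps to."""
        return min(self.table[row][col] for row, col in enumerate(self._columns(item)))


if __name__ == "__main__":
    cms = CountMinSketch()
    for word in "error ok ok error timeout ok error".split():
        cms.add(word)
    for word in ("error", "ok", "timeout", "unseen"):
        print(word, cms.estimate(word))
```

In the standard analysis, choose width = ceil(e / epsilon) and depth = ceil(ln(1 / delta)), where N is the total count added. With these settings, each estimate exceeds the true count by at most epsilon * N with probability at least 1 - delta. Memory is then fixed by the accuracy required rather than by the number of distinct items.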
  • Slide 35
  • TECHNOLOGIES USED IN BIG DATA: manage and store huge volumes of any data (Hadoop File System, MapReduce); manage streaming data (stream computing); analyze unstructured data (text analytics engine); structure and control data (data warehousing); integrate and govern all data sources (integration, data quality, security, lifecycle management, MDM); understand and navigate federated big data sources (federated discovery and navigation).
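The MapReduce programming model can be shown with a single-process simulation of its three phases on the classic word-count example. This is a minimal sketch, not Hadoop's Java API, and the function names are hypothetical. In a real Hadoop job, the map tasks run in parallel on blocks of files stored in the Hadoop file system. The framework performs the shuffle (grouping by key) between the map and reduce phases.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map over every document, shuffle the pairs, then reduce each key."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    docs = ["Big data is big", "data is everywhere"]
    print(word_count(docs))
```

The map and reduce functions only ever see one record or one key at a time. That is what lets the framework spread them across many machines.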
  • Slide 36
  • TYPES OF TOOLS TYPICALLY USED IN A BIG DATA SCENARIO: Where is the processing hosted? Distributed servers/cloud. Where is the data stored? Distributed storage (e.g., Amazon S3). What is the programming model? Distributed processing (MapReduce). How is the data stored and indexed? High-performance schema-free databases. What operations are performed on the data? Analytic/semantic processing (e.g., RDF/OWL).
  • Slide 37
  • WHY HADOOP?
  • Slide 38
  • WHY HADOOP?
  • Slide 39
  • TIME FOR THINKING: What do you do with the data? Let's take an example: from application developers to video streamers, organizations of all sizes face the challenge of capturing, searching, analyzing, and leveraging as much as terabytes of data per second, too much for the constraints of traditional system capabilities and database management tools.
  • Slide 40
  • QUESTIONS FROM BUSINESSES
  • Slide 41
  • BIG DATA STRATEGY: MOVE THE ANALYTICS CLOSER TO THE DATA, and grow and evolve on your current IT infrastructure. New analytic applications drive the requirements for a big data platform: integrate and manage the full variety, velocity and volume of data; apply advanced analytics to information in its native form; visualize all available data for ad-hoc analysis (even in motion!); a development environment for building new analytic applications; workload optimization and scheduling; security and governance. Diagram: analytic applications (BI/reporting, exploration/visualization, functional apps, industry apps, predictive analytics, content analytics) on top of the Big Data Platform (visualization and discovery, application development, systems management, accelerators, Hadoop system, stream computing, data warehouse, information integration and governance).
  • Slide 42
  • FOUR ENTRY POINTS OF BIG DATA: unlock big data; simplify your warehouse; preprocess raw data; analyze streaming data. The entry points are shown against the same Big Data Platform diagram as the previous slide.
  • Slide 43
  • ADVANTAGES: dialogue with consumers; redevelop your products; perform risk analysis; keep data safe; customize your website in real time; reduce maintenance costs.
  • Slide 44
  • Thank you