Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am
BUSINESS INTELLIGENCE & ADVANCED ANALYTICS
The Search for Patterns, Waldo, and Black Swans
Barrett Peterson, C.P.A.
ICPAS Fox River Trail Chapter, June 28, 2012
WHY BUSINESS INTELLIGENCE?
Information
Good Data
Good Analysis
HISTORY AND BACKGROUND
• Computer-based business intelligence systems are a middle-aged idea, about 40 years old. They were previously described as:
  – Decision support systems [DSS]
  – Executive information systems [EIS]
  – Management information systems [MIS]
A LITTLE BACKGROUND
HISTORY
A trip down memory lane
• Internet Development
  – ARPANET and others – 1960s
  – Internet Protocols – 1982, presumably by Al Gore
• IBM researcher Edgar Codd credited with development of relational data base theory in 1970.
• IBM’s Donald Chamberlin and Raymond Boyce develop structured query language [SQL] in the early 1970s to manipulate and retrieve data from IBM’s early relational data base management system
• World Wide Web and first web browser invented by Tim Berners-Lee in 1990 by combining the internet, HyperText Markup Language [HTML], and the Uniform Resource Locator [URL] system. The browser was later renamed Nexus.
• Mosaic, co-developed by Marc Andreessen, was the forerunner of the first widely used commercial web browser [Netscape].
• Development of database designs and high-speed processing that enable big data during the last 15 years.
A LITTLE BACKGROUND
History
Important Technology Inventions
• Development of the primary infrastructure
  – Database design
  – Processing and Storage Hardware
  – Server Development and Massively Parallel Processing
• Improved telecommunications speed
• Hardware miniaturization, capacity, and speed
  – Memory [RAM] capacity
  – Storage capacity and transfer speed
  – Bus speed
  – Video processing capacity and speed
• Increased hardware speed and capacity
• Digital formats for sensors, cameras, RFID, and other data collection sources
• Mobile computing
• “Cloud” capability exploits many of these developments
A LITTLE BACKGROUND
History
Drivers Enabling BI and Advanced Analytics
• Analytics
• Business Intelligence
• Knowledge Management
• Content Management
• Data Mining
• Big Data
• Data Integration
• Gamification
• Blob [Binary Large Object]
A LITTLE BACKGROUND
TERMINOLOGY
A consultant’s collection of confusing names - a sampler
• CPU speed and power
  – Moore’s law
  – Multi-core chips
  – Solid State Memory
• Storage improvement and cost reduction
  – Greatly increased capacity
  – Greatly increased access/transfer speed
  – Greatly reduced cost
• Data collection from a wide range of devices
• Data communications – speed and volume
• Database management techniques and software
• Application speed and power
A LITTLE BACKGROUND
Drivers and Enablers of Big Data
BUSINESS INTELLIGENCE AND ADVANCED ANALYTICS DEFINED
A system comprised of “computer” hardware and software to:
• Collect, “clean”, filter, and integrate data
• Store data [hardware and software]
• Provide knowledge management, analytical, and presentation tools to translate data into decision-useful information
TONIGHT’S CRITICAL DEFINITIONS
Business Intelligence
• Prehistoric – Mainframe Era
  – DSS, EIS, MIS
  – Hierarchical Master Data Files
• The Current Era [Primarily] – Business Intelligence
  – Primarily “structured” data [data that can be represented in relational/dimensional tables or flat files], and BLOB [binary large object] formats
  – Analysis of “known” patterns
  – Presented in tables, simple charts, and dashboards
• Emerging – Big Data and Advanced Analytics
  – Used to discover new, changing, or variable patterns
  – A wide variety of “unstructured” digital data formats added to “structured” data
  – Emerging storage structures
  – “Exploratory” analytics
  – Zoomable User Interfaces [ZUIs]
  – Solid State Memory and Solid-State Drives
TONIGHT’S CRITICAL DEFINITIONS
Business Intelligence Generations
THE HARDWARE AND SOFTWARE ELEMENTS OF
BUSINESS INTELLIGENCE
• Computer – CPU, Memory, and Operating System Software
• Data Collection
  – Master Data Management
  – Collection Processes and Devices
  – Data Cleansing Processes and Software
• Data Storage
  – Physical Devices and Storage Management Software
  – Data Management and Integration
  – Database Software Storage
    • Relational – Traditional ERP/Transaction systems
    • Dimensional – Traditional Data Warehouse, including associated BLOB
    • Distributed, Multiple Server, Storage Systems
    • NoSQL [Not Only SQL] Distributed Operational Stores
    • Hadoop for Highly Parallel Processing and Intensive Data Analytics Applications
• Middleware Software
• Business Intelligence Application Software
  – OLAP, Dashboard, and Chart Reports
  – Statistical Analysis and Presentation Tools
BUSINESS INTELLIGENCE ELEMENTS
Principal Components for Maximum Application
• Data Governance and Management
  – Uniform terminology
  – Uniform meaning
  – Uniform units of measure
  – Metadata
• Data Structure and Attributes
  – Structured - Relational/Dimensional
  – Unstructured
  – Rate of change, context, and other attributes
• Data Collection and Preparation
  – Filtering, particularly “Big Data”
  – Extract, Transform, Load [ETL] for “structured” data
• Data Base File Systems
• Data Storage and Retrieval
  – Capacity
  – Access/Retrieval speed
BUSINESS INTELLIGENCE ELEMENTS
DATA ISSUES: THE CORNERSTONE
• Metadata management
  – Business definitions, rules, sources
  – Technical attributes, such as type, scale, transformation methods
  – Processing requirements – filtering, ETL, aggregation, summarization
• Data Definitions and data dictionaries
  – Name
  – Unit(s) of measure
• Data collection and filtering or transforming requirements
  – Sources – internal and external
  – Context addition/filtering requirements
• Data integration specifications
  – Multiple platforms and applications
  – Mapping to intermediate data marts
• Privacy requirements
  – Personal Identifying data
  – Laws: HIPAA, Privacy Act
BUSINESS INTELLIGENCE ELEMENTS
MASTER DATA GOVERNANCE AND MANAGEMENT
• Data Structures
  – “Structured” Data, principally text and numbers capable of incorporation in relational or dimensional tables
  – “Unstructured” Data, not suitable for relational tables, many in newer data formats
• Big Data Attributes
  – Both “structured” and “unstructured”
  – The four major “Vs” of big data
    • Volume - huge
    • Velocity – fast changing, unlike structured
    • Variety – format and content
    • Variability – lacks the consistency of structured data
BUSINESS INTELLIGENCE ELEMENTS
Data Structures and Attributes Are Critical Drivers
• Content Structure – Traditional Financial Data
  – Numerical
  – Sign/Debit or Credit
  – Text Descriptions
• Database Management Structures
  – Legacy Systems: Hierarchical and Network
  – Transaction Systems: Relational
    • Relations [tables], attributes [columns], instances [rows]
    • Rules: no duplicate rows; single value for attributes
  – Warehouse Systems: Dimensional
    • Facts [data items, usually a dollar amount or unit count]
    • Measures – dollar or count for facts
    • Dimensions – groups of hierarchies and descriptors of various aspects or context for the facts/measures
• Microsoft Office and Similar File Formats
• Photography and Art
BUSINESS INTELLIGENCE ELEMENTS
Data Structures IT Lingo
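The dimensional vocabulary on this slide (facts, measures, dimensions) can be illustrated with a small sketch. It is only an illustration under stated assumptions: the table names, SKUs, categories, and amounts below are hypothetical and do not come from the presentation, and the roll-up is done in plain Python rather than in any particular warehouse product.

```python
from collections import defaultdict

# Hypothetical dimension table: product key -> descriptors, including a
# "category" level that sits above the individual SKU in the hierarchy.
product_dim = {
    1: {"sku": "A-100", "category": "Frozen"},
    2: {"sku": "B-200", "category": "Frozen"},
    3: {"sku": "C-300", "category": "Snacks"},
}

# Hypothetical fact table: each fact row holds a dimension key plus
# measures (a unit count and a dollar amount).
sales_facts = [
    {"product_key": 1, "month": "2012-05", "units": 10, "dollars": 50.0},
    {"product_key": 2, "month": "2012-05", "units": 4, "dollars": 36.0},
    {"product_key": 3, "month": "2012-05", "units": 7, "dollars": 21.0},
    {"product_key": 1, "month": "2012-06", "units": 6, "dollars": 30.0},
]


def roll_up(facts, dimension, level, measure):
    """Sum one measure over all facts, grouped by one level of a dimension."""
    totals = defaultdict(float)
    for row in facts:
        group = dimension[row["product_key"]][level]
        totals[group] += row[measure]
    return dict(totals)


dollars_by_category = roll_up(sales_facts, product_dim, "category", "dollars")
print(dollars_by_category)
```

Grouping by `"sku"` instead of `"category"` moves down one level of the same hierarchy, which is the kind of drill-down an OLAP tool offers.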
RELATIONAL TABLE ILLUSTRATION
“Tuple” is borrowed from mathematics and set theory. In database design it refers to one row of a table: the set of attribute values that describes a single item of the kind the table is about. Examples of such items include customers, vendors, orders, and product SKUs.
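A minimal sketch of the idea, using Python's built-in `sqlite3` module. The `customer` table, its columns, and the sample rows are hypothetical and chosen only for illustration. Each row comes back as a tuple of attribute values, and a primary key is one common way to enforce the "no duplicate rows" rule.

```python
import sqlite3

# In-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# The relation (table) is "customer"; its attributes (columns) are
# customer_id, name, and city. The primary key rejects duplicate rows.
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " city TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO customer (customer_id, name, city) VALUES (?, ?, ?)",
    [(1, "Acme Supply", "Aurora"), (2, "Fox River Foods", "Elgin")],
)

# Each fetched row is a tuple: one value per attribute, in column order.
rows = conn.execute(
    "SELECT customer_id, name, city FROM customer ORDER BY customer_id"
).fetchall()

# Inserting a second row with the same key violates the constraint.
duplicate_rejected = False
try:
    conn.execute("INSERT INTO customer VALUES (1, 'Acme Supply', 'Aurora')")
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(rows, duplicate_rejected)
```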
BUSINESS INTELLIGENCE ELEMENTS
MATH CAN BE COMPLICATED
• Numbers and words/letters
  – Relational/Dimensional
  – Spreadsheets
  – Word Processing documents
• Sound and Music
• Photo
• Video
• Video Game
• CAD Design
• Graphical
  – PDF
  – Raster, Vector Graphics
  – Statistical Visualization
• Scientific
• Signal
• XML [Web based mark-up formats]
• Geo-Location
• Web Logs
BUSINESS INTELLIGENCE ELEMENTS
DATA FILE TYPE CATEGORIES, ALMOST ENGLISH
• Collection
  – Company transaction/ERP systems
  – Purchased, such as Nielsen, IRI
  – Vendor supplied, such as bank transactions
• Filtering
  – Adding context such as date or location
  – Eliminating “chatter” from high volume data
  – Error correction
• Aggregation & Integration
BUSINESS INTELLIGENCE ELEMENTS
DATA COLLECTION AND PREPARATION
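The collection, filtering, and aggregation steps on this slide might look roughly like the sketch below. Everything in it is an assumption made for illustration: the RFID-style readings, the reader-to-location lookup, and the choice to drop (rather than repair) records with an unreadable quantity are hypothetical and not taken from the presentation.

```python
from collections import defaultdict

# Hypothetical raw readings, as they might arrive from tag readers.
raw_readings = [
    {"tag": "SKU-1", "reader": "dock-1", "qty": "5"},
    {"tag": "SKU-1", "reader": "dock-1", "qty": "5"},  # repeated read: "chatter"
    {"tag": "SKU-2", "reader": "dock-2", "qty": "3"},
    {"tag": "SKU-2", "reader": "dock-9", "qty": "x"},  # unreadable quantity
]

# Hypothetical context lookup: which site each reader belongs to.
reader_location = {"dock-1": "Elgin", "dock-2": "Aurora"}


def prepare(readings, locations):
    """Filter chatter, drop bad records, and add location context."""
    seen = set()
    cleaned = []
    for reading in readings:
        key = (reading["tag"], reading["reader"], reading["qty"])
        if key in seen:  # eliminate exact repeats
            continue
        seen.add(key)
        try:
            qty = int(reading["qty"])
        except ValueError:  # simplest error handling: skip the record
            continue
        cleaned.append(
            {
                "tag": reading["tag"],
                "location": locations.get(reading["reader"], "unknown"),
                "qty": qty,
            }
        )
    return cleaned


def aggregate(cleaned):
    """Total quantity per location."""
    totals = defaultdict(int)
    for record in cleaned:
        totals[record["location"]] += record["qty"]
    return dict(totals)


cleaned_readings = prepare(raw_readings, reader_location)
totals_by_location = aggregate(cleaned_readings)
print(totals_by_location)
```

A production pipeline would usually log or repair rejected records instead of silently skipping them; the skip here just keeps the sketch short.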
DATA COLLECTION - RFID
RFID tag, RFID tag reader
DATA COLLECTION
Various sensors, surveillance camera
DATA FILTERING AND CLEANSING IS IMPORTANT
• Relational – SQL
• Dimensional – SQL, OLAP
• Binary Large Object [BLOB] – binary data, most often photos, video, audio, or PDF files
• Massively Parallel-Processing [MPP]
• Apache Hadoop Distributed File System [HDFS] – Java
  – Google File System [GFS], used solely by Google
  – Google MapReduce
• Amazon S3 filesystem [used by Amazon]
• NoSQL
• Resource Description Framework [RDF] Databases, like Big Data
BUSINESS INTELLIGENCE ELEMENTS
DATABASE FILE SYSTEMS
BUSINESS INTELLIGENCE ELEMENTS
SELECT BIG DATA DATABASE MANAGEMENT SYSTEMS
• Significant Originators
  – Google MapReduce
  – Google File System [GFS]
  – Amazon S3 filesystem
• Continuing Developments
  – Apache Software Foundation
    • Apache Cassandra distributed database management system
    • Apache Hadoop software framework to support data-intensive distributed applications
    • Apache Hive, a data warehouse structure built on Hadoop
    • Pig - high level programming language for creating MapReduce programs with Hadoop
  – Significant to Technology Development
    • Facebook
    • Yahoo
    • LinkedIn [Project Voldemort]
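The MapReduce pattern named on this slide can be sketched in a few lines of plain Python. This is a single-process illustration of the map, shuffle, and reduce phases only; it does not use the Hadoop or Google APIs, and the function names and sample documents are hypothetical. In a real cluster the map and reduce calls would run in parallel on many servers.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over a list of documents."""
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data gets bigger", "Big data analytics"])
print(counts)
```

Because each document is mapped independently and each key is reduced independently, the work splits naturally across servers, which is what makes the pattern suit very large data sets.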
• Convergence aspect of mainframes and servers
• Massively parallel, multiple server, distributed processing, in multiple data centers – grid computing
• Multi-core, high capacity, lower power consumption CPUs
• Memory servers for RAM employing DRAM comprised of Fully Buffered Dual In-line Memory Modules [FB-DIMM]
• Solid state flash drive storage
• Greatly improved, and less costly, hard drive storage
BUSINESS INTELLIGENCE ELEMENTS
COMPUTER HARDWARE CONSIDERATIONS
BI CONFIGURATION SIZES
Small – BI, but not Big Data capable
Medium
Large – IBM Sequoia at Livermore Labs
• Data Storage Terminology
  – Memory – CPU direct connected, often called RAM
  – Storage – not directly connected to the CPU
• Data Storage Device Types
  – Memory
    • DRAM-based
    • Flash memory-based Solid-State Drives [SSDs]
  – Storage
    • Hard Disk Drives [HDD]
    • Optical Drives – CDs, DVDs
• Data Storage Systems
  – Direct Attached
  – Network Attached Storage [NAS]
  – Storage Area Network [SAN]
  – pNFS – Parallel Network File System
BUSINESS INTELLIGENCE ELEMENTS
DATA STORAGE HARDWARE/SOFTWARE
• Traditional Reporting Systems
  – ERP systems, including extract and presentation tools
  – Downloads to Excel and similar programs for analysis using functions and pivot tables
• Presentation Tools
• Specialized Analytics
  – IBM InfoSphere BigInsights and InfoSphere Streams
  – IBM Netezza
  – ParAccel Analytic Database
  – EMC Greenplum
  – SAS High Performance Computing
  – Information Builders WebFocus
• Exploratory Tools, like IBM SPSS [originally Statistical Package for the Social Sciences]
  – Data mining with specialized algorithms
  – Statistical analysis and related charting software
BUSINESS INTELLIGENCE ELEMENTS
BI APPLICATION SOFTWARE
• BI Reporting
• Predictive Analytics
• Data Exploration - correlation
• Data Visualization - graphical
• Instrumentation Analytics
• Content Analytics
• Web Analytics
• Functional Applications
• Industry Applications
BUSINESS INTELLIGENCE ELEMENTS
ADVANCED ANALYTICS APPLICATION TYPES
BUSINESS INTELLIGENCE ELEMENTS
USE STATISTICAL TECHNIQUES APPROPRIATELY
ALGORITHMS CAN BE TREACHEROUS
DATA MODELS HAVE LIMITS
BI AND ADVANCED ANALYTICS OUTPUT ILLUSTRATIONS
EXAMPLES OF USES
• Sales and Operations Planning
• Financial Instruments Modeling
• Production Control
• Online Retail
• Economics and Policy Development
• Agriculture/Farming
• Weather Analysis/Prediction
• Environmental Impact Assessment
• Healthcare Diagnosis and Records Management
• Genomic Analytics and Pharmaceutical and Medical Research
• Natural Resource Exploration
• Research Physics
• Road, Rail Traffic Management
• Security Surveillance
• Astronomy
• Logistics Management, Including GPS Tracking
• Electrical and Telecommunications Grids Mgmt
• Social Media – Facebook, LinkedIn, Google+, Twitter, YouTube, Pinterest
• TV shows – Star Trek, Person of Interest
SELECTED EXAMPLES OF USES
• Retail
  – Amazon
  – Dell
  – Delta Sonic Car Washes
• Data Services
  – IBM
  – Google
  – Amazon
• Financial Services
• Manufacturing
  – McCain Foods – Frozen foods
  – Boeing
• Transportation and Logistics
  – Logistics – UPS, FedEx
  – Rail – UP, CSX, TTX
  – Air – United, AMR, Southwest
• Social Media
  – LinkedIn
  – Facebook
• Medicine and Health
  – Centers for Disease Control and Prevention (CDC)
  – J. Craig Venter Institute
• Science
  – Livermore Labs
SELECTED USERS
• Technical Elements
  – Direct on-line access
  – Amazon specialized “Big Data” database
  – Distributed and extremely large data centers
  – Highly automated, high technology warehouses
  – High supplier [vendor] integration
• User Benefits
  – Favorable prices
  – Suggested associated purchases
  – Individual interest advertising
SELECTED EXAMPLES OF USE
AMAZON
• Technical Elements
  – Web driven order entry and custom purchase configuration
  – Tracking of sales correspondence with promotional offers
  – Supplier re-order integration
• User Benefits
  – Ability to customize purchase
  – Reasonable cost
  – Prompt delivery
SELECTED EXAMPLES OF USE
DELL
• Technical components
  – Shared component and assembly designs
  – More detailed quality specifications and product tolerances
  – Control of assembly schedule
  – “Real time” exchange of technical information
  – Dissemination of best practices
• Customer benefits
  – Faster deliveries
  – Increased product quality
  – Reduced defects
SELECTED EXAMPLES OF USE
BOEING
• Techniques employed
  – Collects cellphone and GPS signals, traffic cameras, and roadside sensors
  – Identifies accidents, traffic jams, and road damage
  – Emergency vehicles can be dispatched
  – Updates traffic websites
  – Sends messages to drivers’ GPS devices and cellphones
  – Uses supercomputers running INRIX application
• Benefits
  – Eliminates traffic congestion faster
  – More timely relief for accident victims
  – Facilitates road paving scheduling
SELECTED EXAMPLES OF USE
NEW JERSEY DEPARTMENT OF TRANSPORTATION
• Technical Elements
  – General LinkedIn Structure
    • Personal Profile
    • Individual Connections
    • Groups
    • Company Searches
    • Questions and Answers
  – Attached application partners
    • Amazon – Reading List
    • Slideshare
• User Benefits
  – Networking with professional contacts
  – Personal branding capabilities
  – Business Development
  – Job Search enhancement
SELECTED EXAMPLES OF USE
LINKEDIN PROFILE PAGE SAMPLE
Facebook Page Sample
TRENDS
• More, bigger, faster – big data gets bigger
• Cloud services continue to expand
• Mobile computing expands
• Hadoop becomes more common
• Interactive data visualization will expand
• Social media type platforms will increase their prominence
• Analytics skills demands will increase
RESOURCES
• Books
  – Competing on Analytics, Davenport & Harris
  – Analytics at Work, Davenport, Harris, & Morison
  – The Data Asset, Fisher
  – Data Strategy, Adelman, Moss, Abai
• Websites
  – The Data Warehousing Institute – tdwi.org
  – IBM data analytics: www.ibm.com, smarter planet
SUMMARY
WHY USE BI AND ADVANCED ANALYTICS
INSIGHT FROM DATA