Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am

BUSINESS INTELLIGENCE & ADVANCED ANALYTICS The Search for Patterns, Waldo, and Black Swans Barrett Peterson, C.P.A. ICPAS Fox River Trail Chapter, June 28, 2012

Transcript of Business Intelligence Data Analytics June 28 2012 Icpas V4 Final 20120625 8am

Page 1: Business Intelligence  Data Analytics June 28 2012 Icpas V4  Final 20120625 8am

BUSINESS INTELLIGENCE & ADVANCED ANALYTICS

The Search for Patterns, Waldo, and Black Swans
Barrett Peterson, C.P.A.
ICPAS Fox River Trail Chapter, June 28, 2012

Page 2

WHY BUSINESS INTELLIGENCE?

Information

Good Data

Good Analysis

Page 3

HISTORY AND BACKGROUND

Page 4

A LITTLE BACKGROUND

HISTORY

A trip down memory lane

• Computer-based business intelligence is a middle-aged idea, about 40 years old. It was previously described as:
  – Decision support systems [DSS]
  – Executive information systems [EIS]
  – Management information systems [MIS]

Page 5

A LITTLE BACKGROUND

History

Important Technology Inventions

• Internet Development
  – ARPANET and others – 1960s
  – Internet Protocols – 1982, presumably by Al Gore
• IBM researcher Edgar Codd is credited with developing relational database theory in 1970.
• IBM’s Donald Chamberlin and Raymond Boyce developed Structured Query Language [SQL] in the early 1970s to manipulate and retrieve data from IBM’s early relational database management system.
• The World Wide Web and the first web browser were invented by Tim Berners-Lee in 1990 by combining the internet, HyperText Markup Language [HTML], and the Uniform Resource Locator [URL] system. The browser was later renamed Nexus.
• Mosaic, designed by Marc Andreessen, was the forerunner of the first commercial web browser [Netscape].
• Development of big-data-enabling database designs and high-speed processing during the last 15 years.

Page 6

A LITTLE BACKGROUND

History

Drivers Enabling BI and Advanced Analytics

• Development of the primary infrastructure
  – Database design
  – Processing and Storage Hardware
  – Server Development and Massively Parallel Processing
• Improved telecommunications speed
• Hardware miniaturization, capacity, and speed
  – Memory [RAM] capacity
  – Storage capacity and transfer speed
  – Bus speed
  – Video processing capacity and speed
• Increased hardware speed and capacity
• Digital formats for sensors, cameras, RFID, and other data collection sources
• Mobile computing
• “Cloud” capability exploits many of these developments

Page 7

A LITTLE BACKGROUND

TERMINOLOGY

A consultant’s collection of confusing names - a sampler

• Analytics
• Business Intelligence
• Knowledge Management
• Content Management
• Data Mining
• Big Data
• Data Integration
• Gamification
• Blob [Binary Large Object]

Page 8

A LITTLE BACKGROUND

Drivers and Enablers of Big Data

• CPU speed and power
  – Moore’s law
  – Multi-core chips
  – Solid State Memory
• Storage improvement and cost reduction
  – Greatly increased capacity
  – Greatly increased access/transfer speed
  – Greatly reduced cost
• Data collection from a wide range of devices
• Data communications – speed and volume
• Database management techniques and software
• Application speed and power

Page 9

BUSINESS INTELLIGENCE AND ADVANCED ANALYTICS DEFINED

Page 10

TONIGHT’S CRITICAL DEFINITIONS

Business Intelligence

A system composed of “computer” hardware and software to:

• Collect, “clean”, filter, and integrate data
• Store data [hardware and software]
• Provide knowledge management, analytical, and presentation tools to translate data into decision-useful information

Page 11

TONIGHT’S CRITICAL DEFINITIONS

Business Intelligence Generations

• Prehistoric – Mainframe Era
  – DSS, EIS, MIS
  – Hierarchical Master Data Files
• The Current Era [Primarily] – Business Intelligence
  – Primarily “structured” data [data that can be represented in relational/dimensional tables or flat files], and BLOB [binary large object] formats
  – Analysis of “known” patterns
  – Presented in tables, simple charts, and dashboards
• Emerging – Big Data and Advanced Analytics
  – To discover new, changing, or variable patterns
  – A wide variety of “unstructured” digital data formats added to “structured” data
  – Emerging storage structures
  – “Exploratory” analytics
  – Zoomable User Interfaces [ZUIs]
  – Solid State Memory and Solid-State Drives

Page 12

THE HARDWARE AND SOFTWARE ELEMENTS OF BUSINESS INTELLIGENCE

Page 13

BUSINESS INTELLIGENCE ELEMENTS

Principal Components for Maximum Application

• Computer – CPU, Memory, and Operating System Software
• Data Collection
  – Master Data Management
  – Collection Processes and Devices
  – Data Cleansing Processes and Software
• Data Storage
  – Physical Devices and Storage Management Software
  – Data Management and Integration
  – Database Software Storage
    • Relational – Traditional ERP/Transaction systems
    • Dimensional – Traditional Data Warehouse, including associated BLOB
    • Distributed, Multiple Server, Storage Systems
    • NoSQL [Not Only SQL] Distributed Operational Stores
    • Hadoop for Highly Parallel Processing and Intensive Data Analytics Applications
• Middleware Software
• Business Intelligence Application Software
  – OLAP, Dashboard, and Chart Reports
  – Statistical Analysis and Presentation Tools

Page 14

BUSINESS INTELLIGENCE ELEMENTS

DATA ISSUES: THE CORNERSTONE

• Data Governance and Management
  – Uniform terminology
  – Uniform meaning
  – Uniform units of measure
  – Metadata
• Data Structure and Attributes
  – Structured – Relational/Dimensional
  – Unstructured
  – Rate of change, context, and other attributes
• Data Collection and Preparation
  – Filtering, particularly “Big Data”
  – Extract, Transform, Load [ETL] for “structured” data
• Data Base File Systems
• Data Storage and Retrieval
  – Capacity
  – Access/Retrieval speed
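
The Extract, Transform, Load [ETL] step named on this slide can be sketched in a few lines. This is a minimal illustration, not the presenter's method: the CSV content, the `orders` table, and the cents-to-dollars rule are all assumptions invented for the example, and it uses only Python's standard library.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract: one blank amount and one row in a different unit.
RAW_CSV = """order_id,amount,unit
1001,12.50,USD
1002,,USD
1003,1999,cents
"""

def extract(text):
    """Extract: read rows from a CSV source into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop incomplete rows and convert every amount to dollars."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # filter out records with no amount
        amount = float(row["amount"])
        if row["unit"] == "cents":
            amount /= 100  # enforce a uniform unit of measure
        cleaned.append((int(row["order_id"]), round(amount, 2)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a relational table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
loaded = conn.execute(
    "SELECT order_id, amount_usd FROM orders ORDER BY order_id"
).fetchall()
print(loaded)
```

The transform step is where the "uniform units of measure" bullet shows up in practice: every row reaches the table in dollars, whatever unit it arrived in.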

Page 15

BUSINESS INTELLIGENCE ELEMENTS

MASTER DATA GOVERNANCE AND MANAGEMENT

• Metadata management
  – Business definitions, rules, sources
  – Technical attributes, such as type, scale, transformation methods
  – Processing requirements – filtering, ETL, aggregation, summarization
• Data Definitions and data dictionaries
  – Name
  – Unit(s) of measure
• Data collection and filtering or transforming requirements
  – Sources – internal and external
  – Context addition/filtering requirements
• Data integration specifications
  – Multiple platforms and applications
  – Mapping to intermediate data marts
• Privacy requirements
  – Personal Identifying data
  – Laws: HIPAA, Privacy Act
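
One way to picture a data dictionary entry is as a small record holding the business definition, unit of measure, technical type, source, and a privacy flag. The sketch below is only an assumption about how such metadata might be held in code; the class name `DataDefinition`, its fields, and the two sample entries are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DataDefinition:
    """One hypothetical data dictionary entry."""
    name: str
    business_definition: str
    unit: str
    data_type: type
    source: str
    contains_personal_data: bool = False

# Illustrative entries only; real dictionaries are usually kept in a metadata tool.
DICTIONARY = {
    "net_sales": DataDefinition(
        "net_sales", "Invoiced sales less returns", "USD", float, "ERP"
    ),
    "patient_id": DataDefinition(
        "patient_id", "Unique patient identifier", "none", str,
        "Clinic system", contains_personal_data=True,
    ),
}

def validate(field, value):
    """Check a value against the technical type recorded for its field."""
    return isinstance(value, DICTIONARY[field].data_type)

# Fields that privacy requirements would apply to.
restricted = sorted(
    name for name, d in DICTIONARY.items() if d.contains_personal_data
)
print(validate("net_sales", 10.5), restricted)
```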

Page 16

BUSINESS INTELLIGENCE ELEMENTS

Data Structures and Attributes Are Critical Drivers

• Data Structures
  – “Structured” Data, principally text and numbers capable of incorporation in relational or dimensional tables
  – “Unstructured” Data, not suitable for relational tables, many in newer data formats
• Big Data Attributes
  – Both “structured” and “unstructured”
  – The four major “Vs” of big data
    • Volume – huge
    • Velocity – fast changing, unlike structured
    • Variety – format and content
    • Variability – lacks the consistency of structured data

Page 17

BUSINESS INTELLIGENCE ELEMENTS

Data Structures IT Lingo

• Content Structure – Traditional Financial Data
  – Numerical
  – Sign/Debit or Credit
  – Text Descriptions
• Database Management Structures
  – Legacy Systems: Hierarchical and Network
  – Transaction Systems: Relational
    • Relations [Tables], Attributes [Columns], Instances [Rows]
    • Rules: no duplicate rows; single value for attributes
  – Warehouse Systems: Dimensional
    • Facts [data items, usually a dollar amount or unit count]
    • Measures – dollar or count for facts
    • Dimensions – groups of hierarchies and descriptors of various aspects or context for the facts/measures
• Microsoft Office and Similar File Formats
• Photography and Art
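
The dimensional vocabulary above (facts, measures, dimensions) maps onto what is commonly called a star schema. The following is a rough sketch using Python's built-in `sqlite3` module; the table names `fact_sales`, `dim_product`, and `dim_date` and all of the sample rows are assumptions made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimensions: descriptors that give the facts their context.
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT);

-- Fact table: one row per sale, with two measures (dollars and units).
CREATE TABLE fact_sales (
    product_id INTEGER REFERENCES dim_product(product_id),
    date_id INTEGER REFERENCES dim_date(date_id),
    sales_dollars REAL,
    units INTEGER
);

INSERT INTO dim_product VALUES (1, 'Frozen foods'), (2, 'Snacks');
INSERT INTO dim_date VALUES (10, 2012, 'Q1'), (11, 2012, 'Q2');
INSERT INTO fact_sales VALUES
    (1, 10, 100.0, 4), (1, 11, 150.0, 6), (2, 10, 80.0, 8);
""")

# Summarize the measures along two dimensions: category and quarter.
rows = conn.execute("""
    SELECT p.category, d.quarter, SUM(f.sales_dollars), SUM(f.units)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.category, d.quarter
    ORDER BY p.category, d.quarter
""").fetchall()

for row in rows:
    print(row)
```

Each dimension table is itself an ordinary relation, so the relational rules on the slide (no duplicate rows, single-valued attributes) still apply inside the warehouse.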

Page 18

BUSINESS INTELLIGENCE ELEMENTS

RELATIONAL TABLE ILLUSTRATION

“Tuple” is borrowed from mathematics and set theory. In database design it refers to the set of attribute values that describes one “item” [row] in a table, where the table’s title names the subject. Examples of such items include customers, vendors, orders, and product SKUs.
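
A small sketch may make "tuple" concrete. Here a relation is modeled as a Python set of named tuples, so the "no duplicate rows" rule falls out of set semantics. The `Customer` type and its sample rows are hypothetical and not taken from the slide's illustration.

```python
from collections import namedtuple

# Attributes [columns] of the relation; each Customer instance is one tuple [row].
Customer = namedtuple("Customer", ["customer_id", "name", "city"])

# A relation is a set of tuples, so an exact duplicate row cannot be stored twice.
customers = {
    Customer(1, "Acme Hardware", "Aurora"),
    Customer(2, "Fox Valley Foods", "Elgin"),
    Customer(1, "Acme Hardware", "Aurora"),  # duplicate, collapses into the first
}

print(len(customers))
print(sorted(c.name for c in customers))
```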

Page 19

BUSINESS INTELLIGENCE ELEMENTS

MATH CAN BE COMPLICATED

Page 20

BUSINESS INTELLIGENCE ELEMENTS

DATA FILE TYPE CATEGORIES, ALMOST ENGLISH

• Numbers and words/letters
  – Relational/Dimensional
  – Spreadsheets
  – Word Processing documents
• Sound and Music
• Photo
• Video
• Video Game
• CAD Design
• Graphical
  – PDF
  – Raster, Vector Graphics
  – Statistical Visualization
• Scientific
• Signal
• XML [Web based mark-up formats]
• Geo-Location
• Web Logs

Page 21

BUSINESS INTELLIGENCE ELEMENTS

DATA COLLECTION AND PREPARATION

• Collection
  – Company transaction/ERP systems
  – Purchased, such as Nielsen, IRI
  – Vendor supplied, such as bank transactions
• Filtering
  – Adding context such as date or location
  – Eliminating “chatter” from high volume data
  – Error correction
• Aggregation & Integration
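
The filtering bullets can be illustrated with a short sketch: drop failed reads, suppress "chatter" (readings that barely change), and attach context such as date and location. Everything here is an assumption for illustration, including the sensor readings, the `min_change` threshold, and the `prepare` function name.

```python
from datetime import date

# Hypothetical sensor readings as (sensor_id, value); None marks a failed read.
readings = [("s1", 20.0), ("s1", 20.0), ("s1", None), ("s1", 20.4), ("s1", 25.1)]

def prepare(readings, location, collected_on, min_change=0.5):
    """Filter raw readings and add context to the ones that are kept."""
    prepared = []
    last_value = None
    for sensor_id, value in readings:
        if value is None:
            continue  # error handling: drop failed reads (a real pipeline might repair them)
        if last_value is not None and abs(value - last_value) < min_change:
            continue  # chatter: the change is too small to be worth storing
        last_value = value
        prepared.append({
            "sensor_id": sensor_id,
            "value": value,
            "location": location,               # added context
            "date": collected_on.isoformat(),   # added context
        })
    return prepared

clean = prepare(readings, location="Dock 4", collected_on=date(2012, 6, 28))
print(clean)
```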

Page 22

DATA COLLECTION - RFID

[Images: RFID tag; RFID tag reader]

Page 23

DATA COLLECTION

[Images: Various sensors; Surveillance camera]

Page 24

DATA FILTERING AND CLEANSING IS IMPORTANT

Page 25

BUSINESS INTELLIGENCE ELEMENTS

DATABASE FILE SYSTEMS

• Relational – SQL
• Dimensional – SQL, OLAP
• Binary Large Object [BLOB] – binary data, most often photos, video, audio, or PDF files
• Massively Parallel-Processing [MPP]
• Apache Hadoop Distributed File System [HDFS] – Java
  – Google File System [GFS], used solely by Google
  – Google MapReduce
• Amazon S3 filesystem [used by Amazon]
• NoSQL
• Resource Description Framework [RDF] Databases, like Big Data

Page 26

BUSINESS INTELLIGENCE ELEMENTS

SELECT BIG DATA DATABASE MANAGEMENT SYSTEMS

• Significant Originators
  – Google MapReduce
  – Google File System [GFS]
  – Amazon S3 filesystem
• Continuing Developments
  – Apache Software Foundation
    • Apache Cassandra distributed database management system
    • Apache Hadoop software framework to support data-intensive distributed applications
    • Apache Hive, a data warehouse structure built on Hadoop
    • Pig – high level programming language for creating MapReduce programs with Hadoop
  – Significant to Technology Development
    • Facebook
    • Yahoo
    • LinkedIn [Project Voldemort]
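
For readers unfamiliar with the MapReduce idea behind Hadoop and Pig, the pattern can be sketched on a single machine as a word count: a map phase emits key/value pairs, a shuffle groups them by key, and a reduce phase combines each group. This is a toy illustration in plain Python, not the Hadoop API; the function names and sample documents are assumptions.

```python
from collections import defaultdict
from functools import reduce

# Hypothetical input; in Hadoop these would be file blocks spread across servers.
documents = ["big data gets bigger", "big data needs good data"]

def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single count."""
    return key, reduce(lambda a, b: a + b, values)

mapped = [pair for doc in documents for pair in map_phase(doc)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())
print(counts)
```

Because each document is mapped independently and each key is reduced independently, both phases can be spread over many servers, which is the property the distributed frameworks exploit.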

Page 27

BUSINESS INTELLIGENCE ELEMENTS

COMPUTER HARDWARE CONSIDERATIONS

• Convergence aspect of mainframes and servers
• Massively parallel, multiple server, distributed processing, in multiple data centers – grid computing
• Multi-core, high capacity, lower power consumption CPUs
• Memory servers for RAM employing DRAM composed of Fully Buffered Dual In-line Memory Modules [FB-DIMM]
• Solid state flash drive storage
• Greatly improved, and less costly, hard drive storage

Page 28

BI CONFIGURATION SIZES

Small – BI, but not Big Data capable

Medium

Large – IBM Sequoia at Livermore Labs

Page 29

BUSINESS INTELLIGENCE ELEMENTS

DATA STORAGE HARDWARE/SOFTWARE

• Data Storage Terminology
  – Memory – CPU direct connected, often called RAM
  – Storage – not directly connected to the CPU
• Data Storage Device Types
  – Memory
    • DRAM-based
    • Flash memory-based Solid-State Drives [SSDs]
  – Storage
    • Hard Disk Drives [HDD]
    • Optical Drives – CDs, DVDs
• Data Storage Systems
  – Direct Attached
  – Network Attached Storage [NAS]
  – Storage Area Network [SAN]
  – pNFS – Parallel Network File System

Page 30

BUSINESS INTELLIGENCE ELEMENTS

BI APPLICATION SOFTWARE

• Traditional Reporting Systems
  – ERP systems, including extract and presentation tools
  – Downloads to Excel and similar programs for analysis using functions and pivot tables
• Presentation Tools
• Specialized Analytics
  – IBM InfoSphere BigInsights and InfoSphere Streams
  – IBM Netezza
  – ParAccel Analytic Database
  – EMC Greenplum
  – SAS High Performance Computing
  – Information Builders WebFOCUS
• Exploratory Tools, like IBM SPSS [originally Statistical Package for the Social Sciences]
  – Data mining with specialized algorithms
  – Statistical analysis and related charting software

Page 31

BUSINESS INTELLIGENCE ELEMENTS

ADVANCED ANALYTICS APPLICATION TYPES

• BI Reporting
• Predictive Analytics
• Data Exploration – correlation
• Data Visualization – graphical
• Instrumentation Analytics
• Content Analytics
• Web Analytics
• Functional Applications
• Industry Applications
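
As a small illustration of the "Data Exploration – correlation" item, the sketch below computes a Pearson correlation coefficient by hand. The choice of Pearson's r and the `ad_spend` and `sales` figures are assumptions made for the example; the slide does not name a specific statistic.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation: covariance scaled by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical figures: sales move in lockstep with advertising spend.
ad_spend = [1, 2, 3, 4, 5]
sales = [2, 4, 6, 8, 10]

r = pearson_r(ad_spend, sales)
print(r)
```

A value near +1 or -1 indicates a strong linear relationship and a value near 0 indicates little linear relationship. On its own, the coefficient says nothing about which variable drives the other.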

Page 32

BUSINESS INTELLIGENCE ELEMENTS

USE STATISTICAL TECHNIQUES APPROPRIATELY

Page 33

ALGORITHMS CAN BE TREACHEROUS

DATA MODELS HAVE LIMITS

Page 34

BI AND ADVANCED ANALYTICS OUTPUT ILLUSTRATIONS

Page 35

EXAMPLES OF USES

Page 36

SELECTED EXAMPLES OF USES

• Sales and Operations Planning
• Financial Instruments Modeling
• Production Control
• Online Retail
• Economics and Policy Development
• Agriculture/Farming
• Weather Analysis/Prediction
• Environmental Impact Assessment
• Healthcare Diagnosis and Records Management
• Genomic Analytics and Pharmaceutical and Medical Research
• Natural Resource Exploration
• Research Physics
• Road, Rail Traffic Management
• Security Surveillance
• Astronomy
• Logistics Management, Including GPS Tracking
• Electrical and Telecommunications Grids Mgmt
• Social Media – Facebook, LinkedIn, Google+, Twitter, YouTube, Pinterest
• TV shows – Star Trek, Person of Interest

Page 37

SELECTED USERS

• Retail
  – Amazon
  – Dell
  – Delta Sonic Car Washes
• Data Services
  – IBM
  – Google
  – Amazon
• Financial Services
• Manufacturing
  – McCain Foods – Frozen foods
  – Boeing
• Transportation and Logistics
  – Logistics – UPS, FedEx
  – Rail – UP, CSX, TTX
  – Air – United, AMR, Southwest
• Social Media
  – LinkedIn
  – Facebook
• Medicine and Health
  – Centers for Disease Control and Prevention (CDC)
  – J. Craig Venter Institute
• Science
  – Livermore Labs

Page 38

SELECTED EXAMPLES OF USE

AMAZON

• Technical Elements
  – Direct on-line access
  – Amazon specialized “Big Data” database
  – Distributed and extremely large data centers
  – Highly automated, high technology warehouses
  – High supplier [vendor] integration
• User Benefits
  – Favorable prices
  – Suggested associated purchases
  – Individual interest advertising

Page 39

SELECTED EXAMPLES OF USE

DELL

• Technical Elements
  – Web driven order entry and custom purchase configuration
  – Tracking of sales correspondence with promotional offers
  – Supplier re-order integration
• User Benefits
  – Ability to customize purchase
  – Reasonable cost
  – Prompt delivery

Page 40

SELECTED EXAMPLES OF USE

BOEING

• Technical components
  – Shared component and assembly designs
  – More detailed quality specifications and product tolerances
  – Control of assembly schedule
  – “Real time” exchange of technical information
  – Dissemination of best practices
• Customer benefits
  – Faster deliveries
  – Increased product quality
  – Reduced defects

Page 41

SELECTED EXAMPLES OF USE

NEW JERSEY DEPARTMENT OF TRANSPORTATION

• Techniques employed
  – Collects cellphone and GPS signals, traffic camera feeds, and roadside sensor data
  – Identifies accidents, traffic jams, and road damage
  – Dispatches emergency vehicles
  – Updates traffic websites
  – Sends messages to drivers’ GPS devices and cellphones
  – Uses supercomputers running the INRIX application
• Benefits
  – Eliminates traffic congestion faster
  – More timely relief for accident victims
  – Facilitates road paving scheduling

Page 42

SELECTED EXAMPLES OF USE

LINKEDIN

• Technical Elements
  – General LinkedIn Structure
    • Personal Profile
    • Individual Connections
    • Groups
    • Company Searches
    • Questions and Answers
  – Attached application partners
    • Amazon – Reading List
    • Slideshare
• User Benefits
  – Networking with professional contacts
  – Personal branding capabilities
  – Business Development
  – Job Search enhancement

Page 43

LINKEDIN PROFILE PAGE SAMPLE

Page 44

Facebook Page Sample

Page 45

TRENDS

• More, bigger, faster – big data gets bigger
• Cloud services continue to expand
• Mobile computing expands
• Hadoop becomes more common
• Interactive data visualization will expand
• Social media type platforms will increase their prominence
• Analytics skills demands will increase

Page 46

RESOURCES

• Books
  – Competing on Analytics, Davenport & Harris
  – Analytics at Work, Davenport, Harris, & Morison
  – The Data Asset, Fisher
  – Data Strategy, Adelman, Moss, Abai
• Websites
  – The Data Warehousing Institute – tdwi.org
  – IBM data analytics: www.ibm.com, smarter planet

Page 47

SUMMARY

WHY USE BI AND ADVANCED ANALYTICS

INSIGHT FROM DATA