Python business intelligence (PyData 2012 talk)
![Page 1: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/1.jpg)
Python for Business Intelligence
Štefan Urbánek ■ @Stiivi ■ [email protected] ■ PyData NYC, October 2012
![Page 2: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/2.jpg)
python business intelligence
![Page 3: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/3.jpg)
Q/A and articles with Java solution references
(not listed here)
Results
![Page 4: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/4.jpg)
![Page 5: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/5.jpg)
Why?
![Page 6: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/6.jpg)
Overview
■ Traditional Data Warehouse
■ Python and Data
■ Is Python Capable?
■ Conclusion
![Page 7: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/7.jpg)
Business Intelligence
![Page 8: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/8.jpg)
people
technology processes
![Page 9: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/9.jpg)
Data Governance
Analysis and Presentation
Extraction, Transformation, Loading
Data Sources
Technologies and Utilities
![Page 10: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/10.jpg)
Traditional Data Warehouse
![Page 11: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/11.jpg)
■ Extracting data from the original sources
■ Quality assuring and cleaning data
■ Conforming the labels and measures in the data to achieve consistency across the original sources
■ Delivering data in a physical format that can be used by query tools, report writers, and dashboards.
Source: Ralph Kimball – The Data Warehouse ETL Toolkit
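The four steps listed above can be sketched as a chain of small Python generators. This is only an illustration under stated assumptions: the two in-memory sources, their field names, and the function names `extract`, `clean`, `conform` and `deliver` are all hypothetical and do not come from the talk.

```python
# Hypothetical raw rows from two sources that label the same things differently.
SOURCE_A = [{"Region": " europe ", "Amount": "1,200.50"},
            {"Region": "ASIA", "Amount": "300"}]
SOURCE_B = [{"region_name": "Europe", "amount_usd": "99.5"},
            {"region_name": "", "amount_usd": "10"}]


def extract():
    """Step 1: pull raw rows from each original source, tagged by origin."""
    for row in SOURCE_A:
        yield "a", row
    for row in SOURCE_B:
        yield "b", row


def clean(tagged_rows):
    """Step 2: quality assurance, here simply dropping rows with no region."""
    for source, row in tagged_rows:
        region_field = "Region" if source == "a" else "region_name"
        if row[region_field].strip():
            yield source, row


def conform(tagged_rows):
    """Step 3: map source-specific labels and measures onto one layout."""
    for source, row in tagged_rows:
        if source == "a":
            region, amount = row["Region"], row["Amount"]
        else:
            region, amount = row["region_name"], row["amount_usd"]
        yield {"region": region.strip().title(),
               "amount": float(amount.replace(",", ""))}


def deliver(rows):
    """Step 4: hand over a stable structure that a reporting tool could read."""
    return sorted(rows, key=lambda r: (r["region"], r["amount"]))


delivered = deliver(conform(clean(extract())))
for record in delivered:
    print(record)
```

Because each step is a generator, rows stream through the chain one at a time until `deliver` materializes them.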
![Page 12: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/12.jpg)
Source Systems
Staging Area Operational Data Store Datamarts
structured documents
databases
APIs
Temporary Staging Area
staging relational dimensional
L0 L1 L2
![Page 13: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/13.jpg)
real time = daily
![Page 14: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/14.jpg)
Multi-dimensional Modeling
![Page 15: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/15.jpg)
![Page 16: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/16.jpg)
aggregation browsing
slicing and dicing
![Page 17: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/17.jpg)
business / analyst’s point of view
regardless of physical schema implementation
![Page 18: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/18.jpg)
Facts
fact
most detailed information
measurable
fact data cell
![Page 19: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/19.jpg)
dimensions
location
type
time
![Page 20: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/20.jpg)
■ provide context for facts
■ used to filter queries or reports
■ control scope of aggregation of facts
Dimension
![Page 21: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/21.jpg)
Pentaho
![Page 22: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/22.jpg)
Python and Data
community perception*
*as of Oct 2012
![Page 23: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/23.jpg)
Scientific & Financial
![Page 24: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/24.jpg)
Python
![Page 25: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/25.jpg)
Data Governance
Analysis and Presentation
Extraction, Transformation, Loading
Data Sources
Technologies and Utilities
![Page 26: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/26.jpg)
T1[s] T2[s] T3[s] T4[s]
P1 112,68 941,67 171,01 660,48
P2 96,15 306,51 725,88 877,82
P3 313,39 189,31 41,81 428,68
P4 760,62 983,48 371,21 281,19
P5 838,56 39,27 389,42 231,12
n-dimensional array of numbers
Scientific Data
![Page 27: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/27.jpg)
Assumptions
■ data is mostly numbers
■ data is neatly organized...
■ … in one multi-dimensional array
![Page 28: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/28.jpg)
Data Governance
Analysis and Presentation
Extraction, Transformation, Loading
Data Sources
Technologies and Utilities
![Page 29: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/29.jpg)
Business Data
![Page 30: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/30.jpg)
multiple representations
of same data
multiple snapshots of one source
categories are
changing
![Page 31: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/31.jpg)
❄
![Page 32: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/32.jpg)
Is Python Capable?
very basic examples
![Page 33: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/33.jpg)
Data Pipes with SQLAlchemy
Data Governance
Analysis and Presentation
Extraction, Transformation, Loading
Data Sources
Technologies and Utilities
![Page 34: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/34.jpg)
■ connection: create_engine
■ schema reflection: MetaData, Table
■ expressions: select(), insert()
![Page 35: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/35.jpg)
src_engine = create_engine("sqlite:///data.sqlite")
src_metadata = MetaData(bind=src_engine)
src_table = Table('data', src_metadata, autoload=True)

target_engine = create_engine("postgres://localhost/sandbox")
target_metadata = MetaData(bind=target_engine)
target_table = Table('data', target_metadata)
![Page 36: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/36.jpg)
clone schema:
for column in src_table.columns:
    target_table.append_column(column.copy())
target_table.create()

copy data:
insert = target_table.insert()
for row in src_table.select().execute():
    insert.execute(row)
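The same "clone schema, copy data" idea can be tried without any third-party package. The sketch below deliberately swaps SQLAlchemy for the standard-library `sqlite3` module so that it runs as-is; the in-memory databases and the sample rows are made up for illustration. SQLite's `sqlite_master` catalog stands in here for the metadata reflection that the slides rely on.

```python
import sqlite3

# Stand-in source database with a small "data" table (contents are made up).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE data (id INTEGER, name TEXT, amount REAL)")
src.executemany("INSERT INTO data VALUES (?, ?, ?)",
                [(1, "a", 10.5), (2, "b", 20.0)])

target = sqlite3.connect(":memory:")

# Clone schema: SQLite keeps the original CREATE statement in sqlite_master,
# so the table definition can be read back and replayed on the target.
(create_sql,) = src.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'data'"
).fetchone()
target.execute(create_sql)

# Copy data: build one placeholder per column from the cursor description,
# then stream the source cursor straight into the target insert.
cursor = src.execute("SELECT * FROM data")
placeholders = ", ".join("?" for _ in cursor.description)
target.executemany(f"INSERT INTO data VALUES ({placeholders})", cursor)
target.commit()

copied = target.execute("SELECT * FROM data ORDER BY id").fetchall()
print(copied)
```

Replaying the stored `CREATE` statement only works between two SQLite databases; copying across different database engines is where a reflection layer such as SQLAlchemy's earns its keep.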
![Page 37: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/37.jpg)
magic used:
metadata reflection
![Page 38: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/38.jpg)
text file (CSV) to table:
reader = csv.reader(file_stream)
columns = next(reader)
for column in columns:
    table.append_column(Column(column, String))
table.create()
insert = table.insert()
for row in reader:
    insert.execute(row)
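A self-contained version of the CSV-to-table idea, again using only the standard library (`csv` and `sqlite3`) rather than SQLAlchemy. The table name `imported`, the helper `quote_identifier`, and the sample CSV text are assumptions made for the example. As in the slide, every column is created as text and the header row supplies the column names.

```python
import csv
import io
import sqlite3

# Made-up CSV content standing in for a real file stream.
file_stream = io.StringIO("region,amount\nEurope,100\nAsia,250\n")

reader = csv.reader(file_stream)
columns = next(reader)  # the first row holds the column names


def quote_identifier(name):
    """Quote a column name taken from the file so it is safe inside SQL."""
    return '"' + name.replace('"', '""') + '"'


conn = sqlite3.connect(":memory:")

# Create one TEXT column per header field, mirroring Column(column, String).
column_defs = ", ".join(quote_identifier(name) + " TEXT" for name in columns)
conn.execute(f"CREATE TABLE imported ({column_defs})")

# Insert the remaining rows; the reader is consumed lazily by executemany.
placeholders = ", ".join("?" for _ in columns)
conn.executemany(f"INSERT INTO imported VALUES ({placeholders})", reader)
conn.commit()

rows = conn.execute("SELECT * FROM imported").fetchall()
print(rows)
```

The values stay strings after loading; converting them to numbers is the job of a later transformation step.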
![Page 39: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/39.jpg)
Simple T from ETL
Data Governance
Analysis and Presentation
Extraction, Transformation, Loading
Data Sources
Technologies and Utilities
![Page 40: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/40.jpg)
target fields and their source transformations:
transformation = [
    ('fiscal_year', {"function": int, "field": "fiscal_year"}),
    ('region_code', {"mapping": region_map, "field": "region"}),
    ('borrower_country', None),
    ('project_name', None),
    ('procurement_type', None),
    ('major_sector_code', {"mapping": sector_code_map, "field": "major_sector"}),
    ('major_sector', None),
    ('supplier', None),
    ('contract_amount', {"function": currency_to_number, "field": 'total_contract_amount'}),
]
![Page 41: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/41.jpg)
Transformation
for row in source:
    result = transform(row, transformation)
    table.insert(result).execute()
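The slides call `transform(row, transformation)` without showing its body. One possible implementation is sketched below. It is a guess, not the author's code, and it assumes the spec keys are `function`, `mapping` and `field`, that a `None` spec means "copy the field of the same name", and that a mapping is applied before a function. The sample `region_map`, `currency_to_number` and input row are likewise made up.

```python
def transform(row, transformation):
    """Build a target record from a source row.

    `transformation` is a list of (target_field, spec) pairs. A spec of
    None copies the same-named source field; otherwise the spec may name a
    source "field", a lookup "mapping", and a conversion "function".
    """
    result = {}
    for target, spec in transformation:
        if spec is None:
            result[target] = row[target]
            continue
        value = row[spec.get("field", target)]
        if "mapping" in spec:
            value = spec["mapping"][value]
        if "function" in spec:
            value = spec["function"](value)
        result[target] = value
    return result


# Hypothetical helpers and data for the usage example.
region_map = {"Africa": "AFR", "Europe": "EUR"}


def currency_to_number(text):
    """Turn a string such as "$1,250.00" into a float."""
    return float(text.replace("$", "").replace(",", ""))


transformation = [
    ("fiscal_year", {"function": int, "field": "fiscal_year"}),
    ("region_code", {"mapping": region_map, "field": "region"}),
    ("supplier", None),
    ("contract_amount", {"function": currency_to_number,
                         "field": "total_contract_amount"}),
]

row = {"fiscal_year": "2010", "region": "Africa", "supplier": "ACME",
       "total_contract_amount": "$1,250.00"}

result = transform(row, transformation)
print(result)
```

Keeping the rules in a plain list means the mapping between source and target fields can be reviewed, versioned, or generated without touching the loop that applies it.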
![Page 42: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/42.jpg)
OLAP with Cubes
Data Governance
Analysis and Presentation
Extraction, Transformation, Loading
Data Sources
Technologies and Utilities
![Page 43: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/43.jpg)
cubes: measures
dimensions: levels, attributes, hierarchy
Model
{
    "name": "My Model",
    "description": ....
    "cubes": [...]
    "dimensions": [...]
}
![Page 44: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/44.jpg)
❄
logical
physical
![Page 45: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/45.jpg)
1. load_model("model.json")
2. create_workspace("sql", model, url="sqlite:///data.sqlite")
3. model.cube("sales")
4. workspace.browser(cube)
Aggregation Browser backend
cubes
Application
![Page 46: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/46.jpg)
drill-down
browser.aggregate(cell, drilldown=["sector"])
![Page 47: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/47.jpg)
for row in result.table_rows("sector"):
    row.label
    row.key
    row.record["amount_sum"]
![Page 48: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/48.jpg)
whole cube (Total):
cell = Cell(cube)
browser.aggregate(cell)

by year (2006 2007 2008 2009 2010):
browser.aggregate(cell, drilldown=["date"])

cut at 2010, then by month (Jan Feb Mar Apr ...):
cut = PointCut("date", [2010])
cell = cell.slice(cut)
browser.aggregate(cell, drilldown=["date"])
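To make the slice and drill-down steps concrete, here is a toy aggregation in plain Python. It does not use the Cubes library and is not its API; the function `aggregate`, the fact layout, and the numbers are invented for the example. It only mimics the idea: a cut narrows the cell to a path in the date hierarchy, and a drill-down groups by the next level below that path.

```python
from collections import defaultdict

# Made-up facts: a "date" dimension path (year, month) and an "amount" measure.
FACTS = [
    {"date": (2009, 12), "amount": 5.0},
    {"date": (2010, 1), "amount": 10.0},
    {"date": (2010, 1), "amount": 2.5},
    {"date": (2010, 2), "amount": 7.0},
]


def aggregate(facts, cut=(), drilldown=False):
    """Sum `amount` over facts whose date path starts with `cut`.

    With drilldown=True the sum is grouped by the level just below the
    cut: by year for the whole cube, by month inside a single year.
    The cut must be shorter than the full path when drilling down.
    """
    depth = len(cut)
    selected = [f for f in facts if f["date"][:depth] == tuple(cut)]
    if not drilldown:
        return sum(f["amount"] for f in selected)
    groups = defaultdict(float)
    for fact in selected:
        groups[fact["date"][depth]] += fact["amount"]
    return dict(groups)


total = aggregate(FACTS)                                        # whole cube
by_year = aggregate(FACTS, drilldown=True)                      # drill down by year
by_month_2010 = aggregate(FACTS, cut=(2010,), drilldown=True)   # cut, then drill down

print(total)
print(by_year)
print(by_month_2010)
```

The caller never mentions tables or joins, which is the point the slides make about working from the analyst's point of view regardless of the physical schema.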
![Page 49: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/49.jpg)
How can Python be Useful
![Page 50: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/50.jpg)
■ saves maintenance resources
■ shortens development time
■ saves you from going insane
just the Language
![Page 51: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/51.jpg)
Source Systems
Staging Area Operational Data Store Datamarts
structured documents
databases
APIs
Temporary Staging Area
staging relational dimensional
L0 L1 L2
faster
![Page 52: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/52.jpg)
Data Governance
Analysis and Presentation
Extraction, Transformation, Loading
Data Sources
Technologies and Utilities
faster advanced
understandable, maintainable
![Page 53: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/53.jpg)
Conclusion
![Page 54: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/54.jpg)
people
technology processes
BI is about…
![Page 55: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/55.jpg)
don’t forget metadata
![Page 56: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/56.jpg)
who is going to fix your COBOL Java tool if you have only Python guys around?
Future
![Page 57: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/57.jpg)
Python is capable, let’s start
![Page 58: Python business intelligence (PyData 2012 talk)](https://reader034.fdocuments.us/reader034/viewer/2022042601/546ef5f3b4af9fc8268b4875/html5/thumbnails/58.jpg)
Thank You
Twitter: @Stiivi
DataBrewery blog: blog.databrewery.org
Github: github.com/Stiivi