Big datalab
Uploaded by chien-hsun-chen
BIGDATA LAB
BigQuery & Query Visualization
Outline
• BigQuery
• BigQuery Visualization
• BigData Lab Open Source!
About me
David Chen
Tagtoo CTO
PyCon APAC 2014 PR
Taipei.py Co-organizer
GDE
Speaker: PyCon APAC 2014
PyCon 2013 Google Festival
Google Launch Event
BigQuery
Real Time
BigQuery: Big Data Analytics in the cloud
BigData SQL
SQL
Basic Characteristics
• Types: STRING, INTEGER, FLOAT, BOOLEAN, TIMESTAMP, RECORD
• Schema: supports repeated / nested fields (JSON)
• Import / (parallel) export with CSV / JSON
• Streaming (real-time insert)
• 100,000 rows/s
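A minimal sketch of the nested / repeated schema point above. BigQuery schemas are written as a JSON list of fields, where a RECORD field carries its own `fields` list and `mode` may be REPEATED; JSON imports are newline-delimited. The table layout (`user`, `visits`, `url`, `seconds`) is hypothetical, and `check_row` / `to_ndjson` are illustrative helpers, not part of any BigQuery client library.

```python
import json

# Hypothetical schema: one row per user, with a repeated nested "visits" record.
SCHEMA = [
    {"name": "user", "type": "STRING", "mode": "REQUIRED"},
    {"name": "visits", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "url", "type": "STRING", "mode": "NULLABLE"},
        {"name": "seconds", "type": "INTEGER", "mode": "NULLABLE"},
    ]},
]

# Simplified mapping from BigQuery types to Python types (an assumption of
# this sketch; TIMESTAMP accepts several encodings, so the check is loose).
PY_TYPES = {
    "STRING": str,
    "INTEGER": int,
    "FLOAT": (int, float),
    "BOOLEAN": bool,
    "TIMESTAMP": (int, float, str),
}


def check_value(field, value):
    """Check a single (non-repeated) value against one schema field."""
    if field["type"] == "RECORD":
        return isinstance(value, dict) and check_row(field["fields"], value)
    return isinstance(value, PY_TYPES[field["type"]])


def check_row(fields, row):
    """Return True if `row` (a dict) fits the list of schema `fields`."""
    if set(row) - {f["name"] for f in fields}:
        return False  # row has a column the schema does not know
    for field in fields:
        mode = field.get("mode", "NULLABLE")
        value = row.get(field["name"])
        if value is None:
            if mode == "REQUIRED":
                return False
            continue
        if mode == "REPEATED":
            if not isinstance(value, list):
                return False
            if not all(check_value(field, v) for v in value):
                return False
        elif not check_value(field, value):
            return False
    return True


def to_ndjson(rows):
    """Serialize rows as newline-delimited JSON, one object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in rows)
```

Checking rows locally before an import or a streaming insert is one cheap way to catch schema mismatches early; the real service applies its own, stricter validation.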
Big Join
Nested / Repeated
Table wildcard / decorators
User defined function
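To illustrate table decorators and table wildcards, here is a small sketch that only builds the table references as strings, in the bracketed BigQuery SQL dialect of the time (now called legacy SQL). The helper names and the `logs.events` table are hypothetical; nothing here calls the BigQuery API.

```python
def snapshot_decorator(table, millis_ago):
    """Reference the table as it was `millis_ago` milliseconds ago."""
    return "[{}@-{}]".format(table, millis_ago)


def range_decorator(table, millis_ago):
    """Reference only the data added in the last `millis_ago` milliseconds."""
    return "[{}@-{}-]".format(table, millis_ago)


def date_range_wildcard(prefix, start, end):
    """Reference all daily tables named <prefix>YYYYMMDD between two dates."""
    return "(TABLE_DATE_RANGE([{}], TIMESTAMP('{}'), TIMESTAMP('{}')))".format(
        prefix, start, end
    )


def count_query(table_ref):
    """Build a simple row-count query over any of the references above."""
    return "SELECT COUNT(*) FROM {}".format(table_ref)


ONE_HOUR_MS = 60 * 60 * 1000

# Count rows that arrived in the last hour (hypothetical table name).
last_hour = count_query(range_decorator("logs.events", ONE_HOUR_MS))

# Count rows across a week of daily tables (hypothetical table prefix).
one_week = count_query(
    date_range_wildcard("logs.events_", "2014-03-01", "2014-03-07")
)
```

Both forms limit which data a query scans, which matters because BigQuery bills by the amount of data processed.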
BigQuery Visualization
BigQuery Taiwan
http://littleq0903.github.io/bq-taiwan/
With Google Charts
https://gcdc2013-coder.appspot.com/app#
BigQuery + Map
http://googlegeodevelopers.blogspot.tw/2013/09/visualizing-airport-delay-correlations.html
BigQuery + IPython Notebook
http://nbviewer.ipython.org/gist/fhoffa/6459195
Even More
• BigQuery with R
http://thinktostart.wordpress.com/2014/03/10/using-google-bigquery-with-r/
• BigQuery with Pandas
http://pandas.pydata.org/pandas-docs/stable/io.html#google-bigquery-experimental
• BigQuery with Hadoop
http://googlecloudplatform.blogspot.tw/2014/04/google-bigquery-and-datastore-connectors-for-hadoop.html
• Excel Connector
https://developers.google.com/bigquery/bigquery-connector-for-excel
Real Time
BigQuery + Hadoop
https://www.youtube.com/watch?v=yKBHEznag-g#t=231
Live Dashboard
Big Data Lab
Open Source
Google Developer Challenge 2013
AppEngine: Manipulate data with MapReduce
Cloud Storage: Low-cost, highly consistent storage
Prediction API: Machine learning in the cloud
BigQuery: Ad hoc queries to Google Sheets & visualization
No deployment / configuration needed
Easy to use (for kids) but still powerful
Open Source
Big Data Pipeline
AppEngine
Task Client, Pipeline Worker, Virtual Env
Map Reduce
GCE
Task Client, Hadoop
GCE Task Controller
Cron Tab
Task Graph
Controller UI
Virtual Env
Currently uses Luigi
• Task Worker
https://github.com/Tagtoo/TaskWorker
• Predefined Pipeline
https://github.com/Tagtoo/TaskWorker
• Virtual Package
https://github.com/Tagtoo/BigDataLabWorker
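The task graph idea in the pipeline above can be sketched with the Python standard library alone (Python 3.9+ for `graphlib`). This is not Luigi's API and not the TaskWorker code; the task names and the `run_pipeline` helper are hypothetical. It only shows the core behaviour a task controller needs: run each task after the tasks it requires, and skip tasks that are already complete.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the list of tasks it requires.
REQUIRES = {
    "export_logs": [],
    "map_reduce": ["export_logs"],
    "load_bigquery": ["map_reduce"],
    "dashboard": ["load_bigquery"],
}


def run_pipeline(requires, run, done=()):
    """Run tasks in dependency order and return the names that were run.

    `run` is called once per task name; tasks listed in `done` are treated
    as already complete and are skipped.
    """
    completed = set(done)
    executed = []
    # static_order() yields every task after all of its requirements.
    for task in TopologicalSorter(requires).static_order():
        if task in completed:
            continue
        run(task)
        completed.add(task)
        executed.append(task)
    return executed
```

A cron entry could call `run_pipeline` on a schedule; passing the finished tasks in `done` makes a re-run pick up where the last one stopped.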
Reference
• https://cloud.google.com/events/google-cloud-platform-live/