Big Data Lab

Posted on 15-Jan-2015

Big Data Lab: BigQuery & Query Visualization

Outline

• BigQuery

• BigQuery Visualization

• BigData Lab Open Source!

About me: David Chen

• Tagtoo CTO

• PyCon APAC 2014 PR, Taipei.py co-organizer

• GDE (Google Developer Expert)

• Speaker: PyCon APAC 2014, PyCon 2013, Google Festival, Google Launch Event

BigQuery

Real time

BigQuery: Big Data Analytics in the cloud

Query big data with SQL

Basic Characteristics

• Data types: STRING, INTEGER, FLOAT, BOOLEAN, TIMESTAMP, RECORD

• Schema: supports repeated / nested fields (JSON)

• Import / (parallel) export with CSV / JSON

• Streaming (real-time inserts): up to 100,000 rows/s
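The sketch below makes nested / repeated schemas and JSON import concrete. It builds a schema in BigQuery's JSON schema layout, where each field has `name`, `type`, `mode`, and nested `fields` for a RECORD. It then serializes rows as newline-delimited JSON, one object per line, which is the format used for JSON loads. The field names (`user_id`, `visited_at`, `pages`) and the `check_row` / `to_ndjson` helpers are hypothetical illustrations and are not part of any BigQuery client library.

```python
import json

# Hypothetical schema in BigQuery's JSON schema layout. A RECORD field with
# mode REPEATED holds a list of nested objects in each row.
SCHEMA = [
    {"name": "user_id", "type": "STRING", "mode": "REQUIRED"},
    {"name": "visited_at", "type": "TIMESTAMP", "mode": "NULLABLE"},
    {
        "name": "pages",
        "type": "RECORD",
        "mode": "REPEATED",
        "fields": [
            {"name": "url", "type": "STRING", "mode": "NULLABLE"},
            {"name": "seconds", "type": "INTEGER", "mode": "NULLABLE"},
        ],
    },
]


def check_row(row, fields):
    """Rough structural check of a row against a schema (illustration only)."""
    for field in fields:
        value = row.get(field["name"])
        mode = field.get("mode", "NULLABLE")
        if value is None:
            if mode == "REQUIRED":
                raise ValueError("missing required field " + field["name"])
            continue
        if mode == "REPEATED":
            if not isinstance(value, list):
                raise ValueError(field["name"] + " must be a list")
            items = value
        else:
            items = [value]
        if field["type"] == "RECORD":
            # Recurse into each nested record using the sub-schema.
            for item in items:
                check_row(item, field["fields"])


def to_ndjson(rows, fields):
    """Serialize rows as newline-delimited JSON, one object per line."""
    lines = []
    for row in rows:
        check_row(row, fields)
        lines.append(json.dumps(row, sort_keys=True))
    return "\n".join(lines) + "\n"


rows = [
    {"user_id": "u1", "visited_at": "2014-12-01 10:00:00",
     "pages": [{"url": "/a", "seconds": 12}, {"url": "/b", "seconds": 3}]},
    {"user_id": "u2", "pages": []},
]
print(to_ndjson(rows, SCHEMA), end="")
```

The `SCHEMA` list can be saved as a JSON schema file for a load job. The nested `pages` list is kept inside a single row rather than split into a separate table.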

Big Join
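A join between two large tables is usually distributed like this: both sides are hash-partitioned on the join key so that matching keys land on the same worker, and each worker then joins its partition locally. The single-process sketch below simulates this general shuffle hash join. It illustrates the technique only and does not describe BigQuery's internal implementation. The table contents and the partition count are made up.

```python
from collections import defaultdict
from zlib import crc32


def partition(rows, key, n_parts):
    """Shuffle step: route each row to a bucket chosen by hashing its join key."""
    parts = [[] for _ in range(n_parts)]
    for row in rows:
        # crc32 is stable across runs, unlike the built-in hash() for strings.
        bucket = crc32(str(row[key]).encode("utf-8")) % n_parts
        parts[bucket].append(row)
    return parts


def local_hash_join(left, right, key):
    """Join one partition: build a hash table on the left side, probe with the right."""
    table = defaultdict(list)
    for left_row in left:
        table[left_row[key]].append(left_row)
    for right_row in right:
        for left_row in table.get(right_row[key], []):
            yield {**left_row, **right_row}


def shuffle_join(left, right, key, n_parts=4):
    """Inner join of two row lists via hash partitioning."""
    left_parts = partition(left, key, n_parts)
    right_parts = partition(right, key, n_parts)
    result = []
    # Each (left_parts[i], right_parts[i]) pair could run on a separate worker.
    for lp, rp in zip(left_parts, right_parts):
        result.extend(local_hash_join(lp, rp, key))
    return result


users = [{"user_id": 1, "name": "amy"}, {"user_id": 2, "name": "bob"}]
clicks = [{"user_id": 1, "url": "/a"}, {"user_id": 1, "url": "/b"},
          {"user_id": 3, "url": "/c"}]
for row in sorted(shuffle_join(users, clicks, "user_id"), key=lambda r: r["url"]):
    print(row)
```

Because equal keys always hash to the same bucket, no worker ever needs to see the whole of either table.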

Nested / Repeated
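When a query needs one row per element of a repeated field, the nested record has to be flattened; in BigQuery's legacy SQL this is what `FLATTEN` does. The pure-Python sketch below approximates those semantics on a hypothetical `pages` field. Each element produces one output row, the parent's fields are copied into it, and nested names are joined with a dot. How rows with an empty repeated field are handled is a choice made for this sketch.

```python
def flatten(row, repeated_field):
    """Emit one flat dict per element of `repeated_field`.

    Parent fields are copied to every output row, and nested keys are
    prefixed as "<field>.<subfield>". A row whose repeated field is empty
    or missing is kept once, with only the parent columns.
    """
    parent = {k: v for k, v in row.items() if k != repeated_field}
    items = row.get(repeated_field) or [None]
    out = []
    for item in items:
        flat = dict(parent)
        if item is not None:
            for sub_key, sub_value in item.items():
                flat[repeated_field + "." + sub_key] = sub_value
        out.append(flat)
    return out


row = {"user_id": "u1",
       "pages": [{"url": "/a", "seconds": 12}, {"url": "/b", "seconds": 3}]}
for r in flatten(row, "pages"):
    print(r)
# {'user_id': 'u1', 'pages.url': '/a', 'pages.seconds': 12}
# {'user_id': 'u1', 'pages.url': '/b', 'pages.seconds': 3}
```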

Table wildcards / decorators
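Table wildcards let a single query cover many date-sharded tables, and decorators restrict a query to a time window of a table's data. The helpers below are a hypothetical sketch of both. `daily_tables` lists the per-day names (`<prefix>YYYYMMDD`) that legacy SQL's `TABLE_DATE_RANGE` matches. `date_range_query` writes such a query. `last_hours_decorator` assumes the legacy range-decorator form `[dataset.table@-<ms>-]`, where a negative value is milliseconds relative to now. The dataset and table names are made up.

```python
from datetime import date, timedelta


def daily_tables(prefix, start, end):
    """List the per-day table names a date-range wildcard would cover.

    Assumes tables are sharded by day as <prefix>YYYYMMDD.
    """
    names = []
    day = start
    while day <= end:
        names.append(prefix + day.strftime("%Y%m%d"))
        day += timedelta(days=1)
    return names


def date_range_query(dataset, prefix, start, end):
    """Build a legacy-SQL query over a range of daily tables (hypothetical names)."""
    return (
        "SELECT COUNT(*) FROM TABLE_DATE_RANGE([{ds}.{p}], "
        "TIMESTAMP('{s}'), TIMESTAMP('{e}'))"
    ).format(ds=dataset, p=prefix, s=start.isoformat(), e=end.isoformat())


def last_hours_decorator(dataset, table, hours):
    """Range decorator for data added in the last `hours` hours.

    Assumes the legacy form [dataset.table@-<ms>-]: a negative start is
    relative to now, and an empty end means "up to now".
    """
    ms = hours * 60 * 60 * 1000
    return "[{}.{}@-{}-]".format(dataset, table, ms)


print(daily_tables("events_", date(2014, 12, 30), date(2015, 1, 2)))
print(date_range_query("mydataset", "events_", date(2014, 12, 30), date(2015, 1, 2)))
print(last_hours_decorator("mydataset", "events_stream", 1))
```

Because a range decorator reads only recently added data, it suits real-time dashboards built on streaming inserts.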

User-defined functions
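BigQuery's user-defined functions in legacy SQL were written in JavaScript. Each one receives a single input row and may emit zero or more output rows through a callback. To keep one language across these examples, the sketch below mimics only that row-in, rows-out contract in Python. It is not BigQuery's UDF API, and `apply_udf`, `url_path_parts`, and the `url` field are hypothetical.

```python
def apply_udf(rows, udf):
    """Run a row-wise function that may emit 0..n output rows per input row."""
    output = []
    for row in rows:
        udf(row, output.append)  # `emit` is just a callback that collects rows
    return output


def url_path_parts(row, emit):
    """Hypothetical UDF: emit one row per non-empty path segment of row['url']."""
    path = row["url"].split("?", 1)[0]  # drop the query string
    for depth, part in enumerate(p for p in path.split("/") if p):
        emit({"url": row["url"], "depth": depth, "segment": part})


rows = [{"url": "/shop/shoes?id=3"}, {"url": "/"}]
for out in apply_udf(rows, url_path_parts):
    print(out)
```

The emit-callback design lets one function filter rows (emit nothing), transform them (emit one), or expand them (emit several) without changing its signature.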

BigQuery Visualization

BigQuery Taiwan: http://littleq0903.github.io/bq-taiwan/

With Google Charts: https://gcdc2013-coder.appspot.com/app#
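Charting a query result with Google Charts requires a conversion step. BigQuery's JSON API returns a `schema` plus `rows`, where each row holds an `f` list of `{"v": ...}` cells whose values arrive as strings. Google Charts' DataTable accepts a JSON literal with `cols` and `rows` (each row is a `c` list of `{"v": ...}` cells). The type mapping in the sketch below is our own simplification, and the airport rows are invented sample values.

```python
import json

# Simplified mapping from BigQuery field types to Google Charts column types
# (our own choice; TIMESTAMP is kept as a string here).
TYPE_MAP = {"STRING": "string", "INTEGER": "number", "FLOAT": "number",
            "BOOLEAN": "boolean", "TIMESTAMP": "string"}


def convert(value, bq_type):
    """Cast a cell value (returned as a string by the JSON API) for charting."""
    if value is None:
        return None
    if bq_type == "INTEGER":
        return int(value)
    if bq_type == "FLOAT":
        return float(value)
    if bq_type == "BOOLEAN":
        return value.lower() == "true"
    return value


def to_datatable(response):
    """Turn a query response (schema + rows of f/v cells) into DataTable JSON."""
    fields = response["schema"]["fields"]
    cols = [{"id": f["name"], "label": f["name"],
             "type": TYPE_MAP.get(f["type"], "string")} for f in fields]
    rows = [{"c": [{"v": convert(cell["v"], f["type"])}
                   for cell, f in zip(r["f"], fields)]}
            for r in response.get("rows", [])]
    return {"cols": cols, "rows": rows}


sample = {  # invented response, shaped like a BigQuery query result
    "schema": {"fields": [{"name": "airport", "type": "STRING"},
                          {"name": "avg_delay", "type": "FLOAT"},
                          {"name": "flights", "type": "INTEGER"}]},
    "rows": [{"f": [{"v": "TPE"}, {"v": "12.5"}, {"v": "310"}]},
             {"f": [{"v": "KHH"}, {"v": "8.0"}, {"v": "95"}]}],
}
print(json.dumps(to_datatable(sample)))
```

In the browser, the resulting JSON can be passed to `new google.visualization.DataTable(...)` and drawn with any chart type.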

http://googlegeodevelopers.blogspot.tw/2013/09/visualizing-airport-delay-correlations.html

BigQuery + Map

http://nbviewer.ipython.org/gist/fhoffa/6459195

BigQuery + IPython Notebook

Even More

• BigQuery with R: http://thinktostart.wordpress.com/2014/03/10/using-google-bigquery-with-r/

• BigQuery with pandas: http://pandas.pydata.org/pandas-docs/stable/io.html#google-bigquery-experimental

• BigQuery with Hadoop: http://googlecloudplatform.blogspot.tw/2014/04/google-bigquery-and-datastore-connectors-for-hadoop.html

• Excel Connector: https://developers.google.com/bigquery/bigquery-connector-for-excel

Real Time

BigQuery + Hadoop

https://www.youtube.com/watch?v=yKBHEznag-g#t=231

Live Dashboard

Big Data Lab: Open Source

Google Developer Challenge 2013

• App Engine: manipulate data with MapReduce

• Cloud Storage: low-cost storage with strong consistency

• Prediction API*: machine learning in the cloud

• BigQuery: ad hoc queries, results to Google Sheets & visualization

• No deployment / configuration needed

• Easy to use (even for kids) but still powerful

• Open source
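The MapReduce model listed for App Engine can be shown with a single-process sketch: a map step emits key/value pairs, a shuffle groups them by key, and a reduce step combines each group. This is not the App Engine MapReduce library's API. The log format (`user_id url`) and all function names are hypothetical, and the sketch counts page views per URL.

```python
from collections import defaultdict


def map_fn(line):
    """Map: parse a hypothetical 'user_id url' log line and emit (url, 1)."""
    parts = line.split()
    if len(parts) == 2:  # skip malformed lines
        yield parts[1], 1


def reduce_fn(key, values):
    """Reduce: sum all counts that were grouped under the same key."""
    return key, sum(values)


def map_reduce(records, mapper, reducer):
    # Shuffle: group every mapped value by its key.
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    # Reduce each group independently (these calls could run in parallel).
    return dict(reducer(k, vs) for k, vs in groups.items())


logs = ["u1 /home", "u2 /home", "u1 /cart", "malformed"]
print(map_reduce(logs, map_fn, reduce_fn))  # {'/home': 2, '/cart': 1}
```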

Big Data Pipeline

[Diagram: Big Data Pipeline architecture. Labels shown: App Engine (Task Client, Pipeline Worker, Virtual Env, MapReduce); GCE (Task Client, Hadoop); GCE Task Controller (Cron Tab, Task Graph, Controller UI, Virtual Env)]

Currently we use Luigi
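A task graph is a set of tasks with dependencies, where each task starts only after the tasks it requires have finished. Luigi expresses those dependencies through each task's `requires()`. The sketch below does not use Luigi's API. It uses only the standard library's `graphlib` (Python 3.9+) to run a hypothetical pipeline in dependency order, grouping tasks that have no dependencies on each other into batches.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it requires.
TASKS = {
    "export_logs": set(),
    "upload_to_gcs": {"export_logs"},
    "load_bigquery": {"upload_to_gcs"},
    "train_model": {"upload_to_gcs"},
    "daily_report": {"load_bigquery", "train_model"},
}


def run_pipeline(graph, run):
    """Run tasks in dependency order, batch by batch.

    Tasks in the same batch do not depend on each other, so a task
    controller could hand them to separate workers concurrently.
    """
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        for task in ready:
            run(task)
        sorter.done(*ready)
        batches.append(ready)
    return batches


for batch in run_pipeline(TASKS, lambda name: None):
    print(batch)
```

A cron entry would then only need to trigger the final task, and everything it depends on is scheduled first.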

• Task Worker: https://github.com/Tagtoo/TaskWorker

• Predefined Pipeline: https://github.com/Tagtoo/TaskWorker

• Virtual Package: https://github.com/Tagtoo/BigDataLabWorker

