Hadoop Summit - Interactive Big Data Analysis with Solr, Spark and Hue

Post on 03-Aug-2015



INTERACTIVELY QUERY AND SEARCH YOUR BIG DATA

Romain Rigaux

GOALS

Build a Web app

Quickly explore data

… with Solr

Make Solr / Hadoop easier to use

+

ARCHITECTURE

“Just a view” on top of the standard Solr API

REST

HISTORY V1 USER

HISTORY V1 ADMIN

ARCHITECTURE NEXT!

Lots of learning, UX boost needed

Simple: users don’t need to know it is Solr

HISTORY V2 USER

HISTORY V2 ADMIN

HISTORY V2 BETTER UX

ARCHITECTURE

/select /admin/collections /get /luke...

/add_widget /zoom_in /select_facet /select_range...

REST

AJAX

Templates + JS Model

www….

ARCHITECTURE UI FOR FACETS

Layout: all the 2D positioning (cell ids), visual, drag & drop

Collection: dashboard, fields, template, widgets (ids)

Query: search terms, selected facets (q, fqs)

ADDING A WIDGET LIFECYCLE

Load the initial page

Edit mode and Drag & Drop

/solr/zookeeper/clusterstate.json

/solr/admin/luke…

/get_collection

ADDING A WIDGET LIFECYCLE

/solr/select?stats=true

/new_facet

Select the field

Guess ranges (number or dates)

Rounding (number or dates)
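The "guess ranges" and "rounding" steps above can be sketched as follows. This is only an illustration, not Hue's actual algorithm: the function names, the 1/2/5 rounding rule, and the default of 10 buckets are all assumptions. The inputs stand for the min and max that a `stats=true` request reports for a numeric field. Dates are not handled here.

```python
import math


def nice_gap(span, buckets=10):
    """Round span/buckets up to 1, 2, 5 or 10 times a power of ten (assumed rule)."""
    raw = span / buckets
    magnitude = 10 ** math.floor(math.log10(raw))
    for factor in (1, 2, 5, 10):
        if raw <= factor * magnitude:
            return factor * magnitude
    return 10 * magnitude


def guess_range(stats_min, stats_max, buckets=10):
    """Derive start/end/gap for a range facet from a field's min and max."""
    if stats_max <= stats_min:
        # Degenerate field with a single value: fall back to one unit-wide bucket.
        return {"start": stats_min, "end": stats_min + 1, "gap": 1}
    gap = nice_gap(stats_max - stats_min, buckets)
    # Snap both ends outward to multiples of the gap so buckets have round bounds.
    start = math.floor(stats_min / gap) * gap
    end = math.ceil(stats_max / gap) * gap
    return {"start": start, "end": end, "gap": gap}


if __name__ == "__main__":
    print(guess_range(120, 8734201))
```

The resulting `start`, `end` and `gap` would then be sent as the `facet.range.*` parameters of the next query.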

ADDING A WIDGET LIFECYCLE

Query part 1

Query part 2

Augment Solr response

facet.range={!ex=bytes}bytes&f.bytes.facet.range.start=0&f.bytes.facet.range.end=9000000&f.bytes.facet.range.gap=900000&f.bytes.facet.mincount=0&f.bytes.facet.limit=10

q=Chrome&fq={!tag=bytes}bytes:[900000+TO+1800000]
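As a rough sketch of how the two query parts above could be assembled, the helper below builds the same parameters with the standard library. The function name and its arguments are hypothetical. `facet=true` is the standard Solr switch that enables faceting; it is not shown on the slide and is added here as an assumption. The `{!tag=...}` on the filter and `{!ex=...}` on the facet are what keep the other buckets visible after one is selected.

```python
from urllib.parse import urlencode


def range_facet_params(query, field, start, end, gap, selected=None, limit=10):
    """Build a Solr query string for a range facet on `field` (illustrative helper)."""
    params = [
        ("q", query),
        ("facet", "true"),  # assumption: not on the slide, needed to enable faceting
        # Exclude the filter tagged with the field name when counting this facet.
        ("facet.range", "{!ex=%s}%s" % (field, field)),
        ("f.%s.facet.range.start" % field, start),
        ("f.%s.facet.range.end" % field, end),
        ("f.%s.facet.range.gap" % field, gap),
        ("f.%s.facet.mincount" % field, 0),
        ("f.%s.facet.limit" % field, limit),
    ]
    if selected is not None:
        low, high = selected
        # Tag the filter so the facet above can exclude it.
        params.append(("fq", "{!tag=%s}%s:[%s TO %s]" % (field, field, low, high)))
    return urlencode(params)


if __name__ == "__main__":
    print(range_facet_params("Chrome", "bytes", 0, 9000000, 900000,
                             selected=(900000, 1800000)))
```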

{
  'facet_counts': {
    'facet_ranges': {
      'bytes': {
        'start': 10000,
        'counts': [
          '900000', 3423,
          '1800000', 339,
          ...
        ]
      }
    }
  }
}

{
  ...,
  'normalized_facets': [
    {
      'extraSeries': [],
      'label': 'bytes',
      'field': 'bytes',
      'counts': [
        { 'from': '900000', 'to': '1800000', 'selected': True, 'value': 3423, 'field': 'bytes', 'exclude': False }
      ],
      ...
    }
  ]
}
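The "augment Solr response" step can be pictured as turning Solr's flat `[value, count, value, count, ...]` list into one dictionary per bucket, shaped like the `normalized_facets` entry above. The sketch below is an assumption about how that could be done, not Hue's code: the function name, the `gap` argument and the `selected_from` set are hypothetical.

```python
def normalize_range_facet(field, solr_range, gap, selected_from=()):
    """Convert a Solr facet_ranges entry into widget-friendly buckets (illustrative)."""
    flat = solr_range["counts"]
    buckets = []
    # Solr alternates bucket lower bound and count in a single flat list.
    for start, count in zip(flat[::2], flat[1::2]):
        buckets.append({
            "from": start,
            "to": str(int(start) + gap),  # upper bound = lower bound + gap
            "selected": start in selected_from,
            "value": count,
            "field": field,
            "exclude": False,
        })
    return {"label": field, "field": field, "counts": buckets, "extraSeries": []}


if __name__ == "__main__":
    solr_range = {"counts": ["900000", 3423, "1800000", 339]}
    facet = normalize_range_facet("bytes", solr_range, 900000,
                                  selected_from={"900000"})
    for bucket in facet["counts"]:
        print(bucket)
```

Keeping the selection state inside each bucket lets the front end render a widget from this structure alone, without re-reading the `fq` parameters.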

JSON TO WIDGET

{ "field":"rate_code","counts":[ { "count":97797, "exclude":true, "selected":false, "value":"1", "cat":"rate_code" } ...

{ "field":"medallion","counts":[ { "count":159, "exclude":true, "selected":false, "value":"6CA28FC49A4C49A9A96", "cat":"medallion" } ...

{ "extraSeries":[],"label":"trip_time_in_secs","field":"trip_time_in_secs","counts":[ { "from":"0", "to":"10", "selected":false, "value":527, "field":"trip_time_in_secs", "exclude":true } ...

{ "field":"passenger_count","counts":[ { "count":74766, "exclude":true, "selected":false, "value":"1", "cat":"passenger_count" } ...

REPEAT UNTIL…

GAME CHANGER!

Possibilities

5.1 / 5.2

Analytic Facets

FACET FUNCTIONS

Count, Sum, Avg, Percentile, Max ...

Count(id), Sum(bytes), Avg(mul(price, quantity)), Percentile(salary, 50, 90), Max(temperature) ...

FACET FUNCTIONS

SUB “NESTED” FACETS

top_os {
    type: term,
    field: os,
    limit: 5
}

top_os {
    type: term,
    field: os,
    limit: 5,
    facet: {
        by_country: {
            type: term,
            field: country
        }
    }
}

FUNCTION + NESTED = ANALYTICS

states {
    type: term,
    field: state,
    facet: {
        by_month: {
            type: range,
            field: time,
            start: "TODAY-6MONTHS",
            end: "TODAY",
            gap: "MONTH",
            facet: {
                avg_sal: "avg(salary)"
            }
        }
    }
}

states {
    type: term,
    field: state,
    facet: {
        avg_sal: "avg(salary)"
    }
}
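To make the nesting concrete, here is a minimal Python sketch that builds the same "states by month with average salary" request as a `json.facet` parameter. The helper names are hypothetical. Note the spelling differences from the slide shorthand: Solr's JSON Facet API names the bucket type `terms`, and its date math is written `NOW-6MONTHS` and `+1MONTH`, so those forms are used here.

```python
import json
from urllib.parse import urlencode


def terms_facet(field, limit=None, sub_facets=None):
    """Describe a terms facet, optionally with nested facets or functions."""
    facet = {"type": "terms", "field": field}
    if limit is not None:
        facet["limit"] = limit
    if sub_facets:
        facet["facet"] = sub_facets
    return facet


def range_facet(field, start, end, gap, sub_facets=None):
    """Describe a range facet, optionally with nested facets or functions."""
    facet = {"type": "range", "field": field, "start": start, "end": end, "gap": gap}
    if sub_facets:
        facet["facet"] = sub_facets
    return facet


# One bucket per state, split by month, with an average per (state, month) bucket.
facet = {
    "states": terms_facet("state", sub_facets={
        "by_month": range_facet(
            "time", "NOW-6MONTHS", "NOW", "+1MONTH",
            sub_facets={"avg_sal": "avg(salary)"},
        ),
    }),
}

# rows=0 because only the facet buckets are wanted, not the matching documents.
params = urlencode({"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)})

if __name__ == "__main__":
    print(json.dumps(facet, indent=2))
    print(params)
```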

OPERATIONS ON BUCKETS OF DATA

Counts → Functions

OPERATIONS ON BUCKETS OF DATA

Nested → nD functions
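One way to read "nested → nD functions": each level of nesting adds a dimension, so a nested facet response can be flattened into rows keyed by the path of bucket values. The sketch below assumes the usual JSON Facet response shape (`buckets` lists whose entries carry `val`, `count` and any function results). The function name and the sample data are invented for illustration.

```python
def flatten_buckets(facet, name, path=()):
    """Flatten nested facet buckets into (key_path, metrics) rows (illustrative)."""
    rows = []
    for bucket in facet.get(name, {}).get("buckets", []):
        key = path + (bucket["val"],)
        # A value that is a dict containing "buckets" is a nested facet.
        nested = [k for k, v in bucket.items()
                  if isinstance(v, dict) and "buckets" in v]
        if nested:
            for child in nested:
                rows.extend(flatten_buckets(bucket, child, key))
        else:
            # Leaf bucket: everything except the bucket value is a metric.
            metrics = {k: v for k, v in bucket.items() if k != "val"}
            rows.append((key, metrics))
    return rows


if __name__ == "__main__":
    # Made-up sample data, not real results.
    response = {
        "count": 3,
        "states": {"buckets": [
            {"val": "CA", "count": 2, "by_month": {"buckets": [
                {"val": "2015-01-01T00:00:00Z", "count": 2, "avg_sal": 50.0},
            ]}},
        ]},
    }
    for key, metrics in flatten_buckets(response, "states"):
        print(key, metrics)
```

Each row's key has one element per nesting level, which maps directly onto the axes of a 2D or nD chart widget.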

ENTERPRISE FEATURES

- Access to Search App configurable, LDAP/SAML auths
- Share by link
- Solr Cloud (or non Cloud)
- Proxy user
    /solr/jobs_demo/select?user.name=hue&doAs=romain&q=
- Security
    Kerberos
- Sentry
    Collection level, Solr calls like /admin, /query, Solr UI, ZooKeeper
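The proxy-user call in the list above has the service account (`user.name`) issue the request on behalf of the logged-in user (`doAs`). A minimal sketch of building that URL, with a hypothetical helper name and argument names:

```python
from urllib.parse import urlencode


def proxied_select_url(collection, service_user, end_user, query=""):
    """Build a /select path where service_user acts on behalf of end_user."""
    params = urlencode({
        "user.name": service_user,  # the account the web app itself runs as
        "doAs": end_user,           # the user being impersonated
        "q": query,
    })
    return "/solr/%s/select?%s" % (collection, params)


if __name__ == "__main__":
    print(proxied_select_url("jobs_demo", "hue", "romain"))
```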

SEARCH AS ONLY APP IN HUE

gethue.com/solr-search-ui-only/

• Spark in your browser

• Notebooks

• New REST Server

SPARK INDEXING WHAT

• Open source REST for Spark Shell

• Runs locally or inside YARN

• Spark Scala, PySpark and jar/py submission

SPARK INDEXING WHAT

https://github.com/cloudera/hue/tree/master/apps/spark/java

SPARK STREAMING

Real time!

Spark, Solr

• Python

• Scala

• Charts

NOTEBOOKS / SHELL

WHAT

DEMO TIME

• Analyze Bay Area bike share

• Visualize one year of data

• Know your users, predict behavior

MISSED SOMETHING?

demo.gethue.com

• Full Analytics

• Easier indexing

• Geo

• Export/Share results

• “More like this”

• Solr Joins, Solr SQL

• Spark, SQL... integration, Hue 4

WHAT’S NEXT

NEW FEATURES

TWITTER

@gethue

USER GROUP

hue-user@

WEBSITE

http://gethue.com

LEARN

http://learn.gethue.com

THANKS!