Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Kitenga reinventing information Mark Davis Founder/CTO


Presented by Mark Davis, CTO, Kitenga. See the conference video: http://www.lucidimagination.com/devzone/events/conferences/lucene-revolution-2012

Kitenga's Analyst system uses the LucidWorks Enterprise REST API in a variety of ways, including for configuring collections and managing Solr schema. As part of the Kitenga platform, the ZettaSearch Designer lets the end user drag and drop search widgets to create a specialized search interface. To design search UIs that meet their needs, users must understand the schema fields available in a given collection. ZettaSearch Designer interrogates the Solr infrastructure using the Lucid REST API to provide an overview of the available metadata. The user can then build rich, facetted search experiences around the metadata indexed into the collection.

In this implementation overview, I will describe the design of ZettaSearch Designer, how it interacts with big data technologies like Hadoop as part of the indexing pipeline, and how it uses the LucidWorks API to enable user discovery of the metadata needed to create novel search user interfaces on the fly.

Transcript of Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Page 1: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Kitenga reinventing information

Mark Davis Founder/CTO

Page 2: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Enabling Big Data Search via the Lucid ReST API

Page 3: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Big Data

• Enormous transactional data
• Enormous unstructured information
• Too big for databases
• New tools are needed

Page 4: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Decimal (SI) unit   Value   Binary usage   Binary (IEC) unit   Value
kilobyte (kB)       10^3    2^10           kibibyte (KiB)      2^10
megabyte (MB)       10^6    2^20           mebibyte (MiB)      2^20
gigabyte (GB)       10^9    2^30           gibibyte (GiB)      2^30
terabyte (TB)       10^12   2^40           tebibyte (TiB)      2^40
petabyte (PB)       10^15   2^50           pebibyte (PiB)      2^50
exabyte (EB)        10^18   2^60           exbibyte (EiB)      2^60
zettabyte (ZB)      10^21   2^70           zebibyte (ZiB)      2^70
yottabyte (YB)      10^24   2^80           yobibyte (YiB)      2^80

Volume, Velocity, Variety
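
The gap between the decimal and binary readings of each prefix grows with scale, which matters when sizing storage for large collections. The sketch below works through that arithmetic; the function names are illustrative and do not come from the slides.

```python
# Decimal (SI) versus binary (IEC) byte prefixes.
PREFIXES = ["kilo", "mega", "giga", "tera", "peta", "exa", "zetta", "yotta"]


def decimal_bytes(prefix: str) -> int:
    """Bytes in one SI unit, e.g. 10**3 for 'kilo'."""
    return 10 ** (3 * (PREFIXES.index(prefix) + 1))


def binary_bytes(prefix: str) -> int:
    """Bytes in the matching IEC unit, e.g. 2**10 for 'kilo' (kibibyte)."""
    return 2 ** (10 * (PREFIXES.index(prefix) + 1))


def gap_percent(prefix: str) -> float:
    """How much larger the binary unit is than the decimal one, in percent."""
    dec = decimal_bytes(prefix)
    return 100.0 * (binary_bytes(prefix) - dec) / dec


if __name__ == "__main__":
    for name in PREFIXES:
        print(f"{name:>6}: binary unit is {gap_percent(name):.1f}% larger")
```

The difference is 2.4% at the kilobyte level and grows to roughly 21% at the yottabyte level.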

Page 5: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Indexing Challenges

• Complex, varied data
• Compute-intensive metadata generation
• Schema and collection management

Gather Resources
• Crawl
• Crack formats

Extract Metadata
• Named entities
• Categories
• Machine learning
• Semantic analysis

Index
• Schema definition
• Collection management

Page 6: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Search Experience Challenges

• Complex, varied data
• Resource discovery
• Facetted search experience management

Initial Query
• Keyword guesses
• Category guidance

Refine Query
• Analytic tools
• Facetted guidance

Evaluate Relevance
• Read KWIC
• Read metadata
• Read document

Page 7: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

The Solution

Enable fast metadata generation:
• Hadoop
• Mahout
• GPUs

Manage and control collections and schema:
• LucidWorks Enterprise API

Page 8: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Diagram labels:
• SQL, RDBMS
• Transactional Data, BI Tools
• Search, Documents, Text Classification, Taxonomies, Ontologies

Page 9: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience
Page 10: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Diagram labels (text-processing stages):
• Parts-of-Speech Tagging
• Tokenization
• Lemmatization
• Finite State Transducer (three instances)
• Machine Learning

Page 11: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Query Language
• Metadata Extraction
• Indexing
• Facet Browsing
• Facet Charting
• Resource Integration
• Autosuggest
• Spellcheck

Page 12: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Start to POC in a week
• Open source intelligence problems

Page 13: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

GOAL: Be more competitive

SOURCES: Patents, PR announcements, legal documents, whitepapers, crawled websites

ANALYSIS: Extract named entities and relationships, classify and label; visually understand relationships and trends

ACTION: Change R&D priorities and improve marketing approaches

Diagram labels: Sources; ZettaVox (metadata, relationships, data entities); ZettaSearch (Facetted Search and Analytics)

Page 14: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Understand IP among competitors
• Assist legal team with litigation
• Custom search experience
• Custom extractors:
    • Electronic parts
    • Memory types
    • Flash memory

5/15/12

Page 15: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience


Company      Documents   Size
Dell           102,508   9 GB
EMC            303,678   14 GB
Huawei          11,912   890 MB
Kingston         2,534   134 MB
Lenovo           8,305   542 MB
NEC              3,900   252 MB
Nokia          174,681   22 GB
Panasonic        5,804   473 MB
RIM                181   8 MB
Sharp USA       31,918   4.9 GB
Total          645,421   60.2 GB

Page 16: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

GOAL: Discover new drugs, detect side-effects, speed R&D

SOURCES: Published research reports, patents, adverse effects databases, genomics and proteomics databases

ANALYSIS: Extract named entities and relationships, classify and label; visually discover trends and relationships

ACTION: Change R&D priorities

Diagram labels: Sources; ZettaVox (relationships, data entities, pathways, sequences); ZettaSearch (Facetted Search and Analytics)

Page 17: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Lousy search (Google Search Appliance)
• Internal regulators can't find by accession number
• Custom extractors:
    • Accession number
    • Ontology of active ingredients
    • Drug names

© 2012 Kitenga Proprietary

Page 18: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

GOAL: Build "second screen experiences"

SOURCES: Wikipedia, IMDb, blogs

ANALYSIS: Extract named entities and relationships, preserve existing structural metadata

ACTION: Enable new media experiences

Diagram labels: Sources; ZettaVox (metadata, relationships, data entities); ZettaSearch (Facetted Search and Analytics)

Page 19: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Crawlers on Hadoop
• Document format crackers on Hadoop
• Extractors on Hadoop
• Filters on Hadoop
• HTTP documents to Solr sharded cluster
• Intermediary files remain on HDFS for reprocessing
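
The step of sending documents over HTTP to a sharded Solr cluster could look roughly like the sketch below. The shard URLs, the hash-based routing, and the helper names are assumptions made for illustration; the slides do not say how documents are routed. The JSON update path also differs between Solr versions (`/update` versus `/update/json`), so treat the URLs as placeholders. The code only builds the requests; passing each one to `urllib.request.urlopen` would send it.

```python
import hashlib
import json
import urllib.request

# ASSUMPTION: hypothetical shard endpoints, not taken from the slides.
SHARD_URLS = [
    "http://solr-shard0.example.com:8983/solr/update",
    "http://solr-shard1.example.com:8983/solr/update",
]


def shard_for(doc_id: str, shard_count: int) -> int:
    """Pick a shard deterministically from a hash of the document id."""
    digest = hashlib.md5(doc_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count


def build_update_request(docs, shard_url):
    """Build (but do not send) a JSON update request for one shard."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        shard_url + "?commit=true",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def route(docs, shard_urls):
    """Group documents by shard and return one request per non-empty shard."""
    batches = {url: [] for url in shard_urls}
    for doc in docs:
        target = shard_urls[shard_for(doc["id"], len(shard_urls))]
        batches[target].append(doc)
    return [build_update_request(batch, url) for url, batch in batches.items() if batch]
```

Hashing the id keeps a re-indexed document on the same shard, which fits the slide's point that intermediary files stay on HDFS for reprocessing.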

Page 20: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Missing piece of the puzzle
• Addresses the impedance mismatch between Big Data technologies and Solr search
• Manage collections
• Manage schema

Page 21: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience
Page 22: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience
Page 23: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Create collections
• Delete collections
• Update collection properties
• Create schema
• Modify schema
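
As a rough illustration of the collection operations listed above, the sketch below builds REST requests with the Python standard library. The base URL, the resource paths, and the payload keys are assumptions for illustration only; the slides do not show the actual LucidWorks Enterprise endpoints, so check the product's API reference before relying on them.

```python
import json
import urllib.request
from urllib.parse import quote

# ASSUMPTION: host, port, and paths are placeholders, not taken from the slides.
API_BASE = "http://localhost:8888/api"


def _request(method, path, payload=None):
    """Build (but do not send) a JSON REST request."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(API_BASE + path, data=data, headers=headers, method=method)


def create_collection(name):
    # Hypothetical payload shape: a JSON object carrying the collection name.
    return _request("POST", "/collections", {"name": name})


def delete_collection(name):
    return _request("DELETE", "/collections/" + quote(name, safe=""))


def update_collection_settings(name, settings):
    # Hypothetical settings resource under the collection.
    return _request("PUT", "/collections/" + quote(name, safe="") + "/settings", settings)


def send(request):
    """Send a prepared request and decode a JSON response body, if any."""
    with urllib.request.urlopen(request) as response:
        body = response.read()
    return json.loads(body) if body else None
```

Keeping request construction separate from `send` makes the calls easy to inspect or log before they touch a live server.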

Page 24: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Schema interrogation
• Schema binding to user experience
• Facetted search
• Embedded analytics
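
One way to picture schema interrogation feeding a facetted search is sketched below: a field listing is filtered for facetable fields, which are then bound to standard Solr facet parameters (`facet`, `facet.field`, `facet.mincount`). The shape of the field listing and the field names are hypothetical; only the Solr query parameters are standard.

```python
from urllib.parse import urlencode

# ASSUMPTION: a made-up field listing, standing in for whatever the schema
# interrogation call actually returns.
SAMPLE_FIELDS = [
    {"name": "title", "indexed": True, "facet": False},
    {"name": "company", "indexed": True, "facet": True},
    {"name": "memory_type", "indexed": True, "facet": True},
]


def facetable_fields(fields):
    """Names of fields that are indexed and flagged for faceting."""
    return [f["name"] for f in fields if f.get("indexed") and f.get("facet")]


def facet_query_string(user_query, fields, rows=10):
    """Build a Solr query string that facets on every facetable field."""
    params = [("q", user_query), ("rows", rows), ("wt", "json"), ("facet", "true")]
    params += [("facet.field", name) for name in facetable_fields(fields)]
    params.append(("facet.mincount", 1))
    return urlencode(params)


if __name__ == "__main__":
    print(facet_query_string("flash memory", SAMPLE_FIELDS))
```

A designer tool could offer the result of `facetable_fields` as the list of widgets a user may drag into a search page.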

Page 25: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience
Page 26: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience
Page 27: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

• Big Data search and analytics have many challenges:
    • Volume of data
    • Variety of data
    • Velocity of data
    • Extracting structure from unstructured information
• Hadoop processing addresses each of these challenges
• The Lucid Imagination search API enables control of indexing and search
• We can enable complex user interactions with Big Data on a self-serve basis

Page 28: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Architecture diagram. Labels:
• Analyst Browser, Enterprise servers, Cloud services, Enterprise Cloud
• ZettaVox Author RIA
• Tomcat App Server, Tomcat Web Services
• ZettaVoxServices Manager
• XML + JSON, ReST JSON
• Amazon S3
• GPU Services Manager, GPU MR Service Manager, GPU (two instances)
• Hadoop Services Manager
• Hadoop Server Job Tracker, Hadoop Server Name node, Hadoop Task Manager (three instances)
• Search Indexing, Mahout, Entity Extraction, Crawling
• Quantum4D
• RDBMS

Page 29: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Architecture diagram. Labels:
• Analyst Browser, Enterprise servers
• ZettaVox Author RIA
• Hadoop Server Job Tracker, Hadoop Server Name node, Hadoop Task Manager (three instances)
• Search Indexing, Mahout, Entity Extraction, Crawling
• Indexing

ReST JSON calls:
• Get collection information
• Create new collection
• Create fields
• Delete fields
• Edit fields
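
The field-level calls on this slide (create, delete, and edit fields, plus reading collection information) might be wrapped as in the sketch below. The base URL, the `/fields` path, and the attribute names `field_type` and `facet` are assumptions for illustration; the slides name the operations but not the endpoints or payloads.

```python
import json
import urllib.request
from urllib.parse import quote

# ASSUMPTION: placeholder host, port, and path layout, not taken from the slides.
API_BASE = "http://localhost:8888/api"


def fields_url(collection, field=None):
    """URL of a collection's field list, or of a single field."""
    url = f"{API_BASE}/collections/{quote(collection, safe='')}/fields"
    return url if field is None else f"{url}/{quote(field, safe='')}"


def _json_request(method, url, payload=None):
    """Build (but do not send) a request with an optional JSON body."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def list_fields(collection):
    return _json_request("GET", fields_url(collection))


def create_field(collection, definition):
    # 'definition' is a dict such as {"name": ..., "field_type": ...} (hypothetical keys).
    return _json_request("POST", fields_url(collection), definition)


def edit_field(collection, field, changes):
    return _json_request("PUT", fields_url(collection, field), changes)


def delete_field(collection, field):
    return _json_request("DELETE", fields_url(collection, field))
```

For example, `create_field("patents", {"name": "memory_type", "field_type": "string", "facet": True})` would prepare a request that adds a facetable field for one of the custom extractors; sending it is left to `urllib.request.urlopen`.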

Page 30: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience

Questions?  

Page 31: Using the LucidWorks REST API to Support User-Configuration Big Data Search Experience