Page 1: NetDocuments- Journey from FAST to Solr

Journey from FAST to Solr

Presented by: David Hamson, Mou Nandi

Page 2: NetDocuments- Journey from FAST to Solr

Goal of the Session

• NetDocuments
• Why move to Solr from FAST
• Architecting Solr to work as a core module for Cloud Document Management product user interface building and document discovery
• Testing and benchmarking Solr to scale and perform for billions of documents with 200 QPS and 200 DPS
• Lessons learned / shortcuts found migrating from FAST to Solr

2/14

Page 3: NetDocuments- Journey from FAST to Solr

Who We Are


A leading cloud content management and collaboration service for small to medium businesses (SMB) and professional services firms

Page 4: NetDocuments- Journey from FAST to Solr

Who We Serve

We serve over 1,000 customers across 128 countries and host more than 250 million documents.


Page 5: NetDocuments- Journey from FAST to Solr

Why Migrate to Solr

• Product roadmap does not fit with company roadmap
• Large hardware footprint, expensive to scale
• High indexing latency
• Unpredictable and untraceable document loss
• A black box search engine, dependency on Microsoft FAST support team
• No control over new features
• Expensive license

• Solr supports massive index
• Active hardworking development community
• Access to what's happening under the hood
• Improved hardware footprint
• Reduced licensing cost

Page 6: NetDocuments- Journey from FAST to Solr

Migration to Solr


[Diagram: FAST indexing. In each FAST instance (FAST Instance 1, FAST Instance 2, more FAST instances): ND Document -> FAST Doc Processors -> FIXML -> FAST Indexer (MDI + FTI)]

• 95% of searches are metadata searches; the metadata index does not need rich text processing

• Flexibility to implement different architectures for MDI and FTI

• Even the highest level of logging cannot trace document loss during heavy feeding traffic

Page 7: NetDocuments- Journey from FAST to Solr

Migration to Solr – Solr Indexing


[Diagram: Solr indexing. ND Document -> ND Pipeline / Aspire -> Solr MD XML and Solr FT XML -> Solr MD instances (MDI) and Solr FT instances (FTI)]
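
The split into separate MD and FT update messages can be sketched roughly as follows. This is an illustration only, not the NetDocuments pipeline: it uses Solr's standard XML update format (`<add><doc><field name="...">`), takes metadata field names from the example query later in the deck, and assumes hypothetical `id`, `body`, and `body_t` names for the document key and full text.

```python
import xml.etree.ElementTree as ET


def to_solr_add_xml(doc_id, fields):
    """Build a Solr XML <add> message for one document."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    ET.SubElement(doc, "field", name="id").text = str(doc_id)  # "id" is an assumed key name
    for name, value in fields.items():
        # Multi-valued fields are emitted as repeated <field> elements.
        values = value if isinstance(value, (list, tuple)) else [value]
        for v in values:
            ET.SubElement(doc, "field", name=name).text = str(v)
    return ET.tostring(add, encoding="unicode")


def split_document(nd_doc):
    """Return (metadata XML, full-text XML) for one ND document dict.

    Assumes the dict carries 'id', 'body' (extracted text), and metadata fields.
    """
    metadata = {k: v for k, v in nd_doc.items() if k not in ("id", "body")}
    full_text = {"body_t": nd_doc["body"]}  # hypothetical full-text field name
    return (
        to_solr_add_xml(nd_doc["id"], metadata),
        to_solr_add_xml(nd_doc["id"], full_text),
    )


example = {
    "id": "doc1",
    "ndcabinets_smulti_idx": ["cab1", "cab2"],
    "ndexten_s_idx": "docx",
    "body": "Full text of the document",
}
md_xml, ft_xml = split_document(example)
```

Keeping the two messages separate lets the metadata index skip text processing entirely, which matches the observation that most searches are metadata searches.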

Page 8: NetDocuments- Journey from FAST to Solr

The Migration Project


Phase 1 - MDI

• Only create MDI
• Use FAST data to prototype Solr
• Use the FIXMLs to build the Solr index
• Use 100% filter queries

Phase 2 - FTI

• Build a robust feeding pipeline to handle both MD and FT
• Build a text processing pipeline

Phase 3

• Implement new Solr features

Page 9: NetDocuments- Journey from FAST to Solr

Some ft. view of NetDocuments Search Architecture


[Diagram: NetDocuments search architecture. Components shown: Web App, File System, Web Queue, NDPipeline, MD Handler Pool (MDH1-MDH5), FT Queue, FT Processor pool (FTP1-FTP5), Dispatcher queue, Dispatcher pool (D1-D5), Solr MDI, Solr FTI, Query Distributor, Administration (monitoring, debugging, stats)]

Page 10: NetDocuments- Journey from FAST to Solr

Benchmarking Solr Config Parameter for indexing

• Created Solr index from FIXMLs with different RAM buffer, merge factor, and auto commit configurations

Testing with HDD and SSD

• We did not see any performance difference between HDD (15k rpm) and the ioDrive2 with ND documents

• 15 threads running at a time from the client feeder application
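
A benchmark of this shape could be driven by something like the sketch below. The parameter values are hypothetical (the slides do not list the settings that were tried), the dictionary keys merely mirror the `ramBufferSizeMB`, `mergeFactor`, and auto commit settings, and `send` is a stand-in for whatever posts a document to Solr. Only the 15 feeder threads come from the slide.

```python
import itertools
import time
from concurrent.futures import ThreadPoolExecutor

FEEDER_THREADS = 15  # from the slide

# Hypothetical candidate values; the talk does not state the ones tested.
RAM_BUFFER_MB = [32, 128, 512]
MERGE_FACTOR = [10, 25]
AUTO_COMMIT_MS = [15000, 60000]


def config_grid():
    """Yield every combination of the candidate indexing settings."""
    for ram, merge, commit in itertools.product(
        RAM_BUFFER_MB, MERGE_FACTOR, AUTO_COMMIT_MS
    ):
        yield {
            "ramBufferSizeMB": ram,
            "mergeFactor": merge,
            "autoCommitMaxTimeMs": commit,
        }


def feed(docs, send, threads=FEEDER_THREADS):
    """Send all docs through `send` using a fixed-size thread pool.

    Returns (number of documents sent, elapsed seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        results = list(pool.map(send, docs))
    return len(results), time.perf_counter() - start


if __name__ == "__main__":
    for config in config_grid():
        # A real run would apply `config` to the Solr instance first.
        count, seconds = feed(range(1000), send=lambda doc: doc)
        print(config, count, round(seconds, 3))
```

Running the same document set against each configuration keeps the comparison fair, since only the indexing settings change between runs.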

Page 11: NetDocuments- Journey from FAST to Solr


Testing using different file system

• We did not see a large performance difference between ext3 and xfs on HDD or SSD with ND documents

• We chose to use ext3 for FTI with 15K HDD on RAID10
• We are using xfs on the ioDrive for MDI, as suggested by Fusion-io

Page 12: NetDocuments- Journey from FAST to Solr

Benchmarking Solr Indexing and Query Process


Metric                   Search going to 5 shards   Search going to 10 shards
SolrMeter instances      5                          10
Queries per shard        3000 queries/min           1500 queries/min
Total queries            15000 queries/min          15000 queries/min
Avg response time        8 ms                       12 ms
CPU                      20%                        32%
RAM                      52 G                       53 G
Cache warmup time        2.5 s                      2.7 s
Cache hit ratio          0.98                       0.98
Cache size               2276                       2276
Evictions                none                       none
Index update interval    every 7 sec                every 7 sec
Test duration            5 min                      8 min

Implemented multi-core index processing and compared its indexing and query performance with a single-core index
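
To relate these figures to the 200 QPS goal stated at the start of the session, the per-minute totals can be converted to queries per second. A small sketch of that arithmetic, using only numbers from the slides:

```python
TARGET_QPS = 200  # goal stated at the start of the session


def total_qps(shards, queries_per_shard_per_min):
    """Convert per-shard queries/min into total queries/sec."""
    return shards * queries_per_shard_per_min / 60


runs = {
    "5 shards": total_qps(5, 3000),
    "10 shards": total_qps(10, 1500),
}

for name, qps in runs.items():
    print(f"{name}: {qps:.0f} QPS, meets target: {qps >= TARGET_QPS}")
```

Both runs come to 15000 queries/min, or 250 queries per second, which is above the 200 QPS target.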

Page 13: NetDocuments- Journey from FAST to Solr

6/14

Benchmark QTime increase as Solr scales and start row increases

QTime does not vary much as the start row increases.

Page 14: NetDocuments- Journey from FAST to Solr

Tuning System queries for Solr

• System searches are metadata searches
• Thousands of real-life queries were extracted from the FAST query log
• Extensive use of filter queries and the filter cache gives excellent response time for complex queries

• Example queries:

FAST Query:
ANDNOT(ANDNOT(ANDNOT(AND(AND(ndcabinets:string("cab1", mode="and"),ndcredate:range(2011-09-26T00:00:00,2012-04-13T23:59:59)),FILTER(ndacl:string("acl1 acl2 acl3",mode="OR"))),nddeletedcabs:string("cab1", mode="and")),ndexten:string("ndws", mode="and")),ndexten:string("ndflt", mode="and"))

Solr Query:
http://solrserver:port/solrSearch/core0/select?shards=solrserver:port/solrSearch/core0,solrserver:port/solrSearch/core1&start=0&rows=500&fl=ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std&sort=ndlastmoddate_tdt_idx+desc&q=ndenvurl:*&fq=ndcabinets_smulti_idx:cab1&fq=ndcredate_tdt_idx:[2011-09-26T00:00:00Z TO 2012-04-13T23:59:59Z]&fq={!cache=false cost=100}(ndacl_smulti_idx:acl1 OR ndacl_smulti_idx:acl2 OR ndacl_smulti_idx:acl3)&fq=-nddeletedcabs_smulti_idx:cab1&fq=-ndexten_s_idx:ndws&fq=-ndexten_s_idx:ndflt
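
One way to assemble such a request programmatically is sketched below. It is not the NetDocuments code: the function name and arguments are illustrative, the shard addresses with port 8983 are placeholders, and values are not escaped for Solr's query syntax, which a real client would need to do. Each restriction becomes its own `fq` parameter so Solr can cache it independently, while the ACL clause is marked `cache=false cost=100` as in the slide.

```python
from urllib.parse import urlencode


def build_query(cabinet, created_from, created_to, acls,
                deleted_cabinets, excluded_exts, shards, start=0, rows=500):
    """Return the URL-encoded query string for a metadata search."""
    acl_clause = " OR ".join(f"ndacl_smulti_idx:{acl}" for acl in acls)
    params = [
        ("shards", ",".join(shards)),
        ("start", start),
        ("rows", rows),
        ("fl", "ndenvurl,nddocmodnum_s_std,ndtitle_t_idx_std"),
        ("sort", "ndlastmoddate_tdt_idx desc"),
        ("q", "ndenvurl:*"),
        ("fq", f"ndcabinets_smulti_idx:{cabinet}"),
        ("fq", f"ndcredate_tdt_idx:[{created_from} TO {created_to}]"),
        # ACL filter: not cached, evaluated after the cheaper filters.
        ("fq", f"{{!cache=false cost=100}}({acl_clause})"),
    ]
    # Negative filters exclude deleted cabinets and unwanted extensions.
    params += [("fq", f"-nddeletedcabs_smulti_idx:{c}") for c in deleted_cabinets]
    params += [("fq", f"-ndexten_s_idx:{e}") for e in excluded_exts]
    return urlencode(params)


query_string = build_query(
    cabinet="cab1",
    created_from="2011-09-26T00:00:00Z",
    created_to="2012-04-13T23:59:59Z",
    acls=["acl1", "acl2", "acl3"],
    deleted_cabinets=["cab1"],
    excluded_exts=["ndws", "ndflt"],
    # Placeholder shard addresses; substitute real hosts and ports.
    shards=["solrserver:8983/solrSearch/core0", "solrserver:8983/solrSearch/core1"],
)
```

Passing the parameters as a list of pairs rather than a dict is what allows the repeated `fq` key.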


Page 15: NetDocuments- Journey from FAST to Solr
Page 16: NetDocuments- Journey from FAST to Solr

THANK YOU