Search Architecture at Evernote: Presented by Christian Kohlschütter, Evernote

Search Architecture at Evernote: Not Your Typical Big Data Problem

CHRISTIAN KOHLSCHÜTTER, Sr. Search Researcher

Augmented Intelligence @ Evernote

We are the workspace.

Write Collect Find Present

Collect

Serving 100+ Million Users Worldwide

• 559 Shards (200k users per shard), Linux/Tomcat/MySQL

• 3.2 PB WebDAV-based Storage

• 224 TB SSD capacity for System, MySQL and Lucene

• 3.1 Billion Notes stored, 3.8 Bn Notes ever created

• 115 Million Notes created or edited last week

• 26 Million API calls to Context last week

• 1 Lucene index per user

Evernote’s Three Laws of Data Protection

• Your Data is Yours

• Your Data is Protected

• Your Data is Portable

We are not a “big data” company and do not try to make money from your content.

Technical Debt

• I/O over Lucene 2.9 indexes became a bottleneck

• Code was woven into our “NoteStore” platform

• Index changes had to be backwards-compatible

• Complex re-indexing would require taking down a shard

• Needed to rethink the entire architecture, but keep public API

• Make search faster vs. Make us move faster

From Lucene 2.9 to 4.x and beyond

• Large refactoring of search code

• Lucene is no longer a direct dependency in “NoteStore”

• Design-by-Contract

• Can now run multiple Lucene versions concurrently in one VM

• … and one specific version / schema per user

• Migrated all users to Lucene 4.5, avg. downtime/user < 1 min

Separate the What from the How

Separation of Concerns

UserIndexManager

UserIndexFactory

UserIndex

Lucene29UserIndexImpl

Lucene4UserIndexImpl

API

Implementation

Caching UserIndex

Benchmarking UserIndex

NoteStore

...
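The split between API and implementation listed above can be sketched in Java roughly as follows. This is a minimal illustration, not Evernote's code: only the names UserIndex, UserIndexFactory and the idea of a caching wrapper come from the slides, while every method signature, the StubUserIndex class and the string-keyed registry are assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Sketch of the API / implementation split; all signatures are assumptions. */
final class UserIndexSketch {

    /** The contract that client code (e.g. NoteStore) sees; no Lucene types here. */
    interface UserIndex {
        String implementationName();
        List<String> search(String query);
    }

    /** Hypothetical stand-in for a version-specific implementation. */
    static final class StubUserIndex implements UserIndex {
        private final String name;
        StubUserIndex(String name) { this.name = name; }
        public String implementationName() { return name; }
        public List<String> search(String query) {
            return Collections.singletonList(name + ":" + query);
        }
    }

    /** Decorator that adds result caching without touching the implementation. */
    static final class CachingUserIndex implements UserIndex {
        private final UserIndex delegate;
        private final Map<String, List<String>> cache = new HashMap<>();
        CachingUserIndex(UserIndex delegate) { this.delegate = delegate; }
        public String implementationName() { return delegate.implementationName(); }
        public List<String> search(String query) {
            return cache.computeIfAbsent(query, delegate::search);
        }
    }

    /** Factory that picks an implementation by type name (hypothetical registry). */
    static final class UserIndexFactory {
        private final Map<String, Supplier<UserIndex>> registry = new HashMap<>();
        void register(String type, Supplier<UserIndex> supplier) {
            registry.put(type, supplier);
        }
        UserIndex create(String type) {
            Supplier<UserIndex> supplier = registry.get(type);
            if (supplier == null) {
                throw new IllegalArgumentException("Unknown index type: " + type);
            }
            return supplier.get();
        }
    }
}
```

Because callers only hold a UserIndex reference, wrappers such as a caching or benchmarking layer can be stacked on any implementation without changes on the calling side.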

Hide Lucene behind ClassLoaders

• One Maven artifact per major Lucene version, build profiles for code-reuse between minor updates

• Code is packaged with dependencies into one common fat-jar with prefixes for each implementation:

- lucene29/org/apache/lucene/...
  lucene29/com/evernote/search/lucene2/…

- lucene43/org/apache/lucene/...

- lucene45/org/apache/lucene/…

• ResourcePrefixClassLoader, called from outside code, strips the prefix and uses the fat-jar as its only dependency
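A prefix-based class loader along these lines might look like the sketch below. It is not the actual ResourcePrefixClassLoader: the class name PrefixedLoaderSketch, its constructor and the resourcePath helper are hypothetical. The sketch takes the caller's view, where class names carry no prefix, and adds the prefix only when locating the class file inside the fat-jar.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical sketch: loads classes stored under "<prefix>/..." in a fat-jar. */
final class PrefixedLoaderSketch extends ClassLoader {
    private final String prefix;
    private final ClassLoader resources;

    PrefixedLoaderSketch(String prefix, ClassLoader parent) {
        super(parent);
        this.prefix = prefix;
        // Fall back to the system class loader for resource lookups.
        this.resources = parent != null ? parent : ClassLoader.getSystemClassLoader();
    }

    /** Maps an unprefixed class name to its prefixed location in the jar. */
    static String resourcePath(String prefix, String className) {
        return prefix + "/" + className.replace('.', '/') + ".class";
    }

    @Override
    protected Class<?> findClass(String name) throws ClassNotFoundException {
        String path = resourcePath(prefix, name);
        try (InputStream in = resources.getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            byte[] bytes = out.toByteArray();
            // The class is defined under its unprefixed name.
            return defineClass(name, bytes, 0, bytes.length);
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}
```

The default parent-first delegation only reaches findClass when the parent cannot see the class under its plain name, which matches the setup described above where no Lucene jar sits on the regular classpath. Two loaders with different prefixes then yield two independent copies of org.apache.lucene classes in one VM.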

New Index Structure

• Each user’s index now comes with a properties file that describes its internal structures, such as index type and version, so the code can handle each variant differently.

• Changes to the index schema? Just increase the index version and handle the rest in code

• Automatically trigger re-indexing if necessary
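The per-index properties file and the re-index decision could be sketched as follows. The key names index.type and index.version, the defaults for missing keys and the needsReindex rule are all assumptions for illustration; the slides only state that type and version are recorded and that re-indexing is triggered automatically when necessary.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical descriptor read from a per-user index properties file. */
final class IndexDescriptorSketch {
    final String type;
    final int version;

    IndexDescriptorSketch(String type, int version) {
        this.type = type;
        this.version = version;
    }

    /** Parses properties text; key names and defaults are assumptions. */
    static IndexDescriptorSketch parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String type = props.getProperty("index.type", "unknown");
        int version = Integer.parseInt(props.getProperty("index.version", "0"));
        return new IndexDescriptorSketch(type, version);
    }

    /** Re-index if the implementation differs or the schema version is older. */
    boolean needsReindex(String targetType, int targetVersion) {
        return !type.equals(targetType) || version < targetVersion;
    }
}
```

With a check like this, a schema change only requires bumping the target version; indexes still at an older version are detected on access and rebuilt.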

Index Auto-Migration

• Target Default Index Implementation centrally set by DevOps

• Triggered upon UserIndex access

• UserIndex facade determines whether re-index is necessary

• “Cruise Control” automates off-peak access


Phase 1: Migration to Lucene 4

• Changes in Disk I/O (CPU correlates)

overall: -81%

searchRelatedNotes: -87%

keyword-based search: -96%

Saves TBs of I/O

Phase 2: Add Compression

• User index sizes and access patterns are skewed

• Optimize large accounts

• Directory-level compression

• Compress segment files, invisible to the IndexReader

• Only when re-indexing / every 3 months

• In-memory Caching

LuceneTransform

• https://code.google.com/p/lucenetransform by Mitja Lenič

• We ported it to Lucene 4.5 (now available upstream for 4.9)

• Improved LRU caching, added LZ4/Snappy compression

• We will contribute our changes soon
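The LRU caching of decompressed blocks can be illustrated with a small Java sketch built on LinkedHashMap in access order. This is not LuceneTransform's cache: the class BlockCacheSketch, the BlockLoader callback and the capacity expressed in blocks rather than bytes are assumptions chosen to keep the example short.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical LRU cache for decompressed index blocks, with hit-rate tracking. */
final class BlockCacheSketch {

    /** Loads (e.g. reads and decompresses) a block on a cache miss. */
    interface BlockLoader {
        byte[] load(long blockNumber);
    }

    private final int maxBlocks;
    private final LinkedHashMap<Long, byte[]> blocks;
    private long hits;
    private long misses;

    BlockCacheSketch(int maxBlocks) {
        this.maxBlocks = maxBlocks;
        // accessOrder = true keeps the least recently used entry first.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > BlockCacheSketch.this.maxBlocks;
            }
        };
    }

    synchronized byte[] get(long blockNumber, BlockLoader loader) {
        byte[] block = blocks.get(blockNumber);
        if (block != null) {
            hits++;
            return block;
        }
        misses++;
        block = loader.load(blockNumber);
        blocks.put(blockNumber, block);
        return block;
    }

    synchronized double hitRate() {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }
}
```

A production cache would more likely bound memory in bytes and avoid holding the lock while decompressing; the sketch only shows the eviction order and how a hit rate can be measured.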


OverlayDirectory

on disk:

_23.cfe

_23.si

c$_23.cfs

segments.gen

segments_2

visible to IndexReader:

_23.cfe

_23.si

_23.cfs

segments.gen

segments_2
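The name mapping shown in the two listings can be sketched in Java as below. Only the c$ prefix and the file names come from the slide; the class OverlayNamesSketch and its methods are hypothetical, and the sketch deals with names only, not with reading or decompressing file contents.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical name mapping between on-disk files and what the reader sees. */
final class OverlayNamesSketch {
    /** Marker for compressed files on disk, as shown on the slide. */
    static final String COMPRESSED_PREFIX = "c$";

    static boolean isCompressed(String onDiskName) {
        return onDiskName.startsWith(COMPRESSED_PREFIX);
    }

    /** Name as presented to the IndexReader: the marker prefix is hidden. */
    static String visibleName(String onDiskName) {
        return isCompressed(onDiskName)
                ? onDiskName.substring(COMPRESSED_PREFIX.length())
                : onDiskName;
    }

    /** Directory listing as presented to the IndexReader. */
    static List<String> listVisible(List<String> onDiskNames) {
        List<String> visible = new ArrayList<>();
        for (String name : onDiskNames) {
            visible.add(visibleName(name));
        }
        return visible;
    }

    /** Finds the on-disk file behind a visible name, preferring the compressed one. */
    static Optional<String> resolve(List<String> onDiskNames, String visibleName) {
        String compressed = COMPRESSED_PREFIX + visibleName;
        if (onDiskNames.contains(compressed)) {
            return Optional.of(compressed);
        }
        return onDiskNames.contains(visibleName)
                ? Optional.of(visibleName)
                : Optional.empty();
    }
}
```

When a file is opened, a directory wrapper could use resolve to decide whether to return a plain input or one that decompresses blocks on the fly.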

Results

• Compressed the largest 5% of all indexes using LZ4

• 1.9 TB index space saved

• 100 MB LRU cache: hit rate 79% on avg (range 67% to 93%)

• Saved 0.5 PB disk reads/week

• Cache is so good that we may use a better/slower compression algorithm and may apply it to more users

Saves PBs of I/O

Bugs, Bugs, Bugs :-)

• We’ve been warned

“VInt bug”

“background merge hit exception”

JVM segfaults


• and then this happened, too

SPI / ContextClassLoaders … LUCENE-4713

Deadlocks / over-optimistic locking

Unclosed resources / Too many open file handles => HousekeepingDirectory

Issues with FieldCache singleton => LUCENE-831, LUCENE-2133, …

…

• UserIndex tracks “broken” state; allows self-healing (rebuild)
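One way to picture the “broken” state and self-healing is the sketch below. It is an assumption about the mechanism, not the actual UserIndex code: the class SelfHealingSketch, the rebuild callback and the rule "any runtime failure marks the index broken, the next access rebuilds first" are all hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical wrapper that marks an index broken on failure and rebuilds later. */
final class SelfHealingSketch {
    enum State { HEALTHY, BROKEN }

    private final Runnable rebuild;
    private State state = State.HEALTHY;
    private int rebuildCount;

    SelfHealingSketch(Runnable rebuild) {
        this.rebuild = rebuild;
    }

    /** Runs an index operation; a broken index is rebuilt before the operation. */
    synchronized <T> T access(Supplier<T> operation) {
        if (state == State.BROKEN) {
            rebuild.run();
            rebuildCount++;
            state = State.HEALTHY;
        }
        try {
            return operation.get();
        } catch (RuntimeException e) {
            // Remember the failure so the next access triggers a rebuild.
            state = State.BROKEN;
            throw e;
        }
    }

    synchronized State state() {
        return state;
    }

    synchronized int rebuildCount() {
        return rebuildCount;
    }
}
```

A real implementation would probably persist the broken flag next to the index and distinguish corruption from transient errors; the sketch only shows the state transition.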

Conclusion

• Design-by-Contract, Separation of Concerns

• Per-user Search Implementation / Multiple Lucene versions

• Migrated 60M users, without noticeable downtime

• Migration allowed index changes, saves TBs of disk I/O

• Block-level Index Compression, saves PBs of disk I/O

• This is just the beginning.

Thank you

We’re hiring: https://evernote.com/careers/
