Solr at Etsy - Giovanni Fernandez-Kincade
![Page 1: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/1.jpg)
Solr @ Etsy
![Page 2: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/2.jpg)
Things I’m not going to talk about:
• A/B Testing
• i18n
• Continuous Deployment
![Page 3: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/3.jpg)
About Us
![Page 4: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/4.jpg)
![Page 5: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/5.jpg)
![Page 6: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/6.jpg)
10+ Million Listings
500 qps
![Page 7: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/7.jpg)
Architecture Overview
![Page 8: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/8.jpg)
Architecture Overview: Thrift
![Page 9: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/9.jpg)
Architecture Overview: Thrift
struct Listing {
  1: i64 listing_id
}
struct ListingResults {
  1: i64 count,
  2: list<Listing> listings
}
service Search {
  ListingResults search(1: string query)
}
![Page 10: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/10.jpg)
Architecture Overview: Thrift
Generated Java server code:
public class Search {
  public interface Iface {
    public ListingResults search(String query) throws TException;
  }
}
Generated PHP client code:
class SearchClient implements SearchIf {
  /**...**/
  public function search($query) {
    $this->send_search($query);
    return $this->recv_search();
  }
}
![Page 11: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/11.jpg)
Architecture Overview: Thrift
Why use Thrift?
• Service Encapsulation
• Reduced Network Traffic
![Page 12: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/12.jpg)
Architecture Overview: Thrift
Why only return IDs?
• Index Size
• Easy to scale PK lookups
![Page 13: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/13.jpg)
The Search Server
![Page 14: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/14.jpg)
Architecture Overview: Search Server
• Identical Code + Hardware
• Roles/Behavior controlled by Env variables
• Single Java Process
• Solr running as a Jetty Servlet
• Thrift Servers
• Smoker
![Page 15: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/15.jpg)
Architecture Overview: Search Server
Master-specific processes:
• Incremental Indexer
• External File Field Updaters
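A minimal sketch of how identical code on every host could pick its role from the environment. The variable name SEARCH_ROLE and the component labels are hypothetical; the slides do not name the actual variables or say how the processes are started.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SearchServerRoles {
    // Decide which components to start from the environment.
    // SEARCH_ROLE and the component names are assumptions for illustration.
    static List<String> componentsFor(Map<String, String> env) {
        List<String> components = new ArrayList<>();
        // Every search server runs these, whatever its role.
        components.add("solr-jetty");
        components.add("thrift-servers");
        components.add("smoker");
        // Only the master also runs the indexing-side processes.
        if ("master".equalsIgnoreCase(env.get("SEARCH_ROLE"))) {
            components.add("incremental-indexer");
            components.add("external-file-field-updaters");
        }
        return components;
    }

    public static void main(String[] args) {
        System.out.println(componentsFor(System.getenv()));
    }
}
```

Passing the environment in as a map keeps the decision testable without touching the real process environment.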
![Page 16: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/16.jpg)
Load Balancing
![Page 17: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/17.jpg)
Load Balancing: Thrift TSocketPool
![Page 18: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/18.jpg)
Load Balancing: Thrift TSocketPool
![Page 19: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/19.jpg)
Load Balancing: Thrift TSocketPool
![Page 20: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/20.jpg)
Load Balancing: Server Affinity
![Page 21: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/21.jpg)
Load Balancing: Server Affinity Algorithm
$serversNew = array();
$numServers = count($servers);
while ($numServers > 0) {
    // Take the first 4 chars of the md5sum of the server count
    // and the query, mod the available servers
    $key = hexdec(substr(md5($numServers . '+' . $query), 0, 4)) % ($numServers);
    $keySet = array_keys($servers);
    $serverId = $keySet[$key];

    // Push the chosen server onto the new list and remove it
    // from the initial list
    array_push($serversNew, $servers[$serverId]);
    unset($servers[$serverId]);
    --$numServers;
}
[“host2”, “host3”, “host1”, “host4”]
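The PHP loop above can be sketched in Java as follows. This is an illustrative port, not Etsy's code: the class and method names are made up here, and it assumes the server list has a stable order on every client, which the PHP version gets from the order of the $servers array.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

final class ServerAffinity {
    // Lowercase hex MD5 digest, the same format PHP's md5() returns.
    static String md5Hex(String s) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Returns the servers in the order this query should try them.
    static List<String> order(List<String> servers, String query) {
        List<String> remaining = new ArrayList<>(servers);
        List<String> ordered = new ArrayList<>();
        while (!remaining.isEmpty()) {
            int n = remaining.size();
            // First 4 hex chars of md5(count + "+" + query), mod the servers left.
            int key = Integer.parseInt(md5Hex(n + "+" + query).substring(0, 4), 16) % n;
            // Move the chosen server from the remaining list to the result.
            ordered.add(remaining.remove(key));
        }
        return ordered;
    }
}
```

Calling ServerAffinity.order(servers, "jewelry") twice with the same server list yields the same ordering, so a repeated query keeps landing on the same first-choice host and the rest of the list is its failover order.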
![Page 22: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/22.jpg)
Load Balancing: Server Affinity Algorithm
“jewelry” [“host2”, “host3”, “host1”, “host4”]
“scarf”
$key = hexdec(substr(md5($query),0,4))
[“host2”, “host3”, “host1”, “host4”]
![Page 23: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/23.jpg)
Load Balancing: Server Affinity Algorithm
“jewelry” [“host2”, “host3”, “host1”, “host4”]
“scarf” [“host2”, “host1”, “host4”, “host3”]
$key = hexdec(substr(md5($numServers . '+' . $query),0,4))%(count($servers));
![Page 24: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/24.jpg)
Load Balancing: Server Affinity Results
2% 20%
![Page 25: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/25.jpg)
Load Balancing: Server Affinity Caveats
• Stemming / Analysis
• Be wary of query distribution
![Page 26: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/26.jpg)
Replication
![Page 27: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/27.jpg)
Replication: The Problem
![Page 28: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/28.jpg)
Replication: The Problem
![Page 29: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/29.jpg)
Replication: Multicast Rsync?
![Page 30: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/30.jpg)
Replication: Multicast Rsync?
[15:25] <engineer> patrick: i'm gonna test multi-rsyncing some indexes from host1 to host2 and host3 in prod. I'll be watching the graphs and what not, but let me know if you see anything funky with the network
[15:26] <patrick> ok....
[15:31] <keyur> is the site down?
![Page 31: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/31.jpg)
Replication: Multicast Rsync?
![Page 32: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/32.jpg)
Hmm... BitTorrent?
![Page 33: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/33.jpg)
Replication: BitTorrent POC
Using BitTornado:
![Page 34: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/34.jpg)
Replication: BitTorrent + Solr
Fork of ttorrent: https://github.com/etsy/ttorrent
• Multi-File Support
• Performance Enhancements
![Page 35: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/35.jpg)
Replication: BitTorrent + Solr
![Page 36: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/36.jpg)
Replication: BitTorrent + Solr
![Page 37: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/37.jpg)
Replication: BitTorrent + Solr
![Page 38: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/38.jpg)
Replication: BitTorrent + Solr
![Page 39: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/39.jpg)
Solr InterOp
![Page 40: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/40.jpg)
QParsers
![Page 41: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/41.jpg)
“writing query strings is for suckers”
![Page 42: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/42.jpg)
![Page 43: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/43.jpg)
Solr InterOp: QParsers
http://host:8393/solr/person/select/?q=_query_:%22{!dismax%20qf=$fqf%20v=$fnq}%22%20OR%20(_query_:%22{!dismax%20qf=$fiqf%20v=$fiq}%22%20AND%20(_query_:%22{!dismax%20qf=$lwqf%20v=$lwq}%22%20OR%20_query_:%22{!dismax%20qf=$lqf%20v=$lq}%20%22))&fnq=%22giovanni%20fernandez-kincade%22&fqf=full_name^4&fiq=giovanni&fiqf=first_name^2.0%20first_name_syn&qt=standard&lwq=fernandez-kincade*&lwqf=last_name&lq=fernandez-kincade&lqf=last_name^3
![Page 44: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/44.jpg)
Solr InterOp: QParsers
http://host:8393/solr/person/select/?q={!personrealqp}giovanni%20fernandez-kincade
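With the custom parser selected through local params, a client only has to prefix the user's input and URL-encode it. A small sketch of that step; the class and method names are hypothetical, and it uses the JDK's form encoding, which writes a space as + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PersonQueryUrl {
    // Builds a select URL that routes the raw name through the
    // personrealqp parser via the {!...} local-params prefix.
    static String build(String selectUrl, String name) {
        try {
            String q = "{!personrealqp}" + name;
            return selectUrl + "?q=" + URLEncoder.encode(q, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is always available on the JVM.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(build("http://host:8393/solr/person/select/",
                                 "giovanni fernandez-kincade"));
    }
}
```

All of the field weights and sub-queries from the long URL now live in the parser on the server, so the client never has to assemble them.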
![Page 45: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/45.jpg)
Solr InterOp: QParsers
class PersonNameRealQParser extends QParser {
  public PersonNameRealQParser(String qstr, SolrParams localParams,
                               SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
  }
![Page 46: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/46.jpg)
Solr InterOp: QParsers
  @Override
  public Query parse() throws ParseException {
    TermQuery exactFullNameQuery = new TermQuery(new Term("full_name", qstr));
    exactFullNameQuery.setBoost(4.0f);

    String[] userQueryTerms = qstr.split("\\s+");
    Query firstLastQuery = null;

    if (2 == userQueryTerms.length)
      firstLastQuery = parseAsFirstAndLast(userQueryTerms[0], userQueryTerms[1]);
    else
      firstLastQuery = parseAsFirstOrLast(userQueryTerms);

    DisjunctionMaxQuery realNameQuery = new DisjunctionMaxQuery(0);
    realNameQuery.add(exactFullNameQuery);
    realNameQuery.add(firstLastQuery);

    return realNameQuery;
  }
![Page 47: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/47.jpg)
Solr InterOp: QParsers
The QParserPlugin that returns our new QParser:
public class PersonNameRealQParserPlugin extends QParserPlugin {
  public static final String NAME = "personrealqp";

  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new PersonNameRealQParser(qstr, localParams, params, req);
  }
}
![Page 48: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/48.jpg)
Solr InterOp: QParsers
Registering the plugin in solrconfig.xml:
<queryParser name="personrealqp" class="com.etsy.person.solr.PersonNameRealQParserPlugin" />
![Page 49: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/49.jpg)
Custom Stemmer
![Page 50: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/50.jpg)
Solr InterOp: Custom Stemmer
![Page 51: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/51.jpg)
Solr InterOp: Custom Stemmer
banded, banding, birding, bouldering, bounded, buffing, bundler, canning, carded, circled, coupler, dangler, doubler, firring, foiling, hooper, japanned, lipped, napped, papered, pebbled, pitted, pocketed, reductive, ricer, rooter, roper, seeded, shouldered, silvered, skinning, spindling, staining, stitcher, strapped, threaded, yellowing
![Page 52: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/52.jpg)
Solr InterOp: Custom Stemmer
First we extend KStemmer and intercept stem calls:
public class LStemmer extends KStemmer {
  /**.....**/

  @Override
  String stem(String term) {
    String override = overrideStemTransformations.get(term);
    if (override != null) return override;
    return super.stem(term);
  }
}
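The intercept pattern above can be tried without Solr on the classpath. In this sketch the fallback is a toy suffix stripper standing in for KStemmer, and the class name and override map contents are assumptions for illustration; only the lookup-then-delegate logic mirrors the slide.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

final class OverrideStemmer {
    private final Map<String, String> overrides;
    private final UnaryOperator<String> fallback;

    OverrideStemmer(Map<String, String> overrides, UnaryOperator<String> fallback) {
        this.overrides = new HashMap<>(overrides);
        this.fallback = fallback;
    }

    // Return the hand-picked stem if one exists, else defer to the real stemmer.
    String stem(String term) {
        String override = overrides.get(term);
        return override != null ? override : fallback.apply(term);
    }

    public static void main(String[] args) {
        // Toy stand-in for KStemmer: strips a trailing "ing" and nothing else.
        UnaryOperator<String> toy =
            t -> t.endsWith("ing") ? t.substring(0, t.length() - 3) : t;
        Map<String, String> overrides = new HashMap<>();
        overrides.put("banding", "banding"); // keep this term unstemmed
        OverrideStemmer stemmer = new OverrideStemmer(overrides, toy);
        System.out.println(stemmer.stem("banding"));
        System.out.println(stemmer.stem("walking"));
    }
}
```

Mapping a term to itself in the override table is one way to protect it from the underlying stemmer while leaving every other term untouched.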
![Page 53: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/53.jpg)
Solr InterOp: Custom Stemmer
Then create a TokenFilter that uses the new Stemmer:
final class LStemFilter extends TokenFilter {
  /**.....**/
  protected LStemFilter(TokenStream input, int cacheSize) {
    super(input);
    stemmer = new LStemmer(cacheSize);
  }

  @Override
  public boolean incrementToken() throws IOException { /**....**/ }
}
![Page 54: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/54.jpg)
Solr InterOp: Custom Stemmer
Create a FilterFactory that exposes it:
public class LStemFilterFactory extends BaseTokenFilterFactory {
  private int cacheSize = 20000;

  @Override
  public void init(Map<String, String> args) {
    super.init(args);
    String cacheSizeStr = args.get("cacheSize");
    if (cacheSizeStr != null) {
      cacheSize = Integer.parseInt(cacheSizeStr);
    }
  }

  @Override
  public TokenStream create(TokenStream in) {
    return new LStemFilter(in, cacheSize);
  }
}
![Page 55: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/55.jpg)
Solr InterOp: Custom Stemmer
And finally plug it into your analysis chain:
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.ASCIIFoldingFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="solr/common/conf/stopwords.txt"/>
  <filter class="com.etsy.solr.analysis.LStemFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
![Page 56: Solr at Etsy - Giovanni Fernandez-Kincade](https://reader033.fdocuments.us/reader033/viewer/2022052911/559e19d91a28ab9c4e8b45e3/html5/thumbnails/56.jpg)
Thanks!