Indextank east bay ruby meetup slides

Finding anything: Real-time search with IndexTank Tim Spence April 19, 2011

Transcript of Indextank east bay ruby meetup slides

Page 1: Indextank east bay ruby meetup slides

Finding anything: Real-time search with IndexTank

Tim Spence

April 19, 2011

Page 2: Indextank east bay ruby meetup slides

About the Presenter

Tim Spence

● Senior Infrastructure Engineer at MedHelp (http://www.medhelp.org/)

● Former .NET developer

● Recently converted to Ruby

● In love with Open Source Software

● More at http://whyhello.im/tim

Page 3: Indextank east bay ruby meetup slides

Agenda

● State of search today

● Quick survey: how much time/effort did YOU spend implementing search on your webapp?

● Examples of services that need improved search

● IndexTank to the rescue

● Case study: reddit.com

Page 4: Indextank east bay ruby meetup slides

Agenda, continued

● How I found out about IndexTank

● Two apps I built with IndexTank

● Live Demo

Page 6: Indextank east bay ruby meetup slides

The State of Search Today

● Not well implemented at all

– Search works, but...

– Barely

● How many pages of results do you typically browse through before finding what you were looking for?

● Or do you give up and head for Google site search instead?

Page 7: Indextank east bay ruby meetup slides

Survey Time!

● How much time/effort did YOU spend implementing search on your webapp?

● How many times have you iterated on your search feature?

● When was the last time someone thanked you for building a powerful, reliable search feature for your webapp?

Page 8: Indextank east bay ruby meetup slides

My Opinion

● Search as an in-app feature is an afterthought

● Minimal implementation is the norm

● If it weren't for MySQL/MS-SQL full-text indexing, most apps probably wouldn't even have a search feature

● Most good web apps don't make it easy for users to find specific content outside of predetermined navigation

Page 9: Indextank east bay ruby meetup slides

Let's pick on some apps!

● These are companies with great products, but their search comes up short

● Don't worry–they can take it!

Page 10: Indextank east bay ruby meetup slides

App #1: GitHub

Page 12: Indextank east bay ruby meetup slides

App #1: GitHub

● Interface is decent

– Search repos, code, users, or everything

– Search by language

● However...

– Can't do much with results but browse

– Check out this example

Page 13: Indextank east bay ruby meetup slides

App #1: GitHub

Page 14: Indextank east bay ruby meetup slides

App #1: GitHub

● Why these results aren't so hot

– Can't search by most recently maintained

– Can't search by most popular (most watched)

– Are you ready to browse 1,297 results?

● Advanced search capabilities exist, but the interface isn't the best

– Recency/popularity are implemented, but require specific arguments

Page 15: Indextank east bay ruby meetup slides

App #2: Amazon Web Services

● “Hey, I bet I can find an AMI from the community for the exact EC2 setup I need”

● Fact: probably not

Page 16: Indextank east bay ruby meetup slides

App #2: Amazon Web Services

Page 17: Indextank east bay ruby meetup slides

App #2: Amazon Web Services

● Notice something missing?

– No search

– Only sort by date, title

● Ready to browse 934 results?

– I'd rather build my own AMI

● Incredible missed opportunity

– OS search

– Stack search

– etc...

Page 18: Indextank east bay ruby meetup slides

Fact: GitHub & Amazon aren't the only ones

● Lots of good web services

● Massive quantities of quality content

● Unfortunately not discoverable in meaningful ways

Page 19: Indextank east bay ruby meetup slides

Interlude: Sites with great search

● Foodspotting

– Proximity

– Recency

– Rating

● MedHelp

– Content category

– Promoted content

● Other sites I overlooked? Whose search do you like?

Page 20: Indextank east bay ruby meetup slides

What was the point of that last slide?

● Search can be useful if it is valued as a feature

● Any company willing to invest the resources can build and host a high-quality search engine

● However, must you roll your own?

Page 21: Indextank east bay ruby meetup slides

Enter Search as a Service

● No need for you to invest in additional infrastructure

● No need to reinvent the wheel

– Search is a solved problem

– Let the experts refine it

Page 22: Indextank east bay ruby meetup slides

IndexTank to the rescue!

● Hosted: no load on your infrastructure

● Powerful

– We'll get into the details next

● Always improving

– Search IS their product

● Freemium

● Easy to implement

Page 23: Indextank east bay ruby meetup slides

Let's talk features

● Real-time search

– Real-time indexing: results immediately available

● Custom scoring

● Autocomplete

● Faceting

● Geo search

● Advanced text search

Page 24: Indextank east bay ruby meetup slides

Real-time search

● Real-time indexing

– Results immediately available

● Index multiple docs/sec

● Overwrite existing docs as you wish

– Changes also immediately available
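
To make "immediately available" concrete, here is a minimal Ruby sketch of a toy in-memory inverted index. It is an illustration only: `TinyIndex` and its methods are hypothetical names made up for this example, not the IndexTank client API, and a hosted service does far more than this. The point is just that a document is searchable as soon as `add` returns, and that re-adding the same id replaces the old version.

```ruby
require 'set'

# Toy inverted index (hypothetical, for illustration only).
class TinyIndex
  def initialize
    @docs = {}                                      # doc id => text
    @postings = Hash.new { |h, k| h[k] = Set.new }  # term => doc ids
  end

  # Adding an id that already exists overwrites the old document.
  def add(id, text)
    delete(id) if @docs.key?(id)   # drop stale postings first
    @docs[id] = text
    tokenize(text).each { |term| @postings[term] << id }
  end

  def delete(id)
    tokenize(@docs.delete(id).to_s).each { |term| @postings[term].delete(id) }
  end

  # Returns the ids of documents containing every query term.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?
    terms.map { |t| @postings.key?(t) ? @postings[t] : Set.new }
         .reduce(:&).to_a.sort
  end

  private

  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/).uniq
  end
end

index = TinyIndex.new
index.add('post-1', 'Real-time search with IndexTank')
index.search('search')                        # finds post-1 right away
index.add('post-1', 'Hosted indexing service')
index.search('search')                        # post-1 no longer matches
```

The overwrite path removes the old postings before adding the new ones, which is why the second search stops matching.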

Page 25: Indextank east bay ruby meetup slides

Custom Scoring

● Implementer has full control over how results are returned

● Choose which fields are searched

● Use pre-written scoring functions

● Or write your own
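
As a rough sketch of what "write your own" scoring can mean, the Ruby below ranks the same matched documents three different ways. Everything here is an assumption for illustration: the `Doc` fields, the `SCORERS` table, and the formulas are made up, and this is not how IndexTank defines or stores scoring functions.

```ruby
# Hypothetical scoring sketch: each scorer maps a matched document to a
# number, and results are returned in descending score order.
Doc = Struct.new(:id, :relevance, :votes, :age_seconds)

SCORERS = {
  newest:   ->(d) { -d.age_seconds },                      # most recent first
  relevant: ->(d) { d.relevance },                         # text match only
  hot:      ->(d) { d.relevance * Math.log(d.votes + 1) }  # relevance boosted by popularity
}.freeze

def rank(docs, scorer)
  score = SCORERS.fetch(scorer)
  docs.sort_by { |d| -score.call(d) }.map(&:id)
end

docs = [
  Doc.new('a', 1.0, 0,   10),
  Doc.new('b', 0.5, 100, 5000),
  Doc.new('c', 0.9, 3,   5)
]
rank(docs, :newest)    # c, a, b
rank(docs, :relevant)  # a, c, b
rank(docs, :hot)       # b, c, a
```

Keeping each formula as a small named function makes it cheap to try a new ranking without touching the indexed data.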

Page 26: Indextank east bay ruby meetup slides

Custom Scoring

Page 27: Indextank east bay ruby meetup slides

Everyone loves autocomplete

● Saves users time

● Potentially avoids spelling errors

– Not for hunt-and-peck typists

● Adds a degree of intelligence to the search process
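
For a feel of the mechanics, here is a minimal prefix-autocomplete sketch in Ruby. It assumes a small term list held in memory and uses a binary search to find the first candidate; the `Autocomplete` class is a made-up name, and this is not a description of how IndexTank's autocomplete is implemented.

```ruby
# Prefix autocomplete over a sorted term list (illustrative sketch).
class Autocomplete
  def initialize(terms)
    @terms = terms.map(&:downcase).uniq.sort
  end

  # Returns up to `limit` terms that start with `prefix`.
  def suggest(prefix, limit = 5)
    prefix = prefix.downcase
    return [] if prefix.empty?
    # Binary search for the first term that sorts at or after the prefix.
    start = @terms.bsearch_index { |t| t >= prefix }
    return [] if start.nil?
    @terms[start..-1].take_while { |t| t.start_with?(prefix) }.first(limit)
  end
end

ac = Autocomplete.new(%w[ruby rubygems rails rack Redis reddit])
ac.suggest('ru')     # ruby, rubygems
ac.suggest('r', 2)   # rack, rails
```

Because the list is sorted, all terms sharing a prefix sit next to each other, so one lookup plus a short scan is enough.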

Page 28: Indextank east bay ruby meetup slides

Faceting

● Does it make sense for you to categorize documents in your index?

– In all cases, YES

● Consider your advanced users and the narrow results they seek

– Don't make anyone sift through irrelevant results
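
A small Ruby sketch of the idea, under stated assumptions: documents are plain hashes, the field names (`:company`, `:city`) and sample data are invented, and `facet_counts` / `narrow` are hypothetical helpers rather than IndexTank calls. Facet counts tell the user how many results fall under each category value; picking a value narrows the result set.

```ruby
# Count how many documents fall under each value of each facet field.
def facet_counts(docs, fields)
  fields.each_with_object({}) do |field, counts|
    counts[field] = docs.group_by { |d| d[field] }
                        .map { |value, group| [value, group.size] }
                        .to_h
  end
end

# Keep only the documents matching every chosen facet value.
def narrow(docs, filters)
  docs.select { |d| filters.all? { |field, value| d[field] == value } }
end

attendees = [
  { name: 'Ann', company: 'Acme',    city: 'Austin' },
  { name: 'Bob', company: 'Acme',    city: 'Oakland' },
  { name: 'Cy',  company: 'Initech', city: 'Austin' }
]
facet_counts(attendees, [:company, :city])
narrow(attendees, { company: 'Acme', city: 'Austin' })  # just Ann
```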

Page 29: Indextank east bay ruby meetup slides

Faceting

Page 30: Indextank east bay ruby meetup slides

Geo

● It's 2011

– Location is more relevant than ever before

– Mobile is skyrocketing: every client has a GPS

● IndexTank has built-in geo proximity search capability
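
To show what proximity search boils down to, here is a hedged Ruby sketch using the haversine great-circle formula and a radius filter. This is the general technique, not IndexTank's implementation; the helper names, the 6371 km Earth radius, and the sample coordinates are assumptions for the example.

```ruby
EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation

# Great-circle distance between two lat/lon points, in kilometers.
def haversine_km(lat1, lon1, lat2, lon2)
  to_rad = Math::PI / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = Math.sin(dlat / 2)**2 +
      Math.cos(lat1 * to_rad) * Math.cos(lat2 * to_rad) * Math.sin(dlon / 2)**2
  2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a))
end

# Names of places within radius_km of the given point, nearest first.
def nearby(places, lat, lon, radius_km)
  places.map { |place| [place, haversine_km(lat, lon, place[:lat], place[:lon])] }
        .select { |_, km| km <= radius_km }
        .sort_by { |_, km| km }
        .map { |place, _| place[:name] }
end

places = [
  { name: 'San Francisco', lat: 37.7749, lon: -122.4194 },
  { name: 'Austin',        lat: 30.2672, lon: -97.7431 },
  { name: 'Oakland',       lat: 37.8044, lon: -122.2712 }
]
nearby(places, 37.8716, -122.2727, 50)  # searching from Berkeley
```

Scanning every place works for a demo; a real index would avoid computing the distance for every document.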

Page 31: Indextank east bay ruby meetup slides

Geo

Page 32: Indextank east bay ruby meetup slides

Advanced Text Search (Beta)

● Fuzzy search (Did you mean...?)

● Stemming

– Alternate word forms (tense, possession, etc...)

● Alternate spellings

– Misspellings
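
One common way to get a "Did you mean...?" suggestion is edit distance, sketched below in Ruby. This is an assumption about the general technique, not a claim about what IndexTank's beta does internally; `edit_distance`, `did_you_mean`, and the cutoff of 2 are illustrative choices. Stemming is a separate step and is not covered here.

```ruby
# Levenshtein edit distance: the minimum number of single-character
# insertions, deletions, and substitutions turning a into b.
def edit_distance(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Suggest the closest known term, or nil if nothing is close enough.
def did_you_mean(query, vocabulary, max_distance = 2)
  best = vocabulary.min_by { |term| edit_distance(query.downcase, term.downcase) }
  return nil if best.nil?
  edit_distance(query.downcase, best.downcase) <= max_distance ? best : nil
end

did_you_mean('redit', %w[indextank heroku reddit])      # reddit
did_you_mean('indextnak', %w[indextank heroku reddit])  # indextank
```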

Page 33: Indextank east bay ruby meetup slides

Other Benefits

● Zero maintenance

● Scalability included for free

● Easy implementation

– Clients available in many languages

– Excellent documentation: let's check it out

● Excellent support

– Humans or bots? You decide

● Dog food: their site search is done well

Page 35: Indextank east bay ruby meetup slides

Case Study: reddit.com

● High-traffic news aggregator (> 1 billion page views/month) with tons of content

● Who remembers how bad reddit's search was?

– When it even worked

● Can't blame them for trying

– Many attempts, but none worked

● IndexTank excelled in all areas

● Let's check it out now

Page 36: Indextank east bay ruby meetup slides

My experience with IndexTank

● Discovered through Heroku/IndexTank contest

● Built my first irl Rails app in an afternoon/evening w/ fellow hacker Chris Saylor (@cwsaylor)

● Didn't win the contest but learned how easy it is to quickly create highly targeted search

Page 37: Indextank east bay ruby meetup slides

App #1: Toxosis

● Searchable database of toxic release data supplied by U.S. E.P.A.

● Hosted at http://toxosis.heroku.com/

● Search enabled on many fields including city/state/zip, toxin

● Additional fields can be added to index

– When I have time, of course...

Page 38: Indextank east bay ruby meetup slides

More personal backstory

● Still in the business of reinventing myself as a Rails developer

● How to get a Rails gig? Develop multiple Rails apps and show them off

● Opportunities are everywhere: contests, hackathons, and weekend hacks for the developer community

Page 39: Indextank east bay ruby meetup slides

App #2: SXSWdex

● Searchable database of 2011 SXSW attendees

● Hosted at http://sxswdex.heroku.com/

● Design goal: do a better job than the official SXSW site

● Search within bio, company, location, name

● Facets: company, city/state

Page 40: Indextank east bay ruby meetup slides

The moment we've all been waiting for

● Let's build an app!

Page 41: Indextank east bay ruby meetup slides

Questions?

● Q&A time with an IndexTank engineer