Searching Images by Color Using Solr


description

Slides from "Searching 35 Million Images by Color Using Solr", presented by Chris Becker at Lucene/Solr Revolution 2014 in Washington, D.C.

Transcript of Searching Images by Color Using Solr

Page 1: Searching Images by Color Using Solr
Page 2: Searching Images by Color Using Solr

Searching Images by Color

Chris Becker

Search Engineering @ Shutterstock

Page 3: Searching Images by Color Using Solr

What is Shutterstock?

• Shutterstock sells stock images, videos & music.

• Crowdsourced from artists around the world

• Shutterstock reviews and indexes them for search

• Customers buy a subscription and download them

Page 4: Searching Images by Color Using Solr

Why search by color?

Page 5: Searching Images by Color Using Solr

Stock photography on the internet…

images from www.shutterstock.com

Page 6: Searching Images by Color Using Solr

Stock photography on the internet…

images from www.shutterstock.com

Page 7: Searching Images by Color Using Solr

Color is one of many visual attributes that you can use to create an engaging image search experience

Page 9: Searching Images by Color Using Solr

Diving into Color Data

Page 10: Searching Images by Color Using Solr

Color Spaces

• RGB

• HSL

• Lab

• LCH

images from www.wikipedia.org

Page 11: Searching Images by Color Using Solr

Calculating Distances Between Colors

• Euclidean distance works reasonably well in any color space

distRGB = sqrt((r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2)

distHSL = sqrt((h1-h2)^2 + (s1-s2)^2 + (l1-l2)^2)

distLCH = sqrt((L1-L2)^2 + (C1-C2)^2 + (H1-H2)^2)

distLAB = sqrt((L1-L2)^2 + (a1-a2)^2 + (b1-b2)^2)

• More sophisticated equations that better account for human perception can be found at http://en.wikipedia.org/wiki/Color_difference
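As a rough illustration of the distance formulas above, here is a minimal Python sketch that uses only the standard library. The helper names (`dist_rgb`, `rgb_to_hsl`, `dist_hsl`) are made up for this example and are not from the talk. The HSL version treats hue as a plain number, exactly as the formula is written, so it ignores the fact that hue wraps around.

```python
import colorsys
import math


def dist_rgb(c1, c2):
    """Euclidean distance between two (r, g, b) tuples with 0-255 channels."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def rgb_to_hsl(c):
    """Convert a 0-255 (r, g, b) tuple to (h, s, l), each scaled to 0.0-1.0."""
    # colorsys returns the components in H, L, S order
    h, l, s = colorsys.rgb_to_hls(*(v / 255.0 for v in c))
    return (h, s, l)


def dist_hsl(c1, c2):
    """Euclidean distance in HSL (hue wraparound is deliberately ignored)."""
    p1, p2 = rgb_to_hsl(c1), rgb_to_hsl(c2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


red, orange, blue = (255, 0, 0), (255, 153, 0), (0, 0, 255)
print(dist_rgb(red, orange), dist_rgb(red, blue))
print(dist_hsl(red, orange), dist_hsl(red, blue))
```

In both spaces red comes out closer to orange than to blue, which is the behavior a color search needs from its distance function.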

Page 12: Searching Images by Color Using Solr

Images are just numbers

[

[[054,087,058], [054,116,206], [017,226,194], [234,203,215], [188,205,000], [229,156,182]],

[[214,238,109], [064,190,104], [191,024,161], [104,071,036], [222,081,005], [204,012,113]],

[[197,100,189], [159,204,024], [228,214,054], [250,098,125], [050,144,093], [021,122,101]],

[[255,146,010], [115,156,002], [174,023,137], [161,141,077], [154,189,005], [242,170,074]],

[[113,146,064], [196,057,200], [123,203,160], [066,090,234], [200,186,103], [099,074,037]],

[[194,022,018], [226,045,008], [123,023,087], [171,029,021], [040,001,143], [255,083,194]],

[[115,186,246], [025,064,109], [029,071,001], [140,031,002], [248,170,244], [134,112,252]],

[[116,179,059], [217,205,159], [157,060,251], [151,205,058], [036,214,075], [107,103,130]],

[[052,003,227], [184,037,078], [161,155,181], [051,070,186], [082,235,108], [129,233,211]],

[[047,212,209], [250,236,085], [038,128,148], [115,171,113], [186,092,227], [198,130,024]],

[[225,210,064], [123,049,199], [173,207,164], [161,069,220], [002,228,184], [170,248,075]],

[[234,157,201], [168,027,113], [117,080,236], [168,131,247], [028,177,060], [187,147,084]],

[[184,166,096], [107,117,037], [154,208,093], [237,090,188], [007,076,086], [224,239,210]],

[[105,230,058], [002,122,240], [036,151,107], [101,023,149], [048,010,225], [109,102,195]],

[[050,019,169], [219,235,027], [061,064,133], [218,221,113], [009,032,125], [109,151,137]],

[[010,037,189], [216,010,101], [000,037,084], [166,225,127], [203,067,214], [110,020,245]],

[[180,147,130], [045,251,177], [127,175,215], [237,161,084], [208,027,218], [244,194,034]],

[[089,235,226], [106,219,220], [010,040,006], [094,138,058], [148,081,166], [249,216,177]],

[[121,110,034], [007,232,255], [214,052,035], [086,100,020], [191,064,105], [129,254,207]],

]

Page 13: Searching Images by Color Using Solr

Any operation you can do on a set of numbers, you can do on an image:

• getting histograms

• computing median values

• standard deviations / variance

• other statistics
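To make that concrete, the sketch below treats the first two rows of the pixel array from the earlier slide as plain lists of numbers and computes a median, a standard deviation and a coarse histogram with Python's standard library. The variable names and the 4-bin histogram are choices made for this example only.

```python
import statistics
from collections import Counter

# First two rows of the pixel array shown earlier, as [r, g, b] triples.
# Leading zeros are dropped because Python 3 does not allow them in integers.
pixels = [
    [[54, 87, 58], [54, 116, 206], [17, 226, 194],
     [234, 203, 215], [188, 205, 0], [229, 156, 182]],
    [[214, 238, 109], [64, 190, 104], [191, 24, 161],
     [104, 71, 36], [222, 81, 5], [204, 12, 113]],
]

# Flatten the rows into one list of pixels, then split it into channels
flat = [px for row in pixels for px in row]
channels = {name: [px[i] for px in flat] for i, name in enumerate("rgb")}

# Per-channel median and population standard deviation
stats = {
    name: {"median": statistics.median(vals), "stdev": statistics.pstdev(vals)}
    for name, vals in channels.items()
}

# Coarse histogram of the red channel: 4 bins, each 64 values wide
red_bins = Counter(v // 64 for v in channels["r"])

print(stats)
print(sorted(red_bins.items()))
```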

Page 14: Searching Images by Color Using Solr
Page 15: Searching Images by Color Using Solr

Extracting Color Data

Page 16: Searching Images by Color Using Solr

Tools & Libraries

• ImageMagick

• Python Imaging Library (PIL)

• ImageJ

Page 17: Searching Images by Color Using Solr

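# Python example to get a histogram from an image
from pprint import pprint

from PIL import Image

# Convert to RGB so that every color is an (r, g, b) tuple
image = Image.open('./samplephoto.jpg').convert('RGB')
width, height = image.size

# getcolors() returns a list of (count, (r, g, b)) tuples; passing
# width*height as maxcolors ensures it never returns None
colors = image.getcolors(width * height)

# Map each hex color to the number of pixels that have it
hist = {}
for count, (r, g, b) in colors:
    hist['%02x%02x%02x' % (r, g, b)] = count

pprint(hist)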

Page 18: Searching Images by Color Using Solr

Indexing & Searching

in Solr

Page 19: Searching Images by Color Using Solr

Indexing color histograms

color_txt = "cfebc2

cfebc2 cfebc2 cfebc2

cfebc2 cfebc2 cfebc2

cfebc2 cfebc2 cfebc2

95bf40 95bf40 95bf40

95bf40 95bf40 95bf40

2e6b2e 2e6b2e 2e6b2e

ff0000 …"

• index colors just like you would index text

• amount of color = frequency of the term
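One way to produce such a field value from a histogram is sketched below. The function name `histogram_to_color_txt`, the `total_terms` parameter and the sample pixel counts are assumptions made for illustration; the talk does not show how the field was actually generated.

```python
def histogram_to_color_txt(hist, total_terms=100):
    """Repeat each hex color in proportion to its share of the pixels.

    hist maps a hex color string to a pixel count. Colors whose share
    rounds down to zero terms are dropped from the output.
    """
    total_pixels = sum(hist.values())
    terms = []
    # Most common colors first, purely for readability of the output
    for color, count in sorted(hist.items(), key=lambda kv: -kv[1]):
        repeats = round(total_terms * count / total_pixels)
        terms.extend([color] * repeats)
    return " ".join(terms)


# Hypothetical pixel counts for a 1000-pixel image
hist = {"cfebc2": 500, "95bf40": 300, "2e6b2e": 150, "ff0000": 50}
color_txt = histogram_to_color_txt(hist, total_terms=20)
print(color_txt)
```

Capping the number of terms per document keeps the field size independent of image resolution, so a large image and a small image with the same color mix get the same term frequencies.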

Page 20: Searching Images by Color Using Solr

Solr Schema & Queries

• You can use Solr's default ranking effectively:

/solr/select?q=ff0000 e2c2d2&qf=color&defType=edismax…

• or use term frequencies directly for specific sort functions:

sort=product(tf(color,"ff0000"),tf(color,"e2c2d2")) desc

<field name="color" type="text_ws" …>
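The two query styles above can be assembled programmatically. This is a minimal sketch using only `urllib.parse`; the host and port in `SOLR_SELECT` and the helper names are hypothetical, while the parameter names and the sort expression follow the slide.

```python
from urllib.parse import urlencode

SOLR_SELECT = "http://localhost:8983/solr/select"  # hypothetical endpoint


def tf_product_sort(field, colors):
    """Build a sort clause that multiplies the term frequency of each color."""
    tfs = ['tf(%s,"%s")' % (field, c) for c in colors]
    if len(tfs) == 1:
        return "%s desc" % tfs[0]
    return "product(%s) desc" % ",".join(tfs)


def color_query_url(colors, field="color", sort_by_tf=False):
    """Build a select URL that searches the color field with edismax."""
    params = {"q": " ".join(colors), "qf": field, "defType": "edismax"}
    if sort_by_tf:
        # Replace the default relevance ranking with an explicit tf-based sort
        params["sort"] = tf_product_sort(field, colors)
    return SOLR_SELECT + "?" + urlencode(params)


print(color_query_url(["ff0000", "e2c2d2"]))
print(color_query_url(["ff0000", "e2c2d2"], sort_by_tf=True))
```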

Page 21: Searching Images by Color Using Solr

Indexing color statistics

lightness:

median: 2

standard dev: 1

largest bin: 0

largest bin size: 50

saturation:

median: 0

standard dev: 0

largest bin: 0

largest bin size: 100

Represent aggregate statistics of each image
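A sketch of how such per-channel statistics might be computed is shown below. The bin count, the 0.0-1.0 input range, the rounding and the dictionary keys are all assumptions for this example; the slide only shows the resulting values.

```python
import statistics
from collections import Counter


def channel_stats(values, bins=10):
    """Summarize one channel (values in 0.0-1.0) as small numbers to index."""
    # Quantize every value into one of `bins` equal-width bins
    binned = [min(int(v * bins), bins - 1) for v in values]
    largest_bin, largest_count = Counter(binned).most_common(1)[0]
    return {
        "median": statistics.median(binned),
        "stdev": round(statistics.pstdev(binned)),
        "largest_bin": largest_bin,
        # Share of the pixels that fall in the largest bin, as a percentage
        "largest_bin_size": round(100 * largest_count / len(binned)),
    }


# Hypothetical lightness values for a mostly dark 10-pixel image
lightness = [0.05] * 5 + [0.25] * 3 + [0.35] * 2
print(channel_stats(lightness))
```

Each value in the result could then be stored in its own numeric Solr field, one set per channel.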

Page 22: Searching Images by Color Using Solr

Solr Fields & Queries

• Sort by the distance between the input param and the median value for each image

/solr/select?q=*&sort=abs(sub($query,hue_median)) asc

<field name="hue_median" type="int" …>
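The sketch below emulates that sort in plain Python on a few made-up documents, then builds the matching request parameters. It assumes that `$query` in the sort expression refers to a request parameter named `query`; the document ids and hue values are hypothetical.

```python
from urllib.parse import urlencode

# Hypothetical documents, each with a precomputed median hue
docs = [
    {"id": "a", "hue_median": 10},
    {"id": "b", "hue_median": 200},
    {"id": "c", "hue_median": 120},
]


def rank_by_hue(docs, query_hue):
    """Order documents by |query_hue - hue_median|, smallest distance first."""
    return sorted(docs, key=lambda d: abs(query_hue - d["hue_median"]))


ranked = [d["id"] for d in rank_by_hue(docs, 130)]
print(ranked)

# The same ordering expressed as Solr request parameters
params = urlencode({
    "q": "*",
    "query": 130,  # assumed to be the value read by $query
    "sort": "abs(sub($query,hue_median)) asc",
})
print("/solr/select?" + params)
```

Note that a plain absolute difference treats hue as a linear scale, so hues on either side of the wraparound point are ranked as far apart.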

Page 23: Searching Images by Color Using Solr

Ranking & Relevance

Page 24: Searching Images by Color Using Solr

How much of the image has the color [color swatch]?

image from www.shutterstock.com

Page 25: Searching Images by Color Using Solr

Is this relevant if I search for [color swatch]?

image from www.shutterstock.com

Page 26: Searching Images by Color Using Solr

Which image is more relevant if I search for [color swatch]?

image from www.shutterstock.com

Page 27: Searching Images by Color Using Solr

Is this relevant if I search for [color swatch]?

image from www.shutterstock.com

Page 28: Searching Images by Color Using Solr

How do we account for these factors?

Page 29: Searching Images by Color Using Solr

How much of the image contains the selected color?

• Score each color by the number of pixels

sort=tf(color,"cfebc2") desc

Page 30: Searching Images by Color Using Solr

Balance Precision and Recall

• Reduce your colorspace enough to balance:

• color accuracy

• index size

• query complexity

• result counts

• You only need 100-200 colors for a good UX
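One simple way to reduce the colorspace is to snap every channel to a small number of levels. The sketch below uses 5 levels per channel, which gives 5 × 5 × 5 = 125 colors, inside the 100-200 range mentioned above. The function name and the uniform RGB bucketing are assumptions for this example; the talk does not say which reduction was used.

```python
def quantize(rgb, levels=5):
    """Snap an (r, g, b) tuple to the center of its bucket.

    Each 0-255 channel is split into `levels` equal buckets, so the
    reduced palette has levels**3 colors. Returns a hex string.
    """
    step = 256 / levels
    snapped = [int((v * levels // 256 + 0.5) * step) for v in rgb]
    return "%02x%02x%02x" % tuple(snapped)


# Two slightly different reds collapse into the same indexed term
print(quantize((255, 0, 0)), quantize((250, 5, 5)))
```

Fewer levels mean a smaller index and more results per color, at the cost of color accuracy.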

Page 31: Searching Images by Color Using Solr

Weighing Multiple Colors Together

• If you search for 2 or more colors, the top result should have the most even distribution of those colors

• simple option: sort=product(tf(color,"ff9900"),tf(color,"2280e2")) desc

• more complex: compute the standard deviation or variance of the term frequencies of matching color values for each image, and sort the results with the lowest variance first.
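Both options can be compared on a toy example. In this sketch the image names and term frequencies are invented, and the ranking is done client-side in Python rather than inside Solr.

```python
import math
import statistics

# Hypothetical term frequencies of each color per image
images = {
    "even": {"ff9900": 50, "2280e2": 50},
    "skewed": {"ff9900": 90, "2280e2": 10},
}
query = ["ff9900", "2280e2"]


def tfs(name):
    """Term frequencies of the queried colors for one image (0 if absent)."""
    return [images[name].get(color, 0) for color in query]


# Simple option: highest product of term frequencies first
by_product = sorted(images, key=lambda n: math.prod(tfs(n)), reverse=True)

# More complex option: lowest variance of term frequencies first
by_variance = sorted(images, key=lambda n: statistics.pvariance(tfs(n)))

print(by_product, by_variance)
```

Both orderings put the evenly split image first here. Variance alone does not reward the total amount of the colors, so in practice it would likely need to be combined with a measure of coverage.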

Page 32: Searching Images by Color Using Solr

Weighing Similar & Different Colors

• The score for one color should reflect all the colors in the image.

• At indexing time, increase the score based on similar colors; decrease it based on differing colors.
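The talk does not give a formula for this adjustment, so the following is only one possible sketch. It assumes a linear weight that falls from +1 for an identical color to -1 for colors more than twice `radius` apart in RGB; the function name, the `radius` value and the clamping are all invented for illustration.

```python
import math


def adjusted_scores(hist, radius=100.0):
    """Adjust each color's pixel count using the other colors in the image.

    hist maps (r, g, b) tuples to pixel counts. Nearby colors add to a
    color's score, distant colors subtract from it, and the result is
    clamped at zero.
    """
    scores = {}
    for color in hist:
        total = 0.0
        for other, count in hist.items():
            # +1 for the color itself, decreasing linearly, floored at -1
            weight = max(-1.0, 1.0 - math.dist(color, other) / radius)
            total += count * weight
        scores[color] = max(0.0, total)
    return scores


red = (255, 0, 0)
warm = adjusted_scores({red: 50, (230, 25, 25): 50})   # red plus a similar red
mixed = adjusted_scores({red: 50, (0, 0, 255): 50})    # red plus blue
print(warm[red], mixed[red])
```

With the same number of red pixels, the image that is red throughout ends up with a higher score for red than the half-blue image, which is the effect the bullets above describe.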

Page 33: Searching Images by Color Using Solr

Conclusion

Page 34: Searching Images by Color Using Solr

Conclusion

• Steps for building color search in Solr:

• Extract colors using a tool like the Python Imaging Library

• Score colors based on the number of pixels

• Adjust scores based on similar / different colors

• Index colors into Solr as text

• In your query, sort by the term frequency values for each color

Page 35: Searching Images by Color Using Solr

One more demo…