Final Project Presentation

Knowledge Graph Based Keyword Update

Ashwin Kumar, IIT Delhi

Media.net, Directi

January 20, 2014

Mentor: Jigar Patel


Problem Statement

Given a keyword (in its best form), identify whether it is about a product for which a newer version is available in the market. If so, update the keyword accordingly.

Examples.

Buy new iPhone 3G => Buy new iPhone 5

Samsung Galaxy S III Reviews => Samsung Galaxy S4 Reviews

Apple iPad 2 16GB => Apple iPad Mini 16GB


Approach Followed

Created a graph containing all entities.

Entities have attributes associated with them.

Two kinds of relationships exist between entities:

Parent-Child

Successor-Predecessor


Knowledge Graph

root
    smartphones
        apple
            iphone 3GS
            iphone 4
            iphone 4S
            iphone 5
        samsung
        . . .
    automobiles
    . . .


Knowledge Graph

iphone 5

Name            iphone 5
Brand           apple inc.
Developer
Manufacturer    foxconn
Type            smartphone
Release Date    2012
End Date        present
Parent          apple
Successor       -
Predecessor     iphone 4S
Children        null
Website         www.apple.com/iphone
External Links
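The attribute list for a node maps naturally onto a small record type. The following is a minimal sketch, not the project's actual schema: the class name `Entity`, the field names, the `add_entity` helper, and the dictionary-based graph are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Entity:
    """One node of the knowledge graph (field names are assumptions)."""
    name: str
    brand: Optional[str] = None
    developer: Optional[str] = None
    manufacturer: Optional[str] = None
    type: Optional[str] = None
    release_year: Optional[int] = None
    end_year: Optional[int] = None  # None stands for "present"
    parent: Optional[str] = None
    successor: Optional[str] = None
    predecessor: Optional[str] = None
    children: List[str] = field(default_factory=list)
    website: Optional[str] = None
    external_links: List[str] = field(default_factory=list)


# In this sketch the graph is a plain name -> Entity mapping.
graph: Dict[str, Entity] = {}


def add_entity(entity: Entity) -> None:
    """Insert an entity and register it under its parent, if the parent is known."""
    graph[entity.name] = entity
    parent = graph.get(entity.parent) if entity.parent else None
    if parent is not None and entity.name not in parent.children:
        parent.children.append(entity.name)


add_entity(Entity(name="apple", parent="smartphones"))
add_entity(Entity(name="iphone 5", brand="apple inc.", manufacturer="foxconn",
                  type="smartphone", release_year=2012, parent="apple",
                  predecessor="iphone 4S", website="www.apple.com/iphone"))
```

Keeping parent and children as names rather than object references keeps the record easy to store in a table, at the cost of one dictionary lookup per hop.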


Step1: Sources of Data

Looked at three different sources of data.

Wikipedia

Has roughly 7.5 million entities.

DBpedia

Contains datasets extracted from Wikipedia dumps.

Last updated in May 2012.

Too out of date to be useful for recently released products.

Freebase

Stores Wikipedia entities in RDF format.

Wikipedia turned out to be the best available data source.


Step1: Wikipedia Data Extraction

Downloaded the latest available Wikipedia dump (June 14, 2013; 42 GB).

Wrote a custom parser to extract the relevant information from each Wikipedia page.

Created four tables.

Wikipedia Categories

Wikipedia Infobox

Wikipedia Redirection

Wikipedia External Links

Table creation took roughly 25 hours.
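The slides do not show the parser itself. As a rough sketch of how the four tables could be filled in a single streaming pass, the code below uses the standard library's `xml.etree.ElementTree.iterparse`. The function names, the regular expressions, and the inline `SAMPLE` document are assumptions for illustration; real wikitext is far messier than these patterns handle.

```python
import io
import re
import xml.etree.ElementTree as ET

CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)")
EXTLINK_RE = re.compile(r"\[(https?://[^\s\]]+)")
# Crude: picks up every "| key = value" line on a page that contains an infobox.
FIELD_RE = re.compile(r"^[ \t]*\|[ \t]*([\w ]+?)[ \t]*=[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def local(tag):
    """Strip the XML namespace that MediaWiki export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def parse_dump(stream):
    """Yield one record per <page>, streaming so the dump never sits in memory."""
    title, text, redirect = None, "", None
    for _, elem in ET.iterparse(stream, events=("end",)):
        tag = local(elem.tag)
        if tag == "title":
            title = elem.text
        elif tag == "redirect":
            redirect = elem.get("title")
        elif tag == "text":
            text = elem.text or ""
        elif tag == "page":
            has_infobox = "{{Infobox" in text
            yield {
                "title": title,
                "redirect": redirect,
                "categories": [c.strip() for c in CATEGORY_RE.findall(text)],
                "infobox": dict(FIELD_RE.findall(text)) if has_infobox else {},
                "external_links": EXTLINK_RE.findall(text),
            }
            title, text, redirect = None, "", None
            elem.clear()  # release the finished page's subtree


# Tiny hand-written stand-in for the real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.8/">
<page><title>IPhone 5</title><revision><text>{{Infobox mobile phone
| manufacturer = Foxconn
| released = September 21, 2012
}}
The iPhone 5 is a smartphone. [http://www.apple.com/iphone Official site]
[[Category:IPhone]]</text></revision></page>
<page><title>Iphone four</title><redirect title="IPhone 4" />
<revision><text>#REDIRECT [[IPhone 4]]</text></revision></page>
</mediawiki>"""

pages = list(parse_dump(io.BytesIO(SAMPLE.encode("utf-8"))))
```

Each yielded record carries the data for one row in each of the four tables; on the real dump the stream would be the opened XML file rather than an in-memory buffer.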


Step1: Data Collection


Step2: Inserting Entities

Targeted approach for different classes of products.

Classes covered so far:

Smartphones

Automobiles

iPods/iPads

Cameras


Step2: Smartphones

Entity Identification.

Pages in categories Nokia mobile phones, Samsung mobile phones, Sony Ericsson mobile phones etc.

Pages in categories Smartphones, Touchscreen mobile phones, Multi-touch mobile phones etc.

Entity Classification.

Based on “manufacturer” / “developer” / “brand”.
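One way to read the two steps above: a page becomes an entity if it belongs to one of the listed categories, and its parent node is taken from an infobox field. The sketch below assumes page records with `categories` and `infobox` keys. The function names and the priority order of the three fields are assumptions, since the slides do not say how conflicting fields are resolved.

```python
BRAND_CATEGORIES = {
    "Nokia mobile phones",
    "Samsung mobile phones",
    "Sony Ericsson mobile phones",
}
TYPE_CATEGORIES = {
    "Smartphones",
    "Touchscreen mobile phones",
    "Multi-touch mobile phones",
}
# Assumed priority: the first non-empty field wins.
PARENT_FIELDS = ("manufacturer", "developer", "brand")


def is_smartphone(page):
    """A page counts as a smartphone entity if it sits in any listed category."""
    return bool(set(page["categories"]) & (BRAND_CATEGORIES | TYPE_CATEGORIES))


def classify(page):
    """Return the parent node name from the infobox, or None if no field is filled."""
    for key in PARENT_FIELDS:
        value = page["infobox"].get(key, "").strip()
        if value:
            return value.lower()
    return None


page = {
    "categories": ["Samsung mobile phones", "Android devices"],
    "infobox": {"manufacturer": "Samsung Electronics"},
}
if is_smartphone(page):
    print(classify(page))
```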


Step2: Smartphones

How to get release date?

Infobox, using one of the following fields:

Available

Releasedate

Released

Production

Model Years


Step2: Smartphones

How to get release date?

First paragraph of the Wikipedia article

Apple held an event to formally introduce the phone on September 12, 2012.

The beTouch E110 was released on February 15, 2010.

It is the fifth generation of the iPhone, succeeding the iPhone 4, and was announced on October 4, 2011.

Categories

Ford Freestar.

Only applicable to automobiles.
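The lookup order described on these slides (infobox fields first, then the first paragraph) can be sketched as follows. The function names, the list of cue words, and the year-only granularity are assumptions; the category-based route for automobiles is left out.

```python
import re

# Infobox keys named on the slides, assumed to be stored in lowercase.
INFOBOX_DATE_FIELDS = ("available", "releasedate", "released", "production", "model years")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
# Assumed cue words that mark a sentence as talking about the release.
CUE_RE = re.compile(r"\b(?:released|introduced?|announced|launched)\b", re.IGNORECASE)


def first_year(text):
    """Return the first four-digit year in the text, or None."""
    match = YEAR_RE.search(text)
    return int(match.group(0)) if match else None


def release_year(infobox, first_paragraph):
    # 1. Prefer an explicit infobox field.
    for key in INFOBOX_DATE_FIELDS:
        year = first_year(infobox.get(key, ""))
        if year is not None:
            return year
    # 2. Otherwise use the first sentence that has both a cue word and a year.
    for sentence in re.split(r"(?<=[.!?])\s+", first_paragraph):
        if CUE_RE.search(sentence):
            year = first_year(sentence)
            if year is not None:
                return year
    return None


print(release_year({}, "The beTouch E110 was released on February 15, 2010."))
```

Only the year is kept here, which matches the year-level release dates shown elsewhere in the deck.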


Step2: Smartphones

Algorithm of Keyword Replacement

Generate Ngrams of the given keyword.

For each Ngram, check whether it matches any entity in the graph, and collect all matching Ngrams.

Merge shorter Ngrams into the longer Ngrams that contain them to get a filtered list.

The keyword is subject to replacement only if the filtered list contains exactly one Ngram.
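The four steps above can be sketched in a few lines. This is an illustration under assumptions, not the original implementation: entity names are held in a lowercase set, Ngrams are word-level and contiguous, and all function names are made up for the example.

```python
def ngrams(tokens):
    """All contiguous token spans as (start, end, text), with end exclusive."""
    n = len(tokens)
    return [(i, j, " ".join(tokens[i:j])) for i in range(n) for j in range(i + 1, n + 1)]


def matching_ngrams(keyword, entity_names):
    """Ngrams of the keyword that are also entity names."""
    tokens = keyword.lower().split()
    return [(i, j, text) for i, j, text in ngrams(tokens) if text in entity_names]


def filter_matches(matches):
    """Drop every match whose span lies inside a strictly longer matching span."""
    return [
        (i, j, text)
        for i, j, text in matches
        if not any(a <= i and j <= b and (b - a) > (j - i) for a, b, _ in matches)
    ]


def replaceable_entity(keyword, entity_names):
    """Return the single surviving entity name, or None if there are zero or several."""
    filtered = filter_matches(matching_ngrams(keyword, entity_names))
    return filtered[0][2] if len(filtered) == 1 else None


ENTITY_NAMES = {"iphone", "iphone 3g", "samsung", "samsung galaxy s"}
print(replaceable_entity("buy new iphone 3G now", ENTITY_NAMES))
print(replaceable_entity("iphone 3G vs samsung galaxy S", ENTITY_NAMES))
```

Tracking token positions rather than bare strings lets the merge step tell an Ngram contained in a longer match apart from a separate occurrence of the same words elsewhere in the keyword.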


Step2: Smartphones

Example1.

Keyword: buy new iphone 3G now

Two Ngram matches: iphone, iphone 3G

Filtered match: iphone 3G

Go to parent of iphone 3G => apple

Get the entity with latest release date among the children of apple => iphone 5

Replace iphone 3G (2008) with iphone 5 (2012).
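The last three steps of this example (go to the parent, pick the child with the latest release date, substitute it into the keyword) could look like the sketch below. The toy `CHILDREN` and `INFO` tables and the function names are assumptions; the 2008 and 2012 years come from the slide, and the years for the intermediate models are filled in for illustration.

```python
import re

# Toy graph extract: parent -> children, and entity -> (parent, release year).
CHILDREN = {"apple": ["iphone 3g", "iphone 3gs", "iphone 4", "iphone 4s", "iphone 5"]}
INFO = {
    "iphone 3g": ("apple", 2008),
    "iphone 3gs": ("apple", 2009),
    "iphone 4": ("apple", 2010),
    "iphone 4s": ("apple", 2011),
    "iphone 5": ("apple", 2012),
}


def latest_sibling(entity):
    """Newest entity among the children of the given entity's parent."""
    parent, _ = INFO[entity]
    return max(CHILDREN[parent], key=lambda name: INFO[name][1])


def update_keyword(keyword, entity):
    """Swap the matched entity for the newest sibling, if one is newer."""
    newest = latest_sibling(entity)
    if INFO[newest][1] <= INFO[entity][1]:
        return keyword  # already the newest known model
    pattern = re.compile(r"\b" + re.escape(entity) + r"\b", re.IGNORECASE)
    return pattern.sub(lambda _match: newest, keyword, count=1)


print(update_keyword("buy new iphone 3G now", "iphone 3g"))
```

The word-boundary anchors keep a match on "iphone 3g" from firing inside "iphone 3gs".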


Step2: Smartphones

Example2.

Keyword: iphone 3G vs samsung galaxy S

Four Ngram matches:

iphone

iphone 3G

samsung

samsung galaxy S

Filtered matches:

iphone 3G

samsung galaxy S

More than one filtered match, so no replacement.


Step2: Smartphones

Issues.

No release date available.

Infobox not present

Infobox present, but without “production” / “released” fields

No date mentioned in the first paragraph.

Several smartphones do not have a Wikipedia page.

Nokia 3585i, BlackBerry 7730

Entities have multiple names.

buy new apple iphone four

Fortunately, Wikipedia redirects help in this case.

iphone four => iphone 4
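A possible way to apply the redirect table before Ngram matching is to rewrite the longest alias found in the keyword to its canonical title. The sketch below assumes a lowercase alias-to-title dictionary; the single `REDIRECTS` row mirrors the example above, and the function names are made up for illustration.

```python
# Alias -> canonical title, both lowercased (one illustrative row).
REDIRECTS = {"iphone four": "iphone 4"}


def resolve(name, redirects, max_hops=5):
    """Follow redirect chains, stopping on a cycle or after max_hops."""
    seen = set()
    name = name.lower()
    while name in redirects and name not in seen and len(seen) < max_hops:
        seen.add(name)
        name = redirects[name]
    return name


def normalise_keyword(keyword, redirects):
    """Rewrite the longest alias in the keyword to its canonical name (result is lowercased)."""
    tokens = keyword.lower().split()
    for size in range(len(tokens), 0, -1):
        for start in range(len(tokens) - size + 1):
            span = " ".join(tokens[start:start + size])
            if span in redirects:
                canonical = resolve(span, redirects)
                return " ".join(tokens[:start] + [canonical] + tokens[start + size:])
    return keyword.lower()


print(normalise_keyword("buy new apple iphone four", REDIRECTS))
```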


Step2: Smartphones

Results.

Total Entities Inserted: 900

Test Keywords: 5200

Keywords Updated: 3200

General Keywords (no replacement needed): ~1000

Example: Nokia Connectivity Adapter Cable

Keywords that could not be updated: ~1000

..\Downloads\Data_wikipedia\enwiki-latest-pages-articles.xml\output8_smartphones.xls


Step3: Automobiles

Entity Identification.

Categories: Ford vehicles, BMW vehicles, Porsche vehicles etc.

Categories: Hatchbacks, SUVs, Sedans etc.

Entity Classification.

Based on “wikipedia category”.


Step3: Automobiles

An automobile cannot be arbitrarily replaced with another automobile from the same company.

Only a sedan can replace a sedan.

But “type” information is not available in an organised form.

So only the model year is replaced.

Example.

2004 Chevrolet Silverado => 2013 Chevrolet Silverado

2003 Suzuki Aerio => 2007 Suzuki Aerio
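Year replacement can be sketched as a lookup of the latest model year for the matched model, followed by a substitution of the year token. The `LATEST_MODEL_YEAR` table below is a hypothetical extract whose two rows simply mirror the examples above; the substring match on the model name is a simplification.

```python
import re

# Hypothetical graph extract: model name -> latest model year on record.
LATEST_MODEL_YEAR = {"chevrolet silverado": 2013, "suzuki aerio": 2007}
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")


def replace_year(keyword):
    """Replace the first year in the keyword with the model's latest year, if it is older."""
    lowered = keyword.lower()
    for model, latest in LATEST_MODEL_YEAR.items():
        if model in lowered:
            match = YEAR_RE.search(keyword)
            if match and int(match.group(0)) < latest:
                return keyword[:match.start()] + str(latest) + keyword[match.end():]
    return keyword


print(replace_year("2004 Chevrolet Silverado"))
print(replace_year("2003 Suzuki Aerio"))
```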


Step3: Automobiles

Erred on the side of caution.

Replaced a keyword only on a complete match.

Generated a list of stopwords for this.

compare, discount, engine etc.

Example. 2005 Mustang GT Convertible

Parts, left to right: Year, Entity, Stopword
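A "complete match" check in this spirit might require that every token of the keyword is accounted for as a year, part of an entity name, or a stopword. The sketch below makes several assumptions that are not in the slides: exactly one year is required, "mustang gt" is the entity and "convertible" a stopword in the example, and the function name is invented.

```python
import re

STOPWORDS = {"compare", "discount", "engine", "convertible"}  # "convertible" is assumed
ENTITIES = {"mustang gt"}  # hypothetical entity names, lowercased
YEAR_TOKEN_RE = re.compile(r"^(?:19|20)\d{2}$")


def is_complete_match(keyword):
    """True when the keyword is one year, one entity, and nothing else but stopwords."""
    tokens = keyword.lower().split()
    years = [t for t in tokens if YEAR_TOKEN_RE.match(t)]
    rest = [t for t in tokens if not YEAR_TOKEN_RE.match(t)]
    if len(years) != 1:
        return False
    # Try every contiguous span of the remaining tokens as the entity name.
    for i in range(len(rest)):
        for j in range(i + 1, len(rest) + 1):
            if " ".join(rest[i:j]) in ENTITIES:
                leftovers = rest[:i] + rest[j:]
                if all(t in STOPWORDS for t in leftovers):
                    return True
    return False


print(is_complete_match("2005 Mustang GT Convertible"))
print(is_complete_match("2005 Mustang GT seat covers"))
```

Rejecting any keyword with an unexplained token trades coverage for safety, which fits the cautious stance taken on this slide.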


Step3: Automobiles

Results.

Total Entities Inserted: 4100

Keywords: ~30000

Keywords Updated: 22000

Keywords that could not be updated: ~8000

..\Downloads\Data_wikipedia\enwiki-latest-pages-articles.xml\output12_automobiles.xls


Step3: iPods

Found no need to replace iPod-related keywords, as almost all iPod models are still on sale.

iPod Classic

iPod Touch

iPod Shuffle

iPod Nano

Discontinued Models

iPod Mini

iPod Photo

Only a few keywords mention these discontinued models.


Step4: iPads and Tablets

Followed the same approach as for smartphones.

Results.

Total Entities Inserted: 82

Keywords: 4352

Keywords Replaced: almost all

..\Downloads\Data_wikipedia\enwiki-latest-pages-articles.xml\output7_ipad_ipod.xls


Step5: Future Work

Extend this to cover all kinds of electronic products.

Fill in missing entities.

Add month to “release date” attribute.


THANK YOU

Any questions?
