Posted on 16-Apr-2017
Dato Confidential
Alon Palombo, Data Scientist
alon@dato.com
Product Matching Webinar
Agenda
• Who is Dato?
• Data science workflow
• What is product matching?
• Demo using real public data
• Questions
Dato: We Intelligent Applications
45+ and growing fast!
Customers
Data science workflow
[Figure: Ingest → Transform → Model → Deploy; label: Unstructured Data]
What is product matching?
• In 2016, global e-commerce sales are expected to reach $1.92 trillion.
• Online retailers and price comparison sites curate product catalogues by aggregating listings from multiple sources.
• Product matching is the task of keeping these catalogues free of duplicates, complete with attributes for each product, and consistent across different sites.
Difficulty
[Figure: a product offer combines structured attributes, reviews, images, and a description, aggregated from multiple sources]
Thor, Andreas. "Toward an adaptive String Similarity Measure for Matching Product Offers." GI Jahrestagung (1). 2010.
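To illustrate why offers for the same product are hard to match across sources, here is a minimal Python sketch of two simple, fixed string similarity measures on product titles: Jaccard similarity over word tokens and over character trigrams. This is not the adaptive measure proposed by Thor (2010); the function names and example titles are hypothetical.

```python
import re


def normalize(title):
    """Lowercase a title and keep only alphanumeric runs, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; taken as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def token_similarity(x, y):
    """Jaccard similarity over the word tokens of two product titles."""
    return jaccard(set(normalize(x).split()), set(normalize(y).split()))


def char_trigrams(text):
    """Set of overlapping 3-character substrings of the normalized text."""
    s = normalize(text)
    return {s[i:i + 3] for i in range(len(s) - 2)}


def trigram_similarity(x, y):
    """Jaccard similarity over character trigrams; tolerates small formatting differences."""
    return jaccard(char_trigrams(x), char_trigrams(y))


# Hypothetical offers for the same camera from two different sites.
a = "Canon PowerShot SX260 HS Digital Camera (Black)"
b = "Canon Powershot SX-260HS 12.1 MP black"
print(token_similarity(a, b))  # 0.25
print(trigram_similarity(a, b))
```

The token score is low because "SX260 HS" and "SX-260HS" split into different tokens, while character trigrams still share substrings such as "260", so the trigram score is higher for this pair. Which measure works best depends on the attribute and the sources.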
Definition
Ironically, there are many different names for very similar problems:
• Entity resolution
• Record linking
• De-duplication
• Reference reconciliation
• Data matching
• and more…
Definition
• In GraphLab Create, we distinguish between Record Linkage and De-duplication.
• Record Linkage refers to matching structured query records to a fixed set of reference records with the same schema.
• De-duplication refers to assigning an entity label to each row. Records with the same label likely correspond to the same real-world entity.
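The two tasks above can be sketched in plain Python. This is not GraphLab Create's API: it assumes each record is a dict with a text field, uses token Jaccard similarity as the distance, and for de-duplication joins every pair above a fixed threshold and takes connected components (via union-find) as entity labels. All function names, the threshold, and the example data are hypothetical.

```python
import re
from itertools import combinations


def tokens(text):
    """Lowercased alphanumeric tokens of a record's text field."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 when both are empty)."""
    return 1.0 if not (a or b) else len(a & b) / len(a | b)


def link_records(queries, reference, field):
    """Record linkage (hypothetical helper): match each query to its most
    similar record in a fixed, non-empty reference set with the same schema.

    Returns (query_index, reference_index, similarity) triples.
    """
    ref_tokens = [tokens(r[field]) for r in reference]
    links = []
    for qi, q in enumerate(queries):
        q_tokens = tokens(q[field])
        scores = [jaccard(q_tokens, rt) for rt in ref_tokens]
        best = max(range(len(reference)), key=scores.__getitem__)
        links.append((qi, best, scores[best]))
    return links


def deduplicate(records, field, threshold):
    """De-duplication (hypothetical helper): assign an entity label to each row.

    Pairs with similarity >= threshold are joined; labels are the connected
    components of that graph, found with union-find.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    toks = [tokens(r[field]) for r in records]
    for i, j in combinations(range(len(records)), 2):
        if jaccard(toks[i], toks[j]) >= threshold:
            parent[find(i)] = find(j)
    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    seen, labels = {}, []
    for i in range(len(records)):
        labels.append(seen.setdefault(find(i), len(seen)))
    return labels


# Hypothetical example data.
reference = [{"name": "Apple iPhone 6 16GB Silver"}, {"name": "Samsung Galaxy S6 32GB Black"}]
queries = [{"name": "Samsung Galaxy S6 (32GB, Black)"}]
print(link_records(queries, reference, "name"))  # [(0, 1, 1.0)]

catalog = reference + [{"name": "iPhone 6 16GB (Silver) by Apple"}]
print(deduplicate(catalog, "name", threshold=0.5))  # [0, 1, 0]
```

Comparing every pair is quadratic in the number of records; larger catalogues typically restrict comparisons to candidate pairs (for example via nearest-neighbor search or blocking on an attribute).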
Product matching demo – using real public data
Summary
• Product matching is at the heart of e-commerce.
• Many closely related problems share similar solutions.
• GraphLab Create makes exploration, modeling, and evaluation easy.
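One common way to evaluate a de-duplication result, assumed here and not necessarily the metric used in the demo, is pairwise precision and recall: every pair of rows that shares an entity label counts as a predicted match, and is compared against the pairs that share a label in the ground truth. A minimal sketch (function names hypothetical; precision and recall default to 1.0 when there are no pairs to score):

```python
from itertools import combinations


def matched_pairs(labels):
    """All unordered index pairs (i, j) whose rows share an entity label."""
    return {(i, j) for i, j in combinations(range(len(labels)), 2)
            if labels[i] == labels[j]}


def pairwise_scores(predicted, truth):
    """Pairwise precision, recall and F1 of predicted entity labels vs ground truth."""
    pred, true = matched_pairs(predicted), matched_pairs(truth)
    tp = len(pred & true)  # pairs matched in both
    precision = tp / len(pred) if pred else 1.0
    recall = tp / len(true) if true else 1.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical labels: rows 2 and 3 are one entity, but row 2 was merged into entity 0.
truth = [0, 0, 1, 1]
predicted = [0, 0, 0, 1]
precision, recall, f1 = pairwise_scores(predicted, truth)
print(round(precision, 3), round(recall, 3), round(f1, 3))  # 0.333 0.5 0.4
```

Only the grouping matters, not the label values, so `[0, 0, 1, 1]` and `[5, 5, 2, 2]` score identically.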
Our machine learning course
https://www.coursera.org/learn/ml-foundations
Questions?
alon@dato.com