Optimizing Data Architecture for Natural Language Processing


Transcript of Optimizing Data Architecture for Natural Language Processing

Page 1: Optimizing Data Architecture for Natural Language Processing

x.ai a personal assistant who schedules meetings for you

DATA ENGINEERING APRIL 2015 NEW YORK CITY VISIT X.AI TO JOIN THE WAITLIST

Optimizing data architecture design for

natural language processing

@alexpoon06 @xdotai

Page 2: Optimizing Data Architecture for Natural Language Processing

What’s x.ai?

Magically Schedule Meetings

Page 3: Optimizing Data Architecture for Natural Language Processing

[Diagram: Pain (Jane, Alex) vs. Solution (Jane, [email protected], Alex)]

CC: Amy @ x.ai

“Amy, please set something up for John and I next week.”

Page 4: Optimizing Data Architecture for Natural Language Processing

Product Characteristics

● Need quick response

● Supervised Learning requires large training data set

● # meetings scale linearly with # users

● 1 user meets with N people

● people share meeting places and company

Page 5: Optimizing Data Architecture for Natural Language Processing

Technical challenges

● Natural language understanding with extremely high accuracy

● Natural conversation over email with people

● Complex data relationships

● Optimize for sparse data

● Speed of development and change

Page 6: Optimizing Data Architecture for Natural Language Processing

Stack

Database (tell you in a couple of slides)

Page 7: Optimizing Data Architecture for Natural Language Processing

Queue based architecture

Page 8: Optimizing Data Architecture for Natural Language Processing

Picking a database

● Familiar technology

● Low initial maintenance

● Flexible schema

● Easy early scaling

● Reasonable production quality

Page 9: Optimizing Data Architecture for Natural Language Processing

Pros

● Schema-less

● Mongoose (Schema Control)

● Works out of the box

● Replica set scales reasonably well

● MMS provides good monitoring

Cons

● No joins

● Pain to do backup yourself

● DB level locking (Mongo v2.6)

● Cross datacenter is not great

● I don’t want to shard this

Page 10: Optimizing Data Architecture for Natural Language Processing

Modeling Meetings

{
  host : Participant,
  guests : [Participant],
  time : {
    start : Date,
    end : Date,
    recurring : String
  },
  timezone : String,
  duration : Number,
  locations : [Location],
  timeInitiated : Date,
  timeRescheduled : [Date],
  timeCompleted : Date,
  status : String,
  …...
}
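As a rough illustration of the shape above, here is a plain JavaScript sketch that builds one meeting document and runs a few sanity checks on it. The helper name `validateMeeting`, the sample values, the status string, and the assumption that `duration` is in minutes are all hypothetical. The deck relies on Mongoose for schema control, which is not shown here.

```javascript
// Hypothetical sample document using the field names from the slide.
const sampleMeeting = {
  host: { primaryEmail: "host@example.com" },
  guests: [{ primaryEmail: "guest@example.com" }],
  time: {
    start: new Date("2015-04-20T14:00:00Z"),
    end: new Date("2015-04-20T14:30:00Z"),
    recurring: "none", // assumed value
  },
  timezone: "America/New_York",
  duration: 30, // assumed to be minutes
  locations: [],
  timeInitiated: new Date("2015-04-13T09:00:00Z"),
  timeRescheduled: [],
  status: "scheduled", // assumed value
};

// Hand-rolled check (illustrative only): returns a list of problems found.
function validateMeeting(doc) {
  const problems = [];
  if (!doc.host) problems.push("missing host");
  if (!Array.isArray(doc.guests)) problems.push("guests must be an array");
  const { start, end } = doc.time || {};
  if (!(start instanceof Date) || !(end instanceof Date)) {
    problems.push("time.start and time.end must be dates");
  } else if (end <= start) {
    problems.push("time.end must be after time.start");
  } else if (
    typeof doc.duration === "number" &&
    (end - start) / 60000 !== doc.duration
  ) {
    problems.push("duration does not match start/end");
  }
  return problems;
}

console.log(validateMeeting(sampleMeeting)); // []
```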

Page 11: Optimizing Data Architecture for Natural Language Processing

Modeling Meetings

Meetings

People

Places Companies

1:N and N:N relationships across various collections

Page 12: Optimizing Data Architecture for Natural Language Processing

Embedding vs. Referencing

Embedding

{
  host : {
    name : {.....},
    nicknames : [String],
    phones : [{Type: String}],
    primaryEmail : String,
    secondaryEmails : [String],
    title : String,
    signatures : [String],
    …...
  },
  travelTime : String,
  status : String,
  timezone : String,
  duration : Number,
  …...
}

Referencing

{
  host : Participant,
  travelTime : String,
  status : String,
  timezone : String,
  duration : Number,
  …...
}

Participant {
  name : {.....},
  nicknames : [String],
  phones : [{Type: String}],
  primaryEmail : String,
  secondaryEmails : [String],
  title : String,
  signatures : [String],
  …...
}

Considerations

● Query patterns

● Access to embedded doc

● # references to a doc

● Application level join

● 1-way or 2-way referencing
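To make the referencing trade-off concrete, here is a minimal sketch of an application-level join in plain JavaScript. In-memory structures stand in for database collections, and the names `populateMeeting` and `meetingsHostedBy`, the `_id` values, and the sample records are assumptions for illustration, not the x.ai implementation.

```javascript
// Referencing: meetings store participant ids instead of embedded documents.
// A Map stands in for the participants collection (assumption for this sketch).
const participantsById = new Map([
  ["p1", { _id: "p1", name: { first: "Jane", last: "Doe" }, primaryEmail: "jane@example.com" }],
  ["p2", { _id: "p2", name: { first: "Alex", last: "Roe" }, primaryEmail: "alex@example.com" }],
]);

const meetingDocs = [
  { _id: "m1", host: "p1", guests: ["p2"], status: "scheduled" },
  { _id: "m2", host: "p2", guests: ["p1"], status: "completed" },
];

// Application-level join: resolve the references in code,
// because the database itself offers no joins.
function populateMeeting(meeting, byId) {
  return {
    ...meeting,
    host: byId.get(meeting.host),
    guests: meeting.guests.map((id) => byId.get(id)),
  };
}

// With 1-way referencing (meeting -> participant), finding the meetings
// a person hosts means scanning or querying the meetings side.
function meetingsHostedBy(participantId, allMeetings) {
  return allMeetings
    .filter((m) => m.host === participantId)
    .map((m) => m._id);
}

const populatedMeeting = populateMeeting(meetingDocs[0], participantsById);
console.log(populatedMeeting.host.primaryEmail); // jane@example.com
console.log(meetingsHostedBy("p2", meetingDocs)); // [ 'm2' ]
```

With 2-way referencing, each participant would also carry a list of meeting ids, which makes the second lookup cheap at the cost of keeping both sides in sync on every write.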

Page 13: Optimizing Data Architecture for Natural Language Processing

Modeling someone’s assistant

1st try: Assistant is a PERSON

2nd try: Assistant is an Attribute of PERSON

3rd try: Assistant is a PROFILE, a separate and smaller entity

{
  name : {.....},
  nicknames : [String],
  phones : [{Type: String}],
  primaryEmail : String,
  secondaryEmails : [String],
  title : String,
  signatures : [String]
  …...
}

{
  name : { first : String, last : String },
  primaryEmail : String
}

{
  name : {.....},
  nicknames : [String],
  phones : [{Type: String}],
  primaryEmail : String,
  secondaryEmails : [String],
  title : String,
  signatures : [String],
  assistant : { name : {.....}, primaryEmail : String }
  …...
}
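The smaller shape in these sketches (just a name and a primary email) can be read as a projection of the full person record. The following plain JavaScript sketch shows that idea. The helper `toProfile` and all sample values are hypothetical, and the slide does not say how the profile is actually produced or stored.

```javascript
// Full person record (fields abbreviated; all values are made up).
const assistantPerson = {
  name: { first: "Sam", last: "Lee" },
  nicknames: ["Sammy"],
  primaryEmail: "sam@example.com",
  secondaryEmails: [],
  title: "Executive Assistant",
};

// Hypothetical helper: keep only the fields of the smaller profile shape.
function toProfile(person) {
  return {
    name: { first: person.name.first, last: person.name.last },
    primaryEmail: person.primaryEmail,
  };
}

// A person whose assistant is stored in the smaller shape.
const personWithAssistant = {
  name: { first: "Jane", last: "Doe" },
  primaryEmail: "jane@example.com",
  assistant: toProfile(assistantPerson),
};

console.log(Object.keys(personWithAssistant.assistant)); // [ 'name', 'primaryEmail' ]
```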

Page 14: Optimizing Data Architecture for Natural Language Processing

Dealing with schema changes

Issues

● Inconsistent character offsets

● Inconsistent time representation

● Improper sent date (yr 2026)

● Key info not saved

Fixes

● Recalculate character offsets

● Reconstruct time entities

● Recalculate timezone based on context

● Filter out unsalvageable data
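Two of the fixes above, recalculating character offsets and filtering out unsalvageable data, might look roughly like the following plain JavaScript sketch. The record layout (`body`, `sentDate`, `entities` with `text`, `start`, `end`) and the function names are assumptions made for illustration. The slide does not describe the real storage format or repair logic.

```javascript
// Recompute an entity's offsets by locating its text in the email body.
// Caveat: indexOf finds only the first occurrence, which is a simplification.
function recalculateOffsets(body, entity) {
  const start = body.indexOf(entity.text);
  if (start === -1) return null; // text not found: treat as unsalvageable
  return { ...entity, start, end: start + entity.text.length };
}

// A sent date later than the processing time is treated as improper.
function hasPlausibleSentDate(email, now) {
  return (
    email.sentDate instanceof Date &&
    email.sentDate.getTime() <= now.getTime()
  );
}

// Drop emails with improper dates, then repair offsets on the rest.
function repairEmails(emails, now) {
  return emails
    .filter((email) => hasPlausibleSentDate(email, now))
    .map((email) => ({
      ...email,
      entities: email.entities
        .map((entity) => recalculateOffsets(email.body, entity))
        .filter((entity) => entity !== null),
    }));
}

const processingTime = new Date("2015-04-01T00:00:00Z");
const storedEmails = [
  {
    body: "Let's meet next Tuesday at 3pm.",
    sentDate: new Date("2015-03-30T12:00:00Z"),
    entities: [{ text: "next Tuesday", start: 0, end: 5 }], // stale offsets
  },
  { body: "Coffee?", sentDate: new Date("2026-01-01T00:00:00Z"), entities: [] },
];

const repairedEmails = repairEmails(storedEmails, processingTime);
console.log(repairedEmails.length); // 1
console.log(repairedEmails[0].entities[0].start, repairedEmails[0].entities[0].end); // 11 23
```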

Page 15: Optimizing Data Architecture for Natural Language Processing

Feeding data science

Page 16: Optimizing Data Architecture for Natural Language Processing

ML training architecture

Page 17: Optimizing Data Architecture for Natural Language Processing

alex @ x.ai

COO and founder

25 Broadway, 9th Floor

New York, NY 10005

E: [email protected]

@xdotai

Visit x.ai to join the waitlist