CIDR: Chat-oriented Innovations in Database...

1
CIDR: Chat-oriented Innovations in Database Research Eugene Wu Columbia University [email protected] estimated # innovations Year new millennia Figure 1: Estimated innovations per year. The number of large innovations per year has been increasing rapidly since the start of the new millennia (Figure ??). From Hadoop and big data, to internet of things, to millennial startups, to visualization, to deep learning, each new innovation has left an undeniable mark on society, and ultimately, the world. Although the database community may not have led the charge in these in- novations, we have been successful at staking our claim to the in- frastructure and core tenants in each of these innovations. But must we continue to play a reactive role? Should we not be the drivers in new innovative areas? The answer is clearly a resounding yes, but towards what nascent area? Well, do I have a bridge to sell you, and it is chat bots. Yes. Chat bots. Conversational communication is slated to be the dominante com- munication channel. Chat-based applications e.g., Snapchat 1 , WeChat 2 , and Line messenger 3 are respectively worth $20B, $83B, and $9B. In other words, each of these applications are individually valued on the order of the entire $33B database market 4 . In the long term, the largest companies in the world (e.g., Google, Microsoft, Facebook, Tencent) are throwing their weight behind chat. This makes sense because chat is ubiquitous—texting is chat, IMs are chat, Slack is chat, Twitter is chat, email is chat with non- deterministic delays. It replaces slow, synchronous communica- tion with asynchronous, parallelizable communication: just ask any teenager. Moreover, its seemingly simple interface is the HTTP and browser for mobile: a protocol and an interface. For instance, 1 http://money.cnn.com/2016/05/24/technology/snapchat-valuation-20-billion/ 2 https://www.techinasia.com/talk/wechat-valued-at-83-6-billion-half-of-tencents-market-cap 3 http://qz.com/235215/line-ipo-naver-will-finally-put-a-real-value-on-messaging-apps/ 4 http://blog.memsql.com/incumbents-and-contenders-in-the-33b-database-market/ ACM ISBN 978-1-4503-2138-9. DOI: 10.1145/1235 Answer Cache Natural language questions Text—Query Translator Natural language answers Businesses Call center Internet InnoChat Figure 2: InnoChat System Architecture. services 5 , assistants 6 , and customer communication 7 can all be ac- cessed through a chat interface. In other words, chat-based acces- siblity is what having a website was in 1999: innovation. To this end, chat companies are furiously building APIs and chat- bot frameworks 89 so that forward thinking services can develop ser- vices against these APIs. However, most businesses don’t have a website, and those with a chat-interface is approximately zero. Enter the database community. There were 42M registered busi- nesses in the world in 2015, and the majority are connected—not to the internet, but to a telephone. Forget struggling to develop chat-bots for individual businesses: our community can bring a chat interface to every phone-connected business! With only a healthy mix of human-powered computing, information extraction, caching, and data integration. This abstract presents InnoChat (Figure 2). Natural language questions aimed at phone-connected businesses (e.g., “do any Berke- ley hotels have pillows for my British Shorthair cat?”) are submit- ted through chat boxes, voice, telephones, and other mediums. The Text-Query translator extracts a structure query and attempts to an- swer it using an existing answer cache as well as the internet. If the cache lookup fails, operators are standing by to directly call the requisite businesses with the question. The phone calls are logged and the answers extracted, inserted into the cache, and returned to the user in chat, voice, or multimedia form. Although deceptively similar to classic crowd-sourced Q&A, InnoChat requires innova- tions in 1) translating NLP to structured queries, 2) batching and managing call center operators, 3) developing new phone-based, rather than mechanical turk-based question interfaces, and 4) man- aging the large amounts of success that such a system will bring to the research community. References available after publication or over chat. 5 http://a16z.com/2015/08/06/wechat-china-mobile-first/ 6 https://getmagicnow.com/ 7 https://www.intercom.io/ 8 https://dev.botframework.com/ 9 https://aws.amazon.com/blogs/compute/create-and-deploy-a-chat-bot-to-aws-lambda-in-five-minutes/

Transcript of CIDR: Chat-oriented Innovations in Database...

Page 1: CIDR: Chat-oriented Innovations in Database Researchcidrdb.org/cidr2017/gongshow/abstracts/cidr2017_109.pdf · 2020-05-24 · CIDR: Chat-oriented Innovations in Database Research

CIDR: Chat-oriented Innovations in Database ResearchEugene Wu

Columbia [email protected]

estim

ated

#

inno

vatio

ns

Year

new millennia

Figure 1: Estimated innovations per year.

The number of large innovations per year has been increasingrapidly since the start of the new millennia (Figure ??). FromHadoop and big data, to internet of things, to millennial startups,to visualization, to deep learning, each new innovation has left anundeniable mark on society, and ultimately, the world. Althoughthe database community may not have led the charge in these in-novations, we have been successful at staking our claim to the in-frastructure and core tenants in each of these innovations. But mustwe continue to play a reactive role? Should we not be the driversin new innovative areas? The answer is clearly a resounding yes,but towards what nascent area? Well, do I have a bridge to sell you,and it is chat bots.

Yes. Chat bots.

Conversational communication is slated to be the dominante com-munication channel. Chat-based applications e.g., Snapchat1, WeChat2,and Line messenger3 are respectively worth $20B, $83B, and $9B.In other words, each of these applications are individually valuedon the order of the entire $33B database market4.

In the long term, the largest companies in the world (e.g., Google,Microsoft, Facebook, Tencent) are throwing their weight behindchat. This makes sense because chat is ubiquitous—texting is chat,IMs are chat, Slack is chat, Twitter is chat, email is chat with non-deterministic delays. It replaces slow, synchronous communica-tion with asynchronous, parallelizable communication: just ask anyteenager. Moreover, its seemingly simple interface is the HTTPand browser for mobile: a protocol and an interface. For instance,

1http://money.cnn.com/2016/05/24/technology/snapchat-valuation-20-billion/2https://www.techinasia.com/talk/wechat-valued-at-83-6-billion-half-of-tencents-market-cap3http://qz.com/235215/line-ipo-naver-will-finally-put-a-real-value-on-messaging-apps/4http://blog.memsql.com/incumbents-and-contenders-in-the-33b-database-market/

ACM ISBN 978-1-4503-2138-9.

DOI: 10.1145/1235

AnswerCache

Natural language questions

Text—QueryTranslator

Natural language answers

Businesses

Call center

InternetInnoChat

Figure 2: InnoChat System Architecture.

services5, assistants6, and customer communication7 can all be ac-cessed through a chat interface. In other words, chat-based acces-siblity is what having a website was in 1999: innovation.

To this end, chat companies are furiously building APIs and chat-bot frameworks89 so that forward thinking services can develop ser-vices against these APIs. However, most businesses don’t have awebsite, and those with a chat-interface is approximately zero.

Enter the database community. There were 42M registered busi-nesses in the world in 2015, and the majority are connected—notto the internet, but to a telephone. Forget struggling to developchat-bots for individual businesses: our community can bring achat interface to every phone-connected business! With only ahealthy mix of human-powered computing, information extraction,caching, and data integration.

This abstract presents InnoChat (Figure 2). Natural languagequestions aimed at phone-connected businesses (e.g., “do any Berke-ley hotels have pillows for my British Shorthair cat?”) are submit-ted through chat boxes, voice, telephones, and other mediums. TheText-Query translator extracts a structure query and attempts to an-swer it using an existing answer cache as well as the internet. Ifthe cache lookup fails, operators are standing by to directly call therequisite businesses with the question. The phone calls are loggedand the answers extracted, inserted into the cache, and returned tothe user in chat, voice, or multimedia form. Although deceptivelysimilar to classic crowd-sourced Q&A, InnoChat requires innova-tions in 1) translating NLP to structured queries, 2) batching andmanaging call center operators, 3) developing new phone-based,rather than mechanical turk-based question interfaces, and 4) man-aging the large amounts of success that such a system will bring tothe research community.

References available after publication or over chat.

5http://a16z.com/2015/08/06/wechat-china-mobile-first/6https://getmagicnow.com/7https://www.intercom.io/8https://dev.botframework.com/9https://aws.amazon.com/blogs/compute/create-and-deploy-a-chat-bot-to-aws-lambda-in-five-minutes/