Understanding Interpersonal Variations in Word Meanings via ...oba/assets/CICLing_oba_v9.pdf:...

Post on 29-Sep-2020


1. Relationships to linguistic features

Understanding Interpersonal Variations in Word Meanings via Review Target Identification

Top-20: ery bready ark slight floral toasty tangy updated citrusy soft deep mainly grassy aroma doughy dissipating grass ot great earthy

Bottom-20: reminds cask batch oil reminded beyond canned conditioned double abv hope horse oats rye brewery blueberry blueberries maple bells old

2. Word list sorted by semantic variation

Proposal:

1. Pretrain the model by MTL with metadata identification (auxiliary tasks)
・Stabilizes the training for the extreme multi-class classification

2. Fine-tune personalized word embeddings only for the target task
・Prevents the word embeddings from learning selection bias
・Reduces memory usage caused by the explosion of parameter counts in proportion to the number of reviewers

[Figure labels: reviewer-specific matrix; reviewer-universal word embedding]

Experiments: Review target identification

Analysis: What kind of words have strong semantic variations?

[Target]: Words used by at least 30% of reviewers (excluding stop words)
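The target-word filter can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `select_target_words`, the `(reviewer_id, tokens)` input format, and the toy reviews and stop-word list are all assumptions made for the example.

```python
from collections import defaultdict

def select_target_words(reviews, stop_words, min_ratio=0.3):
    """Keep words used by at least `min_ratio` of all reviewers.

    reviews: iterable of (reviewer_id, list_of_tokens) pairs (assumed format).
    """
    reviewers = set()
    users_of = defaultdict(set)  # word -> set of reviewers who used it
    for reviewer, tokens in reviews:
        reviewers.add(reviewer)
        for tok in tokens:
            if tok not in stop_words:
                users_of[tok].add(reviewer)
    threshold = min_ratio * len(reviewers)
    return sorted(w for w, us in users_of.items() if len(us) >= threshold)

# Hypothetical toy data: four reviewers, one short review each.
reviews = [
    ("u1", ["the", "beer", "is", "bready"]),
    ("u2", ["a", "bready", "aroma"]),
    ("u3", ["the", "aroma", "is", "floral"]),
    ("u4", ["bready", "and", "toasty"]),
]
stop_words = {"the", "is", "a", "and"}
targets = select_target_words(reviews, stop_words)
```

With four reviewers the threshold is 1.2 reviewers, so only words used by two or more reviewers survive the filter.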

[Proposed metric]: Semantic variation

$$\frac{1}{|U(w_i)|} \sum_{u_j \in U(w_i)} \left(1 - \cos\left(e^{u_j}_{w_i}, \bar{e}_{w_i}\right)\right)$$
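The metric is the mean cosine distance between each reviewer's personalized embedding of a word and the average of those embeddings. A minimal sketch in plain Python follows; the function names and the two-dimensional toy vectors are assumptions for illustration only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def semantic_variation(personal_vectors):
    """Mean of (1 - cos(e_u, e_mean)) over the reviewers who used the word.

    personal_vectors: one personalized embedding per reviewer in U(w_i).
    """
    n = len(personal_vectors)
    dim = len(personal_vectors[0])
    # Average embedding over all reviewers who used the word.
    mean = [sum(v[k] for v in personal_vectors) / n for k in range(dim)]
    return sum(1.0 - cosine(v, mean) for v in personal_vectors) / n

# Hypothetical toy vectors: same direction vs. orthogonal directions.
same = semantic_variation([[1.0, 0.0], [2.0, 0.0]])
spread = semantic_variation([[1.0, 0.0], [0.0, 1.0]])
```

Embeddings that point in the same direction give a variation of zero, while embeddings that disagree in direction give a larger value.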

Words related to the five senses and adjectives have large semantic variations

[Premise]: Target identification performance reflects whether the model can capture semantic variations

Related Work:

・Annotation bias: How annotators label (e.g., some reviewers tend to give extreme ratings)

・Selection bias: How reviewers choose the review targets (e.g., writing reviews on products of a specific manufacturer)

・Semantic variation (our target): How people use the words

Approach: Induce personalized word embeddings through review-target (objective label) identification
→ By solving a task with an objective target, the model can capture only the semantic variation

We express different meanings with the same word or the same meaning with different words

Motivation: Clarify what kinds of words have different meanings across individuals

[Figure labels: Cold / Cold (two different temperature ranges in ℉, values not recoverable); Yellow / Gold]

3. Relationships between the words

Intersection between clusters means that the same meaning is expressed in different ways

[Figure panels: Pearson correlation -0.07; Pearson correlation 0.43]

[Figure labels: grainy, bready, doughy, toasty, biscuity, crackery, grassy]

[Baseline]: Select the majority label in the training set
[Proposed]: Four different settings, depending on
・Whether pretraining by MTL is applied (MTL or ----)
・Whether the fine-tuning for personalization is employed (PRS or ----)
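The majority-label baseline above can be sketched in a few lines. The function name and the toy labels are hypothetical and only illustrate the idea of always predicting the most frequent training label.

```python
from collections import Counter

def majority_baseline_accuracy(train_labels, test_labels):
    """Predict the most frequent training label for every test example."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    hits = sum(1 for y in test_labels if y == majority)
    return majority, 100.0 * hits / len(test_labels)

# Hypothetical toy labels (not taken from RateBeer or Yelp).
train = ["ipa", "ipa", "stout", "lager", "ipa"]
test = ["ipa", "stout", "ipa", "porter"]
label, acc = majority_baseline_accuracy(train, test)
```

When the label space is very large, as in target identification, such a baseline is expected to score very low, which matches the low Baseline row reported on the poster.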

Model | RateBeer target: Beer [Acc.(%)] | RateBeer metadata: Brewery [Acc.(%)] | RateBeer metadata: Style [Acc.(%)] | RateBeer metadata: ABV [RMSE] | Yelp target: Service [Acc.(%)] | Yelp metadata: Location [Acc.(%)] | Yelp metadata: Category [Micro F1(%)]
Baseline | 0.08 | 1.51 | 6.19 | 2.321 | 0.05 | 27.00 | 31.5
---- / ---- | 15.74 | n/a | n/a | n/a | 6.75 | n/a | n/a
MTL / ---- | 16.16 | 19.98 | 49.00 | 1.428 | 9.71 | 70.33 | 57.8
---- / PRS | 16.69 | n/a | n/a | n/a | 7.15 | n/a | n/a
MTL / PRS | 17.56 | 20.81 | 49.78 | 1.406 | 10.72 | 83.14 | 57.7

→ Prevents smooth communication in our daily lives
→ Causes problems for computers when solving NLP tasks

Our method could successfully capture semantic variations

* The same trend was also seen on the Yelp dataset

Summary

・Proposed a method of inducing personalized word embeddings via review target identification
・Showed the effectiveness of the personalized word embeddings on the target identification task
・Clarified that the meanings of words related to the five senses vary greatly across individuals

Daisuke Oba¹, Shoetsu Sato¹, Naoki Yoshinaga², Satoshi Akasaki¹, Masashi Toyoda²
¹ The University of Tokyo, ² Institute of Industrial Science, The University of Tokyo

In the target identification task ...
・The output label is given automatically, without an annotator: free of annotation bias
・The same reviewer selects the same target only once in the dataset: free of selection bias

Input: Review ("This beer has high drinkability") → estimate → Output: Review target


Dataset | # reviews | # reviewers | Metadata
RateBeer [McAuley+, 13] | 2,695,615 | 3,670 | Style, Brewery, Alcohol by volume (ABV)
Yelp | 426,816 | 2,414 | Location, Category

Frequent words tend to have strong semantic variations
・In the beer/food/restaurant domain, expressions depending on individual senses or experiences frequently appear

Meaning change to other synsets does not happen
・The small domain dataset restricts the actual meanings
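The poster reports Pearson correlations (0.43 and -0.07) between semantic variation and linguistic features. A minimal sketch of how such a correlation can be computed is given below. It assumes one feature value and one semantic-variation score per word; the toy numbers are hypothetical and do not reproduce the reported results.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-word values: a feature (e.g. log frequency) and
# the semantic variation of the same words.
feature = [1.0, 2.0, 3.0, 4.0]
variation = [0.10, 0.15, 0.12, 0.20]
r = pearson(feature, variation)
```

A value near 1 or -1 indicates a strong linear relationship, while a value near 0, such as the reported -0.07, indicates almost none.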

$U(w_i)$: the set of reviewers who used the word $w_i$
$e^{u_j}_{w_i}$: personalized word embedding for $w_i$ of the reviewer $u_j$
$\bar{e}_{w_i}$: the average of $e^{u_j}_{w_i}$ over $U(w_i)$

Personalized word embeddings are expressed via a transformation of reviewer-universal word embeddings using reviewer-specific parameters
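One way to read this is that each reviewer owns a small set of parameters that maps the shared embedding of a word to that reviewer's own embedding. The sketch below assumes a plain linear map with one matrix per reviewer; the poster only states that a reviewer-specific matrix is used, so the exact form (bias terms, nonlinearity, dimensions) and all names and values here are assumptions.

```python
def personalize(universal_vec, reviewer_matrix):
    """Return reviewer_matrix applied to universal_vec (matrix-vector product)."""
    return [sum(w * x for w, x in zip(row, universal_vec))
            for row in reviewer_matrix]

# Hypothetical reviewer-universal embedding shared by all reviewers.
universal = {"bready": [1.0, 2.0]}

# Hypothetical reviewer-specific matrices (assumed linear transformation).
reviewer_matrices = {
    "u1": [[1.0, 0.0], [0.0, 1.0]],  # identity: keeps the universal meaning
    "u2": [[0.0, 1.0], [1.0, 0.0]],  # swaps the two dimensions
}

e_u1 = personalize(universal["bready"], reviewer_matrices["u1"])
e_u2 = personalize(universal["bready"], reviewer_matrices["u2"])
```

Under this reading, the vocabulary-sized embedding table is stored once and only the per-reviewer matrices grow with the number of reviewers.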

Attempted to consider personal biases to improve task performance

→ They model various types of biases all together

Personalize word embeddings in review target identification based on multi-task learning (MTL) and fine-tuning

Visualizing the word “bready” with its closest words using principal component analysis (PCA)