Post on 18-May-2020

Big data meets journalism: automatic news detection

Sandjai Bhulai (s.bhulai@vu.nl)

The world of today

The world of today: social media

Twitter

Social media in our daily life

Forecasting news for nu.nl?

Challenge #1

• How do you get (all) Dutch tweets?

• Twitter has a streaming API
• The fair-use policy delivers a random 1% sample of the Twitter stream
• Following keywords is allowed

• How much data do you need?
• How much data can you get?
• How much data can you deal with?
• How much data can you store?
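
The keyword-following idea can be illustrated locally with a minimal sketch. It does not call Twitter's API; it only assumes that each tweet arrives as a dictionary with a `text` field. The keyword set and the function name `filter_dutch` are hypothetical choices for illustration, not the approach used in the talk.

```python
from typing import Iterable, Iterator

# Hypothetical keyword list (assumption): a handful of very common Dutch words.
# A real deployment would track a much larger set.
DUTCH_KEYWORDS = {"de", "het", "een", "niet", "ik"}


def filter_dutch(tweets: Iterable[dict], keywords=DUTCH_KEYWORDS) -> Iterator[dict]:
    """Yield tweets whose text contains at least one tracked keyword."""
    for tweet in tweets:
        words = set(tweet.get("text", "").lower().split())
        if words & keywords:
            yield tweet


# Made-up sample stream, for illustration only.
sample = [
    {"text": "Ik zie brand in Amsterdam"},
    {"text": "Earthquake in Mexico"},
    {"text": "Het is niet druk op Schiphol"},
]
dutch = list(filter_dutch(sample))
```

Because the filter is a generator, it processes one tweet at a time and never holds the whole stream in memory, which matters for the "how much data can you deal with?" question.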

Twitter - popularity

Challenge #2

• How do you detect trends on Twitter?

• Absolute frequencies of tweets
• Relative frequencies of tweets
• Speed of tweets
• Acceleration of tweets
• Seasonal patterns

• We need a real-time algorithm
• We need to handle memory efficiently
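
One possible way to combine the real-time and memory requirements is a sliding window of per-interval counts. The sketch below is an assumption about how this could be done, not the algorithm from the talk; the class name `SlidingWindowCounter` and the bucket granularity are hypothetical. It keeps only the last `window` time buckets and reports absolute and relative frequencies for a term.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count terms over the last `window` time buckets using bounded memory."""

    def __init__(self, window: int):
        self.window = window
        self.buckets = deque()   # one Counter per time bucket, oldest first
        self.totals = Counter()  # running term counts over the whole window

    def add_bucket(self, terms):
        """Add the terms seen in the newest bucket and evict the oldest one."""
        bucket = Counter(terms)
        self.buckets.append(bucket)
        self.totals.update(bucket)
        if len(self.buckets) > self.window:
            old = self.buckets.popleft()
            for term, n in old.items():
                self.totals[term] -= n
                if self.totals[term] <= 0:
                    del self.totals[term]  # free memory for vanished terms

    def absolute(self, term):
        """Absolute frequency of `term` inside the window."""
        return self.totals[term]

    def relative(self, term):
        """Share of `term` among all term occurrences inside the window."""
        total = sum(self.totals.values())
        return self.totals[term] / total if total else 0.0


# Made-up buckets, for illustration only.
counter = SlidingWindowCounter(window=2)
counter.add_bucket(["brand", "brand", "schiphol"])
counter.add_bucket(["brand", "file"])
counter.add_bucket(["schiphol"])  # the first bucket is evicted here
```

Memory use is bounded by the number of distinct terms in the window rather than by the length of the stream, and each update touches only the newest and the oldest bucket.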

Trending topics

1. #PrayforMexico
2. #SocialMovies
3. #temblor
4. Sismo de 7.8
5. Earthquake in Mexico
6. John Elway
7. Pat Bowlen
8. Marcelo Lagos
9. Azcapotzalco
10. Niñas de 13 y 14

20 March 2012, Twitter.com

Number of tweets #PrayforMexico

Speed of tweets

Acceleration of tweets
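
Speed and acceleration can be read as the first and second differences of the tweet counts per time interval. The sketch below shows that interpretation under stated assumptions: the counts are made-up numbers, not data from the talk, and the helper name `differences` is hypothetical.

```python
def differences(series):
    """Return the change between each pair of consecutive values."""
    return [b - a for a, b in zip(series, series[1:])]


# Hypothetical tweets per minute for a single hashtag (illustrative only).
counts = [2, 3, 5, 12, 40]

speed = differences(counts)        # change in count per interval
acceleration = differences(speed)  # change in speed per interval

print(speed)         # [1, 2, 7, 28]
print(acceleration)  # [1, 5, 21]
```

A topic that is merely popular shows high counts with a speed near zero, while a breaking topic shows positive and growing acceleration, which is why the derivative views can flag news earlier than raw counts.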

Challenge #3

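• How do you deal with the following tweets?

• “Brand in Amsterdam” (“Fire in Amsterdam”)
• “Vuur in 020” (“Fire in 020”; 020 is the Amsterdam area code)
• “Fikkie in A’dam” (informal: “Small fire in A’dam”)

• “Ik heb brand gezien” (“I have seen a fire”)
• “Ik zag brand” (“I saw a fire”)
• “Ik zie brand” (“I see a fire”)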
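
One simple way to make such variants comparable is to map synonyms, abbreviations, and verb forms to a canonical token before counting. The sketch below is an assumption for illustration, not the method from the talk: the lookup tables `SYNONYMS` and `VERB_FORMS` are hand-made and hypothetical, and a real system would need a proper Dutch lexicon or stemmer.

```python
import re

# Hypothetical hand-made lookup tables (assumptions, far from complete).
SYNONYMS = {
    "vuur": "brand",
    "fikkie": "brand",
    "020": "amsterdam",
    "a'dam": "amsterdam",
}
VERB_FORMS = {"zag": "zien", "zie": "zien", "gezien": "zien"}


def normalize(text):
    """Lowercase, tokenize, and map each token to its canonical form."""
    text = text.lower().replace("’", "'")  # unify curly and straight apostrophes
    tokens = re.findall(r"[\w']+", text)
    return [VERB_FORMS.get(t, SYNONYMS.get(t, t)) for t in tokens]


print(normalize("Vuur in 020"))      # ['brand', 'in', 'amsterdam']
print(normalize("Fikkie in A’dam"))  # ['brand', 'in', 'amsterdam']
print(normalize("Ik zag brand"))     # ['ik', 'zien', 'brand']
```

After normalization the three location variants produce the same token sequence, so they count toward one topic instead of three.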

Visualization

Live demo (prototype)

• Prototype: live Twitter stream!

• Full Dutch Twitter stream
• Tries to detect news before it is reported

Example: Schiphol (08-03-2012 15:02:00)

Example: Schiphol (08-03-2012 15:12:00)

From nu.nl (“now”) to straks.nl (“soon”)

The future

• Many challenges ahead:

• How to deal with retweets?
• Integration of reputation scores?
• Use of profile information?
• Advantages of semantic research?
• Add feeds of other social media?
• Generalize to other languages?
• Dependence on GPS information?
• …

Questions