Automatic Extraction of Soccer Game Event Data from Twitter
Automatic extraction of soccer game event data from Twitter
Guido van Oorschot, Marieke van Erp and Chris Dijkshoorn
Monday, November 12, 12
Soccer data
Theory
1. There is a fair body of research on automated sports highlight extraction
2. Twitter data can offer interesting insights into real-world phenomena
Automated highlight detection
Let’s Use Twitter data!
3 Tasks
1. Detecting events: In which minutes did events occur?
2. Classifying events: Is the event a goal, card or substitution?
3. Assigning events to teams: Is the event for the home team or the away team?
5 types of events
- Goal
- Own Goal
- Red Card
- Yellow Card
- Substitution
Methodology
1. Gathering the data
2. Exploring and cleaning the data
3. Classifying interesting data points
Gathering data
- Collect all tweets with game hashtags
#ajafey #nacgro #psvutr
- Collect official data for each match
Goals, cards, substitutions
Our data
- 6 months
- 61 games
- 661 events
- 10,643 tweets
Three Experiments
1. Detecting events
2. Classifying events
3. Assigning events to teams
1. Detecting events
1. Experimental Setup
- Goal: detect peaks in the tweets-per-minute signal to extract events
- Setup: test three peak detection methods:
1. LocMaxNoBaseLineCorr
2. IntThresNoBaseLineCorr
3. IntThresWithBaseLineCorr
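The slides only name the three methods, so the following is a minimal sketch of what local-maxima detection, intensity thresholding, and baseline correction could look like on a tweets-per-minute signal. The function names, the mean-plus-k-standard-deviations cutoff, the moving-average baseline, and the sample counts are all assumptions made for illustration, not the authors' definitions.

```python
from statistics import mean, stdev

def local_maxima(counts):
    """Minutes whose tweet count is strictly higher than both neighbours."""
    return [i for i in range(1, len(counts) - 1)
            if counts[i] > counts[i - 1] and counts[i] > counts[i + 1]]

def threshold_peaks(counts, k=1.0):
    """Minutes whose count exceeds mean + k * standard deviation (assumed cutoff)."""
    cutoff = mean(counts) + k * stdev(counts)
    return [i for i, c in enumerate(counts) if c > cutoff]

def subtract_baseline(counts, window=5):
    """Subtract a centred moving average from each minute, clipping at zero."""
    half = window // 2
    corrected = []
    for i, c in enumerate(counts):
        neighbourhood = counts[max(0, i - half): i + half + 1]
        corrected.append(max(0.0, c - mean(neighbourhood)))
    return corrected

# Hypothetical tweets-per-minute signal with bursts at minutes 4 and 11.
tweets_per_minute = [3, 4, 3, 5, 40, 12, 6, 5, 4, 6, 7, 30, 9, 5, 4]

loc_max = local_maxima(tweets_per_minute)
int_thres = threshold_peaks(tweets_per_minute)
int_thres_corrected = threshold_peaks(subtract_baseline(tweets_per_minute))
```

On this toy signal the local-maxima variant also flags a small bump at minute 1, which illustrates why an unthresholded method tends to over-detect.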
1. Results
1. Findings
- Goals and red cards are detected better than yellow cards and substitutions
- None of the three peak selection methods works well.
- Highlights can be extracted, but not precisely enough
2. Classifying Events
minute  “goal”  “1”  “red”  “card”  “boring”  class
34      0       2    0      1       20        nothing
35      23      34   0      0       0         goal
12      1       2    0      0       5         nothing
13      1       0    22     11      0         red card
- Goal: Classify minutes into event classes
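A minute-by-word count table like the one above could be built roughly as follows. This is a sketch under assumptions: the `(minute, text)` tweet representation, the whitespace tokenisation, and the function name `minute_term_matrix` are illustrative and not taken from the slides.

```python
from collections import Counter, defaultdict

def minute_term_matrix(tweets, vocabulary):
    """Count, per game minute, how often each vocabulary word occurs.

    tweets: iterable of (minute, text) pairs (assumed representation).
    Returns {minute: [count for each word in vocabulary]}.
    """
    counts = defaultdict(Counter)
    for minute, text in tweets:
        counts[minute].update(text.lower().split())
    return {minute: [word_counts[word] for word in vocabulary]
            for minute, word_counts in sorted(counts.items())}

# Hypothetical tweets, tagged with the game minute in which they were sent.
tweets = [(35, "GOAL 1 0"), (35, "goal goal"), (34, "boring game")]
vocabulary = ["goal", "1", "red", "card", "boring"]
matrix = minute_term_matrix(tweets, vocabulary)
```

Each row of `matrix` is one instance for the classifier; the class label (goal, red card, nothing, ...) would come from the official match data.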
Issues
Problem: Huge, sparse matrix
1. Reduce features: choose words/features smartly
2. Reduce instances: choose minutes smartly
2. Experimental Setup
- 3 instance selection settings
1. AllMinutes
2. PeakMinutes
3. EventMinutes
2. Experimental Setup
- 7 feature selection settings
1. AllMoreThanOnce
2. Top500TotalFreq
3. Top10MinuteFreq
4. Top500TotalTfIdf
5. Top10MinuteTfIdf
6. Top50Infogain
7. Top50GainRatio
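To make the information-gain style of feature selection concrete, here is a small sketch that ranks words by how much knowing "this word occurred in the minute" reduces the entropy of the class label. Treating each word as a present/absent split is a simplification chosen for illustration; the slides do not say how the counts were discretised, and the helper names are hypothetical. The data reuses the four example rows from the classification slide.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def information_gain(feature_values, labels):
    """Entropy reduction from splitting on whether the word count is > 0."""
    present = [lab for v, lab in zip(feature_values, labels) if v > 0]
    absent = [lab for v, lab in zip(feature_values, labels) if v == 0]
    total = len(labels)
    remainder = sum(len(part) / total * entropy(part)
                    for part in (present, absent) if part)
    return entropy(labels) - remainder

def top_k_features(rows, labels, names, k):
    """Return the k feature names with the highest information gain."""
    scores = {name: information_gain([row[j] for row in rows], labels)
              for j, name in enumerate(names)}
    return sorted(names, key=lambda name: scores[name], reverse=True)[:k]

names = ["goal", "1", "red", "card", "boring"]
rows = [[0, 2, 0, 1, 20], [23, 34, 0, 0, 0], [1, 2, 0, 0, 5], [1, 0, 22, 11, 0]]
labels = ["nothing", "goal", "nothing", "red card"]

best = top_k_features(rows, labels, names, k=1)
boring_gain = information_gain([row[4] for row in rows], labels)
```

Gain ratio, the best-performing setting reported later in the deck, additionally divides the gain by the entropy of the split itself, which penalises features that fragment the data.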
2. Experimental Setup
- 6 types of classifiers
1. C4.5
2. RandomForest
3. NaiveBayes
4. NaiveBayesMultinomial
5. libSVM
6. IB1
2. Results
2. Discussion
- Top50GainRatio is the best feature selection setting
- libSVM is the best classifier
- EventMinutes results:
Class         F-measure
OVERALL       0.822
Goal          0.841
Own goal      0.000
Red card      0.848
Yellow card   0.785
Substitution  0.839
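For readers unfamiliar with the metric, the per-class F-measure is the harmonic mean of precision and recall. The sketch below computes it from true and predicted labels; the labels are invented for illustration, and the slides do not state how the OVERALL figure was aggregated, so that step is left out.

```python
def f_measure(true_labels, predicted_labels, cls):
    """Harmonic mean of precision and recall for one class."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        # No correct prediction for this class: the F-measure is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical labels, not the experiment's data.
true_labels = ["goal", "goal", "red card", "own goal"]
predicted = ["goal", "red card", "red card", "goal"]

goal_f = f_measure(true_labels, predicted, "goal")
own_goal_f = f_measure(true_labels, predicted, "own goal")
red_card_f = f_measure(true_labels, predicted, "red card")
```

A class that is never predicted correctly scores 0.0, which is how a rare class such as an own goal can end up at 0.000.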
3. Experimental Setup
- Goal: assign each event to a team
- Method: use the ratio between tweets from fans of the home team and tweets from fans of the away team
- But first: extract fans
3. Extracting fans
- Hypothesis:
People who tweet about the same team each week are probably fans of that team
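One way to turn this hypothesis into a rule is sketched below. It assumes that a game hashtag is the three-letter home code followed by the three-letter away code (as in #ajafey), that a user needs tweets in at least `min_games` different games, and that exactly one team must appear in all of them. These thresholds and the helper names are assumptions for illustration; the slides do not give the actual criteria, and the hashtags `#psvaja` and `#ajautr` are hypothetical.

```python
from collections import Counter

def teams_in(hashtag):
    """Split a game hashtag such as '#ajafey' into ('aja', 'fey')."""
    tag = hashtag.lstrip("#")
    return tag[:3], tag[3:]

def extract_fans(user_games, min_games=3):
    """Label a user as a fan of the one team present in all of their games.

    user_games: {user: list of game hashtags the user tweeted with}.
    Returns {user: team code} for users that meet the (assumed) criteria.
    """
    fans = {}
    for user, hashtags in user_games.items():
        games = set(hashtags)
        if len(games) < min_games:
            continue
        appearances = Counter(team for tag in games for team in teams_in(tag))
        ranked = appearances.most_common(2)
        top_team, top_n = ranked[0]
        runner_up_n = ranked[1][1] if len(ranked) > 1 else 0
        if top_n == len(games) and runner_up_n < top_n:
            fans[user] = top_team
    return fans

user_games = {
    "anna": ["#ajafey", "#psvaja", "#ajautr"],  # always a game with 'aja'
    "bob": ["#ajafey", "#nacgro", "#psvutr"],   # no recurring team
    "cas": ["#ajafey"],                         # too few games
}
fans = extract_fans(user_games)
```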
3. Extracting fans
- Extracted 38,527 fans from 146,326 users (26%)
- This method of extracting fans works well:
Right team Not clear Wrong team
88% 10% 2%
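With fans identified, the ratio-based assignment from the experimental setup might be sketched as follows. The decision rule used here (assign the event to the home team when home fans account for a larger share of fan tweets in the event minute than over the whole game) is an assumption made for illustration; the slides state only that the ratio of home-fan to away-fan tweets is used. The `(minute, user)` tweet representation and all names are likewise hypothetical.

```python
def assign_event(tweets, fans, home, away, event_minute):
    """Guess which team an event belongs to from fan tweet shares.

    tweets: list of (minute, user) pairs for one game (assumed representation).
    fans: {user: team code}. Returns home, away, or None if undecidable.
    """
    def home_share(selected):
        home_n = sum(1 for _, user in selected if fans.get(user) == home)
        away_n = sum(1 for _, user in selected if fans.get(user) == away)
        total = home_n + away_n
        return home_n / total if total else None

    overall = home_share(tweets)
    during = home_share([t for t in tweets if t[0] == event_minute])
    if overall is None or during is None or during == overall:
        return None
    return home if during > overall else away

fans = {"anna": "aja", "bob": "fey", "cas": "aja"}
tweets = [(10, "anna"), (10, "bob"), (35, "anna"),
          (35, "cas"), (35, "anna"), (36, "bob")]

minute_35 = assign_event(tweets, fans, "aja", "fey", 35)
minute_36 = assign_event(tweets, fans, "aja", "fey", 36)
minute_50 = assign_event(tweets, fans, "aja", "fey", 50)
```

Comparing against the whole-game share rather than a fixed 50% guards against one team simply having more fans on Twitter.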
3. Results
- Assigning events to teams performs at or above the baseline for every class:
Class         Baseline  Performance
OVERALL       52%       58%
Goal          58%       69%
Red card      50%       62%
Yellow card   63%       63%
Substitution  52%       57%
Conclusion
1. Detecting events => difficult
2. Classifying events => good results
3. Assigning events to teams => promising results
Future Work
- Use sentiment in tweets (for detecting events and assigning events to teams)
- Player detection
- Other sports
Questions?