Carl Miller

CASM The Centre for the Analysis of Social Media

Transcript of Carl Miller

Page 1: Carl Miller

CASM The Centre for the Analysis of Social Media

Page 2: Carl Miller

Since 2008: 1.2 bn regular users

More time is spent on social media than on any other Internet activity

Page 3: Carl Miller

Rapid transition of our lives onto social-digital platforms

More political, social and intellectual activity is being captured as data that would previously have been lost

Page 4: Carl Miller

SOCMINT: The Promise of SOCMINT

New bodies of data that are:

Very large

Constantly refreshing

Unmediated

Rich

Linked

Page 5: Carl Miller

Social media intelligence (SOCMINT)

Page 6: Carl Miller

SOCMINT: The Opportunities

1. Inform: the ‘who’, ‘what’, ‘when’

2. Understand: the ‘why’ and ‘how’

3. Predict: ‘what next’

Page 7: Carl Miller

SOCMINT: To inform

Page 8: Carl Miller

Event Detection

Page 9: Carl Miller

Targeted Event Detection

Page 10: Carl Miller

2012 Olympics

We knew: ‘Olympians’ were competing in ‘events’ for ‘medals’.

We didn’t know: when the medals were being won, or by whom.

Page 11: Carl Miller

Targeted events, Step 1: Collecting Tweets

30,470,932 Tweets posted between 18 July and 13 August 2012 were collected. Each of these Tweets contained at least the first or the last name of an Olympian competing in the Games.
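The collection rule in Step 1 (keep a Tweet if it contains an Olympian's first or last name) can be sketched as one case-insensitive pattern. This is a minimal sketch, assuming an in-memory list of (first name, last name) pairs; `build_name_pattern` and `filter_tweets` are hypothetical helpers, not the collection tooling used in the study, which would have filtered a live stream rather than a Python list.

```python
from __future__ import annotations

import re
from typing import Iterable, Iterator


def build_name_pattern(olympians: Iterable[tuple[str, str]]) -> re.Pattern[str]:
    """Compile one case-insensitive regex matching any first or last name."""
    names: set[str] = set()
    for first, last in olympians:
        names.update((first, last))
    # Try longer names first so the alternation prefers the longest match.
    alternation = "|".join(
        re.escape(name) for name in sorted(names, key=len, reverse=True)
    )
    # \b word boundaries stop a name such as "Chad" matching inside "Chadwick".
    return re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)


def filter_tweets(tweets: Iterable[str], pattern: re.Pattern[str]) -> Iterator[str]:
    """Yield only the Tweets that mention at least one of the names."""
    for text in tweets:
        if pattern.search(text):
            yield text


if __name__ == "__main__":
    pattern = build_name_pattern([("Chad", "le Clos"), ("Michael", "Phelps")])
    sample = ["Phelps takes silver", "Nice weather today", "CHAD wins gold"]
    print(list(filter_tweets(sample, pattern)))
```

Matching on single first names is deliberately broad and will pull in unrelated Tweets; that noise is one face of the keyword-sampling problem raised later under representivity.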

Page 12: Carl Miller

Targeted events, Step 2: Measuring Tweets over time
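Step 2 amounts to turning a list of Tweet timestamps into a time series of counts. A minimal sketch, assuming one-minute bins anchored at the earliest Tweet (both are illustrative choices, not the study's settings); `counts_per_bin` is a hypothetical name.

```python
from __future__ import annotations

from collections import Counter
from datetime import datetime, timedelta


def counts_per_bin(
    timestamps: list[datetime], bin_size: timedelta = timedelta(minutes=1)
) -> list[tuple[datetime, int]]:
    """Count Tweets in consecutive fixed-width bins, keeping empty bins as 0."""
    if not timestamps:
        return []
    start, end = min(timestamps), max(timestamps)
    # timedelta // timedelta gives an int: the index of the bin each Tweet falls in.
    counts = Counter((ts - start) // bin_size for ts in timestamps)
    n_bins = (end - start) // bin_size + 1
    return [(start + i * bin_size, counts.get(i, 0)) for i in range(n_bins)]


if __name__ == "__main__":
    t0 = datetime(2012, 8, 1, 12, 0, 0)
    stamps = [t0, t0 + timedelta(seconds=30), t0 + timedelta(minutes=2, seconds=5)]
    for bin_start, n in counts_per_bin(stamps):
        print(bin_start.isoformat(), n)
```

Empty bins are kept explicitly so that quiet periods count as zeros, which matters for measuring change in the next step.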

Page 13: Carl Miller

Targeted events, Step 3: Measuring change in the rate of Tweets
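One simple way to measure the change in rate, assumed here rather than taken from the study, is to compare each bin's count with the mean of the bins just before it. `rate_change` and its five-bin default window are hypothetical.

```python
from __future__ import annotations


def rate_change(counts: list[int], window: int = 5) -> list[float]:
    """Difference between each bin's count and the mean of the preceding
    `window` bins. The first bin has no history, so its change is 0.0."""
    changes: list[float] = []
    for i, count in enumerate(counts):
        history = counts[max(0, i - window):i]
        baseline = sum(history) / len(history) if history else float(count)
        changes.append(count - baseline)
    return changes


if __name__ == "__main__":
    # A flat rate of 10 Tweets per bin, then a jump to 40.
    print(rate_change([10, 10, 10, 40], window=3))
```

A trailing mean rather than the previous bin alone makes the measure less sensitive to one-bin jitter; the window length trades responsiveness against noise.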

Page 14: Carl Miller

Targeted events, Step 4: Identify the possible pre- and post-event windows of the Tweetstream
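Given the changes in rate, candidate events can be flagged where the change is unusually large, with the bins either side taken as pre- and post-event windows. This sketch assumes a z-score threshold (`k = 3`) and fixed five-bin windows; these are illustrative parameters, not the values used in the 2012 analysis, and `candidate_windows` is a hypothetical name.

```python
from __future__ import annotations

from statistics import mean, pstdev


def candidate_windows(
    changes: list[float], k: float = 3.0, pre: int = 5, post: int = 5
) -> list[tuple[range, range]]:
    """Flag bins whose rate change is more than k standard deviations above the
    mean change; return (pre-event, post-event) bin-index ranges for each."""
    if len(changes) < 2:
        return []
    mu, sigma = mean(changes), pstdev(changes)
    if sigma == 0:  # a perfectly flat series has no outliers
        return []
    windows: list[tuple[range, range]] = []
    for i, change in enumerate(changes):
        if (change - mu) / sigma > k:
            pre_window = range(max(0, i - pre), i)
            post_window = range(i, min(len(changes), i + post))
            windows.append((pre_window, post_window))
    # Note: adjacent flagged bins each get their own windows; they are not merged.
    return windows


if __name__ == "__main__":
    series = [0.0] * 20 + [30.0] + [0.0] * 9
    print(candidate_windows(series))
```

The post-event window is where Tweets announcing a result would be expected; comparing it with the pre-event window is one way to separate "who won" Tweets from background chatter.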

Page 15: Carl Miller

Tweet text | Score
Gold Gold Chad le Clos by the fingertip #teamSA #london2012 | 0.715
20th of a second between Chad Le Clos and Michael Phelps in the 200M Butterfly!? Wow, what a final! Credit to Le Clos! #London2012Olympics | 0.618
Here comes Michael Phelps - WOW! Misses gold by 0.01 seconds! Phelps takes silver. South African Chad Le Clos wins gold #London2012 #Olympx | 0.597
Wow Michael Phelps misses gold by 0.01 seconds! Phelps takes silver. South African Chad Le Clos wins gold #London2012 | 0.595

Tweet text | Score
When I was 10 I dreamed of going to the Olympics, the furthest I got was European Champion at the age of 12 and then I stopped... | -0.066
took gymnastics from 18 ms - 10 yrs then quit cuz I didn't think I was good I find out I prob wld hv been in the Olympics fml | -0.067
@raytetreault: I'm gonna get the Olympics ring tattoo and just tell everyone I was in the Olympics. #soundsgood | -0.067

Page 16: Carl Miller

Situational Awareness

Page 19: Carl Miller

SOCMINT: To understand

Page 21: Carl Miller

Social network analysis

Natural language processing

Page 22: Carl Miller

The Classifier: Natural language processing

§  The practical value of NLP is to create classifiers.

§  Classifiers are models that are taught to sort natural language (most often Tweets) into categories defined by the analyst, on the basis of examples of each category that the analyst provides.

§  This is ‘machine learning’ through ‘annotation’. We’ll be doing it this afternoon.

§  The basis of this is Bayesian mathematics: it is inherently probabilistic.
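The annotate-then-classify loop described above can be sketched with a multinomial naive Bayes classifier, one common Bayesian text classifier. This is a minimal sketch assuming simple regex tokenisation and add-one smoothing; it is not Method 51's implementation, and the class and method names are hypothetical. Its output is a probability per category, which is what "inherently probabilistic" means in practice.

```python
from __future__ import annotations

import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case and keep runs of letters, digits, hashtags, mentions, apostrophes."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayesClassifier:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self) -> None:
        self.doc_counts: Counter[str] = Counter()  # annotated examples per category
        self.word_counts: dict[str, Counter[str]] = defaultdict(Counter)
        self.vocab: set[str] = set()

    def train(self, text: str, label: str) -> None:
        """Add one annotated example: a text and the analyst's chosen category."""
        self.doc_counts[label] += 1
        for token in tokenize(text):
            self.word_counts[label][token] += 1
            self.vocab.add(token)

    def probabilities(self, text: str) -> dict[str, float]:
        """Posterior probability of each category given the text."""
        total_docs = sum(self.doc_counts.values())
        v = len(self.vocab)
        log_scores: dict[str, float] = {}
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                count = self.word_counts[label][token]  # 0 if unseen
                score += math.log((count + 1) / (total_words + v))
            log_scores[label] = score
        # Normalise in log space (subtract the max) to avoid underflow.
        top = max(log_scores.values())
        exps = {label: math.exp(s - top) for label, s in log_scores.items()}
        z = sum(exps.values())
        return {label: e / z for label, e in exps.items()}

    def classify(self, text: str) -> str:
        probs = self.probabilities(text)
        return max(probs, key=probs.get)


if __name__ == "__main__":
    clf = NaiveBayesClassifier()
    clf.train("gold medal win", "event")
    clf.train("wins gold in the final", "event")
    clf.train("i dreamed of the olympics as a kid", "other")
    clf.train("getting an olympics tattoo", "other")
    print(clf.classify("gold in the final"), clf.probabilities("gold in the final"))
```

Each new annotated example simply updates the counts, which is why annotation-driven training can be iterated quickly as the analyst refines the categories.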

Page 23: Carl Miller

Method 51: Natural language processing

Page 31: Carl Miller

Predicting X Factor

Page 33: Carl Miller

Predicting X Factor: From Opinion to Action

[Diagram: from social media opinion to action, via “brand” evaluation, immediate evaluation and behaviour modelling]

Page 34: Carl Miller

Predicting X Factor

Page 35: Carl Miller

Challenges to SOCMINT

Page 36: Carl Miller

SOCMINT: Two Parallel Challenges

Two challenges stand in the way of SOCMINT paying decisive dividends to public security:

§  Methodology: old methods are overwhelmed. Understanding a new, digital-social world requires new and unfamiliar applications of new technologies.

§  Legitimacy: an obviously contested area, and one that stands to suffer much harm from use without public consent.

Page 37: Carl Miller

Challenges to SOCMINT: Representivity

§  Most data needs to be applicable to a given group in the offline world. There are a number of reasons to be suspicious of social media data on this count:

§  Data gathered from the platform may not represent the platform (sampling issues, especially with keyword-based collection)

§  Social media content may not represent social media users, because social media activity is subject to power laws. Research suggests that a small number of ‘power users’, around 5 percent of Twitter users, are responsible for 75 percent of Twitter activity.

§  Social media users may not represent actual people (sock-puppets and bots)
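The power-law point can be checked on any collected sample by asking what share of posts comes from the most active users. A minimal sketch, assuming a mapping from user ID to post count; `top_user_share` is a hypothetical helper, and the toy counts below are invented for illustration, not research data.

```python
from __future__ import annotations

import math


def top_user_share(posts_per_user: dict[str, int], top_fraction: float = 0.05) -> float:
    """Share of all posts produced by the most active `top_fraction` of users."""
    if not posts_per_user:
        return 0.0
    counts = sorted(posts_per_user.values(), reverse=True)
    # Always include at least one user, even in very small samples.
    n_top = max(1, math.ceil(len(counts) * top_fraction))
    total = sum(counts)
    return sum(counts[:n_top]) / total if total else 0.0


if __name__ == "__main__":
    # Toy sample: one very active account and 19 accounts posting once each.
    toy = {"heavy_user": 81, **{f"user{i}": 1 for i in range(19)}}
    print(top_user_share(toy))  # 1 of 20 users (5%) produced 81 of 100 posts
```

A high share means that counting posts largely measures a few accounts, so per-user rather than per-post counts may be the better unit when the question is about people.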

Page 38: Carl Miller

Challenges to SOCMINT: Veracity

§  Is what is being measured actually what is happening? The technologies and methodologies are new, and many are experimental and probabilistic.

§  The openness and anonymity of social media make it an especially suitable medium for deceptive tactics. A deliberate intent to mislead could be expressed through disinformation, misinformation, honeypot accounts, impersonation and wiki-circularity; there may even be self-deception.

Deception

Page 39: Carl Miller

Challenges to SOCMINT: Reality

§  Does it actually correspond with the real world?

§  Online disinhibition effect: people often say and do things online that they would not offline.

§  Our ability to count things on social media has outpaced our understanding of what these things mean as social and cultural practices: as symbols, as language-games, as rituals, as products of digital worlds ruled by new norms and subjective truths.

Page 40: Carl Miller

Challenges to SOCMINT: Validation

§  There are not yet any developed and tested strategies, commonly used across SOCMINT practice or social media research, to validate whatever is produced. This applies:

§  Either to single-source validation: rating the confidence in any single piece of intelligence reporting

§  Or to how it feeds into all-source assessment against other pieces of intelligence and bodies of open data.
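One generic building block borrowed from NLP practice, offered here as a sketch rather than a validated SOCMINT strategy, is to score a classifier's output against a hand-annotated sample. The function name and the "event"/"other" labels are hypothetical; precision, recall and F1 are the standard definitions.

```python
from __future__ import annotations


def precision_recall_f1(
    predicted: list[str], gold: list[str], positive: str
) -> tuple[float, float, float]:
    """Compare classifier labels with hand-annotated (gold) labels for one category."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must be the same length")
    pairs = list(zip(predicted, gold))
    tp = sum(1 for p, g in pairs if p == positive and g == positive)
    fp = sum(1 for p, g in pairs if p == positive and g != positive)
    fn = sum(1 for p, g in pairs if p != positive and g == positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    predicted = ["event", "event", "other", "other"]
    gold = ["event", "other", "event", "other"]
    print(precision_recall_f1(predicted, gold, positive="event"))
```

Such scores say how well a classifier reproduces its annotators on a held-out sample; they do not by themselves establish that the output corresponds to offline reality, which is the wider validation gap described above.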

Page 41: Carl Miller

Challenges to SOCMINT: Use

§  SOCMINT depends on reaching the right people in time and securely, in a format that makes sense to strategic and operational decision-makers as well as to those on the front line. The issues are:

§  SOCMINT is often complex, self-contradictory and dynamic

§  It must be understood within a cloud of caveats

Page 42: Carl Miller

Challenges to SOCMINT: Legitimacy, public acceptability and law

§  The Internet is a contested place: from the beginning, it has carried a cyber-libertarian belief that the Internet exists to evolve humanity beyond states

§  Its universal language, the TCP/IP protocol suite, embraces an open architecture that distrusts centralised control, allows any computer or network to join, and does not make (nor allow Internet service providers to make) judgments about content.

§  It is therefore vital that the collection and use of SOCMINT rests on a firm basis of public acceptability

Page 43: Carl Miller

Challenges to SOCMINT: Not yet a discipline

§  A scattering of isolated islands of emphasis

§  Not a united body of learning, method or example

§  Spans disciplines from computer science and ethnography to advertising and brand management

§  Conducted across the private sector (from tech start-ups to large business analytics firms) and academia, and now beginning in the public sector

§  Fastest take-up was in marketing and advertising

§  Slower take-up in government and the public sector

§  Has still made barely an impact on charities and the third sector

Page 44: Carl Miller

CASM @carljackmiller