Application of Confidence Intervals to Text-based Social Network Construction

By

CDT Julie Jorgensen, 06, G4

Advisors: MAJ Ian McCulloh, D/MATH

LTC John Graham, D/BS&L

Agenda

• The Real-World Problem
• Text Analysis/Social Network Analysis Solution
• Social Network Analysis
• Simple Text Analysis
• A Better Solution
• Themed Analysis
• Example Case – Jihadist Texts
• Theme Scores
• Network Construction Procedure
• Jihadist Network
• Results
• Importance and Conclusions

The Real-World Problem

• Commanders need to understand “Human Terrain”
• Majority of ‘HT’ information is in text form
• The Combating Terrorism Center receives volumes of data every day.
• Harmony Database is being rapidly declassified
• Need an efficient way to plow through large amounts of text data and see the linkages.

Solution: Text Analysis Displayed in Social Network Analysis

Social Network Analysis

A mathematical method of quantifying connections between individuals or groups and drawing conclusions from those connections

• Assumes rational beings are interdependent
• Nodes: Key Actors
• Links: Relationships between Nodes

“Human Terrain” Example: 9/11 Hijacker Network

[Figure: Iraq Elections; labels include Barzani and Khamenei]

Demonstration Data Set: Jihadist Texts

• Approx. 250 translated texts: MEMRI, FBIS, Other Sources
• 15 Authors: More than 1 text, Not well known

Simple Text Analysis: The Plagiarism Check

Problem

• Word matching is overly simple.
• Ignores context
• Actors can be overly weighted by writing more

Alternative: Themed Analysis

Traditional Network Analysis Methods

• Citation Analysis
• Physical Network
• Communication or Financial Network

Themed Analysis

• Relates nodes across multiple fields
• One similar theme versus many similar themes

Demonstration: Text Analysis

Theme Scores

THEMES

ISLAM      JIHAD      SALAF   INFIDEL   FOREIGNERS     SHEIKH  BATTLEGROUNDS  JEWS
allah      al_jihad   salaf   infidel   united_states  shaykh  Afghanistan    jews
religion   mujahid    sunnah  apostate  government             bosnia         zionists
islam      attack     sallam  heretic   al-Saud                two-rivers     usury
muslim     raid               kuffr     Australia              iraq           israel
ummah      defense            taghoot   Britain                palestine
brother    plane              idol      Spain
book       bombing                      Italy
messenger  operation                    France
prophet    clash
mohammad   fight
           conflict

*Theme Score is the sum of each word’s score per text
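The footnote above can be sketched in code. The slides do not say how an individual word's score is computed, so this minimal Python sketch assumes it is the word's relative frequency in the text (occurrences divided by total tokens). The `THEMES` dictionary holds only a few of the listed theme words, and the function name `theme_scores` and the sample sentence are hypothetical.

```python
import re
from collections import Counter

# A few theme words taken from the slide; the full lists are longer.
THEMES = {
    "islam": {"allah", "religion", "islam", "muslim", "ummah"},
    "jihad": {"al_jihad", "mujahid", "attack", "raid", "defense"},
}

def theme_scores(text, themes=THEMES):
    """Return {theme: score}; a theme score is the sum of its words' scores.

    ASSUMPTION: a word's score is its relative frequency in the text
    (count / total tokens). The slides do not define the word score.
    """
    tokens = re.findall(r"[a-z_\-]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {theme: 0.0 for theme in themes}
    return {
        theme: sum(counts[word] / total for word in words)
        for theme, words in themes.items()
    }

# Hypothetical sample text, for illustration only.
sample = "The muslim ummah must support the defense of religion"
scores = theme_scores(sample)
print(scores)
```

Normalizing by text length is one way to keep long texts from dominating; a raw count or a weighted score would fit the footnote equally well.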

Problem

• Commander needs information in representations he/she understands.
• Networks can compare authors across single themes
• But difficult to compare authors across multiple themes

Constructing a Network Across Multiple Themes

• Scrub Texts
• Construct Theme Scores
• Construct Confidence Intervals
• Discern Similarity between Nodes: Binary or Standardized Difference of Means
• Create Square Matrix
• Draw Network

*why not ANOVA?

Confidence Intervals

95% Confidence Interval = $\bar{x} \pm t\,\frac{s}{\sqrt{n}}$

Each Author, Each Theme

Example:

Author: Mugrin
Theme: Islam

Text    Score   Mean     Width     Low       High
ctc127  0.7234  0.50602  0.191819  0.314201  0.697839
ctc126  0.7328
ctc125  0.5387
ctc124  0.668
ctc123  0.2012
ctc122  0.6931
ctc121  0.3977
ctc120  0.227
ctc119  0.0553
ctc118  0.823
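The Mugrin/Islam example can be checked with a short Python sketch. It assumes the interval is the usual t-interval, mean ± t·s/√n with n − 1 degrees of freedom, which agrees with the Mean, Width, Low and High values in the example to about four decimal places. The critical value is hardcoded from a standard t-table to keep the sketch dependency-free, and the function name `confidence_interval` is hypothetical.

```python
import math
import statistics

# Theme scores for author Mugrin, theme Islam (texts ctc127 down to ctc118).
scores = [0.7234, 0.7328, 0.5387, 0.668, 0.2012,
          0.6931, 0.3977, 0.227, 0.0553, 0.823]

# Two-sided 95% critical value of Student's t for n - 1 = 9 degrees of
# freedom, from a standard t-table. ASSUMPTION: the slide's interval is a
# t-interval; the value is hardcoded to avoid a SciPy dependency.
T_CRIT_DF9 = 2.262157

def confidence_interval(values, t_crit):
    """Return (mean, half_width, low, high) for mean +/- t * s / sqrt(n)."""
    n = len(values)
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (divides by n - 1)
    half_width = t_crit * s / math.sqrt(n)
    return mean, half_width, mean - half_width, mean + half_width

mean, width, low, high = confidence_interval(scores, T_CRIT_DF9)
print(f"mean={mean:.5f} width={width:.6f} low={low:.6f} high={high:.6f}")
```

For an author with a different number of texts, the critical value has to be looked up again for the matching degrees of freedom.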

Relationship Scores

Each possible pair of authors per theme

Overlapping Confidence Intervals: $s_{i,j} = \frac{\mathrm{MaxDiff} - \mathrm{ActDiff}}{\mathrm{MaxDiff}}$

Disparate Confidence Intervals: $s_{i,j} = 0$
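A minimal Python sketch of the relationship score for one pair of authors on one theme follows. The slide names MaxDiff and ActDiff without defining them, so the sketch makes two assumptions: ActDiff is the absolute difference of the two means, and MaxDiff is the sum of the two interval half-widths, that is, the largest difference of means at which the intervals still overlap. Under those assumptions the score is 1 for identical means, falls linearly to 0 as the intervals separate, and is 0 for disparate intervals. The function name is hypothetical.

```python
def relationship_score(mean_i, width_i, mean_j, width_j):
    """Similarity of two authors on one theme, in [0, 1].

    ASSUMPTIONS (not defined on the slide):
      ActDiff = |mean_i - mean_j|, the actual difference of means;
      MaxDiff = width_i + width_j, the largest difference of means at
                which the two confidence intervals still overlap.
    """
    act_diff = abs(mean_i - mean_j)
    max_diff = width_i + width_j
    if max_diff == 0:
        # Degenerate intervals: related only if the means coincide.
        return 1.0 if act_diff == 0 else 0.0
    if act_diff > max_diff:
        # Disparate confidence intervals: no relationship.
        return 0.0
    # Overlapping confidence intervals.
    return (max_diff - act_diff) / max_diff

# Mugrin's Islam interval against a hypothetical second author.
example = relationship_score(0.50602, 0.191819, 0.60, 0.10)
print(round(example, 3))
```

The "Binary" alternative named in the construction procedure would presumably return 1 for any overlap instead of the graded value.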

Matrix Construction

• Multiplication of Scores for each author and each theme

• Resultant Square Matrix

            Mugrin      al-Iraqi    Alshareef   al Albanee  Ibn Baaz    Abdul Aziz  Azzam       At Tartusi  Maqdisi     Shuaibi     Al-Fahd     Madkhalee   Madhi       Al-Awdah    Qaradhawi
Mugrin      1.00000     0.76695     0.00000     0.00000     0.00000     0.00000     0.84938     0.00000     0.84852     0.80676     0.83939     0.00000     0.84403     0.00000     0.00000
al-Iraqi    0.76695     1.00000     0.51748     0.00000     0.00000     0.00000     0.84449     0.00000     0.69722     0.82516     0.81203     0.00000     0.72532     0.00000     0.00000
Alshareef   0.00000     0.51748     1.00000     0.75690     0.83688     0.00000     0.00000     0.00000     0.00000     0.00000     0.77599     0.00000     0.94616     0.00000     0.00000
al Albanee  0.00000     0.00000     0.75690     1.00000     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000     0.90076     0.00000     0.00000     0.00000
Ibn Baaz    0.00000     0.00000     0.83688     0.00000     1.00000     0.91174     0.82297     0.78024     0.80594     0.90168     0.91619     0.00000     0.86383     0.87589     0.69418
Abdul Aziz  0.00000     0.00000     0.00000     0.00000     0.91174     1.00000     0.00000     0.00000     0.73681     0.52157     0.85487     0.95733     0.88681     0.94896     0.00000
Azzam       0.84938     0.84449     0.00000     0.00000     0.82297     0.00000     1.00000     0.59977     0.93159     0.81534     0.89227     0.00000     0.79010     0.00000     0.63895
At Tartusi  0.00000     0.00000     0.00000     0.00000     0.78024     0.00000     0.59977     1.00000     0.52446     0.81876     0.82699     0.00000     0.00000     0.00000     0.00000
Maqdisi     0.84852     0.69722     0.00000     0.00000     0.80594     0.73681     0.93159     0.52446     1.00000     0.77203     0.86424     0.00000     0.82544     0.76400     0.77915
Shuaibi     0.80676     0.82516     0.00000     0.00000     0.90168     0.52157     0.81534     0.81876     0.77203     1.00000     0.92896     0.00000     0.57030     0.64583     0.00000
Al-Fahd     0.83939     0.81203     0.77599     0.00000     0.91619     0.85487     0.89227     0.82699     0.86424     0.92896     1.00000     0.00000     0.80821     0.86983     0.00000
Madkhalee   0.00000     0.00000     0.00000     0.90076     0.00000     0.95733     0.00000     0.00000     0.00000     0.00000     0.00000     1.00000     0.00000     0.00000     0.00000
Madhi       0.84403     0.72532     0.94616     0.00000     0.86383     0.88681     0.79010     0.00000     0.82544     0.57030     0.80821     0.00000     1.00000     0.00000     0.00000
Al-Awdah    0.00000     0.00000     0.00000     0.00000     0.87589     0.94896     0.00000     0.00000     0.76400     0.64583     0.86983     0.00000     0.00000     1.00000     0.00000
Qaradhawi   0.00000     0.00000     0.00000     0.00000     0.69418     0.00000     0.63895     0.00000     0.77915     0.00000     0.00000     0.00000     0.00000     0.00000     1.00000

Overall Theme Scores

Geometric Mean = $\left( \prod_{i=1}^{n} a_i \right)^{1/n}$
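As a rough illustration, the overall score for one pair of authors can be computed as the geometric mean of that pair's per-theme relationship scores: multiply the n scores and take the n-th root. The per-theme values below are hypothetical, as is the function name `overall_score`.

```python
import math

def overall_score(theme_scores):
    """Geometric mean of the per-theme relationship scores for one author pair."""
    n = len(theme_scores)
    if n == 0:
        raise ValueError("need at least one theme score")
    return math.prod(theme_scores) ** (1.0 / n)

# Hypothetical per-theme scores for one pair of authors (eight themes).
pair_scores = [0.9, 0.8, 0.95, 0.7, 0.85, 0.9, 0.75, 0.8]
linked = overall_score(pair_scores)

# A single theme with score 0 zeroes the whole product, so under this
# formula the pair gets no link in the combined network.
unlinked = overall_score([0.9, 0.8, 0.0, 0.7])

print(round(linked, 5), unlinked)
```

Because the scores are multiplied, a pair must overlap on every theme to receive a nonzero overall score, which would account for the many exact zeros in the square matrix.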

Themed Network

Theme Analysis: Confidence Interval vs Average

Author      Texts  Degree  NrmDegree
Al-Fahd     8      9.389   70.053
Maqdisi     6      8.549   63.789
Ibn Baaz    10     8.41    62.745
Shuaibi     3      7.606   56.753
Madhi       4      7.26    54.17
Azzam       4      7.185   53.608
Abdul Aziz  4      5.818   43.41
al-Iraqi    7      5.189   38.714
Mugrin      10     4.955   36.971
Al-Awdah    16     4.105   30.625
Alshareef   2      3.833   28.602
At Tartusi  2      3.55    26.489
Qaradhawi   7      2.112   15.76
Madkhalee   7      1.858   13.864
al Albanee  2      1.658   12.368
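The Degree and NrmDegree columns can be reproduced from the square matrix of overall scores. In this Python sketch, degree is the sum of an author's link weights excluding the diagonal. The normalization is an assumption: dividing by (n − 1) times the largest off-diagonal weight and scaling to a percentage matches the listed NrmDegree values, but the slides do not state the convention. The function names are hypothetical.

```python
AUTHORS = ["Mugrin", "al-Iraqi", "Alshareef", "al Albanee", "Ibn Baaz",
           "Abdul Aziz", "Azzam", "At Tartusi", "Maqdisi", "Shuaibi",
           "Al-Fahd", "Madkhalee", "Madhi", "Al-Awdah", "Qaradhawi"]

# Overall relationship scores, transcribed from the square matrix in the slides.
MATRIX = [
    [1.0, 0.76695, 0.0, 0.0, 0.0, 0.0, 0.84938, 0.0, 0.84852, 0.80676, 0.83939, 0.0, 0.84403, 0.0, 0.0],
    [0.76695, 1.0, 0.51748, 0.0, 0.0, 0.0, 0.84449, 0.0, 0.69722, 0.82516, 0.81203, 0.0, 0.72532, 0.0, 0.0],
    [0.0, 0.51748, 1.0, 0.75690, 0.83688, 0.0, 0.0, 0.0, 0.0, 0.0, 0.77599, 0.0, 0.94616, 0.0, 0.0],
    [0.0, 0.0, 0.75690, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.90076, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.83688, 0.0, 1.0, 0.91174, 0.82297, 0.78024, 0.80594, 0.90168, 0.91619, 0.0, 0.86383, 0.87589, 0.69418],
    [0.0, 0.0, 0.0, 0.0, 0.91174, 1.0, 0.0, 0.0, 0.73681, 0.52157, 0.85487, 0.95733, 0.88681, 0.94896, 0.0],
    [0.84938, 0.84449, 0.0, 0.0, 0.82297, 0.0, 1.0, 0.59977, 0.93159, 0.81534, 0.89227, 0.0, 0.79010, 0.0, 0.63895],
    [0.0, 0.0, 0.0, 0.0, 0.78024, 0.0, 0.59977, 1.0, 0.52446, 0.81876, 0.82699, 0.0, 0.0, 0.0, 0.0],
    [0.84852, 0.69722, 0.0, 0.0, 0.80594, 0.73681, 0.93159, 0.52446, 1.0, 0.77203, 0.86424, 0.0, 0.82544, 0.76400, 0.77915],
    [0.80676, 0.82516, 0.0, 0.0, 0.90168, 0.52157, 0.81534, 0.81876, 0.77203, 1.0, 0.92896, 0.0, 0.57030, 0.64583, 0.0],
    [0.83939, 0.81203, 0.77599, 0.0, 0.91619, 0.85487, 0.89227, 0.82699, 0.86424, 0.92896, 1.0, 0.0, 0.80821, 0.86983, 0.0],
    [0.0, 0.0, 0.0, 0.90076, 0.0, 0.95733, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0],
    [0.84403, 0.72532, 0.94616, 0.0, 0.86383, 0.88681, 0.79010, 0.0, 0.82544, 0.57030, 0.80821, 0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.87589, 0.94896, 0.0, 0.0, 0.76400, 0.64583, 0.86983, 0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.69418, 0.0, 0.63895, 0.0, 0.77915, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0],
]

def degree_centrality(matrix):
    """Weighted degree: sum of each row's link weights, excluding the diagonal."""
    return [sum(w for j, w in enumerate(row) if j != i)
            for i, row in enumerate(matrix)]

def normalized_degree(matrix):
    """Degree as a percentage of the largest possible degree.

    ASSUMPTION: the maximum is (n - 1) * largest off-diagonal weight. This
    matches the NrmDegree column but is not stated on the slide.
    """
    n = len(matrix)
    max_w = max(w for i, row in enumerate(matrix)
                for j, w in enumerate(row) if i != j)
    return [100.0 * d / ((n - 1) * max_w) for d in degree_centrality(matrix)]

degree = dict(zip(AUTHORS, degree_centrality(MATRIX)))
nrm = dict(zip(AUTHORS, normalized_degree(MATRIX)))
ranking = sorted(AUTHORS, key=degree.get, reverse=True)
print(ranking[:5])
```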

Author      islam  jihad  salaf  infidel  foreigners  battlegrounds  sheikh  jew  Weighted Average Rank  Overall
al-Fahd     10     5      6      3        7           6              2       9    5.57                   1
Mugrin      6      4      12     6        1           4              11      9    6.29                   2
Shuaibi     11     9      11     1        6           5              3       9    6.57                   3
Azzam       12     2      10     7        8           3              7       8    7.00                   4
Maqdisi     9      1      8      8        5           9              10      4    7.14                   5
al-Iraqi    8      6      13     5        9           1              11      9    7.57                   6
At-Tartusi  14     8      4      2        15          10             1       5    7.71                   7
Abdul Aziz  5      10     9      4        10          12             5       6    7.86                   8
Madhi       2      7      13     12       3           7              11      2    7.86                   9
Qaradhawi   15     3      13     9        11          2              4       1    8.14                   10
Alshareef   3      14     3      14       2           11             11      3    8.29                   11
Madkhalee   4      13     2      11       12          12             6       9    8.57                   12
Al-Awdah    13     11     7      10       4           8              8       7    8.71                   13
al Albanee  1      14     1      14       14          12             11      9    9.57                   14
Ibn Baaz    7      12     5      13       13          12             9       9    10.14                  15

Theme Ranks

Able to look at each theme individually.

Average Rank does not account for connection importance, weighting, or predictors

Themes are combined

Can see connections between authors across a combination of themes.
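The averages in the rank table above can be recomputed with a short Python sketch. One observation drawn from the numbers rather than from the slides: the listed averages equal the mean of the first seven theme ranks, so the jew column appears to carry zero weight. The `WEIGHTS` list encodes that assumption, and the function name is hypothetical.

```python
THEMES = ["islam", "jihad", "salaf", "infidel",
          "foreigners", "battlegrounds", "sheikh", "jew"]

# Theme ranks for three authors, copied from the rank table.
RANKS = {
    "al-Fahd": [10, 5, 6, 3, 7, 6, 2, 9],
    "Mugrin":  [6, 4, 12, 6, 1, 4, 11, 9],
    "Shuaibi": [11, 9, 11, 1, 6, 5, 3, 9],
}

# ASSUMPTION: weight 1 for the first seven themes and 0 for "jew". This
# reproduces the table's averages; the slides do not state the weights.
WEIGHTS = [1, 1, 1, 1, 1, 1, 1, 0]

def weighted_average_rank(ranks, weights=WEIGHTS):
    """Weighted mean of one author's theme ranks."""
    return sum(r * w for r, w in zip(ranks, weights)) / sum(weights)

averages = {author: weighted_average_rank(r) for author, r in RANKS.items()}
overall = sorted(averages, key=averages.get)  # lowest average rank first
print({author: round(avg, 2) for author, avg in averages.items()})
```

An average like this treats each author in isolation, which is the limitation noted above: it says nothing about which authors are connected to one another.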

Method Comparison

Themed Network Analysis  Plagiarism  Theme Ranks  Jihad Theme
Al-Fahd                  Al-Awdah    Al-Fahd      Maqdisi
Maqdisi                  Maqdisi     Mugrin       Azzam
Ibn Baaz                 Al-Albanee  Shuaibi      Qaradhawi
Shuaibi                  Al-Iraqi    Azzam        Mugrin
Madhi                    Azzam       Maqdisi      Al-Fahd

Legend: Top 5 Once; Top 5 Every Method
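The two legend categories can be recovered by counting how often each author appears across the four top-five lists. This is a small Python sketch over the names in the comparison table; the variable names are hypothetical.

```python
from collections import Counter

# Top-five authors under each method, from the comparison table.
TOP5 = {
    "Themed Network Analysis": ["Al-Fahd", "Maqdisi", "Ibn Baaz", "Shuaibi", "Madhi"],
    "Plagiarism": ["Al-Awdah", "Maqdisi", "Al-Albanee", "Al-Iraqi", "Azzam"],
    "Theme Ranks": ["Al-Fahd", "Mugrin", "Shuaibi", "Azzam", "Maqdisi"],
    "Jihad Theme": ["Maqdisi", "Azzam", "Qaradhawi", "Mugrin", "Al-Fahd"],
}

# Count the number of methods in which each author reaches the top five.
appearances = Counter(author for top in TOP5.values() for author in top)

in_every_method = sorted(a for a, c in appearances.items() if c == len(TOP5))
only_once = sorted(a for a, c in appearances.items() if c == 1)

print("Top 5 every method:", in_every_method)
print("Top 5 once:", only_once)
```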

Conclusions

Socially Engineered Algorithms involve extensive tradeoffs and decisions by the mathematician that can significantly impact a commander’s decision-making.

Multiple views of the same data are a critical requirement.

• Find Linkages in large amounts of data
• Find Connections across multiple fields
• Non-Tangible Relationships
• Real World: Track / Catch criminals / radical ideologues
• Representation of Human Terrain

Future Work

Publish method in Journal of Computational and Mathematical Organization Theory

Integration into ORA (Organizational Risk Analyzer) statistical software, which is in use by Intelligence Analysts.

Analysis of change over time

Questions?
