Post on 18-Dec-2015
Application of Confidence Intervals to Text-based Social Network Construction
By
CDT Julie Jorgensen, '06, G4
Advisors: MAJ Ian McCulloh, D/MATH
LTC John Graham, D/BS&L
Agenda
• The Real-World Problem
• Text Analysis/Social Network Analysis Solution
• Social Network Analysis
• Simple Text Analysis
• A Better Solution
• Themed Analysis
• Example Case – Jihadist Texts
• Theme Scores
• Network Construction Procedure
• Jihadist Network
• Results
• Importance and Conclusions
The Real-World Problem
• Commanders need to understand “Human Terrain”
• Majority of ‘HT’ information is in text form
• The Combating Terrorism Center receives volumes of data every day.
• Harmony Database is being rapidly declassified
• Need an efficient way to plow through large amounts of text data and see the linkages.
Solution: Text Analysis Displayed in Social Network Analysis
Social Network Analysis
A mathematical method of quantifying connections between individuals or groups and drawing conclusions from those connections
Assumes rational beings are interdependent
Nodes: Key Actors
Links: Relationships between Nodes
Demonstration Data Set: Jihadist Texts
Approx. 250 translated texts
• MEMRI
• FBIS
• Other Sources
15 Authors
• More than 1 text
• Not well known
Simple Text Analysis: The Plagiarism Check
Problem
• Word matching is overly simple.
• Ignores context
• Actors can be overly weighted by writing more
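The weakness of plain word matching can be illustrated with a minimal sketch. The function name `shared_word_count` and the three sample strings are hypothetical and chosen only for illustration; the slides do not specify how the plagiarism check was implemented.

```python
def shared_word_count(text_a, text_b):
    """Count the distinct words that appear in both texts (naive word matching)."""
    words_a = set(text_a.lower().split())
    words_b = set(text_b.lower().split())
    return len(words_a & words_b)


# Hypothetical sample texts, not taken from the data set.
short_text = "defend the land"
long_text = "defend the land and the people and the book"
other_text = "the people read the book"

# The longer text matches more words simply because it contains more words.
print(shared_word_count(short_text, other_text))
print(shared_word_count(long_text, other_text))
```

The longer text scores higher against the same comparison text without being any closer in meaning, which is the over-weighting problem noted above.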
Alternative: Themed Analysis
Traditional Network Analysis Methods
• Citation Analysis
• Physical Network
• Communication or Financial Network
Themed Analysis
• Relates nodes across multiple fields
• One similar theme versus many similar themes
Theme Scores
THEMES
ISLAM: allah, religion, islam, muslim, ummah, brother, book, messenger, prophet, mohammad
JIHAD: al_jihad, mujahid, attack, raid, defense, plane, bombing, operation, clash, fight, conflict
SALAF: salaf, sunnah, sallam
INFIDEL: infidel, apostate, heretic, kuffr, taghoot, idol
FOREIGNERS: united_states, government, al-Saud, Australia, Britain, Spain, Italy, France
SHEIKH: shaykh
BATTLEGROUNDS: Afghanistan, bosnia, two-rivers, iraq, palestine
JEWS: jews, zionists, usury, israel
*Theme Score is the sum of each word’s score per text
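A theme score of this kind could be computed along the following lines. This is only a sketch: the slides do not say how an individual word's score is defined, so the code ASSUMES it is the word's relative frequency in the text. The name `theme_score` and the sample sentence are hypothetical; the word list is a subset of the JIHAD theme.

```python
from collections import Counter
from typing import Iterable

# Subset of the JIHAD theme words listed on the slide.
JIHAD_WORDS = ["al_jihad", "mujahid", "attack", "raid", "defense"]


def theme_score(text: str, theme_words: Iterable[str]) -> float:
    """Sum the per-word scores of all theme words found in one text."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # ASSUMPTION: a word's score is its relative frequency in the text.
    return sum(counts[word] / len(tokens) for word in theme_words)


# Hypothetical sample text: 9 tokens, "raid" twice and "attack" once.
sample = "the raid followed the attack and the raid ended"
print(theme_score(sample, JIHAD_WORDS))
```

Normalizing by text length is one way to keep long texts from dominating, but it is a design choice of this sketch rather than something the source confirms.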
Problem
• Commander needs information in representations he/she understands.
• Networks can compare authors across single themes
• But difficult to compare authors across multiple themes
Constructing a Network Across Multiple Themes
• Scrub Texts
• Construct Theme Scores
• Construct Confidence Intervals
• Discern Similarity between Nodes: Binary or Standardized Difference of Means
• Create Square Matrix
• Draw Network
*why not ANOVA?
Confidence Intervals
95% Confidence Interval = $\bar{x} \pm t \cdot \frac{s}{\sqrt{n}}$ (Each Author, Each Theme)
Example:
Author: Mugrin
Theme: Islam
Text     Score    Mean     Width     Low       High
ctc127   0.7234   0.50602  0.191819  0.314201  0.697839
ctc126   0.7328
ctc125   0.5387
ctc124   0.668
ctc123   0.2012
ctc122   0.6931
ctc121   0.3977
ctc120   0.227
ctc119   0.0553
ctc118   0.823
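The interval for the Mugrin / Islam example can be recomputed with a short sketch. It assumes a Student's t interval with n − 1 degrees of freedom and the sample standard deviation, a reading that reproduces the Mean, Width, Low and High shown on the slide. The names `confidence_interval` and `T_CRIT_95` are illustrative, and the critical values are standard table values rounded to three decimals.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom
# (standard table values, rounded to three decimals).
T_CRIT_95 = {1: 12.706, 2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571,
             6: 2.447, 7: 2.365, 8: 2.306, 9: 2.262, 10: 2.228}


def confidence_interval(scores):
    """Return (mean, half_width, low, high) for one author and one theme."""
    n = len(scores)
    mean = statistics.mean(scores)
    s = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    half_width = T_CRIT_95[n - 1] * s / math.sqrt(n)
    return mean, half_width, mean - half_width, mean + half_width


# Theme scores for author Mugrin, theme Islam, copied from the slide.
mugrin_islam = [0.7234, 0.7328, 0.5387, 0.668, 0.2012,
                0.6931, 0.3977, 0.227, 0.0553, 0.823]

mean, width, low, high = confidence_interval(mugrin_islam)
print(f"mean={mean:.5f} width={width:.4f} low={low:.4f} high={high:.4f}")
```

The lookup table only covers authors with 2 to 11 texts; a statistics library would be needed for larger samples.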
Relationship Scores
Each possible pair of authors per theme
Overlapping Confidence Intervals: $s_{i,j} = \frac{\text{MaxDiff} - \text{ActDiff}}{\text{MaxDiff}}$
Disparate Confidence Intervals: $s_{i,j} = 0$
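One possible reading of the relationship score is sketched here. The slides do not define ActDiff or MaxDiff, so the code ASSUMES that ActDiff is the absolute difference between the two authors' mean theme scores and that MaxDiff is the largest difference at which the two intervals still overlap (the sum of their half-widths). Under that reading, overlapping intervals score (MaxDiff − ActDiff) / MaxDiff and disparate intervals score 0. The function name and the second author's values are hypothetical.

```python
def relationship_score(mean_i, width_i, mean_j, width_j):
    """Similarity of two authors on one theme, from their confidence intervals."""
    # ASSUMPTION: ActDiff is the absolute difference of the two means.
    act_diff = abs(mean_i - mean_j)
    # ASSUMPTION: MaxDiff is the sum of the two half-widths, i.e. the largest
    # difference of means at which the intervals still overlap.
    max_diff = width_i + width_j
    if max_diff == 0 or act_diff >= max_diff:
        return 0.0  # disparate confidence intervals
    return (max_diff - act_diff) / max_diff


# Mugrin's Islam interval from the slide, against a hypothetical second author.
score = relationship_score(0.50602, 0.191819, 0.60, 0.10)
print(round(score, 4))
```

Identical means give a score of 1, and the score falls linearly to 0 as the intervals move apart.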
Matrix Construction
• Multiplication of Scores for each author and each theme
• Resultant Square Matrix
            Mugrin      al-Iraqi    Alshareef   al Albanee  Ibn Baaz    Abdul Aziz  Azzam       At Tartusi  Maqdisi     Shuaibi     Al-Fahd     Madkhalee   Madhi       Al-Awdah    Qaradhawi
Mugrin      1.00000     0.76695     0.00000     0.00000     0.00000     0.00000     0.84938     0.00000     0.84852     0.80676     0.83939     0.00000     0.84403     0.00000     0.00000
al-Iraqi    0.76695     1.00000     0.51748     0.00000     0.00000     0.00000     0.84449     0.00000     0.69722     0.82516     0.81203     0.00000     0.72532     0.00000     0.00000
Alshareef   0.00000     0.51748     1.00000     0.75690     0.83688     0.00000     0.00000     0.00000     0.00000     0.00000     0.77599     0.00000     0.94616     0.00000     0.00000
al Albanee  0.00000     0.00000     0.75690     1.00000     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000     0.00000     0.90076     0.00000     0.00000     0.00000
Ibn Baaz    0.00000     0.00000     0.83688     0.00000     1.00000     0.91174     0.82297     0.78024     0.80594     0.90168     0.91619     0.00000     0.86383     0.87589     0.69418
Abdul Aziz  0.00000     0.00000     0.00000     0.00000     0.91174     1.00000     0.00000     0.00000     0.73681     0.52157     0.85487     0.95733     0.88681     0.94896     0.00000
Azzam       0.84938     0.84449     0.00000     0.00000     0.82297     0.00000     1.00000     0.59977     0.93159     0.81534     0.89227     0.00000     0.79010     0.00000     0.63895
At Tartusi  0.00000     0.00000     0.00000     0.00000     0.78024     0.00000     0.59977     1.00000     0.52446     0.81876     0.82699     0.00000     0.00000     0.00000     0.00000
Maqdisi     0.84852     0.69722     0.00000     0.00000     0.80594     0.73681     0.93159     0.52446     1.00000     0.77203     0.86424     0.00000     0.82544     0.76400     0.77915
Shuaibi     0.80676     0.82516     0.00000     0.00000     0.90168     0.52157     0.81534     0.81876     0.77203     1.00000     0.92896     0.00000     0.57030     0.64583     0.00000
Al-Fahd     0.83939     0.81203     0.77599     0.00000     0.91619     0.85487     0.89227     0.82699     0.86424     0.92896     1.00000     0.00000     0.80821     0.86983     0.00000
Madkhalee   0.00000     0.00000     0.00000     0.90076     0.00000     0.95733     0.00000     0.00000     0.00000     0.00000     0.00000     1.00000     0.00000     0.00000     0.00000
Madhi       0.84403     0.72532     0.94616     0.00000     0.86383     0.88681     0.79010     0.00000     0.82544     0.57030     0.80821     0.00000     1.00000     0.00000     0.00000
Al-Awdah    0.00000     0.00000     0.00000     0.00000     0.87589     0.94896     0.00000     0.00000     0.76400     0.64583     0.86983     0.00000     0.00000     1.00000     0.00000
Qaradhawi   0.00000     0.00000     0.00000     0.00000     0.69418     0.00000     0.63895     0.00000     0.77915     0.00000     0.00000     0.00000     0.00000     0.00000     1.00000
Overall Theme Scores
Geometric Mean = $\left( \prod_{i=1}^{n} a_i \right)^{1/n}$
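Combining one pair's per-theme relationship scores with a geometric mean might look like the sketch below. The function name `overall_score` and the sample score lists are hypothetical; the slides give only the formula, not the per-theme values behind any matrix cell.

```python
import math


def overall_score(theme_scores):
    """Geometric mean of one author pair's per-theme relationship scores."""
    n = len(theme_scores)
    return math.prod(theme_scores) ** (1.0 / n)


# Hypothetical per-theme scores for two author pairs.
similar_on_all_themes = [0.9, 0.8, 0.7]
disparate_on_one_theme = [0.9, 0.0, 0.7]

print(overall_score(similar_on_all_themes))
print(overall_score(disparate_on_one_theme))
```

Because the scores are multiplied, a single theme with disparate confidence intervals drives the overall link to zero, which would be consistent with the many zero entries in the square matrix.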
Theme Analysis: Confidence Interval vs Average
Author      Texts  Degree  NrmDegree
Al-Fahd     8      9.389   70.053
Maqdisi     6      8.549   63.789
Ibn Baaz    10     8.41    62.745
Shuaibi     3      7.606   56.753
Madhi       4      7.26    54.17
Azzam       4      7.185   53.608
Abdul Aziz  4      5.818   43.41
al-Iraqi    7      5.189   38.714
Mugrin      10     4.955   36.971
Al-Awdah    16     4.105   30.625
Alshareef   2      3.833   28.602
At Tartusi  2      3.55    26.489
Qaradhawi   7      2.112   15.76
Madkhalee   7      1.858   13.864
al Albanee  2      1.658   12.368
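The Degree and NrmDegree columns appear consistent with a weighted degree centrality: Degree as the sum of an author's off-diagonal link weights, and NrmDegree as that sum divided by (n − 1) times the largest link weight in the matrix, times 100. That reading reproduces the values for Al-Fahd and Madkhalee from the square matrix, but the slides do not state the formula, so treat the sketch below as an assumption. The function name `weighted_degree` is illustrative, and the example uses only a 3 × 3 sub-matrix of the full matrix.

```python
def weighted_degree(matrix):
    """Return (degrees, normalized_degrees) for a symmetric weighted matrix."""
    n = len(matrix)
    # Degree: sum of each row's link weights, skipping the diagonal.
    degrees = [sum(row[j] for j in range(n) if j != i)
               for i, row in enumerate(matrix)]
    # ASSUMPTION: normalize by (n - 1) times the largest off-diagonal weight.
    max_link = max(matrix[i][j] for i in range(n) for j in range(n) if i != j)
    if max_link == 0:
        return degrees, [0.0] * n
    normalized = [100.0 * d / ((n - 1) * max_link) for d in degrees]
    return degrees, normalized


# Sub-matrix copied from the slide's square matrix.
authors = ["al Albanee", "Abdul Aziz", "Madkhalee"]
sub_matrix = [
    [1.00000, 0.00000, 0.90076],
    [0.00000, 1.00000, 0.95733],
    [0.90076, 0.95733, 1.00000],
]

degrees, normalized = weighted_degree(sub_matrix)
for name, d, nd in zip(authors, degrees, normalized):
    print(f"{name}: degree={d:.3f} normalized={nd:.3f}")
```

Madkhalee's degree matches the table here because both of his nonzero links fall inside the sub-matrix; the other two authors and all normalized values differ from the table because the remaining twelve authors are left out.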
Author      islam  jihad  salaf  infidel  foreigners  battlegrounds  sheikh  jew  Weighted Average Rank  Overall
al-Fahd     10     5      6      3        7           6              2       9    5.57                   1
Mugrin      6      4      12     6        1           4              11      9    6.29                   2
Shuaibi     11     9      11     1        6           5              3       9    6.57                   3
Azzam       12     2      10     7        8           3              7       8    7.00                   4
Maqdisi     9      1      8      8        5           9              10      4    7.14                   5
al-Iraqi    8      6      13     5        9           1              11      9    7.57                   6
At-Tartusi  14     8      4      2        15          10             1       5    7.71                   7
Abdul Aziz  5      10     9      4        10          12             5       6    7.86                   8
Madhi       2      7      13     12       3           7              11      2    7.86                   9
Qaradhawi   15     3      13     9        11          2              4       1    8.14                   10
Alshareef   3      14     3      14       2           11             11      3    8.29                   11
Madkhalee   4      13     2      11       12          12             6       9    8.57                   12
Al-Awdah    13     11     7      10       4           8              8       7    8.71                   13
al Albanee  1      14     1      14       14          12             11      9    9.57                   14
Ibn Baaz    7      12     5      13       13          12             9       9    10.14                  15
Theme Ranks
Able to look at each theme individually.
Average Rank does not account for connection importance, weighting, or predictors
Themes are combined
Can see connections between authors across a combination of themes.
Method Comparison
Themed Network Analysis  Plagiarism  Theme Ranks  Jihad Theme
Al-Fahd                  Al-Awdah    Al-Fahd      Maqdisi
Maqdisi                  Maqdisi     Mugrin       Azzam
Ibn Baaz                 Al-Albanee  Shuaibi      Qaradhawi
Shuaibi                  Al-Iraqi    Azzam        Mugrin
Madhi                    Azzam       Maqdisi      Al-Fahd
Legend: Top 5 Once; Top 5 Every Method
Conclusions
Socially Engineered Algorithms involve extensive tradeoffs and decisions by the mathematician that can significantly impact the commander’s decision-making.
Multiple views of the same data are a critical requirement.
• Find Linkages in large amounts of data
• Find Connections across multiple fields
• Non-Tangible Relationships
• Real World: Track / Catch criminals / radical ideologues
• Representation of Human Terrain
Future Work
Publish method in Journal of Computational and Mathematical Organization Theory
Integration into ORA (Organizational Risk Analyzer) statistical software, which is in use by intelligence analysts.
Analysis of change over time
References
Dr. Jaret Brachman. Combating Terrorism Center, USMA.
Dr. Steven Corman. Hugh Downs School of Human Communication, Arizona State University.
http://www.checkpoint-online.ch/CheckPoint/Images/N-HusseinCapture.jpg
http://www.salmac.co.za/profile-writing-arabic.gif
Wasserman, Stanley and Katherine Faust. Social Network Analysis: Methods and Applications. New York: Cambridge University Press, 1994, 4.