Social Media for Intelligence: Practical Examples of Analysis for Understanding

Jonas Alastair Juhlin (a), John Richardson (b)

(a) Royal Danish Defence College; (b) United States Army Research Laboratory, Computational and Information Science Directorate, Aberdeen Proving Ground, MD, USA

Abstract

Social media has become a dominant feature of modern life. Platforms like Facebook, Twitter, and Google have users all over the world, and people from all walks of life use them. Intelligence services cannot ignore social media: it holds an immense amount of information with clear potential for extracting useful intelligence. Social media has been around long enough that most intelligence services recognize that it needs some form of attention. However, the intelligence collector and analyst must understand several aspects of social media in order to exploit it fully for intelligence purposes. This paper presents Project Avatar, an experiment in obtaining effective intelligence from social media sources, and several emerging analytic techniques that expand the intelligence gathered from these sources.

Keywords: Social Media Intelligence

1. Project Avatar

Project Avatar is a research project conducted by the Royal Danish Defence College. Its objective is to collect and process intelligence from open sources (Open Source Intelligence, OSINT) such as Google, Facebook, and Twitter, and to see whether effective and precise intelligence can be obtained in the most cost-effective manner. In other words, the project attempts to answer the question “How simplistic can an intelligence collection platform be?”

Real-world cases were used to provide the level of measurement needed to ascertain the usefulness of the intelligence platform. A conceptual model (Xai Li’s model from 2010 [3]) is used to illustrate social networks and their spatio-temporal characteristics. The model organizes the user’s task and focuses the search actions on the desired knowledge objective, as set out by the research question.

The conceptual model (explained further in the next section) is necessary because it gives the user a consistent method for all search tasks. The aim of the method is to organize the known information, also called data, in an effective and constructive way in the search for the unknown element, also called the knowledge element. It also ensures that the tasks are solved in a standardized way.

It is necessary to determine how simplistic an intelligence collection platform can be while still gathering useful intelligence. The answer sets a minimum standard against which we can measure training requirements, the usefulness of commercial tools, and the platform’s ability to support other intelligence platforms.

1.1 Visualizing Social Media Data and Networks

Peuquet defined spatio-temporal data as consisting of three basic components, When, Where, and What, which are used to structure questions [2]. Each component represents an element of a question. When searching for information within a social network, a question is constructed from one or more data elements, which are known information. The basic components of a question are interrelated as follows:

When + Where → What. Describes the object, or set of objects, that are present at a given time, or set of times, at a location or set of locations.

What + When → Where. Describes the location, or set of locations, occupied by a given object at a given time or a period.

Where + What → When. Describes the time, or a period, that a given object, or set of objects, occupies a location or set of locations.

The component elements must be determined before the unknown element can be answered. For example, in order to determine the What element it is necessary to know (or assume) the When and Where elements. In academic research, these question compositions have been used to structure search actions within social media. Two researchers worked with this approach: Mennis [2] and Xai Li [3]. Xai Li constructed a model to illustrate the relationship between the question elements and, in the process, added a fourth question that concerns the existence of the object.

What + When + Where → Whether. Describes the existence of an object in a certain situation.

The model from Xai Li [3] can be seen below:

Operationalizing the above figure for search actions, Xai Li [3] designed the following concepts. The Whether element was added as a component element to increase accuracy of the search actions.

When + Where + Whether → What. Describes the character, attributes, and nature of an event, an object, a series of events, or a series of objects.

What + When + Whether → Where. Describes the location, or a series of locations, of an event.

Where + What + Whether → When. Describes the time or a period of an event.

What + When + Where → Whether. Describes the existence of an object in a certain situation.

In these equations the left side holds the data elements and the right side is the knowledge element, the object of the search. Each search action consists of one knowledge element, which is the element that you wish to confirm, deny, or improve. The other, known elements are the data elements. The knowledge element can be a node, a link between two nodes, or a sub-network in a search task within a dynamic social network [1] [4]. In this project the known information is organized as the data elements in order to confirm, deny, or improve the accuracy of the knowledge element.
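The split between known data elements and a single unknown knowledge element can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `SearchTask`, its methods, and the example values are hypothetical and are not part of Xai Li's model or of the project's actual tooling (the test persons used no tools).

```python
from dataclasses import dataclass, field

# The four question elements used in the model.
ELEMENTS = ("What", "When", "Where", "Whether")


@dataclass
class SearchTask:
    """One search action: known data elements plus one knowledge element."""
    data: dict = field(default_factory=dict)  # known elements and their values

    def knowledge_element(self) -> str:
        """Return the single element that is not yet known."""
        missing = [e for e in ELEMENTS if e not in self.data]
        if len(missing) != 1:
            raise ValueError(f"expected exactly one unknown element, got {missing}")
        return missing[0]

    def as_equation(self) -> str:
        """Render the task as 'data + data + data -> knowledge'."""
        known = [e for e in ELEMENTS if e in self.data]
        return " + ".join(known) + " -> " + self.knowledge_element()


# Hypothetical example values for a single search action.
task = SearchTask(data={
    "What": "area under sole military control",
    "When": "7 August to 6 November 2014",
    "Where": "Sinjar",
})
print(task.as_equation())  # What + When + Where -> Whether
```

Forcing exactly one unknown element mirrors the rule that each search action has one knowledge element; a task with two unknowns would have to be split into separate search actions.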

1.2 Test Conditions

We selected two test persons with an academic degree at Master's level, called “Kandidat uddannelse fra universitet” in Denmark. It was important that the test persons had no intelligence training or experience at all, so that they would start from the lowest possible level. This would reduce possible biases towards the method or the tasks: positive or negative experience could affect their attitude towards the tasks, and previous training might disrupt the application of the method. We decided that the minimum requirement was an academic background at the level of graduate students (completed BA or undertaking an MA). The test persons were familiar with basic academic research methods.

We used ordinary laptops: one 4-year-old HP computer with Windows Vista and one 1-year-old MacBook. No tools, neither free nor commercial, were used to store and handle the information. The test persons used only the Google search engine, Twitter, and Facebook. We explained the method and the theoretical model behind the concept to the test persons before they received the tasks.

We collected material from within the Middle East, so some of it was in Arabic. One of the researchers could read some Arabic, and we used this skill on some of the sources. If the cases had been in Russian or Chinese, we could not have used such sources. Language is a challenge that affects the use of sources: if the operator can read a foreign language besides English, the number of available sources widens. The tasks can be completed with English as the only language, but it is a considerable advantage if one or more operators have some skill in a foreign language.

1.3 Tasks

A private group presented the cases as an example of the results, and the level of detail, that a group could achieve by exploiting social media and open sources for information on Islamic State (DAESH). The results were displayed publicly on the internet. We decided to pick these cases and see whether we could improve the results by structuring the search tasks, organizing the information into data elements (known information) and a knowledge element (unknown information).

We picked the cases because they were diverse in nature, which allowed us to try the different knowledge elements on different tasks. We judged that the cases would allow us to gather information useful for an intelligence purpose. Only 2 of the 5 cases are presented here because of the length of the results.

Task 2: Confirm or deny IS areas of control in Northern Iraq

Task 4: Determine the existence of an IS training camp and improve the geolocation

1.3.1 Task 2 Confirm or Deny Control of Select Areas of Responsibility (AORs)

In this task, we wanted to confirm or deny IS (DAESH) control over the following townships and municipalities. We appreciated that the situation was fluid and that many of the areas might be contested. We organized the search as follows:

The nature and character of a controlled area, which are the attributes of the What element, are defined as follows: a controlled area is one where one side has sole military control and the other side does not contest it with military forces. When an area is controlled, the controlling force can impose its military and political will there. If there is military confrontation in the area, it is a contested area.

The When element is defined as the period from the start of the IS offensive on 7 August until 6 November, the final day of work on the case. It appeared that the offensive had faded away by then.

The Where element comprised the areas of interest listed below.

We sought to uncover whether IS had control over specific areas, so the Whether element is the object.

| Sub-task no. | Search words | Platform used | IS control: confirmed or denied | Date of search |
| --- | --- | --- | --- | --- |
| 1 | Sinjar + IS | Google + Twitter | Denied. IS had control over the area from early August, but reports from October show that the area is contested. | 24-10-14 |
| 2 | Talkif (NR) – Talkaif + Iraq + IS – Telkaif + Iraq + IS | Google + Twitter | Confirmed. Several reports confirm large numbers of refugees and that the town is occupied. | 30-10-14 |
| 3 | IS control al-hamdaniya - al hamdaniya ISIS – al hamdaniya | Google + Twitter + searchfacebookpost.com (NR) + social-searcher.com (NR) | Denied. Al-Hamdaniya has received little media coverage, but nothing directly says it is under IS control. | 30-10-14 |
| 4 | Makhmour + Iraq + IS | Google + Twitter | Denied. Kurdish forces liberated the town in late August. | 30-10-14 |
| 5 | Zammar – al zammar – zammar + IS – zammar + IS | Google + Twitter + searchfacebookpost.com (NR) + social-searcher.com (NR) + icerocket.com (NR) | Denied. Twitter updates report heavy fighting between Kurdish and IS forces, and the area is contested. | 30-10-14 |
| 6 | Rabee’ah (NR) - Rabiah + Iraq + IS | Google + Twitter | Denied. A single source stated that Kurdish forces had liberated the town after a two-day siege. | 06-11-14 |
| 7 | Bartela (NR) – Bartella Iraq – Bartala Iraq | Google + Twitter | Denied. A news article and several tweets show that IS had control over the town around July. In August, US airstrikes and Iraqi army special forces attacked the town, and tweets report that IS militias are retreating from it. | 06-11-14 |
| 8 | Karam Lay (NR) – Karamlaish + Iraq – karam laish – karamlash + Iraq – karam lash + Iraq + IS | Google + Twitter | Confirmed. The Archbishop of Erbil (Irbil) says that Christians are fleeing the town due to IS attacks and occupation. | 06-11-14 |
| 9 | Al-Kweir | Google + Twitter + searchfacebookposts.com + icerocket.com + social-searcher.com | No results of relevance on any of the searches. | 03-11-14 |
| 10 | Wana + IS + Iraq | Google + Twitter | Confirmed. Kurdish forces report the fall of the town. | 03-11-14 |
| 11 | Fifeel – fifeel + IS | Google + Twitter + searchfacebookposts.com + icerocket.com + social-searcher.com | No results of relevance on any of the searches. | 03-11-14 |
| 12 | Ba’ashiqa (NR) - Baashiqa + IS + Iraq | Google + Twitter | Denied. Through October and early November, Kurdish sources report fighting in Baashiqa. From pictures and videos, it appears that the Kurdish forces have secured a hold in the town. | 06-11-14 |
| 13 | Al-shalalat – al shalalat + IS | Google + Twitter + searchfacebookposts.com + icerocket.com + social-searcher.com | Denied. It is possible the sources are only observing the Mosul area. Nevertheless, Twitter reports Kurdish forces fighting in al-shalalat. | 03-11-14 |
| 14 | Sada Bawiza + IS + Iraq | Google + Twitter | No results of relevance on any of the searches. | 03-11-14 |
| 15 | Ayn Zalah – Ein Zala – Ain Zala | Google + Twitter | Denied. It can be concluded that Peshmerga troops took the city back in late August, and no article shows it has been retaken by IS. | 03-11-14 |
| 16 | Mosul Dam + IS | Google + Twitter | Denied. Kurdish forces have captured the dam and are now in control. | 03-11-14 |
| 17 | Tumarat – Tumarat base | Google + Twitter + searchfacebookposts.com + icerocket.com + social-searcher.com | Confirmed. A single source reports that IS took control of the base in August. | 03-11-14 |

In summary, all searches were carried out in both Google and Twitter on 30/10 2014, 03/11 2014, and 06/11 2014. Spelling was of the greatest importance. In no. 2, Telkif could also be spelled Talkaif or Telkaif. In no. 12, Ba’ashiqa Township could also be spelled Bashiqa or Baashiqa. Other alternative spellings are Rabee’ah – Rabiah and Karam Lay – Karamlaish – karam laish – karamlash – karam lash. The different spellings gave more results from both Twitter and various news sites and blogs. The results varied in form: updates with pictures, updates without pictures, videos, news articles, and blog updates. Total time spent on the task was 34 hours. Each sub-task took about 2 hours to complete. The test persons divided the sub-tasks between them and each spent 17 hours.
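Because spelling variants mattered so much, it can help to generate one query per transliteration instead of typing them ad hoc. The following is a minimal sketch, assuming the same "term + term" notation used in the search-word column; the function name `build_queries` and the default context terms are illustrative assumptions, not something the test persons used.

```python
def build_queries(variants, context_terms=("Iraq", "IS")):
    """Combine each place-name spelling with fixed context terms.

    Spellings that differ only by letter case are treated as duplicates.
    """
    seen = set()
    queries = []
    for name in variants:
        key = name.casefold()
        if key in seen:  # skip case-only duplicates
            continue
        seen.add(key)
        queries.append(" + ".join((name, *context_terms)))
    return queries


# Spelling variants taken from sub-task 8.
karamlaish = ["Karam Lay", "Karamlaish", "karam laish", "karamlash", "karam lash"]
for query in build_queries(karamlaish):
    print(query)
```

Each generated line, for example `Karamlaish + Iraq + IS`, would still be entered by hand into the search engine; the sketch only makes the list of variants systematic.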

1.3.2 Task 4 Confirm or Deny the Existence of a Facility and Improve Geolocation

The object is twofold: we must determine whether the facility exists and improve its location. The object is a training camp or military camp/barracks for an ISIS military unit. These attributes of the object are the What element.

The timeframe was limited to the period in which IS had been active in Northern Iraq, so the When element was limited to the period from 1 June to 20 November. We used Nanawa province, Northern Iraq, as the location because it is mentioned in the initial information. This is the Where element.

We confirmed the existence of the camp, the Whether element, and used the same data for the What and When elements in order to improve the geolocation, which is the Where element.

| Search words | Platform used | Results | Date of search |
| --- | --- | --- | --- |
| Nanawa + Iraq + IS + training camp | Google and Twitter | No results. The province is transcribed “Ninawa”. | 20/11/2014 |
| Ninawa + Iraq + IS + training camp | Google | Article and video on longwarjournal.org (see reference). Existence confirmed. | 20/11/2014 |
| Ninawa + Iraq + IS | Twitter | A considerable number of tweets that link to the video on longwarjournal.org. | 20/11/2014 |

In this task there was one source, which included both an article and a video. Both confirmed the existence of a training facility in Ninawa. The video displayed the type of training being conducted and the facilities available. The camp conducted basic training for light infantry. The weapons available were AK47 variants, PKM light machine guns, and variants of the RPG launcher. Some of the instructors had M16 rifles. The basic uniform and equipment were also on display.

The geolocation could not be improved. No sources gave any exact information on the camp's whereabouts in Ninawa, and no distinct terrain features appeared in the video. The camp is surrounded by high earth banks on three sides, with a building on the fourth side. It consisted mostly of white tents with only a single structure. The camp trained a company-sized force of approximately 125 recruits. Total time spent on the task was 5 hours.

When: The article is from the 12th of October 2014 and the video is from about the same time.

What: The target facility is used to train and indoctrinate militia forces.

Whether: The existence of the facility is confirmed by the video.

Where: The Ninawa location was given in the task, but we were not able to improve the exact geolocation.

1.4 Task Assessments and the Strengths and Limitations of Collecting From OSINT

Task 2 was the more time-consuming task. There were few original sources, but these sources were retweeted several times. The strongest evidence we could find was a tweet backed up by a blog or news feed. The situation was very fluid in many places due to fighting between the Kurds and IS militias, so the When element became very important. We used it to control the time frame in order to get accurate information. Common to all tasks, the time frame could be narrowed or broadened depending on the requirements and results. We discovered that by following the tweets we could get an almost real-time picture of the fighting in some of the areas.
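Narrowing or broadening the When element amounts to filtering collected items by date. A minimal sketch follows, assuming each collected item has been noted down with its publication date; the record layout, the function name `within_when`, and the placeholder posts are hypothetical, and the year 2014 is inferred from the search dates.

```python
from datetime import date


def within_when(posts, start, end):
    """Keep posts whose date falls inside the When window (inclusive)."""
    return [p for p in posts if start <= p["date"] <= end]


# Placeholder records, not collected data.
posts = [
    {"date": date(2014, 7, 30), "text": "placeholder post A"},
    {"date": date(2014, 8, 20), "text": "placeholder post B"},
    {"date": date(2014, 11, 6), "text": "placeholder post C"},
]

# Window used in Task 2: 7 August to 6 November.
window = within_when(posts, date(2014, 8, 7), date(2014, 11, 6))
# Narrowing the window to October only.
october = within_when(posts, date(2014, 10, 1), date(2014, 10, 31))
print(len(window), len(october))  # 2 0
```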

The task was not without problems. Some of the areas gave no search results. Differences in spelling affected the search results: we cannot control the spelling used by the users, and several spellings are possible, as Task 2 details. Alternatively, some of the areas might not have received any attention on social media or in the news. We found no information on 2 of the 17 areas.

We concluded that we had obtained satisfactorily accurate information on 15 of the 17 areas (88.24 %). It took longer to confirm or deny the areas than we had expected, because of the large amount of information that was continuously tweeted from the contested areas. However, this allowed us to give very precise and detailed answers using a total of 34 hours by 2 operatives.
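The reported figures can be checked with a few lines of arithmetic. The variable names below are illustrative; the numbers are those stated in the text.

```python
areas_total = 17
areas_answered = 15   # areas with satisfactory information, as reported
hours_total = 34
operators = 2

share = areas_answered / areas_total * 100
print(f"{share:.2f} %")            # 88.24 %
print(hours_total / areas_total)   # 2.0 hours per area
print(hours_total / operators)     # 17.0 hours per operator
```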

Task 4 gave us some challenges. We found the video within one hour of searching. The video gave us much information on the nature of the camp (including types of recruits, quality of training, weapons). However, it was beyond our capabilities to accurately determine the geolocation of the camp. There were no distinct terrain features that we could use and there were no other indicators we could use.

Improving the geolocation (the “Where” element) was the most difficult task. It required either a geotagged photo or a clear picture of a distinct terrain feature. It was not possible to improve the geolocation if neither requirement could be fulfilled. The nature of the information on the open sources and social media made it simpler to answer the other knowledge elements.

Overall, the tasks showed us some of the strengths and limitations of open sources and social media. It is important to underline that open sources (OSINT), including social media, suffer from the limitation that befalls all single sources. To validate a piece of information effectively, several sources are needed, such as SIGINT, HUMINT, GEOINT, or other sources of intelligence. Therefore, validation is not considered in these cases. Instead, we wanted to show the amount of information and level of detail available on social media and open sources alone. Social media and OSINT in general must be seen as a supplement to the other sources of intelligence.

The results from the cases gave us very detailed information. Detailed information could form a basis for further collecting and investigation. Detailed information from OSINT could be used to better direct and guide other collection platforms and it will contribute to a more focused tasking of other intelligence assets.

In Task 2, the information was almost real-time, but a lot of time could be spent monitoring the Twitter accounts. The information flow on Twitter was very fast, and it could be used to guide additional intelligence assets in order to create situational awareness and follow events as they unfold. The detailed results could be used to task other collection platforms. Considering the time spent, it could be a cost-effective investment to begin collection or analysis with a search on OSINT and social media. Task 4 showed a challenge with OSINT: geolocation was very difficult, and the usefulness of the answer was limited.

The strengths of OSINT and social media are as follows:

· Plenty of information could be gathered and processed quickly and in a time effective manner.

· The only requirement for the collector was familiarity with academic research methods.

· The test persons used only free search engines and basic computers. They were both graduate students who applied basic research skills to the tasks and obtained detailed results.

· Xai Li’s model proved effective when dealing with the following knowledge elements: What, Whether, and When.

· Exploitation of local users and regional observers is possible by using their pictures, observations, and descriptions. This greatly expands our reach into areas we would not normally have access to.

· Information can be gathered in a time-effective manner and provide an entry point for further investigations.

· In short, limited resources yielded substantial gains.

Limitations:

· The Where element was the most difficult to determine and the knowledge element with which we had the least success. Determining the Where element requires special features to be present, such as landmarks, terrain features, or geotagging on photos.

· One should be careful not to suffer from information overload. There are many cases of repeated retweeting of the same piece of information.

· The collector must take into account the blogger’s bias and agenda. It is also important to take into account how many links the information has passed through, which can be considerable in social media.
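The overload caused by repeated retweets can be reduced by collapsing copies of the same message before reading them. The sketch below is one simple way to do this, assuming retweets carry a leading "RT @user" marker; the function names and the placeholder posts are hypothetical, and real retweets may be edited or truncated in ways this does not catch.

```python
import re

# Leading "RT @user:" marker, with an optional colon.
RT_PREFIX = re.compile(r"^\s*RT\s+@\w+:?\s*", re.IGNORECASE)


def normalise(text):
    """Strip a leading retweet marker, collapse whitespace, ignore case."""
    text = RT_PREFIX.sub("", text)
    return " ".join(text.split()).casefold()


def unique_reports(posts):
    """Return the first occurrence of each distinct message, in input order."""
    seen = set()
    unique = []
    for post in posts:
        key = normalise(post)
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


# Placeholder posts, not collected data.
posts = [
    "Placeholder report about town X",
    "RT @example_user: Placeholder report about town X",
    "rt @another_user placeholder report  about town x",
    "A different placeholder report",
]
print(len(unique_reports(posts)))  # 2
```

Counting how many copies collapse into one message also gives a rough indication of how widely a single original source has been repeated, which is not the same as independent confirmation.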

1.4.1 Implications for Training

The most important element is academic research skills. An effort must be made to get to the original source (if possible). How many links has the information passed through, and is there a risk that it has become distorted? It is also important to take into account the personal bias and agenda of the sources. The collector and analyst working with social media and OSINT must keep all these elements constantly in mind. These should be universal skills among graduate students, as they are the trademark of the academic research method.

The method of visualizing social media data can be an important aid for the collector and analyst. The method proved to be a simple tool for the collectors to organize the search actions and focus the effort on the knowledge element. It organizes the known data and aids in the search for the desired knowledge. The method is simple to apply to all tasks and simple to implement with the staff. However, the method (as with other methods) does not guarantee a satisfactory answer.

Due to the simplicity of the method and the collection platform, the level of results gained in the cases should be the minimum level expected of the intelligence staff. The basic experience level of the test persons and the simplicity of the method make it reasonable to demand the same level of all analysts, regardless of whether they are single-source or all-source analysts.

With regard to commercial tools, the minimum expectation is that the tools must be able to answer knowledge elements relating to What, Whether, Where, and When. Preferably, the tools should be able to handle a much larger amount of data, including the ability to monitor. It is a matter of personal judgement how much one is willing to pay for commercial tools, but the results of the cases should indicate the minimum level that should be accepted from a commercial product.

1.5 Project Avatar Conclusion

From the tasks we learned that by structuring our search actions in accordance with Xai Li’s model, we could gather detailed information concerning What, When, and Whether. The Where element required more information than we could extract under the test circumstances. If its requirements could not be fulfilled, the Where element could not be improved significantly.

The most important skill of the operator is critical research. It is a universal skill among graduate students and well suited to the search actions. With no special equipment or training, the two test persons could extract detailed intelligence from social media and OSINT. The information was useful because it contained enough detail for other sources of intelligence to be tasked in a focused direction. Because the tasks were so simple, we want to encourage all intelligence analysts and collectors to use OSINT and social media actively, regardless of specialty or function.

The initial research question was “How simplistic can an intelligence collection platform be?” The answer is that two graduate students with no experience, intelligence training, or special equipment can gain detailed and extensive information on the What, Whether, and When knowledge elements in a 2-7 hour timeframe by using social media and open sources.

2. Human Terrain Understanding

In 2013, U.S. Army researchers in the Computational and Information Science Directorate began investigating the impact of OSINT and social media analysis tools in a simulated field event [5]. Recognizing that there exist large amounts of crucial information that an intelligence officer is unable to digest due to lack of time and resources, the researchers investigated a way to close this technology gap. Using several prototype tools under development at the U.S. Army Research Laboratory, they created the Human Terrain Exploitation Suite (HTES) and tested it at the Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance (C4ISR) On The Move (OTM) field exercise. The HTES consisted of tools performing topic modeling with links to documents, semantic search, and sentiment analysis of entities and topics over time. The results showed that the intelligence officers were able to digest a large amount of simulated OSINT and social media and use the tools to recreate the ground-truth social network and plan a successful Key Leader Engagement. This field exercise is an interesting companion to Project Avatar. Unlike in Project Avatar, the users were well-trained intelligence officers (with minimal training in using the HTES) and did not have access to standard OSINT sources (Twitter, Google, news) except through the HTES.

Following experimentation with HTES, the U.S. Army Research Laboratory continues to study methods to exploit OSINT and social media, specifically through collaborations with academia under the Network Science Collaborative Technology Alliance (NS-CTA).

References

[1] Ma, Ding. Visualization of Social Media Data: Mapping Changing Social Networks. University of Twente, The Netherlands, 2010.

[2] Mennis, J. L., Peuquet, D. J., and Qian, L. A Conceptual Framework for Incorporating Cognitive Principles Into Geographical Database Representation. International Journal of Geographical Information Science 14(6).

[3] Li, Xai. The Time Wave in Time Space: A Visual Exploration Environment for Spatio-temporal Data. University of Twente, Enschede, 2010.

[4] Li, Xai and Kraak, M. J. The Time Wave – A New Method of Visual Exploration of Geo-data in Time and Space. Cartographic Journal 45(3).

[5] Hanratty, T., Richardson, J., Mittrick, M., Dumer, J., Heilman, E., Roy, H., and Kase, S. "Applying Visual Analytics to Open Source Information to Improve Tactical Human Terrain Understanding." Proceedings of the 2014 SPIE Defense, Security, and Sensing Conference, 5-9 May 2014, Baltimore, Maryland.