Pattern Recognition and Applications Lab OSINT -...

Page 1: Pattern Recognition and Applications Lab OSINT - …people.unica.it/giorgiogiacinto/files/2016/05/OpenSource... · Pattern Recognition and Applications Lab Università di Cagliari,

Pattern Recognition and Applications Lab

Università di Cagliari, Italia

Dipartimento di Ingegneria Elettrica

ed Elettronica

OSINT - Raccolta di Informazioni dal Web e da Sorgenti Aperte

Davide Ariu

http://pralab.diee.unica.it

Intelligence Defined*

•  Simply defined, intelligence is information that has been analyzed and refined so that it is useful to policymakers in making decisions, specifically decisions about potential threats to national security.

1.  Intelligence is a product that consists of information that has been refined to meet the needs of policymakers.
2.  Intelligence is also a process through which that information is identified, collected, and analyzed.
3.  And intelligence refers to both the individual organizations that shape raw data into a finished intelligence product for the benefit of decision makers and the larger community of these organizations.

*https://www.fbi.gov/about-us/intelligence/defined

Intelligence Cycle*

•  The intelligence cycle is the process of developing unrefined data into polished intelligence for the use of policymakers.

–  The process has a circular nature, although movement between the steps is fluid. Intelligence uncovered at one step may require going back to an earlier step before moving forward.

•  6 steps:
1.  Requirements
2.  Planning and Direction
3.  Collection
4.  Processing and Exploitation
5.  Analysis and Production
6.  Dissemination

*https://www.fbi.gov/about-us/intelligence/intelligence-cycle

Intelligence Collection Disciplines (INTs)* - 1

•  Intelligence is not only gathered through secret or covert means.
–  While some intelligence is indeed collected through clandestine operations and known only at the highest levels of government, other intelligence consists of information that is widely available.

•  Intelligence can be used for both legitimate and nefarious purposes.

•  Currently, 5 intelligence collection disciplines exist:
1.  HUMan INTelligence (HUMINT) is the process of gaining intelligence from humans or individuals by analyzing behavioral responses through direct interaction.
•  Open vs. Clandestine/Covert (Espionage)
2.  SIGnals INTelligence (SIGINT) refers to electronic transmissions that can be collected by ships, planes, ground sites, or satellites.
•  Communications Intelligence (COMINT) is a type of SIGINT and refers to the interception of communications between two parties.

*https://www.fbi.gov/about-us/intelligence/disciplines

Intelligence Collection Disciplines (INTs)* - 2

•  […]
3.  IMagery INTelligence (IMINT) is sometimes also referred to as photo intelligence (PHOTINT).
4.  Measurement And Signatures INTelligence (MASINT) includes the advanced processing and use of data gathered from overhead and airborne IMINT and SIGINT collection systems.
5.  Open Source INTelligence (OSINT) is the process of gathering intelligence from publicly available resources (including the Internet and others).

*https://www.fbi.gov/about-us/intelligence/disciplines

OSINT - Definitions

•  Open Source Information (OSINF) is data which is available publicly – not necessarily free

•  Open Source Intelligence (OSINT) is proprietary intelligence recursively derived from OSINF

•  OSINF Collection is monitoring, selecting, retrieving, tagging, cataloging, visualising & disseminating data

•  OSINT is the result of expert analysis of OSINF

Slide Credit: C.H. Best, JRC – European Commission

OSINT – Origins - 1

•  Term Originates from Security Services

•  The practice of using open source information to build intelligence is indeed not new:

–  In Italy, the OVRA (Organizzazione per la Vigilanza e la Repressione dell'Antifascismo) reportedly used OSINF from 1930 onwards

“According to Arturo Bocchini, former prefect of Brescia (and undisputed head of both the OVRA and the Police until his death in 1940), the anonymous informers had to provide elements to ‘…probe public opinion continuously and by every means’, so that Mussolini could ‘…get a sense of the temperature of the country’.”*

–  During the Cold War, American and German secret services heavily analysed the Russian press to gather information about their Russian enemies

•  Nevertheless, open source information has traditionally been considered far less valuable than classified information

* “Anonymous Informer Report”, OVRA Region 1 – Milano, 1939 - http://gnosis.aisi.gov.it/Gnosis/Rivista2.nsf/ServNavig/15

OSINT – Origins - 2

•  Paradigm change after 9/11 (shock to the system of old style intelligence)

–  Pre-9/11, intelligence services were closed and relied on HUMINT, SIGINT and classified information

–  Realisation that open sources could have helped foresee the attacks (“failure to connect the dots”) led to a reassessment of the use of open sources and of intelligence sharing between agencies.

–  Terrorists' skilled use of the Internet was an eye opener.

•  The fast growth of the Internet and the appearance of Social Networks have further pushed the paradigm change

•  “The need to restructure the intelligence community grows out of six problems that have become apparent before and after 9/11:

–  Structural barriers to performing joint intelligence work
–  Lack of common standards and practice across the foreign-domestic divide
–  Divided management of national intelligence capabilities
–  Weak capacity to set priorities and to move resources
–  Too many jobs
–  Too complex and secret”

*The 9/11 Commission Report

OSINT – Who is Involved?

[Figure: roles involved in OSINT. Labels: Tool Builder/Developer; Minister, General, Commissioner, CEO; Analyst; Classified Information; OSINF Collector/Researcher]

Slide Credit: C.H. Best, JRC – European Commission

Who uses OSINT?

•  Security Services, Law Enforcement & Military Bodies

•  Governmental Organisations
–  EU, NATO, AU Situation Centre
–  IAEA – Nuclear Safeguard
–  UN Department for Peacekeeping Operations
–  World Health Organisation
–  NGOs

•  Large Companies
–  Oil/Gas Industries
–  Multinationals

•  Business Intelligence

OSINT Sources of Information - 1

•  Media
–  Newspapers, magazines, radio, television, etc.

•  The Internet
–  News, Social Networks, Blogs, Video sharing sites, Thematic sites, etc.
–  Deep Web (not indexed by traditional search engines)
•  Dynamic Web Pages
•  Sites behind Log-in
•  Sites whose robots.txt file is configured to exclude crawlers
–  Dark Nets/Web (TOR, I2P)

•  Subscription Services
–  LexisNexis (http://www.lexisnexis.com) is a corporation providing computer-assisted legal research as well as business research and risk management services. During the 1970s, LexisNexis pioneered the electronic accessibility of legal and journalistic documents.
–  Factiva (http://www.dowjones.com/products/product-factiva/) is the world’s leading source of premium news, data and insight, with access to thousands of premium news and information sources on more than 22 million public and private companies
–  Jane's Information Group (www.janes.com) is a British publishing company specialising in military, aerospace and transportation topics.
–  BBC Monitoring (http://www.bbc.co.uk/monitoring) includes news, information and comment gathered from the mass media around the world for service subscribers.
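The Deep Web item above mentions sites that keep crawlers out through robots.txt. As a minimal sketch, a collector could check such rules with Python's standard `urllib.robotparser` before fetching a page. The robots.txt content, the `example.org` URLs and the user agent name `osint-collector` are hypothetical and only serve the illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline (no network access needed).
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""

def allowed_urls(robots_txt, agent, urls):
    """Return the URLs that the given robots.txt lets `agent` fetch."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [u for u in urls if parser.can_fetch(agent, u)]

urls = [
    "http://example.org/news/index.html",
    "http://example.org/private/report.html",
]
visible = allowed_urls(ROBOTS_TXT, "osint-collector", urls)
print(visible)
```

Pages under `/private/` are filtered out here, which is the sense in which such content stays invisible to well-behaved crawlers and search engines.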

OSINT Sources of Information - 2

•  Commercial Satellites
–  http://www.euspaceimaging.com/applications/fields/security-defense-intelligence
–  https://www.digitalglobe.com/industries/defense-and-intelligence

•  Public Data
–  government reports, budgets, demographics, hearings, legislative debates, press conferences, speeches, marine and aeronautical safety warnings, environmental impact statements and contract awards.

•  Professional and Academic
–  conferences, professional associations, academic papers, and subject matter experts.

•  Open Data
–  https://open-data.europa.eu/en/data
–  http://www.dati.gov.it
–  http://www.datiopen.it
–  Geospatial Data Providers
•  An exhaustive list is available here: https://en.wikipedia.org/wiki/List_of_GIS_data_sources

*INT Target Modes

•  Intelligence activities can target both:
–  Individuals - Intelligence gathering refers to querying online public resources that provide information specific to the targeted individuals.
–  Corporates and organizations - Here, intelligence gathering refers to the process of collecting information about target organizations.

*INT Target Modes – Targeting Individuals

•  When individuals are targeted, the following information may be of interest:

–  Physical locations of the individuals
–  Online social network (OSN) profiles for checking on relationships, contacts, content sharing, preferred web sites, etc.
–  E-mail addresses, users' handles and aliases available on the Internet, including infrastructure owned by the individual such as domain names and servers.
–  Associations and historical perspective of the work performed, including background details, criminal records, owned licenses, registrations, etc. This data is categorized into public data provided by official databases and private data provided by professional organizations.
–  Released intelligence such as content on blogs, journal papers, news articles, and conference proceedings.
–  Mobile information including phone numbers, device type, applications in use, etc.

Source: Targeted Cyber Attacks: Multi-staged Attacks Driven by Exploits and Malware, Elsevier, 2014

*INT Target Modes – Targeting Corporates and Organisations

•  When Corporates and Organisations are targeted, the following information may be of interest:
–  Determining the nature of business and work performed by target corporates and organizations to understand the market vertical.
–  Fingerprinting infrastructure including IP address ranges, network peripheral devices for security and protection, deployed technologies and servers, web applications, informational web sites, etc.
–  Extracting information from exposed devices on the network such as CCTV cameras, routers, and servers belonging to specific organizations.
–  Mapping hierarchical information about the target organizations to understand the complete layout of employees at different layers, including ranks, e-mail addresses, nature of work, service lines, products, public releases, meetings, etc.
–  Collecting information about the different associations including business clients and business partners.
–  Extracting information out of released documents about business, marketing, financial, and technology aspects.
–  Gathering information about the financial standing of the organization from financial reports, trade reports, market caps, value history, etc.

Source: Targeted Cyber Attacks: Multi-staged Attacks Driven by Exploits and Malware, Elsevier, 2014

OSINT Processes

•  Collect
–  Multilingual Information Retrieval
–  Search
–  Crawl
–  News feeds

•  Transform
–  Machine Translation
–  Geo-tagging
–  Translation
–  Entity Extraction
–  Entity Resolution

•  Analyse
–  Link Analysis
–  Relationships
–  Geolinking
–  Trends
–  Statistics

•  Visualise & Report
–  Networks
–  Relations
–  Time graphs
–  Maps

•  Collaborate
–  Intel Wiki
–  IM
–  Case DB
–  Publish

TECHNICAL ISSUES
•  Data Mapping
•  Data Deduping
•  Data Cleansing
•  Data Conversion
•  Data Linking
•  Data Normalisation

Slide Credit: C.H. Best, JRC – European Commission

Information Collection – Issues (1)

•  Information may be either textual or non-textual
•  Textual Information
–  How can I search it?
•  Search Engines
•  Libraries
–  How can I extract it?
•  API: constraints on the information which can be accessed; subject to change; specific to each platform
•  Scraping: ad-hoc source code for each platform; noise must be removed; open solutions exist, but their results need to be merged
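The scraping item notes that noise must be removed from collected pages. A minimal sketch of that step, assuming the noise is markup, scripts and style sheets, and using only Python's standard `html.parser`; the class name `TextExtractor` and the sample page are made up for illustration.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)

def extract_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = ("<html><head><style>p{color:red}</style></head><body>"
        "<h1>Title</h1><script>var x=1;</script><p>Body text.</p></body></html>")
print(extract_text(page))
```

Real pages also carry menus, adverts and boilerplate, so a per-platform extractor would still be needed on top of a generic cleaner like this one.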

Information Collection – Issues (2)

•  Non-Textual Information
–  Images
•  People (who)? Places? Text? Objects?
–  Videos
•  People (who)? Places? Text? Objects? (as for images)
•  Does the video contain audio?
–  Audio Traces
•  Transcription
•  Translation
–  Other files
•  E.g. executable files; files in proprietary formats

•  Extraction of non-textual information is usually not easy to automate…

Language Issues (1) Information Collection

•  Culture •  Art •  Religion-Thought •  University

Howzeh

•  Hi-Tech •  Health-Environment •  Society •  Economy •  Markets

•  Sport •  Politic •  International •  Provinces •  Photo •  Video

•  Magazine •  Short news

Language Issues (2) Information Collection

http://www.mehrnews.com/en/

Slide Credit: C.H. Best, JRC – European Commission

•  News •  Culture •  Literature •  Religion •  University •  Social •  Economic •  Political •  International •  Sport •  Nuclear •  Photo

http://www.mehrnews.com/fa/

Language Issues (3) Information Collection

Slide Credit: C.H. Best, JRC – European Commission

Social Network Analysis – Issues (1) Information Collection

•  Privacy Restrictions
–  Many social networking platforms are continuously adding privacy control mechanisms to restrict access to private information.
•  A common approach by third party application developers is to create applications that ask the user for unneeded permissions in the hope of gaining additional information.

•  Platform Restrictions
–  Information flowing into online social networks is collected on a massive scale, but how it flows out is tightly controlled by the social media platform.
–  Social networking platforms generally control information flowing out based on social relationships, user-based privacy settings, as well as rate limiting, activity monitoring, and IP address based restrictions.

•  Data Availability
–  In the worst case scenario, the desired information may never have been gathered by the platform, such as when a user chooses not to provide information on a social networking profile.
–  In a more typical case, the target information exists, but privacy and platform restrictions introduce a fog that masks large portions of the desired information.

Source: B. R. Holland, Enabling Open Source Intelligence (OSINT) in private social networks
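Rate limiting is one of the platform restrictions listed above. A collector that stays within a platform's published limits typically spaces out its requests. The sketch below shows one way to do that; the class name `Throttle` and the interval are assumptions for illustration, not any platform's actual policy.

```python
import time

class Throttle:
    """Enforce a minimum interval (in seconds) between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the class can be tested
        self._sleep = sleep
        self._last = None        # time at which the last call was released

    def wait(self):
        """Pause if the previous call was too recent; return the delay applied."""
        now = self._clock()
        delay = 0.0
        if self._last is not None:
            delay = max(0.0, self.min_interval - (now - self._last))
            if delay:
                self._sleep(delay)
        self._last = now + delay
        return delay
```

A collector would call `throttle.wait()` before each request, for example with `Throttle(2.0)` to keep at least two seconds between calls.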

Social Network Analysis – Issues (2) Information Collection

•  Data Longevity
–  The social graph is far from static: relationship dynamics change frequently and profiles are updated constantly.
•  Previous studies of the Facebook social graph have limited collection periods to a maximum of one to two weeks to limit the corruption of data due to changes in the social graph during collection time.
–  We can think of each data access as a snapshot in time that captures the state of the social graph at collection time.

•  Legal Issues
–  While many social networking platforms such as Facebook disallow the use of screen scrapers and other data mining tools through their terms of service agreements, the legal enforceability of these terms remains unclear.
•  U.S. courts have recognized that in some cases operators of automated web spiders and screen scrapers may be held liable for digital trespassing.
•  E.g. eBay v. Bidder's Edge

•  Ethical Issues
–  Crawling social networks for personal information is an ethically sensitive area.

Source: B. R. Holland, Enabling Open Source Intelligence (OSINT) in private social networks
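The idea that each data access is a snapshot in time can be made concrete by storing the collection time with every record and comparing snapshots. A minimal sketch under that assumption; the `Snapshot` type, the profile identifier and the contact names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class Snapshot:
    """State of one profile's connections at a given collection time."""
    profile_id: str
    collected_at: datetime
    contacts: frozenset

def diff(old, new):
    """Report which contacts appeared or disappeared between two snapshots."""
    if old.profile_id != new.profile_id:
        raise ValueError("snapshots refer to different profiles")
    return {
        "added": sorted(new.contacts - old.contacts),
        "removed": sorted(old.contacts - new.contacts),
        "elapsed_days": (new.collected_at - old.collected_at).days,
    }

s1 = Snapshot("user42", datetime(2016, 5, 1, tzinfo=timezone.utc),
              frozenset({"alice", "bob"}))
s2 = Snapshot("user42", datetime(2016, 5, 10, tzinfo=timezone.utc),
              frozenset({"bob", "carol"}))
changes = diff(s1, s2)
print(changes)
```

Keeping the timestamp alongside the data makes it possible to tell how stale a view of the graph is before it is merged with newer collections.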

World Map of Social Networks (1) Information Collection

World Map of Social Networks (2) Information Collection

Linguist Requirements – 1 Information Transformation

•  The most critical areas of required foreign language skill and knowledge proficiency are:

–  Transcription. Both listening and writing proficiency in the source language are essential.
–  Interpretation. Both listening in the source language and speaking proficiency in the final language are essential.
–  Translation. Bilingual competence is a prerequisite for translation. Linguists must be able to
•  Read and comprehend the source language
•  Write comprehensibly in the final language
•  Choose the equivalent expression in the final language that fully conveys and best matches the meaning intended in the source language

Source: Open-Source Intelligence, Federation of American Scientists, 2012

Linguist Requirements – 2 Information Transformation

•  The Army uses Interagency Language Roundtable (ILR) descriptions of the proficiency levels for the skills of speaking, listening, reading, and writing a foreign language.

–  Level 0 (no proficiency). The Soldier has no functional foreign language ability. Level 0+ is the minimum standard for special-forces personnel and indicates a memorized proficiency only.

–  Level 1 (elementary proficiency). The Soldier has limited control of the foreign language skill area to meet limited practical needs and elementary foreign language requirements.

–  Level 2 (limited working proficiency). The linguist is sufficiently skilled to be able to satisfy routine foreign language demands and limited work requirements.

–  Level 3 (general professional proficiency). The linguist is capable of performing most general, technical, formal, and informal foreign language tasks on a practical, social, and professional level.

–  Level 4 (advanced professional proficiency). The linguist is capable of performing advanced professional foreign language tasks fluently and accurately on all levels.

–  Level 5 (functionally native proficiency). The linguist is functionally equivalent to an articulate and well-educated native in all foreign language skills; the linguist reflects the cultural standards of the country where the foreign language is natively spoken.

Source: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial

Entity Resolution Information Transformation

•  Problem of identifying and linking/grouping different manifestations of the same real world object.

•  Examples of manifestations and objects:
–  Different ways of addressing (names, email addresses, Facebook accounts) the same person in text.
–  Web pages with differing descriptions of the same business.
–  Different photos of the same object.

Source: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
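As a small illustration of grouping different manifestations of the same person, the sketch below maps spelling variants of a name onto one comparison key. This is only a naive normalisation step (case, accents, punctuation, token order), not the method of the cited tutorial, and the sample mentions are chosen for illustration.

```python
import unicodedata

def normalise_name(raw):
    """Map spelling variants of a person name onto one comparable key."""
    # Split accented characters into base letter + combining mark, drop marks
    decomposed = unicodedata.normalize("NFKD", raw)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Replace punctuation with spaces, lowercase, and ignore token order
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents)
    return " ".join(sorted(cleaned.lower().split()))

mentions = ["Niki Lauda", "LAUDA, Niki", "niki  lauda", "Gilles Villeneuve"]
groups = {}
for mention in mentions:
    groups.setdefault(normalise_name(mention), []).append(mention)
print(groups)
```

Sorting the tokens makes "Lauda, Niki" and "Niki Lauda" collide on purpose; it would also merge two different people whose first and last names are swapped, which is one reason real systems combine several attributes.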

Entity Resolution Information Transformation

Before After

Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial

Entity Resolution Information Transformation

Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial

Traditional Challenges in Entity Resolution Information Transformation

•  Name/Attribute Ambiguity

Michael Jordan

Tom Cruise

Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial

Entity Resolution Information Transformation

Enzo Ferrari and his drivers. “A master of the ironworks”. This was the peculiar, yet in some ways affectionate, way in which Clay Regazzoni liked to describe Enzo Ferrari. And that the Drake was a true master is a fact that none of the drivers who passed through Maranello can dispute. It was he, Ferrari, who decided likes and dislikes, orders and concessions, salaries and commissions. On one thing only did he allow no leeway, not even to himself: the worth of those who raced for him. Provided, however, that the driver's name did not overshadow, in popularity, the name of the cars. Coming to more recent times, the driver who most fascinated Enzo Ferrari was Niki Lauda. Extremely thrifty and terribly skilled in financial negotiation, in which the Drake also excelled, Niki recounts that at one point Ferrari gave him a curious nickname: “He used to call me a Jew, probably because he also considered me a good salesman of my own professional skills.” At the end of July 1977, when the former world champion had already signed for Brabham Alfa Romeo, Ferrari revealed an admission by Lauda. “As long as you are alive I will drive for you”, this is what Niki said to the Drake, who by then had held an honorary engineering degree for ten years. But at the end of August, Lauda went to Maranello and told Ferrari that he would no longer drive his cars. “If Lauda had stayed with us he would have at least equalled Fangio's record of five world titles”, Ferrari confessed some time later. He never forgave Lauda and did not take him back at Ferrari when the Austrian offered himself. Forgiveness came years later, shortly before the Drake's death. The last driver in the ranking of Enzo Ferrari's technical loves was Gilles Villeneuve. The Grand Old Man was a moody character; when Lauda left him he made a bet with himself: to take a nobody and lead him to the world title.

Entity Resolution – Other Challenges Information Transformation

•  Errors due to data entry •  Changing Attributes

•  Abbreviations/Data Truncation

V. Rossi

Valentino Rossi Vasco Rossi Valeria Rossi
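The "V. Rossi" example shows how an abbreviation can match several known entities. A minimal sketch of generating the candidate set for such a mention; the matching rule (a token ending in a full stop is treated as an initial) is an assumption, and "Paolo Rossi" is added only as a non-matching contrast.

```python
def matches_abbreviation(short, full):
    """True if every token of `short` equals, or is an initial of, the
    corresponding token of `full` (e.g. "V. Rossi" vs "Vasco Rossi")."""
    s_tokens = short.lower().split()
    f_tokens = full.lower().split()
    if len(s_tokens) != len(f_tokens):
        return False
    for s, f in zip(s_tokens, f_tokens):
        if s.endswith("."):
            # Abbreviated token: compare the part before the full stop
            if not f.startswith(s[:-1]):
                return False
        elif s != f:
            return False
    return True

known = ["Valentino Rossi", "Vasco Rossi", "Valeria Rossi", "Paolo Rossi"]
candidates = [name for name in known if matches_abbreviation("V. Rossi", name)]
print(candidates)
```

The mention alone cannot pick one of the three candidates; context such as topic, co-occurring names or dates is needed to resolve it.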

Entity Resolution – Abstract Problem Statement Information Transformation

Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial

Entity Resolution – Deduplication Information Transformation

•  Cluster the record mentions that correspond to the same entity

Slide Credit: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial

Entity Resolution – Deduplication Information Transformation

•  Cluster the record mentions that correspond to the same entity
•  Compute a cluster representative

Source: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
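The two deduplication steps (cluster the mentions, then compute a representative) can be sketched as follows. This is an illustration under stated assumptions rather than the tutorial's algorithm: similarity is the standard-library `difflib.SequenceMatcher` ratio, the 0.85 threshold is arbitrary, clusters are merged transitively, and the representative is the most frequent spelling in each cluster.

```python
from collections import Counter
from difflib import SequenceMatcher

def similar(a, b, threshold=0.85):
    """Case-insensitive string similarity test (threshold is an assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold

def deduplicate(mentions, threshold=0.85):
    """Cluster similar mentions; return {representative: cluster members}."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(mentions)):
        for j in range(i + 1, len(mentions)):
            if similar(mentions[i], mentions[j], threshold):
                parent[find(j)] = find(i)

    clusters = {}
    for i, mention in enumerate(mentions):
        clusters.setdefault(find(i), []).append(mention)
    # Representative: the most frequent spelling within each cluster
    return {Counter(ms).most_common(1)[0][0]: ms for ms in clusters.values()}

mentions = ["Enzo Ferrari", "Enzo Ferarri", "Niki Lauda",
            "Enzo Ferrari", "Nikki Lauda", "Niki Lauda"]
result = deduplicate(mentions)
print(result)
```

The pairwise comparison is quadratic in the number of mentions, so larger collections usually need a blocking step that only compares records sharing some cheap key.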

Entity Resolution – Linkage Information Transformation

•  Link records that match across databases

Source: L. Getoor, A. Machanavajjhala - Entity Resolution Tutorial
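Linking records across databases can be sketched, in its simplest form, as a join on a normalised key. The two sources (`registry`, `forum`), their fields and the e-mail addresses below are hypothetical, and exact matching on one attribute is a simplification of what real linkage needs.

```python
def link_records(db_a, db_b, key):
    """Pair records from two sources whose normalised `key` values agree."""
    def norm(value):
        return value.strip().lower()

    # Index the second source by normalised key for constant-time lookup
    index = {}
    for rec in db_b:
        index.setdefault(norm(rec[key]), []).append(rec)

    links = []
    for rec in db_a:
        for match in index.get(norm(rec[key]), []):
            links.append((rec["id"], match["id"]))
    return links

registry = [{"id": "A1", "email": "V.Rossi@example.org"},
            {"id": "A2", "email": "n.lauda@example.org"}]
forum = [{"id": "B7", "email": "v.rossi@example.org "},
         {"id": "B9", "email": "g.villeneuve@example.org"}]
links = link_records(registry, forum, "email")
print(links)
```

Unlike deduplication, the output here is a set of cross-source pairs; each database keeps its own records.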

Open Source Information Reliability (1) Analysis

•  The types of sources used to evaluate information are
–  Primary sources
•  A primary source refers to a document or physical object that was written or created during the time under study. These sources are present during an experience or time period and offer an inside view of a particular event.
–  Secondary sources
•  Journals that interpret findings
•  Textbooks
•  Magazine Articles
•  Commentaries
•  Histories
•  Criticism
•  Encyclopedias

Source: Open-Source Intelligence, Federation of American Scientists, 2012

Open Source Information Reliability (2) Analysis

Source: Open-Source Intelligence, Federation of American Scientists, 2012


Open Source Information Content Credibility Analysis

Source: Open-Source Intelligence, Federation of American Scientists, 2012


Link Analysis Analysis

•  The basic problem for intelligence analysts is putting information together in an organized way so the difficult task of extracting meaning from the assembled information is made easier.

•  Link analysis puts information about the relationships among entities (individuals, organizations, locations, and so on) into a graphic format and context that will clarify relationships and aid in inference development. Link analysis can be applied to relationships among those entities, which might have been identified in a given analysis.

•  The seven steps of link analysis are:
1.  Assemble all raw data
2.  Determine focus of the chart
3.  Construct an association matrix
4.  Code the associations in the matrix
5.  Determine the number of links for each entity
6.  Draw a preliminary chart (not covered in the presentation)
7.  Clarify and re-plot the chart (not covered in the presentation)

Source: Criminal Intelligence – United Nations Office on Drugs and Crime


Link Analysis Analysis

Source: Criminal Intelligence – United Nations Office on Drugs and Crime


Link Analysis Analysis

1.  Assemble all raw data
–  Assemble all relevant files, field reports, informant reports, records, etc.

2.  Determine the focus of the chart
–  Identify the entities that will be the focus of your chart. Read through your data and underline or highlight these entities, which may include names of people and/or organizations, auto license numbers, addresses, etc.

3.  Construct an association matrix
–  An association matrix is an essential, interim step in constructing a link chart. It is used to identify associations between entities but is not used for presentation purposes. Regardless of which charts are going to be constructed, an association matrix should always be constructed first.

Source: Criminal Intelligence – United Nations Office on Drugs and Crime


Link Analysis Analysis

4.  Code the associations in the matrix
–  Association codes are used to indicate the basis for or nature of each relationship shown in the matrix.

Source: Criminal Intelligence – United Nations Office on Drugs and Crime
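
A coded association matrix can be sketched as a small square table in Python. The codes used here ("C" for a confirmed association, "S" for a suspected one), the entity names, and the function name are assumptions made for illustration; the slide does not list the actual code set.

```python
def build_association_matrix(entities, associations):
    """Build a symmetric matrix whose cells hold an association code or ""."""
    index = {name: i for i, name in enumerate(entities)}
    size = len(entities)
    matrix = [["" for _ in range(size)] for _ in range(size)]
    for first, second, code in associations:
        i, j = index[first], index[second]
        # An association has no direction, so the code is stored in both cells.
        matrix[i][j] = code
        matrix[j][i] = code
    return matrix


# Hypothetical entities and codes: "C" = confirmed, "S" = suspected.
entities = ["Alpha", "Bravo", "Charlie"]
associations = [("Alpha", "Bravo", "C"), ("Bravo", "Charlie", "S")]
matrix = build_association_matrix(entities, associations)

# Print the matrix with "." marking pairs that have no recorded association.
for name, row in zip(entities, matrix):
    print(f"{name:8}", " ".join(cell or "." for cell in row))
```

Storing the code in both cells keeps lookups simple; a paper matrix typically records each pair once in a triangular layout.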


Link Analysis Analysis

5.  Determine the number of links for each entity
–  A useful way to start your chart is to count the number of links associated with each entity in the matrix. Be sure to count across and down for each entity.

Source: Criminal Intelligence – United Nations Office on Drugs and Crime
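
Counting "across and down" can be sketched as follows, assuming a triangular matrix in which each pair of entities is recorded once (row i holds the cells for the entities listed before entity i). The layout, the codes, and the entity names are hypothetical.

```python
def count_links(entities, matrix):
    """Count links per entity in a lower-triangular association matrix."""
    counts = {}
    for i, name in enumerate(entities):
        # Across: cells in the entity's own row (pairs with earlier entities).
        across = sum(1 for j in range(i) if matrix[i][j])
        # Down: cells in the entity's column (pairs with later entities).
        down = sum(1 for k in range(i + 1, len(entities)) if matrix[k][i])
        counts[name] = across + down
    return counts


# Hypothetical triangular matrix: row i has i cells; "" means no association.
entities = ["Alpha", "Bravo", "Charlie", "Delta"]
matrix = [
    [],             # Alpha
    ["C"],          # Bravo-Alpha
    ["", "S"],      # Charlie-Bravo
    ["", "", "C"],  # Delta-Charlie
]
counts = count_links(entities, matrix)
print(counts)
```

Counting only across the row would miss every link recorded in a later entity's row, which is why both directions are summed. The entities with the highest counts are natural candidates for the center of the preliminary chart.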


Reporting – Basic Guidelines Visualise and Report

•  The basic guidelines in preparing products for reporting and disseminating information are

–  Timely. Information should be reported to affected units without delay; a report should not be held back for the sole purpose of ensuring the correct format.

–  Relevant. Information must contribute to the answering of intelligence requirements. Relevant information reduces collection, organization, and transmission times.

–  Complete.

•  The three reporting methods used to convey intelligence and information are

–  Written.

–  Graphic. Web-based report dissemination is an effective technique to ensure the widest awareness of written and graphical information across echelons. OSINT personnel can collaborate and provide statuses of intelligence requirements through Web sites. Information can also be uploaded to various databases to support future open-source missions and operations.

–  Verbal and voice. The most common way to disseminate intelligence and information verbally is through a briefing. Based on the criticality, sensitivity, and timeliness of the information, ad hoc and impromptu verbal communication methods are the most efficient to deliver information to commanders.


Categories of Intelligence Products – 1 Collaborate

•  Indications and Warnings
–  Intelligence activities intended to detect and report time-sensitive intelligence information on foreign developments that could involve a threat to a particular target.

•  Current Intelligence
–  Supports ongoing operations; it involves the integration of time-sensitive, all-source intelligence and information into concise, accurate, and objective reporting on the area of operations (AO) and current threat situation.
–  One of the most important forms of current intelligence is the threat situation portion of the common operational picture.

•  General Military Intelligence
–  Intelligence concerning (1) military capabilities of foreign countries or organizations or (2) topics affecting potential military operations relating to armed forces capabilities, including threat characteristics, organization, training, tactics, doctrine, strategy, and other factors bearing on military strength and effectiveness, and area terrain intelligence.

•  Target Intelligence
–  Intelligence that portrays and locates the components of a target or target complex and indicates its vulnerability and relative importance (JP 3-60). Target intelligence is the analysis of threat units, dispositions, facilities, and systems to identify and nominate specific assets or vulnerabilities for attack, reattack, or exploitation.


Categories of Intelligence Products – 2 Collaborate

•  Scientific and Technical Intelligence
–  The product resulting from the collection, evaluation, analysis, and interpretation of foreign scientific and technical information.
–  Scientific and technical intelligence covers foreign developments in basic and applied research and in applied engineering techniques and scientific and technical characteristics, capabilities, and limitations of all foreign military systems, weapons, weapon systems, and materiel, the related research and development, and the production methods employed for their manufacture.

•  Counterintelligence
–  Information gathered and activities conducted to identify, deceive, exploit, disrupt, or protect against espionage, other intelligence activities, sabotage, or assassinations conducted for or on behalf of foreign powers, organizations or persons or their agents, or international terrorist organizations or activities (EO 12333). Counterintelligence analyzes the threats posed by foreign intelligence and security services, and international terrorist organizations, and the intelligence activities of non-state actors such as organized crime, terrorist groups, and drug traffickers.

•  Estimative Intelligence
–  Identifies, describes, and forecasts adversary capabilities and the implications for planning and executing military operations. Estimates provide forecasts on how a situation may develop and the implications of planning and executing military operations.