Best Methods for Using Internet Search and Analysis in...
Internet Search and Analysis in Intelligence and Investigations
Tuesday, January 11, 2011, 7:30 AM – 8:45 AM
Ed Appel, Proprietor, iNameCheck
Presentation
• Quick Internet Overview
• Online Sources & Methods
• Legal, Policy & Privacy Guidelines
• Policy & Regulatory Issues
The Internet Is Essential for Investigations and Intelligence
• Accessible data
• Who’s online: 80%
• 30%+ power users
• Crime & misbehavior
• Due diligence
• Intelligence
• Vetting
• Investigations
Internet Growth
Pew found all age groups online in significant percentages.
[Charts: Millions of Users (Source: InternetWorldStats); IP traffic in Petabytes/month (Source: Cisco)]
Whatever their precision, the numbers show that Internet growth is rapid.
US: #2 in world (after China) – 239,893,600 users of 310.2M population – 77.3%, per InternetWorldStats.com
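The 77.3% figure follows directly from the two counts on the slide:

$$\frac{239{,}893{,}600\ \text{users}}{310{,}200{,}000\ \text{population}} \approx 0.773 \approx 77.3\%$$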
The Internet Universe
[Maps: November 3, 2003 Map of the Internet; MIT Internet Map 2007; San Diego Supercomputer Center I-Map 2008]
An increasingly complex, interconnected galaxy of nodes is portrayed in these Internet maps by leading technologists.
San Diego Supercomputer Study of Internet Links
The map of the Internet, as built and described in a Nature Communications paper, shows the locations of Internet systems on the hyperbolic plane. Image courtesy of Dmitri Krioukov, SDSC/CAIDA
A billion or more people use the Internet daily, according to recent SDSC research.
What’s on the Internet?
• Social Networking
• News & Blogs
• Maps & Locations
• Games & Hobbies
• Photos
• Video, Film, Music
• Libraries
• E-Commerce
• Advertising
• Private Websites
• Porn, Exploitation
• Illegal Sites
• Illegal Activities
• Illicit Activities
• Forbidden Activities
• Fantasy
• Humor
• Juvenile Delinquency
Wireless: Major Growth Area
What’s on the Internet?
• Public Records (Real Estate, Courts, Licenses, Businesses, Arrests, Liens, etc.)
• Residences, Building Occupants
• Telephones, Email, Mailing Addresses
• Genealogy, Births, Deaths
• Educational Institutions & Alumni
• Business & Executive Profiles
• Associations & Volunteer Organizations
• Private data vendors (Accurint, IRB, TLO)
Self-Descriptions in Online Profiles
• Yedo Da Meth Lover, 26, Colbert, Washington (MySpace)
• Lowlife, 26, Brownsville/Austin, TX, “Death to the New World Order” (MySpace)
• Crack Monkey, 21, Somerset, NJ, Rider grad (MySpace)
• Hacker Club (Facebook)
• Lynn, N. Seattle ecstasy dealer (MySpace)
• Angela, meth addict (MySpace)
Illicit Behavior Online: People We Trusted
A Florida Asst. US Attorney was arrested in 2007 when he arrived in Detroit carrying a doll, earrings, and Vaseline, after trying to arrange sex with a 5-year-old in Internet chats. He committed suicide in his cell in 2007.
A DHS press spokesman was caught trying to induce a “14-year-old girl” (an undercover detective) to have sex, and pled No Contest in 2006.
An Army Chief Warrant Officer, Director of the Army School of Information Technology, was arrested in 2010 for collecting and sharing child pornography over the Internet.
A US military contractor in Baghdad hacked girls’ computers, extorted them for nude photos and sex tapes, and tried to meet some for sex while on leave; he had over 4,000 victims when arrested. He is serving a 30-year sentence (2010).
Case Examples
A computer forensic analyst – part of the IT security department of a Fortune 500 firm – was found publicizing himself online as a profane, offensive “leader” of 5,000 players in a popular worldwide massively multiplayer online fantasy sci-fi game, which led to the discovery that he played the game all day, during both work and off-hours.
A new chief of research was found to have been disciplined by the FDA – a 3-year prohibition from government contracting – for admitted scientific misconduct. While the FDA database did not show the 10-year-old sanctions, three FDA newsletters online reported them.
One lesson: What you don’t know about what’s online can hurt you.
Case Examples
~1,000 US Navy personnel using their Navy.mil email addresses as their MySpace user names. Many postings contain unsuitable material, including operational security issues.
A computer security man who pled guilty to operating a massive botnet that stole IDs and money was hired by a Santa Monica Internet search firm while he was awaiting sentencing. The firm failed to Google the convict.
Spc. Bradley Manning Accused in Wikileaks Case
“Wikileaks” chief suspect Spc. Bradley Manning, 22, of Potomac, MD, was arrested in Kuwait and incarcerated at Quantico Marine Base, charged in July 2010 with leaking classified videos of US air strikes in Iraq to the Wikileaks website in April 2010. An online chat acquaintance, Adrian Lamo (previously convicted of computer hacking), told authorities and the press that Manning provided thousands of classified documents to Wikileaks. Julian Assange, Wikileaks’ founder, claimed the leaker exposed US military misdeeds. US government leaders voiced fear that US troops and informants would be killed because of the leaked secrets, and defended the actions depicted. The classified documents posted by Wikileaks, totaling 75 MB, numbered in the thousands.
Adrian Lamo ~2001
Julian Assange, Wikileaks
Bradley Manning was reportedly despondent over losing a lover and disciplined for striking a soldier
Leaked videos included US air strikes that killed civilians, including a Reuters reporter & driver
Manning’s charges include illegally transferring classified data to his PC, placing unauthorized software on military computers and delivering national defense info to an unauthorized party
Internet Searching is Useful For:
• Cyber vetting – virtual neighborhoods
• Criminal & corporate investigations
• IP & asset protection (insider threat)
• Compliance
• Competitive intelligence
• Legal support
• Research (any topic)
Likely Findings
• History of malicious online activities: ~3-6%
• Derogatory information, e.g. past bad acts
– Arrests, convictions, lawsuits, bankruptcies, firings
• Misuse of “anonymous” virtual identity online
• Most likely: Verification of qualifications and eligibility for the position sought in vetting
Sources & Methods for Internet Searching
• Systems & Tools
• Search Engines & Metasearch
• Websites with Databases: “Dark Web”
• Automated Searching
Analysis is critical for the information to have value
Systems
• Search on the right computer
– Use a separate system for searching (malware risk)
– Keep anti-virus, firewall, anti-malware up to date
• Protect your anonymity – you can be detected
• Protect the subject – don’t leave a trail
• Use fast systems, applications, enough memory
Applications
• Browser: Internet Explorer, Firefox, Chrome, Safari, Opera
• Browser settings, search engine integration
• PDF printer (e.g. Adobe Acrobat)
• Database or folders – retrievable files
• Search tools (internal, Internet)
Manual Searches
• Big 5 Search Engines – Live & Cached Results
– Google (YouTube) – PageRank: 100 factors
– Yahoo! 4B pages
– Microsoft (Bing)
– Ask (MyWebSearch) 3% of searches
– AOL (MapQuest)
• Popular (Social & Sales) websites – eBay, Facebook, MySpace, Craigslist, Amazon
Other Search Engines
• All the Web – "live search" looks for terms as you type them
• AltaVista – A Yahoo property that's not what it used to be
• Exalead – Search engine from France
• FreeSearch – U.K. search engine
• Gigablast – Looks similar to Google, smaller database
• IceRocket
• Lycos
• Mamma (really a metasearch engine)
• Openfind – Emphasizes Chinese-language results
• WiseNut – Includes "Wise Guides" (topic groups)
Contemporary (“Web 2.0”) Search Tools
Twitter.com, Trackle.com, Monitter.com and Friendfeed.com – help find people & provide “right now” results
Specialized Searching (Examples)
• Blogs: blogsearch.google.com, icerocket.com, sphere.com, technorati.com, blogdigger.com
• IP addresses: SamSpade.org, whois.com, networksolutions.com, domaintools.com
• Reverse phone/address: Whitepages.com, anywho.com, verizon.com
• Public records: brbpub.com (county)
• Government: usa.gov
More Searches
• Advanced search (Boolean logic)
• Special features: images, videos, maps, news, blogs
• Country-based searching
• Translations (rough)
• Tracking: Google.com/alerts (emails)
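Advanced (Boolean) searching mostly comes down to composing the query string carefully. The sketch below assumes Google-style operators (quoted phrases, `OR`, a leading `-` to exclude, `site:` to restrict to one domain); other engines may use different syntax, how tightly `OR` binds can vary, and the function name `boolean_query` is hypothetical.

```python
def boolean_query(required, any_of=(), exclude=(), site=None):
    """Compose a Google-style Boolean query (operator syntax assumed; verify per engine)."""
    # Every required phrase is quoted so the engine matches it exactly.
    parts = [f'"{phrase}"' for phrase in required]
    # Alternatives are joined with OR; at least one of them should appear.
    if any_of:
        parts.append(" OR ".join(f'"{phrase}"' for phrase in any_of))
    # A leading minus asks the engine to drop pages containing the phrase.
    parts.extend(f'-"{phrase}"' for phrase in exclude)
    # site: limits results to one domain, e.g. a social network.
    if site:
        parts.append(f"site:{site}")
    return " ".join(parts)


print(boolean_query(["Jack Doe"], any_of=["Purdue", "IBM"],
                    exclude=["obituary"], site="facebook.com"))
# "Jack Doe" "Purdue" OR "IBM" -"obituary" site:facebook.com
```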
Tracking
• Google and other tools (Trackle.com) allow one to track:
– Changes in websites
– Appearance of terms on indexed pages
– Appearance of terms in Twitter & other places
– Blogs & news references to a term
• Tracking is important in protection of assets and following activities of rivals & adversaries
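Tracking changes in a website can be approximated by storing a fingerprint of each page and comparing it on the next visit. This is a minimal sketch, assuming the page text has already been fetched (fetching, scheduling, and alerting are not shown); the function names are hypothetical, and SHA-256 over whitespace-normalized text is one reasonable choice among several.

```python
import hashlib


def fingerprint(page_text):
    """SHA-256 digest of the page text with runs of whitespace collapsed."""
    normalized = " ".join(page_text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def changed_pages(seen, pages):
    """Return URLs whose text is new or different; update `seen` in place.

    seen:  {url: fingerprint} saved from the previous run
    pages: {url: page_text} fetched on this run (hypothetical input format)
    """
    changed = []
    for url, text in pages.items():
        digest = fingerprint(text)
        if seen.get(url) != digest:
            changed.append(url)
            seen[url] = digest
    return changed


def terms_found(page_text, watch_terms):
    """Case-insensitive check for which watched terms appear on a page."""
    lowered = page_text.lower()
    return [term for term in watch_terms if term.lower() in lowered]


seen = {}
print(changed_pages(seen, {"http://example.com/a": "Jack Doe joins IBM"}))   # first visit: reported
print(changed_pages(seen, {"http://example.com/a": "Jack  Doe joins IBM"}))  # whitespace only: not reported
print(terms_found("Jack Doe joins IBM", ["jack doe", "Purdue"]))
```

Normalizing whitespace first keeps trivial layout changes from triggering alerts; a real page would likely also need ads, counters, and timestamps stripped before hashing.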
Leveraging Search Engine Findings
• Identify websites that may hold more on topic
– Colleges, associations, groups, social sites
– Local press, hobbies, sports, high schools
• Identify subject’s activities that may lead to further searching
• Identify subject’s family and closest friends, who may post about the subject
Cached Web Pages
Archive.org: Website content no longer online (Wayback Machine)
Metasearch Engines
• Dogpile http://www.dogpile.com/ – Google, Yahoo, Bing, Ask
• ixquick http://www.ixquick.com/ – 11 sites
• Metasearch http://www.metasearchengine.com/ – 27 sites
• Excite http://www.excite.com/ – Google, Yahoo, Bing, Ask
• Infospace http://www.infospace.com/ – Google, Yahoo, Bing, Ask, Twitter
• Addictomatic http://addictomatic.com/ – Metasearch engine (23 sites)
• Metacrawler http://www.metacrawler.com/ – 9 or more sites
• Search3 http://www.search3.com/ – Google, Twitter, Bing, in columns
Notice that results differ in order & number
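Because the engines listed above return results that differ in order and number, any metasearch tool has to merge the lists somehow. The presentation does not say how these services merge results, so this sketch swaps in one common, simple technique, reciprocal rank fusion, which rewards URLs that rank high on several engines. The constant k = 60 is a conventional default, not a value from this presentation, and the example URLs are placeholders.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked URL lists into one ordering.

    Each URL earns 1 / (k + rank) from every list it appears in (rank starts
    at 1), so a URL found near the top by several engines rises to the top.
    k=60 is an assumed default.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, url in enumerate(results, start=1):
            scores[url] += 1.0 / (k + rank)
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


engine_a = ["a.example", "b.example", "c.example"]
engine_b = ["b.example", "d.example"]
print(reciprocal_rank_fusion([engine_a, engine_b]))
# ['b.example', 'a.example', 'd.example', 'c.example']
```

The URL found by both engines ("b.example") wins even though it was only second on one list, which is the behavior an investigator usually wants when corroborating across sources.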
Variations in Name Searches: Examples
• Use different versions of a name:
– “John J. Doe” (full name in quotes)
– “Jack Doe” (nickname in quotes)
– “Jack Doe” Nevada (name in quotes + geographic location)
– “Jack Doe” IBM (name in quotes + job/industry/hobby)
– “Jack Doe” Purdue (name in quotes + school)
• Address – reverse address – J. Doe may work better than John Doe
• Phone Numbers
• Email Addresses
– [email protected]
– doe
– jjdoe@
– @jacksbar (used with smaller companies)
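The name variations above are mechanical enough to generate, which helps ensure none is skipped. This is a minimal sketch; the function names are hypothetical, and the user-name patterns (first+last, first.last, initial+last, initials+last, last) are common conventions assumed for illustration, not a complete list.

```python
def name_queries(full_name, nicknames=(), qualifiers=()):
    """Quoted name queries, alone and paired with location/employer/school terms."""
    names = [full_name, *nicknames]
    queries = [f'"{name}"' for name in names]
    for name in names:
        for qualifier in qualifiers:
            queries.append(f'"{name}" {qualifier}')
    return queries


def username_candidates(first, middle_initial, last):
    """Common user-name patterns to try as email or screen-name fragments (assumed)."""
    first, mi, last = first.lower(), middle_initial.lower(), last.lower()
    return [
        first + last,          # johndoe
        f"{first}.{last}",     # john.doe
        first[0] + last,       # jdoe
        first[0] + mi + last,  # jjdoe
        last,                  # doe
    ]


for query in name_queries("John J. Doe", nicknames=["Jack Doe"],
                          qualifiers=["Nevada", "IBM", "Purdue"]):
    print(query)
print(username_candidates("John", "J", "Doe"))
```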
Quick Anatomy of Google
• Google (YouTube) constantly spiders the Internet, hits pages about once every 30 days
• Caches & indexes about 10 billion pages, more than any other search engine
• Presents search results instantly, showing live and cached data links
• Presents results in “PageRank” order based on popularity (note: ads influence results)
The Internet: 506M websites, 56B pages
Google has about 18% of pages indexed
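The 18% coverage figure follows from the slide's own numbers:

$$\frac{\text{pages indexed by Google}}{\text{pages on the Web}} \approx \frac{10 \times 10^{9}}{56 \times 10^{9}} \approx 0.179 \approx 18\%$$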
Searching Online Databases: Contents May Not Be Indexed by Search Engines
• PeopleFinders, zabasearch
• WhitePages.com, Anywho.com
• USA.gov
• USTaxCourt.gov
• BlogSearch.google, IceRocket, Sphere
• Yahoo message boards
• Whois, SamSpade.org
• Nsopr.gov
• SSNValidator.com
• USAF-locator.com
• Bop.gov/inmate
• AMA-assn.org, bms.org (MDs)
• RipoffReport.com
• RagingBull.com
Finding Search Tools
• Library of Congress: http://www.loc.gov/rr/ElectronicResources/subjects.php?subjectID=69
• List of Search Engines: http://www.pandia.com/powersearch
• Yahoo List: http://dir.yahoo.com/Computers_and_Internet/Internet/World_Wide_Web/Searching_the_Web/Search_Engines_and_Directories/
Search Automation
• Metasearch
• Copernic: www.copernic.com
• Corporate datamining tools
• Proprietary Software
Better COTS products are needed
Boolean Logic, Search Techniques Optimize Queries
Step-by-Step Approach
1. Search engines
– Individual (e.g. Google, Yahoo)
– Meta (DogPile, Metasearchengine)
2. Social Networks/Blog sites
3. Copernic
4. Automated searches
5. Follow-up searches
Keeping Up With The Internet
• Keep a spreadsheet with links to best sources
• Don’t rely on search engines alone
• Find new sites & drop those no longer useful
• Research what works best
• Use experts in Internet searching – outsource
• Train & equip internal Internet searchers
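One way to keep the source spreadsheet current is to record when each source last produced something useful and periodically flag the ones that have gone quiet. This is a minimal sketch, assuming the spreadsheet is exported to CSV with hypothetical columns `url`, `category`, and `last_useful` (ISO dates); the 180-day cutoff is an arbitrary example, and a flagged source should be reviewed rather than dropped automatically.

```python
import csv
import io
from datetime import date


def stale_sources(csv_text, today, max_idle_days=180):
    """Return URLs whose `last_useful` date is more than max_idle_days ago.

    Column names and the 180-day default are assumptions for illustration.
    """
    stale = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        last_useful = date.fromisoformat(row["last_useful"])
        if (today - last_useful).days > max_idle_days:
            stale.append(row["url"])
    return stale


SOURCES_CSV = """url,category,last_useful
http://www.dogpile.com/,metasearch,2010-12-01
http://www.example.org/old-people-finder,people search,2009-06-15
"""

print(stale_sources(SOURCES_CSV, today=date(2011, 1, 11)))
# ['http://www.example.org/old-people-finder']
```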
Procedures
• Plan – include subject-specific sites & terms
• Capture content, print into PDFs
• Include details (URLs, dates, specifics)
• Provide source for each item reported
• Log the process, if evidence results
• Do not include inappropriate data (Title VII)
• Include caveats about reliability in reports
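The "include details" and "log the process" steps are easier to keep consistent with a small collection log. This is a minimal sketch, assuming a CSV log whose column set (`url`, `captured_at`, `pdf_file`, `source`, `notes`) is a hypothetical choice; it records what was captured and when, but it does not capture the page itself.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import datetime, timezone


@dataclass
class CaptureRecord:
    # Hypothetical column set for a collection log.
    url: str          # exact address of the captured page
    captured_at: str  # ISO 8601 timestamp in UTC
    pdf_file: str     # name of the PDF print of the page (hypothetical naming)
    source: str       # who or what the content is attributed to
    notes: str        # reliability caveats, search terms used, etc.


FIELDNAMES = [f.name for f in fields(CaptureRecord)]


def log_capture(writer, url, pdf_file, source, notes, when=None):
    """Append one capture to the log; `when` defaults to the current UTC time."""
    if when is None:
        when = datetime.now(timezone.utc)
    record = CaptureRecord(url, when.isoformat(timespec="seconds"), pdf_file, source, notes)
    writer.writerow(asdict(record))
    return record


log = io.StringIO()  # a real log would be a file opened with newline=""
writer = csv.DictWriter(log, fieldnames=FIELDNAMES)
writer.writeheader()
log_capture(writer, "http://example.com/profile/jackdoe", "capture_0001.pdf",
            "Profile claiming to be Jack Doe (unverified)", "Found via name + school query",
            when=datetime(2011, 1, 11, 7, 30, tzinfo=timezone.utc))
print(log.getvalue())
```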
Controversial Methods
• “Friending” subjects – in real or false identity
• Social engineering to elicit info about subject
• Emailing subject under a false identity
• “Pretexting” as the subject to elicit data from a company or someone who knows subject
• Identifying an anonymous emailer using hidden code
• “Lurking” in chat rooms
Large Scale Internet Intelligence
• Use automated search tools
• Capture & store online activities for reference
• Filter and scan results to find relevant data
• Analyze and report results along with other investigative sources
• Identify users: link real names to online IDs
• Be careful in using Internet data to ensure accuracy and fairness
Analyzing Search Results
• Attribution: Who uses a virtual identity, posts
• Verification: Proving or confirming online data
– Ultimate confirmation: admission of subject
• Filter non-identifiable, irrelevant references
• Evaluating the seriousness of findings
• How much searching is enough?
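Filtering non-identifiable references can start with a simple rule: keep a result only if it mentions the subject's name and at least one corroborating identifier already established for the subject (a city, school, or employer). This is a minimal sketch under that assumption; plain substring matching is crude, the `min_matches` threshold is arbitrary, and anything filtered out may still deserve a human look.

```python
def is_identifiable(snippet, name, identifiers, min_matches=1):
    """True if the snippet names the subject and repeats enough known identifiers.

    min_matches is an assumed threshold, not an established standard.
    """
    text = snippet.lower()
    if name.lower() not in text:
        return False
    matches = sum(1 for ident in identifiers if ident.lower() in text)
    return matches >= min_matches


def filter_results(results, name, identifiers, min_matches=1):
    """Keep only results whose snippet passes is_identifiable()."""
    return [r for r in results
            if is_identifiable(r["snippet"], name, identifiers, min_matches)]


results = [
    {"url": "http://example.com/1", "snippet": "Jack Doe, Purdue alumnus, joins IBM"},
    {"url": "http://example.com/2", "snippet": "Jack Doe wins chili contest in Ohio"},
    {"url": "http://example.com/3", "snippet": "Purdue engineering news"},
]
kept = filter_results(results, "Jack Doe", ["Purdue", "IBM", "Nevada"])
print([r["url"] for r in kept])  # ['http://example.com/1']
```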
Preserving Online Evidence
• Print relevant web pages (PDF files)
• Maintain securely (encryption, digital signatures)
• Keep long enough to meet legal obligations (then delete completely)
If you are not using computer forensic tools and the content can become evidence, keep a log and notes to support testimony about collection.
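Preserving evidence often includes recording a cryptographic hash of each captured file at collection time, so it can later be shown that the file has not changed. This is a minimal sketch using SHA-256 from Python's standard library; hashing complements, but is not the same as, the encryption and digital signatures mentioned above, and the demo file is a stand-in for a real PDF capture.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Hex SHA-256 digest of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_capture(path, recorded_digest):
    """True if the file still matches the digest recorded at collection time."""
    return sha256_of_file(path) == recorded_digest


# Demo: record a digest, verify it, then alter the file and verify again.
with tempfile.NamedTemporaryFile("wb", suffix=".pdf", delete=False) as tmp:
    tmp.write(b"%PDF-1.4 stand-in for a captured web page")
recorded = sha256_of_file(tmp.name)  # store this alongside the collection log
intact = verify_capture(tmp.name, recorded)
with open(tmp.name, "ab") as fh:
    fh.write(b" altered")
still_intact = verify_capture(tmp.name, recorded)
os.remove(tmp.name)
print(intact, still_intact)  # True False
```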
Using Search Results
• Integrate into other reporting – with clear indication of source
• Remember: subject may not have posted item
• Fairness may demand verification of the data by the subject
• In vetting, it’s best to interview the subject about any questionable postings
Is Internet Vetting Legal?
Is Internet Information “Private?”
• Internet data is public, not private: plain view, published information
• No restriction on using published information
• Must abide by all legal requirements for other types of investigative information
• No current legal requirements for
– Advising the subject
– Using Internet searching, if not outsourced
Caveat: This does not constitute legal advice
Legal & Privacy Gold Standard
• Notice, consent: add to current forms
• Attribution, verification, subject interview, redress
• Assessing results as intelligence:
– Virtual ID might be used by someone else
– Online data may be fabricated, fantasy, altered
– Basis for subject interview, adjudication
Meets FCRA & other legal requirements
Cyber Vetting Guidelines
• IACP-PERSEREC Project: Guidelines
– Cyber Vetting for Law Enforcement
– Cyber Vetting for National Security
– Cyber Posting for both above
• Nationwide series of focus groups, research
• Baseline considerations for establishing enterprise policies and procedures
PERSEREC: Defense Personnel Security Research Center, Monterey, CA
IACP: International Association of Chiefs of Police
IACP Cyber Vetting Guidelines
Developing a Cybervetting Strategy for Law Enforcement, December 2010, IACP
[Companion study for national security]
http://www.iacpsocialmedia.org/Portals/1/documents/CybervettingReport.pdf
Key Policy Issues
• Trained Internet investigators
• Outsourced (can address EEO issue)
• Internet search policies & procedures
– Liability if Internet searching is done improperly
• Defining sufficiency – completeness
• Utilizing results of searching
Issues with Private Investigators
• Licensing of cyber investigators
– Training
• Legal and ethical guidelines for cyber vetting
• Watching the watchers: regulators online
• Keeping up with the Internet
Forthcoming Book:
Internet Searches for Vetting, Investigations and Open-Source Intelligence
By Edward J. Appel, Taylor & Francis
http://www.taylorandfrancis.com/books/details/9781439827512/
Scheduled publication January 14, 2011
…contains more details on topics discussed here, e.g. how to do cybervetting and investigations ethically & legally