NLP and Graph Databases in Lumify
Charlie Greenbacker & Joe Kerner
Agenda
Introductions
Lumify Overview
Natural Language Processing
Graph Databases
photo: Columbia Pictures
About me: @greenbacker Theories: popular tripe Methods: sloppy Conclusions: highly questionable
Best reason for not finishing PhD
@ExploreAltamira
Lumify is an open source big data analysis and visualization platform built by Altamira engineers
Key Lumify Concepts
Ontology: structure for organizing information (i.e., your data model)
Entities: any “thing” you want to represent (e.g., person, place, event)
Relationships: a link between two entities (e.g., leader-of, works-for, sibling-of)
Properties: data about an entity (e.g., first name, last name, date of birth)
Graph: collection of entities and the relationships between them
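The five concepts above can be sketched as a small in-memory data model. This is only an illustration in Python, not Lumify's actual API: the class names (`Ontology`, `Entity`, `Relationship`, `Graph`) and the sample values are assumptions chosen to mirror the terms on the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Ontology:
    """The data model: which concept and relationship types are allowed."""
    concepts: set
    relationship_types: set


@dataclass
class Entity:
    """Any 'thing' to represent, with properties holding data about it."""
    entity_id: str
    concept: str
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A typed link between two entities."""
    source_id: str
    label: str
    target_id: str


class Graph:
    """A collection of entities and the relationships between them."""

    def __init__(self, ontology):
        self.ontology = ontology
        self.entities = {}
        self.relationships = []

    def add_entity(self, entity):
        if entity.concept not in self.ontology.concepts:
            raise ValueError(f"unknown concept: {entity.concept}")
        self.entities[entity.entity_id] = entity

    def add_relationship(self, rel):
        if rel.label not in self.ontology.relationship_types:
            raise ValueError(f"unknown relationship type: {rel.label}")
        if rel.source_id not in self.entities or rel.target_id not in self.entities:
            raise KeyError("both endpoints must already be in the graph")
        self.relationships.append(rel)

    def neighbors(self, entity_id):
        """Return ids of entities directly linked to entity_id, in either direction."""
        linked = set()
        for rel in self.relationships:
            if rel.source_id == entity_id:
                linked.add(rel.target_id)
            elif rel.target_id == entity_id:
                linked.add(rel.source_id)
        return linked


# Illustrative sample data only.
ontology = Ontology(concepts={"person", "organization"},
                    relationship_types={"leader-of", "works-for", "sibling-of"})
graph = Graph(ontology)
graph.add_entity(Entity("p1", "person", {"first name": "Ada", "last name": "Example"}))
graph.add_entity(Entity("o1", "organization", {"name": "Example Corp"}))
graph.add_relationship(Relationship("p1", "works-for", "o1"))
```

Validating against the ontology at insert time is one possible design choice; it keeps entities and relationships consistent with the data model at the cost of rejecting anything the ontology does not yet describe.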
Live Demo
Who can Lumify help?
Intelligence Analyst
Lumify helps analysts fuse structured and unstructured data from myriad sources into actionable intelligence.
Police Investigator
Law enforcement personnel can use Lumify to explore criminal networks, uncover hidden connections, and develop leads.
Financial Analyst
Lumify analyzes financial data and transaction records to help detect fraud and identify possible insider threats.
photo: Ken Teegardin (https://flic.kr/p/9rn9Yh)
Research Staff
Scientists, law firms, news organizations, and others can track their research in Lumify to unearth latent knowledge and discover critical new insights.
photo: UK National Archives (http://bit.ly/1n9dhR8)
Why Lumify?
Free and Open Source
• Distributed under the permissive Apache 2.0 license
• No restrictions on modifications
• No licensing or usage constraints
Built on Scalable Open Source Tech
Hadoop CDH 4
Accumulo
Elasticsearch
tesseract, CLAVIN, CMU Sphinx, OpenNLP, OpenCV, ffmpeg
Apache Storm
Secure Graph
custom code
Highly Secure
• Separate security restrictions at the entity, property, and relationship level
• Implemented in and enforced by Accumulo cell-level security
Joaquin Guzman Loera
DOB: 1957-04-04 POB: Badiraguato Nationality: Mexican
Zarka de Mexico
Founded: 2010-01-11 Location: Mexico City Employees: 121
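One way to picture per-property security is to attach a set of required labels to each value and filter by the reader's authorizations. The sketch below is a deliberately simplified stand-in, not Accumulo's or Lumify's API: Accumulo visibility expressions allow boolean combinations of labels, whereas here a value is visible only when the reader holds every label in a plain set. The label names and the `visible_properties` helper are hypothetical.

```python
def visible_properties(properties, authorizations):
    """Return only the properties whose required labels are all held by the reader.

    properties: dict mapping property name -> (value, set of required labels)
    authorizations: set of labels granted to the reader
    """
    return {
        name: value
        for name, (value, required) in properties.items()
        if required <= authorizations
    }


# Property values come from the example entity; the labels are made up.
person = {
    "name": ("Joaquin Guzman Loera", set()),
    "nationality": ("Mexican", {"restricted"}),
    "date of birth": ("1957-04-04", {"restricted", "source-protected"}),
}

public_view = visible_properties(person, set())
analyst_view = visible_properties(person, {"restricted"})
```

With these made-up labels, a reader with no authorizations sees only the name, while a reader holding `restricted` also sees the nationality but still not the date of birth.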
Supported
• Full-time development staff
• Custom development and customization services
• Commercial support offerings
AWS Compatible
• Day-to-day development done on Amazon infrastructure
• Primarily use EC2, VPC, S3, SES, CloudWatch
• Altamira is an AWS consulting partner
Natural Language Processing in Lumify
Text Extraction
[Diagram: text extraction by input type]
text docs, structured data: extractor
images: OCR (tesseract)
audio: CMU Sphinx
video: CMU Sphinx, OCR (tesseract)
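The routing in the diagram can be read as a lookup from input type to the extraction steps applied to it. The sketch below is an illustration under stated assumptions rather than Lumify's pipeline code: the `EXTRACTORS` table and `plan_extraction` function are hypothetical, the tool names are used only as labels (nothing is actually invoked), and the video row assumes that both speech-to-text and OCR are applied.

```python
# Input type -> ordered extraction steps. Tool names are labels only.
EXTRACTORS = {
    "text": ["extractor"],
    "structured": ["extractor"],
    "image": ["OCR (tesseract)"],
    "audio": ["speech-to-text (CMU Sphinx)"],
    # Assumption: video goes through both the audio and the image path.
    "video": ["speech-to-text (CMU Sphinx)", "OCR (tesseract)"],
}


def plan_extraction(input_type):
    """Return the extraction steps for an input type, or raise if unsupported."""
    try:
        return list(EXTRACTORS[input_type])
    except KeyError:
        raise ValueError(f"no extractor registered for {input_type!r}") from None
```

Keeping the mapping in a table means a new input type only needs a new row, not a new branch in the routing logic.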
Text Enrichment
• Apache OpenNLP
  • Named Entity Recognition
  • Extracts names of entities from unstructured text
  • Persons, Orgs, & Locations
  • Highlighted in preview text
  • User must confirm/resolve
• CLAVIN
  • Geospatial Entity Resolution
  • Resolves extracted location names to gazetteer records
  • Solves “Springfield problem”
  • Disambiguates place names
  • Turns text docs into maps!
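The "Springfield problem" can be illustrated with a toy resolver. The sketch below does not use OpenNLP or CLAVIN and makes no claim about how either works internally: extraction is a plain name lookup standing in for statistical NER, and disambiguation uses one simple heuristic (prefer the candidate whose region is also mentioned in the text). The `GAZETTEER` table and both function names are hypothetical.

```python
import re

# Toy gazetteer: several real US cities share the name "Springfield".
GAZETTEER = {
    "Springfield": [
        {"name": "Springfield", "region": "Illinois"},
        {"name": "Springfield", "region": "Massachusetts"},
        {"name": "Springfield", "region": "Missouri"},
    ],
    "Chicago": [{"name": "Chicago", "region": "Illinois"}],
}


def extract_locations(text):
    """Find gazetteer names in the text (a lookup stand-in for statistical NER)."""
    pattern = r"\b(" + "|".join(map(re.escape, GAZETTEER)) + r")\b"
    return re.findall(pattern, text)


def resolve_location(name, text):
    """Pick the candidate record whose region is also mentioned in the text.

    Returns None when the context does not single out one candidate,
    leaving the decision to a human reviewer.
    """
    candidates = GAZETTEER.get(name, [])
    if len(candidates) == 1:
        return candidates[0]
    matches = [c for c in candidates if c["region"] in text]
    return matches[0] if len(matches) == 1 else None


text = "The meeting moved from Chicago to Springfield, Missouri."
resolved = {name: resolve_location(name, text) for name in extract_locations(text)}
```

Returning `None` for an unresolved name, instead of guessing, matches the slide's point that the user must confirm or resolve what the machine extracts.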
Enriched Text Documents
Machine-powered entity extraction and resolution, combined with human QA and supplementation, supports rich semantic analysis of raw text.
Drug Lord “El Chapo” Captured in Mexico
Published date: 2014/02/22
Source: Wikipedia
Although Guzman had long hidden successfully in remote areas of the Sierra Madre mountains, the arrested members of his security team told the military he had begun venturing out to Culiacan and the beach town of Mazatlan. A week prior to his capture, Guzman and Zambada were reported to have attended a family reunion in Sinaloa. The Mexican military followed the bodyguards’ tips to Guzman’s ex-wife’s house, but they had trouble ramming the steel-reinforced front door, which allowed Guzman to escape through a system of secret tunnels that connected six houses, eventually moving south to Mazatlan. He planned to stay a few days in Mazatlan to see his twin baby daughters before retreating to the mountains. On 22 February 2014, at around 6:40 a.m., Mexican authorities arrested Guzman at a hotel in a beach front area in Mazatlan, Sinaloa, following an operation by the Mexican Navy, with joint intelligence from the DEA and
Benefits to Users
Increases Discoverability: quickly find relevant data without reading
Helps Deal with Information Overload: machines process text faster than humans
Uncovers Hidden Connections: enables object-based analysis & investigations
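Once entities and documents are linked as objects, a "hidden connection" is simply a path between two nodes that nobody has traced by hand. A minimal sketch, assuming an undirected edge list with placeholder node names (the `shortest_path` helper is hypothetical, not part of Lumify), uses breadth-first search:

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected edge list.

    Returns the list of nodes on a shortest path, or None if none exists.
    """
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorted only to make the traversal order deterministic.
        for neighbor in sorted(adjacency.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Placeholder nodes: entities linked through the documents that mention them.
edges = [
    ("Person A", "Document 1"),
    ("Document 1", "Company B"),
    ("Company B", "Person C"),
    ("Person D", "Document 2"),
]
path = shortest_path(edges, "Person A", "Person C")
```

Here Person A and Person C never appear in the same document, yet the search links them through Document 1 and Company B, while Person D stays unconnected.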
Future NLP Integration
Support other NER tools: e.g., Stanford NER, SUTime, MITIE
Event/Relationship Extraction: e.g., OpenIE (formerly ReVerb)
Coreference Resolution: augmenting/extending GATE/ANNIE
Additional Text Analytics: e.g., frequency analysis, topic modeling, sentiment analysis
Multilingual Support: use non-English language models for NER, etc.
Graph Databases in Lumify
view part 2 of the presentation here: github.com/altamiracorp/secure-graph-presentation
Questions?
more info: lumify.io