Natural Language Processing and Graph Databases in Lumify

Transcript
Page 1: Natural Language Processing and Graph Databases in Lumify

NLP and Graph Databases in Lumify

Charlie Greenbacker & Joe Kerner

Page 2: Natural Language Processing and Graph Databases in Lumify

Agenda

Introductions

Lumify Overview

Natural Language Processing

Graph Databases

Page 3: Natural Language Processing and Graph Databases in Lumify

photo: Columbia Pictures

About me: @greenbacker

Theories: popular tripe

Methods: sloppy

Conclusions: highly questionable

Page 4: Natural Language Processing and Graph Databases in Lumify

Best reason for not finishing PhD

Page 5: Natural Language Processing and Graph Databases in Lumify

@ExploreAltamira

Page 6: Natural Language Processing and Graph Databases in Lumify

Lumify is an open source big data analysis and visualization platform built by Altamira engineers

Page 7: Natural Language Processing and Graph Databases in Lumify

Key Lumify Concepts

Ontology: structure for organizing information (i.e., your data model)

Entities: any “thing” you want to represent (e.g., person, place, event)

Relationships: a link between two entities (e.g., leader-of, works-for, sibling-of)

Properties: data about an entity (e.g., first name, last name, date of birth)

Graph: collection of entities and the relationships between them
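
The five concepts above map naturally onto a small data structure. What follows is a minimal Python sketch, not Lumify's actual API: the names `ONTOLOGY`, `Entity`, `Relationship`, and `Graph` are hypothetical and chosen only to mirror the slide's vocabulary, and the property values are placeholders.

```python
from dataclasses import dataclass, field

# Hypothetical ontology: the concept types and relationship labels we allow.
ONTOLOGY = {
    "concepts": {"person", "organization"},
    "relationships": {"leader-of", "works-for", "sibling-of"},
}


@dataclass
class Entity:
    id: str
    concept: str                                     # must appear in the ontology
    properties: dict = field(default_factory=dict)   # e.g. first name, last name


@dataclass
class Relationship:
    source: str   # id of the source entity
    target: str   # id of the target entity
    label: str    # must appear in the ontology


@dataclass
class Graph:
    entities: dict = field(default_factory=dict)
    relationships: list = field(default_factory=list)

    def add_entity(self, entity):
        # Reject anything the ontology does not describe.
        if entity.concept not in ONTOLOGY["concepts"]:
            raise ValueError(f"unknown concept: {entity.concept}")
        self.entities[entity.id] = entity

    def add_relationship(self, rel):
        if rel.label not in ONTOLOGY["relationships"]:
            raise ValueError(f"unknown relationship: {rel.label}")
        if rel.source not in self.entities or rel.target not in self.entities:
            raise KeyError("both endpoints must already be in the graph")
        self.relationships.append(rel)


# Placeholder data, for illustration only.
graph = Graph()
graph.add_entity(Entity("e1", "person", {"first name": "Jane", "last name": "Doe"}))
graph.add_entity(Entity("e2", "organization", {"name": "Example Corp"}))
graph.add_relationship(Relationship("e1", "e2", "works-for"))
```

Validating against the ontology at insert time is one possible design choice; it keeps the graph consistent with the data model at the cost of rejecting data the ontology has not yet been extended to cover.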

Page 8: Natural Language Processing and Graph Databases in Lumify

Live Demo

Page 9: Natural Language Processing and Graph Databases in Lumify

Who can Lumify help?

Page 10: Natural Language Processing and Graph Databases in Lumify

Intelligence Analyst

Lumify helps analysts fuse structured and unstructured data from myriad sources into actionable intelligence.

Page 11: Natural Language Processing and Graph Databases in Lumify

Police Investigator

Law enforcement personnel can use Lumify to explore criminal networks, uncover hidden connections, and develop leads.

Page 12: Natural Language Processing and Graph Databases in Lumify

Financial Analyst

Lumify analyzes financial data and transaction records to help detect fraud and identify possible insider threats.

photo: Ken Teegardin (https://flic.kr/p/9rn9Yh)

Page 13: Natural Language Processing and Graph Databases in Lumify

Research Staff

Scientists, law firms, news organizations, and others can track their research in Lumify to unearth latent knowledge and discover critical new insights.

photo: UK National Archives (http://bit.ly/1n9dhR8)

Page 14: Natural Language Processing and Graph Databases in Lumify

Why Lumify?

Page 15: Natural Language Processing and Graph Databases in Lumify

Free and Open Source

•  Distributed under the permissive Apache 2.0 license

•  No restrictions on modifications

•  No licensing or usage constraints

Page 16: Natural Language Processing and Graph Databases in Lumify

Built on Scalable Open Source Tech

Hadoop CDH 4

Accumulo

Elasticsearch

tesseract, CLAVIN, CMU Sphinx, OpenNLP, OpenCV, ffmpeg

Apache Storm

Secure Graph

custom code

Page 17: Natural Language Processing and Graph Databases in Lumify

Highly Secure

•  Separate security restrictions at the entity, property, and relationship level

•  Implemented in and enforced by Accumulo cell-level security

Joaquin Guzman Loera

DOB: 1957-04-04 POB: Badiraguato Nationality: Mexican

Zarka de Mexico

Founded: 2010-01-11 Location: Mexico City Employees: 121
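
The idea of restricting visibility per property can be sketched in a few lines. This is a deliberately simplified Python illustration, not Lumify or Accumulo code: Accumulo's real visibility expressions allow boolean combinations with `&`, `|`, and parentheses, whereas this sketch assumes each property simply lists labels that a user must hold in full. The class `SecuredProperty`, the function `visible_properties`, and the label `analyst` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SecuredProperty:
    name: str
    value: str
    required_labels: frozenset  # user must hold every label (conjunction only)


def visible_properties(properties, authorizations):
    """Return only the properties the given authorizations are allowed to see."""
    auths = frozenset(authorizations)
    return {
        p.name: p.value
        for p in properties
        if p.required_labels <= auths  # subset test: all required labels held
    }


# Property names and values taken from the slide; the labels are assumptions.
person = [
    SecuredProperty("Nationality", "Mexican", frozenset()),
    SecuredProperty("DOB", "1957-04-04", frozenset({"analyst"})),
]

print(visible_properties(person, []))           # {'Nationality': 'Mexican'}
print(visible_properties(person, ["analyst"]))  # {'Nationality': 'Mexican', 'DOB': '1957-04-04'}
```

Two users querying the same entity therefore see different property sets, which is the behavior the slide describes at the entity, property, and relationship level.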

Page 18: Natural Language Processing and Graph Databases in Lumify

Supported

•  Full-time development staff

•  Custom development and customization services

•  Commercial support offerings

Page 19: Natural Language Processing and Graph Databases in Lumify

AWS Compatible

•  Day-to-day development done on Amazon infrastructure

•  Primarily use EC2, VPC, S3, SES, CloudWatch

•  Altamira is an AWS consulting partner

Page 20: Natural Language Processing and Graph Databases in Lumify

Natural Language Processing in Lumify

Page 21: Natural Language Processing and Graph Databases in Lumify

Text Extraction

text docs, structured data: extractor

images: OCR (tesseract)

audio: CMU Sphinx

video: CMU Sphinx, OCR (tesseract)

Page 22: Natural Language Processing and Graph Databases in Lumify

Text Enrichment

•  Apache OpenNLP
    •  Named Entity Recognition
    •  Extracts names of entities from unstructured text
    •  Persons, Orgs, & Locations
    •  Highlighted in preview text
    •  User must confirm/resolve

•  CLAVIN
    •  Geospatial Entity Resolution
    •  Resolves extracted location names to gazetteer records
    •  Solves “Springfield problem”
    •  Disambiguates place names
    •  Turns text docs into maps!
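
To make the “Springfield problem” concrete: one place name can match several gazetteer records, so a resolver needs some evidence to choose among them. The Python sketch below is a toy heuristic for illustration only and is not CLAVIN's actual algorithm. The `GAZETTEER` dictionary is a tiny hand-written subset, and the `resolve` function and its rule (prefer the candidate whose state is mentioned in the document) are assumptions.

```python
# Toy gazetteer: place name -> candidate records (illustrative subset only).
GAZETTEER = {
    "Springfield": [
        {"name": "Springfield", "admin1": "Illinois", "country": "US"},
        {"name": "Springfield", "admin1": "Massachusetts", "country": "US"},
        {"name": "Springfield", "admin1": "Missouri", "country": "US"},
    ],
}


def resolve(place_name, document_text):
    """Pick one gazetteer record for an extracted place name.

    Heuristic (an assumption, not CLAVIN's method): prefer the candidate whose
    state is mentioned elsewhere in the document; otherwise fall back to the
    first candidate listed.
    """
    candidates = GAZETTEER.get(place_name, [])
    if not candidates:
        return None
    for candidate in candidates:
        if candidate["admin1"] in document_text:
            return candidate
    return candidates[0]


text = "Officials in Springfield met with the governor of Missouri."
print(resolve("Springfield", text)["admin1"])  # Missouri
```

Once each location name resolves to a single record, attaching coordinates from that record is what would let a text document be plotted on a map.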

Page 23: Natural Language Processing and Graph Databases in Lumify

Enriched Text Documents

Machine-powered entity extraction and resolution, combined with human QA and supplementation, supports rich semantic analysis of raw text.

Drug Lord “El Chapo” Captured in Mexico

PUBLISHED DATE: 2014/02/22

SOURCE: Wikipedia

Although Guzman had long hidden successfully in remote areas of the Sierra Madre mountains, the arrested members of his security team told the military he had begun venturing out to Culiacan and the beach town of Mazatlan. A week prior to his capture, Guzman and Zambada were reported to have attended a family reunion in Sinaloa. The Mexican military followed the bodyguards’ tips to Guzman’s ex-wife’s house, but they had trouble ramming the steel-reinforced front door, which allowed Guzman to escape through a system of secret tunnels that connected six houses, eventually moving south to Mazatlan. He planned to stay a few days in Mazatlan to see his twin baby daughters before retreating to the mountains. On 22 February 2014, at around 6:40 a.m., Mexican authorities arrested Guzman at a hotel in a beach front area in Mazatlan, Sinaloa, following an operation by the Mexican Navy, with joint intelligence from the DEA and

Page 24: Natural Language Processing and Graph Databases in Lumify

Benefits to Users

Increases Discoverability: quickly find relevant data without reading

Helps Deal with Information Overload: machines process text faster than humans

Uncovers Hidden Connections: enables object-based analysis & investigations
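
One way to picture “uncovering hidden connections” is as a path search over the entity graph: two entities that never appear together in a document may still be linked through intermediaries. The following Python sketch uses a plain breadth-first search and is an illustration only, not how Lumify implements it. The function `shortest_path` and the entity ids are hypothetical placeholders.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over an undirected graph given as (a, b) pairs."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    queue = deque([[start]])   # each queue item is a path from start
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for nxt in sorted(neighbors.get(node, ())):  # sorted for determinism
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None                # no connection found


# Placeholder entity ids, for illustration only.
edges = [("person-1", "org-1"), ("org-1", "person-2"), ("person-2", "place-1")]
print(shortest_path(edges, "person-1", "place-1"))
# ['person-1', 'org-1', 'person-2', 'place-1']
```

Breadth-first search returns a path with the fewest hops, which suits an analyst asking for the most direct link between two entities.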

Page 25: Natural Language Processing and Graph Databases in Lumify

Future NLP Integration

Support other NER tools: e.g., Stanford NER, SUTime, MITIE

Event/Relationship Extraction: e.g., OpenIE (formerly ReVerb)

Coreference Resolution: augmenting/extending GATE/ANNIE

Additional Text Analytics: e.g., frequency analysis, topic modeling, sentiment analysis

Multilingual Support: use non-English language models for NER, etc.

Page 26: Natural Language Processing and Graph Databases in Lumify

Graph Databases in Lumify

view part 2 of the presentation here: github.com/altamiracorp/secure-graph-presentation

Page 27: Natural Language Processing and Graph Databases in Lumify

Questions?

more info: lumify.io