Open Platform

© 2008 Palantir Technologies Inc. All rights reserved. Palantir Open Platform Brian Schimpf Forward Deployed Engineer


Transcript of Open Platform

Page 1: Open Platform

Palantir Open Platform
Brian Schimpf
Forward Deployed Engineer

Page 2: Open Platform

Presentation Overview

Palantir is an open platform
– Designed from the ground up to be open and extensible
– Rich set of APIs spanning the product

Palantir works with your IT infrastructure

In this talk
– Integrating with existing software ecosystem
– Palantir extensibility

Page 3: Open Platform

Existing IT Ecosystem

Your existing IT infrastructure
– Authentication
– Information Extractors
– Legacy data stores
– Rapidly changing data sources

Page 5: Open Platform

Authentication

Already have an existing authentication and authorization infrastructure

May have multiple authentication sources

Want to provide a unified access control solution across information sources

LDAP Credentials

Public Key Credentials

Page 6: Open Platform

Authentication WS provides a common interface

Provides users, groups and group memberships

Allows multiple sources to be registered

PKI Auth Source

LDAP Auth Source

Authentication Web Service

Dispatch Server

Sample User A
Username: jdoe@source1
Name: Jane Doe
UID: ABCD-EFGH-IJKL
Groups: 1234, 5678

Sample User B
Username: jsmith@source2
Name: John Smith
UID: ZXYW-VUTS-RQPO
Groups: 9876, 5432

Auth jdoe@source1

Auth jsmith@source2
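
The routing shown on this slide, where several auth sources sit behind one common interface, can be sketched roughly as follows. This is only an illustration and not the Palantir web service: `AuthService`, `DictAuthSource`, `User` and the plaintext credentials are all hypothetical names and simplifications, and a real deployment would sit behind SOAP and talk to LDAP or a PKI.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    username: str   # qualified as local_name@source
    name: str
    uid: str
    groups: tuple = ()


class DictAuthSource:
    """Hypothetical in-memory stand-in for an LDAP or PKI auth source."""

    def __init__(self, users):
        # users maps a local name to (credential, display name, uid, groups)
        self._users = users

    def authenticate(self, local_name, credential, source_name):
        record = self._users.get(local_name)
        if record is None or record[0] != credential:
            return None
        _, name, uid, groups = record
        return User(f"{local_name}@{source_name}", name, uid, tuple(groups))


class AuthService:
    """Common interface: routes each login to the source named after the '@'."""

    def __init__(self):
        self._sources = {}

    def register(self, source_name, source):
        self._sources[source_name] = source

    def authenticate(self, username, credential):
        local_name, _, source_name = username.rpartition("@")
        source = self._sources.get(source_name)
        if source is None or not local_name:
            return None
        return source.authenticate(local_name, credential, source_name)


service = AuthService()
service.register("source1", DictAuthSource(
    {"jdoe": ("secret1", "Jane Doe", "ABCD-EFGH-IJKL", [1234, 5678])}))
service.register("source2", DictAuthSource(
    {"jsmith": ("secret2", "John Smith", "ZXYW-VUTS-RQPO", [9876, 5432])}))

jane = service.authenticate("jdoe@source1", "secret1")
john = service.authenticate("jsmith@source2", "secret2")
```

Because every user comes back in the same shape regardless of source, group-based access control can treat `jdoe@source1` and `jsmith@source2` uniformly.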

Page 7: Open Platform

Authentication Web Service

Prebuilt implementation for LDAP
– Compatible with Microsoft Active Directory

Implemented via SOAP-RPC
– Can be arbitrarily complex

Works seamlessly with Palantir Access Control Model
– ACLs can span authentication sources

Can be leveraged by other applications for authentication

Page 9: Open Platform

Information Extractors

Large repositories of unstructured text

Multiple information extractors have been run across the text

Provide different types of extraction
– Entities
– Relationships
– Metadata
– Geotagging

Siloed view of each entity extractor's output

Want to combine these views alongside structured data into one interface

Page 10: Open Platform

Entity Extractor SDK

Palantir provides excellent visualization and integration of entity extracted documents

Entity Extractor SDK provides common interface to all major extractors
– Command line interface
– SOA Web Service

Page 11: Open Platform

Entity Extractor SDK

Leverages DocXML format to represent data
– Can combine multiple extractor outputs into one representation
– See Palantir XML Formats Presentation for more information

Standard SOAP-RPC and XML allow custom implementations in any language on any platform

Open interface and format allow the platform to be leveraged by other applications

Page 13: Open Platform

Legacy Data Systems

Multiple stovepiped sources of information

No common schema

No common interface

No common access control

Want to provide common interface for analysis and data access

Page 14: Open Platform

XML APIs

Palantir XML provides a serialized form of the Palantir Object Model

Can exactly control the representation of data in Palantir
– Fine-grained access control
– Tracking of pedigree and lineage

See Palantir XML Formats Presentation for more information

Page 16: Open Platform

Rapidly Changing Data Sources

New data sources come online all the time

Want to easily integrate this content with existing data to discover new information

Palantir has a flexible user interface and backend data import utilities
– Easy to quickly map new datasets
– Handles popular unstructured document formats
– Rapidly transforms structured import sources
  • Flat files
  • Excel spreadsheets
  • Relational databases
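
As a rough idea of what mapping a new flat-file dataset can look like, the sketch below turns CSV columns into property values. It is not Palantir's import tooling: `COLUMN_TO_PROPERTY`, `map_rows` and the sample columns are hypothetical, and only the property type names come from examples used elsewhere in this talk.

```python
import csv
import io

# Hypothetical mapping from source column names to property type names.
COLUMN_TO_PROPERTY = {
    "full_name": "Name",
    "country": "Nationality",
    "employer": "Organization Name",
}


def map_rows(flat_file, mapping):
    """Yield one {property type: raw value} dict per row, skipping blank cells."""
    for row in csv.DictReader(flat_file):
        mapped = {}
        for column, prop in mapping.items():
            value = (row.get(column) or "").strip()
            if value:
                mapped[prop] = value
        yield mapped


sample = io.StringIO(
    "full_name,country,employer\n"
    "John Smith,AT,Example Corp\n"
    "Jane Doe,,\n"
)
objects = list(map_rows(sample, COLUMN_TO_PROPERTY))
```

Supporting another dataset then means writing a new mapping table rather than new parsing code.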

Page 17: Open Platform

Data Quality

This looks great, but…

The quality of analysis is only as good as the data that goes into it

Palantir handles dirty data
– Attempts to parse and validate attribute values
– Unparseable, incomplete or invalid data is still allowed and indexed but does not clutter the system

Data parsing and validation framework is extensible through the Palantir Ontology APIs

Page 18: Open Platform

Object Model

Page 19: Open Platform

Property Ontology APIs

The Palantir Ontology APIs enable developers to extend the functionality of the Property Ontology

Page 20: Open Platform

Structure of a Property

Two types of property values are supported in Palantir
– Simple
  • Used for single, unparsed values
  • e.g. Nationality, Organization Name
– Composite
  • Used for values composed of discrete, semantic units
  • e.g. Name (first & last), Address (city, state, zip, etc.)

Page 21: Open Platform

Lifecycle of a Property

Extract
– Extracted Value: raw property value

Transform
– Maker: transforms raw string
– Validator: validate components
– Approx Gen: generate approxes

Load
– Palantir Data Store: store to database and index
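
The lifecycle above can be read as a small pipeline. The sketch below wires a maker, a validator and approx generators together for a made-up phone number property. Every name in it (`run_lifecycle`, `StoredProperty`, the phone helpers) is hypothetical, and keeping invalid values with a flag is one reading of the Data Quality slide, not a description of the actual Ontology APIs.

```python
from dataclasses import dataclass, field


@dataclass
class StoredProperty:
    raw: str
    components: dict
    valid: bool
    approxes: list = field(default_factory=list)


def run_lifecycle(raw, maker, validator, approx_generators, store):
    """Maker -> Validator -> Approx Gen -> store, keeping invalid values."""
    components = maker(raw)                      # transform the raw string
    valid = validator(components)                # validate the components
    approxes = [a for a in (gen(components) for gen in approx_generators) if a]
    prop = StoredProperty(raw, components, valid, approxes)
    store.append(prop)                           # invalid values are stored, only flagged
    return prop


# Hypothetical phone number property used only to exercise the pipeline.
def phone_maker(raw):
    return {"digits": "".join(ch for ch in raw if ch in "0123456789")}


def phone_validator(components):
    return 7 <= len(components["digits"]) <= 15


def last_four(components):
    return components["digits"][-4:]


data_store = []
good = run_lifecycle("+1 (555) 010-1234", phone_maker, phone_validator, [last_four], data_store)
bad = run_lifecycle("n/a", phone_maker, phone_validator, [last_four], data_store)
```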

Page 22: Open Platform

Property Maker

Data parsing interface

Transform tool that can be leveraged by both XML APIs and standard import interface

In: String with value “John Smith”

Out: Name Property with
– First Name: John
– Last Name: Smith
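
A maker for this In/Out example could be as small as the sketch below. `NameProperty` and `make_name` are hypothetical names, and the rule "last token is the last name" is an assumption that real name data will often break.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NameProperty:
    first_name: str
    last_name: str


def make_name(raw: str) -> Optional[NameProperty]:
    """Split a raw name on whitespace; return None when it cannot be parsed."""
    parts = raw.split()
    if len(parts) < 2:
        return None
    # Assumption: the final token is the last name, everything before it the first name.
    return NameProperty(first_name=" ".join(parts[:-1]), last_name=parts[-1])


name = make_name("John Smith")
```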

Page 23: Open Platform

Property Parser API

In: Lindengasse 24-9, A-1020 Vienna

Out: Address Property with
– Address 1: Lindengasse 24-9
– City: Vienna
– Postal Code: 1020
– Country: AT
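
One way to get from the input string to these components is a regular expression that expects "street, PREFIX-postcode city". The sketch below is an assumption-laden illustration, not the Property Parser API: `parse_address`, `ADDRESS_PATTERN` and the prefix table are hypothetical, and only the `A` to `AT` mapping is taken from the slide.

```python
import re

# Country prefix before the postal code. "A" -> "AT" follows the slide;
# "D" -> "DE" is an extra illustrative entry.
PREFIX_TO_COUNTRY = {"A": "AT", "D": "DE"}

ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<address1>[^,]+?)\s*,\s*"
    r"(?P<prefix>[A-Z]{1,3})-(?P<postal>\d{4,5})\s+"
    r"(?P<city>.+?)\s*$"
)


def parse_address(raw):
    """Return the address components as a dict, or None if the value does not parse."""
    match = ADDRESS_PATTERN.match(raw)
    if match is None:
        return None
    country = PREFIX_TO_COUNTRY.get(match.group("prefix"))
    if country is None:
        return None
    return {
        "Address 1": match.group("address1"),
        "City": match.group("city"),
        "Postal Code": match.group("postal"),
        "Country": country,
    }


address = parse_address("Lindengasse 24-9, A-1020 Vienna")
```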

Page 24: Open Platform

Property Validator

Data validation interface

Presents notification to the user if the property does not pass validation

Passport Machine Readable Zone
– Encoding of passport information in standardized form
– Includes checksum after each field

Validator can verify checksum digits

P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<
L898902C<3UTO6908061F9406236ZE184226B<<<<<14
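
The checksum verification can be illustrated with the check digit scheme from ICAO Doc 9303: characters are weighted 7, 3, 1 repeating, digits count as themselves, A to Z as 10 to 35, the filler `<` as 0, and the check digit is the weighted sum modulo 10. The sketch below applies it to the second line of the sample passport above. The function names are hypothetical and this is not the Palantir validator itself.

```python
def char_value(ch):
    """Numeric value of one MRZ character under ICAO Doc 9303."""
    if ch in "0123456789":
        return int(ch)
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10
    if ch == "<":
        return 0
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field):
    """Weighted sum with weights 7, 3, 1 repeating, modulo 10."""
    weights = (7, 3, 1)
    return sum(char_value(ch) * weights[i % 3] for i, ch in enumerate(field)) % 10


def validate_line2(line2):
    """Check every check digit on the lower line of a 2 x 44 passport MRZ."""
    if len(line2) != 44:
        raise ValueError("expected a 44-character MRZ line")
    fields = {
        "document number": (line2[0:9], line2[9]),
        "date of birth": (line2[13:19], line2[19]),
        "date of expiry": (line2[21:27], line2[27]),
        "personal number": (line2[28:42], line2[42]),
        # The composite digit covers the fields above plus their check digits.
        "composite": (line2[0:10] + line2[13:20] + line2[21:43], line2[43]),
    }
    return {
        name: check_digit(value) == char_value(digit)
        for name, (value, digit) in fields.items()
    }


SAMPLE_LINE2 = "L898902C<3UTO6908061F9406236ZE184226B<<<<<14"
results = validate_line2(SAMPLE_LINE2)
```

Returning a per-field result rather than a single boolean lets the caller tell the user which field failed.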

Page 25: Open Platform

Approx Generator

Fuzzy searching interface

Property can support multiple types of approxes

Approxes are indexed for fast searching

فكري:محمد

Example: Arabic Name Normalization

Transliterate Name to Arabic
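
To make the idea of indexed approxes concrete, the sketch below generates two kinds of approximate keys for a value and indexes both. It deliberately swaps in a much simpler technique than the Arabic transliteration on the slide: accent and case folding plus a crude consonant skeleton for Latin-script spellings. All names (`fold`, `skeleton`, `ApproxIndex`) are hypothetical.

```python
import unicodedata
from collections import defaultdict


def fold(value):
    """Approx 1: strip accents and case."""
    decomposed = unicodedata.normalize("NFKD", value)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def skeleton(value):
    """Approx 2: consonants only, with repeated consonants collapsed."""
    out = []
    for ch in fold(value):
        if ch.isalpha() and ch not in "aeiou" and (not out or out[-1] != ch):
            out.append(ch)
    return "".join(out)


APPROX_GENERATORS = {"folded": fold, "skeleton": skeleton}


class ApproxIndex:
    """Maps (approx type, approx key) to the original values that produced it."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, value):
        for kind, generate in APPROX_GENERATORS.items():
            self._index[(kind, generate(value))].add(value)

    def search(self, query):
        hits = set()
        for kind, generate in APPROX_GENERATORS.items():
            hits |= self._index.get((kind, generate(query)), set())
        return hits


index = ApproxIndex()
for stored in ("Muhammad", "Mohammed", "Fikri"):
    index.add(stored)

matches = index.search("mohamed")
```

Approx keys are computed once at index time, so a fuzzy search is a dictionary lookup per approx type instead of a scan over all stored values.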

Page 26: Open Platform

ETL Tools

The Palantir Ontology APIs allow you to customize Palantir’s data handling

Extensions are leveraged across all imports
– Data integration without a complex ETL toolchain

Works with manually entered or tagged data as well

Page 27: Open Platform

Palantir Extensibility

I have all this data in Palantir, now what?

I need to extract the information for some other tool

I need to present the information to the user in a different way

Client Connection API allows for all these operations

Page 28: Open Platform

Client Connection API

Used by Palantir Workspace

Proxies all requests to Dispatch

Written in Java

Get started coding in 5 minutes

Provides abstraction for
– Object Model
– Revisioning DB
– Access Control Model

Dispatch Server

Spring HTTP RPC

Client

Page 29: Open Platform

SOA Data Interface

Client Connection API used to provide SOA WS interface to Palantir

Examples
– Searching
  • Works across all data sources
  • SearchQuery and SearchAroundQuery classes
– Entity Extractor tuning
  • Retrieve manually edited tagging to train entity extractor
  • getAppEventObjectInfo for begin and end date
– Revisioning database
  • Extract history of changes to objects
  • DBEvent class

Page 30: Open Platform

Custom Presentation

Simple access to data for custom presentation

Searching and storing objects requires a few API calls

Examples
– Data entry forms
  • Standard border crossing forms
  • createBlankObject, Property.attemptToCreate
– Report generation
  • Report on changes in activity
  • SearchQuery and HGBin class
– Thin client graph presentation
  • Transform graph to HTML
  • Graph class

Page 31: Open Platform

Client Connection API

Provides simple and powerful access to Palantir data

Functionality of application plus more

Complete web-based viewer application written in under 6 hours

Page 32: Open Platform

Summary

Palantir integrates with and becomes a part of your infrastructure

Can unify your authentication, information extraction and data resources in one environment

Provides a rich platform that can be leveraged in other projects
