Actionable Intelligence From Unstructured Data using MDA

Informational Primer: Actionable Intelligence from Unstructured Data
ADA SOFTWARE - The automated software modernization company
379 THORNALL STREET, WEST TOWER - 7TH FL, METROPARK, NJ 08837
www.adasoftusa.com
Call 888.453.0014


Data is everywhere, but far too often it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not providing companies with the information base they need to make the right decisions at the right time, because all this unstructured data is not actionable intelligence. As a result, although we are awash with data, we make uninformed decisions based on the very small slice of information that is readily available to us. This white paper explores a solution strategy.


Software Modernization. It’s all we do!!! PAGE 1 OF 7

SOFTWARE MODERNIZATION - POWERED BY MODELING



EXECUTIVE SUMMARY

Data is everywhere, but far too often it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not providing companies with the information base they need to make the right decisions at the right time, because all this unstructured data is not actionable intelligence. As a result, although we are awash with data, we make uninformed decisions based on the very small slice of information that is readily available to us. Figure-1 shows how the Information Framework stands broken.

[Fig - 1: The broken Information Framework]

Worse still, this underutilized deluge of unstructured data is actually causing companies to lose money. IDC estimated in its report titled “The High Cost of Not Finding Information” (IDC #29127) that companies with 1,000 white-collar employees typically wasted in excess of $6 million per year searching for information and not finding it. Add to this the lost revenues caused by unproductive employee time.

The potential loss from unstructured data is, therefore, multi-faceted and consists of:

Uninformed decisions

Overlooked risks

Loss of employee time

Loss of opportunity

Loss of revenues

All of these can be addressed by our metamodel-driven information management solution, which can turn all this unstructured data into rich, actionable intelligence.


UNDERSTANDING “UNSTRUCTURE”

“Unstructured data” is not really unstructured. Take the example of a paper magazine. It has a wonderful structure. The Table of Contents offers an instant overview of the entire magazine and provides a usable index that we can use to jump to any article by page number. Within articles, there are pictures to help us visualize the information contained in the text; there are bolded headings that tell us what a section of text is about; there are blurbs (information callouts) that highlight some of the main points of the article; there might be an abstract providing the gist of the whole article; and there may be footnotes, citations and references that link the ideas expressed with a world of information outside the magazine. There are advertisements, which we immediately recognize as advertisements. There is information about the editorial team, the company publishing the magazine and the authors of the various articles.

An e-mail tells you who wrote it; when it was written; to whom it was addressed; who received a copy; what it was about (the Subject); the main body of the message; and any reference material provided as an attachment or a URL.
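As a rough illustration of how much of this structure a machine can already read, the sketch below pulls those fields out of a raw message using Python's standard `email` package. The sample message, the addresses and the `extract_email_metadata` helper are all hypothetical and are not part of our solution; the URL detection is deliberately naive.

```python
from email import message_from_string
from email.utils import getaddresses, parsedate_to_datetime

# Hypothetical sample message, invented purely for illustration
RAW_EMAIL = """\
From: Alice Example <alice@example.com>
To: Bob Example <bob@example.com>
Cc: carol@example.com
Subject: Q3 planning notes
Date: Mon, 02 Jun 2008 09:30:00 -0400

Notes attached. Background reading: http://example.com/q3-plan
"""

def extract_email_metadata(raw: str) -> dict:
    """Turn the headers and body of a single-part e-mail into a metadata record."""
    msg = message_from_string(raw)
    body = msg.get_payload()  # a plain string for a non-multipart message
    return {
        "author": getaddresses(msg.get_all("From", []))[0][1],
        "written_at": parsedate_to_datetime(msg["Date"]),
        "addressed_to": [addr for _, addr in getaddresses(msg.get_all("To", []))],
        "copied_to": [addr for _, addr in getaddresses(msg.get_all("Cc", []))],
        "subject": msg["Subject"],
        "body": body,
        # Naive URL detection: any whitespace-delimited word that starts with a scheme
        "urls": [w for w in body.split() if w.startswith(("http://", "https://"))],
    }

metadata = extract_email_metadata(RAW_EMAIL)
```

The headers give us author, recipients, date and subject for free; it is the body and the attachments that need the heavier machinery described later in this primer.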

So there is, indeed, a lot of structure in what we started out identifying as “unstructured data”.

The problem is not with the data. The problem is that a machine does not understand this structure automatically unless we find a way of adding machine-readable information to all this data.

SOLUTIONS STRATEGY

The key to unleashing knowledge from all this powerful, but untapped, information lies in being able to:

Generate the right METADATA (data about the unstructured data) that a machine can understand.

CATEGORIZE the data using an easily understood VOCABULARY, and a TAXONOMY that indicates the data hierarchy and relationships.

Provide a KNOWLEDGE RETRIEVAL mechanism that understands all of the above.

APPLYING OMG STANDARDS

The Object Management Group (OMG) has modeling standards embodied in Model Driven Architecture (MDA) that can be utilized for modeling any kind of information (though MDA was originally intended for modeling and understanding software systems). The Knowledge Discovery Metamodel (KDM), for instance, separates knowledge about existing systems into four orthogonal dimensions: Structure, Behavior, Data and User Interface. Unlike software, data has no behavior, but it has associative fact patterns. So we utilize a modified version of the KDM concept, adapted for understanding unstructured data, which we call mKDM.

OMG also has an initiative called the Semantics of Business Vocabulary and Business Rules (SBVR), a standard for establishing a business vocabulary and terminology system that can be used to express business models. This is very useful for defining vocabularies to understand unstructured data, taxonomies to categorize unstructured data, and rules for processing unstructured data.

Coupled together, mKDM and SBVR provide the base technology for creating metadata; defining the vocabularies, taxonomies and rules for processing the data; and retrieving useful information based on linked entities as well as “inferred” fact patterns. This helps convert unstructured data into actionable intelligence.
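To make the roles of a vocabulary and a taxonomy concrete, here is a minimal sketch in plain Python. It does not use SBVR notation or any OMG tooling; the `Taxonomy` class, the sample terms and the category names are assumptions invented for illustration only.

```python
from dataclasses import dataclass, field

@dataclass
class Taxonomy:
    """Category hierarchy: each category maps to its parent (None for a root)."""
    parent: dict = field(default_factory=dict)

    def add(self, category, parent=None):
        self.parent[category] = parent

    def ancestors(self, category):
        """Return the chain of broader categories, nearest first."""
        chain = []
        current = self.parent.get(category)
        while current is not None:
            chain.append(current)
            current = self.parent.get(current)
        return chain

# Hypothetical stockbroker vocabulary: surface term -> category
VOCABULARY = {"bond": "fixed_income", "coupon": "fixed_income", "ipo": "equities"}

taxonomy = Taxonomy()
taxonomy.add("securities")
taxonomy.add("fixed_income", parent="securities")
taxonomy.add("equities", parent="securities")

def categories_for(term):
    """Map a term to its category and every broader category above it."""
    category = VOCABULARY.get(term.lower())
    if category is None:
        return []
    return [category] + taxonomy.ancestors(category)
```

With these definitions, `categories_for("Bond")` yields the term's own category followed by its broader parent, which is the kind of hierarchy and relationship information a taxonomy is meant to carry.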

PARTS OF THE SYSTEM

SCANNERS AND PARSERS

Scanners and parsers will process the unstructured data with reference to the Knowledge Modeling Standards of OMG, and produce symbol tables and syntax trees. The result will be an Abstract Syntax Tree Metamodel representing the unstructured data.
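The idea of a scanner feeding a parser that emits a syntax tree and a symbol table can be sketched in a few lines. This toy version is only an assumption about the general shape of such a component: it splits on blank lines and sentence punctuation, and its dictionary-based tree is not the OMG Abstract Syntax Tree Metamodel.

```python
import re
from collections import defaultdict

TOKEN = re.compile(r"[A-Za-z0-9']+")

def scan(text):
    """Scanner: yield (lower-cased token, character offset) pairs."""
    for match in TOKEN.finditer(text):
        yield match.group().lower(), match.start()

def parse(text):
    """Parser: build a document -> paragraph -> sentence tree plus a symbol table."""
    tree = {"type": "document", "children": []}
    symbols = defaultdict(list)  # token -> list of (paragraph index, sentence index)
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for p_index, paragraph in enumerate(paragraphs):
        p_node = {"type": "paragraph", "children": []}
        # Split after sentence-ending punctuation followed by whitespace
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
        for s_index, sentence in enumerate(sentences):
            tokens = [tok for tok, _ in scan(sentence)]
            p_node["children"].append({"type": "sentence", "tokens": tokens})
            for tok in tokens:
                symbols[tok].append((p_index, s_index))
        tree["children"].append(p_node)
    return tree, dict(symbols)

tree, symbols = parse("Acme filed a report. The report was late.\n\nAcme apologized.")
```

The symbol table records where each term occurs, so later stages can look up every sentence that mentions, say, "report" without rescanning the text.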

AUTOMATIC CATEGORIZERS

Automatic categorizers will act on the metadata and perform the following functions:

Linguistic analysis

Statistical inference

Machine learning

Rule-based processing

These will obtain the relevant vocabularies, taxonomies and rules from the Semantics of Business Vocabulary and Business Rules (SBVR) that is part of our reference Knowledge Modeling Standard. The SBVR will provide the business vocabulary necessary to do this job properly. For instance, the business vocabulary relevant to a stockbroker will be far different from the one relevant to a law firm. A document may be assigned to more than one category.

The output of the automatic categorizers will be a Metadata Repository, together with catalogs, fact patterns and indexes.
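Of the four functions listed above, rule-based processing is the simplest to sketch. The example below shows only that one function, under stated assumptions: the `DOMAIN_RULES` table, its category names and its terms are hypothetical stand-ins for vocabularies that would really come from SBVR, and linguistic analysis, statistical inference and machine learning are not shown.

```python
# Hypothetical per-domain vocabularies: category -> terms that signal it
DOMAIN_RULES = {
    "stockbroker": {
        "equities": {"share", "shares", "stock", "dividend"},
        "compliance": {"sec", "filing", "disclosure"},
    },
    "law_firm": {
        "litigation": {"plaintiff", "defendant", "discovery"},
        "compliance": {"filing", "disclosure", "regulation"},
    },
}

def categorize(text, domain):
    """Return every category whose vocabulary overlaps the document's words, sorted."""
    words = {w.strip(".,;:!?\"'()").lower() for w in text.split()}
    rules = DOMAIN_RULES[domain]
    return sorted(cat for cat, terms in rules.items() if words & terms)

doc = "The SEC filing discloses a dividend on preferred shares."
stockbroker_categories = categorize(doc, "stockbroker")
law_firm_categories = categorize(doc, "law_firm")
```

The same document lands in two categories under the stockbroker vocabulary but only one under the law-firm vocabulary, which illustrates both points made above: vocabularies are domain-specific, and a document can belong to several categories at once.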

KNOWLEDGE RETRIEVAL ENGINE

Regardless of whether the user is searching, browsing, or seeking information through a web service or an API, the actual retrieval will be performed by a Knowledge Retrieval Engine. It has to scan and parse the “request for information” with reference to the same vocabularies, taxonomies and rules in the SBVR that were used by the automatic categorizers.

It will then retrieve two kinds of information:


1. ENTITY EXTRACTION: Focus on identifying named entities.

2. FACT EXTRACTION: Focus on fact patterns and detecting relationships between data using “inference”.

The retrieved information will be focused and relevant to the “request for information”. It will be actionable intelligence.
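The two kinds of retrieval can be illustrated with a deliberately small sketch. Everything here is an assumption made for the example: the entity list is a hypothetical gazetteer, the only fact pattern recognized is "X reports to Y", and "inference" is reduced to a transitive closure over those stated facts.

```python
import re

KNOWN_ENTITIES = {"Alice", "Bob", "Carol"}  # hypothetical gazetteer of named entities
FACT_PATTERN = re.compile(r"(\w+) reports to (\w+)")

def extract_entities(text):
    """Entity extraction: return the known named entities mentioned in the text."""
    return {word for word in re.findall(r"\w+", text) if word in KNOWN_ENTITIES}

def extract_facts(text):
    """Fact extraction: return (stated facts, facts inferred from them)."""
    stated = set(FACT_PATTERN.findall(text))
    closure = set(stated)
    changed = True
    while changed:  # transitive closure: A->B and B->C imply A->C
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return stated, closure - stated

text = "Alice reports to Bob. Bob reports to Carol."
entities = extract_entities(text)
stated, inferred = extract_facts(text)
```

Nothing in the text says that Alice reports to Carol, yet the relationship appears in `inferred`. That is the sense in which fact extraction goes beyond matching what is literally written.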

PACKAGING & DELIVERY ENGINE

The retrieved information has to be packaged and delivered to the seeker of information using the right channel. The request can come from one of many channels, such as interactive search, interactive browsing, web services, or well-defined APIs. The results are pushed back through the same channel. There is also more than one way of representing the results: textually or visually, through spatial diagrams or mind maps.
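One simple way to picture "same channel in, same channel out" is a lookup from channel to renderer. The sketch below is an assumption about how such dispatch might look; the channel names, the request shape and the two renderers are hypothetical, and visual representations such as mind maps are not covered.

```python
import json

def render_text(results):
    """Textual representation for interactive users."""
    return "\n".join(f"{r['title']}: {r['summary']}" for r in results)

def render_json(results):
    """Machine-readable representation for programmatic callers."""
    return json.dumps({"results": results}, sort_keys=True)

# Hypothetical channel names mapped to the representation each one expects
CHANNELS = {
    "search": render_text,
    "browse": render_text,
    "web_service": render_json,
    "api": render_json,
}

def deliver(request, results):
    """Package results and return them on the channel the request arrived on."""
    renderer = CHANNELS[request["channel"]]
    return {"channel": request["channel"], "payload": renderer(results)}

results = [{"title": "Q3 plan", "summary": "Revenue targets by region"}]
search_reply = deliver({"channel": "search"}, results)
api_reply = deliver({"channel": "api"}, results)
```

Keeping the renderers separate from retrieval means a new channel or a new representation can be added without touching the Knowledge Retrieval Engine.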

Figure-2 is a schematic representing the methodology.

PRACTICAL APPLICATIONS

Apart from the holistic application of this methodology across an enterprise for rich productivity gains, greater revenue and informed decisions, this methodology also has many smaller practical applications on limited sets of data.

[Fig - 2: Schematic of the methodology]

E-DISCOVERY FROM EMAILS

Email has become the standard for both internal and external communication. A company's email contains important, and sometimes confidential, information that is today increasingly going into massive e-mail archives, whether to comply with mandatory government regulations or for information archival.

E-discovery refers to discovery in civil litigation that deals with information in electronic format, also referred to as Electronically Stored Information (ESI). Emails can be a prime source of information in civil litigation.

Financial and other firms subject to Sarbanes-Oxley regulatory compliance need effective e-discovery mechanisms for their e-mail archives and other documents.

Our solution can help you implement a powerful information retrieval mechanism for email archives, resulting in the following capabilities, and more:

Advanced search capabilities to find specific records within your complete and secure archive.

Locate and produce evidence-quality messages with metadata in seconds.

Analyze a complete audit trail for every message.

Review and classify every message (based on your company's rules and permissions) that leaves or enters your organization's domains.

Classify messages easily for legal hold when court or counsel requests that all data relevant to a particular case be preserved.

E-mail analytics designed to be utilized in complex litigation or investigative matters.

Search for and identify key individuals and assess their relationships and communication patterns. The Activity Schematic in Figure-3 displays communication patterns with a key individual placed in the center, and e-mail correspondents connected with radial spokes.

Timeline View can be produced as a horizontal timeline to help assess critical time periods in the matter under investigation.

E-mail Analytics help you to easily identify communications of the key players for substantive review.

[Fig - 3: Activity Schematic of e-mail communication patterns around a key individual]

Powerful e-mail analytics can provide never-before-discovered intelligence from plain company emails. We can transform your email archive into rich actionable intelligence.
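The data behind an activity schematic and a timeline view is straightforward to derive once sender, recipient and date metadata exist. The sketch below assumes a hypothetical archive of (sender, recipient, date) records; the addresses, the dates and the `spokes` and `timeline` helpers are invented for illustration, and no drawing is attempted.

```python
from collections import Counter
from datetime import date

# Hypothetical archive records: (sender, recipient, date sent)
MESSAGES = [
    ("ann@example.com", "raj@example.com", date(2008, 3, 3)),
    ("raj@example.com", "ann@example.com", date(2008, 3, 4)),
    ("ann@example.com", "lee@example.com", date(2008, 3, 4)),
    ("lee@example.com", "raj@example.com", date(2008, 3, 9)),
]

def spokes(messages, key_person):
    """Count messages exchanged between key_person and each correspondent."""
    counts = Counter()
    for sender, recipient, _ in messages:
        if sender == key_person:
            counts[recipient] += 1
        elif recipient == key_person:
            counts[sender] += 1
    return counts

def timeline(messages, key_person):
    """Messages involving key_person per day, in chronological order."""
    per_day = Counter(sent for s, r, sent in messages if key_person in (s, r))
    return sorted(per_day.items())

ann_spokes = spokes(MESSAGES, "ann@example.com")
ann_timeline = timeline(MESSAGES, "ann@example.com")
```

Each entry in `ann_spokes` corresponds to one radial spoke, with the count available as a line weight; `ann_timeline` gives the per-day volumes a horizontal timeline would plot.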

OTHER REGULATORY COMPLIANCE

Regulatory compliance also weighs down life science companies (pharmaceutical, biotech and medical device companies). FDA regulations pertaining to clinical trials, manufacturing processes and drug discovery require similar diligence in preserving “evidence” for a stipulated period of time. Such evidence is also contained in unstructured data items like e-mails and documents. Our methodology equips the company and auditors with reliable and quick e-discovery processes, apart from other proactive compliance monitoring functions of interest to the company.

RESEARCH AND DEVELOPMENT

Pharmaceutical companies engaged in drug development, for instance, can benefit from every bit of better intelligence and every minute of human effort saved. Drug development activities for a single product can span over ten years and involve collaboration among a wide range of professionals such as research scientists, pharmacologists, chemists, biologists, chemical engineers, production floor specialists, clinical trial units and others. The information flowing among these diverse entities, located in diverse geographical locations, has a very large share of “unstructured” data.

LAW FIRMS

Law firms try to make sense of unstructured data every single minute of their existence. With the expanding Internet-driven universe, making sense of information overload and using the results meaningfully for their clients’ benefit is an ever-expanding challenge.

CONTENT PUBLISHERS

Companies engaged in any kind of publishing, especially delivery of content over the Internet, are competing for differentiation in search capability.

Content metadata is most important, as it is indispensable for setting up the catalogs, fact patterns and indexes that, in turn, translate into accuracy of the information delivered.

INTELLIGENCE & LAW ENFORCEMENT

Especially in this age of rampant terrorism, proactive prevention of crimes is a top priority. The huge world of “unstructured” data constantly evolving on the Internet is a rich source of intelligence and alerts, but it is too vast for manual processing or informal methods.

A methodology such as ours can effectively harness and deliver untold value from the Internet.

The Lifeblood of the Enterprise is information. The information economy thrives and survives on information.


When one needs a heart bypass, one goes to a cardiac surgeon.

When one needs the best storage solutions, one goes to EMC, the storage specialists.

Why would you go to Accenture, Cap Gemini, Infosys or Wipro for software modernization?

WE ARE THE SOFTWARE MODERNIZATION SPECIALISTS. IT IS ALL WE DO.
