Actionable Intelligence From Unstructured Data using MDA
Uploaded by: probal-dasgupta
Category: Business
Software Modernization. It’s all we do!!! PAGE 1 OF 7
SOFTWARE MODERNIZATION - POWERED BY MODELING
www.adasoftusa.com
379 THORNALL STREET, WEST TOWER - 7TH FL, METROPARK, NJ 08837
Call
888.453.0014
Informational Primer
Actionable Intelligence from Unstructured Data
ADA SOFTWARE
The automated software modernization company
EXECUTIVE SUMMARY
Data is everywhere but, far too often, it is not the information we need. Businesses continue to generate a huge volume of memos, reports, minutes of meetings, planning documents, proposals, emails, website content, blogs, wikis and other content. But this wealth of data is not giving companies the information base they need to make the right decisions when they need to make them, because unstructured data, as it stands, is not actionable intelligence. As a result, although we are awash with data, we make uninformed decisions based on the very small slice of that information that is readily available to us. Figure 1 shows how the information framework is broken.
Fig - 1
Worse still, this underutilized deluge of unstructured data is actually causing companies to lose money. IDC estimated in its report titled “The High Cost of Not Finding Information” (IDC #29127) that companies with 1,000 white-collar employees typically wasted in excess of $6 million per year searching for information and not finding it. Add to this the revenue lost to unproductive employee time.
The potential loss from unstructured data is, therefore, multi-faceted and consists of:
Uninformed decisions
Overlooked risks
Loss of employee time
Loss of opportunity
Loss of revenues
All of these can be addressed by our metamodel-driven information management solution, which turns this unstructured data into rich, actionable intelligence.
UNDERSTANDING “UNSTRUCTURE”
“Unstructured data” is not really unstructured. Take the example of a paper magazine. It has a wonderful structure. The Table of Contents offers an instant overview of the entire magazine and provides a usable index that we can use to jump to any article by page number. Within articles, there are pictures to help us visualize the information contained in the text; there are bold headings that tell us what a section of text is talking about; there are blurbs (information callouts) that highlight some of the main points of the article; there might be an abstract providing the gist of the whole article; there may be footnotes, citations and references that link the ideas expressed to a world of information outside the magazine. There are advertisements, which we immediately recognize as advertisements. There is information about the editorial team, the company publishing the magazine and the authors of the various articles.
An e-mail gives you all the information about who wrote it; when it was written; to whom it was addressed; who received a copy; what it was about (the Subject); the main body of the message; and any reference material provided as an attachment or a URL.
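To make this concrete, here is a minimal Python sketch that uses the standard-library `email` package to pull each of those structural elements out of a raw message. The sample message, its addresses and its URL are hypothetical, and the URL pattern is a simplifying assumption rather than a full URL grammar.

```python
import re
from email import message_from_string
from email.policy import default

# Hypothetical sample message, used only for illustration.
RAW = """\
From: Alice Smith <alice@example.com>
To: Bob Jones <bob@example.com>
Cc: Carol White <carol@example.com>
Date: Mon, 02 Mar 2009 10:15:00 -0500
Subject: Q1 planning memo
Content-Type: text/plain; charset="utf-8"

Please review the draft plan at http://intranet.example.com/q1-plan before Friday.
"""

# Deliberately simple URL pattern (an assumption, not a full URL grammar).
URL_PATTERN = re.compile(r"https?://\S+")


def email_structure(raw: str) -> dict:
    """Return the structural fields an e-mail already carries in its headers and body."""
    msg = message_from_string(raw, policy=default)
    body_part = msg.get_body(preferencelist=("plain",))
    body = body_part.get_content() if body_part is not None else ""
    return {
        "author": str(msg["From"]),                                   # who wrote it
        "written": msg["Date"].datetime.isoformat(),                  # when it was written
        "addressed_to": [str(a) for a in msg["To"].addresses],        # to whom
        "copied_to": [str(a) for a in msg["Cc"].addresses] if msg["Cc"] else [],
        "subject": str(msg["Subject"]),                               # what it was about
        "body": body.strip(),                                         # main body
        "attachments": [part.get_filename() for part in msg.iter_attachments()],
        "urls": URL_PATTERN.findall(body),                            # referenced URLs
    }


fields = email_structure(RAW)
print(fields["subject"])  # Q1 planning memo
```

Every field comes straight from the message format itself, which is the point: the structure is already there, waiting to be read by a machine.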
So there is, indeed, a lot of structure in what we started out identifying as “unstructured data”.
The problem is not with the data. The
problem is that a machine does not understand
this structure automatically unless we find a way
of adding machine-readable information to all this
data.
SOLUTIONS STRATEGY
The key to unleashing knowledge from all this powerful, but untapped, information lies in being able to:
Generate the right METADATA (data about the unstructured data) that a machine can understand;
CATEGORIZE the data using an easily understood VOCABULARY and a TAXONOMY that indicates the data hierarchy and relationships;
Provide a KNOWLEDGE RETRIEVAL mechanism that understands all of the above.
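The three capabilities above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the product's implementation: the taxonomy, the vocabulary and the `Document`, `generate_metadata`, `ancestors` and `retrieve` names are all hypothetical, and metadata generation is reduced to exact word matching.

```python
from dataclasses import dataclass, field

# Hypothetical taxonomy: each category maps to its parent (None marks a root).
TAXONOMY = {
    "finance": None,
    "equities": "finance",
    "derivatives": "finance",
    "legal": None,
    "litigation": "legal",
}

# Hypothetical vocabulary: business terms mapped to the taxonomy category they indicate.
VOCABULARY = {
    "stock": "equities",
    "share": "equities",
    "option": "derivatives",
    "futures": "derivatives",
    "lawsuit": "litigation",
    "subpoena": "litigation",
}


@dataclass
class Document:
    doc_id: str
    text: str
    metadata: dict = field(default_factory=dict)  # machine-readable "data about the data"


def generate_metadata(doc: Document) -> None:
    """METADATA + CATEGORIZE: record vocabulary terms found and the categories they imply."""
    words = {w.strip(".,;:").lower() for w in doc.text.split()}
    terms = sorted(words & set(VOCABULARY))
    doc.metadata["terms"] = terms
    doc.metadata["categories"] = sorted({VOCABULARY[t] for t in terms})


def ancestors(category):
    """Walk up the taxonomy, so a category also counts as filed under its parents."""
    chain = []
    while category is not None:
        chain.append(category)
        category = TAXONOMY[category]
    return chain


def retrieve(docs, category):
    """KNOWLEDGE RETRIEVAL: ids of documents filed under `category` or any sub-category."""
    return [
        d.doc_id
        for d in docs
        if any(category in ancestors(c) for c in d.metadata.get("categories", []))
    ]


docs = [
    Document("memo-1", "Client wants to sell the stock and buy an option."),
    Document("memo-2", "We received a subpoena in the lawsuit."),
]
for d in docs:
    generate_metadata(d)
print(retrieve(docs, "finance"))  # ['memo-1']
```

A memo mentioning a stock and an option is filed under `equities` and `derivatives`, and the taxonomy lets a request for `finance` find it even though that word never appears in the text.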
APPLYING OMG STANDARDS
The Object Management Group (OMG) has modeling standards, embodied in Model Driven Architecture (MDA), that can be used to model any kind of information, even though they were originally intended for modeling and understanding software systems. The Knowledge Discovery Metamodel (KDM), for instance, separates knowledge about existing systems into four orthogonal dimensions: Structure, Behavior, Data and User Interface. Unlike software, data has no behavior, but it does have associative fact patterns. So we use a modified version of the KDM concept, adapted for understanding unstructured data, which we call mKDM.
Make Unstructured Data Come Alive
“Unstructured” data might have an excellent structure of its own, one that computers do not understand.
OMG also has an initiative called the Semantics of Business Vocabulary and Rules (SBVR), a standard for establishing a business vocabulary and terminology system that can be used to express business models. This is very useful for defining vocabularies to understand unstructured data, taxonomies to categorize unstructured data, and rules for processing unstructured data.
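A minimal Python sketch of how such a vocabulary and rule set might be held in memory follows. It is not SBVR's metamodel or its interchange format: the concept names, their synonyms, the `Rule` structure and the sample rule are all hypothetical, and only the "It is obligatory that each ..." wording borrows from SBVR's structured-English style.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical vocabulary: one noun concept may have several designations (terms),
# loosely echoing SBVR's separation of a concept from the words that name it.
CONCEPTS = {
    "client": {"client", "customer", "account holder"},
    "trade": {"trade", "transaction", "order execution"},
}


def concept_of(term: str) -> Optional[str]:
    """Return the concept a term designates, or None if the vocabulary lacks it."""
    term = term.lower()
    for concept, designations in CONCEPTS.items():
        if term in designations:
            return concept
    return None


@dataclass
class Rule:
    statement: str                  # human-readable, SBVR-style structured English
    check: Callable[[dict], bool]   # machine-checkable form of the same rule


# Hypothetical processing rule for a stockbroker's document store.
RULES = [
    Rule(
        "It is obligatory that each trade record identifies exactly one client.",
        lambda rec: rec.get("concept") != "trade" or len(rec.get("clients", [])) == 1,
    ),
]


def violations(record: dict) -> List[str]:
    """List the statements of every rule the extracted record breaks."""
    return [rule.statement for rule in RULES if not rule.check(record)]


print(concept_of("Customer"))  # client
print(violations({"concept": "trade", "clients": ["A. Smith", "B. Jones"]}))
```

Keeping the human-readable statement next to its executable check lets business users review the rule in their own vocabulary while the machine enforces it.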
Coupled together, mKDM and SBVR provide the base technology for creating metadata; defining the vocabularies, taxonomies and rules for processing the data; and retrieving useful information based on linked entities as well as “inferred” fact patterns. This helps convert unstructured data into actionable intelligence.
PARTS OF THE SYSTEM
SCANNERS AND PARSERS
Scanners and parsers will process the unstructured data with reference to the Knowledge Modeling Standards of OMG and produce symbol tables and syntax trees. The result will be an Abstract Syntax Tree Metamodel (ASTM) representation of the unstructured data.
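The scanner/parser split can be illustrated with a toy Python sketch. It is a stand-in, not an implementation of OMG's ASTM: the all-caps-heading convention, the use of acronyms as the only "symbols", and every name in the code are assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical convention for plain-text documents: an all-caps line is a heading.
HEADING = re.compile(r"^[A-Z][A-Z &\-]+$")
# Hypothetical symbol definition: any run of two or more capital letters.
ACRONYM = re.compile(r"\b[A-Z]{2,}\b")


@dataclass
class Node:
    kind: str                      # "document", "section" or "paragraph"
    text: str = ""
    children: list = field(default_factory=list)


def scan(text):
    """Scanner: turn raw lines into (token kind, text) pairs, skipping blank lines."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            yield ("HEADING" if HEADING.match(line) else "TEXT", line)


def parse(text):
    """Parser: build a syntax tree, plus a symbol table of acronyms -> sections using them."""
    root = Node("document")
    current = root
    symbols = {}
    for kind, value in scan(text):
        if kind == "HEADING":
            current = Node("section", value)
            root.children.append(current)
        else:
            current.children.append(Node("paragraph", value))
            for term in ACRONYM.findall(value):
                symbols.setdefault(term, []).append(current.text or "(preamble)")
    return root, symbols


SAMPLE = """APPLYING OMG STANDARDS
OMG publishes KDM and SBVR.
PARTS OF THE SYSTEM
Scanners feed the KDM model.
"""
tree, symbols = parse(SAMPLE)
print([section.text for section in tree.children])
print(symbols["KDM"])  # ['APPLYING OMG STANDARDS', 'PARTS OF THE SYSTEM']
```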
AUTOMATIC CATEGORIZERS
Automatic categorizers will act on the
metadata and perform the following functions:
Linguistic analysis
Statistical inference
Machine learning
Rule-based processing
These will obtain the relevant vocabularies, taxonomies and rules from the Semantics of Business Vocabulary and Rules (SBVR), which is part of our reference Knowledge Modeling Standard. The SBVR will provide the business vocabulary needed to do this job properly. For instance, if we are doing this job for a stockbroker, the relevant business vocabulary will be far different from the one relevant to a law firm. A single document may be assigned to multiple categories.
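The rule-based part of this, with per-client vocabularies and multi-category assignment, can be sketched in Python as follows. Only rule-based processing is shown; the linguistic analysis, statistical inference and machine learning listed above are not. The vocabularies are hypothetical stand-ins for what would come from SBVR.

```python
# Hypothetical domain vocabularies: one categorizer, configured per client business.
DOMAIN_VOCABULARIES = {
    "stockbroker": {
        "Equities": {"stock", "shares", "dividend"},
        "Compliance": {"audit", "disclosure", "sec"},
    },
    "law_firm": {
        "Litigation": {"plaintiff", "defendant", "deposition"},
        "Contracts": {"indemnity", "clause", "breach"},
    },
}


def categorize(text, domain, min_hits=1):
    """Rule-based and multi-label: return every category with at least `min_hits` term matches."""
    words = {w.strip(".,;:()\"'").lower() for w in text.split()}
    vocabulary = DOMAIN_VOCABULARIES[domain]
    return sorted(
        category for category, terms in vocabulary.items() if len(words & terms) >= min_hits
    )


memo = "The audit flagged a late dividend disclosure."
print(categorize(memo, "stockbroker"))  # ['Compliance', 'Equities']
print(categorize(memo, "law_firm"))     # []
```

The same memo lands in two categories for the stockbroker and in none for the law firm, which is why the vocabulary has to be chosen per business.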
The output of the automatic categorizers will be a Metadata Repository, along with catalogs, fact patterns and indexes.
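One plausible shape for part of that output is sketched below in Python, assuming a simple in-memory catalog and inverted index. The fact-pattern store is left out, and `build_repository` and the sample documents are hypothetical.

```python
from collections import defaultdict


def build_repository(categorized):
    """Build a catalog (category -> doc ids) and an inverted index (word -> doc ids).

    `categorized` maps each doc id to a (text, categories) pair, as a categorizer
    might emit it.
    """
    catalog = defaultdict(set)
    index = defaultdict(set)
    for doc_id, (text, categories) in categorized.items():
        for category in categories:
            catalog[category].add(doc_id)
        for word in text.lower().split():
            index[word.strip(".,;:")].add(doc_id)
    return {"catalog": dict(catalog), "index": dict(index)}


repo = build_repository({
    "d1": ("Dividend audit findings.", ["Compliance", "Equities"]),
    "d2": ("Audit plan for Q3.", ["Compliance"]),
})
print(sorted(repo["catalog"]["Compliance"]))  # ['d1', 'd2']
print(sorted(repo["index"]["audit"]))         # ['d1', 'd2']
```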
KNOWLEDGE RETRIEVAL ENGINE
Regardless of whether the user is searching, browsing or seeking information through a web service or an API, the actual retrieval will be performed by a Knowledge Retrieval Engine. It must scan and parse the “request for information” with reference to the same vocabularies, taxonomies and rules in the SBVR that were used by the automatic categorizers.
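Why the request must be parsed with the same vocabulary can be shown in a short Python sketch: a synonym in the query only meets a synonym in a document if both are normalized the same way. The `SYNONYMS` table and the function names are hypothetical.

```python
# Hypothetical shared vocabulary: both the categorizers and the retrieval engine use it,
# so a query term and a document term naming the same concept meet under one key.
SYNONYMS = {"customer": "client", "account holder": "client", "transaction": "trade"}


def normalize(term):
    """Map a term to its preferred vocabulary term (lower-cased)."""
    term = term.lower()
    return SYNONYMS.get(term, term)


def index_documents(doc_terms):
    """Index documents under normalized terms, as the categorizers would have stored them."""
    index = {}
    for doc_id, terms in doc_terms.items():
        for term in terms:
            index.setdefault(normalize(term), set()).add(doc_id)
    return index


def answer(request_terms, index):
    """Parse the request with the same vocabulary, then keep documents matching every term."""
    postings = [index.get(normalize(t), set()) for t in request_terms]
    return set.intersection(*postings) if postings else set()


index = index_documents({"d1": ["Client", "Trade"], "d2": ["Customer"], "d3": ["Transaction"]})
print(sorted(answer(["customer"], index)))                       # ['d1', 'd2']
print(sorted(answer(["account holder", "transaction"], index)))  # ['d1']
```

If the engine used a different vocabulary from the categorizers, a request for "customer" would miss every document indexed under "client".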
It will then retrieve two kinds of information:
1. ENTITY EXTRACTION: Focus on identifying named entities.
2. FACT EXTRACTION: Focus on fact patterns and on detecting relationships between data using “inference”.
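Both kinds of retrieval can be sketched in Python under simplifying assumptions: entities come from a hypothetical gazetteer rather than a trained recognizer, facts come from one hypothetical surface pattern, and "inference" is shown as a transitive closure, which is only one of many inference styles a real engine might use.

```python
import re

# Hypothetical gazetteer of known names; a production system would more likely
# use a trained named-entity recognizer.
KNOWN_ENTITIES = {"Alice", "Bob", "Carol", "Acme Corp"}

# Hypothetical fact pattern: "<name> reports to <name>".
REPORTS_TO = re.compile(r"(\w+) reports to (\w+)")


def extract_entities(text):
    """Entity extraction: which known names occur in the text, as whole words."""
    return sorted(e for e in KNOWN_ENTITIES if re.search(r"\b" + re.escape(e) + r"\b", text))


def extract_facts(text):
    """Fact extraction: every pattern match becomes a (subject, relation, object) triple."""
    return {(a, "reports_to", b) for a, b in REPORTS_TO.findall(text)}


def infer_chain_of_command(facts):
    """Inference: 'under' facts are the transitive closure of the 'reports_to' facts."""
    under = {(a, b) for a, rel, b in facts if rel == "reports_to"}
    while True:
        new = {(a, d) for a, b in under for c, d in under if b == c} - under
        if not new:
            break
        under |= new
    return {(a, "under", b) for a, b in under}


text = "Alice reports to Bob. Bob reports to Carol. Acme Corp hired Alice."
print(extract_entities(text))  # ['Acme Corp', 'Alice', 'Bob', 'Carol']
facts = extract_facts(text)
print(("Alice", "under", "Carol") in infer_chain_of_command(facts))  # True
```

The fact that Alice sits under Carol is stated nowhere in the text; it is inferred from two explicit facts.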
The retrieved information will be focused and
relevant to the “request for information”. It will be
actionable intelligence.
PACKAGING & DELIVERY ENGINE
The retrieved information has to be packaged and delivered to the seeker of information using the right channel. The request can come from one of many channels, such as interactive search, interactive browsing, web services, or well-defined APIs. The results are pushed back through the same channel. There is also more than one way of representing the results: textually or visually, through spatial diagrams or mind maps.
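A small Python sketch of channel-aware packaging follows. The channel names and the result fields (`doc_id`, `title`) are assumptions, and visual renderings such as spatial diagrams or mind maps are not shown.

```python
import json


def package(results, channel):
    """Render the same retrieved results for the channel the request arrived on."""
    if channel in ("web_service", "api"):
        # Machine consumers get structured JSON.
        return json.dumps({"results": results}, sort_keys=True)
    if channel in ("search", "browse"):
        # Interactive users get a numbered, human-readable list.
        return "\n".join(f"{i}. {r['title']} ({r['doc_id']})" for i, r in enumerate(results, 1))
    raise ValueError(f"unsupported channel: {channel}")


hits = [{"doc_id": "d1", "title": "Q1 plan"}, {"doc_id": "d7", "title": "Audit memo"}]
print(package(hits, "search"))
```

Keeping retrieval and packaging separate means a new channel needs only a new renderer, not a new retrieval path.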
Figure 2 is a schematic representing the methodology.
Fig - 2
PRACTICAL APPLICATIONS
Apart from the holistic application of this methodology across an enterprise for substantial productivity gains, greater revenue and informed decisions, this methodology also has many smaller practical applications on limited sets of data.
E-DISCOVERY FROM EMAILS
Email has become the standard for both internal and external communication. A company's email contains important, and sometimes confidential, information that is increasingly going into massive e-mail archives, whether to comply with mandatory government regulations or for information archival.
E-discovery refers to discovery in civil litigation that deals with information in electronic format, also referred to as Electronically Stored Information (ESI). Emails can be a prime source of information in civil litigation.
Financial and other firms subject to Sarbanes-Oxley regulatory compliance need effective e-discovery mechanisms for their e-mail archives and other documents.
Our solution can help you implement a powerful information retrieval mechanism for email archives, providing the following capabilities, and more:
Advanced search capabilities to find specific records within your complete and secure archive.
Locate and produce evidence-quality messages with metadata in seconds.
Analyze a complete audit trail for every message.
Review and classify every message (based on your company's rules and permissions) that leaves or enters your organization's domains.
Classify messages for legal hold when a court or counsel requests that all data relevant to a particular case be preserved.
E-mail analytics designed for use in complex litigation or investigative matters.
Search for and identify key individuals and assess their relationships and communication patterns. The Activity Schematic in Figure 3 displays communication patterns with a key individual placed in the center and e-mail correspondents connected by radial spokes.
A Timeline View can be produced as a horizontal timeline to help assess critical time periods in the matter under investigation.
E-mail analytics help you easily identify the communications of the key players for substantive review.
Powerful E-Mail Analytics can provide never-before-discovered intelligence from plain company emails.
Fig - 3
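The data behind the Activity Schematic and the Timeline View can be sketched in Python as simple aggregations. This assumes sender, recipient and date have already been extracted from each message's headers; the `MESSAGES` records and the function names are hypothetical, and drawing the radial diagram itself is not shown.

```python
from collections import Counter
from datetime import date

# Hypothetical (sender, recipient, date) records, as extracted from an archive's headers.
MESSAGES = [
    ("alice", "bob", date(2009, 3, 2)),
    ("alice", "carol", date(2009, 3, 2)),
    ("bob", "alice", date(2009, 3, 9)),
    ("dave", "carol", date(2009, 3, 10)),
    ("alice", "bob", date(2009, 4, 1)),
]


def spokes(messages, key_person):
    """Activity-schematic data: message count between the key person and each correspondent."""
    counts = Counter()
    for sender, recipient, _ in messages:
        if sender == key_person:
            counts[recipient] += 1
        elif recipient == key_person:
            counts[sender] += 1
    return dict(counts)


def timeline(messages, key_person):
    """Timeline-view data: the key person's message volume per calendar month, in order."""
    months = Counter(d.strftime("%Y-%m") for s, r, d in messages if key_person in (s, r))
    return sorted(months.items())


print(spokes(MESSAGES, "alice"))    # {'bob': 3, 'carol': 1}
print(timeline(MESSAGES, "alice"))  # [('2009-03', 3), ('2009-04', 1)]
```

Each entry from `spokes` would become one radial spoke around the key individual, with the count setting its weight; `timeline` gives the bars of the horizontal timeline.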
We can transform your email archive into
rich actionable intelligence.
OTHER REGULATORY COMPLIANCE
Regulatory compliance also weighs down life science companies (pharmaceutical, biotech and medical device companies). FDA regulations pertaining to clinical trials, manufacturing processes and drug discovery require similar diligence in preserving “evidence” for a stipulated period of time. Such evidence is also contained in unstructured data items like e-mails and documents. Our methodology equips the company and auditors with reliable and quick e-discovery processes, in addition to other proactive compliance monitoring functions of interest to the company.
RESEARCH AND DEVELOPMENT
Pharmaceutical companies engaged in drug development, for instance, can benefit from every bit of better intelligence and every minute of human effort saved. Drug development activities for a single product can span more than ten years and involve collaboration among a wide range of professionals such as research scientists, pharmacologists, chemists, biologists, chemical engineers, production floor specialists, clinical trial units and others. A very large share of the information flowing among these diverse teams, in diverse geographical locations, is “unstructured” data.
LAW FIRMS
Law firms try to make sense of unstructured data every single minute of their existence. In an expanding Internet-driven universe, making sense of information overload and using the results meaningfully for their clients’ benefit is an ever-growing challenge.
CONTENT PUBLISHERS
Companies engaged in any kind of publishing, especially delivery of content over the Internet, are competing for differentiation in search capability.
Content metadata is critical here, because it is indispensable for setting up the catalogs, fact patterns and indexes that, in turn, determine the accuracy of the information delivered.
INTELLIGENCE & LAW ENFORCEMENT
Especially in this age of rampant terrorism, proactive prevention of crimes is a top priority. The huge world of “unstructured” data constantly evolving on the Internet is a rich source of intelligence and alerts, but far too vast for manual processing or informal methods.
A methodology such as ours can effectively harness and deliver untold value from the Internet.
The Lifeblood of the Enterprise is information. The information economy thrives and survives on information.
When one needs a heart bypass, one goes to a cardiac surgeon.
When one needs the best storage solutions, one goes to EMC, the storage specialists.
Why would you go to Accenture, Cap Gemini, Infosys or Wipro for software modernization?
WE ARE THE SOFTWARE MODERNIZATION SPECIALISTS. IT IS ALL WE DO.