UIG360 Brochure
Uploaded by: daniel-sapir | Category: Documents
Transcript of UIG360 Brochure
Why is a unified information governance approach to managing unstructured data so
important? Simply put, long-term growth and success can be neither achieved nor
sustained without harnessing the business value (content intelligence) contained
in the vast repositories of unstructured data. In addition, existing approaches to
managing unstructured information do not offer a holistic, unified solution; they
address discrete areas and are too simplistic for the complexities of today’s
information environment.
This is at the heart of the challenge facing private and governmental organizations.
Today, information governance is a topic of interest both inside and outside IT.
CIOs, CFOs, IT managers responsible for data and infrastructure management, security
officers, GRC (Governance, Risk, and Compliance) officers and general counsel are
struggling to find a comprehensive solution for managing the ever-growing volume of
unstructured data. The realization that information governance has gone beyond just
Unified Information Governance 360 — UIG360°™
SECURITY HOSTING CLOUD SERVICES
PARAGON TOWERS 233 NEEDHAM STREET, NEWTON MA 02464 | p: 617-658-2030
WWW.HAYSTAC.COM
ANALYTICS SERVICES
Adopting a holistic approach means looking at unstructured information in an organic way and acknowledging that the
entire information supply chain is an interrelated ecosystem that needs unified, comprehensive management.
Addressing only a subset of the information flow will simply not work.
Haystac’s UIG360°™ is a Unified Information Governance solution that addresses enterprises’ need for
hierarchical document classification, unstructured data analytics and document retention management, all on a single,
highly flexible platform delivered via a secure private cloud.
As the IG landscape continues to change, and the definition of “information
governance” itself matures over time, organizations that take the time to
establish a holistic, program-based approach to IG management will be in the
best position to benefit from:
Effective risk management
Assigning a value to the risk being managed
Reduced risk and improved security and privacy
UIG360°™ Core Capabilities
Effective Technology For Proactive Classification
UIG360°™ overall architecture for scanned image classification, data capture and Logical Document Boundary Determination (LDBD)
[Figure: workflow diagram starting from document exemplars.
Stages shown: Image preprocessing and visual classification; Classification; Reporting.
Steps shown: initial review of the input data set to detect image defects and possible feature sets for classification models; configure the image pre-processing algorithm for the data set; run image pre-processing; run visual classification and visual classification post-processing; identify document classes and types; define feature set and classification rules; train (seed) documents; run page classification; define/refine rules for collapsing pages into documents; run reports.
Each stage ends with a "Review results" step; in case of unacceptable noise, the process goes back to the start.]
Genre-based, supervised learning for image classification:
‘Visual words’ - points of interest and layout-based features
Multi-class probabilistic Support Vector Machine
Support for multiple models
Soft Dictionary - Fuzzy search of dictionary terms or phrases to classify into a pre-defined taxonomy
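As a rough illustration of the Soft Dictionary idea above, the following minimal Python sketch scores OCR text against dictionary terms with approximate string matching and picks the taxonomy category with the most hits. The taxonomy, the threshold and the function names are hypothetical, and the standard-library `difflib` matcher stands in for whatever fuzzy search UIG360°™ actually uses.

```python
from difflib import SequenceMatcher
from typing import List, Optional

# Hypothetical taxonomy: category -> dictionary terms (illustrative only).
TAXONOMY = {
    "Invoice": ["invoice number", "amount due", "remit to"],
    "Contract": ["governing law", "termination", "indemnification"],
}


def best_term_score(term: str, words: List[str]) -> float:
    """Best similarity between a term and any window of the same word count."""
    n = len(term.split())
    windows = [" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))]
    return max(SequenceMatcher(None, term, w).ratio() for w in windows)


def classify(text: str, threshold: float = 0.8) -> Optional[str]:
    """Return the category with the most fuzzy term hits, or None if no term matches."""
    words = text.lower().split()
    hits = {
        category: sum(best_term_score(t, words) >= threshold for t in terms)
        for category, terms in TAXONOMY.items()
    }
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None
```

With this sketch, OCR output such as "lnvoice nurnber 4471 arnount due 30 days" still lands in the hypothetical "Invoice" category despite the character errors.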
Image quality improvement algorithms
Skew angle and page orientation detection
Image segmentation: separation of image into graphics, photos, tables, text and ’noisy data’
OCR post-processing
Fuzzy pattern matching framework with key phrase data extraction. Built-in capabilities include:
Ability to find merged and split words
Built-in expressions and dictionaries
Boolean and built-in functions
Domain-specific language with simple grammar: “invoice Amt.”, “>invoiceAmount”….
API for custom expressions
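One way to picture matching a key phrase across merged and split words is to compare text with all whitespace removed. The Python sketch below does that and then captures the number following the phrase. It is an assumption-laden illustration only: the function names and threshold are hypothetical, and it does not reproduce the UIG360°™ domain-specific language.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Tuple


def find_phrase(phrase: str, ocr_text: str, threshold: float = 0.85) -> Optional[Tuple[int, int]]:
    """Locate `phrase` in OCR text even if OCR merged or split its words.

    Returns (start, end) character offsets into `ocr_text`, or None.
    """
    target = re.sub(r"\s+", "", phrase.lower())
    if not target:
        return None
    # Drop whitespace but remember each character's original offset.
    chars = [(c.lower(), i) for i, c in enumerate(ocr_text) if not c.isspace()]
    best_score, best_span = 0.0, None
    for start in range(len(chars) - len(target) + 1):
        window = chars[start:start + len(target)]
        candidate = "".join(c for c, _ in window)
        score = SequenceMatcher(None, target, candidate).ratio()
        if score > best_score:  # strict comparison keeps the earliest best window
            best_score, best_span = score, (window[0][1], window[-1][1] + 1)
    return best_span if best_score >= threshold else None


def extract_amount(phrase: str, ocr_text: str) -> Optional[str]:
    """Return the first number that follows the matched key phrase."""
    span = find_phrase(phrase, ocr_text)
    if span is None:
        return None
    match = re.search(r"\d[\d,]*\.?\d*", ocr_text[span[1]:])
    return match.group(0) if match else None
```

For example, `extract_amount("invoice amount", "Date 01/02 In voice Amo unt: 1,250.00")` finds the phrase even though OCR split it into four fragments.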
Classification | Image Processing | Data Capture and Extraction
Location-based constraints
Uses hOCR format to retrieve text location information
Improved OCR segmentation into logical text blocks
Fuzzy in-text rules to find and match location of anchor expressions
Built-in location expressions: right, left, above, below, block..
Example:
Above (“invoice summary”, block> supplier contracts”
Right (“sub total”, below (“taxes”, money >taxAmount
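The location expressions above rely on word bounding boxes, which hOCR records in each element's `title` attribute as `bbox x0 y0 x1 y1` with the origin at the top-left of the page. The Python sketch below shows how "right" and "below" could be evaluated over such boxes. The sample markup, the `Word` type and the helper names are hypothetical, and the regular expression assumes `class` appears before `title`; a real parser would be more tolerant.

```python
import re
from typing import List, NamedTuple


class Word(NamedTuple):
    text: str
    x0: int
    y0: int
    x1: int
    y1: int


# Matches hOCR word spans; assumes the class attribute precedes title.
WORD_RE = re.compile(
    r"<span[^>]*class=['\"]ocrx_word['\"][^>]*"
    r"title=['\"]bbox (\d+) (\d+) (\d+) (\d+)[^'\"]*['\"][^>]*>([^<]*)</span>"
)

# Hypothetical hOCR fragment for illustration.
SAMPLE_HOCR = """
<span class='ocrx_word' title='bbox 50 100 120 120; x_wconf 96'>Taxes</span>
<span class='ocrx_word' title='bbox 300 100 360 120; x_wconf 91'>8.00</span>
<span class='ocrx_word' title='bbox 50 140 130 160; x_wconf 95'>Total</span>
<span class='ocrx_word' title='bbox 300 140 370 160; x_wconf 94'>108.00</span>
"""


def parse_words(hocr: str) -> List[Word]:
    """Extract words and their bounding boxes from hOCR markup."""
    return [Word(m[5].strip(), int(m[1]), int(m[2]), int(m[3]), int(m[4]))
            for m in WORD_RE.finditer(hocr)]


def right_of(anchor: Word, words: List[Word]) -> List[Word]:
    """Words vertically overlapping the anchor that start to its right, nearest first."""
    hits = [w for w in words
            if w.y0 < anchor.y1 and w.y1 > anchor.y0 and w.x0 >= anchor.x1]
    return sorted(hits, key=lambda w: w.x0)


def below(anchor: Word, words: List[Word]) -> List[Word]:
    """Words horizontally overlapping the anchor that start beneath it, nearest first."""
    hits = [w for w in words
            if w.x0 < anchor.x1 and w.x1 > anchor.x0 and w.y0 >= anchor.y1]
    return sorted(hits, key=lambda w: w.y0)
```

Composing the two helpers, for instance taking the word below "Taxes" and then the word to its right, mirrors the nested style of the brochure's example expressions.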
Built-in custom reporting
Export of search results
Extractive Summaries
Hierarchical clustering of search results
Faceted Search by:
Categories
Captured data points
Document age
Extensions…..
Correlate facets and queries
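To make the faceted search items above concrete, here is a minimal Python sketch that counts facet values over a set of search hits and narrows the hits by selected facets. The hit records and field names are hypothetical and only illustrate the general technique.

```python
from collections import Counter
from typing import Any, Dict, List

# Hypothetical search hits; field names are illustrative only.
HITS = [
    {"id": 1, "category": "Invoice", "age_years": 1},
    {"id": 2, "category": "Invoice", "age_years": 8},
    {"id": 3, "category": "Contract", "age_years": 8},
]


def facet_counts(hits: List[Dict[str, Any]], field: str) -> Counter:
    """Count how many hits fall under each value of one facet."""
    return Counter(h[field] for h in hits)


def refine(hits: List[Dict[str, Any]], **selected: Any) -> List[Dict[str, Any]]:
    """Keep only hits that match every selected facet value."""
    return [h for h in hits if all(h.get(k) == v for k, v in selected.items())]
```

Recomputing `facet_counts` on the output of `refine` is one simple way to correlate facets with a narrowed query.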
Knowledge model development consists of defining the relevant document
Pre-defined libraries
Gap analysis: compare knowledge model to retention policy
Disposition eligibility: for machine-classified results, can be run at both summary and detail levels.
Anomaly detection: detecting documents in the data stream that do not belong to any of the classification models
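The gap analysis and disposition eligibility items above can be sketched in a few lines of Python. Everything named here (the categories, the retention periods in years, the function names) is a hypothetical example, not the product's data model: gap analysis is shown as a set comparison, and eligibility as a date check that can be reported per document (detail) or per category (summary).

```python
from collections import Counter
from datetime import date
from typing import Dict, List, Set

# Hypothetical knowledge-model categories and retention schedule (years).
KNOWLEDGE_MODEL = {"Invoice", "Contract", "HR Record"}
RETENTION_YEARS = {"Invoice": 7, "Contract": 10, "Tax Filing": 7}


def gap_analysis(model: Set[str], policy: Dict[str, int]) -> Dict[str, List[str]]:
    """Compare knowledge-model categories against the retention policy."""
    covered = set(policy)
    return {
        "no_retention_rule": sorted(model - covered),
        "not_in_model": sorted(covered - model),
    }


def is_eligible(category: str, created: date, today: date, policy: Dict[str, int]) -> bool:
    """Detail level: has this document outlived its retention period?"""
    years = policy.get(category)
    if years is None:
        return False  # no rule on file, so keep the document
    try:
        expiry = created.replace(year=created.year + years)
    except ValueError:  # created on Feb 29, expiry year is not a leap year
        expiry = created.replace(year=created.year + years, day=28)
    return today >= expiry


def disposition_summary(docs: List[dict], today: date, policy: Dict[str, int]) -> Counter:
    """Summary level: number of eligible documents per category."""
    return Counter(d["category"] for d in docs
                   if is_eligible(d["category"], d["created"], today, policy))
```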
UIG360°™ is designed to support scalable (100s of TB) multi-class classification of unstructured data based on discriminative
Machine Learning (ML) algorithms. It supports a multitude of document formats (over 3000) and a special feature set for classifying
email threads. It also supports best practices in classifying emails based on attachments. It is designed to allow unstructured
data across the network to be crawled, indexed and processed in accordance with machine learning classification and
rule-based coding.
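Multi-class classification combined with the anomaly detection capability listed earlier can be pictured as a classifier with a reject option: a document is assigned to the most probable class unless no class is confident enough. The Python sketch below assumes per-class probabilities are already available from some model; the threshold, the `ANOMALY` label and the function name are hypothetical.

```python
from typing import Dict

ANOMALY = "ANOMALY"  # hypothetical label for documents that fit no model


def assign_or_flag(probabilities: Dict[str, float], min_confidence: float = 0.6) -> str:
    """Return the most probable class, or ANOMALY if no class is confident enough."""
    if not probabilities:
        return ANOMALY
    label, score = max(probabilities.items(), key=lambda kv: kv[1])
    return label if score >= min_confidence else ANOMALY
```

A document scored `{"Invoice": 0.91, "Contract": 0.09}` would be classified as an invoice, while one scored `{"Invoice": 0.40, "Contract": 0.35}` would be flagged for review.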
Analytics | Document Retention | Data Capture and Extraction
Fuzzy pattern matching framework with key
phrase data extraction. Built-in capabilities
include:
Ability to find merged and split words
Built-in expressions and dictionaries
Boolean and built-in functions
Domain-specific language with simple
grammar
API for custom expressions
UIG360°™ overall workflow architecture for classification and data analysis
[Figure: workflow diagram. Labels shown: Document Exemplars; Critical Attributes; Apply Classification; Decision; Process Data; Apply Classification Strategy; Controlled Vocabulary; Versions; Initiate Process; System Files; Duplicates; Taxonomies; Rules; Iterate Strategy; Data Quality; Quality Control; Database; Auto Extract Attributes; Machine Learning; Image Processing; Manual; Reconcile to CMS.]
Deploys unsupervised machine learning methods to associate documents with user-selected business topics/categories (themes)
Facilitates analytics by providing, for each document, its related topics, related documents and versions
Combines (clusters) documents that belong to the same business topic/category, and provides an extractive summary for a given cluster or a single document
Facilitates search by supporting facets such as business topics (themes), related categories, versions, document age, etc., and provides in-context query completion based on generated key phrases for the entire topic/category
Intelligent ingestion supports single and batch document upload. It provides a view into the upload process. It suggests possible categories (business topics) for the new document and identifies previous versions if they exist.
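For readers unfamiliar with extractive summaries, the Python sketch below shows the basic idea: rank a document's own sentences and return the top ones unchanged. It uses a simple word-frequency heuristic, which is a stand-in chosen for brevity and not the method UIG360°™ uses; the stopword list and function names are hypothetical.

```python
import re
from collections import Counter
from typing import List

# Small illustrative stopword list (an assumption, not exhaustive).
STOPWORDS = {"the", "is", "at", "for", "a", "an", "of", "and", "to", "in"}


def _tokens(text: str) -> List[str]:
    """Lowercase words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text: str, max_sentences: int = 1) -> List[str]:
    """Return the highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    freq = Counter(_tokens(text))

    def score(sentence: str) -> float:
        toks = _tokens(sentence)
        # Average document-wide frequency of the sentence's words.
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    top = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    return [sentences[i] for i in sorted(top[:max_sentences])]
```

The same function can be applied to a single document or to the concatenated text of a cluster.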
UIG360°™ Core Capabilities
Effective Technology For Content Intelligence
Categorization | Cross Reference and Correlation | Content Upload