SmartData Fabric® security-centric distributed virtual data, master data and graph data management, and analytics
SmartData Fabric® (SDF)
aka SmartData Lake™ (SDL)
aka Distributed Index-based and
Conventional Data Virtualization
Basic Overview
Revision 12.2 Copyright 2020 WhamTech, Inc. 1
October 2020
Significant customers and partners
CUSTOMERS PARTNERS
Very large healthcare provider
Very large bank
Conventional data solutions compared to
SmartData Fabric®
Conventional data solutions between a rock and a hard place
ROCK = DATA VIRTUALIZATION/FEDERATION
Pros
• Leaves data where it is
• Easy to add or remove data sources
• Meets on-soil data retention regulations
Cons
• 100% dependent on source data quality and systems for queries
• Access control and data security impacted by data quality
• Difficult to manage and integrate Master Data Management (MDM)
HARD PLACE = DATA WAREHOUSE
Pros
• Removes dependency on source data quality and systems for queries
• Determines best (master) data, deduplicates data and ensures referential integrity
• Single database for queries
Cons
• Copies all data, which introduces latency and a security liability
• Transforms source schemas and data to a one-size-fits-all database schema
• Takes significant time and cost
• Has an inflexible schema
• Difficult to add or remove data sources
• Difficult to trace and erase personal data for CCPA/CCPR and GDPR
• In many cases, needs additional data marts
• Does not meet on-soil data retention regulations
Data lakes are in-between solutions
IN-BETWEEN A ROCK AND A HARD PLACE = DATA LAKES
Pros
• All data in a single location or system
• Leaves schema and data “as are”
• Helps with IT issues of access control, scalability, and query processing, performance and load
• Easy to add or remove data sources
Cons
• Copies all data, which introduces latency and a security liability
• Does not help with data management: still requires ETL to a data warehouse and then data marts, or an additional data management layer
• New market solutions are data lake + ETL + data warehouse (+ data marts?)
• Difficult to erase personal data as per CCPA/CCPR and GDPR unless it can be traced back through the ETL process or data management layer
• Does not meet on-soil data retention regulations
Comparison among conventional data solutions
Feature / Rock = Data virtualization/federation / Hard Place = Data warehouse / In-between = Data lake
Leaves data in sources ✓
Leaves schema and data “as are” ✓ ✓
Addresses data quality and other data-related issues ✓
Avoids queries on source systems ✓ ✓
Easy to add/remove data sources ✓ ✓
Supports on-soil data retention regulations ✓
Avoids schema and resultant complex data transformation ✓ ✓
Supports traceback and erase personal data ✓ ✓
Avoids latency and security liability ✓
Avoids additional ETL ✓ or ✓
Offers integrated Master Data Management (MDM) ✓
Almost ANY and MULTIPLE data sources on
ANY and MULTIPLE platforms, including
• mainframes
• databases
• data lakes/warehouses/marts
• files
• logs
• office docs
• applications/SaaS
• Web docs
• social media
• Big Data
• streaming
• clouds
• IoT
SmartData Fabric® is a complete data management layer
DATA + virtualization
discovery
identification
classification
security
cleansing
entity extraction
transformation
standardization
access control
governance
federation or index store
relationships/links
master data management (MDM)
integration
catalog
monitoring
virtual graph database
support reporting, BI and analytics
Leverage index-based federated adapters for preprocessing and query execution only on data that needs attention; much or most data may not, e.g., transactional data
Comparison between conventional data solutions and SmartData Fabric®
Feature / Rock = Data virtualization/federation / Hard Place = Data warehouse / In-between = Data lake / SmartData Fabric®
Leaves data in sources ✓ ✓
Leaves schema and data “as are” ✓ ✓ ✓
Addresses data quality and other data-related issues ✓ ✓
Avoids queries on source systems ✓ ✓ ✓
Easy to add/remove data sources ✓ ✓ ✓
Supports on-soil data retention regulations ✓ ✓
Avoids schema and resultant complex data transformation ✓ ✓ ✓
Supports traceback and erase personal data ✓ ✓ ✓
Avoids latency and security liability ✓ ✓
Avoids additional ETL ✓ or ✓ ✓
Offers integrated Master Data Management (MDM) ✓ ✓
Enterprise needs
SmartData Fabric® impacts all enterprise needs
Most Enterprises
• MEETING REGULATIONS and increasing regulatory scrutiny
• INCREASING EFFICIENCY by reducing costs and latency in operations
• INCREASING EFFECTIVENESS by gaining and leveraging complete views of clients and other entities using analytics
• LEVERAGING NEW TECHNOLOGIES such as AI/ML, Big Data, cloud, virtualization, APIs, process automation and Blockchain
• ADDING VALUE to clients and therefore the enterprise
Meeting these needs depends on access to high-quality, standardized data and master data
Digital Transformation -> Projects -> Changes
Digital Transformation [DRIVER] -> Project Management -> Change Management
These processes depend on access to high-quality, standardized data through standard APIs and workflows, and support for interoperability
A fundamental shift in all markets
• Data is now seen as a prime asset
• Data-driven reporting, BI and analytics are, in turn, seen as driving business
• Data plus AI/ML, cloud, hybrid cloud, data/app/network/storage virtualization, APIs, process automation, Blockchain, etc. are seen as differentiators
However, enterprises continue to struggle with data-related fundamentals, including…
• Data access control and data security
• Master data management (MDM)
• Reporting, BI and analytics
• Compliance
• Near real-time access to, and insights from, data
• Closing the loop from analytics to operations
• CCPA/CCPR and/or GDPR
• Data discovery
• Metadata
• Data quality
• Legacy data sources, including mainframes and file systems
• Unstructured data, including docs, email and social media
• Data catalog
• Data governance
• Data integration
• Data interoperability
Main data problems
Main data problems are
(a) data is everywhere, and
(b) data-related issues
Migrating ALL data and applications to a cloud helps with IT issues, but NOT with data-related issues, and is UNREALISTIC in the short-to-medium term for most enterprises
Main data problems
Data is everywhere
• Where is it, including copies (aka discovery)?
• What is it: personal/sensitive?
• Is it secured/protected?
• Is metadata available?
• Is governance in place?
• What is it used for? This is essential for CCPA/CCPR and GDPR, with the right to erase personal data
• Can it be copied to one location, e.g., a data lake?
• Can it be ETL’d to a one-size-fits-all database, e.g., a data warehouse?
• How is it connected to other data within and across source systems?
Data issues
• Does it need cleansing, transformation and/or standardization?
• Does it need pre-processing, e.g., unstructured data?
• Does it need to be de-duplicated, keeping the latest/best?
• Does it need to be aggregated, joined and/or calculated?
• Is it part of master data?
• Is master data (i) seamlessly and automatically integrated with data access, (ii) updated in near-real-time, or (iii) centralized or distributed?
• Are different views required, e.g., operational vs. analytical?
Typical data source, access control and deployment issues
Data source issues
• Are standard ODBC and JDBC drivers available? SQL query processing? What about APIs?
• Query performance? Availability of indexes and indexed views? Load on data sources?
• If using cache, how is it populated and maintained? Load on data sources? Latency?
• Data monitoring? Event processing? BPM workflows/process automation?
• Are key updates, e.g., emails and phone numbers, propagated to other data sources, views and master data?
Data source access control issues
• Support for advanced access control within and across domains, e.g., AD/LDAP, IAM, SSO, RBAC, ABAC/RLS and CLS, regardless of data source support for any of these?
Compute deployment issues
• Accepting that data is everywhere, it is difficult to deploy compute everywhere: even Hybrid Cloud (1.0) needs multiple remote local deployments that need to be resourced, managed and coordinated
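The access control question above can be made concrete with a minimal sketch of row-level security (RLS) and column-level security (CLS) enforced in a layer outside the data source. Everything here is an assumption for illustration: the function name, the policy shapes and the sample rows are hypothetical and do not describe any vendor's API.

```python
def apply_access_control(rows, user, row_policy, column_policy):
    """Apply row-level (RLS) and then column-level (CLS) security for one user."""
    visible = [row for row in rows if row_policy(user, row)]
    allowed = set()
    for role in user["roles"]:
        allowed |= column_policy.get(role, set())
    return [{k: v for k, v in row.items() if k in allowed} for row in visible]


# Hypothetical policies: users see rows for their own region only, and only
# the "auditor" role may see the ssn column.
def same_region(user, row):
    return row["region"] == user["region"]


COLUMN_POLICY = {
    "analyst": {"name", "region"},
    "auditor": {"name", "region", "ssn"},
}

ROWS = [
    {"name": "Ann Lee", "region": "EU", "ssn": "123-45-6789"},
    {"name": "Cy Poe", "region": "US", "ssn": "987-65-4321"},
]

analyst = {"roles": ["analyst"], "region": "EU"}
auditor = {"roles": ["auditor"], "region": "US"}
analyst_view = apply_access_control(ROWS, analyst, same_region, COLUMN_POLICY)
auditor_view = apply_access_control(ROWS, auditor, same_region, COLUMN_POLICY)
```

Because the filter runs on results rather than inside the source, the same policy applies whether or not the source supports RBAC, RLS or CLS itself.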
SmartData Fabric® and the power of indexes
Conventional data virtualization/federation vendors
1. Leave data in sources (some do not)
2. Virtualize the view of data in sources
3. Connect standard applications using standard drivers and SQL
4. Access and query data in multiple sources in parallel, usually through connectors or adapters
5. Combine results from multiple sources (some do not)
6. Cache results data for improved query performance and less data source load (some do not)
7. Build and maintain MDM (some do not)
8. Apply MDM to combined results data to provide integrated results to applications (some do not)
In step 4 above,
• ALL CONVENTIONAL DATA VIRTUALIZATION VENDORS ARE 100% DEPENDENT ON DATA SOURCES AND DATA IN SOURCES for data quality, data standardization, available indexes and indexed views, and query processing, which
• CAN IMPOSE SIGNIFICANT QUERY LOAD ON DATA SOURCES AND LEAD TO POOR QUERY PERFORMANCE
• HAS AN IMPACT ON DATA SOURCE ACCESS CONTROL AND DATA SECURITY and
• FAILS TO DELIVER 100%
➔ Enterprises need to enable all/most data-related fundamentals AND address data-related issues to be successful, not just enable access to data
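Steps 3 to 5 above, and the dependency described for step 4, can be sketched as follows. This is a minimal illustration, assuming two in-memory SQLite databases stand in for independent data sources; the `customer` table, its columns and the sample rows are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor


def make_source(rows):
    """Create a stand-in data source (hypothetical schema)."""
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE customer (name TEXT, city TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", rows)
    return conn


def federated_query(sources, sql, params=()):
    """Run the same SQL on every source in parallel and concatenate the results.

    Each query executes inside the source itself, so query load, available
    indexes and data quality are entirely the source's responsibility.
    """
    def run(conn):
        return conn.execute(sql, params).fetchall()

    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(run, sources))
    return [row for part in partials for row in part]


sources = [
    make_source([("Ann Lee", "Dallas"), ("Bob Roy", "Austin")]),
    make_source([("ann lee ", "DALLAS"), ("Cy Poe", "Dallas")]),
]
rows = federated_query(
    sources, "SELECT name, city FROM customer WHERE city = ?", ("Dallas",)
)
```

The second source holds a non-standardized `DALLAS` row that the query silently misses, which is the kind of incomplete result that follows from depending on data as it sits in the sources.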
SmartData Fabric® data virtualization/federation value-add
100% NOT dependent on data sources or data in sources, and addresses related issues upfront by
a) pre-processing raw source data (and optionally storing processed data in indexes) while building and maintaining indexes and indexed views,
b) processing queries against these indexes and indexed views, and
c) post-query processing raw results data read from sources (or read directly from indexes)
but only on data in sources that needs it!
• Most data in many systems is non-human-generated and does not need pre-processing or post-query processing, e.g., transaction systems, but may still need external indexing and query processing
• Some data needs pre-processing and post-query processing, such as customer, product, organization, etc., e.g., entities and associated attributes, usually for data quality, standardization and security, and master data management (MDM)
Enables all/most data-related fundamentals that enterprises continue to struggle with, mainly by addressing all/most issues with data sources, data in sources and access control
➔ COMBINES THE BEST of conventional data virtualization, data warehousing, enterprise search and graph database, and OVERCOMES THE WORST of these approaches
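Steps (a) to (c) above can be sketched in a few lines. This is an illustration under stated assumptions, not WhamTech's implementation: the `standardize` rule, the in-memory `SOURCE` and all function names are hypothetical.

```python
from collections import defaultdict

# Stand-in source: record number -> raw row (hypothetical data).
SOURCE = {
    1: {"name": "Ann Lee", "city": "Dallas"},
    2: {"name": "ann lee ", "city": "DALLAS"},
    3: {"name": "Cy Poe", "city": " dallas"},
    4: {"name": "Bob Roy", "city": "Austin"},
}


def standardize(value):
    """Cleanse and standardize one raw value (trim whitespace, normalize case)."""
    return " ".join(value.split()).title()


def build_index(source, column):
    """(a) Read, transform and index: standardized value -> record numbers."""
    index = defaultdict(set)
    for recno, row in source.items():
        index[standardize(row[column])].add(recno)
    return index


def query(index, value):
    """(b) Resolve the query against the index only; the source is not touched."""
    return sorted(index.get(standardize(value), ()))


def fetch(source, recnos):
    """(c) Read raw results from the source and standardize them on the way out."""
    return [{k: standardize(v) for k, v in source[r].items()} for r in recnos]


city_index = build_index(SOURCE, "city")
hits = query(city_index, "dallas")
results = fetch(SOURCE, hits)
```

The raw rows never change in the source; the cleansing happens once while indexing and again, cheaply, on the few rows returned.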
• Transparent virtual distributed data management layer that plugs-and-plays in existing IT infrastructures
• Complements and leverages existing IT systems, tools and applications
• Key differentiator: Federated adapters that Read, Transform and INDEX (RTI) data, wherever it resides, process queries against these indexes, and read and transform results data from sources
• Indexes enable UPFRONT semi-automated data discovery, security, quality, standards, MDM and other data-related processes, BEFORE the first query is made or application is used
• Leaves and guards data in sources
Unconventional data virtualization
[Diagram: SDF Federation Servers, SDF Adapters, SDF Hybrid Adapters, SDF Indexes and direct connections spanning ORGANIZATION A, ORGANIZATION B, ORGANIZATION C, a CLOUD, an EXTERNAL COMMERCIAL/PUBLIC DATA SOURCE and the ORGANIZATION’S OWN SYSTEMS OF REFERENCE (Data Governance, MDM, ETL), over data sources DS1 to DS10.]
New paradigm – Read, Transform and INDEX (RTI)
INDEXES increase the value, and reduce the cost and risk, of DATA
• Raw DATA
• DATA management
• Master DATA management
• DATA integration
• DATA relationships
• DATA analytics
• DATA access control
• DATA governance
• DATA protection regulations
• DATA classification
• DATA security
• DATA discovery and profiling
Source: #WhamTech SmartData Fabric Power of Indexes
Three types of indexes
• Content Indexes
• Master Data Indexes
• Link Indexes™
Content indexes are the basis for the other indexes
All indexes resolve to “record numbers”, which are internal to SDF but correlated with external/data source references, and can be combined using Boolean operations on physical and virtual bitmaps
Source: #WhamTech SmartData Fabric Power of Indexes
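The idea of resolving every index to record numbers and combining them with Boolean operations on bitmaps can be illustrated with a small sketch. It assumes Python integers as bitmaps, where bit n stands for record number n; the column names and records are hypothetical, and a production system would typically use compressed bitmaps.

```python
def build_bitmaps(records, column):
    """Map each distinct value to an int bitmap; bit n is set for record number n."""
    bitmaps = {}
    for recno, row in enumerate(records):
        bitmaps[row[column]] = bitmaps.get(row[column], 0) | (1 << recno)
    return bitmaps


def recnos(bitmap):
    """Expand a bitmap back into a sorted list of record numbers."""
    return [n for n in range(bitmap.bit_length()) if (bitmap >> n) & 1]


records = [
    {"state": "TX", "status": "active"},
    {"state": "TX", "status": "closed"},
    {"state": "CA", "status": "active"},
    {"state": "TX", "status": "active"},
]
state = build_bitmaps(records, "state")
status = build_bitmaps(records, "status")

tx_and_active = state["TX"] & status["active"]   # AND
tx_or_ca = state["TX"] | state["CA"]             # OR
all_records = (1 << len(records)) - 1
not_active = all_records & ~status["active"]     # NOT, masked to valid records
```

Each Boolean query reduces to integer arithmetic on the bitmaps, and only the surviving record numbers need to be looked up afterwards.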
INDEXES ARE KEY to a complete understanding
of data, enabling capabilities and driving value
Including being able to identify and access GDPR, HIPAA, PCI
and other confidential, classified and risk data
Source: #WhamTech SmartData Fabric Power of Indexes
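Identifying confidential data from indexes can be sketched as classifying the distinct values that a column index already holds. The patterns, the threshold and the function names below are assumptions for illustration only; real classification rules for GDPR, HIPAA or PCI data are far richer.

```python
import re

# Illustrative patterns only.
PATTERNS = {
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "EMAIL": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "CARD": re.compile(r"^\d{13,19}$"),
}


def luhn_ok(digits):
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d = d * 2 - 9 if d > 4 else d * 2
        total += d
    return total % 10 == 0


def classify_value(value):
    """Return a sensitivity label for one value, or None."""
    for label, pattern in PATTERNS.items():
        if pattern.match(value):
            if label == "CARD" and not luhn_ok(value):
                continue
            return label
    return None


def classify_column(distinct_values, threshold=0.8):
    """Label a column when at least `threshold` of its distinct values match."""
    counts = {}
    for value in distinct_values:
        label = classify_value(value)
        if label:
            counts[label] = counts.get(label, 0) + 1
    if not counts:
        return None
    label, n = max(counts.items(), key=lambda kv: kv[1])
    return label if n / len(distinct_values) >= threshold else None
```

Working from distinct indexed values rather than full table scans keeps the classification pass off the source system.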
[Figure: SmartData Fabric® Indexes (data stays in sources) at the center, surrounded by the ten capabilities listed below.]
1. Use raw indexes for DATA DISCOVERY (metadata), build and maintain DATA
PROFILING, DATA MATCHING within and across data sources, and
DEVELOPING AND TESTING DATA TRANSFORMS
2. Support FORRESTER ZERO TRUST DATA SECURITY FRAMEWORK –
discover, INDEX, classify and secure – GDPR, PCI, PHI, PII, etc.
3. PRE-PROCESS DATA while building and maintaining production indexes to
address data management fundamentals, e.g., cleansing, transformation,
standardization and security – data is usually discarded
4. Use LINK INDEXES™ AS BASIS FOR MDM AND OTHER CAPABILITIES –
future development to use indexes exclusively for MDM match and merge
5. Provide COMPLETE AND MULTIPLE VIEWS OF DATA through queries on
combined content, link and master data indexes
6. Provide FULL DATA TRACEABILITY as indexes and results contain unique
pointers to data in sources – data lineage, governance and audit
7. Enable HIGH PERFORMANCE, DISTRIBUTED PARALLEL QUERY PROCESSING through standard drivers, APIs, Web/data services, SQL and other query languages
8. MONITOR DATA SOURCES for content and relationships in near real-time,
and support EVENT PROCESSING
9. Enable VIRTUAL GRAPH DATABASE, link analysis and graph/link
visualization
10. GENERATE RESULTS WITHOUT DATA SOURCE when source is
unavailable, for query optimization, or as storage, e.g., for IoT devices, as
indexes are columnar and can be inverted and combined
Indexes key to understanding data, enabling capabilities and driving value
Source: #WhamTech SmartData Fabric Power of Indexes
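Items 6 and 10 above rest on the same property: index entries carry record numbers, so a column index can be inverted to recover values per record. A minimal sketch follows, assuming each column index is a plain dictionary from value to a set of record numbers; the `state` column and the helper names are hypothetical.

```python
def invert(index):
    """Turn value -> record numbers into record number -> value for one column."""
    return {recno: value for value, recnos in index.items() for recno in recnos}


def rows_from_indexes(column_indexes, recnos):
    """Rebuild result rows for the given record numbers from column indexes alone."""
    inverted = {col: invert(idx) for col, idx in column_indexes.items()}
    return [
        {"recno": r, **{col: inv.get(r) for col, inv in inverted.items()}}
        for r in recnos
    ]


column_indexes = {
    "first_name": {"Larry": {5, 245, 912}, "Curly": {193}, "Moe": {87, 332}},
    "state": {"TX": {5, 87, 193}, "CA": {245, 332, 912}},
}
hits = sorted(column_indexes["first_name"]["Moe"] & column_indexes["state"]["CA"])
rows = rows_from_indexes(column_indexes, hits)
```

The `recno` kept in every row is the pointer back to the source record, which is what makes lineage and audit possible even when the row was produced without touching the source.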
Eight types of content indexes
Content Indexes
• Source data
• Composite (source data combined)
• Derived from source data
• Indexed views (pre-aggregated, calculated and joined data)
• Unstructured text
• Extracted entities
• Fuzzy match
• Security or access level
Source: #WhamTech SmartData Fabric Power of Indexes
SmartData Fabric® configuration and
deployment options
[Diagram: HYBRID CLOUD 2.0 configuration. Components shown in the CLOUD: Applications, Standard Drivers/SQL*, AD/LDAP, EIQ Federation Servers (SDV with MDM extension), EIQ SuperAdapters (SDV with MDM extension) with Content Indexes, Master Customer Index, MCRs and Link Indexes, an EIQ ConventionalAdapter (SDV) and a HYBRID ADAPTER. Data sources shown external to the Cloud (on-premise, SaaS, data center, other Cloud, etc.): Credit Reporting Bureau (Oracle) and Credit Card Transactions (MapR Hive).]
SEAMLESS INTEGRATION OF MASTER CUSTOMER DATA WITH OPERATIONAL/TRANSACTIONAL AND ANY OTHER DATA
SDV = Standard Data View, aka Standard/Common Data Model
MCR = Master Customer Record
* = All SDF Adapters have Standard Drivers/SQL
SmartData Fabric® capabilities address issues
[Diagram: the same Hybrid Cloud 2.0 configuration (Applications, EIQ Federation Servers, EIQ SuperAdapters, EIQ ConventionalAdapter, HYBRID ADAPTER, Content Indexes, Master Customer Index, MCRs and Link Indexes over Credit Reporting Bureau (Oracle) and Credit Card Transactions (MapR Hive)), annotated with the capabilities below.]
1. Data discovery
2. Only select data
3. Links/relationships
4. MDM
5. SDVs
6. Unstructured data
7. Data monitoring
8. Standard drivers/SQL
9. EIQ layer
10. Advanced access control
11. Hybrid Cloud 2.0 (and 1.0)
SDV = Standard Data View, aka Standard/Common Data Model
MCR = Master Customer Record
* = All SDF Adapters have Standard Drivers/SQL
Smartphone CRM app invoking new BPM-based workflows with
write-back to legacy systems through standard APIs/data services
[Diagram: ORGANIZATION A, ORGANIZATION B and ORGANIZATION C, each with Applications, EIQ Federation Servers, and EIQ Adapters with Indexes over data sources including Patient Management, EHR Type 1, EHR Type 2, EHR Type 3 and Labs. Web Servers, FHIR REST APIs and a BPM Workflow sit on top. Two deployments are shown: Hybrid Cloud 1.0, where data sources remain on premise and local SmartData Fabric is deployed on premise, and Hybrid Cloud 2.0, where data sources remain on premise and local SmartData Fabric is deployed on the Public Cloud.]
• Patient-centric smartphone app interacts with legacy data sources through new workflows developed and orchestrated by BPM software
• BPM workflows interact with data sources through standard FHIR REST APIs provided as data services
• BPM workflows both read and write back to legacy data sources
Special note on Hybrid Cloud 2.0
1. Data is everywhere, so leave it where it is: on-premise, on mainframes, in data centers, cloud(s), SaaS, third-parties, Web, social media, etc.
2. Run SmartData Fabric® unconventional data virtualization in the cloud, leveraging index-based, in addition to conventional, federated adapters for data-related pre-processing and query processing in and/or from the cloud; there is no need to install and run anything elsewhere, as is the case with Hybrid Cloud 1.0
- Establish index update process through changed data capture (CDC)
- Multiple CDC options, including near real-time (NRT)
3. Focus on data that needs processing for quality, standardization, security, relationship mapping and master data management (MDM), with various options for the rest of the data
- Enable data-related fundamentals
- Address data, data source and access control issues
4. Multiple configuration options, including (a) some data indexed and the rest stays in the source, (b) all data indexed and stored in indexes, and (c) no data indexed and all queries on data source, with other options in-between
5. Avoid incomplete or incorrect query results, query load and/or poor query performance of conventional data virtualization/federation, i.e., avoid dependence on data sources, data in sources or the data source's own access control
6. Immediate short-to-medium-term implementation
7. Optional, medium-to-longer-term transition-migration to the cloud
Example multiple data source SDF configuration
[Diagram: Application(s) connect over TCP/IP through the WhamTech ODBC/JDBC Driver, APIs or Web/data services to an EIQ Federation Server, with further EIQ Federation Servers behind two FIREWALLs. Components shown: Social Media Feed, Hadoop, Mainframe and RDBMS sources with Indexes and EIQ SuperAdapters; an ERP System; an EIQ Conventional Adapter; Salesforce with a 3rd Party Adapter.]
• Adapters and federation servers independently configurable and accessible at multiple levels
• Potential LIFO/FIFO query processing
Example shared-nothing architecture
[Diagram: Application(s), a top-level EIQ Federation Server, three further EIQ Federation Servers, six EIQ SuperAdapters, three sets of Indexes and the Data Source.]
• Indexes can be multiple sharded segments or replicated copies
• Out-of-the-box configurable backup, failover and load balancing = high availability
Initial EIQ Adapter configuration, index build and data view mapping
[Diagram: Automated SmartData Discovery and Classification (ASDAC) thus far. Elements shown: Network Asset and Device Discovery; Data Source Discovery; Metadata Discovery and Semantic Mapping; Data Discovery and raw index-based Data Profiling; Data Classification and Data Security; Develop and test Data Transforms using profiles; Data Source; Data Read, Transform/clean-up (and Index); EIQ Adapter* w/SDV**; EIQ Indexes; Indexes mapped to SDV; Distributed Metadata Repository, incl. Data Governance.]
• Index schema and names are usually the same as the data source
• Twelve ways to build and maintain indexes
• Indexes usually do not store data, only queryable representations
• Alternate use of raw indexes to initially build EIQ Indexes
*EIQ SuperAdapter and EIQ TurboAdapter
**Standard Data View
EIQ Adapter index update, query and results retrieval
[Diagram: Application(s) and Middleware; EIQ Federation Servers (sub-middleware) w/SDV; EIQ Server (sub-middleware) with EIQ Adapter* w/SDV** and EIQ Indexes over the Data Source; multiple other data sources; Distributed Metadata Repository, incl. Data Governance.]
• Data Read, Transform/clean-up (and Index), with continual EIQ Indexes updates
• Queries resolved in the EIQ Adapter and EIQ Indexes
• Result-set pointers to data in source
• Raw results data usually transformed/cleaned-up from source
• Results provided in almost any format
• Applications/middleware connect with standard drivers or Web Services and SQL***
• User-level access
*EIQ SuperAdapter and EIQ TurboAdapter
**Standard Data View
***Future OQL, SPARQL and NoSQL options
Index updates/changed data capture
Data Schema Level
• Triggers
• Transaction / change / redo logs
• Existing replication / backup / change data capture processes
• Batch updates (schema file export)
• Incremental updates (schema file export)
Results Level
• Batch updates (flat file export)
• Incremental updates (flat file export)
• Polling*
• Update / event notifications*
Either Data Schema Level or Results Level
• Crawler / spider
• Message queues
• RSS feeds*
* = User-level access
LEGEND: Near real-time, low rate; Near real-time, high rate; Batch / incremental, high volume; Batch / incremental, low volume; Preferred option
Axis: DECREASING INTRUSIVENESS
Source: #WhamTech SmartData Fabric Power of Indexes
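Whichever capture option is used, the index side reduces to applying a stream of change events. The sketch below assumes a simple event shape (`op`, `recno`, `value`) and a single-column index; both are hypothetical and chosen only to show why an index must remember the previously indexed value in order to handle updates and deletes.

```python
from collections import defaultdict


class ContentIndex:
    """Value -> record numbers for one column, kept current from change events."""

    def __init__(self):
        self.postings = defaultdict(set)
        self.current = {}  # record number -> indexed value, used to undo old entries

    def apply(self, event):
        """Apply one event: {'op': 'insert' | 'update' | 'delete', 'recno', 'value'}."""
        recno = event["recno"]
        old = self.current.pop(recno, None)
        if old is not None:
            self.postings[old].discard(recno)
            if not self.postings[old]:
                del self.postings[old]
        if event["op"] in ("insert", "update"):
            self.postings[event["value"]].add(recno)
            self.current[recno] = event["value"]

    def lookup(self, value):
        return sorted(self.postings.get(value, ()))


index = ContentIndex()
for ev in [
    {"op": "insert", "recno": 1, "value": "Dallas"},
    {"op": "insert", "recno": 2, "value": "Austin"},
    {"op": "update", "recno": 2, "value": "Dallas"},
    {"op": "delete", "recno": 1},
]:
    index.apply(ev)
```

The same `apply` loop serves a near real-time feed (triggers, logs, message queues) or a batch export replayed as events; only the arrival rate differs.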
Levels of data representations and processes
[Diagram: levels of data representations, labelled L1 to L7, from data sources to what applications/end-users see, connected by PROCESS, IMPORT and QUERY steps.]
DATA SOURCE SCHEMA
• T1: PERSON (PER_ID, PER_LNAME, PER_FNAME)
• T2: PERADD (PER_ID, ADD_ID)
• T3: ADDRESS (ADD_ID, PROPERTY_NO, ADD_1, ADD_2, ADD_CITY, ADD_STATE, ADD_ZIP)
OPTIONAL DATA LAKE SCHEMA
• Same tables as the data source schema; optional copy or conversion to another format, e.g., document
INDEXES
• Typically do not store data, but, optionally, can
• Index schemas are usually the same as or very similar to data source schemas
• CONTENT INDEX, e.g., T1:PERSON.PER_FNAME: Larry -> Recno WURN005, WURN245, WURN912; Curly -> Recno WURN193; Moe -> Recno WURN087, WURN332
• LINK INDEX: lists of related record numbers (RECORD NO. / Recno), e.g., WURN005, WURN245, WURN087, WURN332, WURN912
• INDEXED VIEW, e.g., materialized aggregation (Recno, Last Name, First Name, No. Properties Owned): 1, Smith, Curly, 7; 2, Jones, Larry, 1; 3, Parker, Moe, 3; …
STANDARD DATA MODEL (could be industry standard, e.g., ACORD, HL7 or NIEM)
• PERSON (PER_ID, Last Name, First Name, Sex, DOB, SSN, Height, Weight, Eye Color)
• LICENSE (LIC_ID, License No., Class, Date Issued, Date Expires, Restrictions)
• ADDRESS (ADD_ID, Property No., St. No., St. Name, St. Type, Apt./PO/Ste. No., City, State, ZIP)
• VEHICLE (VEH_ID, VIN, Year, Manufacturer, Model, Color)
• Relationships: VEH-REG-ADD (VEH_ID, ADD_ID); PER-REG-ADD (PER_ID, ADD_ID); PER-OWNS-VEH (PER_ID, VEH_ID); PER-OWNS-ADD (PER_ID, ADD_ID); PER-LIC (PER_ID, LIC_ID); PER-LIC-ADD (PER_ID, ADD_ID)
• Can be virtual and hierarchical
ALL VIRTUAL LOGICAL DATA VIEWS
• STANDARD DATA VIEW
• GRAPH DB
• ADDITIONAL BUSINESS OBJECT(S)
• DATA MART(S)
What applications/end-users see
• e.g., PERSON WHO OWNS PROPERTY (Last Name, First Name, Property No., St. No., St. Name, St. Type, Apt./PO/Ste. No., City, State, ZIP)
Source: #WhamTech Link Indexes and Ontologies
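The step from link indexes to a virtual view such as PERSON WHO OWNS PROPERTY, and to an indexed view such as properties owned per person, can be sketched as follows. The record numbers, sample values and in-memory dictionaries are hypothetical; only the entity and relationship names follow the slide.

```python
# Hypothetical records keyed by record number, loosely following the PERSON
# and ADDRESS entities above.
PERSON = {
    5: {"last": "Smith", "first": "Curly"},
    87: {"last": "Parker", "first": "Moe"},
}
ADDRESS = {
    701: {"property_no": "P-1", "city": "Dallas", "state": "TX"},
    702: {"property_no": "P-2", "city": "Austin", "state": "TX"},
    703: {"property_no": "P-3", "city": "Reno", "state": "NV"},
}

# Link index: relationship name -> set of (person recno, address recno) pairs.
LINKS = {"PER-OWNS-ADD": {(5, 701), (5, 702), (87, 703)}}


def person_who_owns_property(state):
    """Virtual view: follow PER-OWNS-ADD links, filtered by address state."""
    view = []
    for per, add in sorted(LINKS["PER-OWNS-ADD"]):
        if ADDRESS[add]["state"] == state:
            view.append({**PERSON[per], **ADDRESS[add]})
    return view


def properties_owned():
    """Indexed-view style aggregation: number of properties per person recno."""
    counts = {}
    for per, _ in LINKS["PER-OWNS-ADD"]:
        counts[per] = counts.get(per, 0) + 1
    return counts


tx_owners = person_who_owns_property("TX")
owned = properties_owned()
```

Because the relationship lives in the link index, the view needs no join table in the source and can span records held in different systems.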
SmartData Fabric® enables operational and
analytical solutions, and bridges the gap
between them
SDF bridges the gap between the enterprise, and reporting, BI, analytics and other apps
[Diagram: SmartData Fabric® sits between the Enterprise (operational, transactional, other and external data sources) and reporting, BI, analytics and other apps.]
• Flows: Integration (read only); Single customer view (read only); Interoperability (read and write); Data Provisioning; Queries/Results; Other Predictive rules/interactive CRM and BPM
• SmartData Fabric® functions: Data Security; Data Quality; Data Links/Relationships; Data Masking, Tokenization and Encryption; Master Data; Data Governance; Data Aggregation; Data Mapping; Access Security
• Multiple entity centricities: Global, Regional, Local, Group, Population, Individual
Unique security-centric distributed indexed adapter-based data virtualization, data federation and data integration software for the following general solutions:
• Automated data discovery, profiling, quality, standardization, governance and relationships mapping
• Advanced data access and data security – seen as a security solution
• Virtual data warehouse and/or virtual data mart
• Data lake + data management + master data management = clean and usable data reservoir
• Data provisioning for highly curated, self-serve reporting, BI and analytics
• Interoperability with write-back to data sources – integrated data, not just app to app
• Seamless, automatic and near real-time updateable distributed master data management
• Virtual graph database and link analysis, and interactive graph/link visualization
• Hybrid Cloud 2.0 where data sources remain wherever they reside, but run all compute in the Cloud or data center
• Near real-time data source monitoring, event processing and Business Process Management (BPM)
• Embrace and enable STANDARDS such as ODBC, JDBC, REST APIs and SQL, and standard applications
➔ Discover, secure, access, integrate and deliver INTEGRATED structured, unstructured and semi-
structured data from almost ANYWHERE to almost ANYWHERE in almost ANY FORMAT, aka Actionable
Data Catalog
SmartData Fabric® general solutions
Revision 12.2 Copyright 2020 WhamTech, Inc. 40
SmartData Fabric® basic solutions
Revision 12.2 Copyright 2020 WhamTech, Inc. 41
SmartData Fabric® incremental solutions
Revision 12.2 Copyright 2020 WhamTech, Inc. 42
Services-based combination for vertical market-specific solutions
Revision 12.2 Copyright 2020 WhamTech, Inc. 43
BPM Workflows/Automation (Third-party)
Actionable Data Catalog (WhamTech)
API Catalog (Third-party)
✓
REST
APIs
Support
REST
APIs
Direct
REST
APIs
Indirect
Basic architecture for services-based vertical market-specific solutions
Revision 12.2 Copyright 2020 WhamTech, Inc.
44
Applications (many options)
Data Sources: Mainframes, Databases, Files, Logs, Office docs, Email, Big Data, Web docs, Social media, Cloud DB, Streaming, IoT, Applications
Actionable Data Catalog
Index-based and conventional federated adapters
API Catalog
BPM Workflows/Automation
Optional Data Lake – Distributed, Partial or Centralized
ADMINISTRATION/CONFIGURATION
ACCESS SECURITY
Basic SmartData Fabric® architecture
Revision 12.2 Copyright 2020 WhamTech, Inc.
45
Virtual interactive link analysis and visualization
Virtual reporting, BI and analytics
Virtual MDM
Virtual cybersecurity
Virtual data security
Virtual event processing
Applications
WhamTech EIQ Adapters (indexed and conventional federated)
Data Sources: Mainframes, Databases, Files, Logs, Office docs, Email, Big Data, Web docs, Social media, Cloud DB, Streaming, IoT, Applications
WhamTech SmartData Fabric® (SDF)
Independent structured and unstructured Indexes, and indexed views only for data that needs it!
Standard drivers, APIs, Web/data services, SQL and potential other query languages
Virtual network, data source and advanced data discovery
Data preprocessing (data is read and usually not stored, but can be stored in indexes and indexed views)
Optional Data Lake – Centralized or Distributed
WhamTech EIQ Federation Servers
WhamTech EIQ Federation Servers
Revision 12.2 Copyright 2020 WhamTech, Inc. 46
Enterprise conventional data life-cycle
Created
Stored
Copied
Quality/improved
Stored
Copied
Quality/improved
Stored
Copied
Quality/improved
Stored
Related
Reported
Analyzed
Acted on
Operational Data Store (ODS)
Data Warehouse (ETL and DW)
Data Mart (DM)/Analytics Database/Link Analysis-Graph Database
Log/Transaction System
Indexed
Indexed
Data copied multiple times
Discarded/retained
Indexed
Revision 12.2 Copyright 2020 WhamTech, Inc. 47
Big Data life-cycle similar to enterprise data life-cycle
Created
Stored
Copied x 3
Quality/improved
Stored
Copied
Quality/improved
Stored
Copied
Quality/improved
Stored
Related
Reported
Analyzed
Acted on
Big Data Lake/Reservoir (similar to ODS)
Big Data Refinery (similar to ETL)
Log/Transaction System
Indexed
Indexed
Data copied multiple times
Discarded/retained
Big Data/Analytics Database/Link Analysis/Graph Database
Indexed
Revision 12.2 Copyright 2020 WhamTech, Inc. 48
Created
Stored
Copied
Quality/improved
Stored
Copied
Quality/improved
Stored
Copied
Quality/improved
Stored
Indexed
Related
Reported
Analyzed
Acted on
Indexed
Indexed
Discarded/retained
Created
Stored
Reported
Analyzed
Acted on
Discarded/retained
Quality/improved
Indexed
Related
Master Data
Capabilities in the SmartData
Fabric® support applications
WhamTech
SmartData Fabric®
SDF eliminates most conventional data life-cycle stages
Revision 12.2 Copyright 2020 WhamTech, Inc. 49
Created
Stored
Quality/improved
Indexed
Related
Copied
Quality/improved
Stored
Indexed
Reported
Analyzed
Acted on
WhamTech
SmartData Fabric®
Log/Transaction System
Master data
Discarded/retained
Big Data/Analytics Database/Link Analysis/Graph Database
Data provisioning for Big Data and other analytics
Data mapping, quality, security, masking, tokenization, encryption and link mapping,
and master data, addressed
• Assume data engineering role
• Eliminate up to 80% of time spent by
expensive data scientists and
analysts preparing data
• Tend towards real-time analytics and
feedback to operational/transactional
systems
The End
Appendix: Backup material
Revision 12.2 Copyright 2020 WhamTech, Inc. 50
SmartData Fabric® for data virtualization/federation
Revision 12.2 Copyright 2020 WhamTech, Inc. 51
Discovery
Raw indexing and profiling (data/(pointers) discarded)
Classification and categorization
Access and data security
Cleansing, transformation and standardization
Masking, tokenization and encryption
Production indexing (data discarded)
Standard data view mapping
Indexed views
BI, analytics, CRM and BPM support
Link mapping/indexing
Master Data Management (MDM)
Event processing
[Event correlation]
[Anomaly detection]
High performance parallel query processing
Integration (results read only)
Interoperability (results read and write)
Link Analysis/Graph Database
WhamTech key differentiators addressed upfront
SDF combines best and overcomes worst of alternatives
Revision 12.2 Copyright 2020 WhamTech, Inc. 52
No. | Feature | SmartData Fabric® | Data Warehouse | Conventional Federated Data | Search | Data Lake
1 Query clean, transformed and standardized data ✓ ✓
2 Consistent and multiple indexes and types ✓ ✓ (✓)
3 Near real-time updateable pre-aggregated, pre-calculated and pre-joined views ✓ ✓
4 Results when data sources unavailable ✓ ✓ ✓ or ✓
5 Row, column and data element security ✓ ✓ (✓)
6 Data stays in original format ✓ ✓ ✓
7 Data remains in source ✓ ✓ ✓ or
8 User-level access to source data ✓ ✓
9 Latest data available ✓ ✓
10 Drill-down capability ✓ ✓ (✓)
SDF better than or as good as alternatives
Revision 12.2 Copyright 2020 WhamTech, Inc. 53
No. | Feature | SmartData Fabric® | Data Warehouse | Conventional Federated Data | Search | Data Lake
11 Actively monitor data sources ✓ () ✓ or
12 Work with unstructured data/text analytics ✓ ✓ ✓
13 Unlimited query options and performance ✓ (✓)*
14 Data/entity relationship/link mapping ✓ ✓ or ✓ or
15 Write back to data sources ✓ ✓ or
16 Avoid schema transforms ✓ ✓ or ✓
17 Full text search ✓ ✓
*with data marts
SDF slightly disadvantaged compared to alternatives
Revision 12.2 Copyright 2020 WhamTech, Inc. 54
No. | Feature | SmartData Fabric® | Data Warehouse | Conventional Federated Data | Search | Data Lake
18 No index or query load on data sources (✓) ✓ ✓ ✓
19 Data source owners not aware of queries (✓) ✓ ✓ ✓
20 Archive options (✓) ✓ () ✓
21 Good for application data sources (✓) ✓ ✓ ✓
22 Minimal additional system cost () ✓ ()
23 No need for data or index update process ✓
Example projects (1 of 2)
1. Optum/NAMM: Hybrid Cloud 2.0-type access to 100s and eventually 1000s of remote partner
healthcare data sources using selective indexing to provide a single patient virtual logical
view – data cannot be copied or moved
2. General Dynamics (GD): Tableau with Single Sign-On (SSO) enablement on multiple data
sources, including PeopleSoft HR and SaaS, using both index-based and conventional federated
adapters – seen as a data access and data security solution by GD
3. Northrop Grumman: Major DoD cyber program and platform – data cannot be copied or moved
and real-time access is needed – potential inclusion
4. Major healthcare payer: Matching unstructured contract content to structured claims data –
involves ML-trained entity extraction - potential project
5. Major healthcare provider: Test data access to production data using data virtualization + access
and data security + data masking - potential project
Revision 12.2 Copyright 2020 WhamTech, Inc. 55
Example projects (2 of 2)
6. Past work with major DoD and intel government contractors – high performance and complex
query processing, including up to 60 billion records/day on HBase
7. POC for single patient view for NHS Trust in the UK – 3 organizations, 7 data sources = 4 on
premise and 2 in Cloud - Hybrid Cloud, use NHS MPI and own MPI, FHIR APIs, services and
AWS
8. Message Bank for very large medical academic delivery system for HL7 and other messages –
Cassandra target data source, real-time, parse and index, VMPI, FHIR APIs, and future support
for reporting, BI and analytics, including Spark and ML
9. Bitcoin/Blockchain transaction reporting, BI and analytics for fraud detection – graph visualization
10. Virtual graph database, link analysis and graph visualization using simple SQL – OEM KeyLines
visualization
Revision 12.2 Copyright 2020 WhamTech, Inc. 56
Reasons why conventional data virtualization fails to deliver 100%
FOR EACH DATA SOURCE:
• DATA QUALITY ISSUES
• Data needs cleansing: typos, transpositions, missing or wrongly placed values, etc.
• DATA PRE-PROCESSING NEEDS
• Text analytics on unstructured data, e.g., entity extraction, and other analytics on structured data
• DATA STANDARDIZATION ISSUES
• Different format, type, values, etc. – can impact range queries
• INDEX and INDEXED VIEW LIMITATIONS OR NOT AVAILABLE
• Queries unable to execute, need full-table scans and/or poor query performance
• QUERY PROCESSING LIMITATIONS
- Capabilities
- Performance/scale
- Load
• RESULTS DATA INCORRECT OR INCOMPLETE
• ACCESS CONTROL and DATA SECURITY LIMITATIONS
• Assumes AD/LDAP-based IAM is in place; is SSO available?
• Limited security levels, e.g., RBAC, ABAC/RLS and CLS
• May read/access incorrect, protected or sensitive data
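The standardization point above (different formats can impact range queries) can be illustrated with a minimal sketch. The sample values, the list of formats and the function names are assumptions made for illustration only; they are not taken from any WhamTech product.

```python
from datetime import date, datetime

# Hypothetical raw date values as they might appear in three different sources.
RAW_DATES = ["03/15/2020", "2020-01-07", "07.02.2020"]

# Formats tried in order; an assumption for this sketch, not a product list.
KNOWN_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def standardize(value: str) -> date:
    """Parse a raw date string into a date, trying each known format."""
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(value, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {value!r}")


def in_range_raw(values, low: str, high: str):
    """Naive range query that compares the unstandardized strings."""
    return [v for v in values if low <= v <= high]


def in_range_standardized(values, low: date, high: date):
    """Range query over standardized values."""
    return [v for v in values if low <= standardize(v) <= high]


# All three raw values fall in Q1 2020, but only one survives a string comparison.
raw_hits = in_range_raw(RAW_DATES, "2020-01-01", "2020-03-31")
std_hits = in_range_standardized(RAW_DATES, date(2020, 1, 1), date(2020, 3, 31))
```

Here the string comparison finds only the value that happens to share the query's format, while the standardized comparison finds all three.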
Revision 12.2 Copyright 2020 WhamTech, Inc. 57
SmartData Fabric® addresses issues with… (1 of 2)
DATA, by enabling:
1. Automated data discovery, profiling, identification, quality, standardization and governance that:
a. Can be acted on directly as part of SmartData Fabric® index and query processing layer vs. a one-time handoff to a
metadata repository, data governance or ETL system
b. Updates complete metadata/data profiles as indexes are updated
2. Data to be cleansed, transformed, standardized, masked, tokenized and/or encrypted in indexes and indexed
views, e.g., for personal, sensitive, MDM, other entity, “dirty” and incomplete data
3. Data/entity link/relationship mapping within and across multiple data sources for MDM, virtual graph database
and other uses
4. Seamless, automatic and optionally distributed MDM with near real-time updates for integration within and
across multiple data sources
5. Standard data views, business objects and knowledge graph across all data
6. Integration of unstructured data with structured data through text analytics, e.g., entity extraction and OCR,
and search
7. Data monitoring, event processing and BPM workflows in near real-time, e.g., operational reporting, BI and
analytics
Revision 12.2 Copyright 2020 WhamTech, Inc. 58
DATA SOURCES, by enabling:
8. Standard drivers of ODBC, JDBC and APIs, and/or SQL query processing for incompatible formats, e.g.,
mainframe files, file systems, IoT devices, office docs, email, Web pages and other unstructured/semi-
structured data sources
9. An external indexing and query processing layer that can absorb the load of external queries
DATA SOURCE ACCESS CONTROL, by enabling:
10. Advanced access control within and across domains, e.g., AD/LDAP, IAM, SSO, RBAC, ABAC/RLS and CLS,
regardless of data source support for any of these – also applies to conventional federated adapters
DEPLOYMENT, by enabling:
11. Hybrid Cloud 2.0 where compute is in the Cloud or a data center, but data sources remain remote on-premise,
in data centers, SaaS, third-parties, multi-Cloud, etc., in addition to Hybrid Cloud 1.0
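As a rough illustration of item 10, the sketch below applies row-level security (RLS) and column-level security (CLS) in a layer outside the data source, so the source itself needs no support for either. The profile fields, column names and sample rows are hypothetical and do not describe the actual product's access model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessProfile:
    """Hypothetical access profile: which columns and which rows a role may see."""
    role: str
    allowed_columns: frozenset
    allowed_regions: frozenset


def apply_security(rows, profile: AccessProfile):
    """Filter rows by region (RLS), then drop disallowed columns (CLS)."""
    secured = []
    for row in rows:
        if row["region"] not in profile.allowed_regions:
            continue  # row-level security: the whole record is withheld
        secured.append(
            {col: val for col, val in row.items() if col in profile.allowed_columns}
        )
    return secured


ROWS = [
    {"patient": "P1", "region": "TX", "diagnosis": "D1", "ssn": "111-11-1111"},
    {"patient": "P2", "region": "CA", "diagnosis": "D2", "ssn": "222-22-2222"},
]

analyst = AccessProfile(
    role="analyst",
    allowed_columns=frozenset({"patient", "region", "diagnosis"}),
    allowed_regions=frozenset({"TX"}),
)

visible = apply_security(ROWS, analyst)
```

The same function can sit in front of any adapter's result set, which is the sense in which the control is independent of the source system.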
Revision 12.2 Copyright 2020 WhamTech, Inc. 59
SmartData Fabric® addresses issues with… (2 of 2)
Conventional vs. SDF adapters costs and ROI comparison
Revision 12.2 Copyright 2020 WhamTech, Inc. 60
Attribute | Conventional Federated Data Access Adapters | WhamTech SDF Adapters
Costs – TCO | Up to 1000% of WhamTech | 100%
ROI (assuming TCO as basis, and revenue gains and cost savings) | 0 – 10 | 10 – 100
Capabilities | Basic | Advanced – more capabilities for less cost
Perpetual License Costs – CAPEX | IBM and others > 200% of WhamTech; some freeware and Red Hat < WhamTech | 100%, starting at $10K per data source
Lease/SaaS Costs | Assume 40% of perpetual license costs per year, including maintenance and support | 40% of perpetual license costs per year, including maintenance and support
Implementation Costs | Up to 500% of WhamTech, long duration to implement | 100%, relatively simple to implement = low costs and short duration
Maintenance and Support Costs | 18% of perpetual license costs – included in lease/SaaS costs | 18% of perpetual license costs – included in lease/SaaS costs
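The license percentages in the comparison above lend themselves to a small worked example. This sketch assumes the $10K starting price for one data source, that 18% maintenance is billed every year from year 1 on a perpetual license, and that no discounting or implementation cost is included; the function names are illustrative.

```python
def perpetual_total(license_price: int, years: int) -> int:
    """Cumulative cost of a perpetual license plus 18% yearly maintenance."""
    return license_price + (license_price * 18 // 100) * years


def lease_total(license_price: int, years: int) -> int:
    """Cumulative cost of a lease at 40% of the perpetual price per year."""
    return (license_price * 40 // 100) * years


def break_even_year(license_price: int, horizon: int = 20):
    """First year in which the perpetual option costs no more than leasing."""
    for year in range(1, horizon + 1):
        if perpetual_total(license_price, year) <= lease_total(license_price, year):
            return year
    return None


PRICE = 10_000  # starting perpetual license price per data source, from the table
costs_by_year = {
    year: (perpetual_total(PRICE, year), lease_total(PRICE, year))
    for year in range(1, 6)
}
```

Under these assumptions leasing is cheaper through year 4 ($16,000 vs. $17,200) and the perpetual license becomes cheaper in year 5 ($19,000 vs. $20,000).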
Simplified* layer capabilities
Revision 12.2 Copyright 2020 WhamTech, Inc. 61
Discovery, profiling and correlation – device, source and data
Security – classification, data masking, tokenization and encryption
Indexing – content, security, extracted entities, indexed views, unstructured and Link Indexes™
Quality – cleansing, transformation and standardization
Analytics – parsing, categorization, entity extraction and other analytics
Standard data view mapping – more than one possible
Master data management
Event processing Event correlation [Anomaly detection]
Support for BI/analytics
Support for CRM
Support for BPM/decision support
Support for interoperability
RBAC and data loss prevention
Link analysis/graph database
Visualization
Big Data/analytics data provisioning
Standard drivers, APIs, Web/data services, SQL and other query languages
Post-index, standard
data view, multi-record
processing
Built-in support for
common applications
Built-in advanced
capabilities
Pre-index, single record
processing
AUTOMATION
Applications
Data Sources: Mainframes, Databases, Files, Logs, Office docs, Email, Web docs, Social media, Big Data, Streaming, Cloud DB, IoT, Applications
*Detailed layer diagram in Appendix on slide 75
Optional Data Lake – Centralized or Distributed
EIQ Adapter
Data source-specific
Query Transform
Application to Standard Data View Mapping
SDF EIQ Adapter index and query process
Revision 12.2 Copyright 2020 WhamTech, Inc. 62
EIQ Product
front-end
Data
Source
Data
Source
EIQ Indexes
Update Server
Data Profiler
Read Transform
Index (RTI) Tool
Data Transforms/clean-ups
Data Retrieval
CONVENTIONAL DRIVER
OR BULK LOAD
USER API / DRIVER
EIQ Adapter
Other data source EIQ Adapters
and EIQ Federation Servers
DISCOVERY
INITIAL INDEX BUILD
CONTINUOUS INDEX UPDATE
QUERY PROCESSING
RESULTS RETRIEVAL
STANDARD
DRIVER
SQL
DEVELOP
and TEST
USED BY
BUILD
Transaction
Log
MESSAGE QUEUE
Data Discovery
Automatic Query Processing
BI / Analytics / Application(s)
Standard Data View Mapping to EIQ Indexes
EIQ Federation Server
EIQ Federation
Server
Result-set
data source
pointers
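The stages named in this diagram (initial index build, query processing, result-set data source pointers, results retrieval) can be sketched as follows. This is a toy model under stated assumptions: `DataSource`, `ContentIndex` and `standardize` are hypothetical names, and the real EIQ Adapter internals are not described in this deck.

```python
from collections import defaultdict


def standardize(value: str) -> str:
    """Illustrative clean-up applied both at index time and at query time."""
    return value.strip().upper()


class DataSource:
    """Stand-in for a source system; rows are addressed by a pointer (row id)."""

    def __init__(self, rows: dict):
        self._rows = rows

    def scan(self):
        return self._rows.items()

    def fetch(self, pointer):
        return self._rows[pointer]


class ContentIndex:
    """Maps standardized column values to pointers; stores no source rows."""

    def __init__(self, column: str):
        self.column = column
        self._postings = defaultdict(set)

    def build(self, source: DataSource) -> None:
        for pointer, row in source.scan():
            self._postings[standardize(row[self.column])].add(pointer)

    def lookup(self, value: str) -> set:
        return set(self._postings.get(standardize(value), set()))


source = DataSource({
    1: {"last_name": " Smith ", "city": "Dallas"},
    2: {"last_name": "Jones", "city": "Austin"},
    3: {"last_name": "SMITH", "city": "Plano"},
})
index = ContentIndex("last_name")
index.build(source)                                    # initial index build
pointers = index.lookup("smith")                       # query runs against the index
results = [source.fetch(p) for p in sorted(pointers)]  # results retrieval by pointer
```

The query touches only the index; the source is read again only to retrieve the rows that the result-set pointers identify.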
SmartData Fabric® Keyword Descriptions (1 of 2)
DISTRIBUTED aka FEDERATED (not centralized)
VIRTUAL (leaves data where it resides – uses federated adapters and/or stores data
in indexes as needed)
SECURITY (data and access to it)
DATA MANAGEMENT (data discovery, classification, security, processing/analytics,
cleansing, transformation, standardization, mapping to a standard data view,
linking/matching within and across data sources, indexing and query processing)
MASTER DATA MANAGEMENT (hybrid [limited repository + full registry] near real-
time distributed, seamless and automatic integration with data access)
ANALYTICS (built-in, externally run against and highly curated data provisioning for)
INDEPENDENT (of where data resides and associated systems, and configurations)
INTEROPERABLE (can write back to data sources)
Revision 12.2 Copyright 2020 WhamTech, Inc. 63
SmartData Fabric® Keyword Descriptions (2 of 2)
INDEXES SYNCED TO DATA SOURCES IN REAL-TIME TO BATCH (twelve
change data capture options)
VIRTUAL GRAPH DATABASE (semantic model)
VIRTUAL LINK ANALYSIS (find connections between entities, n degrees of
separation)
GRAPH/LINK VISUALIZATION (highly interactive thin client, OEM tool)
RUNS IN CLOUD, ON PREMISE, IN DATA CENTERS OR AS HYBRID (including
Hybrid Cloud 2.0)
EVENT PROCESSING (as indexes and index views are being updated)
ULTIMATE METADATA MANAGEMENT (complete on all data)
BUILDS AND SUPPORTS DATA GOVERNANCE (bottom-up/edge-in)
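The VIRTUAL LINK ANALYSIS keyword above (find connections between entities, n degrees of separation) corresponds to a bounded breadth-first search over entity links. The sketch below assumes a simple in-memory link index; the entity identifiers and function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical links between entities, e.g. as produced by link mapping.
EDGES = [
    ("person:A", "property:12"),
    ("property:12", "person:B"),
    ("person:B", "phone:555"),
    ("phone:555", "person:C"),
]


def build_link_index(edges):
    """Build an undirected adjacency map from pairs of linked entities."""
    links = defaultdict(set)
    for left, right in edges:
        links[left].add(right)
        links[right].add(left)
    return links


def within_degrees(links, start, max_degrees):
    """Return {entity: degrees of separation} for entities within max_degrees."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        entity = queue.popleft()
        if distance[entity] == max_degrees:
            continue  # do not expand beyond the requested number of degrees
        for neighbor in links.get(entity, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[entity] + 1
                queue.append(neighbor)
    del distance[start]
    return distance


link_index = build_link_index(EDGES)
nearby = within_degrees(link_index, "person:A", 2)
```

With two degrees of separation, person A reaches the property and its other owner, but not the phone number or person C.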
Revision 12.2 Copyright 2020 WhamTech, Inc. 64
Advanced data management
• Combine the best and overcome the worst of conventional approaches of data warehousing, federated data
access and enterprise search through index-based federated adapters to create a Hybrid SmartData
Fabric/Lake without needing to copy or move data from source systems, although that is an option
• Access adapters and federation servers at any level and from any location with advanced access control
• Include all data sources – from mainframes to IoT devices, on premise, Cloud, Hybrid Cloud, external, etc.
• Use multiple types of indexes and indexed views – distributed, 100% contiguous across data sources,
columnar, file-based and contain pointers to source data or data itself
• Federate/distribute data governance built and maintained from the bottom up as systems are discovered,
read, indexed and metadata captured – combine with advanced access security and data security - can
obtain (and store) a complete centralized data governance view at any time, and intervene and impose as
needed
• Federate/distribute metadata repository, data discovery, classification, security, quality, transforms,
relationships and mapping to standard data views
• Seamless, automatic and near real-time updateable integration of master data management to enable single
customer/patient and other entity views across the extended organization – can also federate/distribute
master data
Revision 12.2 Copyright 2020 WhamTech, Inc. 65
Advanced application support
• Combine queries on structured, semi-structured and unstructured data
• Accelerate query processing on existing systems, with almost no load on data sources
• High performance parallel/edge query processing
• Enable direct access to data sources through indexes and/or use indexes to represent and use data logically
as (1) objects, (2) relational, (3) hierarchical and (4) NoSQL/Big Table
• Built-in virtual graph database, link analysis and graph visualization - use simple SQL
• Event processing - monitor changes to data sources through indexes and indexed views, trigger workflows,
and update applications and visualizations, e.g., operational dashboards and graphs
• True interoperability based on single customer/patient views with both read and write-back to data sources –
goal to have almost any application working with almost any data source(s)
• Provision highly curated data to Big Data/analytics in near real-time
• Bridge the gap between enterprise operational/transactional systems and reporting, BI, analytics and other
applications – tend towards closing the loop in near real-time
• Partnered with Tableau (reporting, BI and analytics tool) and Cambridge Intelligence (KeyLines highly
interactive graph/link visualization tool)
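The event processing bullet above (monitor changes through indexes and trigger workflows) can be sketched as an index that evaluates rules whenever an entry changes. `MonitoredIndex`, the rule signature and the threshold are assumptions for illustration and do not represent the product's API.

```python
class MonitoredIndex:
    """Toy index that evaluates registered rules whenever an entry changes."""

    def __init__(self):
        self._values = {}
        self._rules = []

    def on_change(self, predicate, action) -> None:
        """Register a rule: action runs when predicate(key, old, new) is true."""
        self._rules.append((predicate, action))

    def update(self, key, new_value) -> None:
        old_value = self._values.get(key)
        if old_value == new_value:
            return  # no change in the index, so no event
        self._values[key] = new_value
        for predicate, action in self._rules:
            if predicate(key, old_value, new_value):
                action(key, old_value, new_value)


alerts = []
index = MonitoredIndex()
# Hypothetical rule: flag any jump of more than 5000 in an indexed balance.
index.on_change(
    lambda key, old, new: old is not None and new - old > 5000,
    lambda key, old, new: alerts.append((key, old, new)),
)

index.update("acct-1", 1000)   # first value, no alert
index.update("acct-1", 1200)   # small change, no alert
index.update("acct-1", 9000)   # large jump, triggers the rule
```

In a fuller setup the action would start a BPM workflow or refresh a dashboard rather than append to a list.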
Revision 12.2 Copyright 2020 WhamTech, Inc. 66
Advanced data source access and data security
• Leverage centrally managed AD/LDAP, IAM, SSO with Kerberos, RBAC, ABAC/RLS and CLS
– Data source stewards can have ultimate veto
• Support advanced access security and data security for all data sources regardless of data source system
support
• Cross/multi-domain support (a major hurdle for most solutions)
• Be a data security gatekeeper for data sources
– Follow Forrester Zero Trust Data Security Model = Discover, INDEX, classify and secure
• All results data traceable to source records
• Dynamic data masking, tokenization and encryption (third-party Format-Preserving Encryption (FPE))
• Data governance from the bottom-up and/or can support a top-down tool
• Full auditability
• Support for third-party User Behavior Analytics (UBA)
– Alleviates/prevents insider data thefts (25%) and external origin (hacks) data thefts (75%)
– Leverage user logs, including queries made
Revision 12.2 Copyright 2020 WhamTech, Inc. 67
Other advanced capabilities
NEAR REAL-TIME ARCHITECTURE
• Edge process data
• Enable a near real-time data/event-driven architecture
• Build new workflows using BPM software on top of legacy systems to support operations, CRM, smartphone
apps, IoT devices, reporting, BI and analytics
STANDARDS
• Standard drivers, APIs, Web/data services, REST APIs, ANSI SQL and other query languages with
conversion, Cloud, VMs, PMs, Windows, Linux and soon-to-be IBM Power Systems
• Standard data models/views, e.g., HL7 and FHIR for healthcare, NIEM for government and other areas, XBRL
and others for financial services, ACORD for insurance or organization’s own
Revision 12.2 Copyright 2020 WhamTech, Inc. 68
SmartData Fabric® example Bluemix deployment
Revision 12.2 Copyright 2020 WhamTech, Inc. 69
• Multiple access methods
• Multiple query language options
• Multiple ways to represent data
• Standard data view, e.g., FHIR APIs
and NIEM
• Cloud platform-based data services
• New BPM workflows running on
legacy data sources
• Write-back to data sources
• VMPI-governed data access
• Multiple legacy data sources
• Data sources could be in multiple
organizations
• Data sources could be on premise
and in the Cloud – Hybrid Cloud
access
Multi-level value contribution
Revision 12.2 Copyright 2020 WhamTech, Inc. 70
Convert data to value-added INFORMATION
Advanced text search
Entity extraction
Indexed views
Link Indexes™/mapping
Real-time alerts
Entity resolution
CEP
Categorization
Provide basic indexed virtual DATA discovery, access, integration, sharing and interoperability
Access almost any data source
Work with structured and unstructured data
Improve data quality
Build structured and unstructured indexes
Integrate multiple data sources
Leave data in sources
Scale with distributed parallel processing
Almost no load on data sources
Map to a virtual standard data view
Update in near real-time
Data discovery
Data profiling
Convert knowledge to SUCCESS OUTCOMES
Tend to real-time
Improve compliance
Gain customers
Improve customer experience
Upsell and cross-sell customers
Reduce costs
Increase revenue
Increase profit
Improve reporting
Reduce waste
Reduce liability
Convert information to KNOWLEDGE
Decision support
BPM
CRM/MDM
BI/analytics
EHR/HIE
Visualization
Ontology representation
Link analysis
Social analytics
SmartData Fabric® is location agnostic
Revision 12.2 Copyright 2020 WhamTech, Inc. 71
SmartData Fabric® is configuration agnostic
Revision 12.2 Copyright 2020 WhamTech, Inc. 72
Data-driven bottom-up vs top-down approach
Revision 12.2 Copyright 2020 WhamTech, Inc. 73
For each data source:
Application/Middleware
External Query in SQL
Data Source
Data Source Driver/API/Web Service
Data Quality/Parser/Entity Extraction/Other
WhamTech’s Automatic Query Processor
WhamTech’s Mapping Layer
WhamTech’s Standard Drivers/APIs/Web Services
WhamTech Link Indexes™
Transaction Log Reader,
MQ or similar
EIQ SuperAdapter™
WhamTech Content Indexes
Data discovery and profiling
Detailed SmartData Fabric® layer capabilities
Revision 12.2 Copyright 2020 WhamTech, Inc. 74
Living Networks™
Real-time Business Intelligence
Distributed Analytics
Virtual Graph Database
Complex Event Processing
Link Analysis
Social Network Analysis
Enterprise and Web Search
CDI-MDM/Single Entity View
Other Applications
Applications
Changed Data Capture Intelligent Spider™
Device, Source and Data Discovery
Metadata Discovery/Data Profiling
Entity Extraction/NLP/Categorization/Other Text Processing
Data Cleansing, Transformation, Standardization, Masking, Tokenization and Encryption
Structured Indexes
Text Indexes
Extracted Entity Indexes
Fuzzy Match Indexes
Pre-aggregated Indexes
Pre-calculated Indexes
Embedded Value Indexes
Join Indexes
Link Indexes™
De-normalized Indexes
Data Security Layer – Query Side
Automatic Query Processing
EIQ SuperAdapter™
Standard drivers, APIs, Web/data services, SQL and other query languages
Real-time Monitoring and Event Processing
Administration and Configuration Tools
Security and Privacy Access Controls
Master Data Indexes
Data Security Layer – Data Side
Semantic Mapping to Standard Data View(s)
SmartData Fabric®
Relational Databases
Enterprise documents and email
Mainframe data
Spidered files from Web and other sources
Web Services Applications
Standard, Proprietary and Web Service Drivers
Application Drivers
Files
Changed Data Capture
Data Sources
Network Assets and Devices
Network Assets and Devices
Metadata Management and Repository, incl. Data Governance
Master Data Management
Discovery
SmartData Fabric® impact diagram
Revision 12.2 Copyright 2020 WhamTech, Inc. 75
Standard Data View
Results
Content Indexes
Link Indexes
Master Data and Indexes
Device/Host
Source
Data
Profiling
Intra and Inter-Source Data Correlation
Index Data Preparation and Results Data Transformation
Entity Extraction
Transform Development and Testing
Transform
Masking, Tokenization and Encryption
Structured
Unstructured Parser
Semantic Identification
Categorization
Security Classification
Structured Indexes (most data discarded)
Unstructured Indexes (most data discarded)
Standard Data View (indexes semantically mapped)
Distributed Metadata Repository
Link Indexes™
Security Classification
Master Data
Link Analytics and Visualization
Master Data Indexes
Results Data
Results Data
Results Data Pointers
Distributed Metadata Indexes
Results
Master Data and Indexes
Content Indexes
Index Data Preparation and Results Data Transformation
Discovery
3
SmartData Fabric® impact diagram for query submission
Revision 12.2 Copyright 2020 WhamTech, Inc. 76
Standard Data View
Link Indexes
Data
Structured
Unstructured
Structured Indexes (most data discarded)
Unstructured Indexes (most data discarded)
Standard Data View (indexes semantically mapped)
Distributed Metadata Repository
Link Indexes™
Master Data
Link Analytics and Visualization
Master Data Indexes
Distributed Metadata Indexes
APPLICATION
1 1
2
3
3
3
Results
Discovery
Standard Data View
Content Indexes
Link Indexes
Master Data and Indexes
Index Data Preparation and Results Data Transformation
Data
Transform
Masking, Tokenization and Encryption
Structured
Unstructured
Structured Indexes (most data discarded)
Unstructured Indexes (most data discarded)
Standard Data View (indexes semantically mapped)
Distributed Metadata Repository
Link Indexes™
Master Data
Link Analytics and Visualization
Master Data Indexes
Results Data
Results Data Pointers
Distributed Metadata Indexes
4
7
5
8
4
49
SmartData Fabric® impact diagram for results retrieval
Revision 12.2 Copyright 2020 WhamTech, Inc. 77
APPLICATION
5
6
SmartData Fabric® processes (1 of 5)
Automate (using BPM) as much as possible:
• Deploy on AWS, Azure, IBM Bluemix, OpenStack, VMs, physical servers – other
cloud options available
• Instantiate an EIQ System Administration and Configuration Tool
• Instantiate a distributed network asset/device, data source and metadata
repository
• Network asset/device discovery
• Data source discovery
− Using network asset/device discovery tool
− Using spiders for eDiscovery-type documents, files, email, etc.
• Instantiate EIQ Adapters™ on demand
SmartData Fabric® processes (2 of 5)
• Data discovery
− Optionally, with raw Link Indexes™ (internal and external pre-joins)
• Data identification
• DSL: Data risk classification
• CS: Event correlation
• Data profiling for data transforms for typos, transpositions and non-standard data,
e.g., name, address, phone and email correction
− Lookup dictionaries and thesauri, USPS or other address correction, regular expressions, APIs,
DLLs, transformation server, etc.
− DSL: Masking, tokenization or encryption for indexed data or dynamically depending on
access controls
DSL = Data Security Layer
CS = Cybersecurity
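The data transforms listed above (regular expressions plus lookup dictionaries for phone and email correction) can be illustrated with a minimal Python sketch. It is not WhamTech's implementation: the function names, the US-style 10-digit phone assumption and the `DOMAIN_FIXES` dictionary are all hypothetical and chosen only for illustration.

```python
import re
from typing import Optional

# Hypothetical lookup dictionary of common email-domain typos (illustrative only).
DOMAIN_FIXES = {"gmial.com": "gmail.com", "hotmial.com": "hotmail.com"}


def normalize_phone(raw: str) -> Optional[str]:
    """Reduce a US-style phone number to 10 digits, or return None if it does not fit."""
    digits = re.sub(r"\D", "", raw)           # drop everything except digits
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]                   # strip a leading country code
    return digits if len(digits) == 10 else None


def normalize_email(raw: str) -> Optional[str]:
    """Lower-case an email address and correct known domain typos via the dictionary."""
    email = raw.strip().lower()
    match = re.fullmatch(r"([^@\s]+)@([^@\s]+\.[a-z]{2,})", email)
    if match is None:
        return None                           # not recognizable as an email address
    local, domain = match.groups()
    return f"{local}@{DOMAIN_FIXES.get(domain, domain)}"
```

Values that cannot be normalized return `None` so that a profiling step could count them as non-standard rather than silently index them.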
SmartData Fabric® processes (3 of 5)
• Multiple indexes and types, e.g., basic content, DSL: security (classification),
aggregations, calculations, fuzzy, text, extracted entities and Link Indexes
• DSL: Can encrypt entire disk volumes, individual indexes or entire sets of indexes
• MDM: Data source-specific tables containing unique indexed primary entity IDs,
and master data, links and date-time
− Create using Link Index process, with multi-attribute fuzzy match for composite scoring and
master data rules
• DSL: WhamTech Security and Privacy Access Profiles (SPAPs) or other Role-
Based Access Control
− Current: Source organization, user, role, application, target organization and data source
profiles available
− Future: Extend for application processes
MDM = Master Data Management
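The multi-attribute fuzzy match with composite scoring mentioned for the Link Index process can be sketched as follows. This is an illustration under stated assumptions, not the product's algorithm: the attribute weights, the 0.85 threshold and the record layout are hypothetical, and Python's standard `difflib.SequenceMatcher` stands in for whatever fuzzy matcher is actually used.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

# Hypothetical attribute weights; real weights would be tuned per data source.
WEIGHTS = {"name": 0.5, "address": 0.3, "phone": 0.2}
MATCH_THRESHOLD = 0.85  # assumed cut-off for linking two records


def similarity(a: str, b: str) -> float:
    """Fuzzy similarity in [0, 1]; a missing value on either side scores 0."""
    a, b = a.strip().lower(), b.strip().lower()
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a, b).ratio()


def composite_score(rec_a: Dict[str, str], rec_b: Dict[str, str]) -> float:
    """Weighted sum of per-attribute similarities."""
    return sum(
        weight * similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        for field, weight in WEIGHTS.items()
    )


def link_records(records: List[Dict[str, str]]) -> List[Tuple[str, str, float]]:
    """Return (id_a, id_b, score) for every pair scoring at or above the threshold."""
    links = []
    for i, rec_a in enumerate(records):
        for rec_b in records[i + 1:]:
            score = composite_score(rec_a, rec_b)
            if score >= MATCH_THRESHOLD:
                links.append((rec_a["id"], rec_b["id"], round(score, 3)))
    return links
```

Comparing every pair is quadratic in the number of records, which is acceptable for a sketch; a production matcher would normally narrow candidates first.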
SmartData Fabric® processes (4 of 5)
• Hierarchies honored through joins and/or Link Indexes
− Inferred ontologies
− Reasons for hierarchies change depending on application, e.g., one vendor has multiple
products and one product comes from multiple vendors
• MDM: Versioning with access to historic master data
• Combine with other data sources, tending towards EDW/enterprise solutions
• MDM: Pure registry option to replace either data source indexes or source data
itself (automatically updates indexes) with master data
− A pure registry-based master data table limits options, lowers performance and adds complexity
• Execute analytics, combined with other data and search/query filters, e.g.,
reporting, BI and link analysis/graph database
− Include aggregations, calculations, master data (if available) and other data, e.g., external
SmartData Fabric® processes (5 of 5)
• Write back selective updates/corrections to data sources with possible inverse
data transforms (MDM: See previous slide)
• Continuously monitor metadata (index tree profiles) using stored procedures with
triggers
− Helps identify anomalies/outliers
• Event processing enabled (federated solution for Oracle® Event Processing)
− Open source and commercial BPM software for non-Oracle solutions
• Interoperability query transformation to avoid rewriting applications
− Goal to enable almost any application(s) to work with almost any data source(s)
• Mainframe data source option – files and live systems
• Hadoop (HBase/Hive and HDFS levels) and Cassandra options
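The metadata monitoring bullet above describes stored procedures with triggers that help identify anomalies/outliers in index tree profiles. As a rough illustration of the idea only, the Python sketch below swaps in a plain z-score check over per-key counts; the function name, the profile layout (index key mapped to counts) and the limit of 3 standard deviations are assumptions, not part of the product.

```python
from statistics import mean, pstdev
from typing import Dict, List


def find_outliers(
    history: Dict[str, List[int]],
    latest: Dict[str, int],
    z_limit: float = 3.0,
) -> List[str]:
    """Flag index keys whose latest count deviates sharply from their own history."""
    flagged = []
    for key, count in latest.items():
        past = history.get(key)
        if not past:
            flagged.append(key)          # key never seen before: treat as anomalous
            continue
        mu, sigma = mean(past), pstdev(past)
        if sigma == 0:
            if count != mu:              # history was constant, any change stands out
                flagged.append(key)
        elif abs(count - mu) / sigma > z_limit:
            flagged.append(key)
    return flagged
```

For example, with a history of roughly ten failed logins per interval, a sudden count of 250 would be flagged, while a value within normal variation would not.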
SmartData Fabric® data processes
[Diagram labels:]
• Device/host discovery
• Data source discovery
• Data discovery and profiling
• Data security classification
• Data quality/transformation/masking/tokenization/encryption
• Data and link indexing
• Master data management
• Query processing
• Results retrieval (Results data quality/transformation)
• Reported, Analyzed, Acted on, Discarded/retained
Note: Data not copied or moved - only results retrieved
Solutions to multiple problems in one platform (1 of 2)
[Diagram: layered stack labeled Applications, Optional Modules and Basic Product]
Platform: Distributed Data Virtualization (and Federation, Integration and Interoperability) Platform,
aka SmartData Fabric®
Capabilities shown:
• Virtual interactive link analysis and visualization
• Virtual reporting, BI and analytics
• Virtual MDM
• Virtual cybersecurity
• Virtual data security
• Virtual event processing
• Virtual network, data source and advanced data discovery
Solutions to multiple problems in one platform (2 of 2)
1. SMARTDATA FABRIC® (SDF) for basic data discovery, profiling, quality, mapping, indexing, virtualization, federation,
integration and interoperability as the basis for capability support modules and applications
2. EXTEND SDF WITH AUTOMATED NETWORK, DATA SOURCE AND ADVANCED DATA DISCOVERY including
relationships and eventual automated mapping to a standard data view
3. EXTEND SDF WITH EVENT PROCESSING to keep track of significant changes occurring in data
4. EXTEND SDF WITH DISTRIBUTED (preferably HYBRID) MDM to seamlessly combine with operational/transactional
data and maintain in near real-time
5. EXTEND SDF WITH IMPROVED CYBERSECURITY through indexed federated log and other data source access,
including automated anomaly detection and automated event correlation
6. EXTEND SDF WITH VIRTUAL DATA SECURITY LAYER to defend and protect data of value (i) as it is created, (ii) at
rest in the source, (iii) in transit, (iv) at the recipient and (v) after no longer needed
7. EXTEND SDF WITH BI/ANALYTICS oriented virtual and materialized real-time updateable hierarchical indexed views,
text analytics including entity extraction, and locally executed algorithms
8. EXTEND SDF WITH LINK ANALYSIS (and OEM LINK VISUALIZATION) for link analysis/graph database for almost
any type of analytics, including virtual MDM (master patient index), cybersecurity and data security
Discovery/raw index process
• Read: File, Record
• Parse: Header, 3rd party
• Index: Content
• Profile: Value distribution, Metadata
• Auto identify: Personal data, Other data
• Auto map to standard data view
• Auto transform
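The profile and auto-identify steps of the discovery process can be illustrated with a minimal Python sketch. It assumes a column arrives as a list of strings; the pattern set, the labels and the 80% match share are hypothetical choices for illustration and do not describe how SmartData Fabric® actually identifies personal data.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical patterns for a few kinds of personal data (illustrative only).
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "us_phone": re.compile(r"\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}"),
}


def profile_column(values: List[str]) -> Dict[str, object]:
    """Summarize a column: row count, distinct values, empty values, most common values."""
    non_empty = [v for v in values if v]
    return {
        "count": len(values),
        "distinct": len(set(non_empty)),
        "nulls": len(values) - len(non_empty),
        "top_values": Counter(non_empty).most_common(3),
    }


def identify_column(values: List[str], min_share: float = 0.8) -> str:
    """Label a column as a personal-data type if enough values match a pattern."""
    non_empty = [v.strip() for v in values if v and v.strip()]
    if not non_empty:
        return "other"
    for label, pattern in PATTERNS.items():
        hits = sum(1 for v in non_empty if pattern.fullmatch(v))
        if hits / len(non_empty) >= min_share:
            return label
    return "other"
```

A column labeled this way could then feed the "auto map to standard data view" step; anything labeled `"other"` would need a different identification method or manual review.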
Production index process
• Read: File, Record
• Parse: Header, 3rd party
• Process: Entity extraction, Analyze
• Transform: Cleanse, Data type, Standardize, Secure
• Index: Content, Link, Master data
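The transform and index steps above, combined with the deck's point that only results are retrieved from the source, can be sketched as an index that stores standardized values and pointers rather than copies of records. This is a simplified illustration: the in-memory dictionaries, the `standardize` rule and all function names are assumptions, not the product's index format.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pointer = Tuple[str, int]  # (source name, record id) - hypothetical pointer layout


def standardize(value: str) -> str:
    """Cleanse a value: trim, lower-case and collapse repeated whitespace."""
    return " ".join(value.strip().lower().split())


def build_index(
    source_name: str, records: Dict[int, Dict[str, str]], fields: List[str]
) -> Dict[Tuple[str, str], Set[Pointer]]:
    """Map (field, standardized value) to pointers; the records themselves are not stored."""
    index: Dict[Tuple[str, str], Set[Pointer]] = defaultdict(set)
    for record_id, record in records.items():
        for field in fields:
            raw = record.get(field)
            if raw:
                index[(field, standardize(raw))].add((source_name, record_id))
    return index


def query(index: Dict[Tuple[str, str], Set[Pointer]], field: str, value: str) -> List[Pointer]:
    """Resolve a query against the index only, returning sorted pointers."""
    return sorted(index.get((field, standardize(value)), set()))


def retrieve(sources: Dict[str, Dict[int, Dict[str, str]]], pointers: List[Pointer]) -> List[Dict[str, str]]:
    """Fetch only the matching records from their original sources."""
    return [sources[source][record_id] for source, record_id in pointers]
```

Because the query value goes through the same `standardize` step as the indexed values, a search for "DALLAS" finds records stored as "Dallas" or "dallas" without touching the source until retrieval.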