Post on 30-Dec-2015
Chuck Kelley, Excellence In Data, LLC

Metadata
The Information Blueprint
[Figure: house floor plan with rooms labeled room 1, kitchen, garage, room 2, room 3]

Definition of Metadata
Metadata is
  Data about Data
  The map of the Data Warehouse
  Defines the construction, health and descriptive information
Metadata is not
  The data itself
  Master data
  External data (depending on the type of data!)

Metadata: the information “yellow pages”
Hmmm, I wonder what this information really means?
  What information is available?
  What does it mean?
  How was it derived?
  What was its source?
  How current is it?
  Who uses it?
  How often is it used?

Types of Metadata
  Technical
  Business
  Contextual

Technical Metadata
Technical metadata is data about data needed by the technology folks to do their work correctly. This includes the "good ole days" metadata, but adds much more.
Technical metadata is used by the IT side to understand how the data warehouse/data mart was constructed.
  What is the system of record for a specific piece of data,
  What transformations were performed on what source data to produce data in the data warehouse/data mart,
  What are the columns in the data warehouse/data mart and what do they mean,
  What is used to reconcile the data with the source system, and
  When was the last date and time the data was loaded into the data warehouse/data mart.
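
The five questions above can be held as one record per warehouse column. This is only a rough sketch: the class name, field names, and sample values are invented for illustration and do not come from any particular repository product.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class ColumnMetadata:
    """Technical metadata for one warehouse column (hypothetical layout)."""
    name: str                       # column in the data warehouse/data mart
    meaning: str                    # what the column means
    system_of_record: str           # source system that owns the data
    transformations: List[str] = field(default_factory=list)  # applied to source data
    reconciled_by: str = ""         # what is used to reconcile with the source
    last_loaded: Optional[datetime] = None  # last date and time of load


def describe(col: ColumnMetadata) -> str:
    """Answer the five technical questions for one column as plain text."""
    loaded = col.last_loaded.isoformat(sep=" ") if col.last_loaded else "never"
    steps = "; ".join(col.transformations) or "none"
    return (
        f"{col.name}: {col.meaning}\n"
        f"  system of record: {col.system_of_record}\n"
        f"  transformations:  {steps}\n"
        f"  reconciled by:    {col.reconciled_by or 'not recorded'}\n"
        f"  last loaded:      {loaded}"
    )


# Sample values are invented for illustration.
order_total = ColumnMetadata(
    name="order_total_usd",
    meaning="Order total in US dollars",
    system_of_record="order_entry",
    transformations=["sum line items", "convert currency to USD"],
    reconciled_by="daily control total from order_entry",
    last_loaded=datetime(2015, 12, 29, 2, 30),
)
print(describe(order_total))
```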

Business Metadata
Business metadata is data about data needed by the business community to do their work better.
Business metadata is used by the business to understand what is available in the data warehouse/data mart and how, in their terminology, it is built. Business metadata includes
  Who is the data steward,
  What is the confidence level of the data and its quality,
  What algorithm is used to create the values,
  What is the definition of this data, and
  What reports are available.

Contextual Metadata
Contextual metadata is data that sets the "context" of your data. It really isn't metadata in the typical sense of the word, but is classified as metadata nonetheless. Examples of contextual metadata are
  Weather reports,
  Headlines of the day, and
  Social, economic, and political issues.
Contextual metadata is the hardest to collect. Possible sources are newswire feeds (like AP, Wall Street Journal, Christian Science Monitor), Internet sites (http://www.weather.com, http://www.wsj.com), or just plain manual input (which is probably the least desirable).
How does contextual metadata help?
Let's say that your organization is the Department of Energy and you noticed a major jump in spending on security during the late 1990s. Now, in 2009, the spending seems to be trending downward. How do you know why that might be happening? During the late 1990s (see, I don't remember the year or the name of the person already!), there was believed to be some breach of security and that classified data was being "stolen". If that information was captured, then when the trend is discovered, we could look at the context of what was happening in the late 1990s to see if it can help explain the trend.
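
One way to make captured context usable is to store each item with a date range and look up whatever overlaps the period under analysis. The sketch below assumes a simple in-memory list; the class, the function, and the sample entries (including their dates) are placeholders, not historical facts.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ContextEvent:
    """One piece of contextual metadata (hypothetical layout)."""
    start: date
    end: date
    source: str      # e.g. a newswire feed or manual input
    headline: str


def context_for(events, period_start, period_end):
    """Return the events whose date range overlaps the period being analyzed."""
    return [
        e for e in events
        if e.start <= period_end and e.end >= period_start
    ]


# Invented sample entries; the dates are placeholders only.
events = [
    ContextEvent(date(1998, 1, 1), date(1999, 12, 31), "manual input",
                 "Suspected security breach involving classified data"),
    ContextEvent(date(2004, 3, 1), date(2004, 3, 31), "newswire",
                 "Unrelated headline of the day"),
]

# What was going on while security spending jumped?
for event in context_for(events, date(1997, 1, 1), date(1999, 12, 31)):
    print(event.source, "-", event.headline)
```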

Metadata provides the blueprint of the Data Warehouse
[Figure: house floor plan (room 1, kitchen, garage, room 2, room 3) labeled "Data Warehouse" and "Metadata"]

Importance of Technical Metadata
Serves the IT community with operational detail about information systems.
However
  Metadata is not the primary focus of IT
  Looked upon as a documentation exercise of minimal value
  Often relegated to “nice to have” status

Importance of Business Metadata
Serves the Business Community as a source to discover what and where information exists
Business meaning takes precedence over technical detail (look for commonality)
Looked upon as a key source for knowledge on Operational processes

Importance of Metadata
Serves the Data Warehouse as a key enabler
Of primary importance to DSS Analysts
Metadata is critical to tracking the content and validity of data in the Warehouse
Provides context to the data
Issue: “Knowledge Gap” between OLTP and DW can affect the success of Data Warehousing Implementations

Importance of Metadata
What it does
  Describes data in operational systems, which facilitates
    mapping data elements to the DW
    data conversion
    aggregation & summarization logic
    coordinating naming conventions
    managing anomalies between physical characteristics of common data across information systems

Importance of Metadata
Metadata provides a new dimension
It allows
  Data to be managed over time (5 - 10 years)
  Data to be managed by context (business meaning and business value will change over time)
Manages structural changes to the DW database (versioning of metadata)
Allows Operational Systems to reinvent themselves by discovering corporate data which exists across systems

Metadata Exists Everywhere!
[Figure: Metadata Manager diagram. Labels: Physical Database Definition; External Sources; Operational Data Sources; Summarization; Data Warehouse Data Model; Internal Non-Operational Data Sources; Database Definitions; File Definitions; COBOL Copybooks; Data Extraction Tool Data Dictionary; Data Modeling Tool Data Dictionary; DSS Tool Data Definitions; DSS Tool Business Catalog]

Metadata and the Data Warehouse
[Figure: areas of data warehouse metadata. Labels: Interface, Transformation, and Load; Data Aggregation; Data Access; Data Warehouse Data Model; Data Quality; Data Warehouse Operations; Business Rules/Derived Measures]

Interface, Transform, and Load
Describes the interface location and content.
Describes information about the transformation from source system codes to reference data codes.
Describes information about custom transformation, such as using subsets of the data.
Describes information about the Data Warehouse destination.

ETL Metadata Points
[Figure: flow diagram. Labels: Data Sources; Extract; Filter; Cleanse; Transform; Log/QA; Data Warehouse]

Information Sourcing Activities

Activity    Description                                     Outcomes
Extract     Pull Data from Operational Systems              Raw Data
Filter      Discard “Noise” data from data set              Dirty Data
Cleanse     Analyze data quality and make corrections       Clean Data
Transform   Rearrange and Summarize Data                    Useful Data
Log/QA      Perform Final Check and Build “Yellow Pages”    Verified Data, Metadata
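
The five activities can be sketched as a chain of small functions, with the last step recording metadata about the run. This is a toy illustration under stated assumptions: the row layout, the definition of "noise" (a missing customer id), and the cleansing rule are all invented here.

```python
def extract(source_rows):
    """Extract: pull data from the operational system (an in-memory list here)."""
    return list(source_rows)


def filter_noise(rows):
    """Filter: discard "noise", assumed here to be rows with no customer id."""
    return [r for r in rows if r.get("customer_id") is not None]


def cleanse(rows):
    """Cleanse: correct a known quality problem (inconsistent state codes)."""
    return [{**r, "state": r["state"].strip().upper()} for r in rows]


def transform(rows):
    """Transform: rearrange and summarize as total amount per state."""
    totals = {}
    for r in rows:
        totals[r["state"]] = totals.get(r["state"], 0) + r["amount"]
    return totals


def run(source_rows):
    """Run the stages in order and build a small log of load metadata."""
    log = {}
    raw = extract(source_rows)
    log["extracted"] = len(raw)
    kept = filter_noise(raw)
    log["filtered_out"] = len(raw) - len(kept)
    summary = transform(cleanse(kept))
    log["output_rows"] = len(summary)
    # Log/QA: final check that no amount was lost after filtering.
    log["verified"] = sum(summary.values()) == sum(r["amount"] for r in kept)
    return summary, log


rows = [
    {"customer_id": 1, "state": " az", "amount": 100},
    {"customer_id": 2, "state": "AZ ", "amount": 50},
    {"customer_id": None, "state": "ca", "amount": 999},  # noise
    {"customer_id": 3, "state": "ca", "amount": 25},
]
summary, log = run(rows)
print(summary)
print(log)
```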

DW Data Model
A description of each attribute and entity of the data model.
This is an extract from the CASE tool that manages the data model, or the output of a data dictionary.

Aggregation
Rules-based engine for Aggregation.
States which fields from which DW tables are combined and the algorithm that aggregates the data.
Used to create code or to suggest stored procedures for aggregation.
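
As a rough idea of how aggregation metadata can be used to create code, the sketch below turns one rule into a SQL statement. The rule layout, table names, and field names are hypothetical, and the identifiers are assumed to come from trusted metadata rather than user input.

```python
from dataclasses import dataclass

ALLOWED_FUNCTIONS = {"SUM", "AVG", "MIN", "MAX", "COUNT"}


@dataclass(frozen=True)
class AggregationRule:
    """Metadata describing one aggregate (hypothetical layout)."""
    target_table: str    # where the aggregate is stored
    source_table: str    # DW table the fields come from
    group_by: tuple      # fields kept as grouping columns
    measure: str         # field being aggregated
    function: str        # aggregation algorithm, e.g. SUM


def to_sql(rule: AggregationRule) -> str:
    """Generate the SQL statement that builds the aggregate table."""
    func = rule.function.upper()
    if func not in ALLOWED_FUNCTIONS:
        raise ValueError(f"unsupported aggregation: {rule.function}")
    keys = ", ".join(rule.group_by)
    return (
        f"INSERT INTO {rule.target_table} ({keys}, {rule.measure})\n"
        f"SELECT {keys}, {func}({rule.measure})\n"
        f"FROM {rule.source_table}\n"
        f"GROUP BY {keys}"
    )


# Invented example rule.
rule = AggregationRule(
    target_table="sales_by_month",
    source_table="sales_fact",
    group_by=("region", "month"),
    measure="amount",
    function="sum",
)
print(to_sql(rule))
```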

Data Access - Reporting
Report Generation Metadata
  A rules-based reporting tool that describes a report format from the header to columns and rows.
Report Menu Metadata
  Describes the reports that are available.
  Describes the Menu that the user is shown for accessing the available reports.

Data Access - Query
Defines canned queries available.
Defines public and private queries.
Allows queries to be combined.

Data Access - End User
[Figure: end user application diagram. Labels: Data Model; Interface, Transformation, and Load; Application Help files; End User Application; Data Warehouse; Derived Business Measures]

Data Quality
DW Load Statistics
  The use of control numbers from the source system, compared to the load data.
DW Quality Rules
  Rules that track known data trends for report checking
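
A load-statistics check might look something like the sketch below, which compares control numbers reported by the source system with what actually landed in the warehouse. The field names and values are assumptions made for the example; amounts are in whole cents to avoid rounding issues.

```python
def load_statistics(source_control, loaded_rows, amount_field="amount"):
    """Compare control numbers from the source system with the loaded data."""
    loaded_count = len(loaded_rows)
    loaded_total = sum(r[amount_field] for r in loaded_rows)
    return {
        "row_count_matches": loaded_count == source_control["row_count"],
        "total_matches": loaded_total == source_control["total"],
        "row_count_difference": loaded_count - source_control["row_count"],
        "total_difference": loaded_total - source_control["total"],
    }


# Control numbers as the source system might report them (invented values).
control = {"row_count": 3, "total": 17500}
# One row went missing during the load.
loaded = [{"amount": 10000}, {"amount": 5000}]

stats = load_statistics(control, loaded)
print(stats)
```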

Metadata Components
  Storage in the Warehouse
  Operational Mapping
  Extract History
  Volumetrics
  Algorithms
  Relationship History
  Ownership/Stewardship
  External/Reference Data
  Business Meaning (Data Models)
[Figure: 3x3 grid of the nine components: Storage, Mapping, History, Volumetrics, Algorithms, Relationships, Ownership, Reference, Data Models]

Metadata Components
Storage Requirements
  Database Schema
  Table Spaces
  Database Tables (Dimensions, Facts)
  Keys and Indexes
  Facts (Attributes)
  Information Access (Data Topology)
    PCs
    EIS
    DSS
    Operational

Metadata Components
Operational Mapping
  Location of data sources
  Data Element conversion
    Physical characteristic conversions
    naming changes
    default values
    encoding
  Data Key changes
  Logic & Algorithms
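
Data element conversion can be driven entirely by mapping metadata. The sketch below shows one possible shape for it, covering naming changes, default values, encoding, and a physical type conversion. Every source and target name here is hypothetical.

```python
# Mapping metadata for one source record layout. All names are hypothetical.
MAPPING = {
    "CUST_NM": {"target": "customer_name"},                    # naming change
    "CUST_ST": {"target": "state", "default": "UNKNOWN"},      # default value
    "GNDR_CD": {"target": "gender", "default": "U",
                "encoding": {"1": "M", "2": "F"}},             # encoding
    "ORD_AMT": {"target": "order_amount", "convert": int},     # physical type
}


def apply_mapping(source_row, mapping=MAPPING):
    """Convert one operational record into its warehouse form."""
    target_row = {}
    for source_name, rule in mapping.items():
        value = source_row.get(source_name)
        if value is None or value == "":
            # Missing or empty source value: fall back to the default, if any.
            value = rule.get("default")
        elif "encoding" in rule:
            # Translate source codes; unknown codes fall back to the default.
            value = rule["encoding"].get(value, rule.get("default"))
        elif "convert" in rule:
            value = rule["convert"](value)
        target_row[rule["target"]] = value
    return target_row


row = apply_mapping(
    {"CUST_NM": "Acme", "CUST_ST": "", "GNDR_CD": "2", "ORD_AMT": "0150"}
)
print(row)
```

Keeping the rules in data rather than in code means the same table can double as documentation of the operational mapping.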

Metadata Components
Extract History
  Logged history of data extracts and transformations
  Audit logs
  Job Scheduling (Batch, On-line)

Metadata Components
Volumetrics
  Number of Tables
  Number of Rows
  Usage Characteristics
  Table Indexing
  Aging Criteria

Metadata Components
Algorithms
  Levels of Summarization
  Criteria applied to Data Aggregation
  Data Derivation

Metadata Components
Relationship History
  Relationship Artifacts
  Relationship History
    Tables included
    Effective Dates
    Constraints in Effect
    Cardinality in Effect
    Description
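
Relationship history is essentially an effective-dated record. As a small sketch, assuming each version carries the tables included, the cardinality, and its effective dates, a lookup for "what was in effect on this date" could be written as follows. The table names and dates are invented.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class RelationshipVersion:
    """One entry of relationship history (hypothetical layout)."""
    tables: tuple                  # tables included
    cardinality: str               # cardinality in effect
    effective_from: date
    effective_to: Optional[date]   # None means still in effect
    description: str = ""


def in_effect(history, tables, on):
    """Return the version of a relationship in effect on a given date, if any."""
    wanted = frozenset(tables)
    for version in history:
        if frozenset(version.tables) != wanted:
            continue
        ended = version.effective_to is not None and on > version.effective_to
        if version.effective_from <= on and not ended:
            return version
    return None


# Invented example history.
history = [
    RelationshipVersion(("customer", "account"), "1:1",
                        date(2005, 1, 1), date(2009, 12, 31),
                        "One account per customer"),
    RelationshipVersion(("customer", "account"), "1:N",
                        date(2010, 1, 1), None,
                        "Customers may hold several accounts"),
]

old = in_effect(history, ("account", "customer"), date(2008, 6, 1))
print(old.cardinality, "-", old.description)
```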

Metadata Components
Ownership/Stewardship
  Operational Ownership
    Updates
    Recovery
    Accuracy
  Data Warehouse Stewardship
    Data consistency
    Loading
    Access

Metadata Components
External/Reference Data
  Location, type and content of external data
  Encoded values and changes
  Audit log of changes
  Date/Time stamps

Metadata Components
Business Meaning (Data Models)
  Data Warehouse Data Model (Logical)
  Mapping to Data Warehouse Database Design (Physical)
  Mapping to Operational Systems Data Models (Corporate & Business Area)
  Mapping to other DW architecture Metadata
    EIS/DSS
    Data Mining/Data Journalism

How to Use Metadata
Metadata Manager Requirements
Required features include:
  GUI
  Data Model Management
  Model/Data Versioning
  Data Access & Security
  Integration with the DW DBMS
  Integration with DW Architecture
  Unstructured Reference Data Management (futures)

How to Use Metadata
Most current Repositories are not extensible
Few specialized tools are available for Metadata Mining (Data Re-engineering)
There is no standard way to exchange metadata
  between various Meta Manager tools
  between EIS/DSS tool sets
  between OLTP DBMS and DW DBMS
OLTP CASE repositories which manage business models are not geared for Data Warehousing
However...

How to Use Metadata
How to support it (Metadata Maintenance)
  Care and feeding of Metadata is just as important as the data itself
Other Considerations
  How to get IT and the Business Client to use metadata
    Have a single point of contact
    Always do it at their terminal (DW or Client)
    Always let them do it with your help

What to Look for in Products
From David Marco’s book
Building and Managing the Meta Data Repository: A Full Lifecycle Guide

Vendor Background
Full name and business address of vendor.
Parent Company.
Number of years company has been in business.
Company structure. Is it a corporation, partnership, or privately held? List names associated with structure if different from question # 1.
Public or privately held company. If public, which exchange is company traded on, and what is the company's market symbol?
When did the company go public, or when is it expected to go public?
Total number of employees worldwide?
Total number of U.S. employees?
Web site URL
Number of developers supporting proposed product solution?
Company profit/loss for the last three years (if available).

Proposed Solution Overview
Provide a summary of the vendor's proposed solution and explain how it meets the needs.
What are the names and versions of the product(s) component(s) comprising the vendor's proposed solution?
  The repository architect and infrastructure architect need to carefully review all the components in the proposed solution and compare them with the target technical environment and support structure. How do the components communicate? What hardware platforms, DBMSs, Web servers and communications protocols do the components require? How are security and migration handled among the various components?
Number of worldwide production installations using precisely this proposed solution configuration.
  Be sure to consider the hardware, DBMS, Web server, etc. How many other companies are using the same configuration? Is your company going to be the first?
What hardware, operating system, DBMS and web browser limitations does each of the product(s) component(s) have in the proposed solution on client and server platforms?
  Be mindful of any requirements to download Java applets and/or ActiveX controls to the client. This might conflict with your company's web policy or, if deployed externally, with your clients' policies.
What is the release date and version number history of each of the product(s) component(s) for the past 24 months?
What is the anticipated release date and new feature list for each of the product(s) features and component(s) over the next 12 months?

Cost of Solution
Total cost of proposed solution.
Cost of consulting services required for installation.
  Negotiate consulting time up front to complete staff training and get the repository up and running as quickly as possible.
Cost of consulting services for initial project setup.
What is the vendor's daily rate for consulting services without expenses?
Annual maintenance cost/fee.
  This should range anywhere from 14 percent to 18 percent of solution price.
Are all new product component releases/upgrades provided while under an annual maintenance agreement? If not, please explain in detail.
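
To sanity-check a quoted maintenance fee against the 14 to 18 percent guideline, a quick calculation is enough. The function name and the solution price below are made up for the example.

```python
def maintenance_range(solution_price, low_pct=14, high_pct=18):
    """Return the (low, high) expected annual maintenance fee.

    Prices are whole currency units; integer arithmetic rounds down.
    """
    return solution_price * low_pct // 100, solution_price * high_pct // 100


low, high = maintenance_range(250_000)  # hypothetical solution price
print(f"Expected annual maintenance: {low} to {high}")
```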

Technical Requirements
Are there any database schema design requirements for the DSS data model in order to function with the repository product?
Does the proposed solution require a change in the existing DSS schema design in order to function?
How does the tool control the various versions of the meta data (development, quality assurance and production) stored in the repository?
How is meta data from multiple DSS projects controlled and separated? How can the various projects share meta data?
  The answer to this question will determine how you administer the product and provide security.
Describe how meta data repository contents are migrated from one system engineering phase to the next (development, quality assurance and production). How does this processing sequence differ when dealing with multiple projects on various time lines?
  In particular, how is meta data migrated through the various design phases? Can a single project or portion of a project be migrated forward? How?
What DBMS privileges does the product support (e.g., roles, accounts, and views)?
Can DBMS-specific SQL statements be incorporated into queries?

Technical Requirements
Describe the security model used with the product.
How does the product use existing infrastructure security systems?
Does the product use any type of single sign-on authentication (e.g., LDAP)?
Where are user security constraints for the product stored?
Can a user have access to the repository tool for one project but no access for another project?
Can a user view the SQL generated by the product?
Is the product Web-enabled? Describe.

Implementation
Describe sequence of events and level of effort recommended for clients to consider in planning their implementation strategy.
What is the typical duration of the implementation cycle?
How many DSS database schema dimensions and facts can the proposed product solution handle?
Provide a sample project plan for implementation of your proposed solution for a single DSS project.
What client resource skill sets need to be in place for installation and implementation?

And Lastly, but very important
Obtain from vendor at least three customer references

Top Five Mistakes of Metadata
1. Not defining the Objectives of the Metadata
2. Purchasing the tool before the requirements
3. Choosing the tool before an evaluation
4. Making Metadata too hard to utilize
5. Not understanding the effort of Metadata

Conclusions
Metadata is a critical component of any data warehouse
Information users must learn how to use it.
Learn from others' mistakes

Chuck Kelley
30+ year professional in dealing with Data
chuckkelley@usa.net
480-797-5850
“I never metadata I didn’t like”