12. Models of Business Information [2]...

Page 1: 12. Models of Business Information [2] (35)courses.ischool.berkeley.edu/i243/s08/lectures/243-12-20080303.pdf · 3/3/2008  · -- Carlson Wagonlit Case What were the symptoms or implications

1 of 35

12. Models of Business Information [2]

DE + IA (INFO 243) - 3 March 2008

Bob Glushko

2 of 35

Plan for Today's Class

"Operation Clean Data" case studies

Authority Control

Data Warehouses

"Interoperability Costs in Auto Supply Chain" case study

Hub languages

The Universal Business Language

3 of 35

But First... Schedule for Assignments

Assignment 3, Business patterns (assigned today 3/3, due 3/12)

Assignment 4, Requirements and Source Inventory (assigned 3/12, due 3/24)

Assignment 5, Process Analysis (assigned 3/31, due 4/9)

Assignment 6, Document Analysis (assigned 4/16, due 4/23)

4 of 35

We're Both "Shipping Containers"

"The expense of resolving ambiguous business terms over and over on a daily basis pales in comparison with the expense of NOT realizing there is an ambiguity in the term" (Farish)

5 of 35

Controlled Vocabularies

The words people use to describe things or concepts are "embodied" in their context and experiences... so they are often different or even "bad" with respect to the words used by others

These naturally-occurring words are an "uncontrolled vocabulary"

Information retrieval or other processes with uncontrolled vocabularies are often ineffective and error-prone

Creating a controlled vocabulary creates an artificial language by:

1. Choosing an authoritative form of a term, name or identifier

2. Ensuring that the term is distinctive

3. Mapping all the variant forms to the authoritative one
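
The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not a method from the lecture: the function names (`normalize`, `build_authority`, `control`) are hypothetical, and the sample entries are borrowed from the place-name examples used elsewhere in these slides.

```python
from typing import Dict, List, Optional

def normalize(term: str) -> str:
    """Fold case and extra whitespace so lookups ignore trivial variation."""
    return " ".join(term.split()).casefold()

def build_authority(groups: Dict[str, List[str]]) -> Dict[str, str]:
    """Map every variant (and the authoritative form itself) to the authoritative form."""
    table: Dict[str, str] = {}
    for authoritative, variants in groups.items():   # step 1: the chosen form is the key
        for form in [authoritative, *variants]:
            key = normalize(form)
            # Step 2: a form must be distinctive, so it may point to one term only.
            if key in table and table[key] != authoritative:
                raise ValueError(
                    f"{form!r} is ambiguous: {table[key]!r} vs {authoritative!r}")
            table[key] = authoritative                # step 3: variant -> authoritative
    return table

def control(term: str, table: Dict[str, str]) -> Optional[str]:
    """Return the authoritative form, or None for a term outside the vocabulary."""
    return table.get(normalize(term))

# Sample vocabulary (illustrative only).
PLACES = build_authority({
    "Mumbai": ["Bombay"],
    "Cluj": ["Klausenburg", "Kolozsvar"],
})

print(control("  BOMBAY ", PLACES))
print(control("Vienna", PLACES))
```

Returning `None` for an unknown term, rather than passing it through, is a design choice: it forces someone to decide whether the new term is a variant or needs its own authoritative entry.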

6 of 35

"Operation Clean Data" -- British Military Case

What were the symptoms or implications of "dirty" data in the British army's supply chains?

What were the primary causes of this "dirty" data?

Which data items were the focus of the data cleanup effort? Why?

What technologies or tools were used in the data cleanup effort?

7 of 35

"Operation Clean Data"-- Carlson Wagonlit Case

What were the symptoms or implications of "dirty" data for the Carlson Wagonlit travel agency?

What were the primary causes of this "dirty" data?

How is Carlson Wagonlit improving its data quality?

8 of 35

"Operation Clean Data" -- Cendant Case

What were the symptoms or implications of "dirty" data for Cendant?

What were the primary causes of this "dirty" data?

How is Cendant improving its data quality?

9 of 35

Normative Name Forms

When names appear in multiple forms, one form needs to be chosen using criteria that include:

Fullness (e.g., full names vs. initials only)

Language of the name

Spelling (choose predominant form)

Entry element

"Smith, John" not "John Smith"

"Mao Zedong" or "Zedong, Mao" or "Mao Tse Tung" or ?

10 of 35

Authority Control for Places

Variant forms: St. Petersburg, Санкт-Петербург, Saint-Pétersbourg

Multiple names: Cluj, in Romania / Roumania / Rumania, is also called Klausenburg and Kolozsvar

Name changes: Bombay -> Mumbai.

Homographs: Vienna, VA, and Vienna, Austria; 50 Springfields

Anachronisms: No Germany before 1871

Vague, e.g. Midwest, Silicon Valley

Unstable boundaries: 19th century Poland; Balkans; USSR

11 of 35

"Operation Clean Data" -- US govt agencies

How are the US Census Bureau and CDC improving data quality?

How do these processes differ for printed and electronic surveys/forms?

12 of 35

Some General Questions about Data Quality

Are the data quality problems primarily technology ones or process/management ones?

Why are "homonyms" worse than "synonyms" in a set of item identifiers?

Does data have to be perfectly clean? Can it ever be?

How can your own actions contribute to data quality problems or to their resolution?

13 of 35

Principles and Processes for Quality Information

Prioritize the data items

Involve the data owners

Keep future data clean (enough)

Find the data owners and the "headwaters"

Validate at the time of capture or creation

Set realistic goals for data quality
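
"Validate at the time of capture or creation" can be illustrated with a small rule table that is checked before a record is stored. The field names and formats below (`part_number`, `quantity`, `unit`) are hypothetical and are not taken from the case studies; this is a sketch of the idea only.

```python
import re
from typing import Callable, Dict, List

# Hypothetical rules for one order line; each rule returns True for a valid value.
RULES: Dict[str, Callable[[str], bool]] = {
    "part_number": lambda v: re.fullmatch(r"[A-Z]{2}-\d{6}", v) is not None,
    "quantity": lambda v: v.isdecimal() and int(v) > 0,
    "unit": lambda v: v in {"EA", "BOX", "KG"},
}

def validate(record: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means the record may be stored."""
    problems = []
    for field, is_valid in RULES.items():
        value = record.get(field, "").strip()
        if not value:
            problems.append(f"{field}: missing")
        elif not is_valid(value):
            problems.append(f"{field}: invalid value {value!r}")
    return problems

print(validate({"part_number": "AB-123456", "quantity": "3", "unit": "EA"}))
print(validate({"part_number": "ab-1", "quantity": "0"}))
```

Rejecting a bad value while the person who entered it is still present is usually cheaper than a later cleanup, which is the point of checking at the "headwaters".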

14 of 35

Data Warehouses

A data warehouse is a "subject-oriented, integrated, time-varying, non-volatile collection of data used in organizational decision making"

Data warehouses extract data from ERP systems or other transactional applications into a separate repository

It is common practice to "stage" data prior to merging it into a data warehouse with an "Extract, Transform, and Load" (ETL) application

The data model for the warehouse, designed to enable efficient ad hoc data analysis and reporting, is sometimes called a "hypercube"
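
A toy version of the extract, stage, and load steps, followed by a "hypercube"-style roll-up, might look like the sketch below. The two source systems, their field names, and the `rollup` helper are all invented for illustration; a real warehouse would use a database and an ETL tool rather than Python lists.

```python
from collections import defaultdict

# Extract: rows as two hypothetical source systems might export them.
ERP_ROWS = [
    {"sku": "A1", "region": "EU", "month": "2008-01", "amount": "120.50"},
    {"sku": "B2", "region": "US", "month": "2008-02", "amount": "10.00"},
]
WEB_ROWS = [
    {"item": "a1", "area": "eu", "date": "2008-01-17", "total": 30.0},
]

def transform_erp(row):
    """Convert an ERP row to the warehouse's fact layout."""
    return {"product": row["sku"].upper(), "region": row["region"].upper(),
            "month": row["month"], "amount": float(row["amount"])}

def transform_web(row):
    """Convert a web-shop row; its date is truncated to the month."""
    return {"product": row["item"].upper(), "region": row["area"].upper(),
            "month": row["date"][:7], "amount": float(row["total"])}

# Stage: transformed rows wait here until they are merged.
staged = [transform_erp(r) for r in ERP_ROWS] + [transform_web(r) for r in WEB_ROWS]

# Load: facts are appended to the warehouse, never updated in place.
warehouse = []
warehouse.extend(staged)

def rollup(facts, dimensions):
    """Sum amounts for every combination of the chosen dimensions."""
    cube = defaultdict(float)
    for fact in facts:
        cube[tuple(fact[d] for d in dimensions)] += fact["amount"]
    return dict(cube)

print(rollup(warehouse, ["region"]))
print(rollup(warehouse, ["product", "month"]))
```

Each call to `rollup` picks a different set of dimensions, which is the ad hoc slicing that the hypercube model is designed to make efficient.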

15 of 35

Generic Enterprise Information Integration Architecture with Warehouse (Gantz, XML 2004)

16 of 35

ETL vs ELT

The traditional ETL (Extract-Transform-Load) approach relies on proprietary ETL engines being deployed between sources and targets.

Relational databases are rapidly eliminating the ETL category by incorporating transformation functionality

So ETL is becoming ELT (Extract-Load-Transform), with all the complex processing of data occurring inside the database
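
The ELT ordering can be shown with Python's built-in `sqlite3` module: raw rows are loaded untouched, and the cleanup is then expressed as SQL that runs inside the database. The table names, columns, and sample rows are hypothetical, and SQLite stands in here for whatever relational database an organization actually uses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Extract + Load: raw rows go into the database exactly as received.
conn.execute("CREATE TABLE raw_orders (sku TEXT, region TEXT, amount TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(" a1", "eu", "120.50"), ("A1 ", "EU", "30.00"), ("b2", "us", "10.00")],
)

# Transform: trimming, case folding, and type conversion happen in SQL.
conn.execute("""
    CREATE TABLE orders AS
    SELECT UPPER(TRIM(sku))     AS sku,
           UPPER(TRIM(region))  AS region,
           CAST(amount AS REAL) AS amount
    FROM raw_orders
""")

totals = conn.execute(
    "SELECT sku, region, SUM(amount) FROM orders GROUP BY sku, region ORDER BY sku"
).fetchall()
print(totals)
```

Because the raw table is kept, the transformation can be rerun with corrected rules without extracting from the source systems again.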

17 of 35

The Virtual Warehouse

A virtual warehouse is created "on demand" by centralizing and normalizing metadata about the data sources rather than the data itself.

The data is left in its original location and extracted only when needed, which makes more "real time" analysis possible

18 of 35

Virtual Warehouse Via Metadata Repository (Gantz, XML 2004)

19 of 35

"Interoperability Costs in the US Auto Supply Chain"

Excellent case study about how a concurrent engineering business model escalates the information exchanges and interoperability problems in the "ecosystem"

Analyzes various alternatives for data transfer, and finds that the choices made are not the optimal ones

Concepts and lessons apply to other industries with "data exchange-intensive" supply chains

20 of 35

Alternatives for Data Transfer Between Two Systems

Manual re-entry

Everyone has to learn to "speak" all the languages

Native format transfer

Point-to-point translation

Everyone has to learn just one new language but it has to be the same one

Dominant players impose their language on their ecosystem

Multiple vocabularies exist, but there is at least one "interchange" or "hub" language designed to facilitate translations between "native" vocabularies
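
The last alternative, translation through a hub language, can be sketched as follows. The two "native" vocabularies and the hub field names (`order_id`, `buyer`, `quantity`) are hypothetical and deliberately tiny; real hub vocabularies also have to reconcile structure, units, and code lists, not just field names.

```python
# Each system declares one mapping: its native field names -> hub field names.
TO_HUB = {
    "system_a": {"po_num": "order_id", "cust": "buyer", "qty": "quantity"},
    "system_b": {"OrderNo": "order_id", "Purchaser": "buyer", "Units": "quantity"},
}

def to_hub(system, record):
    """Rename a native record's fields to the hub vocabulary."""
    mapping = TO_HUB[system]
    return {mapping[name]: value for name, value in record.items()}

def from_hub(system, hub_record):
    """Rename hub fields back to one system's native vocabulary."""
    reverse = {hub: native for native, hub in TO_HUB[system].items()}
    return {reverse[name]: value for name, value in hub_record.items()}

def translate(source, target, record):
    """Translate between any two systems by going through the hub."""
    return from_hub(target, to_hub(source, record))

order = {"po_num": "42", "cust": "Acme", "qty": 3}
print(translate("system_a", "system_b", order))
```

Adding a third system means writing one new entry in `TO_HUB`, with no changes to the existing systems, whereas point-to-point translation would need a new mapping for every existing partner.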

21 of 35

CAD / CAM Systems Proliferation

22 of 35

Juran's "Quality Costs" Framework

Joseph Juran's "Quality Control Handbook" (1951) -- "cost of quality" framework determines how much to spend on quality at any point in the "quality system"

The costs of preventing and finding quality problems (avoidance) ...

Prevention costs (design reviews, training, guidelines, knowledge...)

Appraisal costs (tests, process control measurements, reports, evaluations, ...)

... must be balanced against the costs associated with those quality problems (mitigation):

Internal failure costs (costs incurred before the product or service is delivered: scrap, rework, lost time, unused capacity, ...)

External failure costs (costs incurred when quality problems reach customers: returns, recalls, complaints, field services, warranty repairs, liability lawsuits, ...)

23 of 35

The Case for Investing in Avoidance

24 of 35

Interoperability Avoidance Costs

25 of 35

Interoperability Mitigation and Delay Costs

26 of 35

Estimated Interoperability Costs

27 of 35

An Interchange or Hub Language

28 of 35

Hub Languages for e-Business

(early 1990s) - Ad hoc efforts in EDIFACT to "harmonize" core components across verticals

1997 - XML Common Business Library is the first XML horizontal vocabulary, incorporating EDIFACT semantics and code lists

1999 - ebXML, an initiative of UN/CEFACT and OASIS to develop syntax-neutral "core components"

2001 - Universal Business Language effort begins, building on xCBL and ebXML Core Components

29 of 35

Universal Business Language

DOCUMENT ARCHITECTURE: A generic XML interchange format for business documents that can be extended to meet the requirements of particular industries

CORE COMPONENTS: A library of XML schemas for reusable data components such as "Address," "Item," and "Payment" -- the common data elements of everyday business documents

STANDARD DOCUMENTS: A small set of XML schemas for common business documents such as "Order," "Despatch Advice," and "Invoice" that are constructed from the UBL library components and can be used in a generic order-to-invoice trading context

30 of 35

UBL 1.0 Document / Process Scope

31 of 35

How A Hub Language Increases the XML Advantage over EDI

32 of 35

How a Hub Language Shortens the Time to the XML Payoff

33 of 35

Document Exchange Context with UBL

34 of 35

Mapping in and out of Hub Language

If all parties/applications/services rely on a hub language for their external interfaces, an interoperability challenge that grows with the square of the number of parties becomes a linear one

Mapping tools for transforming instances from an internal information model to another one are ubiquitous as standalone tools and as parts of application servers

EXAMPLE: Altova MapForce
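
The saving from a hub language can be made concrete by counting mappings. The sketch below assumes one directed mapping per ordered pair of vocabularies in the point-to-point case, and one mapping into the hub plus one out of it per vocabulary in the hub case; the function names are illustrative.

```python
def point_to_point_mappings(n: int) -> int:
    """One directed mapping for every ordered pair of distinct vocabularies."""
    return n * (n - 1)

def hub_mappings(n: int) -> int:
    """One mapping into the hub and one out of it per vocabulary."""
    return 2 * n

for n in (2, 3, 10, 50):
    print(n, point_to_point_mappings(n), hub_mappings(n))
```

Under these assumptions the two approaches break even at three vocabularies (six mappings each). Beyond that the point-to-point count grows with the square of the number of parties while the hub count grows linearly, so a hub pays off in ecosystems with many participants rather than in a single bilateral exchange.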

35 of 35

For Wednesday March 5

Chapter 5 of Document Engineering

"E-Government Architecture in Ireland" Sean McGrath and Fergal Murray, XML 2004 Conference

"Mobile Telemedicine System for Home Care and Patient Monitoring" M. V. M. Figueredo and J. S. Dias, Proceedings of the 26th Annual Conference of the IEEE EMBS (September 2004)

"Redefining the Patient Record Paradigm" MedicAlert Foundation, Whitepaper (January 2005)