Data Quality and Uncertainty Visualization UC San Diego COGS 220 Winter Quarter 2006 Barry Demchak.


Page 1

Data Quality and Uncertainty Visualization

UC San Diego

COGS 220

Winter Quarter 2006

Barry Demchak

Page 2

Immediate Motivation: Wiisard

A joint project of the Veterans Administration and UC San Diego, funded by the National Library of Medicine

Mass casualty triage and treatment

Enter patient information via PDAs

Patient information summarized on tablet PCs

Command/control for supervisors and incident command personnel

Tied together using 802.11b and store-and-forward database access

Page 3

Wiisard – Explosion with Pesticides

Page 4

Wiisard – Network Deployment

Page 5

Wiisard – Tablet Display

Page 6

Wiisard – Command/Control

Page 7

Wiisard – The Problem

What if the network becomes partitioned?

Tablet display shows out-of-date patient information

Summary displays are out of date, too

How does this lead to bad decisions?

Supervisors may mis-deploy doctors

Incident command may mis-deploy resources

People may die

Page 8

DOD Example

Sensor-to-shooter (STS) Networks – Patrick Driscoll (USMA), June 2002

Page 9

DOD Example

Page 10

DOD Example

“… our first attempt to get the military community to realize that there is a degree of uncertainty involved in (digital) information systems that cannot be engineered out of the system.”

“Ultimately, our concern was an awareness issue (for the decision maker) …”

“… woman at MITRE had proposed a system of tagging intelligence starting at the source in a way that would reflect the uncertainty of the data being put into the intel database.”

Page 11

The Problem

How to visualize the uncertainty in data so that humans can exercise judgment in making the best decision

Accounting for uncertainty is not the same thing as visualizing uncertainty

Page 12

What Labs are Involved

MIT Sloan School of Management: Richard Wang (Data Quality)

Penn State University: Alan MacEachren (GIS)

University of Maine: Kate Beard-Tisdale (GIS)

University of California, Santa Cruz: Alex Pang (Scientific Visualization)

University of Arkansas, Little Rock: Master of Science in Information Quality

Page 13

What Conferences are There?

MIT Information Quality (IQatMIT)

ACM SIGMOD Workshop on Information Quality in Information Systems (IQIS)

ACM SIGKDD (Knowledge Discovery and Data Mining)

MIT International Conference on Information Quality (ICIQ)

Page 14

Semiotic Interpretation

[Diagram. Surviving labels: Normal Data Quality; Poor Data Quality; Poor Data Quality w/ Uncertainty; Normal Mapping; Uncertainty Mapping; Data Visualization; Data Uncertainty Visualization]

Page 15

Definition of Data Quality

From Wand & Wang:

Page 16

Metrics

Timeliness: how up to date the data is relative to its intended purpose

Ballou et al.:

Timeliness = max(0, 1 - currency/volatility)

Currency = delivery_time - input_time

Volatility = length of time the data remains valid

Apply sensitivity factor “s”: Timeliness ^ s

[Figure: two plots of timeliness versus time, one labeled Pulse = 80 and one labeled Pulse = 180]
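The timeliness metric on this slide can be sketched in a few lines of Python. This is a minimal illustration of the formula as stated, not code from the Wiisard project. The function name, the choice of seconds as the time unit, and the volatility values attached to the two pulse readings are all assumptions made only for the example.

```python
def timeliness(delivery_time, input_time, volatility, s=1.0):
    """Timeliness = max(0, 1 - currency/volatility) ** s, as on the slide."""
    if volatility <= 0:
        raise ValueError("volatility must be positive")
    currency = delivery_time - input_time  # age of the data at delivery
    return max(0.0, 1.0 - currency / volatility) ** s


# Hypothetical volatilities (seconds): assume a calm pulse reading stays
# valid longer than an elevated one. These numbers are illustrative only.
ASSUMED_VOLATILITY_SECONDS = {"pulse_80": 600.0, "pulse_180": 60.0}

for label, volatility in ASSUMED_VOLATILITY_SECONDS.items():
    score = timeliness(delivery_time=30.0, input_time=0.0, volatility=volatility)
    print(label, round(score, 3))
```

Under these assumed volatilities, the same 30-second-old reading scores higher for the calm pulse than for the elevated one, which is one way to read the pair of plots. Raising the result to a sensitivity factor s greater than 1 makes the score fall off faster as the data ages.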

Page 17

Interplay with Uncertainty

Metrics are application dependent

Metrics are data dependent

Metrics are user dependent

Question: If a metric describes an individual data element, what is the effect of aggregating data elements having uncertainty?
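The slide leaves the aggregation question open. One way to explore it is to compare candidate rules on the same per-element scores. The sketch below assumes each data element already carries a quality score between 0 and 1; the function name and both rules (mean and minimum) are hypothetical choices for illustration, not an answer proposed by the presentation.

```python
from statistics import mean


def aggregate_quality(scores, rule="mean"):
    """Combine per-element quality scores (0..1) into one summary score."""
    if not scores:
        raise ValueError("need at least one score")
    if rule == "mean":
        return mean(scores)  # average quality; hides a single stale element
    if rule == "min":
        return min(scores)   # worst case; one stale element dominates
    raise ValueError(f"unknown rule: {rule}")


# Hypothetical scores for three patient records feeding one summary display.
patient_scores = [1.0, 0.5, 0.0]
print("mean:", aggregate_quality(patient_scores, "mean"))
print("min:", aggregate_quality(patient_scores, "min"))
```

The two rules disagree on the same inputs, which shows why the choice of aggregation is itself application, data, and user dependent.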

Page 18

GIS Examples – NCGIA

Sample point locations as overlay

Sample points and corresponding contours using naïve shading

Page 19

GIS Examples – NCGIA

Gray shading uncertainty surface captures distance function used by interpolation method

Uncertainty encoded in contour line widths

Page 20

GIS Techniques

Fill Clarity

Resolution

Contour Crispness

Fog

Page 21

Merging Data and Uncertainty

Risk and uncertainty separately

Risk and uncertainty combined

Page 22

Basic Data Examples

Errors

Page 23

Basic Data Examples

Errors

Page 24

Basic Data Examples

Ambiguation

Page 25

Basic Data Examples

Ambiguation

Page 26

Photo Realistic

Page 27

Uncertainty Vector Glyphs

Page 28

Uncertainty Vector Glyphs

Page 29

Hue as Uncertainty

Without

With

Page 30

Texture as Uncertainty

Raw

Transparent Points

Certainty

Opaque Lines

Page 31

Data Confidence

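x is a device, R(x) is a weighting for device x in the calculation, and the formula also includes a decay constant

Back to Wiisard

[Equation garbled in extraction. Surviving terms: C, R(x), curtime, posttime(x), pingtime(x), and a sum over devices x]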

Page 32

Back to Wiisard

Individual data (annotation)

Aggregate data (annotated/integrated)

Page 33

Back to Wiisard

Annotated

Page 34

Back to Wiisard

Integrated

Page 35

Research Questions

What are the dimensions of metrics relevant for determining data quality for medical providers in a mass casualty context?

What kind of visualization best conveys suitability for use for various kinds of data?

Single data points

Streaming bioinformation

Aggregated information

Page 36

Research Questions

What kinds of visualizations are best suited to field personnel?

Non-IS frenzied technicians

High glare, small footprint screens

Low processing power

What kinds of visualizations are best suited to incident command?

Seasoned experts

Large, high density displays

Highly connected with high data processing

Page 37

Conclusion

Data Quality and Uncertainty Visualization are like the weather …

… everyone talks about it, but no one does anything about it