Distributed Tera-Mining

Distributed Tera-Mining R. L. Grossman Laboratory for Advanced Computing University of Illinois & Magnify, Inc.


Page 1: Distributed Tera-Mining

Distributed Tera-Mining

R. L. Grossman

Laboratory for Advanced Computing

University of Illinois &

Magnify, Inc.

Page 2: Distributed Tera-Mining

Trend 1. Explosion of Data …

Page 3: Distributed Tera-Mining

… All in the Wrong Format

With no one to analyze it.

Page 4: Distributed Tera-Mining

The Data Gap

[Chart: total new disk (TB) since 1995 compared with new Ph.D.s, 1995 to 1999; vertical axis 0 to 4,000,000]

Most data comes a GB and a TB at a time.

Page 5: Distributed Tera-Mining

Trend 2. SONET is dead. Lambda rules.

Gigabytes can be moved in seconds.

Page 6: Distributed Tera-Mining

Trend 3: Most Data is Distributed

Bush’s Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.

Page 7: Distributed Tera-Mining

Example 1: ENSO & Cholera

El Niño data at NCAR

Cholera data at WHO

Page 8: Distributed Tera-Mining

Example 2: Voting

Table 1

County      Buchanan
ALACHUA     263
BAKER       73
BAY         248
BRADFORD    65
BREVARD     570
BROWARD     788

Table 2

County      Reform
Alachua     91
Baker       4
Bay         55
Bradford    3
Brevard     148
Broward     332
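The voting example amounts to joining two separately held tables on a shared county key and then correlating the paired columns. A minimal sketch in Python, assuming the six counties shown in the tables are the whole data set; the helper names `join_on_county` and `pearson` are hypothetical and are not part of DataSpace or DSTP:

```python
from math import sqrt

# Table 1: total votes for Buchanan by county (values from the slide).
buchanan_votes = {
    "ALACHUA": 263, "BAKER": 73, "BAY": 248,
    "BRADFORD": 65, "BREVARD": 570, "BROWARD": 788,
}

# Table 2: registered Reform voters by county (values from the slide).
reform_voters = {
    "Alachua": 91, "Baker": 4, "Bay": 55,
    "Bradford": 3, "Brevard": 148, "Broward": 332,
}

def join_on_county(x_table, y_table):
    """Inner join of two {county: value} tables on a case-insensitive key."""
    x_norm = {name.strip().upper(): v for name, v in x_table.items()}
    y_norm = {name.strip().upper(): v for name, v in y_table.items()}
    shared = sorted(x_norm.keys() & y_norm.keys())
    return {county: (x_norm[county], y_norm[county]) for county in shared}

def pearson(pairs):
    """Pearson correlation coefficient of a list of (x, y) pairs."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)

joined = join_on_county(reform_voters, buchanan_votes)
r = pearson(list(joined.values()))
print(f"{len(joined)} counties joined, correlation r = {r:.3f}")
```

The county names are normalized before joining because the two tables spell them differently (upper case in one, title case in the other), which is typical when the tables come from different sources.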

Page 9: Distributed Tera-Mining

Correlation: Reform Voters vs Votes for Buchanan

[Scatter plot: one point per county; horizontal axis 0 to 450, vertical axis 0 to 4000; the Palm Beach point is labeled]

Page 10: Distributed Tera-Mining

DataSpace – One Approach to Making Data Useful

Today’s Multi-media Web

• 16 terabytes of documents
• 4 billion documents
• html
• http
• search by keyword
• workstations, servers

Tomorrow’s Data Web

• petabytes of data
• tens of billions to trillions of records
• pmml & dtml
• dstp
• correlate & mine
• data & compute clusters

Complementary to the grid, which we view as a distributed computer.

Page 11: Distributed Tera-Mining

[Diagram: DSTP Server 1 and DSTP Server 2, one serving pairs k[i], x[i] and the other k[i], y[j]; labels: attributes [aid], UCK [uckid]]

Page 12: Distributed Tera-Mining

Terra Mining Testbed

Optical testbed for distributed tera mining of scientific data.

A further goal is to be a testbed for broadband-based business services.

Page 13: Distributed Tera-Mining

Lessons Learned

1. It’s the data, stupid. Cycles, cylinders & lambdas are all commodities.

2. The fundamental challenge: lower the cost to make data useful.

3. The emergence of internet infrastructure for data is inevitable.

Opens up possibilities for new types of scientific discoveries.

Page 14: Distributed Tera-Mining

For More Information

DataSpace

http://www.dataspaceweb.net

http://www.ncdm.uic.edu

DataSpace Standards

http://www.dmg.org

Selected articles

http://www.twocultures.net

Magnify – http://www.magnify.com

Page 15: Distributed Tera-Mining

End of Slides

Page 16: Distributed Tera-Mining

FTP Still Lives

Page 17: Distributed Tera-Mining

Trend 2. Bandwidth is a Commodity

OC-3 OC-12 OC-48
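To put the claim that gigabytes can be moved in seconds next to these three link types, here is a rough back-of-the-envelope sketch. It assumes the nominal SONET line rate of 51.84 Mbit/s per OC-1 and ignores framing and protocol overhead, so real transfers would be somewhat slower; the function name `transfer_seconds` is hypothetical.

```python
# Nominal SONET line rate: OC-n runs at n x 51.84 Mbit/s.
# Assumption: overhead is ignored, so these are best-case figures.
OC1_MBITS_PER_SECOND = 51.84

def transfer_seconds(size_bytes, oc_level):
    """Best-case time to move size_bytes over an OC-<oc_level> link."""
    bits = size_bytes * 8
    bits_per_second = oc_level * OC1_MBITS_PER_SECOND * 1e6
    return bits / bits_per_second

GIGABYTE = 10**9  # decimal gigabyte

for level in (3, 12, 48):
    seconds = transfer_seconds(GIGABYTE, level)
    print(f"OC-{level}: about {seconds:.1f} s per gigabyte")
```

Under these assumptions a gigabyte takes a little over three seconds on OC-48 and a bit under a minute on OC-3.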

Page 18: Distributed Tera-Mining

El Niño Anomalies

Page 19: Distributed Tera-Mining

Indonesia Cholera Cases

Page 20: Distributed Tera-Mining

Cholera Cases

Page 21: Distributed Tera-Mining

Distributed Exabytes (New Disks)

[Chart: new disk capacity in petabytes by year, 1995 to 2003; vertical axis 0 to 14,000 petabytes, with the 1 exabyte level marked]

Source: IDC (1999) "1999 Winchester Disk Drive Market Forecast and Review"

Page 22: Distributed Tera-Mining

Trend 3: Most Data is Distributed

W’s Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.

Page 23: Distributed Tera-Mining

Example 2: Voting

Page 24: Distributed Tera-Mining

Database 1: Total Votes for Buchanan by County

County      Buchanan
ALACHUA     263
BAKER       73
BAY         248
BRADFORD    65
BREVARD     570
BROWARD     788

Page 25: Distributed Tera-Mining

Database 2: Total Registered Reform Voters by County

County      Reform
Alachua     91
Baker       4
Bay         55
Bradford    3
Brevard     148
Broward     332