Data Mining At the Crossroads: Successes, Failures, and Learning from Them
Haym Hirsh
Department of Computer Science
Rutgers University
Division of Information and Intelligent Systems
U.S. National Science Foundation
Copyright © 2007 Haym Hirsh
What is Data Mining?
For the purposes of my presentation:
Data Mining = the extraction of useful information from data
(i.e., data mining broadly construed)
Some of My Favorite Data Mining Successes
• Web search
• Spam filtering
• Recommender systems
• Machine translation
• Massive data clusters
• Conferences like this one: participation by people from diverse, previously disjoint subfields (databases, machine learning, statistics, etc.)
• Benchmark datasets
Some Data Mining “Failures”
• Socio-Political
Section 13.3: UNITY OF EFFORT IN SHARING INFORMATION
The U.S. government has access to a vast amount of information. When databases not usually thought of as "intelligence," such as customs or immigration information, are included, the storehouse is immense. … In interviews around the government, official after official urged us to call attention to frustrations with the unglamorous "back office" side of government operations. …
Recommendation: The president should lead the government-wide effort to bring the major national security institutions into the information revolution. He should coordinate the resolution of the legal, policy, and technical issues across agencies to create a "trusted information network."
Some Data Mining “Failures”
• Socio-Political
– Bad data mining
– Misused data mining
– Ignorant decision-making
– Ramifications of data mining
– Presuming fixed technology
Some Data Mining “Failures”
• Data Mining is about Real Data: Benchmark data sets are a means to an end
– Data sets are supposed to be representative of the sorts of problems our algorithms will see in practice
– Data sets must stay timely as technological and scientific advances allow our ambitions to grow
– A data set from some domain is not an application: who do you personally know who cares about your results?
Some Data Mining “Failures”
• How do we ensure reproducible results?
– Many of the applications of data mining are in the commercial sector. How do we handle research results that reflect proprietary or otherwise restricted data?
– How do we make sure academic research results address problems that are important in practice?
– How do we handle inherent resource differentials between industry and academic research?
  • Access to data
  • Massive data centers
– What new models of publication are particularly suited to data mining? “Executable articles” (Mark Liberman)