![Page 1: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/1.jpg)
![Page 2: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/2.jpg)
Creating a science base to support new directions in computer science
Cornell University Ithaca New York
John Hopcroft
![Page 3: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/3.jpg)
Time of change
The information age is a fundamental revolution that is changing all aspects of our lives.
Those individuals, institutions, and nations who recognize this change and position themselves for the future will benefit enormously.
21st Century computing
![Page 4: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/4.jpg)
Computer Science is changing
Early years
• Programming languages
• Compilers
• Operating systems
• Algorithms
• Databases
Emphasis on making computers useful
![Page 5: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/5.jpg)
Computer Science is changing
The future years
• Tracking the flow of ideas in scientific literature
• Tracking evolution of communities in social networks
• Extracting information from unstructured data sources
• Processing massive data sets and streams
• Extracting signals from noise
• Dealing with high-dimensional data and dimension reduction
![Page 6: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/6.jpg)
Drivers of change
• Merging of computing and communications
• Data available in digital form
• Networked devices and sensors
• Computers becoming ubiquitous
![Page 7: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/7.jpg)
Implications for TCS
• Need to develop theory to support the new directions
• Update computer science education
![Page 8: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/8.jpg)
Internet search engines are changing
• When was Einstein born?
Einstein was born at Ulm, in Wurttemberg, Germany, on March 14, 1879.
List of relevant web pages
![Page 9: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/9.jpg)
![Page 10: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/10.jpg)
Internet queries will be different
• Which car should I buy?
• What are the key papers in Theoretical Computer Science?
• Construct an annotated bibliography on graph theory.
• Where should I go to college?
• How did the field of computer science develop?
![Page 11: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/11.jpg)
Which car should I buy?
• Search engine response: Which criteria below are important to you?
  – Fuel economy
  – Crash safety
  – Reliability
  – Performance
  – Etc.
![Page 12: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/12.jpg)
| Make | Cost ($) | Reliability | Fuel economy | Crash safety | Links to photos/articles |
|---|---|---|---|---|---|
| Toyota Prius | 23,780 | Excellent | 44 mpg | Fair | photo, article |
| Honda Accord | 28,695 | Better | 26 mpg | Excellent | photo, article |
| Toyota Camry | 29,839 | Average | 24 mpg | Good | photo, article |
| Lexus 350 | 38,615 | Excellent | 23 mpg | Good | photo, article |
| Infiniti M35 | 47,650 | Excellent | 19 mpg | Good | photo, article |
![Page 13: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/13.jpg)
![Page 14: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/14.jpg)
2010 Toyota Camry - Auto Shows
Toyota sneaks the new Camry into the Detroit Auto Show.
Usually, redesigns and facelifts of cars as significant as the hot-selling Toyota Camry are accompanied by a commensurate amount of fanfare. So we were surprised when, right about the time that we were walking by the Toyota booth, a chirp of our Blackberries brought us the press release announcing that the facelifted 2010 Toyota Camry and Camry Hybrid mid-sized sedans were appearing at the 2009 NAIAS in Detroit.
We’d have hardly noticed if they hadn’t told us—the headlamps are slightly larger, the grilles on the gas and hybrid models go their own way, and taillamps become primarily LED. Wheels are also new, but overall, the resemblance to the Corolla is downright uncanny. Let’s hear it for brand consistency!
Four-cylinder Camrys get Toyota’s new 2.5-liter four-cylinder with a boost in horsepower to 169 for LE and XLE grades, 179 for the Camry SE, all of which are available with six-speed manual or automatic transmissions. Camry V-6 and Hybrid models are relatively unchanged under the skin.
Inside, changes are likewise minimal: the options list has been shaken up a bit, but the only visible change on any Camry model is the Hybrid’s new gauge cluster and softer seat fabrics. Pricing will be announced closer to the time it goes on sale this March.
![Page 15: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/15.jpg)
Which are the key papers in Theoretical Computer Science?
• Hartmanis and Stearns, “On the computational complexity of algorithms”
• Blum, “A machine-independent theory of the complexity of recursive functions”
• Cook, “The complexity of theorem proving procedures”
• Karp, “Reducibility among combinatorial problems”
• Garey and Johnson, “Computers and Intractability: A Guide to the Theory of NP-Completeness”
• Yao, “Theory and Applications of Trapdoor Functions”
• Goldwasser, Micali, and Rackoff, “The Knowledge Complexity of Interactive Proof Systems”
• Arora, Lund, Motwani, Sudan, and Szegedy, “Proof Verification and the Hardness of Approximation Problems”
![Page 16: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/16.jpg)
Temporal Cluster Histograms: NIPS Results
NIPS k-means clusters (k=13), with representative words for each cluster:
• 0: speech, word, hmm, recognition, mlp
• 1: recurrent, hidden, training, units, error
• 2: image, images, object, face, video
• 3: code, codes, decoding, message, hints
• 4: units, node, training, nodes, tree
• 5: visual, eye, cells, motion, orientation
• 6: policy, reinforcement, action, state, agent
• 7: david, michael, john, richard, chair
• 8: neurons, neuron, synaptic, memory, firing
• 9: spike, spikes, firing, neuron, neurons
• 10: bayesian, mixture, posterior, likelihood, em
• 11: kernel, margin, svm, vc, xi
• 12: chip, circuit, analog, voltage, vlsi
[Figure: number of papers per cluster by year; years 1 to 14 on the x-axis, 0 to 180 papers on the y-axis]
Shaparenko, Caruana, Gehrke, and Joachims
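The slide reports k-means clusters (k = 13) of NIPS papers but does not say which features or preprocessing were used. A minimal sketch of Lloyd's k-means on word-count vectors, assuming each paper is treated as a bag of words; the toy documents and the deterministic farthest-point initialization are illustrative choices, not details of the study.

```python
from collections import Counter


def vectorize(docs):
    """Map each document (a string) to a word-count vector over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0.0] * len(vocab)
        for w, c in Counter(d.split()).items():
            v[index[w]] = float(c)
        vectors.append(v)
    return vocab, vectors


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=100):
    """Lloyd's algorithm with deterministic farthest-point initialization."""
    centers = [list(points[0])]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        far = max(points, key=lambda p: min(sq_dist(p, c) for c in centers))
        centers.append(list(far))
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: no assignment changed
        labels = new_labels
        # Update step: move each center to the mean of its assigned points.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its old center
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers


# Invented toy "papers"; real input would be NIPS abstracts or full texts.
docs = [
    "kernel margin svm kernel",
    "svm margin kernel vc",
    "spike neuron firing spike",
    "neuron spikes firing neuron",
]
vocab, vectors = vectorize(docs)
labels, centers = kmeans(vectors, k=2)
print(labels)
```

Plain k-means can stop in a poor local optimum depending on the starting centers, which is why the sketch uses a deterministic spread-out initialization rather than a random one.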
![Page 17: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/17.jpg)
FedEx package tracking
![Page 18: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/18.jpg)
![Page 19: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/19.jpg)
![Page 20: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/20.jpg)
![Page 21: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/21.jpg)
![Page 22: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/22.jpg)
![Page 23: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/23.jpg)
NEXRAD Radar
Binghamton, Base Reflectivity, 0.50 Degree Elevation, Range 124 NMI (Ithaca, NY)
![Page 24: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/24.jpg)
![Page 25: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/25.jpg)
Collective Inference on Markov Models for Modeling Bird Migration
[Figure: space vs. time]
![Page 26: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/26.jpg)
Daniel Sheldon, M. A. Saleh Elmohamed, Dexter Kozen
![Page 27: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/27.jpg)
Science base to support activities
• Track flow of ideas in scientific literature
• Track evolution of communities in social networks
• Extract information from unstructured data sources
![Page 28: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/28.jpg)
Tracking the flow of ideas in scientific literature
Yookyung Jo
![Page 29: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/29.jpg)
[Figure: topic keywords, including page rank, web, link graph, retrieval, query, search, text, web page, rank, chord, usage, index, probabilistic, text file, retrieve, discourse, word, centering, anaphora]
![Page 30: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/30.jpg)
Original papers
![Page 31: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/31.jpg)
Original papers cleaned up
![Page 32: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/32.jpg)
Referenced papers
![Page 33: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/33.jpg)
Referenced papers cleaned up: three distinct categories of papers
![Page 34: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/34.jpg)
Topic evolution thread
• Seed topic:
  – 648: making C programs type-safe by separating pointer types by their usage to prevent memory errors
• 3 subthreads:
  – Type
  – Garbage collection
  – Pointer analysis
Yookyung Jo, 2010
![Page 35: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/35.jpg)
Topic Evolution Map of the ACM corpus
Yookyung Jo, 2010
![Page 36: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/36.jpg)
Tracking communities in social networks
Liaoruo Wang
![Page 37: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/37.jpg)
“Statistical Properties of Community Structure in Large Social and Information Networks”, Jure Leskovec; Kevin Lang; Anirban Dasgupta; Michael Mahoney
• Studied over 70 large sparse real-world networks.
• Best communities are of approximate size 100 to 150.
![Page 38: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/38.jpg)
Our most striking finding is that in nearly every network dataset we examined, we observed tight but almost trivial communities at very small scales, and at larger size scales, the best possible communities gradually "blend in" with the rest of the network and thus become less "community-like".
![Page 39: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/39.jpg)
[Figure: conductance vs. size of community, with size 100 marked]
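The plot measures community quality by conductance. A minimal sketch, assuming the common definition (edges leaving a set S divided by the smaller of vol(S) and the volume of the rest of the graph, where volume is the sum of degrees); lower conductance means a more community-like set. The graph is an invented example.

```python
def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def conductance(adj, S):
    """Cut edges of S divided by min(vol(S), vol(rest of graph))."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_s = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj if u not in S)
    denom = min(vol_s, vol_rest)
    if denom == 0:
        raise ValueError("S must be a nonempty proper subset with edges on both sides")
    return cut / denom


# Two triangles joined by the single edge 2-3 (invented example).
adj = build_adjacency([(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)])
print(conductance(adj, {0, 1, 2}))  # 1 cut edge / volume 7
print(conductance(adj, {0}))        # 2 cut edges / volume 2
```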
![Page 40: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/40.jpg)
Giant component
![Page 41: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/41.jpg)
Whisker: A component with v vertices connected to the rest of the graph by e edges
![Page 42: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/42.jpg)
Our view of a community
[Figure: communities around "Me": TCS, colleagues at Cornell, classmates, family and friends]
More connections outside than inside
![Page 43: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/43.jpg)
• Early work
  – Min cut: two equal-size communities
  – Conductance: minimizes cross edges
• Future work
  – Consider communities with more cross edges than internal edges
  – Find small communities
  – Track communities over time
  – Develop appropriate definitions for communities
  – Understand the structure of different types of social networks
![Page 44: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/44.jpg)
“Clustering Social Networks”, Mishra, Schreiber, Stanton, and Tarjan
• Each member of the community is connected to at least a beta fraction of the community
• No member outside the community is connected to more than an alpha fraction of the community
• Some connectivity constraint
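The two degree conditions above can be checked directly for a candidate set. A minimal sketch, assuming an undirected graph stored as adjacency sets; the connectivity constraint is left out because the slide does not specify it, and the two-clique graph is an invented example.

```python
from itertools import combinations


def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_alpha_beta_community(adj, C, alpha, beta):
    """Check the internal (beta) and external (alpha) degree conditions for C.

    The connectivity constraint is not checked here."""
    C = set(C)
    size = len(C)
    # Internal: every member is adjacent to at least a beta fraction of C.
    if any(len(adj[u] & C) < beta * size for u in C):
        return False
    # External: no outsider is adjacent to more than an alpha fraction of C.
    if any(len(adj[v] & C) > alpha * size for v in adj if v not in C):
        return False
    return True


# Two 4-cliques joined by the edge 3-4 (invented example).
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4)]
adj = build_adjacency(edges)
print(is_alpha_beta_community(adj, range(4), alpha=0.3, beta=0.7))
print(is_alpha_beta_community(adj, range(4), alpha=0.2, beta=0.7))
```

With alpha = 0.2 the check fails because vertex 4 has one neighbor in a community of size 4, which exceeds 0.2 × 4.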
![Page 45: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/45.jpg)
In sparse graphs
• How do you find alpha-beta communities?
• What if each person in the community is connected to more members outside the community than inside?
![Page 46: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/46.jpg)
Structure of social networks
• Different networks may have quite different structures.
• An algorithm for finding alpha-beta communities in sparse graphs.
• How many communities of a given size are there in a social network?
• Results on exploring a Twitter network of approximately 100,000 nodes.
Liaoruo Wang, Jing He, Hongyu Liang
![Page 47: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/47.jpg)
200 randomly chosen nodes → apply alpha-beta algorithm → resulting alpha-beta community
How many alpha-beta communities?
![Page 48: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/48.jpg)
Massively overlapping communities
• Are there a small number of massively overlapping communities that share a common core?
• Are there massively overlapping communities in which one can move from one community to a totally disjoint community?
![Page 49: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/49.jpg)
Massively overlapping communities with a common core
![Page 50: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/50.jpg)
Massively overlapping communities
![Page 51: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/51.jpg)
• Define the core of a set of overlapping communities to be the intersection of the communities.
• There are a small number of cores in the Twitter data set.
![Page 52: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/52.jpg)
| Size of initial set | Number of cores |
|---|---|
| 25 | 221 |
| 50 | 94 |
| 100 | 19 |
| 150 | 8 |
| 200 | 4 |
| 250 | 4 |
| 300 | 4 |
| 350 | 3 |
| 400 | 3 |
| 450 | 3 |
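The slide defines the core of a set of overlapping communities as their intersection but does not say how communities found from different initial sets are grouped before taking intersections. A minimal sketch: `core` follows the slide's definition, while `group_into_cores` and its `min_overlap` Jaccard threshold are a hypothetical grouping rule chosen only for illustration. The communities are invented.

```python
def core(communities):
    """Core of a set of overlapping communities: their common intersection."""
    return set.intersection(*[set(c) for c in communities])


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_into_cores(communities, min_overlap=0.5):
    """HYPOTHETICAL grouping rule: a community joins the first group whose
    current core it overlaps with Jaccard similarity >= min_overlap,
    otherwise it starts a new group. Returns the core of each group."""
    groups = []
    for c in communities:
        c = set(c)
        for g in groups:
            if jaccard(c, core(g)) >= min_overlap:
                g.append(c)
                break
        else:
            groups.append([c])
    return [core(g) for g in groups]


# Invented communities: three heavily overlapping ones plus one disjoint one.
A = set(range(1, 11))
B = set(range(2, 12))
C = set(range(1, 10)) | {12}
D = set(range(50, 60))
print(group_into_cores([A, B, C, D]))
```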
![Page 53: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/53.jpg)
| Size of initial set | Cores found |
|---|---|
| 450 | 1, 5, 6 |
| 400 | 1, 5, 6 |
| 350 | 1, 5, 6 |
| 300 | 1, 3, 5, 6 |
| 250 | 1, 3, 5, 6 |
| 200 | 1, 3, 5, 6 |
| 150 | 1, 2, 3, 4, 5, 6, 7, 8 |
![Page 54: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/54.jpg)
• What is the graph structure that causes certain cores to merge and others to simply vanish?
• What is the structure of cores as they get larger? Do they consist of concentric layers that are less dense at the outside?
![Page 55: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/55.jpg)
Sucheta Soundarajan
Transmission paths for viruses, flow of ideas, or influence
![Page 56: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/56.jpg)
Sparse vectors
There are a number of situations where sparse vectors are important:
• Tracking the flow of ideas in scientific literature
• Biological applications
• Signal processing
![Page 57: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/57.jpg)
Sparse vectors in biology
Plants
Genotype: internal code
Phenotype: observables, outward manifestation
![Page 58: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/58.jpg)
Theory to support new directions
• Large graphs
• Spectral analysis
• High dimensions and dimension reduction
• Clustering
• Collaborative filtering
• Extracting signal from noise
![Page 59: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/59.jpg)
Theory of Large Graphs
• Large graphs with billions of vertices
• Exact set of edges present is not critical
• Results invariant to small changes in definition
• Must be able to prove basic theorems
![Page 60: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/60.jpg)
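Erdős–Rényi: n vertices; each of the $\binom{n}{2}$ potential edges is present independently with probability $p$.
Probability that a given vertex has degree $k$: $\binom{n-1}{k} p^k (1-p)^{n-1-k}$ (binomial degree distribution)
[Figure: binomial degree distribution; x-axis vertex degree, y-axis number of vertices]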
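A quick way to see the binomial degree distribution is to sample G(n, p) and compare observed degree counts with n times the formula. A minimal sketch; n = 1000, p = 0.01 and the fixed seed are arbitrary illustrative choices.

```python
import math
import random
from collections import Counter


def gnp_degrees(n, p, seed=0):
    """Sample G(n, p) and return the degree of every vertex."""
    rng = random.Random(seed)
    degree = [0] * n
    for u in range(n):
        for v in range(u + 1, n):
            if rng.random() < p:  # each potential edge independently with prob. p
                degree[u] += 1
                degree[v] += 1
    return degree


def binomial_degree_pmf(n, p, k):
    """Probability that a fixed vertex has degree k: C(n-1, k) p^k (1-p)^(n-1-k)."""
    return math.comb(n - 1, k) * p ** k * (1 - p) ** (n - 1 - k)


n, p = 1000, 0.01
degree = gnp_degrees(n, p)
observed = Counter(degree)
for k in range(5, 16):
    # degree, observed number of vertices, expected number n * pmf(k)
    print(k, observed[k], round(n * binomial_degree_pmf(n, p, k), 1))
```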
![Page 61: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/61.jpg)
![Page 62: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/62.jpg)
Generative models for graphs
• Vertices and edges added at each unit of time
• Rule to determine where to place edges
  – Uniform probability
  – Preferential attachment: gives rise to power-law degree distributions
![Page 63: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/63.jpg)
[Figure: degree distribution; x-axis vertex degree, y-axis number of vertices]
Preferential attachment gives rise to the power-law degree distribution common in many graphs.
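The contrast between the two attachment rules shows up in the largest degree. A minimal sketch, assuming each new vertex adds exactly one edge (the slide does not fix the number of edges per step); degree-proportional sampling uses a list in which every vertex appears once per incident edge.

```python
import random


def grow_degrees(n, preferential, seed=0):
    """Grow a tree one vertex at a time; each new vertex adds one edge.

    preferential=True: attach to an existing vertex with probability
    proportional to its degree. preferential=False: attach uniformly."""
    rng = random.Random(seed)
    degree = [1, 1]       # start from the single edge 0-1
    endpoints = [0, 1]    # vertex v appears degree[v] times in this list
    for new in range(2, n):
        if preferential:
            target = rng.choice(endpoints)  # degree-proportional choice
        else:
            target = rng.randrange(new)     # uniform over existing vertices
        degree.append(1)
        degree[target] += 1
        endpoints.extend((new, target))
    return degree


n = 20000
pa = grow_degrees(n, preferential=True)
ua = grow_degrees(n, preferential=False)
print("max degree, preferential:", max(pa))
print("max degree, uniform:     ", max(ua))
```

Under preferential attachment the early vertices keep accumulating edges, producing the heavy tail; under uniform attachment the maximum degree stays small.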
![Page 64: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/64.jpg)
Protein interactions
2730 proteins in database
3602 interactions between proteins

| Size of component | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | … | 1000 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of components | 48 | 179 | 50 | 25 | 14 | 6 | 4 | 6 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | | 0 |

Science 1999 July 30; 285:751-753
Only 899 proteins in components. Where are the 1851 missing proteins?
![Page 65: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/65.jpg)
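Protein interactions
2730 proteins in database
3602 interactions between proteins

| Size of component | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | … | 1851 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Number of components | 48 | 179 | 50 | 25 | 14 | 6 | 4 | 6 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 | | 1 |

Science 1999 July 30; 285:751-753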
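Tables like these are component-size histograms of the interaction graph. A minimal sketch using breadth-first search, assuming proteins are vertices and interactions are undirected edges; isolated vertices are counted as size-1 components here, and whether the original table counts them that way is not stated. The toy graph is invented, not the protein data.

```python
from collections import Counter, deque


def component_size_histogram(vertices, edges):
    """Return a Counter mapping component size -> number of components.
    Vertices with no edges count as components of size 1."""
    adj = {v: [] for v in vertices}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = set()
    sizes = Counter()
    for s in vertices:
        if s in seen:
            continue
        # Breadth-first search collects the component containing s.
        seen.add(s)
        queue = deque([s])
        size = 0
        while queue:
            u = queue.popleft()
            size += 1
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        sizes[size] += 1
    return sizes


# Invented toy graph, not the protein data.
hist = component_size_histogram(range(10), [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (7, 8)])
print(sorted(hist.items()))
```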
![Page 66: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/66.jpg)
Science base
• What do we mean by science base?
Example: High dimensions
![Page 67: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/67.jpg)
High dimension is fundamentally different from 2 or 3 dimensional space
![Page 68: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/68.jpg)
High dimensional data is inherently unstable
• Given n random points in d-dimensional space, essentially all $\binom{n}{2}$ pairwise distances are nearly equal.
• Squared distance: $|x - y|^2 = \sum_{i=1}^{d} (x_i - y_i)^2$
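Because the squared distance is a sum of d independent terms, it concentrates around its mean as d grows. A minimal sketch, assuming "random points" means points drawn uniformly from the unit cube; it reports the ratio of the largest to the smallest pairwise distance, which approaches 1 in high dimension.

```python
import math
import random


def distance_spread(n, d, seed=0):
    """Max / min over all C(n, 2) pairwise distances of n uniform points in [0, 1]^d."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n)]
    dists = [math.dist(pts[i], pts[j]) for i in range(n) for j in range(i + 1, n)]
    return max(dists) / min(dists)


for d in (2, 10, 100, 1000):
    print(d, round(distance_spread(50, d), 3))
```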
![Page 69: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/69.jpg)
High Dimensions
• Intuition from two and three dimensions is not valid in high dimensions
• Volume of the unit cube is one in all dimensions
• Volume of the unit sphere goes to zero as the dimension grows
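The standard closed form for the volume of the radius-1 ball in d dimensions is $\pi^{d/2} / \Gamma(d/2 + 1)$; the factorial-like growth of the Gamma function in the denominator eventually beats $\pi^{d/2}$, so the volume tends to zero while the unit cube keeps volume 1. A minimal sketch evaluating the formula:

```python
import math


def unit_ball_volume(d):
    """Volume of the d-dimensional ball of radius 1: pi^(d/2) / Gamma(d/2 + 1)."""
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1)


for d in (1, 2, 3, 5, 10, 20, 50):
    print(d, unit_ball_volume(d))
```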
![Page 70: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/70.jpg)
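2 Dimensions
[Figure: unit square centered inside the unit sphere of radius 1]
Distance from the center to a corner of the unit square: $\sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} = \tfrac{\sqrt{2}}{2} \approx 0.707$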
![Page 71: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/71.jpg)
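4 Dimensions
Distance from the center to a corner of the unit cube: $\sqrt{\left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^2} = 1$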
![Page 72: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/72.jpg)
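d Dimensions
Distance from the center to a corner of the unit cube: $\sqrt{d\left(\tfrac{1}{2}\right)^2} = \tfrac{\sqrt{d}}{2}$, which exceeds the unit sphere's radius of 1 once $d > 4$.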
![Page 73: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/73.jpg)
Almost all of the volume of the unit cube is outside the unit sphere
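A Monte Carlo estimate makes the claim concrete. A minimal sketch, assuming (as in the figures) a unit cube of side 1 and a unit sphere of radius 1, both centered at the origin; it samples points uniformly in the cube and counts how many fall inside the sphere.

```python
import random


def fraction_inside_ball(d, samples=10000, seed=0):
    """Estimated fraction of the side-1 cube [-1/2, 1/2)^d lying in the radius-1 ball."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x = [rng.random() - 0.5 for _ in range(d)]
        if sum(t * t for t in x) <= 1.0:
            inside += 1
    return inside / samples


for d in (2, 4, 8, 16, 32):
    print(d, fraction_inside_ball(d, samples=5000))
```

For d ≤ 4 every point of the cube is within distance 1 of the center, so the estimate is exactly 1; the fraction then collapses as the typical distance $\sqrt{d/12}$ passes 1.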
![Page 74: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/74.jpg)
Gaussian distribution
Probability mass concentrated between dotted lines
![Page 75: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/75.jpg)
Gaussian in high dimensions
[Figure: probability mass concentrated in a thin annulus of radius $\sqrt{d}$ (figure labels: 3, $\sqrt{d}$)]
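The annulus picture can be checked by sampling. A minimal sketch, assuming a spherical standard Gaussian (mean 0, variance 1 per coordinate); it shows that the lengths of sampled points cluster around $\sqrt{d}$ with a spread that does not grow with d.

```python
import math
import random


def gaussian_norms(d, samples=500, seed=0):
    """Lengths of `samples` points drawn from a d-dimensional standard Gaussian."""
    rng = random.Random(seed)
    return [math.sqrt(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d)))
            for _ in range(samples)]


for d in (10, 100, 1000):
    norms = gaussian_norms(d)
    # dimension, sqrt(d), smallest and largest sampled length
    print(d, round(math.sqrt(d), 2), round(min(norms), 2), round(max(norms), 2))
```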
![Page 76: Creating a science base to support new directions in computer science Cornell University Ithaca New York John Hopcroft.](https://reader035.fdocuments.us/reader035/viewer/2022062718/56649e745503460f94b7426c/html5/thumbnails/76.jpg)
Two Gaussians
[Figure: two Gaussians (figure labels: 3, $\sqrt{d}$)]
Distance between two random points from the same Gaussian
Points lie on a thin annulus of radius $\sqrt{d}$
Approximate the annulus by a sphere of radius $\sqrt{d}$
The average distance between two points is $\sqrt{2d}$
(Place one point at the North Pole and the other point at random. Almost surely, the second point is near the equator.)
[Figure: one point at the North Pole, one on the equator; together with the center they form a right triangle with legs $\sqrt{d}$ and $\sqrt{d}$ and hypotenuse $\sqrt{2d}$]
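A quick simulation of the "second point is near the equator" argument, assuming two independent draws from a standard d-dimensional Gaussian centered at the origin. It reports the mean distance (to compare with $\sqrt{2d}$) and the mean |cosine| of the angle between the two points as seen from the center, which should be close to 0 (nearly orthogonal). Names and sample sizes are illustrative.

```python
import math
import random


def random_pair_stats(d: int, pairs: int = 300, seed: int = 2):
    """Mean distance and mean |cos angle| between pairs of points drawn from N(0, I_d)."""
    rng = random.Random(seed)
    total_dist = total_cos = 0.0
    for _ in range(pairs):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [rng.gauss(0.0, 1.0) for _ in range(d)]
        total_dist += math.dist(x, y)
        # Cosine of the angle between x and y, measured at the Gaussian's center.
        cos = sum(a * b for a, b in zip(x, y)) / (math.hypot(*x) * math.hypot(*y))
        total_cos += abs(cos)
    return total_dist / pairs, total_cos / pairs


for d in (10, 100, 1000):
    mean_dist, mean_cos = random_pair_stats(d)
    print(f"d={d:5d}  sqrt(2d)={math.sqrt(2 * d):6.2f}  "
          f"mean distance={mean_dist:6.2f}  mean |cos|={mean_cos:.3f}")
```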
Expected distance between points from two Gaussians whose centers are separated by δ: $\sqrt{\delta^2 + 2d}$
[Figure: labels $\sqrt{2d}$ and $\sqrt{\delta^2 + 2d}$]
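A minimal check of the $\sqrt{\delta^2 + 2d}$ formula, assuming two unit-variance spherical Gaussians whose centers differ by δ along the first axis (an arbitrary illustrative choice; any direction behaves the same). The dimension d = 400 and the δ values are arbitrary.

```python
import math
import random


def mean_cross_distance(d: int, delta: float, pairs: int = 300, seed: int = 3) -> float:
    """Mean distance between x ~ N(0, I_d) and y ~ N(delta * e_1, I_d)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(pairs):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y[0] += delta  # the second Gaussian's center is delta away along the first axis
        total += math.dist(x, y)
    return total / pairs


d = 400
for delta in (0.0, 5.0, 20.0):
    print(f"delta={delta:5.1f}  formula={math.sqrt(delta ** 2 + 2 * d):6.2f}  "
          f"simulated={mean_cross_distance(d, delta):6.2f}")
```

For δ = 5 the predicted gap over the same-Gaussian distance is under half a unit, smaller than the typical fluctuation of a single distance, which is what motivates the separation condition that follows.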
Can separate points from two Gaussians if
$\sqrt{\delta^2 + 2d} > \sqrt{2d} + \gamma$
where γ is a margin large enough to cover the O(1) fluctuation in these distances. Since
$\sqrt{\delta^2 + 2d} = \sqrt{2d}\left(1 + \frac{\delta^2}{2d}\right)^{1/2} \approx \sqrt{2d}\left(1 + \frac{1}{2}\cdot\frac{\delta^2}{2d}\right) = \sqrt{2d} + \frac{\delta^2}{2\sqrt{2d}}$,
the condition becomes $\frac{\delta^2}{2\sqrt{2d}} > \gamma$, that is, $\delta > \left(2\sqrt{2}\,\gamma\right)^{1/2} d^{1/4}$
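Reading the separation condition as $\sqrt{\delta^2 + 2d} > \sqrt{2d} + \gamma$, it can be solved exactly: $\delta^2 > 2\gamma\sqrt{2d} + \gamma^2$. The sketch below evaluates that threshold for growing d, with γ = 1 as a purely illustrative margin, to show the $d^{1/4}$ growth.

```python
import math


def min_separation(dim: float, gamma: float) -> float:
    """Smallest delta with sqrt(delta^2 + 2*dim) >= sqrt(2*dim) + gamma (exact solution)."""
    return math.sqrt(2.0 * gamma * math.sqrt(2.0 * dim) + gamma ** 2)


gamma = 1.0  # illustrative margin; the slides do not fix its value
for d in (10**2, 10**4, 10**6, 10**8):
    delta = min_separation(d, gamma)
    # delta / d^(1/4) approaches sqrt(2 * sqrt(2) * gamma) as d grows.
    print(f"d={d:>10}  min delta={delta:9.2f}  delta / d^(1/4)={delta / d ** 0.25:.3f}")
```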
Dimension reduction
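Given n points in d dimensions, a random projection to $O(\log n)$ dimensions (more precisely $O(\log n / \varepsilon^2)$ for accuracy ε) preserves all pairwise distances to within a factor of $1 \pm \varepsilon$, after a common rescaling, with high probability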
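This statement is known as the Johnson-Lindenstrauss lemma. A minimal pure-Python sketch, assuming a dense Gaussian random matrix scaled by $1/\sqrt{m}$ (one common construction among several) and a target dimension m picked for illustration rather than from a tuned constant. It reports the worst relative change in any pairwise distance.

```python
import math
import random
from itertools import combinations


def random_projection(points, m, seed=0):
    """Map each d-dimensional point to m dimensions with a Gaussian matrix scaled by 1/sqrt(m)."""
    rng = random.Random(seed)
    d = len(points[0])
    scale = 1.0 / math.sqrt(m)
    matrix = [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(m)]
    return [[sum(r * x for r, x in zip(row, p)) for row in matrix] for p in points]


def max_distortion(points, projected):
    """Largest |projected distance / original distance - 1| over all pairs of points."""
    worst = 0.0
    for i, j in combinations(range(len(points)), 2):
        ratio = math.dist(projected[i], projected[j]) / math.dist(points[i], points[j])
        worst = max(worst, abs(ratio - 1.0))
    return worst


rng = random.Random(42)
n, d, m = 30, 500, 150  # m is an illustrative choice, not a tuned JL constant
points = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
projected = random_projection(points, m)
print(f"max pairwise distortion after projecting {d} -> {m} dims: "
      f"{max_distortion(points, projected):.3f}")
```

The required m grows with log n and 1/ε², not with the original dimension d.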
Dimension reduction for the Gaussians
• Project points onto subspace containing centers of Gaussians
• Reduce dimension from d to k, the number of Gaussians
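• Centers retain their separation
• Average distance between points is reduced by a factor of $\sqrt{d/k}$: choosing coordinates so that the subspace is spanned by the first k axes,
$(x_1, x_2, \ldots, x_d) \mapsto (x_1, x_2, \ldots, x_k, 0, \ldots, 0)$, so $\sum_{i=1}^{d} x_i^2 \approx d$ becomes $\sum_{i=1}^{k} x_i^2 \approx k$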
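A minimal sketch of this projection, assuming two unit-variance spherical Gaussians with illustrative centers $0$ and $\delta e_1$ and treating the centers as known (finding them is the subject of the next slides). It compares root-mean-square within-Gaussian and between-Gaussian distances before and after keeping only the first k coordinates; d = 1000, k = 2, δ = 5 are arbitrary choices.

```python
import math
import random
from itertools import combinations


def sample_two_gaussians(d, delta, n_each, rng):
    """n_each points from N(0, I_d) and n_each points from N(delta * e_1, I_d)."""
    points, labels = [], []
    for label, shift in ((0, 0.0), (1, delta)):
        for _ in range(n_each):
            x = [rng.gauss(0.0, 1.0) for _ in range(d)]
            x[0] += shift
            points.append(x)
            labels.append(label)
    return points, labels


def rms_distances(points, labels):
    """Root-mean-square within-Gaussian and between-Gaussian pairwise distances."""
    within, between = [], []
    for i, j in combinations(range(len(points)), 2):
        sq = math.dist(points[i], points[j]) ** 2
        (within if labels[i] == labels[j] else between).append(sq)
    return math.sqrt(sum(within) / len(within)), math.sqrt(sum(between) / len(between))


rng = random.Random(5)
d, k, delta = 1000, 2, 5.0
points, labels = sample_two_gaussians(d, delta, n_each=100, rng=rng)
# Both centers lie in the span of the first k axes, so projecting onto that
# subspace amounts to keeping the first k coordinates.
projected = [p[:k] for p in points]

w_full, b_full = rms_distances(points, labels)
w_proj, b_proj = rms_distances(projected, labels)
print(f"full      d={d}: within={w_full:.2f}  between={b_full:.2f}")
print(f"projected k={k}:    within={w_proj:.2f}  between={b_proj:.2f}")
```

In the full space the between-Gaussian distance exceeds the within-Gaussian one by only a fraction of a unit; after projection the same δ produces a gap of several units, while the within-Gaussian distance shrinks by roughly $\sqrt{d/k}$.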
Can separate Gaussians provided
$\sqrt{\delta^2 + 2k} > \sqrt{2k} + \gamma$,
i.e. δ > some constant involving k and γ, independent of the dimension d
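Solving the k-dimensional condition $\sqrt{\delta^2 + 2k} > \sqrt{2k} + \gamma$ exactly for δ shows which constant is meant. Both sides are nonnegative, so squaring is valid:

```latex
\sqrt{\delta^2 + 2k} > \sqrt{2k} + \gamma
\iff \delta^2 + 2k > 2k + 2\gamma\sqrt{2k} + \gamma^2
\iff \delta > \sqrt{2\gamma\sqrt{2k} + \gamma^2}
```

The right-hand side involves only k and γ. The same computation in the original space gives $\sqrt{2\gamma\sqrt{2d} + \gamma^2}$, which grows like $d^{1/4}$.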
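Finding centers of Gaussians
[Figure: points $a_1, a_2, \ldots, a_n$ and the first singular vector $v_1$]
The first singular vector $v_1$ minimizes the sum of squared perpendicular distances from the points to the line through the origin in its direction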
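A minimal sketch of computing $v_1$ by power iteration on $A^TA$, where the rows of A are the points $a_1, \ldots, a_n$. Power iteration is one standard method, assumed here for self-containment; in practice a library SVD routine would be used. The five 2-D points are made up for illustration. Since the squared perpendicular distance of $a$ to the line along a unit vector $v$ is $|a|^2 - (a \cdot v)^2$, minimizing the total distance is the same as maximizing $\sum_i (a_i \cdot v)^2$, which is what $v_1$ does.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def first_singular_vector(rows, iters=200, seed=0):
    """Power iteration on A^T A (rows of A are the points); returns a unit vector v_1."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(rows[0]))]
    for _ in range(iters):
        av = [dot(a, v) for a in rows]  # A v
        w = [sum(av[i] * rows[i][j] for i in range(len(rows)))  # A^T (A v)
             for j in range(len(v))]
        norm = math.sqrt(dot(w, w))
        v = [x / norm for x in w]
    return v


def sum_sq_perp_distance(rows, v):
    """Sum of squared distances from the points to the line through the origin along unit v."""
    return sum(dot(a, a) - dot(a, v) ** 2 for a in rows)


points = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [-1.0, -0.8], [-2.0, -2.1]]
v1 = first_singular_vector(points)
print(v1, sum_sq_perp_distance(points, v1))
```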
For a spherical Gaussian, the first singular vector goes through the center of the Gaussian and minimizes the (squared perpendicular) distance to the points
[Figure: Gaussian]
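A minimal numerical check, assuming a spherical unit-variance Gaussian with an illustrative center $3e_1$ in d = 20 dimensions. The sketch forms $A^TA$ once from the samples and runs power iteration on it; its top eigenvector is the first right singular vector of the sample matrix A. The printed value is the |cosine| of the angle between $v_1$ and the center.

```python
import math
import random


def gram_matrix(rows):
    """M = A^T A, where the rows of A are the sample points."""
    d = len(rows[0])
    m = [[0.0] * d for _ in range(d)]
    for a in rows:
        for i in range(d):
            ai, row = a[i], m[i]
            for j in range(d):
                row[j] += ai * a[j]
    return m


def top_eigenvector(m, iters=100, seed=0):
    """Power iteration on the symmetric matrix m; returns a unit vector."""
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(len(m))]
    for _ in range(iters):
        w = [sum(mij * vj for mij, vj in zip(row, v)) for row in m]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


rng = random.Random(4)
d, n = 20, 1000
center = [3.0] + [0.0] * (d - 1)  # illustrative center with |center| = 3
points = [[c + rng.gauss(0.0, 1.0) for c in center] for _ in range(n)]
v1 = top_eigenvector(gram_matrix(points))
cos_angle = abs(sum(a * b for a, b in zip(v1, center))) / math.sqrt(sum(c * c for c in center))
print(f"|cos(angle between v1 and the center)| = {cos_angle:.4f}")
```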
The best k-dimensional space for a spherical Gaussian is any space containing the line through the center of the Gaussian.
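A short derivation of why, assuming a spherical Gaussian with center μ and variance σ² in every coordinate, and measuring fit by the expected squared length of the projection. Let V be a k-dimensional subspace with orthonormal basis $v_1, \ldots, v_k$ and write $x = \mu + z$ with $z \sim N(0, \sigma^2 I)$, so that $\mathbb{E}(x \cdot v_i)^2 = (\mu \cdot v_i)^2 + \sigma^2$:

```latex
\mathbb{E}\,\|\mathrm{proj}_V(x)\|^2
  = \sum_{i=1}^{k} \mathbb{E}\,(x \cdot v_i)^2
  = \sum_{i=1}^{k} \left[ (\mu \cdot v_i)^2 + \sigma^2 \right]
  = \|\mathrm{proj}_V(\mu)\|^2 + k\sigma^2
```

The $k\sigma^2$ term is the same for every V, so the expected squared projection is maximized, and the expected squared perpendicular distance $\mathbb{E}\|x\|^2 - \mathbb{E}\|\mathrm{proj}_V(x)\|^2$ minimized, exactly when $\|\mathrm{proj}_V(\mu)\| = \|\mu\|$, i.e. when V contains the line through μ.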
Given k spherical Gaussians, the top k singular vectors define a k-dimensional space that contains the k lines through the centers of the k Gaussians and hence contains the centers of the Gaussians (exactly in expectation, approximately for a finite sample).
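A sketch for k = 2, assuming two equally weighted unit-variance spherical Gaussians with illustrative centers $4e_1$ and $4e_2$ in d = 20 dimensions. Orthogonal (subspace) iteration on $A^TA$ stands in for a full SVD routine; the script reports what fraction of each center's squared length lies inside the computed top-2 subspace.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_matrix(rows):
    """M = A^T A, where the rows of A are the sample points."""
    d = len(rows[0])
    return [[sum(a[i] * a[j] for a in rows) for j in range(d)] for i in range(d)]


def orthonormalize(vectors):
    """Gram-Schmidt: an orthonormal basis for the span of the given vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis


def top_k_subspace(m, k, iters=100, seed=0):
    """Orthogonal iteration: orthonormal basis of the top-k eigenspace of symmetric m."""
    rng = random.Random(seed)
    d = len(m)
    basis = orthonormalize([[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(k)])
    for _ in range(iters):
        basis = orthonormalize([[dot(row, b) for row in m] for b in basis])
    return basis


def captured_fraction(mu, basis):
    """|proj_V(mu)|^2 / |mu|^2 for the subspace V spanned by the orthonormal basis."""
    return sum(dot(mu, q) ** 2 for q in basis) / dot(mu, mu)


rng = random.Random(6)
d, k, n_each = 20, 2, 500
centers = [[4.0 if i == 0 else 0.0 for i in range(d)],
           [4.0 if i == 1 else 0.0 for i in range(d)]]  # illustrative centers
points = [[c + rng.gauss(0.0, 1.0) for c in mu] for mu in centers for _ in range(n_each)]
basis = top_k_subspace(gram_matrix(points), k)
for mu in centers:
    print(f"fraction of |center|^2 inside the top-{k} subspace: {captured_fraction(mu, basis):.4f}")
```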
• We have just seen what a science base for high dimensional data might look like.
• What other areas do we need to develop a science base for?
• Ranking is important: restaurants, movies, books, web pages; a multi-billion dollar industry
• Collaborative filtering: when a customer buys a product, what else is he likely to buy?
• Dimension reduction
• Extracting information from large data sources
• Social networks
Time of change
The information age is a fundamental revolution that is changing all aspects of our lives.
Those individuals and nations who recognize this change and position themselves for the future will benefit enormously.
Conclusions
We are in an exciting time of change.
Information technology is a big driver of that change.
Computer science theory needs to be developed to support this information age.
THANK YOU