Welcome to Intro to Bioinformatics. Intergalactic Border Patrol Bioinformatics in Space Tribbles...
Welcome to Intro to Bioinformatics
Intergalactic Border Patrol: Bioinformatics in Space
Tribbles: Warning! Highly dangerous!
Trogs: Cute and harmless.
Welcome to the Intergalactic Detention Center
Please answer the following questions
1. Like broccoli
2. Floss every brushing
3. Enjoy ballet
4. Always pair socks
5. Liked Moby Dick
6. Eat the Maraschino cherry
Scale: 1 to 10
Responses to questionnaire
Question                  T1   T2   T3   T4   T5   T6   T7   . . .
1. Broccoli               9.2  1.6  4.0  5.2  2.2  9.1  1.0  . . .
2. Floss                  2.2  1.9  1.0  4.6  7.6  9.8  1.0  . . .
3. Ballet                 8.3  3.1  2.4  6.1  9.3  9.2  1.0  . . .
4. Pair socks             9.6  5.5  1.3  8.4  9.8  9.0  1.0  . . .
5. Moby Dick              4.2  2.1  1.0  4.1  5.2  4.4  1.0  . . .
6. Maraschino             6.4  8.9  7.1  3.3  1.9  2.0  1.0  . . .
. . .
6817. MacArthur’s Park    1.2  1.5  5.1  3.4  1.1  1.7  9.9  . . .
You need a plan
A Plan
• Release all Tribbles / Trogs
• Note outcome for each individual
• Deduce identities
• Integrate identities into results
• Figure out which questions/answers are informative
Responses to questionnaire
[Table: the same responses as above, with each individual T1, T2, T3, . . . now labeled as a Tribble or a Trog]
(what now?)
Responses to questionnaire
Question                  T1   T2   T3   T4   T5   T6   T7   Mean (Tribbles)  Mean (Trogs)
1. Broccoli               9.2  1.6  4.0  5.2  2.2  9.1  1.0  6.4              2.2
2. Floss                  2.2  1.9  1.0  4.6  7.6  9.8  1.0  6.0              1.3
3. Ballet                 8.3  3.1  2.4  6.1  9.3  9.2  1.0  8.2              2.2
4. Pair socks             9.6  5.5  1.3  8.4  9.8  9.0  1.0  9.2              2.6
5. Moby Dick              4.2  2.1  1.0  4.1  5.2  4.4  1.0  4.4              1.4
6. Maraschino             6.4  8.9  7.1  3.3  1.9  2.0  1.0  4.4              3.7
. . .
6817. MacArthur’s Park    1.2  1.5  5.1  3.4  1.1  1.7  9.9  1.8              5.5
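Which questions are informative? Which can be used to predict class?
The responses to which questions are correlated with class?
[Figure: responses to one question on the 1-to-10 scale; Δμ marks the difference between the two class means]
Correlation of question with class = $\Delta\mu / (\sigma_{\text{Tribble}} + \sigma_{\text{Trog}})$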
Which questions are informative? Which can be used to predict class?
Strategy
Correlation = $\Delta\mu / (\sigma_{\text{Tribble}} + \sigma_{\text{Trog}})$
• Calculate correlation for each question
• Look for questions with largest correlations with class
Implementation
$\mu = (\sum s) / N$
Which questions are informative? Which can be used to predict class?
Implementation
$\sigma^2 = \sum (s - \mu)^2 / (N - 1)$, $\sigma = \sqrt{\sigma^2}$
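The slides call a StDev routine but never show its body. A minimal sketch that follows the standard deviation formula on this slide (sample variance, with N - 1 in the denominator) might look like the following. The name StDev comes from the slides; everything inside it is an assumption, not the lecture's own code.

```perl
use strict;
use warnings;

# Hypothetical StDev, written to match the slide formula:
#   sigma^2 = sum((s - mu)^2) / (N - 1),  sigma = sqrt(sigma^2)
sub StDev {
    my @scores = @_;                 # Tribble or Trog scores
    my $N = scalar @scores;
    die "StDev needs at least two scores\n" if $N < 2;

    my $s_sum = 0;                   # sum of the scores
    $s_sum += $_ foreach @scores;
    my $mu = $s_sum / $N;            # mean

    my $sq_sum = 0;                  # sum of squared deviations from the mean
    $sq_sum += ($_ - $mu) ** 2 foreach @scores;

    return sqrt($sq_sum / ($N - 1));
}

printf "%.3f\n", StDev(2, 4, 4, 4, 5, 5, 7, 9);   # prints 2.138
```

The guard against fewer than two scores is a design choice of this sketch: with N = 1 the formula divides by zero.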
Which questions are informative? Which can be used to predict class?
Implementation
Correlation = $\dfrac{(\sum s_{\text{Tribble}})/N_{\text{Tribble}} - (\sum s_{\text{Trog}})/N_{\text{Trog}}}{\sqrt{\sum (s_{\text{Tribble}} - \mu_{\text{Tribble}})^2 / (N_{\text{Tribble}} - 1)} + \sqrt{\sum (s_{\text{Trog}} - \mu_{\text{Trog}})^2 / (N_{\text{Trog}} - 1)}}$
Which questions are informative? Which can be used to predict class?
Implementation
Read_Responses_To_Question();
$numerator = Mean(@tribble_scores) - Mean(@trog_scores);      # Δμ
$denominator = StDev(@tribble_scores) + StDev(@trog_scores);  # σ + σ
$correlation = $numerator / $denominator;
push @question_info, [$question_number, $correlation];
Which questions are informative? Which can be used to predict class?
Implementation
while (<INPUT>) {                                                 # repeat for each question in the input
    Read_Responses_To_Question();
    $numerator = Mean(@tribble_scores) - Mean(@trog_scores);      # Δμ
    $denominator = StDev(@tribble_scores) + StDev(@trog_scores);  # σ + σ
    $correlation = $numerator / $denominator;
    push @question_info, [$question_number, $correlation];
}
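The slide leaves Read_Responses_To_Question and the input format unspecified. As a rough, self-contained sketch of the same strategy (calculate a correlation for each question, then look for the largest), the version below replaces the file input with a small in-memory hash. The hash %responses, its layout, and all scores in it are made up for illustration; Mean and StDev are compact stand-ins for the routines named on the slides.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean and StDev follow the slide formulas (sample standard deviation, N - 1).
sub Mean  { return sum(@_) / scalar(@_); }
sub StDev {
    my $mu = Mean(@_);
    return sqrt( sum( map { ($_ - $mu) ** 2 } @_ ) / (scalar(@_) - 1) );
}

# Hypothetical responses: question number => scores for each class.
my %responses = (
    1 => { tribble => [9.2, 9.1, 8.7], trog => [1.6, 2.2, 2.9] },
    2 => { tribble => [2.2, 9.8, 5.0], trog => [1.9, 7.6, 4.6] },
    3 => { tribble => [1.2, 1.5, 1.1], trog => [5.1, 3.4, 9.9] },
);

my @question_info;
foreach my $question_number (sort { $a <=> $b } keys %responses) {
    my @tribble_scores = @{ $responses{$question_number}{tribble} };
    my @trog_scores    = @{ $responses{$question_number}{trog} };

    my $numerator   = Mean(@tribble_scores) - Mean(@trog_scores);
    my $denominator = StDev(@tribble_scores) + StDev(@trog_scores);
    my $correlation = $numerator / $denominator;
    push @question_info, [$question_number, $correlation];
}

# Largest correlation first.
my @ranked = sort { $b->[1] <=> $a->[1] } @question_info;
printf "%-8s %s\n", 'Question', 'Correlation';
printf "%-8d %.2f\n", @$_ foreach @ranked;
```

With these made-up scores, question 1 separates the classes cleanly and ranks first, while question 3 comes out negative because the Trogs score higher on it.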
Which questions are informative? Which can be used to predict class?
Implementation
sub Mean {
    my @scores = @_;        # Grab Tribble or Trog scores
    my $s_sum = 0;          # Start Σ at 0
    my $N = 0;              # Need to count N
    foreach my $score (@scores) {
        $s_sum = $s_sum + $score;
        $N = $N + 1;
    }
    return $s_sum / $N;     # mean = (Σ s) / N
}
Which questions are informative? Which can be used to predict class?
Results
Question   Correlation
3497       1.76
281        1.72
1114       1.71
…          …
Are these questions good predictors of class?
Suppose there are NO good predictors of class…
(Interlude)
NEWS!
Precinct in Harrisonburg has voted for the winning senatorial candidate every time
for the past ten elections!
Probability if by chance = (1/2) · (1/2) · (1/2) · … = $(1/2)^{10}$ = 1/1024 ≈ 1/1000
Suppose there are 1000 precincts in Virginia…
(BLAST from the past) E = (probability) · (number of combinations)
Beware the fallacy of the unlikely result!
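The arithmetic behind the interlude can be checked in a few lines. This is only a sketch of the slide's own numbers (ten elections, 1000 precincts, a 1/2 chance per election); the variable names are invented here.

```perl
use strict;
use warnings;

my $elections   = 10;      # ten elections in a row
my $precincts   = 1000;    # "Suppose there are 1000 precincts in Virginia"

# Chance that one precinct picks the winner every time by luck alone.
my $probability = (1 / 2) ** $elections;        # 1/1024

# E = (probability) * (number of combinations), as on the slide.
my $expected    = $probability * $precincts;

printf "P = %.6f, E = %.2f\n", $probability, $expected;
# prints: P = 0.000977, E = 0.98
```

An expected count close to 1 means that a precinct with a perfect record is about what chance alone would produce among 1000 precincts.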
Which questions are informative? Which can be used to predict class?
Are these questions good predictors of class?
Suppose there are NO good predictors of class… what would be the expected correlation?
Which questions are informative? How to test class predictors?
Choice #1
Rerun time with the different (?) reality that Tribbles are no different from Trogs
Choice #2
Use random data
Random responses to questionnaire
Question                  T1   T2     T3      T4   T5       T6   T7   . . .
1. Broccoli               9.2  -1600  33 1/3  99   3.14159  -0   1.0  . . .
2. Floss
3. Ballet
4. Pair socks
5. Moby Dick
6. Maraschino
. . .
6817. MacArthur’s Park
Random doesn’t mean crazy
Random responses to questionnaire
[Table: same layout as the response table above, with every random value between 1.0 and 9.9]
Maybe but…
Random responses to questionnaire
[Table: the same response values as above, unchanged]
Keep the data, shuffle the players
Choice #3
Shuffle data
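One way to read "keep the data, shuffle the players" is to leave every score where it is and deal the class labels out to the individuals at random, then recompute the correlation for each question. The sketch below does that under stated assumptions: the labels, the scores, and the helper Correlation_With_Class are all hypothetical, and shuffle comes from the core List::Util module.

```perl
use strict;
use warnings;
use List::Util qw(sum shuffle);

sub Mean  { return sum(@_) / scalar(@_); }
sub StDev {
    my $mu = Mean(@_);
    return sqrt( sum( map { ($_ - $mu) ** 2 } @_ ) / (scalar(@_) - 1) );
}

# Correlation of one question with class, given one label per individual.
sub Correlation_With_Class {
    my ($scores, $labels) = @_;
    my (@tribble_scores, @trog_scores);
    foreach my $i (0 .. $#{$scores}) {
        if ($labels->[$i] eq 'Tribble') { push @tribble_scores, $scores->[$i]; }
        else                            { push @trog_scores,    $scores->[$i]; }
    }
    my $numerator   = Mean(@tribble_scores) - Mean(@trog_scores);
    my $denominator = StDev(@tribble_scores) + StDev(@trog_scores);
    return $numerator / $denominator;
}

# Hypothetical data: class of each individual, and their scores per question.
my @labels = qw(Tribble Trog Tribble Trog Tribble Trog Tribble Trog);
my %scores = (
    1 => [9.2, 1.6, 8.7, 2.2, 9.1, 2.9, 8.4, 1.3],
    2 => [2.2, 1.9, 9.8, 7.6, 5.0, 4.6, 6.1, 3.3],
);

# Keep the data, shuffle the players: same scores, labels reassigned at random.
my @shuffled_labels = shuffle(@labels);

foreach my $question_number (sort { $a <=> $b } keys %scores) {
    printf "Question %d: actual %.2f, shuffled %.2f\n",
        $question_number,
        Correlation_With_Class($scores{$question_number}, \@labels),
        Correlation_With_Class($scores{$question_number}, \@shuffled_labels);
}
```

Because the shuffle is random, the shuffled column changes from run to run. Repeating the shuffle many times gives a picture of the correlations to expect when class carries no information.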
Which questions are informative? How to test class predictors?
[Figure: number of questions with better correlations (axis ticks 0, 10, 100, 1000, 10000) plotted against correlation (axis ticks 2.0, 1.5, 1.0, 0.5, 0, -0.5). Curves: 5% of shuffled responses, 1% of shuffled responses, and actual responses. Annotations: "If class predictors don’t work" and "If class predictors are valid".]
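The quantity on the vertical axis, the number of questions with better correlations, can be tallied with a simple count at each threshold. In the sketch below, the first three values in @actual are the ones listed on the Results slide; every other number, and the helper name Count_Better, is hypothetical and only there to make the example run.

```perl
use strict;
use warnings;

# How many correlations are at least as large as the threshold?
sub Count_Better {
    my ($threshold, @correlations) = @_;
    return scalar grep { $_ >= $threshold } @correlations;
}

# 1.76, 1.72 and 1.71 are from the Results slide; the rest are made up.
my @actual   = (1.76, 1.72, 1.71, 0.40, 0.10, -0.20);
my @shuffled = (0.45, 0.30, 0.05, -0.10, -0.25, -0.40);   # made up

printf "%-12s %-8s %s\n", 'Correlation', 'Actual', 'Shuffled';
foreach my $threshold (2.0, 1.5, 1.0, 0.5, 0, -0.5) {
    printf "%-12.1f %-8d %d\n",
        $threshold,
        Count_Better($threshold, @actual),
        Count_Better($threshold, @shuffled);
}
```

If the actual counts stay well above the shuffled counts at high thresholds, that is the pattern the slide labels as valid class predictors; if the two track each other, the predictors are doing no better than chance.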