Professional Skills in Computer Science - Handouts ullrich/COMP110/notes/lect08.pdf · 2016. 10....
Professional Skills in Computer Science
Lecture 8: Induction (2)
Ullrich Hustadt
Department of Computer Science
School of Electrical Engineering, Electronics, and Computer Science
University of Liverpool
Ullrich Hustadt COMP110 Professional Skills in Computer Science L8 – 1
Ind. generalisation Statistical syllogism Ind. reasoning in CS
Contents
1 Inductive generalisation
  – Definition
  – Hasty generalisation
  – Overgeneralisation
  – Biased sample
  – Observation
2 Statistical syllogism
  – Definition and examples
  – Fallacy by accident
  – Arguments from authority
  – Fallacy by appeal to inappropriate authority
  – Arguments from consensus
3 Inductive Reasoning in Computer Science
  – Machine Learning
  – Data Mining
Today . . .
Relevant learning outcomes:
1 Ability to describe and discuss economic, historic, organisational, research, and social aspects of computing as a discipline and computing in practice
2 To effectively retrieve information, including the use of library and web sources and the evaluation of information retrieved from such sources
3 To recognise and employ sound reasoning and argumentation techniques as part of conducting basic research
Causal induction / Causal inference
• Mill’s five methods of induction / five methods of experimental inquiry
1 Method of agreement
2 Method of difference
3 Joint method of agreement and difference
4 Method of concomitant variations
5 Method of residues
are methods for causal induction (or causal inference)
• Causal induction draws a conclusion about a causal connection based on the circumstances of the occurrence of an effect
Causal induction: Example
• In 1695, Edmond Halley was computing the orbits of a set of comets for inclusion in Newton’s Principia Mathematica
• He noticed that comets that were observed in 1531, 1607, and 1682 took very similar paths across the sky. Also, the observations were 75–76 years apart (suggesting a regular interval)
• Newton had already established (by induction) that comets followcertain paths, e.g. a parabolic path or an elliptic orbit
• Halley inferred by induction that the three sightings were caused by the same comet orbiting the sun on a highly elliptic orbit
• Note: This could be seen as a hasty generalisation, but we now know that the comet has been observed since 240 BC by Chinese and Babylonian astronomers
(Source: T. L. Griffiths and J. B. Tenenbaum: Theory-Based Causal Induction. Psychological Review 116(4):661–716, 2009.)
Other forms of inductive reasoning
• Causal induction is only one form of inductive reasoning
• In particular, we were looking for reasoning that from observations like
All the crows I’ve ever seen were black
draws a conclusion like
All crows are black
• This does not appear to be causal induction
• Instead this form of inductive reasoning is based on
1 Inductive generalisation
2 Statistical syllogism
Inductive generalisation
• An inductive generalisation takes a sample of a population and draws a conclusion about the entire population:
Proportion X of sample S have property P
therefore
Proportion X of the entire population have property P
Example:
• You have a box with 100 balls in it, some black, some white
• You draw a sample of 5 balls out of the box: 4 of them are black, i.e., 80%, and 1 is white, i.e., 20%
• Inductive generalisation: 80% of all the balls in the box are black and 20% are white
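The ball example can be written down as a tiny program. This is a minimal sketch, assuming a hypothetical function name generalise: it simply carries the sample proportions over to the whole box, which is exactly the inductive step, so its output is a probable conclusion rather than a guaranteed one.

```python
from collections import Counter
from fractions import Fraction

BOX_SIZE = 100  # the box on the slide holds 100 balls

def generalise(sample):
    """Inductive generalisation: assume the whole population shows the
    same proportions as the sample. The result is probable, not certain."""
    counts = Counter(sample)
    return {colour: Fraction(k, len(sample)) for colour, k in counts.items()}

# The slide's sample: 5 balls drawn, 4 black and 1 white
sample = ["black", "black", "white", "black", "black"]
for colour, share in sorted(generalise(sample).items()):
    # Extrapolate the sample proportion to the whole box
    print(f"{colour}: {float(share):.0%}, i.e. about {share * BOX_SIZE} of {BOX_SIZE} balls")
```

Exact fractions are used so that 4/5 stays 4/5 rather than a rounded float; nothing in the code checks whether the sample is large enough or unbiased, which is where the following fallacies come in.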
Inductive generalisation
• A special case of inductive generalisation occurs when the proportion X is the whole sample, i.e., X = 100%:
Every instance of sample S has property P
therefore
Every instance of the entire population has property P
Example:
Every crow that I have ever seen was black
therefore
Every crow in the entire world is black
• This was exactly the kind of inductive reasoning that we were looking for
Hasty generalisation
• Inductive generalisation requires a sample that is sufficiently large and unbiased
• A sample that is too small can lead to a hasty generalisation
Example:
• You have a box with 100 balls in it, some black, some white, some red
• You draw a sample of 2 balls out of the box: 1 of them is black, i.e., 50%, and 1 is white, i.e., 50%
• Generalisation: 50% of all the balls in the box are black and 50% are white; there are no red balls in the box
→ A sample of 2 balls could never have been representative, given that there are 3 colours involved
→ Note that a hasty generalisation might still turn out to be correct; the problem is that the sample does not justify it
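How likely is a small sample to miss a colour altogether? A sketch, assuming a hypothetical box in which 20 of the 100 balls are red (the slide gives no counts) and a hypothetical helper prob_colour_missed: for a sample drawn without replacement, the chance of seeing no red ball is C(100 − 20, n) / C(100, n).

```python
from math import comb

def prob_colour_missed(box_size, colour_count, sample_size):
    """Probability that a sample drawn without replacement contains no
    ball of a colour that has colour_count balls in the box."""
    return comb(box_size - colour_count, sample_size) / comb(box_size, sample_size)

# Hypothetical box: 100 balls, 20 of them red
for n in (2, 5, 10, 20):
    p = prob_colour_missed(100, 20, n)
    print(f"sample of {n:2d}: P(no red ball seen) = {p:.3f}")
```

With n = 2 the probability is 3160/4950, about 0.64, so the conclusion "there are no red balls" would be reached most of the time even though a fifth of the box is red; the probability shrinks quickly as the sample grows.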
Overgeneralisation
• A special instance of hasty generalisation is overgeneralisation
• Overgeneralisation occurs if you draw an overly general conclusion that is unwarranted by the sample
Instances Salad Fish Meat Chicken Sick
Andy yes yes yes yes yes
Dave yes yes yes
Frank yes yes yes yes
Eve yes yes yes
Jack yes yes yes yes
Betty yes yes yes
• Causal induction: This particular salad makes you sick
• Overgeneralisation: Salad is bad for you
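The causal induction above can be mechanised with Mill’s method of agreement: look for the circumstance shared by every case in which the effect occurred. A minimal sketch with made-up meals for four hypothetical diners (not the table above) and a hypothetical function name method_of_agreement:

```python
# Hypothetical data, made up for illustration: what each diner ate
meals = {
    "diner 1": {"salad", "fish", "meat", "chicken"},
    "diner 2": {"salad", "chicken"},
    "diner 3": {"salad", "fish", "meat"},
    "diner 4": {"salad", "meat"},
}
fell_sick = ["diner 1", "diner 2", "diner 3", "diner 4"]

def method_of_agreement(circumstances, cases):
    """Mill's method of agreement: return the circumstances present in
    every case in which the effect occurred (the candidate causes)."""
    sets = [circumstances[c] for c in cases]
    return set.intersection(*sets) if sets else set()

print(method_of_agreement(meals, fell_sick))  # {'salad'}
```

The result only points at the salad served on this occasion; going from there to "salad is bad for you" is a step the data does not support, which is the overgeneralisation.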
Biased sample
• A biased sample occurs when a sample is collected in such a way that some members of the intended population are less likely to be included than others
• A biased sample is again not a sound basis for inductive generalisation
Example:
• The average age of people studying or working at the University is 28 years
• Generalisation: The average age of the UK population is 28 years
→ In reality, the average age of the UK population is 38 years;
   the sample of people studying and working at the University is biased towards younger people
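The effect of a biased sample can be simulated. A sketch under loudly stated assumptions: the population is a hypothetical list of 100,000 ages spread uniformly between 0 and 99 (not real UK data), and the "campus" filter of ages 18 to 30 is an illustrative stand-in for sampling at a university.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical population of ages, uniform between 0 and 99 (not real UK data)
population = [rng.randint(0, 99) for _ in range(100_000)]

# Unbiased: every member of the population is equally likely to be drawn
random_sample = rng.sample(population, 1_000)

# Biased: only people aged 18-30 can be reached, e.g. by asking on campus
on_campus = [age for age in population if 18 <= age <= 30]
biased_sample = rng.sample(on_campus, 1_000)

print(f"population mean:    {mean(population):.1f}")
print(f"random sample mean: {mean(random_sample):.1f}")
print(f"biased sample mean: {mean(biased_sample):.1f}")
```

Both samples have the same size; only the way they were collected differs, and only the random one gives an average close to the population's.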
Insufficient Range of Observational Circumstances
Example:
• We observe that a fellow student, Michael, is grumpy on
Wednesday, 2nd November,
Wednesday, 9th November,
Wednesday, 16th November,
Wednesday, 23rd November
• We conclude that Michael is always grumpy on Wednesdays
• We failed to recognise that these dates coincide with COMP101 coursework deadlines and that this is the cause of Michael’s grumpiness
• As soon as COMP101 is over, Michael will be grumpy on a different day of the week
Statistical syllogism
• A statistical syllogism proceeds from a generalisation to a conclusion about an individual
– Proportion X of the population have property P (where X is large)
– Individual I is a member of that population
– Therefore, I has property P
• Syllogism means “conclusion” or “inference”
• Beware: Some dictionaries define a syllogism as a “deductive scheme” or “deductive reasoning”
Statistical syllogism is not a form of deductive reasoning.
It is a form of inductive reasoning.
Example:
– 90% of university students have above average intelligence
– You are a university student
– Therefore, you have above average intelligence
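The schema can be expressed as a small function. A sketch, assuming a threshold of 0.8 for "X is large" (the slide gives no number) and hypothetical names Generalisation and statistical_syllogism; the point it makes is that the conclusion comes with strength X attached, not with certainty.

```python
from dataclasses import dataclass

@dataclass
class Generalisation:
    population: str    # e.g. "university students"
    prop: str          # the property P
    proportion: float  # X: fraction of the population that has P

def statistical_syllogism(g, individual, member_of, threshold=0.8):
    """Return (conclusion, strength), or None if the premises do not
    license a conclusion. With X = 0.9, one in ten members of the
    population lacks P, so the conclusion can be false."""
    if g.population not in member_of:
        return None  # premise 2 fails: I is not a member of the population
    if g.proportion < threshold:
        return None  # X is not "large" enough to conclude anything
    return f"{individual} has {g.prop}", g.proportion

# The slide's example, with a hypothetical student called Sam
g = Generalisation("university students", "above average intelligence", 0.9)
print(statistical_syllogism(g, "Sam", {"university students"}))
```

Returning the strength alongside the conclusion keeps the inductive character visible: a deductive rule would have nothing to return but "true".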
Statistical syllogism: Fallacy by accident
• Fallacy by accident: a generalisation is applied when circumstances suggest that there should be an exception
Example:
– Exceeding the speed limit is (almost always) an offence
– The driver of an ambulance has exceeded the speed limit
– Therefore, the driver has committed an offence
Obviously, we should realise that an ambulance may be exempt from obeying the speed limit
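Avoiding the fallacy amounts to checking for exceptions before applying the generalisation. A minimal sketch; the exemption set and the function is_offence are hypothetical simplifications for illustration, not a statement of traffic law.

```python
# Hypothetical, simplified exemption rule (not a statement of traffic law)
EXEMPT_VEHICLES = {"ambulance", "fire engine", "police car"}

def is_offence(vehicle: str, speed: float, limit: float, emergency: bool) -> bool:
    """Apply the generalisation 'speeding is an offence', but only after
    checking whether the circumstances call for an exception."""
    if speed <= limit:
        return False
    if vehicle in EXEMPT_VEHICLES and emergency:
        # Skipping this check is exactly the fallacy by accident
        return False
    return True

print(is_offence("car", 80, 50, emergency=False))        # True
print(is_offence("ambulance", 80, 50, emergency=True))   # False
```

An ambulance that is not on an emergency call is still caught by the general rule, so the exception is no broader than the circumstances warrant.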
Statistical syllogism: Arguments from authority
• Arguments from authority can be seen as a version of statistical syllogism:
Statistical syllogism
– Proportion X of the population have property P (where X is large)
– Individual I is a member of that population
– Therefore, I has property P
Argument from authority
– Most of what authority A says on subject matter S is correct
– X is something that A says in the context of S
– Therefore, X is true
Arguments from authority: Appeal to inappropriate authority
• Arguments from authority are best avoided in science
• If you still feel the need to use such an argument, make sure that you avoid the fallacy of appeal to inappropriate authority, in which the authority and subject matter do not satisfy all of the following conditions:
1 The authority is a recognised expert on the subject matter
2 There is general agreement among authorities on questions / statements relating to that subject matter
3 There is no good reason to suspect that the authority is biased on the subject matter or the particular question
Statistical syllogism: Arguments from consensus
• Arguments from consensus can be seen as a version of statistical syllogism:
Argument from consensus
– Most of the claims that most of the people agree upon are true
– X is a claim that most people agree upon
– Therefore, X is true
• Even worse than arguments from authority
• But admissible when the subject matter is public opinion or strongly influenced by public opinion
Example:
If opinion polls suggest that a considerable majority believes that there will be a change of government at the next election, then there will be a change of government
Inductive reasoning: Summary and applications
• Our motivation for considering inductive reasoning was the question
What is the right proto-theory/hypothesis/model in a particular situation?
• We have seen that, for example, the method of difference may also help us with the question
What is the right experiment to conduct?
• Both of these questions relate to the conduct of research in general and the conduct of Computer Science research in particular
• A central question of Computer Science research is
What can be (efficiently) automated (described as an algorithmic process)?
• So, a natural question is
Can inductive reasoning be automated?
Computational Scientific Discovery
Can inductive reasoning be automated?
• Computational Scientific Discovery is the branch of Artificial Intelligence that is concerned with providing answers to this question
• An early example of a scientific discovery system is Meta-Dendral
B. G. Buchanan and E. A. Feigenbaum: Dendral and Meta-Dendral. Artificial Intelligence 11(1–2):5–24, 1978
• System for rule discovery in the area of chemical analysis via mass spectrometry
• Motivated by applications in space exploration, where experiments and analysis may need to be conducted without human involvement
BACON
• Another early example of a scientific discovery system is BACON (Langley et al., 1977–1983)
• Named after Francis Bacon (1561–1626), a pioneer of the scientific method
• BACON was a system for the discovery of (scientific) numeric laws, that is, laws of the form y = F(x)
• BACON was able to rediscover Ohm’s law, Boyle’s gas law, Kepler’s third law of planetary motion, and Galileo’s law of uniform acceleration
• It uses a plan-generate-test approach, with a number of simple inference rules / rules of thumb for generating F
BACON: Example
We have the following data for the period of revolution (P) of four of Jupiter’s moons in relation to their mean distance (D) to the planet
Moon   Distance (D)   Period (P)
A          5.67          1.769
B          8.67          3.571
C         14.00          7.155
D         24.67         16.689
The task is to find a function F linking P to D
Moon   Distance (D)   Period (P)   $D/P$    $D^2/P$   $D^3/P^2$
A          5.67          1.769      3.203    18.153     58.15
B          8.67          3.571      2.427    21.036     51.06
C         14.00          7.155      1.957    27.395     53.61
D         24.67         16.689      1.478    36.459     53.89
Solution: $D^3/P^2 \approx 54.1775$ (the mean of the four values in the last column), or equivalently $P = \sqrt{D^3/54.1775}$
→ We have rediscovered Kepler’s third law: “The square of the orbital period of a planet is directly proportional to the cube of the semi-major axis of its orbit.”
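The search for F can be illustrated on this data. The sketch below is a simplification, not BACON’s actual implementation: it applies three rules of thumb of the kind BACON used (if a term is nearly constant, report it; if two terms increase together, form their ratio; if one increases as the other decreases, form their product). Representing terms as exponent pairs (a, b) for D^a·P^b, the tolerance of 0.1 on the coefficient of variation, and comparing the newest term with earlier ones most-recent-first are all assumptions of this sketch. Starting from D and P it generates D/P, then D²/P, then D³/P², the same sequence as the columns of the table.

```python
from statistics import mean, pstdev

# Data from the slide: mean distance D and period P of four of Jupiter's moons
D = [5.67, 8.67, 14.00, 24.67]
P = [1.769, 3.571, 7.155, 16.689]

def values(term):
    """A term (a, b) stands for D**a * P**b, evaluated for every moon."""
    a, b = term
    return [d ** a * p ** b for d, p in zip(D, P)]

def relation(xs, ys):
    """+1 if ys increase as xs increase, -1 if they decrease, 0 otherwise."""
    pairs = sorted(zip(xs, ys))
    steps = [y2 - y1 for (_, y1), (_, y2) in zip(pairs, pairs[1:])]
    if all(s > 0 for s in steps):
        return 1
    if all(s < 0 for s in steps):
        return -1
    return 0

def is_constant(vs, tol):
    """Treat a term as constant if its coefficient of variation is below tol."""
    return pstdev(vs) / abs(mean(vs)) < tol

def bacon(tol=0.1, max_terms=10):
    terms = [(1, 0), (0, 1)]  # start from the observed terms D and P
    while len(terms) < max_terms:
        newest = terms[-1]
        candidate = None
        # Compare the newest term with earlier ones, most recent first
        for other in reversed(terms[:-1]):
            r = relation(values(other), values(newest))
            if r == 1:     # increase together: consider the ratio other / newest
                cand = (other[0] - newest[0], other[1] - newest[1])
            elif r == -1:  # one rises as the other falls: consider the product
                cand = (other[0] + newest[0], other[1] + newest[1])
            else:
                continue
            if cand != (0, 0) and cand not in terms:
                candidate = cand
                break
        if candidate is None:
            return None  # no rule of thumb applies
        terms.append(candidate)
        vs = values(candidate)
        if is_constant(vs, tol):
            return candidate, mean(vs)
    return None

law = bacon()
if law:
    (a, b), k = law
    print(f"D^{a} * P^{b} is roughly constant: {k:.2f}")
```

On the tabulated data it should report the exponents (3, -2), i.e. $D^3/P^2$, with a constant of about 54.2; the slide’s 54.1775 differs slightly, apparently because its derived columns were computed from less rounded distances.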
Robot Scientist
• Developed at Aberystwyth University
• Experiments can be designed by intelligent software and executed by the robot
• The results are analysed automatically by the software and are fed back into the next round of hypothesis formation and experimentation
• Theory generation uses inductive reasoning
Machine Learning
• Inductive reasoning is useful not only for research but also for learning in general
• Machine Learning is the branch of Artificial Intelligence that is concerned with the development of algorithms that learn rules, behaviours, etc. using inductive reasoning based on data (or using abductive reasoning)
• Important subcategories of machine learning:
• Learning to classify
• Pattern recognition
• Example applications:
• Recognition of faces, crop blights, defectively manufactured items
• Intelligent non-player characters in computer games
• Classification of DNA sequences
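Learning to classify can be shown in a few lines. This sketch uses 1-nearest-neighbour, one of many classification methods, chosen here only for brevity; the inspection data and the function classify are hypothetical, and the two features are used unscaled, which a real system would usually not do. The inductive assumption is that a new item resembles the most similar item seen before.

```python
from math import dist

# Hypothetical training data: (width in mm, weight in g) -> inspection result
training = [
    ((10.0, 50.0), "ok"),
    ((10.2, 51.0), "ok"),
    ((9.9, 49.5), "ok"),
    ((12.5, 40.0), "defective"),
    ((8.0, 60.0), "defective"),
]

def classify(item):
    """1-nearest-neighbour: give a new item the label of the most similar
    example seen so far (similar inputs are assumed to have similar labels)."""
    _, label = min(training, key=lambda example: dist(example[0], item))
    return label

print(classify((10.1, 50.5)))  # ok
print(classify((12.0, 41.0)))  # defective
```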
Data Mining
• Machine learning is a key component of Data Mining
• Typically associated with the analysis of large amounts of data
• Additionally involves storing large amounts of data, data cleansing, data visualisation
• Aims to find
  • previously unknown patterns (cluster analysis)
  • unusual data records (anomaly detection)
  • interdependencies in the data (association rule mining)
• Example applications:
• Advertising: To which offer/advertisement is a potential customer most likely to respond?
• Basket analysis: What items are customers most likely to buy together?
• Sensitive data: Finding a user’s religious affiliations, political leanings, or sexual orientation via analysis of social networking data
→ Serious privacy concerns
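Basket analysis is a typical use of association rule mining. A sketch, assuming made-up baskets and hypothetical function names: it uses the standard measures support (share of baskets containing all items of a rule) and confidence (how often the right-hand item appears when the left-hand one does), but enumerates item pairs by brute force rather than using an efficient algorithm such as Apriori.

```python
from itertools import combinations

# Hypothetical shopping baskets, made up for illustration
baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "cereal"},
    {"bread", "butter", "jam"},
]

def count(itemset):
    """Number of baskets containing every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)

def rules(min_support=0.4, min_confidence=0.7):
    """Rules 'lhs -> rhs' between single items with enough support and confidence."""
    items = sorted(set().union(*baskets))
    found = []
    for a, b in combinations(items, 2):
        both = count({a, b})
        if both / len(baskets) < min_support:
            continue  # the pair is too rare to be interesting
        for lhs, rhs in ((a, b), (b, a)):
            conf = both / count({lhs})
            if conf >= min_confidence:
                found.append((lhs, rhs, both / len(baskets), conf))
    return found

for lhs, rhs, sup, conf in rules():
    print(f"{lhs} -> {rhs}: support {sup:.0%}, confidence {conf:.0%}")
```

Each rule is itself an inductive generalisation from the sample of baskets, so everything said earlier about sample size and bias applies to it.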
Data Mining: Example (People Analytics)
Google applied data mining to the question
What makes a good team leader?
Answer:
1 Be a good coach
2 Empower your teams and don’t micromanage
3 Express interest in team members’ success and personal well-being
4 Don’t be a sissy: be productive and results orientated
5 Be a good communicator and listen to your team
6 Help your employees with career development
7 Have a clear vision and strategy for the team
8 Have key technical skills so you can help your team
→ (8) is the only surprise
– contradicts that “good managers can manage anything”
– also contradicts that “technical skills” are the most important skills for a manager, as it comes last in the list
Further reading
• For more on Inductive Reasoning see
W. Hughes, J. Lavery, and K. Doran: Critical Thinking: An Introduction to the Basic Skills (6th revised ed.). Broadview Press, 2010. Chapter 10