Professional Skills in Computer Science - Handoutsullrich/COMP110/notes/lect08.pdf · 2016. 10....


Page 1: Professional Skills in Computer Science - Handoutsullrich/COMP110/notes/lect08.pdf · 2016. 10. 13. · Professional Skills in Computer Science Lecture 8: Induction (2) Ullrich Hustadt

Professional Skills in Computer Science
Lecture 8: Induction (2)

Ullrich Hustadt

Department of Computer Science
School of Electrical Engineering, Electronics, and Computer Science

University of Liverpool

Ullrich Hustadt COMP110 Professional Skills in Computer Science L8 – 1


Ind. generalisation Statistical syllogism Ind. reasoning in CS

Contents

1 Inductive generalisation
– Definition
– Hasty generalisation
– Overgeneralisation
– Biased sample
– Observation

2 Statistical syllogism
– Definition and examples
– Fallacy of accident
– Arguments from authority
– Fallacy of appeal to inappropriate authority
– Arguments from consensus

3 Inductive Reasoning in Computer Science
– Machine Learning
– Data Mining


Today . . .

Relevant learning outcomes:

1 Ability to describe and discuss economic, historic, organisational, research, and social aspects of computing as a discipline and computing in practice

2 To effectively retrieve information, including the use of library and web sources and the evaluation of information retrieved from such sources

3 To recognise and employ sound reasoning and argumentation techniques as part of conducting basic research


Causal induction / Causal inference

• Mill’s five methods of induction / five methods of experimental inquiry

1 Method of agreement
2 Method of difference
3 Joint method of agreement and difference
4 Method of concomitant variations
5 Method of residues

are methods for causal induction (or causal inference)

• Causal induction draws a conclusion about a causal connection based on the circumstances of the occurrence of an effect
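Two of Mill's methods can be read as simple set operations over the recorded circumstances of each case. The sketch below is a minimal illustration under that reading; the diners and the foods they ate are hypothetical and are not taken from any table in these notes, and the function names are illustrative.

```python
# Hypothetical records: the foods each diner ate and whether they fell ill.
CASES = {
    "diner_1": ({"salad", "fish", "meat", "chicken"}, True),
    "diner_2": ({"salad", "fish"}, True),
    "diner_3": ({"salad", "meat", "chicken"}, True),
    "diner_4": ({"salad", "chicken"}, True),
    "diner_5": ({"fish", "meat", "chicken"}, False),
}


def method_of_agreement(cases):
    """Circumstances common to every case in which the effect occurred."""
    with_effect = [factors for factors, effect in cases.values() if effect]
    return set.intersection(*with_effect) if with_effect else set()


def method_of_difference(case_with_effect, case_without_effect):
    """Circumstances present in a case with the effect but absent from a
    case without it (ideally the two cases differ in nothing else)."""
    return case_with_effect - case_without_effect


print(method_of_agreement(CASES))  # {'salad'}
print(method_of_difference(CASES["diner_1"][0], CASES["diner_5"][0]))  # {'salad'}
```

Neither function establishes causation on its own: both only narrow down candidate causes among the circumstances that happened to be recorded.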


Causal induction: Example

• In 1695, Edmond Halley was computing the orbits of a set of comets for inclusion in Newton’s Principia Mathematica

• He noticed that comets observed in 1531, 1607, and 1682 took very similar paths across the sky. Also, the observations were 75–76 years apart (suggesting a regular interval)

• Newton had already established (by induction) that comets follow certain paths, e.g. a parabolic path or an elliptic orbit

• Halley inferred by induction that the three sightings were caused by the same comet orbiting the sun on a highly elliptic orbit

• Note: This could be seen as a hasty generalisation, but we now know that the comet has been observed since 240 BC by Chinese and Babylonian astronomers

(Source: T. L. Griffiths and J. B. Tenenbaum: Theory-Based Causal Induction. Psychological Review 116(4):661–716, 2009.)


Other forms of inductive reasoning

• Causal induction is only one form of inductive reasoning

• In particular, we were looking for reasoning that, from observations like

All the crows I’ve ever seen were black

draws a conclusion like

All crows are black

• This does not appear to be causal induction

• Instead, this form of inductive reasoning is based on

1 Inductive generalisation
2 Statistical syllogism


Inductive generalisation

• An inductive generalisation takes a sample of a population and draws a conclusion about the entire population:

Proportion X of sample S have property P
therefore
Proportion X of the entire population have property P

Example:

• You have a box with 100 balls in it, some black, some white

• You draw a sample of 5 balls out of the box: 4 of them are black (80%) and 1 is white (20%)

• Inductive generalisation: 80% of all the balls in the box are black and 20% are white
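The inductive generalisation in this example can be written as a small function that transfers the sample proportions unchanged to the whole population. This is a minimal sketch; the function name `generalise` is illustrative, and `Fraction` is used only to keep the arithmetic exact.

```python
from collections import Counter
from fractions import Fraction


def generalise(sample):
    """Inductive generalisation: assume each property occurs in the whole
    population in the same proportion as in the sample."""
    counts = Counter(sample)
    return {value: Fraction(n, len(sample)) for value, n in counts.items()}


sample = ["black", "black", "white", "black", "black"]  # the 5 balls drawn
estimate = generalise(sample)  # black: 4/5, white: 1/5

# Scale the sample proportions up to the 100 balls in the box.
predicted = {colour: int(p * 100) for colour, p in estimate.items()}
print(predicted)  # {'black': 80, 'white': 20}
```

Nothing in the function checks whether the sample is large or representative enough; that judgement is exactly what the following slides are about.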


Inductive generalisation

• A special case of inductive generalisation occurs when the proportion X covers the whole sample (X = 100%):

Every instance of sample S has property P
therefore
Every instance of the entire population has property P

Example:
Every crow that I have ever seen was black
therefore
Every crow in the entire world is black

• This was exactly the kind of inductive reasoning that we were looking for


Hasty generalisation

• Inductive generalisation requires a sample that is sufficiently large and unbiased

• A sample that is too small can lead to a hasty generalisation

Example:

• You have a box with 100 balls in it, some black, some white, some red

• You draw a sample of 2 balls out of the box: 1 of them is black (50%) and 1 is white (50%)

• Generalisation: 50% of all the balls in the box are black and 50% are white; there are no red balls in the box

⇝ A sample of 2 balls could never have been representative, given that there are 3 colours involved

⇝ Note that this generalisation might still be correct!
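The effect of sample size can be explored by simulation. The sketch below assumes a box of 40 black, 30 white and 30 red balls (the slide does not give the composition); it checks that a 2-ball sample always misses at least one colour and compares how far the estimated share of black balls strays from the true share for several sample sizes. The function names and the fixed random seed are illustrative choices.

```python
import random
import statistics

# Hypothetical box: the composition is an assumption, not from the slide.
BOX = ["black"] * 40 + ["white"] * 30 + ["red"] * 30


def mean_estimation_error(sample_size, trials=2000, seed=1):
    """Average absolute error when the share of black balls in a random
    sample (drawn without replacement) is generalised to the whole box."""
    rng = random.Random(seed)
    true_share = BOX.count("black") / len(BOX)
    errors = []
    for _ in range(trials):
        sample = rng.sample(BOX, sample_size)
        errors.append(abs(sample.count("black") / sample_size - true_share))
    return statistics.mean(errors)


def colours_missed(sample_size, seed=1):
    """Number of colours in the box that do not appear in one random sample."""
    sample = random.Random(seed).sample(BOX, sample_size)
    return len(set(BOX) - set(sample))


print(colours_missed(2))  # at least 1: two balls cannot show three colours
for n in (2, 10, 50):
    print(n, round(mean_estimation_error(n), 3))
```

The average error shrinks as the sample grows, which is the sense in which a larger sample supports a stronger generalisation; it never drops to zero unless the whole box is inspected.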


Overgeneralisation

• A special instance of hasty generalisation is overgeneralisation

• Overgeneralisation occurs if you draw an overly general conclusion that is unwarranted by the sample

Instances Salad Fish Meat Chicken Sick

Andy yes yes yes yes yes

Dave yes yes yes

Frank yes yes yes yes

Eve yes yes yes

Jack yes yes yes yes

Betty yes yes yes

• Causal induction: This particular salad makes you sick

• Overgeneralisation: Salad is bad for you


Biased sample

• A biased sample occurs when a sample is collected in such a way that some members of the intended population are less likely to be included than others

• A biased sample is again not a sound basis for inductive generalisation

Example:

• The average age of people studying or working at the University is 28 years

• Generalisation: The average age of the UK population is 28 years

⇝ – In reality, the average age of the UK population is 38 years
– The sample of people studying and working at the University is biased towards younger people
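A toy calculation shows how a sampling frame that excludes part of the population shifts the estimate. The population below is hypothetical (100 people of each age from 0 to 79) and is not UK data, and modelling the University as everyone aged 18 to 40 is a crude assumption made for illustration.

```python
import statistics

# Hypothetical population: 100 people of every age from 0 to 79.
population = [age for age in range(80) for _ in range(100)]

# Biased sampling frame (a crude assumption): only people aged 18 to 40,
# standing in for those who study or work at a university.
campus = [age for age in population if 18 <= age <= 40]

print(f"population mean age: {statistics.fmean(population):.1f}")  # 39.5
print(f"campus sample mean age: {statistics.fmean(campus):.1f}")   # 29.0
```

Taking more people from the same frame would not help: every extra campus member is drawn from the same restricted age range, so the bias remains however large the sample becomes.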


Insufficient Range of Observational Circumstances

Example:

• We observe that a fellow student, Michael, is grumpy on Wednesday, 2nd November; Wednesday, 9th November; Wednesday, 16th November; and Wednesday, 23rd November

• We conclude that Michael is always grumpy on Wednesdays

• We failed to recognise that these dates coincide with COMP101 coursework deadlines and that this is the cause of Michael’s grumpiness

• As soon as COMP101 is over, Michael will be grumpy on a different day of the week
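The problem can be phrased as two competing hypotheses that fit the observations equally well. The sketch below assumes the year is 2016 (the year of these notes, in which all four dates fall on a Wednesday) and invents one further deadline, on Monday 5 December, for another module; both the calendar and the function names are hypothetical.

```python
from datetime import date

# Observed grumpy days; the year 2016 is an assumption.
observed = [date(2016, 11, d) for d in (2, 9, 16, 23)]

# Hypothetical deadline calendar: the four COMP101 deadlines plus a later
# deadline for another module that falls on a Monday.
deadlines = set(observed) | {date(2016, 12, 5)}


def grumpy_if_wednesday(day):
    """Hypothesis 1: Michael is grumpy on every Wednesday."""
    return day.weekday() == 2  # Monday is 0, so Wednesday is 2


def grumpy_if_deadline(day):
    """Hypothesis 2: Michael is grumpy on coursework deadline days."""
    return day in deadlines


# Both hypotheses agree with every observation made so far ...
assert all(grumpy_if_wednesday(d) and grumpy_if_deadline(d) for d in observed)

# ... but they make different predictions outside the observed range.
for day in (date(2016, 11, 30), date(2016, 12, 5)):
    print(day, grumpy_if_wednesday(day), grumpy_if_deadline(day))
# 2016-11-30 True False
# 2016-12-05 False True
```

Only observations made under different circumstances (a Wednesday without a deadline, or a deadline on another weekday) can separate the two hypotheses.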


Statistical syllogism

• A statistical syllogism proceeds from a generalisation to a conclusion about an individual

– Proportion X of the population have property P (where X is large)
– Individual I is a member of that population
– Therefore, I has property P

• Syllogism means “conclusion” or “inference”

• Beware: Some dictionaries define a syllogism as a “deductive scheme” or “deductive reasoning”

Statistical syllogism is not a form of deductive reasoning. It is a form of inductive reasoning



Example:
– 90% of university students have above-average intelligence
– You are a university student
– Therefore, you have above-average intelligence
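The inductive character of the statistical syllogism can be made concrete with a toy population. In the sketch below, assuming a hypothetical group of 100 students of whom exactly 90 have the property in question, the same conclusion is drawn for every member; it turns out correct for 90% of them. The threshold of one half standing in for "X is large" is an assumption.

```python
from fractions import Fraction

# Hypothetical population of 100 students: True means "has property P".
# 90 of them have P, matching the 90% in the example.
population = [True] * 90 + [False] * 10


def statistical_syllogism(members_have_p, threshold=Fraction(1, 2)):
    """Conclude that any individual member has P whenever the proportion
    of members with P exceeds the threshold (i.e. X is 'large')."""
    proportion = Fraction(sum(members_have_p), len(members_have_p))
    return proportion > threshold


conclusion = statistical_syllogism(population)  # True: "you have P"

# Apply that conclusion to each individual and count how often it is right.
correct = sum(1 for has_p in population if has_p == conclusion)
print(conclusion, Fraction(correct, len(population)))  # True 9/10
```

Both premises hold for every one of the 100 students, yet the conclusion is false for 10 of them; a deductive argument could never behave this way.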


Statistical syllogism: Fallacy of accident

• Fallacy of accident: a generalisation is applied when circumstances suggest that there should be an exception

Example:
– Exceeding the speed limit is (almost always) an offence
– The driver of an ambulance has exceeded the speed limit
– Therefore, the driver has committed an offence

Obviously, we should realise that an ambulance may be exempted from obeying the speed limit
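One way to avoid the fallacy of accident is to treat a generalisation as a default rule that yields to known exceptions. The sketch below is a minimal illustration under that assumption; the `Journey` type, its fields and the single emergency-vehicle exception are hypothetical simplifications, not a model of actual traffic law.

```python
from dataclasses import dataclass


@dataclass
class Journey:
    speed_mph: int
    limit_mph: int
    emergency_vehicle: bool = False  # e.g. an ambulance on an emergency call


def offence_naive(j):
    """Applies the generalisation blindly: commits the fallacy of accident."""
    return j.speed_mph > j.limit_mph


def offence_with_exceptions(j):
    """Applies the generalisation only if no known exception holds.
    The single exception modelled here is a simplification."""
    if j.emergency_vehicle:
        return False
    return j.speed_mph > j.limit_mph


ambulance = Journey(speed_mph=50, limit_mph=30, emergency_vehicle=True)
print(offence_naive(ambulance), offence_with_exceptions(ambulance))  # True False
```

The hard part in practice is not the code but knowing which exceptions exist; the fallacy arises precisely when the relevant exception is overlooked.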


Statistical syllogism: Arguments from authority

• Arguments from authority can be seen as a version of statistical syllogism:

Statistical syllogism

– Proportion X of the population have property P (where X is large)
– Individual I is a member of that population
– Therefore, I has property P

Argument from authority

– Most of what authority A says on subject matter S is correct
– X is something that A says in the context of S
– Therefore, X is true


Arguments from authority: Appeal to inappropriate authority

• Arguments from authority are best avoided in science

• If you still feel the need to use such an argument, make sure that you avoid the fallacy of appeal to inappropriate authority, that is, relying on an authority and subject matter that do not satisfy all of the following conditions:

1 The authority is a recognised expert on the subject matter
2 There is general agreement among authorities on questions / statements relating to that subject matter
3 There is no good reason to suspect that the authority is biased on the subject matter or the particular question


Statistical syllogism: Arguments from consensus

• Arguments from consensus can be seen as a version of statistical syllogism:

Argument from consensus

– Most of the claims that most people agree upon are true
– X is a claim that most people agree upon
– Therefore, X is true

• Even weaker than arguments from authority

• But admissible when the subject matter is public opinion or is strongly influenced by public opinion

Example:
If opinion polls suggest that a considerable majority believes that there will be a change of government at the next election, then there will be a change of government


Inductive reasoning: Summary and applications

• Our motivation for considering inductive reasoning was the question

What is the right proto-theory/hypothesis/model in a particular situation?

• We have seen that, for example, the method of difference may also help us with the question

What is the right experiment to conduct?

• Both of these questions relate to the conduct of Research in general and the conduct of Computer Science Research in particular

• A central question of Computer Science Research is

What can be (efficiently) automated (described as an algorithmic process)?

• So, a natural question is

Can inductive reasoning be automated?


Computational Scientific Discovery

Can inductive reasoning be automated?

• Computational Scientific Discovery is the branch of Artificial Intelligence that is concerned with providing answers to this question

• An early example of a scientific discovery system is Meta-Dendral

B. G. Buchanan and E. A. Feigenbaum: Dendral and Meta-Dendral. Artificial Intelligence 11(1–2):5–24, 1978

• System for rule discovery in the area of chemical analysis via mass spectrometry

• Motivated by applications in space exploration: experiments and analysis may need to be conducted without human involvement


BACON

• Another early example of a scientific discovery system is BACON (Langley et al., 1977–1983)

• Named after Francis Bacon (1561–1626), a pioneer of the scientific method

• BACON was a system for the discovery of (scientific) numeric laws, that is, laws of the form y = F(x)

• BACON was able to rediscover Ohm’s law, Boyle’s gas law, Kepler’s third law of planetary motion, and Galileo’s law of uniform acceleration

• It uses a plan-generate-test approach, with a number of simple inference rules / rules of thumb for generating F




BACON: Example

We have the following data for the period of revolution (P) of four of Jupiter’s moons in relation to their mean distance (D) to the planet

Moon   Distance (D)   Period (P)   D/P     D²/P     D³/P²
A       5.67           1.769       3.203   18.153   58.15
B       8.67           3.571       2.427   21.036   51.06
C      14.00           7.155       1.957   27.395   53.61
D      24.67          16.689       1.478   36.459   53.89

The task is to find a function F linking P to D

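Solution: the ratio in the last column is nearly constant; the mean of its four values is 54.1775, so $D^3/P^2 \approx 54.1775$, or equivalently $P \approx \sqrt{D^3/54.1775}$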

⇝ We have rediscovered Kepler’s third law: “The square of the orbital period of a planet is directly proportional to the cube of the semi-major axis of its orbit.”
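The search for an invariant term can be imitated, very crudely, by brute force: try small integer exponents a and b and keep the term D^a/P^b that varies least across the four moons. This is a simplified sketch, not BACON's actual plan-generate-test heuristics; the function names `relative_spread` and `find_invariant` and the exponent bound of 3 are assumptions made for illustration. The constant it reports is computed directly from the distances and periods, so it need not match the rounded table values exactly.

```python
import itertools
import statistics

# Data from the table above: mean distance D and period P of four of
# Jupiter's moons (units as given in the table).
DISTANCES = [5.67, 8.67, 14.00, 24.67]
PERIODS = [1.769, 3.571, 7.155, 16.689]


def relative_spread(values):
    """Coefficient of variation: population std. deviation divided by the mean."""
    return statistics.pstdev(values) / statistics.mean(values)


def find_invariant(xs, ys, max_exponent=3):
    """Brute-force search for exponents (a, b) such that x**a / y**b is
    (nearly) the same for every observation.

    Returns (a, b, mean_value, spread) for the most nearly constant term.
    """
    best = None
    for a, b in itertools.product(range(1, max_exponent + 1), repeat=2):
        terms = [x ** a / y ** b for x, y in zip(xs, ys)]
        spread = relative_spread(terms)
        if best is None or spread < best[3]:
            best = (a, b, statistics.mean(terms), spread)
    return best


a, b, k, spread = find_invariant(DISTANCES, PERIODS)
print(f"D^{a}/P^{b} is roughly constant: mean {k:.2f}, relative spread {spread:.1%}")
```

With these four data points the search settles on a = 3, b = 2, i.e. the same relationship as in the solution above. Exhaustive search is only feasible because the space of candidate terms is tiny here; heuristics like BACON's exist to avoid enumerating much larger spaces.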


Robot Scientist

• Developed at Aberystwyth University

• Experiments can be designed by intelligent software and executed by the robot

• The results are analysed automatically by the software and are fed back into the next round of hypothesis formation and experimentation

• Theory generation uses inductive reasoning


Machine Learning

• Inductive reasoning is useful not only for research but also for learning in general

• Machine Learning is the branch of Artificial Intelligence that is concerned with the development of algorithms that learn rules, behaviours, etc. using inductive reasoning based on data (or using abductive reasoning)

• Important subcategories of machine learning:

• Learning to classify

• Pattern recognition

• Example applications:

• Recognition of faces, crop blights, defectively manufactured items

• Intelligent non-player characters in computer games

• Classification of DNA sequences
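As a minimal illustration of learning to classify, the sketch below uses a 1-nearest-neighbour rule: it generalises from labelled examples by giving a new item the label of the most similar example seen so far. The feature values, labels and the function name `classify` are hypothetical, and 1-nearest-neighbour is just one simple technique chosen for brevity, not the method behind any of the applications listed above.

```python
import math

# Hypothetical labelled training data: ((length_cm, weight_g), label)
# for manufactured parts.
TRAINING = [
    ((10.0, 50.0), "ok"),
    ((10.2, 51.0), "ok"),
    ((9.9, 49.5), "ok"),
    ((12.5, 40.0), "faulty"),
    ((7.8, 60.0), "faulty"),
]


def classify(x, training=TRAINING):
    """1-nearest-neighbour: predict the label of the closest training example."""
    _, label = min(training, key=lambda example: math.dist(x, example[0]))
    return label


print(classify((10.1, 50.2)))  # ok
print(classify((12.0, 41.0)))  # faulty
```

Because the two features use different units, a realistic classifier would normally rescale them first; that step is omitted here. Like any inductive generalisation, the classifier is only as good as its training sample.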


Data Mining

• Machine learning is a key component of Data Mining

• Typically associated with the analysis of large amounts of data

• Additionally involves storing large amounts of data, data cleansing, and data visualisation

• Aims to find
• previously unknown patterns (cluster analysis)
• unusual data records (anomaly detection)
• interdependencies in the data (association rule mining)

• Example applications:

• Advertising: To which offer/advertisement is a potential customer most likely to respond?

• Basket analysis: What items are customers most likely to buy together?

• Sensitive data: Finding a user’s religious affiliation, political leanings, or sexual orientation via analysis of social networking data

⇝ Serious privacy concerns
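Basket analysis is commonly framed as association rule mining. The sketch below, using a handful of invented shopping baskets, computes two standard measures for rules between single items: support (the fraction of baskets containing both items) and confidence (how often the right-hand item occurs when the left-hand item does). The thresholds and function names are illustrative choices; practical systems use algorithms such as Apriori or FP-Growth to handle larger itemsets efficiently.

```python
from itertools import permutations

# Hypothetical shopping baskets.
BASKETS = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
    {"bread", "butter", "jam"},
]


def count(itemset, baskets):
    """Number of baskets that contain every item in itemset."""
    return sum(1 for b in baskets if itemset <= b)


def pair_rules(baskets, min_support=0.4, min_confidence=0.7):
    """Association rules x -> y between single items.

    support    = fraction of baskets containing both x and y
    confidence = (# baskets with x and y) / (# baskets with x)
    """
    items = sorted(set().union(*baskets))
    rules = []
    for x, y in permutations(items, 2):
        n_xy = count({x, y}, baskets)
        support = n_xy / len(baskets)
        if support < min_support:
            continue
        confidence = n_xy / count({x}, baskets)
        if confidence >= min_confidence:
            rules.append((x, y, support, confidence))
    return rules


for x, y, s, c in pair_rules(BASKETS):
    print(f"{x} -> {y}: support {s:.2f}, confidence {c:.2f}")
# bread -> butter: support 0.60, confidence 0.75
# butter -> bread: support 0.60, confidence 1.00
```

A mined rule is itself an inductive generalisation from past baskets, so the pitfalls discussed earlier (small or biased samples) apply to it as well.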


Data Mining: Example (People Analytics)

Google applied data mining to the question

What makes a good team leader?
Answer:

1 Be a good coach

2 Empower your teams and don’t micromanage

3 Express interest in team members’ success and personal well-being

4 Don’t be a sissy: be productive and results orientated

5 Be a good communicator and listen to your team

6 Help your employees with career development

7 Have a clear vision and strategy for the team

8 Have key technical skills so you can help your team

⇝ (8) is the only surprise:
– it contradicts the belief that “good managers can manage anything”
– it also contradicts the belief that “technical skills” are the most important skills for a manager


Further reading

• For more on Inductive Reasoning see

W. Hughes, J. Lavery, and K. Doran: Critical Thinking: An Introduction to the Basic Skills (6th revised ed.). Broadview Press, 2010. Chapter 10
