Research methods in computer science - The University of...

Research methods in computer science Jonathan Shapiro School of Computer Science University of Manchester January 26, 2015


Page 1: Research methods in computer science - The University of ...studentnet.cs.manchester.ac.uk/pgr/2014/COMP80122/RM.pdf · Research methods commonly used in computer science (largely

Research methods in computer science

Jonathan Shapiro

School of Computer Science, University of Manchester

January 26, 2015


Research methods commonly used in computer science (largely after Chris Johnson)

Implementation or build-driven research: The goal is to produce some artefact, such as a software system or a data set.

Formal methods and mathematical proof: Uses formal mathematical methods to prove that a system has given properties, or to design a system which has those properties.

Empirical methods: Uses experiments designed to test hypotheses.

Observational studies: Determine how systems perform in real use, by studying their use.

Simulation: Uses computer simulations to address questions that are difficult to answer in the real application.


Are there others?

(Discussion)


Risks and benefits: implementation-driven research

Risks:

- How do you add to knowledge just by building a system? How do you generalise from a specific system to general principles?

- What if you fail? Will you be able to differentiate failures of the implementation from failures of the general idea?

Benefits:

- If what you build is widely used, or used for an important problem, it can have impact.


Risks and benefits: implementation-driven research

- Be very careful with this type of research.

- Don’t let implementation distract you from your research question.


Risks and benefits: formal methods

Risks:

- Cases you can solve may be too simple to be relevant to interesting applications.

- Cases may be too general or abstract to cover the complexity of real systems.

Benefits:

- A correct proof is indisputable.

- You can tell people to shut up. (Because you can prove that their claims are wrong.)


Risks and benefits: empirical approach

Risks:

- Requires a good baseline for comparison.

- Requires careful control.

- Can leave unanswered the question of “why?”.

Benefits:

- A carefully designed experiment with good statistical support can produce strong evidence for a hypothesis.
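One way to picture "a good baseline" plus "good statistical support" is a paired comparison on the same set of problems. The sketch below is only an illustration, not a prescribed analysis: the function name `paired_permutation_test`, the choice of a sign-flip permutation test, and the accuracy figures are all assumptions made for the example.

```python
import random
import statistics


def paired_permutation_test(baseline, candidate, n_resamples=10_000, seed=0):
    """Two-sided paired sign-flip permutation test on the mean difference.

    Returns (observed_mean_difference, estimated_p_value).
    """
    if len(baseline) != len(candidate):
        raise ValueError("paired samples must have equal length")
    rng = random.Random(seed)  # fixed seed so the analysis can be repeated
    diffs = [c - b for b, c in zip(baseline, candidate)]
    observed = statistics.fmean(diffs)
    extreme = 0
    for _ in range(n_resamples):
        # Under the null hypothesis the sign of each difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(statistics.fmean(flipped)) >= abs(observed):
            extreme += 1
    # The add-one correction keeps the estimate away from an impossible p = 0.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


# Hypothetical accuracies of two methods on the same ten benchmark problems.
baseline_scores = [0.71, 0.64, 0.80, 0.58, 0.69, 0.75, 0.62, 0.77, 0.66, 0.70]
candidate_scores = [0.74, 0.69, 0.82, 0.63, 0.70, 0.79, 0.68, 0.80, 0.71, 0.73]

mean_diff, p = paired_permutation_test(baseline_scores, candidate_scores)
print(f"mean improvement = {mean_diff:.3f}, estimated p = {p:.4f}")
```

Pairing the scores problem by problem is a form of control: each problem acts as its own baseline. A small p-value would still leave the question of "why?" unanswered.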


Risks and benefits: observational studies

Risks:

- Lack of controls: it is unclear how to quantify performance, and relative to what.

Benefits:

- Can see how a system behaves in real use, rather than in an artificially controlled experiment.


Risks and benefits: simulation-driven research

Risks:

- The simulation might not be a faithful representation.

- Example: simulated parallelism versus real parallelism.

Benefits:

- Allows for a potentially large number of experiments.

- Allows for a high level of control.
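To make the trade-off concrete, here is a minimal sketch of a simulated parallel machine. Everything in it is an assumption made for illustration: the scheduling rule (each task goes to whichever worker frees up first), the exponentially distributed task durations, and the function names. It deliberately ignores communication, contention and scheduling overhead, which is exactly how simulated parallelism can diverge from real parallelism.

```python
import heapq
import random
import statistics


def simulated_makespan(durations, n_workers):
    """Finish time when each task goes to the worker that frees up first.

    Idealised model: no communication, contention, or scheduling overhead.
    """
    free_at = [0.0] * n_workers  # min-heap of times at which workers become free
    for d in durations:
        start = heapq.heappop(free_at)
        heapq.heappush(free_at, start + d)
    return max(free_at)


def mean_speedup(n_workers, n_tasks=200, n_trials=500, seed=42):
    """Average simulated speedup over many randomly generated workloads."""
    rng = random.Random(seed)  # fixed seed: every run sees the same workloads
    speedups = []
    for _ in range(n_trials):
        durations = [rng.expovariate(1.0) for _ in range(n_tasks)]
        serial_time = sum(durations)
        speedups.append(serial_time / simulated_makespan(durations, n_workers))
    return statistics.fmean(speedups)


for k in (1, 2, 4, 8):
    print(k, round(mean_speedup(k), 2))
```

Hundreds of trials per configuration run quickly, and the seed gives full control over the workloads. Whether the resulting speedups say anything about a real machine depends on how faithful the idealised model is.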


The scientific method

- What is the scientific method? (discussion)

- Is computer science a science? (more discussion?)


Karl Popper’s view of the scientific method

(as presented in Medawar’s “Advice to a Young Scientist”)

1. Begin with good knowledge of the subject.

2. Generate questions.

3. Create hypotheses (creative, imaginative).

4. Deduce consequences which must follow (deductive, logical).

5. Do experiments attempting to disprove the consequences (or establish null hypotheses).

This is the hypothetico-deductive model of research. Is this relevant to computer science?


What makes a good hypothesis?

- It must be falsifiable.

- (Think of some examples of hypotheses which are not falsifiable.)
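As one illustration of a falsifiable hypothesis in computer science, consider the claim "greedy change-making always uses the fewest coins". It predicts something that an experiment can contradict. The sketch below is an assumed example, not taken from the lecture: the function names and coin systems are chosen for illustration, and the coin set is assumed to contain a coin of value 1 so that greedy always terminates with exact change.

```python
def greedy_coin_count(amount, coins):
    """Number of coins used when always taking the largest coin that fits."""
    count = 0
    for coin in sorted(coins, reverse=True):
        count += amount // coin
        amount %= coin
    return count


def optimal_coin_count(amount, coins):
    """Fewest coins needed, by dynamic programming over all smaller amounts."""
    best = [0] + [float("inf")] * amount
    for value in range(1, amount + 1):
        for coin in coins:
            if coin <= value:
                best[value] = min(best[value], best[value - coin] + 1)
    return best[amount]


def find_counterexample(coins, max_amount):
    """Smallest amount where greedy is worse than optimal, or None if none is found."""
    for amount in range(1, max_amount + 1):
        if greedy_coin_count(amount, coins) > optimal_coin_count(amount, coins):
            return amount
    return None


print(find_counterexample([1, 3, 4], 50))        # 6: greedy uses 4+1+1, optimal is 3+3
print(find_counterexample([1, 5, 10, 25], 200))  # None: no refutation found up to 200
```

A single counterexample refutes the general claim. A result of `None` only means the attempt to disprove it failed within the tested range; it is not a proof.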


Does science really work this way?

Philosophers of science (and social scientists studying science): Science does not really work this way.

Working scientists: The Popperian view is the most important statement of what science strives to be.


Criticism of Popper

Important theories are not falsifiable: They are verified indirectly through auxiliary theories.

Important theories are not falsified: They are modified to accommodate new evidence. This continues until the theory becomes untenable, at which point a “paradigm shift” can occur.

Important discoveries may lack a hypothesis: e.g. DNA, the cosmic background radiation.


Hallmarks of good (empirical) science

- Contains a clear hypothesis.

- Is replicable: can be repeated by others.

- Is controlled.

- Manipulates one variable at a time.

- Is concerned with ceiling and floor effects: in some cases, any method performs very well or very badly.

- Is concerned with scaling.

- Is concerned with getting good statistical support.
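Several of these hallmarks can be built into even a very small experiment script. The following is a minimal sketch under stated assumptions: insertion sort is just a stand-in for "the method under study", key comparisons are used as a machine-independent measure, and the names `insertion_sort_comparisons` and `run_experiment` are hypothetical. Only the input size is manipulated, and the seed is fixed so that others can repeat the runs.

```python
import random
import statistics


def insertion_sort_comparisons(values):
    """Sort a copy of `values` and return the number of key comparisons made."""
    items = list(values)
    comparisons = 0
    for i in range(1, len(items)):
        key = items[i]
        j = i - 1
        while j >= 0:
            comparisons += 1
            if items[j] <= key:
                break
            items[j + 1] = items[j]  # shift the larger element one place right
            j -= 1
        items[j + 1] = key
    return comparisons


def run_experiment(sizes, n_trials=30, seed=2015):
    """Vary only the input size; the seed and trial count are held fixed."""
    results = {}
    for n in sizes:
        rng = random.Random(seed)  # same seed for each size: runs are repeatable
        counts = []
        for _ in range(n_trials):
            data = [rng.random() for _ in range(n)]
            counts.append(insertion_sort_comparisons(data))
        results[n] = (statistics.fmean(counts), statistics.stdev(counts))
    return results


for n, (mean, spread) in run_experiment([50, 100, 200, 400]).items():
    print(f"n={n:4d}  mean comparisons={mean:10.1f}  stdev={spread:8.1f}")
```

Reporting the spread alongside the mean, over a range of sizes, speaks to both statistical support and scaling. Very small inputs would show a floor effect, since almost any sorting method looks equally good on them.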


Relevant to CS?

- Sadly, often not.

- Survey of 150 AAAI papers (Cohen, 90):

  60% gave no evidence the method had been tried on more than one problem,

  80% made no attempt to explain performance,

  only 16% had a clearly defined question or hypothesis.


Relevant to CS?

More recent movements towards better empirical methodologies:

- In algorithmic analysis, by Ian Gent and Toby Walsh (see, for example, “How Not To Do It”, Gent et al., 1997).

- In software engineering, the ACM/IEEE Symposium on Empirical Software Engineering and Measurement.


Summary

1. Keep developing background knowledge through reading and carrying out work proposed by your supervisor.

2. Be constantly developing research questions:
   - Criticise, analyse.
   - Be open to merging new parts of the field together.

3. A research proposal develops from turning a research question into a hypothesis.

4. Derive consequences from the hypothesis which can be formally proven or refuted by observation.

5. Devise a critical method to test the hypothesis.


Factors common to all scientific research


1. Clearly defined research problem

- If you lack a clearly defined research problem, you will struggle to produce a thesis which is about anything.

- Keep reminding yourself what your research problem is.

- Keep articulating it to others.


2. Continue to improve your knowledge of the field

- Know your research context.

- Literature review is an on-going process.

- Remember: published literature tells you what was known a year or more ago; research seminars and talking to people are required for up-to-date knowledge.


3. Document your findings

- Keep a notebook, or use some tool for documentation.

- Ideas are nothing until you at least write them down.

- Keep careful records of (potentially) useful literature.


4. Be efficient with your time

- This is hard.

- Try to plan your time as much as you can.