What are the challenges for Data Science?
Magnus Rattray
Director, University of Manchester Data Science Institute
Professor of Computational & Systems Biology
Faculty of Biology, Medicine & Health
University of Manchester
www.datascience.manchester.ac.uk
Astronomy
The Large Synoptic Survey Telescope:
• 3.2 Gpixel camera
• 2000 exposures per night
• 20 TB per night
• 10-year survey: 100 PB of data
In its first month of operation, LSST will survey more of the Universe than all previous telescopes
Particle physics
Large Hadron Collider (ATLAS experiment)
• 1 billion proton-proton collisions every second
• Nominal output rate of detector: 68 TB/s
• Actual output rate to disk: 1.5 GB/s (reduced via fast identification of “interesting” events)
• Data rate of up to 100 TB per day, for up to 6 months per year, for 10-15 years: 200 PB
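The headline volumes on the astronomy and particle physics slides can be sanity-checked with back-of-envelope arithmetic. The sketch below assumes decimal units (1 PB = 1000 TB), an LSST that observes every night of the year, and "6 months" read as 182 days; none of these assumptions is stated on the slides, and the function name is illustrative.

```python
TB_PER_PB = 1000  # assumption: decimal units, 1 PB = 1000 TB


def total_petabytes(tb_per_day: float, days_per_year: float, years: float) -> float:
    """Total volume in PB for a constant daily data rate."""
    return tb_per_day * days_per_year * years / TB_PER_PB


# LSST: 20 TB per night over a 10-year survey.
# Assumption: one observing night per calendar day (an upper bound on nights).
lsst_pb = total_petabytes(20, 365, 10)

# LHC (ATLAS): up to 100 TB per day, up to 6 months per year, for 10-15 years.
# Assumption: "6 months" is taken as 182 days.
lhc_low_pb = total_petabytes(100, 182, 10)
lhc_high_pb = total_petabytes(100, 182, 15)

# The sustained disk rate of 1.5 GB/s expressed in TB per day.
disk_tb_per_day = 1.5 * 86_400 / 1000

print(f"LSST raw images: about {lsst_pb:.0f} PB")
print(f"LHC: about {lhc_low_pb:.0f} to {lhc_high_pb:.0f} PB")
print(f"1.5 GB/s sustained: about {disk_tb_per_day:.1f} TB per day")
```

Under these assumptions the LSST product is 73 PB, the same order of magnitude as the quoted 100 PB, and the LHC range of 182 to 273 PB brackets the quoted 200 PB.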
Geography
Commute-flow is a brand new geodemographic classification of commuting flows for England and Wales, based on origin-destination data from the 2011 Census, that has been used to analyse the spatial dynamics of commuting. An interactive toolkit is available at www.commute-flow.net
26 million travel-to-work flows recorded in the 2011 Census for England and Wales
A new two-tier geodemographic typology of commuting patterns with 9 super-groups and a total of 40 groups. Each includes a pen portrait with an interactive flow map and radial chart.
Hincks, S., Kingston, R., Webb, B. and Wong, C. (in press) A New Geodemographic Classification of Commuting Flows for England and Wales. International Journal of Geographical Information Science.
Mental health
1. Raw GPS data
2. Detection of geolocations visited
3. Geolocations visited
4. Identification of places visited
5. Places visited
6. Type of places and activities recognition
7. Out-of-home activities
[Figure labels: Sport, Swimming pool, Volleyball]
Difrancesco et al. Out-of-home activity recognition from GPS data in schizophrenic patients. IEEE 29th International Symposium on Computer-Based Medical Systems (CBMS 2016).
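Steps 1 to 3 of the pipeline above (raw GPS data to geolocations visited) can be illustrated with a generic stay-point heuristic. This is a minimal sketch of a common approach and is not claimed to be the method of Difrancesco et al. The distance and duration thresholds, the function names and the toy coordinates are all hypothetical.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    r = 6_371_000  # mean Earth radius in metres
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * r * asin(sqrt(a))


def stay_points(track, max_dist_m=100.0, min_stay_s=600.0):
    """Return (mean_lat, mean_lon, arrive_t, leave_t) for each detected stay.

    track: list of (time_s, lat, lon) tuples sorted by time.
    Thresholds are illustrative assumptions, not values from the paper.
    """
    stays, i, n = [], 0, len(track)
    while i < n:
        j = i + 1
        # Extend the window while fixes remain within max_dist_m of fix i.
        while j < n and haversine_m(track[i][1], track[i][2],
                                    track[j][1], track[j][2]) <= max_dist_m:
            j += 1
        window = track[i:j]
        if window[-1][0] - window[0][0] >= min_stay_s:
            lat = sum(p[1] for p in window) / len(window)
            lon = sum(p[2] for p in window) / len(window)
            stays.append((lat, lon, window[0][0], window[-1][0]))
            i = j  # resume after the stay
        else:
            i += 1
    return stays


# Toy track with made-up coordinates: a 20-minute stay, a short journey,
# then a 15-minute stay roughly 3 km further north.
track = [
    (0, 53.4670, -2.2340), (300, 53.4671, -2.2340), (600, 53.4670, -2.2341),
    (900, 53.4671, -2.2341), (1200, 53.4670, -2.2340),
    (1260, 53.4770, -2.2340), (1320, 53.4870, -2.2340),
    (1500, 53.4970, -2.2340), (1800, 53.4971, -2.2340),
    (2100, 53.4970, -2.2341), (2400, 53.4970, -2.2340),
]
stays = stay_points(track)
for lat, lon, arrive, leave in stays:
    print(f"stay at ({lat:.4f}, {lon:.4f}) from t={arrive}s to t={leave}s")
```

Later steps (mapping stays to named places and recognising activity types) would need external place data and a classifier, which this sketch does not attempt.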
Respiratory health
Research is increasingly data-driven
Bottom-up modelling:
• Define model of system from assumed microscopic principles
• Develop a tractable approximation to “solve” the model
• Explore system properties for various parameter settings (e.g. growth rates, stationary properties, phase transitions)
• Test/refine/revise the model given experimental data
Data-driven modelling:
• Identify system variables that can be measured: the data
• Fit a generative or predictive statistical model to the data
• Make inferences, learn hidden variables, score models
Increasingly we are connecting these approaches – allowing for strong “mechanistic” prior knowledge within data-driven models
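One small way to picture mechanistic prior knowledge inside a data-driven fit: assume a growth law from first principles (here dx/dt = r x, so x(t) = x0 exp(r t)) and estimate its parameters from noisy measurements. This is a sketch on simulated data only. The growth law, the noise model and every number below are assumptions chosen for illustration, not content from the talk.

```python
import math
import random


def fit_exponential_growth(times, values):
    """Least-squares fit of log(x) = log(x0) + r * t; returns (x0, r)."""
    logs = [math.log(v) for v in values]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxx = sum((t - t_mean) ** 2 for t in times)
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    r = sxy / sxx
    x0 = math.exp(y_mean - r * t_mean)
    return x0, r


# Simulated experiment (assumption): true x0 = 2.0, true r = 0.5,
# multiplicative log-normal noise with standard deviation 0.05 on the log scale.
rng = random.Random(0)
true_x0, true_r = 2.0, 0.5
times = [0.5 * k for k in range(20)]
values = [true_x0 * math.exp(true_r * t + rng.gauss(0.0, 0.05)) for t in times]

x0_hat, r_hat = fit_exponential_growth(times, values)
print(f"estimated x0 = {x0_hat:.3f}, estimated growth rate r = {r_hat:.3f}")
```

The mechanistic assumption reduces the problem to two interpretable parameters; a purely data-driven alternative would fit a flexible curve with no growth-rate parameter to interpret.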
Challenges for Data Science
• Big data – scalability
• Complex data – modelling & inference
• Messy data – probability & statistics
• Human data – privacy, ethics, interaction
• Accessible data – openness, reproducibility
Example: Genomics
High-throughput DNA sequencing
“Data handling is now the bottleneck. It costs more to analyze a genome than to sequence a genome.” David Haussler
Genomics: big data
@SRR566546.970 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
TTGCCTGCCTATCATTTTAGTGCCTGTGAGGTGGAGATGTGAGGATCAGT
+SRR566546.970 HWUSI-EAS1673_11067_FC7070M:4:1:2299:1109 length=50
hhhhhhhhhhghhghhhhhfhhhhhfffffe`ee[`X]b[d[ed`[Y[^…
@SRR566546.971 HWUSI-EAS1673_11067_FC7070M:4:1:2374:1108 length=50
GATTTGTATGAAAGTATACAACTAAAACTGCAGGTGGATCAGAGTAAGTC
+SRR566546.971 HWUSI-EAS1673_11067_FC7070M:4:1:2374:1108 length=50
hhhhgfhhcghghggfcffdhfehhhhcehdchhdhahehffffde`…
@SRR566546.972 HWUSI-EAS1673_11067_FC7070M:4:1:2438:1109 length=50
TGCATGATCTTCAGTGCCAGGACCTTATCAAGCGGTTTGGTCCCTTTGTT
+SRR566546.972 HWUSI-EAS1673_11067_FC7070M:4:1:2438:1109 length=50
dhhhgchhhghhhfhhhhhdhhhhehhghfhhhchfddffcffafhfghe
200 GB data for 60x coverage over human genome
20 PB for 100K genomes
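The reads on this slide are in FASTQ format: four lines per read (header, bases, separator, per-base quality characters). Below is a minimal parsing sketch using the third read from the slide. The quality offset of 64 is an assumption based on the `h`-dominated quality strings, which are typical of older Illumina encodings; the spaces in the header are also assumed, since the transcript lost whitespace. `FastqRecord` and `parse_fastq` are illustrative names, not part of any library.

```python
from typing import Iterable, Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    read_id: str
    sequence: str
    qualities: List[int]  # Phred scores, one per base


def parse_fastq(lines: Iterable[str], phred_offset: int = 64) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text lines (4 lines per read).

    phred_offset=64 is an assumption about the encoding; many newer
    files use an offset of 33 instead.
    """
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    if len(rows) % 4 != 0:
        raise ValueError("FASTQ input must contain 4 lines per record")
    for k in range(0, len(rows), 4):
        header, seq, plus, qual = rows[k:k + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {k + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch at line {k + 1}")
        yield FastqRecord(header[1:], seq, [ord(c) - phred_offset for c in qual])


sample = [
    "@SRR566546.972 HWUSI-EAS1673_11067_FC7070M:4:1:2438:1109 length=50",
    "TGCATGATCTTCAGTGCCAGGACCTTATCAAGCGGTTTGGTCCCTTTGTT",
    "+SRR566546.972 HWUSI-EAS1673_11067_FC7070M:4:1:2438:1109 length=50",
    "dhhhgchhhghhhfhhhhhdhhhhehhghfhhhchfddffcffafhfghe",
]
records = list(parse_fastq(sample))
rec = records[0]
print(rec.read_id.split()[0], len(rec.sequence), min(rec.qualities), max(rec.qualities))
```

Reading the whole input into a list keeps the sketch short; at the 200 GB scale quoted on the slide, a real parser would stream the file four lines at a time.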
Genomics: complex data
• DNA sequencing is an incredibly disruptive technology
• Genomics is not just about genomes! Many ‘omics layers
DNA-Seq: Genomics
RNA-Seq: Transcriptomics
Bis-Seq, ChIP-Seq: Epigenomics
HiC, ChIA-PET: Interactomics
Roy et al. Science 2010
Genomics: messy data
Lister, Pelizzola et al. Nature 2009
• 111 reference “epigenomes”
• 2804 high-throughput sequencing datasets
• 1.5 × 10^11 mapped sequence reads
• >10^13 sequenced DNA bases (>1000 genomes)
Every new ‘omic layer is as big as a genome
Genomics – human data
Genomic & Precision medicine
• Precision diagnosis & precision treatment
• Prognostics & Theranostics
• Informing prevention
• New models of care at disease boundaries
• Driving rapid innovation & adoption
• Role of multi-omics
• Linking ‘big’ data
• Re-aligning incentives for commissioning – driven by science, research
“Genomics – the changing face of clinical care” Sue Hill, Chief Scientific Officer for England
Example: Asthmas Stretch Genomics
• Life-course complexity indicates multiple (sub-)diseases
  – Usually starts young
  – May progress, remit or relapse over life
• Inconsistent gene-environment interactions indicate multiple (sub-)diseases
  – Variable effects of genetic polymorphisms, e.g. CD14
  – Variable treatment-setting interactions
[Figure: CD14 endotoxin receptor. Legend: C allele associated; T allele associated; no association]
Simpson A et al. Endotoxin exposure, CD14, and allergic disease: an interaction between genes and the environment. Am J Respir Crit Care Med. 2006;174(4):386-92.
50-60% heritability in twin studies but <2% of phenotype explained by current genomics
Slides from Iain Buchan
Received Wisdom: Atopic March
• Progression of allergy: Eczema → Asthma → Rhinitis
• Inferred from population summary
• Assumed causal link between eczema, asthma & rhinitis
• Clinical response: target children with eczema to reduce progression to asthma
Spergel & Paller, 2003
World Allergy Organization, 2014
Ecologic Fallacy Revealed
Model-based machine learning allowing for transitions between skin, lung and nasal allergies over time
MRC STELAR consortium working at scale across MAAS and ALSPAC cohorts
Belgrave et al. Developmental Profiles of Eczema, Wheeze, and Rhinitis: Two Population-Based Birth Cohort Studies. PLoS Medicine 2014 Oct 21;11(10):e1001748.
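The ecological fallacy named on this slide can be shown with a deliberately extreme synthetic cohort: population prevalence shifts from eczema to wheeze to rhinitis with age, yet no individual child follows that sequence. The data below are invented for illustration and have nothing to do with the MAAS or ALSPAC cohorts; the ages and group sizes are arbitrary assumptions.

```python
from collections import Counter

AGES = (1, 5, 10)
SYMPTOMS = ("eczema", "wheeze", "rhinitis")


def make_child(symptom, age):
    """A synthetic child who has one symptom at one age and none otherwise."""
    return {a: ({symptom} if a == age else set()) for a in AGES}


# Three disjoint groups of 100 children each (synthetic, not real cohort data).
cohort = (
    [make_child("eczema", 1) for _ in range(100)]
    + [make_child("wheeze", 5) for _ in range(100)]
    + [make_child("rhinitis", 10) for _ in range(100)]
)


def prevalence(children, age):
    """Population-level summary: fraction of children with each symptom at an age."""
    counts = Counter(s for child in children for s in child[age])
    return {s: counts[s] / len(children) for s in SYMPTOMS}


def follows_march(child):
    """Individual-level check: eczema at 1, then wheeze at 5, then rhinitis at 10."""
    return "eczema" in child[1] and "wheeze" in child[5] and "rhinitis" in child[10]


for age in AGES:
    print(age, prevalence(cohort, age))

marchers = sum(follows_march(c) for c in cohort)
print("children following the full march:", marchers)
```

The population summary alone looks like a progression, which is why individual-level trajectory models are needed to distinguish the two situations.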
Better Targets for ‘Omics
Disambiguate disease profiles to move toward causal modelling and efficient identification of mechanisms
Belgrave et al. Developmental Profiles of Eczema, Wheeze, and Rhinitis: Two Population-Based Birth Cohort Studies. PLoS Medicine 2014 Oct 21;11(10):e1001748.
Towards genomic medicine

| Data type | Large-scale structural changes | Balanced translocations | Distant consanguinity | Uniparental disomy | Novel / known coding variants | Novel / known non-coding variants |
|---|---|---|---|---|---|---|
| Targeted gene sequencing | ✗ | ✗ | ✗ | ✗ | ✓ | ✗ |
| SNP+ arrays | ✗/✓ | ✗ | ✓ | ✓ | ✗ | ✗ |
| Array CGH* | ✗/✓ | ✗ | ✗ | ✗ | ✗ | ✗ |
| Exome | ✗/✓ | ✗ | ✗/✓ | ✗/✓ | ✓ | ✗ |
| Whole genome | ✗/✓ | ✓ | ✓ | ✓ | ✓ | ✓ |

Key: + Single Nucleotide Polymorphism, * Comparative Genomic Hybridisation

[Figure: number of bases assayed, log scale from 10,000 to 10,000,000,000. Genotyping; Panels: <10m bases, subset of exons; Exome: 10m bases, exons only; Whole genome: 3.3bn bases, both exons and introns]
“Genomics – the changing face of clinical care” Sue Hill, Chief Scientific Officer for England
Genomics – accessible data?
• Sequencing 100,000 genomes from patients with cancer and rare diseases
• £24m data infrastructure award from MRC
• Genomics England Clinical Interpretation Partnerships (GeCIPs) to enhance value of data
100K Genomes Project
• Sequencing facility at the Sanger Centre
• 30 PB data in a data centre on a military base
• Researchers (GeCIP members) will not be allowed to download raw data files
• Restricted access to data and compute through secure virtual desktop (Inuvika)
• Analysis has to move to the data
But how do we move this to a global scale? How do we analyse across many datasets?
Next Genomic Revolution: Scaling down to single cells
Microfluidics sequencing/cytometry
[Figure labels: DNA/RNA; Protein; Fluidigm C1]
Single-cell data
• Existing genomic methods average over a cell population of ~10^7 cells
• Single-cell methods uncover hidden structure:
  – Diverse sub-populations of immune cells
  – Clonal structure within tumours
  – Rare circulating tumour cells from blood
  – Asynchronous cellular dynamics
  – Each cell is now a high-dimensional data point
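A toy illustration of why population averaging hides structure: if 10% of cells express a gene strongly and 90% do not, the bulk average sits at a value that describes neither group. The expression values, group sizes and threshold below are synthetic assumptions chosen only to make the point.

```python
import random
import statistics

rng = random.Random(1)

# Synthetic expression of one gene across 1000 cells (assumption):
# 900 "off" cells centred on 0 and 100 "on" cells centred on 10.
cells = [rng.gauss(0.0, 0.5) for _ in range(900)] + \
        [rng.gauss(10.0, 0.5) for _ in range(100)]

# What a population-averaged (bulk) assay would report.
bulk_mean = statistics.fmean(cells)

# What single-cell resolution allows: split cells by a simple threshold.
on_cells = [x for x in cells if x > 5.0]
off_cells = [x for x in cells if x <= 5.0]

print(f"bulk mean: {bulk_mean:.2f}")
print(f"on cells: {len(on_cells)}, mean {statistics.fmean(on_cells):.2f}")
print(f"off cells: {len(off_cells)}, mean {statistics.fmean(off_cells):.2f}")
```

With real data each cell has thousands of measured features, so the split would come from clustering rather than a hand-picked threshold on one gene.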
Clustering single cell protein data
Amir et al. Nature Biotech. 2013
Uncovering clonal evolution in tumours
[Figure, left panel "Life history of the tumor": growth from normal cells over times t0, t1, t2, t3 to t_sample, showing tissue volume at time of sampling. Right panel "Poly-clonal tumor at sampling": clones with fractions 15%, 20%, 25% and 40%; genotypes A, AB, ABC and ABD. Inset: clonal evolution tree rooted at 0.]
Florian Markowetz, CRUK Cambridge – from his blog “Scientific B-sides”
Circulating tumour cells (CTC) profiling
Approach
Targeted:
• Basic CNA to verify CTC status
• Target 1-20 genes
• Use WBCs as negative controls
Genome wide:
• Copy number alteration (CNA)
• WES: comprehensive analysis
• Use WBCs as negative controls
6 SCLC patients chosen with ≥4 single isolated CTCs and CTC pools
CNA data from 6,682 cancer-related protein-coding genes
[Figure: CNA profiles with TP53 labelled; * = pool of 10 CTCs]
Expanded study ongoing, 2000 CTCs from 30 patients
CTC enrichment via CellSearch
CTC isolation via DepArray
Caroline Dive and Ged Brady, CRUK Manchester Institute
Modelling challenge: confounding variation
Stegle et al. Nature Reviews Genetics 2014
Single cell data
Last year
Single-cell RNA-Seq
• 10^3 cells per experiment
• 10^7 sequence reads per cell
• 10^4 features extracted per cell
CyTOF protein quantification
• 10^3 cells per second
• 10^6 per experiment
• 30-50 features per cell
This year
Single-cell RNA-Seq
• 10^6 cells per experiment
• 10^8 reads per cell
• >10^5 features per cell
Single cell multi-omics: ?
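To put the jump in single-cell RNA-Seq scale in storage terms, the sketch below computes the size of a cells-by-features matrix for the two sets of figures. It assumes dense storage at 4 bytes per value (32-bit floats), which is an illustrative worst case: single-cell count matrices are usually mostly zeros and are commonly stored in sparse form. The function name is hypothetical.

```python
def dense_matrix_gb(n_cells: int, n_features: int, bytes_per_value: int = 4) -> float:
    """Size in decimal GB of a dense cells x features matrix.

    bytes_per_value=4 assumes 32-bit floats (an assumption, not from the slides).
    """
    return n_cells * n_features * bytes_per_value / 1e9


last_year_gb = dense_matrix_gb(10**3, 10**4)  # 10^3 cells, 10^4 features
this_year_gb = dense_matrix_gb(10**6, 10**5)  # 10^6 cells, 10^5 features
growth_factor = (10**6 * 10**5) // (10**3 * 10**4)

print(f"last year: {last_year_gb:.2f} GB")
print(f"this year: {this_year_gb:.0f} GB")
print(f"growth in matrix entries: {growth_factor:,}x")
```

Under these assumptions a single experiment moves from something that fits comfortably in memory to 400 GB, which is one reason the later slides stress scalable and approximate methods.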
What are the pinch points?
• Data volume: cost and transfer speed
• Data analysis: scalable algorithms
• Data quality: batch effects, missing data, missing metadata, concept drift
• Data integration: multi-modal modelling
• Reproducible and robust research
Data volume
• Move algorithms to the data
  – Put compute close to local data
  – Commercial cloud (e.g. BaseSpace, Cytobank)
  – Bespoke secure cloud (e.g. 100K genomes project)
• Issues to consider
  – Will your algorithms give the same results?
  – Will the analysis be reproducible in the future?
  – How to integrate across resources?
Data analysis
• Scaling up algorithms, e.g. Deep learning libraries integrating CPU/GPU architectures
• Fast approximate methods
• Online/streaming data processing
• Avoid solving compute-intensive intermediate tasks: e.g. avoid genomic alignment prior to counting sub-sequence matches (k-mers)
• Mixed precision numerics
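The k-mer point in the list above can be made concrete: sub-sequences of length k are counted directly from the reads, with no alignment to a reference genome. This is a minimal sketch on toy reads invented for the example; `count_kmers` is an illustrative name, and production tools use far more memory-efficient data structures.

```python
from collections import Counter
from typing import Iterable


def count_kmers(reads: Iterable[str], k: int) -> Counter:
    """Count every length-k substring across all reads, without alignment."""
    counts: Counter = Counter()
    for read in reads:
        # Slide a window of width k along the read.
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


# Toy reads (made up for illustration).
reads = ["ATGATGA", "GATTACA"]
kmer_counts = count_kmers(reads, k=3)

for kmer, n in sorted(kmer_counts.items()):
    print(kmer, n)
```

Each read is processed once and independently, so the same loop also fits the online/streaming item in the list: reads can be consumed as they arrive rather than loaded all at once.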
Data analysis
Methods for Machine Learning are no longer assessed simply on predictive accuracy
Data quality
Big collected data are typically not designed for a single research question (or any research question)
We need methods to deal with:
Confounders, batch effects, missing data, missing metadata, concept drift, outliers…
(while remaining scalable)
Robust and reproducible research
Publish data, code, workflows, version numbers, containers…
Results should not depend strongly on arbitrary modelling choices: “shake the model” (Chris Holmes)
“Hypothesis selection” leads to upward significance bias
• Try to break your models
• Use robust models
• Use bootstrapping
Keep track of all hypotheses you have considered
• Store your working history – notebook science
• Publish negative results
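As one concrete reading of the bootstrapping advice above, the sketch below computes a percentile bootstrap confidence interval for a mean, with a fixed random seed so the result can be reproduced exactly. The measurements are invented, and the function name, number of resamples and interval level are illustrative choices rather than recommendations from the talk.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.fmean, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(data).

    A fixed seed makes the interval reproducible from run to run.
    """
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement and recompute the statistic each time.
    estimates = sorted(stat(rng.choices(data, k=n)) for _ in range(n_resamples))
    lo = estimates[int((alpha / 2) * n_resamples)]
    hi = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Invented measurements, for illustration only.
data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.2, 2.6, 3.0, 1.8, 2.4]
point_estimate = statistics.fmean(data)
ci_low, ci_high = bootstrap_ci(data)

print(f"mean = {point_estimate:.2f}, 95% bootstrap CI = ({ci_low:.2f}, {ci_high:.2f})")
```

Recording the seed alongside the code and library versions is a small instance of the "publish data, code, workflows, version numbers" item on the same slide.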
Robust and reproducible research
• Build reproducibility into your routine – don’t wait until after your paper is accepted
• Don’t feature here:
Conclusion
• Research is increasingly data-driven across all fields – Data Science is now ubiquitous
• New challenges come from the scale, complexity and nature of data:
  Big data – scalable algorithms and architectures
  Complex data – better models: bottom up and top down
  Messy data – statistical thinking is essential
  Human data – ethical dimensions are of key importance
  Accessible data – a valuable common resource