A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an...
Dear Students – welcome to the second lecture of our course!
1WS 2013/14
A. Holzinger LV 444.152
In this second lecture we start with a look at data sources, review some data structures, discuss standardization versus structurization, review the differences between data, information and knowledge, and close with an overview of information entropy.
A recommendable reference is: Scheinerman, E. R. 2011. Mathematical Notation: A Guide for Engineers and Scientists, which also includes the most important LaTeX commands for producing maths symbols.
components through p and w (Golan, Judge, and Miller, 1996);
John von Neumann and his high-speed computer, approx. 1952
Our first question is: Where does the data come from? The second question: What kind of data is this? The third question: How big is this data? So, let us look at some biomedical data sources (see Slide 2-1):
Due to the increasing trend towards personalized and molecular medicine, biomedical data results from various sources in different structural dimensions, ranging from the microscopic world (e.g. genomics, epigenomics, metagenomics, proteomics, metabolomics) to the macroscopic world (e.g. disease spreading data of populations in public health informatics). Just for orientation: the glucose molecule has a size of 900 pm = $900 \times 10^{-12}$ m and the carbon atom approx. 300 pm. A hepatitis virus is relatively large with 45 nm = $45 \times 10^{-9}$ m and the X chromosome much bigger with 7 µm = $7 \times 10^{-6}$ m.
Here a lot of "big data" is produced, e.g. genomics, metabolomics and proteomics data. This is really "big data": the data sets are enormously large. Whereas in each individual we estimate many terabytes (1 TB = $1 \times 10^{12}$ bytes = 1000 GByte) of genomics data, we are confronted with petabytes of proteomics data, and the fusion of those for personalized medicine results in exabytes of data (1 EB = $1 \times 10^{18}$ bytes).
Of course these amounts are for each human individual; however, we have a current world population of 7 billion people (= $7 \times 10^{9}$ people; 1 billion in English usage corresponds to 1 milliard in many other European languages). So you can see that this is really "big data". This "natural" data is then fused with "produced" data, e.g. the unstructured data (text) in the patient records, or data from physiological sensors etc.; these data are also rapidly increasing in size and complexity. You can imagine that without computational intelligence we have no chance to survive in these complex big data sets.
Source: http://learn.genetics.utah.edu/content/begin/cells/scale/
C atom: 340 pm = $340 \times 10^{-12}$ m
Glucose molecule: 900 pm
Hepatitis virus: 45 nm = $45 \times 10^{-9}$ m
Microscope: $200 \times 10^{-9}$ m
Confocal microscopy: $20 \times 10^{-6}$ m
Electron microscopy: $0.1 \times 10^{-9}$ m
X chromosome: $7 \times 10^{-6}$ m
DNA: $2 \times 10^{-9}$ m
Enzyme = Metabolomics
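To make these orders of magnitude tangible, the following Python sketch scales a per-person data volume up to the world population. It is only an illustration: the per-person value of 1 EB is used as an assumed round figure, and the name `total_bytes` is hypothetical.

```python
# Decimal (SI) byte units as used in the text: 1 TB = 10**12 bytes, 1 EB = 10**18 bytes.
TB = 10**12
EB = 10**18

WORLD_POPULATION = 7 * 10**9  # world population figure quoted in the text


def total_bytes(per_person_bytes: int, population: int = WORLD_POPULATION) -> int:
    """Scale a per-person data volume to a whole population."""
    return per_person_bytes * population


if __name__ == "__main__":
    # Assumption for illustration only: 1 EB of fused omics data per person.
    total = total_bytes(1 * EB)
    print(f"world-wide volume: {total:.1e} bytes")
    print(f"that is {total // TB:.1e} TB")
```

Python integers have arbitrary precision, so the multiplication stays exact even at these sizes.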
Most of our computers are von Neumann machines (see chapter 1); consequently, at the lowest physical layer, data is represented as patterns of electrical on/off states (1/0, H/L, high/low). We speak of a bit, which is also known as Bit, the Basic indissoluble information unit (Shannon, 1948). Do not confuse this Bit with the IEC 60027-2 unit symbol bit, written in small letters, which is used together with the prefixes for binary multiples (e.g. 1 Kibit = 1024 bit; 1 byte = 8 bit). Beginning with the physical level of data we can determine various levels of data structures (see Slide 2-2). Refer to: http://physics.nist.gov/cuu/Units/binary.html
1) Physical level: in a von Neumann system: bit; in a quantum system: qubit.
Note: Regardless of its physical realization (e.g. voltage, mechanical state, black/white etc.), a bit is always logically either 0 or 1 (analogous to a light switch). A qubit has similarities to a classical bit, but is overall very different: a classical bit is a scalar variable with the single value of either 0 or 1, so the value is unique, deterministic and unambiguous. A qubit is more general in the sense that it represents a state defined by a pair of complex numbers $(a, b)$, whose squared magnitudes $|a|^2$ and $|b|^2$ give the probabilities that a reading of the value of the qubit will yield 0 or 1. Thus, a qubit can be in the state of 0, 1, or some mixture, referred to as a superposition, of the 0 and 1 states. The weights of 0 and 1 in this superposition are determined by $(a, b)$ in the following way: $\text{qubit} \triangleq (a, b) \triangleq a \cdot |0\rangle + b \cdot |1\rangle$. Please be aware that this model of quantum computation is not the only one (Lanzagorta & Uhlmann, 2008).
2) Logical level: 1) primitive data types, including: a) Boolean data type (true/false); b) numerical data types (e.g. integer, floating-point numbers ("reals"), etc.); 2) composite data types, including: a) array, b) record, c) union, d) set (stores values without any particular order and without repeated values), e) object (contains others); 3) string and text types, including: a) alphanumeric characters, b) alphanumeric strings (= sequence of characters to represent words and text).
3) Abstract level: including abstract data structures, e.g. queue (FIFO), stack (LIFO), set (no order, no repeated values), lists, hash table, arrays, trees, graphs, …
4) Technical level: application data formats, e.g. text, vector graphics, pixel images, audio signals, video sequences, multimedia, …
5) Hospital level: narrative (textual, natural language) patient record data (structured/unstructured and standardized/non-standardized), omics data (genomics, proteomics, metabolomics, microarray data, fluxomics, phenomics), numerical measurements (physiological data, time series, lab results, vital signs, blood pressure, CO2 partial pressure, temperature, …), recorded signals (ECG, EEG, ENG, EMG, EOG, EP, …), graphics (sketches, drawings, handwriting, …), audio signals, images (cams, x-ray, MR, CT, PET, …), etc.
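The difference between a bit and a qubit at the physical level can be sketched in a few lines of Python. This is a toy model under the assumption that a qubit is fully described by the pair (a, b) with |a|² + |b|² = 1; the function names are hypothetical and no quantum library is involved.

```python
import math
import random


def normalize(a: complex, b: complex) -> tuple[complex, complex]:
    """Scale the pair (a, b) so that |a|^2 + |b|^2 = 1."""
    norm = math.sqrt(abs(a) ** 2 + abs(b) ** 2)
    if norm == 0:
        raise ValueError("(0, 0) is not a valid qubit state")
    return a / norm, b / norm


def probabilities(a: complex, b: complex) -> tuple[float, float]:
    """Return the probabilities of reading 0 and of reading 1."""
    a, b = normalize(a, b)
    return abs(a) ** 2, abs(b) ** 2


def measure(a: complex, b: complex, rng=random) -> int:
    """Simulate one reading of the qubit; the result is a classical bit."""
    p0, _ = probabilities(a, b)
    return 0 if rng.random() < p0 else 1


if __name__ == "__main__":
    # Equal superposition: both outcomes are equally likely.
    print(probabilities(1, 1j))
    print([measure(1, 1j) for _ in range(10)])
```

Every reading yields a classical 0 or 1; only the statistics over many readings reveal the weights (a, b).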
In biomedical informatics we have a lot to do with abstract data types (ADT); consequently, we briefly review the most important ones here. For details please refer to a course on Algorithms & Data Structures, or to a classic textbook such as (Aho, Hopcroft & Ullman, 1983), (Cormen et al., 2009), or in German (Ottmann & Widmayer, 2012), (Holzinger, 2003), and please take into consideration that data structures and algorithms go hand in hand (Cormen, 2013).
A list is a sequential collection of items $a_1, a_2, \ldots, a_n$, accessible one after another, beginning at the head and ending at the tail z. In a von Neumann machine it is a widely used data structure for applications which do not need random access. It differs from the stack (last-in-first-out, LIFO) and queue (first-in-first-out, FIFO) data structures in so far as additions and removals can be made at any position in the list. In contrast to a simple set, the order is important. A typical example for the use of a list is a DNA sequence. The combination GGGTTTAAA is such a list; the elements of the list are the nucleotide bases. Nucleotides are the joined molecules which form the structural units of RNA and DNA and play a central role in metabolism.
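The DNA example can be written down directly with Python's built-in list type. The sketch below only illustrates the two properties named above, edits at arbitrary positions and the importance of order; the variable names are chosen for illustration.

```python
# A DNA sequence as a list: order matters and repeated elements are allowed.
dna = list("GGGTTTAAA")

# Additions and removals can be made at any position, unlike stack or queue.
edited = dna.copy()
edited.insert(3, "C")      # insert a base in the middle
removed = edited.pop(0)    # remove the base at the head
sequence = "".join(edited)

# A set keeps neither the order nor the repeats, so the sequence is lost.
bases = set(dna)

if __name__ == "__main__":
    print(sequence)
    print(sorted(bases))
```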
A graph is a pair $G = (V, E)$, where $V$ is a finite, non-empty set of vertices (nodes) and $E$ is a set of edges (lines, arcs), which are 2-element subsets of $V$. If $E$ is a set of ordered pairs of vertices (arcs, directed edges, arrows), then it is a directed graph (digraph). The distances between the vertices can be represented within a distance matrix (two-dimensional array). The vertices in a graph can be multidimensional objects, e.g. vectors containing the results of multiple gene expression measures. For this purpose the distance between two vertices can be measured by various distance metrics. Graphs are ideally suited for representing networks in medicine and biology, e.g. metabolic pathways, etc.
In bioinformatics, distance matrices are used to represent protein structures in a coordinate-independent manner, as well as the pairwise distances between two sequences in sequence space. They are used in structural and sequential alignment, and for the determination of protein structures from NMR or X-ray crystallography.
Evolutionary dynamics act on populations. Neither genes, nor cells, nor individuals evolve; only populations evolve. The so-called Moran process describes the stochastic evolution of a finite population of constant size: in each time step, an individual is chosen for reproduction with a probability proportional to its fitness; a second individual is chosen for death. The offspring of the first individual replaces the second.
In the generalization to graphs, individuals occupy the vertices of a graph. In each time step, an individual is selected with a probability proportional to its fitness; the weights of the outgoing edges determine the probabilities that the corresponding neighbor will be replaced by the offspring. The process is described by a stochastic matrix $W = [w_{ij}]$, where $w_{ij}$ denotes the probability that an offspring of individual $i$ will replace individual $j$. At each time step, an edge $(i, j)$ is selected with a probability proportional to its weight and the fitness of the individual at its tail. The Moran process corresponds to the special case of a complete graph with identical weights (Lieberman, Hauert & Nowak, 2005).
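The update rule on a graph can be simulated with a short Python sketch. It is a minimal illustration under stated assumptions: the population is a list of type labels, `fitness` maps each label to a positive number, and `W` is a row-stochastic weight matrix; the names `moran_step`, `complete_graph` and `run_until_fixation` are hypothetical and not taken from Lieberman et al.

```python
import random


def complete_graph(n):
    """Weights of the classical Moran process: complete graph, identical weights."""
    return [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]


def moran_step(state, fitness, W, rng):
    """Select edge (i, j) with probability proportional to W[i][j] times the
    fitness of the individual at its tail i; the offspring of i replaces j."""
    n = len(state)
    edges = [(i, j) for i in range(n) for j in range(n) if W[i][j] > 0]
    weights = [fitness[state[i]] * W[i][j] for i, j in edges]
    i, j = rng.choices(edges, weights=weights, k=1)[0]
    new_state = list(state)
    new_state[j] = state[i]
    return new_state


def run_until_fixation(state, fitness, W, rng, max_steps=100_000):
    """Iterate until only one type is left (or max_steps is reached)."""
    for _ in range(max_steps):
        if len(set(state)) == 1:
            break
        state = moran_step(state, fitness, W, rng)
    return state


if __name__ == "__main__":
    rng = random.Random(1)
    fitness = {"resident": 1.0, "mutant": 1.5}   # assumed example values
    start = ["mutant"] + ["resident"] * 4
    print(run_until_fixation(start, fitness, complete_graph(5), rng))
```

Replacing `complete_graph` by another weight matrix changes the population structure without touching the update rule.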
A tree is a collection of elements called nodes, one of which is distinguished as a root, along with a relation ("parenthood") that places a hierarchical structure on the nodes. A node, like an element of a list, can be of whatever type we wish. We often depict a node as a letter, a string, or a number with a circle around it. Formally, a tree can be defined recursively in the following manner:
1. A single node by itself is a tree. This node is also the root of the tree.
2. Suppose $n$ is a node and $T_1, T_2, \ldots, T_k$ are trees with roots $n_1, n_2, \ldots, n_k$, respectively. We can construct a new tree by making $n$ the parent of nodes $n_1, n_2, \ldots, n_k$. In this tree $n$ is the root and $T_1, T_2, \ldots, T_k$ are the subtrees of the root. Nodes $n_1, n_2, \ldots, n_k$ are called the children of node $n$.
A dendrogram (from Greek dendron "tree", -gramma "drawing") is a tree diagram frequently used to illustrate the arrangement of the clusters produced by hierarchical clustering. Dendrograms are often used in computational biology to illustrate the clustering of genes or samples. The origin of such dendrograms can be found in (Darwin, 1859).
The example by (Hufford et al., 2012) shows a neighbor-joining tree and the changing morphology of domesticated maize and its wild relatives. Taxa in the neighbor-joining tree are represented by different colors: parviglumis (green), landraces (red), improved lines (blue), mexicana (yellow) and Tripsacum (brown). The morphological changes are shown for female inflorescences and plant architecture during domestication and improvement.
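The recursive definition translates almost literally into code. The following Python sketch is one possible rendering; the class `Node` and the helpers `make_tree`, `size` and `height` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A tree is identified with its root node; children are roots of subtrees."""
    label: str
    children: list["Node"] = field(default_factory=list)


def make_tree(label, subtrees=()):
    """Rule 1: with no subtrees, a single node is a tree.
    Rule 2: otherwise the new node becomes the parent of the subtree roots."""
    return Node(label, list(subtrees))


def size(tree):
    """Number of nodes, computed along the recursive definition."""
    return 1 + sum(size(child) for child in tree.children)


def height(tree):
    """Length of the longest path from the root down to a leaf."""
    if not tree.children:
        return 0
    return 1 + max(height(child) for child in tree.children)


if __name__ == "__main__":
    t = make_tree("n", [make_tree("n1", [make_tree("leaf")]), make_tree("n2")])
    print(size(t), height(t))
```

Functions on trees, such as `size` and `height`, follow the same two cases as the definition itself.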
Key problems in dealing with data include:
1) Heterogeneous data sources (need for data fusion and data integration)
2) Complexity of the data (high dimensionality)
3) Noisy, uncertain data (challenge of pre-processing)
4) The discrepancy between data, information and knowledge (various definitions)
5) Big data sets (manual handling of the data is impossible)
Now that we have seen some examples of data from the biomedical domain, we can look at the "big picture". Manyika et al. (2011) localized four major data pools in US health care and describe that the data are highly fragmented, with little overlap and low integration. Moreover, they report that approx. 30% of clinical text/numerical data in the United States, including medical records, bills, laboratory and surgery reports, is still not generated electronically. Even when clinical data are in digital form, they are usually held by an individual provider and rarely shared (see Slide 2-4).
Biomedical research data, e.g. clinical trials, predictive modeling etc., is produced by academia and pharmaceutical companies and stored in databases and libraries. Clinical data is produced in the hospital and is stored in hospital information systems (HIS), picture archiving and communication systems (PACS) or in laboratory databases, etc. Much data is health business data produced by payors, providers, insurances, etc. Finally, there is an increasing pool of patient behavior and sentiment data, produced by various customers and stakeholders outside the typical clinical context, including the growing data from the wellness and ambient assisted living domain.
A major challenge in our networked world is the increasing amount of data, today called "big data". The trend towards personalized medicine has resulted in a sheer mass of generated (-omics) data (see Slide 2-7). In the life sciences domain, most data models are characterized by complexity, which makes manual analysis very time-consuming and frequently practically impossible (Holzinger, 2013).
More and more omics data are generated, including:
1) Genomics data (e.g. sequence annotation).
2) Transcriptomics data (e.g. microarray data); the transcriptome is the set of all RNA molecules, including mRNA, rRNA, tRNA and non-coding RNA, produced in the cells.
3) Proteomics data: proteomic studies generate large volumes of raw experimental data and inferred biological results stored in data repositories, mostly openly available; an overview can be found in (Riffle & Eng, 2009). The outcome of proteomics experiments is a list of proteins differentially modified or abundant in a certain phenotype. The large size of proteomics data sets requires specialized analytical tools, which deal with large lists of objects (Bessarabova et al., 2012).
4) Metabolomics (e.g. enzyme annotation); the metabolome represents the collection of all metabolites in a cell, tissue, organ or organism.
5) Protein-DNA interactions.
6) Protein-protein interactions; PPI are at the core of the entire interactomics system of any living cell.
7) Fluxomics (isotopic tracing, metabolic pathways).
8) Phenomics (biomarkers).
9) Epigenetics, the study of changes in gene expression caused by mechanisms other than changes in the DNA sequence, hence the prefix "epi-".
10) Microbiomics.
11) Lipidomics.
Omics data integration helps to address interesting biological questions on the biological systems level towards personalized medicine (Joyce & Palsson, 2006).
http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2908408/
A further challenge is to integrate the data and to make it accessible to the clinician. While there is much research on the integration of heterogeneous information systems, a shortcoming is in the integration of available data. Data fusion is the process of merging multiple records representing the same real-world object into a single, consistent, accurate, and useful representation (Bleiholder & Naumann, 2008). An example of the mix of different data for solving a medical problem can be seen in Slide 2-8.
A good example of complex medical data is RCQM, an application that manages the flow of data and information in the rheumatology outpatient clinic (50 patients per day, 5 days per week) of Graz University Hospital, on the basis of a quality management process model. Each examination produces 100+ clinical and functional parameters per patient. This amassed data is morphed into better usable information by applying scoring algorithms (e.g. Disease Activity Score, DAS) and is convoluted over time. Together with previous findings, physiological laboratory data, patient record data and omics data from the Pathology department, these data constitute the information basis for analysis and evaluation of the disease activity. The challenge is in the increasing quantities of such highly complex, multi-dimensional and time series data (Simonic et al., 2011).
Do not confuse structure with standardization (see Slide 2-9). Data can be standardized (e.g. numerical entries in laboratory reports) or non-standardized. A typical example is non-standardized text, imprecisely called "free text" or "unstructured data", in an electronic patient record (Kreuzthaler et al., 2011).
Standardized data is the basis for accurate communication. In the medical domain, many different people work at different times in various locations. Data standards can ensure that information is interpreted by all users with the same understanding. Moreover, standardized data facilitate comparability of data and interoperability of systems. Standardization supports the reusability of the data, improves the efficiency of healthcare services and avoids errors by reducing duplicated efforts in data entry.
Data standardization refers to a) the data content; b) the terminologies that are used to represent the data; c) how data is exchanged; and d) how knowledge, e.g. clinical guidelines, protocols, decision support rules, checklists and standard operating procedures, is represented in the health information system (refer to IOM). Technical elements for data sharing require standardization of identification, record structure, terminology, messaging, privacy etc. The most used standardized data set to date is the International Classification of Diseases (ICD), which was first adopted in 1900 for collecting statistics (Ahmadian et al., 2011), and which we will discuss in → Lecture 3.
Non-standardized data is the majority of data and inhibits data quality, data exchange and interoperability.
Well-structured data is the minority of data and an idealistic case in which each data element has an associated defined structure, e.g. relational tables, the Resource Description Framework (RDF), or the Web Ontology Language (OWL) (see → Lecture 3). Note: Ill-structured is a term often used for the opposite of well-structured, although this term originally was used in the context of problem solving (Simon, 1973).
Semi-structured data is a form of structured data that does not conform to the strict formal structure of tables and data models associated with relational databases but contains tags or markers to separate structure and content, i.e. it is schema-less or self-describing; a typical example is a markup language such as XML (see → Lecture 3 and 4).
Weakly-structured data makes up most of our data in the whole universe, whether it is in macroscopic (astronomy) or microscopic structures (biology); see → Lecture 5.
Non-structured data or unstructured data is an imprecise term used for information expressed in natural language, when no specific structure has been defined. This is an issue for debate: text also has some structure: words, sentences, paragraphs. If we are very precise, unstructured data would mean that the data is completely randomized, which is usually called noise and is defined by (Duda, Hart & Stork, 2000) as any property of data which is not due to the underlying model but instead to randomness (either in the real world, from the sensors or the measurement procedure).
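What "self-describing" means for semi-structured data can be seen in a small Python sketch that parses an XML fragment with the standard library. The record, its tag names and its values are invented for illustration and do not follow any real clinical standard.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record: tags separate structure from content.
RECORD = """<patient id="p001">
  <lab name="CRP" unit="mg/L">12.5</lab>
  <note>Patient reports morning stiffness in both hands.</note>
</patient>"""

root = ET.fromstring(RECORD)

# The tagged, standardized part can be read without any external schema ...
labs = {
    lab.get("name"): (float(lab.text), lab.get("unit"))
    for lab in root.iter("lab")
}

# ... while the narrative part remains free text inside its tag.
note = root.findtext("note")

if __name__ == "__main__":
    print(root.get("id"), labs)
    print(note)
```

The laboratory value is both structured and standardized (name, unit, number), whereas the note is structured only as far as its enclosing tag.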
"Multivariate" and "multidimensional" are modern words and consequently overused in the literature. Each item of data is composed of variables, and if such a data item is defined by more than one variable it is called a multivariable data item. Variables are frequently classified into two categories: dependent or independent.
In Physics, Engineering and Statistics a variable is a physical property of a subject, whose quantity can be measured, e.g. mass, length, time, temperature, etc.
SMILES data (.smi) consists of a string obtained from the symbol nodes encountered in a depth-first tree traversal of a chemical graph. The chemical graph is first trimmed to remove hydrogen atoms, and cycles are broken to turn it into a spanning tree. Where cycles have been broken, numeric suffix labels are included to indicate the connected nodes.
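The depth-first idea behind such a string can be illustrated with a deliberately simplified Python sketch. It assumes a hydrogen-free, acyclic molecule with single bonds only, so it omits ring-closure labels, bond orders, aromaticity and charges; the function `smiles_like` is a hypothetical name and not part of any cheminformatics toolkit.

```python
def smiles_like(symbols, bonds, start=0):
    """Write the atom symbols met in a depth-first traversal of an acyclic graph.

    symbols: list of element symbols, indexed by atom number
    bonds:   list of (u, v) pairs of atom numbers (single bonds, no cycles)
    """
    adjacency = {i: [] for i in range(len(symbols))}
    for u, v in bonds:
        adjacency[u].append(v)
        adjacency[v].append(u)

    def visit(atom, parent):
        children = [n for n in adjacency[atom] if n != parent]
        text = symbols[atom]
        # Every child except the last opens a branch in parentheses.
        for child in children[:-1]:
            text += "(" + visit(child, atom) + ")"
        # The last child continues the main chain.
        if children:
            text += visit(children[-1], atom)
        return text

    return visit(start, None)


if __name__ == "__main__":
    # Ethanol, hydrogens removed: C-C-O
    print(smiles_like(["C", "C", "O"], [(0, 1), (1, 2)]))
    # Isobutane, hydrogens removed: a central carbon with three neighbours
    print(smiles_like(["C", "C", "C", "C"], [(0, 1), (1, 2), (1, 3)]))
```

A graph with cycles would first have to be reduced to a spanning tree, with matching numeric labels added at the broken bonds, as described above.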
Proteomic analysis of mesenchymal stem cells (MSCs). Two-dimensional gel electrophoresis was performed using whole protein cell extracts from P2 MSC cultures of patients with rheumatoid arthritis (RA) (n=10) (A) and healthy controls (n=6) (B). After scanning, spot detection, quantification and normalisation, gels were compared using Hierarchical Clustering Software and Pearson test (C). No cluster could be detected using these proteomic profiles.
Proteomic analysis: Two-dimensional electrophoresis was performed using P2 MSCs in patients with RA (n=10) and healthy controls (n=6) (fig 4A, B). By using the Hierarchical Clustering method, we could not define any cluster that might discriminate patient and control cells (fig 4C). The Pearson correlation coefficient was not significantly different between patient and control cells (r=0.933 (0.022) and r=0.929 (0.020), respectively). These data corroborate the lack of significant changes in cytokine production between patients and controls.
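For readers who want to see how a Pearson correlation coefficient between two profiles is obtained, here is a small Python sketch. The spot intensities are invented numbers and the function name `pearson_r` is hypothetical; it is not the software used in the cited study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Hypothetical normalised spot intensities of two gels.
    gel_a = [0.8, 1.9, 3.1, 4.2, 5.0]
    gel_b = [1.0, 2.1, 2.9, 4.0, 5.2]
    print(round(pearson_r(gel_a, gel_b), 3))
```

Values close to 1 indicate very similar profiles, which is why high coefficients in both groups give no basis for separating them into clusters.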
http://www.rcsb.org/pdb/images/3ond_bio_r_500.jpg
The PDB is a large repository containing 3-D structural information, established in 1971.
Data are stored in 2D but can in fact represent biological entities in three or more dimensions.
Transaxial (left), coronal (middle), and sagittal (right) images of a patient who was scanned for 30 min in list-mode with the BrainPET scanner; the recording was started 20 min after injection of about 300 MBq fluoro-deoxy-glucose.
In Mathematics, hence in Informatics, however, a variable is associated with a space, often an n-dimensional Euclidean space $\mathbb{R}^n$, in which an entity (e.g. a function) or a phenomenon of continuous nature is defined. The data location within this space can be referenced by using a range of coordinate systems (e.g. Cartesian, polar coordinates, etc.): the dependent variables are those used to describe the entity (for example the function value) whilst the independent variables are those that represent the coordinate system used to describe the space in which the entity is defined. If a data set is composed of variables whose interpretation fits this definition, our goal is to understand how the 'entity' is defined within the n-dimensional Euclidean space $\mathbb{R}^n$. Sometimes we may distinguish between variables meaning measurement of a property and variables meaning a coordinate system, by referring to the former as variate and to the latter as dimension (Dos Santos & Brodlie, 2002), (dos Santos & Brodlie, 2004).
A space is a set of points. A metric space has an associated metric, which enables us to measure distances between points in that space and, in turn, implicitly define their neighborhoods. Consequently, a metric provides a space with a topology, and a metric space is a topological one. Topological spaces feel alien to us because we are accustomed to having a metric.
Biomedical example: A protein is a single chain of amino acids, which folds into a globular structure. The Thermodynamics Hypothesis states that a protein always folds into a state of minimum energy. To predict protein structure, we would like to model the folding of a protein computationally. As such, the protein folding problem becomes an optimization problem: we are looking for a path to the global minimum in a very high-dimensional energy landscape (Zomorodian, 2005).
Let us collect $n$-dimensional observations $i$ in the Euclidean vector space $\mathbb{R}^n$ and we get (Eq. 2-1):
$$x_i = [x_{i1}, \ldots, x_{in}]$$
A cloud of points is sampled from any source (e.g. medical data, sensor network data, a solid 3-D object, surface etc.). Those data points can be coordinated as an unordered sequence in an arbitrarily high-dimensional Euclidean space, where methods of algebraic topology can be applied. The main challenge is in mapping the data back into $\mathbb{R}^3$ or, to be more precise, into $\mathbb{R}^2$, because our retina is inherently perceiving data in $\mathbb{R}^2$. The cloud of such data points can be used as a computational representation of the respective data object. A temporal version can be found in motion-capture data, where geometric points are recorded as time series.
Now you will ask an obvious question: "How do we visualize a four-dimensional object?" The obvious answer is: "How do we visualize a three-dimensional object?" Humans do not see in three spatial dimensions directly, but via sequences of planar projections integrated in a manner that is sensed if not comprehended. Little children spend a significant time of their first year of life learning how to infer three-dimensional spatial data from paired planar projections, and many years of practice have tuned a remarkable ability to extract global structure from representations in a strictly lower dimension (Ghrist, 2008). Because we have the same problem here in this book, we must stay in $\mathbb{R}^2$, hence the example in Slide 2-12 (Zomorodian, 2005).
In Einstein's theory of Special Relativity, Euclidean 3-space plus time (the "4th dimension") are unified into the Minkowski space.
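A metric space has an associated metric, which enables us to measure the distances between points in that space and, implicitly, to define their neighborhoods. Consequently, a metric provides a space with a topology, hence a metric space is a topological space. A set $X$ with a metric function $d$ is called a metric space. We give it the metric topology of $d$, where the set of open balls $B(x, r) = \{\, y \mid d(x, y) < r \,\}$, with $x \in X$ and $r > 0$, defines the basis neighborhoods. Most of our “natural” spaces are a particular type of metric space, the Euclidean spaces: the Cartesian product of $n$ copies of $\mathbb{R}$, the set of real numbers, along with the Euclidean metric
Eq. 2-2
$$d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2},$$
is the $n$-dimensional Euclidean space $\mathbb{R}^n$. We may induce a topology on subsets of metric spaces as follows: if $A \subseteq X$ with topology $T$, then we get the relative or induced topology $T_A$ by defining $T_A = \{\, S \cap A \mid S \in T \,\}$. For more information refer to (Zomorodian, 2005) or (Edelsbrunner & Harer, 2010).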
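The Euclidean metric of Eq. 2-2 and the neighborhoods defined by open balls can be tried out on a small point cloud. The following is a minimal sketch; the function names and the sample points are illustrative assumptions, not part of the lecture material.

```python
import math

def euclidean_distance(x, y):
    """Euclidean metric d(x, y) between two points of the same dimension."""
    if len(x) != len(y):
        raise ValueError("points must have the same dimension")
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

def open_ball(points, center, radius):
    """Points of a finite sample that lie strictly inside the open ball B(center, radius)."""
    return [p for p in points if euclidean_distance(p, center) < radius]

# Hypothetical 2-D point cloud.
cloud = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (6.0, 8.0)]

print(euclidean_distance(cloud[0], cloud[1]))
print(open_ball(cloud, (0.0, 0.0), 5.0))
```

Because the ball is open, the point (3, 4), which lies exactly at distance 5 from the origin, is not part of the neighborhood of radius 5.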
![Page 30: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/30.jpg)
Knowledge Discovery from Data: by getting insight into the data, the gained information can be used to build up knowledge. The grand challenge is to map higher-dimensional data into lower dimensions, hence make it interactively accessible to the end-user (Holzinger, 2012), (Holzinger, 2013). This mapping from $\mathbb{R}^N \to \mathbb{R}^2$ is the core task of visualization and a major component of knowledge discovery: enabling effective interactive human control over powerful machine algorithms to support human sensemaking (Holzinger, 2012), (Holzinger, 2013).
Holzinger, A. 2013. Human–Computer Interaction & Knowledge Discovery (HCI-KDD): What is the benefit of bringing those two fields to work together? In: Alfredo Cuzzocrea, C. K., Dimitris E. Simos, Edgar Weippl, Lida Xu (ed.) Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127. Heidelberg, Berlin, New York: Springer, pp. 319-328.
![Page 31: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/31.jpg)
A multivariate data set is a data set that has many dependent variables, which might be correlated with each other to varying degrees. Usually this type of data set is associated with discrete data models.
A multidimensional data set is a data set that has many independent variables clearly identified, and one or more dependent variables associated with them. Usually this type of data set is associated with continuous data models.
In other words, every data item (or object) in a computer is represented (stored) as a set of features. Instead of the term features we may use the term dimensions, because an object with $n$ features can also be represented as a multidimensional point in an $n$-dimensional space. Dimensionality reduction is the process of mapping an $n$-dimensional point into a lower $k$-dimensional space; this is the main challenge in visualization, see → Lecture 9.
The number of dimensions can sometimes be small, ranging from simple 1D data such as temperature measured at different times to 3D applications such as medical imaging, where data is captured within a volume. Standard techniques (contouring in 2D; isosurfacing and volume rendering in 3D) have emerged over the years to handle this sort of data. There is no dimension reduction issue in these applications, since the data and display dimensions essentially match.
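For data with more features than display dimensions, the mapping of a point with many features into a lower-dimensional space can be illustrated with the simplest possible case, a linear projection. This is only a sketch: the records and the two hand-picked basis vectors are hypothetical, and practical methods (for example principal component analysis) derive the basis from the data instead.

```python
def project(point, basis):
    """Map a point to len(basis) dimensions: one dot product per basis vector."""
    return tuple(sum(b_i * x_i for b_i, x_i in zip(b, point)) for b in basis)

# Hypothetical records with four features each.
records = [(1.0, 2.0, 3.0, 4.0), (2.0, 0.0, 1.0, 5.0)]

# Two basis vectors chosen by hand for illustration: keep feature 1,
# average features 2 and 3, and discard feature 4.
basis = [(1.0, 0.0, 0.0, 0.0), (0.0, 0.5, 0.5, 0.0)]

for record in records:
    print(project(record, basis))
```

The fourth feature has no influence on the result, which shows the price of any such mapping: information that does not lie along the chosen directions is lost.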
![Page 32: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/32.jpg)
Data can be categorized into qualitative (nominal and ordinal) and quantitative (interval and ratio): Interval and ratio data are parametric, and are used with parametric tools in which distributions are predictable (and often Normal). Nominal and ordinal data are non-parametric, and do not assume any particular distribution. They are used with non-parametric tools such as the histogram. The classic paper on the theory of scales of measurement is (Stevens, 1946).
![Page 33: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/33.jpg)
We can summarize what we have learned so far about data: Data can be numeric, non-numeric, or both. Non-numeric data can include anything from language data (text) to categorical, image, or video data. Data may range from completely structured, such as categorical data, to semi-structured, such as an XML file containing meta information, to unstructured, such as a narrative “free text”. Note that the term unstructured does not mean that the data are without any pattern, which would mean complete randomness and uncertainty, but rather that “unstructured data” are expressed in such a way that only humans can meaningfully interpret them. Structure provides information that can be interpreted to determine data organization and meaning, hence it provides a context for the information. The inherent structure in the data can form a basis for data representation. An important, yet often neglected, issue is the temporal character of data: data of all types may have a temporal (time) association, and this association may be either discrete or continuous (Thomas & Cook, 2005). In Medical Informatics we have a permanent interaction between data, information and knowledge, with different definitions (Bemmel & Musen, 1997), see Slide 2-16:
![Page 34: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/34.jpg)
Data are the physical entities at the lowest abstraction level, which are generated, e.g., by a patient (patient data) or a biological process (e.g. Omics data). According to (Bemmel & Musen, 1997) data contain no meaning.
Information is derived by interpretation of the data by a clinician (human intelligence).
Knowledge is obtained by inductive reasoning with previously interpreted data, collected from many similar patients or processes, which is added to the so-called body of knowledge in medicine, the explicit knowledge. This knowledge is used for the interpretation of other data and to gain implicit knowledge, which guides the clinician in taking further action.
![Page 35: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/35.jpg)
For hypothesis generation and testing, four types of inference exist (Peirce, 1955): abstraction, abduction, deduction, and induction. The first two drive hypothesis generation while the latter two drive hypothesis testing, see Slide 2-17: Abstraction means that data are filtered according to their relevance for the problem solution and chunked in schemas representing an abstract description of the problem (e.g., abstracting that an adult male with haemoglobin concentration less than 14 g/dL is an anaemic patient). Following this, hypotheses that could account for the current situation are related through a process of abduction, characterized by a "backward flow" of inferences across a chain of directed relations which identify those initial conditions from which the current abstract representation of the problem originates. This provides tentative solutions to the problem at hand by way of hypotheses. For example, knowing that disease A will cause symptom B, abduction will try to identify the explanation for B, while deduction will forecast that a patient affected by disease A will manifest symptom B: both inferences are using the same relation along two different directions (Patel & Ramoni, 1997). Abduction is characterized by a cyclical process of generating possible explanations (i.e., identification of a set of hypotheses that are able to account for the clinical case on the basis of the available data) and testing those explanations (i.e., evaluation of each generated hypothesis on the basis of its expected consequences) for the abnormal state of the patient at hand (Patel, Arocha & Zhang, 2004).
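The remark that deduction and abduction use the same relation in two different directions can be made concrete with a toy example. The sketch below assumes a small, hypothetical disease-to-symptom table invented purely for illustration; it is not a clinical knowledge base.

```python
# Hypothetical toy relation: disease -> symptoms it causes.
CAUSES = {
    "anaemia": {"fatigue", "pallor"},
    "hypothyroidism": {"fatigue", "weight gain"},
}

def deduce(disease):
    """Deduction: follow the relation forward, from a disease to the symptoms it predicts."""
    return CAUSES.get(disease, set())

def abduce(symptom):
    """Abduction: follow the same relation backward, from a symptom to candidate explanations."""
    return {disease for disease, symptoms in CAUSES.items() if symptom in symptoms}

print(sorted(deduce("anaemia")))
print(sorted(abduce("fatigue")))
```

Deduction returns a definite prediction, whereas abduction returns a set of competing hypotheses that still have to be tested, which matches the cyclical generate-and-test process described above.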
The hypothesis testing procedures can be inferred from Slide 2-17: General knowledge is gained from many patients, and this general knowledge is then applied to an individual patient. We have to distinguish between: Reasoning is the process by which a clinician reaches a conclusion after thinking about all the facts; Deduction consists of deriving a particular valid conclusion from a set of general premises; Induction consists of deriving a likely general conclusion from a set of particular statements. Reasoning in the “real world” does not appear to fit neatly into any of these basic types. Therefore, a third form of reasoning has been recognized by Peirce (1955), where deduction and induction are inter-mixed: abduction.
![Page 36: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/36.jpg)
The question “what is information?” is still an open question in basic research, and any definition depends on the view taken. For example, the definition given by Carl-Friedrich von Weizsäcker, “Information is what is understood,” implies that information has both a sender and a receiver who have a common understanding of the representation and the means to convey information using some properties of the physical systems, and his addendum, “Information has no absolute meaning; it exists relatively between two semantic levels,” implies the importance of context (Marinescu, 2011). Without doubt information is a fundamentally important concept within our world, and life is complex information, see Slide 2-14:
Many systems, e.g. in the quantum world, do not obey the classical view of information. In the quantum world and in the life sciences, traditional information theory often fails to accurately describe reality, for example in the complexity of a living cell: all complex life is composed of eukaryotic (nucleated) cells (Lane & Martin, 2010). A good example of such a cell is the protist Euglena gracilis (in German “Augentierchen”) with a length of approx. 30 µm. Life can be seen as a delicate interplay of energy, entropy and information; essential functions of living beings correspond to the generation, consumption, processing, preservation and duplication of information.
P: Complexity <> Information <> Energy <> Entropy
![Page 37: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/37.jpg)
The etymological origin of the word information can be traced back to the Latin “forma”, “informatio” and “informare”, to bring something into a shape (“in-a-form”). Consequently, the naive definition in computer science is “information is data in context”, and it is therefore different from data or knowledge. However, we follow the notion of (Boisot & Canals, 2004) and define that information is an extraction from data that, by modifying the relevant probability distributions, has direct influence on an agent’s knowledge base. For a better understanding of this concept, we first review the model of human information processing by Wickens (1984): The model by Wickens (1984) beautifully emphasizes our view on data, information and knowledge: the physical data from the real world are perceived as information through perceptual filters, controlled by selective attention, and form hypotheses within the working memory. These hypotheses are the expectations depending on our previous knowledge available in our mental model, stored in the long-term memory. The subjectively best alternative hypothesis will be selected and processed further and may be taken as outcome for an action. Because this system is a closed loop, we get feedback through new data perceived as new information, and the process goes on.
![Page 38: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/38.jpg)
The incoming stimuli from the physical world must pass both a perceptual filter and a conceptual filter. The perceptual filter orientates the senses (e.g. visual sense) to certain types of stimuli within a certain physical range (e.g. visual signal range, pre-knowledge, attention etc.). Only the stimuli which pass through this filter get registered as incoming data; everything else is filtered out. At this point it is important to follow our physical principle of data: to differentiate between two notions that are frequently confused, an experiment’s (raw, hard, measured, factual) data and its (meaningful, subjective) interpreted information results. Data are properties concerning only the instrument; they are the expression of a fact. The result concerns a property of the world. The following conceptual filters extract information-bearing data from what has been previously registered. Both types of filters are influenced by the agents’ cognitive and affective expectations, stored in their mental models. The enormous utility of data resides in the fact that it can carry information about the physical world. This information may modify set expectations or the state of knowledge. These principles allow an agent to act in adaptive ways in the physical world (Boisot & Canals, 2004). Compare this process with the human information processing model by (Wickens, 1984), seen in Slide 2-19 and discussed in → Lecture 7.
![Page 39: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/39.jpg)
Entropy has many different definitions and applications, originally in statistical physics, and most often it is used as a measure of disorder. In information theory, entropy can be used as a measure of the uncertainty in a data set.
To see how useful entropy can be, you can have a look at this paper: Holzinger, A., Stocker, C., Peischl, B. & Simonic, K.-M. 2012. On Using Entropy for Enhancing Handwriting Preprocessing. Entropy, 14, (11), 2324-2350. http://www.mdpi.com/1099-4300/14/11/2324
![Page 40: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/40.jpg)
The concept of entropy was first introduced in thermodynamics (Clausius, 1850), where it was used to provide a statement of the second law of thermodynamics. Later, Boltzmann's statistical mechanics provided a connection between the macroscopic property of entropy and the microscopic state of a system. Shannon was the first to define entropy and mutual information.
Shannon (1948) used a Gedankenexperiment (thought experiment) to propose a measure of uncertainty in a discrete distribution based on the Boltzmann entropy of classical statistical mechanics, see Slide 2-22:
![Page 41: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/41.jpg)
An example shall demonstrate the usefulness of this approach:
1) Let $X$ be a discrete data set with associated probabilities $p_i$:
Eq. 2-5
$$X = \{x_1, x_2, \dots, x_n\}, \quad P = \{p_1, p_2, \dots, p_n\}$$
2) Now we apply Shannon’s equation Eq. 2-4:
Eq. 2-6
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i$$
3) We assume that our source has two values (ball = white, ball = black). Let us do the famous simple Gedankenexperiment (thought experiment): Imagine a box which can contain two colored balls, black and white. This is our set of discrete symbols with associated probabilities. If we grab blindly into this box to get a ball, we are dealing with uncertainty, because we do not know which ball we touch. We can ask: Is the ball black? NO. THEN it must be white, so we need one question to surely provide the right answer. Because it is a binary decision (YES/NO), the maximum number of (binary) questions required to reduce the uncertainty is $\log_2 n$, where $n$ is the number of possible outcomes. If there are $n$ events with equal probability $p$, then $p = 1/n$. If you have only 1 black ball, then $\log_2 1 = 0$, which means there is no uncertainty.
Eq. 2-7
$$X = \{b, w\} \quad \text{with} \quad P = \{p, 1 - p\}$$
4) Now we solve Eq. 2-6 numerically:
Eq. 2-8
$$H = p \log_2 \frac{1}{p} + (1 - p) \log_2 \frac{1}{1 - p} = 1 \quad \text{for } p = 0.5$$
Since $p$ ranges from 0 (for impossible events) to 1 (for certain events), the surprise $\log_2 (1/p)$ ranges from infinity (for impossible events) to 0 (for certain events). So we can summarize that the entropy is the weighted average of the surprise over all possible outcomes. For our example with the two balls we can draw the following function: the entropy value is 1 for $p = 0.5$, and it is 0 for both $p = 0$ and $p = 1$. This example might seem trivial, but the entropy principle has been developed a lot since Shannon, and there are many different methods which are very useful for dealing with data.
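The two-ball example can be checked numerically. The following is a minimal sketch with hypothetical function names; it treats the contribution of an impossible outcome as zero, which is the usual convention for the term $0 \cdot \log_2(1/0)$.

```python
import math

def surprise(p):
    """Surprise log2(1/p) in bits; infinite for an impossible event."""
    if p == 0:
        return math.inf
    return math.log2(1.0 / p)

def binary_entropy(p):
    """Entropy in bits of a source with two outcomes of probability p and 1 - p."""
    # Outcomes with probability 0 are skipped, so they contribute nothing.
    return sum(q * surprise(q) for q in (p, 1.0 - p) if q > 0)

for p in (0.0, 0.1, 0.5, 0.9, 1.0):
    print(p, round(binary_entropy(p), 4))
```

The printed values rise from 0 at $p = 0$ to the maximum of 1 bit at $p = 0.5$ and fall back to 0 at $p = 1$, symmetrically around the middle.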
![Page 42: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/42.jpg)
Shannon called it the information entropy (aka Shannon entropy) and defined:
Eq. 2-9
$$\log_2 \frac{1}{p} = -\log_2 p$$
where $p$ is the probability of the event occurring. If $p$ is not identical for all events, then the entropy $H$ is the average of this quantity over all events, weighted by their probabilities, which Shannon defined as:
Eq. 2-10
$$H = -\sum_{i=1}^{n} p_i \log_2 p_i$$
Basically, the entropy $H$ approaches zero if we have a maximum of structure; conversely, the entropy $H$ reaches high values if there is no structure. Hence, ideally, if the entropy is a maximum, we have complete randomness, total uncertainty. Low entropy means differences, structure, individuality; high entropy means no differences, no structure, no individuality. Consequently, life needs low entropy.
![Page 43: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/43.jpg)
The principle of what we can infer from entropy values is:
1) Low entropy values mean high probability, high certainty, hence a high degree of structurization in the data.
2) High entropy values mean low probability, low certainty (i.e. high uncertainty), hence a low degree of structurization in the data. Maximum entropy would mean complete randomness and total uncertainty.
Highly structured data contain low entropy; ideally, if everything is in order and there is no surprise (no uncertainty), the entropy is low:
Eq. 2-11
$$H = H_{\min} = 0$$
On the other hand, if the data are weakly structured, as for example in biological data, and there is no ability to guess (all data are equally likely), the entropy is high:
Eq. 2-12
$$H = H_{\max} = \log_2 n$$
where $n$ is the number of equally likely outcomes. If we follow this approach, “unstructured data” would mean complete randomness. Let us look at the history of entropy to understand what we can do in future, see Slide 2-25.
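The two extremes of Eq. 2-11 and Eq. 2-12 can be reproduced by estimating the entropy of a symbol sequence from its relative frequencies. This is a sketch under the assumption that relative frequencies are an adequate estimate of the probabilities; the function names and the example strings are illustrative only.

```python
import math
from collections import Counter

def shannon_entropy(probabilities):
    """Entropy in bits: sum of p * log2(1/p), skipping zero-probability terms."""
    return sum(p * math.log2(1.0 / p) for p in probabilities if p > 0)

def empirical_entropy(symbols):
    """Entropy of the relative symbol frequencies observed in a sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return shannon_entropy(count / total for count in counts.values())

print(empirical_entropy("AAAAAAAA"))  # a single symbol: no surprise
print(empirical_entropy("ACGTACGT"))  # four equally frequent symbols
print(empirical_entropy("AAAAAACG"))  # something in between
```

The first sequence gives the minimum of zero, the second reaches the maximum for four symbols, and the third lies between the two.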
![Page 44: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/44.jpg)
You might ask what the practical purpose of this approach is: there are manifold applications.
![Page 45: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/45.jpg)
The origin may be found in the work of Jakob Bernoulli, describing the principle of insufficient reason: if we are ignorant of the ways an event can occur, the event will occur equally likely in any way. Thomas Bayes (1763) and Pierre-Simon Laplace (1774) carried on, and Harold Jeffreys and David Cox solidified it in Bayesian statistics, aka statistical inference. The second path leading to the classical Maximum Entropy, en route with the Shannon entropy, can be identified with the work of James Clerk Maxwell and Ludwig Boltzmann, continued by Willard Gibbs and finally Claude Elwood Shannon. This work is geared toward developing the mathematical tools for statistical modeling of problems in information. These two independent lines of research are very similar. The objective of the first line of research is to formulate a theory/methodology that allows understanding of the general characteristics (distribution) of a system from partial and incomplete information. In the second route of research, the same objective is expressed as determining how to assign (initial) numerical values of probabilities when only some (theoretical) limited global quantities of the investigated system are known. Recognizing the common basic objectives of these two lines of research aided Jaynes in the development of his classical work, the Maximum Entropy formalism. This formalism is based on the first line of research and the mathematics of the second line of research. The interrelationship between information theory, statistics and inference, and the Maximum Entropy (MaxEnt) principle became clear in the 1950s, and many different methods arose from these principles (Golan, 2008), see next slide.
![Page 46: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/46.jpg)
Maximum Entropy (MaxEn), described by (Jaynes, 1957), is used to estimate unknown parameters of a multinomial discrete choice problem, whereas the Generalized Maximum Entropy (GME) model includes noise terms in the multinomial information constraints. Each noise term is modeled as the mean of a finite set of a priori known points in the interval $[-1, 1]$ with unknown probabilities, where no parametric assumptions about the error distribution are made. A GME model for the multinomial probabilities and for the distributions associated with the noise terms is derived by maximizing the joint entropy of multinomial and noise distributions, under the assumption of independence (Jaynes, 1957).
Topological Entropy (TopEn) was introduced by (Adler, Konheim & McAndrew, 1965) with the purpose of introducing the notion of entropy as an invariant for continuous mappings: Let $(X, T)$ be a topological dynamical system, i.e., let $X$ be a nonempty compact Hausdorff space and $T: X \to X$ a continuous map; the TopEn is a nonnegative number which measures the complexity of the system (Adler, Downarowicz & Misiurewicz, 2008).
Graph Entropy was described by (Mowshowitz, 1968) to measure the structural information content of graphs, and a different definition, more focused on problems in information and coding theory, was introduced by (Körner, 1973). Graph entropy is often used for the characterization of the structure of graph-based systems, e.g. in mathematical biochemistry. In these applications the entropy of a graph is interpreted as its structural information content and serves as a complexity measure. Such a measure is associated with an equivalence relation defined on a finite graph; by application of Shannon’s Eq. 2-4 with the probability distribution we get a numerical value that serves as an index of the structural feature captured by the equivalence relation (Dehmer & Mowshowitz, 2011).
Minimum Entropy (MinEn), described by (Posner, 1975), provides us with the least random and the least uniform probability distribution of a data set, i.e. the minimum uncertainty, which is the limit of our knowledge and of the structure of the system. Often, classical pattern recognition is described as a quest for minimum entropy. Mathematically, it is more difficult to determine a minimum entropy probability distribution than a maximum entropy probability distribution; while the latter has a global maximum due to the concavity of the entropy, the former has to be obtained by calculating all local minima; consequently, the minimum entropy probability distribution may not exist in many cases (Yuan & Kesavan, 1998).
Cross Entropy (CE), discussed by (Rubinstein, 1997), was motivated by an adaptive algorithm for estimating probabilities of rare events in complex stochastic networks, which involves variance minimization. CE can also be used for combinatorial optimization problems (COP). This is done by translating the “deterministic” optimization problem into a related “stochastic” optimization problem and then using rare event simulation techniques (De Boer et al., 2005).
Rényi entropy is a generalization of the Shannon entropy (information theory), and Tsallis entropy is a generalization of the standard Boltzmann–Gibbs entropy (statistical physics).
More important for us are:
Approximate Entropy (ApEn), described by (Pincus, 1991), can be used to quantify regularity in data without any a priori knowledge about the system, see an example in Slide 2-20.
Sample Entropy (SampEn) was used by (Richman & Moorman, 2000) as a new, related measure of time series regularity. SampEn was designed to reduce the bias of ApEn and is better suited for data sets with known probabilistic content.
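The statement above that Rényi entropy generalizes the Shannon entropy can be checked numerically. The sketch below uses the standard textbook definition $H_\alpha = \frac{1}{1-\alpha} \log_2 \sum_i p_i^\alpha$ for $\alpha \ge 0$, $\alpha \ne 1$, which is not given in the lecture itself; the function names and the example distribution are assumptions made for illustration.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy in bits."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

def renyi_entropy(probs, alpha):
    """Renyi entropy of order alpha (alpha >= 0) in bits."""
    if alpha == 1:
        # The formula is undefined at alpha = 1; its limit is the Shannon entropy.
        return shannon_entropy(probs)
    return math.log2(sum(p ** alpha for p in probs if p > 0)) / (1.0 - alpha)

dist = [0.5, 0.25, 0.125, 0.125]
for alpha in (0, 0.5, 0.999, 1, 2):
    print(alpha, round(renyi_entropy(dist, alpha), 4))
```

As the order approaches 1, the printed value approaches the Shannon entropy of the distribution; order 0 only counts the possible outcomes, and larger orders put more weight on the most probable outcomes.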
![Page 47: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/47.jpg)
Problem: Monitoring body movements along with vital parameters during sleep provides important medical information regarding general health, and can therefore be used to detect trends (large epidemiology studies) and to discover severe illnesses including hypertension (which is enormously increasing in our society). This seemingly simple data, from only one night period, demonstrates the complexity and the boundaries of standard methods (for example the Fast Fourier Transformation) to discover knowledge (for example deviations, similarities etc.). Due to the complexity and uncertainty of such data sets, standard methods (such as FFT) carry the danger of modeling artifacts. Since the knowledge of interest for medical purposes is in anomalies (alterations, differences, atypicalities, irregularities), the application of entropic methods provides benefits. Photograph taken during the EU Project EMERGE and used with permission.
![Page 48: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/48.jpg)
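1) We have a given data set $\langle x_n \rangle$, where capital $N$ is the number of data points:
Eq. 2-13
$$\langle x_n \rangle = \{x_1, x_2, \dots, x_N\}$$
2) Now we form $m$-dimensional vectors:
Eq. 2-14
$$\vec{X}_i = (x_i, x_{i+1}, \dots, x_{i+m-1})$$
3) We measure the distance between every pair of vectors, i.e. the maximum absolute difference between their scalar components:
Eq. 2-15
$$\|\vec{X}_i, \vec{X}_j\| = \max_{k=1,2,\dots,m} |x_{i+k-1} - x_{j+k-1}|$$
4) We look, so to say, in which dimension the biggest difference is; as a result we get the Approximate Entropy (if there is no difference we have zero relative entropy):
Eq. 2-16
$$\mathrm{ApEn}(m, r) = \lim_{N \to \infty} \left[ \varphi^{m}(r) - \varphi^{m+1}(r) \right]$$
where $m$ is the run length and $r$ is the tolerance window (let us assume that … is equal to …); ApEn(m, r) could also be written as $H(m, r)$.
5) $\varphi^{m}(r)$ is computed by
Eq. 2-17
$$\varphi^{m}(r) = \frac{1}{N - m + 1} \sum_{i=1}^{N-m+1} \ln C_i^{m}(r)$$
with
Eq. 2-18
$$C_i^{m}(r) = \frac{N^{m}(i)}{N - m + 1}$$
where $N^{m}(i)$ is the number of vectors $\vec{X}_j$ whose distance to $\vec{X}_i$ is at most $r$.
6) $C_i^{m}(r)$ measures, within the tolerance $r$, the regularity of patterns similar to a given one of window length $m$.
7) Finally we increase the dimension to $m + 1$, repeat the steps before, and get as a result the approximate entropy ApEn(m, r).
ApEn(m, r, N) is approximately the negative natural logarithm of the conditional probability (CP) that a data set of length $N$, having repeated itself within a tolerance $r$ for $m$ points, will also repeat itself for $m + 1$ points. An important point to keep in mind about the parameter $r$ is that it is commonly expressed as a fraction of the standard deviation (SD) of the data, which makes ApEn a scale-invariant measure. A low value arises from a high probability of repeated template sequences in the data (Hornero et al., 2006).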
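Steps 1 to 7 can be sketched for a finite series as follows. This is a plain, quadratic-time illustration rather than a reference implementation. The function name, the default run length of 2, and the default tolerance of 0.2 times the standard deviation are assumptions. Following the usual formulation of ApEn, each vector is counted as a match of itself and a distance of exactly the tolerance counts as a match.

```python
import math
import random

def approximate_entropy(series, m=2, r=None):
    """ApEn(m, r) of a finite series; self-matches are included."""
    n = len(series)
    if n < m + 2:
        raise ValueError("series too short for the chosen run length m")
    if r is None:
        # Assumption: tolerance of 0.2 times the population standard deviation.
        mean = sum(series) / n
        r = 0.2 * math.sqrt(sum((x - mean) ** 2 for x in series) / n)

    def phi(dim):
        # Step 2: all overlapping vectors of length dim.
        vectors = [series[i:i + dim] for i in range(n - dim + 1)]
        total = 0.0
        for v in vectors:
            # Step 3: maximum absolute component difference; count vectors within r.
            matches = sum(
                1 for w in vectors
                if max(abs(a - b) for a, b in zip(v, w)) <= r
            )
            # Eq. 2-18 gives the match fraction, Eq. 2-17 averages its logarithm.
            total += math.log(matches / len(vectors))
        return total / len(vectors)

    # Step 7: compare run length m with run length m + 1.
    return phi(m) - phi(m + 1)

random.seed(0)
regular = [1.0, 2.0] * 50                      # strictly alternating pattern
noisy = [random.random() for _ in range(100)]  # no repeating pattern

print(approximate_entropy(regular))
print(approximate_entropy(noisy))
```

The alternating series yields a value close to zero because every pattern of length 2 is always continued in the same way, whereas the random series yields a clearly larger value.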
![Page 49: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/49.jpg)
In this slide we can see the plot of the normalized approximate entropy for each of the episodes and the median across all the episodes. From this figure we can see that the entropy is a minimum where we have no alterations, and that the entropy increases when there are irregularities. If we have no differences we get zero entropy.
![Page 50: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/50.jpg)
A final example should make the advantage of such an entropy method totally clear: in the right diagram it is hard for a medical professional to discover irregularities, especially over a longer period, but an anomaly can easily be detected by displaying the measured relative ApEn.
![Page 51: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/51.jpg)
What can we learn from this experiment? Approximate entropy is relatively unaffected by noise; it can be applied to complex time series with good reproducibility; it is finite for stochastic, noisy, composite processes; the values correspond directly to irregularities; and it is applicable to many other areas, for example the classification of large sets of texts: the ability to guess algorithmically the subject of a text collection without having to read it would permit automated classification.
![Page 52: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/52.jpg)
My DEDICATION is to make data valuable … Thank you!
![Page 53: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/53.jpg)
![Page 54: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/54.jpg)
![Page 55: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/55.jpg)
MFC = Minimum Foot Clearance; stride = step. You can see clearly what you can measure with entropy: you can determine anomalies, i.e. the balance problems of elderly gait.
MFC Poincaré plots. Top panels show the MFC time series from a healthy elderly subject (A) and its corresponding Poincaré plot (B). Bottom panels show the MFC time series from an elderly subject with a balance problem (C) and its corresponding Poincaré plot (D).
Significant relationships of mean MFC with the Poincaré plot indexes (SD1, SD2) and ApEn (r = 0.70, p < 0.05; r = 0.86, p < 0.01; r = 0.74, p < 0.05) were found in the falls-risk elderly group. On the other hand, such relationships were absent in the healthy elderly group. In contrast, the ApEn values of the MFC data series were significantly (p < 0.05) correlated with the Poincaré plot indexes of MFC in the healthy elderly group, whereas correlations were absent in the falls-risk group. The ApEn values in the falls-risk group (mean ApEn = 0.18 ± 0.03) were significantly (p < 0.05) higher than those in the healthy group (mean ApEn = 0.13 ± 0.13). The higher ApEn values in the falls-risk group might indicate increased irregularities and randomness in their gait patterns and a loss of gait control mechanism. ApEn values of randomly shuffled MFC data of falls-risk subjects did not show any significant relationship with mean MFC.
![Page 56: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/56.jpg)
![Page 57: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/57.jpg)
![Page 58: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/58.jpg)
Surrogate data records. A and B show the major components. A: the mean process, which has set point and spike modes. B: the baseline process, here meaning the heart rate variability, modeled as Gaussian random numbers. C: their sum, a surrogate data record. D–F: a more realistic surrogate with the same frequency content as the observed data. D: a clinically observed data record of 4,096 R-R intervals. The left-hand ordinate is labeled in ms and the right-hand ordinate in SD. E: a 4,096-point isospectral surrogate data set formed using the inverse Fourier transform of the periodogram of the data in D. F: the surrogate data after addition of a clinically observed deceleration lasting 50 points and scaled so that the variance of the record is increased from 1 to 2.
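One common way to build an isospectral surrogate like the one in panel E is phase randomization: keep the amplitude spectrum, draw new random phases, and transform back. The sketch below assumes this technique; the slide does not state the exact procedure used by the authors, and the input record here is synthetic Gaussian data standing in for the 4,096 R-R intervals.

```python
import numpy as np

def isospectral_surrogate(x, rng=None):
    """Return a series with the same amplitude spectrum as x but random phases."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x)
    # keep the amplitudes, replace the phases by uniform random angles
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(spectrum))
    surrogate_spectrum = np.abs(spectrum) * np.exp(1j * phases)
    # the mean (DC) term must stay unchanged so the surrogate has the same mean
    surrogate_spectrum[0] = spectrum[0]
    if n % 2 == 0:
        # for even length the Nyquist term must stay real
        surrogate_spectrum[-1] = spectrum[-1]
    return np.fft.irfft(surrogate_spectrum, n=n)

# Hypothetical stand-in for an observed record: 4,096 values around 800 ms.
rng = np.random.default_rng(1)
observed = rng.normal(800.0, 50.0, size=4096)
surrogate = isospectral_surrogate(observed, rng)
```

The surrogate has the same frequency content as the input but a different time course, which is what makes it usable as a realistic background for an added deceleration as in panel F.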
![Page 59: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/59.jpg)
![Page 60: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/60.jpg)
![Page 61: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/61.jpg)
![Page 62: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/62.jpg)
http://support.sas.com/documentation/cdl/en/etsug/60372/HTML/default/viewer.htm#etsug_entropy_sect018.htm
Where many other languages refer to tables, rows, and columns/fields, SAS uses the terms data sets, observations, and variables. There are only two kinds of variables in SAS: numeric and character (string). By default all numeric variables are stored as (8 byte) real. It is possible to reduce precision in external storage only. Date and datetime variables are numeric variables that inherit the C tradition and are stored as either the number of days (for date variables) or seconds (for datetime variables).
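To make the storage convention concrete, the following Python sketch converts such numeric values into calendar dates. It assumes the SAS reference point of 1 January 1960, which the note above does not state; the function names are hypothetical helpers, not part of any SAS interface.

```python
from datetime import date, datetime, timedelta

SAS_EPOCH_DATE = date(1960, 1, 1)            # assumed reference day for SAS date values
SAS_EPOCH_DATETIME = datetime(1960, 1, 1)    # assumed reference instant for SAS datetime values

def from_sas_date(days):
    """Interpret a numeric SAS date value (days since the epoch) as a calendar date."""
    return SAS_EPOCH_DATE + timedelta(days=days)

def from_sas_datetime(seconds):
    """Interpret a numeric SAS datetime value (seconds since the epoch) as a timestamp."""
    return SAS_EPOCH_DATETIME + timedelta(seconds=seconds)

def to_sas_date(d):
    """Convert a calendar date back into the numeric day count."""
    return (d - SAS_EPOCH_DATE).days

example_date = from_sas_date(0)              # the epoch itself
example_datetime = from_sas_datetime(86400)  # exactly one day after the epoch
```

Because both kinds of value are plain numbers, date arithmetic in a data set reduces to ordinary addition and subtraction.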
http://www.sas.com/technologies/analytics/statistics/stat/index.html
![Page 63: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/63.jpg)
Hadoop and the MapReduce programming paradigm already have a substantial base in the bioinformatics community, in particular in the field of high-throughput next-generation sequencing analysis. This is due to the cost-effectiveness of Hadoop-based analysis on commodity Linux clusters, and in the cloud via data upload to cloud vendors who have implemented Hadoop/HBase; and due to the effectiveness and ease-of-use of the MapReduce method in parallelization of many data analysis algorithms.
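The MapReduce idea can be illustrated with a toy k-mer count, a typical first step in sequencing analysis. The sketch below runs in a single Python process and does not use the Hadoop API; the function names and the two short reads are hypothetical and only show how the map, shuffle and reduce phases divide the work.

```python
from collections import defaultdict

def map_kmers(read, k):
    """Map step: emit a (k-mer, 1) pair for every k-mer in one read."""
    return [(read[i:i + k], 1) for i in range(len(read) - k + 1)]

def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_counts(key, values):
    """Reduce step: sum the values collected for one key."""
    return key, sum(values)

def count_kmers(reads, k):
    """Run map, shuffle and reduce in sequence over a list of reads."""
    mapped = [pair for read in reads for pair in map_kmers(read, k)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())

toy_reads = ["ACGT", "CGTA"]                 # hypothetical reads
toy_counts = count_kmers(toy_reads, 2)
```

Each read can be mapped independently and each key can be reduced independently, which is why a framework can spread both phases across the nodes of a cluster.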
![Page 64: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/64.jpg)
Nanomedicine opens new avenues for much research in biomedical informatics methods and tools. The future challenges will certainly be new topics such as “big data” in bioinformatics, novel methods for the use of “omics” data, etc. Future research is needed on algorithmic and methodological issues. This requires the willingness to cooperate with different disciplines.
Two areas offer ideal conditions towards solving these challenges: Human-Computer Interaction (HCI) and Knowledge Discovery and Data Mining (KDD), with the goal of supporting human intelligence with machine intelligence, to discover new, previously unknown insights into the data.
Holzinger, A. 2013. Human–Computer Interaction & Knowledge Discovery (HCI-KDD): What is the benefit of bringing those two fields to work together? In: Alfredo Cuzzocrea, C. K., Dimitris E. Simos, Edgar Weippl, Lida Xu (ed.) Multidisciplinary Research and Practice for Information Systems, Springer Lecture Notes in Computer Science LNCS 8127. Heidelberg, Berlin, New York: Springer, pp. 319-328.
![Page 65: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/65.jpg)
The challenge we face is that, on an estimated average, only 5% of data are structured; the rest is semi-structured or weakly structured, and most of our data is unstructured.
Maybe the most important field for the future is data mining, especially novel techniques of data mining that include both time and space (e.g. graph-based, entropy-based, topology-based data mining approaches).
![Page 66: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/66.jpg)
http://minnesotafuturist.pbworks.com/w/page/21441129/DIKW
A funny description of data, information, and knowledge.
![Page 67: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/67.jpg)
![Page 68: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/68.jpg)
A very striking image. Nice to look at, but its usefulness is questionable.
![Page 69: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/69.jpg)
All these models are very questionable. Please remember that in our lecture we follow the notion of Boisot & Canals.
![Page 70: A. Holzinger LV 444 - Genomegenome.tugraz.at/MedicalInformatics/WinterSemester... · time step, an edge E Fis selected with a probability proportional to its weight and the fitness](https://reader033.fdocuments.us/reader033/viewer/2022060321/5f0d1f387e708231d438c97d/html5/thumbnails/70.jpg)
The interesting aspect of this graphic is that it includes a time axis, which is important for decision making and predictive analytics.