CSE 111 Bio: Program Design I Lecture 2: Python … 111 Bio: Program Design I Lecture 2: Python...
CSE 111 Bio: Program Design I
Lecture 2: Python Basics & Intro to Bio
Robert Sloan (CS) & Rachel Poretsky (Bio)
University of Illinois, Chicago
August 31, 2017
ACTIVE LEARNING & CLICKERS
Did you register your clicker?
Lecture: Partially Peer Instruction
• Pose carefully designed question
  – Solo vote: Think for yourself and select answer
  – Discuss: Analyze problem in small teams
    • Practice analyzing, talking about challenging concepts
    • Reach consensus
    • If you have questions, raise your hand and we will come over
  – Class-wide discussion:
    • Led by YOU (students): tell us what you talked about in your discussions that you think everyone should know!
Why Peer Instruction?
• You get to make sure you are following the lecture.
• I get feedback as to what you understand.
• It’s less boring!
• Research shows it promotes more learning than standard lecture.
Discussion Groups
• Three people each
• Ad hoc for first few classes until enrollment settles
• May assign groups for 5th or 6th class onward
Discussion Etiquette
• It’s okay to not know things
  – If you already know everything, you’re wasting your time
• It’s not a competition
Example: The best dinosaur is:
A. T-rex
B. Raptor
C. Triceratops
D. Stegosaurus
E. Some other dinosaur / No clue
Doing The Reading Means
• You do the “easy” part before class.
  – Read it and analyze for YOURSELF!
  – If I rephrase it for you, what purpose does that serve?
• Traditional class structures often look like:
  Lecture: First Exposure
  Textbook: Read Hard Stuff
  Assignment: See if You Know Hard Stuff
  Exam: Show Knowledge Mastery
• You get very little opportunity for “expert” feedback
Active Learning
• Greater opportunity for expert feedback!
• Research on how people learn:
  – Everyone constructs their own understanding
    • I can’t dump understanding into your brain
  – To learn, YOU must actively work with a problem and construct your own understanding of it
  Textbook: First Exposure, with resources and feedback
  Lecture: Learn Hard Stuff, with teacher and discussion
  Assignment: Practice Knowledge Mastery
  Exam: Show Knowledge Mastery
Does this mean I have to show up to class?
• Yes
• Will not penalize anybody missing during first few classes and will drop your lowest 3 class participation scores
Register your clicker online so we can give you points
• In Blackboard, on the side of the course page
• Your votes will be saved before you register, just not associated with you
A BIT MORE RE SYLLABUS ETC.
Grading
Subject to change at any time for any reason
  Lab programming assignments (lowest 3 dropped)      20%
  Lab quizzes (lowest 2 dropped)                       5%
  Programming projects                                25%
  Two midterms, 10% each                              20%
  Final Exam                                          20%
  Lecture participation (clicker)                     10%
  Zybook reading/completion before start of class      5%
In addition, to pass CS 111, one must pass both the programming part of the course (labs plus programming projects) and the exam part of the course (midterms plus final).
Weekly Lab Quizzes
• Quick (2–5 question) review of what we’ve been covering in class
• You must be in lab to do the lab quiz
• On Blackboard, solutions will be available on the day following lab
Programming Assignments
• Longer assignments that give you the chance to put together what we have learned in a creative way
• Give you practice creating a longer program
Programming Project Late Policy
• You have two late days that will be automatically applied if your lab is late. No work will be accepted more than 2 days late.
• To use a late day, you must fill out the late days form on Blackboard before the assignment is due
• The sooner you turn in the work, the sooner we can grade it and get it back to you
Collaboration Policy
• Please do
  – Post on the forum
  – Talk with classmates about assignments
• Please don’t
  – Copy someone’s code
  – Show someone else your code
• Good rule of thumb: Wait half an hour after discussing, write code/do problem
Questions?
• All of this info and grading policy, etc., etc., on course website
How to Ace This Class
• Do the reading
• Attend class and participate in discussion
• Start programming projects early
• Go to lab section and ask questions
• If you get stuck, post on Piazza or come to office hours
• Have fun!
How do I become a great computer scientist?
• Write a lot of code!
SOME BIOLOGY
Figure 1.7, 1.8
Figure 1.10, 1.11
Relevant Theories: Genes
• In 1866 Gregor Mendel published his experiments on pea plants, demonstrating that discrete traits are transmitted over generations in a predictable manner. (Section 8.1, CoB)
• T. H. Morgan in 1915 established chromosomes as carriers of heredity. (Section 8.3, CoB)
• Theory of heredity: traits are affected by genes, transmitted between generations in a predictable manner.
Relevant Theories: Genes
What is a chromosome made of?
Figure 9.7
Genetic Elements
• Gene: Functional unit of genetic information; genes are in cells and are composed of DNA
• Genome: entire complement of genes in cell or virus
• Chromosome: main genetic element. Presence of essential genes is necessary for a genetic element to be called a chromosome
Figure 9.6
• A eukaryote contains a well-defined nucleus, whereas in prokaryotes, the chromosome lies in the cytoplasm in an area called the nucleoid.
http://blogs.nature.com/freeassociation/tag/watson-and-crick
Copyright (c) Henry Grant Archive / Museum of London
https://profiles.nlm.nih.gov/KR/
Figure 9.3
• (a) Each DNA nucleotide is made up of a sugar, a phosphate group, and a base.
• (b) Cytosine and thymine are pyrimidines. Guanine and adenine are purines.
DNA Pairing Rules
• Purines
  – adenine (A)
  – guanine (G)
• Pyrimidines
  – thymine (T)
  – cytosine (C)
• Pairing
  – A : T
    • 2 bonds
  – C : G
    • 3 bonds
Figure 9.4
• DNA (a) forms a double stranded helix, and (b) adenine pairs with thymine and cytosine pairs with guanine. (credit a: modification of work by Jerome Walker, Dennis Myts)
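The pairing rules above can be written down as a small lookup table. This is only a sketch: it assumes a strand is written as a Python string of the letters A, C, G, T, and the names `PAIRS`, `complement`, and `hydrogen_bonds` are made up for illustration.

```python
# Pairing rules from the slide: A pairs with T, C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}

def complement(strand):
    """Return the partner base for each base of `strand`, in the same order."""
    return "".join(PAIRS[base] for base in strand)

def hydrogen_bonds(strand):
    """Count bonds to the partner strand: 2 per A:T pair, 3 per C:G pair."""
    return sum(2 if base in "AT" else 3 for base in strand)

print(complement("ATGC"))      # TACG
print(hydrogen_bonds("ATGC"))  # 2 + 2 + 3 + 3 = 10
```

A dictionary keeps the rules in one place, so the function body never has to mention individual bases.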
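What is this 3′ or 5′ stuff?
• the leading carbon attaches to a phosphate –P (5′)
• the last trailing carbon is attached to hydroxyl –OH (3′)
• base = A, C, G, T
DNA Structure
[Figure: two nucleotides joined through a phosphate group, with the sugar carbons numbered 1′ to 5′, the phosphate (PO4) on the 5′ carbon, the hydroxyl (OH) on the 3′ carbon, and a base on each sugar.]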
Figure 9.5
• The difference between the ribose found in RNA and the deoxyribose found in DNA is that ribose has a hydroxyl group at the 2′ carbon.
DNA and RNA Structure: Polymers of sugar, phosphate, and base
DNA is double stranded
http://upload.wikimedia.org/wikipedia/commons/thumb/e/e4/DNA_chemical_structure.svg/450px-DNA_chemical_structure.svg.png
Figure 9.8
• The two strands of DNA are complementary, meaning the sequence of bases in one strand can be used to create the correct sequence of bases in the other strand.
Antiparallel Strands
• Nucleotides in DNA backbone are bonded from phosphate to sugar between 3′ & 5′ carbons
  – DNA molecule has “direction”
  – complementary strand runs in opposite direction
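Because the complementary strand runs in the opposite direction, reading it in its own 5′ to 3′ direction means complementing each base and reversing the order. A minimal sketch, assuming the input string is written 5′ to 3′; the name `reverse_complement` is chosen here for illustration.

```python
def reverse_complement(strand):
    """Return the partner strand, read in its own 5' to 3' direction."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    # Walk the input backwards, swapping each base for its partner.
    return "".join(pairs[base] for base in reversed(strand))

print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice gives back the original strand, which is a handy sanity check.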
The Double Helix
[Figure: double helix, labeled with the sugar–phosphate backbone, the major groove, the minor groove, and one helical turn (10 base pairs, 3.4 nm).]
The Double Helix
• Size of DNA molecule is expressed in base pairs
• 1,000 base pairs = 1 kilobase pair = 1 kbp
• 1 million base pairs = 1 megabase pair = 1 Mbp
• E. coli genome = 4.64 Mbp
• Each base pair takes up 0.34 nm of length along the helix
• 10 base pairs make up 1 turn of the helix
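These numbers combine into a quick back-of-the-envelope calculation. The sketch below uses only the figures on the slide (0.34 nm per base pair, 10 base pairs per turn, 4.64 Mbp for E. coli); the function and constant names are made up for illustration.

```python
NM_PER_BASE_PAIR = 0.34    # length along the helix, from the slide
BASE_PAIRS_PER_TURN = 10   # from the slide

def helix_length_nm(base_pairs):
    """Length of a stretched-out double helix, in nanometers."""
    return base_pairs * NM_PER_BASE_PAIR

def helical_turns(base_pairs):
    """Number of full turns of the helix."""
    return base_pairs / BASE_PAIRS_PER_TURN

ecoli_bp = 4.64e6  # 4.64 Mbp
print(helix_length_nm(ecoli_bp) / 1e6, "mm")  # nm converted to mm, about 1.58
print(helical_turns(ecoli_bp), "turns")
```

Under these figures the E. coli genome, stretched out, would be roughly 1.6 mm long.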
Genome Sizes
• Carsonella ruddii, 160,000 base pairs (bp)
• porcine (pig) circovirus, 1.7 kbp, ssDNA
• cowpea mosaic virus, 9.4 kbp, ssRNA
• phi29 (Bacillus phage), 19 kbp, dsDNA
• humans?
• Largest genome: (A) Human, (B) Fish, (C) Amoeba, (D) Flowering plant, (E) Don’t know
Genome Sizes
• Carsonella ruddii, 160,000 base pairs (bp)
• porcine (pig) circovirus, 1.7 kbp, ssDNA
• cowpea mosaic virus, 9.4 kbp, ssRNA
• phi29 (Bacillus phage), 19 kbp, dsDNA
• Protopterus aethiopicus, 130 Gbp
• Paris japonica, 150 Gbp
• Amoeba dubia, 670 Gbp
So how do we represent DNA on a computer?
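One common answer, and the one sketched here, is to store one strand as a string of the letters A, C, G, T. The sequence below is made up for illustration, not real data.

```python
# A made-up 10-base sequence, stored as an ordinary Python string.
dna = "ATGCGTTAGC"

print(len(dna))        # number of bases: 10
print(dna[0])          # first base: A
print(dna[0:3])        # first three bases: ATG
print(dna.count("G"))  # how many G's: 3
```

Since the other strand follows from the pairing rules, storing one strand is usually enough.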
PROGRAMS, PROCESSES, ALGORITHMS
Which is correct?
A. A computer program is typically a more detailed fleshing out of the general specification of an algorithm
B. An algorithm is typically a more detailed fleshing out of the general specification of a computer program
C. Computer programs & algorithms are typically stated with the same level of detail
Algorithm
• An algorithm is a methodical step-by-step procedure to perform a task.
  – Typically stated in natural language (e.g., English), though often some math or programming language-ish is mixed in
• Design of good programs of moderate complexity (not Week 1 of CS 111) starts with description of algorithm; moves to (somewhat more detailed & precise) program
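To illustrate the step from algorithm to program, here is a sketch for a task chosen only as an example (it is not from the slides): finding what fraction of a DNA string is G or C. The algorithm is stated in English in the comments, and the Python below it spells out the same steps more precisely. The name `gc_fraction` is hypothetical, and the input is assumed to be a non-empty string.

```python
# Algorithm (English):
#   1. Start a counter at zero.
#   2. Look at each base in the sequence, one at a time.
#   3. If the base is G or C, add one to the counter.
#   4. Divide the counter by the total number of bases.

def gc_fraction(dna):
    """Return the fraction of bases in `dna` that are G or C (assumes dna is not empty)."""
    gc_count = 0                          # step 1
    for base in dna:                      # step 2
        if base == "G" or base == "C":    # step 3
            gc_count += 1
    return gc_count / len(dna)            # step 4

print(gc_fraction("ATGC"))  # 0.5
```

The program has to settle details the English version leaves open, such as what the counter is called and exactly how the comparison is written.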
Programming is a communications and analysis skill
• If you want to understand what your tools (e.g., Excel) can or cannot do, you need to understand what the programs are doing
• If you want to say something that your tools don’t allow, program it yourself
• If you care about analyzing data… then it’s worth your while to understand how to do it at scale!
• Knowledge is Power; knowing how programs work is powerful and freeing
Aside to non-CS-Majors: Process
• Alan Perlis
  – One of the founders of computer science
  – Argued in 1961 that Computer Science should be part of a liberal education: Everyone should learn to program.
    • Perhaps computing is more critical to a liberal education than Calculus
    • Calculus is about rates, and that’s important to many.
    • Computer science is about process, and that’s important to everyone.
Thus: Programming is about Communicating Process
• A program is the most concise statement possible to communicate a process
  – That’s why it’s important to those who want to specify how to do something understandably in as few words as possible
  – And one reason why we should strive to make our programs easy for humans to read
COMPUTERS, DATA, SIZES
What computers understand
• 0’s and 1’s.
  – Everything is 0’s and 1’s
• Computers are exceedingly stupid
  – The only data they understand is 0’s and 1’s
  – They can only do the most simple things with those 0’s and 1’s
    • Move this value here
    • Add, multiply, subtract, divide these values
    • Compare these values, and if one is less than the other, go follow this step rather than that one.
Key Concept: Encodings
• But we can interpret these numbers any way we want.
  – We can encode information in those numbers
• Even the notion that the computer understands numbers is an interpretation
  – We encode the voltages on wires as 0’s and 1’s
  – Which we can, in turn, interpret as a decimal number
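A short sketch of the same idea in Python: one pattern of eight 0's and 1's, interpreted first as a decimal number and then as a letter. The particular bit pattern is chosen only for illustration.

```python
bits = "01000001"            # eight 0's and 1's

as_number = int(bits, 2)     # interpret the bits as a base-2 number
as_letter = chr(as_number)   # interpret that number as a character code

print(as_number)             # 65
print(as_letter)             # A
print(format(65, "08b"))     # and back again: 01000001
```

The bits never change; only the interpretation does.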
Useful terminology
• 8 bit byte is fundamental unit of memory
• 1 byte is really tiny unit. More often see:
  1 kilobyte   1 kB   10^3 bytes    1,000 bytes
  1 megabyte   1 MB   10^6 bytes    1,000,000 bytes
  1 gigabyte   1 GB   10^9 bytes    1 billion bytes
  1 terabyte   1 TB   10^12 bytes   1 trillion bytes
  – N.B. 2^10, 2^20, 2^30, 2^40, instead of powers of 10 (e.g., 1024 byte kB) also used! See, e.g., https://support.apple.com/en-us/HT201402
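To see how far apart the two conventions are, the sketch below compares the power-of-10 units in the table with the power-of-2 units mentioned in the note. The names `KIB`, `MIB`, `GIB`, `TIB` follow the IEC naming (kibibyte and so on), which the slide itself does not use; `percent_larger` is a made-up helper.

```python
# Decimal units, as in the table above.
KB, MB, GB, TB = 10**3, 10**6, 10**9, 10**12
# Binary units, as in the N.B. (IEC names: kibibyte, mebibyte, ...).
KIB, MIB, GIB, TIB = 2**10, 2**20, 2**30, 2**40

def percent_larger(binary_unit, decimal_unit):
    """How much bigger the binary unit is, as a percentage of the decimal unit."""
    return 100 * (binary_unit - decimal_unit) / decimal_unit

print(percent_larger(KIB, KB))  # 2.4
print(GIB - GB)                 # 73741824 extra bytes
print(percent_larger(TIB, TB))  # roughly 10
```

The gap grows with each step up, which is why a drive's advertised size and the size a computer reports can differ noticeably.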
Units to data amounts
• 1 GB can hold about
  – 250 photos (based on 12 Megapixel camera, e.g., recent iPhone, and JPEG 100% quality)
  – 7–30 minutes of video (depending on quality, frames per second, and 720–1080p)
  – 40,000 pages of simple Word docs
  – 1/3 of the complete genome of one human
• So 1 TB hard drive can hold 200 hours of video and 10,000 photos with room left over (assuming video is high-quality 720p @ 30 fps) or 300 people’s complete DNA
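The genome figures can be checked with a little arithmetic. This sketch assumes one byte per base (one letter per base, as in a plain text file) and the 2.9 billion base pairs quoted elsewhere in this lecture; both are labeled as assumptions in the code.

```python
GENOME_BASE_PAIRS = 2_900_000_000  # human genome size used in this lecture
BYTES_PER_BASE = 1                 # assumption: one text character per base

GB = 10**9
TB = 10**12

genome_bytes = GENOME_BASE_PAIRS * BYTES_PER_BASE

genomes_per_gb = GB / genome_bytes   # fraction of one genome that fits in 1 GB
genomes_per_tb = TB // genome_bytes  # whole genomes that fit in 1 TB

print(round(genomes_per_gb, 2))  # 0.34, about 1/3
print(genomes_per_tb)            # 344, in line with the slide's "300 people"
```

A more compact encoding would fit more genomes in the same space, so these numbers depend on the one-byte-per-base assumption.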
Units to Fall 2017 $ @ amazon
• Seagate Expansion 1TB Portable External Hard Drive USB 3.0 (STEA1000400)
• by Seagate
• $54.99 Prime | FREE One-Day
• (Prices now stable; cost about the same a year ago; bet you could find it for $50 if you shopped around)
What is a computer?
• A device that executes a stored program (sequence of instructions).
• A program is a particular writing of a recipe in some particular language. (Recipe is likely to be in English or French or Arabic or Hindi; program in a programming language such as C, Java, Visual Basic, or Python)
All computers consist of 3 components
• Memory: stores program and data (information)
  – Primary: RAM (Random Access Memory) “memory”
  – Secondary: hard drive “storage”
• Central Processing Unit (CPU)
  – Control: fetch next instruction, decode it, execute it
  – Arithmetic Logic Unit: perform simple operations on data (add, compare two for equality, etc.)
• Input/Output
Detour: Specs for a computer
• Ads for computers typically give:
  – Speed of the CPU (in GHz, say 1.0-3.25)
  – Amount of RAM in GB
  – Size of hard drive in GB or TB
  – Which “nice” I/O devices (e.g., retina display)
• Interestingly, perceived speed today often depends heavily on amount of RAM
Moore’s Law
• Gordon Moore, one of the founders of Intel, made the claim that (essentially) computer power doubles for the same dollar every 18 months.
• This has held true for over 30 years.
  – (Note: some think the end is finally near.)
• Go ahead! Make your computer do the same thing to all 2.9 billion base pairs of your DNA! It doesn’t care! And it won’t take much time either!
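As a rough check on what "doubles every 18 months for 30 years" adds up to, here is a small sketch. It takes the slide's figures at face value; the function names are made up for illustration.

```python
def doublings(years, months_per_doubling=18):
    """Number of doublings that fit into the given number of years."""
    return years * 12 / months_per_doubling

def growth_factor(years, months_per_doubling=18):
    """Overall growth in computer power per dollar under the slide's claim."""
    return 2 ** doublings(years, months_per_doubling)

print(doublings(30))      # 20.0
print(growth_factor(30))  # 1048576.0, roughly a million-fold
```

Twenty doublings in a row is about a factor of a million, which is why repeating one step billions of times is now routine.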
So let’s start!
PYTHON BASICS
Remember Python from last time
• We’ll use it for our first ~1.5 month mission:
• Learn to write functions used to analyze DNA, the stuff upon which all life on earth is built, and in particular to search through a huge sequence of letters to find a biologically meaningful gene
A few words about why Python
• Popular, widely used (there are jobs!)
  – Top 5 for overall general use
  – #1 or #2 for data science/data analytics (with R)
• Easy (for humans!) to read; easy to write; easy to learn
• Can do realistic examples very early on
• Outstanding for exploratory, experimental, “get an answer” programming
• Twice the fun for half the annoying syntax!
• It’s widely used in biology and scientific computing
Why? (cont.)
After all, there are thousands of languages to choose from!
• Relatively “nice” syntax
• Emerging as language of choice in many fields
• Packages for graphics, audio, scientific computing, …
Python:
    print("Hello World!")
Java:
    class HelloWorld {
        static public void main(String args[]) {
            System.out.println("Hello World!");
        }
    }
Befunge: [code shown as an image on the slide]
Image: University of Florida
Hello World…
C++:
    #include <iostream>
    int main() {
        std::cout << "Hello World!" << std::endl;
        return 0;
    }
Ook: [code shown as an image on the slide]
Python: Richer syntax allows greater expressiveness!
    import random

    # createInitialPop and getTopOrgs are helpers that are not shown on the slide.
    def alifeSim(numGens, popSize, numToSelect, Network, inhibitorL):
        """Do an artificial life simulation for numGens generations
        with popSize organisms."""
        fitD = {}
        # get initial pop and create popL of (fitness, org) tuples
        popL = []
        for org in createInitialPop(popSize, Network, inhibitorL):
            fitness = org.getFitness()
            popL.append((fitness, org))
            fitD[hash(org)] = fitness
        topL = getTopOrgs(popL, numToSelect)  # get top orgs
        for i in range(numGens):
            popL = []
            for j in range(popSize):
                toReplicate = random.choice(topL)
                neworg = toReplicate[1].replicate()
                # get fitness
                if hash(neworg) in fitD:
                    fitness = fitD[hash(neworg)]
                else:
                    fitness = neworg.getFitness()
                    fitD[hash(neworg)] = fitness
                popL.append((fitness, neworg))
            topL = getTopOrgs(popL, numToSelect)
            print("gen:", i, ":", topL[0])
            if i % 50 == 0:
                fitD.clear()
        return topL[0]
Learning to program is a bit like learning a foreign language!
Spyder for Python
Speaking of Python
• Lab Assignment 1 due tonight, Thursday by midnight