lecture-25 · 2013-08-23 · 2012 - BMMB 597D: Analyzing Next Generation Sequencing Data ...
2012 - BMMB 597D: Analyzing Next Generation Sequencing Data
Week 13, Lecture 25
István Albert
Biochemistry and Molecular Biology and Bioinformatics Consulting Center
Penn State
Final Project
• Will account for 50% of your grade.
• It will consist of one or more datasets and a number of hypotheses that need to be tested
• Project is given out Thursday, Dec 6th and is due Tuesday, Dec 11th
Data and scripts for this lecture
• Data, code and scripts are packaged in lec25.tar.gz
• You have to master too many tools and it is easy to get stuck
• Use it as a guide - don't just copy/paste
• Develop your own mini libraries of tools, shell scripts and methods
ChIP-Seq peak-caller Recap
1. A peak caller transforms aligned read coverages to intervals of enrichment
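To make the recap concrete, here is a minimal sketch of that transformation. It assumes the coverage is already available as a plain list of per-base depths and that "enrichment" simply means depth at or above a cutoff. The function name `call_peaks` and the `threshold` parameter are made up for illustration; real peak callers do considerably more.

```python
def call_peaks(coverage, threshold=1):
    """Turn per-base coverage into half-open (start, end) intervals.

    Assumption: a position is 'enriched' when its depth >= threshold.
    """
    peaks = []
    start = None  # start of the interval currently being extended
    for pos, depth in enumerate(coverage):
        if depth >= threshold and start is None:
            start = pos                    # an interval opens here
        elif depth < threshold and start is not None:
            peaks.append((start, pos))     # the interval closes before pos
            start = None
    if start is not None:                  # interval runs to the end
        peaks.append((start, len(coverage)))
    return peaks


coverage = [0, 0, 3, 5, 4, 0, 0, 1, 0, 6, 7]
print(call_peaks(coverage, threshold=3))   # [(2, 5), (9, 11)]
```

Note that nothing in this step decides whether an interval is statistically meaningful; it only reports where the data piles up.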
The most common misconception
Caution: this is a personal opinion that others disagree with
Most confusion in peak calling arises from combining the peak calling with statistical testing
• A peak means a region that appears to have an enrichment.
• The opposite of a peak is a region with no data
• Some peaks may be caused by random agglomeration of data, but those are still peaks; they are just peaks occurring by random chance
• Most published peak callers will often remove peaks, often for non-obvious reasons
Crazy peaks by MACS
Understanding peak calling
• We don't need to use the entire read! Only the 5' end matters.
• Transform your data into reads that are 1 bp long around the start sites!
• See the README.txt for step by step instructions
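A small sketch of the 1 bp transformation, assuming the alignments have been converted to BED-style intervals (0-based, half-open) with a strand column. The helper name `five_prime_site` is hypothetical; the README.txt in the lecture data may do this step with different tools.

```python
def five_prime_site(chrom, start, end, strand):
    """Shrink an aligned read to the 1 bp interval at its 5' end.

    Assumes BED-style coordinates: 0-based start, exclusive end.
    """
    if strand == "+":
        # Forward reads: the 5' end is the leftmost base.
        return (chrom, start, start + 1, strand)
    if strand == "-":
        # Reverse reads: the 5' end is the rightmost base.
        return (chrom, end - 1, end, strand)
    raise ValueError("strand must be '+' or '-'")


print(five_prime_site("chr1", 100, 136, "+"))  # ('chr1', 100, 101, '+')
print(five_prime_site("chr1", 100, 136, "-"))  # ('chr1', 135, 136, '-')
```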
README.txt in the data
Effects of increasing the smoothing
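The effect can be reproduced on toy data. This sketch assumes the smoothing is a centered moving average over 1 bp start-site counts; the lecture's pipeline may use a different kernel, and the names `smooth` and `count_regions` are invented for this illustration.

```python
def smooth(signal, width):
    """Centered moving average; positions past the edges count as zero."""
    half = width // 2
    n = len(signal)
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        out.append(sum(signal[lo:hi]) / width)
    return out


def count_regions(values):
    """Count contiguous runs of positions with a value above zero."""
    regions = 0
    inside = False
    for v in values:
        if v > 0 and not inside:
            regions += 1
        inside = v > 0
    return regions


# Two clusters of read starts, four bases apart.
signal = [0] * 15
signal[5] = 4
signal[9] = 4

print(count_regions(smooth(signal, 3)))  # 2: the clusters stay separate
print(count_regions(smooth(signal, 5)))  # 1: wider smoothing merges them
```

A wider window makes the signal less noisy but also blurs nearby features into one, so the smoothing width directly changes how many peaks are reported.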
Isolate peak calling by strands
• Tools that merge strands always need to make assumptions about the data, and sometimes it is not obvious what these are
• The best approach is to operate on each strand individually, then combine the results
• Caveat: low coverage, error prone data will be even more difficult to analyze once split
Investigating two error models
Fitting and predictions, smoothing=5
Fitting and predictions, smoothing=20
DNA fragment length = factor footprint
Peaks with no exclusion zone around them
How to get the average fragment size?
1. Need to find peak pairs
2. Compute distance between their limits
Step by step:
• Expand each fragment to the left by a limit
• Intersect fragment on the opposite strand
• Compute distances
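The steps above can be sketched in a few lines. This is one possible reading, not the README.txt pipeline: it assumes peaks are reduced to single coordinates per strand, that each reverse-strand peak is expanded to the left by `limit` bases, and that it is paired with the nearest forward-strand peak inside that window. The function names and the example coordinates are invented for illustration.

```python
def pair_distances(forward, reverse, limit):
    """Distances between paired forward and reverse strand peaks."""
    distances = []
    for r in sorted(reverse):
        # Expand the reverse-strand peak to the left by `limit` bases.
        window_start = r - limit
        # "Intersect": forward-strand peaks that fall inside the window.
        hits = [f for f in forward if window_start <= f <= r]
        if hits:
            # Pair with the closest forward peak and record the distance.
            distances.append(r - max(hits))
    return distances


def average_fragment_size(forward, reverse, limit=300):
    """Mean paired distance, or None when no pairs are found."""
    distances = pair_distances(forward, reverse, limit)
    if not distances:
        return None
    return sum(distances) / len(distances)


forward = [100, 1000, 5000]   # made-up forward-strand peak positions
reverse = [250, 1180, 9000]   # made-up reverse-strand peak positions
print(pair_distances(forward, reverse, limit=300))         # [150, 180]
print(average_fragment_size(forward, reverse, limit=300))  # 165.0
```

Peaks without a partner inside the window (5000 and 9000 here) are simply left out of the average, so the choice of `limit` affects the result.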
Homework 25
• Run the pipeline described in README.txt
• Report the average fragment size that it generates.