Advanced NCBI

Advanced NCBI. The Entrez API https://github.com/lindenb/courses Pierre Lindenbaum @yokofakun [email protected] http://plindenbaum.blogspot.com Institut du Thorax. Nantes. France September 27, 2016

Transcript of Advanced NCBI

Page 1: Advanced NCBI

Advanced NCBI. The Entrez API

https://github.com/lindenb/courses

Pierre Lindenbaum @yokofakun

[email protected] http://plindenbaum.blogspot.com

Institut du Thorax. Nantes. France

September 27, 2016


Page 2: Advanced NCBI

NCBI? What about EBI, Ensembl, ...


Page 3: Advanced NCBI


Page 4: Advanced NCBI

What will be covered today?

File formats...

EInfo, GQuery, ESearch, ESummary, EFetch...

Processing XML answers with XSLT: HTML, SVG, R...

Generating a Java parser for dbSNP.

NCBI EBot

Using standalone BLAST


Page 5: Advanced NCBI

CURL

curl "http://en.wikipedia.org/wiki/Main_page"
wget -O - "http://en.wikipedia.org/wiki/Main_page"
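For readers who would rather script the download than call curl or wget, the sketch below does roughly the same thing in Python. The helper name `fetch` is an assumption made for illustration and is not part of the course material; it relies only on the standard library.

```python
from urllib.request import urlopen


def fetch(url, encoding="utf-8"):
    """Download a URL and return the response body as text (like `curl URL`)."""
    with urlopen(url) as response:
        return response.read().decode(encoding)


# Example (needs network access):
# page = fetch("http://en.wikipedia.org/wiki/Main_page")
# print(page[:200])
```

The call is left commented out so that the snippet can be imported without touching the network.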


Page 6: Advanced NCBI

XML


Page 7: Advanced NCBI

XSLT


Page 8: Advanced NCBI

XSLT


Page 9: Advanced NCBI

XSLTPROC

xsltproc stylesheet.xsl file.xml > result.xml


Page 10: Advanced NCBI

JSON


Page 11: Advanced NCBI

Formats


Page 12: Advanced NCBI

Formats: GenBank

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&rettype=gb

LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992
DEFINITION  Blue Whale heavy satellite DNA.
ACCESSION   X53813 X17460
VERSION     X53813.1  GI:25
KEYWORDS    satellite DNA.
SOURCE      Balaenoptera musculus (Blue whale)
  ORGANISM  Balaenoptera musculus
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
            Mysticeti; Balaenopteridae; Balaenoptera.
REFERENCE   1  (bases 1 to 422)
  AUTHORS   Arnason,U. and Widegren,B.
  TITLE     Composition and chromosomal localization of cetacean highly
            repetitive DNA with special reference to the blue whale,
            Balaenoptera musculus
  JOURNAL   Chromosoma 98 (5), 323-329 (1989)
   PUBMED   2612291
COMMENT     See also <X52700-2> for 1,760 bp common cetacean component clones
            and <X52703-6>,<X53811-4> for the 422 bp heavy satellite clones.
FEATURES             Location/Qualifiers
     source          1..422
                     /organism="Balaenoptera musculus"
                     /mol_type="genomic DNA"
                     /db_xref="taxon:9771"
                     /clone="7"
     misc_feature    1..422
                     /note="heavy satellite DNA"
ORIGIN
        1 tagttattca acctatccca ctctctagat accccttagc acgtaaagga atattatttg
       61 ggggtccagc catggagaat agtttagaca ctaggatgag ataaggaaca cacccattct
      121 aaagaaatca cattaggatt ctctttttaa gctgttcctt aaaacactag agtcttagaa
      181 atctattgga ggcagaagca gtcaagggta gcctagggtt agggttaggc ttagggttag
      241 ggttagggta cggcttaggg tactgtttcg gggaggggtt caggtacggc gtagggtatg
      301 ggttagggtt agggttaggg ttagtgttag ggttagggct cggtttaggg tacgggttag
      361 gattagggta cgtgttaggg ttagggtagg gcttagggtt agggtacgtg ttagggttag
      421 gg
//
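To show what the first line of such a flat file carries, here is a small Python sketch that splits the LOCUS line of the record above into named fields. It assumes the eight whitespace-separated tokens seen in this particular record; real LOCUS lines are column-based and may omit or add fields, so `parse_locus` is a hypothetical helper for illustration, not a general GenBank parser.

```python
def parse_locus(line):
    """Split a GenBank LOCUS line into named fields (whitespace-separated)."""
    fields = line.split()
    # Only the layout of the record shown above is handled here.
    if len(fields) != 8 or fields[0] != "LOCUS" or fields[3] != "bp":
        raise ValueError("unexpected LOCUS line: %r" % line)
    return {
        "name": fields[1],
        "length": int(fields[2]),
        "moltype": fields[4],
        "topology": fields[5],
        "division": fields[6],
        "date": fields[7],
    }


LOCUS_LINE = "LOCUS       X53813                   422 bp    DNA     linear   MAM 22-JUN-1992"
record = parse_locus(LOCUS_LINE)
print(record["name"], record["length"], record["division"])
```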


Page 13: Advanced NCBI

Formats: ASN.1

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25

Seq-entry ::= seq {
  id {
    embl {
      accession "X53813",
      version 1 },
    gi 25 },
  descr {
    title "Blue Whale heavy satellite DNA",
    source {
      org {
        taxname "Balaenoptera musculus",
        common "Blue whale",
        db {
          {
            db "taxon",
            tag
              id 9771 } },
        orgname {
          name
            binomial {
              genus "Balaenoptera",
              species "musculus" },
          lineage "Eukaryota; Metazoa; Chordata; Craniata; Vertebrata;
 Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea;
 Mysticeti; Balaenopteridae; Balaenoptera",
          gcode 1,
          mgcode 2,
          div "MAM" } },
      subtype {
        {


Page 14: Advanced NCBI

Formats: ASN.1 (schema)

http://www.ncbi.nlm.nih.gov/data_specs/asn/insdseq.asn

INSDSeq ::= SEQUENCE {
  locus VisibleString ,
  length INTEGER ,
  strandedness VisibleString OPTIONAL ,
  moltype VisibleString ,
  topology VisibleString OPTIONAL ,
  division VisibleString ,
  update-date VisibleString ,
  create-date VisibleString OPTIONAL ,
  update-release VisibleString OPTIONAL ,
  create-release VisibleString OPTIONAL ,
  definition VisibleString ,
  primary-accession VisibleString OPTIONAL ,
  entry-version VisibleString OPTIONAL ,
  accession-version VisibleString OPTIONAL ,
  other-seqids SEQUENCE OF INSDSeqid OPTIONAL ,
  secondary-accessions SEQUENCE OF INSDSecondary-accn OPTIONAL ,
  project VisibleString OPTIONAL ,
  keywords SEQUENCE OF INSDKeyword OPTIONAL ,
  segment VisibleString OPTIONAL ,
  source VisibleString OPTIONAL ,
  organism VisibleString OPTIONAL ,
  taxonomy VisibleString OPTIONAL ,
  references SEQUENCE OF INSDReference OPTIONAL ,
  comment VisibleString OPTIONAL ,
  comment-set SEQUENCE OF INSDComment OPTIONAL ,
  struc-comments SEQUENCE OF INSDStrucComment OPTIONAL ,
  primary VisibleString OPTIONAL ,
  source-db VisibleString OPTIONAL ,
  database-reference VisibleString OPTIONAL ,
  feature-table SEQUENCE OF INSDFeature OPTIONAL ,
  feature-set SEQUENCE OF INSDFeatureSet OPTIONAL ,
  sequence VisibleString OPTIONAL ,  -- Optional for contig, wgs, etc.
  contig VisibleString OPTIONAL ,
  alt-seq SEQUENCE OF INSDAltSeqData OPTIONAL
}


Page 15: Advanced NCBI

Formats: ASN.1 (tools)

DATATOOL

Generate C++ data storage classes based on ASN.1 serialization streams.

Convert data between ASN.1, XML and JSON formats.


Page 16: Advanced NCBI

Formats: XML

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=25&retmode=xml

<?xml version="1.0"?>
<!DOCTYPE GBSet PUBLIC "-//NCBI//NCBI GBSeq/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.dtd">
<GBSet>
  <GBSeq>
    <GBSeq_locus>X53813</GBSeq_locus>
    <GBSeq_length>422</GBSeq_length>
    <GBSeq_strandedness>double</GBSeq_strandedness>
    <GBSeq_moltype>DNA</GBSeq_moltype>
    <GBSeq_topology>linear</GBSeq_topology>
    <GBSeq_division>MAM</GBSeq_division>
    <GBSeq_update-date>22-JUN-1992</GBSeq_update-date>
    <GBSeq_create-date>13-JUL-1990</GBSeq_create-date>
    <GBSeq_definition>Blue Whale heavy satellite DNA</GBSeq_definition>
    <GBSeq_primary-accession>X53813</GBSeq_primary-accession>
    <GBSeq_accession-version>X53813.1</GBSeq_accession-version>
    <GBSeq_other-seqids>
      <GBSeqid>emb|X53813.1|</GBSeqid>
      <GBSeqid>gi|25</GBSeqid>
    </GBSeq_other-seqids>
    <GBSeq_secondary-accessions>
      <GBSecondary-accn>X17460</GBSecondary-accn>
    </GBSeq_secondary-accessions>
    <GBSeq_keywords>
      <GBKeyword>satellite DNA</GBKeyword>
    </GBSeq_keywords>
    <GBSeq_source>Balaenoptera musculus (Blue whale)</GBSeq_source>
    <GBSeq_organism>Balaenoptera musculus</GBSeq_organism>
    <GBSeq_taxonomy>Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; Mammalia; Eutheria; Laurasiatheria; Cetartiodactyla; Cetacea; Mysticeti; Balaenopteridae; Balaenoptera</GBSeq_taxonomy>
    <GBSeq_references>


Page 17: Advanced NCBI

Formats: XML (DTD)

http://www.ncbi.nlm.nih.gov/dtd/NCBI_GBSeq.mod.dtd

<!ELEMENT GBSeq (
    GBSeq_locus,
    GBSeq_length,
    GBSeq_strandedness?,
    GBSeq_moltype,
    GBSeq_topology?,
    GBSeq_division,
    GBSeq_update-date,
    GBSeq_create-date?,
    GBSeq_update-release?,
    GBSeq_create-release?,
    GBSeq_definition,
    GBSeq_primary-accession?,
    GBSeq_entry-version?,
    GBSeq_accession-version?,
    GBSeq_other-seqids?,
    GBSeq_secondary-accessions?,
    GBSeq_project?,
    GBSeq_keywords?,
    GBSeq_segment?,
    GBSeq_source?,
    GBSeq_organism?,
    GBSeq_taxonomy?,
    GBSeq_references?,
    GBSeq_comment?,
    GBSeq_comment-set?,
    GBSeq_struc-comments?,
    (...)


Page 18: Advanced NCBI

E-Utilities


Page 19: Advanced NCBI

GI


Page 20: Advanced NCBI

GI

http://www.ncbi.nlm.nih.gov/news/03-02-2016-phase-out-of-GI-numbers/ : "NCBI is phasing out sequence GIs - use Accession.Version instead!"
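An Accession.Version identifier such as X53813.1 is simply the accession followed by a dot and an integer version. As a minimal sketch, the hypothetical helper below (the name `split_accession_version` is an assumption, not an NCBI API) separates the two parts in Python.

```python
def split_accession_version(identifier):
    """Split 'X53813.1' into ('X53813', 1); raise ValueError if no version."""
    accession, dot, version = identifier.rpartition(".")
    if not dot or not accession or not version.isdigit():
        raise ValueError("not an Accession.Version identifier: %r" % identifier)
    return accession, int(version)


print(split_accession_version("X53813.1"))
```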


Page 21: Advanced NCBI

E-Utilities

Set of seven server-side programs that provide a stable interface to the search, retrieval, and linking functions of the Entrez system, using a fixed URL syntax.

The output provided by the E-Utilities is in XML format, sometimes JSON, (...)
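Because the URL syntax is fixed, every request can be assembled from a program name and a set of query parameters. The Python sketch below illustrates this; `eutils_url` and `EUTILS_BASE` are names chosen for this example, and only the base URL and the parameters (`db`, `id`, `rettype`, `term`) come from the slides.

```python
from urllib.parse import urlencode

# Common prefix of the E-utilities URLs used throughout these slides.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(program, **params):
    """Build an E-utilities URL, e.g. .../efetch.fcgi?db=nucleotide&id=25."""
    url = EUTILS_BASE + program + ".fcgi"
    query = urlencode(params)  # percent-encodes values; spaces become '+'
    return url + "?" + query if query else url


print(eutils_url("einfo"))
print(eutils_url("efetch", db="nucleotide", id="25", rettype="gb"))
print(eutils_url("esearch", db="nucleotide", term='"Mammuthus primigenius"[ORGN]'))
```

Note that `urlencode` writes a space as `+` where the slides use `%20`; both are standard encodings of a space in a query string.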


Page 22: Advanced NCBI

Entrez Direct

http://www.ncbi.nlm.nih.gov/books/NBK179288/ "Entrez Direct (EDirect) is an advanced method for accessing the NCBI's set of interconnected databases (publication, sequence, structure, gene, variation, expression, etc.) from a UNIX terminal window. Functions take search terms from command-line arguments. Individual operations are combined to build multi-step queries. Record retrieval and formatting normally complete the process."


Page 23: Advanced NCBI

EInfo


Page 24: Advanced NCBI

EInfo

Provides a list of the names of all valid Entrez databases.

Provides statistics for a single database, including lists of indexing fields and available link names.


Page 25: Advanced NCBI

EInfo

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi


Page 26: Advanced NCBI

EInfo: XML Output

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi

<eInfoResult>
  <DbList>
    <DbName>pubmed</DbName>
    <DbName>protein</DbName>
    <DbName>nuccore</DbName>
    <DbName>nucleotide</DbName>
    <DbName>nucgss</DbName>
    <DbName>nucest</DbName>
    <DbName>structure</DbName>
    <DbName>genome</DbName>
    <DbName>assembly</DbName>
    <DbName>gcassembly</DbName>
    <DbName>genomeprj</DbName>
    <DbName>bioproject</DbName>
    <DbName>biosample</DbName>
    <DbName>biosystems</DbName>
    <DbName>blastdbinfo</DbName>
    <DbName>books</DbName>
    <DbName>cdd</DbName>
    <DbName>clinvar</DbName>
    (...)
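A short Python sketch of how the database names could be pulled out of such an answer with the standard library. The `SAMPLE` string is a shortened, hand-written copy of the listing above, and `database_names` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the EInfo answer shown above (three databases only).
SAMPLE = """<eInfoResult><DbList>
<DbName>pubmed</DbName><DbName>protein</DbName><DbName>nuccore</DbName>
</DbList></eInfoResult>"""


def database_names(einfo_xml):
    """Return the text of every eInfoResult/DbList/DbName element."""
    root = ET.fromstring(einfo_xml)
    return [node.text for node in root.findall("./DbList/DbName")]


print(database_names(SAMPLE))
```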


Page 27: Advanced NCBI

EInfo: JSON Output

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?retmode=json

{
  "header": {
    "type": "einfo",
    "version": "0.3"
  },
  "einforesult": {
    "dblist": [
      "pubmed",
      "protein",
      "nuccore",
      (...)
      "unigene",
      "gencoll",
      "gtr"
    ]
  }
}

Page 28: Advanced NCBI

EInfo

Return statistics for a given Entrez database: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=DbName


Page 29: Advanced NCBI

EInfo: Statistics for PubMed

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed

<?xml version="1.0"?>
<eInfoResult>
  <DbInfo>
    <DbName>pubmed</DbName>
    <MenuName>PubMed</MenuName>
    <Description>PubMed bibliographic record</Description>
    <DbBuild>Build130805-2117m.4</DbBuild>
    <Count>22974581</Count>
    <LastUpdate>2013/08/06 08:33</LastUpdate>
    <FieldList>
      (...)
      <Field>
        <Name>UID</Name>
        <FullName>UID</FullName>
        <Description>Unique number assigned to publication</Description>
        <TermCount>0</TermCount>
        <IsDate>N</IsDate>
        <IsNumerical>Y</IsNumerical>
        <SingleToken>Y</SingleToken>
        <Hierarchy>N</Hierarchy>
        <IsHidden>Y</IsHidden>
      </Field>
      <Field>
      (...)


Page 30: Advanced NCBI

EInfo: Statistics for PubMed (JSON)

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/einfo.fcgi?db=pubmed&retmode=json

{
  "header": {
    "type": "einfo",
    "version": "0.3"
  },
  "einforesult": {
    "dbinfo": {
      "dbname": "pubmed",
      "menuname": "PubMed",
      "description": "PubMed bibliographic record",
      "dbbuild": "Build160921-2207m.6",
      "count": "26470199",
      "lastupdate": "2016/09/22 16:32",
      "fieldlist": [
        {
          "name": "ALL",
          "fullname": "All Fields",
          "description": "All terms from all searchable fields",
          "termcount": "179424126",
          "isdate": "N",
          "isnumerical": "N",
          "singletoken": "N",
          "hierarchy": "N",
          "ishidden": "N"
        },
        {
          "name": "UID",
          "fullname": "UID",
          "description": "Unique number assigned to publication",
          (...)

Page 31: Advanced NCBI

EInfo: With entrez-direct

$ einfo -dbs
$ einfo -db pubmed


Page 32: Advanced NCBI

GQuery


Page 33: Advanced NCBI

GQuery

Provides the number of records retrieved in all Entrez databases by a single text query.


Page 34: Advanced NCBI

GQuery: Example

$ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml"

<Result>
  <Term>tyrannosaurus rex</Term>
  <eGQueryResult>
    <ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>mesh</DbName><MenuName/><Count>15</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>books</DbName><MenuName/><Count>179</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>pubmedhealth</DbName><MenuName/><Count>21</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>omim</DbName><MenuName/><Count>10</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
    <ResultItem><DbName>ncbisearch</DbName><MenuName/><Count>1</Count><Status>Ok</Status></ResultItem>
    <ResultItem><DbName>nuccore</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
    (...)
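The per-database counts in this answer are easy to collect programmatically. The following Python sketch keeps only the items whose Status is "Ok"; `SAMPLE` is a shortened copy of the answer above and `hit_counts` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the GQuery answer shown above.
SAMPLE = """<Result><Term>tyrannosaurus rex</Term><eGQueryResult>
<ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>pmc</DbName><MenuName/><Count>160</Count><Status>Ok</Status></ResultItem>
<ResultItem><DbName>omia</DbName><MenuName/><Count>0</Count><Status>Term or Database is not found</Status></ResultItem>
</eGQueryResult></Result>"""


def hit_counts(gquery_xml):
    """Map database name to hit count, skipping items whose Status is not 'Ok'."""
    root = ET.fromstring(gquery_xml)
    counts = {}
    for item in root.findall("./eGQueryResult/ResultItem"):
        if item.findtext("Status") == "Ok":
            counts[item.findtext("DbName")] = int(item.findtext("Count"))
    return counts


print(hit_counts(SAMPLE))
```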


Page 35: Advanced NCBI

GQuery: Transforming to HTML using XSLT

The XSLT stylesheet: https://raw.githubusercontent.com/lindenb/courses/master/about.ncbi/gquery2html.xsl

<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="html"/>

<xsl:template match="/"><html><body>
<xsl:apply-templates select="Result"/>
</body></html></xsl:template>

<xsl:template match="Result">
<table><caption><xsl:value-of select="Term"/></caption>
<tr><th>Database</th><th>Count</th><th>Status</th></tr>
<xsl:apply-templates select="eGQueryResult/ResultItem"/>
</table>
</xsl:template>

<xsl:template match="ResultItem">
<tr>
<td><a>
<xsl:attribute name="href">https://www.ncbi.nlm.nih.gov/<xsl:value-of select="DbName"/>?cmd=search&amp;term=<xsl:value-of select="translate(/Result/Term,' ','+')"/></xsl:attribute>
<xsl:value-of select="DbName"/></a></td>
<td><xsl:value-of select="Count"/></td>
<td><xsl:value-of select="Status"/></td>
</tr>
</xsl:template>

</xsl:stylesheet>


Page 36: Advanced NCBI

GQuery: Transforming to HTML

$ curl "https://eutils.ncbi.nlm.nih.gov/gquery?term=tyrannosaurus%20rex&retmode=xml" |\
  xsltproc gquery2html.xsl -

<html><body>
<table><caption>tyrannosaurus rex</caption>
<tr>
<th>Database</th><th>Count</th><th>Status</th>
</tr>
<tr>
<td><a href="https://www.ncbi.nlm.nih.gov/pubmed?cmd=search&amp;term=tyrannosaurus+rex">pubmed</a>
</td><td>41</td><td>Ok</td>
</tr>
<tr>
<td><a href="https://www.ncbi.nlm.nih.gov/pmc?cmd=search&amp;term=tyrannosaurus+rex">pmc</a>
</td><td>160</td><td>Ok</td>
</tr>
<tr>
<td><a href="https://www.ncbi.nlm.nih.gov/mesh?cmd=search&amp;term=tyrannosaurus+rex">mesh</a>
</td><td>15</td>
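For comparison, the same kind of table can be produced without XSLT. The Python sketch below is not the stylesheet from the course; it is an alternative using `xml.etree.ElementTree` that builds equivalent links, and `gquery_to_html` is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from html import escape
from urllib.parse import quote_plus


def gquery_to_html(gquery_xml):
    """Render a GQuery answer as an HTML table (Database, Count, Status)."""
    root = ET.fromstring(gquery_xml)
    term = root.findtext("Term", default="")
    rows = []
    for item in root.findall("./eGQueryResult/ResultItem"):
        db = item.findtext("DbName", default="")
        count = item.findtext("Count", default="")
        status = item.findtext("Status", default="")
        # quote_plus turns the space in the term into '+', as translate() does in XSLT.
        href = "https://www.ncbi.nlm.nih.gov/" + db + "?cmd=search&term=" + quote_plus(term)
        rows.append(
            '<tr><td><a href="' + escape(href) + '">' + escape(db) + "</a></td>"
            + "<td>" + escape(count) + "</td><td>" + escape(status) + "</td></tr>"
        )
    return (
        "<html><body><table><caption>" + escape(term) + "</caption>"
        + "<tr><th>Database</th><th>Count</th><th>Status</th></tr>"
        + "".join(rows)
        + "</table></body></html>"
    )


EXAMPLE = (
    "<Result><Term>tyrannosaurus rex</Term><eGQueryResult>"
    "<ResultItem><DbName>pubmed</DbName><MenuName/><Count>41</Count>"
    "<Status>Ok</Status></ResultItem></eGQueryResult></Result>"
)
print(gquery_to_html(EXAMPLE))
```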

Page 37: Advanced NCBI

ESearch


Page 38: Advanced NCBI

ESearch

Provides a list of UIDs matching a text query

Posts the results of a search on the History server

Downloads all UIDs from a dataset stored on the History server

Combines or limits UID datasets stored on the History server

Sorts sets of UIDs


Page 39: Advanced NCBI

ESearch: Syntax

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi


Page 40: Advanced NCBI

ESearch: Searching for 'Mammuthus primigenius'

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D" |\
  xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
    <Id>383843869</Id>
    <Id>383843867</Id>
    <Id>383843865</Id>
    <Id>383843863</Id>
    <Id>383843861</Id>
    <Id>383843859</Id>
    <Id>383843857</Id>
    <Id>383843855</Id>
    <Id>383843853</Id>
    <Id>383843851</Id>
    <Id>383843849</Id>
    <Id>383843847</Id>
    <Id>383843845</Id>
    <Id>157367690</Id>
    <Id>157367676</Id>
    <Id>157367662</Id>
    <Id>157367648</Id>
    <Id>157367634</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
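The fields most scripts need from an ESearch answer are the total count, the paging values and the list of UIDs. A minimal Python sketch, assuming the XML layout shown above (`SAMPLE` is a shortened copy and `parse_esearch` a hypothetical helper name):

```python
import xml.etree.ElementTree as ET

# Shortened copy of the ESearch answer shown above (two UIDs only).
SAMPLE = """<eSearchResult><Count>684</Count><RetMax>2</RetMax><RetStart>0</RetStart>
<IdList><Id>507866428</Id><Id>124056416</Id></IdList>
<TranslationStack><TermSet><Count>684</Count></TermSet></TranslationStack>
</eSearchResult>"""


def parse_esearch(xml_text):
    """Extract Count, RetMax, RetStart and the UID list from an eSearchResult."""
    root = ET.fromstring(xml_text)
    return {
        # findtext("Count") only looks at direct children of the root,
        # so the Count nested inside TranslationStack is ignored.
        "count": int(root.findtext("Count")),
        "retmax": int(root.findtext("RetMax")),
        "retstart": int(root.findtext("RetStart")),
        "ids": [node.text for node in root.findall("./IdList/Id")],
    }


print(parse_esearch(SAMPLE))
```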


Page 41: Advanced NCBI

ESearch: Searching for 'Mammuthus primigenius' (JSON)

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmode=json"

{
  "header": {
    "type": "esearch",
    "version": "0.3"
  },
  "esearchresult": {
    "count": "811",
    "retmax": "20",
    "retstart": "0",
    "idlist": [
      "1059791223",
      "198241525",
      "198241523",
      "198241521",
      "198241519",
      "198241517",
      "198241515",
      "198241513",
      "198241511",
      "198241509",
      "198241507",
      "198241505",
      "198241503",
      "198241501",
      "198241499",
      "198241497",
      "198241495",
      "198241493",
      "198241491",
      "198241489"
    ],
    "translationset": [
      {
        "from": "\"Mammuthus primigenius\"[ORGN]",
        "to": "\"Mammuthus primigenius\"[Organism]"
      }
    ],
    "translationstack": [
      {
        "term": "\"Mammuthus primigenius\"[Organism]",
        "field": "Organism",
        "count": "811",
        "explode": "Y"
      },
      "GROUP"
    ],
    "querytranslation": "\"Mammuthus primigenius\"[Organism]"
  }
}


Page 42: Advanced NCBI

ESearch: the retmax parameter

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=2" |\
  xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>2</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>507866428</Id>
    <Id>124056416</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>


Page 43: Advanced NCBI

ESearch: the retstart parameter

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&retmax=3&retstart=100" |\
  xmllint --format -

<eSearchResult>
  <Count>684</Count>
  <RetMax>3</RetMax>
  <RetStart>100</RetStart>
  <IdList>
    <Id>300810656</Id>
    <Id>300810655</Id>
    <Id>300810654</Id>
  </IdList>
  <TranslationSet>
    <Translation>
      <From>"Mammuthus primigenius"[ORGN]</From>
      <To>"Mammuthus primigenius"[Organism]</To>
    </Translation>
  </TranslationSet>
  <TranslationStack>
    <TermSet>
      <Term>"Mammuthus primigenius"[Organism]</Term>
      <Field>Organism</Field>
      <Count>684</Count>
      <Explode>Y</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>"Mammuthus primigenius"[Organism]</QueryTranslation>
</eSearchResult>
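Together, retmax and retstart allow a large result set to be downloaded page by page. The Python sketch below only computes the successive (retstart, retmax) pairs for a given total; the page size of 200 is an arbitrary value chosen for the example and `page_windows` is a hypothetical helper name.

```python
def page_windows(count, retmax):
    """Yield (retstart, size) pairs that cover `count` records in pages of `retmax`."""
    if retmax <= 0:
        raise ValueError("retmax must be positive")
    for retstart in range(0, count, retmax):
        # The last page may hold fewer than retmax records.
        yield retstart, min(retmax, count - retstart)


# 684 is the Count reported for the Mammuthus primigenius query above.
windows = list(page_windows(684, 200))
print(windows)
```

Each pair would then be appended to the ESearch URL as `&retstart=...&retmax=...`.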


Page 44: Advanced NCBI

ESearch: rettype=count

curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&rettype=count" |\
  xmllint --format -

<eSearchResult>
  <Count>684</Count>
</eSearchResult>


Page 45: Advanced NCBI

ESearch: sort=Date Released

curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=nucleotide&term=%22Mammuthus%20primigenius%22%5BORGN%5D&sort=Date+Released" |\
  xmllint --format -

<eSearchResult>
  <Count>811</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>1033204644</Id>
    <Id>1033204658</Id>
    <Id>1033204672</Id>
    <Id>1033204686</Id>
    <Id>1033204729</Id>
    <Id>1033204771</Id>
    <Id>1033204785</Id>
    <Id>1033204799</Id>
    <Id>1033204813</Id>
    <Id>1033204827</Id>
    <Id>1033204871</Id>
    <Id>1033205124</Id>
    <Id>1033205194</Id>
    <Id>1033205208</Id>
    <Id>1033205222</Id>
    <Id>1033205236</Id>
    <Id>1033205264</Id>
    <Id>1033205390</Id>
    (...)


Page 46: Advanced NCBI

ESummary


Page 47: Advanced NCBI

ESummary: Syntax

Returns document summaries (DocSums) for a list of input UIDs

Returns DocSums for a set of UIDs stored on the Entrez History server


Page 48: Advanced NCBI

ESummary: Syntax

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=(DB)&id=(TERM)


Page 49: Advanced NCBI

ESummary: Retrieve nucleotide gi=507866428

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428"

<eSummaryResult>
  <DocSum>
    <Id>507866428</Id>
    <Item Name="Caption" Type="String">KC524742</Item>
    <Item Name="Title" Type="String">Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds</Item>
    <Item Name="Extra" Type="String">gi|507866428|gb|KC524742.1|[507866428]</Item>
    <Item Name="Gi" Type="Integer">507866428</Item>
    <Item Name="CreateDate" Type="String">2013/06/15</Item>
    <Item Name="UpdateDate" Type="String">2013/06/21</Item>
    <Item Name="Flags" Type="Integer">0</Item>
    <Item Name="TaxId" Type="Integer">37349</Item>
    <Item Name="Length" Type="Integer">9042</Item>
    <Item Name="Status" Type="String">live</Item>
    <Item Name="ReplacedBy" Type="String"></Item>
    <Item Name="Comment" Type="String"><![CDATA[ ]]></Item>
  </DocSum>
</eSummaryResult>
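Each DocSum is a flat list of Item elements carrying a Name and a Type attribute, which maps naturally onto a dictionary. The Python sketch below converts Integer items to `int` and keeps everything else as text; `SAMPLE` is a shortened copy of the answer above, and `docsum_to_dict` and `parse_esummary` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the ESummary answer shown above.
SAMPLE = """<eSummaryResult><DocSum><Id>507866428</Id>
<Item Name="Caption" Type="String">KC524742</Item>
<Item Name="TaxId" Type="Integer">37349</Item>
<Item Name="Length" Type="Integer">9042</Item>
<Item Name="ReplacedBy" Type="String"></Item>
</DocSum></eSummaryResult>"""


def docsum_to_dict(docsum):
    """Turn one DocSum element into a dict keyed by the Item Name attribute."""
    record = {"Id": docsum.findtext("Id")}
    for item in docsum.findall("Item"):
        text = item.text or ""  # empty elements have text None
        if item.get("Type") == "Integer" and text.strip():
            record[item.get("Name")] = int(text)
        else:
            record[item.get("Name")] = text
    return record


def parse_esummary(xml_text):
    """Return one dict per DocSum in an eSummaryResult document."""
    root = ET.fromstring(xml_text)
    return [docsum_to_dict(docsum) for docsum in root.findall("DocSum")]


print(parse_esummary(SAMPLE))
```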


Page 50: Advanced NCBI

ESummary: Retrieve nucleotide gi=507866428 in JSON

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=nucleotide&id=507866428&retmode=json"

{
  "header": {
    "type": "esummary",
    "version": "0.3"
  },
  "result": {
    "uids": [
      "507866428"
    ],
    "507866428": {
      "uid": "507866428",
      "caption": "KC524742",
      "title": "Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds",
      "extra": "gi|507866428|gb|KC524742.1|",
      "gi": 507866428,
      "createdate": "2013/06/15",
      "updatedate": "2013/06/21",
      "flags": "",
      "taxid": 37349,
      "slen": 9042,
      "biomol": "genomic",
      "moltype": "dna",
      "topology": "linear",
      "sourcedb": "insd",
      "segsetsize": "",
      "projectid": "0",
(...)
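In the JSON form, `result.uids` lists the record keys and each record sits under its own UID. A minimal sketch of walking that structure, assuming the layout shown above (the function name `summaries` and the shortened sample are illustrative):

```python
import json

# Shortened copy of the JSON shown above, used as offline sample data.
SAMPLE = """{"header": {"type": "esummary", "version": "0.3"},
 "result": {"uids": ["507866428"],
  "507866428": {"uid": "507866428", "caption": "KC524742",
                "taxid": 37349, "slen": 9042}}}"""


def summaries(json_text):
    """Yield the per-record objects in the order given by result.uids."""
    result = json.loads(json_text)["result"]
    for uid in result["uids"]:
        yield result[uid]


if __name__ == "__main__":
    for record in summaries(SAMPLE):
        print(record["uid"], record["caption"], record["slen"])
```

Iterating over `uids` rather than over the keys of `result` avoids treating the `uids` array itself as a record.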

Page 51: Advanced NCBI

ESummary: Retrieve snp rs25

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=snp&id=25"

<eSummaryResult>
  <DocSum>
    <Id>25</Id>
    <Item Name="SNP_ID" Type="Integer">25</Item>
    <Item Name="Organism" Type="String"></Item>
    <Item Name="ALLELE_ORIGIN" Type="String"></Item>
    <Item Name="GLOBAL_MAF" Type="String">0.4913</Item>
    <Item Name="GLOBAL_POPULATION" Type="String"></Item>
    <Item Name="GLOBAL_SAMPLESIZE" Type="Integer">0</Item>
    <Item Name="SUSPECTED" Type="String"></Item>
    <Item Name="CLINICAL_SIGNIFICANCE" Type="String"></Item>
    <Item Name="GENE" Type="String">THSD7A</Item>
    <Item Name="LOCUS_ID" Type="Integer">221981</Item>
    <Item Name="ACC" Type="String">NM_015204.2,NT_007819.17</Item>
    <Item Name="CHR" Type="String">7</Item>
    <Item Name="WEIGHT" Type="Integer">1</Item>
    <Item Name="HANDLE" Type="String">1000GENOMES,BGI,BL,BUSHMAN,COMPLETE_GENOMICS,CSHL-HAPMAP,GMI,ILLUMINA-UK,KWOK,PERLEGEN,SSMP,TISHKOFF</Item>
    <Item Name="FXN_CLASS" Type="String">intron-variant</Item>
    <Item Name="VALIDATED" Type="String">by-1000G,by-cluster,by-frequency,by-hapmap</Item>
    <Item Name="GTYPE" Type="String">true</Item>
    <Item Name="NONREF" Type="String">false</Item>
    <Item Name="DOCSUM" Type="String">HGVS=NC_000007.13:g.11584142T&gt;C,NG_027670.1:g.292683A&gt;G,NM_015204.2:c.1454-1398A&gt;G,NT_007819.17:g.11574142T&gt;C|SEQ=TCTGTGAGCTTCTGCATGCAATCCT[A/G]TGCAATTGGAATTTGATAGTCCTTT|GENE=THSD7A:221981</Item>
    <Item Name="HET" Type="Integer">50</Item>
    <Item Name="SRATE" Type="Integer">0</Item>
    <Item Name="TAX_ID" Type="Integer">9606</Item>
    <Item Name="CHRRPT" Type="String">25|2|0|1|1|1|7|NT_007819.17|11574141|11584142|THSD7A|0.499848|0.00872267||51|1|1|36|138|0|||T:2178:0.4913</Item>
    <Item Name="ORIG_BUILD" Type="Integer">36</Item>
    <Item Name="UPD_BUILD" Type="Integer">138</Item>
    <Item Name="CREATEDATE" Type="String">2000-09-19 17:02</Item>
    <Item Name="UPDATEDATE" Type="String">2013-06-21 14:17</Item>
    <Item Name="POP_CLASS" Type="String"></Item>
    <Item Name="METHOD_CLASS" Type="String">computed,hybridize,sequence,unknown</Item>
    <Item Name="SNP3D" Type="String"></Item>
    <Item Name="LINKOUT" Type="String">ILLUMINA-UK|http://www.illumina.com/HumanGenomeNA18507_000019106_NCBI36.1_chr7_11550667</Item>
    <Item Name="SS" Type="Integer">654151077</Item>
    <Item Name="LOCSNPID" Type="String">7_11584142</Item>
    <Item Name="ALLELE" Type="String">R</Item>
    <Item Name="SNP_CLASS" Type="String">snp</Item>
    <Item Name="CHRPOS" Type="String">7:11584142</Item>
    <Item Name="CONTIGPOS" Type="String">NT_007819.17:11574142</Item>
    <Item Name="TEXT" Type="String"></Item>
    <Item Name="LOOKUP" Type="String">325952</Item>
  </DocSum>
</eSummaryResult>

Page 52: Advanced NCBI

ESummary: Retrieve pubmed pmid=7939126

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pubmed&id=7939126"

<eSummaryResult>
  <DocSum>
    <Id>7939126</Id>
    <Item Name="PubDate" Type="Date">1994 Apr</Item>
    <Item Name="EPubDate" Type="Date"></Item>
    <Item Name="Source" Type="String">Sleep</Item>
    <Item Name="AuthorList" Type="List">
      <Item Name="Author" Type="String">Broughton R</Item>
      <Item Name="Author" Type="String">Billings R</Item>
      <Item Name="Author" Type="String">Cartwright R</Item>
      <Item Name="Author" Type="String">Doucette D</Item>
      <Item Name="Author" Type="String">Edmeads J</Item>
      <Item Name="Author" Type="String">Edwardh M</Item>
      <Item Name="Author" Type="String">Ervin F</Item>
      <Item Name="Author" Type="String">Orchard B</Item>
      <Item Name="Author" Type="String">Hill R</Item>
      <Item Name="Author" Type="String">Turrell G</Item>
    </Item>
    <Item Name="LastAuthor" Type="String">Turrell G</Item>
    <Item Name="Title" Type="String">Homicidal somnambulism: a case report.</Item>
    <Item Name="Volume" Type="String">17</Item>
    <Item Name="Issue" Type="String">3</Item>
    <Item Name="Pages" Type="String">253-64</Item>
    <Item Name="LangList" Type="List">
      <Item Name="Lang" Type="String">English</Item>
    </Item>
    <Item Name="NlmUniqueID" Type="String">7809084</Item>
    <Item Name="ISSN" Type="String">0161-8105</Item>
    <Item Name="ESSN" Type="String">1550-9109</Item>
    <Item Name="PubTypeList" Type="List">
      <Item Name="PubType" Type="String">Journal Article</Item>
    </Item>
    <Item Name="RecordStatus" Type="String">PubMed - indexed for MEDLINE</Item>
    <Item Name="PubStatus" Type="String">ppublish</Item>
    <Item Name="ArticleIds" Type="List">
      <Item Name="pubmed" Type="String">7939126</Item>
      <Item Name="eid" Type="String">7939126</Item>
      <Item Name="rid" Type="String">7939126</Item>
    </Item>
    <Item Name="History" Type="List">
      <Item Name="pubmed" Type="Date">1994/04/01 00:00</Item>
      <Item Name="medline" Type="Date">1994/04/01 00:01</Item>
      <Item Name="entrez" Type="Date">1994/04/01 00:00</Item>
    </Item>
    <Item Name="References" Type="List"></Item>
    <Item Name="HasAbstract" Type="Integer">1</Item>
    <Item Name="PmcRefCount" Type="Integer">4</Item>
    <Item Name="FullJournalName" Type="String">Sleep</Item>
    <Item Name="ELocationID" Type="String"></Item>
    <Item Name="SO" Type="String">1994 Apr;17(3):253-64</Item>
  </DocSum>
</eSummaryResult>
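Unlike the nucleotide DocSum, the PubMed DocSum nests items: an Item with Type="List" contains further Item elements (AuthorList, ArticleIds, History). A small recursive sketch, assuming that layout; the helper names `item_value` and `authors` and the shortened sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the PubMed DocSum shown above.
SAMPLE = """<eSummaryResult><DocSum>
<Id>7939126</Id>
<Item Name="Source" Type="String">Sleep</Item>
<Item Name="AuthorList" Type="List">
  <Item Name="Author" Type="String">Broughton R</Item>
  <Item Name="Author" Type="String">Billings R</Item>
</Item>
<Item Name="ArticleIds" Type="List">
  <Item Name="pubmed" Type="String">7939126</Item>
</Item>
</DocSum></eSummaryResult>"""


def item_value(item):
    """Convert one <Item>; a Type="List" item becomes (name, value) pairs."""
    if item.get("Type") == "List":
        return [(child.get("Name"), item_value(child))
                for child in item.findall("Item")]
    return item.text or ""


def authors(xml_text):
    """Return the author names of the first DocSum, in document order."""
    docsum = ET.fromstring(xml_text).find("DocSum")
    for item in docsum.findall("Item"):
        if item.get("Name") == "AuthorList":
            return [value for _name, value in item_value(item)]
    return []


if __name__ == "__main__":
    print(authors(SAMPLE))
```

Pairs are used instead of a dictionary because names repeat inside a list (every author is named "Author").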

Page 53: Advanced NCBI

EFetch

Page 54: Advanced NCBI

EFetch: Syntax

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=(db)&id=(ID)
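For db=nucleotide, the output format is selected by the `rettype` and `retmode` parameters. The sketch below collects the combinations used on the slides that follow; the dictionary keys (`asn1`, `tinyseq`, ...) and the function name `efetch_url` are labels invented for this example, not API values.

```python
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# rettype/retmode pairs for db=nucleotide as used on the slides;
# the keys are hypothetical labels chosen for this sketch.
NUCLEOTIDE_FORMATS = {
    "asn1": {},                                        # default output
    "fasta": {"rettype": "fasta"},
    "tinyseq": {"rettype": "fasta", "retmode": "xml"},
    "genbank-xml": {"retmode": "xml"},
    "genbank": {"rettype": "gb"},
}


def efetch_url(uid, fmt, db="nucleotide"):
    """Build an EFetch URL for one UID (or accession) and one format label."""
    params = {"db": db, "id": uid}
    params.update(NUCLEOTIDE_FORMATS[fmt])
    return EFETCH + "?" + urlencode(params)


if __name__ == "__main__":
    for label in NUCLEOTIDE_FORMATS:
        print(label, efetch_url(507866428, label))
```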

Page 55: Advanced NCBI

EFetch: Retrieve nucleotide gi=507866428 as ASN.1

Default: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428

Seq-entry ::= set {
  class nuc-prot ,
  descr {
    source {
      genome genomic ,
      org {
        taxname "Mammuthus primigenius" ,
        common "woolly mammoth" ,
        db {
          {
            db "taxon" ,
            tag
              id 37349 } } ,
        orgname {
          name binomial {
            genus "Mammuthus" ,
            species "primigenius" } ,
          mod {
            {
(...)

Page 56: Advanced NCBI

EFetch: Retrieve nucleotide gi=507866428 as FASTA

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta

>gi|507866428|gb|KC524742.1| Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds

GCACTTGCTTTTTTTGTCTTCTTCAGACCACGACATGGGACTCAGCGACGGGGAATGGGAGTTGGTGTTGAAAACCTGGGGGAAAGTGGAGGCTGACATCCCGGGCCATGGGCTGGAAGTCTTCGTCAGGTAAAGGAAGAAATCCTGTGGCCCCCATCACCCACCCCNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
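FASTA output is simple enough to read without a library. The sketch below is a generic reader plus a helper that measures how much of a sequence is unresolved (N), which is relevant for a record like the one above; both function names are illustrative.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def unknown_fraction(sequence):
    """Fraction of bases that are N; 0.0 for an empty sequence."""
    return sequence.upper().count("N") / len(sequence) if sequence else 0.0


if __name__ == "__main__":
    demo = ">gi|1|gb|X.1| toy record\nACGT\nNNNN\n"
    for header, seq in read_fasta(demo):
        print(header, len(seq), unknown_fraction(seq))
```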

Page 57: Advanced NCBI

EFetch: Retrieve nucleotide gi=507866428 as TinySeq

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=fasta&retmode=xml

(long lines are cut off at the right edge of the slide)

<?xml version="1.0"?>
<!DOCTYPE TSeqSet PUBLIC "-//NCBI//NCBI TSeq/EN"
<TSeqSet>
<TSeq>
  <TSeq_seqtype value="nucleotide"/>
  <TSeq_gi>507866428</TSeq_gi>
  <TSeq_accver>KC524742.1</TSeq_accver>
  <TSeq_taxid>37349</TSeq_taxid>
  <TSeq_orgname>Mammuthus primigenius</TSeq_orgnam
  <TSeq_defline>Mammuthus primigenius isolate CME2
  <TSeq_length>9042</TSeq_length>
  <TSeq_sequence>GCACTTGCTTTTTTTGTCTTCTTCAGACCACGA
</TSeq>
</TSeqSet>

Page 58: Advanced NCBI

EFetch: Retrieve nucleotide gi=507866428 as GenBank XML

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&retmode=xml

<GBSeq>
  <GBSeq_locus>KC524742</GBSeq_locus>
  <GBSeq_length>9042</GBSeq_length>
  <GBSeq_strandedness>double</GBSeq_strandedness>
  <GBSeq_moltype>DNA</GBSeq_moltype>
  <GBSeq_topology>linear</GBSeq_topology>
  <GBSeq_division>MAM</GBSeq_division>
  <GBSeq_update-date>21-JUN-2013</GBSeq_update-date>
  <GBSeq_create-date>15-JUN-2013</GBSeq_create-date>
  <GBSeq_definition>Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene, partial cds</GBSeq_definition>
  <GBSeq_primary-accession>KC524742</GBSeq_primary-accession>
  <GBSeq_accession-version>KC524742.1</GBSeq_accession-version>
  <GBSeq_other-seqids>
    <GBSeqid>gb|KC524742.1|</GBSeqid>
    <GBSeqid>gi|507866428</GBSeqid>
  </GBSeq_other-seqids>
  <GBSeq_source>Mammuthus primigenius (woolly mammoth)</GBSeq_source>
  <GBSeq_organism>Mammuthus primigenius</GBSeq_organism>
(...)

Page 59: Advanced NCBI

EFetch: Retrieve nucleotide gi=507866428 as GenBank

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=507866428&rettype=gb

LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
KEYWORDS    .
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
                     /mol_type="genomic DNA"
                     /isolate="CME2005/915"
                     /db_xref="taxon:37349"
                     /tissue_type="bone"
     gene            <35..>9042
                     /gene="Mb"
     mRNA            join(<35..129,5627..5849,8979..>9042)
                     /gene="Mb"
                     /product="myoglobin"
     CDS             join(35..129,5627..5849,8979..>9042)
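The feature locations in the record above, such as join(35..129,5627..5849,8979..>9042), encode exon spans, with "<" and ">" marking partial ends. The sketch below handles only the simple start..end spans seen on this slide (no complement(), no single-base or between-base locations); the function names are illustrative.

```python
import re

# One span: optional "<", start, "..", optional ">", end.
SPAN = re.compile(r"(<?)(\d+)\.\.(>?)(\d+)")


def parse_location(location):
    """Return (start, end, start_partial, end_partial) for each span."""
    return [
        (int(m.group(2)), int(m.group(4)), m.group(1) == "<", m.group(3) == ">")
        for m in SPAN.finditer(location)
    ]


def total_length(location):
    """Sum of span lengths; coordinates are 1-based and inclusive."""
    return sum(end - start + 1 for start, end, _, _ in parse_location(location))


if __name__ == "__main__":
    cds = "join(35..129,5627..5849,8979..>9042)"
    print(parse_location(cds))
    print(total_length(cds))
```

For the CDS above this gives three spans of 95, 223 and 64 bases, the last one flagged as partial at its 3' end.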

Page 60: Advanced NCBI

EFetch: EFetch works with accession numbers

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=KC524742&rettype=gb

LOCUS       KC524742                9042 bp    DNA     linear   MAM 21-JUN-2013
DEFINITION  Mammuthus primigenius isolate CME2005/915 myoglobin (Mb) gene,
            partial cds.
ACCESSION   KC524742
VERSION     KC524742.1  GI:507866428
KEYWORDS    .
SOURCE      Mammuthus primigenius (woolly mammoth)
  ORGANISM  Mammuthus primigenius
            Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
            Mammalia; Eutheria; Afrotheria; Proboscidea; Elephantidae;
            Mammuthus.
REFERENCE   1  (bases 1 to 9042)
  AUTHORS   Mirceta,S., Signore,A.V., Burns,J.M., Cossins,A.R., Campbell,K.L.
            and Berenbrink,M.
  TITLE     Evolution of mammalian diving capacity traced by myoglobin net
            surface charge
  JOURNAL   Science 340 (6138), 1234192 (2013)
   PUBMED   23766330
REFERENCE   2  (bases 1 to 9042)
  AUTHORS   Signore,A.V., Campbell,K.L. and Poinar,H.N.
  TITLE     Direct Submission
  JOURNAL   Submitted (09-JAN-2013) Biological Sciences, University of
            Manitoba, 50 Sifton Road, Winnipeg, Manitoba R3T2N2, Canada
COMMENT     ##Assembly-Data-START##
            Sequencing Technology :: Sanger dideoxy sequencing
            ##Assembly-Data-END##
FEATURES             Location/Qualifiers
     source          1..9042
                     /organism="Mammuthus primigenius"
                     /mol_type="genomic DNA"
                     /isolate="CME2005/915"
                     /db_xref="taxon:37349"
                     /tissue_type="bone"
     gene            <35..>9042
                     /gene="Mb"
     mRNA            join(<35..129,5627..5849,8979..>9042)
                     /gene="Mb"
                     /product="myoglobin"
     CDS             join(35..129,5627..5849,8979..>9042)

Page 61: Advanced NCBI

EFetch: Using the WebEnv parameter

WebEnv is the Web environment string returned from a previous ESearch, EPost or ELink call. When it is provided, ESearch will post the results of the search operation to this pre-existing WebEnv.

Page 62: Advanced NCBI

EFetch: Using the WebEnv parameter

Searching extinct species in the NCBI taxonomy ('extinct[PROP]')

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?usehistory=y&db=taxonomy&term=extinct%5BPROP%5D"

<eSearchResult>
  <Count>145</Count>
  <RetMax>20</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList>
    <Id>1225531</Id>
    <Id>1225530</Id>
    <Id>1211276</Id>
    <Id>1211275</Id>
    <Id>1027716</Id>
    <Id>948961</Id>
    <Id>943952</Id>
    <Id>867394</Id>
    <Id>867393</Id>
    <Id>748142</Id>
    <Id>748141</Id>
    <Id>741158</Id>
    <Id>703576</Id>
    <Id>703571</Id>
    <Id>703559</Id>
    <Id>693865</Id>
    <Id>686441</Id>
    <Id>665113</Id>
    <Id>659069</Id>
    <Id>656807</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <TermSet>
      <Term>extinct[PROP]</Term>
      <Field>PROP</Field>
      <Count>145</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>GROUP</OP>
  </TranslationStack>
  <QueryTranslation>extinct[PROP]</QueryTranslation>
</eSearchResult>
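The QueryKey and WebEnv in the result above are what a later EFetch call needs; the IdList only holds the first RetMax (here 20) of the 145 hits. A sketch of extracting the pair and building the follow-up URL, assuming the response layout shown above; the function names are hypothetical, while `retstart` and `retmax` are the standard paging parameters.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# Shortened copy of the ESearch response shown above.
SAMPLE = """<eSearchResult><Count>145</Count><RetMax>20</RetMax>
<RetStart>0</RetStart><QueryKey>1</QueryKey>
<WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
<IdList><Id>1225531</Id><Id>1225530</Id></IdList></eSearchResult>"""


def history_handle(esearch_xml):
    """Return (query_key, webenv) from an eSearchResult document."""
    root = ET.fromstring(esearch_xml)
    return root.findtext("QueryKey"), root.findtext("WebEnv")


def efetch_from_history(db, query_key, webenv, retstart=0, retmax=100):
    """Build an EFetch URL that reads one page of a stored result set."""
    params = {"db": db, "query_key": query_key, "WebEnv": webenv,
              "retstart": retstart, "retmax": retmax, "retmode": "xml"}
    return EFETCH + "?" + urlencode(params)


if __name__ == "__main__":
    key, env = history_handle(SAMPLE)
    # Two pages of 100 records would cover the 145 hits.
    for start in (0, 100):
        print(efetch_from_history("taxonomy", key, env, retstart=start))
```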

Page 63: Advanced NCBI

EFetch: Using the WebEnv parameter

Fetch the extinct species in the NCBI taxonomy ('extinct[PROP]') using the WebEnv parameter.

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=taxonomy&query_key=1&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&retmode=xml"

<TaxaSet>
  <Taxon>
    <TaxId>1225531</TaxId>
    <ScientificName>Equus ovodovi</ScientificName>
    <OtherNames>
      <Synonym>Equus (Sussemionus) ovodovi</Synonym>
      <Name>
        <ClassCDE>authority</ClassCDE>
        <DispName>Equus ovodovi Eisenmann &amp; Sergej, 2011</DispName>
      </Name>
    </OtherNames>
    <ParentTaxId>1225530</ParentTaxId>
    <Rank>species</Rank>
    <Division>Mammals</Division>
    <GeneticCode>
      <GCId>1</GCId>
      <GCName>Standard</GCName>
    </GeneticCode>
    <MitoGeneticCode>
(....)
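The TaxaSet above can be reduced to a small table of taxon id, name and rank. This sketch assumes the element names shown on the slide; it reads only the direct Taxon children of TaxaSet, on the assumption that the full records may contain further nested Taxon elements (for example in a lineage section) that should not be counted. The function name `taxa` is illustrative.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record shown above.
SAMPLE = """<TaxaSet>
<Taxon><TaxId>1225531</TaxId>
  <ScientificName>Equus ovodovi</ScientificName>
  <ParentTaxId>1225530</ParentTaxId>
  <Rank>species</Rank><Division>Mammals</Division></Taxon>
</TaxaSet>"""


def taxa(xml_text):
    """Return (taxid, scientific name, rank) for each top-level Taxon."""
    rows = []
    for taxon in ET.fromstring(xml_text).findall("Taxon"):
        rows.append((int(taxon.findtext("TaxId")),
                     taxon.findtext("ScientificName"),
                     taxon.findtext("Rank")))
    return rows


if __name__ == "__main__":
    for taxid, name, rank in taxa(SAMPLE):
        print(taxid, name, rank, sep="\t")
```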

Page 64: Advanced NCBI

EPOST

Page 65: Advanced NCBI

EPost

Uploads a list of UIDs to the Entrez History server

Appends a list of UIDs to an existing set of UID lists attached to a Web Environment
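Because a UID list can be long, it is convenient to send it in the body of an HTTP POST instead of the URL. The sketch below only prepares such a request with the standard library and does not send it; the function name `epost_request` is hypothetical, and the optional `webenv` argument corresponds to appending to an existing Web Environment.

```python
from urllib.parse import urlencode
from urllib.request import Request

EPOST = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi"


def epost_request(db, uids, webenv=None):
    """Prepare (but do not send) an HTTP POST for EPost."""
    params = {"db": db, "id": ",".join(str(u) for u in uids)}
    if webenv:
        # Append to an existing Web Environment instead of creating one.
        params["WebEnv"] = webenv
    body = urlencode(params).encode("ascii")
    return Request(EPOST, data=body, method="POST")


if __name__ == "__main__":
    request = epost_request("taxonomy", [1860150, 1860149, 1849957])
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
    # Sending it would be: urllib.request.urlopen(request).read()
```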

Page 66: Advanced NCBI

EPost: Post UIDs to EPost

Get a list of UIDs (taxonomy IDs) of extinct species:

wget -O - 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=extinct[PROP]&retmax=1000' |\
  xmllint --format - |\
  grep '<Id>' |\
  cut -d '<' -f 2 |\
  cut -d '>' -f 2 |\
  tr "\n" ","

Output:

1860150,1860149,1849957,1825730,1825729,1636722,1607772,1607771,1607767,1607757,1607756,1597978,1582057,1566623,1563127,1563126,1563125,1563124,1563123,1563122,1563121,1563120,1560315,1560314,1543223,1542494,1542469,1530197,1524889,1523245,1513476,1513474,1503129,1453604,1425170,1415635,1295174,1225531,1225530,1211276,1211275,1027716,948961,943952,867394,867393,748142,748141,741158,703576,703571,703559,693865,686441,665113,659069,656807,647691,647690,643746,643745,643744,643742,577682,572106,572105,572104,572099,572098,570943,570942,570941,551196,544298,523825,523824,523822,523821,523820,518692,518691,518689,475185,436495,436494,436493,436488,402889,399386,399178,386524,379504,363580,363579,363578,363571,339614,339612,339609,330944,330640,330639,330638,330637,330636,328612,314500,307641,304335,272462,268291,251263,251094,251093,239970,239969,237965,230980,230979,227166,227165,223567,222863,222862,216182,216181,201717,201716,192211,188536,187135,187134,187133,187132,187131,187118,184920,180214,180178,180177,180176,180175,180174,173935,166505,148923,147494,147466,147464,136416,136415,126594,126429,115942,107030,103864,94623,92649,92648,89252,89250,63631,63221,54568,54500,54497,54366,54365,48784,46906,39097,39053,39051,37349,37348,37185,27445,27444,20678,13266,13140,9619,9275,9274,9273,8818,8817,8815,8813,8812,8811,8810,8367,3409
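The grep/cut/tr pipeline above depends on xmllint printing one Id element per line. The same list can be obtained by parsing the XML directly, which also avoids a trailing comma; a minimal sketch with an illustrative function name and a three-ID sample:

```python
import xml.etree.ElementTree as ET

# Tiny stand-in for the ESearch response; the IDs are the first three above.
SAMPLE = """<eSearchResult><Count>3</Count><IdList>
<Id>1860150</Id><Id>1860149</Id><Id>1849957</Id>
</IdList></eSearchResult>"""


def id_list(esearch_xml):
    """Return the UIDs of an eSearchResult as a comma-separated string."""
    root = ET.fromstring(esearch_xml)
    return ",".join(element.text for element in root.findall("./IdList/Id"))


if __name__ == "__main__":
    print(id_list(SAMPLE))
```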

Page 67: Advanced NCBI

EPost: Post UIDs to EPost

wget -O - "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/epost.fcgi?db=taxonomy&WebEnd=NCID_1_15435144_130.14.22.215_9001_1474637318_669113391_0MetA0_S_MegaStore_F_1&id=1860150,1860149,1849957,1825730,1825729,1636722,1607772..."

Note: the parameter name should be WebEnv. As typed here (WebEnd) it is not a recognized parameter, which is presumably why the output below carries a newly created WebEnv with QueryKey 1 rather than the one passed in.

Output:

<?xml version="1.0"?>
<!DOCTYPE ePostResult PUBLIC "-//NLM//DTD ePostResult, 11 May 2002//EN" "http://www.ncbi.nlm.nih.gov/entrez/query/DTD/ePost_020511.dtd">
<ePostResult>
  <QueryKey>1</QueryKey>
  <WebEnv>NCID_1_15467192_130.14.22.215_9001_1474637456_570452194_0MetA0_S_MegaStore_F_1</WebEnv>
</ePostResult>

Page 68: Advanced NCBI

EPost: Searching in the WebEnv

Search for Homo sapiens in the WebEnv?

$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Homo%20Sapiens&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"

<eSearchResult>
  <Count>0</Count>
  <RetMax>0</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>8</QueryKey>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList/>
  <TranslationSet/>
  <TranslationStack>
    <OP>GROUP</OP>
    <TermSet>
      <Term>homo sapiens[All Names]</Term>
      <Field>All Names</Field>
      <Count>0</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>AND</OP>
  </TranslationStack>
  <QueryTranslation>(#2) AND homo sapiens[All Names]</QueryTranslation>
</eSearchResult>

Page 69: Advanced NCBI

EPost: Searching in the WebEnv

Search for Tyrannosaurus in the WebEnv?

$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=taxonomy&term=Tyrannosaurus&usehistory=y&WebEnv=NCID_1_75550312_130.14.18.34_9001_1375948145_325582538&query_key=1"

<eSearchResult>
  <Count>1</Count>
  <RetMax>1</RetMax>
  <RetStart>0</RetStart>
  <QueryKey>9</QueryKey>
  <WebEnv>NCID_1_75550312_130.14.18.34_9001_1375948145_325582538</WebEnv>
  <IdList>
    <Id>436494</Id>
  </IdList>
  <TranslationSet/>
  <TranslationStack>
    <OP>GROUP</OP>
    <TermSet>
      <Term>Tyrannosaurus[All Names]</Term>
      <Field>All Names</Field>
      <Count>1</Count>
      <Explode>N</Explode>
    </TermSet>
    <OP>AND</OP>
  </TranslationStack>
  <QueryTranslation>(#2) AND Tyrannosaurus[All Names]</QueryTranslation>
</eSearchResult>

Page 70: Advanced NCBI

EDirect: combining tools

Page 71: Advanced NCBI

Piping EDirect

esearch -db taxonomy -query "Tyrannosaurus" | \
  efetch -format xml

Page 72: Advanced NCBI

Piping EDirect

esearch -db pubmed -query "Tyrannosaurus" | \
  efilter -mindate 2005 | \
  efetch -format docsum | \
  xtract -pattern DocumentSummary \
    -element MedlineCitation/PMID \
    -element Id SortFirstAuthor

Page 73: Advanced NCBI

Elink

Page 74: Advanced NCBI

Elink

Returns UIDs linked to an input set of UIDs in either the same or a different Entrez database

Returns UIDs linked to other UIDs in the same Entrez database that match an Entrez query

Checks for the existence of Entrez links for a set of UIDs within the same database

Lists the available links for a UID

Lists LinkOut URLs and attributes for a set of UIDs

Lists hyperlinks to primary LinkOut providers for a set of UIDs

Creates hyperlinks to the primary LinkOut provider for a single UID

Page 75: Advanced NCBI

Elink

Base URL: https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi

Page 76: Advanced NCBI

ELink: Searching the PubMed records associated with sequence gi:507866428

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/elink.fcgi?dbfrom=nucleotide&db=pubmed&id=507866428&cmd=neighbor_score

<eLinkResult>
  <LinkSet>
    <DbFrom>nuccore</DbFrom>
    <IdList>
      <Id>507866428</Id>
    </IdList>
    <LinkSetDb>
      <DbTo>pubmed</DbTo>
      <LinkName>nuccore_pubmed</LinkName>
      <Link>
        <Id>23766330</Id>
        <Score>0</Score>
      </Link>
    </LinkSetDb>
  </LinkSet>
</eLinkResult>
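The linked UIDs sit inside LinkSetDb/Link/Id, separate from the input UIDs in IdList. A sketch of pulling them out, assuming the layout of the response above; the function name `linked_ids` and its optional `linkname` filter are illustrative.

```python
import xml.etree.ElementTree as ET

# Copy of the ELink response shown above.
SAMPLE = """<eLinkResult><LinkSet><DbFrom>nuccore</DbFrom>
<IdList><Id>507866428</Id></IdList>
<LinkSetDb><DbTo>pubmed</DbTo><LinkName>nuccore_pubmed</LinkName>
<Link><Id>23766330</Id><Score>0</Score></Link></LinkSetDb>
</LinkSet></eLinkResult>"""


def linked_ids(elink_xml, linkname=None):
    """Return (target database, UID) pairs, optionally for one LinkName."""
    links = []
    for linkset in ET.fromstring(elink_xml).iter("LinkSetDb"):
        if linkname is not None and linkset.findtext("LinkName") != linkname:
            continue
        for link in linkset.findall("Link"):
            links.append((linkset.findtext("DbTo"), link.findtext("Id")))
    return links


if __name__ == "__main__":
    for db, uid in linked_ids(SAMPLE):
        print(db, uid)
```

Each returned UID can then be passed to EFetch against the target database, as the next command on this slide does for PubMed.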

$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&id=23766330&rettype=medline&retmode=text"

PMID- 23766330
TI  - Evolution of mammalian diving capacity traced by myoglobin net surface
      charge.
PG  - 1234192
LID - 10.1126/science.1234192 [doi]

Page 77: Advanced NCBI

Transformations

Page 78: Advanced NCBI

EFetch: Transforming to SVG

Using the stylesheet https://github.com/lindenb/xslt-sandbox/blob/master/stylesheets/bio/ncbi/gb2svg.xsl

xsltproc <(curl "https://raw.github.com/lindenb/xslt-sandbox/master/stylesheets/bio/ncbi/gb2svg.xsl") \
  "https://www.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=nucleotide&id=14971102&retmode=xml&rettype=gbc"

Page 79: Advanced NCBI

EFetch: Transforming to SVG

<?xml version="1.0" encoding="UTF-8"?>
<svg:svg xmlns:svg="http://www.w3.org/2000/svg" height="121" width="920" style="stroke-width:1px;">
  <svg:title>Human rotavirus segment 7 NSP3 gene, complete cds</svg:title>
  <svg:defs>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="grad">
      <svg:stop offset="5%" stop-color="black"/>
      <svg:stop offset="50%" stop-color="whitesmoke"/>
      <svg:stop offset="95%" stop-color="black"/>
    </svg:linearGradient>
    <svg:linearGradient x1="0%" y1="0%" x2="0%" y2="100%" id="vertical_body_gradient">
      <svg:stop offset="5%" stop-color="white"/>
      <svg:stop offset="95%" stop-color="lightgray"/>
    </svg:linearGradient>
  </svg:defs>
  <svg:style type="text/css"/>
  <svg:g>
    <svg:g transform="translate(0,0)">
      <svg:rect x="0" y="0" width="920" height="120" fill="url(#vertical_body_gradient)" stroke="black"/>
      <svg:text style="color:red;font-size:35px;" x="10" y="35">Human rotavirus segment 7 NSP3 gene, complete cds</svg:text>
      <svg:g>
        <svg:rect x="10" y="40" width="900" height="18" style="fill:url(#grad);stroke:black;" title="1..1074"/>
        <svg:text y="54" x="460" text-anchor="middle"><svg:tspan style="font-weight:bold;">source</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">organism</svg:tspan>:Human rotavirus A <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">mol_type</svg:tspan>:genomic RNA <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">strain</svg:tspan>:M <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">segment</svg:tspan>:7 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">clone</svg:tspan>:M0</svg:text>
      </svg:g>
      <svg:g>
        <svg:rect x="10" y="60" width="27.6794035414725" height="18" style="fill:url(#grad);stroke:black;" title="1..34"/>
        <svg:text y="74" x="39.6794035414725" text-anchor="start">
          <svg:tspan style="font-weight:bold;">5'UTR</svg:tspan>
        </svg:text>
      </svg:g>
      <svg:g>
        <svg:rect x="38.5181733457595" y="80" width="781.733457595526" height="18" style="fill:url(#grad);stroke:black;" title="35..967"/>
        <svg:text y="94" x="429.384902143523" text-anchor="middle"><svg:tspan style="font-weight:bold;">CDS</svg:tspan><svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">codon_start</svg:tspan>:1 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">product</svg:tspan>:NSP3 <svg:tspan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xlink="http://www.w3.org/1999/xlink" font-weight="bold">protein_id</svg:tspan>:AAK74116.1</svg:text>
      </svg:g>
      <svg:g>
        <svg:rect x="821.090400745573" y="100" width="88.909599254427" height="18" style="fill:url(#grad);stroke:black;" title="968..1074"/>
        <svg:text y="114" x="819.090400745573" text-anchor="end">
          <svg:tspan style="font-weight:bold;">3'UTR</svg:tspan>
        </svg:text>
      </svg:g>
      <svg:rect x="0" y="0" width="920" height="120" fill="none" stroke="black"/>
    </svg:g>
  </svg:g>
</svg:svg>


Page 80: Advanced NCBI

Efetch: Transforming to SVG


Page 81: Advanced NCBI

Efetch: Transforming to R

$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pubmed&term=Tyrannosaurus&usehistory=true" | xmllint --format -

$ curl -s "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml"
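
The second call reuses the WebEnv and query_key values returned by the first one. A minimal sketch of that hand-off in plain shell, assuming the ESearch answer carries <WebEnv> and <QueryKey> as attribute-free elements on a single line. The helper names xml_text and efetch_history_url are hypothetical, and the sample answer is trimmed and made up so that the block runs offline:

```shell
# Print the text of the first <TAG>...</TAG> element found on stdin.
# Assumes the element has no attributes and does not span several lines.
xml_text() {
    sed -n "s:.*<$1>\([^<]*\)</$1>.*:\1:p" | head -n 1
}

# Build the EFetch URL for a History server entry (hypothetical helper).
# $1 = db, $2 = WebEnv, $3 = query_key
efetch_history_url() {
    printf 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=%s&WebEnv=%s&query_key=%s&retmode=xml\n' "$1" "$2" "$3"
}

# Offline demo on an illustrative ESearch answer (not real NCBI output).
esearch_xml='<eSearchResult><Count>3</Count><QueryKey>1</QueryKey><WebEnv>NCID_EXAMPLE</WebEnv></eSearchResult>'
webenv=$(printf '%s\n' "$esearch_xml" | xml_text WebEnv)
query_key=$(printf '%s\n' "$esearch_xml" | xml_text QueryKey)
efetch_history_url pubmed "$webenv" "$query_key"
```

With network access, the sample string can be replaced by the output of the esearch.fcgi call. The E-utilities documentation spells the history flag usehistory=y.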


Page 82: Advanced NCBI

Efetch: Transforming to R

<?xml version='1.0' encoding="UTF-8" ?>
<xsl:stylesheet xmlns:xsl='http://www.w3.org/1999/XSL/Transform' version='1.0'>
<xsl:output method="text"/>


<xsl:template match="/">
date2count &lt;- list()
<xsl:apply-templates select="/PubmedArticleSet/PubmedArticle[MedlineCitation/DateCreated/Year]"/>
df &lt;- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()
</xsl:template>

<xsl:template match="PubmedArticle">
<xsl:variable name="year" select="MedlineCitation/DateCreated/Year"/>
date2count[["<xsl:value-of select="$year"/>"]] &lt;- ifelse(is.null(date2count[["<xsl:value-of select="$year"/>"]]),1,1+date2count[["<xsl:value-of select="$year"/>"]])
</xsl:template>

</xsl:stylesheet>
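
What the stylesheet computes is a tally of articles per DateCreated year. As a cross-check that needs neither xsltproc nor R, the same tally can be sketched with awk. This is a stand-in technique, not the XSLT approach of the slide: count_by_year is a hypothetical name, the sample records are made up, and the sketch assumes one XML element per line (for example after xmllint --format).

```shell
# Tally articles per DateCreated year from PubMed XML read on stdin.
# Assumes one element per line; <Year> elements outside <DateCreated> are ignored.
count_by_year() {
    awk '
        /<DateCreated>/   { in_date = 1 }
        /<\/DateCreated>/ { in_date = 0 }
        in_date && /<Year>/ {
            year = $0
            sub(/.*<Year>/, "", year)
            sub(/<\/Year>.*/, "", year)
            count[year]++
        }
        END { for (y in count) print y "\t" count[y] }
    ' | sort -n
}

# Made-up sample: three articles, one of them with an extra DateCompleted year.
sample='<PubmedArticleSet>
<PubmedArticle>
<DateCreated>
<Year>2013</Year>
</DateCreated>
<DateCompleted>
<Year>2014</Year>
</DateCompleted>
</PubmedArticle>
<PubmedArticle>
<DateCreated>
<Year>2012</Year>
</DateCreated>
</PubmedArticle>
<PubmedArticle>
<DateCreated>
<Year>2012</Year>
</DateCreated>
</PubmedArticle>
</PubmedArticleSet>'

printf '%s\n' "$sample" | count_by_year
```

Each output line holds a year and its count, separated by a tab.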


Page 83: Advanced NCBI

Efetch: Transforming to R

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" |\
  xsltproc pubmed2rstats.xsl -

date2count <- list()
date2count[["2013"]] <- ifelse(is.null(date2count[["2013"]]),1,1+date2count[["2013"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2012"]] <- ifelse(is.null(date2count[["2012"]]),1,1+date2count[["2012"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
date2count[["2011"]] <- ifelse(is.null(date2count[["2011"]]),1,1+date2count[["2011"]])
(..)
df <- data.frame(
  Year=as.integer(names(date2count)),
  Count=unlist(date2count)
  )
png('jeterpubmed.png')
plot(df)
title('pubmed: count(articles)=f(year)')
dev.off()


Page 84: Advanced NCBI

Efetch: Transforming to R

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=pubmed&usehistory=true&WebEnv=NCID_1_52434791_130.14.22.215_9001_1375957034_1619786167&query_key=1&retmode=xml" |\
  xsltproc pubmed2rstats.xsl - |\
  R --no-save


Page 85: Advanced NCBI

Generating a JAVA parser


Page 86: Advanced NCBI

Using the XML schema: XML Schema for dbSNP

ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns="http://www.ncbi.nlm.nih.gov/SNP/docsum" targetNamespace="http://www.ncbi.nlm.nih.gov/SNP/docsum" elementFormDefault="qualified" attributeFormDefault="unqualified">
  <xsd:element name="ExchangeSet">
    <xsd:annotation>
      <xsd:documentation>Set of dbSNP refSNP docsums, version 3.4</xsd:documentation>
    </xsd:annotation>
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="SourceDatabase" minOccurs="0">
          <xsd:complexType>
            <xsd:attribute name="taxId" type="xsd:int" use="required">
              <xsd:annotation>
                <xsd:documentation>NCBI taxonomy ID for variation</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="organism" type="xsd:string" use="required">
              <xsd:annotation>
                <xsd:documentation>common name for species used as part of database name.</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used in dbSNP.</xsd:documentation>
              </xsd:annotation>
            </xsd:attribute>
            <xsd:attribute name="gpipeOrgAbbr" type="xsd:string">
              <xsd:annotation>
                <xsd:documentation>organism abbreviation used within NCBI genome pipeline data dumps.</xsd:documentation>
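
Before generating code it can help to see which attributes the schema declares, since a binding compiler such as xjc typically exposes each of them as a property of a generated class. A rough sketch, assuming every <xsd:attribute ...> start tag sits on one line with name= directly followed by type=. The function name list_xsd_attributes is hypothetical; the sample lines are the four attribute declarations shown in the excerpt above.

```shell
# Print "name type" for each attribute declaration of an XML Schema read on stdin.
# Assumes the start tag fits on one line, with name= immediately before type=.
list_xsd_attributes() {
    sed -n 's/.*<xsd:attribute name="\([^"]*\)" type="\([^"]*\)".*/\1 \2/p'
}

# The four declarations from the schema excerpt.
schema_excerpt='<xsd:attribute name="taxId" type="xsd:int" use="required">
<xsd:attribute name="organism" type="xsd:string" use="required">
<xsd:attribute name="dbSnpOrgAbbr" type="xsd:string">
<xsd:attribute name="gpipeOrgAbbr" type="xsd:string">'

printf '%s\n' "$schema_excerpt" | list_xsd_attributes
```

Against a downloaded copy of the schema, the call would be list_xsd_attributes < docsum_3.4.xsd.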


Page 87: Advanced NCBI

Using the XML schema: Compiling the XML Schema for dbSNP with XJC

$ xjc -d . "ftp://ftp.ncbi.nlm.nih.gov/snp/specs/docsum_3.4.xsd"
parsing a schema...
compiling a schema...
https/www_ncbi_nlm_nih_gov/snp/docsum/Assay.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Assembly.java
https/www_ncbi_nlm_nih_gov/snp/docsum/BaseURL.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Component.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ExchangeSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/FxnSet.java
https/www_ncbi_nlm_nih_gov/snp/docsum/MapLoc.java
https/www_ncbi_nlm_nih_gov/snp/docsum/ObjectFactory.java
https/www_ncbi_nlm_nih_gov/snp/docsum/PrimarySequence.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Rs.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsLinkout.java
https/www_ncbi_nlm_nih_gov/snp/docsum/RsStruct.java
https/www_ncbi_nlm_nih_gov/snp/docsum/Ss.java
https/www_ncbi_nlm_nih_gov/snp/docsum/package-info.java


Page 88: Advanced NCBI

Using the XML schema: Compiling the XML Schema for dbSNP with XJC

Search the non-genomic rs# in dbSNP.

// javax.xml.bind (JAXB) is bundled with JDK 6 to 10;
// on JDK 11+ a JAXB implementation must be added to the classpath.
import https.www_ncbi_nlm_nih_gov.snp.docsum.*;
import javax.xml.bind.*;
import javax.xml.stream.*;
import javax.xml.stream.events.*;

class ParseDbSnp
  {
  public static void main(String[] args) throws Exception
    {
    // JAXB context for the package generated by xjc
    JAXBContext jaxbCtxt=JAXBContext.newInstance("https.www_ncbi_nlm_nih_gov.snp.docsum");
    Unmarshaller unmarshaller=jaxbCtxt.createUnmarshaller();
    // read the XML from stdin as a stream of events
    XMLInputFactory ifactory = XMLInputFactory.newInstance();
    XMLEventReader r= ifactory.createXMLEventReader(System.in);
    while(r.hasNext())
      {
      XMLEvent evt=r.peek();
      // skip every event that is not the start of an <Rs> element
      if(!(evt.isStartElement() && evt.asStartElement().getName().getLocalPart().equals("Rs")))
        {
        evt=r.nextEvent();
        continue;
        }
      // unmarshal one <Rs> element; this consumes its events
      Rs rs=unmarshaller.unmarshal(r,Rs.class).getValue();
      // keep only the non-genomic entries
      if("genomic".equals(rs.getMolType())) continue;
      System.out.println("rs"+rs.getRsId()+" "+rs.getMolType());
      }
    r.close();
    }
  }
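
For a quick sanity check of what the Java program should print, the same filter can be approximated with sed and awk. This is a text-level shortcut, not an XML parser: it assumes each <Rs ...> start tag fits on one line with rsId before molType, which is an assumption about the file layout and not something the schema guarantees. The function name non_genomic_rs is hypothetical and the sample tags are trimmed, illustrative lines.

```shell
# Print "rs<ID> <molType>" for every <Rs ...> start tag whose molType is not "genomic".
non_genomic_rs() {
    sed -n '/<Rs /s/.*rsId="\([^"]*\)".*molType="\([^"]*\)".*/rs\1 \2/p' |
        awk '$2 != "genomic"'
}

# Illustrative sample (attributes trimmed; rs702 is made up for the demo).
sample='<Rs rsId="701" snpClass="snp" molType="cDNA">
<Rs rsId="702" snpClass="snp" molType="genomic">
<Rs rsId="860" snpClass="snp" molType="cDNA">'

printf '%s\n' "$sample" | non_genomic_rs
```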


Page 89: Advanced NCBI

Using the XML schema: Compiling the XML Schema for dbSNP with XJC

compile...

$ javac ParseDbSnp.java https/www_ncbi_nlm_nih_gov/snp/docsum/*.java

and run...

$ curl -s "ftp://ftp.ncbi.nih.gov/snp/organisms/human_9606/XML/ds_ch1.xml.gz" |\
  gunzip -c |\
  java ParseDbSnp

rs701 cDNA
rs860 cDNA
rs861 cDNA
rs862 cDNA
rs863 cDNA
rs864 cDNA
rs865 cDNA
rs866 cDNA
rs877 cDNA
rs878 cDNA
rs879 cDNA
rs880 cDNA
rs882 cDNA
rs883 cDNA
rs884 cDNA
rs885 cDNA
rs886 cDNA
rs913 cDNA
rs945 cDNA
rs946 cDNA
(...)


Page 90: Advanced NCBI

NCBI EBot


Page 91: Advanced NCBI

NCBI EBot: URL

https://www.ncbi.nlm.nih.gov/Class/PowerTools/eutils/ebot/ebot.cgi


Page 92: Advanced NCBI

NCBI EBot: Sample output

#!/usr/bin/perl
(...)
# PUBLIC DOMAIN NOTICE
# National Center for Biotechnology Information
use LWP::Simple;
use LWP::UserAgent;
use Net::FTP;

my $delay = 0;
my $maxdelay = 3;
my $base = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/";

$params{email} = "nobody@nowhere.com";
$params{db} = "nuccore";
$params{tool} = "ebot";
$params{term} = "Mammuthus+primigenius[ORGN]";
%params = esearch(%params);

$params{retmode} = "xml";
$params{outfile} = "result.xml";
$params{rettype} = "native";
efetch_batch(%params);
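
The generated script searches once, then downloads the records in batches. The batching idea can be sketched in shell with the retstart and retmax parameters of EFetch. This is an illustration of the principle, not EBot's actual code: efetch_batch_urls is a hypothetical helper, NCID_EXAMPLE is a placeholder WebEnv, and the counts are made up.

```shell
# Print one EFetch URL per batch of records held on the History server.
# $1 = db, $2 = WebEnv, $3 = query_key, $4 = total record count, $5 = batch size
efetch_batch_urls() {
    base='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi'
    retstart=0
    while [ "$retstart" -lt "$4" ]; do
        printf '%s?db=%s&WebEnv=%s&query_key=%s&retstart=%s&retmax=%s&retmode=xml\n' \
            "$base" "$1" "$2" "$3" "$retstart" "$5"
        retstart=$((retstart + $5))
    done
}

# 1200 records in batches of 500: offsets 0, 500 and 1000.
efetch_batch_urls nuccore NCID_EXAMPLE 1 1200 500
```

Each printed URL could then be fetched with curl, with a pause between requests.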


Page 93: Advanced NCBI

BLAST


Page 94: Advanced NCBI

Standalone Blast: Downloading

Standalone tools are available at ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/

# add BLAST to your path
export PATH=${PATH}:/path/to/ncbi-blast-2.2.28+/bin


Page 95: Advanced NCBI

Standalone Blast: Download a sample

Apis mellifera proteins

curl -o protein.fa.gz \
  "ftp://ftp.ncbi.nih.gov/genomes/Apis_mellifera/protein/protein.fa.gz"

gunzip protein.fa.gz
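
A quick way to check the download is to count the FASTA records, since every record starts with a '>' header line. A minimal sketch; count_fasta is a hypothetical helper and the two demo records are made up:

```shell
# Count the records of a FASTA stream read on stdin (one '>' header per record).
count_fasta() {
    grep -c '^>'
}

# Offline demo on two made-up records.
printf '>seq1 demo\nMKV\nLLA\n>seq2 demo\nMST\n' | count_fasta
```

On the real file the call is count_fasta < protein.fa; the number should match the sequence count that makeblastdb reports when it builds the database.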


Page 96: Advanced NCBI

Standalone Blast: Create a Blast database with makeblastdb

Getting help...

$ makeblastdb -help
(...)
 -dbtype <String, `nucl', `prot'>
   Molecule type of target db
 -in <File_In>
   Input file/database name
   Default = `-'
 -input_type <String, `asn1_bin', `asn1_txt', `blastdb', `fasta'>
   Type of the data specified in input_file
   Default = `fasta'
(..)


Page 97: Advanced NCBI

Standalone Blast: Create a Blast database with makeblastdb

Create the BLAST database:

$ makeblastdb -in protein.fa -dbtype prot

Building a new DB, current time: 09/02/2013 18:29:38
New DB name: protein.fa
New DB title: protein.fa
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 10570 sequences in 1.84458 seconds.


Page 98: Advanced NCBI

Standalone Blast: Query a Blast database with blastp

Get help:

$ blastp -help
(...)
 -query <File_In>
   Input file name
   Default = `-'
 -db <String>
   BLAST database name
(...)


Page 99: Advanced NCBI

Standalone Blast: Blast human EIF4G1 gi:187956781

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |\
  blastp -db protein.fa

Query= gi|187956781|gb|AAI40897.1| EIF4G1 protein [Homo sapiens]
(...)

                                                                     Score     E
Sequences producing significant alignments:                         (Bits)  Value

gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation...     189    4e-49
gi|328779480|ref|XP_003249661.1| PREDICTED: hypothetical protei...    38.1    0.017
gi|110762568|ref|XP_001121713.1| PREDICTED: hypothetical protei...    38.1    0.018

(...)
> gi|328782175|ref|XP_394628.4| PREDICTED: eukaryotic translation
initiation factor 4 gamma 2-like [Apis mellifera]
Length=899

 Score = 189 bits (479), Expect = 4e-49, Method: Compositional matrix adjust.
 Identities = 115/319 (36%), Positives = 175/319 (55%), Gaps = 39/319 (12%)

Query  717  KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSI  774
            ++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR I
Sbjct  22   RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGI  73

Query  775  LNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL-----  829
            LNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L
Sbjct  74   LNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAA  133

Query  830  -MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLK  888
                K    E       F  LLL++C+ EFE      E FE +    DE    EE
Sbjct  134  NFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE-----  184
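
The default report is meant to be read by a human. For scripting, blastp can also write one tab-separated line per hit with -outfmt 6, an option the slide does not use. A small sketch that keeps only the hits below an E-value cutoff; filter_by_evalue and row are hypothetical helpers, and the demo rows reuse the subject IDs, E-values and bit scores of the hit list above while every other column is a 0 placeholder, not real BLAST output:

```shell
# Keep hits at or below an E-value cutoff from BLAST tabular output (-outfmt 6).
# Default columns: qseqid sseqid pident length mismatch gapopen
#                  qstart qend sstart send evalue bitscore
# $1 = maximum E-value; prints subject ID, E-value and bit score.
filter_by_evalue() {
    awk -F '\t' -v max="$1" '($11 + 0) <= (max + 0) { print $2 "\t" $11 "\t" $12 }'
}

# Build one illustrative 12-column row: $1 = subject ID, $2 = E-value, $3 = bit score.
q='gi|187956781|gb|AAI40897.1|'
row() { printf '%s\t%s\t0\t0\t0\t0\t0\t0\t0\t0\t%s\t%s\n' "$q" "$1" "$2" "$3"; }

hits=$(
    row 'gi|328782175|ref|XP_394628.4|'    4e-49 189
    row 'gi|328779480|ref|XP_003249661.1|' 0.017 38.1
    row 'gi|110762568|ref|XP_001121713.1|' 0.018 38.1
)

printf '%s\n' "$hits" | filter_by_evalue 1e-10
```

With the database built earlier, the real pipeline would end in blastp -db protein.fa -outfmt 6 | filter_by_evalue 1e-10.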

Page 100: Advanced NCBI

Standalone Blast: Blast human EIF4G1 gi:187956781, output XML

$ curl "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&rettype=fasta&id=187956781" |\
  blastp -db protein.fa -outfmt 5

(...)
<Hit_hsps>
  <Hsp>
    <Hsp_num>1</Hsp_num>
    <Hsp_bit-score>189.119</Hsp_bit-score>
    <Hsp_score>479</Hsp_score>
    <Hsp_evalue>3.78314e-49</Hsp_evalue>
    <Hsp_query-from>717</Hsp_query-from>
    <Hsp_query-to>1017</Hsp_query-to>
    <Hsp_hit-from>22</Hsp_hit-from>
    <Hsp_hit-to>319</Hsp_hit-to>
    <Hsp_query-frame>0</Hsp_query-frame>
    <Hsp_hit-frame>0</Hsp_hit-frame>
    <Hsp_identity>115</Hsp_identity>
    <Hsp_positive>175</Hsp_positive>
    <Hsp_gaps>39</Hsp_gaps>
    <Hsp_align-len>319</Hsp_align-len>
    <Hsp_qseq>KEPRKIIATVLMTEDIKLNKAEKAWKPSS--KRTAADKDRGEEDADGSKTQDLFRRVRSILNKLTPQMFQQLMKQVTQLAIDTEERLKGVIDLIFEKAISEPNFSVAYANMCRCL------MALKVPTTEKPTVTVNFRKLLLNRCQKEFEKDKDDDEVFEKKQKEMDEAATAEERGRLKEELEEARDIARRRSLGNIKFIGELFKLKMLTEAIMHDCVVKLL--------KNHDEESLECLCRLLTTIGKDLDFEKAKPRMDQYFNQMEKIIKEKKTSSRIRFMLQDVLDLRGSNWVPRRG--DQGPKTIDQIHKEAE</Hsp_qseq>
    <Hsp_hseq>RKPSETTVGLVIKDDIRSLSTEQRWIPPSTLRRDALTPE--------SRNNFIFRKVRGILNKLTPEKFAKLSNDLLNVELNSDVILKGVIFLIFEKALDEPKYSSMYAQLCKRLSDEAANFEPKKALIESQKGQSTFTFLLLSKCRDEFENRSKASEAFENQ----DELGPEEE---------ERRQVAKRKMLGNIKFIGELGKLGIVSETILHRCILQLLEKKRRRRSRGDTAEDIECLCQIMRTCGRILDSDKGRGLMDQYFKRMNSLAESRDLPLRIKFMLRDVIELRRDGWVPRKATSTEGPMPINQIRNDNE</Hsp_hseq>
    <Hsp_midline>++P +    +++ +DI+    E+ W P S  +R A   +        S+   +FR+VR ILNKLTP+ F +L   +  + ++++  LKGVI LIFEKA+ EP +S  YA +C+ L         K    E       F  LLL++C+ EFE      E FE +    DE    EE         E R +A+R+ LGNIKFIGEL KL +++E I+H C+++LL        +    E +ECLC+++ T G+ LD +K +  MDQYF +M  + + +    RI+FML+DV++LR   WVPR+    +GP  I+QI  + E</Hsp_midline>
  </Hsp>
(...)
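
The XML report is easier to mine than the text report because every value sits in its own element. A rough sketch that prints one summary line per HSP, assuming one element per line as in the listing above; hsp_summary is a hypothetical helper and the sample is a trimmed copy of the <Hsp> shown on this slide:

```shell
# Print "bit-score<TAB>evalue<TAB>identities/alignment-length" for each <Hsp>
# of a BLAST XML report (-outfmt 5) read on stdin.
hsp_summary() {
    awk '
        function text(line) {
            gsub(/<[^>]*>/, "", line)          # drop the tags
            gsub(/^[ \t]+|[ \t]+$/, "", line)  # trim the indentation
            return line
        }
        /<Hsp_bit-score>/ { bits   = text($0) }
        /<Hsp_evalue>/    { evalue = text($0) }
        /<Hsp_identity>/  { ident  = text($0) }
        /<Hsp_align-len>/ { alen   = text($0) }
        /<\/Hsp>/         { print bits "\t" evalue "\t" ident "/" alen }
    '
}

# Trimmed copy of the HSP above (sequences left out).
sample='<Hsp>
  <Hsp_num>1</Hsp_num>
  <Hsp_bit-score>189.119</Hsp_bit-score>
  <Hsp_evalue>3.78314e-49</Hsp_evalue>
  <Hsp_identity>115</Hsp_identity>
  <Hsp_align-len>319</Hsp_align-len>
</Hsp>'

printf '%s\n' "$sample" | hsp_summary
```

For anything beyond a quick look, a real XML tool (xsltproc, or a generated parser as in the dbSNP example) is the safer choice.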

Page 101: Advanced NCBI

NCBI URL-API Blast


Page 102: Advanced NCBI

NCBI URL-API Blast

https://www.ncbi.nlm.nih.gov/blast/Doc/urlapi.html

$ curl "https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Put&QUERY=PAERLMERKADIE&DATABASE=nr&PROGRAM=blastp&FILTER=L&HITLIST_SIZE=500"

(...)

<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->

(...)
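
The Put request only queues the search: the answer carries a request ID (RID) and an estimated waiting time in seconds (RTOE). A sketch of the next step, pulling both values out of the QBlastInfo comment and building the status URL. The helper names qblast_field and status_url are hypothetical; CMD=Get with FORMAT_OBJECT=SearchInfo is the status request described in the URL API documentation, and the sample page is the block shown above:

```shell
# Extract a "NAME = value" field (RID or RTOE) from the page returned by CMD=Put.
# $1 = field name; page text on stdin
qblast_field() {
    sed -n "s/^[[:space:]]*$1 = \([^[:space:]]*\).*/\1/p" | head -n 1
}

# URL that reports the search status for a given RID.
status_url() {
    printf 'https://www.ncbi.nlm.nih.gov/blast/Blast.cgi?CMD=Get&FORMAT_OBJECT=SearchInfo&RID=%s\n' "$1"
}

# Offline demo on the QBlastInfo block from the slide.
page='<!--QBlastInfoBegin
    RID = 1NRYGX9K014
    RTOE = 29
QBlastInfoEnd
-->'

rid=$(printf '%s\n' "$page" | qblast_field RID)
rtoe=$(printf '%s\n' "$page" | qblast_field RTOE)
echo "RID=$rid, first status check after about $rtoe seconds"
status_url "$rid"
```

A client would wait about RTOE seconds, fetch the status URL until the page reports Status=READY, then request the results with CMD=Get and the same RID.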


Page 103: Advanced NCBI

The End
