Post on 19-Feb-2018
USGENE Workshop Manual
© Fachinformationszentrum Karlsruhe, February 2008 Robert Austin FIZ Karlsruhe Inc 376 Carter Road Princeton, NJ 08540 Email: robert.austin@fiz-k.com Web: www.fiz-k.com
USGENE Workshop Manual
Contents
Page
Workshop Slides 1 Agenda 1 STN sequence searchable databases 2 USGENE® database content 3 The 7 basic steps of USGENE BLAST® 11 BLAST and Patent Family SORT (FSORT) 29 Post-processing BLAST search results 36 Sequence Code Match (SCM) searching with GETSEQ 47 GETSIM (FASTA) similarity searching 52 Offline BATCH search mode 58 Multifile searching with DGENE 60 Comparisons and conclusions 74 Appendices USGENE BLAST OPG post-processing example (see page 36) Multifile searching with DGENE STN Transcript (see page 60)
1
The USPTO Genetic Sequence Database, USGENE®, on STN
Robert Austin – FIZ Karlsruhe
2Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT) • Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• Similarity searching GETSIM (FASTA)• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusionsBLAST is a registered trademark of the U.S. National Library of Medicine (NLM)
2
3STN sequence searchable databases
• CAS REGISTRYSM
– Chemical Abstracts Service (CAS) Registry File• DGENE
– Thomson Scientific GENESEQTM
• PCTGEN– WIPO/PCT Patent Application Biosequences
• USGENE– The USPTO Genetic Sequence Database
4A new subject for many….
Bluff Your Way in Genetics!!http://www.stn-international.com/training_center/bioseq/bluff.pdf
3
5USGENE is the USPTO Genetic Sequence Database
• Sequences from all relevant USPTO published patent applications and granted (issued) patents
• Assignee and full inventor names; publication, application and parent case PCT numbers and dates; original publication title, abstract, and claims
• Organism name, sequence length, Molecule Type, SEQ ID, and feature tables for features/annotations
• Produced by the SequenceBase Corporation
• Updated weekly – within 3 days of publication
• 1982 – present
6USGENE consolidates unique USPTO sequence data from different sources
• USPTO Publication Site for Issued and Published Sequences (PSIPS)– The official mega-publication download site, 2001-date
• International Nucleotide Sequence Database Collaboration (INSDC) (NCBI/EMBL/DDBJ, Genbank)– U.S. granted patent nucleotide sequences, 1982-date
• USPTO Protein Database (NCBI/EMBL)– U.S. granted patent protein/peptide sequences, 1982-date
• USPTO Patents and Published Applications Full-Text– Filling in omissions, coverage gaps and to enhance timeliness
The USGENE Sequence Source (/SSO) field indicates which source any given USGENE sequence record was derived from.
4
7USGENE combines these sequences with bibliographic data and claims text
USPTO biblio, title, abstract
and claims text
USPTO PSIPS Sequences
INSDCUSPTO nucleotide
Sequences
NCBI/EMBL-EBIUSPTO peptide
Sequences
USPTO full-text sequences
8An individual publication is represented by one or more USGENE sequence records
AN .... Protein USGENEPI US …. B2SEQ 1 ….
AN .... DNA USGENEPI US …. B2SEQ 2 ….
AN .... cDNA USGENEPI US …. B2SEQ n ….
5
9Each USGENE sequence record includes full patent bibliography, title and abstract
L1 ANSWER 1 OF 1 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN AN 7255990.1 DNA USGENETI Method for screening genes expressing at desired sites (Patent)IN Higo Kenichi (Tsukuba, JP); Iwamoto Masao (Tsukuba, JP)PA National Institute of Agrobiological Sciences (Ibaraki JP)PI US 7255990 B2 20070814
US 20040086855 A1 20040506WO 2003044227 A 20030530
AI US 2001-221596 20011121RLI WO 2001-JP10195 20011121ED 20070817DT PatentAB The present invention relates to a method for inferring a plant
organ, in which a certain gene is to be expressed, using a part of a base sequence, a method for searching for a gene which is to be expressed at a desired site, and a composition, kit, system and program for carrying out these methods. The present invention also relates to a method for inferring a plant organ, in which a plant gene is to be expressed, based on information about the presence or absence of a base sequence which is highly similar to . . . .
(1) (2)(3)
(4)(5)
(6)
ALL display format.
See (1) - (7)on slide 12.
(7)USGENE records are typically available within 3 days of publication by the USPTO.
10Each USGENE sequence record includes patent or published application claims text
CLM US7255990 B2: 1. A method for detecting a gene which is expressed in a flower and other organs in a rice plant, comprising the steps of:(1)searching a gene population using a Tourist C transposon sequence consisting of SEQ ID NO: 1 as a key sequence,(2) selecting a gene having the transposon sequence in the vicinity of a putative protein coding region, and(3) detecting expression of said gene in the flower and other organs.
2. The method according to claim 1, wherein the expression of said gene includes expression of at least one site selected from a stamen and a pistil.
3. The method according to claim 1, wherein the gene population is alibrary and the key sequence is a probe sequence.
4. The method according to claim 3, wherein the database is a DNAlibrary.
5. The method according to claim 3, wherein the search . . . .
ALL display format (cont.)
(8)
6
11All USGENE sequences are provided in STN standardized format
SSO NUCLEIC; USPTO; GRANTED
ORGN Zea mays
SQL 352
SEQ
1 gggtctgttt agttcccaaa caaaattttt cacgctgtta cataggatgt
51 ttggacacat gcatagagta ctaaatgtag aaaaaaaaca attaaacatt
101 tcgccttgaa attacgagac aaatctttta agcctaattg cgccatgatt
151 tgacaatttg gtgctacaat aaatatttgc taataataga ttaattaggc
201 ttaataaatt cgtcttgcag tttccagacg gaatctgtaa tttattttat
251 gagatacagc tgcttcgatc ttccatcaca tattcagacc gtacctaatc
301 tgaaaggtta gtaatttgaa ctgcgtagta atgctacaag gtaaatcaat
351 ca
FEATURE TABLE:
Key |Location |
============+==========+=======================
misc_feature|(1)..(352)|
(9)(10)
(13)
(11)
(12)
See (8) - (13)on slide 13.
ALL display format (cont.)
12USGENE sample record annotations
1) USGENE Accession Number (AN), including the sequence identity number (SEQ ID NO)
2) Molecule Type (MTY)3) Original publication title – a “Published Application”
or “Patent” indication is given in parentheses4) Full inventor names, city and state/country5) Patent assignee name, city and state/country6) Publication, application and related PCT parent
case application details and dates7) Original patent or published application abstract
7
13USGENE sample record annotations
8) Published application or granted patent claims9) The Sequence Source (SSO) – nucleic or protein;
PSIPS/USPTO, NCBI, etc; granted or application10) Organism (where given) – providing the name of
the organism from which the sequence is derived11) Searchable and sortable Sequence Length (SQL)12) Standardized patent sequence (SEQ) – each
USGENE record is based upon a sequence13) Feature table including sequence modifications,
features and/or annotations, as provided by the patent applicant or assignee
14The original format of a USGENE sequence is available for display using the SEQO display
=> S 20070224666.21/ANL1 1 20070224666.21/AN
=> D TRI SEQO
L1 ANSWER 1 OF 1 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Alleles of the zwf gene from coryneform bacteria
(PublishedApplication)MTY DNASQL 1263SEQO
gtg gcc ctg gtc gta cag aaa tat ggc ggt tcc tcg ctt gag agt gcg 48Met Ala Leu Val Val Gln Lys Tyr Gly Gly Ser Ser Leu Glu Ser Ala 1 5 10 15gaa cgc att aga aac gtc gct gaa cgg atc gtt gcc acc aag aag gct 96Glu Arg Ile Arg Asn Val Ala Glu Arg Ile Val Ala Thr Lys Lys Ala 20 25 30 gga aat gat gtc gtg gtt gtc tgc tcc gca atg gga gac acc acg gat 144 Gly Asn Asp Val Val Val Val Cys Ser Ala Met Gly Asp Thr Thr Asp 35 40 45 . . . . . . .
Often the SEQO original format includes the patent applicant’s alignment of the nucleotide sequence coding region with its corresponding protein sequence.
USGENE Accession Numbers (/AN) comprise the publication number + the sequence identity number (SEQ ID NO).
8
15In contrast, NCBI/EMBL/DDBJ patent records
have minimal bibliographic and text data
Note: USPTO peptide sequence records available at EMBL (shown here), are also available a NCBI, but not at DDBJ.
16USGENE represents a new tool for tackling business critical searches
• DGENE and REGISTRY sequences are indexed by Thomson from the DWPISM basic and by CAS from the CAplusSM basic respectively– 65% of basics are PCT published applications
• USGENE provides sequences from both USPTO granted patents and published applications– Updated weekly, within 3 days of USPTO publication
• Sequence listing variation often occurs between published application and granted patent stage– Especially important, e.g. for freedom-to-operate
9
17USGENE provides sequences from both USPTO published applications and granted patents
AN .... Protein USGENEPI US …. A1SEQ 1 ….
AN .... DNA USGENEPI US …. A1SEQ 2 ….
AN .... Protein USGENEPI US …. B2SEQ 1 ….
AN .... DNA USGENEPI US …. B2SEQ 2 ….
AN .... Protein DGENEPI WO …. A1SEQ 1 ….
AN .... DNA DGENEPI WO …. A1SEQ 2 ….
WPINDEX = Derwent World Patents Index® on STN
DGENE = GENESEQTM on STN
USGENE® = USPTO Genetic Sequence Database
AN .... WPINDEX
PI WO ….. A1
FR ….. A1
EP ….. A1
US ….. A1
EP ….. B1
US ….. B2
In contrast, DGENE sequences are indexed from DWPI basic publications.
18Sequence listing variation often occurs between published application and granted patent stage
L1 ANSWER 1 OF 1 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN AN 1994-358278 [44] WPINDEXTI New polynucleotide(s) specific for hepatitis C virus types 4, 5 and 6 -
and related antigenic peptide(s) and antibodies, useful in vaccines,diagnosis, HCV typing and treatment
DC B04; D16; S03IN PIKE I H; SIMMONDS P; YAP P LPA (COMM-N) COMMON SERVICES AGENCY; (MURE-N) MUREX DIAGNOSTICS INT INC; . . . PI WO 9425602 A1 19941110 (199444)* EN 70[5]
AU 9465797 A 19941121 (199508) ENFI 9505224 A 19951220 (199611) FIEP 698101 A1 19960228 (199613) EN [0] JP 09500009 W 19970107 (199711) JA 52[0] AU 695259 B 19980813 (199844) ENEP 698101 B1 20041103 (200475) ENDE 69434116 E 20041209 (200481) DEUS 20050032047 A1 20050210 (200512) ENUS 6881821 B2 20050419 (200527) EN. . . . .
ADT WO 9425602 A1 WO 1994-GB957 19940505 . . . . PRAI GB 1994-263 19940107
GB 1993-9237 19930505
In this example the patent family has:
• 9 sequences from WO 9425602 in DGENE• 58 sequences from US 6881821 in USGENE
10
19USGENE covers a comprehensive variety of USPTO patent publication types
PK Patent Kind covered in USGENE (field /PK)
USA1 Published patent applicationUSA2 Republished patent applicationUSA9 Corrected published patent application USA Granted patent (until 2000)USB1 Granted patent without pre-grant publication (2001 onwards)USB2 Granted patent with pre-grant publication (2001 onwards)USE Reissued patentUSP1 Published plant patent applicationUSP2 Granted plant patent without pre-grant publicationUSP3 Granted plant patent with pre-grant publicationWOA WIPO/PCT published patent application (parent case data)
20Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT) • Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• Similarity searching GETSIM (FASTA)• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusions
11
21USGENE offers the same sequence search options as DGENE
• NCBI BLAST similarity– RUN BLAST
• FASTA similarity– RUN GETSIM
• Sequence Code Match (SCM)– RUN GETSEQ
• Offline BATCH and ALERT options
The DGENE Workshop Manual is the complete guide:http://www.stn-international.com/training_center/bioseq/dgene_wm.pdf
22The 7 basic steps of USGENE BLAST
1) SAVE, UPLOAD, and VERIFY the query (L1)2) RUN the BLAST search (/SQP or /SQN)3) Decide how many answers to keep (L2)4) SORT SCORE in Descending order (L3)5) Review answers in a free-of-charge format
e.g. D L3 TRI ORGN ALIGN 1-6) Display selected answers in bibliographic
format, e.g. D L3 BIB AB CLM ALIGN 1,3,107) Ensure transcript was captured and Logoff
12
23The 7 basic steps of USGENE BLAST
Search Question:Find relevant U.S. published application and patent references for this protein sequence:
1 vqtvplsrlf dhamleahra helaidtyqe feetyipkdq kysflhdsqt51 sfcfsdsipt psnmeetqqk snlellrisl llieswlepv rflrsmfann
101 lvydtsdsdd yhllkdleeg iqtlmgrled gsrrtgqilk qtyskfdtns151 hnhdallkny gllycfrkdm dkvetflrmv qcrsvegscg f
24The 7 basic steps of USGENE BLAST
1) SAVE, UPLOAD, and VERIFY the sequence query text file (L1)
Upload options• STN Express®: Use UPLOAD command or Upload
Query Wizard (STN Express 8.2+)• STN® on the WebSM: Use Upload feature or
Sequence Assistant (link below)Verify the sequence with D LQUE
STN on the Web Sequence Search Assistant:http://www.stn-international.com/training_center/bioseq/seq_se_ass.pdf
13
25Requirements for sequences for the STN Express Upload Query Wizard
• Sequence queries must be saved individually in text (.txt) format
• Files may – Be 3 letter codes (amino acids) or single letter – Have header information as seen in, e.g. WIPO
ST.25, USPTO PSIPS or EMBL formats– Include sequence count numbers
• Query (.txt) files must– Be 10,000 characters or less
• After upload to STN verify with D LQUE
26Examples of formats that work
DETD SEQUENCE CHARACTERISTICS:
SEQ ID NO: 4
LENGTH: 724
TYPE: PRT
ORGANISM: Artificial Sequence
FEATURE:
OTHER INFORMATION: Description of Artificial Sequence; Note = synthetic construct
SEQUENCE: 4
Met Ser Phe Val Asp His Pro Pro Asp Trp Leu Glu Glu Val Gly Glu
1 5 10 15
Gly Leu Arg Glu Phe Leu Gly Leu Glu Ala Gly Pro Pro Lys Pro Lys
20 25 30
<210> SEQ ID NO 137
<211> LENGTH: 951
<212> TYPE: DNA
<213> ORGANISM: Zea mays
<400> SEQUENCE: 137
accgaggccg acttcccgtt cactggccac gacgggacgt gcgatctcaa actgaaaaat 60
acaagggttg tatccataga ttcgttcgag cgtgtgccca tcaactacga gagagcgctg 120
cagaaggccg tggcgcacca gcctgttagt gccagcattg aagcatctcg gcgcgcgttc 180
cagctctaca gttctggcat cttcgacggg agatgcggga cgtacctgga ccacggtgtg 240
USPTO PSIPS ST.25 format
USPATFULL/USPAT2 format
14
27a) Choose the Upload Query Wizard
OR
From the Discover! button menu.
From the Select Discover! Wizard window.
28b) Browse to locate sequence file
Click Next button to go to the next step.
16
31e) Select STN file to upload to
Use USGENE to upload queries and verify them (lower connect hour). The resulting L-numbers may be searched in DGENE, PCTGEN, or USGENE.
Click Finish for the file to be “scrubbed”and uploaded to STN.
321) SAVE, UPLOAD and VERIFY (cont.)
=> FILE USGENE
=> UPL R BLAST
UPLOAD SUCCESSFULLY COMPLETEDL1 GENERATED
=> D L1 LQUE
L1 ANSWER 1 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN LQUE vqtvplsrlfdhamleahrahelaidtyqefeetyipkdqkysflhdsqtsfcfsdsi
ptpsnmeetqqksnlellrislllieswlepvrflrsmfannlvydtsdsddyhllkdleegiqtlmgrledgsrrtgqilkqtyskfdtnshnhdallknygllycfrkdmdkvetflrmvqcrsvegscgf
=>The sequence query is now ready for searching directly in USGENE using the L-number (L1).
These commands are automatically run by the STN Express Sequence Query Upload wizard.
17
33The 7 basic steps of USGENE BLAST
2) RUN the BLAST searchProtein search: RUN BLAST L1 /SQPNucleotide search: RUN BLAST L1 /SQNTranslated search: RUN BLAST L1 /TSQN
342) RUN the USGENE BLAST search
=> FILE USGENE
FILE 'USGENE' ENTERED AT 12:09:16 ON 03 FEB 2008COPYRIGHT (C) 2008 SEQUENCEBASE CORP
FILE LAST UPDATED: 1 FEB 2008 <20080201/UP>MOST RECENT PUBLICATION DATE: 31 JAN 2008 <20080131/PD>
FILE COVERS 1982 TO DATE
>>> SIMULTANEOUS LEFT AND RIGHT TRUNCATION (SLART) IS AVAILABLEIN THE BASIC INDEX (/BI) AND FEATURE TABLE (/FEAT) FIELDS <<<
=> RUN BLAST L1 /SQP -F F
BLAST Version 2.2
The BLAST software is used herein with permission of the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). See also, . . . .
BLAST SEARCHING . . . .
Turn the Low Complexity Filter off with the syntax… /SQP –F F
USGENE is updated within 3 days of publication by the USPTO.
18
35RUN BLAST command syntax
Similarity Searching with BLAST (protein/polypeptides)
=> RUN BLAST L1 (sequence or L-number)/SQP (protein) (default)
-e (Expect-value)-f (Filter) (on by default)-w (Word size)-m (Matrix)-g (Gap penalty)-x (Gap extension)
BATCH (offline)ALERT (Alert/SDI)
36RUN BLAST command syntax
Similarity Searching with BLAST (Nucleic acids)=> RUN BLAST L1 (sequence or L-number)
/SQN (nucleotide)SIN (single strand)COM (complementary strand)BOTH (both strands) (default)
-e (Expect-value)-f (Filter)-w (Word size)-g (Gap penalty)-x (Gap extension)-q (penalty for mismatch)-r (reward for match)
BATCH (offline)ALERT (Alert/SDI)
19
37RUN BLAST advanced options
Expectation Value (-E)Expectation value (E-Value) is the statistical significance threshold for reporting matches against a sequence database. The E-value can be any positive number, and the default value is 10. This means that 10 matches may be expected to be found merely by chance. In general E-value is lowered to make the search more precise and raised to retrieve more answers. Word Size (-W)Word Size is the length of the character string fragments of a sequence query which are used as the basis for a BLAST search. For SQN the default is 11 and the range 7-23. For all other BLAST searches the default is 3 and the range 2-3. For short search queries, reducing the default word size can give improved search results.
38RUN BLAST advanced options (cont.)
Low Complexity Filtering (on by default) (-F)The low complexity filter can eliminate biologically uninteresting segments that have low compositional complexity and are statistically significant, as determined by specific programs for peptide or nucleotide sequences in nature. Filtering is applied to the query sequence and is indicated by a series of Xs for peptide sequences and Ns for nucleotide sequences. Low complexity filtering can be turned off (i.e. set to F - false). Peptide similarity matrices (-M)For peptide based searches SQP and TSQN the advanced options provide additional scoring matrices to the default BLOSUM62 (next slide)
20
39Guidelines from NCBI on the use of Advanced Settings for peptide sequence
searching are as follows:
Query Length Matrix Gap costs
<35 PAM-30 (9,1)
35 – 50 PAM-70 (10,1)
50 – 85 BLOSUM-80 (10,1)
>85 BLOSUM-62 (11,1) (BLAST default)
40The 7 basic steps of USGENE BLAST
3) Decide how many answers to keep (L2)How many answers would you like to keep? (ALL) or ?:Recommendation: Keep ALL answers
21
41
1350 ANSWERS FOUND BELOW EXPECTATION VALUE OF 10.0
SimilarityScore
390 | | | ||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||
195 |||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Answer Count 270 540 810 1080 1350
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALL
3) Decide how many answers to keep
The graphic representation gives a count of hit sequences (x-axis) and similarity score (y-axis). The graph gives a visual clue about the proportion of similar and not so similar sequences in the answer set.
Recommendation: keep ALL answers
42The 7 basic steps of USGENE BLAST
4) SORT by SCORE descending (L3)SOR L2 SCORE DOption: limit using text terms and/or dates (L4)Remember to SORT L4 SCORE D !! (L5)
22
434) SORT by SCORE descending
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALLL2 RUN STATEMENT CREATED
L2 1350 VQTVPLSRLFDHAMLEAHRAHELAIDTYQEFEETYIPKDQKYSFLHDSQT
SFCFSDSIPTPSNMEETQQKSNLELLRISLLLIESWLEPVRFLRSMFANN
LVYDTSDSDDYHLLKDLEEGIQTLMGRLEDGSRRTGQILKQTYSKFDTNS
HNHDALLKNYGLLYCFRKDMDKVETFLRMVQCRSVEGSCGF/SQP.-F F
Answer set arranged by accession number; to sort by descending
similarity score, enter at an arrow prompt (=>) "sor score d".
=> SOR SCORE DPROCESSING COMPLETED FOR L2
L3 1350 SOR L2 SCORE DUse SORT SCORE D to sort by descending BLAST score.
44The 7 basic steps of USGENE BLAST
5) Review answers using a free-of-charge format including alignment (ALIGN), while “parked” in the STNGUIDESM file
D L5 TRI ORGN ALIGN 1-FILE STNGUIDE
23
455) Review answers with a free-of-charge format including alignment
=> D L3 TRI ORGN ALIGN 1-30; FILE STNGUIDE
L3 ANSWER 1 OF 1350 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Recombinant DNA transfer vectors (Patent)MTY ProteinSQL 191ORGN UnknownBLASTALIGN
Query = 191 lettersLength = 191Score = 390 bits (1001), Expect = e-113Identities = 191/191 (100%), Positives = 191/191 (100%)Query: 1 VQTVPLSRLFDHAMLEAHRAHELAIDTYQEFEETYIPKDQKYSFLHDSQTSFCFSDSIPT
VQTVPLSRLFDHAMLEAHRAHELAIDTYQEFEETYIPKDQKYSFLHDSQTSFCFSDSIPTSbjct: 1 VQTVPLSRLFDHAMLEAHRAHELAIDTYQEFEETYIPKDQKYSFLHDSQTSFCFSDSIPT Query: 61 PSNMEETQQKSNLELLRISLLLIESWLEPVRFLRSMFANNLVYDTSDSDDYHLLKDLEEG
PSNMEETQQKSNLELLRISLLLIESWLEPVRFLRSMFANNLVYDTSDSDDYHLLKDLEEGSbjct: 61 PSNMEETQQKSNLELLRISLLLIESWLEPVRFLRSMFANNLVYDTSDSDDYHLLKDLEEG . . . .
This top hit comes from a U.S. issued patent.
465) Review answers with a free-of-charge format including alignment
L3 ANSWER 5 OF 1350 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Genetic polymorphisms associated with myocardial infarction, methods
of detection and uses thereof (PublishedApplication)MTY ProteinSQL 217ORGN Homo SapiensBLASTALIGN
Query = 191 lettersLength = 217Score = 387 bits (995), Expect = e-113Identities = 189/191 (98%), Positives = 191/191 (100%)Query: 1 VQTVPLSRLFDHAMLEAHRAHELAIDTYQEFEETYIPKDQKYSFLHDSQTSFCFSDSIPT
VQTVPLSRLFDHAML+AHRAH+LAIDTYQEFEETYIPKDQKYSFLHDSQTSFCFSDSIPTSbjct: 1 VQTVPLSRLFDHAMLQAHRAHQLAIDTYQEFEETYIPKDQKYSFLHDSQTSFCFSDSIPT Query: 61 PSNMEETQQKSNLELLRISLLLIESWLEPVRFLRSMFANNLVYDTSDSDDYHLLKDLEEG
PSNMEETQQKSNLELLRISLLLIESWLEPVRFLRSMFANNLVYDTSDSDDYHLLKDLEEGSbjct: 61 PSNMEETQQKSNLELLRISLLLIESWLEPVRFLRSMFANNLVYDTSDSDDYHLLKDLEEG Query: 121 IQTLMGRLEDGSRRTGQILKQTYSKFDTNSHNHDALLKNYGLLYCFRKDMDKVETFLRMV
IQTLMGRLEDGSRRTGQILKQTYSKFDTNSHNHDALLKNYGLLYCFRKDMDKVETFLRMVSbjct: 147 IQTLMGRLEDGSRRTGQILKQTYSKFDTNSHNHDALLKNYGLLYCFRKDMDKVETFLRMV . . . .
The 5th from top hit comes from a U.S. published application.
BLAST alignment details are explained on the next slide. . . .
24
47Understanding BLAST alignments
Query the length of the query sequenceLength the length of the answer sequenceScore a relative score assigned by BLASTExpect Expectation Value – a value representing the
chance that an answer is a random hit. The closer to zero, the less likely the hit is random
Identities the number of exact letter matches between query and answer within the displayed local alignment. The amino acid letter is repeated* in the display
Positives a combination of identities and amino acid family matches shown with + (plus) in the alignment
Gaps shown as dashes - where BLAST must break the query or answer to maintain an alignment
(* For nucleic acid searches a vertical bar is used to indicate nucleotide identities in the alignment display.)
48USGENE provides text search options for refining sequence searches
• The USGENE default text search index – known on STN as the Basic Index (/BI) – comprises– Original publication Title (/TI) and abstract (/AB)– Organism name (/ORGN) and Molecule Type (/MTY)
• The Exemplary Claim (/ECLM) and Feature Table (/FEAT) can also be added to a search– Either specify the fields: => S VIRUS/BI,FEAT– Or use SET SFIELDS: => SET SFIELDS BI ECLM
• The Basic Index and Feature Table both offer simultaneous left and right truncation (SLART)
25
49USGENE provides bibliographic search options for refining sequence searches
• Patent Assignee (/PA) and Inventor (/IN)– Examples: GLAXO/PA, SMITH JOHN/IN
• Granted or application Sequence Source (/SSO)– Examples: APPLICATION/SSO, GRANTED/SSO
• Publication date (/PD) or publication year (/PY)– Examples: PY < 2001, PD < 1 Mar 1995
• Application date (/AD) or application year (/AY)– Examples: AY < 2002, AD < 1 Mar 1998
• WO application date (/RLD) or year (/RLY)– Examples: RLY < 1993, RLD < 1 Aug 1986
50Option: refine USGENE BLAST results with text and/or date search terms
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALLL2 RUN STATEMENT CREATEDL2 1350 VQTVPLSRLFDHAMLEAHRAHELAIDTYQEFEETYIPKDQKYSFLHDSQT
SFCFSDSIPTPSNMEETQQKSNLELLRISLLLIESWLEPVRFLRSMFANNLVYDTSDSDDYHLLKDLEEGIQTLMGRLEDGSRRTGQILKQTYSKFDTNSHNHDALLKNYGLLYCFRKDMDKVETFLRMVQCRSVEGSCGF/SQP.-F F
Answer set arranged by accession number; to sort by descendingsimilarity score, enter at an arrow prompt (=>) "sor score d".
=> SOR SCORE DPROCESSING COMPLETED FOR L2 L3 1350 SOR L2 SCORE D
=> S L2 AND SOMATOMAMMOTROPIN/BI,ECLM AND AY<1996 AND GRANTED/SSOL4 7 L2 AND SOMATOMAMMOTROPIN AND AY<1996 AND GRANTED/SSO
=> SOR SCORE DPROCESSING COMPLETED FOR L4 L5 7 SOR L4 SCORE D
If you limit using text and/or date terms remember to SORT SCORE D again!
The BLAST search (L2) is further refined to sequences from granted patents, with application year prior to 1996, and to a specific text search term (L4).
26
51The 7 basic steps of USGENE BLAST
6) Display selected relevant answers in a bibliographic format including alignment
D L5 BIB AB CLM ALIGN 1 5 67) Ensure your STN Express session transcript
was captured and then logoff
526) Display selected USGENE answers in a preferred bibliographic format
=> D BIB AB CLM ORGN SSO ALIGN 1 3 5
L5 ANSWER 1 OF 7 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN AN 4363877.1 Protein USGENETI Recombinant DNA transfer vectors (Patent)IN Goodman Howard M. (San Francisco, CA); Shine John (San Francisco, CA);
Seeburg Peter H. (San Francisco, CA)PA The Regents of the University of California(Berkeley CA)PI US 4363877 A 19821214AI US 1978-897710 19780419AB Recombinant DNA transfer vectors containing codons for human
somatomammotropin and for human growth hormone.
CLM US4363877 A: What is claimed is:1. A recombinant DNA transfer vector comprising codons for humanchorionic somatomammotropin comprising the nucleotide . . . .
ORGN UnknownSSO PROTEIN; EMBL; GRANTED
BLASTALIGN . . . .
This sequence hit comes from a U.S. granted patent, with an application date prior to 1996, and a key concept in the abstract and claims.
Note: this USGENE sequence record, sourced from EMBL, is an example of one which is not indexed in DGENE or REGISTRY.
27
53Useful USGENE display fields/formats
TRIAL* Title, Molecule Type, Sequence LengthSCAN* Random TitleALIGN* BLAST/GETSIM Sequence AlignmentSCORE* Similarity Score (for post-processing)BIB Inventors, Assignees, numbers, datesAB Original abstractECLM Exemplary (1st) claim textCLM All claims textBRIEF BIB + AB + ECLM, sequence, sequence
source (SSO), feature table (FEAT)ALL BRIEF with CLM instead of ECLM
(* Free of charge display formats in USGENE.)
54The importance of using the correct BLAST advanced options
=> RUN BLAST GSSFLSPEHQR/SQP
BLAST Version 2.2 . . . .
NO ANSWERS FOUND BELOW THRESHOLD OF 10
=> RUN BLAST GSSFLSPEHQR/SQP -M PAM30 –W 2 –E 1000 –F F
BLAST Version 2.2 . . . .
712 ANSWERS FOUND BELOW EXPECTATION VALUE OF 1000.0
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALLL1 RUN STATEMENT CREATEDL1 712 GSSFLSPEHQR/SQP.-M PAM30 –W 2 –E 1000 –F F
Answer set arranged by accession number; to sort by descendingsimilarity score, enter at an arrow prompt (=>) "sor score d".
Changing BLAST options is especially important for short sequence queries!
28
55The importance of using the correct BLAST advanced options (cont.)
=> SOR L1 SCORE DPROCESSING COMPLETED FOR L1 L2 712 SOR L1 SCORE D
=> D TRI ALIGN
L2 ANSWER 1 OF 712 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Fluorescently labeled growth hormone secretagogue (Patent)MTY ProteinSQL 18BLASTALIGN
Query = 11 lettersLength = 18Score = 37.5 bits (81), Expect = 1e-09Identities = 11/11 (100%), Positives = 11/11 (100%)Query: 1 GSSFLSPEHQR 11
GSSFLSPEHQRSbjct: 1 GSSFLSPEHQR 11
Correct use of BLAST options finds relevant sequence hits.
56Review: 7 steps of USGENE BLAST
1) SAVE, UPLOAD, and VERIFY the query (L1)2) RUN the BLAST search (/SQP or /SQN)3) Decide how many answers to keep (L2)4) SORT SCORE in Descending order (L3)5) Review answers in a free-of-charge format
e.g. D L3 TRI ORGN ALIGN 1-6) Display selected answers in bibliographic
format, e.g. D L3 BIB AB CLM ALIGN 1,3,107) Ensure transcript was captured and Logoff
29
57Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT)• Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• Similarity searching GETSIM (FASTA)• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusions
58USGENE answer sets may be grouped by source
publications using Family SORT (FSORT)
• FSORT gathers multiple sequence hits from the same applications together via publication, application and/or WO/PCT related application numbers
• FSORT organizes answers into two subgroups: multiple sequence hit (multi-record) families and single sequence hit (individual-record) families
• When FSORT is used on an answer set previously sorted by similarity SCORE, the two FSORT subgroups each separately retain their similarity sort order
• FSORT makes it possible to review, e.g. just the most similar sequence answer for each application retrieved, or all the sequences from a single application
30
59USGENE answer sets may be grouped by source
publications using Family SORT (FSORT)
Search Question:Find all relevant U.S. published application and patent references with sequences similar to the Banana Bunchy Top Virus (BBTV) Replication Initiation Protein (NCBI: AAG44003).
60Banana Bunchy Top Virus (BBTV) Replication
Initiation Protein (NCBI: AAG44003)
31
61SAVE, UPLOAD and VERIFY
There are 17 sequence records in DGENE for CA2325774.
=> FILE USGENE
=> UPL R BLAST
UPLOAD SUCCESSFULLY COMPLETEDL1 GENERATED
=> D L1 LQUE
L1 ANSWER 1 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN LQUE MSSFKWCFTLNYSSAAEREDFLALLKEEELNYAVVGDEVAPSSGQKHLQGYLSLKKSIK
LGGLKKKYSSRAHWERARGSDEDNAKYCSKETLILELGFPASQGSNRRKLSEMVSRSPERMRIEQPEIYHRYTSVKKLKKFKEEFVHPCLDRPWQIQLTEAIDEEPDDRSIIWVYGPNGNEGKSTYAKSLMKKDWFYTRGGKKENILFSYVDEGSEKHIVFDIPRCNQDYLNYDVIE ALKDRVIESTKYKPIKLVELINIHVIVMANFMPEFCKISEDRIKIIYC
=>
These commands are automatically run by the STN Express Sequence Query Upload wizard (slides 27-31).
The sequence query is now ready for searching directly in USGENE using the L-number (L1).
62RUN the USGENE BLAST search
=> FILE USGENE
FILE 'USGENE' ENTERED AT 22:44:51 ON 03 FEB 2008COPYRIGHT (C) 2008 SEQUENCEBASE CORP
FILE LAST UPDATED: 1 FEB 2008 <20080201/UP>MOST RECENT PUBLICATION DATE: 31 JAN 2008 <20080131/PD>
FILE COVERS 1982 TO DATE
>>> SIMULTANEOUS LEFT AND RIGHT TRUNCATION (SLART) IS AVAILABLEIN THE BASIC INDEX (/BI) AND FEATURE TABLE (/FEAT) FIELDS <<<
=> RUN BLAST L1 /SQP -F F
BLAST Version 2.2
The BLAST software is used herein with permission of the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). See also, . . . .
BLAST SEARCHING . . . .
Turn the Low Complexity Filter off with the syntax… /SQP –F F
USGENE is updated within 3 days of publication by the USPTO.
32
63Decide how many answers to keep
57 ANSWERS FOUND BELOW EXPECTATION VALUE OF 10.0
SimilarityScore
520 | | | | | | | || || ||
260 || ||| ||| ||| ||||| ||||| ||||| |||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Answer Count 10 20 30 40 50
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALL
The graphic representation gives a count of hit sequences (x-axis) and similarity score (y-axis). The graph gives a visual clue about the proportion of similar and not so similar sequences in the answer set.
Recommendation: keep ALL answers
64SORT by SCORE descending
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?:ALLL2 RUN STATEMENT CREATEDL2 57 MSSFKWCFTLNYSSAAEREDFLALLKEEELNYAVVGDEVAPSSGQKHLQG
YLSLKKSIKLGGLKKKYSSRAHWERARGSDEDNAKYCSKETLILELGFPASQGSNRRKLSEMVSRSPERMRIEQPEIYHRYTSVKKLKKFKEEFVHPCLDRPWQIQLTEAIDEEPDDRSIIWVYGPNGNEGKSTYAKSLMKKDWFYTRGGKKENILFSYVDEGSEKHIVFDIPRCNQDYLNYDVIEALKDRVIESTKYKPIKLVELINIHVIVMANFMPEFCKISEDRIKIIYC/SQP.-F F
Answer set arranged by accession number; to sort by descendingsimilarity score, enter at an arrow prompt (=>) "sor score d".
=> SOR SCORE DPROCESSING COMPLETED FOR L2 L3 57 SOR L2 SCORE D
=> SET FORMAT .MYUSGENE BIB AB ECLM ORGN SQL SCORE ALIGNSET COMMAND COMPLETED
=> SET DFORMAT .MYUSGENESET COMMAND COMPLETED
Option: set a customized display format with SET FORMAT. The new format may be set as the file default with SET DFORMAT.
33
65Display selected USGENE answers using the new customized default display format
=> D 1-2
L3 ANSWER 1 OF 57 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN AN 5846705.16 Protein USGENETI Nucleotide sequence of two circular SSDNA associated with banana
bunchy top virus and method for detection of banana bunchy top virus (Patent)
IN Wu Rey-Yuh (Taipei, TW); You Li-Ru (Taipei, TW); Soong Tai-Seng(Taipei,TW)
PA Development Center for Biotechnology (Taipei TW)PI US 5846705 A 19981208AI US 1995-418071 19950406AB Nucleotide sequences of two circular single-stranded DNAs . . . .ECLM US5846705 A: 1. An isolated DNA molecule comprising a . . . .ORGN UnknownSQL 286SCORE 520 BLASTALIGN
Query = 284 lettersLength = 286Score = 520 bits (1338), Expect = e-152Identities = 247/282 (87%), Positives = 268/282 (94%)
Query: 3 SFKWCFTLNYSSAAEREDFLALLKEEELNYAVVGDEVAPSSGQKHLQGYLSLKKSIKLGG S KWCFTLNYSSAAERE+FL+LLKEE+++YAVVGDEVAP++GQKHLQGYLSLKK I+LGG
Sbjct: 5 SLKWCFTLNYSSAAERENFLSLLKEEDVHYAVVGDEVAPATGQKHLQGYLSLKKRIRLGG . . . . .
The top hit is SEQ ID 16 from US5846705.
66The second hit sequence comes from the same U.S. patent as the top hit
L3 ANSWER 2 OF 57 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN AN 5846705.17 Protein USGENETI Nucleotide sequence of two circular SSDNA associated with banana
bunchy top virus and method for detection of banana bunchy top virus (Patent)
IN Wu Rey-Yuh (Taipei, TW); You Li-Ru (Taipei, TW); Soong Tai-Seng(Taipei, TW)
PA Development Center for Biotechnology(Taipei TW)PI US 5846705 A 19981208AI US 1995-418071 19950406AB Nucleotide sequences of two circular single-stranded DNAs . . . .ECLM US5846705 A: 1. An isolated DNA molecule comprising a nucleotide
sequence encoding a polypeptide comprising amino acid . . . .ORGN UnknownSQL 285SCORE 340 BLASTALIGN
Query = 284 lettersLength = 285Score = 340 bits (872), Expect = 2e-98Identities = 171/288 (59%), Positives = 217/288 (74%), Gaps = 7/288Query: 1 MSSFKWCFTLNYSSAAEREDFLALLKEEELNYAVVGDEVAPSSGQKHLQGYLSLKKSIKL
MSSFKWCFTLNYSSAAEREDFLALLKEE+++Y+VVGDEVAP++GQKHL GYLSLKKSI+LSbjct: 1 MSSFKWCFTLNYSSAAEREDFLALLKEEDVHYSVVGDEVAPATGQKHLGGYLSLKKSIRL
. . . . .
The 2nd hit is SEQ ID 17 from US5846705.
34
67USGENE answer sets may be grouped by source
publications using Family SORT (FSORT)=> FSORT L3
SEL L3 1- PN,APPSL4 SEL L3 1- PN APPS : 45 TERMS
'L4' DELETEDL4 57 FSO L3
12 Multi-record Families Answers 1-50Family 1 Answers 1-3Family 2 Answers 4-5Family 3 Answers 6-7Family 4 Answers 8-13Family 5 Answers 14-19Family 6 Answers 20-25Family 7 Answers 26-31Family 8 Answers 32-37Family 9 Answers 38-39Family 10 Answers 40-44Family 11 Answers 45-47Family 12 Answers 48-50
7 Individual Records Answers 51-570 Non-patent Records
The 57 sequence hits belong to 12 multi-hit and 7 individual-hit source publications.
68Use the patent family display (PFAM) feature to
display selective records from a FSORT L-number
General format of PFAM:=> D L# PFAM=# RECORD# FORMAT
Examples using PFAM:=> D PFAM=1-10
1st member of patent family number 1-10 in default display format
=> D PFAM=2 TRI ORGN ALIGN 1-TOTAL
All members of family number 2 in a free sequence review format
35
69The top answer is the same as before….
=> D PFAM=1-2
L4 ANSWER 1 OF 57 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY1AN 5846705.16 Protein USGENETI Nucleotide sequence of two circular SSDNA associated with banana
bunchy top virus and method for detection of banana bunchy top virus (Patent)
IN Wu Rey-Yuh (Taipei, TW); You Li-Ru (Taipei, TW); Soong Tai-Seng(Taipei,TW)
PA Development Center for Biotechnology (Taipei TW)PI US 5846705 A 19981208AI US 1995-418071 19950406AB Nucleotide sequences of two circular single-stranded DNAs . . . .ECLM US5846705 A: 1. An isolated DNA molecule comprising a . . . .ORGN UnknownSQL 286SCORE 520 BLASTALIGN
Query = 284 lettersLength = 286Score = 520 bits (1338), Expect = e-152Identities = 247/282 (87%), Positives = 268/282 (94%)
Query: 3 SFKWCFTLNYSSAAEREDFLALLKEEELNYAVVGDEVAPSSGQKHLQGYLSLKKSIKLGG S KWCFTLNYSSAAERE+FL+LLKEE+++YAVVGDEVAP++GQKHLQGYLSLKK I+LGG
Sbjct: 5 SLKWCFTLNYSSAAERENFLSLLKEEDVHYAVVGDEVAPATGQKHLQGYLSLKKRIRLGG . . . . .
The top hit is SEQ ID 16 from US5846705.
The first record from families 1 & 2 in default format.
70…but the second answer displayed is now the best answer from the 2nd family
L4 ANSWER 4 OF 57 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY2AN 5756708.26 Protein USGENETI DNA sequences of banana bunchy top virus (Patent)IN Karan Mirko (Holland Park, AU); Burns Thomas Michael (Herston, AU);
Dale James Langham (Moggill, AU); Harding Robert Maxwell(Lawnton, AU)PA Queensland University of Technology(Brisbane AU)PI US 5756708 A 19980526AI US 1994-202186 19940224DT PatentAB The invention provides DNA molecules consisting essentially of a
nucleotide sequence or part thereof which are associated . . . . ECLM US5756708 A: 1. An isolated DNA molecule derived from banana bunchy
top virus, consisting of a nucleotide sequence selected . . . .ORGN UnknownSQL 290SCORE 243BLASTALIGN
Query = 284 lettersLength = 290Score = 243 bits (621), Expect = 3e-69Identities = 117/282 (41%), Positives = 183/282 (64%), Gaps = 6/282
Query: 5 KWCFTLNYSSAAEREDFLALLKEEELNYAVVGDEVAPSSGQKHLQGYLSLKKSIKLGGLK +WCFTLNY + E + + ++ L YA+VGDEVAPS+GQ+HLQG++ LK +L GLK
Sbjct: 7 RWCFTLNYETEEEAANVVRRIESLNLVYAIVGDEVAPSTGQRHLQGFIHLKTGRRLQGLK . . . . .
The 2nd hit is now SEQ ID 26 from US5756708.
36
71Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT)• Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• Similarity searching GETSIM (FASTA)• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusions
72STN Express 8.2+ post-processing tools
• Table Tool to create tabulated results– Good for scanning/reviewing search results
• Predefined Report Tool for a report using a Standard Patent Record layout– Easy way to tidy-up your patent results for a client
• Customized Report Tool to control all options– E.g. fonts, cover page, which data fields to include
37
73USGENE results may be tabulated using STN Express 8.2+ Table Tool
Search Question:Find all relevant U.S. published application and patent references with sequences similar to the Human osteoprotegerin (OPG) mRNA, complete CDS (NCBI: U94332).
74Human osteoprotegerin (OPG) mRNA, complete CDS (NCBI: U94332)
38
75Ensure you capture your STN session
Record your session as a Transcript (.TRN) file or as an RTF file.
76SAVE, UPLOAD and VERIFY
There are 17 sequence records in DGENE for CA2325774.
=> FILE USGENE
=> UPL R BLAST
UPLOAD SUCCESSFULLY COMPLETEDL1 GENERATED
=> D L1 LQUE
L1 ANSWER 1 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN LQUE gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccg
ccgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctcgtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtac. . . . .tggccattgagctgtttcctcacaattggcgagatcccatggatgataa
=>
These commands are automatically run by the STN Express Sequence Query Upload wizard (slides 27-31).
The sequence query is now ready for searching directly in USGENE using the L-number (L1).
39
77RUN the USGENE BLAST search
=> FILE USGENE
FILE 'USGENE' ENTERED AT 04:38:02 ON 03 FEB 2008COPYRIGHT (C) 2008 SEQUENCEBASE CORP
FILE LAST UPDATED: 1 FEB 2008 <20080201/UP>MOST RECENT PUBLICATION DATE: 31 JAN 2008 <20080131/PD>
FILE COVERS 1982 TO DATE
>>> SIMULTANEOUS LEFT AND RIGHT TRUNCATION (SLART) IS AVAILABLEIN THE BASIC INDEX (/BI) AND FEATURE TABLE (/FEAT) FIELDS <<<
=> RUN BLAST L1 /SQN -F F
BLAST Version 2.2
The BLAST software is used herein with permission of the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). See also, . . . .
BLAST SEARCHING . . . .
Turn the Low Complexity Filter off with the syntax… /SQP –F F
USGENE is updated within 3 days of publication by the USPTO.
78Decide how many answers to keep
554 ANSWERS FOUND BELOW EXPECTATION VALUE OF 10.0
SimilarityScore
2686 || || ||||||||| |||||||||| |||||||||| ||||||||||| |||||||||||| |||||||||||| ||||||||||||| |||||||||||||
1343 |||||||||||||| ||||||||||||||| ||||||||||||||| |||||||||||||||| ||||||||||||||||||| |||||||||||||||||||| |||||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||||| ||||||||||||||||||||||||
Answer Count 110 220 330 440 550
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALL
The graphic representation gives a count of hit sequences (x-axis) and similarity score (y-axis). The graph gives a visual clue about the proportion of similar and not so similar sequences in the answer set.
Recommendation: keep ALL answers
40
79SORT by SCORE descending
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALLL2 RUN STATEMENT CREATEDL2 554 GTATATATAACGTGATGAGCGTACGGGTGCGGAGACGCACCGGAGCGCTC
GCCCAGCCGCCGCTCCAAGCCCCTGAGGTTTCCGGGGACCACAATGAACA. . . . .TGGAAATGGCCATTGAGCTGTTTCCTCACAATTGGCGAGATCCCATGGATGATAA/SQN.-F F
Answer set arranged by accession number; to sort by descendingsimilarity score, enter at an arrow prompt (=>) "sor score d".
=> SET SFIELDS BI ECLM PERMSET COMMAND COMPLETED
=> S L2 AND (OSTEO? OR BONE) AND GRANTED/SSO AND AY<2001L3 245 L2 AND (OSTEO?/BI,ECLM OR BONE/BI,ECLM) AND GRANTED/SSO AND
AY<2001
=> SOR SCORE DPROCESSING COMPLETED FOR L3 L4 245 SOR L3 SCORE D
Use SET SFIELDS to change the USGENE default search index.
After refining using date and text terms remember to SOR SCORE D.
80Grouped by source publications using Family SORT (FSORT)
=> FSORT L4
SEL L4 1- PN,APPSL5 SEL L4 1- PN APPS : 25 TERMS
L5 245 FSO L4
11 Multi-record Families Answers 1-244Family 1 Answers 1-11Family 2 Answers 12-22Family 3 Answers 23-33Family 4 Answers 34-44Family 5 Answers 45-71Family 6 Answers 72-83Family 7 Answers 84-118Family 8 Answers 119-179Family 9 Answers 180-240Family 10 Answers 241-242Family 11 Answers 243-244
1 Individual Record Answer 2450 Non-patent Records
The 245 sequence hits belong to 11 multi-hit and 1 individual-hit source publications.
41
81Reviewing the SCORE display can be one way to identify answers of interest
=> D PFAM=1- SCORE
L5 ANSWER 1 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY1SCORE 2686
L5 ANSWER 12 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY2SCORE 2686 . . . . .
L5 ANSWER 119 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY8SCORE 2375
L5 ANSWER 180 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY9SCORE 2375
L5 ANSWER 241 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STNFAMILY10SCORE 40
L5 ANSWER 243 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STNFAMILY11SCORE 40
L5 ANSWER 245 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 2686
The SCORE for the best answer from each family.
Note: the FSORT individual-hit record also has a top score.
82Use the PFAM feature to display selective records from an FSORT L-number
=> D PFAM=1-9,12L5 ANSWER 1 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STNFAMILY1AN 6284740.5 cDNA USGENETI Osteoprotegerin (Patent)IN Boyle William J. (Moorpark, CA); Lacey David L. (Thousand Oaks, CA);
Calzone Frank J. (Westlake Village, CA); . . . .PA Amgen Inc (Thousand Oaks CA)PI US 6284740 B1 20010904AI US 1997-974186 19971118AB The present invention discloses a novel secreted polypeptide, termed
Osteoprotegerin, which is a member of the tumor necrosis . . . .ECLM US6284740 B1: What is claimed is:1. A method of increasing levels of
osteoprotegerin in a mammal comprising administering to . . . .ORGN not providedSQL 1355SCORE 2686 BLASTALIGN
Query = 1355 lettersLength = 1355Score = 2686 bits (1355), Expect = 0.0Identities = 1355/1355 (100%)Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
The top hit is SEQ ID 5 from US6284740.
42
83Use the PFAM feature to display selective records from an FSORT L-number (cont.)
L5 ANSWER 180 OF 245 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STNFAMILY9AN 6919434.6 DNA USGENETI Monoclonal antibodies that bind OCIF (Patent)IN Goto Masaaki (Tochigi, JP); Tsuda Eisuke (Tochigi, JP); . . . .PA Sankyo Co Ltd (Tokyo JP)PI US 6919434 B1 20050719AI US 1999-338063 19990623AB A protein which inhibits osteoclast diffraction and/or maturation and
a method for producing the protein. The protein is produced by humanembryonic lung fibroblasts and has a molecular weight of . . . .
ECLM US6919434 B1: 1. An isolated monoclonal antibody produced by a hybridoma selected from the group consisting of A1G5 having Accession No. FERM BP-7441,D2F4having Accession No. FERM BP-7442, . . . .
ORGN UnknownSQL 1206SCORE 2375 BLASTALIGN
Query = 1355 lettersLength = 1206Score = 2375 bits (1198), Expect = 0.0Identities = 1204/1206 (99%)Strand = Plus / Plus
Query: 94 atgaacaagttgctgtgctgcgcgctcgtgtttctggacatctccattaagtggaccacc|||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 atgaacaacttgctgtgctgcgcgctcgtgtttctggacatctccattaagtggaccacc
This hit is SEQ ID 6 from US6919434.
84After logging off from STN select the table tool from the main STN Express tool bar
The most recent Transcript is automatically selected.
43
85If available choose any template you have defined previously
The first time you use the table tool, no templates have been defined yet.
86Choose a previously defined template
Pick the chosen answer set L-number and record numbers.
44
87Set highlighting preferences
Extra terms that were not originally searched may be highlighted.
88Set up report cover page
Here, we have decided not to add a cover page.
45
89Select fields, fonts, colors, change field order,
customize field names and save templates
Choose fields, order formats and personalized names.
90STN Express Table Tool output can be edited and adjusted as needed
If needed, go back and edit choices fields, formats, etc.
46
91STN Express Table Tool output can be edited, adjusted and saved in Excel format
Note: see separate appendix for the full printout of this table.
92Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT)• Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• Similarity searching GETSIM (FASTA)• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusions
47
93Sequence code match (SCM) searching in USGENE using RUN GETSEQ
• GETSEQ is designed to retrieve either exact matches to a sequence query, or answers with conservative variation using special symbols
• It can also be used to retrieve exact length matches, or subsequence hits, i.e. where the query is a small part of a larger hit sequence
• GETSEQ can be prove to be a fast, precise and effective alternative to BLAST for very short sequence queries, e.g. DNA probes and primers
The DGENE Workshop Manual is the complete guide (page 38):http://www.stn-international.com/training_center/bioseq/dgene_wm.pdf
94Sequence code match (SCM) searching in USGENE using RUN GETSEQ
Search Question:Find all relevant U.S. published application and patent references which were applied for prior to 2001, disclosing sequences with this fragment:
DSDGLAPPQHLIRV
48
95RUN GETSEQ command syntax
Sequence Code Match searching with GETSEQ
=> RUN GETSEQ L1 (sequence or L-number)/SQEP (exact protein) (default)/SQEFP (exact family protein)/SQSP (subsequence protein)/SQSFP (subsequence family protein)/SQEN (exact nucleotide)/SQSN (subsequence nucleotide)
96Amino acid families for RUN GETSEQ SQEFP and SQSFP search options
F, W, Y Aromatic
H, K, RBasic – hydrophilic
I, M, L, VHydrophobic
CCross-linking
Q, N, E, D, B, ZAcid Amine – hydrophilic
P, A, G, S, TNeutral – weakly hydrophobic
Amino acidsGroup
49
97GETSEQ searches can be combined with other search terms, e.g. application year
There are 17 sequence records in DGENE for CA2325774.
=> FILE USGENE
FILE 'USGENE' ENTERED AT 20:09:16 ON 03 FEB 2008COPYRIGHT (C) 2008 SEQUENCEBASE CORP
FILE LAST UPDATED: 1 FEB 2008 <20080201/UP>MOST RECENT PUBLICATION DATE: 31 JAN 2008 <20080131/PD>
FILE COVERS 1982 TO DATE
>>> SIMULTANEOUS LEFT AND RIGHT TRUNCATION (SLART) IS AVAILABLEIN THE BASIC INDEX (/BI) AND FEATURE TABLE (/FEAT) FIELDS <<<
=> RUN GETSEQ DSDGLAPPQHLIRV/SQSP
RUN GETSEQ AT 20:51:38 ON 03 FEB 2008COPYRIGHT (C) 2008 FIZ KARLSRUHE GMBH
L1 RUN STATEMENT CREATEDL1 114 DSDGLAPPQHLIRV/SQSP
=> S L1 AND AY<2001L2 72 L1 AND AY<2001
72 sequence hits (L2) have been found in USGENE with the containing the sequence fragment of interest.
98The BRIEF format provides full bibliography and abstract ….
There are 17 sequence records in DGENE for CA2325774.
=> D BRIEF
L2 ANSWER 1 OF 72 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN AN 6326464.44 Protein USGENETI P53 protein variants and therapeutic uses thereof (Patent)IN Conseiller Emmanuel (Paris, FR); Bracco Laurent (Paris, FR)PA Aventis Pharma S A (Antony FR)PI US 6326464 B1 20011204
WO 1997004092 A 19970206AI US 1998-983035 19980220RLI WO 1996-FR1111 19960717ED 20070328DT PatentAB Proteins derived from the product of tumor suppressor gene p53 and
having enhanced functions for therapeutical use are disclosed. Theproteins advantageously have enhanced tumour suppressor and programmed cell death inducer functions, particularly inproliferative disease contexts where wild-type p53 protein isinactivated. Nucleic acids coding for such molecules, vectorscontaining same, and therapeutical use thereof, particularly in genetherapy, are also disclosed. Continued on next slide….
This sequence hit comes from a U.S. granted patent, with an application date prior to 2001.
50
99…. plus the exemplary claim and sequence
There are 17 sequence records in DGENE for CA2325774.
ECLM US6326464 B1: What is claimed is:1. A variant of p53 protein wherein a C-terminal portion of the protein comprising a regulation domain and a part of an oligomerization domain is deleted from residue 326 or from residue 337 and replaced by an artificial leucine zipper comprising residues 334-363 of SEQ ID No: 26; anda transactivationdomain is deleted and replaced by a VP16 transactivation domain.
SSO PROTEIN; EMBL; GRANTEDORGN UnknownSQL 335SEQ
1 mgeyftlqir grerfemfre lnealelkda qagkepgrgg ggsggggsgg51 ggsrpapaap tpaapapaps wplsssvpsq ktyqgsygfr lgflhsgtak101 svtctyspal nkmfcqlakt cpvqlwvdst pppgtrvram aiykqsqhmt151 evvrrcphhe rcsdsdglap pqhlirvegn lrveylddrn tfrhsvvvpy
======= ======= 201 eppevgsdct tihynymcns scmggmnrrp iltiitleds sgnllgrnsf251 evrvcacpgr drrteeenlr kkgephhelp pgstkralpn ntssspqpkk301 kpldgdlkal keklkaleek lkaleeklka lvger
HITS AT: 164-177
D BRIEF (cont.)
The hit portion of the answer sequence is highlighted with double underlining.
100Sequence code match (SCM) searching in USGENE using RUN GETSEQ
Search Question:Find all relevant U.S. published application and patent references disclosing one or more of the sequences represented by this Markush:
LGPX1QLCX2VX3CAP
X1 = V or LX2 = any amino acid except, G or HX3 = any amino acid
51
101Variability symbols for RUN GETSEQ sequence code match searches
Alternate sequence expressions|A gap of one residue.A gap of zero or one residues:
Query appears at the beginning or the end of a sequence^Repeat the preceding symbol(s) one or more times+
Concatenate (join together) sequence queries
Repeat the preceding symbol(s) zero or more timesRepeat the preceding symbol(s) zero or one timeRepeat the preceding symbol(s) (number or range)Exclude a specific residue or alternate residuesSpecify alternate residuesFunction
*
{ }?
&
[-][ ]
Symbol
102GETSEQ can be a flexible alternative to BLAST for short sequence queries
There are 17 sequence records in DGENE for CA2325774.
=> FILE USGENE
=> RUN GETSEQ LGP[VL]QLC[-GH]LV.CAP/SQSP
RUN GETSEQ AT 22:40:20 ON 03 FEB 2008COPYRIGHT (C) 2008 FIZ KARLSRUHE GMBH
L1 RUN STATEMENT CREATEDL1 13 LGP[VL]QLC[-GH]LV.CAP/SQSP
=> D TRI SEQ
L1 ANSWER 1 OF 13 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Method and composition for treating and preventing tumor metastasis
in vivo (PublishedApplication)MTY ProteinSQL 20SEQ
1 tvlllgplql calvhcappa====== ========
HITS AT: 5-18
13 sequence hits (L1) have been found in USGENE containing the sequence fragment(s) of interest.
The hit portion of the answer sequence is highlighted with double underlining.
52
103Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT)• Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• GETSIM (FASTA) similarity searching• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusions
104Similarity searching in USGENE using FASTA-based RUN GETSIM
• GETSIM was originally developed by FIZ Karlsruhe for DGENE, and it has since been implemented in both PCTGEN and USGENE
• It is based on the industry standard FASTA methodology, and offers the same basic search modes as BLAST (/SQP, /SQN and /TSQN)
• Since GETSIM requires more computational time than BLAST, it is a usually a good idea to make use of the offline BATCH search mode
The DGENE Workshop Manual is the complete guide (page 60):http://www.stn-international.com/training_center/bioseq/dgene_wm.pdf
53
105General differences between FASTA (GETSIM) and BLAST algorithms
Equivalent for highly similar sequences
Calculates probabilities
Less separation between true homologs and random hits
Less sensitive when using default settings
Comparison of shorter sequence parts
Misses some less similar sequences
Faster than FASTA
BLAST
Calculates significance “on the fly” from the given dataset
More separation between true homologs and random hits
More sensitive, misses less homologs
Comparison of entire sequence length
Better for less similar sequences
Slower than BLAST
FASTA (GETSIM)
106Similarity searching in USGENE using FASTA-based RUN GETSIM
Search Question:Find sequences in U.S. published application and patents which are similar to the following nucleic acid query sequence:
GGGUUUAGGAGUGGUAGGUCUUACGAUGCCAGCUGUAAUGCCUACCGGATAA
54
107RUN GETSIM command syntax
Similarity Searching with GETSIM (protein/polypeptides)
=> RUN GETSIM L1 (sequence or L-number)/SQP (protein) (default)
BATCH (offline)ALERT (current awareness)
108RUN GETSIM command syntax
Similarity Searching with GETSIM (nucleotides)
=> RUN GETSIM L1 (sequence or L-number)/SQN (nucleotide)
SIN (single strand) (default)COM (complementary strand)BOTH (both strands)
BATCH (offline)ALERT (current awareness)
55
109Similarity searching in USGENE using FASTA-based RUN GETSIM
There are 17 sequence records in DGENE for CA2325774.
=> FILE USGENE
FILE 'USGENE' ENTERED AT 20:09:16 ON 03 FEB 2008COPYRIGHT (C) 2008 SEQUENCEBASE CORP
FILE LAST UPDATED: 1 FEB 2008 <20080201/UP>MOST RECENT PUBLICATION DATE: 31 JAN 2008 <20080131/PD>
FILE COVERS 1982 TO DATE
=> RUN GETSIM GGGUUUAGGAGUGGUAGGUCUUACGAUGCCAGCUGUAAUGCCUACCGGATAA/SQN
RUN GETSIM AT 20:10:11 ON 03 FEB 2008COPYRIGHT (C) 2008 FIZ KARLSRUHE GMBH
100000 SEQUENCES PROCESSED. . . .
5260000 SEQUENCES PROCESSED
6914 ANSWERS FOUND ABOVE A THRESHOLD OF 56QUERY SELF SCORE VALUE IS 260
6914 sequence hits have been found above the similarity threshold automatically set by STN.
Sequences of less than 256 characters may be searched directly on the command line. Longer sequences must be uploaded (see slides 27-31).
GETSIM calculates a query self score, to help assess answer similarity.
110Decide how many answers to keep
SimilarityScore
251 | | | | | | | |
126 | | | || |||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Answer Count 1380 2760 4140 5520 6900
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALL
The graphic representation gives a count of hit sequences (x-axis) and similarity score (y-axis). The graph gives a visual clue about the proportion of similar and not so similar sequences in the answer set.
Recommendation: keep ALL answers
56
111SORT by SCORE descending
There are 17 sequence records in DGENE for CA2325774.
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALLL1 RUN STATEMENT CREATEDL1 6914 GGGUUUAGGAGUGGUAGGUCUUACGAUGCCAGCUGUAAUGCCUACC
GGATAA/SQN
Answer set arranged by accession number; to sort by descendingsimilarity score, enter at an arrow prompt (=>) "sor score d".
=> SOR SCORE DPROCESSING COMPLETED FOR L1 L2 6914 SOR L1 SCORE D
As with a BLAST search, the initial GETSIM search answer set should be sorted by similarity score descending, to bring the best answers to the top.
112Review answers with a free-of-charge format including alignment
There are 17 sequence records in DGENE for CA2325774.
=> D TRI ORGN SCORE ALIGN 1-100L2 ANSWER 1 OF 6914 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Capsid polypeptides and use to inhibit viral packaging (Patent)MTY DNASQL 4580ORGN UnknownSCORE 251 96% of query self score 260ALIGN Smith-Waterman score: 251
52 na overlap starting at 1958 ggguuuaggagugguaggucuuacgaugccagcuguaaugccuaccggataa:::...:::::.::.:::.:..::::.::::::.:.::.:::.:::::: ::gggtttaggagtggtaggtcttacgatgccagctgtaatgcctaccggagaa
L2 ANSWER 2 OF 6914 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Detection kits, such as nucleic acid arrays, for detecting the
expression or 10,000 or more Drosophila genes and uses thereof(PublishedApplication)
MTY DNASQL 2327ORGN DROSOPHILASCORE 101 38% of query self score 260ALIGN Smith-Waterman score: 101
46 na overlap starting at 144 ggagugguaggu_cuuacgaugccagcuguaaugccuaccggataa::::.::. : . : .: : :::::. ::.::: :::: :: :ggagtggtggctccatatgcctccagcttcaatgcccaccgcatca
The GETSIM ALIGN display:• First line: portion of query
with similarity• Second line: similarity
(identical- 2 dots, no match-blank, one dot- family match)
• Third line: portion of retrieved sequence with similarity
The GETSIM SCORE display includes a similarity percentage(Score/Query Self Score x 100).
57
113Display selected USGENE answers in a preferred bibliographic format
There are 17 sequence records in DGENE for CA2325774.
=> D BIB AB ECLM ORGN SQL SCORE ALIGNL2 ANSWER 1 OF 6914 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN AN 5831013.1 DNA USGENETI Capsid polypeptides and use to inhibit viral packaging (Patent)IN Bruenn Jeremy A. (Buffalo, NY); Yao Wensheng (Kenmore, NY)PA The Research Foundation of State University of New York (Amherst NY)PI US 5831013 A 19981103AI US 1996-674351 19960702DT PatentAB The present invention is directed to a viral capsid polypeptide
capable of inhibiting viral packaging, the viral capsid polypeptide consisting of a portion of a viral capsid protein of an RNA virus and including a multimerization domain of the viral capsid protein. The invention further provides an isolated nucleic acid . . . .
ECLM US5831013 A: 1. A viral capsid polypeptide capable of inhibiting viral packaging, said viral capsid polypeptide having an amino acid sequence selected from the group consisting of amino acids 1 to 473 of SEQ ID NO:2 and amino acids 1 to 443 of SEQ ID NO:4.
ORGN UnknownSQL 4580SCORE 251 96% of query self score 260ALIGN Smith-Waterman score: 251
52 na overlap starting at 1958 ggguuuaggagugguaggucuuacgaugccagcuguaaugccuaccggataa:::...:::::.::.:::.:..::::.::::::.:.::.:::.:::::: ::gggtttaggagtggtaggtcttacgatgccagctgtaatgcctaccggagaa
USGENE records can be displayed in a wide variety of customized formats.
114Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT)• Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• GETSIM (FASTA) similarity searching• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusions
58
115BLAST and GETSIM similarity searches can both be run offline in BATCH search mode
• Multiple BATCH requests may be queued, to run sequentially one after another– A maximum of 16 requests can be queued per STN Login ID
• BATCH request results may be collected in an online session up to 3 months from initiation– Already retrieved results may be re-retrieved multiple times at no
additional cost, up to 8 days from the initial retrieval
• BATCH is most useful for GETSIM queries, as these can take considerable computational time when run online– Also a higher query length limit of 2,000 characters is permitted
116Similarity searching in USGENE using GETSIM in offline BATCH mode
There are 17 sequence records in DGENE for CA2325774.
=> FILE USGENEFILE 'USGENE' ENTERED AT 20:40:17 ON 03 FEB 2008COPYRIGHT (C) 2008 SEQUENCEBASE CORP
FILE LAST UPDATED: 1 FEB 2008 <20080201/UP>MOST RECENT PUBLICATION DATE: 31 JAN 2008 <20080131/PD>
FILE COVERS 1982 TO DATE
=> RUN GETSIM GGGUUUAGGAGUGGUAGGUCUUACGAUGCCAGCUGUAAUGCCUACCGGATAA/SQN BOTH BATCH
PLEASE ENTER BATCH IDENTIFIER (MAX. 8 CHARS): EXAMPLE3
RUN GETSIM AT 20:40:44 ON 03 FEB 2008COPYRIGHT (C) 2008 FIZ KARLSRUHE GMBH
BATCH PROCESSING STARTED FOR EXAMPLE3
=> LOG HSESSION WILL BE HELD FOR 120 MINUTESSTN INTERNATIONAL SESSION SUSPENDED AT 20:41:23 ON 03 FEB 2008
To automatically search the nucleotide sequence and its complement specify BOTH.
Add BATCH for BATCH mode.
Name the BATCH search.
Most GETSIM searches take between 5 and 20 minutes to run.
59
117Use RUN GETBATCH to retrieve and manage the results of BATCH searches
There are 17 sequence records in DGENE for CA2325774.
* * * * * * RECONNECTED TO STN INTERNATIONAL * * * * * * SESSION RESUMED IN FILE 'USGENE' AT 20:57:23 ON 03 FEB 2008FILE 'USGENE' ENTERED AT 20:57:23 ON 03 FEB 2008
=> RUN GETBATCHPlease enter your batch identifier
or enter # for batch id listor enter * for batch id at top of listor enter - before batch id to deleteor enter . for (end)
BATCH REQUEST: #Batch result files remaining:EXAMPLE1 Retrieved (getsim) EXAMPLE2 Retrieved (getsim) EXAMPLE3 Completed (getsim)
-----------------------Please enter your batch identifier
or enter # for batch id listor enter * for batch id at top of listor enter - before batch id to deleteor enter . for (end)
BATCH REQUEST: EXAMPLE3
Login with 2 hours if you want to reconnect to your previous STN session.
BATCH result files status can be: Queued, Running, Completed or Retrieved.
Enter # for a BATCH ID list.
Enter the name of the BATCH search results to retrieve.
118Decide how many answers to keep
5230 ANSWERS FOUND ABOVE A THRESHOLD OF 66QUERY SELF SCORE VALUE IS 260
SimilarityScore
251 | | | | | | | | |
126 | | | ||||| ||||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Answer Count 1050 2100 3150 4200 5250
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALL
The graphic representation gives a count of hit sequences (x-axis) and similarity score (y-axis). The graph gives a visual clue about the proportion of similar and not so similar sequences in the answer set.
Recommendation: keep ALL answers
60
119After BATCH retrieval, all search, sort and display
options are the same as in online search mode
There are 17 sequence records in DGENE for CA2325774.
L1 RUN STATEMENT CREATEDL1 5230 GGGUUUAGGAGUGGUAGGUCUUACGAUGCCAGCUGUAAUGCCUACCGGAT
AA/SQN.BOTH
Answer set arranged by accession number; to sort by descendingsimilarity score, enter at an arrow prompt (=>) "sor score d".Batch result files remaining:EXAMPLE1 Retrieved (getsim) EXAMPLE2 Retrieved (getsim) EXAMPLE3 Retrieved (getsim)
-----------------------
=> SOR SCORE DPROCESSING COMPLETED FOR L1 L2 5230 SOR L1 SCORE D
=> D TRI ORGN ALIGN 1-10
L2 ANSWER 1 OF 5230 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Capsid polypeptides and use to inhibit viral packaging (Patent)MTY DNASQL 4580ORGN UnknownALIGN Smith-Waterman score: 251
52 na overlap starting at 1958 ggguuuaggagugguaggucuuacgaugccagcuguaaugccuaccggataa:::...:::::.::.:::.:..::::.::::::.:.::.:::.:::::: ::gggtttaggagtggtaggtcttacgatgccagctgtaatgcctaccggagaa
BATCH result files status can be: Queued, Running, Completed or Retrieved.
120Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT)• Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• GETSIM (FASTA) similarity searching• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusions
61
121Background: USGENE and DGENE capture
sequences from different patent family members
AN .... Protein USGENEPI US …. A1SEQ 1 ….
AN .... DNA USGENEPI US …. A1SEQ 2 ….
AN .... Protein USGENEPI US …. B2SEQ 1 ….
AN .... DNA USGENEPI US …. B2SEQ 2 ….
AN .... Protein DGENEPI WO …. A1SEQ 1 ….
AN .... DNA DGENEPI WO …. A1SEQ 2 ….
WPINDEX = Derwent World Patents Index® on STN
DGENE = GENESEQTM on STN
USGENE® = USPTO Genetic Sequence Database
AN .... WPINDEX
PI WO ….. A1
FR ….. A1
EP ….. A1
US ….. A1
EP ….. B1
US ….. B2
DGENE sequences are indexed by Thomson from DWPI basic publications.
122The “best-practice” recipe for multifile searching incorporates DWPI patent families
DGENESequences
USGENESequences
DWPIPatent
Families
PN
The connection between DWPI and patent sequence databases DGENE and USGENE is via publication numbers (PN).
PNPN
PN
62
123The basic mechanics of the “best-practice”multifile patent sequence search
1) Ensure preferred file default display formats are set2) UPLOAD the sequence query via STN Express (L1)3) USGENE: BLAST (L2); SORT SCORE D (L3); review
and isolate chosen hits with SORT AN 1-x (L4)4) DGENE: BLAST (L5); SORT SCORE D (L6); review
and isolate selected hits with SORT AN 1-x (L7)5) WPINDEX: TRA PN L4 (L9); TRA PN L7 (L11);
combine answer sets L9 OR L11 (L12)6) Merge: DUP IDE L4 L7 L12 (L13); FSORT (L14)7) Display results: D PFAM=1- TOTAL8) Remember to capture your transcript and logoff
124The basic mechanics of a the “best-practice”multifile patent sequence search
Search Question:Find relevant patent references for Eukaryotic translation elongation factor 1 gamma (NP_001395)MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPAFEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGIMHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSFRQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREEKQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNEDTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASVILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWEGAFQHVGKAFNQGKIFK
( Search conducted on February 6th, 2008.)
63
1251) Ensure preferred file default display formats are set
=> FILE STNGUIDE
=> SET FORMAT .MYUSGENEALIGN TRI ORGN SEQN SEQC ALIGN
=> SET FORMAT .MYDGENEALIGN TRI OS ALIGN
=> SET FORMAT .MYWPINDEX BIB
=> FILE USGENE; SET DFORMAT .MYUSGENEALIGN
=> FILE DGENE; SET DFORMAT .MYDGENEALIGN
=> FILE WPINDEX; SET DFORMAT .MYWPINDEX
=> D FORMAT
USER-DEFINED FORMAT DEFINITION DEFAULT FORMATFOR FILE
------------------- ------------------------- -------------------.MYDGENEALIGN TRI OS ALIGN DGENE .MYUSGENEALIGN TRI ORGN SEQN SEQC ALIGN USGENE .MYWPINDEX BIB WPINDEX
A simple STN script can be used to issue these commands automatically.
126
From the Discover! button menu.
2) UPLOAD the sequence query (1) Click Upload Sequence.(2) Choose file of interest.(3) Select database.
(2)
(1)
(3)
The sequence becomes a Query L-number in the database of choice for use with RUN BLAST.
64
1272) UPLOAD the sequence query (cont.)
=> FILE USGENE
=> UPL R BLAST
UPLOAD SUCCESSFULLY COMPLETEDL1 GENERATED
=> D L1 LQUE
L1 ANSWER 1 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN LQUE MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVP
AFEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGIMHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSFRQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREEKQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNEDTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRK NAFASVILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWEGAFQHVGKAFNQGKIFK
=>
The sequence query is now ready for searching in USGENE and DGENE using the L-number (L1).
These commands are automatically run by the STN Express Sequence Query Upload wizard.
Verify that the UPLOAD was successful with D LQUE.
1283) RUN the USGENE BLAST search
=> FILE USGENE
FILE 'USGENE' ENTERED AT 04:59:03 ON 06 FEB 2008COPYRIGHT (C) 2008 SEQUENCEBASE CORP
FILE LAST UPDATED: 1 FEB 2008 <20080206/UP>MOST RECENT PUBLICATION DATE: 31 JAN 2008 <20080131/PD>
FILE COVERS 1982 TO DATE
>>> SIMULTANEOUS LEFT AND RIGHT TRUNCATION (SLART) IS AVAILABLEIN THE BASIC INDEX (/BI) AND FEATURE TABLE (/FEAT) FIELDS <<<
=> RUN BLAST L1 /SQP -F F
BLAST Version 2.2The BLAST software is used herein with permission of the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM).
Turn the Low Complexity Filter off for the protein (SQP) search using… /SQP –F F
USGENE is updated within 3 daysof publication by the USPTO.
65
129
1593 ANSWERS FOUND BELOW EXPECTATION VALUE OF 10.0
SimilarityScore
902 | | | | | | | | |
451 | | | | ||| ||| |||| ||||| ||||||||||||||||||||||||||||||||||||||||||||||||||
Answer Count 320 640 960 1280 1600
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALL
3) RUN the USGENE BLAST search (cont.)
The graphic representation gives a count of hit sequences (x-axis) and similarity score (y-axis). The graph gives a visual clue about the proportion of similar and not so similar sequences in the answer set.
Recommendation: keep ALL answers.
1303) RUN the USGENE BLAST search (cont.)
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALLL2 RUN STATEMENT CREATED
L2 1593 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFL
RKFPAGKVPAFEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVS
FADSDIVPPASTWVFPTLGIMHHNKQATENAKEEVRRILGLLDAYLKTRT
FLVGERVTLADITVVCTLLWLYKQVLEPSFRQAFPNTNRWFLTCINQPQF
RAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREEKQKPQAERKE
EKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE
DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLD
KLRKNAFASVILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWR
KLDPGSEETQTLVREYFSWEGAFQHVGKAFNQGKIFK/SQP.-F F
Answer set arranged by accession number; to sort by descending
similarity score, enter at an arrow prompt (=>) "sor score d".
=> SOR SCORE DPROCESSING COMPLETED FOR L2
L3 1593 SOR L2 SCORE D
Use SORT SCORE D to sort by descending BLAST score.
66
1313) RUN the USGENE BLAST search (cont.)
=> D 1-30
L3 ANSWER 1 OF 1593 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Tissue-and serum-derived glycoproteins and methods of their use
(PublishedApplication)MTY ProteinSQL 437ORGN Homo SapiensSEQN 10979SEQC 14918BLASTALIGN
Query = 437 lettersLength = 437Score = 902 bits (2331), Expect = 0.0Identities = 437/437 (100%), Positives = 437/437 (100%)Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA
MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPASbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPAQuery: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI
FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGISbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI
. . . .
Review answers in the free-of-charge default format, including alignment.
The top BLAST score is 902.
1323) RUN the USGENE BLAST search (cont.)
=> D SCORE 1-20
L3 ANSWER 1 OF 1593 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 902 . . . .
L3 ANSWER 10 OF 1593 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 788
L3 ANSWER 11 OF 1593 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 717
L3 ANSWER 12 OF 1593 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 656 . . . .
L3 ANSWER 20 OF 1593 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 290
=> SOR AN 1-10PROCESSING COMPLETED FOR L3 L4 10 SOR L3 1-10 AN
Another way to review quickly is by BLAST SCORE.
Gather selected USGENE hits into a new L-number with SORT AN (L4).
67
1334) RUN the DGENE BLAST search
=> FILE DGENE
=> RUN BLAST L1 /SQP -F F. . . .
HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALLL5 RUN STATEMENT CREATEDL5 1904 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFL
RKFPAGKVPAFEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGIMHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSFRQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREEKQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNEDTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASVILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWEGAFQHVGKAFNQGKIFK/SQP.-F F
Answer set arranged by accession number; to sort by descendingsimilarity score, enter at an arrow prompt (=>) "sor score d".
=> SOR SCORE DPROCESSING COMPLETED FOR L5 L6 1904 SOR L5 SCORE D
Turn the Low Complexity Filter off for the protein (SQP) search using… /SQP –F F
Use SORT SCORE D to sort by descending BLAST score.
1344) RUN the DGENE BLAST search (cont.)
=> D 1-30
L6 ANSWER 1 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN AN AEL43555 protein DGENETI New human cancer suppressor proteins and DNA, useful for diagnosing,
preventing, and treating human cancers, e.g. cancer of the breast,brain, heart, muscles, large intestine, thymus, spleen, kidney,liver, or small intestine.
DESC Human cancer suppressor protein GIG35.KW diagnosis; therapeutic; prophylaxis; gene therapy; cancer; tumor;
neoplasm; cytostatic; GIG35; EEF1G.SQL 437OS 2006-747536 [76]BLASTALIGN
Query = 437 lettersLength = 437Score = 902 bits (2331), Expect = 0.0Identities = 437/437 (100%), Positives = 437/437 (100%)Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA
MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPASbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA
. . . .
Review answers in the free-of-charge default format, including alignment.
The top BLAST score is 902.
68
1354) RUN the DGENE BLAST search (cont.)
=> D SCORE 1-20
L6 ANSWER 1 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 902 . . . .
L6 ANSWER 14 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 880
L6 ANSWER 15 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 880
L6 ANSWER 16 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 717 . . . .
L6 ANSWER 20 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 495
=> SOR AN 1-15PROCESSING COMPLETED FOR L6 L7 15 SOR L6 1-15 AN
Another way to review quickly is by BLAST SCORE.
Gather selected DGENE hits into a new L-number with SORT AN (L7).
1365) Transfer PNs from USGENE and DGENE and combine answer sets in DWPI
=> FILE WPINDEX
=> TRA L4 PN; TRA L7 PN
L8 TRANSFER L4 1- PN : 9 TERMSL9 8 L8
L10 TRANSFER L7 1- PN : 12 TERMSL11 12 L10
=> S L9 OR L11L12 16 L9 OR L11
L4 = USGENE selected BLAST hits.L7 = DGENE selected BLAST hits.
Total DWPI records is 16 (L12) – both USGENE and DGENE have found unique DWPI patent families!
10 USGENE sequence hits (L4) found 8 DWPI records (L9)
15 DGENE sequence hits (L7) found 12 DWPI records (L11).
69
1376) Merge results with duplicate identify (DUP IDE) and sort by patent family (FSORT)
=> DUP IDE L4 L7 L12
DUPLICATE IS NOT AVAILABLE IN 'USGENE, DGENE'. ANSWERS FROM THESE FILES WILL BE CONSIDERED UNIQUE
FILE 'USGENE' ENTERED AT 05:16:23 ON 06 FEB 2008COPYRIGHT (C) 2008 SEQUENCEBASE CORP
FILE 'DGENE' ENTERED AT 05:16:23 ON 06 FEB 2008COPYRIGHT (C) 2008 THE THOMSON CORPORATION
FILE 'WPINDEX' ENTERED AT 05:16:23 ON 06 FEB 2008COPYRIGHT (C) 2008 THE THOMSON CORPORATION
PROCESSING COMPLETED FOR L4 PROCESSING COMPLETED FOR L7 PROCESSING COMPLETED FOR L12
L13 41 DUP IDE L4 L7 L12 (INCLUDES 0 SETS OF DUPLICATES)ANSWERS '1-10' FROM FILE USGENE ANSWERS '11-25' FROM FILE DGENE ANSWERS '26-41' FROM FILE WPINDEX
L4 = USGENE selected BLAST hits.L7 = DGENE selected BLAST hits.L12 = corresponding DWPI records.
138
AN .... WPINDEX
PI GB ….. A1
JP ….. A1
US ….. B1
Note that an INPADOC (FSORT) patent family may be represented by one or more DWPI records
AN .... DNA USGENEPI US …. B1SEQ 1 ….
AN .... DNA USGENEPI US …. B1SEQ 1 ….
AN .... Protein DGENEPI WO …. A1SEQ 1 ….
AN .... DNA DGENEPI GB …. A1SEQ 1 ….
AN .... WPINDEX
PI WO ….. A1
EP ….. A1
US ….. B1
AN …. INPADOCDB
PI WO ….. A1
EP ….. A1
US ….. B1
GB ….. A1
JP ….. A1
US ….. B1
70
1396) Merge results with DUP IDE and sort by patent family (FSORT) (cont.)
=> FSORT L13
L14 41 FSO L13
15 Multi-record Families Answers 1-41Family 1 Answers 1-8Family 2 Answers 9-11Family 3 Answers 12-13Family 4 Answers 14-15Family 5 Answers 16-17Family 6 Answers 18-19Family 7 Answers 20-22Family 8 Answers 23-25Family 9 Answers 26-28Family 10 Answers 29-30Family 11 Answers 31-32Family 12 Answers 33-34Family 13 Answers 35-37Family 14 Answers 38-39Family 15 Answers 40-41
0 Individual Records 0 Non-patent Records
The 16 DWPI records (L12), 10 USGENE sequence hits and 15 DGENE sequence hits belong to 15 FSORT families (L14).
140Use the patent family display (PFAM) feature to
display selective records from a FSORT L-number
General format of PFAM:=> D L# PFAM=# RECORD# FORMAT
Examples using PFAM:=> D PFAM=1-10
1st member of patent family number 1-10 in default display format
=> D PFAM=2 TRI ORGN ALIGN TOTAL
All members of family number 2 in a free sequence review format
71
1417) Display results using the customized file default display formats (see slide 7)
=> D PFAM=1- TOTAL. . . .
L14 ANSWER 9 OF 41 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY2TI Compositions and methods for the diagnosis and treatment of tumor
(PublishedApplication)MTY ProteinSQL 437ORGN Homo SapiensSEQN 2421SEQC 6355BLASTALIGN
Query = 437 lettersLength = 437Score = 889 bits (2296), Expect = 0.0Identities = 430/437 (98%), Positives = 433/437 (98%)Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA
MAAGTLYTYPENWRAFKALIAAQYSGAQ+RVLSAPPHFHFGQTNRT EFLRKFPAGKVPASbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQIRVLSAPPHFHFGQTNRTSEFLRKFPAGKVPAQuery: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI
FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGISbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI
. . . .
This displays all records (TOTAL), from all families (PFAM=1-) in file default format.
USGENE hit sequence display(s).
1427) Display results using the customized file default display formats (cont.)
L14 ANSWER 10 OF 41 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY2AN ABM80939 protein DGENETI New tumor-associated antigenic target polypeptides and nucleic acids,
useful in preparing a medicament for treating or detecting a proliferative disorder, e.g. breast, lung, colorectal, ovarian or prostate cancer or tumor.
DESC Tumour-associated antigenic target (TAT) polypeptide PRO81615, SEQ:2421.
KW Tumour-associated antigenic target; TAT; human; overexpression; cancer; tumour; diagnosis; cell proliferative disorder; breast cancer; colorectal cancer; lung cancer; ovarian cancer; . . . .
SQL 437OS 2004-347921 [32]BLASTALIGN
Query = 437 lettersLength = 437Score = 889 bits (2296), Expect = 0.0Identities = 430/437 (98%), Positives = 433/437 (98%)Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA
MAAGTLYTYPENWRAFKALIAAQYSGAQ+RVLSAPPHFHFGQTNRT EFLRKFPAGKVPASbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQIRVLSAPPHFHFGQTNRTSEFLRKFPAGKVPA
. . . .
DGENE hit sequence display(s).
72
1437) Display results using the customized file default display formats (cont.)
L14 ANSWER 11 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY2AN 2004-347921 [32] WPINDEXTI New tumor-associated antigenic target polypeptides and nucleic acids,
useful in preparing a medicament for treating or detecting a proliferative disorder, e.g. breast, lung, colorectal, ovarian or prostate cancer or tumor
DC B04; D16; S03IN WU T D; ZHANG Z; ZHOU YPA (GETH-C) GENENTECH INCCYC 105PIA WO 2004030615 A2 20040415 (200432)* EN 7273[635] <--
AU 2003295328 A1 20040423 (200465) ENEP 1594447 A2 20051116 (200575) ENJP 2006516089 W 20060622 (200641) JA 1466
ADT WO 2004030615 A2 WO 2003-US28547 20030929; AU 2003295328 A1 AU 2003-295328 20030929; EP 1594447 A2 EP 2003-786510 20030929;EP 1594447 A2 WO 2003-US28547 20030929; JP 2006516089 WWO 2003-US28547 20030929; JP 2006516089 W JP 2004-541530 20030929
FDT AU 2003295328 A1 Based on WO 2004030615 A; EP 1594447 A2 Based on WO 2004030615 A; JP 2006516089 W Based on WO 2004030615 A
PRAI US 2002-414971P 20021002
WPINDEX patent family display.
144Summary of results for Eukaryotic translation
elongation factor 1 gamma (NP_001395)
151621--Total
440--Overlap
88*9101593USGENE
111212151904DGENE
FSORT Families
DWPI RecordsPNsSEQs
> 80% SEQs
( * USGENE US20070224201 was not present in DWPI, as of February 6th, 2008.)
73
145Example: USGENE unique retrieval
L14 ANSWER 18 OF 41 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY6TI Genetic polymorphisms associated with coronary heart disease, methods
of detection and uses thereof (PublishedApplication)MTY ProteinSQL 437ORGN Homo SapiensSEQN 138SEQC 17377BLASTALIGN
Query = 437 lettersLength = 437Score = 889 bits (2296), Expect = 0.0Identities = 430/437 (98%), Positives = 433/437 (98%)Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA
MAAGTLYTYPENWRAFKALIAAQYSGAQ+RVLSAPPHFHFGQTNRT EFLRKFPAGKVPASbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQIRVLSAPPHFHFGQTNRTSEFLRKFPAGKVPAQuery: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI
FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGISbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGIQuery: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF
MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSFSbjct: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF
. . . .
This USGENE hit sequence uniquely retrieved the DWPI record on the following slide (as of Feb 6th 2008).
146Example: USGENE unique retrieval (cont.)
L14 ANSWER 19 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY6AN 2005-630949 [64] WPINDEX TI New isolated nucleic acid molecule comprising a single nucleotide
polymorphism, useful for identifying an individual at an increased risk of developing coronary heart disease, or for treating or preventing myocardial infarction
DC B04; D16IN CARGILL M; DEVLIN J; DEVLIN J J; SHIFFMAN D; CARGILL M C G; SHIFFMANPA (APPL-N) APPLERA CORPCYC 108PIA WO 2005087953 A2 20050922 (200564)* EN 135[1]
US 20060228715 A1 20061012 (200668) EN <--EP 1745147 A2 20070124 (200708) EN
ADT WO 2005087953 A2 WO 2005-US7453 20050307; US 20060228715 A1 Provisional US 2004-550051P 20040305; US 20060228715 A1 Provisional US 2004-567831P 20040505; US 20060228715 A1 Provisional US 2004-617163P 20041012; US 20060228715 A1 US 2005-73360 20050307; EP 1745147 A2EP 2005-724897 20050307; EP 1745147 A2 WO 2005-US7453 20050307
FDT EP 1745147 A2 Based on WO 2005087953 APRAI US 2004-617163P 20041012
US 2004-550051P 20040305US 2004-567831P 20040505US 2005-73360 20050307
This relevant DWPI record was uniquely retrieved via a USGENE BLAST search (on Feb 6th, 2008).
74
147Agenda
• STN sequence searchable databases• USGENE database content• The 7 basic steps of USGENE BLAST®
• BLAST and Patent Family SORT (FSORT)• Post-processing BLAST search results• Sequence Code Match (SCM) with GETSEQ• GETSIM (FASTA) similarity searching• Offline BATCH search mode• Multifile searching with DGENE• Comparisons and conclusions
148How does USGENE compare to other USPTO sequence data sources?
1981 -65 daysBiweeklyDGENE (DWPI basics)
1982 -
1957 -
1982 -
Backfile coverage
1-3 monthsDailyNCBI/EMBL
27 daysDailyREGISTRY(CAplus basics)
3 daysWeeklyUSGENE
Value added
Typical Timeliness
Update Frequency
75
149How does USGENE compare to other USPTO sequence data sources? (cont.)
DGENE (DWPI basics)
REGISTRY(CAplus basics)
USGENE
NCBI/EMBL
Value added
USPTO claims text
USPTO Patents
USPTO PGPs
150Comparing STN databases…
• DGENE– The most comprehensive patent sequence database– Implemented in-house at major patent offices
• REGISTRY– More timely than DGENE; complementary indexing– Unique non-patent literature coverage
• USGENE– More timely than DGENE and REGISTRY (3 days)– Sequences from equivalent USPTO applications and patents
• PCTGEN– The most timely database (24 hours)– Sequences from equivalent WIPO/PCT publications
76
151Conclusions
• USGENE is a vital new tool for business critical patent searches, providing a complete collection of U.S. Issued Patent sequences with searchable claims text
• USGENE also provides a collection of published application sequence data, not covered by NCBI/EMBL
• USGENE provides the most timely source of USPTO patent sequence data – within 3 days of publication
• DGENE and REGISTRY provide additional value-added indexing for U.S. patents and published applications
• DGENE, REGISTRY and USGENE are all required for a comprehensive search of USPTO sequence data
152Visit www.fiz-k.com/usgene for the latest USGENE reference materials
BLA
ST
OP
G
Acc
essi
on
Num
ber
Title
Pat
ent
Ass
igne
eIn
vent
orA
bstra
ctE
xem
plar
y C
laim
BLA
ST
Sco
reS
eque
nce
Leng
thB
LAS
T A
lignm
ent
Org
anis
m
6284
740.
5
cDN
A
US
GE
NE
Ost
eopr
oteg
erin
(P
aten
t)A
mge
n In
c (T
hous
and
Oak
s C
A)
Boy
le W
illia
m J
. (M
oorp
ark,
CA
); La
cey
Dav
id L
. (T
hous
and
Oak
s,
CA
); C
alzo
ne
Fran
k J.
(W
estla
ke V
illag
e,
CA
); C
hang
Min
g-S
hi (N
ewbu
ry
Par
k, C
A)
The
pres
ent i
nven
tion
disc
lose
s a
nove
l sec
rete
d po
lype
ptid
e, te
rmed
O
steo
prot
eger
in, w
hich
is a
m
embe
r of t
he tu
mor
ne
cros
is fa
ctor
rece
ptor
su
perfa
mily
and
is in
volv
ed
in th
e re
gula
tion
of b
one
met
abol
ism
. Als
o di
sclo
sed
are
nucl
eic
acid
s en
codi
ng
Ost
eopr
oteg
erin
, po
lype
ptid
es, r
ecom
bina
nt
vect
ors
and
host
cel
ls fo
r ex
pres
sion
, ant
ibod
ies
whi
ch b
ind
Ost
eopr
oteg
erin
, an
d ph
arm
aceu
tical
co
mpo
sitio
ns. T
he
poly
pept
ides
are
use
d to
tre
at b
one
dise
ases
ch
arac
teriz
ed b
y in
crea
sed
reso
rptio
n su
ch a
s os
teop
oros
is.
US
6284
740
B1:
Wha
t is
clai
med
is:1
. A m
etho
d of
in
crea
sing
leve
ls o
f os
teop
rote
gerin
in a
m
amm
al c
ompr
isin
g ad
min
iste
ring
to th
e m
amm
al a
nuc
leic
aci
d en
codi
ng o
steo
prot
eger
in,
whe
rein
the
adm
inis
tratio
n re
sults
in a
n in
crea
se in
the
leve
l of o
steo
prot
eger
in
and
whe
rein
the
incr
ease
in
the
leve
l of
oste
ogro
tege
rin in
the
mam
mal
resu
lts in
in
crea
sed
bone
den
sity
.
2686
1355
Query = 1355 letters
Length = 1355
Score = 2686 bits (1355), Expect = 0.0
Identities = 1355/1355 (100%)
Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
Query: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
Query: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
Query: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
Query: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
Query: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
Query: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
Query: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
Query: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
not p
rovi
ded
6284
728.
5
cDN
A
US
GE
NE
Ost
eopr
oteg
erin
(P
aten
t)A
mge
n In
c (T
hous
and
Oak
s C
A)
Boy
le W
illia
m J
. (M
oorp
ark,
CA
); La
cey
Dav
id L
. (T
hous
and
Oak
s,
CA
); C
alzo
ne
Fran
k J.
(W
estla
ke V
illag
e,
CA
); C
hang
Min
g-S
hi (N
ewbu
ry
Par
k, C
A)
The
pres
ent i
nven
tion
disc
lose
s a
nove
l sec
rete
d po
lype
ptid
e, te
rmed
O
steo
prot
eger
in, w
hich
is a
m
embe
r of t
he tu
mor
ne
cros
is fa
ctor
rece
ptor
su
perfa
mily
and
is in
volv
ed
in th
e re
gula
tion
of b
one
met
abol
ism
. Als
o di
sclo
sed
are
nucl
eic
acid
s en
codi
ng
Ost
eopr
oteg
erin
, po
lype
ptid
es, r
ecom
bina
nt
vect
ors
and
host
cel
ls fo
r ex
pres
sion
, ant
ibod
ies
whi
ch b
ind
Ost
eopr
oteg
erin
, an
d ph
arm
aceu
tical
co
mpo
sitio
ns. T
he
poly
pept
ides
are
use
d to
tre
at b
one
dise
ases
ch
arac
teriz
ed b
y in
crea
sed
reso
rptio
n su
ch a
s os
teop
oros
is.
US
6284
728
B1:
Wha
t is
clai
med
is:1
. An
isol
ated
po
lype
ptid
e co
nsis
ting
of
the
amin
o ac
id s
eque
nce
as s
how
n in
FIG
. 9B
(SE
Q
ID N
O:6
) fro
m re
sidu
es 2
2 to
401
incl
usiv
e.
2686
1355
Query = 1355 letters
Length = 1355
Score = 2686 bits (1355), Expect = 0.0
Identities = 1355/1355 (100%)
Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
Query: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
Query: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
Query: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
Query: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
Query: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
Query: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
Query: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
Query: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
not p
rovi
ded
Pag
e 1
of 5
BLA
ST
OP
G
Acc
essi
on
Num
ber
Title
Pat
ent
Ass
igne
eIn
vent
orA
bstra
ctE
xem
plar
y C
laim
BLA
ST
Sco
reS
eque
nce
Leng
thB
LAS
T A
lignm
ent
Org
anis
m
6288
032.
5
DN
A
US
GE
NE
Ost
eopr
oteg
erin
(P
aten
t)A
mge
n In
c (T
hous
and
Oak
s C
A)
Boy
le W
illia
m J
. (M
oorp
ark,
CA
); La
cey
Dav
id L
. (T
hous
and
Oak
s,
CA
); C
alzo
ne
Fran
k J.
(W
estla
ke V
illag
e,
CA
); C
hang
Min
g-S
hi (N
ewbu
ry
Par
k, C
A)
The
pres
ent i
nven
tion
disc
lose
s a
nove
l sec
rete
d po
lype
ptid
e, te
rmed
O
steo
prot
eger
in, w
hich
is a
m
embe
r of t
he tu
mor
ne
cros
is fa
ctor
rece
ptor
su
per f
amily
and
is in
volv
ed
in th
e re
gula
tion
of b
one
met
abol
ism
. Als
o di
sclo
sed
are
nucl
eic
acid
s en
codi
ng
Ost
eopr
oteg
erin
, po
lype
ptid
es, r
ecom
bina
nt
vect
ors
and
host
cel
ls fo
r ex
pres
sion
, ant
ibod
ies
whi
ch b
ind
Ost
eopr
oteg
erin
, an
d ph
arm
aceu
tical
co
mpo
sitio
ns. T
he
poly
pept
ides
are
use
d to
tre
at b
one
dise
ases
ch
arac
teriz
ed b
y in
crea
sed
reso
rptio
n su
ch a
s os
teop
oros
is.
US
6288
032
B1:
Wha
t is
clai
med
is:1
. An
isol
ated
po
lype
ptid
e ha
ving
the
biol
ogic
al a
ctiv
ity o
f in
hibi
ting
bone
reso
rptio
n,
the
poly
pept
ide
com
pris
ing
a de
rivat
ive
of th
e am
ino
acid
seq
uenc
e 22
to 4
01
as s
how
n in
FIG
. 9B
(SE
Q
ID N
O:6
) whi
ch d
iffer
s fro
m
amin
o ac
id re
sidu
es 2
2 to
40
1 of
SE
Q ID
NO
:6 b
y de
letio
n of
car
boxy
-te
rmin
al tr
unca
tion
of p
art
or a
ll of
am
ino
acid
re
sidu
es 1
80-4
01.
2686
1355
Query = 1355 letters
Length = 1355
Score = 2686 bits (1355), Expect = 0.0
Identities = 1355/1355 (100%)
Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
Query: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
Query: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
Query: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
Query: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
Query: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
Query: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
Query: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
Query: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
Unk
now
n
6284
485.
5
cDN
A
US
GE
NE
Nuc
leic
aci
ds
enco
ding
os
teop
rote
gerin
(P
aten
t)
Am
gen
Inc
(Tho
usan
d O
aks
CA
)
Boy
le W
illia
m J
. (M
oorp
ark,
CA
); La
cey
Dav
id L
. (T
hous
and
Oak
s,
CA
); C
alzo
ne
Fran
k J.
(W
estla
ke V
illag
e,
CA
); C
hang
Min
g-S
hi (N
ewbu
ry
Par
k, C
A)
The
pres
ent i
nven
tion
disc
lose
s a
nove
l sec
rete
d po
lype
ptid
e, te
rmed
O
steo
prot
eger
in, w
hich
is a
m
embe
r of t
he tu
mor
ne
cros
is fa
ctor
rece
ptor
su
perfa
mily
and
is in
volv
ed
in th
e re
gula
tion
of b
one
met
abol
ism
. Als
o di
sclo
sed
are
nucl
eic
acid
s en
codi
ng
Ost
eopr
oteg
erin
, po
lype
ptid
es, r
ecom
bina
nt
vect
ors
and
host
cel
ls fo
r ex
pres
sion
, ant
ibod
ies
whi
ch b
ind
Ost
eopr
oteg
erin
, an
d ph
arm
aceu
tical
co
mpo
sitio
ns. T
he
poly
pept
ides
are
use
d to
tre
at b
one
dise
ases
ch
arac
teriz
ed b
y in
crea
sed
reso
rptio
n su
ch a
s os
teop
oros
is.
US
6284
485
B1:
Wha
t is
clai
med
is:1
. An
isol
ated
nu
clei
c ac
id e
ncod
ing
a po
lype
ptid
e ha
ving
the
biol
ogic
al a
ctiv
ity o
f in
hibi
ting
bone
reso
rptio
n an
d co
mpr
isin
g a
deriv
ativ
e of
the
amin
o ac
id
sequ
ence
22
to 4
01 a
s sh
own
in F
IG. 9
B (S
EQ
ID
NO
:6) w
hich
diff
ers
from
am
ino
acid
resi
dues
22
to
401
of S
EQ
ID N
O:6
by
dele
tion
or c
arbo
xy-
term
inal
trun
catio
n of
par
t or
all
of a
min
o ac
id
resi
dues
180
to 4
01.
2686
1355
Query = 1355 letters
Length = 1355
Score = 2686 bits (1355), Expect = 0.0
Identities = 1355/1355 (100%)
Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
Query: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
Query: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
Query: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
Query: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
Query: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
Query: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
Query: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
Query: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
not p
rovi
ded
Pag
e 2
of 5
BLA
ST
OP
G
Acc
essi
on
Num
ber
Title
Pat
ent
Ass
igne
eIn
vent
orA
bstra
ctE
xem
plar
y C
laim
BLA
ST
Sco
reS
eque
nce
Leng
thB
LAS
T A
lignm
ent
Org
anis
m
6369
027.
127
DN
A
US
GE
NE
Ost
eopr
oteg
erin
(P
aten
t)A
mge
n In
c (T
hous
and
Oak
s C
A)
Boy
le W
illia
m J
. (M
oorp
ark,
CA
); La
cey
Dav
id L
. (T
hous
and
Oak
s,
CA
); C
alzo
ne
Fran
k J.
(W
estla
ke V
illag
e,
CA
); C
hang
Min
g-S
hi (N
ewbu
ry
Par
k, C
A)
The
pres
ent i
nven
tion
disc
lose
s a
nove
l sec
rete
d po
lype
ptid
e, te
rmed
os
teop
rote
gerin
, whi
ch is
a
mem
ber o
f the
tum
or
necr
osis
fact
or re
cept
or
supe
rfam
ily a
nd is
invo
lved
in
the
regu
latio
n of
bon
e m
etab
olis
m. A
lso
disc
lose
d ar
e nu
clei
c ac
ids
enco
ding
os
teop
rote
gerin
, po
lype
ptid
es, r
ecom
bina
nt
vect
ors
and
host
cel
ls fo
r ex
pres
sion
, ant
ibod
ies
whi
ch b
ind
OP
G, a
nd
phar
mac
eutic
al
com
posi
tions
. The
po
lype
ptid
es a
re u
sed
to
treat
bon
e di
seas
es
char
acte
rized
by
incr
ease
d re
sorp
tion
such
as
oste
opor
osis
.
US
6369
027
B1:
Wha
t is
clai
med
is:1
. A p
olyp
eptid
e co
mpr
isin
g a
carb
oxy
term
inal
trun
catio
n of
the
amin
o ac
id s
eque
nce
as
show
n in
SE
Q ID
NO
:128
w
here
in th
e po
lype
ptid
e co
mpr
ises
the
amin
o ac
id
resi
dues
22
to 1
85, 2
2 to
18
9, 2
2 to
194
or 2
2 to
201
an
d ha
s th
e ac
tivity
of
inhi
bitin
g bo
ne re
sorp
tion.
2686
1355
Query = 1355 letters
Length = 1355
Score = 2686 bits (1355), Expect = 0.0
Identities = 1355/1355 (100%)
Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
Query: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
Query: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
Query: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
Query: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
Query: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
Query: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
Query: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
Query: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
Unk
now
n
6613
544.
5
DN
A
US
GE
NE
Ost
eopr
oteg
erin
(P
aten
t)A
mge
n In
c (T
hous
and
Oak
s C
A)
Boy
le W
illia
m J
. (M
oorp
ark,
CA
); La
cey
Dav
id L
. (T
hous
and
Oak
s,
CA
); C
alzo
ne
Fran
k J.
(Pin
e C
rest
Circ
le, C
A);
Cha
ng M
ing-
Shi
(N
ewbu
ry P
ark,
C
A)
The
pres
ent i
nven
tion
disc
lose
s a
nove
l sec
rete
d po
lype
ptid
e, te
rmed
O
steo
prot
eger
in, w
hich
is a
m
embe
r of t
he tu
mor
ne
cros
is fa
ctor
rece
ptor
su
perfa
mily
and
is in
volv
ed
in th
e re
gula
tion
of b
one
met
abol
ism
. Als
o di
sclo
sed
are
nucl
eic
acid
s en
codi
ng
Ost
eopr
oteg
erin
, po
lype
ptid
es, r
ecom
bina
nt
vect
ors
and
host
cel
ls fo
r ex
pres
sion
, ant
ibod
ies
whi
ch b
ind
Ost
eopr
oteg
erin
, an
d ph
arm
aceu
tical
co
mpo
sitio
ns. T
he
poly
pept
ides
are
use
d to
tre
at b
one
dise
ases
ch
arac
teriz
ed b
y in
crea
sed
reso
rptio
n su
ch a
s os
teop
oros
is.
US
6613
544
B1:
Wha
t is
clai
med
is:1
. An
isol
ated
nu
clei
c ac
id e
ncod
ing
a po
lype
ptid
e co
mpr
isin
g th
e am
ino
acid
seq
uenc
e fro
m
resi
dues
1 to
401
or f
rom
re
sidu
es 2
2 to
401
as
show
n in
FIG
. 9B
(SE
Q ID
N
O:6
).
2686
1355
Query = 1355 letters
Length = 1355
Score = 2686 bits (1355), Expect = 0.0
Identities = 1355/1355 (100%)
Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
Query: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
Query: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
Query: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
Query: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
Query: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
Query: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
Query: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
Query: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
Unk
now
n
Pag
e 3
of 5
BLA
ST
OP
G
Acc
essi
on
Num
ber
Title
Pat
ent
Ass
igne
eIn
vent
orA
bstra
ctE
xem
plar
y C
laim
BLA
ST
Sco
reS
eque
nce
Leng
thB
LAS
T A
lignm
ent
Org
anis
m
7005
413.
124
DN
A
US
GE
NE
Com
bina
tion
ther
apy
for
cond
ition
s le
adin
g to
bon
e lo
ss
(Pat
ent)
Am
gen
Inc
(Tho
usan
d O
aks
CA
)
Boy
le W
illia
m J
. (M
oorp
ark,
CA
); La
cey
Dav
id L
ee
(New
bury
Par
k,
CA
); C
alzo
ne
Fran
k J.
(W
estla
ke V
illag
e,
CA
); C
hang
Min
g-S
hi (T
aina
n, T
W)
The
pres
ent i
nven
tion
disc
lose
s a
nove
l sec
rete
d po
lype
ptid
e, te
rmed
os
teop
rote
gerin
, whi
ch is
a
mem
ber o
f the
tum
or
necr
osis
fact
or re
cept
or
supe
rfam
ily a
nd is
invo
lved
in
the
regu
latio
n of
bon
e m
etab
olis
m. A
lso
disc
lose
d ar
e nu
clei
c ac
ids
enco
ding
os
teop
rote
gerin
, po
lype
ptid
es, r
ecom
bina
nt
vect
ors
and
host
cel
ls fo
r ex
pres
sion
, ant
ibod
ies
whi
ch b
ind
OP
G, a
nd
phar
mac
eutic
al
com
posi
tions
. The
po
lype
ptid
es a
re u
sed
to
treat
bon
e di
seas
es
char
acte
rized
by
incr
ease
d re
sorp
tion
such
as
oste
opor
osis
. Met
hods
of
treat
men
t are
des
crib
ed
usin
g th
e po
lype
ptid
es in
co
njun
ctio
n w
ith v
ario
us
agen
ts, i
nclu
ding
IL-1
in
hibi
tors
, TN
F-et
hX98
in
hibi
tors
, and
ser
ine
prot
ease
inhi
bito
rs.
US
7005
413
B1:
Wha
t is
clai
med
is:1
. A m
etho
d of
tre
atin
g bo
ne lo
ss, w
hich
co
mpr
ises
adm
inis
terin
g an
IL-1
inhi
bito
r, a
TNF-
ethX
98 in
hibi
tor,
and
an
OP
G p
rote
in, w
here
in
'OP
G p
rote
in' r
efer
s to
a
poly
pept
ide
com
pris
ing
cons
erve
d re
sidu
es fr
om
resi
dues
22
to 1
85 o
f SE
Q
ID N
OS
: 121
, 123
, or 1
25.
2656
1356
Query = 1355 letters
Length = 1356
Score = 2656 bits (1340), Expect = 0.0
Identities = 1353/1356 (99%), Gaps = 1/1356 (0%)
Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
||||||||||||||||||||||||||||||||||||||||||| ||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggcgcgctcgcccagccgc
Query: 61 cg-ctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgct
|| |||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cgyctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgct
Query: 120 cgtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacct
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 cgtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacct
Query: 180 tcattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtaccta
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 tcattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtaccta
Query: 240 cctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccacta
||||||||||||||||||||||||||||||| ||||||||||||||||||||||||||||
Sbjct: 241 cctaaaacaacactgtacagcaaagtggaagtccgtgtgcgccccttgccctgaccacta
Query: 300 ctacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaagga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 ctacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaagga
Query: 360 gctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaagga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 gctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaagga
Query: 420 agggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgg
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 agggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgg
Query: 480 agtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggtt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 agtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggtt
Unk
now
n
7205
397.
6
cDN
A
US
GE
NE
Pro
tein
s an
d m
etho
ds fo
r pr
oduc
ing
the
prot
eins
(Pat
ent)
San
kyo
Co
Ltd
(Tok
yo
JP)
Got
o M
asaa
ki
(Ishi
bash
imac
hi,
JP);
Tsud
a E
isuk
e (Is
hiba
shim
achi
, JP
); M
ochi
zuki
S
hin?
ichi
(M
inam
ikaw
achi
mac
hi, J
P);
Yan
o K
azuk
i (Is
hiba
shim
achi
, JP
); K
obay
ashi
Fu
mie
(K
awac
him
achi
, JP
); S
him
a N
obuy
uki
(Min
amik
awac
him
achi
, JP
); Y
asud
a H
isat
aka
(Min
amik
awac
him
achi
, JP
); N
akag
awa
Nob
uaki
(Is
hiba
shim
achi
, JP
); M
orin
aga
Tom
onor
i (M
ibum
achi
, JP
); U
eda
Mas
atsu
gu
(Kaw
agoe
, JP
); H
igas
hio
Kan
ji (K
awag
oe, J
P)
A p
rote
in w
hich
inhi
bits
os
teoc
last
diff
eren
tiatio
n an
d/or
mat
urat
ion
and
a m
etho
d fo
r pro
duci
ng th
e pr
otei
n. T
he p
rote
in is
pr
oduc
ed b
y hu
man
em
bryo
nic
lung
fibr
obla
sts
and
has
a m
olec
ular
wei
ght
of a
bout
60
kD a
nd a
bout
12
0 kD
und
er n
on-r
educ
ing
cond
ition
s an
d ab
out 6
0 kD
un
der r
educ
ing
cond
ition
s on
SD
S-p
olya
cryl
amid
e ge
l el
ectro
phor
esis
. The
pro
tein
ca
n be
isol
ated
and
pur
ified
fro
m th
e cu
lture
med
ium
of
fibro
blas
ts. F
urth
erm
ore,
th
e pr
otei
n ca
n be
pro
duce
d by
gen
e en
gine
erin
g. T
he
pres
ent i
nven
tion
incl
udes
cD
NA
for p
rodu
cing
the
prot
ein
by g
ene
engi
neer
ing,
ant
ibod
ies
havi
ng s
peci
fic a
ffini
ty fo
r th
e pr
otei
n or
a m
etho
d fo
r de
term
inin
g pr
otei
n co
ncen
tratio
n us
ing
thes
e an
tibod
ies.
US
7205
397
B2:
1. A
n is
olat
ed p
olyn
ucle
otid
e co
mpr
isin
g th
e nu
cleo
tide
sequ
ence
as
prov
ided
in
SE
Q ID
NO
. 83.
2375
1206
Query = 1355 letters
Length = 1206
Score = 2375 bits (1198), Expect = 0.0
Identities = 1204/1206 (99%)
Strand = Plus / Plus
Query: 94 atgaacaagttgctgtgctgcgcgctcgtgtttctggacatctccattaagtggaccacc
|||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 atgaacaacttgctgtgctgcgcgctcgtgtttctggacatctccattaagtggaccacc
Query: 154 caggaaacgtttcctccaaagtaccttcattatgacgaagaaacctctcatcagctgttg
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 caggaaacgtttcctccaaagtaccttcattatgacgaagaaacctctcatcagctgttg
Query: 214 tgtgacaaatgtcctcctggtacctacctaaaacaacactgtacagcaaagtggaagacc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 tgtgacaaatgtcctcctggtacctacctaaaacaacactgtacagcaaagtggaagacc
Query: 274 gtgtgcgccccttgccctgaccactactacacagacagctggcacaccagtgacgagtgt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 gtgtgcgccccttgccctgaccactactacacagacagctggcacaccagtgacgagtgt
Query: 334 ctatactgcagccccgtgtgcaaggagctgcagtacgtcaagcaggagtgcaatcgcacc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctatactgcagccccgtgtgcaaggagctgcagtacgtcaagcaggagtgcaatcgcacc
Query: 394 cacaaccgcgtgtgcgaatgcaaggaagggcgctaccttgagatagagttctgcttgaaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 cacaaccgcgtgtgcgaatgcaaggaagggcgctaccttgagatagagttctgcttgaaa
Query: 454 cataggagctgccctcctggatttggagtggtgcaagctggaaccccagagcgaaataca
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 cataggagctgccctcctggatttggagtggtgcaagctggaaccccagagcgaaataca
Query: 514 gtttgcaaaagatgtccagatgggttcttctcaaatgagacgtcatctaaagcaccctgt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gtttgcaaaagatgtccagatgggttcttctcaaatgagacgtcatctaaagcaccctgt
Query: 574 agaaaacacacaaattgcagtgtctttggtctcctgctaactcagaaaggaaatgcaaca
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 agaaaacacacaaattgcagtgtctttggtctcctgctaactcagaaaggaaatgcaaca
Not
pro
vide
d
Pag
e 4
of 5
BLA
ST
OP
G
Acc
essi
on
Num
ber
Title
Pat
ent
Ass
igne
eIn
vent
orA
bstra
ctE
xem
plar
y C
laim
BLA
ST
Sco
reS
eque
nce
Leng
thB
LAS
T A
lignm
ent
Org
anis
m
6919
434.
6
DN
A
US
GE
NE
Mon
oclo
nal
antib
odie
s th
at b
ind
OC
IF (P
aten
t)
San
kyo
Co
Ltd
(Tok
yo
JP)
Got
o M
asaa
ki
(Toc
higi
, JP
); Ts
uda
Eis
uke
(Toc
higi
, JP
); M
ochi
zuki
S
hin'
ichi
(Toc
higi
, JP
); Y
ano
Kaz
uki
(Toc
higi
, JP
); K
obay
ashi
Fum
ie
(Toc
higi
, JP
); S
him
a N
obuy
uki
(Toc
higi
, JP
); Y
asud
a H
isat
aka
(Toc
higi
, JP
); N
akag
awa
Nob
uaki
(Toc
higi
, JP
); M
orin
aga
Tom
onor
i (T
ochi
gi, J
P);
Ued
a M
asat
sugu
(S
aita
ma,
JP
); H
igas
hio
Kan
ji (S
aita
ma,
JP
)
A p
rote
in w
hich
inhi
bits
os
teoc
last
diff
ract
ion
and/
or
mat
urat
ion
and
a m
etho
d fo
r pro
duci
ng th
e pr
otei
n.
The
prot
ein
is p
rodu
ced
by
hum
an e
mbr
yoni
c lu
ng
fibro
blas
ts a
nd h
as a
m
olec
ular
wei
ght o
f abo
ut
60 k
D a
nd a
bout
120
kD
un
der n
on-r
educ
ing
cond
ition
s an
d ab
out 6
0 kD
un
der r
educ
ing
cond
ition
s an
SD
S-p
olya
cryl
amid
e ge
l el
ectro
phor
esis
. The
pro
tein
ca
n be
isol
ated
and
pur
ified
fro
m th
e cu
lture
med
ium
of
fibro
blas
ts. F
urth
erm
ore,
th
e pr
otei
n ca
n be
pro
duce
d by
gen
e en
gine
erin
g. T
he
pres
ent i
nven
tion
incl
udes
cD
NA
for p
rodu
cing
the
prot
ein
by g
ene
engi
neer
ing,
ant
ibod
ies
havi
ng s
peci
fic a
ffini
ty fo
r th
e pr
otei
n or
a m
etho
d fo
r de
term
inin
g pr
otei
n co
ncen
tratio
n us
ing
thes
e an
tibod
ies.
US
6919
434
B1:
1. A
n is
olat
ed m
onoc
lona
l an
tibod
y pr
oduc
ed b
y a
hybr
idom
a se
lect
ed fr
om
the
grou
p co
nsis
ting
of
A1G
5 ha
ving
Acc
essi
on
No.
FE
RM
BP
-74
41,D
2F4h
avin
g A
cces
sion
No.
FE
RM
BP
-74
42, a
nd E
3H8
havi
ng
Acc
essi
on N
o. F
ER
M B
P-
7443
reco
gniz
ing
oste
ocla
stog
enes
is
inhi
bito
ry fa
ctor
.
2375
1206
Query = 1355 letters
Length = 1206
Score = 2375 bits (1198), Expect = 0.0
Identities = 1204/1206 (99%)
Strand = Plus / Plus
Query: 94 atgaacaagttgctgtgctgcgcgctcgtgtttctggacatctccattaagtggaccacc
|||||||| |||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 atgaacaacttgctgtgctgcgcgctcgtgtttctggacatctccattaagtggaccacc
Query: 154 caggaaacgtttcctccaaagtaccttcattatgacgaagaaacctctcatcagctgttg
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 caggaaacgtttcctccaaagtaccttcattatgacgaagaaacctctcatcagctgttg
Query: 214 tgtgacaaatgtcctcctggtacctacctaaaacaacactgtacagcaaagtggaagacc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 tgtgacaaatgtcctcctggtacctacctaaaacaacactgtacagcaaagtggaagacc
Query: 274 gtgtgcgccccttgccctgaccactactacacagacagctggcacaccagtgacgagtgt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 gtgtgcgccccttgccctgaccactactacacagacagctggcacaccagtgacgagtgt
Query: 334 ctatactgcagccccgtgtgcaaggagctgcagtacgtcaagcaggagtgcaatcgcacc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctatactgcagccccgtgtgcaaggagctgcagtacgtcaagcaggagtgcaatcgcacc
Query: 394 cacaaccgcgtgtgcgaatgcaaggaagggcgctaccttgagatagagttctgcttgaaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 cacaaccgcgtgtgcgaatgcaaggaagggcgctaccttgagatagagttctgcttgaaa
Query: 454 cataggagctgccctcctggatttggagtggtgcaagctggaaccccagagcgaaataca
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 cataggagctgccctcctggatttggagtggtgcaagctggaaccccagagcgaaataca
Query: 514 gtttgcaaaagatgtccagatgggttcttctcaaatgagacgtcatctaaagcaccctgt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gtttgcaaaagatgtccagatgggttcttctcaaatgagacgtcatctaaagcaccctgt
Query: 574 agaaaacacacaaattgcagtgtctttggtctcctgctaactcagaaaggaaatgcaaca
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 agaaaacacacaaattgcagtgtctttggtctcctgctaactcagaaaggaaatgcaaca
Unk
now
n
6790
823.
1
mR
NA
U
SG
EN
E
Com
posi
tions
and
m
etho
ds fo
r the
pr
even
tion
and
treat
men
t of
card
iova
scul
ar
dise
ases
(Pat
ent)
Am
gen
Inc
(Tho
usan
d O
aks
CA
)
Sim
onet
Sco
tt (T
hous
and
Oak
s,
CA
); S
aros
i Ild
iko
(New
bury
Par
k,
CA
)
Met
hods
and
com
posi
tions
fo
r the
pre
vent
ion
and
treat
men
t of c
ardi
ovas
cula
r di
seas
e is
des
crib
ed.
Adm
inis
tratio
n of
os
teop
rote
gerin
(OP
G) i
n a
phar
mac
eutic
al c
ompo
sitio
n pr
even
ts a
nd tr
eats
at
hero
scle
rosi
s an
d as
soci
ated
car
diov
ascu
lar
dise
ases
.
US
6790
823
B1:
Wha
t is
clai
med
is:1
. A m
etho
d of
re
duci
ng th
e in
cide
nce
of
arte
rial c
alci
ficat
ion
in a
pa
tient
at r
isk
for a
rteria
l ca
lcifi
catio
n co
mpr
isin
g ad
min
iste
ring
to s
aid
patie
nt a
ther
apeu
tical
ly
effe
ctiv
e am
ount
of
oste
opro
tege
rin(O
PG
) in
a ph
arm
aceu
tical
co
mpo
sitio
n.
2686
1355
Query = 1355 letters
Length = 1355
Score = 2686 bits (1355), Expect = 0.0
Identities = 1355/1355 (100%)
Strand = Plus / Plus
Query: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 1 gtatatataacgtgatgagcgtacgggtgcggagacgcaccggagcgctcgcccagccgc
Query: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 61 cgctccaagcccctgaggtttccggggaccacaatgaacaagttgctgtgctgcgcgctc
Query: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 121 gtgtttctggacatctccattaagtggaccacccaggaaacgtttcctccaaagtacctt
Query: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 181 cattatgacgaagaaacctctcatcagctgttgtgtgacaaatgtcctcctggtacctac
Query: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 241 ctaaaacaacactgtacagcaaagtggaagaccgtgtgcgccccttgccctgaccactac
Query: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 301 tacacagacagctggcacaccagtgacgagtgtctatactgcagccccgtgtgcaaggag
Query: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 361 ctgcagtacgtcaagcaggagtgcaatcgcacccacaaccgcgtgtgcgaatgcaaggaa
Query: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 421 gggcgctaccttgagatagagttctgcttgaaacataggagctgccctcctggatttgga
Query: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 481 gtggtgcaagctggaaccccagagcgaaatacagtttgcaaaagatgtccagatgggttc
Unk
now
n
Pag
e 5
of 5
USGENE® and DGENE on STN® Multifile techniques
Page 1
Example: patent family based duplicate identification/removal => FILE USGENE FILE 'USGENE' ENTERED AT 00:30:11 ON 10 FEB 2008 COPYRIGHT (C) 2008 SEQUENCEBASE CORP FILE LAST UPDATED: 8 FEB 2008 <20080208/UP> MOST RECENT PUBLICATION DATE: 7 FEB 2008 <20080207/PD> FILE COVERS 1982 TO DATE >>> SIMULTANEOUS LEFT AND RIGHT TRUNCATION (SLART) IS AVAILABLE IN THE BASIC INDEX (/BI) AND FEATURE TABLE (/FEAT) FIELDS <<< >>> FOR THE LATEST USGENE REFERENCE MATERIALS, PLEASE VISIT: http://www.stn-international.de/stndatabases/details/usgene-first-p.html >>> DOWNLOAD RUN BLAST/GETSIM FREQUENTLY ASKED QUESTIONS: http://www.stn-international.de/training_center/bioseq/usgenefaq.pdf >>> DOWNLOAD COMPLETE USGENE HELP AS PDF: http://www.stn-international.de/training_center/bioseq/usgene_help.pdf <<< >>> USGENE now provides USPTO sequence data within 3 days of publication - see NEWS <<< => UPL R BLAST UPLOAD SUCCESSFULLY COMPLETED L1 GENERATED => D LQUE L1 ANSWER 1 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN LQUE MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPAFEGDDGFCVFESN AIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGIMHHNKQATENAKEEVRRILGLLDAYL KTRTFLVGERVTLADITVVCTLLWLYKQVLEPSFRQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAK KFAETQPKKDTPRKEKGSREEKQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDE FKRKYSNEDTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASVILFGT NNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWEGAFQHVGKAFNQGKIFK => D FORMAT USER-DEFINED FORMAT DEFINITION DEFAULT FORMAT FOR FILE ------------------- ---------------------------- ----------------------------- .MYDGENEALIGN TRI OS ALIGN DGENE .MYPCTGENALIGN TRI ORGN SEQN ALIGN PCTGEN .MYUSGENEALIGN TRI ORGN SEQN SEQC ALIGN USGENE .MYWPINDEX BIB WPINDEX
USGENE® and DGENE on STN® Multifile techniques
Page 2
=> RUN BLAST L1 /SQP -F F BLAST Version 2.2 The BLAST software is used herein with permission of the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). See also, Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402 BLAST SEARCHING BLAST SEARCHING Database USGENE AA Posted date: Feb 8, 2008 2:40 PM Number of letters in database: 577,282,380 Number of sequences in database: 2,645,472 Lambda K H 0.319 0.134 0.412 Gapped Lambda K H 0.267 0.0410 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Hits to DB: 549,143,651 Number of Sequences: 2645472 Number of extensions: 24136842 Number of successful extensions: 184208 Number of sequences better than 10.0: 1600 Number of HSP's better than 10.0 without gapping: 665 Number of HSP's successfully gapped in prelim test: 948 Number of HSP's that attempted gapping in prelim test: 157618 Number of HSP's gapped (non-prelim): 13426 length of query: 437 length of database: 577,282,380 effective HSP length: 126 effective length of query: 311 effective length of database: 243,952,908 effective search space: 75869354388 effective search space used: 75869354388 T: 11 A: 40 X1: 16 ( 7.4 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 41 (21.8 bits) S2: 74 (33.1 bits) 1600 ANSWERS FOUND BELOW EXPECTATION VALUE OF 10.0
USGENE® and DGENE on STN® Multifile techniques
Page 3
Similarity Score 902 | | | | | | | | | | 451 | | | | ||| ||| |||| |||| ||||| |||||||||||||||||||||||||||||||||||||||||||||||||| Answer Count 320 640 960 1280 1600 HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?: ALL L2 RUN STATEMENT CREATED L2 1600 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFL RKFPAGKVPAFEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVS FADSDIVPPASTWVFPTLGIMHHNKQATENAKEEVRRILGLLDAYLKTRT FLVGERVTLADITVVCTLLWLYKQVLEPSFRQAFPNTNRWFLTCINQPQF RAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREEKQKPQAERKE EKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLD KLRKNAFASVILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWR KLDPGSEETQTLVREYFSWEGAFQHVGKAFNQGKIFK/SQP.-F F Answer set arranged by accession number; to sort by descending similarity score, enter at an arrow prompt (=>) "sor score d". => SOR SCORE D PROCESSING COMPLETED FOR L2 L3 1600 SOR L2 SCORE D
USGENE® and DGENE on STN® Multifile techniques
Page 4
=> D ( = customized default format .MYUSGENEALIGN) L3 ANSWER 1 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN TI Tissue-and serum-derived glycoproteins and methods of their use (PublishedApplication) MTY Protein SQL 437 ORGN Homo Sapiens SEQN 10979 SEQC 14918 BLASTALIGN Query = 437 letters Length = 437 Score = 902 bits (2331), Expect = 0.0 Identities = 437/437 (100%), Positives = 437/437 (100%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 421 GAFQHVGKAFNQGKIFK 437 => D SCORE 1-20 L3 ANSWER 1 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 902 L3 ANSWER 2 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 902 L3 ANSWER 3 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 902 L3 ANSWER 4 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 902 L3 ANSWER 5 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 900
USGENE® and DGENE on STN® Multifile techniques
Page 5
L3 ANSWER 6 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 889 L3 ANSWER 7 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 889 L3 ANSWER 8 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 889 L3 ANSWER 9 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 880 L3 ANSWER 10 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 788 L3 ANSWER 11 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 717 L3 ANSWER 12 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 656 L3 ANSWER 13 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 656 L3 ANSWER 14 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 656 L3 ANSWER 15 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 495 L3 ANSWER 16 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 405 L3 ANSWER 17 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 392 L3 ANSWER 18 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 392 L3 ANSWER 19 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 290 L3 ANSWER 20 OF 1600 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN SCORE 290 => SOR AN 1-10 PROCESSING COMPLETED FOR L3 L4 10 SOR L3 1-10 AN
USGENE® and DGENE on STN® Multifile techniques
Page 6
=> FILE DGENE FILE 'DGENE' ENTERED AT 23:57:11 ON 09 FEB 2008 COPYRIGHT (C) 2008 THE THOMSON CORPORATION => RUN BLAST L1 /SQP -F F BLAST Version 2.2 The BLAST software is used herein with permission of the National Center for Biotechnology Information (NCBI) of the National Library of Medicine (NLM). See also, Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs." Nucleic Acids Res. 25:3389-3402 BLAST SEARCHING BLAST SEARCHING BLAST SEARCHING BLAST SEARCHING Database DGENE AA Posted date: Jan 30, 2008 9:11 PM Number of letters in database: 588,906,564 Number of sequences in database: 3,295,692 Lambda K H 0.319 0.134 0.412 Gapped Lambda K H 0.267 0.0410 0.140 Matrix: BLOSUM62 Gap Penalties: Existence: 11, Extension: 1 Number of Hits to DB: 565,627,694 Number of Sequences: 3295692 Number of extensions: 24731570 Number of successful extensions: 202877 Number of sequences better than 10.0: 1904 Number of HSP's better than 10.0 without gapping: 807 Number of HSP's successfully gapped in prelim test: 1110 Number of HSP's that attempted gapping in prelim test: 170460 Number of HSP's gapped (non-prelim): 16066 length of query: 437 length of database: 588,906,564 effective HSP length: 123 effective length of query: 314 effective length of database: 183,536,448 effective search space: 57630444672 effective search space used: 57630444672 T: 11 A: 40 X1: 16 ( 7.4 bits) X2: 38 (14.6 bits) X3: 64 (24.7 bits) S1: 41 (21.8 bits) S2: 73 (32.7 bits) 1904 ANSWERS FOUND BELOW EXPECTATION VALUE OF 10.0
USGENE® and DGENE on STN® Multifile techniques
Page 7
Similarity Score 902 | | | | | | | | | | 451 | | | | || ||| ||| ||| |||| |||||||||||||||||||||||||||||||||||||||||||||||||| Answer Count 380 760 1140 1520 1900 HOW MANY ANSWERS WOULD YOU LIKE TO KEEP ? (ALL) OR ?:ALL L5 RUN STATEMENT CREATED L5 1904 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFL RKFPAGKVPAFEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVS FADSDIVPPASTWVFPTLGIMHHNKQATENAKEEVRRILGLLDAYLKTRT FLVGERVTLADITVVCTLLWLYKQVLEPSFRQAFPNTNRWFLTCINQPQF RAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREEKQKPQAERKE EKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLD KLRKNAFASVILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWR KLDPGSEETQTLVREYFSWEGAFQHVGKAFNQGKIFK/SQP.-F F Answer set arranged by accession number; to sort by descending similarity score, enter at an arrow prompt (=>) "sor score d". => SOR SCORE D PROCESSING COMPLETED FOR L5 L6 1904 SOR L5 SCORE D
USGENE® and DGENE on STN® Multifile techniques
Page 8
=> D ( = customized default format .MYDGENETRIAL) L6 ANSWER 1 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN AN AEL43555 protein DGENE TI New human cancer suppressor proteins and DNA, useful for diagnosing, preventing, and treating human cancers, e.g. cancer of the breast, brain, heart, muscles, large intestine, thymus, spleen, kidney, liver, or small intestine. DESC Human cancer suppressor protein GIG35. KW diagnosis; therapeutic; prophylaxis; gene therapy; cancer; tumor; neoplasm; cytostatic; GIG35; EEF1G. SQL 437 OS 2006-747536 [76] BLASTALIGN Query = 437 letters Length = 437 Score = 902 bits (2331), Expect = 0.0 Identities = 437/437 (100%), Positives = 437/437 (100%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 421 GAFQHVGKAFNQGKIFK 437 => D SCORE 1-20 L6 ANSWER 1 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 902 L6 ANSWER 2 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 902 L6 ANSWER 3 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 902 L6 ANSWER 4 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 902
USGENE® and DGENE on STN® Multifile techniques
Page 9
L6 ANSWER 5 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 902 L6 ANSWER 6 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 900 L6 ANSWER 7 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 900 L6 ANSWER 8 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 900 L6 ANSWER 9 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 897 L6 ANSWER 10 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 889 L6 ANSWER 11 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 889 L6 ANSWER 12 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 889 L6 ANSWER 13 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 889 L6 ANSWER 14 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 880 L6 ANSWER 15 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 880 L6 ANSWER 16 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 717 L6 ANSWER 17 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 656 L6 ANSWER 18 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 656 L6 ANSWER 19 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 495 L6 ANSWER 20 OF 1904 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN SCORE 495 => SOR AN 1-15 PROCESSING COMPLETED FOR L6 L7 15 SOR L6 1-15 AN
USGENE® and DGENE on STN® Multifile techniques
Page 10
=> FILE WPINDEX FILE 'WPINDEX' ENTERED AT 00:00:27 ON 10 FEB 2008 COPYRIGHT (C) 2008 THE THOMSON CORPORATION '.MYWPINDEX' IS DEFAULT FORMAT FOR 'WPINDEX' FILE => TRA L4 PN; TRA L7 PN L8 TRANSFER L4 1- PN : 9 TERMS L9 8 L8 L10 TRANSFER L7 1- PN : 12 TERMS L11 12 L10 => S L9 OR L11 L12 16 L9 OR L11 => DUP IDE L4 L7 L12 DUPLICATE IS NOT AVAILABLE IN 'USGENE, DGENE'. ANSWERS FROM THESE FILES WILL BE CONSIDERED UNIQUE FILE 'USGENE' ENTERED AT 00:01:34 ON 10 FEB 2008 COPYRIGHT (C) 2008 SEQUENCEBASE CORP FILE 'DGENE' ENTERED AT 00:01:34 ON 10 FEB 2008 COPYRIGHT (C) 2008 THE THOMSON CORPORATION FILE 'WPINDEX' ENTERED AT 00:01:34 ON 10 FEB 2008 COPYRIGHT (C) 2008 THE THOMSON CORPORATION PROCESSING COMPLETED FOR L4 PROCESSING COMPLETED FOR L7 PROCESSING COMPLETED FOR L12 L13 41 DUP IDE L4 L7 L12 (INCLUDES 0 SETS OF DUPLICATES) ANSWERS '1-10' FROM FILE USGENE ANSWERS '11-25' FROM FILE DGENE ANSWERS '26-41' FROM FILE WPINDEX
USGENE® and DGENE on STN® Multifile techniques
Page 11
=> FSORT L13 SET SMARTSELECT ON SET COMMAND COMPLETED SET HIGHLIGHTING OFF SET COMMAND COMPLETED SEL L13 1- PN,APPS L14 SEL L13 1- PN APPS : 187 TERMS 'L14' DELETED L14 41 FSO L13 15 Multi-record Families Answers 1-41 Family 1 Answers 1-8 Family 2 Answers 9-11 Family 3 Answers 12-13 Family 4 Answers 14-15 Family 5 Answers 16-17 Family 6 Answers 18-19 Family 7 Answers 20-22 Family 8 Answers 23-25 Family 9 Answers 26-28 Family 10 Answers 29-30 Family 11 Answers 31-32 Family 12 Answers 33-34 Family 13 Answers 35-37 Family 14 Answers 38-39 Family 15 Answers 40-41 0 Individual Records 0 Non-patent Records SET SMARTSELECT OFF SET COMMAND COMPLETED SET HIGHLIGHTING ON SET COMMAND COMPLETED
USGENE® and DGENE on STN® Multifile techniques
Page 12
=> D PFAM=1,3,5,6,8,11,14 TOT (using customized file default formats) L14 ANSWER 1 OF 41 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY 1 TI Novel nucleic acids and polypeptides (PublishedApplication) MTY Protein SQL 516 ORGN Homo Sapiens SEQN 3426 SEQC 3960 BLASTALIGN Query = 437 letters Length = 516 Score = 900 bits (2327), Expect = 0.0 Identities = 436/437 (99%), Positives = 436/437 (99%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 80 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 140 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 200 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAV GEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 260 RQAFPNTNRWFLTCINQPQFRAVFGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 320 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 380 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 440 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 500 GAFQHVGKAFNQGKIFK 516
USGENE® and DGENE on STN® Multifile techniques
Page 13
L14 ANSWER 2 OF 41 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY 1 TI Novel nucleic acids and polypeptides (PublishedApplication) MTY Protein SQL 466 ORGN Homo Sapiens SEQN 1458 SEQC 3960 BLASTALIGN Query = 437 letters Length = 466 Score = 880 bits (2273), Expect = 0.0 Identities = 426/428 (99%), Positives = 426/428 (99%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGA VRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 7 MAAGTLYTYPENWRAFKALIAAQYSGAHVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 67 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 127 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 187 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 247 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 307 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 367 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGK 428 GAFQHV K Sbjct: 427 GAFQHVAK 434
USGENE® and DGENE on STN® Multifile techniques
Page 14
L14 ANSWER 3 OF 41 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 1 AN AGI34472 protein DGENE TI New isolated human polynucleotides and polypeptides, useful for treating, e.g. Alzheimer's, Parkinson's disease, Huntington's disease, multiple sclerosis, rheumatoid arthritis, allergy, asthma, or and cancers. DESC Human protein, SEQ ID 3426. KW Neuroprotective; Nootropic; Antiparkinsonian; Immunosuppressive; Antirheumatic; Antiarthritic; Antithyroid; Immunosuppressive; Antidiabetic; Antiallergic; Dermatological; Ophthalmological; Antiasthmatic; Respiratory- Gen.; Cytostatic; Gene Therapy; neurological disease; autoimmune disease; allergy; asthma; respiratory disorder; cancer. SQL 516 OS 2007-372063 [35] BLASTALIGN Query = 437 letters Length = 516 Score = 900 bits (2327), Expect = 0.0 Identities = 436/437 (99%), Positives = 436/437 (99%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 80 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 140 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 200 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAV GEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 260 RQAFPNTNRWFLTCINQPQFRAVFGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 320 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 380 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 440 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 500 GAFQHVGKAFNQGKIFK 516
USGENE® and DGENE on STN® Multifile techniques
Page 15
L14 ANSWER 4 OF 41 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 1 AN AGI32504 protein DGENE TI New isolated human polynucleotides and polypeptides, useful for treating, e.g. Alzheimer's, Parkinson's disease, Huntington's disease, multiple sclerosis, rheumatoid arthritis, allergy, asthma, or and cancers. DESC Human protein, SEQ ID 1458. KW Neuroprotective; Nootropic; Antiparkinsonian; Immunosuppressive; Antirheumatic; Antiarthritic; Antithyroid; Immunosuppressive; Antidiabetic; Antiallergic; Dermatological; Ophthalmological; Antiasthmatic; Respiratory- Gen.; Cytostatic; Gene Therapy; neurological disease; autoimmune disease; allergy; asthma; respiratory disorder; cancer. SQL 466 OS 2007-372063 [35] BLASTALIGN Query = 437 letters Length = 466 Score = 880 bits (2273), Expect = 0.0 Identities = 426/428 (99%), Positives = 426/428 (99%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGA VRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 7 MAAGTLYTYPENWRAFKALIAAQYSGAHVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 67 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 127 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 187 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 247 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 307 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 367 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGK 428 GAFQHV K Sbjct: 427 GAFQHVAK 434
USGENE® and DGENE on STN® Multifile techniques
Page 16
L14 ANSWER 5 OF 41 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 1 AN AAM79780 Protein DGENE TI Nucleic acids encoding polypeptides with cytokine-like activities, useful in diagnosis and gene therapy - DESC Human protein SEQ ID NO 3426. KW Human; cytokine; cell proliferation; cell differentiation; gene therapy; vaccine; peptide therapy; stem cell growth factor; haematopoiesis; tissue growth factor; immunomodulatory; cancer; leukaemia; nervous system disorder; arthritis; inflammation. SQL 516 OS 2001-476283 [51] BLASTALIGN Query = 437 letters Length = 516 Score = 900 bits (2327), Expect = 0.0 Identities = 436/437 (99%), Positives = 436/437 (99%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 80 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 140 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 200 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAV GEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 260 RQAFPNTNRWFLTCINQPQFRAVFGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 320 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 380 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 440 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 500 GAFQHVGKAFNQGKIFK 516
USGENE® and DGENE on STN® Multifile techniques
Page 17
L14 ANSWER 6 OF 41 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 1 AN AAM78796 Protein DGENE TI Nucleic acids encoding polypeptides with cytokine-like activities, useful in diagnosis and gene therapy - DESC Human protein SEQ ID NO 1458. KW Human; cytokine; cell proliferation; cell differentiation; gene therapy; vaccine; peptide therapy; stem cell growth factor; haematopoiesis; tissue growth factor; immunomodulatory; cancer; leukaemia; nervous system disorder; arthritis; inflammation. SQL 466 OS 2001-476283 [51] BLASTALIGN Query = 437 letters Length = 466 Score = 880 bits (2273), Expect = 0.0 Identities = 426/428 (99%), Positives = 426/428 (99%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGA VRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 7 MAAGTLYTYPENWRAFKALIAAQYSGAHVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 67 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 127 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 187 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 247 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 307 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 367 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGK 428 GAFQHV K Sbjct: 427 GAFQHVAK 434
USGENE® and DGENE on STN® Multifile techniques
Page 18
L14 ANSWER 7 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY1 AN 2007-372063 [35] WPINDEX Full-text CR 2001-442253; 2001-442255; 2001-451890; 2001-451908; 2001-451909; 2001-451912; 2001-451938; 2001-451939; 2001-457603; 2001-457740; 2001-465363; 2001-465571; 2001-465578; 2001-465705; 2001-476114; 2001-476164; 2001-476197; 2001-476198; 2001-476199; 2001-476282; 2001-476283; 2001-483140; 2001-483233; 2001-488707; 2001-488788; 2001-488875; 2001-488895; 2001-496929; 2001-496930; 2001-496931; 2001-496932; 2001-514838; 2001-522358; 2001-565565; 2001-582152; 2001-582153; 2001-589862; 2001-589934; 2001-607699; 2001-611724; 2001-611725; 2001-626375; 2001-626426; 2001-626432; 2001-626527; 2001-639362; 2001-657166; 2002-010428; 2002-025688; 2002-062370; 2002-280918; 2002-426278; 2002-575369; 2002-590824; 2002-674924; 2002-759812; 2003-018710; 2003-028924; 2003-110596; 2003-313249; 2003-381616; 2003-456302; 2003-513756; 2003-569235; 2003-625403; 2003-678194; 2003-679633; 2003-697229; 2003-697230; 2003-810980; 2003-829799; 2003-851723; 2003-852227; 2004-061257; 2004-089285; 2004-143291; 2004-167523; 2004-167906; 2004-169496; 2004-238579; 2004-441049; 2004-441076; 2004-468837; 2005-010094; 2005-072810; 2005-252261; 2005-479395; 2005-553028; 2005-562715; 2005-618089; 2005-648263; 2005-711889; 2006-009303; 2006-361825; 2006-501977; 2007-410887 DNC C2007-134881 [35] TI New isolated human polynucleotides and polypeptides, useful for treating, e.g. Alzheimer's, Parkinson's disease, Huntington's disease, multiple sclerosis, rheumatoid arthritis, allergy, asthma, or and cancers DC B04; D16; D21 IN ASUNDI V; CAO Y; CHEN R; DRMANAC R T; GOODRICH R W; LIU C; MA Y; REN F; TANG Y T; WANG D; WANG J; WANG Z W; WEHRMAN T; XU C; XUE A; YANG Y; ZHANG J; ZHAO Q A; ZHOU P PA (NUVE-N) NUVELO INC CYC 1 PIA US 20070042392 A1 20070222 (200735)* EN 50[0] <-- ADT US 20070042392 A1 CIP of US 2000-496914 20000203; US 20070042392 A1 CIP of US 2000-560875 20000427; US 20070042392 A1 Div Ex US 2000-598075 20000620; US 20070042392 A1 CIP of US 2000-620325 20000719; US 20070042392 A1 CIP of US 2000-654936 20000901; US 20070042392 A1 Div Ex US 2000-663561 20000915; US 20070042392 A1 Div Ex US 2000-693325 20001020; US 20070042392 A1 CIP of US 2000-728422 20001130; US 20070042392 A1 CIP of US 2001-774434 20010130; US 20070042392 A1 CIP of WO 2001-US4098 20010205; US 20070042392 A1 CIP of US 2002-112931 20020328; US 20070042392 A1 CIP of US 2002-233045 20020830; US 20070042392 A1 CIP of US 2002-256113 20020925; US 20070042392 A1 Cont of US 2002-293244 20021112; US 20070042392 A1 US 2005-218141 20050831 PRAI US 2005-218141 20050831 US 2000-496914 20000203 US 2000-560875 20000427 US 2000-598075 20000620 US 2000-620325 20000719 US 2000-654936 20000901 US 2000-663561 20000915 US 2000-693325 20001020 US 2000-728422 20001130 US 2001-774434 20010130 WO 2001-US4098 20010205 US 2002-112931 20020328 US 2002-233045 20020830 US 2002-256113 20020925 US 2002-293244 20021112
USGENE® and DGENE on STN® Multifile techniques
Page 19
L14 ANSWER 8 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 1 AN 2001-476283 [51] WPINDEX Full-text CR 2001-442253; 2001-442255; 2001-451890; 2001-451908; 2001-451909; 2001-451912; 2001-451938; 2001-451939; 2001-457603; 2001-457740; 2001-465363; 2001-465571; 2001-465578; 2001-465705; 2001-476114; 2001-476164; 2001-476197; 2001-476198; 2001-476199; 2001-476282; 2001-483140; 2001-483233; 2001-488707; 2001-488788; 2001-488875; 2001-488895; 2001-496929; 2001-496930; 2001-496931; 2001-496932; 2001-514838; 2001-522358; 2001-565565; 2001-582152; 2001-582153; 2001-589862; 2001-589934; 2001-607699; 2001-611724; 2001-611725; 2001-626375; 2001-626426; 2001-626432; 2001-626527; 2001-639362; 2001-657166; 2002-010428; 2002-025688; 2002-062370; 2002-280918; 2002-426278; 2002-575369; 2002-590824; 2002-674924; 2002-759812; 2003-018710; 2003-028924; 2003-110596; 2003-174164; 2003-313249; 2003-381616; 2003-456302; 2003-513756; 2003-569235; 2003-625403; 2003-678194; 2003-679633; 2003-697229; 2003-697230; 2003-697231; 2003-810980; 2003-829799; 2003-851723; 2003-852227; 2004-061257; 2004-089285; 2004-143291; 2004-167523; 2004-167906; 2004-169496; 2004-238579; 2005-010094; 2005-072810; 2005-252261; 2005-553028; 2005-562715; 2005-618089; 2005-648263; 2005-711889; 2006-009303; 2006-361825; 2006-501977; 2007-372063; 2007-410887 DNC C2001-142911 [51] TI Nucleic acids encoding polypeptides with cytokine-like activities, useful in diagnosis and gene therapy DC B04; D16 IN ASUNDI V; CAO Y; CHEN R; DRMANAC R T; GOODRICH R; LIU C; MA Y; REN F; TANG Y T; WANG D; WANG J; WANG Z W; WEHRMAN T; WEJHRMAN T; XU C; XUE A J; YANG Y; ZHANG J; ZHAO Q A; ZHOU P PA (ASUN-I) ASUNDI V; (DRMA-I) DRMANAC R T; (GOOD-I) GOODRICH R; (HYSE-N) HYSEQ INC; (LIUC-I) LIU C; (NUVE-N) NUVELO INC; (RENF-I) REN F; (TANG-I) TANG Y T; (WEHR-I) WEHRMAN T; (XUEA-I) XUE A J; (YANG-I) YANG Y; (ZHAN-I) ZHANG J; (ZHAO-I) ZHAO Q A; (ZHOU-I) ZHOU P CYC 93 PIA WO 2001057190 A2 20010809 (200151)* EN 827[0] <-- AU 2001034944 A 20010814 (200173) EN US 20020128187 A1 20020912 (200262) EN US 20030158400 A1 20030821 (200356) EN US 20030165921 A1 20030904 (200359) EN EP 1572987 A2 20050914 (200560) EN AU 2001234944 A8 20061221 (200729) EN ADT WO 2001057190 A2 WO 2001-US4098 20010205; US 20020128187 A1 CIP of US 2000-496914 20000203; US 20030158400 A1 CIP of US 2000-496914 20000203; US 20030165921 A1 CIP of US 2000-496914 20000203; US 20020128187 A1 CIP of US 2000-560875 20000427; US 20030158400 A1 CIP of US 2000-560875 20000427; US 20030165921 A1 CIP of US 2000-560875 20000427; US 20030165921 A1 Div Ex US 2000-663561 20000915; US 20030158400 A1 Div Ex US 2000-693325 20001020; US 20020128187 A1 US 2000-728422 20001130; AU 2001034944 A AU 2001-34944 20010205; EP 1572987 A2 EP 2001-907128 20010205; EP 1572987 A2 WO 2001-US4098 20010205; US 20030165921 A1 US 2002-233045 20020830; US 20030158400 A1 US 2002-256113 20020925; AU 2001234944 A8 AU 2001-234944 20010205 FDT AU 2001034944 A Based on WO 2001057190 A; EP 1572987 A2 Based on WO 2001057190 A; AU 2001234944 A8 Based on WO 2001057190 A PRAI US 2000-728422 20001130 US 2000-496914 20000203 US 2000-560875 20000427 US 2000-598075 20000620 US 2000-620325 20000719 US 2000-654936 20000901 US 2000-663561 20000915 US 2000-693325 20001020 US 2002-233045 20020830 US 2002-256113 20020925
USGENE® and DGENE on STN® Multifile techniques
Page 20
L14 ANSWER 12 OF 41 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY 3 TI Tissue-and serum-derived glycoproteins and methods of their use (PublishedApplication) MTY Protein SQL 437 ORGN Homo Sapiens SEQN 10979 SEQC 14918 BLASTALIGN Query = 437 letters Length = 437 Score = 902 bits (2331), Expect = 0.0 Identities = 437/437 (100%), Positives = 437/437 (100%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 421 GAFQHVGKAFNQGKIFK 437 L14 ANSWER 13 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY3 AN 2007-560359 [54] WPINDEX Full-text DNC C2007-204913 [54] DNN N2007-432303 [54] TI New diagnostic panel comprising detection reagents that are specific for tissue-derived serum glycoprotein, useful in defining a disease-associated tissue-derived blood fingerprint or monitoring response to therapy in a subject DC A89; B04; D16; K08; S03 IN AEBERSOLD R H; ZHANG H PA (SYST-N) INST SYSTEMS BIOLOGY CYC 116 PIA WO 2007047796 A2 20070426 (200754)* EN 242[6] US 20070099251 A1 20070503 (200754) EN <-- ADT WO 2007047796 A2 WO 2006-US40784 20061017; US 20070099251 A1 Provisional US 2005-728044P 20051017; US 20070099251 A1 US 2006-582861 20061017 PRAI US 2005-728044P 20051017 US 2006-582861 20061017
USGENE® and DGENE on STN® Multifile techniques
Page 21
L14 ANSWER 16 OF 41 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY 5 TI Polymorphisms in known genes associated with human disease, methods of detection and uses thereof (Patent) MTY Protein SQL 443 ORGN Human SEQN 10598 SEQC 207012 BLASTALIGN Query = 437 letters Length = 443 Score = 902 bits (2331), Expect = 0.0 Identities = 437/437 (100%), Positives = 437/437 (100%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 7 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 67 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 127 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 187 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 247 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 307 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 367 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 427 GAFQHVGKAFNQGKIFK 443 L14 ANSWER 17 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY5 AN 2004-774378 [76] WPINDEX Full-text CR 2007-371008 DNC C2004-271118 [76] TI Novel isolated polynucleotide consisting of single nucleotide polymorphisms of genes associated with human disease, useful for screening for human disease susceptibility, prevention, and development of diagnostics for human disease DC B04; D16 IN CRAVCHIK A; KALUSH F; LIU X; NAIK A; ROWE W; SUBRAMANIAN G; VENTER J C; WOODAGE T; ZHANG J N PA (APPL-N) APPLERA CORP CYC 1 PIA US 6812339 B1 20041102 (200476)* EN 24[1] <-- ADT US 6812339 B1 Provisional US 2000-231498P 20000908; US 6812339 B1 Provisional US 2000-237768P 20001003; US 6812339 B1 Provisional US 2000-241755P 20001020; US 6812339 B1 US 2001-949016 20010910 PRAI US 2001-949016 20010910 US 2000-231498P 20000908 US 2000-237768P 20001003 US 2000-241755P 20001020
USGENE® and DGENE on STN® Multifile techniques
Page 22
L14 ANSWER 18 OF 41 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY 6 TI Genetic polymorphisms associated with coronary heart disease, methods of detection and uses thereof (PublishedApplication) MTY Protein SQL 437 ORGN Homo Sapiens SEQN 138 SEQC 17377 BLASTALIGN Query = 437 letters Length = 437 Score = 889 bits (2296), Expect = 0.0 Identities = 430/437 (98%), Positives = 433/437 (98%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQ+RVLSAPPHFHFGQTNRT EFLRKFPAGKVPA Sbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQIRVLSAPPHFHFGQTNRTSEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE R+AF NTNRWFLTCINQPQFRAVLGE+KLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 181 RRAFRNTNRWFLTCINQPQFRAVLGELKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPG EETQTLVREYFSWE Sbjct: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGREETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFN GKIFK Sbjct: 421 GAFQHVGKAFNHGKIFK 437 L14 ANSWER 19 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY6 AN 2005-630949 [64] WPINDEX Full-text DNC C2005-189319 [64] TI New isolated nucleic acid molecule comprising a single nucleotide polymorphism, useful for identifying an individual at an increased risk of developing coronary heart disease, or for treating or preventing myocardial infarction DC B04; D16 IN CARGILL M; DEVLIN J; DEVLIN J J; SHIFFMAN D; CARGILL M C G; SHIFFMAN PA (APPL-N) APPLERA CORP CYC 108 PIA WO 2005087953 A2 20050922 (200564)* EN 135[1] US 20060228715 A1 20061012 (200668) EN <-- EP 1745147 A2 20070124 (200708) EN ADT WO 2005087953 A2 WO 2005-US7453 20050307; US 20060228715 A1 Provisional US 2004-550051P 20040305; US 20060228715 A1 Provisional . . . . FDT EP 1745147 A2 Based on WO 2005087953 A PRAI US 2004-617163P 20041012 US 2004-550051P 20040305 US 2004-567831P 20040505 US 2005-73360 20050307
USGENE® and DGENE on STN® Multifile techniques
Page 23
L14 ANSWER 23 OF 41 USGENE COPYRIGHT 2008 SEQUENCEBASE CORP on STN FAMILY 8 TI Novel nucleic acids and polypeptides (PublishedApplication) MTY Protein SQL 438 ORGN Homo Sapiens SEQN 34203 SEQC 60736 BLASTALIGN Query = 437 letters Length = 438 Score = 889 bits (2296), Expect = 0.0 Identities = 431/437 (98%), Positives = 431/437 (98%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 2 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 62 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 122 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNT RWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKK TPRKEKGSREE Sbjct: 182 RQAFPNTTRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKGTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 242 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQ FMSCNLITGM QRLDKLRKNAFASV Sbjct: 302 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQPFMSCNLITGMLQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPG EETQTLVREYFSWE Sbjct: 362 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGREETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFN GKIFK Sbjct: 422 GAFQHVGKAFNHGKIFK 438
USGENE® and DGENE on STN® Multifile techniques
Page 24
L14 ANSWER 24 OF 41 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 8 AN ABG03844 Protein DGENE TI New isolated polynucleotide and encoded polypeptides, useful in diagnostics, forensics, gene mapping, identification of mutations responsible for genetic disorders or other traits and to assess biodiversity - DESC Novel human diagnostic protein #3835. KW Human; chromosome mapping; gene mapping; gene therapy; forensic; food supplement; medical imaging; diagnostic; genetic disorder. SQL 438 OS 2001-639362 [73] BLASTALIGN Query = 437 letters Length = 438 Score = 889 bits (2296), Expect = 0.0 Identities = 431/437 (98%), Positives = 431/437 (98%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 2 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 62 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 122 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNT RWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKK TPRKEKGSREE Sbjct: 182 RQAFPNTTRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKGTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 242 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQ FMSCNLITGM QRLDKLRKNAFASV Sbjct: 302 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQPFMSCNLITGMLQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPG EETQTLVREYFSWE Sbjct: 362 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGREETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFN GKIFK Sbjct: 422 GAFQHVGKAFNHGKIFK 438
USGENE® and DGENE on STN® Multifile techniques
Page 25
L14 ANSWER 25 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY8 AN 2001-639362 [73] WPINDEX Full-text CR 2001-442253; 2001-442255; 2001-451890; 2001-451908; 2001-451909; 2001-451912; 2001-451938; 2001-451939; 2001-457603; 2001-457740; 2001-465363; 2001-465571; 2001-465578; 2001-465705; 2001-476114; 2001-476164; 2001-476197; 2001-476198; 2001-476199; 2001-476282; 2001-476283; 2001-483140; 2001-483233; 2001-488707; 2001-488788; 2001-488875; 2001-488895; 2001-496929; 2001-496930; 2001-496931; 2001-496932; 2001-514838; 2001-522358; 2001-565565; 2001-582152; 2001-582153; 2001-589862; 2001-589934; 2001-607699; 2001-611724; 2001-611725; 2001-626375; 2001-626426; 2001-626432; 2001-626527; 2001-657166; 2002-010428; 2002-025688; 2002-062370; 2002-280918; 2002-426278; 2002-575369; 2002-590824; 2002-674924; 2002-759812; 2003-018710; 2003-028924; 2003-110596; 2003-174164; 2003-313249; 2003-381616; 2003-456302; 2003-513756; 2003-569235; 2003-625403; 2003-678194; 2003-679633; 2003-697229; 2003-697230; 2003-697231; 2003-810980; 2003-829799; 2003-851723; 2003-852227; 2004-061257; 2004-089285; 2004-143291; 2004-167523; 2004-167906; 2004-169496; 2004-238579; 2005-010094; 2005-072810; 2005-252261; 2005-553028; 2005-562715; 2005-618089; 2005-648263; 2005-711889; 2006-009303; 2006-501977; 2007-372063; 2007-410887 DNC C2001-189193 [73] TI New isolated polynucleotide and encoded polypeptides, useful in diagnostics, forensics, gene mapping, identification of mutations responsible for genetic disorders or other traits and to assess biodiversity DC B04; D16 IN DRMANAC R T; LIU C; TANG Y T PA (DRMA-I) DRMANAC R T; (HYSE-N) HYSEQ INC; (LIUC-I) LIU C; (TANG-I) TANG Y T CYC 93 PIA WO 2001075067 A2 20011011 (200173)* EN 103[0] <-- AU 2001049251 A 20011015 (200209) EN US 20050196754 A1 20050908 (200559) EN <-- AU 2001249251 A8 20050915 (200569) EN ADT WO 2001075067 A2 WO 2001-US8631 20010330; AU 2001049251 A AU 2001-49251 20010330; AU 2001249251 A8 AU 2001-249251 20010330; US 20050196754 A1 WO 2001-US8631 20010330; US 20050196754 A1 US 2004-450763 20040220 FDT AU 2001049251 A Based on WO 2001075067 A; AU 2001249251 A8 Based on WO 2001075067 A PRAI US 2000-649167 20000823 US 2000-540217 20000331
USGENE® and DGENE on STN® Multifile techniques
Page 26
L14 ANSWER 31 OF 41 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 11 AN AEL43555 protein DGENE TI New human cancer suppressor proteins and DNA, useful for diagnosing, preventing, and treating human cancers, e.g. cancer of the breast, brain, heart, muscles, large intestine, thymus, spleen, kidney, liver, or small intestine. DESC Human cancer suppressor protein GIG35. KW diagnosis; therapeutic; prophylaxis; gene therapy; cancer; tumor; neoplasm; cytostatic; GIG35; EEF1G. SQL 437 OS 2006-747536 [76] BLASTALIGN Query = 437 letters Length = 437 Score = 902 bits (2331), Expect = 0.0 Identities = 437/437 (100%), Positives = 437/437 (100%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 421 GAFQHVGKAFNQGKIFK 437 L14 ANSWER 32 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY11 AN 2006-747536 [76] WPINDEX Full-text DNC C2006-230801 [76] TI New human cancer suppressor proteins and DNA, useful for diagnosing, preventing, and treating human cancers, e.g. cancer of the breast, brain, heart, muscles, large intestine, thymus, spleen, kidney, liver, or intestine DC B04; D16 IN KIM H; KIM J; KIM H K PA (KIMH-I) KIM H; (KIMH-I) KIM H K CYC 112 PIA WO 2006109941 A1 20061019 (200676)* EN 333[174] <-- KR 2006104265 A 20061009 (200705) KO EP 1866333 A1 20071219 (200802) EN ADT WO 2006109941 A1 WO 2006-KR1174 20060330; KR 2006104265 A KR 2005-26254 20050330; EP 1866333 A1 EP 2006-732747 20060330; EP 1866333 A1 WO 2006-KR1174 20060330 FDT EP 1866333 A1 Based on WO 2006109941 A PRAI KR 2005-26254 20050330
USGENE® and DGENE on STN® Multifile techniques
Page 27
L14 ANSWER 38 OF 41 DGENE COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 14 AN ADR67296 protein DGENE TI New nucleic acids, and encoded proteins, from bladder cancer tissue, useful for diagnosis, treatment and in screening for specific binding agents. DESC Human bladder cancer associated amino acid sequence. KW bladder cancer tissue; bladder cancer; cytostatic. SQL 437 OS 2004-653385 [63] BLASTALIGN Query = 437 letters Length = 437 Score = 902 bits (2331), Expect = 0.0 Identities = 437/437 (100%), Positives = 437/437 (100%) Query: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Sbjct: 1 MAAGTLYTYPENWRAFKALIAAQYSGAQVRVLSAPPHFHFGQTNRTPEFLRKFPAGKVPA Query: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Sbjct: 61 FEGDDGFCVFESNAIAYYVSNEELRGSTPEAAAQVVQWVSFADSDIVPPASTWVFPTLGI Query: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Sbjct: 121 MHHNKQATENAKEEVRRILGLLDAYLKTRTFLVGERVTLADITVVCTLLWLYKQVLEPSF Query: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Sbjct: 181 RQAFPNTNRWFLTCINQPQFRAVLGEVKLCEKMAQFDAKKFAETQPKKDTPRKEKGSREE Query: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Sbjct: 241 KQKPQAERKEEKKAAAPAPEEEMDECEQALAAEPKAKDPFAHLPKSTFVLDEFKRKYSNE Query: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Sbjct: 301 DTLSVALPYFWEHFDKDGWSLWYSEYRFPEELTQTFMSCNLITGMFQRLDKLRKNAFASV Query: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Sbjct: 361 ILFGTNNSSSISGVWVFRGQELAFPLSPDWQVDYESYTWRKLDPGSEETQTLVREYFSWE Query: 421 GAFQHVGKAFNQGKIFK 437 GAFQHVGKAFNQGKIFK Sbjct: 421 GAFQHVGKAFNQGKIFK 437 L14 ANSWER 39 OF 41 WPINDEX COPYRIGHT 2008 THE THOMSON CORP on STN FAMILY 14 AN 2004-653385 [63] WPINDEX Full-text DNC C2004-233839 [63] TI New nucleic acids, and encoded proteins, from bladder cancer tissue, useful for diagnosis, treatment and in screening for specific binding agents DC B04; D16 IN DAHL E; HERR A; HINZMANN B; PILARSKY C; SPECHT T; STAUB E PA (DAHL-I) DAHL E; (HERR-I) HERR A; (HINZ-I) HINZMANN B; (PILA-I) PILARSKY C; (SPEC-I) SPECHT T; (STAU-I) STAUB E CYC 106 PIA WO 2004076613 A2 20040910 (200463)* DE 112[3] <-- DE 10309729 A1 20040916 (200463) DE ADT WO 2004076613 A2 WO 2004-DE364 20040224; DE 10309729 A1 DE 2003-10309729 20030226 PRAI DE 2003-10309729 20030226
STN Service Centers
FIZ KarlsruheSTN EuropeP.O. Box 246576012 KarlsruheGermany
Phone: +49 7247 808 555Fax: +49 7247 808 259E-mail: helpdesk@fiz-karlsruhe.deInternet: www.stn-international.de
CASSTN North AmericaP.O. Box 3012Columbus, Ohio 43210-0012
CAS Customer Care:Phone: 800-753-4227 (North America)
614-447-3700 (worldwide)Fax: 614-447-3751E-mail: help@cas.orgInternet: www.cas.org
Japan Association for International Chemical Information (JAICI)STN JapanNakai Building6-25-4 Honkomagome, Bunkyo-kuTokyo 113-0021, Japan
Phone: +81-3-5978-3601 (Technical Service)+81-3-5978-3621 (Customer Service)
Fax: +81-3-5978-3600E-mail: helpdesk@jaici.or.jp (Technical Service)
cas-stn@jaici.or.jp (Customer Service)Internet: www.jaici.or.jp