Sequence Conservation/Variation Analysis

2
Human HOST H1N1 SUB TYPE Genomic Sequence Analysis ANALYSIS 1 SEGMENT/PROTEIN Run Clear SEARCH TYPE Precomputed analysis using sequences in the IRD database for a specified host, segment, and subtype New analysis of sequences you select from the IRD database or upload To study sequence polymorphism across different influenza A strains, all influenza type A sequences were downloaded from GenBank. These sequences were processed through the IRD curation pipeline. During this processing, pre-aligned sequences were generated with the ClustalW multiple alignment tool. Aligned sequences from the same host and segment (and subtype) were used to determine sequence polymorphism. A consensus sequence for each subtype was created by following the majority rule. For each position in the nucleotide multiple alignment, a score was generated by using a formula modified from the one as described in Crooks et al.. The score ranges from 0 (no polymorphism) to 200 (highest polymorphism). For more details on the score and general polymorphism approach, reference our SEARCH CRITERIA Analyze Sequence Variation (SNP) The IRD team has grouped sequences by segment number, flu type, subtype, and host and aligned the sequences. A consensus sequence has been determined for each group and the variation from that value has been determined for each position in the sequence. This has been done for both nucleotide and amino acid sequences. The IRD tool can perform the same analysis using nucleotide or amino acid sequences that you either select from the IRD database or upload on this page. Home Analyze Sequence Variation SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA http://www.fludb.org/ Freely available Integrated datasets Bioinformatics tool suite Platform for influenza data submission IRD is funded by the National Institute of Allergy and Infectious Diseases under Contract No. HHSN266200400041C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory. University of Texas Southwestern Medical Center was a past subcontractor. Questions, suggestions? Contact us at [email protected] Sequence Conservation/Variation Analysis I. Examine consensus sequence/sequence polymorphism across all influenza A strains 1 2 You can view the consensus and all sequence variations existing in all GenBank influenza type A sequences based on host of isolation, flu subtype and segment. Lets you analyze sequence polymorphism at the nucleotide or amino acid level. Provides pre-computed consensus sequence and polymorphism score at each position for all influenza A strains. Allows you to calculate polymorphism of IRD sequences or your own sequences. Select host, influenza subtype, type of sequence to analyze, segment number or protein name. Figure View Protein Sequence Analysis (4 HA) FASTA Format Raw Alignment Sequence Variance Analysis for Human H1N1, Segment 4 Position Coding Score Consensus A T G C Deletion # Sequences 1 no 72 A 683 6 129 3 0 821 2 no 70 G 5 2 720 136 0 863 3 no 73 C 132 0 15 730 0 877 4 no 12 A 882 2 10 1 0 895 5 no 5 A 927 0 3 2 0 932 6 no 6 A 939 1 2 2 0 944 7 no 66 A 801 1 145 3 0 950 8 no 69 G 6 1 803 147 1 958 9 no 66 C 149 0 6 844 0 999 67 A 853 1 150 2 7 G 3 1 1043 3 5 G 2 2 1197 2 6 G 7 0 1502 2 69 G 313 0 1412 1 6 A 1839 2 8 1 27 A 2027 88 3 1 0 2119 3 A 2250 1 3 3 0 2257 66 A 1995 32 6 275 0 2309 158 N 751 1037 2 1028 0 2818 8 A 4054 13 5 3 9 4084 Home Analyze Sequence Variation Results 3 At each position, the consensus is the allele with frequency greater than 50%. If no allele exceeds 50%, N (for nucleotide) or Xaa (for amino acid) is used to indicate ambiguity. Switch to protein sequence variation result analysis result Download consensus sequence in FASTA format Download raw alignment of all sequences Score ranges from 0 (no polymorphism) to 200 (highest polymorphism). Count for different nucleotides at each position View sequence polymorphism plot 1. Mouse-over the “Analyze & Visualize” tab and click “Analyze Sequence Variation (SNP)”. 2. On the tool landing page, select “Pre- computed analysis using sequences in the IRD database for a specified host, segment, and subtype” in search type. Next, select desired host, subtype, analysis type (genomic or protein sequence), and segment/protein to analyze. Click “Run”. 3. The analysis result page will be loaded, which shows the polymorphism score, consensus, and counts for each different base/amino acid at each position. Sequence polymorphism plot, consensus sequence, and raw alignment are available for download.

Transcript of Sequence Conservation/Variation Analysis

Cite IRD Tutorials Glossary of Terms Report a Bug Request Web Training Contact Us Release Date: Sep 7, 2012

This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between NorthropGrumman Health IT, J. Craig Venter Institute , Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.

Human

HOSTH1N1

SUB TYPEGenomic Sequence Analysis

ANALYSIS1

SEGMENT/PROTEIN

RunClear

SEARCH TYPE Precomputed analysis using sequences in the IRD database for a specified host, segment, and subtype New analysis of sequences you select from the IRD database or upload

To study sequence polymorphism across different influenza A strains, all influenza type A sequences were downloaded from GenBank. These sequences were processedthrough the IRD curation pipeline. During this processing, pre-aligned sequences were generated with the ClustalW multiple alignment tool. Aligned sequences from thesame host and segment (and subtype) were used to determine sequence polymorphism.

A consensus sequence for each subtype was created by following the majority rule. For each position in the nucleotide multiple alignment, a score was generated by using aformula modified from the one as described in Crooks et al.. The score ranges from 0 (no polymorphism) to 200 (highest polymorphism). For more details on the scoreand general polymorphism approach, reference our Help document.

SEARCH CRITERIA

Analyze Sequence Variation (SNP) The IRD team has grouped sequences by segment number, flu type, subtype, and host and aligned the sequences. A consensus sequence has been determined for each groupand the variation from that value has been determined for each position in the sequence. This has been done for both nucleotide and amino acid sequences. The IRD tool canperform the same analysis using nucleotide or amino acid sequences that you either select from the IRD database or upload on this page.

Home Analyze Sequence Variation

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

http://www.fludb.org/

Freely available Integrated datasets Bioinformatics tool suite Platform for influenza data submission

IRD is funded by the National Institute of Allergy and Infectious Diseases under Contract No. HHSN266200400041C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory. University of Texas Southwestern Medical Center was a past subcontractor. Questions, suggestions? Contact us at [email protected]

Sequence Conservation/Variation Analysis

I. Examine consensus sequence/sequence polymorphism across all influenza A strains

1 2

You can view the consensus and all sequence variations existing in all GenBank influenza type A sequences based on host of isolation, flu subtype and segment.

• Lets you analyze sequence polymorphism at the nucleotide or amino acid level. • Provides pre-computed consensus sequence and polymorphism score at each

position for all influenza A strains. • Allows you to calculate polymorphism of IRD sequences or your own sequences.

Select host, influenza subtype, type of sequence

to analyze, segment number or protein name.

Figure View Protein Sequence Analysis (4 HA) FASTA Format Raw Alignment

Sequence Variance Analysis for Human H1N1, Segment 4

Position Coding Score Consensus A T G C Deletion # Sequences

1 no 72 A 683 6 129 3 0 821

2 no 70 G 5 2 720 136 0 863

3 no 73 C 132 0 15 730 0 877

4 no 12 A 882 2 10 1 0 895

5 no 5 A 927 0 3 2 0 932

6 no 6 A 939 1 2 2 0 944

7 no 66 A 801 1 145 3 0 950

8 no 69 G 6 1 803 147 1 958

9 no 66 C 149 0 6 844 0 999

10 no 67 A 853 1 150 2 3 1009

11 no 7 G 3 1 1043 3 0 1050

12 no 5 G 2 2 1197 2 0 1203

13 no 6 G 7 0 1502 2 0 1511

14 no 69 G 313 0 1412 1 0 1726

15 no 6 A 1839 2 8 1 0 1850

16 no 27 A 2027 88 3 1 0 2119

17 no 3 A 2250 1 3 3 0 2257

18 no 66 A 1995 32 6 275 0 2309

19 no 158 N 751 1037 2 1028 0 2818

20 no 8 A 4054 13 5 3 9 4084

21 no 2 A 4150 0 4 4 0 4158

22 no 2 A 4200 2 0 3 1 4206

23 no 69 A 3543 4 778 1 0 4326

24 no 138 G 904 3 2604 831 0 4342

25 no 69 C 787 1 2 3565 0 4355

26 no 3 A 4354 2 5 4 1 4366

27 no 69 A 3611 2 1 792 0 4406

28 no 69 C 779 6 0 3660 2 4447

29 no 123 A 2872 6 1 1224 368 4471

30 no 7 A 4499 4 28 2 0 4533

31 no 38 A 4228 3 325 1 1 4558

32 no 45 A 4169 1 9 1 400 4580

33 no 0 A 11497 0 2 0 0 11499

34 no 0 T 2 11498 1 0 0 11501

35 no 0 G 1 0 11503 0 0 11504

36 no 19 A 11190 0 331 0 0 11522

37 no 1 A 11519 2 4 1 0 11526

38 no 84 G 3006 4 8518 2 0 11531

39 no 2 G 24 1 11509 0 1 11535

40 no 73 C 2 2319 1 9213 1 11539

41 no 7 A 11458 4 77 2 1 11542

42 no 1 A 11579 0 4 0 1 11584

43 no 87 T 3057 8489 30 16 1 11594

44 no 5 A 11535 2 4 57 1 11599

45 no 2 C 12 9 0 11583 1 11606

Home Analyze Sequence Variation Results

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

3

At each position, the consensus is the allele with

frequency greater than 50%. If no allele exceeds 50%, N (for nucleotide) or Xaa (for

amino acid) is used to indicate ambiguity.

Switch to protein sequence

variation result analysis result

Download consensus sequence

in FASTA format

Download raw alignment of all

sequences

Score ranges from 0 (no

polymorphism) to 200 (highest

polymorphism).

Count for different

nucleotides at each position

View sequence polymorphism

plot

1. Mouse-over the “Analyze & Visualize” tab and click “Analyze Sequence Variation (SNP)”.

2. On the tool landing page, select “Pre-computed analysis using sequences in the IRD database for a specified host, segment, and subtype” in search type. Next, select desired host, subtype, analysis type (genomic or protein sequence), and segment/protein to analyze. Click “Run”.

3. The analysis result page will be loaded, which shows the polymorphism score, consensus, and counts for each different base/amino acid at each position. Sequence polymorphism plot, consensus sequence, and raw alignment are available for download.

2

Cite IRD Tutorials Glossary of Terms Report a Bug Request Web Training Contact Us Release Date: Sep 7, 2012

This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between NorthropGrumman Health IT, J. Craig Venter Institute , Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.

Upload a file containing my sequences in FASTA format.

Paste sequences in FASTA format.

Use working sets

INPUT SEQUENCES

Sequences can also be selected from search results or a working set in your workbench.

File Path:Browse…

The minimum number of sequences is 2.

RunClear

SEARCH TYPE Precomputed analysis using sequences in the IRD database for a specified host, segment, and subtype New analysis of sequences you select from the IRD database or upload

To study sequence polymorphism across different influenza A strains, all influenza type A sequences were downloaded from GenBank. These sequences were processedthrough the IRD curation pipeline. During this processing, pre-aligned sequences were generated with the ClustalW multiple alignment tool. Aligned sequences from thesame host and segment (and subtype) were used to determine sequence polymorphism.

A consensus sequence for each subtype was created by following the majority rule. For each position in the nucleotide multiple alignment, a score was generated by using aformula modified from the one as described in Crooks et al.. The score ranges from 0 (no polymorphism) to 200 (highest polymorphism). For more details on the scoreand general polymorphism approach, reference our Help document.

Analyze Sequence Variation (SNP) The IRD team has grouped sequences by segment number, flu type, subtype, and host and aligned the sequences. A consensus sequence has been determined for each groupand the variation from that value has been determined for each position in the sequence. This has been done for both nucleotide and amino acid sequences. The IRD tool canperform the same analysis using nucleotide or amino acid sequences that you either select from the IRD database or upload on this page.

Home Analyze Sequence Variation

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

Cite IRD Tutorials Glossary of Terms Report a Bug Request Web Training Contact Us Release Date: Sep 7, 2012

This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between NorthropGrumman Health IT, J. Craig Venter Institute , Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.

DATA TO RETURN Segment / Nucleotide

Protein

Strain

VIRUS TYPEA

B

C

SUB TYPE

* Use comma to separate multipleentries.Ex: H1N1, H7, H3N2.

STRAIN NAME

* Use comma to separate multipleentries.Ex: A/chicken/Israel/1055/2008,A/chicken/Laos/16/2008.

Complete Genome OnlyComplete SequencesInclude laboratory strainsInclude/exclude records with highsimilarity to 2009 pH1N1sequences (SOP)

SELECT SEGMENTSAll1 PB22 PB1/PB1-F23 PA/PA-X4 HA5 NP6 NA7 M1/M28 NS1/NS2

DATE RANGEFrom: YYYY To: YYYY

To add month to search, seeAdvance Options: Month Range

HOSTAllAvianBatBlow FlyCamelCheetahCivetDogDomestic CatDonkeyEnvironmentFerretHorseHumanLabLarge CatM l

GEOGRAPHIC GROUPINGAllAfricaAsiaEuropeNorth AmericaOceaniaS h A i

COUNTRYAfghanistanAlgeriaArgentinaAustraliaAustriaAzerbaijanB h i

ADVANCED OPTIONSSearchClear

Tip: To select multiple or deselect, Ctrl-click (Windows) or Cmd-click (MacOS) Show All

Nucleotide Sequence Search Search for influenza sequences, proteins, and strains using two types of searches. Use the advanced search to allow you to refine your search with the more fine grained search,and you can pick your viewing options.

205,653 matching results

Home Nucleotide Sequence Search

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA

QUICK SEARCH

Quick Search

SEARCH FOR OR FIND

Search Sequences

Animal Surveillance

Immune Epitopes

3D Protein Structures

Phenotype

Human Clinical Metadata

Serology Experiments (Beta)

Sequence Feature Variant Types (beta)

PCR Primer Probe Data

Host Factor Data (Beta)

Laboratory Experiments (beta)

WHO Influenza Vaccine Strains

SEARCH HISTORY

Retrieve a Download

Your Search History

Nucleotide Sequences

Protein Sequences

Strain Data

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

Your search returned 894 segments. Search Criteria Displaying 50 per pageDisplay Settings

Add to Working Set Save Search Download

1 2 3 4 5 6 7 Next > Page: 1 of 18

Nucleotide Sequence Search Results

Your Selected Items: 6 items selected | Deselect All

Select all 894 segments

SegmentProteinName

SequenceAccession

DateHost

SpeciesCountry

FluSeason

Strain Name

4 HA U04857 1937 Swine USA -N/A- *A/swine/29/1937(H1N1)

4 HA CY025002 1977 Swine USA -N/A- *A/swine/Arizona/148/1977(H1N1)

4 HA CY082070 No 900 H1N1 02/09/2006 *Swine USA 05-06 A/swine/Arkansas/00993/2006

4 HA CY082074 No 883 H1N1 02/23/2006 *Swine USA 05-06 A/swine/Arkansas/00998/2006

4 HA CY082359 * No 902 H1N1 01/31/2007 *Swine USA 06-07 A/swine/Arkansas/01460/2007

4 HA CY040460 No 901 H1N1 2008 Swine USA -N/A- *A/swine/Arkansas/63607-3/2008(H1N1)

4 HA CY028780 Yes 1732 H1N1 1991 Swine USA -N/A- *A/swine/California/T9001707/1991(H1N1)

4 HA CY082171 * No 859 H1N1 05/31/2006 *Swine USA 05-06 A/swine/Colorado/01151/2006

4 HA CY081685 No 895 H1N1 02/24/2004 *Swine USA 03-04 A/swine/Georgia/00252/2004

4 HA CY081707 * No 888 H1N1 05/04/2004 Swine USA 03-04 A/swine/Georgia/00297/2004

4 HA CY082071 No 904 H1N1 02/10/2006 Swine USA 05-06 A/swine/Georgia/00995/2006

4 HA FJ638298 No 1713 H1N1 2005 Swine USA -N/A- *A/swine/IL/00685/2005(H1N1)

4 HA GU984396 No 1701 H1N1 12/29/2009 Swine USA 09-10 *A/swine/IL/10-001550/2009(H1N1)

4 HA GU984399 No 1701 H1N1 12/20/2009 Swine USA 09-10 *A/swine/IL/10-001551-1/2009(H1N1)

4 HA GU984402 No 1701 H1N1 12/20/2009 Swine USA 09-10 *A/swine/IL/10-001551-2/2009(H1N1)

4 HA HM219618 No 1701 H1N1 02/23/2010 Swine USA 09-10 *A/swine/IL/12660/2010(H1N1)

4 HA HM219633 No 1701 H1N1 03/18/2010 Swine USA 09-10 *A/swine/IL/17315-1/2010(H1N1)

4 HA HM219636 No 1701 H1N1 03/18/2010 Swine USA 09-10 *A/swine/IL/17315-3/2010(H1N1)

4 HA HQ291537 No 1701 H1N1 05/18/2010 Swine USA 09-10 *A/swine/IL/25399-2/2010(H1N1)

4 HA HQ291540 No 1701 H1N1 05/18/2010 Swine USA 09-10 *A/swine/IL/25399-3/2010(H1N1)

4 HA HQ291543 No 1701 H1N1 05/18/2010 Swine USA 09-10 *A/swine/IL/25399-4/2010(H1N1)

4 HA HQ291546 No 1701 H1N1 06/02/2010 Swine USA -N/A- *A/swine/IL/27486-1/2010(H1N1)

4 HA HQ291549 No 1701 H1N1 06/02/2010 Swine USA -N/A- *A/swine/IL/27486-2/2010(H1N1)

4 HA GU480922 No 1701 H1N1 11/11/2009 Swine USA 09-10 *A/swine/IL/32974/2009(H1N1)

Run Analysis �

Home Nucleotide Sequence Search Results

Identify Similar Sequences (BLAST)

Align Sequences (MSA)

Visualize Aligned Sequences

Generate Phylogenetic Tree

Analyze Sequence Variation (SNP)

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA HOME

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

Inuenza Research Database - Nucleotide Sequence Search Results http://www.udb.org/brc/inuenza_sequence_search_segment_dis...

1 of 2 7/28/11 12:38 PM

Cite IRD Tutorials Glossary of Terms Report a Bug Request Web Training Contact Us Release Date: Sep 7, 2012

This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between NorthropGrumman Health IT, J. Craig Venter Institute , Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.

50 records were previously selected from search results

INPUT SEQUENCES

RunClear

To study sequence polymorphism across different influenza A strains, all influenza type A sequences were downloaded from GenBank. These sequences were processedthrough the IRD curation pipeline. During this processing, pre-aligned sequences were generated with the ClustalW multiple alignment tool. Aligned sequences from thesame host and segment (and subtype) were used to determine sequence polymorphism.

A consensus sequence for each subtype was created by following the majority rule. For each position in the nucleotide multiple alignment, a score was generated by using aformula modified from the one as described in Crooks et al.. The score ranges from 0 (no polymorphism) to 200 (highest polymorphism). For more details on the scoreand general polymorphism approach, reference our Help document.

Analyze Sequence Variation (SNP) The IRD team has grouped sequences by segment number, flu type, subtype, and host and aligned the sequences. A consensus sequence has been determined for each groupand the variation from that value has been determined for each position in the sequence. This has been done for both nucleotide and amino acid sequences. The IRD tool canperform the same analysis using nucleotide or amino acid sequences that you either select from the IRD database or upload on this page.

Home Nucleotide Sequence Search Results Analyze Sequence Variation

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

Loading Influenza Research Database...

Cite IRD Tutorials Glossary of Terms Report a Bug Request Web Training Contact Us Release Date: Jul 18, 2011

This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between NorthropGrumman Health IT, University of Texas Southwestern Medical Center , Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.

Upload a file containing my sequences in FASTA format.

Paste sequence in FASTA format.

Use working set.

INPUT SEQUENCESSequences can also be selected from search results or a working set in your workbench.

>gb:HM628693|Organism:Influenza A virus A/Acre/15093/2010|Segment:4|Subtype:H3N2|Host:HumanATGAAGACTATCATTGCTTTGAGCTACATTCTATGTCTGGTTTTCGCTCAAAAACTTCCTGGAAATGACAACAGCACGGCAACGCTGTGCCTTGGGCACCATGCAGTACCAAACGGGACGATAGTGAAAACAATCACGAATGACCAAATTGAAGTTACTTATGCTACTGAGCTGGTTCAGAGTTCCTCAACAGGTGAAATATGCGACAGTCCCCATCAGATCCTTGATGGAAAAAACTGCACACTAATAGATGCTCTATTGGGAGACCCTCAGTGTGATGGCTTCCAAAATAAGAAATGGGACCTTTTTGTTGAACGCAGCAAAGCCTACAGCAACTGTTACCCTTATGATGTGCCGGATTATGCCTCCCTTAGGTCACTAGTTGCCTCATCCGGCACACTTGAGTTTAACAATGAAAGC

The minimum number of sequences is 2.Defline in your FASTA file will be used to label the display

HTML

SELECT OUTPUT FORMATAligned

SELECT OUTPUT ORDER

RunClear

Align Sequences (MSA) IRD uses the MUSCLE (Multiple Sequence Comparison by Log-Expectation) algorithm to align the sequences you select from a search result or a working set on yourworkbench or that you provide in an uploaded file.

Home Align Sequences (MSA)

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA HOME

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

Influenza Research Database - MUSCLE Multiple Sequence Alig... http://www.fludb.org/brc/msa.do?method=ShowCleanInputPage&...

1 of 1 8/1/11 4:10 PM

Cite IRD Tutorials Glossary of Terms Report a Bug Request Web Training Contact Us Release Date: Sep 7, 2012

This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between NorthropGrumman Health IT, J. Craig Venter Institute , Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.

Pandemic_H1N1

HOSTH1N1

SUB TYPEProtein Sequence Analysis

ANALYSIS1 PB2

SEGMENT/PROTEIN

RunClear

SEARCH TYPE Precomputed analysis using sequences in the IRD database for a specified host, segment, and subtype New analysis of sequences you select from the IRD database or upload

To study sequence polymorphism across different influenza A strains, all influenza type A sequences were downloaded from GenBank. These sequences were processedthrough the IRD curation pipeline. During this processing, pre-aligned sequences were generated with the ClustalW multiple alignment tool. Aligned sequences from thesame host and segment (and subtype) were used to determine sequence polymorphism.

A consensus sequence for each subtype was created by following the majority rule. For each position in the nucleotide multiple alignment, a score was generated by using aformula modified from the one as described in Crooks et al.. The score ranges from 0 (no polymorphism) to 200 (highest polymorphism). For more details on the scoreand general polymorphism approach, reference our Help document.

SEARCH CRITERIA

Analyze Sequence Variation (SNP) The IRD team has grouped sequences by segment number, flu type, subtype, and host and aligned the sequences. A consensus sequence has been determined for each groupand the variation from that value has been determined for each position in the sequence. This has been done for both nucleotide and amino acid sequences. The IRD tool canperform the same analysis using nucleotide or amino acid sequences that you either select from the IRD database or upload on this page.

Home Analyze Sequence Variation

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA

ANALYZE & VISUALIZE

Identify Similar Sequences (BLAST)

Align Sequences (MSA)

Identify Short Peptides in Proteins

Identify Point Mutations in Proteins

Generate Phylogenetic Tree

Visualize Aligned Sequences

Annotate Nucleotide Sequences

Analyze Sequence Variation (SNP)

Metadata Sequence Analysis

Sequence Format Conversion

Pandemic H1N1 Classification

PCR Primer Design

HISTORY

Retrieve an Analysis

Retrieve a Download

Retrieve an Annotation

Your Analysis History

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

http://www.fludb.org/

Freely available Integrated datasets Bioinformatics tool suite Platform for influenza data submission

IRD is funded by the National Institute of Allergy and Infectious Diseases under Contract No. HHSN266200400041C and is a collaboration between Northrop Grumman Health IT, J. Craig Venter Institute, Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory. University of Texas Southwestern Medical Center was a past subcontractor. Questions, suggestions? Contact us at [email protected]

II. Calculate consensus sequence/sequence variation for your custom sequence set Option 1. Input sequences from search results

1

2.1

Three options to input sequences

Option 2. Use your own custom sequences

Loading Influenza Research Database...

Cite IRD Tutorials Glossary of Terms Report a Bug Request Web Training Contact Us Release Date: Jul 18, 2011

This project is funded by the National Institute of Allergy and Infectious Diseases (NIH / DHHS) under Contract No. HHSN266200400041C and is a collaboration between NorthropGrumman Health IT, University of Texas Southwestern Medical Center , Vecna Technologies, SAGE Analytica and Los Alamos National Laboratory.

Upload a file containing my sequences in format.

Paste sequence in FASTADefline in your FASTA file will be used to label the display

Use working set.

Choose a Working Set

TREE GENERATION

Quick Tree (Let IRD set all parameters - view all parameters) Custom Tree (I want to set my own parameters)

SEQUENCE TYPE *

Nucleotide Amino Acid (Protein)

SOURCE OF SEQUENCES TO BE ANALYZED Sequences can also be selected from workbench.

LABELINGDefline in your FASTA file will be used to label the display

FORMAT OF SEQUENCES PROVIDED

Unaligned FASTA Aligned FASTA Phylip (interleaved)

Build TreeClear

Generate Phylogenetic Tree IRD uses PhyML [ Guindon, S. and Gascuel, O., (2003) Syst Biol. 52: 696-704 ] to infer phylogenies based on nucleotide sequences. Additionally, IRD provides severaloptions to display a generated tree. (SOP)Note: An asterisk (*) = required field

Home Generate Phylogenetic Tree

SEARCH DATA ANALYZE & VISUALIZE WORKBENCH SUBMIT DATA HOME

About Us Community Announcements Links Resources Support Sign Out

You are logged in as [email protected]

Cancel Select

Choose Working Set

Name Type Number of Sequences Date

Australia HA 07 & 09 Segment 36 06/03/2011 6:40 PM

california swine flu Segment 4 05/31/2011 2:21 PM

H1N1 NA South Africa Segment 14 05/06/2011 5:49 PM

Perth-HA Segment 14 07/15/2011 1:34 PM

WHO vaccine strains - HA Segment 12 05/31/2011 3:45 PM

Inuenza Research Database - Phylogenetic Tree http://www.udb.org/brc/tree.do?method=ShowCleanInputPage&...

1 of 1 7/29/11 10:01 AM

2.3

1. Search for nucleotide or protein sequences in IRD by using one of the “Search Sequences” options available from the “Search Data” tab.

2. On the search results page, select the desired sequences, mouse-over the “Run Analysis” button and click “Analyze Sequence Variation (SNP)”.

3. Click “Run” on the next page. 4. A Sequence Variation Analysis report will be displayed. Refer to step 3

on the reverse side for explanation of the results.

2

2

3

1. Mouse-over the “Analyze & Visualize” tab and click “Analyze Sequence Variation (SNP)”.

2. On the tool landing page, select “New analysis of sequences you select from the IRD database or upload” in search type. Use one of the three options to input sequences:

2.1 Upload a sequence file in FASTA format OR 2.2 Paste sequences in FASTA format OR 2.3 Use a working set from your Workbench. Then click “Run” to run the analysis. 3. A Sequence Variation Analysis report will be

displayed. Refer to step 3 on the reverse side for explanation of the results.

1

2.2

Select sequences and add them to a working set for future analysis. You’ll need to register a Workbench account

to use this feature.

Click to view details of

the record