UCSC Genome Browser User Guide



    Table of Contents:

    -- What does the Genome Browser do?
    -- Getting started: Genome Browser gateways
    -- Fine-tuning the Genome Browser display
    -- Annotation track descriptions
    -- Using BLAT alignments
    -- Getting started on the Table Browser
    -- Getting started using Sessions
    -- Getting started on Genome Graphs
    -- Using the VisiGene Image Browser
    -- DNA text formatting
    -- Converting data between assemblies
    -- Downloading genome data
    -- Creating and Managing custom annotation tracks
    -- Getting Started on Track Hubs


    Browse the Genome Browser mailing list.

    See also the Open Helix tutorial and training materials.

    Questions and feedback are welcome.

    What does the Genome Browser do?

    As vertebrate genome sequences near completion and research re-focuses on their analysis, the issue of effective sequence display becomes critical: it is not helpful to have 3 billion letters of genomic DNA shown as plain text! As an alternative, the UCSC Genome Browser provides a rapid and reliable display of any requested portion of genomes at any scale, together with dozens of aligned annotation tracks (known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, chromosomal bands, mouse homologies, and more). Half of the annotation tracks are computed at UCSC from publicly available sequence data. The remaining tracks are provided by collaborators worldwide. Users can also add their own custom tracks to the browser for educational or research purposes.

    The Genome Browser stacks annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. The user can look at a whole chromosome to get a feel for gene density, open a specific cytogenetic band to see a positionally mapped disease gene candidate, or zoom in to a particular gene to view its spliced ESTs and possible alternative splicing. The Genome Browser itself does not draw conclusions; rather, it collates all relevant information in one location, leaving the exploration and interpretation to the user.

    The Genome Browser supports text and sequence based searches that provide quick, precise access to any region of specific interest. Secondary links from individual entries within annotation tracks lead to sequence details and supplementary off-site databases. To control information overload, tracks need not be displayed in full. Tracks can be hidden, collapsed into a condensed or single-line display, or filtered according to the user's criteria. Zooming and scrolling controls help to narrow or broaden the displayed chromosomal range to focus on the exact region of interest. Clicking on an individual item within a track opens a details page containing a summary of properties and links to off-site repositories such as PubMed, GenBank, Entrez, and OMIM. The page provides item-specific information on position, cytoband, strand, data source, and encoded protein, mRNA, genomic sequence and alignment, as appropriate to the nature of the track.

    A blue navigation bar at the top of the browser provides links to several other tools and data sources. For instance, under the "View" menu, the "DNA" link enables the user to view the raw genomic DNA sequence for the coordinate range displayed in the browser window. This DNA can encode track features via elaborate text formatting options. Other links tie the Genome Browser to the BLAT alignment tool, provide access to the underlying relational database via the Table Browser, convert coordinates across different assembly dates, and open the window at the complementary Ensembl or NCBI Map Viewer annotation.

    The browser data represents an immense collaborative effort involving thousands of people from the international biomedical research community. The UCSC Bioinformatics Group itself does no sequencing. Although it creates the majority of the annotation tracks in-house, the annotations are based on publicly available data contributed by many labs and research groups throughout the world. Several of the Genome Browser annotations are generated in collaboration with outside individuals or are contributed wholly by external research groups. UCSC's other major roles include building genome assemblies, creating the Genome Browser work environment, and serving it online. The majority of the sequence data, annotation tracks, and even software are in the public domain and are available for anyone to download.

    In addition to the Genome Browser, the UCSC Genome Bioinformatics group provides several other tools for viewing and interpreting genome data:

    -- BLAT - a fast sequence-alignment tool similar to BLAST.
    -- Table Browser - convenient text-based access to the database underlying the Genome Browser.
    -- Genome Graphs - a tool that allows you to upload and display genome-wide data sets such as the results of genome-wide SNP association studies, linkage studies and homozygosity mapping.
    -- Gene Sorter - expression, homology, and other information on groups of genes that can be related in many ways.

    Getting Started: Genome Browser gateways

    The UCSC Genome Bioinformatics home page provides access to Genome Browsers on several different genome assemblies. To get started, click the Browser link on the blue sidebar. This will take you to a Gateway page where you can select which genome to display. Note that there is also an official European mirror site for users who are geographically closer to central Europe than to the western United States.

    Opening the Genome Browser at a specific position

    To get oriented in using the Genome Browser, try viewing a gene or region of the genome with which you are already familiar, or use the default position. To open the Genome Browser window:

    1. Select the clade, genome and assembly that you wish to display from the corresponding pull-down menus. Assemblies are typically named by the first three characters of an organism's genus and species names. For older assemblies that are no longer available from the menu, the data may still be available on our Downloads page.

    2. Specify the genome location you'd like the Genome Browser to open to. To select a location, enter a valid position query in the search term text box at the top of the Gateway page or accept the default position already displayed. The search supports several different types of queries: gene symbols, mRNA or EST accession numbers, chromosome bands, descriptive terms likely to occur in GenBank text, or specific chromosomal ranges.

    3. Click the submit button to open up the Genome Browser window to the requested location. In cases where a specific term (accession, gene name, etc.) was queried, the item will be highlighted in the display.
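
    Of the query types listed in step 2, a chromosomal range is the most mechanical. The sketch below shows one way such a string could be checked before it is pasted into the search box. It assumes the common chrom:start-end notation with optional thousands separators (for example chr7:127,471,196-127,495,720); the names Position and parse_position are hypothetical and are not part of the Genome Browser.

```python
import re
from typing import NamedTuple


class Position(NamedTuple):
    """A chromosomal range as typed into the position box (hypothetical type)."""
    chrom: str
    start: int
    end: int


# Assumed notation: <chrom>:<start>-<end>, commas allowed inside the numbers.
_RANGE = re.compile(r"^\s*(\w+)\s*:\s*([\d,]+)\s*-\s*([\d,]+)\s*$")


def parse_position(query: str) -> Position:
    """Split a range query into chromosome, start and end.

    Raises ValueError for anything that is not a range, such as a gene
    symbol or an accession, which the search box resolves by other means.
    """
    match = _RANGE.match(query)
    if match is None:
        raise ValueError(f"not a chromosomal range: {query!r}")
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start > end:
        raise ValueError(f"start is after end in {query!r}")
    return Position(match.group(1), start, end)


if __name__ == "__main__":
    print(parse_position("chr7:127,471,196-127,495,720"))
```

    A query that fails this check is not necessarily invalid for the browser; it is simply one of the other query types (symbol, accession, band or descriptive term).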

    Occasionally the Gateway page returns a list of several matches in response to a search, rather than immediately displaying the Genome Browser window. When this occurs, click on the item in which you're interested and the Genome Browser will open to that location.

    The search mechanism is not a site-wide search engine. Instead, it primarily searches GenBank mRNA records whose text annotations can include gene names, gene symbols, journal title words, author names, and RefSeq mRNAs. Searches on other selected identifiers, such as NP and NM accession numbers, OMIM identifiers, and Entrez Gene IDs are supported. However, some types of queries will return an error, e.g. post-assembly GenBank entries, withdrawn gene names, and abandoned synonyms. If your initial query is unsuccessful, try entering a different related term that may produce the same location. For example, if a query on a gene symbol produces no results, try entering an mRNA accession, gene ID number, or descriptive words associated with the gene.

    Finding a genome location using BLAT

    If you have genomic, mRNA, or protein sequence, but don't know the name or the location to which it maps in the genome, the BLAT tool will rapidly locate the position by homology alignment, provided that the region has been sequenced. This search will find close members of the gene family, as well as assembly duplication artifacts. An entire set of query sequences can be looked up simultaneously when provided in FASTA format.
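
    To illustrate what a set of query sequences in FASTA format looks like, the sketch below assembles several named sequences into one block of text that could be pasted into a sequence search form. It assumes the standard FASTA layout (a ">" header line followed by the sequence); the function name, the sequence names and the 60-character line width are illustrative choices, not requirements stated in this guide.

```python
from typing import Mapping


def to_fasta(records: Mapping[str, str], width: int = 60) -> str:
    """Format named sequences as multi-record FASTA text.

    Each record becomes a '>' header line followed by the sequence,
    wrapped to `width` characters per line.
    """
    lines = []
    for name, sequence in records.items():
        lines.append(f">{name}")
        for offset in range(0, len(sequence), width):
            lines.append(sequence[offset:offset + width])
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    queries = {
        "query1": "ACGT" * 20,
        "query2": "TTGACCA",
    }
    print(to_fasta(queries), end="")
```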

    A successful BLAT search returns a list of one or more genome locations that match the input sequence. To view one of the alignments in the Genome Browser, click the browser link for the match. The details link can be used to preview the alignment to determine if it is of sufficient match quality to merit viewing in the Genome Browser. If too many BLAT hits occur, try narrowing the search by filtering the sequence in slow mode with RepeatMasker, then rerunning the BLAT search.

    For more information on conducting and fine-tuning BLAT searches, refer to the BLAT section of this document.

    Opening the Genome Browser with a custom annotation track

    You can open the Genome Browser window with a custom annotation track displayed by using the Add Custom Tracks feature available from the gateway and annotation tracks pages. For more information on creating and using custom annotation tracks, refer to the Creating custom annotation tracks section.

    Annotation track data can be entered in one of three ways:

    -- Enter the file name for an annotation track source file in the Annotation File text box.
    -- Type or paste the annotation track data into the large text box.
    -- If the annotation data are accessible through a URL, enter the URL in the large text box.

    Once you've entered the annotation information, click the submit button at the top of the Gateway page to open up the Genome Browser with the annotation track displayed.
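
    As a rough illustration of the kind of text that can be pasted into the large text box, the sketch below builds a small track from a list of regions. It assumes a BED-style layout (tab-separated chromosome, 0-based start, end and item name) preceded by a "track" line carrying name and description attributes; the supported formats are covered in the Creating custom annotation tracks section. The function name and the example region are hypothetical.

```python
from typing import Iterable, Tuple

Feature = Tuple[str, int, int, str]  # (chrom, start, end, item name)


def build_custom_track(name: str, description: str,
                       features: Iterable[Feature]) -> str:
    """Return custom track text: one 'track' line, then one line per feature.

    Assumes BED-style rows with tab-separated fields and a 0-based start.
    """
    header = f'track name="{name}" description="{description}"'
    rows = [f"{chrom}\t{start}\t{end}\t{label}"
            for chrom, start, end, label in features]
    return "\n".join([header, *rows]) + "\n"


if __name__ == "__main__":
    # Made-up region, for illustration only.
    print(build_custom_track(
        "demo", "Demo regions",
        [("chr21", 33031596, 33041570, "regionA")],
    ), end="")
```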

    The Genome Browser also provides a collection of custom annotation tracks contributed by the UCSC Genome Bioinformatics group and the research community.

    NOTE: If an annotation track does not display correctly when you attempt to upload it, you may need to reset the Genome Browser to its default settings, then reload the track. For information on troubleshooting display problems with custom annotation tracks, refer to the troubleshooting section in the Creating custom annotation tracks section.

    Viewing genome data as text

    The Table Browser, a portal to the underlying open source MySQL relational database driving the Genome Browser, displays genomic data as columns of text rather than as graphical tracks. For more information on using the Table Browser, see the section Getting started on the Table Browser.

    Opening the Genome Browser from external gateways

    Several external gateways provide direct links into the Genome Browser. Examples include: Entrez Gene, AceView, Ensembl, SuperFamily, and GeneCards. Journal articles can also link to the browser and provide custom tracks. Be sure to use the assembly date appropriate to the provided coordinates when using data from a journal source.

    Tips for Use

    To facilitate your return to regions of interest within the Genome Browser, save the coordinate range or bookmark the page of displays that you plan to revisit or wish to share with others.
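
    One way to save a coordinate range is to store it as a link. The sketch below builds such a link, assuming the browser is reached through the hgTracks page with db and position parameters; compare the result with the address shown in your own session before relying on it. The function name and the example assembly and coordinates are illustrative.

```python
from urllib.parse import urlencode

# Assumed entry point for the annotation tracks page.
BASE_URL = "https://genome.ucsc.edu/cgi-bin/hgTracks"


def browser_link(db: str, chrom: str, start: int, end: int,
                 base_url: str = BASE_URL) -> str:
    """Build a shareable link for one assembly and coordinate range.

    The assembly is part of the link because coordinates are only
    meaningful for the assembly they were taken from.
    """
    position = f"{chrom}:{start}-{end}"
    return f"{base_url}?{urlencode({'db': db, 'position': position})}"


if __name__ == "__main__":
    print(browser_link("hg38", "chr21", 33031597, 33041570))
```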

    It is usually best to work with the most recent assembly even though a full set of tracks might not yet be ready. Be aware that the coordinates of a given feature on an unfinished chromosome may change from one assembly to the next as gaps are filled, artifactual duplications are reduced, and strand orientations are corrected. The Genome Browser offers multiple tools that can correctly convert coordinates between different assembly releases. For more information on conversion tools, see the section Converting data between assemblies.

    To ensure uninterrupted browser services for your research during UCSC server maintenance and power outages, bookmark a mirror site that replicates the UCSC Genome Browser.

    Bear in mind that the Genome Browser cannot outperform the underlying quality of the draft genome. Assembly errors and sequence gaps may still occur well into the sequencing process due to regions that are intrinsically difficult to sequence. Artifactual duplications arise as unavoidable compromises during a build, causing misleading matches in genome coordinates found by alignment.

    Interpreting and fine-tuning the Genome Browser display

    The Genome Browser annotation tracks page displays a genome location specified through a Gateway search, a BLAT search, or an uploaded custom annotation track. There are five main features on this page: a set of navigation controls, a chromosome ideogram, the annotation tracks image, display configuration buttons, and a set of track display controls.

    The first time you open the Genome Browser, it will use the application default values to configure the annotation tracks display. By manipulating the navigation, configuration and display controls, you can customize the annotation tracks display to suit your needs. For a complete description of the annotation tracks available in all assembly versions supported by the Genome Browser, see the Annotation Track Descriptions section.

    The Genome Browser retains user preferences from session to session within the same web browser, although it never monitors or records user activities or submitted data. To restore the default settings, click the "Click here to reset" link on the Genome Browser Gateway page. To return the display to the default set of tracks (but retain custom tracks and other configured Genome Browser settings), click the default tracks button on the Genome Browser page.

    Display conventions

    The annotation tracks displayed in the Genome Browser use a common set of display conventions:

    -- Annotation track descriptions: Each annotation track has an associated description page that contains a discussion of the track, the methods used to create the annotation, the data sources and credits for the track, and (in some cases) filter and configuration options to fine-tune the information displayed in the track. To view the description page, click on the mini-button to the left of a displayed track or on the label for the track in the Track Controls section.

    -- Annotation track details pages: When an annotation track is displayed in full, pack, or squish mode, each line item within the track has an associated details page that can be displayed by clicking on the item or its label. The information contained in the details page varies by annotation track, but may include basic position information about the item, related links to outside sites and databases, links to genomic alignments, or links to corresponding mRNA, genomic, and protein sequences.

    -- Gene prediction tracks: Coding exons are represented by blocks connected by horizontal lines representing introns. The 5' and 3' untranslated regions (UTRs) are displayed as thinner blocks on the leading and trailing ends of the aligning regions. In full display mode, arrowheads on the connecting intron lines indicate the direction of transcription. In situations where no intron is visible (e.g. single-exon genes, extremely zoomed-in displays), the arrowheads are displayed on the exon block itself.

    -- Pattern Space Layout (PSL) alignment tracks: Aligning regions (usually exons when the query is cDNA) are shown as black blocks. In dense display mode, the degree of darkness corresponds to the number of features aligning to the region or the degree of quality of the match. In pack or full display mode, the aligning regions are connected by lines representing gaps in the alignment (typically spliced-out introns), with arrowheads indicating the orientation of the alignment, pointing right if the query sequence was aligned to the forward strand of the genome and left if aligned to the reverse strand. Two parallel lines are drawn over double-sided alignment gaps, which skip over unalignable sequence in both target and query. For alignments of ESTs, the arrows may be reversed to show the apparent direction of transcription deduced from splice junction sequences. In situations where no gap lines are visible, the arrowheads are displayed on the block itself. To prevent display problems, the Genome Browser imposes an upper limit on the number of alignments that can be viewed simultaneously within the tracks image. When this limit is exceeded, the Browser displays the best several hundred alignments in a condensed display mode, then lists the number of undisplayed alignments in the last row of the track. In this situation, try zooming in to display more entries or to return the track to full display mode. For some PSL tracks, extra coloring to indicate mismatching bases and query-only gaps may be available.

    -- "Chain" tracks (2-species alignment): Chain tracks display boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the genome of the first species or an insertion in the genome of the second species. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where there are multiple chains over a particular portion of the genome, chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the fuller display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.

    -- "Net" tracks (2-species alignment): Boxes represent ungapped alignments, while lines represent gaps. Clicking on a box displays detailed information about the chain as a whole, while clicking on a line shows information on the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):

    Top - The best, longest match. Displayed on level 1.
    Syn - Lineups on the same chromosome as the gap in the level above it.
    Inv - A lineup on the same chromosome as the gap above it, but in the opposite orientation.
    NonSyn - A match to a chromosome different from the gap in the level above.

    -- Snake tracks (alignment tracks): The snake alignment track (or snake track) shows the relationship between the chosen Browser genome (reference genome) and another genome (query genome). A snake is a way of viewing a set of pairwise gapless alignments that may overlap on both the reference and query genomes. Alignments are always represented as being on the positive strand of the reference species, but can be on either strand on the query sequence.

    In full display mode, a snake track can be decomposed into two drawing elements: segments (colored rectangles) and adjacencies (lines connecting the segments). Segments represent subsequences of the query genome aligned to the given portion of the reference genome. Adjacencies represent the covalent bonds between the aligned subsequences of the query genome.

    Red tick-marks within segments represent substitutions with respect to the reference, shown in windows of the reference of (by default) up to 50 Kb. Zoomed in to the base level, these substitutions are labeled with the non-reference base.

    An insertion in the reference relative to the query creates a gap between abutting segment sides that is connected by an adjacency. An insertion in the query relative to the reference is represented by an orange tick-mark that splits a segment at the location the extra bases would be inserted. Simultaneous independent insertions in both query and reference look like an insertion in the reference relative to the query, except that the corresponding adjacency connecting the two segments is colored orange. More complex structural rearrangements create adjacencies that connect the sides of non-abutting segments in a natural fashion.

    Pack mode can be used to display a larger number of snake tracks in the limited vertical space of the browser. This mode eliminates the adjacencies from the display and forces the segments onto as few rows as possible, given the constraint of still showing duplications in the query sequence.

    Dense mode further eliminates these duplications so that each snake track is compactly represented along just one row.

    -- "Wiggle" tracks: These tracks plot a continuous function along a chromosome. Data is displayed in windows of a set number of base pairs in width. The score for each window displays as "mountain ranges". The display characteristics vary among the tracks in this group. See the individual track descriptions for more information on interpreting the display. If the "mountain peak" is taller or shorter than what can be shown in the display, it is clipped and colored magenta.
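
    The clipping rule for wiggle tracks can be made concrete with a short sketch. It assumes each window has a single numeric score and that the display has a fixed vertical viewing range; the function name and the sample scores are hypothetical, and the actual drawing code may differ.

```python
from typing import List, Sequence, Tuple


def clip_to_view(scores: Sequence[float], view_min: float,
                 view_max: float) -> List[Tuple[float, bool]]:
    """Pair each window score with its drawn height and a 'clipped' flag.

    A score outside [view_min, view_max] is drawn at the nearest limit
    and flagged, which corresponds to the magenta marking described above.
    """
    drawn = []
    for score in scores:
        height = min(max(score, view_min), view_max)
        drawn.append((height, height != score))
    return drawn


if __name__ == "__main__":
    # Made-up window scores with a viewing range of 0 to 100.
    for height, clipped in clip_to_view([5, 50, 120, -3], 0, 100):
        print(height, "clipped" if clipped else "")
```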

    Changing the display mode of an individual annotation track

    Each annotation track within the window may have up to five display modes:

    -- Hide: the track is not displayed at all. To hide all the annotation tracks, click the hide all button. This mode is useful for restricting the display to only those tracks in which you are interested. For example, someone who is not interested in SNPs or mouse synteny may want to hide these tracks to reduce track clutter and improve speed. There are a few annotation tracks that pertain only to one specific chromosome, e.g. Sanger22, Rosetta. In these cases, the track and its associated controller will be hidden automatically when the track window is not open to the relevant chromosome.

    -- Dense: the track is displayed with all features collapsed into a single line. This mode is useful for reducing the amount of space used by a track when you don't need individual line item details or when you just want to get an overall view of an annotation. For example, by opening an entire chromosome and setting the RefSeq Genes track to dense, you can get a feel for the known gene density of the chromosome without displaying excessive detail.

    -- Full: the track is displayed with each annotation feature on a separate line. It is recommended that you use this option sparingly, due to the large number of individual track items that may potentially align at the selected position. For example, hundreds of ESTs might align with a specified gene. When the number of lines within a requested track location exceeds 250, the track automatically defaults to a more tightly-packed display mode. In this situation, you can restore the track display to full mode by narrowing the chromosomal range displayed or by using a track filter to reduce the number of items displayed. On tracks that contain only hide, dense, and full modes, you can toggle between full and dense display modes by clicking on the track's center label.

    -- Squish: the track is displayed with each annotation feature shown separately, but at 50% the height of full mode. Features are unlabeled, and more than one may be drawn on the same line. This mode is useful for reducing the amount of space used by a track when you want to view a large number of individual features and get an overall view of an annotation. It is particularly good for displaying tracks in which a large number of features align to a particular section of a chromosome, e.g. EST tracks.

    -- Pack: the track is displayed with each annotation feature shown separately and labeled, but not necessarily displayed on a separate line. This mode is useful for reducing the amount of space used by a track when you want to view the large number of individual features allowed by squish mode, but need the labeling and display size provided by full mode. When the number of lines within the requested track location exceeds 250, the track automatically defaults to squish display mode. In this situation, you can restore the track display to pack mode by narrowing the chromosomal range displayed or by using a track filter to reduce the number of items displayed. To toggle between pack and full display modes, click on the track's center label.
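
    The 250-line fallback described for full and pack modes can be sketched as follows. This is an illustration under stated assumptions, not the browser's own logic: it assumes the "more tightly-packed" mode that full falls back to is pack, it counts pack rows with a simple greedy layout that ignores label widths, and all names are hypothetical.

```python
import heapq
from typing import List, Tuple

LINE_LIMIT = 250  # line limit quoted in the mode descriptions above

Item = Tuple[int, int]  # (start, end) of one feature, end exclusive


def count_pack_rows(items: List[Item]) -> int:
    """Count rows needed when non-overlapping features may share a row."""
    row_ends: List[int] = []  # min-heap holding the end of each row's last item
    for start, end in sorted(items):
        if row_ends and row_ends[0] <= start:
            heapq.heapreplace(row_ends, end)  # reuse the row that frees up first
        else:
            heapq.heappush(row_ends, end)     # every row is busy: open a new one
    return len(row_ends)


def effective_mode(requested: str, items: List[Item]) -> str:
    """Return the mode actually shown after applying the line limit."""
    mode = requested
    # Full mode uses one line per feature.
    if mode == "full" and len(items) > LINE_LIMIT:
        mode = "pack"  # assumption: the tighter mode after full is pack
    if mode == "pack" and count_pack_rows(items) > LINE_LIMIT:
        mode = "squish"
    return mode


if __name__ == "__main__":
    staggered = [(i, i + 5) for i in range(300)]  # 300 features, 5 deep
    print(effective_mode("full", staggered))
```

    Narrowing the displayed range or filtering the track reduces the number of items passed in, which is why either action restores the requested mode.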

    The track display controls are grouped into categories that reflect the type of data in the track, e.g. Gene Prediction Tracks, mRNA and EST tracks, etc. To change the display mode for a track, find the track's controller in the Track Controls section at the bottom of the Genome Browser page, select the desired mode from the control's display menu, and then click the refresh button. Alternatively, you can change the display mode by using the Genome Browser's right-click navigation feature, or can toggle between dense and full modes for a displayed track (or pack mode when available) by clicking on the optional center label for the track.

    Changing the display mode for a group of tracks

    Track display modes may be set individually or as a group on the Genome Browser Track Configuration page. To access the configuration page, click the configure button on the annotation tracks page or the configure tracks and display button on the Gateway page. Exercise caution when using the show all buttons on track groups or assemblies that contain a large number of tracks; this may seriously impact the display performance of the Genome Browser or cause your Internet browser to time out.

    Hiding the track display controls

    The entire set of track display controls at the bottom of the annotation tracks page may be hidden from view by unchecking the Show track controls under main graphic option in the Configure Image section of the Track Configuration page.

    Changing the display of a track by using filters and configuration options

    Some tracks have additional filter and configuration capabilities, e.g. EST tracks, mRNA tracks, NCI60, etc. These options let the user modify the color or restrict the data displayed within an annotation track. Filters are useful for focusing attention on items relevant to the current task in tracks that contain large amounts of data. For example, to highlight ESTs expressed in the liver, set the EST track filter to display items in a different color when the associated tissue keyword is "liver". Configuration options let the user adjust the display to best show the data of interest. For example, the min vertical viewing range value on wiggle tracks can be used to establish a data threshold. By setting the min value to "50", only data values greater than 50 percent will display.

    To access filter and configuration options for a specific annotation track, open the track's description page by clicking the label for the track's control menu under the Track Controls section, the mini-button to the left of the displayed track, or the "Configure..." option from the Genome Browser's right-click popup menu. The filter and configuration section is located at the top of the description page. In most instances, more information about the configuration options is available within the description text or through a special help link located in the configuration section.

    Filter and configuration settings are persistent from session to session on the same web browser. To return the Genome Browser display to the default set of tracks (but retain custom tracks and other configured Genome Browser settings), click the default tracks button on the Genome Browser tracks page. To remove all user configuration settings and custom tracks, and completely restore the defaults, click the "Click here to reset" link on the Genome Browser Gateway page.

    Zooming and scrolling the tracks display

    At times you may want to adjust the amount of flanking region displayed in the annotation tracks window or adjust the scale of the display. At a scale of 1 pixel per base pair, the window accurately displays the width of exons and introns, and indicates the direction of transcription (using arrowheads) for multi-exon features. At a coarser scale, certain features, such as thin exons, may disappear. Also, some exons may falsely appear to fall within RepeatMasker features at some scales.

    Click the zoom in and zoom out buttons at the top of the Genome Browser page to zoom in or out on the center of the annotation tracks window by 1.5, 3 or 10-fold. Alternatively, you can zoom in 3-fold on the display by clicking anywhere on the Base Position track. In this case, the zoom is centered on the coordinate of the mouse click. To view the base composition of the sequence underlying the current annotation track display, click the base button.
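
    The effect of the zoom buttons on the displayed coordinates can be worked through with a small sketch. It treats the window as a start and end in base pairs, zooms about the window center unless a click position is given, and does not clamp to the end of the chromosome; the rounding and the function name are assumptions made for illustration.

```python
from typing import Optional, Tuple


def zoom(start: int, end: int, factor: float,
         center: Optional[int] = None, zoom_in: bool = True) -> Tuple[int, int]:
    """Return the window after zooming in or out by `factor`.

    `center` is the coordinate to zoom about, for example the position
    of a click on the Base Position track; it defaults to the midpoint.
    """
    width = end - start
    new_width = max(1, round(width / factor if zoom_in else width * factor))
    if center is None:
        center = (start + end) // 2
    new_start = max(0, center - new_width // 2)  # never go below position 0
    return new_start, new_start + new_width


if __name__ == "__main__":
    window = (1000, 4000)
    print(zoom(*window, 3))                 # zoom in 3-fold about the center
    print(zoom(*window, 1.5, zoom_in=False))  # zoom out 1.5-fold
    print(zoom(*window, 3, center=1200))    # 3-fold zoom about a click
```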

    Quickly zoom to a specific region of interest by using the browser's "drag-and-select" feature. To define the region you wish to zoom to, click and hold the mouse button on one edge of the desired zoom area in the Base Position track, drag the mouse right or left to highlight the selection area, then release the mouse button. A "drag-and-select" popup will appear. Click on the "Zoom In" button to zoom in on the selected region. To disable the drag-and-select popup, check the "Don't show this dialog again and always zoom" checkbox. To drag-and-select (zoom) on a part of the image other than the Base Position track, hold down the shift key before clicking and dragging the mouse. Note that the Enable advanced javascript features option on the Track Configuration page must be toggled on to use this feature.

    To scroll (pan) the view of the entire tracks image horizontally, click on the image and drag the cursor to the left or right, then release the mouse button, to shift the displayed region in the corresponding direction. The view may be scrolled by up to one image width. To scroll the annotation tracks horizontally by set increments of 10%, 50%, or 95% of the displayed size (as given in base pairs), click the corresponding move arrow. It is also possible to scroll the left or right side of the tracks by a specified number of vertical gridlines while keeping the position of the opposite side fixed. To do this, click the appropriate move start or move end arrow, located under the annotation tracks window. For example, to keep the left-hand display coordinate fixed but increase the right-hand coordinate, you would click the right-hand move end arrow. To increase or decrease the gridline scroll interval, edit the value in the move start or move end text box.
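
    The move arrows shift both ends of the window by a fixed share of its width. The sketch below computes the new coordinates for a given percentage; it assumes the window cannot move below position 0 and leaves the chromosome end unchecked. The function name and the example coordinates are hypothetical.

```python
from typing import Tuple


def scroll(start: int, end: int, percent: int, direction: str) -> Tuple[int, int]:
    """Shift the window left or right by `percent` of its width in base pairs."""
    if direction not in ("left", "right"):
        raise ValueError("direction must be 'left' or 'right'")
    shift = (end - start) * percent // 100
    if direction == "left":
        shift = -min(shift, start)  # stop at the start of the chromosome
    return start + shift, end + shift


if __name__ == "__main__":
    window = (1000, 2000)
    for percent in (10, 50, 95):
        print(percent, scroll(*window, percent, "right"))
```

    Because the shift is taken from the displayed size, the same arrow moves a zoomed-out window much further in base pairs than a zoomed-in one.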

    Highlighting a region

    The browser's "drag-and-select" feature also allows you to highlight a region or gene of interest. To highlight a region, click and hold the mouse button on one edge of the desired area to be highlighted in the Base Position track, drag the mouse right or left to highlight the selection area, then release the mouse button. Click the "Highlight" button on the "drag-and-select" popup. Note that if the "drag-and-select" popup has been disabled, you may re-enable it on the browser image 'configure' page by selecting "Enable highlight with drag-and-select (if unchecked, drag-and-select always zooms to selection)".

    Options to remove highlighting, zoom in to a highlighted region, or jump to a highlighted region, can be found on the browser's right-click menu.

    To highlight a gene of interest, right-click on the gene (e.g., SOD1) and select "Highlight SOD1".

    Changing the displayed track position

    To display a completely different position in the genome, enter the new query in the position/search text box, then click the jump button. For more information on valid entries for this text box, refer to the Getting Started section.

    If a chromosome image (ideogram) is available above the track display, click anywhere on the chromosome to move to that position (the current window size will be maintained). Select a region of any size by clicking and dragging in the image. Finally, hold the "control" key while clicking on a chromosome band to select the entire band.

    Changing the order of the displayed tracks

    To vertically reposition a track in the annotation track window, click-and-hold the mouse button on the side label, then drag the highlighted track up or down within the image. Release the mouse button when the track is in the desired position. To move an entire group of associated tracks (such as all the displayed subtracks in a composite track), click-and-hold the gray mini-button to the left of the tracks, then drag.

    Changing the width of the annotation track window

    The first time the annotation track window is displayed, or after the Genome Browser has been reset, the size of the track window is set by default to the width that best fits your Internet browser window. If you horizontally resize the browser window, you can automatically adjust the annotation track image size to the new width by clicking the resize button under the track image. To manually override the default width, enter a new value in the image width text box on the Track Configuration page, then click the submit button. The maximum supported width is 5000 pixels.

    Changing the width of the label area to the left of the image

    The item labels (or track label, when viewed in dense mode) are displayed to the left of the annotation image. The width of this area is set to 17 characters by default. To change the width, edit the value in the label area width text box on the Track Configuration page, then click Submit.

    Changing the text size in the annotation track image

    The text in the annotation track image may be displayed in a range of font sizes from "tiny" to "huge". To change the size of the text, select an option from the text size pull-down menu on the Track Configuration page, then click Submit. The text size is set to "small" by default.

    Hiding the annotation track labels

    The track and element labels displayed above and to the left of the tracks in the annotation tracks image may be hidden from view by unchecking the Display track descriptions above each track and Display labels to the left of items in tracks boxes, respectively, on the Track Configuration page.

    Hiding the display grid on the annotation tracks image

    The light blue vertical guidelines on the annotation tracks image may be removed by unchecking the Show light blue vertical guidelines box on the Track Configuration page.

    Hiding the chromosome ideogram

    The chromosome ideogram, located just above the annotation tracks image, provides a graphical overview of the features on the selected chromosome, including its bands, the position of the centromere, and an indication of the region currently displayed in the annotation tracks image. To hide the ideogram, uncheck the Display chromosome ideogram above main graphic box on the Track Configuration page.

    Enabling item and exon navigation

    When the Next/previous item navigation option is toggled on in the Track Configuration page, gray double-headed arrows display in the Genome Browser tracks image on both sides of the track labels of gene, mRNA and EST tracks (or any standard tracks based on BED, PSL or genePred format). Clicking on the gray arrows shifts the image window toward that end of the chromosome so that the next item in the track is displayed. Similarly, the Next/previous exon navigation option displays white double-headed arrows on both the 5' and 3' end of each track item that has exons positioned beyond the edges of the current image. Clicking on one of the white arrows shifts the image window to the next exon located towards that end of the feature.

    Enabling the right-click navigation feature

    Several of the common display and navigation operations offered on the Genome Browser tracks page may be quickly accessed by right-clicking on a feature on the tracks image and selecting an option from the displayed popup menu. Depending on context, the right-click feature allows the user to:

    -- change the track display mode
    -- zoom in or out to the exact position coordinates of the feature
    -- open the "Get DNA" window at the feature's coordinates
    -- display details about the feature
    -- open a popup window to configure the track's display
    -- display the entire tracks image in a separate window for inclusion in spreadsheets or other documents. (Note that the Genome Browser "PDF/PS" option described below can also be used to generate a high-quality annotation tracks image suitable for printing.)

    To use the right-click feature, make sure the Enable advanced javascript features option on the Track Configuration page is checked, and configure your internet browser to allow the display of popup windows from genome.ucsc.edu. When enabled, the right-click navigation feature replaces the default contextual popup menu typically displayed by the internet browser when a user right-clicks on the tracks image. A few combinations of the Mozilla Firefox browser on Mac OS do not support the right-click menu functionality using secondary click; in these instances, ctrl+left-click must be used to display the menu.

    Printing a copy of the annotation track window

    The Genome Browser provides a mechanism for saving a copy of the currently displayed annotation tracks image to a file that can be printed or edited. Images saved in PostScript format can be printed at high resolution and edited by drawing programs such as Adobe Illustrator. This is useful for generating figures intended for publication. Images can also be saved in PDF format for viewing by Adobe Acrobat Reader.

    To print or save the image to a file:

    1. In the blue navigation bar at the top of the screen, from the "View" menu, click the "PDF/PS" link.
    2. Click one of the PDF or EPS links.

    NOTE: If you have configured your browser image to use one of the larger font sizes, the text in the resulting screen shot may not display correctly. If you encounter this problem, reduce the Genome Browser font size using the Configuration utility, then repeat the save/print process.

    Using BLAT alignments

    BLAT (BLAST-Like Alignment Tool) is a very fast sequence alignment tool similar to BLAST. For more information on BLAT's internal scoring schemes and its overall n-mer alignment seed strategy, refer to W. James Kent (2002) BLAT - The BLAST-Like Alignment Tool, Genome Res 12:4 656-664.

    On DNA queries, BLAT is designed to quickly find sequences with 95% or greater similarity of length 25 bases or more. It may miss genomic alignments that are more divergent or shorter than these minimums, although it will find perfect sequence matches of 32 bases and sometimes as few as 22 bases. The tool is capable of aligning sequences that contain large introns. On protein queries, BLAT rapidly locates genomic sequences with 80% or greater similarity of length 20 amino acids or more. Gene family members that arose within the last 350 million years can generally be detected. More divergent sequences can be aligned to the human genome by using NCBI's BLAST and psi-BLAST, then using BLAT to align the resulting match onto the UCSC genome assembly. In practice, DNA BLAT works well on primates, and protein BLAT works well on land vertebrates.

    Some common uses of BLAT include:

    -- finding the genomic coordinates of mRNA or protein within a given assembly
    -- determining the exon structure of a gene
    -- displaying a coding region within a full-length gene
    -- isolating an EST of special interest as its own track
    -- searching for gene family members
    -- finding human homologs of a query from another species.

    Making a BLAT query

    To locate a nucleotide or protein within a genome using BLAT:

    1. Open the BLAT Search Genome page by clicking on the "Tools" pulldown in the top blue menu bar of the Genome Browser.
    2. Select the genome, assembly, query type, output sort order, and output type. To order the search results based on the closeness of the sequence match, choose one of the score options in the Sort output menu. The score is determined by the number of matches vs. mismatches in the final alignment of the query to the genome.
    3. If the sequence to be uploaded is in an unformatted plain text file, enter the file name in the Upload sequence text box, then click the submit file button. Otherwise, paste the sequence or fasta-formatted list into the large edit box, and then click the submit button. Input sequence can be obtained from the Genome Browser as well as from a custom annotation track.

    Header lines may be included in the input text if they are preceded by > and contain unique names. Multiple sequences may be submitted at the same time if they are of the same type and are preceded by unique header lines. Numbers, spaces, and extraneous characters are ignored:

    >sequence_1
    ATGCAGAGCAAGGTGCTGCTGGCCGTCGCCCTGTGGCTCTGCGTGGAGACCCGGGCCGCCTCTGTGGGTTTGCCTAGTGTTTCTCTTGATCTGCCCAGGC
    >sequence_2
    ATGTTGTTTACCGTAAGCTGTAGTAAAATGAGCTCGATTGTTGACAGAGATGACAGTAGTATTTTTGATGGGTTGGTGGAAGAAGATGACAAGGACAAAG
    >sequence_3
    ATGCTGCGAACAGAGAGCTGCCGCCCCAGGTCGCCCGCCGGACAGGTGGCCGCGGCGTCCCCGCTCCTGCTGCTGCTGCTGCTGCTCGCCTGGTGCGCGG

    BLAT limitations

    DNA input sequences are limited to a maximum length of 25,000 bases. Protein or translated input sequences must not exceed 10,000 letters. Up to 25 sequences may be submitted at the same time. The maximum combined length of DNA input for multiple sequence submissions is 50,000 bases (with a 25,000 base limit per individual sequence). For protein or translated input, the maximum combined input length is 25,000 letters (with a 5000 letter limit per individual sequence).
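
    The DNA limits above can be checked before submitting a query. The following is a minimal sketch, not part of the BLAT tools: the function names are hypothetical, the parser only approximates the "numbers, spaces, and extraneous characters are ignored" rule by keeping letters, and the default name "query" for a header-less sequence is an assumption.

```python
from typing import Dict, List

# Limits quoted in the text above for DNA queries.
MAX_SEQUENCES = 25
MAX_DNA_PER_SEQUENCE = 25_000
MAX_DNA_COMBINED = 50_000


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, keeping only letters."""
    records: Dict[str, str] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                raise ValueError("header line has no name")
            name = parts[0]
            if name in records:
                raise ValueError(f"duplicate header name: {name}")
            records[name] = ""
        else:
            if name is None:
                # Assumption: a bare sequence without a header gets a default name.
                name = "query"
                records[name] = ""
            # Drop digits, spaces, and other non-letter characters.
            records[name] += "".join(ch for ch in line if ch.isalpha())
    return records


def check_dna_limits(records: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; empty if within limits."""
    problems: List[str] = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceeds {MAX_SEQUENCES}")
    for name, seq in records.items():
        if len(seq) > MAX_DNA_PER_SEQUENCE:
            problems.append(f"{name}: {len(seq)} bases exceeds {MAX_DNA_PER_SEQUENCE}")
    total = sum(len(seq) for seq in records.values())
    if total > MAX_DNA_COMBINED:
        problems.append(f"combined length {total} exceeds {MAX_DNA_COMBINED}")
    return problems
```

    An empty list from check_dna_limits means the input fits the stated DNA limits; the protein limits could be checked the same way with different constants.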

    NOTE: Program-driven BLAT use is limited to a maximum of one hit every 15 seconds and no more than 5000 hits per day.
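
    A script that submits BLAT queries can enforce these limits on its own side. The sketch below is one possible approach under stated assumptions: the class name is hypothetical, and "per day" is interpreted as a fixed 24-hour window starting at the first request, which may differ from how the server counts.

```python
import time
from typing import Callable, Optional


class BlatThrottle:
    """Client-side pacing: one request per min_interval seconds, daily cap."""

    def __init__(
        self,
        min_interval: float = 15.0,
        daily_limit: int = 5000,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._daily_limit = daily_limit
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None
        self._day_start: Optional[float] = None
        self._count = 0

    def wait(self) -> None:
        """Block until the next request is allowed, or raise if over the cap."""
        now = self._clock()
        # Assumption: the daily window is 24 hours from the first request.
        if self._day_start is None or now - self._day_start >= 86400:
            self._day_start = now
            self._count = 0
        if self._count >= self._daily_limit:
            raise RuntimeError("daily request limit reached")
        if self._last is not None:
            remaining = self._min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
        self._count += 1
```

    Call wait() immediately before each program-driven query. The clock and sleep parameters exist only so the pacing logic can be exercised without real delays.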

    BLAT query search results

    If a query returns successfully, BLAT will display a flat database file that summarizes the alignments found. A BLAT query often generates multiple hits. This can happen when the genome contains multiple copies of a sequence, paralogs, pseudogenes, statistical coincidences, artifactual assembly duplications, or when the query itself contains repeats or common retrotransposons. When too many hits occur, try resubmitting the query sequence after filtering in slow mode with RepeatMasker.

    Items in the search results list are ordered by the criteria specified in the Sort output menu. Each line item provides links to view the details of the sequence alignment or to open the corresponding view in the Genome Browser. The details link gives the letter-by-letter alignment of the sequence to the genome. It is recommended that you first examine the details of the alignment for match quality before viewing the sequence in the Genome Browser.

    When several nearby BLAT matches occur on a single chromosome, a simple trick can be used to quickly adjust the Genome Browser track window to display all of them: open the Genome Browser with the match that has the lowest chromosome start coordinate, replace the end coordinate in the position box with the highest chromosome end coordinate from the list of matches, then click the jump button.
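
    The same window can be computed from a list of matches. This is a small illustrative sketch: the function name is hypothetical, and the coordinates in the usage note are made up rather than taken from a real BLAT result.

```python
from typing import Iterable, Tuple


def spanning_position(matches: Iterable[Tuple[str, int, int]]) -> str:
    """Return 'chrom:start-end' covering all (chrom, start, end) matches."""
    items = list(matches)
    if not items:
        raise ValueError("no matches given")
    chroms = {chrom for chrom, _, _ in items}
    if len(chroms) != 1:
        raise ValueError("matches must all be on the same chromosome")
    start = min(s for _, s, _ in items)  # lowest start coordinate
    end = max(e for _, _, e in items)    # highest end coordinate
    return f"{chroms.pop()}:{start}-{end}"
```

    For example, spanning_position([("chr21", 33031597, 33041570), ("chr21", 33025000, 33030000)]) yields a string that can be pasted into the position/search text box.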

    Creating a custom annotation track from BLAT output

    To make a custom track directly from BLAT, select the PSL format output option. The resulting PSL track can be uploaded into the Genome Browser by pasting the data into the data text box on the Genome Browser Add Custom Tracks page, accessed via the "add custom tracks" button on the Browser gateway and annotation tracks pages. See the Creating custom annotation tracks section for more information.

    Using BLAT for large batch jobs or commercial use

    For large batch jobs or internal parameter changes, it is best to install command line BLAT on your own Linux server. Sources and executables are free for academic, personal, and non-profit purposes. BLAT source may be downloaded from http://www.soe.ucsc.edu/~kent (look for the blatSrc*.zip file with the most recent date). For BLAT executables, go to http://genome-test.cse.ucsc.edu/~kent/exe/; binaries are sorted by platform. Non-exclusive commercial licenses are available from the Kent Informatics website.

    BLAT documentation

    For more information on the BLAT suite of programs, see the BLAT Program Specifications and the Blat section of the Genome Browser FAQ.

    Annotation track descriptions

    Detailed information about an individual annotation track, including display characteristics, configuration information, and associated database tables, may be obtained from the track description page accessed by clicking the mini-button to the left of the displayed track in the Genome Browser, or by selecting the "Open details..." or "Show details..." option from the Genome Browser's right-click menu. Click the "View table schema" link on the track description page to display additional information about the primary database table underlying the track. Table schema information may also be accessed via the "describe table schema" button in the Table Browser. For more information on configuring and using the tracks displayed in the Genome Browser track window, see the section Interpreting and Fine-tuning the Genome Browser display.

    Tips for viewing annotation track data

    -- To display a description page with more information about the track, click on the mini-button to the left of a track.
    -- To display a details page with additional information about a specific line item within a track in full display mode, click on the item or its label.
    -- A track does not appear in the browser if its display mode is set to hide. To restrict the browser's display to only those tracks in which you're interested, set the display mode of the unwanted tracks to hide.
    -- A track set to full display mode will default to a more tightly packed display mode if the total number of lines in the track exceeds 250.
    -- To quickly toggle between full and dense or pack display modes, click on the track's center label.
    -- Only the most recent assemblies are fully active. The data for older assemblies may be available on our Downloads page.
    -- Not all tracks appear in all assemblies. Only a basic set of tracks appears initially in a new assembly.
    -- Track data can be viewed as text tables using the Table Browser.
    -- Credit goes to many individuals and institutions for generously contributing the tracks. For specific information about the contributors of a given track, look at the Credits section on a track's description page.

    Getting started on the Table Browser

    The Table Browser provides text-based access to the genome assemblies and annotation data stored in the Genome Browser database. As a flexible alternative to the graphical-based Genome Browser, this tool offers an enhanced level of query support that includes restrictions based on field values, free-form SQL queries, and combined queries on multiple tables. Output can be filtered to restrict the fields and lines returned, and may be organized into one of several formats, including a simple tab-delimited file that can be loaded into a spreadsheet or database as well as advanced formats that may be uploaded into the Genome Browser as custom annotation tracks. The Table Browser provides a convenient alternative to downloading and manipulating the entire genome and its massive data tracks. (See the Downloading Genome Data section.)

    For information on using the Table Browser features, refer to the Table Browser User Guide.

    Getting started using Sessions

    The Sessions tool allows users to configure their browsers with specific track combinations, including custom tracks, and save the configuration options. Multiple sessions may be saved for future reference, for comparison of scenarios or for sharing with colleagues. Saved sessions persist for four months after the last access, unless deleted. User-generated tracks can be saved within sessions.

    This tool may be accessed by clicking the "My Data" pulldown in the top blue navigation bar in any assembly and then selecting Sessions. To ensure privacy and security, you must create an account and/or log in to use the Session tool. Individual sessions may be designated by the user as either "shared" or "non-shared" to protect the privacy of confidential data. To avoid having a new shared session from someone else override existing Genome Browser settings, users are encouraged to open a new web-browser instance or to save existing settings in a session before loading a new shared session.

    For more detailed information on using the Session tool, see the Sessions User Guide.

    Getting started on Genome Graphs

    The Genome Graphs tool can be used to display genome-wide data sets such as the results of genome-wide SNP association studies, linkage studies, and homozygosity mapping. This tool is not pre-loaded with any sample data; instead, you can upload your own data for display by the tool.

    Once you have uploaded your data, you can view it in a variety of ways. You can view multiple sets of genome-wide data simultaneously either as superimposed graphs or side-by-side graphs. Once you see an area of interest in the Genome Graphs view, you can click on it to go directly to the Genome Browser at that position. You can also set a significance threshold for your data and view only regions or gene sets that meet that threshold.

    For information on using the Genome Graphs features, refer to the Genome Graphs User Guide.

    Using the VisiGene Image Browser

    VisiGene is a browser for viewing in situ images. It enables the user to examine cell-by-cell as well as tissue-by-tissue expression patterns. The browser serves as a virtual microscope, allowing users to retrieve images that meet specific search criteria, then interactively zoom and scroll across the collection.

    To start the VisiGene browser, click the VisiGene link in the left-hand sidebar menu on the Genome Browser home page.

    Images Available

    The following image collections are currently available for browsing:

    -- High-quality high-resolution images of eight-week-old male mouse sagittal brain slices with reverse-complemented mRNA hybridization probes from the Allen Brain Atlas, courtesy of the Allen Institute for Brain Science
    -- Mouse in situ images from the Jackson Lab Gene Expression Database (GXD) at MGI
    -- Transcription factors in mouse embryos from the Mahoney Center for Neuro-Oncology
    -- Mouse head and brain in situ images from NCBI's Gene Expression Nervous System Atlas (GENSAT) database
    -- Xenopus laevis in situ images from the National Institute for Basic Biology (NIBB) XDB project

    Searching the Image Database

    The image database may be searched by gene symbols, authors, years of publication, body parts, GenBank or UniProtKB accessions, organisms, Theiler stages (mice), and Nieuwkoop/Faber stages (frogs). The search returns only those images that match all the specified criteria. For a list of sample search strings, see the VisiGene Gateway page.

    The wildcard characters * and ? are supported for gene name searches. For example, to view the images of all genes in the Hox A cluster, search for hoxa*. When searching on author names that include initials, use the format Smith AJ.
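
    To illustrate how * and ? wildcards behave, the sketch below applies them to a list of gene symbols with Python's fnmatch module. It is only an illustration of the wildcard semantics, not VisiGene's search code: case-insensitive matching is an assumption, the function name is hypothetical, and fnmatch also accepts [...] character sets, which the text above does not mention.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def match_genes(pattern: str, names: Iterable[str]) -> List[str]:
    """Return names matching a wildcard pattern (* = any run, ? = one char)."""
    lowered = pattern.lower()
    # Assumption: compare in lower case so 'hoxa*' matches 'Hoxa1'.
    return [name for name in names if fnmatchcase(name.lower(), lowered)]
```

    With the symbols Hoxa1, Hoxa13, and Hoxb1, the pattern hoxa* selects the first two, while hoxa? selects only Hoxa1 because ? stands for exactly one character.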

    Image Navigation

    Following a successful search, VisiGene displays a list of thumbnails of images matching the search criteria in the left-hand pane of the browser. By default, the image corresponding to the first thumbnail in the list is displayed in the main image pane. If more than 25 images meet the search criteria, links at the bottom of the thumbnail pane allow the user to toggle among pages of search results. To display a different image in the main browser pane, click the thumbnail of the image you wish to view.

    By default, an image is displayed at a resolution that provides optimal viewing of the overall image. This size varies among images. The image may be zoomed in or out, sized to match the resolution of the original image or best fit the image display window, and moved or scrolled in any direction to focus on areas of interest. The original full-sized image may also be downloaded.

    Zooming in: To enlarge the image by 2X, click the Zoom in button above the image or click on the image using the left mouse button. Alternatively, the + key may be used to zoom in when the main image pane is the active window.

    Zooming out: To reduce the image by 2X, click the Zoom out button above the image or click on the image using the right mouse button. Alternatively, the - key may be used to zoom out when the main image pane is the active window.

    Sizing to full resolution: Click the Zoom full button above the image to resize the image such that each pixel on the screen corresponds to a pixel in the digitized image.

    Sizing to best fit: Click the Zoom fit button above the image to zoom the image to the size that best fits the main image pane.

    Moving the image: To move the image viewing area in any direction, click and drag the image using the mouse. Alternatively, the following keyboard shortcuts may be used after clicking on the image:

    -- Scroll left in the image: Left-arrow key or Home key
    -- Scroll right in the image: Right-arrow key or End key
    -- Scroll up in the image: Up-arrow key or PgUp key
    -- Scroll down in the image: Down-arrow key or PgDn key

    Downloading the original full-sized image: Most images may be viewed in their original full-sized format by clicking the "download" link at the bottom of the image caption. NOTE: due to the large size of some images, this action may take a long time and could potentially exceed the capabilities of some Internet browsers.

    If you have an image set you would like to contribute for display in the VisiGene Browser, contact Jim Kent.

    DNA text formatting

    The Genome Browser provides a feature to configure the retrieval, formatting, and coloring of the text used to depict the DNA sequence underlying the features in the displayed annotation tracks window. Retrieval options allow the user to add a padding of extra bases to the upstream or downstream end of the sequence. Formatting options range from simply displaying exons in upper case to elaborately marking up a sequence according to multiple track data. The DNA sequence covered by various tracks can be highlighted by case, underlining, bold or italic fonts, and color.

    The DNA display configuration feature can be useful to highlight features within a genomic sequence, point out overlaps between two types of features (for example, known genes vs. gene predictions), or mask out unwanted features.

    Using the DNA text formatting feature

    To access the feature, click on the "View" pulldown on the top blue menu bar on the Genome Browser page and select "DNA", or, depending on context, select the "Get DNA..." option from the Genome Browser's right-click menu. The "Get DNA in Window" page that appears contains sections for configuring the retrieval and output format.

    To display extra bases upstream of the 5' end of your sequence or downstream of the 3' end of the sequence, enter the number of bases in the corresponding text box. This option is useful in looking for regulatory regions.
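
    Because upstream and downstream are defined relative to the strand of the feature, padding a minus-strand feature extends the opposite coordinates. The sketch below shows that arithmetic for 1-based inclusive coordinates; the function name is hypothetical and the clamping to position 1 is an assumption about how to handle the chromosome start, not a description of the Get DNA page itself.

```python
from typing import Tuple


def pad_region(
    start: int, end: int, strand: str, upstream: int = 0, downstream: int = 0
) -> Tuple[int, int]:
    """Extend a 1-based inclusive region by strand-aware padding."""
    if strand == "+":
        # Upstream of the 5' end means lower coordinates on the plus strand.
        new_start, new_end = start - upstream, end + downstream
    elif strand == "-":
        # On the minus strand the 5' end is at the higher coordinate.
        new_start, new_end = start - downstream, end + upstream
    else:
        raise ValueError("strand must be '+' or '-'")
    # Assumption: never extend past the first base of the chromosome.
    return max(1, new_start), new_end
```

    For a feature at 1000-2000, requesting 500 upstream bases gives 500-2000 on the plus strand but 1000-2500 on the minus strand.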

    The Sequence Formatting section lists several options for adjusting the case of all or part of the DNA sequence. To choose one of these formats, click the corresponding option button, then click the get DNA button. To access a table of extended formatting options, click the Extended case/color options button.

    The Extended DNA Case/Color page presents a table with many more format options. The page provides instructions for using the formatting table, as well as examples of its use. The list of tracks in the Track Name column is automatically generated from the list of tracks available on the current genome.

    Tips for Use

    A few caveats mentioned on the Extended DNA Case/Color page bear repeating. Keep the formatting simple at first: it is easy to make a display that is pretty to look at but is also completely cryptic. Also, be careful when requesting complex formatting for a large chromosomal region: when all the HTML tags have been added to the output page, the file size may exceed the size limits that your internet browser, clipboard, and other software can safely display. The maximum size of genomic region that can be formatted by the tool is approximately 10 Mbp.

    Converting data between assemblies

    Coordinates of features frequently change from one assembly to the next as gaps are closed, strand orientations are corrected, and duplications are reduced. Occasionally, a chunk of sequence may be moved to an entirely different chromosome as the map is refined. There are three different methods available for migrating data from one assembly to another: BLAT alignment, coordinate conversion, and coordinate lifting. The BLAT alignment tool is described in the section Using BLAT alignments.

    Coordinate conversion

    The Genome Browser Convert utility is useful for locating the position of a feature of interest in a different release of the same genome or (in some cases) in a genome assembly of another species. During the conversion process, portions of the genome in the coordinate range of the original assembly are aligned to the new assembly while preserving their order and orientation. In general, it is easier to achieve successful conversions with shorter sequences.

    When coordinate conversion is available for an assembly, click on the "View" pulldown on the top blue menu bar on the Genome Browser page and select the "In Other Genomes (Convert)" link. You will be presented with a list of the genome/assembly conversion options available for the current assembly. Select the genome and assembly to which you'd like to convert the coordinates, then click the Submit button. If the conversion is successful, the browser will return a list of regions in the new assembly, along with the percent of bases and span covered by that region. Click on a region to display it in the browser. If the conversion is unsuccessful, the utility returns a failure message.

    Lifting coordinates

    The liftOver tool is useful if you wish to convert a large number of coordinate ranges between assemblies. This tool is available in both web-based and command line forms, and supports forward/reverse conversions as well as conversions between species.

    Web-based coordinate lifting

    To access the graphical version of the liftOver tool, click on the "Tools" pulldown in the top blue menu bar of the Genome Browser, then select LiftOver from the menu.

    To convert one or more coordinate ranges using the default conversion settings:

    1. Select the genome and assembly from which the ranges were taken ("Original"), as well as the genome and assembly to which the coordinates should be converted ("New").
    2. Select the Data Format option: Browser Extensible Data format (BED) or position (coordinates of the form chrN:start-end).
    3. Enter coordinate ranges in the selected data format into the large text box, one per line.
    4. Click Submit.
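
    The two data formats in step 2 count bases differently: position strings of the form chrN:start-end use 1-based, inclusive coordinates, while BED uses a 0-based start and an exclusive end. A minimal conversion sketch follows; the function name is hypothetical and the regular expression is a simplification that assumes chromosome names made of letters, digits, and underscores.

```python
import re

# Simplified pattern; commas in numbers (e.g. 33,031,597) are tolerated.
_POSITION = re.compile(r"^(\w+):([\d,]+)-([\d,]+)$")


def position_to_bed(position: str) -> str:
    """Convert 'chrN:start-end' (1-based) to a tab-separated BED line."""
    match = _POSITION.match(position.strip())
    if match is None:
        raise ValueError(f"not a chrN:start-end position: {position!r}")
    chrom = match.group(1)
    start = int(match.group(2).replace(",", ""))
    end = int(match.group(3).replace(",", ""))
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinate range: {position!r}")
    # BED start is 0-based; the end is unchanged because it is exclusive.
    return f"{chrom}\t{start - 1}\t{end}"
```

    Mixing the two conventions produces off-by-one errors after lifting, so it helps to convert everything to one format before submitting.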

    Alternatively, you may load the coordinate ranges from an existing data file by entering the file name in the upload box at the bottom of the screen, then clicking the Submit File button.

    The default parameter settings are recommended for general purpose use of the liftOver tool. However, you may want to customize settings if you have several very large regions to convert.

    Command-line coordinate lifting

    The command-line version of liftOver offers the increased flexibility and performance gained by running the tool on your local server. This utility requires access to a Linux platform. The executable file may be downloaded here. Command-line liftOver requires a UCSC-generated over.chain file as input. Pre-generated files for a given assembly can be accessed from the assembly's "LiftOver files" link on the Downloads page. If the desired conversion file is not listed, send a request to the genome mailing list and we may be able to generate one for you.
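
    As a rough sketch of driving the command-line tool from a script, the helper below assembles the usual four positional arguments (input file, chain file, output file, file for unmapped records) and runs the program if it is found on the PATH. The function names and the file names in the usage note are hypothetical; check the usage message printed by your liftOver build before relying on the argument order.

```python
import shutil
import subprocess
from typing import List


def liftover_command(
    old_file: str,
    chain_file: str,
    new_file: str,
    unmapped_file: str,
    executable: str = "liftOver",
) -> List[str]:
    """Build the argument list: liftOver oldFile map.chain newFile unMapped."""
    return [executable, old_file, chain_file, new_file, unmapped_file]


def run_liftover(
    old_file: str, chain_file: str, new_file: str, unmapped_file: str
) -> None:
    """Run liftOver, raising if the executable is missing or the run fails."""
    command = liftover_command(old_file, chain_file, new_file, unmapped_file)
    if shutil.which(command[0]) is None:
        raise FileNotFoundError("liftOver executable not found on PATH")
    subprocess.run(command, check=True)
```

    A call such as run_liftover("input.bed", "map.over.chain", "lifted.bed", "unmapped.bed") writes converted ranges to the third file and records that could not be converted to the fourth.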

    Downloading genome data

    Most of the underlying tables containing the genomic sequence and annotation data displayed in the Genome Browser can be downloaded. All of the tables are freely usable for any purpose except as indicated in the README.txt file in the download directories. This data was contributed by many researchers, as listed on the Genome Browser Credits page. Please acknowledge the contributor(s) of the data you use.

    Downloading the data

    Genome data can be downloaded in different ways:

    -- Via rsync: The UCSC Genome Bioinformatics hgdownload site contains download directories for all genome versions currently accessible in the Genome Browser. The rsync command rsync -a -P rsync://hgdownload.cse.ucsc.edu/path/file ./ can quickly and efficiently download large files to your current directory (./). To download an entire directory (note the trailing slash), you would use an expression such as: rsync -a -P rsync://hgdownload.cse.ucsc.edu/directory/ ./ For more information please click here.
    -- Via ftp: The UCSC Genome Bioinformatics ftp site contains download directories for all genome versions currently accessible in the Genome Browser. The ftp address ftp://hgdownload.cse.ucsc.edu/goldenPath/ will take you to a directory that contains the genome download directories. For a large file or multiple files from a single directory, rsync (see above) is recommended over ftp. You can, however, use the mget command to download multiple files: mget filename1 filename2, or mget -a (to download all the files in the directory).
    -- Via the Downloads link: Click the Downloads link on the left side bar on the UCSC Genome Bioinformatics home page to display a list of all database directories available for download. If the data you wish to download pre-dates the assembly versions listed, look for the data on our Downloads page.

    Types of data available

    There may be several download directories associated with each version of a genome assembly: the full data set (bigZips), the full data set by chromosome (chromosome), the annotation database tables (database), and one or more sets of comparative cross-species alignments.

    BigZips contains the entire draft of the genome in chromosome and/or contig form. Depending on the genome, this directory may contain some or all of the following files:

    -- chromAgp.zip: Description of how the assembly was generated, unpacking to one file per chromosome.
    -- chromFa.zip: The assembly sequence chromosomes, in one file per chromosome. Repeats from RepeatMasker and Tandem Repeats Finder are shown in lower case; non-repeating sequence is in upper case. The main assembly is contained in the chrN.fa files, where chrN is the name of the chromosome. The chrN_random.fa files contain clones that are not yet finished or cannot be placed with certainty at a specific place on the chromosome. In some cases, including the human HLA region on chromosome 6, the chrN_random.fa files also contain haplotypes that differ from the main assembly.
    -- chromFaMasked.zip: The assembly sequence chromosomes, in one file per chromosome. Repeats are masked by capital Ns; non-repeating sequence is shown in upper case.
    -- chromOut.zip: RepeatMasker .out file for chromosomes, generated by RepeatMasker at the -s sensitive setting.
    -- chromTrf.zip: Tandem Repeats Finder locations, filtered to keep repeats with period less than or equal to 12, translated into one .bed file per chromosome.
    -- contigAgp.zip: Description of how the assembly was generated from fragments at a contig layout level.
    -- contigFa.zip: The assembly sequence contigs, in one file per contig. All contigs are in forward orientation relative to the chromosome. In some cases, this means that contigs will be reversed relative to their orientation in the NCBI assembly. Repeats are shown in lower case; non-repeating sequence is shown in upper case.
    -- contigFaMasked.zip: The assembly sequence contigs, in one file per contig. Repeats are masked by capital Ns; non-repeating sequence is shown in upper case.
    -- contigOut.zip: RepeatMasker .out file for contigs, generated by RepeatMasker at the -s sensitive setting.
    -- contigTrf.zip: Tandem Repeats Finder locations, filtered to keep repeats with period less than or equal to 12, and translated into one .bed file per contig.
    -- database.zip: The Genome Browser database as tab-delimited files and associated MySQL table-creation files (eliminated in later assemblies due to size restrictions).
    -- est.fa.zip: Sequences of all GenBank ESTs for the selected species.
    -- liftAll.zip: The offsets of contigs within chromosomes.
    -- mrna.zip: mRNAs in GenBank from the selected species.
    -- refmrna.zip: RefSeq mRNAs from the selected species.
    -- upstream1000.zip: Sequences 1000 bases upstream of annotated transcription start of RefSeq genes. This includes only cases where the transcription start is annotated separately from the coding region start.
    -- upstream2000.zip: Same as upstream1000, but with 2000 bases.
    -- upstream5000.zip: Same as upstream1000, but with 5000 bases.
    -- xenoMrna.zip: All GenBank mRNAs from species other than the selected one.
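    The difference between the lower-case (soft-masked) repeats in chromFa.zip and the N-masked repeats in chromFaMasked.zip can be illustrated with a short Python sketch. It is not the tool UCSC uses to build these files; the function name hard_mask and the toy sequence are hypothetical, and the sketch only assumes that repeats are marked in lower case as described above.

```python
def hard_mask(fasta_text: str) -> str:
    """Replace lower-case (soft-masked) bases with N, leaving header lines untouched."""
    masked_lines = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            # FASTA header line: keep exactly as written.
            masked_lines.append(line)
        else:
            # Sequence line: lower-case bases mark repeats, so turn them into N.
            masked_lines.append("".join("N" if base.islower() else base for base in line))
    return "\n".join(masked_lines)


# Toy record (hypothetical data, not taken from any assembly).
example = ">chrTest\nACGTacgtACGT\nttttGGGG"
print(hard_mask(example))
```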

    Chromosomes contains the assembled sequence for the genome in separate files for each chromosome in a zipped fasta format. The main assembly can be found in the chrN.fa files, where N is the name of the chromosome. The chrN_random.fa files contain clones that are not yet finished or cannot be placed with certainty at a specific place on the chromosome. In some cases, the chrN_random.fa files also contain haplotypes that differ from the main assembly.

    Database contains all of the positional and non-positional tables in the genome annotation database. Each table is represented by 2 files:

    -- .sql file: the MySQL commands used to create the table.
    -- .txt.gz file: the MySQL database table data in tab-delimited format and compressed with gzip.

    Schema descriptions for all tables in the genome annotation database may be viewed by using the "describe table schema" button in the Table Browser.
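    As a rough illustration of how a .txt.gz table dump might be read, the following Python sketch iterates over the rows of a gzip-compressed, tab-delimited file. The function name read_table_dump is hypothetical, the column meanings must be taken from the matching .sql file, and the sketch makes no attempt to decode MySQL backslash escapes.

```python
import csv
import gzip


def read_table_dump(path):
    """Yield each row of a gzip-compressed, tab-delimited table dump as a list of strings."""
    # "rt" opens the compressed file in text mode; newline="" lets csv handle line endings.
    with gzip.open(path, "rt", newline="") as handle:
        # QUOTE_NONE: treat quote characters as ordinary data, since the dump is plain tab-delimited.
        for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
            yield row
```

    A caller would loop over read_table_dump("someTable.txt.gz"), where the file name stands for whichever table was downloaded.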

    Cross-species alignments directories, such as the vsMm4 and humorMm3Rn3 directories in the hg16 assembly, contain pairwise and multiple species alignments and filtered alignment files used to produce cross-species annotations. For more information, refer to the READMEs in these directories and the description of the Multiple Alignment Format (MAF).

    Creating custom annotation tracks

    The Genome Browser provides dozens of aligned annotation tracks that have been computed at UCSC or have been provided by outside collaborators. In addition to these standard tracks, it is also possible for users to upload their own annotation data for temporary display in the browser. These custom annotation tracks are viewable only on the machine from which they were uploaded and are automatically discarded 48 hours after the last time they are accessed, unless they are saved in a Session. Optionally, users can make custom annotations viewable by others as well.

    Custom tracks are a wonderful tool for research scientists using the Genome Browser. Because space is limited in the Genome Browser track window, many excellent genome-wide tracks cannot be included in the standard set of tracks packaged with the browser. Other tracks of interest may be excluded from distribution because the annotation track data is too specific to be of general interest or can't be shared until journal publication. Many individuals and labs have contributed custom tracks to the Genome Browser website for use by others. To view a list of these custom annotation tracks, click the Custom Tracks link on the Genome Browser home page.

    Custom annotation tracks are similar to standard tracks, but never become part of the MySQL genome database. Each track has its own controller and persists even when not displayed in the Genome Browser window, e.g. if the position changes to a range that no longer includes the track. Typically, custom annotation tracks are aligned under corresponding genomic sequence, but they can also be completely unrelated to the data. For example, a track can be displayed under a long sequence consisting of millions of Ns.

    Genome Browser annotation tracks are based on files in line-oriented format. Each line in the file defines a display characteristic for the track or defines a data item within the track. Annotation files contain three types of lines: browser lines, track lines, and data lines. Empty lines and those starting with "#" are ignored.
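    The three line types can be told apart by their first word. The Python sketch below is only an illustration of that rule, not the Genome Browser's own parser; the function name classify_line is hypothetical, and stripping leading whitespace before testing for "#" is an assumption.

```python
def classify_line(line: str) -> str:
    """Return 'ignored', 'browser', 'track', or 'data' for one line of an annotation file."""
    stripped = line.strip()
    # Empty lines and comment lines carry no information.
    if not stripped or stripped.startswith("#"):
        return "ignored"
    first_word = stripped.split(None, 1)[0]
    if first_word == "browser":
        return "browser"
    if first_word == "track":
        return "track"
    # Anything else is treated as a data line in whatever format the track uses.
    return "data"


for text in ["browser position chr22:1-20000", "track name=coords", "chr22 20100000 20100100", "# note"]:
    print(classify_line(text), "<-", text)
```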

    To construct an annotation file and display it in the Genome Browser, follow these steps:

    Step 1. Format the data set
    Formulate your data set as a tab-separated file using one of the formats supported by the Genome Browser. Annotation data can be in standard GFF format or in a format designed specifically for the Human Genome Project or UCSC Genome Browser, including bedGraph, GTF, PSL, BED, bigBed, WIG, bigWig, BAM, VCF, MAF, BED detail, Personal Genome SNP, broadPeak, narrowPeak, and microarray (BED15). GFF and GTF files must be tab-delimited rather than space-delimited to display correctly. Chromosome references must be of the form chrN (the parsing of chromosome names is case-sensitive). You may include more than one data set in your annotation file; these need not be in the same format.
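    Because chromosome names are parsed case-sensitively, a quick check before uploading can catch names such as "Chr22" or "22". The following Python sketch is a loose approximation only: it flags any data line whose first field does not begin with lower-case "chr" and does not verify that the name exists in the chosen assembly. The function name is hypothetical.

```python
def suspicious_chrom_names(data_lines):
    """Return the first field of each data line that does not start with lower-case 'chr'."""
    flagged = []
    for line in data_lines:
        fields = line.split()
        # Skip blank lines; compare case-sensitively, as the Browser does.
        if fields and not fields[0].startswith("chr"):
            flagged.append(fields[0])
    return flagged


print(suspicious_chrom_names(["chr22 1000 5000", "Chr22 1000 5000", "22 1000 5000"]))
```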

    Step 2. Define the Genome Browser display characteristics
    Add one or more optional browser lines to the beginning of your formatted data file to configure the overall display of the Genome Browser when it initially shows your annotation data. Browser lines allow you to configure such things as the genome position that the Genome Browser will initially open to, the width of the display, and the configuration of the other annotation tracks that are shown (or hidden) in the initial display. NOTE: If the browser position is not explicitly set in the annotation file, the initial display will default to the position setting most recently used by the user, which may not be an appropriate position for viewing the annotation track.

    Step 3. Define the annotation track display characteristics
    Following the browser lines--and immediately preceding the formatted data--add a track line to define the display attributes for your annotation data set. Track lines enable you to define annotation track characteristics such as the name, description, colors, initial display mode, use score, etc. The track type= attribute is required for some tracks. If you have included more than one data set in your annotation file, insert a track line at the beginning of each new set of data.

    Example 1:
    Here is an example of a simple annotation file that contains a list of chromosome coordinates.

    browser position chr22:20100000-20100900
    track name=coords description="Chromosome coordinates list" visibility=2
    chr22 20100000 20100100
    chr22 20100011 20100200
    chr22 20100215 20100400
    chr22 20100350 20100500
    chr22 20100700 20100800
    chr22 20100700 20100900

    Click here to view this track in the Genome Browser.

    Example 2:
    Here is an example of an annotation file that defines 2 separate annotation tracks in BED format. The first track displays blue one-base tick marks every 10000 bases on chr 22. The second track displays red 100-base features alternating with blank space in the same region of chr 22.

    browser position chr22:20100000-20140000
    track name=spacer description="Blue ticks every 10000 bases" color=0,0,255,
    chr22 20100000 20100001
    chr22 20110000 20110001
    chr22 20120000 20120001
    track name=even description="Red ticks every 100 bases, skip 100" color=255,0,0
    chr22 20100000 20100100 first
    chr22 20100200 20100300 second
    chr22 20100400 20100500 third

    Click here to view this track in the Genome Browser.
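    Regularly spaced features like the tick marks in Example 2 are easy to generate with a script rather than by hand. The Python sketch below builds one track line followed by one-base BED features at a fixed step. The function name tick_track and its parameters are hypothetical, and the fields are separated by single spaces to match the example above.

```python
def tick_track(chrom, start, end, step, name, description, color):
    """Build a track line plus one-base BED features every `step` bases in [start, end)."""
    lines = [f'track name={name} description="{description}" color={color}']
    for pos in range(start, end, step):
        # A one-base feature spans pos to pos + 1.
        lines.append(f"{chrom} {pos} {pos + 1}")
    return "\n".join(lines)


print(tick_track("chr22", 20100000, 20130000, 10000,
                 "spacer", "Blue ticks every 10000 bases", "0,0,255"))
```

    With these arguments the data lines match the three blue tick marks listed in Example 2.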

    Example 3A:
    This example shows an annotation file containing one data set in BED format. The track displays features with multiple blocks, a thick end and thin end, and hatch marks indicating the direction of transcription. The track labels display in green (0,128,0), and the gray level of each feature reflects the score value of that line. NOTE: The track name line in this example has been split over 2 lines for documentation purposes. If you paste this example into the Genome Browser, you must remove the line break to display the track successfully. Click here for a copy of this example that can be pasted into the browser without editing.

    browser position chr22:1000-10000
    browser hide all
    track name="BED track" description="BED format custom track example" visibility=2
    color=0,128,0 useScore=1
    chr22 1000 5000 itemA 960 + 1100 4700 0 2 1567,1488, 0,2512
    chr22 2000 7000 itemB 200 - 2200 6950 0 4 433,100,550,1500 0,500,2000,3500

    Click here to view this track in the Genome Browser.
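    The multi-block features in Example 3A use the last three BED fields (block count, block sizes, and block starts relative to the feature start). As a sketch of how those fields translate into genomic coordinates, the following Python function expands one 12-field BED line into a list of (start, end) pairs. The function name bed_blocks is hypothetical; the trailing comma in a size list, as seen in the itemA line, is tolerated.

```python
def bed_blocks(line: str):
    """Expand a 12-field BED line into absolute (start, end) coordinates for each block."""
    fields = line.split()
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # Comma-separated lists may end with a trailing comma, so drop empty pieces.
    sizes = [int(x) for x in fields[10].split(",") if x]
    starts = [int(x) for x in fields[11].split(",") if x]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("block count does not match the block size/start lists")
    # Block starts are offsets from the feature start.
    return [(chrom_start + offset, chrom_start + offset + size)
            for offset, size in zip(starts, sizes)]


print(bed_blocks("chr22 1000 5000 itemA 960 + 1100 4700 0 2 1567,1488, 0,2512"))
```

    For itemA, the second block ends at 5000, which agrees with the feature end given in the third field.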

    Example 3B:
    This example shows a simple annotation file containing one data set in the bigBed format. This track displays random sized blocks across chr21 in the human genome. The big data formats, such as the bigBed format, can be uploaded using a bigDataUrl that is specified in the track line. For more information on these track line parameters, refer to the Track Lines section.

    You may paste these two lines directly into the "Add Custom Tracks" page to view this example in the browser:

    browser position chr21:33,031,597-33,041,570

    track type=bigBed name="bigBed Example One" description="A bigBed file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb

    Alternatively, you may also upload just the URL of the bigBed file:

    http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb

    This will infer the track type as "bigBed" based on the file extension and set the track name to "bigBedExample".
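    The inference described above (type from the file extension, name from the file name) can be mimicked in a few lines of Python. This is a sketch of the idea, not the Genome Browser's implementation: the function name infer_track is hypothetical, and the lookup table deliberately contains only the ".bb" case documented here.

```python
import posixpath
from urllib.parse import urlparse

# Illustrative lookup table; only the ".bb" -> "bigBed" case comes from the text above.
EXTENSION_TO_TYPE = {".bb": "bigBed"}


def infer_track(url):
    """Return (track_type, track_name) guessed from the file name in a URL."""
    path = urlparse(url).path
    stem, extension = posixpath.splitext(posixpath.basename(path))
    # Unknown extensions yield None for the type.
    return EXTENSION_TO_TYPE.get(extension), stem


print(infer_track("http://genome.ucsc.edu/goldenPath/help/examples/bigBedExample.bb"))
```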

    Step 4. Display your annotation track in the Genome Browser
    To view your annotation data in the Genome Browser, open the Genome Browser home page and click the Genome Browser link in the top menu bar. On the Gateway page that displays, select the genome and assembly on which your annotation data is based, then click the "add custom tracks" button. (Note: if the Gateway displays the "manage custom tracks" button instead, see Displaying and Managing Custom Tracks for information on how to display your track.)

    On the Add Custom Tracks page, load the annotation track data or URL for your custom track into the upper text box and the track documentation (optional) into the lower text box, then click the Submit button. Tracks may be loaded by entering text, a URL, or a pathname on your local computer. The track type= attribute is required for some tracks. For more information on these methods, as well as information on creating and adding track documentation, see Loading a Custom Track into the Genome Browser.

    If you encounter difficulties displaying your annotation, read the section Troubleshooting Annotation Display Problems.

    Step 5. (Optional) Add details pages for individual track features
    After you've constructed your track and have successfully displayed it in the Genome Browser, you may wish to customize the details pages for individual track features. The Genome Browser automatically creates a default details page for each feature in the track containing the feature's name, position information, and a link to the corresponding DNA sequence. To view the details page for a feature in your custom annotation track (in full, pack, or squish display mode), click on the item's label in the annotation track window.

    You can add a link from a details page to an external web page containing additional information about the feature by using the track line url attribute. In the annotation file, set the url attribute in the track line to point to a publicly available page on a web server. The url attribute substitutes each occurrence of '$$' in the URL string with the name defined by the name attribute. You can take advantage of this feature to provide individualized information for each feature in your track by creating HTML anchors that correspond to the feature names in your web page.

    Example 4:
    Here is an example of a file in which the url attribute has been set to point to the file http://genome.ucsc.edu/goldenPath/help/clones.html. The '#$$' appended to the end of the file name in the example points to the HTML NAME tag within the file that matches the name of the feature (cloneA, cloneB, etc.). NOTE: The track line in this example has been split over 2 lines for documentation purposes. If you paste this example into the browser, you must remove the line break to display the track successfully. Click here for a copy of this example that can be pasted into the browser without editing.

    browser position chr22:10000000-10020000
    browser hide all
    track name=clones description="Clones" visibility=2
    color=0,128,0 useScore=1 url="http://genome.ucsc.edu/goldenPath/help/clones.html#$$"
    chr22 10000000 10004000 cloneA 960
    chr22 10002000 10006000 cloneB 200
    chr22 10005000 10009000 cloneC 700
    chr22 10006000 10010000 cloneD 600
    chr22 10011000 10015000 cloneE 300
    chr22 10012000 10017000 cloneF 100

    Click here to display this track in the Genome Browser.
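    The '$$' substitution used by the url attribute amounts to a plain string replacement. The Python sketch below shows the resulting link for each feature name in Example 4; the function name details_url is hypothetical and the code only imitates the substitution described in Step 5.

```python
def details_url(url_template, item_name):
    """Replace every '$$' in the track line url value with the feature's name."""
    return url_template.replace("$$", item_name)


template = "http://genome.ucsc.edu/goldenPath/help/clones.html#$$"
for name in ["cloneA", "cloneB", "cloneC"]:
    print(details_url(template, name))
```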

    Step 6. (Optional) Share your annotation track with others
    The previous steps showed you how to upload annotation data for your own use on your own machine. However, many users would like to share their annotation data with members of their research group on different machines or with colleagues at other sites. To learn how to make your Genome Browser annotation track viewable by others, read the section Sharing Your Annotation Track with Others.

    Loading a Custom Track into the Genome Browser

    Using the Genome Browser's custom track upload and management utility, annotation tracks may be added for display in the Genome Browser, deleted from the Genome Browser, or updated with new data and/or display options. You may also use this interface to upload and manage custom track sets for multiple genome assemblies.

    To load a custom track into the Genome Browser:

    Step 1. Open the Add Custom Tracks page
    Click the "add custom tracks" button on the Genome Browser Gateway page. (Note: if one or more tracks have already been uploaded during the current Browser session, additional tracks may be loaded on the Manage Custom Tracks page. In this case, the button on the Gateway page will be labeled "manage custom tracks" and will automatically direct you to the track management page. See Displaying and Managing Custom Tracks for more information.)

    Step 2. Load the custom track data
    The Add Custom Tracks page contains separate sections for uploading custom track data and optional custom track descriptive documentation. Load the annotation data into the upper section by one of the following methods:

    -- Enter one or more URLs for custom tracks (one per line) in the data text box. The Genome Browser supports both the HTTP and FTP (passive-only) protocols. Data provided by a URL may need to be preceded by a separate line defining the type= attribute required for some tracks, for example "track type=broadPeak".
    -- Click the "Browse" button directly above the data text box, then choose a custom track file from your local computer, or type the pathname of the file into the "upload" text box adjacent to the "Browse" button. The custom track data may be compressed by any of the following programs: gzip (.gz), compress (.Z), or bzip2 (.bz2). Files containing compressed data must include the appropriate suffix in their names.
    -- Paste or type the custom track data directly into the data box. Because the text in this box will not be saved to a file, this method is not recommended unless you have a copy of the data elsewhere.
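    Since compressed uploads must carry the matching suffix, one way to prepare a file is to compress it with a script that appends the suffix itself. The Python sketch below does this for gzip using only the standard library; the function name gzip_track_file and the file name in the usage note are hypothetical.

```python
import gzip
import shutil


def gzip_track_file(path):
    """Compress a track file with gzip and return the new path, which ends in '.gz'."""
    gz_path = path + ".gz"
    # Copy the bytes through unchanged so the track content is preserved exactly.
    with open(path, "rb") as source, gzip.open(gz_path, "wb") as target:
        shutil.copyfileobj(source, target)
    return gz_path
```

    Calling gzip_track_file("myTracks.bed") would write myTracks.bed.gz next to the original file, which is left in place.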

    Multiple custom tracks may be uploaded at one time on the Add Custom Tracks page through one of the following methods:

    -- Put all the tracks into the same file (rather than separate files), then load the file via the Browse button.
    -- Place your track files in a web-accessible location on your server, then load them into the Genome Browser by pasting their URLs into the data box.

    NOTE: Please limit the number of custom tracks that you upload and maintain to fewer than 1000 tracks. If you have more than this suggested limit of 1000 tracks, please consider setting up a track hub instead.

    Step 3. (Optional) Load the custom track description page
    If desired, you can provide optional descriptive text (in plain or HTML format) to accompany your custom track. This text will be displayed when a user clicks the track's description button on the Genome Browser annotation tracks page. Descriptive text may be loaded by one of the following methods:

    -- Click the "Browse" button directly above the documentation text box, then choose a text file from your local computer, or type the pathname of the file into the "upload" text box adjacent to the "Browse" button.
    -- Paste or type the descriptive text directly into the documentation text box. Note that the text in this box will not be saved to a file; therefore, this method is not recommended except for temporary documentation purposes.
    -- If your descriptive text is located on a website, you can reference it from your custom track file by defining the track line attribute "htmlUrl": htmlUrl=. In this case, there is no need to insert anything into the documentation text box.

    To format your description page in a style that is consistent with standard Genome Browser tracks, click the template link below the documentation text box for an HTML template that may be copied and pasted into a file for editing.

    If you load multiple custom tracks simultaneously using one of the methods described in Step 2, a track description can be associated only with the last custom track loaded, unless you upload the descriptive text using the track line "htmlUrl" attribute described above.

    Step 4. Upload the track
    Click the Submit button to load your custom track data and documentation into the Genome Browser. If the track uploads successfully, you will be directed to the custom track management page where you can display your track, update an uploaded track, add more tracks, or delete uploaded tracks. If the Genome Browser encounters a problem while loading your track, it will display an error. See the section Troubleshooting Annotation Display Problems for help in diagnosing custom track problems.


    Displaying and Managing Custom Tracks

    After a custom track has been successfully loaded into the Genome Browser, you can display it -- as well as manage your entire custom track set -- via the options on the Manage Custom Tracks page. This page automatically displays when a track has been uploaded into the Genome Browser (see Loading a Custom Track into the Genome Browser). Alternatively, you can access the track management page by clicking the "manage custom tracks" button on the Gateway or Genome Browser annotation tracks pages. (Note that the track management page is available only if at least one track has been loaded during the current browser session; otherwise, this button is labeled "add custom tracks" and opens the Add Custom Tracks page.)

    The table on the Manage Custom Tracks page shows the current set of uploaded custom tracks for the genome and assembly specified at the top of the page. If tracks have been loaded for more than one genome assembly, pulldown lists are displayed; to view the uploaded tracks for a different assembly, select the desired genome and assembly option from the lists.

    The following track information is displayed in the Manage Custom Tracks table:

    -- Name: a hyperlink to the Update Custom Track page where you can update your track configuration and data.
    -- Description: the value of the "description" attribute from the track line, if present. If no description is included in the input file, this field contains the track name.
    -- Type: the track type, determined by the Browser based on the format of the data.
    -- Doc: displays "Y" (Yes) if a description page has been uploaded for the track; otherwise the field is blank.
    -- Items: the number of data items in the custom track file. An item count is not displayed for tracks lacking individual items (e.g. wiggle format data).
    -- Pos: the default chromosomal position defined by the track file in either the browser line "position" attribute or the first data line. Click this link to open the Genome Browser or Table Browser at the specified position (Note: only the chromosome name is shown in this column). The Pos column remains blank if the track lacks individual items (e.g. wiggle format data) and the browser line "position" attribute hasn't been set.

    Displaying a custom track in the Genome Browser
    Click the "go to genome browser" button to display the entire custom track set for the specified genome assembly in the Genome Browser. By default, the browser will open to the position specified in the browser line "position" attribute or first data line of the first custom track in the table, or the last-accessed Genome Browser position if the track is in wiggle data format. To open the display at the default position for another track in the list, click the track's position link in the Pos column.

    Viewing a custom track in the Table Browser
    Click the "go to table browser" button to access the data for the custom track set in the Table Browser. The custom tracks will be listed in the "Custom Tracks" group pulldown list.

    Loading additional custom tracks
    To load a new custom track into the currently displayed track set, click the "add custom tracks" button. To change the genome assembly to which the track should be added, select the appropriate options from the pulldown lists at the top of the page. For instructions on adding a custom track on the Add Custom Tracks page, see Loading a Custom Track into the Genome Browser.

    Removing one or more custom tracks
    To remove custom tracks from the uploaded track set, click the checkboxes in the "delete" column for all tracks you wish to remove, then click the "delete" button. A custom track may also be removed by clicking the "Remove custom track" button on the track's description page. Note: removing the track from the Genome Browser does not delete the track file from your server or local disk.

    Updating a custom track
    To update the stored information for a loaded custom track, click the track's link in the "Name" column in the Manage Custom Tracks table. A custom track may also be updated by clicking the "Update custom track" button on the track's description page.

    The Update Custom Track page provides sections for modifying the track configuration information (the browser lines and track lines), the annotation data, and the descriptive documentation that accompanies the track. Existing track configuration lines are displayed in the top "Edit configuration" text box. In the current implementation of this utility, the existing annotation data is not displayed. Because of this, the data cannot be incrementally edited through this interface, but instead must be fully replaced using one of the data entry methods described in Loading a Custom Track into the Genome Browser. If description text has been uploaded for the track, it will be displayed in the track documentation edit box, where it may be edited or completely replaced. Once you have completed your updates, click the Submit button to upload the new data into the Genome Browser.

    If the data or description text for your custom track was originally loaded from a file on your hard disk or server, you should first edit the file, then reload it from the Update Custom Track page using the "Browse" button. Note that edits made on this page to description text uploaded from a file will not be saved to the original file on your computer or server. Because of this, we recommend that you use the documentation edit box only for changes made to text that was typed or pasted in.

    Browser Lines

    Browser lines configure the overall display of the Genome Browser window when your annotation file is uploaded. Each line defines one display attribute. Browser lines have the format:

    browser attribute_name attribute_value(s)

    For example, if the browser line browser position chr22:1-20000 is included in the annotation file, the Genome Browser window will initially display the first 20000 bases of chr 22.
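    To make the structure of a position browser line concrete, the following Python sketch splits one into a chromosome name and integer start and end coordinates. It is an illustration only, not the Genome Browser's parser: the function name and the regular expression are assumptions, and thousands separators are accepted because Example 3B writes its position with commas.

```python
import re

# Assumed pattern: "browser position <chrom>:<start>-<end>", digits optionally grouped by commas.
POSITION_RE = re.compile(r"^browser\s+position\s+(\w+):([\d,]+)-([\d,]+)$")


def parse_browser_position(line):
    """Return (chrom, start, end) from a 'browser position' line."""
    match = POSITION_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a browser position line: {line!r}")
    chrom, start, end = match.groups()
    # Strip thousands separators before converting to integers.
    return chrom, int(start.replace(",", "")), int(end.replace(",", ""))


print(parse_browser_position("browser position chr22:1-20000"))
```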

    The following browser line attribute name/value options are available. The value track_name must be set to the name of the primary table on which the track is based. To identify this table, open up the Table Browser, select the correct genome assembly, then select the track name from the track list. The table list will show the primary table. Alternatively, the primary table name can be obtained from a mouseover on the track name in the track control section.

    Note that composite track subtracks are not valid track_name values. To find the symbolic name of a composite track, look in the tableName field in the trackDb table, or mouseover the track name in the track control section. It is not possible to display only a subset of the subtracks at this time.

    position - Determines the part of the genome that the Genome Browser will initially open to, in chromosome:start-end format.
    hide all - Hides all annotation tracks except for those listed in the custom