STARmanual_2.1.4

  • 7/30/2019 STARmanual_2.1.4

    STAR manual (2.1.4)

    Alex Dobin ([email protected])

    Oct 17, 2012

    1: Installation
    2: Before running STAR
    3: Generating genomes
    4: Generating genomes with a splice junctions database
    5: Running mapping jobs
    6: Crucial input parameters
    7: Loading genome into shared memory
    8: Output
    9: All input parameters


    1: Installation

    Unzip/untar the STAR_x.x.x.tgz file into a directory of your choice <STARsource>, cd to <STARsource> and run make. The source code will be compiled and the STAR executable will be generated. A standard GNU C++ distribution is required for compilation. The pre-compiled executable STAR, included in the source directory, should work on any x86_64 Linux machine.

    The compilation of STAR was tested on the following Amazon EC2 servers; the following commands were used to install the necessary packages:

    1.1: Ubuntu 12.04.1 LTS

    sudo apt-get update
    sudo apt-get install g++
    sudo apt-get install make

    1.2: Red Hat Enterprise Linux 6.3, CentOS 6.2

    sudo yum update
    sudo yum install make
    sudo yum install gcc-c++
    sudo yum install glibc-static

    1.3: SUSE Linux Enterprise Server 11

    sudo zypper update
    sudo zypper in gcc gcc-c++

    2: Before running STAR

    You need the following items before you can run STAR:

    1. STAR executable: see Installation.
    2. Genome files: download standard genomes from the STAR web-site, or generate your own genomes using STAR (see the Generating genomes section).
    3. Sequencing files: standard .fastq files (e.g. Illumina .fastq output files) or .fasta files.

    Each STAR run should be made from a fresh working directory. All the output files are stored in the working directory. The output files will be overwritten without a warning every time you run STAR.


    3: Generating genomes

    From a fresh directory, run

    /pathToStarDir/STAR --runMode genomeGenerate --genomeDir /path/to/GenomeDir --genomeFastaFiles /path/to/genome/fasta1 /path/to/genome/fasta2 --runThreadN <n>

    Reference sequences (henceforth called chromosomes) will be collected from all the specified .fasta files. The generated genome files comprise the binary genome sequence, suffix arrays, and text files with chromosome names and lengths. You can rename the chromosomes in chrName.txt, keeping the order of the chromosomes in the file: the names from this file will be used in all output alignment files (such as .sam). Tabs are not allowed in chromosome names, and spaces are not recommended. For each genome these files are stored in a separate directory /path/to/GenomeDir that has to be supplied as a parameter to the alignment runs. You can use multiple threads by specifying --runThreadN <n>, where <n> is the number of threads.
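The genome generation command can also be assembled programmatically. The following is a minimal Python sketch, not part of STAR: the helper name genome_generate_command and the example paths are illustrative assumptions, and only the options named in this section are used.

```python
import shlex


def genome_generate_command(star_path, genome_dir, fasta_files, threads=1):
    """Build the argument list for a genome generation run.

    Only options described in this section are used. The function name
    and its signature are assumptions made for illustration.
    """
    if threads < 1:
        raise ValueError("runThreadN must be a positive integer")
    if not fasta_files:
        raise ValueError("at least one .fasta file is required")
    return [
        star_path,
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", *fasta_files,  # several values, separated by spaces
        "--runThreadN", str(threads),
    ]


if __name__ == "__main__":
    cmd = genome_generate_command(
        "/pathToStarDir/STAR",
        "/path/to/GenomeDir",
        ["/path/to/genome/fasta1", "/path/to/genome/fasta2"],
        threads=4,
    )
    # Print a shell-safe command line; pass `cmd` to subprocess.run to execute it.
    print(" ".join(shlex.quote(arg) for arg in cmd))
```

Building the command as a list rather than a single string avoids quoting problems when paths contain spaces.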

    4: Generating genomes with a splice junctions database

    STAR can use a splice junctions database to improve accuracy of the mapping. The splice junctions loci

    have to be supplied at the genome/suffix array generation step. The following two parameters have to be

    specified at the genome generation step:

    --sjdbFileChrStartEnd : the file with annotated intron loci in a three-column format:

    Chr <tab> Start <tab> End

    where Start and End are the first and last bases of the introns (1-based chromosome coordinates).

    --sjdbOverhang : the length of the "overhang" on each side of a splice junction. Ideally it is equal to (MateLength - 1).

    At the mapping stage, --sjdbScore (=1 by default) provides an extra alignment score for alignments that cross database junctions. If this score is positive, it will bias the alignment toward annotated junctions.
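As an illustration of the three-column junction file and the recommended overhang, here is a small Python sketch. The function names and the example intron coordinates are hypothetical; only the file layout (Chr, Start, End separated by tabs, 1-based) and the MateLength - 1 rule come from the text above.

```python
import io


def write_sjdb_file(introns, handle):
    """Write (chromosome, start, end) introns as tab-separated lines.

    Start and end are the first and last intron bases, 1-based.
    """
    for chrom, start, end in introns:
        if "\t" in chrom:
            raise ValueError("tabs are not allowed in chromosome names")
        if not 1 <= start <= end:
            raise ValueError(f"invalid intron coordinates: {start}-{end}")
        handle.write(f"{chrom}\t{start}\t{end}\n")


def ideal_sjdb_overhang(mate_length):
    """Recommended --sjdbOverhang value for a given mate length."""
    return mate_length - 1


if __name__ == "__main__":
    buffer = io.StringIO()
    # Made-up coordinates, for illustration only.
    write_sjdb_file([("chr1", 12228, 12612), ("chr1", 12722, 13220)], buffer)
    print(buffer.getvalue(), end="")
    print("sjdbOverhang for 2x101 reads:", ideal_sjdb_overhang(101))
```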

    5: Running mapping jobs

    From a fresh working directory run:

    /pathToStarDir/STAR --genomeDir /path/to/GenomeDir --readFilesIn /path/to/read1 [/path/to/read2] --runThreadN <n> --<parameterName> <value(s)>

    All input parameters are entered as --<parameterName> <value(s)>. If a parameter allows several values, the values are separated by spaces. For advanced usage, parameters can also be input from the config files Parameters1.in and Parameters2.in in the working directory. In these files each parameter with its values occupies one line, without the preceding dashes (--). There are four levels of input parameters: Default, Parameters1.in, Parameters2.in, and Command line; each level overrides the previous ones.
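The four-level override order can be pictured with a short Python sketch. This is not STAR's own parser: the function names are assumptions, and the parsing only covers the layout described above (one parameter per line, name first, values separated by whitespace, no leading dashes).

```python
def parse_parameter_lines(text):
    """Parse config-file style lines: name followed by one or more values."""
    params = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip empty lines
        params[fields[0]] = fields[1:]
    return params


def resolve_parameters(defaults, parameters1, parameters2, command_line):
    """Merge the four levels; each later level overrides the earlier ones."""
    resolved = {}
    for level in (defaults, parameters1, parameters2, command_line):
        resolved.update(level)
    return resolved


if __name__ == "__main__":
    defaults = {"runThreadN": ["1"], "outFilterMultimapNmax": ["10"]}
    parameters1 = parse_parameter_lines("runThreadN 4\n")
    parameters2 = parse_parameter_lines("outFilterMultimapNmax 20\n")
    command_line = {"runThreadN": ["8"]}
    print(resolve_parameters(defaults, parameters1, parameters2, command_line))
```

In this example the command line value of runThreadN wins over Parameters1.in, while outFilterMultimapNmax is taken from Parameters2.in because no later level sets it.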


    6: Crucial input parameters

    All input parameters and their default values are briefly described in the parametersDefault file in the STAR source directory. The default parameters are typical for mapping 2x76 or 2x101 Illumina reads to the human genome.

    The following parameters are crucial for the STAR mapping runs:

    PARAMETER NAME   DEFAULT VALUE

    genomeDir   GenomeDir/
        string: path to the directory where genome files are stored

    genomeLoad   LoadAndKeep
        mode of shared memory usage for the genome files (see Loading genome into shared memory)

    readFilesIn   Read1 Read2
        string(s): paths to .fastq or .fasta files that contain input read1 (and, if needed, read2)

    runThreadN   1
        int>0: the number of threads to use

    The following parameters control the filtering of the output alignments.

    To filter by mapped length and alignment score:

    outFilterScoreMin   0
        int: alignment will be output only if its score is higher than this value

    outFilterScoreMinOverLread   0.66
        float: outFilterScoreMin normalized to read length (sum of mates' lengths for paired-end reads)

    outFilterMatchNmin   0
        int: alignment will be output only if the number of matched bases is higher than this value

    outFilterMatchNminOverLread   0.66
        float: outFilterMatchNmin normalized to read length (sum of mates' lengths for paired-end reads)

    To control the output of multi-mappers, use the following parameters:


    outFilterMultimapScoreRange   1
        int: the score range below the maximum score for multimapping alignments

    outFilterMultimapNmax   10
        int: read alignments will be output only if the read maps to fewer loci than this value, otherwise no alignments will be output

    To control the number of mismatches per read pair, use:

    outFilterMismatchNmax   10
        int: alignment will be output only if it has fewer mismatches than this value

    outFilterMismatchNoverLmax   0.3
        float: alignment will be output only if its ratio of mismatches to mapped length is less than this value

    To filter out alignments containing non-canonical junctions use:

    outFilterIntronMotifs   None
        string: filter alignments using their motifs
            None : no filtering
            KeepCanonical : filter out alignments that contain non-canonical or inconsistent canonical junctions

    To set maximum intron size:

    --alignIntronMax is the maximum intron size for a splice that occurs inside each mate, i.e. the one recorded as xxxN in the .sam CIGAR.

    --alignMatesGapMax is the maximum genomic gap between the mates, i.e. the inner distance between the right end of the left mate and the left end of the right mate. Note that this gap is not truly an intron, and it could contain several introns.

    If the (insert-2*mate) of the library is not very large (< typical exon size), you can assume that there is no more than one intron in the unsequenced RNA between the mates, and so you can use alignMatesGapMax=alignIntronMax. However, for longer reads it might be necessary to use alignMatesGapMax>alignIntronMax to allow more than one intron in the genomic gap between the mates.

    7: Loading genome into shared memory

    The --genomeLoad parameter controls how the genome is loaded into memory. If genomeLoad=LoadAndKeep, STAR loads the genome as a standard Linux shared memory piece. Before loading the genome, STAR will check if the genome has already been loaded into the shared memory. The genomes are identified by their unique directory paths. If the genome has not been loaded, the STAR job will load it and will keep it in memory even after the STAR job itself finishes. The genome will be shared with all the other STAR instances. You can remove the genome from the shared memory by running STAR with genomeLoad=Remove. The shared memory piece will be physically removed only after all STAR jobs attached to it complete. If genomeLoad=LoadAndRemove, STAR will load the genome into the shared memory and mark it for removal, so that the genome will be removed from the shared memory once all STAR jobs using it exit. If genomeLoad=LoadAndExit, STAR will load the genome into the shared memory and immediately exit without performing any alignment, keeping the genome loaded in the shared memory for future runs.


    If you need to check or remove shared memory pieces manually, use the standard Linux commands ipcs and ipcrm. If the genome residing in shared memory is not used for a long time it may get paged out of RAM, which will slow down STAR runs considerably. It is strongly recommended to regularly re-load (i.e. remove and load again) the shared memory genomes.

    If genomeLoad=NoSharedMemory, shared memory is not used. This option is recommended if the shared memory is not configured properly on your server.

    Many standard Linux distributions do not allow large enough shared memory blocks. You can fix this issue if you have root privileges, or ask your system administrator to do it. To enable the shared memory, modify or add the following lines in /etc/sysctl.conf:

    kernel.shmmax = 31000000000
    kernel.shmall = 31000000000

    Then run:

    /sbin/sysctl -p

    This will increase the allowed shared memory blocks to ~31GB, enough for human or mouse genome.
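Before attempting a shared memory run, it can help to check whether the configured limit is large enough. The Python sketch below parses sysctl.conf-style text and compares kernel.shmmax with a required size; the function names are assumptions, and the 31000000000 figure is simply the value quoted above.

```python
def read_sysctl_settings(text):
    """Parse 'key = value' lines; lines starting with '#' or ';' are comments."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def shmmax_is_sufficient(text, required_bytes):
    """Return True if kernel.shmmax is set and at least required_bytes."""
    value = read_sysctl_settings(text).get("kernel.shmmax")
    return value is not None and int(value) >= required_bytes


if __name__ == "__main__":
    example = "# shared memory for STAR\nkernel.shmmax = 31000000000\nkernel.shmall = 31000000000\n"
    print(shmmax_is_sufficient(example, 31_000_000_000))
```

To check a real system, the text of /etc/sysctl.conf can be read with open() and passed to shmmax_is_sufficient; note that this only reflects the file, not necessarily the values currently active in the kernel.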

    8: Output

    STAR will produce the following output files:

    Log.out - log file with some information about the run recorded.

    Log.progress.out - job progress with the number of processed reads, % of mapped reads etc., updated every ~1 minute.

    Log.final.out - final mapping statistics.

    Aligned.out.sam - alignments in standard SAM format.

    The number of loci Nmap a read maps to (multi-mapping) is given by the NH:i: field.

    The mapping quality MAPQ (column 5) is 255 for uniquely mapping reads, and int(-10*log10(1-1/Nmap)) for multi-mapping reads. This scheme is the same as the one used by Tophat and is compatible with Cufflinks.
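A small Python sketch of this MAPQ scheme follows. It assumes the Phred-style scaling int(-10*log10(1-1/Nmap)), which gives 3 for reads mapping to 2 loci, 1 for 3 or 4 loci, and 0 for 5 or more; the function name is illustrative.

```python
import math


def star_mapq(nmap):
    """MAPQ for a read that maps to `nmap` loci.

    Assumes the Phred-style factor of 10 in the multi-mapper formula.
    """
    if nmap < 1:
        raise ValueError("Nmap must be at least 1")
    if nmap == 1:
        return 255  # uniquely mapping read
    return int(-10 * math.log10(1 - 1 / nmap))


if __name__ == "__main__":
    for nmap in (1, 2, 3, 4, 5, 10):
        print(nmap, star_mapq(nmap))
```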

    For multi-mappers, all alignments except one are marked with 0x100 (secondary alignment) in the FLAG

    column 2. The un-marked alignment is either the best one (i.e. highest scoring), or is randomly selected from

    the alignments of equal quality.

    Reported attributes:

    Column 12: NH: number of loci a read (pair) maps to

    Column 13: HI: alignment index for all alignments of a read

    Column 14: AS: alignment score

    Column 15: nM: number of mismatches (does not include indels)
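The attributes and the secondary-alignment flag can be read from a SAM record with a few lines of Python. This is a sketch only: the function name and the example record are made up, and the tag names follow the Standard attribute set listed for outSAMattributes (NH, HI, AS, nM).

```python
def parse_star_sam_line(sam_line):
    """Extract FLAG, MAPQ and the optional TAG:TYPE:VALUE fields of one record."""
    fields = sam_line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line has at least 11 columns")
    flag = int(fields[1])
    record = {
        "name": fields[0],
        "is_secondary": bool(flag & 0x100),  # 0x100 marks non-primary multi-mapper alignments
        "mapq": int(fields[4]),
    }
    for item in fields[11:]:
        tag, value_type, value = item.split(":", 2)
        record[tag] = int(value) if value_type == "i" else value
    return record


if __name__ == "__main__":
    # Hypothetical record for a read that maps to two loci.
    example = "\t".join([
        "read1", "256", "chr1", "100", "3", "50M", "*", "0", "0",
        "A" * 50, "I" * 50, "NH:i:2", "HI:i:2", "AS:i:48", "nM:i:0",
    ])
    print(parse_star_sam_line(example))
```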


    SJ.out.tab - high confidence collapsed splice junctions in tab-delimited format:

    Column 1: chromosome

    Column 2: first base of the intron (1-based)

    Column 3: last base of the intron (1-based)

    Column 4: strand

    Column 5: intron motif: 0: non-canonical; 1: GT/AG, 2: CT/AC, 3: GC/AG, 4: CT/GC, 5: AT/AC, 6: GT/AT

    Column 6: reserved

    Column 7: number of uniquely mapping reads crossing the junction

    Column 8: reserved

    Column 9: maximum left/right overhang

    The filtering for this output file is controlled by the following parameters (see parametersDefault for description): --outSJfilterOverhangMin, --outSJfilterCountUniqueMin, --outSJfilterDistToOtherSJmin
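The nine-column layout of SJ.out.tab can be read as follows. This Python sketch uses only the column meanings and motif codes listed above; the class and function names and the example line are hypothetical, and the strand column is kept as the raw value because its encoding is not described here.

```python
from typing import NamedTuple

# Motif codes as listed for column 5.
MOTIFS = {
    0: "non-canonical", 1: "GT/AG", 2: "CT/AC", 3: "GC/AG",
    4: "CT/GC", 5: "AT/AC", 6: "GT/AT",
}


class Junction(NamedTuple):
    chromosome: str
    intron_start: int  # first base of the intron, 1-based
    intron_end: int    # last base of the intron, 1-based
    strand: str        # raw column value, not decoded
    motif: str
    unique_reads: int
    max_overhang: int


def parse_sj_line(line):
    """Parse one tab-delimited SJ.out.tab line (columns 6 and 8 are reserved)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) < 9:
        raise ValueError("expected 9 tab-separated columns")
    return Junction(
        cols[0], int(cols[1]), int(cols[2]), cols[3],
        MOTIFS[int(cols[4])], int(cols[6]), int(cols[8]),
    )


if __name__ == "__main__":
    # Made-up junction line, for illustration only.
    print(parse_sj_line("chr1\t12228\t12612\t1\t1\t0\t5\t0\t38"))
```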


    9: All input parameters

    The most up-to-date input parameters and their defaults can be found in the parametersDefault file in the STAR source directory.

    ### PARAMETERS

    parametersFiles   -
        string: name of a user-defined parameters file, "-": none. Can only be defined on the command line.

    ### RUN PARAMETERS

    runMode   alignReads
        string: type of the run:
            alignReads ... map reads
            genomeGenerate ... generate genome files

    runThreadN   1
        int: number of threads to run STAR

    ### GENOME PARAMETERS

    genomeDir   ./GenomeDir/
        string: path to the directory where genome files are stored (if runMode!=genomeGenerate) or will be generated (if runMode==genomeGenerate)

    genomeLoad   LoadAndKeep
        mode of shared memory usage for the genome files
        string:
            LoadAndKeep ... load genome into shared memory and keep it in memory after run
            LoadAndRemove ... load genome into shared memory but remove it after run
            LoadAndExit ... load genome into shared memory and exit, keeping the genome in memory for future runs
            Remove ... do not map anything, just remove loaded genome from memory
            NoSharedMemory ... do not use shared memory, each job will have its own private copy of the genome

    ### GENOME GENERATION PARAMETERS

    genomeFastaFiles   -
        fasta files with genomic sequence for genome files generation, only used if runMode==genomeGenerate
        string(s): path(s) to the files from the working directory (separated by spaces)

    genomeChrBinNbits   18
        int: =log2(chrBin), where chrBin is the size of the bins for genome storage: each chromosome will occupy an integer number of bins

    genomeSAindexNbases   14
        int: length (bases) of the SA pre-indexing string. Typically between 10 and 15. Longer strings will use much more memory, but allow faster searches.

    genomeSAsparseD   1
        int>0: suffix array sparsity, i.e. distance between indices: use bigger numbers to decrease needed RAM at the cost of mapping speed reduction
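To make the genomeChrBinNbits entry concrete, the Python sketch below computes how many bins a chromosome of a given length would occupy and the resulting padded length. It assumes the smallest whole number of bins that holds the chromosome; any extra padding STAR may apply is not described here, and the function name is illustrative.

```python
def chromosome_bins(length, genome_chr_bin_nbits=18):
    """Return (number of bins, padded length in bases) for one chromosome.

    chrBin = 2**genomeChrBinNbits; the bin count is rounded up so that
    the chromosome occupies an integer number of bins (assumption:
    no additional padding beyond that).
    """
    if length < 1:
        raise ValueError("chromosome length must be positive")
    bin_size = 1 << genome_chr_bin_nbits
    n_bins = -(-length // bin_size)  # ceiling division on integers
    return n_bins, n_bins * bin_size


if __name__ == "__main__":
    # With the default of 18 bits each bin holds 262144 bases.
    for length in (1, 262_144, 262_145, 1_000_000):
        print(length, chromosome_bins(length))
```

With many short reference sequences, a smaller genomeChrBinNbits reduces the padded total, since every sequence is rounded up to at least one full bin.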

    ### READ PARAMETERS

    readFilesIn   Read1 Read2
        string(s): paths to files that contain input read1 (and, if needed, read2)

    readMatesLengthsIn   NotEqual
        string: Equal/NotEqual - lengths of names, sequences, qualities for both mates are the same / not the same. NotEqual is safe in all situations.

    clip3pNbases   0
        int(s): number(s) of bases to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates.

    clip5pNbases   0
        int(s): number(s) of bases to clip from 5p of each mate. If one value is given, it will be assumed the same for both mates.

    clip3pAdapterSeq   -
        string(s): adapter sequences to clip from 3p of each mate. If one value is given, it will be assumed the same for both mates.

    clip3pAdapterMMp   0.1
        double(s): max proportion of mismatches for 3p adapter clipping for each mate. If one value is given, it will be assumed the same for both mates.

    clip3pAfterAdapterNbases   0
        int(s): number of bases to clip from 3p of each mate after the adapter clipping. If one value is given, it will be assumed the same for both mates.

    ### LIMITS

    limitGenomeGenerateRAM   31000000000
        int>0: maximum available RAM (bytes) for genome generation

    limitIObufferSize   150000000
        int>0: max available buffers size (bytes) for input/output, per thread

    ### OUTPUT

    outFileNamePrefix   ./
        string: output files name prefix (including full or relative path). Can only be defined on the command line.

    outStd   Log
        string: which output will be directed to stdout (standard out)
            Log : log messages
            SAM : alignments in .sam format (which normally are output to Aligned.out.sam file), normal standard output will go into Log.std.out

    outSAMmode   Full
        string: mode of SAM output
            None : no SAM output
            Full : full SAM output

    outSAMstrandField   None
        string: Cufflinks-like strand field flag
            None : not used
            intronMotif : strand derived from the intron motif. Reads with inconsistent and/or non-canonical introns are filtered out.

    outSAMattributes   Standard
        string: which SAM attributes to output?
            Standard : NH, HI, AS, nM attributes
            All
            None

    outSAMunmapped   None
        string: output of unmapped reads
            None : no output
            Within : output unmapped reads within the main SAM file

    ### OUTPUT FILTERING

    outFilterMultimapScoreRange   1
        int: the score range below the maximum score for multimapping alignments

    outFilterMultimapNmax   10
        int: read alignments will be output only if the read maps to fewer loci than this value, otherwise no alignments will be output

    outFilterMismatchNmax   10
        int: alignment will be output only if it has fewer mismatches than this value

    outFilterMismatchNoverLmax   0.3
        float: alignment will be output only if its ratio of mismatches to mapped length is less than this value

    outFilterScoreMin   0
        int: alignment will be output only if its score is higher than this value

    outFilterScoreMinOverLread   0.66
        float: outFilterScoreMin normalized to read length (sum of mates' lengths for paired-end reads)

    outFilterMatchNmin   0
        int: alignment will be output only if the number of matched bases is higher than this value

    outFilterMatchNminOverLread   0.66
        float: outFilterMatchNmin normalized to read length (sum of mates' lengths for paired-end reads)

    outFilterIntronMotifs   None
        string: filter alignments using their motifs
            None : no filtering
            KeepCanonical : filter out alignments that contain non-canonical or inconsistent canonical junctions

    ### OUTPUT FILTERING: SPLICE JUNCTIONS

    outSJfilterOverhangMin   30 12 12 12
        4*int: minimum overhang length for splice junctions on both sides for: (1) non-canonical motifs, (2) GT/AG motif, (3) GC/AG motif, (4) AT/AC motif. -1 means no output for that motif

    outSJfilterCountUniqueMin   3 1 1 1
        4*int: minimum read count per junction for: (1) non-canonical motifs, (2) GT/AG motif, (3) GC/AG motif, (4) AT/AC motif. -1 means no output for that motif

    outSJfilterDistToOtherSJmin   10 0 5 10
        4*int>=0: minimum allowed distance to other junctions' donor/acceptor

    ### SCORING

    scoreGap   0
        gap open penalty

    scoreGapNoncan   -8
        non-canonical gap open penalty (in addition to scoreGap)

    scoreGapGCAG   -4
        GCAG gap open penalty (in addition to scoreGap)

    scoreGapATAC   -8
        ATAC gap open penalty (in addition to scoreGap)

    scoreGenomicLengthLog2scale   -0.25
        extra score logarithmically scaled with genomic length of the alignment: -scoreGenomicLengthLog2scale*log2(genomicLength)

    scoreDelOpen   -2
        deletion open penalty

    scoreDelBase   -2
        deletion extension penalty per base (in addition to scoreDelOpen)

    scoreInsOpen   -2
        insertion open penalty

    scoreInsBase   -2
        insertion extension penalty per base (in addition to scoreInsOpen)

    scoreStitchSJshift   1
        maximum score reduction while searching for SJ boundaries in the stitching step

    ### ALIGNMENT and SEED PARAMETERS

    seedSearchStartLmax   50
        int>0: defines the search start point through the read - the read is split into pieces no longer than this value

    seedSearchStartLmaxOverLread   1.0
        float: seedSearchStartLmax normalized to read length (sum of mates' lengths for paired-end reads)

    seedSearchLmax   0
        int>=0: defines the maximum length of the seeds, if =0 max seed length is infinite

    seedMultimapNmax   10000
        int>0: only pieces that map fewer than this value are utilized in the stitching procedure

    seedPerReadNmax   1000
        int>0: max number of seeds per read

    seedPerWindowNmax   50
        int>0: max number of seeds per window

    seedNoneLociPerWindow   10
        int>0: max number of one seed loci per window

    alignIntronMin   21
        minimum intron size: genomic gap is considered intron if its length>=alignIntronMin, otherwise it's considered Deletion

    alignIntronMax   0
        maximum intron size, if 0, max intron size will be determined by (2^winBinNbits)*winAnchorDistNbins

    alignMatesGapMax   0
        maximum gap between two mates, if 0, max intron gap will be determined by (2^winBinNbits)*winAnchorDistNbins

    alignSJoverhangMin   5
        int>0: minimum overhang (i.e. block size) for spliced alignments

    alignSplicedMateMapLmin   0
        int>0: minimum mapped length for a read mate that is spliced

    alignSplicedMateMapLminOverLmate   0.66
        float>0: alignSplicedMateMapLmin normalized to mate length

    alignWindowsPerReadNmax   10000
        int>0: max number of windows per read

    alignTranscriptsPerWindowNmax   100
        int>0: max number of transcripts per window

    alignTranscriptsPerReadNmax   10000
        max number of different alignments per read to consider
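The fallback used when alignIntronMax or alignMatesGapMax is left at 0 can be worked through in Python. The sketch below applies the expression (2^winBinNbits)*winAnchorDistNbins from the entries above; the function name is an assumption, and the winAnchorDistNbins value of 9 in the example is purely illustrative because its default does not appear in this copy of the manual.

```python
def effective_max_gap(configured_max, win_bin_nbits, win_anchor_dist_nbins):
    """Effective maximum intron size or mates gap.

    A positive configured value is used as is; 0 falls back to
    (2**winBinNbits) * winAnchorDistNbins.
    """
    if configured_max < 0:
        raise ValueError("the configured maximum cannot be negative")
    if configured_max > 0:
        return configured_max
    return (1 << win_bin_nbits) * win_anchor_dist_nbins


if __name__ == "__main__":
    # winBinNbits=16 is the listed default; 9 bins is an assumed example value.
    print(effective_max_gap(0, 16, 9))          # 65536 * 9
    print(effective_max_gap(1_000_000, 16, 9))  # explicit setting takes precedence
```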

    ### SPLICE JUNCTIONS DATABASE PARAMETERS

    sjdbFileChrStartEnd   -
        string: path to the file with genomic coordinates (chr <tab> start <tab> end) for the introns

    sjdbOverhang   0
        int>=0: length of the donor/acceptor sequence on each side of the junctions, ideally = (mate_length - 1)
        if =0, splice junction database is not used

    sjdbScore   1
        int: extra alignment score for alignments that cross database junctions

    ### WINDOWS, ANCHORS, BINNING

    winAnchorMultimapNmax   50
        int>0: max number of loci anchors are allowed to map to

    winBinNbits   16
        int>0: =log2(winBin), where winBin is the size of the bin for the windows/clustering, each window will occupy an integer number of bins.

    ### Smith-Waterman alignment parameters for PacBio reads

    swMode   0
        int>=0: 0: no SW alignments, 1: full SW alignment

    swWinCoverageMinP   50
        int>0: minimum read coverage percentage in a window that will be aligned with SW