Nerd talk: regexes

Post on 05-Jul-2015

Regexes: It's magic!

“Some people, when confronted with a problem, think 'I know, I'll use regular expressions!'

Now they have two problems.”

Perl-style regex: It's magic done right!

Metacharacters

^ beginning

$ end

. anything

\ escape

/^....G..AA$/
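
A minimal sketch of the pattern above, assuming GNU grep with Perl-style support (-P). The sequences and the function name are made up for illustration.

```shell
# classify_motif is a hypothetical helper: prints "match" if the sequence has
# four characters, a G, two more characters and a final AA (9 characters total).
classify_motif() {
  if printf '%s\n' "$1" | grep -qP '^....G..AA$'; then
    echo "match"
  else
    echo "no match"
  fi
}

classify_motif ACTTGCCAA    # match
classify_motif ACTTGCCAT    # no match: does not end in AA
classify_motif ACTTGCCAAA   # no match: ^ and $ pin the length to 9
```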

Escaped characters

\s whitespace

\S not-whitespace

\w word

\d digit

\. dot

\\ backslash

/^\w\w\w\wG\w\wAA$/

/^\d\d\\\d\d\\\d\d\d\d$/
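
A quick way to try the second pattern, assuming GNU grep (-P). The helper name and the test strings are hypothetical; note that \\ matches a backslash, so the dates here are separated by backslashes, not slashes.

```shell
# is_backslash_date is a hypothetical helper: prints yes/no for dd\mm\yyyy.
is_backslash_date() {
  # Single quotes keep the backslashes literal, so grep sees \\ (one escaped backslash).
  if printf '%s\n' "$1" | grep -qP '^\d\d\\\d\d\\\d\d\d\d$'; then
    echo yes
  else
    echo no
  fi
}

is_backslash_date '05\07\2015'   # yes
is_backslash_date '05/07/2015'   # no: the separators are slashes
is_backslash_date '5\7\2015'     # no: \d\d needs exactly two digits
```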

Repetition

? 0 or 1 time

* 0 or more times

+ 1 or more times

*? ungreedy *

+? ungreedy +

{m} m times

{m,n} m up to n times

{m,n}? ungreedy {m,n}

/^\w{4}G\w{2}AA$/

/^\d{1,2}\\\d{1,2}\\\d{2,4}$/
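
Greedy versus ungreedy is easier to see than to explain. A small sketch, assuming GNU grep (-o and -P); the function names are made up.

```shell
# Assumes GNU grep: -o prints only the matched part, -P enables *? and +?.
greedy_match() { printf '%s\n' "$1" | grep -oP 'A.+A'; }
lazy_match()   { printf '%s\n' "$1" | grep -oP 'A.+?A'; }

greedy_match GATTACA   # ATTACA: .+ runs to the last A
lazy_match GATTACA     # ATTA: .+? stops at the first A it can
```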

Grouping

[ABC] any of these characters

(AB|BC|CA) any of these expressions

(THIS!) save this

[A-Za-z0-9] ranges

/^[ACTG]{4}G[ACTG]{2}AA$/

/^(0?[1-9]|[12]\d|3[01])\\(0?[1-9]|1[0-2])\\(\d{2}|\d{4})$/
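
A sketch of character classes and of "save this" in practice, assuming grep -E and sed -E are available. The helper names and inputs are hypothetical, and the date pattern is a simplified version of the one above.

```shell
# Hypothetical helpers; sed -E (extended regex) is assumed to be available.
strict_motif() {
  # [ACTG] only accepts real bases, unlike . or \w
  if printf '%s\n' "$1" | grep -qE '^[ACTG]{4}G[ACTG]{2}AA$'; then
    echo "match"
  else
    echo "no match"
  fi
}

reorder_date() {
  # Each (...) saves what it matched; \1, \2, \3 paste it back in a new order.
  printf '%s\n' "$1" | sed -E 's/^([0-9]{1,2})\\([0-9]{1,2})\\([0-9]{2,4})$/\3-\2-\1/'
}

strict_motif ACTTGCCAA    # match
strict_motif ACNTGCCAA    # no match: N is not in [ACTG]
reorder_date '5\7\2015'   # 2015-7-5
```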

OVERKILL

http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb

In Python (sigh...)

E.g.: finding files

ls -la | grep -e '->' | grep -v 'bubo' | grep -v 'Daniel'

E.g.: demultiplexing fastq

1. Barcode

2. Primer

3. Random nucleotides

grep -P '1:N:0:ACTGGTT' -A3 --no-group-separator multiplex_R1.fastq | grep -P '^[ACTGN]{4}CCC[ACGT]T[GC]AGATA' -A2 -B1 --no-group-separator > deplexed_R1.fq
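
To see what the two-step grep keeps, here is a small sketch on three made-up reads. It assumes GNU grep (-P, --no-group-separator) and 4-line fastq records; demux and sample_reads are hypothetical names.

```shell
# demux is a hypothetical wrapper around the two-step grep, reading fastq on stdin.
demux() {
  # Step 1: keep whole records (header + 3 lines) carrying the barcode.
  # Step 2: keep records whose sequence starts with 4 random bases + primer.
  grep -P '1:N:0:ACTGGTT' -A3 --no-group-separator |
    grep -P '^[ACTGN]{4}CCC[ACGT]T[GC]AGATA' -A2 -B1 --no-group-separator
}

# Three made-up reads: right barcode and primer, wrong barcode, wrong primer.
sample_reads() {
  cat <<'EOF'
@read1 1:N:0:ACTGGTT
ACGTCCCATGAGATATTT
+
IIIIIIIIIIIIIIIIII
@read2 1:N:0:GGGGGGG
ACGTCCCATGAGATATTT
+
IIIIIIIIIIIIIIIIII
@read3 1:N:0:ACTGGTT
ACGTAAAATGAGATATTT
+
IIIIIIIIIIIIIIIIII
EOF
}

sample_reads | demux   # prints only the four lines of read1
```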

E.g.: paper figures!

From the subset of unique sequences that span the entire region under study, how many unique sequences are matched by each primer combination?

Sed: find & replace

“Are you gonna talk about vim regexes?”

“Sed regexes are weird”

My workaround: use ranges

[0-9] [A-Z] [a-z] [A-Za-z]
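
A sketch of the range workaround, since sed's default syntax has no \d. The helper name and the input are made up.

```shell
# Hypothetical helper: mask every run of digits with N, using a range
# instead of \d so that plain (basic-regex) sed understands it.
mask_numbers() {
  printf '%s\n' "$1" | sed 's/[0-9][0-9]*/N/g'
}

mask_numbers 'sample42_run7'   # sampleN_runN
```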

E.g.:

“Oh noes, Americans don't know how to separate decimals!”

sed 's/\./,/g' hisfile.tab > myfile.tab

“Oh noes, this bloody file was edited in Windows!”

sed 's/\r$//' theirfile.tab > decentfile.tab

“Oh noes, CASAVA 1.6 has a slash in it!”

sed 's,/1, 1:N:0:NNNNNN,' oldfile.fq > newfile.fq
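
Two of these one-liners are easy to get subtly wrong, so here is a sketch on throwaway input. It assumes GNU sed (\r in a pattern is a GNU extension); the helper names are hypothetical.

```shell
# Hypothetical helpers, both reading from stdin.
dots_to_commas() { sed 's/\./,/g'; }   # \. is a literal dot
strip_cr()       { sed 's/\r$//'; }    # drop the carriage return before each newline

printf '3.14\t2.71\n' | dots_to_commas   # 3,14<TAB>2,71
printf '3.14\n' | sed 's/./,/g'          # ,,,, because a bare . matches anything
printf 'ACGT\r\n' | strip_cr | od -c     # no \r left in the dump
```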

Other neat stuff

grep (-c)

sort (-n, -r, -k, -t)

uniq -c
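
These combine nicely for counting things. A sketch with made-up sequences, one per line; count_unique is a hypothetical helper.

```shell
# Hypothetical helper for counting, reading one sequence per line on stdin.
count_unique() {
  # sort groups identical lines, uniq -c counts each group,
  # sort -k1,1nr puts the highest count first, awk tidies the padding.
  sort | uniq -c | sort -k1,1nr | awk '{print $1, $2}'
}

printf 'ACGT\nTTTT\nACGT\nACGT\n' | count_unique     # 3 ACGT, then 1 TTTT
printf '>seq1\nACGT\n>seq2\nTTTT\n' | grep -c '^>'   # 2 fasta records
```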

LMGTFY:

sed: http://www.tutorialspoint.com/unix/unix-regular-expressions.htm

grep: http://linux.about.com/od/commands/l/blcmdl1_grep.htm

Perl: http://www.cs.tut.fi/~jkorpela/perl/regexp.html

Python: http://docs.python.org/2/howto/regex.html

Vim: http://vimregex.com/

sed 's/fear of regex/love of regex/g'