Regexes: It's magic!
“Some people, when confronted with a problem, think 'I know, I'll use regular expressions!'
Now they have two problems.”
Perl-style regex: It's magic done right!
Metacharacters
^ beginning
$ end
. any single character
\ escape
/^....G..AA$/
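A quick way to see what the pattern above accepts is to run it against a few strings. This is only a minimal sketch in Python; the test sequences are made up for illustration.

```python
import re

# Four characters of anything, a G, two more characters, then AA, and nothing else.
pattern = re.compile(r"^....G..AA$")

# Hypothetical sequences, invented for this example.
candidates = ["ACTGGTCAA", "ACTGGTCAAT", "ACTGATCAA", "1234G..AA"]

matches = [s for s in candidates if pattern.search(s)]
print(matches)  # ['ACTGGTCAA', '1234G..AA']
```

Note that "1234G..AA" also matches: the dot accepts any character, not just nucleotides.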
Escaped characters
\s whitespace
\S not-whitespace
\w word character (letter, digit or underscore)
\d digit
\. dot
\\ backslash
/^\w\w\w\wG\w\wAA$/
/^\d\d\\\d\d\\\d\d\d\d$/
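The two patterns above can be tried out in Python as a rough sketch. The inputs are invented; the one Python-specific assumption is the use of raw strings (r"...") so that Python does not interpret the backslashes before the regex engine sees them.

```python
import re

# Raw strings keep the backslashes intact for the regex engine.
seq_pattern = re.compile(r"^\w\w\w\wG\w\wAA$")
date_pattern = re.compile(r"^\d\d\\\d\d\\\d\d\d\d$")

# Hypothetical inputs. In a normal Python string, "\\" is one literal backslash.
seq_inputs = ["ACTGGTCAA", "1234G..AA", "1234G56AA"]
date_inputs = ["25\\12\\2014", "25/12/2014", "5\\12\\2014"]

seq_results = {s: bool(seq_pattern.search(s)) for s in seq_inputs}
date_results = {s: bool(date_pattern.search(s)) for s in date_inputs}

print(seq_results)   # dots are rejected now, but digits still count as \w
print(date_results)  # only the backslash-separated, two-digit form matches
```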
Repetition
? 0 or 1 time
* 0 or more times
+ 1 or more times
*? ungreedy *
+? ungreedy +
{m} m times
{m,n} m up to n times (no space after the comma)
{m,n}? ungreedy {m,n}
/^\w{4}G\w{2}AA$/
/^\d{1,2}\\\d{1,2}\\\d{2,4}$/
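The difference between greedy and ungreedy repetition is easiest to see side by side. A small Python sketch, with a made-up input line:

```python
import re

# Hypothetical input, invented for this example.
line = "ID=seq1;ID=seq2;"

# Greedy: .+ grabs as much as it can, so the match runs to the last ';'.
greedy = re.findall(r"ID=.+;", line)

# Ungreedy: .+? stops at the first ';' it can.
lazy = re.findall(r"ID=.+?;", line)

# The same applies to {m,n}: greedy takes up to n, ungreedy settles for m.
digits = "1 22 333 4444 55555"
greedy_counts = re.findall(r"\d{2,4}", digits)
lazy_counts = re.findall(r"\d{2,4}?", digits)

print(greedy)         # ['ID=seq1;ID=seq2;']
print(lazy)           # ['ID=seq1;', 'ID=seq2;']
print(greedy_counts)  # ['22', '333', '4444', '5555']
print(lazy_counts)    # ['22', '33', '44', '44', '55', '55']
```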
Grouping
[ABC] any of these characters
(AB|BC|CA) any of these expressions
(THIS!) save this
[A-Za-z0-9] ranges
/^[ACTG]{4}G[ACTG]{2}AA$/
/^(0?[1-9]|[12]\d|3[01])\\(0?[1-9]|1[0-2])\\(\d{2}|\d{4})$/
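Parentheses also save what they matched, which is what "save this" refers to. The sketch below uses a variant of the date pattern, written so that day 00 and month 00 are rejected; the input dates are invented.

```python
import re

# Three capture groups: day, month, year, separated by literal backslashes.
date_pattern = re.compile(
    r"^(0?[1-9]|[12]\d|3[01])\\(0?[1-9]|1[0-2])\\(\d{2}|\d{4})$"
)

# Hypothetical dates; "\\" in a Python string is one literal backslash.
m = date_pattern.search("25\\12\\2014")
day, month, year = m.groups()
print(day, month, year)  # 25 12 2014

# Out-of-range days and months do not match at all.
bad_day = date_pattern.search("32\\12\\2014")
bad_month = date_pattern.search("25\\13\\2014")
zero_day = date_pattern.search("00\\12\\2014")
```

The pattern still accepts impossible dates such as 31\02\2014; checking those with a regex alone quickly gets out of hand.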
OVERKILL
http://nbviewer.ipython.org/url/norvig.com/ipython/xkcd1313.ipynb
In Python (sigh...)
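Python has no /regex/ literal: patterns are ordinary strings handed to the standard re module. A minimal sketch of the usual calls, with a made-up read:

```python
import re

# Compile once, reuse many times. The raw string avoids double escaping.
motif = re.compile(r"[ACTG]{4}G[ACTG]{2}AA")

# Hypothetical read, invented for this example.
read = "NNACTGGTCAATT"

# match() only tries at the start of the string; search() scans the whole string.
at_start = motif.match(read)
anywhere = motif.search(read)

print(at_start)                          # None
print(anywhere.group(), anywhere.span()) # ACTGGTCAA (2, 11)

# fullmatch() behaves like wrapping the pattern in ^...$
whole = motif.fullmatch("ACTGGTCAA")
```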
E.g.: finding files
ls -la | grep -e '->' | grep -v 'bubo' | grep -v 'Daniel'
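The same keep-then-discard filtering can be sketched in Python on lines of text. The listing below is invented and only mimics what ls -la prints; it is not real output.

```python
import re

# Hypothetical lines in the style of `ls -la` output.
listing = [
    "lrwxrwxrwx 1 ana users  12 Jan  1 10:00 data -> /mnt/data",
    "lrwxrwxrwx 1 ana users  12 Jan  1 10:00 old -> /mnt/bubo/old",
    "lrwxrwxrwx 1 Daniel users 9 Jan  1 10:00 tmp -> /tmp/x",
    "-rw-r--r-- 1 ana users 120 Jan  1 10:00 notes.txt",
]

arrow = re.compile(r"->")              # symlinks show their target after an arrow
excluded = re.compile(r"bubo|Daniel")  # same effect as the two grep -v steps

kept = [line for line in listing if arrow.search(line) and not excluded.search(line)]
print(kept)
```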
E.g.: demultiplexing FASTQ
1. Barcode
2. Primer
3. Random nucleotides
grep -P '1:N:0:ACTGGTT' -A3 --no-group-separator multiplex_R1.fastq | grep -P '^[ACTGN]{4}CCC[ACGT]T[GC]AGATA' -A2 -B1 --no-group-separator > deplexed_R1.fq
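For comparison, here is a rough Python sketch of the same filter. It is not the pipeline used above, just one way to express it: the function names and the example records are hypothetical, and it assumes plain four-line FASTQ records. Working record by record means the primer pattern is only ever tested against the sequence line.

```python
import re
from itertools import islice

BARCODE = "1:N:0:ACTGGTT"
# Four random nucleotides, then the primer.
PRIMER = re.compile(r"^[ACTGN]{4}CCC[ACGT]T[GC]AGATA")

def read_fastq(lines):
    """Yield (header, sequence, plus, quality) tuples, four lines at a time."""
    it = iter(lines)
    while True:
        record = [line.rstrip("\n") for line in islice(it, 4)]
        if len(record) < 4:
            return
        yield tuple(record)

def demultiplex(lines):
    """Keep records whose header has the barcode and whose read starts with the primer."""
    for header, seq, plus, qual in read_fastq(lines):
        if BARCODE in header and PRIMER.search(seq):
            yield header, seq, plus, qual

# Hypothetical records, invented for this example.
example = [
    "@read1 1:N:0:ACTGGTT", "ACGTCCCATGAGATATTTT", "+", "IIIIIIIIIIIIIIIIIII",
    "@read2 1:N:0:GGGGGGG", "ACGTCCCATGAGATATTTT", "+", "IIIIIIIIIIIIIIIIIII",
    "@read3 1:N:0:ACTGGTT", "ACGTGGGATGAGATATTTT", "+", "IIIIIIIIIIIIIIIIIII",
]

kept = list(demultiplex(example))
print([record[0] for record in kept])  # ['@read1 1:N:0:ACTGGTT']
```

With a real file, the list would be replaced by an open file handle, since iterating over a file yields its lines.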
E.g.: paper figures!
From the subset of unique sequences that span the entire region under study, how many unique sequences are matched by each primer combination?
Sed: find & replace
“Are you gonna talk about vim regexes?”
“Sed regexes are weird”
My workaround: use ranges
[0-9] [A-Z] [a-z] [A-Za-z]
E.g.:
“Oh noes, Americans don't know how to separate decimals!”
sed 's/\./,/g' hisfile.tab > myfile.tab
“Oh noes, this bloody file was edited in Windows!”
sed 's/\r$//' theirfile.tab > decentfile.tab
“Oh noes, CASAVA 1.6 has a slash in it!”
sed 's,/1$, 1:N:0:NNNNNN,' oldfile.fq > newfile.fq
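The three find & replace jobs above map onto re.sub in Python. A minimal sketch with made-up input lines; it also shows what an unescaped dot does, and why anchoring with $ matters.

```python
import re

# Decimal points to decimal commas: the dot must be escaped to mean a literal dot.
decimals = re.sub(r"\.", ",", "3.14\t2.71")
oops = re.sub(r".", ",", "3.14")  # unescaped dot: every character is replaced

# Windows line endings: drop the carriage return at the end of the line.
unix_line = re.sub(r"\r$", "", "sample\t42\r")

# Old-style read names: swap a trailing /1 for the newer header suffix.
header = re.sub(r"/1$", " 1:N:0:NNNNNN", "@read1/1")

print(decimals)   # 3,14	2,71
print(oops)       # ,,,,
print(header)     # @read1 1:N:0:NNNNNN
```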
Other neat stuff
grep (-c)
sort (-n, -r, -k, -t)
uniq -c
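For those who end up in Python anyway, the counting tools above have close stand-ins. This is a sketch with invented sequences: Counter plays the role of sort | uniq -c, and a filtered sum plays the role of grep -c.

```python
import re
from collections import Counter

# Hypothetical sequences, invented for this example.
sequences = ["ACTG", "ACTG", "GGCC", "ACTG", "TTAA", "GGCC"]

# Roughly `sort | uniq -c`: how often does each distinct line occur?
counts = Counter(sequences)

# Roughly `sort -nr` on the counts: most frequent first, ties broken alphabetically.
ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

# Roughly `grep -c '^AC'`: number of lines matching a pattern.
n_matching = sum(1 for s in sequences if re.search(r"^AC", s))

print(ranked)      # [('ACTG', 3), ('GGCC', 2), ('TTAA', 1)]
print(n_matching)  # 3
```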
LMGTFY:
sed: http://www.tutorialspoint.com/unix/unix-regular-expressions.htm
grep: http://linux.about.com/od/commands/l/blcmdl1_grep.htm
Perl: http://www.cs.tut.fi/~jkorpela/perl/regexp.html
Python: http://docs.python.org/2/howto/regex.html
Vim: http://vimregex.com/
sed 's/fear of regex/love of regex/g'