HaploReg, RegulomeDB and more on Python programming Lin Liu Yang Li.



• HaploReg retrieves the ENCODE annotation for the selected SNP, as well as for other SNPs in LD with it

• Using the “Set Options” tab, the user can configure values such as the LD threshold and the 1000 Genomes population used to calculate LD


RegulomeDB


Python programming wrap-up

• if / else
• for and while loops
• index: starts from 0, unlike R (which starts from 1)
• four important data structures:
  – list: a = [1, 2, 3, 4]; a.append(5)
  – tuple: a = ('cat', 'dog'); a = (a[1], a[0])  (tuples are immutable, so a swap builds a new tuple; a[0], a[1] = a[1], a[0] raises a TypeError)
  – dictionary: a = {'chr1': {10254: 'G', 13257: 'T'}}; a.keys()
  – set (built in since Python 2.4; "from sets import Set" only works in Python 2):
    • species = set(['hs', 'mm', 'chimp'])
    • zoos = set(['mm', 'wolf', 'chimp'])
    • zoos | species  (union)
    • zoos & species  (intersection)
    • zoos - species  (difference)
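The four data structures listed above can be tried out with the short sketch below. It assumes Python 3, so it uses the built-in set type and builds a new tuple instead of assigning to tuple items; the variable names (pets, snps, only_zoos and so on) are illustrative and not from the slides.

```python
# List: ordered and mutable; indexing starts at 0.
a = [1, 2, 3, 4]
a.append(5)
first = a[0]  # first element, because indices start from 0

# Tuple: ordered but immutable, so "swapping" means building a new tuple.
pets = ('cat', 'dog')
pets = (pets[1], pets[0])

# Dictionary: chromosome -> {position: allele}.
snps = {'chr1': {10254: 'G', 13257: 'T'}}
chromosomes = list(snps.keys())
allele = snps['chr1'][10254]

# Set: unordered collection of unique items with set algebra operators.
species = {'hs', 'mm', 'chimp'}
zoos = {'mm', 'wolf', 'chimp'}
union = zoos | species       # in either set
common = zoos & species      # in both sets
only_zoos = zoos - species   # in zoos but not in species

print(a, first, pets, chromosomes, allele)
print(sorted(union), sorted(common), sorted(only_zoos))
```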


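• Some tricky facts:
  – Shallow copy and deep copy
    • Plain assignment makes no copy at all, both names refer to the same list: a = [1, 2, 3]; b = a; b[2] = 4; print(a)  (prints [1, 2, 4])
    • A shallow copy (b = list(a) or copy.copy(a)) copies the outer list only; nested objects are still shared
    • Deep copy:
      – from copy import deepcopy
      – a = [1, 2, 3]; b = deepcopy(a); b[2] = 4; print(a)  (prints [1, 2, 3])
  – List comprehension:
    • As in R, explicit loops are slow
    • a = [1, 2, 3]; a = [b + 1 for b in a]; print(a)  (prints [2, 3, 4])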
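The difference between sharing and copying is easiest to see with a nested list. The sketch below is a minimal illustration using the standard copy module; the names alias, nested, shallow, deep and incremented are made up for the example.

```python
from copy import copy, deepcopy

a = [1, 2, 3]
alias = a                 # no copy: both names point to the same list
alias[2] = 4
print(a)                  # [1, 2, 4]

nested = [[1, 2], [3, 4]]
shallow = copy(nested)    # new outer list, inner lists are shared
deep = deepcopy(nested)   # outer and inner lists are all duplicated

shallow[0][0] = 99        # visible through `nested` (shared inner list)
deep[1][1] = -1           # not visible through `nested`
print(nested)             # [[99, 2], [3, 4]]

# List comprehension: build a new list without an explicit loop body.
incremented = [x + 1 for x in a]
print(incremented)        # [2, 3, 5]
```

For a flat list of numbers a shallow copy is already enough; deepcopy only matters once the list contains other mutable objects.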


• How to read bam (binary) files in python?
  – import pybedtools

• How to perform numerical computation in python?
  – import numpy as np
  – Includes array and matrix calculations, very useful

• How to do shell-script tasks in python?
  – Get all files in a folder
  – import os
  – os.listdir("yourdirectory")
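As a small illustration of the os.listdir step, the sketch below lists the files in a folder and keeps those ending in ".bam". The helper list_files and the file names are hypothetical, and a temporary directory stands in for "yourdirectory" so the example runs anywhere.

```python
import os
import tempfile


def list_files(directory):
    """Return the sorted names of regular files directly inside `directory`."""
    return sorted(
        name for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


# Build a throwaway folder so the example does not depend on local files.
with tempfile.TemporaryDirectory() as workdir:
    for name in ("sample1.bam", "sample2.bam", "notes.txt"):
        open(os.path.join(workdir, name), "w").close()
    os.mkdir(os.path.join(workdir, "subdir"))  # directories are skipped

    all_files = list_files(workdir)
    bam_files = [name for name in all_files if name.endswith(".bam")]

print(all_files)
print(bam_files)
```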


Object oriented programming

• Class and objects in python

class HMM:
    # constructor
    # transition_probs[i, j] is the probability of transitioning to state i from state j
    # emission_probs[i, j] is the probability of emitting emission j while in state i
    def __init__(self, transition_probs, emission_probs):
        self._transition_probs = transition_probs
        self._emission_probs = emission_probs

    # accessors
    def emission_dist(self, emission):
        return self._emission_probs[:, emission]

    @property
    def num_states(self):
        return self._transition_probs.shape[0]

    @property
    def transition_probs(self):
        return self._transition_probs
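To show how such a class is used, here is a sketch with a pure-Python stand-in, TinyHMM (a hypothetical name). It assumes nested lists instead of the array objects the HMM class above indexes with [:, emission] and .shape, so it runs without extra packages; the probabilities are made-up values.

```python
class TinyHMM:
    """Pure-Python stand-in for the HMM class (nested lists, no arrays)."""

    def __init__(self, transition_probs, emission_probs):
        # transition_probs[i][j]: probability of moving to state i from state j
        # emission_probs[i][j]: probability of emitting symbol j while in state i
        self._transition_probs = transition_probs
        self._emission_probs = emission_probs

    def emission_dist(self, emission):
        # Column `emission` of the emission matrix: one probability per state.
        return [row[emission] for row in self._emission_probs]

    @property
    def num_states(self):
        return len(self._transition_probs)

    @property
    def transition_probs(self):
        return self._transition_probs


hmm = TinyHMM(
    transition_probs=[[0.9, 0.2], [0.1, 0.8]],  # each column sums to 1
    emission_probs=[[0.5, 0.5], [0.1, 0.9]],    # each row sums to 1
)
print(hmm.num_states)        # property: read like an attribute, no parentheses
print(hmm.emission_dist(1))  # ordinary method: called with an argument
```

Because num_states and transition_probs are defined with @property and have no setter, they are read-only from outside the class.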


Interface with other programming languages

• Rpy: R and python interface
• Cython: python and C interface
• When to use python?
  – Text manipulation
  – Some simple machine learning implementations (like using matlab)
  – Some very well-written packages are available: PyStan (Bayesian MCMC sampler), matplotlib, pybedtools, etc.


• When not to use python:
  – Large-scale simulation: most often you cannot get rid of loops
  – Statistical analysis: R is much better and well curated
  – Best strategy: interface python with C


Some good reference code for python

• Check the MACS14 python scripts
• From MACS14 you can learn how to turn a python script into executable software