BIG DATA IN BIOINFORMATICS - BIST · BiG data is bioinformatics •Heterogeneous data •numerical...

BIG DATA IN BIOINFORMATICS


Page 1

BIG DATA IN BIOINFORMATICS

Page 2

BIG DATA IS BIOINFORMATICS

Page 3

Big data is bioinformatics

• Heterogeneous data
  • numerical
  • non-numerical
  • structures at different levels (from molecules to organisms): images
  • sequences
  • longitudinal/dynamic: movies
• Multi-dimensional
• Collected at multiple sites
• Produced by everything from individual small labs to large international consortia
• Shared through the internet
• Real-time access
• Need for integrative analysis


Page 4

Google search hits for "X-informatics" (June 4, 2015):

• bioinformatics: 24,500,000
• chemoinformatics: 275,000
• astroinformatics: 27,800
• neuroinformatics: 331,000
• socioinformatics: 14,100
• geoinformatics: 548,000
• meteoinformatics: 146
• econoinformatics: 2,010
• ecoinformatics: 92,800
• physicoinformatics: 5,390

Page 5

[Figure: "Number of scientific expeditions", 1760 to 1890 by decade; axis label "# of commissioned years", scale 0 to 45. Credit: Cedric Notredame, CRG]

Page 6

Cedric Notredame, CRG

Page 7

Thomas Heinis, EPFL

Page 8

Stephens ZD et al. PLOS Biology, 2015

Page 9

Big Data: Astronomical or Genomical?

Table 1. Four domains of Big Data in 2025

Page 10

We are the Big Data

Page 11

Wearable medical devices

Page 12

Implantable wearable devices

Page 13

Nanowearables

Page 14

Stephens ZD et al. PLOS Biology, 2015

Moore’s Law

Page 15

Page 16

2 PB per 1 g DNA

Page 17

“Goldman prediction”

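• Storage density: 2 PB per 1 g of DNA ($2 \times 10^{15}$ bytes/g)

• Total world information (2013): 3 ZB ($3 \times 10^{21}$ bytes)

• Approximately $1.5 \times 10^{6}$ g (1.5 tonnes) of DNA would be needed to store all of it

• Information doubling time: 2 years

• Mass of the Earth: $6 \times 10^{27}$ g (Google)

• $1.5 \times 10^{6} \times 2^{x/2} \approx 6 \times 10^{27} \;\Rightarrow\; x = 2\log_2\!\left(4 \times 10^{21}\right) \approx 144$ years

• The mass of DNA needed to store all the world's information would exceed the mass of the Earth around the year 2157 (2013 + 144)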
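The arithmetic behind the Goldman prediction can be checked with a short script. This is a minimal sketch that assumes simple exponential growth with a fixed 2-year doubling time and a constant storage density; the constant and function names (`dna_mass_g`, `years_until_exceeds`) are illustrative and do not come from the source.

```python
import math

# Figures quoted on the "Goldman prediction" slide
DNA_DENSITY_BYTES_PER_G = 2e15   # 2 PB stored per gram of DNA
WORLD_INFO_BYTES_2013 = 3e21     # 3 ZB of total world information in 2013
DOUBLING_TIME_YEARS = 2.0        # information volume doubles every 2 years
EARTH_MASS_G = 6e27              # mass of the Earth in grams


def dna_mass_g(info_bytes, density=DNA_DENSITY_BYTES_PER_G):
    """Grams of DNA needed to hold info_bytes at the given density (bytes/g)."""
    return info_bytes / density


def years_until_exceeds(target_g, start_g, doubling=DOUBLING_TIME_YEARS):
    """Solve start_g * 2**(x / doubling) = target_g for x, in years."""
    return doubling * math.log2(target_g / start_g)


start = dna_mass_g(WORLD_INFO_BYTES_2013)
years = years_until_exceeds(EARTH_MASS_G, start)

print(f"DNA mass needed in 2013: {start:.3g} g")
print(f"Years until it exceeds the Earth's mass: {years:.1f}")
print(f"Crossover year: about {2013 + years:.0f}")
```

Because the growth is exponential, the crossover date is only weakly sensitive to the inputs: passing a higher `density` to `dna_mass_g` shifts the result by roughly two years per doubling of density, not proportionally.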


Page 18


215 PB per 1 g DNA