HopsFS & ePipe - GitHub Pages · 2018-10-03 · HopsFS & ePipe 1 Distributed Computing and...

Post on 21-May-2020

4 views 0 download

Transcript of HopsFS & ePipe - GitHub Pages · 2018-10-03 · HopsFS & ePipe 1 Distributed Computing and...

Mahmoud IsmailKTH

HopsFS & ePipe

�1

Distributed Computing and Analytics Workshop, September 26th 2018

�2

�2

�2Problem: Data layer to store millions of these images and their annotations

At Scale

�3

Open Images Dataset

At Scale

�3

Open Images DatasetOpen Images DatasetOpen Images DatasetOpen Images DatasetOpen Images DatasetOpen Images DatasetOpen Images DatasetOpen Images DatasetOpen Images DatasetOpen Images DatasetOpen Images DatasetDataset X

Requirements

•Reading/Writing millions of images with high throughput

•Attaching annotations to each image, and then searching using these annotations

�4

HDFS

�5

HDFS

�5

Hadoop Software Stack

HDFS

�5

Hadoop Software Stack

HDFS

�5

Hadoop Software Stack

HDFS

�5

Hadoop Software Stack

HDFS

�5

Hadoop Software Stack

HDFS Architecture

�6

HDFS Architecture

�6

DataNodes

HDFS Architecture

�6

DataNodes

NameNode

HDFS Architecture

�6

DataNodes

HDFS Client

NameNode

HDFS Architecture

�6

DataNodes

HDFS Client

File1

NameNode

HDFS Architecture

�6

DataNodes

HDFS Client

File1

Where can I save the file?

NameNode

HDFS Architecture

�6

DataNodes

HDFS Client

File1

Where can I save the file?

DataNodes Addresses

NameNode

HDFS Architecture

�6

DataNodes

HDFS Client

File1

NameNode

HDFS Architecture

�6

DataNodes

HDFS Client

File1

NameNode

HDFS Architecture

�6

DataNodes

HDFS Client

File1

NameNode

HDFS Architecture

�6

DataNodes

HDFS Client

File Blocks Mappings

File System Metadata

File1

NameNode

HDFS Architecture

�6

DataNodes

File Blocks Mappings

File System Metadata

File1 Blk1 ! DN1, Blk2 ! DN5, Blk3 ! DN3

NameNode

HDFS Architecture

�6

DataNodes

File Blocks Mappings

File System Metadata

File1 Blk1 ! DN1, Blk2 ! DN5, Blk3 ! DN3 File2 Blk1 ! DN1, Blk2 ! DN4File3 Blk1 ! DN1, Blk2 ! DN2, Blk3 ! DN3File4 Blk1 ! DN100File5 Blk1 ! DN4, Blk2 ! DN2, Blk3 ! DN9

… … … …FileN Blk1 ! DN2, Blk2 ! DN8

NameNode

HDFS Performance at Scale

�7

DataNode

NameNode

HDFS Performance at Scale

�7

DataNode

NameNode

HDFS Performance at Scale

�7

DataNode

NameNode

HDFS Performance at Scale

�7

DataNode

NameNode

`

HDFS Limitations

• Namespace size upper bound: ~ 500 million files

• At most 70-80 thousands file system operations / sec

�9

HopsFS

�10

HopsFS

�10

HopsFS

�10

HopsFS Architecture

�1111

NameNodeFile Blocks Mappings

File System Metadata

File1 Blk1 ! DN1, Blk2 ! DN4, Blk3 ! DN5

File5 Blk1 ! DN4, Blk2 ! DN2, Blk3 ! DN9

File4 Blk1 ! DN100

File3 Blk1 ! DN1, Blk2 ! DN2, Blk3 ! DN3

File2 Blk1 ! DN1, Blk2 ! DN4

FileN Blk1 ! DN2, Blk2 ! DN8

HopsFS Architecture

�1111

NameNode File Blocks Mappings

File1 Metadata

File2 Metadata

File Blocks Mappings

File3 Metadata

File4 Metadata

File Blocks Mappings

File5 Metadata

File6 Metadata

File Blocks Mappings

File7 Metadata

File8 Metadata

HopsFS Architecture

�1111

NameNode File Blocks Mappings

File1 Metadata

File2 Metadata

File Blocks Mappings

File3 Metadata

File4 Metadata

File Blocks Mappings

File5 Metadata

File6 Metadata

File Blocks Mappings

File7 Metadata

File8 Metadata

Distributed Database

HopsFS Architecture

�1111

NameNode

File Blocks Mappings

File1 Metadata

File2 Metadata

File Blocks Mappings

File3 Metadata

File4 Metadata

File Blocks Mappings

File5 Metadata

File6 Metadata

File Blocks Mappings

File7 Metadata

File8 Metadata

Distributed Database

HopsFS Architecture

�1111

NameNode

File Blocks Mappings

File1 Metadata

File2 Metadata

File Blocks Mappings

File3 Metadata

File4 Metadata

File Blocks Mappings

File5 Metadata

File6 Metadata

File Blocks Mappings

File7 Metadata

File8 Metadata

Distributed Database

HopsFS Architecture

�1111

NameNode

File Blocks Mappings

File1 Metadata

File2 Metadata

File Blocks Mappings

File3 Metadata

File4 Metadata

File Blocks Mappings

File5 Metadata

File6 Metadata

File Blocks Mappings

File7 Metadata

File8 Metadata

Distributed Database

HopsFS Architecture

�1111

NameNode

File Blocks Mappings

File1 Metadata

File2 Metadata

File Blocks Mappings

File3 Metadata

File4 Metadata

File Blocks Mappings

File5 Metadata

File6 Metadata

File Blocks Mappings

File7 Metadata

File8 Metadata

Distributed Database

HopsFS Architecture

�1111

NameNode

File Blocks Mappings

File1 Metadata

File2 Metadata

File Blocks Mappings

File3 Metadata

File4 Metadata

File Blocks Mappings

File5 Metadata

File6 Metadata

File Blocks Mappings

File7 Metadata

File8 Metadata

Distributed Database

HopsFS Architecture

�1111

NameNode

File Blocks Mappings

File1 Metadata

File2 Metadata

File Blocks Mappings

File3 Metadata

File4 Metadata

File Blocks Mappings

File5 Metadata

File6 Metadata

File Blocks Mappings

File7 Metadata

File8 Metadata

Distributed Database

HopsFS Scalability

• 16X-37X the throughput of HDFS

• 37 times more files than HDFS

• 10 times lower latency

�12

Scale Challenge Winner (2017)Hops

Integration with NVMe

�13

https://cloud.google.com/compute/docs/disks/performance

Integration with NVMe

�13

https://cloud.google.com/compute/docs/disks/performance

Integration with NVMe

�13

https://cloud.google.com/compute/docs/disks/performance

HDFS (and S3) are designed around large blocks (optimized to overcome slow random I/O on disks), while new NVMe hardware supports fast random disk I/O (and potentially small blocks sizes)

Small files

�14

Small files

�14

0

0.2

0.4

0.6

0.8

1

1 KB

4 KB

5 KB

6 KB

8 KB

16 KB

32 KB

64 KB

100 KB

512 KB

1 MB

8 MB

64 MB

256 MB

1 GB

128 GB

CDF

File Size

a. File Size Distribution

Yahoo HDFS File DistributionSpotify HDFS File Distribution

LC HopsFS File Distribution At Yahoo! and Spotify ≈20% of the files are less than 4 KB. Logical Clocks’ HopsFS cluster ≈68% of the files are less than 4 KB

Small files

�14

0

0.2

0.4

0.6

0.8

1

1 KB

4 KB

5 KB

6 KB

8 KB

16 KB

32 KB

64 KB

100 KB

512 KB

1 MB

8 MB

64 MB

256 MB

1 GB

128 GB

CDF

File Size

a. File Size Distribution

Yahoo HDFS File DistributionSpotify HDFS File Distribution

LC HopsFS File Distribution At Yahoo! and Spotify ≈20% of the files are less than 4 KB. Logical Clocks’ HopsFS cluster ≈68% of the files are less than 4 KB

0

0.2

0.4

0.6

0.8

1

1 KB

4 KB

5 KB

6 KB

8 KB

16 KB

32 KB

64 KB

100 KB

512 KB

1 MB

8 MB

64 MB

256 MB

1 GB

128 GB

CDF

File Size

b. File Operations Distribution

Spotify HDFS File Ops DistributionLC HopsFS File Ops Distribution At Spotify, and Logical

Clocks ≈ 42% and ≈18% of all the file system operations are performed on files less than 16 KB files

Size Matters

�15

Small Files performance in HopsFS

�16

Open Images dataset

�17

83.5% of the files in the dataset are ⩽ 64 KB.

Open Images Dataset

�18

Requirements

•Reading/Writing millions of images with high throughput

• Attaching annotations to each image, and then searching using these annotations

�19

Attaching Extended Metadata

�20

Attaching Extended Metadata

�20

Attaching Extended Metadata

�20

Attaching Extended Metadata

�20

Foreign key

Attaching Extended Metadata

�20

attach /images/1.jpeg ’1 cat and 1 guitar’

Foreign key

Attaching Extended Metadata

�20

attach /images/1.jpeg ’1 cat and 1 guitar’

Foreign key

Attaching Extended Metadata

�20

attach /images/1.jpeg ’1 cat and 1 guitar’

Foreign key

Free text search?

HopsFS | ElasticSearch

�21

HopsFS | ElasticSearch

�21

HopsFS

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1.jpeg

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1 Dog

1.jpeg

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1 Dog

1.jpeg

dog [1.jpeg,……]

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1 Dog

1.jpeg

dog [1.jpeg,……]

Get All images that has a dog

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1 Dog

1.jpeg

dog [1.jpeg,……]

Get All images that has a dog

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1 Dog1 Cat and 1 Guitar

1.jpeg

dog [1.jpeg,……]

Get All images that has a dog

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1 Dog1 Cat and 1 Guitar

1.jpeg

dog [1.jpeg,……]

Get All images that has a dog

?

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1 Dog1 Cat and 1 Guitar

1.jpeg

dog [1.jpeg,……]

Get All images that has a dog

?

Store X

HopsFS | ElasticSearch

�21

HopsFS ElasticSearch

1 Dog1 Cat and 1 Guitar

1.jpeg

dog [1.jpeg,……]

Get All images that has a dog

?

Store X

ePipe

�22

ePipe

ePipe

�22

HopsFS

ePipe

ePipe

�22

HopsFS

NDB ePipe

ePipe

�22

HopsFS

NDB ePipe

ePipe

�22

HopsFS

NDB

Log fs changes

ePipe

ePipe

�22

HopsFS

NDB

Log fs changes

ChangeStream ePipe

ePipe

�22

HopsFS

NDB

Log fs changes

ChangeStream ePipe

ePipe

�22

HopsFS

NDB

Log fs changes

ChangeStream ePipe

ElasticSearch

ePipe

�22

HopsFS

NDB

Log fs changes

ChangeStream ePipe

ElasticSearch

Store X

Store Y

ePipe

�22

HopsFS

NDB

Log fs changes

ChangeStream ePipe

ElasticSearch

Store X

Store Y

ePipe

�22

HopsFS

NDB

Log fs changes

ChangeStream ePipe

ElasticSearch

Store X

Store Y

App A

App B

ePipe

�22

HopsFS

NDB

Log fs changes

ChangeStream ePipe

ElasticSearch

Store X

Store Y

App A

App B

ePipe

�23

HopsFS

NDBePipe

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1

Epoch1

Create f1Append f1

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1

Epoch1

Create f1Append f1

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1

Create f1Append f1

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2

Create f1Append f1

Create f2Delete f1

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2

Create f1Append f1

Create f2Delete f1

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2

Create f1Append f1

Create f2Delete f1

Delete f2

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

ePipe

�23

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

NDB Ordering Properties

• Property 1: epochs are totally ordered.

• Property 2: Changes within the same transaction happen in the same epoch.

• Property 3: Changes on files are ordered only if they are in different epochs, that is, no ordering is guaranteed within the same epoch

�24

Adding version numbers

�25

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

Adding version numbers

�25

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

, 1

, 1

Adding version numbers

�25

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

, 1, 2

, 1, 2

Adding version numbers

�25

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

, 1, 2

, 1, 2

Adding version numbers

�25

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

, 1, 2

, 1, 3

, 1, 2

, 1, 3

Adding version numbers

�25

HopsFS

NDB

Create f1

ePipe

Append f1Create f2Delete f1

Epoch1Epoch2Epoch3

Create f1Append f1

Create f2Delete f1

Delete f2

Delete f2

Order?Order?

, 1, 2

, 1, 3, 2

, 1, 2

, 1, 3

, 2

ePipe Ordering Properties

•Property 4 & 5: Version number ensures serializability of changes on the same file/directory within epochs.

•Property 6: The order of changes for different files/directories within the same epoch doesn't matter.

�26

ePipe

• Low replication lag (~100msec)

• High throughput

�27

Requirements

• Reading/Writing millions of images with high throughput

• Attaching annotations to each image, and then searching using these annotations

�28

Questions?

�29