Dbfs - Database filesystem

42
Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature Dbfs - Database filesystem 1 Timo Minartz Software project WS 2008/09 April 6, 2009 1 supervised by Julian Kunkel 1 / 25

Transcript of Dbfs - Database filesystem

Page 1: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Dbfs - Database filesystem 1

Timo Minartz

Software project WS 2008/09

April 6, 2009

1supervised by Julian Kunkel1 / 25

Page 2: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Inhalt

1 Concept and problem case

2 Software design

3 Implementation

4 Benchmarks

5 Conclusion and future work

6 Literature

2 / 25

Page 3: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Project goal

Problem case specific

• map filesystem sources and database tables in one namespace

• implement a lightweight filesystem with FUSE [Sou]

• easy to maintain database design

• minimize database overhead

General

• reusable software

• well documented

• usability

3 / 25

Page 4: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Project goal

Problem case specific

• map filesystem sources and database tables in one namespace

• implement a lightweight filesystem with FUSE [Sou]

• easy to maintain database design

• minimize database overhead

General

• reusable software

• well documented

• usability

3 / 25

Page 5: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Problem case

Initial situation

• a microscope generates lots of data in a specific folderhierarchy

• in particular it creates a tiff-File with a size of a few MByte

• this tiff-File is identicated by a collaboration, project, plate,replicate, well and file name

• there are multiple collaborations, projects, etc. so lots oftiff-Files are created

Further situation

• tiff-Files should be evaluated by different applications

• these applications store their results in simple files

• it should be easy to manage these files (i.e. by a databasesystem)

4 / 25

Page 6: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Problem case

Initial situation

• a microscope generates lots of data in a specific folderhierarchy

• in particular it creates a tiff-File with a size of a few MByte

• this tiff-File is identicated by a collaboration, project, plate,replicate, well and file name

• there are multiple collaborations, projects, etc. so lots oftiff-Files are created

Further situation

• tiff-Files should be evaluated by different applications

• these applications store their results in simple files

• it should be easy to manage these files (i.e. by a databasesystem)

4 / 25

Page 7: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Problem case (2)

Initial filestructure (base filesystem)

/collaboration/project/plate/replicate/well-file.tiff

Resulting filestructure (fuse filesystem, dbfs)

/collaboration/project/application/plate/replicate/well/file.tiff

5 / 25

Page 8: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Problem case (2)

Initial filestructure (base filesystem)

/collaboration/project/plate/replicate/well-file.tiff

Resulting filestructure (fuse filesystem, dbfs)

/collaboration/project/application/plate/replicate/well/file.tiff

5 / 25

Page 9: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Example

Base filesystem structure

/collab0/project0/plate0/replicate0/000-file1.tiff/collab0/project0/plate0/replicate0/000-file2.tiff/collab0/project0/plate0/replicate0/001-file3.tiff/collab0/project0/plate0/replicate0/metadata

Dbfs filestructure

/collab0/project0/application0/plate0/replicate0/000/file1.tiff/collab0/project0/application0/plate0/replicate0/000/file2.tiff/collab0/project0/application0/plate0/replicate0/001/file3.tiff/collab0/project0/application0/plate0/replicate0/metadata

6 / 25

Page 10: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Example

Base filesystem structure

/collab0/project0/plate0/replicate0/000-file1.tiff/collab0/project0/plate0/replicate0/000-file2.tiff/collab0/project0/plate0/replicate0/001-file3.tiff/collab0/project0/plate0/replicate0/metadata

Dbfs filestructure

/collab0/project0/application0/plate0/replicate0/000/file1.tiff/collab0/project0/application0/plate0/replicate0/000/file2.tiff/collab0/project0/application0/plate0/replicate0/001/file3.tiff/collab0/project0/application0/plate0/replicate0/metadata

6 / 25

Page 11: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Virtual files examples

Dbfs filesystem

/collaboration0/project0/application0/plate0/replicate0/000/ergs/collaboration0/project0/application0/plate0/replicate0/001/ergs

• virtual files are stored in database

• virtual files are identificated by collaboration, project, plate,replicate, well, file name AND application

7 / 25

Page 12: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Virtual files examples

Dbfs filesystem

/collaboration0/project0/application0/plate0/replicate0/000/ergs/collaboration0/project0/application0/plate0/replicate0/001/ergs

• virtual files are stored in database

• virtual files are identificated by collaboration, project, plate,replicate, well, file name AND application

7 / 25

Page 13: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Further constraints

Virtualization layers

• one for the application and

• one for the well

Permissions

• only read permission to tiff-Files

• permissions for metadata files inherited from base filesystem

• read and write permissions to virtual files on application level

• no structural changes allowed (chmod,mkdir,. . . )

8 / 25

Page 14: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Further constraints

Virtualization layers

• one for the application and

• one for the well

Permissions

• only read permission to tiff-Files

• permissions for metadata files inherited from base filesystem

• read and write permissions to virtual files on application level

• no structural changes allowed (chmod,mkdir,. . . )

8 / 25

Page 15: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Virtual files model

• table for every application

• table has columns for every subfolder and one for every virtualfile

Table: Example database table collaboration0 project0 application0

plate replicate well ergs

plate0 replicate0 000 “ergs for well 000”plate0 replicate0 001 “ergs for well 001”

9 / 25

Page 16: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Virtual files model

• table for every application

• table has columns for every subfolder and one for every virtualfile

Table: Example database table collaboration0 project0 application0

plate replicate well ergs

plate0 replicate0 000 “ergs for well 000”plate0 replicate0 001 “ergs for well 001”

9 / 25

Page 17: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Permissions model

• permissions on project level

• second table for permissions

• containing one column for application and one for the owner(user id from operating system)

Table: Example permission table permissions collaboration0 project0

name owner

application0 1000application1 1001

10 / 25

Page 18: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Permissions model

• permissions on project level

• second table for permissions

• containing one column for application and one for the owner(user id from operating system)

Table: Example permission table permissions collaboration0 project0

name owner

application0 1000application1 1001

10 / 25

Page 19: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Managing the directory structure

General

• changes in the base filesystem

• and in the database tables (i.e. new virtual files)

Howto

• “by hand”, see documentation and/or README file

• using a simple GUI

11 / 25

Page 20: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Managing the directory structure

General

• changes in the base filesystem

• and in the database tables (i.e. new virtual files)

Howto

• “by hand”, see documentation and/or README file

• using a simple GUI

11 / 25

Page 21: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Managing the directory structure (2)

Figure: Graphical user interface to manage the directory structure12 / 25

Page 22: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Optimizations and restrictions

Database overhead

• multiple users who need own database connections

• lots of queries are generated for a simple command (like ls)

Optimization

• thread-safe database pooling

• simple caching for query results

• both can be enabled in the sourcecode

Restrictions

• cache consistency problem

• if underlying base filesystem changes (creating new(sub-)folders etc.)

13 / 25

Page 23: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Optimizations and restrictions

Database overhead

• multiple users who need own database connections

• lots of queries are generated for a simple command (like ls)

Optimization

• thread-safe database pooling

• simple caching for query results

• both can be enabled in the sourcecode

Restrictions

• cache consistency problem

• if underlying base filesystem changes (creating new(sub-)folders etc.)

13 / 25

Page 24: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Optimizations and restrictions

Database overhead

• multiple users who need own database connections

• lots of queries are generated for a simple command (like ls)

Optimization

• thread-safe database pooling

• simple caching for query results

• both can be enabled in the sourcecode

Restrictions

• cache consistency problem

• if underlying base filesystem changes (creating new(sub-)folders etc.)

13 / 25

Page 25: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Implementation in C++

Implemented classes can be spread into 4 modules

• handling filesystem issues

• database access

• GUI and

• the helper classes and functions

Implemented filesystem operations

• gettattr

• readdir

• read and

• write

14 / 25

Page 26: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Implementation in C++

Implemented classes can be spread into 4 modules

• handling filesystem issues

• database access

• GUI and

• the helper classes and functions

Implemented filesystem operations

• gettattr

• readdir

• read and

• write

14 / 25

Page 27: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Implementation in C++ (2)

Further implementation details

• documentation (PDF)

• in-line documentation (doxygen)

• type make doc in software project root

15 / 25

Page 28: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

FUSE stumbling blocks

Mounting fuse without administrative privileges

• mount: ./dbfs mountpoint [args]

• umount: fusermount -u mountpoint

Logging

• fuse forks a new process, so logging to stdout is not possible

• the parameter -f prevents fuse from forking

• alternative: logging to a file (implemented)

Debugging with valgrind

• problem with older kernel versions: fusermount not traceable

• workaround available: see README in project root

• with kernel 2.6.27-11-generic working out-of-the-box

16 / 25

Page 29: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

FUSE stumbling blocks

Mounting fuse without administrative privileges

• mount: ./dbfs mountpoint [args]

• umount: fusermount -u mountpoint

Logging

• fuse forks a new process, so logging to stdout is not possible

• the parameter -f prevents fuse from forking

• alternative: logging to a file (implemented)

Debugging with valgrind

• problem with older kernel versions: fusermount not traceable

• workaround available: see README in project root

• with kernel 2.6.27-11-generic working out-of-the-box

16 / 25

Page 30: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

FUSE stumbling blocks

Mounting fuse without administrative privileges

• mount: ./dbfs mountpoint [args]

• umount: fusermount -u mountpoint

Logging

• fuse forks a new process, so logging to stdout is not possible

• the parameter -f prevents fuse from forking

• alternative: logging to a file (implemented)

Debugging with valgrind

• problem with older kernel versions: fusermount not traceable

• workaround available: see README in project root

• with kernel 2.6.27-11-generic working out-of-the-box

16 / 25

Page 31: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

The benchmark process

Testsets

• comparision of Dbfs and tmpfs

• evaluation of Dbfs• tmpfs as base filesystem• ext3 / tmpfs filesystem for the mysql database• clean / dirty database

Different use cases

• reading filesystem attributes

• reading metadata files and tiff-Files

• reading virtual files

• writing virtual files

17 / 25

Page 32: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

The benchmark process

Testsets

• comparision of Dbfs and tmpfs

• evaluation of Dbfs• tmpfs as base filesystem• ext3 / tmpfs filesystem for the mysql database• clean / dirty database

Different use cases

• reading filesystem attributes

• reading metadata files and tiff-Files

• reading virtual files

• writing virtual files

17 / 25

Page 33: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Metadata

Figure: Reading filesystem attributes18 / 25

Page 34: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Physical files

Figure: Read test for the physical files depending on blocksize19 / 25

Page 35: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Physical files (2)

Figure: Read test for the physical files, time for reading one byte

.

20 / 25

Page 36: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Physical files (3)

Figure: Read test for the physical files, bytes per sec

.

21 / 25

Page 37: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Virtual files

Figure: Read test for virtual files22 / 25

Page 38: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Virtual files (2)

Figure: Write test for virtual files23 / 25

Page 39: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Virtual files (3)

Figure: Read and write for virtual files24 / 25

Page 40: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Conclusion

Software project goal

• mapping filesystem and database sources in one namespacecan be solved by a fuse implementation

• good performance for physical files (stored on underlyingfilesystem)

• bottleneck for virtual files is not the database access itself

• concrete use case must take decision about using thisimplementation

Future work

• implementation issues (sql injection, dynamic virtualizationlayers)

25 / 25

Page 41: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

Conclusion

Software project goal

• mapping filesystem and database sources in one namespacecan be solved by a fuse implementation

• good performance for physical files (stored on underlyingfilesystem)

• bottleneck for virtual files is not the database access itself

• concrete use case must take decision about using thisimplementation

Future work

• implementation issues (sql injection, dynamic virtualizationlayers)

25 / 25

Page 42: Dbfs - Database filesystem

Concept and problem case Software design Implementation Benchmarks Conclusion and future work Literature

ROFS, the Read-Only Filesystem for FUSE.http://mattwork.potsdam.edu/projects/wiki/index.

php/Rofs

IEEE, The ; Group, The O.:The Open Group Base Specifications Issue 6.http://www.opengroup.org/onlinepubs/009695399/

functions/contents.html

Microsystems, Sun:MySQL 6.0 Reference Manual.http://dev.mysql.com/doc/refman/6.0/en/index.html

Sourceforge.net:Main Page - fuse.http://apps.sourceforge.net/mediawiki/fuse/index.

php?title=Main_Page

25 / 25