Efficient Shared Data in Perl
Perrin Harkins
What’s your problem?
• Apache is multi-process
• Process assignment is random
• Information wants to be shared
• Inter-process data sharing is ad hoc
Sharing is good for
• Sessions
• Caching
• Usually transient data
• Otherwise, use an RDBMS
Approaches
• Files
  – One big file
  – One file per record
• DBM
• Shared memory
  – Seems like the obvious choice, but…
• RDBMS
Playing well together
• Atomic updates
  – Prevents corruption
• Exclusive locking
  – Prevents lost updates
  – Without this, last save wins
[Figure: lost-update example. Two processes, Blossom and Buttercup, update a shared PerlFund balance. Values shown: $100, $105, $2100, $100.]
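A minimal sketch of the two rules above, using only core Perl modules. The helper name add_to_balance and the file layout are assumptions made for illustration, not code from the talk. An exclusive flock on a separate lock file prevents lost updates, and writing to a temporary file followed by rename keeps each update atomic.

```perl
use strict;
use warnings;
use Fcntl qw(:flock);
use File::Temp qw(tempdir);

# Hypothetical helper: add $amount to the number stored in $file while
# holding an exclusive lock, so concurrent writers cannot lose updates.
sub add_to_balance {
    my ($file, $amount) = @_;

    # Exclusive lock on a separate lock file; other processes block here.
    open(my $lock, '>>', "$file.lock") or die "Cannot open lock file: $!";
    flock($lock, LOCK_EX) or die "Cannot lock: $!";

    # Read the current value (0 if the file does not exist yet).
    my $balance = 0;
    if (open(my $in, '<', $file)) {
        my $line = <$in>;
        close($in);
        if (defined $line) {
            chomp $line;
            $balance = $line;
        }
    }
    $balance += $amount;

    # Atomic update: write a temporary file, then rename it into place,
    # so readers never see a half-written file.
    my $tmp = "$file.tmp.$$";
    open(my $out, '>', $tmp) or die "Cannot write $tmp: $!";
    print {$out} "$balance\n";
    close($out) or die "Cannot close $tmp: $!";
    rename($tmp, $file) or die "Cannot rename $tmp: $!";

    close($lock);    # closing the handle releases the lock
    return $balance;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/perlfund";
add_to_balance($file, 100);
add_to_balance($file, 5);
my $final = add_to_balance($file, 2000);
print "Final balance: $final\n";
```

Without the lock, two processes could both read the same starting balance and the later save would overwrite the earlier one.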
Cache::Cache
• Consistent interface to multiple storage methods
  – File system
  – Shared memory via IPC::ShareLite
• Many cache-related features built in
  – Expiration times
  – Size limit
  – Multiple namespaces
Cache::Cache, continued
• Atomic updates
• Easy to install
  – No compiler needed for file-based storage
• Benchmarks are on backend storage classes
  – Cache::FileBackend, not Cache::FileCache
Cache::Mmap
• Uses one big mmap’ed file
• Many tuning options
  – Size of blocks
  – Size of locking regions
• Optimization for scalar data
• Uses locks internally
• Requires compiler
MLDBM::Sync
• Extension of MLDBM
  – Originally developed for Apache::ASP
  – Uses lock file, tie/untie
• Choice of DBM types
  – SDBM is fastest, but limited
• Tied interface
• Locks on entire database
• Explicit locking in API
• Can run with standard library
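The "lock file, tie/untie" pattern can be sketched with core modules alone. This is an illustration of the general idea, not MLDBM::Sync's actual code: the helper with_locked_dbm and the file names are assumptions. Each access takes an exclusive lock on a lock file, ties the DBM file, does its work, and unties before the lock is released, which is also why the lock covers the entire database.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my $db  = "$dir/hits";

# Hypothetical helper: tie the DBM file under an exclusive lock, run the
# callback against the tied hash, then untie before releasing the lock.
sub with_locked_dbm {
    my ($path, $code) = @_;

    open(my $lock, '>>', "$path.lock") or die "Cannot open lock file: $!";
    flock($lock, LOCK_EX) or die "Cannot lock: $!";

    tie(my %h, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666)
        or die "Cannot tie $path: $!";
    my $result = $code->(\%h);
    untie %h;        # close the DBM file so the next process sees fresh data

    close($lock);    # closing the handle releases the lock
    return $result;
}

# Three read-modify-write cycles, each one fully inside the lock.
with_locked_dbm($db, sub { $_[0]{home}++ }) for 1 .. 3;

my $hits = with_locked_dbm($db, sub { $_[0]{home} });
print "home page hits: $hits\n";
```

Re-tying on every access costs time, but it means no process holds a stale view of the file between requests.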
BerkeleyDB
• Not DB_File, BerkeleyDB.pm
• Requires Berkeley DB library from sleepycat.com
• Tricky to install on some systems
• Tied or OO interface
• No built-in support for complex data structures
• Locks on entire database or on pages
• Supports transactions
• Shared memory cache
• Tests are on BTree
IPC::MM
• Interface for Engelschall’s mm
• Implements shared BTree and Hash in C
• Tied interface
• Data is not persistent
• Only shares between related processes
Tie::TextDir
• Dirt-simple: one record per file
• Keys must be legal file names
• No compiler needed
• Doesn’t handle complex data structures
IPC::Shareable
• Very Perlish and transparent
• Shared memory
• Lots going on under the hood
• Explicit locking supported
• Tied interface
• Requires a compiler
DBD::SQLite
• Fast, single-file SQL engine in a DBD
• Full transaction support!
• Locking between processes at database level
DBD::mysql
• Adds network capabilities
• Atomic updates or transactions
• More work than most to set up
memcached
• Networked daemon
• Intended for clusters
• Non-blocking I/O
• Clients for Perl, PHP, Java
• Requires a Linux kernel patch, until 2.6 is out
Testing Methodology
• P4 2.53 GHz, 512MB RAM, Red Hat 9, ext3, Perl 5.8.0
• Abstraction layer IPC::SharedHash
  – Implements new(), fetch(), store()
  – Handles serialization where necessary
  – Calls FETCH() and STORE() instead of using tied interface
• mod_perl handler
• ab (Apache Bench)
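An abstraction layer of this shape might look like the sketch below. The package name My::SharedHash is hypothetical and this is not the IPC::SharedHash used in the tests. It assumes an SDBM_File backend (chosen only because it ships with Perl), calls FETCH() and STORE() as methods on the backend object rather than going through a tied hash, and uses Storable for serialization. Locking is left out to keep the example short.

```perl
package My::SharedHash;    # hypothetical name, for illustration only
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT);
use SDBM_File;
use Storable qw(freeze thaw);

# new(file => $path): open the backend directly, without tie().
sub new {
    my ($class, %args) = @_;
    my $backend = SDBM_File->TIEHASH($args{file}, O_RDWR | O_CREAT, 0666)
        or die "Cannot open $args{file}: $!";
    return bless { backend => $backend }, $class;
}

# store($key, $value): serialize, then call STORE() as a plain method.
sub store {
    my ($self, $key, $value) = @_;
    $self->{backend}->STORE($key, freeze(\$value));
    return;
}

# fetch($key): call FETCH() as a plain method, then deserialize.
sub fetch {
    my ($self, $key) = @_;
    my $frozen = $self->{backend}->FETCH($key);
    return defined $frozen ? ${ thaw($frozen) } : undef;
}

package main;
use File::Temp qw(tempdir);

my $dir   = tempdir(CLEANUP => 1);
my $store = My::SharedHash->new(file => "$dir/shared");

$store->store(greeting => 'hello');
$store->store(session  => { user => 'Buttercup', cart => [ 1, 2, 3 ] });

my $greeting = $store->fetch('greeting');
my $session  = $store->fetch('session');
print "$greeting, $session->{user}: @{ $session->{cart} }\n";
```

Freezing a reference to the value lets the same two methods handle plain scalars and nested structures, so the benchmark code does not need to know which backends serialize natively.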
Variables
• Number of parallel clients
• Percentage of writes
  – Sessions can have a lot of writes
  – Caches are mostly read, by definition
• Locality of access
• Scalars vs. complex data
Read-Only Sharing
[Bar chart: reqs/sec, axis 0 to 500. Modules: Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB, IPC::MM.]
Effect of Increasing Clients
[Bar chart: reqs/sec, axis 0 to 500, with series labeled 30, 10, and 1. Modules: IPC::MM, Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB.]
Effect of Read/Write Ratio
[Bar chart: reqs/sec, axis 0 to 500, with series labeled 0%, 10%, and 100%. Modules: IPC::MM, Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB.]
Scalars vs. Complex Data Structures
[Bar chart: reqs/sec, axis 0 to 500, with series labeled Scalar and Complex. Modules: IPC::MM, Cache::FileBackend, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB.]
Latest Results
[Bar chart: write/read accesses per second, axis 0 to 150. Modules, in the order listed: BerkeleyDB, IPC::MM, memcached, DBD::mysql (local), DBD::mysql, Cache::Mmap, DBD::SQLite, Tie::TextDir, MLDBM::Sync::SDBM, Cache::FileBackend, IPC::Shareable, Cache::SharedMemoryBackend.]
Analysis
• Why is shared memory so slow?
  – Still has to serialize
  – Moving too much data at once
• What about IPC::MM?
  – Moves one at a time
  – Moving parts are in C
• Why is the file system so fast?
  – Modern VM system
  – Kernel-managed caching
Analysis, continued
• Why is Tie::TextDir faster than Cache::FileBackend?
  – Digest::SHA1
  – Splitting into multiple directories not normally necessary on modern filesystems: /mu/lt/ip/ledirs
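The difference between the two layouts can be sketched as follows. The function names flat_path and hashed_path are hypothetical, and the exact directory scheme is an assumption rather than Cache::FileBackend's real layout. The core module Digest::SHA stands in here for Digest::SHA1. The flat layout uses the key as the file name, while the hashed layout pays for a digest and extra directory levels on every access.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);
use File::Spec;

# Flat layout: the key is used directly as the file name,
# so keys must be legal file names.
sub flat_path {
    my ($root, $key) = @_;
    return File::Spec->catfile($root, $key);
}

# Hashed layout (assumed scheme): hash the key, then use the leading
# pairs of hex digits as nested directory names.
sub hashed_path {
    my ($root, $key, $depth) = @_;
    my $digest = sha1_hex($key);
    my @dirs   = map { substr($digest, 2 * $_, 2) } 0 .. $depth - 1;
    return File::Spec->catfile($root, @dirs, $digest);
}

my $flat   = flat_path('/tmp/cache', 'user42');
my $hashed = hashed_path('/tmp/cache', 'user42', 3);
print "$flat\n$hashed\n";
```

Hashing makes any string usable as a key, which the flat layout cannot offer, so the extra cost buys generality rather than speed.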
Problems with this test
• Size of values not considered
• Size of overall hash not considered correctly
• BerkeleyDB should be tested with fancier lock mode
• Needs a real network test for memcached and MySQL
• Should try harder to reduce margin of error
A Word About Clustering
• Shared filesystems
  – NFS
  – Samba/CIFS
• RDBMS
  – Most reliable, well understood, easy integration
• Replicated data
  – Multicast
  – Spread
What about threads?
• Apache 2/mod_perl 2/Perl 5.8 bring threads to the table
• Still not clear how this will work with complex data structures and objects
• Threaded performance is mostly bad in 5.8
Questions to help you choose
• Do you need to store complex data?
  – BerkeleyDB, Tie::TextDir, and IPC::MM require a wrapper for this
• Are your keys valid filenames?
  – Tie::TextDir does not hash the keys
• Do you need persistence?
  – IPC::MM is not persistent
• Do you need explicit locking?
  – MLDBM::Sync, MySQL, BerkeleyDB
Questions to help you choose, continued
• No compiler?
  – Cache::FileBackend, Tie::TextDir, MLDBM::Sync if you have Storable
• Need clustering?
  – DBD::mysql, memcached