Efficient Shared Data in Perl

Perrin Harkins

What’s your problem?

• Apache is multi-process

• Process assignment is random

• Information wants to be shared

• Inter-process data sharing is ad hoc

Sharing is good for

• Sessions

• Caching

• Usually transient data

• Otherwise, use an RDBMS

Approaches

• Files
  – One big file
  – One file per record

• DBM

• Shared memory
  – Seems like the obvious choice, but…

• RDBMS

Playing well together

• Atomic updates
  – Prevents corruption

• Exclusive locking
  – Prevents lost updates
  – Without this, last save wins
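One common way to get atomic updates on a file-backed store is to write the new contents to a temporary file in the same directory and then rename() it over the target, so readers see either the old record or the new one, never a partial write. The sketch below assumes a POSIX filesystem; the helper names `atomic_write` and `slurp` are illustrative and do not come from any of the modules discussed here.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile tempdir);
use File::Basename qw(dirname);

# Hypothetical helper: replace the contents of $path with $data in one step.
sub atomic_write {
    my ($path, $data) = @_;

    # The temp file must be on the same filesystem as the target,
    # otherwise rename() cannot be atomic.
    my ($fh, $tmp) = tempfile('tmpXXXXXX', DIR => dirname($path));
    binmode $fh;
    print {$fh} $data  or die "write failed: $!";
    close $fh          or die "close failed: $!";

    rename $tmp, $path or die "rename failed: $!";
    return 1;
}

# Hypothetical helper: read a whole file back.
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "open failed: $!";
    binmode $fh;
    local $/;
    return scalar <$fh>;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/record";
atomic_write($file, "first version");
atomic_write($file, "second version");
print slurp($file), "\n";
```

Atomic replacement alone only prevents corruption. Two writers can still overwrite each other's changes, which is what exclusive locking addresses.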

[Diagram: PerlFund account with Blossom and Buttercup; amounts shown: $100, $105, $2100, $100]
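A minimal sketch of how an exclusive lock prevents lost updates: each writer holds the lock for the entire read-modify-write cycle, so no other process can read a stale balance in between. It assumes a local filesystem where flock() works across processes. The helper name `locked_add` and the file layout (one number in a file) are illustrative assumptions.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);
use File::Temp qw(tempdir);
use POSIX ();

# Hypothetical helper: add $amount to the number stored in $path while
# holding an exclusive lock for the whole read-modify-write cycle.
sub locked_add {
    my ($path, $amount) = @_;

    open my $fh, '+<', $path or die "open failed: $!";
    flock $fh, LOCK_EX       or die "flock failed: $!";

    my $balance = <$fh>;
    $balance = 0 unless defined $balance;
    chomp $balance;
    $balance += $amount;

    seek $fh, 0, SEEK_SET    or die "seek failed: $!";
    truncate $fh, 0          or die "truncate failed: $!";
    print {$fh} "$balance\n";
    close $fh                or die "close failed: $!";   # releases the lock
    return $balance;
}

my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/balance";
open my $init, '>', $file or die "open failed: $!";
print {$init} "100\n";
close $init or die "close failed: $!";

# Four child processes each add 1 to the balance 25 times.
my @pids;
for (1 .. 4) {
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        locked_add($file, 1) for 1 .. 25;
        POSIX::_exit(0);   # skip END blocks and temp-dir cleanup in the child
    }
    push @pids, $pid;
}
waitpid $_, 0 for @pids;

my $final = locked_add($file, 0);
print "final balance: $final\n";
```

Without the flock() call, two processes could read the same balance and the later save would silently discard the earlier one.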

Cache::Cache

• Consistent interface to multiple storage methods
  – File system
  – Shared memory via IPC::ShareLite

• Many cache-related features built in
  – Expiration times
  – Size limit
  – Multiple namespaces

Cache::Cache, continued

• Atomic updates

• Easy to install
  – No compiler needed for file-based storage

• Benchmarks are on backend storage classes
  – Cache::FileBackend not Cache::FileCache

Cache::Mmap

• Uses one big mmap’ed file

• Many tuning options
  – Size of blocks
  – Size of locking regions

• Optimization for scalar data

• Uses locks internally

• Requires compiler

MLDBM::Sync

• Extension of MLDBM
  – Originally developed for Apache::ASP
  – Uses lock file, tie/untie

• Choice of DBM types
  – SDBM is fastest, but limited

• Tied interface

• Locks on entire database

• Explicit locking in API

• Can run with standard library
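The "lock file, tie/untie" pattern can be sketched with core modules only: take a lock on a separate lock file, tie the DBM just for the duration of the operation, then untie before releasing the lock so that every process sees flushed data. This illustrates the general idea and is not MLDBM::Sync's actual code. The helper name `with_locked_dbm` and the file names are assumptions.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical helper: run $code against the DBM hash while holding a lock
# on a separate lock file. The DBM is tied only for the duration of the call.
sub with_locked_dbm {
    my ($base, $mode, $code) = @_;

    open my $lock, '>>', "$base.lock" or die "lock open failed: $!";
    flock $lock, $mode                or die "flock failed: $!";

    tie my %db, 'SDBM_File', $base, O_RDWR | O_CREAT, 0666
        or die "tie failed: $!";
    my $result = $code->(\%db);
    untie %db;      # flush to disk before the lock is released

    close $lock;    # closing the handle releases the lock
    return $result;
}

my $dir  = tempdir(CLEANUP => 1);
my $base = "$dir/sessions";

# Exclusive lock for a write, shared lock for a read.
with_locked_dbm($base, LOCK_EX, sub { $_[0]{user42} = 'logged_in'; 1 });
my $state = with_locked_dbm($base, LOCK_SH, sub { $_[0]{user42} });
print "user42: $state\n";
```

Because the lock covers the whole database, writers serialize behind each other, which matches the "locks on entire database" trade-off noted above.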

BerkeleyDB

• Not DB_File, BerkeleyDB.pm

• Requires Berkeley DB library from sleepycat.com

• Tricky to install on some systems

• Tied or OO interface

• No built-in support for complex data structures

• Locks on entire database or on pages

• Supports transactions

• Shared memory cache

• Tests are on BTree

IPC::MM

• Interface for Engelschall’s mm

• Implements shared BTree and Hash in C

• Tied interface

• Data is not persistent

• Only shares between related processes

Tie::TextDir

• Dirt-simple: one record per file

• Keys must be legal file names

• No compiler needed

• Doesn’t handle complex data structures
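The one-record-per-file idea can be sketched in a few lines of core Perl. This is an illustration of the approach, not Tie::TextDir's implementation: the class name `Sketch::FilePerRecord` is hypothetical, and the sketch ignores locking and atomic replacement. It does show why keys must be legal file names and why only flat strings can be stored.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

package Sketch::FilePerRecord;

sub new {
    my ($class, $dir) = @_;
    return bless { dir => $dir }, $class;
}

# The key is used directly as the file name, so reject anything
# that cannot be a plain file name inside the directory.
sub _path {
    my ($self, $key) = @_;
    die "illegal key '$key'\n"
        if $key eq '' || $key eq '.' || $key eq '..' || $key =~ m{[/\0]};
    return "$self->{dir}/$key";
}

sub store {
    my ($self, $key, $value) = @_;
    open my $fh, '>', $self->_path($key) or die "open failed: $!";
    print {$fh} $value;
    close $fh or die "close failed: $!";
    return $value;
}

sub fetch {
    my ($self, $key) = @_;
    open my $fh, '<', $self->_path($key) or return undef;
    local $/;                 # slurp the whole record
    my $value = <$fh>;
    return $value;
}

package main;

my $store = Sketch::FilePerRecord->new(tempdir(CLEANUP => 1));
$store->store(color => 'blue');
print $store->fetch('color'), "\n";
```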

IPC::Shareable

• Very Perlish and transparent

• Shared memory

• Lots going on under the hood

• Explicit locking supported

• Tied interface

• Requires a compiler

DBD::SQLite

• Fast, single-file SQL engine in a DBD

• Full transaction support!

• Locking between processes at database level

DBD::mysql

• Adds network capabilities

• Atomic updates or transactions

• More work than most to set up

memcached

• Networked daemon

• Intended for clusters

• Non-blocking I/O

• Clients for Perl, PHP, Java

• Requires a Linux kernel patch, until 2.6 is out

Testing Methodology

• P4 2.53 GHz, 512 MB RAM, Red Hat 9, ext3, Perl 5.8.0

• Abstraction layer IPC::SharedHash
  – Implements new(), fetch(), store()
  – Handles serialization where necessary
  – Calls FETCH() and STORE() instead of using tied interface

• mod_perl handler

• ab (Apache Bench)
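A rough sketch of what such an abstraction layer might look like, assuming only what the slide states: new(), fetch(), and store() methods, optional serialization, and direct FETCH()/STORE() calls on the backend object. The class name `Sketch::SharedHash` and its constructor arguments are hypothetical, and this is not the IPC::SharedHash code used in the tests. Tie::StdHash stands in for a real shared backend so the example runs in a single process.

```perl
use strict;
use warnings;
use Storable ();
use Tie::Hash;   # provides Tie::StdHash

package Sketch::SharedHash;

# backend:   any object with tied-hash style FETCH/STORE methods
# serialize: true if the backend can only hold flat strings
sub new {
    my ($class, %args) = @_;
    return bless {
        backend   => $args{backend},
        serialize => $args{serialize},
    }, $class;
}

sub store {
    my ($self, $key, $value) = @_;
    # nfreeze() needs a reference, so serialized values must be refs.
    $value = Storable::nfreeze($value) if $self->{serialize};
    # Call STORE() directly rather than going through a tied hash.
    $self->{backend}->STORE($key, $value);
    return;
}

sub fetch {
    my ($self, $key) = @_;
    my $value = $self->{backend}->FETCH($key);
    return undef unless defined $value;
    return $self->{serialize} ? Storable::thaw($value) : $value;
}

package main;

# Tie::StdHash is an in-process stand-in for a real shared backend.
my $hash = Sketch::SharedHash->new(
    backend   => Tie::StdHash->TIEHASH,
    serialize => 1,
);
$hash->store(session => { user => 'alice', hits => 3 });
my $session = $hash->fetch('session');
print "$session->{user} has $session->{hits} hits\n";
```

Calling FETCH() and STORE() as methods avoids the extra dispatch of the tied interface, so the benchmark measures the storage module rather than tie overhead.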

Variables

• Number of parallel clients

• Percentage of writes
  – Sessions can have a lot of writes
  – Caches are mostly read, by definition

• Locality of access

• Scalars vs. complex data
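To make the write-percentage and locality variables concrete, here is a small single-process driver that mixes reads and writes at a chosen ratio over a bounded key space. It is only an illustration: the real tests used ab against a mod_perl handler, and the function name `run_mix`, its parameters, and the plain-hash store are all assumptions.

```perl
use strict;
use warnings;
use Time::HiRes ();

# Hypothetical driver: perform $count accesses against a plain hash.
# Each access is a write with probability $write_pct / 100, and keys are
# drawn from $key_space distinct names (a smaller space means more locality).
sub run_mix {
    my ($store, $count, $write_pct, $key_space) = @_;
    my ($reads, $writes) = (0, 0);

    my $start = Time::HiRes::time();
    for my $i (1 .. $count) {
        my $key = 'key' . int(rand($key_space));
        if (rand(100) < $write_pct) {
            $store->{$key} = $i;
            $writes++;
        }
        else {
            my $value = $store->{$key};
            $reads++;
        }
    }
    my $elapsed = Time::HiRes::time() - $start;

    return {
        reads   => $reads,
        writes  => $writes,
        per_sec => $elapsed > 0 ? $count / $elapsed : undef,
    };
}

my %store;
for my $pct (0, 10, 100) {
    my $r = run_mix(\%store, 10_000, $pct, 100);
    printf "%3d%% writes: %d reads, %d writes\n",
        $pct, $r->{reads}, $r->{writes};
}
```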

Read-Only Sharing

[Bar chart: reqs/sec (axis 0 to 500) for Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB, IPC::MM]

Effect of Increasing Clients

[Bar chart: reqs/sec (axis 0 to 500) with series for 30, 10, and 1 clients, for IPC::MM, Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB]

Effect of Read/Write Ratio

[Bar chart: reqs/sec (axis 0 to 500) with series labeled 0%, 10%, and 100%, for IPC::MM, Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB]

Scalars vs. Complex Data Structures

[Bar chart: reqs/sec (axis 0 to 500) with series Scalar and Complex, for IPC::MM, Cache::FileBackend, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB]

Latest Results

[Bar chart: Write/Read Accesses Per Second (axis 0 to 150) for BerkeleyDB, IPC::MM, memcached, DBD::mysql (local), DBD::mysql, Cache::Mmap, DBD::SQLite, Tie::TextDir, MLDBM::Sync::SDBM, Cache::FileBackend, IPC::Shareable, Cache::SharedMemoryBackend]

Analysis

• Why is shared memory so slow?
  – Still has to serialize
  – Moving too much data at once

• What about IPC::MM?
  – Moves one at a time
  – Moving parts are in C

• Why is the file system so fast?
  – Modern VM system
  – Kernel-managed caching

Analysis, continued

• Why is Tie::TextDir faster than Cache::FileBackend?
  – Digest::SHA1
  – Splitting into multiple directories not normally necessary on modern filesystems: /mu/lt/ip/ledirs
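The two costs named above, hashing the key and splitting the result across nested directories, can be sketched as follows. The sketch uses the core Digest::SHA module, which provides the same `sha1_hex` function as Digest::SHA1. The helper name `hashed_path`, the two-character segment width, and the depth are illustrative and may not match what Cache::FileBackend actually does.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: map a key to a nested path by hashing it and
# peeling $depth two-character directory names off the front of the digest.
sub hashed_path {
    my ($root, $key, $depth) = @_;
    my $digest = sha1_hex($key);
    my @dirs;
    push @dirs, substr($digest, 2 * $_, 2) for 0 .. $depth - 1;
    return join '/', $root, @dirs, substr($digest, 2 * $depth);
}

print hashed_path('/tmp/cache', 'abc', 3), "\n";
# prints /tmp/cache/a9/99/3e/364706816aba3e25717850c26c9cd0d89d
```

Every access pays for one digest computation plus a lookup through each directory level, whereas a store that uses the key directly as the file name skips both steps.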

Problems with this test

• Size of values not considered

• Size of overall hash not considered correctly

• BerkeleyDB should be tested with fancier lock mode

• Needs a real network test for memcached and MySQL

• Should try harder to reduce margin of error

A Word About Clustering

• Shared filesystems
  – NFS
  – Samba/CIFS

• RDBMS
  – Most reliable, well understood, easy integration

• Replicated data
  – Multicast
  – Spread

What about threads?

• Apache 2/mod_perl 2/Perl 5.8 bring threads to the table

• Still not clear how this will work with complex data structures and objects

• Threaded performance is mostly bad in 5.8

Questions to help you choose

• Do you need to store complex data?
  – BerkeleyDB, Tie::TextDir, and IPC::MM require a wrapper for this

• Are your keys valid filenames?
  – Tie::TextDir does not hash the keys

• Do you need persistence?
  – IPC::MM is not persistent

• Do you need explicit locking?
  – MLDBM::Sync, MySQL, BerkeleyDB

Questions to help you choose, continued

• No compiler?
  – Cache::FileBackend, Tie::TextDir, MLDBM::Sync if you have Storable

• Need clustering?
  – DBD::mysql, memcached
