Efficient Shared Data in Perl

Efficient Shared Data in Perl Perrin Harkins


Transcript of Efficient Shared Data in Perl

Page 1: Efficient Shared Data in Perl

Efficient Shared Data in Perl

Perrin Harkins

Page 2: Efficient Shared Data in Perl

What’s your problem?

• Apache is multi-process

• Process assignment is random

• Information wants to be shared

• Inter-process data sharing is ad hoc

Page 3: Efficient Shared Data in Perl

Sharing is good for

• Sessions

• Caching

• Usually transient data

• Otherwise, use an RDBMS

Page 4: Efficient Shared Data in Perl

Approaches

• Files
  – One big file
  – One file per record

• DBM

• Shared memory
  – Seems like the obvious choice, but…

• RDBMS

Page 5: Efficient Shared Data in Perl

Playing well together

• Atomic updates
  – Prevents corruption

• Exclusive locking
  – Prevents lost updates
  – Without this, last save wins

[Figure: lost-update example on the PerlFund account, with Blossom and Buttercup. Amounts shown: $100, $105, $2100, $100.]
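The "last save wins" problem on this slide can be avoided by holding an exclusive lock across the whole read-modify-write cycle. The following is a minimal sketch using only core Perl; the helper name `locked_add` and the one-number-per-file layout are assumptions made for illustration, not something taken from the talk.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT SEEK_SET);

# Hypothetical helper: add $amount to the number stored in $file.
# The exclusive lock is held from before the read until after the write,
# so no other cooperating process can read the stale value in between.
sub locked_add {
    my ($file, $amount) = @_;

    sysopen(my $fh, $file, O_RDWR | O_CREAT) or die "open $file: $!";
    flock($fh, LOCK_EX) or die "flock $file: $!";

    my $balance = <$fh>;                 # undef if the file is still empty
    $balance = 0 unless defined $balance && length $balance;
    chomp $balance;
    $balance += $amount;

    seek($fh, 0, SEEK_SET) or die "seek $file: $!";
    truncate($fh, 0)       or die "truncate $file: $!";
    print {$fh} "$balance\n";

    # Closing flushes the buffered write and releases the lock.
    close($fh) or die "close $file: $!";
    return $balance;
}
```

Note that `flock` is advisory: it only protects against other processes that also take the lock before touching the file.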

Page 6: Efficient Shared Data in Perl

Cache::Cache

• Consistent interface to multiple storage methods
  – File system
  – Shared memory via IPC::ShareLite

• Many cache-related features built in
  – Expiration times
  – Size limit
  – Multiple namespaces

Page 7: Efficient Shared Data in Perl

Cache::Cache, continued

• Atomic updates

• Easy to install
  – No compiler needed for file-based storage

• Benchmarks are on backend storage classes
  – Cache::FileBackend, not Cache::FileCache

Page 8: Efficient Shared Data in Perl

Cache::Mmap

• Uses one big mmap’ed file

• Many tuning options
  – Size of blocks
  – Size of locking regions

• Optimization for scalar data

• Uses locks internally

• Requires compiler

Page 9: Efficient Shared Data in Perl

MLDBM::Sync

• Extension of MLDBM
  – Originally developed for Apache::ASP
  – Uses lock file, tie/untie

• Choice of DBM types
  – SDBM is fastest, but limited

• Tied interface

• Locks on entire database

• Explicit locking in API

• Can run with standard library
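The "lock file, tie/untie" pattern named on this slide can be sketched with core modules only. This is an illustration of the general technique, not MLDBM::Sync's own code; the helper `with_locked_dbm`, the `.lock` suffix, and the session key are assumptions. Storable is used to flatten the nested data, and SDBM's small per-record size limit still applies.

```perl
use strict;
use warnings;
use Fcntl qw(:flock O_RDWR O_CREAT);
use SDBM_File;
use Storable qw(freeze thaw);
use File::Temp qw(tempdir);

# Hypothetical helper: lock a separate lock file, tie the DBM file,
# run the callback, then untie before the lock is released so that the
# next process always opens a fully flushed database.
sub with_locked_dbm {
    my ($path, $code) = @_;

    open(my $lock, '>>', "$path.lock") or die "open $path.lock: $!";
    flock($lock, LOCK_EX) or die "flock $path.lock: $!";

    tie(my %db, 'SDBM_File', $path, O_RDWR | O_CREAT, 0640)
        or die "tie $path: $!";
    my $result = eval { $code->(\%db) };
    my $err = $@;
    untie %db;
    close($lock);                        # releases the lock
    die $err if $err;
    return $result;
}

# Usage: store a nested structure, then update it under the lock.
my $path = tempdir(CLEANUP => 1) . "/sessions";

with_locked_dbm($path, sub {
    my $db = shift;
    $db->{session42} = freeze({ user => 'example', hits => 1 });
});

my $hits = with_locked_dbm($path, sub {
    my $db = shift;
    my $session = thaw($db->{session42});
    $session->{hits}++;
    $db->{session42} = freeze($session);
    return $session->{hits};
});

print "hits: $hits\n";
```

Because the whole database is locked for each call, this trades concurrency for simplicity, which matches the "locks on entire database" bullet above.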

Page 10: Efficient Shared Data in Perl

BerkeleyDB

• Not DB_File, BerkeleyDB.pm

• Requires Berkeley DB library from sleepycat.com

• Tricky to install on some systems

• Tied or OO interface

• No built-in support for complex data structures

• Locks on entire database or on pages

• Supports transactions

• Shared memory cache

• Tests are on BTree

Page 11: Efficient Shared Data in Perl

IPC::MM

• Interface for Engelschall’s mm

• Implements shared BTree and Hash in C

• Tied interface

• Data is not persistent

• Only shares between related processes

Page 12: Efficient Shared Data in Perl

Tie::TextDir

• Dirt-simple: one record per file

• Keys must be legal file names

• No compiler needed

• Doesn’t handle complex data structures
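The one-record-per-file idea is simple enough to sketch directly. The code below is not Tie::TextDir's implementation; `store_record` and `fetch_record` are hypothetical helpers, and the write-to-temp-file-then-rename step is an added assumption that shows one common way to keep each update atomic.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: write $value to the file named $key inside $dir.
# The data goes to a temporary file first and is then renamed into place,
# so readers see either the old record or the new one, never a partial one.
sub store_record {
    my ($dir, $key, $value) = @_;

    # Keys are used as file names, so reject anything that is not one.
    # Leading dots are reserved here for the temporary files.
    die "illegal key\n" if $key eq '' || $key =~ m{^\.|[/\0]};

    my ($fh, $tmp) = tempfile(".tmp.XXXXXX", DIR => $dir);
    print {$fh} $value;
    close($fh) or die "close $tmp: $!";
    rename($tmp, "$dir/$key") or die "rename $tmp: $!";
    return;
}

# Hypothetical helper: return the record for $key, or undef if absent.
sub fetch_record {
    my ($dir, $key) = @_;
    open(my $fh, '<', "$dir/$key") or return;
    local $/;                            # slurp the whole file
    my $value = <$fh>;
    return $value;
}
```

Atomic replacement prevents corrupted records, but it does not prevent lost updates; a read-modify-write cycle still needs a lock around it.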

Page 13: Efficient Shared Data in Perl

IPC::Shareable

• Very Perlish and transparent

• Shared memory

• Lots going on under the hood

• Explicit locking supported

• Tied interface

• Requires a compiler

Page 14: Efficient Shared Data in Perl

DBD::SQLite

• Fast, single-file SQL engine in a DBD

• Full transaction support!

• Locking between processes at database level

Page 15: Efficient Shared Data in Perl

DBD::mysql

• Adds network capabilities

• Atomic updates or transactions

• More work than most to set up

Page 16: Efficient Shared Data in Perl

memcached

• Networked daemon

• Intended for clusters

• Non-blocking I/O

• Clients for Perl, PHP, Java

• Requires a Linux kernel patch, until 2.6 is out

Page 17: Efficient Shared Data in Perl

Testing Methodology

• P4 2.53 GHz, 512MB RAM, Red Hat 9, ext3, Perl 5.8.0

• Abstraction layer IPC::SharedHash
  – Implements new(), fetch(), store()
  – Handles serialization where necessary
  – Calls FETCH() and STORE() instead of using tied interface

• mod_perl handler

• ab (Apache Bench)
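An abstraction layer of the kind described on this slide might look roughly like the sketch below. It is not the IPC::SharedHash used in the tests; the package name `Sketch::SharedHash`, the constructor arguments, and the choice of Storable are assumptions. The backend is any object with tied-hash style FETCH() and STORE() methods, called directly rather than through a tied hash.

```perl
use strict;
use warnings;

{
    package Sketch::SharedHash;          # hypothetical name
    use Storable qw(freeze thaw);

    # backend:   object providing FETCH($key) and STORE($key, $value)
    # serialize: true if the backend can only hold flat scalars
    sub new {
        my ($class, %args) = @_;
        return bless {
            backend   => $args{backend},
            serialize => $args{serialize},
        }, $class;
    }

    sub store {
        my ($self, $key, $value) = @_;
        # Wrap in an array ref so plain scalars and references
        # both survive the round trip.
        $value = freeze([$value]) if $self->{serialize};
        $self->{backend}->STORE($key, $value);
        return;
    }

    sub fetch {
        my ($self, $key) = @_;
        my $value = $self->{backend}->FETCH($key);
        return $value unless $self->{serialize} && defined $value;
        return thaw($value)->[0];
    }
}

# Usage with an in-memory stand-in backend from the core Tie::Hash module.
use Tie::Hash;
my $shared = Sketch::SharedHash->new(
    backend   => Tie::StdHash->TIEHASH,
    serialize => 1,
);
$shared->store(cart => { items => [ 'camel', 'llama' ] });
my $cart = $shared->fetch('cart');
print scalar @{ $cart->{items} }, " items\n";
```

Tie::StdHash is not shared between processes; it only stands in here so the sketch runs without extra modules. A real comparison would pass the object returned by `tie` for each storage module under test.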

Page 18: Efficient Shared Data in Perl

Variables

• Number of parallel clients

• Percentage of writes
  – Sessions can have a lot of writes
  – Caches are mostly read, by definition

• Locality of access

• Scalars vs. complex data

Page 19: Efficient Shared Data in Perl

Read-Only Sharing

[Bar chart: reqs/sec (axis 0 to 500) for Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, BerkeleyDB, and IPC::MM.]

Page 20: Efficient Shared Data in Perl

Effect of Increasing Clients

[Bar chart: reqs/sec (axis 0 to 500) for IPC::MM, Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, and BerkeleyDB. Series: 1, 10, and 30 clients.]

Page 21: Efficient Shared Data in Perl

Effect of Read/Write Ratio

[Bar chart: reqs/sec (axis 0 to 500) for IPC::MM, Cache::FileBackend, Cache::SharedMem, Cache::Mmap, Tie::TextDir, MLDBM::Sync, and BerkeleyDB. Series labeled 0%, 10%, and 100%.]

Page 22: Efficient Shared Data in Perl

Scalars vs. Complex Data Structures

[Bar chart: reqs/sec (axis 0 to 500) for IPC::MM, Cache::FileBackend, Cache::Mmap, Tie::TextDir, MLDBM::Sync, and BerkeleyDB. Series: Scalar, Complex.]

Page 23: Efficient Shared Data in Perl

Latest Results

[Bar chart: write/read accesses per second (axis 0 to 150) for BerkeleyDB, IPC::MM, memcached, DBD::mysql (local), DBD::mysql, Cache::Mmap, DBD::SQLite, Tie::TextDir, MLDBM::Sync::SDBM, Cache::FileBackend, IPC::Shareable, and Cache::SharedMemoryBackend.]

Page 24: Efficient Shared Data in Perl

Analysis

• Why is shared memory so slow?
  – Still has to serialize
  – Moving too much data at once

• What about IPC::MM?
  – Moves one at a time
  – Moving parts are in C

• Why is the file system so fast?
  – Modern VM system
  – Kernel-managed caching

Page 25: Efficient Shared Data in Perl

Analysis

• Why is Tie::TextDir faster than Cache::FileBackend?
  – Digest::SHA1
  – Splitting into multiple directories not normally necessary on modern filesystems: /mu/lt/ip/ledirs
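The two costs named above, hashing every key and splitting the result across nested directories, can be illustrated with a short sketch. It is not Cache::FileBackend's actual code: `hashed_path`, the two-character components, and the depth of 3 are assumptions, and the core Digest::SHA module stands in for Digest::SHA1.

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Hypothetical helper: map a key to a nested path under $root by
# hashing it and using the first $depth byte pairs of the hex digest
# as directory names.
sub hashed_path {
    my ($root, $key, $depth) = @_;
    my $digest = sha1_hex($key);
    my @dirs = map { substr($digest, $_ * 2, 2) } 0 .. $depth - 1;
    return join('/', $root, @dirs, $digest);
}

print hashed_path('/cache', 'abc', 3), "\n";
```

Every fetch or store pays for one digest computation and several extra directory lookups, whereas a flat one-file-per-key layout uses the key as the file name directly.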

Page 26: Efficient Shared Data in Perl

Problems with this test

• Size of values not considered

• Size of overall hash not considered correctly

• BerkeleyDB should be tested with fancier lock mode

• Needs a real network test for memcached and MySQL

• Should try harder to reduce margin of error

Page 27: Efficient Shared Data in Perl

A Word About Clustering

• Shared filesystems
  – NFS
  – Samba/CIFS

• RDBMS
  – Most reliable, well understood, easy integration

• Replicated data
  – Multicast
  – Spread

Page 28: Efficient Shared Data in Perl

What about threads?

• Apache 2/mod_perl 2/Perl 5.8 bring threads to the table

• Still not clear how this will work with complex data structures and objects

• Threaded performance is mostly bad in 5.8

Page 29: Efficient Shared Data in Perl

Questions to help you choose

• Do you need to store complex data?
  – BerkeleyDB, Tie::TextDir, and IPC::MM require a wrapper for this

• Are your keys valid filenames?
  – Tie::TextDir does not hash the keys

• Do you need persistence?
  – IPC::MM is not persistent

• Do you need explicit locking?
  – MLDBM::Sync, MySQL, BerkeleyDB

Page 30: Efficient Shared Data in Perl

Questions to help you choose

• No compiler?
  – Cache::FileBackend, Tie::TextDir, MLDBM::Sync if you have Storable

• Need clustering?
  – DBD::mysql, memcached