Introduction to memcached

Post on 28-Jan-2018

INTRODUCTION TO MEMCACHED

Tags: memcached, performance, scalability, php, mySQL, caching techniques, #ikdoeict

jurriaanpersyn.com
lead web dev at Netlog for 4 years
php + mysql + frontend
working on Gatcha

For whom?
Talk for students of the professional bachelor ICT, www.ikdoeict.be

Why this talk?
One of the first things I learnt at Netlog. Using it every single day.

Program
- About caching
- About memcached
- Examples
- Tips & tricks
- Toolsets and other solutions

What is caching?
A copy of real data with faster (and/or cheaper) access

• From Wikipedia: "A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache."

• Term introduced by IBM in the 1960s

What is caching?

• simple key/value storage

• simple operations

• save

• get

• delete

The anatomy

• storage cost

• retrieval cost (network load / algorithm load)

• invalidation (keeping data up to date / removing irrelevant data)

• replacement policy (FIFO/LFU/LRU/MRU/RANDOM vs. Belady’s algorithm)

• cold cache / warm cache

Terminology
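To make the replacement-policy bullet concrete, here is a minimal sketch of an LRU (least recently used) policy. The class `LruCache` and its methods are hypothetical names made up for this illustration, not part of memcached or any PHP extension. It relies on the fact that PHP arrays keep insertion order.

```php
<?php
// Hypothetical in-process LRU cache, for illustration only.
final class LruCache
{
    /** @var array Entries ordered from least to most recently used. */
    private $items = [];
    /** @var int Maximum number of entries kept. */
    private $capacity;

    public function __construct(int $capacity)
    {
        $this->capacity = $capacity;
    }

    public function has(string $key): bool
    {
        return array_key_exists($key, $this->items);
    }

    public function get(string $key)
    {
        if (!$this->has($key)) {
            return null; // cache miss
        }
        // Re-insert the entry so it moves to the "most recently used" end.
        $value = $this->items[$key];
        unset($this->items[$key]);
        $this->items[$key] = $value;
        return $value;
    }

    public function set(string $key, $value): void
    {
        unset($this->items[$key]);
        $this->items[$key] = $value;
        if (count($this->items) > $this->capacity) {
            // Evict the least recently used entry: the first key in the array.
            unset($this->items[array_key_first($this->items)]);
        }
    }
}

$cache = new LruCache(2);
$cache->set('a', 1);
$cache->set('b', 2);
$cache->get('a');     // 'a' is now the most recently used entry
$cache->set('c', 3);  // capacity exceeded, so 'b' is evicted
```

FIFO would differ only in `get()`: it would not move the entry, so eviction order would depend on insertion time alone.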

• cache hit and cache miss

• typical stats:

• hit ratio (hits / (hits + misses))

• miss ratio (1 - hit ratio)

• 45 cache hits and 10 cache misses

• 45/(45+10) = 82% hit ratio

• 18% miss ratio

Terminology
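The arithmetic from the slide can be checked with a few lines of PHP. The function name `hitRatio` is made up for this sketch; real numbers would come from your own counters or from the server statistics.

```php
<?php
// Hypothetical helper: hit ratio = hits / (hits + misses).
function hitRatio(int $hits, int $misses): float
{
    $total = $hits + $misses;
    // Avoid a division by zero when nothing has been requested yet.
    return $total === 0 ? 0.0 : $hits / $total;
}

$ratio = hitRatio(45, 10);
printf("hit ratio: %.0f%%, miss ratio: %.0f%%\n", $ratio * 100, (1 - $ratio) * 100);
// prints "hit ratio: 82%, miss ratio: 18%"
```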

• caches are only efficient when the benefits of faster access outweigh the overhead of checking and keeping your cache up to date

• more cache hits than cache misses

When to cache?

• at hardware level (cpu, hdd)

• operating systems (ram)

• web stack

• applications

• your own short term vs long term memory

Where are caches used?

• Browser cache

• DNS cache

• Content Delivery Networks (CDN)

• Proxy servers

• Application level

• full output caching (eg. Wordpress WP-Cache plugin)

• ...

Caches in the web stack

• Application level

• opcode cache (APC)

• query cache (MySQL)

• storing denormalized results in the database

• object cache

• storing values in php objects/classes

Caches in the web stack (cont’d)

• the earlier in the process, the closer to the original request(er), the faster

• browser cache will be faster than cache on a proxy

• but probably also the harder to get right

• the closer to the requester, the more parameters the cache depends on

Efficiency of caching?

• As PHP backend developer, what to cache?

• expensive operations: operations that work with slower resources

• database access

• reading files (in fact, any filesystem access)

• API calls

• Heavy computations

• XML

What to cache on the server-side?

• As PHP backend developer, where to store cache results?

• in database (computed values, generated html)

• you’ll still need to access your database

• in static files (generated html or serialized php values)

• you’ll still need to access your file system

Where to cache on the server-side?

in memory!

memcached

• Free & open source, high-performance, distributed memory object caching system

• Generic in nature, intended for use in speeding up dynamic web applications by alleviating database load.

• key/value dictionary

About memcached

• Developed by Brad Fitzpatrick for LiveJournal in 2003

• Now used by Netlog, Facebook, Flickr, Wikipedia, Twitter, YouTube ...

About memcached (cont’d)

• It’s a server

• Client access over TCP or UDP

• Servers can run in pools

• eg. 3 servers with 64GB mem each give you a single pool of 192GB storage for caching

• Servers are independent, clients manage the pool

Technically

• high demand (used often)

• expensive (hard to compute)

• common (shared across users)

• Best? All three

What to store in memcache?

• Typical:

• user sessions (often)

• user data (often, shared)

• homepage data (eg. often, shared, expensive)

What to store in memcache? (cont’d)

• Workflow:

• monitor application (query logs / profiling)

• add a caching level

• compare speed gain

What to store in memcache? (cont’d)

• Fast network access (memcached servers close to other application servers)

• No persistency (if your server goes down, data in memcached is gone)

• No redundancy / fail-over

• No replication (single item in cache lives on one server only)

• No authentication (not in shared environments)

Memcached principles

• 1 value is maximum 1MB (the default item size limit)

• keys are strings of at most 250 bytes (in application typically MD5 of user readable string)

• No enumeration of keys (thus no list of valid keys in cache at a certain moment, no list of keys beginning with “user_”, ...)

• No active clean-up (only clean up when more space needed, LRU)

Memcached principles (cont’d)
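A minimal sketch of the key rule above: keys are limited to 250 bytes and may not contain whitespace or control characters, so a readable string is often hashed first. The function `cacheKey` and the `md5_` prefix are assumptions made for this example. This variant only hashes when it has to, which keeps short keys readable while debugging; hashing every key, as the slide suggests, works just as well.

```php
<?php
// Maximum key length accepted by memcached, in bytes.
const MAX_KEY_LENGTH = 250;

// Hypothetical helper: turn any readable string into a valid cache key.
function cacheKey(string $readable): string
{
    $tooLong = strlen($readable) > MAX_KEY_LENGTH;
    // Whitespace and control characters are not allowed in keys.
    $hasBadChars = preg_match('/[\s\x00-\x1f\x7f]/', $readable) === 1;

    if ($tooLong || $hasBadChars) {
        return 'md5_' . md5($readable); // always 36 characters
    }
    return $readable;
}

echo cacheKey('user_42'), "\n";                 // unchanged: user_42
echo cacheKey('search results for foo'), "\n";  // hashed, because of the spaces
```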

$ telnet localhost 11211
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
get foo
VALUE foo 0 2
hi
END
stats
STAT pid 8861
(etc)

• both an ASCII and a binary protocol

• in real life:

• clients available for all major languages

• C, C++, PHP, Python, Ruby, Java, Perl, Windows, ...

Client Access

• Support the basics such as multiple servers, setting values, getting values, incrementing, decrementing and getting stats.

• pecl/memcache

• pecl/memcached

• newer, in beta, a couple more features

PHP Clients

                               pecl/memcache          pecl/memcached
First Release Date             2004-06-08             2009-01-29 (beta)
Actively Developed?            Yes                    Yes
External Dependency            None                   libmemcached

Features
Automatic Key Fixup            Yes                    No
Append/Prepend                 No                     Yes
Automatic Serialization        Yes                    Yes
Binary Protocol                No                     Optional
CAS                            No                     Yes
Compression                    Yes                    Yes
Communication Timeout          Connect Only           Various Options
Consistent Hashing             Yes                    Yes
Delayed Get                    No                     Yes
Multi-Get                      Yes                    Yes
Session Support                Yes                    Yes
Set/Get to a specific server   No                     Yes
Stores Numerics                Converted to Strings   Yes

PHP Client Comparison

• Memcached::add — Add an item under a new key

• Memcached::addServer — Add a server to the server pool

• Memcached::decrement — Decrement numeric item's value

• Memcached::delete — Delete an item

• Memcached::flush — Invalidate all items in the cache

• Memcached::get — Retrieve an item

• Memcached::getMulti — Retrieve multiple items

• Memcached::getStats — Get server pool statistics

• Memcached::increment — Increment numeric item's value

• Memcached::set — Store an item

• ...

PHP Client functions

• Pages with high load / expensive to generate

• Very easy

• Very fast

• But: all the dependencies ...

• language, css, template, logged in user’s details, ...

Output caching

<?php

$html = $cache->get('mypage');
if ($html === false) { // get() returns false on a cache miss
    ob_start();
    echo "<html>";
    // all the fancy stuff goes here
    echo "</html>";
    $html = ob_get_contents();
    ob_end_clean();
    $cache->set('mypage', $html);
}
echo $html;

?>

• on a lower level

• easier to find all dependencies

• ideal solution for offloading database queries

• the database is almost always the biggest bottleneck in backend performance problems

Data caching

<?php

function getUserData($UID)
{
    global $cache; // assumed: an already connected memcached client
    $key = 'user_' . $UID;
    $userData = $cache->get($key);
    if (!$userData) {
        $queryResult = Database::query("SELECT * FROM USERS WHERE uid = " . (int) $UID);
        $userData = $queryResult->getRow();
        $cache->set($key, $userData); // store under the same key we read from
    }
    return $userData;
}

?>

“There are only two hard things in Computer Science: cache invalidation and naming things.”

Phil Karlton

• Caching for a certain amount of time

• eg. 10 minutes

• don’t delete caches

• thus: You can’t trust that data coming from cache is correct

Invalidation

• Use: Great for summaries

• Overview

• Pages where it’s not that big a problem if data is a little bit out of date (eg. search results)

• Good for quick and dirty optimizations

Invalidation (cont’d)
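A minimal sketch of time-based invalidation, assuming an in-memory stand-in instead of a real memcached server. The class `TtlCache` and its injectable clock are made up for this example so that expiry can be demonstrated without waiting. With a real client, the lifetime would be passed as the expiration argument of `set()`.

```php
<?php
// Hypothetical in-memory cache with a time-to-live per entry.
final class TtlCache
{
    /** @var array key => [value, expiry timestamp] */
    private $items = [];
    /** @var callable Returns the current Unix timestamp. */
    private $clock;

    public function __construct(?callable $clock = null)
    {
        $this->clock = $clock ?? 'time';
    }

    public function set(string $key, $value, int $ttlSeconds): void
    {
        $this->items[$key] = [$value, ($this->clock)() + $ttlSeconds];
    }

    public function get(string $key)
    {
        if (!isset($this->items[$key])) {
            return false; // miss: never stored
        }
        [$value, $expiresAt] = $this->items[$key];
        if (($this->clock)() >= $expiresAt) {
            unset($this->items[$key]);
            return false; // miss: entry is too old
        }
        return $value;
    }
}

// Usage: cache search results for 10 minutes, using a fake clock.
$now = 1000;
$cache = new TtlCache(function () use (&$now) { return $now; });
$cache->set('search_foo', ['result 1', 'result 2'], 600);
$fresh = $cache->get('search_foo');  // still cached
$now += 600;                         // ten minutes later
$stale = $cache->get('search_foo');  // false: the entry has expired
```

Between `set()` and expiry the cached value may be out of date, which is exactly the trade-off the slides describe.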

• Store forever, and expire on certain events

• the userdata example

• store userdata forever

• when a user changes any of their preferences, throw the cache away

Invalidation (cont’d)

• Use:

• data that is fetched more often than it’s updated

• where it’s critical the data is correct

• Improvement: instead of delete on event, update cache on event. (Mind: race conditions. Cache invalidation always as close to original change as possible!)

Invalidation

• sessions (cross server)

• database results (via database class, or object caching)

• flooding checks

• output caching (eg. for RSS feeds)

• locks

Uses at Netlog
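As an illustration of the flooding checks mentioned above, here is one possible sketch. It is not Netlog's implementation. It relies on the semantics of memcached's `add` (store only if the key does not exist yet) and `increment`. The class `CounterCache` is a hypothetical in-memory stand-in that omits expiry; with a real server, the expiration time passed to `add` would define the length of the time window.

```php
<?php
// Hypothetical stand-in mimicking add/increment semantics, without expiry.
final class CounterCache
{
    /** @var array key => counter value */
    private $counters = [];

    // Stores the value only if the key does not exist yet.
    public function add(string $key, int $value): bool
    {
        if (isset($this->counters[$key])) {
            return false;
        }
        $this->counters[$key] = $value;
        return true;
    }

    // Increments an existing counter and returns the new value.
    public function increment(string $key): int
    {
        return ++$this->counters[$key];
    }
}

// Returns true when the user performed more than $limit actions in the window.
function isFlooding(CounterCache $cache, string $userId, int $limit): bool
{
    $key = 'flood_' . $userId;
    if ($cache->add($key, 1)) {
        return false; // first action in this window
    }
    return $cache->increment($key) > $limit;
}

$cache = new CounterCache();
$results = [];
for ($i = 0; $i < 4; $i++) {
    $results[] = isFlooding($cache, '42', 3);
}
// $results is [false, false, false, true]
```

A lock can be built the same way: whoever succeeds in `add`-ing the lock key holds the lock, and deletes the key when done.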

<?php
function getUserData($UID)
{
    $db = DB::getInstance();
    $db->prepare("SELECT * FROM USERS WHERE uid = {UID}");
    $db->assignInt('UID', $UID);
    $db->execute();
    return $db->getRow();
}
?>

<?php
function getUserData($UID)
{
    $db = DB::getInstance();
    $db->prepare("SELECT * FROM USERS WHERE uid = {UID}");
    $db->assignInt('UID', $UID);
    $db->setCacheTTL(0); // cache forever
    $db->execute();
    return $db->getRow();
}
?>

<?php
function getUserData($UID, $invalidateCache = false)
{
    $db = DB::getInstance();
    $db->prepare("SELECT * FROM USERS WHERE uid = {UID}");
    $db->assignInt('UID', $UID);
    $db->setCacheTTL(0); // cache forever
    if ($invalidateCache) {
        return $db->invalidateCache();
    }
    $db->execute();
    return $db->getRow();
}
?>

<?php
function updateUserData($UID, $data)
{
    $db = DB::getInstance();
    $db->prepare("UPDATE USERS SET ... WHERE uid = {UID}");
    ...
    getUserData($UID, true); // invalidate cache
    return $result;
}
?>

<?php
function getLastBlogPosts($UID, $start = 0, $limit = 10, $invalidateCache = false)
{
    $db = DB::getInstance();
    $db->prepare("SELECT blogid FROM BLOGS WHERE uid = {UID} ORDER BY dateadd DESC LIMIT {start}, {limit}");
    $db->assignInt('start', $start);
    $db->assignInt('limit', $limit);
    $db->assignInt('UID', $UID);
    $db->setCacheTTL(0); // cache forever
    if ($invalidateCache) {
        return $db->invalidateCache();
    }
    $db->execute();
    return $db->getResults();
}
?>

<?php
function addNewBlogPost($UID, $data)
{
    $db = DB::getInstance();
    $db->prepare("INSERT INTO BLOGS ...");
    ...
    // invalidate caches: one call per cached page of results
    getLastBlogPosts($UID, 0, 10, true);
    getLastBlogPosts($UID, 11, 20, true);
    ... // ???
    return $result;
}
?>

<?php
function getLastBlogPosts($UID, $start = 0, $limit = 10)
{
    $cacheVersionNumber = CacheVersionNumbers::get('lastblogsposts_' . $UID);
    $db = DB::getInstance();
    $db->prepare("SELECT blogid FROM ...");
    ...
    $db->setCacheVersionNumber($cacheVersionNumber);
    $db->setCacheTTL(0); // cache forever
    $db->execute();
    return $db->getResults();
}
?>

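<?php
class CacheVersionNumbers
{
    public static function get($name)
    {
        global $cache; // assumed: an already connected memcached client
        $result = $cache->get('cvn_' . $name);
        if (!$result) {
            // no version yet (or bumped): generate a new, unique one
            $result = microtime() . rand(0, 1000);
            $cache->set('cvn_' . $name, $result);
        }
        return $result;
    }

    public static function bump($name)
    {
        global $cache;
        // deleting the version forces a new one on the next get()
        return $cache->delete('cvn_' . $name);
    }
}
?>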

<?php
function addNewBlogPost($UID, $data)
{
    $db = DB::getInstance();
    $db->prepare("INSERT INTO BLOGS ...");
    ...
    CacheVersionNumbers::bump('lastblogsposts_' . $UID);
    return $result;
}
?>
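The version-number idea can be tried out without a database or a memcached server. The sketch below is a self-contained approximation under stated assumptions: `InMemoryCache` and `VersionedKeys` are hypothetical names, and `uniqid()` stands in for the `microtime() . rand()` version generator. Because the version is part of every cache key, bumping it makes all old pages unreachable at once, whatever their `start` and `limit` were.

```php
<?php
// Hypothetical in-memory stand-in for a memcached client.
final class InMemoryCache
{
    private $data = [];

    public function get(string $key)
    {
        return array_key_exists($key, $this->data) ? $this->data[$key] : false;
    }

    public function set(string $key, $value): void
    {
        $this->data[$key] = $value;
    }

    public function delete(string $key): void
    {
        unset($this->data[$key]);
    }
}

// Builds cache keys that embed a per-name version number.
final class VersionedKeys
{
    private $cache;

    public function __construct(InMemoryCache $cache)
    {
        $this->cache = $cache;
    }

    public function key(string $name, string $suffix): string
    {
        $version = $this->cache->get('cvn_' . $name);
        if ($version === false) {
            $version = uniqid('', true); // new version after a bump
            $this->cache->set('cvn_' . $name, $version);
        }
        return $name . '_' . $version . '_' . $suffix;
    }

    public function bump(string $name): void
    {
        $this->cache->delete('cvn_' . $name);
    }
}

$cache = new InMemoryCache();
$keys = new VersionedKeys($cache);

// Cache two result pages for user 7.
$cache->set($keys->key('lastblogsposts_7', '0_10'), [101, 102]);
$cache->set($keys->key('lastblogsposts_7', '10_10'), [103]);

$keys->bump('lastblogsposts_7'); // user 7 added a blog post

// Both pages now miss, because their keys contain the new version.
$page1 = $cache->get($keys->key('lastblogsposts_7', '0_10'));
$page2 = $cache->get($keys->key('lastblogsposts_7', '10_10'));
```

The old entries are never deleted explicitly; they simply stop being requested and are eventually evicted by the LRU clean-up.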

• queries with JOIN and WHERE statements are harder to cache

• often not easy to find the cache key on update/change events

• solution: JOIN in PHP

• In following example: what if nickname of user changes?

Query Caching (cont’d)

<?php
$db = DB::getInstance();
$db->prepare("SELECT c.comment_message, c.comment_date, u.nickname
              FROM COMMENTS c
              JOIN USERS u ON u.uid = c.commenter_uid
              WHERE c.postid = {postID}");
...
?>

<?php
$db = DB::getInstance();
$db->prepare("SELECT c.comment_message, c.comment_date, c.commenter_uid AS uid
              FROM COMMENTS c
              WHERE c.postid = {postID}");
...
$comments = Users::addUserDetails($comments);
...
?>

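<?php
...
public static function addUserDetails($array)
{
    foreach ($array as &$item) {
        // assume high hit ratio
        $item = array_merge($item, self::getUserData($item['uid']));
    }
    unset($item); // break the reference left behind by foreach
    return $array; // return the whole enriched list, not only the last item
}
...
?>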

• Pros:

• speed, duh.

• queries get simpler (better for your db)

• easier porting to key/value storage solutions

• Cons:

• You’re relying on memcached to be up and have good hit ratios

So?

• We reduced database access

• Memcached is faster, but access to memcached still has its price

• Solution: multiget

• fetch multiple keys from memcached in one single call

• result is array of items

Multi-Get Optimisations

• back to addUserDetails example

• find UIDs from array

• multiget to memcached for details of UIDs

• for UIDs without result, do a query

• SELECT ... FROM USERS WHERE uid IN (...)

• for each fetched user, store in cache

• worst case (no hits): 1 query

• return merged cache/db results

Multi-Get Optimisations (cont’d)
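The steps above can be sketched as follows. This is an approximation under stated assumptions, not the Netlog code: `FakeMultiCache` is a hypothetical in-memory stand-in whose `getMulti()` returns only the keys it found, and `$fetchUsersFromDb` is a made-up callable standing in for the single `SELECT ... WHERE uid IN (...)` query.

```php
<?php
// Hypothetical in-memory stand-in for a memcached client with multi-get.
final class FakeMultiCache
{
    private $data = [];

    public function set(string $key, $value): void
    {
        $this->data[$key] = $value;
    }

    // Returns only the keys that were found, as key => value.
    public function getMulti(array $keys): array
    {
        return array_intersect_key($this->data, array_flip($keys));
    }
}

function addUserDetails(array $comments, FakeMultiCache $cache, callable $fetchUsersFromDb): array
{
    // 1. Find the distinct UIDs and their cache keys.
    $keys = [];
    foreach (array_unique(array_column($comments, 'uid')) as $uid) {
        $keys[$uid] = 'user_' . $uid;
    }

    // 2. One multi-get for all users at once.
    $cached = $cache->getMulti(array_values($keys));

    // 3. Split into hits and misses.
    $users = [];
    $missing = [];
    foreach ($keys as $uid => $key) {
        if (array_key_exists($key, $cached)) {
            $users[$uid] = $cached[$key];
        } else {
            $missing[] = $uid;
        }
    }

    // 4. At most one query for all misses; store each result in the cache.
    if ($missing !== []) {
        foreach ($fetchUsersFromDb($missing) as $uid => $user) {
            $cache->set('user_' . $uid, $user);
            $users[$uid] = $user;
        }
    }

    // 5. Merge the user details into each comment.
    foreach ($comments as &$comment) {
        $comment = array_merge($comment, $users[$comment['uid']] ?? []);
    }
    unset($comment);
    return $comments;
}
```

Compared with calling `getUserData()` per comment, the number of round trips no longer grows with the number of comments: one multi-get, plus at most one query.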

• client is responsible for managing pool

• hashes a certain key to a certain server

• clients can be naïve: pick the server by key hash modulo the size of the pool

• if one server goes down, the pool size changes and most keys are now looked up on a different server, resulting in a cold cache

• use a client with a consistent hashing algorithm, so if a server goes down, only the data on that server gets lost

Consistent Hashing
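A minimal sketch contrasting the two approaches. The server names, the class `HashRing`, the function `naiveServer` and the choice of `crc32` with 100 points per server are assumptions made for this illustration; real clients use their own hash functions and ring layouts. The sketch assumes 64-bit PHP, where `crc32()` is never negative.

```php
<?php
// Naive distribution: hash modulo pool size. Changing the pool size
// changes the result for most keys.
function naiveServer(string $key, array $servers): string
{
    return $servers[crc32($key) % count($servers)];
}

// Consistent hashing: servers are placed on a ring at many points;
// a key belongs to the first server point at or after its own hash.
final class HashRing
{
    /** @var array ring point => server */
    private $ring = [];
    /** @var int[] sorted ring points */
    private $points = [];

    public function __construct(array $servers, int $pointsPerServer = 100)
    {
        foreach ($servers as $server) {
            for ($i = 0; $i < $pointsPerServer; $i++) {
                $this->ring[crc32($server . '#' . $i)] = $server;
            }
        }
        ksort($this->ring);
        $this->points = array_keys($this->ring);
    }

    public function serverFor(string $key): string
    {
        $hash = crc32($key);
        // Linear scan keeps the sketch short; a real client would use binary search.
        foreach ($this->points as $point) {
            if ($point >= $hash) {
                return $this->ring[$point];
            }
        }
        return $this->ring[$this->points[0]]; // wrap around the ring
    }
}

$pool = ['cache1:11211', 'cache2:11211', 'cache3:11211'];
$ring = new HashRing($pool);
$ringWithoutCache2 = new HashRing(['cache1:11211', 'cache3:11211']);

// Keys that did not live on cache2 stay on the same server after it goes down.
$server = $ring->serverFor('user_42');
```

With the ring, removing `cache2` only remaps the keys that were stored on `cache2`. With `naiveServer()`, going from three to two servers remaps keys on every server.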

• available stats from servers include:

• uptime, #calls (get/set/...), #hits (since uptime), #misses (since uptime)

• no enumeration, no distinguishing on types of caches

• add own logging / statistics to monitor effectiveness of your caching strategy

Memcached Statistics

• Be careful when security matters. (Remember ‘no authentication’?)

• Working on authentication for memcached via SASL Auth

Protocol

• Caching is not an excuse not to do database tuning. (Remember cold cache?)

• Make sure to write unit tests for your caching classes and places where you use it. (Debugging problems related to out-of-date cache data is hard and boring. Very boring.)

More tips ...

• Zend framework has Zend_Cache with support for a memcached backend

• Wordpress has 3 plugins for working with memcached

• all of the other major frameworks have some sort of support (built in or via plugins): Symfony, Django, CakePHP, Drupal, ...

• Gear6: memcached servers in the cloud

Libraries for memcached

• memcachedb (persistent memcached)

• opcode caching

• APC (php compiled code cache, usable for other purposes too)

• xCache

• eAccelerator

• Zend optimizer

memcached isn’t the only caching solution

• main bottleneck in php backends is database

• adding php servers is easier than scaling databases

• a complete caching layer before your database layer solves a lot of performance and scalability issues

• but being able to scale takes more than memcached

• performance tuning, beginning with identifying the slowest and most used parts stays important, be it tuning of your php code, memcached calls or database queries

Last thought

FOR DEVELOPERS

YOUR GAME

A TOP SOCIAL GAME

High-score Handling

Tournaments

Challenge builder

Achievements

Got an idea for a game? Great!

Gatcha For Game Developers

Game tracking: Start-game and end-game calls result in accurate gameplay tracking and allow us to show who is playing the game at any given moment, compute popularity, target games.

High-scores: You push your high-score to our API, we do the hard work of creating different types of leader boards and rankings.

Achievements: Pushing achievements reached in your game just takes one API call, no configuration needed.

Gatcha For Game Developers

Multiplayer Games: We run SmartFox servers that enable you to build real-time multiplayer games, with e.g. in-game chat

coming:

Challenges & Tournaments: Allow your game players to challenge each other, or build challenges & contests yourself.

Gatcha For Game Developers

How to integrate?

Flash Games: We offer a wrapper for AS3 and AS2 games with full implementation of our API

Unity3D Games

OpenSocial Games: Talk to the supported containers via the Gatcha OpenSocial Extension

Other Games: Simple iframe implementation. PHP Client API available for the Gatcha API

Start developing in our sandbox.

Job openings

Weʼre searching for great developers!

PHP Talents: Working on integrations and the gaming platform

Flash Developers: Working on Flash Games and the gaming platform

Design Artists: Designing games and integrations

Resources, among others:

• memcached & apc: http://www.slideshare.net/benramsey/caching-with-memcached-and-apc

• speed comparison: http://dealnews.com/developers/memcachedv2.html

• php client comparison: http://code.google.com/p/memcached/wiki/PHPClientComparison

• cakephp-memcached: http://teknoid.wordpress.com/2009/06/17/send-your-database-on-vacation-by-using-cakephp-memcached/

• caching basics: http://www.slideshare.net/soplakanets/caching-basics

• caching w php: http://www.slideshare.net/JustinCarmony/effectice-caching-w-php-caching