Taming the resource tiger

Post on 09-Feb-2017


Taming the resource tiger
You cannot hide from Physics!

Mass
How much matter is in an object

Data Storage
• Hard Disk Drive (HDD)
  • Magnetizes a thin film of ferromagnetic material on a disk
  • Reads it with a magnetic head on an actuator arm
• Solid State Drive (SSD)
  • Uses integrated circuit assemblies as memory to store data persistently
  • No moving parts

Areal Storage Density
• SSD
  • 2.8 Tbit/in²
• HDD
  • 1.5 Tbit/in²

Terabits per square inch – numbers as of 2016 (see Wikipedia, our materials are improving)

When hard drives go bad

Streams: Computing Concept
Definitions
• Idea originating in the 1950s
• Standard way to get input and output
• A source or sink of data

Who uses them
• C – stdin, stderr, stdout
• C++ iostream
• Perl IO
• Python io
• Java
• C#

What is a Stream?
• Access input and output generically
• Can write and read linearly
• May or may not be seekable
• Comes in chunks of data
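The properties above can be seen with a few built-in calls. This is a minimal sketch, assuming an in-memory stream (`php://memory`) as the data source and an arbitrary 1024-byte chunk size; neither comes from the slides.

```php
<?php
// Open an in-memory stream; php://memory is a built-in PHP stream wrapper.
$stream = fopen('php://memory', 'w+');

// Write linearly, then rewind so the same data can be read back.
fwrite($stream, str_repeat('cat.gif ', 1000)); // 8 bytes x 1000
rewind($stream);

// Not every stream can seek; the metadata says whether this one can.
$meta = stream_get_meta_data($stream);
$seekable = $meta['seekable'];

// Read in fixed-size chunks instead of pulling everything in at once.
$chunkSize = 1024; // arbitrary size chosen for this sketch
$bytes = 0;
while (!feof($stream)) {
    $chunk = fread($stream, $chunkSize);
    if ($chunk === false || $chunk === '') {
        break; // nothing more to read
    }
    $bytes += strlen($chunk);
}
fclose($stream);

printf("seekable=%s bytes=%d\n", $seekable ? 'yes' : 'no', $bytes);
```

The same loop would work unchanged on a file, a socket, or `php://stdin`, which is the point of accessing I/O generically.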

Why do I care about streams?
• They are created to handle massive amounts of data
• Assume all files are too large to load into memory
• If this means checking size before load, do it
• If this means always treating a file as very large, do it
• PHP streams were meant for this!
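One way to "always treat a file as very large" is to read it line by line through a generator. The sketch below is illustrative only: `readLines()` is a made-up helper name and the temp file exists just so the example runs on its own.

```php
<?php
/**
 * Yield a file one line at a time so memory use stays flat no matter
 * how large the file is. readLines() is a name made up for this sketch.
 */
function readLines(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }
    try {
        while (($line = fgets($handle)) !== false) {
            yield rtrim($line, "\r\n");
        }
    } finally {
        fclose($handle); // runs even if the caller stops iterating early
    }
}

// Demo: build a throwaway file, then count its lines
// without file() or file_get_contents().
$path = tempnam(sys_get_temp_dir(), 'tiger');
$out = fopen($path, 'w');
for ($i = 0; $i < 10000; $i++) {
    fwrite($out, "row $i\n");
}
fclose($out);

$count = 0;
foreach (readLines($path) as $line) {
    $count++;
}
unlink($path);

echo "lines read: $count\n";
```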

What does this have to do with PHP?

The chat that worked for 3 days…

What uses streams in PHP?
• EVERYTHING
• include / require (and their _once variants)
• stream functions
• file system functions
• many other extensions

How PHP Streams Work
[Diagram labels: ALL IO, Attach Context, Stream Transport, Stream Filter, Stream Wrapper]

Using Streams

You can also do logic on the fly!

What are Filters?
• Performs operations on stream data
• Can be prepended or appended (even on the fly)
• Can be attached to read or write
• When a filter is added for read and write, two instances of the filter are created.

Using Filters
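A minimal sketch of attaching and removing a filter, assuming the built-in `string.toupper` filter and an in-memory stream. The sample text is invented, and this is not the code from the original slide.

```php
<?php
$stream = fopen('php://memory', 'w+');
fwrite($stream, "terabytes of cat gifs\n");
rewind($stream);

// Attach a built-in filter to the read side only.
$filter = stream_filter_append($stream, 'string.toupper', STREAM_FILTER_READ);
$upper = fgets($stream);

// Filters can be removed on the fly as well.
stream_filter_remove($filter);
rewind($stream);
$plain = fgets($stream);
fclose($stream);

echo $upper; // filtered read
echo $plain; // unfiltered read of the same bytes
```

The data stored in the stream is never changed here; only what the reader sees passes through the filter.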

Things to watch for!
• Data has an input and output state
• When reading in chunks, you may need to cache in between reads to make filters useful
• Use the right tool for the job
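The "cache in between reads" point shows up whenever a record straddles a chunk boundary. The sketch below uses a plain read loop rather than a filter class to keep it short; the same carry-over idea would apply inside a filter. `readRecords()` and the tiny 8-byte chunk size are assumptions made for the demonstration.

```php
<?php
/**
 * Split a stream into delimiter-separated records while reading in
 * small chunks. The unfinished tail of each chunk is carried over to
 * the next read so a record split across two chunks stays intact.
 */
function readRecords($stream, string $delimiter = "\n", int $chunkSize = 8): Generator
{
    $carry = ''; // unfinished record left over from the previous read
    while (!feof($stream)) {
        $chunk = fread($stream, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        $parts = explode($delimiter, $carry . $chunk);
        $carry = array_pop($parts); // last piece may be incomplete
        foreach ($parts as $record) {
            yield $record;
        }
    }
    if ($carry !== '') {
        yield $carry; // final record without a trailing delimiter
    }
}

$stream = fopen('php://memory', 'w+');
fwrite($stream, "tabby\ncalico\nmaine coon\nsphynx");
rewind($stream);
$records = iterator_to_array(readRecords($stream), false);
fclose($stream);

print_r($records);
```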

Throw away your assumptions except for:

There will be Terabytes of Cat Gifs!!

Dimension
Both an object’s size and mathematical space

Random Access Memory (RAM)
• The CPU uses RAM to work
• It randomly shoves data inside and pulls data back out
• RAM is faster than SSD and HDD
• It’s also more expensive

Out of Memory

There are two reasons you’ll see that error
• Recursion recursion recursion recursion
  • Solution: install xdebug and get your stacktrace
• Loading too much data into memory
  • Solution: manage your memory

Inherently PHP hides this problem
• Shared-nothing architecture
• Extensions with C libraries that hide memory consumption
• FastCGI/CGI blows away processes, restoring memory
• Max child and other Apache settings blow away children, restoring memory

How do I fix it!

Halp, I can’t upload!!

Arrays are evil
• There are other ways to store data that are more efficient
• They should be used for small amounts of data
• No matter how hard you try, there is C overhead
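A rough way to see the cost of an array is to measure it. The sketch below compares holding 100,000 integers in an array with producing them one at a time from a generator. `countTo()` and the size are arbitrary choices for the demo, and the exact byte counts will vary by PHP version.

```php
<?php
// Generator that produces the same numbers as range()
// without holding them all in memory at once.
function countTo(int $max): Generator
{
    for ($i = 1; $i <= $max; $i++) {
        yield $i;
    }
}

$n = 100000; // arbitrary size for the demo

$before = memory_get_usage();
$all = range(1, $n);              // the whole array lives in memory
$arraySum = array_sum($all);
$arrayBytes = memory_get_usage() - $before;
unset($all);

$before = memory_get_usage();
$generatorSum = 0;
foreach (countTo($n) as $value) { // one value in memory at a time
    $generatorSum += $value;
}
$generatorBytes = memory_get_usage() - $before;

printf("array: %d bytes, generator: %d bytes\n", $arrayBytes, $generatorBytes);
```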

Process with the appropriate tools
• Load data into the appropriate place for processing
• Hint – arrays are IN MEMORY – that is generally not an appropriate place for processing
• Datastores are meant for storing and retrieving data, use them

Select * from table

Use the iteration, Luke
• Lazy fetching exists for database fetching – use it!
• Always page (window) your result sets from the database – ALWAYS
• Use filters or generators to format or alter results on the fly
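Paging and on-the-fly formatting can be combined in one generator. In this sketch `$fetchPage` is a hypothetical callable standing in for a real `LIMIT`/`OFFSET` query, and the 25 fake rows exist only so the example runs without a database.

```php
<?php
/**
 * Walk a large result set one page at a time.
 * $fetchPage stands in for a real paged query; it is a hypothetical
 * callable taking (limit, offset) and returning an array of rows.
 */
function pagedRows(callable $fetchPage, int $pageSize): Generator
{
    $offset = 0;
    do {
        $rows = $fetchPage($pageSize, $offset);
        foreach ($rows as $row) {
            yield $row;
        }
        $offset += $pageSize;
    } while (count($rows) === $pageSize); // a short page means we hit the end
}

// Stand-in datastore: 25 fake rows served through array_slice().
$table = [];
for ($id = 1; $id <= 25; $id++) {
    $table[] = ['id' => $id, 'name' => "cat $id"];
}
$queries = 0;
$fetchPage = function (int $limit, int $offset) use ($table, &$queries): array {
    $queries++;
    return array_slice($table, $offset, $limit);
};

$names = [];
foreach (pagedRows($fetchPage, 10) as $row) {
    $names[] = strtoupper($row['name']); // format on the fly
}

printf("%d rows in %d page fetches\n", count($names), $queries);
```

Only one page of rows is held at a time inside the generator, so the page size, not the table size, bounds the memory used for fetching.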

The N+1 problem
• In simple terms, nested loops
• Don’t distance yourself too much from your datastore
• Collapse into one or two queries instead
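Collapsing N+1 queries usually means fetching all child rows in one `IN (...)` query and grouping them in PHP. The sketch below only builds the SQL and groups hard-coded rows so it runs without a database. The `toys` table, the `cat_id` column, and both helper names are invented for illustration.

```php
<?php
/**
 * Build one parameterized query for many ids instead of one query per id.
 * The "toys" table and "cat_id" column are invented for this sketch.
 */
function buildToysQuery(array $catIds): array
{
    if ($catIds === []) {
        throw new InvalidArgumentException('Need at least one id');
    }
    $ids = array_values(array_unique($catIds));
    $placeholders = implode(', ', array_fill(0, count($ids), '?'));
    return [
        "SELECT cat_id, name FROM toys WHERE cat_id IN ($placeholders)",
        $ids,
    ];
}

/** Group flat result rows by parent id so the outer loop needs no further queries. */
function groupByCat(array $rows): array
{
    $grouped = [];
    foreach ($rows as $row) {
        $grouped[$row['cat_id']][] = $row['name'];
    }
    return $grouped;
}

[$sql, $params] = buildToysQuery([3, 1, 3, 2]);
echo $sql, "\n";

// Rows as a database driver might return them (hard-coded here).
$rows = [
    ['cat_id' => 1, 'name' => 'mouse'],
    ['cat_id' => 3, 'name' => 'laser'],
    ['cat_id' => 1, 'name' => 'yarn'],
];
$toysByCat = groupByCat($rows);
```

With a real driver, `$sql` and `$params` would be passed to a prepared statement, replacing one query per cat with a single round trip.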

Throw away all your assumptions except:

Speed
The rate at which an object covers distance

How does a CPU work?

CPU limitations
• Transmission delays
• Heat
  • Both are materials limitations
• http://www.mooreslaw.org/

Why I no longer overclock

What does this have to do with PHP?
• You are limited by the CPU your site is deployed upon.
• Yes, even in a cloud – there are still physical systems running your stuff
• Yes, even in a VM – there are still physical systems running your stuff
• Follow good programming habits
• PROFILE

Good programming habits
• Turn on opcache in production!
• Keep your code error AND WARNING free
• Watch complex logic in loops
  • Short circuit the loop
  • Rewrite to do the logic on the entire set in one step
  • Calculate values only once
  • On small arrays use array_walk
  • On large arrays use generators/iterators
• Use isset instead of in_array if possible
• Profile to find the place to rewrite for slow code issues
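The isset tip works by turning values into array keys once, so each later check is a hash lookup instead of a scan. A small sketch with an invented list of file extensions:

```php
<?php
$allowed = ['gif', 'png', 'jpg', 'webp'];

// in_array() scans the list on every call.
$slow = in_array('png', $allowed, true);

// Flip once so the values become keys; calculate this only once,
// outside any loop, then each check is a hash lookup.
$lookup = array_flip($allowed);
$fast = isset($lookup['png']);
$missing = isset($lookup['bmp']);

var_dump($slow, $fast, $missing);
```

For a four-element list the difference is negligible; it starts to matter when the check runs inside a loop over a large set, which is where profiling should confirm it.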

Distribute the load
• Perfect for heavy processing of some types of data
• Queue code that requires heavy processing but not immediate viewing
• Design your UX so you can inform users of completed jobs
• Cache complex work items

Pick your system
• php-resque
• Gearman
• Beanstalkd
• IronMQ
• RabbitMQ
• ZeroMQ
• Amazon SQS
• Just visit http://queues.io

Job queuing and 10K page pdfs

Keep your CPU happy
• Offload processing
• Use a queue

Velocity
Speed + Direction

Networking 101
• IP – forwards packets of data based on a destination address
• TCP – verifies the correct delivery of data from client to server with error and lost data correction
• Network Sockets – subroutines that provide TCP/IP (and UDP and some other support) on most systems

Packet of Data

Speed in the series of tubes
• Bandwidth – size of your pipe
• Latency – length of your pipe including size changes
• Jitter – air bubbles in your pipe

Network Socket Types
• Stream
  • Connection oriented (tcp)
• Datagram
  • Connectionless (udp)
• Raw
  • Low level protocols

Definitions
• Socket
  • Bidirectional network stream that speaks a protocol
• Transport
  • Tells a network stream how to communicate
• Wrapper
  • Tells a stream how to handle specific protocols and encodings

Using Sockets
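A minimal sketch of a stream (TCP) socket, assuming both ends live in one script on the loopback interface so it runs without any outside network. The ping/pong messages and the 2-second timeouts are arbitrary, and this is not the code from the original slide.

```php
<?php
// Listen on an ephemeral loopback port; the OS picks the port number.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
if ($server === false) {
    throw new RuntimeException("listen failed: $errstr ($errno)");
}
$address = stream_socket_get_name($server, false); // local "ip:port"

// Connect a client with a short timeout, then accept it on the server side.
$client = stream_socket_client("tcp://$address", $errno, $errstr, 2.0);
if ($client === false) {
    throw new RuntimeException("connect failed: $errstr ($errno)");
}
$peer = stream_socket_accept($server, 2.0);
if ($peer === false) {
    throw new RuntimeException('accept timed out');
}

// A socket is bidirectional: each side can both write and read.
fwrite($client, "ping\n");
$request = fgets($peer);
fwrite($peer, "pong\n");
$reply = fgets($client);

fclose($client);
fclose($peer);
fclose($server);

echo trim($request), ' -> ', trim($reply), "\n";
```

Note that the ordinary `fwrite()`, `fgets()`, and `fclose()` calls work on the socket, because in PHP a socket opened this way is just another stream.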

What does this have to do with PHP?
• APIs fail
• APIs go bye-bye
• AWS goes down
  • Or loses network connection to a specific area
  • Or otherwise fails

What do you mean we can’t write files?

Prepare for failure
• Handle timeouts
• Handle failures
• Abstract enough to replace systems if necessary, but only as much as necessary
• If you’re not paying for it, don’t base your business model on it
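Handling timeouts and failures can start with a thin wrapper around the request. In the sketch below `fetchOrNull()` is a hypothetical helper name and the 2-second default is an arbitrary choice; the demo call uses a path that does not exist so the failure branch runs without touching the network.

```php
<?php
/**
 * Fetch a URL with a timeout and return null instead of blowing up.
 * fetchOrNull() is a hypothetical helper name used for this sketch.
 */
function fetchOrNull(string $url, float $timeoutSeconds = 2.0): ?string
{
    $context = stream_context_create([
        // Read timeout for requests made through the HTTP wrapper.
        'http' => ['timeout' => $timeoutSeconds],
    ]);
    // @ silences the warning; the false return value is what we act on.
    $body = @file_get_contents($url, false, $context);
    return $body === false ? null : $body;
}

$body = fetchOrNull('/no/such/file/for/this/demo');
echo $body === null ? "fallback path taken\n" : $body;
```

Keeping the call behind one small function is also a cheap form of the abstraction mentioned above: swapping the remote system later means changing one place.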

Checklist
• Cultivate good coding habits
• Try not to loop logic or processing
• Don’t be afraid to offload work to other systems or services
• Assume every file is huge
• Assume there are 1 million rows in your DB table
• Assume that every network request is slow or going to fail
• Profile to find code bottlenecks, DON’T assume you know the bottleneck
• Wrap 3rd party tools enough to deal with downtime or retirement of APIs

SHHHHHH
• Plotting
• https://github.com/phplang/streams2/wiki
• PHP is always improving!

About Me
http://emsmith.net
auroraeosrose@gmail.com
twitter – @auroraeosrose
IRC – freenode – auroraeosrose #phpmentoring
https://joind.in/talk/18dd4