Building Reliable Systems From Unreliable Parts


You might think that the problems at large scale are different from the problems at small scale, but it's all failure all the time, and it only gets worse at large scale.

Plan for it.

All systems fail

Tell the story about the failure last week when a developer pushed a small Java warmup script and took out all of the Netflix API servers.

Jonah Horowitz

(Site Reliability Engineer at Netflix and elsewhere)

Home built BBS in 1990
Some NOC/Helpdesk work
Walmart.com in 2000
BSEE from the Univ of Cincinnati
Music Startup in 2005
Telecom Startup in 2007
Advertising companies
Netflix

Talk about Netflix scale: 100k servers, 80M users, 30TB/s network traffic, 800 microservices, 1500 engineers.

Chaos is your friend

Talk about Chaos Monkey, Chaos Kong
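A minimal sketch of the Chaos Monkey idea, which is to kill a random instance on purpose and check that the service keeps answering. The `Cluster` class, its instance names, and `handle_request` are hypothetical stand-ins for illustration, not Netflix's actual tooling.

```python
import random


class Cluster:
    """Hypothetical in-memory stand-in for a fleet of identical instances."""

    def __init__(self, size):
        # Map instance name -> whether it is alive.
        self.instances = {f"i-{n}": True for n in range(size)}

    def healthy(self):
        return [name for name, alive in self.instances.items() if alive]

    def handle_request(self):
        # Any healthy instance can answer; fail only if none are left.
        alive = self.healthy()
        if not alive:
            raise RuntimeError("no healthy instances")
        return f"served by {alive[0]}"


def chaos_monkey(cluster, rng):
    """Terminate one randomly chosen healthy instance and return its name."""
    victim = rng.choice(cluster.healthy())
    cluster.instances[victim] = False
    return victim


cluster = Cluster(size=3)
rng = random.Random(0)  # seeded so the run is repeatable
killed = chaos_monkey(cluster, rng)
response = cluster.handle_request()  # still succeeds with two instances left
```

Running this kind of test continuously is what turns "we think we survive an instance loss" into something you have actually observed.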

Stateless services are awesome

Most of the 800 microservices at Netflix are stateless, which allows for failure.
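One way to see why statelessness allows for failure: if a handler's answer depends only on the request, any replica can serve it, so a caller can simply move on to the next replica when one is down. The sketch below assumes hypothetical names (`make_replica`, `call_with_failover`) and a toy handler.

```python
def make_replica(name, up=True):
    """Return a hypothetical stateless handler: output depends only on the request."""

    def handle(request):
        if not up:
            raise ConnectionError(f"{name} is down")
        return request.upper()

    return handle


def call_with_failover(replicas, request):
    """Try each replica in turn; any one of them can serve a stateless request."""
    last_error = None
    for replica in replicas:
        try:
            return replica(request)
        except ConnectionError as err:
            last_error = err  # remember the failure and try the next replica
    raise RuntimeError("all replicas failed") from last_error


replicas = [make_replica("a", up=False), make_replica("b"), make_replica("c")]
result = call_with_failover(replicas, "play title 42")
```

Replica "a" is down here, yet the caller gets the same answer it would have received from "a", because no replica holds anything the others lack.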

Store state somewhere

Globally replicated Cassandra database rings, massive number of nodes. Even at a smaller scale, you should have 2 copies of your database.
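A toy illustration of the "2 copies" advice: write to every reachable copy and read from whichever one answers. `Copy` and `ReplicatedStore` are hypothetical classes, and this sketch deliberately ignores how a copy that missed writes gets caught up again, which a real database such as Cassandra handles for you.

```python
class Copy:
    """Hypothetical stand-in for one copy of a database."""

    def __init__(self):
        self.rows = {}
        self.up = True

    def put(self, key, value):
        if not self.up:
            raise ConnectionError("copy is down")
        self.rows[key] = value

    def get(self, key):
        if not self.up:
            raise ConnectionError("copy is down")
        return self.rows[key]


class ReplicatedStore:
    def __init__(self, copies):
        self.copies = copies

    def put(self, key, value):
        # Write to every reachable copy; succeed if at least one accepted it.
        written = 0
        for copy in self.copies:
            try:
                copy.put(key, value)
                written += 1
            except ConnectionError:
                pass
        if written == 0:
            raise RuntimeError("no copy accepted the write")
        return written

    def get(self, key):
        # Read from the first reachable copy.
        for copy in self.copies:
            try:
                return copy.get(key)
            except ConnectionError:
                continue
        raise RuntimeError("no copy is reachable")


primary, secondary = Copy(), Copy()
store = ReplicatedStore([primary, secondary])
store.put("user:1", "resume at 00:42:10")
primary.up = False  # lose one copy
value = store.get("user:1")  # the second copy still has the data
```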

Repair Automatically

Never have an engineer do something that can be done automatically. Computers are better at pushing buttons than you are.

Talk about rebootageddon: zero downtime even though 1/3 of our Cassandra servers were rebooted over 48 hours.
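Automatic repair can be as simple as a loop that compares the fleet you have with the fleet you want and replaces whatever is unhealthy. This is a sketch under stated assumptions: `repair`, `launch`, and the node names are hypothetical, and a real system would run this on a schedule against a real health check.

```python
import itertools


def repair(fleet, desired, is_healthy, launch):
    """Drop unhealthy instances and launch replacements until `desired` remain."""
    survivors = [inst for inst in fleet if is_healthy(inst)]
    while len(survivors) < desired:
        survivors.append(launch())
    return survivors


# Hypothetical launcher that hands out fresh node names.
counter = itertools.count(3)


def launch():
    return f"node-{next(counter)}"


broken = {"node-1"}  # pretend this node failed its health check
fleet = ["node-0", "node-1", "node-2"]
fleet = repair(fleet, desired=3, is_healthy=lambda i: i not in broken, launch=launch)
```

No engineer had to notice the broken node or push a button; the loop converges back to three healthy nodes on its own.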

Even in open source projects.

Culture is important

Jonah Horowitz
Site Reliability Engineer


Netflix lawyers didn't approve my talk, so everything I said was my own opinion.

Speakers here were really inspiring.
