The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
The Peril and Promise of Early Adoption: Arriving 10 Years Early to Containers
Bryan Cantrill
CTO, Joyent
@bcantrill
Who is Joyent?
• In an interview with ACM Queue in 2008, Joyent’s mission was described concisely — if ambitiously:
Virtualization as cloud catalyst
• This vision — dating back to 2005 — was an example of early cloud computing, but was itself not a new vision...
• In the 1960s — shortly after the dawn of computing! — pundits foresaw a multi-tenant compute utility
• The vision was four decades too early: it took the internet + commodity computing + virtualization to yield cloud computing
• Virtualization is the essential ingredient for multi-tenant operation — but where in the stack to virtualize?
• Choices around virtualization capture tensions between elasticity, tenancy, and performance
• tl;dr: Virtualization choices drive economic tradeoffs
• The historical answer — since the 1960s — has been to virtualize at the level of the hardware:
• A virtual machine is presented upon which each tenant runs an operating system of their choosing
• There are as many operating systems as tenants
• The singular advantage of hardware virtualization: it can run entire legacy stacks unmodified
• However, hardware virtualization exacts a heavy price: operating systems are not designed to share resources like DRAM, CPU, I/O devices or the network
• Hardware virtualization limits tenancy, elasticity and performance
Hardware-level virtualization?
• Virtualizing at the application platform layer addresses the tenancy challenges of hardware virtualization
• Added advantage of a much more nimble (& developer-friendly!) abstraction…
• ...but at the cost of dictating abstraction to the developer
• This creates the “Google App Engine problem”: developers are in a straitjacket where toy programs are easy — but sophisticated apps are impossible
• Virtualizing at the application platform layer poses many other challenges with respect to security, containment and scalability
Platform-level virtualization?
• Virtualizing at the OS level hits the sweet spot:
• Single OS (i.e., single kernel) allows for efficient use of hardware resources, maximizing tenancy and performance
• Disjoint instances are securely compartmentalized by the operating system
• Gives users what appears to be a virtual machine (albeit a very fast one) on which to run higher-level software
• The ease of a PaaS with the generality of IaaS
• Model was pioneered by FreeBSD jails, taken to its logical extreme by Solaris zones — and then aped by Linux containers
OS-level virtualization!
OS-level virtualization in the cloud
• Joyent runs OS containers in the cloud via SmartOS (our illumos derivative) — and we have run containers in multi-tenant production since ~2005
• Core SmartOS facilities are container-aware and optimized: Zones, ZFS, DTrace, Crossbow, SMF, etc.
• SmartOS also supports hardware-level virtualization — but we have long advocated OS-level virtualization for new build out
• We emphasized their operational characteristics (performance, elasticity, tenancy)...
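Provisioning one of these containers on SmartOS amounts to feeding a JSON payload to vmadm(1M); a minimal sketch — the image_uuid, alias and NIC details here are placeholders, not real values:

```json
{
  "brand": "joyent",
  "image_uuid": "00000000-0000-0000-0000-000000000000",
  "alias": "example-zone",
  "max_physical_memory": 512,
  "quota": 10,
  "nics": [
    { "nic_tag": "admin", "ip": "dhcp" }
  ]
}
```

Piped to `vmadm create`, a payload along these lines yields a booted, fully isolated container in seconds — no guest OS install required.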
And it worked!
• Our vision captured developers seeking to scale apps — and by 2007, a rapidly growing Twitter ran on Joyent Accelerators:
But there were challenges...
• OS-based virtualization was a tremendous strength — but SmartOS being (seemingly) spuriously different made it difficult to capture developer mind-share
• Differences are more idiosyncratic than meaningful, but they became an obstacle to adoption…
• Adopters had to be highly technical and really care about performance/scale
• Differentiating on performance alone is challenging, especially when the platform is different: too tempting to blame the differences instead of using the differentiators
Could we go upstack?
• To recapture the developer, we needed to get upstack
• First attempt was SmartPlatform (ca. 2009?), a JavaScript (SpiderMonkey!) + Perl Frankenstein PaaS
• SmartPlatform had all of the problems of SpiderMonkey, Perl and a PaaS — but showed the value of server-side JavaScript
• When node.js first appeared in late 2009, we were among the first to see its promise, and we lunged...
node.js + OS-based virtualization?
• In 2010, the challenge became to tie node.js to our most fundamental differentiator, OS-based virtualization
• The first experiment was a high-tenancy, container-based PaaS, no.de, launched for Node Knockout in Fall 2010
• We ran high tenancy (400+ machines in 48GB), high performance — and developed DTrace-based graphical observability
• Early results were promising...
node.js + OS-based virtualization!
no.de: Challenges of a PaaS
• We went on to develop full cloud analytics for no.de:
• But the PaaS business is more than performance management — and it was clear that it was very early in what was going to be a tough business...
node.js: Wins and frustrations
• The SmartOS + node.js efforts were successful inasmuch as new developer converts to SmartOS were (and are!) often coming from node.js
• The debugging we built into node.js on SmartOS is (frankly) jawdropping — and essential for serious use...
• ...but our differentiators are production-oriented — developers still have to be highly technical, and still have to be willing to endure transitional pain
• Exacerbated by the fact that applications aren’t built in node.js — they are connected with node.js
• We ended up back with familiar problems...
Hardware virtualization?
• In late 2010, it was clear that — despite the (obvious!) technical superiority of OS-based virtualization — we also needed hardware-based virtualization
• Could OS-based virtualization help us differentiate a hardware virtualization implementation?
• If we could port KVM to SmartOS, we could offer advantages over other hypervisors: shared filesystem cache, double-hulled security, global observability
• The problem is that KVM isn’t, in fact, portable — and had never been ported to a different system
KVM + SmartOS: Supergroup or stopgap?
• In 2011, we managed to successfully port KVM to SmartOS, making it the first (and only) system to offer HW virtualization within OS virtualization
• Over the course of 2011, we built SmartDataCenter, a container-based orchestration and cloud-management system around SmartOS
• Deployed SmartDataCenter into production in the Joyent Public Cloud in late 2011
• Over the course of 2012, our entire cloud moved to SDC
• This was essential: most of our VMs today run inside KVM, and many customers are hybrid
The limits of hardware virtualization
• Ironically, our time on KVM helped to reinforce our most fundamental beliefs in OS-based virtualization...
• We spent significant time making KVM on SmartOS perform — but there are physical limits
• There are certain performance and resource problems around HW-based virtualization that are simply intractable
• While it is indisputably the right abstraction for running legacy software, it is the wrong abstraction for future elastic infrastructure!
Aside: Cloud storage
• In 2011, the gaping hole in the Joyent Public Cloud was storage — but we were reluctant to build an also-ran S3
• In thinking about this problem, it was tempting to fixate on ZFS, one of our most fundamental differentiators
• ZFS rivals OS-based virtualization as our earliest differentiator: we were the first large, public deployment of ZFS (ca. 2006) — and a long-time proponent
• While ZFS was part of the answer, it should have been no surprise that OS-based virtualization...
ZFS + OS-based virtualization?
Manta: ZFS + OS-based virtualization!
• Over 2012 and early 2013, we built Manta, a ZFS- and container-based internet-facing object storage system offering in situ compute
• OS-based virtualization allows the description of compute to be brought to where objects reside instead of having to backhaul objects to transient compute
• The abstractions made available for computation are anything that can run on the OS...
• ...and as a reminder, the OS — Unix — was built around the notion of ad hoc unstructured data processing, and allows for remarkably terse expressions of computation
Aside: Unix
• When Unix appeared in the early 1970s, it was not just a new system, but a new way of thinking about systems
• Instead of a sealed monolith, the operating system was a collection of small, easily understood programs
• First Edition Unix (1971) contained many programs that we still use today (ls, rm, cat, mv)
• Its very name conveyed this minimalist aesthetic: Unix is a homophone of “eunuchs” — a castrated Multics
We were a bit oppressed by the big system mentality. Ken wanted to do something simple. — Dennis Ritchie
Unix: Let there be light
• In 1969, Doug McIlroy had the idea of connecting different components:
At the same time that Thompson and Ritchie were sketching out a file system, I was sketching out how to do data processing on the blackboard by connecting together cascades of processes
• This was the primordial pipe, but it took three years to persuade Thompson to adopt it:
And one day I came up with a syntax for the shell that went along with the piping, and Ken said, “I’m going to do it!”
Unix: ...and there was light
And the next morning we had this orgy of one-liners. — Doug McIlroy
The Unix philosophy
• The pipe — coupled with the small-system aesthetic — gave rise to the Unix philosophy, as articulated by Doug McIlroy:
• Write programs that do one thing and do it well
• Write programs to work together
• Write programs that handle text streams, because that is a universal interface
• Four decades later, this philosophy remains the single most important revolution in software systems thinking!
• In 1986, Jon Bentley posed the challenge that became the Epic Rap Battle of computer science history:
Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
• Don Knuth’s solution: an elaborate program in WEB, a Pascal-like literate programming system of his own invention, using a purpose-built algorithm
• Doug McIlroy’s solution shows the power of the Unix philosophy:
tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c | sort -rn | sed ${1}q
Doug McIlroy v. Don Knuth: FIGHT!
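McIlroy's pipeline still runs unchanged on any modern Unix; a quick demonstration on a throwaway corpus, with the `${1}` script parameter replaced by an explicit 2 (i.e., the top two words):

```shell
# Build a tiny corpus: "the" appears three times, "fox" twice.
printf 'the quick fox the lazy fox the\n' > /tmp/corpus.txt

# McIlroy's solution: split into one word per line, lowercase,
# count duplicates, sort by count, keep the top two.
tr -cs 'A-Za-z' '\n' < /tmp/corpus.txt | tr 'A-Z' 'a-z' | \
  sort | uniq -c | sort -rn | sed 2q
# prints "3 the" then "2 fox" (uniq -c adds leading whitespace)
```

Six small programs, each doing one thing, connected by pipes — the philosophy in a single line.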
Big Data: History repeats itself?
• The original Google MapReduce paper (Dean et al., OSDI ’04) poses a problem disturbingly similar to Bentley’s challenge nearly two decades prior:
Count of URL Access Frequency: The map function processes logs of web page requests and outputs ⟨URL, 1⟩. The reduce function adds together all values for the same URL and emits a ⟨URL, total count⟩ pair
• But the solutions do not adhere to the Unix philosophy...
• ...and nor do they make use of the substantial Unix foundation for data processing
• e.g., Appendix A of the OSDI ’04 paper has a 71 line word count in C++ — with nary a wc in sight
• Manta allows for an arbitrarily scalable variant of McIlroy’s solution to Bentley’s challenge:
mfind -t o /bcantrill/public/v7/usr/man | \
  mjob create -o -m "tr -cs A-Za-z '\n' | tr A-Z a-z | sort | uniq -c" -r \
  "awk '{ x[\$2] += \$1 } END { for (w in x) { print x[w] \" \" w } }' | \
  sort -rn | sed ${1}q"
• This description is not only terse, it is high-performing: data is left at rest — with the “map” phase doing heavy reduction of the data stream
• As such, Manta — like Unix — is not merely syntactic sugar; it converges compute and data in a new way
Manta: Unix for Big Data
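The two phases of that Manta job compose just as well locally; a sketch that fakes two "objects" with files and runs the same map and reduce commands over them (in Manta, the map would run on each storage node holding an object, with only the small per-object counts shipped to the reducer):

```shell
# Two stand-in "objects" (in Manta these would live in the object store):
printf 'foo bar foo\n' > /tmp/obj1
printf 'bar baz\n'     > /tmp/obj2

# Map phase, applied per object: one word per line, lowercased, counted.
map() { tr -cs 'A-Za-z' '\n' < "$1" | tr 'A-Z' 'a-z' | sort | uniq -c; }

# Reduce phase: merge the per-object counts, exactly as in the mjob's awk.
{ map /tmp/obj1; map /tmp/obj2; } | \
  awk '{ x[$2] += $1 } END { for (w in x) print x[w], w }' | sort -rn
# counts: 2 foo, 2 bar, 1 baz (tie order under sort -rn may vary)
```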
Manta revolution
• Our experiences with Manta — like those with KVM — have served to strengthen our core belief in OS-based virtualization
• Compute/data convergence is clearly the future of big data: stores of record must support computation as a first-class, in situ operation
• Unix is a natural way of expressing this computation — and the OS is clearly the right level at which to virtualize to support this securely
• Manta will surely not be the only system to represent the confluence of these; the rest of the world will (ultimately) figure out the power of OS-based virtualization
Manta mental challenges
• Our biggest challenge with Manta has been that the key underlying technology — OS-based virtualization — is not well understood
• We underestimated the degree to which this would be an impediment: Manta felt “easy” to us
• When technology requires a shift in mental model, its transformative power must be that much greater to compensate for its increased burden!
• Would the world ever really figure out containers?!
Containers as PaaS foundation?
• Some saw the power of OS containers to facilitate up-stack platform-as-a-service abstractions
• For example, dotCloud — a platform-as-a-service provider — built their PaaS on OS containers
• Hearing that many were interested in their container orchestration layer (but not their PaaS), dotCloud open sourced their container-based orchestration layer...
...and Docker was born
Docker revolution
• Docker has used the rapid provisioning + shared underlying filesystem of containers to allow developers to think operationally
• Developers can encode dependencies and deployment practices into an image
• Images can be layered, allowing for swift development
• Images can be quickly deployed — and re-deployed
• As such, Docker is a perfect fit for microservices
• Docker will do to apt what apt did to tar
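The operational encoding these bullets describe is just a Dockerfile; a minimal, hypothetical node.js service (the base image tag and file names are illustrative, not from the talk):

```dockerfile
# Base image: a Linux userland, itself a stack of layers
FROM node:0.10

WORKDIR /app

# Copy the dependency manifest first so the npm-install layer caches well
COPY package.json .
RUN npm install

# Application code changes most often, so it goes in the topmost layer
COPY server.js .

EXPOSE 8080
CMD ["node", "server.js"]
```

`docker build -t myservice .` produces the image; change server.js and rebuild, and every layer above the final COPY is reused — this is the swift, layered development the bullets describe.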
Docker’s challenges
• The Docker model is the future of containers
• Docker’s challenges are largely around production deployment: security, network virtualization, persistence
• Security concerns are real enough that for multi-tenancy, OS containers are currently running in hardware VMs (!!)
• In SmartOS, we have spent a decade addressing these concerns — and we have proven it in production…
• Could we combine the best of both worlds?
• Could we somehow deploy Docker containers as SmartOS zones?
Docker + SmartOS: Linux binaries?
• First (obvious) problem: while it has been designed to be cross-platform, Docker is Linux-centric
• While Docker could be ported, the encyclopedia of Docker images will likely forever remain Linux binaries
• SmartOS is Unix — but it isn’t Linux…
• Could we somehow natively emulate Linux — and run Linux binaries directly on the SmartOS kernel?
OS emulation: An old idea
• Operating systems have long employed system call emulation to allow binaries from one operating system to run on another on the same instruction set architecture
• Combines the binary footprint of the emulated system with the operational advantages of the emulating system
• Done as early as 1969 with DEC’s PA1050 (TOPS-10 on TOPS-20); Sun did this (for similar reasons) ca. 1993 with SunOS 4.x binaries running on Solaris 2.x
• In mid-2000s, Sun developed zone-based OS emulation for Solaris: branded zones
• Several brands were developed — notably including an LX brand that allowed for Linux emulation
LX-branded zones: Life and death
• The LX-branded zone worked for RHEL 3 (!): glibc 2.3.2 + Linux 2.4
• Remarkable amount of work was done to handle device pathing, signal handling, /proc — and arcana like TTY ioctls, ptrace, etc.
• Worked for a surprising number of binaries!
• But support was only for 2.4 kernels and only for 32-bit; 2.6 + 64-bit appeared daunting…
• Support was ripped out of the system on June 11, 2010
• Fortunately, this was after the system was open sourced in June 2005 — and the source was out there...
LX-branded zones: Resurrection!
• In January 2014, David Mackay, an illumos community member, announced that he was able to resurrect the LX brand — and that it appeared to work!
Linked below is a webrev which restores LX branded zones support to Illumos:
http://cr.illumos.org/~webrev/DavidJX8P/lx-zones-restoration/
I have been running OpenIndiana, using it daily on my workstation for over a month with the above webrev applied to the illumos-gate and built by myself.
It would definitely raise interest in Illumos. Indeed, I have seen many people who are extremely interested in LX zones.
The LX zones code is minimally invasive on Illumos itself, and is mostly segregated out.
I hope you find this of interest.
LX-branded zones: Revival
• Encouraged that the LX-branded work was salvageable, Joyent engineer Jerry Jelinek reintegrated the LX brand into SmartOS on March 20, 2014...
• ...and started the (substantial) work to modernize it
• Guiding principles for LX-branded zone work:
• Do it all in the open
• Do it all on SmartOS master (illumos-joyent)
• Add base illumos facilities wherever possible
• Aim to upstream to illumos when we’re done
LX-branded zones: Progress
• Working assiduously over the course of 2014, progress was difficult but steady:
• Ubuntu 10.04 booted in April
• Ubuntu 12.04 booted in May
• Ubuntu 14.04 booted in July
• 64-bit Ubuntu 14.04 booted in October (!)
• Going into 2015, it was becoming increasingly difficult to find Linux software that didn’t work...
LX-branded zones: Working well...
...and, um, well received
Docker + SmartOS: Provisioning?
• With the binary problem being tackled, focus turned to the mechanics of integrating Docker with the SmartOS facilities for provisioning
• Provisioning a SmartOS zone operates via the global zone that represents the control plane of the machine
• docker is a single binary that functions as both client and server — and it has too much surface area to run in the global zone, especially for a public cloud
• docker also embeds Go-isms and Linux-isms that we did not want in the global zone; we needed to find a different approach...
Docker Remote API
• While docker is a single binary that can run on the client or the server, it does not run in both at once…
• docker (the client) communicates with docker (the server) via the Docker Remote API
• The Docker Remote API is expressive, modern and robust (i.e. versioned), allowing for docker to communicate with Docker backends that aren’t docker
• The clear approach was therefore to implement a Docker Remote API endpoint for SmartDataCenter
Aside: SmartDataCenter
• Orchestration software for SmartOS-based clouds
• Unlike other cloud stacks, not designed to run arbitrary hypervisors, sell legacy hardware or get 160 companies to agree on something
• SmartDataCenter is designed to leverage the SmartOS differentiators: ZFS, DTrace and (esp.) zones
• Runs both the Joyent Public Cloud and business-critical on-premises clouds at well-known brands
• Born proprietary — but made entirely open source on November 6, 2014: http://github.com/joyent/sdc
SmartDataCenter: Architecture
[Architecture diagram: a head-node (flash-booted SmartOS kernel) runs the SDC core services — booter (DHCP/TFTP), AMQP broker, public API and customer portal (public HTTP), operator portal, and binder/DNS — and manages tens to hundreds of network-booted compute nodes per head-node. Each compute node runs a single SmartOS kernel atop a ZFS-based multi-tenant filesystem, hosting virtual SmartOS instances (OS virtualization) alongside Linux and Windows guests (HW virtualization), each with virtual NICs and firewalls; per-node agents (provisioner, instrumenter, heartbeater) communicate with the head-node over AMQP.]
SmartDataCenter: Core Services
[Core services diagram: the SDC 7 core services comprise the key/value service (Moray), directory service (UFDS), workflow API, virtual machine API (VMAPI), compute-node API (CNAPI), network API (NAPI), firewall API (FWAPI), designation API (DAPI), image API, packaging API (PAPI), service API (SAPI), alerts & monitoring (Amon), and the analytics aggregator — alongside booter (DHCP/TFTP), AMQP broker, binder (DNS), the public API, and the customer and operator portals; operator services connect to Manta and other datacenters. Service interdependencies are not shown for readability; core services other than those on the head-node may be provisioned on compute nodes.]
SmartDataCenter + Docker
• Implementing an SDC-wide endpoint for the Docker remote API allows us to build in terms of our established core services: UFDS, CNAPI, VMAPI, Image API, etc.
• Has the welcome side-effect of virtualizing the notion of Docker host machine: Docker containers can be placed anywhere within the data center
• From a developer perspective, one less thing to manage
• From an operations perspective, allows for a flexible layer of management and control: Docker API endpoints become a potential administrative nexus
• As such, virtualizing the Docker host is somewhat analogous to the way ZFS virtualized the filesystem...
SmartDataCenter + Docker: Challenges
• Some Docker constructs have (implicitly) encoded co-locality of Docker containers on a physical machine
• Some of these constructs (e.g., --volumes-from) we will discourage but accommodate by co-scheduling
• Others (e.g., host directory-based volumes) we are implementing via NFS backed by Manta, our (open source!) distributed object storage service
• Moving forward, we are working with Docker to help assure that the Docker Remote API doesn’t create new implicit dependencies on physical locality
SmartDataCenter + Docker: Networking
• Parallel to our SmartOS and Docker work, we have been working on next-generation software-defined networking for SmartOS and SmartDataCenter
• Goal was to use standard encapsulation/decapsulation protocols (i.e., VXLAN) for overlay networks
• We have taken a kernel-based (and ARP-inspired) approach to assure scale
• Complements SDC’s existing in-kernel, API-managed firewall facilities
• All done in the open: in SmartOS (illumos-joyent) and as sdc-portolan
Putting it all together: sdc-docker
• Our Docker engine for SDC, sdc-docker, implements the endpoints for the Docker Remote API
• Work is young (started in earnest in early fall 2014), but because it takes advantage of a proven orchestration substrate, progress has been very quick…
• We are deploying it into early access production in the Joyent Public Cloud in Q1CY15 (yes: T-12 days!)
• It’s open source: http://github.com/joyent/sdc-docker; you can install SDC (either on hardware or on VMware) and check it out for yourself!
Containers: reflecting back
• For nearly a decade, we at Joyent have believed that OS-virtualized containers are the future of computing
• While the efficiency gains are tremendous, they have not alone been enough to propel containers into the mainstream
• Containers are being propelled by Docker and its embodiment of an entirely different advantage of OS containers: developer agility
• With Docker, the moment for the technology seems to have arrived: it is in the right place at the right time
• Reflecting back on our adventure as an early adopter...
Early adoption: The peril
• When working on a revolutionary technology, it’s easy to dismiss the inconveniences as casualties of the future
• Some conveniences are actually constraints — but it can be very difficult to discern which!
• When adopters must endure painful differences to enjoy the differentiators, the economic advantages of a technological revolution are undermined
• And even when the thinking does shift, it can take a long time; as Keynes famously observed, “the market can stay irrational longer than you can stay solvent”!
Early adoption: The promise
• When the payoffs do come, they can be tremendously outsized with respect to the risk
• Placing gutsy technological bets attracts like-minded technologists — which can create uniquely fertile environments for innovation
• If and where early adoption is based on open source, the community of like-minded technologists is not confined to be within a company’s walls
• Open source innovation allows for new customers and/or new employees: for early adopters, open source is the farm system!
Early adoption: The peril and the promise
• While early adoption isn’t for everyone, every organization should probably be doing some early adoption somewhere — and probably in the open
• When an early adopter of a technology, don’t innovate in too many directions at once: know the differentiators and focus on ease of use/adoption for everything else
• Stay flexible and adaptable! You may very well be right on trajectory, but wrong on specifics
• Don’t give up! Technological revolutions happen much slower than you think they should — and then much more quickly than anyone would think possible
• “God bless the early adopters!”
Thank you!
• Jerry Jelinek, @jmclulow, @pfmooney and @jperkin for their work on LX branded zones
• @joshwilsdon, @trentmick, @cachafla and @orlandov for their work on sdc-docker
• @rmustacc, @wayfaringrob, @fredfkuo and @notmatt for their work on SDC overlay networking
• @dapsays for his work on Manta and node.js debugging
• @tjfontaine for his work on node.js
• The countless engineers who have worked on or with us because they believed in OS-based virtualization!