Lxc – next gen virtualization for cloud intro (cloudexpo)
boden-russell -
Linux Containers – NextGen Virtualization for Cloud (Intro & Overview)
Cloud Expo, June 10-12, 2014, New York City, NY
Boden Russell ([email protected])
04/11/2023 2
Why LXC: Performance
Provision time
– Manual: Days
– VM: Minutes
– LXC: Seconds / ms
[Chart: linpack performance @ 45000; axes: vcpus, GFlops]
Why LXC: Industry Uptrend
[Charts: Google trends - LXC; Google trends - docker]
Why LXC: Flexible & Lightweight
[Diagram: Virtual Machines vs. Linux Containers. VM stacks: OS, bins / libs, app. Container stacks: bins / libs, app, on a shared OS. Axis labels: Flexibility, Density.]
Why LXC: Lower TCO
Supported out of the box by a modern Linux kernel
Open source toolsets
Cloudy integration
Definitions
Linux Containers (LXC: LinuX Containers)
– Lightweight virtualization
– Realized using features provided by a modern Linux kernel
– VMs without the hypervisor (kind of)
Containerization of
– (Linux) Operating Systems
– Single or multiple applications
LXC as a technology ≠ LXC “tools”
Hypervisors vs. Linux Containers
[Diagram: three stacks compared, listed bottom to top]
– Type 1 Hypervisor: Hardware, Hypervisor, Virtual Machines (each with Operating System, Bins / libs, Apps)
– Type 2 Hypervisor: Hardware, Operating System, Hypervisor, Virtual Machines (each with Operating System, Bins / libs, Apps)
– Linux Containers: Hardware, Operating System, Containers (each with Bins / libs, Apps)
Containers share the OS kernel of the host and thus are lightweight. However, every container on a host must run on that same OS kernel.
Containers are isolated, but share OS and, where appropriate, libs / bins.
LXC Technology Stack
[Diagram: LXC technology stack, listed top to bottom]
User Space
– Orchestration & Management
– Linux Container Commoditization
– Linux Container Tooling
– GLIBC / Pseudo FS / User Space Tools & Libs
Kernel Space
– System Call Interface
– Kernel
– Architecture Dependent Kernel Code
Hardware
Components shown: cgroups, namespaces, chroots, LSM, lxc
So You Want To Build A Container?
High level checklist
– Process(es)
– Throttling / limits
– Prioritization
– Resource isolation
– Root file system
– Security
[Diagram: my-lxc container, contents still unknown (?)]
Linux Control Groups (cgroups)
Problem
– How do I throttle, prioritize, control and obtain metrics for a group of tasks (processes)?
Solution: control groups (cgroups)
– Device Access
– Resource limiting
– Prioritization
– Accounting
– Control
– Injection
[Diagram: cgroup blue containing three processes (proc)]
Linux cgroup Subsystems
Subsystem: Tunable Parameters
blkio
- Weighted proportional block I/O access. Group wide or per device.
- Per device hard limits on block I/O read/write specified as bytes per second or IOPS.
cpu
- Time period (microseconds per second) a group should have CPU access.
- Group wide upper limit on CPU time per second.
- Weighted proportional value of relative CPU time for a group.
cpuset
- CPUs (cores) the group can access.
- Memory nodes the group can access and migration ability.
- Memory hardwall, pressure, spread, etc.
devices
- Define which devices and access type a group can use.
freezer
- Suspend/resume group tasks.
memory
- Max memory limits for the group (in bytes).
- Memory swappiness, OOM control, hierarchy, etc.
hugetlb
- Limit HugeTLB size usage.
- Per cgroup HugeTLB metrics.
net_cls
- Tag network packets with a class ID.
- Use tc to prioritize tagged packets.
net_prio
- Weighted proportional priority on egress traffic (per interface).
Linux cgroups Pseudo FS Interface
Linux pseudo FS is the interface to cgroups
– Directory per subsystem per cgroup
– Read / write to pseudo file(s) in your cgroup directory

/sys/fs/cgroup/my-lxc
|-- blkio
|   |-- blkio.io_merged
|   |-- blkio.io_queued
|   |-- blkio.io_service_bytes
|   |-- blkio.io_serviced
|   |-- blkio.io_service_time
|   |-- blkio.io_wait_time
|   |-- blkio.reset_stats
|   |-- blkio.sectors
|   |-- blkio.throttle.io_service_bytes
|   |-- blkio.throttle.io_serviced
|   |-- blkio.throttle.read_bps_device
|   |-- blkio.throttle.read_iops_device
|   |-- blkio.throttle.write_bps_device
|   |-- blkio.throttle.write_iops_device
|   |-- blkio.time
|   |-- blkio.weight
|   |-- blkio.weight_device
|   |-- cgroup.clone_children
|   |-- cgroup.event_control
|   |-- cgroup.procs
|   |-- notify_on_release
|   |-- release_agent
|   `-- tasks
|-- cpu
|   |-- ...
|-- ...
`-- perf_event

echo "8:16 1048576" > blkio.throttle.read_bps_device

cat blkio.weight_device
dev     weight
8:1     200
8:16    500

[Diagram: three apps (App) sharing the block devices]
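The read / write pattern on this slide can be wrapped in two small helpers. This is a minimal sketch rather than anything from the original deck: `CGROUP_DIR`, `cg_set` and `cg_get` are hypothetical names, the default path is only an assumed example, and writing to a real cgroup directory normally requires root.

```sh
# Hypothetical helpers. CGROUP_DIR is an assumption and must point at an
# existing cgroup directory (writing under /sys/fs/cgroup needs root).
CGROUP_DIR="${CGROUP_DIR:-/sys/fs/cgroup/blkio/my-lxc}"

# cg_set FILE VALUE: write a tunable into the cgroup's pseudo file.
cg_set() {
    printf '%s\n' "$2" > "$CGROUP_DIR/$1"
}

# cg_get FILE: read a tunable or metric back.
cg_get() {
    cat "$CGROUP_DIR/$1"
}

# Example (as root): limit reads on device 8:16 to 1048576 bytes per second.
# cg_set blkio.throttle.read_bps_device "8:16 1048576"
# cg_get blkio.throttle.read_bps_device
```

Because the helpers only touch files under `CGROUP_DIR`, they can be tried against a scratch directory before pointing them at a live hierarchy.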
Linux cgroups FS Layout
Linux cgroups: CPU Usage
Use CPU shares (and other controls) to prioritize jobs / containers
Carry out complex scheduling schemes
Segment host resources
Adhere to SLAs
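CPU shares are relative weights, so what a container receives depends on who else is competing. The sketch below works out that proportion, assuming every listed group is fully busy at the same time. `share_pct` is a hypothetical helper and the share values are made-up examples.

```sh
# share_pct MINE ALL...: integer percentage of CPU time a group with
# cpu.shares=MINE receives when every listed group (MINE included) is busy.
share_pct() {
    mine=$1
    shift
    total=0
    for s in "$@"; do
        total=$((total + s))
    done
    echo $((mine * 100 / total))
}

share_pct 1024 1024 512 512   # prints 50
share_pct 512 1024 512 512    # prints 25
```

Shares only matter under contention: on an otherwise idle host, a group with a small share can still use spare CPU time.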
Linux cgroups: CPU Pinning
Pin containers / jobs to CPU cores
Carry out complex scheduling schemes
Reduce core switching costs
Adhere to SLAs
Linux cgroups: Device Access
Limit device visibility; isolation
Implement device access controls
– Secure sharing
Segment device access
Device whitelist / blacklist
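A whitelist in the cgroup v1 devices subsystem is a set of entries of the form `type major:minor access`. The sketch below only builds and validates such an entry. `dev_rule` is a hypothetical helper, and the commented commands assume you are root in the container's devices cgroup directory.

```sh
# dev_rule TYPE MAJOR MINOR ACCESS: print an entry in the format accepted by
# the cgroup v1 devices.allow / devices.deny files, e.g. "c 1:3 rwm".
# TYPE is b (block) or c (char); ACCESS is any mix of r, w and m (mknod).
dev_rule() {
    case "$1" in
        b|c) ;;
        *) echo "bad device type: $1" >&2; return 1 ;;
    esac
    case "$4" in
        ""|*[!rwm]*) echo "bad access: $4" >&2; return 1 ;;
    esac
    echo "$1 $2:$3 $4"
}

# Whitelist sketch (as root, inside the container's devices cgroup directory):
# echo a > devices.deny                 # start from "deny everything"
# dev_rule c 1 3 rwm > devices.allow    # then allow /dev/null (char 1:3)
```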
So You Want To Build A Container?
Linux namespaces
Problem
– How do I provide an isolated view of global resources to a group of tasks (processes)?
Solution: namespaces
– MNT; mount points, file systems, etc.
– PID; processes
– NET; NICs, routing, etc.
– IPC; System V IPC
– UTS; host and domain name
– USER; UID and GID
[Diagram: namespace blue (MNT, PID, NET, UTS, USER) containing three processes (proc)]
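On Linux, the namespaces a process belongs to are visible as links under `/proc/<pid>/ns/`, which gives a quick way to check whether two processes share a view. A minimal sketch, with `ns_id` and `same_ns` as hypothetical helper names; reading another user's process may need extra privileges.

```sh
# ns_id PID TYPE: print the namespace identifier of a process, e.g.
# "uts:[4026531838]". TYPE is one of mnt, pid, net, ipc, uts, user.
ns_id() {
    readlink "/proc/$1/ns/$2"
}

# same_ns PID_A PID_B TYPE: succeed when both processes share the namespace.
same_ns() {
    a=$(ns_id "$1" "$3") || return 1
    b=$(ns_id "$2" "$3") || return 1
    [ "$a" = "$b" ]
}

# The current shell and its parent normally share a UTS namespace:
# same_ns $$ "$PPID" uts && echo "same hostname view"
# A new UTS namespace can be created with util-linux unshare (needs root):
# unshare --uts sh -c 'hostname bluehost; hostname'
```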
Linux namespaces: Conceptual Overview
global (i.e. root) namespace
– MNT NS: /, /proc, /mnt/fsrd, /mnt/fsrw, /mnt/cdrom, /run2
– UTS NS: globalhost, rootns.com
– PID NS (PID COMMAND): 1 /sbin/init; 2 [kthreadd]; 3 [ksoftirqd]; 4 [cpuset]; 5 /sbin/udevd; 6 /bin/sh; 7 /bin/bash
– IPC NS: SHMID OWNER: 32452 root, 43321 boden; SEMID OWNER: 0 root, 1 Boden; MSQID OWNER: (none)
– NET NS: lo: UNKNOWN…, eth0: UP…, eth1: UP…, br0: UP…; app1 IP:5000, app2 IP:6000, app3 IP:7000
– USER NS: root 0:0, ntp 104:109, mysql 105:110, boden 106:111

purple namespace
– MNT NS: /, /proc, /mnt/purplenfs, /mnt/fsrw, /mnt/cdrom
– UTS NS: purplehost, purplens.com
– PID NS (PID COMMAND): 1 /bin/bash; 2 /bin/vim
– IPC NS: SHMID OWNER: (none); SEMID OWNER: 0 root; MSQID OWNER: (none)
– NET NS: lo: UNKNOWN…, eth0: UP…; app1 IP:1000, app2 IP:7000
– USER NS: root 0:0, app 106:111

blue namespace
– MNT NS: /, /proc, /mnt/cdrom, /bluens
– UTS NS: bluehost, bluens.com
– PID NS (PID COMMAND): 1 /bin/bash; 2 python; 3 node
– IPC NS: SHMID OWNER: (none); SEMID OWNER: (none); MSQID OWNER: (none)
– NET NS: lo: UNKNOWN…, eth0: DOWN…, eth1: UP; app1 IP:7000, app2 IP:9000
– USER NS: root 0:0, app 104:109
Linux namespaces: Common Idioms
It’s not required to use all namespaces
– Pick & choose; if your toolset allows it
Constructs exist to permit “connectivity” between parent / child namespace
Various Linux user space tools have namespace support
Linux sys API supports flexible namespace creation
Linux namespaces & cgroups: Availability
Note: user namespace support is in upstream kernel 3.8+, but distributions are rolling out phased support:
- Map LXC UID/GID between container and host
- Non-root LXC creation
So You Want To Build A Container?
Linux chroot & pivot_root
Using pivot_root with MNT namespace addresses escaping chroot concerns
The pivot_root target directory becomes the “new root FS”
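The usual command sequence for switching the root FS with pivot_root inside a new MNT namespace looks roughly like the plan printed below. This is a hedged sketch: `pivot_plan`, the `.old_root` directory name and the example rootfs path are all hypothetical, the function only prints the steps rather than running them, and it assumes the new root FS contains its own `umount` binary and a path without spaces.

```sh
# pivot_plan NEW_ROOT: print (not run) a typical pivot_root sequence.
# The output is meant to be run as root inside a new MNT namespace,
# for example in a shell started with "unshare --mount".
pivot_plan() {
    new_root=$1
    cat <<EOF
mount --make-rprivate /
mount --bind $new_root $new_root
mkdir -p $new_root/.old_root
cd $new_root
pivot_root . .old_root
umount -l /.old_root
EOF
}

pivot_plan /var/lib/my-lxc/rootfs
```

The bind mount of the directory onto itself is there because pivot_root expects the new root to be a mount point. The final lazy unmount detaches the old root so the container can no longer reach it.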
LXC Images
LXC images provide a flexible means to deliver only what you need – lightweight and minimal footprint
Basic constraints
– Same architecture & endianness
– Linux’ish Operating System; you can run different Linux distros on same host
Image types
– System; virtualize Operating System(s) – standard distro root FS less the kernel
– Application; virtualize application(s) – only package apps + dependencies (aka JeOS – Just enough Operating System)
Bind mount host libs / bins into LXC to share host resources
Container image init process
– Container init command provided on invocation – can be an application or a full fledged init process
– Init script customized for image – skinny SysVinit, upstart, etc.
– Reduces overhead of lxc start-up and runtime footprint
Various tools to build images
– SuSE Kiwi
– Debootstrap
– Etc.
LXC tooling options often include numerous image templates
So You Want To Build A Container?
Linux Security Modules & MAC
Linux Security Modules (LSM) – kernel modules which provide a framework for Mandatory Access Control (MAC) security implementations
MAC vs DAC
– In MAC, admin (user or process) assigns access controls to subject / initiator
– In DAC, resource owner (user) assigns access controls to individual resources
Existing LSM implementations include: AppArmor, SELinux, GRSEC, etc.
Linux Capabilities
Per process privileges which define sys call access
Can be assigned to LXC process(es)
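Per process capabilities are exposed as hexadecimal bit masks (for example the `CapEff` line in `/proc/<pid>/status`), one bit per capability number. A minimal sketch for checking one bit follows. `cap_is_set` is a hypothetical helper and the mask values are illustrative.

```sh
# cap_is_set HEXMASK BIT: print "yes" if capability number BIT is present in
# a mask such as the CapEff value from /proc/<pid>/status, else "no".
cap_is_set() {
    if [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]; then
        echo yes
    else
        echo no
    fi
}

# CAP_NET_BIND_SERVICE is capability 10, CAP_SYS_ADMIN is capability 21.
cap_is_set 0000000000000400 10   # prints yes
cap_is_set 0000000000000400 21   # prints no

# Against the current process (Linux only):
# mask=$(awk '/^CapEff:/ {print $2}' /proc/self/status)
# cap_is_set "$mask" 21
```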
Other Security Measures
Reduce shared FS access using RO bind mounts
Linux seccomp
– Confine system calls
Keep Linux kernel up to date
User namespaces in 3.8+ kernel
– Launching containers as non-root user
– Mapping UID / GID into container
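UID / GID mapping in a user namespace is described by lines of the form `inside outside count` (the format of `/proc/<pid>/uid_map` and `gid_map`). The sketch below translates an ID seen inside the container to the host ID for one such line. `host_id` is a hypothetical helper and the mapping `0 100000 65536` is an assumed example, not one taken from the deck.

```sh
# host_id ID INSIDE OUTSIDE COUNT: translate a UID/GID seen inside the
# container to the host ID, given one uid_map / gid_map line
# ("INSIDE OUTSIDE COUNT"). Fails if ID is not covered by the mapping.
host_id() {
    if [ "$1" -lt "$2" ] || [ "$1" -ge $(( $2 + $4 )) ]; then
        echo "id $1 is not mapped" >&2
        return 1
    fi
    echo $(( $3 + $1 - $2 ))
}

# Assumed example mapping "0 100000 65536": container root is host UID 100000.
host_id 0 0 100000 65536      # prints 100000
host_id 1000 0 100000 65536   # prints 101000
```

With a mapping like this, root inside the container is an unprivileged user on the host, which is what makes non-root container launch possible.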
So You Want To Build A Container?
LXC Industry Tooling
Virtuozzo
- Summary: Commercial product using OpenVZ under the hood
- Part of upstream Kernel? No
- License: Commercial
OpenVZ
- Summary: Custom Kernel providing well seasoned LXC support
- Part of upstream Kernel? No
- License: GNU GPL v2
Linux VServer
- Summary: A set of kernel patches providing LXC. Not based on cgroups or namespaces.
- Part of upstream Kernel? Partial
- License: GNU GPL v2
Libvirt-lxc
- Summary: Libvirt support for LXC via cgroups and namespaces.
- Part of upstream Kernel? Yes
- License: GNU LGPL
Lxc (tools)
- Summary: Lib + set of user space tools / bindings for LXC.
- Part of upstream Kernel? Yes
- License: GNU LGPL
Warden
- Summary: LXC management tooling used by CF.
- Part of upstream Kernel? Yes
- License: Apache v2
lmctfy
- Summary: Similar to LXC, but provides more intent based focus.
- Part of upstream Kernel? Yes, but additional patches needed for specific features.
- License: Apache v2
Docker
- Summary: Commoditization of LXC adding support for images, build files, etc.
- Part of upstream Kernel? Yes
- License: Apache v2
APIs / Bindings (cells in slide order)
- CLI, API
- CLI, C
- CLI, C, Python, Java, C#, PHP
- Python, Lua, GO, CLI
- GO, REST, CLI, Python, other 3rd party libs
Management plane / Dashboard (cells in slide order)
- Virtuozzo Parallels
- Virtuozzo Parallels + others
- OpenStack, Archipel, Virt-Manager
- LXC web panel
- Lexy
- OpenStack, Shipyard, Docker UI
LXC Orchestration & Management
Docker & libvirt-lxc in OpenStack
– Manage containers heterogeneously with traditional VMs… but not w/the level of support & features we might like
CoreOS
– Zero-touch admin Linux distro with docker images as the unit of operation
– Centralized key/value store to coordinate distributed environment
Various other 3rd party apps
– Maestro for docker
– Shipyard for docker
– Fleet for CoreOS
– Etc.
LXC migration
– Container migration via criu
But…
– Still no great way to tie all virtual resources together with LXC – e.g. storage + networking
• IMO; an area which needs focus for LXC to become more generally applicable
LXC Gaps
There are gaps…
Lack of industry tooling / support
Live migration still a WIP
Full orchestration across resources (compute / storage / networking)
Fears of security
Not a well known technology… yet
Integration with existing virtualization and Cloud tooling
Not much / any industry standards
Missing skillset
Slower upstream support due to kernel dev process
Etc.
LXC: Use Cases For Traditional VMs
There are still use cases where traditional VMs are warranted.
Virtualization of non Linux based OSs
– Windows
– AIX
– Etc.
LXC not supported on host
VM requires unique kernel setup which is not applicable to other VMs on the host (i.e. per VM kernel config)
Etc.
References & Related Links
http://www.slideshare.net/BodenRussell/realizing-linux-containerslxc
http://bodenr.blogspot.com/2014/05/kvm-and-docker-lxc-benchmarking-with.html
https://www.docker.io/
http://sysbench.sourceforge.net/
http://dag.wiee.rs/home-made/dstat/
http://www.openstack.org/
https://wiki.openstack.org/wiki/Rally
https://wiki.openstack.org/wiki/Docker
http://devstack.org/
http://www.linux-kvm.org/page/Main_Page
https://github.com/stackforge/nova-docker
https://github.com/dotcloud/docker-registry
http://www.netperf.org/netperf/
http://www.tokutek.com/products/iibench/
http://www.brendangregg.com/activebenchmarking.html
http://wiki.openvz.org/Performance