Effectively using Open Source with conda

Effectively using Open Source with conda Travis E. Oliphant, PhD Continuum Analytics, Inc

description

Conda is a cross-platform package manager that lets you quickly and easily build environments containing complicated software stacks. It was built to manage the NumPy stack in Python but can be used to manage any complex software dependencies.

Transcript of Effectively using Open Source with conda

Page 1: Effectively using Open Source with conda

Effectively using Open Source with conda

Travis E. Oliphant, PhD Continuum Analytics, Inc

Page 2: Effectively using Open Source with conda

The Opportunity

• Millions of projects that can be used in the enterprise

• Not enough to just adopt once — these projects change rapidly

• Effective use requires a plan for managing updates

Page 3: Effectively using Open Source with conda

The Challenge

Separation of concerns leads to granular libraries with often deep dependencies

Page 4: Effectively using Open Source with conda

The Challenge

• Different “entry-points” (end-user applications or scripts) can have different dependencies. Often many of the dependencies are shared but a few applications need different versions of some packages.

• Not specific to any particular language or ecosystem. Python, Ruby, Node.js, C/C++, .NET, Java, all have the same problem: How do you manage the software life cycle effectively?

• Production deployments need stability. IT managers want ease of deployment and testing. Developers want agility and ease of development.

Page 5: Effectively using Open Source with conda

The Challenge

How can developers and domain experts in an organization quickly and easily take advantage of the latest software developments yet still have stable production deployments of complex software?

You cannot take full advantage of the pace of open-source development if you don’t address this!

Page 6: Effectively using Open Source with conda

Case Study: SciPy

There was this thing called the Internet and one could make a web-page and put code up on it and people started using it ...

Facebook for Hackers

I started SciPy in 1999 while I was in grad school at the Mayo Clinic (it was called Multipack back then)

Page 7: Effectively using Open Source with conda

Case Study: SciPy

Packaging circa 1999: Source tar ball and make file (users had to build)

SciPy is basically a bunch of C/C++/Fortran routines with Python interfaces

Observation: Popularity of Multipack (Early SciPy) grew significantly when Robert Kern made pre-built binaries for Windows

Page 8: Effectively using Open Source with conda

Case Study: SciPy

• Difficulty of producing binaries, plus the desire to avoid the dependency chain and the lack of broad packaging solutions, led to early SciPy being a “distribution” instead of separate inter-related libraries.

• There were (and are) too many different projects in SciPy (a project needs 1-5 core contributors, for reasons of team size and communication dynamics)

Page 9: Effectively using Open Source with conda

Case Study: NumPy

I started writing NumPy in 2005 while I was teaching at BYU (it was a merger of Numeric and Numarray)

NumPy ABI has not changed “officially” since 1.0 came out in 2006

Presumably extension modules (SciPy, scikit-learn, matplotlib, etc.) compiled against NumPy 1.0 will still work on NumPy 1.8.1

This was not a design goal!!!

Page 10: Effectively using Open Source with conda

Case Study: NumPy

ABI stability became a point of some contention and community difficulty when date-time was added in version 1.4 (impossible without changing the ABI in some way), and the matter was not really settled until version 1.7

The fundamental reason was a user-driven obsession with keeping ABI compatibility.

Windows users lacked a useful packaging solution in the face of the NumPy stack

Page 11: Effectively using Open Source with conda

NumPy Stack (cry for conda...)

NumPy

SciPy, Pandas, Matplotlib

scikit-learn, scikit-image, statsmodels

PyTables

OpenCV

Cython

Numba, SymPy, NumExpr

astropy, BioPython, GDAL, PySAL

... many many more ...

Page 12: Effectively using Open Source with conda

Fundamental Principles

• Complex things are built out of simple things

• Fundamental principle of software engineering is “separation of concerns” (modularity)

• Reusability is enhanced when you “do one thing and do it well”

• But, to deploy you need to bring the pieces back together.

• This means you need a good packaging system for binary artifacts, with multiple environments.

Page 13: Effectively using Open Source with conda

Continuum Solutions (Free)

Anaconda

Free all-in-one distribution of Python for Analytics and Visualization

• numpy, scipy, ipython
• matplotlib, bokeh
• pandas, statsmodels, scikit-learn
• many, many more… 100+

Miniconda

Python + conda: with these you can install exactly what you want…

$ conda install anaconda

binstar.org

• Binary repository of packages (public)
• Multiple package types
• Free public build queue
• Current focus on:
  • Python pypi-compatible packages (source distributions)
  • conda packages (binary distributions)

Conda

• Cross-platform package manager
• Dependency management (uses SAT solver to resolve all dependencies)
• System-level virtual environments (more flexible than virtualenv)
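To make the dependency-management bullet concrete, here is a minimal sketch of what "resolving all dependencies" means. It is a brute-force search, not a SAT solver, and the `INDEX` contents, `version_key` and `resolve` are hypothetical names invented for illustration; conda's real solver and metadata are more involved.

```python
from itertools import product

# Hypothetical index: package -> version -> list of (dependency, allowed versions).
INDEX = {
    "scipy": {
        "0.13.3": [("numpy", {"1.8.1"})],
        "0.12.0": [("numpy", {"1.7.1"})],
    },
    "pandas": {
        "0.13.1": [("numpy", {"1.7.1", "1.8.1"})],
    },
    "numpy": {"1.8.1": [], "1.7.1": []},
}


def version_key(version):
    """Turn '0.13.3' into (0, 13, 3) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def resolve(requested):
    """Return the first consistent {package: version} mapping, or None.

    Candidates are tried with newer versions first for each package.
    """
    # Collect the requested packages plus everything they may depend on.
    names, todo = set(), list(requested)
    while todo:
        name = todo.pop()
        if name in names:
            continue
        names.add(name)
        for deps in INDEX[name].values():
            todo.extend(dep for dep, _allowed in deps)
    names = sorted(names)

    choices = [sorted(INDEX[n], key=version_key, reverse=True) for n in names]
    for combo in product(*choices):
        picked = dict(zip(names, combo))
        # An assignment is consistent when every dependency of every
        # picked version is itself picked at an allowed version.
        if all(
            picked[dep] in allowed
            for name, version in picked.items()
            for dep, allowed in INDEX[name][version]
        ):
            return picked
    return None


solution = resolve(["scipy", "pandas"])
print(solution)
```

Exhaustive search like this grows exponentially with the number of packages, which is the motivation for handing the same constraints to a SAT solver instead.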

Page 14: Effectively using Open Source with conda

Continuum Solutions (Premium)

binstar.org

Premium features:
• hosting of private packages (public packages are free)
• access to priority build queue

• $10 / month (individuals)
  • 25 private packages
  • 5 GB disk space
• $50 / month (organizations)
  • 200 private packages
  • 30 GB disk space
  • right to have private packages in organizations
• $1500 / year
  • unlimited private packages
  • 100 GB of disk space

Anaconda Server

• Binary repository for private packages
• Internal mirror of public repositories
• Mix private internal packages with public repositories
• Build customized versions of Anaconda installers
• Environment to .exe and .rpm tools
• Comprehensive licensing
• Comprehensive support
• On-premise version of binstar.org

Page 15: Effectively using Open Source with conda

System Packaging solutions

Linux: yum (rpm), apt-get (dpkg)

OSX: macports, homebrew

Windows: chocolatey, npackd

Cross-platform: conda

With virtual environments conda provides a modern, cross-platform, system-level packaging and deployment solution

Page 16: Effectively using Open Source with conda

Conda Features

• Excellent support for “system-level” environments (like having mini VMs but much lighter weight than docker.io)
• Minimizes code-copies (uses hard/soft links if possible)
• Dependency solver using fast satisfiability solver (SAT solver)
• Simple format: binary tar-ball + meta-data
• Meta-data allows static analysis of dependencies
• Easy to create multiple “channels” which are repositories for binary packages
• User installable (no root privileges needed)
• Can still use tools like pip; conda fills in where they fail.
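The "minimizes code-copies" point can be sketched with the standard library. This is an illustration of the linking idea only, not conda's implementation; `link_package` and the directory names are made up for the example.

```python
import os
import tempfile


def link_package(pkg_dir, env_dir):
    """Link every file of an unpacked package into an environment."""
    for root, _dirs, files in os.walk(pkg_dir):
        rel = os.path.relpath(root, pkg_dir)
        target_dir = os.path.normpath(os.path.join(env_dir, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            try:
                os.link(src, dst)      # hard link: no second copy on disk
            except OSError:
                os.symlink(src, dst)   # fall back to a soft link


with tempfile.TemporaryDirectory() as base:
    pkg = os.path.join(base, "pkgs", "demo-1.0-0")
    env = os.path.join(base, "envs", "test")
    os.makedirs(os.path.join(pkg, "lib"))
    cached = os.path.join(pkg, "lib", "demo.txt")
    with open(cached, "w") as fh:
        fh.write("payload")

    link_package(pkg, env)
    linked = os.path.join(env, "lib", "demo.txt")
    same_file = os.path.samefile(cached, linked)

    # Removing the file from the environment only drops the link;
    # the unpacked copy in the package directory is untouched.
    os.remove(linked)
    still_cached = os.path.exists(cached)

print(same_file, still_cached)
```

Because the environment holds links rather than copies, creating or throwing away an environment is cheap, and the same unpacked package can back many environments.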

Page 17: Effectively using Open Source with conda

Examples

Set up a test environment

$ conda update conda
$ conda create -n test python pip
$ source activate test

On Windows, activate with:

$ activate test

Install another package

(test)$ conda install scikit-learn

Page 18: Effectively using Open Source with conda

First steps

Create an environment

$ conda create -n py3k python=3.3
$ source activate py3k

Install IPython notebook

(py3k) $ conda install ipython-notebook

All in One

$ conda create -n py3k python=3.3 ipython-notebook
$ source activate py3k

Page 19: Effectively using Open Source with conda

Anaconda installation

ROOT_DIR
The directory that Anaconda was installed into; for example, /opt/Anaconda or C:\Anaconda

/pkgs
Also referred to as PKGS_DIR. This directory contains exploded packages, ready to be linked in conda environments. Each package resides in a subdirectory corresponding to its canonical name.

/envs
The system location for additional conda environments to be created.

/bin, /include, /lib, /share
The default, or root, environment

Page 20: Effectively using Open Source with conda

Look at conda package --- a simple .tar.bz2

http://docs.continuum.io/conda/intro.html

Page 21: Effectively using Open Source with conda

Anatomy of unpacked conda package

/lib /include /bin /man

/info (files, index.json)

A conda package is a bzipped tarfile of all the files comprising the package at the full paths they would be installed to relative to a “system” install or “chroot jail”

An environment is just a “union” of these paths

All conda packages have this info directory which contains meta-data for tracked files, dependency information, etc.
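The layout described here can be reproduced in a few lines with the standard library. This is a sketch of the general shape only: the package name, the file contents and the three `index.json` fields shown are illustrative assumptions, and real packages carry more metadata.

```python
import io
import json
import tarfile


def add_member(archive, path, data):
    """Add bytes to the archive under the given package-relative path."""
    info = tarfile.TarInfo(path)
    info.size = len(data)
    archive.addfile(info, io.BytesIO(data))


# Build a tiny package in memory. The name and field values are made up.
buffer = io.BytesIO()
with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
    add_member(archive, "bin/hello", b"#!/bin/sh\necho hello\n")
    add_member(archive, "info/files", b"bin/hello\n")
    index = {"name": "hello", "version": "0.1", "depends": []}
    add_member(archive, "info/index.json", json.dumps(index).encode())

# Read it back the way a tool might inspect a package before installing.
buffer.seek(0)
with tarfile.open(fileobj=buffer, mode="r:bz2") as archive:
    members = sorted(archive.getnames())
    metadata = json.load(archive.extractfile("info/index.json"))
    tracked = archive.extractfile("info/files").read().decode().split()

print(members)
print(metadata["name"], metadata["version"], tracked)
```

Since the metadata sits at a fixed path inside the archive, dependencies can be analysed without running any code from the package.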

Page 22: Effectively using Open Source with conda

Environments

One honking great idea! Let’s do more of those!

Easy to make
Easy to throw away

Uses:
• Testing (python 2.6, 2.7, 3.3)
• Development
• Trying new packages from PyPI
• Separating deployed apps with different dependency needs
• Trying new versions of Python
• Reproducing someone’s work

conda create -h

Page 23: Effectively using Open Source with conda

Getting System information

Basic info

conda info

Named-environment info

conda info -e

System info

conda info --system

All information

conda info --all

Page 24: Effectively using Open Source with conda

Installing packages

conda install -n py3k scipy pip

Default package repositories (configurable)

http://repo.continuum.io/pkgs/dev Experimental or developmental versions of packages

http://repo.continuum.io/pkgs/gpl GPL licensed packages

http://repo.continuum.io/pkgs/free non GPL open source packages

Page 25: Effectively using Open Source with conda

How it works

[Diagram: each channel (Channel 1, Channel 2, ... Channel N) publishes its own metadata; conda combines them into one merged metadata index]
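The merge step in the diagram can be sketched as combining one dictionary per channel into a single index. The record fields, the `merge_channels` helper and the rule that an earlier channel wins when two channels offer the same file are all assumptions made for this example; the slides do not spell out conda's precedence rules.

```python
def merge_channels(channels):
    """Merge per-channel metadata; earlier channels win on conflicts."""
    merged = {}
    for url, packages in channels:
        for filename, record in packages.items():
            # setdefault keeps the record from the first channel that has it
            merged.setdefault(filename, dict(record, channel=url))
    return merged


# Hypothetical metadata for two channels, listed in search order.
channels = [
    ("http://some.custom/channel", {
        "bulk-0.1-0.tar.bz2": {"name": "bulk", "version": "0.1"},
    }),
    ("defaults", {
        "bulk-0.1-0.tar.bz2": {"name": "bulk", "version": "0.1"},
        "numpy-1.8.1-py27_0.tar.bz2": {"name": "numpy", "version": "1.8.1"},
    }),
]

index = merge_channels(channels)
for filename in sorted(index):
    print(filename, "from", index[filename]["channel"])
```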

Page 26: Effectively using Open Source with conda

Create channels

Option 1

• Create a directory of conda packages
• Run conda index <dirname>
• Either use file:///path/to/dir in .condarc or use a simple web server on the /path/to/dir

Option 2

Use binstar.org (also available as on-premise solution with Anaconda Server)

Page 27: Effectively using Open Source with conda

Binstar.org channels (request invite)

conda install -c <channel name> <pkg name>

will install from the binstar channel, or you can add the channel to your config file

free for public packages

Page 28: Effectively using Open Source with conda

List Installed packages

conda list also includes packages installed via pip!

$ conda create -n py3k scipy pip
$ source activate py3k
$ pip install pint
$ conda list

Output

# packages in environment at /Users/travis/anaconda/envs/py3k:
#
numpy       1.8.1    py27_0
openssl     1.0.1g   0
pint        0.4.2    <pip>
pip         1.5.4    py27_0
python      2.7.6    1
readline    6.2      2
scipy       0.13.3   np18py27_0
setuptools  3.1      py27_0
sqlite      3.7.13   1
tk          8.5.13   1
wsgiref     0.1.2    <pip>
zlib        1.2.7    1
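Since pip-installed packages are marked with <pip> in the build column, the listing is easy to post-process. A small sketch, assuming the three-column layout shown on this slide (the `parse_listing` helper is made up, and other conda versions may print additional columns):

```python
# A shortened copy of the listing from the slide.
LISTING = """\
# packages in environment at /Users/travis/anaconda/envs/py3k:
#
numpy 1.8.1 py27_0
pint 0.4.2 <pip>
pip 1.5.4 py27_0
wsgiref 0.1.2 <pip>
"""


def parse_listing(text):
    """Yield (name, version, build) for each non-comment line."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, version, build = line.split()
        yield name, version, build


pip_installed = [name for name, _version, build in parse_listing(LISTING)
                 if build == "<pip>"]
print(pip_installed)
```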

Page 29: Effectively using Open Source with conda

Update a package to latest

conda update pandas
get the latest pandas from the channels you are subscribed to

conda update anaconda
change to the latest released anaconda including its specific dependencies; this can downgrade packages if they are newer than those in the “released” Anaconda

conda update --all
To update all the packages in an environment to the latest versions use the --all option

Page 30: Effectively using Open Source with conda

Search for a package

conda search <regex>
Find packages and channels they are in

conda search --outdated sympy
Only show packages matching regex that are installed but outdated

conda search typo

typogrify   * 2.0.0    py27_0  http://conda.binstar.org/travis/osx-64/
              2.0.0    py33_1  http://conda.binstar.org/asmeurer/osx-64/
              2.0.0    py26_1  http://conda.binstar.org/asmeurer/osx-64/
...

sympy         0.7.1    py27_0  defaults
              0.7.4    py26_0  defaults
              0.7.4.1  py33_0  defaults
            * 0.7.4.1  py27_0  defaults
              0.7.4.1  py26_0  defaults
              0.7.5    py34_0  defaults
              0.7.5    py33_0  defaults
...

Page 31: Effectively using Open Source with conda

Removing files and environments

Removing Packages

conda remove -n py3k scipy matplotlib

Removing Environment

conda remove -n py3k --all

Note: packages are just “unlinked” from the environment. All the files are still available unpacked in a package cache.

Removing unused packages

conda clean -t   Remove unused tarballs
conda clean -p   Remove unused directories

Page 32: Effectively using Open Source with conda

Untracked Files

conda package -u
conda package --pkg-name bulk --pkg-version 0.1

Easy way to install into an environment using anything (pip, make, setup.py, etc.) and then package up all of it into a binary tar-ball deployable via conda install <pkg-name>.tar.bz2

pickle for binary code!

Page 33: Effectively using Open Source with conda

Conda configuration

# This is a sample .condarc file

# channel locations. These override conda defaults, i.e., conda will
# search *only* the channels listed here, in the order given. Use "defaults" to
# automatically include all default channels.
channels:
  - defaults
  - http://some.custom/channel

# Proxy settings
# http://[username]:[password]@[server]:[port]
proxy_servers:
  http: http://user:[password]@[server]:8080
  https: https://user:[password]@[server]:8080

envs_dirs:
  - /opt/anaconda/envs
  - /home/joe/my-envs

pkg_dirs:
  - /home/joe/user-pkg-cache
  - /opt/system/pkgs

changeps1: False

# binstar.org upload (not defined here means ask)
binstar_upload: True

Scripting interface

conda config --add KEY VALUE

conda config --remove-key KEY

conda config --get KEY

conda config --set KEY BOOL

conda config --remove KEY VALUE

Page 34: Effectively using Open Source with conda

Building new packages

conda install conda-build

Option 1

conda skeleton pypi <pypi-name>
conda build <recipe-dir>

Option 2

conda pipbuild <pypi-name>

Page 35: Effectively using Open Source with conda

Conda Recipe is a directory

Required

build.sh: BASH build commands (POSIX)
bld.bat: CMD build commands (Win)
meta.yaml: extended yaml declarative meta-data

Optional

run_test.py: will be executed during test phase
*.patch: patch-files for the source
*: any other resources needed by build but not included in sources described in meta.yaml file

Page 36: Effectively using Open Source with conda

Recipe MetaData

package:
  name:            # name of package
  version:         # version of package
about:
  home:            # home-page
  license:         # license

# All optional from here....
source:
  fn:              # filename of source
  url:             # url of source
  md5:             # hash of source
  # or from git:
  git_url:
  git_tag:
  patches:         # list of patches to source
    - fix.patch
build:
  entry_points:    # entry-points (binary commands or scripts)
    - name = module:function
  number:          # defaults to 0
requirements:      # lists of requirements
  build:           # requirements for build (as a list)
  run:             # requirements for running (as a list)
test:
  requires:        # list of requirements for testing
  commands:        # commands to run for testing (entry-points)
  imports:         # modules to import for testing

http://docs.continuum.io/conda/build.html

Page 37: Effectively using Open Source with conda

Converting to another platform

Conda packages are specific to a particular platform. However, if there are no platform-specific binary files in a package, it can be converted automatically to a package that can be installed on another platform.

Example

conda convert --output-dir win32 --platform win-32 <package-file>
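Whether a package qualifies for conversion comes down to whether it ships compiled files. A rough way to check a file list is sketched below; the extension set and the `platform_specific_files` helper are assumptions for illustration, and this is a heuristic rather than the check conda convert itself performs.

```python
import os

# Assumed set of extensions that signal compiled, platform-specific files.
BINARY_EXTENSIONS = {".so", ".dylib", ".dll", ".pyd", ".exe"}


def platform_specific_files(paths):
    """Return the paths that look like compiled binaries."""
    return [p for p in paths
            if os.path.splitext(p)[1].lower() in BINARY_EXTENSIONS]


# Hypothetical file lists for a pure-Python and a compiled package.
pure_python = ["lib/python2.7/site-packages/pint/__init__.py",
               "lib/python2.7/site-packages/pint/unit.py"]
compiled = ["lib/python2.7/site-packages/numpy/__init__.py",
            "lib/python2.7/site-packages/numpy/core/multiarray.so"]

print(platform_specific_files(pure_python))  # empty: a conversion candidate
print(platform_specific_files(compiled))     # lists the compiled module
```

An extension check misses cases such as versioned shared libraries (for example a name ending in .so.1), so treat an empty result as a hint, not a guarantee.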

Page 38: Effectively using Open Source with conda

Binstar.org (request invite)

Once you have built a conda package, you can share it with the world on binstar.org

conda install -c <name> <pkgname>

free for public packages

Page 39: Effectively using Open Source with conda

Binstar

Adding channels

$ conda config --add channels 'http://conda.binstar.org/travis'
$ conda config --add channels 'http://conda.binstar.org/asmeurer'

Uploading packages

binstar upload /full/path/to/package.tar.bz2

binstar register /full/path/to/package.tar.bz2 (needed if the package has never been uploaded before)

Page 40: Effectively using Open Source with conda

Binstar Package Types

Permissions: Description

Private: Only people given permission can see this package.

Personal: Everyone will be able to see this package in your user repository.

Publish: This package will be published in the global public repository.

Page 41: Effectively using Open Source with conda

Useful aliases

alias workon='source activate'
alias workoff='source deactivate'

Page 42: Effectively using Open Source with conda

• Cross-platform Tested and Supported Python Distribution
• Enterprise Python Deployment
• Private, Secure On-premise package repository
• Comprehensive Licensing
• Customized Installers and Mirrors
• Additional Products
• Enhanced Support
• Optional, On-premise binstar.org

Page 43: Effectively using Open Source with conda

Thanks!

Aaron Meurer conda and binstar developer

Sean Ross-Ross (principal binstar.org)

Bryan Van de Ven (original conda author)

Ilan Schnell (principal conda developer)