The Day of the EXE Is Upon Us
@brandon_rhodes
PyCon 2014
Montréal

von Neumann

0x86

LDA #134

0x86 0x86

Code   Data

0x86 0x86

Ambiguous
Dangerous
homoiconic
Bare metal

lie

x86.js

compiler, n.   data → code

Problem

Targeting processors
with machine code is
slow
difficult
specialized
Python wants to run
on every single processor
But without having to
know everything about
every single processor
“All problems in computer
science can be solved
by another level
of indirection”
— David Wheeler

Bytecode

Python compiles to a pretend
machine language that it runs
inside of a pretend processor

secret:

CPython does not,
by my earlier definition,
do actual compiling
     code               data
┌─────────────┐     ┌──────────┐
                    text     'file.py'
   CPython        code obj   'file.pyc'
                  bytecode 
└─────────────┘     └──────────┘
$ export PYTHONDONTWRITEBYTECODE=1
     code               data
┌─────────────┐     ┌──────────┐
                    text     'file.py'
   CPython        code obj 
                  bytecode 
└─────────────┘     └──────────┘
At no point does your Python script
gain the dignity of becoming code
from the machine’s point of view
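A small sketch of that point, using only the standard library. It assumes nothing beyond `compile()`, `marshal`, and `exec()`; the source string and the 'file.py' label are illustrative. The compiled form of a script is an ordinary Python object, and its bytecode is a plain bytes string that the interpreter reads rather than something the processor jumps into.

```python
import marshal

source = "result = 3 * 3 + 4 * 4\n"

# compile() turns text into a code object; no machine code is produced.
code_obj = compile(source, "file.py", "exec")

# The bytecode is just a bytes string attached to the code object.
bytecode = code_obj.co_code

# marshal can serialize the code object to bytes and load it back.
blob = marshal.dumps(code_obj)
restored = marshal.loads(blob)

# exec() hands the code object to the interpreter loop, which reads it as data.
namespace = {}
exec(restored, namespace)
print(type(bytecode).__name__, namespace["result"])
```

Both the bytecode and the marshalled blob are values you can slice, hash, or write to disk like any other bytes.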
>>> # Infinite paraboloid
>>>
>>> def f(x, y):
...     return x * x + y * y
...
>>> f(3, 4)
25
Hp16c.jpg
>>> from dis import dis
>>> dis(f)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                0 (x)
              6 BINARY_MULTIPLY
              7 LOAD_FAST                1 (y)
             10 LOAD_FAST                1 (y)
             13 BINARY_MULTIPLY
             14 BINARY_ADD
             15 RETURN_VALUE

see

“All-Singing All-Dancing
Python Bytecode”
Larry Hastings
PyCon 2013

ceval.c

for (;;) {
    ...
    opcode = NEXTOP();
    ...
    switch (opcode) {
    TARGET(NOP) ...
    TARGET(LOAD_FAST) ...
    TARGET(BINARY_MULTIPLY) ...
        PyNumber_Multiply(left, right); ...
    TARGET(BINARY_ADD) ...
    TARGET(RETURN_VALUE) ...
    }
    ...
}
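The shape of that loop can be imitated in a few lines of Python. This is a toy sketch, not how ceval.c is written: the `run` function, the `PROGRAM` list, and the tuple encoding of instructions are all invented for illustration, and only the four opcodes from the disassembly of f are handled.

```python
def run(program, args):
    """Evaluate a list of (opname, argument) pairs on a value stack."""
    stack = []
    pc = 0  # index of the next instruction to fetch
    while True:
        opname, arg = program[pc]
        pc += 1
        if opname == "LOAD_FAST":
            # Push the local variable with the given index.
            stack.append(args[arg])
        elif opname == "BINARY_MULTIPLY":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opname == "RETURN_VALUE":
            return stack.pop()
        else:
            raise ValueError("unknown opcode: %s" % opname)


# The same instruction sequence that dis(f) printed for x * x + y * y.
PROGRAM = [
    ("LOAD_FAST", 0), ("LOAD_FAST", 0), ("BINARY_MULTIPLY", None),
    ("LOAD_FAST", 1), ("LOAD_FAST", 1), ("BINARY_MULTIPLY", None),
    ("BINARY_ADD", None),
    ("RETURN_VALUE", None),
]

print(run(PROGRAM, (3, 4)))  # prints 25
```

The program is a list and the loop is the only thing that ever executes, which is the code/data split the slides keep returning to.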

abstract.c

PyObject *
PyNumber_Multiply(PyObject *v, PyObject *w)
{
    PyObject *result = binary_op1(
          v, w, NB_SLOT(nb_multiply));
    ...
    return result;
}

floatobject.c

static PyNumberMethods float_as_number = {
    float_add,          /*nb_add*/
    float_sub,          /*nb_subtract*/
    float_mul,          /*nb_multiply*/
    float_rem,          /*nb_remainder*/
    float_divmod,       /*nb_divmod*/
    float_pow,          /*nb_power*/

floatobject.c

static PyObject *
float_mul(PyObject *v, PyObject *w)
{
    double a,b;
    CONVERT_TO_DOUBLE(v, a);
    CONVERT_TO_DOUBLE(w, b);
    PyFPE_START_PROTECT("multiply", return 0)
    a = a * b;
    PyFPE_END_PROTECT(a)
    return PyFloat_FromDouble(a);
}
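The chain from PyNumber_Multiply down to float_mul can be approximated at the Python level. The sketch below is a simplification under stated assumptions: the `multiply` helper is hypothetical, and it ignores the extra rule CPython applies when one operand's type is a subclass of the other's. It does show the core idea that the operation is looked up on the types at run time, every time.

```python
def multiply(v, w):
    """Approximate what happens behind `v * w`: ask the types, not the values."""
    result = NotImplemented

    # First give the left operand's type a chance (the nb_multiply slot).
    mul = getattr(type(v), "__mul__", None)
    if mul is not None:
        result = mul(v, w)

    # If it declines, try the right operand's reflected method.
    if result is NotImplemented:
        rmul = getattr(type(w), "__rmul__", None)
        if rmul is not None:
            result = rmul(w, v)

    if result is NotImplemented:
        raise TypeError("unsupported operand types for *")
    return result


print(multiply(3.0, 4.0))   # float * float
print(multiply(2, 1.5))     # int declines, float handles it
print(multiply("ab", 2))    # str repetition uses the same protocol
```

None of these lookups can be skipped unless something promises in advance what the types will be.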

code   data

         code                data
───────────────────────  ─────────────

PyEval_EvalFrameEx()     
                         next bytecode
TARGET(BINARY_MULTIPLY)  
PyNumber_Multiply()      
                         <type 'float'>
float_mul()              

Q:

Is Python slow
because it is
interpreted?
Or, is Python slow
because it is
dynamic?
         code                data
───────────────────────  ─────────────

PyEval_EvalFrameEx()     
                         next bytecode
TARGET(BINARY_MULTIPLY)  
PyNumber_Multiply()      
                         <type 'float'>
float_mul()              

Skip Interpretation

PyObject *res1, *res2, *res3;
res1 = PyNumber_Multiply(x, x);
res2 = PyNumber_Multiply(y, y);
res3 = PyNumber_Add(res1, res2);
Py_DECREF(res1);
Py_DECREF(res2);
return res3;

at best

40% speedup

CPython incurs at worst a 30%
overhead from interpretation

Skip Allocation & Dynamic Dispatch

float res1, res2;
res1 = x * x;
res2 = y * y;
return res1 + res2;
         code                data
───────────────────────  ─────────────

PyEval_EvalFrameEx()     
                         next bytecode
TARGET(BINARY_MULTIPLY)  
PyNumber_Multiply()      
                         <type 'float'>
float_mul()              

result

Gives an additional
574% speedup
CPython can spend 85%
of its time dispatching
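One way to read how the speedup figures relate to the time-share figures is the following arithmetic. The helper name is made up here, and the sketch assumes "N% speedup" means the old running time divided by the new one equals 1 + N/100.

```python
def fraction_of_time_removed(speedup):
    """Share of the old running time that disappears, given a speedup.

    A speedup of 0.40 (40%) means old_time / new_time == 1.40.
    """
    return 1.0 - 1.0 / (1.0 + speedup)


print("interpretation: %.0f%%" % (100 * fraction_of_time_removed(0.40)))  # 29%
print("dispatch:       %.0f%%" % (100 * fraction_of_time_removed(5.74)))  # 85%
```

Under that reading, a 40% speedup corresponds to just under 30% of the time, and a 574% speedup to about 85%.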

Moral

If you want Python fast,
fix dynamic, not interpreted
Fix dynamic?
Explicit — Numba
Fix dynamic?
Magic — PyPy

Numba

import numba

@numba.jit('f8,f8')
def f(x, y):
    return x * x + y * y

see also MicroPython

@micropython.native
...

@micropython.viper
...

Magic

Old:               Reigning:   Announced:
Psyco              PyPy        Pyston
Unladen Swallow
(ShedSkin)

EXE

Why?

Distribution

python.exe
app/__init__.py
app/module_a.py
app/module_b.py
app/startup.py
otherlib/__init__.py
otherlib/module.py
binlib1.pyd
binlib2.pyd
dependency1.dll
dependency2.dll
start.bat
  # python.exe -m app.startup
Python can import .py and .pyc files
from a ZIP archive, and will execute
__main__.py if given a ZIP to run
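A minimal sketch of that behavior, runnable anywhere a Python interpreter is available. The archive name matches the slides, while `greeting.py` and its contents are invented for the example.

```python
import os
import subprocess
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "source.zip")

# Build an archive holding one importable module and a __main__.py.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("greeting.py", "MESSAGE = 'hello from inside the zip'\n")
    zf.writestr("__main__.py", "import greeting\nprint(greeting.MESSAGE)\n")

# Hand the archive to the interpreter exactly as if it were a script.
output = subprocess.check_output([sys.executable, archive]).decode().strip()
print(output)
```

The import inside `__main__.py` succeeds because the archive itself is placed at the front of `sys.path` when it is run this way.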
python.exe
source.zip
binlib1.pyd
binlib2.pyd
dependency1.dll
dependency2.dll
start.bat
  # python.exe source.zip
On Unix, you can roll the source ZIP
together with the startup script:
$ (echo '#!/usr/bin/env python2';
   cat source.zip) > app
$ chmod +x app
$ ./app
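Later versions of Python (3.5 and up, so after this talk) ship a standard-library `zipapp` module that performs the same shebang-plus-ZIP trick. The sketch below assumes Python 3; the directory name, file contents, and the `.pyz` name are illustrative only.

```python
import os
import tempfile
import zipapp

workdir = tempfile.mkdtemp()

# A source directory containing only a __main__.py.
srcdir = os.path.join(workdir, "app_src")
os.mkdir(srcdir)
with open(os.path.join(srcdir, "__main__.py"), "w") as f:
    f.write("print('hello from a zipapp')\n")

# Write "#!<interpreter>" followed by a ZIP of the directory.
target = os.path.join(workdir, "app.pyz")
zipapp.create_archive(srcdir, target, interpreter="/usr/bin/env python3")

# The first line of the result is the shebang; the ZIP data follows it.
with open(target, "rb") as f:
    first_line = f.readline()
print(first_line)
```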
But everywhere: you can append ZIP files
to binaries without harming them

$ cat python.exe source.zip > app.exe
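Why this works can be checked without a real executable: a ZIP reader locates the archive's directory from the end of the file, so leading bytes are tolerated. In this sketch the "binary" is a stand-in made of filler bytes, not an actual python.exe.

```python
import io
import zipfile

# Build a small archive in memory.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("__main__.py", "print('still readable')\n")

# Stand-in for an executable: arbitrary bytes placed in front of the ZIP.
fake_binary = b"MZ" + b"\x00" * 1024
combined = fake_binary + payload.getvalue()

# The reader finds the central directory at the end and adjusts offsets.
with zipfile.ZipFile(io.BytesIO(combined)) as zf:
    names = zf.namelist()
    text = zf.read("__main__.py").decode()
print(names, text.strip())
```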

app.exe
binlib1.pyd
binlib2.pyd
dependency1.dll
dependency2.dll
start.bat
  # app.exe app.exe
Or you can add a module
that Python auto-imports
app.exe
binlib1.pyd
binlib2.pyd
dependency1.dll
dependency2.dll
sitecustomize.py
But if you are going
to go to all of this trouble,
why not compile an interpreter
to auto-start your app?
py2exe
py2app
bbFreeze
cx_Freeze
PyInstaller

What is left?

app.exe
binlib1.pyd
binlib2.pyd
dependency1.dll
dependency2.dll

Binary modules and DLLs

  1. Leave them alongside EXE
  2. Secretly unpack at runtime
  3. Use this one weird trick

py2exe

“emulate the Portable
Executable loader—”
DWORD *patchAddrHL;
int type, offset;

// the upper 4 bits define the type of relocation
type = *relInfo >> 12;
// the lower 12 bits define the offset
offset = *relInfo & 0xfff;
When it works, only py2exe
provides a truly general
single-file bundle!

Thanks to this one weird trick

Inline files

(Text, templates, images)

If a package uses plain old open()
then you must monkeypatch, or keep
plain files outside of EXE
import pkgutil
textdata = pkgutil.get_data(
    'app', 'templates/layout.html')
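A sketch of why `pkgutil.get_data` is the safer choice: it asks the package's loader for the bytes, so it keeps working when the package lives inside a ZIP. The package name `demo_app`, the archive name, and the file contents are placeholders chosen to avoid clashing with anything installed.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile

workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "bundle.zip")

# A package with one data file, stored only inside the archive.
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo_app/__init__.py", "")
    zf.writestr("demo_app/templates/layout.html", "<html>layout</html>")

# Make the archive importable, as a bundler would.
sys.path.insert(0, archive)

# No file named layout.html exists on disk, yet the data is retrievable.
textdata = pkgutil.get_data("demo_app", "templates/layout.html")
print(textdata)
```

A plain `open()` on the same path would fail here, since the template exists only as an entry in the archive.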
Who would need anything more?
“I myself dared to pass the doors
of the Necromancer in Dol Guldur
and secretly explored his ways”
I once worked in the enterprise
The enterprise sometimes
demands EXEs specifically
Core language developers see no point
in merely eliminating interpretation —
they are targeting dispatch
         code                data
───────────────────────  ─────────────

PyEval_EvalFrameEx()     
                         next bytecode
TARGET(BINARY_MULTIPLY)  
PyNumber_Multiply()      
                         <type 'float'>
float_mul()              

The enterprise

Conservative culture
Look-alike phenomenon
Fear of decompilation

“This saved my life once”

“+1 another life saved ;)”

“+1 .. and another :)”

“+1 .. and me :P”

“just saved me from a nasty accident”

“It saved me hours
of work after a rm *.py
instead of rm *.pyc”
Maybe all of those .pyc
files are useful after all

PSA

Do not set

PYTHONDONTWRITEBYTECODE=1

if they are your only backup

j/k

Without the .pyc files you
won’t rm *.pyc in the first place

git init

So

What if you need a real EXE?

res1 = PyNumber_Multiply(x, x);
res2 = PyNumber_Multiply(y, y);
res3 = PyNumber_Add(res1, res2);
The Day of the EXE Is Upon Us

Two solutions!

Nuitka

Compiler + Bundler

print "Hello, world."    worked
import PyCrypto          worked
import M2Crypto          worked
You can bundle a dependency or
put it elsewhere on the PYTHONPATH

Nuitka has a competitor

Cython

Compiler

Nuitka → C++ → Machine code
Cython → C → Machine code
Cython does not bundle
so you have to roll your
own single file executable

cascade imports cascade2 imports cascade3

cdef extern:
    void initcascade()
    void initcascade3()
    void initcascade2()

initcascade3()
initcascade2()
initcascade()

import cascade
cascade.main()

Cython

Produces Python-version agnostic C

superpowers.png
$ ~/venv-27/bin/cython boto/...
$ ...(compile)...
$ ~/venv-33/bin/python -c 'import boto'
ImportError: No module named StringIO
ImportError: No module named ConfigParser

etc

Monkey patch StringIO
Monkey patch base64
etc
$ python test_boto.py

de405-1997.tar.gz
de406-1997.tar.gz
de421-2008.tar.gz
de422-2009.tar.gz
de423-2010.tar.gz
novas_de405-1997.tar.gz
packages.html

Cython

cdef float f(float x, float y):
    return x * x + y * y

Mashup of solutions

Cython → C.so → PyInstaller → single file

Sprint!

The Day of the EXE Is Upon Us

Thank you very much!

@brandon_rhodes