Python as C++’s limiting case

code::dive • Wrocław

2018 November 17

Brandon Rhodes

“limiting case”

Polygon of n sides

n = 3, 4, 5, …
triangle, square, pentagon, …

Polygon of n sides

as n → ∞, shape → circle
“limiting case”

C++ → Python

divestment

IRQs

Simply C++ without repeating the

sins of the old scripting languages

“scripting language”

• takes source as input
• compiler remains present

import ast, io, tokenize

eval("3 + 3")
list(tokenize.generate_tokens(io.StringIO("3 + 3").readline))
ast.parse("3 + 3")
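Because the compiler stays loaded at run time, a program can build source text, compile it, and run it. A minimal sketch using the built-ins `compile()` and `exec()`; the generated function name `make_adder` is hypothetical:

```python
import ast

# Source text assembled at run time ("make_adder" is a hypothetical name).
source = "def make_adder(n):\n    return lambda x: x + n\n"

tree = ast.parse(source)                      # text -> syntax tree
code = compile(tree, "<generated>", "exec")   # syntax tree -> bytecode
namespace = {}
exec(code, namespace)                         # run it, defining make_adder

add3 = namespace["make_adder"](3)
print(add3(4))                                # 7
print(type(tree.body[0]).__name__)            # FunctionDef
```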

Syntax

Perl

my $email_address = 'a.n@example.com';

Bash

echo Hello, $user

Syntax?

Let’s use C’s

a = b

but not an expression
— a statement!
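In C, `a = b` is an expression, so `if (a = b)` compiles and silently assigns. A sketch that asks the built-in `compile()` whether each snippet is valid Python; the helper name `compiles()` is hypothetical:

```python
def compiles(source):
    """Return True if source compiles, False on a SyntaxError."""
    try:
        compile(source, "<test>", "exec")
        return True
    except SyntaxError:
        return False

print(compiles("a = b"))                    # True: an assignment statement
print(compiles("if a == b:\n    pass\n"))   # True: a comparison in a condition
print(compiles("if (a = b):\n    pass\n"))  # False: assignment is not an expression
```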

== != > < >= <=
+ - * / % & | ^

'Hello, %s!' % name

123 0x10 1e+100
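The operators, `%` formatting, and numeric literals above behave as a C programmer expects, with one notable exception in Python 3: `/` is true division, and `//` is the floor-division operator. A quick check (the value of `name` is a placeholder):

```python
name = 'Wrocław'

greeting = 'Hello, %s!' % name        # printf-style formatting
print(greeting)                       # Hello, Wrocław!
print('%d items at %.2f' % (3, 1.5))  # 3 items at 1.50

print(0x10, 1e+100 > 10**99)          # 16 True
print(6 & 3, 6 | 3, 6 ^ 3, 7 % 3)     # 2 7 5 1
print(7 / 2, 7 // 2)                  # 3.5 3  (unlike C, where 7 / 2 == 3)
```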

"String syntax.\r\n"

if
else
while
break
continue

for item in sequence:
    print(item)

# Iterator-powered range-based “for”,
# like C++11
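Roughly what the loop above does, spelled out with the built-ins `iter()` and `next()`, much as a C++11 range-based `for` calls `begin()` and `end()`. Any class can take part by defining `__iter__`; the `Countdown` class here is hypothetical:

```python
sequence = ['a', 'b', 'c']

# What "for item in sequence: print(item)" does, roughly:
it = iter(sequence)          # like calling begin()
seen = []
while True:
    try:
        item = next(it)      # advance; raises StopIteration at the end
    except StopIteration:
        break                # like reaching end()
    seen.append(item)
print(seen)                  # ['a', 'b', 'c']

class Countdown:
    """Hypothetical iterable: yields start, start - 1, ..., 1."""
    def __init__(self, start):
        self.start = start

    def __iter__(self):
        n = self.start
        while n > 0:
            yield n
            n -= 1

print(list(Countdown(3)))    # [3, 2, 1]
```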

Keep exceptions & exception handling

Slicing

text[5:10]
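Slices are half-open, like a C++ iterator range `[begin, end)`, and slice bounds past the end are clamped rather than read out of bounds. A sketch on an arbitrary sample string:

```python
text = "Hello, Wrocław!"

print(text[5:10])      # ', Wro': indices 5 through 9
print(text[:5])        # 'Hello'
print(text[-8:])       # 'Wrocław!': negative indices count from the end
print(text[::2])       # every second character
print(text[100:200])   # '': no error, the bounds are clamped
```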

Trailing comma

Pascal

“;” is separator

if (n = 0) then begin
  writeln('N is now zero');
  func := 1
end

K&R C

“;” is terminator
“,” is separator

int a[] = {4, 5, 6};

int a[] = {
     4,
     5,
     6
};

Python

“,” is optional

# Leave it off

a = [4, 5, 6]

a = [
    4,
    5,
    6
]

# Or, put it in

a = [4, 5, 6,]

a = [
    4,
    5,
    6,
]
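The trailing comma changes nothing in a list, dict, or call; the one place a comma changes meaning is a one-element tuple. A sketch:

```python
a = [4, 5, 6]
b = [
    4,
    5,
    6,
]
print(a == b)          # True: the trailing comma changes nothing

print(type((4)))       # <class 'int'>: parentheses alone only group
print(type((4,)))      # <class 'tuple'>: the comma makes the tuple

print(max(
    4,
    5,
    6,
))                     # 6: a trailing comma in a call is fine too
```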

C99, C++11

“,” became optional
even in enum lists!

Go

Trailing “,” mandatory at line
end, optional otherwise

C++ → namespaces!

But simpler: the foo.py
namespace is always named foo

import json
j = json.load(file)

import pickle
p = pickle.load(file)
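Both modules export a function named `load`, and the module prefix keeps them apart. A self-contained sketch using the in-memory variants `json.loads()` and `pickle.loads()` instead of a file (the sample data is made up):

```python
import json
import pickle

data = {'city': 'Wrocław', 'year': 2018}

text = json.dumps(data)       # str
blob = pickle.dumps(data)     # bytes

j = json.loads(text)          # json's loader...
p = pickle.loads(blob)        # ...and pickle's, side by side, no clash

print(j == p == data)         # True
```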

static

_ignore_this
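A C `static` function is invisible outside its file; Python's nearest equivalent is a naming convention. A leading underscore keeps a name out of `from module import *`, though the name stays reachable on purpose. A self-contained sketch that builds a throwaway module in memory; the module name `demo_mod` and the function `public_helper` are hypothetical:

```python
import sys
import types

# Build a throwaway module in memory ("demo_mod" is a hypothetical name).
source = '''
def public_helper():
    return _ignore_this() + 1

def _ignore_this():
    return 41
'''
mod = types.ModuleType("demo_mod")
exec(source, mod.__dict__)
sys.modules["demo_mod"] = mod

# A star-import copies only the names that do not start with "_".
namespace = {}
exec("from demo_mod import *", namespace)

print("public_helper" in namespace)   # True
print("_ignore_this" in namespace)    # False
print(mod._ignore_this())             # 41: a convention, not a wall
```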

function overloading

Instead, optional arguments
and keyword arguments

def write_csv(data, sep=',', eol='\n', quote='"'):
    ...

write_csv(rows)
write_csv(rows, ' ')  # “data, sep”
write_csv(rows, quote=':')
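The slide's `write_csv` body is elided. A minimal sketch of one possible body, assuming it returns the CSV text instead of writing a file and quotes only cells that contain the separator or the quote character:

```python
def write_csv(data, sep=',', eol='\n', quote='"'):
    """Hypothetical body: return rows of cells as delimited text."""
    lines = []
    for row in data:
        cells = []
        for cell in row:
            text = str(cell)
            if sep in text or quote in text:
                # Wrap the cell and double any embedded quote characters.
                text = quote + text.replace(quote, quote * 2) + quote
            cells.append(text)
        lines.append(sep.join(cells))
    return eol.join(lines) + eol

rows = [['city', 'zone'], ['Wrocław', 'CET']]
print(write_csv(rows), end='')             # every option at its default
print(write_csv(rows, ' '), end='')        # positional: data, sep
print(write_csv(rows, quote=':'), end='')  # keyword: skips sep and eol
```

One signature with defaults covers what C++ would spell as several overloads, and keywords let a caller set any one option without restating the others.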

# Like C++, give full access
# to raw UNIX system calls!

s = socket.socket()
s.connect(address)
s.recv(1024)
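The `os` and `socket` modules are thin wrappers over the underlying system calls. A self-contained sketch that avoids the network by using a pipe and a connected socket pair (`socket.socketpair()` is available on Unix, and on Windows since Python 3.5):

```python
import os
import socket

# os.pipe(), os.write(), os.read() map onto pipe(2), write(2), read(2).
r, w = os.pipe()
os.write(w, b'hello')
piped = os.read(r, 1024)
print(piped)                 # b'hello'
os.close(r)
os.close(w)

# A connected pair of sockets, so no address or server is needed.
left, right = socket.socketpair()
left.sendall(b'ping')
received = right.recv(1024)
print(received)              # b'ping'
left.close()
right.close()
```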

Fixes

undefined behavior
unspecified behavior
implementation-defined behavior

“Python evaluates expressions from left to right”

Precedence

oflags & 0x80 == nflags & 0x80

oflags & (0x80 == nflags) & 0x80

              → always 0, since (0x80 == nflags) is 0 or 1

In Python
oflags & 0x80 == nflags & 0x80
means
(oflags & 0x80) == (nflags & 0x80)
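A check with sample flag values (chosen arbitrarily) that Python groups the expression as the programmer intended, because comparisons bind more loosely than `&`:

```python
oflags = 0x81
nflags = 0x80

# Python: == binds more loosely than &.
print(oflags & 0x80 == nflags & 0x80)       # True
print((oflags & 0x80) == (nflags & 0x80))   # True: the same expression

# The C grouping, written out explicitly, collapses to 0.
print(oflags & (0x80 == nflags) & 0x80)     # 0
```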

Readability

&& → and

|| → or

Bonus: “and” and “or”
short-circuit, like in C/C++

but they also return the last operand
they evaluated, instead of 0 or 1!

s = error.message or 'Error'
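A few checks of what `and` and `or` return, plus the slide's idiom with a hypothetical `Failure` class standing in for the error object:

```python
print(0 or 'Error')            # Error: the left side was false
print('Disk full' or 'Error')  # Disk full: the left side decided it
print(3 and 5)                 # 5
print([] and 5)                # []: the left side decided it

class Failure:
    """Hypothetical error object with a possibly empty message."""
    def __init__(self, message=''):
        self.message = message

for error in (Failure(), Failure('Disk full')):
    s = error.message or 'Error'
    print(s)                   # Error, then Disk full
```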

Extend C’s good idea
that 0 is “false” to empty
strings and containers

if 0:   ...  # False
if 0.0: ...  # False
if "":  ...  # False
if []:  ...  # False

if not my_list:
    print('(list was empty)')
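`if` applies `bool()` to its condition, so the rule is easy to check directly; user classes opt in by defining `__len__` (or `__bool__`). The `Stack` class is hypothetical:

```python
falsy = [0, 0.0, "", [], {}, (), set(), None]
truthy = [1, -0.5, "0", [0], {'k': None}]

print([bool(x) for x in falsy])    # all False
print([bool(x) for x in truthy])   # all True ("0" is a non-empty string)

class Stack:
    """Hypothetical container: empty means false, via __len__."""
    def __init__(self):
        self.items = []

    def __len__(self):
        return len(self.items)

s = Stack()
print(bool(s))     # False: empty
s.items.append(1)
print(bool(s))     # True
```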

eliminate overflow

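#include <stdio.h>

int main(void) {
    int i = 2147483647;
    i++;  /* signed overflow: undefined behavior */
    printf("%d\n", i);
}

# -2147483648 (typical, but not guaranteed)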

i = 2147483647
print(i + 1)

# 2147483648

print(2 ** 257)

231584178474632390847141970017375815706539969331281128078915168015826259279872
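Python ints grow as needed. When C-style 32-bit wraparound is actually wanted, it has to be asked for explicitly; a sketch with a hypothetical helper `wrap_int32()`:

```python
i = 2147483647
print(i + 1)                      # 2147483648
print((2 ** 257).bit_length())    # 258: stored exactly, no overflow

def wrap_int32(n):
    """Hypothetical helper: reduce n into the range of a 32-bit signed int."""
    return (n + 2**31) % 2**32 - 2**31

print(wrap_int32(i + 1))          # -2147483648
print(wrap_int32(-2**31 - 1))     # 2147483647
```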

Type system: loose or strict?

js> '4' - 3
1
js> '4' + 3
"43"

js> [] + []
""

js> [] + {}
"[object Object]"
js> {} + []
0

“Wat”

Gary Bernhardt

perl> "3" + 1
4

$ echo | awk '{print "3" + 1}'
4

Traditional scripting languages

thought loose types were convenient

The truth: loose types are terrible

• Locality

• Readability

>>> '4' - 3
TypeError
>>> '4' + 3
TypeError

>>> int('4') + 3
7
>>> '4' + str(3)
'43'
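A sketch that runs the four expressions above and catches the errors; the helper `attempt()` is hypothetical, and `operator.sub` and `operator.add` are the standard functional forms of `-` and `+`:

```python
import operator

def attempt(op, a, b):
    """Apply op to a and b, reporting 'TypeError' instead of raising it."""
    try:
        return op(a, b)
    except TypeError:
        return 'TypeError'

print(attempt(operator.sub, '4', 3))        # TypeError
print(attempt(operator.add, '4', 3))        # TypeError
print(attempt(operator.add, int('4'), 3))   # 7
print(attempt(operator.add, '4', str(3)))   # 43
```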

Explicit is better than implicit

Errors should never pass silently

Python: strict type system

C++

Values ↔ Pointers

Python

Values ↔ Pointers

a = 3

b = a
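`b = a` copies a pointer, so both names refer to one object; the `is` operator compares those pointers. A sketch:

```python
a = 3
b = a            # copies the pointer, not the object
print(a is b)    # True: both names point at the same int object

a = 4            # rebinds a to a different object...
print(b)         # 3  ...while b still points at the old one

x = [1, 2]
y = x            # again only a pointer copy
y.append(3)      # mutate the one shared list
print(x)         # [1, 2, 3]
```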

How many list types?

list<pointer>

How many hash table types?

dict<pointer, pointer>

template

Everything-a-pointer
means data structures are
automatically composable

[{'ext': '.c', 'binary': 'gcc'},
 {'ext': '.py', 'binary': 'python3'},
 {'ext': '.sh', 'binary': 'bash'}]

{'linter': ['lint'],
 'compile': ['gcc', 'strip'],
 'archive': ['tar']}
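Because every slot holds a pointer, containers nest without extra declarations. A sketch that walks the two structures above; the lookups are illustrative:

```python
tools = [{'ext': '.c', 'binary': 'gcc'},
         {'ext': '.py', 'binary': 'python3'},
         {'ext': '.sh', 'binary': 'bash'}]

steps = {'linter': ['lint'],
         'compile': ['gcc', 'strip'],
         'archive': ['tar']}

# Index the list of dicts by extension: a dict whose values are dicts.
by_ext = {tool['ext']: tool for tool in tools}
print(by_ext['.py']['binary'])     # python3

# Flatten the dict of lists (dicts keep insertion order since Python 3.7).
pipeline = [cmd for cmds in steps.values() for cmd in cmds]
print(pipeline)                    # ['lint', 'gcc', 'strip', 'tar']
```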

Q: But what if you want

compound hash map keys?

A: List [] is mutable

but tuple () immutable

errors = {('load.py', 8): '"foo" undefined',
          ('load.py', 47): 'unused name "n"',
          ('store.py', 12): 'syntax error'}
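Tuples hash by their contents, so they work as keys; a list is rejected because it could change after being inserted. A sketch:

```python
errors = {('load.py', 8): '"foo" undefined',
          ('load.py', 47): 'unused name "n"',
          ('store.py', 12): 'syntax error'}

print(errors[('load.py', 47)])   # unused name "n"
print(errors['store.py', 12])    # syntax error (parentheses are optional)

try:
    {['load.py', 8]: 'oops'}     # a list cannot be a key
    list_key_rejected = False
except TypeError:
    list_key_rejected = True
print(list_key_rejected)         # True
```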

Tuple dictionary keys
instantly solve a terrible
old problem in scripting

'store.py' 12
'load.py' 8
'load.py' 47

'store.py:12'
'load.py:8'
'load.py:47'

'load.py:47'
'load.py:8'
'store.py:12'

'load.py:000008'
'load.py:000047'
'store.py:000012'

{('load.py', 8): '"foo" undefined',
 ('load.py', 47): 'unused name "n"',
 ('store.py', 12): 'syntax error'}

Tuples (and lists) support sorting

[('dropbox.com', 443),
 ('google.com', 80),
 ('google.com', 443),
 ('stackoverflow.com', 443)]
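Tuples compare element by element, so sorting the tuple keys orders line numbers as numbers, with no zero-padding. A sketch contrasting this with the string keys:

```python
errors = {('store.py', 12): 'syntax error',
          ('load.py', 47): 'unused name "n"',
          ('load.py', 8): '"foo" undefined'}

for (filename, line), message in sorted(errors.items()):
    print('%s:%d: %s' % (filename, line, message))
# load.py:8: "foo" undefined
# load.py:47: unused name "n"
# store.py:12: syntax error

# Contrast: plain strings sort '47' before '8'.
print(sorted(['load.py:8', 'load.py:47', 'store.py:12']))
# ['load.py:47', 'load.py:8', 'store.py:12']
```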

Return multiple values

return (hostname, port)

Unpacking

host, port = get_address()

Swapping

x, y = y, x
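`get_address()` on the slide is not defined; a minimal sketch with a hypothetical body that returns two values as one tuple:

```python
def get_address():
    """Hypothetical lookup returning (hostname, port)."""
    hostname, port = 'example.com', 443
    return (hostname, port)

host, port = get_address()     # unpack the tuple into two names
print(host, port)              # example.com 443

x, y = 1, 2
x, y = y, x                    # the right side is built first, then unpacked
print(x, y)                    # 2 1
```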

Everything-a-pointer
means functions and types
become eligible data!

Design Patterns

openers = {
    '.pdf': pdflib.parse,
    '.png': imagelib.open,
    '.txt': open,
}

file_types = {
    '.pdf': PdfDocument,
    '.png': BitmapImage,
    '.txt': str,
}
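`pdflib` and `imagelib` on the slide stand in for real libraries. A self-contained sketch of the same dispatch-table idea with hypothetical stand-in functions, plus a table of types used as factories:

```python
import os

# Hypothetical stand-ins for real parsers: any callable can be a dict value.
def parse_pdf(path):
    return 'PDF parser for ' + path

def open_png(path):
    return 'PNG loader for ' + path

def read_text(path):
    return 'text reader for ' + path

openers = {
    '.pdf': parse_pdf,
    '.png': open_png,
    '.txt': read_text,
}

def open_any(path):
    """Pick a function by file extension, then call it."""
    ext = os.path.splitext(path)[1]
    return openers[ext](path)

print(open_any('report.pdf'))        # PDF parser for report.pdf

# Types are objects too, so a dict can map names to classes.
converters = {'int': int, 'float': float, 'str': str}
print(converters['float']('3.5'))    # 3.5
```

A dict of functions or classes often replaces a Strategy or Factory class hierarchy.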

David Wheeler

“All problems
in computer science
can be solved by another
level of indirection”

Think of Python as C++ but

with uniform indirection

Everything-a-pointer

But what do the
pointers point at?

C++ — many types

Python — one type

struct {
   struct _typeobject *ob_type;
   /* followed by object’s data */
}

struct _typeobject {
    ...
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    ...
    newfunc tp_new;
    freefunc tp_free;
    ...
    binaryfunc nb_add;
    binaryfunc nb_subtract;
    ...
    richcmpfunc tp_richcompare;
    ...
}

In C++ types
belong to names
int a = 3;

In Python types
belong to values
a = 3
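Because the type travels with the object, one name can point at objects of different types over time, and `type()` asks the object, not the name. A sketch:

```python
a = 3
print(type(a).__name__)      # int: the type lives on the object
a = 'three'
print(type(a).__name__)      # str: the same name now points at a str

values = [3, 3.0, '3', [3]]
print([type(v).__name__ for v in values])   # ['int', 'float', 'str', 'list']
```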

declarations

Python supports only two pointer operations

#1 Copying

a = b
my_list[0] = n
my_dict['Wrocław'] = 'CET'

Copying is safe

#2 Calling a method

Python code can only
interact with a built-in object
through its methods

n = 3
some_function(n)

# Q: I just passed a pointer!
# Could some_function() have changed “3”?

A: Python only lets you call methods, and
int, float, str types are read-only —
their methods don’t implement “set”!

Immutable int, float, str provide
the semantics of C++ automatic variables
in an everything-a-pointer language
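A sketch of the contrast (the body of `some_function` is hypothetical): rebinding a parameter never reaches the caller, while calling a mutating method on a mutable object does.

```python
def some_function(n):
    n += 1           # builds a new int and rebinds the local name only
    return n

def append_one(items):
    items.append(1)  # a list's methods do mutate the shared object

n = 3
some_function(n)
print(n)             # 3: the caller's int is untouched

my_list = []
append_one(my_list)
print(my_list)       # [1]: lists are mutable, so the caller sees it
```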

Uniform types and methods

make generic operators easy!

min([8, 3, 5])
max([3.125, 3.16, 3.139])
min(['Warszawa', 'Kraków', 'Wrocław'])
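`min()` and `max()` need only the objects' comparison methods, so they work on any type that defines them. A sketch; the `Version` class is hypothetical:

```python
print(min([8, 3, 5]))                          # 3
print(max([3.125, 3.16, 3.139]))               # 3.16
print(min(['Warszawa', 'Kraków', 'Wrocław']))  # Kraków

class Version:
    """Hypothetical version number, ordered part by part."""
    def __init__(self, *parts):
        self.parts = parts

    def __lt__(self, other):
        return self.parts < other.parts

    def __repr__(self):
        return 'Version%r' % (self.parts,)

print(max([Version(3, 6), Version(3, 10), Version(2, 7)]))  # Version(3, 10)
```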

Great idea from C++:

newfunc tp_new;

• Arenas
• Flyweights
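Python exposes `tp_new` to classes as the `__new__` method, which may return an existing object instead of allocating a new one: the Flyweight pattern. (CPython itself caches small integers this way, as an implementation detail.) A sketch; the `Color` class and its cache are hypothetical:

```python
class Color:
    """Hypothetical flyweight: one shared instance per color name."""
    _cache = {}

    def __new__(cls, name):
        obj = cls._cache.get(name)
        if obj is None:
            obj = super().__new__(cls)   # allocate only on first request
            obj.name = name
            cls._cache[name] = obj
        return obj

red1 = Color('red')
red2 = Color('red')
blue = Color('blue')
print(red1 is red2)   # True: the same object came back
print(red1 is blue)   # False
```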

Q: NULL ?

A: Nope.

Python a = b syntax provides no
mechanism for creating a NULL pointer

Pure Python code

• Segmentation faults: impossible
• Buffer overflows: impossible
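What happens instead is an ordinary exception. A sketch (the helper `error_name` is hypothetical); the guarantee covers pure Python code, since C extensions or `ctypes` can still crash the process:

```python
nothing = None
print(type(nothing).__name__)   # NoneType: None is a real object, not address 0

def error_name(action):
    """Run action() and return the name of the exception it raised, if any."""
    try:
        action()
        return None
    except Exception as e:
        return type(e).__name__

print(error_name(lambda: nothing.port))   # AttributeError, not a segfault
cells = [0] * 4
print(error_name(lambda: cells[10]))      # IndexError, not an overrun
```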

But wait

If the type struct is fixed,
what about user classes?

class Address(object):
    def __init__(self, host, port):
        self.host = host
        self.port = port

struct _typeobject {
    ...
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    ...
    newfunc tp_new;
    ...
}

For a user-defined class —

tp_new() creates a hash table
tp_setattr() calls its set-item
tp_getattr() calls its get-item
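That hash table is visible from Python as the instance's `__dict__`: for an ordinary instance, attribute assignment and lookup read and write the same table. A sketch using the `Address` class from the slide (the sample values are made up):

```python
class Address(object):
    def __init__(self, host, port):
        self.host = host
        self.port = port

addr = Address('example.com', 443)
print(addr.__dict__)             # {'host': 'example.com', 'port': 443}

setattr(addr, 'port', 8080)      # same as addr.port = 8080
print(addr.__dict__['port'])     # 8080: stored in the hash table

addr.__dict__['scheme'] = 'https'
print(getattr(addr, 'scheme'))   # https: lookup reads the same table
```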

C++ private and protected

In Python —

private protected

Built-in types: private members
User class attributes: entirely public

Language integrity is the only
excuse for hiding

Q: Memory Management?

A: shared_ptr

struct {
   struct _typeobject *ob_type;
   /* followed by object’s data */
}

struct {
   Py_ssize_t ob_refcnt;
   struct _typeobject *ob_type;
   /* followed by object’s data */
}

Reference counting vs GC

Instant — as soon as count drops
to zero, memory is freed
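A sketch of that immediacy, assuming CPython (implementations with a tracing GC, such as PyPy, may run the callback later). `weakref.finalize()` registers a callback that runs when the object is reclaimed; the `Buffer` class is hypothetical:

```python
import weakref

class Buffer:
    """Hypothetical resource-holding class."""

freed = []
buf = Buffer()
weakref.finalize(buf, freed.append, 'freed')   # runs when buf is reclaimed

other = buf      # two references
del buf          # one reference left: still alive
print(freed)     # []
del other        # count drops to zero: reclaimed right here (CPython)
print(freed)     # ['freed']
```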

Reference counting vs GC

Locality — freed memory is often
re-used immediately, maybe
while still in-cache!

Caveat

MicroPython: small ints and interned strs stored inside the pointer itself
PyPy: JIT to machine code

Q: What did Python programmers get wrong?

A: Python 2 type system became

even stronger in Python 3

# Python 2
>>> 'byte string ' + u'unicode string'
u'byte string unicode string'

# Python 3
>>> b'byte string ' + u'unicode string'
TypeError: can't concat str to bytes

# Python 2
>>> sorted(['b', 1, 'a', 2])
[1, 2, 'a', 'b']

# Python 3
>>> sorted(['b', 1, 'a', 2])
TypeError: '<' not supported between instances of 'int' and 'str'
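In Python 3 the fix is to state the conversion or the ordering explicitly. A sketch; sorting with `key=str` is just one possible choice of order:

```python
raw = b'byte string '
text = 'unicode string'

print(raw.decode('utf-8') + text)   # byte string unicode string
print(raw + text.encode('utf-8'))   # b'byte string unicode string'

mixed = ['b', 1, 'a', 2]
print(sorted(mixed, key=str))       # [1, 2, 'a', 'b']: compare as text
```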

A: Programmers over-used dynamic features

class Address(object):
    def __init__(self, host, port):
        self.host = host
        self.port = port

# But what if the port wasn’t specified?

class Address(object):
    def __init__(self, host, port=None):
        self.host = host
        if port is not None: # So terrible
            self.port = port

# Code was forced to use introspection
# (terrible!)

if hasattr(addr, 'port'):
    print(addr.port)

# Today’s best practice:
# every attribute always present

if addr.port is not None:
    print(addr.port)
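Since Python 3.7 the standard `dataclasses` module makes this practice easy to follow: each field is declared once, and a default of `None` keeps the attribute present. A sketch of the same `Address` (the sample host is made up):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Address:
    host: str
    port: Optional[int] = None    # always present, possibly None

addr = Address('example.com')
if addr.port is not None:
    print(addr.port)
print(addr)                       # Address(host='example.com', port=None)
```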

With respect to stability
of a class’s attribute lists:

Python classes → C++ classes

Another bad habit:

Python routines that tried to
support many argument types
instead of just one

class Dataframe(object):
    def __init__(self, columns):
        if isinstance(columns, str):
            columns = columns.split(',')
        self.columns = columns

Dataframe(['date', 'x', 'y'])
Dataframe('date,x,y')

Code that supports several types
can be tempting because of convenience —
but is usually harder to test,
modify, and debug

Python code → C++ code

It’s usually a design bug

if you need eval()
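Two common reasons to reach for `eval()` have safer standard-library answers: `ast.literal_eval()` for reading Python literals, and `getattr()` for looking up a name chosen at run time. A sketch (the `Commands` class is hypothetical):

```python
import ast

# Parse literal data without executing arbitrary code.
config = ast.literal_eval("{'retries': 3, 'hosts': ['a', 'b']}")
print(config['hosts'])             # ['a', 'b']

try:
    ast.literal_eval("__import__('os').system('echo hi')")
    call_rejected = False
except ValueError:
    call_rejected = True           # a function call is not a literal
print(call_rejected)               # True

class Commands:
    """Hypothetical command set, dispatched by name."""
    def start(self):
        return 'starting'

    def stop(self):
        return 'stopping'

name = 'stop'                      # e.g. typed by a user
print(getattr(Commands(), name)()) # stopping
```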

Unless you’re writing the
Python language server for
the IPython Notebook

Then, eval() away!

What in general is the
value of introspection?

getattr()
hasattr()
isinstance()

Python’s support for introspection
is a huge win for code about code:

loggers, linters, debuggers, tests

We are living through a great age of

consolidation of programming practice

shared_ptr prevalence in Tensorflow

Limiting case

t → ∞
C++ → Python

declarations
new delete template
private protected
NULL static
function overloading

@brandon_rhodes

Method dispatch: only on the object, not on the other arguments

Every method is effectively “virtual” under inheritance
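In C++ terms, a base-class method that calls `self.area()` reaches the subclass override with no `virtual` keyword, and the method is chosen by the object before the dot, not by the other arguments. The standard `functools.singledispatch` likewise picks an implementation from the first argument's type only. A sketch with hypothetical `Shape` classes and a hypothetical `render()` function:

```python
from functools import singledispatch

class Shape:
    def describe(self):
        # self.area() is looked up on the actual object at call time.
        return '%s with area %g' % (type(self).__name__, self.area())

    def area(self):
        return 0

class Square(Shape):
    def __init__(self, side):
        self.side = side

    def area(self):              # overrides Shape.area, no keyword needed
        return self.side ** 2

print(Square(3).describe())      # Square with area 9

@singledispatch
def render(value):
    return 'object'

@render.register
def _(value: int):
    return 'int'

print(render(3), render('x'))    # int object
```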