Python as C++’s limiting case

code::dive • Wrocław
2018 November 17
Brandon Rhodes
“limiting case”
Polygon of n sides

n = 3, 4, 5, …
triangle, square, pentagon, …
Polygon of n sides

as n → ∞, shape → circle
“limiting case”

C++ → Python
divestment

IRQs
Simplify C++ without repeating the
sins of the old scripting languages
“scripting language”

• takes source as input
• compiler remains present

eval("3 + 3")
tokenize("3 + 3")
ast.parse("3 + 3")
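A minimal sketch of what "compiler remains present" looks like from inside a running program, using only the standard library. The names source, tokens, and tree are illustrative and do not appear on the slides. Note that tokenize is a module, so the token stream here comes from tokenize.generate_tokens rather than from calling tokenize directly.

```python
import ast
import io
import tokenize

source = "3 + 3"

# Evaluate the source text at run time.
result = eval(source)

# Break the same text into lexical tokens, dropping the empty
# NEWLINE / ENDMARKER tokens that carry no visible text.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()
]

# Parse the text into an abstract syntax tree without running it.
tree = ast.parse(source, mode="eval")

print(result)
print(tokens)
print(type(tree.body).__name__)
```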
Syntax
Perl

my $email_address = 'a.n@example.com';
Bash

echo Hello, $user
Syntax?

Let’s use C’s
a = b

but not an expression
— a statement!
== != > < >= <=
+ - * / % & | ^
'Hello, %s!' % name
123 0x10 1e+100
"String syntax.\r\n"
if
else
while
break
continue
for item in sequence:
    print(item)

# Iterator-powered range-based “for”,
# like C++11
Keep exceptions & exception handling
Slicing

text[5:10]
Trailing comma
Pascal

“;” is separator
if (n = 0) then begin
  writeln('N is now zero');
  func := 1
end
K&R C

“;” is terminator
“,” is separator
int a[] = {4, 5, 6};
int a[] = {
     4,
     5,
     6
};
Python

“,” is optional
# Leave it off

a = [4, 5, 6]

a = [
    4,
    5,
    6
]
a = [4, 5, 6,]

a = [
    4,
    5,
    6,
]

# Or, put it in
C99, C++11

“,” became optional!
Go

Trailing “,” mandatory at line
end; gofmt strips it otherwise
C++ → namespaces!

But simpler: the foo.py
namespace is always named foo
import json
j = json.load(file)

import pickle
p = pickle.load(file)
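A small sketch of the namespace rule, assuming nothing beyond the standard library. It uses the in-memory loads/dumps variants instead of load(file) so that it runs without a file on disk; the dictionary named data is illustrative.

```python
import json
import pickle

data = {'city': 'Wrocław', 'n': 3}

# Both modules define functions called dumps() and loads(),
# but the module name keeps them apart.
j = json.loads(json.dumps(data))
p = pickle.loads(pickle.dumps(data))

# The namespace is named after the file it was loaded from.
names = (json.__name__, pickle.__name__)
print(names)
```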
static

_ignore_this
function overloading

Instead, optional arguments
and keyword arguments
def write_csv(data, sep=',', eol='\n', quote='"'):
    ...

write_csv(rows)
write_csv(rows, ' ')  # “data, sep”
write_csv(rows, quote=':')
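The slide elides the body of write_csv. The sketch below fills it in with a hypothetical implementation that returns text instead of writing a file, only so the three call styles can be compared side by side.

```python
def write_csv(data, sep=',', eol='\n', quote='"'):
    """Hypothetical body: return the rows as delimited text."""
    lines = []
    for row in data:
        lines.append(sep.join(quote + str(value) + quote for value in row))
    return eol.join(lines) + eol

rows = [['a', 1], ['b', 2]]

default_text = write_csv(rows)            # every default applies
spaced_text = write_csv(rows, ' ')        # second positional argument is sep
colon_text = write_csv(rows, quote=':')   # skip sep and eol by naming quote

print(default_text, spaced_text, colon_text, sep='')
```

One function with defaults covers what would otherwise need several overloads.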
# Like C++, give full access
# to raw UNIX system calls!

s = socket.socket()
s.connect(address)
s.recv(1024)
Fixes
undefined behavior
unspecified behavior
implementation-defined behavior

“Python evaluates expressions from left to right”
Precedence
oflags & 0x80 == nflags & 0x80
oflags & (0x80 == nflags) & 0x80

               0
In Python
oflags & 0x80 == nflags & 0x80
means
(oflags & 0x80) == (nflags & 0x80)
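The difference can be checked with concrete flag values. The values of oflags and nflags below are made up for illustration; c_style spells out, with explicit parentheses, the grouping that C's precedence rules would apply.

```python
oflags = 0x81
nflags = 0x80

# Python: & binds tighter than ==, so this compares the two masked values.
implicit = oflags & 0x80 == nflags & 0x80
explicit = (oflags & 0x80) == (nflags & 0x80)

# C's grouping, written out by hand: == binds tighter than &.
c_style = oflags & (0x80 == nflags) & 0x80

print(implicit, explicit, c_style)
```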
Readability
&& → and
|| → or
Bonus: and and or will
short-circuit, like in C/C++

but also will return the final
value, instead of 0 or 1!

s = error.message or 'Error'
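A short sketch of both behaviours. The Error class and its messages are hypothetical stand-ins for whatever object error refers to on the slide.

```python
class Error:
    """Hypothetical error object with a message attribute."""
    def __init__(self, message):
        self.message = message

# "or" returns the first truthy operand, or the last operand if none is truthy.
s_empty = Error('').message or 'Error'
s_full = Error('disk full').message or 'Error'

# "and" short-circuits: the division is never evaluated.
guarded = 0 and 1 / 0

print(s_empty, s_full, guarded)
```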
Extend C’s good idea
that 0 is “false” to empty
strings and containers
if 0:   # False
if 0.0: # False
if "":  # False
if []:  # False

if not my_list:
    # (list was empty)
eliminate overflow
int i = 2147483647;
i++;
printf("%d\n", i);

# -2147483648
i = 2147483647
print(i + 1)

# 2147483648
print(2 ** 257)

# 231584178474632390847141970017375815706539969331281128078915168015826259279872
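The contrast can be reproduced from Python itself. This sketch uses ctypes.c_int32 to imitate a 32-bit signed integer; that wrap-around is ctypes behaviour chosen for illustration, not something the slides use.

```python
import ctypes

i = 2147483647

# Python integers grow as needed.
python_result = i + 1

# Forcing the same value into 32 signed bits wraps around.
wrapped = ctypes.c_int32(i + 1).value

big = 2 ** 257

print(python_result, wrapped, big.bit_length())
```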
Type system: loose or strict?
js> '4' - 3
1
js> '4' + 3
"43"
js> [] + []
""
js> [] + {}
"[object Object]"
js> {} + []
0
Gary Bernhardt
perl> "3" + 1
4
$ echo | awk '{print "3" + 1}'
4
Traditional scripting languages
thought loose types were convenient
The truth: loose types are terrible
• Locality
• Readability
>>> '4' - 3
TypeError
>>> '4' + 3
TypeError
>>> int('4') + 3
7
>>> '4' + str(3)
'43'
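The same four expressions as a runnable script. The helper try_op is hypothetical; it only turns the exception into a printable value so all four results can sit in one list.

```python
def try_op(func):
    """Run func and report a TypeError by name instead of crashing."""
    try:
        return func()
    except TypeError as exc:
        return type(exc).__name__

results = [
    try_op(lambda: '4' - 3),        # no implicit str -> int conversion
    try_op(lambda: '4' + 3),        # no implicit int -> str conversion
    try_op(lambda: int('4') + 3),   # explicit conversion to int
    try_op(lambda: '4' + str(3)),   # explicit conversion to str
]

print(results)
```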
Explicit is better than implicit
Errors should never pass silently
Python: strict type system
C++

Values ↔ Pointers
Python

Pointers only
a = 3
b = a
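A quick way to see that assignment copies a pointer, not a value. The list example is an addition for illustration: with a mutable object, the sharing becomes visible.

```python
a = 3
b = a
same_int = a is b        # both names refer to one int object

first = [4, 5, 6]
second = first           # copies the reference, not the list
second.append(7)

print(same_int, first)
```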
How many list types?

list<pointer>
How many hash table types?

dict<pointer, pointer>
template
Everything-a-pointer
means data structures are
automatically composable
[{'ext': '.c', 'binary': 'gcc'},
 {'ext': '.py', 'binary': 'python3'},
 {'ext': '.sh', 'binary': 'bash'}]
{'linter': ['lint'],
 'compile': ['gcc', 'strip'],
 'archive': ['tar']}
Q: But what if you want
compound hash map keys?
A: List [] is mutable
but tuple () is immutable
errors = {('load.py', 8): '"foo" undefined',
          ('load.py', 47): 'unused name "n"',
          ('store.py', 12): 'syntax error'}
Tuple dictionary keys
instantly solve a terrible
old problem in scripting
'store.py' 12
'load.py' 8
'load.py' 47
'store.py:12'
'load.py:8'
'load.py:47'
'load.py:47'
'load.py:8'
'store.py:12'
'load.py:000008'
'load.py:000047'
'store.py:000012'
{('load.py', 8): '"foo" undefined',
 ('load.py', 47): 'unused name "n"',
 ('store.py', 12): 'syntax error'}
Tuples (and lists) support sorting
[('dropbox.com', 443),
 ('google.com', 80),
 ('google.com', 443),
 ('stackoverflow.com', 443)]
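A sketch comparing the two key styles, reusing the errors dictionary from the earlier slide. Tuple keys sort field by field, so line 8 comes before line 47; the joined string keys sort character by character and put 47 first.

```python
errors = {('load.py', 8): '"foo" undefined',
          ('load.py', 47): 'unused name "n"',
          ('store.py', 12): 'syntax error'}

# Tuples compare element by element: filename first, then line number.
tuple_order = sorted(errors)

# The old scripting workaround: glue the fields into one string.
string_order = sorted('%s:%d' % key for key in errors)

print(tuple_order)
print(string_order)
```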
Return multiple values

return (hostname, port)
Unpacking

host, port = get_address()
Swapping

x, y = y, x
Everything-a-pointer
means functions and types
become eligible data!

Design Patterns
openers = {
    '.pdf': pdflib.parse,
    '.png': imagelib.open,
    '.txt': open,
}
file_types = {
    '.pdf': PdfDocument,
    '.png': BitmapImage,
    '.txt': str,
}
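A runnable sketch of the same idea. The slide's pdflib, imagelib, PdfDocument, and BitmapImage are not real imports here, so this version substitutes small hypothetical functions and built-in types.

```python
import os.path

def parse_csv_line(text):
    """Hypothetical parser for comma-separated text."""
    return text.split(',')

def parse_words(text):
    """Hypothetical parser for whitespace-separated text."""
    return text.split()

# Functions stored as dictionary values.
parsers = {
    '.csv': parse_csv_line,
    '.txt': parse_words,
}

# Types stored as dictionary values; calling one constructs an instance.
converters = {
    'count': int,
    'ratio': float,
    'name': str,
}

def parse(filename, text):
    """Look up the parser by file extension and call it."""
    ext = os.path.splitext(filename)[1]
    return parsers[ext](text)

print(parse('points.csv', 'x,y'), converters['count']('42'))
```

The lookup table replaces what would be a factory or strategy class hierarchy in a language where functions and types are not ordinary values.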
David Wheeler

“All problems
in computer science
can be solved by another
level of indirection”
Think of Python as C++ but
with uniform indirection
Everything-a-pointer

But what do the
pointers point at?
C++ — many types
Python — one type
struct {
   struct _typeobject *ob_type;
   /* followed by object's data */
}
struct _typeobject {
    ...
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    ...
    newfunc tp_new;
    freefunc tp_free;
    ...
    binaryfunc nb_add;
    binaryfunc nb_subtract;
    ...
    richcmpfunc tp_richcompare;
    ...
}
In C++ types
belong to names
int a = 3;

In Python types
belong to values
a = 3

declarations
Only support two pointer operations
#1 Copying
a = b
my_list[0] = n
my_dict['Wrocław'] = 'CET'
Copying is safe
#2 Calling a method

Python code can only
interact with a built-in object
through its methods
struct _typeobject {
    ...
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    ...
    newfunc tp_new;
    freefunc tp_free;
    ...
    binaryfunc nb_add;
    binaryfunc nb_subtract;
    ...
    richcmpfunc tp_richcompare;
    ...
}
n = 3
some_function(n)

# Q: I just passed a pointer!
# Could some_function() have changed “3”?
A: Python only lets you call methods, and
int, float, str types are read-only:
their methods don’t implement “set”!
Immutable int, float, str provide
the semantics of C++ automatic variables
in an everything-a-pointer language
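The claim can be checked directly. In this sketch the bodies of some_function and append_item are illustrative; the slides do not define them.

```python
def some_function(n):
    # Rebinds the local name n to a new int; the caller's object is untouched.
    n += 1
    return n

def append_item(items):
    # Lists do implement mutation, so the caller sees this change.
    items.append(99)

n = 3
returned = some_function(n)

my_list = [1, 2]
append_item(my_list)

# int offers no item-assignment method at all; list does.
int_has_set = hasattr(int, '__setitem__')
list_has_set = hasattr(list, '__setitem__')

print(n, returned, my_list, int_has_set, list_has_set)
```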
Uniform types and methods
make generic operators easy!
min([8, 3, 5])
max([3.125, 3.16, 3.139])
min(['Warszawa', 'Kraków', 'Wrocław'])
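The same generic functions also work on a user-defined type, as long as it supplies comparison methods (the Python-level face of tp_richcompare). The Version class below is hypothetical and exists only for this sketch.

```python
import functools

@functools.total_ordering
class Version:
    """Hypothetical user type that supplies its own comparisons."""

    def __init__(self, major, minor):
        self.major = major
        self.minor = minor

    def _key(self):
        return (self.major, self.minor)

    def __eq__(self, other):
        return self._key() == other._key()

    def __lt__(self, other):
        return self._key() < other._key()

    def __repr__(self):
        return 'Version(%d, %d)' % self._key()

versions = [Version(3, 7), Version(2, 7), Version(3, 10)]

# min() and max() need no knowledge of Version; they just call its methods.
oldest = min(versions)
newest = max(versions)

print(oldest, newest)
```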
Great idea from C++:

newfunc tp_new;

• Arenas
• Flyweights
Q: NULL ?
A: Nope.

Python’s a = b syntax provides no
mechanism for creating a NULL pointer
Python

• Segmentation faults: impossible
• Buffer overflows: impossible
But wait

If the type struct is fixed,
what about user classes?
class Address(object):
    def __init__(self, host, port):
        self.host = host
        self.port = port
struct _typeobject {
    ...
    getattrfunc tp_getattr;
    setattrfunc tp_setattr;
    ...
    newfunc tp_new;
    ...
For a user-defined class

tp_new() creates a hash table
tp_setattr() calls its set-item
tp_getattr() calls its get-item
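The hash table behind a user-defined instance is visible from Python as its attribute dictionary. A small sketch using the Address class from the earlier slide; the host and port values are made up.

```python
class Address(object):
    def __init__(self, host, port):
        self.host = host
        self.port = port

addr = Address('example.com', 443)

# The instance's attributes live in an ordinary dict.
attributes = dict(vars(addr))

# Writing to that dict is the same as setting the attribute.
addr.__dict__['port'] = 8080
changed_port = addr.port

# Setting a new attribute adds a new key.
addr.scheme = 'https'
has_scheme = 'scheme' in vars(addr)

print(attributes, changed_port, has_scheme)
```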
C++ private and protected
In Python —
private protected
Built-in types: private members
User class attributes: entirely public

Language integrity is the only
excuse for hiding
Q: Memory Management?
A: shared_ptr
struct {
   struct _typeobject *ob_type;
   /* followed by object's data */
}
struct {
   Py_ssize_t ob_refcnt;
   struct _typeobject *ob_type;
   /* followed by object's data */
}
Reference counting vs GC

Instant — as soon as count drops
to zero, memory is freed
Reference counting vs GC

Locality — freed memory is often
re-used immediately, maybe
while still in-cache!
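A sketch of the "instant" property, assuming CPython's reference counting; other implementations such as PyPy may run the finalizer later. The Resource class and the events list are hypothetical.

```python
events = []

class Resource:
    """Hypothetical object that records when it is finalized."""
    def __del__(self):
        events.append('freed')

r = Resource()
alias = r                 # two references to one object

del r                     # count drops to one: nothing happens yet
after_first_del = list(events)

del alias                 # count drops to zero: finalized immediately on CPython
after_second_del = list(events)

print(after_first_del, after_second_del)
```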
Caveat

MicroPython: int, str inside pointers
PyPy: JIT to machine code
Q: What did Python programmers get wrong?
A: The Python 2 type system was not strict enough;
it became even stronger in Python 3
# Python 2
>>> 'byte string ' + u'unicode string'
u'byte string unicode string'

# Python 3
>>> b'byte string ' + u'unicode string'
TypeError: can't concat bytes to str
# Python 2
>>> sorted(['b', 1, 'a', 2])
[1, 2, 'a', 'b']

# Python 3
>>> sorted(['b', 1, 'a', 2])
TypeError: unorderable types: int() < str()
A: Programmers over-used dynamic features
class Address(object):
    def __init__(self, host, port):
        self.host = host
        self.port = port

# But what if the port wasn’t specified?
class Address(object):
    def __init__(self, host, port=None):
        self.host = host
        if port is not None: # So terrible
            self.port = port
# Code was forced to use introspection
# (terrible!)

if hasattr(addr, 'port'):
    print(addr.port)
# Today’s best practice:
# every attribute always present

if addr.port is not None:
    print(addr.port)
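A complete version of the "every attribute always present" style. The function format_address is a hypothetical caller added to show that no hasattr() check is needed.

```python
class Address(object):
    def __init__(self, host, port=None):
        self.host = host
        self.port = port   # always set, even when it is None

def format_address(addr):
    """Hypothetical caller: a plain None test replaces hasattr()."""
    if addr.port is not None:
        return '%s:%d' % (addr.host, addr.port)
    return addr.host

with_port = format_address(Address('example.com', 443))
without_port = format_address(Address('example.com'))

print(with_port, without_port)
```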
With respect to stability
of a class’s attribute lists:

Python classes → C++ classes
Another bad habit:

Python routines that tried to
support many argument types
instead of just one
class Dataframe(object):
    def __init__(self, columns):
        if isinstance(columns, str):
            columns = columns.split(',')
        self.columns = columns

Dataframe(['date', 'x', 'y'])
Dataframe('date,x,y')
Code that supports several types
can be tempting because of convenience
but is usually harder to test,
modify, and debug

Python code → C++ code
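One possible way to keep the convenience without the type check, offered as a sketch and not as something the slides prescribe: the constructor accepts exactly one kind of argument, and a separately named classmethod (from_header, a hypothetical name) handles the string form.

```python
class Dataframe(object):
    def __init__(self, columns):
        # Exactly one accepted type: a sequence of column names.
        self.columns = list(columns)

    @classmethod
    def from_header(cls, header):
        """Hypothetical alternate constructor for a comma-separated string."""
        return cls(header.split(','))

a = Dataframe(['date', 'x', 'y'])
b = Dataframe.from_header('date,x,y')

print(a.columns, b.columns)
```

Each entry point can now be tested on its own, and the call site states which form it is using.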
It’s usually a design bug
if you need eval()
Unless you’re writing the
Python language server for
the IPython Notebook

Then, eval() away!
What in general is the
value of introspection?

getattr()
hasattr()
isinstance()

Python’s support for introspection
is a huge win for code about code:

loggers, linters, debuggers, tests
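A small example of "code about code": a debugging helper that summarizes any object without knowing its class in advance. The describe function is hypothetical, and the Address class is reused from the earlier slides.

```python
def describe(obj):
    """Return a one-line summary of an object's type and attributes."""
    if isinstance(obj, (int, float, str)):
        return repr(obj)
    # Only instances with an attribute dictionary have fields to list.
    names = sorted(vars(obj)) if hasattr(obj, '__dict__') else []
    fields = ['%s=%r' % (name, getattr(obj, name)) for name in names]
    return '%s(%s)' % (type(obj).__name__, ', '.join(fields))

class Address(object):
    def __init__(self, host, port=None):
        self.host = host
        self.port = port

summary = describe(Address('example.com', 443))
plain = describe(3)

print(summary, plain)
```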
We are living through a great age of
consolidation of programming practice
shared_ptr prevalence in TensorFlow
Limiting case

t → ∞
C++ → Python
declarations
new delete template
private protected
NULL static
function overloading

@brandon_rhodes
Dispatch only on the object, not on the other arguments
Everything is “virtual” in inheritance