Hoisting Your I/O

@brandon_rhodes
PyWaw Summit 2015
Warszawa
We run software
for the sake of
its side effects

Note

All else in RAM is lost — if your code produces NO side-effects, it will generate only heat — only Brownian motion!

Yet we routinely
criticize sub-routines
for having side effects

Q: What does this code do?

def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            host = line.strip()
            print(host)


A: It returns None
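A quick way to see this for yourself is sketched below. The sample file contents, and the use of `tempfile` and `contextlib.redirect_stdout` to observe the function from outside, are assumptions made for illustration and are not part of the talk.

```python
import contextlib
import io
import os
import tempfile


def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            host = line.strip()
            print(host)


# Hypothetical sample data, written to a temporary file for the demo.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('www1.example.com\nwww2.example.com\n')

# Capture stdout so we can compare it with the return value.
captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = parse_hosts_file(tmp.name)
os.remove(tmp.name)

# The return value carries nothing; all the work left through stdout.
print(repr(result))
print(captured.getvalue().split())
```

Everything useful the routine did went out through a side effect, which is why the caller has nothing to work with.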

# Servers

www1.example.com
www2.example.com
mq.example.com

# Admin machines

merry.example.com
pippin.example.com
def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line:
                if not line.startswith('#'):
                    print(line)
def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            print(line)
parse_hosts_file('good_hosts.txt')

New requirement

We now need to POST
each hostname to an API
requests.post(url, data={'allow': host})

Q:

How can our code support
both print() and post()?
def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            print(line)

Note

The hostnames still need to be printed when called from the original location — but now the text needs to be POSTed to an API as well.

Q: How will we change our code?

A: Well — do we need to?
This is Python!
It’s dynamic!

One solution

from unittest.mock import patch

hosts = []
with patch('builtins.print', hosts.append):
    parse_hosts_file('good_hosts.txt')

for host in hosts:
    requests.post(url, data={'allow': host})

Another solution

from io import StringIO

sio = StringIO()
with patch('sys.stdout', sio):
    parse_hosts_file('hosts.txt')

for host in sio.getvalue().split():
    requests.post(url, data={'allow': host})

Q:

Would you use
mock.patch() to support
code re-use in production?

A:

No!
You would make the
code more general

Q:

If you consider patch() an
anti-pattern in production code—
why are you doing
it in your tests?

Q:

If you consider patch() an
anti-pattern in production code—
can you hold your unit tests
to the same high standard?
How do you really solve
the problem of needing
not only print() but post()?
n=1 → n=2
or
n=1 → any n
total_shares += s
if symbol == 'IBM':
    ibm_shares += s
elif symbol == 'AAPL':
    apple_shares += s

if symbol == 'IBM':
    ibm_shares += s
elif symbol == 'AAPL':
    apple_shares += s
elif symbol == 'GOOG':
    google_shares += s
elif symbol == 'FB':
    facebook_shares += s
n=1 → n=2 → n=3 → n=4
Only winning move is not to play

n=1 → any n

any n

shares = {}
for ...:
    shares[symbol] = shares.get(symbol, 0) + s
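As a runnable sketch of the "any n" version, the loop can be fed a list of trades. The `trades` list and its values are hypothetical sample data; the slide leaves the loop source elided.

```python
# Hypothetical sample trades as (symbol, share count) pairs.
trades = [('IBM', 10), ('AAPL', 5), ('IBM', 2), ('GOOG', 7)]

shares = {}
for symbol, s in trades:
    # One line handles every symbol, including ones never seen before.
    shares[symbol] = shares.get(symbol, 0) + s

print(shares)
```

Adding a fifth or a fiftieth symbol requires no new branch and no new variable.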
For parse_hosts_file(),
what do n=2 and “any n” look like?

n=2

def parse_hosts_file(path, url=None):
    ...
        if url is None:
            print(host)
        else:
            data = {'allow': host}
            requests.post(url, data=data)

Several “any n” solutions

def parse_hosts_file(path):
    hosts = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            hosts.append(line)
    return hosts

Note

But if there are billions of hosts, building the whole list can exhaust RAM, and the caller sees no host at all until the entire file has been read.

def parse_hosts_file(path, use_host):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            use_host(line)
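A sketch of how two different callers might use the callback version. The temporary file and its contents are assumptions for the demo, and `collected.append` stands in for a function that would POST each host.

```python
import os
import tempfile


def parse_hosts_file(path, use_host):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            use_host(line)


# Hypothetical sample file, created only for this demo.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('# Servers\n\nwww1.example.com\nmq.example.com\n')

# Caller 1: print each host, as the original code did.
parse_hosts_file(tmp.name, print)

# Caller 2: collect the hosts; a function that POSTs each one
# could be passed in exactly the same way.
collected = []
parse_hosts_file(tmp.name, collected.append)

os.remove(tmp.name)
```

The routine no longer knows or cares what happens to each host, though it still opens the file itself.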
def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            yield line
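One property of the generator version that addresses the RAM and latency concern is that hosts are produced one at a time, on demand. The sketch below illustrates this with a made-up file of 100,000 hosts; the file contents and the use of `itertools.islice` are assumptions for illustration.

```python
import itertools
import os
import tempfile


def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            yield line


# Hypothetical large input: one comment line and 100,000 host lines.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('# Servers\n')
    for i in range(100_000):
        tmp.write('host%d.example.com\n' % i)

hosts = parse_hosts_file(tmp.name)

# The generator parses only as many lines as it takes to produce
# three hosts; no list of 100,000 names is ever built.
first_three = list(itertools.islice(hosts, 3))
print(first_three)

# Closing the generator exits its `with` block, which closes the file.
hosts.close()
os.remove(tmp.name)
```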

Top-level glue code

for host in parse_hosts_file('hosts.txt'):
    print(host)

# or

for host in parse_hosts_file('hosts.txt'):
    requests.post(url, data={'allow': host})

We have hoisted our I/O (“Input/Output”)
code up to the program’s top level

Coupled I/O

top ────────┐           ┌─────────
            │           │
            │↓ call     │↑ return
            │           │
subroutine  └──print()──┘

Hoisted I/O

top ────────┐       ┌──print()──
            │       │
            │       │
            │       │
subroutine  └───────┘

Hoisted I/O

top ────────┐       ┌──requests.post()──
            │       │
            │       │
            │       │
subroutine  └───────┘

Note

The parse_hosts_file() routine
is still tightly coupled to I/O!
def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            yield line

pattern

If you see a filename
try passing a file instead
def parse_hosts_file(path):
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            if line.startswith('#'):
                continue
            yield line
# Simpler: let caller do the open()

def parse_hosts_file(f):
    for line in f:
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            continue
        yield line
# But, wait - it is no longer
# limited to working with files!

def parse_hosts(lines):
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            continue
        yield line
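A short sketch of what that generality buys: the same routine accepts a plain list of strings or an in-memory `io.StringIO` object, with no file on disk at all. The sample lines are made up for illustration.

```python
import io


def parse_hosts(lines):
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith('#'):
            continue
        yield line


# Any iterable of strings will do: here, a plain list...
from_list = list(parse_hosts([
    '# Servers\n',
    '\n',
    'www1.example.com\n',
    '  mq.example.com  \n',
]))

# ...and here, an in-memory text stream.
from_stream = list(parse_hosts(io.StringIO('# Admin\nmerry.example.com\n')))

print(from_list)
print(from_stream)
```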

Coupled I/O

top ────────┐                 ┌─────────
            │                 │
            │                 │
            │                 │
subroutine  └─open()──print()─┘

Hoisted I/O

top ──open()──┐           ┌──print()──
              │           │
              │           │
              │           │
subroutine    └───────────┘

All I/O now lives in top-level glue code

with open('hosts.txt') as f:
    for host in parse_hosts(f):
        print(host)

3 patterns

  1. Data structures.
  2. Generators and iterators.
  3. Database facades.

Data Structures

Check out the architecture
of django migrate

Note

Andrew Godwin

2013 blog post

“Last week I was at DjangoCon EU,
in Warsaw, Poland, and I had
a fantastic time … good discussions with fellow core
developers and Django and South users,
to clear up some more thoughts”

http://www.aeracode.org/2013/5/30/what-operation/
http://www.aeracode.org/2013/10/23/flat-pancake/

Andrew wound up re-imagining
database migrations, which often drive
side-effects directly, as transformations
of in-memory data structures
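The general idea can be sketched in a few lines. This is not Django's code or API; `plan_changes`, the step names, and the column sets are all hypothetical, chosen only to show a routine that returns a description of the work instead of performing it.

```python
def plan_changes(current, desired):
    """Return a list of (action, column) steps; performs no I/O."""
    steps = []
    for column in sorted(desired - current):
        steps.append(('add_column', column))
    for column in sorted(current - desired):
        steps.append(('drop_column', column))
    return steps


# Hypothetical before/after column sets for a single table.
current = {'id', 'name', 'fax'}
desired = {'id', 'name', 'email'}

plan = plan_changes(current, desired)

# Top-level glue: a real program would execute SQL for each step here.
for step in plan:
    print(step)
```

Because the plan is ordinary data, a test can inspect it directly, and the top level decides whether to print it, log it, or apply it.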

Generators and iterators

As we have already glimpsed,
generators let you compose data
processing steps like you would
on the Unix command line
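A minimal sketch of that composition, splitting the host parsing into three small stages. The stage names are made up for this example; each one plays the role of a single command in a shell pipeline.

```python
def strip_lines(lines):
    """Stage 1: remove surrounding whitespace from every line."""
    for line in lines:
        yield line.strip()


def drop_blank(lines):
    """Stage 2: discard empty lines."""
    for line in lines:
        if line:
            yield line


def drop_comments(lines):
    """Stage 3: discard lines that start with '#'."""
    for line in lines:
        if not line.startswith('#'):
            yield line


raw = ['# Servers\n', '\n', 'www1.example.com\n', 'mq.example.com\n']

# Stages are chained like commands joined by pipes; nothing runs
# until the final consumer starts pulling items through.
pipeline = drop_comments(drop_blank(strip_lines(raw)))
result = list(pipeline)
print(result)
```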

Database facades

For the complex situation
where an initial data fetch
drives further operations
What is the user ID
for this username?     → DB

To which team does
user 94914 belong?     → DB

Which other users
belong to team 38135?  → DB
class UserStorage:
    def get_user_by_username(self, username):
        ...
    def get_team_by_user_id(self, user_id):
        ...
    def get_members_by_team_id(self, team_id):
        ...

Database Facade

One implementation of the facade
will let a test give scripted responses,
while another will talk to a real DB
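A sketch of the test-side implementation, under stated assumptions: `FakeUserStorage`, `find_teammates`, the username, and the extra member IDs are all hypothetical. Only the three method names and the IDs 94914 and 38135 come from the slides.

```python
class FakeUserStorage:
    """Test double: answers from dictionaries instead of a database."""

    def __init__(self, users, teams):
        self.users = users    # username -> user ID
        self.teams = teams    # team ID -> list of member user IDs

    def get_user_by_username(self, username):
        return self.users[username]

    def get_team_by_user_id(self, user_id):
        for team_id, members in self.teams.items():
            if user_id in members:
                return team_id
        raise KeyError(user_id)

    def get_members_by_team_id(self, team_id):
        return self.teams[team_id]


def find_teammates(storage, username):
    """Logic under test: it only ever talks to the facade."""
    user_id = storage.get_user_by_username(username)
    team_id = storage.get_team_by_user_id(user_id)
    members = storage.get_members_by_team_id(team_id)
    return [m for m in members if m != user_id]


storage = FakeUserStorage(
    users={'merry': 94914},
    teams={38135: [94914, 20001, 20002]},
)
teammates = find_teammates(storage, 'merry')
print(teammates)
```

A production implementation would expose the same three methods but run real queries, and `find_teammates` would not need to change.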

3 patterns

  1. Data structures.
  2. Generators and iterators.
  3. Database facades.

Problems with Design Patterns

  1. You can miss a good
    chance to use them
  2. You can use them when
    they have no benefit

Q:

What will signal to you
that a routine is tightly
coupled to its I/O?

A:

Your unit tests

Unit Tests

maintain correctness
and inspire re-usability

n=1 → n=2
n=1 → any n
When you see patch() in
a unit test, it is signaling you
“the code under test is tightly coupled”
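The contrast can be sketched side by side. `print_hosts_file` is a hypothetical name for the coupled routine, and swapping `open` for a lambda that returns an `io.StringIO` is just one way a test might fake the file; the sample data is made up.

```python
import io
from unittest.mock import patch

DATA = '# Admin machines\n\nmerry.example.com\n'


def print_hosts_file(path):
    """Coupled version: opens the file and prints by itself."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                print(line)


def parse_hosts(lines):
    """Hoisted version: no I/O of its own."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith('#'):
            yield line


# Testing the coupled routine: two patches just to get data in and out.
printed = []
with patch('builtins.open', lambda path: io.StringIO(DATA)), \
        patch('builtins.print', printed.append):
    print_hosts_file('ignored.txt')

# Testing the hoisted routine: pass data in, look at what comes back.
parsed = list(parse_hosts(DATA.splitlines()))
```

Both checks arrive at the same answer, but only the first has to reach into the built-ins to get there.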

Q:

Are your unit tests
first-class code?
Or are they second-class code
that plays by different rules?

Python is a dynamic language

Dynamic

Python lets you test and re-use
code that does not deserve it
This has been so important these
last 25 years, because we are still
learning how to decouple code

Dynamic

Lets you treat a tightly
coupled architecture as though
it were parameterized

Dynamic

In Python, everything is a parameter!
modules
built-ins
methods

Dynamic

Good: Enjoy flexibility and
code re-use without having to
have a perfect architecture
Bad: Python itself will place
little pressure upon your
code to be decoupled
Integration tests exercise your
code with the couplings in place,
so they put little pressure
on your code to flip n=1 → any n

Unit tests

Are your chance to write every
routine so that it faces n=2 uses
from the very moment of its birth

Big Q:

Are you taking advantage of unit tests
to weigh upon your code architecture
as the second caller that every routine
has from the moment it is first written?


Thank you!
@brandon_rhodes