The Contingent Build System

PyCon Colombia (remote)
Juneteenth, 2021
Brandon Rhodes
So!
I often use Sphinx
to write documentation

pip install sphinx
Sphinx is a framework
for writing documentation
using plain text + markup.

pip install sphinx
diagram01.png
Running Sphinx generates HTML

diagram02.png
When you first run Sphinx,
all the pages are built
preparing documents... done
writing output... [ 50%] home
writing output... [100%] tutorial
But if you run Sphinx a second time,
it detects that no input files have
been edited and rebuilds nothing
no targets are out of date.
build succeeded.
If you then edit one file and run Sphinx,
then Sphinx rebuilds only one target
preparing documents... done
writing output... [100%] tutorial
Sphinx saves time
by skipping files whose
HTML is already up-to-date
Sphinx, thus, is one example
of what’s called a build tool
make
April 1976
Stuart Feldman

Bell Laboratories
UNIX project
make
Still in use today!
The make tool
is general-purpose

It doesn’t need built-in knowledge
of your programming language
or document system

Instead, you describe a build
by writing a Makefile
(Aside: why the capital M,
when UNIX folks love
lowercase?

Makefile
A: When the computer sorts filenames
by their chr() code, capitals go first!
!"#$%&'()*+,-./0123456789:;<=>?
@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_
`abcdefghijklmnopqrstuvwxyz{|}~
$ ls

LICENSE
Makefile
README
home.html
home.rst
tutorial.html
tutorial.rst
If ls on your laptop
doesn’t sort capital letters first,
set LC_COLLATE=C in your .bashrc
to experience UNIX as it was intended.)
Anyway, the Makefile lists
the tasks for make to perform:
  1. Output files (“target”)
  2. Input files (“dependency”)
  3. The command to build each target
Example:

Let’s write a Makefile
to build the two Sphinx pages
we saw earlier.
diagram03.png
If we didn’t have Sphinx
to do the building for us,
we might write this:
# Makefile

home.html: home.rst
        rst2html.py $< $@

tutorial.html: tutorial.rst
        rst2html.py $< $@
Those two tasks are each a single step,
but build systems are often given
work that is several tasks deep
diagram04.png
# Makefile

program: main.o sort.o utils.o
        ld -o program main.o sort.o utils.o

main.o: main.c
        cc -c $<

sort.o: sort.c
        cc -c $<

tree.o: tree.c
        cc -c $<
Just like Sphinx, make does
nothing if no files have changed
Make only builds a target if:
  1. The output file is missing
  2. The input file has been edited
Like Sphinx, when make detects
that an input file has been edited,
it rebuilds only the targets
affected by that edit.
diagram05.png
Once you learn about make,
you will start to see little
build systems everywhere
As we’ve already seen,
the Sphinx tool has a little
build system inside
Another example: Python’s distutils
in the Standard Library has a little
build system inside of it!
# setup.py
module1 = Extension('demo', sources=[
    'main.c',
    'sort.c',
    'util.c',
])
A final example:
the Python interpreter itself
contains a tiny build system!
foo.pyfoo.pyc
(Aside: have you ever removed foo.py
but found you can still import the module foo
because you forgot to remove foo.pyc? —
— Back in 2004, I set

PYTHONDONTWRITEBYTECODE=TRUE

in my .bashrc and over
the subsequent 17 years, I
have never regretted it!)
So: build systems
are everywhere
But build systems
for documents always seem
to have the same problem:


But build systems
for documents always seem
to have the same problem:

Cross references
diagram06.png
What if I edit the
title of tutorial.rst?
TutorialMath Tutorial
--------   -------------
diagram07.png
diagram08.png
preparing documents... done
writing output... [100%] tutorial
diagram09.png
There’s just one problem.

Check out the cross reference
in home.html — it’s still
using the old title!
diagram10.png
Sphinx didn’t realize
that home.html depends
on text that lives over
in tutorial.html.
This happens all the time,
so we programmers have learned
ways around the problem.
touch home.rst
to force a rebuild
rm -r build/ && sphinx-build
But even though we can work around
the problem, it raises a big question.
Big Question:

Could this be fixed?

Could we design a build system
that understands cross references
and always knows what to rebuild?
I won’t leave you in suspense
Answer:

Yes!

Yes, we can.
I helped write a successful prototype
solution as an open source project.
For their 500 Lines or Less book,
the Architecture of Open Source Applications (AOSA) project
asked me and computer scientist Dr. Daniel Rocco
contribute a small project and a chapter
The task they gave us:
  1. Write a project in ≤500 lines of code
  2. Discuss its software design in a book chapter
We decided: it was time to tackle
the cross reference problem!

Chapter 4
“Contingent: A Fully Dynamic Build System”
So how do you
design a build system?

Happily, the solution is very well-known
Build System Design
  1. Dependency graph
  2. Detect changes
  3. Rebuild in the right order
1. Dependency graph
“graph”

diagram11.png
There are several ways
to represent a graph in
a computer program
diagram12.png
[('main.c', 'main.o'),
 ('tree.c', 'tree.o'),
 ('main.o', 'program'),
 ('tree.o', 'program')]
diagram13.png
{'main.c': {},
 'tree.c': {},
 'main.o': {'main.c'},
 'tree.o': {'tree.c'},
 'program': {'main.o', 'tree.o'}}
Graphs are so popular,
you can even choose to skip
building your own representation —
So: there are plenty of ways
for a build system to represent the
links between dependencies and targets
2. Detect Changes

Three well-known approaches
a. Compare file modify times
(make, distutils, Sphinx)
from os import stat
r = stat('home.rst').st_mtime
h = stat('home.html').st_mtime
if r > h:
    print('Out of date!')
b. Compute file checksums
(Bazel, Gradle)
c. Subscribe to live OS file
events using inotify






c. Subscribe to live OS file
events using inotify
(Contingent!)

I use Contingent to build my website
and don’t want to waste time running
a command — I like it to rebuild
the moment I hit save!
So, there are several ways
to detect file changes

Once you detect a change,
it’s time to rebuild
3. Rebuild in the correct order
diagram14.png
What if both *.c files
have been modified?

Danger
If we rebuild in random order,
we might build one of the
targets twice!
Why?

If you don’t give your
build system any guidance,
here’s what it might do:
diagram15.png

Both main.c and tree.c have been edited.
diagram16.png

“Let’s rebuild main.o first”
diagram17.png

“Next, let’s rebuild the program”
diagram18.png


diagram19.png

“Is there anything left to build?”
diagram20.png

Oh, yes, now let’s rebuild tree.o
diagram21.png

Oh, no, that made “program” out-of-date again!
diagram22.png

Only a second build of “program” makes it fully up-to-date
Building files in random order can
make us waste time building a target
several times during the same run.
General lesson:

To avoid building a target twice,
wait until all its dependencies
have been rebuilt
Well-known solution:

Run a topological sort on the
targets that need to be rebuilt,
to put targets after dependencies
So that’s basic build system design
  1. Dependency graph
  2. Detect changes
  3. Rebuild in the right order
So!
What big lesson do Dr. Rocco and I teach
in our book chapter about Contingent?
Well, I’m known for my
writing on Design Patterns

So, maybe predictably,
we use our chapter that describes
Contingent to illustrate a Design Pattern
We focused on how Contingent
represents the dependency graph

Facade Pattern
+
Anonymous data structures
Old days
class Node: 
class Edge: 
class Graph: 
Modern
class Graph: 
Q: If you only have one class,
how does it store its data?

A: In “anonymous” data structures!
Q: What do I mean by an
“anonymous” data structure?

A: I mean data structures without
special domain-specific names
like Edge or Node
Instead, we use simple data structures
like tuples, lists, dictionaries, and sets
to represent the dependency graph
{'main.c': {},
 'tree.c': {},
 'main.o': {'main.c'},
 'tree.o': {'tree.c'},
 'program': {'main.o', 'tree.o'}}
Then, use the Facade Pattern to hide
those data structures behind a single
big class that gives them meaning
class Graph:
    def add_edge(name1, name2): 
    def remove_edge(name1, name2): 
    def nodes_upstream_of(name): 
    def nodes_downstream_from(name): 
Our Graph methods speak the graph “domain language”
of “nodes” connected by “edges” but stores them
using plain old Python data structures!
Facade Pattern
+
Anonymous data structures

We argued that this Pythonic pattern
produces simpler code than a forest
of domain-specific classes
like Node and Edge
But re-reading Contingent’s code today,
I think there’s a seconds lesson that
Dr. Rocco and I could have drawn!
Yes, we usually think of a Design Pattern
as something we add to our code.

But I now think that it can be just as important
to see a Design Pattern and remove it!
Let’s now make explicit
the innovation we wanted
to make in Contingent!
diagram23.png
Build systems usually build their graphs
driven only by filenames.

We wanted Contingent to understand
dependencies that arise from file content!
I now want to show an example
with real parsing and rendering code



I now want to show an example
with real parsing and rendering code

Problem: parsing *.rst
files is pretty complex!
So let’s define a simpler markup language!
  1. Render *.in*.out
  2. Most text will be copied verbatim.
  3. But `filename` will be replaced with that file’s title.
  4. What’s a file’s “title”? Its first line of text!
diagram24.png
Happily, it’s easy to parse
this markup format!
text = 'Read `tutorial.in` and `guide.in`.'
pieces = text.split('`')

#     0          1            2         3        4
# ['Read ', 'tutorial.in', ' and ', 'guide.in', '.']
#
# Even index: text
# Odd index: document filename

for i in range(1, len(pieces), 2):
      # Loops over filenames!
So —
How can we explain to Contingent that
home.in needs the title from tutorial.in?
Every build system I had ever seen:
  1. Builds the dependency graph
  2. Starts running commands
That was always the order.
So I tried imagining the same thing:
# First idea: give framework two functions.
# One finds dependencies to build the graph,
# the other builds the output file.

def learn_dependencies(path, text):
    yield 'text_of', path
    t = text.split('`')
    for i in range(1, len(t), 2):
        yield 'title_of', t[i]

def render(text, title_dict):
    t = text.split('`')
    for i in range(1, len(t), 2):
        t[i] = title_dict[t[i]]
    return ''.join(a)
Two-phase approach

First step tells Contingent
the dependency → target edges
needed to build the graph.

Second step renders the output.
I then stepped back
and looked at the
sample code.
My reaction?
Ugh!
It made me unhappy.
Why?

As I stared at the code, I realized
that it reminded me of something.
# Have you ever seen “callback” code?

def step1(sock):
    sock.connect(('python.org', 80), step2)

def step2(sock):
    sock.sendall('data', step3)

def step3(sock):
    sock.close()
# Callbacks are SO MUCH WORK!
# All you wanted was normal
# procedural code, like:

sock.connect(('python.org', 80))
sock.sendall('data')
sock.close()
# But instead, some framework forced you to
# atomize your code into a maze of goto’s.

def step1(sock):
    sock.connect(('python.org', 80), step2)

def step2(sock):
    sock.sendall('data', step3)

def step3(sock):
    sock.close()
Well. Drat.
That’s exactly how I had just
designed Contingent: it needs each
task written as callback code!
def learn_dependencies(text):
    yield 'text_of', path
    t = text.split('`')
    for i in range(1, len(t), 2):
        yield 'title_of', t[i]

def render(text, title_dict):
    t = text.split('`')
    for i in range(1, len(t), 2):
        t[i] = title_dict[t[i]]
    return ''.join(a)
Just how bad is callback code?

It’s so bad that Python 3 added
two new keywords (“async” and “await”)
simply to help avoid the Callback Pattern.
Usually we think of a Design Pattern
as something we bring into our code.
(Facade Pattern → Contingent)

But in this case, I discovered a
pattern already in my design
that I wanted to remove.
Q: I know how to pivot
from callback network code
to procedural network code.
def step1(sock):
    sock.connect(('python.org', 80), step2)

def step2(sock):
    sock.sendall('data', step3)

def step3(sock):
    sock.close()
sock.connect(('python.org', 80))
sock.sendall('data')
sock.close()
Can I now do the same thing
with text rendering code?
Yes!
def render(path):
    pieces = read(path).split('`')
    for i in range(1, len(pieces), 2):
        pieces[i] = title_of(pieces[i])
    return ''.join(pieces)

def title_of(path):
    return read(path).splitlines()[0]

def read(path):
    return open(path).read()
I love how code looks! But —
Q: How can Contingent run this code
while learning its dependencies?
A:
All problems in computer
science can be solved
by another level
of indirection”

— David Wheeler
@task
def read(path):
    ...

@task
def title_of(path):
    ...

@task
def render(path):
    ...
By giving Contingent a @task decorator,
it can assume full control of each build step, even
though they are written as simple procedural code!
1. Contingent can now observe calls between tasks.
diagram25.png
diagram26.png
diagram27.png
diagram28.png
diagram29.png
diagram30.png
diagram31.png
diagram32.png
Thanks to Contingent’s decorator,
(a) we now know the call graph, and
(b) we rendered the first page successfully!
Let’s render the other page.
diagram33.png
diagram34.png
Wait! Wait!
Isn’t read('tutorial.in') a
task that we have already run?
diagram35.png
This is a problem specific to Contingent.

Because make deals with files,
a target only needs to be built once.
The file can then be read over and over.
But Contingent has made a crucial pivot:
its nodes aren’t files; they are calls!

nounsverbs
diagram36.png
Q: Can Contingent, with its graph
of verbs, still avoid repeating an action?

A: Yes! It can use memoization.
diagram37.png
By caching the return
value of the first call to

read('tutorial.in')

Contingent makes sure that
the task only runs once.
Victory!

This is a complete
and workable design.
Endgame

What does Contingent
do when a file changes?

Removes out-of-date return
values from the memoization cache
and re-runs those tasks.
Running Contingent
to rebuild my blog
feels magic!
Documents are rebuilt
with a minimum of work, and —

for the first time in my
long history of using document
build systems

— the output is always correct.
rm -r build/
rm -r build/
Contingent builds and maintains
a complete call graph, while all the
programmer needs to do is write
plain procedural functions!
<Conclusion>

Don’t learn patterns in isolation

That only prepares you
to drop in a pattern when
you see a matching problem.
Instead, learn patterns
as alternatives to each other

Callbacks ↔ Procedures
listdict
How often do you have the chance
to try writing a second solution
to a programming problem?

It’s a crucial exercise
as you grow as a programmer!
When you don’t have time
to write two complete solutions,
learning to prototype can be crucial
to testing out different paths before
committing to one or the other.
You will be a much stronger programmer
when you know not only the code
you are currently writing,

but how the other possible
approaches would have turned out
that you could have been writing instead.

</Conclusion>
I know that, since PyCon Colombia is remote,
you won’t be able to simply walk up after my talk
and ask about my slides, or Python, or our community.
But, please, if you want to talk,
find me on Slack this weekend,
I would love to learn more about what
you are doing with Python in Colombia!

white.png



But, please, if you want to talk,
find me on Slack this weekend,
I would love to learn more about what
you are doing with Python in Colombia!

flag.png

@brandon_rhodes
Thank you for listening!