Activation Energy


Brandon Rhodes

PyConAr Keynote
Buenos Aires • 23 November 2018
activation energy

Chemistry

the

activation energy

of a chemical reaction is the
energy needed for two molecules
to react when they collide
activation energy

many molecules would like
to react but cannot if they
collide at low speed
wood + air

room temperature: no reaction
high temperature: fire!
But for some reactions
there is an alternative
to high temperature
a catalyst is a substance (an element,
compound, or enzyme) that lowers activation energy

once you add the catalyst,
the reaction proceeds!
What are the catalysts
that lower the amount of energy
you need to solve a problem
with a computer program?
Catalysts

Tools
Ecosystem
Community
Tools
Always learn the details
of how your tools work
Triple-click
command + A = whole field
single-click + drag = characters
double-click + drag = whole words
triple-click + drag = whole paragraph or line
Another example: “Emacs macro”

1. Remembers keystrokes
2. Lets you run them again
Emacs macro

C-x ( ...keystrokes... C-x )

C-x e
emacs -nw -q ~/talks/2018-11-pyconar/example.py
Emacs macros let me automate
what I would otherwise do by hand
Another example:

The UNIX command line is
famous for making it easy
to select, sort, and tally text data
Example

Q: Which services restart most often?

A: Use the famous double-sort
$ cd /var/log
$ ltr
$ less syslog.1 |
  grep 'systemd.*Started' |
  sed 's/.*Started //' |
  sort |
  uniq -c |
  sort -n
UNIX commands and pipelines lower
the activation energy needed
to automate text processing
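For comparison, the same double-sort tally could be sketched in Python. This is only an illustration: the function name and the sample log lines are invented here, and ties may come out in a different order than `sort -n` would print them.

```python
import re
from collections import Counter

def tally_started_services(lines):
    """Count how often each name follows 'Started ' on a systemd log line."""
    counts = Counter()
    for line in lines:
        # Mirrors: grep 'systemd.*Started' | sed 's/.*Started //'
        if re.search(r'systemd.*Started', line):
            name = re.sub(r'.*Started ', '', line.rstrip('\n'), count=1)
            counts[name] += 1
    # Mirrors: sort | uniq -c | sort -n   (least frequent first)
    return sorted(counts.items(), key=lambda item: (item[1], item[0]))

# Hypothetical sample lines, invented for illustration only.
sample = [
    'Nov 23 09:00:01 laptop systemd[1]: Started Network Manager.',
    'Nov 23 09:00:02 laptop systemd[1]: Started CUPS Scheduler.',
    'Nov 23 09:10:00 laptop kernel: unrelated line',
    'Nov 23 09:20:05 laptop systemd[1]: Started Network Manager.',
]

for name, count in tally_started_services(sample):
    print('{:7d} {}'.format(count, name))
```

The pipeline is still shorter to type, which is the point of the slide; the Python version becomes worthwhile once the tally needs more logic than one line can hold.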
Spreadsheets

For business, the most popular app ever
because it also offered an interactive
relationship with data
Tools

Scripting languages were invented
to lower the activation energy
necessary to write a program
$ less /var/log/syslog |
  grep elapsed |
  awk -F'elapsed' '{n += $2} END {print n}'
Python
activation energy

Q: How annoying does a task have to be
before you replace it with a script?
activation energy

A: It depends on whether your tools
make automation easy or hard
How can I lower
the activation energy of saving
a command for future use?
When it takes several minutes
to research a command I need to run,
I paste it into a file:

diary.txt
diary.txt is for rare commands
that I don’t want to forget,
but don’t need to run often

What about commands I need all the time?
PATH=$HOME/bin:$PATH
I keep my scripts under version control
$ cd ~
$ git status
On branch master
Your branch is up-to-date
/home/brandon/.gitignore:

*
git add -f bin/new_script
It can be annoying
for your home directory
to always be under version control

• Prompt would always show branch
• Could commit by accident
I have a command that renames:

.git → .git-stowed
Q: How can my scripts
live in their own namespace
separate from normal commands?

A: Through a unique prefix!
~ ! @ # $ % ^ & * ( ) _ +
` 1 2 3 4 5 6 7 8 9 0 - =

Q W E R T Y U I O P { } |
q w e r t y u i o p [ ] \

  A S D F G H J K L : "
  a s d f g h j k l ; '

   Z X C V B N M < > ?
   z x c v b n m , . /
~ ! @ # $ % ^ & * ( ) _ +
` 1 2 3 4 5 6 7 8 9 0 - =

Q W E R T Y U I O P { } |
                    [ ] \

  A S D F G H J K L : "
                    ; '

   Z X C V B N M < > ?
                 , . /
$ ls /usr/bin | head -12

2to3
2to3-2.7
2to3-3.5
411toppm
FvwmCommand
GET
HEAD
POST
X
X11
~ ! @ # $ % ^ & * ( ) _ +
` 1 2 3 4 5 6 7 8 9 0 - =

                    { } |
                    [ ] \

                    : "
                    ; '

                 < > ?
                 , . /
~ ! @ # $ % ^ & * ( ) _ +
`                     - =

                    { } |
                    [ ] \

                    : "
                    ; '

                 < > ?
                 , . /
Cannot prefix my commands
with a character that is special
to the shell or filesystem
@                 _ +
                  -




                :



               , .
@ _ + : — require Shift
- — name of shell built-in
,perfect!
So all of my custom scripts
tab-complete starting with ,

Guaranteed never to conflict
with system commands
, + TAB shows them all!
Some of my scripts are simple
some are complex

All of them are designed
to lower the activation energy
of tackling a particular problem
,compose-keys does “less”
python -m SimpleHTTPServer
What if you try to run two?
$ python -m SimpleHTTPServer
Traceback (most recent call last):
  ...
socket.error: [Errno 98] Address already in use
,simplehttpserver
# 1.
r = SimpleHTTPRequestHandler
port = 8002
while True:
    try:
        httpd = HTTPServer(('', port), r)
    except socket.error:
        port += 1
    ...
# 2.
webbrowser.open(url)
Instantly lets me browse
a static directory
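A Python 3 sketch of the same two ideas (try ports until one binds, then open the browser) might look like this. It is not the real ,simplehttpserver script: the class and function names are invented, it binds only to 127.0.0.1 rather than all interfaces, and it turns off address reuse so that a port already in use is skipped.

```python
import webbrowser
from http.server import HTTPServer, SimpleHTTPRequestHandler

class ExclusiveHTTPServer(HTTPServer):
    # Assumption: refuse to share a port that another server already holds.
    allow_reuse_address = False

def bind_first_free_port(first_port=8002, attempts=100):
    """Return a server bound to the first free port at or above first_port."""
    for port in range(first_port, first_port + attempts):
        try:
            return ExclusiveHTTPServer(('127.0.0.1', port),
                                       SimpleHTTPRequestHandler)
        except OSError:
            continue  # port is taken, try the next one
    raise RuntimeError('no free port found')

def main():
    httpd = bind_first_free_port()
    url = 'http://127.0.0.1:{}/'.format(httpd.server_address[1])
    webbrowser.open(url)   # the socket is already listening at this point
    httpd.serve_forever()  # serves the current directory until interrupted
```

Calling `main()` from two terminals should give two servers on neighboring ports instead of the "Address already in use" traceback.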
# Sometimes I accidentally use the public URL

$ git clone https://github.com/my-acct/repo.git
.
.
.
$ git push
Username for 'https://github.com':
Password for 'https://github.com':
remote: Anonymous access denied.

# Because only SSH lets me “push”
,github-switch-url
.git/config

https://github.com/my-acct/repo.git
git@github.com:brandon-rhodes/homedir.git
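The slides do not show the script itself. A minimal sketch of the URL conversion it needs, assuming a plain GitHub HTTPS clone URL, could be the following; the function name `https_to_ssh` is hypothetical.

```python
import re

def https_to_ssh(url):
    """Turn https://github.com/OWNER/REPO(.git) into git@github.com:OWNER/REPO.git"""
    match = re.fullmatch(
        r'https://github\.com/([^/]+)/([^/]+?)(?:\.git)?/?', url)
    if match is None:
        raise ValueError('not a GitHub HTTPS URL: {!r}'.format(url))
    owner, repo = match.groups()
    return 'git@github.com:{}/{}.git'.format(owner, repo)

print(https_to_ssh('https://github.com/my-acct/repo.git'))
```

The result could then be applied with `git remote set-url origin NEW_URL`, which rewrites the entry in .git/config.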
It’s so fast!

git push
(read the error)
,github-switch-url
git push
,watch

Wraps a while loop
around inotify

,watch ./run-tests.sh -- *.py
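The loop behind a command like this could be sketched as follows. It is an illustration, not the real ,watch script: it assumes the `inotifywait` program from inotify-tools is installed, and all function names are invented.

```python
import subprocess
import sys

def split_args(argv):
    """Split ['cmd', 'arg', '--', 'a.py'] into (['cmd', 'arg'], ['a.py'])."""
    if '--' not in argv:
        raise SystemExit('usage: ,watch COMMAND... -- PATH...')
    i = argv.index('--')
    return argv[:i], argv[i + 1:]

def wait_for_change(paths):
    # Assumption: inotifywait blocks until one of the paths is written,
    # then exits, which is what lets a plain loop drive it.
    subprocess.run(['inotifywait', '-q', '-e', 'close_write', *paths],
                   check=True)

def watch(command, paths, wait=wait_for_change, max_runs=None):
    """Run command, wait for a change, and run it again."""
    runs = 0
    while max_runs is None or runs < max_runs:
        subprocess.run(command)
        runs += 1
        if max_runs is None or runs < max_runs:
            wait(paths)

def main(argv):
    # Example: main(['./run-tests.sh', '--', 'a.py', 'b.py'])
    watch(*split_args(argv))
```

The shell expands `*.py` before the script starts, so the script only ever sees a list of file names after the `--`.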
,setup

Finally, the master script!

* Runs apt install for everything I use
* Configures locales, packages, services
* Applies many obscure tweaks I would forget
On a new laptop:

1. Install Ubuntu
2. Install git
3. git clone home directory
4. Run the ,setup script
5. Log out & back in
,setup

My own personal
“devops”
Tools

Good language, editor, scripts

To lower the activation energy
of using our computer to solve problems,
what else do we need besides good Tools?
I will offer an example
I was looking
at old Grand Cañon maps
but they were listed by name

map-names.png
But where is each map on the globe?
Q: Could I create an overview map?
github.com/brandon-rhodes/build-butchart-map-index
HTML → map names

easy
Python!
for line in search_result_html:
    if 'results_tn_img' in line:
        name = line.split('alt="')[1].split(',')[0]
Emmett Wash, Ariz.
Navajo Mountain, Utah - Ariz.
Mt. Dellenbaugh, Ariz.
Separation Canyon, Ariz.
map names → coordinates

easy
Python!
And open government data

United States policy is that taxpayers
have already paid for government geographic maps,
because taxes paid the surveyor and geographer salaries,
so taxpayers should not be charged a 2nd time
www.usgs.gov/faqs/where-can-i-get-index-us-topo-maps

“… for standard topographic maps in the US Topo series”
import csv
r = csv.reader(open('topomaps_all.csv'))
for row in r:
    name, scale = row[3], row[5]
    if (name, scale) not in targets:
        continue
    q = Quad(name, scale, ...)
Camp Verde    35.0  -112.0  34.5  -111.5
Blue Spring   36.25 -111.75 36.0  -111.5
Cameron       36.0  -111.5  35.75 -111.25
                      
coordinates → JavaScript

easy
Python
import json
j = json.dumps(data)

with open('quad_data.js', 'w') as f:
    f.write('var quad_data = {};\n'.format(j))
JavaScript data → map

easy?
Google Maps

with-google-maps.png
Google Maps
requires API key

too much work

too high an activation energy
searched and found “Leafletjs”

Map tiles by Stamen Design, CC BY 3.0 — Map data © OpenStreetMap
var map = L.map('map');
map.fitBounds([[lat1, lon1], [lat2, lon2]]);
for (var i = 0; i < quads.length; i++) {
    var q = quads[i];
    var bounds = [[q.lat1, q.lon1], [q.lat2, q.lon2]];
    L.rectangle(bounds, {color: '#ff7800'}).addTo(map);
    ...
One afternoon was enough time

with-stamen-design.png
How did I take so many steps
in a single afternoon?
HTML → map names
map names → coordinates
coordinates → JavaScript data
JavaScript data → map

Tools
+
Ecosystem
Good languages are not enough

If all I had was Python and JS,
I would never have built the map index!

The activation energy required would
have been too high
common problems → Open Source → Ecosystem

with-stamen-design.png
Tools
+
Ecosystem
+
…what’s the last ingredient?
story
Backpacking

“I will help my friend learn Python”
github.com/brandon-rhodes/backpacking-planner
# Input 1: trail mileages

    South Kaibab Trailhead
1.5 Cedar Ridge
1.5 Skeleton Point
1.4 Tip Off
2.6 CBG Bright Angel Campground
# Input 2: route

route = [
    'South Kaibab Trailhead',
    'Skeleton Point',
    'Tip Off',
    'Skeleton Point',
    'South Kaibab Trailhead',
]
# Output: miles hiked along route

                         Miles
                         hiked
South Kaibab Trailhead     0.0
Skeleton Point             3.0
Tip Off                    4.4
Skeleton Point             5.8
South Kaibab Trailhead     8.8
As I wrote the script,
I imagined explaining the
code to a new programmer
First: how can we represent distances
when they are not properties of waypoints
but of the paths between waypoints?
(Computer Science:
“How do we attach properties
not to nodes but to edges?”)

trails.svg
tuple, list, dict

What will the new programmer
invent to represent the edges?
One possibility:
edges = [
    ('South Kaibab Trailhead', 'Cedar Ridge', 1.5),
    ('Cedar Ridge', 'Skeleton Point', 1.5),
    ('Skeleton Point', 'Tip Off', 1.4),
]
Simple loop to find an edge:
def find_distance(waypoint1, waypoint2):
    for w1, w2, distance in edges:
        if w1 == waypoint1 and w2 == waypoint2:
            return distance
    raise ValueError('not found')
Problem: what if they are
traveling the other way?
edges = [
    ('South Kaibab Trailhead', 'Cedar Ridge', 1.5),
    ('Cedar Ridge', 'Skeleton Point', 1.5),
    ('Skeleton Point', 'Tip Off', 1.4),
]
def find_distance(waypoint1, waypoint2):
    for w1, w2, distance in edges:
        if w1 == waypoint1 and w2 == waypoint2:
            return distance
        elif w1 == waypoint2 and w2 == waypoint1:
            return distance
    raise ValueError('not found')
If the student asks —

“Why two if statements? Can we do one?”

— you can teach the concept of
a “canonical” representation
# New rule: alphabetical order

edges = [
    ('South Kaibab Trailhead', 'Cedar Ridge', 1.5),
    # ^ FIX

    ('Cedar Ridge', 'Skeleton Point', 1.5),
    ('Skeleton Point', 'Tip Off', 1.4),
]
edges = [
    ('Cedar Ridge', 'South Kaibab Trailhead', 1.5),
    ('Cedar Ridge', 'Skeleton Point', 1.5),
    ('Skeleton Point', 'Tip Off', 1.4),
]
def find_distance(waypoint1, waypoint2):
    waypoint1, waypoint2 = sorted([waypoint1, waypoint2])
    for w1, w2, distance in edges:
        if w1 == waypoint1 and w2 == waypoint2:
            return distance
    raise ValueError('not found')
But maybe the student
will not want to loop
to find every single edge

“Searching a list takes time;
they told me dictionaries are faster!”
edges = {
    'South Kaibab Trailhead': {'Cedar Ridge': 1.5},
    'Cedar Ridge': {
        'South Kaibab Trailhead': 1.5,
        'Skeleton Point': 1.5,
    },
    'Skeleton Point': {
        'Cedar Ridge': 1.5,
        'Tip Off': 1.4,
    },
    'Tip Off': {'Skeleton Point': 1.4},
}
# No loop is required

miles = edges[w1][w2]
But each edge is stored twice
edges = {
    'South Kaibab Trailhead': {'Cedar Ridge': 1.5},
    'Cedar Ridge': {
        'South Kaibab Trailhead': 1.5,
        'Skeleton Point': 1.5,
    },
    'Skeleton Point': {
        'Cedar Ridge': 1.5,
        'Tip Off': 1.4,
    },
    'Tip Off': {'Skeleton Point': 1.4},
}
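One possible middle ground, sketched here as an assumption rather than something from the talk: keep the dictionary lookup but store each edge once, under a canonical key made of the two names in sorted order. The helper name `make_key` is invented.

```python
def make_key(waypoint1, waypoint2):
    """Canonical key: the two waypoint names in alphabetical order."""
    return tuple(sorted([waypoint1, waypoint2]))

# Each edge appears exactly once.
edges = {
    make_key('South Kaibab Trailhead', 'Cedar Ridge'): 1.5,
    make_key('Cedar Ridge', 'Skeleton Point'): 1.5,
    make_key('Skeleton Point', 'Tip Off'): 1.4,
}

def find_distance(waypoint1, waypoint2):
    # No loop, and the direction of travel does not matter.
    return edges[make_key(waypoint1, waypoint2)]

print(find_distance('Cedar Ridge', 'South Kaibab Trailhead'))
```

The trade-off is that this flat form can no longer answer "which waypoints connect to this one?" without scanning every key, which the nested dictionaries answer directly.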
This is such
a nice illustration
of data design!
flat or nested
relation or hierarchy
orthogonal or redundant

These are the fundamental
questions of data design!
Once the new programmer
has tried both the flat list
and the nested dictionaries,
you could explain a relational database
Relation or hierarchy?
“A relational database does both!”

CREATE TABLE… -- flat orthogonal list of tuples
CREATE INDEX… -- fast redundant hierarchy
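A small sketch with Python's built-in sqlite3 module shows both halves together. The table, column, and index names are invented for the example, and the rows reuse the alphabetical-order rule from the earlier slides.

```python
import sqlite3

db = sqlite3.connect(':memory:')

# Flat, orthogonal list of tuples: each edge is stored once.
db.execute('CREATE TABLE edge (w1 TEXT, w2 TEXT, miles REAL)')
db.executemany('INSERT INTO edge VALUES (?, ?, ?)', [
    ('Cedar Ridge', 'South Kaibab Trailhead', 1.5),
    ('Cedar Ridge', 'Skeleton Point', 1.5),
    ('Skeleton Point', 'Tip Off', 1.4),
])

# Fast, redundant hierarchy: the database maintains it for us.
db.execute('CREATE INDEX edge_by_waypoints ON edge (w1, w2)')

def find_distance(waypoint1, waypoint2):
    w1, w2 = sorted([waypoint1, waypoint2])
    row = db.execute(
        'SELECT miles FROM edge WHERE w1 = ? AND w2 = ?', (w1, w2)
    ).fetchone()
    if row is None:
        raise ValueError('not found')
    return row[0]

print(find_distance('South Kaibab Trailhead', 'Cedar Ridge'))
```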
Another very nice
puzzle for the beginner:

To build the edges,
they will need to convert
items into pairs

a, b, c, d → (a,b) (b,c) (c,d)

How?
# Maybe they use indexing

for i in range(len(route) - 1):
    print(route[i], route[i+1])
But they might also discover
they can carry state from one
iteration to the next
# Carrying state

previous_waypoint = route[0]

for waypoint in route[1:]:
    print(previous_waypoint, waypoint)

    previous_waypoint = waypoint
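A third option the beginner may eventually meet is pairing the list with a copy of itself shifted by one. This sketch uses only the built-in `zip`; the four-letter route is the placeholder from the puzzle above.

```python
route = ['a', 'b', 'c', 'd']

# zip stops at the shorter input, so the last item is never paired
# with something past the end of the list.
pairs = list(zip(route, route[1:]))

for first, second in pairs:
    print(first, second)
```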
Having loaded the data and stored it,
it was time to accomplish the goals!
Print total miles hiked
for and + and print


Print miles since water
if
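Both goals fit in one short loop. The sketch below is not the actual backpacking-planner code: the function name is invented, and the set of waypoints with water is made up for the example and is not real trail information.

```python
edges = {
    ('Cedar Ridge', 'South Kaibab Trailhead'): 1.5,
    ('Cedar Ridge', 'Skeleton Point'): 1.5,
    ('Skeleton Point', 'Tip Off'): 1.4,
}

# Hypothetical: invented so that the "if" has something to test.
has_water = {'South Kaibab Trailhead', 'Tip Off'}

def hike(route):
    """Return rows of (waypoint, total miles, miles since water)."""
    rows = [(route[0], 0.0, 0.0)]
    total = since_water = 0.0
    for previous, waypoint in zip(route, route[1:]):
        miles = edges[tuple(sorted([previous, waypoint]))]
        total += miles                      # for and +
        if waypoint in has_water:           # if
            since_water = 0.0
        else:
            since_water += miles
        rows.append((waypoint, round(total, 1), round(since_water, 1)))
    return rows

route = [
    'South Kaibab Trailhead', 'Cedar Ridge', 'Skeleton Point', 'Tip Off',
    'Skeleton Point', 'Cedar Ridge', 'South Kaibab Trailhead',
]
for waypoint, total, since_water in hike(route):
    print('{:<24}{:>6}{:>8}'.format(waypoint, total, since_water))
```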
The script worked!

I was ready to use it
to plan a backpacking trip
Problem
UX disaster
It was so tedious!
I had to list every waypoint
along every route!
# Instead of:

South Kaibab Trailhead
Bright Angel Campground

# I had to list every single waypoint:

South Kaibab Trailhead
Cedar Ridge
Skeleton Point
Tip Off
Bright Angel Campground
Bonus goal:
Find the path between waypoints!
South Kaibab Trailhead
Bright Angel Campground

          

South Kaibab Trailhead
Cedar Ridge
Skeleton Point
Tip Off
Bright Angel Campground
I sat down to write the code
and realized

this was not going to be easy
to explain to my friend!
Theory from Computer Science
Problem #1

Need Dijkstra’s Algorithm

“Well, that escalated quickly”
It was unlikely that my friend
would invent Dijkstra’s Algorithm
on her own while learning Python
Problem #2

I thought I knew Dijkstra’s Algorithm —

but when I tried, I wrote it wrong
twice

trails.svg
def find_path(start, end):
    visit_next = [(0.0, start)]
    while True:
        visit_next.sort()
        miles, w = visit_next.pop(0)
        for miles2, w2 in waypoints_connected_to(w):
            visit_next.append((miles + miles2, w2))
You are finished when you reach the end
def find_path(start, end):
    visit_next = [(0.0, start)]
    while True:
        visit_next.sort()
        miles, w = visit_next.pop(0)
        for miles2, w2 in waypoints_connected_to(w):
            if w2 == end:
                return
            visit_next.append((miles + miles2, w2))
Problem: I made a mistake
I put the if in the wrong place!
def find_path(start, end):
    visit_next = [(0.0, start)]
    while True:
        visit_next.sort()
        miles, w = visit_next.pop(0)
        for miles2, w2 in waypoints_connected_to(w):
            if w2 == end:
                return # WRONG
            visit_next.append((miles + miles2, w2))
def find_path(start, end):
    visit_next = [(0.0, start)]
    while True:
        visit_next.sort()
        miles, w = visit_next.pop(0)
        if w == end:
            return  # RIGHT
        for miles2, w2 in waypoints_connected_to(w):
            visit_next.append((miles + miles2, w2))
Next, you want
to avoid backtracking
to an earlier waypoint
def find_path(start, end):
    visit_next = [(0.0, start)]
    visited_already = {start}
    while True:
        visit_next.sort()
        miles, w = visit_next.pop(0)
        if w in visited_already: continue
        if w == end: return
        for miles2, w2 in waypoints_connected_to(w):
            visit_next.append((miles + miles2, w2))
            visited_already.add(w2)
Problem: I made a mistake
def find_path(start, end):
    visit_next = [(0.0, start)]
    visited_already = {start}
    while True:
        visit_next.sort()
        miles, w = visit_next.pop(0)
        if w in visited_already: continue
        if w == end: return
        for miles2, w2 in waypoints_connected_to(w):
            visit_next.append((miles + miles2, w2))
            visited_already.add(w2)  # WRONG
def find_path(start, end):
    visit_next = [(0.0, start)]
    visited_already = set()  # start is added when it is popped
    while True:
        visit_next.sort()
        miles, w = visit_next.pop(0)
        if w in visited_already: continue
        visited_already.add(w)  # RIGHT
        if w == end: return
        for miles2, w2 in waypoints_connected_to(w):
            visit_next.append((miles + miles2, w2))
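A self-contained version that also returns the path could look like the sketch below. It makes three changes to the slide code, all of them assumptions of this example: `heapq` replaces the sort-then-pop step, each queue entry carries the path walked so far, and a `neighbors` dictionary built from the trail mileages stands in for `waypoints_connected_to()`.

```python
import heapq

trail_miles = [
    ('South Kaibab Trailhead', 'Cedar Ridge', 1.5),
    ('Cedar Ridge', 'Skeleton Point', 1.5),
    ('Skeleton Point', 'Tip Off', 1.4),
    ('Tip Off', 'Bright Angel Campground', 2.6),
]

# Every trail can be walked in both directions.
neighbors = {}
for a, b, miles in trail_miles:
    neighbors.setdefault(a, []).append((miles, b))
    neighbors.setdefault(b, []).append((miles, a))

def find_path(start, end):
    """Return (total miles, list of waypoints) for the shortest route."""
    visit_next = [(0.0, start, [start])]
    visited_already = set()
    while visit_next:
        miles, w, path = heapq.heappop(visit_next)  # closest waypoint first
        if w in visited_already:
            continue
        visited_already.add(w)   # mark when popped, not when queued
        if w == end:             # test when popped, not when queued
            return miles, path
        for miles2, w2 in neighbors.get(w, []):
            if w2 not in visited_already:
                heapq.heappush(visit_next, (miles + miles2, w2, path + [w2]))
    raise ValueError('no path from {} to {}'.format(start, end))

miles, path = find_path('South Kaibab Trailhead', 'Bright Angel Campground')
print(round(miles, 1), path)
```

Unlike the `while True` loop on the slide, this version stops with an error when the queue runs empty, so two unconnected waypoints do not crash on an empty pop.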
So a “simple feature” for my “simple program”
involved a Computer Science theory algorithm
that I myself could not write correctly

twice
I want to tell my friend
“Python makes problems easy!”
But instead she is going to think
“programming is impossible and we have no hope”
What was missing?
Python
+
Ecosystem
+
?
Problem: I was imagining my friend
programming alone and trying to invent
everything by herself
But I didn’t learn Dijkstra by myself

I learned it in school —
from a community of programmers
Community

School
University
Workplace
Local meetup
National conference
Online community
Stack Overflow
Cut and paste Dijkstra?
Maybe use networkx?
You need a community to learn
the sharp boundaries where problems
transform from easy to hard

— to learn the patterns
of common programming problems
to which we have already invented
common programming solutions
Always be looking for catalysts,
always look for how to save effort
to bring new problems into range
of your time and skills
To lower activation energy
and make a problem worth tackling you need —
Tools
+
Ecosystem
+
Community






@brandon_rhodes