Skyfield and 15 Years of Bad APIs



@brandon_rhodes

PyCon Canada
August 2013

Goal

To reflect upon the practice of
Python API design through my recent
work on the Skyfield astronomy library


Elwood Charles Downey et al
1990 Ephem
1993 XEphem
e·phem·er·is — A table
giving the coordinates of
a celestial body at a number
of specific times
CDT 19:00:00  4/30/1990 | LST    8:19:50 |
UTC  0:00:00  5/01/1990 |                |
JulianDat 2448012.50000 | Dawn      4:10 |
Watch                   | Dusk     22:15 |
Listing             off | NiteLn    5:55 |
Plot                off | NStep        1 |
Menu        Planet Data | StpSz RT CLOCK |
---------------------------------------------
OCX  R.A.    Dec    Az     Alt  H Long H Lat
Su  2:32.3  14:58 278:40  12:38 220:22
Mo  8:09.9  21:11 186:06  65:53 119:55   1:04
Me  2:49.4  17:39 277:48  17:26 214:08   1:43
Ve 23:49.4  -2:25 296:53 -27:39 282:39  -1:30
Ma 22:39.8 -10:09 308:17 -44:14 297:56  -1:43
Ju  6:30.9  23:23 235:13  59:04 106:16   0:08
Sa 19:49.6 -20:53  17:24 -65:14 289:45   0:10
xephem.png

©

XEphem’s author reserves
the right to distribute binaries,
but allows free download of the source
Missing header (.h)
X11/Intrinsic.h: No such file or directory
...
$ apt-file search X11/Intrinsic.h
libxt-dev: /usr/include/X11/Intrinsic.h
Missing library (-l)
/usr/bin/ld: cannot find -lXext
...
$ apt-file search libXext.a
libxext-dev: /usr/lib/i386-linux-gnu/libXext.a
xephem.png
Instead of using a GUI,
I wanted to write scripts
Inside XEphem’s C-language
source code is a computation
engine called libastro
XEphem    PyEphem
libastro  →  libastro
“You have my full permission
to go with what you have.”
— Elwood’s generous reply!

PyEphem = C code

Wrapper around libastro

1998
Beazley’s Simple Wrapper
Interface Generator (SWIG)

SWIG

Exposed awkward details of C structs
body = Obj()
body.any.type = ephem.PLANET
body.pl.code = ephem.SUN
ephem.computeLocation(circum, body)
print ephem.formatHours(body.any.ra, 36000)
print ephem.formatDegrees(body.any.dec, 3600)
2003
Hand-written C that uses
Python 2.2 superpowers
Python 2.2 made attribute
access easy to customize
static PyGetSetDef body_getset[] = {
  {"ra", get_ra, 0, "right ascension"},
  {"dec", get_dec, 0, "declination"},
  {"elong", get_elong, 0, "elongation"},
  {"mag", get_mag, 0, "magnitude"},
  /* ... */
  {NULL}  /* sentinel */
};
And I added a pure-Python wrapper
on top, as hashlib, sqlite3, and ssl do
# ephem/__init__.py

import _libastro

More Pythonic
edition of PyEphem
mars = ephem.Mars()
mars.compute()
print mars.ra, mars.dec
But, both interfaces
were still based on C code
and required compilation

Early 2000s

Subject: PyEphem Win32 build errors
Subject: win32, does PhEphem work there too?
Subject: trying to download but I cant unzip it

Late 2000s

Subject: pyephem on Mac PPC
Subject: pyEphem wont build on Snow Leopard
Subject: PyEphem  Installation error in opensuse
Subject: PyEphem on Ubuntu 10.10
Subject: Pyephem on a 64-bit Win 7 PC?

Mac

Sometimes a problem,
but now on MacPorts!

Windows

python setup.py bdist_wininst
mingw — Open, but quirky
Visual Studio Express — Works!

C extensions

They are difficult to—

I slowly grew open to the idea
of an alternative approach

An email

“I’m interested in ephemeris options
for astrology apps. You probably know
about the Swiss Ephemeris. Do you know
how it compares in accuracy?”

libastro

Predicts planetary
positions using VSOP87
1987

Swiss Ephemeris

“based upon the DE406,”
the JPL Long Ephemeris
1997

But wait!

# ftp://ssd.jpl.nasa.gov/pub/eph/planets/ascii/

Name         Date Modified


de405/       10/7/07 8:00:00 PM
de406/       3/22/11 8:00:00 PM

de421/       2/6/13  7:00:00 PM
de422/       8/3/11  8:00:00 PM
de423/       3/30/10 8:00:00 PM

DE421

More recent
More accurate
planetary navigation accuracies

Wait—what?

DE421 uses the
“International Celestial Reference Frame”

USNO Circular 179

George H. Kaplan
2005
“The … resolutions passed by
the International Astronomical Union
at its General Assemblies in 1997 and 2000
are the most significant set of international
agreements in positional astronomy in
several decades and arguably since
the Paris conference of 1896.”

Time for a rewrite!

Not a simple re-implementation
But writing, in Python, a complete
replacement of the old algorithms
used throughout PyEphem

Goals

Example: Sunrise

PyEphem — custom hand-written
version of Newton’s Method
New approach — an IPython Notebook that
finds sunrise with scipy.optimize

Skyfield

Tools

git
Jedi
py.test
tox

py.test

py.test --pyargs skyfield \
        --doctest-glob='*.rst'

py.test

assert c.cal_date(jd) == timescales.cal_date(jd)

I distribute tests

setup(
    ...
    packages=['skyfield', 'skyfield.tests'],
    ...
    )

I distribute docs

setup(
    ...
    package_data = {
        'skyfield': ['documentation/*.rst']},
    ...
    )

Licensing

GPL or MIT?

GPL

 Free world          Closed world
 ──────────          ────────────

  Awesome!           Less awesome
                         
Your library    ×   Alternative?
                       Rewrite?

MIT/BSD

 Free world        Closed world
 ──────────        ────────────

  Awesome!        Closed awesome
                       
Your library      Your library

(Look closely — Open Source everywhere!)

This is Python’s own model,
and Python is increasingly everywhere

MIT/BSD

The free kind of freedom is what wins

Technique

single code base
that runs on both
Python 2 and Python 3
Release small projects fast
jplephem
sgp4
Carefully monitor my emotions
Emotions tell me whether
I am programming right

Then I might be doing it wrong

Frustration often signals
inadequate project structure
I notice that tiny goals
are easier to reach and
keep me calm
“Test-driven development is a way of
managing fear during programming.”
— Kent Beck,
Test Driven Development By Example

Version Control

“How soon can I commit
so that I can’t lose what
I’ve just typed?”

Young

“How can I split this task into
smaller pieces for the computer?”

Old

“How can I split this task into
smaller pieces for me?”

Big slog: risks everything!

here →→→→→ new feature

here → • → • → • → • → new feature

Incremental: cheap to revert

Always use git stash
to revert and try again
Aim for a Moon Shot
Write up an end-to-end example
then move straight there

The sins of my APIs

Sin: Inscrutable names

“Explicit is better than implicit”

d = '2012/11/9'
m = ephem.Mars(d)
print(m.a_ra, m.a_dec)

# Skyfield, instead:

d = JulianDate('2012/11/9')
p = earth(d).observe(mars).astrometric()
print(p.ra, p.dec)

Sin: storing results on object

mars = ephem.Mars()
mars.compute('2012/11/9')
print(mars.ra, mars.dec)


# Mars
#   .name
#   .date
#   .compute() →↘
#   .ra  ↖       ↓
#   .dec ←←←←←←←↙
#   ⋮
# PyEphem:

positions = []
for date in dates:
    mars.compute(date)
    positions.append((mars.ra, mars.dec))

# Skyfield:

coords = [mars(d).astrometric() for d in dates]
positions = [(c.ra, c.dec) for c in coords]

Look familiar?

letters = list(set(message_string))
letters.sort()
print ''.join(letters)

# vs

print ''.join(sorted(set(message_string)))
outputs = sorted(inputs)
Making code more functional
is a big part of being Pythonic
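
A minimal sketch of the same contrast, assuming a made-up `_fake_ra_dec()` helper in place of a real ephemeris (none of these names come from PyEphem or Skyfield). The stateful body overwrites its attributes on every `compute()`. The functional version returns a fresh immutable result each time.

```python
from collections import namedtuple

# Hypothetical stand-in for a real position computation.
def _fake_ra_dec(day):
    return (day * 0.5) % 24.0, (day * 0.25) % 90.0

class StatefulBody(object):
    """Stores each result on the object, replacing the previous one."""
    def compute(self, day):
        self.ra, self.dec = _fake_ra_dec(day)

Position = namedtuple('Position', 'ra dec')

def position_of(day):
    """Functional style: build and return a new result, mutate nothing."""
    return Position(*_fake_ra_dec(day))

days = [0, 10, 20]

# Stateful: a loop plus manual copying out of the shared object.
body = StatefulBody()
stateful = []
for day in days:
    body.compute(day)
    stateful.append((body.ra, body.dec))

# Functional: one comprehension, nothing to overwrite by accident.
functional = [position_of(day) for day in days]
```

Both lists hold the same numbers. Only the second can be built without a shared object that changes underneath you.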

Sin: Concealing expense

Your API is the only lifeline
the programmer has for managing
complexity and expense!
# PyEphem

m = ephem.Mars('2012/11/9')

print(m.name)        # zero work
print(m.ra, m.dec)   # computed
print(m.rise_time)   # expensive!

Guideline

It’s okay to hide quick
conveniences behind properties
But expensive operations
should always look like calls!
mars.name        # looks cheap
mars.apparent()  # looks expensive!
Use this difference to
train your customers!
# “I can use this over and over again!”

mars.name

# “Looks expensive; I’ll save the
#  result to a local name instead.”

mars.apparent()
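
A rough sketch of that guideline, assuming a hypothetical `Planet` class whose `apparent()` merely simulates costly work (the real Skyfield method is not reproduced here). The call counter shows the payoff when the caller saves the result to a local name.

```python
class Planet(object):
    """Hypothetical body: cheap data as attributes, costly work as calls."""

    def __init__(self, name):
        self.name = name      # plain attribute: free to read repeatedly
        self.calls = 0        # counts how often the costly work ran

    def apparent(self):
        # Stand-in for an expensive computation. The parentheses at the
        # call site warn the reader that work happens here.
        self.calls += 1
        return sum(i * i for i in range(1000))

mars = Planet('Mars')
position = mars.apparent()    # compute once, keep the result locally
for _ in range(3):
    label = '%s: %d' % (mars.name, position)   # reuses name and position
```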

The big lesson?

Write APIs that TEACH!

Write APIs that teach

You know how your API works
and how it can best be used
Share as much of that
knowledge as possible

Write APIs that teach

While an API should hide complexity
it should also suggest best practices

Write APIs that teach

r = urllib2.urlopen(url)  # bad
r = requests.get(url)     # good

Why?

r = requests.get(url)
The very first line of requests code
teaches users their first HTTP verb
without their even knowing it
PyEphem made every coordinate
an attribute — looking exactly the same
with no visible relationship!
mars.a_ra, mars.a_dec  # astrometric
mars.g_ra, mars.g_dec  # apparent geocentric
mars.ra, mars.dec      # apparent topocentric
mars.alt, mars.az      # apparent horizontal
Skyfield has you build a
result from smaller operations
here = toronto(jd)
here.observe(mars).astrometric()
here.observe(mars).apparent().equatorial()
here.observe(mars).apparent().horizontal()
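
One way such a chain could be structured, as a sketch only. The classes below are hypothetical, the "correction" is a placeholder number, and none of this is Skyfield's actual implementation. Each step returns a new object, so the relationship between the coordinates is visible in the code.

```python
class Astrometric(object):
    def __init__(self, ra, dec):
        self.ra, self.dec = ra, dec

    def apparent(self):
        # Placeholder adjustment; real aberration and deflection omitted.
        return Apparent(self.ra + 0.01, self.dec - 0.01)

class Apparent(object):
    def __init__(self, ra, dec):
        self.ra, self.dec = ra, dec

    def equatorial(self):
        return self.ra, self.dec

class Observer(object):
    def observe(self, target_ra_dec):
        # Hypothetical: the target is passed in as a plain (ra, dec) pair.
        return Astrometric(*target_ra_dec)

here = Observer()
astrometric = here.observe((10.0, 20.0))
ra, dec = astrometric.apparent().equatorial()
```

An astrometric position has no `equatorial()` here, so the user has to ask for `apparent()` first and learns the order of operations along the way.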

Sin: Confusing functions and methods

Python supports both procedural
and object-based styles, but
choosing can be difficult

PyEphem

m = Mars(date)
print(constellation(m.ra, m.dec))

# This function can ONLY EVER be passed
# a right ascension and declination; why
# not make it a coordinate method?
So here are some guidelines
that I have been using lately
  1. Methods constitute Python’s
    built-in typecheck — the one clean
    way to limit the types that a function
    will accept as an argument
If you are tempted to start a function
with if isinstance(…) then you
might want a method instead!
  2. f(x) should by definition touch
    only public features of an x
If f(x) needs internal details,
then either make it a method itself,
or create a method that does the
internal manipulation for it
  3. f(x) should by definition not
    mutate the state of x
(If you need to mutate state from
outside, try the Adapter Pattern!)

PyEphem

m = Mars(date)
print m.ra       # prints one thing

toronto.next_rising(m)
print m.ra       # something different!
  4. Methods are discoverable

(Jedi!)
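
The first guideline above (methods as a built-in typecheck) can be sketched like this. `Coordinate` and `hemisphere` are invented for illustration and are not PyEphem or Skyfield names.

```python
class Coordinate(object):
    """Hypothetical coordinate type used only for this illustration."""

    def __init__(self, ra_hours, dec_degrees):
        self.ra_hours = ra_hours
        self.dec_degrees = dec_degrees

    def hemisphere(self):
        # Method style: only a Coordinate can even reach this code.
        return 'north' if self.dec_degrees >= 0 else 'south'

def hemisphere(thing):
    # Function style: has to police its own argument type.
    if not isinstance(thing, Coordinate):
        raise TypeError('expected a Coordinate')
    return 'north' if thing.dec_degrees >= 0 else 'south'

c = Coordinate(2.5, 14.9)
```

Calling `c.hemisphere()` needs no isinstance check at all. An object of the wrong type simply has no such method.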

Skyfield: New tricks

What about when you do need
to dynamically compute
an attribute?
class Sample(object):

    @property
    def loud(self):
        return self.message.upper()

Problem: hidden expense

So, we cache

class Sample(object):

    _loud = None

    @property
    def loud(self):
        if self._loud is None:
            self._loud = self.message.upper()
        return self._loud
Problem: makes every
access more expensive

Solution: dunder-getattr!

class Sample(object):

    def __getattr__(self, name):
        if name == 'loud':
            self.loud = self.message.upper()
            return self.loud
        raise AttributeError()

Solution: dunder-getattr

For a value frequently
accessed in heavy numeric
work, __getattr__() runs
only on the first lookup!
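
A small runnable check of that claim. The `lookups` counter and the constructor are additions made only for this demonstration.

```python
class Sample(object):
    lookups = 0   # counts how many times __getattr__ actually ran

    def __init__(self, message):
        self.message = message

    def __getattr__(self, name):
        # Python calls this only when normal attribute lookup fails.
        if name == 'loud':
            Sample.lookups += 1
            self.loud = self.message.upper()   # lands in the instance dict
            return self.loud
        raise AttributeError(name)

s = Sample('hello')
first = s.loud    # miss: __getattr__ runs and stores the value
second = s.loud   # hit: found in the instance dict, __getattr__ skipped
```

After the first access the value is an ordinary instance attribute, so later reads cost the same as any other attribute lookup.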

Trick: scalar style

NumPy initially excited me
by letting me write scalar code
that also works on arrays!
from numpy import array, sqrt

def f(x, y):
    return sqrt(x*x + y*y)

print(f(3, 4))
# => 5.0

x = array([3, 8, 60])
y = array([4, 6, 80])
print(f(x, y))
# => [  5.  10. 100.]
# Number!

jd = today()
p = earth(jd).observe(planet)

# Vector!

jd = date_range('1980/1/1', '2010/1/1', 1.0)
p = earth(jd).observe(planet)
But scalar style is hard to
maintain across a large code base!
n = compute_nutation(jd)
p = compute_precession(jd.tdb)
f = J2000_to_ICRS

t = einsum('jin,kjn->ikn', n, p)
t = einsum('ijn,jk->ikn', t, f)

pos = einsum('in,ijn->jn', pos, t)
vel = einsum('in,ijn->jn', vel, t)

One last hint

Vi Hart says τ = 2𝜋

Q:

Should dunder-init import things?

# __init__.py

from .earthlib import Topos
from .planets import Jupiter

to allow

from skyfield import Topos, Jupiter

Pro

Simple way to surface
your primary interface

Cons

Innocent import skyfield
winds up importing everything

Cons

Imports do not match messages
from skyfield import Topos
print(Topos)

# => <class 'skyfield.coordinates.Topos'>

Skyfield: decided against

Opinionated defaults

Should Skyfield include
an ephemeris by default?
setup(
    ...
    install_requires=['de421'],
    )

Avoid contrived tests

How do you test both
branches of the if?
def f(...):
            
    # 20 lines of code
            
    if final_decision():
        action1
    else:
        action2
Do not contrive complex calls
to f() that exercise both branches
Instead factor the function
into smaller pieces that can be
exercised separately
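
A sketch of that factoring, with invented names and a deliberately tiny decision (whether a body is above the horizon). The decision and each action become their own functions, so no test has to steer a long function into a particular branch.

```python
def is_visible(altitude_degrees):
    """The decision, isolated so both outcomes are trivial to test."""
    return altitude_degrees > 0.0

def describe_visible(name, altitude_degrees):
    return '%s is up at %.1f degrees' % (name, altitude_degrees)

def describe_hidden(name):
    return '%s is below the horizon' % name

def report(name, altitude_degrees):
    """What is left of the big function: wiring the pieces together."""
    if is_visible(altitude_degrees):
        return describe_visible(name, altitude_degrees)
    return describe_hidden(name)
```

Each piece can now be tested with a one-line assert, and `report()` needs only one simple test per branch.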

Final trick

Choosing a support forum

Mailing list? Web forum?

What happens 4 years later when
someone has the same question?
Answer can be difficult to find
within a long discussion thread

Answers go out of date

Stack Overflow!

The last mile

Lesson #1

New GitHub issue

“PyEphem works from ssh but it
does not work from the web”

I was unhappy

never-arrived.png

“Can you help me?”

“This question is not a good
fit to our Q&A format”

“Can you help me?”

yes

import sys
open(
    '/tmp/emergency.log', 'w'
  ).write(
    str(sys.path) + '\n'
  )
import sys
sys.path.append(
    '/home/astronomia/.local/lib64'
    '/python2.6/site-packages'
    )

It worked

“FYI the support solution is to keep
that emergency code in every script”
astronomia.png
“Brandon, thanks a lot for your help.
PyEphem is great.”

The last mile

Lesson #2

October 2012 — I put NOVAS on PyPI

pip install novas

An email

“I have been fiddling around
with NOVAS_Py-3.1 and have
had some problems…”

(a list of bugs followed)

The temptation

Not my library — not my problem

Provide the email address of USNO?
Forward it to them myself?
Assistant Director for Exploration

          ...@nasa.gov

USNO   me   NASA

The last mile

In open source, the
final component of the API
is very often you!
You are the API of last resort


@brandon_rhodes
Thank you very much!