@brandon_rhodes

CodeMash 2013

The iPython Notebooks

that go with this talk:

They will continue evolving,

but you can always rewind to today!

```
git checkout 'master@{2013-01-11 12:00:00}'
```

*a modest introduction*

To talk about **astronomy**

introducing a simple

tool stack

- iPython Notebook
- matplotlib *or* mayavi
- SciPy
- NumPy
- Python 2.7

An iPython Notebook consists

of *cells* of code (or titles or text)

that you run with **Shift + Enter**

“The Inspiration”

- Array-agnostic functions are an *art*
- Vectors properly take *singular* names (much like relations in SQL!)
- Use `print` and `assert` as lightweight in-notebook tests
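A minimal sketch of the first and third points, assuming a hypothetical helper `to_cartesian` that is not taken from the talk's notebooks. Its body uses only arithmetic and NumPy functions, so it accepts a single number or a whole array, and the `assert` lines act as tests inside the cell.

```python
import numpy as np

def to_cartesian(r, theta):
    """Convert polar coordinates to x, y; works on scalars and arrays alike."""
    return r * np.cos(theta), r * np.sin(theta)

# Scalar in, scalar out.
x, y = to_cartesian(2.0, 0.0)
assert x == 2.0 and y == 0.0

# Array in, array out, with no change to the function.
r = np.array([1.0, 2.0, 3.0])        # singular name for a vector of radii
theta = np.array([0.0, 0.0, 0.0])    # singular name for a vector of angles
x, y = to_cartesian(r, theta)
assert np.allclose(x, [1.0, 2.0, 3.0])
assert np.allclose(y, 0.0)
```

If an `assert` fails, the cell stops with a traceback right where the mistake is, which is all the test framework a notebook needs.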

“0 ≤ *i* < *N*”

…

“let us let our ordinals start at zero:

an element's ordinal (subscript)

equals the number of elements

preceding it in the sequence.”

*— EWD831*

`a[0]`

`a[3]`

`a[-2]`

`a[-1]`

Or, for those who took Latin—

`a[-1]`

`a[-2]`

`a[-3]`

Again, remember Dijkstra:

0 ≤ *i* < *N*

As well as C-style languages:

`for (i = 0; i < N; i++) ;`

Elements 3, 4, and 5

`a[3:6]`

First three elements 0, 1, and 2

`a[:3]`

All but first element

`a[1:]`

Last three elements

`a[-3:]`

New copy of entire list

`a[:]`
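The indexing and slicing rules above, collected into one runnable sketch. The list `a` is an illustrative stand-in, not data from the talk.

```python
a = [10, 11, 12, 13, 14, 15, 16]    # illustrative seven-element list

assert a[0] == 10                   # first element: zero elements precede it
assert a[-1] == 16                  # last element
assert a[-2] == 15                  # second-to-last element

assert a[3:6] == [13, 14, 15]       # elements 3, 4, and 5
assert a[:3] == [10, 11, 12]        # first three elements
assert a[1:] == [11, 12, 13, 14, 15, 16]   # all but the first
assert a[-3:] == [14, 15, 16]       # last three elements

b = a[:]                            # new copy of the entire list
assert b == a and b is not a

# Half-open bounds: a[i:j] holds j - i elements,
# and adjacent slices tile the list with no overlap.
assert len(a[3:6]) == 6 - 3
assert a[:3] + a[3:] == a
```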

So that is how built-in Python

Generalize Python lists

to *n* dimensions

Use compact binary instead

of storing each item as an object

Index and slice work as expected,

with comma separating dimensions

```
import numpy as np

a = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
# All rows but the last, columns 1 and 2
print a[:-1, 1:3]
```

produces

```
[[2 3]
 [6 7]]
```
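A small sketch of the storage and indexing points above, using an illustrative 3×4 array. The `dtype` is set explicitly so the byte counts do not depend on the platform.

```python
import numpy as np

a = np.arange(12, dtype=np.int64).reshape(3, 4)   # 3 rows, 4 columns

assert a.ndim == 2
assert a.shape == (3, 4)

# Compact binary: every item is a raw 8-byte integer, not a Python object.
assert a.itemsize == 8
assert a.nbytes == 12 * 8

# A comma separates the dimensions: row first, then column.
assert a[1, 2] == 6
assert a[:, 0].tolist() == [0, 4, 8]       # first column
assert a[-1].tolist() == [8, 9, 10, 11]    # last row
```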

Support both element-wise

```
a = np.array([1, 2, 3])
print a + 10 # => [11 12 13]
print a + a # => [2 4 6]
print a.sum() # => 6
```

Enough

Let's look at a

larger data set

“Asteroids”

Was that crazy, or what?

```
theta[r > 4.5]
```

My first reaction: *“Crazy!”*

```
theta[r > 4.5]
```

**How** does that work?

My first hypothesis:

**They cheated**

```
theta[r > 4.5]
```

“Obviously, applying `>` to an

array returns a *magic object* that

remembers the comparison, and

**I was wrong**

*and*

NumPy is really beautiful!

The behavior of `>` is

consistent with how

arrays treat other operators!

```
theta = np.array([ 0.0, 1.6, 3.1, 4.7 ])
r = np.array([ 0.0, 1.0, 2.0, 3.0 ])
r + 1.5 # => [ 1.5 2.5 3.5 4.5]
r > 1.5 # => [False False True True]
theta[r > 1.5] # => [ 3.1 4.7]
```

The secret: you can index into

an array with a subscript that

is *another* array of booleans!
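A short sketch showing that the mask really is an ordinary array: it can be stored, combined, counted, or even typed in by hand. The `band` name and its thresholds are illustrative.

```python
import numpy as np

theta = np.array([0.0, 1.6, 3.1, 4.7])
r = np.array([0.0, 1.0, 2.0, 3.0])

mask = r > 1.5                       # an ordinary array of booleans
assert mask.dtype == np.bool_
assert mask.tolist() == [False, False, True, True]

# Masks combine element-wise with & and |; the parentheses are required.
band = (r > 0.5) & (r < 2.5)
assert theta[band].tolist() == [1.6, 3.1]

# A hand-written boolean array works as a subscript too, so no magic.
by_hand = np.array([True, False, False, True])
assert theta[by_hand].tolist() == [0.0, 4.7]

# True counts as 1, so summing a mask counts the matches.
assert mask.sum() == 2
```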

```
theta[r > 4.5]
```

Zen of Python:

```
theta[r > 4.5]
```

The best Python API tricks occur when

orthogonal, separately useful behaviors

can be combined in interesting ways

A quick reflection on

the science behind this talk

This talk does not

describe how *real*

astronomers work!

Model

↓

Point telescope

↓

Images and Data

↓

Process and Hypothesize

↓

Model

hard work to interpret

raw data and images

run models,

draw pictures!

Those were not *real* asteroid

positions that we just plotted

Those were *imperfect* predictions

based on models that keep improving

Planet positions: **VSOP87, DE405**

Asteroid positions: **Kepler**
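As an illustration only, and assuming "Kepler" here means classical two-body orbits, the core numerical step is solving Kepler's equation *M* = *E* − *e* sin *E* for the eccentric anomaly *E*. The sketch below uses Newton's method; the function name and sample values are hypothetical and are not taken from the talk's notebooks.

```python
import numpy as np

def eccentric_anomaly(M, e, iterations=10):
    """Solve Kepler's equation M = E - e*sin(E) for E by Newton's method."""
    M = np.asarray(M, dtype=float)
    E = M.copy()                      # starting guess, adequate for small e
    for _ in range(iterations):
        E = E - (E - e * np.sin(E) - M) / (1.0 - e * np.cos(E))
    return E

M = np.array([0.0, 0.5, 1.0, 2.0, 3.0])   # mean anomalies in radians (illustrative)
e = 0.1                                    # eccentricity (illustrative)
E = eccentric_anomaly(M, e)

# Substituting E back into Kepler's equation should recover M.
assert np.allclose(E - e * np.sin(E), M)
```

Because the body is plain array arithmetic, one call handles a whole vector of asteroids at once.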

SGP4

↓

Satellite positions

↓

3D globe with satellites

“3D Earth Satellites”

We can apply the same

3D tools to the whole

solar system

“3D Solar System”

as you ascend from atom

to planet to star system

It is usually cleanest

to build a **separate** 3D

visualization specific to each scale

⋮

A sky chart combining solar system

bodies with distant stars can simply

“mix down” each object into an *x,y*

coordinate and plot them in 2D
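One way to sketch the "mix down" step, assuming every object has already been reduced to a right ascension and declination in radians. The stereographic projection used here is a common chart projection, chosen for illustration; the talk's notebooks may do this differently, and the function name and sample positions are hypothetical.

```python
import numpy as np

def stereographic(ra, dec, ra0, dec0):
    """Project sky coordinates (radians) onto a flat chart centered on (ra0, dec0)."""
    cos_c = (np.sin(dec0) * np.sin(dec)
             + np.cos(dec0) * np.cos(dec) * np.cos(ra - ra0))
    k = 2.0 / (1.0 + cos_c)
    x = k * np.cos(dec) * np.sin(ra - ra0)
    y = k * (np.cos(dec0) * np.sin(dec)
             - np.sin(dec0) * np.cos(dec) * np.cos(ra - ra0))
    return x, y

# Illustrative positions, not real catalog data.
ra = np.radians([80.0, 85.0, 90.0])
dec = np.radians([0.0, 5.0, -5.0])
x, y = stereographic(ra, dec, ra0=np.radians(85.0), dec0=0.0)

# The middle object shares the chart center's right ascension,
# so it lands directly above the center at height 2*tan(5 degrees / 2).
assert np.allclose([x[1], y[1]], [0.0, 2.0 * np.tan(np.radians(2.5))])
assert x[0] < 0.0 < x[2]
```

Planets and stars pass through the same function, so the resulting `x` and `y` arrays can go straight into a single 2D scatter plot.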

There are *many ways* to

pack a diagram with information

Glyphs can *vary* in—

“Stars”

Our visualizations

so far have emphasized

Diagrams can also

relate *non-spatial* data

sets to each other

“Hertzsprung-Russell”

While **NumPy** provides

efficient vector arrays,

the **SciPy** library provides

numeric processing

- Statistics
- Optimization
- Numerical integration
- BLAS/LAPACK *(linear algebra)*
- Fourier transforms
- Signal processing
- Image processing
- ODE solvers
- Special functions
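A small taste of two items from the list, numerical integration and root finding from the optimization module. The functions being integrated and solved are arbitrary examples.

```python
import numpy as np
from scipy import integrate, optimize

# Numerical integration: the area under sin(x) from 0 to pi is exactly 2.
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)
assert abs(area - 2.0) < 1e-9

# Root finding: cos(x) changes sign between 0 and 2, crossing zero at pi/2.
root = optimize.brentq(np.cos, 0.0, 2.0)
assert abs(root - np.pi / 2) < 1e-9
```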

“Stellar-Color-Index”

When viewing a Notebook

through someone else's server,

you can **download** it to your

own hard drive and run it

Drag `.py` and `.ipynb`

files into the *iPython Dashboard*,

and a local copy of the code

is created for you to edit

“iPython Features”

Jonathan Taylor of Stanford's

extension for cells that contain

```
%%R -i X,Y -o XYcoef
XYlm = lm(Y~X)
XYcoef = coef(XYlm)
print(summary(XYlm))
par(mfrow=c(2,2))
plot(XYlm)
```

Thanks to HTML export,

iPython can be used to write

that can be *re-executed* to verify

their code and diagrams

An online in-the-cloud

iPython Notebook interpreter

Can distribute computations

across either a *dedicated* or

an *in-the-cloud* cluster

Compatible with MIT's

Python Plugin includes

an iPython client

*and*

Takafumi Arakaki has written

an *Emacs client* for iPython Notebook

Front-ends use a simple,

well-documented protocol over **ØMQ**

to communicate with Notebook server

JSON that is designed to support

```
{
 "cell_type": "code",
 "collapsed": false,
 "input": [
  "from mayavi import mlab\n",
 ],
 "language": "python",
 "metadata": {},
 "outputs": []
},
```

The frequent and predictable newlines

make it **version-control** friendly!
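A rough sketch of why the line-per-item layout helps. The `indent=1, sort_keys=True` settings are an assumption chosen to mimic the layout above, and the cell contents are illustrative. Editing one line of code changes only a few lines of the serialized file, so the diff stays small.

```python
import difflib
import json

cell = {
    "cell_type": "code",
    "input": ["from mayavi import mlab\n"],
    "language": "python",
    "outputs": [],
}

def serialize(obj):
    # One key or list item per line, in a stable key order.
    return json.dumps(obj, indent=1, sort_keys=True).splitlines()

before = serialize(cell)
cell["input"].append("mlab.show()\n")      # edit the cell: add one line of code
after = serialize(cell)

# Keep only the added and removed lines of the unified diff.
diff = [line for line in difflib.unified_diff(before, after, lineterm="", n=0)
        if line.startswith(("+", "-"))
        and not line.startswith(("+++", "---"))]
print("\n".join(diff))

# One old line gains a trailing comma and one new line appears.
assert len(diff) == 3
```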

Unless you `Cell » All Output » Clear`

before saving, inline images and media

get saved as base64 strings—

—this means that an iPython Notebook

can survive uploading or emailing

with *all* of its images intact!
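A minimal sketch of the idea using only the standard library. The bytes stand in for a real image, and the dictionary keys are illustrative rather than the exact notebook schema.

```python
import base64
import json

# Stand-in for real image bytes: the PNG signature plus 16 padding bytes.
png_bytes = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16

# base64 turns arbitrary binary into plain ASCII text.
encoded = base64.b64encode(png_bytes).decode("ascii")
assert len(encoded) == 32                  # 24 bytes become 32 characters

output = {"output_type": "display_data", "png": encoded}   # illustrative keys
text = json.dumps(output)                  # plain text, safe to email or upload

# The original bytes come back unchanged on the other side.
restored = base64.b64decode(json.loads(text)["png"])
assert restored == png_bytes
```

The cost is size: base64 text is about a third larger than the binary it encodes.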

“Computing is the **backbone**

of all of science”

*— Fernando Perez*

Because the language runtime

it supports *rapid iteration* as you

work on a particular step in your

processing or visualization

Clean

Elegant

Natural notation

```
3*x + 1
f(x)
```

Large array operations

are performed in C or FORTRAN,

2002–2012 — Create, maintain Matplotlib

18 July 2012 — SciPy 2012 keynote speaker

28 August 2012 — Died from cancer

I am **@brandon_rhodes**

Thank you!