Oh, Come On.
Who Needs Bytearrays?

@brandon_rhodes
PyCon 2015
Montréal
In Python, normal string
objects are immutable
Python 2 / Python 3
str or bytes / bytes
unicode / str
>>> h = b'Content-Type: 30'
>>> h.lower()
b'content-type: 30'
>>> h
b'Content-Type: 30'
>>> h.split(b' ')
[b'Content-Type:', b'30']
>>> h
b'Content-Type: 30'
>>> h
b'Content-Type: 30'
>>> h[7] = b' '
Traceback (most recent call last):
  ...
TypeError: 'bytes' object does not
           support item assignment

Immutability

+ Simple
+ Functional
- Allocation
- Copying

bytearray

The bytearray is
• A mutable string
• Based on Python 3 bytes
• Available in Python 2.6, 2.7, 3

Note

Problem: "based on"!

Problem:

The Python 3 bytes type
is designed to be awkward
for string operations

Q: Why?

A: So you will want to run decode()
before treating it as characters
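A tiny sketch of that decode() step (the UTF-8 choice is an assumption; real code should use whatever encoding the protocol declares):

```python
raw = b'caf\xc3\xa9'            # bytes off the wire
text = raw.decode('utf-8')      # now a real str, safe for character work
assert len(raw) == 5            # five bytes...
assert len(text) == 4           # ...but only four characters
```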

Python 2

>>> s = 'Hello world!'
>>> len(s)
12
>>> s.split()
['Hello', 'world!']

Python 3

>>> s = b'Hello world!'
>>> len(s)
12
>>> s.split()
[b'Hello', b'world!']

Python 2

>>> s = 'Hello world!'
>>> s[3:10]
'lo worl'

Python 3

>>> s = b'Hello world!'
>>> s[3:10]
b'lo worl'

Python 2

>>> s = 'abc'
>>> len(s)
3
>>> for item in s:
...     print repr(item)
'a'
'b'
'c'

Python 3

>>> for item in b'abc':
...     print repr(item)

?

Note

It gives you "SyntaxError, you failed to pay the Python 3 paren tax"

Python 3

>>> for item in b'abc':
...     print repr(item)
Traceback (most recent call last):
  ...
SyntaxError: you failed to pay the
             Python 3 paren tax

Note

Like an old text-based adventure game throwing an extra obstacle in your way.

> cross bridge

A BURLY TROLL STANDS BY THE BRIDGE AND
INSISTS YOU THROW HIM TWO PARENTHESES
BEFORE YOU MAY CROSS.

Note

Ruby

A Conservative Estimate

3 bugs per hour
×
2 print statements per investigation
×
2 parentheses per print
×
6 coding hours per day
×
250 coding days per year

Python 3 Owes Me

18,000 parenthesis characters per year

Note

I have to go into paren debt just to write any Emacs LISP. But, what's another two parens when I've already used so many?

>>> for item in b'abc':
...     print(repr(item))

Updated total: 18,002 parentheses

Note

We'll just move on, not mention it.

Python 3

>>> for item in b'abc':
...     print(repr(item))
97
98
99

Breaks the contract

>>> s = b'Hello world!'
>>> s[0] + s[1:]
Traceback (most recent call last):
  ...
TypeError: unsupported operand type(s) for +:
           'int' and 'bytes'
>>> s = b'Hello world!'
>>> s[0:1] + s[1:]
b'Hello world!'
There is sometimes
a way around
But — bytes objects remain
an awkward fit for many tasks
they are called upon to do

bytearray

Mutable version of Python’s
most underpowered
string type
I. The Bit Vector
II. The Reusable Buffer
III. The Accumulator
IV. The Freestyle
Mutable String

I. The Bit Vector

In the rare case
where you actually
want to store bytes—
is the bytearray
a good choice?

Bloom Filter

bits = 2048
mask = bits - 1
a = bytearray(bits >> 3)

def set(n):
    a[n >> 3] |= 1 << (n & 7)

for word in words:
    for h in hashes(word):
        set(h & mask)
def get(n):
    return a[n >> 3] & (1 << (n & 7))

def test(word):
    return all(get(h & mask)
               for h in hashes(word))
The name a in the previous code
can be either an old array.array
or a newfangled bytearray

A first victory

bytearray is more than 7% faster
than the old general-purpose
array.array object
Want to know
what’s even faster?

list of ints

a = [0] * (bits >> 3)
A plain list of int objects
in the range 0…255 runs
2% faster than bytearray
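One way to reproduce the comparison; absolute numbers vary by machine and interpreter version, so this only shows the harness, not the verdict:

```python
import timeit

stmt = "a[5] |= 1 << 3"                   # the Bloom filter's inner operation
t_bytearray = timeit.timeit(stmt, "a = bytearray(256)", number=100000)
t_list = timeit.timeit(stmt, "a = [0] * 256", number=100000)
print("bytearray: %.4fs  list: %.4fs" % (t_bytearray, t_list))
```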

Why?

bytearray: stores bytes that must each
be translated to an int object address,
and back, for each a[n] |= bits
list: simply stores int objects to
begin with — no translation!
So plain old list is
a faster bit vector than
the fancy new bytearray
(Except on PyPy)


(Except on PyPy)
(Where they are all the same)

(Except on PyPy)
(Where they are all the same)
(And run much faster)

Conclusion?

I. The Bit Vector

Verdict:
Space-efficient

Slightly slower, but 8× less space

II. The Reusable Buffer

cat

$ time cat < gigabyte.data > /dev/null
0.00s user 0.12s system 99% cpu 0.117 total
$ dd if=gigabyte.data of=/dev/null
2097152+0 records in
2097152+0 records out
1073741824 bytes (1.1 GB) copied,
        0.698802 s, 1.5 GB/s

Q:

Why does dd take 5× longer
than cat by default?

A:

Block size!

$ strace cat < gigabyte.data > /dev/null
read(0, ""..., 131072) = 131072
write(1, ""..., 131072) = 131072
$ strace dd if=gigabyte.data of=/dev/null
read(0, ""..., 512) = 512
write(1, ""..., 512) = 512
Giving dd the same block size
as cat makes it perform the same
$ dd if=gigabyte.data of=/dev/null bs=131072
8192+0 records in
8192+0 records out
1073741824 bytes (1.1 GB) copied,
        0.117653 s, 9.1 GB/s
So as we look at Python I/O, we will
need to keep block size in mind

Normal Python

while True:
    data = i.read(blocksize)
    if not data:
        break
    o.write(data)

Broken attempt at readinto()

data = bytearray(blocksize)
while True:
    length = i.readinto(data)
    if not length:
        break
    o.write(data)

Fix that incurs a copy

data = bytearray(blocksize)
while True:
    length = i.readinto(data)
    if not length:
        break
    o.write(data[:length])

What if we want to achieve zero-copy?

memoryview!

>>> s = bytearray(b'_________')
>>> m = memoryview(s)
>>> v = m[3:6]
>>> v[0] = 65  # A
>>> v[1] = 66  # B
>>> v[2] = 67  # C
>>> s
bytearray(b'___ABC___')

Zero-copy version of fix

data = bytearray(blocksize)
view = memoryview(data)
while True:
    length = i.readinto(data)
    if not length:
        break
    o.write(view[:length])
dd             0.112 s  **********
cat            0.113 s  **********
read()         0.122 s  ************
readinto()     0.117 s  ***********
+bytearray[]   0.183 s  ******************
+memoryview[]  0.119 s  ***********

File I/O, 128kB blocks

Result:
bytearray gives 4% speedup
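The same loop can be checked against in-memory files, with io.BytesIO standing in for the real file objects i and o:

```python
import io

blocksize = 4
i = io.BytesIO(b'abcdefghij')       # stand-in for the input file
o = io.BytesIO()                    # stand-in for the output file

data = bytearray(blocksize)
view = memoryview(data)
while True:
    length = i.readinto(data)       # fill the buffer in place
    if not length:
        break
    o.write(view[:length])          # zero-copy slice of the buffer

assert o.getvalue() == b'abcdefghij'
```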

small blocks

512B

What will slicing so often do?

data = bytearray(blocksize)
view = memoryview(data)
while True:
    length = i.readinto(data)
    if not length:
        break
    o.write(view[:length])
dd             0.64  *************
read()         1.04  *********************
readinto()     0.96  *******************
+bytearray[]   1.33  **************************
+memoryview[]  1.25  *************************

File I/O, 512B blocks

20% slowdown

but

I thought of something

data = bytearray(blocksize)
view = memoryview(data)
while True:
    length = i.readinto(data)
    if not length:
        break
    elif length == blocksize:
        o.write(data)
    else:
        o.write(view[:length])
dd             0.64  *************
read()         1.04  *********************
readinto()     0.96  *******************
+bytearray[]   1.33  **************************
+memoryview[]  1.25  *************************
+hybrid        1.00  ********************

File I/O, 512B blocks

bytearray gives 4% speedup

Python 2.7 has the same behavior,
but slightly slower than 3.4

Lesson

Hard to beat old-
fashioned strings

Note

Easy to write code that looks good but runs slower. The functional version makes the programmer's heart sing; the fast version is side-effecty.

II. The Reusable Buffer

Verdict:
Dangerous
But offers a great
memory profile!

III. The Accumulator

Q: How many bytes will
recv(1024) return?

Q: How many bytes will
recv(1024) return?
A: One

Q: How many bytes will
recv(1024) return?
A: One
(or more)

Note

Could be a single packet; could be several; could be one stray byte from the most recent packet

“But it worked when I
ran against localhost!”

Note

Using min() here is a 10% performance penalty

How does a recv() solution perform?

data = b''
while len(data) < content_length:
    n = content_length - len(data)
    more = s.recv(n if n < 1500 else 1500)
    if not more:
        raise EOFError()
    data += more

And how does the standard join() idiom compare?

blocks = []
n = content_length
while n:
    more = s.recv(n if n < 1500 else 1500)
    if not more:
        raise EOFError()
    blocks.append(more)
    n -= len(more)
data = b''.join(blocks)

recv()

1.08 s

recv_into() now runs
into a problem
incoming blocks

┌─┬─┬─┬─┐
└─┴─┴─┴─┘
┌─┬─┬─┐
└─┴─┴─┘
┌─┬─┬─┬─┐
└─┴─┴─┴─┘
    
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴
          bytearray
      incoming blocks

┌─┬─┬─┬─┐
└─┴─┴─┴─┘
        ┌─┬─┬─┐
        └─┴─┴─┘
              ┌─┬─┬─┬─┐
              └─┴─┴─┴─┘
                
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴
          bytearray
So, the memoryview()
slicing expense?
It becomes mandatory
>>> s = bytearray(b'_________')
>>> m = memoryview(s)
>>> v = m[3:6]
>>> v[0] = 65  # A
>>> v[1] = 66  # B
>>> v[2] = 67  # C
>>> s
bytearray(b'___ABC___')
┌─┬─┬─┬─┐  First block
└─┴─┴─┴─┘
    

┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴
          bytearray
        ┌─┬─┬─┐  Second block
        └─┴─┴─┘
                  memoryview
        ┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴
          bytearray
              ┌─┬─┬─┬─┐
              └─┴─┴─┴─┘
                  memoryview
              ┌─┬─┬─┬─┬─┬─┬─┬
┌─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬─┬
└─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴─┴
          bytearray
data = bytearray(content_length)
view = memoryview(data)
n = content_length
while n:
    limit = n if n < 1500 else 1500
    more = s.recv_into(view[-n:], limit)
    if not more:
        raise EOFError()
    n -= more
0.80 s
(instead of 1.08 s)
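The recv_into() accumulator above, wrapped as a function and exercised over a socketpair; the helper name recv_exactly is mine, not the talk's:

```python
import socket

def recv_exactly(sock, content_length, blocksize=1500):
    # Fill a pre-allocated bytearray in place; the memoryview slice
    # tells recv_into() where the next block belongs.
    data = bytearray(content_length)
    view = memoryview(data)
    n = content_length
    while n:
        limit = n if n < blocksize else blocksize
        more = sock.recv_into(view[-n:], limit)
        if not more:
            raise EOFError()
        n -= more
    return data

a, b = socket.socketpair()
a.sendall(b'x' * 4000)
payload = recv_exactly(b, 4000)
a.close()
b.close()
assert bytes(payload) == b'x' * 4000
```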
Old-fashioned recv()
plus bytearray.extend()
data = bytearray(0)
n = content_length
while n:
    more = s.recv(n if n < 1500 else 1500)
    if not more:
        raise EOFError()
    data.extend(more)
    n -= len(more)

0.81 s

Turns out?

bytearray.extend()
is pretty inefficient
bytearray.extend(s)
  1. i = iter(s)
  2. Calls next(i) for every byte
  3. Asks the byte its numeric value
call next(i)
    integer = address of int object
    INCREF(integer)
    return integer
ask the integer its value
insert that value into bytearray
DECREF(integer)
INCREF()
    reads 8 bytes from RAM
    writes 8 bytes to RAM
return
    copies 8 byte address
DECREF()
    reads 8 bytes from RAM
    writes 8 bytes to RAM
__________________________
    40 bytes of RAM bandwidth
    just to communicate 8 bits
Plus, bytearray.extend()
writes to an intermediate buffer,
which it dynamically reallocates
as it grows, and only then
does the append

Q:

Does bytearray have
an append operation
that’s any good?

A:

Yes!

Note

Now think about it: where would you put the real append?

+=

Note

The one they taught us to avoid!

data = bytearray(0)
n = content_length
while n:
    more = s.recv(n if n < 1500 else 1500)
    if not more:
        raise EOFError()
    data += more
    n -= len(more)

0.73 s

data += more

recv() + join()  1.09 s  ***********
recv_into()      0.80 s  ********
data.extend()    0.81 s  ********
data += more     0.73 s  *******

bytearray gives 33% speedup

and
cleaner code
data = bytearray(0)
n = content_length
while n:
    more = s.recv(n if n < 1500 else 1500)
    if not more:
        raise EOFError()
    data += more
    n -= len(more)

Q:

What about the
send() maneuver?

A:

Python already
has you covered
sendall()
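A minimal sendall() round trip over a socketpair; sendall() loops internally until every byte of the buffer, bytes or bytearray, has been written:

```python
import socket

a, b = socket.socketpair()
payload = bytearray(b'spam' * 2500)   # 10,000 bytes; bytearray works fine
a.sendall(payload)                    # retries internally on partial sends
a.close()                             # signal EOF to the reader

chunks = []
while True:
    block = b.recv(65536)
    if not block:
        break
    chunks.append(block)
b.close()
assert b''.join(chunks) == payload
```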

III. The Accumulator

Verdict:
bytearray is a win!
Try recv_into() for big blocks,
but recv() and += for small.

IV. The Freestyle Mutable String

What if you want to
change part of a payload before
using, storing, or re-transmitting?
Good — the bytearray is mutable
Bad — none of the methods it
shares with strings do mutation!
>>> b = bytearray(b'Hello, world!')
>>> b.upper()
bytearray(b'HELLO, WORLD!')
>>> b
bytearray(b'Hello, world!')
A bytearray only changes when
subjected to list-like operations
>>> b
bytearray(b'Hello, world!')
>>> b[5] = 32
>>> b[7:12] = b'PyCon'
>>> b
bytearray(b'Hello  PyCon!')
>>> b.clear()
>>> b
bytearray(b'')
The result is curious:
a “mutable string” that, alas,
does no mutation precisely when
you start calling its string methods
Want to lower-case a word?
>>> b = bytearray(b'ALL YOUR BASE')
>>> bslice = b[4:8]
>>> l = bslice.lower()
>>> b[4:8] = l
>>> b
bytearray(b'ALL your BASE')
Problem: two extra data copies
There is no .lower_into(b) method
hidden somewhere to save us

Q:

Can the memoryview save us?

A:

No.

A:

>>> b = bytearray(b'ALL YOUR BASE')
>>> v = memoryview(b)
>>> bslice = v[4:8]
>>> l = bslice.lower()
Traceback (most recent call last):
  ...
AttributeError: 'memoryview' object has
                no attribute 'lower'
So, without in-place string operations,
you will incur data copies — thus removing
some of the temptation to use bytearray
objects in the first place
To mutate a bytearray without
rewriting its whole content,
you are going to need indexes

Indexes

Do you remember indexes?

Do you remember the wilderness
in which you wandered before you
discovered split(), strip(), and join()?
When your code, instead of explaining what
you wanted to do, explained —
at great length — how to do it?
i = s.find(': ')
j = i + 2
name = s[:i].lower()
value = s[j:]
The bytearray will let you
enjoy those days all over again
because mutation is entirely
powered by indexes
a[6] = 65
a[10:16] = b'Python'

Hint: Use REs

>>> import re
>>> s = bytearray(b'Big Ben tower')
>>> match = re.search(b'Be.', s)
>>> match.span()
(4, 7)
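The span() indexes can then drive an in-place slice assignment. The one copy through .lower() remains, but the bytearray itself mutates:

```python
import re

s = bytearray(b'Big Ben tower')
i, j = re.search(b'Ben', s).span()   # (4, 7)
s[i:j] = s[i:j].lower()              # mutate in place via indexes
assert s == bytearray(b'Big ben tower')
```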

IV. The Freestyle Mutable String

Verdict:
bytearray is awkward

Tidbits

bytearray has list-like methods —
for example .pop() and .append(), in case
you ever need a stack of bytes
Buffer protocol — the C interface that supports
readinto(), recv_into(), send(), write()
Plus, memoryview() is both
a consumer and a provider of
the buffer interface!
So efficient, raw byte-oriented I/O
and byte-level introspection are now possible
for a range of Python data structures
Note that memoryview.release() lets
you release the underlying bytes early
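A small illustration of memoryview on both sides of the buffer protocol: consuming the buffer of an array.array, and providing a buffer that write() can accept without an extra copy:

```python
import array
import io

a = array.array('I', [1, 2, 3, 4])   # any buffer-protocol provider
view = memoryview(a)                  # memoryview consumes a's buffer...
out = io.BytesIO()
out.write(view)                       # ...and provides it to write()
assert len(out.getvalue()) == 4 * a.itemsize
view.release()                        # give the underlying buffer back early
```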

Resources

ctypes structures — David Beazley ¹ ²
NumPy arrays — Jake VanderPlas ³
Memory profiling — Victor Lin
readinto() — Eli Bendersky
Official Documentation
PEP-3118 “Revising the buffer protocol”
by Travis Oliphant, Carl Banks

Conclusion


bytearray

  1. Memory-efficient store of byte integers
  2. Helps control memory fragmentation
  3. Great way to accumulate data
  4. Awkward for string operations

Thank you very much!

@brandon_rhodes
