Using Python to Power
Selenium at Scale
PyCon Canada 2016
@brandon_rhodes

Q:

What is Selenium?

Java | Python | Perl | PHP | …

↓↑

Selenium API

↓↑

Browser

# Underlying mechanism:
# run arbitrary JavaScript
# in main browser UI thread

driver.execute_script('…')
# Send and receive strings, numbers,
# lists, dicts, DOM elements.

value1, value2, value3 = driver.execute_script(
    '…', arg1, arg2, arg3,
)
# Python API

driver.get('https://2016.pycon.ca/en/')

# (Selenium method names are too long!)
find = driver.find_element_by_css_selector

h2 = find('h2')
assert 'Toronto' in h2.text
nav = find('.navigation__link')
nav.click()

Selenium tests are expensive

Avoid letting unit-test-style
checks creep in:
do not use your Selenium
tests to iterate
across edge cases!
Use unit tests instead

Two big problems

Selenium tests are:

  1. flaky
  2. expensive

Flaky

Selenium makes it easy
to write broken tests
find = driver.find_element_by_css_selector

find('.navigation__link').click()
assert find('h1').text == 'Schedule'

# Broken!
#  Might grab <h1> on current page
#  Might select before <h1> arrives

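The cure for both failure modes is to poll for the expected state instead of asserting immediately; Selenium ships `WebDriverWait` for exactly this. Here is the idea in miniature (a sketch, not the talk's code; `wait_until` is a hypothetical helper):

```python
import time

def wait_until(predicate, timeout=10.0, interval=0.25):
    """Poll predicate() until it returns a truthy value, or raise."""
    deadline = time.monotonic() + timeout
    while True:
        value = predicate()
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within %.1fs' % timeout)
        time.sleep(interval)
```

With a real driver you would write something like `wait_until(lambda: find('h1').text == 'Schedule')`, or reach for `selenium.webdriver.support.ui.WebDriverWait` directly.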
Sometimes browser behavior
is flaky in Selenium

“Selenium tests are flaky”

Makes engineers sad
to fix them

“But the test passes on my laptop!”

Engineers start to
ignore red builds

“It’s probably just a flake.”


Quest for Green

We added armor
to our Selenium suite
and to our continuous integration

Commit Queue

(your branch)
──diff── × ──diff── ✓ ──land──┐
                              │ rebase + retest
                              ▼
(master) ──✓──✓──✓──✓──✓──✓──✓──✓──

But: could the Selenium
build ever qualify?

Quarantine



pass-fail-pass

A test in Quarantine:

Oncall

Auto-retry

“But,” I hear you cry, “a retry
can mask a real failure!”

Weigh the damage
of a false red
against the damage
of a false green

Armor

January 2016

Dropbox’s Selenium test
suite was approved as a Commit
Queue blocking build
✓ ✓ ✓

Scale

shard

But remember,
with n shards:

(test_time ÷ n) + (setup_time × n)
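Treating the slide's formula as a cost function of n, calculus puts the sweet spot at n = √(test_time ÷ setup_time). A sketch (mine, not from the talk):

```python
import math

def best_shard_count(test_time, setup_time):
    """Integer n minimizing test_time / n + setup_time * n."""
    cost = lambda n: test_time / n + setup_time * n
    # The real-valued optimum is sqrt(test_time / setup_time); the best
    # integer is one of its two neighbors (and never below 1 shard).
    n = math.sqrt(test_time / setup_time)
    return min(max(1, math.floor(n)), max(1, math.ceil(n)), key=cost)
```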

Expense

t minute suite
×
n developers
×
m diffs per day
×
≥3 builds per diff

Expense

120 minute suite
×
100 developers
×
2 diffs per day
×
5 builds per diff
= 2,000 hours / day
2,000 hours / day
× 20¢ EC2 instance
= $400 / working day
= $104k / year
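The arithmetic above generalizes (a sketch; the $0.20/hour rate and 260 working days per year match the slide's assumptions):

```python
def suite_cost(minutes, developers, diffs_per_day, builds_per_diff,
               dollars_per_hour=0.20, working_days_per_year=260):
    """Return (machine-hours/day, $/day, $/year) for a test suite."""
    hours_per_day = minutes * developers * diffs_per_day * builds_per_diff / 60
    per_day = hours_per_day * dollars_per_hour
    return hours_per_day, per_day, per_day * working_days_per_year
```

The slide's numbers: `suite_cost(120, 100, 2, 5)` gives 2,000 machine-hours, $400 per day, $104,000 per year.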

Runway



Runway

“Dropbox is cash flow positive”
— Drew Houston, June 2016

Cost control

Narrow triggers

Whitelist or blacklist per build
to avoid running all suites
on all commits
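A narrow trigger can be as simple as path patterns per build (a sketch with hypothetical patterns; this is not Dropbox's configuration):

```python
import fnmatch

# Hypothetical rules: run the Selenium build only for diffs
# that touch the web frontend.
SELENIUM_TRIGGERS = ['web/*', 'templates/*', 'static/js/*']

def should_run_selenium(changed_paths, patterns=SELENIUM_TRIGGERS):
    """True if any changed file matches any trigger pattern."""
    return any(fnmatch.fnmatch(path, pattern)
               for path in changed_paths
               for pattern in patterns)
```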

At scale

Don’t be a purist —
poke holes for fixtures
“Create a user with both
a work and personal Dropbox account
who is a member of 2 shared folders”

At scale

Isolation does get more difficult
if you save time by re-using the browser
driver.window_handles
driver.switch_to_window(handle)
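Re-using one browser means each test must hand the next a single clean window. A sketch built from the calls above (the helper name is mine; `switch_to_window` is the Selenium-2/3-era spelling the slide uses):

```python
def close_extra_windows(driver):
    """Close every window except the first, then switch back to it."""
    first = driver.window_handles[0]
    for handle in driver.window_handles[1:]:   # slice copy: safe to close
        driver.switch_to_window(handle)
        driver.close()
    driver.switch_to_window(first)
```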

Teardown

Warning: The Selenium method
driver.delete_all_cookies()
only deletes cookies for
the current domain
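So a thorough teardown has to visit each domain the tests touch and clear cookies there. A sketch (the domain list is hypothetical):

```python
# Hypothetical list of domains the test suite logs in to.
TEST_DOMAINS = ['https://www.example.com', 'https://blocks.example.com']

def clear_all_cookies(driver, domains=TEST_DOMAINS):
    for url in domains:
        driver.get(url)              # cookies are scoped per domain...
        driver.delete_all_cookies()  # ...so clear them domain by domain
```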
Retry with greater isolation
Every test that fails
is immediately re-run
with a fresh browser

At scale

Dispose of the VM

What data should be rescued
before the VM is deleted?
We used to deliver logfile
snippets that started
when the test did
=== TEST START ===
...
==== TEST END ====
Problem: You can’t see
what happened right before
your test started.
=== TEST START ===
Traceback (most recent call last):
  ...
Exception: ...
==== TEST END ====
Traceback (most recent call last):
  ...
Exception: ...
Service halted; restarting
Service halted; restarting
=== TEST START ===
Traceback (most recent call last):
  ...
Exception: ...
==== TEST END ====
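The fix is simple to sketch: when cutting the snippet, keep a window of history before the start marker (the helper name and context size are mine):

```python
def log_snippet(lines, context=100,
                start='=== TEST START ===', end='==== TEST END ===='):
    """Return the test's log lines plus up to `context` lines before it."""
    i = lines.index(start)
    j = lines.index(end, i)
    return lines[max(0, i - context):j + 1]
```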

Scale

Use only one browser


Writing Tests

Avoid raw Selenium

Test API

two-buttons.jpg
# No

find('.navigation__link').click()
find('._sd_login_button').click()
# Yes

find_link('Schedule').click()
find_button('Login').click()

Using the web like a human

# Turns out? This is N round-trips!

[button.text == 'Login'
 for button in find('button')]
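Pushing the loop into the browser turns N round-trips into one (a sketch; the helper name is mine, and it assumes a live `driver`):

```python
def button_texts(driver):
    """Fetch every button's text in a single execute_script round-trip."""
    return driver.execute_script(
        'return [].map.call(document.querySelectorAll("button"),'
        ' function (b) { return b.textContent; });')
```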

Problem: custom solution

Haphazard in-house routines,
some used everywhere, some used
only in particular suites

Solution

https://github.com/elliterate/capybara.py

page.visit("/")
page.click_link("Sign in")

page.fill_in("Username", value="user@example.com")
page.fill_in("Password", value="password")
page.click_button("Sign in")

Gating

Configuration flags
1. Gate an experiment
2. Gate an A/B test

Gating

Problem: You can’t run
the whole suite under all
2ⁿ combinations of n gates

Gating

Provide:

a @gates(…) decorator and a
with gates(…) context manager
for setting gates

Choose a testing framework
that lets you re-run a slate of tests
with different gate settings
class NormalTests(unittest.TestCase):
    def ...
    def ...
    def ...
    def ...

class ExperimentTests(NormalTests):
    def setUp(self):
        gate.set('use_new_hash', True)
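A runnable miniature of the pattern above (the `gate` class is a stand-in for whatever flag system you use; `compute_hash` is a made-up gated feature):

```python
import unittest

class gate:
    """Tiny stand-in for a real feature-flag system."""
    _flags = {}
    @classmethod
    def set(cls, name, value):
        cls._flags[name] = value
    @classmethod
    def get(cls, name, default=False):
        return cls._flags.get(name, default)

def compute_hash(data):
    # Two implementations, selected by a gate.
    prefix = 'new' if gate.get('use_new_hash') else 'old'
    return '%s:%s' % (prefix, data)

class NormalTests(unittest.TestCase):
    def setUp(self):
        gate.set('use_new_hash', False)
    def test_hash_is_prefixed(self):
        self.assertIn(':', compute_hash('data'))

class ExperimentTests(NormalTests):
    def setUp(self):
        gate.set('use_new_hash', True)  # inherited tests re-run, gated on
```

Every test method of NormalTests runs a second time under ExperimentTests with the gate flipped.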
Periodically audit the
@gates(…) and with gates(…) calls
for obsolete settings
Larger point: The test suite
is itself a product that needs
maintenance and refactoring

Capybara + Armor + Scale

=

✓

Thank you! — @brandon_rhodes