The Dictionary Even Mightier @brandon_rhodes PyCon 2017 ----------------------------------------------- Original “Mighty Dictionary” talk ↓ 2009 2010 ...Oct Nov Dec Jan Feb Mar Apr May Jun Jul... 2.6.4 2.7.0 ↗ ↗ ↖ 3.0 3.1 3.3 ----------------------------------------------- 3.0 — Comprehensions (PEP 274) ⎫ Views (PEP 3106) ⎪ 3.1 — OrderedDict (PEP 372) ⎬ → Python 2.7 3.3 — Random seed (issue 13703)⎭ 3.3 — Key-Sharing (PEP 412) ⎫ 3.4 — SipHash (PEP 456) ⎬ Python 3 only 3.6 — Compact (issue 27350) ⎭ ----------------------------------------------- Dictionary Comprehensions (PEP 274) ----------------------------------------------- List Comprehension Python 2.0 [n*n for n in numbers] dict([(n, n*n) for n in numbers]) ----------------------------------------------- Generator Expression Python 2.4 dict((n, n*n) for n in numbers) ----------------------------------------------- Dictionary Comprehension {n: n*n for n in numbers} • Readable • Smaller • Faster ----------------------------------------------- Symmetry ----------------------------------------------- Symmetry [n*n for n in numbers] {n*n for n in numbers} {n: n*n for n in numbers} ----------------------------------------------- Symmetry [n*n for n in numbers] {n*n for n in numbers} {n: n*n for n in numbers} Lets learners extrapolate ----------------------------------------------- Dictionary Comprehensions (PEP 274) Python 3.0 → Python 2.7 ----------------------------------------------- Dictionary Views (PEP 3106) ----------------------------------------------- Python 3.0 → Python 2.7 ----------------------------------------------- d.keys() d.iterkeys() d.values() vs d.itervalues() d.items() d.iteritems() ----------------------------------------------- d.keys() ? d.iterkeys() d.values() <──── d.itervalues() d.items() d.iteritems() ----------------------------------------------- d.keys() ? d.iterkeys() d.values() <──── d.itervalues() d.items() d.iteritems() list(d.keys()) ----------------------------------------------- But what about is_present = k in d.keys() is_present = v in d.values() ? ----------------------------------------------- Could an object implement __contains__() for keys, values, items? ----------------------------------------------- Could an object implement __contains__() __sub__() __and__() __xor__() __or__() isdisjoint() for keys, values, items? ----------------------------------------------- Java Collections Framework ----------------------------------------------- Could an object implement v.__contains__() v.__sub__() v.__and__() v.__xor__() v.__or__() v.isdisjoint() v.__iter__() for keys, values, items? ----------------------------------------------- “View” d.keys(), d.values(), d.items() ↓ v.__contains__() v.__sub__() v.__and__() v.__xor__() v.__or__() v.isdisjoint() v.__iter__() ----------------------------------------------- Dictionary Views d.keys() 3.0 d.values() d.items() that all return views ↓ d.viewkeys() 2.7 d.viewvalues() d.viewitems() ----------------------------------------------- Iterators are the same but how you get them is different ----------------------------------------------- Dictionary Views (PEP 3106) Python 3.0 → Python 2.7 ----------------------------------------------- OrderedDict (PEP 372) ----------------------------------------------- OrderedDict • Python 3.1 → Python 2.7 • Preserves insertion order • Bigger, slower • In Python ≤ 3.4 • In C code ≥ 3.5 • Left two PEPs dangling PEP 468: kwarg order PEP 520: class dict ----------------------------------------------- Key-Sharing (PEP 412) Python 3.3 ----------------------------------------------- A question of space ----------------------------------------------- def __init__(self, name, port, proto): self.name = name self.port = port self.proto = proto ----------------------------------------------- 000 001 010 011 100 101 110 111 ----------------------------------------------- d['name'] = 'ssh' 000 001 010 011 100 101 110 111 ----------------------------------------------- d['name'] = 'ssh' 'name' 000 001 010 011 100 101 110 111 ----------------------------------------------- d['name'] = 'ssh' 01101010011000001011110000000100 000 001 010 011 100 101 110 111 ----------------------------------------------- d['name'] = 'ssh' ┌─┐ 01101010011000001011110000000100 └─┘ 000 001 010 011 100 101 110 111 ----------------------------------------------- d['name'] = 'ssh' ┌─┐ 01101010011000001011110000000100 └─┘ 000 001 010 011 100 00000100 'name' 'ssh' 101 110 111 ----------------------------------------------- hash('port') 01100100001100111010000000101101 000 001 010 011 100 00000100 'name' 'ssh' 101 110 111 ----------------------------------------------- hash('port') 01100100001100111010000000101101 000 001 010 011 100 00000100 'name' 'ssh' 101 00101101 'port' 22 110 111 ----------------------------------------------- hash('proto') 00111010001001001100101001010100 000 001 010 011 100 00000100 'name' 'ssh' 101 00101101 'port' 22 110 111 ----------------------------------------------- hash('proto') 00111010001001001100101001010100 000 001 01010100 'proto' 'tcp' 010 011 100 00000100 'name' 'ssh' 101 00101101 'port' 22 110 111 ----------------------------------------------- 000 001 010 011 100 101 110 111 000 001 01010100 'proto' 'tcp' 010 011 100 00000100 'name' 'ssh' 101 00101101 'port' 22 110 111 ----------------------------------------------- 000 001 010 011 100 00000100 'name' 'domain' 101 110 111 000 001 01010100 'proto' 'tcp' 010 011 100 00000100 'name' 'ssh' 101 00101101 'port' 22 110 111 ----------------------------------------------- 000 001 010 011 100 00000100 'name' 'domain' 101 00101101 'port' 53 110 111 000 001 01010100 'proto' 'tcp' 010 011 100 00000100 'name' 'ssh' 101 00101101 'port' 22 110 111 ----------------------------------------------- 000 001 01010100 'proto' 'udp' 010 011 100 00000100 'name' 'domain' 101 00101101 'port' 53 110 111 000 001 01010100 'proto' 'tcp' 010 011 100 00000100 'name' 'ssh' 101 00101101 'port' 22 110 111 ----------------------------------------------- ! Wait for the first call to __init__() ----------------------------------------------- 000 001 01010100 'proto' 'tcp' 010 011 100 00000100 'name' 'ssh' 101 00101101 'port' 22 110 111 ----------------------------------------------- │ 000 │ 001 01010100 'proto' │'tcp' 010 │ 011 │ 100 00000100 'name' │'ssh' 101 00101101 'port' │22 110 │ 111 │ │ ----------------------------------------------- │ │ 000 │ │ 001 01010100 'proto' │'tcp' │ 010 │ │ 011 │ │ 100 00000100 'name' │'ssh' │ 101 00101101 'port' │22 │ 110 │ │ 111 │ │ │ │ ----------------------------------------------- │ │ 000 │ │ 001 01010100 'proto' │'tcp' │ 010 │ │ 011 │ │ 100 00000100 'name' │'ssh' │'domain' 101 00101101 'port' │22 │ 110 │ │ 111 │ │ │ │ ----------------------------------------------- │ │ 000 │ │ 001 01010100 'proto' │'tcp' │ 010 │ │ 011 │ │ 100 00000100 'name' │'ssh' │'domain' 101 00101101 'port' │22 │53 110 │ │ 111 │ │ │ │ ----------------------------------------------- │ │ 000 │ │ 001 01010100 'proto' │'tcp' │'udp' 010 │ │ 011 │ │ 100 00000100 'name' │'ssh' │'domain' 101 00101101 'port' │22 │53 110 │ │ 111 │ │ │ │ ----------------------------------------------- │ │ │ 000 │ │ │ 001 01010100 'proto' │'tcp' │'udp' │ 010 │ │ │ 011 │ │ │ 100 00000100 'name' │'ssh' │'domain'│ 101 00101101 'port' │22 │53 │ 110 │ │ │ 111 │ │ │ │ │ │ ----------------------------------------------- │ │ │ 000 │ │ │ 001 01010100 'proto' │'tcp' │'udp' │ 010 │ │ │ 011 │ │ │ 100 00000100 'name' │'ssh' │'domain'│'gophe 101 00101101 'port' │22 │53 │ 110 │ │ │ 111 │ │ │ │ │ │ ----------------------------------------------- │ │ │ 000 │ │ │ 001 01010100 'proto' │'tcp' │'udp' │ 010 │ │ │ 011 │ │ │ 100 00000100 'name' │'ssh' │'domain'│'gophe 101 00101101 'port' │22 │53 │70 110 │ │ │ 111 │ │ │ │ │ │ ----------------------------------------------- │ │ │ 000 │ │ │ 001 01010100 'proto' │'tcp' │'udp' │'tcp' 010 │ │ │ 011 │ │ │ 100 00000100 'name' │'ssh' │'domain'│'gophe 101 00101101 'port' │22 │53 │70 110 │ │ │ 111 │ │ │ │ │ │ ----------------------------------------------- │ │ │ 000 │ │ │ 001 01010100 'proto' │'tcp' │'udp' │'tcp' 010 │ │ │ 011 │ │ │ 100 00000100 'name' │'ssh' │'domain'│'gophe 101 00101101 'port' │22 │53 │70 110 │ │ │ 111 │ │ │ │ │ │ ⅔ table gone! ----------------------------------------------- Memory Savings For even a tiny object, - (2 × 8 × 8) bytes = - 128 bytes ----------------------------------------------- Memory Cost +8 bytes for every dictionary ----------------------------------------------- Object-oriented programs often use 10–20% less memory ----------------------------------------------- Key-Sharing (PEP 412) Python 3.3 Take-away: In your objects, assign every possible attribute in __init__() ----------------------------------------------- Random seed (issue 13703) ----------------------------------------------- 000 001 010 011 100 101 110 111 ----------------------------------------------- 000 001 01100100 'cat' 3 010 01111010 'stone' 1 011 100 01111100 'bowl' 2 101 01110101 'dough' 4 110 11111110 ******* * 111 ----------------------------------------------- aback baaing cabbed deal easels ----------------------------------------------- * * 000 * 001 010 011 * 100 101 10001110 'baaing' 'gerund' 110 11000110 'aback' 'adj.' 111 * * * * * --- 64 / 145 --- 0.52002 ----------------------------------------------- 1 + 2 + 3 + 4 + ⋯ + n = (n + 1)(n) = n² + n = O(n²) ----------------------------------------------- https://accidentallyquadratic.tumblr.com/ ----------------------------------------------- https://accidentallyquadratic.tumblr.com/ • Elasticsearch IndicesQuery • Vim TAGS lookup • Capistrano server definition • Ruby `reject!` • Chrome Server-Sent Event Parsing • Rust hash iteration + reinsertion ⋮ ----------------------------------------------- /* Python 3.2 unicodeobject.c */ x = *p << 7; while (--len >= 0) x = (1000003*x) ^ *p++; x ^= Py_SIZE(self); ----------------------------------------------- 87 × 1001 ─────── 87 0 0 + 87 ─────── 87087 ----------------------------------------------- 87 × 1001 ─────── 87 + 87 ─────── 87087 ----------------------------------------------- 87 × 10001001 ─────────── 87 87 + 87 ─────────── 870087087 ----------------------------------------------- 87 × 100111011 ──────────── 87 87 87 87 87 + 87 ──────────── 8709657957 ----------------------------------------------- 87 × 100111011 ──────────── ¹¹ ¹87 87 87 87 87 + 87 ──────────── 8709657957 ----------------------------------------------- /* Python 3.2 unicodeobject.c */ x = *p << 7; while (--len >= 0) x = (1000003*x) ^ *p++; x ^= Py_SIZE(self); ----------------------------------------------- >>> bin(1000003) 0b11110100001001000011 -----------------------------------------------× 0b11110100001001000011 ──────────────────────────────────────────── .................... .................... .................... .................... .................... .................... .................... .................... .................... ──────────────────────────────────────────── /* Python 3.2 unicodeobject.c */ x = *p << 7; while (--len >= 0) x = (1000003*x) ^ *p++; x ^= Py_SIZE(self); ----------------------------------------------- December 2011, at 28c3: “Efficient Denial of Service Attacks on Web Application Platforms” — Alexander “alech” Klink Julian “zeri” Wälde ----------------------------------------------- “Web application technologies PHP ASP.NET Java ColdFusion Perl Ruby Python JavaScript ----------------------------------------------- “Web application technologies PHP ▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓ 77.3 % ASP.NET ▓▓▓▓ 21.7% Java ▓ 4 % ColdFusion | 1.2 % Perl | 1 % Ruby | 0.6 % Python | 0.2 % JavaScript | < 0.1 %” ----------------------------------------------- Python’s hash function could be: • “broken using a meet-in-the-middle attack” • “reasonable-sized attack strings only for 32 bits” • “Python (Plone) … max POST size of 1 MB” • “7 minutes of CPU usage for a 1 MB request” ----------------------------------------------- https://medium.com/@robertgrosse/ generating-64-bit-hash-collisions -to-dos-python-5b21404a5306 — 2017 March 1 ----------------------------------------------- /* Python 3.2 unicodeobject.c */ x = *p << 7; while (--len >= 0) x = (1000003*x) ^ *p++; x ^= Py_SIZE(self); ----------------------------------------------- /* Python 3.2.3 unicodeobject.c */ x = _Py_HashSecret.prefix; x ^= *p << 7; while (--len >= 0) x = (_PyHASH_MULTIPLIER*x) ^ *p++; x ^= Py_SIZE(self); x ^= _Py_HashSecret.suffix; ----------------------------------------------- A take-away: The same dictionary will have a different order each time you run— • Python 3.3+ • A bugfix release of 2.6, 2.7, 3.1, or 3.2 with the -R flag ----------------------------------------------- December 2012, at 29c3: “Hash-Flooding DOS Reloaded” djb, Jean-Philippe Aumasson, Martin Boßlet • Calling hash() → recover random key • Invented cryptographic SipHash ----------------------------------------------- SipHash (PEP 456) ----------------------------------------------- Old hash Random hash SipHash 2.5 – 2.5.5 2.6 – 2.6.7 2.6.8 – 2.6.9 2.7.1 – 2.7.2 2.7.3 – 2.7.13 3.0 3.1 – 3.1.4 3.1.5 3.2 – 3.2.2 3.2.3 – 3.2.6 3.3.0 – 3.3.6 3.4 3.5 3.6 ----------------------------------------------- Old hash Random hash SipHash 2.5 – 2.5.5 2.6 – 2.6.7 2.6.8 – 2.6.9 ⎫ 2.7.1 – 2.7.2 2.7.3 – 2.7.13 ⎪ 3.0 ⎬ -R 3.1 – 3.1.4 3.1.5 ⎪ 3.2 – 3.2.2 3.2.3 – 3.2.6 ⎭ 3.3.0 – 3.3.6 3.4 3.5 3.6 ----------------------------------------------- PEP 509 Add a private version to dict ----------------------------------------------- Private version (PEP 509) Python 3.6 ----------------------------------------------- Every dict now carries a version number guaranteed to increment on every modify ----------------------------------------------- Private version (PEP 509) Take-away: in 3.6+, quick O(1) internal check is now available for whether a dict has changed ----------------------------------------------- Issue 27350 Compact Dictionaries ----------------------------------------------- 000 001 010 011 100 101 110 111 ----------------------------------------------- 000 001 01100100 'cat' 3 010 01111010 'stone' 1 011 100 01111100 'bowl' 2 101 01110101 'dough' 4 110 11111110 'eag*** * 111 ----------------------------------------------- 0000 0001 0010 00100010 'fish' 7 0011 0100 01100100 'cat' 3 0101 01110101 'dough' 4 0110 0111 1000 1001 1010 01111010 'stone' 1 1011 1100 01111100 'bowl' 2 1101 1110 11111110 'eagle' 5 1111 ----------------------------------------------- 000 001 01100100 'cat' 3 010 01111010 'stone' 1 011 100 01111100 'bowl' 2 101 01110101 'dough' 4 110 11111110 'eagle' 5 111 ----------------------------------------------- 000 001 01100100 'cat' 3 010 01111010 'stone' 1 011 100 01111100 'bowl' 2 101 01110101 'dough' 4 110 11111110 'eagle' 5 111 3 × 3 × 8 = 72 bytes ----------------------------------------------- ! 2012 — Hettinger et al on python-dev ----------------------------------------------- ----------------------------------------------- 0 01111010 'stone' 1 ----------------------------------------------- 0 1 01111010 'stone' 1 01111100 'bowl' 2 ----------------------------------------------- 2 0 1 01111010 'stone' 1 01111100 'bowl' 2 01100100 'cat' 3 ----------------------------------------------- 2 0 1 3 01111010 'stone' 1 01111100 'bowl' 2 01100100 'cat' 3 01110101 'dough' 4 ----------------------------------------------- 2 0 1 3 4 01111010 'stone' 1 01111100 'bowl' 2 01100100 'cat' 3 01110101 'dough' 4 11111110 'eagle' 5 ----------------------------------------------- 000 001 01100100 'cat' 3 010 01111010 'stone' 1 011 100 01111100 'bowl' 2 101 01110101 'dough' 4 110 11111110 'eagle' 5 111 ----------------------------------------------- 2 0 1 3 4 01111010 'stone' 1 01111100 'bowl' 2 01100100 'cat' 3 01110101 'dough' 4 11111110 'eagle' 5 ----------------------------------------------- 5 2 3 0 1 4 01111010 'stone' 1 01111100 'bowl' 2 01100100 'cat' 3 01110101 'dough' 4 11111110 'eagle' 5 00100010 'fish' 7 ----------------------------------------------- ! 2012 — Hettinger et al on python-dev 2015 — Becomes default dict in PyPy ----------------------------------------------- Benchmark Change trans2_database 2.85% spitfire_cstringio 2.07% html5lib 1.82% ⋮ ⋮ trans2_backendopt -3.01% sympy_str -3.83% pypy_interp -4.06% bm_mako -8.74% ----------------------------------------------- PyPy benchmark average change: -0.38% ----------------------------------------------- Real improvement: memory usage! ----------------------------------------------- ! 2012 — Hettinger et al on python-dev 2015 — Becomes default dict in PyPy 2016 — June: Inada Naoki’s issue 27350 July: nothing happens August: “ping” and python-dev September: Core Dev Sprint! ----------------------------------------------- “we discussed a lot about your compact dict change at the current Python core dev sprint. We *all* want your change in Python 3.6! I proposed to just push your change *right now* but polish the code later” — Victor Stinner ----------------------------------------------- “we discussed a lot about your compact dict change at the current Python core dev sprint. We *all* want your change in Python 3.6! I proposed to just push your change *right now* but polish the code later” — Victor Stinner “Isn't this premature” ----------------------------------------------- The memory usage of the new dict is between 20% and 25% smaller compared to Python 3.5. — “What’s New In Python 3.6” ----------------------------------------------- Side-effect: Iteration is faster ----------------------------------------------- Another side-effect ----------------------------------------------- (original order: stone, bowl, cat, dough, eagle) 000 001 01100100 'cat' 3 010 01111010 'stone' 1 011 100 01111100 'bowl' 2 101 01110101 'dough' 4 110 11111110 'eagle' 5 111 ----------------------------------------------- (original order: stone, bowl, cat, dough, eagle) 2 0 1 3 4 01111010 'stone' 1 01111100 'bowl' 2 01100100 'cat' 3 01110101 'dough' 4 11111110 'eagle' 5 ----------------------------------------------- Can you now depend on order? ----------------------------------------------- Can you now depend on order? PEP 468 PEP 520 ----------------------------------------------- Guido on python-dev: “I’ve been asked about this. Here’s my opinion on the letter of the law in 3.6 - “keyword args are ordered [PEP 468] - the namespace passed to a metaclass is ordered by definition order [PEP 520] - ditto for the class __dict__ [PEP 520] “I'd like to handwave on the ordering of all other dicts.” ----------------------------------------------- “I want to remind them that they are taking a risk, and their code won't be backwards compatible” ----------------------------------------------- Host elrond Username brandon Host galadriel Username ubuntu ForwardAgent yes ----------------------------------------------- entries = [{}] for line in text.splitlines(): if not line.strip(): continue key, value = line.split() if key == 'Host': entries.append({}) entries[-1][key] = value ----------------------------------------------- [{}, {'Host': 'elrond', 'Username': 'brandon'}, {'Host': 'galadriel', 'Username': 'ubuntu', 'ForwardAgent': 'yes'}] ----------------------------------------------- for entry in entries: for name, value in entry.items(): print('{} {}'.format(name, value)) ↓ Host elrond Username brandon Host galadriel Username ubuntu ForwardAgent yes ----------------------------------------------- Stack Overflow ----------------------------------------------- Stack Overflow “Dictionary output in python is in wrong order” ----------------------------------------------- Stack Overflow “Dictionary output in python is in wrong order” “Why does this python dictionary get created out of order?” ----------------------------------------------- Stack Overflow “Dictionary output in python is in wrong order” “Why does this python dictionary get created out of order?” “python - Why is my dictionary printed in the wrong order as the input” ----------------------------------------------- Stack Overflow “Dictionary output in python is in wrong order” “Why does this python dictionary get created out of order?” “python - Why is my dictionary printed in the wrong order as the input” “why elements of python dictionary not get printed in sequence” ----------------------------------------------- Stack Overflow “Dictionary output in python is in wrong order” “Why does this python dictionary get created out of order?” “python - Why is my dictionary printed in the wrong order as the input” “why elements of python dictionary not get printed in sequence” “dictionary is not staying in order python” ----------------------------------------------- +------------------------------------------+ |Raymond Hettinger @raymondh Apr 6 | | | |Having dicts ordered by default in #python| |3.6 is not guaranteed yet. But it is so | |convenient that a guarantee for 3.7 is | |almost inevitable. | | | |↰ 15 replies ⇄ 79 retweets ❤ 159 likes | +------------------------------------------+ ----------------------------------------------- +------------------------------------+ |David Beazley @dabeaz Apr 6 | | | |Replying to @raymondh | |you're not putting that Genie back | |in the bottle. Order all the things!| | | |↰ 1 replies ⇄ retweet ❤ 18 likes | +------------------------------------+ ----------------------------------------------- Dictionaries for humans ----------------------------------------------- Python Dictionary Even Mightier ----------------------------------------------- Python Dictionary Even Mightier May your hashes be unique, ----------------------------------------------- Python Dictionary Even Mightier May your hashes be unique, Your keys rarely collide, ----------------------------------------------- Python Dictionary Even Mightier May your hashes be unique, Your keys rarely collide, And your dictionaries ----------------------------------------------------------------------------------------------