So here’s an API innovation that I just made a rough implementation of: taking a Python data descriptor and enhancing it so that, when it’s invoked on the class itself, it becomes a constructor and returns a new instance of the class.
Does that sound like a good idea?
After you’ve read the saga below, feel free to share your opinion by replying to my tweet about this post!
https://twitter.com/brandon_rhodes/status/1486435857607311368
The background is that my astronomy library Skyfield
uses objects to represent measures like distance and velocity.
I could have used two separate objects
to represent a distance measured as au
(astronomical units)
and a distance measured as km
,
but I prefer not to have several objects represent one single thing.
It seems cleaner to me to have but a single object
for each particular distance that Skyfield needs to represent,
and to have that object different offer units
as different attributes
(which also helps readability
by making users name the unit they’re asking the Distance
for):
AU_KM = 149597870.700
class Distance:
def __init__(self, au=None, km=None):
if au is not None:
self.au = au
elif km is not None:
self.au = km / AU_KM
@property
def km(self):
return self.au * AU_KM
d = Distance(km=217)
print('au:', d.au)
print('km:', d.km)
au: 1.4505554055322529e-06
km: 217.00000000000003
At least that’s how I implemented it back in 2014.
In retrospect I’m not happy with how __init__()
works.
To my eyes today it looks wasteful:
only one of the several arguments is ever used,
and the if…elif
has to search through them every time,
even though only one will ever be non-None
.
Side note —
I felt so bad about this __init__()
code
that Skyfield’s Distance
class
provides a second, redundant constructor for au
:
@classmethod
def from_au(cls, au):
self = cls.__new__(cls) # avoid calling __init__()
self.au = au
return self
You’ll recognize this pivot as a
modern design pattern that I discuss on my Python Patterns site:
instead of providing one huge method
with a big if…elif
inside that decides what it should do,
simply have n smaller methods
that each do one thing well.
Anyway.
Another problem with the example code above
is that .km
needs to be computed over and over again
every time the user needs it.
In a loop, that could become expensive:
a Skyfield distance isn’t always a simple float,
it can be a whole NumPy array of a million elements.
At first I avoided recomputing it
by using one of those unwieldy if self._km is None:
caching checks inside of each unit’s property,
but then I remembered that Python descriptors
were very carefully designed
so that you could write a descriptor
that only gets called if there isn’t yet a .km
attribute!
Otherwise Python skips the whole method call,
which both makes the logic simpler and saves valuable time.
I never remember the formula,
so I had to look it up (I got it from Pyramid’s source code):
from functools import update_wrapper
class reify(object):
def __init__(self, method):
self.method = method
update_wrapper(self, method)
def __get__(self, instance, objtype=None):
if instance is None:
return self
value = self.method(instance)
instance.__dict__[self.__name__] = value
return value
I never quite remember what all its code is doing, but it works great!
class Distance:
def __init__(self, au=None, km=None):
if au is not None:
self.au = au
elif km is not None:
self.km = km
self.au = km / AU_KM
@reify
def km(self):
return self.au * AU_KM
d = Distance(au=1.524)
print('au:', d.au)
print('km:', d.km)
print()
d = Distance(km=217)
print('au:', d.au)
print('km:', d.km)
au: 1.524
km: 227987154.9468
au: 1.4505554055322529e-06
km: 217
When we use @reify
to turn .km
into a real attribute like this,
that we can assign to,
we get an extra benefit:
when a distance is instantiated using km
,
we can cache the actual value that the user provided!
Note that we are now returning the exact km
value the user provided,
instead of trying to re-convert the value from au
and getting the slight rounding error we saw in the first example:
km
is now 217
instead of 217.00000000000003
.
But how can we avoid the ugly if…elif
block,
and the need to have __init__()
accept as many parameters
as there are units of measurement?
I already illustrated, above,
a from_au()
method that does things the ‘right way’
by multiplexing on method name rather than on combinations of arguments.
Is that the way forward,
adding something like from_km()
for each additional unit supported?
That way forward still seems a little sad:
if I need to support n units,
then I need to write n constructors in addition to my n reified properties,
and keep them all in sync with each other.
Could there be a better way?
Like a sudden blaze of light, a possible constructor syntax occurred to me:
d = Distance.km(217)
Wow!
Could it work?
Could a @reify
descriptor
be taught to respond to a class-method-like call
and return a newly constructed instance?
What even happens when that syntax is invoked?
try:
Distance.km(217)
except Exception as e:
print(e)
'reify' object is not callable
A bit of testing revealed
that calling the reify
descriptor like a function
does something very Pythonic and obvious:
it runs its __call__()
method if it has one.
Simple enough!
Let’s write a better reify
:
class unit(object):
def __init__(self, method):
self.method = method
update_wrapper(self, method)
def __get__(self, instance, objtype=None):
if instance is None:
return self
value = self.method(instance)
instance.__dict__[self.__name__] = value
return value
# New code:
def __call__(self, *args):
print('type(self):', type(self))
print('args:', args)
class Distance:
# ...
@unit
def km(self):
return self.au * AU_KM
Distance.km(217)
type(self): <class 'tmp.test_code_214_output_243.<locals>.unit'>
args: (217,)
It worked!
Our new extended descriptor is starting to gain a superpower
beyond what @reify
itself could do:
it supports being called like a method of the class itself.
Can we turn it into a constructor?
Well, let’s see what else the constructor will need.
It will need to know the name of the attribute to set
on the new instance.
That’s easy enough: it can ask for the __name__
of the method,
and it will learn that it’s named km
.
What about learning the conversation factor that it needs to use?
We don’t currently have a way to provide it.
Maybe the decorator should take an argument —
which is conventionally accomplished in Python
by turning the decorator into a closure that returns an inner function.
Oh, dear, this is going to be a bit ugly, isn’t it? Let’s just get it working and maybe we can simplify later.
def unit(conversion_factor):
def wrap(method):
name = method.__name__
class unit_descriptor(object):
# These methods are the same as before:
def __init__(self, method):
self.method = method
update_wrapper(self, method)
def __get__(self, instance, objtype=None):
if instance is None:
return self
value = self.method(instance)
instance.__dict__[self.__name__] = value
return value
# This is improved:
def __call__(self, value):
print('name:', name)
print('conversion_factor:', conversion_factor)
print('value:', value)
return unit_descriptor(method)
return wrap
class Distance:
# ...
@unit(AU_KM)
def km(self):
return self.au * AU_KM
Distance.km(217)
name: km
conversion_factor: 149597870.7
value: 217
Success!
The __call__()
now knows what attribute it should set
and what conversion factor it should apply.
Yes, I dislike creating classes inside of functions inside of other functions.
But let’s see if we can get it working before we worry about elegance.
It feels like time to write up some actual constructor code
and see if we can get a Distance
instance returned back to us.
Well — drat.
How will __call__()
know what class to build?
I could of course hard-code Distance
as the class to build,
but I was hoping that a single descriptor
could support all of my classes,
including Velocity
and Angle
.
It would be better if unit
could auto-discover
that Distance
was the class it was being asked to build.
But, alas, it doesn’t know its class!
Remember that the self
argument to __call__()
,
as you might have noticed in the earlier example,
is the unit
descriptor instance itself —
which makes sense,
but doesn’t help me discover
which class this unit
descriptor is attached to.
And here we run into a quirk of Python: methods don’t know which class they belong to. And decorators don’t have a way to get to the class either — the class object doesn’t even exist yet as the class’s namespace itself is running.
One common solution is to do post-processing on the class.
Then you can loop over its attributes,
figure out which ones are unit
descriptors,
and give them each an extra attribute, like _my_class
,
that tells them what class they belong to.
There are several well-known ways to post-process a Python class.
But then there would be a subtle problem.
Though it would probably never come up,
it’s technically incorrect for km()
to always return a Distance
,
because someone could subclass Distance
and then the km()
constructor, if called on their subclass,
really ought to return an instance of that subclass instead.
And — those approaches just feel complicated
compared to what we’re trying to do.
Is it really impossible for __call__()
to learn its class?
So I dug around a bit.
I added print()
statements in a variety of places.
And discovered a solution!
To my vast surprise, it turns out that the method call:
Distance.km()
— goes through, not one, but two of the methods
defined on the unit_descriptor
!
The steps Python take turn out to be:
Distance
— look up the object..km
— Grab the km
attribute.__get__(None, Distance)
and the resulting object is considered
the actual value of the attribute.()
— Finally, Python invokes that object’s __call__()
method.And while the knowledge ‘which class was this called on’ has evaporated
by the time __call__()
is invoked,
it’s available inside the ‘get’ dunder!
Recall that at this point we’re using a __get__()
method
that I copied-and-pasted from the Internet
without actually understanding what it does.
Let’s look at it again —
this time, pay attention to its first two lines:
def __get__(self, instance, objtype=None):
if instance is None:
return self
value = self.method(instance)
instance.__dict__[self.__name__] = value
return value
Yeah, I didn’t understand them at first either.
Why would instance
be None
?
Well, it turns out, that’s what tells the descriptor that you,
instead of asking for an instance attribute d.km
,
have asked for a class attribute Distance.km
.
And it just so happens that in that case,
objtype
is Distance
!
So we can get rid of __call__()
entirely!
We didn’t know it,
but it turns out that we were already intercepting
the Distance.km()
class method —
we just weren’t doing anything interesting yet.
Let’s fix that:
from functools import partial
def unit(conversion_factor, core_unit):
def wrap(method):
name = method.__name__
class unit_descriptor(object):
def __init__(self, method): # Same as before
self.method = method
update_wrapper(self, method)
def __get__(self, instance, objtype=None):
if instance is None: # New behavior:
return partial(constructor, objtype)
value = self.method(instance)
instance.__dict__[self.__name__] = value
return value
# New way to build a Distance:
def constructor(cls, value):
obj = cls.__new__(cls) # Make a new Distance
setattr(obj, name, value) # “Set .km to 217”
if conversion_factor is not None: # And set “.au”
value = value / conversion_factor
setattr(obj, core_unit, value)
return obj
return unit_descriptor(method)
return wrap
class Distance:
# ...
@unit(AU_KM, 'au')
def km(self):
return self.au * AU_KM
d = Distance.km(217)
print('au:', d.au)
print('km:', d.km)
au: 1.4505554055322529e-06
km: 217
Amazing! It works!
To avoid having to tell constructor()
too many new facts,
we go ahead and embed it inside the closure
so that it can get values like conversion_factor
and core_unit
for free.
The only thing it’s missing is the class it’s supposed to build,
whether that’s Distance
or Velocity
,
but we can provide that easily enough using a partial —
creating, in essence, a little hand-made bound method
that already knows its first argument.
(Yes, I could have instead used some black magic from Stack Overflow to create a real bound method. No, I am not in the least tempted to do so.)
The mission is now accomplished!
But code is kind of ugly: we’re building an inner class, inside of a function, inside of another function.
A final simplification fell out when I noticed that we were having to state twice, redundantly, the name of the unit and how to perform its conversion. Remember that the method we’re decorating still looks like this:
@unit(AU_KM, 'au')
def km(self):
return self.au * AU_KM
Why mention au
and AU_KM
twice?
That’s the kind of thing a descriptor is supposed to take care of for us!
So let’s abandon the complexity of wrapping a method,
abandon the need to have a closure,
and refactor back to a plain descriptor:
class unit(object):
def __init__(self, name, conversion_factor=None, core_unit=None):
self.name = name
self.conversion_factor = conversion_factor
self.core_unit = core_unit
def __get__(self, instance, objtype=None):
if instance is None: # If called as a class method:
def constructor(value): # (same as above)
obj = objtype.__new__(objtype)
setattr(obj, self.name, value)
conversion_factor = self.conversion_factor
if conversion_factor is not None:
value = value / conversion_factor
setattr(obj, self.core_unit, value)
return obj
return constructor
value = getattr(instance, self.core_unit)
value = value * self.conversion_factor
instance.__dict__[self.name] = value
return value
class Distance:
au = unit('au')
km = unit('km', AU_KM, 'au')
d = Distance.au(1.524)
print('au:', d.au)
print('km:', d.km)
print()
d = Distance.km(217)
print('au:', d.au)
print('km:', d.km)
au: 1.524
km: 227987154.9468
au: 1.4505554055322529e-06
km: 217
Well. What do you think?
Will I regret this later if I push this to production?
Again, replies to my tweet are welcome!