The Descriptor Protocol and Python Black Magic


April 26, 2016

Late last night, I saw a very confusing tweet:

    class A:
        def b(self):
            pass

    if A.b is A.b:
        print("Python 3")
    else:
        print("Python 2")

— Jake VanderPlas (@jakevdp) April 26, 2016

Like any self-respecting programmer, the first thing I did was copy and paste that into a terminal, even though I knew exactly what to expect. My response was simple:

@jakevdp what is this black magic!?!?— Avy Faingezicht (@avyfain) April 26, 2016

Since I graduated last summer, I have been writing lots of both Python 2 and 3. This snippet seemed like something I should understand well. However, I did not, so this post is an attempt to fix that. I was inspired by Julia Evans and her campaign to share the things she learns, however incomplete her understanding might be.

This post assumes you have at least a basic understanding of Python and OOP. For a good overview of OOP in Python, I recommend Leonardo Giordani’s series, which builds up nicely from simple concepts to the internals of Python classes (he also has one on Python 2.x, although I haven’t read it closely).

So, what is this black magic?

My first instinct was to check the behavior of the comparison itself. While == delegates to an object’s __eq__ method to check equality, the is operator checks identity, so in Python 2 the two A.b expressions can’t be producing the same object in memory!
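To make the equality versus identity distinction concrete, here is a minimal sketch. The Point class is a hypothetical example and does not come from the original tweet; it only exists to show == going through __eq__ while is compares object identity.

```python
class Point:
    """Tiny value class whose equality is defined by its coordinates."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # == calls this method; it compares contents, not memory addresses.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
q = Point(1, 2)
alias = p

equal_but_distinct = (p == q, p is q)    # equal contents, two separate objects
same_object = (p == alias, p is alias)   # one object reachable under two names
print(equal_but_distinct, same_object)   # (True, False) (True, True)
```

So when A.b is A.b evaluates to False, the two attribute accesses must have produced two separate objects, whatever those objects contain.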

    # Python 2
    >>> A.b
    <unbound method A.b>
    >>> hex(id(A.b))
    '0x1006ebc80'
    >>> hex(id(A.b))
    '0x1006beb90'

    # Python 3
    >>> A.b
    <function A.b at 0x101b75158>
    >>> hex(id(A.b))
    '0x101b75158'
    >>> hex(id(A.b))
    '0x101b75158'

As expected! The memory locations (as given by id) in Python 2 are different, causing the identity check to fail. Not so in 3. So far so good. But why do we get an unbound method on one end and a function on the other? How are these objects even stored internally? In most cases, Python uses a dictionary, accessible as __dict__, to store the attributes of an object, that is, its namespace (note that not all objects have a __dict__, but that is a different story). Let’s look up b in A:

    # Python 2
    >>> A.__dict__['b']
    <function b at 0x1007a8398>
    >>> type(A.__dict__['b'])
    <type 'function'>
    >>> type(A.b)
    <type 'instancemethod'>

    # Python 3
    >>> A.__dict__['b']
    <function A.b at 0x101b75158>
    >>> type(A.__dict__['b'])
    <class 'function'>
    >>> type(A.b)
    <class 'function'>

Huh? In 2 we get an instancemethod, while 3 spits out a function, but if we check the type inside the enclosing __dict__ we see they are both functions? How does this work? This is caused by the design of the Descriptor Protocol, which defines how attribute access can be customized: when a lookup finds an object on the class that defines a __get__ method, Python calls that method and returns its result instead of the stored object. In Python 2, the protocol sets in place a type distinction based on how the function object is accessed. In the Descriptor HowTo Guide, Raymond Hettinger explains:

    # Python 2
    >>> class D(object):
    ...     def f(self, x):
    ...         return x
    ...
    >>> d = D()
    >>> D.__dict__['f']  # Stored internally as a function
    <function f at 0x00C45070>
    >>> D.f              # Get from a class becomes an unbound method
    <unbound method D.f>
    >>> d.f              # Get from an instance becomes a bound method
    <bound method D.f of <__main__.D object at 0x00B18C90>>

In 3, this distinction between bound and unbound methods doesn’t exist. Strangely, the docs for Python 3 are not up to date, so I can’t tell from them what the underlying behavior is. The same code clearly produces different output:

    # Python 3
    >>> class D(object):
    ...     def f(self, x):
    ...         return x
    ...
    >>> d = D()
    >>> D.__dict__['f']  # Stored internally as a function
    <function D.f at 0x1014021e0>
    >>> D.f              # Get from a class becomes an unbound method... NOT!
    <function D.f at 0x1014021e0>
    >>> d.f              # Get from an instance becomes a bound method
    <bound method D.f of <__main__.D object at 0x10123cf28>>
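The Python 3 session above can be reproduced by calling the function's __get__ method by hand, which is roughly what attribute lookup does on our behalf. This is a sketch that assumes CPython 3; the variable names raw, via_class and via_instance are mine, not part of any API.

```python
class D:
    def f(self, x):
        return x


d = D()
raw = D.__dict__['f']             # the plain function stored on the class

# Class access (D.f) amounts to raw.__get__(None, D).
# On Python 3 this hands back the function itself.
via_class = raw.__get__(None, D)

# Instance access (d.f) amounts to raw.__get__(d, D).
# This builds a new bound method object around the function.
via_instance = raw.__get__(d, D)

print(via_class is raw)               # True
print(type(via_instance).__name__)    # method
print(via_instance.__func__ is raw)   # True
print(via_instance(42))               # 42
```

Going through D.__dict__ skips the protocol entirely, which is why the dictionary always shows a plain function no matter which Python version is running.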

Also explained in the documentation is the fact that both bound and unbound methods are backed by the same C implementation; the only difference is the value of their im_self field, which is NULL when unbound. So I am guessing that in 2 every attribute access creates a new instancemethod object wrapping the same function, regardless of whether it is bound or unbound, while in 3 a new method object is only created when the access goes through an instance, given that unbound methods don’t exist. This would make sense, as the function’s __get__ method has to be called each time you access the attribute.

If that were the case, we would expect that accessing b on an instance of A would always return a different object, regardless of which Python runtime we’re on, as those accesses are always bound:

    # Python 2 & 3
    >>> a = A()
    >>> a.b is a.b
    False
    >>> hex(id(a.b))
    '0x1003bf988'
    >>> hex(id(a.b))
    '0x1003f1448'
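A small Python 3 sketch of what those fresh bound methods have in common. The names m1 and m2 are mine; holding both methods in variables keeps them alive at the same time, which avoids the case where CPython frees a temporary and reuses its address, making two separate id() calls print the same number.

```python
class A:
    def b(self):
        pass


a = A()
m1, m2 = a.b, a.b            # two accesses, both kept alive

print(m1 is m2)              # False: two separate bound method objects
print(m1 == m2)              # True: same function bound to the same instance
print(m1.__func__ is A.b)    # True on Python 3: both wrap the one function
print(m1.__self__ is a)      # True: the instance the method is bound to
```

So the bound methods are distinct wrappers, but each one points back at the same function and the same instance.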

So, the reason why A.b is A.b holds in Python 3 but not in Python 2 is this whole bound/unbound story. Seems like the Descriptor Protocol is responsible for this sorcery! Magic is just technology we don’t understand, yet.
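To see that nothing beyond the protocol is needed, here is a rough pure-Python imitation of the Python 3 behavior. FunctionLike is a hypothetical class written for this illustration; it is not how CPython implements functions, only a sketch of the same __get__ logic built on types.MethodType.

```python
import types


class FunctionLike:
    """Hypothetical stand-in for a function object, for illustration only."""

    def __init__(self, func):
        self.func = func

    def __call__(self, *args, **kwargs):
        return self.func(*args, **kwargs)

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: hand back the very same object,
            # as Python 3 does for plain functions.
            return self
        # Accessed on an instance: build a new bound method on every access.
        return types.MethodType(self, obj)


class A:
    @FunctionLike
    def b(self):
        return "called with " + type(self).__name__


a = A()
print(A.b is A.b)   # True: class access returns the stored object
print(a.b is a.b)   # False: each instance access creates a new method
print(a.b())        # called with A
```

Imitating Python 2 would only require changing the obj is None branch to return a new wrapper object as well, which is the whole difference the tweet relies on.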

If you have more insight into the inner workings of this, I’d love to hear about it.

Update (4/26/16): Jake VanderPlas replied to my tweet, and pointed to a 2009 post by Guido describing the behavior. Apparently, the bound/unbound distinction was introduced as a way to achieve “first-class everything,” which methods didn’t quite fit into. Python 3’s undoing of unbound methods is just a further expression of the idea.

Update 2 (4/29/16): Today I received an email from Todd Jennings, who pointed me to the bug that tracks the out-of-date documentation for Python 3. Sadly, it is marked as still waiting.

Image: “The Witch No. 1” by Baker, Joseph E. – Licensed under Public Domain, via Wikimedia Commons