By Jake Edge
June 2, 2016
Amber Brown led a session at the 2016 Python Language Summit on the progress in porting the Twisted event-driven networking framework to Python 3. The lack of a Python 3 version of Twisted has been considered one of the larger barriers to adopting the new version of the language, so progress on that front is of great interest in the Python community.
According to Brown, roughly 50% of the lines of code in Twisted have been ported at this point. But, of things that people are likely to want to use, that number is more like 78%, she said. Users who want to write their own Network News Transfer Protocol (NNTP) server will be out of luck, but for most of the commonly used protocols, Twisted will run on Python 3.
That amounts to some 40,000 lines of code ported, which in turn allows roughly 100,000 lines of code in third-party libraries to run on Python 3. The 40 or so patches merged for the port had roughly 6500 lines inserted and 4400 lines deleted.
In the seven years since Python 3 came out, there has been little progress in porting Twisted until recently; she asked, "why has it taken so long?" The bytes versus Unicode divide was one of the major barriers and early releases in the 3.x series did not have support for byte-handling features that Twisted really needs. The change to how strings are handled was good, and cleaned up a lot of ambiguity, she said. But Twisted deals with protocols on the wire, so it needs to use byte strings.
Early Python 3 releases lacked explicit Unicode strings using the u'' notation (it was restored in Python 3.3), while Python 2 releases before 2.6 are missing b'' for byte strings. In addition, until relatively recent Python 3 versions, there was no way to use the "%" formatting operator for byte strings. PEP 461 was adopted to add formatting for byte strings at the behest of the Twisted and Mercurial projects; it arrived in Python 3.5. But there is no .format() method for byte strings, so all of the Python 2.7 Twisted code that uses it has to change.
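As a rough illustration of the formatting point, here is a minimal sketch assuming Python 3.5 or later, where PEP 461 is available. The function name build_status_line and the HTTP-style status line are hypothetical examples chosen for this sketch, not code from Twisted.

```python
def build_status_line(code, reason):
    """Build an HTTP-style status line as bytes using %-formatting (PEP 461)."""
    # %d renders an integer as ASCII digits; %b inserts a bytes object unchanged.
    return b"HTTP/1.1 %d %b\r\n" % (code, reason)


# Hypothetical usage: the result stays bytes from start to finish.
status_line = build_status_line(200, b"OK")

# On Python 3, bytes objects do not have a .format() method, so code that
# relied on "...".format() for wire data has to be rewritten in this style.
bytes_has_format = hasattr(b"", "format")
```

The %b conversion only accepts bytes-like objects, so passing a text string as the reason would raise a TypeError rather than silently mixing the two types.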
That leads to more time spent porting and more code to review, she said. There are effectively three string types in the Python world: bytes, Unicode strings, and strings. And there are inconsistencies among them. For example, sys.path holds bytes on Unix, but strings on other operating systems. In addition, cgi.parse_multipart() returns strings on Python 3, which is just wrong.
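One common way to cope with that kind of inconsistency is to normalize values at the point where they reach the wire. The following is a minimal sketch assuming Python 3; the helper to_wire_bytes is a hypothetical name invented for this illustration and is not part of Twisted or the standard library.

```python
def to_wire_bytes(value, encoding="utf-8"):
    """Return value as bytes, encoding it first if it is a text string."""
    if isinstance(value, bytes):
        # Already wire-ready; pass it through untouched.
        return value
    if isinstance(value, str):
        # Text has to be encoded explicitly before it can be sent.
        return value.encode(encoding)
    raise TypeError("expected bytes or str, got %s" % type(value).__name__)


# Hypothetical usage: callers may hand over either type for each part.
header = to_wire_bytes("Host") + b": " + to_wire_bytes(b"example.org")
```

Raising on anything other than bytes or text is a deliberate choice in this sketch: it surfaces mixed-type bugs at the boundary instead of deeper in the protocol code.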
There is an "avalanche of changes" that comes from porting to Python 3, she said. New-style classes by default broke a lot of things, as did the differences between bound and unbound methods. But, she said, those in the room are all aware of these problems.
Porting to Python 3 was "the most expensive thing we have ever done for Twisted", she said. On average, she spent two hours a day for a year and a half working on it. That cost upwards of $60,000 just for her time, most of it unpaid. That doesn’t include lots of time spent by others, including reviewers, and there are "thousands of hours left to go". She is now down deep into porting the protocols in Twisted, which is the harder part.
The "unfortunate reality" is that if she didn’t do that work, it would not have happened. Other Twisted developers had written off porting to Python 3. Earlier versions in the 3.x series would have made the job too large, she said; it is only since Python 3.3 was released that porting has become tractable.
But porting to Python 3 has been a "massive drain" on the development of features in Twisted. Half of the patches in the review queue are for the port. As with most projects, reviewers are a scarce resource, and the port patches require a lot of care and knowledge of the problem domain.
That leads to the question of who is using Python 3. The reality is that Python is falling by the wayside for performance-sensitive applications, she said. People are turning to Go or other options. And Twisted on Python 3 is a less attractive target for developers than Twisted on PyPy 5.1—because of the performance.
So, Twisted has spent an enormous amount of time changing its codebase to end up with slower code. PyPy and Pyston make Python competitive with Go in terms of performance, but only really support Python 2.7 at this point. There are some 3,500 C API functions in Python, which is a huge barrier for projects like PyPy and Pyston. She asked: "How do we stop this from happening again?" Long term, it may well be that asyncio (along with async/await) will provide much of the functionality of the Twisted core.
Guido van Rossum asked about interoperability between Twisted and asyncio. Brown said that a Twisted Deferred can be awaited, so mixing the two is possible. Twisted will be able to share its event loop with that used by asyncio, she said. "The golden age of Twisted and asyncio is 2016", she said to a round of applause. There are still some patches to be merged and some edge cases to be worked out, but there is enough of Twisted working for Python 3 that it can be done.
Brown said she wondered what users with large Python 2.7 codebases would do in 2020 when 2.7 is deprecated and no longer gets updates. She thinks they will simply keep running it. Van Rossum said "that’s fine", but that they won’t get updates. For Twisted, though, Brown thinks the project will probably end up supporting Twisted on 2.7 for five years after users can realistically port to Python 3, which probably means 2022 or beyond.