Tuesday, December 6, 2011

Writing 2.x & 3.x Compatible Code

While we're at the crossroads of the transition from Python 2 to 3, you may wonder whether it is possible to write code that runs without modification under both Python 2 and 3. It seems like a reasonable request, but how would you get started? What breaks the most Python 2 code when executed by a 3.x interpreter?

print vs. print()

If you think like me, you'd say the print statement. That's as good a place to start as any, so let's give it a shot. The tricky part is that in 2.x, print is a statement, and thus a keyword or reserved word, while in 3.x it's just a BIF (built-in function). In other words, because language syntax is involved, you cannot use if statements to choose between the two forms, and no, Python still doesn't have #ifdef macros!

Let's try just putting parentheses around the arguments to print:

>>> print('Hello World!')
Hello World!

Cool! That works under both Python 2 and Python 3! Are we done? Sorry, not yet.

>>> print(10, 20) # Python 2
(10, 20)

You're not as lucky this time: in Python 2, (10, 20) is a tuple that the print statement displays as such, while in Python 3 you're passing two separate arguments to print():

>>> print(10, 20) # Python 3
10 20
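One thing that follows from this: as long as you hand print exactly one expression, such as a string you have already formatted, the parentheses mean the same thing to both interpreters. The sketch below uses a hypothetical helper, show_pair(), that is only there to illustrate the point.

```python
def show_pair(a, b):
    # A single pre-formatted string in parentheses: Python 2 sees a
    # parenthesized expression (not a tuple), Python 3 sees one argument.
    print('%s %s' % (a, b))


show_pair(10, 20)   # prints: 10 20 under both 2.x and 3.x
```

The catch, of course, is that you have to build that single string yourself every time.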

Thinking a bit more, perhaps we can check whether print is a keyword. You may recall that the keyword module contains a list of keywords. Since print isn't a keyword in 3.x, you may think it can be as simple as this:

>>> import keyword
>>> 'print' in keyword.kwlist # Python 3
False

As a smart programmer, you'd probably try it in 2.x expecting a True response. Although you would be correct, you'd still fail for a different reason:

>>> import keyword
>>> if 'print' in keyword.kwlist:
...     from __future__ import print_function
...
  File "<stdin>", line 2
SyntaxError: from __future__ imports must occur at the beginning of the file

One solution that works is to use a function with capabilities similar to print. One option is sys.stdout.write(); another is distutils.log.warn(). For whatever reason, we decided to use the latter in many of this book's chapters. I suppose sys.stderr.write() will also work, if unbuffered output is your thing. (Placing from __future__ import print_function unconditionally at the very top of the file is another route, but it only works on 2.6 and newer.)

The "Hello World!" example would then look like this:

# Python 2.x
print 'Hello World!'

# Python 3.x
print('Hello World!')

The following lines work in both versions:

# Python 2.x & 3.x compatible
from distutils.log import warn as printf
printf('Hello World!')

That reminds me why we didn't use sys.stdout.write(): we would need to add a NEWLINE character at the end of the string to match print's behavior:

# Python 2.x & 3.x compatible
import sys
sys.stdout.write('Hello World!\n')

The real problem isn't this minor annoyance, but that these functions are not true proxies for print (or print(), for that matter): they only work once you've built a single string representing your output. Anything more complex requires more effort on your part.
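To get closer to a true proxy, you can write a small function yourself. The sketch below is a hypothetical helper (the name printf and its keyword handling are assumptions, not the book's code) that mimics the sep, end, and file arguments of Python 3's print() using only syntax that parses under both 2.x and 3.x. It avoids distutils, which was removed from the standard library in Python 3.12.

```python
import sys


def printf(*args, **kwargs):
    """Hypothetical stand-in for print() that parses under 2.x and 3.x.

    Keyword-only parameters are 3.x-only syntax, so sep/end/file are
    pulled out of **kwargs by hand instead.
    """
    sep = kwargs.pop('sep', ' ')
    end = kwargs.pop('end', '\n')
    stream = kwargs.pop('file', None)
    if stream is None:
        stream = sys.stdout        # looked up at call time, like print()
    if kwargs:
        raise TypeError('unexpected keyword arguments: %s' % ', '.join(kwargs))
    stream.write(sep.join(str(arg) for arg in args) + end)


printf('Hello World!')
printf(10, 20)              # same output as print(10, 20) in 3.x
printf('a', 'b', sep='-')   # a-b
```

It is still only a sketch: for example, str() on non-ASCII unicode objects will fail under 2.x.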

Import your way to a solution

In other situations, life is a bit easier, and you can just import the correct solution. In the code below, we want to import the urlopen() function. In Python 2, it lives in both urllib and urllib2 (we'll use the latter), and in Python 3, it has been integrated into urllib.request. The solution that works for both 2.x and 3.x is neat and simple in this case:

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
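When several names have moved around, this try/except pattern can be folded into a small helper. The sketch below is hypothetical (import_first is not part of any library) and relies on importlib.import_module, which exists in 2.7 and 3.x, so it won't help on older 2.x releases.

```python
import importlib


def import_first(*candidates):
    """Return the first attribute that imports cleanly.

    Each candidate is a (module_name, attribute_name) pair, tried in order.
    """
    for module_name, attr in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            continue               # module missing in this Python version
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError('none of %r could be imported' % (candidates,))


# Same effect as the try/except block above.
urlopen = import_first(('urllib2', 'urlopen'),
                       ('urllib.request', 'urlopen'))
```

For a single name, the plain try/except is arguably clearer; the helper only pays off once the list of moved names grows.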

For memory conservation, perhaps you're interested in the iterator (Python 3) version of a well-known built-in like zip(). In Python 2, the iterator version is itertools.izip(). In Python 3, it has been renamed to zip(), replacing the list-returning built-in, so if you insist on the iterator version, your import statement is also fairly straightforward:

try:
    from itertools import izip as zip
except ImportError:
    pass
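A quick way to confirm you got the lazy version is to pull a single item off with next(). This sketch assumes 2.6 or newer on the 2.x side, since that is where the next() built-in appeared; the sample lists are made up for illustration.

```python
try:
    from itertools import izip as zip   # Python 2: lazy zip
except ImportError:
    pass                                # Python 3: built-in zip() is already lazy

names = ['alpha', 'beta', 'gamma']
scores = [90, 85, 77]

pairs = zip(names, scores)   # an iterator in both versions, not a list
first = next(pairs)          # ('alpha', 90)
rest = list(pairs)           # consumes the remaining pairs
```

Because pairs is an iterator, it is exhausted after the list() call; wrap it in list() up front if you need to traverse the pairs more than once.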

One example that isn't as elegant looking is the StringIO class. In Python 2, the pure Python version is in the StringIO module, meaning you access it via StringIO.StringIO. There is also a C version for speed, located at cStringIO.StringIO. Depending on your Python installation, you may prefer cStringIO first and fall back to StringIO if cStringIO is not available.

In Python 3, Unicode is the default string type, but if you're doing any kind of networking, it's likely you'll have to manipulate ASCII/bytes strings instead, so instead of StringIO, you'd want io.BytesIO. In order to get what you want, the import is slightly uglier:

try:
    from io import BytesIO as StringIO
except ImportError:
    try:
        from cStringIO import StringIO
    except ImportError:
        from StringIO import StringIO
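Note that the io module, and therefore io.BytesIO, also exists in 2.6 and newer, so on those versions the first import succeeds and the cStringIO/StringIO fallbacks only come into play on 2.5 and older. Either way, the object you get works with byte strings, which is what urlopen().read() returns in 3.x. A minimal sketch, using a b'' literal (valid syntax from 2.6 on) as hypothetical stand-in network data:

```python
try:
    from io import BytesIO as StringIO    # 2.6+ and 3.x
except ImportError:
    try:
        from cStringIO import StringIO    # older 2.x, C version
    except ImportError:
        from StringIO import StringIO     # older 2.x, pure Python

# Hypothetical payload standing in for bytes read from urlopen() or a socket.
payload = b'line one\nline two\n'

f = StringIO(payload)
first_line = f.readline()   # bytes up to and including the first newline
remainder = f.read()        # everything after it
f.close()
```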

Putting it all together

If you're lucky, these are all the changes you have to make, and the rest of your code is simpler than the setup at the beginning. Once you have the imports above for distutils.log.warn() [as printf()], url*.urlopen(), and *.StringIO, plus a normal import of xml.etree.ElementTree (2.5 and newer), you can write a very short parser that displays the top headline stories from the Google News service in roughly eight lines of code:

g = urlopen('http://news.google.com/news?topic=h&output=rss')
f = StringIO(g.read())
g.close()
tree = xml.etree.ElementTree.parse(f)
f.close()
for elmt in tree.getiterator():
    if elmt.tag == 'title' and not \
            elmt.text.startswith('Top Stories'):
        printf('- %s' % elmt.text)

This script runs exactly the same under 2.x and 3.x with no changes to the code whatsoever. Of course, if you're using 2.4 and older, you need to download ElementTree separately.
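Two caveats for newer interpreters: ElementTree's getiterator() was removed in Python 3.9 (its replacement, iter(), is available from 2.7 on), and the live feed URL may no longer return the same RSS. The sketch below is a hedged variation of the loop that picks iter() when it exists and runs offline against a small made-up feed; SAMPLE_RSS and headlines() are hypothetical names, not from the book's goognewsrss.py.

```python
import xml.etree.ElementTree as ET
from io import BytesIO

# Hypothetical sample feed standing in for the live Google News response.
SAMPLE_RSS = b"""<?xml version="1.0"?>
<rss version="2.0"><channel>
<title>Top Stories - Google News</title>
<item><title>First headline</title></item>
<item><title>Second headline</title></item>
</channel></rss>"""


def headlines(data):
    tree = ET.parse(BytesIO(data))
    # Prefer iter() (2.7+, 3.x); fall back to getiterator() on older 2.x.
    walk = getattr(tree, 'iter', None) or tree.getiterator
    return [elmt.text for elmt in walk()
            if elmt.tag == 'title' and elmt.text
            and not elmt.text.startswith('Top Stories')]


for title in headlines(SAMPLE_RSS):
    print('- %s' % title)   # one string argument, so 2.x and 3.x agree
```

The extra elmt.text check guards against empty title elements, which would otherwise make startswith() fail on None.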

The code snippets in this subsection come from the "Text Processing" chapter of the book, so take a look at the goognewsrss.py file to see the full version in action.

Some will feel that these changes really start to mess up the elegance of your Python source. After all, readability counts! If you prefer to keep your code cleaner yet still have it run under both versions without changes, take a look at the six package.

six is a compatibility library whose primary role is to provide an interface that keeps your application code the same while hiding from the developer the complexities described in this appendix subsection. To find out more about six, read this: http://packages.python.org/six

Regardless of whether you use a library like six or roll your own, we hope this short narrative shows that it is possible to write code that runs under both 2.x & 3.x. The bottom line is that you may have to sacrifice some of Python's elegance and simplicity, trading it for true 2-to-3 portability. I'm sure we'll be revisiting this issue for the next few years until the whole world has completed the transition to the next generation.
