Saturday, September 20, 2014

Simple Google API access from Python

Introduction

Back in 2012 when I published Core Python Applications Programming, 3rd ed., I posted about how I integrated Google technologies into the book. The only problem is that I presented very specific code for Google App Engine and Google+ only. I didn't show a generic way to access any number of Google APIs using pretty much the same boilerplate Python snippet; so here we are.

In this multi-part series, I'll break down the code that allows you to leverage Google APIs to the most basic level (even for Python), so you can customize as necessary for your app, whether it's running as a command-line tool or something server-side in the cloud backending Web or mobile clients. If you've got the book and played around with our Google+ API example, you'll find this code familiar, if not identical -- I'll go into more detail here, highlighting the common code for generic API access and then bring in the G+-relevant code later.

We'll start in this first post by demonstrating how to access public or unauthorized data from Google APIs. (The next post will illustrate how to access authorized data from Google APIs.) Regardless of which you use, the corresponding boilerplate code stands alone. In fact, it's probably best if you save these generic snippets in a library module so you can (re)use the same bits for any number of apps which access any number of modern Google APIs.

Google API access

In order to access Google APIs, follow these instructions:
  • Go to the Google Developers Console and log in.
    • Use your Gmail or Google credentials; create an account if needed
  • Click "Create Project" button
    • Enter a Project Name (mutable, human-friendly string only used in the console)
    • Enter a Project ID (immutable, must be unique and not already taken)
  • Once project has been created, click "Enable an API" button
    • You can toggle on any API(s) that support(s) simple API access (not authorized).
    • For the code example below, we use the Google+ API.
    • Other ideas: YouTube Data API, Google Maps API, etc.
    • Find more APIs (and version#s which you need) at the OAuth Playground.
  • Select "Credentials" in left-nav under "APIs & auth"
    • Go to bottom half and click "Create new Key" button
    • Grab long "API KEY" cryptic string and save to Python script

Accessing Google APIs from Python

Now that you're set up, everything else is done on the Python side. To talk to a Google API, you need the Google APIs Client Library for Python, specifically the apiclient.discovery.build() function. Download and install the library in your usual way, for example:

$ pip install -U google-api-python-client
NOTE: If you're building a Python App Engine app, you'll need something else, the Google APIs Client Library for Python on Google App Engine. It's similar but has extra goodies (specifically decorators -- brief generic intro to those in my previous post) just for cloud developers that must be installed elsewhere. As App Engine developers know, libraries must be in the same location on the filesystem as your source code.
Once everything is installed, make sure that you can import apiclient.discovery:

$ python
Python 2.7.6 (default, Apr  9 2014, 11:48:52)
[GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import apiclient.discovery
>>>

In discovery.py is the build() function, which is what we need to create a service endpoint for interacting with an API. Now craft the following lines of code in your command-line tool:

from apiclient.discovery import build

API_KEY = 'YOUR_API_KEY'  # placeholder; replace with the key copied from project credentials page
SERVICE = build(API, VERSION, developerKey=API_KEY)

Take the API KEY you copied from the credentials page and assign it to the API_KEY variable as a string. Obviously, embedding an API key in source code isn't something you'd do in practice as it's not secure whatsoever -- stick it in a database, key broker, encrypt, or at least have it in a separate byte code (.pyc/.pyo) file that you import -- but we'll allow it now solely for illustrative purposes of a simple command-line script.
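Here's a minimal sketch of one way to keep the key out of your source file, assuming you've exported it in an environment variable beforehand. Both the variable name GOOGLE_API_KEY and the load_api_key() helper are made up for this illustration; neither is part of the client library, and an environment variable is only one option alongside the ones mentioned above.

```python
import os

def load_api_key(var_name='GOOGLE_API_KEY'):
    """Return the API key stored in the environment variable `var_name`.

    The default variable name is a hypothetical choice for this sketch.
    """
    key = os.environ.get(var_name)
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError('Set the %s environment variable first' % var_name)
    return key
```

In your script you would then write API_KEY = load_api_key() in place of the hardcoded string.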

In our short example we're going to do a simple search for "python" in public Google+ posts, so for the API variable, use the string 'plus'. The API version is currently on version 1 (at the time of this writing), so use 'v1' for VERSION. (Each API will use a different name and version string... again, you can find those in the OAuth Playground or in the docs for the specific API you want to use.) Here's the call once we've filled in those variables:

SERVICE = build('plus', 'v1', developerKey=API_KEY)

We need a template for the results that come back. There are many fields in a Google+ post, so we're only going to pick three to display... the user name, post timestamp, and a snippet of the post itself:

TMPL = '''
    User: %s
    Date: %s
    Post: %s
'''

Now for the code. Google+ posts are activities (known as "notes;" there are other activities as well). One of the methods you have access to is search(), which lets you query public activities; so that's what we're going to use. Add the following call using the SERVICE endpoint you already created using the verbs we just described and execute it:

items = SERVICE.activities().search(query='python').execute().get('items', [])

If all goes well, the (JSON) response payload will contain a set of 'items' (else we assign an empty list for the for loop). From there, we'll loop through each matching post, do some minor string manipulation to replace all whitespace characters (including NEWLINEs [ \n ]) with spaces, and display if not blank:

for data in items:
    post = ' '.join(data['title'].strip().split())
    if post:
        print TMPL % (data['actor']['displayName'],
                      data['published'], post)

We're using the print statement here in Python 2, but a pro tip to start getting ready for Python 3 is to add this import to the top of your script (which has no effect in 3.x) so you can use the print() function instead:

from __future__ import print_function
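If you'd like to try the whitespace cleanup from the loop above without making an API call, here's a small sketch. The SAMPLE list is made-up data that only mimics the three fields we display (it is not real API output), and format_items() is a hypothetical helper name. Calling print() with a single string argument behaves the same in Python 2 and 3, so no extra import is needed here.

```python
TMPL = '''
    User: %s
    Date: %s
    Post: %s
'''

def format_items(items):
    """Return one formatted string per non-blank post in `items`."""
    lines = []
    for data in items:
        # split() with no arguments breaks on any run of whitespace, so
        # tabs and newlines collapse into single spaces after join()
        post = ' '.join(data['title'].strip().split())
        if post:
            lines.append(TMPL % (data['actor']['displayName'],
                                 data['published'], post))
    return lines

# Made-up sample shaped like the fields used above, not real API output
SAMPLE = [
    {'title': '  Learning\n\tPython  today ',
     'actor': {'displayName': 'Example User'},
     'published': '2014-09-20T00:00:00.000Z'},
    {'title': ' \n ',  # blank after cleanup, so it gets skipped
     'actor': {'displayName': 'Blank Poster'},
     'published': '2014-09-20T00:00:01.000Z'},
]

for line in format_items(SAMPLE):
    print(line)
```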

Conclusion

To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs. Below is the entire script missing only the API_KEY which you'll have to fill in yourself.

#!/usr/bin/env python

from apiclient.discovery import build

TMPL = '''
    User: %s
    Date: %s
    Post: %s
'''

API_KEY = 'YOUR_API_KEY'  # placeholder; replace with the key copied from project credentials page
SERVICE = build('plus', 'v1', developerKey=API_KEY)
items = SERVICE.activities().search(query='python').execute().get('items', [])
for data in items:
    post = ' '.join(data['title'].strip().split())
    if post:
        print TMPL % (data['actor']['displayName'],
                      data['published'], post)

When you run it, you should see pretty much what you'd expect, a few posts on Python, some on Monty Python, and of course, some on the snake -- I called my script plus_u.py for "Google+ unauthenticated:"

$ python plus_u.py 

    User: Jeff Ward
    Date: 2014-09-20T18:08:23.058Z
    Post: How to make python accessible in the command window.


    User: Fayland Lam
    Date: 2014-09-20T16:40:11.512Z
    Post: Data Engineer http://findmjob.com/job/AB7ZKitA5BGYyW1oAlQ0Fw/Data-Engineer.html #python #hadoop #jobs...


    User: Willy's Emporium LTD
    Date: 2014-09-20T16:19:33.851Z
    Post: MONTY PYTHON QUOTES MUG Take a swig to wash down all that albatross and crunchy frog. Featuring 20 ...


    User: Doddy Pal
    Date: 2014-09-20T15:49:54.405Z
    Post: Classic Monty Python!!!


    User: Sebastian Huskins
    Date: 2014-09-20T15:33:00.707Z
    Post: Made a small python script to get shellcode out of an executable. I found a nice commandlinefu.com oneline...

EXTRA CREDIT: To test your skills, check the docs and add a fourth line to each output which is the URL/link to that specific post, so that you (and your users) can open a browser to it if of interest.

If you want to build on from here, check out the larger app using the Google+ API featured in Chapter 15 of the book -- it adds some brains to this basic code where the Google+ posts are sorted by popularity using a "chatter" score. That just about wraps it up this post... tune in next time to see how to get authorized data access!

Saturday, July 26, 2014

Introduction to Python decorators

In this post, we're going to give you a user-friendly introduction to Python decorators. (The code works on both Python 2 [2.6 or 2.7 only] and 3 so don't be concerned with your version.) Before jumping into the topic du jour, consider the usefulness of the map() function. You've got a list with some data and want to apply some function [like times2() below] to all its elements and get a new list with the modified data:

def times2(x):
    return x * 2

>>> list(map(times2, [0, 1, 2, 3, 4]))
[0, 2, 4, 6, 8]

Yeah yeah, I know that you can do the same thing with a list comprehension or generator expression, but my point was about an independent piece of logic [like times2()] and mapping that function across a data set ([0, 1, 2, 3, 4]) to generate a new data set ([0, 2, 4, 6, 8]). However, since mapping functions like times2() aren't tied to any particular chunk of data, you can reuse them elsewhere with other unrelated (or related) data.
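For comparison, here's a quick sketch of those alternatives side by side, plus times2() reused on unrelated data. The variable names are just for illustration.

```python
def times2(x):
    return x * 2

data = [0, 1, 2, 3, 4]

mapped = list(map(times2, data))          # map() as shown above
listcomp = [times2(x) for x in data]      # list comprehension
genexp = list(times2(x) for x in data)    # generator expression, then list()

# times2() isn't tied to integers; here it's mapped across strings
reused = list(map(times2, ['ab', 'cd']))
```

All three approaches produce the same list; the reusable part is times2() itself.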

Along similar lines, consider function calls. You have independent functions and methods in classes. Now, think about "mapped" execution across functions. What are things that you can do with functions that don't have much to do with the behavior of the functions themselves? How about logging function calls, timing them, or some other introspective, cross-cutting behavior? Sure, you can implement that behavior in each function for which you care about such information; however, since the behavior is so generic, it would be nice to write that logging code just once.

Introduced in 2.4, decorators modularize cross-cutting behavior so that developers don't have to implement near duplicates of the same piece of code for each function. Rather, Python gives them the ability to put that logic in one place and use decorators with its at-sign ("@") syntax to "map" that behavior to any function (or method). This compartmentalization of cross-cutting functionality gives Python an aspect-oriented programming flavor.

How do you do this in Python? Let's take a look at a simple example, the logging of function calls. Create a decorator function that takes a function object as its sole argument, and implement the cross-cutting functionality. In logged() below, we're just going to log function calls by making a call to the print() function each time a logged function is called.

def logged(_func):
    def _wrapped():
        print('Function %r called at: %s' % (
            _func.__name__, ctime()))
        return _func()
    return _wrapped

In logged(), we use the function's name (given by _func.__name__) plus a timestamp from time.ctime() to build our output string. Make sure you get the right imports, time.ctime() for sure, and if using Python 2, the print() function:

from __future__ import print_function # 2.6 or 2.7 only
from time import ctime

Now that we have our logged() decorator, how do we use it? On the line above the function which you want to apply the decorator to, place an at-sign in front of the decorator name. That's followed immediately on the next line with the normal function declaration. Here's what it looks like, applied to a boring generic foo() function which just print()s it's been called.

@logged
def foo():
    print('foo() called')

When you call foo(), you can see that the decorator logged() is called first, which then calls foo() on your behalf:

$ log_func.py
Function 'foo' called at: Sun Jul 27 04:09:37 2014
foo() called

If you take a closer look at logged() above, the way the decorator works is that the decorated function is "wrapped": it is passed as _func to the decorator, then the newly-wrapped function _wrapped() is (re)assigned as foo(). That's why it now behaves the way it does when you call it.

The entire script:

#!/usr/bin/env python
'log_func.py -- demo of decorators'

from __future__ import print_function # 2.6 or 2.7 only
from time import ctime

def logged(_func):
    def _wrapped():
        print('Function %r called at: %s' % (
              _func.__name__, ctime()))
        return _func()
    return _wrapped

@logged
def foo():
    print('foo() called')

foo()


That was just a simple example to give you an idea of what decorators are. If you dig a little deeper, you'll discover one caveat is that the wrapping isn't perfect. For example, the attributes of foo() are lost, i.e., its name and docstring. If you ask for either, you'll get _wrapped()'s info instead:

>>> print("My name:", foo.__name__) # should be 'foo'!
My name: _wrapped
>>> print("Docstring:", foo.__doc__) # _wrapped's docstring!
Docstring: None

In reality, the "@" syntax is just a shortcut. Here's what you really did, which should explain this behavior:

def foo():
    print('foo() called')

foo = logged(foo) # returns _wrapped (and its attributes)

So as you can tell, it's not a complete wrap. A convenience function that ties up these loose ends is functools.wraps(). If you use it and run the same code, you will get foo()'s info. However, if you're not going to use a function's attributes while it's wrapped, it's less important to do this.
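Here's a minimal sketch of logged() with functools.wraps() applied to the inner function. The docstring on foo() is added only so there is something to check; everything else follows the script above. The single-argument print() calls work the same with or without the __future__ import.

```python
from functools import wraps
from time import ctime

def logged(_func):
    @wraps(_func)   # copies _func's name and docstring onto _wrapped
    def _wrapped():
        print('Function %r called at: %s' % (_func.__name__, ctime()))
        return _func()
    return _wrapped

@logged
def foo():
    'foo() docstring'
    print('foo() called')
```

With this version, foo.__name__ is 'foo' and foo.__doc__ is foo()'s own docstring rather than _wrapped()'s.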

There's also support for additional features, such as calling decorated functions with parameters, applying more complex decorators, applying multiple levels of decorators, and also class decorators. You can find out more about (function and method) decorators in Chapter 11 of Core Python Programming or live in my upcoming course which starts in just a few days near the San Francisco airport... there are still a few seats left!

Wednesday, September 4, 2013

Learning Programming

Two years ago, I wrote a post on "learning Python" to launch this blog dedicated to Python. While useful, it doesn't address beginners' needs as much, so it's time for a revisit. Because Python is such a user-friendly language for beginners, I'm often asked whether Python is the "best first language" for those new to programming. While tempted to respond in the affirmative, my answer really is, "it depends." It depends on your experience, age, level of exposure, etc.

Yes, there are indeed plenty of resources out there, such as courses from online learning brands such as Khan Academy, Udacity, Coursera, Codecademy, CodeSchool, and edX, but most certainly don't come with an instructor, instead relying on live or recorded videos and possibly supplemental study groups, or "cohort learning," as a colleague of mine has branded it. Whatever the mechanism, it's surely better than pure online tutorials or slaving away over a book, neither of which come with instructors either.

Stepping back a bit, before jumping into hardcore C/C++, Java, PHP, Ruby, or Javascript lessons to learn tools that are used in industry today, there are better stepping stones to get you there. You may be a kid or a professional who either doesn't code much or did so long ago. Say you're the type of user who is "insulted" by the move "left" or "right" commands for controlling a turtle and desires something more complex. The good news is that there are more tools out there that allow you to venture further without an instructor.

One of them is Scratch, a "jigsaw puzzle"-like programming language created at MIT. Yes, you will do left, right, up, down, etc., but you'll also get to play audio, video, repeat commands, draw graphics, and make sounds. This tool is great for teaching young learners, who don't need any of the advanced features, though those are available when they're ready to take the next step. It can be used to teach children the concepts of programming without all the syntax of text-based programming languages, which can make learning those concepts a burden.

If you wish to proceed, go to the website to get started. They've got videos there as well as projects you can copy. As you can see, you snap together puzzle pieces that teach you coding. Better yet, to get started even more quickly, clone one (or more) of the projects, and "tweak" the code a bit to "do your own thing." In time, you may even develop your own fun applications or real games. Another similar graphical learning tool to consider is Alice from the University of Virginia and now Carnegie-Mellon University.

Once you're comfortable with that type of working environment, there's a similar tool from MIT called App Inventor. Leverage your Scratch skills and start building applications that run on Android devices! There's an emulator, so you don't really need an Android device, but it's certainly more rewarding when you can use an app that you built running on a tablet or phone! (Try a family friend who may have an old device they don't use any more.)

Once you're ready to move beyond block-like languages, there are 2 good choices (or better yet, do both!). One is the de facto language of the web: Javascript. Unfortunately, there are so many online tutorials out there that I wouldn't know which to suggest, so I'm looking forward to your comments below. The ones which are the most effective, however, have you learning then coding directly in the browser and seeing results immediately, requiring you to write successful Javascript before allowing you to move on.

The thing about Javascript is that code typically only runs within the browser, to control web pages (i.e., "DOM manipulation") and actions you can take on a single page -- it can also be AJAX code that makes an external call to update a page without requiring a page load. Nevertheless, browser-only execution can be somewhat limiting, so there are now 2 additional ways you can use it.

One is to write "server-side" applications via Node.js. This type of Javascript allows you to write code that executes on the remote machine serving your web pages (generally) after you've entered information in a form and clicked submit. For every web page that users see and interact with, there's also got to be code on the server side that does all the work! This code will also end up returning the final HTML that users see in their browsers once the form has been submitted and results returned.

Another place you can use Javascript is in Google's cloud. The tool there is called Google Apps Script. Using Apps Script, you can create applications that interact with various Google Apps, automate repetitive tasks, or write glue code that lets you connect and share data between different Google services. Try some of their tutorials to get started!

The other option besides Javascript is Python. No doubt you already know what it is since you're here. Python's syntax is extremely approachable for beginners and is widely considered "executable pseudocode." That's right, a programming language that doesn't require you have a Computer Science degree to make good use of it! It's also one of those rare languages that can be used by adults in the professional world as well as by kids learning how to code. Sure there are many online learning systems out there, a sampling of which are here:
See if you like any of them or have your new coder friends try them out. However, I think kids (and even adults) learn programming best when they get to write cool games (leveraging the amazing PyGame library). There are several books written just for kids, including "Hello World" which was actually written by an engineer and his son! Along with that book there are two more you should consider:
Two of the three books above are in the beginners list I created over a year ago along with two other Python reading lists in this post. (The third book should be added to the list as well.) Those of you who are already programmers probably know which one I would recommend. :-) Seriously though, those reading lists show that I can toot other horns too. :P

Here are other online projects and learning resources, including book websites, that you can also try (many are for kids):
In conjunction with a good learning system, book, or project-based learning above, you should also try out one of many free online courses to validate things you've picked up but to also build other knowledge you haven't learned yet. There are a pair from Coursera and one from Udacity:
For existing programmers who are still questioning why Python, check out Udacity's motivational blogpost.

That's it! Hopefully I've given you enough resources you can pass along to friends and family members who are intrigued by your passion for computer programming and wish to see what all the excitement is about. A young man I met on vacation this summer motivated this post... good luck Mitchell! I hope to see the rest of you on the road as well, perhaps at a developers' conference or sitting in one of my upcoming Python courses!

Tuesday, May 29, 2012

Tuples aren't what you think they're for

While I'm happy that the number of Python users continues to grow at a rapid pace and that there are many tutorials added each day to support all the newbies, there are a few things that make me cringe when I see them.

One example of this is seeing a Python college textbook (you can tell by its retail price) produced by a big-name publisher (one of the largest in the world which shall remain unnamed) that instructs users (of Python 2), to get user command-line input using the input() function! Clearly, this is a major faux pas, as most Python users know that it's a security risk and that raw_input() should always be used instead (and the main reason why raw_input() replaces and is renamed as input() in Python 3).

Another example is this recent article on lists and tuples. While I find the content useful in teaching new Python developers various useful ways of using slicing, I disagree with the premise that tuples...
  1. along with lists are two of Python's most popular data structures
  2. are mostly immutable but there are workarounds, and
  3. should be used for application data manipulation

I would say lists and dictionaries are the two most popular Python data structures; tuples shouldn't even be in that group. In fact, I would even argue that tuples shouldn't be used to manipulate application data at all, as that wasn't what they were generally created for. (If this was the case, then why not have lists with a read-only flag?)

The main reason why tuples exist is to get data to and from function calls. [UPDATE: two other strong use cases: 1) "constructed" dictionary keys (I would've turned such N-tuples into a delimited string) and from that use comes 2) a data structure with positional semantics, aka indices with implied meaning... both of these view such tuples as an individual entity (made up of multiple components), again, not a data structure for manipulating objects. Named tuples are a related alternative. See the debate in the commentary below.]

Calling a foreign API or 3rd-party function and want to pass in a data structure you know can't be altered? Check. Calling any function where you want to pass in only one data structure (instead of separate variables)? Use "*" and you're good to go. Previously worked with a programming language that only allowed you to return a single value? Tuples are that one object (think of it as a single shopping bag for all your groceries).
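To make the "shopping bag" idea concrete, here's a small sketch of a tuple carrying data out of one function and into another. The function names min_max() and describe() are invented for this illustration.

```python
def min_max(values):
    """Return two results at once, packed into a single tuple."""
    return min(values), max(values)

def describe(name, low, high):
    return '%s ranges from %d to %d' % (name, low, high)

# The returned tuple is unpacked into separate names on the way out
low, high = min_max([3, 1, 4, 1, 5])

# A single tuple is expanded into separate arguments with "*" on the way in
args = ('sample', low, high)
summary = describe(*args)
```

In both directions the tuple is just the container for the trip; nothing ever modifies it.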

All of the manipulations in the post on getting around the immutability are superfluous and not adhering to the best practice of not using tuples as a data structure. I mean, this is not a strict rule. If you're needing a data structure where you're not going to make any modifications and desire slightly better performance, sure a tuple can be used in such cases. This is why in Python 2.6, for the first time "evar," tuples were given methods!

There was never any need for tuples to have methods because they were immutable. "Just use lists," is what we would all say. However, lists had a pair of read-only methods (count() and index()) that led to inefficiencies (and poor practices) where developers used tuples for the reason we just outlined but needed to either get a count on how many times an object appeared in that sequence or wanted to find the index of the first appearance of an object. They would have to convert that tuple to a list, just to call those methods. Starting in 2.6, tuples now have those (and only those) methods to avoid this extra nonsense.
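A quick sketch of the difference, using a made-up tuple of strings:

```python
letters = ('x', 'y', 'x', 'z')

# Python 2.6 and newer: call the read-only methods directly on the tuple
count_x = letters.count('x')   # how many times 'x' appears
index_y = letters.index('y')   # position of the first 'y'

# The older workaround: convert to a list just to call the same method
old_count_x = list(letters).count('x')
```

Both routes give the same answers; the direct calls simply skip the throwaway list.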

So yes, you can use tuples as user-land data structures in such cases, but that's really it. For manipulation, use lists instead. As stated at the top, I'm generally all for more intro posts and tutorials out there. However, there may be some that don't always impart the best practices out there. Readers should always be alert and question whether there are more "Pythonic" ways of doing things. In this case, tuples should not be one of the "[two] of the most commonly used built-in data types in Python...."

Friday, April 6, 2012

Integrating Google APIs and Technologies

In 1997, long before my tenure at Google, I became a member of the Python community in helping to create Yahoo!Mail, one of the most popular web-based email systems in the world. There were only two Python books on the market back then, and neither addressed my developer’s need to learn Python quickly and competently, so I had to resort to the online docs. This absence, and consequently my development of class materials for a Python course, inspired me to write Prentice Hall’s bestselling Core Python Programming over a decade ago. Since then, I’ve used Python to work on all kinds of interesting applications, from web-based e-mail to geolocalized product search engines, social media games, antispam/antivirus e-mail appliances, and most interestingly, software for doctors to help them analyze and assess patients with spinal fractures. (Ask me about osteoporosis!)

Today at Google, my work involves advocating our tools and APIs to the global developer community. Now that I've been part of the Google family for the past 2.5 years, I thought it would be fun to integrate some of our technologies into the book. With the just-published 3rd edition, readers will find revised but also brand new material they can use to build real applications with. Some of the Google technologies I've integrated into Core Python Applications Programming include accessing your Gmail, parsing Google News XML feeds, and a complete chapter on cloud computing with Google App Engine. It’s also the first published book to feature code that utilizes the Google+ API. While the book contains a longer example using that API, I want to show you how easy it is to connect to Google+ using Python right now!

The bulk of the work in connecting to Google+ (and other Google APIs) is done by my fellow colleagues who maintain the Google APIs Client Library for Python, easily downloaded with pip or easy_install as "google-api-python-client." With this library, the most difficult step to connect with your API of choice has basically been reduced to a single line... see the fourth line of this short Python 2.x example:

# plus.py (by Wesley Chun under CC-SA3.0 license)
from apiclient import discovery

API_KEY = YOUR_KEY_FROM_CONSOLE_API_ACCESS_PAGE
service = discovery.build("plus", "v1", developerKey=API_KEY)
feed = service.activities().search(query='android').execute()
for record in feed['items']:
    post = ' '.join(record['title'].strip().split())
    if post:
        print '\nFrom:', record['actor']['displayName']
        print 'Post:', post
        print 'Date:', record['published']

In that one line of code (italicized above), we use the Google APIs Client Library's apiclient.discovery.build() method, passing in: a) the desired API ("plus" for Google+), b) the version (currently "v1"), and c) the API key you obtained from your project's development console in the "Simple API Access" section. This key gives your project access to APIs that do not need to access user data. Once we have a handle to the service, we can execute generic queries on the available data stream.

In this code snippet, we're simply querying for the latest (public) Google+ posts that are related to Android and displaying them on the command-line (code can be easily repurposed into any mobile or web application). Naturally, you need to go through the OAuth flow if you do want access to authenticated data. Give it a try!

If you like the code, dig into Core Python Applications Programming for a longer, more detailed example. Both scripts can be downloaded at the website (the code is part of Chapter 15), and you can get involved in the conversation on the Google+ page. I'm open to all feedback, suggestions, and fixes. You can find me at +wescpy or @wescpy. Looking forward to meeting you at an upcoming Google or Python event or in one of my public courses!

Friday, March 9, 2012

A new PyCon... and a new book (and a new article)!!

I'm excited about this year's PyCon conference happening this time in the heart of Silicon Valley. There are many firsts, so let's just list a few here (let me know if I'm missing any)!

This is the first PyCon...
  • ever held in Silicon Valley (although older Python workshops have been hosted here)
  • that had a cap; yes, we "ended" registration at 1500 people
  • where we ran out of swag bags; 1800 were ordered... POOF, gone by mid-Saturday
  • to have sold out! (even though we capped at 1500; didn't stop it from going over 2000)
  • with an attendance near or exceeding 2200 (2257)
  • that had to stop accepting sponsorships... at 136!!!
  • to feature a physical race (not to be confused with a race condition)
Another exciting announcement is that my first 3rd edition Core Python book will be published and debuting at the conference!! It's called Core Python Applications Programming and based on the second part of the original Core Python Programming book. All of the books' individual home pages are now unified at corepython.com. The books also have a shared Google+ page for you to encircle! They're literally "hot off the presses" as they were overnighted by the printer to the publisher's hotel and brought by hand to the conference! (Amazon's not shipping them for another 10 days after that!)

The new book features upgrades and new stuff added to existing chapters as well as brand new chapters on Django, Google App Engine, and text processing with CSV, JSON, and XML. There is even new material on Twitter and Google+ in case you're feeling more social than when the previous edition was published. Those of you asking for that PowerPoint slideshow generator for the past N years, or perhaps an intro to NoSQL/MongoDB? Yep, they're in there too! Finally, I've added not only Python 3 equivalents to many of the code samples, but I also cover some best practices when porting from 2.x to 3.x.

With all of the updates and new material, I'm hoping that this will be one of the most popular places for intermediate Python programmers to go once they've gotten comfortable with the language but want to apply their skills to a variety of topics in Python development today. While the coverage doesn't necessarily go particularly deep, the goal is to give programmers a kickstart with a comprehensive introduction.

To help kick off the new book, I got to thinking about Python books in general, especially the numerous times that people have either asked me or asked in some online forum: "What's a good Python book?" Unlike Python, there's not one right answer for this question, so as part of this exploration, I came up with 3 different book lists for diverse audiences of readers out there. You can find that article at InformIT.

In the meantime, it's back to the drawing board for me as I prepare to work on the 3rd edition of the main part of Core Python. If you've got ideas or suggestions on updating part 1 or wish to participate in the review process, please contact me now! (@wescpy/+wescpy)

ps. For those interested in brushing up on your Python skills, I'll be offering my popular Intro+Intermediate course this summer near the San Francisco airport. Go to cyberwebconsulting.com for more information!

Tuesday, December 6, 2011

Writing 2.x & 3.x Compatible Code

While we're at the crossroads transitioning from Python 2 to 3, you may wonder whether it is possible to write code that runs without modification under both Python 2 and 3. It seems like a reasonable request, but how would you get started? What breaks the most Python 2 code when executed by a 3.x interpreter?

print vs. print()

If you think like me, you'd say the print statement. That's as good a place to start as any, so let's give it a shot. The tricky part is that in 2.x, it's a statement, thus a keyword or reserved word, while in 3.x, it's just a built-in function (BIF). In other words, because language syntax is involved, you cannot simply branch with an if statement, and no, Python still doesn't have #ifdef macros!

Let's try just putting parentheses around the arguments to print:

>>> print('Hello World!')
Hello World!

Cool! That works under both Python 2 and Python 3! Are we done? Sorry.

>>> print(10, 20) # Python 2
(10, 20)

You're not as lucky this time: in Python 2, the parentheses create a tuple, which the print statement displays as a single object, while in Python 3, you're passing multiple arguments to print():

>>> print(10, 20) # Python 3
10 20

If you think a bit more, perhaps we can check if print is a keyword. You may recall there is a keyword module which contains a list of keywords. Since print won't be a keyword in 3.x, you may think that it can be as simple as this:

>>> import keyword
>>> 'print' in keyword.kwlist
False

As a smart programmer, you'd probably try it in 2.x expecting a True response. Although you would be correct, you'd still fail for a different reason:

>>> import keyword
>>> if 'print' in keyword.kwlist:
...     from __future__ import print_function
...
  File "<stdin>", line 2
SyntaxError: from __future__ imports must occur at the beginning of the file
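The error above is a compile-time rule about where the statement sits, not about which interpreter you're running. As a rough sketch (the compiles() helper and the two source strings are made up for illustration), you can ask the compiler directly which placement it accepts:

```python
def compiles(source):
    """Hypothetical helper: True if `source` compiles as a module."""
    try:
        compile(source, '<demo>', 'exec')
    except SyntaxError:
        return False
    return True

# The __future__ import is the first statement of the module.
TOP_OF_FILE = (
    "from __future__ import print_function\n"
    "print(10, 20)\n"
)

# The same import, nested inside an if block after another statement.
CONDITIONAL = (
    "import keyword\n"
    "if 'print' in keyword.kwlist:\n"
    "    from __future__ import print_function\n"
)

top_ok = compiles(TOP_OF_FILE)
conditional_ok = compiles(CONDITIONAL)
```

Placed unconditionally as the first statement of a module, the import is accepted by 2.6 and newer as well as 3.x, so it only helps if you don't need to support anything older than 2.6.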

One solution that works requires you to use a function with capabilities similar to print. One of them is sys.stdout.write() while another is distutils.log.warn(). For whatever reason, we decided to use the latter in many of this book's chapters. I suppose sys.stderr.write() will also work, if unbuffered output is your thing.

The "Hello World!" example would then look like this:

# Python 2.x
print 'Hello World!'

# Python 3.x
print('Hello World!')

The following lines would work in both versions:

# Python 2.x & 3.x compatible
from distutils.log import warn as printf
printf('Hello World!')

That reminds me of why we didn't use sys.stdout.write()... we would need to add a NEWLINE character at the end of the string to match the behavior:

# Python 2.x & 3.x compatible
import sys
sys.stdout.write('Hello World!\n')

The real problem isn't this minor annoyance but that these functions are no true proxy for print or print(): they only work once you've built a single string representing your output. Anything more complex requires more effort on your part.
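To give a feel for that extra effort, here is a minimal sketch of a stand-in that accepts several arguments. The name echo() and its keyword handling are my own invention for illustration, not something from the book, and it only approximates what print() does:

```python
import sys

def echo(*args, **kwargs):
    """Hypothetical print() stand-in that runs unchanged on 2.x and 3.x."""
    sep = kwargs.get('sep', ' ')          # separator between arguments
    end = kwargs.get('end', '\n')         # trailing text, NEWLINE by default
    out = kwargs.get('file', sys.stdout)  # any object with a write() method
    out.write(sep.join(str(arg) for arg in args) + end)

echo('Hello World!')
echo(10, 20)            # writes "10 20" plus a NEWLINE on both 2.x and 3.x
echo(10, 20, sep=', ')  # writes "10, 20" plus a NEWLINE
```

Keyword arguments are pulled out of **kwargs by hand because keyword-only parameters are 3.x syntax and would break the 2.x side.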

Import your way to a solution

In other situations, life is a bit easier, and you can just import the correct solution. In the code below, we want to import the urlopen() function. In Python 2, it lives in urllib and urllib2 (we'll use the latter), and in Python 3, it's been integrated into urllib.request. The solution that works for both 2.x and 3.x is neat and simple in this case:

try:
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
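The same try/except shape carries over to other names that moved between 2.x and 3.x. As one more sketch that needs no network access, here it is applied to urlparse(), which lives in the urlparse module in Python 2 and in urllib.parse in Python 3 (the URL is simply the feed address used later in this post):

```python
try:
    from urlparse import urlparse        # Python 2
except ImportError:
    from urllib.parse import urlparse    # Python 3

# Split a URL into its components; no request is made.
parts = urlparse('http://news.google.com/news?topic=h&output=rss')
host = parts.netloc
query = parts.query
```

Trying the 2.x name first and falling back on ImportError keeps the rest of the module free of version checks.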

For memory conservation, perhaps you're interested in the iterator (Python 3) version of a well-known built-in like zip(). In Python 2, the iterator version is itertools.izip(). In Python 3, that function has been renamed zip() and replaces the old list-returning built-in. If you insist on the iterator version everywhere, your import statement is also fairly straightforward:

try:
    from itertools import izip as zip
except ImportError:
    pass
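A quick way to convince yourself that you really got the iterator version on both sides is to pull items from it one at a time. This is only a sketch with throwaway sample data, and it assumes 2.6 or newer for the next() built-in:

```python
try:
    from itertools import izip as zip    # Python 2: swap in the iterator version
except ImportError:
    pass                                 # Python 3: zip() is already an iterator

pairs = zip('abc', range(3))
first = next(pairs)    # works because pairs is an iterator, not a list
rest = list(pairs)     # consumes whatever is left
```

With Python 2's original zip(), the next() call would raise a TypeError because a list is not an iterator.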

One example which isn't as elegant looking is the StringIO class. In Python 2, the pure Python version is in the StringIO module, meaning you access it via StringIO.StringIO. There is also a C version for speed, and that's located at cStringIO.StringIO. Depending on your Python installation, you may prefer cStringIO first and fall back to StringIO if cStringIO is not available.

In Python 3, Unicode is the default string type, but if you're doing any kind of networking, it's likely you'll have to manipulate ASCII/bytes strings, so rather than StringIO, you'd want io.BytesIO. In order to get what you want, the import is slightly uglier:

try:
    from io import BytesIO as StringIO
except ImportError:
    try:
        from cStringIO import StringIO
    except ImportError:
        from StringIO import StringIO
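Whichever class ends up bound to the name StringIO, you use it the same way: wrap some bytes and treat the result like a file. Here's a small sketch, where the payload is just sample data and the b'' literal assumes 2.6 or newer:

```python
try:
    from io import BytesIO as StringIO
except ImportError:
    try:
        from cStringIO import StringIO
    except ImportError:
        from StringIO import StringIO

# Pretend these bytes just came off a socket.
buf = StringIO(b'Hello World!\n')
data = buf.read()              # bytes in, bytes out
buf.close()
text = data.decode('ascii')    # convert to text only when you need to
```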

Putting it all together

If you're lucky, these are all the changes you have to make, and the rest of your code is simpler than the setup at the beginning. With the imports above in place, namely distutils.log.warn() [as printf()], url*.urlopen(), and *.StringIO, plus a normal import of xml.etree.ElementTree (2.5 and newer), you can write a very short parser that displays the top headline stories from the Google News service in roughly eight lines of code:

g = urlopen('http://news.google.com/news?topic=h&output=rss')
f = StringIO(g.read())
g.close()
tree = xml.etree.ElementTree.parse(f)
f.close()
# iter() (2.7+/3.2+) replaces getiterator(), which was removed in Python 3.9
for elmt in tree.iter():
    if elmt.tag == 'title' and not \
            elmt.text.startswith('Top Stories'):
        printf('- %s' % elmt.text)

This script runs exactly the same under 2.x and 3.x with no changes to the code whatsoever. Of course, if you're using 2.4 and older, you need to download ElementTree separately.

The code snippets in this subsection come from the "Text Processing" chapter of the book, so take a look at the goognewsrss.py file to see the full version in action.
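If you'd like to try the parsing logic without hitting the network, here is a rough offline sketch. The SAMPLE_FEED bytes and the headlines() helper are invented for illustration and are not actual Google News output; the sketch also uses sys.stdout.write() rather than distutils.log.warn() and assumes 2.7 or newer for tree.iter():

```python
import sys
import xml.etree.ElementTree as ET
from io import BytesIO

# Made-up stand-in for the bytes that urlopen(...).read() would return.
SAMPLE_FEED = b"""<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Top Stories - Example</title>
    <item><title>First sample headline</title></item>
    <item><title>Second sample headline</title></item>
  </channel>
</rss>"""

def headlines(stream):
    """Return the text of every <title> except the feed's own 'Top Stories' one."""
    tree = ET.parse(stream)
    return [elmt.text for elmt in tree.iter('title')
            if elmt.text and not elmt.text.startswith('Top Stories')]

for title in headlines(BytesIO(SAMPLE_FEED)):
    sys.stdout.write('- %s\n' % title)
```

The extra elmt.text check simply skips empty title elements, which would otherwise trip up startswith().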

Some will feel that these changes really start to mess up the elegance of your Python source. After all, readability counts! If you prefer to keep your code cleaner yet still write code that runs under both versions without changes, take a look at the six package.

six is a compatibility library whose primary role is to provide an interface to keep your application code the same while hiding the complexities described in this appendix subsection from the developer. To find out more about six, read this: http://packages.python.org/six

Regardless of whether you use a library like six or choose to roll your own, we hoped to show in this short narrative that it is possible to write code that runs under both 2.x & 3.x. The bottom line is that you may have to sacrifice some of the elegance and simplicity of Python, trading it off for true 2 to 3 portability. I'm sure we'll be revisiting this issue for the next few years until the whole world has completed the transition to the next generation.