Tuesday, May 29, 2012

Tuples aren't what you think they're for

While I'm happy that the number of Python users continues to grow at a rapid pace and that there are many tutorials added each day to support all the newbies, there are a few things that make me cringe when I see them.

One example of this is seeing a Python college textbook (you can tell by its retail price) produced by a big-name publisher (one of the largest in the world, which shall remain unnamed) that instructs users (of Python 2) to get command-line input using the input() function! Clearly, this is a major faux pas, as most Python users know that it's a security risk and that raw_input() should always be used instead (which is also the main reason why raw_input() replaces input(), and takes over its name, in Python 3).
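To make the risk concrete: Python 2's input() behaves like eval(raw_input()), so whatever the user types gets evaluated as Python code. Here's a rough sketch that simulates both behaviors under Python 3 (where the old input() no longer exists). The helper names py2_style_input and safe_input are made up for this illustration, and the typed text is passed in as a string rather than read from the keyboard.

```python
# Sketch only: simulates Python 2's input() semantics under Python 3.
# py2_style_input and safe_input are hypothetical helper names.

def py2_style_input(typed):
    """Mimic Python 2's input(): evaluate whatever the user typed."""
    return eval(typed)  # dangerous: runs arbitrary expressions

def safe_input(typed):
    """Mimic Python 2's raw_input() / Python 3's input(): return the text as-is."""
    return typed

typed = "__import__('os').getcwd()"   # a user could type any expression here

evaluated = py2_style_input(typed)    # the expression actually ran
literal = safe_input(typed)           # just the characters that were typed

print(type(evaluated).__name__, type(literal).__name__)
print(literal == typed)
```

The expression here is harmless, but the same mechanism would happily run something destructive, which is why the string-returning version is the one to reach for.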

Another example is this recent article on lists and tuples. While I find the content useful in teaching new Python developers various useful ways of using slicing, I disagree with the premise that tuples...
  1. along with lists are two of Python's most popular data structures
  2. are mostly immutable but there are workarounds, and
  3. should be used for application data manipulation

I would say lists and dictionaries are the two most popular Python data structures; tuples shouldn't even be in that group. In fact, I would even argue that tuples shouldn't be used to manipulate application data at all, as that wasn't what they were generally created for. (If that were the case, then why not have lists with a read-only flag?)

The main reason why tuples exist is to get data to and from function calls. [UPDATE: two other strong use cases: 1) "constructed" dictionary keys (I would've turned such N-tuples into a delimited string) and from that use comes 2) a data structure with positional semantics, aka indices with implied meaning... both of these view such tuples as an individual entity (made up of multiple components), again, not a data structure for manipulating objects. Named tuples are a related alternative. See the debate in the commentary below.]
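The two use cases from the update can be sketched in a few lines. This is only an illustration: the board data and the Point name are invented here, and the delimited-string version is shown just for comparison.

```python
from collections import namedtuple

# 1) A tuple as a "constructed" dictionary key: (row, col) is one entity.
board = {(0, 0): 'start', (2, 3): 'goal'}
cell = board[(2, 3)]

# The delimited-string alternative means building and parsing keys by hand.
board_str = {'0,0': 'start', '2,3': 'goal'}
row, col = (int(part) for part in '2,3'.split(','))

# 2) Positional semantics: index 0 means "row", index 1 means "col".
point = (2, 3)

# A named tuple keeps the positions but gives each one a name.
Point = namedtuple('Point', ['row', 'col'])
named = Point(row=2, col=3)

print(cell, point[0], named.row, named == point)
```

Since a named tuple is still a tuple, it compares equal to the plain (2, 3) and works as the same dictionary key.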

Calling a foreign API or 3rd-party function and want to pass in a data structure you know can't be altered? Check. Calling any function where you want to pass in only one data structure (instead of separate variables)? Use "*" and you're good to go. Previously worked with a programming language that only allowed you to return a single value? Tuples are that one object (think of it as a single shopping bag for all your groceries).
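Here's a small sketch of the "shopping bag" idea in both directions. The function names min_max and describe are made up for the example.

```python
def min_max(values):
    """Return two results at once; Python packs them into one tuple."""
    return min(values), max(values)

def describe(name, low, high):
    """Takes three separate arguments."""
    return '%s: %s to %s' % (name, low, high)

result = min_max([3, 1, 4, 1, 5])    # one object comes back: a tuple
low, high = result                   # unpack it into separate names

args = ('temps', low, high)          # one "shopping bag" of arguments
summary = describe(*args)            # "*" unpacks it at the call site

print(result)
print(summary)
```

Neither function manipulates the tuple; it only carries values across the call boundary.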

All of the manipulations in that post for getting around immutability are superfluous and don't adhere to the best practice of not using tuples as a data structure. I mean, this is not a strict rule. If you need a data structure that you're not going to modify and want slightly better performance, sure, a tuple can be used in such cases. This is why in Python 2.6, for the first time "evar," tuples were given methods!

There was never any need for tuples to have methods because they were immutable. "Just use lists," is what we would all say. However, lists had a pair of read-only methods (count() and index()) that led to inefficiencies (and poor practices) where developers used tuples for the reason we just outlined but needed to either get a count on how many times an object appeared in that sequence or wanted to find the index of the first appearance of an object. They would have to convert that tuple to a list, just to call those methods. Starting in 2.6, tuples now have those (and only those) methods to avoid this extra nonsense.
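A quick before-and-after sketch of what that change means in practice (the sample data is arbitrary):

```python
letters = ('a', 'b', 'a', 'c')

# Before 2.6: convert to a list just to call the read-only methods.
old_count = list(letters).count('a')
old_index = list(letters).index('a')

# 2.6 and later: tuples have count() and index() themselves.
new_count = letters.count('a')
new_index = letters.index('a')

# Collect the public (non-underscore) methods that tuple exposes.
public_methods = [name for name in dir(tuple) if not name.startswith('_')]

print(new_count, new_index, public_methods)
```

Same answers either way; the direct calls just skip building a throwaway list.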

So yes, you can use tuples as user-land data structures in such cases, but that's really it. For manipulation, use lists instead. As stated at the top, I'm generally all for more intro posts and tutorials out there. However, there may be some that don't always impart the best practices out there. Readers should always be alert and question whether there are more "Pythonic" ways of doing things. In this case, tuples should not be one of the "[two] of the most commonly used built-in data types in Python...."

Friday, April 6, 2012

Integrating Google APIs and Technologies

In 1997, long before my tenure at Google, I became a member of the Python community in helping to create Yahoo!Mail, one of the most popular web-based email systems in the world. There were only two Python books on the market back then, and neither addressed my developer’s need to learn Python quickly and competently, so I had to resort to the online docs. This absence, and consequently my development of class materials for a Python course, inspired me to write Prentice Hall’s bestselling Core Python Programming over a decade ago. Since then, I’ve used Python to work on all kinds of interesting applications, from web-based e-mail to geolocalized product search engines, social media games, antispam/antivirus e-mail appliances, and most interestingly, software for doctors to help them analyze and assess patients with spinal fractures. (Ask me about osteoporosis!)

Today at Google, my work involves advocating our tools and APIs to the global developer community. Now that I've been part of the Google family for the past 2.5 years, I thought it would be fun to integrate some of our technologies into the book. With the just-published 3rd edition, readers will find both revised and brand-new material they can use to build real applications. Some of the Google technologies I've integrated into Core Python Applications Programming include accessing your Gmail, parsing Google News XML feeds, and a complete chapter on cloud computing with Google App Engine. It’s also the first published book to feature code that utilizes the Google+ API. While the book contains a longer example using that API, I want to show you how easy it is to connect to Google+ using Python right now!

The bulk of the work in connecting to Google+ (and other Google APIs) is done by my colleagues who maintain the Google APIs Client Library for Python, easily downloaded with pip or easy_install as "google-api-python-client." With this library, the most difficult step in connecting to your API of choice has basically been reduced to a single line... see the fourth line of this short Python 2.x example:

# plus.py (by Wesley Chun under CC-SA3.0 license)
from apiclient import discovery

service = discovery.build("plus", "v1", developerKey=API_KEY)
feed = service.activities().search(query='android').execute()
for record in feed['items']:
    post = ' '.join(record['title'].strip().split())
    if post:
        print '\nFrom:', record['actor']['displayName']
        print 'Post:', post
        print 'Date:', record['published']

In that one line of code (the discovery.build() call above), we use the Google APIs Client Library's apiclient.discovery.build() function, passing in: a) the desired API ("plus" for Google+), b) the version (currently "v1"), and c) the API key you obtained from your project's development console in the "Simple API Access" section. This key gives your project access to APIs that do not need to access user data. Once we have a handle to the service, we can execute generic queries on the available data stream.

In this code snippet, we're simply querying for the latest (public) Google+ posts that are related to Android and displaying them on the command-line (code can be easily repurposed into any mobile or web application). Naturally, you need to go through the OAuth flow if you do want access to authenticated data. Give it a try!

If you like the code, dig into Core Python Applications Programming for a longer, more detailed example. Both scripts can be downloaded at the website (the code is part of Chapter 15), and you can get involved in the conversation on the Google+ page. I'm open to all feedback, suggestions, and fixes. You can find me at +wescpy or @wescpy. Looking forward to meeting you at an upcoming Google or Python event or in one of my public courses!

Friday, March 9, 2012

A new PyCon... and a new book (and a new article)!!

I'm excited about this year's PyCon conference happening this time in the heart of Silicon Valley. There are many firsts, so let's just list a few here (let me know if I'm missing any)!

This is the first PyCon...
  • ever held in Silicon Valley (although older Python workshops have been hosted here)
  • that had a cap; yes, we "ended" registration at 1500 people
  • where we ran out of swag bags; 1800 were ordered... POOF, gone by mid-Saturday
  • to have sold out! (even though we capped at 1500; didn't stop it from going over 2000)
  • with an attendance near or exceeding 2200 (2257)
  • that had to stop accepting sponsorships... at 136!!!
  • to feature a physical race (not to be confused with a race condition)

Another exciting announcement is that my first 3rd edition Core Python book will be published and debuting at the conference!! It's called Core Python Applications Programming and is based on the second part of the original Core Python Programming book. All of the books' individual home pages are now unified at corepython.com. The books also have a shared Google+ page for you to encircle! They're literally "hot off the presses" as they were overnighted by the printer to the publisher's hotel and brought by hand to the conference! (Amazon's not shipping them for another 10 days after that!)

The new book features upgrades and new stuff added to existing chapters as well as brand new chapters on Django, Google App Engine, and text processing with CSV, JSON, and XML. There is even new material on Twitter and Google+ in case you're feeling more social than when the previous edition was published. Those of you asking for that PowerPoint slideshow generator for the past N years, or perhaps an intro to NoSQL/MongoDB? Yep, they're in there too! Finally, I've added not only Python 3 equivalents to many of the code samples, but I also cover some best practices when porting from 2.x to 3.x.

With all of the updates and new material, I'm hoping that this will be one of the most popular places for intermediate Python programmers to go once they've gotten comfortable with the language but want to apply their skills to a variety of topics in Python development today. While the coverage doesn't necessarily go particularly deep, the goal is to give programmers a kickstart with a comprehensive introduction.

To help kick off the new book, I got to thinking about Python books in general, especially the numerous times that people have either asked me or asked in some online forum: "What's a good Python book?" Unlike Python, there's not one right answer for this question, so as part of this exploration, I came up with 3 different book lists for diverse audiences of readers out there. You can find that article at InformIT.

In the meantime, it's back to the drawing board for me as I prepare to work on the 3rd edition of the main part of Core Python. If you've got ideas or suggestions on updating part 1 or wish to participate in the review process, please contact me now! (@wescpy/+wescpy)

ps. For those interested in brushing up on your Python skills, I'll be offering my popular Intro+Intermediate course this summer near the San Francisco airport. Go to cyberwebconsulting.com for more information!