Tuesday, May 29, 2012

Tuples aren't what you think they're for

While I'm happy that the number of Python users continues to grow at a rapid pace and that there are many tutorials added each day to support all the newbies, there are a few things that make me cringe when I see them.

One example of this is seeing a Python college textbook (you can tell by its retail price) produced by a big-name publisher (one of the largest in the world which shall remain unnamed) that instructs users (of Python 2), to get user command-line input using the input() function! Clearly, this is a major faux pas, as most Python users know that it's a security risk and that raw_input() should always be used instead (and the main reason why raw_input() replaces and is renamed as input() in Python 3).

Another example is this recent article on lists and tuples. While I find the content useful in teaching new Python developers various useful ways of using slicing, I disagree with the premise that tuples...
  1. along with lists are two of Python's most popular data structures
  2. are mostly immutable but there are workarounds, and
  3. should be used for application data manipulation

I would says lists and dictionaries are the two most popular Python data structures; tuples shouldn't even be in that group. In fact, I would even argue that tuples shouldn't be used to manipulate application data at all, as that wasn't what they were generally created for. (If this was the case, then why not have lists with a read-only flag?)

The main reason why tuples exist is to get data to and from function calls. [UPDATE: two other strong use cases: 1) "constructed" dictionary keys (i would've turned such N-tuples into a delimited string) and from that use comes 2) a data structure with positional semantics, aka indices with implied meaning... both of these view such tuples as an individual entity (made up of multiple components), again, not a data structure for manipulating objects. Named tuples is an related alternative. See the debate in the commentary below.]

Calling a foreign API or 3rd-party function and want to pass in a data structure you know can't be altered? Check. Calling any function where you want to pass in only one data structure (instead of separate variables)? Use "*" and you're good to go. Previously worked with a programming language that only allowed you to return a single value? Tuples are that one object (think of it as a single shopping bag for all your groceries).

All of the manipulations in the post on getting around the immutability are superfluous and not adhering to the best practice of not using tuples as a data structure. I mean, this is not a strict rule. If you're needing a data structure where you're not going to make any modifications and desire slightly better performance, sure a tuple can be used in such cases. This is why in Python 2.6, for the first time "evar," tuples were given methods!

There was never any need for tuples to have methods because they were immutable. "Just use lists," is what we would all say. However, lists had a pair of read-only methods (count() and index()) that led to inefficiencies (and poor practices) where developers used tuples for the reason we just outlined but needed to either get a count on how many times an object appeared in that sequence or wanted to find the index of the first appearance of an object. They would have to convert that tuple to a list, just to call those methods. Starting in 2.6, tuples now have those (and only those) methods to avoid this extra nonsense.

So yes, you can use tuples as user-land data structures in such cases, but that's really it. For manipulation, use lists instead. As stated at the top, I'm generally all for more intro posts and tutorials out there. However, there may be some that don't always impart the best practices out there. Readers should always be alert and question whether there are more "Pythonic" ways of doing things. In this case, tuples should not be one of the "[two] of the most commonly used built-in data types in Python...."