Tuesday, May 29, 2012

Tuples aren't what you think they're for

While I'm happy that the number of Python users continues to grow at a rapid pace and that many tutorials are added each day to support all the newbies, there are a few things that make me cringe when I see them.

One example of this is seeing a Python college textbook (you can tell by its retail price) produced by a big-name publisher (one of the largest in the world, which shall remain unnamed) that instructs users (of Python 2) to read user input at the command line with the input() function! This is a major faux pas: most Python users know that Python 2's input() evaluates whatever the user types as a Python expression, which makes it a security risk, and that raw_input() should always be used instead. (That is also the main reason raw_input() was renamed to input() in Python 3, replacing the old one.)
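To make the risk concrete, here is a small sketch written for Python 3. Python 2's input(prompt) was documented as equivalent to eval(raw_input(prompt)), so whatever the user typed was executed as code. A hard-coded string stands in for the user's keystrokes; it is a hypothetical, deliberately harmless payload that only reads the current directory.

```python
import os

# Hypothetical text a user might type at the prompt.
typed = "__import__('os').getcwd()"

# Python 2's input() behaved like eval(raw_input()): the text runs as code.
py2_style_result = eval(typed)

# raw_input() in Python 2 (and input() in Python 3) just returns the text.
safe_result = typed

print(py2_style_result == os.getcwd())  # True: the "input" ran os.getcwd()
print(safe_result)                      # the raw string, nothing executed
```

A malicious user could just as easily type an expression that deletes files or opens a network connection, which is why only the string-returning behavior survives in Python 3.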

Another example is this recent article on lists and tuples. While it does a good job of teaching new Python developers various ways of using slicing, I disagree with its premise that tuples...
  1. are, along with lists, two of Python's most popular data structures,
  2. are mostly immutable but there are workarounds, and
  3. should be used for application data manipulation

I would say lists and dictionaries are the two most popular Python data structures; tuples shouldn't even be in that group. In fact, I would even argue that tuples shouldn't be used to manipulate application data at all, since that isn't what they were created for. (If it were, why not just give lists a read-only flag?)

The main reason why tuples exist is to get data to and from function calls. [UPDATE: two other strong use cases: 1) "constructed" dictionary keys (I would've turned such N-tuples into a delimited string) and, from that use, 2) a data structure with positional semantics, aka indices with implied meaning... Both of these view such a tuple as an individual entity (made up of multiple components), again, not as a data structure for manipulating objects. Named tuples are a related alternative. See the debate in the commentary below.]
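A minimal sketch of the two use cases from the update, written for Python 3. The visit counts, dates and country codes are made-up illustration data. It shows a tuple used as a composite dictionary key in place of a delimited string, then a point whose positions carry meaning, plus the named-tuple alternative from the standard collections module.

```python
from collections import namedtuple

# 1) "Constructed" dictionary keys: a (date, country) tuple instead of a
#    delimited string such as "2012-05-29|US". (Hypothetical data.)
visits = {
    ("2012-05-29", "US"): 120,
    ("2012-05-29", "UK"): 45,
}
print(visits[("2012-05-29", "US")])   # 120

# Tuples of hashable values are themselves hashable, so they work in sets too.
seen = {("2012-05-29", "US"), ("2012-05-29", "US")}
print(len(seen))                      # 1

# 2) Positional semantics: slot 0 always means x, slot 1 always means y.
point = (3, 4)
x, y = point

# A named tuple keeps those positions but also gives each slot a name.
Point = namedtuple("Point", ["x", "y"])
p = Point(3, 4)
print(p.x, p[1])                      # 3 4
print(p == point)                     # True: it is still a tuple
```

The string-key approach needs a join on the way in and a split on the way out, and it breaks if a component ever contains the delimiter; the tuple key avoids both.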

Calling a foreign API or third-party function and want to pass in a data structure you know can't be altered? Check. Want to pass a single data structure to a function instead of separate variables? Pack the values into a tuple, unpack it in the call with "*", and you're good to go. Previously worked with a programming language that only allowed you to return a single value? A tuple is that one object (think of it as a single shopping bag for all your groceries).
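The three cases above can be sketched like this in Python 3. The functions careless_api(), area(), min_max() and count_args() are hypothetical, made up purely for illustration.

```python
def careless_api(seq):
    """Hypothetical third-party function that tries to modify its input."""
    seq[0] = None

def area(width, height):
    """Hypothetical function that takes two separate arguments."""
    return width * height

def min_max(values):
    """Return two results packed into a single tuple."""
    return min(values), max(values)

def count_args(*args):
    """Inside the function, the collected arguments arrive as a tuple."""
    return type(args).__name__, len(args)

# 1) An immutable argument: the callee cannot change it.
settings = ("fast", "quiet")
try:
    careless_api(settings)
except TypeError:
    print("tuple left intact:", settings)

# 2) One data structure instead of separate variables: "*" unpacks it.
dims = (3, 5)
print(area(*dims))            # 15

# 3) Several return values travel back as one tuple the caller can unpack.
lo, hi = min_max([4, 1, 9])
print(lo, hi)                 # 1 9

print(count_args("a", "b"))   # ('tuple', 2)
```

The last call also hints at how much the interpreter itself relies on tuples behind the scenes for passing arguments around.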

All of the manipulations in that post for getting around tuple immutability are superfluous and don't follow the best practice of not using tuples as a data structure. That said, this is not a strict rule. If you need a data structure that you're not going to modify and you want slightly better performance, sure, a tuple can be used in such cases. This is why in Python 2.6, for the first time "evar," tuples were given methods!
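To see why such workarounds are superfluous, here is a sketch (Python 3) of what "modifying" a tuple actually involves: slicing and concatenation build a brand-new tuple, while a list is simply updated in place. The variable names are illustrative only.

```python
t = (10, 20, 30)
original = t

# Direct assignment is not allowed on a tuple.
try:
    t[1] = 99
except TypeError as exc:
    print("TypeError:", exc)

# The "workaround": build a new tuple from slices around the changed slot.
t = t[:1] + (99,) + t[2:]
print(t)                  # (10, 99, 30)
print(t is original)      # False: a new object, the old one is unchanged

# The straightforward choice for data you intend to change: a list.
items = [10, 20, 30]
items[1] = 99             # updated in place, no copying
print(items)              # [10, 99, 30]
```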

There was never any need for tuples to have methods because they were immutable. "Just use lists," is what we would all say. However, lists had a pair of read-only methods, count() and index(), that tuples lacked, and that gap led to inefficiencies (and poor practices). Developers who used tuples for the reason just outlined, but needed to count how many times an object appeared in the sequence or find the index of its first appearance, had to convert the tuple to a list just to call those methods. Starting in 2.6, tuples have those (and only those) methods, which avoids this extra nonsense.
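A quick illustration of those two methods, written for Python 3 (they behave the same way since 2.6); the sizes tuple is made-up data.

```python
sizes = ("S", "M", "L", "M", "XL")

# Since 2.6, tuples answer these read-only questions directly.
print(sizes.count("M"))        # 2
print(sizes.index("M"))        # 1 (position of the first occurrence)

# The pre-2.6 workaround: build a throwaway list just to call the method.
print(list(sizes).count("M"))  # 2

# Mutating methods such as append() still exist only on lists.
print(hasattr(sizes, "append"))  # False
```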

So yes, you can use tuples as user-land data structures in such cases, but that's really it. For manipulation, use lists instead. As stated at the top, I'm generally all for more intro posts and tutorials out there. However, some of them don't always impart best practices. Readers should always be alert and question whether there are more "Pythonic" ways of doing things. In this case, tuples should not be one of the "[two] of the most commonly used built-in data types in Python...."

18 comments:

  1. I like making the distinction that lists are generally homogeneous, containing N elements that are all of the same type, while tuples give a separate and distinct meaning to each element and that the elements are often of different types. After enough time with the language, one expects to see a Python list like

    ['apple', 'orange', 'banana']

    but expects a Python tuple to look like

    (1.9, 'lbs', 'apple')

    1. This is the same distinction I have come to make as well. It feels natural and seems intuitive.

    2. This is probably the way it *should* be. As many readers know, in Python 3, certain operations that don't make sense (are undefined) for heterogeneous collections are no longer allowed; sorting a list that mixes numbers and strings, for example, now raises a TypeError. When you have a list of data to manipulate, the items should generally all be of the same type.

    3. I like this distinction too. I think I was introduced to the idea via Haskell. I ended up writing a blog post about it because while I found it clarifying it seemed unknown among many Python programmers.

  2. If we take into account internal use by the interpreter for things such as function arguments and multi-valued function returns, the contest for "most commonly used" between lists, tuples and dictionaries might actually give surprising answers. (I would actually bet on the last two, with lists coming in third, especially in Python 3, where many formerly list-creating functions produce custom iterators instead.)

    As far as data structures go in user code, though, I think you're right - the extra flexibility of lists will generally trump the micro-optimisation benefits of using a tuple instead.

    1. Ah, but you see, I already agreed with you in the post. The main purpose of tuples *is* to send parameters to *and* gather return values from function calls. So behind the scenes, yes, they'll rank at the top, but *not* for explicit user data manipulation. They're used implicitly during app "operations." :-)

  3. We teach Python as an introductory programming language and the distinction between tuples and lists is something that many of our students struggle with.

    As an illustrative example, I ask them to imagine how we would use Python's built-in data structures to represent an editable polygon in a 2D graphics application. We consider different options and then I suggest to them that a list of tuples is perhaps the best choice.

    The logic here is that polygon vertices always consist of x and y values, so an immutable pair is appropriate. But since we might need to edit the shape by adding, removing or replacing vertices, the collection of vertices needs to be represented by a list.

  4. I agree with Nick Efford's comment. A tuple has a use beyond "performance": its semantic value. It indicates "this is not an Nth dimension of a list, this is a fixed, positional data structure". To me, a list of points looks much more natural, and is easier to read, as [(1, 2), (5, 12), ...] rather than [[1,2], [5,12], ...].

    1. Agreed. The correct representation of a point really is (x,y), so that's certainly a valid use, and it makes very little sense for points to be lists, especially if you have no intention of using list operations/methods.

  5. As much as I appreciate these clarifications of best practices, I would like to say that it is not a nice practice to leave someone unaware of your criticism in this way. These Python Central guys have a comment box below their articles. It would be nice of you to leave a note there so that they can come here, read this and learn and improve their stuff.

    1. Max: point well taken... I'll do so. This wasn't a rant directed at this one particular piece as much as it was a generalized criticism towards tutorials which, while teaching the language syntax, would benefit even more if they also imparted good developer practices.

  6. "The main reason why tuples exist is to get data to and from function calls."


    I also use tuples as dict keys and as set elements. I would call this "a main reason" for using tuples. For instance, I use this more often than "*" for argument unpacking.

    Marko

    1. This likely depends on the apps you write, but I haven't seen tuples used as dict keys *that* often; however, that's just been *my* experience. It's certainly a valid use, though. Also, with regard to "*"... I didn't mean its use in function *definitions* as much as I meant using "*" in function *calls* (argument unpacking).

    2. I use tuples as dictionary keys extensively, a habit I started after seeing this blog post by Guido: http://www.artima.com/weblogs/viewpost.jsp?thread=101605. It basically replaced what I used to do in Perl (turning everything into strings and concatenating them) with a clearer approach: just using a tuple as a composite dictionary key.

  7. As I say in this 2006 blog post, it's not just about homogeneous vs. heterogeneous but also that, as opposed to a list, "the index in a tuple has an implied semantic. The point of a tuple is that the i-th slot means something specific. In other words, it's an index-based (rather than name-based) data structure."

    The existence of named tuples now makes this point even clearer. One could hardly imagine a "named list".

  8. Hi Wesley. After reading your article and then the comments, I think the main article is left with a flaw. The main use for a tuple, and what should help distinguish it from a list, is, as many have said in the comments (and as you seem to have taken on board in your replies), that the position of an entry in a tuple has implied meaning.

    If you don't have this fact at the head of your paragraph beginning "The main reason why tuples exist ..." then the main article becomes misleading.

    1. Thanks. I added an "UPDATE" section which adds these 2 other strong use cases.

  9. Hi Wesley, just to let you know that I'm your biggest fan. I watched your video lectures on Python and find your articulation and method of explanation to be very sincere and clear.
