Monday, July 11, 2016

Exporting a Google Sheet spreadsheet as CSV

Introduction

Today, we'll follow-up to my earlier post on the Google Sheets API and multiple posts (first, secondthird) on the Google Drive API by answering one common question: How do you download a Google Sheets spreadsheet as a CSV file? The "FAQ"ness of the question itself as well as various versions of Google APIs has led to many similar StackOverflow questions: one, two, three, four, five, just to list a few. Let's answer this question definitively and walk through a Python code sample that does exactly that. The main assumption is that you have a Google Sheet file in your Google Drive named "inventory".

Choosing the right API

Upon first glance, developers may think the Google Sheets API is the one to use. Unfortunately that isn't the case. The Sheets API is the one to use for spreadsheet-oriented operations, such as inserting data, reading spreadsheet rows, managing individual tab/sheets within a spreadsheet, cell formatting, creating charts, adding pivot tables, etc., It isn't meant to perform file-based requests like exporting a Sheet in CSV (comma-separated values) format. For file-oriented operations with a Google Sheet, you would use the Google Drive API.

Using the Google Drive API

As mentioned earlier, Google Drive features numerous API scopes of authorization. As usual, we always recommend you use the most restrictive scope possible that allows your app to do its work. You'll request fewer permissions from your users (which makes them happier), and it also makes your app more secure, possibly preventing modifying, destroying, or corrupting data, or perhaps inadvertently going over quotas. Since we're only exporting a Google Sheets file from Google Drive, the only scope we need is:
  • 'https://www.googleapis.com/auth/drive.readonly' — Read-only access to file content or metadata
The earlier post I wrote on the Google Drive API featured sample code that exported an uploaded Google Docs file as PDF and download that from Drive. This post will not only feature a change to exporting a Google Sheets file in CSV format, but also demonstrate one additional feature of the Drive API: querying

Since we've fully covered the authorization boilerplate fully in earlier posts and videos, we're going to skip that here and jump right to the action, creating of a service endpoint to Drive. The API name is (of course 'drive', and the current version of the API is 3, so use the string 'v3' in this call to the apiclient.discovey.build() function:

DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

Query and export files from Google Drive

While unnecessary, we'll create a few string constants representing the filename, source and destination file MIME types to make the code easier to understand:
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'
In this simple example, we're only going to export one Google Sheets file as CSV, arbitrarily choosing a file named, "inventory." So to perform the query, you need both the filename and its MIME type, "application/vnd.google-apps.spreadsheet". Query components are conjoined with the "and" keyword, so your query string will look like this: q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE).

Since there may be more than one Google Sheets file named 'inventory". we opt for newest one and thus need to sort all matching files in descending order of last modification time then name if "mtime"s are identical via an "order by" clause: orderBy='modifiedTime desc,name'. Here is the complete call to DRIVE.files().list() to issue the query:
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])
If any files match, the payload will contain a 'files' key, else we default to an empty list and display to the user on the last line that no files were found. Otherwise, grab the first match, the most recently-modified 'inventory' file, create a suitable CSV filename from it, and change all spaces to underscores:

fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]

The final Drive API call requests an export of 'inventory' as a CSV file, and if successful, the downloaded data is written with the filename above. In either case, the user is notified of success or failure of the export:
data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
if data:
    with open(fn, 'wb') as f:
        f.write(data)
    print('DONE')
else:
    print('ERROR (could not download file)')
Note that if downloading as CSV, the Drive API only exports of the first sheet in a Sheets file... you won't get any others. However, it does support 3 other download formats that will get you all the sheets.

If you create a Sheets file named 'inventory', run the script, grant the script access to your Google Drive (via the OAuth2 prompt that pops up in the browser), and then you should get output that looks like this:
$ python drive_sheets_csv_export.py # or python3
Exporting "inventory" as "inventory.csv"... DONE

Conclusion

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into 4 major sections, they would be:
  • Get creds & build Google Drive service endpoint
  • Source and destination file info
  • Query Google Drive for matching files
  • Export most recent matching Sheets file as CSV

Here's the code itself:
from __future__ import print_function
import argparse
import os

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/drive.readonly'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
    flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    creds = tools.run_flow(flow, store, flags)
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'

files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])

if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')
    else:
        print('ERROR (could not download file)')
else:
    print('!!! ERROR: File not found')
As with our other code samples, you can now customize for your own needs, for a mobile frontend, sysadmin script, or a server-side backend, perhaps accessing other Google APIs. Hope this helps answer yet another frequently asked question!

Thursday, June 9, 2016

Migrating SQL data to Google Sheets using the new Google Sheets API

NOTE: The code covered in this post are also available in a video walkthrough.

Introduction

In this post, we're going to demonstrate how to use the latest generation Google Sheets API. Launched at Google I/O 2016 (full talk here), the Sheets API v4 can do much more than previous versions, bringing it to near-parity with what you can do with the Google Sheets UI (user interface) on desktop and mobile. Below, I'll walk you through a Python script that reads the rows of a relational database representing customer orders for a toy company and pushes them into a Google Sheet. Other API calls we'll make: one to create new Google Sheets with and another that reads the rows from a Sheet.

Earlier posts demonstrated the structure and "how-to" use Google APIs in general, so more recent posts, including this one, focus on solutions and use of specific APIs. Once you review the earlier material, you're ready to start with authorization scopes then see how to use the API itself.

    Google Sheets API authorization & scopes

    Previous versions of the Google Sheets API (formerly called the Google Spreadsheets API), were part of a group of "GData APIs" that implemented the Google Data (GData) protocol, an older, less-secure, REST-inspired technology for reading, writing, and modifying information on the web. The new API version falls under the more modern set of Google APIs requiring OAuth2 authorization and whose use is made easier with the Google APIs Client Libraries.

    The current API version features a pair of authorization scopes: read-only and read-write. As usual, we always recommend you use the most restrictive scope possible that allows your app to do its work. You'll request fewer permissions from your users (which makes them happier), and it also makes your app more secure, possibly preventing modifying, destroying, or corrupting data, or perhaps inadvertently going over quotas. Since we're creating a Google Sheet and writing data into it, we must use the read-write scope:
    • 'https://www.googleapis.com/auth/spreadsheets' — Read/write access to Sheets and Sheet properties

    Using the Google Sheets API

    Let's look at some code that reads rows from a SQLite database and creates a Google Sheet with that data. Since we covered the authorization boilerplate fully in earlier posts and videos, we're going straight to creating a Sheets service endpoint. The API string to use is 'sheets' and the version string to use is 'v4' as we call the apiclient.discovey.build() function:

    SHEETS = discovery.build('sheets', 'v4', http=creds.authorize(Http()))

    With the SHEETS service endpoint in hand, the first thing to do is to create a brand new Google Sheet. Before we use it, one thing to know about the Sheets API is that most calls require a JSON payload representing the data & operations you wish to perform, and you'll see this as you become more familiar with it. For creating new Sheets, it's pretty simple, you don't have to provide anything, in which case you'd pass in an empty (dict as the) body, but a better bare minimum would be a name for the Sheet, so that's what data is for:

    data = {'properties': {'title': 'Toy orders [%s]' % time.ctime()}}

    Notice that a Sheet's "title" is part of its "properties," and we also happen to add the timestamp as part of its name. With the payload complete, we call the API with the command to create a new Sheet [spreadsheets().create()], passing in data in the (eventual) request body:

    res = SHEETS.spreadsheets().create(body=data).execute()

    Alternatively, you can use the Google Drive API (v2 or v3) to create a Sheet but would also need to pass in the Google Sheets (file) MIME type:
    data = {
        'name': 'Toy orders [%s]' % time.ctime(),
        'mimeType': 'application/vnd.google-apps.spreadsheet',
    }
    res = DRIVE.files().create(body=data).execute() # insert() for v2
    
    The general rule-of-thumb is that if you're only working with Sheets, you can do all the operations with its API, but if creating files other than Sheets or performing other Drive file or folder operations, you may want to stick with the Drive API. You can also use both or any other Google APIs for more complex applications. We'll stick with just the Sheets API for now. After creating the Sheet, grab and display some useful information to the user:
    SHEET_ID = res['spreadsheetId']
    print('Created "%s"' % res['properties']['title'])
    
    You may be wondering: Why do I need to create a Sheet and then make a separate API call to add data to it? Why can't I do this all when creating the Sheet? The answer (to this likely FAQ) is you can, but you would need to construct and pass in a JSON payload representing the entire Sheet—meaning all cells and their formatting—a much larger and more complex data structure than just an array of rows. (Don't believe me? Try it yourself!) This is why we have all of the spreadsheets().values() methods... to simplify uploading or downloading of only values to or from a Sheet.

    Now let's turn our attention to the simple SQLite database file (db.sqlite) available from the Google Sheets Node.js codelab. The next block of code just connects to the database with the standard library sqlite3 package, grabs all the rows, adds a header row, and filters the last two (timestamp) columns:
    FIELDS = ('ID', 'Customer Name', 'Product Code', 'Units Ordered',
            'Unit Price', 'Status', 'Created at', 'Updated at')
    cxn = sqlite3.connect('db.sqlite')
    cur = cxn.cursor()
    rows = cur.execute('SELECT * FROM orders').fetchall()
    cxn.close()
    rows.insert(0, FIELDS)
    data = {'values': [row[:6] for row in rows]}
    
    When you have a payload (array of row data) you want to stick into a Sheet, you simply pass in those values to spreadsheets().values().update() like we do here:
    SHEETS.spreadsheets().values().update(spreadsheetId=SHEET_ID,
        range='A1', body=data, valueInputOption='RAW').execute()
    
    The call requires a Sheet's ID and command body as expected, but there are two other fields: the full (or, as in our case, the "upper left" corner of the) range of cells to write to (in A1 notation), and valueInputOption indicates how the data should be interpreted, writing the raw values ("RAW") or interpreting them as if a user were entering them into the UI ("USER_ENTERED"), possibly converting strings & numbers based on the cell formatting.

    Reading rows out of a Sheet is even easier, the spreadsheets().values().get() call needing only an ID and a range of cells to read:
    print('Wrote data to Sheet:')
    rows = SHEETS.spreadsheets().values().get(spreadsheetId=SHEET_ID,
        range='Sheet1').execute().get('values', [])
    for row in rows:
        print(row)
    
    The API call returns a dict which has a 'values' key if data is available, otherwise we default to an empty list so the for loop doesn't fail.

    If you run the code (entire script below) and grant it permission to manage your Google Sheets (via the OAuth2 prompt that pops up in the browser), the output you get should look like this:
    $ python3 sheets-toys.py # or python (2.x)
    Created "Toy orders [Thu May 26 18:58:17 2016]" with this data:
    ['ID', 'Customer Name', 'Product Code', 'Units Ordered', 'Unit Price', 'Status']
    ['1', "Alice's Antiques", 'FOO-100', '25', '12.5', 'DELIVERED']
    ['2', "Bob's Brewery", 'FOO-200', '60', '18.75', 'SHIPPED']
    ['3', "Carol's Car Wash", 'FOO-100', '100', '9.25', 'SHIPPED']
    ['4', "David's Dog Grooming", 'FOO-250', '15', '29.95', 'PENDING']
    ['5', "Elizabeth's Eatery", 'FOO-100', '35', '10.95', 'PENDING']
    

    Conclusion

    Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):

    '''sheets-toys.py -- Google Sheets API demo
        created Jun 2016 by +Wesley Chun/@wescpy
    '''
    from __future__ import print_function
    import argparse
    import sqlite3
    import time
    
    from apiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
        flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
        creds = tools.run_flow(flow, store, flags)
    
    SHEETS = discovery.build('sheets', 'v4', http=creds.authorize(Http()))
    data = {'properties': {'title': 'Toy orders [%s]' % time.ctime()}}
    res = SHEETS.spreadsheets().create(body=data).execute()
    SHEET_ID = res['spreadsheetId']
    print('Created "%s"' % res['properties']['title'])
    
    FIELDS = ('ID', 'Customer Name', 'Product Code', 'Units Ordered',
            'Unit Price', 'Status', 'Created at', 'Updated at')
    cxn = sqlite3.connect('db.sqlite')
    cur = cxn.cursor()
    rows = cur.execute('SELECT * FROM orders').fetchall()
    cxn.close()
    rows.insert(0, FIELDS)
    data = {'values': [row[:6] for row in rows]}
    
    SHEETS.spreadsheets().values().update(spreadsheetId=SHEET_ID,
        range='A1', body=data, valueInputOption='RAW').execute()
    print('Wrote data to Sheet:')
    rows = SHEETS.spreadsheets().values().get(spreadsheetId=SHEET_ID,
        range='Sheet1').execute().get('values', [])
    for row in rows:
        print(row)
    
    You can now customize this code for your own needs, for a mobile frontend, devops script, or a server-side backend, perhaps accessing other Google APIs. If this example is too complex, check the Python quickstart in the docs that way simpler, only reading data out of an existing Sheet. If you know JavaScript and are ready for something more serious, try the Node.js codelab where we got the SQLite database from. That's it... hope you find these code samples useful in helping you get started with the latest Sheets API!

    EXTRA CREDIT: Feel free to experiment and try cell formatting or other API features. Challenge yourself as there's a lot more to Sheets than just reading and writing values! 

    Wednesday, December 23, 2015

    Migrating to Google Drive API v3

    NOTE: The code covered in this and the previous post are also available in a video walkthrough.

    Introduction

    In a blog post last week, we introduced readers to performing uploads and downloads files to/from Google Drive from a simple Python command-line script. In an official Google blog post later that same day, the Google Drive API team announced a new version of the API. Great timing huh? Well, good thing I knew it was coming, so that I could prepare this post for you, which is a primer on how to migrate from the current version of the API (v2) to the new one (v3).

    As stated by the Drive team, v2 isn't being deprecated, and there are no new features in v3, thus migration isn't required. The new version is mainly for new apps/integrations as well as developers with v2 apps who wish to take advantage of the improvements. This post is intended for those in the latter group, covering porting existing apps to v3. Ready? Let's go straight to the action.

    Migrating from Google Drive API v2 to v3

    Most of this post will be just examining all the "diffs" between the v2 code sample from the previous post (renamed from drive_updown.py to drive_updown2.py) and its v3 equivalent (drive_updown3.py). We'll take things step-by-step to provide more details, but let's start with all the diffs first:
    --- drive_updown2.py 2015-12-10 14:37:11.000000000 -0800
    +++ drive_updown3.py 2015-12-10 13:05:07.000000000 -0800
    @@ -19,23 +19,24 @@
                 'client_secret.json', scope=SCOPES)
         creds = tools.run_flow(flow, store, flags) \
                 if flags else tools.run(flow, store)
    -DRIVE = build('drive', 'v2', http=creds.authorize(Http()))
    +DRIVE = build('drive', 'v3', http=creds.authorize(Http()))
     
     FILES = (
    -    ('hello.txt', False),
    -    ('hello.txt', True),
    +    ('hello.txt', None),
    +    ('hello.txt', 'application/vnd.google-apps.document'),
     )
     
    -for filename, convert in FILES:
    -    metadata = {'title': filename}
    -    res = DRIVE.files().insert(convert=convert, body=metadata,
    -            media_body=filename, fields='mimeType,exportLinks').execute()
    +for filename, mimeType in FILES:
    +    metadata = {'name': filename}
    +    if mimeType:
    +        metadata['mimeType'] = mimeType
    +    res = DRIVE.files().create(body=metadata, media_body=filename).execute()
         if res:
             print('Uploaded "%s" (%s)' % (filename, res['mimeType']))
     
     if res:
         MIMETYPE = 'application/pdf'
    -    res, data = DRIVE._http.request(res['exportLinks'][MIMETYPE])
    +    data = DRIVE.files().export(fileId=res['id'], mimeType=MIMETYPE).execute()
         if data:
             fn = '%s.pdf' % os.path.splitext(filename)[0]
             with open(fn, 'wb') as fh:
    
    We'll start with the building of the service endpoint, with the trivial change of the API version string from 'v2' to 'v3':
    -DRIVE = build('drive', 'v2', http=creds.authorize(Http()))
    +DRIVE = build('drive', 'v3', http=creds.authorize(Http()))
    
    The next change is the deprecation of the conversion flag. The problem with a Boolean variable is that it limits the possible types of file formats supported. By changing it to a file mimeType instead, the horizons are broadened:
     FILES = (
    -    ('hello.txt', False),
    -    ('hello.txt', True),
    +    ('hello.txt', None),
    +    ('hello.txt', 'application/vnd.google-apps.document'),
     )
    
    Your next question will be: "What are the mimeTypes for the supported Google Apps document formats?" The answers can be found at this page in the official docs. This changes the datatype in our array of 2-tuples, so we need to change the loop variable to reflect this... we'll use the mimeType instead of a conversion flag:
    -for filename, convert in FILES:
    +for filename, mimeType in FILES:
    
    Another change related to deprecating the convert flag is that the mimeType isn't a parameter to the API call. Instead, it's another piece of metadata, so we need to add mimeType to the metadata object.

    Related to this is a name change: since a file's name is its name and not its title, it makes more sense to use "name" as the metadata value:
    -    metadata = {'title': filename}
    +    metadata = {'name': filename}
    +    if mimeType:
    +        metadata['mimeType'] = mimeType
    
    Why the if statement? Not only did v3 see a change to using mimeTypes, but rather than being a parameter like the conversion flag in v2, the mimeType has been moved into the file's metadata, so if we're doing any conversion, we need to add it to our metadata field (then remove the convert parameter down below).

    Next is yet another name change: when creating files on Google Drive, "create()" makes more sense as a method name than "insert()". Reducing the size of payload is another key ingredient of v3. We mentioned in the previous post that insert() returns more than 30 fields in the response payload unless you use the fields parameter to specify exactly which you wish returned. In v3, the default response payload only returns four fields, including all the ones we need in this script, so use of the fields parameter isn't required any more:
    -    res = DRIVE.files().insert(convert=convert, body=metadata,
    -            media_body=filename, fields='mimeType,exportLinks').execute()
    +    res = DRIVE.files().create(body=metadata, media_body=filename).execute()
    
    The final improvement we can demonstrate: users no longer have to make an authorized HTTP GET request with a link to export and download a file in an alternate format like PDF®. Instead, it's now a "normal" API call (to the new "export()" method) with the mimeType as a parameter. The only other parameter you need is the file ID, which comes back as part of the (default) response payload when the create() call was made:
    -    res, data = DRIVE._http.request(res['exportLinks'][MIMETYPE])
    +    data = DRIVE.files().export(fileId=res['id'], mimeType=MIMETYPE).execute()
    
    That's it! If you run the script, grant the script access to your Google Drive (via the OAuth2 prompt that pops up in the browser), and then you should get output that looks like this:
    $ python drive_updown3.py # or python3
    Uploaded "hello.txt" (text/plain)
    Uploaded "hello.txt" (application/vnd.google-apps.document)
    Downloaded "hello.pdf" (application/pdf)
    

    Conclusion

    The entire v2 script (drive_updown2.py) was spelled out in full in the previous post, and it hasn't changed since then. Below is the v3 script (drive_updown3.py) for your convenience which runs on both Python 2 and Python 3 (unmodified!):
    #!/usr/bin/env python
    
    from __future__ import print_function
    import os
    
    from apiclient.discovery import build
    from httplib2 import Http
    from oauth2client import file, client, tools
    try:
        import argparse
        flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
    except ImportError:
        flags = None
    
    SCOPES = 'https://www.googleapis.com/auth/drive.file'
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
        creds = tools.run_flow(flow, store, flags) \
                if flags else tools.run(flow, store)
    DRIVE = build('drive', 'v3', http=creds.authorize(Http()))
    
    FILES = (
        ('hello.txt', None),
        ('hello.txt', 'application/vnd.google-apps.document'),
    )
    
    for filename, mimeType in FILES:
        metadata = {'name': filename}
        if mimeType:
            metadata['mimeType'] = mimeType
        res = DRIVE.files().create(body=metadata, media_body=filename).execute()
        if res:
            print('Uploaded "%s" (%s)' % (filename, res['mimeType']))
    
    if res:
        MIMETYPE = 'application/pdf'
        data = DRIVE.files().export(fileId=res['id'], mimeType=MIMETYPE).execute()
        if data:
            fn = '%s.pdf' % os.path.splitext(filename)[0]
            with open(fn, 'wb') as fh:
                fh.write(data)
            print('Downloaded "%s" (%s)' % (fn, MIMETYPE))
    )
    
    Just as in the previous post(s), you can now customize this code for your own needs, for a mobile frontend, sysadmin script, or a server-side backend, perhaps accessing other Google APIs. Hope we accomplished our goal by pointing out some of the shortcomings that are in v2 and how they were improved in v3! All of the content in this and the previous post are spelled out visually in this video that I created for you.

    Monday, December 14, 2015

    Google Drive: Uploading & Downloading files with Python

    UPDATE: Since this post was published, the Google Drive team released a newer version of their API. After reading this one, go to the next post to learn about migrating your app from v2 to v3 as well as link to my video which walks through the code samples in both posts.

    Introduction

    So far in this series of blogposts covering authorized Google APIs, we've used Python to access Google Drive, Gmail, and Google Calendar. Today, we're revisiting Google Drive with a small snippet that uploads plain text files to Drive, with & without conversion to a Google Apps format (Google Docs), then exports & downloads the converted one as PDF®.

    Earlier posts demonstrated the structure and "how-to" use Google APIs in general, so more recent posts, including this one, focus on solutions and apps, and use of specific APIs. Once you review the earlier material, you're ready to start with authorization scopes then see how to use the API itself.

      Google Drive API Scopes

      Google Drive features numerous API scopes of authorization. As usual, we always recommend you use the most restrictive scope possible that allows your app to do its work. You'll request fewer permissions from your users (which makes them happier), and it also makes your app more secure, possibly preventing modifying, destroying, or corrupting data, or perhaps inadvertently going over quotas. Since we need to upload/create files in Google Drive, the minimum scope we need is:
      • 'https://www.googleapis.com/auth/drive.file' — Per-file create/open, read/write access

      Using the Google Drive API

      Let's get going with our example today that uploads and downloads a simple plain text file to Drive. The file will be uploaded twice, once as-is, and the second time, converted to a Google Docs document. The last part of the script will request an export of the (uploaded) Google Doc as PDF and download that from Drive.

      Since we've fully covered the authorization boilerplate fully in earlier posts and videos, we're going to skip that here and jump right to the action, creating of a service endpoint to Drive. The API name is (of course) 'drive', and the current version of the API is 2, so use the string 'v2' in this call to the apiclient.discovey.build() function:

      DRIVE = build('drive', 'v2', http=creds.authorize(Http()))

      Let's also create a FILES array object (tuple, list, etc.) which holds 2-tuples of the files to upload. These pairs are made up of a filename and a flag indicating whether or not you wish the file to be converted to a Google Apps format:
      FILES = (
          ('hello.txt', False),
          ('hello.txt', True),
      )
      Since we're uploading a plain text file, a conversion to Apps format means Google Docs. (You can imagine that if it was a CSV file, the target format would be Google Sheets instead.) With the setup complete, let's move on to the code that performs the file uploads.

      We'll loop through FILES, cycling through each file-convert flag pair and call the files.insert() method to perform the upload. The four parameters needed are: 1) the conversion flag, 2) the file metadata, which is only the filename (see below), 3) the media_body, which is also the filename but has a different purpose — it specifies where the file content will come from, meaning the file will be opened and its data transferred to the API, and 4), a set of fields you want returned.
      for filename, convert in FILES:
          metadata = {'title': filename}
          res = DRIVE.files().insert(convert=convert, body=metadata,
                  media_body=filename, fields='mimeType,exportLinks').execute()
          if res:
              print('Uploaded "%s" (%s)' % (filename, res['mimeType']))
      
      It's important to give the fields() parameter because if you don't, more than 30(!) are returned by default from the API. There's no need to waste all that network traffic if all you need are just a couple. In our case, we only want the mimeType, to confirm what the file was saved as, and exportLinks, which we'll explore in a moment. If files are uploaded successfully, the print() lets the user know, and then we move on to the final section of the script.

      Before we dig into the last bit of code, it's important to realize that the res variable still contains the result from the second upload, the one where the file is converted to Google Docs. This is important because this is where we need to extract the download link for the format you want (res['exportLinks'][MIMETYPE]). The way to download the file is to make an authorized HTTP GET call, passing in that link. In our case, it's the PDF version. If the download is successful, the data variable will have the payload to write to disk. If all's good, let the user know:
      if res:
          MIMETYPE = 'application/pdf'
          res, data = DRIVE._http.request(res['exportLinks'][MIMETYPE])
          if data:
              fn = '%s.pdf' % os.path.splitext(filename)[0]
              with open(fn, 'wb') as fh:
                  fh.write(data)
              print('Downloaded "%s" (%s)' % (fn, MIMETYPE))
      
      Final note: this code sample is slightly different from previous posts in two big ways: 1) now that the Google APIs Client Library runs on Python 3, I'll try to produce only code samples for this blog that run unmodified under both 2.x and 3.x interpreters — the primary one-line difference being the import of the print() function, and 2) we're going to incorporate the use of the run_flow() function from oauth2client.tools and only fallback to the deprecated run() function if necessary — more info on this change available in this earlier post.

      If you run the script, grant the script access to your Google Drive (via the OAuth2 prompt that pops up in the browser), and then you should get output that looks like this:
      $ python drive_updown3.py # or python3
      Uploaded "hello.txt" (text/plain)
      Uploaded "hello.txt" (application/vnd.google-apps.document)
      Downloaded "hello.pdf" (application/pdf)
      

      Conclusion

      Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):
      #!/usr/bin/env python
      
      from __future__ import print_function
      import os
      
      from apiclient.discovery import build
      from httplib2 import Http
      from oauth2client import file, client, tools
      try:
          import argparse
          flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
      except ImportError:
          flags = None
      
      SCOPES = 'https://www.googleapis.com/auth/drive.file'
      store = file.Storage('storage.json')
      creds = store.get()
      if not creds or creds.invalid:
          flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
          creds = tools.run_flow(flow, store, flags) \
                  if flags else tools.run(flow, store)
      DRIVE = build('drive', 'v2', http=creds.authorize(Http()))
      
      FILES = (
          ('hello.txt', False),
          ('hello.txt', True),
      )
      
      for filename, convert in FILES:
          metadata = {'title': filename}
          res = DRIVE.files().insert(convert=convert, body=metadata,
                  media_body=filename, fields='mimeType,exportLinks').execute()
          if res:
              print('Uploaded "%s" (%s)' % (filename, res['mimeType']))
      
      if res:
          MIMETYPE = 'application/pdf'
          res, data = DRIVE._http.request(res['exportLinks'][MIMETYPE])
          if data:
              fn = '%s.pdf' % os.path.splitext(filename)[0]
              with open(fn, 'wb') as fh:
                  fh.write(data)
              print('Downloaded "%s" (%s)' % (fn, MIMETYPE))
      
      You can now customize this code for your own needs, for a mobile frontend, sysadmin script, or a server-side backend, perhaps accessing other Google APIs. If you want to see another example of using the Drive API, check out this earlier post listing the files in Google Drive and its accompanying video as well as a similar example in the official docs or its equivalent in Java (server-side, Android), iOS (Objective-C, Swift), C#/.NET, PHP, Ruby, JavaScript (client-side, Node.js, Google Apps Script), or Go. That's it... hope you find these code samples useful in helping you get started with the Drive API!

      UPDATE: Since this post was published, the Google Drive team released a newer version of their API. Go to the next post to learn about migrating your app from v2 to v3 as well as link to my video which walks through the code samples in both posts.

      EXTRA CREDIT: Feel free to experiment and try something else to test your skills and challenge yourself as there's a lot more to Drive than just uploading and downloading files. Experiment with creating folders and manipulate files there, work with a folder of photos and organize them using the image metadata available to you, implement a search engine for your Drive files, etc. There are so many things you can do! 

      Wednesday, September 9, 2015

      Creating events in Google Calendar from Python

      NOTE: The code covered in this blogpost is also available in a video walkthrough here.

      UPDATE (Jan 2016): Tweaked the code to support oauth2client.tools.run_flow() which deprecates oauth2client.tools.run(). You can read more about that change and migration steps here.

      Introduction

      So far in this series of blogposts covering authorized Google APIs, we've used Python code to access Google Drive and Gmail. Today, we're going to demonstrate the Google Calendar API. While Google Calendar, and calendaring in general, have been around for a long time and are fairly stable, it's somewhat of a mystery as to why so few developers create good calendar integrations, whether it be with Google Calendar, or other systems. We'll try to show it isn't necessarily difficult and hopefully motivate some of you out there to add a calendaring feature in your next mobile or web app.

      Earlier posts demonstrated the structure and "how-to" use Google APIs in general, so more recent posts, including this one, focus on solutions and apps, and use of specific APIs. Once you review the earlier material, you're ready to start with authorization scopes then see how to use the API itself.

        Google Calendar API Scopes

        Below are the Google Calendar API scopes of authorization. There are only a pair (at the time of this writing): read-only and read/write. As usual, use the most restrictive scope you possibly can yet still allowing your app to do its work. This makes your app more secure and may prevent inadvertently going over any quotas, or accessing, destroying, or corrupting data. Also, users are less hesitant to install your app if it asks only for more restricted access to their calendars. However, it's likely that in order to really use the API to its fullest, you will probably have to ask for read-write so that you can add, update, or delete events in their calendars.
        • 'https://www.googleapis.com/auth/calendar.readonly' — Read-only access to calendar
        • 'https://www.googleapis.com/auth/calendar' — Read/write access to calendar

        Using the Google Calendar API

        We're going to create a sample Python script that inserts a new event into your Google Calendar. Since this requires modifying your calendar, you need the read/write scope above. The API name is 'calendar' which is currently on version 3, so here's the call to apiclient.discovery.build() you'll use:

        CAL = build('calendar', 'v3', http=creds.authorize(Http()))

        Note that all lines of code above that is predominantly boilerplate (that was explained in earlier posts and videos). Anyway, we've got an established service endpoint with build(), we need to come up with the data to create a calendar event with, at the very least, an event name plus start and end times.

        Timezone required (and in a specific format)

        The API requires a GMT offset, the number of hours your timezone is away from Coordinated Universal Time (UTC, more commonly known as GMT). The format is +/-HH:MM away from UTC. For example, Pacific Daylight Time (PDT, also known as Mountain Standard Time, or MST), is "-07:00," or seven hours behind UTC while Nepal Standard Time (NST [or NPT to avoid confusion with Newfoundland Standard Time]), is "+05:45," or five hours and forty-five minutes ahead of UTC. Also, the timezone must be in RFC 3339 format, which implements the specifications of ISO 8601 for the Internet. Timestamps look like the following in the required format: "YYYY-MM-DDTHH:MM:SS±HH:MM". For example, September 15, 2015 at 7 PM PDT is represented by this string: "2015-09-15T19:00:00-07:00".

        The sample script uses the PDT timezone, so we set the GMT_OFF variable to "-07:00". The EVENT body will hold the event name, and start and end times suffixed with the GMT offset:
        GMT_OFF = '-07:00'    # PDT/MST/GMT-7
        EVENT = {
            'summary': 'Dinner with friends',
            'start':   {'dateTime': '2015-09-15T19:00:00%s' % GMT_OFF},
            'end':     {'dateTime': '2015-09-15T22:00:00%s' % GMT_OFF},
        }
        Use the insert() method of the events() service to add the event. As expected, one required parameter is the ID of the calendar to insert the event into. A special value of 'primary' has been set aside for the currently authenticated user. The other required parameter is the event body. In our request, we also ask the Calendar API to send email notifications to the guests, and that's done by passing in the sendNotifications flag with a True value. Our call to the API looks like this:
        e = CAL.events().insert(calendarId='primary',
            sendNotifications=True, body=EVENT).execute()
        The one remaining thing is to confirm that the calendar event was created successfully. We do that by checking the return value — it should be an Event object with all the details we passed in a moment ago:
        print('''*** %r event added:
            Start: %s
            End:   %s''' % (e['summary'].encode('utf-8'),
                e['start']['dateTime'], e['end']['dateTime']))
        
        Now, if you really want some proof the event was created, one of the fields that's created is a link to the calendar event. We don't use it in the code, but you can... just use e['htmlLink'].

        Regardless, that's pretty much the entire script save for the OAuth2 code that we're so familiar with from previous posts. The script is posted below in its entirety, and if you run it, depending on the date/times you use, you'll see something like this:
        $ python gcal_insert.py
        *** 'Dinner with friends' event added:
            Start: 2015-09-15T19:00:00-07:00
            End:   2015-09-15T22:00:00-07:00
        It also works with Python 3 with one slight nit/difference being the "b" prefix on from the event name due to converting from Unicode to bytes:
        *** b'Dinner with friends' event added:

        Conclusion

        There can be much more to adding a calendar event, such as events that repeat with a recurrence rule, the ability to add attachments for an event, such as a party invitation or a PDF of the show tickets. For more on what you can do when creating events, take a look at the docs for events().insert() as well as the corresponding developer guide. All of the docs for the Google Calendar API can be found here. Also be sure to check out the companion video for this code sample. That's it!

        Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):
        from __future__ import print_function
        from apiclient.discovery import build
        from httplib2 import Http
        from oauth2client import file, client, tools
        
        try:
            import argparse
            flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
        except ImportError:
            flags = None
        
        SCOPES = 'https://www.googleapis.com/auth/calendar'
        store = file.Storage('storage.json')
        creds = store.get()
        if not creds or creds.invalid:
            flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
            creds = tools.run_flow(flow, store, flags) \
                    if flags else tools.run(flow, store)
        CAL = build('calendar', 'v3', http=creds.authorize(Http()))
        
        GMT_OFF = '-07:00'      # PDT/MST/GMT-7
        EVENT = {
            'summary': 'Dinner with friends',
            'start':  {'dateTime': '2015-09-15T19:00:00%s' % GMT_OFF},
            'end':    {'dateTime': '2015-09-15T22:00:00%s' % GMT_OFF},
            'attendees': [
                {'email': 'friend1@example.com'},
                {'email': 'friend2@example.com'},
            ],
        }
        
        e = CAL.events().insert(calendarId='primary',
                sendNotifications=True, body=EVENT).execute()
        
        print('''*** %r event added:
            Start: %s
            End:   %s''' % (e['summary'].encode('utf-8'),
                e['start']['dateTime'], e['end']['dateTime']))
        
        You can now customize this code for your own needs, for a mobile frontend, a server-side backend, or to access other Google APIs. If you want to see another example of using the Calendar API (listing the next 10 events in your calendar), check out the Python Quickstart example or its equivalent in Java (server-side, Android), iOS (Objective-C, Swift), C#/.NET, PHP, Ruby, JavaScript (client-side, Node.js), or Go. That's it... hope you find these code samples useful in helping you get started with the Calendar API!

        EXTRA CREDIT: To test your skills and challenge yourself, try creating recurring events (such as when you expect to receive your paycheck), events with attachments, or even editing existing events.

        Thursday, August 6, 2015

        Accessing Gmail from Python (plus BONUS)

        NOTE: The code covered in this blogpost is also available in a video walkthrough here.

        UPDATE (Aug 2016): The code has been modernized to use oauth2client.tools.run_flow() instead of the deprecated oauth2client.tools.run(). You can read more about that change here.

        Introduction

        The last several posts have illustrated how to connect to public/simple and authorized Google APIs. Today, we're going to demonstrate accessing the Gmail (another authorized) API. Yes, you read that correctly... "API." In the old days, you access mail services with standard Internet protocols such as IMAP/POP and SMTP. However, while they are standards, they haven't kept up with modern day email usage and developers' needs that go along with it. In comes the Gmail API which provides CRUD access to email threads and drafts along with messages, search queries, management of labels (like folders), and domain administration features that are an extra concern for enterprise developers.

        Earlier posts demonstrate the structure and "how-to" use Google APIs in general, so the most recent posts, including this one, focus on solutions and apps, and use of specific APIs. Once you review the earlier material, you're ready to start with Gmail scopes then see how to use the API itself.

          Gmail API Scopes

          Below are the Gmail API scopes of authorization. We're listing them in most-to-least restrictive order because that's the order you should consider using them in  use the most restrictive scope you possibly can yet still allowing your app to do its work. This makes your app more secure and may prevent inadvertently going over any quotas, or accessing, destroying, or corrupting data. Also, users are less hesitant to install your app if it asks only for more restricted access to their inboxes.
          • 'https://www.googleapis.com/auth/gmail.readonly' — Read-only access to all resources + metadata
          • 'https://www.googleapis.com/auth/gmail.send' — Send messages only (no inbox read nor modify)
          • 'https://www.googleapis.com/auth/gmail.labels' — Create, read, update, and delete labels only
          • 'https://www.googleapis.com/auth/gmail.insert' — Insert and import messages only
          • 'https://www.googleapis.com/auth/gmail.compose' — Create, read, update, delete, and send email drafts and messages
          • 'https://www.googleapis.com/auth/gmail.modify' — All read/write operations except for immediate & permanent deletion of threads & messages
          • 'https://mail.google.com/' — All read/write operations (use with caution)

          Using the Gmail API

          We're going to create a sample Python script that goes through your Gmail threads and looks for those which have more than 2 messages, for example, if you're seeking particularly chatty threads on mailing lists you're subscribed to. Since we're only peeking at inbox content, the only scope we'll request is 'gmail.readonly', the most restrictive scope. The API string is 'gmail' which is currently on version 1, so here's the call to apiclient.discovery.build() you'll use:

          GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))

          Note that all lines of code above that is predominantly boilerplate (that was explained in earlier posts). Anyway, once you have an established service endpoint with build(), you can use the list() method of the threads service to request the file data. The one required parameter is the user's Gmail address. A special value of 'me' has been set aside for the currently authenticated user.
          threads = GMAIL.users().threads().list(userId='me').execute().get('threads', [])
          If all goes well, the (JSON) response payload will (not be empty or missing and) contain a sequence of threads that we can loop over. For each thread, we need to fetch more info, so we issue a second API call for that. Specifically, we care about the number of messages in a thread:
          for thread in threads:
              tdata = GMAIL.users().threads().get(userId='me', id=thread['id']).execute()
              nmsgs = len(tdata['messages'])
          
          We're seeking only all threads more than 2 (that means at least 3) messages, discarding the rest. If a thread meets that criteria, scan the first message and cycle through the email headers looking for the "Subject" line to display to users, skipping the remaining headers as soon as we find one:
              if nmsgs > 2:
                  msg = tdata['messages'][0]['payload']
                  subject = ''
                  for header in msg['headers']:
                      if header['name'] == 'Subject':
                          subject = header['value']
                          break
                  if subject:
                      print('%s (%d msgs)' % (subject, nmsgs))
          
          If you're on many mailing lists, this may give you more messages than desired, so feel free to up the threshold from 2 to 50, 100, or whatever makes sense for you. (In that case, you should use a variable.) Regardless, that's pretty much the entire script save for the OAuth2 code that we're so familiar with from previous posts. The script is posted below in its entirety, and if you run it, you'll see an interesting collection of threads... YMMV depending on what messages are in your inbox:
          $ python3 gmail_threads.py
          [Tutor] About Python Module to Process Bytes (3 msgs)
          Core Python book review update (30 msgs)
          [Tutor] scratching my head (16 msgs)
          [Tutor] for loop for long numbers (10 msgs)
          [Tutor] How to show the listbox from sqlite and make it searchable? (4 msgs)
          [Tutor] find pickle and retrieve saved data (3 msgs)
          

          BONUS: Python 3!

          As of Mar 2015 (formally in Apr 2015 when the docs were updated), support for Python 3 was added to Google APIs Client Library (3.3+)! This update was a long time coming (relevant GitHub thread), and allows Python 3 developers to write code that accesses Google APIs. If you're already running 3.x, you can use its pip command (pip3) to install the Client Library:

          $ pip3 install -U google-api-python-client

          Because of this, unlike previous blogposts, we're deliberately going to avoid use of the print statement and switch to the print() function instead. If you're still running Python 2, be sure to add the following import so that the code will also run in your 2.x interpreter:

          from __future__ import print_function

          Conclusion

          To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs for threads().list(). For more information on what other operations you can execute with the Gmail API, take a look at the reference docs and check out the companion video for this code sample. That's it!

          Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):
          from __future__ import print_function
          
          from apiclient import discovery
          from httplib2 import Http
          from oauth2client import file, client, tools
          
          SCOPES = 'https://www.googleapis.com/auth/gmail.readonly'
          store = file.Storage('storage.json')
          creds = store.get()
          if not creds or creds.invalid:
              flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
              creds = tools.run_flow(flow, store)
          GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))
          
          threads = GMAIL.users().threads().list(userId='me').execute().get('threads', [])
          for thread in threads:
              tdata = GMAIL.users().threads().get(userId='me', id=thread['id']).execute()
              nmsgs = len(tdata['messages'])
          
              if nmsgs > 2:
                  msg = tdata['messages'][0]['payload']
                  subject = ''
                  for header in msg['headers']:
                      if header['name'] == 'Subject':
                          subject = header['value']
                          break
                  if subject:
                      print('%s (%d msgs)' % (subject, nmsgs))
          
          You can now customize this code for your own needs, for a mobile frontend, a server-side backend, or to access other Google APIs. If you want to see another example of using the Gmail API (displaying all your inbox labels), check out the Python Quickstart example in the official docs or its equivalent in Java (server-side, Android), iOS (Objective-C, Swift), C#/.NET, PHP, Ruby, JavaScript (client-side, Node.js), or Go. That's it... hope you find these code samples useful in helping you get started with the Gmail API!

          EXTRA CREDIT: To test your skills and challenge yourself, try writing code that allows users to perform a search across their email, or perhaps creating an email draft, adding attachments, then sending them! Note that to prevent spam, there are strict Program Policies that you must abide with... any abuse could rate limit your account or get it shut down. Check out those rules plus other Gmail terms of use here.

          Tuesday, April 7, 2015

          Google APIs: migrating from tools.run() to tools.run_flow()

          Got AttributeError? As in: AttributeError: 'module' object has no attribute 'run'? Rename run() to run_flow(), and you'll be good-to-go. TL;DR: This mini-tutorial slash migration guide slash PSA (public service announcement) is aimed at Python developers using the Google APIs Client Library (to access Google APIs from their applications) currently calling oauth2client.tools.run() and likely getting an exception (see Jan 2016 update below), and need to oauth2client.tools.run_flow(), its replacement. 

          UPDATE (Aug 2016): The flags parameter in run_flow() function became optional in Feb 2016, so tweaked the blogpost to reflect that.

          UPDATE (Jun 2016): Revised the code and cleaned up the dialog so there are no longer any instances of using run() function, significantly shortening this post.

          UPDATE (Jan 2016): The tools.run() function itself was forcibly removed (without a fallback) in Aug 2015, so if you're using any release on or after that, any such calls from your code will throw an exception (AttributeError: 'module' object has no attribute 'run'). To fix this problem, continue reading.

          Prelude

          We're going to continue our look at accessing Google APIs from Python. In addition to the previous pair of posts (http://goo.gl/57Gufk and http://goo.gl/cdm3kZ), as part of my day job, I've been working on corresponding video content, some of which are tied specifically to posts on this blog.

          In this follow-up, we're going to specifically address the sidebar in the previous post, where we bookmarked an item for future discussion where the future is now: in the oauth2client package, tools.run() has been deprecated by tools.run_flow(). Note you need at least Python 2.7 or 3.3 to use the Google APIs Client Library. (If you didn't even know Python 3 was supported at all, then you need to see this post and this Quora Q&A.)

          Replacing tools.run() with tools.run_flow()

          Now let's convert the authorized access to Google APIs code from using tools.run() to tools.run_flow(). Here is the old snippet I'm talking about that needs upgrading:
          from apiclient import discovery
          from httplib2 import Http
          from oauth2client import file, client, tools
          
          SCOPES = # one or more scopes (str or iterable)
          store = file.Storage('storage.json')
          creds = store.get()
          if not creds or creds.invalid:
              flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
              creds = tools.run(flow, store)
          
          SERVICE = discovery.build(API, VERSION, http=creds.authorize(Http()))
          
          If you're using the latest Client Library (as of Feb 2016), all you need to do is change the tools.run() call to tools.run_flow(), as italicized below. Everything else stays exactly the same:
          from apiclient import discovery
          from httplib2 import Http
          from oauth2client import file, client, tools
          
          SCOPES = # one or more scopes (str or iterable)
          store = file.Storage('storage.json')
          creds = store.get()
          if not creds or creds.invalid:
              flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
              creds = tools.run_flow(flow, store)
          If you don't have the latest Client Library, then your update involves the extra steps of adding lines that import argparse and using it to get the flags argument needed by tools.run_flow() plus the actual change from tools.run(); all updates italicized below:
          import argparse
          
          from apiclient import discovery
          from httplib2 import Http
          from oauth2client import file, client, tools
          
          SCOPES = # one or more scopes (str or iterable)
          store = file.Storage('storage.json')
          creds = store.get()
          if not creds or creds.invalid:
              flags = argparse.ArgumentParser(parents=[tools.argparser]).parse_args()
              flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
              creds = tools.run_flow(flow, store, flags)
          
          SERVICE = discovery.build(API, VERSION, http=creds.authorize(Http()))
          

          Command-line argument processing, or "Why argparse?"

          Python has had several modules in the Standard Library that allow developers to process command-line arguments. The original one was getopt which mirrored the getopt() function from C. In Python 2.3, optparse was introduced, featuring more powerful processing capabilities. However, it was deprecated in 2.7 in favor of a similar module, argparse. (To find out more about their similarities, differences and rationale behind developing argparse , see PEP 389 and this argparse docs page.) For the purposes of using Google APIs, you're all set if using Python 2.7 as it's included in the Standard Library. Otherwise Python 2.3-2.6 users can install it with: "pip install -U argparse". 

          Irregardless of whether you need argparse, once you migrate to either snippet with tools.run_flow(), your application should go back to working the way it had before.