Monday, July 11, 2016

Exporting a Google Sheet spreadsheet as CSV

Introduction

Today, we'll follow-up to my earlier post on the Google Sheets API and multiple posts (first, secondthird) on the Google Drive API by answering one common question: How do you download a Google Sheets spreadsheet as a CSV file? The "FAQ"ness of the question itself as well as various versions of Google APIs has led to many similar StackOverflow questions: one, two, three, four, five, just to list a few. Let's answer this question definitively and walk through a Python code sample that does exactly that. The main assumption is that you have a Google Sheet file in your Google Drive named "inventory".

Choosing the right API

Upon first glance, developers may think the Google Sheets API is the one to use. Unfortunately that isn't the case. The Sheets API is the one to use for spreadsheet-oriented operations, such as inserting data, reading spreadsheet rows, managing individual tab/sheets within a spreadsheet, cell formatting, creating charts, adding pivot tables, etc., It isn't meant to perform file-based requests like exporting a Sheet in CSV (comma-separated values) format. For file-oriented operations with a Google Sheet, you would use the Google Drive API.

Using the Google Drive API

As mentioned earlier, Google Drive features numerous API scopes of authorization. As usual, we always recommend you use the most restrictive scope possible that allows your app to do its work. You'll request fewer permissions from your users (which makes them happier), and it also makes your app more secure, possibly preventing modifying, destroying, or corrupting data, or perhaps inadvertently going over quotas. Since we're only exporting a Google Sheets file from Google Drive, the only scope we need is:
  • 'https://www.googleapis.com/auth/drive.readonly' — Read-only access to file content or metadata
The earlier post I wrote on the Google Drive API featured sample code that exported an uploaded Google Docs file as PDF and download that from Drive. This post will not only feature a change to exporting a Google Sheets file in CSV format, but also demonstrate one additional feature of the Drive API: querying

Since we've fully covered the authorization boilerplate fully in earlier posts and videos, we're going to skip that here and jump right to the action, creating of a service endpoint to Drive. The API name is (of course 'drive', and the current version of the API is 3, so use the string 'v3' in this call to the apiclient.discovey.build() function:

DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

Query and export files from Google Drive

While unnecessary, we'll create a few string constants representing the filename, source and destination file MIME types to make the code easier to understand:
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'
In this simple example, we're only going to export one Google Sheets file as CSV, arbitrarily choosing a file named, "inventory." So to perform the query, you need both the filename and its MIME type, "application/vnd.google-apps.spreadsheet". Query components are conjoined with the "and" keyword, so your query string will look like this: q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE).

Since there may be more than one Google Sheets file named 'inventory". we opt for newest one and thus need to sort all matching files in descending order of last modification time then name if "mtime"s are identical via an "order by" clause: orderBy='modifiedTime desc,name'. Here is the complete call to DRIVE.files().list() to issue the query:
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])
If any files match, the payload will contain a 'files' key, else we default to an empty list and display to the user on the last line that no files were found. Otherwise, grab the first match, the most recently-modified 'inventory' file, create a suitable CSV filename from it, and change all spaces to underscores:

fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]

The final Drive API call requests an export of 'inventory' as a CSV file, and if successful, the downloaded data is written with the filename above. In either case, the user is notified of success or failure of the export:
data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
if data:
    with open(fn, 'wb') as f:
        f.write(data)
    print('DONE')
else:
    print('ERROR (could not download file)')
Note that if downloading as CSV, the Drive API only exports of the first sheet in a Sheets file... you won't get any others. However, it does support 3 other download formats that will get you all the sheets.

If you create a Sheets file named 'inventory', run the script, grant the script access to your Google Drive (via the OAuth2 prompt that pops up in the browser), and then you should get output that looks like this:
$ python drive_sheets_csv_export.py # or python3
Exporting "inventory" as "inventory.csv"... DONE

Conclusion

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into 4 major sections, they would be:
  • Get creds & build Google Drive service endpoint
  • Source and destination file info
  • Query Google Drive for matching files
  • Export most recent matching Sheets file as CSV

Here's the code itself:
from __future__ import print_function
import os

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/drive.readonly'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'

files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])

if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')
    else:
        print('ERROR (could not download file)')
else:
    print('!!! ERROR: File not found')
As with our other code samples, you can now customize for your own needs, for a mobile frontend, sysadmin script, or a server-side backend, perhaps accessing other Google APIs. Hope this helps answer yet another frequently asked question!

3 comments:

  1. Very helpfull!! If someone is trying to use this in 2020 oauth2client has been deprecated and google-auth is recommended.
    Below is an issue from GitHub to replace oauth2client with google-auth:
    https://github.com/googleapis/google-api-python-client/issues/679

    ReplyDelete
  2. Correct, although `oauth2client` will still work for the foreseeable future (because so much code depends on it), however they're not adding new features. The "intro to G Suite APIs" script (featuring the Drive API) with both old & new version using `google-auth` (and `oauthlib` under the covers) are available at this sample's GitHub repo at https://github.com/googlecodelabs/gsuite-apis-intro. A `diff` b/w this pair of scripts will show the exact changes that need to be made. A much longer explanation for Python developers (which will turn into a semi-official blog post at some point) is available via a larger script (showing how to download images from Drive, archiving them to Google Cloud Storage, sending them to the Cloud Vision API for analysis, and have the results written into a row of a Google Sheet) where the migration technique I just described is outlined in detail in the `README` in the `alt` folder offering the various alternatives: https://github.com/googlecodelabs/analyze_gsimg/tree/master/alt. (At some point, I'll have videos & blog posts on that larger example too.)

    ReplyDelete
  3. This article was incredibly useful to me - many thanks!

    ReplyDelete