Tuesday, December 13, 2016

Formatting text with the Google Slides API

NOTE: The code covered in this post is also available in a video walkthrough.

Introduction

If you know something about public speaking, you're aware that the most effective presentations are those with more images and less text. For developers of applications that auto-generate slide decks, this is even more critical, as you must ensure that your code creates the most compelling presentations possible for your users.

This means that any text featured in those slide decks must be more impactful. To that end, it's important you know how to format any text you do have. That's the exact subject of today's post, showing you how to format text in a variety of ways using Python and the Google Slides API.

The API is fairly new, so if you're unfamiliar with it, check out the launch post and take a peek at the API overview page to acclimate yourself to it first. You can also read related posts (and videos) explaining how to replace text & images with the API or how to generate slides from spreadsheet data. If you're ready-to-go, let's move on!

Using the Google Slides API

The demo script requires creating a new slide deck so you need the read-write scope for Slides:
  • 'https://www.googleapis.com/auth/presentations' — Read-write access to Slides and Slides presentation properties
If you're new to using Google APIs, we recommend reviewing earlier posts & videos covering how to set up projects and the authorization boilerplate so that we can focus on the main app. Once we've authorized our app, assume you have a service endpoint to the API and have assigned it to the SLIDES variable.

Create deck & set up new slide for text formatting

A new slide deck can be created with SLIDES.presentations().create()—or alternatively with the Google Drive API which we won't do here. We'll name it, "Slides text formatting DEMO" and save its ID along with the IDs of the title and subtitle textboxes on the auto-created title slide:
DATA = {'title': 'Slides text formatting DEMO'}
rsp = SLIDES.presentations().create(body=DATA).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0]
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']
The title slide only has two elements on it, the title and subtitle textboxes, returned in that order, hence why we grab them at indexes 0 and 1 respectively. Now that we have a deck, let's add a slide that has a single (largish) textbox. The slide layout with that characteristic that works best for our demo is the "main point" template:



While we're at it, let's also add the title & subtitle on the title slide. Here's the snippet that builds and executes all three requests:
print('** Create "main point" layout slide & add titles')
reqs = [
  {'createSlide':
     {'slideLayoutReference': {'predefinedLayout': 'MAIN_POINT'}}},
  {'insertText':
     {'objectId': titleID, 'text': 'Formatting text'}},
  {'insertText':
     {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
slideID = rsp[0]['createSlide']['objectId']
The requests are sent in the order you see above, and responses come back in the same order. We don't care much about the 'insertText' directives, but we do want to get the ID of the newly-created slide. In the array of 3 returned responses, that slideID comes first.
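Because replies map one-to-one onto requests, the new slide's ID can also be picked out by request type rather than by a hardcoded position. The following is only a minimal sketch and not part of the original script: the helper name created_ids and the sample replies (with made-up object IDs) are assumptions for illustration, and it assumes that requests with nothing to report, such as 'insertText', come back as empty objects.

```python
def created_ids(replies, kind='createSlide'):
    """Return the objectIds from all replies of the given request kind, in order."""
    return [r[kind]['objectId'] for r in replies if kind in r]

# Sample 'replies' shaped like a batchUpdate() response (IDs are made up)
sample = [{'createSlide': {'objectId': 'slide_abc'}}, {}, {}]
slide_ids = created_ids(sample)
print(slide_ids[0])
```

With the real response, created_ids(rsp)[0] would play the same role as rsp[0]['createSlide']['objectId'] above.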

Why do we need the slide ID? Well, since we're going to be using the one textbox on that slide, the only way to get the ID of that textbox is by doing a presentations().pages().get() call to fetch all the objects on that slide. Since there's only one "page element," the textbox in question, we make that call and save the first (and only) object's ID:
print('** Fetch "main point" slide title (textbox) ID')
rsp = SLIDES.presentations().pages().get(presentationId=deckID,
        pageObjectId=slideID).execute().get('pageElements')
textboxID = rsp[0]['objectId']
Armed with the textbox ID, we're ready to add our text and format it!

Formatting text

The last part of the script starts by inserting seven (short) paragraphs of text, then formats different parts of that text (in a variety of ways). Take a look here, then we'll discuss below:
reqs = [
    # add 7 paragraphs
    {'insertText': {
        'text': 'Bold 1\nItal 2\n\tfoo\n\tbar\n\t\tbaz\n\t\tqux\nMono 3',
        'objectId': textboxID,
    }},
    # shrink text from 48pt ("main point" textbox default) to 32pt
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'fontSize': {'magnitude': 32, 'unit': 'PT'}},
        'textRange': {'type': 'ALL'},
        'fields': 'fontSize',
    }},
    # change word 1 in para 1 ("Bold") to bold
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'bold': True},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 0, 'endIndex': 4},
        'fields': 'bold',
    }},
    # change word 1 in para 2 ("Ital") to italics
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'italic': True},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 7, 'endIndex': 11},
        'fields': 'italic'
    }},
    # change word 1 in para 7 ("Mono") to Courier New
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'fontFamily': 'Courier New'},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 36, 'endIndex': 40},
        'fields': 'fontFamily'
    }},
    # bulletize everything
    {'createParagraphBullets': {
        'objectId': textboxID,
        'textRange': {'type': 'ALL'},
    }},
]
After the text is inserted, the first operation this code performs is to change the font size of all the text inserted ('ALL' means to format the entire text range) to 32 pt. The main point layout specifies a default font size of 48 pt, so this request shrinks the text so that everything fits and doesn't wrap. The 'fields' parameter specifies that only the 'fontSize' attribute is affected by this command, meaning leave others such as the font type, color, etc., alone.
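One way to keep 'style' and 'fields' from drifting apart is to derive the field mask from the style keys. This is a hedged sketch rather than anything from the original script: the helper name style_request and the placeholder object ID are assumptions, and it only handles top-level style attributes such as the ones used in this post.

```python
def style_request(object_id, style, text_range=None):
    """Build an updateTextStyle request whose 'fields' mask names exactly the style keys set."""
    return {'updateTextStyle': {
        'objectId': object_id,
        'style': style,
        'textRange': text_range or {'type': 'ALL'},
        'fields': ','.join(sorted(style)),   # e.g. 'bold,italic' for two attributes
    }}

req = style_request('textbox_id_here', {'fontSize': {'magnitude': 32, 'unit': 'PT'}})
print(req['updateTextStyle']['fields'])
```

Any attribute left out of 'fields' is ignored by the request, so generating the mask from the same dictionary avoids silently dropping a style change.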

The next request bolds the first word of the first paragraph. Instead of 'ALL', the exact range for the first word is given. (NOTE: the end index is excluded from the range, which is why it must be 4 instead of 3; otherwise you'd lose one character.) In this case, it's the word "Bold" from the first paragraph, "Bold 1". Again, 'fields' is present to indicate that only the bold attribute should be affected by this request while everything else is left alone. The next directive is nearly identical except that it italicizes the first word ("Ital") of the second paragraph ("Ital 2").
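Counting characters by hand is error-prone, so the start and (exclusive) end indexes can be computed from the inserted string instead. The sketch below is an illustration, not part of the original script: word_range is a hypothetical helper, it finds only the first occurrence of the word, and it assumes plain ASCII text, where Python string positions line up with the indexes the API expects.

```python
TEXT = 'Bold 1\nItal 2\n\tfoo\n\tbar\n\t\tbaz\n\t\tqux\nMono 3'

def word_range(text, word):
    """Return a FIXED_RANGE covering the first occurrence of word (end index exclusive)."""
    start = text.index(word)            # raises ValueError if word is absent
    return {'type': 'FIXED_RANGE',
            'startIndex': start,
            'endIndex': start + len(word)}

for word in ('Bold', 'Ital', 'Mono'):
    print(word, word_range(TEXT, word))
```

The ranges produced this way match the hardcoded ones in the requests above (0 to 4, 7 to 11, and 36 to 40).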

After this we have a text style request to alter the font of the first word ("Mono") in the last paragraph ("Mono 3") to Courier New. The only other difference is that the style value is now a font name rather than a Boolean flag, with 'fields' set to 'fontFamily' to match. Finally, bulletize all paragraphs. Another call to SLIDES.presentations().batchUpdate() and we're done.

Conclusion

If you run the script, you should get output that looks something like this, with each print() representing execution of key parts of the application:
$ python3 slides_format_text.py 
** Create new slide deck
** Create "main point" layout slide & add titles
** Fetch "main point" slide title (textbox) ID
** Insert text & perform various formatting operations
DONE
When the script has completed, you should have a new presentation with these slides:




Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!)—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/presentations',
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
SLIDES = discovery.build('slides', 'v1', http=creds.authorize(Http()))

print('** Create new slide deck')
DATA = {'title': 'Slides text formatting DEMO'}
rsp = SLIDES.presentations().create(body=DATA).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0]
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']

print('** Create "main point" layout slide & add titles')
reqs = [
    {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'MAIN_POINT'}}},
    {'insertText': {'objectId': titleID, 'text': 'Formatting text'}},
    {'insertText': {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
slideID = rsp[0]['createSlide']['objectId']

print('** Fetch "main point" slide title (textbox) ID')
rsp = SLIDES.presentations().pages().get(presentationId=deckID,
        pageObjectId=slideID).execute().get('pageElements')
textboxID = rsp[0]['objectId']

print('** Insert text & perform various formatting operations')
reqs = [
    # add 7 paragraphs
    {'insertText': {
        'text': 'Bold 1\nItal 2\n\tfoo\n\tbar\n\t\tbaz\n\t\tqux\nMono 3',
        'objectId': textboxID,
    }},
    # shrink text from 48pt ("main point" textbox default) to 32pt
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'fontSize': {'magnitude': 32, 'unit': 'PT'}},
        'textRange': {'type': 'ALL'},
        'fields': 'fontSize',
    }},
    # change word 1 in para 1 ("Bold") to bold
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'bold': True},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 0, 'endIndex': 4},
        'fields': 'bold',
    }},
    # change word 1 in para 2 ("Ital") to italics
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'italic': True},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 7, 'endIndex': 11},
        'fields': 'italic'
    }},
    # change word 1 in para 7 ("Mono") to Courier New
    {'updateTextStyle': {
        'objectId': textboxID,
        'style': {'fontFamily': 'Courier New'},
        'textRange': {'type': 'FIXED_RANGE', 'startIndex': 36, 'endIndex': 40},
        'fields': 'fontFamily'
    }},
    # bulletize everything
    {'createParagraphBullets': {
        'objectId': textboxID,
        'textRange': {'type': 'ALL'},
    }},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API or integrate it into other apps for your own needs, whether a mobile frontend, sysadmin script, or server-side backend!

Tuesday, December 6, 2016

Modifying email signatures with the Gmail API

NOTE: The content here is also available as a video and overview post, part of this series.

UPDATE (Feb 2017): Tweaked the code sample as the isPrimary flag may be missing from non-primary aliases; also added link above to video.

Introduction

In a previous post, I introduced Python developers to the Gmail API with a tutorial on how to search for threads with a minimum number of messages. Today, we'll explore another part of the API, covering the settings endpoints that were added in mid-2016. What's the big deal? Well, you couldn't use the API to read or modify user settings before, and now you can!

One example all of us can relate to is your personal email signature. Wouldn't it be great if we could modify it programmatically, say to include some recent news about you (perhaps a Tweet or other social post), or maybe some random witty quote? You could then automate it to change once a quarter, or even hourly if you like being truly random!

Using the Gmail API

Our simple Python script won't be sending email or reading user messages, so the only authorization scope needed is the one that accesses basic user settings (there's another for more sensitive user settings):
  • https://www.googleapis.com/auth/gmail.settings.basic — Manage basic Gmail user settings
See the documentation for a list of all Gmail API scopes and what each of them means. Since we've fully covered the authorization boilerplate in earlier posts and videos, including how to connect to the Gmail API, we're going to skip that here and jump right to the action. You can copy the boilerplate from other scripts you've written. Regardless, be sure to create a service endpoint to the API:

GMAIL = discovery.build('gmail', 'v1',
    http=creds.authorize(Http()))


What are "sendAs" email addresses?

First, a quick word about "sendAs" email addresses. Gmail lets you send email from addresses other than your actual Gmail address (considered your primary address). This lets you manage multiple accounts from the same Gmail user interface. (As expected, you need to own or otherwise have access to the alternate email addresses in order to do this.) However, most people only use their primary address, so you may not know about it. You can learn more about sendAs addresses here and here.

Now you may be tempted to use the term "alias," especially because that word was mentioned in those Help pages you just looked at right? However for now, I'd recommend trying to avoid that terminology as it refers to something else in a G Suite/Google Apps context. Can't you see how we already got distracted from the core reason for this post? See, you almost forgot about email signatures already, right? If you stick with "sender addresses" or "sendAs email addresses," there won’t be any confusion.

Using a "Quote of the Day" in your email signature

The Python script we're exploring in this post sets a "Quote of the Day" (or "QotD" for short) as the signature of your primary sendAs address. Where does the QotD come from? Well, it can be as simple (and boring) as this function that returns a hardcoded string:
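A minimal sketch of such a function, assuming it returns the same string that appears in the sample output later in this post (the exact wording and spacing are taken from that output, not from the original snippet):

```python
def qotd():
    """Return a hardcoded quote-and-author string for use as a signature."""
    return '"I heart cats."  ~anonymous'

print(qotd())
```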



Cute but not very random right? A better idea is to choose from a number of quotes you have in a relational database w/columns for quotes & authors. Here’s some sample code for data in a SQLite database:
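As a rough illustration of the database approach, here is a sketch using Python's built-in sqlite3 module. Everything specific in it is an assumption: the table name quotes, its quote and author columns, and the two sample rows are made up, and an in-memory database stands in for a real file.

```python
import random
import sqlite3

def qotd(conn):
    """Fetch every row, then pick one (quote, author) pair at random."""
    rows = conn.execute('SELECT quote, author FROM quotes').fetchall()
    quote, author = random.choice(rows)
    return '"%s"  ~%s' % (quote, author)

# Demo with an in-memory database (hypothetical schema and data)
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE quotes (quote TEXT, author TEXT)')
conn.executemany('INSERT INTO quotes VALUES (?, ?)', [
    ('I heart cats.', 'anonymous'),
    ('I heart dogs.', 'anonymous'),
])
print(qotd(conn))
```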



More random, which is cool, but this particular snippet isn't efficient because we’re selecting all rows and then choosing a quote randomly. Obviously there's a better way if a database is your data source. I prefer using a web service instead, coming in the form of a REST API. The code snippet here does just that:



You only need to find a quote-of-the-day service that returns a JSON payload and provide its URL on line 8. Obviously you'll need a bit more scaffolding if this were a real service, but in this pseudocode example, you can assume that using urllib.{,request.}urlopen() works where the service sends back an empty string upon failure. To play it safe, this snippet falls back to the hardcoded string we saw earlier if the service doesn't return a quote, which comes back as a 2-tuple representing quote and author, respectively.
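The behavior described above could be sketched roughly as follows. This is an illustration under stated assumptions, not the post's original snippet: the URL is a placeholder, the JSON keys quote and author are hypothetical and depend entirely on the service you choose, and the opener parameter exists only so the function can be exercised without a network connection.

```python
import json
try:
    from urllib.request import urlopen   # Python 3
except ImportError:
    from urllib import urlopen           # Python 2

QOTD_URL = 'https://example.com/qotd.json'   # hypothetical; substitute a real service
FALLBACK = ('I heart cats.', 'anonymous')    # hardcoded (quote, author) 2-tuple

def get_quote(url=QOTD_URL, opener=urlopen):
    """Return a (quote, author) 2-tuple from the service, or FALLBACK on any failure."""
    try:
        payload = opener(url).read()
        # An empty response signals failure; treat it like a missing quote
        data = json.loads(payload.decode('utf-8')) if payload else {}
        return data['quote'], data['author']
    except Exception:
        return FALLBACK

def qotd():
    """Format the quote and author as a signature string."""
    return '"%s"  ~%s' % get_quote()
```

Catching every exception is deliberately blunt here; a real script would likely want to distinguish network errors from malformed payloads.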

Setting your new email signature

Now that we're clear on the source for the QotD, we can focus on actually setting it as your new email signature. To do that, we need to get all of your sender (sendAs email) addresses. The goal is to change only your primary address (and none of the others if you have any):
addresses = GMAIL.users().settings().sendAs().list(userId='me',
    fields='sendAs(isPrimary,sendAsEmail)').execute().get('sendAs')
As in our other Gmail example, a userId of 'me' indicates the currently-authenticated user. The API will return a number of attributes. If we know exactly which ones we want, we can specify them with the fields parameter to limit the size of the return payload, which may otherwise contribute to overall latency. In our case, we're requesting just the sendAs.isPrimary flag and sendAs.sendAsEmail, the actual email address string of the sender addresses. What's returned is a Python list consisting of all of your sendAs email addresses, which we cycle through to find the primary address:
for address in addresses:
    if address.get('isPrimary'):
        break
One of your sender addresses must be primary, so unless there's a bug in Gmail, when control of the for loop concludes, address will point to your primary sender address. Now all you have to do is set the signature and confirm to the user:
rsp = GMAIL.users().settings().sendAs().patch(userId='me',
        sendAsEmail=address['sendAsEmail'], body=DATA).execute()
print("Signature changed to '%s'" % rsp['signature'])
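The primary-address lookup can also be written so that a missing primary is detected explicitly instead of falling through to the last address in the list. This is an alternative sketch, not the script's own code: primary_address is a hypothetical helper and the sample addresses are made up.

```python
def primary_address(addresses):
    """Return the first sendAs entry flagged isPrimary, or None if there is none."""
    return next((a for a in addresses if a.get('isPrimary')), None)

# Sample data shaped like the 'sendAs' list (addresses are made up)
sample = [
    {'sendAsEmail': 'alt@example.com'},                     # isPrimary may be absent
    {'sendAsEmail': 'me@example.com', 'isPrimary': True},
]
address = primary_address(sample)
print(address['sendAsEmail'])
```

Using .get('isPrimary') matters for the same reason as in the loop: the flag may be missing from non-primary entries.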
If you only have one sender address, there's no need to request all the addresses and loop through them looking for the primary address as we did above. In such circumstances, that entire request and loop are extraneous... just pass your email address as the sendAsEmail argument, like this:
rsp = GMAIL.users().settings().sendAs().patch(userId='me',
        sendAsEmail=YOUR_EMAIL_ADDR_HERE, body=DATA).execute()

Conclusion

That's all there is... just 26 lines of code. If we use the static string qotd() function above, your output when running this script will look like this:
$ python gmail_change_sig.py # or python3
Signature changed to '"I heart cats."  ~anonymous'
$
Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). By using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

import qotd
DATA = {'signature': qotd.qotd()}   # quote source up-to-you!

SCOPES = 'https://www.googleapis.com/auth/gmail.settings.basic'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
GMAIL = discovery.build('gmail', 'v1', http=creds.authorize(Http()))
# this entire block optional if you only have one sender address
addresses = GMAIL.users().settings().sendAs().list(userId='me',
        fields='sendAs(isPrimary,sendAsEmail)').execute().get('sendAs')
for address in addresses:
    if address.get('isPrimary'):
        break
rsp = GMAIL.users().settings().sendAs().patch(userId='me',
        sendAsEmail=address['sendAsEmail'], body=DATA).execute()
print("Signature changed to '%s'" % rsp['signature'])
As with our other code samples, you can now customize it for your own needs, whether a mobile frontend, sysadmin script, or server-side backend, perhaps accessing other Google APIs.

Code challenge

Want to exercise your newfound knowledge of using the Gmail API's settings endpoints? Write a script that uses the API to manage filters or configure a vacation responder. HINT: take a look at the official Gmail API docs, including the pages specific to filters and vacation settings.

Tuesday, November 29, 2016

Generating slides from spreadsheet data

NOTE: The code covered in this post is also available in a video walkthrough.


Introduction

A common use case when you have data in a spreadsheet or database is finding ways to make that data more visually appealing to others. This is the subject of today's post, where we'll walk through a simple Python script that generates presentation slides based on data in a spreadsheet using both the Google Sheets and Slides APIs.

Specifically, we'll take all spreadsheet cells containing values and create an equivalent table on a slide with that data. The Sheet also features a pre-generated pie chart added from the Explore in Google Sheets feature that we'll import into a blank slide. Not only do we do that, but if the data in the Sheet is updated (meaning the chart is as well), then so can the imported chart image in the presentation. These are just two examples of generating slides from spreadsheet data. The example Sheet we're getting the data from for this script looks like this:


The data in this Sheet originates from the Google Sheets API codelab. In the codelab, this data lives in a SQLite relational database, and in the previous post covering how to migrate SQL data to Google Sheets, we "imported" that data into the Sheet we're using. As mentioned before, the pie chart comes from the Explore feature.

Using the Google Sheets & Slides APIs

The scopes needed for this application are the read-only scope for Sheets (to read the cell contents and the pie chart) and the read-write scope for Slides since we're creating a new presentation:
  • 'https://www.googleapis.com/auth/spreadsheets.readonly' — Read-only access to Google Sheets and properties
  • 'https://www.googleapis.com/auth/presentations' — Read-write access to Slides and Slides presentation properties
If you're new to using Google APIs, we recommend reviewing earlier posts & videos covering how to set up projects and the authorization boilerplate so that we can focus on the main app. Once we've authorized our app, two service endpoints are created, one for each API. The one for Sheets is saved to the SHEETS variable while the one for Slides goes to SLIDES.

Start with Sheets

The first thing to do is to grab all the data we need from the Google Sheet using the Sheets API. You can either supply your own Sheet with your own chart, or you can run the script from the earlier post mentioned above to create a Sheet identical to the one shown. In either case, you need to provide the Sheet ID to read from, which is saved to the sheetID variable. Using its ID, we call spreadsheets().values().get() to pull out all the cells (as rows & columns) from the Sheet and save them to orders:
sheetID = '. . .'   # use your own!
orders = SHEETS.spreadsheets().values().get(range='Sheet1',
        spreadsheetId=sheetID).execute().get('values')
The next step is to call spreadsheets().get() to get all the sheets in the Sheet; there's only one, so grab it at index 0. Since this sheet only has one chart, we also use index 0 to get that:
sheet = SHEETS.spreadsheets().get(spreadsheetId=sheetID,
        ranges=['Sheet1']).execute().get('sheets')[0]
chartID = sheet['charts'][0]['chartId']
That's it for Sheets. Everything from here on out takes place in Slides.

Create new Slides presentation

A new slide deck can be created with SLIDES.presentations().create()—or alternatively with the Google Drive API which we won't do here. We'll name it, "Generating slides from spreadsheet data DEMO" and save its (new) ID along with the IDs of the title and subtitle textboxes on the (one) title slide created in the new deck:
DATA = {'title': 'Generating slides from spreadsheet data DEMO'}
rsp = SLIDES.presentations().create(body=DATA).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0]
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']

Create slides for table & chart

A mere title slide doesn't suffice as we need a place for the cell data as well as the pie chart, so we'll create slides for each. While we're at it, we might as well fill in the text for the presentation title and subtitle. These requests are self-explanatory as you can see below in the reqs variable. The SLIDES.presentations().batchUpdate() method is then used to send the four commands to the API. Upon return, save the IDs for both the cell table slide as well as the blank slide for the chart:
reqs = [
  {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'TITLE_ONLY'}}},
  {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'BLANK'}}},
  {'insertText': {'objectId': titleID,    'text': 'Importing Sheets data'}},
  {'insertText': {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
tableSlideID = rsp[0]['createSlide']['objectId']
chartSlideID = rsp[1]['createSlide']['objectId']
Note the order of the requests. The create-slide requests come first followed by the text inserts. Responses that come back from the API are returned in the same order as they were sent, hence why the cell table slide ID comes back first (index 0) followed by the chart slide ID (index 1). The text inserts don't have any meaningful return values and are thus ignored.

Filling out the table slide

Now let's focus on the table slide. There are two things we need to accomplish. In the previous set of requests, we asked the API to create a "title only" slide, meaning there's (only) a textbox for the slide title. The next snippet of code gets all the page elements on that slide so we can get the ID of that textbox, the only thing on that page:
rsp = SLIDES.presentations().pages().get(presentationId=deckID,
        pageObjectId=tableSlideID).execute().get('pageElements')
textboxID = rsp[0]['objectId'] 
On this slide, we need to add the cell table for the Sheet data, so a create-table request takes care of that. The required elements in such a call include the ID of the slide the table should go on as well as the total number of rows and columns desired. Fortunately, all of that is available from tableSlideID and orders saved earlier. Oh, and add a title for this table slide too. Here's the code:
reqs = [
    {'createTable': {
        'elementProperties': {'pageObjectId': tableSlideID},
        'rows': len(orders),
        'columns': len(orders[0])},
    },
    {'insertText': {'objectId': textboxID, 'text': 'Toy orders'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
tableID = rsp[0]['createTable']['objectId']
Another call to SLIDES.presentations().batchUpdate() and we're done, saving the ID of the newly-created table. Next, we'll fill in each cell of that table.

Populate table & add chart image

The first set of requests needed now fills in each cell of the table. The most compact way to issue these requests is with a double-for loop list comprehension. The first loops over the rows while the second loops through each column (of each row). Magically, this creates all the text insert requests needed.
reqs = [
    {'insertText': {
        'objectId': tableID,
        'cellLocation': {'rowIndex': i, 'columnIndex': j},
        'text': str(data),
    }} for i, order in enumerate(orders) for j, data in enumerate(order)]
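If the double-for comprehension reads as too magical, it expands to two ordinary nested loops. The sketch below shows that expansion on stand-in data: the two-row orders list and the tableID placeholder are made up for illustration and are not the values from the Sheet.

```python
orders = [['Name', 'Toy'], ['Ann', 'Robot']]   # made-up stand-in for the Sheet values
tableID = 'table_id_here'                       # placeholder, not a real object ID

# Same requests as the comprehension, built with explicit nested loops
reqs = []
for i, order in enumerate(orders):          # i is the row index
    for j, data in enumerate(order):        # j is the column index within that row
        reqs.append({'insertText': {
            'objectId': tableID,
            'cellLocation': {'rowIndex': i, 'columnIndex': j},
            'text': str(data),
        }})

print(len(reqs))   # one request per cell
```

The loops run in the same order as the comprehension's for clauses, left to right, so both forms produce identical request lists.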
The final request "imports" the chart from the Sheet onto the blank slide whose ID we saved earlier. Note, while the dimensions below seem completely arbitrary, be assured we're using the same size & transform as a blank rectangle we drew on the slide earlier (and read those values from). The alternative would be to use math to come up with your object dimensions. Here is the code we're talking about, followed by the actual call to the API:
reqs.append({'createSheetsChart': {
    'spreadsheetId': sheetID,
    'chartId': chartID,
    'linkingMode': 'LINKED',
    'elementProperties': {
        'pageObjectId': chartSlideID,
        'size': {
            'height': {'magnitude': 7075, 'unit': 'EMU'},
            'width':  {'magnitude': 11450, 'unit': 'EMU'}
        },
        'transform': {
            'scaleX': 696.6157,
            'scaleY': 601.3921,
            'translateX': 583875.04,
            'translateY': 444327.135,
            'unit': 'EMU',
        },
    },
}})
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute()
Once all the requests have been created, send them to the Slides API and we're done. (In the actual app, you'll see we've sprinkled in various print() calls to let the user know which steps are being executed.)
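To get a feel for the "math" alternative, the size and transform values can be combined to estimate how large the chart appears on the slide. This is a simplified sketch: it assumes the transform has no shear or rotation, so the rendered width and height are just the base size multiplied by the scale factors, and it relies on there being 914,400 EMU per inch. The helper name rendered_inches is hypothetical.

```python
EMU_PER_INCH = 914400

def rendered_inches(size, transform):
    """Estimate rendered (width, height) in inches, ignoring shear and rotation."""
    width = size['width']['magnitude'] * transform['scaleX']
    height = size['height']['magnitude'] * transform['scaleY']
    return width / EMU_PER_INCH, height / EMU_PER_INCH

# Values copied from the createSheetsChart request above
size = {'height': {'magnitude': 7075, 'unit': 'EMU'},
        'width':  {'magnitude': 11450, 'unit': 'EMU'}}
transform = {'scaleX': 696.6157, 'scaleY': 601.3921,
             'translateX': 583875.04, 'translateY': 444327.135, 'unit': 'EMU'}

w, h = rendered_inches(size, transform)
print('%.2f x %.2f inches' % (w, h))
```

Working backward from a desired size in inches to a scale factor is the same arithmetic in reverse.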

Conclusion

The entire script clocks in at just under 100 lines of code... see below. If you run it, you should get output that looks something like this:
$ python3 slides_table_chart.py
** Fetch Sheets data
** Fetch chart info from Sheets
** Create new slide deck
** Create 2 slides & insert slide deck title+subtitle
** Fetch table slide title (textbox) ID
** Create table & insert table slide title
** Fill table cells & create linked chart to Sheets
DONE
When the script has completed, you should have a new presentation with these 3 slides:




Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into major sections, they would be represented by each of the print() calls above. Here's the complete script—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = (
    'https://www.googleapis.com/auth/spreadsheets.readonly',
    'https://www.googleapis.com/auth/presentations',
)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
HTTP = creds.authorize(Http())
SHEETS = discovery.build('sheets', 'v4', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

print('** Fetch Sheets data')
sheetID = '. . .'   # use your own!
orders = SHEETS.spreadsheets().values().get(range='Sheet1',
        spreadsheetId=sheetID).execute().get('values')

print('** Fetch chart info from Sheets')
sheet = SHEETS.spreadsheets().get(spreadsheetId=sheetID,
        ranges=['Sheet1']).execute().get('sheets')[0]
chartID = sheet['charts'][0]['chartId']

print('** Create new slide deck')
DATA = {'title': 'Generating slides from spreadsheet data DEMO'}
rsp = SLIDES.presentations().create(body=DATA).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0]
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']

print('** Create 2 slides & insert slide deck title+subtitle')
reqs = [
  {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'TITLE_ONLY'}}},
  {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'BLANK'}}},
  {'insertText': {'objectId': titleID,    'text': 'Importing Sheets data'}},
  {'insertText': {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
tableSlideID = rsp[0]['createSlide']['objectId']
chartSlideID = rsp[1]['createSlide']['objectId']

print('** Fetch table slide title (textbox) ID')
rsp = SLIDES.presentations().pages().get(presentationId=deckID,
        pageObjectId=tableSlideID).execute().get('pageElements')
textboxID = rsp[0]['objectId']

print('** Create table & insert table slide title')
reqs = [
    {'createTable': {
        'elementProperties': {'pageObjectId': tableSlideID},
        'rows': len(orders),
        'columns': len(orders[0])},
    },
    {'insertText': {'objectId': textboxID, 'text': 'Toy orders'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
tableID = rsp[0]['createTable']['objectId']

print('** Fill table cells & create linked chart to Sheets')
reqs = [
    {'insertText': {
        'objectId': tableID,
        'cellLocation': {'rowIndex': i, 'columnIndex': j},
        'text': str(data),
    }} for i, order in enumerate(orders) for j, data in enumerate(order)]

reqs.append({'createSheetsChart': {
    'spreadsheetId': sheetID,
    'chartId': chartID,
    'linkingMode': 'LINKED',
    'elementProperties': {
        'pageObjectId': chartSlideID,
        'size': {
            'height': {'magnitude': 7075, 'unit': 'EMU'},
            'width':  {'magnitude': 11450, 'unit': 'EMU'}
        },
        'transform': {
            'scaleX': 696.6157,
            'scaleY': 601.3921,
            'translateX': 583875.04,
            'translateY': 444327.135,
            'unit': 'EMU',
        },
    },
}})
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API, integrate into other apps for your own needs, for a mobile frontend, sysadmin script, or a server-side backend!

Code challenge

Given the knowledge you picked up from this post and its code sample, augment the script with another call to the Sheets API that updates the number of toys ordered by one of the customers, then add the corresponding call to the Slides API that refreshes the linked image based on the changes made to the Sheet (and chart). EXTRA CREDIT: Use the Google Drive API to monitor the Sheet so that any updates to toy orders will result in an "automagic" update of the chart image in the Slides presentation.

Wednesday, November 9, 2016

Replacing text & images with the Google Slides API with Python

NOTE: The code covered in this post is also available in a video walkthrough; however, the code here differs slightly, featuring some minor improvements over the code in the video.

Introduction

One of the critical things developers have not been able to do previously was access Google Slides presentations programmatically. To address this "shortfall," the Slides team pre-announced their first API a few months ago at Google I/O 2016—also see full announcement video (40+ mins). In early November, the G Suite product team officially launched the API, finally giving all developers access to build or edit Slides presentations from their applications.

In this post, I'll walk through a simple example featuring an existing Slides presentation template with a single slide. On this slide are placeholders for a presentation name and company logo, as illustrated below:

One of the obvious use cases that will come to mind is to take a presentation template replete with "variables" and placeholders, and auto-generate decks from the same source but created with different data for different customers. For example, here's what a "completed" slide would look like after the proxies have been replaced with "real data:"

Using the Google Slides API

We need to edit (write to) a Google Slides presentation, which means using the read-write scope from the list of relevant scopes below:
  • 'https://www.googleapis.com/auth/presentations' — Read-write access to Slides and Slides presentation properties
  • 'https://www.googleapis.com/auth/presentations.readonly' — View-only access to Slides presentations and properties
  • 'https://www.googleapis.com/auth/drive' — Full access to users' files on Google Drive
Why is the Google Drive API scope listed above? Well, think of it this way: APIs like the Google Sheets and Slides APIs were created to perform spreadsheet and presentation operations. However, importing/exporting, copying, and sharing are all file-based operations, and that's where the Drive API fits in. If you need a review of its scopes, check out the Drive auth scopes page in the docs. Copying a file requires the full Drive API scope, which is why it's listed above. If you're not going to copy any files and are only performing actions with the Slides API, you can of course leave it out.

Since we've fully covered the authorization boilerplate in earlier posts and videos, we're going to skip that here and jump right to the action.

Getting started

What are we doing in today's code sample? We start with a slide template file that has "variables" or placeholders for a title and an image. The application code will then replace these proxies with the actual desired text and image, with the goal being that this scaffolding will allow you to automatically generate multiple slide decks but "tweaked" with "real" data that gets substituted into each slide deck.

The title slide template file is TMPLFILE, and the image we're using as the company logo is the Google Slides product icon, stored in my Google Drive under the filename held in the IMG_FILE variable. Be sure to use your own image and template files! These definitions plus the scopes to be used in this script are defined like this:
IMG_FILE = 'google-slides.png'     # use your own!
TMPLFILE = 'title slide template'  # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
Skipping past most of the OAuth2 boilerplate, let's move ahead to creating the API service endpoints. The Drive API name is (of course) 'drive', currently on 'v3', while the Slides API is 'slides' and 'v1' in the following call to create a signed HTTP client that's shared with a pair of calls to the apiclient.discovery.build() function to create the API service endpoints:
HTTP = creds.authorize(Http())
DRIVE =  discovery.build('drive',  'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

Copy template file

The first step of the "real" app is to find and copy the template file TMPLFILE. To do this, we'll use DRIVE.files().list() to query for the file, then grab the first match found. Then we'll use DRIVE.files().copy() to copy the file and name it 'Google Slides API template DEMO':
rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')
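One caveat with building the query through string interpolation: a file name that itself contains a single quote would break the q expression. Here is a minimal sketch of one way around that, assuming the Drive query syntax's backslash-escaping of single quotes. The helper name name_query() is hypothetical and not part of the original script:

```python
def name_query(filename):
    """Build a Drive API `q` string that matches an exact file name.

    Hypothetical helper: backslashes and single quotes in the name are
    escaped with a backslash so the name stays one quoted string literal.
    """
    escaped = filename.replace('\\', '\\\\').replace("'", "\\'")
    return "name='%s'" % escaped


print(name_query('title slide template'))   # name='title slide template'
print(name_query("Q4 'final' deck"))        # quotes inside the name are escaped
```

The result could then be passed as the q argument in place of "name='%s'" % TMPLFILE.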

Find image placeholder

Next, we'll ask the Slides API to get the data on the first (and only) slide in the deck. Specifically, we want the dimensions of the image placeholder. Later on, we will use those properties when replacing it with the company logo, so that it will be automatically resized and centered into the same spot as the image placeholder.
The SLIDES.presentations().get() method is used to read the presentation metadata. Returned is a payload consisting of everything in the presentation: the masters, layouts, and of course, the slides themselves. We only care about the slides, so we get that from the payload. And since there's only one slide, we grab it at index 0. Once we have the slide, we loop through all of the elements on that page and stop when we find the rectangle (image placeholder):
print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID
       ).execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break
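The loop above relies on every page element having a 'shape' key and on a rectangle actually being present; with a different template, it could raise a KeyError or leave obj pointing at the last element. Below is a slightly more defensive sketch of the same search. The helper name find_shape() and the sample elements are made up for illustration:

```python
def find_shape(page_elements, shape_type='RECTANGLE'):
    """Return the first page element whose shape type matches, else None."""
    for element in page_elements:
        # Elements such as images or tables have no 'shape' key, so use .get().
        if element.get('shape', {}).get('shapeType') == shape_type:
            return element
    return None


# Made-up page elements standing in for slide['pageElements'].
elements = [
    {'objectId': 'img1', 'image': {}},
    {'objectId': 'title', 'shape': {'shapeType': 'TEXT_BOX'}},
    {'objectId': 'box1', 'shape': {'shapeType': 'RECTANGLE'}},
]
placeholder = find_shape(elements)
print(placeholder['objectId'])  # box1
```

A None result is then an explicit signal that the template has no rectangle placeholder.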

Find image file

At this point, the obj variable points to that rectangle. What are we going to replace it with? The company logo, which we now query for using the Drive API:
print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
        DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token) 
The query code is similar to when we searched for the template file earlier. The trickiest thing about this snippet is that we need a full URL that points directly to the company logo. We use the DRIVE.files().get_media() method to create that request but don't execute it. Instead, we dig inside the request object itself and grab the file's URI and merge it with the current access token so what we're left with is a valid URL that the Slides API can use to read the image file and create it in the presentation.
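Note that the '&' in the format string only works if the request URI already carries a query string, which is what a media download request is expected to have (typically ending in ?alt=media). A small sketch that does not depend on that assumption follows; the helper name with_access_token() and the sample values are placeholders, not real credentials or file IDs:

```python
def with_access_token(uri, token):
    """Append an access_token query parameter to a download URI."""
    # Use '&' if the URI already has a query string, '?' otherwise.
    sep = '&' if '?' in uri else '?'
    return '%s%saccess_token=%s' % (uri, sep, token)


# Placeholder values for illustration only.
uri = 'https://www.googleapis.com/drive/v3/files/FILE_ID?alt=media'
print(with_access_token(uri, 'TOKEN'))
```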

Replace text and image

Back to the Slides API for the final steps: replace the title (text variable) with the desired text, add the company logo with the same size and transform as the image placeholder, and delete the image placeholder as it's no longer needed:
print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')
Once all the requests have been created, send them to the Slides API then let the user know everything is done.
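Since the stated goal is to generate multiple decks from the same template with different data, it may help to wrap the request construction in a function. This is only a sketch: the function name build_requests() and the sample placeholder values are assumptions, while the three request types are the same ones used above:

```python
def build_requests(page_id, placeholder, img_url, title):
    """Return the three batchUpdate requests used above, in order."""
    return [
        # 1. Swap the {{NAME}} text variable for the real title.
        {'replaceAllText': {
            'containsText': {'text': '{{NAME}}'},
            'replaceText': title,
        }},
        # 2. Create the image with the placeholder's size and transform.
        {'createImage': {
            'url': img_url,
            'elementProperties': {
                'pageObjectId': page_id,
                'size': placeholder['size'],
                'transform': placeholder['transform'],
            },
        }},
        # 3. Remove the placeholder rectangle.
        {'deleteObject': {'objectId': placeholder['objectId']}},
    ]


# Made-up placeholder element standing in for the rectangle found earlier.
sample = {
    'objectId': 'box1',
    'size': {'width': {'magnitude': 3000000, 'unit': 'EMU'},
             'height': {'magnitude': 3000000, 'unit': 'EMU'}},
    'transform': {'scaleX': 1, 'scaleY': 1,
                  'translateX': 0, 'translateY': 0, 'unit': 'EMU'},
}
reqs = build_requests('page1', sample, 'https://example.com/logo.png', 'Acme Corp')
print([list(req)[0] for req in reqs])
```

Each customer's deck would then need only its own title and image URL.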

Conclusion

That's the entire script, just under 60 lines of code. If you watched the video, you may notice a few minor differences in the code. One is use of the fields parameter in the Slides API calls. They represent the use of field masks, which is a separate topic on its own. As you're learning the API now, it may cause unnecessary confusion, so it's okay to disregard them for now. The other difference is an improvement in the replaceAllText request—the old way in the video is now deprecated, so go with what we've replaced it with in this post.

If your template slide deck and image are in your Google Drive, and you've modified the filenames and run the script, you should get output that looks something like this:
$ python3 slides_template.py
** Copying template 'title slide template' as 'Google Slides API template DEMO'
** Get slide objects, search for image placeholder
** Searching for icon file
 - Found image 'google-slides.png'
** Replacing placeholder text and icon
DONE
Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into major sections, they would be:
  • Get creds & build API service endpoints
  • Copy template file
  • Get image placeholder size & transform (for replacement image later)
  • Get secure URL for company logo
  • Build and send Slides API requests to...
    • Replace slide title variable with "Hello World!"
    • Create image with secure URL using placeholder size & transform
    • Delete image placeholder
Here's the complete script—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

IMG_FILE = 'google-slides.png'      # use your own!
TMPLFILE = 'title slide template'   # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
HTTP = creds.authorize(Http())
DRIVE  = discovery.build('drive',  'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')

print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID,
        fields='slides').execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break

print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
        DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token)

print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API, integrate into other apps for your own needs, for a mobile frontend, sysadmin script, or a server-side backend!

Code challenge

Add more slides and/or text variables and modify the script to replace them too. EXTRA CREDIT: Change the image-based image placeholder to a text-based image placeholder, say a textbox with the text, "{{COMPANY_LOGO}}" and use the replaceAllShapesWithImage request to perform the image replacement. By making this one change, your code should be simplified from the image-based image replacement solution we used in this post.

Wednesday, September 28, 2016

Formatting cells in Google Sheets with Python

Introduction

One of the critical things that developers have not been able to do in previous versions of the Google Sheets API is to format cells... that's a big deal! Anyway, the past is the past, and I choose to look ahead. In my earlier post on the Google Sheets API, I introduced Sheets API v4 with a tutorial on how to transfer data from a SQL database to a Sheet. You'd do that primarily to make database data more presentable rather than deliberately switching to a storage mechanism with weaker querying capabilities. At the very end of that post, I challenged readers to try formatting. If you got stuck, confused, or haven't had a chance yet, today's your lucky day. One caveat is that there's more JavaScript in this post than Python... you've been warned!

Using the Google Sheets API

We need to write (formatting) into a Google Sheet, so we need the same scope as last time, read-write:
  • 'https://www.googleapis.com/auth/spreadsheets' — Read-write access to Sheets and Sheet properties
Since we've fully covered the authorization boilerplate in earlier posts and videos, including how to connect to the Sheets API, we're going to skip that here and jump right to the action.

Formatting cells in Google Sheets

The way the API works, in general, is to take one or more commands, and execute them on the Sheet. This comes in the form of individual requests, either to cells, a Sheet, or the entire spreadsheet. A group of requests is organized as a JavaScript array. Each request in the array is represented by a JSON object. Yes, this part of the post may seem like a long exercise in JavaScript, but stay with me here. Continuing... once your array is complete, you send all the requests to the Sheet using the SHEETS.spreadsheets().batchUpdate() command. Here's pseudocode sending 5 commands to the API:
SHEET_ID =  . . .

reqs = {'requests': [
    {'updateSheetProperties':
        . . .
    {'repeatCell':
        . . .
    {'setDataValidation':
        . . .
    {'sortRange':
        . . .
    {'addChart':
        . . .
]}
SHEETS.spreadsheets().batchUpdate(
        spreadsheetId=SHEET_ID, body=reqs).execute()
What we're executing will be similar. The target spreadsheet will be the one you get when you run the code from the previous post, only without the timestamp in its title as it's unnecessary:


Once you've run the earlier script and created a Sheet of your own, be sure to assign it to the SHEET_ID variable. The goal is to send enough formatting commands to arrive at the same spreadsheet but with improved visuals:


Four (4) requests are needed to bring the original Sheet to this state:
  1. Set the top row as "frozen", meaning it doesn't scroll even when the data does
  2. Also bold the first row, as these are the column headers
  3. Format column E as US currency with dollar sign & 2 decimal places
  4. Set data validation for column F, requiring values from a fixed set

Creating Sheets API requests

As mentioned before, each request is represented by a JSON object, cleverly disguised as Python dictionaries in this post, and the entire request array is implemented as a Python list. Let's take a look at what it takes to put together the individual requests:

Frozen rows

Frozen rows is a Sheet property, so in order to change it, users must employ the updateSheetProperties command. Specifically, frozenRowCount is a grid property, meaning the field that must be updated is gridProperties.frozenRowCount, set to 1. Here's the Python dict (that gets converted to a JSON object) representing this request:
{'updateSheetProperties': {
    'properties': {'gridProperties': {'frozenRowCount': 1}},
    'fields': 'gridProperties.frozenRowCount',
}},
The properties attribute specifies what is changing and what the new value is. The fields property serves as an attribute "mask." It's how you specify what to alter and what to leave alone when applying a change. In this case, both the properties and fields attributes refer to the same thing: the frozen row count grid property. If you leave out the fields attribute here, sure the frozen row count would be set but all other grid properties would be undone, not such a good side effect. It's okay if it doesn't make a lot of sense yet... there are more examples coming.
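One way to picture the mask is as a list of paths that decide which parts of the stored properties get overwritten. The sketch below is a purely local model for illustration: it does not call the Sheets API, the real server-side rules may differ, and the helper name apply_mask() and the sample property values are made up:

```python
import copy


def apply_mask(current, update, fields):
    """Return a copy of `current` with only the masked paths overwritten.

    A local illustration of field-mask behavior, not the API's own code.
    """
    result = copy.deepcopy(current)
    for path in fields.split(','):
        keys = path.strip().split('.')
        src, dst = update, result
        # Walk down to the parent of the masked field in both dicts.
        for key in keys[:-1]:
            src = src.get(key, {})
            dst = dst.setdefault(key, {})
        # Overwrite just the masked field with the value from the update.
        dst[keys[-1]] = copy.deepcopy(src.get(keys[-1]))
    return result


# Made-up sheet properties standing in for what the server holds.
current = {'title': 'Toy orders',
           'gridProperties': {'rowCount': 1000, 'frozenRowCount': 0}}
update = {'gridProperties': {'frozenRowCount': 1}}

# Narrow mask: only the frozen row count changes.
print(apply_mask(current, update, 'gridProperties.frozenRowCount'))
# Broad mask: every grid property is replaced, so rowCount is lost.
print(apply_mask(current, update, 'gridProperties'))
```

In this model, the narrow mask leaves rowCount untouched, while the broad one replaces the whole gridProperties object, which is the kind of side effect a precise mask avoids.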

Bold formatting

Text formatting, such as bold or italics, is a cell operation. Since we want to apply this formatting to multiple cells, the correct command is repeatCell. Specifically, what needs to be changed about a cell? A cell's userEnteredFormat.textFormat.bold attribute. This is a simple Boolean value, so we set it to True. The fields masks are as described above... we need to tell the API to explicitly change just the userEnteredFormat.textFormat.bold attribute. Everything else should stay as-is.

The last thing we need is to tell the API which cells in the Sheet should be formatted. For this we have range. This attribute tells the API what Sheet (by ID) and which cells (column and row ranges) in that Sheet to format. Above, you see that we want to bold just one row, row #1. Similarly, there are currently six columns to bold, A-F.

However, like most modern computer systems, the API uses start and end index values beginning with zero, not alphabetic column names or row numbers starting at 1. The range is also exclusive of the end index, meaning it goes up to but does not include the ending row or column. For row 1, this means a starting index of 0 and an ending index of 1. Similarly, columns A-F have start & end index values of 0 and 6, respectively. Visually, here's how you compare traditional row & column values to 0-based index counting:


Here's the dict representing this request:
{'repeatCell': {
    'range': {
        'sheetId': 0,
        'startColumnIndex': 0,
        'endColumnIndex': 6,
        'startRowIndex': 0,
        'endRowIndex': 1
    },
    'cell': {'userEnteredFormat': {'textFormat': {'bold': True}}},
    'fields': 'userEnteredFormat.textFormat.bold',
}},
Before we move on, let's talk about some shortcuts we can make. The ID of the first Sheet created for you is 0. If that's the Sheet you're using, then you can omit passing the Sheet ID. Similarly, the starting row and column indexes default to 0, so you can leave those out too if those are the values to be used. Finally, while an ending column index of 6 works, it won't if more columns are added later. It's best if you just omit the ending index altogether, meaning you want that entire row formatted. All this means that the only thing in the range you need is the ending row index. Instead of the above, your request can be shortened to:
{'repeatCell': {
    'range': {'endRowIndex': 1},
    'cell': {'userEnteredFormat': {'textFormat': {'bold': True}}},
    'fields': 'userEnteredFormat.textFormat.bold',
}},

Range quiz

Okay, now that you know about ranges, take this quiz: assuming the Sheet ID is 0, what are the starting and ending column and row indexes for the four cells highlighted in blue in this Sheet?


If you went with starting and ending column indexes of 3 and 5 and row indexes of 2 and 4, then you're right on the money and ready for more!
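If converting between the two notations by hand gets tedious, a small helper can do it. The sketch below is a hypothetical utility, not something the API provides; it handles only bounded ranges such as A1:F1 and returns the 0-based, end-exclusive indexes described above. The quiz answer corresponds to the range D3:E4:

```python
import re


def col_to_index(letters):
    """Convert column letters to a 0-based index: A -> 0, Z -> 25, AA -> 26."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord('A') + 1)
    return index - 1


def a1_to_range(a1):
    """Convert a bounded A1 range like 'A1:F1' to end-exclusive indexes."""
    match = re.match(r'^([A-Za-z]+)(\d+):([A-Za-z]+)(\d+)$', a1)
    if not match:
        raise ValueError('expected a range like A1:F1, got %r' % a1)
    col1, row1, col2, row2 = match.groups()
    return {
        'startRowIndex': int(row1) - 1,
        'endRowIndex': int(row2),                  # exclusive, so no "- 1"
        'startColumnIndex': col_to_index(col1),
        'endColumnIndex': col_to_index(col2) + 1,  # exclusive end
    }


print(a1_to_range('A1:F1'))  # the bolded header row
print(a1_to_range('D3:E4'))  # the four quiz cells
```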

Currency formatting

Currency formatting is similar to text formatting, only with numbers, meaning that instead of userEnteredFormat.textFormat, you'd be setting a cell's userEnteredFormat.numberFormat attribute. The command is also repeatCell. Clearly the starting and ending column indexes should be 4 and 5 with a starting row index of 1. But just like the cell bolding we did above, there's no need to restrict ourselves to just the 5 rows of data as more may be coming. Yes, it's best to leave off the ending row index so that the rest of the column is formatted. The only thing you need to learn is how to format cells using US currency, but that's pretty easy to do after a quick look at the docs on formatting numbers:
{'repeatCell': {
    'range': {
        'startRowIndex': 1,
        'startColumnIndex': 4,
        'endColumnIndex': 5,
    },
    'cell': {
        'userEnteredFormat': {
            'numberFormat': {
                'type': 'CURRENCY',
                'pattern': '"$"#,##0.00',
            },
        },
    },
    'fields': 'userEnteredFormat.numberFormat',
}}

More on fields

One caveat to our sample app here is that each of the fields masks has only a single value, the one we want to change, but that's not always the case. There may be situations where you want to effect a variety of changes to cells. To see more examples of fields, check out this page in the docs featuring more formatting examples. To learn more about how masking works, check out this page and this one too.

Cell validation

The final formatting request implements cell validation on column F as well as restricting their possible values. The command used here is setDataValidation. The range is similar to that of currency formatting, only for column F, meaning a starting row index of 1, and starting and ending column indexes of 5 and 6, respectively. The rule implements the restriction. Similar to other spreadsheet software, you can restrict cell values in any number of ways, as outlined by the ConditionType documentation page. Ours is to allow for one of three possible values, so the ConditionType is ONE_OF_LIST.

When you restrict cell values, you can choose to allow invalid input but flag it (weak enforcement) or disallow any value outside of what you specify (strict enforcement). If you wish to employ strict enforcement, you need to pass in a strict attribute with a True value. The default is weak enforcement, or False. In either case, users entering invalid values will get a default warning message that the input is not allowed. If you prefer a custom message over the default option, you can pass that to the API as the inputMessage attribute. I prefer the system default and elect not to use it here. Here are the 4 combinations of what shows up when you use or don't use inputMessage with strict and weak enforcement:


No inputMessage (default) + weak enforcement


With inputMessage (my custom msg) + weak enforcement


No inputMessage (default) + strict enforcement


With inputMessage (my custom msg) + strict enforcement

The last attribute you can send is showCustomUi. If the showCustomUi flag is set to True, the Sheets user interface will display a small pulldown menu listing the values accepted by the cell. It's a pretty poor user experience without it (because users won’t know what the available choices are), so I recommend you always use it too. With that, this request looks like this:
{'setDataValidation': {
    'range': {
        'startRowIndex': 1,
        'startColumnIndex': 5,
        'endColumnIndex': 6,
    },
    'rule': {
        'condition': {
            'type': 'ONE_OF_LIST',
            'values': [
                {'userEnteredValue': 'PENDING'},
                {'userEnteredValue': 'SHIPPED'},
                {'userEnteredValue': 'DELIVERED'},
            ]
        },
        #'inputMessage': 'Select PENDING, SHIPPED, or DELIVERED',
        #'strict': True,
        'showCustomUi': True,
    },
}}
Since we're not modifying cell attributes, but instead focusing on validation, you'll notice there's no fields mask in these types of requests.
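If several columns need this kind of restriction, the request can be generated rather than written out by hand. Here is a minimal sketch using the same attributes as the request above; the function name one_of_list_request() and its parameters are assumptions for illustration:

```python
def one_of_list_request(column_index, choices, strict=False, start_row=1):
    """Build a setDataValidation request restricting a column to `choices`."""
    return {'setDataValidation': {
        'range': {
            'startRowIndex': start_row,          # skip the header row
            'startColumnIndex': column_index,
            'endColumnIndex': column_index + 1,  # end index is exclusive
        },
        'rule': {
            'condition': {
                'type': 'ONE_OF_LIST',
                'values': [{'userEnteredValue': c} for c in choices],
            },
            'strict': strict,
            'showCustomUi': True,
        },
    }}


req = one_of_list_request(5, ['PENDING', 'SHIPPED', 'DELIVERED'])
print(req['setDataValidation']['range'])
```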

Running our script

Believe it or not, that's the bulk of this application. With the reqs list of these four requests, the last line of code calls the Sheets API exactly like the pseudocode above. Now you can simply run it:
$ python sheets_cell_format.py # or python3
$
There's no output from this script, so you should only expect that your Sheet will be formatted once it has completed. If you bring up the Sheet in the user interface, you should see the changes happening in near real-time:


Conclusion

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):
from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
    creds = tools.run_flow(flow, store)
SHEETS = discovery.build('sheets', 'v4', http=creds.authorize(Http()))

SHEET_ID = ... # add your Sheet ID here
reqs = {'requests': [
    # frozen row 1
    {'updateSheetProperties': {
        'properties': {'gridProperties': {'frozenRowCount': 1}},
        'fields': 'gridProperties.frozenRowCount',
    }},
    # embolden row 1
    {'repeatCell': {
        'range': {'endRowIndex': 1},
        'cell': {'userEnteredFormat': {'textFormat': {'bold': True}}},
        'fields': 'userEnteredFormat.textFormat.bold',
    }},
    # currency format for column E (E2:E6)
    {'repeatCell': {
        'range': {
            'startRowIndex': 1,
            'endRowIndex': 6,
            'startColumnIndex': 4,
            'endColumnIndex': 5,
        },
        'cell': {
            'userEnteredFormat': {
                'numberFormat': {
                    'type': 'CURRENCY',
                    'pattern': '"$"#,##0.00',
                },
            },
        },
        'fields': 'userEnteredFormat.numberFormat',
    }},
    # validation for column F (F2:F6)
    {'setDataValidation': {
        'range': {
            'startRowIndex': 1,
            'endRowIndex': 6,
            'startColumnIndex': 5,
            'endColumnIndex': 6,
        },
        'rule': {
            'condition': {
                'type': 'ONE_OF_LIST',
                'values': [
                    {'userEnteredValue': 'PENDING'},
                    {'userEnteredValue': 'SHIPPED'},
                    {'userEnteredValue': 'DELIVERED'},
                ]
            },
            #'inputMessage': 'Select PENDING, SHIPPED, or DELIVERED',
            #'strict': True,
            'showCustomUi': True,
        },
    }},
]}

res = SHEETS.spreadsheets().batchUpdate(
        spreadsheetId=SHEET_ID, body=reqs).execute()
As with our other code samples, you can now customize for your own needs, for a mobile frontend, sysadmin script, or a server-side backend, perhaps accessing other Google APIs.

Code challenge

Once you fully grasp this sample and are ready for a challenge: Use the API to create a column "G" with a "Total Cost" header in cell G1, set cell G2 with the formula to calculate the cost based on toys ordered & cost in columns D & E, then create an autoFill request to replicate that formula down column G. When you're done, the right-hand side of your Sheet now looks like this:

Here are some steps you can take to achieve this improvement:
  1. Create column G with a "Total Cost" header in cell G1; make sure it's bold too (or do you have to?)
  2. Set cell G2 with formula =MULTIPLY(D2,E2)
  3. Use autoFill command to copy formula from G2 down the entire column (HINT: you only need the range attribute)
You're now well under way to being able to write useful applications with the Sheets API!

Monday, July 11, 2016

Exporting a Google Sheet spreadsheet as CSV

Introduction

Today, we'll follow up on my earlier post on the Google Sheets API and multiple posts (first, second, third) on the Google Drive API by answering one common question: How do you download a Google Sheets spreadsheet as a CSV file? The "FAQ"ness of the question itself as well as various versions of Google APIs has led to many similar StackOverflow questions: one, two, three, four, five, just to list a few. Let's answer this question definitively and walk through a Python code sample that does exactly that. The main assumption is that you have a Google Sheet file in your Google Drive named "inventory".

Choosing the right API

Upon first glance, developers may think the Google Sheets API is the one to use. Unfortunately that isn't the case. The Sheets API is the one to use for spreadsheet-oriented operations, such as inserting data, reading spreadsheet rows, managing individual tab/sheets within a spreadsheet, cell formatting, creating charts, adding pivot tables, etc. It isn't meant to perform file-based requests like exporting a Sheet in CSV (comma-separated values) format. For file-oriented operations with a Google Sheet, you would use the Google Drive API.

Using the Google Drive API

As mentioned earlier, Google Drive features numerous API scopes of authorization. As usual, we always recommend you use the most restrictive scope possible that allows your app to do its work. You'll request fewer permissions from your users (which makes them happier), and it also makes your app more secure, possibly preventing modifying, destroying, or corrupting data, or perhaps inadvertently going over quotas. Since we're only exporting a Google Sheets file from Google Drive, the only scope we need is:
  • 'https://www.googleapis.com/auth/drive.readonly' — Read-only access to file content or metadata
The earlier post I wrote on the Google Drive API featured sample code that exported an uploaded Google Docs file as PDF and downloaded it from Drive. This post will not only feature a change to exporting a Google Sheets file in CSV format, but also demonstrate one additional feature of the Drive API: querying.

Since we've fully covered the authorization boilerplate in earlier posts and videos, we're going to skip that here and jump right to the action: creating a service endpoint to Drive. The API name is (of course) 'drive', and the current version of the API is 3, so use the string 'v3' in this call to the apiclient.discovery.build() function:

DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

Query and export files from Google Drive

While unnecessary, we'll create a few string constants representing the filename, source and destination file MIME types to make the code easier to understand:
FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'
In this simple example, we're only going to export one Google Sheets file as CSV, arbitrarily choosing a file named, "inventory." So to perform the query, you need both the filename and its MIME type, "application/vnd.google-apps.spreadsheet". Query components are conjoined with the "and" keyword, so your query string will look like this: q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE).

Since there may be more than one Google Sheets file named 'inventory', we opt for the newest one and thus need to sort all matching files in descending order of last modification time, then by name if the "mtime"s are identical, via an "order by" clause: orderBy='modifiedTime desc,name'. Here is the complete call to DRIVE.files().list() to issue the query:
files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])
If any files match, the payload will contain a 'files' key; otherwise we default to an empty list, and the last line of the script tells the user that no files were found. When there is a match, grab the first one (the most recently modified 'inventory' file) and create a suitable CSV filename from its name, changing all spaces to underscores:

fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
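To see what that expression does with different names, here is a small sketch that wraps it in a function and tries a few inputs. The csv_filename() name and the sample file names are made up for illustration:

```python
import os

def csv_filename(drive_name):
    """Derive a local CSV filename from a Drive file name (hypothetical helper)."""
    # Replace spaces first, then drop anything after the last dot
    base = os.path.splitext(drive_name.replace(' ', '_'))[0]
    return '%s.csv' % base

for name in ('inventory', 'toy inventory', 'inventory.2016'):
    print('%s -> %s' % (name, csv_filename(name)))
```

Note that os.path.splitext() treats everything after the last dot as an extension, so a name such as 'inventory.2016' loses its suffix.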

The final Drive API call requests an export of 'inventory' as a CSV file, and if successful, the downloaded data is written with the filename above. In either case, the user is notified of success or failure of the export:
data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
if data:
    with open(fn, 'wb') as f:
        f.write(data)
    print('DONE')
else:
    print('ERROR (could not download file)')
Note that when downloading as CSV, the Drive API only exports the first sheet in a Sheets file... you won't get any others. However, it does support 3 other download formats that will get you all the sheets.

If you create a Sheets file named 'inventory', run the script, and grant it access to your Google Drive (via the OAuth2 prompt that pops up in the browser), you should get output that looks like this:
$ python drive_sheets_csv_export.py # or python3
Exporting "inventory" as "inventory.csv"... DONE

Conclusion

Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into 4 major sections, they would be:
  • Get creds & build Google Drive service endpoint
  • Source and destination file info
  • Query Google Drive for matching files
  • Export most recent matching Sheets file as CSV

Here's the code itself:
from __future__ import print_function
import os

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = 'https://www.googleapis.com/auth/drive.readonly'
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

FILENAME = 'inventory'
SRC_MIMETYPE = 'application/vnd.google-apps.spreadsheet'
DST_MIMETYPE = 'text/csv'

files = DRIVE.files().list(
    q='name="%s" and mimeType="%s"' % (FILENAME, SRC_MIMETYPE),
    orderBy='modifiedTime desc,name').execute().get('files', [])

if files:
    fn = '%s.csv' % os.path.splitext(files[0]['name'].replace(' ', '_'))[0]
    print('Exporting "%s" as "%s"... ' % (files[0]['name'], fn), end='')
    data = DRIVE.files().export(fileId=files[0]['id'], mimeType=DST_MIMETYPE).execute()
    if data:
        with open(fn, 'wb') as f:
            f.write(data)
        print('DONE')
    else:
        print('ERROR (could not download file)')
else:
    print('!!! ERROR: File not found')
As with our other code samples, you can now customize for your own needs, for a mobile frontend, sysadmin script, or a server-side backend, perhaps accessing other Google APIs. Hope this helps answer yet another frequently asked question!

Thursday, June 9, 2016

Migrating SQL data to Google Sheets using the new Google Sheets API

NOTE: The code covered in this post is also available in a video walkthrough.
UPDATE (Sep 2016): Removed use of argparse module & flags (effective as of Feb 2016).

Introduction

In this post, we're going to demonstrate how to use the latest generation Google Sheets API. Launched at Google I/O 2016 (full talk here), the Sheets API v4 can do much more than previous versions, bringing it to near-parity with what you can do with the Google Sheets UI (user interface) on desktop and mobile. Below, I'll walk you through a Python script that reads the rows of a relational database representing customer orders for a toy company and pushes them into a Google Sheet. We'll make two other API calls: one that creates a new Google Sheet and another that reads the rows back from a Sheet.

Earlier posts demonstrated the structure of Google APIs and how to use them in general, so more recent posts, including this one, focus on solutions and the use of specific APIs. Once you've reviewed the earlier material, you're ready to start with authorization scopes and then see how to use the API itself.

    Google Sheets API authorization & scopes

    Previous versions of the Google Sheets API (formerly called the Google Spreadsheets API), were part of a group of "GData APIs" that implemented the Google Data (GData) protocol, an older, less-secure, REST-inspired technology for reading, writing, and modifying information on the web. The new API version falls under the more modern set of Google APIs requiring OAuth2 authorization and whose use is made easier with the Google APIs Client Libraries.

    The current API version features a pair of authorization scopes: read-only and read-write. As usual, we always recommend you use the most restrictive scope possible that allows your app to do its work. You'll request fewer permissions from your users (which makes them happier), and it also makes your app more secure, possibly preventing modifying, destroying, or corrupting data, or perhaps inadvertently going over quotas. Since we're creating a Google Sheet and writing data into it, we must use the read-write scope:
    • 'https://www.googleapis.com/auth/spreadsheets' — Read/write access to Sheets and Sheet properties

    Using the Google Sheets API

    Let's look at some code that reads rows from a SQLite database and creates a Google Sheet with that data. Since we covered the authorization boilerplate fully in earlier posts and videos, we're going straight to creating a Sheets service endpoint. The API string to use is 'sheets' and the version string to use is 'v4' as we call the apiclient.discovery.build() function:

    SHEETS = discovery.build('sheets', 'v4', http=creds.authorize(Http()))

    With the SHEETS service endpoint in hand, the first thing to do is to create a brand new Google Sheet. Before we use it, one thing to know about the Sheets API is that most calls require a JSON payload representing the data & operations you wish to perform, and you'll see this as you become more familiar with it. Creating a new Sheet is pretty simple: you don't have to provide anything, in which case you'd pass in an empty dict as the body. A better bare minimum is a name for the Sheet, so that's what data is for:

    data = {'properties': {'title': 'Toy orders [%s]' % time.ctime()}}

    Notice that a Sheet's "title" is part of its "properties," and we also happen to add the timestamp as part of its name. With the payload complete, we call the API with the command to create a new Sheet [spreadsheets().create()], passing in data in the (eventual) request body:

    res = SHEETS.spreadsheets().create(body=data).execute()

    Alternatively, you can use the Google Drive API (v2 or v3) to create a Sheet but would also need to pass in the Google Sheets (file) MIME type:
    data = {
        'name': 'Toy orders [%s]' % time.ctime(),
        'mimeType': 'application/vnd.google-apps.spreadsheet',
    }
    res = DRIVE.files().create(body=data).execute() # insert() for v2
    
    The general rule-of-thumb is that if you're only working with Sheets, you can do all the operations with its API, but if creating files other than Sheets or performing other Drive file or folder operations, you may want to stick with the Drive API. You can also use both or any other Google APIs for more complex applications. We'll stick with just the Sheets API for now. After creating the Sheet, grab and display some useful information to the user:
    SHEET_ID = res['spreadsheetId']
    print('Created "%s"' % res['properties']['title'])
    
    You may be wondering: Why do I need to create a Sheet and then make a separate API call to add data to it? Why can't I do this all when creating the Sheet? The answer (to this likely FAQ) is you can, but you would need to construct and pass in a JSON payload representing the entire Sheet—meaning all cells and their formatting—a much larger and more complex data structure than just an array of rows. (Don't believe me? Try it yourself!) This is why we have all of the spreadsheets().values() methods... to simplify uploading or downloading of only values to or from a Sheet.

    Now let's turn our attention to the simple SQLite database file (db.sqlite) available from the Google Sheets Node.js codelab. The next block of code just connects to the database with the standard library sqlite3 package, grabs all the rows, adds a header row, and filters out the last two (timestamp) columns:
    FIELDS = ('ID', 'Customer Name', 'Product Code', 'Units Ordered',
            'Unit Price', 'Status', 'Created at', 'Updated at')
    cxn = sqlite3.connect('db.sqlite')
    cur = cxn.cursor()
    rows = cur.execute('SELECT * FROM orders').fetchall()
    cxn.close()
    rows.insert(0, FIELDS)
    data = {'values': [row[:6] for row in rows]}
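
If you don't have db.sqlite handy, the same row-to-payload steps can be tried against an in-memory database. This is only a sketch: the table schema and the sample row below are assumptions for illustration, and the real codelab database may be laid out differently:

```python
import sqlite3

FIELDS = ('ID', 'Customer Name', 'Product Code', 'Units Ordered',
          'Unit Price', 'Status', 'Created at', 'Updated at')

# Hypothetical schema and sample row; the real db.sqlite may differ
cxn = sqlite3.connect(':memory:')
cur = cxn.cursor()
cur.execute('CREATE TABLE orders (id INTEGER, customer TEXT, code TEXT, '
            'units INTEGER, price REAL, status TEXT, created TEXT, updated TEXT)')
cur.execute("INSERT INTO orders VALUES (1, 'Alice''s Antiques', 'FOO-100', "
            "25, 12.5, 'DELIVERED', '2016-05-01', '2016-05-02')")
rows = cur.execute('SELECT * FROM orders').fetchall()
cxn.close()

rows.insert(0, FIELDS)                               # header row goes first
data = {'values': [list(row[:6]) for row in rows]}   # keep only the first 6 columns
print(data['values'])
```

Each row comes back from sqlite3 as a tuple, so slicing with row[:6] is enough to drop the two timestamp columns from the header and the data rows alike.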
    
    When you have a payload (array of row data) you want to stick into a Sheet, you simply pass in those values to spreadsheets().values().update() like we do here:
    SHEETS.spreadsheets().values().update(spreadsheetId=SHEET_ID,
        range='A1', body=data, valueInputOption='RAW').execute()
    
    The call requires a Sheet's ID and command body as expected, but there are two other fields: range, the full (or, as in our case, the "upper left" corner of the) range of cells to write to (in A1 notation), and valueInputOption, which indicates how the data should be interpreted: writing the raw values ("RAW") or interpreting them as if a user were entering them into the UI ("USER_ENTERED"), possibly converting strings & numbers based on the cell formatting.
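
To make the four arguments easier to see side by side, here is a minimal sketch that assembles them as a dict of keyword arguments. The update_request_args() helper, the placeholder Sheet ID, and the sample rows are hypothetical; only the parameter names and the two option strings come from the call described above:

```python
def update_request_args(sheet_id, rows, start='A1', user_entered=False):
    """Build keyword arguments for spreadsheets().values().update().

    Hypothetical helper for illustration; it makes no API call itself.
    """
    return {
        'spreadsheetId': sheet_id,
        'range': start,                                # upper-left corner, A1 notation
        'body': {'values': [list(row) for row in rows]},
        'valueInputOption': 'USER_ENTERED' if user_entered else 'RAW',
    }

args = update_request_args('SHEET_ID_HERE', [('ID', 'Total'), (1, '=2+3')],
                           user_entered=True)
print(args['valueInputOption'])

# With a real service endpoint, the call would then be:
# SHEETS.spreadsheets().values().update(**args).execute()
```

With "RAW", a string such as '=2+3' would be stored as literal text, whereas "USER_ENTERED" asks Sheets to treat it the way it would treat the same text typed into a cell.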

    Reading rows out of a Sheet is even easier, the spreadsheets().values().get() call needing only an ID and a range of cells to read:
    print('Wrote data to Sheet:')
    rows = SHEETS.spreadsheets().values().get(spreadsheetId=SHEET_ID,
        range='Sheet1').execute().get('values', [])
    for row in rows:
        print(row)
    
    The API call returns a dict which has a 'values' key if data is available, otherwise we default to an empty list so the for loop doesn't fail.
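
The effect of that default can be checked without calling the API at all. In this sketch, the two response dicts are hand-written stand-ins for what the call might return, and extract_rows() is a hypothetical name:

```python
def extract_rows(response):
    """Return the rows from a values().get() response, or [] if none (hypothetical helper)."""
    return response.get('values', [])

# Hand-written sample responses, not real API output
with_data = {'range': 'Sheet1!A1:B2', 'majorDimension': 'ROWS',
             'values': [['ID', 'Status'], ['1', 'SHIPPED']]}
empty = {'range': 'Sheet1!A1:Z1000', 'majorDimension': 'ROWS'}

for response in (with_data, empty):
    rows = extract_rows(response)
    print('%d row(s)' % len(rows))
    for row in rows:      # simply does nothing for the empty response
        print(row)
```

Indexing with response['values'] instead would raise a KeyError for the second dict, which is the failure the default avoids.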

    If you run the code (entire script below) and grant it permission to manage your Google Sheets (via the OAuth2 prompt that pops up in the browser), the output you get should look like this:
    $ python3 sheets-toys.py # or python (2.x)
    Created "Toy orders [Thu May 26 18:58:17 2016]" with this data:
    ['ID', 'Customer Name', 'Product Code', 'Units Ordered', 'Unit Price', 'Status']
    ['1', "Alice's Antiques", 'FOO-100', '25', '12.5', 'DELIVERED']
    ['2', "Bob's Brewery", 'FOO-200', '60', '18.75', 'SHIPPED']
    ['3', "Carol's Car Wash", 'FOO-100', '100', '9.25', 'SHIPPED']
    ['4', "David's Dog Grooming", 'FOO-250', '15', '29.95', 'PENDING']
    ['5', "Elizabeth's Eatery", 'FOO-100', '35', '10.95', 'PENDING']
    

    Conclusion

    Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!):

    '''sheets-toys.py -- Google Sheets API demo
        created Jun 2016 by +Wesley Chun/@wescpy
    '''
    from __future__ import print_function
    import sqlite3
    import time
    
    from apiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    SCOPES = 'https://www.googleapis.com/auth/spreadsheets'
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
        creds = tools.run_flow(flow, store)
    SHEETS = discovery.build('sheets', 'v4', http=creds.authorize(Http()))
    
    data = {'properties': {'title': 'Toy orders [%s]' % time.ctime()}}
    res = SHEETS.spreadsheets().create(body=data).execute()
    SHEET_ID = res['spreadsheetId']
    print('Created "%s"' % res['properties']['title'])
    
    FIELDS = ('ID', 'Customer Name', 'Product Code', 'Units Ordered',
            'Unit Price', 'Status', 'Created at', 'Updated at')
    cxn = sqlite3.connect('db.sqlite')
    cur = cxn.cursor()
    rows = cur.execute('SELECT * FROM orders').fetchall()
    cxn.close()
    rows.insert(0, FIELDS)
    data = {'values': [row[:6] for row in rows]}
    
    SHEETS.spreadsheets().values().update(spreadsheetId=SHEET_ID,
        range='A1', body=data, valueInputOption='RAW').execute()
    print('Wrote data to Sheet:')
    rows = SHEETS.spreadsheets().values().get(spreadsheetId=SHEET_ID,
        range='Sheet1').execute().get('values', [])
    for row in rows:
        print(row)
    
    You can now customize this code for your own needs, for a mobile frontend, devops script, or a server-side backend, perhaps accessing other Google APIs. If this example is too complex, check the Python quickstart in the docs, which is way simpler and only reads data out of an existing Sheet. If you know JavaScript and are ready for something more serious, try the Node.js codelab where we got the SQLite database from. That's it... hope you find these code samples useful in helping you get started with the latest Sheets API!

    EXTRA CREDIT: Feel free to experiment and try cell formatting or other API features. Challenge yourself as there's a lot more to Sheets than just reading and writing values!