Tuesday, November 29, 2016

Generating slides from spreadsheet data

NOTE: The code covered in this post are also available in a video walkthrough.


Introduction

A common use case when you have data in a spreadsheet or database, is to find ways of making that data more visually appealing to others. This is the subject of today's post, where we'll walk through a simple Python script that generates presentation slides based on data in a spreadsheet using both the Google Sheets and Slides APIs.

Specifically, we'll take all spreadsheet cells containing values and create an equivalent table on a slide with that data. The Sheet also features a pre-generated pie chart added from the Explore in Google Sheets feature that we'll import into a blank slide. Not only do we do that, but if the data in the Sheet is updated (meaning the chart is as well), then so can the imported chart image in the presentation. These are just two examples of generating slides from spreadsheet data. The example Sheet we're getting the data from for this script looks like this:


The data in this Sheet originates from the Google Sheets API codelab. In the codelab, this data lives in a SQLite relational database, and in the previous post covering how to migrate SQL data to Google Sheets, we "imported" that data into the Sheet we're using. As mentioned before, the pie chart comes from the Explore feature.

Using the Google Sheets & Slides APIs

The scopes needed for this application are the read-only scope for Sheets (to read the cell contents and the pie chart) and the read-write scope for Slides since we're creating a new presentation:
  • 'https://www.googleapis.com/auth/spreadsheets.readonly' — Read-only access to Google Sheets and properties
  • 'https://www.googleapis.com/auth/presentations' — Read-write access to Slides and Slides presentation properties
If you're new to using Google APIs, we recommend reviewing earlier posts & videos covering the setting up projects and the authorization boilerplate so that we can focus on the main app. Once we've authorized our app, two service endpoints are created, one for each API. The one for Sheets is saved to the SHEETS variable while the one for Slides goes to SLIDES.

Start with Sheets

The first thing to do is to grab all the data we need from the Google Sheet using the Sheets API. You can either supply your own Sheet with your own chart, or you can run the script from the earlier post mentioned earlier to create an identical Sheet as above. In either case, you need to provide the Sheet ID to read from, which is saved to the sheetID variable. Using its ID, we call spreadsheets().values().get() to pull out all the cells (as rows & columns) from the Sheet and save it to orders:
sheetID = '. . .'   # use your own!
orders = SHEETS.spreadsheets().values().get(range='Sheet1',
        spreadsheetId=sheetID).execute().get('values')
The next step is to call spreadsheets().get() to get all the sheets in the Sheet —there's only one, so grab it at index 0. Since this sheet only has one chart, we also use index 0 to get that:
sheet = SHEETS.spreadsheets().get(spreadsheetId=sheetID,
        ranges=['Sheet1']).execute().get('sheets')[0]
chartID = sheet['charts'][0]['chartId']
That's it for Sheets. Everything from here on out takes places in Slides.

Create new Slides presentation

A new slide deck can be created with SLIDES.presentations().create()—or alternatively with the Google Drive API which we won't do here. We'll name it, "Generating slides from spreadsheet data DEMO" and save its (new) ID along with the IDs of the title and subtitle textboxes on the (one) title slide created in the new deck:
DATA = {'title': 'Generating slides from spreadsheet data DEMO'}
rsp = SLIDES.presentations().create(body=DATA).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0]
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']

Create slides for table & chart

A mere title slide doesn't suffice as we need a place for the cell data as well as the pie chart, so we'll create slides for each. While we're at it, we might as well fill in the text for the presentation title and subtitle. These requests are self-explanatory as you can see below in the reqs variable. The SLIDES.presentations().batchUpdate() method is then used to send the four commands to the API. Upon return, save the IDs for both the cell table slide as well as the blank slide for the chart:
reqs = [
  {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'TITLE_ONLY'}}},
  {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'BLANK'}}},
  {'insertText': {'objectId': titleID,    'text': 'Importing Sheets data'}},
  {'insertText': {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
tableSlideID = rsp[0]['createSlide']['objectId']
chartSlideID = rsp[1]['createSlide']['objectId']
Note the order of the requests. The create-slide requests come first followed by the text inserts. Responses that come back from the API are returned in the same order as they were sent, hence why the cell table slide ID comes back first (index 0) followed by the chart slide ID (index 1). The text inserts don't have any meaningful return values and are thus ignored.

Filling out the table slide

Now let's focus on the table slide. There are two things we need to accomplish. In the previous set of requests, we asked the API to create a "title only" slide, meaning there's (only) a textbox for the slide title. The next snippet of code gets all the page elements on that slide so we can get the ID of that textbox, the only thing on that page:
rsp = SLIDES.presentations().pages().get(presentationId=deckID,
        pageObjectId=tableSlideID).execute().get('pageElements')
textboxID = rsp[0]['objectId'] 
On this slide, we need to add the cell table for the Sheet data, so a create-table request takes care of that. The required elements in such a call include the ID of the slide the table should go on as well as the total number of rows and columns desired. Fortunately all that are available from tableSlideID and orders saved earlier. Oh, and add a title for this table slide too. Here's the code:
reqs = [
    {'createTable': {
        'elementProperties': {'pageObjectId': tableSlideID},
        'rows': len(orders),
        'columns': len(orders[0])},
    },
    {'insertText': {'objectId': textboxID, 'text': 'Toy orders'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
tableID = rsp[0]['createTable']['objectId']
Another call to SLIDES.presentations().batchUpdate() and we're done, saving the ID of the newly-created table. Next, we'll fill in each cell of that table.

Populate table & add chart image

The first set of requests needed now fill in each cell of the table. The most compact way to issue these requests is with a double-for loop list comprehension. The first loops over the rows while the second loops through each column (of each row). Magically, this creates all the text insert requests needed.
reqs = [
    {'insertText': {
        'objectId': tableID,
        'cellLocation': {'rowIndex': i, 'columnIndex': j},
        'text': str(data),
    }} for i, order in enumerate(orders) for j, data in enumerate(order)]
The final request "imports" the chart from the Sheet onto the blank slide whose ID we saved earlier. Note, while the dimensions below seem completely arbitrary, be assured we're using the same size & transform as a blank rectangle we drew on the slide earlier (and read those values from). The alternative would be to use math to come up with your object dimensions. Here is the code we're talking about, followed by the actual call to the API:
reqs.append({'createSheetsChart': {
    'spreadsheetId': sheetID,
    'chartId': chartID,
    'linkingMode': 'LINKED',
    'elementProperties': {
        'pageObjectId': chartSlideID,
        'size': {
            'height': {'magnitude': 7075, 'unit': 'EMU'},
            'width':  {'magnitude': 11450, 'unit': 'EMU'}
        },
        'transform': {
            'scaleX': 696.6157,
            'scaleY': 601.3921,
            'translateX': 583875.04,
            'translateY': 444327.135,
            'unit': 'EMU',
        },
    },
}})
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute()
Once all the requests have been created, send them to the Slides API then we're done. (In the actual app, you'll see we've sprinkled various print() calls to let the user knows which steps are being executed.

Conclusion

The entire script clocks in at just under 100 lines of code... see below. If you run it, you should get output that looks something like this:
$ python3 slides_table_chart.py
** Fetch Sheets data
** Fetch chart info from Sheets
** Create new slide deck
** Create 2 slides & insert slide deck title+subtitle
** Fetch table slide title (textbox) ID
** Create table & insert table slide title
** Fill table cells & create linked chart to Sheets
DONE
When the script has completed, you should have a new presentation with these 3 slides:




Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into major sections, they would be represented by each of the print() calls above. Here's the complete script—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

SCOPES = (
    'https://www.googleapis.com/auth/spreadsheets.readonly',
    'https://www.googleapis.com/auth/presentations',
)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
HTTP = creds.authorize(Http())
SHEETS = discovery.build('sheets', 'v4', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

print('** Fetch Sheets data')
sheetID = '. . .'   # use your own!
orders = SHEETS.spreadsheets().values().get(range='Sheet1',
        spreadsheetId=sheetID).execute().get('values')

print('** Fetch chart info from Sheets')
sheet = SHEETS.spreadsheets().get(spreadsheetId=sheetID,
        ranges=['Sheet1']).execute().get('sheets')[0]
chartID = sheet['charts'][0]['chartId']

print('** Create new slide deck')
DATA = {'title': 'Generating slides from spreadsheet data DEMO'}
rsp = SLIDES.presentations().create(body=DATA).execute()
deckID = rsp['presentationId']
titleSlide = rsp['slides'][0]
titleID = titleSlide['pageElements'][0]['objectId']
subtitleID = titleSlide['pageElements'][1]['objectId']

print('** Create 2 slides & insert slide deck title+subtitle')
reqs = [
  {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'TITLE_ONLY'}}},
  {'createSlide': {'slideLayoutReference': {'predefinedLayout': 'BLANK'}}},
  {'insertText': {'objectId': titleID,    'text': 'Importing Sheets data'}},
  {'insertText': {'objectId': subtitleID, 'text': 'via the Google Slides API'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
tableSlideID = rsp[0]['createSlide']['objectId']
chartSlideID = rsp[1]['createSlide']['objectId']

print('** Fetch table slide title (textbox) ID')
rsp = SLIDES.presentations().pages().get(presentationId=deckID,
        pageObjectId=tableSlideID).execute().get('pageElements')
textboxID = rsp[0]['objectId']

print('** Create table & insert table slide title')
reqs = [
    {'createTable': {
        'elementProperties': {'pageObjectId': tableSlideID},
        'rows': len(orders),
        'columns': len(orders[0])},
    },
    {'insertText': {'objectId': textboxID, 'text': 'Toy orders'}},
]
rsp = SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute().get('replies')
tableID = rsp[0]['createTable']['objectId']

print('** Fill table cells & create linked chart to Sheets')
reqs = [
    {'insertText': {
        'objectId': tableID,
        'cellLocation': {'rowIndex': i, 'columnIndex': j},
        'text': str(data),
    }} for i, order in enumerate(orders) for j, data in enumerate(order)]

reqs.append({'createSheetsChart': {
    'spreadsheetId': sheetID,
    'chartId': chartID,
    'linkingMode': 'LINKED',
    'elementProperties': {
        'pageObjectId': chartSlideID,
        'size': {
            'height': {'magnitude': 7075, 'unit': 'EMU'},
            'width':  {'magnitude': 11450, 'unit': 'EMU'}
        },
        'transform': {
            'scaleX': 696.6157,
            'scaleY': 601.3921,
            'translateX': 583875.04,
            'translateY': 444327.135,
            'unit': 'EMU',
        },
    },
}})
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=deckID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API, integrate into other apps for your own needs, for a mobile frontend, sysadmin script, or a server-side backend!

Code challenge

Given the knowledge you picked up from this post and its code sample, augment the script with another call to the Sheets API that updates the number of toys ordered by one of the customers, then add the corresponding call to the Slides API that refreshes the linked image based on the changes made to the Sheet (and chart). EXTRA CREDIT: Use the Google Drive API to monitor the Sheet so that any updates to toy orders will result in an "automagic" update of the chart image in the Slides presentation.

Wednesday, November 9, 2016

Replacing text & images with the Google Slides API with Python

NOTE: The code covered in this post are also available in a video walkthrough however the code here differs slightly, featuring some minor improvements to the code in the video.

Introduction

One of the critical things developers have not been able to do previously was access Google Slides presentations programmatically. To address this "shortfall," the Slides team pre-announced their first API a few months ago at Google I/O 2016—also see full announcement video (40+ mins). In early November, the G Suite product team officially launched the API, finally giving all developers access to build or edit Slides presentations from their applications.

In this post, I'll walk through a simple example featuring an existing Slides presentation template with a single slide. On this slide are placeholders for a presentation name and company logo, as illustrated below:

One of the obvious use cases that will come to mind is to take a presentation template replete with "variables" and placeholders, and auto-generate decks from the same source but created with different data for different customers. For example, here's what a "completed" slide would look like after the proxies have been replaced with "real data:"

Using the Google Slides API

We need to edit/write into a Google Slides presentation, meaning the read-write scope from all Slides API scopes below:
  • 'https://www.googleapis.com/auth/presentations' — Read-write access to Slides and Slides presentation properties
  • 'https://www.googleapis.com/auth/presentations.readonly' — View-only access to Slides presentations and properties
  • 'https://www.googleapis.com/auth/drive' — Full access to users' files on Google Drive
Why is the Google Drive API scope listed above? Well, think of it this way: APIs like the Google Sheets and Slides APIs were created to perform spreadsheet and presentation operations. However, importing/exporting, copying, and sharing are all file-based operations, thus where the Drive API fits in. If you need a review of its scopes, check out the Drive auth scopes page in the docs. Copying a file requires the full Drive API scope, hence why it's listed above. If you're not going to copy any files and only performing actions with the Slides API, you can of course leave it out.

Since we've fully covered the authorization boilerplate fully in earlier posts and videos, we're going to skip that here and jump right to the action.

Getting started

What are we doing in today's code sample? We start with a slide template file that has "variables" or placeholders for a title and an image. The application code will go then replace these proxies with the actual desired text and image, with the goal being that this scaffolding will allow you to automatically generate multiple slide decks but "tweaked" with "real" data that gets substituted into each slide deck.

The title slide template file is TMPFILE, and the image we're using as the company logo is the Google Slides product icon whose filename is stored as the IMG_FILE variable in my Google Drive. Be sure to use your own image and template files! These definitions plus the scopes to be used in this script are defined like this:
IMG_FILE = 'google-slides.png'     # use your own!
TMPLFILE = 'title slide template'  # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
Skipping past most of the OAuth2 boilerplate, let's move ahead to creating the API service endpoints. The Drive API name is (of course) 'drive', currently on 'v3', while the Slides API is 'slides' and 'v1' in the following call to create a signed HTTP client that's shared with a pair of calls to the apiclient.discovery.build() function to create the API service endpoints:
HTTP = creds.authorize(Http())
DRIVE =  discovery.build('drive',  'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

Copy template file

The first step of the "real" app is to find and copy the template file TMPLFILE. To do this, we'll use DRIVE.files().list() to query for the file, then grab the first match found. Then we'll use DRIVE.files().copy() to copy the file and name it 'Google Slides API template DEMO':
rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')

Find image placeholder

Next, we'll ask the Slides API to get the data on the first (and only) slide in the deck. Specifically, we want the dimensions of the image placeholder. Later on, we will use those properties when replacing it with the company logo, so that it will be automatically resized and centered into the same spot as the image placeholder.
The SLIDES.presentations().get() method is used to read the presentation metadata. Returned is a payload consisting of everything in the presentation, the masters, layouts, and of course, the slides themselves. We only care about the slides, so we get that from the payload. And since there's only one slide, we grab it at index 0. Once we have the slide, we're loop through all of the elements on that page and stop when we find the rectangle (image placeholder):
print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID
       ).execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break

Find image file

At this point, the obj variable points to that rectangle. What are we going to replace it with? The company logo, which we now query for using the Drive API:
print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
        DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token) 
The query code is similar to when we searched for the template file earlier. The trickiest thing about this snippet is that we need a full URL that points directly to the company logo. We use the DRIVE.files().get_media() method to create that request but don't execute it. Instead, we dig inside the request object itself and grab the file's URI and merge it with the current access token so what we're left with is a valid URL that the Slides API can use to read the image file and create it in the presentation.

Replace text and image

Back to the Slides API for the final steps: replace the title (text variable) with the desired text, add the company logo with the same size and transform as the image placeholder, and delete the image placeholder as it's no longer needed:
print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')
Once all the requests have been created, send them to the Slides API then let the user know everything is done.

Conclusion

That's the entire script, just under 60 lines of code. If you watched the video, you may notice a few minor differences in the code. One is use of the fields parameter in the Slides API calls. They represent the use of field masks, which is a separate topic on its own. As you're learning the API now, it may cause unnecessary confusion, so it's okay to disregard them for now. The other difference is an improvement in the replaceAllText request—the old way in the video is now deprecated, so go with what we've replaced it with in this post.

If your template slide deck and image is in your Google Drive, and you've modified the filenames and run the script, you should get output that looks something like this:
$ python3 slides_template.py
** Copying template 'title slide template' as 'Google Slides API template DEMO'
** Get slide objects, search for image placeholder
** Searching for icon file
 - Found image 'google-slides.png'
** Replacing placeholder text and icon
DONE
Below is the entire script for your convenience which runs on both Python 2 and Python 3 (unmodified!). If I were to divide the script into major sections, they would be:
  • Get creds & build API service endpoints
  • Copy template file
  • Get image placeholder size & transform (for replacement image later)
  • Get secure URL for company logo
  • Build and send Slides API requests to...
    • Replace slide title variable with "Hello World!"
    • Create image with secure URL using placeholder size & transform
    • Delete image placeholder
Here's the complete script—by using, copying, and/or modifying this code or any other piece of source from this blog, you implicitly agree to its Apache2 license:
from __future__ import print_function

from apiclient import discovery
from httplib2 import Http
from oauth2client import file, client, tools

IMG_FILE = 'google-slides.png'      # use your own!
TMPLFILE = 'title slide template'   # use your own!
SCOPES = (
    'https://www.googleapis.com/auth/drive',
    'https://www.googleapis.com/auth/presentations',
)
store = file.Storage('storage.json')
creds = store.get()
if not creds or creds.invalid:
    flow = client.flow_from_clientsecrets('client_secret.json', SCOPES)
    creds = tools.run_flow(flow, store)
HTTP = creds.authorize(Http())
DRIVE  = discovery.build('drive',  'v3', http=HTTP)
SLIDES = discovery.build('slides', 'v1', http=HTTP)

rsp = DRIVE.files().list(q="name='%s'" % TMPLFILE).execute().get('files')[0]
DATA = {'name': 'Google Slides API template DEMO'}
print('** Copying template %r as %r' % (rsp['name'], DATA['name']))
DECK_ID = DRIVE.files().copy(body=DATA, fileId=rsp['id']).execute().get('id')

print('** Get slide objects, search for image placeholder')
slide = SLIDES.presentations().get(presentationId=DECK_ID,
        fields='slides').execute().get('slides')[0]
obj = None
for obj in slide['pageElements']:
    if obj['shape']['shapeType'] == 'RECTANGLE':
        break

print('** Searching for icon file')
rsp = DRIVE.files().list(q="name='%s'" % IMG_FILE).execute().get('files')[0]
print(' - Found image %r' % rsp['name'])
img_url = '%s&access_token=%s' % (
        DRIVE.files().get_media(fileId=rsp['id']).uri, creds.access_token)

print('** Replacing placeholder text and icon')
reqs = [
    {'replaceAllText': {
        'containsText': {'text': '{{NAME}}'},
        'replaceText': 'Hello World!'
    }},
    {'createImage': {
        'url': img_url,
        'elementProperties': {
            'pageObjectId': slide['objectId'],
            'size': obj['size'],
            'transform': obj['transform'],
        }
    }},
    {'deleteObject': {'objectId': obj['objectId']}},
]
SLIDES.presentations().batchUpdate(body={'requests': reqs},
        presentationId=DECK_ID).execute()
print('DONE')
As with our other code samples, you can now customize it to learn more about the API, integrate into other apps for your own needs, for a mobile frontend, sysadmin script, or a server-side backend!

Code challenge

Add more slides and/or text variables and modify the script replace them too. EXTRA CREDIT: Change the image-based image placeholder to a text-based image placeholder, say a textbox with the text, "{{COMPANY_LOGO}}" and use the replaceAllShapesWithImage request to perform the image replacement. By making this one change, your code should be simplified from the image-based image replacement solution we used in this post.