Thursday, November 6, 2014

Authorized Google API access from Python (part 2 of 2)

NOTE: You can also watch a video walkthrough of the common code covered in this blogpost here.

UPDATE (Aug 2016): The code has been modernized to use oauth2client.tools.run_flow() instead of the deprecated oauth2client.tools.run(). You can read more about that change here.

UPDATE (Jun 2016): Updated to Python 2.7 & 3.3+ and Drive API v3.

Introduction

In this final installment of a (currently) two-part series introducing Python developers to building on Google APIs, we'll extend the simple API example from the first post (part 1) just over a month ago. Those first snippets showed some skeleton code and a short, real working sample that demonstrated accessing a public (Google) API with an API key (it queried public Google+ posts). An API key, however, does not grant applications access to authorized data.

Authorized data, including user information such as personal files on Google Drive and YouTube playlists, require additional security steps before access is granted. Sharing of and hardcoding credentials such as usernames and passwords is not only insecure, it's also a thing of the past. A more modern approach leverages token exchange, authenticated API calls, and standards such as OAuth2.

In this post, we'll demonstrate how to use Python to access authorized Google APIs using OAuth2, specifically listing the files (and folders) in your Google Drive. In order to better understand the example, we strongly recommend you check out the OAuth2 guides (general OAuth2 info, OAuth2 as it relates to Python and its client library) in the documentation to get started.

The docs describe the OAuth2 flow: making a request for authorized access, having the user grant access to your app, and obtaining an access token with which to sign and make authorized API calls. The steps you need to take to get started begin nearly the same way as for simple API access. The process diverges when you arrive at the Credentials page when following the steps below.
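To make the first step of that flow a bit more concrete, here is a minimal Python 3 sketch of how an authorization request URL can be put together by hand. The client library does all of this for you, so treat it as illustration only; the build_auth_url() helper, the placeholder client ID, and the localhost redirect URI are assumptions of mine, not values from this post.

```python
from urllib.parse import urlencode, urlparse, parse_qs

# Google's OAuth2 authorization endpoint; the client library normally supplies this.
AUTH_ENDPOINT = 'https://accounts.google.com/o/oauth2/v2/auth'

def build_auth_url(client_id, redirect_uri, scopes):
    """Return the URL a user would visit to grant access (hypothetical helper)."""
    params = {
        'client_id': client_id,
        'redirect_uri': redirect_uri,
        'response_type': 'code',     # ask for an authorization code
        'scope': ' '.join(scopes),   # scopes travel as one space-delimited string
    }
    return AUTH_ENDPOINT + '?' + urlencode(params)

url = build_auth_url(
    'YOUR_CLIENT_ID.apps.googleusercontent.com',   # placeholder, not a real ID
    'http://localhost:8080/',                      # placeholder redirect URI
    ['https://www.googleapis.com/auth/drive.metadata.readonly'],
)
print(url)
```

Once the user approves the request in their browser, the app receives a code that it exchanges for an access token; that exchange is the part the library handles in the code later in this post.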

Google API access

In order to get authorized Google API access, follow these instructions (the first three of which are roughly the same as for simple API access):
  • Go to the Google Developers Console and log in.
    • Use your Gmail or Google credentials; create an account if needed
  • Click "Create a Project" from pulldown under your username (at top)
    • Enter a Project Name (mutable, human-friendly string only used in the console)
    • Enter a Project ID (immutable, must be unique and not already taken)
  • Once project has been created, enable APIs you wish to use
  • Select "Credentials" in left-nav
    • Click "Create credentials" and select OAuth client ID
    • In the new dialog, select your application type — we're building a command-line script which is an "Installed application"
    • In the bottom part of that same dialog, specify the type of installed application; choose "Other" (cmd-line scripts are not web nor mobile)
    • Click "Create Client ID" to generate your credentials
  • Finally, click "Download JSON" to save the new credentials to your computer... perhaps choose a shorter name like "client_secret.json" or "client_id.json"
NOTEs: Instructions from the previous blogpost were to get an API key. This time, in the steps above, we're creating and downloading OAuth2 credentials. You can also watch a video walkthrough of this app setup process of getting simple or authorized access credentials in the "DevConsole" here.
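Unlike an API key, which is just a string, the downloaded OAuth2 credentials are a small JSON file. As a quick sanity check that you saved the right thing, here is a sketch that reads the client ID back out of it. It assumes the layout I've seen for downloaded client credentials, where everything is nested under an 'installed' key for command-line apps (or 'web' for web apps); load_client_id() is a hypothetical helper, not part of the client library.

```python
import json

def load_client_id(path):
    """Return the client ID from a downloaded OAuth2 credentials file."""
    with open(path) as f:
        data = json.load(f)
    # Assumed layout: installed-app credentials live under 'installed', web apps under 'web'.
    section = data.get('installed') or data.get('web')
    if not section:
        raise ValueError('not an OAuth2 client credentials file: %s' % path)
    return section['client_id']
```

You would call it as load_client_id('client_id.json'), using whatever filename you chose when saving the download.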

    Accessing Google APIs from Python

    In order to access authorized Google APIs from Python, you still need the Google APIs Client Library for Python, so in this case, do follow those installation instructions from part 1.

    We will again use the apiclient.discovery.build() function, which is what we need to create a service endpoint for interacting with an API, authorized or otherwise. However, for authorized data access, we need additional resources, namely the httplib2 and oauth2client packages. Here are the first five lines of the new boilerplate code for authorized access:

    from __future__ import print_function
    
    from apiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    SCOPES = # one or more scopes (strings)
    
    SCOPES is a critical variable: it represents the set of scopes of authorization an app wants to obtain (then access) on behalf of user(s). What does a scope look like?

    Each scope is a single character string, specifically a URL. Here are some examples:
    • 'https://www.googleapis.com/auth/plus.me': access your personal Google+ settings
    • 'https://www.googleapis.com/auth/drive.metadata.readonly': read-only access to your Google Drive file or folder metadata
    • 'https://www.googleapis.com/auth/youtube': access your YouTube playlists and other personal information
    You can request one or more scopes, given as a single space-delimited string of scopes or an iterable (list, generator expression, etc.) of strings.  If you were writing an app that accesses both your YouTube playlists as well as your Google+ profile information, your SCOPES variable could be either of the following:
    SCOPES = 'https://www.googleapis.com/auth/plus.me https://www.googleapis.com/auth/youtube'

    That is space-delimited, and long enough that it may wrap in a regular-sized browser window; or it could be an easier-to-read, non-wrapped tuple:

    SCOPES = (
        'https://www.googleapis.com/auth/plus.me',
        'https://www.googleapis.com/auth/youtube',
    )
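If you want to convince yourself that the two forms really carry the same information, here is a small sketch that normalizes either one into a single space-delimited string. The scopes_to_string() helper is my own illustration (an assumption, not a call into the client library), written for Python 3.

```python
def scopes_to_string(scopes):
    """Return scopes as one space-delimited string (hypothetical helper)."""
    if isinstance(scopes, str):
        return scopes            # already a single string
    return ' '.join(scopes)      # any iterable of strings

AS_STRING = 'https://www.googleapis.com/auth/plus.me https://www.googleapis.com/auth/youtube'
AS_TUPLE = (
    'https://www.googleapis.com/auth/plus.me',
    'https://www.googleapis.com/auth/youtube',
)
print(scopes_to_string(AS_STRING) == scopes_to_string(AS_TUPLE))
```
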

    Our example command-line script will just list the files on your Google Drive, so we only need the read-only Drive metadata scope, meaning our SCOPES variable will be just this:
    SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
    The next section of boilerplate represents the security code:
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
        creds = tools.run_flow(flow, store)
    
    Once the user has authorized access to their personal data by your app, a special "access token" is given to your app. This precious resource must be stored somewhere local for the app to use. In our case, we'll store it in a file called "storage.json". The lines setting the store and creds variables are attempting to get a valid access token with which to make an authorized API call.

    If the credentials are missing or invalid, such as being expired, the authorization flow (using the client secret you downloaded along with a set of requested scopes) must be created (by client.flow_from_clientsecrets()) and executed (by tools.run_flow()) to ensure possession of valid credentials. The client_id.json or client_secret.json file is the credentials file you saved when you clicked "Download JSON" from the DevConsole after you've created your OAuth2 client ID.

    If you don't have credentials at all, the user must explicitly grant permission. I'm sure you've all seen the OAuth2 dialog describing the type of access an app is requesting (remember those scopes?). Once the user clicks "Accept" to grant permission, a valid access token is returned and saved into the storage file (because you passed a handle to it when you called tools.run_flow()).
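The "use what's stored, otherwise run the flow and store the result" logic is a general caching pattern. Here is a library-free sketch of it so you can see the shape without any Google code involved. Everything in it is an assumption for illustration: the get_token() name, the token dict with an 'expires_at' field, and the run_flow callable standing in for the user consent step. The real library does more than this, so don't read it as how oauth2client is implemented.

```python
import json
import os
import time

def get_token(path, run_flow):
    """Return a cached token if still valid, else obtain and cache a new one."""
    token = None
    if os.path.exists(path):
        with open(path) as f:
            token = json.load(f)
    # Missing or expired: fall back to the (possibly interactive) flow.
    if not token or token['expires_at'] <= time.time():
        token = run_flow()
        with open(path, 'w') as f:
            json.dump(token, f)      # save for the next run
    return token
```

The payoff is the same as in the boilerplate above: the user only sees the permission dialog on the first run, and later runs quietly reuse what was saved.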

    Note: tools.run() deprecated by tools.run_flow()
    You may have seen usage of the older tools.run() function, but it has been deprecated by tools.run_flow(). We explain this in more detail in another blogpost specifically geared towards migration.

    Once the user grants access and valid credentials are saved, you can create one or more endpoints to the secure service(s) desired with apiclient.discovery.build(), just like with simple API access. Its call will look slightly different, mainly that you need to sign your HTTP requests with your credentials rather than passing an API key:

    DRIVE = discovery.build(API, VERSION, http=creds.authorize(Http()))

    In our example, we're going to list your files and folders in your Google Drive, so for API, use the string 'drive'. The API is currently on version 3 so use 'v3' for VERSION:

    DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))

    If you want to get comfortable with OAuth2, what its flow is and how it works, we recommend that you experiment at the OAuth Playground. There you can choose from any number of APIs to access and experience first-hand how your app must be authorized to access personal data.

    Going back to our working example, once you have an established service endpoint, you can use the list() method of the files service to request the file data:

    files = DRIVE.files().list().execute().get('files', [])

    If there's any data to read, the response dict will contain an iterable of files that we can loop over (or default to an empty list so the loop doesn't fail), displaying file names and types:

    for f in files:
        print(f['name'], f['mimeType'])

    Conclusion

    To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs for files().list(). For more information on what other operations you can execute with the Google Drive API, take a look at the reference docs and check out the companion video for this code sample. That's it!

    Below is the entire script for your convenience:
    '''
    drive_list.py -- Google Drive API authorized demo
        updated Aug 2016 by +WesleyChun/@wescpy
    '''
    from __future__ import print_function
    
    from apiclient import discovery
    from httplib2 import Http
    from oauth2client import file, client, tools
    
    SCOPES = 'https://www.googleapis.com/auth/drive.metadata.readonly'
    store = file.Storage('storage.json')
    creds = store.get()
    if not creds or creds.invalid:
        flow = client.flow_from_clientsecrets('client_id.json', SCOPES)
        creds = tools.run_flow(flow, store)
    
    DRIVE = discovery.build('drive', 'v3', http=creds.authorize(Http()))
    files = DRIVE.files().list().execute().get('files', [])
    for f in files:
        print(f['name'], f['mimeType'])
    
    When you run it, you should see pretty much what you'd expect, a list of file or folder names followed by their MIMEtypes — I named my script drive_list.py:
    $ python3 drive_list.py
    Google Maps demo application/vnd.google-apps.spreadsheet
    Overview of Google APIs - Sep 2014 application/vnd.google-apps.presentation
    tiresResearch.xls application/vnd.google-apps.spreadsheet
    6451_Core_Python_Schedule.doc application/vnd.google-apps.document
    out1.txt application/vnd.google-apps.document
    tiresResearch.xls application/vnd.ms-excel
    6451_Core_Python_Schedule.doc application/msword
    out1.txt text/plain
    Maps and Sheets demo application/vnd.google-apps.spreadsheet
    ProtoRPC Getting Started Guide application/vnd.google-apps.document
    gtaskqueue-1.0.2_public.tar.gz application/x-gzip
    Pull Queues application/vnd.google-apps.folder
    gtaskqueue-1.0.1_public.tar.gz application/x-gzip
    appengine-java-sdk.zip application/zip
    taskqueue.py text/x-python-script
    Google Apps Security Whitepaper 06/10/2010.pdf application/pdf
    
    Obviously your output will be different, depending on what files are in your Google Drive. But that's it... hope this is useful. You can now customize this code for your own needs and/or to access other Google APIs. Thanks for reading!

    EXTRA CREDIT: To test your skills, add functionality to this code that also displays the last modified timestamp, the file (byte)size, and perhaps shave the MIMEtype a bit as it's slightly harder to read in its entirety... perhaps take just the final path element? One last challenge: in the output above, we have both Microsoft Office documents as well as their auto-converted versions for Google Apps... perhaps only show the filename once and have a double-entry for the filetypes!
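If you'd like a nudge on the last two parts of that challenge, here is one possible sketch, certainly not the only way to do it. It works on plain dicts shaped like the entries in the files list, using a few names taken from the sample output above, so it runs without touching the API. The summarize() helper is a hypothetical name of my own.

```python
from collections import OrderedDict

def summarize(files):
    """Map each file name to its shortened MIME types, in first-seen order."""
    by_name = OrderedDict()
    for f in files:
        short = f['mimeType'].rsplit('/', 1)[-1]   # keep only the final path element
        by_name.setdefault(f['name'], []).append(short)
    return by_name

# Sample entries copied from the output above, standing in for a real API response.
sample = [
    {'name': 'out1.txt', 'mimeType': 'application/vnd.google-apps.document'},
    {'name': 'out1.txt', 'mimeType': 'text/plain'},
    {'name': 'Pull Queues', 'mimeType': 'application/vnd.google-apps.folder'},
]
for name, types in summarize(sample).items():
    print(name, ', '.join(types))
```

For the timestamp and size, my understanding is that you need to ask for those fields explicitly in the list() call (check the files().list() docs for the fields parameter); I've left that part for you.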

    Saturday, September 20, 2014

    Simple Google API access from Python (part 1 of 2)

    NOTE: You can also watch a video walkthrough of the common code covered in this blogpost here.

    UPDATE (Aug 2016): The code has been modernized to recognize that the Client Library is available for Python 2 or 3.

    Introduction

    Back in 2012 when I published Core Python Applications Programming, 3rd ed., I posted about how I integrated Google technologies into the book. The only problem is that I presented very specific code for Google App Engine and Google+ only. I didn't show a generic way in which, using pretty much the same boilerplate Python snippet, you can access any number of Google APIs; so here we are.

    In this multi-part series, I'll break down the code that allows you to leverage Google APIs to the most basic level (even for Python), so you can customize as necessary for your app, whether it's running as a command-line tool or something server-side in the cloud backending Web or mobile clients. If you've got the book and played around with our Google+ API example, you'll find this code familiar, if not identical — I'll go into more detail here, highlighting the common code for generic API access and then bring in the G+-relevant code later.

    We'll start in this first post by demonstrating how to access public or unauthorized data from Google APIs. (The next post will illustrate how to access authorized data from Google APIs.) Regardless of which you use, the corresponding boilerplate code stands alone. In fact, it's probably best if you saved these generic snippets in a library module so you can (re)use the same bits for any number of apps which access any number of modern Google APIs.
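As a rough idea of what such a library module could look like, here is a minimal sketch. It wraps the apiclient.discovery.build() call that is covered later in this post; the get_service() name and its optional build argument (there only so the function can be exercised without the client library installed) are assumptions of mine.

```python
def get_service(api, version, api_key, build=None):
    """Return a service endpoint for simple (API key) access."""
    if build is None:
        # Imported lazily so this module loads even before the client library is installed.
        from apiclient import discovery
        build = discovery.build
    return build(api, version, developerKey=api_key)
```

With that saved in a module of your own, each new script only needs something like get_service('plus', 'v1', API_KEY) with the API name and version swapped for whichever API you're using.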

    Google API access

    In order to access Google APIs, follow these instructions:
    • Go to the Google Developers Console and log in.
      • Use your Gmail or Google credentials; create an account if needed
    • Click "Create Project" button
      • Enter a Project Name (mutable, human-friendly string only used in the console)
      • Enter a Project ID (immutable, must be unique and not already taken)
    • Once project has been created, click "Enable an API" button
      • You can toggle on any API(s) that support(s) simple API access (not authorized).
      • For the code example below, we use the Google+ API.
      • Other ideas: YouTube Data API, Google Maps API, etc.
      • Find more APIs (and version#s which you need) at the OAuth Playground.
    • Select "Credentials" in left-nav under "APIs & auth"
      • Go to bottom half and click "Create new Key" button
      • Grab long "API KEY" cryptic string and save to Python script
      NOTE: You can also watch a video walkthrough of this app setup process in the "DevConsole" here.

      Accessing Google APIs from Python

      Now that you're set up, everything else is done on the Python side. To talk to a Google API, you need the Google APIs Client Library for Python, specifically the apiclient.discovery.build() function. Download and install the library in your usual way, for example:

      $ pip install -U google-api-python-client  # or pip3 for 3.x
      NOTE: If you're building a Python App Engine app, you'll need something else, the Google APIs Client Library for Python on Google App Engine. It's similar but has extra goodies (specifically decorators — brief generic intro to those in my previous post) just for cloud developers that must be installed elsewhere. As App Engine developers know, libraries must be in the same location on the filesystem as your source code.
      Once everything is installed, make sure that you can import apiclient.discovery:

      $ python
      Python 2.7.6 (default, Apr  9 2014, 11:48:52)
      [GCC 4.2.1 Compatible Apple LLVM 5.1 (clang-503.0.38)] on darwin
      Type "help", "copyright", "credits" or "license" for more information.
      >>> import apiclient.discovery
      >>>

      In discovery.py is the build() function, which is what we need to create a service endpoint for interacting with an API. Now craft the following lines of code in your command-line tool, using the shorthand from-import statement instead:

      from apiclient import discovery

      API_KEY = # copied from project credentials page
      SERVICE = discovery.build(API, VERSION, developerKey=API_KEY)

      Take the API key you copied from the credentials page and assign it to the API_KEY variable as a string. Obviously, embedding an API key in source code isn't something you'd do in practice as it's not secure whatsoever (stick it in a database, key broker, encrypt, or at least have it in a separate byte code (.pyc/.pyo) file that you import), but we'll allow it now solely for illustrative purposes of a simple command-line script.
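One more lightweight option, not on the list above, is to keep the key out of the source entirely and read it from an environment variable. Here is a minimal sketch; the variable name GOOGLE_API_KEY and the load_api_key() helper are assumptions I picked for illustration, not anything the client library looks for.

```python
import os

def load_api_key(var='GOOGLE_API_KEY'):
    """Return the API key from an environment variable (the name is an assumption)."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError('set the %s environment variable first' % var)
    return key
```

In the script you would then write API_KEY = load_api_key() rather than pasting the key in as a string.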

      In our short example we're going to do a simple search for "python" in public Google+ posts, so for the API variable, use the string 'plus'. The API version is currently on version 1 (at the time of this writing), so use 'v1' for VERSION. (Each API will use a different name and version string... again, you can find those in the OAuth Playground or in the docs for the specific API you want to use.) Here's the call once we've filled in those variables:

      GPLUS = discovery.build('plus', 'v1', developerKey=API_KEY)

      We need a template for the results that come back. There are many fields in a Google+ post, so we're only going to pick three to display... the user name, post timestamp, and a snippet of the post itself:

      TMPL = '''
          User: %s
          Date: %s
          Post: %s
      '''

      Now for the code. Google+ posts are activities (known as "notes;" there are other activities as well). One of the methods you have access to is search(), which lets you query public activities; so that's what we're going to use. Add the following call using the GPLUS service endpoint you already created using the verbs we just described and execute it:

      items = GPLUS.activities().search(query='python').execute().get('items', [])

      If all goes well, the (JSON) response payload will contain a set of 'items' (else we assign an empty list for the for loop). From there, we'll loop through each matching post, do some minor string manipulation to replace all whitespace characters (including NEWLINEs [ \n ]) with spaces, and display if not blank:

      for data in items:
          post = ' '.join(data['title'].strip().split())
          if post:
              print(TMPL % (data['actor']['displayName'],
                            data['published'], post))
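If the strip-split-join idiom in that loop is new to you, here it is in isolation on a made-up title. The clean() helper name is mine, and the sample strings are invented for illustration.

```python
def clean(title):
    """Collapse every run of whitespace (including newlines) into one space."""
    return ' '.join(title.strip().split())

print(repr(clean('  Classic\nMonty   Python!!!\t ')))   # 'Classic Monty Python!!!'
print(repr(clean(' \n ')))                               # ''
```

The second call shows why the loop checks "if post": a title made up of nothing but whitespace collapses to an empty string, which is falsy, so it gets skipped.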


      Conclusion

      To find out more about the input parameters as well as all the fields that are in the response, take a look at the docs. Below is the entire script missing only the API_KEY which you'll have to fill in yourself.

      from __future__ import print_function
      from apiclient import discovery
      
      TMPL = '''
          User: %s
          Date: %s
          Post: %s
      '''
      
      API_KEY = # copied from project credentials page
      GPLUS = discovery.build('plus', 'v1', developerKey=API_KEY)
      items = GPLUS.activities().search(query='python').execute().get('items', [])
      for data in items:
          post = ' '.join(data['title'].strip().split())
          if post:
              print(TMPL % (data['actor']['displayName'],
                            data['published'], post))
      

      When you run it, you should see pretty much what you'd expect, a few posts on Python, some on Monty Python, and of course, some on the snake — I called my script plus_search.py:

      $ python plus_search.py # or python3
      
          User: Jeff Ward
          Date: 2014-09-20T18:08:23.058Z
          Post: How to make python accessible in the command window.
      
      
          User: Fayland Lam
          Date: 2014-09-20T16:40:11.512Z
          Post: Data Engineer http://findmjob.com/job/AB7ZKitA5BGYyW1oAlQ0Fw/Data-Engineer.html #python #hadoop #jobs...
      
      
          User: Willy's Emporium LTD
          Date: 2014-09-20T16:19:33.851Z
          Post: MONTY PYTHON QUOTES MUG Take a swig to wash down all that albatross and crunchy frog. Featuring 20 ...
      
      
          User: Doddy Pal
          Date: 2014-09-20T15:49:54.405Z
          Post: Classic Monty Python!!!
      
      
          User: Sebastian Huskins
          Date: 2014-09-20T15:33:00.707Z
          Post: Made a small python script to get shellcode out of an executable. I found a nice commandlinefu.com oneline...
      

      EXTRA CREDIT: To test your skills, check the docs and add a fourth line to each output which is the URL/link to that specific post, so that you (and your users) can open a browser to it if of interest.

      If you want to build on from here, check out the larger app using the Google+ API featured in Chapter 15 of the book; it adds some brains to this basic code where the Google+ posts are sorted by popularity using a "chatter" score. That just about wraps up this post. Once you're good to go, you're ready to learn how to perform authorized Google API access in part 2 of this two-part series!

      Saturday, July 26, 2014

      Introduction to Python decorators

      In this post, we're going to give you a user-friendly introduction to Python decorators. (The code works on both Python 2 [2.6 or 2.7 only] and 3 so don't be concerned with your version.) Before jumping into the topic du jour, consider the usefulness of the map() function. You've got a list with some data and want to apply some function [like times2() below] to all its elements and get a new list with the modified data:

      def times2(x):
          return x * 2

      >>> list(map(times2, [0, 1, 2, 3, 4]))
      [0, 2, 4, 6, 8]

      Yeah yeah, I know that you can do the same thing with a list comprehension or generator expression, but my point was about an independent piece of logic [like times2()] and mapping that function across a data set ([0, 1, 2, 3, 4]) to generate a new data set ([0, 2, 4, 6, 8]). However, since mapping functions like times2() aren't tied to any particular chunk of data, you can reuse them elsewhere with other unrelated (or related) data.
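To see both points side by side, here is a short sketch: the list comprehension gives the same result as map(), and the very same times2() can be mapped over unrelated data. The words list is made up for illustration.

```python
def times2(x):
    return x * 2

numbers = [0, 1, 2, 3, 4]
words = ['ab', 'cd']    # unrelated data, invented for this example

mapped = list(map(times2, numbers))
comprehended = [times2(x) for x in numbers]
doubled_words = list(map(times2, words))

print(mapped == comprehended)   # True
print(doubled_words)            # ['abab', 'cdcd']
```
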

      Along similar lines, consider function calls. You have independent functions and methods in classes. Now, think about "mapped" execution across functions. What are things that you can do with functions that don't have much to do with the behavior of the functions themselves? How about logging function calls, timing them, or some other introspective, cross-cutting behavior? Sure, you can implement that behavior in each of the functions you care about; however, since the behavior is so generic, it would be nice to write that logging code just once.

      Introduced in 2.4, decorators modularize cross-cutting behavior so that developers don't have to implement near duplicates of the same piece of code for each function. Rather, Python gives them the ability to put that logic in one place and use decorators with its at-sign ("@") syntax to "map" that behavior to any function (or method). This compartmentalization of cross-cutting functionality gives Python an aspect-oriented programming flavor.

      How do you do this in Python? Let's take a look at a simple example, the logging of function calls. Create a decorator function that takes a function object as its sole argument, and implement the cross-cutting functionality. In logged() below, we're just going to log function calls by making a call to the print() function each time a logged function is called.

      def logged(_func):
          def _wrapped():
              print('Function %r called at: %s' % (
                  _func.__name__, ctime()))
              return _func()
          return _wrapped

      In logged(), we use the function's name (given by _func.__name__) plus a timestamp from time.ctime() to build our output string. Make sure you get the right imports, time.ctime() for sure, and if using Python 2, the print() function:

      from __future__ import print_function # 2.6 or 2.7 only
      from time import ctime

      Now that we have our logged() decorator, how do we use it? On the line above the function which you want to apply the decorator to, place an at-sign in front of the decorator name. That's followed immediately on the next line with the normal function declaration. Here's what it looks like, applied to a boring generic foo() function which just print()s it's been called.

      @logged
      def foo():
          print('foo() called')

      When you call foo(), you can see that the wrapper created by the logged() decorator runs first, which then calls foo() on your behalf:

      $ log_func.py
      Function 'foo' called at: Sun Jul 27 04:09:37 2014
      foo() called

      If you take a closer look at logged() above, the way the decorator works is that the decorated function is "wrapped": it is passed as _func to the decorator, then the newly-wrapped function _wrapped() is (re)assigned as foo(). That's why it now behaves the way it does when you call it.

      The entire script:

      #!/usr/bin/env python
      'log_func.py -- demo of decorators'

      from __future__ import print_function # 2.6 or 2.7 only
      from time import ctime

      def logged(_func):
          def _wrapped():
              print('Function %r called at: %s' % (
                    _func.__name__, ctime()))
              return _func()
          return _wrapped

      @logged
      def foo():
          print('foo() called')

      foo()


      That was just a simple example to give you an idea of what decorators are. If you dig a little deeper, you'll discover one caveat is that the wrapping isn't perfect. For example, the attributes of foo() are lost, i.e., its name and docstring. If you ask for either, you'll get _wrapped()'s info instead:

      >>> print("My name:", foo.__name__) # should be 'foo'!
      My name: _wrapped
      >>> print("Docstring:", foo.__doc__) # _wrapped's docstring!
      Docstring: None

      In reality, the "@" syntax is just a shortcut. Here's what you really did, which should explain this behavior:

      def foo():
          print('foo() called')

      foo = logged(foo) # returns _wrapped (and its attributes)

      So as you can tell, it's not a complete wrap. A convenience function that ties up these loose ends is functools.wraps(). If you use it and run the same code, you will get foo()'s info. However, if you're not going to use a function's attributes while it's wrapped, it's less important to do this.
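Here is a sketch of the same logged() decorator with functools.wraps() applied, so you can check the difference for yourself. It is written for Python 3 (on 2.6 or 2.7 you'd add the print_function import shown earlier), and the docstring on foo() is one I added just so there is something to look up.

```python
from functools import wraps
from time import ctime

def logged(_func):
    @wraps(_func)    # copy _func's name, docstring, etc. onto _wrapped
    def _wrapped():
        print('Function %r called at: %s' % (_func.__name__, ctime()))
        return _func()
    return _wrapped

@logged
def foo():
    'boring generic function'
    print('foo() called')

print("My name:", foo.__name__)     # My name: foo
print("Docstring:", foo.__doc__)    # Docstring: boring generic function
```
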

      There's also support for additional features, such as calling decorated functions with parameters, applying more complex decorators, applying multiple levels of decorators, and also class decorators. You can find out more about (function and method) decorators in Chapter 11 of Core Python Programming or live in my upcoming course which starts in just a few days near the San Francisco airport... there are still a few seats left!
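As a small taste of the first and third of those features, here is a sketch of a logged() variant that forwards whatever arguments the decorated function takes, stacked on top of a second decorator. The doubled() decorator and the add() function are made up for illustration, and the code is written for Python 3.

```python
from functools import wraps

def logged(_func):
    @wraps(_func)
    def _wrapped(*args, **kwargs):    # accept whatever the wrapped function accepts
        print('Function %r called with %r %r' % (_func.__name__, args, kwargs))
        return _func(*args, **kwargs)
    return _wrapped

def doubled(_func):
    @wraps(_func)
    def _wrapped(*args, **kwargs):
        return 2 * _func(*args, **kwargs)
    return _wrapped

@logged     # applied second, so it ends up outermost
@doubled    # applied first, closest to add()
def add(x, y=0):
    return x + y

print(add(1, y=2))    # logs the call, then prints 6
```

Stacked decorators are applied bottom-up, so this is the same as writing add = logged(doubled(add)).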