To keep track of all your folders and files in Google Drive, you can run a Python script that automatically scrapes information of all the content uploaded in your account. Let me show you how!

This post is part of the Project Pigeon, a series of tutorials that covers using Python to automate the following tasks:

  1. Split a huge Excel workbook into multiple files: Click here
  2. Upload Excel files to Google Drive: Part 1 and Part 2

Before proceeding with this tutorial, make sure you have read and understood my previous posts.

I personally use this script right after uploading hundreds of files to GDrive for monitoring purposes. It's inconvenient to list the uploaded folders one by one.

What you'll need

  1. A Google Drive folder that you want to track
  2. A Google spreadsheet to list your output

Steps

  • Step 1. Get all the folders in Google Drive - Click here
  • Step 2. Write to Google Spreadsheet and Customize the format - Click here
  • Step 3. Run the script and check the result!

Step 1: Get all the folders in Google Drive.

Again before we start, you must know how to connect to Google APIs already, which I discussed in this demo.

Image: Google Drive with many subfolders

First, list down the information you need to extract from GDrive API. In this scenario, I wanted to know the following:

  • Folder owner
  • Folder name
  • Folder Link
  • Number of Files in the Folder
  • Latest File uploaded
  • Latest File upload time

Image: Desired Output in Google Spreadsheet

Retrieve the basic folder information (folder id and name) using this Google API query:

def getAllFolders(folder_id, spreadsheet_id):

    folder_id = FOLDER_ID

    results = DRIVE_SERVICE.files().list(q="mimeType='application/vnd.google-apps.folder' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name)",pageSize=400).execute()
...

To determine the number of files in a GDrivefolder, use the countFiles() method:

def countFiles(folder_id):

    results = DRIVE_SERVICE.files().list(q="mimeType='application/vnd.google-apps.spreadsheet' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name)").execute()

    items = results.get('files', [])

    count = 0
    for item in items:
        count += 1
    return count

To determine the name of latest file uploaded in the GDrive folder, use the getLatestFile() and retrieveName() methods:

def getLatestFile(folder_id):  
    results = DRIVE_SERVICE.files().list(q="mimeType='application/vnd.google-apps.spreadsheet' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name, createdTime)", orderBy="createdTime").execute()
    items = results.get('files', [])

    if not items:
        return ""
    else:
        return items

def retrieveName(items,count):  
    if not items:
        return ""
    else:
        print(items[count-1]['name'])
        return items[count-1]['name']


latestFile = retrieveName(getLatestFile(item['id']),file_count)  

Use the following methods to get the upload timestamp of the latest file in a Google Drive folder:

def retrieveCreatedDate(items,count):  
    if not items:
        return ""
    else:
        print(items[count-1]['createdTime'])
        return items[count-1]['createdTime']

createdDate = retrieveCreatedDate(getLatestFile(item['id']),file_count)

Putting everything together results to this method:

def getAllFolders(folder_id, spreadsheet_id):

    folder_id = FOLDER_ID
    results = DRIVE_SERVICE.files().list(q="mimeType='application/vnd.google-apps.folder' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name)",pageSize=400).execute()

    items = results.get('files', [])

    values = []

    count = 1
    print(len(items))

    for item in items:
        print(count)
        print('{0} ({1})'.format(item['name'], item['id']))

        # Sales Rep
        person_name = item['name'][CHAR_BEFORE_NAME:].strip()

        # Number of Files
        file_count = countFiles(item['id'])
        latestFileId = '=HYPERLINK("https://docs.google.com/spreadsheets/d/'+retrieveId(getLatestFile(item['id']),file_count)+'","Click")'

        # Latest File name and upload time
        latestFile = retrieveName(getLatestFile(item['id']),file_count)
        createdDate = retrieveCreatedDate(getLatestFile(item['id']),file_count)

        print("----", person_name)
        print("----", file_count)
        print("----", createdDate)

        item['person'] = person_name
        item['file_count'] = file_count
        item['link'] = '=HYPERLINK("https://drive.google.com/drive/folders/'+item['id']+'","Click")'

     # Save to a list    
values.append([item['person'],item['name'],item['link'],item['id'],item['file_count'],latestFile,latestFileId,createdDate])

        count+=1
...

Next step: Writing these information to a Google spreadsheet!

Back to top



Step 2: Write to Google Spreadsheet and Customize format

Write to Google Spreadsheet

def writeToSheets(values,range,spreadsheet_id):  
    print("Writing to Sheets")
    body = {
        'values': values
        }
    print(values)
    SS_SERVICE.spreadsheets().values().update(spreadsheetId=spreadsheet_id,valueInputOption='USER_ENTERED',range=range,body=body).execute()


def getAllFolders(folder_id, spreadsheet_id):  
    ....
    writeToSheets(values,TAB_NAME,spreadsheet_id)

Customize Formatting

Google Sheet APIs also allows you to customize the formatting of a Google Spreadsheet automatically.

An example is this colorAndSort() method, which applies conditional formatting and sorts multiple columns:

def colorAndSort(spreadsheet_id):  
    sheet_id=SHEET_ID
    my_range = {
        'sheetId': sheet_id,
        'startRowIndex': 1,
        'endRowIndex': 600,
        'startColumnIndex': 4,
        'endColumnIndex': 5,
    }
    requests = [{
        'addConditionalFormatRule': {
            'rule': {
                'ranges': [ my_range ],
                'booleanRule': {
                    'condition': {
                        'type': 'CUSTOM_FORMULA',
                        'values': [ { 'userEnteredValue': '=GT($E2,1)'} ]
                    },
                    'format': {
                        'backgroundColor': { 'red': 1, 'green': 1, 'blue': 0 }
                        }
                    }
                  }
                }
            },
        {
        "sortRange": {
            "range": {
            "sheetId": sheet_id,
            "startRowIndex": 1,
            "endRowIndex": 600,
            "startColumnIndex": 0,
            "endColumnIndex": 9
        },
        "sortSpecs": [
          {
            "dimensionIndex": 6,
            "sortOrder": "DESCENDING"
          },
          {
            "dimensionIndex": 7,
            "sortOrder": "DESCENDING"
          },
          {
            "dimensionIndex": 0,
            "sortOrder": "ASCENDING"
          }
        ]
      }
    }
          ]

    body = {
        'requests': requests
    }
    response = SS_SERVICE.spreadsheets().batchUpdate(spreadsheetId=spreadsheet_id, body=body).execute()
    print('{0} cells updated.'.format(len(response.get('replies'))));
Back to top



Step 3: Run the script and check the results.

Make sure to add these methods below your script:

def main():

    # Step 1: Collects information about your FOLDER_ID, Writes to SS_ID
    getAllFolders(FOLDER_ID,SS_ID)

    # Step 2: Applies conditional formatting and sorts your Spreadsheet (SS_ID)
    colorAndSort(SS_ID)

if __name__ == '__main__':  
    main()

Your variables should be set also, just like this.

Once done, run your Python script and see the results in your terminal:

Feel free to copy and customize this script as you see fit!

Hope this helps you! - EM

Back to top