To keep track of all your folders and files in Google Drive, you can run a Python script that automatically scrapes information of all the content uploaded in your account. Let me show you how!
This post is part of the Project Pigeon, a series of tutorials that covers using Python to automate the following tasks:
- Split a huge Excel workbook into multiple files: Click here
- Upload Excel files to Google Drive: Part 1 and Part 2
Before proceeding with this tutorial, make sure you have read and understood my previous posts.
I personally use this script right after uploading hundreds of files to GDrive for monitoring purposes. It’s inconvenient to list the uploaded folders one by one.
What you’ll need
- A Google Drive folder that you want to track
- A Google spreadsheet to list your output
Steps
- Step 1. Get all the folders in Google Drive – Click here
- Step 2. Write to Google Spreadsheet and Customize the format – Click here
- Step 3. Run the script and check the result!
Step 1: Get all the folders in Google Drive.
Again before we start, you must know how to connect to Google APIs already, which I discussed in this demo.

Image: Google Drive with many subfolders
First, list down the information you need to extract from GDrive API. In this scenario, I wanted to know the following:
- Folder owner
- Folder name
- Folder Link
- Number of Files in the Folder
- Latest File uploaded
- Latest File upload time

Image: Desired Output in Google Spreadsheet
Retrieve the basic folder information (folder id and name) using this Google API query:
def getAllFolders(folder_id, spreadsheet_id):
folder_id = FOLDER_ID
results = DRIVE_SERVICE.files().list(q="mimeType='application/vnd.google-apps.folder' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name)",pageSize=400).execute()
...
To determine the number of files in a GDrivefolder, use the countFiles() method:
def countFiles(folder_id):
results = DRIVE_SERVICE.files().list(q="mimeType='application/vnd.google-apps.spreadsheet' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name)").execute()
items = results.get('files', [])
count = 0
for item in items:
count += 1
return count
To determine the name of latest file uploaded in the GDrive folder, use the getLatestFile() and retrieveName() methods:
def getLatestFile(folder_id):
results = DRIVE_SERVICE.files().list(q="mimeType='application/vnd.google-apps.spreadsheet' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name, createdTime)", orderBy="createdTime").execute()
items = results.get('files', [])
if not items:
return ""
else:
return items
def retrieveName(items,count):
if not items:
return ""
else:
print(items[count-1]['name'])
return items[count-1]['name']
latestFile = retrieveName(getLatestFile(item['id']),file_count)
Use the following methods to get the upload timestamp of the latest file in a Google Drive folder:
def retrieveCreatedDate(items,count):
if not items:
return ""
else:
print(items[count-1]['createdTime'])
return items[count-1]['createdTime']
createdDate = retrieveCreatedDate(getLatestFile(item['id']),file_count)
Putting everything together results to this method:
def getAllFolders(folder_id, spreadsheet_id):
folder_id = FOLDER_ID
results = DRIVE_SERVICE.files().list(q="mimeType='application/vnd.google-apps.folder' and parents in '"+folder_id+"' and trashed = false",fields="nextPageToken, files(id, name)",pageSize=400).execute()
items = results.get('files', [])
values = []
count = 1
print(len(items))
for item in items:
print(count)
print('{0} ({1})'.format(item['name'], item['id']))
# Sales Rep
person_name = item['name'][CHAR_BEFORE_NAME:].strip()
# Number of Files
file_count = countFiles(item['id'])
latestFileId = '=HYPERLINK("https://docs.google.com/spreadsheets/d/'+retrieveId(getLatestFile(item['id']),file_count)+'","Click")'
# Latest File name and upload time
latestFile = retrieveName(getLatestFile(item['id']),file_count)
createdDate = retrieveCreatedDate(getLatestFile(item['id']),file_count)
print("----", person_name)
print("----", file_count)
print("----", createdDate)
item['person'] = person_name
item['file_count'] = file_count
item['link'] = '=HYPERLINK("https://drive.google.com/drive/folders/'+item['id']+'","Click")'
# Save to a list
values.append([item['person'],item['name'],item['link'],item['id'],item['file_count'],latestFile,latestFileId,createdDate])
count+=1
...
Next step: Writing these information to a Google spreadsheet!
Step 2: Write to Google Spreadsheet and Customize format
Write to Google Spreadsheet
def writeToSheets(values,range,spreadsheet_id):
print("Writing to Sheets")
body = {
'values': values
}
print(values)
SS_SERVICE.spreadsheets().values().update(spreadsheetId=spreadsheet_id,valueInputOption='USER_ENTERED',range=range,body=body).execute()
def getAllFolders(folder_id, spreadsheet_id):
....
writeToSheets(values,TAB_NAME,spreadsheet_id)
Customize Formatting
Google Sheet APIs also allows you to customize the formatting of a Google Spreadsheet automatically.
An example is this colorAndSort() method, which applies conditional formatting and sorts multiple columns:
def colorAndSort(spreadsheet_id):
sheet_id=SHEET_ID
my_range = {
'sheetId': sheet_id,
'startRowIndex': 1,
'endRowIndex': 600,
'startColumnIndex': 4,
'endColumnIndex': 5,
}
requests = [{
'addConditionalFormatRule': {
'rule': {
'ranges': [ my_range ],
'booleanRule': {
'condition': {
'type': 'CUSTOM_FORMULA',
'values': [ { 'userEnteredValue': '=GT($E2,1)'} ]
},
'format': {
'backgroundColor': { 'red': 1, 'green': 1, 'blue': 0 }
}
}
}
}
},
{
"sortRange": {
"range": {
"sheetId": sheet_id,
"startRowIndex": 1,
"endRowIndex": 600,
"startColumnIndex": 0,
"endColumnIndex": 9
},
"sortSpecs": [
{
"dimensionIndex": 6,
"sortOrder": "DESCENDING"
},
{
"dimensionIndex": 7,
"sortOrder": "DESCENDING"
},
{
"dimensionIndex": 0,
"sortOrder": "ASCENDING"
}
]
}
}
]
body = {
'requests': requests
}
response = SS_SERVICE.spreadsheets().batchUpdate(spreadsheetId=spreadsheet_id, body=body).execute()
print('{0} cells updated.'.format(len(response.get('replies'))));
Step 3: Run the script and check the results.
Make sure to add these methods below your script:
def main():
# Step 1: Collects information about your FOLDER_ID, Writes to SS_ID
getAllFolders(FOLDER_ID,SS_ID)
# Step 2: Applies conditional formatting and sorts your Spreadsheet (SS_ID)
colorAndSort(SS_ID)
if __name__ == '__main__':
main()
Your variables should be set also, just like this.

Once done, run your Python script and see the results in your terminal:

Feel free to copy and customize this script as you see fit!
https://gist.github.com/eyana-m/852b9630dcbadf593f72e7d18585d7c6.js
Leave a comment