Anyone can google the nearest airport to a given area. But can you do that for more than 40,000 locations? Here's where Python can help. The sample data covers all airports and zipcodes in the United States.

Python is useful not just for building web applications and automating workflows, but also for gaining business insights.

I work in analytics for Sales. Part of the job is to help sales teams find strategic locations to focus their selling energies on.

One criterion of our search is accessibility: can a sales rep travel to a target location with ease? To find out, I calculated the nearest airport to each US zipcode.

The idea is: the location is more accessible if there's an airport nearby.



I created a Python script that calculates the nearest airports of all 40,943 US zipcodes using airport and zipcode data that are available for public use.

I used the Haversine formula to calculate the distance between each zipcode and each airport.

Data Sources

  1. World Airports - I got my data from
  2. Zipcodes - This dataset is the most recent one I found: AggData.com


Here's the Git repository: Github
But to understand my logic, read on...

Required Python Packages

import os  
import numpy as np  
import pandas as pd  
from math import cos, asin, sqrt  
import csv  
from pathlib import Path  
from timeit import default_timer as timer  
from datetime import datetime

I mainly use numpy and pandas to clean and filter the data sources. The math package handles the actual calculation; os, csv and pathlib are used to write the output; timeit and datetime are used for logging.


Step 1: Clean Data Sources

For data cleaning, I used pandas to do the following:

  1. Create two dataframes, one for each data source
  2. Remove unnecessary columns
  3. Only include large, medium and small airports that are in the United States (Assigned the filtered result to another dataframe)
  4. Create a new column derived from an existing one (iso_state, taken from iso_region).

df_airports = pd.read_csv(FILE_AIRPORTS, encoding="ISO-8859-1")
columns_to_drop = ['elevation_ft', 'scheduled_service', 'gps_code',
       'home_link', 'wikipedia_link', 'keywords', 'score']
df_airports.drop(columns_to_drop, axis=1, inplace=True)

df_airports_filter = df_airports[(df_airports['iso_country']=='US') & (df_airports['type'].isin(['large_airport','medium_airport','small_airport']))]

df_airports_filter = df_airports_filter.copy()  
df_airports_filter.loc[:,'iso_state'] = df_airports_filter['iso_region'].str.split('-').str[1]

df_zipcodes = pd.read_csv(FILE_ZIPCODES,encoding = "ISO-8859-1")

I also created helper functions to retrieve the data I will need later:

def getAllStates():  
    return df_airports_filter['iso_state'].unique()

def getAirports(state):  
    df = df_airports_filter[df_airports_filter['iso_state']==state]
    return df.to_dict('records')

def getZipcodes(state):  
    df = df_zipcodes[(df_zipcodes['State Abbreviation']==state)]
    return df.to_dict('records')

def getInfo(state):  
    print(len(getAirports(state)), "airports in", state)
    print(len(getZipcodes(state)), "zipcodes in", state)
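
The cleaning and lookup steps above can be tried on a toy table before running them on the full files. This is a minimal sketch: the four rows and their idents (X001 to X004) are invented for illustration and are not taken from the real airports dataset; only the column names match the ones used in this post.

```python
import pandas as pd

# Toy stand-in for the airports file; every row here is made up.
df = pd.DataFrame({
    "ident": ["X001", "X002", "X003", "X004"],
    "type": ["small_airport", "large_airport", "heliport", "large_airport"],
    "iso_country": ["US", "US", "US", "CA"],
    "iso_region": ["US-CA", "US-NY", "US-CA", "CA-BC"],
})

# Keep only US airports of the three wanted types.
keep_types = ["large_airport", "medium_airport", "small_airport"]
df_us = df[(df["iso_country"] == "US") & (df["type"].isin(keep_types))].copy()

# "US-CA" -> "CA": take the part after the dash.
df_us["iso_state"] = df_us["iso_region"].str.split("-").str[1]

# One dict per row, which is the shape the later loops iterate over.
records = df_us[df_us["iso_state"] == "CA"].to_dict("records")
print(records)
```

The heliport and the non-US row are filtered out, so only one record remains for CA.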

Step 2: Calculate Nearest Airport by Zipcode

The Haversine formula is one way of calculating the distance between two points on a sphere: here, the latitude-longitude pairs of the zipcode and the airport.
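
Written out, the variant of the formula used in this step looks as follows. The symbols are notation chosen here for illustration: φ is latitude and λ is longitude, both in radians, and R is the Earth's mean radius (about 6,371 km, which gives the 12,742 km factor for 2R).

```latex
a = \frac{1 - \cos(\varphi_2 - \varphi_1)}{2}
  + \cos\varphi_1 \,\cos\varphi_2 \,\frac{1 - \cos(\lambda_2 - \lambda_1)}{2},
\qquad
d = 2R \,\arcsin\!\left(\sqrt{a}\right)
```

Since (1 - cos x)/2 equals sin²(x/2), this is equivalent to the more commonly quoted sine-squared form of the formula.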

def distance(lat1, lon1, lat2, lon2):  
    p = 0.017453292519943295  #Pi/180
    a = 0.5 - cos((lat2-lat1)*p)/2 + cos(lat1*p)*cos(lat2*p) * (1-cos((lon2-lon1)*p)) / 2
    return 12742 * asin(sqrt(a)) #2*R*asin..
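
A quick way to gain confidence in the function is to feed it inputs with a known answer. The sketch below repeats the function so it runs on its own; the test coordinates are arbitrary values picked for illustration. It checks that a point is zero kilometers from itself and that one degree of latitude comes out at roughly 111 km.

```python
from math import cos, asin, sqrt

def distance(lat1, lon1, lat2, lon2):
    p = 0.017453292519943295  # pi / 180, converts degrees to radians
    a = (0.5 - cos((lat2 - lat1) * p) / 2
         + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
    return 12742 * asin(sqrt(a))  # 2 * R * asin(sqrt(a)), R = 6371 km

# Same point twice: the distance should be zero.
same_point = distance(34.0, -118.0, 34.0, -118.0)

# One degree of latitude apart, same longitude.
one_degree_lat = distance(34.0, -118.0, 35.0, -118.0)

print(same_point)
print(round(one_degree_lat, 1))
```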

There are 14,693 airports in the US. For each zipcode, the script calculates the distance to every airport in the same state, as the loop in Step 3 shows. To get the nearest airport, here's the function:

def closest(data, zipcode):
    dl = []
    for p in data:
        ap = {
            'zipcode': zipcode['Zip Code'],
            'country': zipcode['Country'],
            'state': zipcode['State Abbreviation'],
            'state_full': zipcode['State'],
            'county': zipcode['County'],
            'latitude-zip': zipcode['Latitude'],
            'longitude-zip': zipcode['Longitude'],
            'nearest-airport': p['ident'],
            'latitude-air': p['latitude_deg'],
            'longitude-air': p['longitude_deg'],
            'distance': distance(zipcode['Latitude'], zipcode['Longitude'], p['latitude_deg'], p['longitude_deg'])
        }
        dl.append(ap)
    # Sort ascending so the nearest airport comes first
    dl_sorted = sorted(dl, key=lambda k: k['distance'])

    # Save every airport distance for this zipcode, for validation
    writeZipsToCSV(dl_sorted, zipcode['State Abbreviation'], zipcode['Zip Code'])
    return dl_sorted[0]

The closest function returns the entry with the shortest distance (return dl_sorted[0]).
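
If only the nearest airport were needed, without the full sorted list, the sort could be swapped for Python's built-in min(). This is a sketch of that alternative, not the approach the script takes: the nearest function is a hypothetical name, and the two airports and the zipcode use the coordinates from the sample data at the end of this post.

```python
from math import cos, asin, sqrt

def distance(lat1, lon1, lat2, lon2):
    p = 0.017453292519943295  # pi / 180
    a = (0.5 - cos((lat2 - lat1) * p) / 2
         + cos(lat1 * p) * cos(lat2 * p) * (1 - cos((lon2 - lon1) * p)) / 2)
    return 12742 * asin(sqrt(a))

def nearest(airports, zipcode):
    """Return (airport, distance_km) for the closest airport, without sorting."""
    pairs = (
        (ap, distance(zipcode["Latitude"], zipcode["Longitude"],
                      ap["latitude_deg"], ap["longitude_deg"]))
        for ap in airports
    )
    # min() makes a single pass and keeps only the smallest distance.
    return min(pairs, key=lambda pair: pair[1])

zipcode = {"Zip Code": 90210, "Latitude": 34.0901, "Longitude": -118.4065}
airports = [
    {"ident": "KBUR", "latitude_deg": 34.20069885, "longitude_deg": -118.3590012},
    {"ident": "KSMO", "latitude_deg": 34.01580048, "longitude_deg": -118.4509964},
]

airport, km = nearest(airports, zipcode)
print(airport["ident"], round(km, 2))
```

The trade-off is that this version no longer produces the full ranking, which the script relies on for its per-zipcode validation files.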

writeZipsToCSV(dl_sorted,zipcode['State Abbreviation'],zipcode['Zip Code'])

To validate my assumption, I also opted to write all the calculated airport distances for each zipcode to a CSV file.

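def writeZipsToCSV(dl_sorted, state, zipcode):
    output_folder = "Output/" + state + "/"
    # Create the state folder on first use
    if not os.path.exists(output_folder):
        os.makedirs(output_folder)

    with open(output_folder + str(zipcode) + "_all airports.csv", "w", newline="") as csv_file:
        dict_writer = csv.DictWriter(csv_file, dl_sorted[0].keys())
        dict_writer.writeheader()
        dict_writer.writerows(dl_sorted)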

Step 3: Loop calculation for all zipcodes in a state

For each state, the script will do the following:

  1. Retrieve its zipcodes from the declared dataframe: zipcodes = getZipcodes(state)
  2. Retrieve the nearest airport for each zipcode as a dictionary: dicts.append(closest(getAirports(state), zc))
  3. Write these dictionaries to a CSV file.

entries = 0
i = datetime.now()
timestamp = i.strftime('%Y-%m%d-')

def calculateNearestAirport(state):
    global entries
    zipcodes = getZipcodes(state)
    dicts = []
    print("Calculating for", state, "with", len(zipcodes), "zipcodes...")
    for zc in zipcodes:
        dicts.append(closest(getAirports(state), zc))

    # One row per zipcode, holding its nearest airport
    with open("Output/" + timestamp + state + "_nearest_airport.csv", "w", newline="") as csv_file:
        dict_writer = csv.DictWriter(csv_file, dicts[0].keys())
        dict_writer.writeheader()
        dict_writer.writerows(dicts)

    entries = entries + len(zipcodes)
    print("Done calculating for", len(zipcodes), "zipcodes of", state)
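
To see what the output filename looks like, the timestamp format can be previewed with a fixed date. The date below is hypothetical and chosen only so the result is reproducible; the script itself stamps files with the date of the run.

```python
from datetime import datetime

# Hypothetical fixed date, used only to make the example reproducible.
fixed = datetime(2019, 3, 15)

# Same format string as in the script: year, dash, then month and day together.
timestamp = fixed.strftime('%Y-%m%d-')
filename = "Output/" + timestamp + "CA" + "_nearest_airport.csv"
print(filename)  # Output/2019-0315-CA_nearest_airport.csv
```

If a dash between month and day is preferred, the format string '%Y-%m-%d-' would give 2019-03-15- instead.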

Step 4: Run script in terminal

In the first line, I defined which states to include in the calculation. For this example, the script only includes the first entry, which is California.

I also included the timer() function to measure performance.

states_scope = getAllStates()[0:1]

perf_time = []

if __name__ == "__main__":
    start = timer()
    print("Calculating for the following states: ")
    for x in states_scope:
        print(x)
    for state in states_scope:
        start_state = timer()
        calculateNearestAirport(state)
        end_state = timer()
        diff = (end_state - start_state)
        # Record the run time per state, in minutes
        perf_time.append({
            'state': state,
            'duration': round(diff / 60, 3)
        })

    end = timer()
    print(round((end - start) / 60, 3), "minutes")
    print(len(perf_time), "states")
    print(entries, "zipcodes")
    for k in perf_time:
        print(k['state'], k['duration'], "minutes")

Here's what it looks like in the terminal:

It took the script around 1 minute to calculate for 1 state with 2590 zipcodes. Not bad, compared to googling those zipcodes one by one!


Finale: Code base

Remember to update the inputs to your folder destination:


Sample Data: Beverly Hills, LA

Now, let's apply the script to actual business data.

Out of 558 airports in California, what's the nearest airport to Beverly Hills, LA?

zipcode  country  state  state_full  county       latitude-zip  longitude-zip
90210    US       CA     California  Los Angeles  34.0901       -118.4065

Based on the script, the nearest airport to 90210 is: Santa Monica Municipal Airport (KSMO)

nearest-airport  latitude-air  longitude-air  distance (km)
KSMO             34.01580048   -118.4509964   9.222835746

Let's validate the model by plotting in Google Maps:

Nearest Airport

  • The black line shows the distance measured in Google Maps from 90210 to the airport: 9.20 km, which is close to the calculated 9.22 km!

  • Note that the formula doesn't consider the actual roads in the location. Haversine simply calculates the distance from point A to point B.

Now, here's the second nearest airport: Bob Hope Airport (KBUR)

nearest-airport  latitude-air  longitude-air  distance (km)
KBUR             34.20069885   -118.3590012   13.05176636

Nearest Airport

  • Distance based on Haversine: 13.05 km
  • Distance based on Google Maps: 13.03 km
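
To put a number on how close the two measurements are, the relative difference can be computed directly from the figures quoted above. This is a small arithmetic sketch; the variable names are chosen here for illustration.

```python
# (Haversine result in km, Google Maps measurement in km), as quoted in this post.
checks = {
    "KSMO": (9.222835746, 9.20),
    "KBUR": (13.05176636, 13.03),
}

rel_diff = {}
for ident, (haversine_km, maps_km) in checks.items():
    # Difference as a percentage of the Google Maps measurement.
    rel_diff[ident] = abs(haversine_km - maps_km) / maps_km * 100
    print(ident, round(rel_diff[ident], 2), "%")
```

Both differences come out well under half a percent.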