zipcodeR is an all-in-one toolkit of functions and data for working with ZIP codes in R.

This document will introduce the tools provided by zipcodeR for improving your workflow when working with ZIP code-level data. The goal of these examples is to help you quickly get up and running with zipcodeR using real-world examples.

Basic search functions

First things first: zipcodeR’s data and basic search functions are core components of the package. We’ll cover these before showing you how to use the package in a real-world example.

Data

The package ships with an offline database containing 24 columns of data for each ZIP code. Depending on what data you need, you can keep all 24 variables or select only the ones you are interested in.

The columns of data provided are: zipcode, zipcode_type, major_city, post_office_city, common_city_list, county, state, lat, lng, timezone, radius_in_miles, area_code_list, population, population_density, land_area_in_sqmi, water_area_in_sqmi, housing_units, occupied_housing_units, median_home_value, median_household_income, bounds_west, bounds_east, bounds_north, bounds_south

Searching for ZIP codes by state

Let’s begin by using zipcodeR to find all ZIP codes within a given state.

Getting all ZIP codes for a single state is simple: pass a two-letter state abbreviation to search_state() to get a tibble of all ZIP codes in that state. Let’s start by finding all of the ZIP codes in New York:

search_state('NY')

## # A tibble: 2,208 × 24
##    zipcode zipcode_type major_city   post_office_city common_city_list county   
##    <chr>   <chr>        <chr>        <chr>                      <blob> <chr>    
##  1 00501   Unique       Holtsville   <NA>                   <raw 22 B> Suffolk …
##  2 00544   Unique       Holtsville   <NA>                   <raw 22 B> Suffolk …
##  3 06390   PO Box       Fishers Isl… Fishers Island,…       <raw 32 B> Suffolk …
##  4 10001   Standard     New York     New York, NY           <raw 20 B> New York…
##  5 10002   Standard     New York     New York, NY           <raw 34 B> New York…
##  6 10003   Standard     New York     New York, NY           <raw 20 B> New York…
##  7 10004   Standard     New York     New York, NY           <raw 37 B> New York…
##  8 10005   Standard     New York     New York, NY           <raw 35 B> New York…
##  9 10006   Standard     New York     New York, NY           <raw 31 B> New York…
## 10 10007   Standard     New York     New York, NY           <raw 20 B> New York…
## # … with 2,198 more rows, and 18 more variables: state <chr>, lat <dbl>,
## #   lng <dbl>, timezone <chr>, radius_in_miles <dbl>, area_code_list <blob>,
## #   population <int>, population_density <dbl>, land_area_in_sqmi <dbl>,
## #   water_area_in_sqmi <dbl>, housing_units <int>,
## #   occupied_housing_units <int>, median_home_value <int>,
## #   median_household_income <int>, bounds_west <dbl>, bounds_east <dbl>,
## #   bounds_north <dbl>, bounds_south <dbl>

What if you only wanted the actual ZIP codes and no other variables? You can use R’s dollar sign operator to select one column at a time from the output of zipcodeR’s search functions:

nyzip <- search_state('NY')$zipcode
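The dollar sign returns one column as a plain character vector. When you need a few columns rather than one or all 24, ordinary bracket indexing with names from the column list above keeps the result as a tibble. This is a minimal sketch that assumes zipcodeR is installed. The object names `ny` and `ny_subset` and the choice of columns are only illustrative.

```r
library(zipcodeR)

# All New York ZIP codes, as in the example above
ny <- search_state('NY')

# `$` extracts a single column as a plain character vector
nyzip <- ny$zipcode

# Bracket indexing with several column names keeps the result a tibble
ny_subset <- ny[, c("zipcode", "major_city", "county", "population")]

head(ny_subset)
```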

Searching multiple states at once

You can also search for ZIP codes in multiple states at once by passing a vector of state abbreviations to the same search_state() function:

states <- c('NY','NJ','CT')

search_state(states)
## # A tibble: 3,378 × 24
##    zipcode zipcode_type major_city  post_office_city  common_city_list county   
##    <chr>   <chr>        <chr>       <chr>                       <blob> <chr>    
##  1 06001   Standard     Avon        Avon, CT                <raw 16 B> Hartford…
##  2 06002   Standard     Bloomfield  Bloomfield, CT          <raw 22 B> Hartford…
##  3 06006   Unique       Windsor     <NA>                    <raw 19 B> Hartford…
##  4 06010   Standard     Bristol     Bristol, CT             <raw 19 B> Hartford…
##  5 06011   PO Box       Bristol     <NA>                    <raw 19 B> Hartford…
##  6 06013   Standard     Burlington  Burlington, CT          <raw 36 B> Hartford…
##  7 06016   Standard     Broad Brook Broad Brook, CT         <raw 46 B> Hartford…
##  8 06018   Standard     Canaan      Canaan, CT              <raw 18 B> Litchfie…
##  9 06019   Standard     Canton      Canton, CT              <raw 34 B> Hartford…
## 10 06020   Standard     Canton Cen… Canton Center, CT       <raw 25 B> Hartford…
## # … with 3,368 more rows, and 18 more variables: state <chr>, lat <dbl>,
## #   lng <dbl>, timezone <chr>, radius_in_miles <dbl>, area_code_list <blob>,
## #   population <int>, population_density <dbl>, land_area_in_sqmi <dbl>,
## #   water_area_in_sqmi <dbl>, housing_units <int>,
## #   occupied_housing_units <int>, median_home_value <int>,
## #   median_household_income <int>, bounds_west <dbl>, bounds_east <dbl>,
## #   bounds_north <dbl>, bounds_south <dbl>

This returns a single tibble containing all ZIP codes for the states passed to search_state().
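The combined tibble keeps the state column, so you can check how the rows are distributed across the states you requested. Here is a short sketch using base R’s table(). It assumes zipcodeR is installed and that the state column holds the same two-letter abbreviations you pass in.

```r
library(zipcodeR)

tri_state <- search_state(c('NY', 'NJ', 'CT'))

# Count how many of the returned ZIP codes fall in each state
zip_counts <- table(tri_state$state)
zip_counts
```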

Searching by county

It is also possible to search for ZIP codes located in a particular county within a state.

Let’s find all of the ZIP codes located within Ocean County, New Jersey:

search_county('Ocean','NJ')
## # A tibble: 32 × 24
##    zipcode zipcode_type major_city   post_office_city  common_city_list county  
##    <chr>   <chr>        <chr>        <chr>                       <blob> <chr>   
##  1 08005   Standard     Barnegat     Barnegat, NJ            <raw 20 B> Ocean C…
##  2 08006   PO Box       Barnegat Li… Barnegat Light, …       <raw 33 B> Ocean C…
##  3 08008   Standard     Beach Haven  Beach Haven, NJ         <raw 61 B> Ocean C…
##  4 08050   Standard     Manahawkin   Manahawkin, NJ          <raw 47 B> Ocean C…
##  5 08087   Standard     Tuckerton    Tuckerton, NJ           <raw 51 B> Ocean C…
##  6 08092   Standard     West Creek   West Creek, NJ          <raw 22 B> Ocean C…
##  7 08527   Standard     Jackson      Jackson, NJ             <raw 19 B> Ocean C…
##  8 08533   Standard     New Egypt    New Egypt, NJ           <raw 21 B> Ocean C…
##  9 08701   Standard     Lakewood     Lakewood, NJ            <raw 20 B> Ocean C…
## 10 08721   Standard     Bayville     Bayville, NJ            <raw 20 B> Ocean C…
## # … with 22 more rows, and 18 more variables: state <chr>, lat <dbl>,
## #   lng <dbl>, timezone <chr>, radius_in_miles <dbl>, area_code_list <blob>,
## #   population <int>, population_density <dbl>, land_area_in_sqmi <dbl>,
## #   water_area_in_sqmi <dbl>, housing_units <int>,
## #   occupied_housing_units <int>, median_home_value <int>,
## #   median_household_income <int>, bounds_west <dbl>, bounds_east <dbl>,
## #   bounds_north <dbl>, bounds_south <dbl>

Approximate matching of county names

County names can be messy, and the name in your data may not exactly match the one in zipcodeR’s database. For these cases, search_county() can use base R’s agrep() function for approximate matching through its optional similar parameter.

One example where this feature is useful comes from Louisiana. Louisiana is divided into parishes rather than counties, so its county names don’t line up exactly with how other states name their counties.

This example uses approximate matching to retrieve all ZIP codes for St. Bernard Parish, Louisiana:

search_county("ST BERNARD","LA", similar = TRUE)$zipcode
## [1] "70032" "70043" "70044" "70075" "70085" "70092"

If you run the code above with similar set to FALSE, or leave the parameter out, you’ll receive an error instead.
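To see why approximate matching helps, the base R sketch below runs agrep() directly. This is not zipcodeR’s internal code, and the county strings are hypothetical values written in the style of a county column. The sketch only shows how an exact comparison fails while agrep() still finds the intended name. By default, agrep() allows a small number of edits (max.distance = 0.1, a fraction of the pattern length) and matches against substrings.

```r
# Hypothetical county names, written the way a database might store them
counties <- c("St. Bernard Parish", "St. Tammany Parish", "Orleans Parish")

# Exact comparison finds nothing: the query lacks the period and "Parish"
exact <- which(counties == "ST BERNARD")

# agrep() tolerates one edit here (10% of a 10-character pattern, rounded up)
# and searches within each string, so "St. Bernard Parish" matches
approx <- agrep("ST BERNARD", counties, ignore.case = TRUE, value = TRUE)

exact   # integer(0)
approx  # "St. Bernard Parish"
```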

Finding out more about your ZIP codes

What if you already have a dataset containing ZIP codes and want to find out more about that particular area?

Using the reverse_zipcode() function, we can retrieve the full 24-column record from zipcodeR’s database for a given ZIP code.

Data: U.S. Real Estate Market

To explore how zipcodeR can enhance your data & workflow, we will use a public dataset from the National Association of Realtors containing data about housing market trends in the United States.

This dataset, which is updated monthly and hosted on Amazon S3, contains 10,833 observations of current housing market data.

This is what the data we will be working with looks like:

head(real_estate_data)
## # A tibble: 6 × 40
##   month_date_yyyymm postal_code zip_name          flag  median_listing_… median_listing_…
##   <chr>             <chr>       <chr>             <chr>            <dbl>            <dbl>
## 1 202108            47803       terre haute, in   <NA>           219990            0.0732
## 2 202108            48617       clare, mi         <NA>           187775            0.0438
## 3 202108            98902       yakima, wa        *              246500           -0.0592
## 4 202108            32112       crescent city, fl <NA>           280000           -0.0175
## 5 202108            11764       miller place, ny  <NA>           607442.          -0.0295
## 6 202108            48317       utica, mi         <NA>           257400            0.03  
## # … with 34 more variables: median_listing_price_yy <dbl>,
## #   active_listing_count <dbl>, active_listing_count_mm <dbl>,
## #   active_listing_count_yy <dbl>, median_days_on_market <dbl>,
## #   median_days_on_market_mm <dbl>, median_days_on_market_yy <dbl>,
## #   new_listing_count <dbl>, new_listing_count_mm <dbl>,
## #   new_listing_count_yy <dbl>, price_increased_count <dbl>,
## #   price_increased_count_mm <dbl>, price_increased_count_yy <dbl>, …

Note: The data used in this vignette was filtered to only include valid 5-digit ZIP codes as zipcodeR does not yet have a function for normalizing ZIP codes. The full Realtor dataset will have a different number of rows.
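If your own data needs the same cleanup, the sketch below shows one way to coerce raw ZIP codes into valid 5-digit strings in base R. normalize_zip5() is a hypothetical helper written for this example. It is not a zipcodeR function, and its rules are assumptions:

* ZIP+4 codes are truncated to their first five digits.
* 3- and 4-digit values are treated as ZIP codes that lost their leading zeros, for example by being stored as numbers.
* Anything else becomes NA.

```r
# Hypothetical helper (not part of zipcodeR): coerce raw ZIP codes to
# 5-character strings, returning NA for values that cannot be repaired
normalize_zip5 <- function(x) {
  x <- trimws(as.character(x))
  # Keep only the 5-digit part of ZIP+4 codes such as "47803-1234"
  x <- sub("^([0-9]{5})-[0-9]{4}$", "\\1", x)
  # Restore leading zeros lost when ZIP codes were stored as numbers
  short <- grepl("^[0-9]{3,4}$", x)
  x[short] <- formatC(as.integer(x[short]), width = 5, flag = "0")
  # Anything that is still not exactly 5 digits is treated as invalid
  x[!grepl("^[0-9]{5}$", x)] <- NA_character_
  x
}

raw <- c("47803", "501", "47803-1234", 6390, "ABCDE", "123456")
normalize_zip5(raw)
# "47803" "00501" "47803" "06390" NA NA
```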

We’ll focus on the first row for now, which represents Terre Haute, Indiana.

real_estate_data[1,]
## # A tibble: 1 × 40
##   month_date_yyyymm postal_code zip_name        flag  median_listing_… median_listing_…
##   <chr>             <chr>       <chr>           <chr>            <dbl>            <dbl>
## 1 202108            47803       terre haute, in <NA>            219990           0.0732
## # … with 34 more variables: median_listing_price_yy <dbl>,
## #   active_listing_count <dbl>, active_listing_count_mm <dbl>,
## #   active_listing_count_yy <dbl>, median_days_on_market <dbl>,
## #   median_days_on_market_mm <dbl>, median_days_on_market_yy <dbl>,
## #   new_listing_count <dbl>, new_listing_count_mm <dbl>,
## #   new_listing_count_yy <dbl>, price_increased_count <dbl>,
## #   price_increased_count_mm <dbl>, price_increased_count_yy <dbl>, …

The Realtor dataset contains a column named postal_code containing the ZIP code that identifies the town. We’ll use this to find out more about Terre Haute than what is provided in the housing market data.
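Here is a sketch of that lookup with reverse_zipcode(). To keep it self-contained, it hard-codes "47803", the postal_code value shown in the first row above. With the Realtor data loaded, you would take the value from real_estate_data[1, ]$postal_code instead. The sketch assumes zipcodeR is installed, and the columns shown are an arbitrary selection.

```r
library(zipcodeR)

# ZIP code from the postal_code column of the first row (Terre Haute, IN).
# With the Realtor data loaded: zip_code <- real_estate_data[1, ]$postal_code
zip_code <- "47803"

# Retrieve zipcodeR's record for this ZIP code
terre_haute <- reverse_zipcode(zip_code)

# Show a few of the 24 columns that complement the housing market data
terre_haute[, c("zipcode", "major_city", "county", "state",
                "population", "median_household_income")]
```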

Relating ZIP codes to Census data

You may also be interested in relating data at the ZIP code level to Census data. zipcodeR currently provides a function for getting all Census tracts when provided with a 5-digit ZIP code.

Let’s find out how many Census tracts are in the ZIP code from the previous example.

zip_code <- real_estate_data[1,]$postal_code
get_tracts(zip_code)
## # A tibble: 11 × 3
##    ZCTA5 TRACT        GEOID
##    <chr> <chr>        <dbl>
##  1 47803 000400 18167000400
##  2 47803 000700 18167000700
##  3 47803 001300 18167001300
##  4 47803 001400 18167001400
##  5 47803 001500 18167001500
##  6 47803 001600 18167001600
##  7 47803 001700 18167001700
##  8 47803 010100 18167010100
##  9 47803 010600 18167010600
## 10 47803 010701 18167010701
## 11 47803 010702 18167010702

Now that you have all of the tracts for this ZIP code, you can join them with other tract-level Census data, such as estimates from the American Community Survey.
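In the output above, GEOID is stored as a double (<dbl>). A tract GEOID is an 11-digit identifier with three parts: a 2-digit state FIPS code, a 3-digit county FIPS code, and a 6-digit tract code. For 47803, the prefix 18167 is Indiana (18) and Vigo County (167). Storing a GEOID as a number drops the leading zero for states whose FIPS code begins with 0, so it is safer to rebuild the 11-character string before joining. This minimal base R sketch uses three GEOIDs copied from the output above. The new column names are illustrative.

```r
# First three rows of the get_tracts() output above
tracts <- data.frame(
  ZCTA5 = "47803",
  TRACT = c("000400", "000700", "001300"),
  GEOID = c(18167000400, 18167000700, 18167001300),
  stringsAsFactors = FALSE
)

# Format the numeric GEOID as an 11-character string, padding with zeros
tracts$GEOID_chr <- sprintf("%011.0f", tracts$GEOID)

# Split it into state (2), county (3) and tract (6) components
tracts$state_fips  <- substr(tracts$GEOID_chr, 1, 2)
tracts$county_fips <- substr(tracts$GEOID_chr, 3, 5)
tracts$tract_code  <- substr(tracts$GEOID_chr, 6, 11)

tracts
```

The GEOID_chr column can then serve as the key for merge() or dplyr::left_join() against tract-level tables that store GEOID as an 11-character string.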

On their own, however, ZIP codes are of limited use for social science research because they only represent USPS service areas. The Census Bureau has established ZIP Code Tabulation Areas (ZCTAs), which approximate ZIP code service areas and can be joined with Census data. But not every ZIP code is also a ZCTA.

Testing if a ZIP code is a ZCTA

zipcodeR provides a function for testing whether a given ZIP code is also a ZIP Code Tabulation Area. When provided with a vector of 5-digit ZIP codes, is_zcta() returns TRUE or FALSE depending on whether the ZIP code is also a ZCTA.

is_zcta(zip_code)
## [1] TRUE
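A common next step is to keep only the ZIP codes in a dataset that are also ZCTAs. The sketch below calls is_zcta() once per ZIP code through vapply(), which guarantees exactly one TRUE or FALSE per input, and then uses that logical vector to subset. It assumes zipcodeR is installed. The sample ZIP codes come from earlier in this document.

```r
library(zipcodeR)

zips <- c("47803", "00501", "10001")

# One TRUE/FALSE per ZIP code, named by the input value
zcta_flags <- vapply(zips, is_zcta, logical(1))

# Keep only the ZIP codes that can be joined to ZCTA-level Census data
zips[zcta_flags]
```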