GeoJSON and KML data for the United States
I had a devil of a time finding simple GeoJSON and KML boundary files for US counties and states. Eventually I realized that I could get shapefiles from the United States Census Cartographic Boundary Files and convert them to GeoJSON and KML formats using the MyGeoData vector converter.
The result is the following set of boundary files. Since copyright protection is not available for any work of the United States Government, these files should be free to use for any purpose. The Census Bureau does request to be cited as the source.
These files are available in various resolutions and are all derived from the 2010 census. The 500k files are the most detailed, but also the largest. The 20m files are the smallest, but at the cost of some dramatic simplification. The 5m files fall somewhere between the other two.
| | 500k | 5m | 20m |
|---|---|---|---|
| US Outline | SHP, KML, GeoJSON | SHP, KML, GeoJSON | SHP, KML, GeoJSON |
| US States | SHP, KML, GeoJSON | SHP, KML, GeoJSON | SHP, KML, GeoJSON |
| US Counties | SHP, KML, GeoJSON | SHP, KML, GeoJSON | SHP, KML, GeoJSON |
| US Congressional (see note) | SHP, KML, GeoJSON | — | SHP, KML, GeoJSON |
You can also look at this example of how to use the files.
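If you just want a quick feel for the structure of these files, the sketch below parses a GeoJSON FeatureCollection shaped like the converted Census files and pulls out feature names. The feature itself is a toy (the `NAME`/`GEO_ID` properties follow the Census attribute names, but the polygon is illustrative, not real boundary data):

```python
import json

# A minimal FeatureCollection in the same shape as the converted Census
# files. The properties mirror the Census attribute names; the geometry
# is a toy polygon, not a real boundary.
sample = '''{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"GEO_ID": "0400000US13", "NAME": "Georgia"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[-85.0, 35.0], [-81.0, 35.0],
                                   [-81.0, 30.7], [-85.0, 30.7],
                                   [-85.0, 35.0]]]}}
  ]
}'''

collection = json.loads(sample)
names = [f["properties"]["NAME"] for f in collection["features"]]
print(names)  # ['Georgia']
```

To read one of the downloaded files instead, replace `json.loads(sample)` with `json.load(open(path))` (and see the encoding note from Nick C below).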
I received the following warning from Carole MacDonald in October 2020:
The datasets are slightly out of date. Wade Hampton Census Area, AK (0500000US02270) is now Kusilvak Census Area, and Shannon County, SD is now Oglala Lakota County (0500000US46102).
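If you are joining these 2010-era files against current data, a small remap of the two renamed counties avoids mismatches. A sketch (both renames took effect in 2015; the mapping below is by name only, and the FIPS/GEO_ID codes changed as well, so verify those against the Census Bureau's change notices):

```python
# The two county renames since the 2010 vintage of these files.
RENAMES = {
    "Wade Hampton": "Kusilvak",    # AK: Wade Hampton Census Area -> Kusilvak Census Area
    "Shannon": "Oglala Lakota",    # SD: Shannon County -> Oglala Lakota County
}

def current_name(name_2010):
    """Return the current name for a 2010-era county name."""
    return RENAMES.get(name_2010, name_2010)

print(current_name("Shannon"))  # Oglala Lakota
print(current_name("Cook"))     # Cook (unchanged)
```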
I received the following notebook from Ali Ebrahim in April 2020:
Thanks for your helpful guide. Unfortunately, I wasn’t able to parse the counties file due to the encoding issue mentioned below, so I wrote a Colab notebook that will also do the parsing.
I received an update of the link to the US Census Bureau files from Dan Raney in January 2020. The original link (https://www.census.gov/geo/maps-data/data/tiger-cart-boundary.html) now leads to a 404. Thanks, Dan!
I received the following note from Nick C in November 2019:
First of all, thank you! Thank you for providing the GeoJSON files for the US (states/counties). I am using them for my course project and they have saved me a ton of time!
As a courtesy, I did want to let you know that it appears the “US Counties” 500k GeoJSON file might have an incorrect encoding… when trying to load the file using the geopandas “read_file()” function I get the following error:
```
Traceback (most recent call last):
  File "shapely_scratch.py", line 25, in <module>
    us_county = gpd.read_file(us_county_path, driver='GeoJSON')
  File "/home/christnp/.local/lib/python3.6/site-packages/geopandas/io/file.py", line 95, in read_file
    gdf = GeoDataFrame.from_features(f_filt, crs=crs, columns=columns)
  File "/home/christnp/.local/lib/python3.6/site-packages/geopandas/geodataframe.py", line 294, in from_features
    for f in features_lst:
  File "fiona/ogrext.pyx", line 1369, in fiona.ogrext.Iterator.__next__
  File "fiona/ogrext.pyx", line 232, in fiona.ogrext.FeatureBuilder.build
TypeError: startswith first arg must be bytes or a tuple of bytes, not str
```
Upon further investigation (i.e., using the json package to load the file), we can see that it is an encoding issue:
```
>>> json.load(open(us_county_path))
Traceback (most recent call last):
  File "shapely_scratch.py", line 24, in <module>
    json.load(open(us_county_path))
  File "/usr/lib/python3.6/json/__init__.py", line 296, in load
    return loads(fp.read(),
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf1 in position 13337350: invalid continuation byte
```
For now, to overcome this error, I simply added the following in a try/except:
```python
import json
import os
import geopandas as gpd

# Read the file as Latin-1, then rewrite it as UTF-8 so GeoPandas can load it.
cur_json = json.load(open(us_county_path, encoding='ISO-8859-1'))
path, ext = os.path.splitext(us_county_path)
new_path = path + "_new" + ext
with open(new_path, "w", encoding='utf-8') as jsonfile:
    json.dump(cur_json, jsonfile, ensure_ascii=False)
us_county = gpd.read_file(new_path, driver='GeoJSON')
```
Hope this is helpful :-)
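An alternative to Nick's Python round-trip is to re-encode the whole file once with `iconv`. The snippet below simulates the issue on a tiny sample rather than the real download: byte 0xf1 is ñ in ISO-8859-1, which is likely what tripped the decoder (a county name such as Doña Ana). Applied to the real file it would be something like `iconv -f ISO-8859-1 -t UTF-8 gz_2010_us_050_00_500k.json > counties_utf8.json` (filename assumed; use whatever you downloaded).

```shell
# Simulate the encoding problem: write "Doña Ana" in ISO-8859-1
# (octal \361 = byte 0xf1, the ñ), then re-encode the file to UTF-8.
printf 'Do\361a Ana' > latin1_sample.txt
iconv -f ISO-8859-1 -t UTF-8 latin1_sample.txt > utf8_sample.txt
cat utf8_sample.txt   # prints: Doña Ana
```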
I received the following note from Peter T in February 2016:
I had to do a little housekeeping before JSON.parse() would correctly parse the data I’m using (with up-to-date Safari), even though the file passed a JSON validator. I want to pass on what I did. I’m using only data for the state of Georgia from the 20m file. I did two things:
- removed new-line characters, \n, between each of the counties, and
- removed several (around twenty) extra sets of square brackets, [ ], within the county coordinates vectors.
JSON.parse() now seems to work fine on the Georgia data. The attached text file (CleanGeorgia.txt) is the cleaned version that I am using.
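The first of Peter's two cleanup steps can also be done mechanically: round-tripping the file through a JSON parser re-serializes it with normalized whitespace, removing stray newlines between features. It will not remove genuinely extra coordinate brackets, since those change the geometry and need manual inspection, as Peter did. A minimal sketch on an inline sample:

```python
import json

# Re-serializing normalizes whitespace (here, the embedded newline) and
# guarantees strictly valid JSON output.
raw = '{"type": "FeatureCollection",\n "features": []}\n'
clean = json.dumps(json.loads(raw))
print(clean)  # {"type": "FeatureCollection", "features": []}
```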
I received the following pointer to an exciting new tool in July 2015:
Your page helped me. Also, Matthew Bloch is putting a lot of effort into his Mapshaper program. I was able to use that to extract State boundaries from the USGS data for US and also to drop all the islands I was not interested in.
I received the following observation from MR in November 2013, so stay alert. I make no promises about the accuracy of these files; I just used the conversion tools listed above.
Was using these, gratefully, but just noticed that California districts are not accurate for the 113th Congress. For instance, in gz_2010_us_500_11_20m.json look in the northern part of the state. Not sure of the accuracy in other states.
I believe the issue is that congressional redistricting from the census is not fully reflected in the 2010 files. If you are depending on congressional boundaries, be warned!