Reverse Geocoding Part 1 — Using Boundary Data with GeoJSON

Suppose you have a set of longitude and latitude pairs and you want to know the corresponding location names, such as states or prefectures. This is called ‘reverse geocoding’, as opposed to ‘geocoding’, which maps addresses to geocodes (longitude and latitude). There are a couple of ways to do this, each with pros and cons, but today I’m going to show you how to do it with GeoJSON files.

GeoJSON is a format for encoding a variety of geographic data structures. It is often used to draw the boundaries of countries, regions, states, cities, and so on. In Exploratory, we use GeoJSON files for data visualization.

A GeoJSON file consists of a set of groups, such as Japan’s prefectures. Each group has a set of coordinates that draws a polygon, like the shape of Tokyo prefecture. You can look at the Wikipedia page on GeoJSON for more detail if you are interested. The point is that you can check which boundary a given geocode (longitude/latitude) falls inside and map it to the name of that boundary, such as a prefecture.
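To make the "which boundary contains this point" idea concrete, here is a minimal base R sketch of a point-in-polygon test using ray casting. It treats coordinates as flat x/y values, and the function name `point_in_polygon` and the rectangle standing in for a prefecture are made up for illustration; sf relies on dedicated geometry libraries rather than this code.

```r
# Ray casting: count how many polygon edges a ray going east from the point crosses.
# An odd number of crossings means the point is inside.
point_in_polygon <- function(x, y, poly_x, poly_y) {
  n <- length(poly_x)
  inside <- FALSE
  j <- n  # index of the previous vertex (the ring closes on itself)
  for (i in seq_len(n)) {
    # Does the edge between vertex j and vertex i straddle the point's latitude?
    straddles <- (poly_y[i] > y) != (poly_y[j] > y)
    if (straddles) {
      # Longitude at which the edge crosses that latitude
      x_cross <- (poly_x[j] - poly_x[i]) * (y - poly_y[i]) /
        (poly_y[j] - poly_y[i]) + poly_x[i]
      if (x < x_cross) inside <- !inside
    }
    j <- i
  }
  inside
}

# Hypothetical rectangular "boundary" (longitude 139 to 140, latitude 35 to 36)
box_lon <- c(139, 140, 140, 139)
box_lat <- c(35, 35, 36, 36)

in_box <- point_in_polygon(139.5, 35.5, box_lon, box_lat)   # a point within the rectangle
out_box <- point_in_polygon(141.0, 35.5, box_lon, box_lat)  # a point east of the rectangle
```

Reverse geocoding with boundary data amounts to running a test like this against every polygon and keeping the name of the one that matches.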

Thanks to the R community, you can do this with the sf package. I’m going to explain how to do it with Exploratory.

Install sf package

The sf package doesn’t come with Exploratory, so you need to install it first.

Start Exploratory, and click “R Packages” from the menu.

It opens the R Package management dialog. Click the “Install” tab, type “sf” in the text box, and click the “Install” button.

If you see the “sf” package in the installed package list, the installation is done and it is ready to use.
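If you work in plain R rather than Exploratory, a rough equivalent of this check is shown below. This is only a sketch: it reports whether sf is available and leaves the actual installation (the standard `install.packages()` call) to you.

```r
# Outside Exploratory, the usual way to install a package from CRAN is:
#   install.packages("sf")

# Check whether sf is available in the current R library.
sf_installed <- requireNamespace("sf", quietly = TRUE)

if (sf_installed) {
  message("sf ", as.character(utils::packageVersion("sf")), " is installed")
} else {
  message('sf is not installed yet; run install.packages("sf")')
}
```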

Import Data

I prepared data on the major flood events in Japan since 1985. It originally comes from the Dartmouth Flood Observatory’s Global Archive of Large Flood Events. You can visit here and download the CSV data by choosing the “Download CSV” menu.

Once you have downloaded the CSV data, go to Exploratory, open a project (or create a new one), click the ‘+’ icon next to “Data Frames”, and choose “File Data”.

Create a function to find location names from longitude/latitude data

Once you have imported the data into Exploratory, you can quickly check what the data looks like by clicking the “Table” tab. Each row represents a single flood event with its date, damage, and the longitude/latitude where it happened. But there’s no location name. We need location names like “Tokyo”!

So what I’m going to do here is create a function that takes longitude/latitude data and returns the prefecture names, using the Japan prefecture GeoJSON and the sf package.

Click the “+” button right next to “Script”.

Name the script “find locations”. This opens the script editor window. Write the R script below and click the “Save” button at the upper right.
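Outside Exploratory, the same quick check can be sketched in plain R. The rows below are invented sample values, not records from the Dartmouth archive. Only the `longitude` and `latitude` column names are taken from the calculation used later in this post; the other column names are assumptions.

```r
# Invented sample rows standing in for the downloaded CSV file.
# With the real file you would call read.csv() on its path instead.
csv_text <- "event_id,began,longitude,latitude
1,2000-01-01,139.02,37.44
2,2000-01-02,133.05,34.40
3,2000-01-03,140.10,36.30"

floods <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Look at the column types, then confirm the coordinates are numeric and in range.
str(floods)
coords_ok <- is.numeric(floods$longitude) && is.numeric(floods$latitude) &&
  all(abs(floods$longitude) <= 180) && all(abs(floods$latitude) <= 90)
coords_ok
```

Checking the coordinate columns early is worthwhile because swapped or text-typed longitude/latitude values would make every point miss its polygon.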

This script defines a function called “find_pref”. It takes longitude/latitude data and returns the prefecture names. It returns NA if the point doesn’t belong to any of the prefecture shapes. 

In this script, on the line starting with “st_read”, I point to the GeoJSON of Japan’s prefectures by URL. If you want to use your own GeoJSON, you can change the URL to point to yours. If you can’t find a GeoJSON that you want to use, you can create one yourself from a Shapefile. You can read this article for more detail.


# Load "sf" library
library(sf)

# Load Japan prefecture GeoJSON
.jppref.geojson <- st_read("https://dl.dropboxusercontent.com/s/luj2iy5przp90k5/jp_prefs.geojson")

# Find Prefecture function. It takes longitude/latitude vectors
# and returns a vector of prefecture names.
find_pref <- function(lon, lat, sp_polygon = .jppref.geojson) {
  res <- character(0)
  cnt <- 1
  # Iterate over the vector data one by one
  for (lo in lon) {
    la <- lat[cnt]
    # Find the rows of the polygons that contain the point.
    which.row <- which(sf::st_contains(sp_polygon, sf::st_point(c(lo, la)), sparse = FALSE))

    # If the point doesn't belong to any polygon, return NA.
    if (length(which.row) == 0) {
      res <- c(res, NA)
    } else {
      # Use the first match so that each point yields exactly one name.
      res <- c(res, as.character(sp_polygon$name[which.row[1]]))
    }
    cnt <- cnt + 1
  }
  return(res)
}
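If you need to build your own GeoJSON, the Shapefile conversion mentioned above can be sketched with sf itself. This example uses the North Carolina Shapefile that ships with the sf package as a stand-in for your own file and writes to a temporary path; both choices are placeholders.

```r
library(sf)

# Stand-in Shapefile bundled with sf; replace with the path to your own .shp file.
shp_path <- system.file("shape/nc.shp", package = "sf")
shapes <- st_read(shp_path, quiet = TRUE)

# GeoJSON is normally stored in WGS84 longitude/latitude, so transform first.
shapes_wgs84 <- st_transform(shapes, 4326)

# Write the polygons out as GeoJSON (the driver is named explicitly here).
geojson_path <- tempfile(fileext = ".geojson")
st_write(shapes_wgs84, geojson_path, driver = "GeoJSON", quiet = TRUE)

# Read the file back to confirm that every shape survived the round trip.
round_trip <- st_read(geojson_path, quiet = TRUE)
nrow(round_trip) == nrow(shapes)
```

The resulting file path can then be passed to `st_read()` in place of the URL in the script.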

Apply the function

OK, everything is set up and ready. Now for the fun part: reverse geocoding with GeoJSON!

Go back to the data frame, click the ‘+’ button and choose “Create Calculation (Mutate)”.

Choose “Create New Column”, set the New Column Name to “prefecture”, and type find_pref(longitude, latitude) in the Calculation field. This means “create a new column named ‘prefecture’ based on the result of the find_pref() function”. Click the “Run” button if it looks OK.

Then you see a new column “prefecture” with the prefecture name for each event!

Let’s go to the map again and show the location names. Click the “Viz” tab to show the map that we created above. Assign the “prefecture” column to “Label”, then hover over any of the dots. Now you see the prefecture name of the dot along with its longitude/latitude at the upper right.
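For readers working in plain R, the Mutate step corresponds roughly to a `dplyr::mutate()` call. The sketch below is self-contained, so it swaps in two made-up rectangular "prefectures" (named `West` and `East`) and a compact stand-in for `find_pref()` called `find_pref_toy()`, built on `sf::st_intersects()`. None of these are the real Japan boundaries or the script above.

```r
library(sf)
library(dplyr)

# Two invented rectangles written as GeoJSON text; st_read() can parse the string directly.
toy_geojson <- paste0(
  '{"type":"FeatureCollection","features":[',
  '{"type":"Feature","properties":{"name":"West"},"geometry":{"type":"Polygon",',
  '"coordinates":[[[130,33],[135,33],[135,36],[130,36],[130,33]]]}},',
  '{"type":"Feature","properties":{"name":"East"},"geometry":{"type":"Polygon",',
  '"coordinates":[[[136,34],[141,34],[141,37],[136,37],[136,34]]]}}]}'
)
toy_prefs <- st_read(toy_geojson, quiet = TRUE)

# Stand-in for find_pref(): name of the first polygon covering each point, else NA.
find_pref_toy <- function(lon, lat, polygons = toy_prefs) {
  pts <- st_as_sf(data.frame(lon = lon, lat = lat),
                  coords = c("lon", "lat"), crs = st_crs(polygons))
  hits <- st_intersects(pts, polygons)  # one integer vector of matches per point
  idx <- vapply(hits, function(h) if (length(h) > 0) h[1] else NA_integer_, integer(1))
  as.character(polygons$name[idx])
}

# Invented points: one in each rectangle and one outside both.
floods <- data.frame(longitude = c(132.5, 139.7, 150.0),
                     latitude  = c(34.4, 35.7, 20.0))

floods <- mutate(floods, prefecture = find_pref_toy(longitude, latitude))
floods$prefecture
```

Because all points are checked in one `st_intersects()` call, this form also avoids the per-row loop, which may matter on larger tables.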


Reverse geocoding with GeoJSON is a fast, standalone solution if you have an appropriate GeoJSON file on your local machine. But it has some limitations. For example, if you zoom in and hover over the dot shown in the screenshot below, you don’t see a location name.

I actually simplified this GeoJSON file to reduce the file size. This means that some small islands or locations along the coastline might not fall inside any of the boundaries. I could use a higher-resolution GeoJSON to cover such islands and locations, but this would increase the GeoJSON file size, which would make the reverse geocoding operation slower.

This is when you might want to consider using a geocoding service like the Google Maps API, which I’m going to talk about in the next post. Stay tuned!
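The trade-off can be seen on a toy shape. In the sketch below, a many-sided circle stands in for a detailed coastline (it is not part of the prefecture data), coordinates are unitless planar values, and the tolerance of 0.05 is an arbitrary choice.

```r
library(sf)

# A detailed "coastline": a circle of radius 1 approximated by 120 short segments.
detailed <- st_buffer(st_sfc(st_point(c(0, 0))), dist = 1, nQuadSegs = 30)

# Simplify it, dropping vertices that deviate less than the tolerance.
simplified <- st_simplify(detailed, preserveTopology = TRUE, dTolerance = 0.05)

# Fewer vertices means a smaller file and faster containment checks...
n_detailed <- nrow(st_coordinates(detailed))
n_simplified <- nrow(st_coordinates(simplified))

# ...but the simplified shape covers less area, so points near the edge can fall outside.
area_lost <- as.numeric(st_area(detailed) - st_area(simplified))

c(detailed = n_detailed, simplified = n_simplified)
area_lost
```

The area lost along the edge is the toy equivalent of the coastal points and small islands that end up outside every simplified boundary.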


This analysis was done with Exploratory, the modern data science tool for non-programmers. If you are interested, sign up from here for a free trial. If you are a student or teacher, it’s free!


R Packages used in this post