library(osmextract)
library(sf)
library(mapview)
# 1. Setup Environment ----------------------------------------------------
# Define where large .pbf files should live to avoid re-downloading
download_dir <- "~/data/raw/OSM"
if(!dir.exists(download_dir)) dir.create(download_dir, recursive = TRUE)
Sys.setenv(OSMEXT_DOWNLOAD_DIRECTORY = download_dir)
# 2. Define the Region ----------------------------------------------------
# Let's target a smaller region (e.g., Rhode Island) for testing,
# rather than the whole US ('us-latest.osm.pbf') which is several GBs.
target_region <- "Rhode Island"
# 3. Extract and Translate ------------------------------------------------
# oe_get matches the place name, downloads the PBF, and converts it to GPKG.
# 'bridge' is not a default column of the lines layer, so request it via extra_tags.
# The SQL query is applied when the GPKG is read, so only matching rows are loaded.
# (To filter during translation instead, see vectortranslate_options, e.g. c("-where", ...).)
bridges <- oe_get(
  place = target_region,
  layer = "lines",
  extra_tags = "bridge",
  # SQL query: keep lines whose 'bridge' tag is set and is not 'no'
  query = "SELECT * FROM lines WHERE bridge IS NOT NULL AND bridge != 'no'",
  quiet = FALSE
)
# 4. Inspection -----------------------------------------------------------
head(bridges)
# Check unique bridge types extracted
table(bridges$bridge)
# 5. Visualization --------------------------------------------------------
mapview(bridges, zcol = "bridge", legend = TRUE)
# 6. Save processed subset ------------------------------------------------
st_write(bridges, file.path(download_dir, "RI_bridges_processed.gpkg"), delete_layer = TRUE)
OpenStreetMap
https://planet.openstreetmap.org/ https://towardsdatascience.com/how-to-read-osm-data-with-duckdb-ffeb15197390
OpenStreetMap (OSM), and in particular the dataset underlying the map, is a collaborative effort to create a free and editable map of the world. Its volunteers contribute the geographic position and associated metadata of locations and assets such as buildings and roads. Noteworthy aspects of this dataset include its open nature and reuse license: anyone can access and use the data under the Open Data Commons Open Database License (ODbL). It also has global coverage and spans many asset types, as opposed to something like OpenAddresses, which is more US-centric and oriented towards address points. There are a number of ways to access this data.
Here I’ll demo how to pull bridges out of OSM using osmextract from scratch:
:::
OpenStreetMap (OSM) is a collaborative project to create a free, editable map of the world. It is arguably the most prominent example of Volunteered Geographic Information (VGI).
The dataset captures geographic positions and associated metadata of assets ranging from highways and buildings to hiking trails and coffee shops. Unlike proprietary datasets, OSM is defined by its:
- Open License: Data is available under the Open Data Commons Open Database License (ODbL).
- Global Coverage: A single schema applied worldwide (though tagging conventions can vary locally).
- Richness: It often contains feature types (pedestrian paths, cycleways) that commercial providers omit.
Data Structure
To use OSM effectively in R, it helps to understand the three core elements:
1. Nodes: Points in space (coordinates). Used for POIs or for defining the shape of lines.
2. Ways: Ordered lists of nodes. Open ways are lines (roads, streams); closed ways are polygons (buildings, parks).
3. Relations: Groups of nodes, ways, or other relations. Used for complex logical connections (e.g., a bus route consisting of many road segments).
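As a rough illustration of how these elements map onto the geometry types that sf exposes, the sketch below builds one of each by hand. The coordinates are made up, and treating a relation as a geometry collection is a simplification chosen for this example, not how OSM stores relations.

```r
library(sf)

# Made-up longitude/latitude pairs standing in for three OSM nodes.
coords_a <- c(-71.41, 41.82)
coords_b <- c(-71.40, 41.83)
coords_c <- c(-71.39, 41.82)

# Node: a single coordinate pair, which maps to a POINT.
node <- st_point(coords_a)

# Open way: an ordered list of nodes, which maps to a LINESTRING.
open_way <- st_linestring(rbind(coords_a, coords_b, coords_c))

# Closed way: the first node is repeated at the end, which maps to a POLYGON.
closed_way <- st_polygon(list(rbind(coords_a, coords_b, coords_c, coords_a)))

# Relation: a group of members, approximated here as a geometry collection.
relation <- st_geometrycollection(list(open_way, closed_way))

elements <- st_sfc(node, open_way, closed_way, relation, crs = 4326)
geometry_types <- as.character(st_geometry_type(elements))
print(geometry_types)
```

In practice the GDAL OSM driver performs this mapping during conversion, which is why the extracted data arrives as layers such as points and lines.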
The R Ecosystem for OSM
While there are many tools to access OSM, the R ecosystem provides several targeted packages:
| Package | Best For |
|---|---|
| osmextract | High Volume. Downloading and reading large .pbf files (e.g., whole states or countries). It uses GDAL/SQL logic to keep memory usage low. |
| osmdata | Precision. Querying the Overpass API for small, specific datasets (e.g., “all cafes in downtown Seattle”). |
| OSMnx | Network Analysis. A Python library, but often used alongside R via reticulate for topology cleaning and street network analysis. |
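To make the osmdata row concrete, here is a minimal sketch of the "cafes in downtown Seattle" query. The bounding box is a hypothetical approximation picked for illustration. The query is only built and printed locally, and the call that contacts the Overpass API is left commented out.

```r
library(osmdata)

# Hypothetical bounding box (xmin, ymin, xmax, ymax) roughly over downtown Seattle.
downtown_bbox <- c(-122.345, 47.600, -122.325, 47.615)

# Build the Overpass query locally; nothing is sent to the server yet.
cafe_query <- add_osm_feature(opq(bbox = downtown_bbox), key = "amenity", value = "cafe")

# Inspect the Overpass QL that would be submitted.
query_text <- opq_string(cafe_query)
cat(query_text)

# Uncomment to run the query against the Overpass API (requires network access):
# cafes <- osmdata_sf(cafe_query)
# head(cafes$osm_points)
```

Passing a numeric bounding box rather than a place name keeps the example offline, since place names are resolved through a geocoding request.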
Transform to SpatiaLite
```{r}
in_pbf <- "/home/rstudio/data/raw/OSM/us-latest.osm.pbf"
out_dbase <- '/home/rstudio/data/raw/OSM/us_latest_spatialite.sqlite'
full_command <- paste('ogr2ogr -f SQLite', shQuote(out_dbase), shQuote(in_pbf), "-dsco SPATIALITE=YES -lco SPATIAL_INDEX=YES -gt 65536 --config OSM_MAX_TMPFILE_SIZE 4096 -oo USE_CUSTOM_INDEXING=YES -oo COMPRESS_NODES=YES")
system(full_command)
```
Workflow: Extracting Bridges with osmextract
The following workflow demonstrates how to download a specific region, extract vector data tagged as a bridge, and load it as a Simple Features (sf) object.
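Before committing to a download, it can help to check which extract a place name resolves to. The sketch below assumes the default provider (Geofabrik) and that "Rhode Island" matches an entry in the zone table bundled with osmextract, in which case no download takes place.

```r
library(osmextract)

# Look up the extract that matches the place name in the provider's zone table.
region_match <- oe_match("Rhode Island", quiet = TRUE)

# URL of the .pbf extract and the file size reported by the provider.
print(region_match$url)
print(region_match$file_size)
```

If the reported size is larger than expected, this is the point to pick a smaller region before any file is fetched.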
Extraction Pipeline
:::
DuckDB Integration
As the Towards Data Science article linked above describes, DuckDB is increasingly popular for querying OSM data because it can query .osm.pbf files, or Parquet files derived from them, directly without loading everything into RAM. You can also use the duckdb R client to run SQL queries on the GeoPackages created by osmextract:
library(duckdb)
library(DBI)
con <- DBI::dbConnect(duckdb::duckdb())
# Query the GeoPackage directly using DuckDB's spatial extension - Note: Requires loading spatial extension in DuckDB
dbExecute(con, "INSTALL spatial; LOAD spatial;")
result <- dbGetQuery(con, "SELECT * FROM ST_Read('~/data/raw/OSM/us-latest.gpkg')")
dbDisconnect(con, shutdown = TRUE)
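For reading a .osm.pbf file directly, the spatial extension also provides an ST_ReadOSM table function that returns raw OSM elements with a kind column and a tags map. The following is a sketch under assumptions: the file path is hypothetical, the helper bridge_ways_sql is introduced here for illustration, and the query only runs if the file is already on disk.

```r
# Hypothetical helper: build a query that reads raw OSM elements and keeps
# ways carrying a 'bridge' tag. Paths containing single quotes are not handled.
bridge_ways_sql <- function(pbf_path) {
  sprintf(
    "SELECT id, CAST(tags AS VARCHAR) AS tags
     FROM ST_ReadOSM('%s')
     WHERE kind = 'way' AND list_contains(map_keys(tags), 'bridge')",
    pbf_path
  )
}

# Hypothetical location of a previously downloaded extract.
pbf_path <- path.expand("~/data/raw/OSM/geofabrik_rhode-island-latest.osm.pbf")
sql <- bridge_ways_sql(pbf_path)

# Run only when the file is present; INSTALL needs network access the first time.
if (file.exists(pbf_path)) {
  con <- DBI::dbConnect(duckdb::duckdb())
  DBI::dbExecute(con, "INSTALL spatial")
  DBI::dbExecute(con, "LOAD spatial")
  bridge_ways <- DBI::dbGetQuery(con, sql)
  print(nrow(bridge_ways))
  DBI::dbDisconnect(con, shutdown = TRUE)
}
```

Note that this returns tagged ways without geometry; building line geometries would require joining the way references back to node coordinates, which the GeoPackage route above already does.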