Diff: architecture/regional-name-resolution.es

From d70ca2d to d70ca2d

+0 / −0 lines
---
schema: foundry-doc-v1
title: "Regional Name Resolution Architecture"
slug: regional-name-resolution
category: architecture
type: concept
quality: complete
status: active
audience: vendor-public
bcsc_class: no-disclosure-implication
language_protocol: PROSE-TOPIC
last_edited: 2026-05-31
editor: pointsav-engineering
paired_with: regional-name-resolution.es.md
short_description: "How co-location cluster centroids are resolved to colloquial place names using TIGER 2023, GISCO LAU 2021, GADM GBR, and a 12-entry Canadian Nominatim override list."
cites: []
---

Co-location clusters are identified by geometry: a set of latitude/longitude
coordinates derived from OpenStreetMap point-of-interest records. Geometry does
not have a name. Giving a cluster a useful, human-recognisable name requires a
separate resolution step that matches the cluster's centroid against authoritative
place-name datasets. This article describes how that resolution works, why it is
necessary, and where it can fail.

## The Problem with Administrative Boundaries

OpenStreetMap and Wikidata organise geography into administrative hierarchies:
country, region, county, municipality. These hierarchies are legally and
politically defined. They do not always correspond to the names that residents,
businesses, and market researchers use to describe a place.

Consider a cluster of retail co-locations in the community of Sherwood Park,
Alberta. Sherwood Park is an unincorporated community within Strathcona County.
Its OSM administrative boundary is the county (*Strathcona County*), not the
community. An algorithm that resolves cluster names solely from administrative
boundaries would label this cluster "Strathcona County," a name that conveys
almost nothing to a researcher studying suburban retail patterns in the Edmonton
metropolitan area. The name "Sherwood Park" is what the community, its retailers,
and its residents use. It is what a Regional Market TOPIC article should be titled.

This disconnect between legal administrative geography and colloquial place names
is not an edge case. It appears wherever unincorporated communities, census
subdivisions, and historical town names persist alongside newer county or borough
structures. The resolution architecture exists to bridge that gap.

## Boundary Datasets

Four datasets supply place-name candidates, each covering a different part of
the geographic scope of the platform.

**TIGER 2023 (United States).** The US Census Bureau's Topologically Integrated
Geographic Encoding and Referencing (TIGER) dataset provides place boundaries for
the United States. The 2023 vintage includes approximately 32,000 named places:
incorporated cities and towns, census-designated places (CDPs), and some
unincorporated communities with recognised names. TIGER places are the primary
resolution source for all US clusters.

**GISCO LAU 2021 (European Union and associated countries).** The European
Commission's Geographic Information System of the Commission (GISCO) publishes
Local Administrative Unit (LAU) boundaries, the building blocks of the NUTS
(Nomenclature of Territorial Units for Statistics) classification. The 2021
vintage covers approximately 98,600 municipalities across EU member states and
neighbouring countries participating in the Eurostat framework. LAU boundaries
are the primary resolution source for European clusters in Germany, France,
Spain, Italy, Poland, the Netherlands, Austria, Portugal, Greece, Sweden,
Denmark, Finland, and Norway (the last a non-EU participant).

**GADM GBR (United Kingdom).** The Global Administrative Areas (GADM) database
supplies sub-national boundary data for countries that the GISCO LAU dataset
used here does not cover. For the United Kingdom, GADM provides administrative
level 3 boundaries (parishes and wards in England; communities in Wales; civil
parishes in Scotland). These provide finer-grained name candidates than the
LAU-equivalent level 2 districts.

**Nominatim overrides (Canada).** Canada presents a particular challenge because
census subdivisions (CSDs), the standard administrative unit, sometimes cover
large geographic areas that contain multiple distinct communities with different
names. Twelve manual override entries in `ca_places_nominatim.json` provide
canonical place names for cases where the CSD name would be misleading. Sherwood
Park (Strathcona County CSD) is one of these twelve overrides.

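
The division of coverage described above can be pictured as a simple dispatch on country code. The following is a minimal sketch, not the platform's actual code: the function name `boundary_source`, the use of ISO 3166-1 alpha-2 codes, and the returned labels are all assumptions made for illustration.

```python
# Hypothetical dispatch table: which source supplies name candidates for a
# cluster, keyed by ISO 3166-1 alpha-2 country code (an assumed convention).
LAU_COUNTRIES = {
    "DE", "FR", "ES", "IT", "PL", "NL", "AT",
    "PT", "GR", "SE", "DK", "FI", "NO",
}


def boundary_source(country_code: str) -> str:
    """Return a label for the dataset covering the given country."""
    code = country_code.upper()
    if code == "US":
        return "TIGER 2023"
    if code == "GB":
        return "GADM GBR"
    if code == "CA":
        # Canadian CSD names are corrected by the manual override list.
        return "Nominatim overrides"
    if code in LAU_COUNTRIES:
        return "GISCO LAU 2021"
    raise ValueError(f"no boundary dataset configured for {country_code!r}")
```

A cluster in a country outside this table raises an error rather than silently resolving against the wrong dataset, which is one reasonable way to surface the edge of platform coverage.
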
## Resolution Logic

For each cluster centroid, the resolution algorithm proceeds as follows:

*Name match.* The algorithm first checks whether the cluster's constituent
retail locations carry a consistent `addr:city` or `addr:suburb` tag in OSM.
If a majority of member records agree on a place name, that name is taken as a
candidate without consulting boundary datasets.

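
The name-match step amounts to a majority vote over member tags. The sketch below assumes each member record is a plain dict of OSM tags, that `addr:city` is read before `addr:suburb`, and that "majority" means more than half of all members, including untagged ones. None of these details is stated in the article, and `tag_consensus` is a hypothetical name.

```python
from collections import Counter
from typing import Optional


def tag_consensus(
    members: list[dict],
    keys: tuple[str, ...] = ("addr:city", "addr:suburb"),
) -> Optional[str]:
    """Return the place name that a strict majority of members agree on."""
    votes = []
    for tags in members:
        for key in keys:
            value = tags.get(key)
            if value:
                votes.append(value.strip())
                break  # each member casts at most one vote
    if not votes:
        return None
    name, count = Counter(votes).most_common(1)[0]
    # Assumed rule: the winner needs more than half of ALL member records.
    return name if count * 2 > len(members) else None
```

When this returns `None`, the cluster would move on to the boundary containment test.
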
*Boundary containment.* If no OSM tag consensus exists, the centroid is tested
for containment against the applicable boundary dataset. The smallest-area
polygon that contains the centroid is selected. Its name field becomes the
resolution candidate.

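
To illustrate the smallest-area rule, here is a dependency-free sketch using ray casting for containment and the shoelace formula for area. It treats each boundary as a single ring of (lon, lat) vertices and measures area in planar degree units. Both are simplifying assumptions: real boundary data has multipolygons and holes, and a production system would more likely use a geometry library with a spatial index. All function names are hypothetical.

```python
from typing import Optional

Point = tuple[float, float]  # (lon, lat)


def contains(ring: list[Point], p: Point) -> bool:
    """Ray-casting point-in-polygon test for a simple ring."""
    x, y = p
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def ring_area(ring: list[Point]) -> float:
    """Planar shoelace area; adequate only for ranking nearby polygons."""
    n = len(ring)
    total = 0.0
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def smallest_containing(places: dict[str, list[Point]], centroid: Point) -> Optional[str]:
    """Name of the smallest-area polygon containing the centroid, if any."""
    hits = [(ring_area(ring), name) for name, ring in places.items() if contains(ring, centroid)]
    return min(hits)[1] if hits else None
```

Choosing the smallest containing polygon means a town nested inside a county wins over the county, which matches the intent of preferring the most specific name.
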
*Administrative level fallback.* If no polygon at the preferred administrative
level contains the centroid (which can occur near coastlines, in disputed
areas, or for clusters near the edge of dataset coverage), the algorithm steps
up to the next administrative level and repeats the containment test.

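
The fallback can be sketched as a loop over levels ordered from most to least specific. In this illustration each level is represented by a lookup callable. The toy `bbox_lookup` helper uses axis-aligned boxes only to keep the example short and stands in for a real containment test; every name here is hypothetical.

```python
from typing import Callable, Optional, Sequence

Lookup = Callable[[float, float], Optional[str]]
Box = tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bbox_lookup(boxes: dict[str, Box]) -> Lookup:
    """Build a toy lookup that matches a point against named bounding boxes."""
    def lookup(lon: float, lat: float) -> Optional[str]:
        for name, (x0, y0, x1, y1) in boxes.items():
            if x0 <= lon <= x1 and y0 <= lat <= y1:
                return name
        return None
    return lookup


def resolve_with_fallback(
    lon: float, lat: float, levels: Sequence[tuple[str, Lookup]]
) -> Optional[tuple[str, str]]:
    """Try each level in order; return (level, name) for the first hit."""
    for level_name, lookup in levels:
        name = lookup(lon, lat)
        if name is not None:
            return level_name, name
    return None  # centroid lies outside every level's coverage
```

Returning the level alongside the name lets downstream code tell a preferred-level match from a coarser fallback, which may matter when judging how specific a resolved name is.
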
*Override application.* After the initial candidate is identified, the algorithm
checks the candidate name against the override list. For Canada, if the resolved
CSD name matches one of the twelve known problematic names, the override supplies
the correct colloquial name.

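
As described, the override step is a name-keyed substitution. The sketch below assumes the override file is a flat JSON object mapping a CSD name to its colloquial replacement. The real layout of `ca_places_nominatim.json` is not documented here, so the sample content and the `apply_override` helper are illustrative only.

```python
import json

# Assumed file layout: {"<CSD name>": "<colloquial name>", ...}.
# The single entry below is the one pairing the article itself names.
SAMPLE_OVERRIDES = '{"Strathcona County": "Sherwood Park"}'


def apply_override(candidate: str, overrides: dict[str, str]) -> str:
    """Replace a known-problematic CSD name; pass other names through."""
    return overrides.get(candidate, candidate)


overrides = json.loads(SAMPLE_OVERRIDES)
resolved = apply_override("Strathcona County", overrides)
```

Names absent from the list pass through unchanged, so the override can only correct the mismatches that have already been catalogued.
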
## Why Canonical Names Matter

The resolved name is not merely a display label. It is the primary identifier
used in the Regional Markets scoring system. A cluster's resolved name determines
which metro-distance calculation applies to it: the scoring system looks up the
canonical metro reference list using the resolved name to determine whether a
cluster belongs to a metro core, a suburban ring, or a standalone secondary
market. An incorrect resolution (labelling Sherwood Park as Strathcona County,
for instance) would cause the cluster to receive the wrong metro-distance
calculation and potentially be misclassified.

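
A small sketch shows why a wrong name propagates into scoring. The reference table, its two entries, and the rule that unknown names default to a standalone secondary market are all assumptions made for illustration; the article does not describe the actual structure of the metro reference list.

```python
from typing import Optional

# Hypothetical metro reference list: resolved name -> (metro, role).
METRO_REFERENCE: dict[str, tuple[str, str]] = {
    "Edmonton": ("Edmonton", "metro core"),
    "Sherwood Park": ("Edmonton", "suburban ring"),
}


def classify(resolved_name: str) -> tuple[Optional[str], str]:
    """Look up a cluster's market role by its resolved name."""
    # Assumed default: names missing from the list are treated as standalone.
    return METRO_REFERENCE.get(resolved_name, (None, "standalone secondary market"))
```

Under these assumptions `classify("Sherwood Park")` lands in Edmonton's suburban ring, while the unresolved label `"Strathcona County"` misses the table and is treated as a standalone market, which is the kind of misclassification the paragraph above warns about.
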
The resolved name also becomes the title of any Regional Market TOPIC article
written for that cluster. Correctness here is a matter of editorial integrity:
an article titled "Strathcona County" about a retail cluster in Sherwood Park
would be factually misleading.

## Known Limitations

The current resolution architecture relies on boundary datasets with fixed
vintages (TIGER 2023, GISCO LAU 2021). Names that have changed since those
vintages, whether through incorporation, annexation, or renaming, will not be
reflected until the boundary data is refreshed. Similarly, newly established
communities that postdate the boundary datasets will fall back to
administrative-level resolution, which may produce less specific names.

The twelve Canadian override entries represent the cases identified during
the Phase 14 and Phase 15 build cycles. Other CSD/community name mismatches
may exist in areas not yet covered by the platform.

---

*Data provenance:* TIGER 2023 (US Census Bureau, public domain); GISCO LAU 2021
(Eurostat/EC, CC BY 4.0); GADM GBR (GADM v4.1, non-commercial research licence);
Nominatim overrides (original, project-gis). OSM data: ODbL 1.0.