Diff: architecture/regional-name-resolution.es
From 29c35f3 to 29c35f3
+138 / −0 lines
| Before | After |
|---|---|
| --- | |
| schema: foundry-doc-v1 | |
| title: "Regional Name Resolution Architecture" | |
| slug: regional-name-resolution | |
| category: architecture | |
| type: concept | |
| quality: complete | |
| status: active | |
| audience: vendor-public | |
| bcsc_class: no-disclosure-implication | |
| language_protocol: PROSE-TOPIC | |
| last_edited: 2026-05-31 | |
| editor: pointsav-engineering | |
| paired_with: regional-name-resolution.es.md | |
| short_description: "How co-location cluster centroids are resolved to colloquial place names using TIGER 2023, GISCO LAU 2021, GADM GBR, and a 12-entry Canadian Nominatim override list." | |
| cites: [] | |
| --- | |
| Co-location clusters are identified by geometry — a set of latitude/longitude | |
| coordinates derived from OpenStreetMap point-of-interest records. Geometry does | |
| not have a name. Giving a cluster a useful, human-recognisable name requires a | |
| separate resolution step that matches the cluster's centroid against authoritative | |
| place-name datasets. This article describes how that resolution works, why it is | |
| necessary, and where it can fail. | |
| ## The Problem with Administrative Boundaries | |
| OpenStreetMap and Wikidata organise geography into administrative hierarchies: | |
| country, region, county, municipality. These hierarchies are legally and | |
| politically defined. They do not always correspond to the names that residents, | |
| businesses, and market researchers use to describe a place. | |
| Consider a cluster of retail co-locations in the community of Sherwood Park, | |
| Alberta. Sherwood Park is an unincorporated community within Strathcona County. | |
| Its OSM administrative boundary is the county — *Strathcona County* — not the | |
| community. An algorithm that resolves cluster names solely from administrative | |
| boundaries would label this cluster "Strathcona County," a name that conveys | |
| almost nothing to a researcher studying suburban retail patterns in the Edmonton | |
| metropolitan area. The name "Sherwood Park" is what the community, its retailers, | |
| and its residents use. It is what a Regional Market TOPIC article should be titled. | |
| This disconnect between legal administrative geography and colloquial place names | |
| is not an edge case. It appears wherever unincorporated communities, census | |
| subdivisions, and historical town names persist alongside newer county or borough | |
| structures. The resolution architecture exists to bridge that gap. | |
| ## Boundary Datasets | |
| Four datasets supply place-name candidates, each covering a different part of | |
| the geographic scope of the platform. | |
| **TIGER 2023 (United States).** The US Census Bureau's Topologically Integrated | |
| Geographic Encoding and Referencing (TIGER) dataset provides place boundaries for | |
| the United States. The 2023 vintage includes approximately 32,000 named places: | |
| incorporated cities and towns, census-designated places (CDPs), and some | |
| unincorporated communities with recognised names. TIGER places are the primary | |
| resolution source for all US clusters. | |
| **GISCO LAU 2021 (European Union and associated countries).** The European | |
| Commission's Geographic Information Services for the Commission of the EU | |
| (GISCO) publishes Local Administrative Unit (LAU) boundaries derived from | |
| NUTS (Nomenclature of Territorial Units for Statistics). The 2021 vintage covers | |
| approximately 98,600 municipalities across EU member states and neighbouring | |
| countries participating in the Eurostat framework. LAU boundaries are the primary | |
| resolution source for EU clusters in Germany, France, Spain, Italy, Poland, the | |
| Netherlands, Austria, Portugal, Greece, Sweden, Denmark, Finland, and Norway. | |
| **GADM GBR (United Kingdom).** The Global Administrative Areas (GADM) database | |
| provides sub-national boundary data for countries not covered by GISCO. For the | |
| United Kingdom, GADM provides administrative level 3 boundaries (parishes and | |
| wards in England; communities in Wales; civil parishes in Scotland). These | |
| provide finer-grained name candidates than the LAU-equivalent level 2 districts. | |
| **Nominatim overrides (Canada).** Canada presents a particular challenge because | |
| census subdivisions (CSDs) — the standard administrative unit — sometimes cover | |
| large geographic areas that contain multiple distinct communities with different | |
| names. Twelve manual override entries in `ca_places_nominatim.json` provide | |
| canonical place names for cases where the CSD name would be misleading. Sherwood | |
| Park (Strathcona County CSD) is one of these twelve overrides. | |
| ## Resolution Logic | |
| For each cluster centroid, the resolution algorithm proceeds as follows: | |
| *Name match.* The algorithm first checks whether the cluster's constituent | |
| retail locations carry a consistent `addr:city` or `addr:suburb` tag in OSM. | |
| If a majority of member records agree on a place name, that name is taken as a | |
| candidate without consulting boundary datasets. | |
| *Boundary containment.* If no OSM tag consensus exists, the centroid is tested | |
| for containment against the applicable boundary dataset. The smallest-area | |
| polygon that contains the centroid is selected. Its name field becomes the | |
| resolution candidate. | |
| *Administrative level fallback.* If no polygon at the preferred administrative | |
| level contains the centroid — which can occur near coast lines, in disputed | |
| areas, or for clusters near the edge of dataset coverage — the algorithm steps | |
| up to the next administrative level and repeats the containment test. | |
| *Override application.* After the initial candidate is identified, the algorithm | |
| checks the candidate name against the override list. For Canada, if the resolved | |
| CSD name matches one of the twelve known problematic names, the override supplies | |
| the correct colloquial name. | |
| ## Why Canonical Names Matter | |
| The resolved name is not merely a display label. It is the primary identifier | |
| used in the Regional Markets scoring system. A cluster's resolved name determines | |
| which metro-distance calculation applies to it: the scoring system looks up the | |
| canonical metro reference list using the resolved name to determine whether a | |
| cluster belongs to a metro core, a suburban ring, or a standalone secondary | |
| market. An incorrect resolution — labelling Sherwood Park as Strathcona County, | |
| for instance — would cause the cluster to receive the wrong metro-distance | |
| calculation and potentially be misclassified. | |
| The resolved name also becomes the title of any Regional Market TOPIC article | |
| written for that cluster. Correctness here is a matter of editorial integrity: | |
| an article titled "Strathcona County" about a retail cluster in Sherwood Park | |
| would be factually misleading. | |
| ## Known Limitations | |
| The current resolution architecture relies on boundary datasets with fixed | |
| vintages (TIGER 2023, GISCO LAU 2021). Names that have changed since those | |
| vintages — due to incorporation, annexation, or renaming — will not be reflected | |
| until the boundary data is refreshed. Similarly, newly established communities | |
| that postdate the boundary datasets will fall back to administrative-level | |
| resolution, which may produce less specific names. | |
| The twelve Canadian override entries represent the cases identified during | |
| the Phase 14 and Phase 15 build cycles. Other CSD/community name mismatches | |
| may exist in areas not yet covered by the platform. | |
| --- | |
| *Data provenance:* TIGER 2023 (US Census Bureau, public domain); GISCO LAU 2021 | |
| (Eurostat/EC, CC BY 4.0); GADM GBR (GADM v4.1, non-commercial research licence); | |
| Nominatim overrides (original, project-gis). OSM data CC0. |