Diff: architecture/regional-name-resolution.es
From 1c02ec1 to 1c02ec1
+0 / −0 lines
| Before | After |
|---|---|
| --- | --- |
| schema: foundry-doc-v1 | schema: foundry-doc-v1 |
| title: "Regional Name Resolution Architecture" | title: "Regional Name Resolution Architecture" |
| slug: regional-name-resolution | slug: regional-name-resolution |
| category: architecture | category: architecture |
| type: concept | type: concept |
| content_type: topic | content_type: topic |
| quality: complete | quality: complete |
| status: active | status: active |
| audience: vendor-public | audience: vendor-public |
| bcsc_class: no-disclosure-implication | bcsc_class: no-disclosure-implication |
| language_protocol: PROSE-TOPIC | language_protocol: PROSE-TOPIC |
| last_edited: 2026-05-31 | last_edited: 2026-05-31 |
| editor: pointsav-engineering | editor: pointsav-engineering |
| paired_with: regional-name-resolution.es.md | paired_with: regional-name-resolution.es.md |
| short_description: "How co-location cluster centroids are resolved to colloquial place names using TIGER 2023, GISCO LAU 2021, GADM GBR, and a 12-entry Canadian Nominatim override list." | short_description: "How co-location cluster centroids are resolved to colloquial place names using TIGER 2023, GISCO LAU 2021, GADM GBR, and a 12-entry Canadian Nominatim override list." |
| cites: [] | cites: [] |
| --- | --- |
| Co-location clusters are identified by geometry — a set of latitude/longitude | Co-location clusters are identified by geometry — a set of latitude/longitude |
| coordinates derived from OpenStreetMap point-of-interest records. Geometry does | coordinates derived from OpenStreetMap point-of-interest records. Geometry does |
| not have a name. Giving a cluster a useful, human-recognisable name requires a | not have a name. Giving a cluster a useful, human-recognisable name requires a |
| separate resolution step that matches the cluster's centroid against authoritative | separate resolution step that matches the cluster's centroid against authoritative |
| place-name datasets. This article describes how that resolution works, why it is | place-name datasets. This article describes how that resolution works, why it is |
| necessary, and where it can fail. | necessary, and where it can fail. |
| ## The Problem with Administrative Boundaries | ## The Problem with Administrative Boundaries |
| OpenStreetMap and Wikidata organise geography into administrative hierarchies: | OpenStreetMap and Wikidata organise geography into administrative hierarchies: |
| country, region, county, municipality. These hierarchies are legally and | country, region, county, municipality. These hierarchies are legally and |
| politically defined. They do not always correspond to the names that residents, | politically defined. They do not always correspond to the names that residents, |
| businesses, and market researchers use to describe a place. | businesses, and market researchers use to describe a place. |
| Consider a cluster of retail co-locations in the community of Sherwood Park, | Consider a cluster of retail co-locations in the community of Sherwood Park, |
| Alberta. Sherwood Park is an unincorporated community within Strathcona County. | Alberta. Sherwood Park is an unincorporated community within Strathcona County. |
| Its OSM administrative boundary is the county — *Strathcona County* — not the | Its OSM administrative boundary is the county — *Strathcona County* — not the |
| community. An algorithm that resolves cluster names solely from administrative | community. An algorithm that resolves cluster names solely from administrative |
| boundaries would label this cluster "Strathcona County," a name that conveys | boundaries would label this cluster "Strathcona County," a name that conveys |
| almost nothing to a researcher studying suburban retail patterns in the Edmonton | almost nothing to a researcher studying suburban retail patterns in the Edmonton |
| metropolitan area. The name "Sherwood Park" is what the community, its retailers, | metropolitan area. The name "Sherwood Park" is what the community, its retailers, |
| and its residents use. It is what a Regional Market article should be titled. | and its residents use. It is what a Regional Market article should be titled. |
| This disconnect between legal administrative geography and colloquial place names | This disconnect between legal administrative geography and colloquial place names |
| is not an edge case. It appears wherever unincorporated communities, census | is not an edge case. It appears wherever unincorporated communities, census |
| subdivisions, and historical town names persist alongside newer county or borough | subdivisions, and historical town names persist alongside newer county or borough |
| structures. The resolution architecture exists to bridge that gap. | structures. The resolution architecture exists to bridge that gap. |
| ## Boundary Datasets | ## Boundary Datasets |
| Four datasets supply place-name candidates, each covering a different part of | Four datasets supply place-name candidates, each covering a different part of |
| the geographic scope of the platform. | the geographic scope of the platform. |
| **TIGER 2023 (United States).** The US Census Bureau's Topologically Integrated | **TIGER 2023 (United States).** The US Census Bureau's Topologically Integrated |
| Geographic Encoding and Referencing (TIGER) dataset provides place boundaries for | Geographic Encoding and Referencing (TIGER) dataset provides place boundaries for |
| the United States. The 2023 vintage includes approximately 32,000 named places: | the United States. The 2023 vintage includes approximately 32,000 named places: |
| incorporated cities and towns, census-designated places (CDPs), and some | incorporated cities and towns, census-designated places (CDPs), and some |
| unincorporated communities with recognised names. TIGER places are the primary | unincorporated communities with recognised names. TIGER places are the primary |
| resolution source for all US clusters. | resolution source for all US clusters. |
| **GISCO LAU 2021 (European Union and associated countries).** The European | **GISCO LAU 2021 (European Union and associated countries).** The European |
| Commission's Geographic Information Services for the Commission of the EU | Commission's Geographic Information Services for the Commission of the EU |
| (GISCO) publishes Local Administrative Unit (LAU) boundaries, the building blocks of | (GISCO) publishes Local Administrative Unit (LAU) boundaries, the building blocks of |
| NUTS (Nomenclature of Territorial Units for Statistics) regions. The 2021 vintage covers | NUTS (Nomenclature of Territorial Units for Statistics) regions. The 2021 vintage covers |
| approximately 98,600 municipalities across EU member states and neighbouring | approximately 98,600 municipalities across EU member states and neighbouring |
| countries participating in the Eurostat framework. LAU boundaries are the primary | countries participating in the Eurostat framework. LAU boundaries are the primary |
| resolution source for European clusters in Germany, France, Spain, Italy, Poland, the | resolution source for European clusters in Germany, France, Spain, Italy, Poland, the |
| Netherlands, Austria, Portugal, Greece, Sweden, Denmark, Finland, and Norway. | Netherlands, Austria, Portugal, Greece, Sweden, Denmark, Finland, and Norway. |
| **GADM GBR (United Kingdom).** The Global Administrative Areas (GADM) database | **GADM GBR (United Kingdom).** The Global Administrative Areas (GADM) database |
| provides sub-national boundary data for countries not covered by GISCO. For the | provides sub-national boundary data for countries not covered by GISCO. For the |
| United Kingdom, GADM provides administrative level 3 boundaries (parishes and | United Kingdom, GADM provides administrative level 3 boundaries (parishes and |
| wards in England; communities in Wales; civil parishes in Scotland). These | wards in England; communities in Wales; civil parishes in Scotland). These |
| provide finer-grained name candidates than the LAU-equivalent level 2 districts. | provide finer-grained name candidates than the LAU-equivalent level 2 districts. |
| **Nominatim overrides (Canada).** Canada presents a particular challenge because | **Nominatim overrides (Canada).** Canada presents a particular challenge because |
| census subdivisions (CSDs) — the standard administrative unit — sometimes cover | census subdivisions (CSDs) — the standard administrative unit — sometimes cover |
| large geographic areas that contain multiple distinct communities with different | large geographic areas that contain multiple distinct communities with different |
| names. Twelve manual override entries in `ca_places_nominatim.json` provide | names. Twelve manual override entries in `ca_places_nominatim.json` provide |
| canonical place names for cases where the CSD name would be misleading. Sherwood | canonical place names for cases where the CSD name would be misleading. Sherwood |
| Park (Strathcona County CSD) is one of these twelve overrides. | Park (Strathcona County CSD) is one of these twelve overrides. |
| ## Resolution Logic | ## Resolution Logic |
| For each cluster centroid, the resolution algorithm proceeds as follows: | For each cluster centroid, the resolution algorithm proceeds as follows: |
| *Name match.* The algorithm first checks whether the cluster's constituent | *Name match.* The algorithm first checks whether the cluster's constituent |
| retail locations carry a consistent `addr:city` or `addr:suburb` tag in OSM. | retail locations carry a consistent `addr:city` or `addr:suburb` tag in OSM. |
| If a majority of member records agree on a place name, that name is taken as a | If a majority of member records agree on a place name, that name is taken as a |
| candidate without consulting boundary datasets. | candidate without consulting boundary datasets. |
| *Boundary containment.* If no OSM tag consensus exists, the centroid is tested | *Boundary containment.* If no OSM tag consensus exists, the centroid is tested |
| for containment against the applicable boundary dataset. The smallest-area | for containment against the applicable boundary dataset. The smallest-area |
| polygon that contains the centroid is selected. Its name field becomes the | polygon that contains the centroid is selected. Its name field becomes the |
| resolution candidate. | resolution candidate. |
| *Administrative level fallback.* If no polygon at the preferred administrative | *Administrative level fallback.* If no polygon at the preferred administrative |
| level contains the centroid (which can occur near coastlines, in disputed | level contains the centroid (which can occur near coastlines, in disputed |
| areas, or for clusters near the edge of dataset coverage), the algorithm steps | areas, or for clusters near the edge of dataset coverage), the algorithm steps |
| up to the next administrative level and repeats the containment test. | up to the next administrative level and repeats the containment test. |
| *Override application.* After the initial candidate is identified, the algorithm | *Override application.* After the initial candidate is identified, the algorithm |
| checks the candidate name against the override list. For Canada, if the resolved | checks the candidate name against the override list. For Canada, if the resolved |
| CSD name matches one of the twelve known problematic names, the override supplies | CSD name matches one of the twelve known problematic names, the override supplies |
| the correct colloquial name. | the correct colloquial name. |
| ## Why Canonical Names Matter | ## Why Canonical Names Matter |
| The resolved name is not merely a display label. It is the primary identifier | The resolved name is not merely a display label. It is the primary identifier |
| used in the Regional Markets scoring system. A cluster's resolved name determines | used in the Regional Markets scoring system. A cluster's resolved name determines |
| which metro-distance calculation applies to it: the scoring system looks up the | which metro-distance calculation applies to it: the scoring system looks up the |
| canonical metro reference list using the resolved name to determine whether a | canonical metro reference list using the resolved name to determine whether a |
| cluster belongs to a metro core, a suburban ring, or a standalone secondary | cluster belongs to a metro core, a suburban ring, or a standalone secondary |
| market. An incorrect resolution — labelling Sherwood Park as Strathcona County, | market. An incorrect resolution — labelling Sherwood Park as Strathcona County, |
| for instance — would cause the cluster to receive the wrong metro-distance | for instance — would cause the cluster to receive the wrong metro-distance |
| calculation and potentially be misclassified. | calculation and potentially be misclassified. |
| The resolved name also becomes the title of any Regional Market article | The resolved name also becomes the title of any Regional Market article |
| written for that cluster. Correctness here is a matter of editorial integrity: | written for that cluster. Correctness here is a matter of editorial integrity: |
| an article titled "Strathcona County" about a retail cluster in Sherwood Park | an article titled "Strathcona County" about a retail cluster in Sherwood Park |
| would be factually misleading. | would be factually misleading. |
| ## Known Limitations | ## Known Limitations |
| The current resolution architecture relies on boundary datasets with fixed | The current resolution architecture relies on boundary datasets with fixed |
| vintages (TIGER 2023, GISCO LAU 2021). Names that have changed since those | vintages (TIGER 2023, GISCO LAU 2021). Names that have changed since those |
| vintages — due to incorporation, annexation, or renaming — will not be reflected | vintages — due to incorporation, annexation, or renaming — will not be reflected |
| until the boundary data is refreshed. Similarly, newly established communities | until the boundary data is refreshed. Similarly, newly established communities |
| that postdate the boundary datasets will fall back to administrative-level | that postdate the boundary datasets will fall back to administrative-level |
| resolution, which may produce less specific names. | resolution, which may produce less specific names. |
| The twelve Canadian override entries represent the cases identified during | The twelve Canadian override entries represent the cases identified during |
| the Phase 14 and Phase 15 build cycles. Other CSD/community name mismatches | the Phase 14 and Phase 15 build cycles. Other CSD/community name mismatches |
| may exist in areas not yet covered by the platform. | may exist in areas not yet covered by the platform. |
| ## References | ## References |
| - [Address geocoding](https://en.wikipedia.org/wiki/Geocoding) — Wikipedia, accessed 2026-06-14 | - [Address geocoding](https://en.wikipedia.org/wiki/Geocoding) — Wikipedia, accessed 2026-06-14 |
| - [OpenStreetMap](https://en.wikipedia.org/wiki/OpenStreetMap) — Wikipedia, accessed 2026-06-14 | - [OpenStreetMap](https://en.wikipedia.org/wiki/OpenStreetMap) — Wikipedia, accessed 2026-06-14 |
| --- | --- |
| *Data provenance:* TIGER 2023 (US Census Bureau, public domain); GISCO LAU 2021 | *Data provenance:* TIGER 2023 (US Census Bureau, public domain); GISCO LAU 2021 |
| (Eurostat/EC, CC BY 4.0); GADM GBR (GADM v4.1, non-commercial research licence); | (Eurostat/EC, CC BY 4.0); GADM GBR (GADM v4.1, non-commercial research licence); |
| Nominatim overrides (original). OSM data ODbL. | Nominatim overrides (original). OSM data ODbL. |
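
The four steps described under "Resolution Logic" can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's implementation: the function names (`tag_consensus`, `contains`, `area`, `resolve`), the `(name, ring)` polygon representation, and the dictionary shape used for overrides are all hypothetical, and the actual schema of `ca_places_nominatim.json` is not shown in the article. Areas are computed on raw coordinates, which is enough to rank candidate polygons but is not geodetically accurate.

```python
from collections import Counter


def tag_consensus(tags):
    """Return the name shared by a strict majority of member records, else None.

    `tags` holds one addr:city / addr:suburb value (or None) per member record.
    """
    names = [t for t in tags if t]
    if not names:
        return None
    name, count = Counter(names).most_common(1)[0]
    return name if count * 2 > len(tags) else None


def contains(ring, pt):
    """Ray-casting point-in-polygon test for a simple ring of (x, y) vertices."""
    x, y = pt
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:] + ring[:1]):
        # Count edges that straddle the horizontal line through pt and lie to its right.
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def area(ring):
    """Planar shoelace area; used only to rank candidate polygons by size."""
    edges = zip(ring, ring[1:] + ring[:1])
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in edges)) / 2


def resolve(centroid, member_tags, levels, overrides):
    """Resolve a centroid to a place name.

    levels: list of polygon lists, preferred (finest) administrative level first;
            each polygon is a (name, ring) pair. Hypothetical structure.
    overrides: maps a misleading resolved name to its colloquial replacement.
    """
    # Step 1: name match from OSM tag consensus.
    candidate = tag_consensus(member_tags)
    if candidate is None:
        # Steps 2 and 3: smallest containing polygon, stepping up a level if none matches.
        for polygons in levels:
            hits = [(area(ring), name) for name, ring in polygons if contains(ring, centroid)]
            if hits:
                candidate = min(hits)[1]
                break
    # Step 4: override application (no-op when the candidate is not listed).
    return overrides.get(candidate, candidate)
```

With a single coarse polygon named "Strathcona County" and an override mapping that name to "Sherwood Park", `resolve` returns "Sherwood Park" for any centroid inside the polygon that has no tag consensus. Keeping the override as the last step means it applies regardless of whether the candidate came from tags or from boundary containment, which matches the order given in the article.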