Regional Name Resolution Architecture

From the PointSav Documentation

How co-location cluster centroids are resolved to colloquial place names using TIGER 2023, GISCO LAU 2021, GADM GBR, and a 12-entry Canadian Nominatim override list.

Updated 2026-05-31
vendor-public

Co-location clusters are identified by geometry — a set of latitude/longitude coordinates derived from OpenStreetMap point-of-interest records. Geometry does not have a name. Giving a cluster a useful, human-recognisable name requires a separate resolution step that matches the cluster's centroid against authoritative place-name datasets. This article describes how that resolution works, why it is necessary, and where it can fail.

The Problem with Administrative Boundaries

OpenStreetMap and Wikidata organise geography into administrative hierarchies: country, region, county, municipality. These hierarchies are legally and politically defined. They do not always correspond to the names that residents, businesses, and market researchers use to describe a place.

Consider a cluster of retail co-locations in the community of Sherwood Park, Alberta. Sherwood Park is an unincorporated community within Strathcona County. The OSM administrative boundary that contains it is the county, Strathcona County, not the community itself. An algorithm that resolves cluster names solely from administrative boundaries would label this cluster "Strathcona County," a name that conveys almost nothing to a researcher studying suburban retail patterns in the Edmonton metropolitan area. "Sherwood Park" is the name that the community, its retailers, and its residents use, and it is the title a Regional Market article for this cluster should carry.

This disconnect between legal administrative geography and colloquial place names is not an edge case. It appears wherever unincorporated communities, census subdivisions, and historical town names persist alongside newer county or borough structures. The resolution architecture exists to bridge that gap.

Boundary Datasets

Four datasets supply place-name candidates, each covering a different part of the geographic scope of the platform.

TIGER 2023 (United States). The US Census Bureau's Topologically Integrated Geographic Encoding and Referencing (TIGER) dataset provides place boundaries for the United States. The 2023 vintage includes approximately 32,000 named places: incorporated cities and towns, census-designated places (CDPs), and some unincorporated communities with recognised names. TIGER places are the primary resolution source for all US clusters.

GISCO LAU 2021 (European Union and associated countries). GISCO, the Geographic Information System of the Commission maintained by Eurostat, publishes Local Administrative Unit (LAU) boundaries. LAUs are the municipal-level building blocks of the NUTS (Nomenclature of Territorial Units for Statistics) classification. The 2021 vintage covers approximately 98,600 municipalities across EU member states and neighbouring countries participating in the Eurostat framework. LAU boundaries are the primary resolution source for clusters in Germany, France, Spain, Italy, Poland, the Netherlands, Austria, Portugal, Greece, Sweden, Denmark, Finland, and, outside the EU, Norway.

GADM GBR (United Kingdom). The Global Administrative Areas (GADM) database is a global dataset of sub-national boundaries; it is used here for countries not covered by GISCO. For the United Kingdom, GADM provides administrative level 3 boundaries (parishes and wards in England; communities in Wales; civil parishes in Scotland). These provide finer-grained name candidates than the LAU-equivalent level 2 districts.

Nominatim overrides (Canada). Canada presents a particular challenge because census subdivisions (CSDs) — the standard administrative unit — sometimes cover large geographic areas that contain multiple distinct communities with different names. Twelve manual override entries in ca_places_nominatim.json provide canonical place names for cases where the CSD name would be misleading. Sherwood Park (Strathcona County CSD) is one of these twelve overrides.
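The dataset coverage described above can be summarised as a routing table keyed by country. The following is a minimal sketch only: the function name `boundary_source`, the use of ISO 3166-1 alpha-2 codes, and the returned labels are assumptions made for illustration, not the platform's actual interface. The article does not state which boundary file supplies Canadian CSD polygons, so the sketch names only the override list for Canada.

```python
# Hypothetical routing table: country code -> place-name source described in this article.
# ISO 3166-1 alpha-2 codes are an assumption (note that GISCO itself uses "EL" for Greece).
GISCO_COUNTRIES = {
    "DE", "FR", "ES", "IT", "PL", "NL", "AT",
    "PT", "GR", "SE", "DK", "FI", "NO",
}


def boundary_source(country_code):
    """Return the label of the place-name source for a country, or None if uncovered."""
    code = country_code.upper()
    if code == "US":
        return "TIGER 2023"
    if code in GISCO_COUNTRIES:
        return "GISCO LAU 2021"
    if code == "GB":
        return "GADM GBR"
    if code == "CA":
        # Only the override list is named in the article; the CSD boundary source is not.
        return "Nominatim overrides (ca_places_nominatim.json)"
    return None


for code in ("US", "NO", "GB", "CA", "JP"):
    print(code, "->", boundary_source(code))
```

Returning None for an uncovered country keeps the "no dataset applies" case explicit rather than silently defaulting to one of the four sources.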

Resolution Logic

For each cluster centroid, the resolution algorithm proceeds as follows:

Name match. The algorithm first checks whether the cluster's constituent retail locations carry a consistent addr:city or addr:suburb tag in OSM. If a majority of member records agree on a place name, that name is taken as a candidate without consulting boundary datasets.

Boundary containment. If no OSM tag consensus exists, the centroid is tested for containment against the applicable boundary dataset. The smallest-area polygon that contains the centroid is selected. Its name field becomes the resolution candidate.

Administrative level fallback. If no polygon at the preferred administrative level contains the centroid (which can occur near coastlines, in disputed areas, or for clusters near the edge of dataset coverage), the algorithm steps up to the next administrative level and repeats the containment test.

Override application. After the initial candidate is identified, the algorithm checks the candidate name against the override list. For Canada, if the resolved CSD name matches one of the twelve known problematic names, the override supplies the correct colloquial name.
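The four steps above can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not the platform's implementation: all function names are hypothetical, "majority" is taken to mean more than half of all member records, addr:city is checked before addr:suburb, polygons are simple rings of (longitude, latitude) pairs, and the override list is modelled as a plain mapping from CSD name to colloquial name (the real schema of ca_places_nominatim.json is not described here). Area is computed in planar degrees, which is adequate only for ranking nearby candidates by size.

```python
from collections import Counter


def tag_consensus(members, keys=("addr:city", "addr:suburb")):
    """Step 1: name shared by a strict majority of member records, else None."""
    names = []
    for tags in members:
        for key in keys:  # assumption: the first key present wins
            if tags.get(key):
                names.append(tags[key])
                break
    if not names:
        return None
    name, count = Counter(names).most_common(1)[0]
    return name if count * 2 > len(members) else None


def _edges(ring):
    """Pair each vertex with the next one, closing the ring."""
    return zip(ring, ring[1:] + ring[:1])


def contains(ring, point):
    """Ray-casting point-in-polygon test for a simple ring of (x, y) pairs."""
    x, y = point
    inside = False
    for (x1, y1), (x2, y2) in _edges(ring):
        if (y1 > y) != (y2 > y) and x < x1 + (y - y1) * (x2 - x1) / (y2 - y1):
            inside = not inside
    return inside


def planar_area(ring):
    """Shoelace area in squared degrees; used only to rank candidates."""
    return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in _edges(ring))) / 2


def smallest_containing(polygons, point):
    """Step 2: name of the smallest-area polygon containing the point, else None."""
    hits = [p for p in polygons if contains(p["ring"], point)]
    return min(hits, key=lambda p: planar_area(p["ring"]))["name"] if hits else None


def resolve_name(members, centroid, levels, overrides):
    """Run steps 1 to 4; `levels` lists polygon sets from preferred to coarser."""
    candidate = tag_consensus(members)
    if candidate is None:
        for polygons in levels:  # step 3: step up a level when nothing matches
            candidate = smallest_containing(polygons, centroid)
            if candidate is not None:
                break
    return overrides.get(candidate, candidate)  # step 4: override by candidate name


# Synthetic rectangle standing in for the county boundary (not real geometry).
county = {"name": "Strathcona County",
          "ring": [(-113.6, 53.3), (-112.8, 53.3), (-112.8, 53.9), (-113.6, 53.9)]}
overrides = {"Strathcona County": "Sherwood Park"}
print(resolve_name([{}, {}], (-113.3, 53.53), [[county]], overrides))  # prints: Sherwood Park
```

A production system would more likely rely on a geometry library and a spatial index than on hand-written ray casting; the pure-Python version is used here only to keep the sketch self-contained.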

Why Canonical Names Matter

The resolved name is not merely a display label. It is the primary identifier used in the Regional Markets scoring system. A cluster's resolved name determines which metro-distance calculation applies to it: the scoring system looks up the canonical metro reference list using the resolved name to determine whether a cluster belongs to a metro core, a suburban ring, or a standalone secondary market. An incorrect resolution — labelling Sherwood Park as Strathcona County, for instance — would cause the cluster to receive the wrong metro-distance calculation and potentially be misclassified.
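The dependency on the resolved name can be illustrated with a small lookup. Everything in this sketch is hypothetical: the structure of the metro reference list, the two entries shown, and the choice to treat an unlisted name as a standalone secondary market are assumptions made to show how a wrong name can propagate into a wrong classification.

```python
# Hypothetical metro reference list: resolved name -> (metro, market class).
METRO_REFERENCE = {
    "Edmonton": ("Edmonton", "metro core"),
    "Sherwood Park": ("Edmonton", "suburban ring"),
}


def classify(resolved_name):
    """Look up the market class for a resolved cluster name.

    Assumption: names absent from the reference list are treated as
    standalone secondary markets with no associated metro.
    """
    return METRO_REFERENCE.get(resolved_name, (None, "standalone secondary market"))


print(classify("Sherwood Park"))
print(classify("Strathcona County"))
```

Under these assumptions the same physical cluster is placed in Edmonton's suburban ring when resolved as "Sherwood Park" but is detached from the metro entirely when resolved as "Strathcona County".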

The resolved name also becomes the title of any Regional Market article written for that cluster. Correctness here is a matter of editorial integrity: an article titled "Strathcona County" about a retail cluster in Sherwood Park would be factually misleading.

Known Limitations

The current resolution architecture relies on boundary datasets with fixed vintages (TIGER 2023, GISCO LAU 2021). Names that have changed since those vintages — due to incorporation, annexation, or renaming — will not be reflected until the boundary data is refreshed. Similarly, newly established communities that postdate the boundary datasets will fall back to administrative-level resolution, which may produce less specific names.

The twelve Canadian override entries represent the cases identified during the Phase 14 and Phase 15 build cycles. Other CSD/community name mismatches may exist in areas not yet covered by the platform.

References


Data provenance: TIGER 2023 (US Census Bureau, public domain); GISCO LAU 2021 (Eurostat/EC, CC BY 4.0); GADM GBR (GADM v4.1, non-commercial research licence); Nominatim overrides (original). OSM data ODbL 1.0.
