Business clustering service
From the PointSav Documentation
service-business turns raw retail data points into actionable commercial clusters by applying a parent-child spatial schema — so when multiple distinct operators share a single physical site, the GIS engine receives one commercial entity per site rather than several overlapping points.
Retail data is inherently messy — a single commercial site often contains multiple distinct points, such as a big-box anchor, a nested pharmacy, and a fuel outlet sharing the same parking area. service-business turns those raw points into actionable commercial clusters using a parent-child spatial schema, so the GIS engine receives one unified commercial entity per physical site rather than several overlapping records. The service iterates the service-fs raw data lake, groups entities that share a footprint within a 100 m proximity threshold, and assigns the highest-weight named anchor as the parent.
Key Takeaways
- service-business uses a grid-based spatial index at roughly 1 km cell resolution to group raw retail points. Points within 100 m of each other are collapsed into one commercial cluster before any tier score is computed.
- The parent-child schema assigns the highest-weight named anchor as the parent. Every other operator at the same site becomes a child record. Without this step, co-located fuel outlets and pharmacies would each count as independent tier signals.
- Output is cleansed-clusters.jsonl, consumed directly by [[app-orchestration-gis]] when building the regional co-location index. The clustering step is the boundary between raw POI data and ranked commercial intelligence.
- The 100 m proximity threshold is calibrated for large-format retail parks: close enough to capture genuine co-location, far enough to separate adjacent but structurally distinct shopping destinations.
The Clustering Logic
service-business processes raw commercial nodes so that the GIS engine receives a single, unified commercial entity per physical site.
Grid-Based Spatial Indexing
To perform this at scale, the service uses a grid-based spatial index (approximately 1 km cells). It iterates through the service-fs raw data lake and groups entities that share a physical footprint within a 100 m proximity threshold.
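The grouping step described above can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the function names, the fixed 0.009-degree cell size (roughly 1 km of latitude), and the use of single-linkage grouping via union-find are all assumptions made for the example. Only the 1 km cell size and the 100 m threshold come from the text.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6_371_000.0
CELL_DEG = 0.009       # assumed cell size: about 1 km of latitude
THRESHOLD_M = 100.0    # proximity threshold stated in the text


def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) pairs in degrees."""
    lat1, lon1 = math.radians(a[0]), math.radians(a[1])
    lat2, lon2 = math.radians(b[0]), math.radians(b[1])
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def cluster_points(points):
    """Group point indices so that points chained within THRESHOLD_M share a cluster.

    Hypothetical helper. Longitude cells narrow toward the poles, so the
    3x3 neighbour scan is only sufficient while a cell stays wider than
    THRESHOLD_M (true at typical retail latitudes).
    """
    # Bucket every point into its grid cell.
    grid = defaultdict(list)
    for i, (lat, lon) in enumerate(points):
        grid[(math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))].append(i)

    # Union-find over point indices.
    root = list(range(len(points)))

    def find(i):
        while root[i] != i:
            root[i] = root[root[i]]  # path halving
            i = root[i]
        return i

    # Compare each point only with points in its own and adjacent cells.
    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    for i in members:
                        if i < j and haversine_m(points[i], points[j]) <= THRESHOLD_M:
                            root[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(points)):
        groups[find(i)].append(i)
    return sorted(groups.values())


# Two points about 33 m apart, plus one roughly 1.1 km to the north.
sample = [(45.0, -75.0), (45.0003, -75.0), (45.01, -75.0)]
clusters = cluster_points(sample)
```

The grid keeps the distance checks local: each point is compared only against points in nine cells rather than against the whole data set.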
Parent-Child Schema
- Parent node: The primary commercial driver — typically the highest-weight named anchor at the site.
- Children (sub-entities): Secondary operators located within the same spatial node.
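The parent and child roles listed above can be illustrated with a short sketch. The record fields `name` and `weight`, the function name, and the fallback to unnamed records when no named operator exists are assumptions made for this example; the source states only that the highest-weight named anchor becomes the parent.

```python
def assign_parent(cluster):
    """Split one cluster into a parent record and its child records.

    Hypothetical helper: `cluster` is a non-empty list of dicts with an
    optional "name" and a numeric "weight". Ties go to the earliest record.
    """
    # Prefer named operators; fall back to all records if none are named.
    named = [record for record in cluster if record.get("name")]
    candidates = named or cluster
    parent = max(candidates, key=lambda record: record["weight"])
    children = [record for record in cluster if record is not parent]
    return {"parent": parent, "children": children}


site = [
    {"name": "Nested Pharmacy", "weight": 3},
    {"name": "Big-Box Anchor", "weight": 10},
    {"name": "Fuel Outlet", "weight": 2},
    {"name": "", "weight": 50},  # unnamed point, never chosen over a named one
]
entity = assign_parent(site)
```

Here the big-box anchor becomes the parent even though an unnamed point carries a larger weight, and the remaining three records are attached as children.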
Cleansed Data Output
The output is a refined cleansed-clusters.jsonl file. This processed dataset is consumed by the downstream app-orchestration-gis to build the regional co-location index.
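As a rough illustration of the JSON Lines format, the sketch below writes one cluster per line and reads the lines back. The record layout (`parent`, `children`, `lat`, `lon`) and the function names are hypothetical; the actual schema of cleansed-clusters.jsonl is not specified here.

```python
import io
import json


def dump_clusters(clusters, fh):
    """Write each cluster as one JSON object per line (JSON Lines)."""
    for cluster in clusters:
        fh.write(json.dumps(cluster, ensure_ascii=False, sort_keys=True))
        fh.write("\n")


def load_clusters(fh):
    """Read clusters back, skipping blank lines."""
    return [json.loads(line) for line in fh if line.strip()]


# Hypothetical records; field names are assumptions for this example.
example = [
    {"parent": "Big-Box Anchor", "children": ["Nested Pharmacy", "Fuel Outlet"],
     "lat": 45.0, "lon": -75.0},
    {"parent": "Standalone Grocer", "children": [], "lat": 45.01, "lon": -75.0},
]

buffer = io.StringIO()  # stands in for the cleansed-clusters.jsonl file
dump_clusters(example, buffer)
buffer.seek(0)
round_trip = load_clusters(buffer)
```

Because each line is an independent JSON object, a downstream consumer can stream the file record by record without loading it all into memory.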