Business clustering service — PointSav Documentation

Retail data is inherently messy — a single commercial site often contains multiple distinct points, such as a big-box anchor, a nested pharmacy, and a fuel outlet sharing the same parking area. service-business turns those raw points into actionable commercial clusters using a parent-child spatial schema, so the GIS engine receives one unified commercial entity per physical site rather than several overlapping records. The service iterates the service-fs raw data lake, groups entities that share a footprint within a 100 m proximity threshold, and assigns the highest-weight named anchor as the parent.

Key Takeaways

service-business uses a grid-based spatial index at roughly 1 km cell resolution to group raw retail points. Points within 100 m of each other are collapsed into one commercial cluster before any tier score is computed.
The parent-child schema assigns the highest-weight named anchor as the parent. Every other operator at the same site becomes a child record. Without this step, co-located fuel outlets and pharmacies would each count as independent tier signals.
Output is cleansed-clusters.jsonl, consumed directly by app-orchestration-gis when building the regional co-location index. The clustering step is the boundary between raw POI data and ranked commercial intelligence.
The 100 m proximity threshold is calibrated for large-format retail parks — close enough to capture genuine co-location, far enough to separate adjacent but structurally distinct shopping destinations.

The Clustering Logic

service-business processes raw commercial nodes so that the GIS engine produces a single, unified commercial entity per physical site.

Grid-Based Spatial Indexing

To cluster at scale, the service uses a grid-based spatial index with cells of approximately 1 km. It iterates through the service-fs raw data lake and groups entities that share a physical footprint within a 100 m proximity threshold.
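The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the service's actual implementation: the function names, the local equirectangular projection around the dataset's mean latitude, and the union-find merge (which links points transitively, so a chain of points each within 100 m of the next ends up in one cluster) are all assumptions. Only the 1 km cell size and the 100 m threshold come from the text.

```python
import math
from collections import defaultdict

CELL_M = 1000.0            # approximate grid cell size (from the text)
THRESHOLD_M = 100.0        # proximity threshold (from the text)
EARTH_RADIUS_M = 6_371_000.0


def project(lat, lon, ref_lat):
    """Assumed local equirectangular projection to metres around ref_lat."""
    x = EARTH_RADIUS_M * math.radians(lon) * math.cos(math.radians(ref_lat))
    y = EARTH_RADIUS_M * math.radians(lat)
    return x, y


def cluster_points(points):
    """points: list of (lat, lon). Returns clusters as lists of point indices."""
    if not points:
        return []
    ref_lat = sum(lat for lat, _ in points) / len(points)
    xy = [project(lat, lon, ref_lat) for lat, lon in points]

    # Bucket every point into its grid cell.
    grid = defaultdict(list)
    for i, (x, y) in enumerate(xy):
        grid[(math.floor(x / CELL_M), math.floor(y / CELL_M))].append(i)

    # Union-find over point indices (hypothetical merge strategy).
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Compare each point only with points in its own and the 8 adjacent cells.
    for i, (x, y) in enumerate(xy):
        cx, cy = math.floor(x / CELL_M), math.floor(y / CELL_M)
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    if j > i and math.dist((x, y), xy[j]) <= THRESHOLD_M:
                        ri, rj = find(i), find(j)
                        if ri != rj:
                            parent[rj] = ri

    clusters = defaultdict(list)
    for i in range(len(points)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())
```

Because the cell size is larger than the threshold, any two points within 100 m of each other always fall in the same or adjacent cells, so checking the 3x3 neighbourhood is sufficient.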

Parent-Child Schema

Parent node: The primary commercial driver — typically the highest-weight named anchor at the site.
Children (sub-entities): Secondary operators located within the same spatial node.
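As an illustration of the parent-child assignment, the sketch below picks the highest-weight named entity in a cluster as the parent and treats everything else as children. The `name` and `weight` field names, the record shape, and the fallback for clusters with no named entity are hypothetical; the source does not specify them.

```python
def assign_parent(entities):
    """Split one cluster's entities into a parent and its children.

    entities: list of dicts with hypothetical keys "name" and "weight".
    The parent is the highest-weight entity that has a non-empty name.
    """
    named = [e for e in entities if e.get("name")]
    if not named:
        # Assumed fallback: no named anchor, so no parent is assigned.
        return {"parent": None, "children": list(entities)}
    parent = max(named, key=lambda e: e["weight"])
    children = [e for e in entities if e is not parent]
    return {"parent": parent, "children": children}


# Usage: a big-box anchor with a nested pharmacy and a fuel outlet.
site = [
    {"name": "Pharmacy", "weight": 3},
    {"name": "Big-box anchor", "weight": 10},
    {"name": "Fuel outlet", "weight": 2},
    {"name": "", "weight": 50},  # unnamed point, never eligible as parent
]
result = assign_parent(site)
```

In this example the unnamed point has the largest weight but is skipped, since only named anchors qualify as the parent.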

Cleansed Data Output

The output is cleansed-clusters.jsonl, a JSON Lines file. The downstream app-orchestration-gis consumes it to build the regional co-location index.
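For readers unfamiliar with the JSON Lines format, the following sketch shows how cluster records might be written and read back, one JSON object per line. The record fields (`parent`, `children`) and the helper names are assumptions for illustration; the actual schema of cleansed-clusters.jsonl is not given in the source.

```python
import io
import json


def write_jsonl(records, stream):
    """Write each record as a single JSON object on its own line."""
    for record in records:
        stream.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_jsonl(stream):
    """Read records back, skipping blank lines."""
    return [json.loads(line) for line in stream if line.strip()]


# Usage with an in-memory buffer; a real run would open a file instead.
clusters = [
    {"parent": "Big-box anchor", "children": ["Pharmacy", "Fuel outlet"]},
    {"parent": "Grocery store", "children": []},
]
buffer = io.StringIO()
write_jsonl(clusters, buffer)
buffer.seek(0)
round_tripped = read_jsonl(buffer)
```

Writing one object per line lets a consumer stream the file record by record instead of loading it whole.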

See also

References

DBSCAN — Wikipedia, accessed 2026-06-14