
Diff: services/service-business-clustering

From c4d1fb1 to c4d1fb1

+0 / −0 lines
---
schema: foundry-doc-v1
title: "service-business"
slug: service-business-clustering
category: services
type: topic
quality: complete
status: active
audience: public
short_description: "service-business turns raw retail data points into actionable commercial clusters by applying a parent-child spatial schema — so when multiple distinct operators share a single physical site, the GIS engine receives one commercial entity per site rather than several overlapping points."
bcsc_class: public-disclosure-safe
language_protocol: PROSE-TOPIC
last_edited: 2026-05-08
editor: pointsav-engineering
paired_with: service-business-clustering.es.md
cites: []
---
Retail data is inherently messy: a single commercial site often contains multiple distinct points, such as a big-box anchor, a nested pharmacy, and a fuel outlet sharing the same parking area. **`service-business`** turns those raw points into actionable commercial clusters using a parent-child spatial schema, so the GIS engine receives one unified commercial entity per physical site rather than several overlapping records. The service iterates over the `service-fs` raw data lake, groups entities that share a footprint within a 100 m proximity threshold, and assigns the highest-weight named anchor as the parent.
## The Clustering Logic
A single commercial node often contains multiple distinct points, for example a big-box anchor store, a nested pharmacy, and a fuel outlet in the same parking area. `service-business` merges these points so that the GIS engine receives a single, unified commercial entity per physical site.
### Grid-Based Spatial Indexing
To do this at scale, the service uses a grid-based spatial index with cells of approximately 1 km. It iterates over the `service-fs` raw data lake and groups entities that share a physical footprint within a 100 m proximity threshold.
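
The grouping step can be sketched as follows. This is a minimal illustration, not the service's actual implementation: the function names, the `(id, lat, lon)` input shape, the equirectangular projection, and the single-linkage merge via union-find are all assumptions. Only the roughly 1 km cell size and the 100 m threshold come from the text above. Because the threshold is smaller than a cell, each point only needs to be compared against points in its own cell and the eight surrounding cells.

```python
import math
from collections import defaultdict

CELL_M = 1000.0          # ~1 km grid cell (from the text)
THRESHOLD_M = 100.0      # proximity threshold (from the text)
EARTH_R_M = 6_371_000.0  # mean Earth radius in metres

def project(lat, lon, ref_lat):
    """Equirectangular projection to metres (assumed; fine for regional extents)."""
    x = EARTH_R_M * math.radians(lon) * math.cos(math.radians(ref_lat))
    y = EARTH_R_M * math.radians(lat)
    return x, y

def cluster_points(points):
    """points: list of (id, lat, lon) tuples. Returns a list of sets of ids."""
    if not points:
        return []
    ref_lat = sum(p[1] for p in points) / len(points)
    xy = {pid: project(lat, lon, ref_lat) for pid, lat, lon in points}

    # Bucket every point into its grid cell.
    grid = defaultdict(list)
    for pid, (x, y) in xy.items():
        grid[(math.floor(x / CELL_M), math.floor(y / CELL_M))].append(pid)

    # Union-find: points within the threshold end up in the same set.
    root = {pid: pid for pid in xy}

    def find(a):
        while root[a] != a:
            root[a] = root[root[a]]  # path halving
            a = root[a]
        return a

    for (cx, cy), members in grid.items():
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                neighbours = grid.get((cx + dx, cy + dy), ())
                for a in members:
                    for b in neighbours:
                        # a < b ensures each pair is tested once.
                        if a < b and math.dist(xy[a], xy[b]) <= THRESHOLD_M:
                            root[find(a)] = find(b)

    groups = defaultdict(set)
    for pid in xy:
        groups[find(pid)].add(pid)
    return list(groups.values())

# Hypothetical sample: three operators on one site, one shop several km away.
sample = [
    ("anchor", 49.2827, -123.1207),
    ("pharmacy", 49.2830, -123.1207),
    ("fuel", 49.2827, -123.1212),
    ("far", 49.3327, -123.1207),
]
clusters = cluster_points(sample)
```

Single-linkage merging means a chain of points, each within 100 m of the next, collapses into one cluster even if its ends are farther apart. Whether the real service behaves this way is not stated in the source.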
### Parent-Child Schema
- **Parent node:** The primary commercial driver, typically the highest-weight named anchor at the site.
- **Children (sub-entities):** Secondary operators located within the same spatial node.
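
As a rough illustration of the schema above, the sketch below picks a parent for one cluster. The `Entity` class, its `weight` field, and the fallback to unnamed entities when no named anchor exists are hypothetical. The source only says the parent is typically the highest-weight named anchor and does not define how weights are computed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entity:
    id: str
    name: Optional[str]  # None for unnamed points (assumed representation)
    weight: float        # hypothetical anchor weight; higher means stronger

def assign_parent(entities):
    """Return {'parent': id, 'children': [ids]} for one non-empty cluster."""
    named = [e for e in entities if e.name]
    # Assumed fallback: if nothing is named, consider every entity.
    candidates = named or entities
    parent = max(candidates, key=lambda e: e.weight)
    children = [e.id for e in entities if e is not parent]
    return {"parent": parent.id, "children": children}

# Hypothetical site: the unnamed point has the largest weight but is skipped
# because a named anchor is available.
site = [
    Entity("p1", "Pharmacy", 2.0),
    Entity("a1", "BigBox", 10.0),
    Entity("u1", None, 50.0),
]
record = assign_parent(site)
```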
## Cleansed Data Output
The output is a `cleansed-clusters.jsonl` file. The downstream `app-orchestration-gis` consumes this dataset to build the regional co-location index.
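
The record layout of `cleansed-clusters.jsonl` is not specified here, so the following is only a sketch of how a JSON Lines file of this kind could be written and read back, one cluster object per line. The helper names and the `parent` and `children` keys are assumptions for illustration.

```python
import json
from typing import IO, Iterable, Iterator

def write_clusters(clusters: Iterable[dict], fh: IO[str]) -> None:
    """Write one JSON object per line (JSON Lines)."""
    for cluster in clusters:
        fh.write(json.dumps(cluster, ensure_ascii=False) + "\n")

def read_clusters(fh: IO[str]) -> Iterator[dict]:
    """Yield one cluster dict per non-empty line."""
    for line in fh:
        line = line.strip()
        if line:
            yield json.loads(line)
```

A consumer can then stream the file line by line without loading the whole dataset into memory, which is the usual reason to choose JSON Lines over a single JSON array.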
## See Also
- [[app-orchestration-gis]]
- [[service-fs-data-lake]]
- [[service-places-filtering]]