Usecase "strategic urban clustering" (high volumes)

Post by **Bernd Welter** » Wed Oct 09, 2024 10:38 am

Hi there,

As some of you are working with the strategic xCluster2.planClusters and run into performance issues, I want to collect some strategies for tackling them. Let's start with the core story:

Given: a list of urban locations, each described by

Primary key ID, mandatory
Latitude / Longitude, mandatory
Activity (e.g. number of letters to be served), mandatory
Group, optional - group means that all locations with the same group have to be assigned to the same output cluster (though the cluster ID is not known in advance)

Challenge:

Create a given number N of clusters that meet two core targets:
- The clusters' aggregated activities should be balanced
- The clusters' shapes should be as compact as possible
Problem: the number of locations involved often exceeds 100'000! This has a huge impact on performance (and sometimes on quality).
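
To make the "balanced" target measurable, here is a minimal Python sketch that scores a given assignment of locations to clusters. The function name `activity_balance` and the dict-based inputs are assumptions made for illustration; nothing here is part of the xCluster API.

```python
from collections import defaultdict


def activity_balance(assignments, activities, n_clusters):
    """Score how evenly activity is spread over the clusters.

    assignments: {location_id: cluster_id}   (hypothetical input shape)
    activities:  {location_id: activity}
    Returns (per-cluster totals, worst relative deviation from the ideal share).
    """
    totals = defaultdict(float)
    for loc_id, cluster_id in assignments.items():
        totals[cluster_id] += activities[loc_id]

    # Ideal share: every cluster carries exactly 1/N of the total activity.
    ideal = sum(activities[loc_id] for loc_id in assignments) / n_clusters
    if ideal == 0:
        return dict(totals), 0.0

    worst = max(abs(t - ideal) for t in totals.values()) / ideal
    # A cluster that received no location at all deviates by 100 %.
    if len(totals) < n_clusters:
        worst = max(worst, 1.0)
    return dict(totals), worst
```

A value of 0.0 means perfectly balanced clusters; compactness is not covered by this sketch.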

Approach #1 : native strategy

Simply call planClusters with the elemental locations, their coordinates, activities and groups.
Advantage: simple business logic
Disadvantage: performance and system requirements (e.g. memory consumption when a distance matrix is used)

Approach #2a : preaggregate per group (applies if groups are given!)

Perform a "group by" on the "group" property such as


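SELECT
  "Group"        AS ID,          -- GROUP is a reserved word, so the column name is quoted
  AVG(Longitude) AS Longitude,
  AVG(Latitude)  AS Latitude,
  SUM(Activity)  AS Activity
FROM Locations
WHERE "Group" IS NOT NULL        -- ungrouped locations are passed on unchanged
GROUP BY "Group"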

Advantage : lower complexity (e.g. a street with 100 locations is collapsed into a single location!)
Disadvantage : which coordinates should represent the aggregate?
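
One possible answer to the coordinate question is an activity-weighted centroid instead of a plain average, so that heavy locations pull the representative point towards them. The Python sketch below is only an illustration under assumptions: the function name `aggregate_by_group` and the dict keys (`id`, `lat`, `lon`, `activity`, `group`) are hypothetical, and averaging raw latitude/longitude is only reasonable at urban scale.

```python
from collections import defaultdict


def aggregate_by_group(locations):
    """Collapse locations into one weighted point per group.

    locations: iterable of dicts with keys id, lat, lon, activity and an
    optional group (hypothetical input shape). Ungrouped locations are kept
    as their own single-member aggregate.
    """
    sums = defaultdict(lambda: {"w": 0.0, "wlat": 0.0, "wlon": 0.0,
                                "lat": 0.0, "lon": 0.0, "n": 0})
    for loc in locations:
        group = loc.get("group")
        # Tagged keys avoid collisions between group names and location IDs.
        key = ("group", group) if group is not None else ("id", loc["id"])
        s = sums[key]
        w = loc["activity"]
        s["w"] += w
        s["wlat"] += loc["lat"] * w
        s["wlon"] += loc["lon"] * w
        s["lat"] += loc["lat"]
        s["lon"] += loc["lon"]
        s["n"] += 1

    result = {}
    for key, s in sums.items():
        if s["w"] > 0:
            lat, lon = s["wlat"] / s["w"], s["wlon"] / s["w"]
        else:
            # All activities are zero: fall back to the unweighted mean.
            lat, lon = s["lat"] / s["n"], s["lon"] / s["n"]
        result[key] = {"lat": lat, "lon": lon, "activity": s["w"]}
    return result
```

Note that a centroid, weighted or not, may end up in a place that is not on the road network (a park, a river, a courtyard).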

Approach #2b : preaggregate and cluster

Almost the same as #2a, but instead of averaging the coordinates, run planClusters once per group with the output cluster count set to 1. This gives you an aggregated activity that represents the group, and the cluster center (chosen from the coordinates of the group's own locations!) lies at a meaningful route location that is later used for the overall clustering.
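
The key property here is that the representative is one of the group's real locations. As a rough local stand-in (this is not what planClusters computes, just a simple approximation of the same idea), one could pick the member closest to the activity-weighted centroid. The function name `pick_representative`, the dict keys and the sample values in the usage lines are all invented for illustration.

```python
import math


def pick_representative(members):
    """Pick the real member location closest to the weighted centroid.

    members: non-empty list of dicts with keys id, lat, lon, activity
    (hypothetical input shape). Returns that member's ID and coordinates
    together with the group's summed activity.
    """
    total = sum(m["activity"] for m in members)
    if total > 0:
        clat = sum(m["lat"] * m["activity"] for m in members) / total
        clon = sum(m["lon"] * m["activity"] for m in members) / total
    else:
        # All activities are zero: use the unweighted mean instead.
        clat = sum(m["lat"] for m in members) / len(members)
        clon = sum(m["lon"] for m in members) / len(members)

    # Equirectangular approximation: adequate for distances within a city.
    cos_lat = math.cos(math.radians(clat))

    def squared_distance(m):
        dlat = m["lat"] - clat
        dlon = (m["lon"] - clon) * cos_lat
        return dlat * dlat + dlon * dlon

    rep = min(members, key=squared_distance)
    return {"id": rep["id"], "lat": rep["lat"], "lon": rep["lon"],
            "activity": total}


# Usage with made-up values:
street = [
    {"id": "a", "lat": 49.00, "lon": 8.40, "activity": 1000},
    {"id": "b", "lat": 49.01, "lon": 8.41, "activity": 12000},
    {"id": "c", "lat": 49.02, "lon": 8.42, "activity": 1500},
]
representative = pick_representative(street)
```

Unlike a plain average, the result is always an address that actually exists in the input.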

"One cluster" output : the pyramid is later used as the route location representing the group, in this case with an aggregated activity of 14'500