Use case "strategic urban clustering" (high volumes)

Posted: Wed Oct 09, 2024 10:38 am
by Bernd Welter
Hi there,

As some of you are working with the strategic xCluster2.planClusters and are running into performance issues, I want to collect some strategies for tackling them. Let's start with the core story:

Given: a list of urban locations based on
  • Primary key ID, mandatory
  • Latitude / Longitude, mandatory
  • Activity (e.g. number of letters to be served), mandatory
  • Group, optional: all locations with the same group must be assigned to the same output cluster (though that cluster's ID is not known in advance)
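The input record and the group rule can be pinned down as a small data structure plus a checker that verifies a clustering result against the group constraint. This is a minimal Python sketch: the field names, the `Location` class and `respects_groups` are hypothetical and are not part of the xCluster2 API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence


@dataclass(frozen=True)
class Location:
    """One input record as described above (hypothetical field names)."""
    id: str                       # primary key, mandatory
    lat: float                    # latitude in degrees, mandatory
    lon: float                    # longitude in degrees, mandatory
    activity: float               # e.g. number of letters to be served, mandatory
    group: Optional[str] = None   # optional: same group => same output cluster


def respects_groups(locations: Sequence[Location], assignment: Dict[str, int]) -> bool:
    """Return True if all locations of each group ended up in one cluster.

    `assignment` maps a location ID to the cluster ID it was assigned to.
    Locations without a group are unconstrained.
    """
    cluster_of_group: Dict[str, int] = {}
    for loc in locations:
        if loc.group is None:
            continue
        cluster = assignment[loc.id]
        # setdefault remembers the first cluster seen for this group and returns it
        if cluster_of_group.setdefault(loc.group, cluster) != cluster:
            return False
    return True
```

For example, with two locations in group "street-1" and one ungrouped location, `{"a": 0, "b": 0, "c": 1}` passes while `{"a": 0, "b": 1, "c": 1}` fails, because the group was split.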
Challenge:
  • Create a given number N of clusters that meet two core targets:
    • The clusters' aggregated activities should be balanced
    • The clusters' shapes should be as compact as possible
  • Problem: the number of input locations often exceeds 100'000! This has a huge impact on performance (and sometimes on quality)
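To compare the approaches below it helps to measure both targets the same way each time. One possible choice, assumed here rather than taken from xCluster2: balance as the largest cluster activity divided by the mean cluster activity (1.0 is perfect), and compactness as the activity-weighted mean great-circle distance of each location to its cluster's centroid. Both helpers are hypothetical and expect positive activities.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

# (lat, lon, activity, cluster_id): a hypothetical flat record layout for this sketch
Point = Tuple[float, float, float, int]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km, using a mean Earth radius of 6371 km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def balance_ratio(points: List[Point]) -> float:
    """Largest cluster activity / mean cluster activity (1.0 = perfectly balanced)."""
    totals: Dict[int, float] = defaultdict(float)
    for _, _, activity, cluster in points:
        totals[cluster] += activity
    return max(totals.values()) / (sum(totals.values()) / len(totals))


def mean_distance_to_centroid_km(points: List[Point]) -> float:
    """Activity-weighted mean distance to the cluster centroid (lower = more compact).

    The centroid is a weighted average of lat/lon, which is fine at city scale.
    """
    sums: Dict[int, List[float]] = defaultdict(lambda: [0.0, 0.0, 0.0])  # lat*w, lon*w, w
    for lat, lon, w, c in points:
        s = sums[c]
        s[0] += lat * w
        s[1] += lon * w
        s[2] += w
    centroid = {c: (s[0] / s[2], s[1] / s[2]) for c, s in sums.items()}
    total_w = sum(p[2] for p in points)
    return sum(w * haversine_km(lat, lon, *centroid[c]) for lat, lon, w, c in points) / total_w
```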
Approach #1 : native strategy
  • Simply call planClusters with the individual locations, their coordinates, activities and groups
  • Advantage: simple business logic
  • Disadvantage: performance and system requirements (e.g. memory consumption when a distance matrix is used)
  • [Image: cluster-20.png]
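A quick estimate shows why the native approach hits memory limits: a full distance matrix grows quadratically with the number of locations, so 10 times more locations need 100 times more memory. The sketch assumes a dense n x n layout with 4 bytes per entry (e.g. one 32-bit value per location pair); how xCluster2 actually stores its matrix is not stated here, so treat the result as an order-of-magnitude figure. With 100'000 locations that is 10^10 entries, i.e. about 40 GB, while 1'000 aggregated groups (100 locations each, as in the street example of #2a) need only about 4 MB.

```python
def dense_matrix_bytes(n_locations: int, bytes_per_entry: int = 4) -> int:
    """Memory for a full n x n distance matrix (one entry per ordered pair).

    bytes_per_entry=4 is an assumption; the real footprint inside xCluster2 may differ.
    """
    return n_locations * n_locations * bytes_per_entry


# Print the estimate for growing problem sizes (decimal gigabytes)
for n in (1_000, 10_000, 100_000):
    print(f"{n:>7} locations -> {dense_matrix_bytes(n) / 1e9:8.3f} GB")
```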
Approach #2a : preaggregate per group (applies only if groups are given!)
  • Perform a "group by" on the "group" property, for example:

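    -- one aggregated location per group; "Group" is a reserved word and must be quoted
    SELECT
      "Group"        AS ID,
      AVG(Longitude) AS Longitude,  -- plain mean of the group's coordinates
      AVG(Latitude)  AS Latitude,
      SUM(Activity)  AS Activity    -- total workload of the group
    FROM Locations
    GROUP BY "Group"
    -- caveat: all rows with a NULL group are merged into a single row here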
    
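  • Advantage: reduced complexity (e.g. a street with 100 locations is summed up into a single location!)
  • Disadvantage: which coordinates should represent the aggregate? A plain average of the coordinates does not necessarily coincide with any real location of the group.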
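A runnable version of the aggregation, using Python's built-in `sqlite3` module and an in-memory table with made-up rows. It quotes `"Group"` (a reserved word in SQL) and, as an assumption beyond the original query, keeps locations without a group as individual rows via `UNION ALL`, since a plain `GROUP BY` would merge all of them into a single NULL-group row.

```python
import sqlite3

# In-memory demo table; column names follow the query above, the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE Locations (ID TEXT, Latitude REAL, Longitude REAL, Activity REAL, "Group" TEXT)'
)
conn.executemany(
    "INSERT INTO Locations VALUES (?, ?, ?, ?, ?)",
    [
        ("a", 49.00, 8.40, 100, "street-1"),
        ("b", 49.02, 8.42, 300, "street-1"),
        ("c", 49.10, 8.50, 50, None),  # no group: must stay an individual location
    ],
)

# Grouped rows are aggregated; ungrouped rows pass through unchanged.
AGGREGATE_SQL = """
SELECT "Group" AS ID, AVG(Longitude) AS Longitude, AVG(Latitude) AS Latitude,
       SUM(Activity) AS Activity
FROM Locations WHERE "Group" IS NOT NULL
GROUP BY "Group"
UNION ALL
SELECT ID, Longitude, Latitude, Activity
FROM Locations WHERE "Group" IS NULL
"""

rows = conn.execute(AGGREGATE_SQL).fetchall()
for row in rows:
    print(row)  # (ID, Longitude, Latitude, Activity)
```

If group names and location IDs can collide, prefix one of them before feeding the result to planClusters.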
Approach #2b : preaggregate and cluster
  • Almost the same as #2a, but instead of averaging, call planClusters once per group with the output cluster count set to 1! This gives you an aggregated activity that represents the group, and the cluster center (chosen from the coordinates of the group's locations!) is a meaningful route location that is later used for the overall clustering ;-)
    "One cluster" output: the pyramid is later used as the representative route location of the group, in this case with an aggregated activity of 14'500
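In #2b the representative is chosen by planClusters itself. If you want a quick local stand-in, for instance to sanity-check results, a weighted medoid behaves similarly: it picks the member location with the smallest activity-weighted distance to all other members, so the representative is always a real location of the group. This is a swapped-in technique, not what xCluster2 does internally, and `group_representative` and `expand_assignment` are hypothetical helpers. The second helper sketches the way back: once the representatives have been clustered, every location inherits its group's cluster, which satisfies the group constraint by construction.

```python
import math
from typing import Dict, List, Tuple

# (location_id, lat, lon, activity): a hypothetical record layout for this sketch
Loc = Tuple[str, float, float, float]


def approx_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Equirectangular distance approximation, adequate at city scale."""
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return 6371.0 * math.hypot(x, y)


def group_representative(members: List[Loc]) -> Tuple[str, float]:
    """Return (ID of the weighted medoid, total activity of the group).

    O(k^2) in the group size k, which is fine for street-sized groups.
    """
    best = min(
        members,
        key=lambda m: sum(o[3] * approx_km(m[1], m[2], o[1], o[2]) for o in members),
    )
    return best[0], sum(m[3] for m in members)


def expand_assignment(groups: Dict[str, List[Loc]], group_cluster: Dict[str, int]) -> Dict[str, int]:
    """Give every original location the cluster its group's representative was assigned to."""
    return {m[0]: group_cluster[g] for g, members in groups.items() for m in members}
```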