Use case "strategic urban clustering" (high volumes)
Posted: Wed Oct 09, 2024 10:38 am
Hi there,
As some of you are working with the strategic xCluster2.planClusters and running into performance issues, I want to collect some strategies for tackling this. Let's start with the core story:
Given: a list of urban locations, each described by
- Primary key ID, mandatory
- Latitude / Longitude, mandatory
- Activity (e.g. number of letters to be served), mandatory
- Group, optional - group means that all locations with the same group have to be assigned to the same output cluster (though the cluster ID is not known in advance)
- Create a given number N of clusters that meet two core targets:
- The clusters' aggregated activity should be balanced
- The clusters' shapes should be as compact as possible
- Problem: the number of locations involved often exceeds 100'000! This has a huge impact on performance (and sometimes on quality)
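To compare the strategies in this thread it helps to put numbers on the two targets. The following is only a sketch of one possible measurement, not how xCluster scores its results: the function name, the input layout and both metrics are my own assumptions, and distances are taken in plain degrees, which is crude but enough for a relative comparison within one city.

```python
from collections import defaultdict
from math import hypot


def cluster_metrics(locations, assignment):
    """Hypothetical helper, not part of any xServer API.

    locations:  dict location_id -> (longitude, latitude, activity)
    assignment: dict location_id -> cluster_id
    Returns (imbalance, spread_per_cluster).
    """
    members = defaultdict(list)
    for loc_id, cluster_id in assignment.items():
        members[cluster_id].append(locations[loc_id])

    # Balance: largest relative deviation of a cluster's activity from the
    # mean cluster activity (0.0 means perfectly balanced).
    activity = {c: sum(p[2] for p in pts) for c, pts in members.items()}
    mean_activity = sum(activity.values()) / len(activity)
    imbalance = max(abs(a - mean_activity) / mean_activity
                    for a in activity.values())

    # Compactness: mean distance of the members to their cluster centroid
    # (smaller means more compact).
    spread = {}
    for c, pts in members.items():
        cx = sum(p[0] for p in pts) / len(pts)
        cy = sum(p[1] for p in pts) / len(pts)
        spread[c] = sum(hypot(p[0] - cx, p[1] - cy) for p in pts) / len(pts)

    return imbalance, spread
```

Running this on the output of each strategy gives a quick check of whether a faster approach costs too much quality.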
- #1: Simply call planClusters with the individual locations, their coordinates, activities and groups
- Advantage: simple business logic
- Disadvantage: performance and system requirements (e.g. memory consumption when a distance matrix is used)
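To get a feeling for the memory point, here is a back-of-the-envelope estimate. It assumes a dense N x N matrix with 4 bytes per entry; how the distance matrix is really stored is not stated here, so treat both the function and the numbers as an illustration only.

```python
def distance_matrix_bytes(n_locations, bytes_per_entry=4, symmetric=False):
    """Rough size of a dense distance matrix (illustrative assumption only)."""
    if symmetric:
        # Store only one triangle without the diagonal
        entries = n_locations * (n_locations - 1) // 2
    else:
        entries = n_locations * n_locations
    return entries * bytes_per_entry


# 100'000 elemental locations: 10^10 entries
print(distance_matrix_bytes(100_000) / 1e9)  # 40.0 (GB)
# 1'000 aggregated locations: 10^6 entries
print(distance_matrix_bytes(1_000) / 1e6)    # 4.0 (MB)
```

The quadratic growth is the reason why reducing the number of input locations pays off so much: a factor of 100 fewer locations means a factor of 10'000 fewer matrix entries.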
- #2a: Perform a "group by" on the "group" property, for example:
Code:
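-- GROUP is a reserved word, hence the quotes; locations without a group are excluded here and keep their own rows
SELECT "Group" AS ID, AVG(Longitude) AS Longitude, AVG(Latitude) AS Latitude, SUM(Activity) AS Activity FROM Locations WHERE "Group" IS NOT NULL GROUP BY "Group"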
- Advantage: lower complexity (e.g. a street with 100 locations is collapsed into a single location!)
- Disadvantage: which coordinates should represent the aggregate? The plain average may fall on a spot that is not one of the real locations.
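Since the group is optional, a plain GROUP BY would merge all locations without a group into one single row. The sketch below shows one way around that: ungrouped locations get a key built from their own ID, so they pass through as aggregates of size 1. It uses an in-memory SQLite database purely for illustration; the table layout, the 'G:' / 'L:' key prefixes and the function name are assumptions of mine.

```python
import sqlite3


def aggregate_by_group(rows):
    """rows: iterable of (id, longitude, latitude, activity, group_or_None).

    Returns one tuple per aggregate:
    (aggregate_id, avg_longitude, avg_latitude, total_activity, member_count)
    """
    con = sqlite3.connect(":memory:")
    con.execute(
        'CREATE TABLE Locations (ID TEXT, Longitude REAL, Latitude REAL, '
        'Activity REAL, "Group" TEXT)'
    )
    con.executemany("INSERT INTO Locations VALUES (?, ?, ?, ?, ?)", rows)
    # 'G:' || NULL yields NULL, so COALESCE falls back to the location's ID.
    # The prefixes keep group names and location IDs from colliding.
    query = """
        SELECT COALESCE('G:' || "Group", 'L:' || ID) AS AggID,
               AVG(Longitude), AVG(Latitude), SUM(Activity), COUNT(*)
        FROM Locations
        GROUP BY AggID
        ORDER BY AggID
    """
    result = con.execute(query).fetchall()
    con.close()
    return result
```

Usage: feed the returned aggregates into the overall clustering instead of the elemental locations, and keep the mapping from location to aggregate ID so the result can be written back afterwards.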
- #2b: Almost the same as #2a: just run planClusters on each group separately with an output cluster count of 1! This gives you an aggregated activity that represents the group, and the cluster center (chosen from the coordinates of the group's locations!) lies at a meaningful route location that is later used for the overall clustering
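To illustrate the idea without a running xServer, here is a rough stand-in for the per-group run. It is explicitly not planClusters and not its algorithm: it simply picks the member location closest to the activity-weighted centroid as the group's representative, so the representative is always a real location. The second helper shows how the overall result could be written back to the elemental locations. All names are hypothetical, and distances are taken in plain degrees.

```python
from math import hypot


def group_representative(members):
    """members: list of (id, longitude, latitude, activity) of ONE group.

    Returns (representative_id, longitude, latitude, total_activity).
    """
    total = sum(m[3] for m in members)
    if total > 0:
        # Activity-weighted centroid
        cx = sum(m[1] * m[3] for m in members) / total
        cy = sum(m[2] * m[3] for m in members) / total
    else:
        # All activities are zero: fall back to the plain mean
        cx = sum(m[1] for m in members) / len(members)
        cy = sum(m[2] for m in members) / len(members)
    # Choose the real location nearest to the centroid
    rep = min(members, key=lambda m: hypot(m[1] - cx, m[2] - cy))
    return rep[0], rep[1], rep[2], total


def expand_assignment(group_to_cluster, location_to_group):
    """Every elemental location inherits the cluster of its group."""
    return {loc: group_to_cluster[grp]
            for loc, grp in location_to_group.items()}
```

Because a whole group travels through the overall clustering as one representative, the requirement that all locations of a group end up in the same output cluster is met automatically.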