Getting the same response every time with the combinations below:
1. imbalanceTolerance=10,50&10 and ignoreImbalanceTolerance=false
2. ignoreImbalanceTolerance=true
But getting a different response with the combination imbalanceTolerance=0 and ignoreImbalanceTolerance=false.
Please let me know why the same response is generated with the combinations mentioned above. I am expecting different responses with different tolerance values.

Let me explain this. As you might know, the clustering tries to satisfy two conflicting goals:
- create geographic compact territories
- assign each territory an equal workload
So if you activate a specific tolerance T% (e.g. 5%), we determine the target activity and derive thresholds for a potential solution:
- TargetActivity: the average activity, which satisfies the second goal perfectly. It is simply the sum of all activities divided by the number of output clusters. E.g. if the total activity is 10'000 and you want to create 5 clusters, TargetActivity is 2'000.
- MinActivity := TargetActivity * (1 - T%), e.g. 2'000 * 0.95 = 1'900
- MaxActivity := TargetActivity * (1 + T%), e.g. 2'000 * 1.05 = 2'100
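The arithmetic above can be written down as a minimal sketch. The function name `activity_thresholds` and its parameters are made up for illustration and are not part of the service API; only the formulas come from the explanation above.

```python
def activity_thresholds(total_activity, num_clusters, tolerance_pct):
    """Return (target, min, max) activity per cluster for a tolerance given in percent."""
    # TargetActivity: the perfectly balanced workload per cluster.
    target = total_activity / num_clusters
    # MinActivity and MaxActivity: the band a cluster's activity has to stay within.
    min_activity = target * (1 - tolerance_pct / 100)
    max_activity = target * (1 + tolerance_pct / 100)
    return target, min_activity, max_activity


# The numbers from the example: total activity 10'000, 5 clusters, T = 5%.
target, min_activity, max_activity = activity_thresholds(10_000, 5, 5)
print(target, min_activity, max_activity)
```

With T = 0 the band collapses to the single value TargetActivity, so only a perfectly balanced solution would pass.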
Now the next question is: why do the various tolerances in the customer's example return the same solution?
For this you need to understand the iteration under the hood.
Trivial assignment: each customer is simply assigned to the closest cluster center. This is how we create a kickoff "condition", which we then try to improve if it violates the imbalanceTolerance.
- In the customer's example the trivial assignment satisfies all of the imbalance tolerances [100%, 50%, 10%], so there is no need to improve it and every request returns the same solution.
- By setting the imbalanceTolerance to 0% the trivial assignment is no longer sufficient, so it has to be improved, which leads to a different solution.
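To illustrate why one kickoff assignment can pass several tolerances at once, here is a small sketch. Everything in it is an assumption for illustration: the customer data is invented, the distance is plain Euclidean, and `trivial_assignment` and `within_tolerance` are hypothetical helpers, not functions of the service.

```python
import math


def trivial_assignment(customers, centers):
    """Assign each customer (x, y, activity) to the index of its closest center."""
    assignment = []
    for x, y, _activity in customers:
        nearest = min(range(len(centers)),
                      key=lambda i: math.dist((x, y), centers[i]))
        assignment.append(nearest)
    return assignment


def within_tolerance(customers, assignment, num_clusters, tolerance_pct):
    """Check that every cluster's total activity lies in [MinActivity, MaxActivity]."""
    totals = [0.0] * num_clusters
    for (_x, _y, activity), cluster in zip(customers, assignment):
        totals[cluster] += activity
    target = sum(totals) / num_clusters
    min_activity = target * (1 - tolerance_pct / 100)
    max_activity = target * (1 + tolerance_pct / 100)
    return all(min_activity <= t <= max_activity for t in totals)


# Invented data: two groups of customers with workloads 2050 and 1950.
customers = [(0, 0, 1050), (1, 0, 1000), (10, 0, 950), (11, 0, 1000)]
centers = [(0.5, 0), (10.5, 0)]

assignment = trivial_assignment(customers, centers)
results = {t: within_tolerance(customers, assignment, len(centers), t)
           for t in (100, 50, 10, 0)}
print(assignment, results)
```

In this toy data the clusters deviate by 2.5% from the target, so the check passes for 100%, 50% and 10% and fails only for 0%. That is the same pattern as in the customer's example: as long as the kickoff assignment passes, nothing forces it to change.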
Bernd
Appendix:
There is no call such as "find me the best tolerance" in a single step. If you want to go for it, you can apply client-side logic based on one of the following approaches:
Strategy 1: start with a low imbalance tolerance (e.g. 0) and, as long as this returns "no valid solution found", increase it step by step.
Strategy 2: start with a trivial tolerance and reduce it until you run into "no valid solution found". The last successful solution is what you are looking for.
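Strategy 1 could look roughly like the following client-side loop. This is only a sketch: `solve` stands for whatever function your client uses to send the clustering request, and it is assumed here to return `None` when the service reports "no valid solution found". The step size, the limit and `fake_solve` are made up for the demonstration.

```python
def find_lowest_tolerance(solve, start=0, step=5, limit=100):
    """Increase the tolerance step by step until solve() yields a solution.

    solve: callable taking a tolerance in percent and returning a solution,
           or None for "no valid solution found" (an assumed convention).
    Returns (tolerance, solution), or None if nothing works up to the limit.
    """
    tolerance = start
    while tolerance <= limit:
        solution = solve(tolerance)
        if solution is not None:
            return tolerance, solution
        tolerance += step
    return None


def fake_solve(tolerance_pct):
    # Stand-in for the real request: pretends that no valid solution
    # exists for tolerances below 15%.
    if tolerance_pct >= 15:
        return {"tolerance": tolerance_pct}
    return None


result = find_lowest_tolerance(fake_solve)
print(result)
```

Strategy 2 is the same loop run in the other direction: start high, decrease the tolerance, and keep the last solution that was not `None`.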