Comparison of bulk operations
Posted: Thu Mar 21, 2024 9:06 am
Hi there,
In a recent project I've been asked about the impact of "not having a bulk operation" for a specific task.
Quote:
"You seem to not give much importance to the issue of not having batch geocoding in PTV Developer, when this is a requirement. Is it simply not possible? Does this rule out Developer?"

No, it does not. (And by the way, we will add bulk geocoding in PTV Developer – but once you read the statement below you will probably understand why other tasks seem to be more important.)
Let me start by viewing this from a “logical” perspective on a meta level – not a technical one…
A user provides N pieces of information (e.g. addresses) and requires a function to be applied N times (e.g. geocoding of each address). Doing this via a single batch call does not generate more information, it simply gathers it in a different way. Especially when it comes to geocoding large volumes, there are different approaches to gathering the final information you need:
- Approach 1 - Reference approach: sending N elemental geocoding requests in a single-threaded sequence takes 100% of the time until the complete information is available on the client side.
- Approach 2 - Apply one bulk operation (if possible): This might reduce the overall time from 100% to 9x%. The bigger N is, the more can be saved, but I wouldn’t expect “wonders”, because the calculation inside the service is still a sequential one. Only the network overhead is reduced compared to the reference approach.
- Approach 3 - Send the elemental requests in parallel (without exceeding a certain “degree of parallelism”): Here there is huge potential, because the reference time can simply be divided by the degree (see the first sketch after this list):
- degree == 2: reduces the client's waiting time for 100% of the information being available to roughly 50%
- degree == 3: reduces the client's waiting time for 100% of the information being available to roughly 33%
- degree == N: reduces the client's waiting time to roughly 100% / N
- Approach 4 - The biggest potential comes from applying both strategies at the same time (see the second sketch, after the comparison table below):
- Cut the workload into chunks and send them through parallel bulk operations
- In all these approaches the transaction volume relevant for billing is the same.
- This works fine if you replace the initial approach with a real bulk/batch operation.
- Sidenote: this is NOT the same if you compare a distance matrix [N:M] with N x M elemental routings! In this case the temporary data structures of the two approaches are NOT equal, which can lead to gaps in the output information!
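To make Approach 3 concrete, here is a minimal Python sketch of sending elemental requests in parallel with a bounded degree of parallelism. Note that geocode_single is only a placeholder for whatever elemental geocoding call your API offers (here it just simulates an assumed ~50 ms of latency per request); the thread-pool pattern is the point, not the endpoint.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def geocode_single(address: str) -> dict:
    """Placeholder for one elemental geocoding request.
    Only simulates network + service latency; swap in a real call."""
    time.sleep(0.05)  # assumption: ~50 ms per request
    return {"address": address, "lat": 0.0, "lon": 0.0}

def geocode_sequential(addresses):
    """Approach 1: the single-threaded reference run (100% of the time)."""
    return [geocode_single(a) for a in addresses]

def geocode_parallel(addresses, degree=10):
    """Approach 3: N elemental requests with a bounded degree of parallelism.
    Waiting time drops to roughly 100% / degree."""
    with ThreadPoolExecutor(max_workers=degree) as pool:
        return list(pool.map(geocode_single, addresses))

if __name__ == "__main__":
    addresses = [f"Example street {i}" for i in range(100)]
    t0 = time.perf_counter()
    geocode_sequential(addresses)
    t1 = time.perf_counter()
    geocode_parallel(addresses, degree=10)
    t2 = time.perf_counter()
    print(f"sequential: {t1 - t0:.2f} s, parallel (degree 10): {t2 - t1:.2f} s")
```

With the assumed 50 ms per request and N = 100, the sequential run takes roughly 5 s, while degree 10 brings it down to roughly 0.5 s – exactly the 100% / degree relation from the list above. Keep the degree within whatever request-rate limits your plan allows.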
| Native approach | Shortcut approach | Quality | Performance gain | Supported by API |
|---|---|---|---|---|
| N times single geocoding | single-threaded bulk geocoding | 100% comparable | small | xLocate 1, xLocate 2 |
| N times single geocoding | multi-threaded single geocodings | 100% comparable | huge | xLocate 1, xLocate 2 |
| N times single geocoding | multi-threaded bulk geocodings | 100% comparable | largest | xLocate 1, xLocate 2 |
| N times route info through 2 or more waypoints | xroute1.calculateBulkRouteInfo | 100% comparable | small | xRoute 1 |
| N times route info through 2 waypoints sharing start or destination or both | xroute1.calculateMatrixInfo / xDima2.calculateDistanceMatrix | not comparable | huge | xRoute 1, xDima 2 |
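And here is the second sketch, for Approach 4 (the “multi-threaded bulk geocodings” row above): cut the workload into chunks and keep several bulk calls in flight at once. Again, geocode_bulk is only a placeholder for whatever bulk/batch operation the API provides, not an actual PTV endpoint; chunk_size and degree are tuning knobs you would have to pick for your own workload.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def geocode_bulk(addresses):
    """Placeholder for one bulk/batch geocoding call: takes a list of
    addresses and returns one result per address (order preserved)."""
    return [{"address": a, "lat": 0.0, "lon": 0.0} for a in addresses]

def chunks(items, size):
    """Yield successive chunks of at most `size` items."""
    it = iter(items)
    while chunk := list(islice(it, size)):
        yield chunk

def geocode_parallel_bulk(addresses, chunk_size=100, degree=4):
    """Approach 4: cut the workload into chunks and send them through
    parallel bulk operations; pool.map keeps the chunk order."""
    results = []
    with ThreadPoolExecutor(max_workers=degree) as pool:
        for partial in pool.map(geocode_bulk, chunks(addresses, chunk_size)):
            results.extend(partial)
    return results
```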