xLocate 1: Classification of geocode result
Posted: Fri Jun 06, 2014 3:32 pm
In many cases, systems receive addresses from other systems that need to be used in transport planning. For example, the transport management system of a transport company can receive data via EDI from a client's order management system. Often this data lacks the coordinates needed for further planning and has to be geocoded.
When geocoding, the question is always whether the result quality is high enough to accept the result and use it in further calculations without having a user check it. These data sets can become quite big, and a difference of only a few percent in the automatic acceptance rate can mean many man-hours of manual checking. So how can you use the xLocate server to distinguish a good result from a bad one?
The first way is to use the score. xLocate ranks each result between 0 (no match at all) and 100 (perfect match). You can find the score in the totalScore attribute of the ResultAddress object. This gives you a straightforward way to set a limit on what you accept automatically and what you send to manual geocoding. However, there is a downside: the score is calculated with a formula you cannot change, and that formula may not suit your business case.
For example, in a quote system for international transport you might not care whether the street is found correctly. As long as the place or postcode is a good match, you know that reality will differ only slightly from your quote. In parcel delivery, on the other hand, the street becomes important, because a wrong match can send the vehicle to the wrong side of the city. So there can be a need for a more detailed quality indicator than a single number.
This is where the field classifications come in. For many input fields you can request a classification, so you can build your own decision tree. These classifications are not returned by default; to request them you need to add them as ResultField in your input. The result fields that can help you are:
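As a rough illustration of the score approach, the sketch below splits a batch of results on totalScore alone. The ResultAddress class here is only a minimal stand-in (just the totalScore attribute comes from xLocate), and the threshold of 90 is a hypothetical value that you would have to tune against a manually checked sample of your own data.

```python
from dataclasses import dataclass

# Hypothetical cut-off, not an xLocate recommendation.
ACCEPT_THRESHOLD = 90


@dataclass
class ResultAddress:
    """Minimal stand-in for an xLocate result; only totalScore is taken from the post."""
    city: str
    totalScore: int


def split_by_score(results, threshold=ACCEPT_THRESHOLD):
    """Return (accepted, manual) lists based on totalScore alone."""
    accepted = [r for r in results if r.totalScore >= threshold]
    manual = [r for r in results if r.totalScore < threshold]
    return accepted, manual


results = [ResultAddress("Mechelen", 100), ResultAddress("Mechelen", 72)]
accepted, manual = split_by_score(results)
print(len(accepted), "accepted,", len(manual), "sent to manual geocoding")
```

Moving the threshold up or down is the only control you have with this approach, which is exactly the limitation described above.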
- POSTCODE_CLASSIFICATION
- TOWN_CLASSIFICATION
- STREET_CLASSIFICATION
- HOUSENR_CLASSIFICATION
Note 2: city and city2 are combined in TOWN_CLASSIFICATION. Opinions on whether something is a city or a city2 can differ a lot. Instead of forcing the user to enter the data exactly as the map provider has stored it, our geocoding algorithm works around this. For example, according to the map provider the village of Heffen in Belgium is a district of the city of Mechelen. xLocate lets you enter Heffen as city and returns "Mechelen, Heffen" as the result, without adding a penalty for Heffen being in the city input field instead of the city2 input field.
The possible output values can be looked up in the FieldClassificationDescription enumeration.
An example of a general decision tree for transport could be:
If
    (POSTCODE_CLASSIFICATION = EXACT and TOWN_CLASSIFICATION >= MEDIUM and STREET_CLASSIFICATION >= HIGH)
Or
    (TOWN_CLASSIFICATION >= HIGH and STREET_CLASSIFICATION >= HIGH)
Then
    accept the result
Else
    send to manual geocoding
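This sample takes into account that a typo in a postcode can easily lead to another valid postcode, while a typo in a place name usually does not. That is why a postcode match is only trusted when it is exact and the town is at least a medium match, whereas a high town match is accepted regardless of the postcode. The street must always be classified high, to make sure the vehicle ends up near the real location. The house number classification is not checked, because being in the correct street is close enough.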
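The decision tree above could be sketched in code roughly as follows. This is only an illustration under assumptions: the classification level names and their order in the Classification enum are placeholders (check the FieldClassificationDescription enumeration for the real values), and the classifications are assumed to have already been read from the result into a plain dict keyed by result field name.

```python
from enum import IntEnum


class Classification(IntEnum):
    """Ordered quality levels. Names and order are assumptions for this sketch;
    look up FieldClassificationDescription for the real values."""
    NO_RESULT = 0
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    EXACT = 4


def parse(value):
    """Map a classification string to a level; unknown or missing values count as NO_RESULT."""
    try:
        return Classification[(value or "").upper()]
    except KeyError:
        return Classification.NO_RESULT


def accept(fields):
    """Apply the decision tree above to a dict of classification strings."""
    postcode = parse(fields.get("POSTCODE_CLASSIFICATION"))
    town = parse(fields.get("TOWN_CLASSIFICATION"))
    street = parse(fields.get("STREET_CLASSIFICATION"))
    # HOUSENR_CLASSIFICATION is deliberately ignored, as in the tree above.
    if street < Classification.HIGH:
        return False
    if postcode == Classification.EXACT and town >= Classification.MEDIUM:
        return True
    return town >= Classification.HIGH


sample = {
    "POSTCODE_CLASSIFICATION": "EXACT",
    "TOWN_CLASSIFICATION": "MEDIUM",
    "STREET_CLASSIFICATION": "HIGH",
}
print("accept the result" if accept(sample) else "send to manual geocoding")
```

Treating unknown or missing classifications as the lowest level is a conservative choice: anything the code does not recognise ends up in manual geocoding instead of being accepted silently.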