Geocoding basics (strictly recommended)

deals with geocoding and reverse geocoding in the context of PTV Geocoding&Places, PTV Geocoding&Places OSM and PTV xLocate 1 and 2
Bernd Welter
Site Admin
Posts: 2695
Joined: Mon Apr 14, 2014 10:28 am

Post by Bernd Welter »

Hello community,

Every once in a while, new xServer users start to deal with geocoding, which is an important key to success in logistics processes: almost every other use case in logistics is based on coordinates. Whether you want
  • to calculate driving times and distances from one location to another
  • to visualize locations of your orders or customers
  • to optimize transport sequences
  • ...
you always need the coordinates of those objects, which is why you have to geocode in advance. Furthermore, you might distinguish between various precision levels of coordinates: for some use cases (long-distance trips) it is sufficient to get coordinates that are within several kilometers of the real ones, while you need high-precision coordinates if you want to find the exact house entrance for a parcel delivery driver.

:!: Another task, often treated as priority B, is the validation of addresses, but that is in fact a different story (check http://xserver.ptvgroup.com/forum/viewt ... f=11&t=414)

Depending on your use case, we distinguish between the following input structures:
  • single field, i.e. the input address is given as one single string (or maybe a country code plus a single string). This pattern is common on the internet, where users want to find their locations with the least effort (dialog based: users correct their input if they are not happy with the result). We also call this an "interactive search with a user feedback loop".
  • multi field, i.e. the input address is described via several strings such as country code, postcode, city, district, street and house number. This pattern is used within internal business systems such as ERP, TMS or CRM. Quite often those systems perform the geocoding in batch mode (without a user feedback loop), so they require more or less complicated business logic to define an automatic decision process.
Noise: though the motivation of some users is definitely a positive one, they produce noise within their input: sometimes users want to add information to the address (parking hints, opening hours, …), which disturbs the geocoder because it has to identify and filter out these non-address parts.
:idea: Therefore we recommend creating additional fields where users can provide such info. Once the info is separated, no effort has to be spent on filtering.
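The recommendation to keep noise out of the address fields can be sketched on the client side as follows. This is only an illustration: the class AddressInput, its field names and the delivery_notes field are hypothetical and not part of any xLocate API.

```python
from dataclasses import dataclass


@dataclass
class AddressInput:
    """Hypothetical multi field address record kept in an ERP/TMS/CRM system."""
    country: str = ""
    postcode: str = ""
    city: str = ""
    district: str = ""
    street: str = ""
    house_number: str = ""
    # Free text (parking hints, opening hours, ...) lives in its own field
    # and is deliberately never sent to the geocoder.
    delivery_notes: str = ""

    def geocoding_fields(self) -> dict:
        """Return only the parts of the record that describe the address."""
        return {
            "country": self.country,
            "postcode": self.postcode,
            "city": self.city,
            "district": self.district,
            "street": self.street,
            "house_number": self.house_number,
        }


address = AddressInput(
    country="DE",
    city="Karlsruhe",
    street="Kaiserstraße",
    house_number="242",
    delivery_notes="parking in the backyard",
)
request_fields = address.geocoding_fields()
```

With such a split, the geocoding request contains the address parts only, and the notes stay available for the driver or dispatcher.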
Besides the input patterns, there are various parameters that tell the geocoder how to understand and process the address data. For detailed documentation of all parameters, we recommend taking a look at the API, in particular the SearchParameter class (xLocate 1) and the SearchOptions (xLocate 2).

Here are just some examples of those parameters (xLocate 1):
  • COUNTRY_CODETYPE: how is the country of the input address specified? As there are ISO code systems such as ISO-2 and ISO-3, you might ask your users to provide one of these specific code systems. Usually this is hidden behind a text (e.g. the user sees a drop-down list with fully translated country names and the workflow forwards the ISO-2 code to the backend, invisible to the user).
  • Search types: we support various search types such as BINARY and FUZZY. While some of those types are rather strict in comparing strings, others are flexible and try to handle typos or misspellings.
  • Return details: some use cases deal with a result list where a user is expected to decide between various hits (or to refine the input), while other use cases require compact result lists and aggregate several hits into one representative. Major examples are “returning a list of districts with equal postcodes versus a single representative of the postcode itself” and “returning a list of house number sections versus an interpolated street center”.
  • Patterns: sometimes different user groups are used to individual input patterns, e.g. French users provide the house number in front of the street name (7, rue de rivoli) while Germans put it afterwards (Kaiserstraße 242). You can assist the geocoder by telling it where to look for such additional data.
  • Attention: some search parameters in the current architecture are already marked as deprecated (SWAP_AND_SPLIT_MODE, ASTERISK_MODE…). Though the API still accepts them, they are not really used anymore.
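To make the "Patterns" item above more tangible, here is a minimal sketch of what a house number position hint means. It is not how the geocoder works internally: the function split_house_number and the values "front" and "back" are made up for this illustration, and the regular expressions only cover simple house numbers such as 7 or 242a.

```python
import re
from typing import Tuple


def split_house_number(street_line: str, position: str) -> Tuple[str, str]:
    """Split a street line into (street, house_number).

    position is a hypothetical hint: "front" (7, rue de rivoli)
    or "back" (Kaiserstraße 242).
    """
    text = street_line.strip()
    if position == "front":
        # digits, optional letter, optional comma, then the street name
        match = re.match(r"^(\d+[A-Za-z]?)\s*,?\s+(.+)$", text)
        if match:
            return match.group(2).strip(), match.group(1)
    elif position == "back":
        # street name, whitespace, then digits with an optional letter
        match = re.match(r"^(.+?)\s+(\d+[A-Za-z]?)$", text)
        if match:
            return match.group(1).strip(), match.group(2)
    # No house number recognized: return the line unchanged
    return text, ""


french = split_house_number("7, rue de rivoli", "front")
german = split_house_number("Kaiserstraße 242", "back")
```

Knowing the expected position avoids guessing whether a number belongs to the street name or is the house number.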
Output structures (xLocate 1)
The following section describes the structure of the result. Though the major parts of the result are the output address (given with the known 7 fields) and the coordinates, the geocoder also returns a large number of additional properties per address. While the output address and the coordinates are sufficient for a dialog user to decide which hit is the proper one, batch mode requires a lot more information to implement the automatic decision. These properties can be categorized as follows:
  • Properties describing the specific hit itself, e.g. DETAIL_LEVEL (POSTCODE, STREET, …) or the coordinates
  • Properties comparing the specific hit with the input, e.g. FOUNDBY_STREET (which compares the input street with the hit’s street property) or SCORE
  • Properties comparing the specific hit with the other hits, e.g. CLASSIFICATION (UNIQUE: this is the only hit with 100% score versus EXACT: same quality but not the only hit with this quality)
  • The SCORE deserves a special comment: most users see this output property as the key to geocoding, but a high score does not guarantee high quality! Please look at the detailed properties such as the field classifications; they are extremely important.
As mentioned before, the number of output properties is huge. You can benefit from spending some time and business logic on understanding the structure of the output. Depending on the requirements of your individual use case you might have to look at a specific subset of the info. There is no generic approach. On a meta level: the PTV geocoder provides a lot of potential ingredients, but depending on what you’d like to cook you have to choose completely different ones.
So the better you understand the character of all those properties, the better you can implement a proper automatic geocoding.
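As an illustration of why the score alone is not enough, a batch decision could combine several of the properties named above. The following is a minimal sketch under assumptions: each hit is modelled as a plain dictionary keyed by property name, the score is assumed to range from 0 to 100, and the threshold and accepted detail levels are arbitrary example values, not PTV recommendations.

```python
from typing import Dict, FrozenSet, List, Optional


def pick_hit(
    hits: List[Dict],
    min_score: int = 90,
    accepted_detail_levels: FrozenSet[str] = frozenset({"STREET"}),
) -> Optional[Dict]:
    """Return the hit to adopt automatically, or None for manual review."""
    for hit in hits:
        if (
            hit["SCORE"] >= min_score
            and hit["DETAIL_LEVEL"] in accepted_detail_levels
            # UNIQUE: the only hit with this quality, so no ambiguity remains
            and hit["CLASSIFICATION"] == "UNIQUE"
        ):
            return hit
    return None


# High score, but only the postcode was matched: not good enough here
postcode_only = [{"SCORE": 100, "DETAIL_LEVEL": "POSTCODE", "CLASSIFICATION": "UNIQUE"}]
# Two equally good street hits: ambiguous, needs a human decision
ambiguous = [
    {"SCORE": 100, "DETAIL_LEVEL": "STREET", "CLASSIFICATION": "EXACT"},
    {"SCORE": 100, "DETAIL_LEVEL": "STREET", "CLASSIFICATION": "EXACT"},
]
street_level = [{"SCORE": 100, "DETAIL_LEVEL": "STREET", "CLASSIFICATION": "UNIQUE"}]
```

Here pick_hit returns None for the first two lists and the single hit for the third one, although all hits carry the same score. A long-distance use case might accept POSTCODE as well, which is exactly the kind of choice that depends on what you want to cook.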

So in the end you have to define your individual understanding of “what is a good match”. We can give you some recommendations, but you achieve the best results by dealing with the parameters on your own.
Feel free to consult the attached documents, which help you understand each and every parameter.
gpGeocoder_Parameter.pdf
Documentation Geocoder_Parameter
(134.64 KiB) Downloaded 917 times
gpGeocoder_Classification.pdf
Documentation Geocoder_Classification
(346.9 KiB) Downloaded 854 times
Best regards from the headquarters (and many, many thanks to the buddies in DEV who provided the docs!)

Furthermore, this thread is also quite interesting in this context:
http://xserver.ptvgroup.com/forum/viewt ... ?f=11&t=16

Bernd

PS: Coordinate format: of course we support various coordinate formats such as PTV_MERCATOR and OG_GEODECIMAL, but those are not geocoding parameters. They are generic ones and therefore part of the xServer framework.
Bernd Welter
Technical Partner Manager Developer Components
PTV Logistics - Germany

Bernd at... The Forum,LinkedIn, Youtube, StackOverflow
I like the smell of PTV Developer in the morning... :twisted:
bocajo
Posts: 45
Joined: Tue Mar 01, 2016 3:05 pm

Re: Geocoding basics (strictly recommended)

Post by bocajo »

I think that a customer should not be too concerned with the geocoding search parameters. If a customer doesn't get the expected result, she/he can't know whether it is a data problem or an xLocate search algorithm problem. I would prefer that she/he contacts support so PTV can clarify what can be done to get the expected result.
From my point of view it is important to know the result fields Score.TotalScore, Field.Classification and, maybe even more, MatchingPostalCodeDigits, especially if a customer wants to geocode a bunch of addresses automatically. With these fields a customer can choose their own criteria or threshold values to decide whether the result address can be adopted automatically.
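A threshold-based triage for a whole batch could look like the sketch below. It is only an illustration: the dictionary keys totalScore and matchingPostalCodeDigits are a hypothetical flattening of the fields mentioned above (not the real response structure), and the threshold values are arbitrary examples that every customer has to choose for their own data.

```python
from typing import Dict, List, Tuple


def triage(
    results: List[Dict],
    min_total_score: int = 95,
    min_postcode_digits: int = 5,
) -> Tuple[List[Dict], List[Dict]]:
    """Split batch results into (adopt automatically, review manually)."""
    accepted, review = [], []
    for result in results:
        good_enough = (
            result["totalScore"] >= min_total_score
            and result["matchingPostalCodeDigits"] >= min_postcode_digits
        )
        (accepted if good_enough else review).append(result)
    return accepted, review


batch = [
    {"id": "A", "totalScore": 100, "matchingPostalCodeDigits": 5},
    {"id": "B", "totalScore": 97, "matchingPostalCodeDigits": 3},
    {"id": "C", "totalScore": 80, "matchingPostalCodeDigits": 5},
]
accepted, review = triage(batch)
```

In this example only address A is adopted automatically; B and C end up in the review list, where a user or the support can take a closer look.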
Jochen Anderer
Manager Engineer
PTV GROUP GERMANY