Keywords: Address quality, Data quality, Addresses, Address database, CRM, Duplicates, Duplicate check, De-duplication, Consolidation, Selectivity, Software.
De-duplication in address files
Address files (e.g. CSV, Excel, Access) or database tables (e.g. Microsoft SQL Server, Oracle) are checked and cleansed in three steps:
- In the first step, duplicate candidates are identified by fuzzy matching of the addresses.
- In the second step, the duplicate candidates are combined into groups and compared against each other to check whether they really are duplicates or only single addresses (singles).
- In the third step the duplicate groups are cleansed. q.address supports two procedures:
  - Selection: One address is selected according to defined rules; the remaining addresses are deleted from the inventory. Problem: the information contained in the eliminated addresses is lost along with them. For instance, the first name could be blank in the surviving address while it was present in one of the eliminated addresses; the first name would then be lost. The same applies to telephone numbers, e-mail and Internet addresses, marketing information, etc.
  - Consolidation: Information is collected from all addresses of the duplicate group and merged into a single result address according to defined rules. This prevents loss of information.
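The three steps above can be sketched in Python as follows. This is only a minimal illustration and not the algorithm q.address actually uses: the field names, the use of `difflib.SequenceMatcher` as the similarity measure, the threshold of 0.8, and the simple rules in `select` and `consolidate` are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Assumption: these three fields are enough to recognise a duplicate.
MATCH_FIELDS = ("last_name", "street", "city")


def match_key(addr):
    """Build a lowercase comparison string from the match fields."""
    return " ".join(str(addr.get(f, "")).lower().strip() for f in MATCH_FIELDS)


def similarity(a, b):
    """Fuzzy similarity between two addresses, from 0.0 to 1.0."""
    return SequenceMatcher(None, match_key(a), match_key(b)).ratio()


def find_groups(addresses, threshold=0.8):
    """Steps 1 and 2: find candidate pairs and combine them into groups.

    Compares every pair, which is fine for a sketch but too slow for
    large files. Returns lists of indices; a list of length 1 is a single.
    """
    parent = list(range(len(addresses)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(addresses)), 2):
        if similarity(addresses[i], addresses[j]) >= threshold:
            parent[find(j)] = find(i)

    groups = {}
    for i in range(len(addresses)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def select(group):
    """Step 3, selection: keep the record with the most filled fields."""
    return max(group, key=lambda addr: sum(1 for v in addr.values() if v))


def consolidate(group):
    """Step 3, consolidation: take the first non-empty value per field."""
    result = {}
    for addr in group:
        for field, value in addr.items():
            if not result.get(field):
                result[field] = value
    return result


addresses = [
    {"first_name": "", "last_name": "Mueller", "street": "Hauptstr. 12",
     "city": "Berlin", "phone": "030 123456", "email": "mueller@example.com"},
    {"first_name": "Hans", "last_name": "Müller", "street": "Hauptstrasse 12",
     "city": "Berlin", "phone": "", "email": ""},
    {"first_name": "Anna", "last_name": "Schmidt", "street": "Lindenweg 3",
     "city": "Hamburg", "phone": "", "email": ""},
]

groups = find_groups(addresses)
duplicates = [addresses[i] for i in groups[0]]
print(groups)
print(select(duplicates))
print(consolidate(duplicates))
```

With this sample data, selection keeps the first record together with its empty first name, whereas consolidation takes the first name "Hans" from the record that selection would have deleted.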
If address lists are prepared for a mailing, the processing results can in most cases be used immediately without further review.
If the addresses in the address database of a CRM, marketing or ERP system are to be cleansed (so-called “basic database cleansing”), the task often fails: the results have to be loaded back into the database, and a solution has to be found for handling the duplicates. As a rule, duplicates cannot simply be marked as deleted or removed entirely, because further information is attached to them: offers, invoices, marketing information, and contact persons, which in turn may have data records of their own.
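One possible way of handling attached records is to re-point them to the surviving address before the duplicate is flagged. The following sketch shows this with an in-memory SQLite database. The table and column names (`address`, `invoice`, `merged_into`) and the function `merge_into_survivor` are hypothetical; a real CRM or ERP schema will look different and will usually have many more dependent tables.

```python
import sqlite3


def merge_into_survivor(conn, survivor_id, duplicate_ids):
    """Re-point dependent rows to the survivor, then flag the duplicates.

    Assumption: invoices are the only dependent records in this toy schema.
    """
    with conn:  # run all updates in one transaction
        for dup_id in duplicate_ids:
            conn.execute(
                "UPDATE invoice SET address_id = ? WHERE address_id = ?",
                (survivor_id, dup_id),
            )
            # Keep the duplicate row, but record where it was merged to.
            conn.execute(
                "UPDATE address SET merged_into = ? WHERE id = ?",
                (survivor_id, dup_id),
            )


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE address (id INTEGER PRIMARY KEY, name TEXT, merged_into INTEGER);
CREATE TABLE invoice (id INTEGER PRIMARY KEY,
                      address_id INTEGER REFERENCES address(id),
                      amount REAL);
INSERT INTO address (id, name) VALUES (1, 'Mueller'), (2, 'Müller');
INSERT INTO invoice (id, address_id, amount) VALUES (10, 1, 100.0), (11, 2, 250.0);
""")

merge_into_survivor(conn, survivor_id=1, duplicate_ids=[2])
rows = conn.execute("SELECT id, address_id FROM invoice ORDER BY id").fetchall()
print(rows)
```

After the call, both invoices belong to address 1, and address 2 remains in the table with a pointer to its survivor instead of being deleted.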
The search for duplicates is a core technology for a large number of practical tasks:
- Search and elimination of duplicates (= conventional de-duplication)
- Merging several address inventories without creating duplicates
  - E.g.: Different companies within a group, departments, specialist areas or applications each maintain their own address inventories, which are to be consolidated in a common CRM system.
- Filters: positive and negative matches
  - E.g.: Eliminating Nixie and Robinson addresses or addresses from blacklists (e.g. customers reluctant to pay).
- Address updates (among other things, by comparison with a relocation file)
- Address enrichment (among other things, by comparison with reference inventories such as company databases, in order to transfer marketing data)
- Synchronization of different address systems, such as the customers in the accounting system (ERP) and the addresses in an upstream CRM system.
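The filter task from the list above can be sketched with the same kind of fuzzy match. A negative match removes every address that appears in a suppression list (such as a Robinson list or a blacklist); a positive match keeps only the addresses that appear in a reference list. The field names, the function names and the threshold of 0.8 are assumptions made for this example, and `difflib.SequenceMatcher` merely stands in for whatever matching procedure the product uses.

```python
from difflib import SequenceMatcher

# Assumption: these three fields identify an address well enough.
MATCH_FIELDS = ("last_name", "street", "city")


def match_key(addr):
    """Build a lowercase comparison string from the match fields."""
    return " ".join(str(addr.get(f, "")).lower().strip() for f in MATCH_FIELDS)


def matches_any(addr, reference, threshold=0.8):
    """True if the address fuzzily matches at least one reference address."""
    key = match_key(addr)
    return any(
        SequenceMatcher(None, key, match_key(ref)).ratio() >= threshold
        for ref in reference
    )


def negative_filter(addresses, suppression_list, threshold=0.8):
    """Negative match: drop addresses found in the suppression list."""
    return [a for a in addresses if not matches_any(a, suppression_list, threshold)]


def positive_filter(addresses, reference_list, threshold=0.8):
    """Positive match: keep only addresses found in the reference list."""
    return [a for a in addresses if matches_any(a, reference_list, threshold)]


mailing = [
    {"last_name": "Mueller", "street": "Hauptstr. 12", "city": "Berlin"},
    {"last_name": "Schmidt", "street": "Lindenweg 3", "city": "Hamburg"},
]
robinson = [
    {"last_name": "Müller", "street": "Hauptstrasse 12", "city": "Berlin"},
]

print(negative_filter(mailing, robinson))
print(positive_filter(mailing, robinson))
```

Here the negative filter leaves only the Schmidt address in the mailing list, because the Mueller address matches the Robinson entry despite the different spelling.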
The product name of the duplicate check module is “DC” (“DublettenCheck”).
Duplicate check (for cleansing address files) is available in:
- q.address Stand-alone