Hello,
I have a database with thousands of records that contain personal details of customers. Some of these records pertain to the same customer - however, they have been submitted by different people, so they differ slightly in detail.
I've been looking to see whether any of the data mining tools provided by Business Intelligence Development Studio in SQL Server 2005 will let me match records that pertain to the same customer with a high degree of accuracy. From what I can see, these tools seem better suited to making general predictions based on large groupings than to the kind of precise, record-level matching I am looking for.
So I'd appreciate it if anyone could tell me whether there is a way to use Business Intelligence Development Studio to match these 'duplicate' records together, or whether I will have to create a more SQL-based solution that attempts to match the customer records using SELECT statements and assumptions about the data.
TIA,
Kweri
One solution is to start by creating an Integration Services project.
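In the project, define a Data Flow task and add the following components:
- a source (for example, an OLE DB Source), which reads from your database
- a Fuzzy Grouping transform (the fuzzy matching transform for finding likely duplicates within a single data set; its companion, Fuzzy Lookup, matches rows against a separate reference table)
- a destination, which writes out the results
The Fuzzy Grouping transform is intended to resolve the kind of problem you describe (matching records based on similarity). It groups rows that are likely to refer to the same customer, and you can set a similarity threshold to control how strict the matching is.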
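To give a feel for what similarity-based matching means, here is a minimal Python sketch. It is only an illustration of the general idea and not how the Integration Services transform works internally. The field names (`name`, `address`), the sample rows, the 0.8 threshold and the helper functions are all made up for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score, ignoring case and surrounding whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def group_duplicates(records, threshold=0.8):
    """Greedily put each record into the first group whose first record is similar enough.

    The threshold is an assumed value; it would need tuning against real data.
    """
    groups = []  # each group is a list of records; group[0] acts as the canonical record
    for rec in records:
        for group in groups:
            canonical = group[0]
            # Average the per-field similarity over the compared fields.
            score = (similarity(rec["name"], canonical["name"])
                     + similarity(rec["address"], canonical["address"])) / 2
            if score >= threshold:
                group.append(rec)
                break
        else:
            # No existing group was close enough, so start a new one.
            groups.append([rec])
    return groups


# Hypothetical sample data: records 1 and 2 describe the same customer.
customers = [
    {"id": 1, "name": "John Smith", "address": "12 High Street"},
    {"id": 2, "name": "Jon Smith", "address": "12 High St."},
    {"id": 3, "name": "Mary Jones", "address": "4 Park Lane"},
]

groups = group_duplicates(customers)
for group in groups:
    print([rec["id"] for rec in group])
```

With a lower threshold more records are merged (with a higher risk of false matches), and with a higher threshold fewer are, which is the same trade-off you tune when configuring a fuzzy matching transform.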