Introduction
Activities
Activity
|
Details
|
Data Profiling
|
The right data in for the match, the data investigation, the data accuracy, the data completeness.
|
Data Standardization
|
Cleaning and
standardization
|
Define the Fuzzy Match key
|
Fuzzy match keys ( the columns that need to be matched )
with the key width.
|
Fuzzy Match Process
|
How to use the following:
1) Key width 2) Match level 3) Search level 4) Cleanse server log 5) Dynamic Match Threshold (DMAT) 6) Filters 7) Subtype Matching 8) Match Only Previous Rowid Object option 9) Configure match threads 10) Enable Light Weight Matching (LWM) |
Database Tuning
|
1) Analyze tables
2) Create indexes 3) Configure Match_Batch_Size 4) Analyze STRP table. |
Data Profiling
- You need to perform analysis of the data on which the match will be performed. You should also analyze the quality of data across all fields.
- Share the result of the data analysis with business users and get inputs about what attributes need to be considered for the matching process.
- Identify fields which you think can provide better matches, e.g. SSN, TAXID etc.
- The next step is to determine filter criteria which are normally used on exact columns such as COUNTRY=US. This will be helpful for achieving better performance.
- You need to also determine the completeness of data. For example, if the country code is valued in only 50% of the records, it may not be a good candidate as an exact column.
- You need to verify percentage of data accuracy.,.e.g. the gender field should only contain gender values.
- It is always a good idea to analyze data using the pattern mechanism.
- Finally determine the type of match population to use. e.g. USA.
Data Standardization
- Determine the cleansing rule to standardize data, for example, Street, St. to ST.
- Use data standardizing tools such as address doctor, Trillium or any other third party tool.
Determine the Fuzzy Match Key
- OrganizationName: If the data contains the organization names or both organization names and the person's name
- PersonName: If the data contains person names only
- AddressPart1: If the data contains only the address
Tuning the Potential match candidates
- For less SSA indexes, reduce the key width to ‘Preferred’
- For more match candidates, use the key width as ‘Extended’
- For less SSA ranges use the search level as ‘Narrow’
- For more candidates to match use search level as ‘Exhaustive’
- Use ‘Typical’ search level for business data
- To match most candidates, use search level as ‘Extreme’. It has performance issues associated with it.
- For records that are highly similar and which should be considered for a match, use match level as ‘Conservative’
- Use match level as ‘Typical’ for most matches
- Match level ‘Loose’ is better for manual matches to ensure that tighter rules have not missed any potential matches