Technology World

Monday, August 6, 2018

Informatica MDM - Match Rule Tuning

Would you like to know how to perform Informatica MDM match rule tuning? Are you also interested in knowing what steps are involved in match rule tuning? If so, then this article, will assist you in understanding the steps involved in match rule tuning.

Introduction

The Informatica MDM match tuning is an iterative process. It involves the following steps: data profiling, data standardization, defining the fuzzy match key, tuning the fuzzy match process and database tuning.

Activities

The activities mentioned below are needed to perform the match rule tuning in the Informatica MDM.

Activity	Details
Data Profiling	The right data in for the match, the data investigation, the data accuracy, the data completeness.
Data Standardization	Cleaning and standardization
Define the Fuzzy Match key	Fuzzy match keys ( the columns that need to be matched ) with the key width.
Fuzzy Match Process	How to use the following: 1) Key width 2) Match level 3) Search level 4) Cleanse server log 5) Dynamic Match Threshold (DMAT) 6) Filters 7) Subtype Matching 8) Match Only Previous Rowid Object option 9) Configure match threads 10) Enable Light Weight Matching (LWM)
Database Tuning	1) Analyze tables 2) Create indexes 3) Configure Match_Batch_Size 4) Analyze STRP table.

Data Profiling

You need to perform analysis of the data on which the match will be performed. You should also analyze the quality of data across all fields.
Share the result of the data analysis with business users and get inputs about what attributes need to be considered for the matching process.
Identify fields which you think can provide better matches, e.g. SSN, TAXID etc.
The next step is to determine filter criteria which are normally used on exact columns such as COUNTRY=US. This will be helpful for achieving better performance.
You need to also determine the completeness of data. For example, if the country code is valued in only 50% of the records, it may not be a good candidate as an exact column.
You need to verify percentage of data accuracy.,.e.g. the gender field should only contain gender values.
It is always a good idea to analyze data using the pattern mechanism.
Finally determine the type of match population to use. e.g. USA.

Data Standardization

Determine the cleansing rule to standardize data, for example, Street, St. to ST.
Use data standardizing tools such as address doctor, Trillium or any other third party tool.

Determine the Fuzzy Match Key

The basic rules mentioned below about defining the fuzzy match key include:

OrganizationName: If the data contains the organization names or both organization names and the person's name
PersonName: If the data contains person names only
AddressPart1: If the data contains only the address

Tuning the Potential match candidates

a) Key Width:

For less SSA indexes, reduce the key width to ‘Preferred’
For more match candidates, use the key width as ‘Extended’

b) Search Level:

For less SSA ranges use the search level as ‘Narrow’
For more candidates to match use search level as ‘Exhaustive’
Use ‘Typical’ search level for business data
To match most candidates, use search level as ‘Extreme’. It has performance issues associated with it.

c) Match Level:

For records that are highly similar and which should be considered for a match, use match level as ‘Conservative’
Use match level as ‘Typical’ for most matches
Match level ‘Loose’ is better for manual matches to ensure that tighter rules have not missed any potential matches

d) Define the fuzzy match key columns that have more unique data, e.g. the Person Name or the Organization Name

e) Data in the fuzzy match key columns should not contain nulls. Nulls (SSA_KEY is K$$$$$$$) are potential candidates for each other.

Use the range query as below and review the SSA_DATA column for all the qualifying candidates-

SELECT DISTINCT ROWID_OBJECT, DATA_COUNT,SSA_DATA, DATA_ROW

FROM C_PARTY_STRP

WHERE SSA_KEY BETWEEN ‘YBJ>$$$$’ AND ‘YBLVZZZZ’

AND INVALID_IND = 0

ORDER BY ROWID_OBJECT, DATA_ROW

Cleanse Server logs

Cleanse server logs help to determine long running ranges. These long running ranges normally have more candidates to match. Isolate such ranges by looking into the cleanse server logs. Normally, production is a multi-threaded environment, so determine the rangers for these threads. Analyze which thread is taking more time and take out those records from the matching process and re-run the match job.

The video below provides more details about the match rule tuning -

Friday, July 27, 2018

How to Soft Delete BULK Records using SIF - Delete API

Are you facing the issue in your organization about soft deleting the high volume of records in the Informatica MDM? Are you looking for the best possible way to soft delete this bulky volume of records? Are you looking for information about how to use the Services Integration Framework (SIF) – Delete API? This article examines the basic concept of SIF – Delete API and how to implement Java code for soft deleting records using SIF API.

Business Use Case:

Assume that MDM implementation is completed and daily jobs are running well in production. However, on a particular day, the ETL team loaded the wrong set of data in the MDM landing tables which results in records going from landing to staging and from staging to Base Object. Now, bad data is present in the Base Object, Xref and history tables and your business would like to soft delete these records. These options below are available to resolve this problem:

a) Physically delete the records using ExecuteBatchDelete API.

b) The ETL team can soft delete the record and load in landing. It can then be processed using stage and load job.

c) Soft delete records using SIF Delete API.

Out of all these, option ‘Soft delete records using SIF Delete API’ is the easiest to implement to handle business needs.

What is SIF Delete API?

The SIF Delete API can delete a base object record and all its cross-reference (XREF) records. It can also be used to delete any specific XREF record. State of record in Base Object table will be reset when a XREF record is deleted and it is based on the higher precedence basis. The records undergo the following changes when records are deleted:

Records in the ACTIVE state are set to the DELETED state.
Records in the PENDING state are hard deleted.
Records in the DELETED state are retained in the DELETED state.

Sample API Request:

DeleteRequest request = new DeleteRequest();
RecordKey recordKey = new RecordKey();
recordKey.setSourceKey("4001");
recordKey.setSystemName("SRC1");
ArrayList recordKeys = new ArrayList();
recordKeys.add(recordKey);
request.setRecordKeys(recordKeys);
request.setSiperianObjectUid("PACKAGE.CUSTOMER_PUT");

DeleteResponse response = (DeleteResponse) sipClient.process(request);

Response Processing:

The getDeleteResults() returns the list of RecordResult objects which contains all necessary information such as the record key with ROWID_XREF, PKEY_SRC_OBJECT, ROWID_SYSTEM etc which can be retrieved as below.

DeleteResponse response = new DeleteResponse();
for(Iterator iter=response.getDeleteResults().iterator();
iter.hasNext();)
{
  //iterate through response records
  RecordResult result = (RecordResult) iter.next();
  System.out.println("Record: " + result.getRecordKey());
  System.out.println(result.isSuccess()?"Success","Error");
  System.out.println("Message: " + result.getMessage());
}

Sample Code:

You can download source code here.

private void deleteRecord(List xrefIds) {

try {

List successRecord = new ArrayList();

List failedRecord = new ArrayList();

DeleteRequest request = new DeleteRequest();

ArrayList recordKeys = new ArrayList();

Iterator itrRowidXref = xrefIds.iterator();

while (itrRowidXref.hasNext()) {

Integer rowidXrefValue = (Integer) itrRowidXref.next();

RecordKey recordKey = new RecordKey();

recordKey.setRowidXref(rowidXrefValue.toString());

recordKeys.add(recordKey);

}

request.setRecordKeys(recordKeys);

request.setSiperianObjectUid("BASE_OBJECT.C_BO_CUST");

AsynchronousOptions localAsynchronousOptions = new AsynchronousOptions(false);

request.setAsynchronousOptions(localAsynchronousOptions);

DeleteResponse response = (DeleteResponse) sipClient.process(request);

if (response != null) {

for (Iterator iter = response.getDeleteResults().iterator();

iter.hasNext();) {

RecordResult result = (RecordResult) iter.next();

if (result.isSuccess()) {

successRecord.add(result.getRecordKey().getRowidXref());

} else {

failedRecord.add(result.getRecordKey().getRowidXref());

}

System.out.println("Failed Records : " + failedRecord);

} catch (Exception e) {

e.printStackTrace();

}

Details about implementation are explained in this video:

Monday, July 23, 2018

Services Integration Framework – SIF – API – CleansePut

Purpose of the CleansePut API:

The CleansePut API is used to insert or update a record into a base or dependent child base object in a single request. It increases the performance by reducing the number of round trips between the client and the MDM Hub.

How does it work?

During the CleansePut processing all records go through the stage batch process and the load batch process in a single request.
The data is transferred from a landing table to the staging table associated with a specific base object.
During this transfer of data cleansing happens if cleansing is defined.
The mapping created in the MDM hub has a link between the landing table and the staging table along with the data cleaning function.
This mapping name is used to determine the landing and the staging table structure.
After successful processing of the stage job, the load batch process will be started which transfers data from a staging table to the corresponding target table or the base object in the Hub Store.
In order to determine a base object or dependent child table name, the staging table associated with the mapping is used.
Even though data is processed through the stage batch, it does not use the landing and staging tables.

What is role of stage management during CleansePut request?

If state management is enabled then we can specify the initial state of the record in the HUB_STATE_IND column of the mapping. Valid values in the HUB_STATE_IND column:

1 (ACTIVE)
0 (PENDING)
-1 (DELETED)

Default value is 1 when you insert a new record. We cannot use the HUB_STATE_IND column of the mapping to specify the state change while updating the record.

Important points:

Null values can be processed by both the PutRequest and CleansePut APIs. For example, if we do not specify a value for a request, null value will be set.
For the non nullable column, do not insert a null value such as a unique key column.
Values in the read only column cannot be updated or inserted by the CleansePut API.
We can insert or update values in the system columns if the putable property is enabled.
We can use the backslash (\) to escape special characters such as the single quotation mark (') or the tilde (~) in the CleansePut object.
To filter the record we can use the Mappings tool in the Hub Console to include a filter criteria.
The CleansePut API can use delta detection on the staging table. Data will be filtered if the input data does not differ from the existing data.

Methods

Method Name	Description
getCleanseData	Gets the cleansed record
getGenerateSourceKey	Gets the status that indicates whether to generate a source key
getIsFillOnGap
getPeriodReferenceDate
getRecord	Gets the record to update or insert into a base or dependent object
getSiperianObjectUid	Gets the unique ID for the record from SiperianObjectUidProvider.getSiperianObjectUid()
getSystemName	The name of the system
getTimeLineAction
setGenerateSourceKey(boolean generateSourceKey)	Sets the status to indicate whether to generate a source key
setIsFillOnGap(boolean isFillOnGap)
setPeriodReferenceDate(Date periodReferenceDate)
setRecord(Record record)	Sets the record to update or insert into a base or dependent object
setSystemName(String systemName)	Sets the name of the system
setTimeLineAction(int timeLineAction)

Java Sample Example

In the example below, the record with the ROWID_OBJECT = 1000 gets updated and it uses the Stage SRC1 Party mapping:

CleansePut request = new CleansePut ();

Record record = new Record();

record.setSiperianObjectUid("MAPPING.Stage SRC1 Party");

record.setField( new Field("PARTY_ID", "1000") );

record.setField( new Field("FULL_NM", "Ross Paul") );

record.setField( new Field("TAXID", "123456") );

record.setField( new Field("LAST_UPDATE_DATE", new Date()) );

request.setRecord( record );

CleansePutResponse response = (CleansePutResponse) sipClient.process(request);

This video below explains how to use Put API in Java -

Wednesday, July 18, 2018

How to delete records in Informatica MDM using SIF API

In this article we will learn about the process for deleting records in Ithe nformatica MDM Hub.

Pre-Requisite:

1. In order to delete records we need to have SOAP UI installed in our system.

2. Database client such as SQL Developer in order to verify records

3. Server logs access in order to analyze logs in case any issue occurs

Sample Request:

Below is a sample request which can be used to delete record/records in the XREF and BO tables.

You can download the XML Request here.

<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:urn="urn:siperian.api">

<soapenv:Header/>

<soapenv:Body>

<urn:executeBatchDelete>

<urn:username>xxxxx</urn:username>

<urn:password>

<urn:password>yyyyy</urn:password>

<urn:encrypted>false</urn:encrypted>

</urn:password>

<urn:orsId>localhost-orcl102-CMX_ORS</urn:orsId>

<urn:tableName>C_BO_PARTY</urn:tableName>

<urn:sourceTableName>C_BO_PARTY_XREF_DEL</urn:sourceTableName>

<urn:recalculateBvt>TRUE</urn:recalculateBvt>

<urn:cascading>FALSE</urn:cascading>

<urn:overrideHistory>FALSE</urn:overrideHistory>

<urn:purgeHistory>FALSE</urn:purgeHistory>

</urn:executeBatchDelete>

</soapenv:Body>

</soapenv:Envelope>

Details about request:

Below are the components available in this request:

TableName : Base Object table name
SourceTableName: Name of the table that contains the list of cross-reference records to delete. This table should contain at least the ROWID_XREF column or the (PKEY_SRC_OBJECT and ROWID_SYSTEM columns)
Cascading: Set to true to run a cascading batch delete
OverrideHistory:

o Determines if the MDM Hub records the activity performed by the batch delete in the history tables.

o Set to true to record the history of the deleted records in the history table.

o Set to false to ignore the value of PurgeHistory and to write the last state of the data into the history tables when the record is deleted.

· PurgeHistory:

o Determines if the MDM Hub deletes all non-merge history records related to the deleted cross-reference record.

o The deleted history records cannot be retrieved.

o Set to true to delete the history records.

o Set to false to retain the history records.

Thevideo below provides detailed information on how to delete records using the SOAP UI tool.