DronaBlog


Friday, February 22, 2019

Informatica Master Data Management (MDM) Architecture Overview



Are you looking for information about the Informatica Master Data Management (MDM) architecture? Would you also like to know which components are involved? If so, you have reached the right place. In this article, we will explore the Informatica MDM architecture in detail. We will also understand which upstream and downstream systems are involved.

MDM Architecture Overview



As we know, Master Data Management, i.e. MDM, is a solution for mastering business information. MDM involves several processes that help us achieve uniformity, accuracy, and consistency in business data. Such business-critical data can be used for better process management and for achieving the organization's goals. With the help of an MDM solution, we can carry out data governance practices very effectively.

If we look at the big picture of the MDM architecture, we can see that there are basically three layers: the first layer is the source systems, the second is the MDM implementation, and the third is consumption.



The source system layer includes the operational systems that maintain third-party data. This layer may include multiple sources with different platforms such as Siebel, Oracle, SAP, Acxiom, or D&B. The data from source systems is not pushed to the MDM layer directly. To push data from a source system to the MDM layer, we normally use an ETL layer (here, ETL stands for Extract, Transform, and Load). This data push may happen in batch mode, real-time mode, or near-real-time mode.

Once data enters the MDM landing tables, data cleansing happens first. Data standardization rules are applied to enrich the business data. The cleansed and standardized data is loaded into the staging tables. To maintain data integrity, constraints are enforced on the Base Object table while loading data from the staging table to the base object table. This is not the end of the process; the actual processing starts after this. Even though cleansed and standardized records are loaded into the MDM system, there will still be duplicate and fuzzy records. The next process, i.e. the match process, identifies such records based on the business criteria and rules developed during the data quality analysis phase. These duplicate records are consolidated to produce a golden copy of each record. The golden records may hold relationships among themselves, e.g. manager-employee or organization-branch relationships. These relationships can also be maintained in the MDM system in the form of hierarchies.
Data stewardship helps keep the golden records in a consistent state and enforces controls on the create and update processes through the user interfaces that come with the Informatica MDM product, e.g. Informatica Data Director or the Customer 360 application.

None of these MDM features, such as data modeling, data quality, duplicate identification, record consolidation, hierarchy maintenance, and workflows, is compromised over security: MDM comes with built-in role-based security. If required, we can integrate the organization's existing security infrastructure, such as LDAP, for authentication. However, authorization of MDM components needs to happen in the MDM hub via the role-based mechanism. One of the great things about Informatica MDM is that it keeps MDM configurations in sync with the help of metadata.

Okay, we created golden records in the MDM hub. What do we do with this data? Thanks for asking that question. After a successful implementation of the MDM solution, the golden records become available for consumers. A third-party application can consume data directly from MDM, but in most cases the data is pushed from MDM to the consuming systems through an ETL layer, just like data loading from source systems into MDM; it can be batch, real-time, or near real-time. There are a few other types of consuming systems, such as analytical and reporting systems. Analytical consuming systems such as data warehouses, data marts, or portal dashboards use the golden records to analyze the data and come up with better organizational growth plans. On the other hand, reporting consuming systems such as business intelligence or corporate performance management tools help produce reports to make business processes more effective and to achieve business goals.



MDM Architecture - Deep Dive



We now have a basic idea of where Informatica MDM fits in the enterprise landscape. Now it is time to take a deep dive into the MDM system architecture. Informatica MDM has three major components: the Hub Store, the MDM hub, and the Services Integration Framework (SIF).

The Hub Store is the database component where business data is stored and consolidated. The Hub Store is based on an underlying database, which can be Microsoft SQL Server, Oracle, or DB2. It contains information about all of the databases that are part of your Informatica MDM Hub implementation. It has two parts: the Master Database and the Operational Reference Store, also known as the ORS. We will use the term ORS quite often during our lectures as well as during real-world MDM implementations.

What is the Master Database component? The Master Database is a database schema that maintains the most critical configuration details of the Informatica MDM hub. It includes user accounts created using the MDM hub Users section, and security configuration such as usernames and encrypted passwords for database schema users and application users. The Master Database also maintains the registry of Operational Reference Stores; e.g. if you register 3 ORSs, those 3 entries will be present in the Master Database. The default name of the Master Database is CMX_SYSTEM, and in practice we use the term CMX_SYSTEM as often as ORS.


We know about registering databases, creating users, providing tool access, message queues, and security providers. Where are these configurations maintained? Yes, you are right: this information is also persisted in the Master Database. The Master Database stores the connection settings and properties for each Operational Reference Store registered through the MDM hub. In other words, we can access and manage multiple Operational Reference Stores from one Master Database.

An important thing to remember is that a given Informatica MDM Hub environment can have only one Master Database, i.e. only one CMX_SYSTEM per MDM environment.

Okay, if the Master Database maintains configuration details, then where is the business data stored? I am glad you asked that question. The answer is that business data is stored in the Operational Reference Store. Let's understand what else the Operational Reference Store contains. Along with business data, the ORS also maintains the rules for processing the master data. If you remember match columns and match rule sets, all such rules are stored in the ORS. It also stores additional information such as the BVT (best version of the truth), tokens, and data lineage along with history. Its repository tables, which start with C_REPOS, and its repository archive tables, which start with C_REPAR, hold all this information. Do you want to know the default name of the ORS? It is CMX_ORS, but you can name it whatever you like, because in the end it is a database schema name. Unlike the Master Database, there is no restriction on the number of ORSs in a given MDM hub environment. However, configuring more ORSs in the MDM hub will adversely impact your MDM environment, so use them wisely.
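
If you want to see these repository tables for yourself, the sketch below lists them with plain JDBC. This is a minimal illustration, assuming an Oracle-hosted ORS, placeholder credentials, and the Oracle JDBC driver on the classpath; USER_TABLES is the standard Oracle dictionary view.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class ListReposTables {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details for the ORS schema (e.g. CMX_ORS).
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/orcl", "cmx_ors", "password");
             Statement st = con.createStatement();
             // Repository tables start with C_REPOS; archive tables with C_REPAR.
             ResultSet rs = st.executeQuery(
                 "SELECT table_name FROM user_tables " +
                 "WHERE table_name LIKE 'C_REPOS%' OR table_name LIKE 'C_REPAR%' " +
                 "ORDER BY table_name")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}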

An important thing to remember about the ORS is that we cannot associate a single Operational Reference Store with multiple Master Databases. The Master Database also stores site-level information, such as the number of incorrect log-in attempts allowed before a user account is locked out.

The next important component in the MDM architecture is the application server. Informatica MDM supports three application servers: JBoss, WebLogic, and WebSphere. Irrespective of which application server you use, the components installed on it remain the same. We normally install the Process Server and the Hub Server on the application server. Let's understand a little more about the Process Server. It is Java code (to be specific, a Java servlet) that cleanses data and also processes batch jobs. Prior to MDM 9.7, it was called the Cleanse Server instead of the Process Server. Why? Because only cleansing used to happen on the Cleanse Server. Now both data cleansing and batch job processing happen on the Process Server, hence the name. We can configure multiple Process Servers for better performance. Apart from that, we can configure a Process Server in three different modes: Batch mode, Online mode, and Batch and Online mode. We can choose the mode as per our business need. On the other hand, the Hub Server provides the core and common services, which include security, access, and session management.



Monday, January 28, 2019

Important Informatica MDM Interview Questions and Answers - Part IV



Are you looking for Informatica MDM interview questions and answers? Are you also looking for explanations of various MDM concepts? If yes, refer to this article, where we explain various MDM concepts in the form of interview questions and answers. This article will be helpful for an Informatica MDM interview. Here, we will focus on questions and answers about cleanse functions in Informatica MDM.

Q1: What is the use of the cleanse function in MDM?

Answer: The Informatica MDM hub is used for data enrichment and consolidation. To perform data enrichment, the data has to go through cleansing and standardization. Cleansing is a process through which source data is cleansed of nuisance characters or words, invalid data, or repeated words. Cleansing also helps achieve standardization, e.g. converting Limited, Ltd., Lmtd, and Lt to the standard word LTD.
In the Informatica MDM hub, cleansing is performed during the stage process, while moving data from the landing table to the staging table. We need to install and configure a cleanse engine before running the stage job.
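
To make the standardization example concrete, here is a plain-Java sketch of the kind of rule a cleanse function applies. This is a minimal illustration of the logic only, not the MDM cleanse API; the list of variants is an assumption for the example.

import java.util.regex.Pattern;

public class CompanySuffixStandardizer {
    // Matches common "Limited" variants as a trailing word (case-insensitive),
    // with an optional trailing period.
    private static final Pattern LIMITED_VARIANTS =
            Pattern.compile("(?i)\\b(limited|ltd|lmtd|lt)\\.?\\s*$");

    public static String standardize(String name) {
        if (name == null) {
            return null;
        }
        // Trim and collapse repeated whitespace, then normalize the suffix to LTD.
        String cleansed = name.trim().replaceAll("\\s+", " ");
        return LIMITED_VARIANTS.matcher(cleansed).replaceAll("LTD");
    }

    public static void main(String[] args) {
        System.out.println(standardize("Acme   Limited")); // Acme LTD
        System.out.println(standardize("Acme Ltd."));      // Acme LTD
    }
}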



Q2: Which cleanse functions have you used in your projects so far?

Answer: This is one of the common questions asked during an Informatica MDM interview.
  1. To achieve data cleansing, we normally use inbuilt cleanse functions such as Concatenate, Trim, Uppercase, and Regular Expression.
  2. For complex operations where IF-ELSE conditions need to be handled, we use the Graph function. We also use the Cleanse List function to achieve cleansing and standardization.
  3. There are several scenarios where the inbuilt cleanse functions do not satisfy the business requirement; in such cases, we build a custom Java cleanse function, e.g. to determine the length of a string, or the index of a character in a given string.
The video below explains how to develop a custom Java cleanse function.



Q3: How do you read the database using a cleanse function?

Answer: The Read Database function is used to perform a lookup and get values from a database table. While using the Read Database cleanse function, we need to connect to the database and pass the table name and the column name on which the lookup is performed.

Normally, the Read Database cleanse function is used when we need to populate values in a staging table by reading a database table.




Q4: What is the Graph Cleanse function and how do you create it?

Answer: The MDM hub comes with various types of inbuilt cleanse functions such as Data Conversion, General Processing, Geographic, Logic Functions, Math Functions, Misc Functions, Noise Functions, and String Functions. However, there are some business scenarios where these inbuilt functions do not meet the requirements.

          In such cases, we can combine inbuilt cleanse functions to create our own cleanse function. To create such a function, we use the Graph Cleanse function. Using the Graph Cleanse function, we can implement IF-ELSE or CASE statement scenarios.


Q5: Have you created a custom Java cleanse function? If yes, what was the use case?

Answer: The Informatica MDM hub comes with inbuilt cleanse functions. We can build custom complex functions by combining these inbuilt functions to cleanse and standardize the business data. There are some business cases where neither an inbuilt cleanse function nor a custom complex cleanse function satisfies the business need. In such cases, we need to create a custom Java cleanse function. Informatica MDM provides a Java framework for creating custom Java cleanse functions.

Business use case: determining the geocode of a given address.
Assume that your business would like to determine the geocode of a given physical address. We have two options here:
a) Buy an Address Doctor license from Informatica and populate coordinates for addresses
b) Build custom logic using the Google Geocoding API, with no extra license cost

If we choose option b), we do not need to pay to determine the geocode of an address. To implement such custom logic, we need to write a custom Java cleanse function.
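
Here is a minimal sketch of the HTTP call such a custom Java cleanse function might wrap, assuming the public Google Geocoding REST endpoint; YOUR_API_KEY is a placeholder, JSON parsing is left out, and the MDM-specific packaging and registration of the function is omitted.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class GeocodeLookup {

    // Returns the raw JSON response of the geocoding service for an address.
    static String geocode(String address, String apiKey) throws Exception {
        String url = "https://maps.googleapis.com/maps/api/geocode/json?address="
                + URLEncoder.encode(address, "UTF-8") + "&key=" + apiKey;
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestMethod("GET");
        StringBuilder response = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line);
            }
        }
        // The "lat" and "lng" values can be extracted from this JSON and
        // returned as the cleanse function's output columns.
        return response.toString();
    }

    public static void main(String[] args) throws Exception {
        // YOUR_API_KEY is a placeholder; a real key is required at runtime.
        System.out.println(geocode("1600 Amphitheatre Parkway, Mountain View, CA",
                "YOUR_API_KEY"));
    }
}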

The video below provides a detailed explanation of how to build a custom Java cleanse function.






Thursday, December 27, 2018

Important Informatica MDM Interview Questions and Answers - Part III




Are you preparing for an Informatica MDM interview? And are you looking for interview questions and answers about Informatica MDM? If yes, refer to this article, in which we discuss various questions, and their answers, that are normally asked in Informatica MDM interviews. You can also refer to the previous article - Important Informatica MDM Interview Questions and Answers - Part II

Q 1: Suppose you are running a stage job with delta detection enabled. The delta detection succeeds, but the stage job fails to insert the records into the stage table. How do you handle this issue?

Answer:
This is a scenario-based question that an interviewer may ask to check the candidate's knowledge.
In the case of a full data load, if the stage job fails to process records, we can handle this situation in two ways -
1. Truncate the _PRL table and reload (see the JDBC sketch after the second approach below):

  • When we run the stage job, the records from the landing table are compared with the _PRL table and the delta is determined.
  • If we re-run the stage job after its failure, no delta will be determined, because the _PRL table will be the same as the landing table.
  • To fix this, we can truncate the _PRL table and re-run the stage job.
  • The stage job will take more time to run, as it will process the whole data set.
  • Only delta records will be updated or inserted as part of the load job.
2. Populate the _PRL table using the _RAW table:

  • If RAW retention is enabled, this approach is more efficient.
  • First, determine the JOB_ROWID of the previous run using the C_REPOS_JOB_CONTROL table.
  • Using the JOB_ROWID, pull all records from the _RAW table and insert them into the _PRL table.
  • Re-run the stage job to process the delta records.
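
For the first approach, the truncate step is a single statement. Below is a minimal JDBC sketch, assuming placeholder connection details and a hypothetical staging table named C_STG_CRM_PARTY; the second approach additionally needs a schema-specific column list when copying rows from the _RAW table into the _PRL table, so it is not shown here.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class PrlRecovery {
    public static void main(String[] args) throws Exception {
        // Placeholder ORS connection details; requires the Oracle JDBC driver.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/orcl", "cmx_ors", "password");
             Statement st = con.createStatement()) {
            // Empty the _PRL table so the next stage job run treats the
            // full landing table contents as delta.
            st.executeUpdate("TRUNCATE TABLE C_STG_CRM_PARTY_PRL");
        }
        // Afterwards, re-run the stage job from the Hub Console or a batch group.
    }
}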

The video below provides more insights into the stage and load jobs in Informatica MDM.


Q 2: When are the PRL, OPL, RAW and REJ tables created?

Answer:
After we configure the landing and staging tables, the next step is to create the mapping. Once mappings are created, the raw retention and delta detection properties get enabled. Listed below are the points at which the PRL, OPL, RAW and REJ tables are created.
a. The _REJ table is created when we create the staging table.
b. When we configure the staging table for raw retention, the _RAW table associated with the staging table is created.
c. The _PRL and _OPL tables are created when we configure delta detection for the staging table.




Q 3: What are the causes of record rejection?

Answer:
The _REJ table is associated with the staging table, e.g. if the staging table name is C_STG_CRM_PARTY, then the associated reject table name will be C_STG_CRM_PARTY_REJ.


Reasons for reject table creation:
1. The reject table is created to store records rejected during the stage job and the load job.
2. It increases performance by rejecting a record as soon as the first reason to reject it is encountered.

Note: If there is more than one reason to reject a record, the reject table describes the first reason encountered.

There are several causes for a record to be rejected during MDM processes. The main reasons are as follows (a query sketch for inspecting rejected records follows this list):

  • The value of the PKEY_SRC_OBJECT column is null.
  • There is a duplicate value in the PKEY_SRC_OBJECT column. Only one record is processed successfully (the one with the highest SRC_ROWID); the other duplicate records are rejected.
  • The value in the LAST_UPDATE_DATE column contains a future date or a null date.
  • The value in the LAST_UPDATE_DATE column is earlier than the year 1900.
  • A unique column contains duplicate values.
  • The HUB_STATE_IND column contains a value other than 1, -1, or 0.
  • A column contains an invalid referential integrity value.
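
To inspect why specific rows were rejected, you can query the reject table directly. Below is a minimal JDBC sketch, assuming the example staging table above, placeholder connection details, and that the reject-reason column is named REJECT_REASON; verify the exact column name in your MDM version.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class RejectInspector {
    public static void main(String[] args) throws Exception {
        // Placeholder ORS connection details; requires the Oracle JDBC driver.
        try (Connection con = DriverManager.getConnection(
                "jdbc:oracle:thin:@//dbhost:1521/orcl", "cmx_ors", "password");
             Statement st = con.createStatement();
             // REJECT_REASON is an assumed column name; the _REJ table mirrors
             // the staging table columns plus the rejection details.
             ResultSet rs = st.executeQuery(
                 "SELECT PKEY_SRC_OBJECT, REJECT_REASON FROM C_STG_CRM_PARTY_REJ")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + " -> " + rs.getString(2));
            }
        }
    }
}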



Q 4: When do the PRL, REJ, STG and RAW tables get cleared/truncated?

Answer:
This is another interesting question an interviewer may ask to check how extensively the candidate has worked with the Informatica MDM tool.

Not all the system tables in Informatica MDM are truncated; some are truncated only during specific processes.
a. The _PRL table is truncated during each stage job run.
b. The _REJ table is never truncated during a stage or load job. However, we can truncate it manually, or we can use the Clean SIF API on the Base Object table to clean or truncate the REJ table.
c. The _STG table is truncated during each stage job.
d. The _RAW table is never truncated during a stage or load job. However, once the retention period is complete, only the unique records from stage jobs prior to the retention period are kept in the _RAW table; the remaining records are deleted. The _RAW table also gets truncated when we call the Clean SIF API on the Base Object table.

Read More: Learn more about how to handle rejected records.

Q 5: Have you used any data quality tool, such as Informatica Data Quality, along with Informatica MDM?

Answer:
In some projects, data quality tools are used. It is not mandatory to have knowledge of, or work experience with, a data quality tool. However, having such knowledge will make your career profile stronger.

So if you have Data Quality experience, mention it, e.g. you can mention that you used Informatica Data Quality to perform data analysis and to come up with data standardization rules for party and address data.




You can refer to the video below to learn more about Informatica Data Quality.





Sunday, December 23, 2018

Important Informatica MDM Interview Questions and Answers - Part II

In this article, we will focus on interview questions related to the Informatica MDM stage table and the delta detection process. Are you also interested in interview questions and answers about the Hard Delete Detection process? If yes, refer to this article, where we provide detailed questions and answers about Informatica MDM. Here is the link for Important Informatica MDM Interview Questions and Answers - Part I, in case you have not gone through it already.


Q 1: Where do you configure the Audit Trail?

Answer:

The audit trail is used to maintain the history of source data. The history can be maintained for a specific number of stage job runs or for a specific retention period. The audit trail is configured at the stage table level, and the option gets enabled when we create the mapping between the landing and staging tables. Once the audit trail is configured, the _RAW table associated with the stage table gets created.

Read more: Click here to read more about Audit Trail and Delta Detection



Q 2: What is Hard Delete Detection?

Answer:
Hard Delete Detection (HDD) is used to determine which records were physically deleted from the source. There are two types of Hard Delete Detection in Informatica MDM -
a) Direct Delete
b) Consensus Delete

How to configure Hard Delete Detection in Informatica MDM is explained in the video below -



Q 3: What is delta detection? How do you enable it?

Answer:
Delta detection is used to determine new inserts and updates to existing source records during a full data load process. Delta detection happens on specific columns, which we can configure at the stage table level. The delta detection option gets enabled when we create the mapping between the landing and staging tables. To detect the delta, data from the landing table is compared with the _PRL table, which is created at the time of delta detection configuration.

In the first figure below, we can see the data changes on day 1.

The second figure shows the state of the records in the landing, staging and _PRL tables as a result of the delta detection process -





Q 4: What are a full data load and an incremental data load?

Answer:
A) Full data load: In this case, the source sends a full data file every day to load data into MDM. New inserts and updates to existing records are determined in MDM as part of the delta detection process.

B) Incremental data load: In this case, an incremental file from the source is loaded into the MDM landing tables every day. New inserts and updates to existing records are determined outside MDM, so MDM delta detection is not required.

Q 5: How do you use delta detection with an incremental data load?

Answer:
This is a tricky question the interviewer might ask to check whether the candidate really has hands-on experience.

The answer to this question is: delta detection only works with a full data load, not with an incremental data load.

You can learn more about Informatica MDM here.











Wednesday, December 19, 2018

Important Informatica MDM Interview Questions and Answers - Part I


Are you preparing for an Informatica Master Data Management (MDM) interview? Are you also planning to learn MDM concepts? Would you like to know how to prepare for an Informatica MDM interview? If yes, refer to this article, which provides detailed information about questions asked during MDM interviews, along with the reasoning behind asking them. Good luck with your interview!

Q 1: Explain your Informatica MDM experience related to MDM Hub configuration, User Exits, IDD and SIF.

Answer:
At the start of the interview, the interviewer may want to know more about your experience and will ask this question. This common question is asked in nearly every MDM interview.

You can start by explaining how you started your MDM career, and then describe your experience with each of the MDM components, such as MDM hub configuration, User Exits, IDD and SIF. If you do not have experience in one of the modules, or only have a basic idea about it, mention that accordingly. A sample answer is below -
I have more than 5 years of Informatica MDM experience. I worked on configuring the MDM hub for landing, staging and base object tables. I have extensive experience configuring stage table properties such as delta detection, as well as base object properties. I worked on the configuration of match and merge rules, and extensively on match and merge job tuning. I have worked with the Informatica Data Director configuration tool to create IDD apps for data stewards. I also have Core Java knowledge, which I used to develop IDD and MDM hub User Exits to meet business requirements. In these User Exits, I used the SIF API to connect to the MDM hub and fetch as well as update records in the MDM tables.
Important! The interviewer may ask questions based on your answer to this introductory question.

Q 2: How many sources were present in your last project, and what were they?

Answer:
This question is normally followed by several questions that depend on the number of sources configured. So provide the number of source systems you configured in the project. Also provide the names of the source systems and the kind of data contributed by each, e.g. source system names such as SALES, CRM and HCM.



Q 3: How many landing, staging and BO tables were present in your last project?

Answer:
To answer this question, you can provide the details below -
The number of landing, staging and BO tables depends on
a) The data model design
b) The number of source systems configured
You can also mention that the number of staging tables is a multiple of the number of source systems, e.g. if the number of BO tables is 10 and the number of source systems is 3, then the total number of staging tables is 10 * 3 = 30.

So, if the number of source systems = 3
The number of landing tables configured = 10
The number of BO tables configured = 10
The number of staging tables configured = 30 (10 * 3)

Learn more: About the landing and staging tables.

Q 4: What are the processes involved in Informatica MDM?

Answer:
Informatica MDM involves various processes to process data from sources. The processes involved are:
a. Landing: Data is pulled from the source system and pushed into the MDM landing tables.
b. Staging: The landing table data is standardized, cleansed and pushed to the MDM staging tables.
c. Load: The data from the staging table is loaded into the BO table.
d. Tokenization: If fuzzy match rules are configured, the tokenization process generates the match tokens.
e. Match: The match process matches the records.
f. Merge or Consolidation: The matched records are consolidated during the merge process.

Read more: Click here to learn about Batch Groups in Informatica MDM




Q 5: What is the stage process and what is its significance?

Answer:
The stage process transfers source data to the staging table.
  • The job uses the stage mapping between the landing table and the staging table.
  • Data standardization and cleansing are performed during the stage process.
  • If required, database lookups can be performed during the stage job.

You can learn more about the stage and load jobs here:









Wednesday, December 12, 2018

Important MDM SQL Queries

This article provides important queries used for daily activities in Informatica MDM. It will be updated continuously with interesting and important SQL queries used in day-to-day MDM work.




Survivorship Verification

Order of survivorship is a very important concept in Informatica MDM. As per Informatica, the order of precedence is as follows:

1. By trust score (only if the column is trust-enabled). The data with the highest trust score wins. If the trust scores are equal, or if trust is not enabled for the column, proceed to the next comparison.
2. By SRC_LUD in the cross-reference (XREF) record. The data with the more recent cross-reference SRC_LUD value wins. If the SRC_LUD values are equal, proceed to the next comparison.
3. By ROWID_XREF in the cross-reference. ROWID_XREF values are evaluated in numeric descending order. The data with the highest ROWID_XREF wins.

To verify whether the order of precedence is working correctly, use the query below -

select p.party_name, x.party_name as party_nm, x.rowid_object, x.orig_rowid_object, x.rowid_system, x.src_lud, x.rowid_xref, x.last_update_date,
rank() over (partition by x.rowid_object order by x.rowid_system desc, x.src_lud desc, cast(x.orig_rowid_object as decimal) desc, cast(x.rowid_xref as decimal) desc) as r1
from
    CMX_ORS.C_BASE_PARTY P,
    CMX_ORS.C_BASE_PARTY_XREF X
where p.rowid_object = x.rowid_object


Tuesday, November 6, 2018

ElasticSearch in the Informatica MDM


Are you looking for information about Elasticsearch? What is the purpose of using Elasticsearch in Informatica MDM? If yes, this article provides you with detailed information about it, and also gives a brief history of Elasticsearch.

What is Elasticsearch?

Elasticsearch is an open-source, distributed, multitenant-capable full-text search engine developed in Java, first released in 2010 to provide a scalable search solution. It is commonly deployed as part of the ELK stack, i.e. Elasticsearch, Logstash and Kibana; together, these three products provide a complete search solution. Elasticsearch is a search engine based on Lucene. Logstash is a data collection and processing pipeline that gathers data (logs) and sends it to Elasticsearch. Kibana is a user interface where the data is shown in analytical forms such as graphs.

Informatica MDM and Elasticsearch

Elasticsearch is integrated with Informatica MDM starting from MDM version 10.3, to provide better search functionality in the Customer 360 application. Once Elasticsearch is configured in MDM, the search functionality can be viewed in the Customer 360 application as below -


Elasticsearch and Solr Search in Informatica MDM

We can use either Solr or Elasticsearch with Informatica MDM. Both search engines are based on the Lucene library; however, Elasticsearch performs better. Search with Solr is deprecated and has been replaced by search with Elasticsearch.

  1. With Elasticsearch, we can use the asterisk wildcard character (*) to perform a search.
  2. The query parser of Elasticsearch provides the flexibility to use various types of characters in the search strings.
  3. Solr search does not provide this flexibility.
  4. With Elasticsearch, we can use operators such as AND and OR to search for records (see the sketch below).
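
To illustrate points 1 and 4, here is a minimal sketch that sends a query-string search to a standalone Elasticsearch node over its REST API. The host, port, index name ("party") and field names are assumptions for the example, and this shows plain Elasticsearch usage rather than the indexes that MDM manages internally.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class ElasticsearchQueryDemo {
    public static void main(String[] args) throws Exception {
        // Wildcards (*) and the AND/OR operators are part of the
        // Elasticsearch query-string syntax used by the q parameter.
        String query = URLEncoder.encode("first_name:Jo* AND last_name:Smith", "UTF-8");
        URL url = new URL("http://localhost:9200/party/_search?q=" + query);
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        con.setRequestMethod("GET");
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line); // raw JSON search response
            }
        }
    }
}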

How to install Elasticsearch?

The Elasticsearch package comes with Informatica MDM 10.3. The installation instructions are simple and are provided in the installation guide. You can install Elasticsearch on any machine where the MDM Hub components are installed, or on a separate machine. However, if you would like to install it as a standalone package, you can download Elasticsearch here: DOWNLOAD

You can refer to the video below to configure Elasticsearch with Informatica MDM.

Tuesday, September 25, 2018

How to monitor which users are logged in to the Informatica Data Director application?



Are you looking for an article on how to monitor IDD users? Are you also looking for information on what changes need to be made to achieve it? If yes, refer to this article. It explains why user monitoring is needed and how to configure it.

Introduction

The Informatica Data Director (IDD) is a business-critical application used by various business users. It is always a good idea to monitor the users of the application for security reasons. In lower environments such as development or QA, it becomes tedious to track who made a change. Having monitoring control over the login mechanism helps avoid such incidents. This article helps you configure the IDD application to monitor the users who use it.

Configuration file

We need to use the log4j.xml file to log the users who use the IDD application. We can use an existing log file or create a new one.

File Location

We need to update the log4j.xml file at the location below:
<install directory>\hub\server\conf

Code Changes

Add the code below after the console appender code in the log4j.xml file:

<!-- File appender for Login Tracker -->
<appender name="loginAppender" class="org.apache.log4j.RollingFileAppender">
    <param name="File" value="/hub/server/logs/LoginTracker.log"/>
    <param name="MaxBackupIndex" value="5"/>
    <param name="MaxFileSize" value="500MB"/>
    <param name="Threshold" value="DEBUG"/>

    <layout class="org.apache.log4j.PatternLayout">
        <!-- The default pattern: Date Priority [Category] Thread Message -->
        <param name="ConversionPattern" value="[%d{ISO8601}] [%t] [%-5p] %c: %m%n"/>
    </layout>
</appender>

<!-- Category to invoke the appender for the IDD Login Tracker -->
<category name="com.siperian.dsapp.common.util.LoginLogger">
    <priority value="INFO"/>
    <appender-ref ref="loginAppender"/>
</category>

<!-- Category to invoke the appender for the MDM Login Tracker -->
<category name="com.siperian.sam.authn.jaas.JndiLoginModule">
    <priority value="INFO"/>
    <appender-ref ref="loginAppender"/>
</category>

Server Restart

Normally, an application server restart is not required. However, if the log file is not generated after the above code changes, restart the application server.



How to analyze the log file

When a user logs in or logs out, this information is stored in the log file. The log file entries will look like this:

[2018-09-25 15:03:31,774] [http-/0.0.0.0:8080-5] [INFO ] com.siperian.dsapp.common.util.LoginLogger: User <admin> logged into IDD
[2018-09-25 15:04:14,255] [http-/0.0.0.0:8080-2] [INFO ] com.siperian.dsapp.common.util.LoginLogger: User <admin> has been logged out of the IDD
[2018-09-25 15:04:14,329] [http-/0.0.0.0:8080-2] [INFO ] com.siperian.dsapp.common.util.LoginLogger: User <testuser> logged into IDD
[2018-09-25 15:05:16,295] [http-/0.0.0.0:8080-5] [INFO ] com.siperian.dsapp.common.util.LoginLogger: User <testuser> has been logged out of the IDD
[2018-09-25 15:05:16,295] [http-/0.0.0.0:8080-5] [INFO ] com.siperian.dsapp.common.util.LoginLogger: User <admin> has been logged out of the IDD
[2018-09-25 15:05:23,309] [http-/0.0.0.0:8080-6] [INFO ] com.siperian.dsapp.common.util.LoginLogger: User <jamesmanager> logged into IDD
[2018-09-25 15:06:32,365] [http-/0.0.0.0:8080-7] [INFO ] com.siperian.dsapp.common.util.LoginLogger: User <jamesmanager> has been logged out of the IDD
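
As a small illustration of analyzing this file programmatically, the sketch below counts logins per user by matching the pattern shown in the entries above; the log path and the message format are taken from this example and may differ in your environment.

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LoginTrackerStats {
    public static void main(String[] args) throws IOException {
        // The path is a placeholder; point it at your LoginTracker.log.
        Pattern login = Pattern.compile("User <([^>]+)> logged into IDD");
        Map<String, Integer> counts = new HashMap<>();
        for (String line : Files.readAllLines(Paths.get("LoginTracker.log"))) {
            Matcher m = login.matcher(line);
            if (m.find()) {
                counts.merge(m.group(1), 1, Integer::sum);
            }
        }
        counts.forEach((user, n) ->
                System.out.println(user + " logged in " + n + " time(s)"));
    }
}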





The video below provides additional information about how to monitor the users who are logged in to the IDD application.


Friday, September 21, 2018

What is Hard Delete Detection in Informatica MDM?

Do you know how Hard Delete Detection (HDD) works in Informatica MDM? Are you interested in the basic concepts and working principles of HDD? Are you looking for sample HDD code? If so, refer to this article, in which we discuss the Hard Delete Detection process and its usage in Informatica MDM.

What is the Hard Delete Detection?

Hard Delete Detection is abbreviated as HDD. It is the process of determining which records have been removed, i.e. physically deleted, from the source system. Informatica MDM determines which records were removed from the source system and soft-deletes them in the associated MDM base object tables.

What are soft delete and hard delete?

Soft delete and hard delete are not Informatica MDM concepts; they are well-known concepts in any database management system. Soft deletion is achieved by using a column such as STATUS or ACTIVE_INACTIVE, or any other column that indicates whether the record is deleted or active for the business. Soft-deleted records are physically maintained in the database but are not active for business purposes. They can be recovered by making them active and available to the business again.
On the other hand, hard-deleted records are physically removed from the database, and they are no longer available to the business once they are hard-deleted.

Do all types of databases support HDD in Informatica?

No. Only Oracle and Microsoft SQL Server environments can detect records that are removed from the source systems. The DB2 database environment cannot detect records that are removed from the source systems.

How does HDD work in Informatica MDM?

  • The stage job in the MDM Hub compares all the records in the landing table with the records in the previous landing table (aka the _PRL table) associated with each landing table.
  • The records found missing from the landing table are flagged as hard deletes for a full load.
  • The hard-delete-flagged records are reinserted into the landing table along with a delete flag value.
  • For flagging records as hard deletes in the source, we can use either the HUB_STATE_IND column or any other custom column.
  • After running the stage and load jobs in the MDM Hub, the records are updated in the associated base object table.

What are the requirements for HDD implementation?

Below are the major requirements for implementing HDD:
  1. For HDD to work, we need a full load every time. It does not work with incremental or transactional loads.
  2. We need to create a hard delete detection table among the repository tables to configure hard deletes.
  3. To make entries into the job metrics table, we need to maintain additional configuration.
  4. HDD requires user exits written in Java.

What are the User Exits required for HDD implementation?

Below are the user exits required to be implemented for HDD:
1. Post Landing User Exits
2. Post Stage User Exits

Sample code for User Exits:

Here is a sample code for the Post Landing User Exit (classes such as PostLandingUserExit, UserExitContext and HardDeleteDetection come from the Informatica MDM user exit SDK jars):
public class PostLandingUE implements PostLandingUserExit {
    public void processUserExit(UserExitContext oUEContext, String stagingTableName,
            String landingTableName, String PRLTableName) throws Exception {
        try {
            // Run hard delete detection for this stage job and staging table.
            HardDeleteDetection hdd = new HardDeleteDetection(
                    oUEContext.getBatchJobRowid(), stagingTableName);
            hdd.startHardDeleteDetection(oUEContext.getDBConnection());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

Here is a sample code for the Post Stage User Exit:
public class PostStageUE implements PostStageUserExit {
    public void processUserExit(UserExitContext userExitContext, String stagingTableName,
            String landingTableName, String PRLTableName) throws Exception {
        try {
            // Update the consensus delete flag after the stage job completes.
            ConsensusFlagUpdate consensusProcess = new ConsensusFlagUpdate(
                    userExitContext.getBatchJobRowid(), stagingTableName);
            consensusProcess.startConsensusFlagUpdate(userExitContext.getDBConnection());
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}


The video below provides detailed information about the Hard Delete Detection process in the MDM hub.


Monday, September 3, 2018

How to install Informatica MDM software?

Are you planning to install Informatica MDM on a Windows system? Are you looking for step-by-step instructions for installing Informatica MDM? If so, you can refer to this article to understand the steps involved in the MDM installation.

Step 1: Locate the installer file received from the Informatica support team. On a Windows system, it will be an .exe file. Double-click the .exe file, and the dialog window shown in the screen below will appear. Read the agreement, select the radio button corresponding to 'I accept the terms of the License Agreement', and then click the 'Next' button.



Step 2: Select the file system location on the Windows system where you would like to install the Informatica MDM product. You can browse to the location using the 'Choose' button. After providing the location, click the 'Next' button.



Step 3: If you would like to create a shortcut for the MDM Hub on the desktop or in the Start menu, choose the shortcut folder location. Click the 'Next' button after choosing the shortcut folder option.


Step 4: When you buy the product from Informatica, it comes with a license file. Browse to the license file using the 'Choose' button and then click the 'Next' button.


Step 5: The next step is important. If you are using the WebLogic application server, select the radio button corresponding to the 'Weblogic' option; otherwise, select the option for the application server used in your environment. Then click the 'Next' button.


Step 6: The next step asks for the application server home location. You can type the path or browse to the application server home location using the 'Choose' button. After providing the home path, click the 'Next' button.


Step 7: If you are using the WebLogic application server, the dialog window mentioned below will appear so that you can provide the WebLogic application server details. You need to provide the host name (the name of the server on which the WebLogic server is installed), the server name, the WebLogic user name, the WebLogic user password, and the listen port on which the WebLogic server is listening. You can get this information from your middleware (application server installation) team. In the screenshot below, sample values are provided. After providing the server details, click the 'Next' button.


Step 8: After providing the application server details, the next step is to provide the database details. If you are using the Oracle database, select the radio button corresponding to 'Oracle 11g R2' or the respective database version (if you are using a higher version of MDM, the database version options will differ). Click the 'Next' button.


Step 9: Based on the database setup, select either the database Service Name or the database SID (Oracle System ID). You can get the Service Name or the SID from your DBA. Click the 'Next' button.


Step 10: If you are using the Oracle database, the dialog window mentioned below will appear so that you can provide the Oracle database details. You need to provide the server (the name of the server on which the Oracle database is installed), the port on which the database is listening, the service name, the system schema name and the system schema password. You can get this information from your DBA team. In the screenshot below, sample values are provided. After providing the database details, click the 'Next' button.


Step 11: The next dialog window shows the connect URL, which you can verify. If you would like to make any changes to the connect URL, select the radio button corresponding to 'Yes'; otherwise, select the radio button corresponding to the 'No' option. After selecting the appropriate option, click the 'Next' button.


Step 12: After the installation of the MDM product, we need to run the post-install script. This can be done in two ways: we can run the post-install script
 1)  immediately after running the installation file, or
 2)  manually after the installation is done.

 If you would like to run the post-install script immediately after the installation, select the radio button corresponding to the 'Yes' option. After selecting the appropriate option, click the 'Next' button.


Step 13: After providing all the necessary information, the pre-installation summary is shown, as in the screenshot below. It shows the product name, the install folder, the shortcut location, the application server details and the database details.


Step 14: The pre-installation summary also provides more information, as shown in the screenshot below, including the required disk space and the available disk space. If everything looks good, click the 'Install' button.


Step 15: The installation will start, and the progress bar will show the progress of the installation. The screenshot below shows the steps being executed.


Step 16: Finally, the installation completes and a success message appears on the screen.


With the help of these steps, you will be able to install the MDM software successfully. Refer to the video below to learn more about the Informatica MDM product.




Thursday, August 16, 2018

How to enable DEBUG mode for the match process in Informatica MDM?

Are you facing issues while running the match process in Informatica MDM? Are you looking for information about how to analyze match process issues? Would you also be interested in knowing how to enable DEBUG mode for the match process, including the searchMatch API, match jobs and IDD Extended Search? If so, this article provides detailed information about it.

Introduction

The match process is one of the critical processes in the MDM hub, and any issue in this process will impact the business. To analyze such issues, we need logs in DEBUG mode for the match process. Configure your log4j.xml file to generate a separate match log file. This log file can be used to analyze match issues in the Master Data Management (MDM) Hub and Informatica Data Director (IDD). Matching can occur during execution of the searchMatch API, match jobs and IDD Extended Search.

After making the changes mentioned below in the log4j.xml file, it will generate the matchprocess.log file. This file contains details about how records are compared against each match rule separately, which is helpful for understanding the behavior of the match rule configuration. Based on the log file analysis, we can fine-tune the match rules.

How to make log4j.xml changes for the match process?

We need to make the configuration changes below in the log4j.xml file. This file is located in the <Install directory>/hub/cleanse/conf directory.

Add the entry below to the log4j.xml file to generate the 'matchprocess.log' file:

<appender name="MATCH" class="org.apache.log4j.RollingFileAppender">
   <errorHandler class="org.apache.log4j.helpers.OnlyOnceErrorHandler"/>
   <param name="File" value="/data/user/infamdm/hub/cleanse/logs/matchprocess.log"/>
   <param name="Append" value="true"/>
    <param name="MaxFileSize" value="10MB"/>
    <param name="MaxBackupIndex" value="10"/>
    <param name="Threshold" value="DEBUG"/>
    <layout class="org.apache.log4j.PatternLayout">
    <!-- The default pattern: Date Priority [Category] Thread Message -->
     <param name="ConversionPattern" value="[%d{ISO8601}] [%t] [%-5p] %c: %m%n"/>
  </layout>
 </appender>
 <category name="com.siperian.mrm.util" additivity="false">
        <priority value="ON"/>
        <appender-ref ref="MATCH"/>
  </category>
 <category name="com.siperian.mrm.match" additivity="false">
        <priority value="ON"/>
        <appender-ref ref="MATCH"/>
  </category>


The video below provides information about how the match process works in the MDM hub.


How to enable DEBUG mode in Informatica MDM?


Are you looking for information about how to enable DEBUG mode in Informatica MDM? Are you also wondering which configuration files need to be updated to see logs in DEBUG mode? Would you be interested in knowing the locations of the configuration and log files? If so, read this article for more interesting details about MDM logging.

Introduction

Informatica MDM is a complex application. It involves many processes, such as the stage, load, match and merge jobs. During the execution of these jobs, we might encounter issues, and to analyze any issue, the log files play an important role. Log files in DEBUG mode provide more information than log files in INFO mode.

What are the locations of the log and configuration files?

The logs are stored at the locations below:
a) MDM Cleanse log file: <Install directory>\hub\cleanse\logs\cmxserver.log
b) MDM Server log file: <Install directory>\hub\server\logs\cmxserver.log
c) To change MDM Cleanse log file configuration, update the file mentioned below
<Install directory>\hub\cleanse\conf\log4j.xml
d) To change MDM Server log file configuration, we can update the configuration file mentioned below
<Install directory>\hub\server\conf\log4j.xml

What configuration changes are required to enable cleanse logs in DEBUG mode?

To enable the cleanse logs in the debug mode, perform the steps mentioned below:

  • Change the priority to "DEBUG" in all the following categories:

        <category name="com.delos">
        <priority value="DEBUG"/
</category> 
 <category name="com.siperian">
        <priority value="DEBUG"/>
 </category> 
 <category name="com.informatica">
        <priority value="DEBUG"/>
 </category>​
  • To log the database queries, change the priority to "ON":
        <category name="siperian.performance" additivity="false">
                <priority value="ON"/>
                <appender-ref ref="FILE"/>
        </category>
  • Change the threshold parameter to DEBUG:
        <param name="Threshold" value="DEBUG"/>
  • Increase the maximum file size to a higher value if required (optional):
        <param name="MaxFileSize" value="10MB"/>
  • Increase the number of backup files if required (optional):
        <param name="MaxBackupIndex" value="5"/>

Important points:

  • No server restart is required after making changes in the log4j file; the changes are automatically picked up within a few minutes.
  • For a clustered environment, update the log4j file on all the nodes of the cluster individually.
  • If the socket server is down, the log messages will be lost.
  • The performance impact is negligible, as the socket server and the MDM server are on the same machine, so network latency does not have a big impact.

