Monday, August 6, 2018

Important and Useful Unix commands

Are you looking for a consolidated list of the Unix commands used in day-to-day work? This article provides one, grouped by task.


In this article we have listed Unix commands for: files and directories, compressed files, and manipulating data.

File and Directory

cat - Displays file contents
cd - Changes the current directory to another directory
chgrp - Changes the file group
chmod - Changes file permissions
cp - Copies a source file to a destination
file - Determines the file type
find - Finds files
grep - Searches files for a regular expression
head - Displays the first few lines of a file
ln - Creates a soft link to a file
ls - Lists directory contents and file information
mkdir - Creates a new directory
more - Displays data one page at a time
mv - Moves (renames) a file
pwd - Prints the current working directory
rm - Removes (deletes) a file
rmdir - Deletes an existing directory, provided it is empty
tail - Prints the last few lines of a file
touch - Updates the access and modification time of a file
less - Views file contents, with both forward and backward movement
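A few of the file and directory commands above in action. This is a minimal sketch run in a throwaway directory; the file and directory names are illustrative:

```shell
# Work in a throwaway directory
mkdir demo_dir && cd demo_dir

# touch creates an empty file (or updates timestamps of an existing one)
touch notes.txt

# Write some content, then display it with cat
printf 'line1\nline2\nline3\n' > notes.txt
cat notes.txt

# cp copies, mv renames
cp notes.txt backup.txt
mv backup.txt notes_backup.txt

# head/tail show the first/last lines
head -n 1 notes.txt        # line1
tail -n 1 notes.txt        # line3

# pwd prints the current directory; ls lists its contents
pwd
ls -l

# Clean up
cd .. && rm -r demo_dir
```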

Compressed Files

compress - Compresses files
gunzip - Uncompresses gzipped files
gzip - GNU compression utility
uncompress - Uncompresses files
unzip - Lists, tests, and extracts files from a ZIP archive
zcat - Displays the contents of a compressed file
zcmp - Compares compressed files
zdiff - Compares compressed files
zmore - File perusal filter for CRT viewing of compressed text

Getting Help

apropos - Locates commands by keyword lookup
info - Displays command information pages online
man - Displays manual pages online
whatis - Searches the whatis database for complete words
yelp - GNOME help viewer
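A quick round trip with the compression commands gzip, zcat, and gunzip (the file name and contents are illustrative):

```shell
printf 'hello compressed world\n' > sample.txt

# gzip compresses in place, producing sample.txt.gz
gzip sample.txt

# zcat displays the compressed contents without uncompressing the file
zcat sample.txt.gz

# gunzip restores the original file
gunzip sample.txt.gz
cat sample.txt

# Clean up
rm sample.txt
```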

Manipulating Data

awk - Pattern scanning and processing language
cmp - Compares the contents of two files
comm - Compares sorted data
cut - Cuts out selected fields of each line of a file
diff - Differential file comparator
expand - Expands tabs to spaces
join - Joins files on some common field
perl - Data manipulation language
sed - Stream text editor
sort - Sorts file data
split - Splits a file into smaller files
tr - Translates characters
uniq - Reports repeated lines in a file
wc - Counts words, lines, and characters
vi - Opens the vi text editor
vim - Opens the vim text editor
fmt - Simple text formatter
spell - Checks text for spelling errors
ispell - Checks text for spelling errors interactively
emacs - GNU project Emacs editor
ex, edit - Line editor
ed - Line editor
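Several of these data-manipulation commands compose naturally in pipelines. A small sketch (the sample data is made up):

```shell
# Sample data: name:city records, with one duplicate
printf 'alice:boston\nbob:denver\nalice:boston\ncarol:austin\n' > people.txt

# cut extracts the first colon-delimited field
cut -d: -f1 people.txt

# sort + uniq: uniq only collapses *adjacent* duplicates, so sort first
sort people.txt | uniq

# uniq -c counts occurrences of each line
sort people.txt | uniq -c

# tr translates characters, e.g. to upper case
tr 'a-z' 'A-Z' < people.txt

# wc -l counts lines
wc -l < people.txt            # 4

# Clean up
rm people.txt
```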

The video below provides a tutorial on Unix topics -

Informatica MDM - Match Rule Tuning

Would you like to know how to perform Informatica MDM match rule tuning, and which steps are involved? If so, this article will help you understand the process.


Informatica MDM match rule tuning is an iterative process. It involves the following steps: data profiling, data standardization, defining the fuzzy match key, tuning the fuzzy match process, and database tuning.


The activities mentioned below are needed to perform the match rule tuning in the Informatica MDM.

Data Profiling
Getting the right data in for the match: data investigation, data accuracy, and data completeness.
Data Standardization
Cleansing and standardization.
Define the Fuzzy Match Key
The fuzzy match key columns (the columns that need to be matched) along with the key width.
Fuzzy Match Process
How to use the following:
1) Key width
2) Match level
3) Search level
4) Cleanse server log
5) Dynamic Match Analysis Threshold (DMAT)
6) Filters
7) Subtype matching
8) Match Only Previous Rowid Object option
9) Configure match threads
10) Enable Lightweight Matching (LWM)
Database Tuning
1) Analyze tables
2) Create indexes
3) Configure MATCH_BATCH_SIZE
4) Analyze the STRP table

Data Profiling

  • You need to analyze the data on which the match will be performed, including the quality of data across all fields.
  • Share the results of the data analysis with business users and get their input on which attributes should be considered for the matching process.
  • Identify fields which you think can provide better matches, e.g. SSN, Tax ID, etc.
  • The next step is to determine the filter criteria, which are normally applied on exact columns, such as COUNTRY='US'. This helps achieve better performance.
  • You also need to determine the completeness of the data. For example, if the country code is populated in only 50% of the records, it may not be a good candidate for an exact column.
  • Verify the percentage of data accuracy, e.g. the gender field should only contain gender values.
  • It is always a good idea to analyze data using the pattern mechanism.
  • Finally, determine the type of match population to use, e.g. US.
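Completeness and accuracy checks of this kind can be prototyped on a flat extract with awk before any MDM configuration. A sketch, assuming a made-up CSV layout (id, gender, country):

```shell
# Hypothetical source extract for profiling
printf 'id,gender,country\n1,M,US\n2,F,\n3,X,US\n' > extract.csv

# Completeness: percentage of rows with a non-empty country (column 3)
awk -F, 'NR>1 {n++; if ($3 != "") filled++}
         END {printf "country populated: %d%%\n", 100*filled/n}' extract.csv
# country populated: 66%

# Accuracy: rows whose gender is not one of the expected values
awk -F, 'NR>1 && $2 !~ /^(M|F)$/ {print "bad gender:", $0}' extract.csv
# bad gender: 3,X,US

rm extract.csv
```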

Data Standardization

  • Determine the cleansing rules to standardize data, for example, standardizing 'Street' and 'St.' to 'ST'.
  • Use data standardization tools such as Address Doctor, Trillium, or another third-party tool.
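As an illustration, a cleansing rule like "Street, St. to ST" can be prototyped outside the MDM hub with sed. The pattern and sample addresses below are assumptions for the sketch, not an Informatica cleanse function:

```shell
# Normalize common street-suffix variants at the end of an address to ST
standardize() {
  sed -E 's/ (Street|STREET|St\.?)$/ ST/'
}

echo '123 Main Street' | standardize    # 123 Main ST
echo '45 Oak St.'      | standardize    # 45 Oak ST
```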

Determine the Fuzzy Match Key

The basic rules for defining the fuzzy match key are:
  • OrganizationName: if the data contains organization names, or both organization names and person names
  • PersonName: if the data contains person names only
  • AddressPart1: if the data contains only addresses

Tuning the Potential Match Candidates

a) Key Width:
  • For fewer SSA indexes, set the key width to 'Preferred'
  • For more match candidates, set the key width to 'Extended'
b) Search Level:
  • For fewer SSA ranges, use the 'Narrow' search level
  • For more candidates to match, use the 'Exhaustive' search level
  • Use the 'Typical' search level for business data
  • To match the most candidates, use the 'Extreme' search level; note that it has performance costs associated with it
c) Match Level:
  • To match only records that are highly similar, use the 'Conservative' match level
  • Use the 'Typical' match level for most matches
  • The 'Loose' match level is better for manual matches, to ensure that tighter rules have not missed any potential matches
d) Define the fuzzy match key on columns that have more unique data, e.g. the person name or the organization name.

e) Data in the fuzzy match key columns should not contain nulls. Records with null key data (an SSA_KEY of K$$$$$$$) become potential match candidates for each other.

Use a range query against the match key (STRP) table and review the SSA_DATA column for all the qualifying candidates.


Cleanse Server logs

Cleanse server logs help to determine long-running ranges, which normally have more candidates to match. Isolate such ranges by looking into the cleanse server logs. Production is normally a multi-threaded environment, so determine the ranges for each thread, analyze which thread is taking more time, take those records out of the matching process, and re-run the match job.

The video below provides more details about the match rule tuning -

Friday, August 3, 2018

Informatica Master Data Management - MDM - Quiz - 3

Q1. Which statement correctly describes what the consolidation indicator is used for?

A. It indicates where the record is in the consolidation process
B. It indicates if the column appears in the Informatica Data Director (IDD)
C. It indicates if the row can be merged in the Informatica Data Director (IDD)
D. It indicates if the column can be used in an Informatica Data Director (IDD) query.

Q2. Which statement is correct regarding the Security Access Manager (SAM)?

A. SAM enforces an organization’s security policy
B. SAM security applies primarily to users of third party applications
C. The hub console has its own security mechanisms to authenticate users
D. All are correct.

Q3. There must be one landing table for every base object

A. True
B. False

Q4. Which statement is true about State management Enabled Base object?

A. Trust is calculated for all records irrespective of the record state.
B. Trust is calculated only for records with ACTIVE STATE.
C. Trust calculation is ignored for records with a DELETE STATE.
D. Trust calculation is ignored both for records with DELETE state and Pending state

Q5. What does the cleanse match server process do?

A. It handles cleanse and match requests.
B. It enables the configuration of cleanse and match rules.
C. It embeds Informatica Data Quality in the MDM Hub
D. It creates Cleanse and Match Web Services.

Previous Quiz             Next Quiz

Informatica Master Data Management - MDM - Quiz - 2

Q1. What does it mean if the dirty indicator for a record is set to 1?

A. The record is new
B. The record has been updated and needs to be tokenized
C. The record is ready for consolidation
D. The record is in an active state

Q2. Which statement is true regarding reject records?

A. Records with values not converted in the staging table to the expected data type, will be rejected
B. The stage job will reject records with missing or invalid lookup values
C. The load job will reject records where last_update_date is in the past
D. The stage job will reject addresses with invalid zip codes

Q3. Load by ROWID bypasses the match process and directly inserts the record into the X-ref table for the designated BO

A. True
B. False

Q4. Which statement is true about master data?

A. Master data often resides in more than one database
B. Master data is often used by several different groups
C. Master data is non essential business data
D. Master data is sometimes referred to as reference data

Q5. Which are the characteristics of an immutable source system?

A. Records from the source system will be accepted as unique
B. Only one source system can be configured as an immutable source
C. Immutable sources are also distinct systems
D. No records can be matched to it

Previous Quiz             Next Quiz

Thursday, August 2, 2018

Informatica Master Data Management - MDM - Quiz - 1

Q1. Which of these choices are associated with base object properties?

A. Complete tokenize Ratio
B. Requeue on parent merge
C. Generate match tokens on load
D. Allow null update
E. Allow constraint to be disabled

Q2. Which are available within the enterprise manager?

A. Database log configuration
B. Environment report
C. Hub server properties
D. Message queues setting

Q3. You can adjust the match weight for a fuzzy match column

A. True
B. False

Q4. Which of the following hub components can be promoted using metadata manager?

A. The cleanse function
B. Packages using custom queries
C. Message queues
D. Custom index

Q5. With regard to the match purpose, which statement is not correct?

A. Match purpose defines the primary goal behind a match rule
B. Each match purpose supports a combination of mandatory and optional fields
C. Both family and wide family are valid for match purpose
D. Household and Wide Household are valid for match purpose

      Next Quiz
