Technology World

Monday, August 6, 2018

Important and Useful Unix commands

Are you looking for an article which provides a list of important commands used for daily unix activities? This article provides a consolidated list of unix commands.

Introduction

In this article we have listed unix commands for : File and Directories, Compressed Files and Manipulating Data

File and Directory

Command	Details
cat	Displays File Contents
cd	Changes Directory to another directory
chgrp	Changes file group
chmod	Changes permissions
cp	Copies source file into destination
file	Determines file type
find	Finds files
grep	Searches files for regular expressions
head	Displays first few lines of a file
ln	Creates softlink on oldname
ls	Displays information about file type
mkdir	Creates a new directory dirname
more	Displays data in paginated form
mv	Moves (Renames) an oldname to newname
pwd	Prints current working directory
rm	Removes (Deletes) filename
rmdir	Deletes an existing directory provided it is empty
tail	Prints last few lines in a file
touch	Updates access and modification time of a file
vi	To view file content

Compressed Files

Command	Details
compress	Compresses files
gunzip	Helps uncompress gzipped files
gzip	GNU alternative compression method
uncompress	Helps uncompress files
unzip	List, test and extract compressed files in a ZIP archive
zcat	Cat a compressed file
zcmp	Compares compressed files
zdiff	Compares compressed files
zmore	File perusal filter for crt viewing of compressed text
apropos	Locates commands by keyword lookup
info	Displays command information pages online
man	Displays manual pages online
whatis	Searches the whatis database for complete words
yelp	GNOME help viewer

Manipulating Data

Command	Details
awk	Pattern scanning and processing language
cmp	Compares the contents of two files
comm	Compares sorted data
cut	Cuts out selected fields of each line of a file
diff	Differential file comparator
expand	Expands tabs to spaces
join	Joins files on some common field
perl	Data manipulation language
sed	Stream text editor
sort	Sorts file data
split	Splits file into smaller files
tr	Translates characters
uniq	Reports repeated lines in a file
wc	Counts words, lines, and characters
vi	Opens vi text editor
vim	Opens vim text editor
fmt	Simple text formatter
spell	Checks text for spelling error
ispell	Checks text for spelling error
emacs	GNU project Emacs
ex, edit	Line editor

The video below provides a tutorial on Unix topics -

Informatica MDM - Match Rule Tuning

Would you like to know how to perform Informatica MDM match rule tuning? Are you also interested in knowing what steps are involved in match rule tuning? If so, then this article, will assist you in understanding the steps involved in match rule tuning.

Introduction

The Informatica MDM match tuning is an iterative process. It involves the following steps: data profiling, data standardization, defining the fuzzy match key, tuning the fuzzy match process and database tuning.

Activities

The activities mentioned below are needed to perform the match rule tuning in the Informatica MDM.

Activity	Details
Data Profiling	The right data in for the match, the data investigation, the data accuracy, the data completeness.
Data Standardization	Cleaning and standardization
Define the Fuzzy Match key	Fuzzy match keys ( the columns that need to be matched ) with the key width.
Fuzzy Match Process	How to use the following: 1) Key width 2) Match level 3) Search level 4) Cleanse server log 5) Dynamic Match Threshold (DMAT) 6) Filters 7) Subtype Matching 8) Match Only Previous Rowid Object option 9) Configure match threads 10) Enable Light Weight Matching (LWM)
Database Tuning	1) Analyze tables 2) Create indexes 3) Configure Match_Batch_Size 4) Analyze STRP table.

Data Profiling

You need to perform analysis of the data on which the match will be performed. You should also analyze the quality of data across all fields.
Share the result of the data analysis with business users and get inputs about what attributes need to be considered for the matching process.
Identify fields which you think can provide better matches, e.g. SSN, TAXID etc.
The next step is to determine filter criteria which are normally used on exact columns such as COUNTRY=US. This will be helpful for achieving better performance.
You need to also determine the completeness of data. For example, if the country code is valued in only 50% of the records, it may not be a good candidate as an exact column.
You need to verify percentage of data accuracy.,.e.g. the gender field should only contain gender values.
It is always a good idea to analyze data using the pattern mechanism.
Finally determine the type of match population to use. e.g. USA.

Data Standardization

Determine the cleansing rule to standardize data, for example, Street, St. to ST.
Use data standardizing tools such as address doctor, Trillium or any other third party tool.

Determine the Fuzzy Match Key

The basic rules mentioned below about defining the fuzzy match key include:

OrganizationName: If the data contains the organization names or both organization names and the person's name
PersonName: If the data contains person names only
AddressPart1: If the data contains only the address

Tuning the Potential match candidates

a) Key Width:

For less SSA indexes, reduce the key width to ‘Preferred’
For more match candidates, use the key width as ‘Extended’

b) Search Level:

For less SSA ranges use the search level as ‘Narrow’
For more candidates to match use search level as ‘Exhaustive’
Use ‘Typical’ search level for business data
To match most candidates, use search level as ‘Extreme’. It has performance issues associated with it.

c) Match Level:

For records that are highly similar and which should be considered for a match, use match level as ‘Conservative’
Use match level as ‘Typical’ for most matches
Match level ‘Loose’ is better for manual matches to ensure that tighter rules have not missed any potential matches

d) Define the fuzzy match key columns that have more unique data, e.g. the Person Name or the Organization Name

e) Data in the fuzzy match key columns should not contain nulls. Nulls (SSA_KEY is K$$$$$$$) are potential candidates for each other.

Use the range query as below and review the SSA_DATA column for all the qualifying candidates-

SELECT DISTINCT ROWID_OBJECT, DATA_COUNT,SSA_DATA, DATA_ROW

FROM C_PARTY_STRP

WHERE SSA_KEY BETWEEN ‘YBJ>$$$$’ AND ‘YBLVZZZZ’

AND INVALID_IND = 0

ORDER BY ROWID_OBJECT, DATA_ROW

Cleanse Server logs

Cleanse server logs help to determine long running ranges. These long running ranges normally have more candidates to match. Isolate such ranges by looking into the cleanse server logs. Normally, production is a multi-threaded environment, so determine the rangers for these threads. Analyze which thread is taking more time and take out those records from the matching process and re-run the match job.

The video below provides more details about the match rule tuning -

Technology World

DronaBlog

Monday, August 6, 2018

Important and Useful Unix commands

Introduction

File and Directory

Compressed Files

Manipulating Data

Informatica MDM - Match Rule Tuning

Introduction

Activities

Data Profiling

Data Standardization

Determine the Fuzzy Match Key

Tuning the Potential match candidates

Cleanse Server logs

Friday, August 3, 2018

Informatica Master Data Management - MDM - Quiz - 3

Informatica Master Data Management - MDM - Quiz - 2

Thursday, August 2, 2018

Informatica Master Data Management - MDM - Quiz - 1

Configuring Survivorship in Informatica IDMC MDM SaaS