Technology World

Saturday, February 5, 2022

What are best Practices for Customer 360 data modeling ?

Are you planning to make changes in the existing customer 360 or c360 data model ? or are you thinking to extend the customer 360 data model and looking for guidelines in doing it? If so, then you reached the right place. In this article, we will understand best practices for customer 360 data modeling in Informatica Master Data Management.

A) What is Customer 360?

Informatica provides a pre-designed customer domain Master Data Management (MDM) tool. Using this we can expedite the development process for MDM implementation in our organization. Customer 360 comes with a prebuilt data model which we can either update or extend as needed.

Customer 360 MDM also comes with a user interface called customer 360 application and it is based on business entity services.

There are multiple aspects we need to consider while updating or extending the customer 360 data model and we are going to discuss this in this article.

B) What are the steps for modifying the existing customer 360 data model in Informatica MDM?

We can extend the customer 360 data model by various actions such as

1) Changing the physical schema

2) Adding needed columns to an existing table

3) Adding for updating values for existing columns

Following are the steps we need to perform to extend the data model.

Step 1 : Compare existing data model with business requirement and perform gap analysis

Step 2 : Prepare documentation that will provide a list of tables and columns needed to add

Step 3 : Take a backup of an existing schema

Step 4 : Review guidelines and standards for extending the data model

Step 5 : Add tables and columns as needed

C) What are the Guidelines for extending the customer 360 data model?

We can change the definition of tables or add a new table to the existing customer 360 data model. To perform these kinds of changes consider following guidelines

1. Check if we can use the existing table

2. Do not use a root base object to store organization or person information.

3. Do not define table names greater than 24 characters.

4. Do not delete existing columns

5. Do not delete existing base objects

6. Do not modify the data types of existing columns

7. Do not modify the physical name of existing base objects.

8. It is ok to modify the display name of existing base object tables or columns.

9. Do not decrease the length of an existing column.

10. Prefix the names of the new base object tables to distinguish the table from the existing tables.

11. For newly added column in the existing table-use prefix name of the column as x_

D) What are the guidelines for adding a new base object in the customer 360 data model?

We can add root or child base objects, lookup base objects or relationship base objects. For adding new base objects we need to consider the following guidelines.

1. Child base object with one-to-many relationships- Add a Party Role foreign key in the table to relate the table to the Party Role table.

2. Child base object with many - to -many relationships- Use the relationship base object to relate the table to the Party Role table.

3. Use lookup Indicator as TRUE for the lookup table.

Learn more about Informatica MDM here -

Tuesday, January 25, 2022

What is RANK function in Oracle database ?

Are you looking for an article on the Rank function in the Oracle database? Are you also interested in knowing how to use the RANK function in SQL? If so, then you reached the right place. In this article, we will learn the RANK function in detail Let's start.

A) What is the RANK function?

The RANK is an analytic function in oracle we use the RANK function to get the rank of a value in a group of values. The RANK function can be used both as an aggregate and as an analytic function.

B) Using RANK function as an aggregate function

We need to use the following syntax in order to use the RANK function as an aggregate function

RANK (Expression_1 , Expression _2 .......Expression_n)

WITHIN GROUP

(ORDER BY Expression_1 , Expression_2......Expression_n)

In the above syntax, Expression _1 , Expression_2 are used to determine the unique row in the group.

C) Using RANK function as an Analytic function

We need to use the following syntax in order to use RANK function as Analytic function -

RANK ( ) OVER ( [ query - partition _clause ]

ORDER BY clause )

In above syntax -

Order By clause - is an optional parameter and is used to order the data within each group.

query-partition_clause - it is an optional parameter and is used to partition results set into groups.

D) DENSE_RANK in oracle

The DENSE_RANK in oracle determines the rank of a row in an ordered group of rows and returns rank as Number Rank values are not skipped in the event of ties. If values are the same then the same rank number is given.

Learn more about oracle here -

Sunday, January 23, 2022

What Account Identifier in Snowflake ?

Are you looking for an article about what is Account Identifier in Snowflake is? Are you also interested in knowing what is the format for Account Identifier? If so, then you reached the right place. In this article, we will also learn about Organization names and Account names.

A) What is an Account Identifier in Snowflake?

The unique identifier which uniquely identifies the Snowflake account within a business entity and throughout the global network of snowflake is called Account Identifier.

Here the global network of Snowflake comprises of supported cloud regions and cloud platforms.

B) What are the uses of Account Identifier?

Following are important use cases where Account Identifier plays a vital role -

1) Account identifier is used in URLs for accessing the Snowflake web interface.

2) Account identifier is also used for connecting to Snowflake using drivers , snowSQL, connectors, and other clients.

3) It is also used in 3rd party applications which are part of the snowflake ecosystem.

4) Account identifies is required for secure Data sharing, Database replications, and Failover/failback features.

5) It is also used in interactions with external systems and securing snowflake internal operations.

C) How to identify Snowflake Account?

We can identify snowflake accounts using two ways -

1) Using given name in Organization

2) Using snowflake assigned locator

for identifying accounts using names in the organization, the ORGADMIN role must be created.

D) Identifying Account using Name in Organization

An organization is the first-class snowflake object and it is linked to the accounts owned by the business entity. The organization provides capabilities to administrators i.e users with ORGADMIN roles to create, view, and manage all accounts across different regions and cloud platforms.

Important points :

1) Account name must be unique within the organization.

2) Account name is not unique across organizations

3) Account names with underscores also have dashed versions.

Account name as Account identifier format -

1) <organization _name > - <account _name>

2) <organization _name> _ <Account_name>

3) <organization_name> . <Account_name>

Let's understand more about organization Name and Account Name -

a) Organization Name: It is the name chosen by the customer

* Organization name must be unique across all snowflake organizations.

* It can contain uppercase letters

* It can contain letters

* It should not contain underscore or any other special characters

* The organization name can be changed but has more complications.

b) Account Name: It is the name created by the customer.

* Account Name must be unique in an organization

* Account Name is not unique across snowflake organizations.

E) Identifying Account using Account locator in Region

The account locator is an identifier snowflake assigns at the time of account creation.

Customers can provide specific value to the account locater if it is created through a service representative else it will generate with random strings.

As each snowflake account is hosted on cloud platform in specific region , the account locator requires region id and cloud platform provider details in order to identify account using Account locator

Format using Account locator is

1) <account _locator> . <region_id> or

2) <account_locator> . <region_id> . <cloud>

Learn more about snowflake here

Thursday, January 20, 2022

What is Virtual Private Snowflake ?

Are you looking for details of Virtual Private Snowflake or VPS? Are you also interested in knowing how to use an account locator for a VPS account? If so, then you reached the right place. In this article, we will learn about Virtual Private Snowflake.

A) What is Virtual Private Snowflake?

Snowflake comes in various editions such as Standard Edition, Enterprise Edition, Business Critical Edition etc. Virtual Private Snowflake or VPS is another snowflake edition.

Virtual Private Snowflake provides the highest level of securities for organizations that have strict security requirements e.g Financial Institutions. It helps great help while dealing with sensitive data for analyzing and sharing such sensitive data.

Virtual Private Snowflake (VPS) comes with all the services and features of Business Critical Edition. However, all these Functionalities come in a completely separate environment i.e Isolated from all other snowflake accounts. In other words, the VPS accounts do not share resources with accounts outside VPS.

B) Account Identifier for VPS account

The account identifier for the VPS account is different than other snowflake editions. As we know the other snowflake editions have specific formats for Account Identifier, you can click here for more details. As VPS uses a different structure for hostnames and URLs hence the format for VPS account identifier is also different. In order to get a specific format for the VPS account, we need to reach out to the snowflake support team.

However, we have another alternative option that we can use for VPS account Identifier and that format is

C) Does VPS support secure Data sharing?

The answer is NO . the VPS or Virtual Private Snowflake currently does not support secure data sharing. On another hand, Enterprise and standard edition support secure Data sharing.

D) What are the important features of VPS?

Along with features of Business Critical Edition, the edition provides the following features

1) Use of Tri -secret service for customer-managed encryption keys.

2) Support for private connectivity to snowflake services using Azure Private Link, AWS Private Link, Google Cloud Private service connect.

3) Dedicated metadata store and pool for compute resources

4) Support for FedRAMP for US government regions.

Learn more about snowflake here -

Sunday, January 16, 2022

Snowflake interview questions and answers - Part III

Are you preparing for your snowflake interview? Are you looking for Snowflake interview questions and answers? If so, then you reached the right place. In this article, we will focus on snowflake caching interview questions and answers. You can visit the previous article on Snowflake interview questions - Part II here.

Q.1 Explain caching in Snowflake ? or How caching works in Snowflake?

Snowflake provides caching at two levels - one at the cloud services layer and the second one at compute level. When we execute SQL query against Snowflake. the result from the cloud services layer will be fetched. If cloud services layer cache is disabled then compute layer level cache will be used.

The important thing to note here is that cache scenarios work only when the time of execution falls within AUTO_SUSPEND specified time.

Q.2 How does cache work if the underlying table gets updated?

As we know when we submit a SQL query to a virtual warehouse in Snowflake, it gets executed against database storage and results are returned back to the cloud services layer. Apart from it, the data gets cached at compute layer and cloud services layer when we execute the same query again the result will be returned using cache.

Now let's assume that we either updated the underlying table by deleting a record or updating a record. After updating the table, if we run the same query again the cache will not be used. Instead, the virtual warehouse will be connected to database storage to fetch the data.

Q.3 Is it a good idea to run a select query after updating the table and after deleting the record separately?

When we update the table by deleting or updating a record then such change is made in the storage layer. that means, if we execute a query against Snowflake, it needs to connect the storage layer to fetch the latest data. Whenever we connect database storage through compute layer it incurs expenses. so it makes sense to perform all your DML operations such as delete, update as a single unit of work, and then query the results instead of making separate select query calls. This will be helpful to achieve cost-effectiveness.

Q. 4 Does the user's cache share across multiple users?

Assume user 1 executes a SQL query against snowflake using virtual warehouse VM1 and after successful execution, if the same SQL query is executed by user 2 with vm2, in such case will cache be used for user2 query execution? the answer is YES As long users use the same or different virtual warehouse and time of execution are under Auto_Suspend timeframe then cache from either result cache or local disk cache will be used.

Q. 5 How long the query results will be cached?

The query results are retained for 24 hrs from last the time of execution. Result cache layer holds queries for 24 hrs.

Q.6 How much snowflake charges for storing cache?

Snowflake does not charge for storing cache.

Q. 7 Explain more about Remote disk?

The remote disk is nothing but database storage which can be achieved using cloud providers like AWS S3. The remote disk is nothing but the blob storage area

Q. 8 How to disable local disk cache

We cannot disable the local disk cache.

Q. 9 How to disable Result cache

In order to disable Result cache use statement below -

ALTER SESSION SET USE_CACHED_RESULT=FALSE ;

Q. 10 Can we use the Result cache even if we suspend the warehouse?

The answer is YES. We can use the Result cache when the virtual warehouse is in a suspended or Inactive state.

Learn more about snowflakes here...

Sunday, January 2, 2022

What is difference between Star and Snowflake schema ?

Are you looking for the details about star schema and snowflake schema? If so, then you reached the right place. In this article, we will see details about it along with the differences between the Star & Snowflake schema.

A) What is star schema?

The star schema is the simplest schema used to develop dimensional data marts and the data warehouse star schema consists of one or more fact tables referencing the multiple numbers of dimension tables.

The star schema separates business process data into facts and dimensions facts holds measurable, quantitative data on the other hand dimensions provides descriptive attribute related to fact data.

1) Benefits:- The star schema is denormalized and the benefits of star schema are the queries are simple, business logic reporting is simplified, better query performance, improved performance for aggregation operation.

2) Disadvantages:- star schema is not flexible for complex analytical needs. It does not support many to many relationships. The data integrity is not well enforced due to the denormalized state.

B) Snowflake schema

The Snowflake schema is a logical arrangement of tables in a multidimensional database with entity relationships that resemble a snowflake shape.

Snowflake schema has centralized fact tables and those are connected to multiple dimensions.

The snowflake schema is similar to the star schema but in the Snowflake schema dimensions are normalized into multiple related tables.

1) Benefits:- Below are the benefits of snowflake schema Better storage savings due to normalization. optimization of some OLAP database models with snowflake schema.

2) Disadvantages:- Complexity in SQL query due to normalization. Data loads into the snowflake schema must be highly controlled and managed.

C) Difference between snowflake and star schema

The snowflake and star schema are similar in nature. However, the Snowflake schema is normalized for some dimensions. On other hand, the logical dimensions are denormalized in the star schema.

Learn more about snowflake here

Monday, December 27, 2021

What is Cold and Hot virtual warehouses ?

The virtual warehouse in snowflake goes through various states. The states are cold warehouse, warm warehouse, and Hot warehouse. In this article, we will learn about what does each state of virtual warehouses in snowflake. We will also see what impact of states of virtual warehouses on query performance.

A) What is a virtual warehouse in snowflake?

The virtual warehouse in snowflake is a cluster of computing resources that are used to perform activities such as DML operations and SQL execution. The compute resources consist of temporary storage, memory, and CPU. The majority of expenses occur due to the use of a virtual warehouse or compute layer.

B) What are the states of the virtual warehouse in snowflake?

The virtual warehouses go through various states and these are -

a) Cold virtual warehouse

b) Warm virtual warehouse

c) Hot virtual warehouse

In the next section, we will more about all these states of the virtual warehouse.

C) Cold virtual warehouse :

Starting a new virtual warehouse without using local disk cashing and executing the query is called a cold virtual warehouse. When our virtual warehouse is suspended or not active and if we run the query then it will start a new instance of the virtual warehouse.

D) Warm virtual warehouse :

It is the state of the virtual warehouse during which the virtual warehouse is active and running for a while and processed queries. Assume that, the virtual warehouse is in a warm state and we disable the result cache and execute the query then it will use local disk caching. This caching is called warm caching.

E) Hot virtual warehouse :

The hot virtual warehouse is a state of virtual warehouse during which the virtual warehouse is active and running for a while and processed queries. However, in this case, the result cache is on. If we execute the query in this state, the query result is returned from the result cache. This is the most efficient operation.

F) Impact of virtual warehouse state on query performance

1) Cold warehouse

The query takes longer than a warm and hot virtual warehouse. It uses a remote disk. Local disk cache is not used. Result cache is not used.

2) Warm warehouse

The query takes lesser than a cold warehouse but more than a hot virtual warehouse. It does not use a remote disk, however, it uses a local disk. It does not use result cache.

3) Hot virtual warehouse

The query takes lesser time for execution than a cold virtual warehouse and a warm virtual warehouse. It does not use both remote disk and local disk cache. The result is returned using the result cache. from the cloud services layer. It is the most efficient way of getting the result of the query.