DronaBlog

Tuesday, August 9, 2022

What are the top cloud data warehouses?

            Are you looking for the top cloud data warehouses in the current market? Are you interested in knowing which cloud data warehouse is the preferred one? If so, then you've reached the right place. Let's discuss the top cloud data warehouses.





A] What is a cloud data warehouse?

            A cloud data warehouse is a database hosted as a managed service and delivered as software-as-a-service (i.e., SaaS) in a public cloud, used for analytics and business intelligence operations.

            In some cases, it can also be offered as a private cloud provider service.


B] What are the top cloud data warehouses? 

             Following are the top cloud data warehouses currently available -

           1. Azure Synapse Analytics 

           2. Amazon Redshift 

           3. Google BigQuery

           4. Azure SQL Database 

           5. Snowflake

           6. Azure Cosmos DB






C] Which Cloud Data Warehouse should we use?

    The choice of cloud data warehouse depends on the business use case. However, Snowflake is currently the most commonly used cloud data warehouse, owing to the ease of use and performance it provides compared to the others.


Learn more about Snowflake Cloud Data Warehouse here.



Sunday, March 27, 2022

How does secure data sharing work in Snowflake?

               Are you looking for details about how data sharing works in Snowflake? Are you also interested in knowing which objects we can share in Snowflake? If so, then you've reached the right place. In this article, we will explore data sharing in Snowflake. Let's start.


A] What are the database objects we can share?

              The following Snowflake database objects can be shared -

              a) External tables 

              b) Tables 

              c) Secure views 

              d) Secure UDFs

              e) Secure materialized views

             The shared objects are read-only, i.e., the consumer cannot add, update, or delete data or objects.






B] How does secure data sharing work? 

             Secure data sharing is a Snowflake feature through which data is shared with consumers.

            Following are important points about secure data sharing.

           1. No actual data is copied or transferred

           2. Sharing is achieved using Snowflake's unique services layer and metadata store 

           3. No storage is needed on the consumer side

           4. No monthly charges to the consumer for storage 

           5. Consumers pay only for executing queries, i.e., for the compute resources they use 





       How does Data Sharing work in Snowflake? 

           a) Secure data sharing is based on provider and consumer concepts, as illustrated in the sketch below.

           b) The provider creates a share and grants it access to specific objects in a database.

           c) The provider can share objects from multiple databases, as long as those databases belong to the same account.

          d) On the consumer side, a read-only database is created from the share. Access to this database is configurable using standard role-based access control.
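
             For illustration, here is a minimal sketch of the provider and consumer steps. The database sales_db, schema public, table orders, share sales_share, account names, and role analyst_role are all hypothetical.

             -- Provider account: create a share and grant access to specific objects
             CREATE SHARE sales_share;
             GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
             GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
             GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

             -- Provider account: add the consumer account to the share
             ALTER SHARE sales_share ADD ACCOUNTS = consumer_org.consumer_account;

             -- Consumer account: create a read-only database from the share
             CREATE DATABASE shared_sales FROM SHARE provider_account.sales_share;
             GRANT IMPORTED PRIVILEGES ON DATABASE shared_sales TO ROLE analyst_role;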


Learn more about Snowflake here.



Saturday, February 5, 2022

What are the best practices for Customer 360 data modeling?

                        Are you planning to make changes to the existing Customer 360 (C360) data model? Or are you thinking of extending the Customer 360 data model and looking for guidelines on how to do it? If so, then you've reached the right place. In this article, we will cover best practices for Customer 360 data modeling in Informatica Master Data Management.


A) What is Customer 360? 

                       Informatica provides a pre-designed, customer-domain Master Data Management (MDM) tool. Using it, we can expedite the development process for an MDM implementation in our organization. Customer 360 comes with a prebuilt data model which we can either update or extend as needed.

                      Customer 360 also comes with a user interface, the Customer 360 application, which is based on business entity services.

                      There are multiple aspects to consider while updating or extending the Customer 360 data model, and we discuss them in this article.






B) What are the steps for modifying the existing customer 360 data model in Informatica MDM?

                     We can extend the Customer 360 data model through various actions, such as 

                               1) Changing the physical schema 

                               2) Adding needed columns to an existing table 

                              3) Adding or updating values for existing columns 

                  Following are the steps we need to perform to extend the data model.

                         Step 1: Compare the existing data model with the business requirements and perform a gap analysis.

                         Step 2: Prepare documentation listing the tables and columns that need to be added.

                         Step 3: Take a backup of the existing schema.

                         Step 4: Review the guidelines and standards for extending the data model.

                         Step 5: Add tables and columns as needed.






C) What are the Guidelines for extending the customer 360 data model? 

                   We can change the definition of tables or add new tables to the existing Customer 360 data model. To perform these kinds of changes, consider the following guidelines -

                    1. Check if we can use the existing table 

                   2. Do not use a root base object to store organization or person information.

                   3. Do not define table names greater than 24 characters.

                   4. Do not delete existing columns

                   5. Do not delete existing base objects

                   6. Do not modify the data types of existing columns 

                   7. Do not modify the physical name of existing base objects. 

                   8. It is ok to modify the display name of existing base object tables or columns.

                   9. Do not decrease the length of an existing column.

                  10. Prefix the names of the new base object tables to distinguish the table from the existing tables.

                   11. For a newly added column in an existing table, prefix the column name with x_ (see the hypothetical sketch below).
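
                  As a purely hypothetical illustration of guidelines 10 and 11: in practice these changes are made through the MDM Hub Console rather than hand-written DDL, every name below is invented, and the prebuilt party base object's physical name is assumed to be C_PARTY.

                  -- Hypothetical new base object, prefixed to distinguish it from the prebuilt C360 tables
                  CREATE TABLE EX_LOYALTY_PROGRAM (
                      ROWID_OBJECT   CHAR(14) NOT NULL,   -- standard MDM system column
                      PROGRAM_NAME   VARCHAR2(255),
                      PROGRAM_TIER   VARCHAR2(50)
                  );

                  -- Hypothetical extension of an existing base object: the new column uses the x_ prefix
                  ALTER TABLE C_PARTY ADD (X_LOYALTY_TIER VARCHAR2(50));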


D) What are the guidelines for adding a new base object in the customer 360 data model? 

                  We can add root or child base objects, lookup base objects, or relationship base objects. When adding new base objects, we need to consider the following guidelines.

                 1. Child base object with a one-to-many relationship - Add a Party Role foreign key in the table to relate the table to the Party Role table (sketched below).

                 2. Child base object with a many-to-many relationship - Use a relationship base object to relate the table to the Party Role table.

                 3. Set the Lookup Indicator to TRUE for lookup tables.
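
                A hypothetical sketch of guideline 1 follows; again, base objects are actually created through the Hub Console, the Party Role base object's physical name is assumed to be C_PARTY_ROLE, and all new names are invented.

                -- Hypothetical child base object with a one-to-many relationship to Party Role
                CREATE TABLE EX_COMMUNICATION_PREF (
                    ROWID_OBJECT      CHAR(14) NOT NULL,
                    PARTY_ROLE_FKEY   CHAR(14),            -- foreign key to C_PARTY_ROLE.ROWID_OBJECT
                    X_CHANNEL         VARCHAR2(50),
                    X_OPT_IN_IND      CHAR(1)
                );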


               Learn more about Informatica MDM here -

          



Tuesday, January 25, 2022

What is the RANK function in the Oracle database?

               Are you looking for an article on the RANK function in the Oracle database? Are you also interested in knowing how to use the RANK function in SQL? If so, then you've reached the right place. In this article, we will learn about the RANK function in detail. Let's start.


A) What is the RANK function?

              RANK is an Oracle function that we use to get the rank of a value within a group of values. The RANK function can be used both as an aggregate function and as an analytic function.






B) Using RANK function as an aggregate function 

             We need to use the following syntax in order to use the RANK function as an aggregate function -

             RANK(Expression_1, Expression_2, ..., Expression_n)
                 WITHIN GROUP (ORDER BY Expression_1, Expression_2, ..., Expression_n)

             In the above syntax, Expression_1, Expression_2, ... are constant values that describe a hypothetical row; the function returns the rank that row would have within the group.
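
             For example, assuming a hypothetical employees table with a salary column, the following query returns the rank a salary of 5000 would have among the existing salaries -

             -- Rank that a hypothetical salary of 5000 would receive among existing salaries
             SELECT RANK(5000) WITHIN GROUP (ORDER BY salary) AS salary_rank
             FROM employees;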

 





C) Using RANK function as an Analytic function 

              We need to use the following syntax in order to use the RANK function as an analytic function -

              RANK() OVER ([query_partition_clause] order_by_clause)

             In the above syntax -

             query_partition_clause - an optional clause used to partition the result set into groups.

             ORDER BY clause - required in the analytic form; it orders the data within each group (partition).
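
             For example, assuming a hypothetical employees table with department_id and salary columns, the following query ranks employees by salary within each department -

             -- Rank employees by salary within each department (highest salary gets rank 1)
             SELECT employee_id,
                    department_id,
                    salary,
                    RANK() OVER (PARTITION BY department_id ORDER BY salary DESC) AS salary_rank
             FROM employees;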


D) DENSE_RANK in Oracle 

               The DENSE_RANK function in Oracle determines the rank of a row in an ordered group of rows and returns the rank as a NUMBER. Unlike RANK, rank values are not skipped in the event of ties: rows with the same value receive the same rank, and the next rank follows consecutively.
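
               The difference is easiest to see side by side, using the same hypothetical employees table -

               -- With two employees tied on the highest salary:
               --   RANK        gives 1, 1, 3 (the rank after the tie is skipped)
               --   DENSE_RANK  gives 1, 1, 2 (no gap after the tie)
               SELECT employee_id,
                      salary,
                      RANK()       OVER (ORDER BY salary DESC) AS rnk,
                      DENSE_RANK() OVER (ORDER BY salary DESC) AS dense_rnk
               FROM employees;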


                  Learn more about Oracle here. 



Sunday, January 23, 2022

What is an Account Identifier in Snowflake?

              Are you looking for an article about what an Account Identifier in Snowflake is? Are you also interested in knowing the format of an Account Identifier? If so, then you've reached the right place. In this article, we will also learn about organization names and account names.


A) What is an Account Identifier in Snowflake?

            The identifier that uniquely identifies a Snowflake account within a business entity and throughout the global network of Snowflake is called the Account Identifier.

            Here, the global network of Snowflake comprises the supported cloud regions and cloud platforms.






B) What are the uses of Account Identifier?

              Following are important use cases where Account Identifier plays a vital role -

           1) The account identifier is used in URLs for accessing the Snowflake web interface.

           2) The account identifier is also used for connecting to Snowflake through drivers, SnowSQL, connectors, and other clients. 

          3) It is also used in third-party applications that are part of the Snowflake ecosystem.

          4) The account identifier is required for Secure Data Sharing, database replication, and failover/failback features.

          5) It is also used in interactions with external systems and in securing Snowflake's internal operations.


C) How to identify Snowflake Account? 

             We can identify a Snowflake account in two ways -

          1) Using the account name given within the organization

          2) Using the Snowflake-assigned account locator 

        To identify accounts using names within the organization, the ORGADMIN role must be created (see the sketch below).
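
        As a quick sketch, the following statements help inspect the current account's identifiers. SHOW ORGANIZATION ACCOUNTS requires the ORGADMIN role, and availability of these functions can vary by Snowflake release.

        -- Snowflake-assigned locator and region of the current account
        SELECT CURRENT_ACCOUNT() AS account_locator,
               CURRENT_REGION()  AS region;

        -- List all accounts in the organization (requires the ORGADMIN role)
        USE ROLE ORGADMIN;
        SHOW ORGANIZATION ACCOUNTS;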


D) Identifying Account using Name in Organization

            An organization is a first-class Snowflake object that is linked to the accounts owned by a business entity. The organization gives administrators, i.e., users with the ORGADMIN role, the ability to create, view, and manage all accounts across different regions and cloud platforms.

          Important points : 

             1) Account name must be unique within the organization.

            2) Account name is not unique across organizations

            3) Account names with underscores also have dashed versions.   

     When the account name is used as the account identifier, the format is one of -

            1) <organization_name>-<account_name>

            2) <organization_name>_<account_name>

            3) <organization_name>.<account_name>

    Let's understand more about organization Name and Account Name -

a) Organization Name: It is the name chosen by the customer.

        * The organization name must be unique across all Snowflake organizations.

        * It can contain letters, including uppercase letters.

        * It must not contain underscores or any other special characters.

       * The organization name can be changed, but doing so involves additional complications.

b) Account Name: It is the name created by the customer.

       * The account name must be unique within an organization.

       * The account name is not unique across Snowflake organizations.






E) Identifying Account using Account locator in Region 

             The account locator is an identifier that Snowflake assigns at the time of account creation.

             Customers can request a specific value for the account locator if the account is created through a Snowflake service representative; otherwise it is generated as a random string.

            Since each Snowflake account is hosted on a cloud platform in a specific region, the region ID and cloud platform details are required along with the account locator in order to identify the account.

           The format using the account locator is -

        1) <account_locator>.<region_id>      or 

        2) <account_locator>.<region_id>.<cloud>


       Learn more about Snowflake here. 



Thursday, January 20, 2022

What is Virtual Private Snowflake?

                 Are you looking for details about Virtual Private Snowflake, or VPS? Are you also interested in knowing how to use an account locator for a VPS account? If so, then you've reached the right place. In this article, we will learn about Virtual Private Snowflake.


A) What is Virtual Private Snowflake?

                  Snowflake comes in various editions, such as Standard Edition, Enterprise Edition, Business Critical Edition, etc. Virtual Private Snowflake, or VPS, is another Snowflake edition.

                  Virtual Private Snowflake provides the highest level of security for organizations that have strict security requirements, e.g., financial institutions. It is a great help when analyzing and sharing sensitive data.

                  Virtual Private Snowflake (VPS) comes with all the services and features of Business Critical Edition. However, all of this functionality is delivered in a completely separate environment, i.e., isolated from all other Snowflake accounts. In other words, VPS accounts do not share resources with accounts outside the VPS.






B) Account Identifier for VPS account 

                 The account identifier for a VPS account is different from that of the other Snowflake editions. As we know, the other Snowflake editions have specific formats for the account identifier; you can click here for more details. Because VPS uses a different structure for hostnames and URLs, the format for the VPS account identifier is also different. To get the specific format for a VPS account, we need to reach out to the Snowflake support team.

                 However, there is an alternative format that we can use for the VPS account identifier -

                     <organization_name>_<account_name>


C) Does VPS support secure Data sharing?

                 The answer is NO. VPS, or Virtual Private Snowflake, currently does not support Secure Data Sharing. On the other hand, the Enterprise and Standard editions do support Secure Data Sharing.


D) What are the important features of VPS?

                 Along with the features of Business Critical Edition, this edition provides the following features -

                1) Use of Tri-Secret Secure for customer-managed encryption keys.

               2) Support for private connectivity to Snowflake services using Azure Private Link, AWS PrivateLink, and Google Cloud Private Service Connect. 

               3) Dedicated metadata store and pool for compute resources 

               4) Support for FedRAMP for US government regions.


              Learn more about Snowflake here.




Sunday, January 16, 2022

Snowflake interview questions and answers - Part III

                       Are you preparing for your Snowflake interview? Are you looking for Snowflake interview questions and answers? If so, then you've reached the right place. In this article, we will focus on Snowflake caching interview questions and answers. You can visit the previous article on Snowflake interview questions - Part II here.


                                                                                                            



Q.1  Explain caching in Snowflake. How does caching work in Snowflake?

                        Snowflake provides caching at two levels: one at the cloud services layer (the result cache) and one at the compute layer (the local disk cache). When we execute a SQL query against Snowflake, the result is served from the cloud services layer cache if it is available. If the cloud services layer cache is disabled, the compute layer cache is used.

                        The important thing to note here is that the compute layer cache works only when the query is re-executed within the AUTO_SUSPEND time, i.e., before the warehouse suspends.






Q.2  How does the cache work if the underlying table gets updated?

                         As we know, when we submit a SQL query to a virtual warehouse in Snowflake, it is executed against database storage and the results are returned to the cloud services layer. In addition, the data is cached at the compute layer and the cloud services layer, so when we execute the same query again the result is returned from the cache.

                        Now let's assume we modify the underlying table by deleting or updating a record. After the table changes, if we run the same query again, the cache will not be used. Instead, the virtual warehouse will go to database storage to fetch the data, as illustrated in the sketch below.
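
                        A minimal sketch of this behavior, assuming a hypothetical orders table -

                        -- First execution: reads from database storage, result is cached
                        SELECT COUNT(*) FROM orders;

                        -- Second execution of the identical query: served from the result cache
                        SELECT COUNT(*) FROM orders;

                        -- The table changes, which invalidates the cached result
                        DELETE FROM orders WHERE order_id = 101;

                        -- Same query again: the cache is not used, data is fetched from storage
                        SELECT COUNT(*) FROM orders;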


Q.3  Is it a good idea to run a separate SELECT query after each update or delete on a table?

                        When we update a table by deleting or updating records, the change is made in the storage layer. That means that if we then execute a query against Snowflake, it needs to connect to the storage layer to fetch the latest data. Whenever we reach database storage through the compute layer, it incurs cost. So it makes sense to perform all DML operations such as deletes and updates as a single unit of work and then query the results once, as shown below, instead of making a separate SELECT call after each change. This helps achieve cost-effectiveness.
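
                        For example, with a hypothetical orders table -

                        -- Perform all DML as one unit of work ...
                        UPDATE orders SET status = 'CANCELLED' WHERE order_id = 101;
                        DELETE FROM orders WHERE status = 'DRAFT';

                        -- ... then verify with a single SELECT instead of selecting after every change
                        SELECT status, COUNT(*) AS order_count
                        FROM orders
                        GROUP BY status;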


Q.4  Is a user's cache shared across multiple users?

                         Assume user 1 executes a SQL query against Snowflake using virtual warehouse VM1, and after successful execution the same SQL query is executed by user 2 using VM2. Will the cache be used for user 2's query execution? The answer is YES. Whether users run the query on the same or a different virtual warehouse, as long as execution falls within the AUTO_SUSPEND timeframe, either the result cache or the local disk cache will be used.


Q.5  How long will query results be cached? 

                         Query results are retained for 24 hours from the last time of execution; the result cache layer holds query results for 24 hours. 


Q.6  How much does Snowflake charge for storing the cache? 

                         Snowflake does not charge for storing the cache.






Q.7  Explain more about the remote disk.

                         The remote disk is nothing but database storage, which is provided through cloud object stores such as AWS S3; in other words, it is the blob storage layer.


Q.8  How do we disable the local disk cache?

                     We cannot disable the local disk cache.


Q.9  How do we disable the result cache?

                      In order to disable the result cache, use the statement below -

                    ALTER SESSION SET USE_CACHED_RESULT = FALSE;
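
                    To turn the result cache back on for the session, set the same parameter to TRUE -

                    ALTER SESSION SET USE_CACHED_RESULT = TRUE;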


Q.10  Can we use the result cache even if the warehouse is suspended?

                        The answer is YES. We can use the result cache even when the virtual warehouse is in a suspended or inactive state, because the result cache lives in the cloud services layer.


                     Learn more about Snowflake here.



Sunday, January 2, 2022

What is the difference between Star and Snowflake schema?

                    Are you looking for details about the star schema and the snowflake schema? If so, then you've reached the right place. In this article, we will look at both in detail, along with the differences between the star and snowflake schemas.

A) What is a star schema? 

                    The star schema is the simplest schema used to develop dimensional data marts and data warehouses. A star schema consists of one or more fact tables referencing a number of dimension tables.

                    The star schema separates business process data into facts and dimensions: facts hold measurable, quantitative data, while dimensions provide descriptive attributes related to the fact data.

                   1) Benefits:- The star schema is denormalized. Its benefits are simpler queries, simplified business reporting logic, better query performance, and improved performance for aggregation operations.

                   2) Disadvantages:- The star schema is not flexible for complex analytical needs. It does not support many-to-many relationships, and data integrity is not well enforced due to the denormalized state. (A minimal example schema is sketched below.)
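
                   As a minimal, hypothetical sketch (all table and column names are invented), a star schema keeps each dimension denormalized in a single table referenced by the fact table -

                   -- Denormalized dimension: all product attributes in one table
                   CREATE TABLE dim_product (
                       product_key   INTEGER PRIMARY KEY,
                       product_name  VARCHAR(100),
                       category_name VARCHAR(100),
                       brand_name    VARCHAR(100)
                   );

                   CREATE TABLE dim_date (
                       date_key      INTEGER PRIMARY KEY,
                       calendar_date DATE,
                       month_name    VARCHAR(20),
                       year_number   INTEGER
                   );

                   -- Fact table: measurable, quantitative data referencing the dimensions
                   CREATE TABLE fact_sales (
                       date_key     INTEGER REFERENCES dim_date (date_key),
                       product_key  INTEGER REFERENCES dim_product (product_key),
                       quantity     INTEGER,
                       sales_amount NUMERIC(12, 2)
                   );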








B) Snowflake schema 

                   The Snowflake schema is a logical arrangement of tables in a multidimensional database with entity relationships that resemble a snowflake shape.

                  The snowflake schema has centralized fact tables that are connected to multiple dimensions.

                  The snowflake schema is similar to the star schema, but in the snowflake schema the dimensions are normalized into multiple related tables.

                 1) Benefits:- Better storage savings due to normalization, and some OLAP database models are optimized for the snowflake schema.

                 2) Disadvantages:- SQL queries become more complex due to normalization, and data loads into the snowflake schema must be highly controlled and managed. (A sketch of a snowflaked dimension follows.)
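
                 For comparison, a hypothetical snowflaked version of the same product dimension, normalized into related tables -

                 -- Normalized dimension tables: category and brand are split out of the product dimension
                 CREATE TABLE dim_category (
                     category_key  INTEGER PRIMARY KEY,
                     category_name VARCHAR(100)
                 );

                 CREATE TABLE dim_brand (
                     brand_key  INTEGER PRIMARY KEY,
                     brand_name VARCHAR(100)
                 );

                 CREATE TABLE dim_product (
                     product_key  INTEGER PRIMARY KEY,
                     product_name VARCHAR(100),
                     category_key INTEGER REFERENCES dim_category (category_key),
                     brand_key    INTEGER REFERENCES dim_brand (brand_key)
                 );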








C) Difference between snowflake and star schema 

                   The snowflake and star schemas are similar in nature. However, in the snowflake schema some dimensions are normalized, whereas in the star schema the logical dimensions are denormalized into single tables.


                             Learn more about Snowflake here.


