DronaBlog

Showing posts with label Snowflake. Show all posts
Showing posts with label Snowflake. Show all posts

Saturday, October 16, 2021

What is Time Travel in Snowflake ?

                        Are you looking for details about Time Travel in Snowflake? Are you also interested in knowing what are tasks we can perform using Time travel feature? If so, then you reached the right place. In this article, we will learn one of the powerful features is Snowflake.


A) What is Time Travel in Snowflake

                        The feature by which we can access historical data at any point within a specified period is called Time Travel in snowflake we can access data not only changed but deleted as well.


B) What are the tasks that can be performed using Time travel in Snowflake?

                      The tasks below can be effectively performed by using Time Travel Feature 

             1.  Backing up the data from key points in the past.

             2. Duplicating the data from key points in the past.

            3. Restoring tables, schemes, and databases if those are accidentally deleted.






C) What is Data Protection Lifecycle? 

                  In snowflake, there are three-phase of the data protection lifecycles. 

           1. Current Data Storage: on the current data set we can perform standard operations such as DML , DDL etc.

           2. Time Travel Retention: The normal retention period is 1 to 90 days. Here is the list of operations allowed with time travel.

          a) SELECT .... AT| BEFORE ...

          b) CLONE ... AT|BEFORE ...

          C) UNDROP...

          3. Fail safe: This is the last phase in Data Protection Lifecycle. This can only be performed by snowflake No user operations are allowed.


D) Data  Retention Period in snowflake 

                In snowflake, Data Retention Period is a key component for Time Travel. The Data retention period specifies the period or number of days we can preserve data. Snowflake Preserves the state of data before update /delete/drop. 





               For Snowflake Standard  Edition, the Data Retention period is one day.

              For Snowflake Enterprise  Edition, Data Retention Period between 0 to 90 days.


            Learn more about Snowflake here -



           

Sunday, October 3, 2021

How to handle NULL values in Snowflake

                   Are you facing challenges while handling NULL values in snowflake? Are you also interested in knowing what are the things we need to consider while columns in the query contain NULL values? In this article, we will learn details about handling NULL values.


A) What are NULL values?

                   Many times NULL value is referred to as no value. some systems do not have NULL concepts. In technical terms, the NULL value is a reference to an empty area of the memory. some systems handle differently than others. Snowflake has its own way of handling NULL values.






B) What are the rules for handling NULL values in Snowflake?

                  Here is the list of rules  -

             1. Not null to null value comparison :

                   we compare not null value with the null value it returns NULL

                   e.g  'ABC ' =NULL returns NULL

             2. Equality or inequality comparison :

                   If we use inequality comparisons such as less than or greater than it results in a NULL                              value.

                   e.g 'ABC' > NULL returns NULL

             3. Comparision of NULL values :

                   If we compare one NULL value with another NULL value then it results in a NULL value.

                   e,g . NULL= NULL results NULL

             4. Best way of comparing NULL values is 

                     IS NULL or IS NOT NULL

             5. Function to null safe equality 

                  We can use the EQUAL_NULL function to check NULL safe equality.

                  e.g. equal_null ('ABC' , null) will return false.





             6. Aggregate function dismisses NULL values.

                  Assume that one of the fields in the snowflake contains the below values -

                  1,2, NULL , 3, NULL 6, 1

                 If we use aggregate function 4 AVG on this field then it will return the result as                                              3 i.e(1+2+3+6)/4

                 i.e. It ignores NULL values.

             7. Real average value 

                  In case we need a real average in the above example we need to use the SUM function, where the NULL value is 0

                  i.e. SUM(<value>)/count (*)

                 e.g. sum(1+2+0+3+0+6)/6

                  i.e. 12/6 = 2

             8. The empty string in snowflake is NOT NULL

                 e.g. "IS NULL will returns false.

             9. Count (*) returns the total count of rows in the table 

            10. Count (<column_name>) returns the count of rows with not null values only


                                  Learn more about snowflake here -



Tuesday, September 28, 2021

What are the components of snowflake architecture ?

                Are you looking for an article on snowflake architecture? Are you also looking for the components of snowflake architecture? If so, then you reached the right place. In this article, we will explore database storage, Query processing, and cloud services in detail.

A) What is the Architecture of Snowflake?

               Snowflake architecture is a hybrid of a shared-nothing database and shared disk.

       1. Snowflake uses a central data repository which is similar to shared-disk architecture.

       2. Snowflake processes queries using massively parallel processing compute clusters. In this kind of architecture each node in the cluster, stores a portion of the entire data set. This is similar to shared-nothing architecture.





B) What are the components of Snowflake Architecture?

              The components of snowflake architecture are as below 

        1. Database Storage

        2. Query Processing 

        3. Cloud Services 

                 Let's understand each of these components one by one

1. Cloud  Services 

               It is the topmost layer in snowflake architecture. It consists of a collection of services that coordinates various activities across the Snowflake platform. The cloud services join various components of a snowflake in order to fulfill requests such as login or giving a response back to the user.

               Here is the list of services that are handled in this layer.

         1. Authentication 

         2. Infrastructure Management 

         3. Metadata Management 

         4. Query Parsing 

         5. Query optimization 

         6. Access Control

2. Query Processing 

               In this layer, query execution is handled. It is the most common and widely used component of the snowflake.  The queries are processed using a virtual warehouse. Each virtual warehouse is massively parallel processing compute cluster. It consists of multiple compute nodes provided by snowflake from the cloud provider.





3. Database Storage 

              It is cloud storage where optimized data is stored. What is optimized data? The optimized data is nothing but the data which is reorganized by snowflake into the compressed and columnar format.

             What are aspects handled by snowflake related to data? here is a list which is taken care of by snowflake 

         1. File Size 

         2. Structure of the data 

         3. Compression of the data 

         4. Metadata

         5. Statistics of the data

         6. Organization of the data


 The important thing here is the data stored by snowflake is not visible or accessible directly by customers. It can only be accessed using SQL query operations.


                 Learn more about snowflake here 



Tuesday, July 27, 2021

Top 10 commonly used commands in Snowflake

        Are you looking for details about commonly used commands in Snowflake? Are you also interested in knowing what are DML, DDL snowflake commands? Then you have reached the right place. In this article, we will explore more about snowflake commands.

A) Types of commands in Snowflake

          There are four types of commands present in snowflake and those are 

          1. DDL - Data Definition Language

          2. DML - Data Manipulation Language

          3. DCL - Data Control Language

          4. TCL - Transaction Control Language

         In this article, we will focus on DDL and DML commands use in Snowflake






B) What commonly used DDL commands in Snowflake?

            Here is the list of DDL commands used in snowflake

           1. ALTER

           2. CREATE

           3. DROP 

           4. USE

           5. SHOW


           Let's see each of these commands one by one -

1. AFTER - AFTER command is used to modify metadata of on account level, parameters for session or metadata of the database object

Syntax : 

              AFTER <object_type> <object_name> <actions>

e.g  AFTER SESSION SET < params >


2. CREATE - CREATE command is used to create new object

Syntax : 

              CREATE <object_type> <object_name>

e.g.  CREATE  DATABASE ABC


3. DROP - DROP command is used for removing object from system.

Syntax : DROP <object_type> [IF EXISTS] <identifier>

e.g  DROP USER [IF EXISTS] abc_ user


4. USE - USE command is used to specify role, warehouse database ,or schema for current session

Syntax : USE WAREHOUSE <name>

e.g   USE WAREHOUSE xyz 


5. SHOW - SHOW command is used to provide metadata for the object.

Syntax : SHOW <object_ type_plural> [LIKE '<pattern>']

e.g  SHOW PARAMETERS [LIKE '<pattern>']






C) What are commonly used DML commands in Snowflake?

         Here is the list of DDL commands used in Snowflake

          1. INSERT

          2. MERGE

          3. UPDATE

          4. DELETE

          5. TRUNCATE

             Let's see each of these commands one by one.


1. INSERT - INSERT command is used to insert one or more rows into the table.

       Syntax :

       INSERT INTO <table_name> [< column_name>]

           VALUES (<value>|DEFAULT|NULL,...)

        

       e.g

        INSERT INTO TAB_ABC  (id, name)

         VALUES ( 100 , ' DRONA')

2) MERGE - MERGE command is used to Insert, delete and update values in a table based on values in a subquery or another table.

         Syntax : 

         MERGE INTO <table_name> USING <source> ON <join_exp>L

         e.g

         MERGE INTO TAB_ABC USING TAB_PQR ON TAB_ABC.ID=TAB_PQR.ID WHEN MATCHED Then 

         Update set TAB_ABC .NAME = TAB_PQR.NAME


3. UPDATE: The UPDATE command is used to update rows in the table.

       Syntax : 

              UPDATE <table_name> SET <field>=<value>

       e.g 

            UPDATE TAB_ABC SET NAME = 'XYZ'


4. DELETE : DELETE command is used to delete records from the table.

         Syntax : 

              DELETE FROM <table_name > WHERE <condition>

          e.g 

              DELETE FROM TAB_ABC WHERE NAME='BOB'


5) TRUNCATE: TRUNCATE command is used to remove records from the table including privileges and constraints.

          Syntax :

              TRUNCATE TABLE <table _name>

          e.g 

              TRUNCATE TABLE TAB_ ABC.







Sunday, June 27, 2021

What is Common Table Expression (CTE) in Snowflake?

    Are you working on Snowflake technology and would like to understand Common Table Expression also known as CTE? If so, then you reached the right place. In this article, we will understand what is Common Table Expression. We will also explore how to use CTE in queries.

A) What is Common Table Expression i.e CTE?

      A Common Table Expression (CTE) is a named subquery. It is defined in a WITH Clause. It is equivalent to a temporary view.

      The  CTE Contains an optional list of columns and a SELECT Statment. The output of CTE is a table with a column defined.





B) What is the syntax for CTE?

     The syntax for CTE is as below

      WITH

            TEMP_CTE (COL_1 ,COL_2) AS

              (SELECT col_nm1 , col_nm2 From TABLE)

     SELECT COL_1 , COL_2 FROM TEMP_CTE

C) What is recursive CTE?

      A recursive CTE is a CTE that references itself. In a recursive CTE, we can join Table to itself as many times as needed.


D) Benefits of CTE?

      1) CTE increases the modularity of the SQL programs

       2) CTE is helpful for simplified maintenance

       3) Recursive CTE can be used to process hierarchical data in the table.





E) Naming conventions for CTE?

      We need to avoid CTE names that match with database views or table the reason for this is - if we define query with CTE names then CTE takes precedence over table or view names which will produce unexpected results.

         For more details about snowflake refer to this video  



Friday, June 18, 2021

How to Use LIKE operator in Snowflake?

   Are you looking for details of what is collate in Snowflake? Are you also interested in knowing what is the purpose of collate and how to use it? If so, then you reached the right place. We will understand how to use collate to achieve LIKE functionality in Snowflake. In this article, we will more about collate or collation in Snowflake.




A) What is collate?

          The collate function in Snowflake allows specifying alternative rules for comparing strings.

B) What is the purpose of collate in Snowflake?

        The collate function in Snowflake is used to compare and sort the data. The comparison and sorting will be based on a particular language or other user-specified rules.

         The text strings in Snowflake are stored using UTF-8 character set. Comparing based on Unicode will not provide the desired output because of the following reasons :

          1. The special character in a language does no sort based on the language standards.

          2. In case we would like to achieve sorting based on special rules .e.g case insensitive sort.


C) What type of rules can be used with collate in Snowflake

     Here is a list of rules that can be used with collate 

     1. Different character sets for different language

     2. To achieve case insensitive comparisons 

    3.  Accent sensitivity e.g a,á,ä

    4 . Punctuation sensitivity e.g P-Q-R and PQR

    5. Sorting based on the first letter in the strings.

    6. Trimming leading and trailing spaces and then sorting

     7. Other options can be implemented based on business needs.





D) Where to use collate in Snowflake SQL?

       1. Simple comparison in where clause

              WHERE  FIELD1= FIELD2

        2. Join condition

              ON  EMP. EMP_NM =MANAGER.MNGR_NM

         3. Sorting condition

               ORDER BY FIELD 1

         4. Aggregation condition

                GROUP BY

          5. Aggregate functions

                  MAX ( FIELD1)

          6. Scaler functions

                 LEAST (FIELD1, FIELD2, FIELD3)

         7. Data clustering conditions

                CLUSTER BY (FIELD1)

     There are other several usages of collate in SQL, however above mentioned are commonly used.


E) How to use collate with LIKE operator

      here is an example of collate using like operation

     SELECT * FROM EMP WHERE COLLATE (NAME,") Like%ABC%


Learn more about snowflake here -



Wednesday, May 19, 2021

Things to consider while configuring snowflake account - part 2

In the previous article on Things to consider while configuring Snowflake Account - part I, we understood two important factors i.e Data retention and timezone. In this article, we will see security, cost savings, and connection performance. Let's start.






A] Security -

        The security requirement for Snowflake has two parts.

a) Storage integration

        We should not create external stages without storage integration. A Storage integration is secure means of creating connectivity between snowflake and cloud. storage provider. We need to set the below properties to `true ' so that we can prevent the exposure of access tokens or secret keys to snowflake users, the properties are 

  •  REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_CREATION 
  •  REQUIRE_STORAGE_INTEGRATION_FOR_STAGE_OPERATION

b) Network Policies.

        It is important to prevent access from unwanted networks to the snowflake account. The network policies help to define a list of valid network locations for user connection.

     We can configure account level and user-level network policies.

        NETWORK_POLICY is a parameter associated with network policies.





B] Connection Performance

      In order achieve better connection performance we need to set CLIENT_METADATA_REQUEST_USE_CONNECTION_CTX to value `TRUE'. This helps to reduce the amount of information used for JDBC and ODBC connections.


C] Cost Savings.

      Cost  of Snowflake implementation is dependent upon two factors 1) Users  2) Warehouses

1) User: In the snowflake account there are system users and human users. Set different configurations for each of these. Below are helpful parameters for users - 

  •  LOCK_TIMEOUT
  •  STATEMENT_TIMEOUT_IN_SECONDS
  •  STATEMENT_QUEUED_TIMEOUT_IN_SECONDS


2) Warehouse: The optimum use of Snowflake warehouses can be achieved by resource monitoring, setting the appropriate warehouse size, and setting proper cluster count

      The mentioned below are parameters of Warehouse

  • RESOURCE MONITOR
  • AUTO_SUSPEND 
  • WAREHOUSE_ SIZE 
  •  MIN_CLUSTER_COUNT


 Learn more about snowflake here-



Saturday, May 1, 2021

Things to consider while configuring snowflake account - Part I

Are you planning to use snowflake in your project? Are you interested in knowing all things you need to consider while configuring a snowflake account? If so, then you reached the right place. In this article, we will focus on five important factors for snowflake Configuration. This is part I of the article.





Introduction:-

With the snowflake chargeback model, you can quickly shorten your project but we have to consider several factors before starting configuring and using the snowflake account. Let's understand these factors one by one 

Factors to consider:-

These five important factors that need to consider for the snowflake account and are

1. Data Retention

2. Timezone

3. Security

4. Connection Performance

5. Cost Savings

          In this article, we will focus on Data Retention and Timezone. We will learn more about the remaining factors in Things to consider while configuring snowflake account-part 2





A] Data Retention 

     What is data retention? It is the time for which snowflake will retain a historical view of the data.

a) Important Points -

          • For cost-saving, we can set value for each  database. for non-production data set it to fewer days

          • For development environment one day is adequate 

          • The default data retention time is one day

          • For enterprise production environment we can set it to go days so that we can traverse data for a longer duration

b) Parameter

        DATA - RETENTION_TIME_IN_DAYS

B] Timezone 

         The time-related values are presented with the timezone in the Snowflake configuration.

a) Important Points -

         •- The default timezone is Los Angeles.

         •- We can change timezone based on our needs

         •- As normal practice companies set the value to if companies presence is across the globe.

         •- Set timezone same at on account level and at the user level.

         •-  Provide time-related values with timezone to consuming systems.




 

Tuesday, April 27, 2021

How to define chargeback model for snowflake ?

 Are you planning to implement snowflake and interested in knowing about how to plan your snowflake expenses ? Are you also would like to know what are the expenses occurs for snowflake implementation? If so,then you reached right place. In this article we will see what is snowflake chargeback model, what are the snowflake components where expenses occurs and how to plan those.so Let's start 





What is snowflake chargeback model?

snowflake operates with a flexible pay-as-you-go model. For traditional data solution we have initial fixed cost and later adds with service and maintains cost . Here snowflake gives flexibility to increase or decrease the cost based on usage. what we have to do is , just create an account and start using it . However , if we do not plan properly the snowflake bill will go high.

Factors to consider about pay 

a) How to Pay : We  have to decide in advance about paying snowflake credits. We need to decide whether each project will pay for usage or enterprise level payment.

b) Snowflake discount : Snowflake provides discount on the volume of purchased credit so consider making consolidated purchase.





Snowflake component with expenses?

Here is list of Snowflake components which produces expenses -

1. Warehouse

2. Snowflake

3. Materialized views

4. Cloud services

5. Data transfers

6. Storage

How to reduce snowflake expenses?

a) Very first thing to consider for reducing snowflake expenses is minimize the usage across enterprise i.e use only much you need

b) Plan for all potential expenses from each of the snowflake components

c) Track snowflake expenses to determine how much consumption has occurred & how to charge it.

d) Create naming convention for database objects which will help to identify owner and associate budget .

e) Maintain the snowflake expenses tracking

Learn more about snowflake here -




Tuesday, January 19, 2021

Join Conditions in Snowflake

 Are you looking for an article about the type of Join conditions in Snowflake? Are you also would like to understand these different types of join conditions in detail? If so, then you reached the right place. In this article, we will understand various join conditions used in Snowflake.





Introduction:

As like DBMS, Snowflake supports various types of join conditions. Join condition allow us to combine data from two or more tables. In Snowflake, there is no association between the tables using the primary key and foreign keys but we can still use the join condition.


Type of Join in Snowflake:

  • Inner Join
  • Outer Join
    • Left Outer Join
    • Right Outer Join
    • Full Outer Join
Let's understand each Type of Join with help of the Venn diagram

Understanding Syntax for Join Condition

Using the join condition in the SELECT statement is command practice. After writing the SELECT statement in the FROM clause we need to provide a table name then followed by the JOIN keyword. We also need to provide details on which column we need to perform join by using an ON clause. Here is the syntax for the join condition in Snowflake.




A. Inner Join

The Inner join in Snowflake is used to return rows from both tables which satisfy the condition. 

Example:
SELECT CUST.CUST_NBR
, CUST.FIRST_NAME
, CUST.LAST_NAME
, ADDR.AD_LINE_1
, ADDR.COUNTRY
FROM CUSTOMER CUST
INNER JOIN ADDRESS ADDR
ON CUST.CUST_NBR = ADDR.CUST_NBR.

The above query returns the elements which common between the CUSTOMER and ADDRESS tables.


B. Outer Join

If two tables do not contain all the records then we use an OUTER JOIN. Each table contains records even if the other table does not contain any corresponding record. The outer Join is subcategorized as 
  1. Left Outer Join
  2. Right Outer Join
  3. Full Outer Join
Let's understand these Outer join one by one.

1. Left Outer Join

The LEFT JOIN in Snowflake returns all the rows from the table on the left side of the equation even if there are no matching rows in the table on the right side of the equation.



Syntax

SELECT CUST.CUST_NBR
, CUST.FIRST_NAME
, CUST.LAST_NAME
, ADDR.AD_LINE_1
, ADDR.COUNTRY
FROM CUSTOMER CUST
LEFT OUTER JOIN ADDRESS ADDR
ON CUST.CUST_NBR = ADDR.CUST_NBR.





2. Right Outer Join


The RIGHT JOIN in Snowflake returns all the rows from the table on the right side of the equation even if there are no matching rows in the table on the left side of the equation
.
Syntax
SELECT CUST.CUST_NBR
, CUST.FIRST_NAME
, CUST.LAST_NAME
, ADDR.AD_LINE_1
, ADDR.COUNTRY
FROM CUSTOMER CUST
RIGHT OUTER JOIN ADDRESS ADDR
ON CUST.CUST_NBR = ADDR.CUST_NBR.

3. Full Outer Join

The FULL OUTER JOIN in Snowflake returns all the rows from both tables irrespective of condition match.
Syntax
SELECT CUST.CUST_NBR
, CUST.FIRST_NAME
, CUST.LAST_NAME
, ADDR.AD_LINE_1
, ADDR.COUNTRY
FROM CUSTOMER CUST
FULL OUTER JOIN ADDRESS ADDR
ON CUST.CUST_NBR = ADDR.CUST_NBR.


















Learn more about Snowflake here

What are differences between multimerge and merge API in Informatica MDM

                Are you interested in knowing what is the use of multimerge and merge APIs? Are you also would like to know the difference b...