DronaBlog

Tuesday, November 23, 2021

Why is Snowflake a leader?

                  Do you know why Snowflake is one of the leading cloud data warehouse platforms in the current market? Are you also aware of how Snowflake has evolved over time, and how it continues to evolve and replace traditional data warehouses? We are going to cover all of this in this article.


Introduction:

                  In this article, we will look at the golden age in which we are living and at the design of the traditional data warehouse. Then we will see how Snowflake has evolved over the years, and finally what makes Snowflake a leader in the cloud data warehouse market.


Golden Age:

                 Let's start with the golden age. Currently, we live in the golden age of distributed computing. Public cloud platforms such as Amazon Web Services, Google Cloud Platform, and Microsoft Azure provide virtually unlimited storage and compute resources, and these resources are available on demand. Because of that, end users can get an enterprise-class experience for systems and applications through the software-as-a-service (SaaS) model. We do not have to spend a lot of money for this experience: these services are cost-efficient and perform well. Cloud data warehouses leverage these features; traditional data warehouses do not.





                  Before looking at the drawbacks of the traditional data warehouse implementation, let's have a look at its design. As we can see, the traditional data warehouse has multiple layers: the data source layer, staging layer, warehouse layer, data mart layer, and client layer.



                  The data source layer brings in data from various sources such as Salesforce, CRM, Human Resources, etc. Such data is stored in a traditional database or in flat-file format. ETL (Extract, Transform, Load) is implemented to pull data from the source systems and push it to the staging layer. After cleansing and standardization, the data is loaded from the staging layer into the data warehouse. Along with raw data, we also store metadata and summaries of the data in the data warehouse. Finally, this data is published to the data marts. A data mart might be for sales, inventory, or purchasing. On top of the data mart layer sits the client layer. Business users and business analysts perform various operations to carry out in-depth data analysis, prepare reports, and perform data mining. All these users connect to multiple data marts for their needs.
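The flow through these layers can be sketched in a few lines of Python. This is only a toy model: the sample rows, the field names, and the function names are all made up for illustration, and a real warehouse would use an ETL tool and database tables instead of in-memory lists.

```python
# Toy model of the layers described above; all names and data are made up.
source_rows = [
    {"customer": " alice ", "region": "emea", "amount": "120.50"},
    {"customer": "BOB", "region": "apac", "amount": "80"},
    {"customer": "alice", "region": "emea", "amount": "30"},
]

def extract(rows):
    """Data source layer -> staging layer: copy the raw rows unchanged."""
    return [dict(row) for row in rows]

def transform(staged):
    """Cleanse and standardize the staged rows."""
    return [
        {
            "customer": row["customer"].strip().title(),
            "region": row["region"].upper(),
            "amount": float(row["amount"]),
        }
        for row in staged
    ]

def load(clean_rows):
    """Staging layer -> warehouse layer: keep detail rows plus a summary."""
    return {"detail": clean_rows, "summary": {"row_count": len(clean_rows)}}

def build_sales_mart(warehouse):
    """Warehouse layer -> data mart layer: total sales per region."""
    totals = {}
    for row in warehouse["detail"]:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

warehouse = load(transform(extract(source_rows)))
sales_mart = build_sales_mart(warehouse)
print(sales_mart)
```

Even in this tiny form, every hop between layers is a separate step that has to be built and maintained, which is the complexity the traditional design carries.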

                   As we can see, this traditional data warehouse model is complex and resource-intensive. The traditional data warehouse model was designed on the assumption that it would deal with fixed resources. That was true earlier, but with the evolution of technology, social media, and other advancements in the sector, a fixed-resource design is no longer relevant. We deal with a variety of data arriving at different speeds and in different formats. Traditional data warehouses face challenges in managing all these aspects of data.

                 Another aspect of a traditional data warehouse is investment cost. We need to invest a big chunk of money in the early stages of a data warehouse implementation, which is not the case with a cloud data warehouse.

                Complex ETL pipelines are another drawback of the traditional data warehouse. As we can see, we need to build multiple pipelines to push data from the data source to the staging layer, from the staging layer to the data warehouse, and from the data warehouse to the data marts. Adding flexibility to this flow is very challenging for the traditional data warehouse. This is where Snowflake comes into the picture.

Snowflake Evolution:

We are going to look at Snowflake's features and advancements, but first let's understand a few basics. Currently, Snowflake supports three cloud platforms: Amazon Web Services, Google Cloud Platform, and Microsoft Azure. Snowflake supports various regions across the world, including North America, Europe, and Asia Pacific. Each of these regions is served by the respective cloud platforms, i.e. Amazon Web Services, Google Cloud Platform, and Microsoft Azure.




                Let's have a look at how Snowflake evolved over time. Snowflake was founded in 2012 and was publicly launched in October 2014. In the same year, it became available on Amazon Web Services, using Amazon S3 for storage. Once it became more stable, it introduced support for Microsoft Azure in 2018, and support for Google Cloud Platform was introduced in 2019. As we can see, within a short period of time this product has evolved a lot. It now supports three major cloud service providers, i.e. Amazon, Google, and Microsoft. As it progresses, it may support more cloud providers in the future. Snowflake also ranked near the top of the Cloud 100 list in 2019. Snowflake is one of the leading tools in the cloud data warehouse market.





                So, what makes Snowflake a leading platform? The critical aspect of Snowflake is that it separates the storage and compute layers. Traditional data warehouses support either a shared-nothing architecture or a shared-disk architecture. Snowflake, on the other hand, brought a hybrid approach to the table with benefits from both the shared-nothing and shared-disk approaches. Apart from that, Snowflake is a pure software-as-a-service product, i.e. users don't have to worry about software installation, administration, product upgrades, etc. It also supports ANSI SQL and ACID transactions. Semi-structured data, which is difficult to manage with a traditional data warehouse, is easily managed and maintained using the Snowflake cloud data warehouse. Storage and compute resources are elastic and can be scaled independently and seamlessly. That's a very critical aspect brought by Snowflake. It is highly available and durable. And of course, it is cost-efficient, and Snowflake is working on improving cost efficiency further. Last but not least, Snowflake is secure and comes with end-to-end encryption.

              Because of all these features, Snowflake is a leader in the cloud data warehouse market.




Sunday, November 21, 2021

What is a Virtual Warehouse in Snowflake?

                  Are you looking for details about the virtual warehouse? Are you also interested in knowing what role the virtual warehouse plays in Snowflake? If so, then you have reached the right place. In this article, we will learn about virtual warehouses in Snowflake in detail.


A) What is Virtual Warehouse? 

                  Before learning about the virtual warehouse, we need to know what EC2 is. EC2, or Amazon Elastic Compute Cloud, is a web service that provides secure, resizable compute capacity in the cloud. Now that we know what EC2 is, let's understand what the virtual warehouse is.

                  The virtual warehouse is an important layer in the Snowflake architecture, and it consists of clusters of EC2 instances. The virtual warehouse is the abstraction through which each cluster is presented to its single user.






B) What is a worker node in the virtual warehouse? 

                   As we know, a virtual warehouse consists of a cluster of EC2 instances. Each individual EC2 instance is called a worker node, and it performs the tasks it is given. End users never interact with worker nodes directly. When users perform any action that involves virtual warehouse processing, they do not know how many worker nodes are working on it or how the tasks are being performed in the warehouse.


C) What are the virtual warehouse sizes? 

                   Virtual warehouses come in T-shirt sizes. The currently available VW sizes are:

                        i) X-Small

                       ii) Small

                      iii) Medium

                       iv) Large

                        v) X-Large

                       vi) 2X-Large

                      vii) 3X-Large

                     viii) 4X-Large

                       ix) 5X-Large

                        x) 6X-Large
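To give a feel for what the sizes mean in practice: for standard warehouses, Snowflake's published pricing model doubles the credits consumed per hour with each step up, starting at 1 credit per hour for X-Small. The short sketch below encodes that doubling rule. The rule is stated here as an assumption taken from Snowflake's documentation, not from this article, and the function name is made up.

```python
SIZES = ["X-Small", "Small", "Medium", "Large", "X-Large",
         "2X-Large", "3X-Large", "4X-Large", "5X-Large", "6X-Large"]

def credits_per_hour(size):
    """Assumed doubling rule: X-Small = 1 credit/hour, each next size doubles."""
    return 2 ** SIZES.index(size)

for size in SIZES:
    print(f"{size:>8}: {credits_per_hour(size)} credits/hour")
```

Actual billing also depends on how long the warehouse runs, so treat these numbers as hourly rates only.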






D) Elasticity and Execution Engine 

                   A virtual warehouse comes with two important concepts: elasticity and the execution engine. VWs are compute resources that can be created, resized, and destroyed at any point in time. This feature is called elasticity, and it has no effect on the state of the persistent store or database.
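Creating, resizing, and destroying a warehouse each map to a single SQL statement. The helper functions below are a hypothetical convenience that only builds the statement text; the warehouse name `demo_wh` and the `AUTO_SUSPEND` / `AUTO_RESUME` settings are illustrative choices. The statements could then be run from a worksheet or through any Snowflake client.

```python
def create_warehouse_sql(name, size="XSMALL"):
    """Build a CREATE WAREHOUSE statement (suspends after 60 idle seconds)."""
    return (f"CREATE WAREHOUSE IF NOT EXISTS {name} "
            f"WITH WAREHOUSE_SIZE = '{size}' AUTO_SUSPEND = 60 AUTO_RESUME = TRUE")

def resize_warehouse_sql(name, size):
    """Build the statement that resizes an existing warehouse."""
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{size}'"

def drop_warehouse_sql(name):
    """Build the statement that destroys a warehouse."""
    return f"DROP WAREHOUSE IF EXISTS {name}"

statements = [
    create_warehouse_sql("demo_wh"),
    resize_warehouse_sql("demo_wh", "LARGE"),
    drop_warehouse_sql("demo_wh"),
]
for statement in statements:
    print(statement + ";")
```

None of these statements touches the stored data, which is what makes it safe to resize or drop compute at any time.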

                   The execution engine is a SQL execution engine implemented by Snowflake. It is built on the features below:

                        i) Columnar

                       ii) Vectorized 

                      iii) Push-based 
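These three ideas can be illustrated with a toy pipeline in plain Python. This is a teaching sketch only and says nothing about how Snowflake's engine is actually implemented; the class names, the sample data, and the batch size are made up.

```python
# Columnar: each column is stored as its own list instead of storing whole rows.
columns = {
    "region": ["EMEA", "APAC", "EMEA", "AMER", "EMEA"],
    "amount": [120.0, 80.0, 30.0, 55.0, 10.0],
}
BATCH_SIZE = 2  # Vectorized: operators handle a batch of values per call.

class Sum:
    """Final operator: accumulates every batch pushed into it."""
    def __init__(self):
        self.total = 0.0

    def push(self, batch):
        self.total += sum(batch["amount"])

class Filter:
    """Keeps the rows of a batch that match a region, then pushes them on."""
    def __init__(self, region, downstream):
        self.region = region
        self.downstream = downstream

    def push(self, batch):
        keep = [i for i, r in enumerate(batch["region"]) if r == self.region]
        self.downstream.push(
            {name: [values[i] for i in keep] for name, values in batch.items()}
        )

def scan(table, first_operator):
    """Push-based: the scan drives the pipeline by pushing batches downstream."""
    row_count = len(table["amount"])
    for start in range(0, row_count, BATCH_SIZE):
        first_operator.push(
            {name: values[start:start + BATCH_SIZE] for name, values in table.items()}
        )

total = Sum()
scan(columns, Filter("EMEA", total))
print(total.total)
```

The scan never asks the downstream operators for anything; it simply hands each batch to the next operator, which is the difference from a pull-based (iterator) design.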


                     Learn more about snowflake here  

    


        

Friday, November 19, 2021

What are the Batch Processes in Informatica MDM?

Are you looking for an article that explains various processes in Informatica Master Data Management (MDM)? If so, then you reached the right place. In this article, we will learn about various processes through which records are loaded to the MDM system. Let's start.


Informatica MDM contains the following processes:


Step 1: The land process transfers data from a source system via ETL jobs to landing tables in the MDM ORS (Operational Reference Store).

Step 2: The stage process (Stage Job) reads the data from the landing table, cleanses the data if applicable, and moves the cleansed data into a staging table via mapping in HUB Console.

Step 3: The load process (Load Job) loads data from the staging table into the corresponding base object in MDM ORS.

Step 4: The tokenize process (Tokenization Job) generates match tokens based on match columns that are used subsequently by the match process to identify candidate base object records for matching.

Step 5: The match process (Match Job) compares two records for points of similarity. If sufficient points of similarity are found to indicate that the two records are probably duplicates of each other, then Informatica MDM Hub flags those records for consolidation.





Step 6: The consolidate process (Merge Job) merges duplicate records into a single record after duplicate records have been identified in the match process.

Step 7: The publish (or distribution) process is the main outbound flow for Informatica MDM Hub. The Hub integrates with external systems or DB schemas to share the consolidated (golden) records.
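The order of these steps can be made concrete with a toy Python model. To be clear, this is not Informatica code or its API: the record fields, the function names, and especially the match rule (an exact match on a normalized email) are simplifying assumptions, whereas the real tokenize and match jobs use configurable fuzzy match columns and rules.

```python
# Landing table contents, as delivered by the (hypothetical) ETL jobs.
landing = [
    {"source": "CRM", "name": " john smith ", "email": "JS@EXAMPLE.COM"},
    {"source": "ERP", "name": "John Smith", "email": "js@example.com"},
    {"source": "CRM", "name": "Mary Jones", "email": "mj@example.com"},
]

def stage(landing_rows):
    """Stage: cleanse landed rows (trim whitespace, normalize case)."""
    return [
        {**row, "name": row["name"].strip().title(), "email": row["email"].lower()}
        for row in landing_rows
    ]

def load(staging_rows):
    """Load: insert staged rows into the base object with a row id."""
    return [{"rowid": i + 1, **row} for i, row in enumerate(staging_rows)]

def tokenize(base_object):
    """Tokenize: derive one match token per record (assumed rule: the email)."""
    return {rec["rowid"]: rec["email"] for rec in base_object}

def match(tokens):
    """Match: records sharing a token are flagged as merge candidates."""
    groups = {}
    for rowid, token in tokens.items():
        groups.setdefault(token, []).append(rowid)
    return [ids for ids in groups.values() if len(ids) > 1]

def consolidate(base_object, match_groups):
    """Consolidate: keep the first record of each group, drop its duplicates."""
    duplicates = {rowid for ids in match_groups for rowid in ids[1:]}
    return [rec for rec in base_object if rec["rowid"] not in duplicates]

base_object = load(stage(landing))
match_groups = match(tokenize(base_object))
golden_records = consolidate(base_object, match_groups)
print(match_groups)
print([rec["name"] for rec in golden_records])
```

The point of the sketch is the dependency chain: matching only works on cleansed, loaded, and tokenized data, which is why the jobs run in this order.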


Learn more about Informatica MDM here,





Saturday, November 6, 2021

What is the difference between HTTPS, SSL, and TLS?

                  Are you looking for details about HTTPS protocol? Are you also interested in knowing the differences between HTTPS, SSL, and TLS? If so, then you reached the right place. In this article, we will learn more about HTTPS, SSL, and TLS. Let's start.


A) Understand the speed of the data 

               Data sent over the internet travels very fast. It is carried over channels such as copper wires, optical fiber, and air (wireless), and it would not be much of an exaggeration to say that it travels at close to the speed of light. Even though data moves this fast, it still has to pass through multiple devices during its journey over the network, and that is where criminals target data.






B) What is HTTPS?

                The internet consists of distributed client and server information systems. When we access any application using a computer, a mobile phone, or any other type of device (these devices act as the client), we send a request to the server. The server can accept or reject the request. If the request is accepted, then a connection is created over a specific protocol. The protocol defines the set of rules that both sides follow in order to communicate.

                 HTTP stands for Hypertext Transfer Protocol and is used on the World Wide Web (WWW). This commonly used protocol defines:

                                   1. How data is formatted

                                   2. What type of data is to be transmitted 

                                   3. How the server should respond to the specific command 

                  However, HTTP is not secure, as it does not provide data encryption or authentication. To secure data transmitted over the network, the Hypertext Transfer Protocol Secure (HTTPS) protocol can be used.

                  Though HTTPS is a safer solution for the client-server model, this added security isn't automatic. In order to maintain security standards, we need to obtain (typically purchase) an SSL/TLS certificate from a trusted certificate authority.
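From the client side, the certificate check looks roughly like this in Python's standard library. It is a minimal sketch: the `fetch` helper is a made-up name, and the commented-out URL is only an example that needs network access.

```python
import ssl
import urllib.request

# The default context verifies the server certificate against trusted CAs
# and checks that the certificate matches the host name.
context = ssl.create_default_context()
print(context.verify_mode == ssl.CERT_REQUIRED)
print(context.check_hostname)

def fetch(url):
    """Fetch a URL over HTTPS; raises an error if certificate verification fails."""
    with urllib.request.urlopen(url, context=context, timeout=10) as response:
        return response.status

# Example (needs network access): fetch("https://example.com")
```

If the server presents a certificate that is expired, self-signed, or issued for another host name, the request fails instead of silently continuing, which is the protection a trusted certificate provides.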


C) What is SSL? 

                SSL stands for Secure Sockets Layer. SSL keeps internet connections safe by encrypting and decrypting the data that is exchanged. These connections can be client to client, client to server, or server to server. As SSL is an older protocol, its updated successor, TLS, was released in 1999 and is the one commonly used nowadays.






D) What is TLS?

                 TLS stands for Transport Layer Security. TLS is a cryptographic protocol that provides better privacy, data integrity, and authentication than SSL. It supports stronger, more secure cipher suites and algorithms.

                TLS is commonly used in computer networks, web browsing, instant messaging, email, etc.
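Because SSL and the early TLS versions are considered outdated, clients often refuse them explicitly. A minimal sketch with Python's standard `ssl` module follows; choosing TLS 1.2 as the floor is an assumption for illustration, not something this article prescribes.

```python
import ssl

context = ssl.create_default_context()
# Refuse SSL 3.0, TLS 1.0 and TLS 1.1; accept TLS 1.2 and newer only.
context.minimum_version = ssl.TLSVersion.TLSv1_2

print(context.minimum_version.name)
```

Any connection made with this context fails during the handshake if the server can only speak an older protocol version.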


                     Learn more about Java here 




Saturday, October 30, 2021

How does TLS or SSL decryption work?

              Would you like to know how TLS or SSL decryption works? Would you also be interested in learning about symmetric and asymmetric cryptography? If so, then you reached the right place. In this article, we will explore decryption with TLS or SSL. Let's start.

A) What is TLS or SSL?

               As discussed in the article "What is the difference between HTTPS, SSL, and TLS?", TLS or SSL is a cryptographic protocol for achieving privacy and data integrity over the network.






B) How does TLS/SSL decryption work? 

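               TLS and SSL use both asymmetric and symmetric cryptography: asymmetric cryptography to establish a session key, and symmetric cryptography to protect the data itself. This combination gives TLS/SSL reliable security with high performance.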

                a) Symmetric Cryptography :

                     Symmetric cryptography uses a single secret key to both encrypt and decrypt data. The generated secret key is shared between the sender and the receiver. The secret key should be at least 128 bits in length in order to achieve security.

                 b) Asymmetric Cryptography :

                      Asymmetric cryptography uses a pair of keys, one private and one public, that are mathematically related. It requires more bandwidth and computation than symmetric cryptography. The key length should be a minimum of 1024 bits; 2048 bits or more is the usual recommendation today.

                c) Secure session key : 

                     The secure session key is generated by SSL/TLS using asymmetric cryptography. The secure session key is then used to encrypt and decrypt the data transmitted over the network. The secure session itself is established during the TLS handshake.
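How the three pieces fit together can be shown with a deliberately insecure toy in Python. Everything here is an assumption made for illustration: textbook RSA with tiny primes stands in for the asymmetric step, and an XOR keystream built from SHA-256 stands in for a real symmetric cipher such as AES. Real TLS uses vetted algorithms and full-size keys, and current TLS versions agree on the session key with a Diffie-Hellman style exchange rather than by encrypting it with RSA.

```python
import hashlib
import secrets

# --- Asymmetric part: textbook RSA with tiny primes (insecure, illustration only) ---
p, q = 61, 53
n = p * q                           # public modulus
e = 17                              # public exponent
d = pow(e, -1, (p - 1) * (q - 1))   # private exponent (needs Python 3.8+)

def rsa_encrypt(data, public_exp, modulus):
    """Encrypt each byte separately with the public key."""
    return [pow(byte, public_exp, modulus) for byte in data]

def rsa_decrypt(numbers, private_exp, modulus):
    """Recover the original bytes with the private key."""
    return bytes(pow(num, private_exp, modulus) for num in numbers)

# --- Symmetric part: XOR with a SHA-256 keystream (toy cipher, not AES) ---
def keystream(key, length):
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]

def xor_cipher(key, data):
    """Encrypts or decrypts: applying it twice with the same key restores the data."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))

# Client creates a 128-bit session key and sends it encrypted with the server's public key.
session_key = secrets.token_bytes(16)
wrapped = rsa_encrypt(session_key, e, n)
# Server recovers the session key with its private key.
server_key = rsa_decrypt(wrapped, d, n)

# From here on both sides use the fast symmetric cipher with the shared session key.
ciphertext = xor_cipher(session_key, b"hello over TLS")
plaintext = xor_cipher(server_key, ciphertext)
print(server_key == session_key, plaintext)
```

The slow asymmetric operation is used once, on 16 bytes, and everything after that uses the cheap symmetric operation. That split is the reason TLS can be both secure and fast.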






C) What is the TLS handshake?

               The TLS handshake is the process by which the client and server set up their communication. It achieves the following:

               1. Acknowledge one another 

               2. Verify each other's authenticity 

               3. Designate encryption algorithms 

               4. Agree on session keys. 
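Step 3, designating the encryption algorithms, comes down to finding a cipher suite that both sides support. The function below is a simplified sketch of that selection, assuming the server picks by its own preference order; the function name is made up, while the suite names are standard TLS 1.3 cipher suite identifiers.

```python
def negotiate_cipher_suite(client_suites, server_preference):
    """Return the first suite in the server's preference order that the client offers."""
    offered = set(client_suites)
    for suite in server_preference:
        if suite in offered:
            return suite
    raise ValueError("handshake failure: no cipher suite in common")

client_hello = ["TLS_CHACHA20_POLY1305_SHA256", "TLS_AES_128_GCM_SHA256"]
server_preference = ["TLS_AES_256_GCM_SHA384", "TLS_AES_128_GCM_SHA256"]

chosen = negotiate_cipher_suite(client_hello, server_preference)
print(chosen)
```

If the two lists have nothing in common, the handshake fails and no connection is established.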




                

Friday, October 22, 2021

What are the differences between the multimerge and merge APIs in Informatica MDM?

                Are you interested in knowing what the multimerge and merge APIs are used for? Would you also like to know the difference between the merge and multimerge APIs? If so, then you reached the right place. In this article, we will learn about these APIs in detail.


A) What is Multimerge API? 

                 The Multimerge API is used to merge a list of records together. Multimerge is the generic form of the merge API.






B) What is Merge API? 

                The merge API is used to merge two base object records that are identified as the same base object record.


C) What are the differences between Multimerge and Merge API? 

          1) Number of records to merge : 

              a) Merge API allows only two records to merge 

              b) Multimerge API allows more than two records to merge.

         2) Parameters to request : 

             a) Merge API accepts sourceRecord key and targetRecord key as parameters in the input

             b) Multimerge API accepts multiple record key lists as parameters in the request.





         3) Consolidated records : 

             a) Merge API allows records to be merged irrespective of the value of the consolidation indicator.

             b) Multimerge API allows merging of unconsolidated records only, i.e. consolidation indicator != 1.

         4) Final value for consolidation indicator : 

            a) The final value of the consolidation indicator after a merge API operation is 1, i.e. the consolidated state.

            b) Multimerge API does not change the consolidation indicator value for surviving records.

        5) Surviving Record : 

             a) The surviving record is specified in the merge API with targetRecordKey as the parameter.

             b) For Multimerge API, the surviving record is determined based on the survivorship rules of the XREFs that are participating in the merge process.
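The five differences above can be summarized in a toy Python model. This is not the Informatica API: the function signatures, the record layout, and the `trust` field are hypothetical, and the "highest trust wins" rule only stands in for whatever survivorship rules are actually configured.

```python
def merge(records, source_key, target_key):
    """Two-record merge: the target survives and ends up consolidated (indicator 1)."""
    records.pop(source_key)
    target = records[target_key]
    target["merged_from"] = target.get("merged_from", []) + [source_key]
    target["consolidation_ind"] = 1
    return target_key

def multimerge(records, keys, pick_survivor):
    """List merge: accepts unconsolidated records only, picks the survivor by a
    rule, and leaves the survivor's consolidation indicator unchanged."""
    if any(records[key]["consolidation_ind"] == 1 for key in keys):
        raise ValueError("multimerge accepts unconsolidated records only")
    survivor = pick_survivor(records, keys)
    for key in keys:
        if key != survivor:
            records.pop(key)
            records[survivor].setdefault("merged_from", []).append(key)
    return survivor

def by_trust(records, keys):
    """Hypothetical survivorship rule: the record with the highest trust wins."""
    return max(keys, key=lambda key: records[key]["trust"])

# Indicator 1 means consolidated; 4 is used here as an arbitrary non-1 value.
records = {
    "A": {"consolidation_ind": 4, "trust": 70},
    "B": {"consolidation_ind": 4, "trust": 90},
    "C": {"consolidation_ind": 4, "trust": 80},
    "D": {"consolidation_ind": 1, "trust": 50},
    "E": {"consolidation_ind": 4, "trust": 60},
}

multi_survivor = multimerge(records, ["A", "B", "C"], by_trust)
merge_survivor = merge(records, source_key="D", target_key="E")
print(multi_survivor, records[multi_survivor]["consolidation_ind"])
print(merge_survivor, records[merge_survivor]["consolidation_ind"])
```

In the merge call the caller names the survivor and it becomes consolidated; in the multimerge call the survivor falls out of the rule and keeps its previous indicator.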


                 Learn more about Informatica MDM survivorship rules here 



   

Saturday, October 16, 2021

What is Time Travel in Snowflake?

                        Are you looking for details about Time Travel in Snowflake? Are you also interested in knowing what tasks we can perform using the Time Travel feature? If so, then you reached the right place. In this article, we will learn about one of the most powerful features of Snowflake.


A) What is Time Travel in Snowflake?

                        Time Travel is the Snowflake feature that lets us access historical data at any point within a specified period. We can access not only data that has been changed but also data that has been deleted.


B) What are the tasks that can be performed using Time travel in Snowflake?

                      The tasks below can be effectively performed by using Time Travel Feature 

             1.  Backing up the data from key points in the past.

             2. Duplicating the data from key points in the past.

            3. Restoring tables, schemas, and databases if those are accidentally deleted.






C) What is Data Protection Lifecycle? 

                  In Snowflake, there are three phases in the data protection lifecycle.

           1. Current Data Storage: On the current data set we can perform standard operations such as DML, DDL, etc.

           2. Time Travel Retention: The normal retention period is 1 to 90 days. Here is the list of operations allowed with Time Travel.

          a) SELECT ... AT | BEFORE ...

          b) CLONE ... AT | BEFORE ...

          c) UNDROP ...

           3. Fail-safe: This is the last phase in the data protection lifecycle. Recovery in this phase can only be performed by Snowflake; no user operations are allowed.
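The three Time Travel operations listed in phase 2 look like this as concrete SQL. The Python helpers below are a hypothetical convenience that only builds the statement text; the table names and the query ID placeholder are made up, and the statements could be run from a worksheet or through any Snowflake client.

```python
def select_at_offset(table, seconds_ago):
    """Query the table as it was `seconds_ago` seconds in the past."""
    return f"SELECT * FROM {table} AT(OFFSET => -{seconds_ago})"

def select_before_statement(table, query_id):
    """Query the table as it was just before the given statement ran."""
    return f"SELECT * FROM {table} BEFORE(STATEMENT => '{query_id}')"

def clone_at_timestamp(new_table, table, timestamp):
    """Clone the table as it was at a point in time."""
    return (f"CREATE TABLE {new_table} CLONE {table} "
            f"AT(TIMESTAMP => '{timestamp}'::timestamp_tz)")

def undrop_table(table):
    """Restore a dropped table that is still within its retention period."""
    return f"UNDROP TABLE {table}"

print(select_at_offset("orders", 300) + ";")
print(select_before_statement("orders", "<query_id>") + ";")
print(clone_at_timestamp("orders_restored", "orders", "2021-10-15 09:00:00 +0000") + ";")
print(undrop_table("orders") + ";")
```

All four statements only work while the data is still inside the Time Travel retention period; once it moves to Fail-safe, users can no longer reach it.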


D) Data  Retention Period in snowflake 

                In Snowflake, the data retention period is a key component of Time Travel. It specifies the number of days for which historical data is preserved. Snowflake preserves the state of the data before it is updated, deleted, or dropped.





               For Snowflake Standard Edition, the data retention period is one day.

              For Snowflake Enterprise Edition, the data retention period can be set between 0 and 90 days.
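The retention period is set per object with the `DATA_RETENTION_TIME_IN_DAYS` parameter. The sketch below builds the corresponding `ALTER TABLE` statement after checking the edition limits described above. The helper name, the edition labels, and the table name are made up for illustration, and treating 0 as a valid value for Standard Edition (which disables Time Travel for the object) is an assumption based on Snowflake's documentation.

```python
MAX_RETENTION_DAYS = {"standard": 1, "enterprise": 90}

def set_retention_sql(table, days, edition):
    """Build an ALTER TABLE statement after checking the edition's limit."""
    limit = MAX_RETENTION_DAYS[edition]
    if not 0 <= days <= limit:
        raise ValueError(f"{edition} edition allows 0 to {limit} days, got {days}")
    return f"ALTER TABLE {table} SET DATA_RETENTION_TIME_IN_DAYS = {days}"

print(set_retention_sql("orders", 30, "enterprise") + ";")
```

Asking for 30 days on Standard Edition raises an error in this sketch, mirroring the one-day limit.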


            Learn more about Snowflake here -



           
