DronaBlog

Thursday, July 9, 2020

Best Practices for Elastic Search in Informatica MDM

Elastic Search a search engine that is based on the Lucene library is used in the Informatica MDM in order to achieve free text searches like google as well as a fuzzy search like match engine search. In this article, we will understand what are the best practices which we need to follow in order to implement Elastic Search using the Informatica MDM solution successfully.



Introduction

It is vital to follow best practices while integrating Elastic Search with Informatica MDM. Some minor configuration may lead to expensive performance cost. The best practices provided here helps not only to achieve better performance but also for better search results.

Elastic Search Best Practices

Here are the details about the Best Practices

1. Indexing Job Execution
If we enable searchable properties for Base Object tables including lookup table then we need to run indexing job for lookup table first then followed by indexing job on remaining Base Object table. 

2. Indexing Job execution for all tables
If we have configured Searchable property for parent and child tables e.g. Party table, Party Phone table, etc. Then we need to run an indexing job for all the tables. First, run the indexing job for Party table and then run jobs for child tables

3. Facets configuration
Facets are used for pre-emptive grouping of the records. We need to use a limited number of facet fields as it has an advance impact on the performance of search functionality. We also have to make sure the fields for which we need to configure facets are having low entropy. Low entropy fields have a low set of unique values.

4. Unused Business Entities
If there are unused Business Entities with searchable properties then delete those as it will cause performance issues for indexing and load jobs.

5. Index Auto commit property
We need to increase the value of the auto-commit property and keep it optimum based on your environment configuration. The property es.index.refresh.interval can be used to set it

6. Indexing jobs in parallel
We should try to avoid running indexing jobs in parallel as that may cause resource exhaustion. 

7. Running load jobs in parallel
If we have configured searchable on multiple tables such as Party and Address tables then do not run load jobs for these tables in parallel. This is because during load job indexing job get executed and may lead to resource exhaustion scenario and job will fail.

8. Deleting indexes
The CleanTable API will not delete the indexes, we need to manually delete it if required. However, in case you still would like to delete the indexes then we need to use the curl command to execute Elastic Search APIs to delete those. As of now, there is no Informatica API to handle this use case.

9. Limiting the number of searchable fields for Business Entities
We have limitations on how many searchable fields we should use for the Elastic Search document. By default 50 number of nested fields are allowed in Elastic Search. Apart from it, there is a limit on the amount of data is required for Elastic Search REST calls. The limit is 104857600. So make sure less number of searchable columns are configured for the Business Entities.



Learn more about Informatica MDM here -




No comments:

Post a Comment

Please do not enter any spam link in the comment box.

What is CRM system?

  In the digital age, where customer-centricity reigns supreme, businesses are increasingly turning to advanced technologies to manage and n...