Decrease the runtime of large DC jobs

Last published at: February 11th, 2022
When you run a Duplicate Check (DC) job, Duplicate Check searches your database for duplicate records. The runtime of the job depends on the size of your database, the scenarios you apply, the number of records returned in an index search, and the status of the Salesforce servers. If you want to decrease the runtime of your DC job, read on for some useful tips and tricks.

1. Database size

The more records in your database, the more records Duplicate Check needs to compare every time you run a DC job. Pretty obvious. If you only need to search part of your database, you can apply a filter so the job runs on a subset of your data. Ask yourself: is it really necessary to run this job on my entire database?

Learn more about the DC Job filter in this knowledge item.
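
To make the effect of a filter concrete, here is a minimal Python sketch. It is not how Duplicate Check implements the DC Job filter; the record layout, the field names and the `apply_filter` helper are all assumptions made up for illustration. It only shows that a filter shrinks the set of records that enter the job.

```python
from datetime import date

# Hypothetical Lead records; the field names are illustrative only.
leads = [
    {"name": "Ann Lee", "country": "NL", "created": date(2021, 11, 3)},
    {"name": "Bob Ray", "country": "US", "created": date(2019, 5, 20)},
    {"name": "Ann Lee", "country": "NL", "created": date(2022, 1, 15)},
    {"name": "Cy Dunn", "country": "NL", "created": date(2018, 2, 1)},
]


def apply_filter(records, country, created_after):
    """Keep only the records that match the filter (hypothetical helper)."""
    return [
        r for r in records
        if r["country"] == country and r["created"] > created_after
    ]


# Only recent Dutch leads enter the job instead of the whole database.
subset = apply_filter(leads, country="NL", created_after=date(2021, 1, 1))
print(f"{len(leads)} records in total, {len(subset)} in the filtered job")
```

Every record that the filter leaves out is a record the job never has to compare.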

2. Scenarios

Duplicate Check identifies duplicate records based on the applied scenario. A scenario defines which fields records are compared on to identify duplicates. Our default scenario for Leads defines 5 fields, so it compares records on those 5 fields. If you apply a more extensive scenario, or multiple scenarios, the runtime of your job increases as well.

These tests were executed in a test environment with dummy data (1275 Lead records, 20% duplicates). No rights can be derived from this information.
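
The sketch below illustrates why a larger scenario costs more time: every extra field means one more field comparison per pair of records. This is a simplified model, not Duplicate Check's matching logic. The field names, the `scenario_score` helper and the use of Python's `difflib.SequenceMatcher` as the matching method are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def field_similarity(a, b):
    """Similarity between two field values, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def scenario_score(rec_a, rec_b, fields):
    """Average field similarity, plus the number of field comparisons made."""
    comparisons = 0
    total = 0.0
    for field in fields:
        total += field_similarity(rec_a.get(field, ""), rec_b.get(field, ""))
        comparisons += 1
    return total / len(fields), comparisons


# Hypothetical scenarios; these are not the actual default Lead fields.
small_scenario = ["first_name", "last_name", "company", "email", "phone"]
large_scenario = small_scenario + ["street", "city", "country"]

lead = {
    "first_name": "Ann", "last_name": "Lee", "company": "Acme",
    "email": "ann@acme.example", "phone": "555-0100",
    "street": "Main St 1", "city": "Utrecht", "country": "NL",
}

score_small, work_small = scenario_score(lead, lead, small_scenario)
score_large, work_large = scenario_score(lead, lead, large_scenario)
print(work_small, "field comparisons versus", work_large)
```

The work per record pair grows with the number of fields, and applying multiple scenarios repeats that work once per scenario.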

3. Number of records returned in index search

When you run a DC job, Duplicate Check retrieves a number of potential duplicate records for every record in your Object as part of the duplicate detection process. This index search runs in the background and is not visible to the user. Out of those potential duplicates, Duplicate Check marks the records that reach the threshold level as duplicates. The number of records returned by the index search is defined in the DC Setup. Generally speaking, the more records you return in the index search, the better the duplicate results. However, returning more records will also extend the runtime of your DC job.

These tests were executed in a test environment with dummy data (1275 Lead records, 20% duplicates). No rights can be derived from this information.
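
The trade-off between result quality and runtime can be sketched in Python as follows. This is an illustrative model only, not Duplicate Check's index search. The prefix-based candidate lookup, the `find_duplicates` helper, the `max_candidates` parameter and the threshold value are all assumptions. The idea it shows is the two-step process described above: first retrieve a limited number of candidates cheaply, then score only those candidates against the threshold.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity between two names, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_duplicates(record, others, max_candidates, threshold):
    """Return (duplicates, comparisons) for one record.

    Step 1 is a cheap stand-in for an index search: take at most
    max_candidates records that share the first three characters.
    Step 2 scores only those candidates against the threshold.
    """
    key = record[:3].lower()
    candidates = [o for o in others if o[:3].lower() == key][:max_candidates]
    duplicates = [c for c in candidates if similarity(record, c) >= threshold]
    return duplicates, len(candidates)


others = ["Jon Snow", "Jon Smyth", "Jon Smithe", "Jane Doe"]

few, work_few = find_duplicates("Jon Smith", others, max_candidates=1, threshold=0.85)
many, work_many = find_duplicates("Jon Smith", others, max_candidates=3, threshold=0.85)
print(f"1 candidate: {work_few} comparison(s), duplicates found: {few}")
print(f"3 candidates: {work_many} comparison(s), duplicates found: {many}")
```

With a single candidate the sketch does less work but misses both near-matches; with three candidates it finds them at the cost of more comparisons per record.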

4. Duplicate Check Local

Duplicate Check is a native Force.com application. That's pretty awesome since we're the only deduplication app that is native! The advantage of running entirely on the Salesforce cloud is that we can analyze your data right where it is. The 'downside' is the fact that we depend on the Salesforce servers. Even though the uptime is great, the server speed is a bit variable. If the Salesforce servers have a bad day, Duplicate Check is affected by that as well. Running a job could take a little longer than usual, but it will always deliver results. So, we created Duplicate Check Local, which lets you process your data on a local machine and return the results to the Salesforce cloud.

Learn more about Duplicate Check Local in this knowledge item.