# Data Deduplication

With data deduplication, you can save a significant amount of cloud storage space and reduce operational costs. Cloudback offers a simple but efficient technique for data deduplication.

## How deduplication works

Cloudback backup archives are deterministic before encryption: if nothing has changed in a repository or workspace, Cloudback creates the same archive. Once the archive is encrypted, AES encryption makes the result non-deterministic, so the comparison has to happen before encryption.
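To illustrate what "deterministic" means here, the following is a minimal sketch, not Cloudback's actual implementation. It assumes an uncompressed tar archive and a SHA-256 checksum; the archive format, the hash algorithm, and the names `build_deterministic_archive` and `checksum` are hypothetical and chosen only for illustration.

```python
import hashlib
import io
import tarfile
from typing import Mapping


def build_deterministic_archive(files: Mapping[str, bytes]) -> bytes:
    """Pack files into a tar whose bytes depend only on file names and contents.

    Hypothetical helper: tar is an assumed format, not Cloudback's documented one.
    """
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        # Sort names so the insertion order of the mapping does not matter.
        for name in sorted(files):
            content = files[name]
            info = tarfile.TarInfo(name=name)
            info.size = len(content)
            # Fix metadata that would otherwise vary between runs.
            info.mtime = 0
            info.uid = 0
            info.gid = 0
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def checksum(archive_bytes: bytes) -> str:
    """Return a hex digest of the unencrypted archive (SHA-256 is an assumption)."""
    return hashlib.sha256(archive_bytes).hexdigest()
```

With normalized ordering and metadata, two runs over unchanged content produce identical bytes and therefore identical checksums, which is the property the comparison relies on.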

Cloudback compares the checksum of a new deterministic backup with the checksum of the previous deterministic backup before encrypting it. If the archives match, Cloudback doesn’t upload a new archive to storage, but instead stores a link to the previous archive. The retention period of the previous archive is extended: the archive is not deleted as long as valid backups still link to it.
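The compare-then-link flow described above can be sketched as follows. This is an illustrative in-memory model under stated assumptions, not Cloudback's code: `BackupStore`, `BackupRecord`, `save_backup`, and `archive_in_use` are hypothetical names, SHA-256 is an assumed checksum, and the encryption step is only marked by a comment.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass
class BackupRecord:
    backup_id: int
    checksum: str       # checksum of the archive before encryption
    archive_key: str    # which stored archive this backup points to
    deduplicated: bool


class BackupStore:
    """Hypothetical stand-in for one cloud storage plus its backup index."""

    def __init__(self) -> None:
        self.archives: Dict[str, bytes] = {}
        self.backups: List[BackupRecord] = []

    def save_backup(self, plain_archive: bytes) -> BackupRecord:
        digest = hashlib.sha256(plain_archive).hexdigest()
        previous = self.backups[-1] if self.backups else None
        backup_id = len(self.backups) + 1

        if previous is not None and previous.checksum == digest:
            # Nothing changed: store a link to the previous archive, upload nothing.
            record = BackupRecord(backup_id, digest, previous.archive_key, True)
        else:
            key = f"archive-{backup_id}"
            # A real system would encrypt the archive here before uploading it.
            self.archives[key] = plain_archive
            record = BackupRecord(backup_id, digest, key, False)

        self.backups.append(record)
        return record

    def archive_in_use(self, archive_key: str, expired_ids: Set[int]) -> bool:
        """An archive is kept while any non-expired backup still links to it."""
        return any(
            b.archive_key == archive_key and b.backup_id not in expired_ids
            for b in self.backups
        )
```

In this model, saving the same bytes twice yields one stored archive and two backup records, and the archive only becomes deletable once every backup that links to it has expired.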

## How to enable deduplication

Deduplication is enabled by default for newly created customer managed storages. If you have older storages (created before May 2022), you may need to enable it manually.

Data deduplication is a customer managed storage setting, and it’s configured separately for each storage. To change the deduplication setting for a storage, use the `Deduplication Type` combo box on the `New Storage` and `Edit Storage` pages:

![Data deduplication type](/files/ax9rgr95GvtPQsFoNxit)

The following deduplication types are available:

* **Only archive if changes have occurred - no duplicates**: Cloudback will not store a new archive if the checksum of a new backup matches the checksum of a previous backup. This is the default deduplication type.
* **Always archive - duplicates allowed**: Cloudback will always store a new archive even if the checksum of a new backup matches the checksum of a previous backup.

## When deduplication occurs

Deduplication occurs when all of the following conditions are met:

* Deduplication is enabled for the storage, either when the storage was created or by editing it afterwards
* The previous backup of a repository is stored in the same cloud storage
* The checksum of a previous backup matches the checksum of a new backup
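The conditions above can be expressed as a single predicate. This is a sketch for illustration only: `StoredBackup` and `should_deduplicate` are hypothetical names, and the way Cloudback identifies storages or represents checksums is assumed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StoredBackup:
    storage_id: str   # hypothetical identifier of the cloud storage
    checksum: str     # checksum of the unencrypted archive


def should_deduplicate(
    dedup_enabled: bool,
    target_storage_id: str,
    new_checksum: str,
    previous: Optional[StoredBackup],
) -> bool:
    """Return True only when every deduplication condition holds."""
    if not dedup_enabled:
        return False  # the storage is set to always archive
    if previous is None or previous.storage_id != target_storage_id:
        return False  # no previous backup in the same cloud storage
    return previous.checksum == new_checksum
```

If any one condition fails, for example because the previous backup lives in a different storage, the new archive is uploaded as usual.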

## Backup deduplication status

The deduplication status of a backup is displayed in the `Backup details` window as the `Deduplicated` field:

* `Yes` means that the backup is deduplicated
* `No` means that the backup is not deduplicated

![Backup deduplication status](/files/3qnJWIzknv9wyFAAEsur)

You can open the `Backup details` window by clicking the `Information` icon shown in the list of backups on the `Backups` tab of the `Repository details` page.

## Backup deduplication statistics

The deduplication statistics are displayed in the `Deduplication over period` section of the `Repository details` page. The statistics are displayed for specific periods: `Last Week`, `Last Month`, `Last 3 Months`, `Last 6 Months`, and `Last Year`.

![Repository deduplication stats](/files/FgLebQyJo1dIbtkg6tf4)

The following statistics are displayed for the selected period:

* `Backups processed` - the total number of backups
* `Backups deduplicated` - the number of backups that were deduplicated
* `Backups size` - the total size of backups uploaded to storage
* `Space saved` - the total storage space saved by deduplication

The following charts are displayed for the selected period:

* **Donut chart** shows the number of deduplicated and non-deduplicated backups
* **Bar chart** shows the deduplication savings over the selected period:
  * Grey bars represent daily deduplication savings (deduplicated size per day)
  * Green bars represent the accumulated total deduplication savings over time
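As a rough illustration of how such figures could be derived from a list of backups, here is a minimal sketch. It assumes that the size of each deduplicated backup counts as saved space and that only non-deduplicated backups count toward uploaded size; the `Backup` record and both function names are hypothetical, not part of Cloudback.

```python
from dataclasses import dataclass
from datetime import date
from itertools import accumulate
from typing import Dict, List, Tuple


@dataclass
class Backup:
    day: date
    size_bytes: int
    deduplicated: bool


def summarize(backups: List[Backup]) -> Dict[str, int]:
    """Compute the four headline statistics for a period."""
    return {
        "backups_processed": len(backups),
        "backups_deduplicated": sum(1 for b in backups if b.deduplicated),
        # Assumption: only non-deduplicated backups are uploaded to storage.
        "backups_size": sum(b.size_bytes for b in backups if not b.deduplicated),
        # Assumption: a deduplicated backup saves its full size.
        "space_saved": sum(b.size_bytes for b in backups if b.deduplicated),
    }


def savings_series(backups: List[Backup]) -> List[Tuple[date, int, int]]:
    """Return (day, daily savings, accumulated savings) rows for the bar chart."""
    daily: Dict[date, int] = {}
    for b in backups:
        if b.deduplicated:
            daily[b.day] = daily.get(b.day, 0) + b.size_bytes
    days = sorted(daily)
    daily_values = [daily[d] for d in days]
    cumulative = list(accumulate(daily_values))
    return list(zip(days, daily_values, cumulative))
```

The daily values correspond to the grey bars and the running totals to the green bars; days without any deduplicated backup are simply absent from the series in this sketch.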

## Learn More

* [Cloudback Managed Storages](/storage-configuration/cloudback-managed-storages.md)
* [Customer Managed Storages](/storage-configuration/customer-managed-storages.md)
* [Backup Schedules](/managing-backups/setting-backup-schedules.md)

