Cloudback Docs

© 2025 Cloudback


Data Deduplication

Last updated 5 months ago

Data deduplication saves cloud storage space and reduces operational costs by not storing identical backup archives more than once. Cloudback offers a simple but efficient data deduplication technique.

How deduplication works

Our backup archives are deterministic: if nothing has changed in a GitHub repository, consecutive backups produce the same archive before encryption. Once the archive is encrypted, AES encryption makes the result non-deterministic, so encrypted archives cannot be compared directly.
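
To illustrate what "deterministic" means here, the following minimal sketch packs the same files twice and compares checksums. It is not Cloudback's archiver: the tar format, the SHA-256 checksum, and the names `build_deterministic_archive` and `checksum` are assumptions made for the example.

```python
import hashlib
import io
import tarfile


def build_deterministic_archive(files: dict[str, bytes]) -> bytes:
    """Pack files into an uncompressed tar with fixed metadata (illustrative only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(files):  # stable ordering, independent of input order
            info = tarfile.TarInfo(name=name)
            info.size = len(files[name])
            info.mtime = 0  # fixed timestamp so repeated runs do not differ
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


def checksum(data: bytes) -> str:
    """Hex SHA-256 digest of the unencrypted archive (hash choice is an assumption)."""
    return hashlib.sha256(data).hexdigest()


repo = {"README.md": b"hello\n", "src/main.py": b"print('hi')\n"}
same_repo_other_order = dict(reversed(list(repo.items())))
changed_repo = {**repo, "README.md": b"hello, world\n"}

unchanged = checksum(build_deterministic_archive(repo)) == checksum(
    build_deterministic_archive(same_repo_other_order)
)
changed = checksum(build_deterministic_archive(repo)) != checksum(
    build_deterministic_archive(changed_repo)
)
```

Fixing the file order and timestamps is what keeps the bytes stable. An encrypted copy would typically differ on every run, which is why the comparison has to happen before encryption.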

Before encrypting a new backup, Cloudback compares the checksum of its deterministic archive with the checksum of the previous deterministic archive. If the checksums match, Cloudback doesn’t upload a new archive to storage but instead stores a link to the previous archive. The retention of the previous archive is extended accordingly: it is not deleted as long as valid backups still link to it.
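
The flow described above can be sketched roughly as follows. This is a simplified model, not Cloudback's implementation: the `Backup` record, the `record_backup` and `can_delete_archive` helpers, the SHA-256 checksum, and the `archive-N` keys are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class Backup:
    backup_id: int
    checksum: str
    archive_key: Optional[str] = None  # set when this backup owns an uploaded archive
    linked_to: Optional[int] = None    # set when this backup reuses an earlier archive


def record_backup(history: list[Backup], plain_archive: bytes) -> Backup:
    """Compare the unencrypted archive with the previous one; upload or link."""
    digest = hashlib.sha256(plain_archive).hexdigest()
    backup_id = len(history) + 1
    previous = history[-1] if history else None
    if previous is not None and previous.checksum == digest:
        # Same content: point at the backup that owns the stored archive.
        owner = previous.linked_to or previous.backup_id
        backup = Backup(backup_id, digest, linked_to=owner)
    else:
        # New content: this is where encryption and upload would happen.
        backup = Backup(backup_id, digest, archive_key=f"archive-{backup_id}")
    history.append(backup)
    return backup


def can_delete_archive(history: list[Backup], owner_id: int, valid_ids: set[int]) -> bool:
    """An archive may be removed only when no valid backup owns or links to it."""
    users = {
        b.backup_id
        for b in history
        if b.backup_id == owner_id or b.linked_to == owner_id
    }
    return not (users & valid_ids)


history: list[Backup] = []
first = record_backup(history, b"repo state A")
second = record_backup(history, b"repo state A")  # unchanged, becomes a link
third = record_backup(history, b"repo state B")   # changed, new archive
```

In this model, the first archive stays in storage for as long as the second backup is still within its retention window, even if the first backup itself has expired.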

How to enable deduplication

The feature was released on 06 May 2022. Please note that:

  • Deduplication is disabled by default for all customer managed storages created before 06 May 2022; you must enable it manually if you want to use this feature.

  • Deduplication is enabled by default for all customer managed storages created after 06 May 2022; you must manually disable it if you don’t want to use this feature.

Data deduplication is a customer managed storage setting and is configured separately for each storage. To change the deduplication setting for a storage, use the Deduplication Type combo box when creating a storage (New Storage) or editing one (Edit Storage).

The following deduplication types are available:

  • Only archive if changes have occurred - no duplicates: Cloudback will not store a new archive if the checksum of a new backup matches the checksum of a previous backup. This is the default deduplication type for storages created after 06 May 2022.

  • Always archive - duplicates allowed: Cloudback will always store a new archive even if the checksum of a new backup matches the checksum of a previous backup.

When deduplication occurs

Deduplication occurs when certain conditions are met:

  • Deduplication is enabled for the storage, either when the storage was created or by editing it afterwards

  • The previous backup of a repository is stored in the same cloud storage

  • The checksum of a previous backup matches the checksum of a new backup
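
Taken together, the three conditions amount to a single check. The sketch below assumes a hypothetical `StoredBackup` record and `should_deduplicate` function; neither is part of any Cloudback API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StoredBackup:
    storage_id: str  # storage that holds the previous backup (hypothetical field)
    checksum: str    # checksum of the previous unencrypted archive


def should_deduplicate(
    dedup_enabled: bool,
    target_storage_id: str,
    new_checksum: str,
    previous: Optional[StoredBackup],
) -> bool:
    """Return True only when all three documented conditions hold."""
    if not dedup_enabled:
        return False  # condition 1: the storage must have deduplication enabled
    if previous is None or previous.storage_id != target_storage_id:
        return False  # condition 2: the previous backup must be in the same storage
    return previous.checksum == new_checksum  # condition 3: checksums must match


previous = StoredBackup(storage_id="s3-main", checksum="abc123")
same = should_deduplicate(True, "s3-main", "abc123", previous)
other_storage = should_deduplicate(True, "azure-archive", "abc123", previous)
disabled = should_deduplicate(False, "s3-main", "abc123", previous)
```

A first backup has no previous backup to compare against, so in this sketch it is always uploaded.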

Backup deduplication status

The deduplication status of a backup is displayed in the Backup details window as the Deduplicated field:

  • Yes means that the backup is deduplicated

  • No means that the backup is not deduplicated

You can open the Backup details window by clicking the Information icon shown next to a backup, either in the Backups tab on the Repository details page or on the repository card on the Dashboard page.

Backup deduplication statistics

The deduplication statistics are displayed in the Deduplication over period section of the Repository details page. The statistics are displayed for specific periods: Last Week, Last Month, Last 3 Months, Last 6 Months, and Last Year.

The following statistics are displayed for the selected period:

  • Backups processed - the total number of backups

  • Backups deduplicated - the number of backups that were deduplicated

  • Backups size - the total size of backups uploaded to storage

  • Space saved - the total size of saved space in storage
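
As a rough illustration of how these four figures relate, the sketch below derives them from a list of backup records. The `BackupRecord` type and `summarize` function are hypothetical, and the example assumes that Space saved equals the combined size of the archives that were not uploaded because they were deduplicated. The dashboard may compute it differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupRecord:
    size_bytes: int     # size of the archive this backup represents
    deduplicated: bool  # True when the backup links to an earlier archive


def summarize(records: list[BackupRecord]) -> dict[str, int]:
    """Compute the four statistics for the records in a selected period."""
    uploaded = [r for r in records if not r.deduplicated]
    skipped = [r for r in records if r.deduplicated]
    return {
        "backups_processed": len(records),
        "backups_deduplicated": len(skipped),
        "backups_size": sum(r.size_bytes for r in uploaded),  # actually uploaded
        "space_saved": sum(r.size_bytes for r in skipped),    # assumed definition
    }


last_week = [
    BackupRecord(size_bytes=500, deduplicated=False),
    BackupRecord(size_bytes=500, deduplicated=True),
    BackupRecord(size_bytes=500, deduplicated=True),
    BackupRecord(size_bytes=520, deduplicated=False),
]
stats = summarize(last_week)
```

With these sample records, two of the four backups are deduplicated, so 1020 bytes are uploaded and 1000 bytes are saved.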

Statistics charts are displayed for the selected period:

  • The donut chart shows the number of deduplicated and non-deduplicated backups

  • The bar chart shows the amount of saved space in the storage:

    • The green bars represent the amount of saved space

    • The grey bars represent the amount of space used by non-deduplicated backups

Learn More

Cloudback Managed Storages
Customer Managed Storages
Backup Schedules
Data deduplication type
Backup deduplication status
Repository deduplication stats