Trove

Checksums

Data integrity verification with SHA-256, BLAKE3, and XXHash checksum algorithms.

Trove computes checksums on every write and verifies them on read, ensuring data integrity across storage backends. The checksum algorithm is configurable per Trove instance.

Algorithms

Trove supports three hash algorithms, each with different trade-offs:

Algorithm | Constant | Cryptographic | Speed | Use Case
--- | --- | --- | --- | ---
SHA-256 | trove.SHA256 | Yes | Baseline | Default, compliance-friendly
BLAKE3 | trove.Blake3 | Yes | ~3x faster | High-throughput workloads
XXHash | trove.XXHash | No | ~10x faster | Internal dedup, non-security contexts

SHA-256 is the default. Switch to BLAKE3 for better throughput without sacrificing cryptographic properties, or XXHash when security is not a concern and raw speed matters.

Configuration

Set the algorithm when creating a Trove instance:

t, err := trove.Open(drv,
    trove.WithChecksumAlgorithm(trove.Blake3),
)

The default configuration uses SHA-256:

cfg := trove.DefaultConfig()
// cfg.ChecksumAlgorithm == trove.SHA256

How It Works

Automatic Computation on Put

When Put is called, Trove's internal checksum layer reads the data stream, computes the hash, and stores the result alongside the object metadata.

// Put automatically computes a checksum
info, err := t.Put(ctx, "data", "report.csv", file)
// info.ETag contains the computed hash

The Object model stores both the algorithm and the hex-encoded digest:

type Object struct {
    // ...
    ChecksumAlg  string `json:"checksum_alg"`   // e.g., "sha256"
    ChecksumVal  string `json:"checksum_val"`    // hex-encoded digest
}

Verification on Get

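When Get is called, the stored checksum is verified against the retrieved content. If the content no longer matches, for example because it was corrupted in the storage backend, ErrChecksumMismatch is returned. Only the cryptographic algorithms (SHA-256 and BLAKE3) also give meaningful protection against deliberate tampering; XXHash detects accidental corruption only.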

reader, err := t.Get(ctx, "data", "report.csv")
if errors.Is(err, trove.ErrChecksumMismatch) {
    // Data integrity failure -- content does not match stored checksum
}

CAS Hash Format

Content-Addressable Storage (CAS) hashes use a prefixed format that encodes both the algorithm and the digest:

algorithm:hexdigest

Examples:

sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
blake3:af1349b9f5f9a1a6a0404dea36dcc9499bcb25c9adc112b7cc9a93cae41f3262
xxhash:ef46db3751d8e999

This format is used in the deduplication middleware and CAS operations to identify content by its hash.
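The prefixed format is straightforward to build and parse. The helpers below are a sketch under the assumption that the separator is the first colon and the digest is lowercase hex; formatCAS and parseCAS are hypothetical names, not Trove functions.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// formatCAS joins an algorithm name and raw digest bytes as "algorithm:hexdigest".
// Hypothetical helper for illustration only.
func formatCAS(alg string, digest []byte) string {
	return alg + ":" + hex.EncodeToString(digest)
}

// parseCAS splits "algorithm:hexdigest" at the first colon and decodes the
// hex part. It rejects input with a missing separator, an empty algorithm,
// an empty digest, or invalid hex.
func parseCAS(s string) (string, []byte, error) {
	alg, hexPart, ok := strings.Cut(s, ":")
	if !ok || alg == "" || hexPart == "" {
		return "", nil, fmt.Errorf("malformed CAS hash %q", s)
	}
	digest, err := hex.DecodeString(hexPart)
	if err != nil {
		return "", nil, fmt.Errorf("invalid hex digest in %q: %w", s, err)
	}
	return alg, digest, nil
}

func main() {
	alg, digest, err := parseCAS("xxhash:ef46db3751d8e999")
	if err != nil {
		panic(err)
	}
	fmt.Printf("algorithm=%s digest length=%d bytes\n", alg, len(digest))
	fmt.Println(formatCAS(alg, digest))
}
```

Keeping the algorithm in the identifier lets digests produced by different algorithms coexist in one index without ambiguity.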

Computing Checksums Directly

The internal checksum package provides standalone functions for computing and verifying checksums. Because Go only allows an internal package to be imported from within the directory tree that contains it, these functions are callable only from code under github.com/xraph/trove:

import "github.com/xraph/trove/internal"

// Compute from a reader
checksum, err := internal.ComputeChecksum(reader, internal.AlgBlake3)
// checksum.Algorithm == "blake3"
// checksum.Value     == "af1349b9f5f9..."

// Compute from bytes (plain assignment: checksum and err are already declared)
checksum, err = internal.ComputeChecksumBytes(data, internal.AlgSHA256)

// Verify against an expected checksum
err = internal.VerifyChecksum(reader, expected)
// Returns an error if the checksums do not match

X-Trove-Checksum Header

When Trove is used as an HTTP API (via the Forge extension), the checksum is exposed in the X-Trove-Checksum response header:

HTTP/1.1 200 OK
Content-Type: application/octet-stream
X-Trove-Checksum: sha256:e3b0c44298fc1c149afbf4c8996fb924...

Clients can validate the header value against locally computed hashes for end-to-end integrity verification.

Performance Comparison

Approximate throughput for computing checksums on a 1 GB file, with speed relative to SHA-256 (higher is better):

Algorithm | Throughput | Relative Speed
--- | --- | ---
SHA-256 | ~500 MB/s | 1x
BLAKE3 | ~1.5 GB/s | 3x
XXHash | ~5.0 GB/s | 10x

These numbers are approximate and vary by hardware. BLAKE3 benefits significantly from SIMD instructions on modern CPUs.

Choosing an Algorithm

  • SHA-256 -- Best for regulatory compliance (FIPS 180-4), interoperability with systems that already use SHA-256 (for example, AWS S3's optional SHA-256 object checksums; S3 ETags themselves are not SHA-256 digests), and scenarios requiring widely recognized cryptographic hashes.
  • BLAKE3 -- Best for high-throughput workloads where cryptographic integrity is needed but SHA-256 is a bottleneck. Recommended for most production deployments.
  • XXHash -- Best for internal operations like deduplication indexes and cache keys, where accidental collisions are rare enough to tolerate and inputs are not adversarial. XXHash is not collision-resistant in the cryptographic sense.
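To judge whether a non-cryptographic digest is wide enough for a deduplication index, the birthday bound gives a quick estimate. The sketch below assumes uniformly distributed, non-adversarial digests and a 64-bit XXHash output (consistent with the 16-hex-character example earlier on this page); collisionProbability is a hypothetical helper.

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// uniformly random digests of the given bit width collide, using the
// birthday bound p = 1 - exp(-n(n-1) / 2^(bits+1)).
func collisionProbability(n float64, bits int) float64 {
	space := math.Pow(2, float64(bits))
	// -Expm1(-x) equals 1 - exp(-x) but stays accurate for very small x.
	return -math.Expm1(-n * (n - 1) / (2 * space))
}

func main() {
	for _, n := range []float64{1e6, 1e9, 1e10} {
		fmt.Printf("objects=%.0e  64-bit: %.3g  256-bit: %.3g\n",
			n, collisionProbability(n, 64), collisionProbability(n, 256))
	}
}
```

Under these assumptions a 64-bit digest has roughly a 3% chance of at least one collision at a billion objects, so very large dedup indexes may be better served by BLAKE3 or SHA-256.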
