Deduplication

The identification and elimination of duplicate assets in a library. Exact deduplication uses content hashing to detect byte-identical files stored under different names. Near-duplicate detection uses visual embedding similarity to identify perceptually identical images that differ at the binary level.

AI art libraries accumulate duplicates rapidly. The same image may be saved from the ComfyUI output folder, copied into a project directory, and backed up to cloud storage: three copies with different filenames but identical content. Exact deduplication with a content hash (such as SHA-1) catches these reliably, because byte-identical files always produce the same hash regardless of filename or location.
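As a minimal sketch of exact deduplication, the snippet below groups files by the hash of their contents. The function names `file_digest` and `find_exact_duplicates` are hypothetical, and SHA-1 is used only because it is the example named above; any content hash available in `hashlib` could be substituted.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, algorithm="sha1", chunk_size=1 << 20):
    """Hash a file's bytes in chunks so large images need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under `root` whose contents are byte-identical."""
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # The filename plays no part in the key: only the content does.
            groups[file_digest(path)].append(path)
    # Keep only hashes shared by more than one file.
    return [paths for paths in groups.values() if len(paths) > 1]
```

Calling `find_exact_duplicates("library/")` on a folder (the path here is illustrative) would return one list per set of identical files, leaving the choice of which copy to keep to the caller.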

Near-duplicate detection is more subtle. Two images generated with the same prompt but different seeds may be visually indistinguishable to a human yet have completely different binary content, so their content hashes never match. Near-duplicate detection instead compares embeddings: if two images occupy nearly the same position in embedding space, they are flagged as near-duplicates. This helps identify unintentional regenerations and near-identical outputs that clutter the library.
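The embedding comparison can be sketched as follows, assuming each image already has an embedding vector from some visual model (which model is not specified here). Cosine similarity is one common way to measure closeness in embedding space; the function names and the 0.98 threshold are illustrative assumptions, not values taken from any particular tool.

```python
import math
from itertools import combinations


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # A zero vector has no direction; treat it as dissimilar.
    return dot / (norm_a * norm_b)


def find_near_duplicates(embeddings, threshold=0.98):
    """Return (name_a, name_b, score) for every pair at or above the threshold.

    `embeddings` maps an asset name to its precomputed embedding vector.
    The threshold is an assumed starting point and needs tuning per model.
    """
    pairs = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score >= threshold:
            pairs.append((name_a, name_b, score))
    return pairs
```

This pairwise loop compares every image with every other, which is fine for small libraries; a larger library would likely need an approximate nearest-neighbor index instead.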