DupShelf

Exact duplicate photo finder (byte-for-byte)

Not every duplicate finder works the same way. “Similar image” tools use perceptual hashes to group photos that look alike—which can merge burst shots, crops, or edits you still want. An exact duplicate photo finder only groups files with identical content: same bytes on disk, regardless of filename or folder. DupShelf uses SHA-256 hashing for that conservative, review-friendly pass.

Exact vs similar: why the distinction matters

Similar-image detection answers: do these look alike to an algorithm? Exact detection answers: are these the same file content? The second question has a yes/no answer grounded in cryptography. That is why exact mode is the recommended first step before you delete anything: false positives are far rarer than with perceptual matching.

What counts as an exact duplicate

  • Same image saved twice under different names
  • Copy of IMG_1234.jpg next to the original in another folder
  • WhatsApp forward byte-identical to an earlier save
  • Backup restore that duplicated folders without renaming

What does not count: re-compressed JPEGs, cropped versions, or burst frames with different bytes, even if they look identical to you.

Why DupShelf uses SHA-256

SHA-256 produces a fixed-length fingerprint of file contents. Comparing hashes is fast and deterministic: for all practical purposes, equal hashes mean equal files. We do not rely on filenames, EXIF dates, or thumbnail pixels. For large libraries, hashing dominates scan time, which is why progress and cancel controls are built in.
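As a rough illustration of content hashing, here is a minimal Python sketch that fingerprints every file under a folder and groups files sharing a digest. It is not DupShelf's own code; the function names `sha256_of_file` and `group_exact_duplicates` and the 1 MiB chunk size are assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Reading in chunks keeps memory use flat even for very large files.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()


def group_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Map each digest to the files sharing it, keeping only groups of 2+."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[sha256_of_file(path)].append(path)
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Filenames and folders never enter the digest, so a renamed copy in another folder lands in the same group as its original.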

Workflow for safest cleanup

For the safest cleanup, work through these steps in order:

  • Scan the folder.
  • Open each group and verify thumbnails side by side.
  • Pick one keeper (often the highest resolution or best path).
  • Move non-keepers to dupshelf-duplicate-images.
  • Delete that folder only after visual confirmation in your file manager.

Never bulk-delete from inside the browser tool.
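The "move, do not delete" step can be sketched in Python as follows. This is a hypothetical helper, not DupShelf's implementation: the name `quarantine_non_keepers` and the `-1`, `-2` suffix used to avoid filename clashes are assumptions. Only the folder name dupshelf-duplicate-images comes from the workflow above.

```python
import shutil
from pathlib import Path

QUARANTINE_NAME = "dupshelf-duplicate-images"  # folder name from the workflow


def quarantine_non_keepers(root: Path, group: list[Path], keeper: Path) -> list[Path]:
    """Move every file in `group` except `keeper` into the quarantine folder.

    Nothing is deleted; the return value lists the new locations.
    """
    if keeper not in group:
        raise ValueError("keeper must be one of the files in the group")
    target_dir = root / QUARANTINE_NAME
    target_dir.mkdir(exist_ok=True)
    moved = []
    for path in group:
        if path == keeper:
            continue
        destination = target_dir / path.name
        counter = 1
        # Assumed naming scheme: add a numeric suffix rather than overwrite.
        while destination.exists():
            destination = target_dir / f"{path.stem}-{counter}{path.suffix}"
            counter += 1
        shutil.move(str(path), str(destination))
        moved.append(destination)
    return moved
```

Because files are only moved, a mistaken choice is recoverable until the quarantine folder itself is deleted.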

When exact mode is not enough

If two files are the same photo but one was re-exported at lower quality, bytes differ and exact mode will not group them. That is when perceptual tools—or a future DupShelf similar mode—belong in pass two, after you have already removed provable copies.

Exact mode and professional libraries

Photographers and studios often have RAW + JPEG pairs, edited exports, and client deliveries. Exact dedup removes true copies (duplicate delivery ZIPs, double exports) without touching legitimately different edits. Always review groups that contain mixed extensions.

Auditing results for peace of mind

Open the largest duplicate groups first, since they recover the most space. Compare thumbnails side by side; if anything looks different, skip that group. Export a CSV before deleting if you want a record for tax, legal, or client audit trails.
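To make "largest first" concrete, the sketch below ranks groups by recoverable space and writes an audit CSV. It assumes recoverable space means file size times the number of extra copies. The `DuplicateGroup` type and the column names are hypothetical; DupShelf's actual CSV layout may differ.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class DuplicateGroup:
    sha256: str
    file_size: int      # bytes; identical for every file in an exact group
    paths: list[str]

    @property
    def recoverable_bytes(self) -> int:
        # One copy is kept, so every other copy is reclaimable space.
        return self.file_size * (len(self.paths) - 1)


def largest_first(groups: list[DuplicateGroup]) -> list[DuplicateGroup]:
    """Order groups so the biggest space savings come first."""
    return sorted(groups, key=lambda g: g.recoverable_bytes, reverse=True)


def to_csv(groups: list[DuplicateGroup]) -> str:
    """Return one CSV row per file, grouped and ordered largest first."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["sha256", "file_size", "recoverable_bytes", "path"])
    for group in largest_first(groups):
        for path in group.paths:
            writer.writerow(
                [group.sha256, group.file_size, group.recoverable_bytes, path]
            )
    return buffer.getvalue()
```

A group of four 400-byte copies ranks above a pair of 1000-byte files, because three removable copies free 1200 bytes against 1000.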

Hash collisions—should you worry?

SHA-256 collisions are theoretically possible but not a practical concern for photo libraries. In practice, matching hashes mean matching files. The chance of two different images sharing a hash is astronomically low, far lower than the chance of a human error during review.
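For a sense of scale, the standard birthday bound gives an upper limit on the chance that any two of n distinct files collide: n(n-1)/2 divided by 2^256. The sketch below computes it. It assumes SHA-256 digests behave like uniformly random 256-bit values, which is an idealization, and the function name is illustrative.

```python
from fractions import Fraction


def collision_upper_bound(file_count: int, hash_bits: int = 256) -> Fraction:
    """Birthday (union) bound: P(any collision) <= n(n-1)/2 / 2**bits."""
    pairs = file_count * (file_count - 1) // 2
    return Fraction(pairs, 2 ** hash_bits)
```

For a library of one million distinct files, `float(collision_upper_bound(1_000_000))` comes out at roughly 4.3e-66.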

Comparing to size-only or name-only tools

File size alone causes false positives when different images share size by coincidence. Filename rules miss renamed copies. Content hashing is the reliable standard for exact dedup.
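A toy comparison shows the difference between the three strategies. The file names and byte strings below are made up for the example, and `group_by` is a hypothetical helper.

```python
import hashlib

# Hypothetical in-memory "files": path -> contents, standing in for real photos.
files = {
    "holiday/IMG_1234.jpg": b"pixels-A",
    "backup/beach-final.jpg": b"pixels-A",   # renamed copy of IMG_1234.jpg
    "holiday/IMG_1235.jpg": b"pixels-B",     # different photo, same size
}


def group_by(files, key):
    """Group paths by key(path, data); keep only groups with 2+ members."""
    groups = {}
    for path, data in files.items():
        groups.setdefault(key(path, data), []).append(path)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]


size_groups = group_by(files, lambda path, data: len(data))
name_groups = group_by(files, lambda path, data: path.rsplit("/", 1)[-1])
hash_groups = group_by(files, lambda path, data: hashlib.sha256(data).hexdigest())
```

Here `size_groups` lumps all three files together (a false positive), `name_groups` is empty because the copy was renamed, and `hash_groups` contains exactly the original and its renamed copy.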

Summary and next steps

Exact duplicate finding is the foundation of every sane photo cleanup project. Start here before similar-image AI, before cloud dedup, and before bulk delete in any gallery app. When DupShelf finishes, sort groups by recoverable size and work largest first; the top ten groups often represent most of the wasted space. Keep a CSV export if you need accountability for client work or family archives.

Remember that hash equality is stronger evidence than filename patterns or file dates, which sync tools rewrite constantly. If a group surprises you, open both files in an external viewer; if they are not truly identical, skip the group. Teaching relatives to use exact mode first prevents the horror stories that come from perceptual tools merging different smiles in the same burst.

When similar detection arrives in DupShelf, it will be opt-in and labeled. Exact mode will remain the default because trust matters more than feature count: one mistaken merge hurts more than one missed similar pair. Run the workbench on your messiest folder this week and compare recoverable space to what you guessed was duplicate.

Questions

Will you add similar photo detection?
Yes, as an optional mode with clear labeling. Exact mode will remain the default for safety.
Can exact mode merge PNG and JPG of the same scene?
Only if the bytes are identical. Different formats or compression usually mean different hashes.
Is SHA-256 overkill for photos?
It is fast on modern hardware and eliminates ambiguity. Simpler checks like file size alone would cause false positives.
Do thumbnails affect the hash?
No. We hash the full file contents on disk rather than a rendered thumbnail or an embedded preview alone.
How is this different from checksum tools?
Same idea, packaged for photo libraries with grouping UI, keeper marks, move-to-folder, and CSV export.
Will exact dedup find edited duplicates?
No. Edits change bytes. Use similar detection in a second pass if needed.
Does metadata or EXIF affect the hash?
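Filesystem metadata such as the filename or modified date does not. EXIF is stored inside the file, so a copy whose EXIF was rewritten has different bytes and a different hash.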
