Duplicate Photo Sorting

Over the course of 15+ years, I’ve migrated family photos across multiple systems, and I’m finally consolidating and sorting them. One problem comes from old camera file names (IMG_0001): instead of overwriting files with the same name, I copied them into separate folders (photos/canon, photos/iphone1, etc.).

I’m going to list some ideas for anyone who ever needs to embark on a similar path. I’m not a software dev or anything, so maybe this is a common approach. I made some mistakes in my first attempt, then went back and restarted. Since I’m on a self-host path, Google Photos, Picasa, and Apple Photos aren’t in the cards yet.

Step 1 - Take your mega photo folder and copy it to a different drive.
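If you’d rather script Step 1 than drag and drop, here’s a minimal Python 3.9+ sketch. `copy_and_verify` and the folder paths are hypothetical; it copies the whole tree and then checks that every file arrived with the same size, so you know the backup is complete before you start deleting anything.

```python
import shutil
from pathlib import Path


def copy_and_verify(src: str, dst: str) -> int:
    """Copy the photo tree from src to dst, then confirm every file arrived
    with the same size. Returns the number of files checked."""
    src_root, dst_root = Path(src), Path(dst)
    # copy2 copies file contents plus metadata such as the modification time.
    # copytree raises FileExistsError if dst already exists, so this can't
    # silently merge into an old folder.
    shutil.copytree(src_root, dst_root, copy_function=shutil.copy2)

    count = 0
    for f in src_root.rglob("*"):
        if f.is_file():
            twin = dst_root / f.relative_to(src_root)
            if not twin.is_file() or twin.stat().st_size != f.stat().st_size:
                raise RuntimeError(f"copy mismatch: {f}")
            count += 1
    return count
```

Something like `copy_and_verify("D:/photos", "E:/photos_copy")` would do it; the size check is cheap but won’t catch a file whose bytes were corrupted without changing length.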

Step 2 - Czkawka is awesome free software that can find duplicate files by hashing them. Run it on the copied folder.
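For the curious, here’s a rough Python 3.9+ sketch of how hash-based duplicate finding works in general. It is not Czkawka’s code; SHA-256 and the size-first pass are my own assumptions about a reasonable approach, and `find_duplicates` is a hypothetical helper. Note that two IMG_0001.jpg files from different cameras only count as duplicates if their bytes match.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large videos don't have to fit in memory."""
    h = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: str) -> list[list[Path]]:
    """Return groups of files under root whose contents are byte-identical."""
    # Pass 1: group by size. A file with a unique size can't have a duplicate,
    # so it never needs to be hashed.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for p in Path(root).rglob("*"):
        if p.is_file():
            by_size[p.stat().st_size].append(p)

    # Pass 2: hash only the files that share a size with something else.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) > 1:
            for p in same_size:
                by_hash[file_digest(p)].append(p)

    return [sorted(group) for group in by_hash.values() if len(group) > 1]
```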

Step 3 - Spend numerous hours going through the dupes and deleting them. I’m at 30k dupes.

Czkawka caveat - It doesn’t look at EXIF data. It reads the OS’s “created on” date, not the EXIF date or the “modified on” date, so don’t rely on the “created on” date it shows being accurate.

Step 3a - When choosing which duplicate to keep, use this priority:
i) correct “created on” date
ii) a file name that contains the date (IMG_20201225.jpg, for example)

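For example, if IMG_001.jpg and 20201225.jpg are duplicates, but 20201225.jpg has a “created on” date of 05-17-22 while IMG_001.jpg has a “created on” date of 12-25-2020, I keep IMG_001.jpg, for the reason explained below under the Bulk Rename step: the name gets fixed there anyway, but the “created on” date is what the renaming falls back on when EXIF data is missing.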
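The Step 3a priority boils down to a two-level sort key. Here’s a small Python 3.9+ sketch of that rule, assuming you already know the photo’s true date (say, from its EXIF data or from memory). `Candidate`, `pick_keeper`, and the YYYYMMDD name check are hypothetical helpers for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Candidate:
    name: str
    created_on: date  # the OS "created on" date shown for this copy


def name_has_date(name: str, true_date: date) -> bool:
    """True if the file name contains the date as YYYYMMDD (e.g. IMG_20201225.jpg)."""
    return true_date.strftime("%Y%m%d") in name


def pick_keeper(dupes: list[Candidate], true_date: date) -> Candidate:
    """Apply the Step 3a priority: (i) correct created date, (ii) date in the name."""
    def score(c: Candidate) -> tuple[bool, bool]:
        return (c.created_on == true_date, name_has_date(c.name, true_date))
    # max() compares the tuples left to right, so rule (i) always beats rule (ii).
    return max(dupes, key=score)
```

With the example above, IMG_001.jpg scores (True, False) and 20201225.jpg scores (False, True), so IMG_001.jpg wins.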

Step 4 - Copy this de-duplicated folder to an SSD.

Step 5 - Using Czkawka, run the “similar photos” search (the Similar Images tool).
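Similar-photo search works differently from Step 2: instead of exact hashes it compares perceptual hashes, which stay close when an image is resized or re-compressed. The Python sketch below shows just one common technique, a difference hash (dHash); it is not Czkawka’s implementation. It assumes the photo has already been decoded, converted to grayscale, and shrunk to an 8-row by 9-column grid by some image library (not shown), and the `max_distance=10` cutoff is an arbitrary illustrative threshold.

```python
def dhash(pixels: list[list[int]]) -> int:
    """Difference hash of a grayscale image already shrunk to 8 rows x 9 columns.

    Each bit records whether a pixel is brighter than its right-hand neighbour,
    so the hash captures the image's gradients rather than its exact bytes.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return bits  # 8 rows x 8 comparisons = 64 bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; a small distance means visually similar images."""
    return bin(a ^ b).count("1")


def are_similar(a: list[list[int]], b: list[list[int]], max_distance: int = 10) -> bool:
    return hamming(dhash(a), dhash(b)) <= max_distance
```

A uniformly brightened copy keeps the same left/right comparisons and hashes identically, while a mirrored image flips every bit. Because the hash works on a tiny thumbnail, odd artifacts can still slip through, so the preview window is worth using.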

Step 6 - Review and delete the matches the same way as in Steps 3/3a.

Step 7 - Copy this folder to a new folder (on the same SSD or a different one). THIS IS LIKELY YOUR FINAL DESTINATION for all future photos too, so choose a place your phone can easily upload to.

Step 8 - Use Bulk Rename Utility to read the EXIF data and rename all files to a format you like. I use year, month, day, hour, minute, then add _1 when there are multiple photos from the same minute. If EXIF data doesn’t exist, use the OS “created on” date instead.
Step 8a (optional) - Copy this folder to another folder on an SSD.
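Bulk Rename Utility is a GUI tool, so this isn’t how it works internally; it’s just a Python 3.9+ sketch of the same Step 8 naming scheme. It assumes you’ve already pulled the EXIF DateTimeOriginal string out with some EXIF reader (not shown). `taken_at` and `new_names` are hypothetical helpers, and giving the first photo in a minute no suffix and later ones _1, _2, and so on is one reading of the scheme above.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

EXIF_FORMAT = "%Y:%m:%d %H:%M:%S"  # EXIF date layout, e.g. "2020:12:25 14:30:05"


def taken_at(path: Path, exif_datetime: Optional[str]) -> datetime:
    """Prefer the EXIF timestamp; fall back to the OS creation time if it's missing."""
    if exif_datetime:
        return datetime.strptime(exif_datetime, EXIF_FORMAT)
    st = path.stat()
    # st_birthtime is the real creation time where Python exposes it (macOS, BSD,
    # Windows on Python 3.12+); elsewhere this falls back to the modification time.
    return datetime.fromtimestamp(getattr(st, "st_birthtime", st.st_mtime))


def new_names(items: list[tuple[Path, datetime]]) -> dict[Path, str]:
    """Build YYYYMMDDHHMM names, adding _1, _2, ... when photos share a minute."""
    used: dict[str, int] = {}
    result: dict[Path, str] = {}
    for path, ts in sorted(items, key=lambda item: item[1]):
        stem = ts.strftime("%Y%m%d%H%M")
        n = used.get(stem, 0)
        used[stem] = n + 1
        suffix = f"_{n}" if n else ""
        result[path] = f"{stem}{suffix}{path.suffix.lower()}"
    return result
```

This only builds the new names; actually renaming the files is left out on purpose so you can eyeball the list first.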

Step 9 - Drag photos into folders based on your sorting method. I do it by year. Maybe you want it by event. Your choice.
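If you sort by year and your files already follow a date-first naming like the one above, the dragging can be scripted. A minimal Python 3.9+ sketch, assuming file names start with a four-digit year; `sort_by_year` is a hypothetical helper, and sorting by event would still be a manual job.

```python
import shutil
from pathlib import Path


def sort_by_year(folder: str) -> int:
    """Move files named like 202012251430.jpg into per-year subfolders (2020/...).

    Files whose names don't start with a four-digit year are left where they are.
    Returns how many files were moved.
    """
    root = Path(folder)
    moved = 0
    for p in list(root.iterdir()):  # snapshot the listing before creating folders
        year = p.name[:4]
        if p.is_file() and year.isdigit():
            dest_dir = root / year
            dest_dir.mkdir(exist_ok=True)
            target = dest_dir / p.name
            if target.exists():
                continue  # never overwrite; resolve name clashes by hand
            shutil.move(str(p), str(target))
            moved += 1
    return moved
```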

Step 10 - Run your photo search/hosting/indexing software of choice, whether PhotoPrism, Lightroom, etc.

Additional thoughts: WAF (wife acceptance factor) is huge on this.

I’ve had problems with Nextcloud backing up my phone’s camera photos. It seems glitchy: it hangs a lot and won’t get past certain files.

Xpenology can run in Proxmox (and you should have some computer running Proxmox in your house at all times anyway), which lets you use Synology’s impeccable DS Photo backup app. Some hangups can still occur in this app, but they’re easier to spot, so you can delete the problem photo. DS Photo also saves files by date, which Nextcloud had problems with when I tried it.

You can transfer your Xpenology data over to Unraid or anywhere else fairly easily.

Czkawka has a photo preview window that helps when sorting through the “similar photos” results, so you can make sure the algorithm got it right. I found some corrupted photos that were identified as “similar” even though one of them had weird bands across it.
Czkawka does not have a video preview tab.


Nice writeup and thanks for sharing. This is something I need to do too.