[Feature] Deduplication of images at server level #79

Closed
opened 2026-02-04 17:22:42 +03:00 by OVERLORD · 5 comments
Owner

Originally created by @Nonobis on GitHub (May 25, 2022).

Reference: discussion #183.
I think de-duplication of similar images at the server level would be great.


@zackpollard commented on GitHub (May 30, 2022):

De-duplication is probably far down this project's list of priorities. That said, de-duplication at a per-user level probably makes sense; however, I would avoid de-duplication between users, as mixing data is probably not a great idea for privacy's sake. For the specific case in that issue, it probably makes sense to do something similar to Google Photos (again, far down the line): sharing the images would let the other user access the original photos directly without copying them, or the other user could have them automatically imported into their own library, in which case they would be duplicated.


@Nonobis commented on GitHub (May 30, 2022):

I agree, it's low priority... but I'm sorry, I don't really see the privacy aspect of this feature. This is a self-hosted product for personal use, not a publicly exposed service. Any user with access to the storage drive can access the files directly if necessary.

It's possible to maintain a proper level of privacy between users while optimizing data storage.

If two or more users have the same pictures, the images should be de-duplicated, either at upload time or in a background task at the server level. That optimizes storage space; it does not expose pictures to unauthorized users.

Even with the relatively low cost of large hard drives today, it's a waste of space. You can store the image ID and user ID in the database without exposing data between users. If a 50 MB picture is uploaded by 10 or 50 users, imagine the wasted space, or the wasted bandwidth if you back up your data offsite...

I like your idea, but for me, limiting de-duplication to the album level is not enough.

Sorry if I'm unclear or sound aggressive (that's not my intent); English is not my main language.


@alextran1502 commented on GitHub (May 30, 2022):

@Nonobis I think your concern is that if you share an image with other users in the shared album, that image is "copied" into multiple images, correct?

If so, you don't have to worry: the image is not copied into another image. It is referenced as a shared asset in that shared album, so when the album is viewed, the server looks up which assets in the pool (which contains all assets on the server) belong to the album and then displays them.
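
The "shared by reference" behaviour described above can be sketched roughly as follows. This is only an illustration, not Immich's actual code: the `Asset`, `SharedAlbum` and `AssetPool` names, their fields, and the membership check are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Asset:
    asset_id: str
    owner_id: str
    path: str  # single on-disk location; sharing never copies the file


@dataclass
class SharedAlbum:
    name: str
    member_ids: set[str]
    # The album holds asset IDs only, not copies of the assets.
    asset_ids: list[str] = field(default_factory=list)


class AssetPool:
    """Hypothetical pool holding every asset on the server, keyed by ID."""

    def __init__(self) -> None:
        self._assets: dict[str, Asset] = {}

    def add(self, asset: Asset) -> None:
        self._assets[asset.asset_id] = asset

    def view_album(self, album: SharedAlbum, viewer_id: str) -> list[Asset]:
        # Assumed rule for this sketch: only album members may resolve references.
        if viewer_id not in album.member_ids:
            raise PermissionError(f"{viewer_id} is not a member of {album.name}")
        # Resolve each referenced ID against the pool at view time.
        return [self._assets[i] for i in album.asset_ids if i in self._assets]

    def __len__(self) -> int:
        return len(self._assets)
```

In this sketch, adding one asset to the pool and listing its ID in an album with two members still leaves a single stored asset; both members resolve the same `path` when they view the album.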


@jschwalbe commented on GitHub (May 31, 2022):

If I may, I think his concern can be explained with the following example:
I take my family on vacation to Disney. We all get the PhotoPass where a cast member takes photos and then we all download them to our personal devices.

Person A uploads the photos to Immich.
Persons B, C, and D all do so as well.

If the photos are hashed somehow (MD5, SHA, whatever), then persons B, C, and D wouldn’t have to upload the photos, because the client will send the hash and the server will say: “I’ve already got a copy of those photos, don’t bother uploading. I’ll link them to your collection as if you had uploaded them, without the extra space usage.”

Does that clear it up at all? (Nonobis, let me know if I’m off!)
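
The hash-first upload flow in this example could look something like the sketch below. It is a minimal in-memory illustration under stated assumptions, not how Immich implements uploads: `DedupStore`, `client_upload`, and the choice of SHA-256 are all hypothetical and picked only for the example.

```python
from __future__ import annotations

import hashlib


def sha256_hex(data: bytes) -> str:
    """Content hash that identifies a file regardless of who uploads it."""
    return hashlib.sha256(data).hexdigest()


class DedupStore:
    """Hypothetical server-side store: one blob per hash, many owners per blob."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}       # hash -> file content, stored once
        self._owners: dict[str, set[str]] = {}   # hash -> user IDs linked to it

    def has(self, digest: str) -> bool:
        return digest in self._blobs

    def link(self, user_id: str, digest: str) -> None:
        """Add an existing blob to a user's library without storing it again."""
        if digest not in self._blobs:
            raise KeyError(digest)
        self._owners[digest].add(user_id)

    def upload(self, user_id: str, data: bytes) -> str:
        """Store the bytes (once per hash) and link them to the uploader."""
        digest = sha256_hex(data)
        self._blobs.setdefault(digest, data)
        self._owners.setdefault(digest, set()).add(user_id)
        return digest

    def library(self, user_id: str) -> set[str]:
        """Hashes visible to this user only; other users' links are not exposed."""
        return {d for d, owners in self._owners.items() if user_id in owners}

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())


def client_upload(store: DedupStore, user_id: str, data: bytes) -> bool:
    """Send the hash first; return True only if the file bytes were transferred."""
    digest = sha256_hex(data)
    if store.has(digest):
        store.link(user_id, digest)  # server already has it: link, skip the upload
        return False
    store.upload(user_id, data)
    return True
```

With this sketch, if person A uploads a photo and persons B, C, and D then call `client_upload` with the same bytes, only the first call transfers and stores the file; the other three are just linked to it. One caveat worth keeping in mind: a cross-user hash check lets a user learn that some copy of a file already exists on the server, which touches on the privacy concern raised earlier in the thread.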


@Nonobis commented on GitHub (May 31, 2022):

@alextran1502 @jschwalbe Thanks for your responses; it's clearer now, and that resolves my worries :) What you describe is the minimum for a great app like this one. (I have tested all the other apps and this one is the best... I was on the verge of writing a full .NET 6.0 microservice app myself before I found Immich.)

But what I mean is closer to @jschwalbe's description. I think de-duplication needs to happen even if images are not shared between users. As in his example, I think there is really no need to upload, multiple times, images that another user has already uploaded.

Reference: immich-app/immich#79