[BUG] Android App duplicate with external library #1437

Closed
opened 2026-02-05 01:48:14 +03:00 by OVERLORD · 17 comments
Owner

Originally created by @toxic0berliner on GitHub (Oct 10, 2023).

The bug

Given the warning not to use Immich as the sole backup app for your pictures, I still use an external app that backs up all the pictures from my Android phone to my NAS.
I just moved from a custom importer script to the external library feature.

But now, Immich no longer recognizes that the same picture is both on my phone and on the server. I get a duplicate of every picture: one with a cloud-only icon for the copy on the server, and one with a crossed-out cloud for the copy on my phone.

In the past, I used to get proper deduplication: a single picture with a checkmark inside the little cloud icon.

Maybe something broke and external libraries are no longer matched against the local Android pictures?

The OS that Immich Server is running on

Docker image running on Ubuntu 22.04

Version of Immich Server

v1.81.1

Version of Immich Mobile App

1.80.0 build.104

Platform with the issue

  • [ ] Server
  • [ ] Web
  • [x] Mobile

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - immich-upload:/usr/src/app/upload
      - orion-photo:/mnt/orion/photo
    env_file:
      - stack.env
    depends_on:
      - redis
      - database
      - typesense
    restart: always
    networks:
      immichnet:
        aliases: 
          - immich-server

  immich-microservices:
    container_name: immich_microservices
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "microservices" ]
    volumes:
      - immich-upload:/usr/src/app/upload
      - orion-photo:/mnt/orion/photo
    env_file:
      - stack.env
    depends_on:
      - redis
      - database
      - typesense
    restart: always
    networks:
      immichnet:
        aliases: 
          - immich-microservices

  immich-machine-learning:
    container_name: immich_machine_learning
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - stack.env
    restart: always
    networks:
      immichnet:
        aliases: 
          - immich-machine-learning

  immich-web:
    container_name: immich_web
    image: ghcr.io/immich-app/immich-web:${IMMICH_VERSION:-release}
    env_file:
      - stack.env
    restart: always
    networks:
      immichnet:
        aliases: 
          - immich-web

  typesense:
    container_name: immich_typesense
    image: typesense/typesense:0.24.1@sha256:9bcff2b829f12074426ca044b56160ca9d777a0c488303469143dd9f8259d4dd
    environment:
      - TYPESENSE_API_KEY=${TYPESENSE_API_KEY}
      - TYPESENSE_DATA_DIR=/data
    logging:
      driver: none
    volumes:
      - tsdata:/data
    restart: always
    networks:
      immichnet:
        aliases: 
          - typesense

  redis:
    container_name: immich_redis
    image: redis:6.2-alpine@sha256:70a7a5b641117670beae0d80658430853896b5ef269ccf00d1827427e3263fa3
    restart: always
    networks:
      immichnet:
        aliases: 
          - redis

  database:
    container_name: immich_postgres
    image: postgres:14-alpine@sha256:28407a9961e76f2d285dc6991e8e48893503cc3836a4755bbc2d40bcc272a441
    env_file:
      - stack.env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PG_DATA: /var/lib/postgresql/data
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always
    networks:
      immichnet:
        aliases: 
          - database

  photo:
    container_name: immich_proxy
    image: ghcr.io/immich-app/immich-proxy:${IMMICH_VERSION:-release}
    labels:
      traefik.http.services.photo.loadbalancer.server.port: 8080
      traefik.docker.network: traefiknetwork
      subdomain: photo
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL=${IMMICH_SERVER_URL}
      - IMMICH_WEB_URL=${IMMICH_WEB_URL}
    #ports:
    #  - 2283:8080
    depends_on:
      - immich-server
      - immich-web
    restart: always
    networks:
      traefiknetwork:
        aliases: 
          - photo
      immichnet:
        aliases: 
          - immich_proxy
  

  
  
volumes:
  pgdata:
    driver: local-persist
    driver_opts:
      mountpoint: ${BASE_VOLUMES}/${STACKNAME}/pgdata
  model-cache:
    driver: local-persist
    driver_opts:
      mountpoint: ${BASE_VOLUMES}/${STACKNAME}/model-cache
  tsdata:
    driver: local-persist
    driver_opts:
      mountpoint: ${BASE_VOLUMES}/${STACKNAME}/tsdata
  orion-photo:
    driver: local-persist
    driver_opts:
      mountpoint: ${BASE_ORION}/photo
  immich-upload:
    driver: local-persist
    driver_opts:
      # mountpoint: ${BASE_ORION}/photo/immich
      mountpoint: ${BASE_ORION}/docker/volumes/immich
networks:
  traefiknetwork:
    name: traefiknetwork
    driver: bridge
    external: true
  immichnet:
    name: immichnet
    driver: bridge
    external: false
    attachable: true

Your .env content

STACKNAME=photo
BASE_VOLUMES=/var/lib/docker/local-persist
BASE_ORION=/mnt/orion
PUID=5678
PGID=100
TZ=Europe/Paris
UMASK=0
LOCAL_NETWORK=192.168.0.0/16
REALHOST=myhostname
DB_HOSTNAME=immich_postgres
DB_USERNAME=myuser
DB_PASSWORD=mypassword
DB_DATABASE_NAME=immich
REDIS_HOSTNAME=immich_redis
UPLOAD_LOCATION=${BASE_ORION}/photo/immich
TYPESENSE_API_KEY=myAPIKey
PUBLIC_LOGIN_PAGE_MESSAGE=
IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003
LOG_LEVEL=debug

Reproduction steps

0. Back up your Android pictures to the folder that will become the external library.
1. Spin up the stack, add a user and the external library, and let it discover all pictures.
2. Start the Android app, log in, and let it scan local pictures.
3. All pictures are shown twice.

Additional information

No response

@alextran1502 commented on GitHub (Oct 10, 2023):

I think this is not the intended use case. The external library is meant for existing libraries, while uploaded assets go into the default library.

@toxic0berliner commented on GitHub (Oct 10, 2023):

Damn, it would be a bit sad if that's the case. I trashed my previous install... 50k pictures with many faces: it takes over 3 days to scan and several weeks to ignore the 40k+ faces and rename all my friends....

I'm really not sure I'm ready to move everything to the Immich primary library, or even that I should... Is it really that difficult to add the dedup algorithm to external libraries?

It was working fine in the past with my custom script that imported into the library with an external path... But that also started to fail in mid-September (new pictures were no longer imported), so I thought the external library would be best.

Even if I were to switch to Immich as my primary app, including for backup, I have over 250 GB of pictures on my phone, and I'm not really looking forward to moving them on my NAS from where they are into Immich....

@toxic0berliner commented on GitHub (Oct 10, 2023):

I tried not granting the app permission to access Android pictures, but it keeps asking. So I can't use the external library at all as long as there is any overlap with the content of the phone, and I can't use the app without it seeing the local Android pictures...
That makes it unusable for me.
Thankfully I'm not your only user and you don't really need me, sure, but I fail to see why an external library shouldn't be treated like the primary library when the picture on the phone is already on the server in an external library...

I was liking the face recognition, places, timeline, and the overall swiftness of the UI. I can't believe I'm the only one with this need, but I'm also not ready to fork or open a PR to fix it, as I'm a bad dev, so I hope I can convince you 😁

@alextran1502 commented on GitHub (Oct 10, 2023):

I am not sure what you are trying to achieve. From my POV, you can:

  • back up assets from your phone to the default library
  • if you have an existing external library, mount it and use the "library" feature.

@toxic0berliner commented on GitHub (Oct 10, 2023):

I have 250 GB of pictures that are already on my phone and already on the NAS where I run Immich.
I'm just trying to use Immich without moving all my existing pictures.
The NAS also stores some pictures and movies that I have since removed from my phone. So ideally I'd import all existing files AND enable backup, all into the primary library, but that would mean moving or duplicating over 500 GB of pictures and videos...

So I'd really like instead to keep the existing files where they are, not enable the backup as the one I already have works fine, but still be able to use Immich to see and analyse all my pictures and be able to share them with friends.

This is why I would need the external library AND the android photos to work together and not show up twice, else I'll not use Immich on my phone, not invest time in "maintaining" it and ultimately it'll end up fully unused.

@mattjmeier commented on GitHub (Oct 24, 2023):

I think this is an important issue. I am also experiencing it (while loving Immich overall!) and fully agree.

I'm sure many people possess duplicated photos in their external libraries for a variety of reasons. Some of those reasons may be vestigial or even superfluous. In my personal case, even the result of laziness.

  • as noted here, and the most pressing reason - other backup methods used prior to Immich will have backed up photos from the device already to the external library. This problem would go away over time, if one was using Immich as the sole backup source, but in my case ~2000 photos that were already backed up to an external source would still forever be duplicated in the Immich timeline.
  • selecting images to print or share with others in the past can result in duplicated subsets of images over libraries
  • other backups, editing workflows, etc.

Obviously, there are other deduplication methods that could take care of things like the duplicated folders. But for people with larger photo collections (mine is ~100k), that is a lot to manage and go through. I love the idea of having the Immich UI put all the photos into a timeline for me without too much intervention. It is working so incredibly well!!

As I have pointed out (https://github.com/immich-app/immich/discussions/4240#discussioncomment-7180105) I think there is a relatively simple solution to this: don't display two images in the timeline that share the same file checksum. Why would this ever be the desired behavior? If they are identical images, then I am confident that no one would want them displayed adjacent to each other in the timeline. If there are reasons someone would want this, I am very curious to hear it.

How could a solution be implemented? I propose that they could either be considered a type of 'stack' (i.e., keep the assets tracked separately, but displayed as one), or alternatively, subjected to the same checksum searching that already applies to the "Upload" library (i.e., consider it a single asset). The former option could give users more flexibility, the latter may be easier to implement.
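
To make the "stack" option concrete, here is a minimal Python sketch of display-time grouping, for illustration only. The `Asset` record, its fields, and the rule that the first asset seen fronts its stack are hypothetical and are not Immich's data model. Every asset stays tracked, but the timeline gets one entry per content checksum.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Asset:
    # Hypothetical asset record; not Immich's actual schema.
    asset_id: str
    path: str
    checksum: str  # hex digest of the file contents


def stack_by_checksum(assets):
    """Group assets sharing a content checksum into stacks.

    Every asset is kept; insertion order decides which asset fronts a stack.
    """
    stacks = defaultdict(list)
    for asset in assets:
        stacks[asset.checksum].append(asset)
    return list(stacks.values())


def timeline_entries(assets):
    """Return one displayable asset per stack (the first one seen)."""
    return [stack[0] for stack in stack_by_checksum(assets)]


if __name__ == "__main__":
    assets = [
        Asset("a1", "/mnt/orion/photo/IMG_0001.jpg", "aa11"),
        Asset("a2", "upload/library/IMG_0001.jpg", "aa11"),
        Asset("a3", "/mnt/orion/photo/IMG_0002.jpg", "bb22"),
    ]
    print([a.asset_id for a in timeline_entries(assets)])  # ['a1', 'a3']
```

Because both copies stay in the data, this corresponds to the "more flexibility" variant. Merging them into a single asset would instead require the uniqueness handling discussed further down.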

I love Immich and hope to continue using it! I really feel strongly about this though. I would be willing to help out with a PR, although the learning curve would be really steep for me as I am not familiar with the languages used in Immich.

Thanks for everyone's continued efforts on this amazing project!!

@jrasm91 commented on GitHub (Oct 24, 2023):

Libraries don't currently use checksums since they are the "source of truth" and there is a significant negative performance impact to generating hashes on large libraries. Even if we had them, checksums have to be unique in the database, and you still have the complexity of managing which file you keep and which one you ignore, how you manage that on rescans or file moves, etc. There are also higher priorities for libraries, like automatic album creation. I guess, long story short, this is probably not going to be addressed anytime soon and you are better off using a proper dedupe tool instead.
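
The performance point comes from the fact that a content checksum has to read every byte of every file, so hashing an external library means reading the whole library from disk once. Below is a minimal Python sketch using the standard `hashlib` module. The 1 MiB chunk size is an arbitrary assumption, and this is not Immich's code.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file's contents, reading it in chunks.

    Memory use stays bounded, but the whole file is still read, so the
    total cost grows with the number of bytes in the library.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc")
        name = tmp.name
    try:
        # SHA-1("abc") is a standard test vector.
        print(sha1_of_file(name))  # a9993e364706816aba3e25717850c26c9cd0d89d
    finally:
        os.remove(name)
```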

@mattjmeier commented on GitHub (Oct 24, 2023):

Ahh, thanks for the insight and taking the time to reply.

So if I understand correctly, the upload library is specially designated to calculate the sha1 hash for the assets in it, but external libraries are not.

The part I am not understanding is how the resources required would be any different if I uploaded 100k photos from my phone. If I did this, hypothetically, the hashes would be calculated and presumably recorded in the db. But this isn't possible for the external libraries?

And I guess what you are saying about being unique in the database means that two assets cannot share a checksum because it's a primary key. This makes sense*. I suppose it would make sense to me intuitively that two duplicate photos (with the same checksum) could be represented by a single asset in the database (since it essentially is). Perhaps it would also start to violate other rules about fields in the database - e.g., can't have more than one file path per asset, likely? I can see how problems would start to pile up.
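
The uniqueness constraint can be illustrated with Python's built-in `sqlite3`. The `assets` table and its columns are hypothetical and far simpler than Immich's real schema (the stack above runs PostgreSQL). The only point is that a unique checksum column rejects a second row with the same content hash, even when the paths differ.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical, heavily simplified table; not Immich's real schema.
    conn.execute(
        "CREATE TABLE assets ("
        " id INTEGER PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " checksum TEXT NOT NULL UNIQUE)"
    )


def insert_asset(conn, path, checksum):
    """Insert an asset row; return False if the checksum is already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO assets (path, checksum) VALUES (?, ?)",
                (path, checksum),
            )
        return True
    except sqlite3.IntegrityError:
        # UNIQUE(checksum): a second file with identical content is
        # rejected even though its path is different.
        return False


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    print(insert_asset(conn, "/mnt/orion/photo/IMG_0001.jpg", "aa11"))  # True
    print(insert_asset(conn, "upload/IMG_0001.jpg", "aa11"))  # False
```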

I can also definitely understand that people running this on a raspberry pi wouldn't find it desirable to run checksum calculations for days on end.

I'm curious how PhotoPrism implements this feature (https://docs.photoprism.app/user-guide/library/duplicates/ - they check the SHA-1 of every file on import to detect duplicates). It is one of the few things it does better: automatically stacking assets when it makes sense to do so (i.e., raw + JPEG versions, identical images, etc.). I understand this is getting outside the scope of what Immich was designed to do. It's just that Immich is so awesome at everything else that it is very tempting to integrate this feature.
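
The raw + JPEG case differs from identical images, because the two files have different content and therefore different checksums. One common heuristic groups files by directory and file-name stem, ignoring the extension. The Python sketch below is a hypothetical illustration of that heuristic. It is not how PhotoPrism or Immich actually decides what to stack.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_stem(paths):
    """Group paths that share a directory and a file-name stem.

    For example, IMG_0001.CR2 and IMG_0001.JPG in the same folder end up
    together. Groups containing a single file are dropped.
    """
    groups = defaultdict(list)
    for p in paths:
        pp = PurePosixPath(p)
        key = (str(pp.parent), pp.stem.lower())  # case-insensitive stem
        groups[key].append(p)
    return [files for files in groups.values() if len(files) > 1]


if __name__ == "__main__":
    print(group_by_stem([
        "/photos/IMG_0001.CR2",
        "/photos/IMG_0001.JPG",
        "/photos/IMG_0002.JPG",
    ]))
```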

I also share @toxic0berliner's concerns regarding dropping other backup methods. I am currently using Nextcloud for auto backups from mobile. I would be happy to lose this method, but it works and is stable for now. So, perhaps something for the future.

I get the impression that there are many users facing the same issue though, because a lot of people are going to be using external libraries like this, and many people WILL have duplicates as I've described, and many will have other methods of backups too. I'm not trying to put more on the current developers' shoulders, just sharing my experience.

I still come back to the same question: why would any user want duplicate images sharing a sha1 hash displayed in the timeline? It seems as simple (ha... I know, is it ever simple) as offering the option to calculate hashes; recording it in a table in the database; and picking one as the primary asset to display and generate thumbs for (the first one by mtime? literally doesn't matter).
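
The opt-in approach described here (hash the files, record the hashes in a table, pick one primary per hash) could look roughly like the hypothetical Python/SQLite sketch below. The `file_hashes` table is invented for illustration. Unlike a unique checksum column on the assets themselves, it allows many paths per checksum, which sidesteps the "one file path per asset" problem. The primary is the copy with the earliest mtime, and the path breaks ties.

```python
import sqlite3


def create_schema(conn):
    # Hypothetical side table: several paths may share one checksum,
    # so checksum is deliberately NOT unique here.
    conn.execute(
        "CREATE TABLE file_hashes ("
        " path TEXT PRIMARY KEY,"
        " checksum TEXT NOT NULL,"
        " mtime REAL NOT NULL)"
    )


def primary_paths(conn):
    """Map each checksum to one path: earliest mtime, then path as tie-break."""
    primaries = {}
    rows = conn.execute(
        "SELECT checksum, path FROM file_hashes ORDER BY mtime, path"
    )
    for checksum, path in rows:
        # The first row seen for a checksum becomes its primary; later rows
        # are duplicates that a timeline could hide.
        primaries.setdefault(checksum, path)
    return primaries


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    conn.executemany(
        "INSERT INTO file_hashes (path, checksum, mtime) VALUES (?, ?, ?)",
        [
            ("/mnt/orion/photo/IMG_0001.jpg", "aa11", 200.0),
            ("/mnt/orion/photo/copies/IMG_0001.jpg", "aa11", 100.0),
            ("/mnt/orion/photo/IMG_0002.jpg", "bb22", 150.0),
        ],
    )
    print(primary_paths(conn))
```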

*(EDIT: actually I'm not sure anymore how this is possible, because I do have duplicates in the timeline, meaning they would have the same hash... I obviously do not have a good grasp of how this is all working in the back end, although it's clear that hashes are not calculated for both duplicates)

@jrasm91 commented on GitHub (Oct 25, 2023):

External libraries are quite different than upload libraries and we have separate implementations, which reflect each use case.

Upload libraries have immich as the source of truth and it manages creating and deleting files and deduping them.
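
Checksum-based dedup on upload can be modelled with the hypothetical in-memory Python sketch below. It is not Immich's upload API. Identical bytes hash to the same SHA-1, so a repeated upload is resolved to the existing asset instead of creating a new one.

```python
import hashlib


class UploadLibrary:
    """Hypothetical in-memory model of checksum-based upload dedup."""

    def __init__(self):
        self._by_checksum = {}  # content checksum -> asset id
        self._next_id = 1

    def upload(self, data: bytes):
        """Store data unless identical content already exists.

        Returns (asset_id, is_duplicate).
        """
        checksum = hashlib.sha1(data).hexdigest()
        if checksum in self._by_checksum:
            # Same bytes were uploaded before: reuse the existing asset.
            return self._by_checksum[checksum], True
        asset_id = self._next_id
        self._next_id += 1
        self._by_checksum[checksum] = asset_id
        return asset_id, False


if __name__ == "__main__":
    lib = UploadLibrary()
    print(lib.upload(b"photo"))  # (1, False)
    print(lib.upload(b"photo"))  # (1, True)
    print(lib.upload(b"other"))  # (2, False)
```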

External libraries have the file system as the source of truth and so we leave creating, deleting and deduping files to the user. Deduping has different semantics in this context and the implementation would be quite different. We realized that by not having hashing it is significantly faster to import an external library, so we didn't add it.

That is not to say hashing and other deduping cannot be done. It is more that it is not as trivial as it seems, and specifically because there were benefits to excluding it (a simpler implementation), we didn't include it originally.

Checksum is a required field, but the value for external library files is just a hash of the file path instead.
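
This explains the duplicates in the original report, under one assumption: that the mobile app decides whether a local photo is already on the server by comparing a hash of the file's contents with the checksums the server stores, as the earlier comments about upload dedup suggest. A path-derived value can never match a content hash. The minimal Python sketch below assumes the path is hashed as UTF-8 with SHA-1. The exact input the server hashes may differ.

```python
import hashlib


def content_checksum(data: bytes) -> str:
    # Hash of the file's bytes, as used for upload-library dedup.
    return hashlib.sha1(data).hexdigest()


def path_checksum(path: str) -> str:
    # Stand-in for an external-library "checksum" derived from the path.
    # Assumption: the exact string/encoding the server hashes may differ.
    return hashlib.sha1(path.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    photo = b"\xff\xd8\xff\xe0 fake JPEG bytes"
    on_phone = content_checksum(photo)
    in_library = path_checksum("/mnt/orion/photo/IMG_0001.jpg")
    # Same photo, different inputs to the hash: they never match.
    print(on_phone == in_library)  # False
```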

@jrasm91 commented on GitHub (Oct 25, 2023):

I don't think any user wants duplicates in their external libraries, but they do want external libraries and they got them sooner at the expense of no dedupe checking.

@mattjmeier commented on GitHub (Oct 25, 2023):

Totally fair! Happy to have it, because that is what drew me in as a user.

Pre-existing duplicates I agree are a separate problem with no easy answer. It was just a surprise that backing my photos up through a separate mechanism (which is indeed recommended upfront in the documentation) results in duplicate uploads from the mobile app to my library.

I would be really interested to learn more about how the current implementation works to check duplicates against images in the upload_location but the code base is massive and I didn't have any luck trying to search on my own. Any pointers on where to look?

Side note: why not use md5 rather than sha1 since it's a bit less computationally expensive? (EDIT: I guess the speed is fairly comparable, but you get more bits from sha1...)

@jrasm91 commented on GitHub (Oct 25, 2023):

> Pre-existing duplicates I agree are a separate problem with no easy answer. It was just a surprise that backing my photos up through a separate mechanism (which is indeed recommended upfront in the documentation) results in duplicate uploads from the mobile app to my library.

Honestly, there seem to be two main types of users using Immich right now:

  1. I want immich to backup and organize my photos for me.
  2. I have my own collection of photos; I'll give you read-only access to them, but don't touch them.

Immich was originally designed to work exactly like Google Photos, which doesn't offer option 2 at all. But there are lots of people looking for self-hosted photos with use case 2 in mind, so libraries were added (after the fact) to accommodate that group. Upload libraries are really for group one, and external libraries are really for group two.

While we want to support more use cases, photo management software is complicated. Currently, at least, using the upload library and external libraries in tandem is not a great experience, and I think most people are only using one or the other right now. I'm sure it will improve, but it is a current limitation. It's still unclear exactly how they should or will be integrated in the future; there are talks of migrating "partner sharing" to be library based, and other changes like that.

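> I would be really interested to learn more about how the current implementation works to check duplicates against images in the upload_location but the code base is massive and I didn't have any luck trying to search on my own. Any pointers on where to look?

  • Hashes are calculated on upload: https://github.com/immich-app/immich/blob/main/server/src/immich/app.interceptor.ts#L119-L135
  • Hashes have a unique database constraint (per library): https://github.com/immich-app/immich/blob/main/server/src/infra/entities/asset.entity.ts#L29
  • Asset uploads are passed to the service here: https://github.com/immich-app/immich/blob/main/server/src/immich/api-v1/asset/asset.service.ts#L99-L104. If the upload violates the constraint, the error is caught and the duplicate id is returned.
  • Library checksum fields are set here: https://github.com/immich-app/immich/blob/main/server/src/domain/library/library.service.ts#L223-L237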
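The upload flow in that list can be sketched as follows, assuming an in-memory `Map` in place of the real database table and its unique constraint. `AssetTable`, `DuplicateAssetError`, and `uploadAsset` are hypothetical names for illustration; the linked source files are the authoritative implementation. Note that uniqueness is scoped per library, so the same bytes in a second library are accepted as a new asset.

```typescript
import { createHash } from "node:crypto";

// Raised when an insert would violate the (hypothetical) unique constraint.
class DuplicateAssetError extends Error {
  existingId: string;
  constructor(existingId: string) {
    super(`duplicate of asset ${existingId}`);
    this.existingId = existingId;
  }
}

// In-memory stand-in for the assets table: the Map key emulates a
// UNIQUE (libraryId, checksum) constraint, i.e. uniqueness per library.
class AssetTable {
  private rows = new Map<string, string>(); // "libraryId:checksumHex" -> asset id
  private nextId = 1;

  insert(libraryId: string, checksum: Buffer): string {
    const key = `${libraryId}:${checksum.toString("hex")}`;
    const existing = this.rows.get(key);
    if (existing !== undefined) throw new DuplicateAssetError(existing);
    const id = `asset-${this.nextId++}`;
    this.rows.set(key, id);
    return id;
  }
}

// Upload flow: hash the bytes, attempt the insert, and turn a constraint
// violation into a "duplicate" response carrying the existing asset id.
function uploadAsset(table: AssetTable, libraryId: string, bytes: Buffer) {
  const checksum = createHash("sha1").update(bytes).digest();
  try {
    return { id: table.insert(libraryId, checksum), duplicate: false };
  } catch (err) {
    if (err instanceof DuplicateAssetError) {
      return { id: err.existingId, duplicate: true };
    }
    throw err;
  }
}

const table = new AssetTable();
const jpeg = Buffer.from("same photo bytes");
console.log(uploadAsset(table, "upload-lib", jpeg)); // { id: 'asset-1', duplicate: false }
console.log(uploadAsset(table, "upload-lib", jpeg)); // { id: 'asset-1', duplicate: true }
console.log(uploadAsset(table, "other-lib", jpeg)); // { id: 'asset-2', duplicate: false }
```

Catching the constraint violation, rather than querying first, leaves the database as the single arbiter of uniqueness, which avoids a race between two concurrent uploads of the same file.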

> Side note: why not use md5 rather than sha1 since it's a bit less computationally expensive? (EDIT: I guess the speed is fairly comparable, but you get more bits from sha1...)

Long story short, it is the algorithm Alex picked when he started building, probably because he is not a crypto expert and just made a decision and moved on. By the time more contributors started working on the project, SHA-1 was already widely incorporated into the codebase, and migrating to another algorithm would take a fair amount of effort. The benefits of migrating simply were not worth it: migrating would have minimal impact on users but would delay other, more critical features that we've decided to build instead. So, do you want to migrate to MD5, or get a better search system, a stacked photos implementation, a more robust dedupe implementation, automatic albums for external libraries, etc.? We've decided those features are more important than the algorithm we use for hashing. SHA-1 still performs well, and many CPUs have dedicated instructions that accelerate it.
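On the digest-size point from the side note, a quick check with Node's `crypto` module shows the difference in output length. This is only an illustration of digest sizes; it says nothing about relative speed, which depends on the CPU and on hardware acceleration. `digestBits` is a hypothetical helper.

```typescript
import { createHash } from "node:crypto";

// Digest size in bits for a given algorithm (node:crypto names: "md5", "sha1").
function digestBits(algorithm: "md5" | "sha1", data: Buffer): number {
  return createHash(algorithm).update(data).digest().length * 8;
}

const sample = Buffer.from("example photo bytes");
for (const algorithm of ["md5", "sha1"] as const) {
  const hex = createHash(algorithm).update(sample).digest("hex");
  console.log(`${algorithm}: ${digestBits(algorithm, sample)} bits, ${hex.length} hex chars`);
}
// md5: 128 bits, 32 hex chars
// sha1: 160 bits, 40 hex chars
```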

@mattjmeier commented on GitHub (Oct 25, 2023):

Thanks so much for all the details. I really appreciate you taking the time! I understand the nuances a lot better now.

I would place myself somewhere between 1 and 2... I do want Immich to be my mobile backup & organization/UI/sharing solution (i.e., a replacement for Google Photos, obviously), but I also have a large collection of photos, and like the granularity of being able to provide various volumes across various physical locations and not worry about it destroying my collection while the app is in development. I would happily accept a longer processing time in exchange for duplicate detection (but I have a reasonably powerful server to do this, which many users might not).

I guess the solution for me is to disable the Immich mobile upload entirely until there is progress on this front and rely on 3rd party tools, then clean up the existing duplicates as required, which is easy enough to do (well worth the effort to keep using the excellent application). I suppose that will work, thanks for helping me reach that conclusion - hopefully this discussion helps others too.

I'm happy to continue the discussion if I think of anything productive.
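The cleanup step mentioned above can be done with any duplicate finder. As a hedged illustration, here is a small TypeScript/Node sketch that groups files by the SHA-1 of their contents. `listFiles` and `findDuplicates` are hypothetical helpers, not part of Immich, and the script only reports duplicates; deciding what to delete is left to the user.

```typescript
import { createHash } from "node:crypto";
import { readdirSync, readFileSync } from "node:fs";
import { join } from "node:path";

// Recursively collect regular files under `dir` (symlinks are skipped).
function listFiles(dir: string): string[] {
  const files: string[] = [];
  for (const entry of readdirSync(dir, { withFileTypes: true })) {
    const fullPath = join(dir, entry.name);
    if (entry.isDirectory()) files.push(...listFiles(fullPath));
    else if (entry.isFile()) files.push(fullPath);
  }
  return files;
}

// Group paths by the SHA-1 of their contents; groups with 2+ paths are duplicates.
// `read` is injectable so the grouping logic can be tested without touching disk.
function findDuplicates(
  paths: string[],
  read: (p: string) => Buffer = (p) => readFileSync(p),
): string[][] {
  const byHash = new Map<string, string[]>();
  for (const p of paths) {
    const hex = createHash("sha1").update(read(p)).digest("hex");
    const group = byHash.get(hex) ?? [];
    group.push(p);
    byHash.set(hex, group);
  }
  return [...byHash.values()].filter((group) => group.length > 1);
}
```

For example, `findDuplicates(listFiles("/mnt/orion/photo"))` would list duplicate groups under the external library mount from the compose file. This sketch reads each file fully into memory; for large videos a streamed hash would be kinder to RAM.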

@alextran1502 commented on GitHub (Oct 25, 2023):

Thanks, @mattjmeier and @jrasm91, for a very productive conversation.

@jrasm91 commented on GitHub (Oct 25, 2023):

I think that sounds like a good solution in the interim while we continue to work out the kinks around libraries and figure out how to tackle your use case. Thanks for being understanding as well, it is refreshing 🙏.

@jrasm91 commented on GitHub (Oct 25, 2023):

I think adding an optional feature for "library hashing" could be something we look at in the future as well.
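An opt-in "library hashing" mode could look roughly like the sketch below. `LibraryHashingOptions`, `hashContents`, `sha1OfFile`, and `libraryChecksum` are hypothetical names, not existing Immich settings or functions, and the path-based branch assumes the path checksum is SHA-1 over the raw path string. It makes the trade-off explicit: with hashing on, every byte is read (slower imports), but checksums become comparable with content hashes of uploaded files.

```typescript
import { createHash } from "node:crypto";
import { createReadStream } from "node:fs";

// Hypothetical per-library option; not an existing Immich setting.
interface LibraryHashingOptions {
  hashContents: boolean;
}

// Stream the file through SHA-1 so large videos are never fully loaded in memory.
async function sha1OfFile(filePath: string): Promise<Buffer> {
  const hash = createHash("sha1");
  for await (const chunk of createReadStream(filePath)) {
    hash.update(chunk as Buffer);
  }
  return hash.digest();
}

// With hashing off, keep the fast path-based checksum; with it on, pay the cost of
// reading every byte so checksums can match content-hashed uploads.
async function libraryChecksum(filePath: string, opts: LibraryHashingOptions): Promise<Buffer> {
  if (opts.hashContents) return sha1OfFile(filePath);
  return createHash("sha1").update(filePath).digest();
}
```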

@alextran1502 commented on GitHub (Nov 1, 2023):

Converting this to a discussion/feature request, as this is not a bug but the intended behavior. Future optimizations might address this issue.

Reference: immich-app/immich#1437