Smart search shows irrelevant results #1886

Closed
opened 2026-02-05 04:19:16 +03:00 by OVERLORD · 8 comments
Owner

Originally created by @Chuckame on GitHub (Dec 27, 2023).

The bug

Searching with any word or sentence always returns all the photos, in both the web app and the Android app. I expected the results to be filtered, or no results at all.

Screenshots:
Screenshot_20231227-014712
Screenshot_20231227-014618
Screenshot_20231227-014553

I only put 4 photos on the Immich instance for reproduction, but the behavior is the same with my personal photos.

The OS that Immich Server is running on

Unraid

Version of Immich Server

1.91.4

Version of Immich Mobile App

1.91.4

Platform with the issue

  • [ ] Server
  • [x] Web
  • [x] Mobile

Your docker-compose.yml content

version: "3.8"

services:
  server:
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    command: [ "start.sh", "immich" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
    restart: unless-stopped
    networks:
      - internal
      - caddy
    labels:
      caddy: media.home.chuckame.fr
      caddy.reverse_proxy: '{{upstreams http 3001}}'

  microservices:
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends:
      file: hwaccel.yml
      service: hwaccel
    command: [ "start.sh", "microservices" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    depends_on:
      - redis
      - database
    restart: unless-stopped
    networks:
      - internal

  machine-learning:
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: unless-stopped
    networks:
      - internal

  redis:
    image: redis:6.2-alpine@sha256:b6124ab2e45cc332e16398022a411d7e37181f21ff7874835e0180f56a09e82a
    restart: unless-stopped
    networks:
      - internal

  database:
    image: tensorchord/pgvecto-rs:pg14-v0.1.11@sha256:0335a1a22f8c5dd1b697f14f079934f5152eaaa216c09b61e293be285491f8ee
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: unless-stopped
    networks:
      - internal

volumes:
  pgdata:
  model-cache:

networks:
  caddy:
    external: true
  internal:

Your .env content

UPLOAD_LOCATION=/mnt/user/Photos/immich

IMMICH_VERSION=v1.91.4

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=REDACTED

# The values below this line do not need to be changed
###################################################################################
DB_HOSTNAME=database
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

REDIS_HOSTNAME=redis

Reproduction steps

1. Upload photos
2. Search with any word/sentence
3. The results always show all the photos

Additional information

  • all machine learning features are enabled
  • the jobs (tagging, face recognition, ...) appeared and ran fine on the jobs screen
  • no errors or abnormal logs in any of the services
OVERLORD added the 🗄️server label 2026-02-05 04:19:16 +03:00

@mertalev commented on GitHub (Dec 27, 2023):

This is just a product of how smart search works right now. It returns the top 100 results for a given search, so for a toy library it can effectively return everything. There are some technical constraints to this, such as it being difficult to algorithmically know how many results are relevant.
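To illustrate the point above, here is a minimal sketch of top-k retrieval in Python. It is not Immich's actual code: the `top_k` helper, the file names, and the distance values are all hypothetical. It only shows why a fixed limit of 100 returns every photo when the library holds fewer than 100 of them.

```python
import heapq


def top_k(distances, k=100):
    """Return up to k asset names, ordered from smallest to largest distance.

    `distances` maps an asset name to its distance from the query
    (lower means more similar). Hypothetical helper for illustration.
    """
    return heapq.nsmallest(k, distances, key=distances.get)


# Made-up distances from a single query to each photo in a 4-photo library.
library = {
    "beach.jpg": 0.91,
    "cat.jpg": 0.93,
    "receipt.jpg": 0.97,
    "car.jpg": 0.95,
}

results = top_k(library)
print(results)  # all 4 photos come back, only their order changes
```

With a library larger than `k`, the same call would drop everything past the first 100 results, which is why the behavior looks like filtering on a real library and like "returns everything" on a toy one.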

We've recently been unblocked on this with the switch to pgvecto.rs, so we hope to overhaul search with pagination, custom filters, etc.

I'm leaving this issue open since I agree the behavior here isn't ideal, but it's more of an enhancement issue than an actual bug.


@Chuckame commented on GitHub (Dec 31, 2023):

OK, understood. I've uploaded my whole library, and I can see that smart search is more about ranking photos than filtering them out! I really love this feature; the dev team did awesome work, that's impressive.


@gbcatrinoiu commented on GitHub (Jan 24, 2024):

Maybe it's already been implemented, but in 1.93.3 it works perfectly on both iOS and web. It always returns only the relevant images.


@Chuckame commented on GitHub (Jan 26, 2024):

@gbcatrinoiu It returns relevant images if your search really targets your photos' content. If you search for something that appears nowhere in your photo library, it just shows the photos nearest to the search, even if they are "far" from what you asked for.

Maybe a warning message that users can acknowledge, explaining how the search works, would help dumb new people like me 😄

Also, pagination plus custom sorting (by date) would be really useful for more control over what we search.


@ChrislyBear-GH commented on GitHub (Mar 5, 2024):

Is there a way to debug Smart Search, i.e. to show somehow what matched, the way we can identify people in the asset information panel?
Or is this not possible because it's hidden in the machine learning "black box"?


@alextran1502 commented on GitHub (Mar 5, 2024):

@ChrislyBear-GH It is not possible because the search phrase is encoded into a list of 512 numbers. Then, it uses the VBASE mechanism to compute a score against all the photos and returns the closest matches.
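As a rough sketch of what that comparison looks like, the Python below ranks photos by their distance to a query vector. Several things here are assumptions for illustration: cosine distance as the score, 3-number vectors instead of 512, and the `cosine_distance` and `rank` helpers. The real vectors come from the machine learning model, and the real lookup goes through the database index rather than a plain loop.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two equal-length vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank(query_vec, photo_vecs):
    """Return photo names ordered from closest to farthest from the query."""
    return sorted(
        photo_vecs,
        key=lambda name: cosine_distance(query_vec, photo_vecs[name]),
    )


# Toy 3-number embeddings standing in for the 512-number ones.
query = [1.0, 0.0, 0.0]
photos = {
    "a.jpg": [0.9, 0.1, 0.0],
    "b.jpg": [0.0, 1.0, 0.0],
    "c.jpg": [0.5, 0.5, 0.0],
}

ranking = rank(query, photos)
print(ranking)
```

The individual numbers in each vector carry no human-readable meaning, so there is nothing like a matched keyword to display next to a result; only the overall distance is available.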


@dannyxu2015 commented on GitHub (Jan 3, 2025):

> @ChrislyBear-GH It is not possible because the search phrase is encoded into a list of 512 numbers. Then, it uses the VBASE mechanism to identify the score with all the photos and returns the closest match.

@alextran1502 Is it possible to expose the relative scores of all photos in the smart search API, and let developers/customers decide the score threshold used to filter the closest matches?


@mertalev commented on GitHub (Jan 3, 2025):

A flat cutoff doesn't really work for these models. For one query, the most relevant result might have a distance of 0.91 (lower is more similar) and irrelevant results start at 0.93, and 0.95 might be highly relevant in another query. The cutoff would need to be much smarter than that.
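A small worked example in Python, using the distances from the paragraph above. The two result sets and the `keep_below` helper are hypothetical; the other distance values are made up to fill out the example. It shows that no single cutoff gives the right answer for both queries.

```python
def keep_below(distances, cutoff):
    """Keep the assets whose distance is at or below the cutoff."""
    return [name for name, d in distances.items() if d <= cutoff]


# Query A: only a1 is relevant, irrelevant results start at 0.93.
query_a = {"a1": 0.91, "a2": 0.93, "a3": 0.94}
# Query B: b1 is highly relevant even though its distance is 0.95.
query_b = {"b1": 0.95, "b2": 0.97}

# A cutoff tuned for query A drops the relevant result of query B.
strict_a = keep_below(query_a, 0.92)
strict_b = keep_below(query_b, 0.92)

# A cutoff tuned for query B lets the irrelevant results of query A through.
loose_a = keep_below(query_a, 0.95)
loose_b = keep_below(query_b, 0.95)

print(strict_a, strict_b)
print(loose_a, loose_b)
```

Any fixed value between the two either hides `b1` or admits `a2`, which is the sense in which the cutoff would have to adapt per query.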


Reference: immich-app/immich#1886