smart Search Machine learning request failed with Error: write EPIPE #4246

Closed
opened 2026-02-05 09:55:54 +03:00 by OVERLORD · 0 comments

Originally created by @ppskj178 on GitHub (Sep 7, 2024).

The bug

I'm getting a Smart Search error, but search still seems to be working. Is there a problem?

The OS that Immich Server is running on

Ubuntu 22.04.4 LTS

Version of Immich Server

v1.114.0

Version of Immich Mobile App

Not applicable; this is a server-only problem.

Platform with the issue

  • [X] Server
  • [ ] Web
  • [ ] Mobile

Your docker-compose.yml content

#
# WARNING: Make sure to use the docker-compose.yml of the current release:
#
# https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml
#
# The compose file on main may not be compatible with the latest release.
#

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    # extends:
    #   file: hwaccel.transcoding.yml
    #   service: cpu # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    devices:
      - /dev/dri:/dev/dri
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /etc/localtime:/etc/localtime:ro
      - /data/photo:/data/photo
    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    # extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
    #   file: hwaccel.ml.yml
    #   service: cpu # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:328fe6a5822256d065debb36617a8169dbfbd77b797c525288e465f56c1d392b
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: docker.io/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]
    restart: always

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=./library
# The location where your database files are stored
DB_DATA_LOCATION=./postgres

# To set a timezone, uncomment the next line and change Etc/UTC to a TZ identifier from this list: https://en.wikipedia.org/wiki/List_of_tz_database_time_zones#List
# TZ=Etc/UTC

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=v1.114.0

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=postgres

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

  1. Mount an NFS share (Synology NAS, NFS v4.1)
  2. Bring the stack up with docker-compose
  3. Add an External Library
  4. Scan the library for new files
    ...

Relevant log output

immich_server            | [Nest] 6  - 09/06/2024, 3:34:08 PM   ERROR [Microservices:JobService] Unable to run job handler (smartSearch/smart-search): Error: Machine learning request to "http://immich-machine-learning:3003" failed with Error: write EPIPE
immich_server            | [Nest] 6  - 09/06/2024, 3:34:08 PM   ERROR [Microservices:JobService] Error: Machine learning request to "http://immich-machine-learning:3003" failed with Error: write EPIPE
immich_server            |     at /usr/src/app/dist/repositories/machine-learning.repository.js:19:19
immich_server            |     at async MachineLearningRepository.predict (/usr/src/app/dist/repositories/machine-learning.repository.js:18:21)
immich_server            |     at async MachineLearningRepository.encodeImage (/usr/src/app/dist/repositories/machine-learning.repository.js:42:26)
immich_server            |     at async SmartInfoService.handleEncodeClip (/usr/src/app/dist/services/smart-info.service.js:132:27)
immich_server            |     at async /usr/src/app/dist/services/job.service.js:148:36
immich_server            |     at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:394:28)
immich_server            |     at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:581:24)
immich_server            | [Nest] 6  - 09/06/2024, 3:34:08 PM   ERROR [Microservices:JobService] Object:
immich_server            | {
immich_server            |   "id": "f14a3a50-eb40-474e-a524-a9d5cab6cd0c",
immich_server            |   "source": "upload"
immich_server            | }

###################

immich_machine_learning  | [09/06/24 15:28:37] INFO     Starting gunicorn 23.0.0                           
immich_machine_learning  | [09/06/24 15:28:37] INFO     Listening at: http://[::]:3003 (9)                 
immich_machine_learning  | [09/06/24 15:28:37] INFO     Using worker: app.config.CustomUvicornWorker       
immich_machine_learning  | [09/06/24 15:28:37] INFO     Booting worker with pid: 10                        
immich_machine_learning  | [09/06/24 15:28:44] INFO     Started server process [10]                        
immich_machine_learning  | [09/06/24 15:28:44] INFO     Waiting for application startup.                   
immich_machine_learning  | [09/06/24 15:28:44] INFO     Created in-memory cache with unloading after 300s  
immich_machine_learning  |                              of inactivity.                                     
immich_machine_learning  | [09/06/24 15:28:44] INFO     Initialized request thread pool with 4 threads.    
immich_machine_learning  | [09/06/24 15:28:44] INFO     Application startup complete.                      
immich_machine_learning  | [09/06/24 15:31:26] INFO     Loading visual model 'ViT-B-32__openai' to memory  
immich_machine_learning  | [09/06/24 15:31:26] INFO     Setting execution providers to                     
immich_machine_learning  |                              ['CPUExecutionProvider'], in descending order of   
immich_machine_learning  |                              preference                                         
immich_machine_learning  | [09/06/24 15:31:28] INFO     Loading detection model 'buffalo_l' to memory      
immich_machine_learning  | [09/06/24 15:31:28] INFO     Setting execution providers to                     
immich_machine_learning  |                              ['CPUExecutionProvider'], in descending order of   
immich_machine_learning  |                              preference                                         
immich_machine_learning  | [09/06/24 15:31:30] INFO     Loading recognition model 'buffalo_l' to memory    
immich_machine_learning  | [09/06/24 15:31:30] INFO     Setting execution providers to                     
immich_machine_learning  |                              ['CPUExecutionProvider'], in descending order of   
immich_machine_learning  |                              preference                                         
immich_machine_learning  | 2024-09-06 15:31:41.360817697 [W:onnxruntime:, execution_frame.cc:870 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {6,512} for output 683
immich_machine_learning  | 2024-09-06 15:33:10.391278868 [W:onnxruntime:, execution_frame.cc:870 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {2,512} for output 683
immich_machine_learning  | 2024-09-06 15:33:36.751320628 [W:onnxruntime:, execution_frame.cc:870 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {2,512} for output 683
immich_machine_learning  | 2024-09-06 15:33:44.253202395 [W:onnxruntime:, execution_frame.cc:870 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {2,512} for output 683
immich_machine_learning  | 2024-09-06 15:33:59.504701362 [W:onnxruntime:, execution_frame.cc:870 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {3,512} for output 683
immich_machine_learning  | 2024-09-06 15:34:18.437689249 [W:onnxruntime:, execution_frame.cc:870 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {3,512} for output 683
immich_machine_learning  | 2024-09-06 15:34:22.297890797 [W:onnxruntime:, execution_frame.cc:870 VerifyOutputSizes] Expected shape from model of {1,512} does not match actual shape of {4,512} for output 683
immich_machine_learning  | [09/06/24 15:40:34] INFO     Shutting down due to inactivity.                   
immich_machine_learning  | [09/06/24 15:40:34] INFO     Shutting down                                      
immich_machine_learning  | [09/06/24 15:40:34] INFO     Waiting for application shutdown.                  
immich_machine_learning  | [09/06/24 15:40:34] INFO     Application shutdown complete.                     
immich_machine_learning  | [09/06/24 15:40:34] INFO     Finished server process [10]                       
immich_machine_learning  | [09/06/24 15:40:34] ERROR    Worker (pid:10) was sent SIGINT!                   
immich_machine_learning  | [09/06/24 15:40:34] INFO     Booting worker with pid: 196
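
To judge how often the error occurs relative to the work that succeeds, the logs can be tallied. The following is a minimal sketch, not part of Immich: the function name `summarize` and both regular expressions are assumptions derived only from the log lines quoted above, and they may need adjusting if the log format differs between versions.

```python
import re
from collections import Counter

# Assumed patterns, based only on the log lines quoted in this report.
EPIPE_RE = re.compile(r"Unable to run job handler \((?P<job>[^)]+)\): .*write EPIPE")
BOOT_RE = re.compile(r"Booting worker with pid: (?P<pid>\d+)")


def summarize(lines):
    """Count EPIPE job failures per job name and list ML worker boot PIDs."""
    failures = Counter()
    worker_pids = []
    for line in lines:
        match = EPIPE_RE.search(line)
        if match:
            failures[match.group("job")] += 1
            continue
        # Only count worker boots reported by the machine learning container.
        if line.lstrip().startswith("immich_machine_learning"):
            match = BOOT_RE.search(line)
            if match:
                worker_pids.append(int(match.group("pid")))
    return failures, worker_pids


if __name__ == "__main__":
    # Shortened sample lines in the same shape as the output above.
    sample = [
        "immich_server | ERROR [Microservices:JobService] Unable to run job "
        "handler (smartSearch/smart-search): Error: Machine learning request "
        'to "http://immich-machine-learning:3003" failed with Error: write EPIPE',
        "immich_machine_learning | [09/06/24 15:28:37] INFO Booting worker with pid: 10",
        "immich_machine_learning | [09/06/24 15:40:34] INFO Booting worker with pid: 196",
    ]
    failures, pids = summarize(sample)
    print(dict(failures), pids)
```

In practice the lines would come from the saved output of `docker compose logs` rather than from the inline sample. More than one PID in the list only shows that the worker was booted again; it does not by itself explain the EPIPE error.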

Additional information

N100 CPU, 8 GB memory, 2 GB swap
Running on Proxmox
Docker in an LXC container
(Jellyfin server, https://tteck.github.io/Proxmox/#miscellaneous)
Installed with https://github.com/intel/compute-runtime/releases

PS 1. Swap usage continues to increase slowly. Is this normal? (43k images)
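
To put numbers on the swap growth, usage can be sampled over time. This is a minimal sketch under the assumption that the host or container exposes the usual Linux `/proc/meminfo` with `SwapTotal` and `SwapFree` fields; the function names are hypothetical and not part of Immich.

```python
import os
import time


def swap_used_kib(meminfo_text):
    """Return swap in use (KiB) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts:
            # The first token after the colon is the numeric value.
            fields[key.strip()] = int(parts[0])
    return fields["SwapTotal"] - fields["SwapFree"]


def read_swap_used_kib(path="/proc/meminfo"):
    """Read the given meminfo file and return swap in use (KiB)."""
    with open(path, encoding="ascii") as handle:
        return swap_used_kib(handle.read())


if __name__ == "__main__":
    # Print one timestamped sample; run periodically (e.g. from cron) to see the trend.
    if os.path.exists("/proc/meminfo"):
        print(time.strftime("%Y-%m-%d %H:%M:%S"), read_swap_used_kib(), "KiB swap used")
```

A series of these samples taken during and after a library scan would show whether swap keeps growing or levels off once the jobs finish.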

PS 2. I haven't rechecked this because it has been a while, but there seems to be an issue where clearing the External Library either doesn't free the storage completely or does so very slowly. I worked around it by uninstalling Docker.

Reference: immich-app/immich#4246