[BUG] Thumbnail and metadata extraction timeouts when uploading several images at once #742

Closed
opened 2026-02-04 22:14:59 +03:00 by OVERLORD · 36 comments

Originally created by @raisinbear on GitHub (Mar 7, 2023).

The bug

Hi,

As far as I'm aware this is new to one of the more recent releases, as I haven't encountered this issue before. In short: when uploading several images at once - I've tested with 10 .heic photos, for instance - via the mobile app / CLI / web, I get quite a number of timeout errors from the microservices container, à la:

immich-immich-microservices-1  | [Nest] 1  - 03/07/2023, 12:46:24 PM   ERROR [MediaService] Failed to generate jpeg thumbnail for asset: f0929d05-f79a-4917-ad82-2cd9d9c45998
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)
immich-immich-microservices-1  | [Nest] 1  - 03/07/2023, 12:45:40 PM   ERROR [MetadataExtractionProcessor] Error extracting EXIF Error: Connection terminated due to connection timeout
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)
immich_postgres                | 2023-03-07 12:45:44.963 UTC [1598] LOG:  could not receive data from client: Connection reset by peer
immich_postgres                | 2023-03-07 12:45:44.964 UTC [1599] LOG:  could not receive data from client: Connection reset by peer

Once CPU load goes down, a lot of thumbnails and metadata are missing. I assume this is in part down to my server being generally slow to process the images and/or being utilized by other services at the time. At the same time, the timeout it runs into seems overly strict - the server isn't really that slow 😅. Also, even if a lot of thumbnails / metadata are still missing, I can trigger creation in the web UI and that always succeeds.
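
For context on where that error text comes from: node-postgres raises "Connection terminated due to connection timeout" when a client cannot finish connecting within its connectionTimeoutMillis budget. A minimal standalone sketch of that mechanism under load (not immich code - the pool settings, table and column names here are invented for illustration):

```typescript
import { Pool } from 'pg';

// Illustration only: a small pool with a 10 s connect timeout.
const pool = new Pool({
  host: 'immich_postgres',
  user: 'postgres',
  password: 'postgres',
  database: 'immich',
  max: 10,                        // max clients in the pool
  connectionTimeoutMillis: 10000, // give up on connecting after 10 s
});

async function saveExif(assetId: string): Promise<void> {
  try {
    // Each background job issues its own queries; when many jobs compete for
    // a busy CPU, setting up a connection can take longer than the timeout.
    await pool.query('UPDATE assets SET "updatedAt" = now() WHERE id = $1', [assetId]);
  } catch (err) {
    // This is the error that then shows up in the microservices log.
    console.error(`Failed to persist metadata for asset ${assetId}`, err);
  }
}
```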

I'll try to formulate my questions / suggestions in a structurally coherent way:

  • Has something changed in the backend that would explain why this happens now, with the recent release(s), but didn't in the past?
  • The fact that the jobs never fail with a timeout when triggered from the web UI makes me wonder whether a queue is used there, whereas on upload everything is attempted at once.
  • If it's just down to my machine being too slow, it would be great if the timeout were adjustable,
  • or if one could trigger the jobs via a CLI command in the container, making it possible to just set up a cron job on the host machine and forget about it
    (or if immich could itself schedule jobs for missing thumbnails / metadata on a regular basis, but that would probably mean UI work to set up the schedule, which IMO could be overkill).

Thank you guys so much!

The OS that Immich Server is running on

Raspbian Buster (32-bit)

Version of Immich Server

v.1.50.1

Version of Immich Mobile App

v.1.50.0

Platform with the issue

  • [x] Server
  • [ ] Web
  • [ ] Mobile

Your docker-compose.yml content

version: "3.8"

services:
  immich-server:
    container_name: immich_server
    image: altran1502/immich-server:release
    entrypoint: [ "/bin/sh", "./start-server.sh" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    environment:
      - NODE_ENV=production
    depends_on:
      - redis
      - database
    restart: always

  immich-microservices:
    container_name: immich_microservices
    image: altran1502/immich-server:release
    entrypoint: [ "/bin/sh", "./start-microservices.sh" ]
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
    env_file:
      - .env
    environment:
      - NODE_ENV=production
    depends_on:
      - redis
      - database
    restart: always

  immich-web:
    container_name: immich_web
    image: altran1502/immich-web:release
    entrypoint: [ "/bin/sh", "./entrypoint.sh" ]
    env_file:
      - .env
    restart: always

  redis:
    container_name: immich_redis
    image: redis:6.2
    restart: always

  database:
    container_name: immich_postgres
    image: postgres:14
    env_file:
      - .env
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      PG_DATA: /var/lib/postgresql/data
    volumes:
      - pgdata:/var/lib/postgresql/data
    restart: always

  immich-proxy:
    container_name: immich_proxy
    image: altran1502/immich-proxy:release
    environment:
      # Make sure these values get passed through from the env file
      - IMMICH_SERVER_URL
      - IMMICH_WEB_URL
    ports:
      - 2283:8080
    logging:
      driver: none
    depends_on:
      - immich-server
    restart: always

volumes:
  pgdata:

Your .env content

###################################################################################
# Database
###################################################################################

DB_HOSTNAME=immich_postgres
DB_USERNAME=postgres
DB_PASSWORD=postgres
DB_DATABASE_NAME=immich

# Optional Database settings:
# DB_PORT=5432

###################################################################################
# Redis
###################################################################################

REDIS_HOSTNAME=immich_redis

# Optional Redis settings:
# REDIS_PORT=6379
# REDIS_DBINDEX=0
# REDIS_PASSWORD=
# REDIS_SOCKET=

###################################################################################
# Upload File Location
#
# This is the location where uploaded files are stored.
###################################################################################

UPLOAD_LOCATION=/home/immich/data

###################################################################################
# Reverse Geocoding
#
# Reverse geocoding is done locally which has a small impact on memory usage
# This memory usage can be altered by changing the REVERSE_GEOCODING_PRECISION variable
# This ranges from 0-3 with 3 being the most precise
# 3 - Cities > 500 population: ~200MB RAM
# 2 - Cities > 1000 population: ~150MB RAM
# 1 - Cities > 5000 population: ~80MB RAM
# 0 - Cities > 15000 population: ~40MB RAM
####################################################################################

# DISABLE_REVERSE_GEOCODING=false
# REVERSE_GEOCODING_PRECISION=3

####################################################################################
# WEB - Optional
#
# Custom message on the login page, should be written in HTML form.
# For example:
# PUBLIC_LOGIN_PAGE_MESSAGE="This is a demo instance of Immich.<br><br>Email: <i>demo@demo.de</i><br>Password: <i>demo</i>"
####################################################################################

PUBLIC_LOGIN_PAGE_MESSAGE=

####################################################################################
# Alternative Service Addresses - Optional
#
# This is an advanced feature for users who may be running their immich services on different hosts.
# It will not change which address or port that services bind to within their containers, but it will change where other services look for their peers.
# Note: immich-microservices is bound to 3002, but no references are made
####################################################################################

IMMICH_WEB_URL=http://immich-web:3000
IMMICH_SERVER_URL=http://immich-server:3001
IMMICH_MACHINE_LEARNING_URL=http://immich-machine-learning:3003

####################################################################################
# Alternative API's External Address - Optional
#
# This is an advanced feature used to control the public server endpoint returned to clients during Well-known discovery.
# You should only use this if you want mobile apps to access the immich API over a custom URL. Do not include trailing slash.
# NOTE: At this time, the web app will not be affected by this setting and will continue to use the relative path: /api
# Examples: http://localhost:3001, http://immich-api.example.com, etc
####################################################################################

#IMMICH_API_URL_EXTERNAL=http://localhost:3001

Reproduction steps

1. Set up immich on a not-too-fast machine 😉
2. Upload many images at once - mobile, web, CLI, however you like.
3. Observe timeout errors in the log during metadata extraction / thumbnail creation

Additional information

No response

OVERLORD added the 🗄️server label 2026-02-04 22:14:59 +03:00

@alextran1502 commented on GitHub (Mar 7, 2023):

Is the issue reproducible multiple times?


@alextran1502 commented on GitHub (Mar 7, 2023):

We didn't change anything related to the thumbnail generation mechanism. It might be related to communicating with the database, based on the message you shared:

immich_postgres                | 2023-03-07 12:45:44.963 UTC [1598] LOG:  could not receive data from client: Connection reset by peer
immich_postgres                | 2023-03-07 12:45:44.964 UTC [1599] LOG:  could not receive data from client: Connection reset by peer

@jrasm91 commented on GitHub (Mar 7, 2023):

Yeah, looks like the actual thumbnail is generated successfully, but saving the file to the database is failing with a timeout error.

![image](https://user-images.githubusercontent.com/4334196/223509513-39f32c0b-d8c4-4762-b633-950527c6a9af.png)
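
In other words, the image work finishes and it is the follow-up database write that times out. A rough paraphrase of that flow (a sketch only, not the actual MediaService code; the repository interface and paths are invented for illustration):

```typescript
import sharp from 'sharp';

// Invented persistence interface, just to keep the sketch self-contained.
interface AssetRepository {
  save(asset: { id: string; resizePath: string }): Promise<unknown>;
}

async function generateJpegThumbnail(
  asset: { id: string; originalPath: string },
  assetRepository: AssetRepository,
): Promise<void> {
  const resizePath = `/usr/src/app/upload/thumbs/${asset.id}.jpeg`;

  // Step 1: the CPU-heavy resize - per the screenshot, this part succeeds.
  await sharp(asset.originalPath, { failOnError: false })
    .resize(1440, 2560, { fit: 'inside' })
    .jpeg()
    .toFile(resizePath);

  // Step 2: persist the new path - this is where "Connection terminated due
  // to connection timeout" surfaces when the host is too busy to hand the
  // job a database connection in time.
  await assetRepository.save({ id: asset.id, resizePath });
}
```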


@raisinbear commented on GitHub (Mar 7, 2023):

> We didn't change anything related to the thumbnail generation mechanism. It might be related to communicating with the database, based on the message you shared:
>
> immich_postgres                | 2023-03-07 12:45:44.963 UTC [1598] LOG:  could not receive data from client: Connection reset by peer
> immich_postgres                | 2023-03-07 12:45:44.964 UTC [1599] LOG:  could not receive data from client: Connection reset by peer

Yes, it's reproducible. At first I thought it was a hiccup and that my server was under load from elsewhere, but I tried multiple times and it always happens when uploading many images at once (I don't know where the limit is exactly). About the postgres messages in the last two lines, I'll check again tomorrow and see if I can supply more info. But apart from this issue, everything is running smoothly and I didn't notice anything pointing to a problem with the database or the database container.


@raisinbear commented on GitHub (Mar 8, 2023):

Ok, I did some more tests. I can reliably reproduce this on a 24-core Threadripper machine in a Debian VM with a fresh setup as well, when running a simple stress -c 24 during upload. When the CPUs are idle during import (on the Threadripper machine), thumbnail generation and metadata extraction run as expected without issue.

I also tried to get more from the logs, but this is all I get:

More log lines
immich-immich-microservices-1  | [Nest] 1  - 03/08/2023, 5:32:43 AM   ERROR [MetadataExtractionProcessor] Error extracting EXIF Error: Connection terminated due to connection timeout
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)
immich-immich-microservices-1  | [Nest] 1  - 03/08/2023, 5:32:46 AM   ERROR [MetadataExtractionProcessor] Error extracting EXIF Error: Connection terminated due to connection timeout
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)
immich_postgres                | 2023-03-08 05:32:51.830 UTC [109] LOG:  could not receive data from client: Connection reset by peer
immich-immich-microservices-1  | [Nest] 1  - 03/08/2023, 5:32:57 AM   ERROR [MetadataExtractionProcessor] Error extracting EXIF Error: Connection terminated due to connection timeout
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)
immich_postgres                | 2023-03-08 05:32:57.690 UTC [110] LOG:  could not receive data from client: Connection reset by peer
immich_postgres                | 2023-03-08 05:32:57.693 UTC [112] LOG:  could not receive data from client: Connection reset by peer
immich_postgres                | 2023-03-08 05:32:57.698 UTC [111] LOG:  could not receive data from client: Connection reset by peer
immich_postgres                | 2023-03-08 05:32:57.716 UTC [113] LOG:  could not receive data from client: Connection reset by peer
immich-immich-microservices-1  | [Nest] 1  - 03/08/2023, 5:32:58 AM   ERROR [MediaService] Failed to generate jpeg thumbnail for asset: 5f49be63-dc67-49e0-a98a-8bddff6a49f4
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)
immich-immich-microservices-1  | [Nest] 1  - 03/08/2023, 5:32:58 AM   ERROR [MetadataExtractionProcessor] Error extracting EXIF Error: Connection terminated due to connection timeout
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)
immich-immich-microservices-1  | [Nest] 1  - 03/08/2023, 5:32:58 AM   ERROR [MediaService] Failed to generate jpeg thumbnail for asset: f4943d55-c87e-4b56-88fd-fe866ee2c534
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)
immich-immich-microservices-1  | [Nest] 1  - 03/08/2023, 5:32:58 AM   ERROR [MetadataExtractionProcessor] Error extracting EXIF Error: Connection terminated due to connection timeout
immich-immich-microservices-1  | Error: Connection terminated due to connection timeout
immich-immich-microservices-1  |     at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
immich-immich-microservices-1  |     at Object.onceWrapper (node:events:641:28)
immich-immich-microservices-1  |     at Connection.emit (node:events:527:28)
immich-immich-microservices-1  |     at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
immich-immich-microservices-1  |     at Socket.emit (node:events:527:28)
immich-immich-microservices-1  |     at TCP.<anonymous> (node:net:709:12)

@jrasm91, thanks for the inquiry. Right, as far as I understand these lines, any error during the sharp resizing process would be caught by the catch above, and I never see this warning. However, I can't seem to follow the code much further to understand what exactly is supposed to happen during the .save() call and where the timeout could originate from.

I don't know if it helps, but it seems that webp thumbnails are created successfully.

Thanks for the help!


@raisinbear commented on GitHub (Mar 10, 2023):

Brief update: I was trying to understand how the processing works. I can't say I fully do, but I changed the concurrency setting in server/apps/microservices/src/processor.ts for the JobName.GENERATE_JPEG_THUMBNAIL and JobName.GENERATE_WEBP_THUMBNAIL processes from 3 to 1 (lines 116 and 121 in the current main):

![grafik](https://user-images.githubusercontent.com/68740188/224264749-6cd67f79-1d15-41c3-a6df-b9233247b408.png)

Also, I introduced a probably redundant concurrency: 1 in line 151 of [...]/src/processors/metadata-extraction.processor.ts:

![grafik](https://user-images.githubusercontent.com/68740188/224264837-b844ca25-d7bf-4f68-be61-81f353f1a665.png)
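
For readers without access to the screenshots: with @nestjs/bull, the change amounts to lowering the concurrency value passed to the @Process() decorators, roughly like this (a sketch of the pattern only; the queue and class names are approximations, not the actual immich diff):

```typescript
import { Process, Processor } from '@nestjs/bull';
import { Job } from 'bull';

@Processor('thumbnail-generation') // queue name approximated for this sketch
export class ThumbnailGeneratorProcessor {
  // was concurrency: 3 - dropped to 1 in the experiment described here
  @Process({ name: 'generate-jpeg-thumbnail', concurrency: 1 })
  async generateJpegThumbnail(job: Job): Promise<void> {
    // ... resize with sharp and persist the result ...
  }

  // was concurrency: 3 - dropped to 1 as well
  @Process({ name: 'generate-webp-thumbnail', concurrency: 1 })
  async generateWebpThumbnail(job: Job): Promise<void> {
    // ...
  }
}
```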

I transferred these changes directly into the corresponding .js files in the microservices container on my Raspberry Pi and uploaded 16+ images at once - the same sequence of images that always failed before - several times (deleting them in between). No timeouts 😀. I have no idea if that is just treating the symptom; it doesn't seem like a viable root cause. But thumbnail generation and metadata extraction now even succeed while running a stress -c 4 plus forced heavy traffic from another dockerized service. The latter is anecdotal, as I only tried it once, but before, roughly but reliably a quarter of the jobs never completed even with all other services shut down.

Does that make any sense to you?


@raisinbear commented on GitHub (Mar 10, 2023):

Dove a little deeper: as per the bull documentation, concurrencies stack up:

/***
 * For each named processor, concurrency stacks up, so any of these three process functions
 * can run with a concurrency of 125. To avoid this behaviour you need to create an own queue
 * for each process function.
 */
const loadBalancerQueue = new Queue('loadbalancer');
loadBalancerQueue.process('requestProfile', 100, requestProfile);
loadBalancerQueue.process('sendEmail', 25, sendEmail);
loadBalancerQueue.process('sendInvitation', 0, sendInvite);

That means before my change it was doing 6 thumbnail generations in parallel, plus 4 metadata extractions if I calculated correctly (a missing concurrency specifier defaults to 1, per the docs), plus 2 video transcodings if there are any (there weren't in the tests above). I checked via an added logger.warn() line in media.service.ts, and indeed, with my double concurrency: 1 modification, two thumbnail generations are done in parallel. If I set concurrency to 0 in line 121, thumbnail generations happen one after the other. Together with concurrency: 1 on videos, this actually gave me an overall speedup of 30% over the modification above with video concurrency: 2 (this time 2 videos + 16 images).
I still don't know where the timeout originated from. I could speculate that jobs begin to stall because too many are running in parallel on a machine without enough resources, but this is just guesswork.
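
To see that stacking behaviour in isolation, here is a tiny self-contained bull example (queue/job names and numbers are illustrative and assume a local Redis): the per-processor values on one queue add up, so this queue can run up to 3 + 3 + 1 = 7 jobs at the same time.

```typescript
import Queue from 'bull';

const mediaQueue = new Queue('media', 'redis://127.0.0.1:6379');

// Concurrency is summed across named processors on the same queue:
// up to 3 + 3 + 1 = 7 of these handlers may run simultaneously.
mediaQueue.process('jpeg-thumbnail', 3, async (job) => {
  console.log('jpeg thumbnail for', job.data.assetId);
});

mediaQueue.process('webp-thumbnail', 3, async (job) => {
  console.log('webp thumbnail for', job.data.assetId);
});

mediaQueue.process('extract-exif', 1, async (job) => {
  console.log('exif for', job.data.assetId);
});

// To really cap parallel work, lower each number or give every named
// processor its own queue, as the bull docs suggest.
```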


@penguinsam commented on GitHub (Mar 21, 2023):

Timeouts happen in my instance too. It would be good if these values could be put in the env.


@EnochPrime commented on GitHub (Apr 2, 2023):

I'm running into these timeouts as well very consistently. Can confirm that bull stacks the concurrency. On v1.52.0 it says 7 thumbnail tasks are running.


@EnochPrime commented on GitHub (Apr 4, 2023):

According to discord user mudone these errors may be a result of the database timeout.

https://github.com/immich-app/immich/blob/c584791b65c88bfc327cfbc55407502362897f14/server/libs/infra/src/database.config.ts#L21


@raisinbear commented on GitHub (Apr 4, 2023):

> According to discord user mudone these errors may be a result of the database timeout.
>
> https://github.com/immich-app/immich/blob/c584791b65c88bfc327cfbc55407502362897f14/server/libs/infra/src/database.config.ts#L21

I tried changing that setting, too, but raising the timeout didn’t do anything for me. The timeout error might be symptomatic? I don’t understand the reason exactly, but with lowered concurrency the errors don’t occur in my instance.


@EnochPrime commented on GitHub (Apr 4, 2023):

The user on Discord had success with a 60s timeout, but I do agree that it is probably more of a symptom. If things are running smoothly, 10s should be plenty of time.
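
For anyone who wants to try that experiment, a hedged sketch of what a longer timeout could look like in a TypeORM Postgres configuration (option names come from TypeORM and node-postgres; the values are examples and the actual immich database.config.ts may differ):

```typescript
import { DataSource } from 'typeorm';

// Sketch only - not the real immich config.
export const dataSource = new DataSource({
  type: 'postgres',
  host: process.env.DB_HOSTNAME,
  port: Number(process.env.DB_PORT ?? 5432),
  username: process.env.DB_USERNAME,
  password: process.env.DB_PASSWORD,
  database: process.env.DB_DATABASE_NAME,
  connectTimeoutMS: 60000, // initial-connection timeout (60 s instead of the ~10 s discussed above)
  extra: {
    // The `extra` object is passed straight to the node-postgres pool.
    connectionTimeoutMillis: 60000,
    max: 10,
  },
});
```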


@jrasm91 commented on GitHub (Apr 4, 2023):

Maybe it's related to the CPU being swamped by the microservices container, and throttling its usage would help prevent the issue.


@raisinbear commented on GitHub (Apr 4, 2023):

> Maybe it's related to the CPU being swamped by the microservices container, and throttling its usage would help prevent the issue.

Right. How would you go about this other than lowering concurrency? At least for me there are no other services running anymore, but apparently 7 thumbnail creations + the „small“ stuff like metadata extraction etc. in parallel is enough to exhaust the CPU :/ - even without videos coming in, which are by default processed in pairs, too.

@jrasm91 commented on GitHub (Apr 4, 2023):

https://docs.docker.com/compose/compose-file/compose-file-v3/#resources

@EnochPrime commented on GitHub (Apr 4, 2023):

My microservices container has been running with restricted resources, but I lessened these errors by expanding the resources available. I was not running into this before v1.50.


@EnochPrime commented on GitHub (Apr 4, 2023):

That being said, I should probably run a test with nothing else running, to make sure it is not a case of other services competing for CPU cycles.


@raisinbear commented on GitHub (Apr 4, 2023):

> https://docs.docker.com/compose/compose-file/compose-file-v3/#resources

Wow, didn't even think of that 🙈. Will try, but as @EnochPrime reports, it doesn't seem to resolve the issue and might actually make it worse. Could it have to do with stalling of the jobs (https://github.com/OptimalBits/bull#important-notes)? Sadly, I've no experience with bull - merely guessing from what I find 😐


@EnochPrime commented on GitHub (Apr 7, 2023):

I updated to v1.53.0 and also deployed to a node with more available resources. I am still seeing these errors, but the microservices container has not shut down and it appears to be making progress.


@rhullah commented on GitHub (Apr 11, 2023):

I recently upgraded from v1.51.2 to v1.53.0 and ran the Generate Thumbs job due to the recent change in folder structure, and I'm seeing these errors too. I also now have a bunch of missing thumbnails and full-size images due to these errors. Is there anything I can do to ensure the jobs don't time out and instead succeed? I'm also on a Raspberry Pi, so resources might be limited, but I didn't see much stress on the system while the job was running. I'm wondering if my issue is more of a slow-to-write storage path problem than a resource (CPU/RAM) issue.


@rhullah commented on GitHub (Apr 11, 2023):

> I'm wondering if my issue is more of a slow-to-write storage path problem than a resource (CPU/RAM) issue.

I'm no longer sure it's a slow storage location issue. I've volume-mapped a much faster location for the thumbs/... path and I'm still receiving the "Connection terminated due to connection timeout" error, which comes from this "Failed to generate thumbnail for asset" error message (https://github.com/immich-app/immich/blob/dd8d1133344f2aa251511b8e101d3a3fd35d8412/server/libs/domain/src/media/media.service.ts#L85).


@Gatherix commented on GitHub (Apr 11, 2023):

I resolved this issue by deploying on my desktop, which has the same memory as the previous machine but many more CPU resources available. All files remained on the previous machine and were accessed/written via a network share, so this seems CPU-bound rather than storage-related. Generating ~10k thumbnails took several hours of moderate CPU usage. Prior to using my desktop, I saw the same behavior as others: failed thumbnails, connection timeouts, and a persistently crashing microservices container.


@rhullah commented on GitHub (Apr 11, 2023):

My CPU sits there with hardly any usage while still getting these errors. It's as if Postgres just fell asleep or something because the timeouts are coming from the PG client:

Error: Connection terminated due to connection timeout
  at Connection.<anonymous> (/usr/src/app/node_modules/pg/lib/client.js:132:73)
  at Object.onceWrapper (node:events:641:28)
  at Connection.emit (node:events:527:28)
  at Socket.<anonymous> (/usr/src/app/node_modules/pg/lib/connection.js:57:12)
  at Socket.emit (node:events:527:28)
  at TCP.<anonymous> (node:net:709:12)

The other thing that confuses me is that attempting to Generate Thumbnails for only those that are missing seems to do nothing. It's as if the ones that are erroring still get marked as completed, because nothing seems to run when I click the "Missing" button for the Generate Thumbnails job.


@raisinbear commented on GitHub (Apr 11, 2023):

@rhullah, as I think I wrote further up, I could only keep this in check with manual changes to the .js files in the microservices container to lower the overpowering level of concurrency. Doing that, I never got the issue again, even on a Raspberry Pi 2. However, this is only a temporary fix and the opposite of set-and-forget, as recreating or updating the container will undo the modifications. A stronger machine definitely helps, but I also experienced it on a Raspberry Pi 4 a couple of times with the stock settings.


@Gatherix commented on GitHub (Apr 11, 2023):

Do you get any successful thumbnails before the failures start @rhullah? I similarly saw little CPU usage when getting the errors and a seemingly useless "Missing" button.


@rhullah commented on GitHub (Apr 11, 2023):

> Do you get any successful thumbnails before the failures start @rhullah? I similarly saw little CPU usage when getting the errors and a seemingly useless "Missing" button.

It seemed to generate a few successful thumbs, then it would consistently hit the timeout and throw error logs. Then, after a longer time, Postgres would seem to wake up and it would start successfully creating thumbs again. As a result, some images in Immich are missing thumbs (on the main library page) and missing the detailed image (when clicking on a specific item).


@alextran1502 commented on GitHub (Apr 11, 2023):

This is an issue that appeared after we added Typesense and rewrote the machine learning in Python, with the combined CPU usage of machine learning + video transcoding + thumbnail generation. If your CPU is not powerful enough, all the running processes hog it and cannot be completed in time (hence the timeout notification). I am trying to think about how to manage the queue better so that it can help alleviate this issue and let slower/less powerful devices run all the jobs successfully, even with a slower completion time.


@rhullah commented on GitHub (Apr 11, 2023):

> This is an issue that appeared after we added Typesense and rewrote the machine learning in Python, due to the combined CPU usage of machine learning, video transcoding, and thumbnail generation.

Would this be the case even if I have Machine Learning disabled? Because I do. I was getting restarts of the Machine Learning container (before I ran the template path job), so I disabled that container in the compose file and set it to `false` in the `.env` file.

And does video transcoding occur in the "Generate Thumbnails" job? I'm not uploading new assets, only trying to "fix" the template paths so that they point to the new location.

@rhullah commented on GitHub (Apr 11, 2023):

> @rhullah, as I think I wrote further up, I could only keep this in check with manual changes to the .js files in the microservices container to lower the excessive level of concurrency. With that change, I never saw the issue again, even on a Raspberry Pi 2. However, this is only a temporary fix and the opposite of set-and-forget, as recreating or updating the container will undo the modifications. A stronger machine definitely helps, but I also experienced it on a Raspberry Pi 4 a couple of times with the stock settings.

Yeah, I did notice that. I wasn't sure which file(s) were updated where, but I was trying to look into it. I wouldn't mind changing it, even temporarily, just to get past this update of the new template paths.

@raisinbear commented on GitHub (Apr 11, 2023):

> > @rhullah, as I think I wrote further up, I could only keep this in check with manual changes to the .js files in the microservices container to lower the excessive level of concurrency. With that change, I never saw the issue again, even on a Raspberry Pi 2. However, this is only a temporary fix and the opposite of set-and-forget, as recreating or updating the container will undo the modifications. A stronger machine definitely helps, but I also experienced it on a Raspberry Pi 4 a couple of times with the stock settings.

> Yeah, I did notice that. I wasn't sure which file(s) were updated where, but I was trying to look into it. I wouldn't mind changing it, even temporarily, just to get past this update of the new template paths.

If you’re interested in tinkering, some of the parallelism settings are in here:
immich_microservices:/usr/src/app/dist/apps/microservices/apps/microservices/src/processors.js

The lower part of this file looks as follows for me:

__decorate([
    (0, bull_1.Process)({ name: domain_1.JobName.QUEUE_GENERATE_THUMBNAILS, concurrency: 1 }),
    __metadata("design:type", Function),
    __metadata("design:paramtypes", [Object]),
    __metadata("design:returntype", Promise)
], ThumbnailGeneratorProcessor.prototype, "handleQueueGenerateThumbnails", null);
__decorate([
    (0, bull_1.Process)({ name: domain_1.JobName.GENERATE_JPEG_THUMBNAIL, concurrency: 0 }),
    __metadata("design:type", Function),
    __metadata("design:paramtypes", [Object]),
    __metadata("design:returntype", Promise)
], ThumbnailGeneratorProcessor.prototype, "handleGenerateJpegThumbnail", null);
__decorate([
    (0, bull_1.Process)({ name: domain_1.JobName.GENERATE_JPEG_THUMBNAIL_DC, concurrency: 0 }),
    __metadata("design:type", Function),
    __metadata("design:paramtypes", [Object]),
    __metadata("design:returntype", Promise)
], ThumbnailGeneratorProcessor.prototype, "handleGenerateJpegThumbnail_dc", null);
__decorate([
    (0, bull_1.Process)({ name: domain_1.JobName.GENERATE_WEBP_THUMBNAIL, concurrency: 0 }),
    __metadata("design:type", Function),
    __metadata("design:paramtypes", [Object]),
    __metadata("design:returntype", Promise)
], ThumbnailGeneratorProcessor.prototype, "handleGenerateWepbThumbnail", null);
__decorate([
    (0, bull_1.Process)({ name: domain_1.JobName.GENERATE_WEBP_THUMBNAIL_DC, concurrency: 0 }),
    __metadata("design:type", Function),
    __metadata("design:paramtypes", [Object]),
    __metadata("design:returntype", Promise)
], ThumbnailGeneratorProcessor.prototype, "handleGenerateWepbThumbnail_dc", null);
ThumbnailGeneratorProcessor = __decorate([
    (0, bull_1.Processor)(domain_1.QueueName.THUMBNAIL_GENERATION),
    __metadata("design:paramtypes", [domain_1.MediaService])
], ThumbnailGeneratorProcessor);
exports.ThumbnailGeneratorProcessor = ThumbnailGeneratorProcessor;
//# sourceMappingURL=processors.js.map

That is because Bull processes stack up: one handler is specified with concurrency 1 and the others with 0, giving a total of 1 instead of the 7 introduced previously.
There is much more than that, also in the "processors" subdirectory. Some processors don't have concurrency specified, so they stack up by sheer number (the default concurrency is 1).
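
For illustration, a minimal sketch of that stacking behaviour, written against the @nestjs/bull decorators with hypothetical queue and job names (so not Immich's actual code): every handler registered on the same queue adds its concurrency to that queue's worker pool.

import { Process, Processor } from '@nestjs/bull';
import { Job } from 'bull';

// Hypothetical queue name; the point is only how concurrency adds up.
@Processor('thumbnail-generation')
export class ThumbnailProcessorSketch {
  // Contributes 1 worker slot to the queue.
  @Process({ name: 'queue-generate-thumbnails', concurrency: 1 })
  async handleQueue(job: Job) {
    // enqueue one job per asset
  }

  // concurrency: 0 adds no extra slots, so the queue's total stays at 1.
  // Registering these with concurrency: 1 instead would raise the total to 3.
  @Process({ name: 'generate-jpeg-thumbnail', concurrency: 0 })
  async handleJpeg(job: Job) {
    // generate the JPEG thumbnail
  }

  @Process({ name: 'generate-webp-thumbnail', concurrency: 0 })
  async handleWebp(job: Job) {
    // generate the WebP thumbnail
  }
}
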
A couple of notes, though:

  • Your problem seems distinct in that even when triggering generation multiple times, you don't get the thumbnails. With the timeout error, I always had a chance that some ran successfully, as should be expected.
  • The file content above is from 1.52.1 and only an example; I'm away from my server atm and cannot access the current configuration :/
  • If you're going through with changes, copy the modified file(s) back into the container and restart the microservices.
  • The timeout error is completely gone for me, but I ran into #2115 more often for some timing reason, so in the end I introduced many more changes (not production ready by a long shot, otherwise I might have tried to contribute) to mitigate that as well.
@rhullah commented on GitHub (Apr 11, 2023):

Thanks, I changed both `GENERATE_JPEG_THUMBNAIL` and `GENERATE_WEBP_THUMBNAIL` concurrency to `1` and then ran the job again. This time it was able to go through all the images/videos and generate thumbnails with no errors. I have since restarted the container, which reset the values back. I'll just keep an eye on the logs during sync and see if there are errors in the future with new uploads.

@wittymap commented on GitHub (May 29, 2023):

Just wanted to report that I am also seeing this timeout issue (exact same errors as OP) when uploading and processing more than ~50 files at a time. Running v1.58.0 on Docker on a reasonably fast Windows 10 machine (7th-gen i7 @ 2.8 GHz, 32 GB RAM).

Changing all of the concurrencies to 1 in `server/libs/domain/src/job/job.constants.ts` within the microservices app kept the CPU usage down and resolved the timeout issue. Limiting the CPU allowance for the microservices app in Docker did not help.

It'd be really great if these concurrencies could be configured in the .env file instead of having to edit the source.
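
As a rough illustration of that idea (hypothetical variable names and defaults, not an actual Immich option), per-queue concurrency could be read from environment variables with sane fallbacks:

// Hypothetical sketch: read per-queue concurrency from the environment,
// so a weaker machine could set everything to 1 from the .env file
// instead of editing compiled sources inside the container.
const concurrencyFromEnv = (name: string, fallback: number): number => {
  const raw = process.env[name]; // e.g. THUMBNAIL_CONCURRENCY=1
  const parsed = raw ? Number.parseInt(raw, 10) : NaN;
  return Number.isFinite(parsed) && parsed >= 0 ? parsed : fallback;
};

// Hypothetical per-queue defaults.
export const JOB_CONCURRENCY = {
  thumbnailGeneration: concurrencyFromEnv('THUMBNAIL_CONCURRENCY', 3),
  metadataExtraction: concurrencyFromEnv('METADATA_CONCURRENCY', 3),
  videoTranscoding: concurrencyFromEnv('TRANSCODE_CONCURRENCY', 1),
};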

@jrasm91 commented on GitHub (May 29, 2023):

I just updated how jobs, handlers, queues, and concurrencies are configured in the server code. Maybe I can see if they can be dynamically re-configured at runtime now, which would mean they could be added to the administration > settings page.
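
One way such runtime reconfiguration can work, sketched here under the assumption of a recent BullMQ setup rather than whatever the server actually ships: a live Worker's concurrency can be adjusted when an admin saves new settings.

import { Worker } from 'bullmq';

// Sketch only: hypothetical queue name and Redis connection details.
const worker = new Worker(
  'thumbnail-generation',
  async (job) => {
    // generate the thumbnail for job.data.assetId
  },
  { connection: { host: 'redis', port: 6379 }, concurrency: 3 },
);

// Called when an admin saves new settings; the new value applies to
// subsequently fetched jobs without restarting the container.
function applyConcurrency(value: number) {
  worker.concurrency = value;
}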

@EnochPrime commented on GitHub (Jun 1, 2023):

> I just updated how jobs, handlers, queues, and concurrencies are configured in the server code. Maybe I can see if they can be dynamically re-configured at runtime now, which would mean they could be added to the administration > settings page.

Thanks for putting this in via #2622.
I will need to investigate how this helps for my deployment.

@jrasm91 commented on GitHub (Jun 1, 2023):

Ideally you could configure fewer jobs to run at a time, which seems to be a cause of the timeouts.

@mertalev commented on GitHub (Dec 23, 2023):

I'm closing this as there doesn't seem to be any activity on this issue, and it seems to be more or less resolved by the ability to change concurrency dynamically.

Reference: immich-app/immich#742