Facial Recognition Jobs Stuck #7272

Closed
opened 2026-02-05 12:54:43 +03:00 by OVERLORD · 10 comments

Originally created by @haoxi911 on GitHub (Sep 18, 2025).

I have searched the existing issues, both open and closed, to make sure this is not a duplicate report.

  • Yes

The bug

Recently, I noticed that the "Facial Recognition" job gets stuck and stops processing the queue. If you restart Immich and scan for missing jobs, it queues the jobs again, processes a few hundred of them, and then gets stuck again.

See the screenshot. On my server it has been like this for hours, and the number has not changed.

Image

I checked the docker compose logs, but they did not show any error message. Should we set a timeout for each "Facial Recognition" job and let it fail so that the rest of the jobs in the queue can be processed?
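
The per-job timeout idea could look roughly like the sketch below. It is only an illustration in plain TypeScript: `withTimeout` is a hypothetical helper, not something that exists in Immich or BullMQ, and where it would be wired into the job runner is left open.

```typescript
// Hypothetical helper (not an Immich or BullMQ API): rejects if `work`
// does not settle within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(() => {
      const error = new Error(`job timed out after ${ms} ms`);
      error.name = 'JobTimeoutError';
      reject(error);
    }, ms);
  });
  // Whichever promise settles first wins. The timer is cleared either way
  // so that it cannot keep the process alive after the job has finished.
  return Promise.race([work, timeout]).finally(() => {
    if (timer !== undefined) {
      clearTimeout(timer);
    }
  });
}
```

Note that this only stops waiting for the job; it does not cancel whatever the job was doing, so a handler that is stuck on a dead database connection would still need its own cleanup.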

I will try to open a PR if anyone can give some guidance on which part of the code could be the root cause.

The OS that Immich Server is running on

Ubuntu 22.04.5 LTS

Version of Immich Server

v1.136.0

Version of Immich Mobile App

N/A

Platform with the issue

  • [x] Server
  • [ ] Web
  • [ ] Mobile

Device make and model

N/A

Your docker-compose.yml content

N/A

Your .env content

N/A

Reproduction steps

Upload many photos. All other background tasks finish without error, but "Facial Recognition" gets stuck.

Relevant log output


Additional information

No response


@bo0tzz commented on GitHub (Sep 18, 2025):

Please update to the latest version. If the issue then still happens, post logs and we can reopen this.


@pig-sky commented on GitHub (Sep 22, 2025):

I'm also seeing this (v1.142.1). I have a very large external library, so I'm running a separate PostgreSQL server on dedicated hardware, but I also saw it when running a containerised Postgres.

After a while, the following appears in the logs (following hundreds of lines where everything was going fine, just lots of deferring for later or not enough matches):

2025-09-22 13:51:38.391969+00:00[Nest] 7  - 09/22/2025, 2:51:38 PM   DEBUG [Microservices:PersonService] Face 623040ac-8905-4285-809d-3d3486d3fe43 has 2 matches
2025-09-22 13:51:38.392504+00:00[Nest] 7  - 09/22/2025, 2:51:38 PM   DEBUG [Microservices:PersonService] Deferring non-core face 623040ac-8905-4285-809d-3d3486d3fe43 for later processing
2025-09-22 13:51:38.397372+00:00Query failed : {
2025-09-22 13:51:38.397416+00:00durationMs: 0.5109719997271895,
2025-09-22 13:51:38.397430+00:00error: Error: write CONNECTION_DESTROYED 192.168.10.127:5432
2025-09-22 13:51:38.397443+00:00at Object.execute (/usr/src/app/server/node_modules/.pnpm/postgres@3.4.7/node_modules/postgres/cjs/src/connection.js:156:35)
2025-09-22 13:51:38.397454+00:00at Query.handler (/usr/src/app/server/node_modules/.pnpm/postgres@3.4.7/node_modules/postgres/cjs/src/index.js:230:13)
2025-09-22 13:51:38.397466+00:00at Query.handle (/usr/src/app/server/node_modules/.pnpm/postgres@3.4.7/node_modules/postgres/cjs/src/query.js:140:65)
2025-09-22 13:51:38.397477+00:00at process.processTicksAndRejections (node:internal/process/task_queues:105:5) {
2025-09-22 13:51:38.397488+00:00code: 'CONNECTION_DESTROYED',
2025-09-22 13:51:38.397499+00:00errno: 'CONNECTION_DESTROYED',
2025-09-22 13:51:38.397510+00:00address: [ '192.168.10.127' ],
2025-09-22 13:51:38.397521+00:00port: [ 5432 ]
2025-09-22 13:51:38.397532+00:00},
2025-09-22 13:51:38.397543+00:00sql: 'begin',
2025-09-22 13:51:38.397554+00:00params: []
2025-09-22 13:51:38.397565+00:00}
2025-09-22 13:51:40.383460+00:00[Nest] 19  - 09/22/2025, 2:51:40 PM   DEBUG [Api:LoggingInterceptor~zrqrwv4d] GET /api/server/ping 200 0.18ms ::1

It looks like the Postgres session is getting dropped, but there shouldn't be any network interruption: both machines sit on the same switch, the switch stats don't show any dropped packets or anything abnormal, and my Postgres instance does not have any of the timeout variables set.

Is Immich setting a query timeout somewhere which is too short for large libraries like these?


@haoxi911 commented on GitHub (Sep 22, 2025):

@pig-sky I noticed exactly the same error logs. I am currently running Postgres in the same docker network as immich-server, so your setup should not matter.

immich_microservices     | Query failed : {
immich_microservices     |   durationMs: 0.5846300004050136,
immich_microservices     |   error: Error: write CONNECTION_DESTROYED database:5432
immich_microservices     |       at Object.execute (/usr/src/app/server/node_modules/postgres/cjs/src/connection.js:156:35)
immich_microservices     |       at Query.handler (/usr/src/app/server/node_modules/postgres/cjs/src/index.js:230:13)
immich_microservices     |       at Query.handle (/usr/src/app/server/node_modules/postgres/cjs/src/query.js:140:65)
immich_microservices     |       at process.processTicksAndRejections (node:internal/process/task_queues:105:5) {
immich_microservices     |     code: 'CONNECTION_DESTROYED',
immich_microservices     |     errno: 'CONNECTION_DESTROYED',
immich_microservices     |     address: [ 'database' ],
immich_microservices     |     port: [ 5432 ]
immich_microservices     |   },
immich_microservices     |   sql: 'select "asset_face"."id", "asset_face"."personId", "asset_face"."sourceType", (select to_json(obj) from (select "asset"."ownerId", "asset"."visibility", "asset"."fileCreatedAt" from "asset" where "asset"."id" = "asset_face"."assetId") as obj) as "asset", (select to_json(obj) from (select "face_search".* from "face_search" where "face_search"."faceId" = "asset_face"."id") as obj) as "faceSearch" from "asset_face" where "asset_face"."id" = $1 and "asset_face"."deletedAt" is null',
immich_microservices     |   params: [ '7d575221-a0e7-48b7-b1ca-f42b16d16612' ]
immich_microservices     | }
immich_microservices     | [Nest] 7  - 09/22/2025, 12:25:07 PM   ERROR [Microservices:{"id":"7d575221-a0e7-48b7-b1ca-f42b16d16612","deferred":false}] Unable to run job handler (FacialRecognition): Error: write CONNECTION_DESTROYED database:5432
immich_microservices     | Error: write CONNECTION_DESTROYED database:5432
immich_microservices     |     at Object.execute (/usr/src/app/server/node_modules/postgres/cjs/src/connection.js:156:35)
immich_microservices     |     at Query.handler (/usr/src/app/server/node_modules/postgres/cjs/src/index.js:230:13)
immich_microservices     |     at Query.handle (/usr/src/app/server/node_modules/postgres/cjs/src/query.js:140:65)
immich_microservices     |     at process.processTicksAndRejections (node:internal/process/task_queues:105:5)
immich_microservices     | [Nest] 7  - 09/22/2025, 12:25:07 PM     LOG [Microservices:PersonService] [FacialRecognition] Assigning face ce2daf13-f594-4830-8d88-f52a37cb152f to person 7e409a0e-0ab1-49ec-ac34-17b798354c94
immich_microservices     | Query failed : {
immich_microservices     |   durationMs: 0.281338999979198,
immich_microservices     |   error: Error: write CONNECTION_DESTROYED database:5432
immich_microservices     |       at Object.execute (/usr/src/app/server/node_modules/postgres/cjs/src/connection.js:156:35)
immich_microservices     |       at Query.handler (/usr/src/app/server/node_modules/postgres/cjs/src/index.js:230:13)
immich_microservices     |       at Query.handle (/usr/src/app/server/node_modules/postgres/cjs/src/query.js:140:65)
immich_microservices     |       at process.processTicksAndRejections (node:internal/process/task_queues:105:5) {
immich_microservices     |     code: 'CONNECTION_DESTROYED',
immich_microservices     |     errno: 'CONNECTION_DESTROYED',
immich_microservices     |     address: [ 'database' ],
immich_microservices     |     port: [ 5432 ]
immich_microservices     |   },
immich_microservices     |   sql: 'begin',
immich_microservices     |   params: []
immich_microservices     | }

@haoxi911 commented on GitHub (Sep 22, 2025):

@pig-sky Do you have any ideas on how to handle the random Postgres connection errors? I feel it should fail the job and continue with the next job in the queue. I haven't figured out how to do that correctly, other than running docker compose restart immich-microservices. Note: my background tasks are processed by a separate container, not immich-server.
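
One possible way to handle such connection errors is to retry the failing operation a few times before giving up. The sketch below assumes only what the logs above show, namely that the error object carries `code: 'CONNECTION_DESTROYED'`. The names `isConnectionDestroyed` and `retryOnConnectionLoss` are made up for illustration and are not Immich or postgres.js APIs.

```typescript
// Hypothetical helpers, for illustration only.
function isConnectionDestroyed(error: unknown): boolean {
  return (
    typeof error === 'object' &&
    error !== null &&
    (error as { code?: unknown }).code === 'CONNECTION_DESTROYED'
  );
}

async function retryOnConnectionLoss<T>(
  operation: () => Promise<T>,
  maxAttempts = 3,
  delayMs = 100,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up on unrelated errors, or once the attempts are used up.
      if (!isConnectionDestroyed(error) || attempt >= maxAttempts) {
        throw error;
      }
      // Wait a little longer after each failed attempt.
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

Whether a retry helps depends on why the connection was destroyed. If the underlying pool never recovers, every attempt fails the same way and the error is rethrown after `maxAttempts`.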


@haoxi911 commented on GitHub (Sep 23, 2025):

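@bo0tzz I noticed that here (https://github.com/immich-app/immich/blob/ba0cfb76ede1fec016ad29a8b66699a191aa5cef/server/src/repositories/job.repository.ts#L92), where BullMQ is initialized, the emit method returns a promise, but the message queue handler doesn't await it!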

(job) => this.eventRepository.emit('JobStart', queueName, job as JobItem)

/* I think this line should be changed to */

async (job) => await this.eventRepository.emit('JobStart', queueName, job as JobItem),

This change ensures that if any exception occurs during emit, BullMQ is informed about it and can automatically fail or retry the job according to the configured settings. The current code, however, prevents BullMQ from knowing the job's outcome.
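
The general point, that a queue worker only learns about a failure if the processor's promise actually reaches it, can be sketched with a stand-in runner. Everything here is hypothetical: `runJob` is not BullMQ's implementation, and `emit` is a dummy that always rejects.

```typescript
type Job = { id: string };
type Processor = (job: Job) => Promise<void> | void;

// Hypothetical stand-in for a queue worker: it awaits whatever the
// processor returns and records the outcome.
async function runJob(processor: Processor, job: Job): Promise<'completed' | 'failed'> {
  try {
    await processor(job);
    return 'completed';
  } catch (_error) {
    return 'failed';
  }
}

// Dummy emit that always rejects, standing in for a handler that hits a
// database error.
const emit = async (_job: Job): Promise<void> => {
  throw new Error('write CONNECTION_DESTROYED');
};

// Concise arrow body: the promise is returned, so the runner sees the rejection.
const returned: Processor = (job) => emit(job);

// async/await form: same result from the runner's point of view.
const awaited: Processor = async (job) => await emit(job);

// Block body without `return`: the promise is dropped, the runner receives
// `undefined` and reports success even though emit failed.
const dropped: Processor = (job) => {
  emit(job).catch(() => undefined); // swallowed only to keep the demo quiet
};
```

In this sketch `runJob` reports `'failed'` for `returned` and `awaited`, and `'completed'` for `dropped`. A concise arrow body already returns the promise, so only a handler that drops it hides the failure; it is worth checking which of these shapes the linked line actually has.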


@haoxi911 commented on GitHub (Oct 1, 2025):

For anyone who has the same issue: you can simply wrap the whole handleRecognizeFaces method in a try-catch, which will at least let the queue keep processing.
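
A rough sketch of that workaround as a reusable wrapper is shown below. The `JobStatus` values and the `Handler` shape are assumptions made for the example, and Immich's real types may differ; `catchAndFail` is a hypothetical name.

```typescript
// Assumed status values and handler shape, for illustration only.
type JobStatus = 'success' | 'failed';
type Handler<T> = (payload: T) => Promise<JobStatus>;

// Wraps a handler so that a thrown error becomes a 'failed' status
// instead of escaping the handler.
function catchAndFail<T>(
  handler: Handler<T>,
  log: (message: string) => void = console.error,
): Handler<T> {
  return async (payload: T): Promise<JobStatus> => {
    try {
      return await handler(payload);
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error);
      log(`job failed, continuing with the queue: ${reason}`);
      return 'failed';
    }
  };
}
```

This hides the symptom rather than fixing the cause: faces whose jobs fail this way stay unprocessed until they are queued again.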


@lokorel commented on GitHub (Nov 1, 2025):

Hi, I have the same problem with Recognize Faces and CONNECTION_DESTROYED messages in the log. It's strange that the issue was closed; the problem still occurs on the latest versions.


@haoxi911 commented on GitHub (Nov 1, 2025):

@lokorel FYI - I ended up separating immich-server and immich-microservice into two containers. Whenever this connection destroyed error is raised, I restart the microservice container with docker compose. This keeps the API server online while letting the background task queue keep moving.

It would be great if someone could figure out the root cause of this connection destroyed error.


@maldimirov commented on GitHub (Jan 5, 2026):

Had the same issue with ~105k Facial Recognition jobs stuck for a few weeks.
I just updated all the docker containers to their latest versions, including Immich 2.4.1, and the queue went down to 0. I can't say whether it was the container restart or something specific to the new version, but the update is worth a try.


@guwidoe commented on GitHub (Jan 15, 2026):

@haoxi911 @maldimirov @lokorel @pig-sky

To everyone having this issue, feel free to try out this tool. I also found the in-built admin tools quite lacking in terms of job management and some other stuff, therefore I have decided to build this little "add-on":
https://github.com/guwidoe/immich-admintools

Feel free to raise issues and submit feature requests there.

Reference: immich-app/immich#7272