[BUG] immich_microservices jobs handler error. #1538

Closed
opened 2026-02-05 02:17:09 +03:00 by OVERLORD · 19 comments

Originally created by @davidpan on GitHub (Oct 31, 2023).

The bug

immich_microservices jobs handler error.

There are two sources of images: directly uploaded photos and an External Library. The External Library is about 500 GB and the directly uploaded photos total about 50 GB.

  1. Job processing fails whether immich_machine_learning runs locally or on a remote host. The error message appears only on the microservices host; the immich_machine_learning host logs nothing.
  2. The RECOGNIZE FACES and ENCODE CLIP jobs, both of which I need, report errors.
  3. After configuring the URL in Machine Learning Settings, restarting all the servers, and starting the missing jobs again, the same error occurs.
  4. In a local test environment using the official compose file, I verified that immich_machine_learning works properly, then exposed port 3003 and configured that URL on the server: same error.

The OS that Immich Server is running on

Ubuntu 22.04.3 LTS

Version of Immich Server

v1.83.0

Version of Immich Mobile App

v1.83.0

Platform with the issue

  • [x] Server
  • [ ] Web
  • [ ] Mobile

Your docker-compose.yml content

https://github.com/immich-app/immich/releases/latest/download/docker-compose.yml

Your .env content

https://github.com/immich-app/immich/releases/latest/download/example.env

Reproduction steps

...

Additional information

error log:
[Nest] 7 - 10/31/2023, 1:20:33 AM ERROR [JobService] Object:
{
"id": "f631de14-e3a6-41e1-92df-4f47ae9138be"
}
[Nest] 7 - 10/31/2023, 1:20:33 AM ERROR [JobService] Unable to run job handler (recognizeFaces/recognize-faces): Error: Request for facial recognition failed with status 404: Not Found
[Nest] 7 - 10/31/2023, 1:20:33 AM ERROR [JobService] Error: Request for facial recognition failed with status 404: Not Found
at MachineLearningRepository.post (/usr/src/app/dist/infra/repositories/machine-learning.repository.js:29:19)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async PersonService.handleRecognizeFaces (/usr/src/app/dist/domain/person/person.service.js:208:23)
at async /usr/src/app/dist/domain/job/job.service.js:108:37
at async Worker.processJob (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:350:28)
at async Worker.retryIfFailed (/usr/src/app/node_modules/bullmq/dist/cjs/classes/worker.js:535:24)
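
The stack trace shows that the microservices container received an HTTP response (status 404) rather than a connection error. That suggests something answered on the configured host and port but did not recognize the requested path. Below is a minimal sketch for telling those two cases apart, assuming Python 3.9+ is available wherever you run it; `probe_ml_endpoint` is a hypothetical helper, not part of Immich.

```python
import urllib.error
import urllib.request


def probe_ml_endpoint(url: str, timeout: float = 5.0) -> tuple[str, str]:
    """GET `url` and classify the outcome.

    Returns ("ok", "<status>") on success, ("http-error", "<code> <reason>")
    when a server answered with an error status such as 404, or
    ("unreachable", "<reason>") when no HTTP response came back at all.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return ("ok", str(resp.status))
    except urllib.error.HTTPError as err:
        # Some HTTP server answered, so host and port are reachable;
        # the path (or whatever is listening on that port) is the suspect.
        return ("http-error", f"{err.code} {err.reason}")
    except OSError as err:
        # URLError, refused connections, DNS failures and timeouts all derive
        # from OSError: no HTTP response was received.
        reason = getattr(err, "reason", err)
        return ("unreachable", str(reason))
```

For example, `probe_ml_endpoint("http://immich-machine-learning:3003/")` (hypothetical host name) returning `http-error` points at the URL path, while `unreachable` points at networking.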

@alextran1502 commented on GitHub (Oct 31, 2023):

I'm pretty sure you are encountering this issue: https://github.com/immich-app/immich/issues/4117. You can find the fix in that issue.

@davidpan commented on GitHub (Oct 31, 2023):

It's not that. See the test procedure in point 4: machine learning works fine in the test environment, but using that same instance remotely from the server produces the same error. I have checked the files under /cache/.

> I'm pretty sure you are encountering this issue #4117. You can find the fix in the issue

@davidpan commented on GitHub (Oct 31, 2023):

@alextran1502 Please see the chart below:
[screenshot]

@alextran1502 commented on GitHub (Oct 31, 2023):

@davidpan where are these files located?

@alextran1502 commented on GitHub (Oct 31, 2023):

Can you help by grabbing the log from the machine learning container?

@davidpan commented on GitHub (Oct 31, 2023):

> Can you help grabbing the log from the machine learning container?

[10/31/23 00:43:29] INFO Starting gunicorn 21.2.0
[10/31/23 00:43:29] INFO Listening at: http://0.0.0.0:3003 #(9)
[10/31/23 00:43:29] INFO Using worker: uvicorn.workers.UvicornWorker
[10/31/23 00:43:29] INFO Booting worker with pid: 10
[10/31/23 00:43:48] INFO Created in-memory cache with unloading disabled.
[10/31/23 00:43:48] INFO Initialized request thread pool with 12 threads.

The following logs only appeared when I tried the local jobs.

[10/31/23 01:59:25] INFO Loading clip model 'ViT-B-32::openai'
[10/31/23 01:59:25] INFO Loading image classification model 'microsoft/resnet-50'
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
Could not find image processor class in the image processor config or the model config. Loading based on pattern matching with the model's feature extractor configuration.
/opt/venv/lib/python3.11/site-packages/transformers/models/convnext/feature_extraction_convnext.py:28: FutureWarning: The class ConvNextFeatureExtractor is deprecated and will be removed in version 5 of Transformers. Please use ConvNextImageProcessor instead.
warnings.warn(
[10/31/23 02:03:15] INFO Loading facial recognition model 'buffalo_l'

@jrasm91 commented on GitHub (Oct 31, 2023):

What url are you using for the machine learning url? Can you connect to it from inside the immich microservices container?
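
One way to answer the second question at the TCP level is sketched below. It assumes a Python interpreter is available in the container you test from; the server image is Node-based and may not ship one, but a helper container attached to the same Docker network sees roughly the same DNS and routing. `can_connect` is a hypothetical helper.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection resolves the name and tries each returned address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures (socket.gaierror), refused connections and timeouts.
        return False
```

A successful connect only proves that something is listening on that port; it says nothing about whether the HTTP paths the server requests exist there.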

@jrasm91 commented on GitHub (Oct 31, 2023):

If you are using the IP of the host, that does not resolve from inside a docker container.

You should pass the compose service name, container name, or add extra configuration to pass the docker gateway IP to the container.
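
To check which of those names Docker's embedded DNS actually resolves from inside a container, a small sketch follows. The candidate names `immich-machine-learning` (compose service name) and `immich_machine_learning` (container name) are assumed from the official docker-compose.yml and may differ in your setup; `resolve_all` is a hypothetical helper.

```python
import socket


def resolve_all(names: list[str]) -> dict[str, list[str]]:
    """Map each hostname to the IP addresses it resolves to ([] if it does not resolve)."""
    results: dict[str, list[str]] = {}
    for name in names:
        try:
            infos = socket.getaddrinfo(name, None, proto=socket.IPPROTO_TCP)
        except socket.gaierror:
            results[name] = []
            continue
        # info[4] is the sockaddr; its first element is the IP string for
        # both IPv4 and IPv6. dict.fromkeys removes duplicates, keeping order.
        results[name] = list(dict.fromkeys(info[4][0] for info in infos))
    return results
```

Run inside the microservices container, `resolve_all(["immich-machine-learning", "immich_machine_learning"])` shows which name can be used in the machine learning URL; an empty list means that name does not resolve from there.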

@davidpan commented on GitHub (Oct 31, 2023):

> @davidpan where are these file located at?

/media/usb/immich. The server and microservices containers have it configured and mounted, and it is accessible inside the virtual machine.

Am I misunderstanding your question? Do you mean the location of the model-related files?

On the immich_machine_learning server, /cache is loaded from the volume immich_model-cache, which lives at a different location on the Docker host in each Docker server environment.

[screenshot]

I don't think the model location is the problem because, in a local test environment, the locally launched Immich runs the machine learning jobs in question properly. I simply exposed the immich_machine_learning port of that local test environment and pointed the production environment on the remote server at it.

@davidpan commented on GitHub (Oct 31, 2023):

> If you are using the IP of the host, that does not resolve from inside a docker container.
>
> You should have to pass the compose service name, container name, or add extra configuration to load the docker gateway IP to the container.

Access test from within the microservices container:

[screenshot]

Setup:
[screenshot]

@davidpan commented on GitHub (Oct 31, 2023):

> What url are you using for the machine learning url? Can you connect to it from inside the immich microservices container?

Yes. Before I upgraded, it worked fine on v1.82.

@jrasm91 commented on GitHub (Oct 31, 2023):

> > What url are you using for the machine learning url? Can you connect to it from inside the immich microservices container?
>
> yes,When not upgraded yet, it still works fine at v1.82.

What do you mean by this? It is a problem in 1.83 but not 1.82?

@davidpan commented on GitHub (Oct 31, 2023):

> > > What url are you using for the machine learning url? Can you connect to it from inside the immich microservices container?
> >
> > yes,When not upgraded yet, it still works fine at v1.82.
>
> What do you mean by this? It is a problem in 1.83 but not 1.82?

I'm not sure whether the upgrade is the cause. I started on v1.82 and was still in the middle of processing photos when I saw that v1.83 had been released, so I upgraded.

The correlation is unclear, although I discovered the issue right after the upgrade.

While troubleshooting, I rebuilt the system on a local computer and imported a small portion of the photos, and machine learning worked fine. I then exposed port 3003 on that machine for the server to use, and the immich_microservices host on the server reported the same error.

@davidpan commented on GitHub (Oct 31, 2023):

I also tried clearing Redis manually to rule out leftover historical jobs:

redis-cli flushall

@davidpan commented on GitHub (Nov 1, 2023):

@alextran1502 @jrasm91

Thanks for the previous responses.

After rebuilding the server and local environment, test verification confirmed that immich_machine_learning can now recognize faces normally. Setting the corresponding IP on the server also allows the remote machine_learning service to recognize faces normally.

However, if I stop the immich_machine_learning VM on the server while the remote machine learning host is configured and working, the active machine learning host's CPU load drops to zero and network transfer stops. Restarting the immich_machine_learning VM makes the remote service resume.

This can be replicated consistently.

@jrasm91 commented on GitHub (Nov 1, 2023):

This cannot be an immich bug or issue. Immich simply sends requests to the IP/hostname provided for the machine learning endpoint. If turning off an "unrelated" container changes the behavior/availability/reachability of the target endpoint then you have some misconfiguration in your system.

@mw2c commented on GitHub (Feb 7, 2024):

I faced the same issue and solved it by removing the "/" from the end of the server URL.
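
A plausible reading of this fix, inferred from the comments rather than confirmed from the Immich source, is that the server appends an endpoint path directly to the configured base URL. A trailing "/" then yields a double slash that the machine learning service does not route, which would explain the 404. The sketch below illustrates that failure mode and a normalizing join; `/predict` and `ml-host` are hypothetical, and both helpers are illustrative only.

```python
def naive_join(base: str, path: str) -> str:
    # Plain concatenation: a trailing "/" on the base produces "//" in the path.
    return base + path


def join_ml_url(base: str, path: str) -> str:
    """Append an endpoint path to a base URL with exactly one "/" between them."""
    return base.rstrip("/") + "/" + path.lstrip("/")
```

With a base of `http://ml-host:3003/`, `naive_join` produces `http://ml-host:3003//predict` (path `//predict`), while `join_ml_url` produces `http://ml-host:3003/predict`. Without the trailing slash both give the same result, which matches the reports that removing it was enough.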

@yuanmomo commented on GitHub (Mar 29, 2024):

> solved it by removing the "/" from the end of the server URL.

Thanks, this helps.

@tanmaychimurkar commented on GitHub (Apr 8, 2024):

> I faced the same issue and solved it by removing the "/" from the end of the server URL.

I had a similar issue, and this solved the same error reported in this thread. Thank you so much 👍🏻

Reference: immich-app/immich#1538