The openvino variant of the machine learning container responds differently to probes #2971

Closed
opened 2026-02-05 07:15:27 +03:00 by OVERLORD · 14 comments
Owner

Originally created by @djjudas21 on GitHub (Apr 19, 2024).

Originally assigned to: @mertalev on GitHub.

The bug

I have been using Immich with Kubernetes without problems, via the Helm chart. This week I have been working on enabling GPU support.

I noticed that a deployment of the image immich-app/immich-machine-learning:v1.101.0 works completely fine, but when I switch to immich-app/immich-machine-learning:v1.101.0-openvino with no other changes, that container gets into CrashLoopBackOff after the model is loaded, because the liveness probes fail and the container gets repeatedly restarted.

This issue is specific to Kubernetes, but it clearly demonstrates that there is some kind of difference in behaviour between the standard and openvino variants of the image, so I feel this issue belongs here.

The OS that Immich Server is running on

Kubernetes

Version of Immich Server

v1.101.0

Version of Immich Mobile App

1.101.0 build.147

Platform with the issue

  • [x] Server
  • [ ] Web
  • [ ] Mobile

Your docker-compose.yml content

machine-learning:
  enabled: true
  image:
    tag: v1.101.0-openvino
  env:
    TRANSFORMERS_CACHE: /cache
  resources:
    requests:
      memory: 1500Mi
      cpu: 1
      gpu.intel.com/i915: 1
    limits:
      gpu.intel.com/i915: 1

Your .env content

N/A

Reproduction steps

1. Make a deployment of the openvino machine-learning image
2. Everything is fine
3. Upload an image to trigger the ML container to do something
4. ML container loads a model, stops responding to probes
5. Kubernetes restarts the container
6. GOTO 3

Relevant log output

[jonathan@poseidon immich]$ kubectl logs -f immich-machine-learning-696fc76fff-hg5bx
[04/19/24 13:45:03] INFO     Starting gunicorn 21.2.0                           
[04/19/24 13:45:03] INFO     Listening at: http://[::]:3003 (9)                 
[04/19/24 13:45:03] INFO     Using worker: app.config.CustomUvicornWorker       
[04/19/24 13:45:03] INFO     Booting worker with pid: 13                        
[04/19/24 13:45:06] INFO     Started server process [13]                        
[04/19/24 13:45:06] INFO     Waiting for application startup.                   
[04/19/24 13:45:06] INFO     Created in-memory cache with unloading after 300s  
                             of inactivity.                                     
[04/19/24 13:45:06] INFO     Initialized request thread pool with 4 threads.    
[04/19/24 13:45:06] INFO     Application startup complete.                      
## Everything is fine up until now. Here I upload a new image to Immich and
## as soon as the model is loaded, the container stops responding to probes, so
## Kubernetes kills it.
[04/19/24 13:48:10] INFO     Setting 'ViT-B-32__openai' execution providers to  
                             ['OpenVINOExecutionProvider',                      
                             'CPUExecutionProvider'], in descending order of    
                             preference                                         
[04/19/24 13:48:10] INFO     Setting 'buffalo_l' execution providers to         
                             ['OpenVINOExecutionProvider',                      
                             'CPUExecutionProvider'], in descending order of    
                             preference                                         
[04/19/24 13:48:10] INFO     Loading clip model 'ViT-B-32__openai' to memory    
## Container is killed shortly after here.



Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               3m41s               default-scheduler        Successfully assigned immich/immich-machine-learning-696fc76fff-hg5bx to kube04
  Normal   SuccessfulAttachVolume  3m40s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-0ccd4181-398b-4d3c-ad97-bc876159b7c4"
  Warning  Unhealthy               3m35s               kubelet                  Readiness probe failed: Get "http://10.1.102.109:3003/ping": dial tcp 10.1.102.109:3003: connect: connection refused
  Warning  Unhealthy               5s (x3 over 25s)    kubelet                  Liveness probe failed: Get "http://10.1.102.109:3003/ping": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing                 5s                  kubelet                  Container immich-machine-learning failed liveness probe, will be restarted
  Warning  Unhealthy               4s                  kubelet                  Readiness probe failed: Get "http://10.1.102.109:3003/ping": read tcp 192.168.0.55:41504->10.1.102.109:3003: read: connection reset by peer
  Normal   Pulled                  4s (x2 over 3m35s)  kubelet                  Container image "ghcr.io/immich-app/immich-machine-learning:v1.101.0-openvino" already present on machine
  Normal   Created                 4s (x2 over 3m35s)  kubelet                  Created container immich-machine-learning
  Normal   Started                 4s (x2 over 3m35s)  kubelet                  Started container immich-machine-learning
  Warning  Unhealthy               1s (x7 over 3m33s)  kubelet                  Readiness probe failed: Get "http://10.1.102.109:3003/ping": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

Additional information

No response


@bo0tzz commented on GitHub (Apr 19, 2024):

I think the probes for the ML deployment don't quite work right anyways, and if a model takes too long to load (or if it's being downloaded at startup through the preload env var) I've had my non-openvino instances occasionally get killed as well. It could be that the openvino image just takes slightly longer to load things, thus exposing that issue?


@djjudas21 commented on GitHub (Apr 19, 2024):

I'll have a play with the probes and see if I can get it more stable. The default values look pretty aggressive IMO. Timeout of 1s, period of 10s. If a probe fails, it gets retried immediately - so the liveness probe could actually kill the container after just 3 seconds, which is insane.

    livenessProbe:
      failureThreshold: 3
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 1
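
For reference, a more forgiving configuration might look something like this (illustrative values only, not the chart's defaults):

    livenessProbe:
      failureThreshold: 6     # tolerate up to a minute of failed probes
      httpGet:
        path: /ping
        port: http
        scheme: HTTP
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 5       # allow slow responses while a model compiles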

@djjudas21 commented on GitHub (Apr 19, 2024):

Eww, they're hard-coded... https://github.com/immich-app/immich-charts/blob/main/charts/immich/templates/machine-learning.yaml#L28


@djjudas21 commented on GitHub (Apr 19, 2024):

OK so I manually configured the probes to have longer timeouts and a greater tolerance of failed probes. It didn't get killed by Kubernetes, but it crashed in a nasty way. I'm well out of my depth with this.

[jonathan@poseidon immich]$ kubectl logs -f immich-machine-learning-78f87fcc4b-5s84r
[04/19/24 15:18:49] INFO     Starting gunicorn 21.2.0                           
[04/19/24 15:18:49] INFO     Listening at: http://[::]:3003 (10)                
[04/19/24 15:18:49] INFO     Using worker: app.config.CustomUvicornWorker       
[04/19/24 15:18:49] INFO     Booting worker with pid: 14                        
[04/19/24 15:18:52] INFO     Started server process [14]                        
[04/19/24 15:18:52] INFO     Waiting for application startup.                   
[04/19/24 15:18:52] INFO     Created in-memory cache with unloading after 300s  
                             of inactivity.                                     
[04/19/24 15:18:52] INFO     Initialized request thread pool with 4 threads.    
[04/19/24 15:18:52] INFO     Application startup complete.                      
[04/19/24 15:18:54] INFO     Setting 'ViT-B-32__openai' execution providers to  
                             ['OpenVINOExecutionProvider',                      
                             'CPUExecutionProvider'], in descending order of    
                             preference                                         
[04/19/24 15:18:54] INFO     Loading clip model 'ViT-B-32__openai' to memory    
[04/19/24 15:18:54] INFO     Setting 'buffalo_l' execution providers to         
                             ['OpenVINOExecutionProvider',                      
                             'CPUExecutionProvider'], in descending order of    
                             preference                                         
[04/19/24 15:20:19] INFO     Loading facial recognition model 'buffalo_l' to    
                             memory                                             
[04/19/24 15:21:11] ERROR    Exception in ASGI application                      
                                                                                
                             ╭─────── Traceback (most recent call last) ───────╮
                             │ /usr/src/app/main.py:118 in predict             │
                             │                                                 │
                             │   115 │                                         │
                             │   116 │   model = await load(await model_cache. │
                             │       ttl=settings.model_ttl, **kwargs))        │
                             │   117 │   model.configure(**kwargs)             │
                             │ ❱ 118 │   outputs = await run(model.predict, in │
                             │   119 │   return ORJSONResponse(outputs)        │
                             │   120                                           │
                             │   121                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:125 in run                 │
                             │                                                 │
                             │   122 async def run(func: Callable[..., Any], i │
                             │   123 │   if thread_pool is None:               │
                             │   124 │   │   return func(inputs)               │
                             │ ❱ 125 │   return await asyncio.get_running_loop │
                             │   126                                           │
                             │   127                                           │
                             │   128 async def load(model: InferenceModel) ->  │
                             │                                                 │
                             │ /usr/lib/python3.10/concurrent/futures/thread.p │
                             │ y:58 in run                                     │
                             │                                                 │
                             │ /usr/src/app/models/base.py:59 in predict       │
                             │                                                 │
                             │    56 │   │   self.load()                       │
                             │    57 │   │   if model_kwargs:                  │
                             │    58 │   │   │   self.configure(**model_kwargs │
                             │ ❱  59 │   │   return self._predict(inputs)      │
                             │    60 │                                         │
                             │    61 │   @abstractmethod                       │
                             │    62 │   def _predict(self, inputs: Any) -> An │
                             │                                                 │
                             │ /usr/src/app/models/facial_recognition.py:49 in │
                             │ _predict                                        │
                             │                                                 │
                             │   46 │   │   else:                              │
                             │   47 │   │   │   decoded_image = image          │
                             │   48 │   │   assert is_ndarray(decoded_image, n │
                             │ ❱ 49 │   │   bboxes, kpss = self.det_model.dete │
                             │   50 │   │   if bboxes.size == 0:               │
                             │   51 │   │   │   return []                      │
                             │   52 │   │   assert is_ndarray(kpss, np.float32 │
                             │                                                 │
                             │ /opt/venv/lib/python3.10/site-packages/insightf │
                             │ ace/model_zoo/retinaface.py:224 in detect       │
                             │                                                 │
                             │   221 │   │   det_img = np.zeros( (input_size[1 │
                             │   222 │   │   det_img[:new_height, :new_width,  │
                             │   223 │   │                                     │
                             │ ❱ 224 │   │   scores_list, bboxes_list, kpss_li │
                             │   225 │   │                                     │
                             │   226 │   │   scores = np.vstack(scores_list)   │
                             │   227 │   │   scores_ravel = scores.ravel()     │
                             │                                                 │
                             │ /opt/venv/lib/python3.10/site-packages/insightf │
                             │ ace/model_zoo/retinaface.py:152 in forward      │
                             │                                                 │
                             │   149 │   │   kpss_list = []                    │
                             │   150 │   │   input_size = tuple(img.shape[0:2] │
                             │   151 │   │   blob = cv2.dnn.blobFromImage(img, │
                             │       (self.input_mean, self.input_mean, self.i │
                             │ ❱ 152 │   │   net_outs = self.session.run(self. │
                             │   153 │   │                                     │
                             │   154 │   │   input_height = blob.shape[2]      │
                             │   155 │   │   input_width = blob.shape[3]       │
                             │                                                 │
                             │ /opt/venv/lib/python3.10/site-packages/onnxrunt │
                             │ ime/capi/onnxruntime_inference_collection.py:22 │
                             │ 0 in run                                        │
                             │                                                 │
                             │    217 │   │   if not output_names:             │
                             │    218 │   │   │   output_names = [output.name  │
                             │    219 │   │   try:                             │
                             │ ❱  220 │   │   │   return self._sess.run(output │
                             │    221 │   │   except C.EPFail as err:          │
                             │    222 │   │   │   if self._enable_fallback:    │
                             │    223 │   │   │   │   print(f"EP Error: {err!s │
                             ╰─────────────────────────────────────────────────╯
                             RuntimeException: [ONNXRuntimeError] : 6 :         
                             RUNTIME_EXCEPTION : Encountered unknown exception  
                             in Run()                                           

@mertalev commented on GitHub (Apr 19, 2024):

For the first issue, models take some time to load the first time when using OpenVINO, since they're compiled to that format. But everything is supposed to happen in a separate thread pool without blocking the main thread, so I'm not sure what's causing this to block.

For the second issue, OpenVINO is buggy and doesn't currently work with facial recognition, see #8226.


@djjudas21 commented on GitHub (Apr 19, 2024):

Thanks for clarifying about the OpenVINO bug, I'm following that one now.

The startup time for the machine learning container isn't straightforward though. If it immediately loaded the models when the container started, that would be fine because you can define a startupProbe in Kubernetes to protect the container during this period. But instead, the container starts up quickly and the model is not loaded until a request for ML comes in from Immich, which may happen a long time after the container has started, so there is no way to work around this in Kubernetes, other than to make the probes very generous.

I'm no software engineer, but it does sound odd that the main thread gets blocked.

In my case, I have no choice but to disable GPU acceleration for ML if OpenVINO doesn't support it, but I'll leave this issue open because it sounds like there is some investigation to be done with the thread pools, as @bo0tzz said that also affects the non-openvino container.

Thanks for your help, @mertalev. Have a nice weekend!


@mertalev commented on GitHub (Apr 19, 2024):

There are envs to preload certain models if it helps: setting MACHINE_LEARNING_PRELOAD__CLIP=ViT-B-32__openai and MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION=buffalo_l will eagerly load those models at startup without waiting for a request.
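
Applied to the Helm values snippet earlier in the thread, the preload settings would look something like this (a sketch; the variable names come from the comment above, the structure from the reporter's values):

    machine-learning:
      env:
        TRANSFORMERS_CACHE: /cache
        MACHINE_LEARNING_PRELOAD__CLIP: ViT-B-32__openai
        MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION: buffalo_l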


@mertalev commented on GitHub (Jun 19, 2024):

I think this is because of the Python GIL. When it compiles to OpenVINO, it holds onto the GIL and prevents other threads from executing. That means it can't respond to probes during that time. Not sure what I can do about that short of putting it in a subprocess or something.
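
The subprocess idea can be sketched as follows (hypothetical names throughout; this is not Immich's actual code). A process pool gives the compilation step its own interpreter, and therefore its own GIL, so the server process's event loop can keep answering /ping while the model compiles:

```python
# Sketch only: offload a GIL-holding model load to a subprocess so the
# asyncio event loop (which serves /ping) stays responsive.
import asyncio
from concurrent.futures import ProcessPoolExecutor


def compile_model(model_name: str) -> str:
    # Placeholder for the GIL-holding OpenVINO compilation step.
    # Running in a child process, it cannot starve the server's threads.
    return f"{model_name}-compiled"


async def load_model(model_name: str) -> str:
    loop = asyncio.get_running_loop()
    with ProcessPoolExecutor(max_workers=1) as pool:
        # The await yields control back to the event loop while the
        # child process does the heavy work.
        return await loop.run_in_executor(pool, compile_model, model_name)


async def main() -> None:
    result = await load_model("ViT-B-32__openai")
    print(result)  # prints "ViT-B-32__openai-compiled"


if __name__ == "__main__":
    asyncio.run(main())
```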


@djjudas21 commented on GitHub (Aug 6, 2024):

I can confirm this is still a problem. With #8226 now resolved, I had another crack at enabling hardware acceleration for my ML. I'm using the immich-machine-learning:main-openvino image. It starts up properly and appears stable at idle, but once the first ML job is started, the container stops responding and gets killed.

[jonathan@latitude immich]$ kubectl logs -f immich-machine-learning-66cbb9cbb8-n4wqt
[08/06/24 16:23:27] INFO     Starting gunicorn 22.0.0                           
@djjudas21 commented on GitHub (Aug 6, 2024):

I can confirm this is still a problem. With #8226 now resolved, I had another crack at enabling hardware acceleration for my ML. I'm using the `immich-machine-learning:main-openvino` image. It starts up properly and appears stable at idle, but once the first ML job is started, the container stops responding and gets killed.

[jonathan@latitude immich]$ kubectl logs -f immich-machine-learning-66cbb9cbb8-n4wqt
[08/06/24 16:23:27] INFO     Starting gunicorn 22.0.0
[08/06/24 16:23:27] INFO     Listening at: http://[::]:3003 (9)
[08/06/24 16:23:27] INFO     Using worker: app.config.CustomUvicornWorker
[08/06/24 16:23:27] INFO     Booting worker with pid: 10
[08/06/24 16:23:32] INFO     Started server process [10]
[08/06/24 16:23:32] INFO     Waiting for application startup.
[08/06/24 16:23:32] INFO     Created in-memory cache with unloading after 300s
                             of inactivity.
[08/06/24 16:23:32] INFO     Initialized request thread pool with 4 threads.
[08/06/24 16:23:32] INFO     Application startup complete.
[08/06/24 16:58:04] INFO     Loading visual model 'ViT-B-32__openai' to memory
[08/06/24 16:58:04] INFO     Setting execution providers to
                             ['OpenVINOExecutionProvider',
                             'CPUExecutionProvider'], in descending order of
                             preference

Events:
  Type     Reason     Age                From     Message
  ----     ------     ----               ----     -------
  Normal   Created    31s (x2 over 35m)  kubelet  Created container immich-machine-learning
  Warning  Unhealthy  31s (x3 over 51s)  kubelet  Liveness probe failed: Get "http://10.1.199.105:3003/ping": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
  Normal   Killing    31s                kubelet  Container immich-machine-learning failed liveness probe, will be restarted
  Warning  Unhealthy  31s                kubelet  Readiness probe failed: Get "http://10.1.199.105:3003/ping": read tcp 192.168.0.57:59722->10.1.199.105:3003: read: connection reset by peer
  Normal   Pulled     31s                kubelet  Container image "ghcr.io/immich-app/immich-machine-learning:main-openvino" already present on machine
  Normal   Started    30s (x2 over 35m)  kubelet  Started container immich-machine-learning
  Warning  Unhealthy  30s (x3 over 35m)  kubelet  Readiness probe failed: Get "http://10.1.199.105:3003/ping": dial tcp 10.1.199.105:3003: connect: connection refused
  Warning  Unhealthy  28s (x5 over 35m)  kubelet  Readiness probe failed: Get "http://10.1.199.105:3003/ping": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

@djjudas21 commented on GitHub (Aug 6, 2024):

Starting up with `MACHINE_LEARNING_PRELOAD__CLIP=ViT-B-32__openai` and `MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION=buffalo_l` does indeed load the models up front, but that just causes the container to fail its probes earlier.

@mertalev commented on GitHub (Aug 6, 2024):

The models only take that long the first time they're compiled. You can preload them and set a really high timeout on the probes to give the compilation time to finish.
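
If the chart exposed it, the idiomatic Kubernetes way to budget for this one-off compilation would be a `startupProbe`: the kubelet allows up to `failureThreshold × periodSeconds` before liveness checks take over. A sketch of such a probe spec, assuming the `/ping` endpoint and port 3003 seen in the logs above; the threshold numbers are illustrative, not tested:

```yaml
# Hypothetical probe spec for the machine-learning container.
startupProbe:
  httpGet:
    path: /ping
    port: 3003
  periodSeconds: 10
  failureThreshold: 60   # tolerates up to 10 minutes of first-time model compilation
livenessProbe:
  httpGet:
    path: /ping
    port: 3003
  periodSeconds: 10
  timeoutSeconds: 5
```

Once the startup probe has succeeded once, the normal liveness probe applies, so steady-state failure detection stays fast.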

@djjudas21 commented on GitHub (Aug 6, 2024):

Good idea. Unfortunately that brings me to a second issue with the Helm chart, where [the `startupProbe` is hardcoded to false](https://github.com/immich-app/immich-charts/blob/main/charts/immich/templates/machine-learning.yaml#L17), but I have added `initialDelaySeconds` to the liveness and readiness probes.

Here's my full, working ML block for a Kubernetes deployment via Helm chart, because examples/documentation seem a bit lacking for this 🙂

machine-learning:
  enabled: true
  image:
    # Specify OpenVINO image variant
    tag: main-openvino
  env:
    TRANSFORMERS_CACHE: /cache
    # Load ML models up front
    MACHINE_LEARNING_PRELOAD__CLIP: ViT-B-32__openai
    MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION: buffalo_l
  resources:
    requests:
      memory: 1500Mi
      cpu: 1
      gpu.intel.com/i915: 1
    limits:
      gpu.intel.com/i915: 1
  # Override probes to allow slow ML startup
  probes:
    liveness:
      spec:
        initialDelaySeconds: 120
    readiness:
      spec:
        initialDelaySeconds: 120
  persistence:
    cache:
      enabled: true
      size: 10Gi
      # Optional: Set this to pvc to avoid downloading the ML models every start.
      type: pvc
      accessMode: ReadWriteMany
      storageClass: truenas

@hranicka commented on GitHub (Apr 23, 2025):

Having the same issue. Probes are not responding and the pods are being shut down. Using the official Helm chart and image v1.131.3-openvino.

As a workaround, after a deployment, I had to patch the deployment:

kubectl patch deployment immich-machine-learning -n immich \
  --type json \
  -p='[
    {"op":"remove","path":"/spec/template/spec/containers/0/livenessProbe"},
    {"op":"remove","path":"/spec/template/spec/containers/0/readinessProbe"},
    {"op":"remove","path":"/spec/template/spec/containers/0/startupProbe"}
  ]'

Setting this in values.yaml did not help:

  image:
    repository: ghcr.io/immich-app/immich-machine-learning
    tag: v1.131.3-openvino

  probes:
    liveness:
      enabled: false
    readiness:
      enabled: false
    startup:
      enabled: false
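
A gentler alternative to the JSON patch above, which deletes the probes outright, would be to lengthen them instead so slow model compilation doesn't trigger restarts. A sketch as a strategic-merge patch file (the container name matches the events output earlier in this thread; the timeout values are illustrative assumptions, not tested against this chart):

```yaml
# probe-patch.yaml — hypothetical; apply with:
#   kubectl patch deployment immich-machine-learning -n immich --patch-file probe-patch.yaml
spec:
  template:
    spec:
      containers:
        - name: immich-machine-learning
          livenessProbe:
            timeoutSeconds: 30
            failureThreshold: 10
          readinessProbe:
            timeoutSeconds: 30
            failureThreshold: 10
```

This keeps failure detection in place while giving the container roughly `failureThreshold × periodSeconds` of grace before it is killed.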

@jrasm91 commented on GitHub (Sep 19, 2025):

Sounds like this issue is essentially resolved, and that potentially a separate issue should be opened in https://github.com/immich-app/immich-charts for probe configuration or default values for the machine learning pod.

Reference: immich-app/immich#2971