[BUG] Machinelearning crashing on k8s deployment v1.57.1 - v1.65.0 #862

Closed
opened 2026-02-04 23:06:59 +03:00 by OVERLORD · 28 comments

Originally created by @gcarrarom on GitHub (May 19, 2023).

The bug

There's a bug when running version 1.56.1 on Kubernetes with the official Helm chart:
zsh ⌁ klf immich-machine-learning-54b5766488-kx4b4
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 8, in <module>
    import onnxruntime
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import (
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: /opt/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-310-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/src/main.py", line 6, in <module>
    from insightface.app import FaceAnalysis
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 10, in <module>
    raise ImportError(
ImportError: Unable to import dependency onnxruntime.

The OS that Immich Server is running on

Kubernetes - k3s - MicroOS

Version of Immich Server

v1.56.1

Version of Immich Mobile App

N/A

Platform with the issue

  • [X] Server
  • [ ] Web
  • [ ] Mobile

Your docker-compose.yml content

postgresql:
  enabled: true
redis:
  enabled: true

typesense:
  enabled: true
  persistence:
    tsdata:
      enabled: true
      existingClaim: typesense-data

machine-learning:
  persistence:
    cache:
      enabled: true
      existingClaim: machinelearning-data
proxy:
  ingress:
    main:
      enabled: true
      ingressClassName: nginx
      annotations:
        nginx.ingress.kubernetes.io/proxy-body-size: "0"
        cert-manager.io/cluster-issuer: letsencrypt
      hosts:
        - host: my.domain.com
          paths:
            - path: "/"
      tls:
        - hosts:
            - my.domain.com
          secretName: my-domain-com
image:
  tag: v1.56.1
immich:
  persistence:
    library:
      existingClaim: photos

Your .env content

postgresql:
  enabled: true
redis:
  enabled: true

typesense:
  enabled: true
  persistence:
    tsdata:
      enabled: true
      existingClaim: typesense-data

machine-learning:
  persistence:
    cache:
      enabled: true
      existingClaim: machinelearning-data
proxy:
  ingress:
    main:
      enabled: true
      ingressClassName: nginx
      annotations:
        nginx.ingress.kubernetes.io/proxy-body-size: "0"
        cert-manager.io/cluster-issuer: letsencrypt
      hosts:
        - host: my.domain.com
          paths:
            - path: "/"
      tls:
        - hosts:
            - my.domain.com
          secretName: my-domain-com
image:
  tag: v1.56.1
immich:
  persistence:
    library:
      existingClaim: photos

Reproduction steps

1. Deploy the helm chart on that version
2. Wait for all pods to come up and machinelearning to crash.

Additional information

No response

OVERLORD added the 🗄️server label 2026-02-04 23:06:59 +03:00

@gcarrarom commented on GitHub (May 19, 2023):

Just to update: I've now rolled back to 1.56.0 and it's working flawlessly. It's probably a bug introduced on 1.56.1.


@alextran1502 commented on GitHub (May 19, 2023):

Hmm, from 1.56.0 to 1.56.1 we only changed the server and web-related code 🤔


@jrasm91 commented on GitHub (May 19, 2023):

There have been a few reports of related onnxruntime errors that were fixed by deleting the machine learning cache volume. Rolling back versions might have done that in your situation.


@gcarrarom commented on GitHub (May 19, 2023):

Great to know, I'll try pushing 1.56.1 again and clear the cache. I should report back in a few hours.


@gcarrarom commented on GitHub (May 19, 2023):

Odd. I just upgraded to 1.56.1 and still get the same error. I've removed the emptyDir cache folder and the error persists. I also tried creating it with another storage class, with the same result. Could it be something in another directory? Same error here:

zsh ⌁ klf immich-machine-learning-5d4b859887-zc27z 
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 8, in <module>
    import onnxruntime
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import (
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: /opt/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-310-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/src/main.py", line 6, in <module>
    from insightface.app import FaceAnalysis
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 10, in <module>
    raise ImportError(
ImportError: Unable to import dependency onnxruntime

@bo0tzz commented on GitHub (May 19, 2023):

Do you have SELinux enabled? From a bit of googling it seems like that could cause the error you're getting.
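
For anyone wanting to answer that question quickly, here is a minimal sketch of a check. It assumes the standard selinuxfs mount point `/sys/fs/selinux`, and the function name `selinux_mode` is made up for illustration. It is meant to be run on the node itself, since the mount may not be visible from inside a container.

```python
from pathlib import Path


def selinux_mode(enforce_path: str = "/sys/fs/selinux/enforce") -> str:
    """Return 'enforcing', 'permissive', or 'disabled' based on selinuxfs."""
    try:
        value = Path(enforce_path).read_text().strip()
    except FileNotFoundError:
        # selinuxfs is not mounted here, so SELinux is off or not visible.
        return "disabled"
    # The kernel exposes "1" for enforcing and "0" for permissive.
    return "enforcing" if value == "1" else "permissive"


if __name__ == "__main__":
    print(selinux_mode())
```

If this prints `enforcing` on the node running the pod, SELinux is at least a plausible suspect for the `Permission denied` in the traceback.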


@DrSpaldo commented on GitHub (May 20, 2023):

I've also had problems after updating to 1.56.1


@alextran1502 commented on GitHub (May 20, 2023):

#2487 should fix this I believe


@gcarrarom commented on GitHub (May 21, 2023):

Amazing! 1.56.2 fixed it! Thank you!


@gcarrarom commented on GitHub (May 23, 2023):

Sadly I need to reopen this bug for 1.57.1. Same error. Any ideas?


@alextran1502 commented on GitHub (May 24, 2023):

Can you try removing the model cache, starting up the pod, and letting it finish downloading the model before use?


@gcarrarom commented on GitHub (May 24, 2023):

So, I removed the files from the cache volume of the k8s deployment, and the same error happens with ephemeral storage. It seems odd to run into such errors even though there is no cache whatsoever...

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 8, in <module>
    import onnxruntime
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import (
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: /opt/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-310-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/src/main.py", line 6, in <module>
    from insightface.app import FaceAnalysis
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 10, in <module>
    raise ImportError(
ImportError: Unable to import dependency onnxruntime.

@gcarrarom commented on GitHub (Jun 5, 2023):

Just did the update to 1.60.0 and it's still running into the same issue.

This permission denied issue makes me think it might be the permissions of the downloaded files. I'll look into the user that downloads the modules and see if there's something going on there.
Never mind. The user seems to have all the permissions it needs. I'll try to debug more tonight.


@geraldwuhoo commented on GitHub (Jun 7, 2023):

I have been getting this issue for the past few weeks as well. My server is still on 1.55.1, the last working version for me.
I think @bo0tzz may be correct about SELinux permissions, as I do have SELinux enabled on my machines. What changed in between 1.55.1 and future versions that could cause this? Unfortunately disabling SELinux is not an option for me just to solve this one issue.


@gcarrarom commented on GitHub (Jun 16, 2023):

Same happening with v1.61.0:

zsh ⌁ kgp                                          
NAME                                       READY   STATUS             RESTARTS         AGE
immich-machine-learning-6978bfdbdf-z8www   0/1     CrashLoopBackOff   2 (25s ago)      70s
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/insightface/__init__.py", line 8, in <module>
    import onnxruntime
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import ExecutionMode  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: /opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-311-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/main.py", line 4, in <module>
    from cache import ModelCache
  File "/usr/src/app/cache.py", line 5, in <module>
    from models import get_model
  File "/usr/src/app/models.py", line 2, in <module>
    from insightface.app import FaceAnalysis
  File "/opt/venv/lib/python3.11/site-packages/insightface/__init__.py", line 10, in <module>
    raise ImportError(
ImportError: Unable to import dependency onnxruntime. 

@bo0tzz commented on GitHub (Jun 16, 2023):

> What changed in between 1.55.1 and future versions that could cause this?

v1.56.0 introduced face recognition, which I believe is what added the onnxruntime dependency.


@nohitme commented on GitHub (Jun 16, 2023):

I am seeing a different error message when starting immich-machine-learning container (v1.61.0):

python: can't open file '/usr/src/app/src/main.py': [Errno 2] No such file or directory

Is this a new issue?


@bo0tzz commented on GitHub (Jun 16, 2023):

@nohitme that is an unrelated issue. Please make sure you're using the latest image and docker-compose.yml, and open a support thread in Discord or the GitHub Discussions if you still have trouble.


@nohitme commented on GitHub (Jun 16, 2023):

Understand it could be a separate issue. I will verify it separately on the latest image (I am sure it was tho) and report it if it persists.

Thanks for the reply!


@gcarrarom commented on GitHub (Jun 19, 2023):

Interesting.. Freshly built container image for machine learning from the main branch:

zsh ⌁ docker run -it test                    
Traceback (most recent call last):
  File "/opt/venv/lib/python3.11/site-packages/insightface/__init__.py", line 8, in <module>
    import onnxruntime
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import ExecutionMode  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: /opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-311-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/main.py", line 5, in <module>
    from cache import ModelCache
  File "/usr/src/app/cache.py", line 5, in <module>
    from models import get_model
  File "/usr/src/app/models.py", line 2, in <module>
    from insightface.app import FaceAnalysis
  File "/opt/venv/lib/python3.11/site-packages/insightface/__init__.py", line 10, in <module>
    raise ImportError(
ImportError: Unable to import dependency onnxruntime. 

It's not k8s-specific then. I'll remove the multi-stage build to check whether something is missing or a permission mismatch is happening during the container build.
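
To narrow a reproduction like this down further, a small probe that imports `onnxruntime` directly (rather than going through `insightface`, which wraps the error) can help. This is only a sketch; the helper name `probe_import` is hypothetical and not part of Immich.

```python
import importlib


def probe_import(module_name: str) -> tuple[bool, str]:
    """Try to import a module and report the loader's own error message."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # Keep the exception type and message so the root cause stays visible.
        return False, f"{type(exc).__name__}: {exc}"
    return True, "imported"


if __name__ == "__main__":
    ok, detail = probe_import("onnxruntime")
    print(ok, detail)
```

Running it with the image's Python interpreter, for example with the entrypoint overridden, should show the same `cannot enable executable stack` message if the problem sits in the shared object rather than in the application code.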


@gcarrarom commented on GitHub (Jun 19, 2023):

Same error building with a simple pip install of the requirements. This is the container image I'm using:

FROM python:3.11.4-bullseye@sha256:5b401676aff858495a5c9c726c60b8b73fe52833e9e16eccdb59e93d52741727

ENV NODE_ENV=production \
  TRANSFORMERS_CACHE=/cache \
  PYTHONDONTWRITEBYTECODE=1 \
  PYTHONUNBUFFERED=1 \
  PATH="/opt/venv/bin:$PATH" \
  PYTHONPATH=`pwd` \
  PIP_NO_CACHE_DIR=true

WORKDIR /usr/src/app

COPY ./requirements.txt .
COPY app .
RUN pip install -r requirements.txt

ENTRYPOINT ["python", "main.py"]

It runs into the same problem. Here's the directory it's loading the library from; everything is owned by the root user:

zsh ⌁ docker run --entrypoint "" -it test bash     
root@41d3bfede331:/usr/src/app# cd /usr/local/lib/python3.11/site-packages/onnxruntime/capi/
root@41d3bfede331:/usr/local/lib/python3.11/site-packages/onnxruntime/capi# ls -al
total 14136
drwxr-xr-x. 1 root root      490 Jun 19 18:07 .
drwxr-xr-x. 1 root root      216 Jun 19 18:07 ..
-rw-r--r--. 1 root root      247 Jun 19 18:07 __init__.py
drwxr-xr-x. 1 root root      424 Jun 19 18:07 __pycache__
-rw-r--r--. 1 root root      406 Jun 19 18:07 _ld_preload.py
-rw-r--r--. 1 root root     1510 Jun 19 18:07 _pybind_state.py
-rwxr-xr-x. 1 root root    14216 Jun 19 18:07 libonnxruntime_providers_shared.so
-rw-r--r--. 1 root root     3965 Jun 19 18:07 onnxruntime_collect_build_info.py
-rw-r--r--. 1 root root    38714 Jun 19 18:07 onnxruntime_inference_collection.py
-rw-r--r--. 1 root root 14392120 Jun 19 18:07 onnxruntime_pybind11_state.cpython-311-x86_64-linux-gnu.so
-rw-r--r--. 1 root root     6237 Jun 19 18:07 onnxruntime_validation.py
drwxr-xr-x. 1 root root       44 Jun 19 18:07 training
root@41d3bfede331:/usr/local/lib/python3.11/site-packages/onnxruntime/capi# whoami
root

The onnxruntime_pybind11_state.cpython-311-x86_64-linux-gnu.so file is not executable though, that might be the problem.
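
As far as I understand, the file mode is probably not the deciding factor: shared objects loaded by the dynamic loader generally do not need the execute bit. The error text instead points at the library's `PT_GNU_STACK` program header asking for an executable stack, which the process is then not allowed to enable. A minimal sketch for checking that flag follows; the function name `requests_executable_stack` is made up, and the parser only reads the ELF header fields it needs.

```python
import struct
import sys

PT_GNU_STACK = 0x6474E551  # program header type that records stack permissions
PF_X = 0x1                 # "executable" bit in p_flags


def requests_executable_stack(data: bytes):
    """Return True/False for the PT_GNU_STACK exec flag, or None if absent."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                    # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"   # EI_DATA: 1 = little-endian
    if is_64:
        (e_phoff,) = struct.unpack_from(endian + "Q", data, 32)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 54)
        flags_offset = 4    # p_flags follows p_type in 64-bit program headers
    else:
        (e_phoff,) = struct.unpack_from(endian + "I", data, 28)
        e_phentsize, e_phnum = struct.unpack_from(endian + "HH", data, 42)
        flags_offset = 24   # p_flags comes after p_memsz in 32-bit headers
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from(endian + "I", data, base)
        if p_type == PT_GNU_STACK:
            (p_flags,) = struct.unpack_from(endian + "I", data, base + flags_offset)
            return bool(p_flags & PF_X)
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        print(requests_executable_stack(fh.read()))
```

Pointing it at the `onnxruntime_pybind11_state...so` path from the listing above would show whether this particular build asks for an executable stack. A `True` result would fit the SELinux theory better than a file-permission one.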


@gcarrarom commented on GitHub (Jun 19, 2023):

The only thing that I can see affecting this now is SELinux on the host running the container runtime for k3s. It makes me wonder what exactly this package is trying to access.

EDIT: I mean the package from onnxruntime. I'm trying to build it using their base image to account for that portion before building the python packages of this immich machine-learning image. Oddly enough their process to build is not working as intended. I will try to continue troubleshooting tomorrow.


@gcarrarom commented on GitHub (Jun 30, 2023):

The error is slightly different in the new version, thanks to the update from #2951:

zsh ⌁ klf immich-machine-learning-668757f9b6-jgmsq 
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "/usr/src/app/main.py", line 12, in <module>
    from .models.base import InferenceModel
  File "/usr/src/app/models/__init__.py", line 1, in <module>
    from .clip import CLIPSTEncoder
  File "/usr/src/app/models/clip.py", line 8, in <module>
    from .base import InferenceModel
  File "/usr/src/app/models/base.py", line 8, in <module>
    from onnxruntime.capi.onnxruntime_pybind11_state import InvalidProtobuf  # type: ignore
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import ExecutionMode  # noqa: F401
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ImportError: /opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-311-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Permission denied

I will try a few tweaks to the fsGroup setting in k8s and see if that helps.
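
For anyone following along: the dynamic loader prints "cannot enable executable stack as shared object requires" when a shared object asks for an executable stack and the kernel (here, SELinux) refuses the request. The request lives in the object's `PT_GNU_STACK` program header. Below is a minimal sketch for checking that flag yourself. It assumes a 64-bit little-endian ELF file, and the function name `requests_executable_stack` is made up for this illustration; it is not part of onnxruntime or Immich.

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type that carries the stack permissions
PF_X = 0x1                 # "executable" bit in a program header's flags


def requests_executable_stack(elf_bytes: bytes) -> bool:
    """Return True if a 64-bit little-endian ELF image asks for an executable stack."""
    if elf_bytes[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if elf_bytes[4] != 2 or elf_bytes[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")

    # ELF64 header fields: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf_bytes, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf_bytes, 0x36)

    for i in range(e_phnum):
        # Every ELF64 program header starts with p_type (u32) then p_flags (u32).
        p_type, p_flags = struct.unpack_from("<II", elf_bytes, e_phoff + i * e_phentsize)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)

    # Assumption: with no PT_GNU_STACK header, x86-64 loaders have historically
    # treated the object as needing an executable stack.
    return True
```

To use it, read the `.so` named in the traceback in binary mode and pass the bytes in. A `True` result would be consistent with the `execstack` request that the loader is being denied.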


@bo0tzz commented on GitHub (Jun 30, 2023):

Since this is SELinux not liking a dependency that I think we can't really do without (cc @mertalev?), I don't believe there is much we can do about this from the Immich side.


@gcarrarom commented on GitHub (Jun 30, 2023):

Kinda? I mean, the files are labeled as such in the container by default:

root@code-658d97b879-j2f6g:/usr/src/app# ls -Z /opt/venv/lib/python3.11/site-packages/onnxruntime/capi/
system_u:object_r:var_lib_t:s0 __init__.py                         system_u:object_r:var_lib_t:s0 onnxruntime_inference_collection.py
system_u:object_r:var_lib_t:s0 _ld_preload.py                      system_u:object_r:var_lib_t:s0 onnxruntime_pybind11_state.cpython-311-x86_64-linux-gnu.so
system_u:object_r:var_lib_t:s0 _pybind_state.py                    system_u:object_r:var_lib_t:s0 onnxruntime_validation.py
system_u:object_r:var_lib_t:s0 libonnxruntime_providers_shared.so  system_u:object_r:var_lib_t:s0 training
system_u:object_r:var_lib_t:s0 onnxruntime_collect_build_info.py

Sorry, now that I think about it, those labels are probably coming from the installation of the onnx dotnet runtime. The problem is how it gets flagged at the SELinux level on the host:

type=AVC msg=audit(1688149775.122:26827): avc:  denied  { execstack } for  pid=13404 comm="python" scontext=system_u:system_r:unconfined_service_t:s0 tcontext=system_u:system_r:unconfined_service_t:s0 tclass=process permissive=0

I guess we could get the entrypoint of the container to change it? But that would mean running some sort of init container that could re-label those. I haven't had much time to look into it, sorry, but maybe I could play around with those labels and get a workaround for us.
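
The AVC denial above already carries the useful detail: the denied permission is `execstack`, the class is `process`, and both source and target contexts are `unconfined_service_t`. A small sketch for pulling those fields out of an audit line follows. The helper names `parse_avc_denial` and `selinux_type` are invented for this example, and the regular expression only covers records shaped like the one quoted here.

```python
import re

# Matches the "avc: denied { perms } for key=value ..." part of an audit record.
AVC_RE = re.compile(r"avc:\s+denied\s+\{\s*(?P<perms>[^}]*?)\s*\}\s+for\s+(?P<rest>.*)")


def parse_avc_denial(line: str) -> dict:
    """Split an AVC denial record into its key=value fields plus a 'perms' list."""
    match = AVC_RE.search(line)
    if match is None:
        raise ValueError("not an AVC denial record")
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', match.group("rest"))
    fields = {key: value.strip('"') for key, value in pairs}
    fields["perms"] = match.group("perms").split()
    return fields


def selinux_type(context: str) -> str:
    """Return the type part of a user:role:type:level SELinux context."""
    return context.split(":")[2]


record = parse_avc_denial(
    'type=AVC msg=audit(1688149775.122:26827): avc:  denied  { execstack } for  '
    'pid=13404 comm="python" scontext=system_u:system_r:unconfined_service_t:s0 '
    'tcontext=system_u:system_r:unconfined_service_t:s0 tclass=process permissive=0'
)
print(record["perms"], record["tclass"], selinux_type(record["scontext"]))
```

Since the denial is about the process domain rather than a file label, this suggests that relabeling the files under `site-packages` would not change the outcome on its own.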


@gcarrarom commented on GitHub (Jun 30, 2023):

Aha! It seems k3s didn't enable SELinux integration by default:

system_u:system_r:unconfined_service_t:s0 30095 ? 00:01:23 longhorn

All pods are coming up as unconfined_service_t.
This seems to be fixed by enabling the configuration at the node level: https://github.com/k3s-io/k3s/issues/533

It is weird, since that should've been done by default. I'll look into it and report back as a reference for anyone else who runs into this.
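
If you want to check whether your own node has the same symptom, here is a rough sketch that scans process listings for workloads running in the `unconfined_service_t` domain. It assumes the input looks like the output of `ps -eZ` (label, PID, TTY, time, command), which is what the line above resembles. The function name `processes_in_domain` and the second sample line are made up for illustration.

```python
def processes_in_domain(ps_output: str, domain: str = "unconfined_service_t") -> list:
    """Return (pid, command) pairs for processes whose SELinux type equals `domain`."""
    hits = []
    for line in ps_output.splitlines():
        parts = line.split(None, 4)  # label, pid, tty, time, command
        if len(parts) < 5 or not parts[1].isdigit():
            continue  # skip the header row and anything malformed
        label, pid, _tty, _time, command = parts
        context = label.split(":")  # user:role:type:level
        if len(context) >= 3 and context[2] == domain:
            hits.append((int(pid), command))
    return hits


sample = """LABEL                                         PID TTY          TIME CMD
system_u:system_r:unconfined_service_t:s0   30095 ?        00:01:23 longhorn
system_u:system_r:container_t:s0:c12,c34     4242 ?        00:00:01 python
"""
print(processes_in_domain(sample))
```

On a node where the container runtime's SELinux integration is active, container processes would normally be expected to show a container-specific type rather than `unconfined_service_t`, so an empty result for your pods is the hoped-for outcome.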


@gcarrarom commented on GitHub (Jun 30, 2023):

I can confirm: adding a proper SELinux label to the Kubernetes containers allowed the execution to work properly. My instance is now running just fine for all machine learning tasks:
![CleanShot 2023-06-30 at 16 56 16](https://github.com/immich-app/immich/assets/10549675/402b11c9-44d4-40e3-9b2e-3bf67c07b2cc)

Thanks very much for the amazing project!


@nkay08 commented on GitHub (Dec 27, 2023):

I am running the stack via docker-compose and I am using the latest docker-compose.yml.
I am experiencing the same issue as described above.

The immich-machine-learning container runs into this issue at startup:

Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 8, in <module>
    import onnxruntime
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import (
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: /opt/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-310-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Permission denied

I am not really sure how I can solve this issue.


Reference: immich-app/immich#862