mirror of
https://github.com/immich-app/immich.git
synced 2026-02-05 00:30:57 +03:00
[BUG] Machinelearning crashing on k8s deployment v1.57.1 - v1.65.0 #862
Closed
opened 2026-02-04 23:06:59 +03:00 by OVERLORD
·
28 comments
Reference: immich-app/immich#862
Originally created by @gcarrarom on GitHub (May 19, 2023).
The bug
There's a bug when using version 1.56.1 on Kubernetes with the official Helm chart:
zsh ⌁ klf immich-machine-learning-54b5766488-kx4b4
Traceback (most recent call last):
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 8, in <module>
    import onnxruntime
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 55, in <module>
    raise import_capi_exception
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/__init__.py", line 23, in <module>
    from onnxruntime.capi._pybind_state import (
  File "/opt/venv/lib/python3.10/site-packages/onnxruntime/capi/_pybind_state.py", line 33, in <module>
    from .onnxruntime_pybind11_state import *  # noqa
ImportError: /opt/venv/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_pybind11_state.cpython-310-x86_64-linux-gnu.so: cannot enable executable stack as shared object requires: Permission denied

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/src/app/src/main.py", line 6, in <module>
    from insightface.app import FaceAnalysis
  File "/opt/venv/lib/python3.10/site-packages/insightface/__init__.py", line 10, in <module>
    raise ImportError(
ImportError: Unable to import dependency onnxruntime.
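The second traceback only reports that insightface could not import onnxruntime; the underlying cause is in the first one. A minimal sketch for surfacing that root error directly inside the container, assuming a Python shell is available there. The helper name `try_import` is hypothetical and not part of Immich, insightface, or onnxruntime:

```python
import importlib


def try_import(module_name: str):
    """Try to import a module by name.

    Returns (True, None) on success, or (False, "<ErrorType>: <message>")
    if the import raises ImportError (or a subclass of it).
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return False, f"{type(exc).__name__}: {exc}"
    return True, None


# Example use inside the machine-learning container (not executed here):
#   print(try_import("onnxruntime"))
```

Importing `onnxruntime` on its own this way skips the wrapper exception raised by insightface, so the loader message is the first thing shown.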
The OS that Immich Server is running on
Kubernetes - k3s - MicroOS
Version of Immich Server
v1.56.1
Version of Immich Mobile App
N/A
Platform with the issue
Your docker-compose.yml content
Your .env content
Reproduction steps
Additional information
No response
@gcarrarom commented on GitHub (May 19, 2023):
Just to update: I've now rolled back to 1.56.0 and it's working flawlessly. It's probably a bug introduced on 1.56.1.
@alextran1502 commented on GitHub (May 19, 2023):
Hmm, from 1.56.0 to 1.56.1 we only changed the server and the web-related code 🤔
@jrasm91 commented on GitHub (May 19, 2023):
There have been a few reports of related ONNX Runtime errors that were fixed by deleting the machine learning cache volume. Rolling back versions might have done that in your situation.
@gcarrarom commented on GitHub (May 19, 2023):
Great to know, I'll try pushing 1.56.1 again and clear the cache. I should report back in a few hours.
@gcarrarom commented on GitHub (May 19, 2023):
Odd. I just upgraded to 1.56.1 and still get the same error. I've removed the emptyDir cache folder and the error persists. I tried creating it with another storage class and hit the same issue. Could it be something else in another directory? Same error here:
@bo0tzz commented on GitHub (May 19, 2023):
Do you have SELinux enabled? From a bit of googling it seems like that could cause the error you're getting.
@DrSpaldo commented on GitHub (May 20, 2023):
I've also had problems after updating to 1.56.1
@alextran1502 commented on GitHub (May 20, 2023):
#2487 should fix this I believe
@gcarrarom commented on GitHub (May 21, 2023):
Amazing! 1.56.2 fixed it! Thank you!
@gcarrarom commented on GitHub (May 23, 2023):
Sadly I need to reopen this bug for 1.57.1. Same error. Any ideas?
@alextran1502 commented on GitHub (May 24, 2023):
Can you try removing the model cache, starting up the pod, and letting it finish downloading the model before usage?
@gcarrarom commented on GitHub (May 24, 2023):
So, removed the files from the cache portion of the k8s deployment and the same error is happening with the ephemeral storage. It seems odd to run into such errors even though there is no cache whatsoever...
@gcarrarom commented on GitHub (Jun 5, 2023):
Just did the update to 1.60.0 and it's still running into the same issue.
This permission denied issue makes me think it might be the permissions of the downloaded files. I'll look into the user that downloads the modules and see if there's something going on there.
Never mind. The user seems to have all the permissions it needs. I'll try to debug more tonight.
@geraldwuhoo commented on GitHub (Jun 7, 2023):
I have been getting this issue for the past few weeks as well. My server is still on 1.55.1, the last working version for me.
I think @bo0tzz may be correct about SELinux permissions, as I do have SELinux enabled on my machines. What changed in between 1.55.1 and future versions that could cause this? Unfortunately disabling SELinux is not an option for me just to solve this one issue.
@gcarrarom commented on GitHub (Jun 16, 2023):
Same happening with v1.61.0:
@bo0tzz commented on GitHub (Jun 16, 2023):
v1.56.0 introduced face recognition, which I believe is what added the onnxruntime dependency.
@nohitme commented on GitHub (Jun 16, 2023):
I am seeing a different error message when starting immich-machine-learning container (v1.61.0):
python: can't open file '/usr/src/app/src/main.py': [Errno 2] No such file or directory
Is this a new issue?
@bo0tzz commented on GitHub (Jun 16, 2023):
@nohitme that is an unrelated issue. Please make sure you're using the latest image and docker-compose.yml, and open a support thread in Discord or the GitHub Discussions if you still have trouble.
@nohitme commented on GitHub (Jun 16, 2023):
Understood, it could be a separate issue. I will verify it separately on the latest image (I am sure it was, though) and report it if it persists.
Thanks for the reply!
@gcarrarom commented on GitHub (Jun 19, 2023):
Interesting.. Freshly built container image for machine learning from the main branch:
It's not k8s specific then. I'll remove the multi-step build to check if there's something missing/permission mismatch that could be happening on the container build.
@gcarrarom commented on GitHub (Jun 19, 2023):
Same error building with a simple pip install of the requirements. This is the container image I'm using:
It runs into the same problem, here's the directory it's trying to execute, it's owned by the root user:
The onnxruntime_pybind11_state.cpython-311-x86_64-linux-gnu.so file is not executable though; that might be the problem.
@gcarrarom commented on GitHub (Jun 19, 2023):
The only thing that I can see affecting this now is SELinux on the host running the container runtime for k3s. Makes me wonder what exactly this package is trying to access.
EDIT: I mean the package from onnxruntime. I'm trying to build it using their base image to account for that portion before building the python packages of this immich machine-learning image. Oddly enough their process to build is not working as intended. I will try to continue troubleshooting tomorrow.
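As background on what the loader is objecting to: the "cannot enable executable stack" message is generally tied to the shared object's PT_GNU_STACK program header carrying the executable flag, which SELinux policy may refuse. A minimal sketch for checking that flag on a given .so file, assuming a 64-bit little-endian ELF (as on x86_64 Linux). The function names are hypothetical, and nothing here has been run against the actual onnxruntime binary:

```python
import struct

PT_GNU_STACK = 0x6474E551  # program header type that carries stack permissions
PF_X = 0x1                 # "executable" bit in a program header's p_flags


def requests_exec_stack(data: bytes):
    """Inspect raw ELF bytes.

    Returns True if PT_GNU_STACK is present with the executable bit set,
    False if it is present without it, and None if the header is absent
    (the loader then falls back to its own default).
    """
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("this sketch only handles 64-bit little-endian ELF")

    # ELF64 file header layout (64 bytes).
    header = struct.unpack_from("<16sHHIQQQIHHHHHH", data, 0)
    e_phoff, e_phentsize, e_phnum = header[5], header[9], header[10]

    # Each ELF64 program header starts with p_type and p_flags (4 bytes each).
    for index in range(e_phnum):
        offset = e_phoff + index * e_phentsize
        p_type, p_flags = struct.unpack_from("<II", data, offset)
        if p_type == PT_GNU_STACK:
            return bool(p_flags & PF_X)
    return None


def check_file(path: str):
    """Convenience wrapper: read a file from disk and inspect it."""
    with open(path, "rb") as handle:
        return requests_exec_stack(handle.read())
```

Calling `check_file` on the `onnxruntime_pybind11_state...so` path from the traceback would show whether that library asks for an executable stack. If it does, the file permissions on the .so are unlikely to be the relevant factor.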
@gcarrarom commented on GitHub (Jun 30, 2023):
Error is slightly different now from the new version thanks to the update from #2951
I will try to make a few tweaks on the fsgroup in k8s and see if it helps.
@bo0tzz commented on GitHub (Jun 30, 2023):
Since this is SELinux not liking a dependency that I think we can't really do without (cc @mertalev?), I don't believe there is much we can do about this from the Immich side.
@gcarrarom commented on GitHub (Jun 30, 2023):
Kinda? I mean, the files are labeled as such in the container by default:
Sorry, now that I think about it, those labels are probably coming from the installation of the onnx dotnet runtime. The problem is how it gets flagged at the SELinux level on the host:
I guess we could get the entrypoint of the container to change it? But that would mean running some sort of init container that could re-label those. I haven't had much time to look into it, sorry, but maybe I could play around with those labels and get a workaround for us.
@gcarrarom commented on GitHub (Jun 30, 2023):
Aha! It seems k3s didn't enable SELinux integration by default:
All pods are coming up as unconfined_service_t.
Seems to be fixed by enabling the configuration on the node level: https://github.com/k3s-io/k3s/issues/533
It is weird, as I'd have expected this to be done by default. I'll look into it and report back here as a reference for anyone else who is also looking into it.
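For anyone wanting to check which SELinux context a container process actually ended up with, Linux exposes it at /proc/<pid>/attr/current. A minimal sketch that reads and splits it, assuming an SELinux-enabled host. The helper names are hypothetical, and on hosts without SELinux the file may be unreadable or hold a value in a different format:

```python
from pathlib import Path


def parse_selinux_context(raw: str) -> dict:
    """Split an SELinux context of the form user:role:type[:level].

    The level part may itself contain colons (for example "s0:c1,c2"),
    so the string is split at most three times.
    """
    cleaned = raw.strip("\x00 \t\r\n")  # the kernel may append a NUL byte
    parts = cleaned.split(":", 3)
    if len(parts) < 3:
        raise ValueError(f"not an SELinux context: {cleaned!r}")
    return {
        "user": parts[0],
        "role": parts[1],
        "type": parts[2],
        "level": parts[3] if len(parts) > 3 else None,
    }


def process_context(pid="self") -> dict:
    """Read and parse the SELinux context of a process (default: this one)."""
    raw = Path(f"/proc/{pid}/attr/current").read_text()
    return parse_selinux_context(raw)


# Example (not executed here): a pod showing up as unconfined_service_t
#   parse_selinux_context("system_u:system_r:unconfined_service_t:s0")["type"]
```

Running `process_context()` inside the machine-learning container before and after enabling SELinux integration on the node should show whether the `type` field changed.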
@gcarrarom commented on GitHub (Jun 30, 2023):
I can confirm: adding a proper label to the Kubernetes containers allowed the execution to work properly. My instance is now running just fine for all machine learning tasks:

Thanks very much for the amazing project!
@nkay08 commented on GitHub (Dec 27, 2023):
I am running the stack via docker-compose and I am using the latest docker-compose.yml.
I am experiencing the same issue as described above.
The immich-machine-learning container runs into this issue at startup:
I am not really sure how I can solve this issue.