Mirror of https://github.com/immich-app/immich.git, synced 2026-02-24 19:08:04 +03:00
The openvino variant of the machine learning container responds differently to probes #2971
Closed · opened 2026-02-05 07:15:27 +03:00 by OVERLORD · 14 comments
Reference: immich-app/immich#2971
Originally created by @djjudas21 on GitHub (Apr 19, 2024).
Originally assigned to: @mertalev on GitHub.
The bug
I have been using Immich with Kubernetes without problems, via the Helm chart. This week I have been working on enabling GPU support.
I noticed that a deployment of the image immich-app/immich-machine-learning:v1.101.0 works completely fine, but when I switch to immich-app/immich-machine-learning:v1.101.0-openvino with no other changes, that container goes into CrashLoopBackOff after the model is loaded, because the liveness probes fail and the container gets restarted repeatedly.
This issue is specific to Kubernetes, but it clearly demonstrates that there is some difference in behaviour between the standard and openvino variants of the image, so I feel it belongs here.
The OS that Immich Server is running on
Kubernetes
Version of Immich Server
v1.101.0
Version of Immich Mobile App
1.101.0 build.147
Platform with the issue
Your docker-compose.yml content
Your .env content
Reproduction steps
Relevant log output
Additional information
No response
@bo0tzz commented on GitHub (Apr 19, 2024):
I think the probes for the ML deployment don't quite work right anyways, and if a model takes too long to load (or if it's being downloaded at startup through the preload env var) I've had my non-openvino instances occasionally get killed as well. It could be that the openvino image just takes slightly longer to load things, thus exposing that issue?
@djjudas21 commented on GitHub (Apr 19, 2024):
I'll have a play with the probes and see if I can get it more stable. The default values look pretty aggressive IMO. Timeout of 1s, period of 10s. If a probe fails, it gets retried immediately - so the liveness probe could actually kill the container after just 3 seconds, which is insane.
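For reference, Kubernetes does not retry a failed probe immediately: the next attempt starts periodSeconds later, and the container is restarted only after failureThreshold consecutive failures. With the Kubernetes defaults (timeoutSeconds 1, periodSeconds 10, failureThreshold 3), an unresponsive container therefore survives roughly 20 to 30 seconds, not 3. The sketch below is a back-of-the-envelope calculator for that window; the Probe class and hang_tolerance function are hypothetical helpers, and it assumes the probes are already running (initialDelaySeconds has passed).

```python
from dataclasses import dataclass


@dataclass
class Probe:
    """Subset of Kubernetes probe settings; defaults are the Kubernetes defaults."""
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3


def hang_tolerance(p: Probe) -> tuple[int, int]:
    """Rough bounds on how long a container can stop answering before a liveness restart.

    A restart needs `failure_threshold` consecutive failed probes; probes start
    `period_seconds` apart and each fails after `timeout_seconds` without a reply.
    Returns (always_survived, always_restarted):
      - a stall shorter than the first value can never collect enough failures;
      - a stall longer than the second value is restarted however it lines up with the probes.
    """
    always_survived = (p.failure_threshold - 1) * p.period_seconds + p.timeout_seconds
    always_restarted = p.failure_threshold * p.period_seconds + p.timeout_seconds
    return always_survived, always_restarted


print(hang_tolerance(Probe()))  # (21, 31) with the defaults
```

Raising failureThreshold or periodSeconds widens the window far more than raising timeoutSeconds does, which matters when the stall lasts minutes rather than seconds.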
@djjudas21 commented on GitHub (Apr 19, 2024):
Eww, they're hard-coded... https://github.com/immich-app/immich-charts/blob/main/charts/immich/templates/machine-learning.yaml#L28
@djjudas21 commented on GitHub (Apr 19, 2024):
OK, so I manually configured the probes to have longer timeouts and a greater tolerance for failed probes. It didn't get killed by Kubernetes, but it crashed in a nasty way. I'm well out of my depth with this.
@mertalev commented on GitHub (Apr 19, 2024):
For the first issue, when using OpenVINO, models take some time to load the first time because they're compiled to that format. But everything is supposed to happen in a separate thread pool without blocking the main thread, so I'm not sure what's causing this to be blocking.
For the second issue, OpenVINO is buggy and doesn't currently work with facial recognition, see #8226.
@djjudas21 commented on GitHub (Apr 19, 2024):
Thanks for clarifying about the OpenVINO bug, I'm following that one now.
The startup time for the machine learning container isn't straightforward though. If it immediately loaded the models when the container started, that would be fine because you can define a startupProbe in Kubernetes to protect the container during this period. But instead, the container starts up quickly and the model is not loaded until a request for ML comes in from Immich, which may happen a long time after the container has started, so there is no way to work around this in Kubernetes, other than to make the probes very generous.
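A startupProbe holds off liveness and readiness checks until it first succeeds, and it allows roughly failureThreshold × periodSeconds for startup. A minimal sketch of sizing one from a time budget follows; the startup_probe helper is hypothetical, and the /ping path and port 3003 are assumptions about the ML container, not values taken from the chart. As noted above, this only helps if the slow model loading actually happens inside that startup window.

```python
import math


def startup_probe(budget_seconds: float, period_seconds: int = 10,
                  path: str = "/ping", port: int = 3003) -> dict:
    """Build a startupProbe (as a dict in Kubernetes container-spec shape)
    that tolerates roughly `budget_seconds` of startup before a restart.

    The allowed window is about failureThreshold * periodSeconds, so the
    threshold is the budget divided by the period, rounded up.
    Path and port are assumed, not confirmed for the Immich ML image.
    """
    failure_threshold = max(1, math.ceil(budget_seconds / period_seconds))
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


if __name__ == "__main__":
    # Allow up to ten minutes for a first-time model compile.
    print(startup_probe(600))
```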
I'm no software engineer, but it does sound odd that the main thread gets blocked.
In my case, I have no choice but to disable GPU acceleration for ML if OpenVINO doesn't support it, but I'll leave this issue open because it sounds like there is some investigation to be done with the thread pools, since @bo0tzz said it also affects the non-openvino container.
Thanks for your help, @mertalev. Have a nice weekend!
@mertalev commented on GitHub (Apr 19, 2024):
There are envs to preload certain models if it helps: setting MACHINE_LEARNING_PRELOAD__CLIP=ViT-B-32__openai and MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION=buffalo_l will eagerly load those models at startup without waiting for a request.
@mertalev commented on GitHub (Jun 19, 2024):
I think this is because of the Python GIL. When it compiles to OpenVINO, it holds onto the GIL and prevents other threads from executing. That means it can't respond to probes during that time. Not sure what I can do about that short of putting it in a subprocess or something.
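The subprocess idea can be sketched with asyncio. This is not Immich's code: the child interpreter here only sleeps, standing in for a hypothetical GIL-holding compile step, and the probe loop stands in for a kubelet hitting a health endpoint. Because a child process has its own interpreter and its own GIL, whatever it does cannot stall the parent's event loop; how a compiled model would get back to the parent is out of scope for this sketch.

```python
import asyncio
import sys
import time

# Hypothetical stand-in for an expensive, GIL-holding model compile.
# It runs in a separate Python interpreter, which has its own GIL.
CHILD_CODE = "import time; time.sleep(1.0)"


async def compile_in_subprocess() -> int:
    """Run the blocking work in a child process and await it without blocking the loop."""
    proc = await asyncio.create_subprocess_exec(sys.executable, "-c", CHILD_CODE)
    return await proc.wait()


async def probe_loop(stop: asyncio.Event, gaps: list[float], period: float = 0.1) -> None:
    """Simulate periodic health checks and record the time between them.

    If the event loop were blocked, the gaps would grow well beyond `period`.
    """
    last = time.perf_counter()
    while not stop.is_set():
        await asyncio.sleep(period)
        now = time.perf_counter()
        gaps.append(now - last)
        last = now


async def main() -> tuple[int, list[float]]:
    stop = asyncio.Event()
    gaps: list[float] = []
    prober = asyncio.create_task(probe_loop(stop, gaps))
    returncode = await compile_in_subprocess()
    stop.set()
    await prober
    return returncode, gaps


if __name__ == "__main__":
    rc, gaps = asyncio.run(main())
    print(f"child exit code {rc}, {len(gaps)} probes, max gap {max(gaps):.3f}s")
```

If the same work ran in a thread of the parent process and held the GIL, the probe gaps would stretch for as long as the GIL was held, which matches the failure described above.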
@djjudas21 commented on GitHub (Aug 6, 2024):
I can confirm this is still a problem. With #8226 now resolved, I had another crack at enabling hardware acceleration for my ML. I'm using the immich-machine-learning:main-openvino image. It starts up properly and appears stable at idle, but once the first ML job is started, the container stops responding and gets killed.
@djjudas21 commented on GitHub (Aug 6, 2024):
Starting up with MACHINE_LEARNING_PRELOAD__CLIP=ViT-B-32__openai and MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION=buffalo_l does indeed load the models up front, but that just causes the container to fail its probes earlier.
@mertalev commented on GitHub (Aug 6, 2024):
The models only take that long the first time they're compiled. You can preload them and set a really high timeout for the probes to let it get compiled.
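That suggestion can be sketched as container fields: the two preload variables from this thread, plus liveness and readiness probes loose enough to ride out a first-time compile. The field names (env, livenessProbe, readinessProbe, httpGet, periodSeconds, timeoutSeconds, failureThreshold) follow the Kubernetes container spec; the ml_container_fields helper, the /ping path, port 3003 and every number are assumptions for illustration. With the defaults shown, a stalled container gets about ten minutes (20 failures × 30 s) before a liveness restart.

```python
import json

# Env var names and values as given in the thread.
PRELOAD_ENV = {
    "MACHINE_LEARNING_PRELOAD__CLIP": "ViT-B-32__openai",
    "MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION": "buffalo_l",
}


def ml_container_fields(period_seconds: int = 30, timeout_seconds: int = 20,
                        failure_threshold: int = 20) -> dict:
    """Container fields for eager model loading plus deliberately lenient probes.

    The probe endpoint (/ping on port 3003) and all numbers are assumptions.
    """
    def probe() -> dict:
        # Build a fresh dict each time so liveness and readiness can be tuned separately.
        return {
            "httpGet": {"path": "/ping", "port": 3003},
            "periodSeconds": period_seconds,
            "timeoutSeconds": timeout_seconds,
            "failureThreshold": failure_threshold,
        }

    return {
        "env": [{"name": name, "value": value} for name, value in PRELOAD_ENV.items()],
        "livenessProbe": probe(),
        "readinessProbe": probe(),
    }


if __name__ == "__main__":
    print(json.dumps(ml_container_fields(), indent=2))
```

Once the models have been compiled and cached, the thresholds could presumably be tightened again.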
@djjudas21 commented on GitHub (Aug 6, 2024):
Good idea. Unfortunately, that brings me to a second issue with the Helm chart, where the startupProbe is hardcoded to false, but I have added initialDelaySeconds to the liveness and readiness probes.
Here's my full, working ML block for a Kubernetes deployment via Helm chart, because examples/documentation seem a bit lacking for this 🙂
@hranicka commented on GitHub (Apr 23, 2025):
Having the same issues. Probes are not responding and the pods are being shut down. Using the official Helm charts and image v1.131.3-openvino.
As a workaround, after a deployment, I had to patch the deployment:
Setting this in values.yaml did not help:
@jrasm91 commented on GitHub (Sep 19, 2025):
Sounds like this issue is essentially resolved, and that potentially a separate issue should be opened in https://github.com/immich-app/immich-charts for probe configuration or default values for the machine learning pod.