Machine-learning crashes when loading model on startup #2586

Closed
opened 2026-02-05 06:13:46 +03:00 by OVERLORD · 5 comments

Originally created by @bo0tzz on GitHub (Mar 13, 2024).

```
[03/13/24 08:42:29] INFO     Starting gunicorn 21.2.0
[03/13/24 08:42:29] INFO     Listening at: http://0.0.0.0:3003 (9)
[03/13/24 08:42:29] INFO     Using worker: app.config.CustomUvicornWorker
[03/13/24 08:42:29] INFO     Booting worker with pid: 13
[03/13/24 08:42:29] DEBUG    Could not load ANN shared libraries, using ONNX: libmali.so: cannot open shared object file: No such file or directory
[03/13/24 08:42:33] INFO     Started server process [13]
[03/13/24 08:42:33] INFO     Waiting for application startup.
[03/13/24 08:42:33] INFO     Created in-memory cache with unloading after 300s of inactivity.
[03/13/24 08:42:33] INFO     Initialized request thread pool with 4 threads.
[03/13/24 08:42:33] INFO     Preloading models: clip='ViT-H-14-378-quickgelu__dfn5b' facial_recognition=None
[03/13/24 08:42:33] DEBUG    Available ORT providers: {'CPUExecutionProvider', 'AzureExecutionProvider'}
[03/13/24 08:42:33] INFO     Setting 'ViT-H-14-378-quickgelu__dfn5b' execution providers to ['CPUExecutionProvider'], in descending order of preference
[03/13/24 08:42:33] DEBUG    Setting execution provider options to [{'arena_extend_strategy': 'kSameAsRequested'}]
[03/13/24 08:42:33] DEBUG    Setting execution_mode to ORT_SEQUENTIAL
[03/13/24 08:42:33] DEBUG    Setting inter_op_num_threads to 1
[03/13/24 08:42:33] DEBUG    Setting intra_op_num_threads to 2
[03/13/24 08:42:33] DEBUG    Setting preferred runtime to onnx
[03/13/24 08:42:33] DEBUG    Checking for inactivity...
[03/13/24 08:42:33] INFO     Loading clip model 'ViT-H-14-378-quickgelu__dfn5b' to memory
[03/13/24 08:42:33] DEBUG    Loading clip text model 'ViT-H-14-378-quickgelu__dfn5b'
[03/13/24 08:42:34] DEBUG    Loaded clip text model 'ViT-H-14-378-quickgelu__dfn5b'
[03/13/24 08:42:34] DEBUG    Loading clip vision model 'ViT-H-14-378-quickgelu__dfn5b'
[03/13/24 08:42:35] ERROR    Traceback (most recent call last):
                               File "/opt/venv/lib/python3.11/site-packages/starlette/routing.py", line 734, in lifespan
                                 async with self.lifespan_context(app) as maybe_state:
                               File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
                                 return await anext(self.gen)
                               File "/usr/src/app/main.py", line 55, in lifespan
                                 await preload_models(settings.preload)
                               File "/usr/src/app/main.py", line 69, in preload_models
                                 await load(await model_cache.get(preload_models.clip, ModelType.CLIP))
                               File "/usr/src/app/main.py", line 137, in load
                                 await run(_load, model)
                               File "/usr/src/app/main.py", line 125, in run
                                 return await asyncio.get_running_loop().run_in_executor(thread_pool, func, inputs)
                               File "/usr/local/lib/python3.11/concurrent/futures/thread.py", line 58, in run
                                 result = self.fn(*self.args, **self.kwargs)
                               File "/usr/src/app/main.py", line 134, in _load
                                 model.load()
                               File "/usr/src/app/models/base.py", line 53, in load
                                 self._load()
                               File "/usr/src/app/models/clip.py", line 146, in _load
                                 super()._load()
                               File "/usr/src/app/models/clip.py", line 41, in _load
                                 self.vision_model = self._make_session(self.visual_path)
                               File "/usr/src/app/models/base.py", line 121, in _make_session
                                 session = ort.InferenceSession(
                               File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 419, in __init__
                                 self._create_inference_session(providers, provider_options, disabled_optimizers)
                               File "/opt/venv/lib/python3.11/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 483, in _create_inference_session
                                 sess.initialize_session(providers, provider_options, disabled_optimizers)
                             onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Deserialize tensor onnx::MatMul_6214 failed.GetFileLength for /cache/clip/ViT-H-14-378-quickgelu__dfn5b/visual/Constant_7383_attr__value failed:Invalid fd was supplied: -1

[03/13/24 08:42:35] ERROR    Application startup failed. Exiting.
[03/13/24 08:42:35] INFO     Worker exiting (pid: 13)
[03/13/24 08:42:35] ERROR    Worker (pid:13) exited with code 3
[03/13/24 08:42:35] ERROR    Shutting down: Master
[03/13/24 08:42:35] ERROR    Reason: Worker failed to boot.
```

This might be because the model download was interrupted midway through? The download also seems to be taking an unreasonably long time for me, and I'm not getting any progress indication.
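If the download was indeed cut off, a truncated or missing external-data file would explain the `GetFileLength ... Invalid fd was supplied: -1` failure above. As a hypothetical diagnostic (not part of Immich), a short script can list file sizes under the model's cache directory so a zero-byte tensor file stands out; the path below is taken from the traceback and may need adjusting for your cache mount:

```python
from pathlib import Path

# Hypothetical diagnostic, not part of Immich: list file sizes under the model's
# cache directory so a truncated or zero-byte external-data tensor (such as the
# visual/Constant_7383_attr__value file named in the traceback) stands out.
cache_dir = Path("/cache/clip/ViT-H-14-378-quickgelu__dfn5b")  # path from the log above

for f in sorted(cache_dir.rglob("*")):
    if f.is_file():
        size = f.stat().st_size
        marker = "  <-- suspicious" if size == 0 else ""
        print(f"{size:>12}  {f.relative_to(cache_dir)}{marker}")
```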

Env vars used:

```env
MACHINE_LEARNING_PRELOAD__CLIP: "ViT-H-14-378-quickgelu__dfn5b"
MACHINE_LEARNING_WORKER_TIMEOUT: 3600
TRANSFORMERS_CACHE: /cache
```
OVERLORD added the 🧠machine-learning label 2026-02-05 06:13:46 +03:00

@lexcao1729 commented on GitHub (Mar 14, 2024):

I have the same problem.


@bo0tzz commented on GitHub (Mar 14, 2024):

I ended up getting things to work by clearing out the model cache and significantly increasing the timeout to give ample time to download the new model. However, bad state like this should still not cause a crash if it does happen.
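For reference, a minimal sketch of that workaround, assuming the cache layout shown in the traceback above (stop the machine-learning container first, and adjust the path for your cache mount):

```python
import shutil
from pathlib import Path

# Sketch of the workaround described above: delete the (likely corrupted) cached
# model directory so the service re-downloads it on the next startup.
# The path is taken from the traceback; it is not an official Immich tool.
model_dir = Path("/cache/clip/ViT-H-14-378-quickgelu__dfn5b")
if model_dir.exists():
    shutil.rmtree(model_dir)
    print(f"Removed {model_dir}; the model will be re-downloaded on next startup")
```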


@mertalev commented on GitHub (Mar 19, 2024):

Can you confirm if it still times out with the default limit if you increase request threads? I'm wondering if it's because there aren't enough threads to go around with only 4.


@lexcao1729 commented on GitHub (Mar 19, 2024):

> I ended up getting things to work by clearing out the model cache and significantly increasing the timeout to give ample time to download the new model. However, bad state like this should still not cause a crash if it does happen.

This works, thank you.


@jrasm91 commented on GitHub (Sep 7, 2024):

Pretty sure this has been fixed.

Reference: immich-app/immich#2586