Failed to Smart Search #4474

Closed
opened 2026-02-05 10:34:09 +03:00 by OVERLORD · 4 comments
Originally created by @jdicioccio on GitHub (Oct 5, 2024).

The bug

When doing a text search with Immich 1.117.0, I'm getting errors loading the model. I tried removing the model cache volume, but it just downloads the non-functioning model again.

The OS that Immich Server is running on

Debian bookworm

Version of Immich Server

v1.117.0

Version of Immich Mobile App

v1.117.0

Platform with the issue

  • Server
  • Web
  • Mobile

Your docker-compose.yml content

name: immich

services:
  immich-server:
    container_name: immich_server
    image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release}
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/hardware-transcoding
      file: hwaccel.transcoding.yml
      service: rkmpp # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding
    volumes:
      - ${UPLOAD_LOCATION}:/usr/src/app/upload
      - /data/photoprism/photos:/photoprism/photos:ro
      - /etc/localtime:/etc/localtime:ro
    env_file:
      - .env
    ports:
      - 2283:3001
    depends_on:
      - redis
      - database
    restart: always
    healthcheck:
      disable: false

  immich-machine-learning:
    container_name: immich_machine_learning
    # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag.
    # Example tag: ${IMMICH_VERSION:-release}-cuda
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-armnn
    extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration
      file: hwaccel.ml.yml
      service: armnn # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
    healthcheck:
      disable: false

  redis:
    container_name: immich_redis
    image: docker.io/redis:6.2-alpine@sha256:e3b17ba9479deec4b7d1eeec1548a253acc5374d68d3b27937fcfe4df8d18c7e
    healthcheck:
      test: redis-cli ping || exit 1
    restart: always

  database:
    container_name: immich_postgres
    image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0
    environment:
      POSTGRES_PASSWORD: ${DB_PASSWORD}
      POSTGRES_USER: ${DB_USERNAME}
      POSTGRES_DB: ${DB_DATABASE_NAME}
      POSTGRES_INITDB_ARGS: '--data-checksums'
    volumes:
      - ${DB_DATA_LOCATION}:/var/lib/postgresql/data
    healthcheck:
      test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1
      interval: 5m
      start_interval: 30s
      start_period: 5m
    restart: always
    command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"]

volumes:
  model-cache:

Your .env content

# You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables

# The location where your uploaded files are stored
UPLOAD_LOCATION=/data/immich/library
# The location where your database files are stored
DB_DATA_LOCATION=/data/immich/db

# The Immich version to use. You can pin this to a specific version like "v1.71.0"
IMMICH_VERSION=release

# Connection secret for postgres. You should change it to a random password
DB_PASSWORD=...

# The values below this line do not need to be changed
###################################################################################
DB_USERNAME=postgres
DB_DATABASE_NAME=immich

Reproduction steps

  1. Open mobile app or web app
  2. Perform a text search

Relevant log output

[10/05/24 01:06:28] INFO     Downloading textual model
                             'ViT-B-16-SigLIP-384__webli'. This may take a
                             while.
Fetching 11 files: 100%|██████████| 11/11 [00:20<00:00,  1.83s/it]
[10/05/24 01:06:49] INFO     Loading textual model 'ViT-B-16-SigLIP-384__webli'
                             to memory
arm_release_ver: g13p0-01eac0, rk_so_ver: 10
[10/05/24 01:06:49] INFO     Loading ANN model
                             /cache/clip/ViT-B-16-SigLIP-384__webli/textual/mode
                             l.armnn ...
Warning: WARNING: Layer of type Cast is not supported on requested backend GpuAcc for input data type Signed32 and output data type Signed64 (reason: in validate_arguments src/gpu/cl/kernels/ClCastKernel.cpp:59: ITensor data type S64 not supported by this kernel), falling back to the next backend.
Warning: ERROR: Layer of type Cast is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Gather is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/core/CL/kernels/CLGatherKernel.cpp:58: ITensor data type S64 not supported by this kernel), falling back to the next backend.
Warning: ERROR: Layer of type Gather is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
[10/05/24 01:06:50] ERROR    Exception in ASGI application

                             ╭─────── Traceback (most recent call last) ───────╮
                             │ /usr/src/app/main.py:152 in predict             │
                             │                                                 │
                             │   149 │   │   inputs = text                     │
                             │   150 │   else:                              │
                             │   151 │   │   raise HTTPException(400, "Either  │
                             │ ❱ 152 │   response = await run_inference(inputs │
                             │   153 │   return ORJSONResponse(response)       │
                             │   154                                           │
                             │   155                                           │
                             │                                                 │
                             │ /usr/src/app/main.py:175 in run_inference       │
                             │                                                 │
                             │   172 │   │   response[entry["task"]] = output  │
                             │   173 │                                         │
                             │   174 │   without_deps, with_deps = entries     │
                             │ ❱ 175 │   await asyncio.gather(*[_run_inference │
                             │   176 │   if with_deps:                         │
                             │   177 │   │   await asyncio.gather(*[_run_infer │
                             │   178 │   if isinstance(payload, Image):        │
                             │                                                 │
                             │ /usr/src/app/main.py:169 in _run_inference      │
                             │                                                 │
                             │   166 │   │   │   except KeyError:              │
                             │   167 │   │   │   │   message = f"Task {entry[' │
                             │       output of {dep}"                          │
                             │   168 │   │   │   │   raise HTTPException(400,  │
                             │ ❱ 169 │   │   model = await load(model)         │
                             │   170 │   │   output = await run(model.predict, │
                             │   171 │   │   outputs[model.identity] = output  │
                             │   172 │   │   response[entry["task"]] = output  │
                             │                                                 │
                             │ /usr/src/app/main.py:213 in load                │
                             │                                                 │
                             │   210 │   │   return model                      │
                             │   211 │                                         │
                             │   212 │   try:                                  │
                             │ ❱ 213 │   │   return await run(_load, model)    │
                             │   214 │   except (OSError, InvalidProtobuf, Bad │
                             │   215 │   │   log.warning(f"Failed to load {mod │
                             │       '{model.model_name}'. Clearing cache.")   │
                             │   216 │   │   model.clear_cache()               │
                             │                                                 │
                             │ /usr/src/app/main.py:188 in run                 │
                             │                                                 │
                             │   185 │   if thread_pool is None:               │
                             │   186 │   │   return func(*args, **kwargs)      │
                             │   187 │   partial_func = partial(func, *args, * │
                             │ ❱ 188 │   return await asyncio.get_running_loop │
                             │   189                                           │
                             │   190                                           │
                             │   191 async def load(model: InferenceModel) ->  │
                             │                                                 │
                             │ /usr/local/lib/python3.11/concurrent/futures/th │
                             │ read.py:58 in run                               │
                             │                                                 │
                             │ /usr/src/app/main.py:200 in _load               │
                             │                                                 │
                             │   197 │   │   │   raise HTTPException(500, f"Fa │
                             │   198 │   │   with lock:                        │
                             │   199 │   │   │   try:                          │
                             │ ❱ 200 │   │   │   │   model.load()              │
                             │   201 │   │   │   except FileNotFoundError as e │
                             │   202 │   │   │   │   if model.model_format ==  │
                             │   203 │   │   │   │   │   raise e               │
                             │                                                 │
                             │ /usr/src/app/models/base.py:53 in load          │
                             │                                                 │
                             │    50 │   │   self.download()                   │
                             │    51 │   │   attempt = f"Attempt #{self.load_a │
                             │       else "Loading"                            │
                             │    52 │   │   log.info(f"{attempt} {self.model_ │
                             │       '{self.model_name}' to memory")           │
                             │ ❱  53 │   │   self.session = self._load()       │
                             │    54 │   │   self.loaded = True                │
                             │    55 │                                         │
                             │    56 │   def predict(self, *inputs: Any, **mod │
                             │                                                 │
                             │ /usr/src/app/models/clip/textual.py:26 in _load │
                             │                                                 │
                             │    23 │   │   return res                        │
                             │    24 │                                         │
                             │    25 │   def _load(self) -> ModelSession:      │
                             │ ❱  26 │   │   session = super()._load()         │
                             │    27 │   │   log.debug(f"Loading tokenizer for │
                             │    28 │   │   self.tokenizer = self._load_token │
                             │    29 │   │   tokenizer_kwargs: dict[str, Any]  │
                             │                                                 │
                             │ /usr/src/app/models/base.py:78 in _load         │
                             │                                                 │
                             │    75 │   │   )                                 │
                             │    76 │                                         │
                             │    77 │   def _load(self) -> ModelSession:      │
                             │ ❱  78 │   │   return self._make_session(self.mo │
                             │    79 │                                         │
                             │    80 │   def clear_cache(self) -> None:        │
                             │    81 │   │   if not self.cache_dir.exists():   │
                             │                                                 │
                             │ /usr/src/app/models/base.py:108 in              │
                             │ _make_session                                   │
                             │                                                 │
                             │   105 │   │                                     │
                             │   106 │   │   match model_path.suffix:          │
                             │   107 │   │   │   case ".armnn":                │
                             │ ❱ 108 │   │   │   │   session: ModelSession = A │
                             │   109 │   │   │   case ".onnx":                 │
                             │   110 │   │   │   │   session = OrtSession(mode │
                             │   111 │   │   │   case _:                       │
                             │                                                 │
                             │ /usr/src/app/sessions/ann.py:26 in __init__     │
                             │                                                 │
                             │   23 │   │   self.ann = Ann(tuning_level=settin │
                             │      "gpu-tuning.ann").as_posix())              │
                             │   24 │   │                                      │
                             │   25 │   │   log.info("Loading ANN model %s ... │
                             │ ❱ 26 │   │   self.model = self.ann.load(        │
                             │   27 │   │   │   model_path.as_posix(),         │
                             │   28 │   │   │   cached_network_path=model_path │
                             │   29 │   │   │   fp16=settings.ann_fp16_turbo,  │
                             │                                                 │
                             │ /usr/src/ann/ann.py:124 in load                 │
                             │                                                 │
                             │   121 │   │   │   cached_network_path.encode()  │
                             │   122 │   │   )                                 │
                             │   123 │   │   if net_id < 0:                    │
                             │ ❱ 124 │   │   │   raise ValueError("Cannot load │
                             │   125 │   │                                     │
                             │   126 │   │   self.input_shapes[net_id] = tuple │
                             │   127 │   │   │   self.shape(net_id, input=True │
                             │       input=True))                              │
                             ╰─────────────────────────────────────────────────╯
                             ValueError: Cannot load model!

Additional information

RK3588 CPU

Originally created by @jdicioccio on GitHub (Oct 5, 2024). ### The bug When doing a text search with Immich 1.117.0, I'm getting errors loading the model. I tried removing the model cache volume, but it just downloads the non-functioning model again. ### The OS that Immich Server is running on Debian bookworm ### Version of Immich Server v1.117.0 ### Version of Immich Mobile App v1.117.0 ### Platform with the issue - [X] Server - [X] Web - [X] Mobile ### Your docker-compose.yml content ```YAML name: immich services: immich-server: container_name: immich_server image: ghcr.io/immich-app/immich-server:${IMMICH_VERSION:-release} extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/hardware-transcoding file: hwaccel.transcoding.yml service: rkmpp # set to one of [nvenc, quicksync, rkmpp, vaapi, vaapi-wsl] for accelerated transcoding volumes: - ${UPLOAD_LOCATION}:/usr/src/app/upload - /data/photoprism/photos:/photoprism/photos:ro - /etc/localtime:/etc/localtime:ro env_file: - .env ports: - 2283:3001 depends_on: - redis - database restart: always healthcheck: disable: false immich-machine-learning: container_name: immich_machine_learning # For hardware acceleration, add one of -[armnn, cuda, openvino] to the image tag. 
# Example tag: ${IMMICH_VERSION:-release}-cuda image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}-armnn extends: # uncomment this section for hardware acceleration - see https://immich.app/docs/features/ml-hardware-acceleration file: hwaccel.ml.yml service: armnn # set to one of [armnn, cuda, openvino, openvino-wsl] for accelerated inference - use the `-wsl` version for WSL2 where applicable volumes: - model-cache:/cache env_file: - .env restart: always healthcheck: disable: false redis: container_name: immich_redis image: docker.io/redis:6.2-alpine@sha256:e3b17ba9479deec4b7d1eeec1548a253acc5374d68d3b27937fcfe4df8d18c7e healthcheck: test: redis-cli ping || exit 1 restart: always database: container_name: immich_postgres image: registry.hub.docker.com/tensorchord/pgvecto-rs:pg14-v0.2.0@sha256:90724186f0a3517cf6914295b5ab410db9ce23190a2d9d0b9dd6463e3fa298f0 environment: POSTGRES_PASSWORD: ${DB_PASSWORD} POSTGRES_USER: ${DB_USERNAME} POSTGRES_DB: ${DB_DATABASE_NAME} POSTGRES_INITDB_ARGS: '--data-checksums' volumes: - ${DB_DATA_LOCATION}:/var/lib/postgresql/data healthcheck: test: pg_isready --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' || exit 1; Chksum="$$(psql --dbname='${DB_DATABASE_NAME}' --username='${DB_USERNAME}' --tuples-only --no-align --command='SELECT COALESCE(SUM(checksum_failures), 0) FROM pg_stat_database')"; echo "checksum failure count is $$Chksum"; [ "$$Chksum" = '0' ] || exit 1 interval: 5m start_interval: 30s start_period: 5m restart: always command: ["postgres", "-c" ,"shared_preload_libraries=vectors.so", "-c", 'search_path="$$user", public, vectors', "-c", "logging_collector=on", "-c", "max_wal_size=2GB", "-c", "shared_buffers=512MB", "-c", "wal_compression=on"] volumes: model-cache: ``` ### Your .env content ```Shell # You can find documentation for all the supported env variables at https://immich.app/docs/install/environment-variables # The location where your uploaded files are stored 
UPLOAD_LOCATION=/data/immich/library # The location where your database files are stored DB_DATA_LOCATION=/data/immich/db # The Immich version to use. You can pin this to a specific version like "v1.71.0" IMMICH_VERSION=release # Connection secret for postgres. You should change it to a random password DB_PASSWORD=... # The values below this line do not need to be changed ################################################################################### DB_USERNAME=postgres DB_DATABASE_NAME=immich ``` ### Reproduction steps 1. Open mobile app or web app 2. Perform a text search ### Relevant log output ```shell [10/05/24 01:06:28] INFO Downloading textual model 'ViT-B-16-SigLIP-384__webli'. This may take a while. Fetching 11 files: 100%|██████████| 11/11 [00:20<00:00, 1.83s/it] [10/05/24 01:06:49] INFO Loading textual model 'ViT-B-16-SigLIP-384__webli' to memory arm_release_ver: g13p0-01eac0, rk_so_ver: 10 [10/05/24 01:06:49] INFO Loading ANN model /cache/clip/ViT-B-16-SigLIP-384__webli/textual/mode l.armnn ... Warning: WARNING: Layer of type Cast is not supported on requested backend GpuAcc for input data type Signed32 and output data type Signed64 (reason: in validate_arguments src/gpu/cl/kernels/ClCastKernel.cpp:59: ITensor data type S64 not supported by this kernel), falling back to the next backend. Warning: ERROR: Layer of type Cast is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Gather is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/core/CL/kernels/CLGatherKernel.cpp:58: ITensor data type S64 not supported by this kernel), falling back to the next backend. 
Warning: ERROR: Layer of type Gather is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. 
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ] Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend. 
```
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
Warning: WARNING: Layer of type Transpose is not supported on requested backend GpuAcc for input data type Float32 and output data type Float32 (reason: in validate_arguments src/gpu/cl/kernels/ClPermuteKernel.cpp:60: Permutation up to 4-D src tensor is supported), falling back to the next backend.
Warning: ERROR: Layer of type Transpose is not supported on any preferred backend [GpuAcc ]
[10/05/24 01:06:50] ERROR Exception in ASGI application
Traceback (most recent call last; source lines truncated by the log viewer):
  /usr/src/app/main.py:152 in predict
    ❱ 152 │ response = await run_inference(inputs
  /usr/src/app/main.py:175 in run_inference
    ❱ 175 │ await asyncio.gather(*[_run_inference
  /usr/src/app/main.py:169 in _run_inference
    ❱ 169 │ model = await load(model)
  /usr/src/app/main.py:213 in load
    ❱ 213 │ return await run(_load, model)
  /usr/src/app/main.py:188 in run
    ❱ 188 │ return await asyncio.get_running_loop
  /usr/local/lib/python3.11/concurrent/futures/thread.py:58 in run
  /usr/src/app/main.py:200 in _load
    ❱ 200 │ model.load()
  /usr/src/app/models/base.py:53 in load
    ❱ 53 │ self.session = self._load()
  /usr/src/app/models/clip/textual.py:26 in _load
    ❱ 26 │ session = super()._load()
  /usr/src/app/models/base.py:78 in _load
    ❱ 78 │ return self._make_session(self.mo
  /usr/src/app/models/base.py:108 in _make_session
    ❱ 108 │ session: ModelSession = A
  /usr/src/app/sessions/ann.py:26 in __init__
    ❱ 26 │ self.model = self.ann.load(
  /usr/src/ann/ann.py:124 in load
    ❱ 124 │ raise ValueError("Cannot load
ValueError: Cannot load model!
```

### Additional information

RK3588 CPU
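From the traceback, `_make_session` picks the session class by file suffix (`.armnn` → `Ann`, `.onnx` → `OrtSession`), and the `Ann` loader raises `ValueError` when the driver rejects the converted model. This is not Immich's actual code, but a minimal sketch (with hypothetical loader callables passed in) of how such a suffix-based factory could fall back to ONNX instead of failing outright:

```python
# Hypothetical sketch only -- not Immich's real _make_session.
# load_armnn / load_onnx stand in for the Ann and OrtSession constructors.
from pathlib import Path
from typing import Callable


def make_session(
    model_path: Path,
    load_armnn: Callable[[Path], object],
    load_onnx: Callable[[Path], object],
) -> object:
    """Prefer the accelerated .armnn model; fall back to .onnx on failure."""
    armnn_path = model_path.with_suffix(".armnn")
    if armnn_path.exists():
        try:
            return load_armnn(armnn_path)
        except ValueError:
            # ARM NN rejected the model (e.g. an unsupported Transpose
            # layer, as in the log above) -- fall through to ONNX Runtime.
            pass
    return load_onnx(model_path.with_suffix(".onnx"))
```

With a fallback like this, a broken `.armnn` export would degrade to CPU/ONNX inference rather than turn every search request into a 500.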

@bo0tzz commented on GitHub (Oct 5, 2024):

@mertalev I was under the impression that you were still working on RK3588 support?

@mertalev commented on GitHub (Oct 5, 2024):

RK3588 is already supported for many models, but the siglip models are still WIP and apparently don't work.

@jdicioccio commented on GitHub (Oct 5, 2024):

This used to work. Maybe before, it was falling back to running on the CPU?

@mertalev commented on GitHub (Oct 5, 2024):

The ARMNN models just didn't exist before, so it used the CPU; now they do exist, but they're broken. For now, you can use the CPU image for machine learning to run on the CPU as before.

@mertalev commented on GitHub (Oct 5, 2024): The ARMNN models just didn't exist before so it used CPU, but now they do exist but are broken. You can use the CPU image for machine learning for now to use CPU as before.
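For reference, a sketch of that workaround against the compose file at the top of this issue (assumes the same `.env` and `model-cache` volume; not an official snippet): drop the `-armnn` suffix from the image tag and remove the hwaccel `extends` block so the plain CPU image is used.

```yaml
  immich-machine-learning:
    container_name: immich_machine_learning
    # CPU-only image: no -armnn tag suffix, no hwaccel.ml.yml extends block
    image: ghcr.io/immich-app/immich-machine-learning:${IMMICH_VERSION:-release}
    volumes:
      - model-cache:/cache
    env_file:
      - .env
    restart: always
```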
Reference: immich-app/immich#4474