[BUG] High search latency from ML encode-text and postgres #1042

Closed
opened 2026-02-05 00:10:44 +03:00 by OVERLORD · 3 comments

Originally created by @pl4nty on GitHub (Jul 2, 2023).

### The bug

When doing non-metadata searches in the web client, I'm seeing high average latencies of 1.5s, and some requests over 5s. I don't have enough traffic to compute a P90, though. Is this expected? Metadata searches are much faster.

Here's some tracing from a particularly slow request for `/search?q=test&clip=false`. I've been able to replicate these latencies with the demo instance too.

![image](https://github.com/immich-app/immich/assets/21111317/addc852f-dc29-47fb-a36a-c62199749bfa)

[Trace-91e851-2023-07-02 16 27 47.json.txt](https://github.com/immich-app/immich/files/11928835/Trace-91e851-2023-07-02.16.27.47.json.txt)

### The OS that Immich Server is running on

Official containers on an Oracle Linux 7 Kubernetes node with 4 arm64 cores and 24 GB RAM

### Version of Immich Server

v1.65.0

### Version of Immich Mobile App

N/A

### Platform with the issue

- [ ] Server
- [x] Web
- [ ] Mobile

### Your docker-compose.yml content

Official Helm chart: https://github.com/immich-app/immich-charts/tree/main/charts/immich
My Helm values: https://github.com/pl4nty/lab-infra/blob/main/kubernetes/oke/immich/immich.yaml

### Your .env content

N/A

### Reproduction steps

1. Search for text in the web client, e.g. `test` (a rough timing sketch follows below)
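
As a rough way to quantify the latency, here is a minimal timing sketch. The `/api/search` path, base URL, and `x-api-key` header are assumptions for illustration; adjust them to match your instance.

```python
# Rough timing loop for the search endpoint described above.
# BASE_URL, the /api/search path, and the x-api-key header are assumptions;
# adjust them to match your deployment.
import time

import requests

BASE_URL = "http://localhost:2283"  # assumed server address
API_KEY = "<your-api-key>"          # assumed auth method for scripted requests

def timed_search(query: str, clip: bool = False) -> float:
    """Return the wall-clock duration of one search request, in seconds."""
    start = time.perf_counter()
    resp = requests.get(
        f"{BASE_URL}/api/search",
        params={"q": query, "clip": str(clip).lower()},
        headers={"x-api-key": API_KEY},
        timeout=30,
    )
    resp.raise_for_status()
    return time.perf_counter() - start

if __name__ == "__main__":
    # The first call pays any model cold-start cost; later calls show steady state.
    for i in range(5):
        print(f"request {i}: {timed_search('test'):.2f}s")
```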

Additional information

All Immich containers except `proxy` are injected with OpenTelemetry autoinstrumentation. Bitnami Redis, official Typesense, and cloudnative-pg Postgres are used.

@mertalev commented on GitHub (Jul 2, 2023):

The first request will typically be slower than subsequent requests, since models are unloaded after idling for 300s. Beyond that, the bulk of the latency comes down to inference speed, i.e. CPU performance.

It's also normal for metadata searches to be much faster, since there's no live inference taking place: all of the tags have already been generated and indexed ahead of time.
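
For illustration, a minimal sketch of this idle-unload pattern, assuming a `load_fn` that loads the model and a 300s TTL; the class and names here are hypothetical, not Immich's actual implementation.

```python
# Minimal sketch of an idle-timeout model cache; names are hypothetical.
import threading
import time

class IdleUnloadingModel:
    def __init__(self, load_fn, idle_ttl: float = 300.0):
        self._load_fn = load_fn  # expensive: loads weights into memory
        self._idle_ttl = idle_ttl
        self._model = None
        self._last_used = 0.0
        self._lock = threading.Lock()

    def predict(self, inputs):
        with self._lock:
            if self._model is None:
                # Cold start: this reload is why the first request is slow.
                self._model = self._load_fn()
            self._last_used = time.monotonic()
            model = self._model
        return model(inputs)

    def maybe_unload(self):
        # Called periodically; frees the model after idle_ttl of inactivity.
        with self._lock:
            if self._model is not None and time.monotonic() - self._last_used > self._idle_ttl:
                self._model = None
```

A background thread (or the request path itself) would call `maybe_unload()` periodically; the trade-off is memory freed during idle periods versus a cold-start penalty on the next request.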

@pl4nty commented on GitHub (Jul 2, 2023):

Thanks. Would CPU architecture make a difference even though it's Python? The underlying cores are ARM, from a 3 GHz Ampere A1. My metrics aren't precise enough to catch a CPU usage spike, but I'll do some further testing.

I noticed `/encode-text` is included in #2574, so I'll look forward to testing with a GPU when it releases :)


@mertalev commented on GitHub (Jul 2, 2023):

ARM could have something to do with it, but it also seems this CPU isn't very powerful, going by [this](https://www.storagereview.com/review/oci-ampere-a1-compute-review).

CUDA support will be for all ML endpoints and models. Stay tuned :)

If you want to do more quantitative testing, you can use [Locust](https://locust.io) by following these steps (a minimal locustfile sketch follows the list):

  1. Clone the repo
  2. Navigate to the `machine-learning` folder
  3. Follow the instructions to install Poetry and dependencies
  4. Run `locust --host <HOST> --web-host localhost`, setting the host to the ML instance
  5. Open the web UI shown and begin swarming
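
A minimal locustfile along these lines; the `/encode-text` path comes from this thread, while the JSON body shape is an assumption, so check it against the repo's actual locustfile.

```python
# locustfile.py -- minimal sketch for load-testing the ML text encoder.
# The /encode-text path is from this thread; the request body shape is an
# assumption and may not match the actual API.
from locust import HttpUser, between, task

class EncodeTextUser(HttpUser):
    wait_time = between(1, 2)  # think time between tasks, in seconds

    @task
    def encode_text(self):
        self.client.post("/encode-text", json={"text": "test"})
```

Run it with e.g. `locust -f locustfile.py --host <HOST> --web-host localhost`, then open the web UI Locust prints (http://localhost:8089 by default) and start a swarm.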
Reference: immich-app/immich#1042