[PR #23392] fix(ml): retry OCR OrtSession with remaining providers #17523

Open
opened 2026-02-05 16:23:19 +03:00 by OVERLORD · 0 comments

📋 Pull Request Information

Original PR: https://github.com/immich-app/immich/pull/23392
Author: @apetersson
Created: 10/31/2025
Status: 🔄 Open

Base: main ← Head: allow_ocr_fallback_to_cpu_coreml


📝 Commits (5)

  • 021ba6e fix(ml): retry OCR OrtSession with remaining providers
  • f81654b fix(ml): test for retry OCR OrtSession with remaining providers
  • e13b3e6 fix(machine-learning): stabilize ORT fallback across models
  • 03112cf fix(ml): clarify coreml fallback logging and mock providers in tests
  • bbb9be6 fix(ml): pass mypy with explicit NumPy arrays

📊 Changes

8 files changed (+548 additions, -41 deletions)

View changed files

📝 machine-learning/immich_ml/models/clip/textual.py (+31 -0)
📝 machine-learning/immich_ml/models/clip/visual.py (+28 -0)
📝 machine-learning/immich_ml/models/facial_recognition/detection.py (+15 -4)
📝 machine-learning/immich_ml/models/facial_recognition/recognition.py (+22 -6)
📝 machine-learning/immich_ml/models/ocr/detection.py (+21 -10)
📝 machine-learning/immich_ml/models/ocr/recognition.py (+20 -8)
📝 machine-learning/immich_ml/sessions/ort.py (+77 -11)
📝 machine-learning/test_main.py (+334 -2)

📄 Description

Description

  • Extend the OCR detection and recognition pipelines so that when the leading ONNX Runtime provider (e.g., CoreML) throws an ONNXRuntimeError, we drop only that provider, rebuild the OrtSession with the remaining providers in their existing preference order, and retry the inference. OCR tasks then still complete on the CPU, while faster providers are attempted first.
  • Add regression coverage for this behavior: stub RapidOCR to fail on the first call, then verify that we retry with ["CPUExecutionProvider"] and still return results.
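
The fallback described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in `sessions/ort.py`: `ProviderError` is a hypothetical stand-in for `ONNXRuntimeError`, and `make_session` and `fake_session` are hypothetical helpers used so the sketch runs without ONNX Runtime installed.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class ProviderError(RuntimeError):
    """Hypothetical stand-in for the ONNXRuntimeError raised by a failing provider."""


def run_with_provider_fallback(
    providers: Sequence[str],
    make_session: Callable[[list[str]], Callable[[], T]],
) -> tuple[T, list[str]]:
    """Run one inference, dropping the leading provider each time it fails.

    `make_session` is a hypothetical factory: given a provider list, it returns
    a zero-argument callable that performs one inference with those providers.
    Returns the result together with the provider list that produced it.
    """
    remaining = list(providers)
    while remaining:
        run = make_session(remaining)  # rebuild the session for the current list
        try:
            return run(), remaining
        except ProviderError:
            if len(remaining) == 1:
                raise  # nothing left to fall back to
            # Drop only the leading provider; keep the rest in preference order.
            remaining = remaining[1:]
    raise ValueError("no execution providers configured")


def fake_session(active: list[str]) -> Callable[[], str]:
    """Hypothetical session stub: CoreML always fails, any other provider succeeds."""

    def run() -> str:
        if active[0] == "CoreMLExecutionProvider":
            raise ProviderError("simulated CoreML failure")
        return "text from " + active[0]

    return run


result, used = run_with_provider_fallback(
    ["CoreMLExecutionProvider", "CPUExecutionProvider"], fake_session
)
```

In this sketch the first attempt fails on the simulated CoreML provider, the session is rebuilt with only the CPU provider, and the retry succeeds. If the last remaining provider also fails, the error is re-raised rather than swallowed, so callers still see genuine failures.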

Fixes #23391
https://github.com/immich-app/immich/issues/23391

How Has This Been Tested?

My workstation (Apple Silicon, macOS 15):

  • UV_CACHE_DIR=.uv-cache UV_HTTP_TIMEOUT=120 uv run --extra cpu --group dev pytest test_main.py::TestOcrFallback -q

Screenshots (if appropriate)

Checklist:

  • [x] I have performed a self-review of my own code
  • [ ] I have made corresponding changes to the documentation if applicable
  • [x] I have no unrelated changes in the PR.
  • [x] I have confirmed that any new dependencies are strictly necessary.
  • [x] I have written tests for new code (if applicable)
  • [x] I have followed naming conventions/patterns in the surrounding code
  • [ ] All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • [ ] All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

Please describe to which degree, if any, an LLM was used in creating this pull request.

LLM (ChatGPT/GPT-5) assisted with brainstorming the fallback approach and drafting this description; all code and tests were written and validated with a mix of Codex and manual work.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

OVERLORD added the pull-request label 2026-02-05 16:23:19 +03:00
Reference: immich-app/immich#17523