[PR #24338] [CLOSED] fix(metadata): add retry logic for cloud storage #17830

Closed
opened 2026-02-05 16:28:16 +03:00 by OVERLORD · 0 comments

📋 Pull Request Information

Original PR: https://github.com/immich-app/immich/pull/24338
Author: @LuckyCoders
Created: 12/2/2025
Status: Closed

Base: main ← Head: main


📝 Commits (7)

  • 89a35e8 Checkpoint before follow-up message
  • b93b415 feat: Implement retry logic for exiftool operations
  • 7033545 Increase max retries and set final retry delay to 10s
  • e008c5a Refactor: Improve temporary error detection logic
  • 766e247 feat: Cast exiftool read result to ImmichTags
  • 05128d5 Refactor: Improve logging and retry delay in MetadataRepository
  • fae9231 Merge pull request #4 from LuckyCoders/cursor/fix-metadata-extraction-container-crash-gemini-3-pro-preview-506a

📊 Changes

1 file changed (+280 additions, -27 deletions)

📝 server/src/repositories/metadata.repository.ts (+280 -27)

📄 Description

This PR addresses cases where metadata operations (reading, extracting, writing) crash the container and flood the logs with "BatchCluster has ended, cannot enqueue" errors, particularly when files are stored on cloud drives or are temporarily unavailable.

The changes implement:

  • Retry logic with exponential backoff: Metadata operations (readTags, extractBinaryTag, writeTags) are now retried up to 3 times, with the delay doubling each time (500 ms, 1000 ms, 2000 ms), to ride out temporary network issues or file unavailability.
  • Automatic ExifTool instance recreation: If an ExifTool operation fails with a "BatchCluster has ended" error, the ExifTool instance is recreated automatically so that processing can continue without crashing the container.
  • Improved error handling: Specific system error codes (ENOENT, ETIMEDOUT, ECONNRESET, EACCES) and error messages indicating temporary file unavailability are now treated as retryable.

Together, these changes are intended to keep metadata processing running when file access is unreliable, instead of crashing the container.
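
To make the retry behaviour described above concrete, here is a minimal sketch in TypeScript. It is not the code from this PR: the names backoffDelay, isRetryable and withRetry are hypothetical, the injectable sleep parameter is an assumption added so the logic can be exercised without real waiting, and matching on error messages is left out because the description does not list the messages involved.

```typescript
// Illustrative sketch only; helper names are hypothetical, not taken from the PR.

// Error codes the description lists as retryable.
const RETRYABLE_CODES = new Set(["ENOENT", "ETIMEDOUT", "ECONNRESET", "EACCES"]);

/** Delay before retry number `attempt` (1-based): base, 2 * base, 4 * base, ... */
function backoffDelay(attempt: number, baseMs = 500): number {
  return baseMs * 2 ** (attempt - 1);
}

/** True when the error carries one of the retryable `code` values. */
function isRetryable(error: unknown): boolean {
  if (typeof error !== "object" || error === null) return false;
  const { code } = error as { code?: unknown };
  return typeof code === "string" && RETRYABLE_CODES.has(code);
}

/**
 * Runs `operation`, retrying up to `maxRetries` times on retryable errors.
 * `sleep` is injectable (an assumption of this sketch) so tests can skip waiting.
 */
async function withRetry<T>(
  operation: () => Promise<T>,
  maxRetries = 3,
  baseMs = 500,
  sleep: (ms: number) => Promise<void> = (ms) =>
    new Promise<void>((resolve) => setTimeout(resolve, ms)),
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Give up when retries are exhausted or the error is not transient.
      if (attempt >= maxRetries || !isRetryable(error)) throw error;
      await sleep(backoffDelay(attempt + 1, baseMs));
    }
  }
}
```

With the defaults, a call such as withRetry(() => readSomething(path)) (readSomething being a placeholder) would wait 500 ms, 1000 ms and 2000 ms between attempts before rethrowing the last error. Note that ENOENT and EACCES often indicate a permanent condition on local disks, so treating them as retryable is a trade-off that mainly makes sense for cloud-backed mounts.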

Fixes # (issue)

How Has This Been Tested?

  • Manual testing with cloud drive storage to simulate temporary file unavailability and BatchCluster errors.
  • Unit tests for retry logic and ExifTool recreation (if applicable).
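
As an illustration of how the recreation path could be unit tested without spawning a real ExifTool process, the sketch below uses a fake tool whose first instance always fails. Everything here is hypothetical: TagTool is a deliberately tiny stand-in for the real ExifTool class, and RecreatingReader and makeFakeFactory are names invented for this example, not code from the PR.

```typescript
// Illustrative sketch only; all names here are hypothetical.

/** Minimal stand-in for the parts of the tool this sketch needs. */
interface TagTool {
  read(path: string): Promise<Record<string, unknown>>;
  end(): Promise<void>;
}

/** Wraps a tool and replaces it once when its batch cluster has ended. */
class RecreatingReader {
  private tool: TagTool;
  private createTool: () => TagTool;

  constructor(createTool: () => TagTool) {
    this.createTool = createTool;
    this.tool = createTool();
  }

  async read(path: string): Promise<Record<string, unknown>> {
    try {
      return await this.tool.read(path);
    } catch (error) {
      const ended = error instanceof Error && error.message.includes("BatchCluster has ended");
      if (!ended) throw error;
      // Best-effort shutdown of the dead instance, then start a fresh one.
      await this.tool.end().catch(() => undefined);
      this.tool = this.createTool();
      return this.tool.read(path);
    }
  }
}

/** Fake factory: the first tool it creates always fails, later ones succeed. */
function makeFakeFactory() {
  let created = 0;
  const factory = (): TagTool => {
    const id = ++created;
    return {
      read: async (path) => {
        if (id === 1) throw new Error("BatchCluster has ended, cannot enqueue");
        return { SourceFile: path };
      },
      end: async () => {},
    };
  };
  return { factory, count: () => created };
}
```

A test would construct new RecreatingReader(fake.factory), call read once, and then check that fake.count() is 2 and that the returned tags come from the second instance.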

Screenshots (if appropriate)

Checklist:

  • I have performed a self-review of my own code
  • I have made corresponding changes to the documentation if applicable
  • I have no unrelated changes in the PR.
  • I have confirmed that any new dependencies are strictly necessary.
  • I have written tests for new code (if applicable)
  • I have followed naming conventions/patterns in the surrounding code
  • All code in src/services/ uses repositories implementations for database calls, filesystem operations, etc.
  • All code in src/repositories/ is pretty basic/simple and does not have any immich specific logic (that belongs in src/services/)

Please describe to which degree, if any, an LLM was used in creating this pull request.

This pull request was created with the help of an AI assistant, which assisted in identifying the root cause, suggesting solutions, and generating the code changes based on the problem description and user feedback.


🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

OVERLORD added the pull-request label 2026-02-05 16:28:16 +03:00
Reference: immich-app/immich#17830