fix(server): Fix delay with multiple ml servers (#16284)

* Prospective fix for ensuring that known active ML servers are used to reduce search delay.

* Added some logging and renamed backoff const.

* Fix lint issues.

* Update to use env vars for timeouts and updated documentation and strings.

* Fix docs.

* Make counter logic clearer.

* Minor readability improvements.

* Extract  skipUrl logic per feedback, and change log to verbose.

* Make code harder to read.
This commit is contained in:
Tom Graham
2025-02-28 03:14:09 +11:00
committed by GitHub
parent c70c9067b0
commit a808b8610e
5 changed files with 82 additions and 1 deletions

View File

@@ -168,6 +168,8 @@ Redis (Sentinel) URL example JSON before encoding:
| `MACHINE_LEARNING_ANN_TUNING_LEVEL` | ARM-NN GPU tuning level (1: rapid, 2: normal, 3: exhaustive) | `2` | machine learning |
| `MACHINE_LEARNING_DEVICE_IDS`<sup>\*4</sup> | Device IDs to use in multi-GPU environments | `0` | machine learning |
| `MACHINE_LEARNING_MAX_BATCH_SIZE__FACIAL_RECOGNITION` | Set the maximum number of faces that will be processed at once by the facial recognition model | None (`1` if using OpenVINO) | machine learning |
| `MACHINE_LEARNING_PING_TIMEOUT` | How long (ms) to wait for a PING response when checking if an ML server is available | `2000` | server |
| `MACHINE_LEARNING_AVAILABILITY_BACKOFF_TIME` | How long to ignore ML servers that are offline before trying again | `30000` | server |
\*1: It is recommended to begin with this parameter when changing the concurrency levels of the machine learning service and then tune the other ones.