docs: facial recognition and general clean-up (#11106)

* add facial recognition docs, clean up existing info

* Update smart-search.md

Co-authored-by: Alex <alex.tran1502@gmail.com>

---------

Co-authored-by: Alex <alex.tran1502@gmail.com>
2025-12-23 09:15:05 +03:00 · 2024-07-14 22:08:16 -04:00
parent 8193416230
commit cc1235d4aa
5 changed files with 134 additions and 50 deletions
--- a/docs/docs/features/facial-recognition.md
+++ b/docs/docs/features/facial-recognition.md
@@ -2,7 +2,7 @@

 ## Overview

-Immich recognizes faces in your photos and videos and groups them together. You can then assign names to the faces and search for them.
+Immich recognizes faces in your photos and videos and groups them together into people. You can then assign names to these people and search for them.

 The list of people is shown in the Explore page.

@@ -18,13 +18,75 @@ The asset detail view will also show the faces that are recognized in the asset.

 ## Actions

-Additional actions you can do with a detected person are:
+Additional actions you can do include:

-- Change the feature face photo of the person
-- Set date of birth
-- Merge two or more detected faces into one person
-- Hide face
+- Changing the feature photo of the person
+- Setting a person's date of birth
+- Merging two or more detected faces into one person
+- Hiding the faces of a person from the Explore page and detail view
+- Assigning an unrecognized face to a person

 It can be found from the app bar when you access the detail view of a person.

 <img src={require('./img/facial-recognition-4.png').default} title='Facial Recognition 4' width="70%"/>
+
+## How Face Detection Works
+
+Face detection sends the generated preview image to the machine learning service for processing. The service checks if it has the relevant model downloaded and downloads it if not. The image is decoded, preprocessed and passed to the face detection model (with hardware acceleration if configured). The bounding boxes and scores output by this model are used to crop and preprocess the image once again so it can be passed to a facial recognition model (also accelerated if configured). The embeddings from the recognition model, together with the bounding boxes and scores from the face detection model, are then sent back to the server to be added to the database. The embeddings in particular are indexed so they can be searched quickly during facial recognition clustering.
+
+## How Facial Recognition Works
+
+The facial recognition algorithm we use is derived from DBSCAN, a popular clustering algorithm. It essentially treats each detected face as a point in a graph and aims to group points that are close to each other.
+
+:::note
+An important concept is whether something is a _core point_. A core point has a minimum number of points around it within a certain distance. A non-core point can only be assigned to a cluster if it can reach a core point; a non-core point can't be used to extend a cluster even if it's part of one. In Immich, the _Minimum Recognized Faces_ setting controls the threshold to be considered a core point.
+:::
+
+For each face, it looks around it to find other faces within a certain distance. Faces within this distance are considered similar, so it then checks if any of these faces are associated with a person.
+
+If there is an existing person, it assigns the person of the most similar face to the face being processed.
+
+If there is none, then it has to determine something from the DBSCAN algorithm: whether the face is a _core point_. If there are a certain number of similar faces (by default 3, including the face being considered), then this face is a core point. A new person is created for this face and the face is assigned to it. When other faces are processed, if they're similar to this face, they'll see that it has an associated person and can be assigned to that person.
+
+However, if there aren't enough similar faces, no new person will be created. Instead, the face will wait for all the other faces to be processed to see if any matches that previously didn't have an associated person now do. If they do, then the face will be assigned to that person. If not, this face will be considered an outlier, such as a stranger in the background of an image.
+
+The algorithm has some subtle differences compared to DBSCAN:
+
+- DBSCAN doesn't have a concept of incremental clustering: it clusters all points at once. In contrast, facial recognition has to evolve as more assets are added without re-clustering everything each time.
+  - The algorithm described above works within a set of queued assets. Once these faces are processed and a new round of faces is detected, the behavior will not be the same as traditional DBSCAN since it preserves the clusters (people) generated from the previous round.
+    - Facial recognition tries to wait for face detection and thumbnail generation to complete before starting for this reason: the larger the set of faces in the queue, the better the results will be.
+    - Re-running facial recognition on all assets afterwards does behave like DBSCAN, however.
+- DBSCAN is designed for range-based searches (i.e. points within a distance), but high-dimensional vector indices are generally optimized for getting the closest K results. The recognition algorithm doesn't try to get _all_ similar faces within a distance for performance reasons. Instead, it searches for a small number of matches for each face. The end result should be very similar if not identical, but with possibly different performance characteristics.
+  - Because of this, part of the recognition process is handled during a nightly job to ensure that unassigned faces with potential matches can be recognized.
+
+:::tip
+If you didn't import your assets all at once or if the server was able to process jobs faster than you could upload them, it's possible that the clustering was suboptimal. If you haven't put effort into the current results, it may be worth re-running facial recognition on all assets for the best starting point. If it's too late for that, you can also manually assign a selection of unassigned faces and queue _Missing_ for Facial Recognition to help it learn and assign more faces automatically.
+:::
+
+## Configuration
+
+Navigating to Administration > Settings > Machine Learning Settings > Facial Recognition will show the options available.
+
+:::tip
+It's better to make small adjustments to the parameters here than to set them to something very different, unless you're ready to test a variety of options. If you do need to set a parameter to a strict setting, relaxing other settings can be a good option to compensate, and vice versa.
+:::
+
+### Facial recognition model
+
+There are a few different models available; the default is typically considered the best. On more constrained systems where the default is too intensive, you can choose a smaller model instead.
+
+### Minimum detection score
+
+This setting affects whether a result from the face detection model is filtered out as a false positive. It may seem tempting to set this low to detect more faces, but it can lead to false positives that are difficult to deal with and can harm facial recognition. It is strongly recommended not to go below 0.5 for this setting. Setting it to a very high number like 0.9 is also not recommended: the default is already biased toward precision, so a threshold that high leads to many undetected faces.
+
+Changes to this setting only apply to new face detection jobs. To apply the new setting to all assets, you need to re-run face detection for all assets.
+
+### Maximum recognition distance
+
+The distance threshold described in How Facial Recognition Works. The default works well for most people, but it may be worth lowering it if the library has twins or otherwise very similar looking people. A threshold that's too low just means needing to merge duplicate people after facial recognition, whereas a threshold too high can produce unsalvageable results. It is strongly recommended not to go below 0.3 or above 0.7.
+
+### Minimum recognized faces
+
+The core point threshold described in How Facial Recognition Works. This setting has a few implications. First, it takes effect immediately in that people with fewer faces than this are hidden from view. Second, it makes clustering more robust: by requiring a certain level of density, it prevents loosely related faces from being linked to each other.
+
+Increasing this setting is a good idea if you increase the recognition distance or reduce the minimum detection score. Setting it to 1 effectively disables the concept of core points, but can be an option if you prefer a more hands-on approach.
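
The clustering procedure described under "How Facial Recognition Works" in the diff above can be sketched roughly as follows. This is a minimal illustration and not Immich's actual implementation: the `Face` class, the `similar_faces` and `recognize` helpers, the use of Euclidean distance, and the parameter values are all assumptions made for the example. It only shows the core point check, the deferral of non-core faces, and the way people from earlier rounds are preserved.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional


@dataclass
class Face:
    embedding: tuple[float, ...]      # hypothetical embedding from the recognition model
    person_id: Optional[int] = None   # None means the face is unassigned


def similar_faces(face, faces, max_distance, limit):
    """Return up to `limit` other faces within `max_distance`, closest first.

    A real vector index would answer this top-K query; a linear scan with
    Euclidean distance is an assumption made to keep the sketch self-contained.
    """
    scored = sorted(
        (dist(face.embedding, other.embedding), i)
        for i, other in enumerate(faces)
        if other is not face
    )
    return [faces[i] for d, i in scored if d <= max_distance][:limit]


def assigned_match(matches):
    """Return the most similar match that already has a person, if any."""
    return next((m for m in matches if m.person_id is not None), None)


def recognize(faces, max_distance=0.5, min_faces=3, limit=10):
    """Assign people to unassigned faces in place (illustrative parameter values)."""
    next_person_id = 1 + max(
        (f.person_id for f in faces if f.person_id is not None), default=0
    )
    deferred = []
    for face in faces:
        if face.person_id is not None:
            continue  # people from earlier rounds are preserved
        matches = similar_faces(face, faces, max_distance, limit)
        match = assigned_match(matches)
        if match is not None:
            face.person_id = match.person_id   # join the closest existing person
        elif len(matches) + 1 >= min_faces:    # core point: count includes this face
            face.person_id = next_person_id
            next_person_id += 1
        else:
            deferred.append(face)              # wait for the other faces to be processed
    for face in deferred:
        match = assigned_match(similar_faces(face, faces, max_distance, limit))
        if match is not None:
            face.person_id = match.person_id
        # otherwise the face stays unassigned and is treated as an outlier
    return faces
```

In this sketch, a face that is not a core point and is processed before its neighbors is picked up in the second pass once one of those neighbors has a person, while an isolated face stays unassigned. Calling `recognize` again after appending new faces leaves existing assignments untouched, which mirrors the incremental behavior the docs contrast with plain DBSCAN.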
--- a/docs/docs/features/hardware-transcoding.md
+++ b/docs/docs/features/hardware-transcoding.md
@@ -123,6 +123,7 @@ Once this is done, you can continue to step 3 of "Basic Setup".

 - You may want to choose a slower preset than for software transcoding to maintain quality and efficiency
 - While you can use VAAPI with NVIDIA and Intel devices, prefer the more specific APIs since they're more optimized for their respective devices
+- You can confirm the device is being recognized and used by checking its utilization (via `nvtop` for NVIDIA, `intel_gpu_top` for Intel, etc.) when transcoding. A lack of error logs when transcoding also indicates that it's being used.

 [hw-file]: https://github.com/immich-app/immich/releases/latest/download/hwaccel.transcoding.yml
 [nvct]: https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html
--- a/docs/docs/features/smart-search.md
+++ b/docs/docs/features/smart-search.md
@@ -7,29 +7,30 @@ Immich uses Postgres as its search database for both metadata and smart search.

 Smart search is powered by the [pgvecto.rs](https://github.com/tensorchord/pgvecto.rs) extension, utilizing machine learning models like [CLIP](https://openai.com/research/clip) to provide relevant search results. This allows for freeform searches without requiring specific keywords in the image or video metadata.

-Archived photos are not included in search results by default. To include them, mark the checkbox in [advanced search filters](/docs/features/smart-search#advanced-search-filters).
-
-:::tip Alternative CLIP Models
-More powerful models can be used for more accurate search results. For more information, see the related [FAQ](/docs/FAQ#can-i-use-a-custom-clip-model).
-:::
-
-:::info
-Smart Search is currently limited to 5,000 results for a single search on the web.
-:::
-
 ## Advanced Search Filters

 In addition, Immich offers advanced search functionality, allowing you to find specific content using customizable search filters. These filters include location, one or more faces, specific albums, and more. You can try out the search filters on the [Demo site](https://demo.immich.app).

-Smart search features include:
+Smart search allows you to filter results by:

-- Search for one or more faces (with or without context search).
-- Search by Country or State or City or by all three.
-- Search by camera make and model.
-- Search by date range.
-- Search by file name.
-- Search by media types: image, video or all (**Note:** Image includes live images).
-- Search by condition: not in any album or archive or Favorite or all conditions.
+- People
+- Location
+  - Country
+  - State
+  - City
+- Camera
+  - Make
+  - Model
+- Date range
+- File name or extension
+- Media type
+  - Image (including live/motion photos)
+  - Video
+  - All
+- Condition
+  - Not in any album
+  - Archived
+  - Favorited

 <Tabs>
  <TabItem value="Computer" label="Computer" default>
@@ -47,3 +48,27 @@ Some search examples:

 </TabItem>
 </Tabs>
+
+## Configuration
+
+Navigating to `Administration > Settings > Machine Learning Settings > Smart Search` will show the options available.
+
+### CLIP model
+
+More powerful models can be used for more accurate search results, but are slower and can require more server resources. Check out the models [here][huggingface-clip] for more options!
+
+[Multilingual models][huggingface-multilingual-clip] are also available so users can search in their native language. These models support over 100 languages; the `nllb` models in particular support 200.
+:::note
+Multilingual models are much slower and larger and perform slightly worse for English than English-only models. For this reason, only use them if you actually intend to search in a language besides English.
+
+As a special case, the `ViT-H-14-quickgelu__dfn5b` and `ViT-H-14-378-quickgelu__dfn5b` models are excellent at many European languages despite not specifically being multilingual. They're very intensive regardless, however - especially the latter.
+:::
+
+Once you've chosen a model, change this setting to the name of the model you chose. Be sure to re-run Smart Search on all assets after this change.
+
+:::note
+Feel free to make a feature request if there's a model you want to use that we don't currently support.
+:::
+
+[huggingface-clip]: https://huggingface.co/collections/immich-app/clip-654eaefb077425890874cd07
+[huggingface-multilingual-clip]: https://huggingface.co/collections/immich-app/multilingual-clip-654eb08c2382f591eeb8c2a7
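
To illustrate why the choice of CLIP model matters for smart search, here is a minimal sketch of embedding-based ranking. It is an assumption-laden toy and not how Immich queries Postgres: the `cosine_similarity` and `smart_search` helpers and the three-dimensional vectors are hypothetical stand-ins for CLIP embeddings and for the vector index provided by pgvecto.rs.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm


def smart_search(query_embedding, asset_embeddings, limit=3):
    """Rank asset IDs by similarity to the query embedding, best match first.

    `asset_embeddings` maps a hypothetical asset ID to its embedding; a real
    deployment would use an indexed nearest-neighbor query instead of a full sort.
    """
    ranked = sorted(
        asset_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [asset_id for asset_id, _ in ranked[:limit]]
```

Because the text query and the assets must be embedded by the same model for the similarities to be meaningful, switching models means every asset embedding has to be regenerated, which is consistent with the instruction above to re-run Smart Search on all assets after changing the model.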