Highlight text containing diacritics on search results #3758

Open
opened 2026-02-05 07:20:57 +03:00 by OVERLORD · 8 comments
Originally created by @athoik on GitHub (Apr 22, 2023).

Describe the Bug

Searching text with or without diacritics works great! 👍

However, highlighting only works when getMatchPositions finds an exact match:

https://github.com/BookStackApp/BookStack/blob/a46b438a4c5dc52c8592aec681473c858cfdbd27/app/Search/SearchResultsFormatter.php#L92

So a term like δοκιμή is only highlighted if it is entered exactly as written on the page. Searching for δοκιμη returns results, but nothing is highlighted in them.

The following patch fixes the issue by using transliterator_transliterate to convert text to lower case without diacritics.
It requires the php-intl package to be installed (e.g. apt-get install php8.2-intl).

diff --git a/app/Search/SearchResultsFormatter.php b/app/Search/SearchResultsFormatter.php
index 9cbc5ee6..6bbab29a 100644
--- a/app/Search/SearchResultsFormatter.php
+++ b/app/Search/SearchResultsFormatter.php
@@ -84,11 +84,11 @@ class SearchResultsFormatter
     protected function getMatchPositions(string $text, array $terms): array
     {
         $matchRefs = [];
-        $text = mb_strtolower($text);
+        $text = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; Lower; NFC;', $text);

         foreach ($terms as $term) {
             $offset = 0;
-            $term = mb_strtolower($term);
+            $term = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; Lower; NFC;', $term);
             $pos = mb_strpos($text, $term, $offset);
             while ($pos !== false) {
                 $end = $pos + mb_strlen($term);

I believe the above change will work universally for all languages with diacritics.

Please consider accepting that change, if you believe it will improve BookStack.

Thanks!
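
For readers who want to see what the rule string in the patch does on its own, here is a minimal sketch. It assumes the intl extension is loaded; the sample words and the RULE constant name are only illustrative. The rule decomposes characters (NFD), drops nonspacing marks, lower-cases, then recomposes (NFC).

```php
<?php
// Illustration only: applies the rule string from the patch to a few sample words.
// Requires the intl extension; RULE is a name chosen for this sketch.
const RULE = 'NFD; [:Nonspacing Mark:] Remove; Lower; NFC;';

$samples = ['δοκιμή', 'Café', 'Smørrebrød'];

$folded = array_map(
    fn (string $word) => transliterator_transliterate(RULE, $word),
    $samples
);

foreach ($samples as $i => $word) {
    echo $word, ' => ', $folded[$i], PHP_EOL;
}
// δοκιμή => δοκιμη
// Café => cafe
// Smørrebrød => smørrebrød  (ø does not decompose into letter + mark, so it is kept)
```

As the last sample suggests, the rule covers diacritics that decompose into a base letter plus a combining mark; letters such as ø are left as they are.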

Steps to Reproduce

  1. Create a page that contains text δοκιμή (Greek word for test, with ή -> GREEK SMALL LETTER ETA WITH TONOS)
  2. Go to 'search'
  3. Type word δοκιμη (small letters without diacritics)
  4. Search results appear but text δοκιμή is not highlighted
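
The steps above can also be reproduced outside BookStack. The following is a small sketch, assuming only the mbstring extension; the sample sentence and variable names are made up for the illustration, and it mirrors just the lower-casing step visible in the diff.

```php
<?php
// Standalone reproduction; no BookStack code is involved.
// Assumes the mbstring extension and a UTF-8 source file.

$pageText   = 'Αυτή είναι μια δοκιμή.'; // page content, with tonos on the final eta
$searchTerm = 'δοκιμη';                 // what the user types, without tonos

// Mirrors the lower-casing applied before matching in the unpatched code.
$haystack = mb_strtolower($pageText);
$needle   = mb_strtolower($searchTerm);

$pos = mb_strpos($haystack, $needle);

// ή (U+03AE) and η (U+03B7) are different code points, so lower-casing alone
// never makes them equal and no match position is found.
var_dump($pos); // bool(false)
```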

Expected Behaviour

The text δοκιμή should be highlighted, since it was possible to search that text.

Screenshots or Additional Context

No response

Browser Details

No response

Exact BookStack Version

v23.02.3

PHP Version

8.2.5

Hosting Environment

Debian 11 with PHP 8.2 by @armando-femat

OVERLORD added the 🔨 Feature Request label 2026-02-05 07:20:57 +03:00

@esakkiraja100116 commented on GitHub (Apr 25, 2023):

The text δοκιμή is already highlighted as you expect, @athoik

image


@athoik commented on GitHub (Apr 26, 2023):

@esakkiraja100116 that is correct, you typed the word δοκιμή including diacritics.

Now try again, searching for the word δοκιμη without diacritics, and let me know if it gets highlighted.


@esakkiraja100116 commented on GitHub (Apr 26, 2023):

Yes, it's highlighted. Can you provide a screenshot like this, @athoik?

image


@athoik commented on GitHub (Apr 26, 2023):

When searching for δοκιμη, δοκιμή should also be highlighted (that's what the patch does)
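
To make the intended result concrete, here is a rough sketch of locating both spellings once text and term are folded the same way. It assumes intl and mbstring are available. foldForMatching and findFoldedOffsets are hypothetical helpers written for this example, not BookStack's getMatchPositions.

```php
<?php
// Hypothetical helpers for illustration; requires the intl and mbstring extensions.

function foldForMatching(string $value): string
{
    // Same rule string as in the patch: strip combining marks and lower-case.
    return transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; Lower; NFC;', $value);
}

/** @return int[] character offsets of every occurrence of $term in the folded $text */
function findFoldedOffsets(string $text, string $term): array
{
    $text = foldForMatching($text);
    $term = foldForMatching($term);

    $offsets = [];
    if ($term === '') {
        return $offsets;
    }

    $pos = mb_strpos($text, $term);
    while ($pos !== false) {
        $offsets[] = $pos;
        // Continue searching after the end of the current match.
        $pos = mb_strpos($text, $term, $pos + mb_strlen($term));
    }

    return $offsets;
}

$offsets = findFoldedOffsets('δοκιμη και δοκιμή', 'δοκιμη');
print_r($offsets); // offsets 0 and 11: both spellings are found
```

Note that the offsets refer to the folded text. They line up with the original text here because each accented letter is a single precomposed character; text stored in decomposed form would be shorter after folding, so offsets could shift.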

image


@esakkiraja100116 commented on GitHub (Apr 26, 2023):

  • May I know which patch?
  • Have you checked that you're using the latest version of BookStack?
  • You can also check with the official demo

Screenshot from 2023-04-26 14-09-28


@ssddanbrown commented on GitHub (Apr 26, 2023):

@esakkiraja100116 I'm pretty sure your screenshots are showing the scenario that @athoik is trying to address here. I believe they'd desire both instances of the term in your screenshot to become bold, not just the last.


Thanks for investigating and providing a patch @athoik.
I'm going to reclassify this as a feature request, since it's not a break/bug in existing supported behaviour (I didn't really know this was a thing) but a request to specifically support diacritics here.

In regards to the patch, I'm not too keen on adding a new system requirement just to meet what is mostly a minor presentational feature (with a little functional purpose). We could conditionally do this based upon extension existence, but not sure if that's a route I'd want to take. I'll have to ponder upon options for this.
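
As a rough idea of what the "conditional on extension existence" route could look like, here is a sketch under assumptions. normalizeForHighlight is a hypothetical name and nothing here is existing BookStack code. Without intl it falls back to the current lower-case-only behaviour.

```php
<?php
// Sketch only: hypothetical helper, not part of BookStack.

function normalizeForHighlight(string $value): string
{
    // Use the intl transliterator only when the extension is actually loaded.
    if (function_exists('transliterator_transliterate')) {
        $folded = transliterator_transliterate('NFD; [:Nonspacing Mark:] Remove; Lower; NFC;', $value);
        if ($folded !== false) {
            return $folded;
        }
    }

    // Without intl (or if transliteration fails) keep today's behaviour.
    return mb_strtolower($value);
}

echo normalizeForHighlight('ΔΟΚΙΜΗ'), PHP_EOL; // δοκιμη, with or without intl
```

With this shape, installs without intl would behave exactly as they do now, and diacritic-insensitive highlighting would only appear where the extension happens to be present.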


@athoik commented on GitHub (Apr 26, 2023):

It's a really trivial feature, but really nice for Greek users (or other communities using diacritics), since it's common to search for words with or without diacritics.

Please feel free to include that feature in the hacks section! It might be useful for other people too.

If the php-intl package ever becomes a requirement, we can reconsider the addition.

Thanks a lot for your support! 👍


@esakkiraja100116 commented on GitHub (Apr 26, 2023):

Thanks for your clarification @ssddanbrown. I was confused by the label called bug. As you said, it's a feature request


Reference: starred/BookStack#3758