Search Results are completely Irrelevant #2318

Closed
opened 2026-02-05 03:38:37 +03:00 by OVERLORD · 8 comments
Owner

Originally created by @vampirismtrueblood on GitHub (Jul 8, 2021).

Describe the bug
When searching for anything, even with the EXACT string, search still displays completely wrong results. I will post screenshots of both my MediaWiki and BookStack to help show the difference between the algorithms.

Running version: BookStack v21.05.3 container (Yes, I added the fix term like % $searchterm %; it makes absolutely no difference with or without it)

Both my BookStack and MediaWiki are in absolute sync; both have the same exact set of 1790+ articles.

I ran php artisan bookstack:regenerate-search both BEFORE and AFTER adding % $searchterm %

BookStack will only give correct results if:

  1. The search is enclosed within double quotes: "search word(s)"
  2. The keywords are in the same exact order as in the actual page title

Steps To Reproduce
Steps to reproduce the behavior:

  1. Search for a page
  2. Click on "Enter" or search

Expected behavior
Pages with the top matching score should come up as the first results

Screenshots
Test ONE
Bookstack (completely irrelevant results)
image

Mediawiki (Perfectly accurate)
image

Test TWO (Notice the sequence of keywords used vs the actual page name)
Bookstack (Completely irrelevant results)
image

Mediawiki (Perfectly accurate again)
image

Test THREE (Exact Title Match)
Bookstack (Completely irrelevant results)
image

Mediawiki (Perfect match)
image

Your Configuration (please complete the following information):
BookStack v21.05.3 container

Additional context

OVERLORD added the 🛠️ Enhancement and 🏭 Back-End labels 2026-02-05 03:38:37 +03:00

@vampirismtrueblood commented on GitHub (Jul 8, 2021):

If you would, please take a peek at the MediaWiki search algorithms. I love BookStack; your visual editor is state of the art with some CSS tweaks and goes far beyond MediaWiki's, but the search functionality is equally important. I'm happy to run tests anytime and give feedback as quickly as possible.


@vampirismtrueblood commented on GitHub (Jul 8, 2021):

So I couldn't wait on someone to get back to me. I looked into the code and realized it's using score weights based on the page's title, description and type (book, chapter, shelf, etc.).

I finally got the search to behave accurately like MediaWiki, and here's my solution:

On the DB, clear the existing search index:
use bookstack;
delete from search_terms;

Edit the file app/Entities/Tools/SearchIndex.php:

vim app/Entities/Tools/SearchIndex.php

Change the value from 5 to 200 on lines 34 and 52 (line numbers as of v21.05.3),

so that it looks like this in both the public and the private function:
$nameTerms = $this->generateTermArrayFromText($entity->name, 200 * $entity->searchFactor);

Finally, run this command to regenerate the index with more logical weights:
php artisan bookstack:regenerate-search

I'm sure there are better ways to do it, but this was the fastest and the one that requires the fewest changes. Hopefully this helps someone too.
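
To illustrate why bumping that multiplier changes the ordering, here is a toy model in Python. It is only a sketch and not BookStack's actual code: the function names, the body weight of 1 and the sample pages are all made up for illustration, and the real index also multiplies by the entity's searchFactor, which is left out here.

```python
from collections import Counter

def index_entity(name, body, name_weight, body_weight=1):
    """Build {term: score} for one entity; title terms get name_weight each."""
    scores = Counter()
    for term in name.lower().split():
        scores[term] += name_weight
    for term in body.lower().split():
        scores[term] += body_weight  # hypothetical body weight
    return scores

def search(query, entities, name_weight):
    """Rank entity names by the summed score of the query terms, best first."""
    terms = query.lower().split()
    ranked = []
    for name, body in entities:
        index = index_entity(name, body, name_weight)
        ranked.append((sum(index[t] for t in terms), name))
    ranked.sort(key=lambda pair: pair[0], reverse=True)
    return [name for _score, name in ranked]

# Made-up sample data: one page with the exact title, one page that only
# repeats the words many times in its body.
entities = [
    ("Backup Policy", "how we store copies offsite"),
    ("Meeting Notes", " ".join(["backup"] * 10 + ["policy"] * 10)),
]

low = search("backup policy", entities, name_weight=5)
high = search("backup policy", entities, name_weight=200)
print(low)   # body repetition outweighs the title match
print(high)  # the exact-title page comes first
```

With a title weight of 5 the page that merely repeats the words in its body wins; with 200 the exact title match dominates, which matches what I saw after regenerating the index.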


@abulgatz commented on GitHub (Aug 6, 2021):

This greatly improves the search! @vampirismtrueblood can you explain what weights this changes and how you figured this out?


@bensulli commented on GitHub (Oct 12, 2021):

This has become a growing issue as our wiki has grown with our company. We have a lot of pages now, but the search never seems to find or prioritize the results we expect. There are times when I'll search for exactly the title of a page and it won't show up in the results (or at least not on the first page).

Happy to provide any details to @ssddanbrown directly if it'll help, but can't post an example here due to confidentiality.


@ssddanbrown commented on GitHub (Oct 13, 2021):

Just to confirm my view on this, I'm well aware the search system needs some attention. Over the last few years I've spent some time attempting revamps of the system or exploring alternative options, but failed each time. Those attempts were meant to address a wider set of issues (such as translation handling). I've since realised it would be more worthwhile to just improve upon what we have, so I plan to spend some time over the next few months improving a range of search elements.


@bensulli commented on GitHub (Oct 13, 2021):

Awesome to hear! Please don't hesitate to reach out if I can lend a hand with testing (not much of a coder unfortunately, but I can deploy and test it against our real-world data). I've brought BookStack to three different organizations now because I sing its praises at every new job :)


@vampirismtrueblood commented on GitHub (Nov 13, 2021):

explain

I looked into the search class as well as the DB, found the link between them and went from there. It's what I'm currently using; it's still not quite on par with MediaWiki, but it is very much usable with the workaround I mentioned above (it's quick and short).


@ssddanbrown commented on GitHub (Nov 13, 2021):

I've now made a range of changes to the search indexing & scoring system as part of PR #3043.

Part of this was adjusting up the title score, although I have not upped this as drastically as above (from 5 to 40, instead of the 200 above). I want to tread a bit more carefully, being cautious of how this will interact with the other changes.

Some other bits in the PR that specifically address result scoring/ranking:

  • Per-term scoring is adjusted based on instance-wide frequency of that term.
  • Terms within headers, within page content, now receive a boost.
  • Applied tag names and values will now be considered as part of the scoring.

Hopefully the combination of these changes will make a significant difference. They will all be part of the next feature release, so I will therefore close this issue off.
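
For anyone curious how those scoring ideas fit together, below is a rough Python sketch. It is not the code from the PR: the function names, the header, tag and body weights and the log-based frequency adjustment are assumptions picked purely for illustration. Only the title weight of 40 comes from the comment above.

```python
import math

TITLE_WEIGHT = 40   # value mentioned above
HEADER_WEIGHT = 3   # hypothetical boost for terms inside headers
TAG_WEIGHT = 5      # hypothetical weight for tag names and values
BODY_WEIGHT = 1     # hypothetical weight for plain page content

def raw_term_score(term, page):
    """Sum the weighted occurrences of a term across a page's fields."""
    score = TITLE_WEIGHT * page["title"].lower().split().count(term)
    score += HEADER_WEIGHT * sum(h.lower().split().count(term) for h in page["headers"])
    score += TAG_WEIGHT * sum(1 for tag in page["tags"] if term in tag.lower().split())
    score += BODY_WEIGHT * page["body"].lower().split().count(term)
    return score

def frequency_adjusted(score, term, instance_counts):
    """Scale a score down the more often the term appears instance-wide.

    The log2 divisor is an assumed formula, not the one used by BookStack.
    """
    count = instance_counts.get(term, 1)
    return score / math.log2(1 + count)

# Made-up page and instance-wide term counts.
page = {
    "title": "VPN Setup",
    "headers": ["Install the VPN client"],
    "tags": ["network vpn"],
    "body": "the vpn needs the client",
}
instance_counts = {"vpn": 3, "the": 1023}

vpn_score = frequency_adjusted(raw_term_score("vpn", page), "vpn", instance_counts)
the_score = frequency_adjusted(raw_term_score("the", page), "the", instance_counts)
print(vpn_score, the_score)
```

In this sketch a rare term found in the title, a header and a tag ends up far ahead of a very common word that only appears in the content, which is the general effect the listed changes aim for.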


Reference: starred/BookStack#2318