Search bar : ignore a list of words #2364

Closed
opened 2026-02-05 03:48:33 +03:00 by OVERLORD · 3 comments

Originally created by @Golfwingzero on GitHub (Aug 24, 2021).

When using the search bar, all words are taken into account, which can in some cases skew the results. If a user includes a common word in their search, it may well return the entire documentation. In some cases, the relevant result will not even appear on the first page.

Example from the demo site: searching _truth for truths_ returns 15 pages. Searching _truth truths_ returns only one page (in this example, the expected page still appears first, but this isn't always the case).

Proposed solution:
It would be useful to have an option in the settings to add a list of terms that the search should ignore, such as common one- or two-character words that may appear in many pages.


@ssddanbrown commented on GitHub (Aug 24, 2021):

Thanks for the suggestion @Golfwingzero.

To be honest, I don't really like the idea of having an ignore list, especially one a user has to maintain. It works against the default ease of use I aim for and could open up some confusing usage scenarios (someone legitimately wanting to find an ignored word, for example).

Ideally, we should weight the search terms according to their frequency in content, so that infrequent words have a greater influence on results without needing manual intervention.
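As a rough illustration of that weighting idea (not BookStack's actual implementation; the function names, sample pages, and the smoothed inverse-frequency formula are all assumptions made for this sketch), each term can be given a weight that shrinks as more pages contain it:

```python
import math
from collections import Counter


def term_weights(pages: dict[str, str]) -> dict[str, float]:
    """Give each term a weight that shrinks as more pages contain it."""
    page_count = len(pages)
    # Count the number of pages in which each term appears at least once.
    doc_freq = Counter()
    for text in pages.values():
        doc_freq.update(set(text.lower().split()))
    # Smoothed inverse document frequency: rare terms get larger weights.
    return {
        term: math.log((1 + page_count) / (1 + df)) + 1.0
        for term, df in doc_freq.items()
    }


def score(query: str, text: str, weights: dict[str, float]) -> float:
    """Sum the weights of the distinct query terms that occur in the page."""
    words = set(text.lower().split())
    terms = set(query.lower().split())
    return sum(weights.get(term, 0.0) for term in terms if term in words)


# Hypothetical sample content: "for" appears on every page.
pages = {
    "truths": "truth for truths",
    "setup": "guide for installing",
    "backup": "steps for backup",
}
weights = term_weights(pages)
ranked = sorted(
    pages,
    key=lambda name: score("truth for truths", pages[name], weights),
    reverse=True,
)
```

With this sample data, "for" still matches every page but contributes the smallest possible weight, so the page containing the rarer words "truth" and "truths" ranks first without any ignore list.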


@Golfwingzero commented on GitHub (Aug 24, 2021):

I see your point @ssddanbrown. In this particular case though, telling my users not to use short common words such as "of" or "for" in their searches goes against their instinct when those words are part of the target page's title.

Since cases where you'd actually need short linking words such as these to find your page are likely rare, maybe they could be searched only when inside quotation marks, and ignored otherwise? To use my previous example again, you'd search _"truth for truths"_ if you actually wanted to include "for" in your search expression.

Your idea of giving more weight to less frequent words would also probably solve the issue. Either way, hopefully a solution will be implemented, because unfortunately those words break a number of common searches for my user base.

Thank you.

Edit: I've been thinking more about this and I really can't think of a case where a single "a" or "of", outside an expression in quotation marks, could be a relevant or useful search term. Including such words will always add unwanted results to a search.
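The quotation-mark suggestion above could be sketched roughly like this. It is only an illustration: `IGNORED_WORDS` and `parse_query` are hypothetical names, the word list is made up, and nothing here reflects how BookStack actually parses searches.

```python
import re

# Hypothetical list of short linking words; not taken from BookStack.
IGNORED_WORDS = {"a", "an", "of", "for", "the", "to"}


def parse_query(query: str) -> tuple[list[str], list[str]]:
    """Split a query into exact phrases (quoted) and loose terms (unquoted)."""
    # Quoted expressions are kept whole, including any short words inside.
    phrases = re.findall(r'"([^"]+)"', query)
    # Remove the quoted parts, then drop ignored words from what is left.
    remainder = re.sub(r'"[^"]*"', " ", query)
    terms = [w for w in remainder.lower().split() if w not in IGNORED_WORDS]
    return phrases, terms


unquoted = parse_query("truth for truths")
quoted = parse_query('"truth for truths"')
```

Here the unquoted search drops "for" and keeps only "truth" and "truths" as loose terms, while the quoted search keeps the whole expression as a single exact phrase.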


@ssddanbrown commented on GitHub (Nov 13, 2021):

As part of #3043, specifically 7405613f8d, term frequency will now be checked and considered as part of the scoring for search results. This will be part of the next feature release. Hopefully these changes will ensure there's little need for the originally requested ignore list, so I'll close this off.


Reference: starred/BookStack#2364