mirror of
https://github.com/BookStackApp/BookStack.git
synced 2026-02-05 00:29:48 +03:00
Search has issues with words adjacent to punctuation characters #1709
Closed · opened 2026-02-05 01:40:45 +03:00 by OVERLORD · 9 comments
Labels
🏭 Back-End
Originally created by @Knaui on GitHub (May 5, 2020).
For example, it won't find "house" in "big-house", but it will find "big".
This is the case for book or page titles and for page content.
Tested with BookStack version v0.29.0.
@kshitijsharma97 commented on GitHub (Aug 4, 2020):
I also tried the same thing in my dev instance and had the same issue.
If I search for the first word, the page comes up, but if I search for the whole name with the hyphen, nothing appears in the search results.
However, if you pull the changes and update to BookStack v0.29.3, the issue with this hyphen-separated search is resolved.
@ssddanbrown commented on GitHub (Jul 12, 2021):
Updating the title to be more generic in the interest of merging down some issues.
Related to #1037
@Wookbert commented on GitHub (Jul 23, 2021):
@ssddanbrown
I’ve just realized that searching for word parts which are combined through hyphens doesn't work either.
Example: Searching for `historian` does not find the page on `CCU-Historian`, while searching for `ccu` does. Note that hyphens are a very common element in, for instance, the German language. You often have word combinations which are connected through 2 or even 3 hyphens.
An English language example would be `Remote-robot-assisted`, which IMO should be retrieved when searching for any of the three words individually, but also e.g. `robot-assisted`, `robot assisted` or `robotassisted`. (Same applies for any spelling of the `Remote robot` combination).
@dweinerATL commented on GitHub (Aug 5, 2021):
@ssddanbrown we are running into something similar. Running BookStack v21.05.4 for a science fiction author's book series. One of her races is called `Ke!endarian`. If you search for `Ke!endarian`, no results. If you search for `Kel`, you get the expected response. We have found that the search will work if you search for `"Ke!endarian"` however.
@ssddanbrown commented on GitHub (Nov 13, 2021):
As part of #3043 I've made a change to auto-convert any search terms that would experience this issue into exact match terms instead, which will run a direct, although less efficient, content match. This doesn't directly solve the issue but should provide a much better user experience in such situations. Will be part of the next feature release.
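A minimal sketch of the auto-conversion idea described above, written in Python rather than BookStack's PHP, and not the project's actual implementation. The delimiter set `ASSUMED_DELIMITERS` and the function name `split_search_terms` are assumptions made for illustration. The idea shown: a term containing a character the index splits on can never equal a single indexed token, so it is routed to an exact content match instead.

```python
# Hypothetical set of characters the index splits on; NOT BookStack's real list.
ASSUMED_DELIMITERS = set("!?,;:()[]{}<>")


def split_search_terms(query: str) -> dict[str, list[str]]:
    """Sort whitespace-separated terms into indexed and exact-match groups."""
    result: dict[str, list[str]] = {"indexed": [], "exact": []}
    for term in query.split():
        if any(ch in ASSUMED_DELIMITERS for ch in term):
            # The term spans an index delimiter, so fall back to a direct
            # (slower) content match on the whole string.
            result["exact"].append(term)
        else:
            result["indexed"].append(term)
    return result


print(split_search_terms("Ke!endarian history"))
# {'indexed': ['history'], 'exact': ['Ke!endarian']}
```

Under these assumptions, a query such as `Ke!endarian` behaves as if the user had typed it in quotes.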
@caius-martinus commented on GitHub (Oct 18, 2023):
Hello @ssddanbrown,
I think the issue isn't solved, at least in 23.08.2. Here is how to reproduce: create a page with the content `/abc123` on a single line. Now search `abc1` and you should observe it doesn't match. However `/abc1` would.
@sNiXx commented on GitHub (Nov 21, 2023):
I can confirm this issue is still present on 23.10.2. I also just verified on the demo instance (currently 23.10.4) and hyphenated words are not correctly found. For instance, the pages prod-linode-sparkjet or dev-internal-sparklebike on the demo instance cannot be found if the last term (i.e. sparkjet or sparklebike) is used to search.
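One possible way to picture this behaviour, as a Python sketch rather than BookStack's actual code. It assumes that search matches by prefix against indexed terms (prefix matching is mentioned later in this thread) and that the hyphen is not treated as an index delimiter, so each hyphenated name is stored as a single term. The function name `prefix_search` and the index contents are hypothetical.

```python
def prefix_search(query: str, indexed_terms: list[str]) -> list[str]:
    """Return indexed terms that start with the query (case-insensitive)."""
    q = query.lower()
    return [t for t in indexed_terms if t.lower().startswith(q)]


# Assumed index content when "-" is not treated as a delimiter.
index = ["prod-linode-sparkjet", "dev-internal-sparklebike"]

print(prefix_search("prod", index))      # ['prod-linode-sparkjet']
print(prefix_search("sparkjet", index))  # []
```

Only the leading part of each stored term can be hit by a prefix match, which is consistent with the first word being found while the last one is not.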
@watschi commented on GitHub (Jan 8, 2025):
Facing the same issue with hyphenated words, which are pretty common in German text.
Quick and dirty solution (needs to be applied after any update):
- `app/Search/SearchIndex.php`, add a hyphen (`-`) to `$delimiters` (at Link)
- `php artisan bookstack:regenerate-search`
- `Test-Word`, `Test`, `Word` and `Test-Word` will return the desired content
@ssddanbrown Any reason to exclude `-` from the delimiters? Feels like this should be included by default, maybe it's an oversight, maybe I'm missing something 🙂
@ssddanbrown commented on GitHub (Feb 14, 2025):
@watschi
Really it was because they felt more part of a term rather than something to split them by, but I can see the issue that would result.
I spent some time on this today to change up the indexing a bit via #5488.
I've tried to come to a compromise to help address some of the most problematic areas, in addition to adding `-` as a delimiter.
Now, for the text `cat-dog`, BookStack will index that as `cat`, `dog` and `cat-dog`.
That way, searching for either word will work, but the full term will also work via our proper indexed term system.
The same is done for dots/periods (which I thought could be important for numbering among other things).
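The indexing compromise described above could be sketched roughly as follows. This is an illustration in Python, not the PHP code from #5488. The split into "hard" and "soft" delimiters, both character sets, and the function name `index_terms` are assumptions: hyphens and dots split a term into parts while the joined term is also kept.

```python
import re

# Assumed character classes; BookStack's real delimiter list may differ.
HARD_DELIMITERS = r"[\s,;:!?()\[\]{}]+"  # always split here
SOFT_DELIMITERS = r"[-.]"                # split here, but also keep the joined term


def index_terms(text: str) -> list[str]:
    """Return lower-cased index terms, keeping joined terms alongside their parts."""
    terms: list[str] = []
    for chunk in re.split(HARD_DELIMITERS, text.lower()):
        chunk = chunk.strip("-.")  # drop leading/trailing soft delimiters
        if not chunk:
            continue
        parts = [p for p in re.split(SOFT_DELIMITERS, chunk) if p]
        terms.extend(parts)
        if len(parts) > 1:
            terms.append(chunk)  # the full term, e.g. "cat-dog" itself
    return terms


print(index_terms("cat-dog"))       # ['cat', 'dog', 'cat-dog']
print(index_terms("Section 1.2."))  # ['section', '1', '2', '1.2']
```

With terms stored this way, a prefix match on `dog` or on `cat-dog` can both succeed against the same content.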
There will still be gaps and limitations in search due to the nature of trying to keep content indexed, using prefix matching, and the use of custom tokenization, but this should solve some of the most common issues reported here about hyphenated words.
Therefore I'm going to close this off but new focus areas can be raised as needed (If not already open).
The mentioned changes will be part of the next feature release.
Note that you'd need to regenerate the search index after updating to gain these index improvements.
Thanks all for your input!