[PR #3043] [MERGED] Search Engine Improvement #6124

Closed
opened 2026-02-05 10:25:02 +03:00 by OVERLORD · 0 comments
Owner

📋 Pull Request Information

Original PR: https://github.com/BookStackApp/BookStack/pull/3043
Author: @ssddanbrown
Created: 11/8/2021
Status: Merged
Merged: 11/13/2021
Merged by: @ssddanbrown

Base: master ← Head: search_improvements_a


📝 Commits (10+)

  • e1b8fe4 Refactored search runner a little to be neater
  • 9e0164f Further search system refactorings
  • b0b6f46 Reduced data retreived from database on page search
  • 7405613 Added search term score popularity adjustment
  • b3e1c7d Applied styleci fixes and pluck improvement as per larastan
  • bc472ca Improved relation loading during search
  • da17004 Added test to cover search frquency rank changes
  • 0ddd052 Added missing comments or types
  • 9f32613 Refactored search indexer, Increase title/name score boost
  • 820be16 Updated regen-search command to show some level of progress

📊 Changes

19 files changed (+811 additions, -149 deletions)


📝 app/Actions/Tag.php (+6 -0)
📝 app/Console/Commands/RegenerateSearch.php (+11 -2)
📝 app/Entities/Models/Book.php (+1 -1)
📝 app/Entities/Models/Bookshelf.php (+1 -1)
📝 app/Entities/Models/Chapter.php (+1 -1)
📝 app/Entities/Models/Entity.php (+1 -9)
📝 app/Entities/Models/Page.php (+2 -4)
📝 app/Entities/Repos/PageRepo.php (+1 -1)
📝 app/Entities/Tools/SearchIndex.php (+182 -37)
📝 app/Entities/Tools/SearchOptions.php (+39 -7)
➕ app/Entities/Tools/SearchResultsFormatter.php (+236 -0)
📝 app/Entities/Tools/SearchRunner.php (+191 -63)
📝 app/Http/Controllers/SearchController.php (+5 -7)
📝 database/seeders/LargeContentSeeder.php (+8 -3)
📝 resources/sass/_blocks.scss (+4 -0)
📝 resources/views/entities/list-item-basic.blade.php (+1 -1)
📝 resources/views/entities/list-item.blade.php (+3 -3)
📝 resources/views/entities/tag.blade.php (+4 -4)
📝 tests/Entity/EntitySearchTest.php (+114 -5)

📄 Description

This PR tracks ideas and progress for a series of improvements to be made for the search system.

Enhancements

  • Term relative frequency based ranking
    • Related: #2894, #2840
    • Adjust search scores based on the relative use of the given terms so that a common word has a lesser impact on search score compared to a relatively rarely-used word.
    • Implementation - 7405613f8d
    • Testing - da17004c3e
    • A lot of raw SQL is used here; double-check that everything is escaped correctly.
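  • Adjust existing scoring to boost titles/names further
    • Related: #2840
    • Implementation - 9f32613982
  • Content parsing for heading boosting
    • Boost terms used in heading formats (h1, h2, h3...). Need to check whether the performance hit from parsing is acceptable, including during a system re-index operation. (Maybe add a progress bar?)
    • Implementation - f28daa01d9
    • Testing
  • Match tags for standard terms
    • Related: #1577
    • Include tag names and values as indexed search terms so that they can be searched without specifically using a tag search.
    • Implementation - 99587a0be6
    • Testing
  • Auto-convert terms with parse delimiters to exact matches
    • Related: #2095, #2088
    • Currently, using a normal term for something containing term parse delimiters (e.g. 192.168.1.1) returns no results, since the search query used won't match any term in the database. Instead, such queries should be converted to exact searches for convenience.
    • Implementation - 7d0724e288
    • Testing
  • Highlight/show terms within search result listing
    • Related: #1891, #997
    • Show the terms highlighted within the preview content for search results. This will only be an estimate of how the result matched, but even an estimate should prove useful. Ideally tags should be highlighted too.
    • Implementation:
      • Text preview content - f30b937bb0
      • Item titles - ab4e99bb18
      • Tags - 339518e2a6
    • Testing - 63d8d72d7e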
Fixes & Tweaks

  • Added progress visibility on regen-search command. - 820be162f5
  • Prevent page HTML from being returned in the search query, for efficiency. - b0b6f466c1
  • Load relations in the query to prevent view-time lazy loading, which currently leads to an N+1 query situation. - bc472ca2d7
  • General code refactorings
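
The two query-related tweaks above (skipping the page HTML column and loading relations up front) can be illustrated with a small self-contained sketch. It uses Python's `sqlite3` with a made-up two-table schema and a hypothetical `CountingDb` helper; it is not BookStack's Eloquent code, only a demonstration of the N+1 pattern and its batched alternative.

```python
import sqlite3


def make_db():
    """Build an in-memory database with a hypothetical books/pages schema."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE books (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE pages (
            id INTEGER PRIMARY KEY, book_id INTEGER, name TEXT, html TEXT
        );
        INSERT INTO books VALUES (1, 'Networking'), (2, 'Servers');
        INSERT INTO pages VALUES
            (1, 1, 'Router setup', '<p>large body</p>'),
            (2, 1, 'Firewall rules', '<p>large body</p>'),
            (3, 2, 'Backups', '<p>large body</p>');
    """)
    return db


class CountingDb:
    """Wrap a connection and count how many queries are issued."""

    def __init__(self, db):
        self.db = db
        self.queries = 0

    def query(self, sql, params=()):
        self.queries += 1
        return self.db.execute(sql, params).fetchall()


def search_lazy(cdb):
    """N+1 pattern: one query for pages (HTML included), one per page for its book."""
    pages = cdb.query("SELECT id, book_id, name, html FROM pages ORDER BY id")
    results = []
    for _page_id, book_id, name, _html in pages:
        book = cdb.query("SELECT name FROM books WHERE id = ?", (book_id,))
        results.append((name, book[0][0]))
    return results


def search_eager(cdb):
    """Batched pattern: skip the HTML column and load all books in one query."""
    pages = cdb.query("SELECT id, book_id, name FROM pages ORDER BY id")
    book_ids = sorted({book_id for _, book_id, _ in pages})
    placeholders = ",".join("?" * len(book_ids))
    books = dict(cdb.query(
        f"SELECT id, name FROM books WHERE id IN ({placeholders})", book_ids
    ))
    return [(name, books[book_id]) for _, book_id, name in pages]


if __name__ == "__main__":
    lazy, eager = CountingDb(make_db()), CountingDb(make_db())
    assert search_lazy(lazy) == search_eager(eager)
    print(lazy.queries, eager.queries)  # 4 2
```

Both functions return the same listing; the lazy version issues one extra query per result, while the batched version stays at two queries regardless of how many pages match.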

Docs

  • Need to advise users to run the search regen command so that content is re-indexed according to the new scoring.

🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.

## 📋 Pull Request Information

**Original PR:** https://github.com/BookStackApp/BookStack/pull/3043
**Author:** [@ssddanbrown](https://github.com/ssddanbrown)
**Created:** 11/8/2021
**Status:** ✅ Merged
**Merged:** 11/13/2021
**Merged by:** [@ssddanbrown](https://github.com/ssddanbrown)

**Base:** `master` ← **Head:** `search_improvements_a`

---

### 📝 Commits (10+)

- [`e1b8fe4`](https://github.com/BookStackApp/BookStack/commit/e1b8fe45b0271e66adbcc06c7d75ddb1a80b4556) Refactored search runner a little to be neater
- [`9e0164f`](https://github.com/BookStackApp/BookStack/commit/9e0164f4f45cb68f9dccf96db28c8b05ed493be7) Further search system refactorings
- [`b0b6f46`](https://github.com/BookStackApp/BookStack/commit/b0b6f466c18f88ba0474778624195e1719c82532) Reduced data retreived from database on page search
- [`7405613`](https://github.com/BookStackApp/BookStack/commit/7405613f8d800999713f14f125bacd1132e14818) Added search term score popularity adjustment
- [`b3e1c7d`](https://github.com/BookStackApp/BookStack/commit/b3e1c7da73a5a5279b84d16b2efd170f4b7702f9) Applied styleci fixes and pluck improvement as per larastan
- [`bc472ca`](https://github.com/BookStackApp/BookStack/commit/bc472ca2d7f0f01b035cb17a414c9e7eef9a5576) Improved relation loading during search
- [`da17004`](https://github.com/BookStackApp/BookStack/commit/da17004c3ee95a13afd1ea1b460ac2eae4262e87) Added test to cover search frquency rank changes
- [`0ddd052`](https://github.com/BookStackApp/BookStack/commit/0ddd0528181fde31e9d3a45f3ec5c2efaba44995) Added missing comments or types
- [`9f32613`](https://github.com/BookStackApp/BookStack/commit/9f3261398207d3c4d77d20f54ac160f61209c1e1) Refactored search indexer, Increase title/name score boost
- [`820be16`](https://github.com/BookStackApp/BookStack/commit/820be162f5bfb31f69f0122a61755fdd8623275f) Updated regen-search command to show some level of progress

### 📊 Changes

**19 files changed** (+811 additions, -149 deletions)

<details>
<summary>View changed files</summary>

📝 `app/Actions/Tag.php` (+6 -0)
📝 `app/Console/Commands/RegenerateSearch.php` (+11 -2)
📝 `app/Entities/Models/Book.php` (+1 -1)
📝 `app/Entities/Models/Bookshelf.php` (+1 -1)
📝 `app/Entities/Models/Chapter.php` (+1 -1)
📝 `app/Entities/Models/Entity.php` (+1 -9)
📝 `app/Entities/Models/Page.php` (+2 -4)
📝 `app/Entities/Repos/PageRepo.php` (+1 -1)
📝 `app/Entities/Tools/SearchIndex.php` (+182 -37)
📝 `app/Entities/Tools/SearchOptions.php` (+39 -7)
➕ `app/Entities/Tools/SearchResultsFormatter.php` (+236 -0)
📝 `app/Entities/Tools/SearchRunner.php` (+191 -63)
📝 `app/Http/Controllers/SearchController.php` (+5 -7)
📝 `database/seeders/LargeContentSeeder.php` (+8 -3)
📝 `resources/sass/_blocks.scss` (+4 -0)
📝 `resources/views/entities/list-item-basic.blade.php` (+1 -1)
📝 `resources/views/entities/list-item.blade.php` (+3 -3)
📝 `resources/views/entities/tag.blade.php` (+4 -4)
📝 `tests/Entity/EntitySearchTest.php` (+114 -5)

</details>

### 📄 Description

This PR tracks ideas and progress for a series of improvements to be made for the search system.

### Enhancements

- **Term relative frequency based ranking**
  - Related: #2894, #2840
  - _Adjust search scores based on the relative use of the given terms so that a common word has a lesser impact on search score compared to a relatively rarely-used word._
  - [x] Implementation - 7405613f8d800999713f14f125bacd1132e14818
  - [x] Testing - da17004c3ee95a13afd1ea1b460ac2eae4262e87
  - [x] A lot of raw SQL used here, Double check all is escaped correctly.
- **Adjust existing scoring to boost titles/names further**
  - Related: #2840
  - [x] Implementation - 9f3261398207d3c4d77d20f54ac160f61209c1e1
- **Content parsing for heading boosting**
  - _Boost terms used in header formats (h1, h2, h3...). Will need to see if performance hit from parsing is viable, Including when doing a system-re-index operation. (Maybe add a progress bar?)_
  - [x] Implementation - f28daa01d9d43d36c12b075bddca92be9e8f85e4
  - [x] Testing
- **Match tags for standard terms**
  - Related: #1577
  - _Include tag names and values as indexed search terms so that they can be searched without specifically using a tag search_
  - [x] Implementation - 99587a0be63556a6915ac2728d8236da2f61c288
  - [x] Testing
- **Auto-convert terms with parse delimiters to exact matches**
  - Related: #2095, #2088
  - _Currently using a normal term for something with term parse delimiters (eg. `192.168.1.1`) will have no results since the used search query won't match any term in the database. Instead we should convert such queries to exact searches for convenience._
  - [x] Implementation - 7d0724e2888f768149b425efcdc185a1c7a4be02
  - [x] Testing
- **Highlight/show terms within search result listing**
  - Related: #1891, #997
  - _We should show the terms highlighted within the preview content for search results. Will just be an estimate of how it matched but even an estimation should prove useful. Ideally should also highlight tags_
  - Implementation:
    - [x] Text preview content - f30b937bb02eea92c078ea9644e3b70bd63974d8
    - [x] Item titles - ab4e99bb187fb4273dcad2fa3c731ba46e49a585
    - [x] Tags - 339518e2a6ad1cee717d821afe9238d0ac9792ed
  - [x] Testing - 63d8d72d7ecdba31903bee4c2295f2d0a2149e0d

### Fixes & Tweaks

- [x] Added progress visibility on regen-search command. - 820be162f5bfb31f69f0122a61755fdd8623275f
- [x] Prevent page HTML being returned in search query for efficiency. - b0b6f466c18f88ba0474778624195e1719c82532
- [x] Load relations in query to prevent view-time loading currently leading to n+1 situation. - bc472ca2d7f0f01b035cb17a414c9e7eef9a5576
- [x] General code refactorings
  - e1b8fe45b0271e66adbcc06c7d75ddb1a80b4556
  - 9e0164f4f45cb68f9dccf96db28c8b05ed493be7

### Docs

- Need to advise about running the search regen command to re-index according to new scoring.

---

<sub>🔄 This issue represents a GitHub Pull Request. It cannot be merged through Gitea due to API limitations.</sub>
OVERLORD added the pull-request label 2026-02-05 10:25:02 +03:00
Reference: starred/BookStack#6124