Chinese search cannot find words in the middle of a sentence. #624

Open
opened 2026-02-04 21:28:46 +03:00 by OVERLORD · 23 comments

Originally created by @jasoncheng7115 on GitHub (Mar 31, 2018).

For Bug Reports

  • BookStack Version: v0.20.0

When the word I'm looking for is the first word, or there's a space in front of it, search works fine.
[screenshot i01: https://user-images.githubusercontent.com/30381035/38159052-a175106e-34d3-11e8-875d-304da7a1bbaa.png]

But if the word is in the middle of a sentence, it cannot be found.
[screenshot i02: https://user-images.githubusercontent.com/30381035/38159056-aad016fe-34d3-11e8-83b0-31db2080f243.png]

Is this related to how full-text retrieval works?

Thanks!


@alexwyl commented on GitHub (Mar 4, 2019):

The same problem exists in v0.25.1; I have just tried BookStack...


@lotustalk commented on GitHub (Mar 13, 2019):

The same problem exists in v0.24.3; I'm running it in Docker.


@derky1202 commented on GitHub (Sep 2, 2019):

Still the same problem in v26.4. Hope it can be solved. Thanks.


@sosize commented on GitHub (Sep 25, 2019):

You can use "成功" (with the quotes) to search. Maybe the word segmentation has a bug; hope it gets fixed.


@LeonLiuY commented on GitHub (Nov 7, 2019):

Confirmed this issue is still present in v0.27.5.
One of my team members is hesitating to adopt BookStack because of this. Would like to see it fixed.


@hlj commented on GitHub (Dec 11, 2019):

Hope this issue gets fixed soon.


@ssddanbrown commented on GitHub (Dec 12, 2019):

Sorry about this issue. It essentially stems from my unfamiliarity with non-English text.

At the moment BookStack splits page content into terms on certain characters, such as spaces and some punctuation. Those terms are stored in the database for indexing, and a normal search then checks a "Starts With" match against them.

As @sosize has mentioned, you can wrap a search in quotes, at which point BookStack will perform a "Contains" match against the content directly instead of the above "Starts With". This is not the default simply due to performance ("Starts With" searches can use indexes much more effectively than "Contains").

I'm not really sure how we could make the "Starts With" system work for such characters. Perhaps the search should default to a "Contains" search if such characters are found in a term?
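
To illustrate the difference, here is a minimal sketch using Laravel's query builder against the search_terms table described above (an illustration under those assumptions, not BookStack's actual code):

```php
use Illuminate\Support\Facades\DB;

$input = '橘';

// "Starts With": the trailing wildcard lets MySQL walk a B-tree index
// on `term`, so the lookup stays fast even with many indexed terms.
$startsWith = DB::table('search_terms')
    ->where('term', 'like', $input . '%')
    ->get();

// "Contains": a leading wildcard cannot use that index, so every row
// in search_terms would need to be scanned - hence the performance cost.
$contains = DB::table('search_terms')
    ->where('term', 'like', '%' . $input . '%')
    ->get();
```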


@sosize commented on GitHub (Dec 28, 2019):

@ssddanbrown Could this be made a config option, to select "Starts With" or "Contains" as the search type?

Even better would be full-text search.

Or how could the code be quickly modified?


@lishuai199502 commented on GitHub (Apr 2, 2020):

Can I replace all the "Starts With" matching with "Contains"? How should I modify the source code? Sorry, I'm a noob.


@lishuai199502 commented on GitHub (Apr 2, 2020):

Hi all, I fixed this problem in v0.28.3 by adding a '%' in SearchService.php.
In detail: in app\Entities\SearchService.php, around line 196, change
$query->orWhere('term', 'like', $inputTerm . '%');
to
$query->orWhere('term', 'like', '%' . $inputTerm . '%');
Just try it.


@0x9394 commented on GitHub (Aug 18, 2020):

Hi @ssddanbrown, can the above fix be merged into the source? After modifying SearchService.php I can now search both Chinese and English in the page body.


@chimin-roh commented on GitHub (Aug 19, 2022):

(I'm Korean and the same problems occur.)
I know this issue is closed, but I'll post some info in the hope it helps others in the future.
My BookStack version: v22.07.03

In app\Entities\Tools\SearchRunner.php, around lines 222 and 281:

※ To find terms in the middle, change
$query->orWhere('term', 'like', $inputTerm . '%');
to
$query->orWhere('term', 'like', '%' . $inputTerm . '%');

※ To sort correctly, change
$termQuery->orWhere('term', 'like', $term . '%');
to
$termQuery->orWhere('term', 'like', '%' . $term . '%');


@derky1202 commented on GitHub (Sep 17, 2022):

Nice job. Thanks!



@charlietag commented on GitHub (Jul 23, 2023):

I've made a PR to make this configurable in .env:

ENHANCE_SEARCH_BAR_COMPATIBILITY=false

Hope I've done it the right way.

#4393


@ssddanbrown commented on GitHub (Jul 23, 2023):

For me to properly look at addressing this, it would be useful if people could help me a little in understanding how the languages in question work. Apologies for my naivety on the subject.

  • In the Chinese language, does a single Chinese character generally map to a single word in Latin-based languages?
  • Is a single Chinese character generally the common unit for what would be searched?
  • How would multiple terms be joined in a single query? For example, if I made the search query orange cat in English, would the equivalent Chinese search query contain a space?
  • How does the above apply to other Asian languages such as Korean and Japanese?

@charlietag commented on GitHub (Jul 23, 2023):

Hi @ssddanbrown, thanks for helping to solve non-English languages.

I hope the following helps you understand what I'm trying to solve.

Assume a scenario like this:

Pages

My cat likes to eat orange.
But I want him to drink juice

In Chinese, it would be:

我的貓喜歡吃橘子
但是我要他喝果汁

Database table (search_terms)

In normal search mode, the query is designed as "starts with", because each value in the term column stores only one word. So it works fine for English:

My          | page
cat         | page
likes       | page
to          | page
eat         | page
orange      | page
But         | page
I           | page
want        | page
him         | page
to          | page
drink       | page
juice       | page

In Chinese, it would be stored in search_terms like this. As you can see, the term column stores multiple words in one value:

我的貓喜歡吃橘子 | page
但是我要他喝果汁 | page

English vs Chinese

My       <---> 我的
cat      <---> 貓
likes to <---> 喜歡
eat      <---> 吃
orange   <---> 橘子
But      <---> 但是
I        <---> 我
want     <---> 要
him      <---> 他
to drink <---> 喝
juice    <---> 果汁

What we actually prefer

But I'm not sure this is a good design at the indexing level:

我  | page
的  | page
貓  | page
喜  | page
歡  | page
吃  | page
橘  | page
子  | page
但  | page
是  | page
我  | page
要  | page
他  | page
喝  | page
果  | page
汁  | page

Re-design

I'm not good at the indexing area, so I have a question: why not just search the pages table using LIKE '%term%' and let the database deal with the indexing?
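
If it helps, here is a rough sketch (hypothetical, not BookStack's indexer; splitIntoTerms is a made-up name) of the indexing behaviour shown in the table above: keep splitting Latin text on whitespace, but break runs of Han characters into single-character terms:

```php
// Split text into index terms: whitespace-delimited words, with runs of
// CJK ideographs further split into one term per character.
function splitIntoTerms(string $text): array
{
    $terms = [];
    foreach (preg_split('/\s+/u', $text, -1, PREG_SPLIT_NO_EMPTY) as $chunk) {
        // Capture each Han character as its own piece; non-CJK runs
        // between them are kept whole as normal words.
        $parts = preg_split('/(\p{Han})/u', $chunk, -1,
            PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE);
        foreach ($parts as $part) {
            $terms[] = $part;
        }
    }
    return $terms;
}

// splitIntoTerms('我的貓喜歡吃橘子')
//   => ['我', '的', '貓', '喜', '歡', '吃', '橘', '子']
// splitIntoTerms('My cat likes 橘子')
//   => ['My', 'cat', 'likes', '橘', '子']
```

With one character per row, the existing "starts with" query would match single-character searches without any leading wildcard.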


@charlietag commented on GitHub (Jul 23, 2023):

Normal search

So if we search orange cat, in Chinese it would be 橘子 貓.

And since the search_terms table contains nothing like 橘子 貓, I will get nothing.

And if I search for the following, it will fail:

  • English (failed) - my users like to copy-paste to search for things...

    • range
    • at
  • Chinese (failed)

    • 橘
    • 貓

What I hope it would be

I hope I can search for things like the failed cases above.

Exact search

I can use exact search to achieve the purpose above:

  • English (success)

    • "range"
    • "at"
  • Chinese (success)

    • "橘"
    • "貓"

But general users will not remember to add quotes (") when searching.


@ssddanbrown commented on GitHub (Jul 23, 2023):

Thanks for the info @charlietag.

I have a question that why not just search from pages table using like '%term%'. And let database deal with index thing?

The database won't use indexes for queries like that. The search index is specifically built so that prefix-based matching can be performed while making use of database indexes. Additionally, "contains" matching, with how things are currently built, would significantly increase accidental matches of partially-included terms, and therefore impact the scoring.
Databases do often have full-text indexes for "contains" search (which BookStack used to use), but those have their own complications, and there's a reason we moved away from them.

My intention has been to alter how we split the terms for indexing and search, for different character ranges, much like you've suggested, but I just want to better understand how searches and words translate in different languages, hence my last comment.


I would still like to invite others, particularly those using other Asian languages, to answer my previous comment (https://github.com/BookStackApp/BookStack/issues/778#issuecomment-1646803819).
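
For anyone wanting to see the index behaviour being described, a quick sketch (the exact EXPLAIN output depends on MySQL version and table statistics):

```php
use Illuminate\Support\Facades\DB;

// Prefix match: with an index on `term`, MySQL can report a "range"
// scan over the index.
$prefixPlan = DB::select("EXPLAIN SELECT * FROM search_terms WHERE term LIKE '橘%'");

// Contains match: the leading wildcard typically reports a full table
// scan (type ALL). It would also hit unrelated terms, e.g. a search
// for 'at' matching 'cat' - the scoring problem mentioned above.
$containsPlan = DB::select("EXPLAIN SELECT * FROM search_terms WHERE term LIKE '%橘%'");
```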


@10935336 commented on GitHub (Jul 24, 2023):

For me to properly look at addressing this, it would be useful if people could help me a little in understanding how the languages in question work. Apologies for my naivety on the subject.

I'm not a language expert, so this answer may not be entirely accurate.

  • In the Chinese language, does a single Chinese character generally map to what is a single word in latin based languages?
In modern Chinese, most words are written with two or more characters.
https://en.wikipedia.org/wiki/Chinese_characters

But there are also some cases where a single character maps to a single Latin word.

i <--> 我
my <--> 我的
myself <--> 我自己 or 我本人 or 本人 or 独自
dog <--> 狗
cloud <-->  云
car <-->  车
  • Is a single Chinese character generally the common unit for what would be searched?

A search for a single Chinese character usually does not return useful results,
but sometimes people still search for a single character, like "cat" / "猫".

Here are some searches recorded by Google Analytics on my website:

美好的每一天  <--> wonderful everyday(a video game title)
官网  <--> official website
宣传片  <--> promo video
巨构  <--> megastructure
指令  <--> command
文化  <--> culture
新用户  <--> new user
服务器  <--> server
添加  <--> add
猫  <--> cat
个人利益  <--> personal benefit
公共事件  <--> public event
雨  <--> rain
  • How would multiple terms be joined in a single query? For example, If I made the search query for orange cat in English, would the equivalent Chinese search query contain a space?

Words are not separated by spaces in Chinese, Japanese, or Korean; unlike most languages, Chinese does not use spaces to divide a sentence into words.

When searching in Chinese, you would not use spaces to separate terms in a query. Instead, you would enter the characters for each term next to each other without spaces.

So search engines usually use a tokenizer to break a sentence into words:

"人人生而自由,在尊严和权利上一律平等"
“人人”, “生而”, “自由”, ",", "在", "尊严", "和", "权力", "上", "一律", "平等"
("all human beings", "born", "free", ",", "in", "dignity", "and", "rights", "on", "all", "equal")

"All human beings are born free and equal in dignity and rights"
"All human beings", "are born", "free", "and", "equal", "in", "digenity", "and", "rights"

In the example of orange cat, the Chinese could be 橘猫, 橘色猫, or 橘色的猫 (orange-colored cat).

orange  <--> 橘子(mandarin orange) or 橙子 or 橙色(orange color)
cat  <-->  猫
methoxymethane   <-->  二甲醚 or 甲氧基甲烷

two <--> 二
methyl ether <--> 甲醚

methoxy <--> 甲氧基
methane <--> 甲烷

oxy <--> 氧基
alkyl <--> 烷
`甲` can mean a shell or armor, which is the external protective layer of an animal or a person. In this case, it can be translated as shell or armor

`甲` can mean the first of the ten heavenly stems, which is the first symbol in the cycle of ten celestial stems. In this case, it can be translated as the first of the ten heavenly stems or simply A.

`甲` can mean the first party in a list or a contract, which is the one that comes first. In this case, it can be translated as first (in a list, as a party in a contract, etc.).

So there seems to be no easy way to segment words.

To be honest, it is very difficult to search Sino-Tibetan languages well, so many applications I have seen choose Elasticsearch as their search engine.

Even in Elasticsearch, many people are not satisfied with the official tokenizer, and many other tokenizers have been created:

  • https://github.com/medcl/elasticsearch-analysis-ik
  • https://github.com/medcl/elasticsearch-analysis-stconvert
  • https://github.com/medcl/elasticsearch-analysis-pinyin
  • https://github.com/KennFalcon/elasticsearch-analysis-hanlp
  • https://github.com/elastic/elasticsearch-analysis-smartcn

Update:
This may be the solution you want. Jieba is a popular (32.7K stars) Chinese word segmentation component, and these are its PHP ports:

  • https://github.com/fukuball/jieba-php
  • https://github.com/cyd622/nlp-jieba

But it seems that jieba consumes a fair bit of memory; this module is more lightweight:

  • https://github.com/hightman/scws
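
A minimal sketch of what using jieba-php at index time might look like, based on that project's documented usage (the exact API and dictionary setup may vary by version, so treat this as an assumption):

```php
// jieba-php loads a large dictionary, so it recommends raising the
// PHP memory limit first.
ini_set('memory_limit', '1024M');
require_once 'vendor/autoload.php';

use Fukuball\Jieba\Jieba;
use Fukuball\Jieba\Finalseg;

Jieba::init();
Finalseg::init();

// Segment a sentence into words before writing them to search_terms.
// Output is dictionary-dependent; roughly: 人人 / 生而 / 自由 / ...
$terms = Jieba::cut('人人生而自由,在尊严和权利上一律平等');
```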


@matteotw commented on GitHub (Apr 23, 2024):

I also couldn't search Chinese words successfully (English keywords are OK). I have no experience in this area; I just guess it could be improved with something like an Asian-language parser:

https://docs-develop.pleroma.social/backend/configuration/howto_search_cjk/

https://pgroonga.github.io/


@kernelry commented on GitHub (Jul 1, 2024):

Version: v24.02.2
I think I solved the problem. Modify the code on line 213 of /var/www/BookStack/app/Search/SearchRunner.php.
Before modification:

   210	        $subQuery->where(function (Builder $query) use ($terms) {
   211	            foreach ($terms as $inputTerm) {
   212	                $inputTerm = str_replace('\\', '\\\\', $inputTerm);
   213	                $query->orWhere('term', 'like', $inputTerm . '%');
   214	            }
   215	        });

Only one result...
[screenshot: https://github.com/BookStackApp/BookStack/assets/19744542/9b50d985-0f18-43bd-a2f8-27bbaf344590]

After modification:

   210	        $subQuery->where(function (Builder $query) use ($terms) {
   211	            foreach ($terms as $inputTerm) {
   212	                $inputTerm = str_replace('\\', '\\\\', $inputTerm);
   213	                $query->orWhere('term', 'like', '%' . $inputTerm . '%');
   214	            }
   215	        });

Seven results!
[screenshot: https://github.com/BookStackApp/BookStack/assets/19744542/09a04fff-627e-4141-9ae1-5551e277d1b2]


@charlietag commented on GitHub (Jul 27, 2024):

Hi @kernelry

Actually, that's what I proposed to the author, but he has his own considerations. For now we can only work around it.

Let's hope it will be fixed in a future version.

https://github.com/BookStackApp/BookStack/pull/4393


@johnroyer commented on GitHub (Jan 31, 2025):

I take a look in search functionality. Parse text to tokens is good on English-liked languages. But it is not a good idea on CJK-liked (Chinese, Japanese, Korean) language, because they do not use spaces to separate words and phrases.

MySQL support full-text indexing (MATCH ... AGAINST), but it do not do really well on searching.

I would like to introduce Meilisearch as full-text indexing engine. Meilisearch use N-grams to generate tokens, better then using spaces. By using N-grams, Meilisearch support CJK languages.

I create an demo project with BookStack and Meilisearch: https://github.com/johnroyer/BookStack-Meilisearch . It use Meiliseach to show search suggestions.


@ssddanbrown : Search engine is a hard work. I hope you can put your time on implement functionalities, rather then search engine. Thanks a lot.
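
To illustrate the N-gram idea (a toy sketch with a made-up helper name, not Meilisearch's actual tokenizer): indexing every adjacent pair of characters lets a query like 橘子 match inside 吃橘子 without any dictionary:

```php
// Build character bigrams from a string: every adjacent pair of
// characters becomes an index term.
function cjkBigrams(string $text): array
{
    $chars = preg_split('//u', $text, -1, PREG_SPLIT_NO_EMPTY);
    $grams = [];
    for ($i = 0; $i < count($chars) - 1; $i++) {
        $grams[] = $chars[$i] . $chars[$i + 1];
    }
    return $grams;
}

// cjkBigrams('我的貓喜歡吃橘子')
//   => ['我的', '的貓', '貓喜', '喜歡', '歡吃', '吃橘', '橘子']
// A two-character query such as 橘子 is itself one bigram, so it
// matches the indexed terms exactly, wherever it occurs in a sentence.
```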
