mirror of
https://github.com/BookStackApp/BookStack.git
synced 2026-02-06 00:59:39 +03:00
BookStack indexing fails in Danswer - status 403: Forbidden #4314
Closed
opened 2026-02-05 08:31:01 +03:00 by OVERLORD
19 comments
Originally created by @LkySlevin on GitHub (Nov 12, 2023).
Attempted Debugging
Searched GitHub Issues
Describe the Scenario
@ssddanbrown First of all, thank you for BookStack and also for integrating it into Danswer. I have been using BookStack for a year now and want to integrate AI, so I want to connect it to Danswer. I already got a lot of help from the Danswer team, but I am not able to run the indexing of my BookStack wiki.
From the background task log I get:
I have created a user explicitly to act as a door opener for Danswer and used the API tokens created for it. I also tried with an already existing admin user, both with the same result. I already checked with the Danswer team in their Slack, but they also think it might be on the BookStack side of things. Any idea where I could dig further and how to proceed?
Exact BookStack Version
BookStack v23.01.1
Log Content
I don't think this error is related, because it appears only once in my logs (and is the only error within the past half year), while the indexing error persists for every indexing run (every 10 minutes). Nevertheless, I am posting it since it might help:
Hosting Environment
@ssddanbrown commented on GitHub (Nov 12, 2023):
Hi @LkySlevin,
For the created user, whose API credentials you're using here, do they belong to a BookStack role which has the "Access system API" role permission?
@LkySlevin commented on GitHub (Nov 17, 2023):
Hi @ssddanbrown ,
Yes, they do.
I followed the instructions from Danswer - Bookstack Connector Guide.
So I created a new user called "DanswerUser" and I also created the role "Danswer". This role has the "Access system API" permission. I then created an API token and entered it with the corresponding secret in the Danswer Admin Panel.
These are the user role's asset permissions:

Maybe it has something to do with the base URL? But this is the one within the .env file and it is the page visible after login.
https://thriving-green.com/bookstack/public/
No idea, what could be the issue...
@ssddanbrown commented on GitHub (Nov 18, 2023):
It could likely be due to the base URL, or how your setup is handling URLs in general.
You could try removing the trailing slash of the base URL in danswer.
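To illustrate why a trailing slash can matter, here is a minimal Python sketch. It assumes, purely for illustration, that a client builds endpoint URLs by plain string concatenation; whether Danswer does this is not confirmed anywhere in this thread. The function names `naive_api_url` and `api_url` are hypothetical.

```python
def naive_api_url(base_url: str, path: str) -> str:
    """Plain concatenation: a trailing slash on base_url produces '//'."""
    return base_url + path


def api_url(base_url: str, path: str) -> str:
    """Join base URL and path with exactly one slash between them."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


if __name__ == "__main__":
    base = "https://thriving-green.com/bookstack/public/"
    print(naive_api_url(base, "/api/books"))  # contains a double slash before "api"
    print(api_url(base, "/api/books"))        # single slash before "api"
```

Some web servers or rewrite rules treat the double-slash form differently from the single-slash form, which is why removing the trailing slash is a cheap thing to rule out.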
Also, try going to https://thriving-green.com/bookstack/public/api/docs.json in the browser while logged in as an API-allowed user to see if that endpoint works and returns JSON.
Having /public/ in the URL like that indicates a sketchy setup though, likely with workarounds or edits at play. You should really never need to have /public/ be part of the URL if set up properly.
@LkySlevin commented on GitHub (Nov 18, 2023):
That actually works fine. I can see a json.
Does not change anything.
Well, it has been a while, and it was the first time I had set anything up like this using Apache and PHP. It also took me quite some time to SSH into siteground. So I would call myself a real freshman.
I think I remember playing around with the base URL until it worked but my memory could trick me here.
If you really think that could be the issue, then I would appreciate if you could help me put it in the right place.
@ssddanbrown commented on GitHub (Nov 18, 2023):
Since the JSON endpoint worked, we'll continue checking on API usage for now, can circle back to that after but not sure it'd be the issue since you can access the docs endpoint.
Next up is to validate the token and key works externally.
From a terminal window, or powershell window if on Windows, run:
But replace abc123 with the BookStack token ID, and def456 with the BookStack token secret.
What do you get in response?
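For readers who would rather avoid shell quoting differences between platforms, the same token check can be sketched in Python using only the standard library. This is an illustrative equivalent, not the command posted above: the function names `build_books_request` and `fetch_status` are hypothetical, and the `/api/books` endpoint and `Token <id>:<secret>` header format are taken from the curl attempts quoted later in this thread.

```python
import urllib.error
import urllib.request


def build_books_request(base_url: str, token_id: str,
                        token_secret: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request for the books listing."""
    url = base_url.rstrip("/") + "/api/books"
    headers = {"Authorization": f"Token {token_id}:{token_secret}"}
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_status(request: urllib.request.Request) -> int:
    """Send the request and return the HTTP status code, including 4xx/5xx."""
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises for error statuses; the code is still informative.
        return err.code


if __name__ == "__main__":
    # abc123 / def456 are placeholders for the real token ID and secret.
    req = build_books_request(
        "https://thriving-green.com/bookstack/public", "abc123", "def456")
    print(fetch_status(req))
```

Note that `urllib` sends its own default User-Agent, so a host-level filter may treat this request differently from a curl or browser request.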
@LkySlevin commented on GitHub (Nov 18, 2023):
Alright, I am quite new to curl requests and I tried the following.
I use a cmd window with admin rights on Windows.
Simply copy+pasting your code from above and entering the credentials does not work, since it is not executed completely but only the first two lines, resulting in
curl: (3) URL rejected: Bad hostname
Using it as a one-liner (which I'm pretty sure is wrong)
curl --request GET \ --url https//thriving-green.com/bookstack/public/api/books \ --header 'Authorization: Token gBW0xz0jTAutvbxAIr:VW6Dq6QvEbxXNS'
(I shortened the keys) leads to the response:
curl: (3) URL rejected: Bad hostname
curl: (6) Could not resolve host: https
curl: (3) URL rejected: Bad hostname
curl: (6) Could not resolve host: Token
curl: (3) URL rejected: Port number was not a decimal number between 0 and 65535
Finally, I searched the internet and adapted your code to
Using ^ and "" is the way to go on Windows, I guess, resulting in an automatic newline within the PowerShell asking for more?. The final part of the response translates to: The revocation function could not perform a revocation check for the certificate.
So I am not sure if I performed the curl command correctly, but I assume the last attempt was correct.
I really appreciate your work on BookStack and your help here!
@LkySlevin commented on GitHub (Nov 18, 2023):
Not sure if it helps but ChatGPT suggested to try:
Again I shortened the tokens here.
EDIT: I tried with new tokens from my admin user from bookstack, which has all permissions.
Without the --insecure option I also get the curl (35) error. However, using --insecure I get a proper response
@ssddanbrown commented on GitHub (Nov 18, 2023):
Okay, so that last attempt is working okay.
Does danswer work if you use those same (known working) details?
If you get the same error, the next thing I'd suggest checking is the webserver error/access logs to see if they provide any clues. If you're using some kind of management layer/system, you might have to refer to their docs for where to find those. Thinking it could be some security/access controls set on the site?
@LkySlevin commented on GitHub (Nov 19, 2023):
I tried with these credentials one more time (with and without trailing /) but with no luck.
Regarding my setup, our website thriving-green.com is hosted at siteground. That was done years ago by a colleague, who is not part of the organization anymore.
When I installed BookStack, I simply installed it here:
That is why public is part of the base URL I guess.
In the logfiles folder I checked the latest .gz file but did not find any clues - though you might know what to look for.
AFAIK I did not set up any particular layer/system or security/access controls in BookStack or the website setup. We are simply an NGO using BookStack as a wiki.
Would you suggest that I move the BookStack folder outside of the website? Can this be done without the risk of losing our current content? What would be the base URL then?
Thanks for your support!
@LkySlevin commented on GitHub (Nov 19, 2023):
Besides, have a look at my .env's content (I removed credentials)
@ssddanbrown commented on GitHub (Nov 19, 2023):
It really depends on what options you have in (what I assume to be) your management system (siteground).
It gets a bit more complex since you're wanting to serve on a sub-path too.
It's probably gonna take a lot more time to understand what hosting options you have, and walk through the process step-by-step.
I have some guidance here for a sub-directory setup, but it assumes web-server access. You may be limited by your hosting system.
I'm not sure it's the cause of the danswer issues though, since you can connect to the API directly.
Can you see error logs when following this guidance?
@LkySlevin commented on GitHub (Nov 20, 2023):
Well according to the error logs - there are no errors at all :D

So that is a dead end.
But what I can see is that we actually have different domains. The wiki domain contains an old wiki used several years ago. I think I will create a subdomain with BookStack and adapt the base URL. Do you think this is a good approach?
Regarding the API - don't you think it is strange that I can only get access with the admin account and not the others? And do you know why it only works with "insecure" settings?
@ssddanbrown commented on GitHub (Nov 20, 2023):
That is usually the easiest approach.
Ideally you'd need to be able to set your web root so you're only exposing the public folder, but I'm not sure SiteGround provide this, since they document this workaround.
Also, ideally you'd have command-line access to properly manage a BookStack instance. I advise against using BookStack in environments where this is not possible, otherwise you can't properly manage the instance.
Before attempting anything, make sure you have good backups though.
It is, but I'm not sure it's connected with the danswer issue, since you're specifically getting a 403 response there, rather than a connection error. The error you got with the non-admin user is quite specific to a non-matching API token scenario. Not sure how you'd see that error message without the token ID being wrong or badly formatted somehow.
The --insecure flag ignores issues with verification when attempting to make a https:// connection. You have a valid public cert on the website, but this can be thrown if there are issues from the client system where you're running the command (or anything in the middle like proxies, especially if using a company machine/internet).
It's still not clear why you're getting 403 errors from danswer, but siteground is responding with that exact 403 - Forbidden when certain paths are accessed, which makes me think a connection is being made but something's off at either the siteground side or BookStack side, or maybe the handling of URLs (could be customizations to make it serve on public).
The .htaccess of the BookStack public folder, and each .htaccess in the folders above, could be affecting things.
Shame there's nothing in the error logs, that seems wrong to be honest. You can try looking instead at the access logs, should be in the same kind of place. Just to see what they show when danswer attempts a connection.
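As a rough illustration of what the --insecure flag mentioned above changes, the two behaviours can be sketched with Python's standard ssl module. This is an analogy under stated assumptions, not a description of curl's internals; the function names `verifying_context` and `insecure_context` are hypothetical.

```python
import ssl


def verifying_context() -> ssl.SSLContext:
    """Default client behaviour: validate the certificate chain and hostname."""
    return ssl.create_default_context()


def insecure_context() -> ssl.SSLContext:
    """Roughly what --insecure does: skip certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


if __name__ == "__main__":
    for name, ctx in (("verifying", verifying_context()),
                      ("insecure", insecure_context())):
        print(name, ctx.verify_mode, ctx.check_hostname)
```

A verification failure happens while the TLS connection is being set up, before any HTTP request is sent, so it is a different class of problem from an HTTP 403 response.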
@LkySlevin commented on GitHub (Nov 21, 2023):
Thank you for the elaborate answer.
Correct me if I am misunderstanding you here, but I am able to SSH into siteground, that is how I set up BookStack in the first place. Although it took me quite some time to find a proper tutorial and get in.
So as there is no other solution in sight, I will try to move BookStack. My plan is actually to leave the current state (whole bookstack folder) where it is at the moment as my backup and also copy it to the new subdomain and update the base URL as described in .env using SSH.
I will report back what happened.
@ssddanbrown commented on GitHub (Nov 21, 2023):
Ah, okay, that should be fine then.
Was just worried since a lot of these managed-hosting systems don't provide access.
At some point, when the new version is active, you'll have to update the URLs in the database, for which we have a command. You'll also have to update the APP_URL in the new setup .env file.
Before running that command, you'll want to back up the database where possible, since the database will probably still be shared with the original instance (unless you've exported and re-imported into a different database entry or something).
@LkySlevin commented on GitHub (Nov 22, 2023):
So I did the following
php artisan bookstack:update-url https://thriving-green.com/bookstack/public https://bookstack.thriving-green.com
.env file but did not work but also showed me the 403 forbidden error.
https://bookstack.thriving-green.com/public I can log into BookStack again.
However, I tried Danswer and it failed again with the same message, and since I have to add the public we are at the same place as before, I would say.
I also checked the access logs from Siteground. Unfortunately I don't see anything happening in the logs when I run an index attempt with danswer. I only see the following when logging in and accessing a page:
(I altered the IP)
So I don't know if the access logs have any value for you. Besides, do you think that the workaround with the .htaccess will bring any benefit? I assume that since it is only redirecting, it might not change the situation.
@ssddanbrown commented on GitHub (Nov 23, 2023):
Alright, just done some more testing. Think I've got an idea of the cause.
This felt like something being blocked at host/webserver level (Siteground or the web-server they're running) since I'd see similar 403 - Forbidden responses when hitting certain endpoints.
From experience, I know some systems that attempt to do active security blocking can be unfriendly to default or empty user-agents (how browsers identify themselves to servers, but it's totally messy and most lie anyway).
Playing around with this via curl, I found this is in play and can block requests that are using the default python-requests user agent (which is the library used by Danswer to make requests). As an example:
The first example is completely being blocked at the Siteground/web-server level, so is not reaching BookStack.
The second is reaching BookStack but just then failing API auth (expected since I'm using made-up invalid token values).
Based on these tests, this is likely the cause.
If you don't have security controls for this (or anything related to User-Agent) then it'd be worth contacting siteground if possible to see if they can alter this rule.
Could alternatively ask for the user-agent to be changed on the danswer side, but I don't think they should have to make changes just to work with rules used by Siteground, best to do this Siteground side if possible.
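The User-Agent behaviour described above can be sketched in Python. This is a hedged illustration only: the exact rule SiteGround applies is not known from this thread, so the prefix check below is an assumption, and `CUSTOM_USER_AGENT`, `is_library_default`, and `build_request` are hypothetical names rather than anything from Danswer or BookStack.

```python
import urllib.request

# Prefixes of the User-Agent values HTTP libraries send when none is set.
LIBRARY_DEFAULT_PREFIXES = ("python-requests/", "Python-urllib/")

# Hypothetical replacement value; the thread does not record the string used.
CUSTOM_USER_AGENT = "bookstack-connector-example/0.1"


def is_library_default(user_agent: str) -> bool:
    """Return True if the value looks like an untouched library default."""
    return user_agent.startswith(LIBRARY_DEFAULT_PREFIXES)


def build_request(url: str,
                  user_agent: str = CUSTOM_USER_AGENT) -> urllib.request.Request:
    """Prepare (but do not send) a GET request with an explicit User-Agent."""
    return urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="GET")


if __name__ == "__main__":
    req = build_request("https://thriving-green.com/bookstack/public/api/books")
    # urllib normalises header names, hence the lowercase 'a' in the lookup.
    print(req.get_header("User-agent"))
    print(is_library_default("python-requests/2.31.0"))
```

Sending the same request once with a library-default value and once with a custom one, then comparing the responses, is a quick way to tell a host-level block apart from an API authentication failure.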
@LkySlevin commented on GitHub (Nov 24, 2023):
Finally the issue is resolved.
I contacted SiteGround support and they confirmed:
So I checked the Danswer repo and found under backend -> danswer -> connectors -> bookstack -> client.py
I rebuilt Danswer and retried with the new user-agent and it worked!

Thank you very much for your support, without you I would not have made it ;)
@ssddanbrown commented on GitHub (Nov 24, 2023):
Awesome news! Good to see things working for you!