Export to PDF hidden with display:none, but crawlers can still find and perform the export #2306

Closed
opened 2026-02-05 03:36:40 +03:00 by OVERLORD · 4 comments
Owner

Originally created by @mtrayn01 on GitHub (Jun 27, 2021).

Describe the bug
I use BookStack as the documentation repository for my SaaS site. I applied the solution from #1251, setting the Actions section to empty, to prevent users (all of them public users) from exporting to PDF.

I run BookStack on an AWS EC2 micro instance, a single-CPU, 1 GB RAM Linux server running Apache, which is powerful enough for normal operation. The problem is that crawlers are hitting the site (which I want) and are somehow finding the Export to PDF function and exporting every document to PDF. The load eventually exhausts the server's resources (the CPU maxes out) and crashes the server.

I can't find any configuration parameter to disable the Export to PDF functionality globally.

As a temporary workaround (not yet confirmed to fix the issue with bots), I renamed /public/dist/export_styles.css to _export_styles.css.

This causes an "Unknown Error" page to be displayed when I request the /export/pdf URL used by the crawlers.

How are crawlers finding this Action if it is set to display:none? Is there another/better way to completely disable export functions?

Steps To Reproduce
  1. Implement the change suggested in #1251.
  2. Open a book and append "/export/pdf" to the URL.

Expected behavior
  • It should not be possible to perform the action.
  • It should not even be possible to tell that this URL exists.

Screenshots
N/A

Your Configuration (please complete the following information):

  • BookStack v0.31.6
  • PHP Version: 7.4.3
  • Hosting Method (Nginx/Apache/Docker): Apache

Additional context
N/A

@ssddanbrown commented on GitHub (Jun 28, 2021):

> How are crawlers finding this Action if it is set to display:none?

They'll often (though not always) read the page source rather than rendering the page to see what's visible and what isn't.

You could block this at the Apache level instead, by adding something like the following to your Apache config:

```apache
RedirectMatch 404 ^.*/export/.*$
```
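
A slightly narrower variant of the same `RedirectMatch` approach is sketched below. It is an untested illustration, not an official BookStack setting. It assumes BookStack is served from the site root and that export URLs take the forms `/books/<book>/export/<format>`, `/books/<book>/chapter/<chapter>/export/<format>` and `/books/<book>/page/<page>/export/<format>`; the `ServerName` and `DocumentRoot` values are hypothetical placeholders. Unlike `^.*/export/.*$`, it should not catch ordinary routes of a page or chapter whose slug happens to be `export` (for example `/books/<book>/page/export/edit`).

```apache
# Hypothetical virtual host: ServerName and DocumentRoot are placeholders.
<VirtualHost *:80>
    ServerName docs.example.com
    DocumentRoot /var/www/bookstack/public

    # Assumed BookStack export routes (book, chapter and page level):
    #   /books/<book>/export/<format>
    #   /books/<book>/chapter/<chapter>/export/<format>
    #   /books/<book>/page/<page>/export/<format>
    # mod_alias: with a non-3xx status such as 404, no target URL is given.
    RedirectMatch 404 "^/books/[^/]+/((page|chapter)/[^/]+/)?export/[^/]+$"
</VirtualHost>
```

Because the rule sits in the server config, Apache answers with a 404 before the request is handed to PHP, so no PDF is generated. Apache needs a reload for the change to take effect.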

@mtrayn01 commented on GitHub (Jun 28, 2021):

Thanks @ssddanbrown. I agree this would send the bots to a 404 page. Do you think this is a better solution than renaming the export_styles.css file, which displays an "Unknown Error" page and returns a 500 error for the PDF? Would search engine results be hurt by serving error pages at the URLs the crawlers are trying to access, compared with removing the links to the export actions from the page source entirely?

From a product standpoint, I think there needs to be a setting that completely disables the export functions and removes every reference to the export URLs from the page source.

@ssddanbrown commented on GitHub (Jun 29, 2021):

> I think from a product standpoint, there needs to be a way to completely disable export functions from a setting that removes any references to the export URL from the source of the page.

It takes a bit more than that alone: to do it properly we'd have to disable export at the routing level, which is essentially already covered by #1251, so I'll close this off.

> Do you believe this is a better solution than renaming the export_styles.css file which results in an Unknown Error page to be displayed and a 500 error for the PDF file? Would the search engine results be negatively impacted by causing error pages in the URL they are trying to access versus removing links to the export actions completely from the source of the page?

A 404 would be better than an arbitrary error. An error page could potentially leak information (for example, if debugging is enabled while the site is being indexed), whereas a 404 describes more accurately what you want to achieve.
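
Where only BookStack's `public/.htaccess` can be edited rather than the virtual host, a similar clean 404 (instead of the 500 produced by renaming `export_styles.css`) could be returned with mod_rewrite. This is a sketch under assumptions: that `public/` is the document root, that `AllowOverride` includes `FileInfo` so rewrite rules are honoured in `.htaccess`, and that BookStack export URLs end in `/export/<format>` under `/books/`. The rule would need to sit above the existing front-controller rule that routes requests to `index.php`.

```apache
# Sketch for public/.htaccess; place above the existing index.php rewrite.
RewriteEngine On

# Assumed export routes, e.g. books/<book>/export/pdf or
# books/<book>/page/<page>/export/pdf. In .htaccess context the matched
# path has no leading slash.
# R=404: a non-3xx status drops the "-" substitution and stops rewriting,
# so Apache responds with 404 before PHP runs.
RewriteRule "^books/[^/]+/((page|chapter)/[^/]+/)?export/[^/]+$" - [R=404,L]
```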

@mtrayn01 commented on GitHub (Jun 29, 2021):

I have implemented the 404 redirect as recommended. However, I still think disabling export should be a setting, as a future product improvement. I see there are several other requests for this feature, so I just wanted to add my +1 to them.

Reference: starred/BookStack#2306