HTML export taking longer than 1 minute #4809

Open
opened 2026-02-05 09:17:43 +03:00 by OVERLORD · 10 comments

Originally created by @jonathon2nd on GitHub (Jun 3, 2024).

Describe the Bug

Attempting an HTML export fails after one minute with a 504 error.

Steps to Reproduce

Using either export-books.php or the UI
image

Attempt to generate an HTML export of a book.

Expected Behaviour

HTML would be downloaded.

Screenshots or Additional Context

The txt download is ~533kB

Log from console.

2024-06-03T18:51:46.963675560Z   % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
2024-06-03T18:51:47.105897343Z 
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  2572  100  2572    0     0  18402      0 --:--:-- --:--:-- --:--:-- 18503
2024-06-03T18:53:00.705073606Z PHP Warning:  file_get_contents(http://bookstack-service.wiki/api/books/28/export/html): Failed to open stream: HTTP request failed! HTTP/1.1 504 Gateway Time-out
2024-06-03T18:53:00.705112372Z  in /export-books.php on line 74

Browser Details

No response

Exact BookStack Version

v24.05.1

OVERLORD added the 🐛 Bug label 2026-02-05 09:17:43 +03:00

@jonathon2nd commented on GitHub (Jun 3, 2024):

Screenshot from 2024-06-03 13-13-01
The PDF export also times out.


@ssddanbrown commented on GitHub (Jun 4, 2024):

Hi @jonathon2nd,
Exports can take a while if there's a lot of content, and in rare cases specific content can trip up the export system and cause more work than expected.
Really, this is the kind of thing I'd need to replicate with the same content to actually test.

Do other books in the system also time out, even simple ones?
You could maybe clone the book and delete parts of it to help identify if it's mainly down to a specific page or collection of pages.
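One way to narrow this down without deleting content by hand is to export every page on its own and compare timings and output sizes. The sketch below is only an illustration: `time_page_exports` and `make_api_exporter` are hypothetical helpers, and the per-page endpoint `/api/pages/{id}/export/html` plus the `Token id:secret` header are assumptions modelled on the `/api/books/28/export/html` call seen in the log above.

```python
import time
import urllib.request
from typing import Callable, Iterable, List, Tuple


def time_page_exports(
    page_ids: Iterable[int],
    export_page: Callable[[int], bytes],
) -> List[Tuple[int, float, int]]:
    """Export each page on its own and record (page_id, seconds, bytes)."""
    results = []
    for page_id in page_ids:
        started = time.perf_counter()
        body = export_page(page_id)
        elapsed = time.perf_counter() - started
        results.append((page_id, elapsed, len(body)))
    # Largest outputs first, since oversized pages are the likeliest suspects.
    return sorted(results, key=lambda row: row[2], reverse=True)


def make_api_exporter(
    base_url: str, token_id: str, token_secret: str, timeout: float = 300.0
) -> Callable[[int], bytes]:
    """Build an exporter that calls the (assumed) per-page HTML export endpoint."""

    def export_page(page_id: int) -> bytes:
        request = urllib.request.Request(
            f"{base_url}/api/pages/{page_id}/export/html",
            headers={"Authorization": f"Token {token_id}:{token_secret}"},
        )
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()

    return export_page
```

Running `time_page_exports(ids, make_api_exporter(...))` against the affected book would show whether a handful of pages account for most of the exported bytes.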


@M0n7y5 commented on GitHub (Jun 4, 2024):

Check your logs. You may need to raise the memory limit (memory_limit) or the execution timeout (max_execution_time) in php.ini.


@jonathon2nd commented on GitHub (Jun 4, 2024):

@M0n7y5 Both had already been increased. I am now running into a Cloudflare timeout. There are no errors in the container logs.

@ssddanbrown No other books time out. Once the book is split up, we will export each part to see whether the problem comes from the content type rather than the size of the book.

The txt download is ~533kB
The md download is ~775kB


@jonathon2nd commented on GitHub (Jun 5, 2024):

The book has been refactored, but it still fails to export to HTML within 1 minute.

txt export size: ~150kB
md export size: ~250kB

Able to export each page individually
image


@M0n7y5 commented on GitHub (Jun 5, 2024):

You need to tell Cloudflare to wait longer for the server to respond. Cloudflare thinks the server is down while your book is being converted to PDF.


@M0n7y5 commented on GitHub (Jun 5, 2024):

Also one page taking 120MB is crazy ... What kind of content do you have on your pages?


@jonathon2nd commented on GitHub (Jun 5, 2024):

Lots of photos.

What's strange is that those couple of huge individual pages take no more than ~3 seconds. Most others were instant. So I'm not sure why the book export explodes.


@ssddanbrown commented on GitHub (Jun 8, 2024):

Yeah, 120MB is super high. If the pages export quickly on their own, that might indicate the book export is hitting some kind of memory limit or exhaustion, or just that the HTML is too large to handle without problems.
There might be a more efficient way for us to do the embed/parsing (placeholders, then simple string replacements at the end), but at those kinds of sizes I'd be surprised if other issues didn't pop up anyway.
The formats we produce aren't really great for high-image/data content tbh.
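To make the "placeholder then simple string replacements" idea concrete, here is a minimal Python sketch. It is not BookStack's implementation: the function names, the `@@IMGn@@` token format, and the regex-based `<img>` matching are all assumptions chosen for illustration. The point is that any heavy HTML processing runs on a small document containing tokens, and the large base64 payloads are spliced in once at the very end.

```python
import base64
import re
from typing import Callable, Dict, Tuple

# Simplified matcher for <img ... src="..."> (an assumption; real HTML needs a parser).
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=")([^"]+)(")', re.IGNORECASE)
TOKEN = re.compile(r"@@IMG\d+@@")


def swap_images_for_placeholders(html: str) -> Tuple[str, Dict[str, str]]:
    """Replace each distinct image URL with a short unique token."""
    tokens: Dict[str, str] = {}  # url -> token

    def to_token(match: "re.Match[str]") -> str:
        url = match.group(2)
        token = tokens.setdefault(url, f"@@IMG{len(tokens)}@@")
        return match.group(1) + token + match.group(3)

    return IMG_SRC.sub(to_token, html), tokens


def embed_images(
    html: str,
    tokens: Dict[str, str],
    load: Callable[[str], bytes],
    mime: str = "image/png",
) -> str:
    """Swap every token for a data URI in a single pass over the document."""
    uris = {
        token: f"data:{mime};base64," + base64.b64encode(load(url)).decode("ascii")
        for url, token in tokens.items()
    }
    # Unknown tokens are left untouched rather than raising.
    return TOKEN.sub(lambda m: uris.get(m.group(0), m.group(0)), html)
```

Each image is also encoded only once even if it appears on several pages, which avoids repeating the same base64 work across a book.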


@M0n7y5 commented on GitHub (Jun 18, 2024):

The issue here is that parsing HTML takes a lot of memory, and converting it to PDF is a CPU-intensive task because all of this is done in an old PHP library. PHP itself is just slow. I solved my issue by using https://gotenberg.dev/ and overriding the PDF export. It also solves a lot of weird issues with some Unicode content. It uses headless Chrome under the hood.
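For readers who want to try the same route, the sketch below posts an HTML document to a Gotenberg instance and returns the PDF bytes. Several things here are assumptions rather than details from this thread: the instance address `http://localhost:3000`, the Chromium route `/forms/chromium/convert/html` with a form file named `index.html`, and the helper names `build_multipart` and `html_to_pdf`. How the result gets wired into BookStack's PDF export is not covered.

```python
import urllib.request
import uuid
from typing import Tuple


def build_multipart(html: str) -> Tuple[bytes, str]:
    """Encode the HTML as a single multipart form file named index.html."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="files"; filename="index.html"\r\n'
        "Content-Type: text/html\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + html.encode("utf-8") + tail, f"multipart/form-data; boundary={boundary}"


def html_to_pdf(
    html: str,
    gotenberg_url: str = "http://localhost:3000",  # assumed local instance
    timeout: float = 300.0,
) -> bytes:
    """Send the HTML to Gotenberg's Chromium route and return the PDF bytes."""
    body, content_type = build_multipart(html)
    request = urllib.request.Request(
        f"{gotenberg_url}/forms/chromium/convert/html",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read()
```

The generous client timeout only helps if every proxy between the caller and Gotenberg is also willing to wait that long.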

Reference: starred/BookStack#4809