Provide alternative option for PDF rendering #4365

Closed
opened 2026-02-05 08:41:18 +03:00 by OVERLORD · 5 comments

Originally created by @ssddanbrown on GitHub (Dec 18, 2023).

Originally assigned to: @ssddanbrown on GitHub.

At this point WKHTMLtoPDF (https://github.com/wkhtmltopdf/wkhtmltopdf) is deprecated, with its source repositories archived and the package drifting out of OS repositories (Alpine, for example).

It would be good to provide an alternative to fill the non-dompdf gap, then deprecate and eventually remove WKHTMLtoPDF-specific support.

Instead of supporting specific export options, we could support a generic interface that adapts to different tools where desired: for example, a configurable command with a placeholder parameter for the location where BookStack writes out the HTML for conversion.

Research

  • Chromium - Can't see a way to completely disable networking & JS.
  • Firefox - Can't find a direct way to capture PDFs. Does have a --no-remote option, which may disable networking?

For browsers, there's a WebDriver standard in progress (WebDriver BiDi, https://w3c.github.io/webdriver-bidi/) which may open up possibilities in this area via a standard API.

  • TCPDF - LGPL
    • New version being built at https://github.com/tecnickcom/tc-lib-pdf; the existing stable version is support-only. The new version is slow to develop (looks like it's been 6 years in the making).
    • Need to test output against BookStack HTML. (To do)
    • Need to assess options (disable network). (To do)
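  • WeasyPrint (https://weasyprint.org/) - BSD
    • Need to test output against BookStack HTML. (Done)
      • Looks pretty good.
    • Need to assess options (disable network). (Done)
      • Not specifically built for untrusted content (https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#security), though it should be possible to disable network calls via a Python wrapper script that defines a custom URL fetcher (https://doc.courtbouillon.org/weasyprint/stable/first_steps.html#url-fetchers).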
  • Pandoc (https://pandoc.org/) - GPL (more of an abstraction layer over other libs)
    • Need to test output against BookStack HTML. (Done)
    • Need to assess options (disable network). (Done)
    • Tested using pagedjs-cli (below); works fine. Options can be passed through to the underlying lib, so output and security depend on that lib.
  • PagedJS-CLI (https://www.npmjs.com/package/pagedjs-cli)
    • Does a fair job at output, but fails on some common CSS properties used in the example page. Really feels like it's intended for print use more than matching web output.
    • Has good controls to prevent fetching.
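
WeasyPrint is not specifically built for untrusted content, but its documentation describes custom URL fetchers, which a Python wrapper script could use to block network calls. Below is a minimal sketch of such a wrapper, assuming that only inline data: URIs (such as base64-embedded images) should be resolved and everything else refused. The script name, ALLOWED_SCHEMES and the helper names are hypothetical; only HTML(filename=..., url_fetcher=...), write_pdf() and default_url_fetcher come from WeasyPrint itself.

```python
"""Hypothetical WeasyPrint wrapper that refuses network/file fetches.

Assumed usage: python3 weasyprint_offline.py INPUT.html OUTPUT.pdf
"""
import sys
from urllib.parse import urlparse

# Assumption: only inline data: URIs are allowed; http, https, file, etc. are refused.
ALLOWED_SCHEMES = {"data"}


def is_allowed(url: str) -> bool:
    """Return True if the resource URL uses an allowed scheme."""
    return urlparse(url).scheme.lower() in ALLOWED_SCHEMES


def offline_url_fetcher(url, *args, **kwargs):
    """Custom URL fetcher: raise for disallowed URLs, else defer to WeasyPrint.

    Raising means the resource is not fetched; WeasyPrint reports the failure
    instead of loading it.
    """
    if not is_allowed(url):
        raise ValueError(f"Blocked resource fetch: {url}")
    # Imported lazily so the filtering logic works without WeasyPrint installed.
    from weasyprint import default_url_fetcher
    return default_url_fetcher(url, *args, **kwargs)


def main(argv: list) -> int:
    if len(argv) != 3:
        print(f"usage: {argv[0]} INPUT.html OUTPUT.pdf", file=sys.stderr)
        return 2
    from weasyprint import HTML
    # The main document is read directly from the file; only the resources it
    # references go through the custom fetcher.
    HTML(filename=argv[1], url_fetcher=offline_url_fetcher).write_pdf(argv[2])
    return 0


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

This keeps the security decision in a small script the admin controls, which fits the idea of calling a user-provided wrapper rather than BookStack supporting WeasyPrint options directly.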

There are also commercial offerings, which may or may not be better (PDFreactor, PrinceXML).
It still makes sense to me to provide a generic command option rather than supporting specific libraries here. This is a moving area, and letting users call their own wrapper script means they can build on it (for example, to customize generation options) and choose whichever solution suits them without overcomplicating our support. There may be other options to consider too (for example, running Chrome in a network-limited container).

Implementation

  • Add new EXPORT_PDF_COMMAND env option.
    • Supports placeholders:
      • {input_html_path} - Path to input HTML file to convert.
      • {output_pdf_path} - Path that the output PDF file should be written to.

Notes: Should update existing LDAP_USER_FILTER env option to support this format.
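
A minimal sketch of how a host application might expand the two placeholders and run the configured command. It is written in Python purely for illustration; BookStack itself is PHP and this is not its implementation. The function names, temp file names, the 60-second timeout, running through a shell, and shell-quoting each path before substitution are all assumptions.

```python
import re
import shlex
import subprocess
import tempfile
from pathlib import Path

# Placeholders as described for EXPORT_PDF_COMMAND; matched in a single pass so
# a substituted path can never be re-expanded by a later replacement.
PLACEHOLDER_RE = re.compile(r"\{(input_html_path|output_pdf_path)\}")


def build_command(template: str, input_path: str, output_path: str) -> str:
    """Substitute shell-quoted paths for the placeholders in the template."""
    values = {"input_html_path": input_path, "output_pdf_path": output_path}
    return PLACEHOLDER_RE.sub(lambda m: shlex.quote(values[m.group(1)]), template)


def render_pdf(template: str, html: str, timeout: float = 60.0) -> bytes:
    """Write HTML to a temp dir, run the command, and return the PDF bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        input_path = Path(tmp) / "export.html"
        output_path = Path(tmp) / "export.pdf"
        input_path.write_text(html, encoding="utf-8")
        command = build_command(template, str(input_path), str(output_path))
        # Run through a shell so admins can point at wrapper scripts or use env vars.
        subprocess.run(command, shell=True, check=True, timeout=timeout)
        if not output_path.is_file() or output_path.stat().st_size == 0:
            raise RuntimeError("PDF command did not produce an output file")
        return output_path.read_bytes()
```

As a usage example, a value such as pandoc {input_html_path} -o {output_pdf_path} --pdf-engine=weasyprint (an illustrative, untested setting) would have both paths quoted before the command runs. Failing loudly on a non-zero exit code or a missing output file avoids silently serving an empty PDF.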

OVERLORD added the 🏭 Back-End and 🚚 Export System labels 2026-02-05 08:41:18 +03:00

@chris-devel0per commented on GitHub (Jan 3, 2024):

Yes, I think that would be a nice feature!

@roceil commented on GitHub (Feb 22, 2024):

Could you please clarify if you mean that in the current version, wkhtmltopdf is no longer available? I am in the process of preparing to install wkhtmltopdf in Docker for exporting PDFs in Chinese.

@ssddanbrown commented on GitHub (Feb 22, 2024):

> Could you please clarify if you mean that in the current version, wkhtmltopdf is no longer available?

@roceil The ability to use WKHTMLtoPDF is still currently available in BookStack, but will likely be removed in the future.

@Froggy422811 commented on GitHub (Apr 15, 2024):

There is a nice gist for replacing WKHTMLtoPDF within a Docker container. It is working very well for us:
https://gist.github.com/kmpm/c254558fcb0608346f49946a53cd8c09

If the new feature made it possible to specify an "export template" at the book, chapter or page level, that would be great.

@ssddanbrown commented on GitHub (Apr 26, 2024):

With the merge of #4969 there's now a generic command-based PDF option, which will be part of the next feature release. In the future we can put together specific engine/generator guidance (and help build generation implementations maybe).

Reference: starred/BookStack#4365