Feature request: Pandoc integration #1977

Closed
opened 2026-02-05 02:22:54 +03:00 by OVERLORD · 5 comments

Originally created by @maggie44 on GitHub (Dec 14, 2020).

Describe the bug
Exporting content in HTML and in PDF formats can create a series of peculiar issues:

  1. Exporting to HTML creates file sizes approximately twice the size of PDF.
  2. PDF files often do not maintain the appropriate formatting.

Feature request:

  3. PDF files do not contain the Table of Contents metadata (i.e. the table of contents that can be used within the Table of Contents panels available in PDF viewers).

Steps To Reproduce
Steps to reproduce the behavior:

  1. Go to the demo BookStack site.
  2. Export various content to HTML or PDF.

Below are a few examples pulled from the demo site:

Overlapping cropped images:

[my-lorem-ipsum-book.pdf](https://github.com/BookStackApp/BookStack/files/5685995/my-lorem-ipsum-book.pdf)

Page 6, misaligned text next to image when compared to HTML version/BookStack interface:

[dummy-content-book.pdf](https://github.com/BookStackApp/BookStack/files/5685988/dummy-content-book.pdf)
[dummy-content-book.html.zip](https://github.com/BookStackApp/BookStack/files/5686000/dummy-content-book.html.zip)

File sizes of the last two files:

dummy-content-book.pdf size = 377KB
dummy-content-book.html size = 741KB

@ssddanbrown commented on GitHub (Dec 16, 2020):

Hi @maggie0002,

> 1. Exporting to HTML creates file sizes approximately twice the size of PDF.

Yeah, this doesn't surprise me. We could be more efficient with the styles we're adding to the HTML version, but that would only save about 50KB perhaps. Generally though, these are two very different file formats. The main size consumer will be images. The image encoding used within the HTML export is known to have a size overhead, whereas PDF can make optimizations here.
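
As a rough illustration of that encoding overhead: assuming the HTML export inlines images as base64 data URIs (the thread does not name the encoding, so this is an assumption), every 3 bytes of image data become 4 characters of text. That is roughly a 33% increase before any other markup is counted. A minimal Python sketch, using random bytes as a stand-in for image data:

```python
import base64
import os


def base64_overhead(raw_size: int) -> float:
    """Return encoded size divided by raw size for raw_size random bytes."""
    raw = os.urandom(raw_size)          # stand-in for binary image data
    encoded = base64.b64encode(raw)     # what a data URI would carry
    return len(encoded) / len(raw)


if __name__ == "__main__":
    print(f"base64 size ratio: {base64_overhead(300_000):.3f}")
```

This on its own would not account for a full 2x difference; any recompression done on the PDF side would widen the gap further.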

> 2. PDF files often do not maintain the appropriate formatting.

Yeah, it's tricky. There are other open issues regarding PDF rendering quirks. If PDF rendering is important to you, I'd recommend our documentation on using the alternative wkhtmltopdf renderer:
https://www.bookstackapp.com/docs/admin/pdf-rendering/

This generally provides a more stable and consistent output.

> 3. PDF files do not contain the Table of Contents metadata

Again, the default PDF renderer does not support this, but you can use wkhtmltopdf to get this working. This is discussed within #2052.


@maggie44 commented on GitHub (Jan 14, 2021):

A follow-up here, since I’m now back into BookStack work.

Is there the ability to have a html package export? I see some advantages in the consolidated html file, but a zip archive of a html package would leave the images in their original form, benefiting from their compression, resolve image cropping issues, and would also help resolve some of the video export issues as it could export videos that have been uploaded into the archive.

By package I’m referring to a similar export system as we see in browsers where the index.html is accompanied by a folder containing components that index.html refers to locally.
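
To make the "package" idea concrete, here is a hedged Python sketch of what such an export could do. It assumes the existing single-file export embeds images as `data:image/...;base64,` URIs in `src` attributes (an assumption, not something confirmed in this thread), and the function name `unpack_html` and the `assets/image-N` naming are hypothetical. It writes each embedded image to an `assets` folder and rewrites `index.html` to point at the local files:

```python
import base64
import re
from pathlib import Path

# Matches src="data:image/<ext>;base64,<payload>" (simple subtypes such as png or jpeg only).
DATA_URI = re.compile(
    r'src="data:image/(?P<ext>[a-zA-Z0-9]+);base64,(?P<data>[A-Za-z0-9+/=]+)"'
)


def unpack_html(html: str, out_dir: Path) -> str:
    """Write embedded images to out_dir/assets and index.html next to it."""
    assets = out_dir / "assets"
    assets.mkdir(parents=True, exist_ok=True)
    counter = 0

    def replace(match: re.Match) -> str:
        nonlocal counter
        counter += 1
        name = f"image-{counter}.{match.group('ext')}"
        # Decode the payload back into the original binary image file.
        (assets / name).write_bytes(base64.b64decode(match.group("data")))
        return f'src="assets/{name}"'

    rewritten = DATA_URI.sub(replace, html)
    (out_dir / "index.html").write_text(rewritten, encoding="utf-8")
    return rewritten
```

Zipping `out_dir` afterwards would give the browser-style layout described above. Videos and other media would need their own handling, which this sketch does not attempt.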


@ssddanbrown commented on GitHub (Jan 15, 2021):

Hi @maggie0002,

> Is there the ability to have a html package export?

Not at the current time.

> but a zip archive of a html package would leave the images in their original form, benefiting from their compression, resolve image cropping issues, and would also help resolve some of the video export issues as it could export videos that have been uploaded into the archive.

A zip archive alone would not solve or perform those actions; we'd have to go to a fair amount of extra effort to accommodate each of those.

If that feature is really important to you, feel free to open a new feature request issue.


@maggie44 commented on GitHub (Jan 15, 2021):

Hi @ssddanbrown,

I was thinking of Pandoc integration as an optional module. It would add some efficiencies to the various exports by keeping the assets separate as discussed above (and potentially resolve some other outstanding issues), but also provide a bunch of additional options, such as EPUB (https://github.com/BookStackApp/BookStack/issues/1949), Word doc, video export support (https://github.com/BookStackApp/BookStack/issues/883; https://github.com/BookStackApp/BookStack/issues/2412) and a bunch more.

Here are a few shortcuts to try it out:

  1. Here is Pandoc: https://pandoc.org
  2. It is in most package repositories, so `apt-get install pandoc` or `brew install pandoc` should do the trick (if installing in a Docker container, you may need to install `build-essential` and/or `curl`).
  3. An example Markdown file I have tested with:

test.md

```
# Test file
Test MD File.

[![Build Status](https://cdn.vox-cdn.com/thumbor/zEZJzZFEXm23z-Iw9ESls2jYFYA=/89x0:1511x800/1600x900/cdn.vox-cdn.com/uploads/chorus_image/image/55717463/google_ai_photography_street_view_2.0.jpg)](https://travis-ci.org/joemccann/dillinger)
Dillinger is a cloud-enabled, mobile-ready, offline-storage, AngularJS powered HTML5 Markdown editor.

  - Type some Markdown
  - Convert some Markdown

![](https://www.learningcontainer.com/wp-content/uploads/2020/05/sample-mp4-file.mp4)

# New Features!

  - sdfsdf
  - sdfsdvldkvnc

You can also:
  - send
```

Execute the command:

`pandoc test.md -o example2.html --extract-media ./assets`
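
The same conversion could be scripted for several output formats. The sketch below is hypothetical glue code (the helper names `pandoc_command` and `export_all` are made up for illustration, and none of this is BookStack code). It relies only on Pandoc's `-o` and `--extract-media` options and on Pandoc inferring the output format from the file extension, and it assumes `pandoc` is on the PATH:

```python
import shutil
import subprocess
from pathlib import Path


def pandoc_command(source, output, media_dir=None):
    """Build the argument list for a single Pandoc conversion."""
    cmd = ["pandoc", str(source), "-o", str(output)]
    if media_dir is not None:
        # Save referenced media into media_dir and link to it from the output.
        cmd += ["--extract-media", str(media_dir)]
    return cmd


def export_all(source, out_dir, formats=("html", "epub", "docx")):
    """Convert source into each format and return the commands that were run."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    commands = []
    for fmt in formats:
        output = out_dir / f"{Path(source).stem}.{fmt}"
        # Only the HTML export keeps its media in a separate folder here.
        media = out_dir / "assets" if fmt == "html" else None
        cmd = pandoc_command(source, output, media)
        subprocess.run(cmd, check=True)
        commands.append(cmd)
    return commands
```

For example, `export_all("test.md", "export")` would attempt to write `export/test.html`, `export/test.epub` and `export/test.docx`.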

@ssddanbrown commented on GitHub (Jan 16, 2021):

@maggie0002 I do like pandoc, and although it might in theory solve a couple of those existing issues, it would likely bring a lot more, as this kind of thing quickly gets complex.

I'm going to close this off since the original issue and discussion here are about non-pandoc related thoughts. Please could you open this as a new issue (pretty much just copy and paste your last message as a new issue).

Reference: starred/BookStack#1977