Add a field to the API endpoint /api/pages/{id} to get the raw html #3860

Closed
opened 2026-02-05 07:41:48 +03:00 by OVERLORD · 5 comments

Originally created by @deleyva on GitHub (Jun 15, 2023).

API Endpoint or Feature

Add a field (raw_html) to the API endpoint /api/pages/{id} to get the raw HTML (with the include tag IDs still in it, not the HTML with includes replaced).

Use-Case

I am creating a database to keep track of reused content so that, if anyone tries to delete that content, they are notified that it is reused in a given page.

As this tracking is not on the roadmap, I'm building it in an external Django system, but I need that field in the endpoint to get the data.
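For the reuse-tracking use case, the external system would scan each page's raw_html for include tags. A minimal Python sketch, assuming BookStack's include syntax `{{@<page_id>}}` or `{{@<page_id>#<section_id>}}`; the name `find_includes` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

# Matches include tags such as {{@12}} or {{@12#bkmrk-intro}}.
# The tag format is an assumption based on BookStack's include syntax.
INCLUDE_TAG = re.compile(r"\{\{@(\d+)(?:#([^}]+))?\}\}")


def find_includes(raw_html: str) -> List[Tuple[int, Optional[str]]]:
    """Return (page_id, section_id) pairs for every include tag in raw_html.

    section_id is None when the whole page is included.
    """
    # finditer (not findall) so an unmatched optional group yields None, not "".
    return [(int(m.group(1)), m.group(2)) for m in INCLUDE_TAG.finditer(raw_html)]
```

For example, `find_includes('<p>{{@12}}</p><p>{{@7#bkmrk-intro}}</p>')` returns `[(12, None), (7, 'bkmrk-intro')]`; each pair could become one row in the tracking database.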

Additional context

No response

OVERLORD added the 🔩 API Request label 2026-02-05 07:41:48 +03:00

@ssddanbrown commented on GitHub (Jun 15, 2023):

This is something we should support; otherwise it's not possible to reliably create an external page editor, or even a proper fetch + update flow, without messing up include tags.
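A fetch + update flow that keeps include tags intact would read raw_html (rather than the rendered html) and send that back. A minimal standard-library Python sketch; the instance URL, token values, and helper names are hypothetical, and it assumes the `Authorization: Token <id>:<secret>` header and a JSON `PUT /api/pages/{id}` update accepting an `html` field.

```python
import json
import urllib.request
from typing import Callable

# Hypothetical instance URL and API token; replace with real values.
BASE_URL = "https://bookstack.example.com"
TOKEN_ID = "token-id"
TOKEN_SECRET = "token-secret"


def _auth_headers() -> dict:
    # Assumed token header format: "Token <id>:<secret>".
    return {"Authorization": f"Token {TOKEN_ID}:{TOKEN_SECRET}"}


def build_read_request(page_id: int) -> urllib.request.Request:
    return urllib.request.Request(
        f"{BASE_URL}/api/pages/{page_id}", headers=_auth_headers(), method="GET"
    )


def build_update_request(page_id: int, html: str) -> urllib.request.Request:
    body = json.dumps({"html": html}).encode("utf-8")
    headers = {**_auth_headers(), "Content-Type": "application/json"}
    return urllib.request.Request(
        f"{BASE_URL}/api/pages/{page_id}", data=body, headers=headers, method="PUT"
    )


def round_trip(page_id: int, edit: Callable[[str], str]) -> None:
    """Fetch a page, apply `edit` to its raw_html, and write the result back."""
    with urllib.request.urlopen(build_read_request(page_id)) as resp:
        page = json.load(resp)
    # Editing raw_html instead of html keeps {{@...}} include tags unexpanded.
    new_html = edit(page["raw_html"])
    with urllib.request.urlopen(build_update_request(page_id, new_html)):
        pass
```

Building the requests separately from sending them keeps the flow easy to inspect without a live instance.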


@riton commented on GitHub (Jun 19, 2023):

Same need here, with a different use case.

At IN2P3-CC (https://cc.in2p3.fr/en/), we're planning to manage part of our documentation from GitLab.
CI/CD jobs would update BookStack according to the GitLab repository content, which acts as our source of truth.
Our tool produces HTML that is sent to the BookStack page API.

Since the HTML returned by the get-page API is slightly modified, our tool cannot detect (without heavy HTML introspection) whether a page should be updated. Indeed, the remote page HTML always differs from the locally generated HTML.

The solution we're planning to use updates the page on each call. As you may expect, this is not ideal and pollutes the Activity Log.

If we were given access to the exact HTML that was initially sent to BookStack, the problem would vanish.


@ssddanbrown commented on GitHub (Jun 20, 2023):

I have now added this within 8b935e71d1, and it will be part of the next release.
Thanks @deleyva for the original request here.


@riton Just a note on your use case: this new property will provide the raw HTML stored in the BookStack database.
BookStack does some pre-storage processing of HTML content too, meaning this won't provide the exact HTML that was originally sent to BookStack. Specifically supporting that use case would be a more substantial request that I'm not sure about including.

Since it sounds like you just need to check whether the content matches your GitLab side of things (a one-direction check), here's a potential creative workaround:

Create a hash of the incoming GitLab content on change. Store that hash (locally in the API system, or sneak it into the BookStack content or a page tag). Then, on the next update, compare the existing hash (if any) to the new GitLab content hash, and update only if the hashes differ.
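The hash comparison could look like the following minimal Python sketch. SHA-256 and the function names are assumptions (the thread names no algorithm), and where the stored hash lives (local store, page content, or page tag) is left to the caller.

```python
import hashlib
from typing import Optional


def content_hash(source: str) -> str:
    # Hash the GitLab-side source, not BookStack's stored HTML, since
    # BookStack may process HTML before storing it.
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def needs_update(new_source: str, stored_hash: Optional[str]) -> bool:
    """Update only when no hash is stored yet or the hashes differ."""
    return stored_hash is None or stored_hash != content_hash(new_source)
```

Because only the source side is hashed, BookStack's pre-storage processing never affects the comparison.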


@deleyva commented on GitHub (Jun 22, 2023):

Thank you very much for developing this endpoint! It helps a lot.


@riton commented on GitHub (Jun 24, 2023):

As suggested by @ssddanbrown:

> Create a hash for the incoming gitlab content on change. Store that hash (locally to API system, or could sneak it into BookStack content or as a page tag), then on next update, compare the existing hash (if exists) to the new Gitlab content hash, update only if hashes differ.

We're appending a <meta name="page-cksum" content="XXXXX"> tag (not in the <head> section of the HTML page, but 🤷) to each page generated through our CI/CD system. Works like a charm 👍
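The meta-tag variant can be sketched in Python as below, assuming SHA-256 as the checksum and assuming the tag survives BookStack's storage processing with its attribute order intact (which the report above suggests). All function names are hypothetical.

```python
import hashlib
import re
from typing import Optional

# Matches the page-cksum marker; tolerates an optional self-closing slash.
CKSUM_TAG = re.compile(r'<meta name="page-cksum" content="([0-9a-f]+)"\s*/?>')


def _digest(html: str) -> str:
    return hashlib.sha256(html.encode("utf-8")).hexdigest()


def with_checksum(html: str) -> str:
    """Append a page-cksum meta tag holding the hash of the generated HTML."""
    return f'{html}\n<meta name="page-cksum" content="{_digest(html)}">'


def stored_checksum(raw_html: str) -> Optional[str]:
    """Read the checksum back out of a page's raw_html, if present."""
    m = CKSUM_TAG.search(raw_html)
    return m.group(1) if m else None


def should_update(generated_html: str, remote_raw_html: str) -> bool:
    # A missing tag yields None, which never equals a digest, so we update.
    return stored_checksum(remote_raw_html) != _digest(generated_html)
```

The checksum is computed over the HTML before the tag is appended, so regenerating identical content reproduces the same digest.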

Reference: starred/BookStack#3860