mirror of
https://github.com/BookStackApp/BookStack.git
synced 2026-02-10 19:06:16 +03:00
Invalid markup from page includes #3453
Closed
opened 2026-02-05 06:45:24 +03:00 by OVERLORD
·
6 comments
No Branch/Tag Specified
development
l10n_development
release
llm_only
vectors
v25-11
docker_env
drawio_rendering
user_permissions
ldap_host_failover
svg_image
prosemirror
captcha_example
fix/video-export
v25.12.3
v25.12.2
v25.12.1
v25.12
v25.11.6
v25.11.5
v25.11.4
v24.11.4
v25.11.3
v25.11.2
v25.11.1
v25.11
v25.07.3
v25.07.2
v25.07.1
v25.07
v25.05.2
v25.05.1
v25.05
v25.02.5
v25.02.4
v25.02.3
v25.02.2
v25.02.1
v25.02
v24.12.1
v24.12
v24.10.3
v24.10.2
v24.10.1
v24.10
v24.05.4
v24.05.3
v24.05.2
v24.05.1
v24.05
v24.02.3
v24.02.2
v24.02.1
v24.02
v23.12.3
v23.12.2
v23.12.1
v23.12
v23.10.4
v23.10.3
v23.10.2
v23.10.1
v23.10
v23.08.3
v23.08.2
v23.08.1
v23.08
v23.06.2
v23.06.1
v23.06
v23.05.2
v23.05.1
v23.05
v23.02.3
v23.02.2
v23.02.1
v23.02
v23.01.1
v23.01
v22.11.1
v22.11
v22.10.2
v22.10.1
v22.10
v22.09.1
v22.09
v22.07.3
v22.07.2
v22.07.1
v22.07
v22.06.2
v22.06.1
v22.06
v22.04.2
v22.04.1
v22.04
v22.03.1
v22.03
v22.02.3
v22.02.2
v22.02.1
v22.02
v21.12.5
v21.12.4
v21.12.3
v21.12.2
v21.12.1
v21.12
v21.11.3
v21.11.2
v21.11.1
v21.11
v21.10.3
v21.10.2
v21.10.1
v21.10
v21.08.6
v21.08.5
v21.08.4
v21.08.3
v21.08.2
v21.08.1
v21.08
v21.05.4
v21.05.3
v21.05.2
v21.05.1
v21.05
v21.04.6
v21.04.5
v21.04.4
v21.04.3
v21.04.2
v21.04.1
v21.04
v0.31.8
v0.31.7
v0.31.6
v0.31.5
v0.31.4
v0.31.3
v0.31.2
v0.31.1
v0.31.0
v0.30.7
v0.30.6
v0.30.5
v0.30.4
v0.30.3
v0.30.2
v0.30.1
v0.30.0
v0.29.3
v0.29.2
v0.29.1
v0.29.0
v0.28.3
v0.28.2
v0.28.1
v0.28.0
v0.27.5
v0.27.4
v0.27.3
v0.27.2
v0.27.1
v0.27
v0.26.4
v0.26.3
v0.26.2
v0.26.1
v0.26.0
v0.25.5
v0.25.4
v0.25.3
v0.25.2
v0.25.1
v0.25.0
v0.24.3
v0.24.2
v0.24.1
v0.24.0
v0.23.2
v0.23.1
v0.23.0
v0.22.0
v0.21.0
v0.20.3
v0.20.2
v0.20.1
v0.20.0
v0.19.0
v0.18.5
v0.18.4
v0.18.3
v0.18.2
v0.18.1
v0.18.0
v0.17.4
v0.17.3
v0.17.2
v0.17.1
v0.17.0
v0.16.3
v0.16.2
v0.16.1
v0.16.0
v0.15.3
v0.15.2
v0.15.1
v0.15.0
v0.14.3
v0.14.2
v0.14.1
v0.14.0
v0.13.1
v0.13.0
v0.12.2
v0.12.1
v0.12.0
v0.11.2
v0.11.1
v0.11.0
v0.10.0
v0.9.3
v0.9.2
v0.9.1
v0.9.0
v0.8.2
v0.8.1
v0.8.0
v0.7.6
v0.7.5
v0.7.4
v0.7.3
0.7.2
v.0.7.1
v0.7.0
v0.6.3
v0.6.2
v0.6.1
v0.6.0
v0.5.0
Labels
Clear labels
🎨 Design
📖 Docs Update
🐛 Bug
🐛 Bug
:cat2:🐈 Possible duplicate
💿 Database
☕ Open to discussion
💻 Front-End
🐕 Support
🚪 Authentication
🌍 Translations
🔌 API Task
🏭 Back-End
⛲ Upstream
🔨 Feature Request
🛠️ Enhancement
🛠️ Enhancement
🛠️ Enhancement
❤️ Happy feedback
🔒 Security
🔍 Pending Validation
💆 UX
📝 WYSIWYG Editor
🌔 Out of scope
🔩 API Request
:octocat: Admin/Meta
🖌️ View Customization
❓ Question
🚀 Priority
🛡️ Blocked
🚚 Export System
♿ A11y
🔧 Maintenance
> Markdown Editor
pull-request
Mirrored from GitHub Pull Request
Milestone
No items
No Milestone
Projects
Clear projects
No project
Notifications
Due Date
No due date set.
Dependencies
No dependencies set.
Reference: starred/BookStack#3453
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.
Delete Branch "%!s()"
Deleting a branch is permanent. Although the deleted branch may continue to exist for a short time before it actually gets removed, it CANNOT be undone in most cases. Continue?
Originally created by @gaufde on GitHub (Jan 14, 2023).
Describe the Bug
The page include system searches for tags of the form
{{@\s?([0-9].*?)}}.This string is then directly replaced with a new string (see the
parsePageIncludesandfetchSectionOfPagemethods inBookStack\Entities\Tools\PageContent.If the new string contains HTML, then the resulting markup will be invalid.
For example, if the replacement string is
"<h2 id="bkmrk-heading">this is a heading</h2>"markup such as<p id="bkmrk-example-id">{{@18#bkmrk-another-id}}</p>gets converted into<p id="bkmrk-example-id"><h2 id="bkmrk-heading">this is a heading</h2></p>, which is not valid.This causes unexpected results such as blank paragraphs being created by the browser.
Steps to Reproduce
See the description of the bug.
If you want to see the side-effect, simply force Bookstack to include some HTML into a page include. This is done when including a table from one page into another page. The page with the embedded table will likely have an unexpected empty paragraph appear.
Expected Behaviour
If the replacement string does not include any HTML markup, then it should work exactly as it does right now.
However, if the replacement string does include some HTML, then Bookstack needs to do something to avoid nesting the HTML and avoid creating empty paragraphs. One solution would be to replace the entire block with the new content.
From the example a couple sections above,
<p id="bkmrk-example-id">{{@18#bkmrk-another-id}}</p>would become<h2 id="bkmrk-heading">this is a heading</h2>.This does mean that the ID
bkmrk-example-idwould not appear on the page. However, I don't think that causes any issues, and it may even be the preferred behavior.Screenshots or Additional Context
No response
Browser Details
No response
Exact BookStack Version
BookStack v22.11.1
PHP Version
No response
Hosting Environment
N/A
@ssddanbrown commented on GitHub (Jan 14, 2023):
Thanks @gaufde for reporting. The fundamental issue has already been discovered/reported within issue #3385 (which was opened based upon using the whole page contents, but it's the same underlying issue at play) therefore I'm going to close this off as a duplicate.
@gaufde commented on GitHub (Jan 14, 2023):
@ssddanbrown Okay that makes sense, it does seem like this is a duplicate then.
Out of curiosity, is a fix for this issue planned for any upcoming releases? I just looked at that other thread and it seems ambiguous since you say it may be difficult to fix.
@ssddanbrown commented on GitHub (Jan 14, 2023):
@gaufde I do recognize it as a bug, so would like to address at some point, but a fix is not currently in progress and I'm not rushing to it too fast since:
@gaufde commented on GitHub (Jan 14, 2023):
@ssddanbrown What are your concerns here?
I was thinking through a fix based on the code I wrote for #3854. Seems like the page included tags could be searched for using DOMXpath rather than regex. Then, once the content is retrieved from the other page, it could be used to replace the node(s) found by the XPath query.
To handle cases where the incoming string is not HTML, there might need to be a regex based function to search for starting and ending html tags. If the replacement string doesn’t start with an HTML tag (besides strong or em), then the string could be used to replace the content of the node rather than replacing the entire node itself. At lest that is what it should do in essence. I believe the function would need to make a duplicate node with the new content and replace the old node.
lastly, the function would need to take the new DOM tree and convert it back to an HTML string since that is what the current function returns.
I don’t have much experience with the DOM functions in PHP, but based on what I’ve seen so far this seems like a more robust way to modify HTML strings.
I understand that this isn’t a high priority bug, but I think it is important if you want the “Page Include Parse” logical theme event to be used to its full potential. I’m willing to help as I can since this is a bug that is currently somewhat important to fix for my workflow. Let me know what you think!
@ssddanbrown commented on GitHub (Jan 14, 2023):
Yeah, revising to a DOMDocument approach was my expectation to handle cases properly.
The complications arise in regards to context of inserted nodes. Dealing with the content just based upon containing non-strong/em tags is simplified. The range of valid inline elements is much wider, and I'm sure there are cases where insertion of block level HTML can be valid, so the parent would also need to be inspected.
Additionally, the action to replace the old node is an assumption of intended behavior. An include tag could be mixed alongside content which complicates matters. While replace would makes sense if the context of it being the only content, it may otherwise be more practical to insert afterwards instead in some cases. Would maybe need matching to the browser's current auto-rejigging of content.
@gaufde commented on GitHub (Jan 30, 2023):
Okay, this gives me a good sense of all the cases you are thinking about. You are right my description was somewhat simplified. I had not considered that people might nest content along side an include tag.
I tried to get some code started for this project. However, I ran into some complications I wasn't expecting. Turns out getting the proper DOM nodes can't be done with regex and DOMDocument/Xpath easily. You can of course use contains in an XPath expression, but then you would only be able to match for
{{@rather than the entire page include tag.I'll chip away at it next chance I get to devote some time to this project, but it won't be as immediate as I had hoped.