Mirror of https://github.com/jellyfin/jellyfin.git (synced 2026-05-04 18:09:12 +03:00)
Segmentation fault with no logs when scanning media library since upgrade to 10.10.1 #6457
Originally created by @ryannathans on GitHub (Nov 12, 2024).
This issue respects the following points:
Description of the bug
After upgrading to Jellyfin 10.10.1 (from 10.9.11 IIRC) on FreeBSD 14.1 (using the package in the package manager), manual media library scans now cause segmentation faults. I do not know if automatic library scans cause the same issue, but I would assume so.
It doesn't seem to matter which library I scan, the same behavior occurs. There are no useful logs.
Reproduction steps
What is the current bug behavior?
Segmentation fault (core dumped)
What is the expected correct behavior?
Media library scans work as they used to, with no segmentation fault
Jellyfin Server version
10.10.0+
Specify commit id
No response
Specify unstable release number
No response
Specify version number
No response
Specify the build version
10.10.1
Environment
Jellyfin logs
FFmpeg logs
Client / Browser logs
No response
Relevant screenshots or videos
No response
Additional information
Would love to provide more logs or information, but there don't seem to be any more hints or logs that I can find.
@felix920506 commented on GitHub (Nov 13, 2024):
please enable debug logging. https://jellyfin.org/docs/general/administration/troubleshooting#debug-logging
@oychang commented on GitHub (Nov 15, 2024):
Similar issue to OP. Enabled debug logging and this seems like the relevant section after triggering a manual library scan:
@robn commented on GitHub (Nov 17, 2024):
tl;dr: bug in Skia, calling libjpeg incorrectly. I am writing a bug report for Skia right now, and will edit this comment with the link once I have it.
Actually, looking again, I think it's jpeg-turbo not quite being compatible with other libjpegs. I've filed a bug there: https://github.com/libjpeg-turbo/libjpeg-turbo/issues/795.
Update 2024-11-19: jpeg-turbo has declined. Bug logged against Skia: https://issues.skia.org/issues/379669745
FWIW, I'm just a Jellyfin user, and do not have time to get involved anywhere beyond this. I'd be quite happy if Jellyfin could work around it somehow, though I recognise you're some distance away.
Relevant FreeBSD bug: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=282704
Debugging session follows.
Also seeing exactly this on FreeBSD 14.1 after upgrading the packaged version.
I've been trying to debug this at the system level (I don't know any Jellyfin-specifics, so I started with what I know).
First, an aside: the docs for enabling debug logging seem a little off, yielding the "Failed to create/read logger configuration" error that @oychang reported. I ended up using this for my logging.json:
After requesting a scan, the last log lines before the restart are:
If I remove the file in question, it regenerates it by whatever means, then crashes again:
This behaviour is consistent, allowing plenty of testing. I moved the file out of the way, and then compared it with the new version. Both have the same sha256sum, which I suppose means it's just downloaded from somewhere, or it's generated in some deterministic way: poster.jpg
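The "same sha256sum" check above can be reproduced with a short Python sketch (the sha256sum command gives the same digest from the shell). The file names in the usage comment are placeholders, not paths from this report.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_contents(a: Path, b: Path) -> bool:
    """True if both files have identical SHA-256 digests."""
    return sha256_of(a) == sha256_of(b)


# Usage (placeholder paths):
# same_contents(Path("poster.jpg.moved"), Path("poster.jpg"))
```

Identical digests are what suggest the regenerated image is deterministic, which in turn makes the crash reproducible on demand.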
So I put ktrace on it, to see if it's anything obviously related to the image, or if that's just the last debug line: kdump.txt
Not especially enlightening. It opens the file, does a bunch of 1K reads, and eventually does some kind of read from unmapped memory (i.e., a NULL dereference or other invalid address).
I wouldn't expect that from a managed language, though (I mean, I have no actual idea), which suggests a native function or library of some sort. So instead, I throw gdb at it, and hello:
Quick disassembly shows that it's a NULL pointer dereference.
So either libjpeg has a bug, or something is holding it wrong.
libjpeg here is the FreeBSD-packaged jpeg-turbo 3.0.4. Going to the source there (jdmaster.c:105):
Strong indication there that cinfo->master is NULL, since it's the first thing that function tests, and the field is sufficiently far into the struct in jpeglib.h that I can believe the 0x238 offset.
The caller is from the FreeBSD-packaged libskiasharp 2.88.3, but actually from the version of Skia bundled with it. Following https://github.com/mono/SkiaSharp/tree/v2.88.3/externals, that brings us to the call to jpeg_calc_output_dimensions() at SkJpegCodec.cpp:277:
"The actual jpeg_decompress_struct" smells a bit...
sk_codec_get_scaled_dimensions() isn't here directly, but it's easy to imagine that it's a dispatcher for various codecs. I'm getting tired, so I'm not going to prove that; I'll just jump down the file a bit to where the problem almost certainly lies (SkJpegCodec.cpp:L310):
So Skia is trying to fake a struct that is internal to libjpeg, initialising it "wrong", and libjpeg blows up. Kinda poor form.
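To make the failure mode concrete, here is a minimal Python model (not the real C code) of the pattern described above: a caller builds its own copy of the decompress struct, so the library-private master pointer is never set, and a library routine whose first step reads cinfo->master fails. All class and function names are hypothetical; an AttributeError on None stands in for the segfault, and the scaling arithmetic is simplified.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Master:
    """Hypothetical stand-in for libjpeg's private master-control state."""
    lossless: bool = False


@dataclass
class DecompressInfo:
    """Hypothetical, heavily simplified stand-in for jpeg_decompress_struct."""
    image_width: int
    image_height: int
    scale_num: int = 1
    scale_denom: int = 1
    output_width: int = 0
    output_height: int = 0
    # Private state the library normally sets up itself. A struct built by
    # hand never gets it, so it stays None (the C equivalent is NULL).
    master: Optional[Master] = None


def _round_up_div(a: int, b: int) -> int:
    """Ceiling division."""
    return (a + b - 1) // b


def calc_dims_reads_master(cinfo: DecompressInfo) -> None:
    """Models a library routine that consults private state first.

    With cinfo.master left as None, the first line raises AttributeError,
    the Python analogue of dereferencing a NULL pointer in C.
    """
    if cinfo.master.lossless:
        # Simplification: treat lossless images as unscaled.
        cinfo.output_width = cinfo.image_width
        cinfo.output_height = cinfo.image_height
    else:
        cinfo.output_width = _round_up_div(cinfo.image_width * cinfo.scale_num, cinfo.scale_denom)
        cinfo.output_height = _round_up_div(cinfo.image_height * cinfo.scale_num, cinfo.scale_denom)


def build_info_by_hand(width: int, height: int, scale_num: int, scale_denom: int) -> DecompressInfo:
    """Models the hand-built struct: public fields copied in, private state untouched."""
    return DecompressInfo(width, height, scale_num, scale_denom)
```

Calling calc_dims_reads_master(build_info_by_hand(4000, 3000, 1, 8)) raises, while the same call on a struct whose master has been populated (as the library's own setup would do) yields 500x375.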
Looks like it's still that way on the head: SkJpegCodec.cpp:255 @ ade3669.
So I started writing a bug report against Skia, and was looking at the jpeg-turbo code again, and noticed part of the comment on jpeg_calc_output_dimensions():
If that's true, then maybe Skia isn't calling it wrong; maybe this fork of libjpeg is doing the wrong thing.
I'm not really sure what the canonical libjpeg is at this point, so I just picked the first one I found, mozjpeg. It seems like its jpeg_calc_output_dimensions() and jpeg_core_output_dimensions() never step outside of cinfo, so it does not exhibit this fault (jdmaster.c:91).
So I guess I'm writing a bug report against jpeg-turbo then, and possibly also letting FreeBSD know so that they can consider relinking the stack.
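For contrast, a sketch in the same spirit of the behaviour reported for mozjpeg: output dimensions are computed only from public fields of cinfo, so a hand-built struct with master left unset is harmless. The dataclass is a hypothetical stand-in for jpeg_decompress_struct; jdiv_round_up() mirrors the arithmetic of libjpeg's ceiling-division helper of that name, and the single scale_num/scale_denom rule simplifies the real scaling logic.

```python
from dataclasses import dataclass
from typing import Any


def jdiv_round_up(a: int, b: int) -> int:
    """Ceiling division, the same arithmetic as libjpeg's jdiv_round_up()."""
    return (a + b - 1) // b


@dataclass
class DecompressInfo:
    """Hypothetical, simplified stand-in for jpeg_decompress_struct."""
    image_width: int
    image_height: int
    scale_num: int = 1
    scale_denom: int = 1
    output_width: int = 0
    output_height: int = 0
    master: Any = None  # private state; deliberately never read below


def calc_dims_public_only(cinfo: DecompressInfo) -> None:
    """Compute scaled output dimensions from public fields only."""
    cinfo.output_width = jdiv_round_up(cinfo.image_width * cinfo.scale_num, cinfo.scale_denom)
    cinfo.output_height = jdiv_round_up(cinfo.image_height * cinfo.scale_num, cinfo.scale_denom)


# A struct built by hand, with master still None, works fine here.
info = DecompressInfo(4001, 3001, scale_num=1, scale_denom=8)
calc_dims_public_only(info)
print(info.output_width, info.output_height)  # 501 376
```

The design point is the contract: if a routine only reads fields the caller is expected to fill in, callers can safely construct the struct themselves; once it also reads private state, that assumption breaks.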
@ryannathans commented on GitHub (Nov 22, 2024):
Looks like the FreeBSD guys have deployed a workaround to libskiasharp via the package manager. The feature now works after upgrading libskiasharp: 2.88.3 -> 2.88.3_1
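To check whether an installed libskiasharp is at least the revision mentioned above, the version strings can be compared. `pkg version -t` is the authoritative tool on FreeBSD; the Python sketch below is a simplified stand-in that assumes purely numeric dotted versions with an optional `_N` port revision and `,N` epoch (real FreeBSD version rules cover more cases, such as letters and pre-release tags). The function names are hypothetical.

```python
def parse_pkg_version(version: str) -> tuple:
    """Split a simplified FreeBSD package version such as '2.88.3_1' into
    (epoch, numeric components, port revision) so tuples order correctly."""
    epoch = 0
    if "," in version:
        version, epoch_str = version.rsplit(",", 1)
        epoch = int(epoch_str)
    revision = 0
    if "_" in version:
        version, rev_str = version.rsplit("_", 1)
        revision = int(rev_str)
    components = tuple(int(part) for part in version.split("."))
    return (epoch, components, revision)


def is_at_least(installed: str, minimum: str) -> bool:
    """True if `installed` sorts at or after `minimum` (epoch, then version, then revision)."""
    return parse_pkg_version(installed) >= parse_pkg_version(minimum)


# Usage: the original package sorts before the one with the workaround.
print(is_at_least("2.88.3", "2.88.3_1"))  # False
```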
@felix920506 commented on GitHub (Nov 22, 2024):
closing this as it is an issue specific to the FreeBSD build, which is unsupported. Please report the issue to the maintainers of the FreeBSD build instead. If anyone is experiencing the same issue on Linux, Windows, or macOS, please open a new report.
@robn commented on GitHub (Nov 23, 2024):
@felix920506 closing is reasonable, but it's not FreeBSD as such; rather, it's anywhere that Jellyfin ends up using SkiaSharp/Skia compiled against jpeg-turbo 3.x.
At least, your team may want to consider whether there's any way to better report where a crash in native code happens. This would help debugging a lot.
@gnattu commented on GitHub (Nov 23, 2024):
The thing is, we package SkiaSharp with its own libskia on each and every platform we officially support, so this is indeed more of a downstream issue: the packager decided to use an incompatible version of the library. Google’s libskia and its .NET binding SkiaSharp have very, very strict version requirements that we don’t even want to deal with, and I believe lots of downstream packagers also don’t want to deal with them.