jello/README.md

# Jello

A WIP video client for jellyfin.

(Planned) Features

1. Integrate with jellyfin
2. HDR video playback
3. Audio Track selection
4. Chapter selection

Libraries and frameworks used for this
1. iced -> primary gui toolkit
2. gstreamer -> primary video + audio decoding library
3. wgpu -> rendering the video from gstreamer in iced


### HDR
I'll try to document all my findings about HDR here.
I'm making this project to mainly learn about videos, color-spaces and gpu programming. And so very obviously I'm bound to make mistakes in either the code or the fundamental understanding of a concept. Please don't take anything in this text as absolute.

```rust
let window = ... // use winnit to get a window handle, check the example in this repo
let instance = wgpu::Instance::default();
let surface = instance.create_surface(window).unwrap();
let adapter = instance
    .request_adapter(&wgpu::RequestAdapterOptions {
        power_preference: wgpu::PowerPreference::default(),
        compatible_surface: Some(&surface),
        force_fallback_adapter: false,
    })
    .await
    .context("Failed to request wgpu adapter")?;
let caps = surface.get_capabilities();
println!("{:#?}", caps.formats);
```

This should print out all the texture formats that can be used by your current hardware
Among these the formats that support hdr (afaik) are
```
wgpu::TextureFormat::Rgba16Float
wgpu::TextureFormat::Rgba32Float
wgpu::TextureFormat::Rgb10a2Unorm
wgpu::TextureFormat::Rgb10a2Uint // (unsure)
```

My display supports Rgb10a2Unorm so I'll be going forward with that texture format.

`Rgb10a2Unorm` is still the same size as a `Rgba8Unorm` but data is in a different representation in each of them

`Rgb10a2Unorm`:
R, G, B => 10 bits each (2^10 = 1024 [0..=1023])
A => 2 bits (2^2 = 4 [0..=3])

Whereas in a normal pixel
`Rgba8Unorm`
R, G, B, A => 8 bits each (2^8 = 256 [0..=255])


For displaying videos the alpha components is not really used (I don't know of any) so we can use re-allocate 6 bits from the alpha channel and put them in the r,g and b components.
In the shader the components get uniformly normalized from [0..=1023] integer to [0..=1] in float so we can compute them properly

Videos however are generally not stored in this format or any rgb format in general because it is not as efficient for (lossy) compression as YUV formats.

Right now I don't want to deal with yuv formats so I'll use gstreamer caps to convert the video into `Rgba10a2` format


## Pixel formats and Planes
Dated: Sun Jan  4 09:09:16 AM IST 2026
| value | count | quantile | percentage | frequency |
| --- | --- | --- | --- | --- |
| yuv420p | 1815 | 0.5067001675041876 | 50.67% | ************************************************** |
| yuv420p10le | 1572 | 0.4388609715242881 | 43.89% | ******************************************* |
| yuvj420p | 171 | 0.04773869346733668 | 4.77% | **** |
| rgba | 14 | 0.003908431044109436 | 0.39% |  |
| yuvj444p | 10 | 0.0027917364600781687 | 0.28% |  |

For all of my media collection these are the pixel formats for all the videos

### RGBA
Pretty self evident
8 channels for each of R, G, B and A
Hopefully shouldn't be too hard to make a function or possibly a lut that takes data from rgba and maps it to Rgb10a2Unorm

```mermaid
packet
title RGBA
+8: "R"
+8: "G"
+8: "B"
+8: "A"
```


### YUV
[All YUV formats](https://learn.microsoft.com/en-us/windows/win32/medfound/recommended-8-bit-yuv-formats-for-video-rendering#surface-definitions)
[10 and 16 bit yuv formats](https://learn.microsoft.com/en-us/windows/win32/medfound/10-bit-and-16-bit-yuv-video-formats)

Y -> Luminance
U,V -> Chrominance

p -> Planar
sp -> semi planar

j -> full range

planar formats have each of the channels in a contiguous array one after another
in semi-planar formats the y channel is seperate and uv channels are interleaved