Driver/receive: Implement audio reorder/jitter buffer (#156)
This PR introduces a new `VoiceTick` event, which collects and reorders all RTP packets to smooth over network instability and to synchronise user audio streams. Raw packet events have been moved to `RtpPacket`, while `SpeakingUpdate`s have been removed, as they can easily be computed from the `silent` set and `speaking` map included in each event. Closes #146.
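With `SpeakingUpdate` gone, speaking-state transitions are derived by comparing consecutive ticks. The sketch below shows one way to do that. It assumes a simplified, hypothetical `Tick` type that mirrors only the `speaking` and `silent` fields of `VoiceTick` (per-user data replaced by `()`). `Tick`, `Change`, and `speaking_changes` are illustrative names, not Songbird API.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for `VoiceTick`: only the two fields used here,
/// with per-user voice data replaced by `()` for brevity.
struct Tick {
    speaking: HashMap<u32, ()>,
    silent: HashSet<u32>,
}

/// A speaking-state transition for one SSRC, similar in spirit to the
/// removed `SpeakingUpdate` event (this enum is illustrative only).
#[derive(Debug, PartialEq, Eq, PartialOrd, Ord)]
enum Change {
    Started(u32),
    Stopped(u32),
}

/// Compare two consecutive ticks and report who started or stopped speaking.
fn speaking_changes(prev: &Tick, curr: &Tick) -> Vec<Change> {
    let mut out = Vec::new();
    // In `speaking` now but not last tick: the user started talking.
    for ssrc in curr.speaking.keys() {
        if !prev.speaking.contains_key(ssrc) {
            out.push(Change::Started(*ssrc));
        }
    }
    // Speaking last tick but silent now: the user stopped talking.
    // SSRCs that left the call entirely appear in neither set and are not reported.
    for ssrc in &curr.silent {
        if prev.speaking.contains_key(ssrc) {
            out.push(Change::Stopped(*ssrc));
        }
    }
    // HashMap/HashSet iteration order is unspecified, so sort for stable output.
    out.sort();
    out
}

fn main() {
    let prev = Tick {
        speaking: HashMap::from([(1, ()), (2, ())]),
        silent: HashSet::from([3]),
    };
    let curr = Tick {
        speaking: HashMap::from([(2, ()), (3, ())]),
        silent: HashSet::from([1]),
    };
    let changes = speaking_changes(&prev, &curr);
    assert_eq!(changes, vec![Change::Started(3), Change::Stopped(1)]);
    println!("{:?}", changes);
}
```

Sorting is only there to make the output deterministic. A real handler would typically act on each change as it is found.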
@@ -1,28 +1,38 @@
+use std::collections::{HashMap, HashSet};
+
 use super::*;
 
 #[derive(Clone, Debug, Eq, PartialEq)]
 #[non_exhaustive]
-/// Opus audio packet, received from another stream (detailed in `packet`).
-/// `payload_offset` contains the true payload location within the raw packet's `payload()`,
-/// if extensions or raw packet data are required.
+/// Audio data from all users in a voice channel, fired every 20ms.
 ///
-/// Valid audio data (`Some(audio)` where `audio.len >= 0`) contains up to 20ms of 16-bit stereo PCM audio
-/// at 48kHz, using native endianness. Songbird will not send audio for silent regions, these should
-/// be inferred using [`SpeakingUpdate`]s (and filled in by the user if required using arrays of zeroes).
+/// Songbird implements a jitter buffer to synchronise user packets, smooth out network latency, and
+/// handle packet reordering by the network. Packet playout via this event is delayed by approximately
+/// [`Config::playout_buffer_length`]` * 20ms` from its original arrival.
 ///
-/// If `audio.len() == 0`, then this packet arrived out-of-order. If `None`, songbird was not configured
-/// to decode received packets.
-///
-/// [`SpeakingUpdate`]: crate::events::CoreEvent::SpeakingUpdate
-pub struct VoiceData<'a> {
-    /// Decoded audio from this packet.
-    pub audio: &'a Option<Vec<i16>>,
-    /// Raw RTP packet data.
-    ///
-    /// Includes the SSRC (i.e., sender) of this packet.
-    pub packet: &'a Rtp,
-    /// Byte index into the packet body (after headers) for where the payload begins.
-    pub payload_offset: usize,
-    /// Number of bytes at the end of the packet to discard.
-    pub payload_end_pad: usize,
+/// [`Config::playout_buffer_length`]: crate::Config::playout_buffer_length
+pub struct VoiceTick {
+    /// Decoded voice data and source packets sent by each user.
+    pub speaking: HashMap<u32, VoiceData>,
+
+    /// Set of all SSRCs currently known in the call who aren't included in [`Self::speaking`].
+    pub silent: HashSet<u32>,
 }
+
+#[derive(Clone, Debug, Eq, PartialEq)]
+#[non_exhaustive]
+/// Voice packet and audio data for a single user, from a single tick.
+pub struct VoiceData {
+    /// RTP packet clocked out for this tick.
+    ///
+    /// If `None`, then the packet was lost, and [`Self::decoded_voice`] may include
+    /// around one codec delay's worth of audio.
+    pub packet: Option<RtpData>,
+    /// PCM audio obtained from a user.
+    ///
+    /// Valid audio data (`Some(audio)` where `audio.len() >= 0`) typically contains 20ms of 16-bit stereo PCM audio
+    /// at 48kHz, using native endianness. Channels are interleaved (i.e., `L, R, L, R, ...`).
+    ///
+    /// This value will be `None` if Songbird is not configured to decode audio.
+    pub decoded_voice: Option<Vec<i16>>,
+}
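The `VoiceTick` documentation describes a jitter buffer that reorders packets and delays playout by roughly `playout_buffer_length * 20ms`. What follows is a minimal sketch of that idea for a single RTP stream, not Songbird's actual implementation. `JitterBuffer`, `Playout`, `push`, and `tick` are hypothetical names. The `depth` parameter stands in for `playout_buffer_length`, and sequence numbers are compared with 16-bit wrapping arithmetic. With a depth of 3, about three packets (roughly 60ms of audio) are held back before playout begins.

```rust
use std::collections::HashMap;

/// Result of one 20ms playout tick (illustrative, not Songbird's types).
#[derive(Debug, PartialEq, Eq)]
enum Playout {
    /// Still filling the buffer; nothing is played out yet.
    Buffering,
    /// The packet for this slot did not arrive in time.
    Lost,
    /// The packet scheduled for this slot.
    Packet(Vec<u8>),
}

/// Hypothetical fixed-depth reorder buffer for a single RTP stream (one SSRC).
struct JitterBuffer {
    /// Packets to accumulate before playout starts; each adds roughly 20ms of delay.
    depth: usize,
    /// Next slot to play out (the earliest sequence number seen while buffering).
    next_seq: Option<u16>,
    /// Packets received but not yet played, keyed by RTP sequence number.
    pending: HashMap<u16, Vec<u8>>,
    /// Set once `depth` packets have been buffered.
    playing: bool,
}

impl JitterBuffer {
    fn new(depth: usize) -> Self {
        Self { depth, next_seq: None, pending: HashMap::new(), playing: false }
    }

    /// `true` if `a` precedes `b`, allowing for 16-bit sequence wraparound.
    fn is_before(a: u16, b: u16) -> bool {
        (a.wrapping_sub(b) as i16) < 0
    }

    fn push(&mut self, seq: u16, payload: Vec<u8>) {
        match self.next_seq {
            None => self.next_seq = Some(seq),
            Some(next) if Self::is_before(seq, next) => {
                if self.playing {
                    // Its slot has already been played out: too late, drop it.
                    return;
                }
                // Still buffering, so an earlier packet moves the start point back.
                self.next_seq = Some(seq);
            }
            Some(_) => {}
        }
        self.pending.insert(seq, payload);
        if !self.playing && self.pending.len() >= self.depth {
            self.playing = true;
        }
    }

    /// Called once per 20ms tick: emit the next slot in sequence order.
    fn tick(&mut self) -> Playout {
        if !self.playing {
            return Playout::Buffering;
        }
        // `playing` is only set inside `push`, after `next_seq` has been assigned.
        let seq = self.next_seq.expect("set by the first push");
        self.next_seq = Some(seq.wrapping_add(1));
        match self.pending.remove(&seq) {
            Some(payload) => Playout::Packet(payload),
            None => Playout::Lost,
        }
    }
}

fn main() {
    let mut buf = JitterBuffer::new(3);
    // Packets arrive out of order, and sequence 12 arrives too late.
    buf.push(11, vec![11]);
    assert_eq!(buf.tick(), Playout::Buffering);
    buf.push(10, vec![10]);
    buf.push(13, vec![13]);
    assert_eq!(buf.tick(), Playout::Packet(vec![10]));
    assert_eq!(buf.tick(), Playout::Packet(vec![11]));
    assert_eq!(buf.tick(), Playout::Lost);
    buf.push(12, vec![12]); // its slot has passed, so it is dropped
    assert_eq!(buf.tick(), Playout::Packet(vec![13]));
    println!("reordered playout ok");
}
```

A fixed depth trades latency for tolerance of reordering. A packet delayed past its slot is reported as lost, which is roughly analogous to `VoiceData::packet` being `None`. For simplicity, this sketch never returns to buffering once the stream goes quiet.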
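`decoded_voice` typically holds 20ms of interleaved 16-bit stereo PCM at 48kHz. That works out to 48,000 × 0.020 = 960 frames, or 1,920 `i16` values. The sketch below shows that arithmetic and splits an interleaved buffer into per-channel vectors. The constants and `split_channels` are illustrative helpers, not part of Songbird. Because the docs say "typically", callers should not assume every buffer holds exactly 1,920 samples.

```rust
/// Samples per second, per channel (48kHz).
const SAMPLE_RATE: usize = 48_000;
/// Two interleaved channels: L, R, L, R, ...
const CHANNELS: usize = 2;
/// One tick lasts 20ms.
const TICK_MS: usize = 20;
/// Frames (one sample per channel) in a full tick: 48_000 * 20 / 1000 = 960.
const FRAMES_PER_TICK: usize = SAMPLE_RATE * TICK_MS / 1000;
/// Interleaved `i16` values in a full tick: 960 * 2 = 1920.
const SAMPLES_PER_TICK: usize = FRAMES_PER_TICK * CHANNELS;

/// Split interleaved stereo PCM into (left, right) channel buffers.
/// Assumes `CHANNELS == 2`; a trailing incomplete frame is ignored.
fn split_channels(interleaved: &[i16]) -> (Vec<i16>, Vec<i16>) {
    interleaved
        .chunks_exact(CHANNELS)
        .map(|frame| (frame[0], frame[1]))
        .unzip()
}

fn main() {
    assert_eq!(FRAMES_PER_TICK, 960);
    assert_eq!(SAMPLES_PER_TICK, 1920);

    let (left, right) = split_channels(&[1, -1, 2, -2, 3, -3]);
    assert_eq!(left, vec![1, 2, 3]);
    assert_eq!(right, vec![-1, -2, -3]);

    // A full tick of silence splits into 960 samples per channel.
    let silence = vec![0i16; SAMPLES_PER_TICK];
    let (l, r) = split_channels(&silence);
    assert_eq!((l.len(), r.len()), (FRAMES_PER_TICK, FRAMES_PER_TICK));
}
```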