* Driver: Support `tokio-websockets`
* Fix bad feature flag
* Fix CI & examples features
* Use tungstenite in twilight example
* Error if none or both ws features are enabled
* Match `twilight-gateway` features
Discord will send `GatewayEvent::Speaking` (opcode 5) messages
after the Hello+Ready exchange, but will happily interleave
them with crypto mode negotiation. We were previously not expecting
such messages and dropping them -- this hurts receive-based bots'
ability to map SSRCs to UserIds when joining a call with existing
users.
This PR feeds all unexpected messages into the WS task directly,
which will handle them once all tasks are fully started.
This PR addresses some issues which have cropped up on voice receive at scale:
* In unknown circumstances, we can be left with adjacent packets queued which have very different timestamps. The playout buffer would withhold its held packets, leading to the loss of many subsequent packets if the timestamp jump is larger than 64 frames. This seems to occur for some specific clients which join before a bot, suggesting the DAVE -> legacy switchover is involved.
* Some loss patterns can leave us unable to correctly track the next expected sequence number (i.e., large loss runs), leaving the playout buffer unable to accept any packets if the packet sequence differed by over 64 entries.
The fixes are fallbacks which handle sufficiently large desynchronisation, and allow the playout buffer to get back into a consistent state in both cases. Large timestamp jumps on adjacent packets now update the next expected TS (noting that we only want to withhold a few playout delays at most). Failure to insert 0.25s of packets (or attempting to add a new sequence number into an empty buffer) can now take precedence.
Closes #261.
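A minimal sketch of the sequence-number fallback, under stated assumptions: `Playout`, `Packet`, `Stored`, and both constants are hypothetical names for illustration and are not the driver's actual types. It reads "take precedence" as "resynchronise on the incoming packet", and it assumes 20ms packets, so 0.25s is about 12 rejected inserts. The timestamp fallback is not modelled here.

```rust
use std::collections::VecDeque;

// Assumed window size, taken from the "64 entries" mentioned above.
const MAX_AHEAD: u16 = 64;
// Assumption: 0.25s of 20ms packets is 12.5 packets, rounded down here.
const MAX_REJECTS: usize = 12;

#[derive(Debug, Clone, PartialEq)]
struct Packet {
    seq: u16,
}

#[derive(Debug, PartialEq)]
enum Stored {
    Accepted,
    Rejected,
    Resynced,
}

struct Playout {
    next_seq: u16,
    rejects: usize,
    slots: VecDeque<Option<Packet>>,
}

impl Playout {
    fn new(next_seq: u16) -> Self {
        Self { next_seq, rejects: 0, slots: VecDeque::new() }
    }

    fn store(&mut self, pkt: Packet) -> Stored {
        // Wrapping distance from the next expected sequence number.
        // Late packets wrap around to a large value and fall outside the window.
        let ahead = pkt.seq.wrapping_sub(self.next_seq);
        if ahead < MAX_AHEAD {
            let idx = ahead as usize;
            while self.slots.len() <= idx {
                self.slots.push_back(None);
            }
            self.slots[idx] = Some(pkt);
            self.rejects = 0;
            return Stored::Accepted;
        }

        // Fallback: an empty buffer, or a long enough run of rejected packets,
        // lets the incoming sequence number replace the expected one.
        self.rejects += 1;
        if self.slots.is_empty() || self.rejects >= MAX_REJECTS {
            self.slots.clear();
            self.next_seq = pkt.seq;
            self.slots.push_back(Some(pkt));
            self.rejects = 0;
            return Stored::Resynced;
        }
        Stored::Rejected
    }

    // Takes the packet for the current tick (if any) and advances by one slot.
    fn fetch(&mut self) -> Option<Packet> {
        let out = self.slots.pop_front().flatten();
        self.next_seq = self.next_seq.wrapping_add(1);
        out
    }
}
```

Without the fallback branch, a buffer whose `next_seq` is far behind the stream would reject every packet indefinitely.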
* Receive: Config of decode sample rate/channels
This PR allows for dynamic configuration of the output sample rate and
channel count of received Opus audio. Users who rely on supported
formats should no longer need to manually resample & downmix audio
decoded from SSRCs in a call.
Opus exposes tuples of (Mono, Stereo) x (8, 12, 16, 24, 48)kHz.
Changing this at runtime (mid-call) may cause some audio glitches, as
decoder state must be reconstructed from scratch for all affected SSRCs.
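As a rough illustration of the (Mono, Stereo) x (8, 12, 16, 24, 48)kHz space, the sketch below models the ten combinations and the size of one decoded frame. The enum and function names are hypothetical and only chosen for this example, and the 20ms frame length is an assumption.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Channels {
    Mono,
    Stereo,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum SampleRate {
    Hz8000,
    Hz12000,
    Hz16000,
    Hz24000,
    Hz48000,
}

impl Channels {
    fn count(self) -> usize {
        match self {
            Channels::Mono => 1,
            Channels::Stereo => 2,
        }
    }
}

impl SampleRate {
    fn hz(self) -> usize {
        match self {
            SampleRate::Hz8000 => 8_000,
            SampleRate::Hz12000 => 12_000,
            SampleRate::Hz16000 => 16_000,
            SampleRate::Hz24000 => 24_000,
            SampleRate::Hz48000 => 48_000,
        }
    }
}

/// Number of interleaved samples in one decoded frame,
/// assuming 20ms frames (50 frames per second).
fn frame_len(rate: SampleRate, channels: Channels) -> usize {
    rate.hz() / 50 * channels.count()
}
```

Under these assumptions, stereo 48kHz output needs 1920 samples per frame, while mono 8kHz needs only 160.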
* Fix doc typo, consistent naming with MixMode.
This PR adds support for the new AEAD cryptosystems advertised by Discord, AES256-GCM and XChaCha20Poly1305. These schemes will shortly become mandatory, and provide stronger integrity/authentication guarantees over the cleartext portions of any voice packet by correctly specifying additional authenticated data.
To provide smooth switchover, we've added basic negotiation over the `CryptoMode`. This ensures that any clients who are manually specifying one of the legacy modes will automatically migrate to `Aes256Gcm` when Discord cease to advertise their original preference.
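The negotiation can be sketched roughly as follows. This is an illustration only: the `Legacy` variant and the `negotiate` function are hypothetical, and the fallback to `XChaCha20Poly1305` when `Aes256Gcm` is not advertised is an assumption that the text above does not state.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum CryptoMode {
    Aes256Gcm,
    XChaCha20Poly1305,
    // Hypothetical stand-in for any of the legacy modes.
    Legacy,
}

/// Picks the mode to use given the user's preference and the
/// modes the server advertises.
fn negotiate(preferred: CryptoMode, advertised: &[CryptoMode]) -> Option<CryptoMode> {
    // Honour the user's choice for as long as it is still offered.
    if advertised.contains(&preferred) {
        return Some(preferred);
    }

    // Otherwise migrate to an AEAD mode, trying Aes256Gcm first.
    // The second entry in this list is an assumption.
    [CryptoMode::Aes256Gcm, CryptoMode::XChaCha20Poly1305]
        .iter()
        .copied()
        .find(|mode| advertised.contains(mode))
}
```

Returning `None` when nothing matches leaves the caller to decide how to fail the handshake.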
Closes #246.
---------
Co-authored-by: Kyle Simpson <kyleandrew.simpson@gmail.com>
Bots joining calls with users seem to provoke large runs of packets
with identical timestamps -- the existing logic was intended to handle
this catchup case in addition to the normal increment (+=960) at all
times.
However, we were checking that a packet was modulo greater-than
the next ts, rather than modulo less-than. Simple enough to fix.
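For reference, a "modulo" comparison on 32-bit RTP timestamps can be written as below. This is a generic sketch of wrapping comparison, not the code from this fix, and the function names are hypothetical.

```rust
/// Returns true if `a` comes strictly before `b` in wrapping (mod 2^32) order,
/// i.e. `b` is less than half the number space ahead of `a`.
fn ts_less_than(a: u32, b: u32) -> bool {
    let diff = b.wrapping_sub(a);
    diff != 0 && diff < (1u32 << 31)
}

/// Returns true if `a` comes strictly after `b` in wrapping order.
fn ts_greater_than(a: u32, b: u32) -> bool {
    ts_less_than(b, a)
}
```

The two helpers differ only in argument order, which is why swapping one for the other is an easy mistake to make and an easy one to fix.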
This will cause FreeBSD to fail setting up the socket. It may also be true of some other operating systems, but these are the ones I have been able to test.
It works on Windows and Linux.
The `Config` object provided to `Call`s and `Driver`s allows setting a `DisposalThread`, but since it is unreachable from outside the crate, the only way to properly set it is using a `Songbird` instance, which may not be adequate for all use cases. This PR just makes `DisposalThread` reachable from the outside.
Fixes behaviour where a Driver which was asked to leave an active call would receive the disconnect event several times: once when we started the disconnect, and once again when Discord killed the WS client.
Previously, we were only skipping zero-packet frames when we needed to
resample because the source sampling rate was not set to 48kHz. This
check should have also been applied in the case that a packet did not
need a resampler to be built.
Fixes #224.
This PR fixes a case where a call which changes channel or gracefully
reconnects would have been stuck in the Idle state. SetConn events will
now allow a transition straight back to Live if any tracks are found
attached to a mixer.
As of v0.9.1, `xsalsa20poly1305` has been deprecated. This is a mostly seamless replacement, as it appears to be the same crate authors / code / etc.
Co-authored-by: Kyle Simpson <kyleandrew.simpson@gmail.com>
A removed audio task could still have one or more driver messages left in its queue, leading to a crash when the id->mixer lookup failed. This removes an unwrap which is invalid under these assumptions and includes an extra cleanup measure for message forwarders under the same circumstances.
This was tested using `cargo make ready`.
This PR implements a custom scheduler for audio threads, which reduces thread use and (often) memory consumption.
To save threads and memory (e.g., packet buffer allocations), Songbird parks Mixer tasks which do not have any live Tracks.
These are now all co-located on a single async 'Idle' task.
This task is responsible for managing UDP keepalive messages for each task, maintaining event state, and executing any Mixer task messages.
Whenever any message arrives which adds a `Track`, the mixer task is moved to a live thread.
The Idle task inspects task counts and execution time on each thread, choosing the first live thread with room, and creating a new one if needed.
Each live thread is responsible for running as many live mixers as it can in a single tick every 20ms: this currently defaults to 16 mixers per thread, but is user-configurable.
A live thread also stores RTP packet blocks to be written into by each sub-task.
Each live thread has a conservative limit of 18ms that it will aim to stay under: if all work takes longer than this, it will offload the task with the highest mixing cost once per tick onto another (possibly new) live worker thread.
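The placement and offload rules above can be sketched as follows. This is a simplified model under stated assumptions: `LiveThread`, `place_mixer`, and `offload_if_over_budget` are hypothetical names, each mixer is reduced to its measured cost in microseconds, and real threads, timing, and packet buffers are left out.

```rust
const DEFAULT_MIXERS_PER_THREAD: usize = 16;
// The conservative 18ms budget per 20ms tick, in microseconds.
const TICK_BUDGET_US: u64 = 18_000;

struct LiveThread {
    // Measured mixing cost of each mixer on this thread, in microseconds.
    mixer_costs_us: Vec<u64>,
}

impl LiveThread {
    fn total_cost(&self) -> u64 {
        self.mixer_costs_us.iter().sum()
    }

    fn has_room(&self, limit: usize) -> bool {
        self.mixer_costs_us.len() < limit && self.total_cost() < TICK_BUDGET_US
    }
}

/// Places a mixer on the first live thread with room, creating a new
/// thread if none has room. Returns the index of the chosen thread.
fn place_mixer(threads: &mut Vec<LiveThread>, cost_us: u64, limit: usize) -> usize {
    let idx = match threads.iter().position(|t| t.has_room(limit)) {
        Some(i) => i,
        None => {
            threads.push(LiveThread { mixer_costs_us: Vec::new() });
            threads.len() - 1
        }
    };
    threads[idx].mixer_costs_us.push(cost_us);
    idx
}

/// If the thread exceeded its budget this tick, removes and returns the
/// cost of the most expensive mixer so it can be placed elsewhere.
fn offload_if_over_budget(thread: &mut LiveThread) -> Option<u64> {
    if thread.total_cost() <= TICK_BUDGET_US {
        return None;
    }
    let worst = thread
        .mixer_costs_us
        .iter()
        .enumerate()
        .max_by_key(|(_, cost)| **cost)
        .map(|(i, _)| i)?;
    Some(thread.mixer_costs_us.remove(worst))
}
```

Offloading at most one mixer per tick keeps the rebalancing cost bounded, at the price of taking several ticks to recover from a large overload.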
Moves all WS handling of unexpected payloads into the stream to prevent code duplication.
This also prevents non-{Hello,Resumed,Ready} messages from causing a handshake failure, as it seems Discord do not prevent such messages from appearing.
---------
Co-authored-by: Kyle Simpson <kyleandrew.simpson@gmail.com>
This patch changes around quite a few things.
The main entrypoint for twilight besides process will now be the
TwilightMap which consists of command senders for each shard.
This simplifies parts of the code as there is not any difference
between shards and clusters anymore.
This PR introduces a new `VoiceTick` event which collects and reorders all RTP packets to smooth over network instability, as well as to synchronise user audio streams. Raw packet events have been moved to `RtpPacket`, while `SpeakingUpdate`s have been removed as they can be easily computed using the `silent`/`speaking` audio maps included in each event.
Closes #146.
This PR makes use of `SampleBuffer::samples_mut` to remove a 7680B stack allocation in general, and memcopy when softclip is used. This appears to offer a ~1.5% performance boost according to `cargo make bench`.
Adds a new field to Config, disposer, an Option<Sender<DisposalMessage>> responsible for dropping the DisposalMessage on a separate thread.
If this is not set, and the Config is passed into manager::Songbird, a thread is spawned for this purpose (which previously was spawned per driver).
If this is not set, and the Config is passed directly into Driver or Call, a thread is spawned locally, which is the current behavior as there is nowhere to store the Sender.
This disposer is then used in Driver as previously, to run possibly blocking destructors (which should only block the disposal thread). I cannot see this disposal thread getting overloaded, but if it is, the DisposalMessages will simply be queued in the flume channel until they can be dropped.
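The pattern itself looks roughly like the sketch below. It uses `std::sync::mpsc` rather than flume to stay dependency-free, and both the `DisposalMessage` alias and `spawn_disposer` are hypothetical stand-ins for this example rather than the crate's definitions.

```rust
use std::any::Any;
use std::sync::mpsc::{channel, Sender};
use std::thread::{self, JoinHandle};

// Hypothetical stand-in: anything sendable whose destructor may block.
type DisposalMessage = Box<dyn Any + Send>;

/// Spawns a thread whose only job is to drop whatever it is sent.
/// The handle yields the number of messages dropped once all senders are gone.
fn spawn_disposer() -> (Sender<DisposalMessage>, JoinHandle<usize>) {
    let (tx, rx) = channel::<DisposalMessage>();
    let handle = thread::spawn(move || {
        let mut dropped = 0;
        // Messages queue in the channel until this loop reaches them.
        for msg in rx {
            drop(msg);
            dropped += 1;
        }
        dropped
    });
    (tx, handle)
}
```

A single `Sender` can be cloned into every driver, so a slow destructor only ever delays the disposal thread and never the mixer.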
Co-authored-by: Kyle Simpson <kyleandrew.simpson@gmail.com>
`SsrcState` objects are created on a per-user basis when "receive" is enabled, but were previously never destroyed. This PR adds some shared dashmaps for the WS task to communicate SSRC-to-ID mappings to the UDP Rx task, as well as any disconnections. Additionally, decoder state is pruned a default 1 minute after a user last speaks.
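The pruning step can be illustrated with the sketch below. It substitutes a plain `HashMap` for the shared dashmaps, and the `SsrcState` fields and the `prune` function are simplified, hypothetical versions written for this example.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Simplified stand-in: the real state also holds decoder data.
struct SsrcState {
    last_spoke: Instant,
}

/// Removes every SSRC which has been silent for at least `timeout`,
/// returning how many entries were pruned.
fn prune(states: &mut HashMap<u32, SsrcState>, now: Instant, timeout: Duration) -> usize {
    let before = states.len();
    states.retain(|_, state| now.saturating_duration_since(state.last_spoke) < timeout);
    before - states.len()
}
```

Passing `now` in explicitly keeps the function deterministic, which makes the one-minute default easy to test without sleeping.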
This was tested using `cargo make ready` and via `examples/serenity/voice_receive/`.
Closes #133
Adds the "receive" feature, which is disabled by default. When this is disabled, the UDP receive task is not compiled and not run, and as an optimisation the UDP receive buffer size is set to 0. All related events are also removed.
This also removes the UDP Tx task, and moves packet and keepalive sends back into the mixer thread. This allows us to entirely remove channels and various allocations between the mixer and an async task created only for sending data (i.e., fewer memcopies).
If "receive" is enabled, UDP sends are now non-blocking due to technical constraints -- failure to send is non-fatal, but *will* drop affected packets. Given that blocking on a UDP send indicates that the OS cannot clear send buffers fast enough, this should alleviate OS load.
Closes #131.
This places songbird, serenity, and twilight onto the same WS library, hopefully reducing the compile overhead for everyone.
Tested using `cargo make ready` and by running `examples/voice`.
Closes #129.
This adds the `use_softclip` field to `Config`, which can currently provide a ~10us reduction in mixing cost from both a removed memcpy and the softclip itself.
This PR was tested using `cargo make ready`.
Closes #134.
This extensive PR rewrites the internal mixing logic of the driver to use symphonia for parsing and decoding audio data, and rubato to resample audio. Existing logic to decode DCA and Opus formats/data have been reworked as plugins for symphonia. The main benefit is that we no longer need to keep yt-dlp and ffmpeg processes alive, saving a lot of memory and CPU: all decoding can be done in Rust! In exchange, we now need to do a lot of the HTTP handling and resumption ourselves, but this is still a huge net positive.
`Input`s have been completely reworked such that all default (non-cached) sources are lazy, and are no longer covered by a special-case `Restartable`. These now span a gamut from a `Compose` (lazy), to a live source, to a fully `Parsed` source. As mixing is still sync, this includes adapters for `AsyncRead`/`AsyncSeek`, and HTTP streams.
`Track`s have been reworked so that they only contain initialisation state for each track. `TrackHandle`s are only created once a `Track`/`Input` has been handed over to the driver, replacing `create_player` and related functions. `TrackHandle::action` now acts on a `View` of (im)mutable state, and can request seeks/readying via `Action`.
Per-track event handling has also been improved -- we can now determine and propagate the reason behind individual track errors due to the new backend. Some `TrackHandle` commands (seek etc.) benefit from this, and now use internal callbacks to signal completion.
Due to associated PRs on felixmcfelix/songbird from avid testers, this includes general clippy tweaks, API additions, and other repo-wide cleanup. Thanks go out to the below co-authors.
Co-authored-by: Gnome! <45660393+GnomedDev@users.noreply.github.com>
Co-authored-by: Alakh <36898190+alakhpc@users.noreply.github.com>