Bumps Cargo.toml v1.9.17 → v1.9.18 and ships the changelog for the
zero-copy mux refactor merged in 54552bb. No user-visible behavior
change; perf-focused release.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
• Performance refactor of the full-tunnel mux hot path (#881 by @dazzling-no-more): zero-copy reads via Bytes/BytesMut, and base64 encoding moved off the single mux thread. No wire-protocol change; internal data flow only. (1) tunnel_loop and the SOCKS5 UDP receive loop no longer make a per-iteration Vec::to_vec() copy. MuxMsg::{ConnectData,Data,UdpOpen,UdpData} now carry Bytes (Arc-backed) instead of Vec<u8>/Arc<Vec<u8>>. TCP path is threshold-based: ≥32 KB → BytesMut::split().freeze() (saves a 64 KB memcpy on hot downloads); <32 KB → Bytes::copy_from_slice + buf.clear() (payload-sized retention). UDP path: fixed Vec<u8> recv buffer + size-guarded copy. (2) base64 encoding (up to ~3 MB per batch) moved from the mux thread to the spawned task in fire_batch, after the per-deployment semaphore, so the single mux task no longer serializes on it. (3) Code quality: BatchAccum::push_or_fire (4 match arms collapsed into 1), a should_fire() predicate using saturating_add, and an encode_pending() free function. 200 → 208 lib tests (+8 regression: encode_pending × 4, should_fire × 3, batch_accum_reindexes_after_flush). API change: TunnelMux::udp_open/udp_data now take impl Into<Bytes>; existing callers passing Vec/Bytes/BytesMut work unchanged.
• Performance refactor of the full-tunnel mux hot data path (#881 by @dazzling-no-more). No wire-protocol changes — internal data flow only.
1. Zero-copy reads via Bytes/BytesMut. tunnel_loop and the SOCKS5 UDP receive loop drop per-iteration Vec::to_vec() copies. MuxMsg::{ConnectData,Data,UdpOpen,UdpData} now carry Bytes (Arc-backed internally) instead of Vec<u8>/Arc<Vec<u8>>; the Arc::try_unwrap dance for pending_client_data is gone. TCP path is threshold-based to avoid memory regressions:
- n ≥ 32 KB: BytesMut::split().freeze(), which saves the 64 KB memcpy on hot downloads.
- n < 32 KB: Bytes::copy_from_slice + buf.clear(), which gives payload-sized retention.
Without this split, bytes 1.x's whole-allocation refcount would pin a full 64 KB per queued tiny read under semaphore stall (worst case ~96 MB on a backpressured tunnel).
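The TCP threshold rule can be modeled without the bytes crate. In this std-only sketch, Arc<Vec<u8>> stands in for Bytes, and take_read, Chunk, and READ_BUF_CAP are hypothetical names, not the tunnel_loop code. One simplification: BytesMut::split() keeps reading into the unused tail of the same allocation, whereas the sketch just starts a fresh 64 KB Vec. pinned_bytes() shows why the cut-over matters: a large read hands off its whole allocation, so a tiny read taking that same path would keep 64 KB alive for a few bytes of payload.

```rust
use std::sync::Arc;

const READ_BUF_CAP: usize = 64 * 1024; // per-read buffer size implied by the "64 KB memcpy"
const SPLIT_THRESHOLD: usize = 32 * 1024; // the 32 KB cut-over from the changelog

/// Std-only stand-in for bytes::Bytes: a refcounted allocation holding the payload.
struct Chunk {
    alloc: Arc<Vec<u8>>,
}

impl Chunk {
    fn data(&self) -> &[u8] {
        &self.alloc
    }
    /// Memory kept alive while this chunk sits in a queue: the whole allocation.
    fn pinned_bytes(&self) -> usize {
        self.alloc.capacity()
    }
}

/// Hypothetical helper: turn the bytes just read into an owned chunk.
fn take_read(read_buf: &mut Vec<u8>) -> Chunk {
    if read_buf.len() >= SPLIT_THRESHOLD {
        // Large read: move the buffer out without copying its contents
        // (analogous to split().freeze()) and start a fresh one for the next read.
        let full = std::mem::replace(read_buf, Vec::with_capacity(READ_BUF_CAP));
        Chunk { alloc: Arc::new(full) }
    } else {
        // Small read: copy only the payload (analogous to Bytes::copy_from_slice)
        // and clear the buffer so the 64 KB allocation is reused.
        let copy = read_buf.to_vec();
        read_buf.clear();
        Chunk { alloc: Arc::new(copy) }
    }
}
```

A 100-byte read comes out pinning roughly 100 bytes while the read buffer keeps its 64 KB capacity; a 40 KB read is handed off whole, pinning the full allocation, which is acceptable because most of it is payload anyway.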
UDP path: a fixed Vec<u8> recv buffer plus Bytes::copy_from_slice after the 9 KB MAX_UDP_PAYLOAD_BYTES guard. parse_socks5_udp_packet is split into an _offsets variant and a &[u8] wrapper, so callers stay on the reusable buffer.
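A sketch of the offsets-plus-wrapper split and the size-guarded copy, assuming the RFC 1928 UDP request header layout (RSV, FRAG, ATYP, DST.ADDR, DST.PORT, DATA). The names parse_udp_offsets, parse_udp_packet, and take_payload are hypothetical, Vec<u8> stands in for Bytes, and reading "9 KB" as 9 * 1024 bytes is a guess. The offsets function only returns positions, so the wrapper can hand out borrows into the reusable recv buffer; the single owned copy happens after the size check.

```rust
use std::ops::Range;

// Assumption: the changelog says "9 KB"; 9 * 1024 vs 9000 is a guess.
const MAX_UDP_PAYLOAD_BYTES: usize = 9 * 1024;

/// Hypothetical offsets-only parse result: positions into the caller's buffer.
struct UdpOffsets {
    addr: Range<usize>,   // DST.ADDR bytes (IPv4, domain name, or IPv6)
    port: u16,            // DST.PORT, big-endian on the wire
    payload_start: usize, // first byte of DATA
}

/// Parse RSV(2) FRAG(1) ATYP(1) DST.ADDR(var) DST.PORT(2) without copying anything.
fn parse_udp_offsets(pkt: &[u8]) -> Option<UdpOffsets> {
    if pkt.len() < 4 || pkt[2] != 0 {
        return None; // too short, or a fragment (FRAG != 0), which this sketch rejects
    }
    let (addr_start, addr_len) = match pkt[3] {
        0x01 => (4, 4),                     // IPv4
        0x04 => (4, 16),                    // IPv6
        0x03 => (5, *pkt.get(4)? as usize), // domain: 1 length byte, then the name
        _ => return None,
    };
    let port_at = addr_start + addr_len;
    let port = pkt.get(port_at..port_at + 2)?; // also proves the address fits
    Some(UdpOffsets {
        addr: addr_start..port_at,
        port: u16::from_be_bytes([port[0], port[1]]),
        payload_start: port_at + 2,
    })
}

/// Thin &[u8] wrapper: borrows address and payload straight out of the recv buffer.
fn parse_udp_packet(pkt: &[u8]) -> Option<(&[u8], u16, &[u8])> {
    let o = parse_udp_offsets(pkt)?;
    Some((&pkt[o.addr], o.port, &pkt[o.payload_start..]))
}

/// Size-guarded copy: oversized datagrams are dropped; otherwise exactly one
/// payload-sized copy leaves the loop (Vec here, Bytes::copy_from_slice in the real code).
fn take_payload(pkt: &[u8]) -> Option<Vec<u8>> {
    let (_addr, _port, payload) = parse_udp_packet(pkt)?;
    if payload.len() > MAX_UDP_PAYLOAD_BYTES {
        return None;
    }
    Some(payload.to_vec())
}
```

Because every check happens on borrowed slices, the recv buffer can be reused for the next datagram as soon as take_payload returns.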
2. Base64 encoding moved off the single mux thread. New internal PendingOp { data: Option<Bytes>, encode_empty: bool } flows through mux_loop with raw bytes. Actual B64.encode(...) runs in fire_batch's spawned task, after the per-deployment semaphore. Up to ~3 MB of encoding per batch (50 ops × 64 KB) no longer serializes the single mux task.
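The move of encoding work can be sketched with std::thread in place of a spawned async task. PendingOp here is a simplified, hypothetical version (Vec<u8> for Bytes, and no encode_empty flag, whose exact semantics the changelog does not spell out); fire_batch only illustrates where the work runs, and b64_encode is a minimal RFC 4648 encoder included so the block needs no crates. The per-deployment semaphore is not modeled.

```rust
use std::thread;

const ALPHABET: &[u8; 64] = b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Minimal standard (padded) base64 encoder.
fn b64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to 3 input bytes into a 24-bit group (missing bytes count as 0).
        let b = [chunk[0], *chunk.get(1).unwrap_or(&0), *chunk.get(2).unwrap_or(&0)];
        let n = (b[0] as u32) << 16 | (b[1] as u32) << 8 | b[2] as u32;
        // A chunk of k bytes yields k + 1 real characters; the rest are '=' padding.
        for i in 0..4 {
            if i <= chunk.len() {
                out.push(ALPHABET[(n >> (18 - 6 * i)) as usize & 63] as char);
            } else {
                out.push('=');
            }
        }
    }
    out
}

/// Hypothetical, simplified PendingOp: raw bytes travel through the mux loop unencoded.
struct PendingOp {
    data: Option<Vec<u8>>,
}

/// Hypothetical fire_batch: the batch is moved into a spawned thread and all
/// base64 work happens there, not on the caller's (mux) thread.
fn fire_batch(batch: Vec<PendingOp>) -> thread::JoinHandle<Vec<Option<String>>> {
    thread::spawn(move || {
        // The real code acquires the per-deployment semaphore before this point.
        batch
            .iter()
            .map(|op| op.data.as_deref().map(b64_encode))
            .collect()
    })
}
```

The mux side only builds the Vec<PendingOp> and hands it off, so its per-batch cost no longer grows with payload size; the encoding cost scales with the batch on the spawned side instead.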
3. Code quality (drive-bys). BatchAccum::push_or_fire collapses 4× ~25-line match arms into ~10 lines each. should_fire(pending_len, payload_bytes, op_bytes) predicate extracted with saturating_add. encode_pending(p) -> BatchOp extracted as a free function for direct test coverage.
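The should_fire / push_or_fire split might look roughly like this. MAX_BATCH_OPS = 50 follows the "50 ops × 64 KB" figure in item 2; MAX_BATCH_BYTES and the whole BatchAccum body are assumptions, including returning the new op's index within its batch (one plausible reading of what batch_accum_reindexes_after_flush checks). saturating_add matters when op_bytes is huge: a plain + would panic in debug builds or wrap in release, and a wrapped sum could slip under the cap.

```rust
const MAX_BATCH_OPS: usize = 50; // from "50 ops × 64 KB"
const MAX_BATCH_BYTES: usize = 3 * 1024 * 1024; // assumed byte cap for this sketch

/// Fire the pending batch before pushing a new op if the batch is already full
/// by count, or if adding op_bytes would exceed the byte cap. saturating_add
/// clamps at usize::MAX, so an enormous op still compares as "too big".
fn should_fire(pending_len: usize, payload_bytes: usize, op_bytes: usize) -> bool {
    pending_len >= MAX_BATCH_OPS || payload_bytes.saturating_add(op_bytes) > MAX_BATCH_BYTES
}

/// Hypothetical accumulator: flush-then-push lives in one method instead of
/// being repeated in every match arm.
#[derive(Default)]
struct BatchAccum {
    pending: Vec<Vec<u8>>,
    payload_bytes: usize,
}

impl BatchAccum {
    /// Returns the batch that had to be fired (if any) and the index of `op`
    /// within the batch it landed in. An empty batch is never fired, so a
    /// single oversized op is still queued on its own.
    fn push_or_fire(&mut self, op: Vec<u8>) -> (Option<Vec<Vec<u8>>>, usize) {
        let fired = if !self.pending.is_empty()
            && should_fire(self.pending.len(), self.payload_bytes, op.len())
        {
            self.payload_bytes = 0;
            Some(std::mem::take(&mut self.pending))
        } else {
            None
        };
        self.payload_bytes = self.payload_bytes.saturating_add(op.len());
        self.pending.push(op);
        (fired, self.pending.len() - 1)
    }
}
```

Keeping should_fire a pure function of three integers is what makes it testable in isolation, independent of any accumulator state.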
Public API change: TunnelMux::udp_open and udp_data now take data: impl Into<Bytes> instead of Vec<u8> — existing in-tree callers passing Vec<u8>, &'static [u8], Bytes, or BytesMut all keep compiling.
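Existing callers keep compiling because every From<T> impl for the target type gives T an Into impl for free, so one generic parameter accepts all of them. The sketch uses a hypothetical std-only Payload type in place of Bytes (which has From impls for Vec<u8>, &'static [u8], and BytesMut); the Mux type and its udp_data signature are illustrative, not the TunnelMux code. Unlike Bytes::from(&'static [u8]), which does not copy, Arc::from here copies the bytes.

```rust
use std::sync::Arc;

/// Std-only stand-in for bytes::Bytes: an immutable, cheaply clonable byte buffer.
#[derive(Debug, Clone, PartialEq)]
struct Payload(Arc<[u8]>);

impl Payload {
    fn as_slice(&self) -> &[u8] {
        &self.0
    }
}

// Each From impl below gives callers a matching Into<Payload> automatically.
impl From<Vec<u8>> for Payload {
    fn from(v: Vec<u8>) -> Self {
        Payload(Arc::from(v))
    }
}

impl From<&'static [u8]> for Payload {
    fn from(s: &'static [u8]) -> Self {
        Payload(Arc::from(s))
    }
}

/// Hypothetical mux with a generic entry point, mirroring `data: impl Into<Bytes>`.
#[derive(Default)]
struct Mux {
    sent: Vec<(u32, Payload)>,
}

impl Mux {
    fn udp_data(&mut self, sid: u32, data: impl Into<Payload>) {
        // The conversion happens once, at the API boundary.
        self.sent.push((sid, data.into()));
    }
}
```

Callers can pass `vec![1u8, 2, 3]`, `&b"ping"[..]`, or an existing Payload (via the blanket `impl<T> From<T> for T`) without changing their call sites.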
200 → 208 lib tests (+8 regression: encode_pending_* × 4, should_fire_* × 3, batch_accum_reindexes_after_flush).