therealaleph 786a9703c9 chore(release): v1.9.20 — fix Full-mode warm-up race (#924, #1029)
Bumps Cargo.toml v1.9.19 → v1.9.20 and ships the changelog. Headline
fix: the v1.9.15 Full-mode regression that's been tracking in #924 for
~3 weeks is resolved by @rezaisrad's PR #1029. Bisect-quality root
cause (h1 prewarm gated behind h2 handshake, both stall on cold start
under the same network conditions). Affected users can drop the
`force_http1: true` workaround now.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-11 00:37:38 +03:00


Fix Full mode regression since v1.9.15 (#924, a ~3-week tracking thread with 18+ duplicate reports, fixed by @rezaisrad in PR #1029). Symptom: `batch timed out after 30s` in Full mode while apps_script mode worked normally. The only available workaround was the `"force_http1": true` kill switch. A precise bisect traced this to `0e678630a` (PR #799, which added h2 multiplexing). The root cause is a single line-ordering issue: in v1.9.15, `warm()` placed the h1 prewarm loop behind `ensure_h2().await`, so when the h2 handshake was slow (up to 8s) the h1 pool stayed empty. If a request arrived in that window, the h1 fallback performed a cold TCP+TLS handshake that itself stalled and exceeded the 30s batch_timeout. Fix: h1 prewarm runs in parallel with the h2 handshake (v1.9.14 ordering restored), plus surrounding bounds via `H1_OPEN_TIMEOUT_SECS = 8` and the `H2Cell.dead` AtomicBool. 208 → 209 lib tests (+1 regression: `ensure_h2_rejects_dead_cell_within_ttl`). End-to-end verification: 5/5 cold restarts pass (9.6-22.5s), 5/5 concurrent SOCKS5 burst.

Fix Full mode regression since v1.9.15 (#924, PR #1029 by @rezaisrad). #924 was the canonical tracking thread for a cluster of 18+ duplicate reports spanning ~3 weeks; affected users saw `batch timed out after 30s` on every Full-mode request while apps_script mode kept working. The only available workaround was the `"force_http1": true` kill switch.

Root cause (rigorously bisected to `0e678630a`, i.e. PR #799, which added HTTP/2 multiplexing): PR #799 gated the h1 socket-pool prewarm behind `ensure_h2().await`. `ensure_h2()` is bounded by `H2_OPEN_TIMEOUT_SECS = 8` but can take the full 8s window on a cold first connection. During that window the h1 fallback pool was empty, so any request that arrived would:

  1. Get Err((Relay("h2 unavailable"), No)) immediately → fall back to h1
  2. Empty pool → cold open() → fresh TCP+TLS to connect_host:443
  3. Same network conditions that stalled h2 also stalled h1; cold open exceeded the 30s batch_timeout
  4. User saw `batch timed out after 30s`, with no obvious explanation given that apps_script mode still worked
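The sequence above comes down to when the h1 pool first holds a warm socket relative to when a request arrives. The following is a minimal timing model of that, not code from `domain_fronter.rs`: `WarmOrder`, `h1_pool_ready_at_ms`, and `hits_cold_open` are hypothetical names, and the durations are illustrative inputs rather than measurements.

```rust
/// Order in which a warm-up routine runs its two prewarm steps (hypothetical model).
#[derive(Clone, Copy, PartialEq, Debug)]
enum WarmOrder {
    /// h1 prewarm starts only after the h2 handshake has settled (the gated ordering).
    H2ThenH1,
    /// h1 prewarm runs alongside the h2 handshake.
    Parallel,
}

/// Milliseconds after startup at which the h1 pool first holds a warm socket.
fn h1_pool_ready_at_ms(order: WarmOrder, h2_handshake_ms: u64, h1_open_ms: u64) -> u64 {
    match order {
        // The h1 open cannot even begin until the h2 handshake finishes or times out.
        WarmOrder::H2ThenH1 => h2_handshake_ms + h1_open_ms,
        // The h1 open starts at t = 0, independent of h2.
        WarmOrder::Parallel => h1_open_ms,
    }
}

/// True if a request arriving at `arrival_ms` finds the h1 pool empty
/// and therefore has to pay for a cold TCP+TLS open itself.
fn hits_cold_open(
    order: WarmOrder,
    h2_handshake_ms: u64,
    h1_open_ms: u64,
    arrival_ms: u64,
) -> bool {
    arrival_ms < h1_pool_ready_at_ms(order, h2_handshake_ms, h1_open_ms)
}
```

With an 8000 ms h2 handshake, a 1500 ms h1 open, and a request arriving at 3000 ms, `hits_cold_open(WarmOrder::H2ThenH1, 8000, 1500, 3000)` is `true` while the `Parallel` variant returns `false`.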

Fix (two commits, domain_fronter.rs-only):

  1. warm h1 pool in parallel with h2: spawn h2 prewarm in a separate task so the h1 prewarm loop runs concurrently. The full set of n h1 sockets is warm before user traffic arrives, even when h2 stalls. `run_pool_refill` trims back to `POOL_MIN_H2_FALLBACK = 2` within 5s once h2 lands as the fast path.

  2. bound h1 open() + detect dead h2 cells synchronously: `H1_OPEN_TIMEOUT_SECS = 8` wraps the TCP+TLS handshake in `open()` so a stuck handshake doesn't block `acquire()` until the outer batch budget elapses. `H2Cell.dead: Arc<AtomicBool>` is flipped by the connection driver task when its await on the `Connection` ends, so known-dead cells are rejected within 5s instead of waiting for `H2_CONN_TTL_SECS = 540` to expire.
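The two ideas in this list, running the h2 prewarm off to the side and putting a deadline on each h1 open, can be sketched with the standard library alone. This is a simplified illustration under stated assumptions: the real code is async (`ensure_h2().await`), whereas this sketch uses OS threads and `thread::sleep` as stand-ins for handshakes, and `with_timeout` and this `warm` signature are hypothetical.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Runs `op` on a helper thread and stops waiting after `timeout`.
/// Returns None on timeout; the helper thread is left to finish on its own.
fn with_timeout<T, F>(timeout: Duration, op: F) -> Option<T>
where
    T: Send + 'static,
    F: FnOnce() -> T + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // The receiver may already be gone if we timed out; ignore that error.
        let _ = tx.send(op());
    });
    rx.recv_timeout(timeout).ok()
}

/// Warms `n` simulated h1 sockets while a simulated h2 handshake runs on its own thread.
/// Returns (number of h1 sockets warmed, whether h2 came up within the deadline).
fn warm(
    n: usize,
    h2_handshake: Duration,
    h1_open: Duration,
    open_timeout: Duration,
) -> (usize, bool) {
    // Spawn the h2 prewarm first so it does not gate the h1 loop below.
    let h2 = thread::spawn(move || {
        with_timeout(open_timeout, move || thread::sleep(h2_handshake)).is_some()
    });

    let mut warmed = 0;
    for _ in 0..n {
        // Each h1 open is individually bounded, so one stuck handshake costs
        // at most `open_timeout` rather than the caller's whole budget.
        if with_timeout(open_timeout, move || thread::sleep(h1_open)).is_some() {
            warmed += 1;
        }
    }
    (warmed, h2.join().unwrap_or(false))
}
```

Calling `warm(3, Duration::from_millis(400), Duration::from_millis(10), Duration::from_millis(200))` models a stalled h2 handshake: all three h1 sockets still warm up, and h2 is reported as unavailable once its deadline passes.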

API impact: h2_handshake_post_tls return type changes to (SendRequest, Arc<AtomicBool>). One existing test (h2_handshake_post_tls_returns_alpn_refused_when_peer_picks_h1) tweaks its Ok arm to match — no panic message change.
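A rough sketch of how a shared `Arc<AtomicBool>` lets a cached cell be rejected before its TTL runs out. Only the `dead` field and its type come from the notes above; the `created` field, the `new` constructor, and `is_usable` are assumptions made for illustration and may not match the actual `H2Cell`.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::time::{Duration, Instant};

/// Stand-in for a cached h2 connection entry; only the fields needed here.
struct H2Cell {
    created: Instant,
    dead: Arc<AtomicBool>,
}

impl H2Cell {
    /// Returns the cell plus a second handle to the flag, which a
    /// connection driver task would hold and set when the connection ends.
    fn new() -> (Self, Arc<AtomicBool>) {
        let dead = Arc::new(AtomicBool::new(false));
        let cell = H2Cell {
            created: Instant::now(),
            dead: Arc::clone(&dead),
        };
        (cell, dead)
    }

    /// Usable only if the driver has not flagged the cell dead
    /// and the TTL has not yet elapsed.
    fn is_usable(&self, ttl: Duration) -> bool {
        !self.dead.load(Ordering::Acquire) && self.created.elapsed() < ttl
    }
}
```

Once the driver side calls `flag.store(true, Ordering::Release)`, the next `is_usable` check returns `false` even though the 540s TTL is nowhere near expiring.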

208 → 209 lib tests (+1 regression: ensure_h2_rejects_dead_cell_within_ttl). Live end-to-end (per PR notes): 5/5 cold restarts pass in 9.6-22.5s, 5/5 concurrent SOCKS5 burst, default full.json baseline 200 OK in 13.3s.

Action for affected users: update to v1.9.20 and drop the `"force_http1": true` workaround from config.json if you had it set. Full mode should work reliably on cold restart again.