perf: skip H2 for full-tunnel batch requests (#1040)

Skip H2 for `tunnel_batch_request_to` (the Full-mode batch path). Tunnel batches already coalesce N ops into one HTTP request, so H2 stream multiplexing has nothing to multiplex.

The H2 try/fallback logic on this code path introduced three regressions vs v1.9.14:

1. **Long-poll stalls**: idle polls completed at 16-17s (`LONGPOLL_DEADLINE` + network latency) instead of timing out at 10s on H1. Each poll held an Apps Script execution slot at least 60% longer.
2. **Silent batch drops**: `RequestSent::Maybe` failures dropped the entire batch with no retry, a failure mode H1 doesn't have.
3. **Pool starvation**: `POOL_MIN_H2_FALLBACK = 2` trimmed the H1 pool from 8 → 2 once H2 connected, but tunnel batches still used H1 and needed the full pool.

H2 multiplexing is **kept active for relay mode** (non-full), where each browser request is a separate HTTP call that genuinely benefits from stream multiplexing. r0ar's controlled A/B test in #962 confirmed H2 is strictly better than `force_http1: true` for apps_script-mode users, and that path is unchanged here.

## Changes

- `tunnel_batch_request_to`: remove the H2 try/fallback/NonRetryable block and go straight to H1 pool `acquire()` (-54 lines).
- `run_pool_refill`: always maintain `POOL_MIN = 8`. Remove the `POOL_MIN_H2_FALLBACK = 2` trim that was starving tunnel batches (-12 lines).

## A/B results (Pixel 6 Pro, 30 batch samples each)

| Metric | H2 (stock v1.9.20) | H1 (this PR) | v1.9.14 (baseline) |
|---|---|---|---|
| 16-17s batches | **8-10/30** | **0/30** | **0/30** |
| 10s timeouts | 0 | 4/30 | 5/30 |
| Active RTTs | 1.4-2.4s | 1.3-2.2s | 1.4-2.3s |

Restores v1.9.14 tunnel performance while keeping all v1.9.15+ improvements (H2 for relay, zero-copy mux, block DoH/QUIC, TLS pool tuning, PR #1029's warm-race fix).
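
The fixed refill target described under Changes can be sketched as below. This is a minimal illustration, not the code in `run_pool_refill`: `PoolEntry`, `healthy_count`, `refill_deficit`, and the `POOL_TTL_SECS` value of 60 are assumptions made for the example. Only `POOL_MIN = 8` and the 20 s minimum remaining TTL come from this change.

```rust
use std::time::{Duration, Instant};

const POOL_MIN: usize = 8; // fixed target after this PR
const MIN_REMAINING_SECS: u64 = 20; // from the diff
const POOL_TTL_SECS: u64 = 60; // assumed value, for illustration only

// Hypothetical stand-in for a pooled h1 connection.
struct PoolEntry {
    created: Instant,
}

// Count entries that still have at least MIN_REMAINING_SECS of TTL left.
fn healthy_count(pool: &[PoolEntry], now: Instant) -> usize {
    pool.iter()
        .filter(|e| {
            let age = now.saturating_duration_since(e.created).as_secs();
            age + MIN_REMAINING_SECS <= POOL_TTL_SECS
        })
        .count()
}

// Connections still missing to reach POOL_MIN, independent of h2 state.
fn refill_deficit(pool: &[PoolEntry], now: Instant) -> usize {
    POOL_MIN.saturating_sub(healthy_count(pool, now))
}

fn main() {
    let base = Instant::now();
    let now = base + Duration::from_secs(100);
    let pool = vec![
        PoolEntry { created: now - Duration::from_secs(5) },  // 55 s left: counts
        PoolEntry { created: now - Duration::from_secs(50) }, // 10 s left: ignored
    ];
    println!("deficit = {}", refill_deficit(&pool, now)); // prints "deficit = 7"
}
```

The deficit is only the number of connections still missing. Per the doc comment on `run_pool_refill`, the loop opens replacements one at a time rather than in a burst.
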

## Verified locally on top of v1.9.20

- `cargo test --lib --release`: 209/209 ✅ (matches v1.9.20 baseline)
- `cargo build --release --features ui --bin mhrv-rs-ui`: clean ✅

## Interaction with PR #1029 (just shipped in v1.9.20)

PR #1029 added `H2Cell.dead: Arc<AtomicBool>` for synchronous dead-cell detection. With this PR removing the H2 path for tunnel batches, the dead-cell flag is no longer consulted on the tunnel batch path. That is intentional: the flag is now scoped to relay mode, which is the path it was protecting in practice.

Reviewed via Anthropic Claude.

Co-Authored-By: yyoyoian-pixel <noreply@github.com>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 21:24:48 +03:00 · 2026-05-11 01:41:59 +02:00
parent 9611279fbd
commit dd7b3553ec
1 changed files with 12 additions and 66 deletions
@@ -983,16 +983,13 @@ impl DomainFronter {
        }
    }

-    /// Background loop that keeps the h1 fallback pool warm.
+    /// Background loop that keeps the h1 pool warm.
    ///
-    /// Target depends on whether the h2 fast path is active:
-    /// - h2 disabled (or peer refused ALPN h2): keep `POOL_MIN` (8)
-    ///   sockets so the per-request acquire never pays a cold handshake
-    ///   — the pre-h2 default behavior.
-    /// - h2 active: keep just `POOL_MIN_H2_FALLBACK` (2). All real
-    ///   traffic rides the multiplexed h2 connection; the h1 pool only
-    ///   exists to cover the case where h2 dies and we need to fall
-    ///   back instantly without a cold handshake.
+    /// Always maintains `POOL_MIN` (8) connections. Full-tunnel mode
+    /// uses the h1 pool for all batch traffic (h2 is skipped for
+    /// tunnel batches), so the pool must stay at full capacity
+    /// regardless of h2 status. Relay mode also benefits from a warm
+    /// pool as h1 fallback.
    ///
    /// A connection only counts toward the minimum if it has at least
    /// 20 s of TTL remaining — nearly-expired entries don't help.
@@ -1000,7 +997,6 @@ impl DomainFronter {
    /// and opens replacements one at a time so there's no burst.
    pub async fn run_pool_refill(self: Arc<Self>) {
        const MIN_REMAINING_SECS: u64 = 20;
-        const POOL_MIN_H2_FALLBACK: usize = 2;
        loop {
            tokio::time::sleep(Duration::from_secs(POOL_REFILL_INTERVAL_SECS)).await;

@@ -1010,24 +1006,7 @@ impl DomainFronter {
                pool.retain(|e| e.created.elapsed().as_secs() < POOL_TTL_SECS);
            }

-            // Decide target. We treat "h2 active right now" as having a
-            // fresh, non-poisoned cell. h2_disabled is the sticky flag
-            // (peer never agreed to h2); a transient cell-poison after
-            // h2 success briefly drops back to the larger target until
-            // ensure_h2 reopens.
-            let target = if self.h2_disabled.load(Ordering::Relaxed) {
-                POOL_MIN
-            } else {
-                let cell = self.h2_cell.lock().await;
-                let h2_alive = cell
-                    .as_ref()
-                    .map(|c| {
-                        c.created.elapsed().as_secs() < H2_CONN_TTL_SECS
-                            && !c.dead.load(Ordering::Relaxed)
-                    })
-                    .unwrap_or(false);
-                if h2_alive { POOL_MIN_H2_FALLBACK } else { POOL_MIN }
-            };
+            let target = POOL_MIN;

            // Count only connections with enough life left.
            // Refill one at a time to avoid bursting TLS handshakes.
@@ -2876,44 +2855,11 @@ impl DomainFronter {

        let path = format!("/macros/s/{}/exec", script_id);

-        // h2 fast path. A batch carries N stateful tunnel ops — each
-        // `data`/`udp_data`/`connect` may have already executed
-        // upstream when the response framing failed. Replaying the
-        // whole batch on h1 risks duplicating every op in it. Only
-        // fall back when h2 definitely never sent. Honors
-        // user-configured batch_timeout so a slow but legitimate
-        // batch isn't cut off at an arbitrary fixed cap.
-        match self
-            .h2_relay_request(&path, payload.clone(), self.batch_timeout)
-            .await
-        {
-            Ok((status, _hdrs, _resp_body)) if is_h2_fronting_refusal_status(status) => {
-                // Edge rejected the batch before forwarding. Safe to
-                // fall back: no batched op reached Apps Script, so
-                // replaying via h1 won't double-fire any of them.
-                self.sticky_disable_h2_for_fronting_refusal(status, "tunnel batch")
-                    .await;
-                // fall through to h1
-            }
-            Ok((status, _hdrs, resp_body)) => {
-                return self.finalize_batch_response(script_id, status, resp_body);
-            }
-            Err((e, RequestSent::No)) => {
-                tracing::debug!(
-                    "h2 batch request pre-send failure: {} — falling back to h1",
-                    e
-                );
-            }
-            Err((e, RequestSent::Maybe)) => {
-                tracing::warn!(
-                    "h2 batch request post-send failure: {} — \
-                     not replaying on h1 to avoid duplicating batched ops",
-                    e
-                );
-                return Err(e);
-            }
-        }
-
+        // Skip h2 for tunnel batches. Batched ops are already coalesced
+        // into one HTTP request so h2 multiplexing adds no benefit.
+        // The h1 pool path is simpler and avoids h2-specific overhead
+        // (ready timeout, NonRetryable errors, concurrent stream
+        // contention with long-poll batches).
        let mut entry = self.acquire().await?;

        let req_head = format!(