chore(release): v1.9.24 — fix timeout cascade + Cloud Run docker build (#1088, #620)

Bumps v1.9.23 → v1.9.24. Two PRs from @dazzling-no-more:
- #1108 (#1088): batch header read honors request_timeout_secs.
  Closes the 10s inner timeout cliff that was cascade-killing tunnel
  sessions under slow Apps Script edges. +22 regression tests (231 total).
- #1117 (#620): cargo-chef Dockerfile so tunnel-node builds without
  BuildKit. Cloud Run's gcloud-deploy path now works.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
therealaleph
2026-05-13 13:58:16 +03:00
parent 4d135a4e2f
commit 1c9e73e4e7
3 changed files with 27 additions and 2 deletions
Generated
+1 -1
View File
@@ -2624,7 +2624,7 @@ dependencies = [
[[package]]
name = "mhrv-rs"
version = "1.9.23"
version = "1.9.24"
dependencies = [
"base64 0.22.1",
"bytes",
+1 -1
View File
@@ -1,6 +1,6 @@
[package]
name = "mhrv-rs"
version = "1.9.23"
version = "1.9.24"
edition = "2021"
description = "Rust port of MasterHttpRelayVPN -- DPI bypass via Google Apps Script relay with domain fronting"
license = "MIT"
+25
View File
@@ -0,0 +1,25 @@
<!-- see docs/changelog/v1.1.0.md for the file format: Persian, then `---`, then English. -->
**Fix Full mode timeout cascade — \`batch header read honors request_timeout_secs\`** ([#1088](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/1088), [PR #1108](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/pull/1108) by @dazzling-no-more). در Full mode، یک Apps Script edge کند، تمام تونل sessionهای hot-and-flowing رو cascade-kill می‌کرد. کاربرها روی v1.9.21+ مرتب 10s "batch timeout" می‌دیدن و download progress تلگرام/browser رو از دست می‌دادن. **Root cause**: \`read_http_response\` در \`domain_fronter.rs\` یک hardcoded 10s header-read timeout داشت که داخل \`tunnel_batch_request_to\` اجرا می‌شد — مستقل از و کوتاه‌تر از outer \`tokio::time::timeout(batch_timeout, ...)\` در \`fire_batch\`. Apps Script cold starts معمولاً 8-12s طول می‌کشن (PR #1040's A/B 4/30 H1 batches رو ثبت کرد که دقیقاً 10s timeout می‌شدن)، پس inner cliff به‌عنوان false-positive batch timeout قبل از اینکه \`request_timeout_secs\` (default 30s) trigger بشه fire می‌شد. **Fix**: (1) \`tunnel_batch_request_to\` حالا \`batch_timeout\` رو به header read pass می‌کنه via new \`read_http_response_with_header_timeout\` helper. (2) Header read یک absolute deadline استفاده می‌کنه (\`timeout_at\`) به جای per-read \`timeout()\` — slow drip-feed peer دیگه نمی‌تونه silently extend بزنه. (3) Bonus: \`TunnelMux::reply_timeout\` با \`batch_timeout\` co-vary می‌کنه (\`batch_timeout + 5s slack\`). ۲۰۹ → **۲۳۱ lib test** (+22 regression).
**Docker: cargo-chef برای build بدون BuildKit** ([#620](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/620), [PR #1117](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/pull/1117) by @dazzling-no-more). \`tunnel-node/Dockerfile\` از BuildKit-only \`RUN --mount=type=cache\` استفاده می‌کرد که روی Cloud Run's \`gcloud run deploy --source .\` path شکست می‌خورد (underlying \`gcr.io/cloud-builders/docker\` builder BuildKit رو enable نمی‌کنه). cargo-chef pattern: \`recipe.json\` planner stage + \`cargo chef cook\` deps stage + final build with \`src/\` on top. Docker's regular layer cache حالا dependency reuse رو handle می‌کنه — warm rebuilds تنها \`src/\` رو compile می‌کنن. Base bump \`rust:1.85-slim\` → \`rust:1.90-slim\` (cargo-chef نیاز به rustc 1.86+ داره).
---
**Fix Full mode timeout cascade — `batch header read honors request_timeout_secs`** ([#1088](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/1088), [PR #1108](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/pull/1108) by @dazzling-no-more). Under Full mode, a single slow Apps Script edge cascade-killed every in-flight tunnel session sharing its batch. Users on v1.9.21+ saw frequent 10s "batch timeout" errors and lost download progress on Telegram / browser sessions.
**Root cause**: `read_http_response` in `domain_fronter.rs` had a **hardcoded 10s header-read timeout** that ran *inside* `tunnel_batch_request_to` — independent of and shorter than the outer `tokio::time::timeout(batch_timeout, …)` in `fire_batch`. Apps Script cold starts routinely land in the 8-12s range (PR #1040's A/B recorded 4/30 H1 batches timing out at exactly 10s after the H2→H1 switch), so the inner cliff fired as a false-positive batch timeout well before `request_timeout_secs` (default 30s) could.
**Fix** (in `domain_fronter.rs` + `tunnel_client.rs`):
1. `tunnel_batch_request_to` passes `batch_timeout` to the header read via new `read_http_response_with_header_timeout` helper. `Config::request_timeout_secs` is now the only knob controlling how long we wait for an Apps Script edge to start responding. Other callers (relay path, exit-node) keep the historical 10s value.
2. Header read uses a single **absolute deadline** (`timeout_at`) instead of per-read `timeout()`. Total elapsed across all header reads is bounded regardless of read cadence — a slow drip-feed peer can no longer silently extend.
3. **`TunnelMux::reply_timeout`** co-varies with `batch_timeout` (computed at construction as `fronter.batch_timeout() + 5s slack` instead of fixed 35s const). Operators raising `request_timeout_secs` no longer have sessions abandon `reply_rx` just before `fire_batch`'s HTTP round-trip would complete.
209 → **231 lib tests** (+22 regression covering the deadline/co-variance behavior).
**Docker: cargo-chef so tunnel-node builds without BuildKit** ([#620](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/issues/620), [PR #1117](https://github.com/therealaleph/MasterHttpRelayVPN-RUST/pull/1117) by @dazzling-no-more). `tunnel-node/Dockerfile` used BuildKit-only `RUN --mount=type=cache` directives, breaking on Cloud Run's `gcloud run deploy --source .` path (the underlying `gcr.io/cloud-builders/docker` builder doesn't enable BuildKit, and `--set-build-env-vars DOCKER_BUILDKIT=1` doesn't flip it on either).
Reworked to use **cargo-chef**: a dedicated planner stage emits `recipe.json` for dependency metadata, a `cargo chef cook` stage builds just the deps in their own Docker layer, the final build stage adds `src/` on top. Docker's regular layer cache handles dependency reuse — warm rebuilds where only `src/` changes still skip the slow crate compile.
Base bump `rust:1.85-slim``rust:1.90-slim` (cargo-chef's transitive deps require rustc 1.86+; tunnel-node's `Cargo.toml` has no `rust-version` pin so the bump is internal-only).
**Action for Cloud Run users blocked on #620**: pull v1.9.24 of the tunnel-node Docker image (`ghcr.io/therealaleph/mhrv-tunnel-node:v1.9.24` or `:latest`) — your `gcloud run deploy --source .` should now succeed without BuildKit.
**Followup**: issue #1131 (BuffOvrFlw) reports `h1 open timed out after 8s` — that's the `H1_OPEN_TIMEOUT_SECS = 8` from PR #1029 firing on `open()` (TCP+TLS handshake), separate from the header-read timeout this release fixes. Worth a follow-up PR to make `H1_OPEN_TIMEOUT_SECS` parameterized via `request_timeout_secs` too.