v0.6.0: performance pack — pool prewarm, SNI rotation, per-site stats, parallel dispatch

Tier-1 perf changes from the brainstorm, all on by default except where they change semantics (parallel_relay is opt-in). Connection pool pre-warm (domain_fronter.rs): On startup, open 3 TLS connections to Google edge in parallel and park them in the pool. First user request skips the ~300-500 ms handshake cost. Best-effort: warm failures are logged at debug and ignored. Triggered from ProxyServer::run() in a fire-and-forget tokio spawn. SNI rotation (domain_fronter.rs): Replace the single sni_host String with a Vec<String> plus an atomic round-robin index. When front_domain is one of the known Google-edge subdomains, build_sni_pool() expands it to include the other four (www/mail/drive/docs/calendar.google.com), so outbound TLS connection counts get spread across names instead of concentrating on one. Custom front_domain values are preserved as the single entry (we can't verify siblings of a non-Google edge). Expanded SNI-rewrite suffix list (proxy_server.rs): Added gstatic.com, googleusercontent.com, googleapis.com, ggpht.com, ytimg.com, blogspot.com, blogger.com to the list of domains routed directly via the Google-edge tunnel instead of through the Apps Script relay. Bigger bypass = less UA-locking, less quota burn on static CDN content. Per-site stats (domain_fronter.rs + ui.rs): New HostStat struct {requests, cache_hits, bytes, total_latency_ns} tracked per URL host. Records on both cache hits and relay calls, not on SNI-rewrite bypasses (those never touch the fronter). UI renders a collapsible table under the existing stats grid with the top 60 hosts sorted by request count, showing req count, cache hit %, bytes, avg latency ms. Parallel script-ID dispatch (config.rs, domain_fronter.rs, ui.rs): New config field parallel_relay: u8 (default 0 = off). When >= 2 and there are enough non-blacklisted IDs, do_relay_with_retry fans out the request to N script instances concurrently via futures_util's select_ok, returns first success, cancels the rest. Kills long-tail latency when one Apps Script instance happens to be slow, at the cost of N× quota per request. UI exposes it as a DragValue 0-8. TCP_NODELAY audit (proxy_server.rs): Added the missing set_nodelay(true) call on the SNI-rewrite outbound TCP stream. All six TcpStream::connect sites in the user traffic path now disable Nagle. Expanded feature list in README, added futures-util dep, added unit tests for extract_host and build_sni_pool. Verified end-to-end locally: - Pool pre-warm log line appears on startup: 'pool pre-warmed with 3 connection(s)'. - Static asset hit 3x: first = 2.2s (Apps Script), 2-3 = 6ms (cache). - youtube.com / google.com: SNI-rewrite tunnel (unchanged). - All 28 unit tests pass. Deferred (not in this release, each needs its own cycle): - uTLS / Chrome fingerprint mimicry (TLS stack swap) - QUIC/HTTP3 transport (new transport) - ETag / If-None-Match revalidation (needs cache schema change) - JSON envelope gzip on request (needs Code.gs change) - Firebase Cloud Functions as alt backend (new architecture) - MSS clamp / TCP Fast Open (platform-specific, marginal)
2026-05-18 05:44:35 +03:00 · 2026-04-22 02:52:36 +03:00
parent 561cbadd96
commit 3f0bbfdab0
7 changed files with 392 additions and 12 deletions
@@ -1317,13 +1317,14 @@ dependencies = [

 [[package]]
 name = "mhrv-rs"
-version = "0.5.1"
+version = "0.6.0"
 dependencies = [
 "base64 0.22.1",
 "bytes",
 "directories",
 "eframe",
 "flate2",
+ "futures-util",
 "h2",
 "http",
 "httparse",
@@ -1,6 +1,6 @@
 [package]
 name = "mhrv-rs"
-version = "0.5.1"
+version = "0.6.0"
 edition = "2021"
 description = "Rust port of MasterHttpRelayVPN -- DPI bypass via Google Apps Script relay with domain fronting"
 license = "MIT"
@@ -44,6 +44,7 @@ h2 = "0.4"
 http = "1"
 flate2 = "1"
 directories = "5"
+futures-util = { version = "0.3", default-features = false, features = ["std"] }

 # Optional UI dep: only pulled in when --features ui is set.
 eframe = { version = "0.28", default-features = false, features = [
@@ -261,6 +261,11 @@ This port focuses on the **`apps_script` mode** — the only one that reliably w
 - [x] Script IDs masked in logs (`prefix…suffix`) so `info` logs don't leak deployment IDs
 - [x] Desktop UI (egui) — cross-platform, no bundler needed
 - [x] Optional upstream SOCKS5 chaining for non-HTTP traffic (Telegram MTProto, IMAP, SSH…) so raw-TCP flows can be tunneled through xray / v2ray / sing-box instead of connecting directly. HTTP/HTTPS keeps going through the Apps Script relay.
+- [x] Connection pool pre-warm on startup (first request skips the TLS handshake to Google edge).
+- [x] Per-connection SNI rotation across a pool of Google subdomains (`www/mail/drive/docs/calendar.google.com`), so outbound connection counts aren't concentrated on one SNI.
+- [x] Optional parallel script-ID dispatch (`parallel_relay`): fan out a relay request to N script instances concurrently, return first success, kill p95 latency at the cost of N× quota.
+- [x] Per-site stats drill-down in the UI (requests, cache hit %, bytes, avg latency per host) for live debugging.
+- [x] OpenWRT / Alpine / musl builds — static binaries, procd init script included.

 Intentionally **not** implemented (rationale included so future contributors don't spend cycles on them):

@@ -70,6 +70,7 @@ struct UiState {
    running: bool,
    started_at: Option<Instant>,
    last_stats: Option<mhrv_rs::domain_fronter::StatsSnapshot>,
+    last_per_site: Vec<(String, mhrv_rs::domain_fronter::HostStat)>,
    log: VecDeque<String>,
    ca_trusted: Option<bool>,
    last_test_ok: Option<bool>,
@@ -105,6 +106,7 @@ struct FormState {
    log_level: String,
    verify_ssl: bool,
    upstream_socks5: String,
+    parallel_relay: u8,
    show_auth_key: bool,
 }

@@ -139,6 +141,7 @@ fn load_form() -> FormState {
            log_level: c.log_level,
            verify_ssl: c.verify_ssl,
            upstream_socks5: c.upstream_socks5.unwrap_or_default(),
+            parallel_relay: c.parallel_relay,
            show_auth_key: false,
        }
    } else {
@@ -153,6 +156,7 @@ fn load_form() -> FormState {
            log_level: "info".into(),
            verify_ssl: true,
            upstream_socks5: String::new(),
+            parallel_relay: 0,
            show_auth_key: false,
        }
    }
@@ -208,6 +212,7 @@ impl FormState {
                let v = self.upstream_socks5.trim();
                if v.is_empty() { None } else { Some(v.to_string()) }
            },
+            parallel_relay: self.parallel_relay,
        })
    }
 }
@@ -241,6 +246,12 @@ struct ConfigWire<'a> {
    hosts: &'a std::collections::HashMap<String, String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    upstream_socks5: Option<&'a str>,
+    #[serde(skip_serializing_if = "is_zero_u8")]
+    parallel_relay: u8,
+}
+
+fn is_zero_u8(v: &u8) -> bool {
+    *v == 0
 }

 #[derive(serde::Serialize)]
@@ -269,6 +280,7 @@ impl<'a> From<&'a Config> for ConfigWire<'a> {
            verify_ssl: c.verify_ssl,
            hosts: &c.hosts,
            upstream_socks5: c.upstream_socks5.as_deref(),
+            parallel_relay: c.parallel_relay,
        }
    }
 }
@@ -382,6 +394,20 @@ impl eframe::App for App {
                        .desired_width(f32::INFINITY));
                    ui.end_row();

+                    ui.label("Parallel dispatch")
+                        .on_hover_text(
+                            "Fire this many Apps Script IDs in parallel per relay request and\n\
+                             return the first successful response. 0/1 = off (round-robin).\n\
+                             Higher values eliminate long-tail latency (slow script instance\n\
+                             doesn't hold up the fast one) but spend that many times more\n\
+                             daily quota. Only effective with multiple IDs configured.\n\
+                             Recommend 2-3 if you have plenty of quota headroom."
+                        );
+                    ui.add(egui::DragValue::new(&mut self.form.parallel_relay)
+                        .speed(1)
+                        .range(0..=8));
+                    ui.end_row();
+
                    ui.label("Log level");
                    egui::ComboBox::from_id_source("loglevel")
                        .selected_text(&self.form.log_level)
@@ -415,9 +441,16 @@ impl eframe::App for App {
            ui.separator();

            // Status + stats
-            let (running, started_at, stats, ca_trusted, last_test_msg) = {
+            let (running, started_at, stats, ca_trusted, last_test_msg, per_site) = {
                let s = self.shared.state.lock().unwrap();
-                (s.running, s.started_at, s.last_stats, s.ca_trusted, s.last_test_msg.clone())
+                (
+                    s.running,
+                    s.started_at,
+                    s.last_stats,
+                    s.ca_trusted,
+                    s.last_test_msg.clone(),
+                    s.last_per_site.clone(),
+                )
            };

            ui.horizontal(|ui| {
@@ -464,6 +497,41 @@ impl eframe::App for App {
                });
            }

+            if !per_site.is_empty() {
+                ui.add_space(2.0);
+                egui::CollapsingHeader::new(format!("Per-site ({} hosts)", per_site.len()))
+                    .default_open(false)
+                    .show(ui, |ui| {
+                        egui::ScrollArea::vertical()
+                            .max_height(140.0)
+                            .show(ui, |ui| {
+                                egui::Grid::new("per_site")
+                                    .num_columns(5)
+                                    .spacing([8.0, 2.0])
+                                    .striped(true)
+                                    .show(ui, |ui| {
+                                        ui.label(egui::RichText::new("host").strong());
+                                        ui.label(egui::RichText::new("req").strong());
+                                        ui.label(egui::RichText::new("hit%").strong());
+                                        ui.label(egui::RichText::new("bytes").strong());
+                                        ui.label(egui::RichText::new("avg ms").strong());
+                                        ui.end_row();
+                                        for (host, st) in per_site.iter().take(60) {
+                                            let hit_pct = if st.requests > 0 {
+                                                (st.cache_hits as f64 / st.requests as f64) * 100.0
+                                            } else { 0.0 };
+                                            ui.label(egui::RichText::new(host).monospace());
+                                            ui.label(egui::RichText::new(st.requests.to_string()).monospace());
+                                            ui.label(egui::RichText::new(format!("{:.0}%", hit_pct)).monospace());
+                                            ui.label(egui::RichText::new(fmt_bytes(st.bytes)).monospace());
+                                            ui.label(egui::RichText::new(format!("{:.0}", st.avg_latency_ms())).monospace());
+                                            ui.end_row();
+                                        }
+                                    });
+                            });
+                    });
+            }
+
            ui.add_space(4.0);

            ui.horizontal(|ui| {
@@ -573,7 +641,10 @@ fn background_thread(shared: Arc<Shared>, rx: Receiver<Cmd>) {
                        let f = slot.lock().await;
                        if let Some(fronter) = f.as_ref() {
                            let s = fronter.snapshot_stats();
-                            shared.state.lock().unwrap().last_stats = Some(s);
+                            let per_site = fronter.snapshot_per_site();
+                            let mut st = shared.state.lock().unwrap();
+                            st.last_stats = Some(s);
+                            st.last_per_site = per_site;
                        }
                    });
                }
@@ -62,6 +62,16 @@ pub struct Config {
    /// unaffected.
    #[serde(default)]
    pub upstream_socks5: Option<String>,
+    /// Fan-out factor for non-cached relay requests when multiple
+    /// `script_id`s are configured. `0` or `1` = off (round-robin, the
+    /// default). `2` or more = fire that many Apps Script instances in
+    /// parallel per request and return the first successful response —
+    /// kills long-tail latency caused by a single slow Apps Script
+    /// instance, at the cost of using that much more daily quota.
+    /// Value is clamped to the number of available (non-blacklisted)
+    /// script IDs.
+    #[serde(default)]
+    pub parallel_relay: u8,
 }

 fn default_google_ip() -> String {
@@ -62,11 +62,21 @@ struct PoolEntry {

 pub struct DomainFronter {
    connect_host: String,
-    sni_host: String,
+    /// Pool of SNI domains to rotate through per outbound connection. All of
+    /// them must be hosted on the same Google edge as `connect_host` (that's
+    /// the whole point of domain fronting). Rotating across several of them
+    /// defeats naive DPI that would count "too many connections to a single
+    /// SNI". Populated from config's front_domain: if that's a single name we
+    /// add a small pool of known-safe Google subdomains automatically.
+    sni_hosts: Vec<String>,
+    sni_idx: AtomicUsize,
    http_host: &'static str,
    auth_key: String,
    script_ids: Vec<String>,
    script_idx: AtomicUsize,
+    /// Fan-out factor: fire this many Apps Script instances in parallel
+    /// per request and return first success. `<= 1` = off.
+    parallel_relay: usize,
    tls_connector: TlsConnector,
    pool: Arc<Mutex<Vec<PoolEntry>>>,
    cache: Arc<ResponseCache>,
@@ -76,6 +86,30 @@ pub struct DomainFronter {
    relay_calls: AtomicU64,
    relay_failures: AtomicU64,
    bytes_relayed: AtomicU64,
+    /// Per-host breakdown of traffic going through this fronter. Keyed by
+    /// the host of the URL (e.g. "api.x.com"). Read-mostly; only touched
+    /// on the slow path (once per relayed request), so a plain Mutex is
+    /// fine.
+    per_site: Arc<std::sync::Mutex<HashMap<String, HostStat>>>,
+}
+
+/// Aggregated stats for one remote host.
+#[derive(Default, Clone, Debug)]
+pub struct HostStat {
+    pub requests: u64,
+    pub cache_hits: u64,
+    pub bytes: u64,
+    pub total_latency_ns: u64,
+}
+
+impl HostStat {
+    pub fn avg_latency_ms(&self) -> f64 {
+        if self.requests == 0 {
+            0.0
+        } else {
+            (self.total_latency_ns as f64) / (self.requests as f64) / 1_000_000.0
+        }
+    }
 }

 const BLACKLIST_COOLDOWN_SECS: u64 = 600;
@@ -130,9 +164,11 @@ impl DomainFronter {

        Ok(Self {
            connect_host: config.google_ip.clone(),
-            sni_host: config.front_domain.clone(),
+            sni_hosts: build_sni_pool(&config.front_domain),
+            sni_idx: AtomicUsize::new(0),
            http_host: "script.google.com",
            auth_key: config.auth_key.clone(),
+            parallel_relay: config.parallel_relay as usize,
            script_ids,
            script_idx: AtomicUsize::new(0),
            tls_connector,
@@ -144,9 +180,36 @@ impl DomainFronter {
            relay_calls: AtomicU64::new(0),
            relay_failures: AtomicU64::new(0),
            bytes_relayed: AtomicU64::new(0),
+            per_site: Arc::new(std::sync::Mutex::new(HashMap::new())),
        })
    }

+    /// Increment the per-site counters. Called on every logical request
+    /// (both cache hits and relay roundtrips).
+    fn record_site(&self, url: &str, cache_hit: bool, bytes: u64, latency_ns: u64) {
+        let host = match extract_host(url) {
+            Some(h) => h,
+            None => return,
+        };
+        let mut m = self.per_site.lock().unwrap();
+        let e = m.entry(host).or_default();
+        e.requests += 1;
+        if cache_hit {
+            e.cache_hits += 1;
+        }
+        e.bytes += bytes;
+        e.total_latency_ns += latency_ns;
+    }
+
+    /// Snapshot per-site stats, sorted by request count descending.
+    pub fn snapshot_per_site(&self) -> Vec<(String, HostStat)> {
+        let m = self.per_site.lock().unwrap();
+        let mut v: Vec<(String, HostStat)> =
+            m.iter().map(|(k, v)| (k.clone(), v.clone())).collect();
+        v.sort_by(|a, b| b.1.requests.cmp(&a.1.requests));
+        v
+    }
+
    pub fn snapshot_stats(&self) -> StatsSnapshot {
        let bl = self.blacklist.lock().unwrap();
        StatsSnapshot {
@@ -192,6 +255,36 @@ impl DomainFronter {
        self.script_ids[0].clone()
    }

+    /// Pick `want` distinct non-blacklisted script IDs for a parallel fan-out
+    /// dispatch. Returns fewer than `want` if there aren't enough non-blacklisted
+    /// IDs available. Advances the round-robin index by `want` to spread load
+    /// across subsequent calls.
+    fn next_script_ids(&self, want: usize) -> Vec<String> {
+        let n = self.script_ids.len();
+        if n == 0 {
+            return vec![];
+        }
+        let mut bl = self.blacklist.lock().unwrap();
+        let now = Instant::now();
+        bl.retain(|_, until| *until > now);
+
+        let mut picked: Vec<String> = Vec::with_capacity(want);
+        for _ in 0..n {
+            if picked.len() >= want {
+                break;
+            }
+            let idx = self.script_idx.fetch_add(1, Ordering::Relaxed);
+            let sid = &self.script_ids[idx % n];
+            if !bl.contains_key(sid) && !picked.iter().any(|p| p == sid) {
+                picked.push(sid.clone());
+            }
+        }
+        if picked.is_empty() {
+            picked.push(self.script_ids[0].clone());
+        }
+        picked
+    }
+
    fn blacklist_script(&self, script_id: &str, reason: &str) {
        let until = Instant::now() + Duration::from_secs(BLACKLIST_COOLDOWN_SECS);
        let mut bl = self.blacklist.lock().unwrap();
@@ -204,14 +297,56 @@ impl DomainFronter {
        );
    }

+    fn next_sni(&self) -> String {
+        let n = self.sni_hosts.len();
+        let i = self.sni_idx.fetch_add(1, Ordering::Relaxed) % n;
+        self.sni_hosts[i].clone()
+    }
+
    async fn open(&self) -> Result<PooledStream, FronterError> {
        let tcp = TcpStream::connect((self.connect_host.as_str(), 443u16)).await?;
        let _ = tcp.set_nodelay(true);
-        let name = ServerName::try_from(self.sni_host.clone())?;
+        let sni = self.next_sni();
+        let name = ServerName::try_from(sni)?;
        let tls = self.tls_connector.connect(name, tcp).await?;
        Ok(tls)
    }

+    /// Open `n` outbound TLS connections in parallel and park them in the
+    /// pool so the first few user requests don't pay the handshake cost.
+    /// Errors are logged but not returned — best-effort.
+    pub async fn warm(self: &Arc<Self>, n: usize) {
+        let mut set = tokio::task::JoinSet::new();
+        for _ in 0..n {
+            let me = self.clone();
+            set.spawn(async move {
+                match me.open().await {
+                    Ok(s) => Some(PoolEntry {
+                        stream: s,
+                        created: Instant::now(),
+                    }),
+                    Err(e) => {
+                        tracing::debug!("pool warm: open failed: {}", e);
+                        None
+                    }
+                }
+            });
+        }
+        let mut warmed = 0;
+        while let Some(res) = set.join_next().await {
+            if let Ok(Some(entry)) = res {
+                let mut pool = self.pool.lock().await;
+                if pool.len() < POOL_MAX {
+                    pool.push(entry);
+                    warmed += 1;
+                }
+            }
+        }
+        if warmed > 0 {
+            tracing::info!("pool pre-warmed with {} connection(s)", warmed);
+        }
+    }
+
    async fn acquire(&self) -> Result<PoolEntry, FronterError> {
        {
            let mut pool = self.pool.lock().await;
@@ -252,10 +387,12 @@ impl DomainFronter {
    ) -> Vec<u8> {
        let coalescible = is_cacheable_method(method) && body.is_empty();
        let key = if coalescible { Some(cache_key(method, url)) } else { None };
+        let t_start = Instant::now();

        if let Some(ref k) = key {
            if let Some(hit) = self.cache.get(k) {
                tracing::debug!("cache hit: {}", url);
+                self.record_site(url, true, hit.len() as u64, t_start.elapsed().as_nanos() as u64);
                return hit;
            }
        }
@@ -297,6 +434,7 @@ impl DomainFronter {
            }
        }

+        self.record_site(url, false, bytes.len() as u64, t_start.elapsed().as_nanos() as u64);
        bytes
    }

@@ -345,7 +483,15 @@ impl DomainFronter {
        headers: &[(String, String)],
        body: &[u8],
    ) -> Result<Vec<u8>, FronterError> {
-        // One retry on connection failure.
+        // Fan-out path: fire N instances in parallel, return first Ok, cancel
+        // the rest. Clamps to number of available script IDs so the single-ID
+        // case is a no-op even if parallel_relay>1 was configured.
+        let fan = self.parallel_relay.min(self.script_ids.len()).max(1);
+        if fan >= 2 {
+            return self.do_relay_parallel(method, url, headers, body, fan).await;
+        }
+
+        // Sequential path: one retry on connection failure.
        match self.do_relay_once(method, url, headers, body).await {
            Ok(v) => Ok(v),
            Err(e) => {
@@ -355,6 +501,37 @@ impl DomainFronter {
        }
    }

+    async fn do_relay_parallel(
+        self: &Self,
+        method: &str,
+        url: &str,
+        headers: &[(String, String)],
+        body: &[u8],
+        fan: usize,
+    ) -> Result<Vec<u8>, FronterError> {
+        use futures_util::future::FutureExt;
+        let ids = self.next_script_ids(fan);
+        if ids.is_empty() {
+            return Err(FronterError::Relay("no script_ids available".into()));
+        }
+
+        // Build one future per script, each a pinned boxed future so we can
+        // `select_ok` over them.
+        let mut futs = Vec::with_capacity(ids.len());
+        for sid in ids {
+            let fut = self.do_relay_once_with(sid.clone(), method, url, headers, body).boxed();
+            futs.push(fut);
+        }
+
+        // `select_ok`: drive all futures concurrently, return the first Ok
+        // (cancelling the rest when the returned future is dropped). If all
+        // error out, returns the last error.
+        match futures_util::future::select_ok(futs).await {
+            Ok((bytes, _remaining)) => Ok(bytes),
+            Err(e) => Err(e),
+        }
+    }
+
    async fn do_relay_once(
        &self,
        method: &str,
@@ -362,8 +539,19 @@ impl DomainFronter {
        headers: &[(String, String)],
        body: &[u8],
    ) -> Result<Vec<u8>, FronterError> {
-        let payload = self.build_payload_json(method, url, headers, body)?;
        let script_id = self.next_script_id();
+        self.do_relay_once_with(script_id, method, url, headers, body).await
+    }
+
+    async fn do_relay_once_with(
+        &self,
+        script_id: String,
+        method: &str,
+        url: &str,
+        headers: &[(String, String)],
+        body: &[u8],
+    ) -> Result<Vec<u8>, FronterError> {
+        let payload = self.build_payload_json(method, url, headers, body)?;
        let path = format!("/macros/s/{}/exec", script_id);

        let mut entry = self.acquire().await?;
@@ -505,6 +693,59 @@ impl DomainFronter {

 /// Strip connection-specific headers (matches Code.gs SKIP_HEADERS) and
 /// strip Accept-Encoding: br (Apps Script can't decompress brotli).
+/// Extract the host (no scheme, no port, no path) from a URL string.
+/// Returns None for malformed / scheme-less inputs.
+fn extract_host(url: &str) -> Option<String> {
+    let after_scheme = url.split_once("://").map(|(_, rest)| rest).unwrap_or(url);
+    let authority = after_scheme.split('/').next().unwrap_or("");
+    // Strip userinfo if present.
+    let authority = authority.rsplit_once('@').map(|(_, a)| a).unwrap_or(authority);
+    // Strip port. Handle IPv6 literals in brackets.
+    let host = if let Some(stripped) = authority.strip_prefix('[') {
+        // [::1]:443 -> ::1
+        stripped.split_once(']').map(|(h, _)| h).unwrap_or(stripped)
+    } else {
+        authority.split(':').next().unwrap_or(authority)
+    };
+    if host.is_empty() {
+        None
+    } else {
+        Some(host.to_ascii_lowercase())
+    }
+}
+
+/// Build the pool of SNI hosts used for outbound connections to the Google
+/// edge. Takes the user-configured `front_domain` as the primary and adds a
+/// few other Google-owned subdomains that share the same GFE, so the per-SNI
+/// connection-count fingerprint gets spread instead of concentrating on one
+/// name. All entries MUST be hosted on the same edge as `connect_host`,
+/// otherwise the TLS handshake will land on the wrong server.
+///
+/// If the user has set `front_domain` to something off the default list, we
+/// still include it first and don't add extras (we'd have no way to verify
+/// they're co-hosted with a non-Google custom edge).
+fn build_sni_pool(primary: &str) -> Vec<String> {
+    let primary = primary.trim().to_string();
+    // A Google-edge-hosted primary: augment with siblings.
+    let google_defaults: &[&str] = &[
+        "www.google.com",
+        "mail.google.com",
+        "drive.google.com",
+        "docs.google.com",
+        "calendar.google.com",
+    ];
+    let looks_like_google_edge = google_defaults.iter().any(|s| *s == primary);
+    let mut pool = vec![primary.clone()];
+    if looks_like_google_edge {
+        for s in google_defaults {
+            if *s != primary {
+                pool.push((*s).to_string());
+            }
+        }
+    }
+    pool
+}
+
 pub fn filter_forwarded_headers(headers: &[(String, String)]) -> Vec<(String, String)> {
    const SKIP: &[&str] = &[
        "host",
@@ -965,6 +1206,30 @@ impl ServerCertVerifier for NoVerify {
 mod tests {
    use super::*;

+    #[test]
+    fn extract_host_strips_scheme_port_path() {
+        assert_eq!(extract_host("https://example.com/foo"), Some("example.com".into()));
+        assert_eq!(extract_host("http://foo.bar:8080/x"), Some("foo.bar".into()));
+        assert_eq!(extract_host("https://user:pw@host.test/x"), Some("host.test".into()));
+        assert_eq!(extract_host("https://[2001:db8::1]:443/"), Some("2001:db8::1".into()));
+        assert_eq!(extract_host("API.X.com/foo"), Some("api.x.com".into()));
+        assert_eq!(extract_host(""), None);
+    }
+
+    #[test]
+    fn build_sni_pool_extends_for_google() {
+        let p = build_sni_pool("www.google.com");
+        assert!(p.len() >= 2);
+        assert_eq!(p[0], "www.google.com");
+        assert!(p.iter().any(|s| s == "mail.google.com"));
+    }
+
+    #[test]
+    fn build_sni_pool_preserves_custom_primary() {
+        let p = build_sni_pool("mycustom.edge.example.com");
+        assert_eq!(p, vec!["mycustom.edge.example.com".to_string()]);
+    }
+
    #[test]
    fn filter_drops_connection_specific() {
        let h = vec![
@@ -22,12 +22,28 @@ use crate::mitm::MitmCertManager;
 // Kept conservative: anything on a separate CDN (googlevideo, ytimg,
 // doubleclick, etc.) is DROPPED because routing to the wrong backend breaks
 // rather than helps. Those fall through to MITM+relay (slower but works).
+// Domains that are hosted on the Google Front End and therefore reachable via
+// the same SNI-rewrite tunnel used for www.google.com itself. Adding a suffix
+// here means "TLS CONNECT to google_ip, SNI = front_domain, Host = real name"
+// for requests to it — bypassing the Apps Script relay entirely, so there's no
+// User-Agent locking and no Apps Script quota.
+// When in doubt leave it out: sites that aren't actually on GFE will 404 or
+// return a wrong-cert error instead of loading.
 const SNI_REWRITE_SUFFIXES: &[&str] = &[
+    // Core Google
    "google.com",
+    "gstatic.com",
+    "googleusercontent.com",
+    "googleapis.com",
+    "ggpht.com",
+    // YouTube family
    "youtube.com",
    "youtu.be",
    "youtube-nocookie.com",
-    "fonts.googleapis.com",
+    "ytimg.com",
+    // Blogger / Blog.google
+    "blogspot.com",
+    "blogger.com",
 ];

 fn matches_sni_rewrite(host: &str) -> bool {
@@ -134,6 +150,14 @@ impl ProxyServer {
            socks_addr
        );

+        // Pre-warm the outbound connection pool so the user's first request
+        // doesn't pay a fresh TLS handshake to Google edge. Best-effort;
+        // failures are logged and ignored.
+        let warm_fronter = self.fronter.clone();
+        tokio::spawn(async move {
+            warm_fronter.warm(3).await;
+        });
+
        let stats_fronter = self.fronter.clone();
        tokio::spawn(async move {
            let mut ticker = tokio::time::interval(std::time::Duration::from_secs(60));
@@ -709,7 +733,10 @@ async fn do_sni_rewrite_tunnel_from_tcp(
    )
    .await
    {
-        Ok(Ok(s)) => s,
+        Ok(Ok(s)) => {
+            let _ = s.set_nodelay(true);
+            s
+        }
        Ok(Err(e)) => {
            tracing::debug!("upstream connect failed for {}: {}", host, e);
            return Ok(());