v0.9.4: actionable diagnostic when outbound TLS to Google edge fails (#18 follow-up)

@Behzad9 reports: after the EMFILE fix in v0.9.3 landed cleanly, the
relay now fails with a different error:

    ERROR Relay failed: io: invalid peer certificate: UnknownIssuer

repeated on every request. This is rustls (via domain_fronter) rejecting
the certificate presented by whatever answers our TLS connection to
google_ip. In practice this means one of three things, in decreasing
order of likelihood for an Iranian OpenWRT user:

  1. The ISP / a middlebox is intercepting outbound TLS to Google IPs
     and presenting its own cert. webpki-roots (Mozilla trust store,
     baked in) correctly rejects it.
  2. The user's google_ip setting points at a non-Google host.
  3. The router clock is wildly off (NTP not synced), so certs look
     not-yet-valid.
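
Cause 3 can be ruled out cheaply before blaming the network. The sketch
below is not part of this commit: `clock_looks_sane` and
`SANITY_FLOOR_SECS` are hypothetical names, and the floor (2026-01-01 UTC)
is an assumed value that only needs to be older than any certificate the
relay expects to see. A router without a battery-backed clock often boots
at or near the Unix epoch, which a check like this would catch.

```rust
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Assumed floor, not taken from the codebase: 2026-01-01T00:00:00Z
/// expressed as seconds since the Unix epoch.
const SANITY_FLOOR_SECS: u64 = 1_767_225_600;

/// Hypothetical helper: returns false when `now` is earlier than the
/// floor, which suggests NTP has not synced yet and certificates may be
/// rejected as not-yet-valid.
fn clock_looks_sane(now: SystemTime) -> bool {
    now >= UNIX_EPOCH + Duration::from_secs(SANITY_FLOOR_SECS)
}
```

Calling `clock_looks_sane(SystemTime::now())` at startup would be one way
to warn about a bad clock before the first relay attempt.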

Before this change: one identical ERROR per failed relay, no guidance.
Log filled with the same line.

Now:
  - New DomainFronter::log_relay_failure() detects cert-related error
    strings (UnknownIssuer, CertificateExpired, CertNotValidYet,
    NotValidForName, 'invalid peer certificate').
  - First occurrence logs an ERROR with the three root causes and three
    concrete fixes: run `mhrv-rs scan-ips` to find a working Google IP,
    check the system clock, or as a LAST RESORT set verify_ssl=false
    (with the explicit warning that traffic is then only protected by
    the Apps Script auth_key, not outer TLS).
  - Subsequent occurrences drop to debug so the log stays readable —
    an AtomicBool gate on the DomainFronter instance tracks whether
    the hint was shown. Resets on proxy restart.
  - Non-cert errors still log at error level unchanged.
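
The gating described in the list above can be isolated from logging and
tested on its own. This is a standalone sketch, not the code in this
commit: `HintGate`, `LogLevel` and `classify` are illustrative names, and
only the marker strings and the `AtomicBool::swap` idea come from the
change itself.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Substrings treated as certificate-validation failures (same list as
/// in the commit message).
const CERT_MARKERS: [&str; 5] = [
    "UnknownIssuer",
    "invalid peer certificate",
    "CertificateExpired",
    "CertNotValidYet",
    "NotValidForName",
];

/// Illustrative stand-in for the three logging outcomes.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum LogLevel {
    ErrorWithHint,
    Debug,
    Error,
}

/// Hypothetical wrapper around the one-shot flag.
struct HintGate {
    shown: AtomicBool,
}

impl HintGate {
    fn new() -> Self {
        Self { shown: AtomicBool::new(false) }
    }

    /// Decide how a failure message should be logged.
    fn classify(&self, msg: &str) -> LogLevel {
        let is_cert = CERT_MARKERS.iter().any(|m| msg.contains(*m));
        if !is_cert {
            // Non-cert errors never touch the flag.
            return LogLevel::Error;
        }
        // `swap` returns the previous value, so exactly one caller
        // observes `false`, even when several tasks fail concurrently.
        if self.shown.swap(true, Ordering::Relaxed) {
            LogLevel::Debug
        } else {
            LogLevel::ErrorWithHint
        }
    }
}
```

With a fresh `HintGate`, the first cert-related message classifies as
`ErrorWithHint` and every later one as `Debug`, while unrelated errors
always classify as `Error`. `Relaxed` ordering is enough here because the
flag guards no other data.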

49 tests pass, no code-path regressions (log line content changed, not
behavior). Shipping so that users who hit this get actionable output.
therealaleph
2026-04-22 22:25:52 +03:00
parent 0ad206f05e
commit a9ad697b6a
3 changed files with 44 additions and 3 deletions
Generated
+1 -1
@@ -1317,7 +1317,7 @@ dependencies = [
 [[package]]
 name = "mhrv-rs"
-version = "0.9.3"
+version = "0.9.4"
 dependencies = [
  "base64 0.22.1",
  "bytes",
+1 -1
@@ -1,6 +1,6 @@
 [package]
 name = "mhrv-rs"
-version = "0.9.3"
+version = "0.9.4"
 edition = "2021"
 description = "Rust port of MasterHttpRelayVPN -- DPI bypass via Google Apps Script relay with domain fronting"
 license = "MIT"
+42 -1
@@ -83,6 +83,9 @@ pub struct DomainFronter {
     /// response cache isn't busted by the constantly-changing `features`
     /// / `fieldToggles` params.
     normalize_x_graphql: bool,
+    /// Set once we've emitted the "UnknownIssuer means ISP MITM" hint,
+    /// so we don't spam it every time a cert-validation error repeats.
+    cert_hint_shown: std::sync::atomic::AtomicBool,
     tls_connector: TlsConnector,
     pool: Arc<Mutex<Vec<PoolEntry>>>,
     cache: Arc<ResponseCache>,
@@ -179,6 +182,7 @@ impl DomainFronter {
             auth_key: config.auth_key.clone(),
             parallel_relay: config.parallel_relay as usize,
             normalize_x_graphql: config.normalize_x_graphql,
+            cert_hint_shown: std::sync::atomic::AtomicBool::new(false),
             script_ids,
             script_idx: AtomicUsize::new(0),
             tls_connector,
@@ -307,6 +311,43 @@ impl DomainFronter {
         );
     }
 
+    /// Log a relay failure with extra guidance on cert-validation cases.
+    /// Rate-limited so a flood of identical "UnknownIssuer" errors doesn't
+    /// fill the log.
+    fn log_relay_failure(&self, e: &FronterError) {
+        let msg = e.to_string();
+        let is_cert_issue = msg.contains("UnknownIssuer")
+            || msg.contains("invalid peer certificate")
+            || msg.contains("CertificateExpired")
+            || msg.contains("CertNotValidYet")
+            || msg.contains("NotValidForName");
+        if is_cert_issue
+            && !self
+                .cert_hint_shown
+                .swap(true, std::sync::atomic::Ordering::Relaxed)
+        {
+            // First time — print the full diagnostic. Subsequent hits
+            // drop to debug so the log stays readable.
+            tracing::error!(
+                "Relay failed: {} — this almost always means one of:\n \
+                (1) your ISP or a middlebox is intercepting TLS to the Google edge \
+                (common in Iran / IR);\n \
+                (2) the `google_ip` in your config is pointing at a non-Google host;\n \
+                (3) your system clock is way off (NTP not synced).\n\
+                Fixes (try in order): run `mhrv-rs scan-ips` to find a different Google \
+                frontend IP that isn't being MITM'd; check `date` on your host; as a \
+                LAST RESORT set `\"verify_ssl\": false` in config.json — this lets the \
+                relay work even through a middlebox, but your traffic is then only \
+                protected by the Apps Script relay's secret `auth_key`, not by outer TLS.",
+                e
+            );
+        } else if is_cert_issue {
+            tracing::debug!("Relay failed (cert): {}", e);
+        } else {
+            tracing::error!("Relay failed: {}", e);
+        }
+    }
+
     fn next_sni(&self) -> String {
         let n = self.sni_hosts.len();
         let i = self.sni_idx.fetch_add(1, Ordering::Relaxed) % n;
@@ -479,7 +520,7 @@ impl DomainFronter {
             Ok(Ok(bytes)) => bytes,
             Ok(Err(e)) => {
                 self.relay_failures.fetch_add(1, Ordering::Relaxed);
-                tracing::error!("Relay failed: {}", e);
+                self.log_relay_failure(&e);
                 return error_response(502, &format!("Relay error: {}", e));
             }
             Err(_) => {