Files
therealaleph 677ec26bee fix: v1.9.8 — Android disconnect crash + UI test-button gate for non-apps_script modes
Android (#666 from @ilok67 with full root cause):
- MainActivity.onStop was sending ACTION_STOP via startService() AND immediately calling stopService() on the same service. ACTION_STOP runs teardown() on a background thread that stopSelf()s at the end; the redundant stopService() triggered onDestroy() in parallel, racing the lifecycle and crashing on every Disconnect tap. Removed the stopService() — ACTION_STOP alone is sufficient for both the live-service and the zombie-after-process-death cases. The tornDown AtomicBoolean already guards against double-teardown of native state but couldn't protect against OS-level stopSelf vs stopService race.

UI (#665 from @cmptrnb):
- Test Relay button was showing red "test result: fail" status when used in full or direct mode. The underlying test_cmd::run deliberately refuses in those modes because probing Apps Script directly while the data plane goes via tunnel-node would give a misleading result, but the refuse path was getting translated to generic "test failed". UI now checks mode before running and shows a mode-specific explainer for full/direct (point users at https://whatismyipaddress.com in the browser via the proxy as the right way to verify).

Includes already-merged PR #674 from @yyoyoian-pixel: drop client coalesce_step + tunnel-node straggler settle_step from 40 ms → 10 ms, raise tunnel-node settle max from 500 ms → 1000 ms. Asymmetric tuning: fast-fire when nothing else is queued, but adaptive coalesce on bursts. Backwards compatible — existing configs with explicit `coalesce_step_ms: 40` keep old behavior.

Tests: 179 lib + 33 tunnel-node green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 15:57:53 +03:00

6.1 KiB

• Fix v1.9.7 Android: کرش روی tap Disconnect (#666 از @ilok67 با root cause + fix کامل): MainActivity.onStop بعد از startService(ACTION_STOP) بلافاصله stopService() رو هم می‌زد. ACTION_STOP داخل MhrvVpnService یک thread پس‌زمینه به نام mhrv-teardown می‌سازه که teardown() (بستن tun2proxy، fd TUN، runtime) رو اجرا می‌کنه و در پایانش stopSelf() رو فرامی‌خونه. ولی stopService() بلافاصله onDestroy() رو روی همان service trigger می‌کرد — دو thread همزمان دارن از lifecycle می‌گذرن، و OS process service رو می‌کشه قبل از اینکه teardown تمام بشه. crash بعد از تب Disconnect، در حدود ۹۹٪ از تستها قابل reproduce. حالا stopService() حذف شده — ACTION_STOP تنها کافی است (هم برای service زنده هم برای حالت زامبی). idempotency guard tornDown AtomicBoolean قبلاً موجود بود ولی محافظت OS-level lifecycle race رو نمی‌کرد. تشکر از @ilok67 برای triage عالی. • Fix v1.9.7 UI: دکمهٔ Test Relay در حالت fulldirect) "test result: fail" قرمز نشون می‌داد (#665 از @cmptrnb). mhrv-rs test فقط برای حالت apps_script سیم‌کشی شده — در full mode عمداً refuse می‌کنه چون probe مستقیم Apps Script در حالی که data plane از tunnel-node رد می‌شه گمراه‌کننده است. ولی پیام refuse توسط UI به‌عنوان test failure ترجمه می‌شد + کاربر فکر می‌کرد proxy خراب است. حالا UI mode رو قبل از اجرای test چک می‌کنه + برای حالت‌های نامناسب پیام explainer می‌ده به‌جای fail قرمز:

Test Relay is wired only for apps_script mode. In full mode the data plane is the tunnel-node — to verify it end-to-end, start the proxy and load https://whatismyipaddress.com in your browser via 127.0.0.1:8085. The IP shown should be your tunnel-node's VPS IP.

  • Tune adaptive batch coalesce (PR #674 از @yyoyoian-pixel): از 40 ms → 10 ms برای client coalesce step و tunnel-node straggler settle step. tunnel-node settle max از 500 ms → 1000 ms. منطق asymmetric: وقتی هیچ op دیگری نیست، fast-fire (10 ms کافی برای catch کردن op‌هایی که در همان event-loop tick می‌رسن مثل ۶ موازی parallel browser connection)؛ ولی وقتی هر دو طرف data دارن (uploads، page load بستی)، adaptive reset همچنان batch می‌کنه تا 1 s cap. در short: «وقتی چیزی برای انتظار نیست منتظر نباش، وقتی هست با تمام توان batch کن.» سازگار به عقب: کاربران با coalesce_step_ms: 40 در config.json رفتار قدیمی رو نگه می‌دارن. • تست: ۱۷۹ lib + ۳۳ tunnel-node test همه pass.

• Fix Android crash on tap-Disconnect from v1.9.7 (#666 by @ilok67 with full root cause + fix): MainActivity.onStop was calling stopService() immediately after startService(ACTION_STOP). ACTION_STOP inside MhrvVpnService spawns the mhrv-teardown background thread that runs teardown() (stops tun2proxy, closes TUN fd, shuts down the Rust runtime) and then calls stopSelf() at the end. But stopService() immediately triggered onDestroy() on the same service — two threads racing through the lifecycle, and the OS would kill the process before teardown finished. Crash on every Disconnect tap, ~99% reproducible. Removed the stopService() call — ACTION_STOP alone is sufficient for both the live-service and the zombie-after-process-death cases. The existing tornDown AtomicBoolean idempotency guard protects against double-teardown of native state, but it can't protect against OS-level lifecycle races on stopSelf vs stopService. Thanks @ilok67 for the precise triage. • Fix UI showing "test result: fail" red status for full (and direct) modes from v1.9.7 (#665 by @cmptrnb). mhrv-rs test is wired only for the apps_script relay path — it deliberately refuses in full mode because probing Apps Script directly while the actual data plane goes via tunnel-node would give a misleading green result. But the refuse path was getting translated by the UI as a generic "test failed" with red status, scaring users into thinking their proxy was broken. Now the UI checks mode before running the test and shows a friendly explainer for full/direct:

Test Relay is wired only for apps_script mode. In full mode the data plane is the tunnel-node — to verify it end-to-end, start the proxy and load https://whatismyipaddress.com in your browser via 127.0.0.1:8085. The IP shown should be your tunnel-node's VPS IP.

• Tune adaptive batch coalesce (PR #674 from @yyoyoian-pixel): client coalesce step + tunnel-node straggler settle step from 40 ms → 10 ms, tunnel-node settle max from 500 ms → 1000 ms. The asymmetric design — small step, generous max — picks up "fire-and-forget when nothing else is queued" without giving up batching on bursts. The 10 ms still catches ops that arrive in the same event-loop tick (e.g. a browser opening 6 parallel connections on page load), so we don't degenerate into single-op batches; but on a download where the client is just waiting for the next chunk, the per-batch dead-air shrinks by ~30 ms. Backwards-compatible: existing configs with explicit coalesce_step_ms: 40 keep the old behaviour. • Tests: 179 lib + 33 tunnel-node tests all passing.