Finding the optimal communication path between LoRa mesh devices — before a single byte of payload is transmitted.
by Clemens Simon
Half-duplex radio physics collapses managed flooding to 0–60% delivery in every scenario, not just the Bay Area. System 5 combines geo-clustering, multi-path routing, and adaptive QoS into one self-healing protocol. For networks up to ~200 nodes it achieves 100% delivery with 92–99.9% less bandwidth, where flooding delivers only 27–87%. At 500+ nodes it delivers dramatically more (76% vs 3% at 500 nodes). The hop limit, today's biggest scaling bottleneck, becomes irrelevant: each hop costs ~1 transmission instead of n.
Meshtastic uses flooding: every node rebroadcasts every packet, hoping it reaches the destination. This design was fine when mesh networks had 10–30 nodes. But as communities like Bay Area Mesh, NYC Mesh, and European Freifunk grow to hundreds or thousands of devices, the approach breaks down in five fundamental ways.
Every message is rebroadcast by every node that receives it. A single message to one recipient generates n transmissions across the entire network.
1–50 kbps bandwidth. 1% duty cycle (EU law). Half-duplex radio. Each packet takes 50ms–2s airtime. Budget: ~18–720 packets/hour/node.
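The packet budget follows directly from the 1% duty-cycle rule; a quick check of the arithmetic:

```python
# Packet budget under the EU868 1% duty-cycle rule.
DUTY_CYCLE = 0.01                      # 1% of airtime per hour
AIRTIME_BUDGET_S = 3600 * DUTY_CYCLE   # = 36 s of TX time per hour

def packets_per_hour(airtime_per_packet_s: float) -> int:
    """How many packets a node may legally send per hour."""
    return int(AIRTIME_BUDGET_S / airtime_per_packet_s)

print(packets_per_hour(0.05))  # short packets (50 ms) -> 720/hour
print(packets_per_hour(1.0))   # long packets (1 s)    -> 36/hour
print(packets_per_hour(2.0))   # very long (2 s)       -> 18/hour
```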
Every node transmits on every message — even nodes nowhere near the intended path. Battery-powered devices drain in hours instead of weeks.
Multiple nodes rebroadcast simultaneously. LoRa is half-duplex — collisions destroy packets, triggering more retransmissions. A vicious cycle.
Meshtastic caps hops at 3–7 to prevent flood storms. But this kills range: a message can't reach nodes beyond the limit. Every extra hop multiplies transmissions by n — so the limit can't be raised without drowning the network.
Below we compare four routing strategies on the same 15-node network topology. Each animation runs live in your browser — watch the TX counter in each panel. It counts how many radio transmissions are needed to deliver a single message. Fewer TX = less airtime, fewer collisions, longer battery life. The difference is dramatic: from ~100 TX (naive flooding) to ~2 TX (System 5).
The theoretical worst case — every node rebroadcasts every packet exactly once. Meshtastic doesn't actually use this (it uses managed flooding below), but this baseline reveals the fundamental cost of flooding: with n nodes, a single message always costs n transmissions, regardless of distance.
Source sends a packet. Every receiving node rebroadcasts it once. The entire network participates in every message.
TX = n (one per node)
This is what Meshtastic actually uses today (v2.6/2.7). It's already quite clever: before rebroadcasting, each node listens briefly. If it hears a neighbor already forwarded the packet, it stays silent (suppression). Nodes far from the sender rebroadcast first (they have lower SNR = shorter contention windows), while close nodes wait and often suppress. Nodes with the ROUTER role always rebroadcast to guarantee backbone coverage. This cuts transmissions by roughly 40–60% compared to naive flooding — a real improvement, but the cost still scales linearly with network size.
Before rebroadcasting, each node listens briefly. If it hears another node already rebroadcast, it suppresses its own transmission. Distant nodes (low SNR) get shorter delays and rebroadcast first. Close nodes wait and often suppress.
ROUTER-role nodes (marked R) override suppression — they always rebroadcast to ensure backbone coverage.
TX ≈ 0.4n – 0.6n (~50% suppression)
New in Meshtastic v2.6 — a significant step toward directed routing, but only for direct messages (unicast). The first time you message someone, it floods normally. The system watches which relay node successfully delivered the packet and caches that relay as the "next hop." Subsequent messages go only through that one relay node — a huge TX reduction. But there's no multi-path fallback: if that relay dies, it floods again. And all broadcasts (position beacons, channel messages) still use managed flooding.
Phase 1: First message uses managed flooding. The system tracks which node successfully relayed.
Phase 2: Subsequent messages go only via the learned next-hop node (marked NH). One relay instead of the whole network.
Phase 3: If the next-hop dies, the system falls back to managed flooding and learns a new relay.
Only works for direct messages (unicast). Broadcasts still use managed flooding. Only learns one hop — not a full path.
Our proposal: a fundamentally different approach that combines proven networking concepts into one protocol. Nodes self-organize into geographic clusters using GPS geohashes. Within a cluster, every node knows its neighbors via originator messages (OGMs). Between clusters, border nodes act as bridges. Routes are built incrementally via distance-vector (like B.A.T.M.A.N.), not by running graph algorithms on-device. For unicast: two cached routes per destination with weighted selection. For broadcast (~98% of Meshtastic traffic): elected cluster-distributors propagate messages via local mini-floods, not network-wide flooding. All metrics (load, battery) are next-hop only, locally observable from OGMs. OGM intervals adapt to network density (30–180 s) for EU868 duty-cycle compliance. The result: ~1 TX per hop for unicast, 88–99% fewer TX for broadcast.
W(r) = α·Q(r) + β·(1−Load) + γ·Batt
Load and Batt = next-hop node only (from OGM)
Share(r) = W(r) / Σ W(all)
Managed flooding suppresses ~50% of rebroadcasts but still scales as O(n). System 5 routes along specific paths, so cost scales with hop count, not network size. At 100 nodes: managed flooding costs ~50 TX per message by its own suppression formula (more in practice once collisions force retries); System 5 costs ~2 TX.
Next-hop learns a single relay node for direct messages. System 5 maintains 2-3 full paths with weighted load distribution for all traffic types. When a path fails, the next cached path activates instantly — no flooding fallback needed.
Intuition says "less flooding = better," but how much better? The formulas below show the exact TX cost per message for each approach. The key variable is n (network size): flooding-based approaches scale with n, while directed routing scales with d (hop distance). Below the formulas, we score each approach across 7 weighted criteria — from TX efficiency to broadcast support — to provide a fair overall comparison.
Every node rebroadcasts once. At n=100: 100 transmissions per message. Cost grows linearly with network size.
S = suppression rate (fraction of nodes that hear a rebroadcast and stay silent). Depends on density and SNR distribution. At n=100 with S=0.5: ~50 transmissions. Still O(n) but ~50% cheaper than naive.
First message floods (managed). After learning: d = hop count to destination via cached relay. Amortized cost depends on cache hit rate. Broadcasts still use managed flooding.
Every message — unicast and broadcast — follows a pre-computed path. Cost = hop count, independent of network size. At n=100, d=2: 2 transmissions. With fallback: scoped cluster flooding adds O(cluster_size) in worst case.
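The four cost models above can be put side by side. A sketch, using the document's S = 0.5 suppression rate and hop distance d; the 90% cache-hit rate in the next-hop model is an assumed illustration value:

```python
# Per-message TX cost models for the four approaches (sketch).
def naive_flood(n: int) -> float:
    return n                       # every node rebroadcasts once: O(n)

def managed_flood(n: int, s: float = 0.5) -> float:
    return n * (1 - s)             # fraction s of nodes suppress: still O(n)

def next_hop(n: int, d: int, cache_hit: float = 0.9, s: float = 0.5) -> float:
    # Amortized: cache hits cost d hops, misses fall back to managed flooding.
    # cache_hit = 0.9 is an assumption for illustration.
    return cache_hit * d + (1 - cache_hit) * managed_flood(n, s)

def system5(d: int) -> float:
    return d                       # one TX per hop, independent of n

n, d = 100, 2
print(naive_flood(n), managed_flood(n), system5(d))  # 100 50.0 2
```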
Q = link quality (OGM reception rate), Load = queue pressure, Batt = min battery along route.
Traffic share: Share(r) = W(r) / Σ W(all). Tuning: α=0.4, β=0.35, γ=0.25
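A minimal sketch of the weighting and traffic-share formulas with the stated tuning (α=0.4, β=0.35, γ=0.25); the route values and names are made up for illustration:

```python
# Route weight W(r) = a*Q(r) + b*(1-Load) + g*Batt; Share(r) = W(r) / sum(W).
ALPHA, BETA, GAMMA = 0.4, 0.35, 0.25   # tuning values from the text

def weight(q: float, load: float, batt: float) -> float:
    """q: OGM reception rate, load: next-hop queue pressure, batt: 0..1."""
    return ALPHA * q + BETA * (1 - load) + GAMMA * batt

routes = {                              # hypothetical cached routes
    "via_node_A": weight(q=0.95, load=0.2, batt=0.8),
    "via_node_B": weight(q=0.70, load=0.1, batt=0.5),
}
total = sum(routes.values())
shares = {r: w / total for r, w in routes.items()}  # proportional traffic split
```

Good routes get a larger share, but never all of the traffic, so no single relay becomes a bottleneck.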
| Criterion (Weight) | Naive Flood | Managed Flood | Next-Hop | System 5 |
|---|---|---|---|---|
| TX Cost per Message (20%) | 1 | 4 | 5 | 10 |
| Delivery Reliability (20%) | 9 | 9 | 8 | 9 |
| Scalability (15%) | 1 | 3 | 4 | 9 |
| Fault Tolerance (15%) | 8 | 8 | 7 | 9 |
| Hop Limit Freedom (10%) | 1 | 2 | 3 | 10 |
| Energy Efficiency (10%) | 1 | 3 | 5 | 8 |
| Broadcast Support (10%) | 10 | 10 | 3 | 9 |
| WEIGHTED TOTAL | 4.6 | 5.8 | 5.4 | 9.2 |
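The weighted totals follow mechanically from the rows above; a quick check (small deviations from the printed row come down to rounding):

```python
# Recompute the weighted totals from the scoring table.
weights = [0.20, 0.20, 0.15, 0.15, 0.10, 0.10, 0.10]   # criterion weights
scores = {
    "Naive Flood":   [1, 9, 1, 8, 1, 1, 10],
    "Managed Flood": [4, 9, 3, 8, 2, 3, 10],
    "Next-Hop":      [5, 8, 4, 7, 3, 5, 3],
    "System 5":      [10, 9, 9, 9, 10, 8, 9],
}
totals = {name: sum(w * s for w, s in zip(weights, row))
          for name, row in scores.items()}
print(totals)
```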
Meshtastic's managed flooding is clever — but still O(n). We didn't invent new theory. Instead, System 5 borrows six proven, battle-tested concepts from decades of networking research and adapts them to LoRa's unique constraints (low bandwidth, half-duplex, 1% duty cycle, limited RAM). Each concept solves a specific piece of the puzzle:
Nodes self-organize by geohash prefix. Full topology within cluster, summarized routes between. Scales from 10 to 10,000+ nodes.
Periodic originator messages. Count reception rate per neighbor. No complex calculation — just count how many arrive.
Traffic distributed proportionally to route weight. Good paths get more traffic, but never all. No single bottleneck node.
Overloaded nodes report queue pressure. Traffic naturally avoids congested paths — like water flowing around rocks.
Successful deliveries strengthen a route. Timeouts weaken it. Unused routes fade naturally. The network learns.
Where is the target node? Ask locally first, then cluster, then region. Answers are cached. Scoped flooding only as last resort.
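The "just count how many arrive" idea behind OGM link quality can be sketched as a sliding-window counter; the window size of 16 intervals is an assumption:

```python
from collections import deque

class LinkQuality:
    """Track OGM reception rate per neighbor over a sliding window.
    Q = received / expected over the last WINDOW OGM intervals."""
    WINDOW = 16                    # assumed: last 16 expected OGMs

    def __init__(self):
        self.seen = deque(maxlen=self.WINDOW)  # 1 = arrived, 0 = missed

    def ogm_received(self):
        self.seen.append(1)

    def ogm_missed(self):          # called when an interval passes with no OGM
        self.seen.append(0)

    @property
    def q(self) -> float:
        return sum(self.seen) / len(self.seen) if self.seen else 0.0

lq = LinkQuality()
for _ in range(12):
    lq.ogm_received()
for _ in range(4):
    lq.ogm_missed()
print(lq.q)   # 12 of the last 16 expected OGMs arrived -> 0.75
```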
Feedback from Bay Area Mesh operators exposed critical gaps in the original design. Their mountaintop routers (SUNL, Mt Diablo) showed that half-duplex radio physics — not routing algorithms — is the true scaling bottleneck. Five features were built in direct response.
The network identifies redundant nodes (those whose neighbors are all reachable via other paths) and mutes them. Silenced nodes still listen — they receive messages, the network knows they exist — but they don't rebroadcast. This removes the collision noise at mountaintops. Battery-fair rotation every 10 minutes. Result: TX halved, only 3% less delivery.
LoRa radios are half-duplex: a node cannot TX while receiving. When a mountaintop hears 10 rebroadcasts, it's blocked from forwarding for 10-20 seconds. Our simulator now models this per-node radio state (IDLE/TX/RX) on ALL scenarios. Result: flooding collapses to 0–60% delivery across all scenarios (~87% → ~6% in Bay Area). System 5 holds at ~74% in Bay Area and maintains high delivery everywhere.
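The per-node radio state the simulator models can be sketched as a small state machine; the timings and API names here are illustrative, not the simulator's actual interface:

```python
from enum import Enum

class Radio(Enum):
    IDLE = 1
    TX = 2
    RX = 3

class Node:
    """Half-duplex: a node that is receiving cannot transmit until done."""
    def __init__(self):
        self.state = Radio.IDLE
        self.busy_until = 0.0            # sim time when current TX/RX ends

    def can_transmit(self, now: float) -> bool:
        return self.state == Radio.IDLE or now >= self.busy_until

    def start_rx(self, now: float, airtime: float):
        # Receiving blocks the radio for the packet's entire airtime.
        self.state, self.busy_until = Radio.RX, now + airtime

    def start_tx(self, now: float, airtime: float) -> bool:
        if not self.can_transmit(now):
            return False                 # blocked: still receiving
        self.state, self.busy_until = Radio.TX, now + airtime
        return True

n = Node()
n.start_rx(now=0.0, airtime=1.5)         # hears a neighbor's rebroadcast
print(n.start_tx(now=0.5, airtime=1.0))  # False: cannot TX mid-RX
print(n.start_tx(now=2.0, airtime=1.0))  # True: radio is free again
```

A mountaintop node hearing back-to-back rebroadcasts stays in RX almost continuously, which is exactly the forwarding blockage described above.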
Multi-path routing can deliver messages out of order (A,B,C → C,B,A via 3 paths). A 2-byte sequence counter per (source, destination) pair in the packet header lets the app detect gaps: "got seq 3 and 5, missed 4." Zero extra TX cost — just 2 bytes added to the existing header.
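Gap detection with the 2-byte counter can be sketched as follows; the wraparound handling via modular distance is an assumption about the design:

```python
# Detect missing sequence numbers per (source, destination) pair.
# The 2-byte counter wraps at 65536.
class SeqTracker:
    def __init__(self):
        self.last = {}                      # (src, dst) -> last seq seen

    def on_packet(self, src: int, dst: int, seq: int) -> list:
        """Return the sequence numbers missed since the last packet."""
        key = (src, dst)
        prev = self.last.get(key)
        self.last[key] = seq
        if prev is None:
            return []                       # first packet: nothing to compare
        gap = (seq - prev - 1) % 65536      # modular distance handles wraparound
        return [(prev + 1 + i) % 65536 for i in range(gap)]

t = SeqTracker()
t.on_packet(1, 2, 3)
print(t.on_packet(1, 2, 5))   # [4] -> "got seq 3 and 5, missed 4"
```

A real implementation would also buffer briefly before declaring a gap, since multi-path delivery can reorder packets that are not actually lost.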
When both cached routes fail, System 5 falls back to scoped corridor flooding (source + destination clusters + border nodes only). Routes are built via distance-vector from OGM data — no BFS computation on-device needed.
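The corridor-membership test is simple: a node forwards the scoped flood only if it belongs to one of the two endpoint clusters or bridges clusters. A sketch with clusters as geohash prefixes (cluster IDs hypothetical):

```python
def in_corridor(node_cluster: str, is_border: bool,
                src_cluster: str, dst_cluster: str) -> bool:
    """True if this node should rebroadcast a scoped corridor flood."""
    return is_border or node_cluster in (src_cluster, dst_cluster)

# Only nodes in the source/destination clusters plus border nodes rebroadcast.
print(in_corridor("u0x8", False, "u0x8", "u0x9"))  # True: source cluster
print(in_corridor("u33d", False, "u0x8", "u0x9"))  # False: unrelated cluster
print(in_corridor("u33d", True,  "u0x8", "u0x9"))  # True: border node bridges
```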
A new Bay Area simulation models the actual network structure: 7 mountaintop nodes (45km range), 35 hill/rooftop nodes (10km), 193 valley/indoor nodes (2.5km). Asymmetric links, half-duplex, collision capture. Four Bay Area scenarios test normal, stress, silencing, and combined conditions. Try it live →
| Scenario | Flood Del. | S5 Del. | S5 TX | Silent |
|---|---|---|---|---|
| Bay Area (no half-duplex) | ~87% | ~80% | ~47K | — |
| Bay Area (half-duplex) | ~6% | ~74% | ~516K | 0% |
| Bay Area + Silencing | ~6% | ~70% | ~284K | 57% |
| Bay Area + Stress | ~5% | ~55% | ~283K | 0% |
| Bay Area + Silencing + Stress | ~5% | ~49% | ~132K | 57% |
Key insight: Half-duplex destroys flooding in Bay Area (~87% → ~6%) and across ALL scenarios (0–60% delivery). System 5 holds at ~74% in Bay Area and is the only approach maintaining high delivery. Node Silencing halves TX cost (~516K → ~284K) with only ~4% less delivery. 128 of 193 valley nodes are muted — all 7 mountain nodes stay active. Results averaged over 5 random seeds for statistical reliability.
In flooding, every hop multiplies transmissions across the entire network. In System 5, every hop costs exactly one transmission. This single change unlocks everything.
Every node in range rebroadcasts at every hop, so without deduplication the cost compounds hop by hop. At 100 nodes and 5 hops, a single message can generate 330,000+ transmissions. The hop limit (default 3) is a survival mechanism: without it, the network drowns.
Only the forwarding node transmits. At 100 nodes and 5 hops, a single message generates 5 transmissions. No hop limit needed — 20 hops cost the same as flooding costs for 1.
| Scenario | Nodes | 3-hop Del. | 5-hop Del. | 7-hop Del. | Sys5 Del. | Sys5 TX |
|---|---|---|---|---|---|---|
| Small Local | 20 | 87% | 87% | 87% | 100% | 115 |
| Medium City | 100 | 25% | 27% | 27% | 100% | 402 |
| Large Regional | 500 | 1% | 2% | 3% | 76% | 412k |
| 1000 Nodes | 1000 | 0% | 0% | 0% | 46% | 182k* |
| 1500 Nodes | 1500 | 0% | 1% | 1% | 44% | 197k* |
* At 1000+ nodes, System 5 uses more total TX than managed flooding — but delivers 46% vs 0%. Managed flooding is effectively dead at this scale with half-duplex enabled.
Critical finding: With half-duplex and collisions enabled on ALL scenarios, managed flooding's delivery rate collapses everywhere. At 1000 nodes, 0% of messages arrive. Even at 100 nodes, only 27%. System 5 delivers 100% at 100 nodes and 46% at 1000 nodes. Half-duplex radio physics — not just hop limits — make flooding fundamentally unreliable.
No more artificial hop limits. Messages can traverse 20, 30, or 50 hops at the same per-hop cost. The network's range is limited only by node density, not by protocol constraints.
Only the forwarding node transmits per hop — not every node in range. Nodes far from the path sleep through. Battery life increases from hours to weeks.
With cheap hops, SHORT_FAST with more hops works as well as LONG_SLOW with fewer hops — at higher data rates. Choose the preset for your local conditions, not for the network's range limit.
No central server, no configuration, no coordination. When nodes power on, they independently discover neighbors via OGM beacons (every 30 seconds), compute their geographic cluster from GPS, identify border nodes that bridge clusters, and build multi-path routing tables — all automatically. Click "Next Step" below to watch this process unfold. Each step shows exactly what happens in the firmware. For the full technical deep-dive, see How It Works →
This animation shows how a System 5 mesh network self-organizes from powered-on nodes to a fully routed, load-balanced mesh.
Mesh networks don't exist in a vacuum — they range from a dozen handhelds in a neighborhood to thousands of devices spanning continents. System 5's geo-clustering architecture naturally adapts: a single cluster handles local traffic efficiently, while inter-cluster routing via border nodes scales to arbitrarily large networks. The three live simulations below show how the same protocol handles neighborhood, continental, and global scale. Watch the node activity logs on the right — they show real-time routing decisions, OGM beacons, and failover events.
12 nodes within ~3km. Direct LoRa links. Single geo-cluster. Full internal topology known to all nodes. Multi-path routing with load balancing.
Clusters in major cities connected via MQTT gateways and long-range relays. Border nodes bridge clusters. DNS-like cache resolves node positions.
Continental super-clusters connected via internet backbone (MQTT). Hierarchical geohash addressing. Cascading DNS cache for node discovery.
A routing protocol that only works in perfect conditions is worthless. In the real world, nodes die (batteries, hardware), GPS signals drop, internet links fail, and entire regions go dark. System 5 is designed to degrade gracefully: multi-path routing provides instant failover (0ms — the next cached route is already known), scoped corridor flooding is the safety net, and the adaptive QoS gate ensures emergency messages always get through even when the network is collapsing. Click on nodes or links below to simulate failures and watch the network adapt in real-time.
A node dies (battery, hardware). Its routes break instantly. System 5 switches to cached backup routes in 0ms. If a border node dies, the second border node takes over. If all borders die, the cluster falls back to flooding.
GPS module fails or loses signal. The node can't compute its geohash. Fallback: neighbor consensus — if 4 of 5 neighbors say "u0x8", the node adopts that cluster. If no neighbors have GPS: "homeless" mode with local flooding.
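The "4 of 5 neighbors say u0x8" fallback is a majority vote; a sketch, with the 60% consensus threshold as an assumed tuning value:

```python
from collections import Counter

def cluster_by_consensus(neighbor_clusters, min_share=0.6):
    """GPS-less fallback: adopt the cluster most neighbors report,
    if it has a clear majority; otherwise go 'homeless' (None)."""
    if not neighbor_clusters:
        return None                          # no GPS-bearing neighbors at all
    cluster, count = Counter(neighbor_clusters).most_common(1)[0]
    return cluster if count / len(neighbor_clusters) >= min_share else None

print(cluster_by_consensus(["u0x8", "u0x8", "u0x8", "u0x8", "u0x9"]))  # u0x8
print(cluster_by_consensus(["u0x8", "u0x9", "u0xb"]))                  # None
```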
Internet-based MQTT links between cities fail. The LoRa relay subnet activates — a chain of small relay nodes bridges clusters via pure radio. Slower (more hops) but functional. The green chain below the clusters is this backup path.
As the Network Health Score drops, low-priority traffic is automatically blocked. SOS (P0) always gets through, even at 1% network health. Firmware updates (P7) only when the network is perfect. The network breathes: less traffic under stress = self-healing.
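The QoS gate described above maps health to an admission threshold per priority level. A sketch; the linear mapping is an assumption for illustration, not the specified curve:

```python
# Adaptive QoS gate: as network health drops, only higher priorities pass.
def admits(priority: int, health: float) -> bool:
    """priority: 0 (SOS, highest) .. 7 (firmware, lowest); health: 0.0..1.0."""
    if priority == 0:
        return True                   # SOS always gets through
    threshold = priority / 7.0        # P7 needs health ~1.0, P1 only ~0.14
    return health >= threshold

print(admits(0, 0.01))   # True: SOS even at 1% network health
print(admits(7, 0.90))   # False: firmware updates wait for a perfect network
print(admits(3, 0.50))   # True: mid-priority passes at 50% health
```

Because blocked traffic never hits the air, total load falls exactly when the network is weakest, which is the self-healing feedback loop described above.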
Claims are only as good as their evidence. We built a Python simulator that models EU 868MHz LoRa physics (path loss, terrain, duty cycle, collisions, half-duplex radio) and runs 6 routers on identical networks: Naive Flood, Managed Flood at 3/5/7 hop limits, Next-Hop, and System 5. Each scenario runs 100–300 random messages and measures delivery rate, total transmissions, and per-node load. The results below are from 26 scenarios covering 20–1500 nodes — including realistic environments (rural, maritime, indoor, highway convoy) and stress tests (node failure, link degradation, duty cycle enforcement). Try the interactive simulator →
| Scenario | Nodes | Naive TX | Managed TX | Next-Hop TX | Sys5 TX | S5 Delivery | S5 vs Managed |
|---|---|---|---|---|---|---|---|
Click the button to toggle between log and linear scales
System 5 maintains high delivery even under failures
How much the most loaded node has to transmit — lower is better
High-priority traffic gets through even when the network is degraded
With realistic hop limits (3–7), managed flooding not only wastes bandwidth — it fails to deliver. At 500 nodes with hop limit 7, only 51% of messages arrive. At 1000 nodes, only 6%. System 5 delivers 7.5x more messages in the same scenarios, using fewer total transmissions per delivered message. The hop limit is not a safety net — it is the primary scaling barrier that makes large mesh networks fundamentally unreliable.
Mesh networking spans radio engineering, graph theory, and protocol design. Here's a quick reference for the technical terms used throughout this presentation — from LoRa basics to the protocol-specific concepts introduced by System 5.
Half-duplex radio physics collapses managed flooding to 0–60% delivery in every scenario, not just the Bay Area. At 100 nodes, only 27% of messages arrive. At 1000 nodes, 0%. System 5 breaks through these barriers with directed routing at ~1 TX per hop, surviving both the hop limit wall and the half-duplex collision cascade. Across 26 scenarios (all with half-duplex and collisions enabled), System 5 is the only approach that maintains high delivery: 100% at up to 200 nodes, and 46% at 1000 nodes where flooding delivers nothing. Very large sparse networks remain challenging for any protocol. The protocol is fully specified, a working ESP32 firmware prototype exists for three board types, and the complete source code is open under the MIT license.
Complete analysis with problem statement, all five approaches evaluated, mathematical scoring, resilience design, QoS architecture, and the project roadmap.