A complete technical walkthrough of MeshRoute's geo-clustered, multi-path, load-balanced routing protocol — from boot to delivery.
Every routing protocol answers one question: when node A wants to send a message to node Z, which intermediate nodes should carry it?
Current Meshtastic uses managed flooding: every node rebroadcasts every message, hoping it reaches the destination. This works for small networks but creates a bandwidth explosion at scale — 100 nodes means ~100 transmissions per message.
System 5 takes a fundamentally different approach: nodes self-organize into geographic clusters, discover multi-hop routes, and send each message along a specific computed path. The cost per message is proportional to the number of hops, not the number of nodes.
The following sections explain every step in detail.
When a System 5 node powers up, it doesn't know anything about the network yet. Here's the exact sequence:
"u33d"). The first 4 characters define the node's cluster. Nodes with the same 4-char prefix are in the same cluster.A geohash encodes a GPS coordinate into a string where shared prefixes mean geographic proximity. Two nodes with geohash "u33d8" and "u33d9" share the prefix "u33d" — they're in the same cluster. A node with "u33e1" is in a neighboring cluster. At 4-character precision, each cluster covers roughly 40km × 20km.
Each node periodically broadcasts an OGM (Originator Message) — a lightweight packet (~12 bytes) containing:
| Field | Size | Purpose |
|---|---|---|
| Node ID | 4 bytes | Unique identifier (derived from hardware MAC) |
| Cluster ID | 4 bytes | Geohash prefix of this node's cluster |
| Battery | 1 byte | Current charge level (0-100%) |
| Neighbor count | 1 byte | How many neighbors this node currently has |
| Flags | 1 byte | Border node, distributor role, S5 capability |
| Best reachable | 1 byte | Compact hop-count summary for distance-vector updates |
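A sketch of how the 12-byte OGM could be serialized, with field order following the table above. The big-endian wire layout shown here is an assumption for illustration, not the published encoding.

```python
import struct

# node_id:4  cluster_id:4  battery:1  neighbor_count:1  flags:1  best_reachable:1
OGM_FORMAT = ">IIBBBB"   # big-endian, 12 bytes total

def pack_ogm(node_id, cluster_id, battery, neighbors, flags, best_reachable):
    """Serialize one OGM beacon into its 12-byte wire form."""
    return struct.pack(OGM_FORMAT, node_id, cluster_id,
                       battery, neighbors, flags, best_reachable)

def unpack_ogm(payload):
    """Parse a received OGM back into its six fields."""
    return struct.unpack(OGM_FORMAT, payload)
```

The cluster ID fits in 4 bytes because it is just the 4-character geohash prefix (e.g. `"u33d"` as the ASCII bytes `0x75333364`).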
OGMs are not sent at a fixed 30s interval. The interval adapts to network density and EU868 duty cycle constraints:
| Neighbors | OGM Interval | Rationale |
|---|---|---|
| ≤ 8 | 30 s | Sparse network — fast discovery needed |
| 9–20 | 60 s | Moderate density — balance discovery vs. airtime |
| 21–40 | 120 s | Dense network — neighbors are stable, save airtime |
| > 40 | 180 s | Very dense — airtime is the bottleneck |
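The adaptive interval is a direct lookup on neighbor count; a minimal sketch of the table above:

```python
def ogm_interval(neighbor_count):
    """Adaptive OGM beacon interval in seconds, from the density table."""
    if neighbor_count <= 8:
        return 30    # sparse network: fast discovery needed
    if neighbor_count <= 20:
        return 60    # moderate density: balance discovery vs. airtime
    if neighbor_count <= 40:
        return 120   # dense: neighbors are stable, save airtime
    return 180       # very dense: airtime is the bottleneck
```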
OGMs stay cluster-local (1 hop only) — they are never rebroadcast. This means a 25-node cluster generates 25 OGMs per interval, not 25 × network-size. At SF12 (~500ms per 12-byte OGM), that is ~12.5s of aggregate airtime per cycle. A 500ms OGM every 30s is ~1.7% of a node's airtime — slightly above EU868's 1% duty cycle (360ms per 36s per node) — which is exactly why the adaptive interval backs off to 60s and beyond as density grows.
When a node receives an OGM, it creates or refreshes the sender's entry in its neighbor table — node ID, measured link quality, battery, cluster ID, and a last-seen timestamp.
Memory constraint. Each neighbor entry costs ~20 bytes (node ID, link quality, battery, cluster ID, last-seen timestamp). On nRF52 devices with 256KB RAM, 16 neighbors × 20 bytes = 320 bytes — trivial. If a node hears more than 16 neighbors, it evicts the one with the lowest link quality. This is fine because routing only needs the best neighbors, not all of them.
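A sketch of the bounded neighbor table with lowest-quality eviction. A dict is used for clarity; a firmware implementation would use a fixed 16-slot array.

```python
MAX_NEIGHBORS = 16

def update_neighbor(table, node_id, link_quality, battery, cluster_id, now):
    """Insert or refresh a neighbor entry; evict the weakest link when full."""
    entry = {"link_q": link_quality, "battery": battery,
             "cluster": cluster_id, "last_seen": now}
    if node_id in table or len(table) < MAX_NEIGHBORS:
        table[node_id] = entry
        return True
    # Table full: keep only the best links — find our weakest neighbor.
    worst = min(table, key=lambda n: table[n]["link_q"])
    if link_quality > table[worst]["link_q"]:
        del table[worst]          # new neighbor is better than our weakest
        table[node_id] = entry
        return True
    return False                  # table already full of better links; ignore
```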
A critical detail: the link from A→B can have different quality than B→A. A mountaintop node with clear line-of-sight might send to a valley node with quality 0.95, but the valley node (surrounded by buildings) can only reach back with quality 0.3. System 5 tracks both directions independently.
After a few OGM rounds (~1-2 minutes), nodes have discovered their neighbors. Now the network self-organizes into geographic clusters.
There is no central coordinator. Each node independently derives its cluster ID from its own geohash prefix and observes which of its neighbors share it.
This is completely decentralized — no node needs to "know" the full network. If you turn on a new node in Munich, it computes its geohash ("u281"), and it's automatically part of the Munich cluster. It doesn't need permission or coordination.
A border node is any node that has neighbors in a different cluster. Border nodes are critical — they're the bridges for inter-cluster routing.
System 5 limits bridge links to 2 per cluster pair to prevent route explosion. The two strongest links between each pair of adjacent clusters are selected.
Each node only needs to know its own cluster's members plus the border nodes that lead toward adjacent clusters.
A node in San Francisco doesn't need to know every individual node in Oakland. It just needs to know: "to reach Oakland, route through border node #47 on the Bay Bridge ridge."
Once clusters and border nodes are known, each node builds a routing table with up to 2 routes (primary + backup) to every reachable destination in its cluster view.
Routes are not computed by running a graph algorithm on the node. Instead, they emerge incrementally from OGM data — similar to how RIP and B.A.T.M.A.N. build their routing tables: each incoming OGM's "best reachable" summary is compared against the cached route for that destination, and the cached entry is replaced whenever the advertised alternative is better.
Compute cost: Distance-vector requires one table-lookup per incoming OGM — a single comparison + write. BFS would require O(V+E) graph traversal per destination, which is unnecessary overhead on a 64 MHz nRF52840. The simulator uses BFS for clarity, but a real firmware implementation would use distance-vector.
Memory cost: A route entry needs only dst_id (4B) + next_hop (4B) + quality (1B) + age (2B) + hop_count (1B) = 12 bytes. Two routes per destination at 12 bytes each is far smaller than full path caching.
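A sketch of the distance-vector relaxation described above, assuming each OGM's "best reachable" summary can be decoded into (destination → hop count, quality) pairs; the exact summary encoding is not specified here.

```python
from dataclasses import dataclass

@dataclass
class Route:
    next_hop: int
    quality: float   # product of per-hop link qualities
    hops: int

def on_ogm(routes, neighbor_id, link_q, advertised):
    """One table lookup + compare per advertised destination — no graph traversal.

    routes:     {dst_id: Route} — this node's cached best routes
    advertised: {dst_id: (hops, quality)} decoded from the neighbor's OGM
    """
    for dst, (hops, q) in advertised.items():
        # Quality through this neighbor = our link to them × their route quality.
        candidate = Route(next_hop=neighbor_id, quality=link_q * q, hops=hops + 1)
        current = routes.get(dst)
        if current is None or candidate.quality > current.quality:
            routes[dst] = candidate
```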
Route quality is the estimated end-to-end delivery probability — the product of the per-hop link qualities, Q(route) = q₁ × q₂ × … × qₙ — computed incrementally as each hop's OGM propagates quality information:
This naturally penalizes long paths (more hops = more quality multiplications < 1.0) and paths with weak links.
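The multiplicative penalty in numbers (link qualities here are illustrative):

```python
from math import prod

def route_quality(link_qualities):
    """End-to-end delivery probability = product of per-hop qualities."""
    return prod(link_qualities)

# Three good hops compound to less than one mediocre hop:
three_good = route_quality([0.9, 0.9, 0.9])   # 0.729
one_weak   = route_quality([0.8])             # 0.8
```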
| Component | Per-Entry | Count | Total |
|---|---|---|---|
| Route entry | 12 bytes | 2 routes × 35 destinations | 840 bytes |
| Neighbor table | 20 bytes | 16 neighbors | 320 bytes |
| Cluster metadata | ~25 bytes | 8 clusters | 200 bytes |
| Node state | ~100 bytes | 1 | 100 bytes |
| Total | | | ~1.5 KB |
Even with 70 destinations (large cluster view), total memory is ~2 KB — well within nRF52's ~64 KB usable RAM.
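The table's arithmetic as a one-function sketch (entry sizes from the table; the function name is our own):

```python
ROUTE_ENTRY = 12      # bytes: dst + next_hop + quality + age + hop_count
NEIGHBOR_ENTRY = 20
CLUSTER_META = 25
NODE_STATE = 100

def routing_memory_bytes(destinations, routes_per_dst=2,
                         neighbors=16, clusters=8):
    """Total routing-state footprint for a given cluster view."""
    return (ROUTE_ENTRY * routes_per_dst * destinations
            + NEIGHBOR_ENTRY * neighbors
            + CLUSTER_META * clusters
            + NODE_STATE)
```

With 35 destinations this gives 1,460 bytes (~1.5 KB), and 70 destinations gives 2,300 bytes (~2.3 KB), matching the figures above.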
When node A wants to send a message to node Z, this is the exact decision process:
1. Priority gate: is this message's priority high enough for the current network health? (See Step 6.)
2. Route lookup: fetch the cached routes to destination Z — typically 2 (primary + backup).
3. Route pruning: remove routes where any intermediate node has died (battery = 0) or any link is broken.
For each surviving route, compute a weight that balances three factors:
| Factor | Weight | What It Measures | Why It Matters |
|---|---|---|---|
| Q(r) — Quality | 0.4 (40%) | Accumulated route quality from distance-vector (product of per-hop qualities) | Higher = fewer packet losses, fewer retries needed |
| 1−Load — Spare capacity | 0.35 (35%) | Queue utilization of the next-hop node (from its last OGM) | Avoid routing through a congested next hop |
| Batt — Battery | 0.25 (25%) | Battery level of the next-hop node (from its last OGM) | Don't drain low-battery neighbors; prefer nodes on mains/solar power |
On LoRa networks, nodes only have reliable data about their direct neighbors (via OGMs). "Path-wide" battery or load information would require multi-hop propagation of per-node state — creating exactly the extra traffic that clustering is designed to avoid. Next-hop metrics are locally observable, always fresh, and add zero overhead.
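Putting the three factors together, with the weights from the table (the linear combination is our reading of the design):

```python
def route_weight(quality, next_hop_load, next_hop_battery):
    """Weighted score used for probabilistic route selection.

    All three inputs are in [0, 1]: quality from distance-vector,
    load and battery from the next hop's last OGM.
    """
    return (0.40 * quality                   # end-to-end delivery probability
            + 0.35 * (1.0 - next_hop_load)   # spare capacity at the next hop
            + 0.25 * next_hop_battery)       # don't drain weak neighbors
```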
This is a critical design choice. System 5 does not always pick the best route. It selects routes probabilistically, proportional to their weight:
Example: if route 1 has weight 0.8 and route 2 has weight 0.4, route 1 is selected with probability 0.8 / 1.2 ≈ 67% and route 2 with probability 0.4 / 1.2 ≈ 33%.
This keeps secondary routes "warm" — traffic occasionally flows through them, so the network knows they still work. If route 1 fails, route 2 is immediately available with recent quality data.
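Weight-proportional selection is one line in Python; a sketch of the choice step:

```python
import random

def select_route(routes, weights):
    """Pick one route with probability proportional to its weight.
    Losing routes still get occasional traffic, keeping them 'warm'."""
    return random.choices(routes, weights=weights, k=1)[0]
```

Over many sends with weights [0.8, 0.4], route 1 carries roughly two-thirds of the traffic and route 2 one-third.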
The message is sent along the selected path, one hop at a time.
Links with quality > 0.5 get 3 retries (likely to succeed quickly). Links with quality ≤ 0.5 get 5 retries (need more attempts). This balances delivery probability against airtime cost.
When an intermediate node's queue is filling up, System 5 applies gradual backpressure:
| Queue Load | Action |
|---|---|
| < 80% | Normal operation — route weight unaffected |
| 80–95% | Route weight penalized (× 0.8–0.95) — traffic shifts to alternatives |
| > 95% | Route fully blocked — no new traffic routed through this node |
This prevents cascading overload: when a node starts getting congested, traffic naturally redistributes to other paths before the node drops packets.
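The backpressure table as a weight multiplier. The linear 0.95 → 0.80 ramp between 80% and 95% load is an assumption for illustration; the protocol only specifies the × 0.8–0.95 penalty range.

```python
def load_factor(queue_load):
    """Multiplier applied to a route's weight based on next-hop queue load."""
    if queue_load > 0.95:
        return 0.0                          # fully blocked: no new traffic
    if queue_load >= 0.80:
        return 0.95 - (queue_load - 0.80)   # 0.95 at 80% load -> 0.80 at 95%
    return 1.0                              # normal operation
```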
Not all messages are equal. System 5 uses a Network Health Score (NHS) per cluster to throttle low-priority traffic when the network is stressed.
NHS is a value from 0.0 to 1.0 representing the cluster's health as seen by this node. It requires no extra traffic — it's computed entirely from data already present in received OGMs: neighbor battery levels, queue load, link quality, and how many neighbors are still being heard.
NHS is not a global cluster metric — each node computes its own local view. This avoids any extra polling or cross-node aggregation traffic.
| NHS Range | Network State | Allowed Priorities | Example |
|---|---|---|---|
| 0.8 – 1.0 | Healthy | All (0–7) | Everything gets through |
| 0.6 – 0.8 | Moderate | 0–5 only | Low-priority telemetry throttled |
| 0.4 – 0.6 | Degraded | 0–3 only | Only important messages |
| 0.2 – 0.4 | Critical | 0–1 only | Emergency/SOS only |
| < 0.2 | Collapsed | 0 only | SOS messages only |
Priority 0 = SOS/Emergency — always gets through, regardless of network state. This ensures that in a disaster scenario, critical messages are never blocked by routine telemetry.
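The NHS gate from the table above as a sketch:

```python
def max_allowed_priority(nhs):
    """Highest priority number admitted at a given Network Health Score
    (0 = SOS/emergency ... 7 = lowest-priority telemetry)."""
    if nhs >= 0.8:
        return 7    # healthy: everything gets through
    if nhs >= 0.6:
        return 5    # moderate: low-priority telemetry throttled
    if nhs >= 0.4:
        return 3    # degraded: only important messages
    if nhs >= 0.2:
        return 1    # critical: emergency/SOS only
    return 0        # collapsed: SOS only

def admit(priority, nhs):
    """Priority 0 (SOS) always passes since max_allowed_priority >= 0."""
    return priority <= max_allowed_priority(nhs)
```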
If all cached routes fail (after retries on each), System 5 doesn't give up. It falls back to scoped cluster flooding — a targeted mini-flood along the corridor between source and destination.
This is dramatically cheaper than full-network flooding. In a 500-node network with 10 clusters, a full flood touches all 500 nodes. A scoped corridor flood touches ~100 nodes (2 full clusters + border nodes).
When the corridor flood triggers frequently (as in the Bay Area half-duplex scenario), it can still generate high TX counts. This is the #1 optimization target: try more alternative directed routes before triggering the corridor flood.
The previous steps describe unicast routing (one sender, one receiver). But in Meshtastic, ~98% of traffic is broadcast — position beacons, node info, telemetry. This is where the biggest efficiency gains are.
In managed flooding, a single broadcast message triggers O(n) transmissions — every node that receives it rebroadcasts to all neighbors. In a 200-node network, one position beacon generates hundreds of transmissions, saturating the channel.
System 5 replaces blind broadcast flooding with structured wave propagation through elected cluster distributors:
Each cluster elects one distributor node — a valley-level node chosen for high local connectivity (it can reach most cluster members directly) and low cross-cluster leakage (its transmissions don't bleed into neighboring clusters and cause collisions there).
When a node wants to broadcast, it sends the message via unicast to its cluster's distributor — 1-3 hops, ~1-3 TX. No flooding.
The distributor performs a mini-flood within its cluster only. Since clusters are small (20-30 nodes), this is cheap: ~20-30 TX to reach all cluster members.
Border nodes that receive the message relay it to the distributor of the adjacent cluster via unicast. The wave propagates cluster by cluster until all clusters are covered.
| Scenario | Managed Flood TX | Cluster-Dist TX | MF Reach | CD Reach | TX Savings |
|---|---|---|---|---|---|
| Medium (50 nodes) | 806 | 97 | 81% | 74% | 88% |
| Large (100 nodes) | ~1,500 | ~180 | 75% | 80% | 88% |
| Bay Area (235 nodes) | 4,301 | 220 | 90% | 96% | 95% |
| Regional (500 nodes) | 95,869 | 517 | 91% | 100% | 99.5% |
Position beacons, node info, and telemetry are broadcast every 15-900 seconds by every node. In a 100-node network with managed flooding, this means thousands of TX per minute just for housekeeping. With cluster-distributor, the same information reaches everyone with 88-99% fewer transmissions — freeing the channel for actual messages and dramatically reducing battery drain.
EU868 mandates a 1% duty cycle — each node can transmit at most 36ms per 3.6s, or ~360ms per 36s. Here's how the budget works for a 25-node cluster:
| Traffic Type | Frequency | Payload | Airtime (SF12) | Budget Share |
|---|---|---|---|---|
| OGM beacon | 30-120s (adaptive) | 12 bytes | ~500ms | 0.4-1.7% |
| Broadcast (via distributor) | Per message | ~30 bytes | ~800ms | Per event |
| Unicast hop | Per message | ~50 bytes | ~1.2s | Per event |
| Border summary | Per OGM cycle | ~20 bytes | ~600ms | Shared with OGM |
At the adaptive 60s OGM interval (typical for moderate density), a node uses ~0.8% of its 1% duty cycle for maintenance traffic — leaving headroom for actual data. With managed flooding, a single broadcast storm can consume the entire duty budget of every node in the cluster.
Networks change constantly: nodes move, batteries drain, links degrade. System 5 maintains itself through several mechanisms:
Inspired by ant colony optimization: at each OGM cycle (adaptive interval, see Step 2), every cached route quality decays by 5% — Q ← 0.95 × Q.
Routes that are actually used get their quality refreshed from real link measurements. Routes that are never used gradually fade to zero and are eventually replaced. This ensures the routing table always reflects current network conditions, not stale historical data.
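The evaporation step in code form (a sketch; route entries here are plain dicts):

```python
DECAY = 0.95   # 5% quality decay per OGM cycle

def decay_routes(routes):
    """Pheromone-style evaporation: unused routes fade toward zero.
    Routes that carry traffic get refreshed from real link measurements
    before the next cycle, so only stale entries actually fade."""
    for route in routes.values():
        route["quality"] *= DECAY

# An unused route at quality 0.9 fades to ~0.32 within 20 cycles
# (0.9 × 0.95**20 ≈ 0.32) and is eventually replaced.
```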
After each message delivery attempt, the link qualities along the path actually used are updated from the observed outcome — successful hops refresh quality upward, failed hops push it down.
When a node discovers a new neighbor but its table is full (16 entries), it evicts the neighbor with the lowest link quality. This ensures the routing table always contains the best available connections.
Unlike Meshtastic's fixed 3-7 hop limit, System 5 uses a dynamic limit that scales with network size:
For a 100-node network: max_hops = √100 × 3 = 30. For a 1000-node network: max_hops = 40 (capped). This allows messages to traverse large networks without artificial limits, while preventing infinite loops.
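The dynamic limit as a sketch (rounding choice is ours; the text only gives √n × 3 with a cap of 40):

```python
import math

def max_hops(network_size):
    """Dynamic TTL: scales with sqrt(n), hard-capped at 40 to bound loops."""
    return min(round(math.sqrt(network_size)) * 3, 40)
```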
This is the single most important real-world constraint that simulators typically ignore. LoRa radios are half-duplex: a node cannot transmit while it is receiving.
Consider a mountaintop router at 2000 ft elevation that can hear 100 nodes. When a message floods through the network, dozens of those nodes rebroadcast it almost simultaneously; the half-duplex radio cannot decode the overlapping packets, and while it is busy (failing to) receive, it cannot transmit its own copy.
Result: 5% average network utilization becomes 50% channel utilization at the mountaintop, and messages fail to propagate beyond the first hop.
With directed routing, the mountaintop node receives one targeted packet (not 14 simultaneous rebroadcasts). It processes it, waits for a clear TX window, and forwards to the next hop. The radio state machine stays simple: receive, wait for a clear channel, transmit.
We built a Bay Area-style simulation with 235 nodes in 3 elevation tiers and half-duplex radio modeling. The results confirm the real-world observations:
| Scenario | Router | Delivery | Total TX |
|---|---|---|---|
| Bay Area without half-duplex | Managed Flood | 87.5% | 908,785 |
| Bay Area without half-duplex | System 5 | 80.5% | 47,094 |
| Bay Area with half-duplex | Managed Flood | 6.0% | 6,752 |
| Bay Area with half-duplex | System 5 | 77.5% | 540,780 |
Half-duplex collapses flooding from ~87% → ~6% delivery. System 5 holds at ~77%. Try the Bay Area scenario in the live simulator →
Real-world mesh devices have tight memory constraints. The nRF52840 (used in RAK4631 solar routers) has 256KB RAM, with ~64KB available after BLE and LoRa stacks. System 5 is designed to fit comfortably.
With distance-vector routing and compact route entries (12 bytes instead of full path caching at 410 bytes), memory requirements are dramatically lower than initially estimated:
| Structure | Per-Entry | Count | Total |
|---|---|---|---|
| Neighbor entry | 20 bytes | 16 max | 320 bytes |
| Route entry (dst + next-hop + quality + age + hops) | 12 bytes | 2 routes × N destinations | varies |
| Cluster metadata | ~25 bytes | 8 clusters | 200 bytes |
| Node own state | ~100 bytes | 1 | 100 bytes |
| Network Size | Destinations Tracked | Routing Memory | Fits nRF52 (64KB)? |
|---|---|---|---|
| 20 nodes (local) | 20 | ~1.1 KB | Yes (1.7%) |
| 100 nodes (city) | ~40 (cluster view) | ~1.6 KB | Yes (2.5%) |
| 500 nodes (regional) | ~50 (cluster + borders) | ~1.8 KB | Yes (2.8%) |
| 1000 nodes | ~70 (cluster + borders) | ~2.3 KB | Yes (3.6%) |
| 10,000 nodes | ~200 (cluster + borders) | ~5.4 KB | Yes (8.4%) |
The key insight: geo-clustering means a node only tracks its own cluster + border routes, and distance-vector means each route is just 12 bytes (destination + next-hop + metadata), not a full path. Even a 10,000-node network uses only ~5.4 KB for routing — leaving >90% of available RAM for the application.
An earlier version of this proposal estimated ~410 bytes per route entry (storing full multi-hop paths) and 5 routes per destination, leading to estimates of 30-84 KB for large networks. The distance-vector approach with 12-byte entries and 2 routes per destination reduces this by >95%. This was correctly flagged as a concern in community feedback — thank you.
Based on feedback from Bay Area Mesh operators, we built a simulation that models the actual network structure:
| Tier | Nodes | Elevation | Range | Terrain | Role |
|---|---|---|---|---|---|
| Mountain | 7 (3%) | 600-1200m | 45km | Free-space | Backbone routers (Mt Diablo, Mt Tam, etc.) |
| Hill/Rooftop | 35 (15%) | 150-500m | 10km | Suburban | Bridge between mountain and valley |
| Valley/Indoor | 193 (82%) | 0-100m | 0.75-2.5km | Urban/Indoor | End-user handhelds and indoor nodes |
Try this scenario in the Live Simulator → — select "Bay Area Mesh (235 nodes, 3-tier elevation)" from the dropdown.
Read the full Q&A with Bay Area Mesh community →
One of the most impactful v2.0 features, inspired by Bay Area Mesh feedback. The core insight: in a 235-node network, most valley nodes are redundant — their neighbors can all be reached via other paths. Every time these redundant nodes rebroadcast, they add collision noise at mountaintop receivers without contributing to message delivery.
silence_priority = redundancy × 0.6 + (1 − battery) × 0.4. Low-battery nodes are silenced first. Solar nodes (always 100%) stay active longest.

| Action | Silenced Node | Active Node |
|---|---|---|
| Receive OGMs | Yes — stays aware of network | Yes |
| Receive direct messages | Yes — can be addressed | Yes |
| Send own messages | Yes — can initiate | Yes |
| Rebroadcast flooding | No — stays silent | Yes |
| Send OGMs | No — saves airtime | Yes |
| Forward directed S5 packets | Yes — if on the computed path | Yes |
The same nodes can't be silenced forever — their batteries would last longer, but the active nodes would drain faster. System 5 therefore re-selects the silent set every 10 minutes, so previously silent nodes take their turn carrying traffic.
Result: every node spends roughly equal time active and silent. Battery drain is distributed evenly across the network.
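A sketch of the selection step, using the silence_priority formula from above. The top-fraction cutoff and the dict shape are our own illustration; rotation would simply re-run this every 10 minutes with refreshed redundancy and battery values.

```python
def silence_priority(redundancy, battery):
    """Higher score = silenced sooner (formula from the text)."""
    return redundancy * 0.6 + (1.0 - battery) * 0.4

def pick_silent_set(nodes, fraction):
    """Silence the top `fraction` of nodes ranked by silence_priority.

    nodes: {node_id: (redundancy, battery)}, both in [0, 1].
    """
    ranked = sorted(nodes, key=lambda n: silence_priority(*nodes[n]),
                    reverse=True)
    return set(ranked[: int(len(ranked) * fraction)])
```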
| Bay Area Scenario | S5 Delivery | S5 TX | Nodes Silenced |
|---|---|---|---|
| Without silencing | 77.5% | 540,780 | 0 |
| With silencing | 74.5% | 267,927 | 134 (57%) |
Breakdown: 128 valley nodes silenced, 6 hill nodes silenced, 0 mountain nodes silenced.
When all 5 cached routes fail (every hop was tried with retries), the original System 5 immediately triggered scoped corridor flooding. This was expensive — in the Bay Area scenario, 73 out of 200 messages triggered fallback floods.
The v2.0 improvement adds one more step before flooding: recompute a route on the current topology, excluding every node in the `failed_nodes` set. This is cheap (one BFS computation, no extra TX unless the path works) and often succeeds because the failed nodes were the actual problem — the rest of the network may have perfectly good paths.
Multi-path routing can deliver messages out of order. If messages A, B, C are sent via three different paths with different latencies, the receiver might see C, B, A — or worse, C, A (B lost on a failed path).
The v2.0 wire protocol adds a 2-byte sequence counter (`uint16_t seq`) per (source, destination) pair: the sender increments `seq` for every message to a given destination, and the receiver detects gaps in the numbering.
Why detect gaps instead of retransmitting automatically? Retransmission requires an ACK + retransmit cycle. On LoRa, each packet takes 500ms-2s of airtime. A 5-hop retransmit costs 5-10 seconds of channel time — during which the mountaintop is blocked from all other traffic. Sequence numbers provide gap detection at zero TX cost (just 2 extra bytes in the header). The app layer can decide whether to request a retransmit or simply show "1 message missing."
The sequence counters are stored efficiently in firmware: neighbor-indexed array for known neighbors (32 bytes) + LRU cache for others (96 bytes) = 128 bytes total, regardless of network size.
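Wrap-safe gap detection over a `uint16_t` counter is a one-liner; a sketch:

```python
SEQ_MOD = 1 << 16   # uint16_t wraps at 65536

def seq_gap(last_seen, received):
    """Number of messages missed between two sequence numbers, wrap-safe.
    Returns 0 when `received` directly follows `last_seen`."""
    return (received - last_seen - 1) % SEQ_MOD
```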
Every System 5 packet has a 22-byte header:
| Offset | Field | Size | Description |
|---|---|---|---|
| 0 | Version | 1 byte | Protocol version (currently 0x01) |
| 1 | Type | 1 byte | DATA (0x01), OGM (0x02), ACK (0x03), CLUSTER_ANNOUNCE (0x04) |
| 2 | Source ID | 4 bytes | Originator node ID |
| 6 | Destination ID | 4 bytes | Target node ID (0xFFFFFFFF = broadcast) |
| 10 | Packet ID | 4 bytes | Unique per-packet (for deduplication) |
| 14 | Hop Count | 1 byte | Current hop count (incremented each hop) |
| 15 | Max Hops | 1 byte | TTL — dynamic, based on √n × 3 |
| 16 | Priority | 1 byte | QoS priority (0 = SOS, 7 = lowest) |
| 17 | Flags | 1 byte | Fallback bit, route-request bit, etc. |
| 18 | Payload Length | 2 bytes | Payload size in bytes |
| 20 | Checksum | 2 bytes | CRC-16 of header + payload |
| 22+ | Payload | variable | Application data (max ~230 bytes for LoRa) |
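The header layout above packs into exactly 22 bytes. A serialization sketch (byte order from the offsets; the CRC-16 computation is omitted and passed in as a parameter here):

```python
import struct

# version, type, src, dst, pkt_id, hop, max_hops, priority, flags, length, crc
HEADER_FORMAT = ">BBIIIBBBBHH"    # big-endian, 22 bytes total

def pack_header(src, dst, pkt_id, payload, *, ptype=0x01, priority=4,
                max_hops=30, flags=0, crc=0):
    """Build a System 5 packet header; hop count starts at 0."""
    return struct.pack(HEADER_FORMAT, 0x01, ptype, src, dst, pkt_id,
                       0, max_hops, priority, flags, len(payload), crc)
```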
| Property | Naive Flood | Managed Flood | Next-Hop | System 5 |
|---|---|---|---|---|
| TX cost per message | O(n) | O(n) × 0.5 | O(hops)* | O(hops) |
| Works for broadcasts | Yes | Yes | No | Yes |
| Hop limit needed | Yes (3-7) | Yes (3-7) | Partially | No (dynamic) |
| Multi-path failover | No | No | No | Primary + backup |
| Load balancing | No | No | No | Weighted |
| Congestion avoidance | No | No | No | Backpressure |
| QoS priority | No | No | No | 8 levels + NHS gate |
| Half-duplex resilient | No | No | Partially | Yes |
| GPS required | No | No | No | Yes** |
| Memory overhead | Minimal | Minimal | Low | Low (~1.5-5.4 KB) |
* Next-hop only works for direct messages after a learning flood. First message still floods.
** GPS required for clustering, but RSSI triangulation and cluster inheritance provide fallbacks.