A complete technical walkthrough of MeshRoute's geo-clustered, multi-path, load-balanced routing protocol — from boot to delivery.
Every routing protocol answers one question: when node A wants to send a message to node Z, which intermediate nodes should carry it?
Current Meshtastic uses managed flooding: every node rebroadcasts every message, hoping it reaches the destination. This works for small networks but creates a bandwidth explosion at scale — 100 nodes means ~100 transmissions per message.
System 5 takes a fundamentally different approach: nodes self-organize into geographic clusters, discover multi-hop routes, and send each message along a specific computed path. The cost per message is proportional to the number of hops, not the number of nodes.
The following sections explain every step in detail.
When a System 5 node powers up, it doesn't know anything about the network yet. Here's the exact sequence:
"u33d"). The first 4 characters define the node's cluster. Nodes with the same 4-char prefix are in the same cluster.A geohash encodes a GPS coordinate into a string where shared prefixes mean geographic proximity. Two nodes with geohash "u33d8" and "u33d9" share the prefix "u33d" — they're in the same cluster. A node with "u33e1" is in a neighboring cluster. At 4-character precision, each cluster covers roughly 40km × 20km.
Each node periodically broadcasts an OGM (Originator Message) — a lightweight packet (~12 bytes) containing:
| Field | Size | Purpose |
|---|---|---|
| Node ID | 4 bytes | Unique identifier (derived from hardware MAC) |
| Cluster ID | 4 bytes | Geohash prefix of this node's cluster |
| Battery | 1 byte | Current charge level (0-100%) |
| Neighbor count | 1 byte | How many neighbors this node currently has |
| Flags | 1 byte | Border node, distributor role, S5 capability |
| Best reachable | 1 byte | Compact hop-count summary for distance-vector updates |
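A sketch of how the 12-byte OGM could be serialized, with field order following the table above. The big-endian wire layout shown here is an assumption for illustration, not the published encoding.

```python
import struct

# node_id:4  cluster_id:4  battery:1  neighbor_count:1  flags:1  best_reachable:1
OGM_FORMAT = ">IIBBBB"   # big-endian, 12 bytes total

def pack_ogm(node_id, cluster_id, battery, neighbors, flags, best_reachable):
    """Serialize one OGM beacon into its 12-byte wire form."""
    return struct.pack(OGM_FORMAT, node_id, cluster_id,
                       battery, neighbors, flags, best_reachable)

def unpack_ogm(payload):
    """Parse a received OGM back into its six fields."""
    return struct.unpack(OGM_FORMAT, payload)
```

The cluster ID fits in 4 bytes because it is just the 4-character geohash prefix (e.g. `"u33d"` as the ASCII bytes `0x75333364`).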
OGMs are not sent at a fixed 30s interval. The interval adapts to network density and EU868 duty cycle constraints:
| Neighbors | OGM Interval | Rationale |
|---|---|---|
| ≤ 8 | 30 s | Sparse network — fast discovery needed |
| 9–20 | 60 s | Moderate density — balance discovery vs. airtime |
| 21–40 | 120 s | Dense network — neighbors are stable, save airtime |
| > 40 | 180 s | Very dense — airtime is the bottleneck |
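The adaptive interval is a direct lookup on neighbor count; a minimal sketch of the table above:

```python
def ogm_interval(neighbor_count):
    """Adaptive OGM beacon interval in seconds, from the density table."""
    if neighbor_count <= 8:
        return 30    # sparse network: fast discovery needed
    if neighbor_count <= 20:
        return 60    # moderate density: balance discovery vs. airtime
    if neighbor_count <= 40:
        return 120   # dense: neighbors are stable, save airtime
    return 180       # very dense: airtime is the bottleneck
```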
OGMs stay cluster-local (1 hop only) — they are never rebroadcast. This means a 25-node cluster generates 25 OGMs per interval, not 25 × network-size. At SF12 (~500ms per 12-byte OGM), that is ~12.5s of aggregate airtime per cycle. A 500ms OGM every 30s is ~1.7% of a node's airtime — slightly above EU868's 1% duty cycle (360ms per 36s per node) — which is exactly why the adaptive interval backs off to 60s and beyond as density grows.
When a node receives an OGM, it creates or refreshes the sender's entry in its neighbor table — node ID, measured link quality, battery, cluster ID, and a last-seen timestamp.
Memory constraint. Each neighbor entry costs ~20 bytes (node ID, link quality, battery, cluster ID, last-seen timestamp). On nRF52 devices with 256KB RAM, 16 neighbors × 20 bytes = 320 bytes — trivial. If a node hears more than 16 neighbors, it evicts the one with the lowest link quality. This is fine because routing only needs the best neighbors, not all of them.
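A sketch of the bounded neighbor table with lowest-quality eviction. A dict is used for clarity; a firmware implementation would use a fixed 16-slot array.

```python
MAX_NEIGHBORS = 16

def update_neighbor(table, node_id, link_quality, battery, cluster_id, now):
    """Insert or refresh a neighbor entry; evict the weakest link when full."""
    entry = {"link_q": link_quality, "battery": battery,
             "cluster": cluster_id, "last_seen": now}
    if node_id in table or len(table) < MAX_NEIGHBORS:
        table[node_id] = entry
        return True
    # Table full: keep only the best links — find our weakest neighbor.
    worst = min(table, key=lambda n: table[n]["link_q"])
    if link_quality > table[worst]["link_q"]:
        del table[worst]          # new neighbor is better than our weakest
        table[node_id] = entry
        return True
    return False                  # table already full of better links; ignore
```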
A critical detail: the link from A→B can have different quality than B→A. A mountaintop node with clear line-of-sight might send to a valley node with quality 0.95, but the valley node (surrounded by buildings) can only reach back with quality 0.3. System 5 tracks both directions independently.
After a few OGM rounds (~1-2 minutes), nodes have discovered their neighbors. Now the network self-organizes into geographic clusters.
There is no central coordinator. Each node independently derives its cluster ID from its own geohash prefix and observes which of its neighbors share it.
This is completely decentralized — no node needs to "know" the full network. If you turn on a new node in Munich, it computes its geohash ("u281"), and it's automatically part of the Munich cluster. It doesn't need permission or coordination.
A border node is any node that has neighbors in a different cluster. Border nodes are critical — they're the bridges for inter-cluster routing.
System 5 limits bridge links to 2 per cluster pair to prevent route explosion. The two strongest links between each pair of adjacent clusters are selected.
Each node only needs to know its own cluster's members plus the border nodes that lead toward adjacent clusters.
A node in San Francisco doesn't need to know every individual node in Oakland. It just needs to know: "to reach Oakland, route through border node #47 on the Bay Bridge ridge."
Once clusters and border nodes are known, each node builds a routing table with up to 2 routes (primary + backup) to every reachable destination in its cluster view.
Routes are not computed by running a graph algorithm on the node. Instead, they emerge incrementally from OGM data — similar to how RIP and B.A.T.M.A.N. build their routing tables: each incoming OGM's "best reachable" summary is compared against the cached route for that destination, and the cached entry is replaced whenever the advertised alternative is better.
Compute cost: Distance-vector requires one table-lookup per incoming OGM — a single comparison + write. BFS would require O(V+E) graph traversal per destination, which is unnecessary overhead on a 64 MHz nRF52840. The simulator uses BFS for clarity, but a real firmware implementation would use distance-vector.
Memory cost: A route entry needs only dst_id (4B) + next_hop (4B) + quality (1B) + age (2B) + hop_count (1B) = 12 bytes. Two routes per destination at 12 bytes each is far smaller than full path caching.
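A sketch of the distance-vector relaxation described above, assuming each OGM's "best reachable" summary can be decoded into (destination → hop count, quality) pairs; the exact summary encoding is not specified here.

```python
from dataclasses import dataclass

@dataclass
class Route:
    next_hop: int
    quality: float   # product of per-hop link qualities
    hops: int

def on_ogm(routes, neighbor_id, link_q, advertised):
    """One table lookup + compare per advertised destination — no graph traversal.

    routes:     {dst_id: Route} — this node's cached best routes
    advertised: {dst_id: (hops, quality)} decoded from the neighbor's OGM
    """
    for dst, (hops, q) in advertised.items():
        # Quality through this neighbor = our link to them × their route quality.
        candidate = Route(next_hop=neighbor_id, quality=link_q * q, hops=hops + 1)
        current = routes.get(dst)
        if current is None or candidate.quality > current.quality:
            routes[dst] = candidate
```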
Route quality is the estimated end-to-end delivery probability — the product of the per-hop link qualities, Q(route) = q₁ × q₂ × … × qₙ — computed incrementally as each hop's OGM propagates quality information:
This naturally penalizes long paths (more hops = more quality multiplications < 1.0) and paths with weak links.
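The multiplicative penalty in numbers (link qualities here are illustrative):

```python
from math import prod

def route_quality(link_qualities):
    """End-to-end delivery probability = product of per-hop qualities."""
    return prod(link_qualities)

# Three good hops compound to less than one mediocre hop:
three_good = route_quality([0.9, 0.9, 0.9])   # 0.729
one_weak   = route_quality([0.8])             # 0.8
```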
| Component | Per-Entry | Count | Total |
|---|---|---|---|
| Route entry | 12 bytes | 2 routes × 35 destinations | 840 bytes |
| Neighbor table | 20 bytes | 16 neighbors | 320 bytes |
| Cluster metadata | ~25 bytes | 8 clusters | 200 bytes |
| Node state | ~100 bytes | 1 | 100 bytes |
| Total | | | ~1.5 KB |
Even with 70 destinations (large cluster view), total memory is ~2 KB — well within nRF52's ~64 KB usable RAM.
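The table's arithmetic as a one-function sketch (entry sizes from the table; the function name is our own):

```python
ROUTE_ENTRY = 12      # bytes: dst + next_hop + quality + age + hop_count
NEIGHBOR_ENTRY = 20
CLUSTER_META = 25
NODE_STATE = 100

def routing_memory_bytes(destinations, routes_per_dst=2,
                         neighbors=16, clusters=8):
    """Total routing-state footprint for a given cluster view."""
    return (ROUTE_ENTRY * routes_per_dst * destinations
            + NEIGHBOR_ENTRY * neighbors
            + CLUSTER_META * clusters
            + NODE_STATE)
```

With 35 destinations this gives 1,460 bytes (~1.5 KB), and 70 destinations gives 2,300 bytes (~2.3 KB), matching the figures above.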
When node A wants to send a message to node Z, this is the exact decision process:
1. Priority gate: is this message's priority high enough for the current network health? (See Step 6.)
2. Route lookup: fetch the cached routes to destination Z — typically 2 (primary + backup).
3. Route pruning: remove routes where any intermediate node has died (battery = 0) or any link is broken.
For each surviving route, compute a weight that balances three factors:
| Factor | Weight | What It Measures | Why It Matters |
|---|---|---|---|
| Q(r) — Quality | 0.4 (40%) | Accumulated route quality from distance-vector (product of per-hop qualities) | Higher = fewer packet losses, fewer retries needed |
| 1−Load — Spare capacity | 0.35 (35%) | Queue utilization of the next-hop node (from its last OGM) | Avoid routing through a congested next hop |
| Batt — Battery | 0.25 (25%) | Battery level of the next-hop node (from its last OGM) | Don't drain low-battery neighbors; prefer nodes on mains/solar power |
On LoRa networks, nodes only have reliable data about their direct neighbors (via OGMs). "Path-wide" battery or load information would require multi-hop propagation of per-node state — creating exactly the extra traffic that clustering is designed to avoid. Next-hop metrics are locally observable, always fresh, and add zero overhead.
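Putting the three factors together, with the weights from the table (the linear combination is our reading of the design):

```python
def route_weight(quality, next_hop_load, next_hop_battery):
    """Weighted score used for probabilistic route selection.

    All three inputs are in [0, 1]: quality from distance-vector,
    load and battery from the next hop's last OGM.
    """
    return (0.40 * quality                   # end-to-end delivery probability
            + 0.35 * (1.0 - next_hop_load)   # spare capacity at the next hop
            + 0.25 * next_hop_battery)       # don't drain weak neighbors
```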
This is a critical design choice. System 5 does not always pick the best route. It selects routes probabilistically, proportional to their weight:
Example: if route 1 has weight 0.8 and route 2 has weight 0.4, route 1 is selected with probability 0.8 / 1.2 ≈ 67% and route 2 with probability 0.4 / 1.2 ≈ 33%.
This keeps secondary routes "warm" — traffic occasionally flows through them, so the network knows they still work. If route 1 fails, route 2 is immediately available with recent quality data.
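Weight-proportional selection is one line in Python; a sketch of the choice step:

```python
import random

def select_route(routes, weights):
    """Pick one route with probability proportional to its weight.
    Losing routes still get occasional traffic, keeping them 'warm'."""
    return random.choices(routes, weights=weights, k=1)[0]
```

Over many sends with weights [0.8, 0.4], route 1 carries roughly two-thirds of the traffic and route 2 one-third.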
The message is sent along the selected path, one hop at a time.
Links with quality > 0.5 get 3 retries (likely to succeed quickly). Links with quality ≤ 0.5 get 5 retries (need more attempts). This balances delivery probability against airtime cost.
When an intermediate node's queue is filling up, System 5 applies gradual backpressure:
| Queue Load | Action |
|---|---|
| < 80% | Normal operation — route weight unaffected |
| 80–95% | Route weight penalized (× 0.8–0.95) — traffic shifts to alternatives |
| > 95% | Route fully blocked — no new traffic routed through this node |
This prevents cascading overload: when a node starts getting congested, traffic naturally redistributes to other paths before the node drops packets.
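The backpressure table as a weight multiplier. The linear 0.95 → 0.80 ramp between 80% and 95% load is an assumption for illustration; the protocol only specifies the × 0.8–0.95 penalty range.

```python
def load_factor(queue_load):
    """Multiplier applied to a route's weight based on next-hop queue load."""
    if queue_load > 0.95:
        return 0.0                          # fully blocked: no new traffic
    if queue_load >= 0.80:
        return 0.95 - (queue_load - 0.80)   # 0.95 at 80% load -> 0.80 at 95%
    return 1.0                              # normal operation
```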
Not all messages are equal. System 5 uses a Network Health Score (NHS) per cluster to throttle low-priority traffic when the network is stressed.
NHS is a value from 0.0 to 1.0 representing the cluster's health as seen by this node. It requires no extra traffic — it's computed entirely from data already present in received OGMs: neighbor battery levels, queue load, link quality, and how many neighbors are still being heard.
NHS is not a global cluster metric — each node computes its own local view. This avoids any extra polling or cross-node aggregation traffic.
| NHS Range | Network State | Allowed Priorities | Example |
|---|---|---|---|
| 0.8 – 1.0 | Healthy | All (0–7) | Everything gets through |
| 0.6 – 0.8 | Moderate | 0–5 only | Low-priority telemetry throttled |
| 0.4 – 0.6 | Degraded | 0–3 only | Only important messages |
| 0.2 – 0.4 | Critical | 0–1 only | Emergency/SOS only |
| < 0.2 | Collapsed | 0 only | SOS messages only |
Priority 0 = SOS/Emergency — always gets through, regardless of network state. This ensures that in a disaster scenario, critical messages are never blocked by routine telemetry.
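The NHS gate from the table above as a sketch:

```python
def max_allowed_priority(nhs):
    """Highest priority number admitted at a given Network Health Score
    (0 = SOS/emergency ... 7 = lowest-priority telemetry)."""
    if nhs >= 0.8:
        return 7    # healthy: everything gets through
    if nhs >= 0.6:
        return 5    # moderate: low-priority telemetry throttled
    if nhs >= 0.4:
        return 3    # degraded: only important messages
    if nhs >= 0.2:
        return 1    # critical: emergency/SOS only
    return 0        # collapsed: SOS only

def admit(priority, nhs):
    """Priority 0 (SOS) always passes since max_allowed_priority >= 0."""
    return priority <= max_allowed_priority(nhs)
```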
If all cached routes fail (after retries on each), System 5 doesn't give up. It falls back to scoped cluster flooding — a targeted mini-flood along the corridor between source and destination.
This is dramatically cheaper than full-network flooding. In a 500-node network with 10 clusters, a full flood touches all 500 nodes. A scoped corridor flood touches ~100 nodes (2 full clusters + border nodes).
When the corridor flood triggers frequently (as in the Bay Area half-duplex scenario), it can still generate high TX counts. This is the #1 optimization target: try more alternative directed routes before triggering the corridor flood.
The previous steps describe unicast routing (one sender, one receiver). But in Meshtastic, ~98% of traffic is broadcast — position beacons, node info, telemetry. This is where the biggest efficiency gains are.
In managed flooding, a single broadcast message triggers O(n) transmissions — every node that receives it rebroadcasts to all neighbors. In a 200-node network, one position beacon generates hundreds of transmissions, saturating the channel.
System 5 replaces blind broadcast flooding with structured wave propagation through elected cluster distributors:
Each cluster elects one distributor node — a valley-level node chosen for high local connectivity (it can reach most cluster members directly) and low cross-cluster leakage (its transmissions don't bleed into neighboring clusters and cause collisions there).
When a node wants to broadcast, it sends the message via unicast to its cluster's distributor — 1-3 hops, ~1-3 TX. No flooding.
The distributor performs a mini-flood within its cluster only. Since clusters are small (20-30 nodes), this is cheap: ~20-30 TX to reach all cluster members.
Border nodes that receive the message relay it to the distributor of the adjacent cluster via unicast. The wave propagates cluster by cluster until all clusters are covered.
| Scenario | Managed Flood TX | Cluster-Dist TX | MF Reach | CD Reach | TX Savings |
|---|---|---|---|---|---|
| Medium (50 nodes) | 806 | 97 | 81% | 74% | 88% |
| Large (100 nodes) | ~1,500 | ~180 | 75% | 80% | 88% |
| Bay Area (235 nodes) | 4,301 | 220 | 90% | 96% | 95% |
| Regional (500 nodes) | 95,869 | 517 | 91% | 100% | 99.5% |
Position beacons, node info, and telemetry are broadcast every 15-900 seconds by every node. In a 100-node network with managed flooding, this means thousands of TX per minute just for housekeeping. With cluster-distributor, the same information reaches everyone with 88-99% fewer transmissions — freeing the channel for actual messages and dramatically reducing battery drain.
EU868 mandates a 1% duty cycle — each node can transmit at most 36ms per 3.6s, or ~360ms per 36s. Here's how the budget works for a 25-node cluster:
| Traffic Type | Frequency | Payload | Airtime (SF12) | Budget Share |
|---|---|---|---|---|
| OGM beacon | 30-120s (adaptive) | 12 bytes | ~500ms | 0.4-1.7% |
| Broadcast (via distributor) | Per message | ~30 bytes | ~800ms | Per event |
| Unicast hop | Per message | ~50 bytes | ~1.2s | Per event |
| Border summary | Per OGM cycle | ~20 bytes | ~600ms | Shared with OGM |
At the adaptive 60s OGM interval (typical for moderate density), a node uses ~0.8% of its 1% duty cycle for maintenance traffic — leaving headroom for actual data. With managed flooding, a single broadcast storm can consume the entire duty budget of every node in the cluster.
Networks change constantly: nodes move, batteries drain, links degrade. System 5 maintains itself through several mechanisms:
Inspired by ant colony optimization: at each OGM cycle (adaptive interval, see Step 2), every cached route quality decays by 5% — Q ← 0.95 × Q.
Routes that are actually used get their quality refreshed from real link measurements. Routes that are never used gradually fade to zero and are eventually replaced. This ensures the routing table always reflects current network conditions, not stale historical data.
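The evaporation step in code form (a sketch; route entries here are plain dicts):

```python
DECAY = 0.95   # 5% quality decay per OGM cycle

def decay_routes(routes):
    """Pheromone-style evaporation: unused routes fade toward zero.
    Routes that carry traffic get refreshed from real link measurements
    before the next cycle, so only stale entries actually fade."""
    for route in routes.values():
        route["quality"] *= DECAY

# An unused route at quality 0.9 fades to ~0.32 within 20 cycles
# (0.9 × 0.95**20 ≈ 0.32) and is eventually replaced.
```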
After each message delivery attempt, the link qualities along the path actually used are updated from the observed outcome — successful hops refresh quality upward, failed hops push it down.
When a node discovers a new neighbor but its table is full (16 entries), it evicts the neighbor with the lowest link quality. This ensures the routing table always contains the best available connections.
Unlike Meshtastic's fixed 3-7 hop limit, System 5 uses a dynamic limit that scales with network size:
For a 100-node network: max_hops = √100 × 3 = 30. For a 1000-node network: max_hops = 40 (capped). This allows messages to traverse large networks without artificial limits, while preventing infinite loops.
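The dynamic limit as a sketch (rounding choice is ours; the text only gives √n × 3 with a cap of 40):

```python
import math

def max_hops(network_size):
    """Dynamic TTL: scales with sqrt(n), hard-capped at 40 to bound loops."""
    return min(round(math.sqrt(network_size)) * 3, 40)
```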
This is the single most important real-world constraint that simulators typically ignore. LoRa radios are half-duplex: a node cannot transmit while it is receiving.
Consider a mountaintop router at 2000 ft elevation that can hear 100 nodes. When a message floods through the network, dozens of those nodes rebroadcast it almost simultaneously; the half-duplex radio cannot decode the overlapping packets, and while it is busy (failing to) receive, it cannot transmit its own copy.
Result: 5% average network utilization becomes 50% channel utilization at the mountaintop, and messages fail to propagate beyond the first hop.
With directed routing, the mountaintop node receives one targeted packet (not 14 simultaneous rebroadcasts). It processes it, waits for a clear TX window, and forwards to the next hop. The radio state machine stays simple: receive, wait for a clear channel, transmit.
We built a Bay Area-style simulation with 235 nodes in 3 elevation tiers and half-duplex radio modeling. The results confirm the real-world observations:
| Scenario | Router | Delivery | Total TX |
|---|---|---|---|
| Bay Area without half-duplex | Managed Flood | 87.5% | 908,785 |
| Bay Area without half-duplex | System 5 | 80.5% | 47,094 |
| Bay Area with half-duplex | Managed Flood | 6.0% | 6,752 |
| Bay Area with half-duplex | System 5 | 77.5% | 540,780 |
Half-duplex collapses flooding from ~87% → ~6% delivery. System 5 holds at ~77%. Try the Bay Area scenario in the live simulator →
Real-world mesh devices have tight memory constraints. The nRF52840 (used in RAK4631 solar routers) has 256KB RAM, with ~64KB available after BLE and LoRa stacks. System 5 is designed to fit comfortably.
With distance-vector routing and compact route entries (12 bytes instead of full path caching at 410 bytes), memory requirements are dramatically lower than initially estimated:
| Structure | Per-Entry | Count | Total |
|---|---|---|---|
| Neighbor entry | 20 bytes | 16 max | 320 bytes |
| Route entry (dst + next-hop + quality + age + hops) | 12 bytes | 2 routes × N destinations | varies |
| Cluster metadata | ~25 bytes | 8 clusters | 200 bytes |
| Node own state | ~100 bytes | 1 | 100 bytes |
| Network Size | Destinations Tracked | Routing Memory | Fits nRF52 (64KB)? |
|---|---|---|---|
| 20 nodes (local) | 20 | ~1.1 KB | Yes (1.7%) |
| 100 nodes (city) | ~40 (cluster view) | ~1.6 KB | Yes (2.5%) |
| 500 nodes (regional) | ~50 (cluster + borders) | ~1.8 KB | Yes (2.8%) |
| 1000 nodes | ~70 (cluster + borders) | ~2.3 KB | Yes (3.6%) |
| 10,000 nodes | ~200 (cluster + borders) | ~5.4 KB | Yes (8.4%) |
The key insight: geo-clustering means a node only tracks its own cluster + border routes, and distance-vector means each route is just 12 bytes (destination + next-hop + metadata), not a full path. Even a 10,000-node network uses only ~5.4 KB for routing — leaving >90% of available RAM for the application.
An earlier version of this proposal estimated ~410 bytes per route entry (storing full multi-hop paths) and 5 routes per destination, leading to estimates of 30-84 KB for large networks. The distance-vector approach with 12-byte entries and 2 routes per destination reduces this by >95%. This was correctly flagged as a concern in community feedback — thank you.
Based on feedback from Bay Area Mesh operators, we built a simulation that models the actual network structure:
| Tier | Nodes | Elevation | Range | Terrain | Role |
|---|---|---|---|---|---|
| Mountain | 7 (3%) | 600-1200m | 45km | Free-space | Backbone routers (Mt Diablo, Mt Tam, etc.) |
| Hill/Rooftop | 35 (15%) | 150-500m | 10km | Suburban | Bridge between mountain and valley |
| Valley/Indoor | 193 (82%) | 0-100m | 0.75-2.5km | Urban/Indoor | End-user handhelds and indoor nodes |
Try this scenario in the Live Simulator → — select "Bay Area Mesh (235 nodes, 3-tier elevation)" from the dropdown.
Read the full Q&A with Bay Area Mesh community →
One of the most impactful v2.0 features, inspired by Bay Area Mesh feedback. The core insight: in a 235-node network, most valley nodes are redundant — their neighbors can all be reached via other paths. Every time these redundant nodes rebroadcast, they add collision noise at mountaintop receivers without contributing to message delivery.
silence_priority = redundancy × 0.6 + (1 − battery) × 0.4. Low-battery nodes are silenced first. Solar nodes (always 100%) stay active longest.

| Action | Silenced Node | Active Node |
|---|---|---|
| Receive OGMs | Yes — stays aware of network | Yes |
| Receive direct messages | Yes — can be addressed | Yes |
| Send own messages | Yes — can initiate | Yes |
| Rebroadcast flooding | No — stays silent | Yes |
| Send OGMs | No — saves airtime | Yes |
| Forward directed S5 packets | Yes — if on the computed path | Yes |
The same nodes can't be silenced forever — their batteries would last longer, but the active nodes would drain faster. System 5 therefore re-selects the silent set every 10 minutes, so previously silent nodes take their turn carrying traffic.
Result: every node spends roughly equal time active and silent. Battery drain is distributed evenly across the network.
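A sketch of the selection step, using the silence_priority formula from above. The top-fraction cutoff and the dict shape are our own illustration; rotation would simply re-run this every 10 minutes with refreshed redundancy and battery values.

```python
def silence_priority(redundancy, battery):
    """Higher score = silenced sooner (formula from the text)."""
    return redundancy * 0.6 + (1.0 - battery) * 0.4

def pick_silent_set(nodes, fraction):
    """Silence the top `fraction` of nodes ranked by silence_priority.

    nodes: {node_id: (redundancy, battery)}, both in [0, 1].
    """
    ranked = sorted(nodes, key=lambda n: silence_priority(*nodes[n]),
                    reverse=True)
    return set(ranked[: int(len(ranked) * fraction)])
```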
| Bay Area Scenario | S5 Delivery | S5 TX | Nodes Silenced |
|---|---|---|---|
| Without silencing | 77.5% | 540,780 | 0 |
| With silencing | 74.5% | 267,927 | 134 (57%) |
Breakdown: 128 valley nodes silenced, 6 hill nodes silenced, 0 mountain nodes silenced.
When all 5 cached routes fail (every hop was tried with retries), the original System 5 immediately triggered scoped corridor flooding. This was expensive — in the Bay Area scenario, 73 out of 200 messages triggered fallback floods.
The v2.0 improvement adds one more step before flooding: recompute a route on the current topology, excluding every node in the `failed_nodes` set. This is cheap (one BFS computation, no extra TX unless the path works) and often succeeds because the failed nodes were the actual problem — the rest of the network may have perfectly good paths.
Multi-path routing can deliver messages out of order. If messages A, B, C are sent via three different paths with different latencies, the receiver might see C, B, A — or worse, C, A (B lost on a failed path).
The v2.0 wire protocol adds a 2-byte sequence counter (`uint16_t seq`) per (source, destination) pair: the sender increments `seq` for every message to a given destination, and the receiver detects gaps in the numbering.
Why detect gaps instead of retransmitting automatically? Retransmission requires an ACK + retransmit cycle. On LoRa, each packet takes 500ms-2s of airtime. A 5-hop retransmit costs 5-10 seconds of channel time — during which the mountaintop is blocked from all other traffic. Sequence numbers provide gap detection at zero TX cost (just 2 extra bytes in the header). The app layer can decide whether to request a retransmit or simply show "1 message missing."
The sequence counters are stored efficiently in firmware: neighbor-indexed array for known neighbors (32 bytes) + LRU cache for others (96 bytes) = 128 bytes total, regardless of network size.
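Wrap-safe gap detection over a `uint16_t` counter is a one-liner; a sketch:

```python
SEQ_MOD = 1 << 16   # uint16_t wraps at 65536

def seq_gap(last_seen, received):
    """Number of messages missed between two sequence numbers, wrap-safe.
    Returns 0 when `received` directly follows `last_seen`."""
    return (received - last_seen - 1) % SEQ_MOD
```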
Every System 5 packet has a 22-byte header:
| Offset | Field | Size | Description |
|---|---|---|---|
| 0 | Version | 1 byte | Protocol version (currently 0x01) |
| 1 | Type | 1 byte | DATA (0x01), OGM (0x02), ACK (0x03), CLUSTER_ANNOUNCE (0x04) |
| 2 | Source ID | 4 bytes | Originator node ID |
| 6 | Destination ID | 4 bytes | Target node ID (0xFFFFFFFF = broadcast) |
| 10 | Packet ID | 4 bytes | Unique per-packet (for deduplication) |
| 14 | Hop Count | 1 byte | Current hop count (incremented each hop) |
| 15 | Max Hops | 1 byte | TTL — dynamic, based on √n × 3 |
| 16 | Priority | 1 byte | QoS priority (0 = SOS, 7 = lowest) |
| 17 | Flags | 1 byte | Fallback bit, route-request bit, etc. |
| 18 | Payload Length | 2 bytes | Payload size in bytes |
| 20 | Checksum | 2 bytes | CRC-16 of header + payload |
| 22+ | Payload | variable | Application data (max ~230 bytes for LoRa) |
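The header layout above packs into exactly 22 bytes. A serialization sketch (byte order from the offsets; the CRC-16 computation is omitted and passed in as a parameter here):

```python
import struct

# version, type, src, dst, pkt_id, hop, max_hops, priority, flags, length, crc
HEADER_FORMAT = ">BBIIIBBBBHH"    # big-endian, 22 bytes total

def pack_header(src, dst, pkt_id, payload, *, ptype=0x01, priority=4,
                max_hops=30, flags=0, crc=0):
    """Build a System 5 packet header; hop count starts at 0."""
    return struct.pack(HEADER_FORMAT, 0x01, ptype, src, dst, pkt_id,
                       0, max_hops, priority, flags, len(payload), crc)
```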
| Property | Naive Flood | Managed Flood | Next-Hop | System 5 |
|---|---|---|---|---|
| TX cost per message | O(n) | O(n) × 0.5 | O(hops)* | O(hops) |
| Works for broadcasts | Yes | Yes | No | Yes |
| Hop limit needed | Yes (3-7) | Yes (3-7) | Partially | No (dynamic) |
| Multi-path failover | No | No | No | Primary + backup |
| Load balancing | No | No | No | Weighted |
| Congestion avoidance | No | No | No | Backpressure |
| QoS priority | No | No | No | 8 levels + NHS gate |
| Half-duplex resilient | No | No | Partially | Yes |
| GPS required | No | No | No | Yes** |
| Memory overhead | Minimal | Minimal | Low | Low (~1.5-5.4 KB) |
* Next-hop only works for direct messages after a learning flood. First message still floods.
** GPS required for clustering, but RSSI triangulation and cluster inheritance provide fallbacks.