Deep Dive — Every Layer Explained

How System 5 Works

A complete technical walkthrough of MeshRoute's geo-clustered, multi-path, load-balanced routing protocol — from boot to delivery.

The Big Picture

Every routing protocol answers one question: when node A wants to send a message to node Z, which intermediate nodes should carry it?

Current Meshtastic uses managed flooding: every node rebroadcasts every message, hoping it reaches the destination. This works for small networks but creates a bandwidth explosion at scale — 100 nodes means ~100 transmissions per message.

System 5 takes a fundamentally different approach: nodes self-organize into geographic clusters, discover multi-hop routes, and send each message along a specific computed path. The cost per message is proportional to the number of hops, not the number of nodes.

Managed Flooding (Meshtastic today):
  A sends → every node rebroadcasts → Z receives (eventually)
  Cost: O(n) transmissions per message — scales with network size

System 5 (MeshRoute proposal):
  A sends → B → D → G → Z (computed path)
  Cost: O(hops) transmissions per message — scales with distance, not size

The following sections explain every step in detail.

Step 1: Boot — What Happens When a Node Turns On

When a System 5 node powers up, it doesn't know anything about the network yet. Here's the exact sequence:

  1. Hardware init: LoRa radio starts listening on the configured frequency (868 MHz in the EU). GPS module begins acquiring satellites (if available).
  2. Position acquisition: The node tries three methods in order:
    • Real GPS — best option, accurate to ~5m (T-Beam, RAK4631 with GPS module)
    • RSSI triangulation — if no GPS but 3+ neighbors with known positions, estimate position from signal strength
    • Cluster inheritance — if neither works, adopt the cluster ID of the strongest neighbor
  3. Geohash computation: GPS coordinates are converted to a geohash string (e.g., "u33d"). The first 4 characters define the node's cluster. Nodes with the same 4-char prefix are in the same cluster.
  4. OGM broadcast: The node sends its first Originator Message (OGM) — a small packet announcing: "I exist, here's my position, battery level, and cluster ID."
What is a Geohash?

A geohash encodes a GPS coordinate into a string where shared prefixes mean geographic proximity. Two nodes with geohash "u33d8" and "u33d9" share the prefix "u33d" — they're in the same cluster. A node with "u33e1" is in a neighboring cluster. At 4-character precision, each cluster covers roughly 40km × 20km.
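
To make the prefix idea concrete, here is a minimal sketch of standard geohash encoding (interleaved longitude/latitude bisection bits, 5 bits per base-32 character) with the 4-character cluster prefix. This is illustrative C++, not MeshRoute firmware — names and the heap-allocated string are ours; real firmware would write into a fixed buffer.

```cpp
#include <cstdio>
#include <string>

static const char GEOHASH32[] = "0123456789bcdefghjkmnpqrstuvwxyz";

std::string geohash(double lat, double lon, int precision) {
    double latLo = -90.0, latHi = 90.0, lonLo = -180.0, lonHi = 180.0;
    std::string out;
    int bit = 0, ch = 0;
    bool evenBit = true;  // even bits refine longitude, odd bits latitude
    while ((int)out.size() < precision) {
        if (evenBit) {
            double mid = (lonLo + lonHi) / 2;
            if (lon >= mid) { ch = (ch << 1) | 1; lonLo = mid; }
            else            { ch = (ch << 1);     lonHi = mid; }
        } else {
            double mid = (latLo + latHi) / 2;
            if (lat >= mid) { ch = (ch << 1) | 1; latLo = mid; }
            else            { ch = (ch << 1);     latHi = mid; }
        }
        evenBit = !evenBit;
        if (++bit == 5) { out += GEOHASH32[ch]; bit = 0; ch = 0; }  // emit one base-32 char
    }
    return out;
}

int main() {
    // Munich city center -> the 4-character prefix is the cluster ID ("u281...").
    std::string gh = geohash(48.137, 11.575, 5);
    printf("geohash: %s  cluster: %s\n", gh.c_str(), gh.substr(0, 4).c_str());
}
```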

Step 2: Neighbor Discovery via OGM

Each node periodically broadcasts an OGM (Originator Message) — a lightweight packet (~12 bytes) containing:

| Field          | Size    | Purpose                                                |
|----------------|---------|--------------------------------------------------------|
| Node ID        | 4 bytes | Unique identifier (derived from hardware MAC)          |
| Cluster ID     | 4 bytes | Geohash prefix of this node's cluster                  |
| Battery        | 1 byte  | Current charge level (0-100%)                          |
| Neighbor count | 1 byte  | How many neighbors this node currently has             |
| Flags          | 1 byte  | Border node, distributor role, S5 capability           |
| Best reachable | 1 byte  | Compact hop-count summary for distance-vector updates  |
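
The table maps naturally onto a packed 12-byte struct. This layout is a sketch — field names and ordering are assumptions, not the published wire format:

```cpp
#include <cstdint>

#pragma pack(push, 1)
struct Ogm {
    uint32_t nodeId;        // unique ID (derived from hardware MAC)
    uint32_t clusterId;     // 4-char geohash prefix, packed into 4 bytes
    uint8_t  battery;       // 0-100%
    uint8_t  neighborCount; // current neighbor-table occupancy
    uint8_t  flags;         // bit0: border node, bit1: distributor, bit2: S5-capable
    uint8_t  bestReachable; // compact hop-count summary for distance-vector updates
};
#pragma pack(pop)
static_assert(sizeof(Ogm) == 12, "OGM must stay 12 bytes on the air");
```
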
Adaptive OGM Interval

OGMs are not sent at a fixed 30s interval. The interval adapts to network density and EU868 duty cycle constraints:

| Neighbors | OGM Interval | Rationale                                           |
|-----------|--------------|-----------------------------------------------------|
| ≤ 8       | 30 s         | Sparse network — fast discovery needed              |
| 9–20      | 60 s         | Moderate density — balance discovery vs. airtime    |
| 21–40     | 120 s        | Dense network — neighbors are stable, save airtime  |
| > 40      | 180 s        | Very dense — airtime is the bottleneck              |

OGMs stay cluster-local (1 hop only) — they are never rebroadcast. This means a 25-node cluster generates 25 OGMs per interval, not 25 × network-size. Airtime is still the tight constraint: at SF12 (~500ms per 12-byte OGM), a 30s interval would cost each node ~1.7% duty cycle — over EU868's 1% limit (360ms per 36s per node). That is precisely what the adaptive interval guards against: at 60s and above, per-node OGM airtime drops to ~0.8% or less (see the airtime budget below).
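
In code, the adaptive schedule is a four-way threshold on the neighbor count — a sketch with the thresholds taken straight from the table above (the function name is ours):

```cpp
#include <cstdint>

uint16_t ogmIntervalSeconds(uint8_t neighborCount) {
    if (neighborCount <= 8)  return 30;   // sparse: fast discovery needed
    if (neighborCount <= 20) return 60;   // moderate: balance discovery vs. airtime
    if (neighborCount <= 40) return 120;  // dense: neighbors are stable, save airtime
    return 180;                           // very dense: airtime is the bottleneck
}
```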

When a node receives an OGM, it:

  1. Measures RSSI/SNR of the received signal → determines link quality to the sender
  2. Adds/updates the sender in its neighbor table (max 16 neighbors per node)
  3. Notes the cluster ID — if the sender is in a different cluster, this node might be a border node
Why max 16 neighbors?

Memory constraint. Each neighbor entry costs ~20 bytes (node ID, link quality, battery, cluster ID, last-seen timestamp). On nRF52 devices with 256KB RAM, 16 neighbors × 20 bytes = 320 bytes — trivial. If a node hears more than 16 neighbors, it evicts the one with the lowest link quality. This is fine because routing only needs the best neighbors, not all of them.
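
A sketch of that fixed-size table with lowest-quality eviction. The struct layout is illustrative (the text only specifies ~20 bytes per entry), and requiring the newcomer to beat the weakest existing entry is our assumption:

```cpp
#include <array>
#include <cstdint>

struct Neighbor {
    uint32_t nodeId;
    uint32_t clusterId;
    float    quality;     // smoothed link quality 0.0-1.0, tracked per direction
    uint8_t  battery;     // from the neighbor's last OGM
    uint32_t lastSeenMs;
};

struct NeighborTable {
    std::array<Neighbor, 16> slots{};
    uint8_t count = 0;

    void upsert(const Neighbor& n) {
        for (uint8_t i = 0; i < count; i++)            // refresh an existing entry
            if (slots[i].nodeId == n.nodeId) { slots[i] = n; return; }
        if (count < slots.size()) { slots[count++] = n; return; }
        uint8_t worst = 0;                              // table full: find weakest link
        for (uint8_t i = 1; i < count; i++)
            if (slots[i].quality < slots[worst].quality) worst = i;
        if (n.quality > slots[worst].quality) slots[worst] = n;  // evict lowest quality
    }
};
```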

Link Quality Is Asymmetric

A critical detail: the link from A→B can have different quality than B→A. A mountaintop node with clear line-of-sight might send to a valley node with quality 0.95, but the valley node (surrounded by buildings) can only reach back with quality 0.3. System 5 tracks both directions independently.

Mountain (600m, free-space) ——— quality 0.95 ———→ Valley (50m, urban)
Mountain ←—————————— quality 0.30 ——————— Valley

System 5 knows both values. When routing TO the valley, it uses the 0.95 link. When routing FROM the valley, it finds a different path (maybe through a hill node).

Step 3: Geo-Clustering — Self-Organizing Geography

After a few OGM rounds (~1-2 minutes), nodes have discovered their neighbors. Now the network self-organizes into geographic clusters.

How Clustering Works

There is no central coordinator. Each node independently:

  1. Computes its geohash from its GPS position
  2. Takes the first 4 characters as its cluster ID
  3. Treats every node sharing that 4-char prefix as part of its cluster — automatically, with no handshake or registration

This is completely decentralized — no node needs to "know" the full network. If you turn on a new node in Munich, it computes its geohash ("u281"), and it's automatically part of the Munich cluster. It doesn't need permission or coordination.

Border Nodes — The Bridges Between Clusters

A border node is any node that has neighbors in a different cluster. Border nodes are critical — they're the bridges for inter-cluster routing.

Cluster "u33d" Cluster "u33e" ┌──────────────┐ ┌──────────────┐ │ A B C │ │ F G H │ │ D │ │ I │ │ [E] ─┼── link ──┼─ [J] │ └──────────────┘ └──────────────┘ [E] and [J] are border nodes — they have neighbors across cluster boundaries. To send from A to H: A → D → EJ → G → H (within u33d) (bridge) (within u33e)

System 5 limits bridge links to 2 per cluster pair to prevent route explosion. The two strongest links between each pair of adjacent clusters are selected.

Why This Scales

Each node only needs to know:

    • its own cluster's members (learned directly from 1-hop OGMs)
    • which border nodes lead to each adjacent cluster

A node in San Francisco doesn't need to know every individual node in Oakland. It just needs to know: "to reach Oakland, route through border node #47 on the Bay Bridge ridge."

Step 4: Route Computation — Distance-Vector with Multi-Path

Once clusters and border nodes are known, each node builds a routing table with up to 2 routes (primary + backup) to every reachable destination in its cluster view.

The Algorithm: Distance-Vector (not BFS)

Routes are not computed by running a graph algorithm on the node. Instead, they emerge incrementally from OGM data — similar to how RIP and B.A.T.M.A.N. build their routing tables:

  1. When node A receives an OGM from neighbor B, it learns: "B can reach node Z in 3 hops with quality 0.8"
  2. A updates its routing table: "I can reach Z via B in 4 hops, quality = 0.8 × q(A→B)"
  3. If A later hears from neighbor C that C can reach Z in 2 hops with quality 0.9, it records a second (better) route
  4. The routing table converges after a few OGM cycles — each node knows 1-2 next-hops per destination
Why distance-vector, not BFS?

Compute cost: Distance-vector requires one table-lookup per incoming OGM — a single comparison + write. BFS would require O(V+E) graph traversal per destination, which is unnecessary overhead on a 64 MHz nRF52840. The simulator uses BFS for clarity, but a real firmware implementation would use distance-vector.

Memory cost: A route entry needs only dst_id (4B) + next_hop (4B) + quality (1B) + age (2B) + hop_count (1B) = 12 bytes. Two routes per destination at 12 bytes each is far smaller than full path caching.

Distance-vector route learning: A builds routes to Z

  OGM from B: "I reach Z in 3 hops, quality 0.8"
  → A records: Route 1 to Z = via B, 4 hops, quality 0.8 × q(A→B) = 0.72

  OGM from C: "I reach Z in 2 hops, quality 0.9"
  → A records: Route 2 to Z = via C, 3 hops, quality 0.9 × q(A→C) = 0.81

  Result: 2 routes with independent next-hops. No graph traversal needed.
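
The same update in sketch form — one scan of the route table per advertised destination, a compare, and a write. The `std::vector` table is for readability; firmware would use a fixed array. The 2-routes-per-destination cap is from the text:

```cpp
#include <cstdint>
#include <vector>

struct Route { uint32_t dst, nextHop; float quality; uint8_t hops; };  // 12 B packed on-device

void onOgmRoute(std::vector<Route>& table,
                uint32_t neighbor, float linkQuality,            // q(A→B), measured locally
                uint32_t dst, uint8_t advHops, float advQuality) // from the neighbor's OGM
{
    Route cand{dst, neighbor, advQuality * linkQuality, (uint8_t)(advHops + 1)};
    int viaSame = -1, worst = -1, n = 0;
    for (int i = 0; i < (int)table.size(); i++) {
        if (table[i].dst != dst) continue;
        n++;
        if (table[i].nextHop == neighbor) viaSame = i;
        if (worst < 0 || table[i].quality < table[worst].quality) worst = i;
    }
    if (viaSame >= 0)  table[viaSame] = cand;        // refresh the route via this neighbor
    else if (n < 2)    table.push_back(cand);        // room for a second, disjoint route
    else if (cand.quality > table[worst].quality)
                       table[worst] = cand;          // replace the weaker backup
}
```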

Route Quality

Route quality is the estimated end-to-end delivery probability, computed incrementally as each hop's OGM propagates quality information:

Route Quality (accumulated via OGMs):

  Q(route to Z via B) = Q(B's advertised quality to Z) × q(A→B)

This naturally penalizes long paths (more hops = more quality multiplications < 1.0) and paths with weak links.

Memory Budget

| Component        | Per-Entry  | Count                      | Total     |
|------------------|------------|----------------------------|-----------|
| Route entry      | 12 bytes   | 2 routes × 35 destinations | 840 bytes |
| Neighbor table   | 20 bytes   | 16 neighbors               | 320 bytes |
| Cluster metadata | ~25 bytes  | 8 clusters                 | 200 bytes |
| Node state       | ~100 bytes | 1                          | 100 bytes |
| Total            |            |                            | ~1.5 KB   |

Even with 70 destinations (large cluster view), total memory is ~2 KB — well within nRF52's ~64 KB usable RAM.

Step 5: Sending a Message — Weighted Route Selection

When node A wants to send a message to node Z, this is the exact decision process:

1. Check QoS Gate

Is this message's priority high enough for the current network health? (See Step 6)

2. Get Cached Routes

Look up all cached routes to destination Z. Typically 2-5 routes.

3. Filter Dead Routes

Remove routes where any intermediate node has died (battery = 0) or any link is broken.

4. Compute Weights

For each surviving route, compute a weight that balances three factors:

Route Weight (the core formula):

  W(r) = 0.4 × Q(r) + 0.35 × (1 − Load) + 0.25 × Batt

| Factor                  | Weight | What It Measures                                                              | Why It Matters                                                       |
|-------------------------|--------|-------------------------------------------------------------------------------|----------------------------------------------------------------------|
| Q(r) — Quality          | 0.40   | Accumulated route quality from distance-vector (product of per-hop qualities) | Higher = fewer packet losses, fewer retries needed                   |
| 1−Load — Spare capacity | 0.35   | Queue utilization of the next-hop node (from its last OGM)                    | Avoid routing through a congested next hop                           |
| Batt — Battery          | 0.25   | Battery level of the next-hop node (from its last OGM)                        | Don't drain low-battery neighbors; prefer nodes on mains/solar power |

Why next-hop only, not path-wide?

On LoRa networks, nodes only have reliable data about their direct neighbors (via OGMs). "Path-wide" battery or load information would require multi-hop propagation of per-node state — creating exactly the extra traffic that clustering is designed to avoid. Next-hop metrics are locally observable, always fresh, and add zero overhead.
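
The weight computation itself is three multiply-adds — a sketch with the constants from the formula above; all inputs are normalized to 0.0-1.0, with load and battery read from the next hop's last OGM:

```cpp
float routeWeight(float quality, float nextHopLoad, float nextHopBattery) {
    return 0.40f * quality               // accumulated route quality Q(r)
         + 0.35f * (1.0f - nextHopLoad)  // spare queue capacity at the next hop
         + 0.25f * nextHopBattery;       // don't drain low-battery neighbors
}
```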

5. Proportional Selection (Not Best-Only!)

This is a critical design choice. System 5 does not always pick the best route. It selects routes probabilistically, proportional to their weight:

Selection Probability:

  P(route r) = W(r) / Σ W(all routes)

Example: If route 1 has weight 0.8 and route 2 has weight 0.4, route 1 is selected with probability 0.8 / 1.2 ≈ 67% and route 2 with probability 0.4 / 1.2 ≈ 33%.

This keeps secondary routes "warm" — traffic occasionally flows through them, so the network knows they still work. If route 1 fails, route 2 is immediately available with recent quality data.
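
A sketch of the weighted draw (`rand()` stands in for whatever RNG the firmware actually uses):

```cpp
#include <cstdlib>
#include <vector>

int pickRoute(const std::vector<float>& weights) {
    float total = 0.0f;
    for (float w : weights) total += w;
    float x = total * (float)rand() / (float)RAND_MAX;  // uniform draw in [0, total]
    for (int i = 0; i < (int)weights.size(); i++) {
        x -= weights[i];
        if (x <= 0.0f) return i;                        // route i wins with prob W(i)/ΣW
    }
    return (int)weights.size() - 1;                     // guard against float rounding
}
// With weights {0.8, 0.4}: route 0 is returned ~67% of the time, route 1 ~33%.
```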

6. Hop-by-Hop Forwarding

The message is sent along the selected path, one hop at a time:

  1. A sends to B (first hop in the path)
  2. If B receives it → B forwards to D (next hop)
  3. If B doesn't receive it → A retries (up to 3 times for good links, 5 for poor links)
  4. If all retries fail → try the next cached route (up to 5 route attempts)
  5. If all routes fail → fall back to scoped cluster flooding
Adaptive Retries

Links with quality > 0.5 get 3 retries (likely to succeed quickly). Links with quality ≤ 0.5 get 5 retries (need more attempts). This balances delivery probability against airtime cost.

Backpressure — Automatic Congestion Avoidance

When an intermediate node's queue is filling up, System 5 applies gradual backpressure:

| Queue Load | Action                                                               |
|------------|----------------------------------------------------------------------|
| < 80%      | Normal operation — route weight unaffected                           |
| 80–95%     | Route weight penalized (× 0.8–0.95) — traffic shifts to alternatives |
| > 95%      | Route fully blocked — no new traffic routed through this node        |

This prevents cascading overload: when a node starts getting congested, traffic naturally redistributes to other paths before the node drops packets.
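
As a sketch, the table above collapses into one weight adjustment. The linear ramp between the endpoints is our assumption — the text only gives the ×0.8–0.95 range:

```cpp
float applyBackpressure(float weight, float queueLoad) {   // queueLoad: 0.0-1.0
    if (queueLoad > 0.95f) return 0.0f;                    // fully blocked
    if (queueLoad >= 0.80f)                                // 0.80 → ×0.95 ... 0.95 → ×0.80
        return weight * (1.75f - queueLoad);
    return weight;                                         // normal operation
}
```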

Step 6: QoS — Priority Under Pressure

Not all messages are equal. System 5 uses a Network Health Score (NHS) per cluster to throttle low-priority traffic when the network is stressed.

NHS Calculation (Locally Computed)

NHS is a value from 0.0 to 1.0 representing the cluster's health as seen by this node. It requires no extra traffic — it's computed entirely from data already present in received OGMs, such as neighbors' battery levels and advertised load.

NHS is not a global cluster metric — each node computes its own local view. This avoids any extra polling or cross-node aggregation traffic.

Priority Gating

| NHS Range | Network State | Allowed Priorities | Example                          |
|-----------|---------------|--------------------|----------------------------------|
| 0.8 – 1.0 | Healthy       | All (0–7)          | Everything gets through          |
| 0.6 – 0.8 | Moderate      | 0–5 only           | Low-priority telemetry throttled |
| 0.4 – 0.6 | Degraded      | 0–3 only           | Only important messages          |
| 0.2 – 0.4 | Critical      | 0–1 only           | Emergency/SOS only               |
| < 0.2     | Collapsed     | 0 only             | SOS messages only                |

Priority 0 = SOS/Emergency — always gets through, regardless of network state. This ensures that in a disaster scenario, critical messages are never blocked by routine telemetry.
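
The gate is a handful of threshold checks — a sketch with the values straight from the table (function names are ours):

```cpp
#include <cstdint>

uint8_t maxAllowedPriority(float nhs) {
    if (nhs >= 0.8f) return 7;   // healthy: everything gets through
    if (nhs >= 0.6f) return 5;   // moderate: throttle low-priority telemetry
    if (nhs >= 0.4f) return 3;   // degraded: important messages only
    if (nhs >= 0.2f) return 1;   // critical: emergency/SOS only
    return 0;                    // collapsed: SOS only
}

bool qosGate(uint8_t priority, float nhs) {
    return priority <= maxAllowedPriority(nhs);  // priority 0 (SOS) always passes
}
```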

Step 7: Fallback — When All Routes Fail

If all 5 cached routes fail (after retries on each), System 5 doesn't give up. It falls back to scoped cluster flooding — a targeted mini-flood along the corridor between source and destination.

How Scoped Flooding Works

  1. Find the cluster-level path: Follow the cluster adjacency table from source cluster to destination cluster (built from border-node OGM exchanges)
  2. Define the corridor: Source cluster + destination cluster + all border nodes along the cluster path
  3. Flood only within the corridor: Message is broadcast only to nodes in the corridor — not the entire network

Full network (flooding would touch all 5 clusters):

  ┌─────┐   ┌─────┐   ┌─────┐   ┌─────┐   ┌─────┐
  │ C1  │───│ C2  │───│ C3  │───│ C4  │───│ C5  │
  └─────┘   └─────┘   └─────┘   └─────┘   └─────┘

Source in C1, destination in C4. Cluster path: C1 → C2 → C3 → C4

Scoped flood corridor (System 5 fallback):

  ┌─────┐   ┌─────┐   ┌─────┐   ┌─────┐
  │ C1  │───│ B2  │───│ B3  │───│ C4  │        C5 is NOT flooded
  └─────┘   └─────┘   └─────┘   └─────┘

  C1 and C4: full cluster flood (all members)
  B2 and B3: only border nodes + their immediate neighbors

This is dramatically cheaper than full-network flooding. In a 500-node network with 10 clusters, a full flood touches all 500 nodes. A scoped corridor flood touches ~100 nodes (2 full clusters + border nodes).

Known Limitation

When the corridor flood triggers frequently (as in the Bay Area half-duplex scenario), it can still generate high TX counts. This is the #1 optimization target: try more alternative directed routes before triggering the corridor flood.

Broadcast Routing — Cluster-Distributor Model

The previous steps describe unicast routing (one sender, one receiver). But in Meshtastic, ~98% of traffic is broadcast — position beacons, node info, telemetry. This is where the biggest efficiency gains are.

The Problem with Broadcast Flooding

In managed flooding, a single broadcast message triggers O(n) transmissions — every node that receives it rebroadcasts to all neighbors. In a 200-node network, one position beacon generates hundreds of transmissions, saturating the channel.

Cluster-Distributor: Wave Propagation

System 5 replaces blind broadcast flooding with structured wave propagation through elected cluster distributors:

1. Elect Distributors

Each cluster elects one distributor node — a valley-level node with high local connectivity (it can reach many cluster members directly) but low cross-cluster leakage (few of its transmissions spill into neighboring clusters).

2. Unicast to Distributor

When a node wants to broadcast, it sends the message via unicast to its cluster's distributor — 1-3 hops, ~1-3 TX. No flooding.

3. Local Mini-Flood

The distributor performs a mini-flood within its cluster only. Since clusters are small (20-30 nodes), this is cheap: ~20-30 TX to reach all cluster members.

4. Border Relay

Border nodes that receive the message relay it to the distributor of the adjacent cluster via unicast. The wave propagates cluster by cluster until all clusters are covered.

Broadcast wave propagation (235-node Bay Area):

  Source node in C1 sends to C1-Distributor     (unicast, 2 TX)
  C1-Distributor mini-floods C1                 (28 TX, reaches 30 nodes)
  Border B1→B2 relay to C2-Distributor          (unicast, 1 TX)
  C2-Distributor mini-floods C2                 (35 TX, reaches 42 nodes)
  Border B3→B4 relay to C3-Distributor          (unicast, 1 TX)
  C3-Distributor mini-floods C3                 (22 TX, reaches 25 nodes)
  ...continues until all clusters covered...

  Total: ~220 TX to reach 235 nodes (vs. 4,301 TX with managed flooding)

Results

| Scenario             | Managed Flood TX | Cluster-Dist TX | MF Reach | CD Reach | TX Savings |
|----------------------|------------------|-----------------|----------|----------|------------|
| Medium (50 nodes)    | 806              | 97              | 81%      | 74%      | 88%        |
| Large (100 nodes)    | ~1,500           | ~180            | 75%      | 80%      | 88%        |
| Bay Area (235 nodes) | 4,301            | 220             | 90%      | 96%      | 95%        |
| Regional (500 nodes) | 95,869           | 517             | 91%      | 100%     | 99.5%      |

Why this matters for real Meshtastic

Position beacons, node info, and telemetry are broadcast every 15-900 seconds by every node. In a 100-node network with managed flooding, this means thousands of TX per minute just for housekeeping. With cluster-distributor, the same information reaches everyone with 88-99% fewer transmissions — freeing the channel for actual messages and dramatically reducing battery drain.

Airtime Budget (EU868)

EU868 mandates a 1% duty cycle — each node can transmit at most 36ms per 3.6s, or ~360ms per 36s. Here's how the budget works for a 25-node cluster:

| Traffic Type                | Frequency          | Payload   | Airtime (SF12) | Budget Share    |
|-----------------------------|--------------------|-----------|----------------|-----------------|
| OGM beacon                  | 30-120s (adaptive) | 12 bytes  | ~500ms         | 0.4-1.7%        |
| Broadcast (via distributor) | Per message        | ~30 bytes | ~800ms         | Per event       |
| Unicast hop                 | Per message        | ~50 bytes | ~1.2s          | Per event       |
| Border summary              | Per OGM cycle      | ~20 bytes | ~600ms         | Shared with OGM |

At the adaptive 60s OGM interval (typical for moderate density), a node uses ~0.8% of its 1% duty cycle for maintenance traffic — leaving headroom for actual data. With managed flooding, a single broadcast storm can consume the entire duty budget of every node in the cluster.

Step 8: Self-Maintenance — Keeping Routes Fresh

Networks change constantly: nodes move, batteries drain, links degrade. System 5 maintains itself through several mechanisms:

Route Quality Decay (Pheromone Model)

Inspired by ant colony optimization: at each OGM cycle (adaptive interval, see Step 2), all cached route qualities decay by 5%:

Quality Decay (per OGM cycle):

  Q(route) = Q(route) × 0.95

Routes that are actually used get their quality refreshed from real link measurements. Routes that are never used gradually fade to zero and are eventually replaced. This ensures the routing table always reflects current network conditions, not stale historical data.
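
The decay pass is a single sweep over the route table per OGM cycle. The 0.05 eviction floor is our assumption — the text only says faded routes are "eventually replaced":

```cpp
#include <cstdint>
#include <vector>

struct Route { uint32_t dst, nextHop; float quality; uint8_t hops; };

void decayRoutes(std::vector<Route>& table) {
    const float FLOOR = 0.05f;                          // illustrative eviction threshold
    for (auto it = table.begin(); it != table.end(); ) {
        it->quality *= 0.95f;                           // 5% decay per cycle
        if (it->quality < FLOOR) it = table.erase(it);  // stale route forgotten
        else ++it;
    }
}
```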

Route Feedback

After each message delivery attempt, the route gets feedback: a successful delivery refreshes the route's quality from the real link measurements along the way, while a failure lowers it — so the next weighted selection naturally favors the alternatives.

Neighbor Eviction

When a node discovers a new neighbor but its table is full (16 entries), it evicts the neighbor with the lowest link quality. This ensures the routing table always contains the best available connections.

Dynamic Hop Limit

Unlike Meshtastic's fixed 3-7 hop limit, System 5 uses a dynamic limit that scales with network size:

Dynamic Max Hops:

  max_hops = clamp(√n × 3, 15, 40)

For a 100-node network: max_hops = √100 × 3 = 30. For a 1000-node network: max_hops = 40 (capped). This allows messages to traverse large networks without artificial limits, while preventing infinite loops.
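
In code (C++17), the limit is one clamp:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

uint8_t dynamicMaxHops(uint32_t networkSize) {
    return (uint8_t)std::clamp(std::sqrt((float)networkSize) * 3.0f, 15.0f, 40.0f);
}
// 100 nodes → 30; 1,000+ nodes → capped at 40; tiny meshes → floor of 15.
```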

The Half-Duplex Problem (Bay Area Discovery)

This is the single most important real-world constraint that simulators typically ignore. LoRa radios are half-duplex: a node cannot transmit while it is receiving.

Why This Matters for Flooding

Consider a mountaintop router at 2000 ft elevation that can hear 100 nodes. When a message floods through the network:

  1. Node A sends a message. The mountaintop hears it (now in RX state).
  2. 10 rooftop nodes near A also hear it and rebroadcast simultaneously.
  3. The mountaintop is still receiving those 10 rebroadcasts — it cannot transmit.
  4. 4 long-range router nodes also rebroadcast. The mountaintop is still stuck in RX.
  5. By the time the mountaintop can transmit, the managed flooding algorithm's suppression timer has expired — it either rebroadcasts (causing more collisions) or suppresses (message dies).

Result: what is only ~5% channel utilization elsewhere in the network becomes ~50% channel utilization at the mountaintop, and messages fail to propagate beyond the first hop.

Why System 5 Survives Half-Duplex

With directed routing, the mountaintop node receives one targeted packet (not 14 simultaneous rebroadcasts). It processes it, waits for a clear TX window, and forwards to the next hop. The radio state machine is simple:

Managed Flooding at mountaintop:
  RX(A) → RX(B) → RX(C) → ... → RX(N) → TX window? ALL BLOCKED
  Duration: 10-20 seconds of continuous RX → message dies

System 5 directed routing:
  RX(packet for us) → TX(forward to next hop) → IDLE
  Duration: ~2 seconds total → message continues

Simulation Results

We built a Bay Area-style simulation with 235 nodes in 3 elevation tiers and half-duplex radio modeling. The results confirm the real-world observations:

| Scenario                     | Routing       | Delivery | Total TX |
|------------------------------|---------------|----------|----------|
| Bay Area without half-duplex | Managed Flood | 87.5%    | 908,785  |
|                              | System 5      | 80.5%    | 47,094   |
| Bay Area with half-duplex    | Managed Flood | 6.0%     | 6,752    |
|                              | System 5      | 77.5%    | 540,780  |

Half-duplex collapses flooding from ~87% → ~6% delivery. System 5 holds at ~77.5%. Try the Bay Area scenario in the live simulator →

Memory Footprint — Can This Run on nRF52?

Real-world mesh devices have tight memory constraints. The nRF52840 (used in RAK4631 solar routers) has 256KB RAM, with ~64KB available after BLE and LoRa stacks. System 5 is designed to fit comfortably.

Data Structures (Revised)

With distance-vector routing and compact route entries (12 bytes instead of full path caching at 410 bytes), memory requirements are dramatically lower than initially estimated:

| Structure                                            | Per-Entry  | Count                     | Total     |
|------------------------------------------------------|------------|---------------------------|-----------|
| Neighbor entry                                       | 20 bytes   | 16 max                    | 320 bytes |
| Route entry (dst + next-hop + quality + age + hops)  | 12 bytes   | 2 routes × N destinations | varies    |
| Cluster metadata                                     | ~25 bytes  | 8 clusters                | 200 bytes |
| Node own state                                       | ~100 bytes | 1                         | 100 bytes |

Scaling with Network Size

| Network Size         | Destinations Tracked     | Routing Memory | Fits nRF52 (64KB)? |
|----------------------|--------------------------|----------------|--------------------|
| 20 nodes (local)     | 20                       | ~1.1 KB        | Yes (1.7%)         |
| 100 nodes (city)     | ~40 (cluster view)       | ~1.6 KB        | Yes (2.5%)         |
| 500 nodes (regional) | ~50 (cluster + borders)  | ~1.8 KB        | Yes (2.8%)         |
| 1000 nodes           | ~70 (cluster + borders)  | ~2.3 KB        | Yes (3.6%)         |
| 10,000 nodes         | ~200 (cluster + borders) | ~5.4 KB        | Yes (8.4%)         |

The key insight: geo-clustering means a node only tracks its own cluster + border routes, and distance-vector means each route is just 12 bytes (destination + next-hop + metadata), not a full path. Even a 10,000-node network uses only ~5.4 KB for routing — leaving >90% of available RAM for the application.

Previous vs. Revised Estimates

An earlier version of this proposal estimated ~410 bytes per route entry (storing full multi-hop paths) and 5 routes per destination, leading to estimates of 30-84 KB for large networks. The distance-vector approach with 12-byte entries and 2 routes per destination reduces this by >95%. This was correctly flagged as a concern in community feedback — thank you.

Bay Area Topology — The Real-World Stress Test

Based on feedback from Bay Area Mesh operators, we built a simulation that models the actual network structure:

Three-Tier Elevation Model

| Tier          | Nodes     | Elevation | Range      | Terrain      | Role                                       |
|---------------|-----------|-----------|------------|--------------|--------------------------------------------|
| Mountain      | 7 (3%)    | 600-1200m | 45km       | Free-space   | Backbone routers (Mt Diablo, Mt Tam, etc.) |
| Hill/Rooftop  | 35 (15%)  | 150-500m  | 10km       | Suburban     | Bridge between mountain and valley         |
| Valley/Indoor | 193 (82%) | 0-100m    | 0.75-2.5km | Urban/Indoor | End-user handhelds and indoor nodes        |

What Makes This Scenario Hard

82% of the nodes are short-range valley devices, so nearly every cross-network message must funnel through a handful of half-duplex mountain routers — exactly the conditions that collapse managed flooding (see above).

Try this scenario in the Live Simulator → — select "Bay Area Mesh (235 nodes, 3-tier elevation)" from the dropdown.

Read the full Q&A with Bay Area Mesh community →

Node Silencing — Muting Redundant Nodes

One of the most impactful v2.0 features, inspired by Bay Area Mesh feedback. The core insight: in a 235-node network, most valley nodes are redundant — their neighbors can all be reached via other paths. Every time these redundant nodes rebroadcast, they add collision noise at mountaintop receivers without contributing to message delivery.

How It Works

  1. Redundancy scoring: For each node, check every neighbor — can that neighbor be reached by at least 2 other alive nodes? If yes, this neighbor connection is "redundant." The node's redundancy score = fraction of redundant neighbor connections (0.0 = critical, 1.0 = fully replaceable).
  2. Critical bridge protection: Border nodes with few alternatives (≤3 other bridges to the same cluster pair) get their score heavily penalized. Mountain nodes and essential hill nodes are never silenced.
  3. Battery-weighted priority: silence_priority = redundancy × 0.6 + (1 - battery) × 0.4. Low-battery nodes are silenced first. Solar nodes (always 100%) stay active longest.
  4. Per-cluster application: Within each cluster, the top 60% of candidates (by priority) are silenced. At least 2 nodes always remain active per cluster.
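
A sketch of the battery-weighted score from steps 2-3 above. The ×0.25 penalty factor for critical bridges is our assumption — the text only says their score is "heavily penalized":

```cpp
float silencePriority(float redundancy, float battery,     // both normalized 0.0-1.0
                      bool criticalBridge, bool neverSilence) {
    if (neverSilence) return 0.0f;                          // mountain / essential hill nodes
    float p = redundancy * 0.6f + (1.0f - battery) * 0.4f;  // low battery → silence sooner
    return criticalBridge ? p * 0.25f : p;                  // assumed penalty factor
}
```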

What Silenced Nodes Do (and Don't)

| Action                      | Silenced Node                 | Active Node |
|-----------------------------|-------------------------------|-------------|
| Receive OGMs                | Yes — stays aware of network  | Yes         |
| Receive direct messages     | Yes — can be addressed        | Yes         |
| Send own messages           | Yes — can initiate            | Yes         |
| Rebroadcast flooding        | No — stays silent             | Yes         |
| Send OGMs                   | No — saves airtime            | Yes         |
| Forward directed S5 packets | Yes — if on the computed path | Yes         |

Battery-Fair Rotation

The same nodes can't be silenced forever — their batteries would last longer, but the active nodes would drain faster. System 5 rotates the silent set every 10 minutes:

  1. All expired silences are lifted
  2. Redundancy scores are recomputed (nodes may have moved, died, or changed load)
  3. A new set of nodes is selected for silencing, weighted by current battery level

Result: every node spends roughly equal time active and silent. Battery drain is distributed evenly across the network.

Results

| Bay Area Scenario | S5 Delivery | S5 TX   | Nodes Silenced |
|-------------------|-------------|---------|----------------|
| Without silencing | 77.5%       | 540,780 | 0              |
| With silencing    | 74.5%       | 267,927 | 134 (57%)      |

128 valley nodes silenced, 6 hill nodes silenced, 0 mountain nodes silenced.

Emergency Re-Route — Last Resort Before Flooding

When all 5 cached routes fail (every hop was tried with retries), the original System 5 immediately triggered scoped corridor flooding. This was expensive — in the Bay Area scenario, 73 out of 200 messages triggered fallback floods.

The v2.0 improvement adds one more step before flooding:

  1. Collect all intermediate nodes from the failed routes into a failed_nodes set
  2. Run a fresh BFS from source to destination, excluding all failed nodes
  3. If a new path is found, try it once (directed, not flooding)
  4. Only if this emergency path also fails → trigger corridor flooding

This is cheap (one BFS computation, no extra TX unless the path works) and often succeeds because the failed nodes were the actual problem — the rest of the network may have perfectly good paths.
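
A sketch of that emergency computation — a plain BFS over the node's local topology view, skipping everything in the failed_nodes set. The adjacency-map representation is illustrative; firmware would derive neighbors from its existing tables:

```cpp
#include <cstdint>
#include <map>
#include <queue>
#include <set>
#include <vector>

using Graph = std::map<uint32_t, std::vector<uint32_t>>;

std::vector<uint32_t> emergencyPath(const Graph& g, uint32_t src, uint32_t dst,
                                    const std::set<uint32_t>& failedNodes) {
    std::map<uint32_t, uint32_t> parent{{src, src}};  // visited set + predecessor links
    std::queue<uint32_t> q;
    q.push(src);
    while (!q.empty()) {
        uint32_t u = q.front(); q.pop();
        if (u == dst) {                               // found: walk parents back to src
            std::vector<uint32_t> path{dst};
            for (uint32_t v = dst; v != src; v = parent[v])
                path.insert(path.begin(), parent[v]);
            return path;
        }
        auto it = g.find(u);
        if (it == g.end()) continue;
        for (uint32_t v : it->second)
            if (!failedNodes.count(v) && !parent.count(v)) {
                parent[v] = u;
                q.push(v);
            }
    }
    return {};                                        // empty: fall back to corridor flood
}
```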

Sequence Numbers — Detecting Missing Messages

Multi-path routing can deliver messages out of order. If messages A, B, C are sent via three different paths with different latencies, the receiver might see C, B, A — or worse, C, A (B lost on a failed path).

The v2.0 wire protocol adds a 2-byte sequence counter (uint16_t seq) per (source, destination) pair. The receiver compares each arriving seq against the last value seen from that source — a jump of more than one reveals a gap, without any extra packets on the air.

Why not retransmit?

Retransmission requires an ACK + retransmit cycle. On LoRa, each packet takes 500ms-2s of airtime. A 5-hop retransmit costs 5-10 seconds of channel time — during which the mountaintop is blocked from all other traffic. Sequence numbers provide gap detection at zero TX cost (just 2 extra bytes in the header). The app layer can decide whether to request a retransmit or simply show "1 message missing."

The sequence counters are stored efficiently in firmware: neighbor-indexed array for known neighbors (32 bytes) + LRU cache for others (96 bytes) = 128 bytes total, regardless of network size.
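
A sketch of that split store. Layouts and names are ours; the on-device versions would be packed to hit exactly 32 + 96 bytes, and first-contact initialization is omitted for brevity:

```cpp
#include <cstdint>
#include <cstring>

struct SeqTracker {
    uint16_t neighborSeq[16] = {};                   // 32 B: direct index by neighbor slot

    struct Entry { uint32_t id; uint16_t seq; };     // 6 B packed on-device
    Entry lru[16] = {};                              // 96 B: everyone else

    // Returns how many messages were skipped since the last one seen from this source.
    uint16_t gap(int neighborSlot, uint32_t srcId, uint16_t seq) {
        if (neighborSlot >= 0) {                     // fast path: known neighbor
            uint16_t missed = (uint16_t)(seq - neighborSeq[neighborSlot] - 1);
            neighborSeq[neighborSlot] = seq;         // uint16 arithmetic wraps correctly
            return missed;
        }
        for (int i = 0; i < 16; i++)
            if (lru[i].id == srcId) {
                uint16_t missed = (uint16_t)(seq - lru[i].seq - 1);
                lru[i].seq = seq;
                promote(i);
                return missed;
            }
        lru[15] = {srcId, seq};                      // unknown source: evict coldest slot
        promote(15);
        return 0;
    }

    void promote(int i) {                            // move slot i to the front (hottest)
        Entry e = lru[i];
        memmove(&lru[1], &lru[0], sizeof(Entry) * i);
        lru[0] = e;
    }
};
```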

Packet Format — What Goes Over the Air

Every System 5 packet has a 22-byte header:

| Offset | Field          | Size     | Description                                                  |
|--------|----------------|----------|--------------------------------------------------------------|
| 0      | Version        | 1 byte   | Protocol version (currently 0x01)                            |
| 1      | Type           | 1 byte   | DATA (0x01), OGM (0x02), ACK (0x03), CLUSTER_ANNOUNCE (0x04) |
| 2      | Source ID      | 4 bytes  | Originator node ID                                           |
| 6      | Destination ID | 4 bytes  | Target node ID (0xFFFFFFFF = broadcast)                      |
| 10     | Packet ID      | 4 bytes  | Unique per-packet (for deduplication)                        |
| 14     | Hop Count      | 1 byte   | Current hop count (incremented each hop)                     |
| 15     | Max Hops       | 1 byte   | TTL — dynamic, based on √n × 3                               |
| 16     | Priority       | 1 byte   | QoS priority (0 = SOS, 7 = lowest)                           |
| 17     | Flags          | 1 byte   | Fallback bit, route-request bit, etc.                        |
| 18     | Payload Length | 2 bytes  | Payload size in bytes                                        |
| 20     | Checksum       | 2 bytes  | CRC-16 of header + payload                                   |
| 22+    | Payload        | variable | Application data (max ~230 bytes for LoRa)                   |
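
The same header as a packed struct. Byte order on the wire isn't specified in the table; little-endian is assumed here:

```cpp
#include <cstdint>

#pragma pack(push, 1)
struct S5Header {
    uint8_t  version;     // 0x01
    uint8_t  type;        // DATA=0x01, OGM=0x02, ACK=0x03, CLUSTER_ANNOUNCE=0x04
    uint32_t sourceId;    // originator node ID
    uint32_t destId;      // 0xFFFFFFFF = broadcast
    uint32_t packetId;    // unique per packet, for deduplication
    uint8_t  hopCount;    // incremented at each hop
    uint8_t  maxHops;     // TTL from clamp(√n × 3, 15, 40)
    uint8_t  priority;    // 0 = SOS ... 7 = lowest
    uint8_t  flags;       // fallback bit, route-request bit, etc.
    uint16_t payloadLen;  // payload size in bytes
    uint16_t checksum;    // CRC-16 over header + payload
};
#pragma pack(pop)
static_assert(sizeof(S5Header) == 22, "header must stay 22 bytes");
```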

System 5 vs. Everything Else

| Property              | Naive Flood | Managed Flood | Next-Hop  | System 5            |
|-----------------------|-------------|---------------|-----------|---------------------|
| TX cost per message   | O(n)        | O(n) × 0.5    | O(hops)*  | O(hops)             |
| Works for broadcasts  | Yes         | Yes           | No        | Yes                 |
| Hop limit needed      | Yes (3-7)   | Yes (3-7)     | Partially | No (dynamic)        |
| Multi-path failover   | No          | No            | No        | 5 routes            |
| Load balancing        | No          | No            | No        | Weighted            |
| Congestion avoidance  | No          | No            | No        | Backpressure        |
| QoS priority          | No          | No            | No        | 8 levels + NHS gate |
| Half-duplex resilient | No          | No            | Partially | Yes                 |
| GPS required          | No          | No            | No        | Yes**               |
| Memory overhead       | Minimal     | Minimal       | Low       | Low (~2-5 KB)       |

* Next-hop only works for direct messages after a learning flood. First message still floods.
** GPS required for clustering, but RSSI triangulation and cluster inheritance provide fallbacks.

Try the Live Simulator →