Context
- Platform: ChirpStack v4.3.0 (NS + AS), EU868 region
- Integrations: Redis (sessions + streams), MQTT enabled
- MQTT replay: max_inflight_messages = 1 and qos = 1 on the gateway side, plus a patch in gateway/backend/mqtt.rs (custom build):
// get message stream
let mut stream = client.get_stream(None);
- Radio topology: 2 gateways (GW1 primary, GW2 secondary)
- Fleet: 3 devices, all with FrameCounter Validation disabled (to accept data that is not in chronological order)
- Period: 17/10 → 22/10 (UTC/local doesn’t matter, same window)
Symptoms
On 22/10 between 13:13 and 13:37, the NS emits 433 warnings for a single device (Device A only):
WARN ... chirpstack::uplink::data: None of the device-sessions for dev_addr resulted in valid MIC dev_addr=0x0183b587
From 13:46 onward, uplinks from the same device are OK again and things return to normal.
Topology (ASCII)
                          (Internet)
                               │
                ┌──────────────┴──────────────┐
                │       Gateway 1 (GW1)       │
                │           primary           │
                │     17→18/10 : OK           │
                │     18→22/10 : OFFLINE      │
                │     22/10    : buffer flush │
                └──────────────┬──────────────┘
                               │
         ┌─────────────────────┼─────────────────────┐
         │                     │                     │
┌────────┴─────────┐  ┌────────┴─────────┐  ┌────────┴─────────┐
│ Dev A            │  │ Dev B            │  │ Dev C            │
│ (EUIFA)          │  │ (EUIxx)          │  │ (EUIyy)          │
│                  │  │                  │  │                  │
│ → GW1 (mostly)   │  │ → GW1 (~60%)     │  │ → GW1 (100%)     │
│ → GW2 (2 frames) │  │ → GW2 (~40%)     │  │                  │
│ FCnt 1031/1032   │  │                  │  │                  │
│ last_fcnt32↑     │  │                  │  │                  │
│ Stored on GW1    │  │ Stored on GW1    │  │ Stored on GW1    │
└────────┬─────────┘  └────────┬─────────┘  └──────────────────┘
         │                     │
     ┌───┴─────────────────────┴───┐
     │       Gateway 2 (GW2)       │
     │          secondary          │
     │     17→22/10 : online       │
     │     receives A (2 frames),  │
     │              B (~40%)       │
     └─────────────────────────────┘
▼ 22/10 ~13:13 → 13:37
┌──────────────────────────────────────────────────────┐
│ GW1 comes back online and flushes its buffer:        │
│ • Dev A → 433 uplinks rejected (MIC warning)         │
│ • Dev B → OK 100% (out-of-order replay, valid MIC)   │
│ • Dev C → OK 100% (in order, valid MIC via GW1)      │
└──────────────────────────────────────────────────────┘
▼ 22/10 ~13:46+
┌──────────────────────────────────────────────────────┐
│ Normal operation resumes:                            │
│ • Live uplinks received: FCnt 1155, 1156, 1157...    │
│ • All devices A, B, C back to 100% OK                │
└──────────────────────────────────────────────────────┘
Scope
- Device A: dev_eui=00800000000255fa, dev_addr=0183b587 (impacted)
  - Not every uplink that triggered a warning was later delivered as data.
  - Rate: 1 message / 15 min (no 16→32‑bit rollover within the window)
  - Activation: OTAA (keys/sessions OK), no join detected during the incident
  - Uniqueness: DevAddr unique in the database (verified in Postgres)
- Device B: no issue, 100% of data recovered in the end (mostly received by GW1, also by GW2)
  - Rate: 1 message / 15 min (no 16→32‑bit rollover)
- Device C: no issue, 100% of data recovered in the end, received only by GW1 (never by GW2)
Timeline (observed in logs)
- 17/10 14:18 → 18/10 03:33: regular uplinks OK (GW1).
- 18/10: GW1 loses Internet and starts buffering. GW2 stays online and delivers 2 uplinks from Device A. About 40% of Device B's uplinks go through GW2. Device C never goes through GW2.
- 18/10 → 22/10: GW1, still offline, keeps buffering uplinks from the local devices (A, B, C). During this period, about 30% of Device B's uplinks go through GW2; Device C never goes through GW2.
- 22/10 13:13–13:37: GW1 comes back online and flushes its queues → burst of MIC warnings for dev_addr=0183b587 (Device A, 433 lines).
  - Device B: no warning (replayed out of order, but all validated).
  - Device C: no warning (never via GW2, all validated).
- 22/10 13:46+: live frames are received again (e.g., FCnt 1155, 1156, 1157) → 100% OK for A, B, and C.
Interpretation / hypothesis
- The 433 rejections for Device A represent ≈ 4d12h of traffic at 1 message / 15 min (433 × 15 min ≈ 108 h), so Device A has gaps in its data over the outage.
- Devices B and C were also replayed, out of order, but everything was recovered at 100%.
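As a rough sanity check, the number of rejected uplinks can be compared with the length of the outage. The sketch below is only an illustration: it assumes the outage runs from the last regular uplink via GW1 (18/10 03:33) to the start of the flush (22/10 13:13), and the year is a placeholder since only the time differences matter.

```python
from datetime import datetime, timedelta

UPLINK_PERIOD = timedelta(minutes=15)  # cadence stated for Device A
REJECTED_UPLINKS = 433                 # WARN lines counted during the flush


def span_covered(count: int, period: timedelta) -> timedelta:
    """Time span covered by `count` uplinks sent once per `period`."""
    return count * period


def expected_uplinks(start: datetime, end: datetime, period: timedelta) -> int:
    """Number of whole periods that fit between two timestamps."""
    return (end - start) // period


# Placeholder year (assumption): the post only gives day/month.
YEAR = 2000
last_ok_via_gw1 = datetime(YEAR, 10, 18, 3, 33)  # end of regular uplinks via GW1
flush_start = datetime(YEAR, 10, 22, 13, 13)     # GW1 starts flushing its buffer

print(span_covered(REJECTED_UPLINKS, UPLINK_PERIOD))                   # 4 days, 12:15:00
print(expected_uplinks(last_ok_via_gw1, flush_start, UPLINK_PERIOD))   # 422
```

Under these assumptions the outage holds about 422 uplink slots, which is the same order of magnitude as the 433 warnings, so most or all of Device A's buffered frames appear to have been rejected.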
Items already checked
- DevAddr unique (Postgres).
- No Join/Rejoin visible in the window.
- No counter rollover (15‑min cadence).
- Logs: "Gateway rx-info saved" entries for other devEUIs at flush time → GW1 did flush its queues.
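The "no counter rollover" item can be double-checked with simple arithmetic. This is a minimal sketch, assuming a strict 15-minute cadence and a counter that increments by one per uplink; the function name is illustrative.

```python
from datetime import timedelta


def time_to_rollover(period: timedelta, start_fcnt: int = 0) -> timedelta:
    """Time until the lower 16 bits of the frame counter wrap around."""
    remaining = 2**16 - (start_fcnt % 2**16)
    return remaining * period


# From a fresh session (FCnt 0) at one uplink every 15 minutes.
print(time_to_rollover(timedelta(minutes=15)).days)        # 682

# From the live counter value seen after recovery (FCnt 1155).
print(time_to_rollover(timedelta(minutes=15), 1155).days)  # 670
```

With well over a year before the 16-bit counter wraps, a rollover inside the 17/10 → 22/10 window can be ruled out for this cadence.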
Questions for the forum
- Why does Device A have issues while B and C do not?
- Is there a link with the fact that Device A briefly went through GW2 (valid)?
- Any configuration recommendations to prevent this case (out‑of‑order → MIC failure) from happening again?
Log excerpts
Burst on 22/10 13:13–13:37:
WARN ... chirpstack::uplink::data: None of the device-sessions for dev_addr resulted in valid MIC dev_addr=0183b587
(repeated ~433 times)
Recovery around 13:46+:
INFO ... chirpstack::storage::device_session: Device-session saved dev_eui=00800000000255fa dev_addr=0183b587
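For anyone who wants to reproduce the per-device count, here is a small sketch that tallies the MIC warnings per DevAddr. It assumes the log is available as plain text lines in the format quoted above; the sample lines reuse the elided excerpts from this post, and the optional 0x prefix is handled because both spellings appear.

```python
import re
from collections import Counter

# Matches the warning text quoted in this post, with or without a 0x prefix.
MIC_WARNING = re.compile(
    r"None of the device-sessions for dev_addr resulted in valid MIC"
    r"\s+dev_addr=(?:0x)?([0-9a-fA-F]{8})"
)


def count_mic_warnings(lines):
    """Count MIC-failure warnings per DevAddr (lower-case hex, no prefix)."""
    counts = Counter()
    for line in lines:
        match = MIC_WARNING.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


sample = [
    "WARN ... chirpstack::uplink::data: None of the device-sessions for dev_addr resulted in valid MIC dev_addr=0x0183b587",
    "WARN ... chirpstack::uplink::data: None of the device-sessions for dev_addr resulted in valid MIC dev_addr=0183b587",
    "INFO ... chirpstack::storage::device_session: Device-session saved dev_eui=00800000000255fa dev_addr=0183b587",
]

print(count_mic_warnings(sample))  # Counter({'0183b587': 2})
```

Feeding it the full NS log for 22/10 (for example, the lines of an exported log file) should give 433 for 0183b587 and nothing for the DevAddrs of Devices B and C.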