AU915 LoRaWAN OTAA: JoinAccept “sent/scheduled” by gateway but 50% of devices never join (stuck in a JoinRequest loop)


Hello,

We’re troubleshooting a persistent downlink problem in a remote deployment. Uplinks (JoinRequest) are received reliably, but downlinks (JoinAccept) appear to be scheduled/sent by the gateway and network server, yet many end devices never receive them and remain stuck in a join loop.

Setup

  • Region: AU915, Class A, OTAA
  • Gateway: RAK7289v2 (ChirpStack Gateway OS 4.9.0)
  • Network Server: ChirpStack LNS running in Docker
  • End devices: Dragino PS-LB-NA

Deployment includes a “border gateway” plus two 1-hop relay gateways, but we also reproduced the issue in non-mesh mode.

Symptoms / impact

  • ~70 of 151 devices affected (the remainder are working flawlessly).
  • Devices repeatedly transmit JoinRequest; gateway/LNS processes them, but devices never complete OTAA.
  • Rebooting gateway/services doesn’t resolve it.

One problematic device showed no incoming downlinks on serial while on-site; later it joined successfully when tested around 100 m away from the original location.

What we’ve checked / tried

Signal strength doesn’t explain it: failing devices include some with relatively strong RSSI/SNR.

Duplicate uplinks / dedup hypothesis tested: we suspected JoinRequests being heard by two gateways and JoinAccept being sent by the “wrong” one. We turned mesh/relay off overnight (border gateway only) and the issue persisted, so this seems unlikely.

TX path sanity check: the relay gateway receives an extremely strong signal from the border gateway (≈-30 dBm at 4 km, almost too strong), suggesting the border gateway can transmit (at least on that link).
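As a rough cross-check of that reading, here is a minimal free-space estimate of what a 4 km link should look like. This is only a sketch: the 923.3 MHz frequency, the 30 dBm EIRP and the 0 dBi receive antenna gain are assumed values for illustration, not measurements from our site, and real terrain only adds loss on top of free space.

```python
import math


def fspl_db(distance_km: float, freq_mhz: float) -> float:
    """Free-space path loss in dB for a distance in km and a frequency in MHz."""
    return 20 * math.log10(distance_km) + 20 * math.log10(freq_mhz) + 32.44


def expected_rx_dbm(eirp_dbm: float, distance_km: float, freq_mhz: float,
                    rx_gain_dbi: float = 0.0) -> float:
    """Received power under ideal free-space conditions (no obstructions, no fading)."""
    return eirp_dbm - fspl_db(distance_km, freq_mhz) + rx_gain_dbi


# Assumed values: 30 dBm EIRP, 4 km, 923.3 MHz, 0 dBi receive antenna.
loss = fspl_db(4.0, 923.3)
rx = expected_rx_dbm(30.0, 4.0, 923.3)
print(f"FSPL: {loss:.1f} dB, expected RX: {rx:.1f} dBm")
```

Under these assumptions the path loss alone is a little over 100 dB, so a reading near -30 dBm is tens of dB stronger than free space would allow for a 4 km hop.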

Gateway logs show downlink scheduling/acks: gateway receives downlink commands and returns ACK items like “OK/IGNORED” (and sometimes “COLLISION_PACKET/OK”), and the concentrator logs show “Scheduled packet for TX”.
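To make sure we are reading those ACK items correctly, here is a small sketch of how we interpret them. It assumes (this is our reading, not something the logs confirm) that the items arrive in the order RX1 then RX2 and that the first "OK" marks the window the gateway actually queued; describe_tx_ack is just a hypothetical helper name.

```python
def describe_tx_ack(items: list[str]) -> str:
    """Summarise a list of per-item TX ACK statuses such as ["OK", "IGNORED"].

    Assumption: item 0 is the RX1 attempt and item 1 is the RX2 attempt.
    """
    windows = ["RX1", "RX2"]
    for index, status in enumerate(items):
        if status == "OK":
            label = windows[index] if index < len(windows) else f"item {index}"
            return f"scheduled in {label}"
    # No item was accepted, so nothing was queued for transmission.
    return "not scheduled: " + "/".join(items)


print(describe_tx_ack(["OK", "IGNORED"]))           # RX1 accepted, RX2 not needed
print(describe_tx_ack(["COLLISION_PACKET", "OK"]))  # RX1 rejected, RX2 accepted
```

Read this way, both patterns we see mean the JoinAccept was queued, just in different receive windows. Neither says anything about what was actually radiated.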

Open questions

Are there known RAK7289v2 / SX1302 failure modes where downlinks can be “scheduled” but not actually transmitted or are transmitted with poor signal?

Could this be timing/clock drift (progressively worsening over time), GPS/PPS issues, or an AU915 RX1/RX2 / channel-mask mismatch that only affects downlinks/joins on some devices?
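For anyone checking the RX1/RX2 side with a spectrum analyzer or SDR, this is the AU915 channel plan as we understand it from the LoRaWAN regional parameters, written out as a small sketch. The function names are our own, and the values should be verified against the regional parameters version your devices and server actually use.

```python
JOIN_ACCEPT_DELAY1_S = 5   # RX1 opens 5 s after the end of the JoinRequest
JOIN_ACCEPT_DELAY2_S = 6   # RX2 opens 6 s after the end of the JoinRequest
AU915_RX2_FREQ_HZ = 923_300_000  # default RX2 frequency


def au915_uplink_freq_hz(channel: int) -> int:
    """Centre frequency of an AU915 uplink channel (0-63: 125 kHz, 64-71: 500 kHz)."""
    if 0 <= channel <= 63:
        return 915_200_000 + 200_000 * channel
    if 64 <= channel <= 71:
        return 915_900_000 + 1_600_000 * (channel - 64)
    raise ValueError(f"invalid AU915 uplink channel: {channel}")


def au915_rx1_freq_hz(uplink_channel: int) -> int:
    """RX1 downlink frequency: one of 8 downlink channels, chosen by uplink channel modulo 8."""
    if not 0 <= uplink_channel <= 71:
        raise ValueError(f"invalid AU915 uplink channel: {uplink_channel}")
    return 923_300_000 + 600_000 * (uplink_channel % 8)


# Example: a JoinRequest on uplink channel 8 (the first channel of sub-band 2).
ch = 8
print(au915_uplink_freq_hz(ch), au915_rx1_freq_hz(ch), AU915_RX2_FREQ_HZ)
```

If a JoinRequest goes up on channel 8, the RX1 JoinAccept should appear on 923.3 MHz about 5 s later, and an RX2 JoinAccept on 923.3 MHz about 6 s later, which narrows down where and when to look.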

We can share gateway logs and JoinRequest/JoinAccept examples if helpful.

Thank you so much for your help :-)

2 Answers

Thanks Orne,
We do get a log message (on both LNS and gateway) indicating that the join-accept downlink has been sent.

We will bring a spectrum analyzer with us when we visit the system next.

I'll keep you posted on how this issue unfolds.

Update 25/2/26:
We found the primary cause of the issue to be damaged gateway hardware. We believe the gateway was damaged during a thunderstorm (one occurred around the time things began failing). Replacing the RAK5146 inside the gateway resolved this issue immediately. We believe the symptoms were caused by the packets being received by the gateway that was much further away. Due to the asymmetry of demodulation in AU915 (and the gateway having a more sensitive antenna), uplinks from the end devices were being received and answered by the more distant gateway, whose downlink signal wasn’t strong enough to get back to them.
The second issue we encountered was that the default deduplication time (200ms) was too short to reliably receive all copies of a packet from the mesh before a decision was made. We found that increasing this to 500ms was enough to allow the devices to connect.
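To illustrate why the window length matters, here is a simplified sketch of the effect. The arrival times and RSSI values are made up for illustration, and picking the strongest RSSI is a simplification of how the server really chooses the downlink gateway. As far as we know the setting involved is deduplication_delay under [network] in chirpstack.toml, but please check that against your ChirpStack version.

```python
from dataclasses import dataclass


@dataclass
class UplinkCopy:
    gateway: str      # hypothetical gateway label
    arrival_ms: int   # arrival time at the server, relative to the first copy
    rssi_dbm: int     # RSSI reported for this copy


def copies_in_window(copies: list[UplinkCopy], window_ms: int) -> list[UplinkCopy]:
    """Keep only the copies that arrive within the deduplication window."""
    first = min(c.arrival_ms for c in copies)
    return [c for c in copies if c.arrival_ms - first <= window_ms]


def pick_downlink_gateway(copies: list[UplinkCopy], window_ms: int) -> str:
    """Simplified selection: strongest RSSI among the copies seen in time."""
    return max(copies_in_window(copies, window_ms), key=lambda c: c.rssi_dbm).gateway


# Made-up example: the relayed copy is much stronger but arrives 320 ms later.
copies = [
    UplinkCopy("far-gateway", 0, -112),
    UplinkCopy("near-relay", 320, -85),
]
print(pick_downlink_gateway(copies, 200))  # relayed copy arrives too late
print(pick_downlink_gateway(copies, 500))  # relayed copy is considered
```

With the shorter window the late copy never takes part in the decision, so the downlink goes out through the distant gateway even though a much better path exists.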
The -30dBm TX path we were seeing was an echo from the gateway when sending mesh frames.
Thank you for your help in resolving the issue 😊

For the failing OTAA activations, do you see a JoinAccept in the LoRaWAN frames tab (ChirpStack web-interface, device view)? (I believe you do, based on the info above.)

If that is the case, then this confirms that the downlink was accepted by the gateway (ChirpStack only shows the JoinAccept / downlinks after receiving a TX ACK). Note that the TX ACK is sent before the actual transmission; it is more a confirmation that the downlink can be put in the queue.

Then it would mean that the issue is at the gateway, or between the gateway and the device. E.g. I have seen the same issue in the past where the device > gateway path was very good in terms of link margin, but the gateway > device path was very bad (the issue in that case was with the device antenna).

Things you could look at:

  • You should see a log message when the join-accept downlink has been sent
  • Using a spectrum analyzer could also help to debug if the gateway is actually transmitting the downlink (Air Spy SDR + https://www.gqrx.dk/ is what I often use).