Hi,
we are working on a production deployment of ChirpStack in HA and are running into challenges when scaling chirpstack-gateway-bridge.
Architecture
- LoRaWAN gateways send UDP (Semtech protocol, port 1700) over a private APN (4G)
- Traffic enters through an F5 load balancer
- Then goes to multiple packet multiplexer pods (running on OpenShift)
- From there it is forwarded to multiple instances of chirpstack-gateway-bridge
- Backend components (MQTT, Redis, PostgreSQL, ChirpStack services) are already deployed in HA
Problem
From another post on the forum:
https://forum.chirpstack.io/t/chirpstack-high-availability-ha-and-scaling
"UDP data from a single gateway should always go to the same GW Bridge instance (because the GW Bridge keeps a state)"
In our architecture:
- F5 balances traffic towards multiple multiplexer pods
- each multiplexer forwards UDP packets to the Gateway Bridge layer
The issue is that each gateway should consistently hit the same Gateway Bridge instance, but this is not guaranteed in our current setup.
During tests with real gateway traffic we also observed warnings like:
[store] memorystore del: message X not found
which seems consistent with messages from the same gateway being processed by different instances.
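To make the constraint concrete, here is a minimal sketch of the routing behaviour we think we would need in front of the Gateway Bridge layer. It is not something we run today. It assumes the forwarding component can read the gateway EUI from bytes 4 to 11 of the Semtech PUSH_DATA, PULL_DATA and TX_ACK packets and then pick a bridge with rendezvous hashing. The function names and the bridge instance names are hypothetical, and the sketch only covers the gateway-to-bridge direction.

```python
import hashlib
from typing import List, Optional

# Semtech UDP packet types that carry the 8-byte gateway EUI at bytes 4..11.
PUSH_DATA, PULL_DATA, TX_ACK = 0x00, 0x02, 0x05


def gateway_eui(packet: bytes) -> Optional[str]:
    """Return the gateway EUI as hex, or None if the packet does not carry one."""
    if len(packet) < 12 or packet[3] not in (PUSH_DATA, PULL_DATA, TX_ACK):
        return None
    return packet[4:12].hex()


def pick_bridge(eui: str, bridges: List[str]) -> str:
    """Rendezvous (highest-random-weight) hashing: same EUI maps to same bridge."""
    def weight(bridge: str) -> bytes:
        return hashlib.sha256(f"{eui}|{bridge}".encode()).digest()
    return max(bridges, key=weight)


# Hypothetical bridge instance names, for illustration only.
bridges = ["gw-bridge-0", "gw-bridge-1", "gw-bridge-2"]

# A PULL_DATA packet: protocol version 2, token 0x1234, type 0x02, then the EUI.
packet = bytes([0x02, 0x12, 0x34, PULL_DATA]) + bytes.fromhex("0102030405060708")
eui = gateway_eui(packet)
target = pick_bridge(eui, bridges)
```

With this kind of hashing, every multiplexer pod would pick the same bridge for a given gateway without sharing state, and if a bridge other than `target` disappears, this gateway stays where it is. Whether something like this is the recommended approach is part of what we are asking below.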
Questions
We would like to better understand how this is usually handled in real deployments:
- What actually happens if this constraint is not respected?
  - Can it lead to message loss, inconsistent state, or broken flows?
  - Is the warning above something to worry about in production, or can it be ignored?
- How is this scenario handled when scaling the Gateway Bridge?
- How do you handle existing gateways already in production when moving to a setup with multiple Gateway Bridge instances?
  - Is there any migration concern or operational impact?
- What happens in case of failure of one Gateway Bridge instance?
  - If traffic is redistributed to another instance that has never “seen” those gateways before, how is this handled?
We are interested in understanding how others have approached this in real-world environments.
- Has anyone deployed ChirpStack with multiple Gateway Bridge instances in HA?
- How did you deal with UDP routing consistency?
- Any reference architectures or case studies would be very helpful.
Thanks in advance.