ChirpStack HA: Gateway Bridge scaling and UDP routing constraints

Hi,

We are working on a production HA deployment of ChirpStack and are facing some challenges scaling chirpstack-gateway-bridge.

Architecture

  • LoRaWAN gateways send UDP (Semtech protocol, port 1700) over private APN (4G)
  • Traffic enters through an F5 load balancer
  • Then goes to multiple packet multiplexer pods (running on OpenShift)
  • From there it is forwarded to multiple instances of chirpstack-gateway-bridge
  • Backend components (MQTT, Redis, PostgreSQL, ChirpStack services) are already deployed in HA

Problem

From another post on the forum:

https://forum.chirpstack.io/t/chirpstack-high-availability-ha-and-scaling

"UDP data from a single gateway should always go to the same GW Bridge instance (because the GW Bridge keeps a state)"

In our architecture:

  • F5 balances traffic towards multiple multiplexer pods
  • each multiplexer forwards UDP packets to the Gateway Bridge layer

The issue is that each gateway should consistently hit the same Gateway Bridge instance, but this is not guaranteed in our current setup.

During tests with real gateway traffic we also observed warnings like:

[store] memorystore del: message X not found

which seems consistent with messages from the same gateway being processed by different instances.

Questions

We would like to better understand how this is usually handled in real deployments:

  • What actually happens if this constraint is not respected?

    • Can it lead to message loss, inconsistent state, or broken flows?
    • Is the warning above something to worry about in production, or can it be ignored?
  • How is this scenario handled when scaling the Gateway Bridge?

  • How do you handle existing gateways already in production when moving to a setup with multiple Gateway Bridge instances?

    • Is there any migration concern or operational impact?
  • What happens in case of failure of one Gateway Bridge instance?

    • If traffic is redistributed to another instance that has never “seen” those gateways before, how is this handled?

We are interested in understanding how others have approached this in real-world environments.

  • Has anyone deployed ChirpStack with multiple Gateway Bridge instances in HA?
  • How did you deal with UDP routing consistency?
  • Any reference architectures or case studies would be very helpful.

Thanks in advance.

1 Answer

The issue is that each gateway should consistently hit the same Gateway Bridge instance, but this is not guaranteed in our current setup.

Yes, this constraint must be respected: each ChirpStack Gateway Bridge instance keeps track of the UDP socket per gateway as well as its MQTT topic subscriptions. If gateways hop between instances, you will (worst-case) end up with all Gateway Bridge instances holding subscriptions for all gateways, meaning that every instance forwards downlink data to your gateways (so you get multiple UDP packets for 1 downlink). There are more issues as well (e.g. the internal cache).

Typically users fix this either by making the load balancer session-sticky per source IP address, so that one gateway always gets routed to the same Gateway Bridge, or by using DNS-based load balancing (multiple A records, each Gateway Bridge with its own public IP). In the DNS case, the gateway resolves the hostname, gets one of the IP addresses, and starts forwarding to that address only.
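To illustrate what a session-sticky policy gives you, here is a minimal Python sketch (not ChirpStack code; the bridge addresses are hypothetical, and it assumes each gateway keeps a stable source IP behind the private APN). It uses rendezvous hashing, so the same gateway always maps to the same Gateway Bridge instance, and if one instance fails, only the gateways that were on it are remapped:

```python
import hashlib

# Hypothetical Gateway Bridge backends; replace with your real instances.
BRIDGES = ["gw-bridge-0:1700", "gw-bridge-1:1700", "gw-bridge-2:1700"]

def pick_bridge(src_ip: str, bridges=BRIDGES) -> str:
    """Rendezvous (highest-random-weight) hashing.

    The same source IP always maps to the same bridge, which is the
    stickiness property an IP-hash load-balancer policy provides.
    Removing a failed bridge from the list only remaps the gateways
    that were assigned to it; all others keep their instance.
    """
    def score(bridge: str) -> int:
        digest = hashlib.sha256(f"{src_ip}|{bridge}".encode()).digest()
        return int.from_bytes(digest[:8], "big")
    return max(bridges, key=score)

# Same gateway IP -> same bridge, deterministically.
assert pick_bridge("10.20.30.40") == pick_bridge("10.20.30.40")
```

This also answers the failover question: on instance failure, the redistributed gateways simply re-establish their state (UDP socket, MQTT subscription) on the new instance; only in-flight messages during the switch are at risk.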