Setting up a load balancer for your application? Ever wondered how a response from the backend server finds its way back to the client? Picture this: a user pings your service via the load balancer's IP, but the reply comes from some backend server's address instead. Boom! The client's TCP/IP stack freaks out, thinking, "Wait, that's not who I talked to!" and drops the connection. If you're curious how networks dodge this bullet without sacrificing speed or security, stick around. We'll unpack it all, from the basics to the clever fixes, and by the end you'll see why this stuff is fascinating.
Before diving into the topic, let’s take a moment to revisit what load balancers are, for readers who might be unfamiliar. Load balancers are essentially the bouncers of the internet world—they stand between your users and your servers, spreading out incoming requests so no one machine gets slammed. Whether it's a shopping site during a flash sale or a streaming service during prime time, these tools ensure smooth sailing by distributing the load evenly. They can detect if a server's acting up and reroute traffic elsewhere, all while scaling up as needed.
Load balancers aren't one-size-fits-all; they operate at different "layers" of the network stack, which basically means they differ in how much of each request they inspect and how fast they can push traffic through.
| Aspect | Layer 4 (e.g., NLB) | Layer 7 (e.g., ALB) |
|------------------------|--------------------------------|--------------------------------------|
| What they check | IPs, ports, protocols | URLs, headers, content |
| Speed & Efficiency | Super fast, low overhead | Smarter but slightly slower |
| Best For | High-volume, low-latency needs | Content-based routing, like APIs |
| Client IP Handling | Often preserves it | Usually masks it with translation |
In a typical setup with client IP preservation (common in NLB for accurate logging and security), the client sends a request to the load balancer's VIP. The balancer forwards it to a backend server without changing the source IP, so the backend sees the real client's address. The server processes the request and crafts a response, setting its own IP as the source and the client's IP as the destination. If this response were sent directly into the network, it would bypass the balancer, leading to asymmetric routing. More critically, the client—having initiated the connection to the VIP—expects the response's source IP to match that VIP. A packet arriving from the backend's IP instead? The TCP stack flags it as invalid, potentially resetting the connection.
Client
| Request
v
Load Balancer (VIP)
| Request
v
Backend Server (sees Client IP)
| Response
v
Client <-- (Response comes from Backend IP instead of VIP ❌)
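To make the failure concrete, here's a toy Python sketch of how a client's TCP stack matches incoming segments to connections. Every address and port in it is made up for illustration:

```python
# Toy model: TCP matches each incoming segment to a connection by the exact
# 4-tuple the client opened. All addresses/ports here are illustrative.

CLIENT_IP, VIP, BACKEND_IP = "198.51.100.7", "203.0.113.10", "10.0.1.21"

# The client connected to the VIP, so this is the only connection it knows.
connections = {(CLIENT_IP, 40000, VIP, 443): "ESTABLISHED"}

def receive(src_ip, src_port, dst_ip, dst_port):
    # The client looks up (local ip, local port, remote ip, remote port).
    key = (dst_ip, dst_port, src_ip, src_port)
    return "accepted" if key in connections else "dropped (unknown peer, RST)"

print(receive(VIP, 443, CLIENT_IP, 40000))         # accepted
print(receive(BACKEND_IP, 443, CLIENT_IP, 40000))  # dropped (unknown peer, RST)
```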
So, networks must ensure responses appear to come from the balancer, maintaining symmetry and IP consistency. But how? Enter traditional techniques like Source Network Address Translation (SNAT) and Direct Server Return (DSR), before we get to AWS's advanced solution.
SNAT is the go-to for simplicity. Here's how it rolls: When the load balancer forwards a request to the backend, it rewrites the source IP from the client's to its own VIP. The backend thinks the request came from the balancer and responds accordingly—to the VIP. The balancer then reverses the translation, sending the response back to the client with the VIP as the source IP.
Request Path:
Client (IP=C1)
|
v
Load Balancer (VIP) -- rewrites source IP to VIP
|
v
Backend Server (sees source=VIP, dest=Backend)
Response Path:
Backend Server (src=Backend, dest=VIP)
|
v
Load Balancer -- rewrites src to VIP, dst back to client IP
|
v
Client (sees src=VIP, dest=C1 ✅)
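Here's a bare-bones sketch of the translation table behind this dance. It's a simplification (a real balancer also tracks protocols, timeouts, and port exhaustion), and all the addresses and port numbers are invented:

```python
import itertools

VIP, BACKEND_IP = "203.0.113.10", "10.0.1.21"
lb_ports = itertools.count(50000)  # source ports the LB uses toward backends
nat_table = {}                     # lb_port -> (client_ip, client_port)

def forward_request(client_ip, client_port):
    """Request path: SNAT client -> VIP, DNAT VIP -> backend."""
    lb_port = next(lb_ports)
    nat_table[lb_port] = (client_ip, client_port)
    return {"src": (VIP, lb_port), "dst": (BACKEND_IP, 443)}

def forward_response(lb_port):
    """Response path: reverse the translation from the stored table entry."""
    client_ip, client_port = nat_table.pop(lb_port)
    return {"src": (VIP, 443), "dst": (client_ip, client_port)}

req = forward_request("198.51.100.7", 40000)
print(req)                              # backend sees src=VIP
print(forward_response(req["src"][1]))  # client sees src=VIP, dst=itself
```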
Pros: Easy setup, works anywhere, handles stateful stuff like sessions beautifully. Cons: Backend loses the original client IP, so say goodbye to accurate geo-tracking or per-user logs without extra headers. When to use it: Pretty much always for Layer 7 balancers or when you don't need client IP visibility on the backend. It's cloud-friendly and doesn't care about network topology—great for distributed setups where servers are scattered.
For performance-focused setups, DSR preserves the client IP on the inbound path. The load balancer forwards requests without SNAT, but the backend is configured with the VIP on a loopback interface. When responding, the server spoofs the VIP as its source IP, sending packets directly to the client—bypassing the balancer on the return path. This reduces load on the balancer, boosting throughput for bandwidth-heavy apps. The client sees the expected VIP as the source, avoiding disconnection. However, it requires each backend server to have the LB's VIP configured as a loopback address.
MAC addresses operate at Layer 2 (the Data Link layer) and are only relevant within the same local network segment (e.g., a VLAN or subnet). In DSR, the key assumption is that the load balancer and backend servers sit on the same Layer 2 network, with no routing hop between them. Here's how MAC addresses come into play: rather than rewriting anything at the IP layer, the load balancer forwards each request by swapping only the destination MAC address to the chosen backend's MAC. The IP packet inside stays untouched, and because the backend holds the VIP on its loopback interface, it happily accepts a packet addressed to the VIP.
This is why DSR is sometimes called "Layer 2 DSR" or "triangle mode." It's efficient because it avoids full IP rewriting, but it relies on L2 proximity.
Client (src=C1, dst=VIP, MAC=LB)
|
v
Load Balancer -- rewrites dst MAC=Backend, IP untouched
|
v
Backend Server (VIP on loopback, sees src=C1, dst=VIP)
The backend constructs the response packet with:
- Source IP = the VIP (so the client sees the address it expects)
- Destination IP = the client's IP
- Source MAC = the backend's own MAC
- Destination MAC = the next hop's MAC (typically the default gateway toward the client)
No MAC translation is required here because the backend sends the packet directly into the network. As it traverses routers toward the client, MAC addresses are rewritten hop-by-hop anyway (standard Ethernet behavior).
The client receives the packet with the expected source IP (VIP), and the TCP stack accepts it. The source MAC at delivery will be whatever the client's local router/gateway uses—not relevant to the connection.
Backend Server (src=VIP, dst=C1, srcMac=BackendMAC, dstMac=next-hop MAC)
|
v
Direct to Client (bypasses LB)
|
v
Client (sees src=VIP ✅)
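The whole triangle fits in a few lines of toy Python. Frames are plain dicts and every MAC/IP value is a placeholder; this is a model of the idea, not a real forwarder:

```python
VIP, CLIENT_IP = "203.0.113.10", "198.51.100.7"
LB_MAC, BACKEND_MAC, GW_MAC = ("02:00:00:00:00:01",
                               "02:00:00:00:00:02",
                               "02:00:00:00:00:03")

def lb_forward(frame):
    """DSR load balancer: rewrite only the destination MAC; L3 is untouched."""
    return {**frame, "dst_mac": BACKEND_MAC}

def backend_reply(frame):
    """Backend (VIP on loopback): answer as the VIP, straight to the client."""
    assert frame["dst_ip"] == VIP  # accepted locally thanks to the loopback VIP
    return {"src_mac": BACKEND_MAC, "dst_mac": GW_MAC,  # next hop, not the LB
            "src_ip": VIP, "dst_ip": frame["src_ip"]}   # VIP spoofed as source

request = {"src_mac": "02:00:00:00:00:aa", "dst_mac": LB_MAC,
           "src_ip": CLIENT_IP, "dst_ip": VIP}
print(backend_reply(lb_forward(request)))  # reply src_ip == VIP, LB bypassed
```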
AWS didn't just build load balancers; they engineered an entire distributed system to handle networking at hyperscale. At the heart of AWS's Network Load Balancer (NLB) is Hyperplane—a sophisticated, software-defined networking (SDN) subsystem integrated into Amazon VPC. It's not a single device but a distributed packet-forwarding engine running on fleets of EC2 instances across Availability Zones, powering services like NLB, NAT Gateway, and even Lambda VPC networking.
Hyperplane creates the illusion of a traditional network while operating in a fully virtualized, distributed environment. Here's the step-by-step magic, focusing on solving the IP mismatch:
Hyperplane's Role in the Big Picture
As noted above, Hyperplane runs on fleets of EC2 instances across Availability Zones (AZs) and powers several AWS services beyond NLB, such as NAT Gateways and VPC endpoints. For NLB specifically, Hyperplane acts as the "brain" that makes the load balancer appear as a single seamless, scalable entity while manipulating traffic transparently.
When you create an NLB, it provisions Elastic Network Interfaces (ENIs) in your specified subnets (one per AZ typically). Incoming packets destined for the NLB's IP arrive at these ENIs, but Hyperplane intercepts and processes them within the VPC data plane. This setup allows NLB to scale horizontally without bottlenecks, handling millions of flows per second.
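For reference, spinning up an NLB across two subnets looks roughly like this with boto3 (the name, region, and subnet IDs below are placeholders):

```python
import boto3

# Placeholder region, name, and subnet IDs -- substitute your own.
elbv2 = boto3.client("elbv2", region_name="us-east-1")

resp = elbv2.create_load_balancer(
    Name="demo-nlb",
    Type="network",               # NLB rather than ALB
    Scheme="internet-facing",
    Subnets=["subnet-aaa111", "subnet-bbb222"],  # one ENI per subnet/AZ
)
print(resp["LoadBalancers"][0]["DNSName"])
```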
Backend Server Selection
Hyperplane decides which backend server (target) gets the request using a consistent flow hashing algorithm. Each connection is identified by its 5-tuple (protocol, source IP, source port, destination IP, destination port); hashing that tuple means every packet of a flow maps to the same target, so a connection never gets split across backends. See the sketch below.
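Hyperplane's actual internals aren't public, so here's only a hedged sketch of the idea: hash the 5-tuple, and the flow stays pinned to one target.

```python
import hashlib

targets = ["10.0.1.21", "10.0.2.34", "10.0.3.56"]  # registered target IPs

def pick_target(proto, src_ip, src_port, dst_ip, dst_port):
    # Hash the flow's 5-tuple; every packet of a connection lands on the
    # same target because the tuple (and hence the hash) never changes.
    flow = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(flow).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]

a = pick_target("tcp", "198.51.100.7", 40000, "203.0.113.10", 443)
b = pick_target("tcp", "198.51.100.7", 40000, "203.0.113.10", 443)
assert a == b  # same flow, same backend -- the connection stays pinned
```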
Hyperplane uses techniques like shuffle sharding to distribute workloads across nodes, minimizing blast radius from failures and enabling massive scale.
Packet Manipulation and Routing
Hyperplane performs network address translation (NAT) and other manipulations in-flight, creating the "illusion" of direct communication while ensuring symmetry and client IP preservation. No changes are needed on your backend servers.
Inbound Traffic (Client to Backend): Hyperplane rewrites the destination IP from the VIP to the selected target's IP (DNAT) while leaving the client's source IP untouched, so the backend sees the real client address.
Outbound Traffic (Backend to Client): the backend's reply (src=backend, dst=client) is intercepted in the VPC data plane, and Hyperplane rewrites the source IP back to the VIP (SNAT on the return path), so the reply arrives from the address the client expects.
All this happens without the backend knowing about the NLB; it thinks it's communicating directly with the client.
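As a toy model of those two rewrites (purely illustrative, with made-up addresses):

```python
VIP, BACKEND_IP, CLIENT_IP = "203.0.113.10", "10.0.1.21", "198.51.100.7"

def inbound(pkt):
    """Client -> backend: DNAT the VIP to the target; client source untouched."""
    return {**pkt, "dst_ip": BACKEND_IP}

def outbound(pkt):
    """Backend -> client: SNAT the backend source back to the VIP."""
    return {**pkt, "src_ip": VIP}

print(inbound({"src_ip": CLIENT_IP, "dst_ip": VIP}))          # backend sees client IP
print(outbound({"src_ip": BACKEND_IP, "dst_ip": CLIENT_IP}))  # client sees the VIP
```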
State Management
Because flow state (which connection maps to which target) is tracked and shared across Hyperplane's distributed nodes, any node can handle a packet for an existing flow, and connections survive individual node failures. Putting it all together:
Request Path:
Client (src=C1, dst=VIP)
|
v
NLB Elastic Network Interface (ENI in AZ)
|
v
[ Hyperplane Distributed Nodes ]
| - Select backend via 5-tuple hash
| - DNAT: VIP -> Backend IP
| - Preserve Client IP
|
v
Backend Server (sees Client IP)
Response Path:
Backend Server (src=Backend, dst=C1)
|
v
[ Hyperplane Nodes ]
| - SNAT: Backend -> VIP
| - Ensure symmetry + client trust
|
v
Client (sees src=VIP, dst=C1 ✅)
We've gone from "What's a load balancer?" to cracking the response-routing riddle. Whether you go with SNAT for ease, DSR for raw speed (but only in tight-knit networks), or Hyperplane for cloud elegance, the goal's the same: Keep those connections alive and kicking. Next time you're architecting a system, think about this early—it'll save you headaches down the line :).