Setting up a load balancer for your application? Ever wondered how a response from the backend server finds its way back to the client? Picture this: a user pings your service via the load balancer's IP, but the reply comes from some backend server's address instead. Boom! The client's TCP/IP stack freaks out, thinking, "Wait, that's not who I talked to!" and drops the connection. If you're curious how networks dodge this bullet without sacrificing speed or security, stick around. We'll unpack it all, from the basics to the clever fixes, and by the end you'll see why this stuff is fascinating.
Before diving into the topic, let’s take a moment to revisit what load balancers are, for readers who might be unfamiliar. Load balancers are essentially the bouncers of the internet world—they stand between your users and your servers, spreading out incoming requests so no one machine gets slammed. Whether it's a shopping site during a flash sale or a streaming service during prime time, these tools ensure smooth sailing by distributing the load evenly. They can detect if a server's acting up and reroute traffic elsewhere, all while scaling up as needed.
Load balancers aren't one-size-fits-all; they operate at different "layers" of the network stack, which basically means they differ in how much of each request they inspect and how fast they can push traffic through.
| Aspect | Layer 4 (e.g., NLB) | Layer 7 (e.g., ALB) |
|------------------------|--------------------------------|--------------------------------------|
| What they check | IPs, ports, protocols | URLs, headers, content |
| Speed & Efficiency | Super fast, low overhead | Smarter but slightly slower |
| Best For | High-volume, low-latency needs | Content-based routing, like APIs |
| Client IP Handling | Often preserves it | Usually masks it with translation |
In a typical setup with client IP preservation (common in NLB for accurate logging and security), the client sends a request to the load balancer's VIP. The balancer forwards it to a backend server without changing the source IP, so the backend sees the real client's address. The server processes the request and crafts a response, setting its own IP as the source and the client's IP as the destination. If this response were sent directly into the network, it would bypass the balancer, leading to asymmetric routing. More critically, the client—having initiated the connection to the VIP—expects the response's source IP to match that VIP. A packet arriving from the backend's IP instead? The TCP stack flags it as invalid, potentially resetting the connection.
Client
| Request
v
Load Balancer (VIP)
| Request
v
Backend Server (sees Client IP)
| Response
v
Client <-- (Response comes from Backend IP instead of VIP ❌)
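To make the failure concrete, here's a toy Python sketch of how a client's TCP stack matches incoming segments to connections. Every address and port in it is made up for illustration:

```python
# Toy model: TCP matches each incoming segment to a connection by the exact
# 4-tuple the client opened. All addresses/ports here are illustrative.

CLIENT_IP, VIP, BACKEND_IP = "198.51.100.7", "203.0.113.10", "10.0.1.21"

# The client connected to the VIP, so this is the only connection it knows.
connections = {(CLIENT_IP, 40000, VIP, 443): "ESTABLISHED"}

def receive(src_ip, src_port, dst_ip, dst_port):
    # The client looks up (local ip, local port, remote ip, remote port).
    key = (dst_ip, dst_port, src_ip, src_port)
    return "accepted" if key in connections else "dropped (unknown peer, RST)"

print(receive(VIP, 443, CLIENT_IP, 40000))         # accepted
print(receive(BACKEND_IP, 443, CLIENT_IP, 40000))  # dropped (unknown peer, RST)
```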
So, networks must ensure responses appear to come from the balancer, maintaining symmetry and IP consistency. But how? Enter traditional techniques like Source Network Address Translation (SNAT) and Direct Server Return (DSR), before we get to AWS's advanced solution.
SNAT is the go-to for simplicity. Here's how it rolls: When the load balancer forwards a request to the backend, it rewrites the source IP from the client's to its own VIP. The backend thinks the request came from the balancer and responds accordingly—to the VIP. The balancer then reverses the translation, sending the response back to the client with the VIP as the source IP.
Request Path:
Client (IP=C1)
|
v
Load Balancer (VIP) -- rewrites source IP to VIP
|
v
Backend Server (sees source=VIP, dest=Backend)
Response Path:
Backend Server (src=Backend, dest=VIP)
|
v
Load Balancer -- rewrites src to VIP, dst back to client IP
|
v
Client (sees src=VIP, dest=C1 ✅)
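Here's a bare-bones sketch of the translation table behind this dance. It's a simplification (a real balancer also tracks protocols, timeouts, and port exhaustion), and all the addresses and port numbers are invented:

```python
import itertools

VIP, BACKEND_IP = "203.0.113.10", "10.0.1.21"
lb_ports = itertools.count(50000)  # source ports the LB uses toward backends
nat_table = {}                     # lb_port -> (client_ip, client_port)

def forward_request(client_ip, client_port):
    """Request path: SNAT client -> VIP, DNAT VIP -> backend."""
    lb_port = next(lb_ports)
    nat_table[lb_port] = (client_ip, client_port)
    return {"src": (VIP, lb_port), "dst": (BACKEND_IP, 443)}

def forward_response(lb_port):
    """Response path: reverse the translation from the stored table entry."""
    client_ip, client_port = nat_table.pop(lb_port)
    return {"src": (VIP, 443), "dst": (client_ip, client_port)}

req = forward_request("198.51.100.7", 40000)
print(req)                              # backend sees src=VIP
print(forward_response(req["src"][1]))  # client sees src=VIP, dst=itself
```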
Pros: Easy setup, works anywhere, handles stateful stuff like sessions beautifully. Cons: Backend loses the original client IP, so say goodbye to accurate geo-tracking or per-user logs without extra headers. When to use it: Pretty much always for Layer 7 balancers or when you don't need client IP visibility on the backend. It's cloud-friendly and doesn't care about network topology—great for distributed setups where servers are scattered.
For performance-focused setups, DSR preserves the client IP on the inbound path. The load balancer forwards requests without SNAT, but the backend is configured with the VIP on a loopback interface. When responding, the server spoofs the VIP as its source IP, sending packets directly to the client—bypassing the balancer on the return path. This reduces load on the balancer, boosting throughput for bandwidth-heavy apps. The client sees the expected VIP as the source, avoiding disconnection. However, it requires each backend server to have the LB's VIP configured as a loopback address.
MAC addresses operate at Layer 2 (the Data Link layer) and are only relevant within the same local network segment (e.g., a VLAN or subnet). In DSR, the key assumption is that the load balancer and backend servers sit on the same Layer 2 network, with no routing hop between them. Here's how MAC addresses come into play: rather than rewriting anything at the IP layer, the load balancer forwards each request by swapping only the destination MAC address to the chosen backend's MAC. The IP packet inside stays untouched, and because the backend holds the VIP on its loopback interface, it happily accepts a packet addressed to the VIP.
This is why DSR is sometimes called "Layer 2 DSR" or "triangle mode." It's efficient because it avoids full IP rewriting, but it relies on L2 proximity.
Client (src=C1, dst=VIP, MAC=LB)
|
v
Load Balancer -- rewrites dst MAC=Backend, IP untouched
|
v
Backend Server (VIP on loopback, sees src=C1, dst=VIP)
The backend constructs the response packet with:
- Source IP = the VIP (so the client sees the address it expects)
- Destination IP = the client's IP
- Source MAC = the backend's own MAC
- Destination MAC = the next hop's MAC (typically the default gateway toward the client)
No MAC translation is required here because the backend sends the packet directly into the network. As it traverses routers toward the client, MAC addresses are rewritten hop-by-hop anyway (standard Ethernet behavior).
The client receives the packet with the expected source IP (VIP), and the TCP stack accepts it. The source MAC at delivery will be whatever the client's local router/gateway uses—not relevant to the connection.
Backend Server (src=VIP, dst=C1, srcMac=BackendMAC, dstMac=next-hop MAC)
|
v
Direct to Client (bypasses LB)
|
v
Client (sees src=VIP ✅)
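The whole triangle fits in a few lines of toy Python. Frames are plain dicts and every MAC/IP value is a placeholder; this is a model of the idea, not a real forwarder:

```python
VIP, CLIENT_IP = "203.0.113.10", "198.51.100.7"
LB_MAC, BACKEND_MAC, GW_MAC = ("02:00:00:00:00:01",
                               "02:00:00:00:00:02",
                               "02:00:00:00:00:03")

def lb_forward(frame):
    """DSR load balancer: rewrite only the destination MAC; L3 is untouched."""
    return {**frame, "dst_mac": BACKEND_MAC}

def backend_reply(frame):
    """Backend (VIP on loopback): answer as the VIP, straight to the client."""
    assert frame["dst_ip"] == VIP  # accepted locally thanks to the loopback VIP
    return {"src_mac": BACKEND_MAC, "dst_mac": GW_MAC,  # next hop, not the LB
            "src_ip": VIP, "dst_ip": frame["src_ip"]}   # VIP spoofed as source

request = {"src_mac": "02:00:00:00:00:aa", "dst_mac": LB_MAC,
           "src_ip": CLIENT_IP, "dst_ip": VIP}
print(backend_reply(lb_forward(request)))  # reply src_ip == VIP, LB bypassed
```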
AWS didn't just build load balancers; they engineered an entire distributed system to handle networking at hyperscale. At the heart of AWS's Network Load Balancer (NLB) is Hyperplane—a sophisticated, software-defined networking (SDN) subsystem integrated into Amazon VPC. It's not a single device but a distributed packet-forwarding engine running on fleets of EC2 instances across Availability Zones, powering services like NLB, NAT Gateway, and even Lambda VPC networking.
Hyperplane creates the illusion of a traditional network while operating in a fully virtualized, distributed environment. Here's the step-by-step magic, focusing on solving the IP mismatch:
Hyperplane's Role in the Big Picture
As noted above, Hyperplane runs on fleets of EC2 instances across Availability Zones (AZs) and powers several AWS services beyond NLB, such as NAT Gateways and VPC endpoints. For NLB specifically, Hyperplane acts as the "brain" that makes the load balancer appear as a single seamless, scalable entity while manipulating traffic transparently.
When you create an NLB, it provisions Elastic Network Interfaces (ENIs) in your specified subnets (one per AZ typically). Incoming packets destined for the NLB's IP arrive at these ENIs, but Hyperplane intercepts and processes them within the VPC data plane. This setup allows NLB to scale horizontally without bottlenecks, handling millions of flows per second.
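For reference, spinning up an NLB across two subnets looks roughly like this with boto3 (the name, region, and subnet IDs below are placeholders):

```python
import boto3

# Placeholder region, name, and subnet IDs -- substitute your own.
elbv2 = boto3.client("elbv2", region_name="us-east-1")

resp = elbv2.create_load_balancer(
    Name="demo-nlb",
    Type="network",               # NLB rather than ALB
    Scheme="internet-facing",
    Subnets=["subnet-aaa111", "subnet-bbb222"],  # one ENI per subnet/AZ
)
print(resp["LoadBalancers"][0]["DNSName"])
```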
Backend Server Selection
Hyperplane decides which backend server (target) gets the request using a consistent flow hashing algorithm. Each connection is identified by its 5-tuple (protocol, source IP, source port, destination IP, destination port); hashing that tuple means every packet of a flow maps to the same target, so a connection never gets split across backends. See the sketch below.
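Hyperplane's actual internals aren't public, so here's only a hedged sketch of the idea: hash the 5-tuple, and the flow stays pinned to one target.

```python
import hashlib

targets = ["10.0.1.21", "10.0.2.34", "10.0.3.56"]  # registered target IPs

def pick_target(proto, src_ip, src_port, dst_ip, dst_port):
    # Hash the flow's 5-tuple; every packet of a connection lands on the
    # same target because the tuple (and hence the hash) never changes.
    flow = f"{proto}|{src_ip}|{src_port}|{dst_ip}|{dst_port}".encode()
    digest = hashlib.sha256(flow).digest()
    return targets[int.from_bytes(digest[:8], "big") % len(targets)]

a = pick_target("tcp", "198.51.100.7", 40000, "203.0.113.10", 443)
b = pick_target("tcp", "198.51.100.7", 40000, "203.0.113.10", 443)
assert a == b  # same flow, same backend -- the connection stays pinned
```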
Hyperplane uses techniques like shuffle sharding to distribute workloads across nodes, minimizing blast radius from failures and enabling massive scale.
Packet Manipulation and Routing
Hyperplane performs network address translation (NAT) and other manipulations in-flight, creating the "illusion" of direct communication while ensuring symmetry and client IP preservation. No changes are needed on your backend servers.
Inbound Traffic (Client to Backend): Hyperplane rewrites the destination IP from the VIP to the selected target's IP (DNAT) while leaving the client's source IP untouched, so the backend sees the real client address.
Outbound Traffic (Backend to Client): the backend's reply (src=backend, dst=client) is intercepted in the VPC data plane, and Hyperplane rewrites the source IP back to the VIP (SNAT on the return path), so the reply arrives from the address the client expects.
All this happens without the backend knowing about the NLB; it thinks it's communicating directly with the client.
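As a toy model of those two rewrites (purely illustrative, with made-up addresses):

```python
VIP, BACKEND_IP, CLIENT_IP = "203.0.113.10", "10.0.1.21", "198.51.100.7"

def inbound(pkt):
    """Client -> backend: DNAT the VIP to the target; client source untouched."""
    return {**pkt, "dst_ip": BACKEND_IP}

def outbound(pkt):
    """Backend -> client: SNAT the backend source back to the VIP."""
    return {**pkt, "src_ip": VIP}

print(inbound({"src_ip": CLIENT_IP, "dst_ip": VIP}))          # backend sees client IP
print(outbound({"src_ip": BACKEND_IP, "dst_ip": CLIENT_IP}))  # client sees the VIP
```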
State Management
Because flow state (which connection maps to which target) is tracked and shared across Hyperplane's distributed nodes, any node can handle a packet for an existing flow, and connections survive individual node failures. Putting it all together:
Request Path:
Client (src=C1, dst=VIP)
|
v
NLB Elastic Network Interface (ENI in AZ)
|
v
[ Hyperplane Distributed Nodes ]
| - Select backend via 5-tuple hash
| - DNAT: VIP -> Backend IP
| - Preserve Client IP
|
v
Backend Server (sees Client IP)
Response Path:
Backend Server (src=Backend, dst=C1)
|
v
[ Hyperplane Nodes ]
| - SNAT: Backend -> VIP
| - Ensure symmetry + client trust
|
v
Client (sees src=VIP, dst=C1 ✅)
We've gone from "What's a load balancer?" to cracking the response-routing riddle. Whether you go with SNAT for ease, DSR for raw speed (but only in tight-knit networks), or Hyperplane for cloud elegance, the goal's the same: Keep those connections alive and kicking. Next time you're architecting a system, think about this early—it'll save you headaches down the line :).