How to Conduct High Volume Load Tests on Endpoints

API load testing is the systematic process of applying synthetic traffic to an application programming interface to evaluate its performance under specific concurrency levels. This methodology identifies the saturation point of the request-response cycle and determines how the system handles increased throughput before failure. Within high density infrastructure, API load testing functions as a validation gate for the integration layer; it ensures that the orchestration between load balancers, ingress controllers, and back end microservices remains stable during traffic spikes. The primary objective is to quantify the relationship between request rate, response latency, and resource consumption. Skipping these tests leaves bottlenecks undiscovered in the TCP stack, connection pooling mechanisms, or database contention layers. When an endpoint reaches capacity, the failure propagates upstream through cascading timeouts, potentially triggering circuit breaker trips or service-wide outages. Effective testing requires an environment that mirrors the production networking topology to account for network latency, packet loss, and proxy overhead.

| Parameter | Value |
| :--- | :--- |
| Target Protocol Support | HTTP/1.1, HTTP/2, gRPC, WebSockets, MQTT |
| Concurrency Limit (Per Node) | 10,000 to 50,000 Virtual Users (VU) |
| Recommended OS | Linux (Ubuntu 22.04 LTS or RHEL 9) |
| Target Ports | 80, 443, 8080, 8443, 50051 (gRPC) |
| Security Protocols | TLS 1.2, TLS 1.3, mTLS |
| CPU Requirement | 8 vCPUs per 20,000 RPS (Requests Per Second) |
| RAM Requirement | 16GB minimum (scaled by payload size) |
| Network Interface | 10Gbps+ for high volume egress |
| Standard Compliance | RFC 7230 (HTTP/1.1, supersedes RFC 2616), RFC 7540 (HTTP/2) |
| Execution Environment | Isolated VPC or Dedicated Subnet |
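
The per-node figures in the table can be turned into a rough capacity plan. The sketch below is only an estimate under stated assumptions: `planGenerators` is a hypothetical helper, the CPU ratio is taken directly from the table, and the virtual user ceiling uses the conservative end of the 10,000 to 50,000 range.

```javascript
// Ratios taken from the parameter table; treat them as starting estimates.
const RPS_PER_VCPU = 20000 / 8;   // 8 vCPUs per 20,000 RPS
const MAX_VUS_PER_NODE = 10000;   // conservative end of the per-node VU range

// Hypothetical helper: how many generator nodes does a planned test need?
function planGenerators(targetRps, targetVus, vcpusPerNode) {
  const totalVcpus = Math.ceil(targetRps / RPS_PER_VCPU);
  const nodesForCpu = Math.ceil(totalVcpus / vcpusPerNode);
  const nodesForVus = Math.ceil(targetVus / MAX_VUS_PER_NODE);
  // The tighter of the two constraints decides the node count.
  return { totalVcpus, nodes: Math.max(nodesForCpu, nodesForVus, 1) };
}

// 50,000 RPS with 30,000 virtual users on 8-vCPU nodes
const plan = planGenerators(50000, 30000, 8);
console.log(plan.totalVcpus, plan.nodes); // 20 3
```

Real capacity depends on payload size, TLS usage, and script complexity, so a short calibration run on one node is a safer basis than the table alone.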

Environment Prerequisites

Successful high volume testing requires an execution environment configured for high performance networking. The load generating machines must have the ulimit value for open files increased to at least 65535 to prevent “too many open files” errors during socket allocation. The kernel must be tuned to handle rapid connection recycling; specifically, net.ipv4.tcp_tw_reuse should be enabled. Ensure that the testing suite, such as k6, Locust, or JMeter, is installed and matches the architecture of the load generators. Network security groups must permit egress from the generators to the target endpoints on specified ports, and any intermediate Web Application Firewalls (WAF) must have the testing IP ranges whitelisted to prevent rate limiting from skewing results.

Implementation Logic

The architecture of a high volume test relies on distributed execution to avoid the “observer effect,” where the monitoring of the test consumes the resources needed to generate the load. A controller node distributes test scripts to multiple worker nodes. This approach avoids NIC saturation on a single instance and sidesteps the roughly 64k ephemeral port limit that applies to each source IP address when it connects to a single destination IP and port. The dependency chain flows from the load generator through the global load balancer, into the cluster ingress, and finally to the pod or service. Encapsulation occurs at each layer, with TLS termination usually happening at the edge or ingress. The test aims to exhaust the worker thread pools of the destination service. As concurrency increases, established connections wait in the listening socket's accept queue until the application accepts them. The length of that queue is capped by net.core.somaxconn; once it is full, the kernel begins dropping incoming SYN packets, and clients see connection timeouts.

Infrastructure Preparation and Kernel Tuning

Before initiating high volume requests, the operating system on the load generator must be tuned to handle a very large number of sockets. This involves modifying the sysctl parameters to expand the ephemeral port range and to allow sockets in the TIME_WAIT state to be reused for new outgoing connections.

```bash
# Expand the ephemeral port range
sudo sysctl -w net.ipv4.ip_local_port_range="1024 65535"

# Allow new outgoing connections to reuse sockets in TIME_WAIT
sudo sysctl -w net.ipv4.tcp_tw_reuse=1

# Raise the open file descriptor limit for the current shell
ulimit -n 100000

# Increase the maximum number of connection tracking entries
sudo sysctl -w net.netfilter.nf_conntrack_max=1048576
```

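System Note: The sysctl changes apply to the kernel's network stack, while ulimit -n only affects the current shell and the processes it starts. In a containerized environment like Docker or Kubernetes, a container normally runs in its own network namespace, so namespaced settings such as net.ipv4.ip_local_port_range must be applied to the pod itself, for example through a privileged init container. Host-wide settings such as nf_conntrack_max must be set on the host node.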

Test Script and Payload Definition

Develop an idempotent test script using a tool like k6. The script must define the target URL, headers, and the distribution of virtual users. Idempotency is critical: repeated runs should leave the system in the same state. Prefer idempotent methods such as GET, PUT, or DELETE. POST is not idempotent by definition, so where it is required, target operations that do not create records, bloat the database, or cause side effects that degrade performance over subsequent runs.

```javascript
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '2m', target: 500 }, // Ramp up
    { duration: '5m', target: 500 }, // Stay at peak
    { duration: '2m', target: 0 },   // Ramp down
  ],
};

export default function () {
  const url = 'https://api.internal.system/v1/resource';
  const payload = JSON.stringify({
    id: 'test_id',
    action: 'ping',
  });

  const params = {
    headers: {
      'Content-Type': 'application/json',
      // The token value is omitted here; append a valid bearer token.
      'Authorization': 'Bearer ',
    },
  };

  const res = http.post(url, payload, params);
  check(res, { 'status was 200': (r) => r.status === 200 });
  sleep(0.1); // Pause 100 ms between iterations
}
```

System Note: Using a sleep interval prevents a single virtual user from looping as fast as the CPU allows and monopolizing a core. It simulates realistic user pacing and allows the load generator to run more concurrent virtual users.
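
The sleep interval also bounds the request rate a given number of virtual users can produce. As a rough sketch, assuming each virtual user sends one request per iteration and waits for the response before sleeping, the steady-state rate is the user count divided by the iteration time. `estimateRps` is a hypothetical helper, and the 50 ms response time is an assumed figure, not a measurement.

```javascript
// Estimate steady-state requests per second for a closed-loop test where
// every virtual user repeats: send request, wait for response, sleep.
function estimateRps(virtualUsers, avgResponseSeconds, sleepSeconds) {
  const iterationSeconds = avgResponseSeconds + sleepSeconds;
  return virtualUsers / iterationSeconds;
}

// 500 virtual users, assumed 50 ms average response, 100 ms sleep
console.log(Math.round(estimateRps(500, 0.05, 0.1))); // 3333
```

If the target slows down under load, the iteration time grows and the achieved rate falls, which is why the request rate reported by the tool can end up below this estimate.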

Distributed Execution Deployment

For high volume tests exceeding 10,000 RPS, use a distributed model. Deploy worker nodes across multiple availability zones to ensure that the bottleneck is the target application and not the cloud provider’s regional network throughput.

```bash
# Example command for starting a distributed agent in a k6 operator context
kubectl apply -f k6-test-deployment.yaml

# Verification of agent readiness
kubectl get k6s
```

System Note: Monitor the resource utilization of the load generators via htop or nodestat. If CPU usage on the generator exceeds 80 percent, the latency metrics reported by the tool will be inaccurate due to internal processing delays.
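
The 80 percent rule can be checked programmatically from two CPU time samples. This is a minimal sketch: the sample objects mirror the `times` fields that Node's `os.cpus()` reports per core (`user`, `nice`, `sys`, `idle`, `irq`), and the function names are hypothetical.

```javascript
const GENERATOR_CPU_LIMIT = 0.8; // 80 percent threshold from the note above

// Fraction of time the CPU was busy between two cumulative samples.
function cpuUtilization(before, after) {
  const total = (t) => t.user + t.nice + t.sys + t.idle + t.irq;
  const totalDelta = total(after) - total(before);
  if (totalDelta <= 0) return 0; // no time elapsed between the samples
  const idleDelta = after.idle - before.idle;
  return 1 - idleDelta / totalDelta;
}

// Latency figures are only worth trusting while the generator has headroom.
function latencyDataTrustworthy(before, after) {
  return cpuUtilization(before, after) <= GENERATOR_CPU_LIMIT;
}

const t0 = { user: 100, nice: 0, sys: 50, idle: 850, irq: 0 };
const t1 = { user: 400, nice: 0, sys: 250, idle: 1350, irq: 0 };
console.log(cpuUtilization(t0, t1));         // 0.5
console.log(latencyDataTrustworthy(t0, t1)); // true
```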

Monitoring and Data Capture

Integrate the load generators with a time series database like Prometheus or InfluxDB. Capture telemetry from the target infrastructure using node_exporter and cadvisor to correlate request spikes with CPU and memory saturation.

```bash
# View real-time socket statistics on the target server
ss -s

# Check for dropped packets in the kernel
netstat -s | grep "SYNs to LISTEN sockets dropped"
```

System Note: High socket drops in the netstat output indicate that the application’s listen queue is full. This suggests the application is unable to accept new connections as fast as the load generator is sending them.
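
Because the netstat counter is cumulative since boot, the useful signal is how fast it grows during the test. The sketch below parses two captured outputs and computes drops per second; the helper names and the sample text are illustrative, not captured from a real server.

```javascript
// Extract the cumulative SYN drop counter from `netstat -s` output.
function parseSynDrops(netstatOutput) {
  const match = netstatOutput.match(/(\d+)\s+SYNs to LISTEN sockets dropped/);
  return match ? Number(match[1]) : 0; // the line is absent when nothing was dropped
}

// Drops per second between two captures taken intervalSeconds apart.
function synDropRate(firstOutput, secondOutput, intervalSeconds) {
  return (parseSynDrops(secondOutput) - parseSynDrops(firstOutput)) / intervalSeconds;
}

const before = '    1204 SYNs to LISTEN sockets dropped';
const after = '    1504 SYNs to LISTEN sockets dropped';
console.log(synDropRate(before, after, 10)); // 30
```

A rate that stays at zero during ramp-up and rises at peak load points to the stage where the listen queue starts overflowing.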

Dependency Fault Lines

Ephemeral Port Exhaustion
Root Cause: The load generator runs out of unique source ports to establish new connections.
Symptoms: “Address already in use” errors or “Cannot assign requested address.”
Verification: Run ss -ant | grep -c TIME-WAIT (ss prints the state with a hyphen, unlike netstat). High counts indicate ports are stuck in a wait state.
Remediation: Enable tcp_tw_reuse and increase the range in ip_local_port_range.
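
A quick calculation shows why the port range matters. This sketch assumes the generator closes each connection itself, tcp_tw_reuse is off, and all traffic goes to one destination IP and port, so every closed connection holds its source port for the 60 seconds Linux keeps a socket in TIME_WAIT. `maxNewConnectionsPerSecond` is a hypothetical helper.

```javascript
const TIME_WAIT_SECONDS = 60; // Linux keeps closed sockets in TIME_WAIT for 60 s

// Sustainable rate of new connections before source ports run out.
function maxNewConnectionsPerSecond(portLow, portHigh) {
  const availablePorts = portHigh - portLow + 1;
  return Math.floor(availablePorts / TIME_WAIT_SECONDS);
}

console.log(maxNewConnectionsPerSecond(32768, 60999)); // 470 (Linux default range)
console.log(maxNewConnectionsPerSecond(1024, 65535));  // 1075 (expanded range)
```

Even the expanded range supports only about a thousand new connections per second per destination, which is why connection reuse and additional source IPs matter more than the range itself at high volume.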

DNS Resolution Bottlenecks
Root Cause: Every request triggers a DNS lookup, overwhelming the local resolver or the upstream DNS server.
Symptoms: Intermittent “Host not found” errors or high latency in the initial connection phase.
Verification: Inspect systemd-resolved logs or use dig during the test.
Remediation: Use an IP address for the target or implement a local DNS cache like nscd.

TLS Handshake Overhead
Root Cause: The cryptographic process of establishing TLS connections consumes massive CPU resources on the ingress controller.
Symptoms: High system CPU usage on the load balancer while application CPU remains low.
Verification: Check the metrics for nginx_ingress_controller_ssl_handshake_duration_seconds.
Remediation: Implement session resumption or use faster cipher suites like AES-GCM.

Troubleshooting Matrix

| Error/Symptom | Probable Cause | Verification Command | Remediation |
| :--- | :--- | :--- | :--- |
| HTTP 502 Bad Gateway | Upstream service crash | kubectl logs | Increase pod resources or replicas. |
| HTTP 504 Gateway Timeout | Upstream taking too long | journalctl -u nginx | Optimize DB queries or increase timeout. |
| Connection Refused | Service not listening | netstat -tulpn | Start daemon or check port binding. |
| High I/O Wait | Disk contention | iostat -xz 1 | Upgrade to NVMe or reduce logging verbosity. |
| Request Timeout | Network packet loss | mtr -rw | Check for firewall throttles or ISP issues. |

Performance Optimization

To maximize throughput, raise the net.core.somaxconn parameter on the target server to at least 4096. This allows the OS to queue more pending connections. In the application layer, ensure that connection pooling is enabled; reusing existing TCP connections via HTTP persistent connections (Keep-Alive) significantly reduces the overhead of the three-way handshake and TLS negotiation. Reduce the payload size by using Brotli or Gzip compression to lower the total bytes transmitted over the wire.
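
The benefit of Keep-Alive can be estimated from round trips alone. The sketch below assumes full handshakes with no session resumption: one round trip for TCP, plus two for TLS 1.2 or one for TLS 1.3. It ignores the CPU cost of the cryptography, and `avgSetupOverheadMs` is a hypothetical helper.

```javascript
// Round trips spent on connection setup before the first request is sent.
const SETUP_ROUND_TRIPS = {
  'tls1.2': 1 + 2, // TCP handshake + full TLS 1.2 handshake
  'tls1.3': 1 + 1, // TCP handshake + full TLS 1.3 handshake
};

// Average setup latency added to each request, spread over a connection's lifetime.
function avgSetupOverheadMs(rttMs, tlsVersion, requestsPerConnection) {
  return (SETUP_ROUND_TRIPS[tlsVersion] * rttMs) / requestsPerConnection;
}

console.log(avgSetupOverheadMs(2, 'tls1.3', 1));   // 4 (new connection per request)
console.log(avgSetupOverheadMs(2, 'tls1.3', 100)); // 0.04 (100 requests per connection)
```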

Security Hardening

Load testing should occur within a segmented network environment. Use mTLS to ensure only authorized load generators can communicate with the backend. Implement firewall rules that strictly allow traffic from the testing subnet to the target subnet on defined ports only. Ensure that sensitive data is scrubbed from the request payloads to prevent accidental exposure in logs. Use a dedicated service account with the minimum necessary permissions for any authenticated API calls during the test.

Scaling Strategy

Horizontal scaling is the preferred method for managing high volume load. Instead of increasing the size of a single instance, deploy multiple smaller replicas behind a load balancer. Use a Round Robin or Least Connections algorithm to distribute traffic. For the load generators, use a cluster of nodes managed by an orchestrator like Kubernetes. This allows for linear scaling of the request volume by simply increasing the replica count of the tester pods.

Admin Desk

How do I prevent the load generator from crashing?
Monitor memory usage and set a hard limit on virtual users per node. If the tool uses Go, adjust GOMAXPROCS. Ensure the collector process is decoupled from the generator process to prevent a feedback loop that consumes all available RAM.

Why are my latencies higher in the test than in production?
Check if you are hitting a single NAT gateway which may be throttling throughput. Verify that the load generator is in the same geographic region as the target to minimize propagation delay. Ensure the generator’s CPU is not saturating.

What is the best way to test gRPC endpoints?
Use a tool that supports Protobuf natively, like ghz. Unlike HTTP/1.1, gRPC heavily utilizes multiplexing over HTTP/2; ensure your load generator correctly handles stream management and does not open a new TCP connection for every RPC call.

How can I identify if the database is the bottleneck?
Monitor the iowait metric on the database server. Use pg_stat_activity for PostgreSQL or SHOW PROCESSLIST for MySQL to identify long-running queries or lock contention. If request latency increases while CPU usage on the application servers remains low, the database is likely the cause.

Can I run load tests on production environments?
Only during scheduled maintenance windows and with clear “off” switches. Use header-based routing to direct test traffic to a shadow environment or use a dedicated “test” flag in the payload to prevent data pollution in production databases or analytics.
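
Header-based routing can be as simple as one check at the edge. The following is a minimal sketch rather than a production router: the header name `x-load-test` and the backend labels are hypothetical, and a real deployment would usually express this rule in the ingress or service mesh configuration.

```javascript
const TEST_HEADER = 'x-load-test'; // hypothetical header set by the load generator

// Decide which backend should receive a request based on its headers.
function selectBackend(headers) {
  const normalized = {};
  for (const [name, value] of Object.entries(headers)) {
    normalized[name.toLowerCase()] = value; // HTTP header names are case-insensitive
  }
  return normalized[TEST_HEADER] === 'true' ? 'shadow' : 'production';
}

console.log(selectBackend({ 'X-Load-Test': 'true' }));    // shadow
console.log(selectBackend({ Accept: 'application/json' })); // production
```

Defaulting to production when the header is missing or malformed keeps real users unaffected if the load generator is misconfigured.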
