# Performance Testing Scenarios
What Happened:
A new microservice passed all performance tests with flying colors, showing response times under 50ms even at high concurrency. However, when deployed to production, the service experienced severe performance degradation, with response times exceeding 2 seconds and frequent timeouts.
Diagnosis Steps:
Compared production and test environment configurations.
Analyzed database connection patterns in both environments.
Reviewed JMeter test scripts and configuration.
Monitored resource utilization during tests and in production.
Examined application logs for connection-related events.
Root Cause:
The load testing tool (JMeter) was configured to use connection pooling, maintaining persistent connections to the application. In contrast, real users in production created new connections for each request. The application was not properly configured for connection pooling, causing it to create a new database connection for each incoming request. This led to connection exhaustion and significant latency in the production environment.
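To make the anti-pattern concrete, here is a minimal sketch (class names, connection URL, and query are illustrative, not taken from the incident) of a per-request connection next to its pooled equivalent:
// UnpooledCustomerRepository.java - hypothetical sketch of the per-request connection anti-pattern
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.DataSource;
public class UnpooledCustomerRepository {
    public int countCustomers() throws SQLException {
        // Opens a brand-new TCP connection and authenticates against the database on every call.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:postgresql://db:5432/app", "app_user", "secret");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM customers")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}
// PooledCustomerRepository.java - the same query borrowing a connection from a shared pool
public class PooledCustomerRepository {
    private final DataSource dataSource; // e.g. a HikariDataSource managed by Spring
    public PooledCustomerRepository(DataSource dataSource) {
        this.dataSource = dataSource;
    }
    public int countCustomers() throws SQLException {
        try (Connection conn = dataSource.getConnection(); // borrowed from the pool, not created
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT count(*) FROM customers")) {
            rs.next();
            return rs.getInt(1);
        }
    }
}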
Fix/Workaround:
• Short-term: Increased database connection pool size and timeout settings:
# application.properties - Updated connection pool configuration
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.minimum-idle=10
spring.datasource.hikari.idle-timeout=30000
spring.datasource.hikari.max-lifetime=60000
spring.datasource.hikari.connection-timeout=10000
spring.datasource.hikari.validation-timeout=5000
• Long-term: Implemented proper connection pooling at all levels:
// DatabaseConfig.java - Proper connection pool configuration with metrics
@Configuration
public class DatabaseConfig {
@Bean
@ConfigurationProperties("spring.datasource.hikari")
public HikariConfig hikariConfig() {
HikariConfig config = new HikariConfig();
// Set reasonable defaults if not specified in properties
if (config.getMaximumPoolSize() <= 0) {
config.setMaximumPoolSize(50);
}
if (config.getMinimumIdle() <= 0) {
config.setMinimumIdle(10);
}
// Enable metrics
config.setMetricRegistry(metricRegistry());
// Add health check
config.setHealthCheckRegistry(healthCheckRegistry());
return config;
}
@Bean
public DataSource dataSource() {
return new HikariDataSource(hikariConfig());
}
@Bean
public MetricRegistry metricRegistry() {
return new MetricRegistry();
}
@Bean
public HealthCheckRegistry healthCheckRegistry() {
return new HealthCheckRegistry();
}
@Bean
public ConnectionPoolMonitor connectionPoolMonitor(MetricRegistry metricRegistry) {
return new ConnectionPoolMonitor(metricRegistry);
}
}
// ConnectionPoolMonitor.java - Monitor connection pool metrics
// Registered as a @Bean in DatabaseConfig above, so @Component is omitted to avoid a duplicate bean definition.
public class ConnectionPoolMonitor {
private static final Logger logger = LoggerFactory.getLogger(ConnectionPoolMonitor.class);
private final MetricRegistry metricRegistry;
public ConnectionPoolMonitor(MetricRegistry metricRegistry) {
this.metricRegistry = metricRegistry;
}
@Scheduled(fixedRate = 60000)
public void reportConnectionPoolStats() {
SortedMap<String, Gauge> gauges = metricRegistry.getGauges(
(name, metric) -> name.startsWith("hikaricp.pool")
);
if (!gauges.isEmpty()) {
logger.info("Connection pool stats:");
gauges.forEach((name, gauge) -> {
logger.info("{}: {}", name, gauge.getValue());
});
}
}
}
• Modified JMeter test scripts to better simulate real-world usage:
<?xml version="1.0" encoding="UTF-8"?>
<!-- JMeter test plan with realistic connection behavior -->
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.4.1">
<hashTree>
<TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Realistic User Simulation" enabled="true">
<stringProp name="TestPlan.comments"></stringProp>
<boolProp name="TestPlan.functional_mode">false</boolProp>
<boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
<boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
<elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="TestPlan.user_define_classpath"></stringProp>
</TestPlan>
<hashTree>
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="User Group" enabled="true">
<stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
<elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
<boolProp name="LoopController.continue_forever">false</boolProp>
<stringProp name="LoopController.loops">10</stringProp>
</elementProp>
<stringProp name="ThreadGroup.num_threads">100</stringProp>
<stringProp name="ThreadGroup.ramp_time">30</stringProp>
<boolProp name="ThreadGroup.scheduler">true</boolProp>
<stringProp name="ThreadGroup.duration">300</stringProp>
<stringProp name="ThreadGroup.delay">0</stringProp>
<boolProp name="ThreadGroup.same_user_on_next_iteration">false</boolProp>
</ThreadGroup>
<hashTree>
<ConfigTestElement guiclass="HttpDefaultsGui" testclass="ConfigTestElement" testname="HTTP Request Defaults" enabled="true">
<elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="HTTPsampler.domain">api.example.com</stringProp>
<stringProp name="HTTPsampler.port">443</stringProp>
<stringProp name="HTTPsampler.protocol">https</stringProp>
<stringProp name="HTTPsampler.contentEncoding"></stringProp>
<stringProp name="HTTPsampler.path"></stringProp>
<stringProp name="HTTPsampler.concurrentPool">4</stringProp>
<stringProp name="HTTPsampler.connect_timeout">5000</stringProp>
<stringProp name="HTTPsampler.response_timeout">30000</stringProp>
</ConfigTestElement>
<hashTree/>
<CookieManager guiclass="CookiePanel" testclass="CookieManager" testname="HTTP Cookie Manager" enabled="true">
<collectionProp name="CookieManager.cookies"/>
<boolProp name="CookieManager.clearEachIteration">true</boolProp>
<boolProp name="CookieManager.controlledByThreadGroup">false</boolProp>
</CookieManager>
<hashTree/>
<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
<collectionProp name="HeaderManager.headers">
<elementProp name="" elementType="Header">
<stringProp name="Header.name">Content-Type</stringProp>
<stringProp name="Header.value">application/json</stringProp>
</elementProp>
<elementProp name="" elementType="Header">
<stringProp name="Header.name">Accept</stringProp>
<stringProp name="Header.value">application/json</stringProp>
</elementProp>
</collectionProp>
</HeaderManager>
<hashTree/>
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Login Request" enabled="true">
<boolProp name="HTTPSampler.postBodyRaw">true</boolProp>
<elementProp name="HTTPsampler.Arguments" elementType="Arguments">
<collectionProp name="Arguments.arguments">
<elementProp name="" elementType="HTTPArgument">
<boolProp name="HTTPArgument.always_encode">false</boolProp>
<stringProp name="Argument.value">{
"username": "${username}",
"password": "${password}"
}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
</collectionProp>
</elementProp>
<stringProp name="HTTPSampler.domain"></stringProp>
<stringProp name="HTTPSampler.port"></stringProp>
<stringProp name="HTTPSampler.protocol"></stringProp>
<stringProp name="HTTPSampler.contentEncoding"></stringProp>
<stringProp name="HTTPSampler.path">/api/login</stringProp>
<stringProp name="HTTPSampler.method">POST</stringProp>
<boolProp name="HTTPSampler.follow_redirects">true</boolProp>
<boolProp name="HTTPSampler.auto_redirects">false</boolProp>
<boolProp name="HTTPSampler.use_keepalive">false</boolProp>
<boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
<stringProp name="HTTPSampler.embedded_url_re"></stringProp>
<stringProp name="HTTPSampler.connect_timeout"></stringProp>
<stringProp name="HTTPSampler.response_timeout"></stringProp>
</HTTPSamplerProxy>
<hashTree>
<JSONPostProcessor guiclass="JSONPostProcessorGui" testclass="JSONPostProcessor" testname="Extract Token" enabled="true">
<stringProp name="JSONPostProcessor.referenceNames">token</stringProp>
<stringProp name="JSONPostProcessor.jsonPathExprs">$.token</stringProp>
<stringProp name="JSONPostProcessor.match_numbers"></stringProp>
</JSONPostProcessor>
<hashTree/>
</hashTree>
<TestAction guiclass="TestActionGui" testclass="TestAction" testname="Think Time" enabled="true">
<intProp name="ActionProcessor.action">1</intProp>
<intProp name="ActionProcessor.target">0</intProp>
<stringProp name="ActionProcessor.duration">0</stringProp>
</TestAction>
<hashTree>
<UniformRandomTimer guiclass="UniformRandomTimerGui" testclass="UniformRandomTimer" testname="Uniform Random Timer" enabled="true">
<stringProp name="ConstantTimer.delay">1000</stringProp>
<stringProp name="RandomTimer.range">5000</stringProp>
</UniformRandomTimer>
<hashTree/>
</hashTree>
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="API Request" enabled="true">
<elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="HTTPSampler.domain"></stringProp>
<stringProp name="HTTPSampler.port"></stringProp>
<stringProp name="HTTPSampler.protocol"></stringProp>
<stringProp name="HTTPSampler.contentEncoding"></stringProp>
<stringProp name="HTTPSampler.path">/api/data</stringProp>
<stringProp name="HTTPSampler.method">GET</stringProp>
<boolProp name="HTTPSampler.follow_redirects">true</boolProp>
<boolProp name="HTTPSampler.auto_redirects">false</boolProp>
<boolProp name="HTTPSampler.use_keepalive">false</boolProp>
<boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
<stringProp name="HTTPSampler.embedded_url_re"></stringProp>
<stringProp name="HTTPSampler.connect_timeout"></stringProp>
<stringProp name="HTTPSampler.response_timeout"></stringProp>
</HTTPSamplerProxy>
<hashTree>
<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
<collectionProp name="HeaderManager.headers">
<elementProp name="" elementType="Header">
<stringProp name="Header.name">Authorization</stringProp>
<stringProp name="Header.value">Bearer ${token}</stringProp>
</elementProp>
</collectionProp>
</HeaderManager>
<hashTree/>
</hashTree>
<TestAction guiclass="TestActionGui" testclass="TestAction" testname="Think Time" enabled="true">
<intProp name="ActionProcessor.action">1</intProp>
<intProp name="ActionProcessor.target">0</intProp>
<stringProp name="ActionProcessor.duration">0</stringProp>
</TestAction>
<hashTree>
<UniformRandomTimer guiclass="UniformRandomTimerGui" testclass="UniformRandomTimer" testname="Uniform Random Timer" enabled="true">
<stringProp name="ConstantTimer.delay">2000</stringProp>
<stringProp name="RandomTimer.range">8000</stringProp>
</UniformRandomTimer>
<hashTree/>
</hashTree>
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Logout Request" enabled="true">
<elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="HTTPSampler.domain"></stringProp>
<stringProp name="HTTPSampler.port"></stringProp>
<stringProp name="HTTPSampler.protocol"></stringProp>
<stringProp name="HTTPSampler.contentEncoding"></stringProp>
<stringProp name="HTTPSampler.path">/api/logout</stringProp>
<stringProp name="HTTPSampler.method">POST</stringProp>
<boolProp name="HTTPSampler.follow_redirects">true</boolProp>
<boolProp name="HTTPSampler.auto_redirects">false</boolProp>
<boolProp name="HTTPSampler.use_keepalive">false</boolProp>
<boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
<stringProp name="HTTPSampler.embedded_url_re"></stringProp>
<stringProp name="HTTPSampler.connect_timeout"></stringProp>
<stringProp name="HTTPSampler.response_timeout"></stringProp>
</HTTPSamplerProxy>
<hashTree>
<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Header Manager" enabled="true">
<collectionProp name="HeaderManager.headers">
<elementProp name="" elementType="Header">
<stringProp name="Header.name">Authorization</stringProp>
<stringProp name="Header.value">Bearer ${token}</stringProp>
</elementProp>
</collectionProp>
</HeaderManager>
<hashTree/>
</hashTree>
<ResultCollector guiclass="ViewResultsFullVisualizer" testclass="ResultCollector" testname="View Results Tree" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
<ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
</hashTree>
</hashTree>
</hashTree>
</jmeterTestPlan>
• Created a comprehensive performance testing framework:
# performance_test_framework.py
import argparse
import csv
import datetime
import json
import logging
import os
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import requests
from jinja2 import Environment, FileSystemLoader
# Configure logging
logging.basicConfig(
level=logging.INFO,
format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
handlers=[
logging.FileHandler("performance_test.log"),
logging.StreamHandler()
]
)
logger = logging.getLogger("performance_test")
class PerformanceTestFramework:
def __init__(self, config_file):
self.config = self._load_config(config_file)
self.results_dir = Path(self.config.get("results_dir", "results"))
self.results_dir.mkdir(exist_ok=True)
self.test_timestamp = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
self.current_results_dir = self.results_dir / self.test_timestamp
self.current_results_dir.mkdir(exist_ok=True)
def _load_config(self, config_file):
"""Load configuration from JSON file"""
try:
with open(config_file, 'r') as f:
return json.load(f)
except Exception as e:
logger.error(f"Failed to load config file: {e}")
sys.exit(1)
def run_jmeter_test(self, test_plan, properties=None):
"""Run JMeter test with the specified test plan"""
jmeter_bin = self.config.get("jmeter_bin", "jmeter")
results_file = self.current_results_dir / f"jmeter_results_{int(time.time())}.csv"
cmd = [
jmeter_bin,
"-n", # Non-GUI mode
"-t", test_plan, # Test plan
"-l", str(results_file), # Results file
"-j", str(self.current_results_dir / "jmeter.log") # JMeter log file
]
# Add properties if specified
if properties:
for key, value in properties.items():
cmd.extend(["-J", f"{key}={value}"])
logger.info(f"Running JMeter test: {' '.join(cmd)}")
try:
subprocess.run(cmd, check=True)
logger.info(f"JMeter test completed successfully. Results saved to {results_file}")
return results_file
except subprocess.CalledProcessError as e:
logger.error(f"JMeter test failed: {e}")
return None
def run_direct_api_test(self, test_config):
"""Run direct API tests using requests library"""
results_file = self.current_results_dir / f"api_results_{int(time.time())}.csv"
base_url = test_config.get("base_url")
endpoints = test_config.get("endpoints", [])
concurrency = test_config.get("concurrency", 1)
iterations = test_config.get("iterations", 1)
results = []
def call_endpoint(endpoint):
url = f"{base_url}{endpoint['path']}"
method = endpoint.get("method", "GET").lower()
headers = endpoint.get("headers", {})
data = endpoint.get("data")
start_time = time.time()
try:
if method == "get":
response = requests.get(url, headers=headers, timeout=30)
elif method == "post":
response = requests.post(url, headers=headers, json=data, timeout=30)
elif method == "put":
response = requests.put(url, headers=headers, json=data, timeout=30)
elif method == "delete":
response = requests.delete(url, headers=headers, timeout=30)
else:
logger.error(f"Unsupported HTTP method: {method}")
return {
"endpoint": endpoint["path"],
"method": method,
"status_code": 0,
"response_time": 0,
"success": False,
"error": f"Unsupported HTTP method: {method}"
}
elapsed = time.time() - start_time
return {
"endpoint": endpoint["path"],
"method": method,
"status_code": response.status_code,
"response_time": elapsed,
"success": 200 <= response.status_code < 300,
"error": None
}
except Exception as e:
elapsed = time.time() - start_time
return {
"endpoint": endpoint["path"],
"method": method,
"status_code": 0,
"response_time": elapsed,
"success": False,
"error": str(e)
}
# Run tests with concurrency: submit all calls first, then collect the results;
# waiting on each future immediately would serialize the requests.
with ThreadPoolExecutor(max_workers=concurrency) as executor:
    futures = [
        executor.submit(call_endpoint, endpoint)
        for _ in range(iterations)
        for endpoint in endpoints
    ]
    results = [future.result() for future in futures]
# Write results to CSV
with open(results_file, 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=["endpoint", "method", "status_code", "response_time", "success", "error"])
writer.writeheader()
writer.writerows(results)
logger.info(f"API test completed. Results saved to {results_file}")
return results_file
def run_database_connection_test(self, db_config):
"""Test database connection performance"""
results_file = self.current_results_dir / f"db_results_{int(time.time())}.csv"
db_type = db_config.get("type", "postgresql")
concurrency = db_config.get("concurrency", 1)
iterations = db_config.get("iterations", 100)
if db_type == "postgresql":
import psycopg2
host = db_config.get("host", "localhost")
port = db_config.get("port", 5432)
database = db_config.get("database")
user = db_config.get("user")
password = db_config.get("password")
results = []
def test_connection():
start_time = time.time()
conn = None
try:
conn = psycopg2.connect(
host=host,
port=port,
database=database,
user=user,
password=password
)
# Execute a simple query
with conn.cursor() as cursor:
cursor.execute("SELECT 1")
cursor.fetchone()
elapsed = time.time() - start_time
return {
"operation": "connect_and_query",
"success": True,
"response_time": elapsed,
"error": None
}
except Exception as e:
elapsed = time.time() - start_time
return {
"operation": "connect_and_query",
"success": False,
"response_time": elapsed,
"error": str(e)
}
finally:
if conn:
conn.close()
# Run tests with concurrency
with ThreadPoolExecutor(max_workers=concurrency) as executor:
futures = [executor.submit(test_connection) for _ in range(iterations)]
for future in futures:
results.append(future.result())
# Write results to CSV
with open(results_file, 'w', newline='') as f:
writer = csv.DictWriter(f, fieldnames=["operation", "success", "response_time", "error"])
writer.writeheader()
writer.writerows(results)
logger.info(f"Database connection test completed. Results saved to {results_file}")
return results_file
else:
logger.error(f"Unsupported database type: {db_type}")
return None
def analyze_results(self, results_file, test_type):
"""Analyze test results and generate charts"""
if not os.path.exists(results_file):
logger.error(f"Results file not found: {results_file}")
return None
try:
df = pd.read_csv(results_file)
if test_type == "jmeter":
return self._analyze_jmeter_results(df, results_file)
elif test_type == "api":
return self._analyze_api_results(df, results_file)
elif test_type == "db":
return self._analyze_db_results(df, results_file)
else:
logger.error(f"Unsupported test type: {test_type}")
return None
except Exception as e:
logger.error(f"Failed to analyze results: {e}")
return None
def _analyze_jmeter_results(self, df, results_file):
"""Analyze JMeter test results"""
# Calculate statistics
stats = {
"total_requests": len(df),
"successful_requests": len(df[df["success"] == "true"]),
"failed_requests": len(df[df["success"] == "false"]),
"avg_response_time": df["elapsed"].mean(),
"min_response_time": df["elapsed"].min(),
"max_response_time": df["elapsed"].max(),
"p50_response_time": df["elapsed"].quantile(0.5),
"p90_response_time": df["elapsed"].quantile(0.9),
"p95_response_time": df["elapsed"].quantile(0.95),
"p99_response_time": df["elapsed"].quantile(0.99)
}
# Generate charts
charts_dir = self.current_results_dir / "charts"
charts_dir.mkdir(exist_ok=True)
# Response time histogram
plt.figure(figsize=(10, 6))
plt.hist(df["elapsed"], bins=50, alpha=0.7)
plt.xlabel("Response Time (ms)")
plt.ylabel("Frequency")
plt.title("Response Time Distribution")
plt.grid(True, alpha=0.3)
plt.savefig(charts_dir / "response_time_histogram.png")
plt.close()
# Response time over time
plt.figure(figsize=(12, 6))
plt.plot(df.index, df["elapsed"], marker=".", linestyle="-", alpha=0.5)
plt.xlabel("Request Number")
plt.ylabel("Response Time (ms)")
plt.title("Response Time Over Time")
plt.grid(True, alpha=0.3)
plt.savefig(charts_dir / "response_time_series.png")
plt.close()
# Generate report
self._generate_report(stats, charts_dir, "jmeter")
return stats
def _analyze_api_results(self, df, results_file):
"""Analyze direct API test results"""
# Calculate statistics
stats = {
"total_requests": len(df),
"successful_requests": len(df[df["success"] == True]),
"failed_requests": len(df[df["success"] == False]),
"avg_response_time": df["response_time"].mean(),
"min_response_time": df["response_time"].min(),
"max_response_time": df["response_time"].max(),
"p50_response_time": df["response_time"].quantile(0.5),
"p90_response_time": df["response_time"].quantile(0.9),
"p95_response_time": df["response_time"].quantile(0.95),
"p99_response_time": df["response_time"].quantile(0.99)
}
# Generate charts
charts_dir = self.current_results_dir / "charts"
charts_dir.mkdir(exist_ok=True)
# Response time by endpoint
endpoint_stats = df.groupby("endpoint")["response_time"].agg(["mean", "min", "max"]).reset_index()
plt.figure(figsize=(12, 6))
bar_width = 0.25
x = np.arange(len(endpoint_stats))
plt.bar(x - bar_width, endpoint_stats["mean"], width=bar_width, label="Mean", alpha=0.7)
plt.bar(x, endpoint_stats["min"], width=bar_width, label="Min", alpha=0.7)
plt.bar(x + bar_width, endpoint_stats["max"], width=bar_width, label="Max", alpha=0.7)
plt.xlabel("Endpoint")
plt.ylabel("Response Time (s)")
plt.title("Response Time by Endpoint")
plt.xticks(x, endpoint_stats["endpoint"], rotation=45, ha="right")
plt.legend()
plt.tight_layout()
plt.grid(True, alpha=0.3)
plt.savefig(charts_dir / "response_time_by_endpoint.png")
plt.close()
# Generate report
self._generate_report(stats, charts_dir, "api")
return stats
def _analyze_db_results(self, df, results_file):
"""Analyze database connection test results"""
# Calculate statistics
stats = {
"total_operations": len(df),
"successful_operations": len(df[df["success"] == True]),
"failed_operations": len(df[df["success"] == False]),
"avg_response_time": df["response_time"].mean(),
"min_response_time": df["response_time"].min(),
"max_response_time": df["response_time"].max(),
"p50_response_time": df["response_time"].quantile(0.5),
"p90_response_time": df["response_time"].quantile(0.9),
"p95_response_time": df["response_time"].quantile(0.95),
"p99_response_time": df["response_time"].quantile(0.99)
}
# Generate charts
charts_dir = self.current_results_dir / "charts"
charts_dir.mkdir(exist_ok=True)
# Response time histogram
plt.figure(figsize=(10, 6))
plt.hist(df["response_time"], bins=50, alpha=0.7)
plt.xlabel("Response Time (s)")
plt.ylabel("Frequency")
plt.title("Database Connection Time Distribution")
plt.grid(True, alpha=0.3)
plt.savefig(charts_dir / "db_response_time_histogram.png")
plt.close()
# Generate report
self._generate_report(stats, charts_dir, "db")
return stats
def _generate_report(self, stats, charts_dir, test_type):
"""Generate HTML report from test results"""
env = Environment(loader=FileSystemLoader("templates"))
template = env.get_template("report_template.html")
report_data = {
"test_type": test_type,
"timestamp": self.test_timestamp,
"stats": stats,
"charts": [str(p.relative_to(self.current_results_dir)) for p in charts_dir.glob("*.png")]
}
report_html = template.render(**report_data)
with open(self.current_results_dir / "report.html", "w") as f:
f.write(report_html)
logger.info(f"Report generated at {self.current_results_dir / 'report.html'}")
def run_tests(self):
"""Run all configured tests"""
test_configs = self.config.get("tests", [])
for test_config in test_configs:
test_type = test_config.get("type")
if test_type == "jmeter":
test_plan = test_config.get("test_plan")
properties = test_config.get("properties")
results_file = self.run_jmeter_test(test_plan, properties)
if results_file:
self.analyze_results(results_file, "jmeter")
elif test_type == "api":
results_file = self.run_direct_api_test(test_config)
if results_file:
self.analyze_results(results_file, "api")
elif test_type == "db":
results_file = self.run_database_connection_test(test_config)
if results_file:
self.analyze_results(results_file, "db")
else:
logger.error(f"Unsupported test type: {test_type}")
def main():
parser = argparse.ArgumentParser(description="Performance Test Framework")
parser.add_argument("--config", required=True, help="Path to configuration file")
args = parser.parse_args()
framework = PerformanceTestFramework(args.config)
framework.run_tests()
if __name__ == "__main__":
main()
Lessons Learned:
Load testing tools can mask connection-related issues if not configured to simulate real-world usage patterns.
How to Avoid:
Configure load testing tools to simulate real-world connection patterns.
Test with and without connection pooling to understand the impact (see the comparison sketch after this list).
Monitor connection metrics during load tests.
Include connection establishment in performance tests.
Validate test results against production-like environments.
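For the "with and without connection pooling" comparison above, a rough micro-benchmark sketch is shown below; the JDBC URL, credentials, and iteration count are assumptions, and a PostgreSQL driver plus HikariCP are expected on the classpath:
// PoolingImpactCheck.java - hypothetical micro-benchmark, not from the original incident
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
public class PoolingImpactCheck {
    public static void main(String[] args) throws Exception {
        String url = "jdbc:postgresql://localhost:5432/app"; // assumed test database
        int iterations = 200;
        long unpooled = time(iterations, () -> {
            // New physical connection per iteration, mimicking an unpooled application.
            try (Connection c = DriverManager.getConnection(url, "app_user", "secret")) {
                c.createStatement().execute("SELECT 1");
            }
        });
        HikariDataSource pool = new HikariDataSource();
        pool.setJdbcUrl(url);
        pool.setUsername("app_user");
        pool.setPassword("secret");
        long pooled = time(iterations, () -> {
            // Connection borrowed from and returned to the pool.
            try (Connection c = pool.getConnection()) {
                c.createStatement().execute("SELECT 1");
            }
        });
        pool.close();
        System.out.printf("unpooled: %d ms, pooled: %d ms%n", unpooled, pooled);
    }
    interface SqlRunnable { void run() throws SQLException; }
    private static long time(int iterations, SqlRunnable work) throws SQLException {
        long start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            work.run();
        }
        return System.currentTimeMillis() - start;
    }
}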
What Happened:
A company migrated from a single shared database to a microservice-specific database architecture. After the migration, users reported that a critical API was taking 3-5 seconds to respond, compared to sub-second response times before the migration. The issue was not caught in pre-production testing.
Diagnosis Steps:
Analyzed application logs for error patterns and slow queries.
Compared database query execution plans before and after migration.
Used distributed tracing to identify bottlenecks in the request flow.
Conducted load testing with JMeter to reproduce and quantify the issue.
Profiled the application to identify CPU, memory, and I/O patterns.
Root Cause:
The performance degradation was caused by multiple factors:
1. The microservice was making sequential API calls to other services to gather data that was previously available in a single database query.
2. N+1 query patterns emerged in the ORM layer when fetching related entities.
3. Missing database indexes on frequently queried columns in the new database.
4. Connection pool settings were not optimized for the new architecture.
Fix/Workaround:
• Short-term: Implemented caching and optimized the most critical queries:
// Before: Inefficient sequential API calls
@Service
public class ProductService {
private final RestTemplate restTemplate;
private final ProductRepository productRepository;
public ProductDetails getProductDetails(String productId) {
Product product = productRepository.findById(productId)
.orElseThrow(() -> new ProductNotFoundException(productId));
// Sequential API calls
InventoryStatus inventory = restTemplate.getForObject(
"http://inventory-service/inventory/{productId}",
InventoryStatus.class,
productId
);
PricingInfo pricing = restTemplate.getForObject(
"http://pricing-service/pricing/{productId}",
PricingInfo.class,
productId
);
List<Review> reviews = restTemplate.getForObject(
"http://review-service/reviews/product/{productId}",
new ParameterizedTypeReference<List<Review>>() {},
productId
);
return new ProductDetails(product, inventory, pricing, reviews);
}
}
// After: Parallel API calls with CompletableFuture
@Service
public class ProductService {
private final RestTemplate restTemplate;
private final ProductRepository productRepository;
@Async
public CompletableFuture<ProductDetails> getProductDetails(String productId) {
Product product = productRepository.findById(productId)
.orElseThrow(() -> new ProductNotFoundException(productId));
// Parallel API calls
CompletableFuture<InventoryStatus> inventoryFuture = CompletableFuture.supplyAsync(() ->
restTemplate.getForObject(
"http://inventory-service/inventory/{productId}",
InventoryStatus.class,
productId
)
);
CompletableFuture<PricingInfo> pricingFuture = CompletableFuture.supplyAsync(() ->
restTemplate.getForObject(
"http://pricing-service/pricing/{productId}",
PricingInfo.class,
productId
)
);
CompletableFuture<List<Review>> reviewsFuture = CompletableFuture.supplyAsync(() ->
restTemplate.getForObject(
"http://review-service/reviews/product/{productId}",
new ParameterizedTypeReference<List<Review>>() {},
productId
)
);
return CompletableFuture.allOf(inventoryFuture, pricingFuture, reviewsFuture)
.thenApply(v -> new ProductDetails(
product,
inventoryFuture.join(),
pricingFuture.join(),
reviewsFuture.join()
));
}
}
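A sketch of how a controller might consume the asynchronous service method (controller name and mapping are illustrative); returning the CompletableFuture lets Spring MVC complete the response off the request thread, and @EnableAsync must be present on a configuration class for @Async to take effect:
// ProductController.java - hypothetical caller of the asynchronous service method
@RestController
public class ProductController {
    private final ProductService productService;
    public ProductController(ProductService productService) {
        this.productService = productService;
    }
    @GetMapping("/api/products/{productId}")
    public CompletableFuture<ProductDetails> getProduct(@PathVariable String productId) {
        // Spring MVC writes the response when the future completes.
        return productService.getProductDetails(productId);
    }
}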
• Fixed N+1 query issues in the ORM layer:
// Before: N+1 query problem
@Entity
public class Order {
@Id
private String id;
@OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
private List<OrderItem> items;
// getters and setters
}
@Repository
public interface OrderRepository extends JpaRepository<Order, String> {
List<Order> findByCustomerId(String customerId);
}
@Service
public class OrderService {
private final OrderRepository orderRepository;
public List<OrderSummary> getCustomerOrders(String customerId) {
List<Order> orders = orderRepository.findByCustomerId(customerId);
return orders.stream().map(order -> {
// This triggers N+1 queries, one for each order
List<OrderItem> items = order.getItems();
return new OrderSummary(order.getId(), items.size(),
items.stream().mapToDouble(OrderItem::getPrice).sum());
}).collect(Collectors.toList());
}
}
// After: Using join fetch to avoid N+1 queries
@Repository
public interface OrderRepository extends JpaRepository<Order, String> {
@Query("SELECT o FROM Order o LEFT JOIN FETCH o.items WHERE o.customerId = :customerId")
List<Order> findByCustomerIdWithItems(@Param("customerId") String customerId);
}
@Service
public class OrderService {
private final OrderRepository orderRepository;
public List<OrderSummary> getCustomerOrders(String customerId) {
// Single query with join fetch
List<Order> orders = orderRepository.findByCustomerIdWithItems(customerId);
return orders.stream().map(order -> {
List<OrderItem> items = order.getItems();
return new OrderSummary(order.getId(), items.size(),
items.stream().mapToDouble(OrderItem::getPrice).sum());
}).collect(Collectors.toList());
}
}
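Where the JPQL fetch join feels heavy, Spring Data's @EntityGraph is a roughly equivalent, declarative way to load the items in the same query; this sketch reuses the entities above and assumes Spring Data JPA:
// OrderRepository.java - alternative to the fetch-join query using an entity graph
@Repository
public interface OrderRepository extends JpaRepository<Order, String> {
    // Loads orders and their items in a single query instead of 1 + N queries.
    @EntityGraph(attributePaths = "items")
    List<Order> findByCustomerId(String customerId);
}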
• Added missing database indexes:
-- Add indexes on frequently queried columns
CREATE INDEX idx_product_category ON products(category_id);
CREATE INDEX idx_order_customer ON orders(customer_id);
CREATE INDEX idx_order_item_product ON order_items(product_id);
CREATE INDEX idx_order_item_order ON order_items(order_id);
CREATE INDEX idx_review_product ON reviews(product_id);
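If the schema is generated from the JPA entities rather than hand-written DDL, the same indexes can be declared on the entities; a sketch for the orders table, with column names assumed to match the SQL above (it only takes effect when the JPA provider generates the schema):
// Order.java - declaring the customer_id index on the entity
@Entity
@Table(name = "orders", indexes = {
        @Index(name = "idx_order_customer", columnList = "customer_id")
})
public class Order {
    @Id
    private String id;
    @Column(name = "customer_id")
    private String customerId;
    @OneToMany(mappedBy = "order", fetch = FetchType.LAZY)
    private List<OrderItem> items;
    // getters and setters omitted
}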
• Optimized connection pool settings:
# application.yml with optimized connection pool settings
spring:
datasource:
hikari:
maximum-pool-size: 20
minimum-idle: 5
idle-timeout: 30000
connection-timeout: 10000
max-lifetime: 1800000
leak-detection-threshold: 60000
• Long-term: Implemented a comprehensive performance testing and monitoring strategy:
// Custom Spring Boot Actuator endpoint for performance metrics
@Component
@Endpoint(id = "performance")
public class PerformanceMetricsEndpoint {
private final MeterRegistry meterRegistry;
public PerformanceMetricsEndpoint(MeterRegistry meterRegistry) {
this.meterRegistry = meterRegistry;
}
@ReadOperation
public Map<String, Object> getPerformanceMetrics() {
Map<String, Object> metrics = new HashMap<>();
// API response times
metrics.put("api.responseTime.p95", getTimerMetric("http.server.requests", "0.95"));
metrics.put("api.responseTime.p99", getTimerMetric("http.server.requests", "0.99"));
metrics.put("api.responseTime.max", getTimerMetric("http.server.requests", "max"));
// Database metrics
metrics.put("db.connectionPool.active", getGaugeValue("hikaricp.connections.active"));
metrics.put("db.connectionPool.idle", getGaugeValue("hikaricp.connections.idle"));
metrics.put("db.connectionPool.usage", getGaugeValue("hikaricp.connections.usage"));
metrics.put("db.connectionPool.connectionTimeout", getCounterValue("hikaricp.connections.timeout"));
// External service call metrics
metrics.put("externalService.responseTime.p95", getTimerMetric("http.client.requests", "0.95"));
metrics.put("externalService.errors", getCounterValue("http.client.requests.errors"));
return metrics;
}
private double getTimerMetric(String name, String percentile) {
    Timer timer = meterRegistry.find(name).timer();
    if (timer == null) {
        return 0.0;
    }
    if (percentile.equals("max")) {
        return timer.max(TimeUnit.MILLISECONDS);
    }
    // Percentiles must be published for this timer (e.g. via
    // management.metrics.distribution.percentiles) to appear in the snapshot.
    double target = Double.parseDouble(percentile);
    for (ValueAtPercentile v : timer.takeSnapshot().percentileValues()) {
        if (Math.abs(v.percentile() - target) < 0.0001) {
            return v.value(TimeUnit.MILLISECONDS);
        }
    }
    return 0.0;
}
private double getGaugeValue(String name) {
    Gauge gauge = meterRegistry.find(name).gauge();
    return gauge != null ? gauge.value() : 0.0;
}
private double getCounterValue(String name) {
    Counter counter = meterRegistry.find(name).counter();
    return counter != null ? counter.count() : 0.0;
}
}
• Created a JMeter test plan for continuous performance testing:
<?xml version="1.0" encoding="UTF-8"?>
<jmeterTestPlan version="1.2" properties="5.0" jmeter="5.4.1">
<hashTree>
<TestPlan guiclass="TestPlanGui" testclass="TestPlan" testname="Microservice Performance Test" enabled="true">
<stringProp name="TestPlan.comments"></stringProp>
<boolProp name="TestPlan.functional_mode">false</boolProp>
<boolProp name="TestPlan.tearDown_on_shutdown">true</boolProp>
<boolProp name="TestPlan.serialize_threadgroups">false</boolProp>
<elementProp name="TestPlan.user_defined_variables" elementType="Arguments" guiclass="ArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments">
<elementProp name="HOST" elementType="Argument">
<stringProp name="Argument.name">HOST</stringProp>
<stringProp name="Argument.value">${__P(host,localhost)}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
<elementProp name="PORT" elementType="Argument">
<stringProp name="Argument.name">PORT</stringProp>
<stringProp name="Argument.value">${__P(port,8080)}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
<elementProp name="THREADS" elementType="Argument">
<stringProp name="Argument.name">THREADS</stringProp>
<stringProp name="Argument.value">${__P(threads,50)}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
<elementProp name="RAMP_UP" elementType="Argument">
<stringProp name="Argument.name">RAMP_UP</stringProp>
<stringProp name="Argument.value">${__P(rampup,30)}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
<elementProp name="DURATION" elementType="Argument">
<stringProp name="Argument.name">DURATION</stringProp>
<stringProp name="Argument.value">${__P(duration,300)}</stringProp>
<stringProp name="Argument.metadata">=</stringProp>
</elementProp>
</collectionProp>
</elementProp>
<stringProp name="TestPlan.user_define_classpath"></stringProp>
</TestPlan>
<hashTree>
<ThreadGroup guiclass="ThreadGroupGui" testclass="ThreadGroup" testname="Product API Test" enabled="true">
<stringProp name="ThreadGroup.on_sample_error">continue</stringProp>
<elementProp name="ThreadGroup.main_controller" elementType="LoopController" guiclass="LoopControlPanel" testclass="LoopController" testname="Loop Controller" enabled="true">
<boolProp name="LoopController.continue_forever">false</boolProp>
<intProp name="LoopController.loops">-1</intProp>
</elementProp>
<stringProp name="ThreadGroup.num_threads">${THREADS}</stringProp>
<stringProp name="ThreadGroup.ramp_time">${RAMP_UP}</stringProp>
<boolProp name="ThreadGroup.scheduler">true</boolProp>
<stringProp name="ThreadGroup.duration">${DURATION}</stringProp>
<stringProp name="ThreadGroup.delay"></stringProp>
<boolProp name="ThreadGroup.same_user_on_next_iteration">true</boolProp>
</ThreadGroup>
<hashTree>
<CSVDataSet guiclass="TestBeanGUI" testclass="CSVDataSet" testname="Product IDs CSV" enabled="true">
<stringProp name="delimiter">,</stringProp>
<stringProp name="fileEncoding">UTF-8</stringProp>
<stringProp name="filename">product_ids.csv</stringProp>
<boolProp name="ignoreFirstLine">false</boolProp>
<boolProp name="quotedData">false</boolProp>
<boolProp name="recycle">true</boolProp>
<stringProp name="shareMode">shareMode.all</stringProp>
<boolProp name="stopThread">false</boolProp>
<stringProp name="variableNames">productId</stringProp>
</CSVDataSet>
<hashTree/>
<HTTPSamplerProxy guiclass="HttpTestSampleGui" testclass="HTTPSamplerProxy" testname="Get Product Details" enabled="true">
<elementProp name="HTTPsampler.Arguments" elementType="Arguments" guiclass="HTTPArgumentsPanel" testclass="Arguments" testname="User Defined Variables" enabled="true">
<collectionProp name="Arguments.arguments"/>
</elementProp>
<stringProp name="HTTPSampler.domain">${HOST}</stringProp>
<stringProp name="HTTPSampler.port">${PORT}</stringProp>
<stringProp name="HTTPSampler.protocol">http</stringProp>
<stringProp name="HTTPSampler.contentEncoding"></stringProp>
<stringProp name="HTTPSampler.path">/api/products/${productId}</stringProp>
<stringProp name="HTTPSampler.method">GET</stringProp>
<boolProp name="HTTPSampler.follow_redirects">true</boolProp>
<boolProp name="HTTPSampler.auto_redirects">false</boolProp>
<boolProp name="HTTPSampler.use_keepalive">true</boolProp>
<boolProp name="HTTPSampler.DO_MULTIPART_POST">false</boolProp>
<stringProp name="HTTPSampler.embedded_url_re"></stringProp>
<stringProp name="HTTPSampler.connect_timeout">5000</stringProp>
<stringProp name="HTTPSampler.response_timeout">10000</stringProp>
</HTTPSamplerProxy>
<hashTree>
<HeaderManager guiclass="HeaderPanel" testclass="HeaderManager" testname="HTTP Headers" enabled="true">
<collectionProp name="HeaderManager.headers">
<elementProp name="" elementType="Header">
<stringProp name="Header.name">Accept</stringProp>
<stringProp name="Header.value">application/json</stringProp>
</elementProp>
<elementProp name="" elementType="Header">
<stringProp name="Header.name">Content-Type</stringProp>
<stringProp name="Header.value">application/json</stringProp>
</elementProp>
</collectionProp>
</HeaderManager>
<hashTree/>
<ResponseAssertion guiclass="AssertionGui" testclass="ResponseAssertion" testname="Response Assertion" enabled="true">
<collectionProp name="Asserion.test_strings">
<stringProp name="49586">200</stringProp>
</collectionProp>
<stringProp name="Assertion.custom_message"></stringProp>
<stringProp name="Assertion.test_field">Assertion.response_code</stringProp>
<boolProp name="Assertion.assume_success">false</boolProp>
<intProp name="Assertion.test_type">8</intProp>
</ResponseAssertion>
<hashTree/>
<JSONPathAssertion guiclass="JSONPathAssertionGui" testclass="JSONPathAssertion" testname="JSON Assertion - Product ID" enabled="true">
<stringProp name="JSON_PATH">$.id</stringProp>
<stringProp name="EXPECTED_VALUE">${productId}</stringProp>
<boolProp name="JSONVALIDATION">true</boolProp>
<boolProp name="EXPECT_NULL">false</boolProp>
<boolProp name="INVERT">false</boolProp>
<boolProp name="ISREGEX">false</boolProp>
</JSONPathAssertion>
<hashTree/>
</hashTree>
<ConstantThroughputTimer guiclass="ConstantThroughputTimerGui" testclass="ConstantThroughputTimer" testname="Constant Throughput Timer" enabled="true">
<intProp name="calcMode">2</intProp>
<doubleProp>
<name>throughput</name>
<value>600.0</value>
<savedValue>0.0</savedValue>
</doubleProp>
</ConstantThroughputTimer>
<hashTree/>
</hashTree>
<ResultCollector guiclass="SummaryReport" testclass="ResultCollector" testname="Summary Report" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
<ResultCollector guiclass="StatVisualizer" testclass="ResultCollector" testname="Aggregate Report" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
<ResultCollector guiclass="GraphVisualizer" testclass="ResultCollector" testname="Graph Results" enabled="true">
<boolProp name="ResultCollector.error_logging">false</boolProp>
<objProp>
<name>saveConfig</name>
<value class="SampleSaveConfiguration">
<time>true</time>
<latency>true</latency>
<timestamp>true</timestamp>
<success>true</success>
<label>true</label>
<code>true</code>
<message>true</message>
<threadName>true</threadName>
<dataType>true</dataType>
<encoding>false</encoding>
<assertions>true</assertions>
<subresults>true</subresults>
<responseData>false</responseData>
<samplerData>false</samplerData>
<xml>false</xml>
<fieldNames>true</fieldNames>
<responseHeaders>false</responseHeaders>
<requestHeaders>false</requestHeaders>
<responseDataOnError>false</responseDataOnError>
<saveAssertionResultsFailureMessage>true</saveAssertionResultsFailureMessage>
<assertionsResultsToSave>0</assertionsResultsToSave>
<bytes>true</bytes>
<sentBytes>true</sentBytes>
<url>true</url>
<threadCounts>true</threadCounts>
<idleTime>true</idleTime>
<connectTime>true</connectTime>
</value>
</objProp>
<stringProp name="filename"></stringProp>
</ResultCollector>
<hashTree/>
</hashTree>
</hashTree>
</jmeterTestPlan>
• Implemented a Rust-based performance monitoring tool:
// performance_monitor.rs
use actix_web::{web, App, HttpResponse, HttpServer, Responder};
use chrono::{DateTime, Utc};
use prometheus::{Encoder, Gauge, Histogram, HistogramOpts, IntCounter, Registry, TextEncoder};
use reqwest::Client;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::sync::Arc;
use std::time::{Duration, Instant};
use tokio::sync::Mutex;
use tokio::time;
// Performance metrics
struct Metrics {
registry: Registry,
api_response_time: Histogram,
api_error_count: IntCounter,
db_connection_count: Gauge,
db_query_time: Histogram,
memory_usage: Gauge,
cpu_usage: Gauge,
}
impl Metrics {
fn new() -> Self {
let registry = Registry::new();
let api_response_time = Histogram::with_opts(HistogramOpts::new(
"api_response_time_seconds",
"API response time in seconds",
).buckets(vec![0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]))
.unwrap();
let api_error_count = IntCounter::new(
"api_error_count",
"Number of API errors",
).unwrap();
let db_connection_count = Gauge::new(
"db_connection_count",
"Number of active database connections",
).unwrap();
let db_query_time = Histogram::with_opts(HistogramOpts::new(
"db_query_time_seconds",
"Database query time in seconds",
).buckets(vec![0.001, 0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0]))
.unwrap();
let memory_usage = Gauge::new(
"memory_usage_bytes",
"Memory usage in bytes",
).unwrap();
let cpu_usage = Gauge::new(
"cpu_usage_percent",
"CPU usage percentage",
).unwrap();
registry.register(Box::new(api_response_time.clone())).unwrap();
registry.register(Box::new(api_error_count.clone())).unwrap();
registry.register(Box::new(db_connection_count.clone())).unwrap();
registry.register(Box::new(db_query_time.clone())).unwrap();
registry.register(Box::new(memory_usage.clone())).unwrap();
registry.register(Box::new(cpu_usage.clone())).unwrap();
Metrics {
registry,
api_response_time,
api_error_count,
db_connection_count,
db_query_time,
memory_usage,
cpu_usage,
}
}
}
#[derive(Debug, Clone, Serialize, Deserialize)]
struct ApiEndpoint {
name: String,
url: String,
method: String,
expected_status: u16,
}
// Clone is needed because the config is cloned into the monitoring task below.
#[derive(Debug, Clone, Serialize, Deserialize)]
struct Config {
service_name: String,
endpoints: Vec<ApiEndpoint>,
database_url: Option<String>,
interval_seconds: u64,
}
#[derive(Debug, Serialize, Deserialize)]
struct EndpointResult {
name: String,
url: String,
status: u16,
response_time_ms: u64,
success: bool,
timestamp: DateTime<Utc>,
}
#[derive(Debug, Serialize, Deserialize)]
struct PerformanceReport {
service_name: String,
timestamp: DateTime<Utc>,
endpoints: Vec<EndpointResult>,
memory_usage_mb: f64,
cpu_usage_percent: f64,
db_connections: Option<i64>,
}
async fn metrics_handler(metrics: web::Data<Arc<Mutex<Metrics>>>) -> impl Responder {
let metrics = metrics.lock().await;
let encoder = TextEncoder::new();
let mut buffer = Vec::new();
encoder.encode(&metrics.registry.gather(), &mut buffer).unwrap();
HttpResponse::Ok()
.content_type("text/plain")
.body(String::from_utf8(buffer).unwrap())
}
async fn report_handler(report_history: web::Data<Arc<Mutex<Vec<PerformanceReport>>>>) -> impl Responder {
let reports = report_history.lock().await;
HttpResponse::Ok().json(&*reports)
}
#[actix_web::main]
async fn main() -> std::io::Result<()> {
// Load configuration
let config_file = std::fs::read_to_string("config.json").expect("Failed to read config file");
let config: Config = serde_json::from_str(&config_file).expect("Failed to parse config");
// Initialize metrics
let metrics = Arc::new(Mutex::new(Metrics::new()));
// Initialize report history
let report_history = Arc::new(Mutex::new(Vec::new()));
// Create HTTP client
let client = Client::builder()
.timeout(Duration::from_secs(10))
.build()
.expect("Failed to create HTTP client");
// Start monitoring task
let metrics_clone = metrics.clone();
let report_history_clone = report_history.clone();
let config_clone = config.clone();
tokio::spawn(async move {
let interval = Duration::from_secs(config_clone.interval_seconds);
let mut interval_timer = time::interval(interval);
loop {
interval_timer.tick().await;
let mut report = PerformanceReport {
service_name: config_clone.service_name.clone(),
timestamp: Utc::now(),
endpoints: Vec::new(),
memory_usage_mb: 0.0,
cpu_usage_percent: 0.0,
db_connections: None,
};
// Test API endpoints
for endpoint in &config_clone.endpoints {
let start = Instant::now();
let result = match endpoint.method.as_str() {
"GET" => client.get(&endpoint.url).send().await,
"POST" => client.post(&endpoint.url).send().await,
_ => panic!("Unsupported HTTP method"),
};
let elapsed = start.elapsed();
let response_time_ms = elapsed.as_millis() as u64;
let metrics = metrics_clone.lock().await;
metrics.api_response_time.observe(elapsed.as_secs_f64());
match result {
Ok(response) => {
let status = response.status().as_u16();
let success = status == endpoint.expected_status;
if !success {
metrics.api_error_count.inc();
}
report.endpoints.push(EndpointResult {
name: endpoint.name.clone(),
url: endpoint.url.clone(),
status,
response_time_ms,
success,
timestamp: Utc::now(),
});
}
Err(e) => {
println!("Error calling endpoint {}: {}", endpoint.url, e);
metrics.api_error_count.inc();
report.endpoints.push(EndpointResult {
name: endpoint.name.clone(),
url: endpoint.url.clone(),
status: 0,
response_time_ms,
success: false,
timestamp: Utc::now(),
});
}
}
}
// Get system metrics
if let Ok(sys_info) = get_system_info() {
let metrics = metrics_clone.lock().await;
metrics.memory_usage.set(sys_info.0 as f64);
metrics.cpu_usage.set(sys_info.1);
report.memory_usage_mb = sys_info.0 as f64 / 1024.0 / 1024.0;
report.cpu_usage_percent = sys_info.1;
}
// Get database connections if configured
if let Some(db_url) = &config_clone.database_url {
if let Ok(connections) = get_db_connections(db_url).await {
let metrics = metrics_clone.lock().await;
metrics.db_connection_count.set(connections as f64);
report.db_connections = Some(connections);
}
}
// Add report to history
let mut history = report_history_clone.lock().await;
history.push(report);
// Keep only the last 100 reports
if history.len() > 100 {
history.remove(0);
}
}
});
// Start HTTP server
HttpServer::new(move || {
App::new()
.app_data(web::Data::new(metrics.clone()))
.app_data(web::Data::new(report_history.clone()))
.route("/metrics", web::get().to(metrics_handler))
.route("/report", web::get().to(report_handler))
})
.bind("0.0.0.0:8081")?
.run()
.await
}
// Get system information (memory usage in bytes, CPU usage percentage)
fn get_system_info() -> Result<(u64, f64), Box<dyn std::error::Error>> {
    // Trait imports required by sysinfo versions that expose these methods via
    // SystemExt/ProcessorExt; memory units are version-dependent (bytes in recent releases).
    use sysinfo::{ProcessorExt, System, SystemExt};
    let mut sys = System::new_all();
    sys.refresh_all();
    let memory_used = sys.used_memory();
    let cpu_usage = sys.global_processor_info().cpu_usage();
    Ok((memory_used, cpu_usage))
}
// Get database connections
async fn get_db_connections(db_url: &str) -> Result<i64, Box<dyn std::error::Error>> {
// This is a simplified example. In a real implementation, you would query
// the database to get the number of active connections.
// For PostgreSQL, you might use:
// SELECT count(*) FROM pg_stat_activity WHERE state = 'active';
// For this example, we'll just return a random number
Ok(rand::random::<i64>().rem_euclid(20) + 5)
}
Lessons Learned:
Microservice architecture requires careful performance testing and optimization.
How to Avoid:
Implement comprehensive performance testing before production deployment.
Use distributed tracing to identify bottlenecks in microservice communication (see the sketch after this list).
Optimize database queries and add appropriate indexes.
Implement parallel processing for independent operations.
Monitor performance metrics continuously to catch regressions early.
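For the distributed-tracing item above, a minimal sketch of wrapping the aggregation call in a custom span with the OpenTelemetry Java API; the tracer name and attribute are illustrative, and an OpenTelemetry SDK (Java agent or Spring Boot starter) is assumed to be configured:
// TracedProductLookup.java - hypothetical custom span around the product aggregation call
import io.opentelemetry.api.GlobalOpenTelemetry;
import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;
public class TracedProductLookup {
    private final Tracer tracer = GlobalOpenTelemetry.getTracer("product-service");
    public ProductDetails lookup(String productId, ProductService productService) {
        Span span = tracer.spanBuilder("getProductDetails").startSpan();
        try (Scope scope = span.makeCurrent()) {
            span.setAttribute("product.id", productId);
            // Downstream HTTP and database calls made inside this scope become child spans
            // when their clients are instrumented.
            return productService.getProductDetails(productId).join();
        } finally {
            span.end();
        }
    }
}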
What Happened:
A company conducted performance tests before a major release, and the tests showed acceptable performance. However, after deployment to production, the application experienced severe performance degradation under real user load. The performance testing had failed to identify critical bottlenecks that emerged in production.
Diagnosis Steps:
Compared test scenarios with actual production usage patterns.
Analyzed test data and production metrics.
Reviewed performance test design and configuration.
Examined test environment setup and differences from production.
Investigated data access patterns and caching behavior.
Root Cause:
The performance testing initiative failed due to multiple design flaws:
1. Test scenarios did not accurately represent real user behavior and access patterns.
2. Test data was too small and lacked the variety and volume of production data.
3. Cache warming was not properly accounted for in the test design.
4. Database connection pooling was configured differently in test vs. production.
5. Tests focused on average response times rather than percentile distributions.
Fix/Workaround:
• Short-term: Implemented immediate improvements to performance testing with realistic JMeter test plans
• Optimized test data generation to match production patterns and volume
• Implemented proper cache warming procedures before test execution
• Added percentile-based metrics collection instead of just averages
• Long-term: Created a comprehensive performance testing framework with:
- Realistic user journey modeling
- Production-like data volumes
- Proper environment configuration matching production
- Monitoring of system metrics during tests
- Percentile-based assertions for response times (see the sketch below)
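As an illustration of the percentile-based assertions mentioned above, a small self-contained sketch (sample values and thresholds are made up) that fails a run on tail latency rather than on the average:
// PercentileAssertions.java - hypothetical percentile-based pass/fail check for a test run
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
public class PercentileAssertions {
    // Nearest-rank percentile over observed response times in milliseconds.
    static double percentile(List<Double> samples, double p) {
        List<Double> sorted = new ArrayList<>(samples);
        Collections.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.size());
        return sorted.get(Math.max(rank - 1, 0));
    }
    public static void main(String[] args) {
        List<Double> responseTimesMs = List.of(42.0, 55.0, 61.0, 48.0, 180.0, 70.0, 65.0, 58.0);
        double p95 = percentile(responseTimesMs, 95);
        double p99 = percentile(responseTimesMs, 99);
        // Assert on tail latency, not on the mean, which would hide the slow requests.
        if (p95 > 500 || p99 > 1000) {
            throw new AssertionError(String.format("p95=%.0fms p99=%.0fms exceed thresholds", p95, p99));
        }
        System.out.printf("p95=%.0fms p99=%.0fms within thresholds%n", p95, p99);
    }
}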
Lessons Learned:
Performance testing must accurately simulate real-world conditions to be effective.
How to Avoid:
Design test scenarios based on actual user behavior data (see the sketch after this list).
Use production-like data volumes and patterns.
Account for caching effects in test design.
Focus on percentile distributions rather than averages.
Test with realistic infrastructure configurations.
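For deriving scenarios from real user behavior, one pragmatic starting point is to compute the endpoint mix from access logs and feed the percentages into the load-test thread groups; the sketch below assumes a combined-log-format file named access.log and collapses numeric path segments:
// EndpointMixFromLogs.java - hypothetical helper that derives a traffic mix from access logs
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.TreeMap;
public class EndpointMixFromLogs {
    public static void main(String[] args) throws IOException {
        Map<String, Long> hitsPerPath = new TreeMap<>();
        long total = 0;
        // Assumes combined log format, where the request path is the 7th whitespace-separated
        // token, e.g. ... "GET /api/products/123 HTTP/1.1" ...; adjust parsing to the real format.
        for (String line : Files.readAllLines(Path.of("access.log"))) {
            String[] parts = line.split("\\s+");
            if (parts.length > 6) {
                String path = parts[6].replaceAll("/\\d+", "/{id}"); // collapse numeric ids
                hitsPerPath.merge(path, 1L, Long::sum);
                total++;
            }
        }
        for (Map.Entry<String, Long> e : hitsPerPath.entrySet()) {
            System.out.printf("%-40s %5.1f%%%n", e.getKey(), 100.0 * e.getValue() / total);
        }
    }
}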
What Happened:
A team developed a new microservice that passed all performance tests in the testing environment, showing it could handle the required throughput with acceptable response times. However, when deployed to production, the service began failing under similar load levels, with response times increasing exponentially and eventually resulting in connection timeouts.
Diagnosis Steps:
Compared configuration between test and production environments.
Analyzed thread dumps and heap dumps from both environments.
Monitored database connection usage patterns.
Examined network traffic and connection establishment times.
Reviewed load testing methodology and test scripts.
Root Cause:
The investigation revealed that the connection pooling configuration was significantly different between environments:
1. The test environment used a connection pool with 100 connections, while production was limited to 20.
2. The load testing tool (JMeter) maintained persistent connections during tests, while real users created new connections more frequently.
3. The database had a connection limit that was being reached in production but not in testing.
4. Connection pool exhaustion led to threads waiting for available connections, causing cascading timeouts.
5. The performance tests didn't properly simulate connection establishment overhead.
Fix/Workaround:
• Short-term: Increased the connection pool size in production:
// Before: Production HikariCP configuration
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db.example.com:5432/mydb");
config.setUsername("app_user");
config.setPassword("********");
config.setMaximumPoolSize(20); // Too small for production load
config.setMinimumIdle(5);
config.setIdleTimeout(30000);
config.setConnectionTimeout(10000); // Too short for peak load
// After: Optimized HikariCP configuration
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db.example.com:5432/mydb");
config.setUsername("app_user");
config.setPassword("********");
config.setMaximumPoolSize(50); // Increased based on load analysis
config.setMinimumIdle(10);
config.setIdleTimeout(60000);
config.setConnectionTimeout(30000); // Increased to handle peak load
config.setMaxLifetime(1800000); // 30 minutes
config.setLeakDetectionThreshold(60000); // Help identify connection leaks
• Implemented connection pool monitoring:
// Connection pool metrics with Micrometer
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;
import io.micrometer.core.instrument.Gauge;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.stereotype.Component;
import javax.sql.DataSource;
import java.util.concurrent.TimeUnit;
@Component
public class ConnectionPoolMetrics {
public ConnectionPoolMetrics(DataSource dataSource, MeterRegistry registry) {
if (dataSource instanceof HikariDataSource) {
HikariDataSource hikariDataSource = (HikariDataSource) dataSource;
// Register metrics
Gauge.builder("hikaricp.connections.active", hikariDataSource,
ds -> ds.getHikariPoolMXBean().getActiveConnections())
.description("Active connections")
.register(registry);
Gauge.builder("hikaricp.connections.idle", hikariDataSource,
ds -> ds.getHikariPoolMXBean().getIdleConnections())
.description("Idle connections")
.register(registry);
Gauge.builder("hikaricp.connections.total", hikariDataSource,
ds -> ds.getHikariPoolMXBean().getTotalConnections())
.description("Total connections")
.register(registry);
Gauge.builder("hikaricp.connections.pending", hikariDataSource,
ds -> ds.getHikariPoolMXBean().getThreadsAwaitingConnection())
.description("Pending threads")
.register(registry);
// Expose the configured connection timeout as a reference gauge
// (actual acquisition timings are better captured via HikariCP's own metrics tracker; see the sketch below)
Gauge.builder("hikaricp.connections.timeout", hikariDataSource,
ds -> ds.getConnectionTimeout())
.description("Configured connection timeout (ms)")
.register(registry);
}
}
}
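As an alternative worth noting (a sketch, not part of the original fix): recent HikariCP versions can publish pool metrics themselves, including connection-acquisition timers, when a Micrometer metrics tracker factory is attached to the pool configuration. The registry type and pool settings below are assumptions for illustration.
// HikariMicrometerSetup.java - sketch of letting HikariCP publish pool metrics via Micrometer
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.metrics.micrometer.MicrometerMetricsTrackerFactory;
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.simple.SimpleMeterRegistry;
public class HikariMicrometerSetup {
    public static HikariDataSource buildDataSource(MeterRegistry registry) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl("jdbc:postgresql://db.example.com:5432/mydb");
        config.setUsername("app_user");
        config.setPassword("********");
        config.setMaximumPoolSize(50);
        // HikariCP registers gauges and timers (e.g. hikaricp.connections.acquire) in the given registry
        config.setMetricsTrackerFactory(new MicrometerMetricsTrackerFactory(registry));
        return new HikariDataSource(config);
    }
    public static void main(String[] args) {
        MeterRegistry registry = new SimpleMeterRegistry();
        try (HikariDataSource dataSource = buildDataSource(registry)) {
            // Application code would use dataSource here; pool metrics accumulate in the registry
        }
    }
}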
• Long-term: Implemented a comprehensive connection pool sizing calculator:
// connection_pool_calculator.go
package main
import (
"fmt"
"math"
"time"
)
// ConnectionPoolCalculator calculates optimal connection pool settings
type ConnectionPoolCalculator struct {
// Application metrics
ConcurrentUsers int
AverageRequestsPerUser float64
PeakRequestsPerUser float64
// Database metrics
AverageQueryTime float64 // in seconds
PeakQueryTime float64 // in seconds
QueriesPerRequest float64
// System constraints
MaxDatabaseConnections int
DatabaseInstances int
ApplicationInstances int
// Connection overhead
ConnectionEstablishTime float64 // in seconds
ConnectionIdleTimeout float64 // in seconds
// Safety factors
PoolSizeSafetyFactor float64
IdleConnectionsFactor float64
}
// NewDefaultCalculator creates a calculator with reasonable defaults
func NewDefaultCalculator() *ConnectionPoolCalculator {
return &ConnectionPoolCalculator{
ConcurrentUsers: 100,
AverageRequestsPerUser: 2.0,
PeakRequestsPerUser: 5.0,
AverageQueryTime: 0.05, // 50ms
PeakQueryTime: 0.2, // 200ms
QueriesPerRequest: 10,
MaxDatabaseConnections: 500,
DatabaseInstances: 1,
ApplicationInstances: 2,
ConnectionEstablishTime: 0.5, // 500ms
ConnectionIdleTimeout: 300, // 5 minutes
PoolSizeSafetyFactor: 1.25,
IdleConnectionsFactor: 0.5,
}
}
// CalculatePoolSize calculates the optimal connection pool size
func (c *ConnectionPoolCalculator) CalculatePoolSize() int {
// Calculate connections needed based on concurrent users and queries
averageConcurrentQueries := float64(c.ConcurrentUsers) * c.AverageRequestsPerUser * c.QueriesPerRequest
peakConcurrentQueries := float64(c.ConcurrentUsers) * c.PeakRequestsPerUser * c.QueriesPerRequest
// Calculate connections needed based on query time
connectionsNeededAverage := averageConcurrentQueries * c.AverageQueryTime
connectionsNeededPeak := peakConcurrentQueries * c.PeakQueryTime
// Use the higher of the two calculations
connectionsNeeded := math.Max(connectionsNeededAverage, connectionsNeededPeak)
// Apply safety factor
connectionsNeeded = connectionsNeeded * c.PoolSizeSafetyFactor
// Divide by number of application instances
connectionsPerInstance := connectionsNeeded / float64(c.ApplicationInstances)
// Ensure we don't exceed database connection limits
maxConnectionsPerInstance := float64(c.MaxDatabaseConnections) / float64(c.ApplicationInstances)
connectionsPerInstance = math.Min(connectionsPerInstance, maxConnectionsPerInstance)
return int(math.Ceil(connectionsPerInstance))
}
// CalculateMinIdleConnections calculates the optimal minimum idle connections
func (c *ConnectionPoolCalculator) CalculateMinIdleConnections() int {
poolSize := c.CalculatePoolSize()
minIdle := int(float64(poolSize) * c.IdleConnectionsFactor)
return minIdle
}
// CalculateConnectionTimeout calculates the optimal connection timeout
func (c *ConnectionPoolCalculator) CalculateConnectionTimeout() time.Duration {
// Base timeout on connection establish time plus a buffer
timeoutSeconds := c.ConnectionEstablishTime * 3
return time.Duration(timeoutSeconds * float64(time.Second))
}
// CalculateIdleTimeout calculates the optimal idle timeout
func (c *ConnectionPoolCalculator) CalculateIdleTimeout() time.Duration {
// Base idle timeout on average time between requests
averageTimeBetweenRequests := 1.0 / c.AverageRequestsPerUser
// Set idle timeout to be several times longer than average time between requests
// but not longer than the configured max idle timeout
idleTimeoutSeconds := math.Min(averageTimeBetweenRequests * 10, c.ConnectionIdleTimeout)
return time.Duration(idleTimeoutSeconds * float64(time.Second))
}
// CalculateMaxLifetime calculates the optimal max lifetime for connections
func (c *ConnectionPoolCalculator) CalculateMaxLifetime() time.Duration {
// Set max lifetime to avoid connection staleness
// A common practice is to set it below the database's connection timeout
// For example, if the database closes idle connections after 8 hours, set this to 7 hours
return 7 * time.Hour
}
// GenerateHikariConfig generates HikariCP configuration
func (c *ConnectionPoolCalculator) GenerateHikariConfig() string {
poolSize := c.CalculatePoolSize()
minIdle := c.CalculateMinIdleConnections()
connectionTimeout := c.CalculateConnectionTimeout()
idleTimeout := c.CalculateIdleTimeout()
maxLifetime := c.CalculateMaxLifetime()
config := fmt.Sprintf(`
HikariConfig config = new HikariConfig();
config.setJdbcUrl("jdbc:postgresql://db.example.com:5432/mydb");
config.setUsername("app_user");
config.setPassword("********");
config.setMaximumPoolSize(%d);
config.setMinimumIdle(%d);
config.setIdleTimeout(%d); // %s
config.setConnectionTimeout(%d); // %s
config.setMaxLifetime(%d); // %s
config.setLeakDetectionThreshold(60000); // 1 minute
`,
poolSize,
minIdle,
idleTimeout.Milliseconds(), idleTimeout.String(),
connectionTimeout.Milliseconds(), connectionTimeout.String(),
maxLifetime.Milliseconds(), maxLifetime.String(),
)
return config
}
// GenerateSpringBootConfig generates Spring Boot configuration
func (c *ConnectionPoolCalculator) GenerateSpringBootConfig() string {
poolSize := c.CalculatePoolSize()
minIdle := c.CalculateMinIdleConnections()
connectionTimeout := c.CalculateConnectionTimeout()
idleTimeout := c.CalculateIdleTimeout()
maxLifetime := c.CalculateMaxLifetime()
config := fmt.Sprintf(`
# Spring Boot application.properties
spring.datasource.url=jdbc:postgresql://db.example.com:5432/mydb
spring.datasource.username=app_user
spring.datasource.password=********
spring.datasource.hikari.maximum-pool-size=%d
spring.datasource.hikari.minimum-idle=%d
spring.datasource.hikari.idle-timeout=%d
spring.datasource.hikari.connection-timeout=%d
spring.datasource.hikari.max-lifetime=%d
spring.datasource.hikari.leak-detection-threshold=60000
`,
poolSize,
minIdle,
idleTimeout.Milliseconds(),
connectionTimeout.Milliseconds(),
maxLifetime.Milliseconds(),
)
return config
}
func main() {
// Create calculator with default values
calc := NewDefaultCalculator()
// Override with specific values
calc.ConcurrentUsers = 500
calc.AverageRequestsPerUser = 3.0
calc.PeakRequestsPerUser = 8.0
calc.QueriesPerRequest = 15
calc.ApplicationInstances = 4
// Calculate and print results
poolSize := calc.CalculatePoolSize()
minIdle := calc.CalculateMinIdleConnections()
connectionTimeout := calc.CalculateConnectionTimeout()
idleTimeout := calc.CalculateIdleTimeout()
maxLifetime := calc.CalculateMaxLifetime()
fmt.Println("Connection Pool Calculator Results")
fmt.Println("==================================")
fmt.Printf("Optimal Pool Size: %d connections per instance\n", poolSize)
fmt.Printf("Minimum Idle Connections: %d connections per instance\n", minIdle)
fmt.Printf("Connection Timeout: %s\n", connectionTimeout)
fmt.Printf("Idle Timeout: %s\n", idleTimeout)
fmt.Printf("Max Lifetime: %s\n", maxLifetime)
fmt.Println()
fmt.Println("HikariCP Configuration:")
fmt.Println(calc.GenerateHikariConfig())
fmt.Println()
fmt.Println("Spring Boot Configuration:")
fmt.Println(calc.GenerateSpringBootConfig())
}
• Improved load testing methodology to better simulate real-world conditions:
# JMeter test plan improvements
---
test_plan:
name: "Realistic Connection Pool Testing"
description: "Test plan that accurately simulates real-world connection patterns"
thread_groups:
- name: "Steady Load Users"
threads: 100
ramp_up: 60
duration: 1800
loop_count: -1 # Forever
- name: "Bursty Users"
threads: 50
ramp_up: 5
duration: 1800
scheduler:
start_every: 300 # Every 5 minutes
duration: 60 # For 1 minute
- name: "Connection Churners"
threads: 20
ramp_up: 30
duration: 1800
cookie_manager:
clear_cookies_each_iteration: true
http_cache_manager:
clear_cache_each_iteration: true
timers:
- name: "Realistic Think Time"
type: "gaussian_random_timer"
constant_delay: 1000
deviation: 500
controllers:
- name: "Transaction Controller"
type: "transaction"
include_timers: false
listeners:
- name: "Connection Pool Metrics"
type: "backend_listener"
class_name: "org.example.ConnectionPoolMetricsBackendListener"
parameters:
prometheusEndpoint: "http://prometheus:9090/metrics"
applicationName: "test-application"
assertions:
- name: "Response Time Assertion"
type: "response_time"
test_field: "all_samples"
criteria: "less_than_or_equal_to"
value: 500
- name: "Connection Timeout Assertion"
type: "response_message"
test_field: "response_data"
pattern: "Connection timed out"
test_type: "not_contains"
pre_processors:
- name: "Clear Connection State"
type: "bsh_pre_processor"
script: |
// Force new connection establishment
sampler.getHeaderManager().removeHeaderNamed("Connection");
sampler.getHeaderManager().add(new org.apache.jmeter.protocol.http.control.Header("Connection", "close"));
Lessons Learned:
Performance testing must accurately simulate real-world connection patterns and environment configurations.
How to Avoid:
Align test environment configurations with production settings.
Test with realistic connection establishment patterns.
Monitor connection pool metrics during load tests.
Calculate optimal connection pool sizes based on workload.
Include connection churn in performance test scenarios.
No summary provided
What Happened:
A company conducted extensive load testing before a major product launch. The tests showed acceptable performance under load, but when the application went live, it experienced severe performance degradation and eventually crashed. The incident affected thousands of users and resulted in significant revenue loss.
Diagnosis Steps:
Compared test scenarios with actual production usage patterns.
Analyzed database query patterns during the outage.
Reviewed load test configuration and data generation.
Examined application logs and database metrics.
Profiled the application under real-world conditions.
Root Cause:
The load testing initiative failed because:
1. Test data was randomly generated and didn't match real-world data distribution.
2. Key database queries that performed poorly with real data worked well with test data.
3. Cache hit ratios were artificially high during testing.
4. Complex user workflows weren't properly simulated.
5. Data skew that occurred in production wasn't present in test data.
Fix/Workaround:
• Short-term: Implemented database query optimizations:
-- Before: Inefficient query with no index utilization
SELECT * FROM orders
WHERE customer_id = ?
AND status = 'PROCESSING'
ORDER BY created_at DESC;
-- After: Optimized query with proper indexing
CREATE INDEX idx_orders_customer_status_created ON orders(customer_id, status, created_at DESC);
SELECT o.id, o.customer_id, o.status, o.amount, o.created_at
FROM orders o
WHERE o.customer_id = ?
AND o.status = 'PROCESSING'
ORDER BY o.created_at DESC
LIMIT 100;
• Implemented realistic data generation for testing:
# data_generator.py - Realistic test data generator
import random
import string
import datetime
import numpy as np
import pandas as pd
from faker import Faker
from sqlalchemy import create_engine
fake = Faker()
class RealisticDataGenerator:
def __init__(self, db_connection_string, production_stats_file=None):
self.engine = create_engine(db_connection_string)
self.stats = self._load_production_stats(production_stats_file)
def _load_production_stats(self, stats_file):
"""Load production statistics if available, otherwise use defaults."""
if stats_file:
try:
return pd.read_csv(stats_file)
except Exception as e:
print(f"Warning: Could not load production stats: {e}")
# Default statistics based on domain knowledge
return {
"order_amount_mean": 120.50,
"order_amount_std": 75.25,
"orders_per_customer_mean": 3.2,
"orders_per_customer_std": 2.1,
"customer_activity_pareto_shape": 1.5, # For power law distribution
"product_popularity_zipf_param": 1.2, # For Zipf distribution
"status_distribution": {
"COMPLETED": 0.68,
"PROCESSING": 0.15,
"PENDING_PAYMENT": 0.12,
"CANCELLED": 0.04,
"REFUNDED": 0.01
},
"hourly_order_distribution": [
0.01, 0.01, 0.005, 0.005, 0.005, 0.01, # 0-5 hours
0.02, 0.04, 0.06, 0.07, 0.06, 0.07, # 6-11 hours
0.08, 0.07, 0.06, 0.06, 0.07, 0.09, # 12-17 hours
0.10, 0.09, 0.08, 0.06, 0.04, 0.02 # 18-23 hours
]
}
def generate_customers(self, count):
"""Generate realistic customer data."""
customers = []
# Create a power law distribution for customer activity
activity_scores = np.random.pareto(
self.stats["customer_activity_pareto_shape"],
count
) + 1 # +1 to avoid zeros
# Normalize to 1-100 range for activity score
max_score = np.max(activity_scores)
activity_scores = 1 + (activity_scores / max_score) * 99
for i in range(count):
customers.append({
"id": i + 1,
"email": fake.email(),
"name": fake.name(),
"address": fake.address().replace("\n", ", "),
"phone": fake.phone_number(),
"created_at": fake.date_time_between(
start_date="-3y",
end_date="now"
),
"activity_score": int(activity_scores[i]),
"preferred_payment": random.choice([
"credit_card", "credit_card", "credit_card", # Weighted
"paypal", "paypal",
"bank_transfer"
])
})
return customers
def generate_products(self, count):
"""Generate realistic product data with zipf distribution for popularity."""
products = []
# Create a Zipf distribution for product popularity
popularity_ranks = np.random.zipf(
self.stats["product_popularity_zipf_param"],
count
)
# Normalize to 1-100 range for popularity score
max_rank = np.max(popularity_ranks)
popularity_scores = 1 + (popularity_ranks / max_rank) * 99
categories = [
"Electronics", "Clothing", "Home & Kitchen",
"Books", "Sports", "Beauty", "Toys", "Automotive"
]
for i in range(count):
category = random.choice(categories)
# Price ranges by category
if category == "Electronics":
price = random.uniform(50, 1000)
elif category == "Clothing":
price = random.uniform(15, 150)
elif category == "Home & Kitchen":
price = random.uniform(20, 300)
elif category == "Books":
price = random.uniform(10, 50)
elif category == "Sports":
price = random.uniform(20, 200)
elif category == "Beauty":
price = random.uniform(10, 100)
elif category == "Toys":
price = random.uniform(15, 80)
else: # Automotive
price = random.uniform(30, 500)
products.append({
"id": i + 1,
"name": f"{fake.word().capitalize()} {fake.word()} {category.split()[0]}",
"category": category,
"price": round(price, 2),
"stock": int(random.gammavariate(2, 100)) if random.random() > 0.1 else 0, # Some out of stock
"popularity_score": int(popularity_scores[i]),
"created_at": fake.date_time_between(
start_date="-1y",
end_date="now"
)
})
return products
def generate_orders(self, customer_count, product_count, order_count):
"""Generate realistic order data."""
orders = []
order_items = []
# Determine how many orders per customer using a normal distribution
orders_per_customer = np.random.normal(
self.stats["orders_per_customer_mean"],
self.stats["orders_per_customer_std"],
customer_count
)
orders_per_customer = np.maximum(0, orders_per_customer).astype(int)
# Adjust to match target order count
total_orders = sum(orders_per_customer)
if total_orders > 0:
orders_per_customer = np.round(
orders_per_customer * (order_count / total_orders)
).astype(int)
# Generate timestamps with realistic distribution
now = datetime.datetime.now()
timestamps = []
for _ in range(order_count):
# Random day in last 90 days
days_ago = random.randint(0, 90)
date = now - datetime.timedelta(days=days_ago)
# Hour based on hourly distribution (normalize so the probabilities sum to 1)
hourly_p = np.array(self.stats["hourly_order_distribution"], dtype=float)
hour = np.random.choice(24, p=hourly_p / hourly_p.sum())
date = date.replace(hour=hour, minute=random.randint(0, 59))
timestamps.append(date)
# Sort timestamps (newer first)
timestamps.sort(reverse=True)
# Generate orders
order_id = 1
item_id = 1
for customer_id in range(1, customer_count + 1):
customer_order_count = orders_per_customer[customer_id - 1]
for _ in range(customer_order_count):
if not timestamps:
break
timestamp = timestamps.pop()
# Determine order status based on timestamp and distribution
if timestamp > now - datetime.timedelta(hours=1):
status_options = ["PROCESSING", "PENDING_PAYMENT"]
status_weights = [0.7, 0.3]
elif timestamp > now - datetime.timedelta(days=1):
status_options = ["COMPLETED", "PROCESSING", "CANCELLED"]
status_weights = [0.6, 0.3, 0.1]
else:
status_options = list(self.stats["status_distribution"].keys())
status_weights = list(self.stats["status_distribution"].values())
status = random.choices(
status_options,
weights=status_weights
)[0]
# Generate order items
num_items = np.random.geometric(0.5) # Geometric distribution for items per order
num_items = min(max(1, num_items), 10) # Between 1 and 10 items
# Select products based on popularity
product_ids = list(range(1, product_count + 1))
product_weights = [1/i for i in range(1, product_count + 1)] # Zipf-like
selected_products = random.choices(
product_ids,
weights=product_weights,
k=num_items
)
# Calculate order total
order_total = 0
for product_id in selected_products:
quantity = random.randint(1, 3)
price = random.uniform(10, 200) # Simplified
item_total = price * quantity
order_items.append({
"id": item_id,
"order_id": order_id,
"product_id": product_id,
"quantity": quantity,
"price": price,
"total": item_total
})
order_total += item_total
item_id += 1
# Apply random discount to some orders
discount = 0
if random.random() < 0.2: # 20% of orders have discount
discount = order_total * random.uniform(0.05, 0.2)
order_total -= discount
orders.append({
"id": order_id,
"customer_id": customer_id,
"status": status,
"created_at": timestamp,
"updated_at": timestamp + datetime.timedelta(minutes=random.randint(5, 60)),
"total_amount": order_total,
"discount": discount,
"payment_method": random.choice(["credit_card", "paypal", "bank_transfer"]),
"shipping_address": fake.address().replace("\n", ", ")
})
order_id += 1
return orders, order_items
def save_to_database(self, customers, products, orders, order_items):
"""Save generated data to database."""
pd.DataFrame(customers).to_sql("customers", self.engine, if_exists="append", index=False)
pd.DataFrame(products).to_sql("products", self.engine, if_exists="append", index=False)
pd.DataFrame(orders).to_sql("orders", self.engine, if_exists="append", index=False)
pd.DataFrame(order_items).to_sql("order_items", self.engine, if_exists="append", index=False)
def export_to_csv(self, output_dir):
"""Export generated data to CSV files."""
import os
os.makedirs(output_dir, exist_ok=True)
customers = self.generate_customers(1000)
products = self.generate_products(500)
orders, order_items = self.generate_orders(1000, 500, 5000)
pd.DataFrame(customers).to_csv(f"{output_dir}/customers.csv", index=False)
pd.DataFrame(products).to_csv(f"{output_dir}/products.csv", index=False)
pd.DataFrame(orders).to_csv(f"{output_dir}/orders.csv", index=False)
pd.DataFrame(order_items).to_csv(f"{output_dir}/order_items.csv", index=False)
print(f"Exported data to {output_dir}")
# Example usage
if __name__ == "__main__":
import argparse
parser = argparse.ArgumentParser(description="Generate realistic test data")
parser.add_argument("--db", help="Database connection string")
parser.add_argument("--stats", help="Production statistics CSV file")
parser.add_argument("--output", help="Output directory for CSV files")
parser.add_argument("--customers", type=int, default=1000, help="Number of customers")
parser.add_argument("--products", type=int, default=500, help="Number of products")
parser.add_argument("--orders", type=int, default=5000, help="Number of orders")
args = parser.parse_args()
generator = RealisticDataGenerator(
args.db if args.db else "sqlite:///test_data.db",
args.stats
)
if args.output:
generator.export_to_csv(args.output)
else:
customers = generator.generate_customers(args.customers)
products = generator.generate_products(args.products)
orders, order_items = generator.generate_orders(
args.customers,
args.products,
args.orders
)
generator.save_to_database(customers, products, orders, order_items)
print(f"Generated and saved {len(customers)} customers, {len(products)} products, {len(orders)} orders")
• Long-term: Implemented a comprehensive load testing framework:
// LoadTestingFramework.java - Production-like load testing framework
package com.example.loadtesting;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;
import org.apache.commons.math3.distribution.ParetoDistribution;
import org.apache.commons.math3.distribution.ZipfDistribution;
import org.apache.jmeter.config.Arguments;
import org.apache.jmeter.config.CSVDataSet;
import org.apache.jmeter.control.LoopController;
import org.apache.jmeter.control.ThroughputController;
import org.apache.jmeter.engine.StandardJMeterEngine;
import org.apache.jmeter.protocol.http.control.CookieManager;
import org.apache.jmeter.protocol.http.control.HeaderManager;
import org.apache.jmeter.protocol.http.sampler.HTTPSamplerProxy;
import org.apache.jmeter.reporters.ResultCollector;
import org.apache.jmeter.reporters.Summariser;
import org.apache.jmeter.testelement.TestElement;
import org.apache.jmeter.testelement.TestPlan;
import org.apache.jmeter.threads.ThreadGroup;
import org.apache.jmeter.timers.ConstantThroughputTimer;
import org.apache.jmeter.timers.GaussianRandomTimer;
import org.apache.jmeter.util.JMeterUtils;
import org.apache.jorphan.collections.HashTree;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class LoadTestingFramework {
private static final Logger logger = LoggerFactory.getLogger(LoadTestingFramework.class);
private final Properties config;
private final Map<String, UserProfile> userProfiles;
private final Map<String, ScenarioDefinition> scenarios;
private final ProductionDataAnalyzer dataAnalyzer;
public LoadTestingFramework(String configPath, String productionDataPath) throws IOException {
// Load configuration
this.config = new Properties();
try (FileInputStream fis = new FileInputStream(configPath)) {
config.load(fis);
}
// Initialize user profiles
this.userProfiles = new HashMap<>();
initializeUserProfiles();
// Initialize scenarios
this.scenarios = new HashMap<>();
initializeScenarios();
// Initialize production data analyzer
this.dataAnalyzer = new ProductionDataAnalyzer(productionDataPath);
}
private void initializeUserProfiles() {
// Create user profiles based on production data analysis
userProfiles.put("high_value", new UserProfile(
"high_value",
0.15, // 15% of users
new ParetoDistribution(1.0, 1.5), // Session frequency
Duration.ofMinutes(15), // Average session duration
Map.of(
"browse_product", 0.3,
"search_product", 0.2,
"view_product_details", 0.2,
"add_to_cart", 0.15,
"checkout", 0.1,
"view_order_history", 0.05
)
));
userProfiles.put("regular", new UserProfile(
"regular",
0.65, // 65% of users
new ParetoDistribution(1.0, 2.0), // Session frequency
Duration.ofMinutes(8), // Average session duration
Map.of(
"browse_product", 0.4,
"search_product", 0.3,
"view_product_details", 0.15,
"add_to_cart", 0.1,
"checkout", 0.03,
"view_order_history", 0.02
)
));
userProfiles.put("occasional", new UserProfile(
"occasional",
0.20, // 20% of users
new ParetoDistribution(1.0, 3.0), // Session frequency
Duration.ofMinutes(5), // Average session duration
Map.of(
"browse_product", 0.5,
"search_product", 0.3,
"view_product_details", 0.15,
"add_to_cart", 0.04,
"checkout", 0.01,
"view_order_history", 0.0
)
));
}
private void initializeScenarios() {
// Define test scenarios based on real user behavior
scenarios.put("browse_product", new ScenarioDefinition(
"browse_product",
() -> {
HTTPSamplerProxy sampler = new HTTPSamplerProxy();
sampler.setDomain(config.getProperty("target.host"));
sampler.setPort(Integer.parseInt(config.getProperty("target.port")));
sampler.setProtocol(config.getProperty("target.protocol"));
sampler.setPath("/api/products");
sampler.setMethod("GET");
// Add category parameter based on production distribution
String category = dataAnalyzer.getRandomCategory();
sampler.addArgument("category", category);
// Add sorting and pagination based on production patterns
sampler.addArgument("sort", dataAnalyzer.getRandomSortOption());
sampler.addArgument("page", "1");
sampler.addArgument("limit", "20");
return sampler;
}
));
scenarios.put("search_product", new ScenarioDefinition(
"search_product",
() -> {
HTTPSamplerProxy sampler = new HTTPSamplerProxy();
sampler.setDomain(config.getProperty("target.host"));
sampler.setPort(Integer.parseInt(config.getProperty("target.port")));
sampler.setProtocol(config.getProperty("target.protocol"));
sampler.setPath("/api/products/search");
sampler.setMethod("GET");
// Use realistic search terms from production
String searchTerm = dataAnalyzer.getRandomSearchTerm();
sampler.addArgument("q", searchTerm);
return sampler;
}
));
scenarios.put("view_product_details", new ScenarioDefinition(
"view_product_details",
() -> {
HTTPSamplerProxy sampler = new HTTPSamplerProxy();
sampler.setDomain(config.getProperty("target.host"));
sampler.setPort(Integer.parseInt(config.getProperty("target.port")));
sampler.setProtocol(config.getProperty("target.protocol"));
// Use realistic product ID distribution (some products viewed more than others)
String productId = dataAnalyzer.getRandomProductId();
sampler.setPath("/api/products/" + productId);
sampler.setMethod("GET");
return sampler;
}
));
// Add more scenarios...
}
public void runLoadTest(int durationMinutes, int rampUpSeconds, int targetUsers) throws Exception {
// Set JMeter properties
JMeterUtils.loadJMeterProperties(config.getProperty("jmeter.properties"));
JMeterUtils.setJMeterHome(config.getProperty("jmeter.home"));
JMeterUtils.initLocale();
// Create JMeter test plan
HashTree testPlanTree = new HashTree();
TestPlan testPlan = new TestPlan("Realistic Load Test Plan");
testPlan.setProperty(TestElement.TEST_CLASS, TestPlan.class.getName());
testPlan.setProperty(TestElement.GUI_CLASS, TestPlan.class.getName());
testPlan.setUserDefinedVariables(new Arguments());
// Create HTTP header manager
HeaderManager headerManager = new HeaderManager();
headerManager.add("Accept", "application/json");
headerManager.add("Content-Type", "application/json");
// Create cookie manager
CookieManager cookieManager = new CookieManager();
cookieManager.setClearEachIteration(false);
// Add test plan to tree
HashTree testPlanNode = testPlanTree.add(testPlan);
// Create thread groups for each user profile
for (UserProfile profile : userProfiles.values()) {
int profileUsers = (int) Math.round(targetUsers * profile.getUserPercentage());
ThreadGroup threadGroup = new ThreadGroup();
threadGroup.setName(profile.getName() + " Users");
threadGroup.setNumThreads(profileUsers);
threadGroup.setRampUp(rampUpSeconds);
threadGroup.setScheduler(true);
threadGroup.setDuration(durationMinutes * 60);
LoopController loopController = new LoopController();
loopController.setLoops(-1);
loopController.setContinueForever(true);
threadGroup.setSamplerController(loopController);
// Add thread group to test plan
HashTree threadGroupNode = testPlanNode.add(threadGroup);
// Add header and cookie managers
threadGroupNode.add(headerManager);
threadGroupNode.add(cookieManager);
// Add timer to simulate think time
GaussianRandomTimer timer = new GaussianRandomTimer();
timer.setDelay("1000"); // Base delay 1 second
timer.setRange("2000"); // +/- 2 seconds
threadGroupNode.add(timer);
// Add scenarios based on user profile behavior
for (Map.Entry<String, Double> scenarioEntry : profile.getScenarioDistribution().entrySet()) {
String scenarioName = scenarioEntry.getKey();
double percentage = scenarioEntry.getValue();
if (scenarios.containsKey(scenarioName)) {
// Create throughput controller to control scenario execution percentage
ThroughputController throughputController = new ThroughputController();
throughputController.setStyle(ThroughputController.BYPERCENT);
throughputController.setPercentThroughput((float) (percentage * 100));
// Add scenario sampler
HTTPSamplerProxy sampler = scenarios.get(scenarioName).getSamplerSupplier().get();
sampler.setName(profile.getName() + " - " + scenarioName);
// Add throughput controller and sampler to thread group
HashTree scenarioNode = threadGroupNode.add(throughputController);
scenarioNode.add(sampler);
}
}
}
// Add result collector
Summariser summer = new Summariser("Load Test Results");
ResultCollector resultCollector = new ResultCollector(summer);
resultCollector.setFilename(config.getProperty("results.file"));
testPlanTree.add(testPlanTree.getArray()[0], resultCollector);
// Run the test
StandardJMeterEngine jmeterEngine = new StandardJMeterEngine();
jmeterEngine.configure(testPlanTree);
logger.info("Starting load test with {} users for {} minutes", targetUsers, durationMinutes);
jmeterEngine.run();
logger.info("Load test completed");
}
// Inner classes for user profiles and scenarios
private static class UserProfile {
private final String name;
private final double userPercentage;
private final ParetoDistribution sessionFrequency;
private final Duration averageSessionDuration;
private final Map<String, Double> scenarioDistribution;
public UserProfile(String name, double userPercentage, ParetoDistribution sessionFrequency,
Duration averageSessionDuration, Map<String, Double> scenarioDistribution) {
this.name = name;
this.userPercentage = userPercentage;
this.sessionFrequency = sessionFrequency;
this.averageSessionDuration = averageSessionDuration;
this.scenarioDistribution = scenarioDistribution;
}
public String getName() {
return name;
}
public double getUserPercentage() {
return userPercentage;
}
public ParetoDistribution getSessionFrequency() {
return sessionFrequency;
}
public Duration getAverageSessionDuration() {
return averageSessionDuration;
}
public Map<String, Double> getScenarioDistribution() {
return scenarioDistribution;
}
}
private static class ScenarioDefinition {
private final String name;
private final Supplier<HTTPSamplerProxy> samplerSupplier;
public ScenarioDefinition(String name, Supplier<HTTPSamplerProxy> samplerSupplier) {
this.name = name;
this.samplerSupplier = samplerSupplier;
}
public String getName() {
return name;
}
public Supplier<HTTPSamplerProxy> getSamplerSupplier() {
return samplerSupplier;
}
}
private static class ProductionDataAnalyzer {
private final Map<String, Double> categoryDistribution;
private final Map<String, Double> searchTermDistribution;
private final Map<String, Double> productIdDistribution;
private final Map<String, Double> sortOptionDistribution;
public ProductionDataAnalyzer(String productionDataPath) {
// In a real implementation, this would load and analyze production data
// For this example, we'll use hardcoded distributions
categoryDistribution = Map.of(
"electronics", 0.35,
"clothing", 0.25,
"home", 0.15,
"books", 0.10,
"sports", 0.08,
"beauty", 0.07
);
searchTermDistribution = Map.of(
"phone", 0.15,
"laptop", 0.12,
"headphones", 0.10,
"shirt", 0.08,
"shoes", 0.07,
"watch", 0.06,
"camera", 0.05,
"book", 0.04,
"chair", 0.03,
"table", 0.02
);
// In reality, this would be a much larger distribution
productIdDistribution = new HashMap<>();
ZipfDistribution zipf = new ZipfDistribution(1000, 1.2);
for (int i = 1; i <= 1000; i++) {
productIdDistribution.put(String.valueOf(i), zipf.probability(i));
}
sortOptionDistribution = Map.of(
"popularity", 0.45,
"price_asc", 0.25,
"price_desc", 0.20,
"newest", 0.10
);
}
public String getRandomCategory() {
return getRandomFromDistribution(categoryDistribution);
}
public String getRandomSearchTerm() {
return getRandomFromDistribution(searchTermDistribution);
}
public String getRandomProductId() {
return getRandomFromDistribution(productIdDistribution);
}
public String getRandomSortOption() {
return getRandomFromDistribution(sortOptionDistribution);
}
private String getRandomFromDistribution(Map<String, Double> distribution) {
double rand = Math.random();
double cumulativeProbability = 0.0;
for (Map.Entry<String, Double> entry : distribution.entrySet()) {
cumulativeProbability += entry.getValue();
if (rand <= cumulativeProbability) {
return entry.getKey();
}
}
// Fallback to first entry
return distribution.keySet().iterator().next();
}
}
public static void main(String[] args) {
try {
if (args.length < 4) {
System.out.println("Usage: java LoadTestingFramework <configPath> <productionDataPath> <durationMinutes> <targetUsers>");
System.exit(1);
}
String configPath = args[0];
String productionDataPath = args[1];
int durationMinutes = Integer.parseInt(args[2]);
int targetUsers = Integer.parseInt(args[3]);
LoadTestingFramework framework = new LoadTestingFramework(configPath, productionDataPath);
framework.runLoadTest(durationMinutes, 60, targetUsers);
} catch (Exception e) {
logger.error("Error running load test", e);
System.exit(1);
}
}
}
Lessons Learned:
Realistic test data and scenarios are essential for effective performance testing.
How to Avoid:
Use production data patterns to generate test data.
Implement data generation tools that match real-world distributions.
Test with realistic user workflows and access patterns.
Monitor cache hit ratios during testing to ensure they match production (see the sketch after this list).
Include data skew and edge cases in test scenarios.
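To make the cache-hit-ratio check concrete, here is a minimal sketch assuming the application caches with Caffeine; the cache name, sizes, and the 0.95 threshold are illustrative assumptions rather than the team's actual values.
// CacheHitRatioCheck.java - sketch of monitoring cache hit ratios with Caffeine (names and threshold are assumptions)
import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.Caffeine;
import com.github.benmanes.caffeine.cache.stats.CacheStats;
public class CacheHitRatioCheck {
    public static void main(String[] args) {
        Cache<String, String> productCache = Caffeine.newBuilder()
                .maximumSize(10_000)
                .recordStats() // required so stats() reports hits and misses
                .build();
        // Simulate the mix of hits and misses a load test would produce
        productCache.put("product:1", "cached");
        productCache.getIfPresent("product:1"); // hit
        productCache.getIfPresent("product:2"); // miss
        CacheStats stats = productCache.stats();
        System.out.printf("hits=%d misses=%d hitRate=%.2f%n",
                stats.hitCount(), stats.missCount(), stats.hitRate());
        // A hit rate far above what production shows is a sign the test data is too repetitive
        if (stats.hitRate() > 0.95) {
            System.out.println("WARNING: hit ratio is suspiciously high - test data may be too repetitive");
        }
    }
}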
No summary provided
What Happened:
A team conducted extensive load testing for a new e-commerce platform before launch. The tests showed excellent performance under the expected load. However, after deployment to production, the system experienced severe performance degradation during peak hours. Investigation revealed that the test data used during load testing didn't accurately represent real-world usage patterns, particularly around product search and filtering operations.
Diagnosis Steps:
Compared test data patterns with production data patterns.
Analyzed database query execution plans in test vs. production.
Profiled application components under real user load.
Reviewed caching behavior with real-world access patterns.
Examined load test scripts and data generation logic.
Root Cause:
The investigation revealed that the test data generation had several critical flaws:
1. Product catalog data lacked the variability and distribution of real products.
2. Test user behavior didn't include realistic search patterns and filter combinations.
3. Test data was too uniformly distributed, lacking the "long tail" characteristics of real data.
4. Cache hit ratios were artificially high due to repetitive access patterns in the test.
5. Database query plans optimized for test data performed poorly with production data distributions.
Fix/Workaround:
• Short-term: Implemented emergency database index optimizations and query rewrites
• Created a comprehensive data generation framework in Java:
// DataGenerator.java - Realistic test data generation framework
package com.example.loadtest;
import com.github.javafaker.Faker;
import org.apache.commons.math3.distribution.ZipfDistribution;
import org.apache.commons.math3.distribution.ParetoDistribution;
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.*;
import java.util.concurrent.ThreadLocalRandom;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
/**
* Generates realistic test data for e-commerce load testing
*/
public class DataGenerator {
private final Faker faker = new Faker();
private final Random random = new Random();
private final ZipfDistribution zipfDistribution;
private final ParetoDistribution paretoDistribution;
// Configuration
private final int numUsers;
private final int numProducts;
private final int numCategories;
private final int numBrands;
private final int numAttributes;
private final int numAttributeValues;
private final int numOrders;
// Generated data
private final List<User> users = new ArrayList<>();
private final List<Product> products = new ArrayList<>();
private final List<Category> categories = new ArrayList<>();
private final List<Brand> brands = new ArrayList<>();
private final List<Attribute> attributes = new ArrayList<>();
private final Map<Attribute, List<AttributeValue>> attributeValues = new HashMap<>();
private final List<Order> orders = new ArrayList<>();
/**
* Creates a new data generator with the specified configuration
*/
public DataGenerator(int numUsers, int numProducts, int numCategories, int numBrands,
int numAttributes, int numAttributeValues, int numOrders) {
this.numUsers = numUsers;
this.numProducts = numProducts;
this.numCategories = numCategories;
this.numBrands = numBrands;
this.numAttributes = numAttributes;
this.numAttributeValues = numAttributeValues;
this.numOrders = numOrders;
// Initialize distributions for realistic data patterns
this.zipfDistribution = new ZipfDistribution(numProducts, 1.1);
this.paretoDistribution = new ParetoDistribution(1, 3);
}
/**
* Generates all test data
*/
public void generateData() {
System.out.println("Generating categories...");
generateCategories();
System.out.println("Generating brands...");
generateBrands();
System.out.println("Generating attributes and values...");
generateAttributesAndValues();
System.out.println("Generating products...");
generateProducts();
System.out.println("Generating users...");
generateUsers();
System.out.println("Generating orders...");
generateOrders();
System.out.println("Data generation complete!");
}
/**
* Generates realistic categories with hierarchical structure
*/
private void generateCategories() {
// Create root categories
int rootCategoryCount = numCategories / 5;
for (int i = 0; i < rootCategoryCount; i++) {
Category category = new Category(
i + 1,
faker.commerce().department(),
null
);
categories.add(category);
}
// Create subcategories with varying depth
int id = rootCategoryCount + 1;
for (Category rootCategory : new ArrayList<>(categories)) {
int subcategoryCount = random.nextInt(5) + 1;
for (int i = 0; i < subcategoryCount && categories.size() < numCategories; i++) {
Category subcategory = new Category(
id++,
faker.commerce().productName().split(" ")[0],
rootCategory.getId()
);
categories.add(subcategory);
// Add some third-level categories
if (random.nextDouble() < 0.3) {
int subsubcategoryCount = random.nextInt(3) + 1;
for (int j = 0; j < subsubcategoryCount && categories.size() < numCategories; j++) {
Category subsubcategory = new Category(
id++,
faker.commerce().material(),
subcategory.getId()
);
categories.add(subsubcategory);
}
}
}
}
}
/**
* Generates brands with realistic market share distribution
*/
private void generateBrands() {
// Create brands with varying popularity
for (int i = 0; i < numBrands; i++) {
double marketShare;
if (i < numBrands * 0.1) {
// Top 10% brands have high market share
marketShare = 0.5 + random.nextDouble() * 0.3;
} else if (i < numBrands * 0.3) {
// Next 20% have medium market share
marketShare = 0.2 + random.nextDouble() * 0.3;
} else {
// Remaining 70% have low market share
marketShare = random.nextDouble() * 0.2;
}
Brand brand = new Brand(
i + 1,
faker.company().name(),
marketShare
);
brands.add(brand);
}
}
/**
* Generates product attributes and their possible values
*/
private void generateAttributesAndValues() {
// Common attribute names for different product types
String[] attributeNames = {
"Color", "Size", "Material", "Weight", "Dimensions",
"Style", "Pattern", "Fit", "Season", "Occasion",
"Capacity", "Power", "Connectivity", "Resolution", "Memory"
};
// Create attributes
for (int i = 0; i < Math.min(numAttributes, attributeNames.length); i++) {
Attribute attribute = new Attribute(i + 1, attributeNames[i]);
attributes.add(attribute);
attributeValues.put(attribute, new ArrayList<>());
}
// Fill remaining attributes if needed
for (int i = attributeNames.length; i < numAttributes; i++) {
Attribute attribute = new Attribute(i + 1, "Attribute" + (i + 1));
attributes.add(attribute);
attributeValues.put(attribute, new ArrayList<>());
}
// Generate attribute values
for (Attribute attribute : attributes) {
List<AttributeValue> values = attributeValues.get(attribute);
// Generate appropriate values based on attribute name
Supplier<String> valueGenerator;
switch (attribute.getName()) {
case "Color":
valueGenerator = () -> faker.color().name();
break;
case "Size":
valueGenerator = () -> {
String[] sizes = {"XS", "S", "M", "L", "XL", "XXL", "XXXL"};
return sizes[random.nextInt(sizes.length)];
};
break;
case "Material":
valueGenerator = () -> faker.commerce().material();
break;
case "Weight":
valueGenerator = () -> (random.nextInt(100) + 1) + " kg";
break;
case "Dimensions":
valueGenerator = () -> {
int w = random.nextInt(100) + 10;
int h = random.nextInt(100) + 10;
int d = random.nextInt(50) + 5;
return w + "x" + h + "x" + d + " cm";
};
break;
default:
valueGenerator = () -> faker.lorem().word();
}
// Generate values for this attribute
int valueCount = Math.min(20, numAttributeValues / numAttributes);
for (int i = 0; i < valueCount; i++) {
AttributeValue value = new AttributeValue(
i + 1,
attribute.getId(),
valueGenerator.get()
);
values.add(value);
}
}
}
/**
* Generates products with realistic attribute distributions
*/
private void generateProducts() {
// Create products with realistic distributions
for (int i = 0; i < numProducts; i++) {
// Select category with preference for leaf categories
Category category;
if (random.nextDouble() < 0.8) {
// 80% of products in leaf categories
List<Category> leafCategories = categories.stream()
.filter(c -> categories.stream().noneMatch(sub -> sub.getParentId() != null && sub.getParentId().equals(c.getId())))
.collect(Collectors.toList());
category = leafCategories.get(random.nextInt(leafCategories.size()));
} else {
// 20% in non-leaf categories
category = categories.get(random.nextInt(categories.size()));
}
// Select brand with preference for popular brands
Brand brand;
if (random.nextDouble() < 0.7) {
// 70% of products from top 30% brands
List<Brand> popularBrands = brands.stream()
.sorted(Comparator.comparing(Brand::getMarketShare).reversed())
.limit((long)(numBrands * 0.3))
.collect(Collectors.toList());
brand = popularBrands.get(random.nextInt(popularBrands.size()));
} else {
// 30% from other brands
brand = brands.get(random.nextInt(brands.size()));
}
// Generate price with realistic distribution
double price;
double priceRandom = random.nextDouble();
if (priceRandom < 0.1) {
// 10% premium products
price = 500 + random.nextDouble() * 4500;
} else if (priceRandom < 0.3) {
// 20% high-end products
price = 100 + random.nextDouble() * 400;
} else if (priceRandom < 0.7) {
// 40% mid-range products
price = 20 + random.nextDouble() * 80;
} else {
// 30% budget products
price = 1 + random.nextDouble() * 19;
}
price = Math.round(price * 100) / 100.0;
// Generate stock level with realistic distribution
int stock;
double stockRandom = random.nextDouble();
if (stockRandom < 0.05) {
// 5% out of stock
stock = 0;
} else if (stockRandom < 0.15) {
// 10% low stock
stock = random.nextInt(5) + 1;
} else if (stockRandom < 0.85) {
// 70% normal stock
stock = random.nextInt(96) + 5;
} else {
// 15% high stock
stock = random.nextInt(900) + 100;
}
// Create product
Product product = new Product(
i + 1,
faker.commerce().productName(),
faker.lorem().paragraph(),
price,
stock,
category.getId(),
brand.getId(),
LocalDateTime.now().minusDays(random.nextInt(365))
);
products.add(product);
// Add product attributes (not all products have all attributes)
List<Attribute> productAttributes = new ArrayList<>(attributes);
Collections.shuffle(productAttributes);
int attrCount = random.nextInt(Math.min(5, numAttributes)) + 1;
for (int j = 0; j < attrCount; j++) {
Attribute attribute = productAttributes.get(j);
List<AttributeValue> values = attributeValues.get(attribute);
AttributeValue value = values.get(random.nextInt(values.size()));
product.addAttributeValue(value.getId());
}
}
}
/**
* Generates users with realistic demographics
*/
private void generateUsers() {
for (int i = 0; i < numUsers; i++) {
// Generate user with realistic age distribution
int age;
double ageRandom = random.nextDouble();
if (ageRandom < 0.05) {
// 5% under 18
age = random.nextInt(5) + 13;
} else if (ageRandom < 0.25) {
// 20% 18-24
age = random.nextInt(7) + 18;
} else if (ageRandom < 0.65) {
// 40% 25-40
age = random.nextInt(16) + 25;
} else if (ageRandom < 0.9) {
// 25% 41-60
age = random.nextInt(20) + 41;
} else {
// 10% over 60
age = random.nextInt(25) + 61;
}
User user = new User(
i + 1,
faker.name().firstName(),
faker.name().lastName(),
faker.internet().emailAddress(),
age,
faker.address().country(),
LocalDateTime.now().minusDays(random.nextInt(1000))
);
users.add(user);
}
}
/**
* Generates orders with realistic patterns
*/
private void generateOrders() {
// Generate orders with realistic time and product distributions
for (int i = 0; i < numOrders; i++) {
// Select user with preference for active users
User user;
if (random.nextDouble() < 0.8) {
// 80% of orders from top 20% active users
int activeUserCount = (int)(numUsers * 0.2);
user = users.get(random.nextInt(activeUserCount));
} else {
// 20% from other users
user = users.get(random.nextInt(users.size()));
}
// Generate order date with realistic distribution
LocalDateTime orderDate;
double dateRandom = random.nextDouble();
if (dateRandom < 0.4) {
// 40% of orders in last 30 days
orderDate = LocalDateTime.now().minusDays(random.nextInt(30));
} else if (dateRandom < 0.7) {
// 30% in last 31-90 days
orderDate = LocalDateTime.now().minusDays(random.nextInt(60) + 31);
} else if (dateRandom < 0.9) {
// 20% in last 91-365 days
orderDate = LocalDateTime.now().minusDays(random.nextInt(275) + 91);
} else {
// 10% older than a year
orderDate = LocalDateTime.now().minusDays(random.nextInt(365) + 366);
}
// Create order
Order order = new Order(
i + 1,
user.getId(),
orderDate,
generateOrderStatus(orderDate)
);
orders.add(order);
// Add order items with Zipf distribution for product popularity
int itemCount = (int)Math.round(paretoDistribution.sample()) + 1;
itemCount = Math.min(itemCount, 10); // Cap at 10 items per order
Set<Integer> orderProducts = new HashSet<>();
for (int j = 0; j < itemCount; j++) {
// Select product using Zipf distribution for popularity
int productIndex;
do {
productIndex = zipfDistribution.sample() - 1;
if (productIndex >= products.size()) {
productIndex = random.nextInt(products.size());
}
} while (orderProducts.contains(productIndex));
orderProducts.add(productIndex);
Product product = products.get(productIndex);
// Determine quantity with realistic distribution
int quantity;
if (random.nextDouble() < 0.8) {
// 80% of order items are quantity 1
quantity = 1;
} else if (random.nextDouble() < 0.95) {
// 15% are quantity 2-3
quantity = random.nextInt(2) + 2;
} else {
// 5% are quantity 4-10
quantity = random.nextInt(7) + 4;
}
order.addItem(product.getId(), quantity, product.getPrice());
}
}
}
/**
* Generates a realistic order status based on the order date
*/
private String generateOrderStatus(LocalDateTime orderDate) {
LocalDateTime now = LocalDateTime.now();
long daysSinceOrder = java.time.Duration.between(orderDate, now).toDays();
if (daysSinceOrder < 1) {
// Orders less than a day old
double random = this.random.nextDouble();
if (random < 0.2) return "PENDING";
if (random < 0.8) return "PROCESSING";
return "SHIPPED";
} else if (daysSinceOrder < 3) {
// Orders 1-3 days old
double random = this.random.nextDouble();
if (random < 0.1) return "PENDING";
if (random < 0.3) return "PROCESSING";
if (random < 0.8) return "SHIPPED";
return "DELIVERED";
} else if (daysSinceOrder < 7) {
// Orders 3-7 days old
double random = this.random.nextDouble();
if (random < 0.05) return "PROCESSING";
if (random < 0.3) return "SHIPPED";
if (random < 0.95) return "DELIVERED";
return "RETURNED";
} else {
// Orders more than 7 days old
double random = this.random.nextDouble();
if (random < 0.02) return "SHIPPED";
if (random < 0.9) return "DELIVERED";
if (random < 0.98) return "RETURNED";
return "CANCELLED";
}
}
/**
* Exports all generated data to CSV files
*/
public void exportToCsv(String outputDir) throws IOException {
// Create output directory if it doesn't exist
java.nio.file.Files.createDirectories(java.nio.file.Paths.get(outputDir));
// Export users
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/users.csv"))) {
writer.write("id,first_name,last_name,email,age,country,registration_date\n");
for (User user : users) {
writer.write(String.format("%d,%s,%s,%s,%d,%s,%s\n",
user.getId(),
escapeCsv(user.getFirstName()),
escapeCsv(user.getLastName()),
escapeCsv(user.getEmail()),
user.getAge(),
escapeCsv(user.getCountry()),
user.getRegistrationDate().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
));
}
}
// Export categories
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/categories.csv"))) {
writer.write("id,name,parent_id\n");
for (Category category : categories) {
writer.write(String.format("%d,%s,%s\n",
category.getId(),
escapeCsv(category.getName()),
category.getParentId() == null ? "" : category.getParentId()
));
}
}
// Export brands
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/brands.csv"))) {
writer.write("id,name,market_share\n");
for (Brand brand : brands) {
writer.write(String.format("%d,%s,%.4f\n",
brand.getId(),
escapeCsv(brand.getName()),
brand.getMarketShare()
));
}
}
// Export attributes
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/attributes.csv"))) {
writer.write("id,name\n");
for (Attribute attribute : attributes) {
writer.write(String.format("%d,%s\n",
attribute.getId(),
escapeCsv(attribute.getName())
));
}
}
// Export attribute values
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/attribute_values.csv"))) {
writer.write("id,attribute_id,value\n");
for (Map.Entry<Attribute, List<AttributeValue>> entry : attributeValues.entrySet()) {
for (AttributeValue value : entry.getValue()) {
writer.write(String.format("%d,%d,%s\n",
value.getId(),
value.getAttributeId(),
escapeCsv(value.getValue())
));
}
}
}
// Export products
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/products.csv"))) {
writer.write("id,name,description,price,stock,category_id,brand_id,created_at\n");
for (Product product : products) {
writer.write(String.format("%d,%s,%s,%.2f,%d,%d,%d,%s\n",
product.getId(),
escapeCsv(product.getName()),
escapeCsv(product.getDescription()),
product.getPrice(),
product.getStock(),
product.getCategoryId(),
product.getBrandId(),
product.getCreatedAt().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME)
));
}
}
// Export product attributes
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/product_attributes.csv"))) {
writer.write("product_id,attribute_value_id\n");
for (Product product : products) {
for (Integer attributeValueId : product.getAttributeValueIds()) {
writer.write(String.format("%d,%d\n",
product.getId(),
attributeValueId
));
}
}
}
// Export orders
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/orders.csv"))) {
writer.write("id,user_id,order_date,status\n");
for (Order order : orders) {
writer.write(String.format("%d,%d,%s,%s\n",
order.getId(),
order.getUserId(),
order.getOrderDate().format(DateTimeFormatter.ISO_LOCAL_DATE_TIME),
order.getStatus()
));
}
}
// Export order items
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/order_items.csv"))) {
writer.write("order_id,product_id,quantity,price\n");
for (Order order : orders) {
for (OrderItem item : order.getItems()) {
writer.write(String.format("%d,%d,%d,%.2f\n",
order.getId(),
item.getProductId(),
item.getQuantity(),
item.getPrice()
));
}
}
}
System.out.println("Data exported to " + outputDir);
}
/**
* Escapes a string for CSV output
*/
private String escapeCsv(String value) {
if (value == null) {
return "";
}
value = value.replace("\"", "\"\"");
if (value.contains(",") || value.contains("\"") || value.contains("\n")) {
value = "\"" + value + "\"";
}
return value;
}
/**
* Generates JMeter test data files
*/
public void generateJMeterTestData(String outputDir) throws IOException {
// Create output directory if it doesn't exist
java.nio.file.Files.createDirectories(java.nio.file.Paths.get(outputDir));
// Generate search terms file
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/search_terms.csv"))) {
writer.write("search_term,weight\n");
// Extract words from product names and descriptions
Map<String, Integer> wordFrequency = new HashMap<>();
for (Product product : products) {
String[] nameWords = product.getName().toLowerCase().split("\\s+");
String[] descWords = product.getDescription().toLowerCase().split("\\s+");
for (String word : nameWords) {
if (word.length() > 3) {
wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 3);
}
}
for (String word : descWords) {
if (word.length() > 3) {
wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 1);
}
}
}
// Add category and brand names
for (Category category : categories) {
String[] words = category.getName().toLowerCase().split("\\s+");
for (String word : words) {
if (word.length() > 3) {
wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 5);
}
}
}
for (Brand brand : brands) {
String[] words = brand.getName().toLowerCase().split("\\s+");
for (String word : words) {
if (word.length() > 3) {
wordFrequency.put(word, wordFrequency.getOrDefault(word, 0) + 10);
}
}
}
// Sort by frequency
List<Map.Entry<String, Integer>> sortedWords = new ArrayList<>(wordFrequency.entrySet());
sortedWords.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
// Write top 1000 words
int limit = Math.min(1000, sortedWords.size());
for (int i = 0; i < limit; i++) {
Map.Entry<String, Integer> entry = sortedWords.get(i);
writer.write(String.format("%s,%d\n", entry.getKey(), entry.getValue()));
}
}
// Generate filter combinations file
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/filter_combinations.csv"))) {
writer.write("category_id,brand_ids,attribute_filters,price_min,price_max,sort_by,sort_direction\n");
// Generate realistic filter combinations
int numCombinations = 1000;
for (int i = 0; i < numCombinations; i++) {
// Select category
Category category = categories.get(random.nextInt(categories.size()));
// Select brands (0-3)
int brandCount = random.nextInt(4);
List<Brand> selectedBrands = new ArrayList<>(brands);
Collections.shuffle(selectedBrands);
selectedBrands = selectedBrands.subList(0, brandCount);
String brandIds = selectedBrands.stream()
.map(b -> String.valueOf(b.getId()))
.collect(Collectors.joining(","));
// Select attributes (0-3)
int attrCount = random.nextInt(4);
List<Attribute> selectedAttrs = new ArrayList<>(attributes);
Collections.shuffle(selectedAttrs);
selectedAttrs = selectedAttrs.subList(0, Math.min(attrCount, selectedAttrs.size()));
StringBuilder attrFilters = new StringBuilder();
for (Attribute attr : selectedAttrs) {
List<AttributeValue> values = attributeValues.get(attr);
if (values.isEmpty()) continue;
// Select 1-3 values for this attribute
int valueCount = random.nextInt(3) + 1;
List<AttributeValue> selectedValues = new ArrayList<>(values);
Collections.shuffle(selectedValues);
selectedValues = selectedValues.subList(0, Math.min(valueCount, selectedValues.size()));
if (attrFilters.length() > 0) {
attrFilters.append(";");
}
attrFilters.append(attr.getId()).append(":");
attrFilters.append(selectedValues.stream()
.map(v -> String.valueOf(v.getId()))
.collect(Collectors.joining(",")));
}
// Price range (50% of filters include price)
double priceMin = 0;
double priceMax = 0;
if (random.nextDouble() < 0.5) {
// Generate realistic price ranges; draw once so the budget/mid/premium
// bands form a true 30%/40%/30% partition instead of independent coin flips
double priceBand = random.nextDouble();
if (priceBand < 0.3) {
// Budget range
priceMin = 0;
priceMax = 20 + random.nextDouble() * 30;
} else if (priceBand < 0.7) {
// Mid range
priceMin = 20 + random.nextDouble() * 30;
priceMax = 100 + random.nextDouble() * 150;
} else {
// Premium range
priceMin = 100 + random.nextDouble() * 200;
priceMax = 500 + random.nextDouble() * 4500;
}
}
// Sort options
String[] sortOptions = {"price", "name", "created_at", "popularity"};
String sortBy = sortOptions[random.nextInt(sortOptions.length)];
String sortDirection = random.nextBoolean() ? "asc" : "desc";
writer.write(String.format("%d,\"%s\",\"%s\",%.2f,%.2f,%s,%s\n",
category.getId(),
brandIds,
attrFilters,
priceMin,
priceMax,
sortBy,
sortDirection
));
}
}
// Generate user sessions file
try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputDir + "/user_sessions.csv"))) {
writer.write("session_id,user_id,session_length,product_views,search_count,filter_count,cart_adds,checkout\n");
// Generate realistic user sessions
int numSessions = 10000;
for (int i = 0; i < numSessions; i++) {
int userId = random.nextInt(numUsers) + 1;
// Session length follows a distribution where most sessions are short
int sessionLength;
double sessionRandom = random.nextDouble();
if (sessionRandom < 0.3) {
// 30% bounce (1-2 pages)
sessionLength = random.nextInt(2) + 1;
} else if (sessionRandom < 0.7) {
// 40% short sessions (3-5 pages)
sessionLength = random.nextInt(3) + 3;
} else if (sessionRandom < 0.9) {
// 20% medium sessions (6-10 pages)
sessionLength = random.nextInt(5) + 6;
} else {
// 10% long sessions (11-30 pages)
sessionLength = random.nextInt(20) + 11;
}
// Calculate realistic actions based on session length
int productViews = 0;
int searchCount = 0;
int filterCount = 0;
int cartAdds = 0;
boolean checkout = false;
if (sessionLength >= 2) {
// For sessions with at least 2 pages
productViews = Math.min(sessionLength - 1, random.nextInt(sessionLength) + 1);
if (sessionLength >= 3) {
// For sessions with at least 3 pages
searchCount = Math.min(sessionLength - productViews, random.nextInt(3));
if (sessionLength >= 4) {
// For sessions with at least 4 pages
filterCount = Math.min(sessionLength - productViews - searchCount, random.nextInt(3));
if (productViews >= 2) {
// Only add to cart if viewed at least 2 products
cartAdds = random.nextInt(Math.min(3, productViews)) * (random.nextDouble() < 0.3 ? 1 : 0);
if (cartAdds > 0 && sessionLength >= 5) {
// Only checkout if added to cart and session is long enough
checkout = random.nextDouble() < 0.2;
}
}
}
}
}
writer.write(String.format("%d,%d,%d,%d,%d,%d,%d,%b\n",
i + 1,
userId,
sessionLength,
productViews,
searchCount,
filterCount,
cartAdds,
checkout
));
}
}
System.out.println("JMeter test data exported to " + outputDir);
}
/**
* Main method for standalone execution
*/
public static void main(String[] args) {
try {
// Configure data generation
int numUsers = 10000;
int numProducts = 5000;
int numCategories = 100;
int numBrands = 200;
int numAttributes = 15;
int numAttributeValues = 150;
int numOrders = 20000;
// Create data generator
DataGenerator generator = new DataGenerator(
numUsers, numProducts, numCategories, numBrands,
numAttributes, numAttributeValues, numOrders
);
// Generate data
generator.generateData();
// Export to CSV
generator.exportToCsv("./test_data");
// Generate JMeter test data
generator.generateJMeterTestData("./jmeter_data");
} catch (Exception e) {
e.printStackTrace();
}
}
// Data model classes
static class User {
private final int id;
private final String firstName;
private final String lastName;
private final String email;
private final int age;
private final String country;
private final LocalDateTime registrationDate;
public User(int id, String firstName, String lastName, String email, int age, String country, LocalDateTime registrationDate) {
this.id = id;
this.firstName = firstName;
this.lastName = lastName;
this.email = email;
this.age = age;
this.country = country;
this.registrationDate = registrationDate;
}
public int getId() { return id; }
public String getFirstName() { return firstName; }
public String getLastName() { return lastName; }
public String getEmail() { return email; }
public int getAge() { return age; }
public String getCountry() { return country; }
public LocalDateTime getRegistrationDate() { return registrationDate; }
}
static class Category {
private final int id;
private final String name;
private final Integer parentId;
public Category(int id, String name, Integer parentId) {
this.id = id;
this.name = name;
this.parentId = parentId;
}
public int getId() { return id; }
public String getName() { return name; }
public Integer getParentId() { return parentId; }
}
static class Brand {
private final int id;
private final String name;
private final double marketShare;
public Brand(int id, String name, double marketShare) {
this.id = id;
this.name = name;
this.marketShare = marketShare;
}
public int getId() { return id; }
public String getName() { return name; }
public double getMarketShare() { return marketShare; }
}
static class Attribute {
private final int id;
private final String name;
public Attribute(int id, String name) {
this.id = id;
this.name = name;
}
public int getId() { return id; }
public String getName() { return name; }
}
static class AttributeValue {
private final int id;
private final int attributeId;
private final String value;
public AttributeValue(int id, int attributeId, String value) {
this.id = id;
this.attributeId = attributeId;
this.value = value;
}
public int getId() { return id; }
public int getAttributeId() { return attributeId; }
public String getValue() { return value; }
}
static class Product {
private final int id;
private final String name;
private final String description;
private final double price;
private final int stock;
private final int categoryId;
private final int brandId;
private final LocalDateTime createdAt;
private final List<Integer> attributeValueIds = new ArrayList<>();
public Product(int id, String name, String description, double price, int stock, int categoryId, int brandId, LocalDateTime createdAt) {
this.id = id;
this.name = name;
this.description = description;
this.price = price;
this.stock = stock;
this.categoryId = categoryId;
this.brandId = brandId;
this.createdAt = createdAt;
}
public void addAttributeValue(int attributeValueId) {
attributeValueIds.add(attributeValueId);
}
public int getId() { return id; }
public String getName() { return name; }
public String getDescription() { return description; }
public double getPrice() { return price; }
public int getStock() { return stock; }
public int getCategoryId() { return categoryId; }
public int getBrandId() { return brandId; }
public LocalDateTime getCreatedAt() { return createdAt; }
public List<Integer> getAttributeValueIds() { return attributeValueIds; }
}
static class Order {
private final int id;
private final int userId;
private final LocalDateTime orderDate;
private final String status;
private final List<OrderItem> items = new ArrayList<>();
public Order(int id, int userId, LocalDateTime orderDate, String status) {
this.id = id;
this.userId = userId;
this.orderDate = orderDate;
this.status = status;
}
public void addItem(int productId, int quantity, double price) {
items.add(new OrderItem(productId, quantity, price));
}
public int getId() { return id; }
public int getUserId() { return userId; }
public LocalDateTime getOrderDate() { return orderDate; }
public String getStatus() { return status; }
public List<OrderItem> getItems() { return items; }
}
static class OrderItem {
private final int productId;
private final int quantity;
private final double price;
public OrderItem(int productId, int quantity, double price) {
this.productId = productId;
this.quantity = quantity;
this.price = price;
}
public int getProductId() { return productId; }
public int getQuantity() { return quantity; }
public double getPrice() { return price; }
}
}
• Long-term: Implemented a comprehensive load testing strategy:
- Created realistic test data generation with proper distributions
- Implemented user behavior modeling based on production analytics
- Developed database query analysis and optimization tools (see the sketch after this list)
- Established performance testing as part of the CI/CD pipeline
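As a hedged illustration of the query-analysis tooling item above, a small helper can run each captured query through the database's plan explainer; the class name, method, and PostgreSQL EXPLAIN syntax are assumptions here, not the team's actual tool:
// QueryPlanAnalyzer.java - hypothetical helper that prints execution plans for captured queries (assumes PostgreSQL)
import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.List;

public class QueryPlanAnalyzer {

    private final DataSource dataSource;

    public QueryPlanAnalyzer(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Runs EXPLAIN (ANALYZE, BUFFERS) for each captured query (note: ANALYZE executes
    // the query) and prints the plan, so sequential scans and missing indexes can be
    // spotted before the load test rather than during it.
    public void analyze(List<String> capturedQueries) throws SQLException {
        try (Connection conn = dataSource.getConnection();
             Statement stmt = conn.createStatement()) {
            for (String query : capturedQueries) {
                System.out.println("=== " + query);
                try (ResultSet rs = stmt.executeQuery("EXPLAIN (ANALYZE, BUFFERS) " + query)) {
                    while (rs.next()) {
                        System.out.println(rs.getString(1));
                    }
                }
            }
        }
    }
}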
Lessons Learned:
Realistic test data is critical for accurate performance testing results.
How to Avoid:
Generate test data that matches production data distributions and patterns.
Model realistic user behavior based on analytics data.
Test with production-sized datasets including edge cases.
Include database query analysis in performance testing.
Validate cache behavior with realistic access patterns.
No summary provided
What Happened:
During a scheduled performance test of a new application release, the system initially handled the expected load but began experiencing severe performance degradation after about 10 minutes. Response times increased exponentially, and users started receiving timeout errors. The application servers showed high CPU usage, and database connection errors appeared in the logs. The test had to be aborted before completion, and the release was delayed.
Diagnosis Steps:
Analyzed JMeter test results to identify the failure pattern.
Examined application server logs for error messages and stack traces.
Monitored database connection pool metrics during a repeated test.
Profiled the application to identify resource usage patterns.
Reviewed database query execution plans and performance.
Root Cause:
The investigation revealed multiple issues with database connection handling:
1. The application was opening new database connections for each transaction without properly closing them.
2. Connection pool settings were misconfigured with excessive max connections.
3. Database queries were inefficient, holding connections open for too long.
4. No timeout settings were configured for long-running queries.
5. The application lacked proper connection leak detection and recovery.
Fix/Workaround:
• Implemented proper connection handling in the application code (see the sketch after this list)
• Optimized database connection pool configuration
• Added connection leak detection and timeout settings
• Improved query efficiency with proper indexing and optimization
• Implemented circuit breaker pattern for database operations
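The connection handling, timeout, and leak detection fixes above can be sketched as follows, assuming HikariCP as the connection pool; the repository class, SQL, and specific settings are illustrative values rather than the actual application code:
// OrderRepository.java - try-with-resources connection handling with pool-level safeguards (hypothetical example)
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

import javax.sql.DataSource;
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class OrderRepository {

    private final DataSource dataSource;

    public OrderRepository(DataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Connection, statement, and result set are all closed automatically,
    // even when an exception is thrown, so every connection returns to the pool.
    public String findOrderStatus(long orderId) throws SQLException {
        String sql = "SELECT status FROM orders WHERE id = ?";
        try (Connection conn = dataSource.getConnection();
             PreparedStatement stmt = conn.prepareStatement(sql)) {
            stmt.setQueryTimeout(5); // cap long-running queries (seconds)
            stmt.setLong(1, orderId);
            try (ResultSet rs = stmt.executeQuery()) {
                return rs.next() ? rs.getString("status") : null;
            }
        }
    }

    // Bounded pool with fail-fast acquisition, periodic recycling, and leak detection.
    public static DataSource createPool(String jdbcUrl, String user, String password) {
        HikariConfig config = new HikariConfig();
        config.setJdbcUrl(jdbcUrl);
        config.setUsername(user);
        config.setPassword(password);
        config.setMaximumPoolSize(30);            // bounded, workload-based size
        config.setConnectionTimeout(10_000);      // fail fast when the pool is exhausted
        config.setMaxLifetime(600_000);           // recycle connections periodically
        config.setLeakDetectionThreshold(30_000); // log connections held longer than 30s
        return new HikariDataSource(config);
    }
}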
Lessons Learned:
Database connection management is critical for application scalability under load.
How to Avoid:
Implement proper connection pool monitoring in performance tests.
Configure appropriate connection pool settings based on workload.
Use connection leak detection in development and testing.
Include database performance metrics in all load tests.
Test with gradually increasing load to identify bottlenecks early.
No summary provided
What Happened:
During a scheduled 24-hour load test of a new application release, the system initially performed well but began experiencing degraded performance after about 8 hours. Response times gradually increased, and after 14 hours, the application servers began crashing with OutOfMemoryError exceptions. The issue was not detected in shorter load tests or functional testing, and only manifested under sustained load over an extended period.
Diagnosis Steps:
Analyzed heap dumps from the failing application servers.
Monitored memory usage patterns during extended load tests.
Used VisualVM to identify memory growth and object retention.
Reviewed code changes in the recent release.
Performed targeted tests of suspicious components.
Root Cause:
The investigation revealed multiple memory management issues:
1. A connection pooling implementation was not properly closing connections.
2. A caching mechanism lacked eviction policies, causing unbounded growth.
3. Several static collections were being populated without size limits.
4. Thread local variables were not being cleaned up properly.
5. Large objects were being retained in session state unnecessarily.
Fix/Workaround:
• Implemented proper connection handling with explicit cleanup
• Added eviction policies to all caching mechanisms (see the sketch after this list)
• Replaced unbounded collections with size-limited alternatives
• Ensured proper cleanup of thread local variables
• Optimized session state management
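A minimal sketch of the cache eviction and thread-local cleanup fixes above, using only JDK classes; the cache contents and the request handler are hypothetical stand-ins for the real components:
// MemorySafetyExamples.java - bounded LRU cache and ThreadLocal cleanup (hypothetical example)
import java.text.SimpleDateFormat;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

public class MemorySafetyExamples {

    // Size-limited LRU cache: the eldest entry is evicted once MAX_ENTRIES is exceeded,
    // so the cache can no longer grow without bound under sustained load.
    private static final int MAX_ENTRIES = 10_000;
    private static final Map<Long, String> PRODUCT_CACHE = Collections.synchronizedMap(
            new LinkedHashMap<Long, String>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, String> eldest) {
                    return size() > MAX_ENTRIES;
                }
            });

    // ThreadLocal values must be removed when the unit of work finishes; on pooled
    // threads (servlet containers, executors) they otherwise survive across requests.
    private static final ThreadLocal<SimpleDateFormat> DATE_FORMAT =
            ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd"));

    public static String handleRequest(long productId, java.util.Date date) {
        try {
            String cached = PRODUCT_CACHE.computeIfAbsent(productId, id -> "product-" + id);
            return cached + " @ " + DATE_FORMAT.get().format(date);
        } finally {
            DATE_FORMAT.remove(); // prevents retention on the pooled worker thread
        }
    }
}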
Lessons Learned:
Memory management issues often only manifest under sustained load over extended periods.
How to Avoid:
Include extended-duration load tests in performance testing strategy.
Implement memory usage monitoring in all environments.
Use automated memory leak detection tools in development.
Review code for common memory leak patterns during code reviews.
Perform regular profiling of applications under load.
No summary provided
What Happened:
During a load test simulating peak traffic conditions, a web application initially performed well but began experiencing increasing response times after about 10 minutes. Eventually, the application started returning database connection errors to users. The monitoring system showed normal CPU and memory usage on application servers, but database connection counts were at maximum capacity. The issue was particularly concerning because the test was only at 70% of the expected peak load.
Diagnosis Steps:
Analyzed application logs for database connection errors.
Monitored database connection pool metrics during the test.
Examined database server performance and connection statistics.
Reviewed application code for connection handling patterns.
Traced individual requests to identify connection usage patterns.
Root Cause:
The investigation revealed multiple issues with connection management:
1. The application was not properly releasing database connections back to the pool.
2. Some error handling paths failed to close connections in exception scenarios.
3. Long-running transactions were holding connections for extended periods.
4. Connection pool timeout settings were too long, preventing recovery.
5. The application used separate connections for closely related operations.
Fix/Workaround:
• Implemented proper connection handling with try-with-resources
• Fixed error handling paths to ensure connection release
• Optimized transaction boundaries to reduce connection hold times
• Adjusted connection pool settings for better recovery
• Implemented connection usage monitoring and alerting, as sketched below
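A possible shape for that monitoring, assuming HikariCP is the pool implementation; the 80% threshold, polling interval, and alert hook are illustrative choices:
// PoolUsageMonitor.java - periodic connection pool usage check with a simple alert hook (hypothetical example)
import com.zaxxer.hikari.HikariDataSource;
import com.zaxxer.hikari.HikariPoolMXBean;

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PoolUsageMonitor {

    private final HikariDataSource dataSource;
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();

    public PoolUsageMonitor(HikariDataSource dataSource) {
        this.dataSource = dataSource;
    }

    // Polls pool usage every 15 seconds and raises an alert when more than 80% of
    // connections are active or requests are already queueing for a connection.
    public void start() {
        scheduler.scheduleAtFixedRate(() -> {
            HikariPoolMXBean pool = dataSource.getHikariPoolMXBean();
            int active = pool.getActiveConnections();
            int total = pool.getTotalConnections();
            int waiting = pool.getThreadsAwaitingConnection();
            if (waiting > 0 || (total > 0 && active > 0.8 * total)) {
                alert(String.format("Connection pool pressure: active=%d total=%d waiting=%d",
                        active, total, waiting));
            }
        }, 15, 15, TimeUnit.SECONDS);
    }

    private void alert(String message) {
        // Placeholder: a real implementation would page or post to the alerting system.
        System.err.println("[ALERT] " + message);
    }
}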
Lessons Learned:
Database connection management is critical for application scalability under load.
How to Avoid:
Implement proper connection handling patterns in all database operations.
Use connection pool monitoring and alerting in all environments.
Test connection usage patterns under sustained load conditions.
Configure appropriate timeouts and recovery mechanisms.
Review database access patterns to minimize connection usage.
No summary provided
What Happened:
A company implemented a comprehensive load testing strategy using JMeter to validate a new e-commerce platform before launch. The tests showed excellent performance under expected peak loads. However, when the platform was launched, it experienced severe performance degradation and partial outages under real-world traffic. Post-incident analysis revealed that despite efforts to create a production-like test environment, critical configuration differences led to misleading test results.
Diagnosis Steps:
Compared production and test environment configurations in detail.
Analyzed real-world traffic patterns versus test scenarios.
Examined resource allocation and scaling configurations.
Reviewed database configuration and query patterns.
Assessed network topology and latency characteristics.
Root Cause:
The investigation revealed multiple configuration mismatches:
1. The test environment used a different database configuration with more aggressive caching.
2. Network latency between services was significantly lower in the test environment.
3. The test environment lacked the same resource constraints as production.
4. Load balancer configurations differed, affecting connection distribution.
5. Test data did not represent the distribution and complexity of production data.
Fix/Workaround:
• Implemented configuration parity between environments
• Created a comprehensive environment comparison checklist
• Developed automated configuration validation tools (see the sketch after this list)
• Improved test data generation to match production patterns
• Implemented chaos engineering practices to test resilience
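One way such a validation tool might look, sketched as a small utility that diffs a fixed set of performance-relevant keys between two property files; the key list, file arguments, and exit behavior are assumptions:
// ConfigParityChecker.java - compares critical settings between environments and fails the pipeline on mismatch (hypothetical example)
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Objects;
import java.util.Properties;

public class ConfigParityChecker {

    // Keys that materially affect load-test validity; extend as needed.
    private static final List<String> CRITICAL_KEYS = List.of(
            "spring.datasource.hikari.maximum-pool-size",
            "spring.datasource.hikari.connection-timeout",
            "server.tomcat.threads.max",
            "cache.ttl.seconds"
    );

    public static void main(String[] args) throws IOException {
        Properties test = load(args[0]); // e.g. test environment properties file
        Properties prod = load(args[1]); // e.g. production properties file
        boolean mismatch = false;
        for (String key : CRITICAL_KEYS) {
            String testValue = test.getProperty(key);
            String prodValue = prod.getProperty(key);
            if (!Objects.equals(testValue, prodValue)) {
                mismatch = true;
                System.out.printf("MISMATCH %s: test=%s prod=%s%n", key, testValue, prodValue);
            }
        }
        if (mismatch) {
            System.exit(1); // fail the pipeline so environments are reconciled before testing
        }
    }

    private static Properties load(String path) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            props.load(in);
        }
        return props;
    }
}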
Lessons Learned:
Load test environments must accurately reflect production configurations to provide meaningful results.
How to Avoid:
Implement configuration-as-code for all environments to ensure parity.
Create automated validation tools to compare environment configurations.
Use production traffic patterns to inform test scenarios.
Include network latency and resource constraints in test environments.
Regularly validate test data against production data patterns.
No summary provided
What Happened:
A large e-commerce company conducted extensive performance testing of their new checkout flow before a major release. The tests showed excellent response times and throughput, well within the defined SLAs. However, when the system was deployed to production, real users immediately reported slow page loads and timeouts during checkout. The discrepancy between test results and real-world performance threatened a critical sales period and required urgent investigation.
Diagnosis Steps:
Compared test environment configuration with production.
Analyzed network traffic patterns in both environments.
Reviewed performance test scripts and configurations.
Monitored client-side performance metrics in production.
Examined browser developer tools for real user sessions.
Root Cause:
The investigation revealed that the performance tests were not accurately simulating real-world conditions:
1. The performance test tool (JMeter) was running on powerful servers with high-bandwidth connections.
2. Test scripts were not implementing realistic client-side throttling to simulate mobile devices.
3. The tests were not accounting for client-side rendering time, only server response times.
4. Network conditions between test clients and servers were optimal, unlike real-world conditions.
5. The test scripts did not include realistic think time between user actions.
Fix/Workaround:
• Implemented immediate improvements to performance test methodology
• Added client-side throttling to simulate various device capabilities
• Incorporated network condition simulation (latency, packet loss)
• Included client-side rendering metrics in performance evaluation
• Created more realistic user behavior patterns with appropriate think times, as sketched below
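In the same spirit as the data generator earlier in this document, long-tailed think times can be pre-generated into a CSV for JMeter to consume; the log-normal parameters below are assumptions rather than values derived from the production analytics:
// ThinkTimeGenerator.java - writes a think-time CSV for use in JMeter test data (hypothetical example)
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Random;

public class ThinkTimeGenerator {

    public static void main(String[] args) throws IOException {
        Random random = new Random(42);
        try (BufferedWriter writer = Files.newBufferedWriter(Paths.get("think_times.csv"))) {
            writer.write("think_time_ms\n");
            for (int i = 0; i < 10_000; i++) {
                // Log-normal distribution: most think times cluster around a couple of seconds,
                // with an occasional long pause, which resembles typical browsing behaviour.
                double gaussian = random.nextGaussian();
                long thinkTimeMs = Math.round(Math.exp(7.5 + 0.8 * gaussian)); // median ~1.8s
                thinkTimeMs = Math.max(500, Math.min(thinkTimeMs, 60_000));    // clamp to 0.5s-60s
                writer.write(thinkTimeMs + "\n");
            }
        }
        System.out.println("Think times written to think_times.csv");
    }
}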
Lessons Learned:
Performance testing must accurately simulate real-world conditions, including client-side constraints, to provide meaningful results.
How to Avoid:
Implement client-side throttling in performance tests to simulate various devices.
Include network condition simulation to represent real-world scenarios.
Measure end-to-end performance, including client-side rendering time.
Create realistic user behavior patterns with appropriate think times.
Validate test results with real-world monitoring data.