# DevOps Tools and Automation Scenarios
No summary provided
What Happened:
A DevOps team decided to migrate from Jenkins to GitLab CI for better integration with their source code management. After migrating 50% of the pipelines, they started experiencing inconsistent build results, missing artifacts, and failed deployments. The issues were intermittent, making them difficult to diagnose.
Diagnosis Steps:
Compared successful and failed pipeline runs in both systems.
Analyzed pipeline configurations and execution logs.
Reviewed resource utilization on CI runners.
Tested identical code with both CI systems.
Examined network connectivity between CI systems and artifact repositories.
Root Cause:
Multiple issues were identified:
1. The GitLab CI configuration didn't properly handle workspace persistence between stages, causing artifacts to be lost.
2. Environment variables were defined differently between the two systems, leading to inconsistent behavior.
3. The GitLab runners had insufficient resources compared to the Jenkins agents.
4. Some custom Jenkins plugins had no equivalent in GitLab CI, requiring workflow redesign.
5. Secret management was implemented differently, causing authentication failures.
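One way to surface the environment-variable drift described in the root cause is to diff the variable sets extracted from both systems. A minimal stdlib-only Go sketch; the maps and values below are illustrative, not taken from the actual pipelines:

```go
package main

import "fmt"

// diffEnv reports keys whose values differ between, or are missing from,
// two environment definitions (e.g. one from Jenkins, one from GitLab CI).
func diffEnv(jenkins, gitlab map[string]string) []string {
	var diffs []string
	for k, v := range jenkins {
		if gv, ok := gitlab[k]; !ok {
			diffs = append(diffs, fmt.Sprintf("%s: missing in GitLab", k))
		} else if gv != v {
			diffs = append(diffs, fmt.Sprintf("%s: %q != %q", k, v, gv))
		}
	}
	return diffs
}

func main() {
	jenkins := map[string]string{"GRADLE_OPTS": "-Dorg.gradle.daemon=false", "DEPLOY_ENV": "prod"}
	gitlab := map[string]string{"GRADLE_OPTS": "-Dorg.gradle.daemon=true"}
	for _, d := range diffEnv(jenkins, gitlab) {
		fmt.Println(d)
	}
}
```

Running this against variable dumps from both systems before cutover turns an intermittent runtime failure into a deterministic pre-migration report.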
Fix/Workaround:
• Short-term: Fixed the most critical issues in GitLab CI configuration:
# Before: Problematic GitLab CI configuration
stages:
  - build
  - test
  - deploy

build:
  stage: build
  script:
    - ./gradlew build
  artifacts:
    paths:
      - build/libs/*.jar

test:
  stage: test
  script:
    - ./gradlew test

deploy:
  stage: deploy
  script:
    - kubectl apply -f k8s/deployment.yaml
# After: Improved GitLab CI configuration with proper artifact handling and caching
stages:
  - build
  - test
  - deploy

variables:
  GRADLE_OPTS: "-Dorg.gradle.daemon=false -Dorg.gradle.caching=true"
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: ""

.gradle_cache: &gradle_cache
  cache:
    key: ${CI_COMMIT_REF_SLUG}
    paths:
      - .gradle/
      - build/

build:
  stage: build
  image: gradle:7.4-jdk17
  <<: *gradle_cache
  script:
    - gradle build
  artifacts:
    paths:
      - build/libs/*.jar
    expire_in: 1 week

test:
  stage: test
  image: gradle:7.4-jdk17
  <<: *gradle_cache
  dependencies:
    - build
  script:
    - gradle test
  artifacts:
    paths:
      - build/reports/tests/
    reports:
      junit: build/test-results/test/TEST-*.xml
    expire_in: 1 week

deploy:
  stage: deploy
  image: bitnami/kubectl:latest
  dependencies:
    - build
  script:
    - echo "$KUBE_CONFIG" | base64 -d > kubeconfig.yaml
    - export KUBECONFIG=kubeconfig.yaml
    - kubectl apply -f k8s/deployment.yaml
  environment:
    name: production
  only:
    - main
• Long-term: Implemented a comprehensive migration strategy:
// migration_validator.go - Tool to validate CI pipeline migration
package main
import (
"encoding/json"
"fmt"
"io/ioutil"
"log"
"os"
"os/exec"
"path/filepath"
"strings"
"sync"
"time"
"gopkg.in/yaml.v3"
)
type PipelineConfig struct {
Name string `json:"name"`
RepoURL string `json:"repoURL"`
Branch string `json:"branch"`
JenkinsFile string `json:"jenkinsFile,omitempty"`
GitLabCI string `json:"gitlabCI,omitempty"`
Env map[string]string `json:"env"`
Secrets []string `json:"secrets"`
}
type ValidationResult struct {
PipelineName string `json:"pipelineName"`
System string `json:"system"` // Jenkins or GitLab
Success bool `json:"success"`
Duration float64 `json:"duration"` // in seconds
Artifacts []string `json:"artifacts"`
Errors []string `json:"errors"`
}
func main() {
if len(os.Args) < 2 {
fmt.Println("Usage: migration_validator <config_file.json>")
os.Exit(1)
}
configFile := os.Args[1]
pipelines, err := loadPipelineConfigs(configFile)
if err != nil {
log.Fatalf("Failed to load pipeline configs: %v", err)
}
// Create results directory
resultsDir := "migration_results"
os.MkdirAll(resultsDir, 0755)
// Validate each pipeline
var wg sync.WaitGroup
for _, pipeline := range pipelines {
wg.Add(1)
go func(p PipelineConfig) {
defer wg.Done()
validatePipeline(p, resultsDir)
}(pipeline)
}
wg.Wait()
fmt.Println("Migration validation completed. Results saved to", resultsDir)
}
func loadPipelineConfigs(configFile string) ([]PipelineConfig, error) {
data, err := ioutil.ReadFile(configFile)
if err != nil {
return nil, err
}
var configs []PipelineConfig
err = json.Unmarshal(data, &configs)
if err != nil {
return nil, err
}
return configs, nil
}
func validatePipeline(pipeline PipelineConfig, resultsDir string) {
// Clone repository
repoDir := filepath.Join(os.TempDir(), "migration_validator", pipeline.Name)
os.MkdirAll(repoDir, 0755)
defer os.RemoveAll(repoDir)
fmt.Printf("Validating pipeline: %s\n", pipeline.Name)
fmt.Printf("Cloning repository: %s\n", pipeline.RepoURL)
cmd := exec.Command("git", "clone", "--branch", pipeline.Branch, pipeline.RepoURL, repoDir)
if err := cmd.Run(); err != nil {
log.Printf("Failed to clone repository for %s: %v", pipeline.Name, err)
return
}
// Validate Jenkins pipeline
jenkinsResult := validateJenkinsPipeline(pipeline, repoDir)
saveValidationResult(jenkinsResult, filepath.Join(resultsDir, fmt.Sprintf("%s_jenkins.json", pipeline.Name)))
// Validate GitLab CI pipeline
gitlabResult := validateGitLabPipeline(pipeline, repoDir)
saveValidationResult(gitlabResult, filepath.Join(resultsDir, fmt.Sprintf("%s_gitlab.json", pipeline.Name)))
// Compare results
compareResults(jenkinsResult, gitlabResult, filepath.Join(resultsDir, fmt.Sprintf("%s_comparison.json", pipeline.Name)))
}
func validateJenkinsPipeline(pipeline PipelineConfig, repoDir string) ValidationResult {
result := ValidationResult{
PipelineName: pipeline.Name,
System: "Jenkins",
}
jenkinsFile := filepath.Join(repoDir, pipeline.JenkinsFile)
if _, err := os.Stat(jenkinsFile); os.IsNotExist(err) {
result.Errors = append(result.Errors, "Jenkinsfile not found")
return result
}
// Validate Jenkinsfile syntax
cmd := exec.Command("jenkins-cli", "declarative-linter", "--file", jenkinsFile)
output, err := cmd.CombinedOutput()
if err != nil {
result.Errors = append(result.Errors, fmt.Sprintf("Jenkinsfile syntax validation failed: %s", output))
return result
}
// Run Jenkins pipeline
startTime := time.Now()
cmd = exec.Command("jenkins-cli", "build", pipeline.Name, "-f", "-v")
// Inherit the parent environment; assigning cmd.Env from scratch would drop PATH etc.
cmd.Env = append(os.Environ(), mapToEnvSlice(pipeline.Env)...)
output, err = cmd.CombinedOutput()
duration := time.Since(startTime).Seconds()
if err != nil {
result.Success = false
result.Duration = duration
result.Errors = append(result.Errors, fmt.Sprintf("Jenkins pipeline execution failed: %s", output))
return result
}
result.Success = true
result.Duration = duration
// Get artifacts
artifactsDir := filepath.Join(os.TempDir(), "jenkins_artifacts", pipeline.Name)
os.MkdirAll(artifactsDir, 0755)
cmd = exec.Command("jenkins-cli", "copy-artifacts", pipeline.Name, "-d", artifactsDir)
if err := cmd.Run(); err == nil {
filepath.Walk(artifactsDir, func(path string, info os.FileInfo, err error) error {
if err != nil || info.IsDir() {
return err
}
relPath, _ := filepath.Rel(artifactsDir, path)
result.Artifacts = append(result.Artifacts, relPath)
return nil
})
}
return result
}
func validateGitLabPipeline(pipeline PipelineConfig, repoDir string) ValidationResult {
result := ValidationResult{
PipelineName: pipeline.Name,
System: "GitLab",
}
gitlabCIFile := filepath.Join(repoDir, pipeline.GitLabCI)
if _, err := os.Stat(gitlabCIFile); os.IsNotExist(err) {
result.Errors = append(result.Errors, "GitLab CI file not found")
return result
}
// Validate GitLab CI syntax
data, err := ioutil.ReadFile(gitlabCIFile)
if err != nil {
result.Errors = append(result.Errors, fmt.Sprintf("Failed to read GitLab CI file: %v", err))
return result
}
var ciConfig map[string]interface{}
if err := yaml.Unmarshal(data, &ciConfig); err != nil {
result.Errors = append(result.Errors, fmt.Sprintf("GitLab CI file syntax validation failed: %v", err))
return result
}
// Run GitLab CI pipeline
startTime := time.Now()
// gitlab-runner exec runs one named job at a time and takes one --env flag
// per variable; joining all variables into a single argument does not work.
// "build" is assumed here; a fuller version would iterate over all jobs.
args := []string{"exec", "docker", "--docker-privileged"}
for _, e := range mapToEnvSlice(pipeline.Env) {
args = append(args, "--env", e)
}
args = append(args, "build")
fmt.Printf("Running: gitlab-runner %s\n", strings.Join(args, " "))
cmd := exec.Command("gitlab-runner", args...)
cmd.Dir = repoDir
output, err := cmd.CombinedOutput()
duration := time.Since(startTime).Seconds()
if err != nil {
result.Success = false
result.Duration = duration
result.Errors = append(result.Errors, fmt.Sprintf("GitLab CI pipeline execution failed: %s", output))
return result
}
result.Success = true
result.Duration = duration
// Get artifacts
artifactsDir := filepath.Join(repoDir, ".gitlab-ci-local", "artifacts")
if _, err := os.Stat(artifactsDir); err == nil {
filepath.Walk(artifactsDir, func(path string, info os.FileInfo, err error) error {
if err != nil || info.IsDir() {
return err
}
relPath, _ := filepath.Rel(artifactsDir, path)
result.Artifacts = append(result.Artifacts, relPath)
return nil
})
}
return result
}
func saveValidationResult(result ValidationResult, filePath string) {
data, err := json.MarshalIndent(result, "", " ")
if err != nil {
log.Printf("Failed to marshal validation result: %v", err)
return
}
if err := ioutil.WriteFile(filePath, data, 0644); err != nil {
log.Printf("Failed to save validation result: %v", err)
}
}
func compareResults(jenkins, gitlab ValidationResult, filePath string) {
comparison := struct {
PipelineName string `json:"pipelineName"`
JenkinsSuccess bool `json:"jenkinsSuccess"`
GitLabSuccess bool `json:"gitlabSuccess"`
DurationDiff float64 `json:"durationDiff"` // positive means GitLab is slower
ArtifactsMissing []string `json:"artifactsMissing"`
ArtifactsExtra []string `json:"artifactsExtra"`
JenkinsErrors []string `json:"jenkinsErrors"`
GitLabErrors []string `json:"gitlabErrors"`
Compatible bool `json:"compatible"`
Recommendations []string `json:"recommendations"`
Jenkins ValidationResult `json:"jenkins"`
GitLab ValidationResult `json:"gitlab"`
}{
PipelineName: jenkins.PipelineName,
JenkinsSuccess: jenkins.Success,
GitLabSuccess: gitlab.Success,
DurationDiff: gitlab.Duration - jenkins.Duration,
JenkinsErrors: jenkins.Errors,
GitLabErrors: gitlab.Errors,
Jenkins: jenkins,
GitLab: gitlab,
}
// Find missing artifacts
for _, jenkinsArtifact := range jenkins.Artifacts {
found := false
for _, gitlabArtifact := range gitlab.Artifacts {
if jenkinsArtifact == gitlabArtifact {
found = true
break
}
}
if !found {
comparison.ArtifactsMissing = append(comparison.ArtifactsMissing, jenkinsArtifact)
}
}
// Find extra artifacts
for _, gitlabArtifact := range gitlab.Artifacts {
found := false
for _, jenkinsArtifact := range jenkins.Artifacts {
if gitlabArtifact == jenkinsArtifact {
found = true
break
}
}
if !found {
comparison.ArtifactsExtra = append(comparison.ArtifactsExtra, gitlabArtifact)
}
}
// Determine compatibility
comparison.Compatible = gitlab.Success && len(comparison.ArtifactsMissing) == 0
// Generate recommendations
if !gitlab.Success {
comparison.Recommendations = append(comparison.Recommendations, "Fix GitLab CI pipeline configuration to ensure successful execution")
}
if len(comparison.ArtifactsMissing) > 0 {
comparison.Recommendations = append(comparison.Recommendations, "Update GitLab CI configuration to properly capture all artifacts")
}
if comparison.DurationDiff > 30 {
comparison.Recommendations = append(comparison.Recommendations, "Optimize GitLab CI pipeline for better performance")
}
data, err := json.MarshalIndent(comparison, "", " ")
if err != nil {
log.Printf("Failed to marshal comparison result: %v", err)
return
}
if err := ioutil.WriteFile(filePath, data, 0644); err != nil {
log.Printf("Failed to save comparison result: %v", err)
}
}
func mapToEnvSlice(env map[string]string) []string {
var result []string
for k, v := range env {
result = append(result, fmt.Sprintf("%s=%s", k, v))
}
return result
}
• Created a Rust-based CI migration tool:
// ci_migration_tool.rs
use clap::{App, Arg, SubCommand};
use regex::Regex;
use serde::{Deserialize, Serialize};
use std::collections::HashMap;
use std::fs::{self, File};
use std::io::{self, Read, Write};
use std::path::{Path, PathBuf};
use std::process::Command;
#[derive(Debug, Serialize, Deserialize)]
struct JenkinsStage {
name: String,
steps: Vec<String>,
}
#[derive(Debug, Serialize, Deserialize)]
struct JenkinsPipeline {
stages: Vec<JenkinsStage>,
environment: HashMap<String, String>,
post_actions: HashMap<String, Vec<String>>,
}
#[derive(Debug, Serialize, Deserialize)]
struct GitLabJob {
stage: String,
script: Vec<String>,
artifacts: Option<GitLabArtifacts>,
cache: Option<GitLabCache>,
variables: Option<HashMap<String, String>>,
only: Option<Vec<String>>,
except: Option<Vec<String>>,
tags: Option<Vec<String>>,
dependencies: Option<Vec<String>>,
}
#[derive(Debug, Serialize, Deserialize)]
struct GitLabArtifacts {
paths: Vec<String>,
expire_in: Option<String>,
}
#[derive(Debug, Serialize, Deserialize)]
struct GitLabCache {
key: String,
paths: Vec<String>,
}
#[derive(Debug, Serialize, Deserialize)]
struct GitLabCI {
stages: Vec<String>,
variables: HashMap<String, String>,
jobs: HashMap<String, GitLabJob>,
}
fn main() -> io::Result<()> {
let matches = App::new("CI Migration Tool")
.version("1.0")
.author("DevOps Team")
.about("Migrates CI pipelines between different systems")
.subcommand(
SubCommand::with_name("jenkins-to-gitlab")
.about("Converts Jenkins pipeline to GitLab CI")
.arg(
Arg::with_name("input")
.short("i")
.long("input")
.value_name("FILE")
.help("Input Jenkinsfile")
.required(true)
.takes_value(true),
)
.arg(
Arg::with_name("output")
.short("o")
.long("output")
.value_name("FILE")
.help("Output .gitlab-ci.yml file")
.required(true)
.takes_value(true),
),
)
.subcommand(
SubCommand::with_name("validate")
.about("Validates CI pipeline configuration")
.arg(
Arg::with_name("file")
.short("f")
.long("file")
.value_name("FILE")
.help("CI configuration file to validate")
.required(true)
.takes_value(true),
)
.arg(
Arg::with_name("type")
.short("t")
.long("type")
.value_name("TYPE")
.help("CI system type (jenkins or gitlab)")
.required(true)
.takes_value(true),
),
)
.subcommand(
SubCommand::with_name("sync-secrets")
.about("Synchronizes secrets between CI systems")
.arg(
Arg::with_name("source")
.short("s")
.long("source")
.value_name("SOURCE")
.help("Source CI system (jenkins or gitlab)")
.required(true)
.takes_value(true),
)
.arg(
Arg::with_name("target")
.short("t")
.long("target")
.value_name("TARGET")
.help("Target CI system (jenkins or gitlab)")
.required(true)
.takes_value(true),
)
.arg(
Arg::with_name("project")
.short("p")
.long("project")
.value_name("PROJECT")
.help("Project name or ID")
.required(true)
.takes_value(true),
),
)
.get_matches();
if let Some(matches) = matches.subcommand_matches("jenkins-to-gitlab") {
let input_file = matches.value_of("input").unwrap();
let output_file = matches.value_of("output").unwrap();
convert_jenkins_to_gitlab(input_file, output_file)?;
} else if let Some(matches) = matches.subcommand_matches("validate") {
let file = matches.value_of("file").unwrap();
let ci_type = matches.value_of("type").unwrap();
validate_ci_config(file, ci_type)?;
} else if let Some(matches) = matches.subcommand_matches("sync-secrets") {
let source = matches.value_of("source").unwrap();
let target = matches.value_of("target").unwrap();
let project = matches.value_of("project").unwrap();
sync_secrets(source, target, project)?;
} else {
println!("No subcommand specified. Use --help for usage information.");
}
Ok(())
}
fn convert_jenkins_to_gitlab(input_file: &str, output_file: &str) -> io::Result<()> {
println!("Converting Jenkins pipeline to GitLab CI...");
println!("Input: {}", input_file);
println!("Output: {}", output_file);
// Read Jenkinsfile
let mut file = File::open(input_file)?;
let mut content = String::new();
file.read_to_string(&mut content)?;
// Parse Jenkinsfile (simplified parsing for demonstration)
let jenkins_pipeline = parse_jenkinsfile(&content)?;
// Convert to GitLab CI
let gitlab_ci = convert_pipeline(jenkins_pipeline)?;
// Write GitLab CI file
let yaml = serde_yaml::to_string(&gitlab_ci).map_err(|e| {
io::Error::new(
io::ErrorKind::Other,
format!("Failed to serialize GitLab CI YAML: {}", e),
)
})?;
let mut output = File::create(output_file)?;
output.write_all(yaml.as_bytes())?;
println!("Conversion completed successfully!");
Ok(())
}
fn parse_jenkinsfile(content: &str) -> io::Result<JenkinsPipeline> {
// This is a simplified parser for demonstration purposes
// A real implementation would need a proper parser for Jenkinsfile syntax
let mut pipeline = JenkinsPipeline {
stages: Vec::new(),
environment: HashMap::new(),
post_actions: HashMap::new(),
};
// Extract stages
let stage_regex = Regex::new(r#"stage\s*\(\s*['"](.*?)['"]\s*\)\s*\{([\s\S]*?)\}"#).unwrap();
for cap in stage_regex.captures_iter(content) {
let stage_name = cap[1].to_string();
let stage_content = cap[2].to_string();
// Extract steps
let steps_regex = Regex::new(r"steps\s*\{([\s\S]*?)\}").unwrap();
let mut steps = Vec::new();
if let Some(steps_cap) = steps_regex.captures(&stage_content) {
let steps_content = steps_cap[1].to_string();
let step_regex = Regex::new(r#"sh\s*['"](.*?)['"]"#).unwrap();
for step_cap in step_regex.captures_iter(&steps_content) {
steps.push(step_cap[1].to_string());
}
}
pipeline.stages.push(JenkinsStage {
name: stage_name,
steps,
});
}
// Extract environment variables
let env_regex = Regex::new(r"environment\s*\{([\s\S]*?)\}").unwrap();
if let Some(env_cap) = env_regex.captures(content) {
let env_content = env_cap[1].to_string();
let var_regex = Regex::new(r#"(?m)(\w+)\s*=\s*['"]?(.*?)['"]?$"#).unwrap();
for var_cap in var_regex.captures_iter(&env_content) {
pipeline.environment.insert(var_cap[1].to_string(), var_cap[2].to_string());
}
}
// Extract post actions
let post_regex = Regex::new(r"post\s*\{([\s\S]*?)\}").unwrap();
if let Some(post_cap) = post_regex.captures(content) {
let post_content = post_cap[1].to_string();
let condition_regex = Regex::new(r"(\w+)\s*\{([\s\S]*?)\}").unwrap();
for cond_cap in condition_regex.captures_iter(&post_content) {
let condition = cond_cap[1].to_string();
let actions_content = cond_cap[2].to_string();
let action_regex = Regex::new(r#"sh\s*['"](.*?)['"]"#).unwrap();
let mut actions = Vec::new();
for action_cap in action_regex.captures_iter(&actions_content) {
actions.push(action_cap[1].to_string());
}
pipeline.post_actions.insert(condition, actions);
}
}
Ok(pipeline)
}
fn convert_pipeline(jenkins: JenkinsPipeline) -> io::Result<GitLabCI> {
let mut gitlab = GitLabCI {
stages: jenkins.stages.iter().map(|s| s.name.clone()).collect(),
variables: jenkins.environment,
jobs: HashMap::new(),
};
// Convert stages to jobs
for stage in &jenkins.stages {
let job_name = stage.name.to_lowercase().replace(" ", "_");
let mut job = GitLabJob {
stage: stage.name.clone(),
script: stage.steps.clone(),
artifacts: None,
cache: None,
variables: None,
only: Some(vec!["main".to_string(), "master".to_string()]),
except: None,
tags: None,
dependencies: None,
};
// Add artifacts for build stages
if stage.name.to_lowercase().contains("build") {
job.artifacts = Some(GitLabArtifacts {
paths: vec!["build/".to_string(), "dist/".to_string()],
expire_in: Some("1 week".to_string()),
});
}
// Add cache for build and test stages
if stage.name.to_lowercase().contains("build") || stage.name.to_lowercase().contains("test") {
job.cache = Some(GitLabCache {
key: "${CI_COMMIT_REF_SLUG}".to_string(),
paths: vec![".gradle/".to_string(), "node_modules/".to_string()],
});
}
gitlab.jobs.insert(job_name, job);
}
// Add post-action jobs
if let Some(failure_actions) = jenkins.post_actions.get("failure") {
gitlab.jobs.insert(
"notify_failure".to_string(),
GitLabJob {
stage: "notify".to_string(),
script: failure_actions.clone(),
artifacts: None,
cache: None,
variables: None,
only: None,
except: None,
tags: None,
dependencies: None,
},
);
// Add notify stage if it doesn't exist
if !gitlab.stages.contains(&"notify".to_string()) {
gitlab.stages.push("notify".to_string());
}
}
Ok(gitlab)
}
fn validate_ci_config(file: &str, ci_type: &str) -> io::Result<()> {
println!("Validating CI configuration...");
println!("File: {}", file);
println!("Type: {}", ci_type);
match ci_type.to_lowercase().as_str() {
"jenkins" => {
// Validate Jenkinsfile
let output = Command::new("java")
.arg("-jar")
.arg("/usr/share/jenkins/jenkins-cli.jar")
.arg("-s")
.arg("http://localhost:8080")
.arg("declarative-linter")
.arg("--file")
.arg(file)
.output()?;
if output.status.success() {
println!("Jenkins pipeline validation successful!");
} else {
println!("Jenkins pipeline validation failed:");
println!("{}", String::from_utf8_lossy(&output.stderr));
return Err(io::Error::new(
io::ErrorKind::Other,
"Jenkins pipeline validation failed",
));
}
}
"gitlab" => {
// Validate GitLab CI file
let output = Command::new("gitlab-runner")
.arg("lint")
.arg(file)
.output()?;
if output.status.success() {
println!("GitLab CI validation successful!");
} else {
println!("GitLab CI validation failed:");
println!("{}", String::from_utf8_lossy(&output.stderr));
return Err(io::Error::new(
io::ErrorKind::Other,
"GitLab CI validation failed",
));
}
}
_ => {
return Err(io::Error::new(
io::ErrorKind::InvalidInput,
format!("Unsupported CI type: {}", ci_type),
));
}
}
Ok(())
}
fn sync_secrets(source: &str, target: &str, project: &str) -> io::Result<()> {
println!("Synchronizing secrets...");
println!("Source: {}", source);
println!("Target: {}", target);
println!("Project: {}", project);
// This is a simplified implementation for demonstration purposes
// A real implementation would use the Jenkins and GitLab APIs
match (source.to_lowercase().as_str(), target.to_lowercase().as_str()) {
("jenkins", "gitlab") => {
// Get secrets from Jenkins
let output = Command::new("java")
.arg("-jar")
.arg("/usr/share/jenkins/jenkins-cli.jar")
.arg("-s")
.arg("http://localhost:8080")
.arg("list-credentials")
.arg("--format=json")
.output()?;
if !output.status.success() {
return Err(io::Error::new(
io::ErrorKind::Other,
"Failed to get secrets from Jenkins",
));
}
// Parse Jenkins credentials (simplified)
let credentials_json = String::from_utf8_lossy(&output.stdout);
println!("Found Jenkins credentials: {}", credentials_json);
// Set secrets in GitLab (simplified)
println!("Setting secrets in GitLab project: {}", project);
// In a real implementation, this would use the GitLab API to set variables
}
("gitlab", "jenkins") => {
// Get secrets from GitLab
let output = Command::new("gitlab")
.arg("project-variable")
.arg("list")
.arg("--project")
.arg(project)
.output()?;
if !output.status.success() {
return Err(io::Error::new(
io::ErrorKind::Other,
"Failed to get secrets from GitLab",
));
}
// Parse GitLab variables (simplified)
let variables_output = String::from_utf8_lossy(&output.stdout);
println!("Found GitLab variables: {}", variables_output);
// Set secrets in Jenkins (simplified)
println!("Setting secrets in Jenkins");
// In a real implementation, this would use the Jenkins API to set credentials
}
_ => {
return Err(io::Error::new(
io::ErrorKind::InvalidInput,
format!("Unsupported CI systems: {} to {}", source, target),
));
}
}
println!("Secrets synchronized successfully!");
Ok(())
}
• Implemented a comprehensive CI migration plan:
{
"migrationPlan": {
"name": "Jenkins to GitLab CI Migration",
"phases": [
{
"name": "Assessment",
"tasks": [
{
"name": "Inventory Jenkins Jobs",
"description": "Create inventory of all Jenkins jobs and pipelines",
"responsible": "DevOps Team",
"estimatedEffort": "3 days"
},
{
"name": "Analyze Dependencies",
"description": "Identify dependencies between jobs and external systems",
"responsible": "DevOps Team",
"estimatedEffort": "2 days"
},
{
"name": "Identify Custom Plugins",
"description": "List all custom Jenkins plugins in use",
"responsible": "DevOps Team",
"estimatedEffort": "1 day"
},
{
"name": "Resource Requirements",
"description": "Determine GitLab runner resource requirements",
"responsible": "Infrastructure Team",
"estimatedEffort": "2 days"
}
]
},
{
"name": "Preparation",
"tasks": [
{
"name": "Set Up GitLab CI Infrastructure",
"description": "Provision and configure GitLab runners",
"responsible": "Infrastructure Team",
"estimatedEffort": "5 days"
},
{
"name": "Create Migration Tools",
"description": "Develop tools for automated migration and validation",
"responsible": "DevOps Team",
"estimatedEffort": "10 days"
},
{
"name": "Define Migration Strategy",
"description": "Create detailed migration plan with prioritization",
"responsible": "DevOps Team Lead",
"estimatedEffort": "3 days"
},
{
"name": "Training",
"description": "Train development teams on GitLab CI",
"responsible": "DevOps Team",
"estimatedEffort": "5 days"
}
]
},
{
"name": "Pilot Migration",
"tasks": [
{
"name": "Select Pilot Projects",
"description": "Identify 3-5 projects for initial migration",
"responsible": "DevOps Team Lead",
"estimatedEffort": "1 day"
},
{
"name": "Migrate Pilot Projects",
"description": "Convert Jenkins pipelines to GitLab CI for pilot projects",
"responsible": "DevOps Team",
"estimatedEffort": "5 days"
},
{
"name": "Parallel Running",
"description": "Run both Jenkins and GitLab CI in parallel for pilot projects",
"responsible": "DevOps Team",
"estimatedEffort": "10 days"
},
{
"name": "Evaluate Results",
"description": "Compare results and gather feedback",
"responsible": "DevOps Team Lead",
"estimatedEffort": "3 days"
}
]
},
{
"name": "Full Migration",
"tasks": [
{
"name": "Prioritize Remaining Projects",
"description": "Create migration schedule for all projects",
"responsible": "DevOps Team Lead",
"estimatedEffort": "2 days"
},
{
"name": "Batch Migration",
"description": "Migrate projects in batches of 10-15",
"responsible": "DevOps Team",
"estimatedEffort": "20 days"
},
{
"name": "Validation",
"description": "Validate each migrated pipeline",
"responsible": "QA Team",
"estimatedEffort": "15 days"
},
{
"name": "Documentation",
"description": "Update documentation for new CI/CD processes",
"responsible": "Documentation Team",
"estimatedEffort": "10 days"
}
]
},
{
"name": "Decommissioning",
"tasks": [
{
"name": "Final Verification",
"description": "Ensure all pipelines are successfully migrated",
"responsible": "DevOps Team Lead",
"estimatedEffort": "5 days"
},
{
"name": "Jenkins Freeze",
"description": "Freeze Jenkins configuration and set to read-only",
"responsible": "DevOps Team",
"estimatedEffort": "1 day"
},
{
"name": "Monitoring Period",
"description": "Monitor GitLab CI performance for 30 days",
"responsible": "DevOps Team",
"estimatedEffort": "30 days"
},
{
"name": "Jenkins Decommissioning",
"description": "Backup and decommission Jenkins servers",
"responsible": "Infrastructure Team",
"estimatedEffort": "5 days"
}
]
}
],
"risks": [
{
"description": "Custom Jenkins plugins without GitLab equivalents",
"impact": "High",
"mitigation": "Identify alternatives or develop custom GitLab integrations"
},
{
"description": "Complex Jenkins pipelines that are difficult to migrate",
"impact": "Medium",
"mitigation": "Break down complex pipelines into smaller, more manageable components"
},
{
"description": "Team resistance to change",
"impact": "Medium",
"mitigation": "Provide comprehensive training and documentation"
},
{
"description": "Insufficient GitLab runner resources",
"impact": "High",
"mitigation": "Properly size and scale GitLab runner infrastructure"
},
{
"description": "Secret management differences",
"impact": "High",
"mitigation": "Develop secure process for migrating secrets between systems"
}
]
}
}
Lessons Learned:
CI/CD tool migrations require careful planning and validation to ensure consistent behavior.
How to Avoid:
Conduct thorough assessment of current CI/CD workflows before migration.
Create automated validation tools to compare pipeline results between systems.
Run both systems in parallel during migration to catch inconsistencies.
Ensure proper resource allocation for the new CI/CD system.
Implement comprehensive testing of migrated pipelines before cutover.
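The parallel-running step above reduces, per pipeline, to diffing the artifact sets produced by each system. A minimal sketch using only the Go standard library; the file names are illustrative:

```go
package main

import (
	"fmt"
	"sort"
)

// diff returns the elements of a that are absent from b, sorted for stable output.
func diff(a, b []string) []string {
	seen := map[string]bool{}
	for _, x := range b {
		seen[x] = true
	}
	var out []string
	for _, x := range a {
		if !seen[x] {
			out = append(out, x)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	jenkins := []string{"app.jar", "reports/index.html"}
	gitlab := []string{"app.jar"}
	fmt.Println("missing in GitLab:", diff(jenkins, gitlab)) // missing in GitLab: [reports/index.html]
	fmt.Println("extra in GitLab:", diff(gitlab, jenkins))   // extra in GitLab: []
}
```

Running the check in both directions catches artifacts the new system silently drops as well as ones it unexpectedly adds.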
No summary provided
What Happened:
After upgrading Ansible from version 2.9 to 2.12 in a Jenkins pipeline, all deployments began failing with cryptic errors. The issue affected multiple teams and projects, causing significant deployment delays.
Diagnosis Steps:
Examined Jenkins build logs for error patterns.
Compared working and failing pipeline configurations.
Tested Ansible playbooks manually with different versions.
Reviewed Ansible release notes and breaking changes.
Checked for Python dependency conflicts.
Root Cause:
Multiple compatibility issues were identified:
1. Ansible 2.12 introduced breaking changes in module parameter names.
2. Custom Ansible modules were using deprecated APIs.
3. Python dependencies had version conflicts.
4. The Jenkins plugin for Ansible was incompatible with the new version.
5. Ansible collections were not properly installed in the Jenkins environment.
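Root cause #5 (missing collections) can be caught before a rollout by comparing `ansible-galaxy collection list` output against the required set. A hedged Go sketch; the parsing is deliberately simplified and the sample output is illustrative:

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// installedCollections parses the plain-text output of
// `ansible-galaxy collection list` into a name -> version map.
// Collection rows look like "community.general 4.8.0"; header and
// separator rows contain no dot in the first field and are skipped.
func installedCollections(out string) map[string]string {
	installed := map[string]string{}
	sc := bufio.NewScanner(strings.NewReader(out))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 2 && strings.Contains(fields[0], ".") {
			installed[fields[0]] = fields[1]
		}
	}
	return installed
}

// missing returns required collections that are not installed.
func missing(required []string, installed map[string]string) []string {
	var out []string
	for _, r := range required {
		if _, ok := installed[r]; !ok {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	sample := `Collection        Version
----------------- -------
community.general 4.8.0
ansible.posix     1.4.0`
	inst := installedCollections(sample)
	fmt.Println(missing([]string{"community.general", "community.docker"}, inst)) // → [community.docker]
}
```

Wired into the pipeline as a pre-flight stage, this fails fast with a readable list instead of a cryptic module-not-found error mid-deployment.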
Fix/Workaround:
• Short-term: Implemented a version-specific container for Ansible:
# Dockerfile for version-specific Ansible
FROM python:3.9-slim
ARG ANSIBLE_VERSION=2.12.0
RUN apt-get update && \
apt-get install -y --no-install-recommends \
openssh-client \
sshpass \
git \
curl \
ca-certificates \
&& rm -rf /var/lib/apt/lists/*
RUN pip install --no-cache-dir ansible-core==${ANSIBLE_VERSION} \
ansible-lint \
yamllint \
molecule \
docker \
pytest-testinfra \
boto3 \
jmespath
# Install required Ansible collections
RUN ansible-galaxy collection install community.general \
ansible.posix \
community.docker \
community.aws
# Create non-root user
RUN useradd -m ansible
USER ansible
WORKDIR /home/ansible
# Set up SSH config for better handling of host keys
RUN mkdir -p /home/ansible/.ssh && \
    printf 'Host *\n\tStrictHostKeyChecking no\n\tUserKnownHostsFile /dev/null\n' > /home/ansible/.ssh/config && \
    chmod 600 /home/ansible/.ssh/config
ENTRYPOINT ["ansible-playbook"]
• Updated Jenkins pipeline to use the containerized Ansible:
// Jenkinsfile with containerized Ansible
pipeline {
agent {
kubernetes {
yaml """
apiVersion: v1
kind: Pod
spec:
  containers:
  - name: ansible
    image: company-registry/ansible:2.12.0
    command:
    - cat
    tty: true
    volumeMounts:
    - name: ssh-keys
      mountPath: /home/ansible/.ssh/id_rsa
      subPath: id_rsa
      readOnly: true
  volumes:
  - name: ssh-keys
    secret:
      secretName: ansible-ssh-keys
      defaultMode: 0600
"""
}
}
environment {
ANSIBLE_CONFIG = "${WORKSPACE}/ansible.cfg"
}
stages {
stage('Checkout') {
steps {
checkout scm
}
}
stage('Lint') {
steps {
container('ansible') {
sh 'ansible-lint playbooks/'
}
}
}
stage('Deploy') {
steps {
container('ansible') {
withCredentials([string(credentialsId: 'vault-password', variable: 'VAULT_PASSWORD')]) {
sh '''
echo "$VAULT_PASSWORD" > .vault_password
ansible-playbook -i inventory/production playbooks/deploy.yml --vault-password-file .vault_password
rm -f .vault_password
'''
}
}
}
}
}
post {
always {
cleanWs()
}
}
}
• Long-term: Implemented a comprehensive automation tool versioning strategy:
# automation-tools.yaml - Tool versioning configuration
version: '1'
tools:
  ansible:
    current_version: '2.12.0'
    previous_versions:
      - version: '2.9.27'
        support_until: '2023-06-30'
        container_image: 'company-registry/ansible:2.9.27'
      - version: '2.10.17'
        support_until: '2023-12-31'
        container_image: 'company-registry/ansible:2.10.17'
    next_version: '2.13.0'
    next_version_testing_date: '2023-04-15'
    dependencies:
      - name: 'python'
        version: '>=3.8,<3.11'
      - name: 'ansible-lint'
        version: '>=6.0.0,<7.0.0'
    collections:
      - name: 'community.general'
        version: '>=4.0.0'
      - name: 'ansible.posix'
        version: '>=1.3.0'
    plugins:
      - name: 'jenkins-ansible-plugin'
        compatible_versions: '>=2.12.0,<3.0.0'
  terraform:
    current_version: '1.2.9'
    previous_versions:
      - version: '1.1.9'
        support_until: '2023-09-30'
        container_image: 'company-registry/terraform:1.1.9'
      - version: '1.0.11'
        support_until: '2023-06-30'
        container_image: 'company-registry/terraform:1.0.11'
    next_version: '1.3.0'
    next_version_testing_date: '2023-05-01'
  jenkins:
    current_version: '2.346.3'
    previous_versions:
      - version: '2.332.3'
        support_until: '2023-08-31'
    next_version: '2.361.1'
    next_version_testing_date: '2023-06-01'
    plugins:
      - name: 'kubernetes'
        version: '3.12.0'
      - name: 'pipeline'
        version: '2.6'
      - name: 'git'
        version: '4.11.3'
environments:
  development:
    tool_versions:
      ansible: 'current_version'
      terraform: 'current_version'
      jenkins: 'current_version'
  staging:
    tool_versions:
      ansible: 'current_version'
      terraform: 'current_version'
      jenkins: 'current_version'
  production:
    tool_versions:
      ansible: 'current_version'
      terraform: 'current_version'
      jenkins: 'current_version'
  legacy:
    tool_versions:
      ansible: '2.9.27'
      terraform: '1.0.11'
      jenkins: '2.332.3'
upgrade_policy:
  testing_period_days: 30
  rollback_plan_required: true
  approval_required: true
  approvers:
    - 'infrastructure-team'
    - 'security-team'
  notification_channels:
    - '#devops-releases'
    - 'devops-team@company.com'
• Created a tool version compatibility testing framework:
// tool_version_tester.go
package main
import (
"context"
"encoding/json"
"fmt"
"io/ioutil"
"log"
"os"
"os/exec"
"path/filepath"
"strings"
"time"
"gopkg.in/yaml.v3"
)
// ToolConfig represents the configuration for a specific tool
type ToolConfig struct {
CurrentVersion string `yaml:"current_version"`
PreviousVersions []struct {
Version string `yaml:"version"`
SupportUntil string `yaml:"support_until"`
Image string `yaml:"container_image"`
} `yaml:"previous_versions"`
NextVersion string `yaml:"next_version"`
NextVersionTestDate string `yaml:"next_version_testing_date"`
Dependencies []struct {
Name string `yaml:"name"`
Version string `yaml:"version"`
} `yaml:"dependencies"`
Collections []struct {
Name string `yaml:"name"`
Version string `yaml:"version"`
} `yaml:"collections"`
Plugins []struct {
Name string `yaml:"name"`
CompatibleVersions string `yaml:"compatible_versions"`
} `yaml:"plugins"`
}
// ToolsConfig represents the configuration for all tools
type ToolsConfig struct {
Version string `yaml:"version"`
Tools map[string]ToolConfig `yaml:"tools"`
Environments map[string]struct {
ToolVersions map[string]string `yaml:"tool_versions"`
} `yaml:"environments"`
UpgradePolicy struct {
TestingPeriodDays int `yaml:"testing_period_days"`
RollbackPlanRequired bool `yaml:"rollback_plan_required"`
ApprovalRequired bool `yaml:"approval_required"`
Approvers []string `yaml:"approvers"`
NotificationChannels []string `yaml:"notification_channels"`
} `yaml:"upgrade_policy"`
}
// TestCase represents a test case for a tool version
type TestCase struct {
Name string `json:"name"`
Description string `json:"description"`
Tool string `json:"tool"`
Version string `json:"version"`
Commands []string `json:"commands"`
Artifacts []string `json:"artifacts"`
Timeout string `json:"timeout"`
}
// TestResult represents the result of a test case
type TestResult struct {
TestCase TestCase `json:"test_case"`
Success bool `json:"success"`
Output string `json:"output"`
Error string `json:"error"`
Duration string `json:"duration"`
StartTime string `json:"start_time"`
EndTime string `json:"end_time"`
Environment string `json:"environment"`
}
// VersionCompatibilityTest represents a version compatibility test
type VersionCompatibilityTest struct {
config ToolsConfig
testCases []TestCase
results []TestResult
baseDir string
}
// NewVersionCompatibilityTest creates a new version compatibility test
func NewVersionCompatibilityTest(configPath, testCasesPath, baseDir string) (*VersionCompatibilityTest, error) {
// Read and parse config file
configData, err := ioutil.ReadFile(configPath)
if err != nil {
return nil, fmt.Errorf("failed to read config file: %w", err)
}
var config ToolsConfig
if err := yaml.Unmarshal(configData, &config); err != nil {
return nil, fmt.Errorf("failed to parse config file: %w", err)
}
// Read and parse test cases file
testCasesData, err := ioutil.ReadFile(testCasesPath)
if err != nil {
return nil, fmt.Errorf("failed to read test cases file: %w", err)
}
var testCases []TestCase
if err := json.Unmarshal(testCasesData, &testCases); err != nil {
return nil, fmt.Errorf("failed to parse test cases file: %w", err)
}
return &VersionCompatibilityTest{
config: config,
testCases: testCases,
results: []TestResult{},
baseDir: baseDir,
}, nil
}
// RunTests runs all test cases
func (t *VersionCompatibilityTest) RunTests(environment string) error {
log.Printf("Running version compatibility tests for environment: %s", environment)
// Create results directory
resultsDir := filepath.Join(t.baseDir, "results", time.Now().Format("20060102-150405"))
if err := os.MkdirAll(resultsDir, 0755); err != nil {
return fmt.Errorf("failed to create results directory: %w", err)
}
// Get tool versions for the environment
envConfig, ok := t.config.Environments[environment]
if !ok {
return fmt.Errorf("environment %s not found in config", environment)
}
// Run tests for each tool
for _, testCase := range t.testCases {
// Skip test if tool version doesn't match environment
toolVersion, ok := envConfig.ToolVersions[testCase.Tool]
if !ok {
log.Printf("Skipping test %s: tool %s not configured for environment %s", testCase.Name, testCase.Tool, environment)
continue
}
// Resolve version if it's a reference like 'current_version'
if toolVersion == "current_version" {
toolVersion = t.config.Tools[testCase.Tool].CurrentVersion
}
// Skip test if version doesn't match
if testCase.Version != "any" && testCase.Version != toolVersion {
log.Printf("Skipping test %s: version %s doesn't match environment version %s", testCase.Name, testCase.Version, toolVersion)
continue
}
log.Printf("Running test: %s for %s version %s", testCase.Name, testCase.Tool, toolVersion)
result := t.runTestCase(testCase, toolVersion, environment, resultsDir)
t.results = append(t.results, result)
// Write result to file
resultPath := filepath.Join(resultsDir, fmt.Sprintf("%s_%s.json", testCase.Tool, testCase.Name))
resultData, err := json.MarshalIndent(result, "", " ")
if err != nil {
log.Printf("Failed to marshal test result: %v", err)
continue
}
if err := ioutil.WriteFile(resultPath, resultData, 0644); err != nil {
log.Printf("Failed to write test result: %v", err)
}
}
// Write summary to file
summaryPath := filepath.Join(resultsDir, "summary.json")
summaryData, err := json.MarshalIndent(t.results, "", " ")
if err != nil {
return fmt.Errorf("failed to marshal test results: %w", err)
}
if err := ioutil.WriteFile(summaryPath, summaryData, 0644); err != nil {
return fmt.Errorf("failed to write test results: %w", err)
}
return nil
}
// runTestCase runs a single test case
func (t *VersionCompatibilityTest) runTestCase(testCase TestCase, version, environment, resultsDir string) TestResult {
startTime := time.Now()
result := TestResult{
TestCase: testCase,
Success: false,
Output: "",
Error: "",
Environment: environment,
StartTime: startTime.Format(time.RFC3339),
}
// finish stamps the end time and elapsed duration before returning
finish := func() TestResult {
result.EndTime = time.Now().Format(time.RFC3339)
result.Duration = time.Since(startTime).String()
return result
}
// Create test directory
testDir := filepath.Join(t.baseDir, "tests", testCase.Tool, testCase.Name)
if err := os.MkdirAll(testDir, 0755); err != nil {
result.Error = fmt.Sprintf("Failed to create test directory: %v", err)
return finish()
}
// Parse timeout
timeout := 5 * time.Minute
if testCase.Timeout != "" {
var err error
timeout, err = time.ParseDuration(testCase.Timeout)
if err != nil {
result.Error = fmt.Sprintf("Failed to parse timeout: %v", err)
return finish()
}
}
// Create context with timeout
ctx, cancel := context.WithTimeout(context.Background(), timeout)
defer cancel()
// Run commands
var outputBuilder strings.Builder
for _, command := range testCase.Commands {
// Replace variables in command
command = strings.ReplaceAll(command, "${VERSION}", version)
command = strings.ReplaceAll(command, "${TOOL}", testCase.Tool)
command = strings.ReplaceAll(command, "${TEST_DIR}", testDir)
command = strings.ReplaceAll(command, "${RESULTS_DIR}", resultsDir)
// Run the command through a shell so quoted arguments and pipes survive
// (naively splitting on spaces breaks commands with quoted values)
cmd := exec.CommandContext(ctx, "sh", "-c", command)
cmd.Dir = testDir
// Capture output
output, err := cmd.CombinedOutput()
outputBuilder.Write(output)
if err != nil {
result.Error = fmt.Sprintf("Command failed: %v\nOutput: %s", err, output)
result.Output = outputBuilder.String()
result.EndTime = time.Now().Format(time.RFC3339)
result.Duration = time.Since(startTime).String()
return result
}
}
// Check for artifacts
for _, artifact := range testCase.Artifacts {
artifactPath := filepath.Join(testDir, artifact)
if _, err := os.Stat(artifactPath); os.IsNotExist(err) {
result.Error = fmt.Sprintf("Artifact not found: %s", artifactPath)
result.Output = outputBuilder.String()
result.EndTime = time.Now().Format(time.RFC3339)
result.Duration = time.Since(startTime).String()
return result
}
}
result.Success = true
result.Output = outputBuilder.String()
result.EndTime = time.Now().Format(time.RFC3339)
result.Duration = time.Since(startTime).String()
return result
}
func main() {
if len(os.Args) < 4 {
log.Fatalf("Usage: %s <config_path> <test_cases_path> <base_dir> [environment]", os.Args[0])
}
configPath := os.Args[1]
testCasesPath := os.Args[2]
baseDir := os.Args[3]
environment := "development"
if len(os.Args) > 4 {
environment = os.Args[4]
}
test, err := NewVersionCompatibilityTest(configPath, testCasesPath, baseDir)
if err != nil {
log.Fatalf("Failed to create version compatibility test: %v", err)
}
if err := test.RunTests(environment); err != nil {
log.Fatalf("Failed to run tests: %v", err)
}
log.Println("Tests completed successfully")
}
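For reference, a minimal test-cases file consumed by the tester above might look like the following. The command, playbook name, and timeout values are hypothetical; the field names match the JSON tags on the TestCase struct, and ${TEST_DIR} is substituted at run time.
```json
[
{
"name": "syntax_check",
"description": "Verify that the deployment playbook passes a syntax check",
"tool": "ansible",
"version": "any",
"commands": [
"ansible-playbook --syntax-check ${TEST_DIR}/site.yml"
],
"artifacts": [],
"timeout": "2m"
}
]
```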
Lessons Learned:
Proper version management of automation tools is critical for stable CI/CD pipelines.
How to Avoid:
Implement containerized automation tools with specific versions.
Test tool upgrades thoroughly before applying them to production pipelines.
Maintain a version compatibility matrix for all automation tools.
Document breaking changes and migration paths for tool upgrades.
Use infrastructure as code to manage tool versions consistently.
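The compatibility matrix above expresses dependency ranges such as ">=3.8,<3.11". A minimal sketch of how such a constraint can be evaluated follows; the function names are illustrative, and for production use packaging.specifiers.SpecifierSet is the robust choice.

```python
# Evaluate a range constraint like ">=3.8,<3.11" against a plain
# dotted version string. Assumes purely numeric versions.
def parse_version(v):
    return tuple(int(p) for p in v.split("."))

def satisfies(version, constraint):
    # Two-character operators listed first so ">=" is matched before ">"
    ops = {
        ">=": lambda a, b: a >= b,
        "<=": lambda a, b: a <= b,
        "==": lambda a, b: a == b,
        ">": lambda a, b: a > b,
        "<": lambda a, b: a < b,
    }
    for clause in constraint.split(","):
        clause = clause.strip()
        for op, check in ops.items():
            if clause.startswith(op):
                if not check(parse_version(version), parse_version(clause[len(op):])):
                    return False
                break
    return True
```

For example, satisfies("3.10", ">=3.8,<3.11") holds while satisfies("3.11", ">=3.8,<3.11") does not, because versions compare as numeric tuples rather than strings.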
No summary provided
What Happened:
A large financial services company used Jenkins for CI/CD and Ansible for configuration management. During a major platform upgrade, the deployment pipeline that had been working reliably for months suddenly began failing with cryptic Python errors. The failures occurred only in production deployments, while development and staging deployments continued to work normally. The issue caused significant delays in releasing critical security patches, creating business risk.
Diagnosis Steps:
Analyzed Jenkins build logs for error patterns.
Compared successful and failing deployment environments.
Examined Ansible playbook execution in detail.
Checked Python dependencies and versions across environments.
Reviewed recent changes to the automation infrastructure.
Root Cause:
The investigation revealed multiple version incompatibility issues: 1. The production Jenkins agent was running Python 3.8 while development and staging used Python 3.10. 2. A recent Ansible update (2.12 to 2.13) had been applied only to development and staging environments. 3. The deployment playbooks used features only available in Ansible 2.13. 4. A Python dependency (cryptography) had different versions across environments. 5. The Jenkins pipeline didn't explicitly specify or validate tool versions.
Fix/Workaround:
• Implemented immediate fixes to restore deployments
• Temporarily downgraded Ansible playbooks to be compatible with version 2.12
• Created a standardized environment specification for all Jenkins agents
• Implemented version pinning for all Python dependencies
• Added environment validation steps to the pipeline
# Standardized Jenkins Agent Environment Configuration
# File: jenkins-agent-environment.yaml
version: '3'
services:
jenkins-agent:
# NOTE: pin a specific agent tag or digest here; ':latest' contradicts the version-pinning policy and is shown only as a placeholder
image: jenkins/agent:latest
container_name: jenkins-agent
restart: unless-stopped
environment:
- JENKINS_URL=https://jenkins.example.com
- JENKINS_AGENT_NAME=production-agent-${AGENT_NUMBER}
- JENKINS_SECRET=${JENKINS_SECRET}
- JENKINS_AGENT_WORKDIR=/home/jenkins/agent
volumes:
- jenkins-agent-data:/home/jenkins/agent
- /var/run/docker.sock:/var/run/docker.sock
entrypoint: ["jenkins-agent"]
# Sidecar container with standardized tooling
jenkins-tools:
image: custom/jenkins-tools:1.0.0
container_name: jenkins-tools
restart: unless-stopped
volumes:
- jenkins-agent-data:/home/jenkins/agent
- /var/run/docker.sock:/var/run/docker.sock
command: ["tail", "-f", "/dev/null"] # Keep container running
volumes:
jenkins-agent-data:
# Standardized Jenkins Tools Container
# File: Dockerfile.jenkins-tools
FROM ubuntu:22.04
# Set non-interactive installation
ENV DEBIAN_FRONTEND=noninteractive
# Install basic tools
RUN apt-get update && apt-get install -y \
curl \
git \
jq \
unzip \
wget \
gnupg \
software-properties-common \
apt-transport-https \
ca-certificates \
lsb-release \
python3-pip \
python3-venv \
&& rm -rf /var/lib/apt/lists/*
# Set Python version explicitly (Ubuntu 22.04 already ships Python 3.10;
# the deadsnakes PPA is only needed when targeting a different version)
RUN apt-get update && \
apt-get install -y python3.10 python3.10-venv python3.10-dev && \
rm -rf /var/lib/apt/lists/* && \
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.10 1 && \
update-alternatives --set python3 /usr/bin/python3.10
# Install a specific version of Ansible. Note: the PyPI "ansible" package
# has no 2.13.x release (for ansible-core 2.13 it is the 6.x series), so
# pin ansible-core directly.
RUN python3 -m pip install --upgrade pip && \
python3 -m pip install ansible-core==2.13.3
# Install specific versions of Python dependencies
COPY requirements.txt /tmp/requirements.txt
RUN python3 -m pip install -r /tmp/requirements.txt
# Install Terraform
ARG TERRAFORM_VERSION=1.3.7
RUN wget -O- https://apt.releases.hashicorp.com/gpg | gpg --dearmor | tee /usr/share/keyrings/hashicorp-archive-keyring.gpg && \
echo "deb [signed-by=/usr/share/keyrings/hashicorp-archive-keyring.gpg] https://apt.releases.hashicorp.com $(lsb_release -cs) main" | tee /etc/apt/sources.list.d/hashicorp.list && \
apt-get update && apt-get install -y terraform=${TERRAFORM_VERSION} && \
rm -rf /var/lib/apt/lists/*
# Install AWS CLI
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64-2.9.15.zip" -o "awscliv2.zip" && \
unzip awscliv2.zip && \
./aws/install && \
rm -rf aws awscliv2.zip
# Install kubectl
ARG KUBECTL_VERSION=1.25.5
RUN curl -LO "https://dl.k8s.io/release/v${KUBECTL_VERSION}/bin/linux/amd64/kubectl" && \
chmod +x kubectl && \
mv kubectl /usr/local/bin/
# Create version manifest
RUN mkdir -p /opt/versions && \
echo "Python: $(python3 --version)" > /opt/versions/manifest.txt && \
echo "Ansible: $(ansible --version | head -n1)" >> /opt/versions/manifest.txt && \
echo "Terraform: $(terraform version | head -n1)" >> /opt/versions/manifest.txt && \
echo "AWS CLI: $(aws --version)" >> /opt/versions/manifest.txt && \
echo "kubectl: $(kubectl version --client=true --output=json | jq -r '.clientVersion.gitVersion')" >> /opt/versions/manifest.txt
# Create a non-root user
RUN useradd -m -s /bin/bash jenkins && \
mkdir -p /home/jenkins/.ssh && \
chown -R jenkins:jenkins /home/jenkins
USER jenkins
WORKDIR /home/jenkins
# Add version verification script
COPY --chown=jenkins:jenkins verify-environment.sh /home/jenkins/verify-environment.sh
RUN chmod +x /home/jenkins/verify-environment.sh
CMD ["/bin/bash"]
#!/bin/bash
# File: verify-environment.sh
# Purpose: Verify that the environment meets the required specifications
set -e
# Define required versions
REQUIRED_PYTHON_VERSION="3.10"
REQUIRED_ANSIBLE_VERSION="2.13.3"
REQUIRED_TERRAFORM_VERSION="1.3.7"
REQUIRED_KUBECTL_VERSION="1.25"
# Check Python version
PYTHON_VERSION=$(python3 -c 'import sys; print(".".join(map(str, sys.version_info[:2])))')
echo "Python version: $PYTHON_VERSION (required: $REQUIRED_PYTHON_VERSION)"
if [[ "$PYTHON_VERSION" != "$REQUIRED_PYTHON_VERSION" ]]; then
echo "ERROR: Python version mismatch"
exit 1
fi
# Check Ansible version
# The banner format differs across releases ("ansible 2.9.27" vs "ansible [core 2.13.3]"), so extract the version number directly
ANSIBLE_VERSION=$(ansible --version | head -n1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n1)
echo "Ansible version: $ANSIBLE_VERSION (required: $REQUIRED_ANSIBLE_VERSION)"
if [[ "$ANSIBLE_VERSION" != "$REQUIRED_ANSIBLE_VERSION" ]]; then
echo "ERROR: Ansible version mismatch"
exit 1
fi
# Check Terraform version
TERRAFORM_VERSION=$(terraform version | head -n1 | awk '{print $2}' | sed 's/v//')
echo "Terraform version: $TERRAFORM_VERSION (required: $REQUIRED_TERRAFORM_VERSION)"
if [[ "$TERRAFORM_VERSION" != "$REQUIRED_TERRAFORM_VERSION" ]]; then
echo "ERROR: Terraform version mismatch"
exit 1
fi
# Check kubectl version
KUBECTL_VERSION=$(kubectl version --client=true --output=json | jq -r '.clientVersion.gitVersion' | sed 's/v//' | cut -d. -f1,2)
echo "kubectl version: $KUBECTL_VERSION (required: $REQUIRED_KUBECTL_VERSION)"
if [[ "$KUBECTL_VERSION" != "$REQUIRED_KUBECTL_VERSION" ]]; then
echo "ERROR: kubectl version mismatch"
exit 1
fi
# Check Python dependencies
echo "Checking Python dependencies..."
python3 -m pip freeze > /tmp/installed_packages.txt
while IFS= read -r line; do
package=$(echo "$line" | cut -d'=' -f1)
version=$(echo "$line" | cut -d'=' -f3-)
# Get the required version from requirements.txt
required_version=$(grep "^$package==" /tmp/requirements.txt | cut -d'=' -f3-)
if [[ -n "$required_version" && "$version" != "$required_version" ]]; then
echo "ERROR: Package $package version mismatch. Found $version, required $required_version"
exit 1
fi
done < /tmp/installed_packages.txt
echo "Environment verification successful!"
exit 0
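Parsing pip freeze output with cut is workable but fragile. A sketch of the same dependency check in Python, using importlib.metadata to query installed versions directly; the default filename is an assumption, and only pinned "name==version" lines are checked.

```python
# Compare installed package versions against pinned requirements.
# Lines that are blank, comments, or non-pinned specifiers are skipped.
from importlib.metadata import PackageNotFoundError, version

def check_requirements(path="requirements.txt"):
    """Return a list of (package, required, installed) mismatches."""
    mismatches = []
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "==" not in line:
                continue
            name, _, required = line.partition("==")
            try:
                installed = version(name)
            except PackageNotFoundError:
                mismatches.append((name, required, "not installed"))
                continue
            if installed != required:
                mismatches.append((name, required, installed))
    return mismatches
```

An empty return value means every pinned dependency matches; anything else can fail the pipeline with a precise report of what drifted.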
// Jenkins Pipeline with Environment Validation
// File: Jenkinsfile
pipeline {
agent {
label 'production-agent'
}
environment {
ANSIBLE_VERSION = '2.13.3'
PYTHON_VERSION = '3.10'
}
stages {
stage('Validate Environment') {
steps {
sh '''#!/bin/bash
# Verify tool versions
echo "Validating environment..."
# Check Python version
PYTHON_VERSION_ACTUAL=$(python3 -c 'import sys; print(".".join(map(str, sys.version_info[:2])))')
echo "Python version: $PYTHON_VERSION_ACTUAL (required: $PYTHON_VERSION)"
if [[ "$PYTHON_VERSION_ACTUAL" != "$PYTHON_VERSION" ]]; then
echo "ERROR: Python version mismatch"
exit 1
fi
# Check Ansible version
# Extract the version number; the banner reads "ansible [core 2.13.3]" on ansible-core 2.x
ANSIBLE_VERSION_ACTUAL=$(ansible --version | head -n1 | grep -oE '[0-9]+(\.[0-9]+)+' | head -n1)
echo "Ansible version: $ANSIBLE_VERSION_ACTUAL (required: $ANSIBLE_VERSION)"
if [[ "$ANSIBLE_VERSION_ACTUAL" != "$ANSIBLE_VERSION" ]]; then
echo "ERROR: Ansible version mismatch"
exit 1
fi
# Install pinned Python dependencies so later stages see exact versions
if [ -f "requirements.txt" ]; then
echo "Installing pinned Python dependencies..."
python3 -m pip install -r requirements.txt
fi
echo "Environment validation successful!"
'''
}
}
stage('Checkout') {
steps {
checkout scm
}
}
stage('Setup Virtual Environment') {
steps {
sh '''
# Create and activate virtual environment with specific Python version
python3 -m venv .venv
. .venv/bin/activate
# Install dependencies with pinned versions
python -m pip install -r requirements.txt
'''
}
}
stage('Run Ansible Playbook') {
steps {
withCredentials([sshUserPrivateKey(credentialsId: 'ansible-ssh-key', keyFileVariable: 'SSH_KEY')]) {
sh '''
# Activate virtual environment
. .venv/bin/activate
# Set up Ansible environment
export ANSIBLE_HOST_KEY_CHECKING=False
export ANSIBLE_PRIVATE_KEY_FILE=$SSH_KEY
# Run playbook with explicit version check
ansible --version
ansible-playbook -i inventory/production deploy.yml
'''
}
}
}
}
post {
always {
cleanWs()
}
failure {
script {
// Send detailed failure notification with environment information
def pythonVersion = sh(script: 'python3 --version', returnStdout: true).trim()
def ansibleVersion = sh(script: 'ansible --version | head -n1', returnStdout: true).trim()
mail(
to: 'devops-team@example.com',
subject: "Failed Pipeline: ${currentBuild.fullDisplayName}",
body: """
Pipeline failure in ${env.JOB_NAME} #${env.BUILD_NUMBER}:
Environment Information:
- Python: ${pythonVersion}
- Ansible: ${ansibleVersion}
- Agent: ${env.NODE_NAME}
Check the build log for details: ${env.BUILD_URL}
"""
)
}
}
}
}
# requirements.txt
# Pinned dependencies for deployment pipeline
# (the PyPI "ansible" package has no 2.13.x release; pin ansible-core)
ansible-core==2.13.3
cryptography==38.0.4
jinja2==3.1.2
markupsafe==2.1.1
pyyaml==6.0
boto3==1.26.27
botocore==1.29.27
netaddr==0.8.0
paramiko==2.12.0
requests==2.28.1
urllib3==1.26.13
Lessons Learned:
Automation tool version consistency is critical for reliable deployments across environments.
How to Avoid:
Containerize automation environments to ensure consistency.
Explicitly pin all dependency versions in requirements files.
Add environment validation steps to all pipelines.
Use infrastructure as code to define and version agent environments.
Implement automated testing of deployment pipelines across all environments.
No summary provided
What Happened:
A large retail company was deploying a major update to their e-commerce platform. The deployment pipeline used Jenkins for CI/CD and Ansible for configuration management. The deployment passed all tests in development and staging environments but failed in production. Investigation revealed that the production Jenkins server was running a newer version of Ansible than the development and staging environments. This version difference caused subtle changes in how certain Ansible modules behaved, resulting in configuration errors that only manifested in production.
Diagnosis Steps:
Analyzed Jenkins build logs from all environments.
Compared Ansible versions across environments.
Reviewed Ansible playbook execution in detail.
Tested the playbooks with different Ansible versions.
Examined the Ansible module documentation for version-specific changes.
Root Cause:
The investigation revealed multiple issues with the automation tooling: 1. Ansible versions were not consistently managed across environments. 2. The production Jenkins server had been upgraded without corresponding updates to other environments. 3. The Ansible playbooks used features that had behavior changes between versions. 4. There was no version pinning or compatibility testing in the pipeline. 5. The documentation for environment configurations was outdated.
Fix/Workaround:
• Implemented immediate fix to restore service
• Standardized Ansible versions across all environments
• Implemented version pinning for all automation tools
• Created a compatibility testing stage in the pipeline
• Updated documentation for environment configurations
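A compatibility-testing stage needs a reliable way to read the runner's actual Ansible version, and the version banner changed format between releases ("ansible 2.9.27" vs "ansible [core 2.13.3]"). A minimal, hedged sketch of parsing that banner; the function name is illustrative.

```python
import re

def parse_ansible_version(banner):
    """Extract the version from the first line of `ansible --version` output.
    Handles both 'ansible 2.9.27' and 'ansible [core 2.13.3]' banner styles."""
    first = banner.splitlines()[0]
    m = re.search(r"\[core ([0-9.]+)\]", first) or re.match(r"ansible ([0-9.]+)", first)
    return m.group(1) if m else None
```

The pipeline can then compare the parsed version against the pinned one and fail fast on a mismatch instead of surfacing cryptic module errors mid-deployment.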
Lessons Learned:
Automation tool versions must be consistently managed across environments to ensure predictable behavior.
How to Avoid:
Implement version pinning for all automation tools.
Use containerized automation tools to ensure consistency.
Include version compatibility testing in CI/CD pipelines.
Document and enforce version management policies.
Coordinate tool upgrades across all environments.
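Coordinating upgrades is easier when drift is caught mechanically. A sketch of a drift report over a simple mapping of environment to tool versions; the input shape is illustrative and could be populated from an inventory or the versioning config above.

```python
# Report tools whose versions differ across environments.
# Input shape (hypothetical): {environment: {tool: version}}
def find_version_drift(envs):
    """Return {tool: {environment: version}} for every drifted tool."""
    drift = {}
    all_tools = set()
    for tool_versions in envs.values():
        all_tools.update(tool_versions)
    for tool in sorted(all_tools):
        versions = {env: tools.get(tool) for env, tools in envs.items()}
        # More than one distinct version (including None for "missing") means drift
        if len(set(versions.values())) > 1:
            drift[tool] = versions
    return drift
```

Running such a check on a schedule, or as a pipeline gate, turns the "production was quietly upgraded" failure mode into an explicit, reviewable report.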