DevOps on Google Cloud Platform: Automating your build delivery on GCP in a nutshell
Introduction
Nowadays, IT projects tend to be complex and to bring together many technologies. Building an application can therefore quickly become cumbersome.
Continuous Integration is therefore, more than ever, the cornerstone of every project. This article aims to present insights and feedback based on our experience and projects at Worldline. We will present how to build and ship applications on Google Cloud Platform using different technologies and practices. In addition, we will highlight how AI can help enhance the whole process on Google Cloud Platform.
Understanding CI/CD: A Key to Agile Software Development
In the fast-paced world of software development, the demand for quicker, higher-quality releases has never been greater. Continuous Integration and Continuous Delivery (or Continuous Deployment) have revolutionized the way teams approach software creation, making it faster, more efficient, and more reliable.
Continuous Integration (CI)
Continuous Integration is the practice that encourages developers to integrate their code changes into a shared repository frequently—often multiple times a day. Each integration is automatically verified through a series of builds and tests. The primary goal here is to identify issues early in the development process, enabling teams to resolve conflicts swiftly.
Continuous Delivery (CD)
Building on the principles of CI, Continuous Delivery ensures that the code integrated into the repository is always in a deployable state. With CD, automated processes facilitate the release of new updates, allowing teams to push software to production with minimal manual intervention. Continuous Deployment, a specific subset of this, automates the release of every code update that passes automated tests straight to production, making it instantly available to users.
How CI/CD Work Together
Think of CI/CD as a well-oiled machine—a pipeline that incorporates various stages:
- Code Commit: Developers make code changes and commit them to the repository.
- Build: The application is built automatically.
- Automated Testing: The new code is rigorously tested against a suite of automated tests.
- Staging Deployment: The build is deployed to a staging environment for further verification.
- Production Deployment: Upon passing all checks, the code can be deployed to production.
This seamless integration of processes not only improves collaboration between development and operations teams but also increases the speed and reliability of software releases.
As you will have gathered, CI/CD practices are not just buzzwords. They are essential tools that enable modern IT organizations to deliver software more efficiently and with greater quality.
The GCP application
In modern software development, automating DevOps pipelines is essential for maintaining agility, speed, and reliability. By leveraging the power of Google Cloud Platform (GCP) and integrating Artificial Intelligence (AI), you can further optimize your workflows for efficiency, resource management, and predictive analytics.
This article outlines how to automate a Continuous Integration and Continuous Deployment (CI/CD) pipeline for a Node.js backend and React frontend web application hosted on Google Kubernetes Engine (GKE). Additionally, we enhance the pipeline with AI-powered capabilities for predictive failure detection, automated issue resolution, and resource optimization.
Use Case: Automating Deployment for a Web Application
Our goal is to create an automated pipeline that builds, tests, and deploys Dockerized application artifacts to GKE upon code pushes.
Tools involved include:
- A Git repository such as GitHub or GitLab: for hosting the source code.
- Cloud Build: for building and deploying code.
- Artifact Registry: to store Docker images.
- Google Kubernetes Engine (GKE): for application hosting.
- Cloud Storage: for storing intermediate artifacts/logs.
- Cloud Pub/Sub (optional): for triggering advanced workflows.
- Cloud Monitoring and Logging: to monitor pipeline health.
Step 1: Set Up GCP Project and Enable APIs
- Create a new GCP project:
gcloud projects create my-webapp-project --set-as-default
gcloud config set project my-webapp-project
- Enable necessary APIs:
gcloud services enable \
cloudbuild.googleapis.com \
container.googleapis.com \
artifactregistry.googleapis.com \
storage.googleapis.com \
monitoring.googleapis.com \
logging.googleapis.com
- Configure IAM roles: Grant the Cloud Build service account permission to deploy to GKE.
gcloud projects add-iam-policy-binding my-webapp-project \
  --member="serviceAccount:$(gcloud projects describe my-webapp-project \
    --format='value(projectNumber)')@cloudbuild.gserviceaccount.com" \
  --role=roles/container.developer
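- Create an Artifact Registry repository: the build steps below push Docker images to Artifact Registry, which requires a Docker repository to exist. A minimal example, assuming a repository named webapp in us-central1 (the name is an assumption reused in the rest of this article):
gcloud artifacts repositories create webapp \
  --repository-format=docker \
  --location=us-central1 \
  --description="Docker images for the web application"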
Step 2: Connect GitHub to Cloud Build
- Link GitHub repository to GCP:
- Navigate to Cloud Build > Triggers in the GCP Console.
- Select Create Trigger and choose your repository.
- Define cloudbuild.yaml: Place the following file in your repository root to define the pipeline steps.
steps:
# Frontend build and Docker image creation
- name: 'gcr.io/cloud-builders/npm'
  dir: 'frontend'
  args: ['install']
- name: 'gcr.io/cloud-builders/docker'
  dir: 'frontend'
  # Image paths assume the "webapp" Artifact Registry repository created in Step 1
  args: ['build', '-t', 'us-central1-docker.pkg.dev/my-webapp-project/webapp/frontend:latest', '.']
# Backend build and Docker image creation
- name: 'gcr.io/cloud-builders/npm'
  dir: 'backend'
  args: ['install']
- name: 'gcr.io/cloud-builders/docker'
  dir: 'backend'
  args: ['build', '-t', 'us-central1-docker.pkg.dev/my-webapp-project/webapp/backend:latest', '.']
# Push images to Artifact Registry
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'us-central1-docker.pkg.dev/my-webapp-project/webapp/frontend:latest']
- name: 'gcr.io/cloud-builders/docker'
  args: ['push', 'us-central1-docker.pkg.dev/my-webapp-project/webapp/backend:latest']
# Deploy to GKE (the kubectl builder needs to know which cluster to target)
- name: 'gcr.io/cloud-builders/kubectl'
  args: ['apply', '-f', 'k8s/frontend-deployment.yaml']
  env:
    - 'CLOUDSDK_COMPUTE_REGION=us-central1'
    - 'CLOUDSDK_CONTAINER_CLUSTER=webapp-cluster'
- name: 'gcr.io/cloud-builders/kubectl'
  args: ['apply', '-f', 'k8s/backend-deployment.yaml']
  env:
    - 'CLOUDSDK_COMPUTE_REGION=us-central1'
    - 'CLOUDSDK_CONTAINER_CLUSTER=webapp-cluster'
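- (Optional) Create the trigger from the CLI instead of the console. A sketch, assuming a GitHub repository my-webapp owned by my-org that is already connected to Cloud Build:
gcloud builds triggers create github \
  --name=webapp-push-trigger \
  --repo-owner=my-org \
  --repo-name=my-webapp \
  --branch-pattern='^main$' \
  --build-config=cloudbuild.yaml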
Step 3: Set Up Google Kubernetes Engine
- Create a GKE cluster:
gcloud container clusters create webapp-cluster \
--num-nodes=3 \
--region=us-central1
- Authenticate kubectl with GKE:
gcloud container clusters get-credentials webapp-cluster --region=us-central1
- Create Kubernetes manifests:
- Frontend Deployment (k8s/frontend-deployment.yaml):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
        - name: frontend
          # Image path assumes the "webapp" Artifact Registry repository created in Step 1
          image: us-central1-docker.pkg.dev/my-webapp-project/webapp/frontend:latest
          ports:
            - containerPort: 80
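- Backend Deployment (k8s/backend-deployment.yaml): the cloudbuild.yaml above also applies this manifest, which is not shown above. A minimal sketch, assuming the Node.js backend listens on port 3000:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
        - name: backend
          # Port 3000 is an assumption for the Node.js backend
          image: us-central1-docker.pkg.dev/my-webapp-project/webapp/backend:latest
          ports:
            - containerPort: 3000
In practice you would also add Kubernetes Services (for example, a LoadBalancer for the frontend and a ClusterIP for the backend) so the application is reachable.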
Step 4: Enable Continuous Deployment
With the cloudbuild.yaml file and the Kubernetes manifests in place, every code push to the GitHub repository triggers the pipeline. Cloud Build will:
- Build Docker images for the frontend and backend.
- Push the images to Artifact Registry.
- Deploy the images to GKE using kubectl.
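Note that because the manifests pin the image tag to latest, reapplying an unchanged manifest does not force Kubernetes to pull the freshly built image. A simple workaround is to restart the deployments after the apply steps (tagging images with the $SHORT_SHA substitution is a common alternative):
kubectl rollout restart deployment/frontend
kubectl rollout restart deployment/backend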
Step 5: Implement Rollback and Monitoring
- Enable Kubernetes health checks: Add liveness and readiness probes under each container spec in the Kubernetes manifests, for example for the frontend:
livenessProbe:
httpGet:
path: /health
port: 80
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 80
initialDelaySeconds: 10
periodSeconds: 5
- Roll back deployments manually (if necessary):
kubectl rollout undo deployment/frontend
kubectl rollout undo deployment/backend
- Enable monitoring: Use Cloud Monitoring and Logging for real-time insights into pipeline health.
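For example, a log-based metric can count failed builds and back an alert policy; a minimal sketch, assuming Cloud Build logs are kept in Cloud Logging under the standard build resource type:
gcloud logging metrics create build-failures \
  --description="Count of failed Cloud Build runs" \
  --log-filter='resource.type="build" AND severity>=ERROR'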
Enhancing the Pipeline with AI
We now enhance the CI/CD pipeline with AI to improve its efficiency, reliability, and resource management. The AI system focuses on:
- Predictive Failure Detection: Analyzing logs and metrics to predict failures.
- Automated Issue Resolution: Automatically resolving common issues or recommending fixes.
- Resource Optimization: Dynamically allocating resources based on application workload.
Pipeline Enhancements
1. Predictive Failure Detection
Predict failures in the pipeline using historical data. This involves training a Machine Learning (ML) model to detect patterns leading to failures.
Data Collection:
Fetch logs and metrics from Google Cloud Monitoring and Logging.
from google.cloud import logging_v2, monitoring_v3
import pandas as pd
# Initialize clients
logging_client = logging_v2.Client()
monitoring_client = monitoring_v3.MetricServiceClient()
def fetch_logs():
logger = logging_client.logger('ci-cd-logs')
entries = logger.list_entries()
logs = []
for entry in entries:
logs.append(entry.payload)
return pd.DataFrame(logs, columns=["timestamp", "severity", "message"])
def fetch_metrics():
    project_name = "projects/my-webapp-project"
    # The filter must be a valid Cloud Monitoring filter expression;
    # container CPU usage is used here as an example metric.
    metric_filter = 'metric.type = "kubernetes.io/container/cpu/core_usage_time"'
    results = monitoring_client.list_time_series(
        request={"name": project_name, "filter": metric_filter, "interval": {"start_time": "...", "end_time": "..."}}
    )
    data = [{"timestamp": point.interval.end_time, "value": point.value.double_value} for ts in results for point in ts.points]
    return pd.DataFrame(data)
Train an ML Model:
Use the collected data to train a classification model.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
# Prepare data
logs = fetch_logs()
metrics = fetch_metrics()
dataset = logs.merge(metrics, on="timestamp")
# Note: timestamps and log messages must be turned into numeric features
# (encoded or vectorized) before they can be fed to the model.
X = dataset.drop(columns=["severity"])
y = dataset["severity"]  # Assume severity indicates failures
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# Train model
model = RandomForestClassifier()
model.fit(X_train, y_train)
# Save model
import joblib
joblib.dump(model, "failure_detection_model.pkl")
Deploy Model in the Pipeline: Integrate the model into the pipeline to make real-time predictions.
import joblib
model = joblib.load("failure_detection_model.pkl")
def predict_failure(log_entry):
    # log_entry must be encoded into the same feature vector used during training
    return model.predict([log_entry])
2. Automated Issue Resolution
Automate resolution of common issues based on detected patterns.
def resolve_issue(issue_type):
if issue_type == "build_failure":
print("Retrying build...")
# Trigger retry logic
elif issue_type == "deployment_failure":
print("Rolling back deployment...")
# Rollback logic
else:
print("Escalating issue to the team.")
3. Resource Optimization
Optimize resource allocation based on workload patterns.
from google.cloud import container_v1
def scale_resources(cluster_name, node_pool_name, project_id, region, desired_nodes):
    client = container_v1.ClusterManagerClient()
    # Fully qualified node pool name expected by the v1 API
    node_pool_path = (
        f"projects/{project_id}/locations/{region}/clusters/{cluster_name}"
        f"/nodePools/{node_pool_name}"
    )
    client.set_node_pool_size(
        request={"name": node_pool_path, "node_count": desired_nodes}
    )
    print(f"Scaled node pool {node_pool_name} to {desired_nodes} nodes.")
Use AI to predict required resources:
def predict_resources(logs, metrics):
    # Simplified resource prediction logic:
    # the metric value is assumed to represent an average utilization percentage
    workload = metrics["value"].mean()
if workload > 80:
return 5 # High load
elif workload > 50:
return 3 # Medium load
else:
return 1 # Low load
Integrate this into the pipeline to dynamically adjust resources.
Complete Python AI Pipeline
Integrate all the above features into a Python script executed within the pipeline.
def main():
logs = fetch_logs()
metrics = fetch_metrics()
# Predict failure
for log in logs.itertuples():
if predict_failure(log):
issue_type = log.issue_type # Example field
resolve_issue(issue_type)
# Predict resources and scale
desired_nodes = predict_resources(logs, metrics)
scale_resources(
cluster_name="webapp-cluster",
node_pool_name="default-pool",
project_id="my-webapp-project",
region="us-central1",
desired_nodes=desired_nodes,
)
if __name__ == "__main__":
main()
Monitoring and Alerts
Use Google Cloud Monitoring to trigger alerts based on ML predictions or unexpected behavior.
Configure alert policies in the GCP console. Use Python to send notifications via Pub/Sub or email when anomalies are detected.
from google.cloud import pubsub_v1
def send_alert(message):
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-webapp-project", "alerts")
publisher.publish(topic_path, message.encode("utf-8"))
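This helper can be wired into the escalation branch of resolve_issue, for example:
# Example: escalate an unresolved issue to the alerts Pub/Sub topic
send_alert("Unresolved CI/CD issue detected: deployment_failure")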
Conclusion
We saw in this article how to easily automate the deployment process for a web application using Google Cloud tools like Cloud Build, Artifact Registry, and Google Kubernetes Engine. This setup streamlines the CI/CD pipeline, ensuring faster and more reliable deployments.
With the integration of GitHub or GitLab, any code push automatically triggers the build, test, and deployment process, keeping your application up-to-date. Adding health checks and monitoring improves the robustness of your system, while rollback mechanisms provide a safety net for deployments.
Last but not least, enhancing the pipeline with predictive AI models can further optimize workflows by identifying potential failures before they occur. This approach demonstrates the power of combining DevOps best practices with cutting-edge AI tools to create a resilient and scalable deployment pipeline.