
Senior Technical Lead
About the role
Job Summary
Senior Cloud Platform Engineer:
You will own the reliability, security, and scalability of our GCP-based AI platform infrastructure. Everything runs on Cloud Run, managed via Terraform, deployed through Cloud Build. You are responsible for zero-downtime deployments, cloud cost control, end-to-end observability, and ensuring that IAM, VPC, and data security posture meet enterprise standards. You are also the person the data and AI engineers call when their Terraform apply fails or their Cloud Run service won't start.
Key responsibilities
Own and evolve the Terraform IaC codebase — write and maintain reusable modules for Cloud Run services, AlloyDB clusters, Spanner instances, BigQuery datasets, Memorystore Redis, Vertex AI endpoints, Artifact Registry, and VPC networking
Manage Cloud Build CI/CD pipelines across all services — branching strategy (GitOps), build triggers, test gate enforcement, multi-environment promotion (dev → staging → prod), and automated rollback on failed health checks
Design and maintain GCP security posture — IAM least-privilege service accounts, Identity-Aware Proxy (IAP) for all internal services, VPC Service Controls, Private Service Connect for AlloyDB and Redis, and Secret Manager integration
Build and maintain the full observability stack — Cloud Monitoring dashboards, OpenTelemetry collector configuration, structured JSON logging standards, distributed tracing across FastAPI and LangGraph services, and PagerDuty or equivalent on-call alerting
Define and track SLOs for all platform services — API p50/p95/p99 latency, data pipeline freshness, AI pipeline throughput, Cloud Run error rate — and run monthly reliability reviews
Manage Docker image strategy — multi-stage build patterns to minimise image size, distroless base images, Artifact Registry lifecycle policies, and automated vulnerability scanning with Container Analysis
Implement FinOps practices — BigQuery slot monitoring and reservation management, Cloud Run CPU/memory right-sizing, committed use discount planning, and per-team cost allocation using labels
Conduct quarterly infrastructure security reviews and respond to GCP Security Command Center findings
Must-have skills
Terraform — write modules from scratch, not just modify existing ones; HCL fluency, remote state backends (GCS), workspace management, and Terraform Cloud or Atlantis for GitOps CI/CD integration
GCP — 3+ years hands-on production experience: Cloud Run, BigQuery, Cloud Build, IAM, VPC networking, Cloud Monitoring, Secret Manager, Artifact Registry; Google Cloud Associate Cloud Engineer or Professional Cloud DevOps Engineer certification strongly preferred
Docker — multi-stage builds, layer caching optimisation, distroless base images, image security scanning, and Artifact Registry management
CI/CD — Cloud Build or GitHub Actions: pipeline design from scratch, artifact versioning, environment-specific config management, and deployment gating strategies
Linux / bash — comfortable debugging inside running containers, writing shell automation scripts, managing file permissions and system resources
GCP networking — VPC design, subnet allocation, firewall rules, Private Service Connect, Cloud NAT, and DNS configuration for private service endpoints
Other Requirements
Good to have
OpenTelemetry — collector configuration, exporter setup (Cloud Trace, Prometheus), and custom instrumentation for Python FastAPI services
Kubernetes / GKE — even if the current stack is Cloud Run, GKE knowledge is valuable for future scale requirements
Python scripting for infrastructure automation — Cloud Functions, custom Cloud Build steps, GCP Admin SDK scripts
Cloud cost management tooling — Looker Studio billing dashboards, Budget Alerts, committed use planning models, and BigQuery billing export analysis
Azure networking basics — enough to understand the cross-cloud connectivity between Azure Databricks and GCP services
Google Cloud Professional Cloud Security Engineer certification or equivalent security background
Required skills
GCP
Terraform
Cloud Run
Cloud Build
IAM
Monitoring
OpenTelemetry
SLOs
About HCL Technologies
Gautam Buddha Nagar
Headquarters