Jobs

Software Engineer 2

Abnormal Security

Hybrid - Bangalore, India

Hybrid

Full-time

2d ago

About the Role

Abnormal Security is looking for an experienced and driven Platform & Infra software engineer to join the PI team. Join us and help build the platforms that power Abnormal's growth.

Observability Platform
Own and evolve the monitoring, metrics, and alerting infrastructure that every engineering team at Abnormal depends on. You'll work across the Prometheus, Chronosphere, and Grafana stack to ensure engineers can see what their systems are doing in real time: building dashboards, managing metric pipelines at scale, operating the PagerDuty alerting pipeline, and driving cost-efficient observability across all production environments (US, EU, and Gov Cloud).

Your Impact

Own the observability stack (Prometheus, Chronosphere, Grafana, PagerDuty) that every team relies on to detect, diagnose, and resolve production issues. When you make it better, every engineer at Abnormal gets faster.
Design platforms and developer tooling that remove friction — reducing deployment times, simplifying pipeline authoring, and letting product teams focus on building rather than firefighting.
Drive SLAs and SLOs for critical shared infrastructure, ensuring the systems behind our products are resilient and cost-efficient.
Your architectural decisions on alerting pipelines and cross-environment deployments will define what products we can build and how quickly we deliver them to customers.

What you will do

Work with the Tech Lead, Engineering Manager, and Product Manager to design, develop, and deliver key platform features — from technical design docs through production rollout
Own features end-to-end: scoping, implementation, testing, deployment, and post-launch monitoring across multiple environments (US, EU, Gov Cloud)
Take ownership of 1-3 key services within Observability (Prometheus, Chronosphere, Grafana, PagerDuty pipeline) or Data Infra (Airflow, Spark) and be accountable for their reliability, performance, and evolution
Participate in on-call rotations — triage, diagnose, and resolve production issues independently, building deep operational knowledge of the systems you own
Improve system resilience by converting runbooks into automated solutions, refining SLAs/SLOs, and proactively identifying performance bottlenecks and failure modes
Assume ownership of the reliability of everything you build, including comprehensive unit tests, integration testing, and observability instrumentation
Build platforms, tooling, and APIs that make it easier for other engineering teams to ship — whether that's faster pipeline deployments, better dashboards, or simpler alerting configuration
Partner with internal customers (product and engineering teams) to understand their needs and translate them into scalable platform capabilities
Communicate effectively in an async-first, distributed environment — proactively providing updates, discussing challenges, and proposing solutions without prompting
Mentor junior engineers on the team, helping them ramp up on service operations and development practices
Raise the bar of engineering excellence through code reviews, knowledge sharing, design discussions, and contributing to team best practices

Must Haves

Backend Engineering & Distributed Systems (4+ years)
4+ years of hands-on backend engineering experience designing, building, and operating production-grade distributed systems
Strong proficiency in Python — the primary language for Airflow DAGs, platform services, and automation tooling
Working proficiency in Golang — used for high-performance infrastructure components, metric pipelines, and platform services
Experience building systems that process data at scale — whether metric ingestion pipelines, stream/batch processing, or high-throughput API services
Demonstrated experience owning a service or platform end-to-end — from technical design through production deployment, monitoring, and iteration
Comfortable balancing feature development with operational responsibilities: you've shipped features and kept them running reliably at scale
Experience writing technical design documents that articulate trade-offs, propose solutions, and get buy-in from peers and tech leads
Track record of breaking down ambiguous problems into concrete, deliverable milestones
Experience with fault tolerance patterns — retries, circuit breakers, graceful degradation, backpressure — and knowing when to apply each
Proven incident response capability: you've been on-call, diagnosed production issues under pressure, and driven them to resolution
Strong testing discipline — unit tests, integration tests, and an understanding of what to test and how to keep test suites maintainable
Ability to design systems with a forward-looking perspective — thinking about how your architecture handles 10x growth, multi-region deployment, and evolving requirements
Ability to contribute to and influence cross-team technical direction — you're not just implementing specs, you're shaping the solution
Async-first communication excellence — strong written communication skills for design docs, Slack discussions, PR reviews, and status updates across time zones
Proactive communicator — you surface blockers early, share con
Solid understanding of monitoring, alerting, and observability principles — you've instrumented services, set up dashboards, defined SLIs/SLOs, or triaged production incidents using metrics and logs

Nice to Have

Hands-on experience with Prometheus — PromQL queries, recording rules, alerting rules, relabeling configs, and understanding metric cardinality challenges at scale
Experience with Grafana — building dashboards, templating, managing datasources, and creating meaningful visualizations for operational and business metrics
Familiarity with commercial observability platforms like Chronosphere, Datadog, New Relic, or Honeycomb — understanding trade-offs between self-hosted and managed solutions
Experience designing or operating an alerting pipeline (PagerDuty, Opsgenie, or similar), including alert routing, escalation policies, and reducing noise/alert fatigue

Cloud Infrastructure & Kubernetes

Familiarity with AWS services — EC2, ECS, EKS, S3, RDS, IAM, CloudWatch, Lambda, SQS/SNS — and understanding how to architect cost-effective, secure cloud infrastructure
Experience with Kubernetes (K8s) — deploying and operating workloads, understanding pods/services/deployments, Helm charts, and debugging cluster-level issues
Exposure to Infrastructure-as-Code tools — Terraform, Pulumi, or CloudFormation — and understanding the value of declarative infrastructure management
Experience with CI/CD pipelines — GitHub Actions, Jenkins, or similar — and optimizing build/deploy times for platform services

Programming & Framework

Experience with Django or similar Python web frameworks — building APIs, managing migrations, and understanding ORM performance characteristics
Familiarity with gRPC or protobuf for inter-service communication in a microservices architecture

Technical Leadership & Platform Thinking

Experience leading a small team (2-4 engineers) to build a feature or component from scratch — scoping, task breakdown, code reviews, and delivery management
Experience building internal developer platforms or tooling — CLIs, SDKs, self-service portals, or automation that improved developer productivity
Track record of reducing operational toil — automating runbooks

Abnormal AI is an equal opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, national origin, disability, protected veteran status or other characteristics protected by law.

About Abnormal Security

Abnormal Security

Series B

Software company.

201-500

Employees

Miami

Headquarters

$4B

Valuation

Reviews

3.0

1 review

Work-life balance

3.0

Compensation

3.0

Company culture

2.5

Career

4.0

Management

2.0

60%

Recommend to a friend

Pros

Lots of opportunity for impact

Plenty of work and projects

High impact role

Cons

Questionable leadership

Non data-driven decisions

Poor management decisions

Salary range

50 data points

Senior/L5

Senior/L5 · Senior Manager of Customer Success

1 report

$202,412

Total annual compensation

Base salary

$176,010

Stock

Bonus

$202,412

Interview experience

1 interview

Difficulty

1.0

/ 5

Duration

14-28 weeks

Experience

Positive 0%

Neutral 0%

Negative 100%

Interview process

Application Review

Recruiter Screen

Technical Phone Screen

Onsite/Virtual Interviews

Team Matching

Offer

Common questions

Behavioral/STAR

Technical Knowledge

Culture Fit

Past Experience
