Open to opportunities · Globally

Sandip Dhakal

Senior Platform Engineer | Cloud Security & DevSecOps

I build secure cloud platforms for healthcare SaaS under HIPAA compliance, from zero-trust access controls and automated compliance enforcement to AI-powered developer tooling across multi-account AWS.

About

Platform engineer focused on making cloud infrastructure secure by default and developer experience frictionless.

5+ years in healthcare SaaS, progressing from DevOps to Senior Platform Engineer. I work at the intersection of security, automation, and developer productivity across a multi-account AWS environment handling protected health information (PHI) under HIPAA compliance.

I've built zero-trust access systems, authored container security strategies aligned to NIST frameworks, integrated AI into our documentation workflow, rolled out WAF with BOT protection across 6+ accounts, and automated compliance across the entire AWS organization. I also write extensively: 50+ internal technical documents covering security procedures, architecture guides, DR runbooks, and incident readouts.

Zero Trust & IAM

Zero standing access to PHI data. JIT access with auto-revoke, SCPs, least-privilege enforcement across the AWS organization

Container Security

8-pillar security strategy across production EKS clusters. OPA Gatekeeper admission control, golden base images, Sysdig scanning

AI + Platform

AWS Bedrock RAG search, LLM governance with PII redaction, prompt injection hardening, rate limiting

FinOps & Cost Optimization

$50K+ saved through resource right-sizing, unused resource cleanup, and tagging strategies

Projects

Each project shows the business problem, what I built, and the measurable outcome.

01

Zero-Trust JIT Access for PHI Data

Security Engineering / IAM

Problem

Engineers had persistent access to S3 buckets containing protected health information. This violated zero-trust principles and created compliance risk under HIPAA. No audit trail for who accessed what data and when.

What I Built

A serverless JIT access system using Lambda, AWS IAM Identity Center, DynamoDB, and EventBridge. Users request time-limited, per-bucket access through an approval workflow. Per-bucket SSO permission sets are auto-created on first use. A scheduled EventBridge rule runs every 15 minutes to auto-revoke expired access. HTML email notifications go out on grant and revoke.
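A minimal sketch of the expiry sweep behind time-limited grants, not the production code. The `Grant` record shape, the `sweep` function, and the `revoke` callback are hypothetical names introduced for illustration; in a real deployment the callback would remove the IAM Identity Center assignment and send the notification email.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Grant:
    """One approved access grant (hypothetical record shape)."""
    user: str
    bucket: str
    expires_at: datetime


def split_grants(grants, now):
    """Return (expired, active) grants as of `now`."""
    expired = [g for g in grants if g.expires_at <= now]
    active = [g for g in grants if g.expires_at > now]
    return expired, active


def sweep(grants, now, revoke):
    """Call `revoke` on every expired grant and return the still-active ones.

    `revoke` is a callback supplied by the caller; this sketch does not
    talk to AWS itself.
    """
    expired, active = split_grants(grants, now)
    for grant in expired:
        revoke(grant)
    return active


if __name__ == "__main__":
    now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    grants = [
        Grant("alice", "phi-bucket-a", now - timedelta(minutes=5)),
        Grant("bob", "phi-bucket-b", now + timedelta(minutes=30)),
    ]
    remaining = sweep(grants, now, revoke=lambda g: print("revoked", g.user, g.bucket))
    print([g.user for g in remaining])
```

Keeping the expiry decision in a pure function makes it easy to unit test separately from the scheduled trigger that invokes it.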

Outcome

  • Zero standing human access to PHI data
  • Per-resource isolation (no cross-bucket leakage)
  • Full audit trail via DynamoDB + CloudTrail
  • Automated grant/revoke with email notifications
  • Bucket policies deny all except approved JIT roles

Lambda · IAM Identity Center · DynamoDB · EventBridge · Zero Trust · Python · HIPAA

02

Container Security Strategy & OPA Gatekeeper

Security Architecture / Kubernetes

Problem

Multiple EKS clusters with no unified security standard. Teams deployed containers with root access, no resource limits, images from unvetted registries, and no admission control. No standardized base images or vulnerability SLAs.

What I Built

An 8-pillar container security strategy: golden base images (Alpine, weekly rebuilds), CI/CD scan gates (Sysdig), OPA Gatekeeper admission policies (8 policies), runtime hardening (non-root, read-only FS, dropped capabilities, seccomp), continuous patching cadence, Sysdig runtime monitoring, audit governance, and image tagging standards. Aligned to NIST SP 800-190, CIS Benchmarks, and SOC 2.
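The runtime hardening checks named above can be illustrated in plain Python. This is a sketch of the kind of rule an admission policy enforces, not the Gatekeeper policies themselves (those are written in Rego). The function name `check_container` and the registry prefix `registry.example.com/` are assumptions for illustration.

```python
def check_container(container, allowed_registries=("registry.example.com/",)):
    """Return a list of violations for one container spec.

    `container` is a dict shaped like an entry in a Kubernetes pod's
    `spec.containers` list. The allowed registry prefix is a placeholder.
    """
    violations = []
    security = container.get("securityContext") or {}

    # Non-root and read-only root filesystem must be set explicitly.
    if security.get("runAsNonRoot") is not True:
        violations.append("securityContext.runAsNonRoot must be true")
    if security.get("readOnlyRootFilesystem") is not True:
        violations.append("securityContext.readOnlyRootFilesystem must be true")

    # All Linux capabilities should be dropped.
    dropped = (security.get("capabilities") or {}).get("drop") or []
    if "ALL" not in dropped:
        violations.append("securityContext.capabilities.drop must include ALL")

    # CPU and memory limits are required.
    limits = (container.get("resources") or {}).get("limits") or {}
    for resource in ("cpu", "memory"):
        if resource not in limits:
            violations.append(f"resources.limits.{resource} is required")

    # Images must come from a vetted registry.
    image = container.get("image", "")
    if not image.startswith(tuple(allowed_registries)):
        violations.append(f"image {image!r} is not from an allowed registry")

    return violations


if __name__ == "__main__":
    for problem in check_container({"image": "docker.io/library/nginx:latest"}):
        print(problem)
```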

Outcome

  • 8 Gatekeeper policies enforced at Kubernetes admission
  • Non-root, read-only FS, dropped capabilities by default
  • Weekly golden image rebuilds with vulnerability scanning
  • Vulnerability SLAs: Critical 14 days, High 30 days
  • Dockerfile standards adopted organization-wide

EKS · OPA Gatekeeper · Sysdig · NIST 800-190 · Docker · Alpine · ECR

03

AI-Powered Documentation Search (RAG)

AI Integration / Developer Experience

Problem

Thousands of documentation pages with poor native search. Engineers spent significant time searching for runbooks, procedures, and architecture docs. Same questions asked repeatedly across teams. No way to query documentation using natural language.

What I Built

A RAG-based semantic search tool connecting our documentation platform with AWS Bedrock. Includes PII redaction via Bedrock Guardrails, rate limiting (20 req/min per IP), prompt injection hardening, input sanitization against CQL injection, output validation for leaked credentials, and a CloudWatch observability dashboard for query volume, errors, and latency.
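Two of the governance controls above, per-IP rate limiting and output validation for leaked credentials, can be sketched as follows. This is an illustration under assumptions, not the tool's actual code: the class and function names are hypothetical, the limiter uses a sliding window (the source does not say which algorithm is used), and the credential check only covers the AWS access key ID format.

```python
import re
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per client within `window_seconds`."""

    def __init__(self, limit=20, window_seconds=60.0, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock  # injectable so tests can control time
        self._hits = defaultdict(deque)

    def allow(self, client_ip):
        now = self.clock()
        hits = self._hits[client_ip]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


# AWS access key IDs are 20 characters and start with AKIA (long-term)
# or ASIA (temporary credentials).
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")


def redact_access_keys(text):
    """Replace anything that looks like an AWS access key ID."""
    return AWS_ACCESS_KEY_ID.sub("[REDACTED]", text)


if __name__ == "__main__":
    limiter = SlidingWindowLimiter()
    print(limiter.allow("203.0.113.7"))
    print(redact_access_keys("use AKIAIOSFODNN7EXAMPLE to connect"))
```

An in-memory limiter like this only works within a single process; a multi-instance deployment would need shared state.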

Outcome

  • Semantic search across all documentation spaces
  • PII automatically redacted from LLM responses
  • Governance: rate limiting, output validation, audit logging
  • CloudWatch dashboard for query analytics
  • Reduced time-to-answer for operational questions

AWS Bedrock · RAG · Python · LLM Governance · CloudWatch · PII Redaction

04

Multi-Account Cloud Security Automation

Compliance / DevSecOps

Problem

Recurring Security Hub findings across 6+ AWS accounts. Detective controls alerted only after misconfigurations had already happened. Manual resource creation led to an inconsistent security posture. No preventive measures to stop misconfigurations before they occurred.

What I Built

Automated compliance checkers for S3 (ACL removal, bucket policies, access logging), KMS (key rotation, public access blocking), CloudFront (logging, SSL/TLS), EC2 (IMDSv2 enforcement), Redshift (VPC routing, admin usernames), and WorkSpaces (volume encryption). Applied SCPs and Resource Control Policies at the AWS Organization level. Created IaC templates for security-by-default deployments. Monthly security patching across 9+ AWS accounts.
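As one example of the detective and preventive halves of this work, here is a sketch for IMDSv2. The checker reads a dict shaped like the EC2 `describe_instances` response, and the SCP denies launching instances that do not require IMDSv2 tokens. Function names are hypothetical and neither piece is the actual checker or policy used in the organization.

```python
import json


def instances_missing_imdsv2(describe_instances_response):
    """Return IDs of instances whose metadata options do not require IMDSv2.

    Expects the response shape of EC2 `describe_instances`, where each
    instance carries `MetadataOptions.HttpTokens` ("optional" or "required").
    """
    offenders = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            if options.get("HttpTokens") != "required":
                offenders.append(instance["InstanceId"])
    return offenders


def imdsv2_scp():
    """Build an SCP that blocks launches without IMDSv2 enforced."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "RequireImdsV2",
                "Effect": "Deny",
                "Action": "ec2:RunInstances",
                "Resource": "arn:aws:ec2:*:*:instance/*",
                "Condition": {
                    "StringNotEquals": {"ec2:MetadataHttpTokens": "required"}
                },
            }
        ],
    }


if __name__ == "__main__":
    sample = {
        "Reservations": [
            {"Instances": [
                {"InstanceId": "i-0aaa", "MetadataOptions": {"HttpTokens": "required"}},
                {"InstanceId": "i-0bbb", "MetadataOptions": {"HttpTokens": "optional"}},
            ]}
        ]
    }
    print(instances_missing_imdsv2(sample))
    print(json.dumps(imdsv2_scp(), indent=2))
```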

Outcome

  • Preventive controls across 6+ AWS accounts
  • IMDSv2 enforced on all EC2, EKS, and autoscaling
  • S3 ACLs eliminated, bucket policies standardized
  • KMS key rotation enforced, public access blocked
  • RDS encryption SCP applied organization-wide
  • Monthly security patching across 9+ accounts

Security Hub · SCP · Terraform · IMDSv2 · KMS · Python · CloudFormation

05

WAF V2 Rollout with BOT Protection

Network Security / Web Application Firewall

Problem

Web applications across multiple AWS accounts had no WAF protection or were running outdated WAF V1. No BOT detection, no centralized logging, and no consistent rule sets across environments. Applications were exposed to common web exploits and automated attacks.

What I Built

Rolled out AWS WAF V2 with BOT protection across 6+ AWS accounts spanning analytics, care management, corporate, payment technology, and CI/CD environments. Configured managed rule groups, BOT control rules, and custom rate-based rules. Centralized WAF logs from all accounts to a dedicated logging account. Upgraded WAF Lambda functions across production and non-production environments. Cleaned up unused legacy WAF resources.
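A sketch of how a consistent rule set (managed rule groups, Bot Control, and a rate-based rule) might be defined once and reused per account. The dictionaries follow the rule structure the WAFV2 API accepts in a web ACL's `Rules` list; the rule names, priorities, and the 1000-request limit are assumptions for illustration, not the deployed configuration.

```python
def visibility(metric_name):
    """Visibility settings required on every WAFV2 rule."""
    return {
        "SampledRequestsEnabled": True,
        "CloudWatchMetricsEnabled": True,
        "MetricName": metric_name,
    }


def managed_group_rule(name, group_name, priority):
    """Reference an AWS managed rule group, keeping its own rule actions."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {
            "ManagedRuleGroupStatement": {"VendorName": "AWS", "Name": group_name}
        },
        "OverrideAction": {"None": {}},
        "VisibilityConfig": visibility(name),
    }


def rate_limit_rule(name, limit, priority):
    """Block source IPs that exceed `limit` requests in the evaluation window."""
    return {
        "Name": name,
        "Priority": priority,
        "Statement": {"RateBasedStatement": {"Limit": limit, "AggregateKeyType": "IP"}},
        "Action": {"Block": {}},
        "VisibilityConfig": visibility(name),
    }


def build_rules(rate_limit=1000):
    """Return the shared rule list; values here are illustrative."""
    return [
        managed_group_rule("common-rules", "AWSManagedRulesCommonRuleSet", 0),
        managed_group_rule("bot-control", "AWSManagedRulesBotControlRuleSet", 1),
        rate_limit_rule("rate-limit-per-ip", rate_limit, 2),
    ]


if __name__ == "__main__":
    for rule in build_rules():
        print(rule["Priority"], rule["Name"])
```

Generating the list from one function is one way to keep rule sets identical across accounts; the same idea can be expressed as a shared Terraform or CloudFormation module.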

Outcome

  • WAF V2 with BOT protection across 6+ accounts
  • Centralized WAF logging to LogArchive account
  • Consistent rule sets across all environments
  • Legacy WAF resources cleaned up
  • Documented setup guide for future account onboarding

AWS WAF V2 · BOT Protection · CloudWatch · Lambda · Multi-Account

06

Production Deployment Pipeline Automation

CI/CD / Developer Experience

Problem

Production releases required 8+ manual steps across multiple tools. Deployments were slow, error-prone, and depended on tribal knowledge held by a few engineers. No automated rollback. 7+ product teams depended on platform operations for every release.

What I Built

Unified deployment pipelines: single-trigger backup, deploy, verify, and start. Implemented rolling and blue-green deployment strategies with automated rollback on failure detection. Built Veracode security scanning into the CI/CD pipeline. Documented every deployment procedure so any engineer could execute releases independently. Supported 7+ product teams across Analytics, Care Management, Population Health, Bundles, and Data Engineering.
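The single-trigger flow with automated rollback can be sketched as a small orchestration function. This is an illustration of the control flow only; the real pipelines run in Jenkins, and the step callables and the `run_release` name are hypothetical.

```python
def run_release(backup, deploy, verify, start, rollback):
    """Run backup -> deploy -> verify -> start, rolling back on any failure.

    Each argument is a zero-argument callable. `verify` returns True when
    the deployment looks healthy. Returns "released" or "rolled_back".
    """
    backup()  # a failed backup aborts before anything is changed
    try:
        deploy()
        if not verify():
            raise RuntimeError("post-deploy verification failed")
        start()
    except Exception as exc:
        print(f"release failed ({exc}); rolling back")
        rollback()
        return "rolled_back"
    return "released"


if __name__ == "__main__":
    result = run_release(
        backup=lambda: print("backup taken"),
        deploy=lambda: print("artifacts deployed"),
        verify=lambda: False,  # simulate a failed health check
        start=lambda: print("service started"),
        rollback=lambda: print("previous version restored"),
    )
    print(result)
```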

Outcome

  • 8+ manual steps reduced to single pipeline trigger
  • 4x increase in daily release frequency
  • Automated rollback on failure detection
  • Veracode security scanning integrated into CI/CD
  • 7+ product teams supported with self-service deployments
  • Complete deployment runbooks for team independence

Jenkins · Blue-Green · Rolling Deploy · Veracode · Ansible · Bash

07

Cross-Region Disaster Recovery

Infrastructure / SRE

Problem

Critical enterprise tools (issue tracking, documentation, code repositories) had no tested DR plan. A regional AWS outage would leave the entire engineering organization without core tooling. No automated failover, no replication monitoring, no recovery runbooks.

What I Built

Cross-region DR architecture using Terraform for Global Accelerator DNS failover, automated S3 replication monitoring with alerting, and CodeCommit repository replication via ECS Fargate with Lambda triggers and SNS notifications. Executed full DR drills with smoke testing checklists. Evaluated Arpio for application-level DR. Documented complete recovery procedures.
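A sketch of the kind of check that replication monitoring with alerting might run. It evaluates the latest values of two CloudWatch S3 replication metrics, `ReplicationLatency` (seconds) and `OperationsPendingReplication`, against thresholds. The function name, the input shape, and both threshold values are assumptions for illustration; fetching the metrics and publishing to SNS are left out.

```python
def replication_alerts(metrics_by_bucket, max_latency_seconds=900.0, max_pending_operations=1000):
    """Return alert messages for buckets whose replication looks unhealthy.

    `metrics_by_bucket` maps a bucket name to its latest metric values, e.g.
    {"docs-bucket": {"ReplicationLatency": 42, "OperationsPendingReplication": 3}}.
    Missing metrics are treated as a problem rather than as healthy.
    """
    alerts = []
    for bucket, metrics in sorted(metrics_by_bucket.items()):
        latency = metrics.get("ReplicationLatency")
        pending = metrics.get("OperationsPendingReplication")
        if latency is None or pending is None:
            alerts.append(f"{bucket}: replication metrics missing")
            continue
        if latency > max_latency_seconds:
            alerts.append(
                f"{bucket}: replication latency {latency:.0f}s exceeds {max_latency_seconds:.0f}s"
            )
        if pending > max_pending_operations:
            alerts.append(
                f"{bucket}: {pending} operations pending exceeds {max_pending_operations}"
            )
    return alerts


if __name__ == "__main__":
    sample = {
        "healthy-bucket": {"ReplicationLatency": 30, "OperationsPendingReplication": 5},
        "lagging-bucket": {"ReplicationLatency": 1200, "OperationsPendingReplication": 5},
    }
    for message in replication_alerts(sample):
        print(message)
```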

Outcome

  • Successful DR failover drill validated end-to-end
  • Infrastructure-as-Code for all DR components
  • Automated S3 replication health monitoring with alerts
  • CodeCommit cross-region replication via ECS Fargate
  • Complete DR runbook for the team

Terraform · Global Accelerator · ECS Fargate · S3 Replication · Lambda · SNS

08

Elasticsearch & OpenSearch Operations at Scale

Data Infrastructure / Search

Problem

Multiple self-managed Elasticsearch clusters and AWS-managed OpenSearch domains processing 1TB+ of data daily. Cluster health issues, index replication failures, certificate expirations, and performance bottlenecks required constant attention. No standardized operating procedures, despite 134+ operational tickets.

What I Built

Automated cluster operations: index restoration with automated cluster selection, alias creation automation, cross-zone load balancing implementation, certificate rotation procedures, and cluster right-sizing (adding/removing nodes based on load). Upgraded AWS-managed OpenSearch domains across 4 production clusters serving analytics, care management, and centralized logging. Built index lifecycle management, Metricbeat monitoring, and APM server configuration. Implemented Redis cluster failover and ElastiCache upgrades across 4 environments.
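Alias creation automation can be illustrated with the request body for the Elasticsearch and OpenSearch `_aliases` endpoint, which applies its add and remove actions atomically. This is a minimal sketch; the function name and the index and alias names in the example are hypothetical.

```python
import json


def alias_swap_actions(alias, new_index, old_index=None):
    """Build the body for POST /_aliases that points `alias` at `new_index`.

    When `old_index` is given, the alias is removed from it in the same
    request, so readers never see the alias missing or pointing at both.
    """
    actions = []
    if old_index is not None:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    return {"actions": actions}


if __name__ == "__main__":
    body = alias_swap_actions("claims-current", "claims-2024-02", old_index="claims-2024-01")
    print(json.dumps(body, indent=2))
```

The resulting dict can be sent with any HTTP client or passed to a client library's alias update call after a restored index is ready.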

Outcome

  • 134+ operational tickets resolved across ES/OpenSearch
  • Cross-zone load balancing on production clusters
  • 4 OpenSearch production domains upgraded
  • Automated index restoration and alias management
  • Redis/ElastiCache upgrades across 4 environments
  • Sub-200ms query response times maintained

Elasticsearch · OpenSearch · Redis · ElastiCache · Kibana · APM · Metricbeat

Career Journey

Jun 2023 - Present · Current

Senior Platform Engineer

Cedar Gate Technologies

Leading security architecture, AI integration, and platform automation across multi-account AWS. Built zero-trust PHI access control, authored 8-pillar container security strategy with OPA Gatekeeper, built AI-powered documentation search with AWS Bedrock, rolled out WAF V2 across 6+ accounts, and automated compliance across the organization. Mentoring junior engineers. 50+ technical documents authored.

Dec 2022 - Jun 2023

Platform Engineer

Cedar Gate Technologies

Hardened AWS security across 6+ accounts: enforced IMDSv2 on all compute, migrated S3 from ACLs to IAM policies, applied KMS key rotation and public access blocking, implemented SCPs at the organization level. Upgraded Jira (v10 LTS) and Confluence (v9 LTS) with zero downtime. Monthly security patching across 9+ accounts. Managed Elasticsearch/OpenSearch clusters and Redis infrastructure.

Sep 2021 - Dec 2022

DevOps Engineer

Cedar Gate Technologies

Built unified CI/CD pipelines replacing 8+ manual deployment steps. Implemented rolling and blue-green deployments enabling 4x release frequency. Automated Elasticsearch operations (index restoration, alias creation, cluster management). Set up Terraform infrastructure, domain migration, and monitoring for all services. Supported 7+ product teams with deployments.

Jun 2020 - Aug 2021

Cloud Support Engineer

Genese Cloud Solutions

Deployed Prometheus, Grafana, and ELK observability stack across 100+ microservices, reducing mean time to detection by 60%. Authored 60+ Knowledge Base articles, reducing support tickets by 35%. Delivered $50K+ annual savings through FinOps practices and resource optimization. Maintained 99.95% customer satisfaction.

2019

BSc (Hons) Computing

The British College, Kathmandu

Skills & Certifications

Cloud Platforms

AWS · EC2 · EKS · Lambda · S3 · IAM · VPC · Security Hub · Control Tower · IAM Identity Center · CloudFormation · WAF · Azure · GCP

Kubernetes & Containers

Kubernetes · Docker · Helm · OPA Gatekeeper · Sysdig · ECR · ArgoCD · Bottlerocket

IaC & CI/CD

Terraform · Ansible · CloudFormation · Jenkins · GitHub Actions · GitOps · Blue-Green · Veracode

Security & Compliance

Zero Trust · DevSecOps · NIST 800-190 · SOC 2 · ISO 27001 · HIPAA · SCP · IMDSv2 · CIS Benchmarks

Observability

Prometheus · Grafana · ELK · CloudWatch · PagerDuty · SLI/SLO · APM · Metricbeat

Languages & Data

Python · Bash · PostgreSQL · Elasticsearch · OpenSearch · MongoDB · Redis · Redshift · Snowflake

AI & Emerging

AWS Bedrock · RAG · LLM Governance · Semantic Search · FinOps · ClickHouse

Certifications

AWS Solutions Architect · Associate
AWS Cloud Practitioner · Foundational
Terraform Associate · HashiCorp
FinOps Certified Engineer · FinOps Foundation
FinOps Certified FOCUS Analyst · FinOps Foundation

Education

BSc (Hons) Computing

The British College, 2019

Technical Writing

50+ internal documents covering security procedures, architecture guides, DR runbooks, incident readouts, and research reports.

Security & Compliance

  • Container Security Strategy (8 pillars, 4C model, NIST 800-190)
  • PHI JIT Access Control Architecture & Procedure
  • Client S3 Bucket Change Readiness Procedure
  • KMS Key Creation and Compliance Best Practices
  • SSE-C Bucket Policy Implementation and Testing
  • S3 ACL to IAM Role-Based Access Migration Guide
  • Container Image Tagging Standards

Infrastructure

  • Dockerfile Standards & Golden Image Strategy
  • AWS WAF V2 Setup with BOT Protection Guide
  • CodeCommit Cross-Region Replication (ECS Fargate)
  • Cross-Region DR Technical Overview & Runbook
  • EKS AMI Update & Monthly Patching Procedures
  • AWS Storage Gateway VM Migration Guide
  • ClickHouse Cluster Setup & Implementation

CI/CD & Automation

  • Jenkins Version Upgrade and Patching Guide
  • Build & Deployment Process with Veracode Scanning
  • Snowflake ELT Git-to-S3 Sync Automation
  • SAML SSO Setup Pipeline Guide
  • Production Deployment Runbooks (7+ applications)
  • Elasticsearch Certificate Update Procedures

Research & Incidents

  • Amazon QuickSight Feasibility Report
  • AWS AppStream 2.0 POC & User Survey
  • Arpio DR Feasibility Evaluation
  • Production Incident Readouts (3+ incidents)
  • Application Architecture Diagrams (all platforms)
  • Service Monitoring Strategy Documentation

Let's Connect

Open to platform engineering, DevSecOps, cloud security, and SRE roles. Based in Kathmandu, Nepal. Open to remote work globally and happy to relocate.