Executive Summary
Client: 876 Events (Ticketing & Event Management)
Objective: Move production from Cloudways to a security-first, autoscaling AWS stack: Application Load Balancer → ECS/Fargate (Nginx + PHP-FPM) → RDS Proxy → Aurora MySQL, with Secrets Manager for credentials, S3 for uploads, SES for email, SQS workers, and an EventBridge scheduler.
I executed the build entirely via the AWS Console (GUI) and delivered three artifacts: a click-through checklist, a final architecture diagram, and a push-button runbook with rollback/teardown.
Business Context
876 Events experiences bursty traffic—ticket drops, on-sale windows, last-minute changes. On Cloudways, scaling behavior and secrets hygiene were limited, and deep observability was hard. I proposed a migration that emphasizes security by design, auditable steps, and reversible deployments without developer heroics.
Operating Principles I Enforced
- GUI-only build (unless CLI was explicitly required) to keep the trail auditable and hand-off friendly.
- Least-privilege IAM split between task execution (pull images, write logs) and app data-plane (read secrets, S3, SQS, SES).
- Zero trust around data: no public database, Security Group chaining (ALB → App → RDS Proxy → Aurora), and TLS required to the database.
- Blue/green mindset: health checks at the container and target group, deployment circuit breaker, quick rollback to the previous task definition.
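To make the IAM split concrete, here is a minimal sketch of the two policy documents expressed as plain Python dictionaries. Every ARN, the account ID, and the exact action lists are illustrative assumptions, not the policies used in this migration. One caveat worth checking: when secrets are injected through the task definition's secrets mapping, ECS resolves them with the execution role, so that role would also need secretsmanager:GetSecretValue on the specific secret.

```python
import json

# All ARNs below are hypothetical placeholders.
ECR_REPO_ARN = "arn:aws:ecr:us-east-1:111122223333:repository/web"
LOG_GROUP_ARN = "arn:aws:logs:us-east-1:111122223333:log-group:/ecs/web:*"
DB_SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:prod/db-AbCdEf"
UPLOADS_BUCKET_ARN = "arn:aws:s3:::example-uploads"
JOBS_QUEUE_ARN = "arn:aws:sqs:us-east-1:111122223333:jobs"
SES_IDENTITY_ARN = "arn:aws:ses:us-east-1:111122223333:identity/example.com"


def allow(actions, resources):
    """Build one Allow statement."""
    return {"Effect": "Allow", "Action": actions, "Resource": resources}


def policy(statements):
    """Wrap statements in an IAM policy document."""
    return {"Version": "2012-10-17", "Statement": statements}


# Execution role: only what ECS needs to start the task (pull image, ship logs).
execution_policy = policy([
    allow(["ecr:GetAuthorizationToken"], ["*"]),  # this action is not resource-scoped
    allow(["ecr:BatchCheckLayerAvailability", "ecr:GetDownloadUrlForLayer",
           "ecr:BatchGetImage"], [ECR_REPO_ARN]),
    allow(["logs:CreateLogStream", "logs:PutLogEvents"], [LOG_GROUP_ARN]),
])

# Task role: only what the application code touches at runtime.
task_policy = policy([
    allow(["secretsmanager:GetSecretValue"], [DB_SECRET_ARN]),
    allow(["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
          [f"{UPLOADS_BUCKET_ARN}/*"]),
    allow(["sqs:SendMessage", "sqs:ReceiveMessage", "sqs:DeleteMessage",
           "sqs:ChangeMessageVisibility", "sqs:GetQueueAttributes"], [JOBS_QUEUE_ARN]),
    allow(["ses:SendEmail", "ses:SendRawEmail"], [SES_IDENTITY_ARN]),
])


def actions_of(document):
    """Collect every action granted by a policy document."""
    return {a for s in document["Statement"] for a in s["Action"]}


if __name__ == "__main__":
    print(json.dumps(execution_policy, indent=2))
    print(json.dumps(task_policy, indent=2))
```

Keeping the two action sets disjoint makes the split easy to audit: anything the application can do at runtime appears only in the task role.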
Target Architecture (Production Overview)
- Networking: Public Application Load Balancer terminates TLS and routes to private ECS/Fargate tasks spread across multiple AZs.
- Compute:
  - Web service: Nginx + PHP-FPM, health-checked at /healthz.
  - Worker service: Dedicated tasks for queue processing.
- Data: Aurora MySQL behind RDS Proxy with Require TLS enabled.
- App storage: Private S3 bucket for uploads; application reads/writes via IAM-scoped access.
- Messaging & Scheduling: SQS for jobs; EventBridge runs php artisan schedule:run every minute.
- Images: Container images stored in ECR with image scanning.
- Secrets: Credentials and sensitive configuration in AWS Secrets Manager (strict JSON for DB username/password).
- Observability: CloudWatch Logs and ECS service events; health reasons visible at the target group level.
Readiness & Guardrails
Before touching production traffic, I verified:
- Certificates: Valid ACM certificate in the target region.
- Networking: Two or more private subnets for ECS tasks; ALB in public subnets with only 80/443 open.
- Security Groups:
  - ALB: ingress 80/443 from the internet.
  - App: ingress only from ALB SG.
  - RDS Proxy: ingress only from App SG.
  - Aurora: ingress only from RDS Proxy SG.
- Secrets: DB secret stored as strict JSON {"username":"…","password":"…"}.
- Runtime: ECR repo created; ECS Exec allowed; CloudWatch log groups ready.
- Service Integrations: S3 bucket (private), SQS queue, SES identity verified.
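The "strict JSON" requirement for the DB secret is easy to get wrong when pasting values by hand (single quotes, trailing commas, a bare string). A minimal sketch of a pre-flight check follows; the function name and the sample values are hypothetical, and the check only confirms the shape described above.

```python
import json

REQUIRED_KEYS = ("username", "password")


def validate_db_secret(secret_string):
    """Parse the secret and confirm it is a JSON object with non-empty
    string values for username and password. Raises ValueError otherwise."""
    try:
        data = json.loads(secret_string)
    except json.JSONDecodeError as exc:
        raise ValueError(f"secret is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("secret must be a JSON object")
    for key in REQUIRED_KEYS:
        value = data.get(key)
        if not isinstance(value, str) or not value:
            raise ValueError(f"secret key {key!r} must be a non-empty string")
    return data


if __name__ == "__main__":
    # Sample value only; never hard-code real credentials.
    print(validate_db_secret('{"username": "app", "password": "example"}').keys())
```

Running a check like this against the value before saving it in the console catches formatting mistakes before the Proxy or the task definition ever reads the secret.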
Migration Phases
Phase 1 — Security & Networking Foundations
Goal: Establish the blast-radius boundaries first.
Actions: Set up VPC subnets, ALB in public subnets, ECS tasks in private subnets, and the SG chain described above. Issue/validate the ACM certificate.
Post-checks: ACM shows Issued; Security Groups reference other groups—not CIDR ranges—for east-west traffic.
Rollback: No production touch yet; nothing to revert beyond deleting unused resources.
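The post-check that each Security Group references the previous group rather than a CIDR range can be expressed as a small model. The sketch below is illustrative only: the group names, the "sg:" prefix convention, and the app port (80) are assumptions, and the rules are hand-written data rather than anything read from AWS.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    port: int
    source: str  # "sg:<group-name>" for a group reference, otherwise a CIDR


# Hypothetical rule set mirroring the chain ALB -> App -> RDS Proxy -> Aurora.
SECURITY_GROUPS = {
    "alb": [IngressRule(80, "0.0.0.0/0"), IngressRule(443, "0.0.0.0/0")],
    "app": [IngressRule(80, "sg:alb")],
    "rds-proxy": [IngressRule(3306, "sg:app")],
    "aurora": [IngressRule(3306, "sg:rds-proxy")],
}
CHAIN = ["alb", "app", "rds-proxy", "aurora"]


def chain_violations(groups, chain):
    """Return a message for every rule that breaks the chain: each group
    after the first may only accept traffic from the group before it."""
    problems = []
    for upstream, downstream in zip(chain, chain[1:]):
        expected = f"sg:{upstream}"
        rules = groups.get(downstream, [])
        if not rules:
            problems.append(f"{downstream}: no ingress rule from {expected}")
        for rule in rules:
            if rule.source != expected:
                problems.append(
                    f"{downstream}: port {rule.port} allows {rule.source}, "
                    f"expected {expected}"
                )
    return problems


if __name__ == "__main__":
    print(chain_violations(SECURITY_GROUPS, CHAIN) or "chain is intact")
```

The first group in the chain (the ALB) is deliberately exempt, since it is the only one meant to accept traffic from the internet.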
Phase 2 — Data Plane with TLS
Goal: Centralize DB connections and enforce encryption in transit.
Actions: Create the Aurora cluster; configure RDS Proxy with Require TLS and Secrets Manager auth; point database clients to the Proxy endpoint.
Post-checks: Proxy status is Available; connecting with TLS succeeds in a test task.
Rollback: Keep Aurora and Proxy decoupled from public traffic until app services are healthy.
Phase 3 — Application Image & Boot Hygiene
Goal: Reproducible builds and deterministic boot.
Actions: Build the Docker image (Nginx + PHP-FPM), remove any debug packages/providers at build time, and push to ECR. The entrypoint:
- Creates writable storage/* and bootstrap/cache.
- Runs php artisan optimize.
Post-checks: ECR shows the latest tag; the image scan is clean or its findings are triaged.
Rollback: Re-tag to a previously vetted image.
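The two entrypoint steps can be sketched as follows. A real entrypoint would more likely be a shell script that ends by starting Nginx and PHP-FPM; this Python version only illustrates the order of operations. The directory list is the standard Laravel layout and is an assumption about this application, as are the function names.

```python
import os
import subprocess
from pathlib import Path

# Standard Laravel writable paths (assumed; adjust to the actual app).
WRITABLE_DIRS = [
    "storage/app",
    "storage/framework/cache",
    "storage/framework/sessions",
    "storage/framework/views",
    "storage/logs",
    "bootstrap/cache",
]


def ensure_writable_dirs(app_root):
    """Create each directory if missing and confirm the current user can write to it."""
    ready = []
    for relative in WRITABLE_DIRS:
        path = Path(app_root) / relative
        path.mkdir(parents=True, exist_ok=True)
        path.chmod(0o775)
        if not os.access(path, os.W_OK):
            raise PermissionError(f"{path} is not writable")
        ready.append(path)
    return ready


def boot(app_root):
    """Prepare directories, then warm Laravel's caches; fail fast on any error."""
    ensure_writable_dirs(app_root)
    subprocess.run(["php", "artisan", "optimize"], cwd=app_root, check=True)
```

Calling boot with the application root (for example a hypothetical /var/www/html) before starting the web processes keeps the boot deterministic: the container either starts with caches warmed or exits with a visible error.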
Phase 4 — Web Service Behind the ALB
Goal: Serve traffic only when targets report healthy.
Actions:
- Create an ALB target group with IP target type; health check path /healthz.
- Create an ECS task definition for the web container.
- Environment configuration (examples, not exhaustive):
  - Database: host = RDS Proxy endpoint, port 3306, connection = mysql, TLS CA path for MySQL client.
  - App: APP_ENV=production, APP_DEBUG=false, APP_URL = client app subdomain.
  - Storage: FILESYSTEM_DISK=s3, region set appropriately.
  - Queue: QUEUE_CONNECTION=sqs, with SQS prefix/queue.
  - Mail: MAIL_MAILER=ses, with a verified sender.
- Secrets mapping: map DB username/password from the JSON keys; optionally map Laravel APP_KEY from a secret.
- Container health check: command curl -f http://localhost/healthz || exit 1.
Post-checks: ECS service reaches steady state; target group shows healthy targets; logs are clean.
Rollback: Revert to the previous task definition revision and force a new deployment.
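For reference, the environment, secrets mapping, and container health check from this phase correspond roughly to the following container definition fragment, built here as a Python dictionary. The secret ARN, CA bundle path, container name, port, region, and health check timings are all hypothetical; the variable names follow Laravel's default configuration, and the valueFrom suffix uses the ECS convention of appending the JSON key followed by empty version-stage and version-id fields.

```python
import json

# Hypothetical secret ARN; the real one comes from Secrets Manager.
DB_SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:prod/db-AbCdEf"


def secret_ref(secret_arn, json_key):
    """Reference a single JSON key inside a secret: <arn>:<json-key>:<stage>:<version-id>."""
    return f"{secret_arn}:{json_key}::"


def web_container(image, proxy_endpoint, app_url, region="us-east-1"):
    """Build the web container definition (environment values must be strings)."""
    env = {
        "APP_ENV": "production",
        "APP_DEBUG": "false",
        "APP_URL": app_url,
        "DB_CONNECTION": "mysql",
        "DB_HOST": proxy_endpoint,
        "DB_PORT": "3306",
        "MYSQL_ATTR_SSL_CA": "/etc/ssl/certs/ca-bundle.pem",  # assumed path
        "FILESYSTEM_DISK": "s3",
        "AWS_DEFAULT_REGION": region,
        "QUEUE_CONNECTION": "sqs",
        "MAIL_MAILER": "ses",
    }
    return {
        "name": "web",
        "image": image,
        "essential": True,
        "portMappings": [{"containerPort": 80, "protocol": "tcp"}],
        "environment": [{"name": k, "value": v} for k, v in env.items()],
        "secrets": [
            {"name": "DB_USERNAME", "valueFrom": secret_ref(DB_SECRET_ARN, "username")},
            {"name": "DB_PASSWORD", "valueFrom": secret_ref(DB_SECRET_ARN, "password")},
        ],
        "healthCheck": {
            "command": ["CMD-SHELL", "curl -f http://localhost/healthz || exit 1"],
            "interval": 30,
            "timeout": 5,
            "retries": 3,
            "startPeriod": 30,
        },
    }


if __name__ == "__main__":
    definition = web_container("example/web:tag", "proxy.example.internal",
                               "https://app.example.com")
    print(json.dumps(definition, indent=2))
```

Printing the fragment as JSON gives something that can be compared field by field against what the console shows in the task definition's JSON view.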
Phase 5 — Workers, Scheduler, and Integrations
Goal: Decouple background work and time-based tasks from web requests.
Actions:
- S3 uploads bucket (private) with least-privilege access from the task role.
- SQS queue for jobs; ECS worker service runs php artisan queue:work.
- EventBridge rule triggers php artisan schedule:run every minute.
- SES domain identity verified; production access as required.
Post-checks: Jobs flow through SQS and are consumed; scheduler ticks on time; SES sends to real recipients.
Rollback: Scale worker service to 0 and disable the EventBridge rule.
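The every-minute scheduler can be pictured as an EventBridge rule plus an ECS target that overrides the container command. The sketch below builds both as plain dictionaries in the shape the EventBridge PutRule and PutTargets operations accept, as far as I understand them; every name, ARN, subnet, and security group ID is a placeholder, and the result should be compared against the current API reference before use.

```python
import json


def scheduler_rule(name="laravel-schedule-run"):
    """Rule definition: fire once per minute."""
    return {"Name": name, "ScheduleExpression": "rate(1 minute)", "State": "ENABLED"}


def scheduler_target(cluster_arn, task_definition_arn, events_role_arn,
                     subnets, security_groups, container_name="web"):
    """ECS RunTask target that runs the Laravel scheduler in a private subnet."""
    overrides = {
        "containerOverrides": [
            {"name": container_name, "command": ["php", "artisan", "schedule:run"]}
        ]
    }
    return {
        "Id": "schedule-run",
        "Arn": cluster_arn,
        "RoleArn": events_role_arn,
        "EcsParameters": {
            "TaskDefinitionArn": task_definition_arn,
            "TaskCount": 1,
            "LaunchType": "FARGATE",
            "NetworkConfiguration": {
                "awsvpcConfiguration": {
                    "Subnets": subnets,
                    "SecurityGroups": security_groups,
                    "AssignPublicIp": "DISABLED",
                }
            },
        },
        "Input": json.dumps(overrides),  # the target input is passed as a JSON string
    }


def disabled(rule):
    """Rollback helper: same rule with the schedule switched off."""
    return {**rule, "State": "DISABLED"}


if __name__ == "__main__":
    print(json.dumps(scheduler_rule(), indent=2))
```

The disabled helper mirrors the rollback step above: the rule stays defined, so re-enabling it later is a single state change rather than a rebuild.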
Phase 6 — First-Run Initialization & Cutover
Goal: Finalize schema/caches and move traffic safely.
Actions:
- Use ECS Exec into a healthy web task: run php artisan session:table, php artisan migrate --force, and php artisan optimize.
- Update DNS for the client subdomain to point to the ALB.
Validation:
- Sessions persist (DB sessions).
- File uploads land in S3 and are retrievable by the app.
- SES emails deliver successfully.
- Queue jobs process end-to-end.
- EventBridge scheduler fires as expected.
Rollback: Repoint DNS to the previous hosting; ECS services remain available for rapid re-cut.
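The validation list and the rollback rule together form a simple gate: keep traffic on the ALB only if every check passes. A minimal sketch of that gate is below. The check names and the probe callables are hypothetical stand-ins; in practice each probe would be a manual or scripted test of the corresponding item.

```python
from typing import Callable, Dict, List

# One entry per validation item in this phase (names are illustrative).
CUTOVER_CHECKS = [
    "sessions_persist",
    "uploads_round_trip_s3",
    "ses_delivers",
    "queue_jobs_complete",
    "scheduler_fires",
]


def failed_checks(probes: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run every probe and return the names that failed, raised, or were missing.
    An empty list means the cutover can stand; anything else means repoint DNS."""
    failures = []
    for name in CUTOVER_CHECKS:
        probe = probes.get(name)
        try:
            passed = probe is not None and bool(probe())
        except Exception:
            passed = False  # a probe that crashes counts as a failed check
        if not passed:
            failures.append(name)
    return failures


if __name__ == "__main__":
    sample = {name: (lambda: True) for name in CUTOVER_CHECKS}
    print(failed_checks(sample) or "all checks passed")
```

Treating a missing probe as a failure is a deliberate choice here: an unverified item should block the cutover decision rather than pass silently.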
Laravel-Specific Hardening I Baked In
- Strict JSON in Secrets Manager for DB credentials; container env maps to DB_USERNAME/DB_PASSWORD.
- TLS to DB via RDS Proxy with a CA certificate path for the MySQL client.
- Debug packages removed at build; caches cleared/warmed on boot.
- Writable directories (storage/*, bootstrap/cache) enforced by entrypoint.
- Dual health checks (container command + target group path) to make deployments safe.
Observability & Fast Triage
- CloudWatch Logs for each ECS task; I use these first when health checks fail.
- ECS service events to spot deployment and scaling issues.
- Target group health reasons to confirm application-level readiness vs. networking issues.
- RDS Proxy metrics/status for connection pool health.
Cutover Results
- Traffic moved behind the ALB only after targets reported healthy.
- Database connections are mediated by RDS Proxy with TLS required.
- Background work and scheduler run outside request/response cycles, improving latency consistency.
- Secrets are out of code and centrally rotated in Secrets Manager.
- The team received a GUI checklist, architecture diagram, and runbook to operate day-2 with confidence.
Deliverables
- Click-through checklist (GUI paths and exact field values; safe to audit and repeat).
- Architecture diagram reflecting the production topology and SG chaining.
- Runbook that builds from a blank AWS account to production, with rollback and teardown notes.
Lessons Learned
- Getting Security Groups right early prevents painful mid-migration rewiring.
- Enforcing TLS at RDS Proxy from day one avoids risky “we’ll secure it later” drift.
- Keeping the build GUI-first creates an accessible audit trail and an easy handoff to non-IaC teams; nothing prevents codifying later.
- Treating every release as blue/green by default (health checks + circuit breaker) makes rollbacks routine instead of heroic.
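As a closing illustration of the last point, the circuit breaker and the "previous task definition" rollback reduce to very little configuration. The sketch below shows the deployment settings as they appear in an ECS service definition, plus a small helper that derives the prior revision from a task definition identifier. The percentages and the sample ARN are assumptions, and the helper assumes the prior revision is still registered and is the one that was last known good.

```python
def deployment_configuration():
    """Service deployment settings: stop and roll back automatically when
    new tasks fail to become healthy."""
    return {
        "deploymentCircuitBreaker": {"enable": True, "rollback": True},
        "minimumHealthyPercent": 100,  # assumed: never drop below current capacity
        "maximumPercent": 200,         # assumed: allow a full parallel set of new tasks
    }


def previous_revision(task_definition):
    """Map 'family:42' (or a full ARN ending in ':42') to the same identifier
    with revision 41. Raises ValueError if there is no earlier revision."""
    prefix, separator, revision = task_definition.rpartition(":")
    if not separator or not revision.isdigit():
        raise ValueError(f"no revision number in {task_definition!r}")
    number = int(revision)
    if number <= 1:
        raise ValueError("revision 1 has no predecessor")
    return f"{prefix}:{number - 1}"


if __name__ == "__main__":
    arn = "arn:aws:ecs:us-east-1:111122223333:task-definition/web:42"  # placeholder
    print(previous_revision(arn))
```

For a manual rollback, the derived identifier is what gets selected in the service's task definition field before forcing a new deployment.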
