# Tier 1 Production Plan

> This document sequences the work to take GuestGuard from feature-complete demo
> to a product that can be sold to event hosts.
>
> Read `CLAUDE.md` for project conventions and the full 4-tier roadmap.
> This document is purely the **what** and the **in what order** for Tier 1.

---

## TL;DR

Eight work blocks (A–H), grouped into three waves that respect dependencies.
Estimated effort: **~8–10 weeks for one engineer**, **~5–6 weeks for two**.

```
Wave 1 (foundation, must finish before anything else):
  A. Authentication ──┐
                      ├── B. Authorisation
                      └── C. Rate limiting (parallel)

Wave 2 (depends on auth being real):
  D. Notifications ───┐
                      ├── E. CSV import (parallel)
                      └── F. Billing

Wave 3 (ops + legal, can run alongside Wave 2):
  G. Backups & DR
  H. Privacy compliance
```

---

## Block A — Authentication

> **Why first**: every other Tier 1 item depends on knowing who's calling.

### Goal

Replace the `useHost()` localStorage bootstrap with real auth: email + password,
verified emails, password reset, JWT-based sessions with refresh tokens. The
existing `users` table is reused.

### Schema changes

Migration `0003_auth.up.sql`:

```sql
ALTER TABLE users
    ADD COLUMN password_hash     TEXT,  -- bcrypt; nullable for OAuth-only users later
    ADD COLUMN email_verified    BOOLEAN NOT NULL DEFAULT FALSE,
    ADD COLUMN email_verified_at TIMESTAMPTZ;

CREATE TABLE email_verification_tokens (
    token_hash  TEXT PRIMARY KEY,
    user_id     UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    expires_at  TIMESTAMPTZ NOT NULL,
    consumed_at TIMESTAMPTZ
);

CREATE TABLE password_reset_tokens (
    token_hash  TEXT PRIMARY KEY,
    user_id     UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    expires_at  TIMESTAMPTZ NOT NULL,
    consumed_at TIMESTAMPTZ
);

CREATE TABLE refresh_tokens (
    token_hash  TEXT PRIMARY KEY,
    user_id     UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    expires_at  TIMESTAMPTZ NOT NULL,
    revoked_at  TIMESTAMPTZ,
    user_agent  TEXT,
    ip_address  INET,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_refresh_tokens_user ON refresh_tokens(user_id) WHERE revoked_at IS NULL;
```

### Backend (`internal/auth`, `internal/api`)

- New `auth.PasswordHasher` (bcrypt via `golang.org/x/crypto/bcrypt`, cost 12)
- New `auth.JWTSigner` issuing access tokens (15min TTL) signed with `GG_JWT_SECRET`
- New repos for verification, reset, and refresh tokens (token *hashes* stored, never raw)
- New handlers:
  - `POST /auth/signup` — creates unverified user, emits verification email
  - `POST /auth/login` — verifies password, requires verified email, returns access + refresh
  - `POST /auth/refresh` — rotates refresh token (single-use), returns new pair
  - `POST /auth/logout` — revokes the refresh token
  - `POST /auth/verify-email` — consumes verification token, sets `email_verified`
  - `POST /auth/forgot-password` — emits reset email (no-op if email unknown — don't leak existence)
  - `POST /auth/reset-password` — consumes reset token, updates `password_hash`, revokes all refresh tokens
- New middleware `requireAuth` that pulls `Authorization: Bearer …`, validates, attaches `userID` to request
  context
- Delete `POST /users` (the demo bootstrap)

### Frontend

- Delete `useHost()` composable; replace with `useAuth()` (access token in memory, refresh token in httpOnly cookie set by server)
- New pages: `/login`, `/signup`, `/verify-email`, `/forgot-password`, `/reset-password/:token`
- `useApi()` composable adds `Authorization` header; on 401, calls `/auth/refresh`; on refresh failure, redirects to `/login`
- Dashboard route guard: redirect to `/login` if no session
- Sign-out button calls `/auth/logout`, clears state, redirects to `/`

### Notifications dependency

Verification + reset emails need real email delivery. Until Block D lands,
**use a stub `EmailSender` that prints the link to the API server logs** so
developers and the test environment can complete the flow without a Twilio/SES
account. Document this in the block's README.

### Tests

- Unit: password hashing round-trip, JWT signing + parsing with expiry, token-hash storage
- Integration: signup → verify-email → login → refresh → use-protected-endpoint → logout
- Integration: forgot-password → reset-password → old refresh tokens revoked
- Security: rate-limit signup (deferred to Block C, document the dependency)

### Definition of done

- [ ] Migration `0003_auth.up.sql` applied
- [ ] All `/auth/*` endpoints return appropriate status codes (verified against `httpstatus.dev` conventions)
- [ ] Refresh-token rotation enforced (reusing a refresh token revokes the family — token-replay defence)
- [ ] Email verification mandatory before first login
- [ ] Frontend has working signup → verify → login → dashboard flow end-to-end
- [ ] `useHost()` and `POST /users` removed from the codebase
- [ ] No localhost-only assumptions in code paths

### Effort: ~2 weeks for one engineer.

---

## Block B — Authorisation

> **Why now**: same PR cluster as Block A. Adding new endpoints without authz
> bakes in security debt.

### Goal

Every host-facing endpoint enforces "this caller can only touch their own data".
Audit the current API surface and add authz checks to each endpoint.

### Schema changes

None — `events.host_id` already exists. We just need to start trusting the
session-derived `userID` instead of the query parameter.

### Backend

- Apply `requireAuth` middleware to every route except: `/health`, `/auth/*`, the guest-facing `/access/{token}`, `/rsvp/{token}`, and the WS endpoint (note: WS auth needs its own design — see open questions)
- For each event-scoped endpoint, derive `hostID` from session and reject if the event's `host_id` doesn't match:
  - `GET /events` → list only events where `host_id = session.userID`
  - `GET /events/{id}` → 404 (not 403, to avoid leaking existence) if owner mismatch
  - All `PATCH/DELETE /events/{id}` → same
  - `POST /events/{id}/guests`, `GET /events/{id}/guests`, `POST /events/{id}/guests/{guest_id}/tokens`, `GET /events/{id}/activity` → same
- Remove the `?host_id=...` query parameter from `GET /events` — derive from session
- Update the integration test to authenticate first

### Frontend

- All host-facing API calls include the access token (already handled if `useApi()` was updated in Block A)
- Update `GET /events` calls to drop the `host_id` query param

### WebSocket auth (open question)

The WS endpoint `/ws/events/{id}` is currently anonymous. Options:

1. **Pass JWT as query param** (`?token=...`) — browsers can't send `Authorization` headers on WS handshake
2. **Cookie-based session** (httpOnly cookie set by `/auth/login`)
3. **Short-lived WS ticket**: client calls `POST /auth/ws-ticket` (auth required), receives a single-use 60s ticket, passes as `?ticket=...` to the WS handshake

Recommend option 3: the only credential that ever appears in a URL is a ticket
that is single-use and expires after 60 seconds. Document the choice.

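
The ticket flow in option 3 can be sketched in a few lines. This is a minimal in-memory illustration, not the project's implementation: `TicketStore`, `Issue`, and `Redeem` are hypothetical names, binding the ticket to an event ID is an assumption, and a multi-instance deployment would more likely keep tickets in Redis. Time is passed in as Unix seconds only to keep the sketch easy to test.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"errors"
	"sync"
)

// ticketTTLSeconds mirrors the 60s lifetime proposed in option 3.
const ticketTTLSeconds = 60

// ErrInvalidTicket covers unknown, reused, expired, and mismatched tickets.
var ErrInvalidTicket = errors.New("invalid or expired ticket")

// wsTicket records who a ticket was issued to and when it expires.
type wsTicket struct {
	userID    string
	eventID   string // assumption: a ticket is bound to one event
	expiresAt int64  // Unix seconds
}

// TicketStore is a hypothetical in-memory store keyed by ticket hash,
// so the raw ticket is never kept server-side.
type TicketStore struct {
	mu      sync.Mutex
	tickets map[string]wsTicket
}

func NewTicketStore() *TicketStore {
	return &TicketStore{tickets: make(map[string]wsTicket)}
}

func hashTicket(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// Issue creates a random ticket for an already-authenticated user.
func (s *TicketStore) Issue(userID, eventID string, nowUnix int64) (string, error) {
	buf := make([]byte, 32)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	raw := hex.EncodeToString(buf)

	s.mu.Lock()
	defer s.mu.Unlock()
	s.tickets[hashTicket(raw)] = wsTicket{
		userID:    userID,
		eventID:   eventID,
		expiresAt: nowUnix + ticketTTLSeconds,
	}
	return raw, nil
}

// Redeem deletes the ticket on first lookup, whether or not it turns out to
// be valid, which is what makes it single-use. It returns the user ID the
// ticket was issued to.
func (s *TicketStore) Redeem(raw, eventID string, nowUnix int64) (string, error) {
	key := hashTicket(raw)

	s.mu.Lock()
	t, ok := s.tickets[key]
	delete(s.tickets, key)
	s.mu.Unlock()

	if !ok || nowUnix >= t.expiresAt || t.eventID != eventID {
		return "", ErrInvalidTicket
	}
	return t.userID, nil
}
```

In this sketch the `POST /auth/ws-ticket` handler would call `Issue` behind `requireAuth`, and the WS handshake would call `Redeem` with the `?ticket=...` value before upgrading. The handshake still needs the same event-ownership check as the REST endpoints.
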
### Tests

- Unit: authz middleware accepts/rejects/redirects appropriately
- Integration: host A cannot list, read, modify host B's events (verify 404)
- Integration: WS ticket flow works end-to-end

### Definition of done

- [ ] Every host route requires a valid session
- [ ] Cross-tenant data access returns 404, not 403 (don't leak existence)
- [ ] WS authentication implemented (option 3 recommended)
- [ ] `?host_id=...` query parameter removed everywhere
- [ ] Pen-test pass: try to read/modify another user's event with their event_id but your own token

### Effort: ~3–4 days, assuming Block A laid the middleware groundwork.

---

## Block C — Rate limiting + abuse controls

> **Why now**: small block, no dependency on auth other than knowing the
> `userID` for per-user limits. Redis is already provisioned but unused —
> this finally puts it to work.

### Goal

Stop trivial abuse: someone scripting `POST /auth/signup` 10k times,
brute-forcing the RSVP page, spamming token issuance, etc.

### Schema changes

None — Redis only.

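
For orientation, here is a minimal sketch of the sliding-window technique this block calls for. It runs in memory rather than against Redis, so it shows only the counting logic, not the atomicity concerns of a shared store. `SlidingWindow` and `Allow` are hypothetical names, and time is passed in as Unix seconds to keep the sketch testable.

```go
package main

import "sync"

// SlidingWindow is a hypothetical in-memory limiter. It keeps the timestamps
// of recent accepted hits per key and allows a request only if fewer than
// limit of them fall inside the trailing window.
// Note: keys are never evicted here; a real limiter would expire idle keys.
type SlidingWindow struct {
	mu            sync.Mutex
	limit         int
	windowSeconds int64
	hits          map[string][]int64
}

func NewSlidingWindow(limit int, windowSeconds int64) *SlidingWindow {
	return &SlidingWindow{
		limit:         limit,
		windowSeconds: windowSeconds,
		hits:          make(map[string][]int64),
	}
}

// Allow reports whether a request for key at nowUnix is within the limit.
// When the request is blocked, retryAfter is the number of seconds until the
// oldest counted hit leaves the window (a candidate Retry-After value).
func (l *SlidingWindow) Allow(key string, nowUnix int64) (ok bool, retryAfter int64) {
	l.mu.Lock()
	defer l.mu.Unlock()

	if l.limit <= 0 {
		return false, l.windowSeconds
	}

	// Drop hits that have fallen out of the window (in-place filter).
	cutoff := nowUnix - l.windowSeconds
	kept := l.hits[key][:0]
	for _, ts := range l.hits[key] {
		if ts > cutoff {
			kept = append(kept, ts)
		}
	}

	if len(kept) >= l.limit {
		l.hits[key] = kept
		return false, kept[0] + l.windowSeconds - nowUnix
	}

	// Only accepted requests are recorded.
	l.hits[key] = append(kept, nowUnix)
	return true, 0
}
```

With `NewSlidingWindow(5, 3600)` keyed by IP, the sixth signup inside an hour is rejected, and the caller can return `429` with the computed `Retry-After`.
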
### Backend

- New `internal/ratelimit` package with a sliding-window limiter backed by Redis. Plain `INCR` + `EXPIRE` gives a fixed-window counter, which may be enough for these limits; a true sliding window needs a sorted set of timestamps or a Lua script. Either way, the update must be atomic.
- Apply per-route, per-key limits via middleware:

| Endpoint | Key | Limit |
|---|---|---|
| `POST /auth/signup` | IP | 5 / hour |
| `POST /auth/login` | IP + email | 10 / 5 min (lock on consecutive failures) |
| `POST /auth/forgot-password` | IP + email | 3 / hour |
| `POST /rsvp/{token}` | token | 10 / hour |
| `GET /access/{token}` | token | 60 / hour |
| `POST /events` | userID | 20 / day |
| `POST /events/{id}/guests` | userID | 1000 / day |
| `POST /events/{id}/guests/{guest_id}/tokens` | userID | 500 / day |

- Return `429 Too Many Requests` with `Retry-After` header on limit
- CAPTCHA (hCaptcha or Cloudflare Turnstile) on `POST /auth/signup` and `POST /auth/forgot-password`
- Lockout: after 5 consecutive failed logins, require password reset to unlock

### Frontend

- Render CAPTCHA widget on signup + forgot-password forms
- On `429`, show "You're going too fast — please try again in a minute" instead of generic error

### Tests

- Unit: limiter increments correctly, expires at window boundary
- Integration: 6th signup from the same IP within an hour returns 429
- Integration: CAPTCHA token validated server-side before processing signup

### Definition of done

- [ ] Limiter updates are atomic (Redis `MULTI/EXEC` or a Lua script)
- [ ] All endpoints in the table above are limited
- [ ] CAPTCHA wired on signup + forgot-password
- [ ] Lockout flow tested end-to-end
- [ ] Limiter exposes Prometheus metrics (already implicit — `ratelimit_block_total` per endpoint)

### Effort: ~3–4 days.

---

## Block D — Real notifications

> **Why now**: Block A's email verification + password reset need real
> delivery. Don't ship auth to production with a logger stub.

### Goal

Replace `LogSender` in `internal/notification` with real Twilio + SES adapters.
Branded HTML email templates. Bounce + complaint handling. Unsubscribe.

### Schema changes

```sql
-- CITEXT (used below) requires the citext extension.
CREATE EXTENSION IF NOT EXISTS citext;

ALTER TABLE notifications
    ADD COLUMN provider_message_id TEXT,
    ADD COLUMN bounce_type TEXT,  -- 'permanent' | 'transient' | NULL
    ADD COLUMN complained BOOLEAN NOT NULL DEFAULT FALSE,
    ADD COLUMN IF NOT EXISTS delivered_at TIMESTAMPTZ;  -- may already exist; confirm before applying

CREATE TABLE unsubscribes (
    email      CITEXT PRIMARY KEY,
    reason     TEXT,
    created_at TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

### Backend (`internal/notification`, `cmd/notifier`)

- `TwilioSender` (real `github.com/twilio/twilio-go` client)
  - Retry with exponential backoff: 1s, 5s, 30s, 5m, 30m
  - Permanent failure codes mapped to `bounce_type = 'permanent'`
  - Cost tracking: log message segments per send
- `SESSender` (real `github.com/aws/aws-sdk-go-v2/service/sesv2`)
  - HTML + plaintext multipart
  - List-Unsubscribe header on every email
  - Configuration set with SNS topic for bounces + complaints
- HTML templates (`internal/notification/templates/*.tmpl`):
  - `invitation.html` — "You're invited to {event_name}"
  - `confirmation.html` — RSVP recorded
  - `verification.html` — verify your email
  - `reset.html` — reset your password
  - `reminder.html` — 1-day-before reminder
- Webhook endpoints (in `internal/api`, public, signed by provider):
  - `POST /webhooks/twilio/status` — Twilio message status callbacks
  - `POST /webhooks/ses/notifications` — SNS-delivered bounce/complaint notifications
  - Both verify signatures before trusting the payload
- Check `unsubscribes` table before sending any email; refuse silently if present

### Frontend

- Unsubscribe page at `/unsubscribe/:token` — token signed so we know who's unsubscribing
- Host setting: from-name + reply-to email per event (Tier 2 polish, defer if rushed)

### Configuration

Required env vars (add to `internal/config`):

```
GG_TWILIO_ACCOUNT_SID
GG_TWILIO_AUTH_TOKEN
GG_TWILIO_FROM_NUMBER
GG_SES_REGION
GG_SES_FROM_EMAIL          # must be a verified identity
GG_SES_CONFIGURATION_SET
GG_PUBLIC_BASE_URL         # for unsubscribe + invitation links in templates
```

### Tests

- Unit: template rendering produces expected HTML and text
- Unit: retry logic backs off correctly, gives up after N attempts
- Integration (with stubs): bounce webhook marks notification, blocks future sends to that email
- Manual: actually send to a test inbox in a staging Twilio + SES account

### Definition of done

- [ ] Email verification email arrives in a real inbox (Gmail, Outlook)
- [ ] SMS arrives on a real phone
- [ ] DKIM + SPF + DMARC verified for sender domain (this is human-owned infra setup)
- [ ] Bounces and complaints recorded in `notifications` + `unsubscribes`
- [ ] Unsubscribe link in every email; clicking it adds the address to the suppression list
- [ ] Templates render correctly in Gmail web, Outlook web, iOS Mail, Apple Mail (litmus.com or equivalent)

### Effort: ~1.5–2 weeks (mostly template polish + deliverability setup).

---

## Block E — CSV guest import

> **Why now**: highest user-visible impact of any Tier 1 item, no dependency
> on other blocks except Block B's authz. Marketing already promises it.

### Goal

A host can drag a `.csv` onto the dashboard and have hundreds of guests added
in seconds. Validation surfaces problems before commit. Dedup is automatic.

### Schema changes

None — uses existing `guests` table.

### Backend

- `POST /events/{id}/guests/import` — `multipart/form-data`, single CSV file
  - Header detection: tolerant of `name|Name|guest_name`, `email|Email`, `phone|Phone|telephone`, `plus_ones|+1|plusones`
  - Validation: name required, email format if present, phone E.164-ish if present, plus_ones non-negative integer
  - Dedup: skip rows whose email matches an existing guest on the same event
  - Returns: `{ added: int, skipped: int, errors: [{ row: int, reason: string }] }`
  - Atomic per-batch: either all valid rows commit or none (transaction)
  - Limit: 5,000 rows per import
- `POST /events/{id}/guests/import/preview` — same payload, but doesn't write; returns parsed rows for confirm UI
- Sample CSV download: `GET /events/{id}/guests/import/template` — returns a `.csv` with example rows

### Frontend

- New section on event detail page: "Import guests from a spreadsheet"
- Drag-drop zone (use `vue-file-pond` or native HTML5 drag-drop)
- After upload: hit `/preview`, show a sortable table of rows with row-level errors highlighted
- "Looks good — import" button calls `/import`
- Show success summary: "Imported 247 guests. Skipped 3 duplicates. 2 rows had errors."
- Help text linking to the template CSV

### Tests

- Unit: header detection accepts the listed variants and rejects unknown columns gracefully
- Unit: validation rejects bad emails, accepts blank emails (phone-only guests valid)
- Integration: dedup leaves existing guests untouched
- Integration: rolling back on mid-batch error doesn't leave partial state

### Definition of done

- [ ] Sample CSV downloadable from the import UI
- [ ] Preview always shown before commit
- [ ] Errors are row-level, not "the whole file is invalid"
- [ ] Encoding: handles UTF-8 with BOM (Excel exports), UTF-16 (Mac Numbers exports)
- [ ] File-size cap: 1MB / 5,000 rows enforced server-side
- [ ] No memory blow-up: parse rows as a stream, not into a `[]Row` of arbitrary size

### Effort: ~3–5 days.

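
The header detection listed under this block's Backend section can be sketched as an alias table plus one pass over the header row. This is an illustration under stated assumptions, not the project's code: `MapHeaders` and `headerAliases` are hypothetical names, matching is assumed to be case-insensitive and whitespace-tolerant, and unknown columns are reported rather than rejected. The sketch also strips a UTF-8 BOM from the first header, because Go's `encoding/csv` reader does not do that on its own.

```go
package main

import (
	"errors"
	"strings"
)

// headerAliases maps accepted header spellings (lower-cased) to the
// canonical column names used by the import.
var headerAliases = map[string]string{
	"name":       "name",
	"guest_name": "name",
	"email":      "email",
	"phone":      "phone",
	"telephone":  "phone",
	"plus_ones":  "plus_ones",
	"+1":         "plus_ones",
	"plusones":   "plus_ones",
}

// ErrNoNameColumn is returned when no header maps to the required name column.
var ErrNoNameColumn = errors.New("csv has no name column")

// MapHeaders returns canonical column name -> column index, plus the list of
// headers it did not recognise so the preview can surface them.
func MapHeaders(header []string) (map[string]int, []string, error) {
	cols := make(map[string]int)
	var unknown []string

	for i, h := range header {
		if i == 0 {
			// Excel exports often prefix the file with a UTF-8 BOM.
			h = strings.TrimPrefix(h, "\ufeff")
		}
		key := strings.ToLower(strings.TrimSpace(h))
		canon, ok := headerAliases[key]
		if !ok {
			unknown = append(unknown, h)
			continue
		}
		if _, seen := cols[canon]; !seen {
			cols[canon] = i // assumption: the first matching column wins
		}
	}

	if _, ok := cols["name"]; !ok {
		return nil, unknown, ErrNoNameColumn
	}
	return cols, unknown, nil
}
```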

---

## Block F — Billing

> **Why last in Wave 2**: depends on real auth (Block A), real notifications
> (Block D, for receipts), and a stable data model. Don't build until those
> are solid.

### Goal

Stripe-based subscriptions. Free tier with hard limits. Paid tiers unlock
higher limits. Failed-payment dunning. Self-serve upgrade + downgrade.

### Pricing model (decision required — see open questions)

Recommended starter pricing (placeholder, validate with target market):

| Tier | Price | Events/mo | Guests/event | SMS/mo | Branding |
|---|---|---|---|---|---|
| Free | $0 | 1 | 50 | 0 (email only) | No |
| Personal | $19/event | 1 per purchase | 500 | 100 | Logo |
| Pro | $49/mo | 10 | 1,000 | 1,000 | Full |
| Business | $199/mo | Unlimited | 5,000 | 5,000 | + custom domain |

### Schema changes

```sql
CREATE TABLE subscriptions (
    id                     UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    user_id                UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    stripe_customer_id     TEXT NOT NULL,
    stripe_subscription_id TEXT,
    tier                   TEXT NOT NULL,  -- 'free' | 'personal' | 'pro' | 'business'
    status                 TEXT NOT NULL,  -- 'active' | 'past_due' | 'canceled' | 'incomplete'
    current_period_end     TIMESTAMPTZ,
    cancel_at_period_end   BOOLEAN NOT NULL DEFAULT FALSE,
    created_at             TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at             TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX ON subscriptions(user_id) WHERE status = 'active';

CREATE TABLE usage_counters (
    user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    period_start DATE NOT NULL,
    events_count INT NOT NULL DEFAULT 0,
    sms_count    INT NOT NULL DEFAULT 0,
    PRIMARY KEY (user_id, period_start)
);
```

### Backend

- `internal/billing` package wrapping the Stripe SDK
- `POST /billing/checkout-session` — returns a Stripe Checkout URL for the requested tier
- `POST /billing/portal` — returns a Stripe Customer Portal URL
- `POST /webhooks/stripe` — signature-verified, handles:
  - `customer.subscription.created` / `.updated` / `.deleted` → upsert into
    `subscriptions`
  - `invoice.payment_failed` → trigger dunning email (Block D)
  - `invoice.payment_succeeded` → clear past-due state
- Enforcement: middleware checks usage against tier limits before allowing `POST /events`, `POST /events/{id}/guests`, SMS triggers. Returns `402 Payment Required` with the upgrade URL on limit.

### Frontend

- `/billing` page: current plan, usage bars, upgrade/downgrade buttons
- On `402`, show modal: "You've hit your plan limit. Upgrade?"
- Stripe Checkout opens in a new tab; on return, poll subscription state until updated by webhook

### Tests

- Integration: free user can create 1 event, second fails with 402
- Integration: webhook signature verification rejects forged payloads
- Integration: cancellation flow keeps access until period end

### Definition of done

- [ ] Stripe in test mode end-to-end working
- [ ] Webhook signatures verified
- [ ] Usage counters reset monthly (cron or compute on-demand)
- [ ] Receipts emailed via Stripe (default behaviour, just confirm enabled)
- [ ] Refund policy documented (referenced from billing page)

### Effort: ~2 weeks.

---

## Block G — Backups & disaster recovery

> **Mostly infra-owned**, but the application side has documentation work.

### Claude's scope

- All migrations have a `*.down.sql` that's been tested locally
- New `docs/RUNBOOK_RESTORE.md` documenting the restore procedure step-by-step
- Confirm Postgres connection string env var supports the recovery instance (no hardcoded primary-only hostnames)
- Optional: a `cmd/restore-verify` tool that runs after a restore to assert schema invariants (guest counts ≈ rsvp counts, no orphaned tokens, etc.)

### Human / infra scope

- `pg_basebackup` + WAL archiving to S3
- Daily logical dump as a secondary safety net
- Cross-region replication of the S3 bucket
- Monthly restore drill scheduled
- Documented RTO (e.g. 1 hour) and RPO (e.g.
  5 minutes)

### Definition of done

- [ ] Every existing migration has a tested down migration
- [ ] `docs/RUNBOOK_RESTORE.md` exists and a fresh engineer could follow it
- [ ] First restore drill completed successfully

### Effort: ~2 days for the application-side work.

---

## Block H — Privacy compliance

> Legal documents are human-owned. Application-level support is Claude scope.

### Claude's scope

- `GET /me/data-export` — produces a JSON document with every record (user, events, guests, tokens, RSVPs, access_logs, notifications) belonging to the authenticated user. The export is long-running, so handle it asynchronously: enqueue a job, then email the user a download link.
- `DELETE /me` — cascade-deletes the user and everything tied to them. Soft-delete first (set `deleted_at`), hard-delete on a cron after 30 days to honour any in-flight legal holds.
- `DELETE /events/{id}/guests/{guest_id}` (host-triggered) — already exists in spirit; add a "forget this guest" action that removes RSVP/access-log rows but keeps the aggregate counter for the event.
- Data retention: automated nightly job to soft-delete events whose `event_date` is older than 18 months (configurable per host once Tier 2).
- Add `privacy_policy_accepted_at` and `terms_accepted_at` columns to `users`; block first login until both are accepted.

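
The two time-based rules in the list above (a 30-day grace period before hard delete, and 18-month event retention) reduce to date comparisons. The sketch below is a hedged illustration of the boundary behaviour only: `DueForHardDelete`, `PastRetention`, and `mustDate` are hypothetical names, the boundaries are assumed to be inclusive as commented, and the real cron jobs would more likely express the same conditions in SQL.

```go
package main

import "time"

const (
	hardDeleteGraceDays = 30 // grace period from the DELETE /me rule
	retentionMonths     = 18 // retention window from the data-retention rule
)

// mustDate parses a YYYY-MM-DD date in UTC and panics on bad input.
// It exists only to keep the example short.
func mustDate(s string) time.Time {
	t, err := time.Parse("2006-01-02", s)
	if err != nil {
		panic(err)
	}
	return t
}

// DueForHardDelete reports whether a soft-deleted row has reached the end of
// its grace period. Rows that were never soft-deleted (nil) are never due.
func DueForHardDelete(deletedAt *time.Time, now time.Time) bool {
	if deletedAt == nil {
		return false
	}
	// Assumption: the row becomes due exactly 30 days after deleted_at.
	return !now.Before(deletedAt.AddDate(0, 0, hardDeleteGraceDays))
}

// PastRetention reports whether an event date is strictly older than the
// retention window measured back from now.
func PastRetention(eventDate, now time.Time) bool {
	return eventDate.Before(now.AddDate(0, -retentionMonths, 0))
}
```

Passing `now` in explicitly keeps both checks deterministic in tests, which matters for the "Retention cron job tested" item.
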
### Human / legal scope

- Privacy policy + ToS drafted by a lawyer
- DPAs signed with Twilio, SES, Stripe, MaxMind, and any other subprocessor
- Public privacy page at `/privacy`, ToS at `/terms`
- Cookie banner (only required if analytics are added; currently we have none)
- GDPR Article 30 record of processing activities

### Definition of done

- [ ] `GET /me/data-export` produces a complete, parseable JSON dump
- [ ] `DELETE /me` cascades correctly with no orphan rows (verified by FK constraints)
- [ ] Privacy + ToS pages live and linked from the footer + signup form
- [ ] Acceptance enforced on first login after the launch date
- [ ] Retention cron job tested

### Effort: ~3–4 days for the application work; legal work runs in parallel.

---

## Cross-cutting concerns

These touch most blocks above; bake them in as you go, not as a separate pass.

### Logging + auditing

Every state-changing endpoint logs: `userID`, `action`, `target_id`, `result`,
`request_id`. Use `slog` with a correlation ID middleware. Critical for
post-incident forensics.

### Observability lite (Tier 3 scope, but minimum viable for launch)

- Prometheus `/metrics` endpoint on the API exposing: request rate by endpoint, latency percentiles, 4xx/5xx counts, `ratelimit_block_total`
- Sentry (or self-hosted GlitchTip) for unhandled errors, with release tagging

### Feature flags

Lightweight `feature_flags` table or env-var driven (no LaunchDarkly yet).
Useful for rolling out Block F's billing without exposing it to all users at
once.

---

## Open questions

Resolve before starting:

1. **Final pricing tiers** — the table in Block F is a placeholder. Confirm with the target market (interview 10 wedding planners, 10 corporate event managers).
2. **Email provider** — SES vs Postmark vs SendGrid. SES is cheapest but has the harshest deliverability ramp; Postmark is best for transactional but pricier.
3. **2FA at launch or v1.1?** — Recommend v1.1; one less moving piece on the launch path.
4. **Custom domain for RSVP pages at launch or v1.1?** — Recommend v1.1 (Tier 2). Adds DNS + cert complexity.
5. **WebSocket auth mechanism** — Recommend Block B option 3 (short-lived ticket).
6. **EU data residency at launch?** — If targeting EU customers, this becomes Tier 1 (separate EU deployment). Otherwise defer to Tier 4.

---

## Sequencing summary table

| Wave | Block | Depends on | Effort (1 eng) | Can parallelise with |
|---|---|---|---|---|
| 1 | A. Auth | — | 2w | — |
| 1 | B. Authz | A | 4d | C |
| 1 | C. Rate limiting | A (for `userID`) | 4d | B |
| 2 | D. Notifications | A | 2w | E |
| 2 | E. CSV import | B | 4d | D, F |
| 2 | F. Billing | A, D | 2w | E |
| 3 | G. Backups | — (infra) | 2d (Claude) | any |
| 3 | H. Privacy | A | 3d | any |

**One engineer, sequential**: ~9 weeks.
**Two engineers, parallel-where-possible**: ~5.5 weeks.

---

## What's *not* in Tier 1 (deliberate)

These are tempting but are Tier 2:

- Editable RSVPs (guests can change response after submitting)
- Multi-host collaborators
- Event branding (logo, colours, custom domain)
- Day-of QR check-in
- Better fraud-engine thresholds (false-positive feedback loop)
- Calendar integration
- Auto-reminders (1-day before, etc.)
- Mobile push notifications

Ship Tier 1 first. The launch story is "personal invitations + live tracking +
quiet fraud detection + works reliably + you can pay us money". Everything
else is the second release.