# Tier 1 Production Plan

> This document sequences the work to take GuestGuard from feature-complete demo
> to a product that can be sold to event hosts.
>
> Read `CLAUDE.md` for project conventions and the full 4-tier roadmap.
> This document is purely the **what** and the **in what order** for Tier 1.

---

## TL;DR

Eight work blocks (A–H), grouped into three waves that respect dependencies.
Estimated effort: **~8–10 weeks for one engineer**, **~5–6 weeks for two**.

```
Wave 1 (foundation, must finish before anything else):
  A. Authentication ──┐
                      ├── B. Authorisation
                      └── C. Rate limiting (parallel)

Wave 2 (depends on auth being real):
  D. Notifications ───┐
                      ├── E. CSV import (parallel)
                      └── F. Billing

Wave 3 (ops + legal, can run alongside Wave 2):
  G. Backups & DR
  H. Privacy compliance
```

---

## Block A — Authentication

> **Why first**: every other Tier 1 item depends on knowing who's calling.

### Goal

Replace the `useHost()` localStorage bootstrap with real auth: email + password,
verified emails, password reset, JWT-based sessions with refresh tokens. The
existing `users` table is reused.

### Schema changes

Migration `0003_auth.up.sql`:

```sql
ALTER TABLE users
  ADD COLUMN password_hash    TEXT,             -- bcrypt; nullable for OAuth-only users later
  ADD COLUMN email_verified   BOOLEAN NOT NULL DEFAULT FALSE,
  ADD COLUMN email_verified_at TIMESTAMPTZ;

CREATE TABLE email_verification_tokens (
  token_hash   TEXT PRIMARY KEY,
  user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  expires_at   TIMESTAMPTZ NOT NULL,
  consumed_at  TIMESTAMPTZ
);

CREATE TABLE password_reset_tokens (
  token_hash   TEXT PRIMARY KEY,
  user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  expires_at   TIMESTAMPTZ NOT NULL,
  consumed_at  TIMESTAMPTZ
);

CREATE TABLE refresh_tokens (
  token_hash   TEXT PRIMARY KEY,
  user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  expires_at   TIMESTAMPTZ NOT NULL,
  revoked_at   TIMESTAMPTZ,
  user_agent   TEXT,
  ip_address   INET,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_refresh_tokens_user ON refresh_tokens(user_id) WHERE revoked_at IS NULL;
```

### Backend (`internal/auth`, `internal/api`)

- New `auth.PasswordHasher` (bcrypt via `golang.org/x/crypto/bcrypt`, cost 12)
- New `auth.JWTSigner` issuing access tokens (15min TTL) signed with `GG_JWT_SECRET`
- New repos for verification, reset, and refresh tokens (token *hashes* stored, never raw)
- New handlers:
  - `POST /auth/signup` — creates unverified user, emits verification email
  - `POST /auth/login` — verifies password, requires verified email, returns access + refresh
  - `POST /auth/refresh` — rotates refresh token (single-use), returns new pair
  - `POST /auth/logout` — revokes the refresh token
  - `POST /auth/verify-email` — consumes verification token, sets `email_verified`
  - `POST /auth/forgot-password` — emits reset email (no-op if email unknown — don't leak existence)
  - `POST /auth/reset-password` — consumes reset token, updates `password_hash`, revokes all refresh tokens
- New middleware `requireAuth` that pulls `Authorization: Bearer …`, validates, attaches `userID` to request context
- Delete `POST /users` (the demo bootstrap)
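
A minimal sketch of the "hashes stored, never raw" rule for the verification, reset, and refresh tokens. It assumes a token is 32 random bytes and that `token_hash` holds a hex SHA-256 digest; the function names are illustrative and not existing code. A fast hash is reasonable here only because the tokens are high-entropy random values, unlike passwords, which go through bcrypt.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// newToken returns a random URL-safe token (for the email link or cookie)
// and the SHA-256 hex digest that would be stored in the token_hash column.
func newToken() (raw, hash string, err error) {
	buf := make([]byte, 32) // 256 bits of entropy (assumed size)
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	raw = base64.RawURLEncoding.EncodeToString(buf)
	return raw, hashToken(raw), nil
}

// hashToken is deterministic, so looking a token up is a primary-key read
// on token_hash.
func hashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// matches compares a presented token with a stored hash in constant time.
func matches(presented, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(hashToken(presented)), []byte(storedHash)) == 1
}

func main() {
	raw, hash, err := newToken()
	if err != nil {
		panic(err)
	}
	fmt.Println("send to user:", raw)
	fmt.Println("store in DB: ", hash)
	fmt.Println("valid:", matches(raw, hash))
}
```

Only `hash` is ever written to the database, so a leaked table does not yield usable tokens.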

### Frontend

- Delete `useHost()` composable; replace with `useAuth()` (access token in memory, refresh token in httpOnly cookie set by server)
- New pages: `/login`, `/signup`, `/verify-email`, `/forgot-password`, `/reset-password/:token`
- `useApi()` composable adds `Authorization` header; on 401, calls `/auth/refresh`; on refresh failure, redirects to `/login`
- Dashboard route guard: redirect to `/login` if no session
- Sign-out button calls `/auth/logout`, clears state, redirects to `/`

### Notifications dependency

Verification + reset emails need real email delivery. Until Block D lands,
**use a stub `EmailSender` that prints the link to the API server logs** so
developers and the test environment can complete the flow without an SES
account. Document this in the block's README.
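
One way the stub could look, as a sketch. `EmailSender` is the name used above, but its method signature, the `LogEmailSender` type, and the `token` query parameter are assumptions made for illustration.

```go
package main

import (
	"context"
	"log/slog"
	"net/url"
	"os"
)

// EmailSender: the interface name comes from the plan; the method
// signature is an assumption.
type EmailSender interface {
	Send(ctx context.Context, to, subject, body string) error
}

// LogEmailSender "delivers" mail by writing it to the server log.
type LogEmailSender struct {
	Log *slog.Logger
}

// Compile-time check that the stub satisfies the interface.
var _ EmailSender = LogEmailSender{}

func (s LogEmailSender) Send(ctx context.Context, to, subject, body string) error {
	s.Log.InfoContext(ctx, "email not sent (stub sender)",
		"to", to, "subject", subject, "body", body)
	return nil
}

// verificationLink builds the URL a developer copies out of the log.
// The "token" query parameter name is hypothetical.
func verificationLink(publicBaseURL, rawToken string) string {
	return publicBaseURL + "/verify-email?token=" + url.QueryEscape(rawToken)
}

func main() {
	sender := LogEmailSender{Log: slog.New(slog.NewTextHandler(os.Stdout, nil))}
	link := verificationLink("http://localhost:8080", "example-raw-token")
	err := sender.Send(context.Background(), "host@example.com", "Verify your email", link)
	if err != nil {
		panic(err)
	}
}
```

Because handlers depend only on the interface, swapping in the real sender from Block D should not touch the auth code.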

### Tests

- Unit: password hashing round-trip, JWT signing + parsing with expiry, token-hash storage
- Integration: signup → verify-email → login → refresh → use-protected-endpoint → logout
- Integration: forgot-password → reset-password → old refresh tokens revoked
- Security: rate-limit signup (deferred to Block C, document the dependency)

### Definition of done

- [ ] Migration `0003_auth.up.sql` applied
- [ ] All `/auth/*` endpoints return appropriate status codes (verified against `httpstatus.dev` conventions)
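- [ ] Refresh-token rotation enforced (reusing an already-rotated refresh token revokes the family as a token-replay defence; `refresh_tokens` has no family column yet, so either add one or revoke all of the user's active tokens)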
- [ ] Email verification mandatory before first login
- [ ] Frontend has working signup → verify → login → dashboard flow end-to-end
- [ ] `useHost()` and `POST /users` removed from the codebase
- [ ] No localhost-only assumptions in code paths
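
The rotation rule in the checklist can be sketched with an in-memory map standing in for `refresh_tokens`. The `familyID` field is hypothetical (the migration above has no such column), as are the 30-day lifetime and all names; a real implementation would run the same steps inside one database transaction.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

var (
	errUnknown = errors.New("unknown refresh token")
	errExpired = errors.New("refresh token expired")
	errReplay  = errors.New("refresh token reused: family revoked")
)

const refreshTTL = 30 * 24 * time.Hour // assumed lifetime

var demoStart = time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)

// row mirrors one refresh_tokens record. familyID is a hypothetical column
// linking every token descended from the same login.
type row struct {
	familyID  string
	expiresAt time.Time
	revoked   bool
}

// store maps token_hash to row.
type store map[string]*row

// login starts a new token family.
func (s store) login(hash, familyID string, now time.Time) {
	s[hash] = &row{familyID: familyID, expiresAt: now.Add(refreshTTL)}
}

// rotate consumes oldHash and registers newHash in the same family.
// Presenting a token that was already consumed revokes the whole family.
func (s store) rotate(oldHash, newHash string, now time.Time) error {
	old, ok := s[oldHash]
	switch {
	case !ok:
		return errUnknown
	case old.revoked:
		for _, r := range s {
			if r.familyID == old.familyID {
				r.revoked = true
			}
		}
		return errReplay
	case now.After(old.expiresAt):
		return errExpired
	}
	old.revoked = true // single use: the old token is kept only to detect replay
	s[newHash] = &row{familyID: old.familyID, expiresAt: now.Add(refreshTTL)}
	return nil
}

func main() {
	s := store{}
	s.login("h1", "fam-1", demoStart)
	fmt.Println(s.rotate("h1", "h2", demoStart)) // normal refresh
	fmt.Println(s.rotate("h1", "h3", demoStart)) // replay of the consumed token
	fmt.Println(s.rotate("h2", "h4", demoStart)) // h2 was revoked with its family
}
```

Consumed rows have to be kept rather than deleted, otherwise a replayed token is indistinguishable from an unknown one.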

### Effort: ~2 weeks for one engineer.

---

## Block B — Authorisation

> **Why now**: same PR cluster as Block A. Adding new endpoints without authz
> bakes in security debt.

### Goal

Every host-facing endpoint enforces "this caller can only touch their own data".
Audit the current API surface and add authz checks to each endpoint.

### Schema changes

None — `events.host_id` already exists. We just need to start trusting the
session-derived `userID` instead of the query parameter.

### Backend

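- Apply `requireAuth` middleware to every route except: `/health`, the public
  `/auth/*` endpoints (at least signup, login, refresh, verify-email,
  forgot-password, reset-password; `POST /auth/ws-ticket` stays protected), the
  guest-facing `/access/{token}` and `/rsvp/{token}`, and the WS endpoint
  (note: WS auth needs its own design, see open questions)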
- For each event-scoped endpoint, derive `hostID` from session and reject if
  the event's `host_id` doesn't match:
  - `GET /events` → list only events where `host_id = session.userID`
  - `GET /events/{id}` → 404 (not 403, to avoid leaking existence) if owner mismatch
  - All `PATCH/DELETE /events/{id}` → same
  - `POST /events/{id}/guests`, `GET /events/{id}/guests`, `POST /events/{id}/guests/{guest_id}/tokens`, `GET /events/{id}/activity` → same
- Remove the `?host_id=...` query parameter from `GET /events` — derive from session
- Update the integration test to authenticate first
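
The ownership rule above, reduced to a pure function as a sketch. `sessionFor` stands in for what `requireAuth` would do after validating the token; the context key, the `Event` type, and the function names are assumptions.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
)

type userIDKey struct{}

// sessionFor stands in for requireAuth: it attaches the authenticated
// userID to the request context.
func sessionFor(userID string) context.Context {
	return context.WithValue(context.Background(), userIDKey{}, userID)
}

type Event struct {
	ID     string
	HostID string
}

// statusForEvent decides the response status for an event-scoped request.
// ev is nil when no event with the requested ID exists.
func statusForEvent(ctx context.Context, ev *Event) int {
	userID, _ := ctx.Value(userIDKey{}).(string)
	switch {
	case userID == "":
		return http.StatusUnauthorized
	case ev == nil || ev.HostID != userID:
		// Same answer for "missing" and "not yours", so existence never leaks.
		return http.StatusNotFound
	default:
		return http.StatusOK
	}
}

func main() {
	ev := &Event{ID: "evt-1", HostID: "host-a"}
	fmt.Println(statusForEvent(sessionFor("host-a"), ev))  // owner
	fmt.Println(statusForEvent(sessionFor("host-b"), ev))  // another tenant
	fmt.Println(statusForEvent(sessionFor("host-b"), nil)) // no such event
}
```

In the repository layer the same rule can be folded into the query (for example `WHERE id = $1 AND host_id = $2`), which makes a mismatch look exactly like a missing row.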

### Frontend

- All host-facing API calls include the access token (already handled if `useApi()` was updated in Block A)
- Update `GET /events` calls to drop the `host_id` query param

### WebSocket auth (open question)

The WS endpoint `/ws/events/{id}` is currently anonymous. Options:

1. **Pass JWT as query param** (`?token=...`) — browsers can't send `Authorization` headers on WS handshake
2. **Cookie-based session** (httpOnly cookie set by `/auth/login`)
3. **Short-lived WS ticket**: client calls `POST /auth/ws-ticket` (auth required), receives a single-use 60s ticket, passes as `?ticket=...` to the WS handshake

Recommend option 3 — most secure, no token in URL beyond a single request. Document the choice.
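
A sketch of option 3 with an in-memory store. Binding the ticket to one event ID is an assumption, as are all type and function names; with more than one API instance the store would need to live in Redis rather than in process memory.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"sync"
	"time"
)

const ticketTTL = 60 * time.Second

var demoNow = time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)

type ticket struct {
	userID, eventID string
	expiresAt       time.Time
}

// TicketStore holds outstanding WS tickets in memory (illustration only).
type TicketStore struct {
	mu      sync.Mutex
	tickets map[string]ticket
}

func NewTicketStore() *TicketStore {
	return &TicketStore{tickets: map[string]ticket{}}
}

// Issue is what POST /auth/ws-ticket would call after requireAuth.
func (s *TicketStore) Issue(userID, eventID string, now time.Time) (string, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	id := hex.EncodeToString(buf)
	s.mu.Lock()
	defer s.mu.Unlock()
	s.tickets[id] = ticket{userID: userID, eventID: eventID, expiresAt: now.Add(ticketTTL)}
	return id, nil
}

// Redeem is what the WS handshake would call with the ?ticket= value.
// The ticket is deleted on every attempt, valid or not, so it is single-use.
func (s *TicketStore) Redeem(id, eventID string, now time.Time) (userID string, ok bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	t, found := s.tickets[id]
	delete(s.tickets, id)
	if !found || now.After(t.expiresAt) || t.eventID != eventID {
		return "", false
	}
	return t.userID, true
}

func main() {
	s := NewTicketStore()
	id, err := s.Issue("user-1", "evt-1", demoNow)
	if err != nil {
		panic(err)
	}
	user, ok := s.Redeem(id, "evt-1", demoNow.Add(5*time.Second))
	fmt.Println(user, ok) // first use succeeds
	user, ok = s.Redeem(id, "evt-1", demoNow.Add(6*time.Second))
	fmt.Println(user, ok) // second use fails
}
```

The handshake handler would still run the Block B ownership check on the returned `userID` before upgrading the connection.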

### Tests

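- Unit: authz middleware accepts a valid token and rejects missing, malformed, or expired tokens with 401 (the API never redirects; the redirect to `/login` lives in the frontend)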
- Integration: host A cannot list, read, modify host B's events (verify 404)
- Integration: WS ticket flow works end-to-end

### Definition of done

- [ ] Every host route requires a valid session
- [ ] Cross-tenant data access returns 404, not 403 (don't leak existence)
- [ ] WS authentication implemented (option 3 recommended)
- [ ] `?host_id=...` query parameter removed everywhere
- [ ] Pen-test pass: try to read/modify another user's event with their event_id but your own token

### Effort: ~3–4 days, assuming Block A laid the middleware groundwork.

---

## Block C — Rate limiting + abuse controls

> **Why now**: small block, no dependency on auth other than knowing the
> `userID` for per-user limits. Redis is already provisioned but unused —
> this finally puts it to work.

### Goal

Stop trivial abuse: someone scripting `POST /auth/signup` 10k times,
brute-forcing the RSVP page, spamming token issuance, etc.

### Schema changes

None — Redis only.

### Backend

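- New `internal/ratelimit` package with a sliding-window limiter backed by Redis
  (for example a sorted set of request timestamps trimmed inside a Lua script;
  plain `INCR` + `EXPIRE` is simpler but gives a fixed window, which can let
  through up to twice the limit across a window boundary)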
- Apply per-route, per-key limits via middleware:

| Endpoint | Key | Limit |
|---|---|---|
| `POST /auth/signup` | IP | 5 / hour |
| `POST /auth/login` | IP + email | 10 / 5 min (lock on consecutive failures) |
| `POST /auth/forgot-password` | IP + email | 3 / hour |
| `POST /rsvp/{token}` | token | 10 / hour |
| `GET /access/{token}` | token | 60 / hour |
| `POST /events` | userID | 20 / day |
| `POST /events/{id}/guests` | userID | 1000 / day |
| `POST /events/{id}/guests/{guest_id}/tokens` | userID | 500 / day |

- Return `429 Too Many Requests` with `Retry-After` header on limit
- CAPTCHA (hCaptcha or Cloudflare Turnstile) on `POST /auth/signup` and `POST /auth/forgot-password`
- Lockout: after 5 consecutive failed logins, require password reset to unlock
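
To make the window logic concrete, here is an in-memory sliding-window log limiter. It is not the Redis-backed implementation: it only illustrates the check that a Lua script or sorted-set version would have to perform atomically. Type and function names are assumptions; the 5 per hour figure is the signup row from the table.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// Limit for POST /auth/signup from the table above: 5 per hour per IP.
const (
	signupLimit  = 5
	signupWindow = time.Hour
)

var demoStart = time.Date(2025, 1, 1, 12, 0, 0, 0, time.UTC)

// Limiter is a sliding-window log: it remembers the timestamp of every
// accepted request per key and counts only those still inside the window.
type Limiter struct {
	mu     sync.Mutex
	limit  int
	window time.Duration
	hits   map[string][]time.Time
}

func NewLimiter(limit int, window time.Duration) *Limiter {
	if limit < 1 {
		limit = 1 // keeps the rejection branch below well-defined
	}
	return &Limiter{limit: limit, window: window, hits: map[string][]time.Time{}}
}

// Allow records a request for key at time now if it fits in the window.
// When it does not, retryAfter is the wait until the oldest hit expires,
// which is the value a handler would send in the Retry-After header.
func (l *Limiter) Allow(key string, now time.Time) (ok bool, retryAfter time.Duration) {
	l.mu.Lock()
	defer l.mu.Unlock()

	cutoff := now.Add(-l.window)
	hits := l.hits[key]
	for len(hits) > 0 && !hits[0].After(cutoff) {
		hits = hits[1:] // this hit has slid out of the window
	}
	if len(hits) >= l.limit {
		l.hits[key] = hits
		return false, hits[0].Add(l.window).Sub(now)
	}
	l.hits[key] = append(hits, now)
	return true, 0
}

func main() {
	l := NewLimiter(signupLimit, signupWindow)
	for i := 1; i <= 6; i++ {
		ok, retry := l.Allow("203.0.113.7", demoStart)
		fmt.Println(i, ok, retry)
	}
}
```

The sixth call is rejected, matching the "6th signup from the same IP" integration test listed below. Passing `now` in explicitly keeps the unit tests free of real sleeps.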

### Frontend

- Render CAPTCHA widget on signup + forgot-password forms
- On `429`, show "You're going too fast, please try again in N minutes" (N taken from the `Retry-After` header) instead of a generic error

### Tests

- Unit: limiter increments correctly, expires at window boundary
- Integration: 6th signup from the same IP within an hour returns 429
- Integration: CAPTCHA token validated server-side before processing signup

### Definition of done

- [ ] Limiter check-and-increment is atomic (Redis `MULTI/EXEC` or a Lua script), confirmed by a concurrent-request test
- [ ] All endpoints in the table above are limited
- [ ] CAPTCHA wired on signup + forgot-password
- [ ] Lockout flow tested end-to-end
- [ ] Limiter exposes Prometheus metrics (`ratelimit_block_total`, labelled per endpoint, as listed under "Observability lite")

### Effort: ~3–4 days.

---

## Block D — Real notifications

> **Why now**: Block A's email verification + password reset need real
> delivery. Don't ship auth to production with a logger stub.

### Goal

Replace `LogSender` in `internal/notification` with real Twilio + SES adapters.
Branded HTML email templates. Bounce + complaint handling. Unsubscribe.

### Schema changes

```sql
ALTER TABLE notifications
  ADD COLUMN provider_message_id TEXT,
  ADD COLUMN bounce_type TEXT,             -- 'permanent' | 'transient' | NULL
  ADD COLUMN complained BOOLEAN NOT NULL DEFAULT FALSE,
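  ADD COLUMN IF NOT EXISTS delivered_at TIMESTAMPTZ;  -- may already exist; IF NOT EXISTS makes this safe either way

-- CITEXT is provided by the citext extension
CREATE EXTENSION IF NOT EXISTS citext;

CREATE TABLE unsubscribes (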
  email        CITEXT PRIMARY KEY,
  reason       TEXT,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);
```

### Backend (`internal/notification`, `cmd/notifier`)

- `TwilioSender` (real `github.com/twilio/twilio-go` client)
  - Retry with exponential backoff: 1s, 5s, 30s, 5m, 30m
  - Permanent failure codes mapped to `bounce_type = 'permanent'`
  - Cost tracking: log message segments per send
- `SESSender` (real `github.com/aws/aws-sdk-go-v2/service/sesv2`)
  - HTML + plaintext multipart
  - List-Unsubscribe header on every email
  - Configuration set with SNS topic for bounces + complaints
- HTML templates (`internal/notification/templates/*.tmpl`):
  - `invitation.html` — "You're invited to {event_name}"
  - `confirmation.html` — RSVP recorded
  - `verification.html` — verify your email
  - `reset.html` — reset your password
  - `reminder.html` (1-day-before reminder; template only, since automated reminders are listed as Tier 2)
- Webhook endpoints (in `internal/api`, public, signed by provider):
  - `POST /webhooks/twilio/status` — Twilio message status callbacks
  - `POST /webhooks/ses/notifications` — SNS-delivered bounce/complaint notifications
  - Both verify signatures before trusting the payload
- Check `unsubscribes` table before sending any email; refuse silently if present
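
The retry schedule above could be driven by a loop like the following sketch. The function names, the `errPermanent` sentinel, and the injected `sleep` parameter are assumptions; the delays are the ones listed for `TwilioSender`.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// backoff is the wait before each retry, as listed in the plan.
var backoff = []time.Duration{
	time.Second, 5 * time.Second, 30 * time.Second, 5 * time.Minute, 30 * time.Minute,
}

var (
	// errPermanent marks failures that must not be retried, e.g. a provider
	// code that maps to bounce_type = 'permanent'.
	errPermanent = errors.New("permanent failure")
	errTransient = errors.New("transient failure")
)

// sendWithRetry calls send until it succeeds, fails permanently, or the
// schedule is exhausted (one initial attempt plus len(backoff) retries).
// sleep is injected so tests do not have to wait in real time.
func sendWithRetry(
	ctx context.Context,
	send func() error,
	sleep func(context.Context, time.Duration) error,
) (attempts int, err error) {
	for {
		attempts++
		err = send()
		if err == nil || errors.Is(err, errPermanent) || attempts > len(backoff) {
			return attempts, err
		}
		if serr := sleep(ctx, backoff[attempts-1]); serr != nil {
			return attempts, serr // e.g. context cancelled while waiting
		}
	}
}

// simulate runs sendWithRetry against a sender that fails `failures` times
// with failWith, using a fake sleep that only adds up the requested waits.
func simulate(failures int, failWith error) (attempts int, waited time.Duration, err error) {
	calls := 0
	send := func() error {
		calls++
		if calls <= failures {
			return failWith
		}
		return nil
	}
	sleep := func(_ context.Context, d time.Duration) error {
		waited += d
		return nil
	}
	attempts, err = sendWithRetry(context.Background(), send, sleep)
	return attempts, waited, err
}

func main() {
	attempts, waited, err := simulate(2, errTransient)
	fmt.Println(attempts, waited, err) // two failures, then success
}
```

In the real notifier the injected `sleep` would wait on a timer and return early when the context is cancelled, so shutdown is not blocked for 30 minutes.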

### Frontend

- Unsubscribe page at `/unsubscribe/:token` — token signed so we know who's unsubscribing
- Host setting: from-name + reply-to email per event (Tier 2 polish, defer if rushed)

### Configuration

Required env vars (add to `internal/config`):

```
GG_TWILIO_ACCOUNT_SID
GG_TWILIO_AUTH_TOKEN
GG_TWILIO_FROM_NUMBER

GG_SES_REGION
GG_SES_FROM_EMAIL          # must be a verified identity
GG_SES_CONFIGURATION_SET

GG_PUBLIC_BASE_URL          # for unsubscribe + invitation links in templates
```

### Tests

- Unit: template rendering produces expected HTML and text
- Unit: retry logic follows the backoff schedule and gives up after the final attempt
- Integration (with stubs): bounce webhook marks notification, blocks future sends to that email
- Manual: actually send to a test inbox in a staging Twilio + SES account

### Definition of done

- [ ] Email verification email arrives in a real inbox (Gmail, Outlook)
- [ ] SMS arrives on a real phone
- [ ] DKIM + SPF + DMARC verified for sender domain (this is human-owned infra setup)
- [ ] Bounces and complaints recorded in `notifications` + `unsubscribes`
- [ ] Unsubscribe link in every email; clicking it adds the address to the suppression list
- [ ] Templates render correctly in Gmail web, Outlook web, iOS Mail, Apple Mail (litmus.com or equivalent)

### Effort: ~1.5–2 weeks (mostly template polish + deliverability setup).

---

## Block E — CSV guest import

> **Why now**: highest user-visible impact of any Tier 1 item, no dependency
> on other blocks except Block B's authz. Marketing already promises it.

### Goal

A host can drag a `.csv` onto the dashboard and have hundreds of guests added
in seconds. Validation surfaces problems before commit. Dedup is automatic.

### Schema changes

None — uses existing `guests` table.

### Backend

- `POST /events/{id}/guests/import` — `multipart/form-data`, single CSV file
  - Header detection: tolerant of `name|Name|guest_name`, `email|Email`, `phone|Phone|telephone`, `plus_ones|+1|plusones`
  - Validation: name required, email format if present, phone E.164-ish if present, plus_ones non-negative integer
  - Dedup: skip rows whose email matches an existing guest on the same event
  - Returns: `{ added: int, skipped: int, errors: [{ row: int, reason: string }] }`
  - Atomic per-batch: either all valid rows commit or none (transaction)
  - Limit: 5,000 rows per import
- `POST /events/{id}/guests/import/preview` — same payload, but doesn't write; returns parsed rows for confirm UI
- Sample CSV download: `GET /events/{id}/guests/import/template` — returns a `.csv` with example rows
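
A sketch of the header detection, row-level validation, and dedup described above, using only the standard library. Phone handling and the 5,000-row cap are left out for brevity, and the sketch also skips duplicates within the same file, which goes slightly beyond the rule stated above. All type and function names are assumptions.

```go
package main

import (
	"encoding/csv"
	"fmt"
	"io"
	"net/mail"
	"strconv"
	"strings"
)

type Guest struct {
	Name, Email string
	PlusOnes    int
}

type Result struct {
	Valid   []Guest
	Skipped int
	Errors  map[int]string // CSV line number -> reason
}

// aliases maps accepted header spellings (lower-cased) to canonical field names.
var aliases = map[string]string{
	"name": "name", "guest_name": "name", "email": "email",
	"plus_ones": "plus_ones", "+1": "plus_ones", "plusones": "plus_ones",
}

// parseGuests reads one record at a time. seen holds lower-cased emails already
// on the event; it must be non-nil and is updated as rows are accepted.
func parseGuests(r io.Reader, seen map[string]bool) (Result, error) {
	res := Result{Errors: map[int]string{}}
	cr := csv.NewReader(r)
	cr.FieldsPerRecord = -1 // short rows become row-level errors, not a file-level failure
	header, err := cr.Read()
	if err != nil {
		return res, fmt.Errorf("reading header: %w", err)
	}
	col := map[string]int{}
	for i, h := range header {
		h = strings.TrimPrefix(h, "\ufeff") // Excel's UTF-8 BOM lands on the first header
		if field, ok := aliases[strings.ToLower(strings.TrimSpace(h))]; ok {
			col[field] = i
		}
	}
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			return res, nil
		}
		if err != nil {
			return res, err // malformed quoting aborts the file in this sketch
		}
		line, _ := cr.FieldPos(0)
		g, reason := validate(rec, col)
		key := strings.ToLower(g.Email)
		switch {
		case reason != "":
			res.Errors[line] = reason
		case key != "" && seen[key]:
			res.Skipped++ // duplicate of an existing guest or of an earlier row
		default:
			seen[key] = true
			res.Valid = append(res.Valid, g)
		}
	}
}

// validate maps one record onto a Guest; a non-empty reason rejects the row.
func validate(rec []string, col map[string]int) (Guest, string) {
	get := func(field string) string {
		if i, ok := col[field]; ok && i < len(rec) {
			return strings.TrimSpace(rec[i])
		}
		return ""
	}
	g := Guest{Name: get("name"), Email: get("email")}
	if g.Name == "" {
		return g, "name is required"
	}
	if a, err := mail.ParseAddress(g.Email); g.Email != "" && (err != nil || a.Address != g.Email) {
		return g, "invalid email"
	}
	n, err := strconv.Atoi(get("plus_ones"))
	if get("plus_ones") != "" && (err != nil || n < 0) {
		return g, "plus_ones must be a non-negative integer"
	}
	g.PlusOnes = n
	return g, ""
}

// parseText is a convenience wrapper for tests and the demo below.
func parseText(text string, seen map[string]bool) (Result, error) {
	return parseGuests(strings.NewReader(text), seen)
}

func main() {
	res, err := parseText("Name,Email,+1\nAnn,ann@example.com,2\n", map[string]bool{})
	fmt.Println(res.Valid, res.Skipped, res.Errors, err)
}
```

Because the function only produces a `Result`, the `/preview` and `/import` handlers could share it, with `/import` writing `res.Valid` inside a single transaction.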

### Frontend

- New section on event detail page: "Import guests from a spreadsheet"
- Drag-drop zone (use `vue-file-pond` or native HTML5 drag-drop)
- After upload: hit `/preview`, show a sortable table of rows with row-level errors highlighted
- "Looks good — import" button calls `/import`
- Show success summary: "Imported 247 guests. Skipped 3 duplicates. 2 rows had errors."
- Help text linking to the template CSV

### Tests

- Unit: header detection accepts the listed variants and rejects unknown columns gracefully
- Unit: validation rejects bad emails, accepts blank emails (phone-only guests valid)
- Integration: dedup leaves existing guests untouched
- Integration: rolling back on mid-batch error doesn't leave partial state

### Definition of done

- [ ] Sample CSV downloadable from the import UI
- [ ] Preview always shown before commit
- [ ] Errors are row-level, not "the whole file is invalid"
- [ ] Encoding: handles UTF-8 with BOM (Excel exports), UTF-16 (Mac Numbers exports)
- [ ] File-size cap: 1MB / 5,000 rows enforced server-side
- [ ] No memory blow-up: parse rows as a stream, not into a `[]Row` of arbitrary size

### Effort: ~3–5 days.

---

## Block F — Billing

> **Why last in Wave 2**: depends on real auth (Block A), real notifications
> (Block D, for dunning emails; Stripe sends the receipts), and a stable data
> model. Don't build until those are solid.

### Goal

Stripe-based subscriptions. Free tier with hard limits. Paid tiers unlock
higher limits. Failed-payment dunning. Self-serve upgrade + downgrade.

### Pricing model (decision required — see open questions)

Recommended starter pricing (placeholder, validate with target market):

| Tier | Price | Events/mo | Guests/event | SMS/mo | Branding |
|---|---|---|---|---|---|
| Free | $0 | 1 | 50 | 0 (email only) | No |
| Personal | $19/event | 1 per purchase | 500 | 100 | Logo |
| Pro | $49/mo | 10 | 1,000 | 1,000 | Full |
| Business | $199/mo | Unlimited | 5,000 | 5,000 | + custom domain |

### Schema changes

```sql
CREATE TABLE subscriptions (
  id                    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id               UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  stripe_customer_id    TEXT NOT NULL,
  stripe_subscription_id TEXT,
  tier                  TEXT NOT NULL,           -- 'free' | 'personal' | 'pro' | 'business'
  status                TEXT NOT NULL,           -- 'active' | 'past_due' | 'canceled' | 'incomplete'
  current_period_end    TIMESTAMPTZ,
  cancel_at_period_end  BOOLEAN NOT NULL DEFAULT FALSE,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX ON subscriptions(user_id) WHERE status = 'active';

CREATE TABLE usage_counters (
  user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  period_start DATE NOT NULL,
  events_count INT NOT NULL DEFAULT 0,
  sms_count    INT NOT NULL DEFAULT 0,
  PRIMARY KEY (user_id, period_start)
);
```

### Backend

- `internal/billing` package wrapping the Stripe SDK
- `POST /billing/checkout-session` — returns a Stripe Checkout URL for the requested tier
- `POST /billing/portal` — returns a Stripe Customer Portal URL
- `POST /webhooks/stripe` — signature-verified, handles:
  - `customer.subscription.created` / `.updated` / `.deleted` → upsert into `subscriptions`
  - `invoice.payment_failed` → trigger dunning email (Block D)
  - `invoice.payment_succeeded` → clear past-due state
- Enforcement: middleware checks usage against tier limits before allowing
  `POST /events`, `POST /events/{id}/guests`, SMS triggers. Returns
  `402 Payment Required` with the upgrade URL on limit.
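
The enforcement check could reduce to a lookup like the sketch below. The limits are copied from the placeholder pricing table; the Personal tier is left out because it is bought per event rather than per month. Falling back to Free for an unknown tier, and every name here, is an assumption.

```go
package main

import (
	"fmt"
	"net/http"
)

const unlimited = -1

type Limits struct {
	EventsPerMonth int
	GuestsPerEvent int
	SMSPerMonth    int
}

// tierLimits mirrors the placeholder pricing table; the values are not final.
var tierLimits = map[string]Limits{
	"free":     {EventsPerMonth: 1, GuestsPerEvent: 50, SMSPerMonth: 0},
	"pro":      {EventsPerMonth: 10, GuestsPerEvent: 1000, SMSPerMonth: 1000},
	"business": {EventsPerMonth: unlimited, GuestsPerEvent: 5000, SMSPerMonth: 5000},
}

// limitsFor falls back to the free tier for unknown tiers (assumption).
func limitsFor(tier string) Limits {
	if l, ok := tierLimits[tier]; ok {
		return l
	}
	return tierLimits["free"]
}

// status returns 200 when used+adding stays within limit, otherwise 402.
func status(limit, used, adding int) int {
	if limit == unlimited || used+adding <= limit {
		return http.StatusOK
	}
	return http.StatusPaymentRequired
}

// statusForNewEvent gates POST /events; eventsThisPeriod would come from
// usage_counters.events_count.
func statusForNewEvent(tier string, eventsThisPeriod int) int {
	return status(limitsFor(tier).EventsPerMonth, eventsThisPeriod, 1)
}

// statusForNewGuests gates POST /events/{id}/guests and the CSV import.
func statusForNewGuests(tier string, currentGuests, adding int) int {
	return status(limitsFor(tier).GuestsPerEvent, currentGuests, adding)
}

// statusForSMS gates SMS triggers; smsThisPeriod would come from
// usage_counters.sms_count.
func statusForSMS(tier string, smsThisPeriod int) int {
	return status(limitsFor(tier).SMSPerMonth, smsThisPeriod, 1)
}

func main() {
	fmt.Println(statusForNewEvent("free", 0))      // first event on Free
	fmt.Println(statusForNewEvent("free", 1))      // second event on Free
	fmt.Println(statusForNewGuests("pro", 990, 5)) // small import on Pro
}
```

The check and the counter increment need to happen in the same transaction, otherwise two concurrent requests can both pass the limit.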

### Frontend

- `/billing` page: current plan, usage bars, upgrade/downgrade buttons
- On `402`, show modal: "You've hit your plan limit. Upgrade?"
- Stripe Checkout opens in a new tab; on return, poll subscription state until updated by webhook

### Tests

- Integration: free user can create 1 event, second fails with 402
- Integration: webhook signature verification rejects forged payloads
- Integration: cancellation flow keeps access until period end

### Definition of done

- [ ] Stripe in test mode end-to-end working
- [ ] Webhook signatures verified
- [ ] Usage counters roll over monthly (a new `usage_counters` row per `period_start`, created by cron or on demand)
- [ ] Receipts emailed via Stripe (default behaviour, just confirm enabled)
- [ ] Refund policy documented (referenced from billing page)

### Effort: ~2 weeks.

---

## Block G — Backups & disaster recovery

> **Mostly infra-owned**, but the application side has documentation work.

### Claude's scope

- All migrations have a `*.down.sql` that's been tested locally
- New `docs/RUNBOOK_RESTORE.md` documenting the restore procedure step-by-step
- Confirm Postgres connection string env var supports the recovery instance (no
  hardcoded primary-only hostnames)
- Optional: a `cmd/restore-verify` tool that runs after a restore to assert
  data invariants (guest counts ≈ rsvp counts, no orphaned tokens, etc.)

### Human / infra scope

- `pg_basebackup` + WAL archiving to S3
- Daily logical dump as a secondary safety net
- Cross-region replication of the S3 bucket
- Monthly restore drill scheduled
- Documented RTO (e.g. 1 hour) and RPO (e.g. 5 minutes)

### Definition of done

- [ ] Every existing migration has a tested down migration
- [ ] `docs/RUNBOOK_RESTORE.md` exists and a fresh engineer could follow it
- [ ] First restore drill completed successfully

### Effort: ~2 days for the application-side work.

---

## Block H — Privacy compliance

> Legal documents are human-owned. Application-level support is Claude scope.

### Claude's scope

- `GET /me/data-export`: produces a JSON document with every record
  (user, events, guests, tokens, RSVPs, access_logs, notifications) belonging
  to the authenticated user. Long-running, so the request only enqueues the
  export job; the user is emailed a download link when it finishes.
- `DELETE /me` — cascade-deletes the user and everything tied to them.
  Soft-delete first (set `deleted_at`), hard-delete on a cron after 30 days
  to honour any in-flight legal holds.
- `DELETE /events/{id}/guests/{guest_id}` (host-triggered) — already exists in
  spirit; add a "forget this guest" action that removes RSVP/access-log rows
  but keeps the aggregate counter for the event.
- Data retention: automated nightly job to soft-delete events whose
  `event_date` is older than 18 months (retention period configurable per host in Tier 2).
- Add `privacy_policy_accepted_at` and `terms_accepted_at` columns to `users`;
  block first login until both are accepted.
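
The two time rules above (30-day grace before hard delete, 18-month retention) as small predicates the cron jobs could share. This is a sketch; the function and constant names are assumptions.

```go
package main

import (
	"fmt"
	"time"
)

const (
	hardDeleteGrace = 30 * 24 * time.Hour // soft-deleted users are purged after 30 days
	retentionMonths = 18                  // events older than this are soft-deleted
)

var demoNow = time.Date(2026, 7, 15, 3, 0, 0, 0, time.UTC)

// hardDeleteDue reports whether a user soft-deleted at deletedAt may now be
// hard-deleted.
func hardDeleteDue(deletedAt, now time.Time) bool {
	return !now.Before(deletedAt.Add(hardDeleteGrace))
}

// retentionCutoff returns the event_date before which events are soft-deleted.
// AddDate normalises overflowing dates, so the cutoff computed on 31 August
// is 2 or 3 March rather than the non-existent 31 February.
func retentionCutoff(now time.Time) time.Time {
	return now.AddDate(0, -retentionMonths, 0)
}

func main() {
	fmt.Println("retention cutoff:", retentionCutoff(demoNow).Format("2006-01-02"))
	deletedAt := demoNow.Add(-31 * 24 * time.Hour)
	fmt.Println("hard delete due:", hardDeleteDue(deletedAt, demoNow))
}
```

Computing the cutoff in Go and passing it to the query as a parameter keeps the retention period in one place once it becomes configurable.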

### Human / legal scope

- Privacy policy + ToS drafted by a lawyer
- DPAs signed with Twilio, SES, Stripe, MaxMind, and any other subprocessor
- Public privacy page at `/privacy`, ToS at `/terms`
- Cookie banner (only required if analytics are added; currently we have none)
- GDPR Article 30 record of processing activities

### Definition of done

- [ ] `GET /me/data-export` produces a complete, parseable JSON dump
- [ ] `DELETE /me` cascades correctly with no orphan rows (verified by FK constraints)
- [ ] Privacy + ToS pages live and linked from the footer + signup form
- [ ] Acceptance enforced on first login after the launch date
- [ ] Retention cron job tested

### Effort: ~3–4 days for the application work; legal work runs in parallel.

---

## Cross-cutting concerns

These touch most blocks above; bake them in as you go, not as a separate pass.

### Logging + auditing

Every state-changing endpoint logs: `userID`, `action`, `target_id`, `result`,
`request_id`. Use `slog` with a correlation ID middleware. Critical for
post-incident forensics.
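
A sketch of the correlation-ID middleware and an audit helper using `log/slog`. The `X-Request-ID` header name, the choice to trust an incoming ID, and all function names are assumptions; the logged keys are the ones listed above.

```go
package main

import (
	"bytes"
	"context"
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"log/slog"
	"net/http"
	"net/http/httptest"
)

type requestIDKey struct{}

func newRequestID() string {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		panic(err) // a broken entropy source is treated as fatal in this sketch
	}
	return hex.EncodeToString(buf)
}

// withRequestID reuses an incoming X-Request-ID or generates one, stores it
// in the request context, and echoes it on the response.
func withRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID()
		}
		w.Header().Set("X-Request-ID", id)
		ctx := context.WithValue(r.Context(), requestIDKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// audit logs one state-changing action with the fields the plan lists.
func audit(ctx context.Context, log *slog.Logger, userID, action, targetID, result string) {
	id, _ := ctx.Value(requestIDKey{}).(string)
	log.InfoContext(ctx, "audit",
		"userID", userID, "action", action, "target_id", targetID,
		"result", result, "request_id", id)
}

// demoRequest sends one request through the middleware and returns the
// echoed request ID together with the JSON log line that was written.
func demoRequest(incomingID string) (echoedID, logged string) {
	var buf bytes.Buffer
	log := slog.New(slog.NewJSONHandler(&buf, nil))
	h := withRequestID(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		audit(r.Context(), log, "user-1", "event.delete", "evt-1", "ok")
		w.WriteHeader(http.StatusNoContent)
	}))
	req := httptest.NewRequest(http.MethodDelete, "/events/evt-1", nil)
	if incomingID != "" {
		req.Header.Set("X-Request-ID", incomingID)
	}
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, req)
	return rec.Header().Get("X-Request-ID"), buf.String()
}

func main() {
	id, line := demoRequest("")
	fmt.Println("request id:", id)
	fmt.Print(line)
}
```

Echoing the ID in the response lets a support request quote it, which ties a user report to the exact log lines.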

### Observability lite (Tier 3 scope, but minimum viable for launch)

- Prometheus `/metrics` endpoint on the API exposing: request rate by
  endpoint, latency percentiles, 4xx/5xx counts, `ratelimit_block_total`
- Sentry (or self-hosted GlitchTip) for unhandled errors, with release tagging

### Feature flags

Lightweight `feature_flags` table or env-var driven (no LaunchDarkly yet).
Useful for rolling out Block F's billing without exposing it to all users at
once.

---

## Open questions

Resolve before starting:

1. **Final pricing tiers** — the table in Block F is a placeholder. Confirm with the target market (interview 10 wedding planners, 10 corporate event managers).
2. **Email provider** — SES vs Postmark vs SendGrid. SES is cheapest but has the harshest deliverability ramp; Postmark is best for transactional but pricier.
3. **2FA at launch or v1.1?** — Recommend v1.1; one less moving piece on the launch path.
4. **Custom domain for RSVP pages at launch or v1.1?** — Recommend v1.1 (Tier 2). Adds DNS + cert complexity.
5. **WebSocket auth mechanism** — Recommend Block B option 3 (short-lived ticket).
6. **EU data residency at launch?** — If targeting EU customers, this becomes Tier 1 (separate EU deployment). Otherwise defer to Tier 4.

---

## Sequencing summary table

| Wave | Block | Depends on | Effort (1 eng) | Can parallelise with |
|---|---|---|---|---|
| 1 | A. Auth | — | 2w | — |
| 1 | B. Authz | A | 4d | C |
| 1 | C. Rate limiting | A (for `userID`) | 4d | B |
| 2 | D. Notifications | A | 2w | E |
| 2 | E. CSV import | B | 4d | D, F |
| 2 | F. Billing | A, D | 2w | E |
| 3 | G. Backups | — (infra) | 2d (Claude) | any |
| 3 | H. Privacy | A | 4d | any |

**One engineer, sequential**: ~9 weeks.
**Two engineers, parallel-where-possible**: ~5.5 weeks.

---

## What's *not* in Tier 1 (deliberate)

These are tempting but are Tier 2:

- Editable RSVPs (guests can change response after submitting)
- Multi-host collaborators
- Event branding (logo, colours, custom domain)
- Day-of QR check-in
- Better fraud-engine thresholds (false-positive feedback loop)
- Calendar integration
- Auto-reminders (1-day before, etc.)
- Mobile push notifications

Ship Tier 1 first. The launch story is "personal invitations + live tracking +
quiet fraud detection + works reliably + you can pay us money". Everything
else is the second release.