Kwaku Danso 4bf0dab853 docs: add 4-tier production roadmap and detailed Tier 1 plan
- CLAUDE.md: 4-tier feature roadmap appended after the build-order
  section (launch blockers → moat features). Future sessions
  reference this to know which tier a new feature belongs to.

- docs/TIER1_PLAN.md: detailed sequencing for the 8 blocks of
  Tier 1 work (auth, authz, rate limiting, notifications, CSV
  import, billing, backups, privacy) with schema changes,
  endpoints, tests, and effort estimates per block.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-11 21:09:25 +01:00


Tier 1 Production Plan

This document sequences the work to take GuestGuard from feature-complete demo to a product that can be sold to event hosts.

Read CLAUDE.md for project conventions and the full 4-tier roadmap. This document covers only what Tier 1 contains and the order in which to build it.


TL;DR

Eight work blocks (A–H), grouped into three waves that respect dependencies. Estimated effort: ~8–10 weeks for one engineer, ~5–6 weeks for two.

Wave 1 (foundation, must finish before anything else):
  A. Authentication ──┐
                      ├── B. Authorisation
                      └── C. Rate limiting (parallel)

Wave 2 (depends on auth being real):
  D. Notifications ───┐
                      ├── E. CSV import (parallel)
                      └── F. Billing

Wave 3 (ops + legal, can run alongside Wave 2):
  G. Backups & DR
  H. Privacy compliance

Block A — Authentication

Why first: every other Tier 1 item depends on knowing who's calling.

Goal

Replace the useHost() localStorage bootstrap with real auth: email + password, verified emails, password reset, JWT-based sessions with refresh tokens. The existing users table is reused.

Schema changes

Migration 0003_auth.up.sql:

ALTER TABLE users
  ADD COLUMN password_hash    TEXT,             -- bcrypt; nullable for OAuth-only users later
  ADD COLUMN email_verified   BOOLEAN NOT NULL DEFAULT FALSE,
  ADD COLUMN email_verified_at TIMESTAMPTZ;

CREATE TABLE email_verification_tokens (
  token_hash   TEXT PRIMARY KEY,
  user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  expires_at   TIMESTAMPTZ NOT NULL,
  consumed_at  TIMESTAMPTZ
);

CREATE TABLE password_reset_tokens (
  token_hash   TEXT PRIMARY KEY,
  user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  expires_at   TIMESTAMPTZ NOT NULL,
  consumed_at  TIMESTAMPTZ
);

CREATE TABLE refresh_tokens (
  token_hash   TEXT PRIMARY KEY,
  user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  expires_at   TIMESTAMPTZ NOT NULL,
  revoked_at   TIMESTAMPTZ,
  user_agent   TEXT,
  ip_address   INET,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_refresh_tokens_user ON refresh_tokens(user_id) WHERE revoked_at IS NULL;

Backend (internal/auth, internal/api)

  • New auth.PasswordHasher (bcrypt via golang.org/x/crypto/bcrypt, cost 12)
  • New auth.JWTSigner issuing access tokens (15min TTL) signed with GG_JWT_SECRET
  • New repos for verification, reset, and refresh tokens (token hashes stored, never raw)
  • New handlers:
    • POST /auth/signup — creates unverified user, emits verification email
    • POST /auth/login — verifies password, requires verified email, returns access + refresh
    • POST /auth/refresh — rotates refresh token (single-use), returns new pair
    • POST /auth/logout — revokes the refresh token
    • POST /auth/verify-email — consumes verification token, sets email_verified
    • POST /auth/forgot-password — emits reset email (no-op if email unknown — don't leak existence)
    • POST /auth/reset-password — consumes reset token, updates password_hash, revokes all refresh tokens
  • New middleware requireAuth that pulls Authorization: Bearer …, validates, attaches userID to request context
  • Delete POST /users (the demo bootstrap)
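The "token hashes stored, never raw" rule for the verification, reset, and refresh repos can be sketched with the standard library alone. This is a minimal illustration, not the project's implementation: the names NewToken, HashToken, and TokenMatches are hypothetical, and SHA-256 is assumed to be acceptable here because the tokens are high-entropy random values (passwords still go through bcrypt as listed above).

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"crypto/subtle"
	"encoding/base64"
	"encoding/hex"
)

// NewToken returns a random URL-safe token to send to the user, plus the
// SHA-256 hex digest that would be stored in a token_hash column.
// Hypothetical helper; the name is not taken from the codebase.
func NewToken() (raw, hash string, err error) {
	buf := make([]byte, 32) // 256 bits of entropy
	if _, err = rand.Read(buf); err != nil {
		return "", "", err
	}
	raw = base64.RawURLEncoding.EncodeToString(buf)
	return raw, HashToken(raw), nil
}

// HashToken derives the lookup key for a presented token.
func HashToken(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// TokenMatches compares a presented token against a stored hash in
// constant time.
func TokenMatches(raw, storedHash string) bool {
	return subtle.ConstantTimeCompare([]byte(HashToken(raw)), []byte(storedHash)) == 1
}
```

With this shape, a handler such as POST /auth/verify-email would hash the presented token and look the digest up by primary key, so a leaked database dump does not expose usable tokens.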

Frontend

  • Delete useHost() composable; replace with useAuth() (access token in memory, refresh token in httpOnly cookie set by server)
  • New pages: /login, /signup, /verify-email, /forgot-password, /reset-password/:token
  • useApi() composable adds Authorization header; on 401, calls /auth/refresh; on refresh failure, redirects to /login
  • Dashboard route guard: redirect to /login if no session
  • Sign-out button calls /auth/logout, clears state, redirects to /

Notifications dependency

Verification + reset emails need real email delivery. Until Block D lands, use a stub EmailSender that prints the link to the API server logs so developers and the test environment can complete the flow without a Twilio/SES account. Document this in the block's README.

Tests

  • Unit: password hashing round-trip, JWT signing + parsing with expiry, token-hash storage
  • Integration: signup → verify-email → login → refresh → use-protected-endpoint → logout
  • Integration: forgot-password → reset-password → old refresh tokens revoked
  • Security: rate-limit signup (deferred to Block C, document the dependency)

Definition of done

  • Migration 0003_auth.up.sql applied
  • All /auth/* endpoints return appropriate status codes (verified against httpstatus.dev conventions)
  • Refresh-token rotation enforced (reusing a refresh token revokes the family — token-replay defence)
  • Email verification mandatory before first login
  • Frontend has working signup → verify → login → dashboard flow end-to-end
  • useHost() and POST /users removed from the codebase
  • No localhost-only assumptions in code paths
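The refresh-token rotation item above (single use, with replay revoking the family) can be sketched as follows. This is an in-memory stand-in for the refresh_tokens table, written under assumptions: the family field is hypothetical (the migration in this plan has no such column, so a family_id or parent pointer would be an addition), expiry checks are left out, and raw tokens are used as map keys only for brevity where the real repo would key on token_hash.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"errors"
	"sync"
)

var ErrInvalidRefresh = errors.New("refresh token invalid or revoked")

type refreshRecord struct {
	userID  string
	family  string // hypothetical: shared by every token descended from one login
	revoked bool
}

// RefreshStore is an in-memory stand-in for the refresh_tokens table.
type RefreshStore struct {
	mu     sync.Mutex
	tokens map[string]*refreshRecord
}

func NewRefreshStore() *RefreshStore {
	return &RefreshStore{tokens: make(map[string]*refreshRecord)}
}

func randomID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		panic(err) // no entropy available; nothing sensible to do in a sketch
	}
	return hex.EncodeToString(b)
}

// Issue starts a new token family, as a successful login would.
func (s *RefreshStore) Issue(userID string) string {
	s.mu.Lock()
	defer s.mu.Unlock()
	tok := randomID()
	s.tokens[tok] = &refreshRecord{userID: userID, family: randomID()}
	return tok
}

// Rotate consumes a token and returns its successor. Presenting a token
// that was already consumed is treated as replay: every token in the same
// family is revoked and the caller has to log in again.
func (s *RefreshStore) Rotate(tok string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	rec, ok := s.tokens[tok]
	if !ok {
		return "", ErrInvalidRefresh
	}
	if rec.revoked {
		for _, other := range s.tokens {
			if other.family == rec.family {
				other.revoked = true
			}
		}
		return "", ErrInvalidRefresh
	}
	rec.revoked = true // single use
	next := randomID()
	s.tokens[next] = &refreshRecord{userID: rec.userID, family: rec.family}
	return next, nil
}
```

The design choice worth noting is that a replayed token invalidates the legitimate holder's current token too; that is the cost of not being able to tell the attacker and the user apart.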

Effort: ~2 weeks for one engineer.


Block B — Authorisation

Why now: same PR cluster as Block A. Adding new endpoints without authz bakes in security debt.

Goal

Every host-facing endpoint enforces "this caller can only touch their own data". Audit the current API surface and add authz checks to each endpoint.

Schema changes

None — events.host_id already exists. We just need to start trusting the session-derived userID instead of the query parameter.

Backend

  • Apply requireAuth middleware to every route except: /health, /auth/*, the guest-facing /access/{token}, /rsvp/{token}, and the WS endpoint (note: WS auth needs its own design — see open questions)
  • For each event-scoped endpoint, derive hostID from session and reject if the event's host_id doesn't match:
    • GET /events → list only events where host_id = session.userID
    • GET /events/{id} → 404 (not 403, to avoid leaking existence) if owner mismatch
    • All PATCH/DELETE /events/{id} → same
    • POST /events/{id}/guests, GET /events/{id}/guests, POST /events/{id}/guests/{guest_id}/tokens, GET /events/{id}/activity → same
  • Remove the ?host_id=... query parameter from GET /events — derive from session
  • Update the integration test to authenticate first
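The two checks in this list, a bearer-token gate followed by an ownership test that answers 404 on mismatch, might look roughly like the sketch below. TokenValidator, bearerUser, UserIDFrom, and EventAccessStatus are hypothetical names; the validator stands in for whatever the Block A JWT verifier ends up exposing.

```go
package main

import (
	"context"
	"net/http"
	"strings"
)

type userIDKey struct{}

// TokenValidator is a hypothetical seam over the JWT verifier from Block A.
type TokenValidator func(token string) (userID string, ok bool)

// bearerUser extracts the bearer token from an Authorization header value,
// validates it, and returns the user ID with the HTTP status to use.
func bearerUser(header string, validate TokenValidator) (string, int) {
	const prefix = "Bearer "
	if !strings.HasPrefix(header, prefix) {
		return "", http.StatusUnauthorized
	}
	userID, ok := validate(strings.TrimPrefix(header, prefix))
	if !ok {
		return "", http.StatusUnauthorized
	}
	return userID, http.StatusOK
}

// RequireAuth rejects unauthenticated requests and attaches the user ID to
// the request context for downstream handlers.
func RequireAuth(validate TokenValidator, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		userID, status := bearerUser(r.Header.Get("Authorization"), validate)
		if status != http.StatusOK {
			http.Error(w, http.StatusText(status), status)
			return
		}
		ctx := context.WithValue(r.Context(), userIDKey{}, userID)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// UserIDFrom reads the session-derived user ID set by RequireAuth.
func UserIDFrom(ctx context.Context) (string, bool) {
	id, ok := ctx.Value(userIDKey{}).(string)
	return id, ok
}

// EventAccessStatus answers 404 both when the event is missing and when it
// belongs to another host, so the response does not reveal which case applied.
func EventAccessStatus(sessionUserID, eventHostID string, found bool) int {
	if !found || eventHostID != sessionUserID {
		return http.StatusNotFound
	}
	return http.StatusOK
}
```

An event handler would call UserIDFrom(r.Context()), load the event, and pass both IDs to EventAccessStatus before touching any data.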

Frontend

  • All host-facing API calls include the access token (already handled if useApi() was updated in Block A)
  • Update GET /events calls to drop the host_id query param

WebSocket auth (open question)

The WS endpoint /ws/events/{id} is currently anonymous. Options:

  1. Pass JWT as query param (?token=...) — browsers can't send Authorization headers on WS handshake
  2. Cookie-based session (httpOnly cookie set by /auth/login)
  3. Short-lived WS ticket: client calls POST /auth/ws-ticket (auth required), receives a single-use 60s ticket, passes as ?ticket=... to the WS handshake

Recommend option 3 — most secure, no token in URL beyond a single request. Document the choice.
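Option 3 can be illustrated with a small single-use ticket store. This is a sketch under assumptions: TicketStore and its methods are hypothetical names, binding a ticket to one event ID is an added design choice not stated above, and the in-memory map would need to move to Redis once more than one API instance is running.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"sync"
	"time"
)

const ticketTTL = 60 * time.Second // the 60s lifetime from option 3

type wsTicket struct {
	userID, eventID string
	expiresAt       time.Time
}

// TicketStore is an in-memory stand-in for a shared ticket store.
type TicketStore struct {
	mu      sync.Mutex
	tickets map[string]wsTicket
}

func NewTicketStore() *TicketStore {
	return &TicketStore{tickets: make(map[string]wsTicket)}
}

// Issue mints a ticket for an authenticated user; this is what a
// POST /auth/ws-ticket handler would call.
func (s *TicketStore) Issue(userID, eventID string) (string, time.Time, error) {
	buf := make([]byte, 16)
	if _, err := rand.Read(buf); err != nil {
		return "", time.Time{}, err
	}
	ticket := hex.EncodeToString(buf)
	expiresAt := time.Now().Add(ticketTTL)
	s.mu.Lock()
	s.tickets[ticket] = wsTicket{userID: userID, eventID: eventID, expiresAt: expiresAt}
	s.mu.Unlock()
	return ticket, expiresAt, nil
}

// Redeem is called during the WS handshake. The ticket is deleted on first
// lookup, so it cannot be replayed even if a later check fails.
func (s *TicketStore) Redeem(ticket, eventID string, now time.Time) (string, bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	t, ok := s.tickets[ticket]
	if !ok {
		return "", false
	}
	delete(s.tickets, ticket)
	if t.eventID != eventID || !now.Before(t.expiresAt) {
		return "", false
	}
	return t.userID, true
}
```

The WS handler would read ?ticket=... from the handshake URL, call Redeem with time.Now(), and refuse the upgrade when it returns false.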

Tests

  • Unit: authz middleware accepts valid sessions and rejects missing, malformed, or expired tokens with 401
  • Integration: host A cannot list, read, modify host B's events (verify 404)
  • Integration: WS ticket flow works end-to-end

Definition of done

  • Every host route requires a valid session
  • Cross-tenant data access returns 404, not 403 (don't leak existence)
  • WS authentication implemented (option 3 recommended)
  • ?host_id=... query parameter removed everywhere
  • Pen-test pass: try to read/modify another user's event with their event_id but your own token

Effort: ~3–4 days, assuming Block A laid the middleware groundwork.


Block C — Rate limiting + abuse controls

Why now: small block, no dependency on auth other than knowing the userID for per-user limits. Redis is already provisioned but unused — this finally puts it to work.

Goal

Stop trivial abuse: someone scripting POST /auth/signup 10k times, brute-forcing the RSVP page, spamming token issuance, etc.

Schema changes

None — Redis only.

Backend

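  • New internal/ratelimit package with a sliding-window limiter backed by Redis (note that plain INCR + EXPIRE yields a fixed-window counter; a true sliding window needs a sorted set or a Lua script, and either way the check-and-increment must be atomic)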
  • Apply per-route, per-key limits via middleware:
    Endpoint                                     Key          Limit
    POST /auth/signup                            IP           5 / hour
    POST /auth/login                             IP + email   10 / 5 min (lock on consecutive failures)
    POST /auth/forgot-password                   IP + email   3 / hour
    POST /rsvp/{token}                           token        10 / hour
    GET /access/{token}                          token        60 / hour
    POST /events                                 userID       20 / day
    POST /events/{id}/guests                     userID       1000 / day
    POST /events/{id}/guests/{guest_id}/tokens   userID       500 / day
  • Return 429 Too Many Requests with Retry-After header on limit
  • CAPTCHA (hCaptcha or Cloudflare Turnstile) on POST /auth/signup and POST /auth/forgot-password
  • Lockout: after 5 consecutive failed logins, require password reset to unlock
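The per-key limit and the 429 + Retry-After behaviour can be sketched without Redis. The block below is an in-memory fixed-window counter, which is the behaviour INCR + EXPIRE gives; it is not a sliding window and it is not shared across processes. FixedWindow, Allow, Middleware, and SignupLimiter are hypothetical names used only for illustration.

```go
package main

import (
	"net/http"
	"strconv"
	"sync"
	"time"
)

type bucket struct {
	count   int
	resetAt time.Time
}

// FixedWindow counts hits per key and resets the count when the window ends.
type FixedWindow struct {
	mu      sync.Mutex
	limit   int
	window  time.Duration
	buckets map[string]*bucket
}

func NewFixedWindow(limit int, window time.Duration) *FixedWindow {
	return &FixedWindow{limit: limit, window: window, buckets: make(map[string]*bucket)}
}

// Allow records one hit for key at time now and reports whether it is within
// the limit, together with the time at which the current window resets.
func (l *FixedWindow) Allow(key string, now time.Time) (bool, time.Time) {
	l.mu.Lock()
	defer l.mu.Unlock()
	b, ok := l.buckets[key]
	if !ok || !now.Before(b.resetAt) {
		b = &bucket{resetAt: now.Add(l.window)}
		l.buckets[key] = b
	}
	b.count++
	return b.count <= l.limit, b.resetAt
}

// Middleware applies the limiter to a route; keyFn picks the key
// (IP, token, or userID depending on the endpoint).
func (l *FixedWindow) Middleware(keyFn func(*http.Request) string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		now := time.Now()
		ok, resetAt := l.Allow(keyFn(r), now)
		if !ok {
			secs := int(resetAt.Sub(now).Seconds()) + 1 // round up so clients never retry early
			w.Header().Set("Retry-After", strconv.Itoa(secs))
			http.Error(w, "too many requests", http.StatusTooManyRequests)
			return
		}
		next.ServeHTTP(w, r)
	})
}

// SignupLimiter mirrors the POST /auth/signup row: 5 per hour per IP.
var SignupLimiter = NewFixedWindow(5, time.Hour)
```

A fixed window lets a caller burst up to twice the limit across a window boundary; if that matters for the login route, the Redis version would need the sorted-set or Lua approach instead.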

Frontend

  • Render CAPTCHA widget on signup + forgot-password forms
  • On 429, show "You're going too fast — please try again in a minute" instead of generic error

Tests

  • Unit: limiter increments correctly, expires at window boundary
  • Integration: 6th signup from the same IP within an hour returns 429
  • Integration: CAPTCHA token validated server-side before processing signup

Definition of done

  • Limiter check-and-increment is atomic (Redis MULTI/EXEC or a Lua script), confirmed by a concurrency test
  • All endpoints in the table above are limited
  • CAPTCHA wired on signup + forgot-password
  • Lockout flow tested end-to-end
  • Limiter exposes Prometheus metrics (already implicit — ratelimit_block_total per endpoint)

Effort: ~3–4 days.


Block D — Real notifications

Why now: Block A's email verification + password reset need real delivery. Don't ship auth to production with a logger stub.

Goal

Replace LogSender in internal/notification with real Twilio + SES adapters. Branded HTML email templates. Bounce + complaint handling. Unsubscribe.

Schema changes

ALTER TABLE notifications
  ADD COLUMN provider_message_id TEXT,
  ADD COLUMN bounce_type TEXT,             -- 'permanent' | 'transient' | NULL
  ADD COLUMN complained BOOLEAN NOT NULL DEFAULT FALSE,
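  ADD COLUMN IF NOT EXISTS delivered_at TIMESTAMPTZ;  -- may already exist; confirm against the current schema

CREATE EXTENSION IF NOT EXISTS citext;  -- required for the CITEXT column below

CREATE TABLE unsubscribes (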
  email        CITEXT PRIMARY KEY,
  reason       TEXT,
  created_at   TIMESTAMPTZ NOT NULL DEFAULT now()
);

Backend (internal/notification, cmd/notifier)

  • TwilioSender (real github.com/twilio/twilio-go client)
    • Retry with exponential backoff: 1s, 5s, 30s, 5m, 30m
    • Permanent failure codes mapped to bounce_type = 'permanent'
    • Cost tracking: log message segments per send
  • SESSender (real github.com/aws/aws-sdk-go-v2/service/sesv2)
    • HTML + plaintext multipart
    • List-Unsubscribe header on every email
    • Configuration set with SNS topic for bounces + complaints
  • HTML templates (internal/notification/templates/*.tmpl):
    • invitation.html — "You're invited to {event_name}"
    • confirmation.html — RSVP recorded
    • verification.html — verify your email
    • reset.html — reset your password
    • reminder.html — 1-day-before reminder
  • Webhook endpoints (in internal/api, public, signed by provider):
    • POST /webhooks/twilio/status — Twilio message status callbacks
    • POST /webhooks/ses/notifications — SNS-delivered bounce/complaint notifications
    • Both verify signatures before trusting the payload
  • Check unsubscribes table before sending any email; refuse silently if present
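The retry schedule for TwilioSender (1s, 5s, 30s, 5m, 30m, with permanent failures ending the loop early) can be sketched as below. It assumes the schedule means five retries after an initial attempt, which the plan does not spell out; NextDelay, SendWithRetry, and ErrPermanent are hypothetical names, and the sleep function is injected so tests do not have to wait.

```go
package main

import (
	"errors"
	"time"
)

// backoffSchedule copies the delays listed for TwilioSender.
var backoffSchedule = []time.Duration{
	1 * time.Second,
	5 * time.Second,
	30 * time.Second,
	5 * time.Minute,
	30 * time.Minute,
}

// ErrPermanent marks failures that should not be retried, for example a
// provider code that maps to bounce_type = 'permanent'.
var ErrPermanent = errors.New("permanent delivery failure")

// NextDelay reports how long to wait after the given failed attempt
// (1-based), or false once the schedule is exhausted.
func NextDelay(failedAttempt int) (time.Duration, bool) {
	if failedAttempt < 1 || failedAttempt > len(backoffSchedule) {
		return 0, false
	}
	return backoffSchedule[failedAttempt-1], true
}

// SendWithRetry calls send until it succeeds, fails permanently, or the
// schedule runs out. It returns the number of attempts made and the last error.
func SendWithRetry(send func() error, sleep func(time.Duration)) (int, error) {
	for attempt := 1; ; attempt++ {
		err := send()
		if err == nil || errors.Is(err, ErrPermanent) {
			return attempt, err
		}
		delay, ok := NextDelay(attempt)
		if !ok {
			return attempt, err // gave up after the final scheduled retry
		}
		sleep(delay)
	}
}
```

In production the sleep argument would be time.Sleep, or more likely a re-enqueue with a delay so that a 30-minute wait does not pin a worker goroutine.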

Frontend

  • Unsubscribe page at /unsubscribe/:token — token signed so we know who's unsubscribing
  • Host setting: from-name + reply-to email per event (Tier 2 polish, defer if rushed)
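One way to sign the /unsubscribe/:token value so the server knows who is unsubscribing without a database lookup is an HMAC over the email address. This is a sketch, not a decided format: SignUnsubscribe and VerifyUnsubscribe are hypothetical names, the secret is assumed to come from configuration, and the token carries no expiry.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"strings"
)

// sign returns the base64url HMAC-SHA256 of payload under secret.
func sign(secret []byte, payload string) string {
	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(payload))
	return base64.RawURLEncoding.EncodeToString(mac.Sum(nil))
}

// SignUnsubscribe builds "<base64url(email)>.<signature>".
func SignUnsubscribe(secret []byte, email string) string {
	payload := base64.RawURLEncoding.EncodeToString([]byte(email))
	return payload + "." + sign(secret, payload)
}

// VerifyUnsubscribe returns the email address if the signature is valid.
func VerifyUnsubscribe(secret []byte, token string) (string, bool) {
	payload, sig, ok := strings.Cut(token, ".")
	if !ok || !hmac.Equal([]byte(sig), []byte(sign(secret, payload))) {
		return "", false
	}
	email, err := base64.RawURLEncoding.DecodeString(payload)
	if err != nil {
		return "", false
	}
	return string(email), true
}
```

Because the address is only encoded, not encrypted, anyone holding the link can read it; that is usually acceptable since the link is delivered to that address anyway.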

Configuration

Required env vars (add to internal/config):

GG_TWILIO_ACCOUNT_SID
GG_TWILIO_AUTH_TOKEN
GG_TWILIO_FROM_NUMBER

GG_SES_REGION
GG_SES_FROM_EMAIL          # must be a verified identity
GG_SES_CONFIGURATION_SET

GG_PUBLIC_BASE_URL          # for unsubscribe + invitation links in templates

Tests

  • Unit: template rendering produces expected HTML and text
  • Unit: retry logic backs off correctly, surrenders after N attempts
  • Integration (with stubs): bounce webhook marks notification, blocks future sends to that email
  • Manual: actually send to a test inbox in a staging Twilio + SES account

Definition of done

  • Email verification email arrives in a real inbox (Gmail, Outlook)
  • SMS arrives on a real phone
  • DKIM + SPF + DMARC verified for sender domain (this is human-owned infra setup)
  • Bounces and complaints recorded in notifications + unsubscribes
  • Unsubscribe link in every email; clicking it adds the address to the suppression list
  • Templates render correctly in Gmail web, Outlook web, iOS Mail, Apple Mail (litmus.com or equivalent)

Effort: ~1.5–2 weeks (mostly template polish + deliverability setup).


Block E — CSV guest import

Why now: highest user-visible impact of any Tier 1 item, no dependency on other blocks except Block B's authz. Marketing already promises it.

Goal

A host can drag a .csv onto the dashboard and have hundreds of guests added in seconds. Validation surfaces problems before commit. Dedup is automatic.

Schema changes

None — uses existing guests table.

Backend

  • POST /events/{id}/guests/import – multipart/form-data, single CSV file
    • Header detection: tolerant of name|Name|guest_name, email|Email, phone|Phone|telephone, plus_ones|+1|plusones
    • Validation: name required, email format if present, phone E.164-ish if present, plus_ones non-negative integer
    • Dedup: skip rows whose email matches an existing guest on the same event
    • Returns: { added: int, skipped: int, errors: [{ row: int, reason: string }] }
    • Atomic per-batch: either all valid rows commit or none (transaction)
    • Limit: 5,000 rows per import
  • POST /events/{id}/guests/import/preview — same payload, but doesn't write; returns parsed rows for confirm UI
  • Sample CSV download: GET /events/{id}/guests/import/template — returns a .csv with example rows
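The header detection, row-level validation, and dedup rules above can be sketched as a streaming pass over the file with encoding/csv. This is a validation-only illustration (what the preview endpoint might run); it does not write to the database. ParseGuests and its helpers are hypothetical names, the phone check and the 5,000-row cap are omitted to keep the sketch short, and net/mail is an assumed choice for the email format check.

```go
package main

import (
	"encoding/csv"
	"errors"
	"fmt"
	"io"
	"net/mail"
	"strconv"
	"strings"
)

// RowError and ImportResult mirror the response shape listed above.
type RowError struct {
	Row    int    `json:"row"`
	Reason string `json:"reason"`
}

type ImportResult struct {
	Added   int        `json:"added"`
	Skipped int        `json:"skipped"`
	Errors  []RowError `json:"errors"`
}

// headerAliases maps accepted header spellings (lower-cased) to canonical names.
var headerAliases = map[string]string{
	"name": "name", "guest_name": "name", "email": "email",
	"phone": "phone", "telephone": "phone",
	"plus_ones": "plus_ones", "+1": "plus_ones", "plusones": "plus_ones",
}

// validateRow returns a reason string, or "" when the row is acceptable.
func validateRow(name, email, plusOnes string) string {
	if name == "" {
		return "name is required"
	}
	if email != "" {
		if addr, err := mail.ParseAddress(email); err != nil || addr.Address != email {
			return "invalid email"
		}
	}
	if plusOnes != "" {
		if n, err := strconv.Atoi(plusOnes); err != nil || n < 0 {
			return "plus_ones must be a non-negative integer"
		}
	}
	return ""
}

// ParseGuests reads rows one at a time, so memory use does not grow with file size.
func ParseGuests(r io.Reader, existingEmails map[string]bool) (ImportResult, error) {
	var res ImportResult
	cr := csv.NewReader(r)
	cr.FieldsPerRecord = -1 // tolerate short rows; missing cells read as blank
	header, err := cr.Read()
	if err != nil {
		return res, fmt.Errorf("read header: %w", err)
	}
	cols := map[string]int{}
	for i, h := range header {
		h = strings.ToLower(strings.TrimSpace(strings.TrimPrefix(h, "\ufeff"))) // drop UTF-8 BOM
		if canon, ok := headerAliases[h]; ok {
			cols[canon] = i // unknown columns are ignored
		}
	}
	if _, ok := cols["name"]; !ok {
		return res, errors.New("no name column found")
	}
	cell := func(rec []string, col string) string {
		if i, ok := cols[col]; ok && i < len(rec) {
			return strings.TrimSpace(rec[i])
		}
		return ""
	}
	for {
		rec, err := cr.Read()
		if err == io.EOF {
			return res, nil
		}
		if err != nil {
			return res, fmt.Errorf("read row: %w", err)
		}
		line, _ := cr.FieldPos(0) // 1-based line in the file, header included
		email := strings.ToLower(cell(rec, "email"))
		reason := validateRow(cell(rec, "name"), email, cell(rec, "plus_ones"))
		switch {
		case reason != "":
			res.Errors = append(res.Errors, RowError{Row: line, Reason: reason})
		case email != "" && existingEmails[email]:
			res.Skipped++ // dedup against guests already on the event
		default:
			res.Added++
		}
	}
}

// ParseGuestsString is a convenience wrapper for tests and small payloads.
func ParseGuestsString(data string, existing map[string]bool) (ImportResult, error) {
	return ParseGuests(strings.NewReader(data), existing)
}
```

UTF-16 input would need to be transcoded before it reaches the csv reader; this sketch only strips a UTF-8 byte-order mark from the first header cell.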

Frontend

  • New section on event detail page: "Import guests from a spreadsheet"
  • Drag-drop zone (use vue-file-pond or native HTML5 drag-drop)
  • After upload: hit /preview, show a sortable table of rows with row-level errors highlighted
  • "Looks good — import" button calls /import
  • Show success summary: "Imported 247 guests. Skipped 3 duplicates. 2 rows had errors."
  • Help text linking to the template CSV

Tests

  • Unit: header detection accepts the listed variants and rejects unknown columns gracefully
  • Unit: validation rejects bad emails, accepts blank emails (phone-only guests valid)
  • Integration: dedup leaves existing guests untouched
  • Integration: rolling back on mid-batch error doesn't leave partial state

Definition of done

  • Sample CSV downloadable from the import UI
  • Preview always shown before commit
  • Errors are row-level, not "the whole file is invalid"
  • Encoding: handles UTF-8 with BOM (Excel exports), UTF-16 (Mac Numbers exports)
  • File-size cap: 1MB / 5,000 rows enforced server-side
  • No memory blow-up: parse rows as a stream, not into a []Row of arbitrary size

Effort: ~3–5 days.


Block F — Billing

Why last in Wave 2: depends on real auth (Block A), real notifications (Block D, for receipts), and a stable data model. Don't build until those are solid.

Goal

Stripe-based subscriptions. Free tier with hard limits. Paid tiers unlock higher limits. Failed-payment dunning. Self-serve upgrade + downgrade.

Pricing model (decision required — see open questions)

Recommended starter pricing (placeholder, validate with target market):

Tier      Price      Events/mo       Guests/event  SMS/mo          Branding
Free      $0         1               50            0 (email only)  No
Personal  $19/event  1 per purchase  500           100             Logo
Pro       $49/mo     10              1,000         1,000           Full
Business  $199/mo    Unlimited       5,000         5,000           + custom domain

Schema changes

CREATE TABLE subscriptions (
  id                    UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  user_id               UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  stripe_customer_id    TEXT NOT NULL,
  stripe_subscription_id TEXT,
  tier                  TEXT NOT NULL,           -- 'free' | 'personal' | 'pro' | 'business'
  status                TEXT NOT NULL,           -- 'active' | 'past_due' | 'canceled' | 'incomplete'
  current_period_end    TIMESTAMPTZ,
  cancel_at_period_end  BOOLEAN NOT NULL DEFAULT FALSE,
  created_at            TIMESTAMPTZ NOT NULL DEFAULT now(),
  updated_at            TIMESTAMPTZ NOT NULL DEFAULT now()
);
CREATE UNIQUE INDEX ON subscriptions(user_id) WHERE status = 'active';

CREATE TABLE usage_counters (
  user_id      UUID NOT NULL REFERENCES users(id) ON DELETE CASCADE,
  period_start DATE NOT NULL,
  events_count INT NOT NULL DEFAULT 0,
  sms_count    INT NOT NULL DEFAULT 0,
  PRIMARY KEY (user_id, period_start)
);

Backend

  • internal/billing package wrapping the Stripe SDK
  • POST /billing/checkout-session — returns a Stripe Checkout URL for the requested tier
  • POST /billing/portal — returns a Stripe Customer Portal URL
  • POST /webhooks/stripe — signature-verified, handles:
    • customer.subscription.created / .updated / .deleted → upsert into subscriptions
    • invoice.payment_failed → trigger dunning email (Block D)
    • invoice.payment_succeeded → clear past-due state
  • Enforcement: middleware checks usage against tier limits before allowing POST /events, POST /events/{id}/guests, SMS triggers. Returns 402 Payment Required with the upgrade URL on limit.
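The enforcement step can be reduced to a pure lookup that the middleware calls before the handler runs. The sketch below copies the placeholder numbers from the pricing table, so it is only as final as that table; it also simplifies the Personal tier to one event per period even though that tier is sold per event. Limits, EventQuotaStatus, and GuestQuotaStatus are hypothetical names.

```go
package main

import "net/http"

// Unlimited marks a cap that is never enforced.
const Unlimited = -1

// Limits holds per-tier caps, copied from the placeholder pricing table.
type Limits struct {
	EventsPerMonth int
	GuestsPerEvent int
	SMSPerMonth    int
}

var tierLimits = map[string]Limits{
	"free":     {EventsPerMonth: 1, GuestsPerEvent: 50, SMSPerMonth: 0},
	"personal": {EventsPerMonth: 1, GuestsPerEvent: 500, SMSPerMonth: 100},
	"pro":      {EventsPerMonth: 10, GuestsPerEvent: 1000, SMSPerMonth: 1000},
	"business": {EventsPerMonth: Unlimited, GuestsPerEvent: 5000, SMSPerMonth: 5000},
}

// limitsFor falls back to the free tier for unknown tier names, so a bad
// subscription row can never grant more than the default.
func limitsFor(tier string) Limits {
	if l, ok := tierLimits[tier]; ok {
		return l
	}
	return tierLimits["free"]
}

func within(limit, used, adding int) bool {
	return limit == Unlimited || used+adding <= limit
}

// EventQuotaStatus returns 200 if one more event fits in the current
// period, otherwise 402.
func EventQuotaStatus(tier string, eventsThisPeriod int) int {
	if within(limitsFor(tier).EventsPerMonth, eventsThisPeriod, 1) {
		return http.StatusOK
	}
	return http.StatusPaymentRequired
}

// GuestQuotaStatus does the same for adding guests to a single event.
func GuestQuotaStatus(tier string, currentGuests, adding int) int {
	if within(limitsFor(tier).GuestsPerEvent, currentGuests, adding) {
		return http.StatusOK
	}
	return http.StatusPaymentRequired
}
```

The eventsThisPeriod argument would come from usage_counters.events_count for the current period_start; the 402 response body would then carry the upgrade URL.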

Frontend

  • /billing page: current plan, usage bars, upgrade/downgrade buttons
  • On 402, show modal: "You've hit your plan limit. Upgrade?"
  • Stripe Checkout opens in a new tab; on return, poll subscription state until updated by webhook

Tests

  • Integration: free user can create 1 event, second fails with 402
  • Integration: webhook signature verification rejects forged payloads
  • Integration: cancellation flow keeps access until period end

Definition of done

  • Stripe in test mode end-to-end working
  • Webhook signatures verified
  • Usage counters reset monthly (cron or compute on-demand)
  • Receipts emailed via Stripe (default behaviour, just confirm enabled)
  • Refund policy documented (referenced from billing page)

Effort: ~2 weeks.


Block G — Backups & disaster recovery

Mostly infra-owned, but the application side has documentation work.

Claude's scope

  • All migrations have a *.down.sql that's been tested locally
  • New docs/RUNBOOK_RESTORE.md documenting the restore procedure step-by-step
  • Confirm Postgres connection string env var supports the recovery instance (no hardcoded primary-only hostnames)
  • Optional: a cmd/restore-verify tool that runs after a restore to assert schema invariants (guest counts ≈ rsvp counts, no orphaned tokens, etc.)

Human / infra scope

  • pg_basebackup + WAL archiving to S3
  • Daily logical dump as a secondary safety net
  • Cross-region replication of the S3 bucket
  • Monthly restore drill scheduled
  • Documented RTO (e.g. 1 hour) and RPO (e.g. 5 minutes)

Definition of done

  • Every existing migration has a tested down migration
  • docs/RUNBOOK_RESTORE.md exists and a fresh engineer could follow it
  • First restore drill completed successfully

Effort: ~2 days for the application-side work.


Block H — Privacy compliance

Legal documents are human-owned. Application-level support is Claude's scope.

Claude's scope

  • GET /me/data-export — streams a JSON document with every record (user, events, guests, tokens, RSVPs, access_logs, notifications) belonging to the authenticated user. Long-running, so async: enqueue → email a link.
  • DELETE /me — cascade-deletes the user and everything tied to them. Soft-delete first (set deleted_at), hard-delete on a cron after 30 days to honour any in-flight legal holds.
  • DELETE /events/{id}/guests/{guest_id} (host-triggered) — already exists in spirit; add a "forget this guest" action that removes RSVP/access-log rows but keeps the aggregate counter for the event.
  • Data retention: automated nightly job to soft-delete events whose event_date is older than 18 months (configurable per host once Tier 2).
  • Add privacy_policy_accepted_at and terms_accepted_at columns to users; block first login until both are accepted.

Human / legal scope

  • Privacy policy + ToS drafted by a lawyer
  • DPAs signed with Twilio, SES, Stripe, MaxMind, and any other subprocessor
  • Public privacy page at /privacy, ToS at /terms
  • Cookie banner (only required if analytics are added; currently we have none)
  • GDPR Article 30 record of processing activities

Definition of done

  • GET /me/data-export produces a complete, parseable JSON dump
  • DELETE /me cascades correctly with no orphan rows (verified by FK constraints)
  • Privacy + ToS pages live and linked from the footer + signup form
  • Acceptance enforced on first login after the launch date
  • Retention cron job tested

Effort: ~3 days for the application-side work.

Cross-cutting concerns

These touch most blocks above; bake them in as you go, not as a separate pass.

Logging + auditing

Every state-changing endpoint logs: userID, action, target_id, result, request_id. Use slog with a correlation ID middleware. Critical for post-incident forensics.
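A minimal sketch of the correlation-ID middleware and the audit log line, using log/slog from the standard library (Go 1.21+). WithRequestID, Audit, and auditAttrs are hypothetical names, and honouring an inbound X-Request-ID header is an assumption; a deployment behind an untrusted edge may prefer to always generate its own.

```go
package main

import (
	"context"
	"crypto/rand"
	"encoding/hex"
	"log/slog"
	"net/http"
)

type requestIDKey struct{}

// newRequestID returns 32 hex characters, or a fixed marker if the system
// has no entropy available.
func newRequestID() string {
	b := make([]byte, 16)
	if _, err := rand.Read(b); err != nil {
		return "no-entropy"
	}
	return hex.EncodeToString(b)
}

// WithRequestID attaches a correlation ID to the request context and echoes
// it in the response so clients can quote it in bug reports.
func WithRequestID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Request-ID")
		if id == "" {
			id = newRequestID()
		}
		w.Header().Set("X-Request-ID", id)
		ctx := context.WithValue(r.Context(), requestIDKey{}, id)
		next.ServeHTTP(w, r.WithContext(ctx))
	})
}

// auditAttrs keeps the five audit fields named above in one place.
func auditAttrs(userID, action, targetID, result, requestID string) []any {
	return []any{
		"user_id", userID,
		"action", action,
		"target_id", targetID,
		"result", result,
		"request_id", requestID,
	}
}

// Audit writes one structured log line for a state-changing action.
func Audit(ctx context.Context, logger *slog.Logger, userID, action, targetID, result string) {
	id, _ := ctx.Value(requestIDKey{}).(string)
	logger.InfoContext(ctx, "audit", auditAttrs(userID, action, targetID, result, id)...)
}
```

A handler would call Audit(r.Context(), logger, userID, "event.delete", eventID, "ok") after the state change commits, so the log reflects what actually happened.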

Observability lite (Tier 3 scope, but minimum viable for launch)

  • Prometheus /metrics endpoint on the API exposing: request rate by endpoint, latency percentiles, 4xx/5xx counts, ratelimit_block_total
  • Sentry (or self-hosted GlitchTip) for unhandled errors, with release tagging

Feature flags

Lightweight feature_flags table or env-var driven (no LaunchDarkly yet). Useful for rolling out Block F's billing without exposing it to all users at once.


Open questions

Resolve before starting:

  1. Final pricing tiers — the table in Block F is a placeholder. Confirm with the target market (interview 10 wedding planners, 10 corporate event managers).
  2. Email provider — SES vs Postmark vs SendGrid. SES is cheapest but has the harshest deliverability ramp; Postmark is best for transactional but pricier.
  3. 2FA at launch or v1.1? — Recommend v1.1; one less moving piece on the launch path.
  4. Custom domain for RSVP pages at launch or v1.1? — Recommend v1.1 (Tier 2). Adds DNS + cert complexity.
  5. WebSocket auth mechanism — Recommend Block B option 3 (short-lived ticket).
  6. EU data residency at launch? — If targeting EU customers, this becomes Tier 1 (separate EU deployment). Otherwise defer to Tier 4.

Sequencing summary table

Wave  Block             Depends on      Effort (1 eng)  Can parallelise with
1     A. Auth           none            2w              none
1     B. Authz          A               4d              C
1     C. Rate limiting  A (for userID)  4d              B
2     D. Notifications  A               2w              E
2     E. CSV import     B               4d              D, F
2     F. Billing        A, D            2w              E
3     G. Backups        none (infra)    2d (Claude)     any
3     H. Privacy        A               3d              any

One engineer, sequential: ~9 weeks. Two engineers, parallel-where-possible: ~5.5 weeks.


What's not in Tier 1 (deliberate)

These are tempting but are Tier 2:

  • Editable RSVPs (guests can change response after submitting)
  • Multi-host collaborators
  • Event branding (logo, colours, custom domain)
  • Day-of QR check-in
  • Better fraud-engine thresholds (false-positive feedback loop)
  • Calendar integration
  • Auto-reminders (1-day before, etc.)
  • Mobile push notifications

Ship Tier 1 first. The launch story is "personal invitations + live tracking + quiet fraud detection + works reliably + you can pay us money". Everything else is the second release.