Tempo V3 UI + Fastify BFF
Ground-up rewrite of Walmart’s merchandising CMS authoring tool
Replacing the legacy Electrode V1 + GraphQL + React 16 stack with Next.js 15 App Router on React 19, fronted by a Fastify BFF that fans out to 15 downstream Walmart services. Live across 38 storefronts.
The problem
V2 was React 17 + Redux Toolkit + Apollo + a Koa BFF — slow dev loop, monolithic GraphQL resolvers, brittle module-definition form handling, and inconsistent UX with the rest of Walmart’s Living Design system. Adding a new downstream service required touching 15 duplicated resolver layers.
Why Fastify wrapping Next.js (not stock next start)
Walmart’s existing Saber/Electrode operational tooling (CCM2, electrode-tracing, electrode-ui-logger, electrode-prometheus, sso-pingfed) is Fastify-shaped. @walmart/wml-server-fastify wraps Next.js as a custom Node server, preserving the entire ops surface while getting App Router + React Server Components on top.
One BFF gateway for 15 services
A unified /api/proxy Fastify route fans out to Tempo Service, Tempo Runtime, Pronto, Tango, IronBank, Legato, Asset Service, P13N, RMA, Normalize, SEO/Tejas, DAL, Portal, Translation, and CCM — collapsing 15 duplicated GraphQL resolver layers from V2.
Adding a new downstream service is a ~50-line typed-client template against the same proxy contract. The proxy enforces SSRF protection (a URL allowlist), normalizes headers, and adds defensive content-type parsing for downstreams that return JSON without declaring a JSON Content-Type.
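The URL-allowlist check can be sketched roughly as follows. This is a minimal illustration, not the production proxy: the host names, `ALLOWED_HOSTS`, and `checkTarget` are hypothetical, and the real allowlist and rejection rules are not described in this write-up.

```typescript
// Hypothetical allowlist; the real downstream host names are not given here.
const ALLOWED_HOSTS: ReadonlySet<string> = new Set([
  "tempo-service.example.internal",
  "p13n.example.internal",
]);

type AllowlistResult =
  | { ok: true; url: URL }
  | { ok: false; reason: string };

// Parse the requested target and reject anything that is not an HTTPS URL
// pointing at an allowlisted host.
function checkTarget(
  raw: string,
  allowed: ReadonlySet<string> = ALLOWED_HOSTS,
): AllowlistResult {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return { ok: false, reason: "unparseable URL" };
  }
  if (url.protocol !== "https:") {
    return { ok: false, reason: "scheme not allowed" };
  }
  // "https://trusted@evil.example/" parses with hostname "evil.example";
  // refusing embedded credentials avoids that confusion entirely.
  if (url.username !== "" || url.password !== "") {
    return { ok: false, reason: "credentials in URL" };
  }
  if (!allowed.has(url.hostname)) {
    return { ok: false, reason: "host not on allowlist" };
  }
  return { ok: true, url };
}
```

Matching on the parsed `hostname` rather than on a string prefix is the point of the sketch: prefix checks are easy to bypass with look-alike hosts or userinfo tricks.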
W3C trace propagation — first end-to-end tracing in Tempo’s history
Wrote a traceparent validator + forwarder for every outbound fetch(), mirroring the trace id onto a legacy X-Trace-ID header for backward compatibility with the Java backend’s interceptor. Validated >92% propagation success rate in stage (1,284 of 1,387 inbound requests over 2 hours) — unblocking end-to-end distributed tracing for the first time across the BFF ↔ Tempo Service ↔ Tempo Runtime chain.
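A traceparent validator and forwarder along these lines might look like the sketch below. The function names are hypothetical; the format check follows the four-field W3C Trace Context layout (version, 32-hex trace id, 16-hex parent id, 2-hex flags). For simplicity the sketch forwards the inbound parent id unchanged, whereas a real tracer would normally substitute its own span id.

```typescript
// version - trace-id - parent-id - flags, lowercase hex only.
const TRACEPARENT_RE =
  /^([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$/;

interface TraceContext {
  version: string;
  traceId: string;
  parentId: string;
  flags: string;
}

// Returns the parsed context, or null when the header is missing or invalid.
function parseTraceparent(value: string | undefined): TraceContext | null {
  if (value === undefined) return null;
  const m = TRACEPARENT_RE.exec(value.trim());
  if (m === null) return null;
  const [, version, traceId, parentId, flags] = m;
  if (version === "ff") return null; // version ff is forbidden by the spec
  // All-zero trace ids and parent ids are invalid.
  if (/^0+$/.test(traceId) || /^0+$/.test(parentId)) return null;
  return { version, traceId, parentId, flags };
}

// Builds the headers to attach to an outbound fetch(): the validated
// traceparent plus the trace id mirrored onto the legacy X-Trace-ID header.
function outboundTraceHeaders(
  inbound: Record<string, string | undefined>,
): Record<string, string> {
  const ctx = parseTraceparent(inbound["traceparent"]);
  if (ctx === null) return {};
  return {
    traceparent: `${ctx.version}-${ctx.traceId}-${ctx.parentId}-${ctx.flags}`,
    "X-Trace-ID": ctx.traceId,
  };
}
```

Dropping an invalid header instead of forwarding it keeps malformed ids out of the downstream trace store, at the cost of losing correlation for that request.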
Three classes of production bugs eliminated
A body-based proxy contract (application/proxy-service-json) fixed HTTP 431 (Request Header Fields Too Large) errors on ~42 KB Prism module submissions that previously failed at the gateway header-size limit. Defensive content-type parsing handled P13N’s JSON responses that arrive without a JSON Content-Type. SSRF protection via URL-allowlist validation closed an outbound-call attack surface.
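Defensive content-type parsing can be illustrated with a small helper like the one below. It is a sketch under assumptions: `parseDownstreamBody` and the `ParsedBody` shape are hypothetical, and the heuristic (try JSON when the header says so or the body starts with `{` or `[`) is one plausible approach rather than the proxy's confirmed logic.

```typescript
type ParsedBody =
  | { kind: "json"; value: unknown }
  | { kind: "text"; value: string };

// Trust the Content-Type header when it declares JSON, but also attempt a
// JSON parse when the body merely looks like JSON. Anything that fails to
// parse is returned as plain text instead of throwing.
function parseDownstreamBody(
  contentType: string | undefined,
  body: string,
): ParsedBody {
  const declaredJson = /^application\/(?:[\w.-]+\+)?json\b/i.test(
    (contentType ?? "").trim(),
  );
  const trimmed = body.trim();
  const looksLikeJson = trimmed.startsWith("{") || trimmed.startsWith("[");
  if (declaredJson || looksLikeJson) {
    try {
      return { kind: "json", value: JSON.parse(trimmed) };
    } catch {
      // Fall through: mislabeled or truncated payloads are treated as text.
    }
  }
  return { kind: "text", value: body };
}
```

Returning a tagged union forces callers to handle the text case explicitly, so a mislabeled response cannot silently reach code that expects an object.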
Auth simplification: IAM + RMA → RMA-only
Replaced the two-system auth chain with RMA-only authorization through the Role Manager Auth Engine, enforcing access at tenant × pageType granularity. Cut outage modes from two to one and simplified the on-call triage playbook.
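Authorization at tenant × pageType granularity amounts to a lookup keyed on both values. The sketch below is purely illustrative: the `Grant` shape, the example tenant and page-type names, and both functions are hypothetical and do not reflect the Role Manager Auth Engine's actual API.

```typescript
// Hypothetical grant record: one entry per (tenant, pageType) pair a user may edit.
interface Grant {
  tenant: string;
  pageType: string;
}

// A NUL separator cannot appear in ordinary slugs, so keys never collide.
const SEP = "\u0000";

function buildGrantIndex(grants: readonly Grant[]): ReadonlySet<string> {
  return new Set(grants.map((g) => `${g.tenant}${SEP}${g.pageType}`));
}

// Access is allowed only when the exact (tenant, pageType) pair was granted.
function canEdit(
  index: ReadonlySet<string>,
  tenant: string,
  pageType: string,
): boolean {
  return index.has(`${tenant}${SEP}${pageType}`);
}
```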
Tenant-aware Edge middleware
Next.js Edge middleware handles /{tenant}/ URL rewrites across all 38 tenants, cookie synchronization, feature-flag-driven legacy fallback to V2, and open-redirect protection. Enables a safe, route-by-route migration ramp without a flag-day cutover.
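Two of the middleware's concerns, tenant extraction from the path and open-redirect protection, can be sketched as pure functions. Everything named here is an assumption for illustration: the tenant slugs, `splitTenant`, `safeRedirectTarget`, and the example origin are hypothetical, and the actual middleware wires its logic into Next.js request/response objects that are omitted here.

```typescript
// Hypothetical tenant slugs; the real list covers 38 tenants.
const TENANTS: ReadonlySet<string> = new Set(["us-grocery", "ca-home"]);

// Splits "/{tenant}/rest/of/path" into its tenant and the remaining path.
// Returns null when the first segment is not a known tenant.
function splitTenant(
  pathname: string,
): { tenant: string; rest: string } | null {
  const [, first = "", ...rest] = pathname.split("/");
  if (!TENANTS.has(first)) return null;
  return { tenant: first, rest: "/" + rest.join("/") };
}

// Accepts a redirect target only if it resolves to the same origin;
// otherwise returns the fallback path.
function safeRedirectTarget(
  candidate: string | null,
  origin: string,
  fallback = "/",
): string {
  if (candidate === null) return fallback;
  // Backslashes and control characters can be normalized into "//host" forms.
  if (/[\\\u0000-\u001f]/.test(candidate)) return fallback;
  let resolved: URL;
  try {
    resolved = new URL(candidate, origin);
  } catch {
    return fallback;
  }
  if (resolved.origin !== new URL(origin).origin) return fallback;
  return resolved.pathname + resolved.search + resolved.hash;
}
```

Resolving the candidate against the site's own origin and comparing origins handles absolute URLs and protocol-relative `//host` targets with one check, instead of pattern-matching each variant.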
What I shipped
- Migrated the authoring UI from Electrode V1 + GraphQL + React 16 to Next.js 15 + React 19 on a Fastify custom server — cut cold-start build time by ~60% and shrank the client bundle through Server Components and TanStack Query streaming Suspense.
- Designed and shipped a unified /api/proxy Fastify BFF gateway fanning out to 15 downstream Walmart services — collapsed 15 duplicated GraphQL resolver layers and reduced "add new downstream service" to a ~50-line typed-client template.
- Implemented W3C Trace Context propagation across the BFF — validated >92% propagation success rate (1,284 / 1,387 requests over 2 hours) in stage.
- Eliminated three classes of production bugs (HTTP 431 header-size failures, mislabeled JSON responses, and unvalidated outbound calls) via a body-based proxy contract, defensive content-type parsing, and SSRF protection.
- Replaced the IAM + RMA auth chain with RMA-only authorization at tenant × pageType granularity — cut outage modes from two systems to one.
- Built tenant-aware Next.js Edge middleware handling /{tenant}/ URL rewrites across 38 tenants, cookie sync, feature-flag-driven legacy fallback, and open-redirect protection.
- Stood up the observability spine (OpenObserve dashboards, Quantum Metric RUM, Prometheus metrics, and a Playwright e2e suite hitting live stage), surfacing per-user traffic patterns (79 unique editors, 1,857 calls over 4 days) and identifying ~57% bot-driven volume.
- Deployed multi-region active-active across three production clusters (uswest, uscentral, useast4) under the KITT nextjs-electrode-v1 profile with Concord + LooperPro pipelines and Akeyless-managed signature secrets.