Tempo Service
The Java authoring backend behind Tempo — 38 tenants, Oracle → Postgres on GCP
The system of record for module instances, published versions, draft/staging state, schedules, triggers, targeting payloads, layouts, page-type metadata, SEO overrides, and audit history across 38 production tenants. Rated business-critical in Service Registry.
The problem
Tempo Service is the spinal cord of authoring — a multi-minute outage stalls every Walmart publish across 38 tenants. The legacy Oracle environment had to be migrated to GCP Postgres in flight, without an authoring-side flag-day, while continuing to absorb ~65 Seed-monorepo PRs per month from teams the Tempo team doesn’t gate.
Module lifecycle as a state machine in SQL
A module instance flows draft_module → module_version (+ version_trigger, targeting) → published, with published_token augmenting the read side for high-volume runtime serving. A single published row per DRAFT_MODULE_PK marks the version that is live on the site, so the runtime’s join is bounded.
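The lifecycle above can be sketched as a small state machine. This is an illustration only: the enum, the transition set (including the assumption that a published module is superseded by a new version), and the in-memory PublishedIndex are hypothetical stand-ins, not the service's schema or code.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical lifecycle states, named after the tables in the text.
enum ModuleState {
    DRAFT_MODULE, MODULE_VERSION, PUBLISHED;

    // Assumed transitions: draft -> version -> published, and a published
    // module can be superseded by another version (republish or rollback).
    Set<ModuleState> next() {
        switch (this) {
            case DRAFT_MODULE:   return EnumSet.of(MODULE_VERSION);
            case MODULE_VERSION: return EnumSet.of(PUBLISHED);
            case PUBLISHED:      return EnumSet.of(MODULE_VERSION);
            default:             return EnumSet.noneOf(ModuleState.class);
        }
    }

    boolean canMoveTo(ModuleState target) {
        return next().contains(target);
    }
}

// In-memory stand-in for the published table, keyed by DRAFT_MODULE_PK.
final class PublishedIndex {
    private final Map<Long, Long> liveVersionByDraftPk = new HashMap<>();

    // Publishing overwrites any earlier entry, so each draft PK maps to
    // at most one live version.
    void publish(long draftModulePk, long moduleVersionId) {
        liveVersionByDraftPk.put(draftModulePk, moduleVersionId);
    }

    Optional<Long> liveVersion(long draftModulePk) {
        return Optional.ofNullable(liveVersionByDraftPk.get(draftModulePk));
    }
}
```

Keying the live row by draft PK is what keeps the read-side lookup bounded: publishing version 2 of a module replaces version 1 instead of adding a second live row.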
Triggers (URLs, page IDs, search terms, shelves, categories) and zone restrictions are enforced server-side at publish time, not in the UI — so a misbehaving client can’t produce a publish state the runtime can’t serve.
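A rough sketch of what a server-side publish check could look like. The trigger kinds come from the list above; the class names, the zone names in the usage note, and the specific rules (allowed-zone set, at least one trigger, non-blank values) are assumptions for illustration, not the service's actual validation logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Trigger kinds listed in the text.
enum TriggerType { URL, PAGE_ID, SEARCH_TERM, SHELF, CATEGORY }

// Hypothetical value holder for one trigger on a publish request.
final class Trigger {
    final TriggerType type;
    final String value;

    Trigger(TriggerType type, String value) {
        this.type = type;
        this.value = value;
    }
}

// Hypothetical validator that runs on the server at publish time.
final class PublishValidator {
    private final Set<String> allowedZones;

    PublishValidator(Set<String> allowedZones) {
        this.allowedZones = allowedZones;
    }

    // Returns every violation found; an empty list means the publish may proceed.
    List<String> validate(String zone, List<Trigger> triggers) {
        List<String> errors = new ArrayList<>();
        if (zone == null || !allowedZones.contains(zone)) {
            errors.add("zone not allowed: " + zone);
        }
        if (triggers.isEmpty()) {
            errors.add("at least one trigger is required");
        }
        for (Trigger t : triggers) {
            if (t.value == null || t.value.trim().isEmpty()) {
                errors.add("empty value for trigger type " + t.type);
            }
        }
        return errors;
    }
}
```

With allowed zones of "hero" and "carousel", a request for zone "footer" with a blank SHELF trigger comes back with two errors, whatever the client UI allowed.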
Self-healing the runtime cache
A runtime-validator cron runs every 2 minutes, reconciling published modules in the authoring DB against the runtime Meghacache and republishing any drift. Editorial changes propagate in seconds; any cache drift is repaired on the next validator pass, so it heals itself within roughly two minutes.
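A minimal sketch of one reconciliation pass, assuming both sides can be viewed as a map from module key to published version. The class name, the map-based model, and the in-process scheduler (used here in place of the real cron) are assumptions, not the validator's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical reconciler: the authoring map is the source of truth and
// the runtime cache is repaired to match it.
final class RuntimeValidator {

    // One pass: returns the module keys that had drifted and were republished.
    static List<String> reconcileOnce(Map<String, String> authoring,
                                      Map<String, String> runtimeCache) {
        List<String> republished = new ArrayList<>();
        for (Map.Entry<String, String> entry : authoring.entrySet()) {
            String cachedVersion = runtimeCache.get(entry.getKey());
            if (!entry.getValue().equals(cachedVersion)) {
                // Missing or stale in the cache: write the authoritative value.
                runtimeCache.put(entry.getKey(), entry.getValue());
                republished.add(entry.getKey());
            }
        }
        return republished;
    }

    // Runs a pass every 2 minutes, mirroring the interval in the text.
    static ScheduledExecutorService start(Map<String, String> authoring,
                                          Map<String, String> runtimeCache) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(
                () -> reconcileOnce(authoring, runtimeCache), 0, 2, TimeUnit.MINUTES);
        return scheduler;
    }
}
```

A pass is idempotent: running it again right after a repair republishes nothing, which makes a fixed schedule safe.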
Cache-aside on the hot path
PublishedToken reads are served cache-aside via Walmart Pallet’s CacheService over Meghacache. Misses hit Postgres, hydrate the cache, and emit a Kafka event so downstream caches can warm in lockstep.
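The read path above follows the standard cache-aside pattern, sketched below. This does not use Pallet's CacheService API: the ConcurrentHashMap stands in for Meghacache, the Function for the Postgres lookup, and the Consumer for the Kafka emit, and every name is hypothetical.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;
import java.util.function.Function;

// Hypothetical cache-aside reader for published tokens.
final class PublishedTokenReader {
    private final Map<String, String> cache = new ConcurrentHashMap<>(); // stand-in for the cache tier
    private final Function<String, String> database;                     // stand-in for the Postgres read
    private final Consumer<String> missEvents;                           // stand-in for the Kafka emit

    PublishedTokenReader(Function<String, String> database, Consumer<String> missEvents) {
        this.database = database;
        this.missEvents = missEvents;
    }

    String read(String tokenKey) {
        String cached = cache.get(tokenKey);
        if (cached != null) {
            return cached; // hit: the database is never touched
        }
        String loaded = database.apply(tokenKey); // miss: fall through to the database
        if (loaded != null) {
            cache.put(tokenKey, loaded);  // hydrate so the next read is a hit
            missEvents.accept(tokenKey);  // let downstream caches warm as well
        }
        return loaded;
    }
}
```

Reading the same key twice calls the database once and emits one event; only the first read pays the miss cost.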
Kafka as the spine to downstream consumers
Every state change emits to KAFKA-V2-TEMPO-MOD-PROD, consumed by the Content Sync Service (which writes Cassandra for Tempo Runtime), Tango, the GraphQL CLS/OL layer, analytics, and re-indexing. The producer surface is centralized through KafkaProducerManager + SimpleKafkaProducer, so producer configuration is defined in one place rather than per call site, which guards against config drift.
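The idea of a single producer surface can be sketched as follows. This does not reproduce KafkaProducerManager or SimpleKafkaProducer, whose APIs are not shown in this write-up; ProducerHub, its methods, and the BiConsumer transport are hypothetical, and no Kafka client library is used.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

// Hypothetical single entry point for emitting events.
final class ProducerHub {
    private final Map<String, String> config;            // frozen at construction
    private final BiConsumer<String, String> transport;  // stand-in for the real producer

    ProducerHub(Map<String, String> config, BiConsumer<String, String> transport) {
        // Defensive copy plus unmodifiable view: callers cannot alter
        // producer settings after startup.
        this.config = Collections.unmodifiableMap(new HashMap<>(config));
        this.transport = transport;
    }

    Map<String, String> config() {
        return config;
    }

    // Every call site goes through this method and cannot supply its own settings.
    void emit(String topic, String payload) {
        transport.accept(topic, payload);
    }
}
```

Because call sites only ever see emit(topic, payload), there is no per-caller configuration that could diverge from the shared one.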
Why CXF (not Spring MVC)
Apache CXF lets the same Java service expose a JAX-RS surface under /services/* while keeping Spring Boot 3 underneath. Mature interceptors plug in for tenant header propagation, structured error envelopes, and W3C trace forwarding to downstreams — a more surgical fit for Walmart’s existing interceptor conventions than Spring MVC filters.
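The header logic such an interceptor might host can be sketched in plain Java. CXF's interceptor API is deliberately not shown. The tenant header name X-Tenant-Id, the class name, and the choice of flags 01 for a newly started trace are assumptions; the traceparent layout (version, 32-hex trace-id, 16-hex parent-id, 2-hex flags) follows the W3C Trace Context format.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper that builds the headers sent to a downstream service.
final class OutboundHeaders {
    static final String TENANT_HEADER = "X-Tenant-Id"; // assumed header name
    static final String TRACEPARENT = "traceparent";

    // Simplified check: version 00 only, and all-zero ids are not rejected here.
    private static final Pattern TRACEPARENT_FORMAT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");

    // Header lookup is case-sensitive in this sketch; real HTTP headers are not.
    static Map<String, String> from(Map<String, String> inbound) {
        Map<String, String> outbound = new HashMap<>();

        String tenant = inbound.get(TENANT_HEADER);
        if (tenant == null || tenant.trim().isEmpty()) {
            throw new IllegalArgumentException("missing tenant header");
        }
        outbound.put(TENANT_HEADER, tenant);

        String incoming = inbound.get(TRACEPARENT);
        Matcher m = incoming == null ? null : TRACEPARENT_FORMAT.matcher(incoming);
        String spanId = hex16(); // this hop becomes the parent of the downstream span
        if (m != null && m.matches()) {
            // Keep the caller's trace-id and flags; replace only the parent-id.
            outbound.put(TRACEPARENT, "00-" + m.group(1) + "-" + spanId + "-" + m.group(3));
        } else {
            // No usable context: start a new trace.
            outbound.put(TRACEPARENT, "00-" + hex16() + hex16() + "-" + spanId + "-01");
        }
        return outbound;
    }

    private static String hex16() {
        return String.format("%016x", ThreadLocalRandom.current().nextLong());
    }
}
```

Keeping the trace-id constant across hops while replacing the parent-id is what lets a tracing backend stitch the UI, service, and runtime spans into one trace.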
What I shipped
- Contributed to module versioning, publish, and rollback flows across draft_module, module_version, version_trigger, targeting, and published.
- Hardened tenant header propagation and W3C trace forwarding through CXF interceptors so the V3 UI → Tempo Service → Tempo Runtime chain is fully traceable end-to-end.
- Tuned cache-aside semantics on the PublishedToken hot path — cache hit ratios and Kafka invalidation events are now load-bearing for runtime cache warmth.
- Participated in the Oracle → GCP Postgres modernization (the *-post environments), validating Hibernate dialects and query plans against the new dialect.
- Shipped multi-region prod deployments across useast4 / uscentral / EDC, with Istio mTLS sidecars and Akeyless-managed secrets, through the KITT pipeline.