Neelesh K.
Tempo · Backend · Backend contributor · 2023 – present · In production

Tempo Service

The Java authoring backend behind Tempo — 38 tenants, Oracle → Postgres on GCP

  • Authoring traffic: ~27 TPS (p95 ≈ 2.5s, bulk tail to 15s)
  • Tenants: 38 (WM_GLASS, CA_GLASS, SAMS, ASDA, Mexico…)
  • Seed PRs / month: ~65, inbound from many teams
  • Cloud spend: ~$767 / day (compute + DB + Kafka)

Tempo Service is the system of record for module instances, published versions, draft/staging state, schedules, triggers, targeting payloads, layouts, page-type metadata, SEO overrides, and audit history across 38 production tenants. It is rated business-critical in Service Registry.

The problem

Tempo Service is the spinal cord of authoring: a multi-minute outage stalls every Walmart publish across all 38 tenants. The legacy Oracle environment had to be migrated to GCP Postgres in flight, with no authoring-side flag day. During the migration the service kept absorbing ~65 Seed-monorepo PRs per month from teams the Tempo team doesn’t gate.

Module lifecycle as a state machine in SQL

A module instance moves from draft_module → module_version (plus its version_trigger and targeting rows) → published. published_token augments the read side for high-volume runtime serves. There is one published row per DRAFT_MODULE_PK, identifying the version that’s live on the site, so the runtime’s join is bounded.
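Here is a minimal Java sketch of the lifecycle rule, using an in-memory model. The `Stage` enum, `ModuleLifecycleSketch` and its methods are hypothetical names. The map stands in for a unique key on DRAFT_MODULE_PK in the published table. The sketch shows why the runtime join stays bounded: publishing or rolling back replaces the one live row rather than adding another.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// Hypothetical in-memory model of the lifecycle. The real tables are draft_module,
// module_version (+ version_trigger, targeting) and published.
final class ModuleLifecycleSketch {
    enum Stage { DRAFT, VERSIONED, PUBLISHED }

    // Allowed forward transitions: a draft is snapshotted into a version, a version is published.
    private static final Map<Stage, Set<Stage>> ALLOWED = Map.of(
            Stage.DRAFT, EnumSet.of(Stage.VERSIONED),
            Stage.VERSIONED, EnumSet.of(Stage.PUBLISHED),
            Stage.PUBLISHED, EnumSet.noneOf(Stage.class));

    static boolean canTransition(Stage from, Stage to) {
        return ALLOWED.get(from).contains(to);
    }

    // Stand-in for the published table: one live version per DRAFT_MODULE_PK,
    // as a unique key on that column would enforce.
    private final Map<Long, Long> liveVersionByDraftPk = new HashMap<>();

    // Publishing (or rolling back to an older version) replaces the single live row
    // and returns the version it displaced, if any.
    Optional<Long> publish(long draftModulePk, long moduleVersionPk) {
        return Optional.ofNullable(liveVersionByDraftPk.put(draftModulePk, moduleVersionPk));
    }

    Optional<Long> liveVersion(long draftModulePk) {
        return Optional.ofNullable(liveVersionByDraftPk.get(draftModulePk));
    }

    int liveRowCount() {
        return liveVersionByDraftPk.size();
    }
}
```

In SQL, the same invariant would typically come from a unique constraint on DRAFT_MODULE_PK plus an upsert. The map’s `put` plays that role here.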

Triggers (URLs, page IDs, search terms, shelves, categories) and zone restrictions are enforced server-side at publish time, not in the UI — so a misbehaving client can’t produce a publish state the runtime can’t serve.
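The following hedged sketch shows publish-time validation performed in the service rather than the UI. The trigger kinds come from the text. The `Trigger`, `PublishRequest` and `PublishValidator` types, the blank-value rule and the module-type-to-zone table are illustrative assumptions, not Tempo’s actual rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Trigger kinds named in the text; the record shapes below are hypothetical.
enum TriggerType { URL, PAGE_ID, SEARCH_TERM, SHELF, CATEGORY }

record Trigger(TriggerType type, String value) {}

record PublishRequest(String moduleType, String zone, List<Trigger> triggers) {}

final class PublishValidator {
    // Hypothetical zone-restriction table: module type -> zones it may be placed in.
    private final Map<String, Set<String>> allowedZonesByModuleType;

    PublishValidator(Map<String, Set<String>> allowedZonesByModuleType) {
        this.allowedZonesByModuleType = Map.copyOf(allowedZonesByModuleType);
    }

    // Returns every violation found; an empty list means the publish may proceed.
    List<String> validate(PublishRequest req) {
        List<String> errors = new ArrayList<>();
        for (Trigger t : req.triggers()) {
            // Hypothetical rule: a trigger must carry a non-blank value.
            if (t.value() == null || t.value().isBlank()) {
                errors.add("blank " + t.type() + " trigger");
            }
        }
        // Null-safe lookup: immutable maps and sets reject null keys and elements.
        Set<String> zones = req.moduleType() == null
                ? Set.of()
                : allowedZonesByModuleType.getOrDefault(req.moduleType(), Set.of());
        if (req.zone() == null || !zones.contains(req.zone())) {
            errors.add("zone " + req.zone() + " not allowed for " + req.moduleType());
        }
        return errors;
    }
}
```

Collecting every violation instead of failing on the first lets a single rejected publish report all of its problems to the client at once.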

Self-healing the runtime cache

A runtime-validator cron runs every 2 minutes. It reconciles published modules in the authoring DB against the runtime Meghacache and republishes any drift. Editorial changes propagate in seconds, and cache drift is healed on the next validator pass, so within about two minutes.
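Below is a minimal sketch of one reconciliation pass and its 2-minute schedule. It assumes both sides can be read as module-key → published-version maps. `RuntimeValidatorSketch` and the `Republisher` hook are hypothetical. The authoring DB is treated as the source of truth, so only the cache side is ever corrected.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

final class RuntimeValidatorSketch {
    // Hypothetical hook standing in for the real republish path.
    interface Republisher {
        void republish(String moduleKey, long publishedVersion);
    }

    // One reconciliation pass. Any module whose cached version is missing or different
    // from the authoring DB is republished. Returns the drifted keys in sorted order.
    static List<String> reconcile(Map<String, Long> authoringPublished,
                                  Map<String, Long> runtimeCache,
                                  Republisher republisher) {
        List<String> drifted = new ArrayList<>();
        for (Map.Entry<String, Long> e : new TreeMap<>(authoringPublished).entrySet()) {
            if (!Objects.equals(e.getValue(), runtimeCache.get(e.getKey()))) {
                republisher.republish(e.getKey(), e.getValue());
                drifted.add(e.getKey());
            }
        }
        return drifted;
    }

    // Runs a pass every 2 minutes. scheduleAtFixedRate never runs the same task concurrently,
    // but an exception thrown from `pass` cancels all later runs, so `pass` should catch its own errors.
    static ScheduledFuture<?> scheduleEveryTwoMinutes(ScheduledExecutorService ses, Runnable pass) {
        return ses.scheduleAtFixedRate(pass, 0, 2, TimeUnit.MINUTES);
    }
}
```

This sketch only repairs missing or stale cache entries. Removing cache entries for modules that were unpublished would need a second pass over the cache’s keys.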

Cache-aside on the hot path

PublishedToken reads are served cache-aside via Walmart Pallet’s CacheService over Meghacache. Misses hit Postgres, hydrate the cache, and emit a Kafka event so downstream caches can warm in lockstep.
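The next block is a generic cache-aside read path, sketched under stated assumptions:

  • A `ConcurrentHashMap` stands in for Meghacache behind Pallet’s CacheService, whose real API is not shown here.
  • A loader function stands in for the Postgres query.
  • An `onMiss` callback stands in for the Kafka warm-up event.

`CacheAsideReader` is a hypothetical name. The hit/miss counters show where a hit ratio would come from.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.BiConsumer;
import java.util.function.Function;

// Generic cache-aside read path with hypothetical stand-ins for the cache, DB and event emitter.
final class CacheAsideReader<K, V> {
    private final Map<K, V> cache = new ConcurrentHashMap<>();   // stand-in for Meghacache
    private final Function<K, Optional<V>> loadFromDb;           // stand-in for the Postgres read
    private final BiConsumer<K, V> onMiss;                       // stand-in for the Kafka event
    private final AtomicLong hits = new AtomicLong();
    private final AtomicLong misses = new AtomicLong();

    CacheAsideReader(Function<K, Optional<V>> loadFromDb, BiConsumer<K, V> onMiss) {
        this.loadFromDb = loadFromDb;
        this.onMiss = onMiss;
    }

    Optional<V> get(K key) {
        V cached = cache.get(key);
        if (cached != null) {
            hits.incrementAndGet();
            return Optional.of(cached);
        }
        misses.incrementAndGet();
        Optional<V> loaded = loadFromDb.apply(key);   // miss: read the system of record
        loaded.ifPresent(v -> {
            cache.put(key, v);                         // hydrate the cache
            onMiss.accept(key, v);                     // let downstream caches warm too
        });
        return loaded;
    }

    double hitRatio() {
        long h = hits.get();
        long total = h + misses.get();
        return total == 0 ? 0.0 : (double) h / total;
    }
}
```

Two concurrent misses on the same key can both hit the database here. That is the usual cache-aside trade-off, accepted in exchange for never blocking readers while a load is in progress.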

Kafka as the spine to downstream consumers

Every state change emits an event to KAFKA-V2-TEMPO-MOD-PROD. Its consumers include:

  • the Content Sync Service, which writes Cassandra for Tempo Runtime
  • Tango
  • the GraphQL CLS/OL layer
  • analytics
  • re-indexing

The producer surface is centralized in KafkaProducerManager + SimpleKafkaProducer. Every call site shares one producer configuration, leaving no per-caller config to drift.
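A hedged sketch of the centralization idea: one class owns the topic name and the producer settings, and call sites can only ask it to emit. `CentralProducerSketch`, `Event` and `Sink` are hypothetical stand-ins, not the real KafkaProducerManager or SimpleKafkaProducer APIs. The property names are real Kafka producer settings, but the values are assumptions, as is the omission of `bootstrap.servers`.

```java
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for a centralized producer: one class owns topic and settings,
// and call sites can only ask it to emit an event.
final class CentralProducerSketch {
    static final String TOPIC = "KAFKA-V2-TEMPO-MOD-PROD";

    // Real Kafka producer property names; the values are illustrative assumptions.
    // Map.of is immutable, so no call site can tweak its own copy.
    static final Map<String, String> PRODUCER_CONFIG = Map.of(
            "acks", "all",
            "enable.idempotence", "true",
            "key.serializer", "org.apache.kafka.common.serialization.StringSerializer",
            "value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

    record Event(String topic, String key, String value) {}

    // Stand-in for the underlying client's send call.
    interface Sink {
        void send(Event event);
    }

    private final Sink sink;

    CentralProducerSketch(Sink sink) {
        this.sink = Objects.requireNonNull(sink);
    }

    // Keyed by module id: with the default partitioner and a fixed partition count,
    // all events for one module land on one partition and keep their order.
    // The JSON is built by hand for brevity and does no escaping.
    void emitStateChange(String moduleId, String newState) {
        String json = "{\"moduleId\":\"" + moduleId + "\",\"state\":\"" + newState + "\"}";
        sink.send(new Event(TOPIC, moduleId, json));
    }
}
```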

Why CXF (not Spring MVC)

Apache CXF lets the same Java service expose a JAX-RS surface under /services/* while keeping Spring Boot 3 underneath. Its mature interceptor chain handles three concerns:

  • tenant header propagation
  • structured error envelopes
  • W3C trace forwarding to downstream services

This fits Walmart’s existing interceptor conventions more precisely than Spring MVC filters would.
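This is a framework-free sketch of what a tenant/trace outbound interceptor might copy onto a downstream call. It deliberately avoids CXF’s interceptor API. `OutboundHeaderSketch` and the `X-Tenant-Id` header name are hypothetical. The `traceparent` handling follows the W3C Trace Context version-00 format: keep the trace id and flags, and substitute this service’s span id as the new parent id.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Framework-free sketch of headers an outbound interceptor might forward downstream.
final class OutboundHeaderSketch {
    static final String TENANT_HEADER = "X-Tenant-Id"; // hypothetical header name
    private static final Pattern TRACEPARENT =
            Pattern.compile("00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})");
    private static final Pattern SPAN_ID = Pattern.compile("[0-9a-f]{16}");

    static Map<String, String> outbound(Map<String, String> inbound, String newSpanId) {
        if (!SPAN_ID.matcher(newSpanId).matches() || newSpanId.equals("0".repeat(16))) {
            throw new IllegalArgumentException("span id must be 16 lowercase hex chars, not all zero");
        }
        // HTTP header names are case-insensitive.
        Map<String, String> in = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        in.putAll(inbound);

        Map<String, String> out = new LinkedHashMap<>();
        String tenant = in.get(TENANT_HEADER);
        if (tenant != null) {
            out.put(TENANT_HEADER, tenant);
        }
        String tp = in.get("traceparent");
        Matcher m = tp == null ? null : TRACEPARENT.matcher(tp.trim());
        boolean valid = m != null && m.matches()
                && !m.group(1).equals("0".repeat(32))   // all-zero trace id is invalid
                && !m.group(2).equals("0".repeat(16));  // all-zero parent id is invalid
        if (valid) {
            // Same trace id and flags; this service's span becomes the downstream parent.
            out.put("traceparent", "00-" + m.group(1) + "-" + newSpanId + "-" + m.group(3));
            String ts = in.get("tracestate");
            if (ts != null) {
                out.put("tracestate", ts);
            }
        }
        // A missing or invalid traceparent is simply dropped here; a real tracer would start a new trace.
        return out;
    }
}
```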

What I shipped

  • Contributed to module versioning, publish, and rollback flows across draft_module, module_version, version_trigger, targeting, and published.
  • Hardened tenant header propagation and W3C trace forwarding through CXF interceptors so the V3 UI → Tempo Service → Tempo Runtime chain is fully traceable end-to-end.
  • Tuned cache-aside semantics on the PublishedToken hot path, where the cache hit ratio and Kafka invalidation events are what keep the runtime cache warm.
  • Participated in the Oracle → GCP Postgres modernization (the *-post environments), validating the Hibernate dialect configuration and query plans against Postgres.
  • Deployed to production across multiple regions (useast4 / uscentral / EDC) through the KITT pipeline, with Istio mTLS sidecars and Akeyless-managed secrets.

Stack

Java 17 · Spring Boot 3.5 · Apache CXF (JAX-RS) · Hibernate 6 + QueryDSL 5 · Oracle (legacy) + PostgreSQL (modernized) · Meghacache (Memcached) · Kafka · Strati AF + Pallet · WCNP / GKE · Istio mTLS