NKNeelesh K.
All case studies
PRISM · Backend·Backend engineer (co-owner)·2024 – present·In production

cxt-msg-asset-service

The PRISM system of record — Java 21, partitioned Postgres, Kafka outbox

150×
Homepage scale
20 → 3,500+ landing pages
+6.8%
mWeb ATF CTR
A/B vs Athena modules
+16%
Editor efficiency
CP / HP page-build workflows
12k+ / 24k+
Messages / Assets
live in production
~2 TPS
Write traffic
p95 ≈ 250ms
$11 / day
Cloud spend
service + DB + Kafka

PRISM’s backend system of record. Persists every Message + Asset, validates schemas for partner systems (SmartCreative, Digital Eyes), and is the upstream source of truth for downstream content distribution services (Pronto, IronBank, Tango, asset-discovery, P13N).

The problem

Athena modules required heavy seed overhead per landing page, had no real-time P13N sync, and used non-standard preview tooling. To scale Walmart from 20 hand-built homepages to thousands of targeted pages, the team needed reusable content as a first-class entity — with a strict lifecycle, multi-tenant isolation, and reliable Kafka fan-out to downstream consumers.

The PRISM domain model

A Message is "any idea we want to communicate to the customer" (Halloween Toys, Father’s Day, Local Bakery Finds). It owns schedule, targeting rules, eligible pages, and the linked destination — the four levers P13N uses to decide which Message to surface for a given customer on a given page.

Every Message hangs off a Message Hierarchy with a primary M0 parent (enables rollup reporting, campaign-level management, richer P13N signal). Each Message owns 1…N Assets, each Asset owns 1…N AssetConfigs (one per locale carrying the actual headline, image URL, CTA, colours, alignment).

The lifecycle is a strict state machine: Draft → In Progress → Published → Ended / Archived / Unpublished. P13N reads this state to make per-customer surface decisions.

Why list-partitioned Postgres

The Asset table is list-partitioned across assets_active / assets_inactive / assets_default with a composite primary key (asset_id, status). When an asset transitions ACTIVE → ARCHIVED, Postgres physically row-moves it across partitions — keeping hot OLTP load on the active partition while preserving full audit history.

Referential integrity is held together with a deferred-cascade FK from asset_config back to asset, so partition row-movement doesn’t orphan configs mid-transition. Without that, you get phantom configs pointing at archived assets and a slow leak of garbage.

Four datasources, one Strati layer

A single REST surface, but four physical datasources behind it (postgres, postgres-no-cdc, postgres-wtp, postgres-asset-ingestion), CCM-driven role injection toggles Kafka emission per write-path. This isolates translation + ingestion workloads from core OLTP load so a flood of Figma ingests can’t slow down editor saves.

Forklift transactional outbox — atomic DB + Kafka

Distributed transactions across DB and Kafka don’t exist in practice. The Forklift library writes the outbox row inside the same DB transaction as the entity mutation, and a background drainer publishes that tracker row to Kafka after commit — textbook outbox pattern.

Downstream consumers (Pronto, IronBank, asset-discovery, P13N) therefore never see a dual-write inconsistency. Either the Message exists and the event will be delivered, or neither happened.

Multi-tenant isolation as a default

A request-scoped WcpHeaders bean carries tenantId for the lifetime of the request. GenericBaseDAO.getDefaultPredicates injects the tenant predicate on every JPA query automatically, so no DAO can accidentally skip it. 25+ repositories therefore get tenant isolation for free — you have to actively opt out, not opt in.

Optimistic concurrency for bulk edits

Hibernate @Version on a BaseDO superclass maps to DB_LOCK_VERSION. A structured LockException unwrap pipeline (Strati LockException → FoundException → PersistenceException → PSQLException) surfaces deeply-nested DB errors as actionable HTTP 409s on bulk operations, so the UI can show "this asset was edited by someone else" instead of a 500 stack trace.

Auto-ingestion: Figma → Digital Eyes → Pronto → Asset

Eliminates merchant copy-paste between systems. A unique_ingested_figma_asset_constraint enforces idempotency — the same Figma layer can’t accidentally ingest twice. A partial unique index (WHERE status = 'START') enforces single-flight ingestion per tenant per metadata key, so a thundering herd of concurrent Figma webhooks can’t create duplicate work.

What I shipped

  • Modeled the PRISM domain (message, asset, asset_config, message_pages, message_targeting, message_group, message_hierarchy) plus the Draft → In Progress → Published → Ended/Archived/Unpublished state machine that P13N reads.
  • Designed the list-partitioned Postgres schema with composite PK (asset_id, status) and a deferred-cascade FK from asset_config — enabling lifecycle-driven row-movement across assets_active / assets_inactive / assets_default without losing referential integrity.
  • Architected a four-datasource layout (postgres, postgres-no-cdc, postgres-wtp, postgres-asset-ingestion) over Walmart’s Strati framework with CCM-driven role injection, toggling Kafka emission per write-path.
  • Hardened DB → Kafka reliability with the Forklift transactional-outbox / tracker-table pattern, writing the outbox row inside the same DB transaction as the entity mutation.
  • Delivered the auto-ingestion pipeline (Figma → Digital Eyes → SmartCreative → Pronto → Asset) with constraint-based idempotency and single-flight protection.
  • Shipped the Dynamic Asset Update (DAU) pipeline propagating CTA / attribute changes from the canonical asset out to every PRISM module’s deep-copied default — keeping non-personalized renders consistent with the system of record.
  • Built a multi-tenant data-access layer that transparently injects a tenantId predicate on every query via a request-scoped WcpHeaders bean and GenericBaseDAO.getDefaultPredicates — eliminating per-DAO tenancy plumbing across 25+ repositories.
  • Implemented optimistic concurrency control (Hibernate @Version on a BaseDO superclass + a structured LockException unwrap pipeline) surfacing deeply-nested errors as actionable HTTP 409s on bulk operations.
  • Integrated Digital Eyes asset-image analysis and the Walmart Translation Platform on a dedicated postgres-wtp datasource, triggering translation requests automatically on every asset insert and default-locale change.
  • Deployed to GKE behind GSLB with Istio sidecar, Akeyless-managed secrets, and a SonarQube quality gate at 80% — running for roughly $11/day in combined cloud spend.

Stack

Java 21Spring Boot 3.5Hibernate JPAQueryDSL 5PostgreSQLLiquibaseKafkaForklift outboxOpenTelemetryGKE / IstioAkeyless