cxt-msg-asset-service
The PRISM system of record — Java 21, partitioned Postgres, Kafka outbox
PRISM’s backend system of record. Persists every Message + Asset, validates schemas for partner systems (SmartCreative, Digital Eyes), and is the upstream source of truth for downstream content distribution services (Pronto, IronBank, Tango, asset-discovery, P13N).
The problem
Athena modules required heavy seed overhead per landing page, had no real-time P13N sync, and used non-standard preview tooling. To scale Walmart from 20 hand-built homepages to thousands of targeted pages, the team needed reusable content as a first-class entity — with a strict lifecycle, multi-tenant isolation, and reliable Kafka fan-out to downstream consumers.
The PRISM domain model
A Message is "any idea we want to communicate to the customer" (Halloween Toys, Father’s Day, Local Bakery Finds). It owns schedule, targeting rules, eligible pages, and the linked destination — the four levers P13N uses to decide which Message to surface for a given customer on a given page.
Every Message hangs off a Message Hierarchy with a primary M0 parent (enables rollup reporting, campaign-level management, richer P13N signal). Each Message owns 1…N Assets, each Asset owns 1…N AssetConfigs (one per locale carrying the actual headline, image URL, CTA, colours, alignment).
The lifecycle is a strict state machine: Draft → In Progress → Published → Ended / Archived / Unpublished. P13N reads this state to make per-customer surface decisions.
Why list-partitioned Postgres
The Asset table is list-partitioned across assets_active / assets_inactive / assets_default with a composite primary key (asset_id, status). When an asset transitions ACTIVE → ARCHIVED, Postgres physically row-moves it across partitions — keeping hot OLTP load on the active partition while preserving full audit history.
Referential integrity is held together with a deferred-cascade FK from asset_config back to asset, so partition row-movement doesn’t orphan configs mid-transition. Without that, you get phantom configs pointing at archived assets and a slow leak of garbage.
Four datasources, one Strati layer
A single REST surface, but four physical datasources behind it (postgres, postgres-no-cdc, postgres-wtp, postgres-asset-ingestion), CCM-driven role injection toggles Kafka emission per write-path. This isolates translation + ingestion workloads from core OLTP load so a flood of Figma ingests can’t slow down editor saves.
Forklift transactional outbox — atomic DB + Kafka
Distributed transactions across DB and Kafka don’t exist in practice. The Forklift library writes the outbox row inside the same DB transaction as the entity mutation, and a background drainer publishes that tracker row to Kafka after commit — textbook outbox pattern.
Downstream consumers (Pronto, IronBank, asset-discovery, P13N) therefore never see a dual-write inconsistency. Either the Message exists and the event will be delivered, or neither happened.
Multi-tenant isolation as a default
A request-scoped WcpHeaders bean carries tenantId for the lifetime of the request. GenericBaseDAO.getDefaultPredicates injects the tenant predicate on every JPA query automatically, so no DAO can accidentally skip it. 25+ repositories therefore get tenant isolation for free — you have to actively opt out, not opt in.
Optimistic concurrency for bulk edits
Hibernate @Version on a BaseDO superclass maps to DB_LOCK_VERSION. A structured LockException unwrap pipeline (Strati LockException → FoundException → PersistenceException → PSQLException) surfaces deeply-nested DB errors as actionable HTTP 409s on bulk operations, so the UI can show "this asset was edited by someone else" instead of a 500 stack trace.
Auto-ingestion: Figma → Digital Eyes → Pronto → Asset
Eliminates merchant copy-paste between systems. A unique_ingested_figma_asset_constraint enforces idempotency — the same Figma layer can’t accidentally ingest twice. A partial unique index (WHERE status = 'START') enforces single-flight ingestion per tenant per metadata key, so a thundering herd of concurrent Figma webhooks can’t create duplicate work.
What I shipped
- Modeled the PRISM domain (message, asset, asset_config, message_pages, message_targeting, message_group, message_hierarchy) plus the Draft → In Progress → Published → Ended/Archived/Unpublished state machine that P13N reads.
- Designed the list-partitioned Postgres schema with composite PK (asset_id, status) and a deferred-cascade FK from asset_config — enabling lifecycle-driven row-movement across assets_active / assets_inactive / assets_default without losing referential integrity.
- Architected a four-datasource layout (postgres, postgres-no-cdc, postgres-wtp, postgres-asset-ingestion) over Walmart’s Strati framework with CCM-driven role injection, toggling Kafka emission per write-path.
- Hardened DB → Kafka reliability with the Forklift transactional-outbox / tracker-table pattern, writing the outbox row inside the same DB transaction as the entity mutation.
- Delivered the auto-ingestion pipeline (Figma → Digital Eyes → SmartCreative → Pronto → Asset) with constraint-based idempotency and single-flight protection.
- Shipped the Dynamic Asset Update (DAU) pipeline propagating CTA / attribute changes from the canonical asset out to every PRISM module’s deep-copied default — keeping non-personalized renders consistent with the system of record.
- Built a multi-tenant data-access layer that transparently injects a tenantId predicate on every query via a request-scoped WcpHeaders bean and GenericBaseDAO.getDefaultPredicates — eliminating per-DAO tenancy plumbing across 25+ repositories.
- Implemented optimistic concurrency control (Hibernate @Version on a BaseDO superclass + a structured LockException unwrap pipeline) surfacing deeply-nested errors as actionable HTTP 409s on bulk operations.
- Integrated Digital Eyes asset-image analysis and the Walmart Translation Platform on a dedicated postgres-wtp datasource, triggering translation requests automatically on every asset insert and default-locale change.
- Deployed to GKE behind GSLB with Istio sidecar, Akeyless-managed secrets, and a SonarQube quality gate at 80% — running for roughly $11/day in combined cloud spend.