Data 360 Architecture

Ingest · Prepare · Unify · Analyze · Govern · Act

A structured reference for Data 360 (formerly Salesforce Data Cloud) — ingestion patterns, the DLO/DMO data model, identity resolution, unstructured AI grounding, governance, and multi-org strategy.

☁️ Salesforce Data 360 — formerly Data Cloud

Data 360 is Salesforce's unified data platform for ingesting, harmonizing, and activating data at scale. It powers Unified Profiles for identity resolution, AI grounding via RAG for Agentforce, and real-time insights across the Customer 360. This reference covers the full lifecycle from raw ingestion to AI activation.

🔄

Data Flow Lifecycle

The four operational phases of Data 360 — formerly Salesforce Data Cloud

  • 🔌 Connect / Ingest: Bring external data in via connectors or federate from external lakes
  • 🔧 Prepare / Transform: Clean, shape, and map raw data into DLOs → DMOs
  • 🔗 Harmonize / Unify: Resolve identities and build Unified Profiles from disparate sources
  • Analyze / Act: Generate Insights, power AI, activate across channels
🔌

Connect Data — Ingestion & Federation

Two patterns for getting data into Data 360: move it or query it in place

📥 Data Streams Ingestion

  • Pipelines established via connectors (Amazon S3, Google Cloud Storage, Salesforce CRM, etc.)
  • Data is physically brought into Data 360
  • Lands first in Data Lake Objects (DLOs) as raw, unmapped data
  • Best for: high-frequency data that needs transformation and unification

🔭 Zero Copy Data Federation Federation

  • Query data from external data lakes without moving or duplicating it
  • Data physically stays in its source system
  • Query Federation: live compute-based queries (e.g., Snowflake, BigQuery)
  • File Federation: storage-based queries reading open table formats (e.g., Delta, Iceberg)

⚡ Caching / Acceleration Performance

  • Temporarily stores external federated data in a local cache
  • Improves performance and reduces latency for frequent queries
  • Cache refreshes at configurable intervals — data is not permanently stored
  • Use when: federation query is expensive and the data doesn't change in real time

| Pattern | Data Location | Latency | Best For | Cost Consideration |
| --- | --- | --- | --- | --- |
| Data Streams (Ingestion) | Copied into Data 360 | Batch / Near Real-Time | Unified profiles, transformations, AI grounding | Storage consumed in Data 360 |
| Query Federation | Stays in source (e.g., Snowflake) | Real-time (compute-bound) | Large datasets, avoid duplication, live queries | Compute cost on source system per query |
| File Federation | Stays in object storage (S3, ADLS) | Near Real-Time | Open table formats, data lake architectures | Storage read costs on source |
| Caching / Acceleration | Temporarily in local cache | Low (cached) | Frequent reads on federated data | Temporary storage + refresh compute |
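
The caching pattern above can be illustrated with a small time-to-live cache. This is a conceptual sketch only, not how Data 360 implements acceleration: `FederatedQueryCache`, `run_query`, and `ttl_seconds` are hypothetical names, and the refresh interval is modeled as a simple expiry.

```python
import time
from typing import Any, Callable, Dict, Tuple


class FederatedQueryCache:
    """Illustrative TTL cache placed in front of an expensive federated query."""

    def __init__(
        self,
        run_query: Callable[[str], Any],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._run_query = run_query  # hypothetical call to the external source
        self._ttl = ttl_seconds      # stands in for the configured refresh interval
        self._clock = clock          # injectable so the expiry can be tested
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def get(self, query: str) -> Any:
        now = self._clock()
        cached = self._entries.get(query)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: the source system is not touched
        result = self._run_query(query)  # missing or expired: query the source
        self._entries[query] = (now, result)
        return result
```

Repeated reads inside the interval are served locally, and the first read after expiry pays the source compute cost again, which is the trade-off the table row describes.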
🔧

Prepare & Model Data

From raw ingestion to harmonized, queryable objects

🗃️ Data Lake Objects (DLOs) Raw

  • Initial storage containers for raw, unmapped data
  • Data lands here first from Data Streams
  • Not yet mapped to the standard data model; the source's original field structure is preserved
  • Think of DLOs as the "landing zone" before processing

📐 Data Model Objects (DMOs) Harmonized

  • Harmonized groupings mapped to the Customer 360 Data Model
  • Standardized schema across all data sources
  • Enable cross-source joins and identity resolution
  • Think of DMOs as the "clean, usable layer" above DLOs

⚙️ Data Transforms Processing

  • Batch or Streaming processes to combine, clean, and shape data
  • Run directly within the platform — no external ETL needed
  • Output can map to DMOs or other DLOs
  • Use for: deduplication, field normalization, enrichment, aggregation

🔑 Fully Qualified Keys (FQK) Key Management

  • Combines a source key + key qualifier to create a globally unique identifier
  • Prevents key conflicts when harmonizing data from multiple sources
  • Example: record 001 from Salesforce (qualifier CRM) and record 001 from SAP (qualifier ERP) stay distinct as CRM_001 and ERP_001
  • Critical for correct Identity Resolution downstream
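
The idea behind a fully qualified key can be sketched as a composite of qualifier and source key. This is a conceptual illustration, not the platform's internal representation; `FullyQualifiedKey` and `make_fqk` are hypothetical names.

```python
from typing import Dict, NamedTuple


class FullyQualifiedKey(NamedTuple):
    key_qualifier: str  # identifies the source system, e.g. "CRM" or "ERP"
    source_key: str     # the key exactly as it exists in that source


def make_fqk(key_qualifier: str, source_key: str) -> FullyQualifiedKey:
    """Pair a source key with its qualifier so it is unique across sources."""
    return FullyQualifiedKey(key_qualifier, source_key)


# Two sources both use the raw key "001"; the qualifier keeps them apart.
records: Dict[FullyQualifiedKey, dict] = {}
records[make_fqk("CRM", "001")] = {"origin": "Salesforce"}
records[make_fqk("ERP", "001")] = {"origin": "SAP"}
```

Keyed by the raw value `"001"` alone, the second assignment would overwrite the first; keyed by the composite, both records survive.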

| Object Type | Layer | Schema | Purpose |
| --- | --- | --- | --- |
| DLO (Data Lake Object) | Raw | Source-native, unmapped | Landing zone for ingested data |
| DMO (Data Model Object) | Harmonized | Customer 360 standard model | Unified, queryable, AI-ready data |
| UDLO (Unstructured DLO) | Raw Unstructured | None (PDFs, HTML, audio) | Reference container for unstructured files |
| UDMO (Unstructured DMO) | Harmonized Unstructured | Grouped by content type | Chunked, vectorized, indexed for AI/RAG |
🧠

Unstructured Data & AI Grounding

How Data 360 powers RAG for Agentforce, Prompt Templates, and Einstein AI

📁 UDLOs & UDMOs Containers

  • UDLO: References unstructured files — PDFs, HTML, audio transcripts, images
  • UDMO: Groups UDLOs into logical domains for AI consumption
  • No traditional schema — content-based organization
  • Equivalent to DLO/DMO but for unstructured content

✂️ Chunking & Vectorization AI Processing

  • Breaks unstructured content into semantically meaningful chunks
  • Each chunk is transformed into a vector embedding for machine readability
  • Chunking respects document structure (headings, paragraphs) for better context preservation
  • Output feeds Search Indexes for retrieval
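
Structure-aware chunking can be sketched as follows. This is a simplified illustration under stated assumptions, not the platform's chunker: headings are assumed to be lines starting with `#`, blocks are assumed to be separated by blank lines, and `chunk_by_structure` and `max_chars` are hypothetical names. Vectorization is left out, since it depends on the embedding model in use.

```python
from typing import Dict, List


def chunk_by_structure(text: str, max_chars: int = 500) -> List[Dict[str, str]]:
    """Split text into chunks that never cross a heading boundary."""
    chunks: List[Dict[str, str]] = []
    heading = ""
    buffer: List[str] = []

    def flush() -> None:
        # Emit the buffered paragraphs together with their heading as context.
        if buffer:
            chunks.append({"heading": heading, "text": "\n\n".join(buffer)})
            buffer.clear()

    for block in text.split("\n\n"):
        block = block.strip()
        if not block:
            continue
        if block.startswith("#"):  # assumed heading marker
            flush()                # close the chunk under the previous heading
            heading = block.lstrip("#").strip()
            continue
        if buffer and sum(len(b) for b in buffer) + len(block) > max_chars:
            flush()                # size limit reached: start a new chunk
        buffer.append(block)
    flush()
    return chunks
```

Carrying the heading with each chunk is one way to preserve context: a retrieved paragraph still records which section it came from.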

🔍 Search Indexes Retrieval

  • Vector Index: Semantic similarity search — finds conceptually related content
  • Hybrid Index: Combines vector + keyword search for broader coverage
  • Generated from vectorized chunks in UDMOs
  • Power Retrieval Augmented Generation (RAG) in Agentforce and Prompt Builder
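
The difference between a vector index and a hybrid index can be illustrated with a toy scorer. This is a sketch only: the weighting parameter `alpha`, the term-overlap keyword score, and the function names are assumptions for illustration, and the actual ranking used by Data 360 search indexes may differ.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query: str, text: str) -> float:
    """Fraction of query terms that appear verbatim in the text."""
    terms = set(query.lower().split())
    words = set(text.lower().split())
    return len(terms & words) / len(terms) if terms else 0.0


def hybrid_search(
    query: str,
    query_vec: Sequence[float],
    chunks: List[Dict],
    alpha: float = 0.5,  # assumed weight: 1.0 = vector only, 0.0 = keyword only
    top_k: int = 3,
) -> List[Tuple[float, str]]:
    """Rank chunks by a weighted blend of semantic and keyword relevance."""
    scored = [
        (
            alpha * cosine(query_vec, c["vector"])
            + (1 - alpha) * keyword_score(query, c["text"]),
            c["text"],
        )
        for c in chunks
    ]
    return sorted(scored, key=lambda s: s[0], reverse=True)[:top_k]
```

Setting `alpha=1.0` reduces this to pure semantic search, which is the vector index case.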

🌉 Retrievers Bridge Layer

  • Logical layer bridging the Search Index to downstream AI solutions
  • Run at inference time — triggered per query, not pre-baked
  • Surface relevant chunks to: AI Agents, Prompt Templates, Einstein Copilot
  • Can be scoped per Subagent for domain-specific knowledge grounding

RAG Pipeline: Unstructured Data to AI Response
📁 UDLO (PDFs, HTML, audio) → 📐 UDMO (grouped by domain) → ✂️ Chunk & Vectorize (embeddings) → 🔍 Search Index (vector/hybrid) → 🌉 Retriever (at inference time) → 🤖 AI Agent / Prompt Template (grounded response)
🔗

Harmonize, Unify & Analyze

Identity resolution, Data Graphs, and three types of Insights

👤 Identity Resolution Unification

  • Uses match and reconciliation rulesets to merge disparate records
  • Output: Unified Profiles (e.g., Unified Individual, Unified Account)
  • Match rules: exact match, fuzzy match, probabilistic — configurable per use case
  • Reconciliation rules determine which field value "wins" when sources conflict
  • FQKs ensure source records are correctly attributed post-merge
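
How match rules group records can be sketched with a union-find structure. This is a conceptual illustration covering exact-match rules only (fuzzy matching is omitted); the record fields, the two rules, and `resolve_identities` are assumptions, not the platform's ruleset format.

```python
from typing import Dict, Hashable, List, Optional, Set


def resolve_identities(records: List[dict]) -> List[Set[str]]:
    """Cluster records that share a match key under any rule."""
    parent: Dict[str, str] = {r["fqk"]: r["fqk"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def email_rule(r: dict) -> Optional[Hashable]:
        return ("email", r["email"].lower()) if r.get("email") else None

    def phone_name_rule(r: dict) -> Optional[Hashable]:
        if r.get("phone") and r.get("last_name"):
            return ("phone+last_name", r["phone"], r["last_name"].lower())
        return None

    first_seen: Dict[Hashable, str] = {}
    for r in records:
        for rule in (email_rule, phone_name_rule):
            key = rule(r)
            if key is None:
                continue
            if key in first_seen:
                parent[find(r["fqk"])] = find(first_seen[key])  # merge clusters
            else:
                first_seen[key] = r["fqk"]

    clusters: Dict[str, Set[str]] = {}
    for r in records:
        clusters.setdefault(find(r["fqk"]), set()).add(r["fqk"])
    return list(clusters.values())
```

Matches are transitive here: a record linked to one source by email and to another by phone pulls all three into the same cluster, and each cluster would become one Unified Profile.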

🕸️ Data Graphs Performance

  • Materialized, read-only JSON views combining multiple DMOs
  • Pre-computed for fast, near real-time or real-time querying
  • Power Real-Time Insights and low-latency AI activations
  • Think of them as pre-joined, optimized snapshots of related data
  • Automatically refresh based on configured update frequency

📊 Calculated Insights Batch

Built for high-volume data processing and complex, historical metric generation. Runs as batch jobs on large datasets — ideal for customer lifetime value, historical purchase frequency, or aggregated segment scores.

Best for: historical analysis, ML feature generation, complex aggregations

⚡ Streaming Insights Near Real-Time

Designed for continuous processing of micro-batches in near real-time. Processes data as it arrives in small windows — ideal for updating engagement scores, session activity, or cart abandonment signals.

Best for: behavioral signals, running counters, event-driven metric updates

🚀 Real-Time Insights Milliseconds

Calculates metrics from a single record in milliseconds, relying on real-time Data Graphs. Used when a decision must be made the instant a record is created or updated — e.g., next best action at the point of call.

Best for: in-the-moment decisions, live agent assistance, real-time personalization

| Insight Type | Processing Model | Latency | Data Dependency | Use Case Example |
| --- | --- | --- | --- | --- |
| Calculated Insights | Batch | Hours | Full DMO dataset | Customer Lifetime Value, 12-month purchase history |
| Streaming Insights | Micro-batch / Continuous | Seconds–Minutes | Incoming event stream | Real-time engagement score, cart activity |
| Real-Time Insights | Single-record compute | Milliseconds | Real-Time Data Graph | Next Best Action at point of contact |
🛡️

Governance, Security & Organization

Data Spaces, access control, masking, and lineage

🗂️ Data Spaces Partitioning

Logical partitions that segregate data, metadata, and processes — for example by brand or department. Users only access contextually relevant data. Prevents cross-contamination between business units sharing one Data 360 instance.

🏷️ Data Tagging & Classification Metadata

Manual or AI-assisted application of metadata tags to objects and fields. Enforces governance taxonomies at scale — e.g., tagging fields as PII, GDPR-sensitive, or financial data. Enables automated policy enforcement downstream.

🔐 Access Control Security

RBAC (Role-Based): Access granted based on user role — e.g., Data Engineer sees all DMOs, Analyst sees only their Data Space.

ABAC (Attribute-Based): Access controlled by data attributes — e.g., a field tagged "PII" is hidden from roles without PII clearance. Enforces object-level, field-level, and row-level security.

🎭 Dynamic Data Masking Privacy

Conceals sensitive structured data in real-time without altering its underlying values or breaking relationships. A user without PII access sees ***-**-1234 instead of a full SSN — the data is still usable for joins and aggregations.
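
Masking at read time, as opposed to altering stored data, can be sketched like this. It is a conceptual illustration only; `mask_ssn`, `read_field`, and the `can_see_pii` flag are hypothetical and do not reflect how masking policies are configured in the product.

```python
from typing import Dict


def mask_ssn(ssn: str) -> str:
    """Hide everything except the last four digits."""
    return "***-**-" + ssn[-4:]


def read_field(record: Dict[str, str], field: str, can_see_pii: bool) -> str:
    """Return the field as the caller is allowed to see it.

    The stored record is never modified, so operations that run on the
    underlying value (joins, aggregations) are unaffected by masking.
    """
    value = record[field]
    if field == "ssn" and not can_see_pii:
        return mask_ssn(value)
    return value


record = {"ssn": "123-45-1234"}  # made-up value for illustration
masked = read_field(record, "ssn", can_see_pii=False)
```

The same record yields the masked form for one caller and the full value for another, while the stored value stays intact.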

🗺️ Unified Lineage Traceability

A visual relationship graph tracing how data objects connect upstream and downstream. Shows the full journey: source connector → DLO → Transform → DMO → Insight → Activation. Essential for audits, impact assessments, and debugging data quality issues.

🏢

Multi-Org Strategy — Data Cloud One

One Home Org, multiple Companion Orgs sharing data without separate licenses

Diagram: 🏠 Home Org (primary Salesforce org; Data 360 licensed here; central management and provisioning; hosts Data Spaces) ⇅ bidirectional link ⇅ 🔗 Companion Orgs A, B, and C (each a linked Salesforce org with no separate Data 360 license that accesses shared Data Spaces)

Companion Orgs connect bidirectionally to the Home Org — sharing data and features without needing their own Data 360 instance

🏠 Home Org

  • Where Data 360 is provisioned and licensed
  • Centrally manages all Data Spaces, connectors, and governance policies
  • Hosts the master Unified Profiles and Insights
  • Acts as the single source of truth for all connected orgs

🔗 Companion Org

  • Additional Salesforce org linked bidirectionally to the Home Org
  • No separate Data 360 license required
  • Can access shared Data Spaces and Unified Profiles
  • Use case: multi-brand company where each brand has its own org but shares customer data
🗺️

Full Architecture Overview

End-to-end data flow from source to AI activation

Real-World Example

🛍️ Retail Chain — Unified Customer Profile & AI Personalization

A national retail chain with e-commerce, 200 physical stores, a loyalty program, and a service contact center wants to unify all customer data to power personalized Agentforce interactions and real-time next-best-action recommendations. This walkthrough applies every Data 360 concept to a concrete architecture decision.

🏢

Scenario — The Business Problem

What the retail chain needs and why Data 360 is the answer

🏪 The Company

  • NovaMart — national retail chain, 200 stores + e-commerce site
  • 12 million loyalty program members
  • 3 separate Salesforce orgs: Commerce Cloud, Service Cloud, Marketing Cloud
  • POS system (in-store transactions) running on a legacy on-premise platform
  • Product catalog and inventory in an external data warehouse (Snowflake)

❌ The Problems

  • A customer who buys in-store is unknown to the e-commerce and service teams
  • The contact center agent sees no purchase history — can't resolve complaints without asking
  • Marketing sends promotions without knowing what the customer already bought
  • Agentforce cannot answer "what did I buy last month?" — no unified data source
  • Same customer appears as 3 different records across orgs (email, loyalty ID, phone)

✅ The Goal

Build a Unified Customer Profile that merges online, in-store, loyalty, and service history into a single identity. Use it to:

  • Power an Agentforce service agent that knows the full purchase history
  • Generate a real-time "next best offer" at the moment of contact
  • Trigger personalized Marketing Cloud journeys based on in-store behavior
  • Give the contact center agent a 360 view without leaving Service Cloud
  • Segment customers by lifetime value and engagement tier
  • Comply with GDPR — tag and mask PII consistently across all sources
🔌

Connect — How Each Source Gets In

Choosing ingestion vs. federation per data source based on volume, frequency, and transformation needs

| Source | Pattern | Why This Pattern | Lands in |
| --- | --- | --- | --- |
| Salesforce CRM (Service Cloud) | Data Stream | Native connector: data must be unified and transformed into DMOs for identity resolution | DLO → DMO (Individual, Case) |
| Commerce Cloud (e-commerce orders) | Data Stream | Native connector: high-frequency order events need transformation and unification | DLO → DMO (Order, Product) |
| POS System (in-store transactions) | Data Stream via S3 | Legacy system exports nightly CSVs to Amazon S3; the S3 connector ingests them into DLOs | DLO → Transform → DMO (Order) |
| Loyalty Program DB | Data Stream | Loyalty IDs are the primary cross-system key and must be ingested for identity resolution | DLO → DMO (Loyalty Member) |
| Snowflake (product catalog + inventory) | Query Federation | Large dataset (5M SKUs) with no need to copy it; queried live for product lookups | Federated (no DLO) |
| Product return policy PDFs | Data Stream (files) | Unstructured files for AI grounding, ingested as UDLOs for chunking and vectorization | UDLO → UDMO |

💡 Key Decision

The Snowflake product catalog uses Query Federation — not ingestion — because it's large, rarely changes, and only needed for ad-hoc lookups. Copying 5M SKUs into Data 360 would waste storage and add ETL overhead with no benefit. Federated queries hit Snowflake directly at query time.

🔧

Prepare — DLOs, DMOs & FQKs

Mapping raw data into the harmonized Customer 360 model

| DLO (Raw) | Source | Transform | DMO (Harmonized) | FQK Key Qualifier |
| --- | --- | --- | --- | --- |
| crm_contact_raw | Service Cloud | Map name, email, phone to standard fields | Individual | CRM_SF |
| loyalty_member_raw | Loyalty DB | Map loyalty_id, email, tier to standard fields | Individual + LoyaltyMember | LOYALTY |
| ecom_order_raw | Commerce Cloud | Map order_id, customer_id, line items to standard fields | SalesOrder + SalesOrderProduct | ECOM |
| pos_transaction_raw | S3 (nightly CSV) | Normalize date format, map store_id to location, resolve currency | SalesOrder + SalesOrderProduct | POS |
| crm_case_raw | Service Cloud | Map case subject, status, resolution to standard fields | Case | CRM_SF |

💡 Why FQKs Matter Here

Order ID 1001 exists in both Commerce Cloud (key qualifier ECOM) and the POS system (key qualifier POS). Without FQKs, the two records would collide. With FQKs, ECOM_1001 and POS_1001 are globally unique, so both can be linked to the same Unified Individual after identity resolution without overwriting each other.

🔗

Unify — Identity Resolution & Data Graphs

Merging 3 records of the same person into one Unified Profile

🔍 Match Rules

The identity resolution ruleset for NovaMart runs in order:

  • Rule 1 — Exact email match: same email in CRM, loyalty, and e-commerce → merge
  • Rule 2 — Phone + last name: same mobile number and last name across sources → merge
  • Rule 3 — Loyalty ID cross-reference: loyalty_id stored in Commerce Cloud profile → merge
  • Rule 4 — Fuzzy name + address: probabilistic match for customers without email

⚖️ Reconciliation Rules

When two sources have conflicting values for the same field, reconciliation determines which "wins":

  • Email: most recently updated source wins
  • Phone: Loyalty DB wins (most accurate — customers update it for rewards)
  • Full Name: Commerce Cloud wins (captured at checkout, most recent)
  • Address: POS wins (captured in-store at time of purchase)
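
These reconciliation rules can be sketched as a per-field source preference with a most-recent fallback. This is an illustration under assumptions: the record shape, the `source` and `updated` fields, the sample values, and the fallback behavior when the preferred source has no value are all hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional

# Field -> source that wins, following the rules listed above.
# Email has no preferred source: the most recently updated record wins.
PREFERRED_SOURCE = {"phone": "LOYALTY", "full_name": "ECOM", "address": "POS"}
FIELDS = ("email", "phone", "full_name", "address")


def reconcile(records: List[dict]) -> Dict[str, Optional[str]]:
    """Pick one winning value per field from a cluster of matched records."""
    unified: Dict[str, Optional[str]] = {}
    for field in FIELDS:
        candidates = [r for r in records if r.get(field)]
        if not candidates:
            unified[field] = None
            continue
        preferred = [
            r for r in candidates if r["source"] == PREFERRED_SOURCE.get(field)
        ]
        # Assumed fallback: if the preferred source has no value,
        # take the most recently updated value from any source.
        pool = preferred or candidates
        unified[field] = max(pool, key=lambda r: r["updated"])[field]
    return unified


matched = [
    {"source": "CRM_SF", "updated": date(2025, 1, 10),
     "email": "old@novamail.com", "phone": "555-0100", "full_name": "S. Chen"},
    {"source": "LOYALTY", "updated": date(2025, 3, 1),
     "email": "sarah@novamail.com", "phone": "555-0199"},
    {"source": "ECOM", "updated": date(2025, 2, 1),
     "full_name": "Sarah Chen", "phone": "555-0150"},
]
profile = reconcile(matched)
```

With this made-up data, the email comes from the most recently updated record, the phone from the Loyalty DB, and the name from Commerce Cloud; the address stays empty because no source supplied one.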

🧩 The Result — Unified Individual: Sarah Chen

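| Source Record | Source Key | Matched Via | Merged Into |
| --- | --- | --- | --- |
| CRM Contact: Sarah Chen | CRM_SF_00301 | Email: sarah@novamail.com | Unified Individual UI_sarah_chen_0042 |
| Loyalty Member: S.Chen #LY-44821 | LOYALTY_LY-44821 | Email + loyalty_id in Commerce Cloud | Unified Individual UI_sarah_chen_0042 |
| Commerce Customer: sarah@novamail.com | ECOM_CC-9912 | Email exact match | Unified Individual UI_sarah_chen_0042 |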

🕸️ Data Graph — Real-Time Profile for Agentforce

A Data Graph is configured to pre-join Sarah's Unified Individual with her last 90 days of orders (both online and in-store) and her open cases. This materialized JSON snapshot is what the Agentforce service agent queries in milliseconds when Sarah calls in.

  • Refresh rate: Near real-time (updates within minutes of a new transaction)
  • Used by: Real-Time Insights (next best action) and Agentforce service agent
  • Contains: Unified Individual fields + SalesOrders (90d) + Cases (open)
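
The pre-joined snapshot described above can be sketched as a function that materializes one JSON-style document per profile. This is a conceptual illustration: `build_profile_snapshot`, the field names, and the sample dates are assumptions, and a real Data Graph is configured in the platform rather than coded by hand.

```python
import json
from datetime import date, timedelta
from typing import Dict, List


def build_profile_snapshot(
    individual: Dict,
    orders: List[Dict],
    cases: List[Dict],
    today: date,
    window_days: int = 90,
) -> Dict:
    """Pre-join a profile with its recent orders and open cases."""
    cutoff = today - timedelta(days=window_days)
    uid = individual["unified_id"]
    return {
        "individual": individual,
        "recent_orders": [
            o for o in orders
            if o["unified_id"] == uid and o["order_date"] >= cutoff
        ],
        "open_cases": [
            c for c in cases
            if c["unified_id"] == uid and c["status"] == "Open"
        ],
    }


snapshot = build_profile_snapshot(
    {"unified_id": "UI_sarah_chen_0042", "tier": "Platinum"},
    orders=[
        {"unified_id": "UI_sarah_chen_0042", "order_date": date(2025, 4, 28)},
        {"unified_id": "UI_sarah_chen_0042", "order_date": date(2025, 1, 1)},
        {"unified_id": "UI_other", "order_date": date(2025, 5, 1)},
    ],
    cases=[
        {"unified_id": "UI_sarah_chen_0042", "status": "Open"},
        {"unified_id": "UI_sarah_chen_0042", "status": "Closed"},
    ],
    today=date(2025, 5, 20),
)
as_json = json.dumps(snapshot, default=str)  # dates serialized as strings
```

Because the join and the filtering happen when the snapshot is built, a reader at call time only fetches one document instead of querying three objects.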
🧠

AI Grounding — Unstructured Data for Agentforce

How product return policies and FAQs become RAG-ready knowledge

📁 Unstructured Sources

  • Return policy PDFs — one per product category (Electronics, Apparel, Home)
  • FAQ HTML pages — customer service knowledge base (200+ articles)
  • Store location guides — hours, services, accessibility info per store
  • Promotion terms PDFs — legal terms for seasonal campaigns

🔧 Processing Pipeline

  • Files land in UDLOs (one per content type)
  • UDLOs grouped into UDMOs: "Service Knowledge" and "Store Information"
  • Each UDMO is chunked respecting HTML heading structure (H1–H3)
  • Chunks are vectorized → stored in a Hybrid Search Index
  • Two Retrievers wired to Agentforce: one per UDMO scope
RAG in Action — "Can I return these headphones?"
👤 Sarah: "I bought headphones 3 weeks ago in-store. Can I return them?"
→ Retriever queries "Service Knowledge" UDMO for "headphones return policy"
📄 Retrieved chunk: Electronics — 30-day return window from purchase date. In-store purchase: bring receipt or loyalty card. No receipt required for Gold/Platinum loyalty members.
→ Agentforce reads Sarah's Unified Profile: Platinum loyalty member, purchase date 22 days ago
🤖 Agent: "Yes, Sarah — as a Platinum member you don't need a receipt. Your Sony WH-1000XM5 headphones purchased on April 28 are within the 30-day return window. You can return them at any store location."
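
The reasoning the agent combines here, a retrieved policy plus profile facts, can be sketched as a small check. This is illustrative only: the function and constant names are hypothetical, the year in the dates is made up, and treating day 30 as still inside the window is an assumption the policy text does not settle.

```python
from datetime import date
from typing import Dict

RETURN_WINDOW_DAYS = 30                  # from the retrieved Electronics policy chunk
NO_RECEIPT_TIERS = {"Gold", "Platinum"}  # tiers exempt from the receipt requirement


def return_decision(purchase_date: date, today: date, loyalty_tier: str) -> Dict:
    """Combine policy terms (from RAG) with profile facts (from the Data Graph)."""
    days_since = (today - purchase_date).days
    return {
        "days_since_purchase": days_since,
        "eligible": days_since <= RETURN_WINDOW_DAYS,  # assumed inclusive
        "receipt_required": loyalty_tier not in NO_RECEIPT_TIERS,
    }


decision = return_decision(date(2025, 4, 28), date(2025, 5, 20), "Platinum")
```

For a purchase on April 28 checked on May 20, 22 days have elapsed, so the return is eligible and no receipt is required for a Platinum member, matching the agent's answer.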

Insights — Three Types in Practice

How NovaMart uses all three insight types for different business decisions

📊 Calculated Insight Batch

Customer Lifetime Value (CLV) — runs nightly as a batch job across all 12M loyalty members. Aggregates 24 months of order history, calculates average order value, purchase frequency, and return rate. Output stored back to a DMO field (clv_score) on Unified Individual.

Used by: Marketing Cloud segmentation, loyalty tier assignment

⚡ Streaming Insight Near Real-Time

30-day Purchase Activity Score — processes every new transaction event within seconds. Maintains a rolling 30-day engagement score that updates as each purchase, return, or browse event arrives. No need to wait for nightly batch.

Used by: Agentforce context (is this customer active?), in-app personalization
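
A rolling 30-day score of this kind can be sketched with a sliding window over events. This is a conceptual illustration, not how Streaming Insights are defined in the product: the class name and event weights are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque
from datetime import datetime, timedelta
from typing import Deque, Tuple


class RollingActivityScore:
    """Running score over a sliding time window, updated per event."""

    def __init__(self, window: timedelta = timedelta(days=30)) -> None:
        self._window = window
        self._events: Deque[Tuple[datetime, float]] = deque()  # oldest first
        self._score = 0.0

    def add_event(self, ts: datetime, weight: float) -> float:
        """Record an event and return the updated score."""
        self._events.append((ts, weight))
        self._score += weight
        self._evict(ts)
        return self._score

    def _evict(self, now: datetime) -> None:
        # Drop events that have aged out of the window and subtract their weight.
        cutoff = now - self._window
        while self._events and self._events[0][0] <= cutoff:
            _, weight = self._events.popleft()
            self._score -= weight
```

Each arriving event adjusts the score incrementally, so nothing is recomputed from the full history, which is what separates this from the nightly batch calculation.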

🚀 Real-Time Insight Milliseconds

Next Best Offer — fires in milliseconds when Sarah's inbound call is received. Reads the real-time Data Graph (Unified Profile + recent orders) and calculates the most relevant offer from current promotions based on her purchase history and CLV tier.

Used by: Agentforce service agent — surfaces offer before agent responds
🛡️

Governance — How NovaMart Handles PII & Compliance

Data Spaces, tagging, masking, and access control in practice

🗂️ Data Spaces

NovaMart runs one Data 360 instance (Home Org) shared by multiple business units. Data Spaces partition the data:

  • Retail Operations — POS transactions, store data, inventory
  • Customer Experience — Unified Profiles, cases, loyalty
  • Marketing — segments, campaign history, engagement scores

The Marketing team cannot query raw POS data; they see only aggregated segments, derived from Customer Experience data, inside the Marketing space.

🏷️ PII Tagging & Masking

Fields tagged as PII at the DMO level:

  • Individual.email → tagged PII, GDPR
  • Individual.mobile_phone → tagged PII, GDPR
  • Individual.date_of_birth → tagged PII, GDPR, HIPAA
  • SalesOrder.billing_address → tagged PII

Dynamic Data Masking is applied: a Marketing analyst running a query sees s***@novamail.com — the email is usable for matching but not readable. The full value is only visible to roles with PII clearance.

🔐 Access Control — RBAC + ABAC Combined

| Role | Data Space Access | PII Fields | Masking Applied |
| --- | --- | --- | --- |
| Data Engineer | All spaces | Full access (RBAC) | None |
| Marketing Analyst | Marketing only | No PII clearance (ABAC) | Email, phone masked |
| Service Agent (Agentforce) | Customer Experience | Email + phone visible (needed for contact) | DOB masked |
| Store Manager | Retail Operations | No PII clearance (ABAC) | All PII masked |
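
The combined check in the table can be sketched as two gates: a role-based gate on the Data Space, then an attribute-based gate on PII-tagged fields. This is an illustration only; the dictionaries, the `read` function, and the generic `"****"` mask are assumptions, and `billing_address` is left out because the table does not specify it for every role.

```python
from typing import Dict, Set

ALL_SPACES = {"Retail Operations", "Customer Experience", "Marketing"}

# RBAC: which Data Spaces each role may enter.
ROLE_SPACES: Dict[str, Set[str]] = {
    "Data Engineer": ALL_SPACES,
    "Marketing Analyst": {"Marketing"},
    "Service Agent": {"Customer Experience"},
    "Store Manager": {"Retail Operations"},
}

# ABAC: fields carrying the PII tag, and which of them each role sees unmasked.
PII_FIELDS = {"email", "mobile_phone", "date_of_birth"}
ROLE_PII_VISIBLE: Dict[str, Set[str]] = {
    "Data Engineer": PII_FIELDS,
    "Marketing Analyst": set(),
    "Service Agent": {"email", "mobile_phone"},
    "Store Manager": set(),
}


def read(role: str, data_space: str, field: str, value: str) -> str:
    """Apply the role gate first, then mask PII the role is not cleared for."""
    if data_space not in ROLE_SPACES.get(role, set()):
        raise PermissionError(f"{role} has no access to {data_space}")
    if field in PII_FIELDS and field not in ROLE_PII_VISIBLE.get(role, set()):
        return "****"  # generic mask; real masking formats vary by field
    return value
```

A Service Agent reading `email` in Customer Experience gets the real value, the same role reading `date_of_birth` gets the mask, and a Marketing Analyst asking for anything in Retail Operations is rejected before field rules are considered.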
🗺️

Full Architecture Diagram — NovaMart

End-to-end flow from raw sources to Agentforce response