Data 360 Architecture

☁️ Salesforce Data 360 — formerly Data Cloud

Data 360 is Salesforce's unified data platform for ingesting, harmonizing, and activating data at scale. It powers Unified Profiles for identity resolution, AI grounding via RAG for Agentforce, and real-time insights across the Customer 360. This reference covers the full lifecycle from raw ingestion to AI activation.


🔄

Data Flow Lifecycle

The four operational phases of Data 360 — formerly Salesforce Data Cloud

🔌 Connect / Ingest

Bring external data in via connectors or federate from external lakes

🔧 Prepare / Transform

Clean, shape, and map raw data into DLOs → DMOs

🔗 Harmonize / Unify

Resolve identities and build Unified Profiles from disparate sources

⚡ Analyze / Act

Generate Insights, power AI, activate across channels

🔌

Connect Data — Ingestion & Federation

Two patterns for getting data into Data 360: move it or query it in place

📥 Data Streams Ingestion

Pipelines established via connectors (Amazon S3, Google Cloud Storage, Salesforce CRM, etc.)
Data is physically brought into Data 360
Lands first in Data Lake Objects (DLOs) as raw, unmapped data
Best for: high-frequency data that needs transformation and unification

🔭 Zero Copy Data Federation

Query data from external data lakes without moving or duplicating it
Data physically stays in its source system
Query Federation: live compute-based queries (e.g., Snowflake, BigQuery)
File Federation: storage-based queries reading open table formats (e.g., Delta, Iceberg)

⚡ Caching / Acceleration Performance

Temporarily stores external federated data in a local cache
Improves performance and reduces latency for frequent queries
Cache refreshes at configurable intervals — data is not permanently stored
Use when: federation query is expensive and the data doesn't change in real time

Pattern	Data Location	Latency	Best For	Cost Consideration
Data Streams (Ingestion)	Copied into Data 360	Batch / Near Real-Time	Unified profiles, transformations, AI grounding	Storage consumed in Data 360
Query Federation	Stays in source (e.g., Snowflake)	Real-time (compute-bound)	Large datasets, avoid duplication, live queries	Compute cost on source system per query
File Federation	Stays in object storage (S3, ADLS)	Near Real-Time	Open table formats, data lake architectures	Storage read costs on source
Caching / Acceleration	Temporarily in local cache	Low (cached)	Frequent reads on federated data	Temporary storage + refresh compute
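
The "Best For" column above can be read as a rough decision rule. The sketch below encodes that reading in plain Python; the `Source` fields and the `choose_pattern` function are hypothetical names invented for illustration and are not part of any Data 360 API.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    needs_unification: bool   # must feed transforms / identity resolution
    open_table_format: bool   # e.g. Delta or Iceberg files in object storage
    read_often: bool = False  # queried frequently while federated


def choose_pattern(src: Source) -> str:
    """Rule of thumb only; mirrors the 'Best For' column of the table."""
    if src.needs_unification:
        return "Data Stream (ingestion)"
    pattern = "File Federation" if src.open_table_format else "Query Federation"
    if src.read_often:
        pattern += " + Caching / Acceleration"
    return pattern


print(choose_pattern(Source("warehouse", needs_unification=False, open_table_format=False)))
```

Real decisions also weigh cost, freshness requirements, and connector availability, which this sketch ignores.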

🔧

Prepare & Model Data

From raw ingestion to harmonized, queryable objects

🗃️ Data Lake Objects (DLOs) Raw

Initial storage containers for raw, unmapped data
Data lands here first from Data Streams
Not yet mapped to the standard model; preserves the source's original structure
Think of DLOs as the "landing zone" before processing

📐 Data Model Objects (DMOs) Harmonized

Harmonized groupings mapped to the Customer 360 Data Model
Standardized schema across all data sources
Enable cross-source joins and identity resolution
Think of DMOs as the "clean, usable layer" above DLOs

⚙️ Data Transforms Processing

Batch or Streaming processes to combine, clean, and shape data
Run directly within the platform — no external ETL needed
Output can map to DMOs or other DLOs
Use for: deduplication, field normalization, enrichment, aggregation

🔑 Fully Qualified Keys (FQK) Key Management

Combines a source key + key qualifier to create a globally unique identifier
Prevents key conflicts when harmonizing data from multiple sources
Example: key 001 from Salesforce (qualifier CRM) and key 001 from SAP (qualifier ERP) become CRM_001 and ERP_001, so they don't collide
Critical for correct Identity Resolution downstream
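
A minimal sketch of the idea, assuming the `QUALIFIER_key` notation used in the example above (the platform's internal representation may differ). `FullyQualifiedKey` is a hypothetical helper type, not a Data 360 object.

```python
from typing import NamedTuple


class FullyQualifiedKey(NamedTuple):
    key_qualifier: str  # identifies the source system
    source_key: str     # the key as it exists in that source

    def __str__(self) -> str:
        return f"{self.key_qualifier}_{self.source_key}"


crm = FullyQualifiedKey("CRM", "001")
erp = FullyQualifiedKey("ERP", "001")

# The raw source keys are identical, but the qualified keys are distinct,
# so both records can live in the same harmonized object.
records = {crm: "Salesforce contact", erp: "SAP customer"}
print(len(records), str(crm), str(erp))
```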

Object Type	Layer	Schema	Purpose
DLO (Data Lake Object)	Raw	Source-native, unmapped	Landing zone for ingested data
DMO (Data Model Object)	Harmonized	Customer 360 standard model	Unified, queryable, AI-ready data
UDLO (Unstructured DLO)	Raw Unstructured	None (PDFs, HTML, audio)	Reference container for unstructured files
UDMO (Unstructured DMO)	Harmonized Unstructured	Grouped by content type	Chunked, vectorized, indexed for AI/RAG

🧠

Unstructured Data & AI Grounding

How Data 360 powers RAG for Agentforce, Prompt Templates, and Einstein AI

📁 UDLOs & UDMOs Containers

UDLO: References unstructured files — PDFs, HTML, audio transcripts, images
UDMO: Groups UDLOs into logical domains for AI consumption
No traditional schema — content-based organization
Equivalent to DLO/DMO but for unstructured content

✂️ Chunking & Vectorization AI Processing

Breaks unstructured content into semantically meaningful chunks
Each chunk is transformed into a vector embedding, a numeric representation that lets chunks be compared by semantic similarity
Chunking respects document structure (headings, paragraphs) for better context preservation
Output feeds Search Indexes for retrieval
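
To make structure-aware chunking concrete, here is a small sketch that splits text on heading lines and paragraph breaks. It assumes headings are marked with a leading `#`, which is an illustrative convention only; the actual chunking strategies in Data 360 are configured in the platform, and the embedding step is omitted here.

```python
def chunk_by_heading(text: str, max_chars: int = 500) -> list[dict]:
    """Split text into chunks that never cross a heading or paragraph break."""
    chunks, heading, buf = [], "", []

    def flush():
        body = " ".join(buf).strip()
        if body:
            chunks.append({"heading": heading, "text": body})
        buf.clear()

    for line in text.splitlines():
        line = line.strip()
        if line.startswith("#"):          # new section: close the current chunk
            flush()
            heading = line.lstrip("#").strip()
        elif not line:                    # blank line: paragraph boundary
            flush()
        else:
            if sum(len(p) + 1 for p in buf) + len(line) > max_chars:
                flush()                   # keep chunks under the size budget
            buf.append(line)
    flush()
    return chunks


doc = "# Section A\nFirst paragraph.\n\nSecond paragraph.\n# Section B\nThird paragraph."
for chunk in chunk_by_heading(doc):
    print(chunk)
```

Keeping the heading with each chunk is one simple way to preserve context for retrieval.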

🔍 Search Indexes Retrieval

Vector Index: Semantic similarity search — finds conceptually related content
Hybrid Index: Combines vector + keyword search for broader coverage
Generated from vectorized chunks in UDMOs
Power Retrieval Augmented Generation (RAG) in Agentforce and Prompt Builder
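
The difference between a vector index and a hybrid index can be sketched as a scoring function. The example below blends cosine similarity with a simple keyword-overlap score; the blending weight `alpha`, the toy two-dimensional vectors, and all function names are assumptions for illustration. Real embeddings come from an embedding model, and the platform's ranking is not documented here.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def keyword_score(query, text):
    """Fraction of query words that appear in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_search(query, query_vec, index, alpha=0.5, top_k=3):
    """index: list of (text, vector). alpha=1.0 is pure vector search."""
    scored = [
        (alpha * cosine(query_vec, vec) + (1 - alpha) * keyword_score(query, text), text)
        for text, vec in index
    ]
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [text for _, text in ranked[:top_k]]


index = [
    ("electronics return policy", [1.0, 0.0]),
    ("store opening hours", [0.0, 1.0]),
]
print(hybrid_search("return policy", [0.9, 0.1], index, top_k=1))
```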

🌉 Retrievers Bridge Layer

Logical layer bridging the Search Index to downstream AI solutions
Run at inference time — triggered per query, not pre-baked
Surface relevant chunks to: AI Agents, Prompt Templates, Einstein Copilot
Can be scoped per Subagent for domain-specific knowledge grounding

RAG Pipeline — Unstructured Data to AI Response

📁 UDLO (PDFs, HTML, audio) → 📐 UDMO (grouped by domain) → ✂️ Chunk & Vectorize (embeddings) → 🔍 Search Index (vector/hybrid) → 🌉 Retriever (at inference time) → 🤖 AI Agent / Prompt Template (grounded response)

🔗

Harmonize, Unify & Analyze

Identity resolution, Data Graphs, and three types of Insights

👤 Identity Resolution Unification

Uses match and reconciliation rulesets to merge disparate records
Output: Unified Profiles (e.g., Unified Individual, Unified Account)
Match rules: exact match, fuzzy match, probabilistic — configurable per use case
Reconciliation rules determine which field value "wins" when sources conflict
FQKs ensure source records are correctly attributed post-merge

🕸️ Data Graphs Performance

Materialized, read-only JSON views combining multiple DMOs
Pre-computed for fast, near real-time or real-time querying
Power Real-Time Insights and low-latency AI activations
Think of them as pre-joined, optimized snapshots of related data
Automatically refresh based on configured update frequency

📊 Calculated Insights Batch

Built for high-volume data processing and complex, historical metric generation. Runs as batch jobs on large datasets — ideal for customer lifetime value, historical purchase frequency, or aggregated segment scores.

Best for: historical analysis, ML feature generation, complex aggregations

⚡ Streaming Insights Near Real-Time

Designed for continuous processing of micro-batches in near real-time. Processes data as it arrives in small windows — ideal for updating engagement scores, session activity, or cart abandonment signals.

Best for: behavioral signals, running counters, event-driven metric updates

🚀 Real-Time Insights Milliseconds

Calculates metrics from a single record in milliseconds, relying on real-time Data Graphs. Used when a decision must be made the instant a record is created or updated — e.g., next best action at the point of call.

Best for: in-the-moment decisions, live agent assistance, real-time personalization

Insight Type	Processing Model	Latency	Data Dependency	Use Case Example
Calculated Insights	Batch	Hours	Full DMO dataset	Customer Lifetime Value, 12-month purchase history
Streaming Insights	Micro-batch / Continuous	Seconds–Minutes	Incoming event stream	Rolling engagement score, cart activity
Real-Time Insights	Single-record compute	Milliseconds	Real-Time Data Graph	Next Best Action at point of contact

🛡️

Governance, Security & Organization

Data Spaces, access control, masking, and lineage

🗂️ Data Spaces Partitioning

Logical partitions that segregate data, metadata, and processes — for example by brand or department. Users only access contextually relevant data. Prevents cross-contamination between business units sharing one Data 360 instance.

🏷️ Data Tagging & Classification Metadata

Manual or AI-assisted application of metadata tags to objects and fields. Enforces governance taxonomies at scale — e.g., tagging fields as PII, GDPR-sensitive, or financial data. Enables automated policy enforcement downstream.

🔐 Access Control Security

RBAC (Role-Based): Access granted based on user role — e.g., Data Engineer sees all DMOs, Analyst sees only their Data Space.

ABAC (Attribute-Based): Access controlled by data attributes — e.g., a field tagged "PII" is hidden from roles without PII clearance. Enforces object-level, field-level, and row-level security.

🎭 Dynamic Data Masking Privacy

Conceals sensitive structured data in real-time without altering its underlying values or breaking relationships. A user without PII access sees ***-**-1234 instead of a full SSN — the data is still usable for joins and aggregations.
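
The key property is that masking happens at read time while the stored value stays intact. A minimal sketch, assuming a simple boolean PII entitlement; `mask_ssn` and `read_field` are hypothetical helpers, not platform functions.

```python
def mask_ssn(ssn: str) -> str:
    """Show only the last four digits, in the ***-**-1234 style."""
    digits = [c for c in ssn if c.isdigit()]
    return "***-**-" + "".join(digits[-4:])


def read_field(stored_value: str, has_pii_access: bool) -> str:
    # The stored value is never modified; only the returned view changes.
    return stored_value if has_pii_access else mask_ssn(stored_value)


stored = "123-45-1234"
print(read_field(stored, has_pii_access=False))
print(read_field(stored, has_pii_access=True))
```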

🗺️ Unified Lineage Traceability

A visual relationship graph tracing how data objects connect upstream and downstream. Shows the full journey: source connector → DLO → Transform → DMO → Insight → Activation. Essential for audits, impact assessments, and debugging data quality issues.
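
An impact assessment on a lineage graph amounts to a downstream traversal. The sketch below models lineage as an adjacency map and walks it breadth-first; every node name is hypothetical, and the platform exposes lineage visually rather than through this structure.

```python
from collections import deque

# Hypothetical lineage: source connector -> DLO -> Transform -> DMO -> Insight -> Activation
LINEAGE = {
    "s3_connector": ["pos_dlo"],
    "pos_dlo": ["normalize_pos_transform"],
    "normalize_pos_transform": ["sales_order_dmo"],
    "sales_order_dmo": ["clv_insight"],
    "clv_insight": ["marketing_activation"],
}


def downstream(node, graph=LINEAGE):
    """Return everything affected by a change to `node`, nearest first."""
    seen, queue, order = {node}, deque([node]), []
    while queue:
        for child in graph.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


print(downstream("normalize_pos_transform"))
```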

🏢

Multi-Org Strategy — Data Cloud One

One Home Org, multiple Companion Orgs sharing data without separate licenses

🏠 Home Org

Primary Salesforce org
Data 360 licensed here
Central management & provisioning
Hosts Data Spaces

⇅ Bidirectional

🔗 Companion Orgs A, B, and C

Linked Salesforce orgs
No separate Data 360 license
Access shared Data Spaces

Companion Orgs connect bidirectionally to the Home Org — sharing data and features without needing their own Data 360 instance

🏠 Home Org

Where Data 360 is provisioned and licensed
Centrally manages all Data Spaces, connectors, and governance policies
Hosts the master Unified Profiles and Insights
Acts as the single source of truth for all connected orgs

🔗 Companion Org

Additional Salesforce org linked bidirectionally to the Home Org
No separate Data 360 license required
Can access shared Data Spaces and Unified Profiles
Use case: multi-brand company where each brand has its own org but shares customer data

🗺️

Full Architecture Overview

End-to-end data flow from source to AI activation

Real-World Example

🛍️ Retail Chain — Unified Customer Profile & AI Personalization

A national retail chain with e-commerce, 200 physical stores, a loyalty program, and a service contact center wants to unify all customer data to power personalized Agentforce interactions and real-time next-best-action recommendations. This walkthrough applies every Data 360 concept to a concrete architecture decision.


🏢

Scenario — The Business Problem

What the retail chain needs and why Data 360 is the answer

🏪 The Company

NovaMart — national retail chain, 200 stores + e-commerce site
12 million loyalty program members
3 separate Salesforce orgs: Commerce Cloud, Service Cloud, Marketing Cloud
POS system (in-store transactions) running on a legacy on-premise platform
Product catalog and inventory in an external data warehouse (Snowflake)

❌ The Problems

A customer who buys in-store is unknown to the e-commerce and service teams
The contact center agent sees no purchase history — can't resolve complaints without asking
Marketing sends promotions without knowing what the customer already bought
Agentforce cannot answer "what did I buy last month?" — no unified data source
Same customer appears as 3 different records across orgs (email, loyalty ID, phone)

✅ The Goal

Build a Unified Customer Profile that merges online, in-store, loyalty, and service history into a single identity. Use it to:

Power an Agentforce service agent that knows the full purchase history
Generate a real-time "next best offer" at the moment of contact
Trigger personalized Marketing Cloud journeys based on in-store behavior
Give the contact center agent a 360 view without leaving Service Cloud
Segment customers by lifetime value and engagement tier
Comply with GDPR — tag and mask PII consistently across all sources

🔌

Connect — How Each Source Gets In

Choosing ingestion vs. federation per data source based on volume, frequency, and transformation needs

Source	Pattern	Why This Pattern	Lands in
Salesforce CRM (Service Cloud)	Data Stream	Native connector — data must be unified and transformed into DMOs for identity resolution	DLO → DMO (Individual, Case)
Commerce Cloud (e-commerce orders)	Data Stream	Native connector — high-frequency order events need transformation and unification	DLO → DMO (Order, Product)
POS System (in-store transactions)	Data Stream via S3	Legacy system exports nightly CSVs to Amazon S3 — S3 connector ingests into DLOs	DLO → Transform → DMO (Order)
Loyalty Program DB	Data Stream	Loyalty IDs are the primary cross-system key — must be ingested for identity resolution	DLO → DMO (Loyalty Member)
Snowflake (product catalog + inventory)	Query Federation	Large dataset (5M SKUs) — no need to copy it. Query live for product lookups	Federated (no DLO)
Product return policy PDFs	Data Stream (files)	Unstructured files for AI grounding — ingested as UDLOs for chunking and vectorization	UDLO → UDMO

💡 Key Decision

The Snowflake product catalog uses Query Federation — not ingestion — because it's large, rarely changes, and only needed for ad-hoc lookups. Copying 5M SKUs into Data 360 would waste storage and add ETL overhead with no benefit. Federated queries hit Snowflake directly at query time.

🔧

Prepare — DLOs, DMOs & FQKs

Mapping raw data into the harmonized Customer 360 model

DLO (Raw)	Source	Transform	DMO (Harmonized)	FQK Key Qualifier
crm_contact_raw	Service Cloud	Map name, email, phone to standard fields	Individual	CRM_SF
loyalty_member_raw	Loyalty DB	Map loyalty_id, email, tier to standard fields	Individual + LoyaltyMember	LOYALTY
ecom_order_raw	Commerce Cloud	Map order_id, customer_id, line items to standard	SalesOrder + SalesOrderProduct	ECOM
pos_transaction_raw	S3 (nightly CSV)	Normalize date format, map store_id to location, resolve currency	SalesOrder + SalesOrderProduct	POS
crm_case_raw	Service Cloud	Map case subject, status, resolution to standard fields	Case	CRM_SF

💡 Why FQKs Matter Here

An order with ID 1001 can exist in both Commerce Cloud (key qualifier ECOM) and the POS system (key qualifier POS). Without FQKs, the two records would collide on the shared key. With FQKs, ECOM_1001 and POS_1001 are globally unique: after identity resolution, both orders can be linked to the same Unified Individual without overwriting each other.

🔗

Unify — Identity Resolution & Data Graphs

Merging 3 records of the same person into one Unified Profile

🔍 Match Rules

The identity resolution ruleset for NovaMart runs in order:

Rule 1 — Exact email match: same email in CRM, loyalty, and e-commerce → merge
Rule 2 — Phone + last name: same mobile number and last name across sources → merge
Rule 3 — Loyalty ID cross-reference: loyalty_id stored in Commerce Cloud profile → merge
Rule 4 — Fuzzy name + address: probabilistic match for customers without email
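
The effect of Rules 1 and 2 can be sketched as transitive grouping: any two records that share a match key end up in the same cluster. The example uses a small union-find and deliberately skips Rule 3, Rule 4 (fuzzy matching), and rule ordering. Phone numbers, the `POS_7781` record, and all function names are hypothetical; this is not how the platform implements identity resolution.

```python
def match_keys(rec):
    """Yield one key per rule the record can participate in."""
    if rec.get("email"):
        yield ("email", rec["email"].strip().lower())                  # Rule 1
    if rec.get("phone") and rec.get("last_name"):
        yield ("phone+last", rec["phone"], rec["last_name"].lower())   # Rule 2


def resolve(records):
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    first_seen = {}
    for rec in records:
        for key in match_keys(rec):
            if key in first_seen:
                parent[find(rec["id"])] = find(first_seen[key])  # merge clusters
            else:
                first_seen[key] = rec["id"]

    clusters = {}
    for rec in records:
        clusters.setdefault(find(rec["id"]), []).append(rec["id"])
    return list(clusters.values())


records = [
    {"id": "CRM_SF_00301", "email": "sarah@novamail.com", "phone": "555-0100", "last_name": "Chen"},
    {"id": "LOYALTY_LY-44821", "email": "sarah@novamail.com", "phone": None, "last_name": "Chen"},
    {"id": "ECOM_CC-9912", "email": "SARAH@novamail.com", "phone": None, "last_name": "Chen"},
    {"id": "POS_7781", "email": None, "phone": "555-0199", "last_name": "Lopez"},
]
print(resolve(records))
```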

⚖️ Reconciliation Rules

When two sources have conflicting values for the same field, reconciliation determines which "wins":

Email: most recently updated source wins
Phone: Loyalty DB wins (most accurate — customers update it for rewards)
Full Name: Commerce Cloud wins (captured at checkout, most recent)
Address: POS wins (captured in-store at time of purchase)
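
These four rules can be expressed as a small lookup plus a fallback. In the sketch below, the source labels reuse the key qualifiers from the Prepare section; the fallback to "most recent" when the preferred source has no value is an assumption, as are the sample values and dates.

```python
from datetime import date

# Field -> source that wins. Email is absent: it uses "most recently updated".
SOURCE_PRIORITY = {"phone": "LOYALTY", "full_name": "ECOM", "address": "POS"}


def reconcile(field, candidates):
    """candidates: list of dicts with 'source', 'value', 'updated'."""
    candidates = [c for c in candidates if c["value"]]
    if not candidates:
        return None
    preferred = SOURCE_PRIORITY.get(field)
    for c in candidates:
        if c["source"] == preferred:
            return c["value"]
    # Email, or preferred source missing (assumed fallback): most recent wins.
    return max(candidates, key=lambda c: c["updated"])["value"]


phones = [
    {"source": "CRM_SF", "value": "555-0100", "updated": date(2024, 5, 1)},
    {"source": "LOYALTY", "value": "555-0142", "updated": date(2023, 1, 1)},
]
print(reconcile("phone", phones))  # Loyalty DB wins even though it is older
```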

🧩 The Result — Unified Individual: Sarah Chen

Source Record	Source Key	Matched Via	Merged Into
CRM Contact — Sarah Chen	CRM_SF_00301	Email: sarah@novamail.com	Unified Individual UI_sarah_chen_0042
Loyalty Member — S.Chen #LY-44821	LOYALTY_LY-44821	Email + loyalty_id in Commerce Cloud	Unified Individual UI_sarah_chen_0042
Commerce Customer — sarah@novamail.com	ECOM_CC-9912	Email exact match	Unified Individual UI_sarah_chen_0042

🕸️ Data Graph — Real-Time Profile for Agentforce

A Data Graph is configured to pre-join Sarah's Unified Individual with her last 90 days of orders (both online and in-store) and her open cases. This materialized JSON snapshot is what the Agentforce service agent queries in milliseconds when Sarah calls in.

Refresh rate: Near real-time (updates within minutes of a new transaction)
Used by: Real-Time Insights (next best action) and Agentforce service agent
Contains: Unified Individual fields + SalesOrders (90d) + Cases (open)
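
Conceptually, the Data Graph is a pre-joined JSON document built ahead of time so that reads need no joins. A rough sketch of that pre-join follows; field names, the year in the dates, and `build_data_graph` itself are hypothetical, and the platform builds and refreshes the graph for you.

```python
import json
from datetime import date, timedelta


def build_data_graph(individual, orders, cases, today, window_days=90):
    """Pre-join one Unified Individual with recent orders and open cases."""
    cutoff = today - timedelta(days=window_days)
    return {
        **individual,
        "sales_orders": [
            o for o in orders
            if o["individual_id"] == individual["id"] and o["order_date"] >= cutoff
        ],
        "open_cases": [
            c for c in cases
            if c["individual_id"] == individual["id"] and c["status"] == "Open"
        ],
    }


individual = {"id": "UI_sarah_chen_0042", "loyalty_tier": "Platinum"}
orders = [
    {"individual_id": "UI_sarah_chen_0042", "order_id": "POS_1001", "order_date": date(2024, 4, 28)},
    {"individual_id": "UI_sarah_chen_0042", "order_id": "ECOM_0007", "order_date": date(2023, 11, 2)},
]
cases = [{"individual_id": "UI_sarah_chen_0042", "case_id": "C-1", "status": "Closed"}]

graph = build_data_graph(individual, orders, cases, today=date(2024, 5, 20))
print(json.dumps(graph, default=str, indent=2))
```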

🧠

AI Grounding — Unstructured Data for Agentforce

How product return policies and FAQs become RAG-ready knowledge

📁 Unstructured Sources

Return policy PDFs — one per product category (Electronics, Apparel, Home)
FAQ HTML pages — customer service knowledge base (200+ articles)
Store location guides — hours, services, accessibility info per store
Promotion terms PDFs — legal terms for seasonal campaigns

🔧 Processing Pipeline

Files land in UDLOs (one per content type)
UDLOs grouped into UDMOs: "Service Knowledge" and "Store Information"
Each UDMO is chunked respecting HTML heading structure (H1–H3)
Chunks are vectorized → stored in a Hybrid Search Index
Two Retrievers wired to Agentforce: one per UDMO scope

RAG in Action — "Can I return these headphones?"

👤 Sarah: "I bought headphones 3 weeks ago in-store. Can I return them?"

→ Retriever queries "Service Knowledge" UDMO for "headphones return policy"

📄 Retrieved chunk: Electronics — 30-day return window from purchase date. In-store purchase: bring receipt or loyalty card. No receipt required for Gold/Platinum loyalty members.

→ Agentforce reads Sarah's Unified Profile: Platinum loyalty member, purchase date 22 days ago

🤖 Agent: "Yes, Sarah — as a Platinum member you don't need a receipt. Your Sony WH-1000XM5 headphones purchased on April 28 are within the 30-day return window. You can return them at any store location."
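
The agent's answer combines two inputs: the policy text from the retrieved chunk and the facts from Sarah's Unified Profile. In practice the language model performs this reasoning; the sketch below only makes the underlying check explicit. The function name and the "Silver" tier are hypothetical.

```python
def return_decision(days_since_purchase, loyalty_tier, has_receipt, window_days=30):
    """Apply the retrieved Electronics policy to facts from the profile."""
    if days_since_purchase > window_days:
        return "outside return window"
    if has_receipt or loyalty_tier in {"Gold", "Platinum"}:
        return "eligible"
    return "eligible with receipt or loyalty card"


# Sarah: Platinum member, purchased 22 days ago, no receipt mentioned
print(return_decision(22, "Platinum", has_receipt=False))
```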

⚡

Insights — Three Types in Practice

How NovaMart uses all three insight types for different business decisions

📊 Calculated Insight Batch

Customer Lifetime Value (CLV) — runs nightly as a batch job across all 12M loyalty members. Aggregates 24 months of order history, calculates average order value, purchase frequency, and return rate. Output stored back to a DMO field (clv_score) on Unified Individual.

Used by: Marketing Cloud segmentation, loyalty tier assignment
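
The three inputs named above can be computed with a straightforward group-by. This sketch assumes the orders are already filtered to the 24-month window and stops at the input metrics, since the text does not specify how they combine into `clv_score`. Calculated Insights are defined in the platform; this is plain Python for illustration only.

```python
from collections import defaultdict


def clv_features(orders, months=24):
    """orders: iterable of dicts with customer_id, amount, returned (bool)."""
    by_customer = defaultdict(list)
    for order in orders:
        by_customer[order["customer_id"]].append(order)

    features = {}
    for customer_id, rows in by_customer.items():
        total = sum(r["amount"] for r in rows)
        features[customer_id] = {
            "avg_order_value": total / len(rows),
            "orders_per_month": len(rows) / months,
            "return_rate": sum(r["returned"] for r in rows) / len(rows),
        }
    return features


sample = [
    {"customer_id": "A", "amount": 100.0, "returned": False},
    {"customer_id": "A", "amount": 50.0, "returned": True},
    {"customer_id": "B", "amount": 20.0, "returned": False},
]
print(clv_features(sample))
```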

⚡ Streaming Insight Near Real-Time

30-day Purchase Activity Score — processes every new transaction event within seconds. Maintains a rolling 30-day engagement score that updates as each purchase, return, or browse event arrives. No need to wait for nightly batch.

Used by: Agentforce context (is this customer active?), in-app personalization
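
A rolling score of this kind can be maintained incrementally: add each new event and evict events that have aged out of the window. The sketch below assumes events arrive in timestamp order; the event weights and the class name are hypothetical.

```python
from collections import deque
from datetime import datetime, timedelta


class RollingActivityScore:
    WEIGHTS = {"purchase": 3, "return": -1, "browse": 1}  # hypothetical weights

    def __init__(self, window=timedelta(days=30)):
        self.window = window
        self.events = deque()  # (timestamp, weight), oldest first
        self.score = 0

    def add(self, ts, kind):
        weight = self.WEIGHTS[kind]
        self.events.append((ts, weight))
        self.score += weight
        # Evict events that have fallen out of the rolling window.
        while self.events and self.events[0][0] <= ts - self.window:
            _, old_weight = self.events.popleft()
            self.score -= old_weight
        return self.score


score = RollingActivityScore()
score.add(datetime(2024, 4, 1), "purchase")
score.add(datetime(2024, 4, 20), "browse")
print(score.add(datetime(2024, 5, 5), "purchase"))  # the April 1 purchase has aged out
```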

🚀 Real-Time Insight Milliseconds

Next Best Offer — fires in milliseconds when Sarah's inbound call is received. Reads the real-time Data Graph (Unified Profile + recent orders) and calculates the most relevant offer from current promotions based on her purchase history and CLV tier.

Used by: Agentforce service agent — surfaces offer before agent responds

🛡️

Governance — How NovaMart Handles PII & Compliance

Data Spaces, tagging, masking, and access control in practice

🗂️ Data Spaces

NovaMart runs one Data 360 instance (Home Org) shared by multiple business units. Data Spaces partition the data:

Retail Operations — POS transactions, store data, inventory
Customer Experience — Unified Profiles, cases, loyalty
Marketing — segments, campaign history, engagement scores

The Marketing team cannot query raw POS data. They work only in the Marketing space, which holds aggregated segments built from Customer Experience data.

🏷️ PII Tagging & Masking

Fields tagged as PII at the DMO level:

Individual.email → tagged PII, GDPR
Individual.mobile_phone → tagged PII, GDPR
Individual.date_of_birth → tagged PII, GDPR, HIPAA
SalesOrder.billing_address → tagged PII

Dynamic Data Masking is applied: a Marketing analyst running a query sees s***@novamail.com — the email is usable for matching but not readable. The full value is only visible to roles with PII clearance.

🔐 Access Control — RBAC + ABAC Combined

Role	Data Space Access	PII Fields	Masking Applied
Data Engineer	All spaces	Full access (RBAC)	None
Marketing Analyst	Marketing only	No PII clearance (ABAC)	Email, phone masked
Service Agent (Agentforce)	Customer Experience	Email + phone visible (needed for contact)	DOB masked
Store Manager	Retail Operations	No PII clearance (ABAC)	All PII masked
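
The table combines two checks: a role-based check on the Data Space, then an attribute-based check on PII-tagged fields. A minimal sketch of that combination follows, restricted to the three Individual fields tagged above and using a fixed `****` mask for simplicity. The policy structure and `read_record` are hypothetical, not the platform's policy model.

```python
PII_FIELDS = {"email", "mobile_phone", "date_of_birth"}
ALL_SPACES = {"Retail Operations", "Customer Experience", "Marketing"}

ROLES = {
    "Data Engineer": {"spaces": ALL_SPACES, "pii_visible": PII_FIELDS},
    "Marketing Analyst": {"spaces": {"Marketing"}, "pii_visible": set()},
    "Service Agent": {"spaces": {"Customer Experience"}, "pii_visible": {"email", "mobile_phone"}},
    "Store Manager": {"spaces": {"Retail Operations"}, "pii_visible": set()},
}


def read_record(role, space, record):
    policy = ROLES[role]
    if space not in policy["spaces"]:  # RBAC: role decides Data Space access
        raise PermissionError(f"{role} cannot access {space}")
    return {
        # ABAC: the PII tag on a field decides whether the value is masked
        field: value if field not in PII_FIELDS or field in policy["pii_visible"] else "****"
        for field, value in record.items()
    }


record = {
    "email": "sarah@novamail.com",
    "mobile_phone": "555-0100",
    "date_of_birth": "1990-01-01",
    "loyalty_tier": "Platinum",
}
print(read_record("Service Agent", "Customer Experience", record))
```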

🗺️

Full Architecture Diagram — NovaMart

End-to-end flow from raw sources to Agentforce response

Ingest · Prepare · Unify · Analyze · Govern · Act
