Executive Summary: CoreWeave AI Cloud

CoreWeave is a pure-play NVIDIA GPU cloud whose entire differentiation lives at three layers — the compute fabric (0), infrastructure orchestration (2A), and application runtime (2B) — and which is, unusually for a cloud in this assessment, built deliberately on open substrate. Its orchestration opinions rest on Kubernetes (CKS) and Slurm (SUNK), not a proprietary managed scheduler, and SUNK Anywhere runs those same workflows on non-CoreWeave and on-prem clusters with few configuration changes. That makes CoreWeave's orchestration layer the rare cloud cell that reads as Retained rather than Ceded.

The capture mechanism is decoupled and invisible. The substrate the buyer sees — Kubernetes, Slurm, an S3-compatible object store with no egress fees — is genuinely open and portable, which is reassuring at purchase. The value that actually accumulates sits in two captive places the buyer underprices: the silicon and fabric (wholly NVIDIA, with no x86-OEM substitutability because you rent CoreWeave's fleet rather than buy swappable hardware), and the Weights & Biases opinion layer at the top of the stack — experiment history, evaluation frameworks, and the closed training-to-inference 'Superintelligence Loop' — which is proprietary and does not move when the substrate does.

The data layers are where CoreWeave's openness gets its sharpest test, and the reading is substrate-agnostic: the customer contracts with CoreWeave for a CoreWeave service backed by a CoreWeave SLA, so whatever storage platform sits underneath (VAST, WEKA, DDN, and others all appear in the dedicated tier) is an implementation detail and irrelevant to authority — the same way a hyperscaler's object store is not scored on its drive supplier. What matters is the interface and the lift-to-leave. CoreWeave's default data services — AI Object Storage (CAIOS) via an S3-compatible API, and Distributed File Storage via POSIX — sit behind open, portable interfaces: a customer can re-point and leave, so authority is Delegated (a CoreWeave-managed service behind an open interface — the customer could switch without rebuilding; Retained is reserved for an open substrate the enterprise operates itself). The governance surface (catalog, lineage, advanced data services) exists only on the optional Dedicated tier, which CoreWeave's own docs steer most workloads away from; those governed artifacts are a closed proprietary platform with no open exit and are Ceded. So Layer 1A reads moderate — a real governance capability is available and CoreWeave-delivered, but it is optional and the only captive piece, while the default experience stays open. Context/retrieval (1B) and pipelines (1C) have no CoreWeave product at all and remain the enterprise's to provide.

The buyer's trade: CoreWeave offers the latest NVIDIA silicon, best-in-class GPU-cluster orchestration on open standards, and a credible agentic development loop via W&B, with a far more favorable orchestration-layer authority profile than any hyperscaler. In exchange, the buyer accepts total dependence on NVIDIA at Layer 0 (NVIDIA is supplier, strategic partner, and equity holder after the January 2026 $2B investment) and accumulates captive value in the W&B value plane. The instrument's reading: open where it is cheap to be open, captive where the value compounds.

Layer-by-layer status:

• Layer 0: Ceded to NVIDIA
• Layer 1A: Open default storage; governance on optional tier is Ceded
• Layer 1B: Enterprise-provided
• Layer 1C: Bytes move fast, pipelines absent
• Layer 2A: Strong, and Retained (open substrate)
• Layer 2B: Strong; authority splits by what you use
• Layer 2C: Gap (Improvement Loop, Not a Reasoning Plane)
• Layer 3 (+1): W&B developer plane, a captive opinion layer

Assessment framework: 4+1 Layer AI Infrastructure Model. Scoring model: Decision Authority Placement Model (DAPM) — Retained, Delegated, or Ceded. Published by The CTO Advisor LLC. Author: Keith Townsend. Date assessed: June 20, 2026. Version: v1.3 - Managed-Service Authority + 2C-Gap Reconciliation.

CoreWeave AI Cloud

The Open-Substrate Neocloud — Mapped to the 4+1 Layer AI Infrastructure Model

v1.3 - Managed-Service Authority + 2C-Gap Reconciliation · Assessed June 20, 2026

Source: CoreWeave Platform and Storage docs, NVIDIA GTC 2026 (HGX B300 GA, Vera Rubin NVL72 H2 2026), SUNK / CKS / Mission Control product pages and docs, CoreWeave AI Object Storage (CAIOS) + LOTA, Distributed File Storage, Dedicated storage tiers (VAST/WEKA/DDN/IBM Spectrum Scale/Pure), CoreWeave Sandboxes, Weights & Biases acquisition (closed 2025) and unified agentic AI launch (May 2026: Serverless RL, CoreWeave Inference, W&B Weave, W&B Skills), NVIDIA $2B investment (Jan 2026), SEC filings, analyst coverage.

v1.1 added Dedicated storage tier + Sandboxes (CoreWeave AR, May 29 2026). v1.2 corrects the 1A authority reasoning: the architectural guarantee is CoreWeave's SLA on a CoreWeave service; the storage substrate (VAST/WEKA/etc.) is an implementation detail invisible to the customer and is removed from the DAPM call, consistent with not scoring a hyperscaler's object store on its drive supplier.

Active assessment · Status legend: Strength · Moderate · Gap · Partner

Layer 0 · Compute · Compute & Network Fabric · Ceded to NVIDIA

Raw compute, networking, and acceleration fabric

Vendor-Provided

GPU compute (bare-metal CKS nodes) · Ceded

Single-GPU through 8× NVLink systems and multi-node InfiniBand clusters; HGX B300 with 2.1 TB HBM3e; GB200/GB300 rack-scale; Vera Rubin NVL72 expected H2 2026. Rented capacity, not owned hardware.

Network fabric · Ceded

NVIDIA Quantum-X800 InfiniBand, ConnectX, BlueField DPUs; multi-cloud backbone with private interconnects, direct cloud peering, 400 Gbps-capable ports. No commodity-substrate equivalent.

NVIDIA-Provided

NVIDIA GPU fleet

First-to-market NVIDIA generations: HGX B300 GA, GB200/GB300 rack-scale, Hopper/Ada; among first to deploy Vera Rubin NVL72 in H2 2026.

NVIDIA networking

Quantum-X800 InfiniBand, ConnectX NICs, BlueField DPUs for node/resource offload — the entire fabric is NVIDIA silicon.

◆ Gap Analysis

CoreWeave's Layer 0 is differentiated and complete — the freshest NVIDIA fleet of any cloud, 40+ data centers, ultra-low-latency fiber, BlueField-offloaded bare-metal nodes, and MLPerf-leading utilization. Calibrates with AWS and NVIDIA at this layer: strong capability, zero buyer authority over silicon or fabric. The authority call is harder than it looks. Dell and HPE score Retained at Layer 0 because x86/NVIDIA hardware is substitutable across OEMs — the enterprise owns the box and can swap the vendor. CoreWeave is not that: the enterprise rents CoreWeave's NVIDIA-only fleet, so there is no commodity substrate it controls and no swap that does not mean leaving CoreWeave. The dependence is also uniquely total — NVIDIA is supplier, Elite Cloud Partner, and (after the January 2026 $2B investment) a significant equity holder. There is no alternative-accelerator path here as there is on AWS (Trainium) or Google (TPU). Ceded.

◆ Working Notes

The NVIDIA column is the densest in this row. Unlike the hyperscalers, CoreWeave has no own-silicon hedge — NVIDIA dependence at Layer 0 is the structural fact of the business.

Layer 1A · Storage · Data Storage & Governance · Open default storage; governance on optional tier is Ceded

Durable, governed data foundation — the Governance Catalog that Layer 2C queries

Vendor-Provided

CoreWeave AI Object Storage (CAIOS) + LOTA · Delegated

CoreWeave's own S3-compatible object service; LOTA caches on GPU/CPU nodes for up to 7 GB/s per GPU, cross-region/cross-cloud reach, no egress/request fees, >75% lower cost; versioning, lifecycle, read-after-write consistency, security policies. Open S3 interface — the customer can leave by re-pointing workloads. The storage backend is CoreWeave's implementation detail, not part of the authority call. A managed CoreWeave service behind the multi-vendor S3 standard: object opinions lift to any S3 platform, so Delegated, not Retained (Retained is reserved for an open substrate the enterprise operates).

Distributed File Storage · Delegated

POSIX-compliant shared filesystem for multi-node training synchronization; snapshots, async deletion. A CoreWeave-managed service behind the open POSIX/NFS interface: workloads lift to another POSIX filesystem, so Delegated. Performance, not governance.

Dedicated tier governance services · Ceded

Optional single-tenant dedicated clusters expose a full data-services suite (catalog, database, data-engine, audit logging, cross-cluster replication), the only path to a real 1A governance surface. The suite is CoreWeave-delivered and SLA-backed, but it is a closed proprietary data platform with no open exit; governed artifacts can't be lifted without rebuilding. CoreWeave's own docs steer most workloads away from this tier.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

This cell is scored on the customer's contractual boundary, not on the storage substrate. The customer contracts with CoreWeave for a CoreWeave service under a CoreWeave SLA; the platform underneath (CAIOS's backend, or the VAST/WEKA/DDN/Spectrum Scale/Pure options in the dedicated tier) is an implementation detail CoreWeave manages and is invisible to the customer. It therefore does not enter the authority call, the same way a hyperscaler's object store is not scored on who makes its drives. What governs is the interface and the lift-to-leave.

Default data services are open at the interface and Delegated. AI Object Storage (CAIOS) is CoreWeave's own service, LOTA-accelerated (up to 7 GB/s per GPU, cross-cloud single-dataset reach, no egress, >75% lower cost), and it speaks an S3-compatible API with versioning, lifecycle, read-after-write consistency, and security policies. Distributed File Storage presents a POSIX shared filesystem for training synchronization. Both sit behind open, portable interfaces: a customer can re-point S3 or POSIX workloads to another provider without rebuilding the data layer. That is Delegated: a CoreWeave-managed service behind an open interface, where the customer could switch without rebuilding but does not operate the substrate (which would be Retained).

A real governance surface exists, but only on the optional Dedicated tier and only as a closed platform. That tier exposes a full data-services suite (catalog, database, data-engine functions, audit logging, cross-cluster replication), a genuine Layer 1A governance capability that CoreWeave delivers and supports. But CoreWeave's own docs name the default tiers (CAIOS, Distributed File Storage) as the recommended starting point for most AI workloads, so the governance-rich path is the exception, not the norm. And those governed artifacts (the catalog metadata, the database logic, the data-engine pipelines) are a proprietary platform with no open exit: the customer cannot lift them to another vendor without rebuilding. Ceded, on the closed-system litmus, not because a partner supplies the engine.

Net: moderate. A governance capability is present and CoreWeave-delivered (so not gap), but it is optional, defaulted-away-from, and not a differentiated CoreWeave opinion set the way AWS Lake Formation is AWS's own (so not strong). Authority splits: Delegated on the open-interface default services, Ceded on the proprietary governance tier.

Note the standing decoy: 'S3-compatible, no egress, runs anywhere' describes portability of bytes through an open interface, which is the legitimate basis for the Delegated default here, but says nothing about the separate governance opinions, which are where the capture sits.
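To make "re-point and leave" concrete, here is a minimal sketch of what lift-to-leave looks like for an S3-compatible workload. The endpoint URLs and bucket name are hypothetical placeholders, not real CoreWeave or third-party addresses, and the sketch assumes the workload uses only standard S3 calls (shown here with the third-party boto3 client). Under that assumption, the only setting that differs between providers is the endpoint.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class S3Target:
    """Connection settings for one S3-compatible provider."""
    endpoint_url: str
    bucket: str


# Hypothetical endpoints and bucket name, for illustration only.
ORIGIN = S3Target("https://object-store.current-provider.example", "training-data")
DESTINATION = S3Target("https://s3.other-provider.example", "training-data")


def client_kwargs(target: S3Target) -> dict:
    """Build the keyword arguments passed to boto3.client(...)."""
    return {"service_name": "s3", "endpoint_url": target.endpoint_url}


def list_keys(target: S3Target, prefix: str = "") -> list[str]:
    """List object keys under a prefix; the code is identical for either provider."""
    import boto3  # third-party; imported lazily so the config logic runs without it

    s3 = boto3.client(**client_kwargs(target))
    response = s3.list_objects_v2(Bucket=target.bucket, Prefix=prefix)
    return [obj["Key"] for obj in response.get("Contents", [])]


def changed_settings(old: S3Target, new: S3Target) -> dict:
    """Return only the settings that differ between two targets."""
    before, after = asdict(old), asdict(new)
    return {k: (before[k], after[k]) for k in before if before[k] != after[k]}


# The whole "migration" at the interface level is a single changed field.
print(changed_settings(ORIGIN, DESTINATION))
```

The sketch covers only the interface. Copying the bytes themselves, and anything held in the Dedicated tier's governance services, is outside what an endpoint change can move.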

◆ Working Notes

v1.2 (May 29 2026): corrected the authority reasoning. v1.0 scored 1A gap on default storage alone; v1.1 moved it to moderate but mis-attributed the authority to VAST as a third party. The customer's guarantee is with CoreWeave, not the substrate vendor, so VAST/WEKA/etc. are removed from the DAPM call. Status stays moderate; the split is Delegated (open S3/POSIX interfaces — CoreWeave-managed services the customer could switch) vs. Ceded (proprietary governance services on the optional dedicated tier). v1.3 (Jun 20 2026): default-storage authority Retained→Delegated under the instrument-wide managed-service rule (a managed service behind a multi-vendor standard interface is Delegated; Retained is reserved for an open substrate the enterprise operates).
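The managed-service rule described in these notes can be written as a small decision function. This is a simplified sketch of the Layer 1A reading only, not the published DAPM instrument: the two boolean inputs (`open_interface`, `enterprise_operates`) are hypothetical names for analyst judgments, and other layers in this assessment weigh additional evidence.

```python
from enum import Enum


class Authority(Enum):
    RETAINED = "Retained"
    DELEGATED = "Delegated"
    CEDED = "Ceded"


def dapm_authority(open_interface: bool, enterprise_operates: bool) -> Authority:
    """Simplified reading of the Layer 1A managed-service rule.

    open_interface: the service sits behind a multi-vendor standard
        (for example S3 or POSIX), so workloads lift without rebuilding.
    enterprise_operates: the enterprise runs the open substrate itself.
    """
    if not open_interface:
        # Closed platform with no open exit.
        return Authority.CEDED
    if enterprise_operates:
        # Open substrate the enterprise operates.
        return Authority.RETAINED
    # Managed service behind an open interface.
    return Authority.DELEGATED


# The three Layer 1A cells as scored in this assessment.
cells = {
    "CAIOS + LOTA": dapm_authority(open_interface=True, enterprise_operates=False),
    "Distributed File Storage": dapm_authority(open_interface=True, enterprise_operates=False),
    "Dedicated tier governance": dapm_authority(open_interface=False, enterprise_operates=False),
}
for name, authority in cells.items():
    print(f"{name}: {authority.value}")
```

Written this way, the v1.3 change is visible as a single branch: a managed service can no longer reach Retained on interface openness alone.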

Layer 1B · Retrieval · Context Management & Retrieval · Enterprise-provided

Low-latency retrieval for RAG — vector/hybrid search, context windows

Vendor-Provided

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

No native vector database, hybrid search, or managed RAG context service. W&B Weave provides tracing and evaluation for agent and LLM behavior, not retrieval. Tensorizer accelerates model loading, not context assembly. The enterprise brings its own retrieval stack (pgvector, Milvus, Weaviate, etc.) and runs it on CoreWeave compute. The same boundary that governs 1A governs here: a capability counts only if CoreWeave exposes it as a service you buy and they SLA. The underlying storage platform on the dedicated tier may have similarity/indexing services, but CoreWeave does not surface a retrieval product, so there is nothing for the customer to consume as a CoreWeave 1B capability. An underlay capability that is not exposed to the customer is not a CoreWeave capability — it is a gap. Calibrates with NVIDIA's 1B gap: CoreWeave accelerates retrieval workloads at Layer 0/2B but does not provide the retrieval capability itself. Function remains the enterprise's — gap, DAPM Retained.

◆ Working Notes

Boundary rule: scored on what CoreWeave exposes and SLAs, not on what the storage substrate can technically do. CoreWeave sells storage tiers, not a retrieval product, so this is a gap regardless of substrate.

Layer 1C · Pipelines · Data Movement & Pipelines · Bytes move fast, pipelines absent

Move/transform data — ETL/ELT, lineage, cost-aware movement, KV cache tiering

Vendor-Provided

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

LOTA moves bytes fast and across clouds, and AI Object Storage gives a single dataset global reach without replication. But fast data transport is not a data pipeline. There is no ETL/ELT, no transformation framework, no lineage tracking, and no cost-aware movement orchestration product. The enterprise composes this from open tooling (Airflow, Ray Data, dlt, Spark) on CoreWeave compute. Same boundary as 1A and 1B: the dedicated tier's data-engine can do pipeline-style work, but CoreWeave does not expose an ETL/pipeline/lineage product the customer can buy and hold CoreWeave to. An underlay capability CoreWeave does not surface as a service is not a CoreWeave capability. Gap. Data movement ≠ data pipelines. Cross-cloud reach is a Layer 0/1A throughput property, not a Layer 1C pipeline capability. Function remains the enterprise's — gap, DAPM Retained.

Layer 2A · Orchestration · Infrastructure Orchestration · Strong, and Retained (open substrate)

GPU scheduling, quotas, RBAC, fair-share scheduling, utilization optimization

Vendor-Provided

CKS + SUNK (Slurm on Kubernetes) · Retained

Managed Kubernetes with Slurm integrated as a K8s scheduler; login/compute/controller nodes as Pods; preemption across Slurm and K8s workloads. Built on CNCF Kubernetes and open-source Slurm. SUNK Anywhere extends to non-CoreWeave and on-prem clusters.

Mission Control · Ceded

Proprietary cluster health, straggler detection, silent-fault mitigation, and node lifecycle management; ~50% fewer interruptions claimed. Operational intelligence layer captive to CoreWeave.

NVIDIA-Provided

Topology awareness on NVIDIA fabric

SUNK topology-aware scheduling tuned to GB200 rack-scale NVLink/InfiniBand layouts; Mission Control detects GPU stragglers and silent hardware faults.

◆ Gap Analysis

This is CoreWeave's signature layer and the most consequential authority call in the row. CKS (CoreWeave Kubernetes Service) plus SUNK (Slurm on Kubernetes) plus Mission Control deliver differentiated, complete GPU-cluster orchestration: unified Slurm batch scheduling and Kubernetes container orchestration on one cluster, topology-aware placement, preemption logic across both schedulers, automated node lifecycle and straggler mitigation, and self-service cluster provisioning. Strong capability — peer to the hyperscalers' managed orchestration and, by several customer accounts, ahead of it for large training clusters. The authority reading is where CoreWeave diverges from every hyperscaler in this assessment. The hyperscalers' managed orchestration is scored present-and-Ceded because the scheduling opinions (Karpenter, proprietary fair-share) are captive. CoreWeave's orchestration opinions rest on open substrate — Kubernetes (CNCF) and Slurm (open-source, SchedMD) — and SUNK Anywhere explicitly runs the same workflows on non-CoreWeave clusters and on-prem 'with very few configuration changes,' confirmed by customers running it across providers. The enterprise can lift its Slurm/K8s scheduling opinions and operate them elsewhere without rebuilding. By the litmus, that is Retained, and it calibrates near the IBM/Red Hat standard. Two proprietary slivers sit inside the otherwise-Retained layer: Mission Control (the health/straggler-detection and lifecycle service) and the SUNK self-service operator/console are CoreWeave IP. A buyer who depends on Mission Control's operational intelligence specifically inherits a Ceded dependency — but the core scheduling abstraction they would carry to another platform is open.

◆ Working Notes

Pressure-tested against the tidy-story risk: it would be neat to call every cloud's 2A Ceded. CoreWeave genuinely breaks that pattern because it ships the open schedulers as the product rather than hiding a proprietary one behind a managed service. The Retained call is evidence-driven (SUNK Anywhere portability), not generous.

Layer 2B · Runtime · Application Runtime & Execution · Strong; authority splits by what you use

Model serving, agent execution, inference APIs, distributed inference

Vendor-Provided

CKS open serving (KServe / Kubeflow) + Tensorizer · Retained

Open model-serving on managed Kubernetes; Tensorizer accelerates model load from S3. Runtime opinions portable off CoreWeave.

CoreWeave Inference + W&B Serverless RL · Ceded

Always-on production inference with integrated monitoring; environment-free Serverless RL for agentic post-training (~40% cost reduction, ~1.4× faster claimed). Proprietary runtime.

CoreWeave Sandboxes · Ceded

Productized isolated execution environment for RL, agent, and evaluation workloads, integrated into the inference service. The execution surface the Weave-driven improvement loop writes fine-tuning, eval, and RL changes into. Proprietary.

NVIDIA-Provided

Optimized runtime on NVIDIA

Inference and training tuned to NVIDIA GPUs; integrates NVIDIA stack but does not require NVIDIA AI Enterprise as the only runtime path.

◆ Gap Analysis

Strong, complete runtime capability across the training-to-inference span: CKS natively integrates open serving stacks (KServe, Kubeflow), Tensorizer streams serialized models from S3/HTTPS for fast cold-start (5× faster downloads claimed), and CoreWeave Inference runs always-on production workloads with built-in scaling and health monitoring. W&B Serverless RL adds post-training (RL) execution without provisioning, and CoreWeave Sandboxes provide a productized isolated execution environment for agent, RL, and evaluation workloads: the runtime substrate the improvement loop (Layer 2C) drives changes into.

Authority is genuinely split and depends on the presented path the buyer chooses. It is flagged here rather than averaged away:

• Open serving (KServe/Kubeflow on CKS): the runtime opinions are open and portable. Retained.
• CoreWeave Inference / W&B Serverless RL / Sandboxes: proprietary CoreWeave/W&B runtime; adopting it means the execution opinions are captive. Ceded where used.

Calibrates with NVIDIA 2B (NIM/Dynamo runtime, strong/Ceded) on the proprietary path, but CoreWeave, unlike NVIDIA, also offers a fully open serving path that keeps the layer Retained. Strong capability, mixed authority.

◆ Working Notes

This is the one cell whose score rests partly on an inferred reading rather than on the product alone: the DAPM depends on which runtime a given customer adopts. Scored as presented architecture (both paths shipped), with the split named rather than collapsed. v1.1: added CoreWeave Sandboxes (confirmed by CoreWeave AR, May 29 2026) as the isolated RL/agent/eval execution environment inside the inference service.

Layer 2C · Reasoning · Agentic Infrastructure: The Reasoning Plane · Gap: Improvement Loop, Not a Reasoning Plane

Policy-driven placement and resource coordination — the Autonomy Layer

Vendor-Provided

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

CoreWeave's May 2026 'Superintelligence Loop' (Serverless RL + CoreWeave Inference + W&B Weave observability + W&B Skills) is real Intelligence-2C: a closed feedback loop that governs which agents improve, evaluates agent actions against custom signals, surfaces failure modes in multi-agent workflows, and (via Skills + MCP server) turns coding agents into autonomous agent-improvers. It is productized and adopted, not a slideware claim.

But routing is not reasoning, and improvement is not placement. This is closed-loop agent operations and continuous improvement, not live Infrastructure-2C: no policy engine answers 'given data residency, cost, latency, and compliance, run this inference on B300 in region X versus region Y at request time.' The physical placement underneath is just SUNK/Kubernetes scheduling.

Calibrates with AWS, which has productized Intelligence-2C (AgentCore Policy/Guardrails) and absent Infrastructure-2C, but CoreWeave's Intelligence-2C is narrower: it governs agent improvement and evaluation, not a general action-authorization policy plane like Cedar-based AgentCore Policy. As a single agent-improvement-and-evaluation loop, it sits below the bar for a scored Layer 2C capability. The same line is applied to Nutanix's Agent Gateway and VMware's MCP governance, both treated as sub-threshold building blocks at gap, versus the productized multi-component agent-governance/policy planes that earn moderate (AWS AgentCore Policy/Guardrails/Evaluations/Registry; Cisco identity + defense + observability). So this is a gap, not moderate: the enterprise retains policy-driven placement and a general action-authorization plane. The live per-inference placement gap is universal across this assessment, not specific to CoreWeave; it is noted, not penalized as a unique defect.

Authority: the W&B improvement/observability loop is proprietary IP, named here as an emerging signal rather than scored; the enterprise retains policy-driven placement, and the underlying scheduling is the Retained SUNK/K8s layer.
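For readers who want the missing capability in concrete terms, the following is a minimal sketch of the kind of request-time question an Infrastructure-2C policy engine would answer. It is not a CoreWeave product or API. Every name, region, price, and latency figure is hypothetical, and a real engine would also weigh compliance attestations, capacity, and data gravity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    name: str
    jurisdiction: str         # where the data would be processed
    cost_per_gpu_hour: float  # made-up figures, for illustration only
    latency_ms: float         # expected latency to the caller


@dataclass(frozen=True)
class InferenceRequest:
    allowed_jurisdictions: frozenset  # data-residency constraint
    max_latency_ms: float             # latency budget


def place(request: InferenceRequest, regions: list) -> Optional[Region]:
    """Pick the cheapest region that satisfies residency and latency policy."""
    eligible = [
        r for r in regions
        if r.jurisdiction in request.allowed_jurisdictions
        and r.latency_ms <= request.max_latency_ms
    ]
    # Returns None when no region satisfies the policy.
    return min(eligible, key=lambda r: r.cost_per_gpu_hour, default=None)


regions = [
    Region("region-x", "EU", cost_per_gpu_hour=4.0, latency_ms=30.0),
    Region("region-y", "US", cost_per_gpu_hour=3.0, latency_ms=20.0),
]

eu_only = InferenceRequest(frozenset({"EU"}), max_latency_ms=50.0)
anywhere = InferenceRequest(frozenset({"EU", "US"}), max_latency_ms=50.0)

print(place(eu_only, regions).name)   # residency forces region-x
print(place(anywhere, regions).name)  # cost picks region-y
```

The contrast with the improvement loop is the axis of the decision: this function decides where one request runs, while the Superintelligence Loop decides how an agent gets better over time.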

◆ Working Notes

Pressure-tested 'routing is not reasoning': the Superintelligence Loop is genuinely agentic but operates on the model/agent-quality axis, not the request-placement axis. Scored as a gap on the same basis as the other infrastructure rows whose single 2C signals (Nutanix Agent Gateway, VMware MCP governance) are sub-threshold.

Layer 3 (+1) · Applications · AI Application Layer: The Value Plane · W&B developer plane, a captive opinion layer

AI-powered business capabilities — business logic, workflow automation

Vendor-Provided

W&B Models + Registry · Ceded

Experiment tracking, model versioning, artifact lineage, lifecycle promotion. The accumulated experiment and lineage history is the captive value.

W&B Weave + Skills · Ceded

Agent/LLM tracing, evaluation framework, production monitoring, Playground; Skills + MCP server for autonomous agent improvement. Proprietary developer value plane.

NVIDIA-Provided

Assessment pending

◆ Gap Analysis

Through Weights & Biases (acquired 2025), CoreWeave owns a genuine, widely-adopted value plane: W&B Models + Registry (experiment tracking, model versioning, lineage, lifecycle promotion), W&B Weave (tracing, evaluation, production monitoring, Playground), and W&B Skills. W&B powers 1,500+ organizations including 30+ foundation-model builders. This is a real Layer 3 opinion the vendor owns and operates, not merely an ISV catalog — so it is not scored 'partner.' It is narrower than the hyperscaler value planes, which is why it is moderate rather than strong. W&B is a developer/MLOps and agent-ops surface, not a breadth of business-workflow applications comparable to AWS Bedrock/Q/Kiro or Google's application stack. There is no business-process application ecosystem here — the value plane is for the people who build AI, not the people who consume it in line-of-business workflows. This is the decoupled capture, and it is the most important reading in the row. The substrate the buyer chose CoreWeave for — open Kubernetes, open Slurm, S3-API storage, no egress — is reassuringly portable. The value that actually compounds — years of experiment history, evaluation frameworks, the improvement loop, the model registry's lineage — accumulates inside W&B, which is proprietary with no open exit. The buyer feels free because the visible layer is open, and underprices the W&B commitment precisely because everything beneath it can move. Ceded.

◆ Working Notes

The W&B capture is the inverse of the hyperscalers' coupled capture: there the data lives visibly in the vendor's namespace; here the data stays open while the opinion layer above it is captive. More reassuring at purchase, which is what makes it worth naming.

4+1 Layer AI Infrastructure Model · Vendor Assessment Series · The CTO Advisor LLC · thectoadvisor.com