Riven Models

Train, evaluate, and serve models at scale.

The Riven AI Platform gives you a unified control plane for the full model lifecycle — from fine-tuning to production inference. Built on vLLM with BM25 + vector hybrid search, it delivers sub-50ms p99 latency with full observability baked in.

Serving Engine: vLLM

p99 Latency: < 50ms

Tokens / sec: 2k+


riven models — training run

finetune-v3 · GRPO · Running

GPU: A100 · batch: 32 · lr: 2e-4

Loss: 2.4

Eval score: 12%

Improving...

step 0 / 1000


What's included

Everything you need, nothing you don't.

vLLM Inference Engine

Continuous batching, PagedAttention, and tensor parallelism. Deploy any Hugging Face or custom model in minutes with < 50ms p99 latency.

Fine-Tuning Pipelines

LoRA, QLoRA, and full fine-tuning workflows. Connect your dataset, pick a base model, and let the pipeline handle the rest.
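To give a feel for why LoRA is cheaper than full fine-tuning, here is a minimal sketch of the parameter arithmetic. LoRA freezes a dense weight matrix and trains two low-rank factors instead. The function names and the layer sizes below are illustrative assumptions, not Riven defaults or part of any Riven API.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in the two low-rank factors B (d_out x rank) and A (rank x d_in)."""
    if rank <= 0 or rank > min(d_out, d_in):
        raise ValueError("rank must be in 1..min(d_out, d_in)")
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the dense weight matrix W (d_out x d_in)."""
    return d_out * d_in


if __name__ == "__main__":
    # Illustrative sizes only (assumed, not taken from the Riven platform).
    d_out, d_in, rank = 4096, 4096, 8
    lora = lora_trainable_params(d_out, d_in, rank)
    dense = full_params(d_out, d_in)
    print(f"LoRA trains {lora:,} of {dense:,} weights ({lora / dense:.2%})")
```

For a single 4096 x 4096 layer at rank 8, the adapter holds 65,536 trainable weights against 16,777,216 in the dense matrix, which is where most of the memory savings come from.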

Evaluation Framework

Built-in eval harness with MMLU, HellaSwag, and custom benchmarks. Compare model versions side by side with drift detection.
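A side-by-side comparison can be as simple as diffing per-benchmark scores between two versions and flagging drops beyond a tolerance. The sketch below is one possible approach, assuming scores are fractions in [0, 1]; the function name, the tolerance, and the scores are hypothetical placeholders, not measured results or the platform's actual drift logic.

```python
def flag_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.01,
) -> dict[str, float]:
    """Return benchmarks where the candidate trails the baseline by more than tolerance."""
    regressions: dict[str, float] = {}
    for name, base_score in baseline.items():
        if name not in candidate:
            continue  # benchmark not run for the candidate; nothing to compare
        delta = candidate[name] - base_score
        if delta < -tolerance:
            regressions[name] = delta
    return regressions


if __name__ == "__main__":
    # Placeholder scores for illustration only.
    v1 = {"mmlu": 0.62, "hellaswag": 0.78}
    v2 = {"mmlu": 0.64, "hellaswag": 0.75}
    for name, delta in flag_regressions(v1, v2).items():
        print(f"{name}: {delta:+.3f}")
```

With these placeholder numbers only hellaswag is flagged, since mmlu improved and the hellaswag drop exceeds the one-point tolerance.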

Inference Observability

Token throughput, latency percentiles, and per-request traces. Grafana dashboards auto-provisioned on deploy.
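For readers less familiar with latency percentiles, the sketch below shows how a p99 figure can be derived from raw request latencies using the nearest-rank method. It is a plain-Python illustration under that assumption; production metrics stacks such as Prometheus typically estimate quantiles from histogram buckets instead, and the sample latencies here are made up.

```python
import math


def percentile(samples: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


if __name__ == "__main__":
    # Made-up latencies in milliseconds: 98 fast requests and 2 slow ones.
    latencies_ms = [20.0] * 98 + [120.0] * 2
    print("p50:", percentile(latencies_ms, 50))
    print("p99:", percentile(latencies_ms, 99))
```

In this example the median is 20 ms while the p99 is 120 ms, which is why tail percentiles are tracked separately from averages.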

Full capability list

vLLM-based serving with PagedAttention

Multi-GPU tensor & pipeline parallelism

LoRA / QLoRA fine-tuning

BM25 + vector hybrid retrieval

RLHF & DPO training loops

Model registry with versioning

A/B and shadow traffic routing

Auto-scaling on GPU utilization

OpenAI-compatible API surface

Prometheus + Grafana observability
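As an illustration of the BM25 + vector hybrid retrieval item above, here is a minimal sketch of reciprocal rank fusion (RRF), one common way to merge a keyword ranking with a vector ranking. This page does not say which fusion method Riven uses, so treat RRF, the function name, and the document ids as assumptions for the example.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked lists of document ids; each input list is ordered best first."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for position, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); k damps the weight of top ranks.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + position)
    # Sort by fused score, highest first; break ties by id for a deterministic order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]    # hypothetical keyword results
    vector_ranking = ["b", "d", "a"]  # hypothetical embedding results
    print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
```

RRF only needs rank positions, so it avoids having to normalize BM25 scores and cosine similarities onto a common scale.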

Early access

Riven is in beta — pricing opens as we leave beta. Request access and we'll reach out within a few days.
