Introduction

Empirical is the fastest way to test different LLMs, prompts and other model configurations, across all the scenarios that matter for your application.

Try it out!

With Empirical, you can

Run your test datasets locally against off-the-shelf models
Test your own custom models and RAG applications (see how-to)
View, compare, and analyze outputs in reports on a web UI
Score your outputs with scoring functions
Run tests on CI/CD
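
To make the idea of scoring outputs across a test dataset concrete, here is a minimal, generic sketch in Python. It is not Empirical's actual API: the names `exact_match`, `run_tests`, `echo_model`, and the `input`/`expected` sample fields are all hypothetical, chosen only to illustrate the workflow of running each sample through a model and scoring the result.

```python
from typing import Callable

# Hypothetical sample shape: each test case pairs an input with an expected answer.
Sample = dict[str, str]


def exact_match(output: str, sample: Sample) -> float:
    """Return 1.0 when the output equals the expected text (ignoring outer whitespace)."""
    return 1.0 if output.strip() == sample["expected"].strip() else 0.0


def run_tests(
    model: Callable[[str], str],
    dataset: list[Sample],
    scorer: Callable[[str, Sample], float],
) -> float:
    """Run every sample through the model and return the mean score."""
    scores = [scorer(model(sample["input"]), sample) for sample in dataset]
    return sum(scores) / len(scores) if scores else 0.0


def echo_model(prompt: str) -> str:
    # Stand-in for a real LLM call; it simply upper-cases the prompt.
    return prompt.upper()


dataset = [
    {"input": "hello", "expected": "HELLO"},  # matches the stand-in model
    {"input": "world", "expected": "World"},  # does not match
]

mean_score = run_tests(echo_model, dataset, exact_match)
print(f"mean score: {mean_score}")
```

Swapping `echo_model` for a function that calls a real model, or `exact_match` for a task-specific check, keeps the same loop. A mean score like this is also the kind of value a CI/CD job could compare against a threshold.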

Walk through

Watch a 6-minute demo video showing how Empirical can run the HumanEval benchmark.

Open source

Empirical is open source on GitHub. To contribute to the project, star the repo, file issues, or open pull requests.
