Empirical bundles a CLI and a web app: the CLI runs tests, and the web app visualizes results.

Everything runs locally and is configured through a JSON file, empiricalrc.json.

Required: Node.js 20+ installed on your system.

Start with a basic example

In this example, we will ask an LLM to parse user messages, extract entities, and give us structured JSON output. For example, “I’m Alice from Maryland” will become {"name": "Alice", "location": "Maryland"}.

Our test will succeed if the model outputs valid JSON.
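The pass/fail rule can be pictured as follows. This is an illustration only, not Empirical's actual scorer code: an output passes if it parses as JSON, so a markdown-fenced response fails.

```javascript
// Sketch of the validity check this test relies on (illustrative, not
// Empirical's internal implementation): pass if JSON.parse succeeds.
function isValidJson(output) {
  try {
    JSON.parse(output);
    return true;
  } catch {
    return false;
  }
}

console.log(isValidJson('{"name": "Alice", "location": "Maryland"}')); // true
console.log(isValidJson('```json\n{"name": "Alice"}\n```'));           // false
```

Note that the second output fails even though it contains JSON, because the surrounding markdown fences make the string as a whole unparseable. This is exactly the GPT-4 Turbo behavior addressed in the bonus step below.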

1. Set up Empirical

Use the CLI to create a sample configuration file, empiricalrc.json.

npx @empiricalrun/cli init

Read the file to see the configured models and dataset samples that we will test for. The default configuration uses models from OpenAI.

cat empiricalrc.json
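The generated file is the authoritative reference for the schema. As a rough illustration of the kind of structure to expect — the field names and values below are illustrative guesses, not the actual generated content — it pairs a set of model runs with dataset samples to test them against:

```json
{
  "runs": [
    { "type": "model", "provider": "openai", "model": "gpt-3.5-turbo" },
    { "type": "model", "provider": "openai", "model": "gpt-4-turbo-preview" }
  ],
  "dataset": {
    "samples": [
      { "inputs": { "user_message": "I'm Alice from Maryland" } }
    ]
  }
}
```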

2. Run the test

Run the test samples against the models with the run command.

npx @empiricalrun/cli run

This step requires the OPENAI_API_KEY environment variable to authenticate with OpenAI. With the default models, this run costs about $0.0026.
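If the variable is not already set, export it in your shell before running the command (the key value below is a placeholder; use your own key from the OpenAI dashboard):

```shell
# Make your OpenAI API key available to the CLI for this shell session.
# "sk-..." is a placeholder; substitute your real key.
export OPENAI_API_KEY="sk-..."

# Then run the tests:
# npx @empiricalrun/cli run
```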

3. See results

Use the ui command to open the reporter web app in your web browser and see side-by-side results.

npx @empiricalrun/cli ui

4. [Bonus] Fix GPT-4 Turbo

GPT-4 Turbo tends to fail our JSON syntax check because it returns output wrapped in markdown code fences (```json backticks). We can fix this behavior by enabling JSON mode.

{
  "model": "gpt-4-turbo-preview",
  // ...
  // Existing properties
  "parameters": {
    "response_format": {
      "type": "json_object"
    }
  }
}

Re-running the test with npx @empiricalrun/cli run will give us better results for GPT-4 Turbo.

Make it yours

Edit the empiricalrc.json file to make Empirical work for your use case.