Using a Python function as the entrypoint, you can define a custom model to test with Empirical. This method can also be used to test an application that does pre- or post-processing around the LLM call, or that chains multiple LLM calls together.

Run configuration

In your config file, set the type field to py-script and specify the path to the Python file in the path field.

"runs": [
  {
    "type": "py-script",
    "path": "rag.py"
  }
]

You can additionally pass the following properties in the run configuration:

  • name: string - a custom name for your run
  • parameters: object - values passed to the script to modify its behavior
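As an illustration, a run entry that sets both optional properties could take the shape printed by the sketch below. The run name and the keys inside parameters (model, top_k) are hypothetical placeholders, not values Empirical requires; the script itself decides which parameter keys it reads.

```python
import json

# Hypothetical run entry: "name" and "parameters" are the optional properties.
# The keys inside "parameters" are placeholders chosen by the script author.
run = {
    "type": "py-script",
    "path": "rag.py",
    "name": "rag-top-3",
    "parameters": {
        "model": "my-model",
        "top_k": 3,
    },
}

# Print the entry in the shape it would take inside the "runs" array.
print(json.dumps({"runs": [run]}, indent=2))
```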

The Python file is expected to define a function called execute with the following signature:

  • Arguments
    • inputs: dict of key-value pairs with sample inputs
    • parameters: dict of key-value pairs with the run parameters
  • Returns: a dict with
    • output (string): The response from the model/application
    • metadata (dict): Custom key-value pairs that are passed on to the scorer and web reporter

rag.py
def execute(inputs, parameters):
    # ...
    return {
        "output": output,
        "metadata": {
            "key": value
        }
    }
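A filled-in version of the skeleton might look like the following sketch. It assumes each sample has a hypothetical question input and that the run may pass a hypothetical prefix parameter; call_model is a stand-in for your actual LLM call and is not part of Empirical.

```python
def call_model(prompt):
    # Hypothetical stand-in for a real LLM call; replace with your client code.
    return f"Answer to: {prompt}"


def execute(inputs, parameters):
    # Read a sample input (the "question" key is an assumption for this sketch).
    question = inputs["question"]
    # Read an optional run parameter, falling back to a default if it is absent.
    prefix = parameters.get("prefix", "")

    prompt = f"{prefix}{question}"
    output = call_model(prompt)

    return {
        "output": output,
        # Metadata is passed on to the scorer and web reporter.
        "metadata": {"prompt": prompt},
    }


# Calling the function directly is a quick way to check it before a run.
result = execute({"question": "What is RAG?"}, {"prefix": "Q: "})
print(result["output"])  # Answer to: Q: What is RAG?
```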

In a RAG application, metadata can be used to capture the retrieved context.
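A minimal sketch of this pattern follows. The in-memory DOCUMENTS list, the word-overlap retriever, the generate stub, the top_k parameter, and the metadata key retrieved_context are all hypothetical; a real application would use its own document store, retriever, and LLM client.

```python
import re

# Hypothetical in-memory corpus standing in for a real document store.
DOCUMENTS = [
    "Empirical runs tests against models and applications.",
    "A py-script run calls the execute function in a Python file.",
    "Scorers receive the output and metadata of each sample.",
]


def tokenize(text):
    # Lowercase word set, used for a naive word-overlap score.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(query, top_k):
    # Rank documents by how many words they share with the query.
    query_words = tokenize(query)
    ranked = sorted(
        DOCUMENTS,
        key=lambda doc: len(query_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]


def generate(question, context):
    # Hypothetical stand-in for an LLM call that answers using the context.
    if not context:
        return "No context found."
    return f"Based on {len(context)} document(s): {context[0]}"


def execute(inputs, parameters):
    question = inputs["question"]
    top_k = parameters.get("top_k", 2)
    context = retrieve(question, top_k)
    return {
        "output": generate(question, context),
        # Capture the retrieved context so scorers can check the answer against it.
        "metadata": {"retrieved_context": context},
    }
```

Keeping the retrieved passages in metadata means a scorer can judge the answer against exactly what the model saw, rather than against the whole corpus.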

Example

The RAG example uses this model provider to test a RAG application.