DeepEval IntegrationAlpha v0.1.2pip install testrelic-deepeval

Memory layer for your LLM evaluations.

TestRelic's DeepEval pytest plugin captures every eval run — metrics, scores, reasons, prompts — into shared application memory. So your AI coding agent doesn't just have a harness to run evals; it has the team's eval history as context. Drop-in for any DeepEval test suite.

An AI agent harness for testing is the runtime, memory, and tool layer that turns a general-purpose LLM into a senior testing engineer for your specific app. TestRelic provides the memory leg. DeepEval provides the eval runner. Cursor, Claude Code, Copilot, and Codex are the LLMs that read from that memory over MCP.

All metricsAnswerRelevancy · Hallucination · Faithfulness · G-Eval · custom
Pytest-nativeauto-attach, no conftest
Alpha v0.1.2on PyPI today

Quickstart

Three lines from pip to first eval upload.

No conftest changes. No SDK init in your test files. Drop the plugin in, log in, and run your existing DeepEval suite.

~/your-project · zsh
pip install testrelic-deepeval
testrelic login
deepeval test run tests/
1pip install testrelic-deepeval

Pulls the pytest plugin from PyPI. No conftest plumbing required — the entry point auto-attaches when both DeepEval and testrelic-deepeval are present.

2testrelic login

Browser-based login flow that writes your API key to ~/.testrelic/credentials. One-time setup; CI uses TESTRELIC_API_KEY instead.

3deepeval test run tests/

Your existing DeepEval suite runs as-is. The TestRelic plugin auto-captures the TestRun and uploads metrics, cases, scores, and reasons to TestRelic.

What the memory looks like

Every eval run, captured and queryable.

Metrics, scores, thresholds, reasons, and evaluation models — uploaded automatically, served back to your AI agent on demand.

deepeval test run · AnswerRelevancy 0.83 · uploaded

$ deepeval test run tests/test_chat_assistant.py

eval run dashboard

248

Cases evaluated

6

Metrics tracked

0.81

Avg score

AnswerRelevancy ≥ 0.7
Faithfulness ≥ 0.7
Hallucination ≤ 0.3

Eval-run memory queryable from Cursor and Claude Code over MCP.

Comparisons

Where TestRelic fits in your eval stack.

Memory layer, agent harness, DeepEval, Confident AI — what does what.

Give your agent the team's eval history.

Drop-in for any DeepEval test suite — every metric, score, and reason lands in shared application memory your AI coding agent reads over MCP.

Disabled by default without an API key — if TESTRELIC_API_KEY is unset the plugin is a no-op, so your DeepEval suite never fails because of a missing TestRelic credential.