Nano Banana Pro
Agent skill for nano-banana-pro
# Evaluate Agents
Your agent is now live, helping students and scheduling meetings with professors. But here's the thing - how do you know it's actually working correctly?
Just like with testing in the earlier chapters, the same question can get a different answer every time. That's usually fine, but it makes evaluation a bit tricky.
## Three layers of agent testing
When evaluating agents, we will focus on three areas:
1. **Unit tests for individual tools** - Test each tool in isolation. Does the calendar API actually create events? Does the search return relevant results?
2. **Text-to-JSON validation** - Can the LLM format tool calls correctly, and does it choose the right tools? (Spoiler: malformed JSON is where most agents break)
3. **End-to-end evaluation** - Does the complete workflow help users?
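To make the second layer a bit more concrete, here is a minimal sketch of what validating a tool call could look like. It assumes the LLM emits a JSON object with `name` and `arguments` keys; that shape, the `TOOL_SCHEMAS` table, and the argument names are illustrative assumptions, not the canopy backend's actual definitions.

```python
import json

# Hypothetical schema: tool name -> required argument names.
# Illustrative only; the real tools may take different arguments.
TOOL_SCHEMAS = {
    "search_knowledge_base": {"query"},
    "find_professors_by_expertise": {"expertise"},
}


def validate_tool_call(raw: str) -> list:
    """Return a list of problems found in a raw tool call; empty means valid."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        # Malformed JSON: the most common failure mode mentioned above.
        return [f"malformed JSON: {exc.msg}"]

    if not isinstance(call, dict):
        return ["tool call must be a JSON object"]

    name = call.get("name")
    if name not in TOOL_SCHEMAS:
        return [f"unknown tool: {name!r}"]

    args = call.get("arguments")
    if not isinstance(args, dict):
        return ["'arguments' must be a JSON object"]

    # Report every required argument the model left out.
    missing = TOOL_SCHEMAS[name] - set(args)
    return [f"missing argument: {arg}" for arg in sorted(missing)]
```

A check like this catches both failure modes from the list: output that isn't valid JSON at all, and valid JSON that names a tool or argument set the agent doesn't have.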
We've already set up an eval framework earlier, so let's put it to work testing our agent!
## 1. Unit Testing Agent Tools
Before we test the whole agent, let's make sure each individual tool works correctly. Think of it like testing the ingredients before baking the cake.
The canopy backend already has unit tests set up for the student assistant tools. Let's run them!
1. We first need to install some dependencies:
```bash
cd /opt/app-root/src/backend
pip install -r app/requirements.txt
pip install -r tests/requirements-test.txt
```
2. And then we can run the unit tests:
```bash
pytest tests/test_tools.py -v
```
You should see output like this:
```
tests/test_tools.py::test_search_knowledge_base PASSED [ 25%]
tests/test_tools.py::test_find_professors_by_expertise PASSED [ 50%]
tests/test_tools.py::test_mcp_calendar_list_tools PASSED [ 75%]
tests/test_tools.py::test_mcp_calendar_list_events PASSED [100%]
======================== 4 passed in 1.22s ========================
```
**What did we just test?**
- **search_knowledge_base** - Verified the tool can retrieve relevant content from the vector store
- **find_professors_by_expertise** - Checked that professor matching works correctly
- **MCP calendar tools** - Confirmed the MCP server is reachable and exposes the right tools
**Pro tip:** Want to see what the tools are returning? Run with the `-s` flag:
```bash
pytest tests/test_tools.py -v -s
```
This shows the actual search results and helps you understand what data your tools are working with.
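If you haven't opened the test file, a tool unit test of this kind might look roughly like the sketch below. The `find_professors_by_expertise` function here is a hypothetical in-memory stand-in written only for illustration (the real tool and its tests live in the canopy backend and may work quite differently), and the second professor entry is made-up placeholder data.

```python
# Hypothetical stand-in data; only the first entry mirrors the tutorial's example.
PROFESSORS = [
    {"name": "Dr. Sarah Chen", "expertise": ["Machine Learning", "Neural Networks"]},
    {"name": "Dr. Placeholder", "expertise": ["Forest Ecology"]},  # made up
]


def find_professors_by_expertise(topic: str) -> list:
    """Return professors whose expertise mentions the topic (case-insensitive)."""
    needle = topic.lower()
    return [
        prof for prof in PROFESSORS
        if any(needle in area.lower() for area in prof["expertise"])
    ]


def test_find_professors_by_expertise_matches_topic():
    # The tool should find the matching professor regardless of casing.
    results = find_professors_by_expertise("machine learning")
    assert [prof["name"] for prof in results] == ["Dr. Sarah Chen"]


def test_find_professors_by_expertise_handles_no_match():
    # An unknown topic should give an empty list, not an error.
    assert find_professors_by_expertise("underwater basket weaving") == []
```

The point of tests like these is that they are deterministic: no LLM is involved, so a failure points at the tool itself and not at the model's behaviour.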
## 2. Add Unit Tests to CI/CD Pipeline
Now that we've verified the unit tests work locally, let's automate them in our CI/CD pipeline! This ensures every code change is tested before being promoted to production.
### Enable Unit Tests in the Tekton Pipeline
The evaluation pipeline can run unit tests alongside the other evaluations. Let's enable this step:
1. Go to `genaiops-gitops/toolings/evaluation-pipeline/config.yaml` in your workbench and update the config file to enable a unit test step:
```yaml
chart_path: charts/canopy-evals-pipeline

USER_NAME: <USER_NAME>
CLUSTER_DOMAIN: <CLUSTER_DOMAIN>
kfp:
  llsUrl: http://llama-stack-service.<USER_NAME>-test.svc.cluster.local:8321
  backendUrl: http://canopy-backend.<USER_NAME>-test.svc.cluster.local:8000
testing: # 👈 Add this
  enableUnitTests: true # 👈 Add this
```
2. Push it to git:
```bash
cd /opt/app-root/src/genaiops-gitops
git pull
git add .
git commit -m "1️⃣ Enabled unit tests 1️⃣"
git push
```
3. To make sure it was added, go to OpenShift Console -> Pipelines -> canopy-evals-pipeline and see that `tool-unit-tests` is in there.

We will see it in action soon, but first, let's make sure that our end-to-end tests work for our agent as well.
### Adding Agent E2E Tests
1. Go to your workbench and navigate to the `evals` repository:
```bash
cd /opt/app-root/src/evals
```
2. Create a new folder for the student assistant tests:
```bash
mkdir student-assistant
```
3. Create the test configuration file. Open a new file `student-assistant/student_assistant_tests.yaml` and paste this:
```yaml
name: student_assistant_tests
description: End-to-end tests for the student assistant agent with tool choice validation
model: llama32
endpoint: /student-assistant
scoring_params:
  "llm-as-judge::base":
    "judge_model": llama32
    "prompt_template": e2e_judge_prompt.txt
    "type": "llm_as_judge"
    "judge_score_regexes": ["Answer: (A|B|C|D|E)"]
  "basic::tool_choice": null
tests:
  - prompt: "What is a forest canopy?"
    expected_result: "A forest canopy is the upper layer of a forest, formed by the crowns of trees. It's an important ecosystem component that provides habitat for many species and plays a crucial role in photosynthesis and the forest's overall health."
    expected_tools: ["search_knowledge_base"]
  - prompt: "Who can help me with machine learning?"
    expected_result: "Dr. Sarah Chen from the Computer Science department can help you with machine learning. She specializes in Machine Learning, Neural Networks, AI Ethics, and Agentic Workflows. You can reach her at [email protected]."
    expected_tools: ["find_professors_by_expertise"]
```
4. Notice the `expected_tools` field in the tests - this tells the evaluator which tools the agent should call. The eval pipeline will check:
- Did the agent call `search_knowledge_base` for the canopy question?
- Did it call `find_professors_by_expertise` for the professor question?
5. Commit and push your changes:
```bash
cd /opt/app-root/src/evals/student-assistant
git add .
git commit -m "🤖 Agent E2E tests added 🤖"
git push
```
6. The eval pipeline should trigger automatically. Go to **OpenShift Pipelines** to watch it run!
After it has completed, you can see the evaluation results in MinIO or through the prompt tracker 🎉
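To get a feel for what the two scoring functions in the test file are doing, here is a rough sketch. The regex is the one from `judge_score_regexes`; everything else, including the function names and the pass rule for tool choice (every expected tool must appear among the called tools), is an assumption made for illustration and not the eval pipeline's actual implementation.

```python
import re
from typing import Optional

# Pattern taken from judge_score_regexes in the test configuration.
JUDGE_SCORE_REGEX = re.compile(r"Answer: (A|B|C|D|E)")


def extract_judge_grade(judge_output: str) -> Optional[str]:
    """Return the letter grade from the judge's reply, or None if it is absent."""
    match = JUDGE_SCORE_REGEX.search(judge_output)
    return match.group(1) if match else None


def tool_choice_passed(expected_tools: list, called_tools: list) -> bool:
    """Assumed rule: pass when every expected tool was called at least once."""
    return set(expected_tools) <= set(called_tools)
```

For example, a judge reply ending in `Answer: B` would yield the grade `B`, and a run where the agent only called `search_knowledge_base` would fail the tool-choice check for the professor question under this assumed rule.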