Checkpoint: Evaluation and Observability
- I can define task success metrics.
- I can unit test individual tools.
- I can run integration tests for an agent flow.
- I can inspect traces and logs for a failed run.
- I can measure quality, latency, cost, and failure rate.
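
As a rough illustration of the last item, the sketch below aggregates quality (as task success rate), latency, cost, and failure rate over a batch of runs. The `RunRecord` type, its field names, and the `summarize` helper are hypothetical and not taken from any particular framework; it also assumes that "failure" means a run that crashed or timed out, which is tracked separately from a run that finished but did not meet the success criterion.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class RunRecord:
    """One agent run. All field names are hypothetical."""
    succeeded: bool    # run met the task success criterion
    errored: bool      # run crashed or timed out
    latency_s: float   # wall-clock duration in seconds
    cost_usd: float    # total spend for the run


def summarize(runs: list[RunRecord]) -> dict[str, float]:
    """Aggregate per-run records into checkpoint-level metrics."""
    if not runs:
        raise ValueError("need at least one run to summarize")
    n = len(runs)
    return {
        # bool values sum as 0/1, so these are simple proportions
        "success_rate": sum(r.succeeded for r in runs) / n,
        "failure_rate": sum(r.errored for r in runs) / n,
        "mean_latency_s": mean(r.latency_s for r in runs),
        "max_latency_s": max(r.latency_s for r in runs),
        "mean_cost_usd": mean(r.cost_usd for r in runs),
    }
```

Calling `summarize` on the records from an evaluation batch returns one dictionary that can be logged per build or compared across agent versions. Keeping `succeeded` and `errored` as separate fields is one possible design choice: it lets a run that completed with a wrong answer count against quality without inflating the failure rate.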