Discussion about this post

Ex-Consultant in Tech

The deeper issue with LLM evals is that they force teams to admit something uncomfortable: most companies never really defined “quality” in the first place. With deterministic software, you could hide behind pass/fail tests. With LLMs, that illusion breaks. Now you have to decide what matters: accuracy, usefulness, tone, risk, latency, cost, refusal behavior, source faithfulness, user trust, business outcome. And those things often conflict.

AI Agents Simplified

Tools do not only change what we can do.

Over time, they also change which cognitive muscles we keep exercising ourselves.
