The deeper issue with LLM evals is that they force teams to admit something uncomfortable: most companies never really defined “quality” in the first place. With deterministic software, you could hide behind pass/fail tests. With LLMs, that illusion breaks. Now you have to decide what matters: accuracy, usefulness, tone, risk, latency, cost, refusal behavior, source faithfulness, user trust, business outcome. And those things often conflict.
Great point, and well timed. I'd add CI for regression testing when the evals are custom-built by the engineering team. For instance, you can build a gold set, label each item with its expected outcome (say, positive or negative), and run the whole set through CI on every push, verifying that each item still produces the expected result. That way, new changes can't silently break behavior that used to work.
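The gold-set idea above can be sketched as a small script that CI runs on every push. This is a minimal sketch, not a real implementation: `classify` is a hypothetical stand-in for your actual LLM call, and the gold set here is invented for illustration.

```python
# Minimal gold-set regression check for CI (sketch).
# Assumptions: `classify` wraps your real LLM pipeline, and the gold
# set is a list of (input, expected_label) pairs -- both hypothetical.

GOLD_SET = [
    ("Refund my order, it arrived broken.", "negative"),
    ("The new dashboard is fantastic!", "positive"),
]


def classify(text: str) -> str:
    """Stand-in for the real LLM call; swap in your pipeline here."""
    return "negative" if "broken" in text else "positive"


def run_regression(gold_set) -> list:
    """Return a list of failure messages; empty means no regressions."""
    failures = []
    for text, expected in gold_set:
        got = classify(text)
        if got != expected:
            failures.append(f"{text!r}: expected {expected}, got {got}")
    return failures


if __name__ == "__main__":
    failures = run_regression(GOLD_SET)
    # A non-zero exit fails the CI job, blocking the change.
    raise SystemExit("\n".join(failures) if failures else 0)
```

The key design choice is that the script exits non-zero on any mismatch, so the CI job itself becomes the gate: a pull request that regresses a gold-set item simply cannot merge.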
Tools don't just change what we can do.
Over time, they also change which cognitive muscles we keep exercising ourselves.