Method
- Hold task, data, and model constant.
- Change one prompt detail.
- Run more than 10 real cases under each variant.
- Tag errors; locate brittle spans.
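The steps above can be sketched as a small harness. This is a minimal illustration, not the lab's actual tooling: `Case`, `Result`, `compare_prompts`, and `toy_model` are hypothetical names, and `toy_model` is a deliberately brittle stand-in for a real model call. The three cases are toy data to keep the example short; a real run would use more than 10 real cases.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# Hypothetical model interface: (prompt, input text) -> decision string.
Model = Callable[[str, str], str]


@dataclass(frozen=True)
class Case:
    text: str      # input held constant across both prompts
    expected: str  # reference decision used to tag errors


@dataclass(frozen=True)
class Result:
    case: Case
    out_a: str
    out_b: str

    @property
    def flipped(self) -> bool:
        # A flip means the one-detail prompt change altered the decision.
        return self.out_a != self.out_b

    @property
    def tag(self) -> str:
        a_ok = self.out_a == self.case.expected
        b_ok = self.out_b == self.case.expected
        if a_ok and b_ok:
            return "ok"
        if a_ok:
            return "b_wrong"
        if b_ok:
            return "a_wrong"
        return "both_wrong"


def compare_prompts(
    model: Model, prompt_a: str, prompt_b: str, cases: Sequence[Case]
) -> List[Result]:
    """Run the same model on the same cases; only the prompt differs."""
    return [
        Result(case, model(prompt_a, case.text), model(prompt_b, case.text))
        for case in cases
    ]


def toy_model(prompt: str, text: str) -> str:
    # Assumption: a stand-in for a real model, built so that the single
    # word "only" in the prompt changes how it decides.
    if "only" in prompt.split():
        return "approve" if "damaged" in text else "deny"
    return "approve" if "refund" in text else "deny"


# The two prompts differ by exactly one detail: the word "only".
PROMPT_A = "Approve refund requests for damaged items."
PROMPT_B = "Approve refund requests only for damaged items."

CASES = [
    Case("refund please, item arrived damaged", "approve"),
    Case("refund please, I changed my mind", "deny"),
    Case("where is my order?", "deny"),
]

if __name__ == "__main__":
    for result in compare_prompts(toy_model, PROMPT_A, PROMPT_B, CASES):
        if result.flipped:
            print(result.tag, "|", result.case.text)
```

In this toy run only the second case flips, and it is tagged `a_wrong`, which points at the word "only" as the span where the decision turns. Keeping the model as a plain callable makes it easy to swap in a real client while leaving the cases and tagging untouched.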
Compare prompts head to head instead of guessing. Small, controlled diffs expose brittle spans where wording flips decisions.