A small AI lab in public. Notes, tools, and experiments.

Model confidence often collapses around instruction edges. Token logprobs expose these brittle spans. A heatmap shows which phrases flip outcomes when wording shifts slightly.

Quick recipe

  1. Request token logprobs for a batch of realistic prompts.
  2. Highlight tokens around instructions and constraints.
  3. Compare spans across variants to see where confidence dips.