Felix Koole

Writing

Essays on AI security, adversarial machine learning, prompt injection, and the systems we build to keep AI honest.

5-min Paper: Should you use floats or ints as confidence numbers?

When you ask an LLM to rate its confidence, does the format matter? We tested four SOTA models with decimal (0.00–1.00) and integer (0–100) confidence scores across true, dubious, and nonsense labels. Decimal format produced more conservative estimates on ambiguous inputs and dramatically better cross-model agreement. Integer format caused surprising failures — GPT-5.2 alternated between 0 and 100 on obvious nonsense. The culprit? Tokenization. The 0. prefix appears to anchor models into a calibrated probability-reasoning mode that integers simply don't activate.
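The two elicitation formats compared above can be sketched as a pair of helpers — one that builds the prompt in either style and one that normalizes replies to [0, 1]. The prompt wording and function names here are illustrative assumptions, not the code from the post.

```python
def confidence_prompt(label: str, fmt: str) -> str:
    """Build a confidence-elicitation prompt in decimal (0.00-1.00)
    or integer (0-100) format. Wording is a hypothetical example."""
    if fmt == "decimal":
        scale = "a decimal between 0.00 and 1.00"
    elif fmt == "integer":
        scale = "an integer between 0 and 100"
    else:
        raise ValueError(f"unknown format: {fmt}")
    return (f"Rate your confidence that the label '{label}' is correct "
            f"as {scale}. Reply with the number only.")

def parse_confidence(reply: str, fmt: str) -> float:
    """Normalize a model's reply to [0, 1] regardless of format,
    so the two conditions can be compared on one scale."""
    value = float(reply.strip())
    if fmt == "integer":
        value /= 100.0
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"confidence out of range: {reply!r}")
    return value
```

Normalizing both formats to one scale is what makes cross-model agreement measurable in the first place — without it, a "0.85" and an "85" look like wildly different answers.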

5-minute paper
VectorSmell: The Fast, Deterministic Alternative to LLM-Based Content Moderation

If you're building chatbots or content moderation systems, you've probably wrestled with a frustrating tradeoff: regex is blazingly fast but brittle, while LLMs are flexible but painfully slow. What if there were a middle ground that gave you the best of both worlds? Enter VectorSmell — a hybrid approach that sits between probabilistic and deterministic text classification, delivering sub-50ms response times while maintaining a flexibility that regex can only dream of.
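One way a regex-plus-vector hybrid like this could look is a deterministic regex fast path backed by similarity to known-bad examples. This is a minimal sketch of the idea, not VectorSmell's actual API: the class name, thresholds, and the toy bag-of-words "embedding" (standing in for a real trained encoder) are all assumptions.

```python
import re
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a real system would use a trained encoder.
    return Counter(re.findall(r"[a-z']+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

class HybridModerator:
    """Hypothetical hybrid: regex handles exact patterns deterministically,
    vector similarity catches paraphrases the patterns miss."""

    def __init__(self, patterns, flagged_examples, threshold=0.6):
        self.patterns = [re.compile(p, re.IGNORECASE) for p in patterns]
        self.flagged = [embed(e) for e in flagged_examples]
        self.threshold = threshold

    def flag(self, text: str) -> bool:
        # Deterministic fast path first...
        if any(p.search(text) for p in self.patterns):
            return True
        # ...then similarity to known-flagged examples.
        v = embed(text)
        return any(cosine(v, f) >= self.threshold for f in self.flagged)
```

Both paths are pure in-memory computation — no model call in the loop — which is what keeps latency in the tens of milliseconds rather than the seconds an LLM round trip costs.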

content moderation