Capital-of Is Not a Single SAE Feature. So I Built a Mutation Loop to Find What Is.
SAE features can't isolate relations in Gemma-2-2B. I built a mutation-selection loop that can. The bottleneck was tokenization.
Why sycophancy SAE features have Cohen's d=9.9 but hallucination detection fails. The answer turned out to be deeper than measurement timing.