The rise of mechanistic interpretability

I recently read a Twitter thread about the growing interest of pre-PhD students in mechanistic interpretability. Many researchers further along in their careers speculated that this comes down to how the field is marketed and to its lower barrier to entry in terms of compute. What no one mentioned is how baffling it is to a newcomer that we build systems nobody understands and deploy them into the wild. Building abstract systems that ‘work’ (on a tiny, carefully curated distribution) and only later attempting to decode their inner workings is one thing; arguing that these systems can never be understood, as if by divine decree, is something else entirely. Without evidence, that’s a belief, not a fact. Trying out different hyperparameter combinations, folding your hands in prayer, and making sure not to touch anything that works without understanding why it works does not sound like the ‘science’ pre-PhD students imagine themselves doing, I think.