Circuits

Features are the atomic, meaningful units in neural networks. Circuits are the connections between them. If we understand the meaning of each feature, the next question becomes: can we understand how these features work together to construct higher-level concepts? Is there a meaningful structure through which information flows? Can we actually extract transferable mini-algorithms from this massive, entangled network?

Circuits are local subgraphs of the network: sets of features together with the weights connecting them. Whether they actually exist in networks is still under scrutiny, but significant evidence shows that many circuits recur across models, and that we can extract understandable algorithms from them (the weights actually do mean something!). Using techniques such as activation patching (to be discussed in a future post) and ablation, we can test whether this structure is real and causally meaningful rather than merely descriptive.
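
To make the ablation idea concrete, here is a minimal sketch in PyTorch. Everything in it is hypothetical and purely for illustration: the toy model, its layer sizes, and the choice of which units to knock out. The point is only the mechanic, which is to zero out a candidate component's activations mid-forward-pass and see whether the output changes.

```python
import torch
import torch.nn as nn

# Hypothetical toy model, standing in for whatever network is under study.
class ToyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.layer1 = nn.Linear(16, 32)
        self.layer2 = nn.Linear(32, 10)

    def forward(self, x):
        return self.layer2(torch.relu(self.layer1(x)))

model = ToyModel().eval()
x = torch.randn(1, 16)

# Baseline: run the model untouched.
with torch.no_grad():
    baseline = model(x)

# Zero-ablation: a forward hook that zeroes a slice of layer1's output.
def zero_units(module, inputs, output):
    output = output.clone()
    output[:, :8] = 0.0  # hypothetical set of units under test
    return output

handle = model.layer1.register_forward_hook(zero_units)
with torch.no_grad():
    ablated = model(x)
handle.remove()

# If these units matter for the behavior, the two outputs should diverge.
print((baseline - ablated).abs().max())
```

In practice, zeroing is only one choice; replacing activations with their dataset mean is a common refinement, since zero may itself be an out-of-distribution value for the network.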

Some argue that circuits are a human-imposed abstraction, and that networks instead rely on distributed representations in which every neuron contributes a little to many behaviors, with no cleanly separable structure. Maybe, maybe not.