My research direction
The research I do is basically thinking about the papers I read; I do not have the math background yet to create my own research work. So do not consider this some AI researcher's work :) It is just an informal log of what I have been thinking, reading, and testing.
My "feel the model" research
I have been thinking about how we are building natural language autoencoders (not me, but the Anthropic safety team), and how this might not serve the purpose of model safety very well, because we lose a lot of data when we convert model activations into text features. I propose that we build transducers to convert activations into sensory data for humans to perceive through BCI (brain-computer interface), visualization, or sonification.
My proposals related to this topic have been accepted at Scientific Python US 2026 and EuroSciPy (European Scientific Python 2026) - link
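To make the sonification idea concrete, here is a minimal sketch that turns a handful of activation values into audible tones. It is only an illustration under loud assumptions: the activation values are made up, the function names `activations_to_frequencies` and `render_wav` are hypothetical, and the "one tone per value, higher activation means higher pitch" mapping is just one of many possible choices, not an established method.

```python
import io
import math
import struct
import wave


def activations_to_frequencies(activations, low_hz=220.0, high_hz=880.0):
    """Min-max scale activations into a pitch range (one tone per value)."""
    lo, hi = min(activations), max(activations)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [low_hz + (a - lo) / span * (high_hz - low_hz) for a in activations]


def render_wav(frequencies, tone_seconds=0.2, sample_rate=8000):
    """Render one sine tone per frequency and return a mono 16-bit WAV as bytes."""
    frames = bytearray()
    samples_per_tone = int(tone_seconds * sample_rate)
    for freq in frequencies:
        for n in range(samples_per_tone):
            sample = math.sin(2 * math.pi * freq * n / sample_rate)
            # Scale to half of the 16-bit range to leave some headroom.
            frames += struct.pack("<h", int(sample * 32767 * 0.5))
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(bytes(frames))
    return buffer.getvalue()


# Hypothetical activation values for one layer on one prompt.
toy_activations = [-1.5, 0.0, 0.3, 2.1]
freqs = activations_to_frequencies(toy_activations)
wav_bytes = render_wav(freqs)
```

Writing `wav_bytes` to a `.wav` file gives four short tones, lowest for the smallest activation and highest for the largest. A real transducer would need a much richer mapping, but this shows the kind of information that a text label would throw away.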
We can trace the path of every neuron, as I learnt, and this is why we need that. I have tried to divide the study into 4 parts. I don't know if that is good, but it works for me. For now I am only looking into interpreting an already-trained model as-is. These are the broad ways:


Understand the dependencies
Study conditional activation and attention dependency: what varies with what?
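One simple way to ask "what varies with what" is to record a few units across several prompts and compute pairwise correlations. The sketch below assumes the activations have already been extracted into a table. The numbers are made up and the three columns are hypothetical units. Correlation only captures linear co-variation, so this is a starting point rather than a full dependency analysis.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant unit does not co-vary with anything
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: rows are prompts, columns are three units
# (for example two MLP neurons and one attention weight).
activations = [
    [0.1, 0.2, 0.9],
    [0.5, 1.0, 0.5],
    [0.9, 1.8, 0.1],
    [0.3, 0.6, 0.7],
]

columns = list(zip(*activations))  # one tuple of values per unit
dependency = [[pearson(a, b) for b in columns] for a in columns]
```

In this toy table the second unit is exactly twice the first and the third is one minus the first, so `dependency[0][1]` comes out close to 1 and `dependency[0][2]` close to -1.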


Understand the path in layers
Tracing and connecting the layers: basically understanding what sits between the layers and influences the path of activations.
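A common way to find out which intermediate units influence the path is ablation: zero one unit, rerun the model, and see how much the output moves. The sketch below uses a tiny two-layer network with made-up weights instead of a real trained model, and the helper names (`forward`, `matvec`, `relu`) are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def forward(x, w1, w2, ablate=None):
    """Run a toy 2-layer MLP; optionally zero one hidden unit before layer 2."""
    hidden = relu(matvec(w1, x))
    if ablate is not None:
        hidden[ablate] = 0.0
    return matvec(w2, hidden)[0]


# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output.
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
w2 = [[2.0, 0.5, 0.0]]
x = [1.0, 2.0]

baseline = forward(x, w1, w2)
# Effect of a unit = how much the output drops when that unit is zeroed.
effects = {unit: baseline - forward(x, w1, w2, ablate=unit) for unit in range(3)}
```

Here hidden unit 0 carries most of the output, unit 1 carries less, and unit 2 carries none because the ReLU already silenced it for this input. On a real model the same idea is usually done with forward hooks rather than by rewriting the forward pass.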




Understand the structure
What computation is happening? Look at the residual stream and break it down.
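Breaking the residual stream down can be sketched as follows. In a transformer each component adds its output into the residual stream, so the final vector is a sum of per-component writes, and any linear readout of it splits into one term per component (often called direct logit attribution). The vectors and component names below are made up, and the sketch assumes no final layer norm, which a real model would usually have.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Hypothetical per-component writes into a 3-dimensional residual stream.
contributions = {
    "embedding": [1.0, 0.0, 0.5],
    "attention_layer_0": [0.0, 2.0, -0.5],
    "mlp_layer_0": [0.5, -1.0, 1.0],
}

# The residual stream is the elementwise sum of everything written to it.
residual = [sum(parts) for parts in zip(*contributions.values())]

# Hypothetical unembedding direction for one output token.
unembed_direction = [1.0, 1.0, 2.0]

total_logit = dot(residual, unembed_direction)
per_component = {
    name: dot(vector, unembed_direction) for name, vector in contributions.items()
}
```

The values in `per_component` add up to `total_logit`, so each one can be read as that component's share of the final score for the token.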
Understand the unchanged stuff
What stays unchanged between prompts.
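A rough way to look for what stays unchanged is to record the same units on several prompts and flag the ones whose variance is close to zero. The activation table and the threshold below are made up for illustration, and the right threshold for a real model would need to be chosen by looking at the data.

```python
from statistics import pvariance

# Hypothetical data: rows are prompts, columns are three units.
activations = [
    [0.50, 0.1, 1.2],
    [0.50, 0.9, 1.2],
    [0.51, -0.4, 1.2],
]

# Population variance of each unit across prompts.
variances = [pvariance(column) for column in zip(*activations)]

threshold = 1e-3  # assumed cut-off for "practically unchanged"
stable_units = [index for index, v in enumerate(variances) if v < threshold]
```

With this toy table, units 0 and 2 are flagged as stable and unit 1 is not. A unit that is constant on a few prompts may still vary on others, so the result depends heavily on which prompts are included.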
