My research direction

The research I do is basically thinking about the papers I read; I do not have the math background yet to create original research of my own. So do not consider this an AI researcher's work :) It is just an informal log of what I have been thinking about, reading, and testing.

My "feel the model" research
I have been thinking about how we are building natural language autoencoders (not me, but the Anthropic safety team), and how this might not serve the purpose of model safety very well, because we lose a lot of information when we convert model activations into text features. I propose that we build transducers that convert activations into sensory data for humans to perceive, through a BCI (brain-computer interface), visualization, or sonification.
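To make the transducer idea a bit more concrete, here is a very naive sonification sketch (Python, standard library only). It assumes you already have one activation vector as a plain list of floats (the `acts` values below are made up), maps each value linearly onto a pitch between 220 Hz and 880 Hz, and plays the neurons one after another as short sine tones. All function names are hypothetical helpers, not any library's API, and a real transducer would need much more careful design than a linear pitch map.

```python
import math
import os
import struct
import tempfile
import wave

SAMPLE_RATE = 16000  # samples per second (an arbitrary choice)


def activations_to_frequencies(acts, f_min=220.0, f_max=880.0):
    """Hypothetical helper: map each activation linearly onto [f_min, f_max] Hz."""
    lo, hi = min(acts), max(acts)
    span = (hi - lo) or 1.0  # avoid dividing by zero if all activations are equal
    return [f_min + (a - lo) / span * (f_max - f_min) for a in acts]


def render_tones(freqs, tone_s=0.2, sample_rate=SAMPLE_RATE):
    """Hypothetical helper: one short sine tone per neuron, played in sequence."""
    n = int(tone_s * sample_rate)
    samples = []
    for f in freqs:
        samples.extend(0.5 * math.sin(2 * math.pi * f * t / sample_rate) for t in range(n))
    return samples  # floats in [-0.5, 0.5]


def write_wav(path, samples, sample_rate=SAMPLE_RATE):
    """Save mono 16-bit PCM audio using the standard-library wave module."""
    frames = b"".join(struct.pack("<h", int(s * 32767)) for s in samples)
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes = 16-bit samples
        w.setframerate(sample_rate)
        w.writeframes(frames)


# Made-up activation vector for one token at one layer.
acts = [0.1, -1.3, 2.4, 0.0, 0.7]
freqs = activations_to_frequencies(acts)
samples = render_tones(freqs)
out_path = os.path.join(tempfile.gettempdir(), "activations.wav")
write_wav(out_path, samples)
```

The linear pitch map is only the simplest possible choice; it throws away the sign and absolute scale of the activations, which is exactly the kind of information loss the transducer idea is supposed to avoid, so it is a starting point rather than a proposal.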

My proposal on this topic has been accepted at Scientific Python US 2026 and EuroSciPy (European Scientific Python 2026) - link


As I learnt, we can trace the path of every neuron; this is why we need that. I have tried to divide the study into 4 parts. I don't know if that is a good split, but it works for me. For now, I am only looking at interpreting an already-trained model as-is. These are the broad directions:

Understand the dependencies

Study conditional activations and attention dependencies: what varies with what?
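One simple way to start asking "what varies with what" is to collect activations over many prompts and look at pairwise statistics. This sketch assumes a hypothetical matrix `acts[p][k]` (activation of neuron `k` on prompt `p`, made-up numbers) and computes two things: the Pearson correlation between neurons, and a conditional firing rate P(neuron j > 0 | neuron i > 0). It only covers neuron activations; the same idea could be applied to attention weights, but that is not shown here. Function names are my own.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def correlated_pairs(acts, min_abs_r=0.8):
    """List (i, j, r) for neuron pairs whose activations co-vary strongly across prompts."""
    columns = list(zip(*acts))  # columns[k] = neuron k's activation on every prompt
    pairs = []
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            r = pearson(columns[i], columns[j])
            if abs(r) >= min_abs_r:
                pairs.append((i, j, r))
    return pairs


def conditional_firing(acts, i, j, threshold=0.0):
    """Estimate P(neuron j > threshold | neuron i > threshold) over the prompts."""
    rows_i = [row for row in acts if row[i] > threshold]
    if not rows_i:
        return float("nan")  # neuron i never fired, so the estimate is undefined
    return sum(row[j] > threshold for row in rows_i) / len(rows_i)


# Made-up activations: 4 prompts (rows) x 3 neurons (columns).
acts = [
    [1.0, 2.0, -0.5],
    [0.5, 1.0, 0.3],
    [-1.0, -2.0, 0.1],
    [-0.5, -1.0, -0.2],
]
print(correlated_pairs(acts))         # only neurons 0 and 1 move together here
print(conditional_firing(acts, 0, 2))  # → 0.5
```

Correlation only catches linear co-variation, and the conditional rate depends on the chosen threshold, so both are crude first filters for spotting candidate dependencies.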

Understand the path through the layers

Tracing and connecting the layers: basically, understanding what sits between the layers and influences the path of the activations.
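A toy version of tracing a path through layers: run a tiny ReLU MLP with hand-picked, hypothetical weights, keep the activations after every layer, and then walk backwards from an output unit, picking at each layer the unit whose contribution (weight times activation) to the current unit is largest. This greedy single path is my own simplification for illustration; real models spread computation over many units, so one path is at best a rough summary.

```python
def relu(v):
    return [max(0.0, x) for x in v]


def matvec(W, x):
    """Multiply a weight matrix stored as W[out][in] by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def forward_with_trace(layers, x):
    """Run a toy ReLU MLP and keep the activation vector at every layer (input included)."""
    trace = [list(x)]
    for W in layers:
        x = relu(matvec(W, x))
        trace.append(x)
    return trace


def greedy_path(layers, trace, out_unit):
    """Walk back from out_unit; at each layer keep the unit with the largest
    contribution W[unit][k] * activation[k]. Returns indices from input to output."""
    path = [out_unit]
    unit = out_unit
    for depth in range(len(layers) - 1, -1, -1):
        W, prev = layers[depth], trace[depth]
        contribs = [W[unit][k] * prev[k] for k in range(len(prev))]
        unit = max(range(len(prev)), key=contribs.__getitem__)
        path.append(unit)
    return path[::-1]


# Hypothetical 2-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    [[1.0, 0.0], [0.0, 1.0]],  # layer 1 (identity, to keep the example readable)
    [[2.0, 3.0]],              # layer 2
]
trace = forward_with_trace(layers, [0.2, 1.0])
path = greedy_path(layers, trace, out_unit=0)
print(path)  # → [1, 1, 0]: input 1 -> hidden 1 -> output 0
```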

Understand the structure

What computation is happening? Look at the residual stream and break it.
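In a standard pre-norm transformer, each attention or MLP block reads the residual stream and adds its output back into it, so the final stream is the input embedding plus the sum of every block's write. This sketch shows that decomposition on a toy stack of two hypothetical blocks (plain Python functions standing in for real layers), and then "breaks" it by zero-ablating one block to see how the final stream changes. Layer norm and everything else a real model does are left out.

```python
def run_residual(x0, blocks, ablate=None):
    """Toy residual stack: each block reads the stream and adds its output to it.
    Returns the final stream and every component's write (embedding first).
    If ablate is a block index, that block's write is zeroed out."""
    stream = list(x0)
    writes = [list(x0)]  # component 0: the input "embedding"
    for idx, block in enumerate(blocks):
        out = block(stream)
        if idx == ablate:
            out = [0.0] * len(out)  # zero-ablation: the block writes nothing
        writes.append(out)
        stream = [s + o for s, o in zip(stream, out)]
    return stream, writes


# Two hypothetical blocks standing in for attention / MLP layers.
blocks = [
    lambda s: [0.5 * s[1], -s[0]],           # mixes the two dimensions
    lambda s: [max(0.0, s[0]), 0.1 * s[1]],  # a ReLU-like update
]
x0 = [1.0, 2.0]

final, writes = run_residual(x0, blocks)
# The final stream is the sum of all writes: embedding + block 0 + block 1.
recomposed = [sum(col) for col in zip(*writes)]
# "Break" it: remove block 0 and see how the final stream moves.
ablated, _ = run_residual(x0, blocks, ablate=0)
print(final, recomposed, ablated)
```

Ablating block 0 also changes what block 1 reads, so the final stream moves by more than block 0's direct write: subtracting the write alone would give [3.0, 2.1], while the ablated run gives [2.0, 2.2].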

Unchanged stuff

What stays unchanged between prompts.
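A first pass at "what stays unchanged" could be: run the same model on several different prompts and flag neurons whose activation variance across prompts is below a small tolerance. This sketch assumes a hypothetical matrix `acts[p][k]` (activation of neuron `k` on prompt `p`, made-up numbers) and an arbitrary `tol`; it only uses `statistics.pvariance` from the standard library, and `invariant_neurons` is my own name.

```python
from statistics import pvariance


def invariant_neurons(acts, tol=1e-3):
    """Return indices of neurons whose activation barely changes across prompts.
    acts[p][k] is neuron k's activation on prompt p."""
    n = len(acts[0])
    return [k for k in range(n) if pvariance([row[k] for row in acts]) <= tol]


# Made-up activations of 4 neurons on 3 different prompts.
acts = [
    [0.90, 0.10, -2.0, 0.50],
    [0.90, 0.80, -2.0, 0.49],
    [0.90, -0.40, -2.0, 0.51],
]
print(invariant_neurons(acts))  # → [0, 2, 3]
```

A neuron that is always zero (a dead neuron) would also show up as invariant, so it is worth checking the mean activation too before reading anything into the result.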