my research direction
The research I do is basically thinking about the papers I read; I do not yet have the math background to create my own research work. So do not take this as an AI researcher's work :) .. it is just an informal log of what I have been thinking, reading, and testing.
I think (and have read) that XAI should answer these questions:
“Why did the AI output this result for a given input?” - this is what users and testers should ask.
“Which input features or factors contributed the most to this decision?” - which parts of the input mattered most to the model.
“Under what conditions is the AI reliable (or unreliable)?” - this is a big one, as it will help us detect misalignment.
“What are the limitations, biases, or risks of using this model in production?” - this is more for the people who deploy and audit the model.
“How can we debug, audit, or improve the model behavior (especially for fairness / safety)?”
We can trace the path of every neuron, as I learnt - this is why we need all of this. I have tried to divide the study into 4 parts - I don't know if that is the best split, but it works for me. For now I am only looking into interpreting an already-trained model as-is. These are the broad ways:


Understand the dependencies
Study conditional activation and attention dependency - what varies with what?
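A minimal sketch of the "what varies with what?" question: if we record one activation vector per prompt, co-varying units show up as high correlations. The numbers below are synthetic stand-ins for real model recordings, and the planted dependency (neuron 3 following neuron 0) is my own made-up example.

```python
import numpy as np

# Assume we recorded activations for n_prompts prompts x n_neurons neurons.
# Random numbers stand in for a real model's recordings.
rng = np.random.default_rng(0)
n_prompts, n_neurons = 50, 6
acts = rng.normal(size=(n_prompts, n_neurons))
# Plant a dependency so it is visible: neuron 3 mostly follows neuron 0.
acts[:, 3] = 0.9 * acts[:, 0] + 0.1 * rng.normal(size=n_prompts)

# Pearson correlation across prompts: high |r| means the two units co-vary.
corr = np.corrcoef(acts, rowvar=False)

# List the strongest off-diagonal dependencies.
pairs = [(i, j, corr[i, j]) for i in range(n_neurons)
         for j in range(i + 1, n_neurons)]
pairs.sort(key=lambda p: -abs(p[2]))
for i, j, r in pairs[:3]:
    print(f"neuron {i} <-> neuron {j}: r = {r:+.2f}")
```

On a real model the `acts` matrix would come from hooking a layer and stacking one recorded vector per prompt; everything after that line stays the same.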


Understand the path in layers
Tracing and connecting the layers - basically understanding what sits between the input and the output, and what influences the path of activations.
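A toy sketch of tracing that path: push one input through a stack of layers and keep every intermediate activation, so each step of the path can be inspected. The layers here are made-up random linear maps with a ReLU, not a real model.

```python
import numpy as np

# Three stand-in "layers": random weight matrices (hypothetical, for
# illustration only; a real trace would hook a trained model's layers).
rng = np.random.default_rng(1)
layers = [rng.normal(scale=0.5, size=(4, 4)) for _ in range(3)]

def trace(x):
    """Return the list of activations: input, then one entry per layer."""
    path = [x]
    for W in layers:
        x = np.maximum(W @ x, 0.0)  # linear map + ReLU
        path.append(x)
    return path

path = trace(np.ones(4))
for depth, act in enumerate(path):
    print(f"layer {depth}: {np.round(act, 2)}")
```

In a real framework the same idea is usually done with forward hooks that append each layer's output to a list instead of an explicit loop.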




Understand the Structure
What computation is happening - look at the residual stream and break it apart.
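A minimal sketch of "break the residual stream": in a transformer block the stream is updated additively (stream = x + attention output + MLP output), so the final state decomposes exactly into per-component contributions. The attention and MLP below are stand-in random linear maps, not a real block.

```python
import numpy as np

# Hypothetical stand-ins for a block's attention and MLP (random maps).
rng = np.random.default_rng(2)
d = 8
W_attn = rng.normal(scale=0.1, size=(d, d))
W_mlp = rng.normal(scale=0.1, size=(d, d))

x = rng.normal(size=d)              # incoming residual stream
attn_out = W_attn @ x               # attention writes into the stream
mlp_out = W_mlp @ (x + attn_out)    # MLP reads the updated stream
stream = x + attn_out + mlp_out     # final residual stream

# The decomposition is exact: the contributions sum back to the stream.
contributions = {"input": x, "attn": attn_out, "mlp": mlp_out}
total = sum(contributions.values())
print(np.allclose(total, stream))   # True
# Rough size of each component's write, by norm:
for name, c in contributions.items():
    print(f"{name}: |c| = {np.linalg.norm(c):.2f}")
```

The useful part is that this additivity holds in real transformers too, which is what makes attributing the output to individual components possible.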
Understand the unchanged stuff
What stays unchanged between prompts.
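A toy sketch of finding what stays unchanged: record the same activation vector for many prompts and look at per-dimension variance; near-zero variance marks prompt-invariant structure. The data and the two planted fixed dimensions are synthetic, standing in for real recordings.

```python
import numpy as np

# Synthetic recordings: n_prompts prompts x d activation dimensions.
rng = np.random.default_rng(3)
n_prompts, d = 100, 6
acts = rng.normal(size=(n_prompts, d))
acts[:, 2] = 1.0          # dimension 2 is fixed regardless of prompt
acts[:, 5] = -0.5         # so is dimension 5

# Variance across prompts; (near-)zero variance = prompt-invariant.
variance = acts.var(axis=0)
invariant = np.flatnonzero(variance < 1e-6)
print("prompt-invariant dimensions:", invariant)  # [2 5]
```

With real activations the threshold would need to be relative to the typical variance rather than an absolute 1e-6, but the shape of the check is the same.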
