my proposed techniques for safe AI
Custom reconnaissance techniques to enhance AI model safety and reliability
in progress
I am building models whose sole purpose is to draw circuit heat maps and flag issues in the model under monitoring
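A minimal sketch of the kind of heat map and flagging I have in mind (not the actual monitoring model; the layer/unit shapes, baseline comparison and drift threshold are illustrative assumptions):

    # Toy sketch: layer-by-unit heat map of mean |activation|, flagging units
    # that drift far from a baseline run. Shapes and thresholds are assumptions.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)

    # Stand-ins for captured activations: activations[layer] has shape (tokens, units).
    n_layers, n_tokens, n_units = 6, 32, 16
    baseline = [rng.normal(size=(n_tokens, n_units)) for _ in range(n_layers)]
    current = [a + rng.normal(scale=0.1, size=a.shape) for a in baseline]
    current[3][:, 5] += 3.0  # inject a drifting unit so something gets flagged

    def heat_map(acts):
        """Collapse the token dimension into a (layers, units) mean |activation| grid."""
        return np.stack([np.abs(a).mean(axis=0) for a in acts])

    base_map, cur_map = heat_map(baseline), heat_map(current)
    drift = np.abs(cur_map - base_map)

    # Flag (layer, unit) cells whose drift exceeds an (assumed) mean + 3*std cutoff.
    threshold = drift.mean() + 3 * drift.std()
    for layer, unit in np.argwhere(drift > threshold):
        print(f"flag: layer {layer}, unit {unit}, drift {drift[layer, unit]:.2f}")

    plt.imshow(cur_map, aspect="auto", cmap="magma")
    plt.xlabel("unit"); plt.ylabel("layer"); plt.title("circuit heat map (toy)")
    plt.colorbar(label="mean |activation|")
    plt.savefig("circuit_heat_map.png")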
in progress
I am building an array of visualization tools for analyzing activation paths. This requires a reduced form of data extraction, and I am working on a tool for it. => https://github.com/modelrecon/mr-recon-tracer
This tool will extract data in my proposed Activity Cube format
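For illustration, here is a minimal sketch of reduced activation extraction with forward hooks, assuming an Activity Cube is roughly a layer x token x feature tensor; the real mr-recon-tracer output format may differ:

    # Sketch: capture reduced per-layer activations into a (layers, tokens, features)
    # tensor. The "Activity Cube" shape used here is an assumption for illustration.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    # Tiny stand-in model; in practice the hooks would attach to a transformer's blocks.
    model = nn.Sequential(
        nn.Linear(8, 8), nn.ReLU(),
        nn.Linear(8, 8), nn.ReLU(),
        nn.Linear(8, 8),
    )

    captured = []  # one (tokens, features) activation matrix per hooked layer

    def hook(_module, _inputs, output):
        # Keep a reduced copy (detached, float16) rather than the full graph.
        captured.append(output.detach().to(torch.float16))

    handles = [m.register_forward_hook(hook) for m in model if isinstance(m, nn.Linear)]

    tokens = torch.randn(16, 8)          # toy "sequence" of 16 token vectors
    with torch.no_grad():
        model(tokens)

    for h in handles:
        h.remove()

    activity_cube = torch.stack(captured)   # shape: (layers, tokens, features)
    print(activity_cube.shape)               # torch.Size([3, 16, 8])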
some existing techniques
Pairwise Shapley Values
An improvement over traditional feature attribution with Shapley values: this method explains predictions by comparing pairs of similar data instances, yielding more intuitive, human-relatable explanations while reducing computational overhead. Read more
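A hedged sketch of the pairwise idea, attributing the prediction gap between an instance and a similar reference instance to individual features, Shapley-style (a simplified illustration, not the paper's exact estimator):

    # Exact Shapley values for f(x) - f(x_ref), toggling features between the pair.
    from itertools import permutations
    import numpy as np

    def pairwise_shapley(f, x, x_ref):
        """Attribute the prediction gap between x and a similar reference x_ref."""
        n = len(x)
        phi = np.zeros(n)
        perms = list(permutations(range(n)))
        for order in perms:
            z = np.array(x_ref, dtype=float)   # start from the reference instance
            prev = f(z)
            for i in order:
                z[i] = x[i]                    # switch feature i to the explained instance
                cur = f(z)
                phi[i] += cur - prev
                prev = cur
        return phi / len(perms)

    # Toy model and a pair of similar instances.
    f = lambda v: 2.0 * v[0] + v[1] * v[2]
    x, x_ref = np.array([1.0, 2.0, 3.0]), np.array([0.5, 2.0, 1.0])

    phi = pairwise_shapley(f, x, x_ref)
    print(phi, "sum:", phi.sum(), "gap:", f(x) - f(x_ref))  # contributions add up to the gap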
ViTmiX
A hybrid explainability method targeting vision-transformer (ViT) models that combines multiple visualization techniques to produce clearer explanations of why a model made a certain decision (e.g. in object recognition or segmentation tasks). Read more
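To illustrate only the "mixing" step, here is a toy sketch that normalizes two per-patch saliency maps (say, attention rollout and a gradient-based map) and blends them; the actual ViTmiX combination rules may differ, and the maps here are random stand-ins:

    # Blend two per-patch saliency maps into one explanation (toy data only).
    import numpy as np

    rng = np.random.default_rng(1)

    def normalise(m):
        m = m - m.min()
        return m / (m.max() + 1e-8)

    attention_map = rng.random((14, 14))   # stand-in for attention-based saliency
    gradient_map = rng.random((14, 14))    # stand-in for gradient-based saliency

    # Element-wise geometric mean keeps only regions both methods agree on.
    mixed = normalise(np.sqrt(normalise(attention_map) * normalise(gradient_map)))
    print(mixed.shape, mixed.min(), mixed.max())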
XAI‑Guided Context‑Aware Data Augmentation
This method uses XAI insights (which features the model considers important) to guide augmentation so that transformations preserve relevant information; this helps improve performance and generalization, especially in low-resource domains. Read more
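A small sketch of the guiding idea, assuming a saliency map from any XAI method is already available: cutout-style masking is applied only where importance is low, so the augmentation preserves the regions the model relies on (names, sizes and thresholds are illustrative):

    # Saliency-aware cutout: only erase a patch if the XAI map says it is unimportant.
    import numpy as np

    rng = np.random.default_rng(2)

    image = rng.random((32, 32))
    saliency = rng.random((32, 32))            # stand-in for an XAI importance map

    def saliency_aware_cutout(img, sal, patch=8, keep_quantile=0.7):
        """Mask a random patch, but only if its mean saliency is below the cutoff."""
        out = img.copy()
        cutoff = np.quantile(sal, keep_quantile)
        y = rng.integers(0, img.shape[0] - patch)
        x = rng.integers(0, img.shape[1] - patch)
        if sal[y:y + patch, x:x + patch].mean() < cutoff:
            out[y:y + patch, x:x + patch] = 0.0   # safe to erase: low-importance region
        return out

    augmented = saliency_aware_cutout(image, saliency)
    print(np.count_nonzero(augmented == 0.0), "pixels erased")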
Causal-inference and neuro-symbolic explainability approaches
These approaches go beyond correlation-based explanations: they aim to reveal causal relationships and embed symbolic (human-understandable) reasoning into neural models, making their decisions more transparent and interpretable. Read more
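A toy sketch of the causal half: an interventional (do-operator) query on a model's output, as opposed to reading off an observational correlation. The data, the fitted model and the effect estimator are all illustrative assumptions:

    # Estimate E[f(X) | do(x_i = hi)] - E[f(X) | do(x_i = lo)] by overriding a feature.
    import numpy as np

    rng = np.random.default_rng(3)

    # Confounded toy data: z drives both x1 and the label; x2 is the real cause.
    n = 5000
    z = rng.normal(size=n)
    x1 = z + 0.1 * rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 2.0 * x2 + z + 0.1 * rng.normal(size=n)
    X = np.column_stack([x1, x2])

    # "Model": ordinary least squares fit on the observational data.
    w, *_ = np.linalg.lstsq(np.column_stack([X, np.ones(n)]), y, rcond=None)
    predict = lambda X_: X_ @ w[:2] + w[2]

    def average_causal_effect(X_, feature, lo, hi):
        """Average effect on the model's output of forcing one feature from lo to hi."""
        X_hi, X_lo = X_.copy(), X_.copy()
        X_hi[:, feature], X_lo[:, feature] = hi, lo
        return predict(X_hi).mean() - predict(X_lo).mean()

    for i, name in enumerate(["x1 (confounded)", "x2 (true cause)"]):
        print(name, "ACE per unit:", round(average_causal_effect(X, i, 0.0, 1.0), 3))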
Mechanistic interpretability / Circuit tracing & sparse decomposition
This is my favourite one: a set of techniques that attempts to peer inside deep networks (especially large language models and transformers) and decompose them into simpler, interpretable sub-components (e.g. “circuits,” “features,” “concepts”) rather than treating them as opaque black boxes. Read more
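As a rough illustration of the sparse-decomposition half, here is a sketch that trains a small sparse autoencoder on toy activations so each activation is rewritten as a sparse combination of dictionary features (sizes, penalty and data are assumptions, not any published setup):

    # Sparse autoencoder over toy "residual-stream" activations.
    import torch
    import torch.nn as nn

    torch.manual_seed(0)

    d_model, d_dict, n_samples = 16, 64, 4096
    # Toy activations: sparse mixtures of a few hidden ground-truth directions.
    true_feats = torch.randn(d_dict, d_model)
    codes = (torch.rand(n_samples, d_dict) < 0.05).float() * torch.rand(n_samples, d_dict)
    acts = codes @ true_feats

    class SparseAutoencoder(nn.Module):
        def __init__(self):
            super().__init__()
            self.enc = nn.Linear(d_model, d_dict)
            self.dec = nn.Linear(d_dict, d_model, bias=False)
        def forward(self, x):
            f = torch.relu(self.enc(x))      # non-negative feature activations
            return self.dec(f), f

    sae = SparseAutoencoder()
    opt = torch.optim.Adam(sae.parameters(), lr=1e-3)
    l1 = 1e-3                                 # sparsity penalty (assumed value)

    for step in range(2000):
        batch = acts[torch.randint(0, n_samples, (256,))]
        recon, feats = sae(batch)
        loss = ((recon - batch) ** 2).mean() + l1 * feats.abs().mean()
        opt.zero_grad(); loss.backward(); opt.step()

    recon, feats = sae(acts[:512])
    print("recon error:", ((recon - acts[:512]) ** 2).mean().item())
    print("avg active features per sample:", (feats > 1e-3).float().sum(dim=1).mean().item())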
Interactive and user-centered explanations / context-aware XAI
This line of work proposes systems where users can ask “what-if” questions (counterfactuals) or receive context-tailored explanations (e.g. for medical or IoT applications) rather than static feature-importance outputs. Read more
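A minimal sketch of a “what-if” query: greedily nudge one feature at a time until a toy classifier's decision flips, then report the change as a counterfactual (the model, step size and stopping rule are illustrative assumptions):

    # Greedy counterfactual search on a toy linear classifier.
    import numpy as np

    w, b = np.array([1.5, -2.0, 0.5]), -0.2
    predict = lambda x: 1 if x @ w + b > 0 else 0

    def counterfactual(x, target, step=0.05, max_iter=500):
        """Nudge the most influential feature until the prediction becomes `target`."""
        x_cf = x.astype(float).copy()
        for _ in range(max_iter):
            if predict(x_cf) == target:
                break
            grads = w if target == 1 else -w      # direction that pushes toward target
            i = np.argmax(np.abs(grads))           # change the most influential feature
            x_cf[i] += step * np.sign(grads[i])
        return x_cf

    x = np.array([0.2, 0.9, 0.1])
    print("original prediction:", predict(x))
    x_cf = counterfactual(x, target=1)
    print("what-if:", x_cf, "->", predict(x_cf), "| changed:", np.round(x_cf - x, 2))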
