Tuomas Oikarinen



Developing scalable ways to understand deep learning. Especially excited about using (mechanistic) interpretability to improve the safety and reliability of neural networks. Current interests include automated interpretability, rigorous interpretability evals, concept bottleneck models (CBMs), and sparse autoencoders (SAEs).

PhD student at UC San Diego, advised by Prof. Tsui-Wei (Lily) Weng.
Bachelor of Science in Computer Science and Engineering and in Philosophy from MIT.

Google Scholar / GitHub / email: toikarinen@ucsd.edu

Select Publications (Interpretability)


Linear Explanations for Individual Neurons - [code] - [website]

Label-Free Concept Bottleneck Models - [code] - [slides]

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks - [code] - [slides]

Other Interpretability


Concept Driven Continual Learning - [code] - [project website]

Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models - [code] - [project website]

Crafting Large Language Models for Enhanced Interpretability - [code]

Concept-Monitor: Understanding DNN Training Through Individual Neurons

The Importance of Prompt Tuning for Automated Neuron Explanations - [code] - [project website]

Adversarial Robustness


Corrupting Neuron Explanations of Deep Visual Features - [code]

Robust Deep Reinforcement Learning through Adversarial Loss - [code] - [slides]

Applied ML


GraphMDN: Leveraging Graph Structure and Deep Learning to Solve Inverse Problems - [code]

Landslide Geohazard Assessment with Convolutional Neural Networks using Sentinel-2 Imagery Data - [code]

Deep Convolutional Network for Animal Sound Classification and Source Attribution using Dual Audio Recordings - [code]