Tuomas Oikarinen



Developing scalable ways to understand deep learning. Especially excited about using (mechanistic) interpretability to help improve the safety and reliability of neural networks. Current interests include automated interpretability, rigorous interpretability evals, concept bottleneck models (CBMs), and sparse autoencoders (SAEs).

PhD student at UC San Diego, advised by Prof. Tsui-Wei (Lily) Weng.
Bachelor of Science in Computer Science and Engineering and in Philosophy from MIT.

Google Scholar / Github / email: toikarinen@ucsd.edu

Selected Publications (Interpretability)


Linear Explanations for Individual Neurons - [code] - [website]

Label-Free Concept Bottleneck Models - [code] - [slides]

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks - [code] - [slides]

Other Interpretability


Interpretable Generative Models through Post-hoc Concept Bottlenecks - [code] - [project website]

Concept Bottleneck Language Models for Protein Design - [code]

Concept Bottleneck Large Language Models - [code] - [project website]

Interpreting Neurons in Vision Networks with Language Models - [code] - [project website]

Concept Driven Continual Learning - [code] - [project website]

Concept-Monitor: Understanding DNN training through individual neurons

The Importance of Prompt Tuning for Automated Neuron Explanations - [code] - [project website]

Adversarial Robustness


Corrupting Neuron Explanations of Deep Visual Features - [code]

Robust Deep Reinforcement Learning through Adversarial Loss - [code] - [slides]

Applied ML


GraphMDN: Leveraging Graph Structure and Deep Learning to Solve Inverse Problems - [code]

Landslide Geohazard Assessment with Convolutional Neural Networks using Sentinel-2 Imagery Data - [code]

Deep Convolutional Network for Animal Sound Classification and Source Attribution using Dual Audio Recordings - [code]