Tuomas Oikarinen



Developing scalable ways to understand deep learning. Especially excited about using (mechanistic) interpretability to improve the safety and reliability of neural networks. Current interests include automated interpretability, rigorous interpretability evals, concept bottleneck models (CBMs), and sparse autoencoders (SAEs).

PhD student at UC San Diego, advised by Prof. Tsui-Wei (Lily) Weng.
Bachelor of Science in Computer Science and Engineering and in Philosophy from MIT.

Google Scholar / GitHub / email: toikarinen@ucsd.edu

Select Publications (Interpretability)


Linear Explanations for Individual Neurons - [code] - [website]

Label-Free Concept Bottleneck Models - [code] - [slides]

CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks - [code] - [slides]

Other Interpretability


Concept Driven Continual Learning - [code] - [project website]

Describe-and-Dissect: Interpreting Neurons in Vision Networks with Language Models - [code] - [project website]

Crafting Large Language Models for Enhanced Interpretability - [code]

Concept-Monitor: Understanding DNN Training Through Individual Neurons

The Importance of Prompt Tuning for Automated Neuron Explanations - [code] - [project website]

Adversarial Robustness


Corrupting Neuron Explanations of Deep Visual Features - [code]

Robust Deep Reinforcement Learning through Adversarial Loss - [code] - [slides]

Applied ML


GraphMDN: Leveraging Graph Structure and Deep Learning to Solve Inverse Problems - [code]

Landslide Geohazard Assessment with Convolutional Neural Networks using Sentinel-2 Imagery Data - [code]

Deep Convolutional Network for Animal Sound Classification and Source Attribution using Dual Audio Recordings - [code]