Savya Khosla

I am a first-year Ph.D. student at the University of Illinois Urbana-Champaign, advised by Professor Derek Hoiem and Professor Alexander Schwing. Previously, I completed my Master's in Computer Science at UIUC and my Bachelor's in Computer Engineering at Delhi Technological University.

I am broadly interested in long-form video understanding and multimodal learning. More specifically, I have been working on:

Designing more efficient ways to represent visual data
Fine-grained retrieval from episodic memory
Unifying generation and representation learning in a single model
Aligning and processing multiple modalities simultaneously

During my MS, I had the opportunity to collaborate with Jiasen Lu and Sangho Lee at the Allen Institute for AI, where I worked on developing an autoregressive multimodal model capable of parsing and generating images, text, audio, and video. Prior to that, I collaborated with Alex Lamb (Mila) and Kenji Kawaguchi (NUS) on improving active learning for heteroskedastic distributions. Earlier, during an internship at Google India, I worked with Partha Talukdar's group on training a multilingual language model for Indian languages.

If you are interested in collaborating, would like to discuss research, or have any questions, feel free to reach out to me at savyak2@illinois.edu.

[Jun 2025] Preprint of FRAME is out on arXiv.
[May 2025] Preprint of REN is out on arXiv.
[May 2025] Started as a research intern at Meta.
[May 2025] MAGNET got accepted at ACL 2025.
[Feb 2025] RELOCATE got accepted at CVPR 2025.
[Jan 2025] Preprint of MAGNET is out on arXiv.
[Jan 2025] Preprint of RELOCATE is out on arXiv.
[May 2024] Started as a research intern at Adobe Research.
[May 2024] Completed my MS in Computer Science from UIUC.
[Apr 2024] Accepted a CS PhD offer from UIUC.
[Feb 2024] Unified-IO 2 got accepted at CVPR 2024.
[May 2023] Started as a research intern at the Allen Institute for AI.
[Oct 2022] Began collaborating with the Allen Institute for AI on Unified-IO 2.
[Aug 2022] Started the thesis-track M.S. in Computer Science at the University of Illinois Urbana-Champaign.


Research
I have been involved in a range of research projects, collaborating across both industry and academia. My work has focused on a broad array of topics, including multimodal learning, video understanding, natural language processing, active learning, and adversarial learning.

Meta

May 2025 - Aug 2025

Working on multimodal representation learning

Adobe Research

May 2024 - Aug 2024

Developed MAGNET, a method to simultaneously enhance LLMs with generative and representation learning capabilities

The enhanced LLMs can perform open-ended generation, text infilling, and token-level and sentence-level representation learning

Allen Institute for AI

May 2023 - Aug 2023

Contributed to Unified-IO 2, an instruction-following model that can parse and generate multimodal data and perform 120+ tasks

Worked on a memory-augmented multimodal encoder for understanding videos ranging from a few seconds to tens of minutes

National University of Singapore

Apr 2022 - Aug 2022

Developed a robust active learning algorithm for handling heteroskedastic noise, resulting in a 10% accuracy boost over baselines

Demonstrated a 15% accuracy improvement in other state-of-the-art algorithms by incorporating a simple self-supervised approach

Mila

Apr 2021 - Nov 2021

Demonstrated catastrophic failure of uncertainty-based active learning algorithms by proposing 3 heteroskedastic data distributions

Proposed interpolated adversarial training that gives a 48% reduction in error rate on clean data while preserving adversarial robustness

Delhi Technological University

Apr 2021 - Nov 2021

Leveraged image-based malware binary representations and techniques like ensembling and autoencoding to develop S-DCNN and AE-DCNN, CNNs for malware classification

Worked on improving object recognition systems in the presence of adversaries like occlusion and blurriness

Google

May 2020 - Jul 2020

Initiated the development of MuRIL, a BERT-based multilingual language model for 17 Indian dialects and their transliterated versions

Achieved a 10.42% F1 improvement in sentiment analysis and a 9.87% improvement in named entity recognition for Indian languages

Teaching
I have worked as a teaching assistant, where I was responsible for teaching labs, conducting office hours, grading tests, and mentoring group projects.

CS 445: Computational Photography

Fall 2023

CS 225: Data Structures and Algorithms

Fall 2022 and Spring 2023

Engineering
I have also worked briefly in software engineering roles (which helped me realize that while I love coding, my true passion lies in research).

Google

Aug 2021 - Mar 2022

Improved Google Search’s web ranking infrastructure using deep learning for better multimodal document understanding

Enhanced precision and recall in salient entity extraction from webpages by transitioning from traditional ML methods to LLMs

Cadence Design Systems

Dec 2018 - Jan 2019

Developed a unified functionality interface for two version control systems - Perforce and ClearCase

Implemented functionality to streamline the complex multi-step process of fetching file revisions from the two version control systems using a single bash command

A full list of publications can be seen on my Google Scholar author page.
(* denotes equal contribution)

FRAME: Pre-Training Video Feature Representations via Anticipation and Memory

Sethuraman T V, Savya Khosla, Vignesh Srinivasakumar, Jiahui Huang, Seoung Wug Oh, Simon Jenni, Derek Hoiem, and Joon-Young Lee

arXiv, 2025

REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders

Savya Khosla, Sethuraman T V, Barnett Lee, Alexander Schwing, and Derek Hoiem

arXiv, 2025

RELOCATE: A Simple Training-Free Baseline for Visual Query Localization Using Region-Based Representations

Savya Khosla, Sethuraman T V, Alexander Schwing, and Derek Hoiem

Computer Vision and Pattern Recognition, 2025

MAGNET: Augmenting Generative Decoders with Representation Learning and Infilling Capabilities

Savya Khosla, Aditi Tiwari, Kushal Kafle, Simon Jenni, Handong Zhao, John Collomosse, and Jing Shi

Association for Computational Linguistics, 2025

Unified-IO 2: Scaling Autoregressive Multimodal Model with Vision, Language, Audio, and Action

Jiasen Lu*, Christopher Clark*, Sangho Lee*, Zichen Zhang*, Savya Khosla, Ryan Marten, Derek Hoiem, and Aniruddha Kembhavi

Computer Vision and Pattern Recognition, 2024

Survey on Memory-Augmented Neural Networks: Cognitive Insights to AI Applications

Savya Khosla*, Zhen Zhu*, and Yifei He*

arXiv, 2023

Understanding and Improving Neural Active Learning on Heteroskedastic Distributions

Savya Khosla, Chew Kin Whye, Jordan T. Ash, Cyril Zhang, Kenji Kawaguchi, and Alex Lamb

European Conference on Artificial Intelligence, 2023

Interpolated Adversarial Training: Achieving Robust Neural Networks without Sacrificing too much Accuracy

Alex Lamb, Vikas Verma, Kenji Kawaguchi, Alexander Matyasko, Savya Khosla, Juho Kannala, and Yoshua Bengio

Neural Networks, 2022

S-DCNN: Stacked Deep Convolutional Neural Networks for Malware Classification

Anil Singh Parihar, Shashank Kumar, and Savya Khosla

Multimedia Tools and Applications, 2022

Catastrophic Failures of Neural Active Learning on Heteroskedastic Distributions

Savya Khosla, Alex Lamb, Jordan T. Ash, Cyril Zhang, and Kenji Kawaguchi

NeurIPS Workshop on Distribution Shifts, 2021

MuRIL: Multilingual Representations for Indian Languages

Simran Khanuja, Diksha Bansal*, Sarvesh Mehtani*, Savya Khosla*, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam, Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta, Subhash Chandra Bose Gali, Vish Subramanian, and Partha Talukdar

arXiv, 2021
Media Coverage: Economic Times, Indian Express, Google AI Blog

AE-DCNN: Autoencoder Enhanced Deep Convolutional Neural Network For Malware Classification

Shashank Kumar*, Savya Khosla*, Shivangi Meena, and Anil Singh Parihar

International Conference on Intelligent Technologies, 2021