I am a second-year Ph.D. student at the University of Illinois Urbana-Champaign,
advised by Prof. Derek Hoiem
and Prof. Alex Schwing.
My research interests include:
- Representation Learning: Developing region-based visual representations that improve efficiency and performance
compared to conventional patch-based representations.
- Long-Form Video Understanding: Enabling fine-grained retrieval and reasoning over long videos through streaming
architectures and memory integration.
- Multimodal Learning: Building unified architectures that can understand and generate vision, language, audio, and even bio-signals.
Previously, I completed my Master's in Computer Science at UIUC and my Bachelor's in Computer
Engineering at Delhi Technological University.
If you are interested in collaborating or would like to discuss research, feel free to reach out to me
at savyak2@illinois.edu.
Research
I have collaborated across industry and academia on projects spanning multimodal learning, video understanding, natural language processing, and active learning.
Meta
May 2025 - Aug 2025
Built EMG-JEPA, a self-supervised joint-embedding predictive architecture for learning transferable representations from unlabeled surface electromyography (sEMG)
signals.
Reduced labeled data requirements and improved cross-user generalization for hand pose estimation.
Adobe Research
May 2024 - Aug 2024
Developed MAGNET, a method to simultaneously enhance LLMs with generative and representation
learning capabilities.
The enhanced LLMs can perform open-ended generation, text
infilling, and token-level and sentence-level representation learning.
Allen Institute for AI
May 2023 - Aug 2023
Contributed to Unified-IO 2,
an instruction-following model that can parse and generate multimodal data and perform 120+ tasks.
Worked on a memory-augmented multimodal encoder for understanding videos ranging from a
few seconds to tens of minutes.
National University of Singapore
Apr 2022 - Aug 2022
Mila
Apr 2021 - Nov 2021
Delhi Technological University
Apr 2021 - Nov 2021
Leveraged image-based malware binary representations and techniques like ensembling and autoencoding to develop
S-DCNN and
AE-DCNN, CNNs for malware classification.
Worked on improving object recognition systems under adverse conditions such as occlusion and blur.
Google
May 2020 - Jul 2020
Initiated the development of MuRIL,
a BERT-based multilingual language model for 17 Indian dialects and their transliterated versions.
Achieved a 10.42% F1 improvement in sentiment analysis and a 9.87% improvement in named entity recognition for Indian languages.
Teaching/Mentoring
I have worked as a teaching assistant, where I was responsible for teaching labs, conducting office hours, grading tests,
and mentoring group projects.
Undergraduate Research in Scientific Advancement
Fall 2025
Mentored 2 sophomores (Akash Danda and
Mihir Tandon) in a semester-long research project on
improving object localization in long episodic video memories.
Achieved a 60% localization success rate while scaling the approach to process 8-minute-long videos.
The project won the best overall award
for the Fall 2025 URSA cohort.
CS 445: Computational Photography
Fall 2023
CS 225: Data Structures and Algorithms
Fall 2022 and Spring 2023
Engineering
I have also worked briefly in software engineering roles (which helped me realize that while I love coding, my
true passion lies in research).
Google
Aug 2021 - Mar 2022
Improved Google Search’s web ranking infrastructure using deep learning for better multimodal
document understanding.
Enhanced precision and recall in salient entity extraction from webpages by transitioning
from traditional ML methods to LLMs.
Cadence Design Systems
Dec 2018 - Jan 2019
Developed a unified interface for two version control systems:
Perforce and ClearCase.
Implemented functionality that reduces the complex multi-step process of fetching file revisions
from the two version control systems to a single bash command.
A full list of publications can be seen on my
Google Scholar
author page.
(* denotes equal contribution)
Unified-IO 2: Scaling Autoregressive Multimodal Model with Vision, Language, Audio, and Action
Jiasen Lu*, Christopher Clark*, Sangho Lee*, Zichen Zhang*, Savya Khosla, Ryan Marten, Derek Hoiem, and
Aniruddha Kembhavi
Computer Vision and Pattern Recognition, 2024
MuRIL: Multilingual Representations for Indian Languages
Simran Khanuja, Diksha Bansal*, Sarvesh Mehtani*,
Savya Khosla*, Atreyee Dey, Balaji Gopalan, Dilip Kumar Margam,
Pooja Aggarwal, Rajiv Teja Nagipogu, Shachi Dave, Shruti Gupta,
Subhash Chandra Bose Gali, Vish Subramanian, and Partha Talukdar
arXiv, 2021
Media Coverage:
Economic Times,
Indian Express,
Google AI Blog