I'm a 3rd-year Ph.D. student in the
My research focuses on automating and optimizing foundation model inference, particularly for large language models (LLMs) and vision models, to improve their inference efficiency in real-world applications. While training these models incurs a significant but one-time expense, inference costs can escalate quickly over a model's lifecycle. To address this, my work develops automated techniques for pruning, quantization, and knowledge distillation, reducing the manual effort required to tune these methods. Ultimately, my goal is to make foundation models more accessible and sustainable across diverse domains. As an open-source effort toward this goal, I develop and maintain the library
My recent research interests include the following topics: