Hengxu Yu

I'm a Ph.D. student in Data Science at the Chinese University of Hong Kong, Shenzhen, working on optimization algorithms for machine learning, with a particular focus on developing efficient methods for training large language models and on analyzing stochastic optimization algorithms.

My recent work includes BAdam, a memory-efficient optimization method for full-parameter fine-tuning of large language models, which enables training models such as Llama 3-70B with significantly reduced memory requirements while maintaining strong performance. I have also contributed to the theoretical analysis of random reshuffling algorithms, establishing high-probability guarantees for their convergence behavior in non-convex optimization settings.
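To give a flavor of the block-wise idea behind BAdam, here is a minimal sketch of a block-wise Adam training loop. This is a simplified illustration under my assumptions about the update scheme, not the released implementation: the toy model, the per-submodule block partition, the synthetic data, and the hyperparameters are all placeholders, and in practice the blocks would be transformer layers of a large model.

```python
import torch
import torch.nn as nn

# Toy stand-in model; for an LLM the blocks would be transformer layers.
model = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()

# Partition trainable parameters into blocks, here one block per parametrized submodule.
blocks = [list(m.parameters()) for m in model if list(m.parameters())]

# Synthetic data for illustration only.
batches = [(torch.randn(32, 64), torch.randint(0, 10, (32,))) for _ in range(8)]

def run_block(block_params, steps=4, lr=1e-3):
    """Run a few Adam steps on one block while all other blocks stay frozen."""
    for p in model.parameters():
        p.requires_grad_(False)
    for p in block_params:
        p.requires_grad_(True)
    # Adam moments are allocated for this block only, so peak optimizer memory
    # is roughly one block's worth of state rather than the full model's.
    opt = torch.optim.Adam(block_params, lr=lr)
    for step in range(steps):
        x, y = batches[step % len(batches)]
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    # The optimizer (and its state) is discarded before moving on to the next block.

for epoch in range(2):
    for block in blocks:  # cycle through the blocks one at a time
        run_block(block)
```

Cycling Adam over one block at a time is what keeps the optimizer-state footprint small; the exact block schedule and update rule used by BAdam are described in the paper below.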

I'm particularly interested in bridging the gap between theoretical analysis and practical algorithm design in optimization. My research aims to develop methods that are both theoretically sound and practically useful, especially for training modern deep learning models at scale. I'm advised by Prof. Xiao Li, focusing on problems at the intersection of optimization theory and machine learning.

Publications

BAdam: A Memory Efficient Full Parameter Optimization Method for Large Language Models

Qi Luo, Hengxu Yu, Xiao Li

Neural Information Processing Systems 2024

High Probability Guarantees for Random Reshuffling

Hengxu Yu, Xiao Li

arXiv preprint, 2023