Forward vs Reverse KL in LLM Training
Why the direction of your KL divergence matters more than you think...
AI Researcher @ AWS
I'm an AI researcher at Amazon Web Services (AWS), focusing on artificial intelligence, agentic AI systems, and large language models. My work explores how we can make AI systems more capable, efficient, and reliable — from training efficiency in deep learning to building AI agents that can interact with the real world through tool use and function calling.
Previously, I completed my PhD at USC working on training efficiency in deep learning, with a focus on leveraging synthetic data for NLP tasks like named entity recognition and relation extraction.
Publications:
- arXiv:2510.17052
- arXiv:2510.17058
- NAACL 2024 Findings
- Asilomar Conference 2022
- IEEE ISIT 2019
- CWIT 2019
- IEEE Photonics Journal
- CWIT 2017