Current interests
My long-term goal is to develop systems capable of open-ended scientific knowledge discovery. My current interests revolve around enabling agents to learn from their own experience in open-ended and complex environments:
- Scaling up Reinforcement Learning — Enabling agents to learn from experience in vast state & action spaces on long-horizon, complex tasks.
  - How do we densify sparse reward signals (e.g., verifiable outcome rewards, natural language task descriptions)?
  - How do we assign credit to atomic actions (e.g., tokens)? (See the sketch below.)
  - How do we learn from natural language feedback? Can models learn from their own feedback?
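To make the reward-densification and credit-assignment questions concrete, here is a deliberately naive Python sketch (my own strawman, not a method I am claiming works): a single verifiable outcome reward is spread over the tokens of a sampled response via discounted returns and plugged into a REINFORCE-style loss with a scalar baseline. All function names and numbers are illustrative.

```python
# Toy sketch: turn a single verifiable outcome reward into per-token credit.
# gamma = 1.0 spreads the reward uniformly; gamma < 1 concentrates credit on
# tokens closer to the verified outcome.

def per_token_returns(num_tokens: int, outcome_reward: float, gamma: float = 1.0) -> list[float]:
    return [outcome_reward * gamma ** (num_tokens - 1 - t) for t in range(num_tokens)]

def reinforce_loss(token_logprobs: list[float], returns: list[float], baseline: float) -> float:
    # REINFORCE surrogate: -(G_t - b) * log pi(token_t | prefix), summed over tokens.
    return -sum((g - baseline) * lp for lp, g in zip(token_logprobs, returns))

# Example: a 4-token response that passes a verifier (reward = 1.0),
# against a crude baseline of 0.5 (e.g., the mean reward of a group of samples).
logprobs = [-0.2, -1.3, -0.7, -0.4]
loss = reinforce_loss(logprobs, per_token_returns(4, 1.0, gamma=0.98), baseline=0.5)
print(round(loss, 3))
```

Even this strawman exposes the problem: the "dense" signal is just the same scalar smeared across tokens, so the real question is how to learn per-token credit that carries more information than the outcome alone.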
- Autonomous Information Seeking — Enabling agents to identify the data they need and develop a plan to obtain it via interaction.
  - Can we intrinsically incentivize agents to perform efficient, goal-directed exploration? (See the sketch below.)
  - How can agents explore & search at test time in large, noisy, and feedback-scarce environments?
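As a minimal illustration of what an intrinsic incentive could look like, the toy class below adds a count-based novelty bonus to a sparse extrinsic reward. It is only a sketch under strong assumptions (discrete, hashable state keys), and `state_key` stands in for whatever state abstraction the agent has.

```python
from collections import Counter
from math import sqrt

class CountBonus:
    """Adds beta / sqrt(N(s)) to the extrinsic reward for rarely visited states."""

    def __init__(self, beta: float = 0.1):
        self.counts: Counter[str] = Counter()
        self.beta = beta

    def __call__(self, state_key: str, extrinsic_reward: float) -> float:
        self.counts[state_key] += 1
        return extrinsic_reward + self.beta / sqrt(self.counts[state_key])

# First visit to a state earns the full bonus; repeat visits decay toward zero.
shaped = CountBonus(beta=0.5)
print(shaped("page:docs/index", 0.0))  # 0.5
print(shaped("page:docs/index", 0.0))  # ~0.354
```

The open question for me is how to make such bonuses goal-directed rather than purely novelty-seeking once the environment is large and noisy.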
- World Modeling & Open-Endedness — Generating scalable streams of experiential data, grounding long reasoning trajectories, and providing non-saturable training incentives.
  - Can we leverage world models for online planning and for efficient sampling at training time?
  - Should we define a world-modeling objective (i.e., modeling the dynamics of the environment) in post-training, just as we did in pre-training (i.e., next-token prediction)?
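To illustrate the analogy in the last question, here is a minimal PyTorch sketch, assuming discrete observation and action tokens (both the module and the tensor names are mine): the post-training objective mirrors next-token prediction, except the model predicts the next observation given the history of observations and actions, with the same shifted cross-entropy loss.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyWorldModel(nn.Module):
    """Predicts the next observation token from the history of (obs, action) tokens."""

    def __init__(self, obs_vocab: int, act_vocab: int, dim: int = 64):
        super().__init__()
        self.obs_emb = nn.Embedding(obs_vocab, dim)
        self.act_emb = nn.Embedding(act_vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.head = nn.Linear(dim, obs_vocab)

    def forward(self, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
        # obs, act: (batch, time) integer ids; one step = one (obs, action) pair.
        hidden, _ = self.rnn(self.obs_emb(obs) + self.act_emb(act))
        return self.head(hidden)  # logits over the next observation at every step

def world_model_loss(model: TinyWorldModel, obs: torch.Tensor, act: torch.Tensor) -> torch.Tensor:
    # Shifted prediction, exactly like next-token prediction: predict obs[t+1]
    # from (obs[<=t], act[<=t]) instead of token[t+1] from token[<=t].
    logits = model(obs[:, :-1], act[:, :-1])
    targets = obs[:, 1:]
    return F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))

# Example with random trajectories: batch of 8, horizon 16, 256 obs / 16 action tokens.
obs = torch.randint(0, 256, (8, 16))
act = torch.randint(0, 16, (8, 16))
loss = world_model_loss(TinyWorldModel(256, 16), obs, act)
loss.backward()
```

The appeal of this framing is that, like next-token prediction, the objective does not saturate as long as new interaction data keeps arriving.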