Current interests
My long-term goal is to develop systems capable of open-ended scientific discovery. My current interests revolve around enabling agents to learn from their own experience in open-ended and complex environments. This involves two things: environments & signals.
- Scaling Data: Empowering agents to learn from experience in vast state & action spaces on long-horizon, complex tasks.
  - Can we build environments in which models gather real-world, live data during training?
  - Can we intrinsically incentivize agents to perform efficient, goal-directed exploration? How do we make agents learn "what data to learn from"?
  - Which sets of tools maximize agent empowerment for exploration?
- Scaling Supervision: Studying and designing scalable, dense feedback methods that provide training & sampling signals.
  - How do we construct signals for non-verifiable tasks?
  - How do we attribute credit to atomic actions (e.g., tokens)?
  - How do we learn from natural language feedback? Can models learn from their own feedback?
  - More generally, can we extract informative signals from the strong priors of LLMs/VLMs?