Current interests

My long-term goal is to develop systems capable of open-ended scientific discovery. My current interests revolve around enabling agents to learn from their own experience in open-ended and complex environments. This involves two things: environments & signals.

  • Scaling Data — Empowering agents to learn from experience in vast state & action spaces under long-horizon, complex tasks.
    • Can we build environments for models to gather real-world live data during training?
    • Can we intrinsically incentivize agents to perform efficient goal-directed exploration? How do we make agents learn what data to learn from?
    • Which sets of tools maximize agent empowerment for exploration?
  • Scaling Supervision — Studying and designing scalable dense feedback methods that provide training & sampling signals.
    • How do we construct signals for non-verifiable tasks?
    • How do we attribute credit to atomic actions (e.g., tokens)?
    • How do we learn from natural language feedback? Can models learn from their own feedback?
    • Can we generally extract informative signals from the strong priors LLMs/VLMs have?

Research output

| Type | Title | Authors | Year | Publisher |
|---|---|---|---|---|
| TBA | SignalBench: Comparing Dense Feedback Methods for Long-Horizon Agents | Sergio Hernández-Gutiérrez, Matteo Merler, Ilze Amanda Auzina, Joschka Strüber, Ameya Prabhu, Matthias Bethge | 2026 | |
| TBA | RevengeBench: Reverse Engineering Code-Space Policies from Behavioral Experiments | Babak Rahmani, Sebastian Dziadzio, Joschka Strüber, Sergio Hernández-Gutiérrez, Matthias Bethge | 2026 | |
| conference | Intrinsic Credit Assignment for Long Horizon Interaction | Ilze Amanda Auzina*, Joschka Strüber*, Sergio Hernández-Gutiérrez*, Shashwat Goel, Ameya Prabhu, Matthias Bethge | 2026 | ICML |
| conference | Co-Adaptation of Embodiment and Control with Self-Imitation Learning | Sergio Hernández-Gutiérrez, Ville Kyrki, Kevin S. Luck | 2025 | IROS |
| workshop | Recursive Decomposition with Dependencies for Generic Divide and Conquer Reasoning | Sergio Hernández-Gutiérrez, Minttu Alakuijala, Alexander V. Nikitin, Pekka Marttinen | 2024 | NeurIPS Sys2 Reasoning |