Current interests

My long-term goal is to develop systems capable of open-ended scientific knowledge discovery. My current interests revolve around enabling agents to learn from their own experience in open-ended and complex environments:

  • Upscaling Reinforcement Learning — Enabling agents to learn from experience in vast state & action spaces under long-horizon, complex tasks.
    • How do we densify sparse reward signals (e.g., verifiable outcome rewards or natural language task descriptions)?
    • How do we attribute credit to atomic actions (e.g., tokens)?
    • How do we learn from natural language feedback? Can models learn from their own feedback?
  • Autonomous Information Seeking — Enabling agents to identify the data they need and develop a plan to obtain it via interaction.
    • Can we intrinsically incentivize agents to perform efficient goal-directed exploration?
    • How can agents explore and search at test time in large, noisy, and feedback-scarce environments?
  • World Modeling & Open-Endedness — Generating scalable streams of experiential data, grounding long reasoning trajectories, and providing non-saturable training incentives.
    • Can we leverage world models for online planning and for efficient sampling at training time?
    • Should we define a world modeling objective (i.e., modeling the dynamics of the environment) in post-training just like we did in pre-training (i.e., next-token prediction)?

Research output

| Type | Title | Authors | Year | Publisher | Recognitions | Links |
|---|---|---|---|---|---|---|
| TBA | Intrinsic Credit Assignment for Long Horizon Interaction | Ilze Amanda Auzina*, Joschka Strüber*, Sergio Hernández-Gutiérrez*, Shashwat Goel, Ameya Prabhu, Matthias Bethge | 2026 | | | |
| conference | Co-Adaptation of Embodiment and Control with Self-Imitation Learning | Sergio Hernández-Gutiérrez, Ville Kyrki, Kevin S. Luck | 2025 | IROS | | |
| workshop | Recursive Decomposition with Dependencies for Generic Divide and Conquer Reasoning | Sergio Hernández-Gutiérrez, Minttu Alakuijala, Alexander V. Nikitin, Pekka Marttinen | 2024 | NeurIPS Sys2 Reasoning | | |
| workshop | Following Ancestral Footsteps: Co-Designing Morphology and Behaviour with Self-Imitation Learning | Sergio Hernández-Gutiérrez, Ville Kyrki, Kevin S. Luck | 2024 | EARL RSS (oral presentation) and EWRL | Best Workshop Paper Award (EARL RSS) | |
| thesis | Solving Reasoning Problems with Large Language Models via Recursive Decomposition | Sergio Hernández-Gutiérrez, Pekka Marttinen, Alexander Nikitin, Minttu Alakuijala | 2024 | Aalto University | | |