GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning

University of Maryland
* Indicates equal contribution
GenFlowRL Teaser

Illustration of our GenFlowRL framework, which guides the visuomotor RL policy by using generative object-centric flow as a task motion prior (right). In our proposed hybrid reward model, dense flow matching between online trajectories and the flow prior, combined with a sparse state-aware reward, enables efficient, robust, and generalizable policy learning. Our extensive evaluation covers 10 challenging simulated manipulation tasks (Fig. a) and real-world cross-embodiment reward-matching probing experiments (Fig. b).
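To make the hybrid reward concrete, here is a minimal sketch combining a dense flow-matching term with a sparse state-aware bonus. This is our own illustration, not the paper's implementation: the function name, the negative-distance form of the dense term, and the weighting scheme are all assumptions.

```python
import numpy as np

def hybrid_reward(traj_flow, prior_flow, milestone_reached,
                  alpha=1.0, beta=1.0):
    """Hypothetical hybrid reward: dense flow matching + sparse state bonus.

    traj_flow:          (T, N, 2) tracked object keypoints from the rollout
    prior_flow:         (T, N, 2) generated object-centric flow prior
    milestone_reached:  bool, sparse state-aware success signal
    """
    # Dense term: negative mean distance between the rollout's object flow
    # and the generated flow prior (closer tracking -> higher reward).
    dense = -alpha * np.linalg.norm(traj_flow - prior_flow, axis=-1).mean()
    # Sparse term: state-aware bonus, e.g. the object reached a goal pose.
    sparse = beta * float(milestone_reached)
    return dense + sparse
```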

Abstract

Recent advances have shown that video generation models can enhance robot learning by deriving effective robot actions through inverse dynamics. However, these methods heavily depend on the quality of generated data and struggle with fine-grained manipulation due to the lack of environment feedback. While video-based reinforcement learning improves policy robustness, it remains constrained by the uncertainty of video generation and the challenges of collecting large-scale robot datasets for training diffusion models. To address these limitations, we propose GenFlowRL, which derives shaped rewards from object-centric flow produced by a generative model trained on easy-to-collect cross-embodiment datasets. This enables learning generalizable and robust policies from expert demonstrations using low-dimensional, object-centric features. Experiments on 10 manipulation tasks, spanning simulation and real-world cross-embodiment evaluations, demonstrate that GenFlowRL effectively leverages manipulation features extracted from generated object-centric flow, consistently achieving superior performance across diverse and challenging scenarios.

Method Overview

Architectural overview of our proposed GenFlowRL framework, which comprises the flow generation process (left), flow-derived policy learning (middle), and the inference stage (right).

GenFlowRL Framework
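To make the three stages concrete, the sketch below walks one training episode through them, reusing `hybrid_reward` from the sketch above. The `flow_generator`, `policy`, and environment interfaces here are hypothetical stand-ins under our own assumptions, not the paper's API.

```python
def run_genflowrl_episode(env, flow_generator, policy):
    """One hypothetical episode: generate a flow prior, roll out, shape rewards."""
    obs = env.reset()
    # (1) Flow generation: predict an object-centric flow prior for this task.
    prior_flow = flow_generator.generate(obs)  # shape (T, N, 2), assumed
    rollout = []
    done, t = False, 0
    while not done:
        action = policy.act(obs)
        next_obs, done, info = env.step(action)  # assumed env interface
        # (2) Flow-derived policy learning: reward each step by matching the
        # tracked object flow against the corresponding slice of the prior.
        reward = hybrid_reward(info["object_flow"][None],
                               prior_flow[t][None],
                               info["milestone_reached"])
        rollout.append((obs, action, reward, next_obs))
        obs, t = next_obs, t + 1
    return rollout

# (3) Inference: deploy the learned policy directly; the flow prior and
# reward model are only needed during training.
```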

Evaluations

Simulation Tasks

We conduct experiments on five challenging contact-rich and deformable manipulation tasks from the prior work Im2Flow2Act, comparing our method against the Im2Flow2Act baseline and other flow-based imitation learning algorithms.

Comparison with other methods

Comparison with Baseline Models

We conduct experiments on five contact-rich manipulation tasks from the MetaWorld benchmark, comparing our method against other video-based reward-shaping methods.

Baseline comparison

Real-world human hand flow-based reward shaping

We conduct real-world experiments to evaluate the effectiveness of delta-flow-derived reward shaping on cross-embodiment human-hand demonstrations; a minimal sketch of this reward follows the demonstration below.

Force and position control demonstration
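A plausible reading of "delta flow" in this cross-embodiment setting is that the reward compares per-step flow differences (motion deltas) rather than absolute keypoint positions, which keeps human-hand and robot trajectories comparable despite their different geometries. The sketch below encodes that assumption; the name and exact form are ours, not the paper's.

```python
import numpy as np

def delta_flow_reward(robot_flow, human_flow, scale=1.0):
    """Hypothetical delta-flow reward for cross-embodiment matching.

    robot_flow: (T, N, 2) tracked object keypoints from the robot rollout
    human_flow: (T, N, 2) object keypoints tracked in the human demonstration
    """
    # Compare motion deltas instead of absolute positions, so embodiment
    # differences (hand vs. gripper) do not dominate the distance.
    robot_delta = np.diff(robot_flow, axis=0)  # (T-1, N, 2)
    human_delta = np.diff(human_flow, axis=0)
    dist = np.linalg.norm(robot_delta - human_delta, axis=-1).mean()
    return -scale * dist
```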

BibTeX

@article{yu2025genflowrl,
  title={GenFlowRL: Shaping Rewards with Generative Object-Centric Flow in Visual Reinforcement Learning},
  author={Yu, Kelin and Zhang, Sheng and Soora, Harshit and Huang, Furong and Huang, Heng and Tokekar, Pratap and Gao, Ruohan},
  journal={arXiv preprint arXiv:2505.20498},
  year={2025}
}