verl-project/verl

[Bug][CI] FSDP2 test in `model_rmpad` job seems unstable

Open

#1,388 opened on May 4, 2025

Labels: bug, call for contribution, good first issue

Description

Motivation

The run history at https://github.com/volcengine/verl/actions/workflows/model.yml shows that the FSDP2 test in the model_rmpad job is flaky:

  1. it fails on some runs;
  2. it passes on others.

Plan

  • Find a setup that reproduces the failure reliably (possibly using the test container)
  • Locate the root cause
  • Fix the bug
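
As a starting point for the first step of the plan, here is a minimal sketch of a helper that runs a test command repeatedly and records which runs fail, which gives a rough failure rate to compare across setups. The function name `measure_flakiness` and its parameters are hypothetical and not part of verl; the command to pass in is whatever the model_rmpad job actually invokes, which is not reproduced here.

```python
import subprocess
import sys


def measure_flakiness(cmd, runs=20):
    """Run `cmd` (a list of argv strings) `runs` times.

    Returns the indices of the runs that exited with a non-zero status.
    """
    failures = []
    for i in range(runs):
        # Each run is a fresh process, so no state leaks between attempts.
        result = subprocess.run(cmd, capture_output=True, text=True)
        if result.returncode != 0:
            failures.append(i)
            # Print only the tail of the output to keep the log readable.
            print(f"run {i}: FAILED (exit code {result.returncode})", file=sys.stderr)
            print(result.stderr[-2000:], file=sys.stderr)
    print(f"{len(failures)}/{runs} runs failed", file=sys.stderr)
    return failures
```

Running this inside the CI test container with the job's exact command, and again on a local machine, would show whether the failure depends on the environment or occurs at a similar rate everywhere.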
