Description
Motivation
There are many issues related to out-of-memory (OOM) errors, e.g. #328. A clear guide on how to resolve OOM would help.
Plan
A non-exhaustive list of related configurations:
- Rollout:
  - `gpu_memory_utilization`
- Other Inference:
  - Liger Kernel
  - `*_max_len_per_gpu` / `micro_batch_size_per_gpu`
- Training:
  - Liger Kernel
  - Ulysses Sequence Parallelism
  - gradient checkpointing
  - offload
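
As a rough illustration of the rollout knob listed above, the sketch below assumes `gpu_memory_utilization` is the fraction of a GPU's total memory that the rollout engine may claim (the meaning vLLM gives to the option of that name; the exact semantics here may differ by version). The function name `split_gpu_memory` and the 80 GiB device size are hypothetical, chosen only for the example.

```python
def split_gpu_memory(total_gib: float, gpu_memory_utilization: float) -> tuple[float, float]:
    """Return (rollout_budget_gib, remaining_gib) for a single GPU.

    Assumption: the rollout engine claims `gpu_memory_utilization` of the
    device's total memory, and everything else must fit in the remainder.
    """
    if not 0.0 < gpu_memory_utilization <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    rollout_budget = total_gib * gpu_memory_utilization
    return rollout_budget, total_gib - rollout_budget


if __name__ == "__main__":
    # Hypothetical 80 GiB device; lower values leave more room for training.
    for util in (0.4, 0.6, 0.8):
        rollout, rest = split_gpu_memory(80.0, util)
        print(f"util={util:.1f}: rollout ~{rollout:.0f} GiB, remaining ~{rest:.0f} GiB")
```

Under this assumption, lowering the value trades rollout capacity (for example, KV cache space) for headroom elsewhere, which is why it is usually the first setting to revisit when OOM occurs outside the rollout phase.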
TODO
- Complete the list of related configurations
- Benchmark the effect & overhead of each configuration
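
One possible way to structure the benchmark is a one-at-a-time ablation: start from a baseline and change a single setting per run, so that any change in peak memory or step time can be attributed to that setting. The sketch below only builds the list of runs. The `MemoryConfig` fields and the candidate values are hypothetical placeholders, not the project's actual configuration keys, and the measurement itself is left out.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class MemoryConfig:
    # Hypothetical field names standing in for the settings listed in the plan.
    gpu_memory_utilization: float = 0.5
    use_liger_kernel: bool = False
    ulysses_sp_size: int = 1
    gradient_checkpointing: bool = False
    offload: bool = False


# Hypothetical value to try for each setting, one setting per run.
CANDIDATES = {
    "gpu_memory_utilization": 0.4,
    "use_liger_kernel": True,
    "ulysses_sp_size": 2,
    "gradient_checkpointing": True,
    "offload": True,
}


def one_at_a_time(baseline: MemoryConfig) -> list[tuple[str, MemoryConfig]]:
    """Return the baseline plus one variant per setting, each differing in one field."""
    runs = [("baseline", baseline)]
    for name, value in CANDIDATES.items():
        runs.append((name, replace(baseline, **{name: value})))
    return runs


if __name__ == "__main__":
    for label, cfg in one_at_a_time(MemoryConfig()):
        print(f"{label}: {cfg}")
```

Each run would then record peak memory and time per step; comparing a variant against the baseline gives the effect and the overhead of that single setting.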