CodeAgent-RL: Replication of the RL pipeline described in the Composer 2 report for tool-using code agents
Hermes-4-14B, vLLM, DeepSpeed ZeRO-3, PyTorch Distributed, GRPO / Dr. GRPO, Modal
Read the blog post | Code on GitHub
Toward a 6+ TB/s CuTeDSL MXFP8 Quantizer for Blackwell
Against expectations, this somehow worked.
Code on GitHub