CodeAgent-RL: Replicating the RL pipeline for tool-using code agents described in the Composer 2 report
Hermes-4-14B, vLLM, DeepSpeed ZeRO-3, PyTorch Distributed, GRPO / Dr. GRPO, Modal
Read the blog post  |  Code on GitHub

Toward a 6+ TB/s CuTeDSL MXFP8 Quantizer for Blackwell
Against expectations, this somehow worked
Code on GitHub