Pull requests: huggingface/nanotron

Pull requests list

Adding support for training chat models
#187 opened May 28, 2024 by TJ-Solergibert
[Feature] Monitor model states during training
#183 opened May 25, 2024 by xrsrke
Fix overflow in nanosets with big datasets
#182 opened May 23, 2024 by jquesnelle
Ring attention
#181 opened May 23, 2024 by zzhhjjj
Llama3 conversion scripts 🦙
#174 opened May 20, 2024 by TJ-Solergibert
9 tasks done
Fix _RowLinearAsyncCommunication
#172 opened May 16, 2024 by C-TC
[Feature] Mixture of Depths
#171 opened May 15, 2024 by xrsrke Draft
[Feature] Infini Attention
#169 opened May 14, 2024 by xrsrke Draft
Core attention
#168 opened May 13, 2024 by zzhhjjj
Adding checkpoint after training ends
#165 opened May 7, 2024 by angegonzalez
Enable masking when tp=1
#160 opened May 2, 2024 by YongjunHe
llama tests
#157 opened Apr 30, 2024 by zzhhjjj
Fix TestContext warning
#156 opened Apr 29, 2024 by AleHD
Checkpoint 1.3 backwards compatibility
#152 opened Apr 25, 2024 by AleHD
3 tasks done
readme
#145 opened Apr 22, 2024 by zzhhjjj
Use CUDA Events for measuring elapsed time
#143 opened Apr 20, 2024 by staghado
Haojun/inference
#142 opened Apr 19, 2024 by zzhhjjj
[Feature] Use uv instead of pip in CI/CD
#116 opened Mar 26, 2024 by xrsrke
Integrating ScatterMoE
#104 opened Mar 15, 2024 by shawntan
[FP8 Training] End-to-end FP8 Training
#70 opened Feb 15, 2024 by xrsrke