WandbCallback always (!) uploads entire model checkpoint to wandb #30896
Comments
Hi @mgerstgrasser, thanks for reporting. Not sure I got the issue, did you mean that once enabled, wandb will upload the model in all subsequent script runs even if |
No, I mean that right now in 4.41.0, enabling |
+1 on this observation. I recently upgraded my transformers version and noticed this issue. Can we turn off the uploading of model weights by default and require an explicit parameter to enable it? As it stands, this consumes a lot of bandwidth without the end user knowing, and it was quite an unpleasant surprise that unfortunately took me a day to figure out :\ |
@mgerstgrasser Would you like to add a flag in your PR #30897 to control this behaviour? As we haven't heard from @parambharat, this issue is now being flagged by others (thanks @wongjingping!), and it seems like something we generally might not want to do, I'd say let's add it, and we can always change the default behaviour in the future if necessary. |
Done! |
Update: we're going to be reverting #30135 in a patch release, that'll be released soon. The callback changes will stay on main, so:
This will give us time to review/test and make sure this integration is working well for all. Thanks for iterating on this so quickly @mgerstgrasser! |
System Info
transformers==4.41.0
Who can help?
@pacman100 @muellerzr @amyeroberts
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
--report_to wandb in any script using Trainer.
Expected behavior
I would expect this to either (a) not upload the initial model checkpoint at all, or (b) only do this if explicitly configured.
As it is, it seems that every run in 4.41.0 that logs to wandb will upload the entire initial model checkpoint to wandb.
This seems to be caused by #30135
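To make option (b) concrete, here is a minimal sketch of what an opt-in gate could look like. It is not the actual `WandbCallback` code and not the change made in #30897. The environment variable name `EXAMPLE_UPLOAD_INITIAL_MODEL`, the `ExampleCallback` class, and the `upload_fn` parameter are all hypothetical names invented for illustration.

```python
import os

# Hypothetical variable name, chosen for this sketch only.
UPLOAD_ENV_VAR = "EXAMPLE_UPLOAD_INITIAL_MODEL"

_TRUTHY = {"1", "true", "yes", "on"}


def should_upload_initial_model(env=None):
    """Return True only if the user explicitly opted in via the environment."""
    env = os.environ if env is None else env
    return env.get(UPLOAD_ENV_VAR, "").strip().lower() in _TRUTHY


class ExampleCallback:
    """Stand-in for a logging callback; NOT the real WandbCallback."""

    def __init__(self, upload_fn, env=None):
        self.upload_fn = upload_fn  # hypothetical uploader, e.g. an artifact logger
        self.env = env

    def setup(self, model):
        # Upload the initial checkpoint only when explicitly enabled.
        if should_upload_initial_model(self.env):
            self.upload_fn(model)
            return True
        return False
```

With a gate like this, the default is no upload, so a run that only passes `--report_to wandb` would log metrics without also pushing the full checkpoint.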