[BLIP2] BLIP2QFormerLayer is missing the self.intermediate parameter, which makes training from scratch impossible #30846
Comments
Hi @tongda, thanks for raising this issue! Would you like to open a PR with this suggestion?
I agree that this makes sense, although if we do that it won't be backward compatible, as we're going to change the way the layers are designed. I think our implementation follows the original implementation quite closely: https://github.com/salesforce/LAVIS/blob/main/lavis/models/blip2_models/Qformer.py also implements it that way (one needs to pass already processed …)
@younesbelkada Thanks for the confirmation. This is the embedding `forward` in the original LAVIS Qformer:

```python
def forward(
    self,
    input_ids=None,  # <---- input_ids is the instruction for the Qformer
    position_ids=None,
    query_embeds=None,  # <---- query_embeds is the embedding of the pretrained query_tokens of the Qformer
    past_key_values_length=0,
):
    if input_ids is not None:
        seq_length = input_ids.size()[1]
    else:
        seq_length = 0

    if position_ids is None:
        position_ids = self.position_ids[
            :, past_key_values_length : seq_length + past_key_values_length
        ].clone()

    if input_ids is not None:
        embeddings = self.word_embeddings(input_ids)
        if self.position_embedding_type == "absolute":
            position_embeddings = self.position_embeddings(position_ids)
            embeddings = embeddings + position_embeddings

        if query_embeds is not None:
            # <---- if input_ids exists, the final embeddings are the concatenation of both;
            # the query-embed part and the instruction-embed part are treated differently
            # in the later processing
            embeddings = torch.cat((query_embeds, embeddings), dim=1)
    else:
        embeddings = query_embeds

    embeddings = self.LayerNorm(embeddings)
    embeddings = self.dropout(embeddings)
    return embeddings
```

@amyeroberts I would like to make the change. However, some unit tests break; I will open a PR once I figure out how to fix them.
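To make the concatenation step above concrete, here is a minimal shape sketch. All sizes are illustrative (32 query tokens and a hidden size of 768 are just example values, not taken from a specific checkpoint):

```python
import torch

# Hypothetical shapes mirroring the Qformer embedding path: when both
# query_embeds and input_ids are given, the output is their concatenation
# along the sequence dimension, so downstream layers need to know where
# the query part ends and the text-instruction part begins.
batch, num_query_tokens, seq_len, hidden = 2, 32, 8, 768
query_embeds = torch.randn(batch, num_query_tokens, hidden)
text_embeds = torch.randn(batch, seq_len, hidden)  # stand-in for word_embeddings(input_ids)

embeddings = torch.cat((query_embeds, text_embeds), dim=1)
print(embeddings.shape)  # torch.Size([2, 40, 768])
```

This is why a `query_length`-style marker is needed later: after concatenation, nothing in the tensor itself distinguishes query positions from instruction positions.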
Thanks for double checking, this is clearer for me. Indeed, this could make it backward compatible. Let us know when you open the PR!
Thanks for raising this issue, it is actually being addressed at #29261
Cool! I have checked the PR; it indeed includes the change that I need, and even more. Good job.
System Info
transformers: 4.40.2
python: 3.10.14
system: Ubuntu 20.04
Who can help?
@amyeroberts
@NielsRogge
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Refer to the `BLIP2QFormerLayer` code here and here: the `self.intermediate` parameter is used to project the text-instruction embedding into the LLM latent space. But `BLIP2QFormerModel.forward` only accepts `query_embeds` as input, which is always the embedded `query_tokens`, so the text-instruction path in `BLIP2QFormerLayer` never runs.

According to the original paper: although the inference path does not need the query-text interaction, the model should provide a way to perform query-text interaction during the training phase.
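A minimal sketch of why that path is dead code. The variable names here are illustrative, not the actual transformers internals:

```python
import torch

# Since BLIP2QFormerModel.forward only receives query_embeds, every
# position in the hidden states is a query token. Any text-only branch
# gated on positions beyond query_length therefore can never execute.
num_query_tokens = 32
hidden_states = torch.randn(2, num_query_tokens, 768)  # only query embeddings ever arrive

query_length = num_query_tokens
has_text_tokens = hidden_states.shape[1] > query_length
print(has_text_tokens)  # False: the text-instruction feed-forward never runs
```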
Expected behavior
The current implementation forces the user to embed the query outside of the Qformer. To minimize side effects, I think a simple fix is to add the `self.intermediate` and `self.output` parameters back to `BLIP2QFormerLayer`, and to add a `query_length` parameter to `BLIP2QFormerModel.forward()` that marks the length of `query_tokens`. Nevertheless, the embedding of the text instruction for the Qformer should be consistent with the Qformer backbone, rather than with the LLM embedding.