
About the self-attention of DETR decoder #9

Open
jiugexuan opened this issue Oct 1, 2024 · 3 comments

Comments

@jiugexuan

In paper:
We propose a novel lightweight relation extractor, EGTR, which exploits the self-attention of DETR decoder, as depicted in Fig. 3. Since the self-attention weights in Eq. (1) contain N × N bidirectional relationships among the N object queries, our relation extractor aims to extract the predicate information from the self-attention weights in the entire L layers, by considering the attention queries and keys as subjects and objects, respectively.
Is the self-attention of the DETR decoder the masked multi-head attention layer of the original Transformer decoder?

@jinbae
Collaborator

jinbae commented Oct 2, 2024

Please refer to the DETR paper.
The self-attention of the DETR decoder (not masked) is different from that of the original Transformer decoder (masked).

The difference with the original transformer is that our model decodes the N objects in parallel at each decoder layer,
while Vaswani et al. [47] use an autoregressive model that predicts the output sequence one element at a time.
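
A minimal sketch in plain PyTorch (not the DETR source; sizes are illustrative assumptions) of the difference: the DETR decoder's self-attention runs over all N object queries in parallel with no mask, so the attention weights cover the full N × N matrix, whereas the original Transformer decoder applies a causal mask.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions): N object queries, hidden size, attention heads.
N, d_model, nhead = 100, 256, 8
queries = torch.randn(N, 1, d_model)   # (N, batch=1, d_model) object queries

self_attn = nn.MultiheadAttention(d_model, nhead)

# DETR-style decoder self-attention: no attn_mask, every query attends to
# every other query, so the weights cover all N x N (bidirectional) pairs.
_, attn_weights = self_attn(queries, queries, queries, attn_mask=None)
print(attn_weights.shape)              # torch.Size([1, 100, 100]), averaged over heads

# Original Transformer decoder: a causal mask blocks attention to later positions.
causal_mask = torch.triu(torch.full((N, N), float("-inf")), diagonal=1)
_, masked_weights = self_attn(queries, queries, queries, attn_mask=causal_mask)
```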

@jiugexuan
Author

So the q, k used for the relations come from the first attention layer (the self-attention) of the Transformer decoder layers?

[attached screenshot]

From here?

@jinbae
Collaborator

jinbae commented Oct 17, 2024

Yes, that's right.
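
For reference, a hedged sketch (not the actual EGTR code; the plain-PyTorch modules and hooked attributes below are assumptions about how such a decoder might be organized) of how the self-attention queries and keys could be pulled from every decoder layer and paired as subject/object features, as described in the quoted paper text:

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions): hidden size, heads, L decoder layers, N queries.
d_model, nhead, L, N = 256, 8, 6, 100
layer = nn.TransformerDecoderLayer(d_model, nhead)
decoder = nn.TransformerDecoder(layer, num_layers=L)

tgt = torch.randn(N, 1, d_model)       # N object queries, batch size 1
memory = torch.randn(50, 1, d_model)   # stand-in for the encoder output

qk_per_layer = []

def capture_qk(module, inputs, output):
    # inputs[0] is the tensor fed as query/key/value to the first attention
    # sub-layer of each decoder layer, i.e. the (unmasked) self-attention.
    x = inputs[0]                                      # (N, 1, d_model)
    w_q, w_k, _ = module.in_proj_weight.chunk(3, dim=0)
    b_q, b_k, _ = module.in_proj_bias.chunk(3, dim=0)
    q = x @ w_q.T + b_q                                # attention queries -> subjects
    k = x @ w_k.T + b_k                                # attention keys    -> objects
    qk_per_layer.append((q, k))

hooks = [lyr.self_attn.register_forward_hook(capture_qk) for lyr in decoder.layers]
decoder(tgt, memory)
for h in hooks:
    h.remove()

# One (q, k) pair per decoder layer: L pairs, each giving N x N
# subject-object combinations from which predicate features can be built.
print(len(qk_per_layer), qk_per_layer[0][0].shape)     # 6 torch.Size([100, 1, 256])
```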
