Details the Q-Former architecture: a 12-layer BERT-based model using 32 learnable query embeddings. These queries use cross-attention to extract visual information for MLLM input.Details the Q-Former architecture: a 12-layer BERT-based model using 32 learnable query embeddings. These queries use cross-attention to extract visual information for MLLM input.

Visual Prompt Generation: Cross-Attention in Q-Former

Abstract and 1 Introduction

  1. Related Work

    2.1. Multimodal Learning

    2.2. Multiple Instance Learning

  2. Methodology

    3.1. Preliminaries and Notations

    3.2. Relations between Attention-based VPG and MIL

    3.3. MIVPG for Multiple Visual Inputs

    3.4. Unveiling Instance Correlation in MIVPG for Enhanced Multi-instance Scenarios

  3. Experiments and 4.1. General Setup

    4.2. Scenario 1: Samples with Single Image

    4.3. Scenario 2: Samples with Multiple Images, with Each Image as a General Embedding

    4.4. Scenario 3: Samples with Multiple Images, with Each Image Having Multiple Patches to be Considered and 4.5. Case Study

  4. Conclusion and References

\ Supplementary Material

A. Detailed Architecture of QFormer

B. Proof of Proposition

C. More Experiments

\ Figure 7. Overview of QFormer

A. Detailed Architecture of QFormer

The architecture overview is depicted in Figure 7. Specifically, QFormer is initialized as a BERT-based model[8] comprising a total of L = 12 layers. In contrast to typical BERT models that process textual inputs, QFormer takes R = 32 learnable query embeddings as inputs. These embeddings are utilized to extract visual information from the input visual data during Stage-1 pretraining in BLIP2[22]. Subsequently, they serve as visual prompt embeddings for the LLM inputs after projection.

\ Inside the QFormer, each layer includes a self-attention module composed of a Multi-Head Attention component and a Forward module (consisting of Linear, LayerNorm, and Residual Connection). The cross-attention module, initialized with random values, is inserted every G layers, where learnable query embeddings interact with visual embeddings. In the main paper, for the sake of conciseness, we condensed the representation of the multi-head attention and forward modules into self(cross) attention modules. Furthermore, we exclusively illustrated the modifications made to the cross-attention module in MIVPG, as the self-attention modules remain unchanged. The final QFormer output is represented by the last layer’s query embeddings.

\ For a more comprehensive understanding, readers are encouraged to refer to [22].

\

:::info Authors:

(1) Wenliang Zhong, The University of Texas at Arlington (wxz9204@mavs.uta.edu);

(2) Wenyi Wu, Amazon (wenyiwu@amazon.com);

(3) Qi Li, Amazon (qlimz@amazon.com);

(4) Rob Barton, Amazon (rab@amazon.com);

(5) Boxin Du, Amazon (boxin@amazon.com);

(6) Shioulin Sam, Amazon (shioulin@amazon.com);

(7) Karim Bouyarmane, Amazon (bouykari@amazon.com);

(8) Ismail Tutar, Amazon (ismailt@amazon.com);

(9) Junzhou Huang, The University of Texas at Arlington (jzhuang@uta.edu).

:::


:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::

\

Market Opportunity
Prompt Logo
Prompt Price(PROMPT)
$0.04975
$0.04975$0.04975
+3.21%
USD
Prompt (PROMPT) Live Price Chart
Disclaimer: The articles reposted on this site are sourced from public platforms and are provided for informational purposes only. They do not necessarily reflect the views of MEXC. All rights remain with the original authors. If you believe any content infringes on third-party rights, please contact service@support.mexc.com for removal. MEXC makes no guarantees regarding the accuracy, completeness, or timeliness of the content and is not responsible for any actions taken based on the information provided. The content does not constitute financial, legal, or other professional advice, nor should it be considered a recommendation or endorsement by MEXC.

You May Also Like

Dogecoin (DOGE) Breaks Below Key Levels With Possible Upside Toward $0.18

Dogecoin (DOGE) Breaks Below Key Levels With Possible Upside Toward $0.18

Dogecoin (DOGE) is at a critical support level following the market downturn. The cryptocurrency is in a crucial area, which is a demand zone. The outcome of this
Share
Tronweekly2025/12/27 12:00
‘Love Island Games’ Season 2 Release Schedule—When Do New Episodes Come Out?

‘Love Island Games’ Season 2 Release Schedule—When Do New Episodes Come Out?

The post ‘Love Island Games’ Season 2 Release Schedule—When Do New Episodes Come Out? appeared on BitcoinEthereumNews.com. LOVE ISLAND GAMES — Episode 201 — Pictured: Ariana Madix — (Photo by: Ben Symons/PEACOCK via Getty Images) Ben Symons/PEACOCK via Getty Images We’ve got a text! It’s time for another season of Love Island Games. With fan-favorites returning in hopes of winning the $250,000 cash prize, read on to learn more about Love Island Games Season 2, including the release schedule so you don’t miss a second of drama. Love Island Games is a spinoff in the Love Island franchise that first premiered in 2023. The show follows a similar format to the original series, but with one major twist: all contestants are returning Islanders from previous seasons of Love Island from around the world, including the USA, UK, Australia and more. Another big difference is that games take on much more importance in Love Island Games than the mothership version, with the results “determining advantages, risks, and even who stays and who goes,” according to Peacock. Vanderpump Rules star Ariana Madix is taking over hosting duties for Love Island Games Season 2, replacing Love Island UK star Maya Jama who hosted the first season. Iain Stirling returns as the show’s narrator, while UK alum Maura Higgins will continue to host the Saturday show Love Island: Aftersun. ForbesWho’s In The ‘Love Island Games’ Season 2 Cast? Meet The IslandersBy Monica Mercuri Jack Fowler and Justine Ndiba were named the first-ever winners of Love Island Games in 2023. Justine had previously won Love Island USA Season 2 with Caleb Corprew, while Jack was a contestant on Love Island UK Season 4. In March 2024, Fowler announced on his Instagram story that he and Justine decided to remain “just friends.” The Season 2 premiere revealed the first couples of the season: Andrea Carmona and Charlie Georgios, Andreina Santos-Marte and Tyrique Hyde,…
Share
BitcoinEthereumNews2025/09/18 04:50
Ethena Price Forecast: ENA Could Surge to $0.38 After Falling Wedge Formation

Ethena Price Forecast: ENA Could Surge to $0.38 After Falling Wedge Formation

Ethena continues to move in the right direction in terms of improving trust and institutional credibility for its synthetic dollar, USDe, given its recent addition
Share
Tronweekly2025/12/27 12:30