Open‑YOLO 3D replaces costly SAM/CLIP steps with 2D detection, LG label‑maps, and parallelized visibility, enabling fast and accurate 3D OV segmentation.

Drop the Heavyweights: YOLO‑Based 3D Segmentation Outpaces SAM/CLIP

Abstract and 1 Introduction

  2. Related works
  3. Preliminaries
  4. Method: Open-YOLO 3D
  5. Experiments
  6. Conclusion and References

A. Appendix

3 Preliminaries

Problem formulation: 3D instance segmentation aims to segment individual objects within a 3D scene and assign one class label to each segmented object. In the open-vocabulary (OV) setting, the class label can be either a class seen in the training set or a novel class. To this end, let P denote a 3D point cloud scene reconstructed from a sequence of RGB-D images. We denote the RGB image frames as I and their corresponding depth frames as D. Similar to recent methods [35, 42, 34], we assume that the poses and camera parameters are available for the input 3D scene.

3.1 Baseline Open-Vocabulary 3D Instance Segmentation

We base our approach on OpenMask3D [42], which is the first method that performs open-vocabulary 3D instance segmentation in a zero-shot manner. OpenMask3D has two main modules: a class-agnostic mask proposal head, and a mask-feature computation module. The class-agnostic mask proposal head uses a transformer-based pre-trained 3D instance segmentation model [39] to predict a binary mask for each object in the point cloud. The mask-feature computation module first generates 2D segmentation masks by projecting 3D masks into views in which the 3D instances are highly visible, and refines them using the SAM [23] model. A pre-trained CLIP vision-language model [55] is then used to generate image embeddings for the 2D segmentation masks. The embeddings are then aggregated across all the 2D frames to generate a 3D mask-feature representation.
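
The aggregation and querying step of this baseline can be illustrated with a minimal sketch. It assumes the per-view image embeddings and the text embeddings have already been computed and are passed in as plain arrays; average pooling is used here as one plausible aggregation, and the function names `l2_normalize`, `mask_feature`, and `query` are hypothetical, not part of OpenMask3D.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis`."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def mask_feature(view_embeddings):
    """Pool per-view embeddings (V, D) of one 3D mask into a single (D,) feature.

    Assumption: average pooling of unit-length embeddings.
    """
    return l2_normalize(l2_normalize(view_embeddings).mean(axis=0))

def query(mask_feat, text_embeddings):
    """Cosine similarity between one mask feature (D,) and T text embeddings (T, D)."""
    return l2_normalize(text_embeddings) @ mask_feat
```

The mask is then assigned the text prompt with the highest similarity, for example `np.argmax(query(mask_feature(views), texts))`.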

Limitations: OpenMask3D leverages advances in 2D segmentation (SAM) and vision-language models (CLIP) to generate and aggregate 2D feature representations, which enables querying instances by open-vocabulary concepts. However, this approach carries a high computational burden that leads to slow inference, with a processing time of 5-10 minutes per scene. The burden mainly originates from two sub-tasks: the 2D segmentation of the large number of objects across the various 2D views, and the 3D feature aggregation based on object visibility. We next introduce our proposed method, which aims to reduce the computational burden and improve task accuracy.

4 Method: Open-YOLO 3D

Motivation: We present our proposed 3D open-vocabulary instance segmentation method, Open-YOLO 3D, which aims to generate 3D instance predictions efficiently. Our method introduces efficient and improved modules at the task level as well as the data level.

Task Level: Unlike OpenMask3D, which generates segmentations of the projected 3D masks, we pursue a more efficient approach that relies on 2D object detection. Since the end goal is to generate labels for the 3D masks, the extra computation of the 2D segmentation task is not necessary.

Data Level: OpenMask3D computes the 3D mask visibility in 2D frames by iteratively counting visible points for each mask across all frames. This approach is time-consuming, so we propose an alternative that computes the 3D mask visibility within all frames at once.

4.1 Overall Architecture

4.2 3D Object Proposal

4.3 Low Granularity (LG) Label-Maps

4.4 Accelerated Visibility Computation (VAcc)

In order to associate 2D label maps with 3D proposals, we compute the visibility of each 3D mask. To this end, we propose a fast approach that is able to compute 3D mask visibility within frames via tensor operations which are highly parallelizable.
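
As an illustration of what a batched visibility computation could look like, the NumPy sketch below projects every point into every frame, tests in-frame and depth consistency, and obtains per-frame, per-mask visible-point counts with a single matrix product. This is a sketch under stated assumptions and not the authors' implementation: the pinhole model with one shared intrinsics matrix, the depth tolerance `tol`, and the names `project_points` and `visibility_counts` are all hypothetical.

```python
import numpy as np

def project_points(points, intrinsics, world_to_cam):
    """Project N world points (N, 3) into F frames.

    intrinsics: (3, 3) pinhole matrix; world_to_cam: (F, 4, 4) poses.
    Returns pixel coordinates (F, N, 2) and camera-space depth (F, N).
    """
    homo = np.concatenate([points, np.ones((len(points), 1))], axis=1)  # (N, 4)
    cam = np.einsum("fij,nj->fni", world_to_cam, homo)[..., :3]         # (F, N, 3)
    z = cam[..., 2]
    safe_z = np.where(z > 0, z, 1.0)  # avoid dividing by non-positive depth
    pix = np.einsum("ij,fnj->fni", intrinsics, cam)[..., :2] / safe_z[..., None]
    return pix, z

def visibility_counts(points, masks, intrinsics, world_to_cam, depth, tol=0.05):
    """Count the visible points of every mask in every frame without Python loops.

    masks: (M, N) boolean point membership; depth: (F, H, W) depth maps.
    Returns an (F, M) integer array.
    """
    num_frames, height, width = depth.shape
    pix, z = project_points(points, intrinsics, world_to_cam)
    u = np.rint(pix[..., 0]).astype(int)
    v = np.rint(pix[..., 1]).astype(int)
    in_frame = (z > 0) & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    # Clip so the gather below is always in bounds; in_frame discards clipped hits.
    uc = np.clip(u, 0, width - 1)
    vc = np.clip(v, 0, height - 1)
    sensed = depth[np.arange(num_frames)[:, None], vc, uc]  # (F, N)
    # Assumed occlusion test: projected depth must agree with the sensor depth.
    visible = in_frame & (np.abs(sensed - z) < tol)         # (F, N)
    return visible.astype(np.int64) @ masks.T.astype(np.int64)  # (F, M)
```

The final matrix product replaces the per-mask, per-frame counting loop, so the same code maps directly onto GPU tensor libraries.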

Figure 3: Multi-View Prompt Distribution (MVPDist). After creating the LG label maps for all frames, we select the top-k label maps based on the 2D projection of the 3D proposal. Using the (x, y) coordinates of the 2D projection, we choose the labels from the LG label maps to generate the MVPDist. This distribution predicts the ID of the text prompt with the highest probability.
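
The procedure in the caption above can be sketched as follows for a single 3D proposal. The sketch assumes that the top-k frames are ranked by the number of visible points, that label maps store an integer prompt ID per pixel with -1 marking unlabeled pixels, and that projections are already rounded to integer pixel coordinates; the function name `mvp_dist` and its parameters are hypothetical.

```python
import numpy as np

def mvp_dist(label_maps, pix, visible, num_prompts, k=2):
    """Build a prompt-ID distribution for one 3D proposal.

    label_maps: (F, H, W) int prompt ID per pixel, -1 where no label (assumption).
    pix: (F, N, 2) integer (x, y) projections of the proposal's N points.
    visible: (F, N) boolean visibility of each point in each frame.
    """
    # Rank frames by how many of the proposal's points they see; keep the top k.
    top = np.argsort(-visible.sum(axis=1), kind="stable")[:k]
    # (frame, point) index pairs for every visible point in the selected frames.
    f_idx, p_idx = np.nonzero(visible[top])
    frames = top[f_idx]
    x = pix[frames, p_idx, 0]
    y = pix[frames, p_idx, 1]
    # Read the label under each projected point and drop unlabeled pixels.
    labels = label_maps[frames, y, x]
    labels = labels[labels >= 0]
    hist = np.bincount(labels, minlength=num_prompts).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist
```

The predicted class is then the prompt ID with the highest probability, `int(np.argmax(dist))`; an all-zero result means no labeled pixel was hit in the selected frames.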

4.5 Multi-View Prompt Distribution (MVPDist)

Table 1: State-of-the-art comparison on ScanNet200 validation set. We use Mask3D trained on the ScanNet200 training set to generate class-agnostic mask proposals. Our method demonstrates better performance compared to those that generate 3D proposals by fusing 2D masks and proposals from a 3D network (highlighted in gray in the table). It outperforms state-of-the-art methods by a wide margin under the same conditions using only proposals from a 3D network.

4.6 Instance Prediction Confidence Score

:::info Authors:

(1) Mohamed El Amine Boudjoghra, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (mohamed.boudjoghra@mbzuai.ac.ae);

(2) Angela Dai, Technical University of Munich (TUM) (angela.dai@tum.de);

(3) Jean Lahoud, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (jean.lahoud@mbzuai.ac.ae);

(4) Hisham Cholakkal, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) (hisham.cholakkal@mbzuai.ac.ae);

(5) Rao Muhammad Anwer, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Aalto University (rao.anwer@mbzuai.ac.ae);

(6) Salman Khan, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Australian National University (salman.khan@mbzuai.ac.ae);

(7) Fahad Shahbaz Khan, Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI) and Australian National University (fahad.khan@mbzuai.ac.ae).

:::


:::info This paper is available on arXiv under the CC BY-NC-SA 4.0 (Attribution-NonCommercial-ShareAlike 4.0 International) license.

:::
