MaGGIe improves accuracy and efficiency in instance matting via transformer attention and sparse convolution, with future goals in weakly-supervised learning.

MaGGIe Roadmap: Overcoming Data Generalization in Matting Models

Abstract and 1. Introduction

  2. Related Works

  3. MaGGIe

    3.1. Efficient Masked Guided Instance Matting

    3.2. Feature-Matte Temporal Consistency

  4. Instance Matting Datasets

    4.1. Image Instance Matting and 4.2. Video Instance Matting

  5. Experiments

    5.1. Pre-training on image data

    5.2. Training on video data

  6. Discussion and References

Supplementary Material

  7. Architecture details

  8. Image matting

    8.1. Dataset generation and preparation

    8.2. Training details

    8.3. Quantitative details

    8.4. More qualitative results on natural images

  9. Video matting

    9.1. Dataset generation

    9.2. Training details

    9.3. Quantitative details

    9.4. More qualitative results

6. Discussion

Limitation and Future work. MaGGIe performs effectively on human video instance matting with binary mask guidance, yet several limitations remain. One is the reliance on a one-hot representation for each location in the guidance masks, which requires every pixel to be associated with exactly one instance. This can pose challenges when combining instance masks from different sources, leading to misalignments in some regions. Additionally, training on composite datasets may limit the model's ability to generalize to natural, real-world scenarios. While building a comprehensive natural dataset remains a valuable goal, we propose an interim solution: combining segmentation datasets with self-supervised or weakly-supervised learning techniques. This approach could improve the model's adaptability to more diverse and realistic settings, paving the way for future advancements in the field.
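To make the one-hot constraint concrete, the sketch below (plain NumPy, purely illustrative and not the paper's implementation; `to_one_hot_guidance` is a hypothetical helper) collapses possibly-overlapping binary instance masks into a one-hot guidance map. Where two source masks overlap, argmax silently assigns the pixel to one instance, which is exactly where misalignments between mask sources can appear:

```python
import numpy as np

def to_one_hot_guidance(instance_masks):
    """Collapse possibly-overlapping binary instance masks of shape (N, H, W)
    into a one-hot guidance map where each pixel belongs to at most one instance.

    Overlaps are resolved by argmax order (first instance wins), which is
    where misalignments between masks from different sources can appear.
    """
    n, h, w = instance_masks.shape
    # Prepend an all-zero background channel so empty pixels map to index 0.
    stacked = np.concatenate([np.zeros((1, h, w)), instance_masks], axis=0)
    labels = stacked.argmax(axis=0)          # (H, W) integer labels, 0 = background
    one_hot = np.eye(n + 1)[labels][..., 1:]  # (H, W, N), background channel dropped
    return one_hot.transpose(2, 0, 1)        # back to (N, H, W)
```

Note how an overlap pixel ends up in exactly one channel of the output even though both input masks claimed it; the information that the second mask also covered it is lost.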

Conclusion. Our study contributes to the evolving field of instance matting, with a focus that extends beyond human subjects. By integrating advanced techniques like transformer attention and sparse convolution, MaGGIe shows promising improvements over previous methods in detailed accuracy, temporal consistency, and computational efficiency for both image and video inputs. Additionally, our approach in synthesizing training data and developing a comprehensive benchmarking schema offers a new way to evaluate the robustness and effectiveness of models in instance matting tasks. This work represents a step forward in video instance matting and provides a foundation for future research in this area.
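The efficiency gain from sparse processing mentioned above can be illustrated with a minimal NumPy sketch (this is an assumption-laden toy, not the paper's sparse-convolution pipeline; `sparse_refine` and its thresholds are hypothetical). The idea is to run an expensive refinement only on uncertain alpha pixels, the same way sparse convolution restricts compute to active sites:

```python
import numpy as np

def sparse_refine(coarse_alpha, refine_fn, lo=0.05, hi=0.95):
    """Refine only the uncertain alpha pixels (neither clearly foreground
    nor clearly background), leaving confident pixels untouched.

    Returns the refined map and the fraction of pixels actually processed,
    which is the rough compute saving of sparse processing.
    """
    uncertain = (coarse_alpha > lo) & (coarse_alpha < hi)
    refined = coarse_alpha.copy()
    refined[uncertain] = refine_fn(coarse_alpha[uncertain])
    return refined, uncertain.mean()
```

On typical matting inputs the uncertain band (hair strands, motion blur, soft edges) covers only a small fraction of the frame, so gating the refinement on it is where most of the speedup comes from.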

Acknowledgement. We sincerely thank Markus Woodson for the invaluable initial discussions. Additionally, I am deeply thankful to my wife, Quynh Phung, for her meticulous proofreading and feedback.


:::info Authors:

(1) Chuong Huynh, University of Maryland, College Park (chuonghm@cs.umd.edu);

(2) Seoung Wug Oh, Adobe Research (seoh@adobe.com);

(3) Abhinav Shrivastava, University of Maryland, College Park (abhinav@cs.umd.edu);

(4) Joon-Young Lee, Adobe Research (jolee@adobe.com).

:::


:::info This paper is available on arxiv under CC by 4.0 Deed (Attribution 4.0 International) license.

:::

