Efficient Segmentation Using Attention-Fusion Modules With Dense Predictions


ERİŞEN S.

IEEE ACCESS, vol.13, pp.107552-107565, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 13
  • Publication Date: 2025
  • DOI: 10.1109/access.2025.3581986
  • Journal Name: IEEE ACCESS
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC, Directory of Open Access Journals
  • Page Numbers: pp.107552-107565
  • Hacettepe University Affiliated: No

Abstract

Fusing multi-scale global and local semantic information remains a challenging task for foundation models, owing to computational costs and the need for effective long-range recognition. Building on the recent success of transformers and attention mechanisms, this research applies dense predictions to the attention-based methods of attention-boosting modules and attention-fusion networks with residual layers, resulting in SERNet-Former_v2. Dense predictions are deployed to augment the efficacy of dense-attention-boosting modules and dense-attention-fusion networks, enabling the extraction of global context feature maps in the encoder. This approach also enhances the performance of state-of-the-art segmentation networks, addressing the challenges faced by foundation models such as InternImage. The attention-based algorithms are further applied to InternImage architectures, which combine vision transformers and convolutional layers. Our enhancements improve the test performance of InternImage-H on the Cityscapes test set (86.2% mean IoU), the BDD100K dataset (74.1% mean IoU), and the ADE20K dataset (64.0% mean IoU). SERNet-Former_v2 has also been evaluated on the challenging ADE20K, BDD100K, CamVid, and Cityscapes datasets, showing significant advancements through attention-based methods. SERNet-Former_v2, developed with these methods, achieves noteworthy results: 85.12% mean IoU on the Cityscapes test set, 59.35% mean IoU on the ADE20K validation set, 67.42% mean IoU on the BDD100K validation set, and 85.08% mean IoU on the CamVid dataset.
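
To make the abstract's description of attention-boosting modules, attention-fusion networks, and dense predictions more concrete, the following is a minimal PyTorch-style sketch. It is not the authors' implementation: the class names, channel sizes, sigmoid gating, and the way a global-context map is fused with local encoder features before a per-pixel (dense) prediction head are all assumptions made for illustration.

```python
import torch
import torch.nn as nn

class AttentionBoostingGate(nn.Module):
    """Sketch of an attention-boosting module: re-weights feature maps
    with a sigmoid-gated attention map and keeps a residual path
    (design details are assumptions, not taken from the paper)."""
    def __init__(self, channels: int):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Boost features by their own attention weights, residual-style.
        return x + x * self.gate(x)

class AttentionFusionBlock(nn.Module):
    """Sketch of an attention-fusion step with a dense-prediction head:
    fuses an assumed global-context map with local encoder features,
    then emits a per-pixel class prediction at that scale."""
    def __init__(self, local_ch: int, global_ch: int, num_classes: int):
        super().__init__()
        self.project = nn.Conv2d(global_ch, local_ch, kernel_size=1)
        self.fuse = nn.Sequential(
            nn.Conv2d(2 * local_ch, local_ch, kernel_size=3, padding=1),
            nn.BatchNorm2d(local_ch),
            nn.ReLU(inplace=True),
        )
        self.dense_head = nn.Conv2d(local_ch, num_classes, kernel_size=1)

    def forward(self, local_feat: torch.Tensor, global_feat: torch.Tensor):
        # Upsample the global-context map to the local resolution before fusing.
        g = nn.functional.interpolate(
            self.project(global_feat), size=local_feat.shape[-2:],
            mode="bilinear", align_corners=False)
        fused = self.fuse(torch.cat([local_feat, g], dim=1))
        return fused, self.dense_head(fused)  # fused features + dense prediction

# Usage with assumed tensor sizes: fuse a coarse global map into finer local features.
local = torch.randn(1, 64, 128, 128)    # local encoder features
glob = torch.randn(1, 256, 32, 32)      # global-context feature map
block = AttentionFusionBlock(local_ch=64, global_ch=256, num_classes=19)
fused, dense_pred = block(AttentionBoostingGate(64)(local), glob)
print(fused.shape, dense_pred.shape)    # (1, 64, 128, 128), (1, 19, 128, 128)
```

The dense prediction head here simply illustrates how auxiliary per-pixel outputs could be attached at an intermediate fusion stage; how SERNet-Former_v2 actually wires such heads into its encoder and decoder is described in the full paper, not in this sketch.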