3rd IEEE International Conference on Computer Vision and Machine Intelligence (IEEE CVMI 2024), Allahabad, India, 19 - 20 October 2024, pp. 1-6, (Full Text Paper)
This research proposes an encoder-decoder architecture built on a unique, efficient residual network, EfficientResNet. In the encoder, attention-boosting gates and modules fuse the feature-based semantic information with the global-context output of the efficient residual network. The decoder network is developed with additional attention-fusion networks inspired by the attention-boosting modules. The attention-fusion networks deploy additional convolution layers in the decoder to efficiently improve the one-to-one conversion of semantic information. Our network is tested on the challenging CamVid and Cityscapes datasets, where the proposed methods significantly improve residual networks. To the best of our knowledge, the developed network, SERNet-Former, achieves state-of-the-art results (84.62% mean IoU) on the CamVid dataset and challenging results (87.35% mean IoU on the validation set) on the Cityscapes dataset.
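The abstract does not specify the internal form of the attention-boosting gates, so the following is only a minimal NumPy sketch of one plausible reading: a per-channel sigmoid gate computed from a global-context vector is used to re-weight and additively fuse the encoder's semantic feature maps. The function name `attention_boosting_gate` and the additive fusion are assumptions for illustration, not the paper's actual design.

```python
import numpy as np

def sigmoid(x):
    # Numerically plain logistic function; maps context to (0, 1) gate weights.
    return 1.0 / (1.0 + np.exp(-x))

def attention_boosting_gate(features, context):
    """Hypothetical sketch of an attention-boosting gate.

    features: (C, H, W) semantic feature maps from the encoder
    context:  (C,) global-context vector (e.g. a global average pool)
    Returns feature maps boosted by per-channel attention weights.
    """
    gate = sigmoid(context)[:, None, None]   # per-channel weights in (0, 1)
    return features + gate * features        # additive attention fusion

# Toy usage: with a zero context vector, sigmoid(0) = 0.5,
# so every activation is boosted by a factor of 1.5.
feats = np.ones((4, 2, 2))
ctx = np.zeros(4)
out = attention_boosting_gate(feats, ctx)
print(out[0, 0, 0])  # 1.5
```

A gated residual fusion of this kind is a common way to inject global context into local features; the paper's attention-fusion networks in the decoder additionally interleave convolution layers, which this sketch omits.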