24th ISPRS Congress on Imaging Today, Foreseeing Tomorrow, Nice, France, 6-11 June 2022, vol. 5-2, pp. 211-218
A challenging aspect of developing deep learning-based models for extracting building footprints from very high resolution (< 0.1 m) aerial imagery is the amount of detail contained in the images. Convolutional neural networks (CNNs) for semantic image segmentation have been shown to outperform conventional computer vision and machine learning approaches in various applications. Here, we investigated the performance of two CNN architectures, U-Net and LinkNet, implemented with various backbones and trained on building footprint vectors from a region of Turkey. The dataset includes red-green-blue (RGB) true orthophotos and normalized digital surface model (nDSM) data. The implemented methods were assessed comparatively using the RGB data only and the RGB + nDSM data. The results show that adding the nDSM as a fourth band to the RGB improved the RGB-only accuracy by 3.27% in F1-score and 5.90% in Jaccard (IoU). The highest accuracy, reflected by the F1-score of the validation data, was 97.31%, while the F1-score of the test data, which was excluded from model training, was 96.14%. A vectorization process using GDAL and the Douglas-Peucker simplification algorithm was also performed to obtain the building footprints as polygons.
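The F1-score and Jaccard (IoU) values reported above are standard pixel-level segmentation metrics derived from the confusion counts (true positives, false positives, false negatives). A minimal sketch of how they are computed is below; the counts used are purely illustrative, not the paper's data:

```python
def f1_and_iou(tp, fp, fn):
    """Pixel-level F1 (Dice) and Jaccard (IoU) from confusion counts.

    tp: building pixels correctly predicted as building
    fp: background pixels wrongly predicted as building
    fn: building pixels missed by the model
    """
    f1 = 2 * tp / (2 * tp + fp + fn)
    iou = tp / (tp + fp + fn)
    return f1, iou

# Hypothetical counts for illustration only:
f1, iou = f1_and_iou(tp=9500, fp=300, fn=250)
print(f"F1 = {f1:.4f}, IoU = {iou:.4f}")
```

Note that the two metrics are monotonically related (IoU = F1 / (2 - F1)), which is why a gain in F1 always comes with a larger relative gain in IoU, as in the 3.27% vs. 5.90% improvements reported.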
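The final vectorization step polygonizes the raster prediction and then simplifies the polygon boundaries with the Douglas-Peucker algorithm (in practice via GDAL/OGR's geometry simplification). As a self-contained illustration of the simplification itself, here is a plain-Python Douglas-Peucker sketch; the sample coordinates are invented for the example:

```python
import math

def _point_line_dist(p, a, b):
    # Perpendicular distance from point p to the infinite line through a and b.
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    chord = math.hypot(dx, dy)
    if chord == 0.0:
        return math.hypot(px - ax, py - ay)
    return abs(dx * (ay - py) - dy * (ax - px)) / chord

def douglas_peucker(points, tolerance):
    """Drop vertices that deviate less than `tolerance` from the chord."""
    if len(points) < 3:
        return list(points)
    # Find the vertex farthest from the chord joining the endpoints.
    idx, dmax = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_dist(points[i], points[0], points[-1])
        if d > dmax:
            idx, dmax = i, d
    if dmax <= tolerance:
        return [points[0], points[-1]]  # whole span collapses to the chord
    # Otherwise split at the farthest vertex and recurse on both halves.
    left = douglas_peucker(points[: idx + 1], tolerance)
    right = douglas_peucker(points[idx:], tolerance)
    return left[:-1] + right  # merge, dropping the duplicated split vertex

# A jagged "building edge": the small bumps fall below the tolerance.
edge = [(0, 0), (1, 0.05), (2, -0.04), (3, 0.02), (4, 0)]
print(douglas_peucker(edge, tolerance=0.1))  # -> [(0, 0), (4, 0)]
```

A genuine corner, by contrast, survives simplification: the vertex (2, 2) in [(0, 0), (2, 2), (4, 0)] lies 2 units off the chord and is kept at any tolerance below that.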