© 2022, Deutsche Gesellschaft für Photogrammetrie, Fernerkundung und Geoinformation (DGPF) e.V.Remote sensing scene classification deals with the problem of classifying land use/cover of a region from images. To predict the development and socioeconomic structures of cities, the status of land use in regions is tracked by the national mapping agencies of countries. Many of these agencies use land-use types that are arranged in multiple levels. In this paper, we examined the efficiency of a hierarchically designed convolutional neural network (CNN)-based framework that is suitable for such arrangements. We use the NWPU-RESISC45 dataset for our experiments and arrange this data set in a two-level nested hierarchy. Each node in the designed hierarchy is trained using DenseNet-121 architectures. We provide detailed empirical analysis to compare the performances of this hierarchical scheme and its non-hierarchical counterpart, together with the individual model performances. We also evaluated the performance of the hierarchical structure statistically to validate the presented empirical results. The results of our experiments show that although individual classifiers for different sub-categories in the hierarchical scheme perform considerably well, the accumulation of the classification errors in the cascaded structure prevents its classification performance from exceeding that of the non-hierarchical deep model.