Unmanned aerial vehicles (UAVs) are invaluable for remote control and monitoring tasks. Convolutional neural networks (CNNs), known for their strong pattern recognition capabilities, are well suited to forest fire detection with UAVs. Deep CNNs perform well on hardware with high processing power. They can be used with UAVs controlled from ground control stations equipped with GPU-supported hardware, but running them on a typical UAV's limited onboard computational resources requires lightweight, small networks. To address this constraint, this article presents a lightweight, attention-based approach to forest fire detection using UAV vision data (images acquired by cameras mounted on UAVs). We also present a comprehensive comparison of different approaches, including transfer learning, deep CNNs, and lightweight CNNs. Among the models evaluated, the attention-based model with an EfficientNetB0 backbone emerged as the most successful architecture for forest fire detection. Its test accuracy of (Formula presented.), F1-score of (Formula presented.), recall of (Formula presented.), and precision of (Formula presented.) strongly support the efficiency of the EfficientNetB0-based model in wildfire recognition. Moreover, the network has fewer parameters than the other networks evaluated, which demonstrates its suitability for wildfire detection on UAVs with limited hardware resources.
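The four reported metrics follow the standard definitions for a binary classifier. The sketch below is only an illustration: the paper's evaluation code is not given here, and the function name `binary_metrics` and the convention that label `1` means "fire" are assumptions made for this example.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary fire / no-fire classifier.

    Hypothetical helper: `positive` marks the "fire" class, an illustrative
    convention rather than one taken from the paper.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    tn = len(y_true) - tp - fp - fn

    # Guard every ratio against division by zero.
    accuracy = (tp + tn) / len(y_true) if len(y_true) else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels (made up for illustration): 3 TP, 1 FN, 2 FP, 2 TN.
metrics = binary_metrics([1, 1, 1, 0, 0, 0, 1, 0], [1, 1, 0, 0, 1, 1, 1, 0])
print(metrics)  # accuracy 0.625, precision 0.6, recall 0.75, F1 ≈ 0.667
```

Recall is the most relevant of the four for wildfire monitoring, because a false negative corresponds to a missed fire. Precision penalizes false alarms, and F1 balances the two.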