Computer Vision and Image Understanding, vol. 267, 2026 (SCI-Expanded, Scopus)
Vehicle re-identification requires representations that simultaneously capture global context and highly adaptive, fine-grained local cues. Yet current architectures often struggle to combine long-range global context with the adaptive, high-frequency details needed to distinguish similar vehicles. Transformer-based Re-ID methods rely on fixed linear projections that fail to model nonlinear, content-dependent appearance changes, while part-based networks rely on rigid pooling regions that struggle under viewpoint shifts and environmental variations. To overcome these limitations, we introduce a unified Operational Transformer with a Global Fusion Attention Module (OT-GFAM). By integrating operational nonlinear neurons, our method achieves feature-level adaptivity, dynamically capturing complex spectral variations and content-dependent details. Complementing this adaptive feature extraction, we incorporate a geometrically structured Multi-Granularity Part Embedding (GLPE) to enforce spatial alignment. Unlike models built on standard linear Q, K, V projections, our model captures complex spectral features and content-dependent variations directly within the attention computation. Extensive experiments on VeRi-776, VehicleID, and VRU demonstrate that the proposed method achieves strong performance: 94.30% mAP on VRU, 91.41% mAP on VeRi-776, and 90.27% mAP on VehicleID (large). These results show that the proposed method offers a robust, computationally practical solution for real-world vehicle re-identification.
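To make the contrast with linear Q, K, V projections concrete, the following is a minimal sketch only, not the authors' implementation. It assumes that an "operational" projection can be approximated by a polynomial (Maclaurin-style) expansion of the bounded input, as in self-organized operational neural networks; the abstract does not specify the actual operator. All names (`poly_projection`, `operational_attention`, `random_weights`) and sizes are hypothetical, and the GFAM and GLPE components are not modeled.

```python
import math
import random

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def poly_projection(x, weights):
    """Assumed nonlinear projection: sum over q of tanh(x)**q @ W_q, q = 1..order."""
    bounded = [[math.tanh(v) for v in row] for row in x]  # keep higher powers stable
    out = [[0.0] * len(weights[0][0]) for _ in x]
    for q, w in enumerate(weights, start=1):
        powered = [[v ** q for v in row] for row in bounded]
        term = matmul(powered, w)
        out = [[o + t for o, t in zip(ro, rt)] for ro, rt in zip(out, term)]
    return out

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def operational_attention(x, wq, wk, wv):
    """Scaled dot-product attention with nonlinear Q, K, V projections."""
    q = poly_projection(x, wq)
    k = poly_projection(x, wk)
    v = poly_projection(x, wv)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose K
    scores = matmul(q, k_t)
    attn = [softmax([s / scale for s in row]) for row in scores]
    return matmul(attn, v), attn

def random_weights(dim, order, rng):
    """One dim x dim weight matrix per polynomial order (hypothetical init)."""
    return [[[rng.gauss(0.0, dim ** -0.5) for _ in range(dim)]
             for _ in range(dim)] for _ in range(order)]

rng = random.Random(0)
tokens = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]  # 5 tokens, dim 8
wq, wk, wv = (random_weights(8, 3, rng) for _ in range(3))
out, attn = operational_attention(tokens, wq, wk, wv)
```

With a single weight matrix per projection (order 1), the sketch reduces to ordinary attention over `tanh`-bounded inputs; the higher-order terms are what would let the projection respond nonlinearly to its input under this assumption.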