Robust visual navigation and adaptive control decision-making for autonomous agricultural robots in mature wheat fields: an improved RT-DETRv2 and Fuzzy-PID framework
Keywords:
Visual Navigation, Wheat Crop Row Detection, RT-DETRv2, Multi-Scale Attention, Adaptive Fuzzy-PID, Precision Agriculture.
Abstract
Mature wheat poses extreme perceptual challenges: dense canopy occlusion, specular reflections from senescent awns, wind lodging that makes inter-row spacing variable, and the need to operate a combine in real time. To address these, the authors propose RT-DETRv2-MSCA-SCGE with Adaptive Fuzzy-PID (RTMS-AFP), a framework that couples visual crop-row detection with real-time control decision-making for an autonomous agricultural machine. On the perception side, two modules are added: a Multi-Scale Channel Attention (MSCA) module in the transformer encoder backbone to improve receptive-field coverage, and a Spatial Context Global Encoder (SCGE) module in the feature pyramid neck to improve edge-feature discrimination in dense-canopy scenes. The detection model was trained on a custom dataset of 7,846 annotated wheat-field images (2,134 captured at the original locations and 5,712 generated under diverse conditions), spanning seven environmental conditions in three wheat-growing regions in India and China. On the control side, a PID controller continuously adapts its gains through real-time fuzzy inference driven by the lateral deviation and heading angle error obtained from the detection pipeline. At a processing time of 26.4 ms per frame, the proposed model achieved a mAP@50 of 97.1%, a mAP@50-95 of 89.4%, a mean lateral error of 3.84 px, and a mean heading error of 2.46°, meeting real-time requirements with improvements of +2.96%, +4.93%, and -25.0% over the baseline RT-DETRv2. Field trials over three wheat-growing seasons in Yangling, China, and Coimbatore, India, yielded an average crop-row recognition accuracy of 97.8% and an RMSE of 0.041 m at forward speeds up to 1.1 m/s. One-way ANOVA (F = 74.2, p < 0.001) and Mann-Whitney U tests (p < 0.001) showed that the proposed framework significantly outperformed all baseline frameworks.
These findings demonstrate that RTMS-AFP can address autonomous navigation in the most difficult phase of the cereal harvest while remaining computationally efficient.
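The gain-adaptation idea on the control side can be illustrated with a minimal Python sketch. It is not the authors' implementation: the membership functions, the rule tables (KP_RULES, KI_RULES, KD_RULES), the base gains, the normalisation limits e_max and th_max, and the choice of lateral deviation as the PID error signal are all assumptions made for illustration. The sketch uses a zero-order Sugeno-style weighted average, which may differ from the inference scheme used in the paper.

```python
from dataclasses import dataclass


def memberships(x):
    """Degrees of membership (small, medium, large) for a normalised error.

    Assumed shapes: |x| is clipped to [0, 1]; the three degrees sum to 1.
    """
    x = min(abs(x), 1.0)
    small = max(0.0, 1.0 - 2.0 * x)
    large = max(0.0, 2.0 * x - 1.0)
    medium = 1.0 - small - large
    return (small, medium, large)


# Hypothetical rule tables: rows index lateral deviation (small, medium, large),
# columns index heading error. Each entry scales the corresponding base gain.
KP_RULES = [[0.8, 1.0, 1.2], [1.0, 1.2, 1.4], [1.2, 1.4, 1.6]]
KI_RULES = [[1.2, 1.0, 0.8], [1.0, 0.8, 0.6], [0.8, 0.6, 0.4]]
KD_RULES = [[1.0, 1.1, 1.2], [1.1, 1.2, 1.3], [1.2, 1.3, 1.4]]


@dataclass
class FuzzyPID:
    kp0: float      # base proportional gain (assumed value)
    ki0: float      # base integral gain (assumed value)
    kd0: float      # base derivative gain (assumed value)
    e_max: float    # lateral deviation treated as fully "large", in metres
    th_max: float   # heading error treated as fully "large", in degrees
    integral: float = 0.0
    prev_error: float = 0.0

    def gains(self, e, th):
        """Return (kp, ki, kd) adapted to the current errors."""
        we = memberships(e / self.e_max)
        wt = memberships(th / self.th_max)

        def scale(table):
            # Product of the two membership degrees weights each rule output.
            return sum(we[i] * wt[j] * table[i][j]
                       for i in range(3) for j in range(3))

        return (self.kp0 * scale(KP_RULES),
                self.ki0 * scale(KI_RULES),
                self.kd0 * scale(KD_RULES))

    def step(self, e, th, dt):
        """One PID update on lateral deviation e with fuzzy-adapted gains."""
        kp, ki, kd = self.gains(e, th)
        self.integral += e * dt
        derivative = (e - self.prev_error) / dt
        self.prev_error = e
        return kp * e + ki * self.integral + kd * derivative


# Usage: one control step per detected frame (26.4 ms, as reported above).
controller = FuzzyPID(kp0=1.0, ki0=0.1, kd0=0.05, e_max=0.2, th_max=10.0)
steering = controller.step(e=0.05, th=2.0, dt=0.0264)
```

With these illustrative tables, the proportional gain rises and the integral gain falls as the errors grow. This is one common way to get a fast correction far from the row without accumulating integral windup.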
Copyright (c) 2025 Madhusmita Swain

This work is licensed under a Creative Commons Attribution 4.0 International License.