Data collection and preprocessing
Data acquisition: Orchestrating nature’s symphony
In the pursuit of enhancing soybean cultivation through artificial intelligence, a meticulous orchestration unfolded, weaving together a diverse ensemble of data sources reminiscent of nature’s symphony. The data ecosystem embraces high-resolution aerial and ground-based imagery, capturing the visual poetry within soybean fields. This imagery harmonizes with real-time weather data, conducting atmospheric rhythms. Furthermore, historical pest incidence records resonate as timeless echoes within the dataset.
The pests included in this study are Anticarsia and Coccinellidae. A total of 1050 photos were taken between 8 and 10 am and between 5 and 6:30 pm on several days and in varying weather. The weather datasheet records the following key meteorological variables, which are used to characterize and analyze weather conditions: minimum temperature, maximum temperature, rainfall, evaporation, sunshine, WindGustDir, WindGustSpeed, WindDir9am, WindDir3pm, WindSpeed9am, WindSpeed3pm, Humidity9am, Humidity3pm, Pressure9am, Pressure3pm, Cloud9am, Cloud3pm, Temp9am, Temp3pm, RainToday, RISK_MM and RainTomorrow (Fig 1).
Data enhancement: Augmenting reality for machines
Each element within the dataset underwent a symphony of data enhancements. Images were refined, with adjustments made to contrast, brightness and resolution, allowing the models to discern the subtle details of soybean leaves and the elusive shadows of pests. Augmentation techniques, including rotation, translation and scaling, were meticulously applied to enrich the dataset, akin to the human eye’s adaptation (Singh et al., 2016). This melodic process yielded a training corpus capable of revealing the most subtle cues of pest infestations.
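The contrast and brightness adjustment described above can be illustrated with a minimal sketch. The function name, the mid-gray pivot of 128 and the parameter values are assumptions made for illustration; the study does not report the exact enhancement settings.

```python
import numpy as np

def adjust_brightness_contrast(image, brightness=0.0, contrast=1.0):
    """Stretch pixel values around mid-gray, then shift them.

    image:      uint8 array of shape (H, W, 3)
    brightness: additive offset (illustrative value, not from the study)
    contrast:   gain applied around the value 128; 1.0 leaves contrast unchanged
    """
    pixels = image.astype(np.float32)
    adjusted = (pixels - 128.0) * contrast + 128.0 + brightness
    # Clip back into the valid 8-bit range before converting.
    return np.clip(adjusted, 0, 255).astype(np.uint8)

# Synthetic stand-in for a soybean leaf image.
rng = np.random.default_rng(0)
leaf = rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8)
brighter = adjust_brightness_contrast(leaf, brightness=20, contrast=1.2)
```

With the default arguments the function returns the image unchanged, which makes it easy to apply only one of the two adjustments.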
Data fusion: Where nature meets code
The data fusion process resembled composing a masterpiece, blending visual and numerical harmonies into a seamless score for machine learning. Weather data, capturing the cadence of temperature, humidity and precipitation, was precisely synchronized with the corresponding image timestamps (Waheed et al., 2020). This fusion enabled the model to discern the connection between atmospheric conditions and pest prevalence, mimicking the human mind’s ability to perceive patterns and correlations.
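One plausible way to synchronize weather readings with image timestamps is a nearest-in-time lookup, sketched below. The helper name, the 30-minute tolerance, the dates and the field names `temp` and `humidity` are hypothetical; the study does not describe its matching rule.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def nearest_weather(image_time, weather_times, weather_rows,
                    tolerance=timedelta(minutes=30)):
    """Return the weather row closest in time to image_time.

    weather_times must be sorted ascending and aligned with weather_rows.
    Returns None when the closest reading is further away than tolerance.
    """
    pos = bisect_left(weather_times, image_time)
    # Only the readings just before and just after the image can be closest.
    candidates = [i for i in (pos - 1, pos) if 0 <= i < len(weather_times)]
    if not candidates:
        return None
    best = min(candidates, key=lambda i: abs(weather_times[i] - image_time))
    if abs(weather_times[best] - image_time) > tolerance:
        return None
    return weather_rows[best]

# Illustrative readings at 9 am and 3 pm (values are made up).
weather_times = [datetime(2023, 7, 1, 9, 0), datetime(2023, 7, 1, 15, 0)]
weather_rows = [{"temp": 24.5, "humidity": 71}, {"temp": 31.2, "humidity": 48}]
match = nearest_weather(datetime(2023, 7, 1, 9, 15), weather_times, weather_rows)
```

Returning None for images with no nearby reading keeps unmatched images explicit, so they can be dropped or handled separately rather than paired with stale weather data.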
Anomaly Detection: Harmonizing the outliers
The raw dataset was a cacophony of information, sometimes accompanied by anomalies resembling dissonant chords in a symphony. Adhering to the principles of anomaly detection inspired by the human brain, statistical methods and machine learning were employed to identify and correct these outliers (Saleem et al., 2019). The process involved smoothing noisy data points, correcting timestamps and reconciling inconsistent records to compose a harmonious dataset.
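As an example of the statistical side of this step, the sketch below flags outliers with the interquartile range (IQR) rule. The IQR rule and the multiplier of 1.5 are assumptions chosen for illustration; the study does not name the specific statistical method used.

```python
from statistics import quantiles

def iqr_outliers(values, k=1.5):
    """Return the indices of values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [i for i, v in enumerate(values) if v < low or v > high]

# Illustrative humidity readings; 140 is physically implausible.
humidity = [62, 65, 63, 64, 66, 61, 140, 64]
flagged = iqr_outliers(humidity)
```

Flagged readings can then be corrected, interpolated or removed, depending on whether the surrounding records allow a reliable reconstruction.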
Ethical considerations: Ensuring data respect
Just as a symphony requires a conductor, the research adhered to ethical guidelines. The ethical treatment of data, respecting privacy and consent, was ensured. Personal information was rigorously anonymized and the work was conducted with the utmost respect for both the environment and individual privacy, akin to the respect and empathy inherent in human interactions. In this symphonic endeavor of data collection and preprocessing, a dataset was orchestrated that reflects the artistry of soybean pest detection while respecting the principles of data ethics. The resulting dataset serves as the foundation upon which machine learning models conduct their symphony of vigilance in the soybean fields.
In the pursuit of establishing a digital sentinel to protect soybean crops from pest invasions, an intricate orchestration of data acquisition, preprocessing, feature extraction and machine learning was devised. Similar to an unwavering human guardian, the digital sentinel employs its “eye” - computer vision and sensors - to scan the fields tirelessly, leaving no subtle signs of infestation unnoticed.
The research region was Guoyang County, Bozhou City, Anhui Province, in central China (33°27'~33°47' N, 115°53'~116°33' E). The county has the largest soybean cultivation in Anhui Province. Plains make up the majority of the county’s topography, and the climate is a mild, temperate, semi-humid monsoon climate with moderate rainfall and ample sunlight.
Utilizing a diverse sensor array, the digital sentinel taps into the capabilities of RGB cameras, multispectral sensors and environmental data collectors. Serving as its eyes, these sensors capture real-time high-resolution imagery and environmental parameters. State-of-the-art drones, fixed cameras and weather stations are employed to ensure a continuous vigil.
Cameras were used to capture images of the soybean crops in the field. The dataset used for this study comprises 1050 images taken in soybean fields. Cameras were positioned 45 degrees above the soybean canopy at a height of 1.52 meters and configured to take a picture of the plot every 15 minutes from sunrise to sunset.
Data preprocessing: The sentinel’s prudent judgment
Before the digital sentinel closely reviews images and environmental data, the data must be comprehensively prepared. Similar to how a human observer filters out noise to concentrate on essential details, the preprocessing pipeline cleans, standardizes and enhances the data. Image enhancement techniques, calibration and data fusion are executed with the precision of a seasoned field expert. The preprocessing of data includes the following steps:
Image cropping
Crop the images to remove unnecessary background and focus on the region of interest in the soybean canopy. This ensures that the machine learning models focus on relevant features.
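Cropping to a region of interest amounts to array slicing, as in the following sketch. The function name, the frame size and the window coordinates are hypothetical; the study does not state how the region of interest was located.

```python
import numpy as np

def crop_to_roi(image, top, left, height, width):
    """Return the window starting at (top, left), clipped to the image bounds."""
    rows, cols = image.shape[:2]
    bottom = min(top + height, rows)
    right = min(left + width, cols)
    # Copy so later edits to the crop do not modify the source frame.
    return image[max(top, 0):bottom, max(left, 0):right].copy()

# Blank stand-in for a field photograph.
frame = np.zeros((480, 640, 3), dtype=np.uint8)
canopy = crop_to_roi(frame, top=120, left=160, height=240, width=320)
```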
Color correction
To account for fluctuations in lighting conditions throughout the day, adjust and standardize the color balance across all images. This stage ensures consistent color representation, which accurate analysis requires.
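A simple way to standardize color balance is the gray-world method, which rescales each channel so that all channel means match. This is offered only as an assumed example; the study does not specify which color correction algorithm was applied.

```python
import numpy as np

def gray_world_balance(image):
    """Rescale each color channel so that all channel means are equal.

    image: uint8 array of shape (H, W, 3).
    """
    pixels = image.astype(np.float32)
    channel_means = pixels.reshape(-1, 3).mean(axis=0)
    target = channel_means.mean()
    # Guard against division by zero for an all-black channel.
    gains = target / np.maximum(channel_means, 1e-6)
    return np.clip(pixels * gains, 0, 255).astype(np.uint8)

# A uniformly "warm" image with a strong red cast.
warm = np.full((8, 8, 3), (200, 150, 100), dtype=np.uint8)
balanced = gray_world_balance(warm)
```

After balancing, the red cast is removed and the three channels of this synthetic image share the same mean intensity.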
Resolution standardization
Ensure that all images share a uniform resolution. This step protects the model’s performance from fluctuations in image quality.
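Resolution can be standardized by resampling every image to a common size. The sketch below uses nearest-neighbor resampling, and the 224 x 224 target is an assumed value; the study reports neither the interpolation method nor the target resolution.

```python
import numpy as np

def resize_nearest(image, out_height, out_width):
    """Resize an image by nearest-neighbor sampling."""
    in_height, in_width = image.shape[:2]
    # Map each output row and column back to its source index.
    row_idx = np.arange(out_height) * in_height // out_height
    col_idx = np.arange(out_width) * in_width // out_width
    return image[row_idx[:, None], col_idx]

frame = np.zeros((480, 640, 3), dtype=np.uint8)
standardized = resize_nearest(frame, 224, 224)
```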
Feature extraction
Feature extraction reduces the number of dimensions in the data while retaining the most important information.
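Dimensionality reduction of this kind is commonly done with principal component analysis (PCA). The sketch below is an assumed stand-in, since the study does not name the reduction technique; the feature matrix is random placeholder data.

```python
import numpy as np

def pca_reduce(features, n_components):
    """Project features onto their top principal components.

    features: array of shape (n_samples, n_features).
    """
    centered = features - features.mean(axis=0)
    # Rows of `components` are principal directions, ordered by
    # decreasing explained variance.
    _, _, components = np.linalg.svd(centered, full_matrices=False)
    return centered @ components[:n_components].T

rng = np.random.default_rng(0)
raw = rng.normal(size=(50, 12))   # 50 samples, 12 placeholder features
reduced = pca_reduce(raw, n_components=3)
```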
Data augmentation
To expand the dataset, use data augmentation techniques. Techniques such as rotation, flipping or zooming increase the diversity of the training data and can thereby improve the model’s generalization.
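The three augmentations named above can be sketched with basic array operations. The fixed 90-degree rotation and the 2x center zoom are illustrative choices; the study does not report the angles or zoom factors used.

```python
import numpy as np

def augment(image):
    """Return simple geometric variants of an (H, W, C) image."""
    height, width = image.shape[:2]
    # Zoom: keep the central half of the image, then double each pixel so
    # the result returns to the original size (for even H and W).
    top, left = height // 4, width // 4
    center = image[top:top + height // 2, left:left + width // 2]
    zoomed = center.repeat(2, axis=0).repeat(2, axis=1)
    return {
        "rotated_90": np.rot90(image, k=1, axes=(0, 1)),
        "flipped": image[:, ::-1],      # horizontal mirror
        "zoomed_2x": zoomed,
    }

tile = np.arange(8 * 8 * 3, dtype=np.uint8).reshape(8, 8, 3)
variants = augment(tile)
```

Each source image yields three additional training samples here; in practice the transformations are usually drawn at random for every training pass.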
Data splitting
The dataset is divided into training, validation and test sets.
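A reproducible split can be sketched as follows. The 70/15/15 proportions and the seed are assumptions, as the study does not report its split ratios; only the dataset size of 1050 images comes from the text.

```python
import random

def split_dataset(items, train_pct=70, val_pct=15, seed=42):
    """Shuffle items and split them into training, validation and test lists.

    Percentages are integers so that the split sizes are exact.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]   # remainder goes to the test set
    return train, val, test

image_ids = list(range(1050))
train_ids, val_ids, test_ids = split_dataset(image_ids)
```

Because the cameras photograph the same plots repeatedly, splitting by plot or by day rather than by individual image may be worth considering to avoid near-duplicate frames appearing in both training and test sets.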
Feature extraction: The sentinel’s analytical mind
In replicating the discerning capabilities of a human expert in identifying pest-related anomalies, the sentinel utilizes state-of-the-art feature extraction algorithms. Deep convolutional neural networks (CNNs) act as its analytical mind, dissecting the images into meaningful patterns. Texture analysis, color histograms and shape recognition enable the sentinel to spot even the subtlest cues of pest presence.
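Of the hand-crafted descriptors mentioned above, the color histogram is the simplest to illustrate. The bin count of 8 per channel is an assumed value, and the function is a sketch rather than the study's implementation.

```python
import numpy as np

def color_histogram(image, bins=8):
    """Return a normalized per-channel color histogram as one feature vector.

    image: uint8 array of shape (H, W, 3); output has 3 * bins entries.
    """
    per_channel = [
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    feature = np.concatenate(per_channel).astype(np.float32)
    return feature / feature.sum()

# A pure green patch: red and blue fall in the lowest bin, green in the highest.
green_patch = np.full((4, 4, 3), (0, 255, 0), dtype=np.uint8)
histogram = color_histogram(green_patch)
```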
Combining Recurrent Neural Networks (RNNs), which capture temporal relationships in environmental variables, with Convolutional Neural Networks (CNNs) for image processing is an effective way to build a cutting-edge deep learning model. Such models are frequently employed in contexts such as climate analysis, environmental monitoring and remote sensing. The method used in this work is represented in the flow chart (Fig 2).
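To make the CNN and RNN combination concrete, the sketch below runs a forward pass through a deliberately tiny model: one convolution filter for the image, an Elman-style recurrent layer for the weather sequence, and a logistic output on the concatenated features. All layer sizes, parameter names and the random, untrained weights are assumptions; this is not the architecture of Figs 2 to 4.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Single-channel 'valid' 2-D cross-correlation, as used in CNN layers."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for r in range(oh):
        for c in range(ow):
            out[r, c] = np.sum(image[r:r + kh, c:c + kw] * kernel)
    return out

def cnn_branch(image, kernel):
    """Convolution, ReLU, then global average pooling: one image feature."""
    activation = np.maximum(conv2d_valid(image, kernel), 0.0)
    return np.array([activation.mean()])

def rnn_branch(sequence, w_in, w_rec, bias):
    """Elman RNN over a (T, n_vars) weather sequence; returns the last state."""
    hidden = np.zeros(w_rec.shape[0])
    for step in sequence:
        hidden = np.tanh(w_in @ step + w_rec @ hidden + bias)
    return hidden

def predict_pest_probability(image, weather_seq, params):
    """Fuse both branches and map the result to a probability."""
    fused = np.concatenate([
        cnn_branch(image, params["kernel"]),
        rnn_branch(weather_seq, params["w_in"], params["w_rec"], params["bias"]),
    ])
    logit = params["w_out"] @ fused + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))

rng = np.random.default_rng(0)
params = {                       # random, untrained weights for illustration
    "kernel": rng.normal(size=(3, 3)),
    "w_in": rng.normal(size=(4, 3)),
    "w_rec": rng.normal(size=(4, 4)),
    "bias": np.zeros(4),
    "w_out": rng.normal(size=5),
    "b_out": 0.0,
}
image = rng.random((16, 16))     # grayscale stand-in for a field image
weather_seq = rng.random((6, 3)) # 6 time steps of 3 weather variables
probability = predict_pest_probability(image, weather_seq, params)
```

The key design point is the concatenation step: the image branch and the weather branch are computed independently and only meet in the final layer, so either branch can be replaced by a deeper network without changing the other.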
Model training
During the training phase, the digital sentinel transforms into a formidable protector. A substantial dataset of labeled images and environmental data is fed to it. The sentinel, like an eager student, fine-tunes its neural networks through backpropagation, striving to minimize error and maximize prediction accuracy. Hyperparameter tuning and cross-validation serve as the sentinel’s practice sessions, ensuring it masters the art of early pest detection. The CNN model for image processing and the RNN model for temporal evaluations are presented in Figs 3 and 4.
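Cross-validation rotates which part of the data is held out for validation. The sketch below generates the index sets for k-fold cross-validation; the helper name and the fold count are assumptions, as the study does not state how many folds were used.

```python
def k_fold_indices(n_samples, k=5):
    """Return a list of (train_indices, val_indices) pairs for k folds.

    Indices are taken in order, so shuffle the samples beforehand.
    """
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, val))
        start += size
    return folds

folds = k_fold_indices(10, k=3)
```

Each candidate hyperparameter setting is then trained k times, once per fold, and its validation scores are averaged before settings are compared.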
Validation: The sentinel’s proving ground
The sentinel’s skills are rigorously tested in the validation phase. The model undergoes assessments akin to those required to validate the expertise of a human professional in the agricultural domain. Precision, recall, F1 score and confusion matrices are calculated to assess the performance of the system. The sentinel’s vigilance is measured by its ability to minimize false negatives and false positives, ensuring a balance between early detection and avoiding unnecessary interventions.
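The metrics named above follow directly from the four cells of a binary confusion matrix, as the following sketch shows. The labels are made-up examples with 1 meaning "pest present"; they are not results from the study.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1
    return counts

def precision_recall_f1(counts):
    """Derive precision, recall and F1 from confusion counts."""
    tp, fp, fn = counts["tp"], counts["fp"], counts["fn"]
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1

# Illustrative labels only.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
counts = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(counts)
```

A false negative here is a missed infestation and a false positive an unnecessary intervention, so recall tracks early detection while precision tracks avoided treatments.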
Ethical considerations: The sentinel’s code of conduct
Adhering to ethical guidelines, the digital sentinel operates under a strict code of conduct, mirroring the principles that guide human behavior. To address data privacy and prioritize the well-being of the crop, the sentinel’s actions are designed to keep sensitive information confidential and to minimize pesticide usage and environmental impact.
Constructing the methodology around the idea of a digital sentinel with human-like qualities emphasizes the precision, vigilance and ethical considerations that underpin the research, ensuring soybean crops receive the best possible protection against pest threats.