Applying deep neural networks to predict incidence and phenology of plant pests and diseases



INTRODUCTION
Agriculture is facing major challenges concerning food security and food production for a global human population predicted to grow to nine billion by 2050 (Godfray et al. 2010). Besides shifting toward more plant-based diets and decreasing food waste (West et al. 2014, Shepon et al. 2018), sustainable intensification is necessary for food and environmental security (Godfray et al. 2010, Garnett and Godfray 2012). The rapidly growing field of precision agriculture uses modern information technology, including computer vision and artificial intelligence, and has enormous potential to contribute to the goals of more sustainable agriculture (Bongiovanni and Lowenberg-deBoer 2004, Lindblom et al. 2017, Patrício and Rieder 2018). Referred to as "smart farming," this technological development is expected to support more efficient use of natural resources and better-targeted plant protection from pests and pathogens while minimizing hazards to environmental and human health (Chakraborty and Newton 2011, Tilman et al. 2011, Garnett et al. 2013, Walter et al. 2017). Pests and pathogens are responsible for large yield losses (Oerke 2006) and are often counteracted with pesticides (Lamichhane et al. 2015). Rising concerns regarding the negative externalities of pesticides on human health and environmental safety (Tegtmeier and Duffy 2004, Geiger et al. 2010, Pimentel and Burgess 2014) foster strategies to reduce the risks from their use without compromising productivity and profitability (Lechenet et al. 2017). Reliable pest and pathogen detection and prediction support more timely and precise interventions, and thus reduced pesticide use.
Machine learning has great potential to assist the development of innovative methods for pest and pathogen management, supporting more sustainable plant protection (Behmann et al. 2015). The identification of pests and pathogens and the detection of damages on crops are challenging for farmers, yet crucial for deciding on appropriate control measures (Martinelli et al. 2015, Lamichhane et al. 2016). Deep neural networks (DNNs) and algorithms for image classification (Goodfellow et al. 2016) can serve the detection of pests and pathogens for plant protection (e.g., Mohanty et al. 2016, Sladojevic et al. 2016, Ferentinos 2018). For example, convolutional neural networks (CNNs; Krizhevsky et al. 2012, Szegedy et al. 2015) have been used to identify different pathogens on apple leaves based on image data (Liu et al. 2018, Zhong and Zhao 2020), and to classify insect pest species occurring on crops (Cheng et al. 2017, Thenmozhi and Reddy 2019). The goal of these technological implementations is to help growers recognize and detect pests and pathogens in the field, fostering faster and more self-reliant evaluation of the pest situation in situ and supporting decision-making processes to optimize yields in a sustainable way (e.g., Sladojevic et al. 2016). Appropriate timing of pest control measures is not trivial (Tang et al. 2010), and anticipating the damage can improve application precision. Temporal precision in applying pest control measures can increase their efficacy and reduce the number of required applications, and therefore lower the total amount applied (Möhring et al. 2020). Advancing novel technologies may therefore lead to faster and more efficient recognition processes and eventually contribute to decreasing both the risks associated with pest control measures and yield losses.
Decision support systems (DSS) assist crop producers with surveillance, with decisions on optimal timing, and with anticipating the need for pest control measures (Samietz et al. 2007), and big data in combination with deep learning can increase the precision of DSS. In general, DSS for pests and pathogens rely on phenological models, where the timing of crucial events in the life cycle of those damaging organisms, measured under controlled temperature conditions, is coupled with meteorological data to predict their seasonal occurrence. Phenological models have reached good performance for the prediction of pest and pathogen occurrence, offering an effective complement to field observations (e.g., Schaub et al. 2017), but are still time- and cost-intensive to develop. The resulting models often target specific pest-crop or pathogen-crop systems, are location-dependent (Donatelli et al. 2017), and are rarely updated once established. In the meantime, the life cycles of many insects are altered by changing climate conditions (Kingsolver et al. 2011). With a warming climate, phenological shifts and disruption of synchrony between host plants and pests are widespread reactions (Forrest 2016). This leads to increasing discrepancies between model predictions and actual observations, because forecast models at the core of DSS are seldom reparametrized to account for altered insect biology. Further, climate change promotes the introduction and spread of invasive species into newly suitable, so far uncolonized regions (Bebber et al. 2013, Grünig et al. 2020), requiring fast development of new DSS. Novel technologies to analyze big data based on deep learning (LeCun et al. 1989, 2015) can support monitoring and deliver the baseline for developing the phenological models needed, in combination with weather data and forecasts, to anticipate pest damages.
Using 52,322 photographs taken under field and standard conditions during the spring-summer of 2019, here we develop a framework toward pest phenology forecasting based on big data and deep learning algorithms. We focus on a proof of concept for the development of damage classification tools, which, in combination with meteorological data, are used to produce phenological models (Fig. 1). We use DNNs to classify damages on apple tree leaves and investigate the phenology of six main classes of damages predicted by the DNNs. We couple the predicted occurrence of damages with meteorological data to model damage phenology. Our case study targets the apple crop because it is the most important fruit crop in Switzerland, with a production varying between 250,000 and 450,000 tons per year depending on weather (SBV 2019). Concerning damages, the focus lies on the mines of the pear leaf blister moth (Leucoptera malifoliella Costa, Lepidoptera: Lyonetiidae; from here on blister moth), a pest that has recolonized orchards in central Switzerland since 2013 (Zwahlen and Hunkeler 2017). The blister moth prefers apple trees as host plants and, in case of heavy infestation, can affect photosynthesis and cause premature leaf drop (Ivanov 1976; www.cabi.org). Larvae of the blister moth are solitary miners producing characteristic brown, round mines that are distinguishable from physical damages. Our working hypotheses are as follows:
1. We expect that the development of DNNs to categorize different classes of damages on apple leaves is feasible with a subset of the 52,322 images collected during one season.
2. We expect that by applying the resulting classification tools to the full data set, we can reconstruct the phenology of blister moth mines, which should match conventional monitoring methods.
3. We expect to find a meteorological signature in damage phenology of the blister moth, providing the basis for the development of statistical prediction models.

METHODS
Data collection
We sampled leaves and collected images weekly between 15 April and 28 August 2019 in three apple orchards in central Switzerland, in Kleinwangen (47°11′49.3″ N; 8°17′22.8″ E; 536 m a.s.l.), Gelfingen (47°13′12.9″ N; 8°16′10.0″ E; 557 m a.s.l.), and Waedenswil (47°13′18.4″ N; 8°40′38.6″ E; 483 m a.s.l.; see Appendix S1: Fig. S1 for a map with the locations). Using smartphone cameras, we collected pictures of leaves in the field (from here on referred to as field pictures) and sampled leaves to take pictures under standardized conditions in the laboratory (from here on standardized pictures). Taking pictures with smartphones yields a picture quality similar to what can be expected from growers, untrained citizen scientists, or automated devices such as drones. We conducted a structured sampling in the three orchards. In each location, we sampled at least 400 leaves per week. With this structured sampling, we aimed to capture a representative set of pest symptoms. As the orchards have different numbers of trees planted in different numbers of rows, we used three different sampling strategies for both pictures and leaf collection. In Kleinwangen, we took two pictures from every third tree per row, and in Gelfingen two pictures from every fourth tree per row. In Waedenswil, we took four pictures from every tree. Field pictures and leaves were taken from the lower and upper parts of the trees by random choice of the collector. Collected leaves were kept in a 3°C storage room before we took pictures under standardized conditions. We used scotch tape to stick the leaves on white paper in order to have a uniform background. The pictures were taken with one of two mobile phones, an iPhone 6 (8 megapixel camera) and a Sony Xperia X (23 megapixel camera). At the time of sampling, extension services were conducting tests on the efficacy of pesticides targeted against blister moth in Kleinwangen and Gelfingen. We distinguished between treated and untreated sections of the orchards for the image data collection, irrespective of the management of the treated trees.
In parallel to the leaf and picture sampling, the population density of blister moth was monitored with pheromone trapping and mine counting in Gelfingen. The orchard was used for testing eight different control methods, including a control section. We placed one trap (Deltatrap pheromone traps, Andermatt Biocontrol AG, Grossdietwil, Switzerland) in the control section to document the occurrence of adults on a weekly basis. Each week, 50 randomly selected leaves per treatment section were inspected visually for the presence of mines, resulting in 400 leaves per week. Each leaf was carefully checked by eye, and the occurrence of mines was noted. We registered the total number of mines per 50 leaves. Collecting the data set took about 50 workdays.

Meteorological data
Meteorological data were extracted from the gridded data set (2 × 2 km) obtained from the Swiss Federal Office of Meteorology and Climatology (MeteoSwiss).

Fig. 1 (partial caption): Once the classification tool is established, predictions using deep neural networks can be used for reconstructing the phenology of pest damages (c). Coupling damage phenology with meteorological data enables establishing phenology models (d). Eventually, these phenology models can be used for predictions and in DSS, aiming at informing on pest occurrence to support growers and experts, for instance by implementing the tool in a smartphone app (e). Image classification tools can then reinforce the data collection, by making it available to citizen scientists (f). In this study, we focus on the deep learning and phenology modeling aspects within this framework.

Data preparation
In total, we gathered 52,322 pictures of apple leaves. We manually classified 8,735 randomly sampled standardized (4629) and field pictures (4106) into damage classes. We arbitrarily defined 42 classes of distinguishable damages, including classes containing combined damages and classes with a low number of pictures (fewer than 100). We focused on seven classes with at least 100 pictures for further processing (Fig. 2):
1. Undamaged: no damages detected on the leaf.
2. PLBM: mines of blister moth detected.
3. Physical damages: holes, cracks, fissures, or deformations.
4. Brown spots: brownish spots distinguishable from blister moth mines.
5. Lepidoptera: rolled-in leaf edges indicating pupae of Lepidoptera species.
6. Mildew: powdery mildew (Podosphaera leucotricha) detected.
7. Feeding: feeding damage from herbivorous insects.
The manual annotation of the pictures took about 5-10 workdays.
We cropped all images to an extent of 2840 × 1560 pixels in the center of the image to focus on the leaf rather than the background (e.g., Fig. 2 left-most pictures of class Undamaged).
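The center crop described above amounts to a simple box computation. The following is a minimal sketch, not the code used in this study: the function name `center_crop_box` is hypothetical, and the example frame size of 3264 × 2448 pixels is an assumption for an 8 megapixel camera rather than a value reported here.

```python
def center_crop_box(width, height, crop_w=2840, crop_h=1560):
    """Return (left, top, right, bottom) of a crop window centered in the image."""
    if crop_w > width or crop_h > height:
        raise ValueError("crop window is larger than the image")
    left = (width - crop_w) // 2
    top = (height - crop_h) // 2
    return left, top, left + crop_w, top + crop_h


# Assumed example: an 8 megapixel landscape frame (not a value from the text).
box = center_crop_box(3264, 2448)
```

The returned tuple follows the (left, upper, right, lower) order that image libraries such as Pillow expect for a crop box.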

Deep neural networks
We used DNNs to classify entire images, allocating each image to one of several classes. We implemented our deep neural network approach in R (version 3.5.3; R Core Team 2019), using the R-package "reticulate" (version 1.13.0-9003; Ushey et al. 2019) to open an interface to Python. We used the "Keras" (version 2.2.5.0; Allaire and Chollet 2019) and "Tensorflow" (version 1.9; Allaire and Tang 2019) R-packages as DNN frameworks. We loaded image data with the image_data_generator function of the "Keras" R-package. Images were imported and resized to 256 × 256 pixels with three channels (i.e., RGB color channels) and rescaled to values between 0 and 1. After preliminary experiments with varying patch sizes, including larger patches of 512 × 512 pixels, we found empirically that 256 × 256 pixel patches performed best. Further, we applied data augmentation to the training data set during the image import. Data augmentation is a common strategy to increase the number of images in the training data set. We applied the following specifications in the image_data_generator function to augment the data: zooming (range = 0.4), rotations (range = 90°), width and height shifts (range = 0.2), shearing (range = 0.2), and horizontal and vertical flips. We tested alternative data augmentation settings with less zooming (range = 0.1) and smaller rotations (5°), as well as with restricted zooming only (see Appendix S1: Figs. S2 and S3 for model performance). As network structure, we used the ResNet50 model (He et al. 2016), loaded with weights pre-trained on ImageNet (Deng et al. 2009), as the base for our model architecture. We fine-tuned all layers of the ResNet50 network and added one dense layer with 256 nodes and a ReLU activation function, as well as an output layer with a softmax activation function, on top of the ResNet50 to adapt it to our data set. Moreover, we added dropout (0.5) after the ResNet50 network and after the densely connected layer to prevent model overfitting. Further, we used an RMSprop optimizer with a base learning rate of 0.0001 and a decay of 0.00001 for gradient descent. We set the mini-batch size, which defines how many images the DNN takes into account per step for calculating the model error and updating the model coefficients, to 32. All networks were trained for 100 epochs (i.e., iterations over the full training data set).
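To give a feel for what these optimizer settings imply, the sketch below computes the number of update steps per epoch and the learning rate after training. It is illustrative only and rests on two assumptions: that the decay follows the time-based rule lr_t = lr_0 / (1 + decay × t) used by Keras 2.x optimizers, with t counting update steps, and a hypothetical training set of 5,000 images. The function names are likewise hypothetical.

```python
import math

BASE_LR = 1e-4    # base learning rate reported in the text
DECAY = 1e-5      # decay reported in the text
BATCH_SIZE = 32   # mini-batch size reported in the text
EPOCHS = 100      # number of epochs reported in the text


def steps_per_epoch(n_train_images, batch_size=BATCH_SIZE):
    """Number of mini-batch updates needed to see every training image once."""
    return math.ceil(n_train_images / batch_size)


def decayed_lr(iteration, base_lr=BASE_LR, decay=DECAY):
    """Assumed time-based decay: lr_t = lr_0 / (1 + decay * t)."""
    return base_lr / (1.0 + decay * iteration)


n_train = 5000  # hypothetical training set size, not taken from the text
steps = steps_per_epoch(n_train)
final_lr = decayed_lr(steps * EPOCHS)
```

Under these assumptions the learning rate shrinks only modestly over 100 epochs, so the decay acts as a gentle schedule rather than a sharp cut.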
We trained DNNs on different combinations of classes (i.e., different classification tasks). First, we trained DNNs for each damage class to distinguish images of this class from all other images (i.e., all other classes as one summary class), resulting in six classification tasks. Second, we trained full model DNNs to classify the six main classes (PLBM, Undamaged, Physical damages, Brown spots, Lepidoptera, and Mildew) simultaneously in one DNN. In a preliminary analysis step, we used Feeding and Physical damages as independent classes, but combined them for the final analysis because DNNs struggled to differentiate these two classes. We trained DNNs for the same classification tasks with field pictures. To establish DNNs, we split the categorized images of each class into five subsets for a fivefold cross-validation. DNNs were trained on four subsets (80% of the data) and tested on the left-out subset (20%). From the 80% training data, 20% were used for validation to tune hyperparameters (i.e., settings to control the learning process of a deep neural network). This procedure was repeated five times, resulting in five different DNNs per classification task. Additionally, we evaluated the ability of the DNNs to generalize using a geographic validation as an alternative train-test split strategy. As more pictures were recorded at the two main sites (Gelfingen and Kleinwangen) and fewer pictures at the smaller site in Waedenswil, a geographic cross-validation (training DNNs on two sites and testing on a third) was not possible, because there were not enough images to train the model for some classes. Therefore, we trained the DNNs on pictures from Gelfingen and Kleinwangen and tested their ability to generalize on the images from Waedenswil. We measured the performance of DNNs for each classification task on the test set with the F1 score averaged over the five DNNs per classification task. F1 scores (Eq. 1) were calculated from the numbers of true positives (TP), false positives (FP), and false negatives (FN), resulting in values ranging from 0 to 1, with 1 being perfect classification:

$$F_1 = \frac{2\,\mathrm{TP}}{2\,\mathrm{TP} + \mathrm{FP} + \mathrm{FN}} \qquad (1)$$

Additionally, we measured performance with classification accuracy as the percentage of correctly classified images. DNNs were evaluated by their performance on the test data set, which was not included in the construction of the network.
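The split and scoring procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the code used in this study: the helper names are hypothetical, the split is applied to the image indices of a single class, and F1 is computed with the standard definition 2TP / (2TP + FP + FN).

```python
import random


def five_fold_splits(n_images, val_fraction=0.2, seed=0):
    """Yield (train, validation, test) index lists for each of five folds."""
    indices = list(range(n_images))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::5] for i in range(5)]
    for k in range(5):
        test = folds[k]
        # Pool the four remaining folds (80% of the data).
        rest = [i for j, fold in enumerate(folds) if j != k for i in fold]
        # Hold out a fraction of the pooled data for hyperparameter tuning.
        n_val = round(len(rest) * val_fraction)
        yield rest[n_val:], rest[:n_val], test


def f1_score(tp, fp, fn):
    """F1 from true positives, false positives, and false negatives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else float("nan")


def accuracy(n_correct, n_total):
    """Classification accuracy as a percentage."""
    return 100.0 * n_correct / n_total


splits = list(five_fold_splits(1000))  # 1000 is a hypothetical class size
train, val, test = splits[0]
```

With this scheme each image is used for testing exactly once, and each fold trains on 64%, validates on 16%, and tests on 20% of the images.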

Coupling pest damage with meteorological data
We used DNNs to classify all images from Kleinwangen and Gelfingen, because these are the two locations where we found blister moths. We used the predict_classes and predict_proba functions of the "Keras" R-package (version 2.2.5.0; Allaire and Chollet 2019) to obtain the predicted class and the pseudo-probabilities per class for all images using the full model DNNs. The pseudo-probabilities are output scores from the DNNs that indicate how confident the DNN is in predicting a class for an image. We grouped the predictions by location and date. Further, we calculated the percentage of damaged leaves per sampling event, scaling the number of damaged leaves of each class by the total number of collected leaves to correct for uneven sampling, as the sampling events did not result in the exact same number of images. We used GLMs to model the percentage of damaged leaves with weekly meteorological data as predictor variables. Climatic variables included growing degree-days (i.e., cumulative sum of mean temperature over a base temperature of 5°C), mean temperature, precipitation, solar radiation, and diurnal temperature range. We ran GLMs for each meteorological predictor and one multivariate GLM with all predictors, allowing second-degree polynomials and assuming a binomial error distribution. We used the ecospat.adj.D2.glm function of the "ecospat" R-package (version 3.0; Broennimann et al. 2018) to obtain the explained deviance of all models as adjusted D² values. Additionally, we used the glarma function of the "glarma" R-package (version 1.6-0; Dunsmuir and Scott 2015) to run GLARMA (generalized linear autoregressive moving average) models for the same predictor variables, to check whether accounting for temporal autocorrelation would change the parameter estimates.
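Two of the quantities above, growing degree-days and the percentage of damaged leaves per sampling event, are simple to compute. The sketch below is a hedged illustration rather than the authors' implementation: the function names and the temperature values are hypothetical, and clamping days below the base temperature to zero is an assumed (though common) convention.

```python
from itertools import accumulate

BASE_TEMP_C = 5.0  # base temperature reported in the text


def growing_degree_days(daily_mean_temps, base=BASE_TEMP_C):
    """Cumulative sum of daily mean temperature in excess of the base.

    Days colder than the base contribute zero (assumed convention).
    """
    return list(accumulate(max(t - base, 0.0) for t in daily_mean_temps))


def percent_damaged(n_damaged, n_total):
    """Share of leaves assigned to a damage class at one sampling event."""
    return 100.0 * n_damaged / n_total


temps = [4.0, 7.5, 10.0, 12.5]  # hypothetical daily mean temperatures in °C
gdd = growing_degree_days(temps)
share = percent_damaged(50, 400)  # hypothetical counts for one sampling event
```

Expressing damage as a percentage rather than a raw count is what makes sampling events with different numbers of images comparable.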

RESULTS
Data collection
We collected 52,322 pictures of apple tree leaves in total over 19 weeks of sampling. We took 35,903 pictures in the field and 21,087 under standardized conditions. For the two locations where we used the image data to reconstruct the blister moth phenology, we gathered 14,466 images in Gelfingen and 18,384 in Kleinwangen. We did not find different signals from the different treatments and therefore only present the results for the combined treated and control sections. Among the 8,735 categorized pictures, class PLBM contained 1,390 images, Undamaged 1,415, Physical damages 1,139, Brown spots 2,025, Lepidoptera 103, and Mildew 134.

Deep neural networks
We established DNNs for 14 different classification tasks of apple tree leaves. F1 scores for classification tasks of standardized pictures ranged from 0.69 to 0.93, with the exception of the class Lepidoptera (0.32), where the number of training images was very low (with a total of 103 manually classified images, we used between 26 and 44 images for model training, depending on the classification task (standardized or field) and the cross-validation chunk). Classification accuracy ranged from 91.3% to 99.5%. The full model including all six classes reached an F1 score of 0.89 (standard deviation across the five cross-validation runs: ±0.035) and a classification accuracy of 95.4% (±1.5%). DNNs generally performed more poorly on images taken in the field (Fig. 2). F1 scores for field pictures ranged from 0.52 to 0.90, with the exception of Physical damages, where no F1 score could be determined because none of the test images was assigned to this class, meaning that the class was not recognized by the DNN based on field pictures. Classification accuracy for all classes ranged from 87.5% to 99.4%. The full model reached an F1 score of 0.85 (±0.02) and a classification accuracy of 87.7% (±1.6%) for field images. For detailed model performance of each classifier, see Appendix S1: Fig. S4 and the confusion matrices in Appendix S1: Tables S1-S8. Further, the geographic validation showed that the single class DNNs were able to generalize to pictures from a different location for the classes Undamaged and Physical damages, showing similar performance as in the fivefold cross-validation, but struggled for the classes Brown spots and Mildew. Because we found no PLBM marks and only one picture with Lepidoptera in Waedenswil, we were not able to test the generalization for those classes and the full model (see Appendix S1: Table S9 for details).
We used the trained DNNs to classify the images of the data set that were not categorized a priori. From the five DNNs trained in the cross-validation for the full model, we selected the one with the best performance based on F1 score. With the full model for standardized pictures, we found 7627 images of class PLBM, 5598 Undamaged, 3063 Physical damages, 156 Brown spots, 122 Lepidoptera, and 850 Mildew. The full model for field pictures resulted in a prediction of 7350 PLBM, 10,638 Undamaged, 899 Physical damages, 131 Brown spots, 282 Lepidoptera, and 693 Mildew. The full model for field images was unable to detect the class Lepidoptera (see Appendix S1: Fig. S5 for reconstructed phenologies of all damage classes). We used the DNNs to reconstruct the phenology of the blister moth (Fig. 3). We found very similar patterns for the two locations, with an increase in blister moth mines in mid-June and a first peak in early July. Standardized and field pictures showed very similar results, although field pictures indicated the peak one week later than the standardized pictures.
Although we found different DNN performances for the standardized pictures and the field pictures, predictions on the full data set resulted in very similar patterns of the phenology of the different damage classes. Further, we compared the blister moth phenology predictions of the DNNs with count data of blister moth adults in traps and of mines obtained from surveys in the same orchards (Fig. 4). The count data support our findings for the blister moth phenology based on the predictions of DNNs. The phenology of the trapped adults explained the patterns of the mines, which start to emerge 3-4 weeks after the peak of a generation of adults. These results matched well with literature descriptions of development times for one generation (e.g., 36 days at 18°C; Sáringer et al. 1985). We also observed the same pattern of decreasing numbers of mines in early to mid-July, which is explained by the simultaneous emergence of new leaves and the gap between the first and the second generations of blister moth larvae. Mine counts and reconstructed mine phenology showed Pearson correlation coefficients between 0.938 and 0.978 for both locations with standardized and field pictures.
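The agreement between counted mines and the DNN-based phenology is summarized by a Pearson correlation coefficient, which can be computed as in the following sketch. The function name `pearson_r` and both weekly series are hypothetical and serve only to illustrate the calculation; they are not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical weekly series: mines counted by eye and the percentage of
# leaves assigned to class PLBM by a classifier.
mine_counts = [0, 2, 10, 25, 18, 9]
predicted_share = [1.0, 3.0, 12.0, 24.0, 20.0, 8.0]
r = pearson_r(mine_counts, predicted_share)
```

Because the coefficient is invariant to linear rescaling, counts per 50 leaves and percentages of images can be compared directly without converting units.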

Coupling pest damages with meteorological data
We quantified the relationship between the phenology of the blister moth and climate using GLMs (Fig. 4). For both locations and for standardized and field pictures, we found that degree-days was the most important variable, with adjusted D² values between 0.950 and 0.967 (Table 1). The full model explained only slightly more of the deviance, with adjusted D² values ranging from 0.956 to 0.969. We found similar results with GLARMA models, confirming that degree-days is the best predictor of blister moth phenology for Gelfingen, but only the second-best predictor for Kleinwangen after mean temperature (Table 2).
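For reference, adjusted D² is commonly derived from the null and residual deviance of a fitted GLM. The form below is the usual definition and is stated here as an assumption about what the "ecospat" function computes, with $n$ the number of observations and $p$ the number of estimated coefficients:

```latex
D^2 = \frac{D_{\mathrm{null}} - D_{\mathrm{res}}}{D_{\mathrm{null}}},
\qquad
D^2_{\mathrm{adj}} = 1 - \frac{n - 1}{n - p}\,\bigl(1 - D^2\bigr)
```

The adjustment penalizes additional parameters, which is why a full model with all predictors can show only a marginal gain over the degree-days model.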

DISCUSSION
In this study, we tested the feasibility of different segments of a framework for developing pest damage forecasting models, relying on big data, DNNs, and meteorological data (Fig. 1). Our case study on blister moth mines suggests that DNNs coupled with meteorological data are suitable tools for the proposed method. We use DNNs to build image classification tools that classify the damages in a large data set of apple leaf images and to reconstruct the phenology of different damage classes (Appendix S1: Fig. S5). Using the blister moth as a case study species, we show that the blister moth phenology predicted with DNNs matched count data of mines and adults in the field. Finally, we relate the phenology of blister moth mines to meteorological variables and show that phenological models based on degree-days fit the blister moth phenology well. While we show here that the proposed framework is feasible in principle, its full implementation requires that data collection processes be optimized and that the phenological models be validated with independent data.
Our results show that DNNs are suitable tools for pest damage classification based on image data, as has been shown earlier for other classification tasks (e.g., Mohanty et al. 2016, Cheng et al. 2017). Trained DNNs reached good model performance for categorizing blister moth damages, with a classification accuracy of 93.8% (F1 score of 0.932); all other single class DNNs were trained successfully with F1 scores above 0.86, except Physical damages (0.69) and Lepidoptera (0.32) (see Appendix S1: Figs. S6 and S7 for some examples of misclassifications). In general, we observe better and more robust results for classes where more training data were available. DNNs classifying multiple classes struggled to distinguish some of the classes, but performed well overall (F1 score of 0.89 for six classes). Compared to studies predicting several classes of pathogens, the accuracies of our DNNs are slightly lower (e.g., Oppenheim and Shani 2017 reached 96% accuracy with five classes; see Barbedo 2018 for an overview). A reason for this could be that the manually classified part of our data set is rather small and that applying a cross-validation lowers the number of pictures available to train the DNNs.
Note: Degree-days above 5°C (GDD), mean temperature (Tmean), precipitation (Precip), radiation (SRad), diurnal temperature (Diur), and the model with all predictors (Full).
In general, we observe that DNNs struggle to distinguish between classes with similar symptoms, for example, between Physical damages and Feeding, which we treated as separate classes in preliminary analyses. Another reason for the decreasing model performance with more classes may lie in the co-occurrence of damages on one leaf, but general solutions to properly identify such simultaneous damages are still lacking (Barbedo 2018). Further, we show that DNNs are also capable of classifying pest damages in images taken under field conditions (see Results, Deep neural networks), which is crucial for developing useful pest or disease recognition tools, as the goal should be application in the field (Sladojevic et al. 2016, Picon et al. 2019). Overall, DNNs established with field pictures show good results but, as expected, reach slightly lower performance than DNNs for standardized pictures, because external influences such as shading effects, multiple leaves, other plant parts, or irrelevant objects in the background can be disturbing (Ferentinos 2018). In particular, DNNs struggled with the class Physical damages. Still, the DNN for the class PLBM with field pictures reached an F1 score of 0.90 (classification accuracy of 93.1%). The full model DNN for field pictures was successful, with an F1 score of 0.85 and a classification accuracy of 87.7%. Our case study shows that developing pest damage classification tools using DNNs is feasible and that, given sufficient data, those tools can be used to obtain the phenology of damage classes by analyzing big data sets.
We highlight that reconstructing damage phenology with DNNs, coupled with meteorological data, opens up new possibilities to produce phenological models for pest forecasting. Big data science has recently been proposed to help overcome current limitations in pest forecasting (Orlandini et al. 2018). In our case study, we find that degree-days is the most important variable to model blister moth phenology. Degree-days have been shown to be a reliable predictor of insect development (e.g., Cayton et al. 2015) and are important components of phenology models for insects (Nietschke et al. 2007). This is important for the development of operational systems, as temperature data are available at high spatial and temporal resolution in many regions of the world. We validate the reconstructed phenology of the blister moth with count data on adults and mines obtained from the same study sites, showing that classification tools are able to reconstruct the real phenology. Further, we show that our approach is suitable to reconstruct the phenology of other classes and therefore could be used to investigate not only the phenology of insects but also the phenology of other types of damages (e.g., Mildew). With sufficient image data, the prediction approach may also be implemented for pests and pathogens to find meteorological signals behind their seasonal occurrence or the occurrence of the entailed damages. Successful recognition tools for pathogens have already been developed (e.g., Fuentes et al. 2017, Liu et al. 2018), and the seasonal occurrence of pathogens is often limited by abiotic factors (Rossi et al. 2010). In addition, we emphasize that the framework would also be suitable for modeling the phenology of invasive species, because it can be implemented quickly. However, expert recommendations on management interventions need to be based on solid testing of control strategies by plant protection experts. Particularly for invasive species, this is crucial to implement sustainable control. We find promising results in this case study, underlining that the proposed framework could bring new opportunities for pest forecasting, provided that new methods help to overcome the lack of data availability.
While this case study highlights new possibilities for pest damage forecasting, we came across some limitations that need to be addressed in future studies. One of the main limitations is the low number of images of some classes that we used to train the DNNs, which also limits the ability of the DNNs to generalize to an independent data set. Here, we use only a subset of our data set, in which the different classes are not equally frequent. This means that for some classes, only a few images are available for training and evaluation of a DNN. In contrast, previous studies established DNNs using the full data of large data sets (e.g., PlantVillage) for model training and evaluation (e.g., Mohanty et al. 2016). However, such image data are limited, and previous studies in the field of plant disease classification often rely on the same data set and similar tools, resulting in low variation between the results of these studies (Barbedo 2018, Arsenovic et al. 2019). Increasing the input data for DNNs could therefore promote higher robustness and performance of the classification tools for some classes (Sladojevic et al. 2016) and increase the generalization ability of the DNNs. Further, within our sampling period, the variation in meteorological conditions was rather small. Longer-term surveys are required to capture a broader range of meteorological settings, leading to more robust phenological models. Similarly, data from long-term monitoring programs are needed to validate phenological models. With the implementation of new data collection strategies, these limitations may be overcome.
To address the limitations of the current work and to provide the basis of the proposed framework for pest forecasting, innovative data collection strategies must be established. We present perspectives and potential approaches for acquiring data for the proposed framework for pest damage forecasting. The main disadvantage of deep learning is the amount of data needed (Kamilaris and Prenafeta-Boldú 2018), and not many agricultural image data sets are publicly available (Kamilaris et al. 2017, Arsenovic et al. 2019). To overcome this data scarcity, we propose two approaches for data collection. First, the increasing number of smartphones used worldwide, which can record images, sound, and location, offers wide scope for gathering large data sets, in particular in the context of citizen science (Teacher et al. 2013). In agriculture, particularly in relation to pest and pathogen monitoring, there is pressing interest in this approach, and farmers are traditionally interested in participating in research projects (Ryan et al. 2018). While citizen scientists benefit from the classification tool and the pest forecasting model, the collected image data can be used to create a feedback loop in which new images are used to update the classification tool (see Fig. 1). Additionally, a citizen science approach could help detect new invasive species (Hulbert et al. 2017, Johnson et al. 2020), as farmers might want to inform themselves and alert the responsible experts when they encounter a previously unknown type of damage. Finally, the acceptance of DSS is expected to be higher if users are involved in their development (Lynch et al. 2000).
The second approach we propose here is the use of drones (Floreano and Wood 2015). Drones are expected to revolutionize precision agriculture by delivering big data that can be used for various purposes (Tripicchio et al. 2015, Finn and Donovan 2016). For example, drones have been used for weed detection, irrigation equipment monitoring, and crop health monitoring (Veroustraete 2015). Drones programmed with GIS inputs and equipped with high-resolution cameras (e.g., 15 megapixel; Shankar et al. 2018) are suitable tools to collect data in a structured way, which can then be analyzed with deep learning algorithms (Shankar et al. 2018). Together, these approaches highlight opportunities to overcome the lack of image data sets on pests and pathogens, making it possible to advance the proposed framework for pest forecasting and providing groundwork for other novel technologies supporting sustainable agriculture.

CONCLUSIONS
In conclusion, we present a framework for developing pest monitoring and forecasting tools that rely on big data and deep learning. A nonrepresentative survey suggested that farmers are generally interested in the development of new forecasting tools and that there is a demand for new technologies with broad applicability in plant protection. In this study, we focus on the segments of this framework connected to building DNNs and to coupling the phenology reconstructed with those DNNs with meteorological variables to produce phenology models. The case study on blister moth phenology highlights that this approach is feasible. DNNs performed well in classifying different types of damage from pictures of leaves taken under standardized conditions and in the field. Further, the phenology of the blister moth obtained from DNNs closely matched the phenology observed with count data in the field. While damage classification tools are valuable instruments for pest and pathogen monitoring, using those tools to reconstruct the damage phenology and coupling it with meteorological data will open new opportunities for early warning. Together, this study highlights that big data and modern technologies provide new opportunities to advance sustainable plant protection. To overcome the scarcity of data, which is the main limiting factor for such data-driven approaches, we suggest data collection based on citizen science or drones. Increasing data availability would not only support this framework for pest damage forecasting, but also foster further development toward applying modern information technology to tackle current agricultural challenges.

Fig. 1. Conceptual figure showing the overall goal of the framework for developing pest damage forecasting tools. Data collection can be implemented with drones or citizen science approaches (a). The collected data can be used to train deep neural networks (DNNs) for image classification to recognize pest and pathogen damage (b). Once the classification tool is established, predictions from the deep neural networks can be used to reconstruct the phenology of pest damage (c). Coupling damage phenology with meteorological data enables the establishment of phenology models (d). Eventually, these phenology models can be used for predictions and in DSS to inform growers and experts about pest occurrence, for instance by implementing the tool in a smartphone app (e). Image classification tools can then reinforce the data collection when they are made available to citizen scientists (f). In this study, we focus on the deep learning and phenology modeling aspects within this framework.

Fig. 2. DNN performance for the 14 classification tasks, measured as classification accuracy. Boxes show the variation in classification accuracy over the fivefold cross-validation. Red boxes show the results for standardized pictures and blue boxes those for field pictures. For the F1 scores, see Appendix S1: Fig. S2. Images show the different classes considered in the study. From left to right: Undamaged, PLBM, Physical damage, Lepidoptera, Brown spots, and Mildew. The right-most boxes show the performance of the full model.

Fig. 3. Comparison between the phenology of blister moth mines reconstructed with DNNs (red line) and count data of mines from the field (blue line). The black line marks the count data of adults caught in traps. The hatched areas in the background show a 4-week timespan between the peaks of the blister moth generations (adults) and the local peak of mines. Panels show the data for (a) Gelfingen standardized pictures, (b) Gelfingen field pictures, (c) Kleinwangen standardized pictures, and (d) Kleinwangen field pictures.

Fig. 4. Comparison of the seasonal evolution of the inferred damage phenology (DNN) and the phenology modeled with GLMs. Panels show the data for (a) Gelfingen standardized pictures, (b) Gelfingen field pictures, (c) Kleinwangen standardized pictures, and (d) Kleinwangen field pictures. The x-axis starts on 1 January 2019 and shows the calendar weeks. Sampling started in week 16 and lasted for 19 weeks.
tology (MeteoSwiss; meteoswiss.admin.ch). Daily data were extracted for the year 2019 and aggregated to weekly resolution to match the weekly sampling rate of the damage monitoring. The statistical models employed to create the gridded data are described in Ceppi et al. (2012) and Frei (2014) (daily minimum, maximum, and mean temperature), Frei and Isotta (2019) (precipitation), and Dürr and Zelenka (2009) (solar radiation). To track phenology, we further calculated the accumulated temperature sum (i.e., degree-days) as the cumulative sum of the mean temperature above 5°C on a daily basis.
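The degree-day calculation described above can be illustrated with a minimal sketch. It assumes that each day contributes the excess of its mean temperature above the 5°C base (and nothing on colder days), and that the accumulated sum is read off at the end of each complete 7-day week to match the weekly sampling. Both choices are one plausible reading of the text, not necessarily the exact procedure used in the study. The function names and the temperature values are hypothetical.

```python
from itertools import accumulate

BASE_TEMP_C = 5.0  # base temperature stated in the text


def daily_degree_days(mean_temps_c, base=BASE_TEMP_C):
    """Daily excess of mean temperature above the base (0 on colder days).

    Assumption: only the excess above the base is counted, which is the
    common degree-day convention; the source does not spell this out.
    """
    return [max(t - base, 0.0) for t in mean_temps_c]


def accumulated_degree_days(mean_temps_c, base=BASE_TEMP_C):
    """Running (cumulative) degree-day sum, one value per day."""
    return list(accumulate(daily_degree_days(mean_temps_c, base)))


def weekly_values(daily_values, days_per_week=7):
    """Accumulated value at the end of each complete week (assumed aggregation)."""
    return [
        daily_values[i]
        for i in range(days_per_week - 1, len(daily_values), days_per_week)
    ]


# Hypothetical daily mean temperatures (deg C) for two weeks
temps = [3.0, 6.0, 8.0, 4.5, 10.0, 12.0, 7.0,
         9.0, 11.0, 5.0, 2.0, 13.0, 14.0, 6.5]

accumulated = accumulated_degree_days(temps)
weekly = weekly_values(accumulated)
print(weekly)
```

The resulting weekly series could then serve as one of the meteorological predictors in a phenology model, alongside weekly temperature, precipitation, and solar radiation.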

Table 1. Model deviance (as adjusted D² values) for GLMs on blister moth phenology with different meteorological predictor variables.

Table 2. Akaike's information criterion (AIC) for GLARMA models on blister moth phenology with different meteorological predictor variables.