Alexander Knyshov, Samantha Hoang, Christiane Weirauch
Insect Systematics and Diversity 5 (2), 1-10, (31 December 2022) https://doi.org/10.1093/isd/ixab004
KEYWORDS: machine learning, automated insect identification, taxonomy, Phylinae
Automated insect identification systems have been explored for more than two decades but have only recently started to take advantage of powerful and versatile convolutional neural networks (CNNs). While typical CNN applications still require large training image datasets with hundreds of images per taxon, pretrained CNNs recently have been shown to be highly accurate, while being trained on much smaller datasets. We here evaluate the performance of CNN-based machine learning approaches in identifying three curated species-level dorsal habitus datasets for Miridae, the plant bugs. Miridae are of economic importance, but species-level identifications are challenging and typically rely on information other than dorsal habitus (e.g., host plants, locality, genitalic structures). Each dataset contained 2–6 species and 126–246 images in total, with a mean of only 32 images per species for the most difficult dataset. We find that closely related species of plant bugs can be identified with 80–90% accuracy based on their dorsal habitus alone. The pretrained CNN performed 10–20% better than a taxon expert who had access to the same dorsal habitus images. We find that feature extraction protocols (selection and combination of blocks of CNN layers) impact identification accuracy much more than the classifying mechanism (support vector machine and deep neural network classifiers). While our network has much lower accuracy on photographs of live insects (62%), overall results confirm that a pretrained CNN can be straightforwardly adapted to collection-based images for a new taxonomic group and successfully extract relevant features to classify insect species.