Open Access
Plant Identification Through Images: Using Feature Extraction of Key Points on Leaf Contours
6 November 2013
Chih-Ying Gwo, Chia-Hung Wei

Because plant identification demands extensive knowledge and uses complex terminology, even professional botanists need considerable time in the field to master it (Rademaker, 2000). Information systems have therefore often been regarded as a possible aid to plant identification: a personal digital device photographs the whole plant or a portion of it, and the system performs the recognition. Plants may be recognized through their leaves, flowers, roots, and fruits, which reflect the diversity of shapes available within an organism. In particular, the shapes of leaves and of floral organs, which are modified leaves, are especially important (Tsukaya, 2006), and leaves are considered an especially useful characteristic for species identification (Gu et al., 2005; Du et al., 2007; Wu et al., 2007). For example, the free mobile app Leafsnap (http://leafsnap.com) has been developed to identify tree species from photographs of their leaves. Marcysiak (2012) examined the morphology of Salix herbacea L. leaves for intraspecific morphological variation. A total of 3890 leaves from 503 individuals were statistically analyzed based on leaf shape characters, and notable variation in these characters was found at different levels, both within and between individuals. Similarly, Gailing et al. (2012) identified morphological species and differentiation patterns in two species that hybridize with each other, Quercus rubra L. and Q. ellipsoidalis E. J. Hill. The two species were identified as two clusters when leaf morphological characters were measured. Furthermore, two populations of Q. ellipsoidalis were differentiated from eight other populations through analysis of leaf morphological characters. Therefore, leaf recognition through images can be considered an important research issue for plant recognition.

Shape is one of the most important features for describing an object. Humans can easily identify various objects and classify them into different categories solely from their outlines. Shape often carries several types of contour information, which are used as distinctive features for the classification of an object. In the MPEG-7 standard, shape descriptors are divided into region-based shape descriptors and contour-based shape descriptors (Zhang and Lu, 2003a). Region-based shape descriptors such as Zernike moments (Wee and Paramesran, 2007) describe a shape using both boundary and interior pixel information, and can therefore depict complex objects with filled regions (Bober et al., 2002). In contrast, contour-based descriptors exploit only the boundary information of an object and comprise conventional and structural representations. Conventional descriptors such as curvature scale space (CSS) (Mokhtarian et al., 2005) retain the overall shape of an object during calculation. Structural descriptors such as chain code fragment the shape of an object into different boundary segments (Zhang and Lu, 2003b).

Because the morphology of leaves is commonly used for plant identification, the studies shown in Table 1 have examined the shape and morphological description of plant leaves. As leaf recognition can be regarded as an image classification problem, various types of neural networks have been proposed for identifying the species to which a given leaf belongs. Chaki and Parekh (2011) presented a schematic for the automated detection of three classes in a plant species by analyzing the shapes of leaves and using several neural network classifiers. Gao et al. (2010a) proposed a neural network classifier based on prior evolution and iterative approximation for leaf recognition. Huang and He (2008) applied probabilistic neural networks to the recognition of 30 types of broad-leaved trees, and Wu et al. (2007) used a probabilistic neural network to classify 32 types of plants. Various other classification methods have also been proposed for leaf recognition. Ehsanirad (2010) trained a classifier to categorize 13 types of plants with 65 new or deformed leaves during the testing process. In the Du et al. (2007) study, a moving median-centered hypersphere classifier was adapted to perform the classification. Hajjdiab and Al Maskari (2011) presented an approach for identifying leaf images based on the cross-correlation of distances from the centroid to the leaf contour.

Feature extraction for leaf images requires consideration of which features are most useful for representing the leaves and which methods can effectively code leaf morphologies (Wu et al., 2006). A leaf of a given species normally has a specific shape or contour; therefore, this characteristic is a reliable and meaningful indicator for leaf representation. The main contribution of this study is a feature extraction method for leaf contours that describes the significant turning points of the contour. Moreover, a classifier based on a statistical model is proposed for similarity matching with different numbers of features.

MATERIALS AND METHODS

Leaf recognition framework

The leaf recognition framework was divided into leaf modeling and leaf recognition. For leaf modeling, leaves belonging to the same species were used to detect and extract leaf features. The extracted features were then used to create a leaf model for each leaf species in the database. During leaf recognition, a query leaf was likewise processed by feature point detection and feature extraction. Using these features, the recognition system can identify the best matching model and recognize the species of the query leaf.

Object contour

The contour of object O in image I can be detected to generate the set ξ, which collects all contour points p in a Cartesian coordinate system. These contour points can be used to calculate the centroid C of the object using Equation 1.

$$C = \frac{1}{|\xi|} \sum_{p \in \xi} p \tag{1}$$

where |ξ| represents the number of edge points in set ξ. All contour points are collected in clockwise order and stored in set ξ. As several segments of an object contour contain redundant points, these redundant points can be removed through sampling. The sampling process selects one contour point out of every five points in set ξ and stores the selected points in another set S. Figure 1 illustrates the process of detecting contour points: the contour points of the leaf in Fig. 1A are sampled to produce Fig. 1B.
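The centroid and sampling steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the contour has already been detected and is available as an ordered list of (x, y) points, and the names `centroid`, `sample_contour`, and the toy contour `square` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(contour: List[Point]) -> Point:
    """Mean of all contour points, as Equation 1 is described in the text."""
    n = len(contour)
    if n == 0:
        raise ValueError("contour must contain at least one point")
    cx = sum(x for x, _ in contour) / n
    cy = sum(y for _, y in contour) / n
    return (cx, cy)


def sample_contour(contour: List[Point], step: int = 5) -> List[Point]:
    """Keep one point out of every `step` points, preserving contour order."""
    return contour[::step]


# Hypothetical toy contour: the 8 boundary pixels of a 3x3 square, in order.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
C = centroid(square)                 # centroid of the full contour
S = sample_contour(square, step=5)   # sampled subset of contour points
```

Note that the centroid is computed from the full contour set before sampling, which follows the order in which the text introduces the two steps.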

Feature extraction

In the object contour, straight lines are created between centroid C and each contour point ρ, and the length of each line is calculated. Suppose that the set of contour points is S = {ρ1, ρ2, …, ρn}. The line length len_i can be computed as

$$len_i = \lVert \rho_i - C \rVert, \quad i = 1, \ldots, n \tag{2}$$

where ‖·‖ denotes the Euclidean distance.

TABLE 1. Methods and features used in leaf recognition studies. [t01_01.gif]

Fig. 1. Detection of contour points. (A) Contour and centroid C of leaf. (B) Sampling result of contour points. [f01_01.jpg]

The distance features are normalized to create a histogram that represents the distribution of distances in the object contour. To normalize the length features, every len_i is divided by the greatest length len_max, and the results are collected in R.

$$R = \{\, r_i \mid r_i = len_i / len_{max},\; i = 1, \ldots, n \,\} \tag{3}$$
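As a hedged illustration of the distance features, the sketch below computes the centroid-to-contour lengths and divides them by the largest length. It assumes Euclidean distance and sampled points given as (x, y) tuples; the function name `normalized_lengths` and the sample coordinates are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalized_lengths(samples: List[Point], c: Point) -> List[float]:
    """Distances from centroid c to each sampled point, divided by the maximum."""
    lengths = [math.hypot(x - c[0], y - c[1]) for x, y in samples]
    longest = max(lengths)
    if longest == 0:
        raise ValueError("all sampled points coincide with the centroid")
    return [length / longest for length in lengths]


# Hypothetical sampled points around a centroid at the origin.
samples = [(3, 0), (0, 4), (-5, 0), (0, -2)]
R = normalized_lengths(samples, (0.0, 0.0))
```

Because every length is divided by the largest one, all values fall in [0, 1] and are unchanged when the whole contour is uniformly enlarged or shrunk.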

Intradifference, the variation among individual leaves of the same species, may cause misrecognition. To deal with the intradifference problem and make the classification stable, the proposed feature is processed through a fuzzy logic method. The degrees of probability from probabilistic logic (Lukasiewicz and Straccia, 2009) are introduced into the histogram, where the frequency of each bin is replaced by fuzzy scores. The fuzzy score algorithm in Appendix 1 transforms the normalized features into fuzzy scores. For example, the feature value of point A is 4.25, and it is transformed into two fuzzy values [0.5, 0.5], which are accumulated into bins [3,4] and [4,5] in the histogram. For point B, the three fuzzy values are [0, 1, 0] for bins [3,4], [4,5], and [5,6]. The two fuzzy values of point C are [0.3, 0.7] for bins [4,5] and [5,6]. Figure 2 shows how the three feature values are transformed into fuzzy values. Because ri ∈ [0, 1], the range of the normalized value is divided into N classes, with N = 24 in this study. The j represents an array and ri is assigned to the given class based on the following rules ν[•]:

e04_01.gif

Fig. 2. Probabilistic logic diagram. [f02_01.jpg]

Fig. 3. Thirteen species of plant leaves collected for this study, including sample leaves and feature histograms. [f03_01.jpg]

TABLE 2. Recognition results of the proposed features for the training set and the test set. [t02_01.gif]

Each object can result in a histogram that represents information regarding the contour. Therefore, these resulting histograms can be used to estimate the matching degree between any two objects.
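The idea of replacing hard bin counts with fuzzy scores can be illustrated with a generic soft-binning scheme. The sketch below is an assumption, not the Appendix 1 algorithm (which is not reproduced in this text): it shares each normalized value linearly between the two bins whose centers surround it, so a value at a bin center contributes fully to that bin and a value on a bin boundary is split evenly. It is not claimed to reproduce the worked values for points A, B, and C. The name `fuzzy_histogram` is hypothetical; the default of 24 bins follows N = 24 in the text.

```python
import math
from typing import List


def fuzzy_histogram(values: List[float], n_bins: int = 24) -> List[float]:
    """Soft histogram over [0, 1]: each value adds a total weight of 1,
    shared linearly between the two nearest bin centers (an assumed scheme)."""
    hist = [0.0] * n_bins
    for r in values:
        if not 0.0 <= r <= 1.0:
            raise ValueError("normalized values must lie in [0, 1]")
        t = r * n_bins - 0.5          # position measured in bin-center units
        lower = math.floor(t)
        frac = t - lower
        if lower < 0:                 # left of the first bin center
            hist[0] += 1.0
        elif lower >= n_bins - 1:     # at or right of the last bin center
            hist[n_bins - 1] += 1.0
        else:
            hist[lower] += 1.0 - frac
            hist[lower + 1] += frac
    return hist


# With 4 bins, 0.375 is the center of bin 1 and 0.5 is the boundary of bins 1 and 2.
example = fuzzy_histogram([0.375, 0.5], n_bins=4)
```

Under this scheme a small shift in a feature value moves weight gradually between neighboring bins instead of flipping a whole count, which is the stabilizing effect the fuzzy scores are meant to provide.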

Classifier of statistical model

Once the leaf features X = (x1, x2, …, xn) are extracted from a leaf, the leaf classifier can be expressed using the following equation:

$$i^{*} = \arg\max_{i} P(T_i \mid X) \tag{5}$$

where Ti is the model of leaf species i and P(Ti|X) is the discriminant function of Ti. Bayes' theorem gives

$$P(T_i \mid X) = \frac{f(X \mid T_i)\, P(T_i)}{f(X)} \tag{6}$$

where ƒ(·) is the probability density function. Because ƒ(X) is the same for every model Ti, it is a common term that does not affect which model attains the maximum probability when fi01_01.gif is estimated. If we assume a uniform prior probability P(Ti) on the species identity, the discriminant function in Equation 5 can be simplified as

$$\arg\max_{i} P(T_i \mid X) = \arg\max_{i} f(X \mid T_i) \tag{7}$$

To reduce computational complexity, we further assume that (x1, x2, …, xn) are mutually independent features. Equation 7 can then be transformed into Equation 8

$$f(X \mid T_i) = \prod_{k=1}^{n} f(x_k \mid T_i) \tag{8}$$

TABLE 3. Recognition results of Zernike moments for the training set and the test set. [t03_01.gif]

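If x is normally distributed with mean µ and variance σ², that is, x ∼ N(µ, σ²), then its density is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp\!\left( -\frac{(x-\mu)^{2}}{2\sigma^{2}} \right) \tag{9}$$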

To compute the exponential value efficiently, we use the logarithm of the discriminant function

e10_01.gif
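For reference, the logarithm of a product of independent normal densities can be worked out as follows. This is a sketch under the independence and normality assumptions stated above; the symbols µ_{ik} and σ_{ik}² for the mean and variance of the k-th feature in model T_i are notation assumed here, and the published Equation 10 may group or drop the constant terms differently.

```latex
\begin{align}
\ln f(X \mid T_i)
  &= \sum_{k=1}^{n} \ln f(x_k \mid T_i) \\
  &= \sum_{k=1}^{n} \left[ -\tfrac{1}{2}\ln\!\left(2\pi\sigma_{ik}^{2}\right)
     - \frac{(x_k-\mu_{ik})^{2}}{2\sigma_{ik}^{2}} \right]
\end{align}
```

Because the logarithm is monotonically increasing, the model that maximizes this sum is the same model that maximizes the original product, and the sum avoids multiplying many very small density values.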

which is referred to as the score function. Thereafter, c sample leaves of each species in the training set are used to estimate the parameters fi02_01.gif and fi03_01.gif of each Ti as follows:

e11_01.gif
e12_01.gif

where fi04_01.gif represents the k-th feature of the j-th sample leaf in the i-th species.

TABLE 4. Recognition results of curvature scale space for the training set and the test set. [t04_01.gif]
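The modeling and scoring steps can be sketched together as a small Gaussian classifier with independent features. This is an illustration under stated assumptions rather than the authors' code: the variance is estimated with a 1/c normalization, a small variance floor (`var_floor`, a hypothetical parameter) is added to avoid division by zero for features that never vary, and the names `fit_model`, `log_score`, `classify`, and the toy training data are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Model = Tuple[List[float], List[float]]  # (per-feature means, per-feature variances)


def fit_model(samples: List[List[float]], var_floor: float = 1e-6) -> Model:
    """Estimate the mean and variance of each feature from c sample leaves."""
    c = len(samples)
    n = len(samples[0])
    means = [sum(s[k] for s in samples) / c for k in range(n)]
    variances = [
        max(sum((s[k] - means[k]) ** 2 for s in samples) / c, var_floor)
        for k in range(n)
    ]
    return means, variances


def log_score(x: List[float], model: Model) -> float:
    """Sum of log normal densities over features (independence assumed)."""
    means, variances = model
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xk - m) ** 2 / (2 * v)
        for xk, m, v in zip(x, means, variances)
    )


def classify(x: List[float], models: Dict[str, Model]) -> str:
    """Return the species whose model gives the highest log score."""
    return max(models, key=lambda name: log_score(x, models[name]))


# Hypothetical two-feature training data for two species.
training = {
    "species_a": [[0.1, 0.9], [0.3, 0.7]],
    "species_b": [[0.8, 0.2], [0.6, 0.4]],
}
models = {name: fit_model(leaves) for name, leaves in training.items()}
predicted = classify([0.25, 0.75], models)
```

Sorting the species by `log_score` instead of taking only the maximum would give the ranked list of candidate species used when reporting results by position.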

RESULTS AND DISCUSSION

This study examined 13 species of fresh plant leaves as shown in Fig. 3. This figure also includes some sample leaves and the feature histogram of a given leaf. For each species, separate images of 40 plant leaves were used to evaluate the proposed features and algorithms. The first 20 images in each species are regarded as the training set and the last 20 images are the test set. Furthermore, a feature histogram ν[•] was created for all leaves. Equation 11 and Equation 12 are applied to compute µj and fi05_01.gif for each plant species. Moreover, mean fi06_01.gif and variance fi06_01.gif of centroid Ci are computed for each leaf.

e13_01.gif
e14_01.gif

Fig. 4. Two leaf contour images and their corresponding feature histograms. Although the two leaves belong to the same species, their histograms present two greatly different feature curves. [f04_01.jpg]

Table 2 reports the recognition results for the training set and the test set as a tredecuple, an ordered list of 13 positions, one per species. The first position corresponds to leaves for which the correct species was identified as the most probable species, the second position to leaves for which the correct species was ranked second most probable, and so on. The correct species should be ranked as high as possible. For the training set, the first position reaches 93.1% and the first two positions together reach 99.2%. For the test set, the first position reaches 92.7% and the first two positions together reach 97.3%. The recognition performances for the training set and test set are therefore close.
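The ranked reporting described above can be sketched as follows, reading "the first two" as a cumulative rate over the first two positions (an interpretation of the text). The function names and the toy scores and ranks are hypothetical.

```python
from typing import Dict, List


def rank_of_truth(scores: Dict[str, float], truth: str) -> int:
    """1-based position of the true species when species are sorted by score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(truth) + 1


def cumulative_rates(ranks: List[int], n_species: int) -> List[float]:
    """Fraction of leaves whose true species appears within the first k positions."""
    total = len(ranks)
    return [
        sum(1 for r in ranks if r <= k) / total
        for k in range(1, n_species + 1)
    ]


# Hypothetical log scores for one query leaf whose true species is "c".
example_rank = rank_of_truth({"a": -1.0, "b": -3.0, "c": -2.0}, "c")

# Hypothetical ranks of the true species for five query leaves, three species.
rates = cumulative_rates([1, 1, 2, 1, 3], n_species=3)
```

With 13 species, `cumulative_rates` returns a list of 13 values, the first of which is the strict first-position recognition rate.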

Fig. 5. A binary leaf image presented at sizes from 90% to 10% of the original size to verify scale invariance, with their corresponding histograms. In these histograms, the horizontal axis and vertical axis represent feature number and feature value, respectively. [f05_01.jpg]

Zernike moments and curvature scale space, two popular methods that are both invariant to scale and rotation, were tested in the same experimental setup. The Zernike moments are derived from a set of complex polynomials that are orthogonal over the interior of the unit circle and defined in polar coordinates. The recognition results of the two methods for the training set and test set are shown in Table 3 and Table 4. Comparing the recognition rates for the first probable plant species, the results in Tables 2–4 indicate that the proposed method outperforms Zernike moments and curvature scale space.

Fig. 6. A binary leaf image rotated clockwise from 10° to 90° to verify rotation invariance, with corresponding histograms. In these histograms, the horizontal axis and vertical axis represent feature number and feature value, respectively. [f06_01.jpg]

Leaves belonging to the same species may still differ greatly in contour. For example, Fig. 4 shows two leaf contours and their corresponding feature histograms. Although the two leaves belong to the same species, their histograms present two greatly different feature curves. An erroneous recognition happens when the feature curve of a given leaf is closer to the model of another species than to that of the correct species. The problem could be addressed by building multiple models for the same species, which is a potential research issue for other researchers to investigate.

The experimental results indicate that the correct recognition rate for the test set is 92.7% if we strictly examine the first-position plant of the recognition result. In other words, the erroneous recognition rate is approximately 7.3%. One possible cause of the erroneous recognition is the choice of the parameter N in Equation 4, which affects the fuzzy feature during feature extraction. When N is set higher, leaves belonging to the same species may be regarded as different species. When N is set lower, leaves belonging to different species may be regarded as the same species. A similar parameter determination issue arises for the length of the interval used for sampling contour points.

Scale invariance

To verify the scale invariance, a binary image was shrunk to various sizes from the original image (from 90% to 10%). The features of the different-sized images were extracted to create their corresponding histograms as shown in Fig. 5. Correlation coefficients were computed for the similarity of any two scale ratios. This test was performed on 45 comparison sets. These 45 correlation coefficients fell between the minimal value 0.98611 and the maximal value 0.99992, indicating a strongly positive correlation. The results indicate that the 10 feature histograms are very similar in terms of correlation coefficients. The curve in the feature histogram does not fluctuate considerably even when the image is shrunk to 10% of the original scale. These results also confirm that the proposed features are invariant to scale.
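The pairwise comparison can be sketched as follows. Ten histograms compared two at a time give 10 × 9 / 2 = 45 pairs, which matches the 45 comparison sets reported. The sketch assumes the Pearson correlation coefficient is the correlation measure used; the function names and the toy histograms are hypothetical and do not reproduce the reported coefficients.

```python
import math
from itertools import combinations
from typing import List


def pearson(a: List[float], b: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length histograms."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def pairwise_correlations(histograms: List[List[float]]) -> List[float]:
    """Correlation coefficient for every unordered pair of histograms."""
    return [pearson(a, b) for a, b in combinations(histograms, 2)]


# Ten hypothetical histograms: a base curve with a small, growing perturbation.
base = [1.0, 3.0, 2.0, 5.0, 4.0]
histograms = [
    [value + 0.05 * i * (k % 2) for k, value in enumerate(base)]
    for i in range(10)
]
coefficients = pairwise_correlations(histograms)
lowest, highest = min(coefficients), max(coefficients)
```

The same routine applies unchanged to the rotation test, where the ten histograms come from the original image and its rotated versions.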

Rotation invariance

To verify the rotation invariance, a binary image was rotated clockwise to various degrees from the original orientation (from 10° to 90°). The features of the rotated images were extracted to create their corresponding histograms as shown in Fig. 6. Like the scale invariance test, the rotation invariance test was performed for 45 comparison sets using correlation analysis. The range of the correlation coefficients was between 0.98071 and 0.99988. These results indicate that the curves of these histograms have a very similar appearance, indicating the property of rotation invariance in the proposed features.

CONCLUSIONS

This study presents a feature extraction method for shape description and a classifier of a statistical model for different feature dimensions. The extracted features are invariant to scale and rotation, and the proposed method outperforms Zernike moments and curvature scale space. If the shape of leaves within a species varies substantially, multiple leaf templates are suggested for creating the species leaf model. We will extract more features from the patterns of the leaf vein and positions of the petioles of leaves in a future study to improve recognition performance.

LITERATURE CITED

1. M. Bober, F. Preteux, and W.-Y. Y. Kim. 2002. Shape descriptors. In B. S. Manjunath, P. Salembier, and T. Sikora [eds.], Introduction to MPEG-7: Multimedia content description interface, 231–260. Wiley, Oxford, United Kingdom.

2. J. Chaki, and R. Parekh. 2011. Plant leaf recognition using shape based features and neural network classifiers. International Journal of Advanced Computer Science and Applications 2: 41–47.

3. J.-X. Du, X.-F. Wang, and G.-J. Zhang. 2007. Leaf shape based plant species recognition. Applied Mathematics and Computation 185: 883–893.

4. A. Ehsanirad. 2010. Plant classification based on leaf recognition. International Journal of Computer Science and Information Security 8: 78–81.

5. O. Gailing, J. Lind, and E. Lilleskov. 2012. Leaf morphological and genetic differentiation between Quercus rubra L. and Q. ellipsoidalis E.J. Hill populations in contrasting environments. Plant Systematics and Evolution 298: 1533–1545.

6. L. Gao, X. Lin, M. Zhong, and J. Zeng. 2010a. A neural network classifier based on prior evolution and iterative approximation used for leaf recognition. In Proceedings of the Sixth International Conference on Natural Computation, Yantai, Shandong, China, 10–12 August 2010, vol. 2, 1038–1043. Institute of Electrical and Electronics Engineers, New York, New York, USA.

7. L. Gao, X. Lin, W. Zhao, S. Chen, and H. Huang. 2010b. An algorithm of excising leafstalk while keeping its main body intact for leaf recognition. In Proceedings of the 3rd International Congress on Image and Signal Processing, Yantai, Shandong, China, 16–18 October 2010, vol. 6, 2732–2736. Institute of Electrical and Electronics Engineers, New York, New York, USA.

8. X. Gu, J.-X. Du, and X.-F. Wang. 2005. Leaf recognition based on the combination of wavelet transform and Gaussian interpolation. In D.-S. Huang, X.-P. Zhang, and G.-B. Huang [eds.], Advances in intelligent computing. Proceedings of the International Conference on Intelligent Computing, 23–26 August 2005, Hefei, China. Lecture Notes in Computer Science, vol. 3644, 253–262. Springer, Berlin, Germany.

9. H. Hajjdiab, and I. Al Maskari. 2011. Plant species recognition using leaf contours. In Proceedings of the IEEE International Conference on Imaging Systems and Techniques, 17–18 May 2011, Batu Ferringhi, Penang, Malaysia, 306–309. Institute of Electrical and Electronics Engineers, New York, New York, USA.

10. L. Huang, and P. He. 2008. Machine recognition for broad-leaved trees based on synthetic features of leaves using probabilistic neural network. In Proceedings of the International Conference on Computer Science and Software Engineering, 12–14 December 2008, Wuhan, Hubei, China, vol. 6, 871–877. Institute of Electrical and Electronics Engineers, Los Alamitos, California, USA.

11. Y.-P. Liao, H.-G. Zhou, and G.-R. Fan. 2010. Accelerating recognition system of leaves on Nios II embedded platform. In Q. Luo and C.-L. Kuo [eds.], Proceedings of the International Symposium on Computer Communication Control and Automation, 5–7 May 2010, Tainan, Taiwan, vol. 2, 334–337. Institute of Electrical and Electronics Engineers, New York, New York, USA.

12. T. Lukasiewicz, and U. Straccia. 2009. Description logic programs under probabilistic uncertainty and fuzzy vagueness. International Journal of Approximate Reasoning 50: 837–853.

13. K. Marcysiak. 2012. Variation of leaf shape of Salix herbacea in Europe. Plant Systematics and Evolution 298: 1597–1607.

14. F. Mokhtarian, Y. Khim Ung, and Z. Wang. 2005. Automatic fitting of digitised contours at multiple scales through the curvature scale space technique. Computers & Graphics 29: 961–971.

15. C. A. Rademaker. 2000. The classification of plants in the United States Patent Classification system. World Patent Information 22: 301–307.

16. H. Tsukaya. 2006. Mechanism of leaf-shape determination. Annual Review of Plant Biology 57: 477–496.

17. C.-Y. Wee, and R. Paramesran. 2007. On the computational aspects of Zernike moments. Image and Vision Computing 25: 967–980.

18. Q. Wu, C. Zhou, and C. Wang. 2006. Feature extraction and automatic recognition of plant leaf using artificial neural network. Research in Computing Science 20: 5–12.

19. S. G. Wu, F. S. Bao, E. Y. Xu, W. Yu-Xuan, C. Yi-Fan, and X. Qiao-Liang. 2007. A leaf recognition algorithm for plant classification using probabilistic neural network. In Proceedings of the 2007 IEEE International Symposium on Signal Processing and Information Technology, 15–18 December 2007, Cairo, Egypt, 11–16. Institute of Electrical and Electronics Engineers, New York, New York, USA.

20. D. Zhang, and G. Lu. 2003a. Evaluation of MPEG-7 shape descriptors against other shape descriptors. Multimedia Systems 9: 15–30.

21. D. Zhang, and G. A. Lu. 2003b. A comparative study of curvature scale space and Fourier descriptors for shape-based image retrieval. Journal of Visual Communication and Image Representation 14: 41–60.

Appendices

APPENDIX 1. The fuzzy score algorithm. [tA01_01.gif]
Chih-Ying Gwo and Chia-Hung Wei "Plant Identification Through Images: Using Feature Extraction of Key Points on Leaf Contours," Applications in Plant Sciences 1(11), (6 November 2013). https://doi.org/10.3732/apps.1200005
Received: 13 November 2013; Accepted: 1 September 2013; Published: 6 November 2013
KEYWORDS
classifier of statistical model
edge detection
feature extraction
leaf recognition