Morphometric tools have been tested many times on leaf form and proven to be useful (e.g., White et al., 1988; Premoli, 1996; McLellan and Endler, 1998; Jensen et al., 2002; Krieger et al., 2007; Bensmihen et al., 2008; and many others reviewed in Krieger, 2010). Generally, differences in shape (e.g., among species or genotypes) or patterns of shape variation (e.g., whether shape varies continuously between exemplar morphologies) that are detectable subjectively by the investigator become very apparent under morphometric analysis. However, these methods have not been explored with the specific goal of developing a useful, universal toolkit for plant biologists. To perform an eigenshape analysis or elliptic Fourier combined with principal components analysis (EFA-PCA), two of the most popular morphometric methods, it is first necessary to build up a set of specimens, on the order of several hundred to thousands, perform an eigenanalysis to generate a morphospace, and then assess the utility of the shape metrics that define this space (in this context, each axis x, y, z, etc. in the multidimensional morphospace is a shape metric, with movement along each axis describing some pattern of shape variation). It is often the case that the shape metrics generated this way are very similar between studies, and the differences are trivial; nevertheless, these empirically derived shape metrics, created through an analysis of variance in the data set, will necessarily reflect the samples used, so although this approach succeeds at generating useful, novel shape metrics, it fails to generate a universal toolkit of shape metrics.
One possible approach to having shape metrics that are directly comparable between studies would be to request the matrices underlying the morphospaces generated in a published morphometric study. If it is possible to match a given data set to the published data set (e.g., by correctly interpolating and downsampling specimen outlines, then converting to phi functions, submitting to Fourier analysis, performing a relative warps analysis, etc., as needed), it would then be possible to project the specimens into the published morphospace, allowing a researcher to make the same shape measurements on his or her specimens as in a published study. However, even if the technical challenge of projecting specimens into a morphospace is not too much of a hurdle, convincing an investigator to share the requisite matrices may be an insurmountable challenge. When the alternative is a prescribed measurement, such as “length multiplied by width at a point 2/3rds down the midvein,” the latter becomes very attractive by virtue of ease of use but disappoints in failing to describe the morphology for which there is such a rich set of qualitative terminology.
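Matching a data set to a published one starts with bringing every outline to the same number of points. The following is a minimal Python sketch of that resampling step only. It assumes the outline is a closed polygon stored as (x, y) tuples and that the points should be equally spaced along the perimeter; the function name `resample_outline` and the equal-spacing convention are illustrative assumptions, not taken from any published study.

```python
import math

def resample_outline(points, n):
    """Resample a closed outline to n points equally spaced along its perimeter.

    points: list of (x, y) vertices in order; the outline is closed implicitly
    (the last vertex connects back to the first). The first output point is
    the first input vertex.
    """
    closed = points + [points[0]]
    # Length of each edge and the total perimeter.
    seg = [math.dist(closed[i], closed[i + 1]) for i in range(len(points))]
    perimeter = sum(seg)
    step = perimeter / n
    out = []
    i = 0            # index of the current edge
    walked = 0.0     # perimeter length before the start of edge i
    for k in range(n):
        target = k * step
        # Advance to the edge that contains the target distance.
        while i < len(seg) - 1 and walked + seg[i] < target:
            walked += seg[i]
            i += 1
        t = (target - walked) / seg[i] if seg[i] > 0 else 0.0
        (x0, y0), (x1, y1) = closed[i], closed[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out
```

The choice of starting vertex matters for any later point-by-point comparison, so outlines would need to begin at a comparable location (for example, the leaf base) before resampling.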
Here I present a general protocol for generating broadly useful measures of outline geometry, which can be computed without the need of a large-scale morphometric analysis. The proposed protocol is very straightforward: select two end-point geometries (specifically, mathematical geometries such as a circle or ellipse, not exemplar specimens) and submit them to a coordinate-point eigenshape (CPES) analysis, which defines the vector between them, generating a single, geometric shape metric. The example explored here is circularity as measured using geometric morphometrics. Unlike existing metrics of circularity, such as the circularity shape factor (area multiplied by 4π, divided by perimeter squared) or Feret diameter ratio (diameter of the circle with same area as the specimen, divided by major axis length), this metric does not approximate circularity using perimeter (or length) and area; rather, it is measured directly from the geometry of the leaf. This geometric circularity vector represents a one-dimensional morphospace, what McGhee (1999) would term a “theoretical morphospace.” The creation of theoretical plant morphospaces has so far been a very mathematical endeavor (e.g., Niklas, 1978; Prusinkiewicz and Lindenmayer, 1990; Prusinkiewicz et al., 2007); McGhee (1999) encouraged the further development of theoretical morphospaces by plant biologists, but provided only a series of ad hoc, mathematical examples, with no protocol to be followed or guidance for those who are not mathematically inclined.
The protocol described here is the next logical step from the work of MacLeod (2002a), where a variety of exemplar shapes were subjected to eigenshape analysis to inform the delineation between the different character states they represented. The most relevant example in that paper was an eigenshape analysis of three leaf shape exemplars: simple, palmately lobed, and pinnately lobed. Use of actual leaves as exemplars meant that there was both taxon- and specimen-specific information incorporated into the analysis (a shortcoming acknowledged and discussed at length in MacLeod, 2002a). While three leaf exemplars were used (and the two other examples in the same paper used eight and 12 exemplars), it was certainly implied that two exemplars could be used, and moving from exemplars to pure, mathematical geometries (e.g., using a circle and a line, as opposed to selecting two leaves, one reasonably circular and the other reasonably linear) is also only a slight change to the methodology described there. Nevertheless, using two pure geometries as a way to generate a useful shape metric was never explicitly suggested, and it does not appear to have been subsequently attempted by the botanical community as a means to generate shape measures or theoretical morphospaces. Therefore, it seems useful to publish this protocol as a very specific application of the more generalized technique described in MacLeod (2002a), and to explicitly describe how this type of analysis of pure geometries can be of use to the plant biologist, building on their knowledge of the natural variation in plant geometry. The current protocol also benefits from the incorporation of CPES analysis, rather than the extended eigenshape analysis used in that study.
I created a geometric circularity vector, independent of any biological specimens, as a sample application of the protocol. This can be thought of as a line connecting the two shapes, with movement along the line changing shape from a line to a circle. Having to derive the mathematical description of this transformation, much less the transformation between two more complicated geometries, would be challenging; the approach described here automates this step. I calculated scores along the line-circle vector for a sample of fern leaves showing a range of morphologies between linear and circular. These “scores” can be thought of as any other measurement of leaf shape: it is simple enough to imagine a score of “3.2,” when the measurement is length in centimeters (i.e., 3.2 cm long); in the case of scores along the line-circle vector, the geometric meaning of scores can be visualized easily through modeling variation along the vector, which will show what shape corresponds to a particular score along the gradient from line to circle. To assess the line-circle vector against natural patterns of shape covariation, I compared scores along this vector to scores along a circularity axis from an empirical morphospace, generated through CPES analysis (MacLeod, 1995) of the fern leaf sample. The first principal component (or, eigenshape axis) in the CPES analysis, representing 85% of the variance in the sample, has previously been characterized as circularity (Krieger, 2007). This pattern of variation has appeared in diverse samples of leaves, usually explaining the majority of the sample variance, except where size has been left in the analysis (compare with models for a broad sample of dicot leaves in Krieger et al. [2007]; Antirrhinum L. in Weight et al. [2007]; Arabidopsis (DC.) Heynh. and Antirrhinum in Bensmihen et al. [2008]; and Hoya R. Br. in Torres et al. [2008]; in the second and third studies, size was the first principal component, and circularity was the second).
Therefore, a metric that captures this pattern is likely to be both biologically meaningful and broadly useful. The line-circle vector was also generated using CPES analysis, a geometric morphometric technique, which is why I refer to it as the geometric circularity vector (abbreviated as “circularity vector” here). The initial empirical morphospace was centered on the overall sample mean shape (standard practice in this type of analysis). I made a second comparison, instead centering the sample on the mean of the circularity vector, comparing scores in this space to scores of the same specimens along the circularity vector (see Fig. 1). This ensured that the circularity vector intersected the center of the empirical morphospace, potentially bringing the two into closer correspondence. However, this would not be standard practice for a CPES analysis; if specimens are far from the center of the morphospace, the curvature of the space may distort the relationships among specimens. Finally, I standardized scores of leaves along the circularity vector to range from 0 to 1 and compared them with calculated values of the circularity shape factor (CSF), the most commonly used metric of circularity, for each specimen. CSF (4π · area / perimeter²) is mathematically constrained to range from 0 for a line (where area = 0) to 1 for a circle (where perimeter² = 4π · area).
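For reference, the CSF can be computed directly from a digitized outline. The sketch below is illustrative only: it assumes the outline is a closed polygon of (x, y) vertices, uses the shoelace formula for area, and the function name `circularity_shape_factor` is hypothetical.

```python
import math

def circularity_shape_factor(points):
    """CSF = 4*pi*area / perimeter**2 for a closed polygon of (x, y) vertices."""
    n = len(points)
    area2 = 0.0      # twice the signed area (shoelace formula)
    perimeter = 0.0
    for i in range(n):
        x0, y0 = points[i]
        x1, y1 = points[(i + 1) % n]
        area2 += x0 * y1 - x1 * y0
        perimeter += math.hypot(x1 - x0, y1 - y0)
    return 4.0 * math.pi * (abs(area2) / 2.0) / perimeter ** 2

# A 498-point polygon approximating a circle, and a zero-width "line" outline.
circle = [(math.cos(2 * math.pi * k / 498), math.sin(2 * math.pi * k / 498))
          for k in range(498)]
line = [(0.0, 0.0), (0.0, 1.0)]   # out and back along the same segment

print(circularity_shape_factor(circle))  # very close to, but just under, 1
print(circularity_shape_factor(line))    # 0.0
```

Because a polygon with a finite number of vertices only approximates a circle, its CSF falls marginally short of 1.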
METHODS AND RESULTS
I generated the geometric circularity metric as a line-circle vector, achieved very simply by performing CPES analysis on two shapes, a line and a circle, resulting in a one-dimensional morphospace. This is the essence of the proposed protocol: select two end-point geometries and submit them to a CPES analysis, which defines the vector between them. The two shapes must be centered at the origin, have unit centroid size, and have the same number of points as was used in the interpolation of the sample outlines. The “line” is not technically a line, but a linear outline with a width of zero (a description of how to compute these two shapes is given in Appendix 1). “Scores along the line-circle vector” and “geometric circularity” will be used interchangeably.
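The two end-point shapes can be sketched in a few lines of Python. Appendix 1 is not reproduced here, so the construction below is an assumption rather than the author's procedure: points are equally spaced along each outline, both outlines start at the base and run in the same direction, and the helper names (`unit_centroid_size`, `make_circle`, `make_line`) are hypothetical.

```python
import math

def unit_centroid_size(points):
    """Center a shape on the origin and scale it to centroid size 1."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    centered = [(x - cx, y - cy) for x, y in points]
    # Centroid size: square root of the summed squared distances to the centroid.
    size = math.sqrt(sum(x * x + y * y for x, y in centered))
    return [(x / size, y / size) for x, y in centered]

def make_circle(n):
    """n points on a circle, starting at the bottom, counterclockwise."""
    return unit_centroid_size(
        [(math.cos(-math.pi / 2 + 2 * math.pi * k / n),
          math.sin(-math.pi / 2 + 2 * math.pi * k / n)) for k in range(n)])

def make_line(n):
    """Zero-width outline: up the y-axis from base to tip, then back down."""
    half = n / 2
    pts = []
    for k in range(n):
        d = k if k <= half else n - k   # steps from the base along the outline
        pts.append((0.0, d / half))
    return unit_centroid_size(pts)

N_POINTS = 498  # assumed to match the resolution of the sample outlines
circle = make_circle(N_POINTS)
line = make_line(N_POINTS)
# Point-by-point difference: the direction of the line-circle vector.
vector = [(cx - lx, cy - ly) for (cx, cy), (lx, ly) in zip(circle, line)]
```

With only two shapes in the analysis, all of the variation lies along this single difference vector, which is why the resulting morphospace is one-dimensional.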
For comparison, I performed a CPES analysis on 938 leaves from 339 specimens, selected from 17 taxa (see Appendix S1) in the genus Pleopeltis Humb. & Bonpl. ex Willd. (Polypodiaceae), or closely related genera, depending on the circumscription (for additional information on specimen selection and preparation of leaf outlines, see Krieger, 2007), with each leaf represented by a 498-point outline. The CPES analysis generated a series of variance-optimized axes, like a principal components analysis, which together defined a morphospace (see MacLeod, 1995, 1999). This space was centered at the overall sample mean shape, to which each shape was aligned using Procrustes superposition, part of the CPES analysis. The first axis in this space, ES1, appeared to correspond to circularity (Fig. 2A). Scores along the line-circle vector were strongly correlated with scores along ES1 in the original CPES analysis morphospace; there was no apparent structure to the residuals from a major axis regression, and there were no notable outliers (Fig. 2D). This showed that the geometric circularity metric closely matched natural patterns of shape covariation in this sample. It also showed, definitively, that this axis in the CPES analysis corresponded to linear-circular variation, which previously had been inferred (Krieger, 2007).
I generated a second morphospace by instead aligning specimens on the mean shape along the line-circle vector (if a series of lines are drawn between corresponding landmarks on the two shapes, the mean shape is the shape formed of the midpoints of these lines). Scores along the new ES1 (see models in Fig. 2B) were strongly correlated to scores along the line-circle vector, with no apparent structure to the residuals or outliers (Fig. 2E). Despite recentering the morphospace, the results were largely the same as with the original CPES analysis. Points scoring high on the x-axis fell slightly below the regression line. This may reflect curvature of the recentered morphospace, due to centering it on a new mean shape.
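The scoring of a specimen along the line-circle vector can be illustrated with a simplified sketch. It is not the CPES implementation: it assumes 2-D outlines stored as complex numbers, point correspondence that starts at the base of each outline, alignment of each specimen to the midpoint mean shape by centering, scaling to unit centroid size, and a least-squares rotation, and a score defined as the projection onto the unit line-to-circle direction. All function names are hypothetical.

```python
import cmath
import math

def normalize(z):
    """Center a shape (list of complex points) and scale to unit centroid size."""
    c = sum(z) / len(z)
    z = [p - c for p in z]
    size = math.sqrt(sum(abs(p) ** 2 for p in z))
    return [p / size for p in z]

def align(z, ref):
    """Rotate normalized shape z to best fit ref (least squares, 2-D)."""
    # The optimal rotation angle is the argument of sum(ref_k * conj(z_k)).
    s = sum(r * p.conjugate() for r, p in zip(ref, z))
    rot = s / abs(s) if abs(s) > 0 else 1.0
    return [rot * p for p in z]

def score(z, line, circle):
    """Position of shape z along the line-circle vector (0 at the mean shape)."""
    mean = [(a + b) / 2 for a, b in zip(line, circle)]   # midpoints of landmarks
    diff = [b - a for a, b in zip(line, circle)]
    norm = math.sqrt(sum(abs(d) ** 2 for d in diff))
    z = align(normalize(z), mean)
    # Real inner product of (z - mean) with the unit direction vector.
    return sum(((p - m) * d.conjugate()).real
               for p, m, d in zip(z, mean, diff)) / norm

n = 498
circle = normalize([cmath.exp(1j * (-math.pi / 2 + 2 * math.pi * k / n))
                    for k in range(n)])
line = normalize([complex(0, min(k, n - k)) for k in range(n)])
# A 2:1 ellipse as a stand-in specimen, long axis vertical like the line.
ellipse = [complex(0.5 * p.real, p.imag) for p in circle]

for name, shape in (("line", line), ("ellipse", ellipse), ("circle", circle)):
    print(name, round(score(shape, line, circle), 3))
```

In this sketch the line and circle score symmetrically about zero, and the ellipse falls between them.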
I rescaled scores along the line-circle vector (see models in Fig. 2C) so they ranged from 0 (for a line) to 1 (for a circle), to match the range of values for the CSF. The two linear operations to achieve this (translation of the score at the “line” end of the line-circle vector to 0, and rescaling so the score at the “circle” end is at 1) were necessary so that every study measuring geometric circularity is using the same scale. To model variation at specific points along the rescaled vector, it is necessary to back-transform from scores along (0,1) to the original range, which is easily achieved. There was a strong correlation between the CSF (using only area and perimeter) and the geometric circularity metric. The slope of the major axis regression between these two metrics was not unity (Fig. 2F), as would be expected if they matched precisely. This could not be explained by the slight difference in the area : perimeter ratio for the circle as interpolated to 498 points. More likely, it reflected the fact that the actual leaves have rough margins, which will inflate perimeter with respect to area, and that the more circular leaves tended to have rougher margins and a greater prevalence of incisions along the margin. The sample was composed largely of simple leaves with entire margins; however, there were some noticeable outliers in Fig. 2F, specimens where the two metrics noticeably differed. One such leaf is shown in Fig. 3A. Because the margin of this leaf is incised, which is typical of this fern genus, the perimeter is inflated relative to area. To the degree that a plant biologist would assess circularity as if that incision were not there, it is clear that the geometric circularity metric is superior, because leaves with similar geometric circularity values have a similar overall shape, ignoring the incision.
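The rescaling, its back-transformation, and the modeling of a shape at a given point on the 0 to 1 scale are all linear operations. A minimal sketch follows, with hypothetical function names and made-up end-point scores (the values -0.4 and 0.4 are placeholders, not results from this study). It assumes shapes are lists of (x, y) tuples with corresponding points.

```python
def rescale(score, score_line, score_circle):
    """Map a raw score to the 0 (line) to 1 (circle) scale."""
    return (score - score_line) / (score_circle - score_line)

def back_transform(value, score_line, score_circle):
    """Map a 0-1 circularity value back to the raw score range."""
    return score_line + value * (score_circle - score_line)

def model_shape(value, line, circle):
    """Shape at a given 0-1 value: linear interpolation between end points."""
    return [(lx + value * (cx - lx), ly + value * (cy - ly))
            for (lx, ly), (cx, cy) in zip(line, circle)]

# Placeholder raw scores for the two end-point shapes (illustrative only).
SCORE_LINE, SCORE_CIRCLE = -0.4, 0.4
value = rescale(0.1, SCORE_LINE, SCORE_CIRCLE)
raw = back_transform(value, SCORE_LINE, SCORE_CIRCLE)
print(value, raw)
```

Values below 0 or above 1 are passed through unchanged by these functions, so a specimen lying past either end of the vector keeps an interpretable position on the same scale.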
An application of the line-circle vector is shown in Fig. 3B, with three of the taxa used in this study, showing three levels of dimorphy between sterile and fertile leaves: strong, weaker, and very weak (but still significant). Note that all of the distributions are along an axis scaled 0 (line) to 1 (circle). Using this protocol and the approach described in Appendix 1, another set of dimorphic leaves could be measured by another researcher, and directly compared to these values. This could be used as the basis for refining the terminology as applied to varying degrees of dimorphy (e.g., dimorphic, slightly dimorphic, slightly subdimorphic), as described by MacLeod (2002a) for degree and type of leaf lobing.
CONCLUSIONS
Based on the very strong correlations between geometric circularity scores and CPES analysis scores, it is clear that geometric circularity is an example of a synthetic shape metric that can capture natural patterns of shape covariation without the need of a morphometric analysis of a large data set. This alone means an enormous amount of time can be saved, digitizing a few specimens instead of the hundreds to thousands needed to generate a morphospace. The other pressing question is whether this synthetic metric is measuring what a plant biologist would consider to be “circularity.” The example shown in Fig. 3A would suggest that geometric circularity is closer to a qualitative sense of circularity than the classic metric, CSF. Because the position along the line-circle vector is determined by Procrustes superposition, a leaf with a toothed or wavy margin will have the same score on this vector as one with a smooth margin of the same overall shape, whereas its increased perimeter will give it a much lower (more linear) value for CSF. In both of these cases, geometric circularity is relatively insensitive to features that would likely be ignored by a plant biologist making a qualitative assessment of circularity, whereas they would both have a significant impact on the measurement of CSF. There may be cases where shape factor is more biologically meaningful, e.g., when quantifying a physical process like heat exchange; however, using the term “circularity” for this seems misleading, as illustrated in Fig. 3A. The other significant issue with the classic CSF is that there is no mapping from a value to a shape, other than at the ends of the spectrum, whereas this modeling ability is inherent in the geometric approach. That is not to say there will not be multiple leaf shapes with the same geometric circularity value, just that this value is grounded in the distance between a sample leaf and the models shown in Fig. 2C. 
Unlike CSF, it is possible to have specimens that are past the ends (0,1) of the line-circle portion of the geometric circularity vector. The sample used here does not incorporate the vast variability found in leaves; there are certainly cases where the geometric circularity metric could break down, such as highly dissected leaves with little lamina, or leaves with lamina that overlaps itself. Whether real leaves will map outside of the range (0,1) remains to be seen, but it is a possibility. However, it is not necessarily a problem if they do, and such leaves are likely to also have misleading values of CSF, so it may be that neither metric is useful in such cases.
The intent of this protocol is to provide a generalized approach to generating shape metrics useful in the comparison and analysis of leaves. Ideally, we could understand geometric circularity as one component in overall form, along with patterns of variation like ovate-obovate and elliptic-oblong, which both have a grounding in classic, qualitative terminology and appear as individual axes in morphometric analyses of diverse samples of leaves (both in the set of fern leaves used here, and in the diverse set of dicot leaves used in Krieger et al. [2007]; there are hints that they are present in other studies, including EFA-PCA analyses, but shape models are not published commonly enough to be sure). The challenge to generalizing the protocol is that the selection of end point geometries is not trivial. Unlike the case for circles and lines, it is not as clear as to what corresponds to the ideal “ovate” or “obovate” geometry. One solution is to fall back to the approach advocated by MacLeod (2002a), and use exemplar shapes provided by plant biologists (e.g., Hickey, 1973; Leaf Architecture Working Group, 1999), which have been developed through the observation of large numbers of leaves by many botanists, an organic form of shape decomposition. The downside of exemplars is that they may contain taxon- or specimen-specific shape information that distracts from the pure geometries of interest. Another solution is to assemble a large, diverse set of leaves and see what geometries appear as orthogonal axes. The use of a morphospace in this way, for exploring patterns of shape covariation and identifying useful characters, is well established (e.g., MacLeod, 2002b). This is not inherently part of the protocol, but it is useful for identifying uncorrelated characters, because the axes in a CPES analysis are orthogonal, as well as for assessing the proportion of variance explained by a particular character. 
As sample size and morphological diversity increase, the individual axis geometries tend to become more geometrically pure and less taxon- or specimen-specific. These axes can be used to inform the selection of pure forms of, e.g., “obovate,” “ovate,” “elliptic,” or “oblong” (even for “elliptic,” which should be easy to define, it is not clear what ratio of major and minor axes would be appropriate). It is certainly possible to use this approach to generate nonsensical metrics, if two unrelated geometric forms are selected. Therefore, it is likely to remain valuable to use empirical morphospaces to identify naturally occurring patterns of shape covariation. I suspect that an elliptic-oblong axis will be the next easiest to develop, perhaps by building an ellipse to match the line-circle vector mean shape, and using the vector between the mean and that ellipse as a starting point. The ultimate goal is to develop a standardized geometric toolkit that replicates the qualitative metrics already in use, such as those of Hickey (1973), in a quantitative, repeatable form.
LITERATURE CITED
Notes
[1] The creation of the data set used in this work was supported through the Colorado Museum Walker Van Riper fund and a Doctoral Dissertation Improvement grant from the National Science Foundation, and through the generous help of staff at COLO, F, JEPS, MO, and NY. The author thanks two anonymous reviewers for their valuable feedback on an earlier version of this manuscript.