Macadamia is a recently domesticated nut crop derived from the Australian subtropical rainforest species Macadamia integrifolia Maiden & Betche and M. tetraphylla L. A. S. Johnson and their hybrids. Within the genus, all species, including M. ternifolia F. Muell. and M. jansenii C. L. Gross & P. H. Weston, are under threat of genetic erosion (Mast et al., 2008; Costello et al., 2009). Commercial cultivars were developed primarily in Hawaii and are only a few generations removed from Australian wild progenitors (Hardner et al., 2009). Macadamia are preferentially out-crossing and take four to five years to reach maturity. For breeding programs to progress effectively, there is a need to discriminate among clonally propagated industry standard cultivars and novel selections well before maturity. Although the 17 available M. integrifolia microsatellite markers with perfect repeats were tested in our laboratory (Schmidt et al., 2006), only four amplified successfully. These results are consistent with previous research on M. integrifolia (Neal, 2008), and no published study has used more than four polymorphic markers (Shapcott and Powell, 2011; Spain and Lowe, 2011). Additional microsatellite markers are needed to support conservation studies and breeding programs.
Next-generation sequencing (NGS) platforms are now routinely used for isolation of microsatellite, or simple sequence repeat (SSR), loci from plants (Egan et al., 2012). Long-read platforms are commonly used because reads of 300 to 500 bp in length may contain both the SSR motif and flanking sequence for primer design (Zalapa et al., 2012). Together, paired-end reads from short-read platforms also contain the SSR motif and flanking sequence for primer design at a lower cost per base (Silva et al., 2013). The aim of this study was to develop polymorphic microsatellite markers for Macadamia using paired-end Illumina reads with and without prior de novo assembly.
METHODS AND RESULTS
Fresh leaf material was collected from macadamia nut cultivars at Clunes Varietal Trial M2, Clunes, New South Wales, Australia (Stephenson and Gallagher, 2000). Additional cultivars and clones of wild-collected individuals of all four Macadamia species were sourced from the Australian Macadamia Germplasm Collection at Alstonville Tropical Fruit Research Station, NSW Department of Primary Industries. Herbarium material is deposited at the Southern Cross University Medicinal Plant Herbarium (PHARM), Lismore, New South Wales, Australia (Appendix 1). Fresh leaf material was either stored at −80°C (for Illumina sequencing) or dried after collection in a sealed container with silica gel at 10× the fresh weight of the leaf. Total DNA was extracted using a QIAGEN DNeasy Plant Kit (QIAGEN, Valencia, California, USA) according to the manufacturer's protocol. Approximately 4.5 µg of DNA extracted from one individual of M. integrifolia was submitted to the Australian Genome Research Facility, Melbourne, for sequencing. A DNA library was prepared with an Illumina TruSeq Sample Preparation Kit (version 2) following the manufacturer's instructions (Illumina, San Diego, California, USA). Genomic DNA was sheared using a Covaris S2 sonication device (Covaris, Woburn, Massachusetts, USA). DNA fragments were end-repaired, A-tailed, and ligated to adapters. Size and concentration of DNA fragments were assessed using a DNA 1000 chip on a Bioanalyzer 2100 instrument (Agilent Technologies, Santa Clara, California, USA). Average insert size of the library was 424 bp. Approximately 4 pmol of the library was paired-end sequenced (2 × 100 cycles) on an Illumina HiSeq 2000 instrument.
Table 1. Characterization of 12 polymorphic microsatellite loci developed in Macadamia integrifolia.
Paired-end reads were imported into CLC Genomics Workbench (version 4.9; CLC Bio, Aarhus, Denmark) and trimmed to remove low-quality base calls (<Q20; P < 0.01) and adapter sequences. For the purpose of primer design, reads containing SSR motifs were identified by two methods. (1) Raw sequence reads: the search function was used to identify di- and trinucleotide SSR motifs with a minimum of eight repeats. SSR regions were identified at the 3′-end of a read. Primers were then designed in the flanking regions (i.e., 5′-end of read containing SSR) and in the matching paired-end read. (2) De novo contigs: trimmed reads were assembled de novo with the following parameters: similarity index = 0.8; length fraction = 0.5; insertion/deletion cost = 3; mismatch cost = 2. Contigs were screened for SSR regions using the search function described above. To develop and optimize a suite of SSR markers for cultivar identification and gene flow studies, primers were designed for 48 loci (24 for each method) using a batch function in Primer3 version 2 (Rozen and Skaletsky, 2000), specifying a primer melting temperature (Tm) range of 58–70°C, a maximum Tm difference of 5°C, and a primer GC content of 40–60%. To minimize the cost of primer synthesis during the testing phase, one primer from each pair was 5′ modified with an engineered sequence (5′-CCCCCGGGGGC-3′) to enable the attachment of a third, fluorescently labeled primer in a two-step PCR protocol (Pacey-Miller and Henry, 2003). Primer pairs were tested for amplification success and polymorphism among 12 DNA samples including eight M. integrifolia cultivars and one individual from each Macadamia species. Of the 48 primer pairs tested, six did not amplify and seven produced multiple bands. All of the remaining 35 loci were polymorphic, with two or more alleles detected among the 12 test individuals. Primer sequences for these loci are available on request from the author.
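The motif search described above was run with the search function in CLC Genomics Workbench; its internals are not described here. As a rough illustration only, the same criterion (di- or trinucleotide units repeated at least eight times in tandem) can be sketched with a regular expression in Python. The function names, the tuple layout of a hit, and the `tolerance` parameter are assumptions made for this sketch and are not part of the published workflow.

```python
import re

def find_ssrs(read, min_repeats=8):
    """Return (unit, n_repeats, start, end) for di-/trinucleotide SSRs in a read.

    Illustrative sketch only; not the CLC Genomics Workbench search itself.
    """
    hits = []
    for unit_len in (2, 3):
        # One captured unit followed by at least (min_repeats - 1) exact copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
        for match in pattern.finditer(read):
            unit = match.group(1)
            if len(set(unit)) == 1:
                continue  # skip homopolymer runs such as AAAA...
            n_repeats = len(match.group(0)) // unit_len
            hits.append((unit, n_repeats, match.start(), match.end()))
    return hits

def ends_at_3prime(read, hit, tolerance=0):
    """True if the SSR ends within `tolerance` bases of the 3' end of the read.

    `tolerance` is a hypothetical parameter introduced for this sketch.
    """
    return len(read) - hit[3] <= tolerance

def five_prime_flank(read, hit):
    """Sequence upstream of the SSR, i.e. the region available for primer design."""
    return read[:hit[2]]

# Made-up read: 20 bases of flanking sequence followed by (CTT)8 at the 3' end.
example_read = "ACGTACGTACGTTGCATGCA" + "CTT" * 8
for hit in find_ssrs(example_read):
    print(hit, ends_at_3prime(example_read, hit), five_prime_flank(example_read, hit))
```

A read that passes `ends_at_3prime` leaves its entire 5′ portion as flanking sequence for one primer, with the second primer placed in the matching paired-end read, as described for the unassembled-read method.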
Twelve microsatellite loci were selected for further development on the basis of single band amplification, level of polymorphism, and size compatibility for pooled multilocus capillary electrophoresis. The 5′ end of one primer of each pair was fluorescently labeled (Table 1), and a single-step PCR protocol was used. Reactions were carried out in 20-µL volumes containing approximately 20 ng DNA template, 0.5 U Platinum Taq (Life Technologies, Carlsbad, California, USA), 2 µL Platinum Taq PCR buffer, 0.1 mM dNTPs, 2 mM MgCl2, 0.2 µM of each primer, and sterile water to 20 µL. Thermal cycling was conducted in a GeneAmp PCR System 9700 (Life Technologies) with the following conditions: initial denaturation at 94°C for 2 min; followed by 35 cycles of 94°C for 10 s, annealing temperature (Ta) (Table 1) for 10 s, extension at 70°C for 1 min; followed by final extension at 70°C for 5 min. Genotypes were generated using an ABI PRISM 3730 Genetic Analyzer (Applied Biosystems, Foster City, California, USA). Allele size was scored in reference to ABI PRISM GS (LIZ) internal size standards using the program Geneious version 6.1.6 (Biomatters Ltd., Auckland, New Zealand). We assessed variability and genotype consistency of the 12 loci in 22 macadamia cultivars (two to four replicate trees of each) including pure M. integrifolia and hybrids. The loci were also tested for cross-amplification in wild-collected individuals of M. integrifolia (n = 6), M. tetraphylla (n = 7), M. ternifolia (n = 2), and M. jansenii (n = 2).
After trimming, there were 245,099,904 reads, with an average length of 91.57 bp. We identified 2.29 million reads containing di- and trinucleotide SSR motifs with a minimum of eight repeats. Amplification success at 60°C annealing temperature was identical (87.5%) for primer pairs from unassembled reads and de novo assembled contigs. Genetic diversity parameters and principal coordinate analysis (PCoA) were calculated using GenAlEx version 6.5 (Peakall and Smouse, 2006, 2012) (Table 2).
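The primer-screening figures reported in this note can be cross-checked with simple arithmetic. The short Python sketch below uses only numbers stated in the text (48 primer pairs, 24 per method, six that failed to amplify, seven with multiple bands, 87.5% success for each method); the variable names are illustrative.

```python
# Counts taken from the text of this note.
total_pairs = 48
pairs_per_method = 24
failed_to_amplify = 6
multiple_bands = 7
success_rate_per_method = 0.875  # identical for both methods

# Overall amplification success across both methods.
amplified = total_pairs - failed_to_amplify
overall_rate = amplified / total_pairs

# Implied number of amplifying primer pairs per method (derived, not reported).
amplified_per_method = round(pairs_per_method * success_rate_per_method)

# Loci left after also removing those that gave multiple bands.
single_band_loci = amplified - multiple_bands

print(amplified, overall_rate, amplified_per_method, single_band_loci)
```

The per-method count is an inference from the reported percentage rather than a figure given in the text; it implies three non-amplifying primer pairs for each method, which agrees with the six failures reported overall.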
Table 2. Genetic properties of 12 microsatellite loci in Macadamia integrifolia and hybrid industry cultivars, and M. tetraphylla.
All 12 loci amplified and were polymorphic among 22 cultivars. Mean observed (Ho) and expected (He) heterozygosity were 0.571 and 0.626, respectively. A total of 71 alleles were detected, with an average of 5.9 per locus (Table 2). Unique genotypes were obtained for each cultivar with the exception of Hawaiian Agricultural Experiment Station (HAES) 741 and 660, which shared 24 of 24 alleles. Selection records for these two cultivars are the same, suggesting that they may have been sourced from the same tree at different times. Genotypes from replicate trees of cultivars were consistent, with the exception of one of three HAES 791 trees, which is presumed to have been misidentified because its genotype was identical to that of HAES 344. In M. tetraphylla, 59 alleles were found, with an average of 4.9 per locus. Mean Ho and He were 0.573 and 0.632, respectively (Table 2). All loci amplified reliably in sampled wild M. integrifolia and M. tetraphylla individuals, and were polymorphic with the exception of Mac009 in M. integrifolia. Locus Mac005 in M. jansenii and Mac001 in M. ternifolia did not amplify. The remaining 11 loci amplified in M. jansenii and M. ternifolia, and eight were polymorphic in two individuals of each of these species. Species-specific clusters were generated by two-dimensional PCoA based on genetic distance. Most cultivars clustered with wild M. integrifolia individuals, although hybrid cultivars such as A4 and A16 were intermediate between M. integrifolia and M. tetraphylla (Fig. 1).
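The diversity statistics above were calculated in GenAlEx. For readers unfamiliar with them, the following Python sketch shows the textbook per-locus definitions: number of alleles, observed heterozygosity (proportion of heterozygous individuals), and expected heterozygosity (one minus the sum of squared allele frequencies). It assumes diploid genotypes with no missing data; the function name and the allele sizes in the example are hypothetical, and the exact estimator options used in GenAlEx are not stated in this note.

```python
from collections import Counter

def locus_stats(genotypes):
    """Per-locus allele count, Ho, and He from diploid genotypes.

    genotypes: list of (allele_a, allele_b) tuples, one per individual.
    Textbook definitions only; not a reimplementation of GenAlEx.
    """
    n = len(genotypes)
    if n == 0:
        raise ValueError("at least one genotype is required")
    # Count every allele copy across all individuals (2n copies in total).
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_copies = 2 * n
    # Ho: fraction of individuals carrying two different alleles.
    ho = sum(1 for a, b in genotypes if a != b) / n
    # He: 1 - sum of squared allele frequencies.
    he = 1.0 - sum((c / total_copies) ** 2 for c in allele_counts.values())
    return {"n_alleles": len(allele_counts), "Ho": ho, "He": he}

# Hypothetical allele sizes (bp) for four individuals at one locus.
example = [(150, 152), (150, 150), (152, 154), (150, 152)]
print(locus_stats(example))
```

Averaging these per-locus values over all loci gives figures comparable to the means reported in Table 2, for example the 71 alleles across 12 loci that yield the stated average of 5.9 alleles per locus.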
The microsatellite markers developed here enable discrimination among macadamia industry cultivars and will be used to select parental genotypes in breeding programs. Cross-amplification and polymorphism of the markers in all Macadamia species will facilitate studies of population structure, gene flow, and hybridization. In this work, we demonstrate the effectiveness of Illumina NGS paired-end sequence reads for rapid and cost-effective microsatellite development with and without prior assembly of reads.
 The authors are grateful to the NSW Government Industry and Investment, Korora Research and Development, and Mustard Seed Finance Trust for funding this work. We also thank Laura Homer, Nicole Rice, Kim Wilson, Jolyon Burnett, Maria Matthes, Trevor Oleson, Peter Moutt, Michael Powell, Alison Shapcott, the Australian Macadamia Society, and the Macadamia Conservation Trust.