Open Access
How to translate text using browser tools
28 November 2019 Updating genetic relationship matrices and their inverses: a methodology note
Mohammad Ali Nilforooshan
Author Affiliations +

Most of the calculations for genetic relationship matrices and their inverses utilized in recent (t + 1) evaluations for populations of interest can be avoided by updating these matrices at t evaluations for only new animals. This study describes and develops existing methods for updating pedigree-based and marker-based relationship matrices and their inverses. Updating some of the matrices could benefit from parallel computing.


Best linear unbiased prediction (BLUP; Henderson 1973), genomic BLUP (GBLUP; VanRaden 2008), and single-step GBLUP (ssGBLUP; Aguilar et al. 2010; Christensen and Lund 2010) are the most common methods used for genetic and (or) genomic evaluation of livestock. Relationships among individuals are modelled via pedigree or genomic relationship matrices (A or G, respectively). The inverse of these matrices are required in BLUP and GBLUP, respectively. In ssGBLUP, both inverses together with the inverse of the pedigree relationship matrix for genotyped animals are required. A major computational burden for genetic evaluation systems is matrix inversion. The inversion of G has a cubic computational cost proportional to the number of genotyped animals (Meyer et al. 2013), using the conventional matrix inversion algorithms. However, the method for indirect inversion of A (Henderson 1975; Quaas 1976) has made its inversion possible at a linear cost. There is an opportunity to reduce the computation costs at time t + 1 evaluations by updating the relationship matrices and their inverses at time t for the information on the new animals. In this methodology note, methods of updating relationship matrices and their inverses are presented.


Updating G

The most common way of forming the genomic relationship matrix is G = cZZ′ (VanRaden 2008), where c = 1/[2Σpi(1 - pi)] for scaling G to be analogous to A, pi is the allele frequency at locus i, Z = M - 2P, where M is the genotype matrix with genotypes at each loci coded as {0, 1 (heterozygote), 2}, P = 1p′, where p is a column vector of pi (from the previous evaluation), and 1 is a column vector of ones with the order of the number of animals. Denoting old and young animals with 1 and 2, the genomic relationship matrix can be updated for new genotypes:



Matrices G12 and G22 are required for updating G:



where M2 and P2 are the rows of M and P for young animals. In comparison with ZZ′, ZZ2′ has (n1 + n2)mn2 computational complexity (O) rather than (n1 + n2)2m, where n1, n2, and m are the number of old genotyped animals, young genotyped animals, and markers, respectively. One unit of O is defined as a single arithmetic operation. Figure 1 shows (a) O(ZZ′) for m = 1 and (b) O(ZZ2′)/O(ZZ′) = n2/(n1 + n2) for various n1 and n2. For n2 < n1, O(ZZ′) changed linearly by n2 and exponentially increased by increasing n1 (Fig. 1a). The benefit from updating G increased by increasing n1/n2 (Fig. 1b).

Fig. 1.

The impact of various n1 and n2 on (a) (n1 + n2)2 in millions (M) and (b) n2/(n1 + n2).


Considering P′ = [P1P2′], if the old and new genotypes are from discrete populations or breeds, allele frequencies in P1 (rows of P for old genotypes) and in P2 are different (i.e., cjas-2019-0106ieq1.gifp and cjas-2019-0106ieq2.gifp2). In that case,



Even for homogeneous populations, p and p2 can be different. If n2 is relatively large and deviations between p and p2 are considerable, p should be updated to p* = (n1p + n2p2)/(n1 + n2), and c should be updated to c* (using p*) for both groups of animals. That updates eq. 2 and results in eq. 4:



Furthermore, allele frequencies in G11 should be updated, which is done by regressing G11 as if p* was applied (cjas-2019-0106ieq3.gif) to G11 (i.e., cjas-2019-0106ieq4.gif), where b = c*/c and α = 4c* [p*′p* - pp - (p* - p)′M11/n1]. The proof of α is provided in the Appendix A. Like any regression, error terms are involved (in cjas-2019-0106ieq3.gif). Therefore, cjas-2019-0106ieq3.gif is an approximation of G11 as if p* was used instead of p. Vitezica et al. (2011) used a similar approach to regress G to Agg (block of A corresponding to genotyped animals) in the context of ssGBLUP to regress G to the same base population as in Agg. Given p and p*, O(α) = (n1 + 4)(m + 1) - 2, from which c*, p*′p*, and pp have computational complexities of m and O[(p* - p)′M1′] = m + mn1. Figure 2 shows the impact of different m and n1 on O(α).

Fig. 2.

The impact of various n1 and m on (n1 + 4)(m + 1) - 2 in millions (M).


Updating G-1

Hager (1989) introduced the state-of-the-art method of updating the inverse of a matrix. Meyer et al. (2013) showed how this method can be used to update G-1, which is presented below.



where G22 = (G22 - QG12)-1 and cjas-2019-0106ieq5.gif. There are other forms of presenting the updated G-1, such as






where I is an identity matrix with the order of the number of new genotyped animals (n2). The only matrix in need of inversion is G22 - QG12, which is a small matrix because usually the number of additional genotypes is small. This method considers no changes in allele frequencies due to new genotypes. Assuming allele frequencies in the new genotyped animals causing change in the scale, but not the base of G, this method can be modified by obtaining G12 and G22 according to eq. 4, and multiplying cjas-2019-0106ieq6.gif by c/c*.

The equation for updating G-1 (eq. 5) is similar to the equation for the inverse of the pedigree relationship matrix including phantom parent groups, introduced by Quaas (1988). By substituting G22 with A-1, cjas-2019-0106ieq6.gif with Φ22, and redefining Q as the relationship coefficient matrix between the base animals and their phantom parent groups, eq. 5 is then changed to



Considering Φ22 = 0, this matrix becomes the same as the inverse of the pedigree relationship matrix including phantom parent groups (Quaas 1988). This assumption is true because there is no previous inverse for new animals (phantom parent groups) to be added.

Compared with inverting the whole G, updating G-1 reduces the matrix inversion complexity from (n1 + n2)3 to cjas-2019-0106ieq7.gif. However, there are additional complexities for matrix multiplication, equal to cjas-2019-0106ieq8.gif for Q, cjas-2019-0106ieq9.gif for G22Q, cjas-2019-0106ieq8.gif for QG22Q, and for matrix summations equal to n1 + n2. The greater the n1, and the smaller the n2, the more the advantage in updating G-1 (Meyer et al. 2013). Figure 3 shows (a) O(G-1) and (b) O(updating cjas-2019-0106ieq6.gif to G-1)/O(G-1) = [cjas-2019-0106ieq7.gif + n1n2(2n1 + n2) + n1 + n2]/(n1 + n2)3 with various n1 and n2. Generally, computational complexity of G-1 increases exponentially by increasing n1 and n2. However, with n2 smaller than n1, the trend of O(G-1) by n2 gets closer to linear (Fig. 3a). Larger n1 also caused O(G-1) to increase at a higher rate by increasing n2. The benefit from updating G-1 increased by increasing n1/n2 (Fig. 3b). The strategy of updating G-1 can be applied for updating cjas-2019-0106ieq10.gif used in ssGBLUP (Aguilar et al. 2010; Christensen and Lund 2010). However, it would be more convenient to update cjas-2019-0106ieq11.gif, where Agg is the block of A-1 for genotyped animals. The matrix cjas-2019-0106ieq12.gif is used in ssGBLUP instead of Agg in BLUP. According to Strandén and Mäntysaari (2014):



where n denotes non-genotyped animals. The Ang and Ann blocks are easy to obtain. Instead of directly inverting Ann, each column of (Ann)-1Ang can be obtained by solving the equation system Annx = y, where x and y are the jth columns of (Ann)-1Ang and Ang, respectively. Therefore, to get an updated cjas-2019-0106ieq11.gif, the new genotyped animals are appended to Ang in the above formula. However, this is assuming that the pedigree of the new genotyped animals does not contain any new non-genotyped animal.

Fig. 3.

The impact of various n1 and n2 on (a) (n1 + n2)3 in billions (B) and (b) [cjas-2019-0106ieq7.gif + n1n2(2n1 + n2) + n1 + n2]/(n1 + n2)3.


A better approach for updating G-1 is using the algorithm for proven and young (APY), developed by Misztal et al. (2014). This algorithm is better in the sense that it is computationally more feasible, it has less noise associated with noncoding markers (Nilforooshan and Lee 2019), and it may overcome the singularity problem of G, especially when the number of genotyped animals reach the number of markers. The cjas-2019-0106ieq13.gif approximate of G-1 is calculated as (Misztal et al. 2014)



where D22 is a diagonal matrix with diagonal elements cjas-2019-0106ieq14.gif, where gii is the ith diagonal element of G22, n is the number of markers, mij is the genotype of the new individual i for marker j, and gi1 is the ith row of G21. Thus, given cjas-2019-0106ieq6.gif from the previous evaluation, G12 and gii elements are required, where G12 = c(M1 - 2P1)(M2 - 2P2)′, and cjas-2019-0106ieq15.gif. Please note that forming and inverting G22 are not required.

In APY, genotyped animals are divided into core and noncore animals and the direct inversion is only required for the block of G for core animals (Misztal et al. 2014). This method can overcome the computational challenges of inverting a large G. In addition, the closer the number of genotyped animals get to the number of markers, the numerical stability of G decreases until the number of genotyped animals exceed the number of markers. At this point, G would be singular. By updating G-1 using APY, previously genotyped animals are considered as core and the new genotyped animals are considered as noncore. If genotypes from the core animals provide enough information about the independent chromosome segments in the population, the core information is provided and the off-diagonals of G22 do not contribute to the accuracy and can be ignored. Thus, the genomic breeding value of noncore animals is conditioned to the genomic breeding value of core animals (Misztal 2016). Random cross-generation core definition has been found to perform well in APY (Ostersen et al. 2016; Bradford et al. 2017; Nilforooshan and Lee 2019). Updating G-1 consecutively with APY results in dropping the latest generations from the core sample, which may have unfavourable effects. Therefore, after one or a few (depending on the population) APY updates of G-1, it is recommended to replace cjas-2019-0106ieq6.gif with a new G-1 or with a random core cjas-2019-0106ieq13.gif.

Parallel processing

Computational time for matrix operations can be considerably reduced by parallel processing. Computational complexity of a n1 × m by m × n2 matrix multiplication is n1mn2. The n1mn2 computational complexity can split across n1, n2, or n3 parallel processes, where n3 is an integer less than n1 and n2. Matrix multiplication can be done in independent parallel processes; thus, the cost of communications across nodes is minimized. For example, row i of AB can be obtained independently from other rows of AB by multiplying row i of A to B, or column j of AB can be obtained independently from other columns of AB by multiplying A to column j of B. An R function for parallel processing of matrix multiplication is provided in Appendix B. Though not discussed here, parallel processing algorithms do exist for matrix inversion and are used in some genetic evaluation softwares.

Updating A

Matrix A is not needed in genetic evaluation models and its calculation is computationally more difficult than calculating A-1. However, it is usually needed for postevaluation procedures, such as designing breeding schemes, preserving genetic variation, and controlling inbreeding in the population. Updating A is not different from resuming an incomplete construction of A to its full matrix. The following algorithm demonstrates a procedure for updating A. Considering no pedigree errors (e.g., young animals being parents to old animals) and no pedigree correction, the pedigree file for young animals has only young animals in the first (animal ID) column. Any row corresponding to animals not among the young animals is deleted. Then, A is updated with the following simple rules:

  1. For animals in the new pedigree with no parents:

    • 1.1. Append elements of 1 to A for the corresponding diagonal values.

    • 1.2. Delete those animals from the new pedigree.

  2. While the new pedigree is not empty:

    • 2.1. Find an animal with parent(s) not available in the first column of the new pedigree.

    • 2.2. Following Emik and Terrill (1949), the relationship of animal i with others (already) in A is the average of the relationship of its parents, s and d with those animals. The same is true for the relationship of the animal to its parents [Ais =(Ass + Asd)/2, Aid = (Add + Asd)/2], and Aii = 1 + Asd/2.

    • 2.3. Delete that animal from the new pedigree.

This technique can be used, not only for updating, but also for constructing A from scratch. Sorting animals by age is not needed as it is built into the algorithm by picking animals in the right order (parents before progeny). Instead of updating A, which may include millions of animals, a subset of it including the animals of interest (e.g., the last x generations) and the parents of the new animals (if known) can be updated.

There are available methods for calculating inbreeding coefficients (Tier 1990; Meuwissen and Luo 1992; Colleau 2002; Sargolzaei and Iwaisaki 2004; R.L. Quaas (1995), unpublished note) that can be adopted in updating situations (i.e., calculating inbreeding coefficients for young animals given inbreeding coefficients for old animals). Sargolzaei and Iwaisaki (2005) compared computational performance of these four algorithms. The algorithms of R.L. Quaas (1995, unpublished note) and Sargolzaei and Iwaisaki (2004) are modifications to the algorithm of Meuwissen and Luo (1992). The performance of the algorithms depended on the number of generations, population, and family sizes. The algorithm of Sargolzaei and Iwaisaki (2004) was faster than the other algorithms. Computation time considerably decreased in updating situations, in which the algorithm of Sargolzaei and Iwaisaki (2004) outperformed the other algorithms (Sargolzaei and Iwaisaki 2005). Colleau (2002) developed an indirect method for obtaining individual inbreeding coefficients and relationship statistics in the population. The method was indirect because instead of element-wise calculation of the relationship matrix, groups of elements were calculated simultaneously. The computational efficiency of the method was heavily reliant on the sparseness of the inverse of the relationship matrix.

Computational complexity of calculating or updating A depends on many factors and it differs from population to population and animal to animal. The computation involves finding the parents of the animal; searching for their relationships, the size, and the sparsity of A; the number of previous animals in the pedigree; the number of new animals; and the computational algorithm.

Updating A-1

Updating A-1 is not different from resuming an incomplete construction of A-1 to its full matrix. However, computationally, reading and updating an old A-1 is not justifiable over calculating A-1 from scratch. Calculation of A-1 is easier than the calculation of A. The elements of A-1 for an animal are conditional to its parents, whereas the elements of A for an animal are also conditional to the other relatives.

Matrix storage

Conventionally, relationship matrices and their inverses are saved in a sparse upper- and (or) lower-triangle format with three columns for the ID of animals and the matrix element. It saves disk space and makes indexing matrix elements easy. However, symmetric dense matrices can benefit from being stored as an array of upper- and (or) lower-triangular values, which takes considerably less storage than a three-column data frame. In programming, this method of storing a matrix is called packed storage. This method is used in Fortran’s linear algebra libraries, BLAS, and LAPACK (Wikipedia Contributors 2013). For a packed matrix of length n, the matrix dimension is obtained as N = [√(1 + 8n) - 1]/2. For indexing, each row i starts with the [(i - 1)(N - i/2 + 1) + 1]th element and ends with the [(2N - i + 1)i/2]th element of the vector, and an element from the ith row and jth column of the matrix is located at the [(i - 1)(N - i/2 + 1) + 1 + |i - j|]th element of the vector.

Updating vs. rebuilding

On deciding whether to update matrices or computing them from scratch, some consideration should be given to the following:

  1. The drive’s read speed should be considered. Nowadays, there are commercial non-volatile memory express drives with a read speed over 5000 MB per second. There is less concern about the writing speed as it occurs after computations have been completed. In efficient programming, read-write instances are minimized.

  2. The size and the sparsity of the matrix to be read compared with the size of the previous data file (e.g., pedigree or genotypes) to rebuild the matrix.

  3. Data file format (e.g., binary vs. ASCII).

  4. Data reading strategy (i.e., mapping the file into memory vs. reading the file into a buffer via a connection; reading a large file by memory mapping is faster). The pros and cons of specific data reading strategies are not in the scope of this paper.

  5. Computational complexity of the calculations.

  6. Central processing unit clock speed.

  7. Single-core vs. parallel computing.

  8. Storage vs. computation cost.


An overview of the methods for updating different relationship matrices and their inverses was provided and possible improvements were proposed. There are possibilities for reducing required computational time and resources for calculating relationship matrices or their inverses by updating the previously calculated matrix for new animals and parallel computing. The downside of updating matrices is allocating disk space for saving the matrix at time t and reading it for the evaluation at time t + 1. Both can be reduced by saving the old matrix as half-stored (i.e., upper and (or) lower diagonal), sparse (i.e., skipping zero elements), and binary.


The author acknowledges Livestock Improvement Corporation (LIC, Hamilton, NZ) for providing funds for the open access publication of this paper.



Aguilar , I. , Misztal , I. , Johnson , D.L. , Legarra , A. , Tsuruta , S. , Lawlor , T.J. 2010. Hot topic: a unified approach to utilize phenotypic, full pedigree, and genomic information for genetic evaluation of Holstein final score. J. Dairy Sci. 93: 743–752. doi: PMID:PMID:20105546. Google Scholar


Bradford , H.L. , Pocrnić , I. , Fragomeni , B.O. , Lourenco , D.A.L. , Misztal , I. 2017. Selection of core animals in the algorithm for proven and young using a simulation model. J. Anim. Breed. Genet. 134: 545–552. doi: PMID:PMID:28464315. Google Scholar


Christensen , O.F. , Lund , M.S. 2010. Genomic prediction when some animals are not genotyped. Genet. Sel. Evol. 42: 2. doi: PMID:PMID:20105297. Google Scholar


Colleau , J.-J. 2002. An indirect approach to the extensive calculation of relationship coefficients. Genet. Sel. Evol. 34: 409. doi: PMID:PMID:12270102. Google Scholar


Emik , L.O. , Terrill , C.E. 1949. Systematic procedures for calculating inbreeding coefficients. J. Hered. 40: 51–55. doi: PMID:PMID:18116093. Google Scholar


Hager , W.W. 1989. Updating the inverse of a matrix. SIAM Rev. 31: 221–239. doi: Scholar


Henderson , C.R. 1973. Sire evaluation and genetic trends. J. Anim. Sci. 1973: 10–41. doi: Scholar


Henderson , C.R. 1975. Rapid method for computing the inverse of a relationship matrix. J. Dairy Sci. 58: 1727–1730. doi: Scholar


Meuwissen , T.H.E. , Luo , Z. 1992. Computing inbreeding coefficients in large populations. Genet. Sel. Evol. 24: 305–313. doi: Scholar


Meyer , K. , Tier , B. , Graser , H.U. 2013. Technical note: updating the inverse of the genomic relationship matrix. J. Anim. Sci. 91: 2583–2586. doi: PMID:PMID:23508030. Google Scholar


Misztal , I. 2016. Inexpensive computation of the inverse of the genomic relationship matrix in populations with small effective population size. Genetics, 202: 401–409. doi: PMID:PMID:26584903. Google Scholar


Misztal , I. , Legarra , A. , Aguilar , I. 2014. Using recursion to compute the inverse of the genomic relationship matrix. J. Dairy Sci. 97: 3943–3952. doi: PMID:PMID:24679933. Google Scholar


Nilforooshan , M.A. , Lee , M. 2019. The quality of the algorithm for proven and young with various sets of core animals in a multibreed sheep population. J. Anim. Sci. 97: 1090–1100. doi: PMID:PMID:30624671. Google Scholar


Ostersen , T. , Christensen , O.F. , Madsen , P. , Henryon , M. 2016. Sparse single-step method for genomic evaluation in pigs. Genet. Sel. Evol. 48: 48. doi: PMID:PMID:27357825. Google Scholar


Quaas , R.L. 1976. Computing the diagonal elements and inverse of a large numerator relationship matrix. Biometrics, 32: 949–953. doi: Scholar


Quaas , R.L. 1988. Additive genetic model with groups and relationships. J. Dairy Sci. 71: 1338–1345. doi: Scholar


Sargolzaei , M. , Iwaisaki , H. 2004. An efficient algorithm for computing inbreeding coefficients in large populations. Jpn. J. Biom. 25: 25–36. doi: Scholar


Sargolzaei , M. , Iwaisaki , H. 2005. Comparison of four direct algorithms for computing inbreeding coefficients. Anim. Sci. J. 76: 401–406. doi: Scholar


Strandén, I., and Mäntysaari, E.A. 2014. Comparison of some equivalent equations to solve single-step GBLUP. Page, 69 in Proc. 10th World Congress of Genetics Applied to Livestock Production, 17–22 Aug. 2014, Vancouver, BC, Canada.Strandén, I., and Mäntysaari, E.A. 2014. Comparison of some equivalent equations to solve single-step GBLUP. Page, 69 in Proc. 10th World Congress of Genetics Applied to Livestock Production, 17–22 Aug. 2014, Vancouver, BC, Canada.


Tier , B. 1990. Computing inbreeding coefficients quickly. Genet. Sel. Evol. 22: 419–430. doi: Scholar


VanRaden , P.M. 2008. Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423. doi: PMID:PMID:18946147. Google Scholar


Vitezica , Z.G. , Aguilar , I. , Misztal , I. , Legarra , A. 2011. Bias in genomic predictions for populations under selection. Genet. Res. 93: 357–366. doi: PMID:PMID:21767459. Google Scholar


Wikipedia Contributors. 2013. Packed storage matrix. [Online]. Available from [13 June 2019].Wikipedia Contributors. 2013. Packed storage matrix. [Online]. Available from [13 June 2019].


Appendix A

Assuming a G (genomic relationship) matrix available on n number of genotypes in an M genotype matrix, before expanding G for new genotypes, it needs to be corrected to G* (same size as G) for allele frequencies (p) to be changed to p* due to the new genotypes. The aim is to obtain regression coefficients α and b based on p and p* to predict (a regression, not an exact estimation) G* as G if p* was used instead of p (pk = allele frequency for marker k in p).


G = cZZ′, c = 1/[2∑pk(1 - pk)], G* = c*Z*Z*′, and c* = 1/[2 ∑pk* (1 - pk*)]. Thus,


ij Gij* = ∑ij Gij, to obtain the unknown independent G*, it is assumed that the slope of G* on G is due to the applied coefficients (i.e., b = c*/c). Thus,


Similarly, Z* = M - 2P* and P* = 1p*′. Therefore,


Substituting the above line in the formula for α gives


Appendix B

R function for parallel processing of matrix multiplication:


mmultpar <- function(A, B, ncl) {

if(ncol(A)!=nrow(B)) stop(“ERROR: Dimension mis-match”)

if(ncl < 1) stop(“ERROR: ncl should be a positive integer.”)

if((ncl < nrow(A)) & (ncl < ncol(B))) {

cl = makeCluster(ncl)

Alist = lapply(splitIndices(nrow(A), length(cl)), function(x) A[x,,drop=FALSE])

ans = clusterApply(cl, Alist, get(“%*%”), B)

return(, ans))

} else if (nrow(A) > ncol(B)) {

cl = makeCluster(ncol(B))

Blist = lapply(1:ncol(B), function(x) t(B)[x,,drop=FALSE])

ans = clusterApply(cl, Blist, get(“%*%”), t(A))

return(t(, ans)))

} else {

cl = makeCluster(nrow(A))

Alist = lapply(1:nrow(A), function(x) A[x,,drop=FALSE])

ans = clusterApply(cl, Alist, get(“%*%”), B)

return(, ans))



Three objects are required: (i) the left-side matrix (A), (ii) the right-side matrix (B), and (iii) the user-defined number of cluster nodes (ncl).

The processes are independent, and ncl is not limited to the number of available nodes on the computer. If ncl > max(nrow(A), ncol(B)), min(nrow(A), ncol(B)) is applied as the number of cluster nodes.

Copyright remains with the author(s) or their institution(s). This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Mohammad Ali Nilforooshan "Updating genetic relationship matrices and their inverses: a methodology note," Canadian Journal of Animal Science 100(2), 292-298, (28 November 2019).
Received: 17 June 2019; Accepted: 22 November 2019; Published: 28 November 2019
parallel processing
relationship matrix
Back to Top