Ease aims to provide, in R, a simple and efficient way to run
population genetics simulations with multiple loci whose epistasis is
fully customizable. Although it is specifically suited to the modelling
of multilocus nucleocytoplasmic systems, it can also simulate purely
nuclear (i.e. diploid) or purely cytoplasmic (i.e. haploid) genetic
models. The simulations are not individual-centred: the transition from
one generation to the next is done matrix-wise on the basis of
deterministic equations. Instead of describing each individual
separately, the simulations only handle the genotype frequencies within
the population. The frequencies of all possible genotypes, given the
loci and alleles defined by the user, are explicitly tracked. The
simulations are therefore fast only if the number of genotypes is not
too large.
Genetic drift, and thus a specific population size, is nevertheless
introduced through a multinomial draw each generation, which adds
realism to the simulations through randomisation. In the Ease package,
the life cycle of the simulated population is standard ([selection on
gamete production] - [gametogenesis (recombination + meiosis +
mutation)] - [selection on gametes] - [syngamy] - [selection on
individuals] - [drift]), and the population may be either dioecious or
hermaphroditic.
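The drift step can be pictured with a few lines of base R. This is only a sketch and not the package's internal code: the variable names and the example frequencies are assumptions chosen for illustration.

```r
# Sketch of a single drift step (illustrative, not Ease internals):
# the deterministic expected genotype frequencies are resampled with
# one multinomial draw of N individuals.
set.seed(1)
N <- 100                                  # population size (assumed value)
expected_freq <- c(0.5, 0.3, 0.2)         # hypothetical genotype frequencies
counts <- rmultinom(1, size = N, prob = expected_freq)[, 1]
realised_freq <- counts / N               # genotype frequencies after drift
realised_freq
```

The smaller N is, the further the realised frequencies tend to stray from the expected ones.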
NOTE: because selection can only be defined genotype by genotype and haplotype by haplotype, Ease is (for the moment at least) not suitable when multiple loci and alleles generate many genotypes, unless you automate the process yourself. Ease is not optimised for very complex genetic models or for models involving many loci. As a rough rule, if the number of genotypes allowed by the input genome configuration exceeds the desired number of individuals, an individual-centred model is probably more suitable (see the SLiM software; BC Haller, PW Messer (2019). SLiM 3: Forward genetic simulations beyond the Wright–Fisher model. Molecular Biology and Evolution. 36:632.).
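To get a feel for this rule of thumb, the number of genotypes can be counted before building a model. The helper below is a hypothetical sketch, not an Ease function. It assumes that a genotype is an unordered pair of diploid haplotypes combined with one haploid haplotype.

```r
# Hypothetical helper: number of haplotypes and genotypes given the number
# of alleles at each diploid and haploid locus. Assumes a genotype is an
# unordered pair of diploid haplotypes plus one haploid haplotype.
count_genotypes <- function(n_dip_alleles, n_hap_alleles = integer(0)) {
  dip_hap <- prod(n_dip_alleles)          # diploid haplotypes (1 if no locus)
  hap_hap <- prod(n_hap_alleles)          # haploid haplotypes (1 if no locus)
  c(haplotypes = dip_hap * hap_hap,
    genotypes  = dip_hap * (dip_hap + 1) / 2 * hap_hap)
}

count_genotypes(2, 2)          # one diploid and one haploid locus, two alleles each
count_genotypes(rep(2, 10))    # ten biallelic diploid loci
```

Under these assumptions, ten biallelic diploid loci already give 524800 genotypes, more than the size of many simulated populations.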
A genome is defined by the set of loci to which lists of alleles are attached. Each locus and each allele is defined by a unique name, which allows it to be identified unequivocally.
There are two types of loci: diploid and haploid. A genotype is
defined as an allelic combination of all the alleles of an individual’s
loci and a haplotype as only those alleles that have been inherited
together from a single parent. A genotype is therefore made up of two
haplotypes. A distinction is also made between diploid (resp. haploid)
haplotypes which correspond to allelic combinations taking into account
only diploid (resp. haploid) loci. The loci are defined by a list of
vectors that enumerates their respective alleles. The order in which the
loci are placed is not important in the case of haploid loci. It does
matter in the case of diploid loci because recombination is likely to
affect the haplotypes.
When several diploid loci are defined, their order in the list matters.
The recombination rates between each pair of adjacent loci must be
given as a vector of recombination rates. For example, if three diploid
loci are defined, this vector must be of length 2: its first value is
the recombination rate between the first and second loci, and its
second value is the recombination rate between the second and third
loci. As another example, to define two groups of two linked loci that
sit on two different chromosomes, the recombination rate vector can be
c(0.1, 0.5, 0.1). The first two loci are then fairly tightly linked
(recombination rate of 0.1), as are the last two loci. The
recombination rate of 0.5 between the second and third loci, on the
other hand, ensures that the two groups are independent.
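The two-chromosome example can be checked with a short base R sketch. The variable names are illustrative. The upper bound of 0.5 and the independence of the intervals are assumptions of this sketch, not statements about how Ease validates or uses the vector.

```r
# Four diploid loci arranged as two linked pairs on two chromosomes.
n_dip_loci <- 4
recomb <- c(0.1, 0.5, 0.1)    # rates between loci 1-2, 2-3 and 3-4

# One rate per pair of adjacent loci; rates assumed to lie in [0, 0.5].
stopifnot(length(recomb) == n_dip_loci - 1,
          all(recomb >= 0 & recomb <= 0.5))

# Probability that a gamete shows no recombination in any interval,
# assuming the intervals recombine independently (sketch assumption).
p_no_recomb <- prod(1 - recomb)
p_no_recomb
```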
To create a haplotype ID, we concatenate all diploid alleles and all
haploid alleles separately, then concatenate these two strings by
separating them with "||". For example, "Ab||CD" corresponds to a
haplotype with four loci: two diploid with alleles A and b, and two
haploid with alleles C and D. The principle is the same for the
genotypes, but the second diploid haplotype is added by separating it
from the first by a "/", for example "Ab/ab||CD".
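The naming scheme can be reproduced with paste. The two helper functions below are hypothetical and only illustrate the convention. They are not part of the package.

```r
# Hypothetical helpers that mimic the ID convention described above.
make_haplotype_id <- function(dip, hap) {
  paste0(paste(dip, collapse = ""), "||", paste(hap, collapse = ""))
}
make_genotype_id <- function(dip1, dip2, hap) {
  paste0(paste(dip1, collapse = ""), "/", paste(dip2, collapse = ""),
         "||", paste(hap, collapse = ""))
}

make_haplotype_id(c("A", "b"), c("C", "D"))               # "Ab||CD"
make_genotype_id(c("A", "b"), c("a", "b"), c("C", "D"))   # "Ab/ab||CD"
```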
Each locus is represented by a name and a factor vector that lists its
alleles. If one wishes to consider a system with two loci, one diploid
and one haploid, with two alleles each (A and a, and B and b
respectively), the genome is constructed as follows:
LD = list(dl = as.factor(c("A", "a")))
HL = list(hl = as.factor(c("B", "b")))
genomeObj = setGenome(listHapLoci = HL, listDipLoci = LD)
The haplotypes and genotypes have been generated automatically; their
numbers can be retrieved by simply displaying the Genome object created:
genomeObj
#> -=-=-=-=-=-= GENOME OBJECT =-=-=-=-=-=-
#> # 1 haploid locus, with 2 allele(s)
#> # 1 diploid locus, with 2 allele(s)
#> # 4 haplotypes
#> # 6 genotypes
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#> (use print for a list of haplotypes and genotypes)
and an exhaustive list can be displayed using the print method:
print(genomeObj)
#> -=-=-=-=-=-= GENOME OBJECT =-=-=-=-=-=-
#> in details
#>
#> # 1 haploid loci:
#> - 'hl' with B and b alleles
#>
#> # 1 diploid loci:
#> - 'dl' with A and a alleles
#>
#> # 4 haplotypes:
#> [1] 1) A||B 2) a||B 3) A||b 4) a||b
#>
#> # 6 genotypes:
#> [1] 1) A/A||B 2) A/a||B 3) a/a||B 4) A/A||b 5) A/a||b
#> [6] 6) a/a||b
#>
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
The haplotypes and genotypes are numbered, and these numberings will be important in defining the different types of fitness, as we shall now see.
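For this small genome, the same ordering can be rebuilt in base R, which helps when writing fitness vectors by hand. This is a sketch that happens to reproduce the listing above. It is not the package's enumeration code, and the ordering for larger genomes should be checked with print.

```r
dip_alleles <- c("A", "a")    # diploid locus 'dl'
hap_alleles <- c("B", "b")    # haploid locus 'hl'

# Haplotypes: each diploid allele combined with each haploid allele,
# the diploid allele varying fastest.
h <- expand.grid(dip = dip_alleles, hap = hap_alleles, stringsAsFactors = FALSE)
haplotypes <- paste0(h$dip, "||", h$hap)

# Diploid genotypes: unordered pairs of diploid alleles (A/A, A/a, a/a).
idx <- which(upper.tri(diag(length(dip_alleles)), diag = TRUE), arr.ind = TRUE)
dip_genotypes <- paste0(dip_alleles[idx[, "row"]], "/", dip_alleles[idx[, "col"]])

g <- expand.grid(dip = dip_genotypes, hap = hap_alleles, stringsAsFactors = FALSE)
genotypes <- paste0(g$dip, "||", g$hap)

haplotypes
genotypes
```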
A genome necessarily has a mutation matrix attached to it. This mutation matrix is haplotypic: it is a square probability matrix (each row sums to 1) whose size equals the number of haplotypes defined in the genome. Defining this matrix directly would be too tedious, so the user does not provide it as is. Instead, the user either fills in an allelic mutation matrix for each locus, or gives forward and backward mutation rates.
NOTE: In practice, the mutation matrix is not used as such in the simulations. It is combined with the recombination matrix and with the meiosis matrix, which gives, for each genotype, the probability of producing each haplotype by chromosomal segregation. The matrix product Recombination matrix x Meiosis matrix x Mutation matrix yields a single gametogenesis matrix, which is the one used in the simulations.
Definition of the haplotypic mutation matrix by filling in the allelic mutation matrices:
mutMatrixObj = setMutationMatrix(genomeObj = genomeObj,
  mutHapLoci = list(matrix(c(0.95, 0.05, 0.03, 0.97), 2, byrow = TRUE)),
  mutDipLoci = list(matrix(c(0.9, 0.1, 0.09, 0.91), 2, byrow = TRUE)))
mutMatrixObj
#> -=-=-=- MUTATION MATRIX OBJECT -=-=-=-
#> # Haplotypic mutation matrix:
#> A||B a||B A||b a||b
#> A||B 0.8550 0.0950 0.0450 0.0050
#> a||B 0.0855 0.8645 0.0045 0.0455
#> A||b 0.0270 0.0030 0.8730 0.0970
#> a||b 0.0027 0.0273 0.0873 0.8827
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#> (use print to access the allelic mutation matrices)
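The values printed above are consistent with a Kronecker product of the two allelic matrices, which assumes that the loci mutate independently. The sketch below reproduces them in base R. It illustrates the arithmetic and is not necessarily how the package computes the matrix internally.

```r
# Allelic mutation matrices from the example (each row sums to 1).
mut_hap <- matrix(c(0.95, 0.05, 0.03, 0.97), 2, byrow = TRUE)  # B, b
mut_dip <- matrix(c(0.9, 0.1, 0.09, 0.91), 2, byrow = TRUE)    # A, a

# Haplotypic matrix under independent mutation at the two loci.
# Haplotype order is A||B, a||B, A||b, a||b, so the haploid locus
# indexes the outer blocks and the diploid locus the inner ones.
hap_mut <- kronecker(mut_hap, mut_dip)
ids <- c("A||B", "a||B", "A||b", "a||b")
dimnames(hap_mut) <- list(ids, ids)

hap_mut
rowSums(hap_mut)    # every row still sums to 1
```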
Definition of the haplotypic mutation matrix by filling in the forward and backward mutation rates:
mutMatrixObj = setMutationMatrixByRates(genomeObj = genomeObj, forwardMut = 1e-2)
mutMatrixObj
#> -=-=-=- MUTATION MATRIX OBJECT -=-=-=-
#> # Haplotypic mutation matrix:
#> A||B a||B A||b a||b
#> A||B 0.9801 0.0099 0.0099 1e-04
#> a||B 0.0000 0.9900 0.0000 1e-02
#> A||b 0.0000 0.0000 0.9900 1e-02
#> a||b 0.0000 0.0000 0.0000 1e+00
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#> (use print to access the allelic mutation matrices)
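The output above can be reproduced by hand under two assumptions suggested by the printed values: the backward rate is zero when it is not given, and the same rates apply to every locus. The helper rates_to_matrix is hypothetical.

```r
# Hypothetical helper: two-allele mutation matrix from forward and
# backward rates (the first allele mutates forward to the second).
rates_to_matrix <- function(forward, backward = 0) {
  matrix(c(1 - forward, forward,
           backward,    1 - backward), 2, byrow = TRUE)
}

u <- rates_to_matrix(forward = 1e-2)   # backward rate assumed to be 0
hap_mut_rates <- kronecker(u, u)       # same rates at both loci (assumption)
ids <- c("A||B", "a||B", "A||b", "a||b")
dimnames(hap_mut_rates) <- list(ids, ids)
hap_mut_rates
```

With no backward mutation, the haplotype a||b is absorbing: its row is 0, 0, 0, 1.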
Selection can be defined at three stages of this cycle: on mature individuals directly, on their gamete production or on the gametes. For individuals and gamete production, a fitness value is associated with each genotype. For gametes, a fitness value is associated with each haplotype. When defining fitness vectors, it is therefore necessary to know the order of haplotypes and genotypes (see previous section).
A fitness value is any non-negative real number. Fitness values are relative, so if all genotypes have a fitness value of 3, there will be no effect on the dynamics of the model.
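A short base R sketch shows why only relative values matter. The function apply_selection and the example frequencies are hypothetical. The sketch assumes that selection weights each frequency by its fitness and then renormalises.

```r
# Hypothetical selection step: weight frequencies by fitness, renormalise.
apply_selection <- function(freq, fitness) {
  w <- freq * fitness
  w / sum(w)
}

freq <- c(0.2, 0.3, 0.5)               # hypothetical genotype frequencies
apply_selection(freq, c(3, 3, 3))      # identical fitnesses: frequencies unchanged
apply_selection(freq, c(1, 1, 0.5))    # third genotype is selected against
```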
In all cases, a Selection object is constructed from a Genome class
object (which is used to check the compatibility between the
constructed genome and the desired selection parameters).
It is then possible, for example, to define no selection (neutral
model) with the function setSelectNeutral, which constructs a Selection
object where the fitnesses are all identical (equal to 1):
selectionObj = setSelectNeutral(genomeObj = genomeObj)
We can then check that no selection has been defined:
selectionObj
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=-
#> # On individuals: NO
#> # On gametes: NO
#> # On gamete production: NO
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#> (use print to access the fitness values)
or with:
print(selectionObj)
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=-
#> in details
#>
#> No selection defined.
#>
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
Using the example given in the Genome section, one might want to
simulate a system of genetic incompatibility in which the derived
alleles a and b, when brought together within the same genotype, induce
a fitness cost through negative epistasis. This cost, which we will
call s, is modulated by a dominance coefficient h, which reduces the
cost when the nuclear allele a is in the heterozygous state. Thus
individuals A/A||B, A/a||B, a/a||B and A/A||b do not suffer any fitness
cost (because they carry at most one of the two incompatible alleles),
and their fitness is equal to 1. The genotype A/a||b, which undergoes
the reduced cost of incompatibility, has a fitness of 1 - h*s, and the
genotype a/a||b, which undergoes the full cost of incompatibility, has
a fitness of 1 - s.
s = 0.8
h = 0.5
selectionObj = setSelectOnInds(genomeObj = genomeObj, indFit = c(1, 1, 1, 1, 1 - h*s, 1 - s))
We can then check that selection has been defined:
selectionObj
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=-
#> # On individuals: YES
#> # On gametes: NO
#> # On gamete production: NO
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
#> (use print to access the fitness values)
Regarding selection on individuals, note that it may differ depending
on whether the modelled population is hermaphroditic or dioecious.
Under hermaphroditism there is no distinction between female and male
fitness, so the indFit parameter governs fitness. If the sexes are
separate, however, one can either define a fitness for individuals with
indFit, which applies to both males and females, or specify fitness
separately for females and males with the parameters femaleFit and
maleFit.
In any case it is good to check with the print method that the fitnesses are those wanted:
print(selectionObj)
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=-
#> in details
#>
#> Individuals Female Male
#> A/A||B 1.0 1.0 1.0
#> A/a||B 1.0 1.0 1.0
#> a/a||B 1.0 1.0 1.0
#> A/A||b 1.0 1.0 1.0
#> A/a||b 0.6 0.6 0.6
#> a/a||b 0.2 0.2 0.2
#>
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
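When there are many genotypes, typing the fitness vector by hand becomes error-prone. One possible way to automate it is to derive the fitnesses from the genotype IDs. The sketch below is a base R illustration for this particular incompatibility model. The parsing relies on the ID convention described earlier and on alleles with single-letter names, which is an assumption of this sketch.

```r
genotypes <- c("A/A||B", "A/a||B", "a/a||B", "A/A||b", "A/a||b", "a/a||b")
s <- 0.8    # cost of the incompatibility
h <- 0.5    # dominance of the cost in A/a heterozygotes

nuclear <- sub("\\|\\|.*$", "", genotypes)    # e.g. "A/a"
cyto    <- sub("^.*\\|\\|", "", genotypes)    # e.g. "b"

# Number of copies of the nuclear allele a (0, 1 or 2) in each genotype.
n_a <- lengths(regmatches(nuclear, gregexpr("a", nuclear, fixed = TRUE)))

# The cost applies only when the cytoplasmic allele b is present.
cost <- c(0, h * s, s)[n_a + 1]
ind_fit <- ifelse(cyto == "b", 1 - cost, 1)
names(ind_fit) <- genotypes
ind_fit
```

The resulting vector has the same values as the one written by hand above and could be passed as indFit in the same way.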
Selection can also be defined on gamete production:
selectionObj = setSelectOnGametesProd(genomeObj = genomeObj, indProdFit = c(1, 1, 1, 1, 1 - h*s, 1 - s))
or on the gametes directly:
selectionObj = setSelectOnGametes(genomeObj = genomeObj, femaleFit = c(1, 1, 1 - s, 1 - s))
For these two ways of selecting for gametes, one can define fitness on a sex-by-sex basis or on all gametes, as desired.
Last but not least, these different layers of selection can be
combined. This is done using the selectionObj parameter that each of
the setSelect... functions has (except setSelectNeutral); it is then
unnecessary to pass again the genome to which the selection refers. For
example, to combine the three types of selection presented here:
s = 0.8
h = 0.5
selectionObj = setSelectOnInds(genomeObj = genomeObj,
  indFit = c(1, 1, 1, 1, 1 - h*s, 1 - s))
selectionObj = setSelectOnGametesProd(indProdFit = c(1, 1, 1, 1, 1 - h*s, 1 - s),
  selectionObj = selectionObj)
selectionObj = setSelectOnGametes(femaleFit = c(1, 1, 1 - s, 1 - s),
  selectionObj = selectionObj)
print(selectionObj)
#> -=-=-=-=-=- SELECTION OJBECT =-=-=-=-=-
#> in details
#>
#> Individuals Female Male
#> A/A||B 1.0 1.0 1.0
#> A/a||B 1.0 1.0 1.0
#> a/a||B 1.0 1.0 1.0
#> A/A||b 1.0 1.0 1.0
#> A/a||b 0.6 0.6 0.6
#> a/a||b 0.2 0.2 0.2
#>
#> # On gametes:
#> Female gamete Male gamete
#> A||B 1.0 1
#> a||B 1.0 1
#> A||b 0.2 1
#> a||b 0.2 1
#>
#> # On gamete production:
#> Female gamete Male gamete
#> A/A||B 1.0 1.0
#> A/a||B 1.0 1.0
#> a/a||B 1.0 1.0
#> A/A||b 1.0 1.0
#> A/a||b 0.6 0.6
#> a/a||b 0.2 0.2
#>
#> -=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
The Ease class object, which shares its name with the package, gathers
all the parameters needed to build the model for the simulations. It
takes all the necessary inputs, deduces the transition matrices that
will be used for the simulations, and then, through the simulate
method, generates the results of the simulations.
The results of the simulations can be obtained with the getResults
function, and a summary analysis can be done with the plot and summary
methods. The simulations can also be recorded every x generations, i.e.
the genotypic and allelic frequencies are stored for all simulations.
This allows a better understanding of the dynamics of the simulations
if needed, but requires longer computation times. These records are
accessed through the getRecords method.
There are two ways in which a simulation can stop: it has reached a stop condition, or it has reached a user-defined generation threshold beyond which the simulation stops. A stop condition is a vector containing the name(s) of the allele(s) which, once fixed, cause the simulation to stop. Whether one or more stop conditions are defined, they are always gathered in a list, which also allows them to be named (which is recommended).
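The logic of a named list of stop conditions can be sketched in base R. This sketch assumes that a condition is met when every allele it names has reached a frequency of 1. The allele frequencies are hypothetical, and the code is not the package's own check.

```r
# Named list of stop conditions, one vector of allele names per condition.
stop_condition <- list(nucleo = "a", cyto = "b")

# Hypothetical allele frequencies at some generation.
allele_freq <- c(A = 0, a = 1, B = 0.4, b = 0.6)

# A condition is taken to be met when all of its alleles are fixed
# (assumption of this sketch).
reached <- vapply(stop_condition,
                  function(alleles) all(allele_freq[alleles] == 1),
                  logical(1))
reached
any(reached)    # TRUE here: the simulation would stop on 'nucleo'
```

Naming the conditions makes it easy to see which one ended a given simulation.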
We build a model with the setEase function by giving it as parameters
the population size N, the threshold of generations not to be exceeded,
the type of system (dioecy with dioecy = TRUE, or hermaphroditism with
dioecy = FALSE), the rate of self-fertilisation (which is ignored under
dioecy), the list of stop conditions, and then the objects of the three
classes presented in the previous sections, i.e. the mutation matrix,
the genome, and the selection object.
mod = setEase(N = 100, threshold = 1e6, dioecy = FALSE, selfRate = 0.5,
  stopCondition = list(nucleo = "a", cyto = "b"),
  mutMatrixObj = mutMatrixObj,
  genomeObj = genomeObj,
  selectionObj = selectionObj)
Then simulations can be generated:
mod = simulate(mod, nsim = 50, recording = TRUE, seed = 123)
And the results can be displayed using the plot method:
plot(mod)
#> <Press enter to go to the next graph>
#> Warning: Removed 1 rows containing missing values (position_stack).
#> <Press enter to go to the next graph>
#> <Press enter to go to the next graph>