Abstract
The first aim of this article is to shed light on the two distinct definitions of the assumption of additivity in quantitative behavior genetics and its associated methodologies, such as heritability estimation. In addition, this article aims to assess the validity of this assumption, based on both of the ways in which it has been defined. There appear to be two related but distinct definitions of the additivity assumption: 1) the assumption that the total phenotypic variance of any quantitative trait can be expressed as the sum of the genetic variance and environmental variance components for that trait, i.e. that the whole is equal to the sum of its separable parts, and 2) the assumption that most, if not all, genetic variance in a trait is additive in nature, i.e. that the total genetic effect is mostly or entirely equal to the sum of the effects of multiple genes acting independently and additively. It is hoped that highlighting these distinct meanings will help to clarify future discussions surrounding research in quantitative behavior genetics and the assumptions on which such research depends.
Introduction
The assumption of additivity underlying much research in the quantitative genetics of human behavioral traits (e.g. twin studies and heritability analysis) has long been criticized as untenable. But what exactly does this assumption mean? Here I show that there appear to be two distinct answers to this question, and I attempt to distinguish between the two assumptions, which are often referred to in similar or even identical terms. I also try to assess the extent to which each assumption is supported by the available empirical evidence, and call for greater semantic clarity in future discussions of the existence of genetic additivity.
Defining the assumption of additivity
First, I will attempt to describe and distinguish between the two definitions that have been used for the assumption of additivity in the context of quantitative behavior genetics and heritability analyses. My aim in doing so is influenced by Moore, who recently described the different, often-confused meanings of the widely used phrase "gene-environment interaction".\cite{Moore_2018} Next, I attempt to situate each definition of this assumption in the context of heritability estimation and related human genetics research, and to critically assess the validity of each definition in light of the available scientific evidence.
Definition A
Elsewhere, Moore has defined the additivity assumption as the assumption "that genetic and environmental influences on phenotypic variation are additive".\cite{Moore_2006} This is what I will call "definition A" of the assumption of additivity: that phenotypic variance (V) can be accurately expressed as the sum of two separate components, one for genes and one for the environment. (Note that I will henceforth use the words "variance" and "variation" interchangeably.) This may be written as the following equation (e.g., \cite{Tabery_2008}):
$V = V_G + V_E$ (Eq. 1)
Here, V (also sometimes written VP, with the P standing for "phenotype") = total phenotypic variation in a given trait in a given population, VG = variation in that trait that is due to genetic factors, and VE = variation due to environmental factors. This definition of the assumption of additivity is only valid if equation 1 is accurate. It would not be accurate if, for instance, there is gene-environment interaction (G x E), in which case you would not be able to get the total variance just by adding together the genetic and environmental sources thereof.
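As a sketch of why Eq. 1 fails in such cases, the decomposition can be written out with the extra terms that Eq. 1 omits. The expression below follows a common textbook convention rather than any of the sources cited here, and it assumes that the interaction deviation is uncorrelated with the genetic and environmental main effects. The symbols $\mathrm{Cov}(G,E)$ (gene-environment covariance) and $V_{G \times E}$ (interaction variance) are notation introduced for this illustration.

\[
V_P = V_G + V_E + 2\,\mathrm{Cov}(G,E) + V_{G \times E}
\]

Under this sketch, Eq. 1 is recovered only in the special case where both $\mathrm{Cov}(G,E) = 0$ and $V_{G \times E} = 0$. With purely hypothetical values $V_G = 4$, $V_E = 4$, $\mathrm{Cov}(G,E) = 1$, and $V_{G \times E} = 2$, the total is $V_P = 12$, whereas Eq. 1 would give only $8$.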
Another way of understanding definition A can be gleaned from the ACE model, according to which total variation in a phenotype is equal to the sum of Additive genetic factors (A), Common (shared) environmental factors (C), and unique (non-shared) Environmental factors (E). The validity of this model, at least in its simplest form, also depends on definition A of the assumption of additivity (hereafter simply "definition A"), because the ACE model assumes that the A, C, and E components do not significantly interact with each other.\cite{Benchek2013} Fundamentally, then, the equation underlying the ACE model is the same as Eq. 1, except that VG, which normally represents all genetic variance, is replaced by "A", which represents only additive genetic variance, and the term for environmental variance in Eq. 1 (VE) is replaced with C and E, which, added together, are supposed to equal the total environmental variance. In some cases, the A, C, and E terms in the ACE model are represented with h, c, and e; because we are talking about variance, each of these terms is squared. Thus, the equation underlying the ACE model can be denoted as follows:\cite{Partridge_2011}
$P = h^2 + c^2 + e^2$ (Eq. 2)
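To illustrate how this additive structure is put to work, the following sketch shows the expected twin correlations under the classical twin design. The assumptions are those of the standard design and are not stated in the text above: the phenotype is standardized so that the left-hand side of Eq. 2 equals 1, identical (MZ) twins share all additive genetic effects, fraternal (DZ) twins share half of them on average, and both kinds of twins share the common environment to the same degree. The symbols $r_{MZ}$ and $r_{DZ}$ (the twin correlations) are introduced for this illustration.

\[
r_{MZ} = h^2 + c^2, \qquad r_{DZ} = \frac{1}{2}h^2 + c^2
\]
\[
h^2 = 2\,(r_{MZ} - r_{DZ}), \qquad c^2 = 2\,r_{DZ} - r_{MZ}, \qquad e^2 = 1 - r_{MZ}
\]

With hypothetical correlations $r_{MZ} = 0.8$ and $r_{DZ} = 0.5$, these expressions give $h^2 = 0.6$, $c^2 = 0.2$, and $e^2 = 0.2$, which sum to 1 as Eq. 2 requires. Every step of this algebra presupposes that the three components simply add, which is why the estimates depend on definition A.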
What follows are some more examples, from various publications, of the assumption of additivity being referred to in keeping with definition A:
- "...it is asserted that the measured score (or phenotype) of an individual on a psychological test (Yi) is the sum of only two components, Gi determined by the genes and Ei specified by the person's environment; that is, G and E must not interact."\cite{Wahlsten_1994} (p. 245)
- "The traditional way to model the impact of genetic and environmental factors on risk for disease has been to assume additivity. That is, we assume the final risk is a result of the addition of genetic to environmental vulnerability."\cite{Kendler_2010}
- "...the assumption of additive gene and environmental effects is not only shown to be invalid by the ubiquity of G-E effects. Even more, however, the additive assumption leads developmental science down a fruitless path."\cite{Lerner_2006}
- "...standard behavioral genetic models assume that genetic, shared environmental, and nonshared environmental influences are additive and separable – the additivity assumption."\cite{Daw_2015}
- "In twin and adoption studies, estimates of the power of environmental factors are derived by adopting the additive assumption, i.e. by assuming that the sources of variation in a trait can be separated into independent genetic (G) and environmental (E) components that together (along with error variance) add to 100% of the variance to be accounted for."\cite{Maccoby_2000}
Definition A is sometimes framed as a conceptual one: the assumption that it makes fundamental scientific sense to try to determine the relative importance of "nature vs. nurture" to variation in any trait. Or it can also be viewed as the assumption that it makes sense to conceptualize the total variation in a trait as the sum of genetic variation and environmental variation, however many additional interaction terms one may add, based on the underlying additive framework. Relational developmental systems is one framework that, in contrast to quantitative behavior genetics, views traits as the result of genes and environments interacting in a complex and inseparable way that renders attempts to determine the relative importance of genetics vs. the environment futile (e.g. \cite{Overton_2011}).
In short, definition A is the assumption that the "nature-nurture debate" is a legitimate debate with two opposing sides, each with their distinct quantitative level of importance, and that the relative importance of these two factors on variation in a phenotype can be determined through statistical techniques. If this assumption is true, then the total variation is equal to the sum of the genetic variance and the environmental variance.
There is little question that it matters a great deal to quantitative behavior genetics (BG) researchers whether definition A of the assumption of additivity is true: it is essential to being able to interpret heritability estimates causally, as Lewontin noted in 1974. If this assumption is invalid, no functional conclusions can be drawn regarding the trait being analyzed.\cite{LEWONTIN_2006} As Oftedal explains, critics argue that this is because heritability estimates are inherently local if the additivity assumption is not met: "In situations of additivity, heritability estimates are no longer just local. The result from one environment can be extrapolated to other environments."\cite{Oftedal_2005} (p. 702) Similarly, Lynch (2016) has noted that the heritability statistic's validity "...relies on the assumption that VG and VE act additively, so that there is no interaction or correlation between the two terms."\cite{Lynch_2016}
Here's another way to phrase the points summarized in the preceding paragraph:
- If definition A of the assumption is true, then heritability estimates are not just local. If heritability estimates are not just local, they can be extrapolated to other environments, and if they can be extrapolated to other environments, they can be interpreted causally.
- Conversely, if definition A is false, then heritability estimates cannot be interpreted causally, and they are uninformative for any research looking for the causes of traits.
Can non-additivity be adequately corrected for?
A number of methodological approaches have been suggested and employed to correct for violations of definition A. One of these is to log-transform a specific type of non-additive relationship between genes (G) and environment (E), namely a multiplicative one, so that it becomes additive. This can be used to dismiss the "rectangle" analogy, which compares asking about the relative importance of genes and environment to asking about the relative importance of length and width for a rectangle's area. Advocates of this approach include Neven Sesardic.\cite{sesardic2005} (p. 53) The idea is that you start with a multiplicative relationship in which G and E, multiplied together, produce the phenotype (Y):
$Y = G \times E$
This is supposed to be analogous to the fact that the area of a rectangle is equal to the product of its length and its width. Then you take the log of both sides of the above equation, producing:
$\log(Y) = \log(G) + \log(E)$
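What the transformation does and does not deliver can be sketched in variance terms. The lines below are an illustration under added assumptions that are not in the sources discussed here: G and E are strictly positive and statistically independent, with means $\mu_G$ and $\mu_E$ and variances $V_G$ and $V_E$.

\[
\mathrm{Var}(\log Y) = \mathrm{Var}(\log G) + \mathrm{Var}(\log E)
\]
\[
\mathrm{Var}(Y) = \mu_E^2\, V_G + \mu_G^2\, V_E + V_G V_E
\]

On the log scale the variance splits into two separable parts, but those parts describe $\log Y$ rather than $Y$. On the original scale, the weight attached to $V_G$ depends on the environmental mean (and vice versa), and the product term $V_G V_E$ belongs to neither factor alone.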
And voila! What was once a multiplicative relationship is now additive, and definition A is no longer false. Or is it? Wahlsten (1990) notes that this practice, which is focused on removing non-additive relationships from data before it is analyzed, can distort the actual relationship between variables, rather than actually solving the non-additivity problem:
"The log transform alters the relations among the variables; consequently, transforming the scale of measurement may conceal the relations among heredity and environment, as it might conceal the essence of gravitation."\cite{Wahlsten_1990} (p. 118)
And on the very next page of this paper:
"If H and E really are multiplicative in a particular situation, a calculated "heritability" is nonsensical and taking the log of the observations may compound this."\cite{Wahlsten_1990} (p. 119)
In addition, Partridge (2011) has pointed out that several other approaches for dealing with violations of definition A exist, including "...extensions to multivariate and latent variable models (ACE models, in which A = additive genetic variance, C = common environmental variance, and E = unique environmental variance) (Martin & Eves, 1977; Michel & Moore, 1995), as well as use of multilevel models, to better address the nonindependence inherent in twin data (see Medlend & Neale, 2010)..." Partridge observes that these modifications "...are notable but still follow the same basic structure of Fisher’s original model", by which Partridge means Fisher's infinitesimal model, in which the total genetic effect on a quantitative trait is simply the sum of the effects of a very large number of individual loci, each of which has a very small effect on the trait.
Definition B
In addition to definition A, there is a second definition of the additivity assumption with regard to BG: namely, the assumption that the effects of genetic loci on variation in a complex trait are independent of each other and of the environment, so that their individual effects can be added together to give the total genetic effect. In other words, this definition, which I will call definition B, is that most genetic variance for complex traits in general is additive. As Wahlsten (1994) wrote in describing the method of heritability analysis, "the effect of all polymorphic loci affecting a behaviour are combined by adding them to yield the total Gi for an individual, which assumes genotype at one locus does not influence the action of genes at other loci."\cite{Wahlsten_1994} (p. 245)
Thus, I will refer to the assumption that most, if not all, genetic variance in complex traits, behavioral or otherwise, is additive in nature as "definition B" of the additivity assumption. Even if some variance is due to non-additive genetic effects (dominance, epistasis, and/or GxE), definition B is still tenable so long as most of the genetic variance (VG) is due to additive effects. This definition therefore pertains to the proportion of VG that is due to additive genetic effects (VA), that is, effects that are exactly equal to the sum of their parts; it excludes all "genetic effects" due to dominance, epistasis, or gene-environment interactions. In other words, definition B is strictly true if VA/VG = 100% and is generally true (using a laxer definition) if VA/VG > 50%. Hill et al. (2008) concluded that VA/VG is in fact typically above 50% and often at or near 100%.\cite{Hill_2008} Nevertheless, many aspects of variance components analyses aimed at determining the relative importance of additive and non-additive genetic variation have been criticized.\cite{Huang_2016}
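The ratio at the heart of definition B can be made explicit with the conventional partition of genetic variance. This is a sketch in standard quantitative-genetics notation, not a formula taken from the sources cited here; $V_D$ (dominance variance) and $V_I$ (epistatic variance) are symbols introduced for the illustration, and any G x E contribution is left aside.

\[
V_G = V_A + V_D + V_I, \qquad \frac{V_A}{V_G} = \frac{V_A}{V_A + V_D + V_I}
\]

With purely hypothetical values $V_A = 6$, $V_D = 3$, and $V_I = 1$, the ratio is $6/10 = 0.6$, so the lax version of definition B would hold while the strict version would not.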
It should be noted that these two definitions overlap to some extent: if definition B is true, definition A becomes more plausible than it would be if definition B were false. This is because if VG can be viewed as consisting entirely of additive genetic effects, we do not need to include terms for dominance, epistasis, or GxE in variance components equations to get a reasonably accurate result. This overlap is itself a source of semantic confusion: if genetic effects are mostly or entirely additive (i.e. definition B is true), then it seems to follow that it makes sense to view phenotypic variation as the additive combination of genetic and environmental variation (i.e. definition A is true), because there is no need to include GxE or other non-additive terms in Eq. 1.
Definition B seems to be somewhat more commonly used in the literature compared to definition A. This is likely because it focuses on obvious, tangible things--specifically, genes and how they interact with each other and with the environment. Here are some examples of definition B being used to refer to the assumption of additivity:
- "...most GWA studies of heritability rely, in part or entirely, on an assumption of ‘additivity.’ For example, the claim that 8000 SNPs account for half the heritability of schizophrenia depends upon the assumption that each SNP contributes 1/8000 to half the heritability."\cite{Charney_2016} (p. 5)
- "...it is common to assume additive-only genetics — that is, where the effect of each SNP’s minor allele is strictly additive in relation to its count."\cite{Sabourin2015}
- "There are two types of genetic nonadditivity. The first is caused by genetic dominance."\cite{Rodgers_2001}
- "In most human genetic studies, the "solution" has been simply to make the (usually unstated) assumption that there is no genetic interaction... Typically, the studies assume a strictly additive model."\cite{Zuk_2012} (p. 1194)
Some of the confusion between the definitions stems from papers that use both definitions A and B without outlining that genetic "additivity" may refer to either definition (e.g. \cite{Wahlsten_1994}).
Defining heritability
Next, it should be noted that there are two types of heritability: narrow-sense heritability ($h^2$) and broad-sense heritability ($h_B^2$). The difference between the two is that $h^2$ is based only on additive genetic variance, whereas $h_B^2$ is based on both additive and non-additive (i.e. total) genetic variance.\cite{edition} $h^2$, rather than $h_B^2$, is the value that agricultural breeders care about, because it is used to predict which selective breeding strategy will most quickly maximize a desired trait in the organism of interest.\cite{Feldman1975}
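In terms of variance components, the two kinds of heritability can be sketched as ratios. The notation is the conventional one and is assumed here rather than drawn from the cited sources: $V_P$ is total phenotypic variance, $V_A$ is additive genetic variance, and $V_G$ is total genetic variance.

\[
\mbox{narrow-sense heritability} = \frac{V_A}{V_P}, \qquad \mbox{broad-sense heritability} = \frac{V_G}{V_P}
\]

Because $V_A$ is one component of $V_G$, the narrow-sense value can never exceed the broad-sense value. The relevance to breeders is usually expressed through the breeder's equation, a standard textbook result in which the response to selection $R$ is predicted from the selection differential $S$ (both symbols are introduced here for illustration):

\[
R = \frac{V_A}{V_P}\, S
\]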
It also needs to be explained just what heritability estimation actually is: as Oftedal has noted, it is "a statistical method based on a linear analysis of variance".\cite{Oftedal_2005} (p. 700)
We will first consider definition A of the additivity assumption: that variation in a phenotype is the sum of a genetically caused component and an environmentally caused component, plus perhaps a small interaction term. Lynch (2016) has highlighted the two ways that this assumption can be violated: gene-environment interaction (G x E) and gene-environment correlation (the latter also called gene-environment covariance, abbreviated G-E covariance).\cite{Lynch_2016}
In addition, Wahlsten noted in a 1990 paper that
"Additivity is often tested by examining the interaction effect in a two-way analysis of variance (ANOVA) or its equivalent multiple regression model. If this effect is not statistically significant at the α = 0.05 level, it is common practice in certain fields (e.g., human behavior genetics) to conclude that the two factors really are additive and then to use linear models, which assume additivity."
But he reported in the same paper that
"...ANOVA often fails to detect nonadditivity because it has much less power in tests of interaction than in tests of main effects. Likewise, the sample sizes needed to detect real interactions are substantially greater than those needed to detect main effects."\cite{Wahlsten_1990}
In a subsequent paper, Wahlsten argues that there are fundamental biological reasons to believe that the assumption of additivity will almost always be false:
"The additive model is not biologically realistic. There are so many instances where the response of an organism to a change in environment depends on its genotype or where the consequences of a genetic defect depend strongly upon the environment, that genuine additivity of the two factors is very likely the rare exception."\cite{Wahlsten_1994} (p. 249)
The three quotes from Wahlsten cited above make it clear that he is discussing definition A of the assumption of additivity.
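The interaction test that Wahlsten describes can be sketched with the standard two-way linear model. The notation is an assumption of this illustration: $Y_{ijk}$ is the score of individual $k$ with genotype $i$ in environment $j$, $\alpha_i$ and $\beta_j$ are the genotype and environment main effects, $(\alpha\beta)_{ij}$ is the interaction effect, and $\varepsilon_{ijk}$ is residual error.

\[
Y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk}
\]

Additivity corresponds to $(\alpha\beta)_{ij} = 0$ for every cell. In a 2 x 2 design this amounts to a single contrast among the cell means $\mu_{ij}$:

\[
\mu_{11} - \mu_{12} - \mu_{21} + \mu_{22} = 0
\]

With hypothetical cell means $\mu_{11} = 10$, $\mu_{12} = 12$, $\mu_{21} = 14$, and $\mu_{22} = 20$, the contrast is $10 - 12 - 14 + 20 = 4$, so the two factors are not additive. Whether a sample of a given size would detect a contrast of that magnitude is the power question that Wahlsten raises.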
And elsewhere, he contends that human BG faces unique obstacles in controlling for non-additivity that animal researchers (such as himself) do not have as much of a problem with:
"To test interaction between genotype and environment, there must be many individuals with the same genotype who are reared in different environments. This is easily achieved with standard laboratory strains but not with humans. For our species, there is no valid test of gene x environment interaction, no matter what the sample size, unless distinct alleles of a specific gene in question can be identified...Because the additivity assumption cannot be tested empirically, the whole edifice of path models must be accepted on faith, if it is to be accepted at all."\cite{Wahlsten_2000}(p. 50)
Many critics of BG have argued that definition A of the additivity assumption is untenable, and that the way in which genes and environments actually interact to produce phenotypes is just that--interactive, not additive. Thus this criticism alleges that heritability calculations are uninterpretable (at least in terms of the relative roles of genes vs. environment in causing phenotypic variation), because definition A is simply false. This criticism is well summed up in a paper by Vreeke: "The core of the critique of behavior genetics, as far as it relies on the analysis of variance, is thus that it conceptualises the relation between genes and the environment as (mainly) additive, whereas in fact development is interactive."\cite{Vreeke_2000} (p. 37) The same paper notes, "Experimental animal research shows that interaction between genotype and the environment occurs often. And if genes and the environment interact, it is not possible to separately weigh the effect of one of those factors: they depend on each other. There is no reason to expect that humans are different in this respect. An analysis of variance ignores those effects, so cannot provide a true account of the causes of behavior."\cite{Vreeke_2000} (p. 37)
Locality and causality
As noted above, critics of heritability analysis argue that the additivity assumption is false, and that heritability estimates are really just local. But which assumption is it that the critics claim is false? To some extent it is both, but definition A seems to be a more common target of such criticisms. Lewontin makes it clear that he considers the locality of heritability estimates to prevent them from allowing causal conclusions to be drawn: "There is one circumstance in which the analysis of variance can, in fact, estimate functional relationships...It is not surprising that the assumption of additivity is so often made, since this assumption is necessary to make the analysis of variance anything more than a local description."\cite{LEWONTIN_2006} This criticism is referred to by Oftedal as the "locality objection".\cite{Oftedal_2005} (p. 702)
So what we have here are two BG responses to criticisms of definition B: 1) most (if not all) genetic variance is actually additive, so this assumption is going to be at least mostly correct, and 2) to the extent that the assumption of additivity is false, there are already plenty of ways to account for it successfully. I will now focus a bit more on the first of these responses: that most variation in the traits behavior geneticists study is actually additive, meaning that the assumption Charney criticizes so harshly is actually fairly accurate. One study frequently cited by those making this claim is that of Hill et al. (2008), entitled "Data and Theory Point to Mainly Additive Genetic Variance for Complex Traits". Neiderhiser et al. (2017), for example, cited this paper to justify their claim that genetic variation in complex traits is mostly additive. But is this conclusion really justified by the evidence presented by Hill et al. (2008)?
Zuk et al. (2012) don't seem to think so: they argue that "...mistakenly assuming that a trait is additive can seriously distort inferences about missing heritability. From a biological standpoint, there is no a priori reason to expect that traits should be additive. Biology is filled with nonlinearity: The saturation of enzymes with substrate concentration and receptors with ligand concentration yields sigmoid response curves; cooperative binding of proteins gives rise to sharp transitions; the outputs of pathways are constrained by rate-limiting inputs; and genetic networks exhibit bistable states."\cite{Zuk_2012}
In their supplementary information (p. 45), Zuk et al. go into more detail about why they consider Hill et al.'s claims not to stand up to scrutiny. First, Zuk et al. explain two key arguments made by Hill et al.: "(a) most variants in a large population will have extremely low minor allele frequency and (b) traits caused by low-frequency alleles will not have substantial variance due to interactions." But Zuk et al. don't find these arguments the least bit convincing:
"Their claim is wrong, because the LP [linear pathway] models (a) can have substantial variance due to interactions (indeed, the majority) and yet (b) can involve any class of allele frequencies. (Specifically, LP models are defined as the minimum value of a set of traits, each of which is additive and normally distributed. There is no constraint on the allele frequencies of the variants that sum to yield these additive and normally distributed traits.)"
And on the next page:
"In effect, Hill et al.’s theory thus actually describes what happens for rare traits caused by a few rare variants. Not surprisingly, interactions account for a small proportion of the variance for such traits. Hill et al.’s model, however, is not pertinent to common traits. The interesting complex traits are those that have significant genetic variance in the population: these traits necessarily have higher allele frequencies (assuming they depend only on a few, e.g. two loci) and thus, under Hill et al.’s analysis, can involve larger interaction variance and a higher ratio VAA/VG." (Note: VAA = interaction variance and VG = total genetic variance.)
Behavior geneticists respond
To the extent that BG heritability-estimation researchers have defended their practice against the charge that it inaccurately assumes additivity, they have made arguments such as this one, from Michael Rutter in 2003: "Critics of behavior genetics are fond of attacking it on the grounds of the unwarranted presumption of additivity. However, behavior geneticists are well aware of this issue, and it is commonplace nowadays to make explicit tests for dominance or epistatic effects. Moreover, it is perfectly straightforward to include these in any overall model. There is a need to consider such effects, but their likely existence for some traits is not a justifiable reason for doubting behavior genetics."\cite{michael2003} This quotation is clearly referring to definition B, since dominance and epistasis are forms of genetic non-additivity.
So how exactly do behavior geneticists take the (non)existence of additivity/presence of non-additive effects into account? Rutter makes it sound really easy, but just how do they do it, and are their procedures for doing so adequate? It is important to keep in mind that many critics of BG argue that the techniques researchers in the field use to try to test and account for genetic non-additivity are woefully inadequate. In fact, such arguments have been made since at least 1973(!), when Willis Overton wrote, "...it does not change the situation any to maintain that this position does consider interactions by introducing an interaction term into the analysis of variance...As discussed by Overton and Reese [1972], such interaction effects, ‘are themselves linear, since they are defined as population cell means minus the sum of main effects (plus the population base rate)’ (p. 84). In fact, the very use of the term ‘interaction’ within this paradigm indicates that definitions of terms are not model independent".\cite{Overton_1973}
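The definition quoted from Overton and Reese can be written out in the usual ANOVA notation, which is assumed here for illustration: $\mu_{ij}$ is the population cell mean, $\mu$ is the population base rate (grand mean), and $\alpha_i$ and $\beta_j$ are the main effects.

\[
(\alpha\beta)_{ij} = \mu_{ij} - (\mu + \alpha_i + \beta_j)
\]

Written this way, the interaction term is simply the remainder left over after the additive part has been subtracted, which is the sense in which it is defined within the additive framework rather than independently of it.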
Just fix the model!
More recently, Partridge has argued that "Although these advances in GxE transactional models represent a substantial step forward for quantitative behavioral genetics models, there are inherent structural limitations to their analytic foundations...the nature of GxE transactions go much deeper than statistical interactionist models can accommodate. If structural sequences in the genome were isomorphic to genetic function and, more important, to protein function, then the inferred genetic variability assumed by behavioral genetic models might be more instrumental. However, genes, rather than being static structural entities, are dynamic processes."\cite{Partridge_2011}