Wednesday, July 23, 2008

Duffy-HIV association: an odd choice of ancestry markers

p-ter at GNXP does a great job of explaining a complex topic: how ancestry can confound a genetic association study, potentially leading to a false positive result.

The subject is a recent study suggesting an association between a loss-of-function (null) variant of the Duffy gene and increased susceptibility to HIV infection. The study examined African-American personnel in the US Air Force, and found that individuals who carried two copies of the null variant had a 40% increase in risk of contracting HIV, but paradoxically also displayed slower progression of the disease once infected. The study is summarised nicely by Nick Wade in the NY Times.

p-ter expands on this paragraph from Wade's article:
Dr. Goldstein said that in parts of the United States, African-Americans have a higher infection rate than European-Americans, and that patients with a higher proportion of African genes may be more vulnerable to H.I.V. for reasons unconnected to the SNP. Nonetheless, the SNP would show up in a greater proportion of infected people simply because of their African heritage. If so, the gene’s apparent association with H.I.V. infection could be just coincidental, not causal.
Basically, the problem is that the Duffy null variant is vastly more common in Africans than Europeans. In fact, the difference is about as large as it's possible to be, with frequencies of close to zero in Europeans and approaching 100% in many African populations; African-Americans, being an admixture of European and African ancestry, have a frequency of around 70%. So here's the danger: because the null variant correlates so well with African ancestry, it will likely also show a correlation with any trait that varies between individuals of European and African ancestry - potentially including HIV susceptibility.

p-ter notes:
...it's quite possible that the authors have simply shown a correlation between level of African ancestry and susceptibility to HIV (which could be due to any number of sociological, demographic, or genetic factors), rather than an association between Duffy null and susceptibility to HIV.
This sort of false positive is a well-known danger in genetic association studies, and is traditionally guarded against by genotyping a set of ancestry-informative markers (AIMs) that differentiate between African and European ancestry, and using this information to correct for any possible effects of confounding by population structure. This step is routine in genome-wide association studies, where the presence of information for hundreds of thousands of genetic markers makes this correction straightforward.
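To make the confounding concrete, here is a minimal sketch in Python using entirely made-up numbers (the ancestry strata, allele frequencies and risk values are all assumptions for illustration, not figures from the study). Infection risk depends only on ancestry and never on genotype, yet the pooled comparison still shows an excess risk in null homozygotes. Stratifying by ancestry, here with a Mantel-Haenszel odds ratio, removes it. Note that the sketch stratifies on true ancestry, which is an idealisation: a real study has to estimate ancestry from its AIMs, and the correction is only as good as those markers.

```python
# Illustrative numbers only: none of these values come from the study.
# Each stratum is (African ancestry fraction, share of the sample).
STRATA = [(0.5, 0.1), (0.7, 0.3), (0.8, 0.3), (0.9, 0.2), (1.0, 0.1)]
F_AFR, F_EUR = 0.99, 0.01      # assumed null-allele frequency in each source population
BASE, SLOPE = 0.02, 0.04       # assumed infection risk = BASE + SLOPE * ancestry
N = 100_000                    # nominal sample size for expected counts


def expected_table(ancestry, share):
    """Expected 2x2 counts for one stratum; risk ignores genotype entirely."""
    q = ancestry * F_AFR + (1 - ancestry) * F_EUR   # null-allele frequency
    p_hom = q * q                                   # null/null, assuming Hardy-Weinberg
    risk = BASE + SLOPE * ancestry
    n = N * share
    return (n * p_hom * risk,              # null/null, infected
            n * p_hom * (1 - risk),        # null/null, uninfected
            n * (1 - p_hom) * risk,        # other genotypes, infected
            n * (1 - p_hom) * (1 - risk))  # other genotypes, uninfected


tables = [expected_table(a, w) for a, w in STRATA]

# Crude odds ratio: pool all strata into a single 2x2 table.
A, B, C, D = (sum(col) for col in zip(*tables))
crude_or = (A * D) / (B * C)

# Mantel-Haenszel odds ratio: combine within-stratum comparisons only.
mh_or = (sum(a * d / (a + b + c + d) for a, b, c, d in tables)
         / sum(b * c / (a + b + c + d) for a, b, c, d in tables))

print(f"crude OR = {crude_or:.3f}, ancestry-stratified OR = {mh_or:.3f}")
```

The crude odds ratio comes out above 1 even though the variant does nothing in this toy model, while the stratified estimate sits at 1. The size of the spurious effect depends entirely on the assumed numbers, so it shouldn't be read as an estimate of anything in the actual study.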

In the Duffy study the authors attempt to perform this type of correction using a set of just 11 markers they describe as "differentially distributed between European and African populations". p-ter notes that several of these markers are not particularly ancestry-informative, and indeed on closer inspection it's clear why this is: these markers weren't originally selected on the basis of ancestry informativeness, but rather because they are associated with HIV biology. Every single one of the 11 markers has some association with HIV: three of them have previously been associated with HIV infection, progression, or response to treatment (CCR5 delta32, APOBEC3G H186R, GNB3 C825T); most of the remaining markers are in genes that are known binding targets or modulators of HIV (CCR5, CXCR4, PD1, TRIM5, IL-2, IL-4).

I can't find anywhere in the article where the authors mention that all of their "ancestry" markers also just happen to be associated with HIV biology; in the supplementary data they're described as "genetic markers that we found and/or have been reported elsewhere (NCBI SNP data bases) to be more prevalent in an ethnic background compared to others." Yet it's obvious that this wasn't the original motivation for selecting these markers. Actually, it seems most plausible that the authors genotyped all of these markers as candidates for association with HIV infection risk; when only the Duffy gene emerged as significant, they instead re-badged their unsuccessful candidates (or at least those with frequency differences between Europeans and Africans) as "ancestry markers".

If that's true - and it's difficult to see any other rationale for using these HIV markers rather than a set of validated AIMs - this is poor form for at least two reasons. Firstly, it's unlikely that using such a weak set of ancestry-informative markers provides an effective correction for a marker with as strong a correlation with ancestry as Duffy (as p-ter notes, all of the supposed ancestry markers are far weaker predictors of ancestry than the Duffy variant). Secondly, testing several different variants for an association with HIV and then only reporting the one that achieved significance creates the perfect conditions for a false positive due to multiple comparisons. I'll be discussing this second point in more detail in a separate post.
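As a rough illustration of the multiple-comparisons problem, the sketch below assumes 12 independent tests (the 11 re-badged markers plus Duffy, which is my guess at the number of candidates rather than anything stated in the paper) and takes a nominal p-value of 0.03, the figure discussed in the comments below. It computes the chance of at least one nominally significant result when nothing is truly associated, and checks the observed p-value against a simple Bonferroni threshold.

```python
ALPHA = 0.05          # conventional per-test significance threshold
N_TESTS = 12          # assumption: 11 re-badged markers plus Duffy
P_OBSERVED = 0.03     # nominal p-value discussed in the comments

# Chance of at least one nominally significant result when no marker has
# any real effect, assuming the tests are independent.
family_wise_error = 1 - (1 - ALPHA) ** N_TESTS

# Bonferroni correction: divide the threshold by the number of tests.
bonferroni_threshold = ALPHA / N_TESTS
survives = P_OBSERVED < bonferroni_threshold

print(f"P(at least one false positive) = {family_wise_error:.2f}")
print(f"Bonferroni threshold = {bonferroni_threshold:.4f}, survives: {survives}")
```

Under these assumptions the chance of at least one spurious hit is close to a coin flip, and a p-value of 0.03 falls well short of the corrected threshold. Bonferroni is conservative and real markers are rarely fully independent, so treat this as a back-of-the-envelope check rather than a formal reanalysis.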

Anyway, the ultimate test will be independent replication - I'm sure we'll all be watching with interest to see if this association holds up in studies where the effects of ancestry are adequately controlled.


HE, W., NEIL, S., KULKARNI, H., WRIGHT, E., AGAN, B., MARCONI, V., DOLAN, M., WEISS, R., AHUJA, S. (2008). Duffy Antigen Receptor for Chemokines Mediates trans-Infection of HIV-1 from Red Blood Cells to Target Cells and Affects HIV-AIDS Susceptibility. Cell Host & Microbe, 4(1), 52-62. DOI: 10.1016/j.chom.2008.06.002


2 comments:

p-ter said...

It's difficult to assess this fully because the manuscript doesn't seem to report a single P value (!), although I note that the lower edge of the 95% confidence interval of the odds ratio in Figure 2C is perilously close to 1 following their ancestry "correction".

I think the 0.03 above the plot in 2C is the p-value. Not an overwhelming association.

Daniel said...

Thanks - not entirely sure how I managed to miss that, but it confirms that there's no way this association would have survived correction for multiple comparisons. I think I'll roll this over into a separate post.