The latest really scary paper regarding GMOs has been circulated widely on Twitter today, primarily by the usual suspects (Bittman, Pollan, and many others). The paper (available on the blog of the primary author, Judy Carman) is titled “*A long-term toxicology study on pigs fed a combined genetically modified (GM) soy and GM maize diet.*” The study has already been criticized for various reasons by David Tribe and Mark Lynas. The authors of the study fed pigs for ~~nearly 2 years~~ 22.7 weeks with either a “GM” or a “non-GM” diet. The GM diet was a mixture of corn and soybean with Bt and glyphosate-resistance traits. The non-GM diet apparently had a similar amount of corn and soybean, but used conventional non-GM varieties instead. The authors measured LOTS of things, and mostly found no statistically significant differences between the GM and non-GM diets.

I don’t have time to do a full critique, but there is at least one statistical choice that I found odd, and thought I’d throw it out there for others to discuss. The authors claim to have found 2 differences between groups of pigs fed the different diets, and that has been the basis for the widespread interest in this study, particularly among folks who are anti-biotechnology. From the abstract:

“There were no differences between pigs fed the GM and non-GM diets for feed intake, weight gain, mortality, and routine blood biochemistry measurements. The GM diet was associated with gastric and uterine differences in pigs. GM-fed pigs had uteri that were 25% heavier than non-GM fed pigs (p=0.025). GM-fed pigs had a higher rate of severe stomach inflammation with a rate of 32% of GM-fed pigs compared to 12% of non-GM-fed pigs (p=0.004). The severe stomach inflammation was worse in GM-fed males compared to non-GM fed males by a factor of 4.0 (p=0.041), and GM-fed females compared to non-GM fed females by a factor of 2.2 (p=0.034).”

So the GM diet apparently resulted in increased uterus weight and increased stomach inflammation compared to the non-GM diet. Table 2 in the manuscript presents organ weights, including the uterus weights described in the abstract. Since I don’t have the benefit of the raw data, I suppose we’ll have to trust the authors that uterus weight was greater in GM-fed than in non-GM-fed pigs. Group means were 0.12 and 0.10 for pigs in the GM and non-GM groups, respectively. Table 2 doesn’t list units for any of the numbers, so I don’t know if the weights are in grams, kilograms, ounces, metric tons… As a plant scientist, I really have no concept of what a normal pig uterus should weigh. Or any uterus, for that matter. But I digress. *[UPDATE: it was recently pointed out to me that the numbers are a percentage of total body weight. So 0.10% and 0.12% of body weight, I guess. I still don’t really know if that is good, bad, or normal. Go ahead and let me know in the comments if you like.]*

The second major finding of this study relates to stomach inflammation. The authors present in Table 3 of the manuscript “gross pathologies” related to various organs. For the stomach, the authors list 4 different categories related to inflammation:

- Nil inflammation
- Mild inflammation
- Moderate inflammation
- Severe inflammation

The authors compared the number of pigs that fell into each category independently, and found no differences between GM and non-GM groups for the Nil, Mild, or Moderate inflammation categories. But they found more pigs from the GM-fed group with “Severe inflammation” compared to the non-GM group. This is the major finding of the study: that “GM-fed pigs had a higher rate of severe stomach inflammation.”
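For comparison, the authors' approach boils down to collapsing the table into “Severe” versus everything else and comparing two proportions. A minimal sketch of that kind of 2×2 comparison, using the same per-category counts used throughout this post (9 of 73 non-GM pigs and 23 of 72 GM pigs in the Severe category). The paper's exact test isn't stated here, so a Pearson chi-squared test without continuity correction is an assumption:

```r
# Severe vs. not-severe counts for each diet (from the per-category counts)
severe <- matrix(c(9, 73 - 9,     # non-GM: severe, not severe
                   23, 72 - 23),  # GM: severe, not severe
                 nrow = 2, byrow = TRUE,
                 dimnames = list(diet = c("nonGM", "GM"),
                                 stomach = c("Severe", "Not severe")))

# Pearson chi-squared test on the 2x2 table, no continuity correction;
# an assumed choice, not necessarily the test used in the paper
severe.test <- chisq.test(severe, correct = FALSE)
severe.test$p.value
```

This comparison only uses the last column of the table; the Nil, Mild, and Moderate pigs are all lumped together as “not severe.”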

But this seems to me a very strange way to analyze these data. The 4 categories the authors used to classify stomach inflammation are what is known as ordinal categorical data, and they are pretty common in the literature. The typical way to analyze ordinal data is to give values to each category and conduct ~~either a t-test or~~ a Mann-Whitney (also called Wilcoxon rank-sum) test. *[EDIT: many other tests are possible, the Mann-Whitney being among the simplest.]* The reason is that there is structure to the data: Mild inflammation is worse than Nil inflammation, and Severe is worse than the other three categories. We lose that information by separating the categories for analysis the way the authors of the pig study did. All 4 categories give information about stomach inflammation, and if we look only at “severe” inflammation, we lose the additional information the other categories provide. A proper analysis would include the structure of these data.
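Because the Mann-Whitney test works on ranks, the particular numbers assigned to the categories don't matter as long as they keep the same order. A quick sketch to illustrate this; the alternative coding `coding.b` is an arbitrary, purely illustrative choice:

```r
# Per-category counts (Nil, Mild, Moderate, Severe) for each diet
nonGM.counts <- c(4, 31, 29, 9)
GM.counts    <- c(8, 23, 18, 23)

# Two order-preserving codings of the same four categories
coding.a <- c(0, 1, 2, 3)
coding.b <- c(0, 1, 5, 100)  # illustrative: arbitrary but strictly increasing

# rep(x, times) repeats each category score by its count
w.a <- wilcox.test(rep(coding.a, nonGM.counts), rep(coding.a, GM.counts))
w.b <- wilcox.test(rep(coding.b, nonGM.counts), rep(coding.b, GM.counts))

# Same ranks, so same W statistic and same p-value
c(W.a = unname(w.a$statistic), W.b = unname(w.b$statistic))
c(p.a = w.a$p.value, p.b = w.b$p.value)
```

A t-test, by contrast, would give different answers for the two codings, because it uses the distances between the assigned values.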

Since the authors present the number of animals in each category, we can analyze the data in a more standard way. I’ve provided the R code for doing so if you’d like to follow along at home. We’re going to code the categories as numbers, from Nil = 0 to Severe = 3:

```r
## Coding: Nil = 0, Mild = 1, Moderate = 2, Severe = 3
## enter the non-GM diet data:
nonGM.fed <- c(rep(0,4), rep(1,31), rep(2,29), rep(3,9))
## enter the GM diet data:
GM.fed <- c(rep(0,8), rep(1,23), rep(2,18), rep(3,23))
```
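The same vectors can also be built from a small table of counts, which makes a mistyped category total easier to spot. A sketch; the matrix layout and the name `counts` are illustrative choices, not from the paper:

```r
# Rows: diet; columns: inflammation category (Nil = 0 ... Severe = 3)
counts <- rbind(nonGM = c(Nil = 4, Mild = 31, Moderate = 29, Severe = 9),
                GM    = c(Nil = 8, Mild = 23, Moderate = 18, Severe = 23))

# rep() with a vector of times repeats each score by its count
nonGM.fed <- rep(0:3, times = counts["nonGM", ])
GM.fed    <- rep(0:3, times = counts["GM", ])

rowSums(counts)  # number of pigs per diet
```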

[TABLE OMITTED in response to a valid criticism in the comments by Steve Kass.]

~~This table shows the number of pigs in each treatment group, and the mean and median values for stomach inflammation, based on the coding we used (Nil = 0, Mild = 1, Moderate = 2, Severe = 3). The mean inflammation values basically tell us that, on average, pigs on the non-GM diet had mild to moderate stomach inflammation, and the GM-fed pigs were only slightly different (1.59 vs 1.78). But are these values statistically different? Below is the code (and output) using a t-test and a Wilcoxon (Mann-Whitney) test:~~

[NOTE: I’ve left the code for the t-test below, but as pointed out by several commenters, the Wilcoxon test is more appropriate for these data.]

```r
t.test(nonGM.fed, GM.fed)
# Welch Two Sample t-test
# t = -1.248, df = 132.574, p-value = 0.2142

wilcox.test(nonGM.fed, GM.fed)
# Wilcoxon rank sum test with continuity correction
# W = 2325, p-value = 0.2081
```

Notice the p-values in the ~~t-test and~~ Mann-Whitney test: they are much higher than those reported by the authors, who only analyzed the severe group. But does the result hold up when males and females are analyzed separately, as the authors did in Table 4?

```r
## Males
male.nonGM.fed <- c(rep(0,1), rep(1,16), rep(2,17), rep(3,2))
male.GM.fed <- c(rep(0,4), rep(1,12), rep(2,12), rep(3,8))
#t.test(male.nonGM.fed, male.GM.fed)
wilcox.test(male.nonGM.fed, male.GM.fed)
# Wilcoxon rank sum test with continuity correction
# W = 600, p-value = 0.5669

## Females
female.nonGM.fed <- c(rep(0,3), rep(1,15), rep(2,12), rep(3,7))
female.GM.fed <- c(rep(0,4), rep(1,11), rep(2,6), rep(3,15))
#t.test(female.nonGM.fed, female.GM.fed)
wilcox.test(female.nonGM.fed, female.GM.fed)
# Wilcoxon rank sum test with continuity correction
# W = 564, p-value = 0.2408
```
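Because the pooled vectors and the per-sex vectors are typed in separately, it is worth confirming that males plus females add up to the pooled counts. A quick check, with the per-category counts copied from the code in this post (the variable names are illustrative):

```r
# Per-category counts (Nil, Mild, Moderate, Severe)
male.nonGM   <- c(1, 16, 17, 2)
female.nonGM <- c(3, 15, 12, 7)
male.GM      <- c(4, 12, 12, 8)
female.GM    <- c(4, 11, 6, 15)
pooled.nonGM <- c(4, 31, 29, 9)
pooled.GM    <- c(8, 23, 18, 23)

# TRUE when the sex-specific counts sum to the pooled counts in every category
all(male.nonGM + female.nonGM == pooled.nonGM)
all(male.GM + female.GM == pooled.GM)
```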

If I had analyzed these data using the statistical techniques I was taught are appropriate for this type of data, I would have concluded there was no statistical difference in stomach inflammation between the pigs fed the two different diets. Analyzing these data the way the authors did makes it seem like they’re trying to find a difference where none really exists.
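As an extra check that doesn't rely on the normal approximation `wilcox.test()` uses when there are many tied scores, one option is a permutation test on the same rank-sum statistic W. A minimal sketch for the pooled data, assuming 10,000 random relabelings is enough for a rough p-value; the seed, the number of permutations, and the helper name `rank.sum.W` are arbitrary illustrative choices:

```r
nonGM.fed <- rep(0:3, times = c(4, 31, 29, 9))
GM.fed    <- rep(0:3, times = c(8, 23, 18, 23))

scores <- c(nonGM.fed, GM.fed)
n1 <- length(nonGM.fed)
n2 <- length(GM.fed)
r  <- rank(scores)  # mid-ranks for tied scores

# Rank-sum statistic W for whichever pigs are labelled "non-GM"
rank.sum.W <- function(idx) sum(r[idx]) - n1 * (n1 + 1) / 2

W.obs <- rank.sum.W(seq_len(n1))  # same W that wilcox.test() reports
W.mid <- n1 * n2 / 2              # expected W if diet has no effect

set.seed(2013)
# Randomly reassign which n1 pigs count as "non-GM", many times
W.perm <- replicate(10000, rank.sum.W(sample(n1 + n2, n1)))

# Two-sided permutation p-value
p.perm <- mean(abs(W.perm - W.mid) >= abs(W.obs - W.mid))
c(W = W.obs, p = p.perm)
```

The ties stay in place under every relabeling, so the reference distribution accounts for them directly rather than through a variance correction.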

**UPDATE: June 13, 2013**

I’ve been accused by whoever runs the gmoseralini.org whoops… I mean the gmojudycarman.org website of failing “kindergarten-level statistics.” I think that may be a slight exaggeration. Nonetheless, I will very briefly address their criticism. Bill Price has already addressed this to some extent in the comments:

Now, reasonable people can certainly disagree on how data should be analyzed. If there were only one correct way to analyze data, there would be far fewer statisticians in the world. But I stand by my view (and Dr. Price seems to agree) that it is inappropriate to collect data by categorizing into 4 ordinal categories, but then ignore that structure in the analysis. I concede that the Mann-Whitney (or Wilcoxon) test is more appropriate for these data than the t-test (both of which I presented above), but both tests show the same result: very little evidence that the diets caused different amounts of stomach inflammation.
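Another way to keep the ordinal structure in the analysis is a proportional-odds (cumulative logit) model, which can be fit with `polr()` from the MASS package that ships with R. A minimal sketch on the published counts, entered as case weights; the data frame layout is an illustrative choice, and the model assumes the diet effect is the same at every cut point between categories, an assumption worth checking for data like these:

```r
library(MASS)  # polr(): ordered logistic (proportional-odds) regression

cats <- c("Nil", "Mild", "Moderate", "Severe")
pigs <- data.frame(
  diet  = factor(rep(c("nonGM", "GM"), each = 4), levels = c("nonGM", "GM")),
  score = factor(rep(cats, times = 2), levels = cats, ordered = TRUE),
  n     = c(4, 31, 29, 9, 8, 23, 18, 23)
)

# Counts enter as case weights; Hess = TRUE keeps standard errors available
fit <- polr(score ~ diet, data = pigs, weights = n, Hess = TRUE)

ctab <- coef(summary(fit))
z    <- ctab["dietGM", "t value"]
p    <- 2 * pnorm(-abs(z))  # Wald p-value, normal approximation

# In polr's parameterization, a positive dietGM coefficient means a
# shift toward more severe categories for GM-fed pigs
c(coef = ctab["dietGM", "Value"], p = p)
```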

In the response at gmojudycarman.org, they state:

“Categorical data are data that fit into categories, such as male / female or pregnant / not pregnant. [Kniss] has tried to turn this sort of data into data that is continuous, like you get with body weight or height. This is really bad statistical methodology. It is like taking pregnant / not pregnant data and trying to twist that data into groups that could be described as: pregnant, half pregnant and fully pregnant. And you are right, it doesn’t make sense to even try to do something like that.”

Well, that’s an interesting statement… because that is exactly what the Carman et al. authors did, right? They “twisted” inflamed/not inflamed into Nil, Mild, Moderate, and Severe inflammation. Personally, I don’t have a problem with using these categories (although the authors now seem to think it is “bad statistical methodology”??). My problem is with analyzing them separately.

There are different types of categorical data. The data described in the quote above are binary in nature (on/off, pregnant/not pregnant, present/absent, alive/dead). The data presented in the Carman paper are more than that: Nil, Mild, Moderate, Severe. There are four categories with a distinct order. Each category has meaning and is linked to the others (Moderate is greater than Mild, but less than Severe). But this bit of criticism brings up an interesting question: what if we look at the data as a binary categorization (inflamed/not inflamed)? Let’s do that!

```r
### Inflamed or not inflamed
## enter the non-GM diet data:
nonGM.fed <- c(rep(0,4), rep(1,69))
## enter the GM diet data:
GM.fed <- c(rep(0,8), rep(1,64))
N.obs <- c(length(nonGM.fed), length(GM.fed))
num.inflam <- c(sum(nonGM.fed), sum(GM.fed))
pct.inflam <- round(num.inflam/N.obs*100, 0)
data.frame(N.obs, num.inflam, pct.inflam, row.names=c("nonGM","GM"))
```

| | Number of pigs | Number with stomach inflammation | Percentage of animals with stomach inflammation |
|---|---:|---:|---:|
| nonGM | 73 | 69 | 95 |
| GM | 72 | 64 | 89 |

Looking at the data this way, **the GM-fed pigs had LESS inflammation!** A whopping 95% of the animals fed non-GM feed had stomach inflammation, compared to 89% of the animals fed GM diets. That’s a lot of stomach inflammation. Is this difference statistically significant?

```r
wilcox.test(nonGM.fed, GM.fed)

	Wilcoxon rank sum test with continuity correction

data:  nonGM.fed and GM.fed
W = 2776, p-value = 0.2216
alternative hypothesis: true location shift is not equal to 0
```

The p-value is 0.22, so not much evidence that there is a difference. And I don’t care what type of fancy statistical test you use, **you simply can’t make the case that the GM-fed pigs were worse off if they had LESS stomach inflammation compared to the non-GM fed pigs.**
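The choice of where to split an ordinal scale into two groups largely drives the answer. A sketch that tries each of the three possible binary splits of the four categories and runs Fisher's exact test on each resulting 2×2 table; Fisher's test and the names `counts` and `splits` are my own illustrative choices, not something used above:

```r
counts <- rbind(nonGM = c(Nil = 4, Mild = 31, Moderate = 29, Severe = 9),
                GM    = c(Nil = 8, Mild = 23, Moderate = 18, Severe = 23))

# For each cut point k, pigs scoring >= k (columns k+1..4) are "at or above"
splits <- sapply(1:3, function(k) {
  above <- rowSums(counts[, (k + 1):4, drop = FALSE])
  below <- rowSums(counts) - above
  tab   <- cbind(above, below)
  c(cut       = k,
    pct.nonGM = round(100 * above[["nonGM"]] / sum(counts["nonGM", ])),
    pct.GM    = round(100 * above[["GM"]] / sum(counts["GM", ])),
    p.value   = fisher.test(tab)$p.value)
})
t(splits)
```

Cut 1 (Mild or worse) is the inflamed/not inflamed comparison above, where the non-GM pigs come out worse (95% vs 89%); cut 3 (Severe only) is the authors' comparison, where the GM pigs come out worse (32% vs 12%). Picking one split after seeing the data lets the direction of the “effect” depend on that choice.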

You’re very far off in your method. For starters, the only reason a Chi2 test wouldn’t be feasible is that it doesn’t assume the effect rises or falls monotonically as you go up or down the category stack. Whether or not you make that assumption is up to the researcher.

A Mann-Whitney test is absolutely not suited for the data, as a Mann-Whitney test performs poorly in the presence of ties. By the very nature of your data, you only have ties. Exit Mann-Whitney.

If you want to refer to the “most correct” method, please grab a copy of Agresti’s “Categorical Data Analysis” and read a bit. Section 7.3 on cumulative link models, for example, is very enlightening.
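For reference, the chi-squared test mentioned in the comment above can be run on the full 2×4 table of counts. It treats the four categories as unordered, so it responds to any difference between the two distributions, including the GM group having more pigs in both the Nil (8 vs 4) and Severe (23 vs 9) categories; a small p-value from it would not, by itself, say which direction the difference runs. A minimal sketch (the name `counts` is illustrative):

```r
counts <- rbind(nonGM = c(Nil = 4, Mild = 31, Moderate = 29, Severe = 9),
                GM    = c(Nil = 8, Mild = 23, Moderate = 18, Severe = 23))

# Pearson chi-squared test of independence (diet x category), 3 df
full.test <- chisq.test(counts)
full.test

# Expected counts under independence; all are above 5 for this table
round(full.test$expected, 1)
```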

Joris,

We have analyzed this data (what we have access to) in many ways including the categorical models you suggest, as well as generalized linear models. See comments that Andrew references above:

http://weedcontrolfreaks.com/2013/06/gmo-pig/comment-page-1/#comment-13104

and

http://weedcontrolfreaks.com/2013/06/gmo-pig/comment-page-1/#comment-12946

Bill Price

This is a pure example of a “garbage in, garbage out” study. It doesn’t matter one iota which statistical test you use, because the measurement being analyzed is not valid. The authors purport to have measured stomach inflammation, but they haven’t. What was evaluated is the level of redness of the stomach lining. Any veterinary pathologist will tell you that the degree of redness of the stomach lining is highly variable even in healthy animals, and meaningless in itself. In order to measure inflammation, they would have had to make histologic sections of the stomach and have these examined by a veterinary pathologist. So you can argue until the cows (or pigs) come home about the proper stats, the bottom line is that you’re arguing about a meaningless variable.

Jean-Martin,

Yes, I fully agree. While that issue has been discussed at length elsewhere, it is important for the wider audience to discuss all aspects of a study. In order to demonstrate why such studies are so roundly criticized, we must show how poorly this one was carried out in every phase: conception, design, implementation, measurement, and analysis. A much stronger argument can be made if we show it fails on many, if not all, levels.