**UPDATE: If you’re looking for information on the ‘republished’ version of this manuscript, a full statistical analysis of the released data can be found here.**

If you’re following the news about the French GM maize feeding trial, you’ve probably heard: (A) we need to pull GMO crops off the market immediately; or (B) that the study is flawed and is basically meaningless. I guess I find myself leaning toward the second group on this one. Here is why I think the recent GM corn feeding trial by Seralini (Séralini et al. 2012, Food and Chemical Toxicology) is bogus. Just for full disclosure, I’m not in any way an expert on animal feeding studies. But I know something about statistics and probability (in most cases, just enough to be dangerous). I would invite anyone who sees errors in my logic from an animal science, toxicology, or statistics standpoint to please let me know in the comments.

In the very first sentence of the introduction, the authors state that “There is an ongoing international debate as to the necessary length of mammalian toxicity studies in relation to the consumption of genetically modified (GM) plants including regular metabolic analyses (Séralini et al., 2011).” I find it interesting that Seralini cites himself as proof of this… I did not look up the reference or search to see if this is an actual international debate, or if it is simply Seralini vs. the world on this point. But I digress. A reasonable person would certainly agree that long-term studies of food products sound like a good idea, and so it is easy to side with the authors that this type of research is needed.

But if we compare the life span of rats with the life span of humans, the concept of “long term” is not at all similar. And this is where I think the Seralini study falls apart. It boils down to the fact that this study lasted for 2 years, and used Sprague-Dawley rats. To those of us who don’t do rat studies, 2 years probably seems like a reasonable “long term” duration for a study (it did to me at first glance). However, it seems that for the specific line of rats they chose (Sprague-Dawley), 2 years may be an exceptionally long time.

A 1979 paper by Suzuki et al. published in the Journal of Cancer Research and Clinical Oncology looked at the spontaneous appearance of endocrine tumors in this particular line of rats. Spontaneous appearance basically means the authors didn’t apply any treatments (like feeding them GMOs or herbicides). They just watched the rats for 2 years and observed what happened in otherwise healthy rats. When the study was terminated at 2 years (the same duration as the Seralini study), a whopping 86% of male and 72% of female rats had developed tumors.

Below I provide the results of a very basic simulation using R. I’ve also provided the R code in case anyone would like to repeat or modify this little exercise (output lines begin with `[1]`). Let’s assume that the Suzuki et al. (1979) paper is correct, and 72% of female Sprague-Dawley rats develop tumors after 2 years, even if no treatments are applied. If we randomly choose 10,000 rats with a 72% chance that they will have a tumor after 2 years, we can be pretty certain that approximately 72% of the rats we selected will develop a tumor by the end of 2 years.

```r
## Create a sample of 10,000 female rats. Each rat we choose
## has a 72% chance of developing a tumor after 2 years.
SD.Female <- sample(c(0,1), 10000, replace=T, c(0.28,0.72))
## The mean of this population (of 0s and 1s) will tell us
## the proportion of rats that developed tumors, by chance.
## 0 = no tumor; 1 = tumor
mean(SD.Female)
[1] 0.714
```

In our very large sample of 10,000 simulated rats, we found that 71.4% of them developed tumors by the end of a 2-year study. That’s pretty close to 72%. But here is where sample size becomes so critically important. If we only select 10 female rats, finding exactly 72% of them with tumors isn’t even possible (that would be 7.2 rats), and the observed percentage could be MUCH different than the population mean of 72%. This is because there is a greater chance that our small sample of 10 will not be representative of the larger population.
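To make that concrete, here is a small extension of the simulation (not in the original analysis) that repeats the 10-rat draw many times and tabulates the per-group tumor counts:

```r
## Draw 10,000 groups of 10 rats, each rat with a 72% tumor chance,
## and record how many rats per group developed tumors.
set.seed(1)  # arbitrary seed, for reproducibility only
counts <- replicate(10000, sum(sample(c(0, 1), 10, replace = TRUE,
                                      prob = c(0.28, 0.72))))
table(counts)  # groups with anywhere from roughly 4 to 10 tumors appear regularly
```

Even though every rat has the same 72% chance, individual groups of 10 routinely land far from 7.2 tumors.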

UPDATE: 9/20/2012 – See the comment from Luis below for a more elegant way to set up the 9 groups. It also lets you change the probabilities more easily (specifying them once instead of nine times) if you want to see the impact when the probability of tumors is 50% or 80% instead of 72%. Thanks Luis!

```r
## Create 9 groups of rats. Each group has 10 individuals.
## Each individual has a 72% chance of developing a tumor
## after 2 years.
SD.Fgrp1 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
SD.Fgrp2 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
SD.Fgrp3 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
SD.Fgrp4 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
SD.Fgrp5 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
SD.Fgrp6 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
SD.Fgrp7 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
SD.Fgrp8 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
SD.Fgrp9 <- sample(c(0,1), 10, replace=T, c(0.28,0.72))
## Combine the 9 groups into one 10 x 9 table (one column per group).
Female.9grp <- cbind(SD.Fgrp1, SD.Fgrp2, SD.Fgrp3, SD.Fgrp4, SD.Fgrp5,
                     SD.Fgrp6, SD.Fgrp7, SD.Fgrp8, SD.Fgrp9)
colnames(Female.9grp) <- c("Control","t1","t2","t3","t4","t5","t6","t7","t8")
Female.9grp
```

|    | Control | t1 | t2 | t3 | t4 | t5 | t6 | t7 | t8 |
|----|---------|----|----|----|----|----|----|----|----|
| 1  | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 | 0 |
| 2  | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 | 1 |
| 3  | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| 4  | 1 | 1 | 1 | 0 | 1 | 1 | 1 | 0 | 0 |
| 5  | 0 | 1 | 0 | 1 | 1 | 1 | 0 | 1 | 1 |
| 6  | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 1 | 0 |
| 7  | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 | 0 |
| 8  | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |
| 9  | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | 0 |
| 10 | 0 | 1 | 1 | 0 | 1 | 1 | 1 | 1 | 1 |

```r
sum(Female.9grp)
[1] 62
sum(Female.9grp)/90
[1] 0.6889
```

The 9 groups (in columns) of 10 rats each represent one possible randomization of the rats used in the Seralini study. Let’s assume that “Control” is the control group, “t1” is the first treatment group, and so on. If we look at all 90 simulated female rats chosen for the experiment, 62 rats (about 69%) would develop tumors after 2 years, even if no treatments were applied. Again, that’s not too far away from our known population mean of 72%.

*But here’s the important part:* Simply by chance, if we draw 10 rats from a population in which 72% get tumors after 2 years, the number of rats in a treatment group that develop tumors ranges anywhere from 4 (“t8”) to 10 (“t1”).

**Simply due to chance; not due to treatments.** If I did not know about this predisposition for developing tumors in Sprague-Dawley rats, and I were comparing these treatment groups, I might be inclined to say that there is indeed a difference between treatment 1 and treatment 2. Only 5 animals developed tumors in treatment 2, while all 10 animals developed tumors in treatment 1; that seems pretty convincing. But again, in this case, it was purely due to chance.

So my conclusion is that this study is flawed due to the choice of Sprague-Dawley rats, and the duration (2 years) for which the study was conducted. Sprague-Dawley rats appear to have a high probability of health problems after 2 years. And when there is a high probability of health problems, there is a high probability that **just by chance** you will find differences between treatments, especially if your sample size for each treatment is only 10 individuals.

*UPDATE: September 23.* For those of you who would like more information on this study by Seralini et al.,

**please read Emily Willingham’s critique of the study.** It is by far the most comprehensive summary I have read. Emily is on twitter at @ejwillingham. An excerpt:

The possible explanations are legion, but with several different kinds of estrogen receptors with different actions in different tissues, compounds that block a receptor at one concentration but activate it at another, compounds that interact with different kinds of hormone receptors in different ways, and differential effects in different species–it’s no wonder the results with mixtures are themselves so mixed. The one thing that doesn’t leap out here as being involved, among a sea of likely possibilities, is the GM corn itself.

*UPDATE: September 28.* For a graphical demonstration of this post, check out the Inspiring Science blog.

*UPDATE: October 4.* The European Food Safety Authority (EFSA) has released a statement on the Seralini study. Their conclusion (emphasis mine):

EFSA notes that the Séralini et al. (2012) study has unclear objectives and is inadequately reported in the publication, with many key details of the design, conduct and analysis being omitted. Without such details it is impossible to give weight to the results. Conclusions cannot be drawn on the difference in tumour incidence between the treatment groups on the basis of the design, the analysis and the results as reported in the Séralini et al. (2012) publication.

In particular, Séralini et al. (2012) draw conclusions on the incidence of tumours based on 10 rats per treatment per sex which is an insufficient number of animals to distinguish between specific treatment effects and chance occurrences of tumours in rats. Considering that the study as reported in the Séralini et al. (2012) publication is of inadequate design, analysis and reporting, EFSA finds that it is of insufficient scientific quality for safety assessment.

and:

Séralini et al. (2012) draw conclusions on the incidence of tumours based on 10 rats per treatment per sex.

This falls considerably short of the 50 rats per treatment per sex as recommended in the relevant international guidelines on carcinogenicity testing (i.e. OECD 451 and OECD 453). Given the spontaneous occurrence of tumours in Sprague-Dawley rats, the low number of rats reported in the Séralini et al. (2012) publication is insufficient to distinguish between specific treatment effects and chance occurrences of tumours in rats.

I guess that pretty much settles it.

Nice logic but it will do little to convince the general public. Photos of rats with tumors & accompanying text reading “fed GMO corn” is all that needs to be done. We will see this study brought up forever whether or not it is published. It is basically “fact” as far as any reader of an article is concerned.

I think there is a lot of truth to that. The images seem to have been placed into the manuscript for shock value more than anything else, and they will be spread far and wide. Carl Zimmer recently wrote a piece about “de-discovery” that captures your idea well: http://blogs.discovermagazine.com/loom/2012/09/18/the-slow-slow-road-to-de-discovery/

Once a study like this is published, the damage is mostly done. Many studies and many rebuttals will be published, but you are correct. It is tough to put the toothpaste back in the tube. It reminds me of the study many years ago about Bt corn pollen killing Monarch butterflies. Years after that study was shown to be without any real-world significance, people were still wearing butterfly suits to protest GMO crops.

Great example Andrew. I think you are right on the money, particularly as there is no evidence of a dose response in the study (as pointed out here).

Concerning the R code, you could generate all samples in one go using this:

```r
trial = matrix(sample(c(0, 1), 90, replace = TRUE, prob = c(0.28, 0.72)),
               nrow = 10, ncol = 9)
```

It is fun to run the code several times to see the huge variability we can observe in the results, even when there is no treatment.
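Building on that one-liner (a small usage sketch, not part of the original comment), the per-group tumor counts come straight from `colSums`:

```r
## Draw the 9 groups of 10 rats in one call, then count the
## tumor-bearing rats in each group (one column per group).
trial <- matrix(sample(c(0, 1), 90, replace = TRUE, prob = c(0.28, 0.72)),
                nrow = 10, ncol = 9)
colSums(trial)         # tumors per group of 10; varies considerably run to run
range(colSums(trial))  # spread across the 9 groups in this draw
```

Rerunning it a few times shows just how far apart the "best" and "worst" groups can be with no treatment applied.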

Hi Luis,

Thanks for the coding tip! I hadn’t thought of using matrix() to get the groups more efficiently. I’ve updated the post to make your version easier to find.

AK

Seriously, great background research. It’s a sad day when science cuts corners, even if it is to discredit an evil conglomerate.

Great post dude, keep it up

What’s nice about this is that even a lay observer such as I can understand it. Great work.

Here’s a google translate of this comment for those interested:

GM crops that produce tumors? Don’t panic – Amazings.es

Recently the press has jumped on the news that mice fed a transgenic maize variety resistant to glyphosate (NK603), with or without this herbicide, developed tumors the size of a ping-pong ball. This news has been spread in alarmist fashion by various outlets, but only a few have expressed reservations (such as Es.materia, or the Medical Science Center in a much more detailed article, or Control Freaks in a discussion of the methodology used).

Guys, there are flaws in all studies. The group eating the GMOs got more tumors and got them earlier. Bottom line.

Tumors in livestock from eating GMO corn is endemic- try this:

Jerry Rosman talking about pigs

Why haven’t multiple generation studies been done?

Check Andres Carrasco on glyphosate studies

Unlike the ‘organic’ study by Stanford, this one will not go mainstream and if it does, will be subject to ‘expert’ criticism.

Thanks for a good summary of tumor prevalence without treatment. I would add that Seralini et al. might realize the problems here, as they did NO statistical analysis of the tumor data (just summaries like Table 2). I, being curious, however, did. 🙂 I found no significant treatment effects. There is no power in these tests because there are not sufficient data given the levels of variability present (as you show above).

The only inferential analyses conducted were multivariate and these also suffer from sample size problems (see the biofortified.org forum). For a good summary of these types of tests and potential problems, see “Statistical strategies for avoiding false discoveries in metabolomics and related experiments” (Broadhurst and Kell. Metabolomics, Vol. 2, No. 4, December 2006). I found that to be very helpful.

Hi Pdiff,

A very nice summary that you posted on the Biofortified forum (http://www.biofortified.org/community/forum/?vasthtmlaction=viewtopic&t=227.0). I was also thinking of running a generalized linear model on the data, but frankly lost interest after spending most of the day on the topic. 🙂 Thanks for the comment and analysis!

AK

Regarding the “they used the wrong type of rats” point, please read the response from the scientists who undertook this new research.

Dr Michael Antoniou, a reader in molecular genetics and member of Criigen – the Committee of Research & Independent Information on Genetic Engineering – has vigorously refuted questions raised by fellow scientists about the robustness of the study.

Researchers had come under fire from Prof Tom Saunders, head of the nutritional sciences research division at King’s College London, who said the breed of rats used in the study, the Sprague-Dawley, was very prone to mammary tumours – particularly when food intake is not restricted.

“The SD rat was used in the original glyphosate toxicity studies,” Antoniou said.

“In addition, many studies – including many from industry – on GM foods use SD rats. Based on this history of use, it was appropriate to use this strain too. If it was the wrong strain to use here then it was wrong in many previous GM food safety feeding studies conducted by industry and upon which marketing approval was granted.”

He continued: “The key is that there were both quantitative and qualitative differences in the tumours arising in control and test groups. In the former they appeared much later and at most there was one tumour per animal, if at all.

“In the latter case, the tumours began to be detected much earlier (4 months in males; 7 months in females), grew much faster and many animals had two or even three tumours.

“Many animals in the test groups had to be euthanised for welfare legal reasons due to the massive size of the tumours; none of the control animals had to be euthanised but died in their own time. One should not ignore these biological facts.”

This comes from

The Grocer, 20 September 2012

http://www.thegrocer.co.uk/topics/health/scientists-shrug-off-attacks-on-monsanto-gm/cancer-trial/232696.article

Well, I am no expert in statistics. Like other scientists, I also have concerns about the safety of GMOs, especially those carrying pesticide-resistance genes or toxin genes. The rat strain used in this study is the same used in the Monsanto paper (Hammond et al., 2004). If they had used a different strain, critics would argue that the studies are not comparable. The number of rats evaluated is also the same (actually the Monsanto paper used 20 rats, but only analyzed data from 10 rats in each group). So why did no one criticize the small sample in the Monsanto paper (used as supporting data for the safety of this GM crop)? Did the Monsanto researchers know about the 1979 study? If so, why not choose a more suitable strain to allow short- and long-term studies in the same strain?

Hammond et al. Results of a 13 week safety assurance study with rats fed grain from glyphosate tolerant corn. Food and Chemical Toxicology, Volume 42, Issue 6, June 2004, Pages 1003–1014.

The problem isn’t strictly the strain of rats, or the duration of the study, or the sample size. It is the combination of these three things. Sprague-Dawley rats have been shown many times to develop major health problems when allowed to grow for 2 years. This type of rat is fine for 90-day feeding trials, because these late-life problems have not yet appeared. The point is that a very high percentage of these rats develop health problems late in life, so they seem to be an improper test organism if you want to test the effects of a diet later in life. There is simply too much variability to draw conclusions. This is presumably one reason why most studies are not carried out for 24 months when using this type of rat. That does not preclude them from being used for shorter duration studies (such as 90-day feeding trials), as has been done for a very long time.

Hi Roger,

Please see my response to Jefferson Santos above. It addresses your concerns as well. Thanks for visiting and commenting!

AK

Hi Andrew,

I don’t think you are right to say that this type of rat seems to be an improper test organism for trials longer than 90 days.

They were used in the original glyphosate two-year toxicity studies conducted in 2002 for regulatory approval within the EU.

The industry standard for toxicity tests performed by industry for regulatory purposes is the international protocol set out by the OECD (Organisation for Economic Co-operation and Development). This says that long-term carcinogenicity studies should be performed with the same strain of rat as used in shorter mid-term experiments, because this allows effects seen in the shorter experiment to be tracked to see how they develop in the long-term experiment, without the confounding factor that would occur if a different strain of rat was employed. Therefore, based on the past use of SD rats in trials of GM food and glyphosate, it was scientifically correct and consistent to use this strain in Prof Seralini’s long-term study.

There is more on this, and answers to other questions about the trials, here:

http://research.sustainablefoodtrust.org/wp-content/uploads/2012/09/Response-to-criticisms.pdf

I’m not saying Sprague-Dawley rats absolutely couldn’t be used for more than 90 days. I’m saying that as the duration of the study approaches 2 years, it seems there is a much higher probability of health problems that might confound the study. As variability in the experimental units increases, one must account for that increased variability somehow in the experimental design. Since the authors ran no statistics (except for an odd multivariate analysis that Marion Nestle referred to as unnecessarily complex), we can’t be sure how much of their result is due to treatments, and how much due to random variability.
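To give a rough sense of the sample sizes such variability demands, here is an illustrative power calculation using R’s built-in `power.prop.test`. The numbers are mine, not the study’s: the 90% alternative tumor rate is an assumption chosen purely for illustration.

```r
## Illustrative only: rats per group needed to detect an increase in
## tumor incidence from a 72% background rate to an assumed 90%,
## with 80% power at alpha = 0.05.
power.prop.test(p1 = 0.72, p2 = 0.90, power = 0.80, sig.level = 0.05)
## The required n per group comes out several times larger than the
## 10 rats per group actually used.
```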

I think there is a problem with the 72% of rats that developed tumors in the 1979 article, as it seems that those authors are reporting ANY kind of tumor. However, the GMO article is studying specific types of tumours. As the 1979 article is not open-access, I can’t know what percentage of rats developed the specific type of tumour considered in the GMO article, which is the value we should use to make the simulations, isn’t it?

Hi Hh. Thanks for stopping by and commenting. The 1979 article is specifically looking at spontaneous appearance of endocrine tumors, not “ANY kind of tumor.” The authors specifically say that their results “can be explained by the non linear endocrine-disrupting effects of Roundup, but also by the overexpression of the transgene in the GMO and its metabolic consequences.” If the authors theorize the effects are due to endocrine disruption, it seems logical that we should look at the endocrine tumors from previous studies.

Using the same methods, if we reduce the probability to 50% that they would get tumors, we actually are likely to get similarly variable results:

```r
> model.50 = matrix(sample(c(0, 1), 90, replace = TRUE,
+                          prob = c(0.5, 0.5)), nrow = 10, ncol = 9)
> model.50
      C t1 t2 t3 t4 t5 t6 t7 t8
 [1,] 1  0  0  0  0  1  1  1  1
 [2,] 0  0  0  1  1  1  0  1  0
 [3,] 1  1  1  1  0  1  0  0  1
 [4,] 1  0  1  1  0  1  1  0  1
 [5,] 1  1  1  0  0  1  1  1  0
 [6,] 0  0  0  1  0  0  0  0  1
 [7,] 0  0  0  0  0  1  1  0  1
 [8,] 0  1  0  1  0  0  1  0  1
 [9,] 0  0  1  1  0  1  1  0  0
[10,] 1  0  0  1  1  1  0  1  1
```

In this example, we got anywhere between 2 rats with tumors (t4) and 8 rats with tumors (t5). So if we assume that 50% is a more realistic estimate of the number of rats that will develop tumors after 2 years, our results are still HEAVILY influenced by random variability. Take a look at Rachael Ludwick’s comment over at MotherJones (http://www.motherjones.com/tom-philpott/2012/09/gmo-corn-rat-tumor#comment-657883136) to see another example of using 20%, and getting almost identical results.
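The same point can be checked exactly with the binomial distribution, without any simulation (a quick sketch, using the 72% figure from above):

```r
## Probability a group of 10 rats shows 5 or fewer tumors when the
## true rate is 72% -- an apparently "protected" group, by chance.
pbinom(5, size = 10, prob = 0.72)   # about 0.12
## Probability all 10 rats develop tumors -- an apparently "harmed"
## group, by chance.
dbinom(10, size = 10, prob = 0.72)  # about 0.037
```

So roughly one group in eight will look dramatically better than the background rate, with no treatment at all.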

Hi, thank you for your answer. You’re right.

I have made some simulations in MATLAB (I have more experience than in R) to look for the probability of obtaining the results in the control group versus the three treated GMO groups. I have made 100000 simulations of 4 groups with 10 individuals and a binomial probability of 0.72.

Then I obtained the probability of getting results similar to those in the paper, i.e. the first group (control) having at least 2 fewer positives (1s) than each of the second, third, and fourth groups (which I assume is similar to a one-tailed test, isn’t it?), or differing from each of the other three groups by at least 2 positives in either direction (a two-tailed test). The probabilities I obtained are:

n_i = number of positives in the group i

p( (n_1 - n_2) <= -2 AND (n_1 - n_3) <= -2 AND (n_1 - n_4) <= -2 ) = 0.0709

p( abs(n_1 - n_2) >= 2 AND abs(n_1 - n_3) >= 2 AND abs(n_1 - n_4) >= 2 ) = 0.23

Both are non-significant, but the first is in the “gray” zone, isn’t it?

Please, can somebody tell me if this approach is correct?

On the other hand, I have run a Cochran–Armitage test for trend (`prop.trend.test`) in R.

The proportion of female rats (the numbers in parentheses) that developed mammary tumours is not significant:

```r
prop.trend.test(c(5,7,7,8), c(10,10,10,10), c(0,1,2,3))

	Chi-squared Test for Trend in Proportions

data:  c(5, 7, 7, 8) out of c(10, 10, 10, 10),
 using scores: 0 1 2 3
X-squared = 1.8462, df = 1, p-value = 0.1742
```

The proportion of males with pathological signs in the liver is significant:

```r
prop.trend.test(c(2,4,7,6), c(10,10,10,10), c(0,1,2,3))

	Chi-squared Test for Trend in Proportions

data:  c(2, 4, 7, 6) out of c(10, 10, 10, 10),
 using scores: 0 1 2 3
X-squared = 4.5113, df = 1, p-value = 0.03367
```

what do you think?

Sorry, the data got garbled when I posted; the expressions should read:

p( (n_1 - n_2) <= -2 AND (n_1 - n_3) <= -2 AND (n_1 - n_4) <= -2 ) = 0.0709

p( abs(n_1 - n_2) >= 2 AND abs(n_1 - n_3) >= 2 AND abs(n_1 - n_4) >= 2 ) = 0.23

By the way, I also think that the study is poorly designed, and I can’t imagine how they can present the data in Figure 1, Figure 2, and Table 2 without a single statistical test.

Sorry again, I don’t know what happens when I try to post the data: in the first case I obtain a probability of 0.0709; in the second case, 0.23.

Hi Hh,

I’m not very good at Matlab, but am interested in your approach. However, due to the very high traffic we’ve received with this post, we need to migrate to another server sometime this afternoon. I will plan to re-read and comment on your analysis soon, but I need to get ready for the move. I will talk to you soon!

Thanks,

AK

Out of all these random draws, you can pick the one that most closely resembles Séralini’s results, but:

Séralini is a lucky man! He had just enough money and time to do one test… and he managed to get the desired random draw, which gave him a scoop.

What are the chances, with 200 rats, of getting the best score? Not a bad draw (which would make GMOs look good for health), not a middling draw (nothing to say), but something in line with his beliefs?

Hi,

With which rat chow were the animals fed in the study by Suzuki, H. et al. (1979)?

In what kind of cages were the rats kept?

Is a simulation needed to show the weakness of such poor sampling?

Basic statistics say that any variable from a sample has a fluctuation interval of p +- 1/sqrt(n) (high school maths here in France) by simple chance (at 95% confidence, in fact).

For Seralini’s rats, p = 0.72 and n = 10, so the observed proportion can be anywhere between 40% and 100%, meaning that his “results” are meaningless. The guy wouldn’t smell a rat even if he slept with 200 of them.

Hi Jean,

Certainly, I don’t feel a simulation is *necessary* to demonstrate this fact. But I think it is easier to illustrate how widely results can vary if you have a test organism which is prone to health issues. I will refrain from commenting on the authors’ motives or sense of smell, but I think you are not alone with this opinion. Thanks for stopping by and commenting! -AK

Hi again Andrew,

There wasn’t a reply button to your last post to me so I’m popping up down here.

I note that you are now saying that Sprague-Dawley rats could be used for experiments of more than 90 days, and I think the answer to your latest post is also contained in the link I gave:

http://research.sustainablefoodtrust.org/wp-content/uploads/2012/09/Response-to-criticisms.pdf

“The key thing is that there are big differences between the tumour frequencies in the control and the experimental groups (see previous answer). Claims that the results are just the result of random variation in a rat line that has a high frequency of tumours are not valid. The evidence for this is that the differences between the groups are much larger than the standard deviations of the two groups. In Seralini’s study, the differences are so large that it is not necessary to use a statistical test.

This study used more rats in test groups, for a far longer duration, than any previous investigation employed by industry to obtain approval for NK603 GM maize and other GM crop products.”

Hi Roger, Sorry about the Reply button missing… I may have the number of response levels set too low. We’ll actually be migrating servers in the next 24 hours due to the heavy traffic this post has created, so once we’re moved, I will try and fix that.

With regard to your point: I’m not saying Sprague-Dawley rats should or shouldn’t be used for experiments, as I am not an animal scientist. But it appears to me, based purely on the likelihood of health issues, the COMBINATION of sample size, rat line, and duration present a high likelihood of observing differences between treatments purely due to chance.

With respect to your link, certainly the authors will defend their choice of methods/analysis. Can you point me to an independent scientist (*not* a co-author or close collaborator) that will defend the experimental design and analysis? I’ve not heard anyone who is willing to defend the unconventional analysis conducted here. This is probably because conventional statistical tests would show that the results are likely to occur by chance. And that is the point of my post. That because of the COMBINATION of rats, duration, and sample size, the results could easily have occurred by chance alone.

“The evidence for this is that the differences between the groups are much larger than the standard deviations of the two groups. In Seralini’s study, the differences are so large that it is not necessary to use a statistical test.”

Standard deviation on data doesn’t say anything about the uncertainty of the data: a thermometer can read 10°C with a standard deviation of 1°C, but if its accuracy is 3°C (a common systematic error), you can’t tell whether there is a temperature difference from another thermometer reading 12°C.

And basic statistics say that with a sample of 10 rats, the uncertainty is huge: +-1/sqrt(10), or about 30%.

In the Seralini paper, the results of the groups clearly overlap, and no scientific conclusion should be drawn.

Hello again Andrew and Jean,

In reply I can only supply some of the relevant answers produced by the Séralini team given at their press conference. Hope this helps…

Q: Why have you not used a standard statistical method?

A: These methods have not been judged satisfactory by expert agencies to demonstrate toxicity for groups of 10 rats. However, the maximum deviations of deaths or tumors (600 days, 2-5 times more) speak for themselves.

In addition, there is an underestimation of the tumorigenic effects at the end of two years compared to controls according to these curves data. This underestimation is due to the fact that the controls are living longer and developing conditions, including tumors, towards the end of life.

Q: What degree of confidence is there in the significant differences found by the statistical method OPLS-DA, there is no p-values?

A: This is one of the most modern methods for treating a large number of variables, as in genomics; indeed, the significance does not pass through the p-value reserved for other tests.

Q:What is the magnitude of the difference in mortality of the controls compared to the historical norm?

A: Each experiment having its own conditions, the historical norm is too large to be a relevant comparator. The controls are in the average normal life, and our differences are compared to the controls of the experiment.

Q: It is recommended to experiment on 50 rats for a statutory study on carcinogenesis. What value should be given to your results on 10 rats?

A: We studied 200 rats, 10 rats/group. Statutory biochemical studies are recommended by the OECD on 10 rats per group minimum.

No statutory study which allowed the authorization of GMOs had more than 10 rats measured per group.

We therefore made the most robust tests in the world, especially as we were examining the long term.

We could not anticipate the tumor results, but we observed and recorded them in this study, which was normal; this was not a carcinogenesis study, and a carcinogenesis study would not have allowed us to observe the hepatorenal and other effects.

“In reply I can only supply some of the relevant answers produced by the Séralini team given at their press conference. Hope this helps…”

@Roger Mainwood

No, it does not. Seralini et al. have been unresponsive or diversionary in order to avoid the simple fact that invalidates their conclusions: their data have a natural fluctuation range of +-30% and overlap, making their conclusions baseless.

To be clear, the criticism is not about n= 10 Sprague-Dawley rat groups. It’s that

1) the rats are tested over such a long period that it gives a 72% ***natural*** cancer rate

2) the small sample n=10, combined with the high natural cancer rate, gives a huge natural cancer fluctuation range, here 72% +- 1/sqrt(10), i.e. 40% to 100%! Such a poor experimental setup (too long a period, too small a sample, hence too large a natural fluctuation interval) would never have been approved for funding by a scientific commission. (Any toxicity study worth its salt, for example in pharmaceutical research, must have its experimental setup pre-approved before funding, in order to avoid flawed setups and after-the-fact avoidance of “inconvenient” results and statistics, in other words, lying by omission.)

Had Seralini run his tests over shorter periods, like 3 months, on SD rats, or over longer periods but with other rats in which natural cancer occurrence is around 0%, he would have halved the natural fluctuation range to 0-30% even with 10-rat groups. He might have reduced that natural fluctuation range somewhat (though not by much) with some sampling techniques, provided he made the associated statistics public for independent checks (but he did not).

Then maybe his results would be statistically credible. Or maybe not.

@Jean Demesure

I agree with you about the potential to develop tumours, but what about the hepatic and kidney damage observed in male rats? I suppose it is not invalidated by the 70% natural cancer rate, is it?

And what about the statement of a sooner development in tumours in GM-feed rats. Do you think this is affected by the natural cancer rate?

A quick note to some of the comments above.

Standard errors on percentages: The tumor data in Table 2 (numbers in parentheses) are binomial count-type data, not continuous data. When computing a standard error for a percentage from these data, using 1/sqrt(n) is not appropriate. If the proportion of tumors (i.e., the %/100) is labeled p, then the estimated standard error is SE = sqrt(p*(1-p)/n). A 95% confidence interval is then p ± 1.96*SE. This is an approximation, and it becomes tenuous at very low or high values of p.
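A minimal Python sketch of that calculation, using the 72% background tumour rate and n = 10 discussed in this thread (both numbers are taken from the comments above, not from any new source):

```python
import math

def wald_ci(p, n, z=1.96):
    """Approximate 95% CI for a binomial proportion: p +/- z*sqrt(p*(1-p)/n)."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# 72% background tumour rate, group of n = 10 rats
lo, hi = wald_ci(0.72, 10)
print(f"SE = {math.sqrt(0.72 * 0.28 / 10):.3f}")  # 0.142
print(f"95% CI = ({lo:.2f}, {hi:.2f})")           # (0.44, 1.00)
```

Note the resulting interval is a bit narrower than the ±1/sqrt(10) ≈ ±0.32 quoted earlier in the thread, which is exactly the correction being made here.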

Seralini’s press conference replies: They are quoted as:

“Q: Why have you not used a standard statistical method?

A: These methods have not been judged satisfactory by expert agencies to demonstrate toxicity for groups of 10 rats. However, the maximum deviations of deaths or tumors (600 days, 2-5 times more) speak for themselves.”

So their answer is: the methods used weren't approved, but it doesn't matter because we think things just look big enough. Is this just lost in translation?

“Q: What degree of confidence is there in the significant differences found by the statistical method OPLS-DA, there is no p-values?

A: This is one of the most modern methods to treat a large number of variables, such as in genomics; indeed the significance does not pass through the p-value reserved for other tests.”

This is true. PLS-DA is a common descriptive statistical procedure used in metabolomics and normally does not produce inferential statistical tests (p-values). The authors, however, went ahead and induced an implied testing procedure on the analysis by using a resampling (jackknife) technique to generate the confidence intervals presented in Figure 5. This, in and of itself, is not unreasonable.

Note, however, that to do this they first compare every one of the nine treatments to the control (each is a separate PLS-DA analysis). Within each of those, they potentially then compute 48 confidence intervals, the purpose of which is to provide a means of determining the significance of each of the 48 metabolites (i.e., inferential testing). This raises the issue of multiple testing and lack of control over the overall error rate. Normally, an adjustment to the confidence level, such as Bonferroni's, would be made here. Above, I said "potentially then compute 48 confidence intervals" because we don't know what happened in 8 of the 9 possible analyses. Only one is reported in Figure 5.

These issues with the PLS-DA analysis, plus other more serious ones not described here, answer the original question of how much confidence there is in the "significant differences found by the statistical method OPLS-DA". The answer is simple: NONE.
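To illustrate the Bonferroni point numerically: with 48 intervals per analysis, each interval has to be computed at level 1 − 0.05/48 to keep the family-wise error rate at 5%. A quick sketch (the 48 comparisons and the 5% level come from the discussion above; nothing here is taken from the paper itself):

```python
from statistics import NormalDist

m = 48                      # number of metabolite comparisons per PLS-DA analysis
alpha = 0.05                # desired family-wise error rate
alpha_per_test = alpha / m  # Bonferroni: each interval at level 1 - alpha/m

z_unadjusted = NormalDist().inv_cdf(1 - alpha / 2)
z_bonferroni = NormalDist().inv_cdf(1 - alpha_per_test / 2)

print(f"per-test alpha = {alpha_per_test:.5f}")
print(f"z for a single 95% interval     = {z_unadjusted:.3f}")  # 1.960
print(f"z after Bonferroni for 48 tests = {z_bonferroni:.3f}")
```

Roughly speaking, each jackknife interval would need to be about z_bonferroni/z_unadjusted ≈ 1.7 times wider before "significance" for any single metabolite could be claimed at a 5% family-wise rate.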

Very interesting, Pdiff. PLS DA is not an analysis I’m very familiar with (either conducting, or interpreting). It has been a few years since I’ve run any kind of discriminant analysis… So it is good to hear your view of the way it is presented.

I’ve pasted in below Jean’s reply to my post of Sept 21st. Somehow Jean’s reply appeared way back up the thread, but since it refers to my post above I thought it would be good to have it down here too so others can follow the debate more easily. I follow it with a further reply….

Jean Demesure says:

September 22, 2012 at 10:11 am

“In reply I can only supply some of the relevant answers produced by the Séralini team given at their press conference. Hope this helps…”

@Roger Mainwood

No, it does not. Seralini et al have been unresponsive or diversionary in order to avoid the simple fact that invalidates their conclusions: their data have a natural fluctuation range of ±30% and overlap, making their conclusions baseless.

To be clear, the criticism is not about n= 10 Sprague-Dawley rat groups. It’s that

1) the rats are tested over such a long period that it gives a 72% ***natural*** cancer rate

2) the small sample of n=10, combined with the high natural cancer rate, gives a huge natural cancer fluctuation range, here 72% ± 1/sqrt(10), i.e. 40% to 100%! Such a poor experimental setup (too long a period, too small a sample, hence too large a natural fluctuation interval) would never have been approved for funding by a scientific commission. Any toxicity study worth its salt, for example in pharmaceutical research, must have its experimental setup pre-approved before funding, in order to avoid a flawed setup and after-the-fact avoidance of "inconvenient" results and statistics; in other words, lying by omission.

Had Seralini run his tests over shorter periods, like 3 months on SD rats, or over longer periods but with other rats where natural cancer occurrence is around 0%, he would have halved the natural fluctuation range to 0-30% even with 10-rat groups. He might reduce (though not by much) that natural fluctuation range with some sampling techniques, provided he made the associated statistics public for independent checks (but he did not).

Then maybe his results would be statistically credible. Or maybe not.

@Jean Demesure

In response:

I think the 72% ‘natural’ cancer rate comes from the Suzuki paper:

http://www.ncbi.nlm.nih.gov/pubmed/521452

The Suzuki paper saw 81% tumours in rats living more than 2 years, and 72% in 2-year rats? (I can’t see the full paper, only the abstract.)

This of course is historical control data: rats bred in different years, with a different diet, different pesticide residues, and different environmental conditions than the Seralini rats. So there is a big range, and much data ‘noise’ that effectively acts as confounding factors. The Seralini team have pointed out that they do controlled experiments to exclude all variables except the one thing being tested, i.e. GM maize, or GM maize + Roundup, or Roundup (each being tested individually). What matters is the control within the experiment, i.e. NOT historical control data.

Whoah …Andrew, something has gone skewy with the date system on this thread since the change of server. Check back up the thread for latest posts.

Hi,

It is pretty clear that the greater tumour rate is not significant. However, I have made some calculations, and the increase in the rate of individual male rats that presented hepatic lesions with the greater GM maize % in diet is statistically significant:

First I did a Cochran–Armitage trend test:

prop.trend.test(c(2,4,7,6), c(10,10,10,10), c(0,1,2,3))

        Chi-squared Test for Trend in Proportions

data:  c(2, 4, 7, 6) out of c(10, 10, 10, 10),
using scores: 0 1 2 3
X-squared = 4.5113, df = 1, p-value = 0.03367

Then I did a logistic regression:

i <- 0:3
t <- c(2,4,7,6)
n <- c(10,10,10,10)
lHep <- glm(cbind(t, n - t) ~ i, family = binomial(logit))
summary(lHep)

Deviance Residuals:
       1        2        3        4
-0.40231  0.04484  0.95884 -0.67897

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -1.0776     0.5880  -1.833   0.0668 .
i             0.6432     0.3134   2.052   0.0401 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 6.2059  on 3  degrees of freedom
Residual deviance: 1.5442  on 2  degrees of freedom
AIC: 16.113

Number of Fisher Scoring iterations: 4

Sorry, I am not very experienced with logistic regression, but does this mean that the transgenic diet seems to increase the risk of suffering hepatic lesions?
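As a cross-check, the prop.trend.test statistic can be reproduced directly from the Cochran–Armitage formula (a Python sketch; the counts 2, 4, 7, 6 out of 10 and the scores 0–3 are the same inputs as above):

```python
import math

def cochran_armitage(x, n, scores):
    """Chi-squared test for trend in proportions (as in R's prop.trend.test)."""
    N = sum(n)
    p_bar = sum(x) / N
    num = sum(s * (xi - ni * p_bar) for s, xi, ni in zip(scores, x, n)) ** 2
    den = p_bar * (1 - p_bar) * (
        sum(ni * s * s for s, ni in zip(scores, n))
        - sum(ni * s for s, ni in zip(scores, n)) ** 2 / N
    )
    chi2 = num / den
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared, df = 1
    return chi2, p_value

chi2, p = cochran_armitage([2, 4, 7, 6], [10, 10, 10, 10], [0, 1, 2, 3])
print(f"X-squared = {chi2:.4f}, p-value = {p:.5f}")  # matches R: 4.5113, 0.03367
```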

@Hh

I won’t pretend I understand all that Hh but I’m sure it is interesting for those that do!

There does seem to be a lot of confusion around though. Monsanto for example, in their response, has said:-

“Despite author’s reference to OECD Testing Guidelines, the study design does not meet OECD standard for number of animals in a chronic study design (50 per group),… ”

Now, what are they talking about here? They give no reference to which OECD standard they are referring to.

Monsanto is correct that standard OECD cancer protocol is 50 rats but as Seralini has said, he had no reason to expect NK603 caused cancer as it was claimed this maize was totally safe in that respect.

So he did not do the OECD cancer protocol but did do the OECD chronic tox protocol. But in fact, cancer tumours were found.

Chronic toxicity study is 10 male + 10 female per treatment group under OECD protocol, which is what Seralini did.

The Seralini paper has been peer-reviewed and accepted by a reputable scientific journal with an IF of about 3 (yes I had to look that one up!) – so I think the least we can say is that this whole issue needs further experimentation.

The obvious next step is to do an OECD cancer protocol with 50 animals in each test group.

I think it is a mistake just to focus exclusively on the tumour data when so many other pathologies are evidenced, especially when talking about statistical significance that requires a sufficient sample size. Granted, this focus has in part been due to the authors including photos of tumours in their publicity!

[Actually, according to my cigarette packet calculation there seems to be a borderline (well, almost) statistically significant conclusion to be drawn from the mammary tumour data, but it would probably relate mostly to the Roundup group. But let’s forget the Roundup data for a minute.]

This is from the paper’s Table 2

http://research.sustainablefoodtrust.org/wp-content/uploads/2012/09/Final-Paper.pdf

The table shows the number of pathologies, and number of rats affected in brackets

I removed the results of the Roundup group to fit it all in.

The first number in each row is the number of the pathologies in the controls (the number of rats in brackets), and the next three numbers are the number of the pathologies in the GMO feed groups GMO 11%, GMO 22% and GMO 33% respectively

Extract from Table 2

                          Control   GMO 11%   GMO 22%   GMO 33%
Males, in liver             2 (2)     5 (4)    11 (7)     8 (6)
In hepatodigestive tract    6 (5)    10 (6)    13 (7)     9 (6)
Kidneys, CPN                3 (3)     4 (4)     5 (5)     7 (7)
Females, mammary tumors     8 (5)    15 (7)    10 (7)    15 (8)
In mammary glands          10 (5)    22 (8)    10 (7)    16 (8)
Pituitary                   9 (6)    23 (9)    20 (8)     8 (5)

The number of rats in brackets is out of 10 rats in the group.

For example, the data in the ‘Females, mammary tumors’ row shows that there were 5 rats in the control group with mammary tumours versus 7, 7 and 8 respectively in the three GMO-fed groups. It is true that on its own this difference would not be significant; e.g. if you took the base probability of a rat having a tumour as 27/40, i.e. (5+7+7+8)/40 = 0.675, then a binomial calculation for the control rats or the other rats would give something like p = 0.2.
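That back-of-envelope figure checks out against the exact binomial tail (a small Python sketch; the pooled rate 27/40 and the 5-of-10 control count are from the paragraph above):

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Probability a control group of 10 shows 5 or fewer tumours,
# if every rat has the pooled tumour rate 27/40 = 0.675
print(f"{binom_cdf(5, 10, 27/40):.3f}")  # 0.196 -- roughly 0.2, as stated
```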

However when you consider the table AS A WHOLE, it’s pretty clear that the number of pathologies is much greater in the GM-fed rats overall than in the controls.

Six pathologies were evaluated across the untreated and the 3 treated groups, allowing 18 numerical comparisons with the rats in the control group. In only ONE of these 18 comparisons is the number of tumours in the treated group less than the control (9 vs 8 in the pituitary category of GMO33%).

Just adding up the columns, the total number of pathologies in the controls was 38 versus 79, 69 and 63 for the three GMO feed groups respectively. You don’t need a formal statistical test to see there is a significant result here!

So it’s a pity that the discussion rages along about the significance of one row – though as I say it’s partly the authors’ fault for that!

Regards

Hi Walter,

I would certainly not agree that Table 2 clearly shows anything. Without either comparing these things statistically, or at least showing all of the other “pathologies” that the group measured, it is really difficult to say anything about these results. How many other “pathologies” did they find that were not presented? How many of those were greater in the control group?

It seems to me that Table 2 is a “cherry picked” set of data that seemed to show what they wanted to show. But even then, if we are willing to say that a difference of 1 or 2 rats getting these pathologies is “significant” in the absence of supporting statistics, how can we explain the GMO group getting fewer pituitary abnormalities with greater GMO, but fewer at lower doses in the GMO-R treatments? Or more pituitary abnormalities at the 22% dose compared to 11 or 33% doses in the GMO+R treatments? All of this says to me that these abnormalities are all within the amount of normal variation one might see in the absence of treatments. Without a supporting statistical analysis, it is difficult to conclude otherwise.

I think you should get your story straight, Andrew. According to you, the table shows nothing, and is a fiddle as well.

Although your two objections are contradictory (unless the researchers are complete fools, unable even to cherry-pick data to get significance), let’s take them individually.

“I would certainly not agree that Table 2 clearly shows anything.”

Well here are the number of pathologies in control rats compared with GMO feed (just taking the mean of the three GMO feed levels)

Males, in liver 2 pathologies versus 8

In hepatodigestive tract 6 versus 10.7

Kidneys, CPN 3 versus 5.3

Females, mammary tumors 8 versus 13.3

In mammary glands 10 versus 16

Pituitary 9 versus 17

Do any statistical analysis you like, the difference will be significant. Here is a very simple one: the number of rats with each pathology was greater in each GMO group except one, the ‘sign test’ giving seventeen out of eighteen. What are the chances of this happening by chance, if the feeds are all the same? Quite remote, I think, less than 1 in ten thousand.
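That ‘quite remote’ figure can be checked exactly (a short Python sketch of the one-sided sign test just described; the 17-of-18 count comes from the comparisons above):

```python
from math import comb

def sign_test_upper(k, n):
    """P(X >= k) for X ~ Binomial(n, 0.5): a one-sided sign test."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2**n

# 17 or more of 18 fair-coin comparisons going the same way
p = sign_test_upper(17, 18)
print(f"p = {p:.6f}")  # 19/2**18, about 7.2e-5 -- under 1 in 10,000
```

So the raw sign-test probability does back the ‘less than 1 in ten thousand’ figure, leaving aside the separate question of whether the 18 comparisons are independent.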

“It seems to me that Table 2 is a “cherry picked” set of data that seemed to show what they wanted to show.”

Do you have any evidence for this claim? I don’t think so, as you haven’t presented any; and also, you have already said the data didn’t show anything anyway!

It’s not just about ‘a difference of 1 or 2 rats’ – in each of six groups (on average) that is – but the number of pathologies, which is nearly double in the GMO groups.

“…how can we explain the GMO group getting fewer pituitary abnormalities with greater GMO, but fewer at lower doses in the GMO-R treatments?”

‘How can we explain…’ is going from science to rhetoric, awarding yourself the benefit of any doubt.

You have answered your own question – random variation. Anyway I think the researchers do explain the lack of dose response, there being a threshold in hormone effects or something like that.

Regards

Roger,

My story is quite straight, and I think I’ve been very clear. The results presented in the Seralini paper are flawed due to poor design and (lack of) analysis. They are likely due to random variability as I’ve described. I don’t believe there is any need to explain the pituitary abnormalities, precisely because I think the data are meaningless. I was asking you to explain the data, because you are the one trying to “see” trends where none exist. If you agree with me that the results are due to random variation, then we can agree that the results are basically meaningless.

I would encourage you to read Emily Willingham’s piece: http://goo.gl/WLArA She has presented the same data in a far easier to read format than Seralini et al. She makes a pretty convincing case that the data show absolutely no effect of NK603 corn.

Hi Andrew,

Yes, you have been very clear and I appreciate that you are now saying Sprague-Dawley rats couldn’t be used for more than a 90 day experiment, but I’m actually not trying to see any trend. I’m just trying to work out what’s going on here.

We have been presented with a peer-reviewed study, accepted by a respected science journal, I think we need to take it seriously and see what the scientists who conducted it are actually saying. Hence my posts.

Significant typo! I meant of course …”I appreciate that you are now saying Sprague-Dawley rats could be used for more than a 90 day experiment,”

Also I found this of interest – it comes courtesy of GMWatch in the UK.

Seralini based his study on the chronic toxicity part of OECD protocol no. 453. It states that for a carcinogenesis trial you need a minimum of 50 animals of each sex per test group, but for a toxicity trial a minimum of 10 per sex suffices.

Monsanto’s earlier 90-day feeding study on NK603, submitted to the EU in support of its approval, had been re-analysed by Seralini’s team. They found it revealed signs of liver and kidney toxicity:

de Vendomois, J. S., F. Roullier, et al. (2009). “A comparison of the effects of three GM corn varieties on mammalian health.” Int J Biol Sci 5(7): 706-726.

So for this new experiment, Seralini’s team chose a chronic toxicity protocol to see if the signs of liver and kidney toxicity escalated into something serious, which they clearly did.

We MUST remember that the study embarked on by Seralini’s team was NOT a carcinogenicity study but a chronic toxicity study. That is, they did not embark on a study to see if the GM maize or Roundup caused cancer. The rise in tumour incidence was unexpected and a surprise, and thus not planned for.

So it is disingenuous to call this work a carcinogenicity study. The experimental design that Seralini used, compared with Monsanto’s 90-day study, was more extensive (longer; greater number of test groups; greater range of parameters measured; diets better characterised, including certainty that the control diet was non-GM, which Monsanto failed to provide data on in their 90-day feeding trial; proper control diets as stipulated by EU GMO legislation instead of irrelevant control diets as used by Monsanto).

It is also worth remembering that Monsanto used 20 rats of each sex per group in its feeding trials but, bizarrely, only analysed 10, the same number as Seralini. So Monsanto does not have a leg to stand on on this point!

We wonder why Monsanto only analysed 10 rats out of 20. Were these randomly chosen, or were they selected because they were apparently healthy? Monsanto’s data, like most such industry feeding trial data on GMOs, are not published, so we cannot check this.

Roger, I don’t know who wrote that piece you posted from GM Watch, but they seem to have little understanding of how science is approached.

Just to tackle one of the points raised: if the appearance of cancers was unexpected, the first thing the authors should have done was go to the literature. There they would have found research papers going back to 1956 describing the high incidence of cancer – up to 85% in some studies – in this rat strain when kept for long periods of time. They would also have discovered that a high incidence of mammary cancers in females was normal.

At that point, they should have identified the issue and understood their experimental set-up was of no value in trying to determine whether the treatments were having an effect on tumours.

Instead, this appears to be a post hoc explanation for why the researchers didn’t have an appropriate design for their experiment. As the design was wrong, whether a priori or post hoc doesn’t matter: the conclusions are bogus. Seralini should never have made them. He is either incompetent at science or it was deliberate.

Chris – I again have to refer you to what the Seralini team said on this…

Q: It is recommended to experiment on 50 rats for a statutory study on carcinogenesis. What value can be placed on your results from 10 rats?

A: We studied 200 rats, 10 rats/group. Statutory biochemical studies are recommended by the OECD on 10 rats per group minimum. No statutory study which allowed the authorization of GMOs had more than 10 rats measured per group.

We therefore made the most robust tests in the world, especially as we were examining the long term.

We could not anticipate the tumour results, but we observed and recorded them in this study, which was normal; it is a carcinogenesis study that would not have allowed us to observe the hepatorenal and other effects.

…As I said in an earlier posting I think the least we can say is that this whole issue needs further experimentation.

and the obvious next step is to do an OECD cancer protocol with 50 animals in each test group. Would you be in support of that?

In addition Chris I should add that all you really seem to be bringing up again is the “they’ve used the wrong rat” argument.

I feel that issue has been dealt with at some length in other posts here. But to be clear, here is more on that particular point.

CRITICISM: Strain of rats used Sprague-Dawley (SD) is prone to tumours

RESPONSE: SD rats have been used in most animal feeding trials to evaluate the safety of GM foods, and their results have been used by the biotech industry to secure approval to market GM products. They were used in the 90-day feeding trial that was conducted by industry to evaluate the toxicity of NK603 GM maize as part of the application for approval within the EU. They were also used in the original glyphosate two-year toxicity studies conducted in 2002 for regulatory approval within the EU.

The industry standard for toxicity tests performed by industry for regulatory purposes is the international protocol set out by the OECD (Organisation for Economic Co-operation and Development). This says that long-term carcinogenicity studies should be performed with the same strain of rat as used in shorter mid-term experiments, because this allows effects seen in the shorter experiment to be tracked to see how they develop in the long-term experiment, without the confounding factor that would occur if a different strain of rat was employed. Therefore, based on the past use of SD rats in trials of GM food and glyphosate, it was scientifically correct and consistent to use this strain in Prof Seralini’s long-term study.

The rats that consumed NK603 GM maize and/or Roundup in Prof Seralini’s trial had an incidence of tumours that was not just significantly greater than the control rats but also significantly greater than observed in previous studies of SD rats. The tumour incidence in the test groups in his study was overall around three times higher than the normal rate observed in the Harlan Sprague-Dawley rat strain he used, as reported in the literature (Brix et al., 2005), including in the largest study, with 1329 Sprague-Dawley female rats (Chandra et al., 1992).

Furthermore, the key is that there were both quantitative and qualitative differences in the tumours arising in control and test groups. In the control rats they appeared much later and at most there was one tumour per animal if at all. In the treated rats the tumours began to be detected much earlier (four months in males; seven months in females), grew much faster and many animals had two or even three tumours. Many animals in the test groups had to be euthanised under animal welfare rules due to the massive size of the tumours; none of the control animals had to be euthanised but died in their own time. One should not ignore these biological facts.

Roger, if you read the OECD guidelines yourself rather than relying on GMWatch, you will discover 1) the guideline for chronic toxicity (OECD Guideline 452) requires a minimum of 20 animals per treatment and sex, not 10. The number 10 comes from the guideline for a combined chronic toxicity and carcinogenicity test (Guideline 453). However, the authors have already said this was not a carcinogenicity test. Even then, Guideline 453 requires a minimum of 50 animals per treatment and sex with a minimum of 10 in the chronic toxicity arm. The 10 required here is because data from the 40 in the carcinogenicity arm will support the results of the chronic toxicity arm. So Seralini did not follow the OECD guidelines.

Secondly, the OECD guidelines require sufficient animals for a statistical evaluation. This means doing a power analysis before beginning. What a power analysis does is tell you how many subjects you need to reliably find an effect you are testing for if it is present. 10 subjects might be sufficient if you are looking for a very large difference (for example where the difference in values is much greater than the standard deviation), but 50 subjects would be needed if you are looking for a smaller difference.

In this case, because the rats are highly prone to cancer, with between 70 and 85% expected to get cancer in their lifetime, 10 individuals was simply far too few. As it was far too few, there was no point in doing the test in the first place, and having done the test, no conclusions can be drawn from it. The fact that Seralini wants to draw conclusions from this test, and failed to alert his readers to the prior literature showing high numbers of cancers, indicates he is either incompetent or did this deliberately.
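Chris's point about power can be made concrete with a rough normal-approximation power calculation for comparing two proportions (a sketch only: the 72% background rate comes from the thread, while the 92% treated-group rate is a purely hypothetical effect size chosen for illustration, not a number from the study):

```python
from statistics import NormalDist

def power_two_proportions(p1, p2, n, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions,
    with n subjects per group (unpooled normal approximation)."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    se = ((p1 * (1 - p1) + p2 * (1 - p2)) / n) ** 0.5
    return nd.cdf(abs(p1 - p2) / se - z_crit)

# Background 72% tumour rate vs a hypothetical 92% in the treated group
print(f"n = 10: power = {power_two_proportions(0.72, 0.92, 10):.2f}")  # about 0.23
print(f"n = 50: power = {power_two_proportions(0.72, 0.92, 50):.2f}")  # about 0.77
```

Under these assumed rates, groups of 10 would detect even a 20-point jump in tumour incidence less than a quarter of the time, while groups of 50 would do so most of the time, which is the substance of the power-analysis criticism.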

This work is so bad and so skewed it really should be retracted, because it is not science. As it is not science, it raises no questions of science that need to be investigated. What it does do is raise questions about the authors of the study.

Hi again Andrew, just realised you were actually addressing Walter not me when you said “My story is quite straight, and I think I’ve been very clear.”

Hi Andrew. Having called for a statistical analysis, when faced with the simplest one possible you appear to rely on puzzling assertions that nothing is going on. You also appear to be lingering uncomfortably close to that smokescreen:

“If you agree with me that the results are due to random variation, then we can agree that the results are basically meaningless.”

That obviously misinterprets my point. The “pituitary abnormalities” in the control vs GMO comparison (remember we were not discussing Roundup) amount to only one comparison – where the difference was one pathology! You don’t seriously suggest that ‘random variation’ can not explain that ONE difference of one, without simultaneously junking the other SEVENTEEN comparisons?

Emily Willingham’s piece is interesting. She suggests that BPA may be responsible. Your interpretation of her story doesn’t appear straight either. On the one hand there is nothing to explain, the data simply show random variation. On the other hand, it could all be explained by BPA! That’s a second mutually contradictory pair of criticisms from you. A menu is developing, but no coherent position other than that you don’t LIKE the results.

To be fair to Emily Willingham, she is more concerned with the Roundup issue and clearly knows her stuff on the mysterious hormone effects.

She puts forward the useful idea of comparing the number of pathologies per affected rat. So lets look at that (see blog table).

Leaving out the kidney results (for which the ratio was invariably one), the ratio for the GMO feed was less than that for the normal feed only twice out of fifteen comparisons. One was a draw, so let’s say (leaving out the Roundup once more) that twelve out of fourteen comparisons went the way of more pathologies per affected rat in the GMO-fed groups. My sophisticated statistical analysis [sign test using binomdist(12,14,0.5,1) in Excel, or something like that] tells me this is less than a 1% chance. Another highly significant result; surprising that you still insist there is no pattern.

And this is a different entity from the number of rats with tumours – indeed Emily herself argues this might be a better indicator.

Cheers

@Walter “My sophisticated statistical analysis [sign test using binomdist(12,14,0.5,1) in Excel, or something like that] tells me this is less than a 1% chance. Another highly significant result; surprising that you still insist there is no pattern.”

Except that in this case the binomial distribution does not apply, at least not in a simple manner. Here, your 12 and 14 – the number of GMO pathologies and the number of total pathologies, respectively – are both random variates. In a true binomial, the total would be fixed. What you are suggesting would be a two-phase random distribution, where we first draw the number of pathologies and then draw the number of GMO pathologies conditional on that. If we had some assumption about the distribution of the total number of pathologies (a place I’m not willing to go), one could then run multiple simulations such as those above to test your hypothesis more accurately. Depending on how much variability was present in the distribution of totals, the resulting differences could easily be swamped out.

Earlier, I had done some modeling on the tumor occurrence data (the parenthetical data in Table 2) and found no significance. I was leery of taking on the number-of-pathologies data as, to me, they do not seem independent. That is, given that a rat develops some pathology, this is probably going to influence whether it develops another of the same or even a different pathology. Willingham’s relative rate ratio form is nice, but that form eats up the limited replication, which precludes more rigorous analysis in the ANOVA framework.

Hi there Pdiff, good post. In the sign test you just compare the number of outcomes which are greater with the number that are less. In the revised table created by Emily W, out of 14 comparisons, in 12 cases there were more pathologies per affected rat with the GMO feed than the controls. So the outcome of each comparison IS a binary one, suitable in itself for the binomial distribution (as was the previous one I offered, involving the number of rats with pathologies). It’s VERY crude, but it is usually a precursor to a more sophisticated analysis. When an obviously significant outcome results, it’s not usually a precursor to stopping completely! Or dismissing the thing as showing nothing; that shows bias. Not that you have been as dismissive yourself, carrying on with the scientific thought process. I appreciate that 🙂

I am not suggesting modeling the tumour data on its own, I’ve already noted it doesn’t appear significant (maybe you might get significance if you incorporate both the number of rats with tumours and the number of tumours per rat).

Some analysis of the data in Table 2 would be needed. I take your point there probably is a degree of dependence that couldn’t be estimated statistically, perhaps medically. This would reduce the p-value (0.0065 I make it) but by how much? In the first sign test I suggested, the p-value was much less than even that. Can the possible dependence explain it all? It’s hard to see why the results should be dismissed.
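Incidentally, 0.0065 is the upper-tail sign-test probability here; note that Excel’s BINOMDIST(12,14,0.5,TRUE) actually returns the lower tail P(X ≤ 12), so the call wanted is 1 − BINOMDIST(11,14,0.5,TRUE). A quick Python check (the 12 and 14 are the counts from the posts above):

```python
from math import comb

# One-sided sign test: probability that 12 or more of 14 fair-coin
# comparisons go the same way (Excel: 1 - BINOMDIST(11, 14, 0.5, TRUE))
p = sum(comb(14, k) for k in range(12, 15)) / 2**14
print(f"p = {p:.4f}")  # 106/16384, about 0.0065
```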

Regards

@myself Sorry, a typo in last post (or rather, a ‘thinko’)

Just to avoid confusion, I said

“This would reduce the p-value…”

I meant that this would INCREASE the p-value

Duh!

Yes, I see what you saying, although I am not seeing where you come up with 14. If you run across all pathologies (An unwise move, IMO. The sign test you utilize requires independent data, which these certainly are not. This is, in fact, pseudo replication implying more degrees of freedom than actually exist.), I get 28 usable ratios (non ties). Perhaps I’m just being dense here. Did you halve this for some reason?

At any rate, I believe my original premise still applies. While the ratios look like good solid values, they, in fact, are not. We have no good measure on their variability and hence can’t tell if a 1.25 or 1.5 or even 2.0 is really significantly different from 1.0. Ratios are deceiving creatures in that we can not discern the magnitude of underlying differences (a ratio of 2.0 can arise from 1 vs 2 as well as 5 vs 10, yet they are treated equally in the system you propose.). To use the sign test we must be able to classify with certainty each ratio as either greater than or less than the control, with ties being ignored. The certainty of the process is lacking. Because of this, your classification into greater than or less than a control is unintentionally deceiving in its certainty, IMO.

I’ll bite and ignore this, however, in the interest of conversation. A power analysis of the sign test as I see it (n=28, positives=23) looks initially pretty good at 0.95. My reservations about pooling data over pathologies, and the pseudo-replication it raises, remain, however. IMO, it is only appropriate to look at such a test organ by organ. This of course greatly reduces the power of the subsequent tests (to around 0.1) in the non-degenerate cases where it can be computed. As I claimed above, I believe there is little useful statistical analysis to be gained from the ratio data.

A small rant on my part here: aside from these issues, there is the problem of observer bias in these data, as they did NO blinding of the treatments. Given that the pathologies were in large degree assessed by palpation (i.e. observer judgement) and that the researchers have shown overt bias previously against GMO (and will apparently receive financial gain from the outcome via book and movie releases this week), the outcomes in Table 2 are highly suspect. In truth, every researcher/analyst is biased; it’s just human nature. Study blinding is a very effective tactic to circumvent these human limitations. The lack of blinding in this study is by far its most glaring problem and should have been sufficient rationale for rejecting the work outright. Unfortunately, it was not, and subsequently it makes all our arguments and counter-arguments less than academic.

Hi again Pdiff. My number 14 came from 3×5 minus one. Not degrees of freedom, nothing so technical, just from ignoring roundup, thereby reducing Table 2 to six by four, i.e. six pathologies, of which the kidney seems not relevant as the ratio was one throughout (even including the roundup data). Of the fifteen remaining comparisons one was equal and is discarded for the sign test.

You are clearly technically capable – perhaps that’s where the problem lies. Or perhaps, the problem lies with me – perhaps I think it is so ‘obvious’ the data are significant that I am not detailing a sufficiently rigorous argument?

There should be many ways to deal with your legitimate objection, given an apparent preponderance of ‘near misses’.

For example, are the data on Males in liver all about tumours? If so, Emily W’s table gives 3 pairs of data for tumours in male rats and 3 pairs for tumours in females. As the ratio for GMO fed rats is higher in each case, this alone would give a p-value for tumours of 1/2^6 or 1/64, i.e. about 0.016. The text isn’t quite clear enough to be sure of my assumption, though. Narrow escape, or just my lack of technique?

According to the authors in the text, the tumours almost invariably appeared EARLIER in the GMO-fed rats – however we don’t have precise details to fit this information into a numerical test. That’s another narrow escape – the GMO feed is hanging on by the skin of those poor rats’ teeth!

Time to put those rats out of their misery, and do a simple test that answers your valid technical objections. For both male and female rats, there are data on 3 pathologies. There are 3 comparisons for each pathology. The triplet from each male pathology data could be combined with that from each female pathology to produce a sign test with 6 pairs of data. There are therefore 9 such possible sign tests.

Further, these tests do not suffer from the drawback you raised (of the data not being independent), as in each of these sign tests there is only data from one male pathology and one female pathology. So there is no problem in combining them; we are combining data on different rats.

These sign tests are obviously simple to calculate. As I mentioned, in all but one of the tests (the pituitary, in the bottom row of Table 2) there were more pathologies in the GMO-treated rats than in the controls. In the sign tests that include this row, the GMO-fed rats had more pathologies in five out of six. There are three of those.

In the other six tests, however (i.e. using the other two female pathologies) there is a ‘clean sweep’ – the sign test produces six out of six where the GMO-fed rats had more pathologies than the controls.

So for six of the possible sign tests, the p-value was 1/2^6 or 1/64, i.e. about 0.016. For three sign tests it would be 7/32 or about 0.22. So, six tests out of nine produce a statistically significant result.
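[Both quoted figures can be checked directly as tails of the sign test’s null Binomial(n, 1/2) distribution. Note the two are computed differently: 1/64 is a one-sided tail for a clean sweep, while 7/32 corresponds to doubling the five-of-six tail for a two-sided test. A quick sketch:]

```python
from math import comb

def binom_tail(n, k):
    # P(X >= k) for X ~ Binomial(n, 1/2), the sign test's null distribution
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

print(binom_tail(6, 6))      # clean sweep, 6 of 6: 1/64 = 0.015625
print(2 * binom_tail(6, 5))  # 5 of 6, doubled for two sides: 7/32 = 0.21875
```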

Overall, six significant tests out of nine is a far stronger result than a single significant one.

But as I say, with more precise information there would surely be others. It is the authors’ mistake not to hammer the point home, but I think they thought it obvious too :(.

Cheers

Walter

Walter, I am replying to this in a new comment below (I think) to avoid the increasingly narrow formatting of replies. Old eyes are not compatible with the ‘nets.

See below…

Pdiff, on your criticism that the testing wasn’t ‘blinded’: were the other studies that pronounced GM food safe blinded themselves? This systematic review makes no mention of blinding, at least not in the abstract; I can’t tell for sure, but it looks like this was not a prerequisite of the review, at least.

http://www.ncbi.nlm.nih.gov/pubmed/22155268

This is in reply to Chris Preston’s post dated September 24, 2012 at 5:22 pm. Unfortunately there was no reply button on the end of the post so you’ll need to check back to see what he was saying.

In reply:

Seralini used OECD453, not 452. 453 is the combined chronic toxicity and carcinogenicity test. Seralini was doing the chronic toxicity bit. He did not embark on a carcinogenicity test, as there was no data from Monsanto or any independent scientists suggesting that NK603 was carcinogenic. There is no prior data on which to base your claim that Seralini should have done OECD452 rather than OECD453.

This is from OECD453, which Seralini followed:

“Each dose group (as outlined in paragraph 22) and concurrent control group intended for the chronic toxicity phase of the study should contain at least 10 animals of each sex, in the case of rodents”

Seralini used 10 animals each sex per treatment dose.

And 10 animals per sex in the control group.

This is in line with the chronic toxicity arm of 453.

As for your point “Secondly, the OECD guidelines require sufficient animals for a statistical evaluation” and thus a power analysis is required – this is a nice idea but I cannot find in the OECD guidelines where it says you have to do a power analysis. Are you able to point to that bit?

Also, we have never seen a power analysis done by industry to calculate the number of animals needed for the best study design. Instead the standard OECD numbers are used. And because Monsanto say they did not know about any effects NK603/Roundup might cause, there was barely any prior data available, so a power analysis becomes difficult or impossible. This is what EFSA says about power analysis:

“The required sample size in clinical trials is usually determined using a power analysis. This requires a decision on an effect size of scientific interest (EFSA, 2011) and an estimate of the standard deviation (assuming quantitative variables). However, there are difficulties in defining the effect size in a toxicity test where there are multiple possible outcomes, any one of which could indicate toxicity.”

In independent (non-industry) scientific studies, the number of animals is usually decided on an understanding that significant differences could be shown by the data. Having said this, how easy it is to show significant differences depends on the circumstances. If the animals are very similar and the conditions of the experiment are well controlled, significant differences can be detected using 4-5 animals (note: this is NOT the case when using historical control data, as Monsanto says Seralini should do, in its response). If you check peer-reviewed scientific papers on nutrition, other biological effects, even cancer development, you will find scientists using 4-5 rats or mice per treatment group all the time. It is industrial “cancer trials” with animals of different ages, body weights and so on which require 50 animals to be able to show an effect.

To have 50 for cancer studies, and 10 for toxicology, is just the OECD and industry playing with numbers. If 10 animals can show up “tumours” but not “cancer” (according to OECD protocols), this does not invalidate the data. If you plan to look for one side effect but find another, should you not note it? Seralini did not plan to find cancer or tumours at the start of the experiment.

Seralini cited the background data on tumour incidence in his strain/origin of rat. He got his SD rats from Harlan, not Charles River labs, yet Monsanto cites background data from CR rats. This is another variable that makes the Monsanto comparison invalid. You will note in Seralini’s paper he makes specific reference to the Harlan SD strain and publications that have evaluated tumour rates in these animals. He accounted for the background rates in that strain/origin of rat.

Roger, perhaps you should make up your mind. Is this a combined chronic toxicity and carcinogenicity study or a chronic toxicity study? You seem to be changing your mind every time you post. If it was a combined chronic toxicity and carcinogenicity study, there needed to be at least 50 animals per group. One can’t decide one is only going to do the chronic toxicity arm of that type of study. Perhaps you needed to read the two sentences after the one you quoted. They would have provided you with all the information you needed: “It should be noted that this number is lower than in the chronic toxicity study TG 452. The interpretation of the data from the reduced number of animals per group in the chronic toxicity phase of this combined study will however be supported by the data from the larger number of animals in the carcinogenicity phase of the study.”

So why did you not quote the rest?

Roger, when it says a sufficient number of animals should be used so that a statistical evaluation is possible, the only way to work out how many animals are required is to do a power analysis. This is a pretty standard statistical approach. It is not a case of knowing what size the effect will be, but of knowing what size of effect would be a concern. The normal standard deviations are known for all the values usually measured in feeding studies, and it is assumed the s.d. for the treated groups would be the same. Generally speaking, what you are looking for in a whole-food feeding study is a value well outside the known range, often 2 s.d. from the mean. As the effect is greater than the s.d., 20 animals per group would usually be sufficient, depending on the confidence level you wanted. Monsanto’s original test had two groups of 20 rats in the treated arm and 8 groups of 20 rats for controls. From a statistical point of view this is a much stronger study.
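[To put rough numbers on that reasoning, here is a sketch power calculation using a standard two-sample z-test approximation. The 2-s.d. effect size and the group sizes come from the comment above; everything else, including the normal approximation itself, is an illustrative assumption, not anything from the study or the OECD guideline:]

```python
from math import sqrt, erf

def phi(z):
    # Standard normal CDF
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def approx_power(delta_sd, n_per_group, z_alpha=1.96):
    """Approximate power of a two-sided two-sample z-test for a mean
    difference of delta_sd standard deviations, n animals per group."""
    return phi(delta_sd * sqrt(n_per_group / 2.0) - z_alpha)

for delta in (1.0, 2.0):
    for n in (10, 20):
        print(f"effect {delta} s.d., n={n}: power ~ {approx_power(delta, n):.2f}")
```

Under these assumptions a 2-s.d. shift is comfortably detectable at 20 per group, while a 1-s.d. shift is marginal at 10 per group, which is the gist of the argument above.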

‘morning Chris,

Changing my mind every time I post is quite a charge! Could you do the quotes to show that?

Also I should add that Seralini has been really clear that this was not a carcinogenicity study. It was a combined chronic toxicity and carcinogenicity test (OECD 453). Seralini was doing the chronic toxicity bit, as he has clearly said, and as I think I clearly posted. Tumours are one of the endpoints you look for in 453.

Having a crystal ball would have been the only way Seralini could have known he had to do a solely carcinogenic study on NK603.

So, as I have also said in previous posts, the obvious next step is to do an OECD cancer protocol with 50 animals in each test group. Could I assume that you would be in support of that?

And just to make things even clearer, ’increased tumours’ is a valid conclusion to draw from Seralini’s experiment with the numbers of animals he used, according to OECD protocols, because it’s one of the endpoints that you should note in the chronic toxicity phase of OECD453. And because the chronic toxicity phase of OECD453 requires that number of animals (10 per sex per dose group).

On the other hand ‘carcinogenic’ is NOT a valid conclusion to draw from Seralini’s experiment according to OECD protocols, as you would have to do the 50 animal test. And Seralini is very careful what he says in this regard.

The conclusion surely has to be that a 50 animal carcinogenic test needs to be done, based on findings of this chronic toxicity study of Seralini.

Roger, so it was a chronic toxicity study then?

If so, there should have been at least 20 rats per group to conform to OECD guidelines. Seralini’s argument that he only did the chronic arm of a combined chronic toxicity and carcinogenicity study in fact breaches the OECD guidelines as I have pointed out. It is not possible to do just the chronic toxicity arm, because you need the other arm to provide additional information.

As to the tumours, what Seralini should have done was recognise he did not have enough data and not try to draw any conclusions from it. Or at least analyse it properly. I have done a quick and dirty analysis of the female tumour data (it is a bit flawed because the numbers per rat are not reported and total numbers are unlikely to be independent) and there is no significant difference in the treatments.

If Seralini is really saying that he followed OECD guidelines by only doing the chronic toxicity arm of a combined chronic toxicity and carcinogenicity study, then he is being duplicitous.

Chris,

I’m glad you seem to be acknowledging that you were mistaken when you said earlier:

” September 25, 2012 at 6:15 am

Roger, perhaps you should make up your mind. Is this a combined chronic toxicity and carcinogenicity study or a chronic toxicity study? You seem to be changing your mind every time you post.”

There was no confusion on my part about OECD453, which is stated clearly in Seralini’s paper. It was you who somehow claimed Seralini was doing 452. Seralini has been clear: they were using the chronic toxicity arm of OECD453 in their experiment. This arm is carried out and analysed separately, so it is perfectly possible for an independent non-industry scientist to do only that part.

The tumours were an unexpected result, as Monsanto had not found tumours, or had omitted to test for them. As Seralini didn’t do the carcinogenic arm of 453 (lack of resources, probably), he didn’t need 50 per group.

The lack of the carcinogenic arm is why Seralini has never concluded “carcinogenic” but only “increased tumours”; this is a valid endpoint for the chronic toxicity arm of OECD453. So your point about “interpretation” in the OECD guidelines is redundant here, as Seralini has not wrongly or over-interpreted.

Monsanto had 20 animals in each group but only analysed 10 (de Vendomois 2009) and based their statistical analysis on 10 per group! This is stated in published literature, though Monsanto did not publish its study. This is irregular. I’d be interested to know why you think that was.

So Seralini’s study remains the most in-depth. And I have to point out that you haven’t said anything about industry never doing power analysis in its studies for regulatory purposes.

The point is that as a non-industry scientist (OECD guidelines are for industry tests for regulation purposes, and the protocols are not seen as very sensitive by independent scientists) Seralini was under no obligation to do any particular arm of 453. So with regard to independent scientists, they cannot be accused of “breaching” an OECD protocol when they only say they based their study on OECD453 but then added more time and more parameters, and only did the chronic toxicity arm, drawing only conclusions based on what can be seen in that arm.

The Seralini team chose to do the chronic toxicity arm but put in far more parameters (health-effect measurements) and made the study longer than OECD stipulates. They did OECD453 plus extra parameters and length of time, but minus the carcinogenicity arm, which needed far more animals. The conclusions that Seralini drew are valid for that part of the test, and that part only.

That is why I keep asking you whether you would be behind the call for a further experiment for carcinogenicity. This is really up to Monsanto to pay for, as it would be a considerable cost, but I would argue that it should be carried out (paid for by them but done by an independent science body), as it looks like they have previously failed to do a proper carcinogenicity test.

And on the need for more experiments:

A number of independent academics have praised the French team’s work, describing it as the most thorough and extensive feeding trial involving GM to date.

Mustafa Djamgoz, Professor of Cancer Biology at Imperial College London, has said the findings relating to eating GM corn were a ‘surprise’.

And Prof Djamgoz, who describes himself as a neutral on GM, said: ‘The results are significant. The experiments are, more or less, the best of their kind to date.’ However, he said that it is now important to ensure they are repeated with more animals by independent laboratories to confirm the outcome. ‘We are not scaremongering here. More research, including a repetition of this particular study are warranted,’ he said. The professor said it will take two to three years to get a definitive answer.

Roger, it is clear that Seralini et al. did not follow the OECD protocol. You now seem to want to weasel them out of that by claiming that basing their research on the protocol means they could do something entirely different.

I and others have demonstrated why this particular piece of research is a complete load of junk. Despite this, you continue to make excuses for the authors – presumably because you want to believe the results. Every time I, or others, deal with one of the issues you raise, you move the goalposts somewhere else. Clearly I am not going to convince you otherwise, so there is little value in continuing the conversation.

Chris,

You are really putting words in my mouth now. First of all, if you look back at my postings there has been no “shifting of goal posts” at all, and I’d like to know where you think there has been. And you are wrong to assume that I have made my mind up on the Seralini team’s work. All I have done is try to turn people’s attention back to what they are actually saying rather than what others are claiming they are saying. There were an awful lot of instant reactions put out barely hours after the journal published their work, all designed to show that their 2-year study was, as you put it, “a complete load of junk”. More discerning voices have called for their work to be considered seriously. Certainly the peer-review panel that accepted it for publication in the journal Food and Chemical Toxicology was of that opinion. The best conclusion I can come to is that this requires further work, and shouldn’t be dismissed totally out of hand within a few hours.

So again, I have to ask you, would you be behind the call for a further experiment for carcinogenicity, using larger sample sizes? Seralini did not plan to find cancer or tumours at the start of the experiment, so this does seem to be the next logical step. Whether it is the next practical step, because of costs, remains to be seen.

Back to the main charge of this article. Let’s assume that Andrew’s statistical approach is correct and that, due to randomness, there is a possibility of having one group with 5 tumours and the other group with 10 tumours due to chance alone.

Therefore, you are saying that the maximum difference you could get when comparing one group to another, due to chance alone, is that one group could have twice as many tumours as the other. I hope I haven’t misinterpreted that.

However, Seralini found in male rats that one group had four times as many tumours as the other group. This is twice what would be expected by chance alone according to the statistical approach you are choosing. Therefore hasn’t that statistical approach just shown that Seralini appears to have found adverse effects in his rats that cannot be due to chance alone?

Hi Roger, yes, you have misinterpreted that. Under those numbers, 5 and 10 could come up by chance with no treatment. To find something significant you would need more than 10 rats showing the effect.
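[That chance claim can be illustrated with an exact calculation. A background tumour probability of 0.72 per rat is assumed purely for illustration here (Sprague-Dawley rats are discussed in this thread as highly tumour-prone, but this exact rate is not taken from the paper), with 10 rats per group as in the study:]

```python
from math import comb

def pmf(k, n, p):
    # Binomial probability mass: P(X = k) with n trials, success prob p
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_double_by_chance(p_bg=0.72, n=10):
    """Exact probability that, with NO treatment effect, one group of n
    rats ends up with at least twice as many tumour-bearing rats as the
    other (both counts nonzero). p_bg is an assumed, illustrative
    background tumour rate, not a figure from the paper."""
    total = 0.0
    for a in range(n + 1):
        for b in range(n + 1):
            lo, hi = min(a, b), max(a, b)
            if lo > 0 and hi >= 2 * lo:
                total += pmf(a, n, p_bg) * pmf(b, n, p_bg)
    return total

print(round(p_double_by_chance(), 3))
```

Under these assumptions a twofold difference between two untreated groups of 10 comes up a few percent of the time, so a single twofold gap is weak evidence on its own.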

But Monsanto only analysed 10 out of 20 rats in its study (you can check that in the literature), so I think that answers this particular statistical point.

(See my posting September 26, 2012 at 2:09 am for more on this.)

Roger, what has that got to do with your mistaken idea that 5 rats having tumours in one group is statistically different from 10 having tumours in another group?

You almost appear to be saying that because you disagree with how Monsanto ran their study, the Seralini study is therefore correct. That is wrong: each study has to stand on its own merits. I have already pointed out why the Monsanto study was on much stronger ground statistically. It had much larger control groups, more of them, and fewer treatment groups.

All I’m saying is that Seralini found in male rats, that one group had four times as many tumours as the other group. This is twice what would be expected by chance alone and should be investigated further.

You mention that Monsanto used more rats, but Monsanto only analysed 10 out of 20 rats in its study. We have to ask why that was.

The problem I think most people have, when trying to look at this as objectively as anyone can, is transparency. There is a call for Seralini to release more data than he was able to include in the limited space of the journal. This I think is a fair request, and one that I haven’t heard the Seralini team say they won’t be meeting at the appropriate time. But equally Monsanto needs to be far more open with its data than it has been. Monsanto has not only refused to release the raw data behind its safety studies but has even fought through the courts to try to prevent it from reaching the public domain.

@Walter: (Continued from above to facilitate formatting) I’ll take this a piece at a time.

You said: “Hi again Pdiff. My number 14 was got from 3×5 minus one. Not degrees of freedom, nothing so technical, just by ignoring roundup thereby reducing Table 2 to six times four, i.e. six pathologies of which the kidney seems not relevant as the ratio was one throughout (even including the roundup data). Of these fifteen comparisons one was equal and is discarded for the sign test.”

Ok, I see this now. You are eliminating all roundup treatments, not just the water ones. I too dropped the kidney pathology, although sign rank will do this as part of the analysis (ties drop out). For the record, it actually is a degrees of freedom issue as it controls the amount of information entered into the analysis. The DF does not come into play in this analysis directly, however.

One can compute a power analysis on this test. Doing so, assuming n=14 and a proportion of 0.8571 (12/14), gives a moderate value of 0.67. That is, the probability of recovering a true difference, given the assumptions above, is 67%. That’s not bad, but not as high as one would typically desire. I would still argue, however, that these 14 observations are not independent. The different pathologies are all measured across a common set of rats, with some rats exhibiting multiple pathologies, and those may be physiologically dependent (I am not knowledgeable in this area, but see it as a distinct possibility).
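[For what it’s worth, both power figures quoted in this exchange (0.95 for the pooled 28-comparison test, and roughly 0.67 for the 14-comparison test here) can be reproduced, to rounding, by an exact power calculation for a two-sided sign test. This sketch assumes alpha = 0.05 and two-sidedness, since the original calculations don’t state either:]

```python
from math import comb

def sign_test_power(n, p_alt, alpha=0.05):
    """Exact power of a two-sided sign test on n non-tied pairs.

    Finds the smallest critical count k with 2*P(X >= k) <= alpha under
    the null Binomial(n, 1/2), then returns P(X >= k) under the
    alternative success probability p_alt."""
    def tail(k, p):
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))
    k_crit = next(k for k in range(n + 1) if 2 * tail(k, 0.5) <= alpha)
    return tail(k_crit, p_alt)

print(round(sign_test_power(28, 23 / 28), 2))  # 0.95, the pooled test
print(round(sign_test_power(14, 12 / 14), 2))  # 0.68, close to the quoted 0.67
```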

Ignoring this, I will also again raise the issue of significant difference from the control values. You assume that because you get a numerically larger value than the control, this automatically indicates significance, and you designate it a positive hit. As was demonstrated by Andrew and others above, however, such simple numerical superiority is not a sufficient measure of treatment effect in the presence of variability. If it were, I would be out of a job and experimentation would be unnecessary. Given the variability of the real world, we must use probability-based measurements of effect. In this case, we would need to (somehow) make a statistical judgement as to whether a particular treatment exceeds the control value sufficiently (within the range of variability) to record it as a positive hit. Let’s imagine for a moment that the measured pathologies are only accurate to within 2 (0.14 in ratio terms). This, BTW, is very generous, as the actual variability in pathologies is much higher. If we then base significant difference on whether the treatment could potentially be within 2 pathologies of the control value (i.e. control +2 and treatment -2), the sign test now becomes 8 out of 12 or 57%, with a p-value of 0.60. I am not suggesting this is the correct means of analyzing these data, but merely demonstrating the sensitivity of the measurements to variability.

You said: “For example, are the data on Males in liver all about tumours? If so, Emily W’s table gives 3 pairs of data for tumours in male rats and 3 pairs for tumours in females. As the ratio for GMO fed rats is higher in each case, this alone would give a p-value for tumours of 1/2^6 or 1/64, i.e. about 0.016. The text isn’t quite clear enough to be sure of my assumption, though. Narrow escape, or just my lack of technique?”

I’m sorry, but I am not following you here. Where do you see that there are three pairs for males and three for females? In any case, I believe the objections stated above still apply here.

You said: “According to the authors in the text, the tumours almost invariably appeared EARLIER in the GMO-fed rats – however we don’t have precise details to fit this information into a numerical test. That’s another narrow escape – the GMO feed is hanging on by the skin of those poor rats’ teeth!”

This is the palpation data, and it has been judged suspect due to its subjective nature, so let’s skip it for now.

You say: “Time to put those rats out of their misery, and do a simple test that answers your valid technical objections. For both male and female rats, there are data on 3 pathologies. There are 3 comparisons for each pathology. The triplet from each male pathology data could be combined with that from each female pathology to produce a sign test with 6 pairs of data. There are therefore 9 such possible sign tests.”

Still not seeing where this is coming from, but combining genders is a definite non-starter given prior knowledge that males and females of this breed exhibit markedly different outcomes re: tumors.

You say: “Further, these tests do no suffer from the drawback you raised (of the data not being independent) – as in each of these sign tests, there is only data from one male pathology and one female pathology. So there is no problem in combining them, we are combining data on different rats.”

If you do a test with a triplet within a pathology, say Female mammary, then yes they are different rats. The power of the resulting test, however, is too low to even reliably compute (I tried). Combining over genders is not advisable either, not from an independence point of view, but because of the confounding effect of gender. You are not just measuring treatment anymore.

You say “But as I say, with more precise information there would surely be others. It is the authors mistake not to hammer the point home but I think they thought it obvious too.”

I wouldn’t say surely at all, given the data variability. Given Seralini’s past tendency to gloat over significant results, it is very telling to me that they, in fact, did not analyze this data. Being “obvious” is a judgmental decision. I could just as easily point out the number of studies finding no effects from GMO to say “it’s obvious there is no GMO effect”. Care to run your sign test on positive versus negative studies?

I’ll leave you with a quote I posted elsewhere from the statisticians Efron and Tibshirani: “Left to our own devices, we are all too good at picking out non-existent patterns that happen to suit our purposes” (1993).

Hi again. About those sign tests, you say:

“You assume that because you get a numerically larger value than the control that this automatically indicates significance and designate this as a positive hit.”

Pdiff, I don’t think you (or our esteemed host) have quite got the point of the sign test – you are probably too advanced to ever wish to use it! It doesn’t make assumptions about the underlying distributions. The presence of variability is of no consequence.

[Yes you would be out of a job if you relied only on sign tests :)]

Come on down to my level https://en.wikipedia.org/wiki/Sign_test.

Measurement error doesn’t come into it; that’s clutching at straws when the significance level is so strong. I agree it’s a low-power test, but you nevertheless reach statistical significance, so you cannot just dismiss the results. It can also reveal certain technical dismissals as more sticking in the mud than real …

“I’m sorry, but I am not following you here. Where do you see that there are three pairs for males and three for females? In any case, I believe the objections stated above still apply here.”

By the ‘three pairs for males’ I mean control v GMO 11%, control v GMO 22% and control v GMO 33% – three comparisons for each pathology. You have three pathologies for male and three for female hence nine combinations of two triplets.

You concede that combining across the genders bypasses the independence hurdle. Your new objection is “because of the confounding effect of gender. You are not just measuring treatment anymore.”

I think this criticism is rather hopeful. You have given no reason why rats should get various pathologies due to ‘gender’.

“I wouldn’t say surely at all, given the data variability. Given Seralini’s past tendency to gloat over significant results, it is very telling to me that they, in fact, did not analyze this data. Being “obvious” is a judgmental decision. I could just as easily point out the number of studies finding no effects from GMO to say “it’s obvious there is no GMO effect”. Care to run your sign test on positive versus negative studies?”

Personalizing, then changing the subject – not very statistical behaviour! Seventeen numbers out of eighteen, and then twelve numbers (with different meanings) out of fourteen surely IS a pattern. As noted, the explanation of ‘variability’ is a red herring (that’s the point of the sign test).

Your parting quote is interesting:

“Left to our own devices, we are all too good at picking out non-existent patterns that happen to suit our purposes (1993)”.

This does not apply here, because the pattern is clearly not ‘non-existent’; the GMO pathologies at least SEEM overwhelmingly higher, or nobody would be discussing it! The actual question here is whether statistical significance is achieved (it seems that it is, though). In saying ‘picking out non-existent patterns’, you are rather misrepresenting yourself, as you have spent a good deal of time trying to show the ‘pattern’ wasn’t statistically significant.

Your points are becoming less statistical and more rhetorical. Currently, though, we appear to have a few statistically significant results awaiting proper explanation.

Cheers

“Pdiff, I don’t think you (or our esteemed host) have quite got the point of the sign test – you are probably too advanced to ever wish to use it! It doesn’t make assumptions about the underlying distributions. The presence of variability is of no consequence.”

Uhhh, seriously Walter, you need to think about and study this topic more. I don’t think you grasp the consequences of misusing a test statistic. There are definitely assumptions about the distributions. Where do you think your p-values are coming from? Answer: the binomial distribution. The unknown underlying probability distribution is supplanted by an assumed distribution based on the number of observed successes and failures out of a fixed number of trials. As I’ve tried to point out, and you have failed to grasp, the observation part of that process is, in this case, far from certain, making the sign test unreliable at best. While you may choose to view this, by your description, in a “simplistic” way, the underlying variability will still be present whether you care to acknowledge it or not. Failing to acknowledge it, however, can and will eventually lead to erroneous conclusions. Variability is of every consequence.
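[To make the dependence objection concrete, here is a small simulation sketch. All numbers are hypothetical: a shared standard-normal factor stands in for “the same set of rats”, with comparison-specific noise of s.d. 0.5. Each sign is still 50/50 on its own, yet seemingly ‘significant’ sign-test outcomes appear far more often than the nominal two-sided rate of about 0.013 for 12-of-14 agreement:]

```python
import random

random.seed(42)

def one_sign(z, noise_sd=0.5):
    # One comparison's sign: positive if the shared factor plus
    # comparison-specific noise lands above zero
    return (z + random.gauss(0, noise_sd)) > 0

def apparent_significance_rate(n=14, trials=20000):
    """Fraction of trials in which >= 12 of the n comparison signs agree,
    when all comparisons share one random factor (e.g. the same rats)
    and there is NO treatment effect. Marginally each sign is 50/50."""
    hits = 0
    for _ in range(trials):
        z = random.gauss(0, 1)          # shared factor, hypothetical sd = 1
        pos = sum(one_sign(z) for _ in range(n))
        if pos >= 12 or pos <= n - 12:  # would look 'significant' above
            hits += 1
    return hits / trials

print(apparent_significance_rate())  # far above the nominal 0.013
```

The point of the sketch is only that dependence among the comparisons inflates the sign test’s false-positive rate, exactly the concern raised about pooling pathologies measured on the same rats.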

“Measurement error doesn’t come into it, that’s clutching at straws when the significance level is so strong. I agree it’s a low power test, but you nevertheless reach statistical significance, so you cannot just dismiss the results. It can also reveal certain technical dismissals as more sticking in the mud than real …”

I’m afraid you have the clutching at straws argument backwards. You tell me that you admit the power of the test sucks, but then turn around and claim some wondrous statistical significance. This tells me you haven’t understood any of this 🙁 If you really believe that measurement error is not important and that the technical details are just nit picking and not real, then all I can say is that I hope the hell you are never involved with any process that my safety and well being depend on.

“You concede that combining across the genders bypasses the independence hurdle. Your new objection is ‘because of the confounding effect of gender. You are not just measuring treatment anymore.’

I think this criticism is rather hopeful. You have given no reason why rats should get various pathologies due to ‘gender’. ”

Have you followed any of these arguments at all? The females of this breed are well documented to have much higher rates of tumors than males. If the rates of tumor occurrence are gender specific, then gender is confounded with the treatment effects. Even Seralini and his coauthors apparently realize this and separate the genders for this reason.

“ ‘I wouldn’t say surely at all, given the data variability. Given Seralini’s past tendency to gloat over significant results, it is very telling to me that they, in fact, did not analyze this data. Being “obvious” is a judgmental decision. I could just as easily point out the number of studies finding no effects from GMO to say “it’s obvious there is no GMO effect”. Care to run your sign test on positive versus negative studies?’

Personalizing, then changing the subject – not very statistical behaviour! Seventeen numbers out of eighteen, and then twelve numbers (with different meanings) out of fourteen surely IS a pattern. As noted, the explanation of ‘variability’ is a red herring (that’s the point of the sign test).”

Nope, that’s not the point of the sign test or any other non-parametric procedure. The point is to reformulate the problem in such a manner that the unknown underlying distribution can be replaced by an assumed one. Unfortunately, in doing so, one loses much information and statistical power.
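[A toy illustration of the information loss being described here, with made-up numbers: two sets of paired differences whose magnitudes differ enormously still produce exactly the same sign-test statistic, because the test keeps only the signs.]

```python
# Hypothetical paired differences (invented for illustration): same signs,
# wildly different magnitudes.
small = [0.01, 0.02, -0.01, 0.03, 0.02, 0.01]
large = [5.0, 7.0, -0.5, 9.0, 6.0, 8.0]

def sign_stat(diffs):
    # The sign test retains only the count of positive differences;
    # all magnitude information is discarded.
    return sum(d > 0 for d in diffs)

# Both data sets yield the identical statistic: 5 positives out of 6.
assert sign_stat(small) == sign_stat(large) == 5
```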

If you can’t understand the basic premise of variability and why it is important, I’m not sure what else I can say to you on that topic. Oh, and excuse me for interjecting a personal opinion, although an accurate one. To me the source of data is an important component when evaluating it.

“Your parting quote is interesting:

‘Left to our own devices, we are all too good at picking out non-existent patterns that happen to suit our purposes (1993)’.

This does not apply here, because the pattern is clearly not ‘non-existent’ – the GMO pathologies at least SEEM overwhelmingly higher, or nobody would be discussing it! The actual question here is whether statistical significance is achieved (seems that it is though). In saying ‘picking out non-existent patterns’, you are kind of misrepresenting yourself – as you have spent a deal of time trying to show the ‘pattern’ wasn’t statistically significant.”

You clearly do not understand this quote nor its application to your position that the pattern is “obvious”.

“Your points are becoming less statistical and more rhetorical. Currently we appear to have a few statistically significant results awaiting proper explanation, though,”

What “statistically significant” results would those be, Walter? Seralini et al. have provided NONE, and yours carry no reliable information.

Hi again PD

On variability etc. in the sign test (and the universe….). You are welcome to attribute the differences to some kind of observational or measurement error. I don’t object to that – that last chance saloon option is always available, and I happily wish you good luck with arguing it.

What you are not entitled to do is claim the test does not produce a valid test statistic. Unless the study was a complete fiddle, the numbers that were measured and recorded DID produce the test statistic that results. The GMO-fed numbers of pathologies recorded DO overwhelmingly exceed those for the controls – a fact. The test statistic is what you are trying to explain away – but ‘invalidity’ is not itself a valid explanation. BTW I do not ‘refuse to acknowledge’ the underlying variability, but pointed out quite clearly that it is not a consideration that affects the validity. With all the variability in the world, under the null hypothesis (that there is no treatment difference) you still wouldn’t get p<0.05 more than 5% of the time. Your explanation is a reason why the result MAY not be as reliable as an experiment which had less variability. Though, usually people only enter the last chance saloon when they have nowhere else to go.
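[The 5% claim is easy to check with a quick simulation — a sketch of my own using n = 18 fair-coin comparisons, not the study’s data. Because the binomial is discrete, the actual size of the one-sided sign test here is in fact a bit below the nominal 0.05.]

```python
import random
from math import comb

n = 18
k_crit = 13  # smallest k with P(X >= k | X ~ Binomial(18, 0.5)) <= 0.05
exact_size = sum(comb(n, k) for k in range(k_crit, n + 1)) / 2 ** n  # ~0.048

# Simulate many experiments under the null: 18 fair-coin comparisons each,
# rejecting whenever 13 or more go the same way.
random.seed(0)
trials = 100_000
rejections = sum(
    sum(random.random() < 0.5 for _ in range(n)) >= k_crit
    for _ in range(trials)
)
# The simulated rejection rate hovers near exact_size, i.e. below 5%.
```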

More generally, you could junk any piece of research, if you are willing to endlessly splice until groups are too small, refuse to collate anything and find ‘technical’ objections to using tools that others use every day.

“I’m afraid you have the clutching at straws argument backwards. You tell me that you admit the power of the test sucks, but then turn around and claim some wondrous statistical significance. This tells me you haven’t understood any of this”

This tells me you are talking to the gallery. You know perfectly well that a) the low power is only a barrier to achieving significance, not to interpreting it, so a low power test will still produce significance if the results are strong enough, and b) if a significant result is achieved despite the low power, then the low power cannot be used as a reason to dismiss it. So you are making a criticism that you know not to be valid 🙁

“If you really believe that measurement error is not important and that the technical details are just nit picking and not real, then all I can say is that I hope the hell you are never involved with any process that my safety and well being depend on.”

You continue with your mixture of wordsmithery and technical feinting but your rhetorical posturing isn’t backed up by the substance of our discussion. I didn’t say technical details are not ‘real’ but that your technical objection is misplaced or you over-rely on it. Perhaps it should be placed along with the other obstacles you have placed in the way of accepting that seventeen higher numbers out of eighteen actually represents a pattern.

Your ‘safety’ swipe, that’s a joke surely? You appear to be the one hellbent on denying the obvious regarding a safety study. With you finding endless ‘technical’ objections that defy commonsense the plane will never be ‘technically’ at risk of crashing. You’ve got it backwards anyway (a sign error :)). If you want to make a criticism that may be valid in the context of our discussion, then the danger for you if I am involved is that your plane will never be allowed to leave the ground!

Me: “You have given no reason why rats should get various pathologies due to ‘gender’. ”

You: “Have you followed any of these arguments at all? The females of this breed are well documented to have much higher rates of tumors than males. If the rates of tumor occurrence are gender specific, then gender is confounded with the treatment effects. Even Seralini and his coauthors apparently realize this and separate the genders for this reason.”

Perhaps I wasn’t clear. Why should the GMO-fed rats get MORE gender-specific pathologies due to gender? Yes, they separated the rats because they were investigating probable gender-related pathologies. That doesn’t necessarily mean gender would be confounding, surely. I don’t think they state this. The separation prevents mis-categorizing, e.g. counting the absence of female pathologies in male rats. But counting the pathologies would still be valid (in each test each rat would be included in only one pathology, so there would be independence), so the conclusion (if this is what was investigated) would still be that the GMO-fed rats had more pathologies, and it wouldn’t be confounded by gender.

It shouldn’t be all that surprising that seventeen differences out of eighteen might somehow be significant – should it?

“If you can’t understand the basic premise of variability and why it is important, I’m not sure what else I can say to you on that topic.”

Is it that time already? I hope you are tapping your nose knowledgeably 🙂 Seriously, if the technical arguments are appearing to run aground you need to revamp them, not retreat into a variability mysticism that you have not yet managed to substantiate.

Thanks Pdiff I don’t suppose we will be going that much further, but I’ll look out for your reply.

Cheers

@walter:

I’m going to respond to 2 points, as that is all I have time for right now. I’ve been out of town for a few days and have not been able to respond in “real-time.” I will be out of town again for a while, so I may not be able to respond again for a few days.

You said: “You don’t seriously suggest that ‘random variation’ can not explain that ONE difference of one, without simultaneously junking the other SEVENTEEN comparisons?”

If you think that ONE treatment could be explained by random variation, how do you choose which one? If ONE treatment could be due to random variability, why shouldn’t we expect the CONTROL could be one of those? If you are willing to concede that ONE treatment had a very low number of tumors/pathologies/deaths/etc. by chance, then you must be willing to concede that multiple treatments could have occurred this way. That was the point of my post: to illustrate that there is a rather large possibility that these results could have been observed by chance. Even if the results “look” different to you, if they are within the normal variation for the population, it is absolutely possible that they could have happened by chance.
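[A small simulation makes this concrete. The background tumor rate below is invented, not taken from the study; the point is only that small, identically treated groups of a tumor-prone line can drift well apart by chance, so any one group — including the control — can land at an extreme.]

```python
import random

random.seed(1)
p_tumor = 0.7             # assumed background tumor rate for a tumor-prone line
n_rats, n_groups = 10, 9  # small groups of ten, as in the study design

# Every group gets IDENTICAL treatment: each rat develops a tumor with
# probability p_tumor, independently.
counts = [sum(random.random() < p_tumor for _ in range(n_rats))
          for _ in range(n_groups)]
# `counts` typically spans a noticeable range even though no group differs
# from any other in anything but chance.
```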

You said: “To be fair to Emily Willingham, she is more concerned with the Roundup issue and clearly knows her stuff on the mysterious hormone effects.”

Read her piece again. The reason she is more concerned with the Roundup issue is because (in her own words): “The data are messy, but in general, GM vs GM+Roundup seemed to yield quite similar or conflicting results, and control values sometimes overlap or even exceed values from the higher and highest treatment doses.” After an extremely thorough examination of the Seralini et al. paper, she concludes “The one thing that doesn’t leap out here as being involved, among a sea of likely possibilities, is the GM corn itself.” THAT is why she is more concerned with the Roundup issue, as you say. She went through the article with a fine-toothed comb, and concluded the corn data made no sense.

Hi Andrew.

You appear to be confusing treatments with individual outcomes. I did not say that “that ONE treatment could be explained by random variation”. One outcome. Just because there’s a statistical effect, you cannot expect every instance of one treatment to beat every instance of the controls. That would be called a clean sweep.

On your second point, it is clear most of Emily W’s energy was devoted to the Roundup analysis, which is probably pretty good, I’d say.

“She went through the article with a fine toothed comb, and concluded the corn data made no sense.”

Not the GMO-fed comparison she didn’t. So I don’t see how that refutes anything I’ve said previously.

Anyway I took what she said about that on board in one of my answers to Pdiff. Indeed her suggestion of ratios of pathologies to affected rats seemed to produce other candidate significant results.

Cheers