TCPR: Dr. Turner, recently you and your colleagues published a paper entitled “Selective Publication of Antidepressant Trials and Its Influence on Apparent Efficacy” (Turner E et al., N Engl J Med 2008;358:252-60). It caused quite a stir in the field. Can you explain your findings?
Dr. Turner: We were interested in determining whether suppression of negative data has been a significant problem in the antidepressant research literature. So we started by obtaining reviews from the Food and Drug Administration (FDA) for all pre-marketing trials of 12 antidepressants approved between 1987 and 2004.
TCPR: Specifically which drugs were included in your analysis?
Dr. Turner: Six were SSRIs: fluoxetine, paroxetine, paroxetine CR, sertraline, citalopram, and escitalopram. The other six were bupropion, mirtazapine, nefazodone, venlafaxine, venlafaxine XR, and duloxetine. The selegiline patch (EMSAM) and desvenlafaxine (Pristiq) were approved too late to be included, and we didn’t look at agents approved before 1987, such as the tricyclics or the MAOIs. There were 12,564 patients involved in these placebo-controlled studies. We then searched the published literature to locate articles in which these trials were reported. What we found is that there has been very selective publication of the data. If you use the FDA’s version of what happened in the clinical trials that brought these 12 antidepressants to market, you find that the efficacy of the drugs is quite a bit less than what the published peer-reviewed literature would lead you to believe.
TCPR: What were the numbers?
Dr. Turner: If you look at the published literature, you would have the impression that nearly 100% of the trials were positive, meaning they nearly always seemed to show statistical superiority to placebo. However, the true percentage of positive trials, according to the FDA, was 50%. The negative trials, the ones in which the drug did not separate from placebo, were either not published or were “spun” and published as if they were positive. After that, we did two meta-analyses, one based on the FDA data and another based on the peer-reviewed literature. The FDA-based effect size was 0.31, while the literature-based effect size was 0.41, a 32% exaggeration, or “boost,” in effect size due to selective publication. That was for all antidepressants combined. If you look at individual drugs, the boost in effect size ranged from 11% to 69%.
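The “boost” is simple relative-change arithmetic on the two pooled effect sizes. The sketch below is a minimal illustration using the rounded values quoted above, not the authors’ analysis code; the function name `effect_size_boost` is hypothetical.

```python
def effect_size_boost(fda_es: float, literature_es: float) -> float:
    """Relative inflation of the literature-based effect size over the FDA-based one."""
    return (literature_es - fda_es) / fda_es

# Pooled effect sizes quoted in the interview (all 12 drugs combined, rounded)
boost = effect_size_boost(fda_es=0.31, literature_es=0.41)
print(f"boost = {boost:.0%}")  # formats the fraction as a whole-number percentage
```

Because the inputs are rounded to two decimals, recomputing from unrounded values could shift the result slightly.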
TCPR: Looking at your paper, some readers, seeing that only half of the studies submitted showed the antidepressant to be effective, might wonder, “Does this mean that these antidepressants basically aren’t effective at all, that it is just a flip of a coin whether a study shows a drug to be effective?” Is that a proper way of looking at it or no?
Dr. Turner: That question comes up a lot. No, that would not be a proper interpretation, because if the drug does not separate from placebo statistically, it might still be numerically better than placebo. In almost every trial we looked at, the drug still had a numerical edge over placebo (there were a few exceptions, trials in which the drug actually did worse than placebo), just not a big enough edge to be statistically significant. Now if you have a lot of trials like that and you combine them through meta-analysis, you get more statistical power, and the overall difference can become significant. And that’s what we found: each of the 12 drugs, overall, was statistically superior to placebo. The other caveat would be that the results of these trials get reported as means, but each mean value is based upon thousands of individuals, some with very good responses and others with lousy responses. So when you think about how an individual patient in your practice might respond, remember the old saying, “Your mileage may vary.”
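The point about statistical power can be made concrete with a fixed-effect, inverse-variance meta-analysis, one standard pooling method (not necessarily the exact model used in the paper). The effect sizes and standard errors below are invented for illustration: each hypothetical trial favors the drug numerically, yet none reaches significance on its own.

```python
import math

def two_sided_p(z: float) -> float:
    # Two-sided p-value for a standard normal test statistic
    return math.erfc(abs(z) / math.sqrt(2))

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance fixed-effect pooling of per-trial effect sizes."""
    weights = [1.0 / se**2 for se in std_errors]
    total_w = sum(weights)
    pooled = sum(w * d for w, d in zip(weights, effects)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)  # pooled SE shrinks as trials are added
    return pooled, pooled_se

# Hypothetical drug-vs-placebo trials: all numerically positive,
# but each z falls below the 1.96 cutoff for p < 0.05
effects = [0.25, 0.20, 0.30, 0.22, 0.28]
std_errors = [0.15, 0.16, 0.17, 0.15, 0.16]

for d, se in zip(effects, std_errors):
    print(f"trial d={d:.2f}  z={d / se:.2f}  p={two_sided_p(d / se):.3f}")

pooled, pooled_se = fixed_effect_pool(effects, std_errors)
z = pooled / pooled_se
print(f"pooled d={pooled:.2f}  z={z:.2f}  p={two_sided_p(z):.4f}")
```

With these made-up numbers, the pooled effect stays close to the individual estimates, but its standard error is much smaller, so the combined result is significant.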
TCPR: Recently, another meta-analysis of antidepressants was published by Cipriani and colleagues. They reported that venlafaxine, sertraline, escitalopram and mirtazapine are the most efficacious, but their big “winner” was sertraline when they combined efficacy, tolerability, and cost. What is your take on this conclusion?
Dr. Turner: I would say it was jumping to a conclusion prematurely. Let’s not get into how you measure and factor in tolerability and cost, because that opens up new cans of worms. But if you just look at their efficacy ranking, their findings are not robust. Robustness means the ability of the data to stand up to different ways of looking at it, and their ranking of antidepressants changes when you analyze the data a different way.
TCPR: What do you mean?
Dr. Turner: First, in our study we used data from the FDA, while they relied in part on data from journal publications, which, as we showed, can be spun. Second, the trials we looked at were pre-marketing, FDA-registered, placebo-controlled trials, while Cipriani’s group included many post-marketing head-to-head trials. Post-marketing trials are generally conducted for marketing purposes by a company in order to show that their drug is better than a competitor’s drug. These kinds of studies are subject to various biases. The sponsor might design the trial in such a way that their drug has an advantage over their competitor’s drug. Furthermore, if it turns out the sponsor’s drug performs worse than the competing drug, there is nothing to prevent them from simply not publishing that trial, so the average doctor relying on medical journals has no way of knowing that that trial was ever done.
TCPR: And the Cipriani paper relied on information from these post-marketing trials?
Dr. Turner: Yes. While they did ask the drug companies for all of their unpublished data, how can we verify that the drug companies really turned over everything that they had done? And second, how can we be sure that what they did turn over wasn’t spun? The data we used, by contrast, were vetted by FDA reviewers with access to the raw data and the original protocols.
TCPR: But in your paper, you looked only at placebo-controlled trials. How can you make any judgments from placebo-controlled trials regarding comparative efficacy of medications?
Dr. Turner: That is a controversial area. When you compare placebo-controlled trials, you are using what is called an indirect comparison: you get an effect size based upon each drug vs. placebo and then you compare the placebo-adjusted effect sizes among the 12 drugs. While one might assume that direct head-to-head comparisons would be a better way to compare drug efficacy, that’s only true if you have access to all the data, and not only the positive data the drug company marketing department wants you to see.
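An indirect comparison through a shared placebo comparator can be sketched as follows. This is a minimal illustration, assuming the two sets of placebo-controlled trials are independent so that their variances add; the drug labels and numbers are hypothetical and do not come from the paper.

```python
import math

def indirect_difference(d_a, se_a, d_b, se_b):
    """Indirect comparison of drug A vs. drug B through a common placebo comparator.

    Assumes d_a and d_b are placebo-adjusted effect sizes from independent
    sets of trials, so the variance of their difference is the sum of variances.
    """
    diff = d_a - d_b
    se_diff = math.sqrt(se_a**2 + se_b**2)
    return diff, se_diff

# Hypothetical placebo-adjusted effect sizes for two unnamed drugs
diff, se_diff = indirect_difference(d_a=0.42, se_a=0.06, d_b=0.30, se_b=0.07)
z = diff / se_diff
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
print(f"A minus B = {diff:.2f} (SE {se_diff:.3f}), z = {z:.2f}, p = {p:.2f}")
```

With these made-up numbers, a 0.12 gap in effect size is not statistically significant (z of about 1.3), because the indirect estimate carries the uncertainty of both placebo comparisons. That is one reason a rank order built from indirect comparisons can be fragile.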
TCPR: Based on your indirect comparisons, how did your ranking compare with the Cipriani ranking?
Dr. Turner: As you can see from the table, in our study paroxetine had the largest effect size, but it doesn’t appear in Cipriani’s top 4. And neither escitalopram nor sertraline was in our top 4, while both were in Cipriani’s top 4.
TCPR: So what does this mean? Which is the most trustworthy ranking?
Dr. Turner: Each method has its limitations. The fact that Paxil had the largest effect size in our dataset does not necessarily mean it is the most effective antidepressant. First, using different methodology, you get a different rank order, so I would suggest not reading too much into it. Second, and perhaps more important, the difference between the Paxil effect size and those of most of the other antidepressants was not statistically significant. Third, even if the differences were significant, the larger effect size could be because the Paxil clinical trials program did a better job than its competitors at separating drug responders from placebo responders.
TCPR: So, the counter-argument to the validity of the rank ordering in your paper might be, “Look, what you found is an artifact of the fact that some drug companies have just been very good at figuring out ways of conducting their studies to differentiate placebo responders from active drug responders.”
Dr. Turner: Yes, that is well put, and it is a valid argument. In fact, we didn’t write our paper in order to come up with an accurate rank ordering. Our purpose was to see whether and how much selective publication exaggerates the apparent efficacy of antidepressants. But the fact that our rank ordering is so different from Cipriani’s means that we can’t really have much confidence in the validity of either ranking. In my opinion, the best methodology would be to use both indirect (placebo) comparison data and also direct (head-to-head) comparison data. But I would be careful to restrict it to unbiased data vetted by the FDA, so that you avoid the influence of selective publication.
TCPR: Well, how would including the placebo-controlled data have changed their head-to-head comparison results?
Dr. Turner: One can only speculate—maybe it would have brought their ranking more into line with ours, because it would have made our methodological approaches more similar. Perhaps in a few years we’ll have consensus on the best methodology and thus arrive at a ranking we can all agree on. I would suggest that the clinician not get too hung up on this issue, especially since even a consensus ranking may or may not apply to the individual patient in the office. What I think we can say is (a) all of the second-generation antidepressants are effective—they all work better than placebo, (b) their advantage over placebo is not as much as it appears to be in the published literature, and (c) if you’ve been disappointed that so many of your patients seem only partially responsive, maybe it’s because the published literature has inflated your expectations.