This Month’s Expert: Dr. John Rush on Interpreting the STAR*D Trial

TCPR: Dr. Rush, thanks for agreeing to help us understand the STAR*D trial. You were the overall principal investigator of what I believe was the largest study in history comparing different antidepressants. Looking at the first set of results, are you encouraged by the findings?

Dr. Rush: Very much so.

TCPR: The first paper on Celexa (citalopram) monotherapy reported a 30% remission rate in patients taking an average of about 40 mg/day for 10 weeks, and a response rate of 47%.

Dr. Rush: Actually, the remission rate by self-report was 33%, which is probably a little more accurate.

TCPR: More accurate than the Hamilton Depression Scale remission data?

Dr. Rush: Right, and the reason is that we got self-report data from the QIDS (Quick Inventory of Depressive Symptomatology) on every patient, whereas we have incomplete data from the Ham-D. Any patient who left the study without a final Ham-D score was assumed to have not remitted. This led to the more conservative estimate of 30%, but if you look at the results of the QIDS, you get a 33% remission rate, which is a little more accurate.
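The arithmetic behind that difference can be sketched in a few lines. This is only an illustration: the ten outcomes below are made-up values, not STAR*D data, and the function name `remission_rate` and its `missing_as_failure` flag are hypothetical. It shows how counting patients with no final score as non-remitters pulls the estimate down compared with using only the scores that were actually observed.

```python
def remission_rate(outcomes, missing_as_failure=True):
    """Return the fraction of patients counted as remitted.

    outcomes: list of True (remitted), False (not remitted),
              or None (no final score available).
    missing_as_failure: if True, patients with no final score stay in
              the denominator and count as non-remitters (the
              conservative rule described in the interview).
              If False, they are dropped from the denominator.
    """
    if missing_as_failure:
        denominator = len(outcomes)
    else:
        denominator = sum(1 for o in outcomes if o is not None)
    remitted = sum(1 for o in outcomes if o is True)
    return remitted / denominator


# Hypothetical sample (NOT trial data): 3 remitters, 5 non-remitters,
# 2 patients who left without a final score.
outcomes = [True] * 3 + [False] * 5 + [None] * 2

conservative = remission_rate(outcomes, missing_as_failure=True)
observed_only = remission_rate(outcomes, missing_as_failure=False)
print(f"missing counted as non-remitters: {conservative:.1%}")
print(f"observed scores only:             {observed_only:.1%}")
```

With a measure that is collected on every patient, as Dr. Rush says the QIDS was, there are no missing values and the two calculations give the same number.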

TCPR: All right, so that’s a little bit more encouraging than 30%. Either way, clinicians are divided on whether this is good or bad news about the efficacy of citalopram and SSRIs in general. What’s your take?

This article originally appeared in The Carlat Psychiatry Report -- an unbiased monthly covering all things psychiatry.

Dr. Rush: Actually, we thought these numbers were really, really good because they do approximate what you get in those eight-week efficacy trials with less complicated, less chronically ill patients. The eight-week research trials routinely exclude people who have been in a depressive episode for more than two years, and they often exclude people with significant comorbidity. And, of course, these trials are not aiming at remission; they are aiming at some signal of efficacy that is sufficient to exceed placebo. And in spite of all this, our numbers are surprisingly close to what you get in those trials. We got around a 48-50% response rate and a 30-33% remission rate.

TCPR: The STAR*D trial was open label, meaning that both patients and investigators knew what everyone was taking.

Dr. Rush: Yes, everyone knew the treatment except the interviewers who gathered the outcome data using the Hamilton Depression Scale. The assessment was done by telephone with an interviewer who was not at the research site and was blind to the treatments. If the patient inadvertently said, “My Wellbutrin is helping me,” then we switched the outcome assessor. So we did a lot to protect the blinding of the assessment.

TCPR: But isn’t it true that in open-label trials, response rates tend to be higher than in double-blind trials? So when you say that this 30-33% figure is impressive compared to a typical double-blind and placebo-controlled trial, is that a fair comparison? Shouldn’t you be comparing that with numbers that one would get in other open-label trials?

Dr. Rush: Yes, I suppose you could make that argument. The big difference, though, is that our population was more impaired than the population you typically see in double-blind trials. But obviously, 33 percent is not 50 or 75 percent, which is what we would have hoped for.

TCPR: Once the citalopram portion of STAR*D was complete, you had approximately 70 percent of the original patients who did not achieve remission, and that remaining group of patients was randomized to various other treatments, one of which was the augmentation trial. I believe that 565 patients were randomized to augmentation with either Wellbutrin SR (sustained-release bupropion) or BuSpar (buspirone).

Dr. Rush: That’s right. And both treatments yielded about 30% remission rates.

TCPR: This is another result that has discouraged some psychiatrists. BuSpar augmentation is a treatment strategy that showed promise in some open trials in the early 1990s, but later controlled trials showed no apparent benefit. So now we have results showing that Wellbutrin augmentation is no better than BuSpar augmentation, which, in turn, is no more effective than placebo. What do we make of this?

Dr. Rush: Those controlled BuSpar augmentation trials that you refer to were designed in a way that might have prevented them from detecting a genuine effect from augmentation. The reason is that the run-in period on the primary drug, before patients were randomized to placebo or BuSpar, was rather brief – only four weeks. And what we do know, from our own Level 1 findings plus lots of other literature, is that there are people who don’t do well in the first four weeks of treatment who do well in the second four weeks. So people who got an antidepressant plus placebo in those earlier augmentation trials may have had enough additional improvement, just because of the passage of time, to make it impossible to detect the real difference between placebo and BuSpar.

TCPR: But if there had been a placebo arm of the augmentation trial, isn’t it possible that placebo plus citalopram would have yielded a 30% remission rate as well?

Dr. Rush: I don’t think so. Remember that these patients had already had treatment with Celexa for 11 weeks before they received augmentation. We believe that the nonspecific aspects of treatment, if they were powerful enough to create remission, would have done so by 11 weeks.

TCPR: So you take a more optimistic view, which is that this data shows not that Wellbutrin performs poorly, but rather that BuSpar actually performs better than clinicians have generally thought.

Dr. Rush: That is at least our view, but, of course, with no placebo in this trial you can’t know for sure.

TCPR: Here’s something I found confusing: in most clinical trials, remission rates are lower than response rates, because remission criteria are more stringent than response criteria, and people “remit” only after they’ve “responded” (for example, if a drug shows a 40% response and 20% remission, it means that half of the responders went on to remission). However, in the STAR*D augmentation results, it’s the reverse – for example, the remission rate for BuSpar was 30%, but the response rate was lower, at only 27%. Please educate us a bit on how that happened.

Dr. Rush: It is confusing, so it is worthwhile to talk it through. In order to get into Level 2 and subsequent levels, you had to either fail to achieve remission or you had to be intolerant of the drug. In order to make sure patients really got a good shot at remitting at Level 1, we vigorously ramped up the dose of citalopram, so that the patients who went into the augmentation arm were taking an average of 55 mg of Celexa a day. That high dose led a lot of patients to come pretty close to remission; for example, many reached a QIDS score of 7, 8, or 9, where a score of 5 or lower constitutes remission.

TCPR: I think I see where you’re heading. The definition of “response” is a 50% improvement, so a patient with a score of 8 going into the augmentation arm might hit a score of 5 before a score of 4, and thus be defined as achieving “remission” without passing through “response.”

Dr. Rush: Exactly. It is a quirk of the design. And that is why I don’t think the response numbers mean much. As a clinician, your primary outcome is remission; that is what you are shooting for, gauging your treatments by, and measuring your outcome against.
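The quirk discussed above can be checked with a short sketch. It assumes, as in the exchange, that remission means a QIDS score of 5 or lower and that response means at least a 50% drop from the score at entry into the augmentation level. The function names and the entry score of 8 are illustrative choices, not anything taken from the STAR*D protocol.

```python
REMISSION_CUTOFF = 5  # QIDS score at or below this counts as remission (assumed)


def is_remission(score, cutoff=REMISSION_CUTOFF):
    """True if the current score meets the remission threshold."""
    return score <= cutoff


def is_response(entry_score, score):
    """True if the score has dropped by at least 50% from the entry score."""
    return score <= entry_score / 2


# Hypothetical patient who enters augmentation with a QIDS score of 8
# and improves one point at a time.
entry_score = 8
for score in range(entry_score, 3, -1):  # 8, 7, 6, 5, 4
    print(
        f"score {score}: "
        f"remission={is_remission(score)}, "
        f"response={is_response(entry_score, score)}"
    )
```

At a score of 5 this patient counts as remitted but not yet as a responder, because a 50% improvement from 8 requires a score of 4 or lower. With enough patients entering in the 7 to 9 range, the remission rate can therefore exceed the response rate.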

TCPR: However, if it’s true that a number of people were entering Level 2 doing pretty well, with QIDS in the 7, 8, or 9 range, doesn’t that again cast some doubt on the idea that we are seeing anything more than a placebo response to augmentation? Because I think most clinicians would argue, “Well, if you have patients who are almost there and you just keep them on high-dose Celexa for another 10 weeks, chances are pretty good that a lot of them will inch their way into remission, whether or not you augment their treatment.”

Dr. Rush: It’s true that patients can hit remission as a result of just prolonged exposure and that does raise a very fair question about what we would have gotten if we had used a placebo. But there were some serious ethical issues about taking a person who has already been depressed for up to 12 weeks on an active drug and then adding a placebo for another 12 weeks. Many IRBs (institutional review boards) would not have approved such a design.

TCPR: At any rate, I understand that these results are only the tip of the iceberg.

Dr. Rush: Yes, our next wave of analyses will begin in May. These are called “moderator analyses,” in which we will look for data that might allow us to predict which patients will do well on which sequence of treatments. Ultimately, we want to be able to tailor and individualize antidepressant treatment. We will also have some cognitive therapy data and, importantly, some long-term data on patients who were followed for a year. We are excited. I think we are going to have most of our major findings out by the end of the year.

TCPR: Thank you very much, Dr. Rush.


This article was published in print 5/2006 in Volume:Issue 4:5.


APA Reference
John, D. (2013). This Month’s Expert: Dr. John Rush on Interpreting the STAR*D Trial. Psych Central. Retrieved on December 4, 2020, from


Scientifically Reviewed
Last updated: 10 Aug 2013
Last reviewed: By John M. Grohol, Psy.D. on 10 Aug 2013