This Month’s Expert: Mark Zimmerman, M.D. On Generalizability in Research

TCR: Dr. Zimmerman, you and your colleagues have published prolifically on whether the research studies we read are truly relevant to our clinical practices. What’s the bottom line of your research?

Dr. Zimmerman: The bottom line is that the overwhelming majority of individuals who are seen in routine clinical practice with a diagnosis of major depressive disorder would not qualify for a placebo-controlled antidepressant efficacy trial.

TCR: Why not?

Dr. Zimmerman: The two criteria that probably have the greatest influence on exclusion and thus generalizability are the symptom severity inclusion criterion and the diagnostic co-morbidity exclusion criterion.

TCR: What sort of symptom severity is required for most studies?

Dr. Zimmerman: Almost every study requires a minimum score on the Hamilton Depression Scale, and so by design they apply only to a subset of depressed individuals, yet none of the drugs are marketed in this way.

TCR: So what percentage of patients are excluded?

This article originally appeared in The Carlat Psychiatry Report -- an unbiased monthly covering all things psychiatry.

Dr. Zimmerman: We reviewed 39 antidepressant efficacy studies, and our first finding was that there were no fewer than 11 different permutations of Hamilton Scale length and cutoff scores to qualify for studies, so there doesn’t seem to be much consensus in the field. Depending upon the cutoff score, we found that 40 to 45 percent of the patients that we see with major depressive disorder wouldn’t qualify for an antidepressant efficacy trial. And the reason why this might be important is that there are some data to suggest that in mild major depressive disorder you may not find differences between active drug and placebo.

TCR: Which may be one of the reasons these severity cutoffs are used in the first place!

Dr. Zimmerman: Exactly. After all, the primary purpose of these efficacy trials is to both establish the safety of the medications and to establish that they work. But with the ubiquity of that Hamilton cut-off, we really have only established that they work for that 60 percent or so of depressed patients that are moderately to severely depressed.

TCR: What other exclusion criteria are typically used in these studies?

Dr. Zimmerman: Co-morbidity is the other major exclusion. We published a paper in the Journal of Clinical Psychiatry that found that in our community-based practice, more than two-thirds of the depressed patients that we treat have a comorbid condition that would have excluded them from many clinical trials. Some studies exclude individuals with any concurrent Axis I disorders, while other studies will exclude particular disorders, such as any anxiety disorder or perhaps just panic disorder, obsessive-compulsive disorder, or an eating disorder. There doesn’t seem to be any rhyme or reason as to which comorbid conditions are excluded, and in fact Michael Posternak and I published a review article in which we found that there is essentially no empirical support for any of the common comorbidity exclusion criteria.

TCR: So the standard practice of excluding co-morbidities doesn’t really stand up to empirical scrutiny.

Dr. Zimmerman: Right. And the lack of relevant comorbidity data is a problem for practicing clinicians. For example, I don’t know of any published placebo-controlled studies of the efficacy of antidepressants for individuals with major depressive disorder and borderline personality disorder. Certainly, we use the medications for these patients. They may work; they may not; we don’t know.

TCR: Moving on to another issue related to research design, what is your take on the controversy about whether there is actually any significant difference between active drug and placebo in most clinical trials?

Dr. Zimmerman: I think that issue is a bit of a red herring. Usually the focus is on the fact that there is only about a 2 to 4 point difference on the Hamilton Scale between active drug and placebo.

TCR: Well, that doesn’t seem like a very robust difference, does it?

Dr. Zimmerman: True, but if you look at response rates, especially the clinician’s global impression of response, there is a considerable difference between drug and placebo in most studies. In fact, a 2 to 4 point difference on the Hamilton may well translate into a 20 to 30 percent difference in overall response rate. But don’t get me wrong, I think the pharmaceutical industry is wonderful at figuring out how to overanalyze their data and put their best foot forward. And one of the things that really needs to be done is to bring some sort of standardization to the process of how research data are analyzed.

TCR: Why is that important?

Dr. Zimmerman: Because there are too many different cut-off points used in data analysis. While there seems to be some consensus that 50 percent or more improvement defines antidepressant “response,” what actually happens is that if a researcher doesn’t find a significant difference on that variable, other analyses are done until the researcher gets a positive finding. The risk is that you have conducted so many analyses that you may find a difference to be statistically significant when, in fact, there is no true difference. When you do 10 tests, each of which has a 1 out of 20 chance of showing a statistical difference (assuming you use the 0.05 level of significance), you are increasing the odds that you will find some difference due purely to chance. So, for example, let’s say you are analyzing the percentage of people who remitted with treatment. You can say, “Let me define remission as a cutoff score less than 6, then less than 7, less than 8, less than 9, less than 10, and I’ll just look at every possibility until I find one that leads to a significant difference.” When you write it up in the published paper, you then report the one significant result. A reader has no way of knowing that the statisticians did 20 different statistical tests in order to find the one significant result.
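To put rough numbers on the multiple-testing problem described above, the sketch below computes the chance of at least one spuriously significant result when several tests are run at the 0.05 level and no real drug-placebo difference exists. It assumes the tests are independent, which overlapping remission cutoffs are not, so the figures are an upper-end illustration rather than an exact rate. The function name is ours and does not come from the interview.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across n_tests independent
    tests, each run at significance level alpha, when no true effect exists."""
    # Probability that every single test stays non-significant is (1 - alpha)^n,
    # so the chance that at least one comes up "significant" is the complement.
    return 1.0 - (1.0 - alpha) ** n_tests


if __name__ == "__main__":
    # 1 test is the nominal case; 10 and 20 are the counts mentioned in the interview.
    for n in (1, 10, 20):
        print(f"{n:>2} tests: {familywise_error_rate(n):.1%}")
```

Under the independence assumption, this works out to about 40 percent for 10 tests and about 64 percent for 20 tests, compared with the nominal 5 percent for a single test.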

TCR: Well, how can we be more savvy when we look at some of these studies?

Dr. Zimmerman: The only way you can be more savvy is if you notice that a particular result is reported in an unusual or aberrant way, but in order for this to be obvious there has to be some sort of standardization in terms of how results are reported.

TCR: Right, because it is pretty common for results to be reported in terms of not only a primary outcome variable, but also in terms of many secondary variables as well, and if the primary outcome doesn’t show a significant difference, one of the secondary variables will be highlighted instead.

Dr. Zimmerman: And that is one of the motivations for reporting a lot of measures in the results sections of studies. For example, many papers will report both the Hamilton and the MADRS depression scales. Those scales are highly correlated with one another, so why bother to collect them both, other than to afford yourself the opportunity of doing more analyses?

TCR: So perhaps we should be a little suspicious when we see studies that report the results of a lot of different rating instruments. How common is this in the literature?

Dr. Zimmerman: I have recently reviewed the literature in order to determine what factors clinicians use in choosing an SSRI. I found 28 published head-to-head SSRI comparison studies. Across these 28 studies, there were more than 400 outcome variables assessed. On average, there were about 14 outcome variables per study. In one paper it went as high as 41, whereby they analyzed every item on the Hamilton. And across these more than 400 variables, there were only 11 significant differences, fewer than what you would expect by chance alone.
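The "fewer than chance" remark can be checked with simple arithmetic. The sketch below is illustrative only: it takes 400 as a round stand-in for "more than 400" outcome variables, assumes each comparison was tested at the 0.05 level, and treats the comparisons as independent, which correlated rating scales are not. The function names are ours.

```python
from math import comb


def expected_false_positives(n_tests: int, alpha: float = 0.05) -> float:
    """Number of 'significant' results expected by chance alone."""
    return n_tests * alpha


def prob_at_most(k: int, n_tests: int, alpha: float = 0.05) -> float:
    """Binomial probability of seeing k or fewer chance findings
    among n_tests independent tests with no true differences."""
    return sum(
        comb(n_tests, i) * alpha**i * (1 - alpha) ** (n_tests - i)
        for i in range(k + 1)
    )


if __name__ == "__main__":
    n_variables = 400  # assumption: round figure for "more than 400"
    observed = 11      # significant differences reported in the interview
    print("Expected by chance:", expected_false_positives(n_variables))
    print("P(at most 11 by chance):", prob_at_most(observed, n_variables))
```

With these assumptions, chance alone would be expected to produce about 20 significant differences, so the 11 actually reported fall short of that.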

TCR: It sounds like a statistical fishing expedition!

Dr. Zimmerman: Right. There is essentially no difference between the SSRIs in terms of end point efficacy, but this highlights the length to which it sometimes seems that the investigators, the pharmaceutical firms, I don’t know who you want to ascribe it to, will go in their search for something.

TCR: And, of course, there are enormous financial incentives for them to grab a little more market share.

Dr. Zimmerman: I understand where they are coming from. I don’t blame them. Quite frankly, I blame the journal editors, the ones who should be responsible for policing this and saying, “Hey this is a problem,” and they are the ones who should get together and say, “Okay this is the format that we are going to adopt. If you want to publish in my journal or our journals, then this is how you need to present the results. You can do additional analyses if you want, but you are going to have to present it this way.”


This article originally appeared in The Carlat Psychiatry Report.

This article was published in print 12/2003 in Volume:Issue 1:12.


APA Reference
Zimmerman, M. (2013). This Month’s Expert: Mark Zimmerman, M.D. On Generalizability in Research. Psych Central. Retrieved on October 27, 2020, from


Scientifically Reviewed
Last updated: 2 Apr 2013
Last reviewed: By John M. Grohol, Psy.D. on 2 Apr 2013