Introduction to Reading the Medical Literature: RCTs.

Jul 09 2012. Published under Uncategorized

So now I'm going to write about my first day of class, and discuss what we learned about reading the medical literature. Specifically, how to read and understand a paper about a Randomized Clinical Trial (RCT), also called a "randomized controlled trial". But first, I'm going to back up and talk a little bit about how we interpret any paper on a therapeutic intervention. We need to ask ourselves three basic questions. Is the study valid? What are the results? How is the study applicable beyond the study itself?


Validity is the first concern. If a study isn't valid, we don't really need to ask the next two questions. We can break this up into four issues: treatment allocation, blinding, follow-up, and analysis. Allocation is the method for determining who got which treatment: who were the subjects of the study, and how were they compared with the control group (the group which, for purposes of comparison, did not receive the intervention)? We discussed four basic types of studies: randomized clinical trials, cohort studies, case-control studies, and case reports. I'll probably come back to discuss each of those over the course of this week, but for now, let's just say that I've listed them in descending order of quality of evidence. And case reports are basically not evidence of anything; they're useful only for hypothesis generation, not for drawing any conclusions.

The RCT is at the top of the heap. At least, it is when it's done well. Essentially, the sample population is passed through a sieve to include or exclude patients. Then, patients are randomly sorted into two groups, the treatment group, and the control group. Then, the outcomes from the two interventions are considered (figure 1).

Figure 1: RCT

Randomization tends to balance the prognostic factors between the two groups, and prevents bias in deciding which patients belong in which group. We do not, for example, want all the very sick patients in one group or the other. We're not guaranteed that the two groups will be identical. But we are very, very likely to have them be very similar if the sample size is large.
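To make the balancing idea concrete, here is a minimal sketch of simple random allocation. The cohort, the 30% "very sick" rate, and the function name `randomize` are all made up for illustration; real trials often use more elaborate schemes than a plain shuffle.

```python
import random

def randomize(patient_ids, seed=None):
    """Shuffle the patients and split them into two groups of near-equal size."""
    rng = random.Random(seed)
    shuffled = list(patient_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical cohort: 1000 patients, each "very sick" with probability 0.30.
cohort_rng = random.Random(0)
very_sick = {pid: cohort_rng.random() < 0.30 for pid in range(1000)}

treatment, control = randomize(very_sick.keys(), seed=1)

# Compare the share of very sick patients in each group.
frac_treatment = sum(very_sick[p] for p in treatment) / len(treatment)
frac_control = sum(very_sick[p] for p in control) / len(control)
print(f"very sick: treatment {frac_treatment:.1%}, control {frac_control:.1%}")
```

With groups this large, the two percentages should land close together, though nothing forces them to be identical.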

Blinding can be done at four levels, and not every level is possible in every trial. First, randomization is its own blind. The process of selecting which group each subject is in should be an unbiased, unknown process, so that there is no reason to believe that the control group and the experimental group differ in any important way. The second type of blinding is of the subjects. For many trials, it is possible to blind the patients to which group they are in. This is done by giving a placebo pill instead of a pill with medicine in it to the control group, for example. The patient doesn't know if they're getting medicine or not. However, sometimes it is impossible to blind the subjects to which group they are in, especially if the intervention is a complex procedure with a recovery time. Third, the providers can be blinded to the group. Again, this is not always possible. Surgeons generally know if they have performed surgery or not.

Finally, the evaluators can be blinded. If the providers or patients cannot be blinded, it is valuable to have the outcome of the study evaluated by a third party who does not know if the patient they're reviewing has had the tested intervention. If the evaluators are not blinded to study group, it is best to only evaluate hard outcomes, rather than subjective ones. For example, living or dead, rather than "severity of pain".

Follow-up must be examined as well. Many patients are lost to follow-up. We'd like to know if the losses are similar between groups, or not. We'd like to know why they were lost to follow-up. And finally, we'd like to know if those lost to follow-up could have significantly changed the results of the study. So, we'd like to see a calculation that assesses: if ALL of the patients lost to follow-up had the least desirable outcome, would this make me change my conclusions about an intervention? For example, if a heart drug seems to prevent heart attacks, but 5% of my intervention group was lost to follow-up, I'd like to see a calculation determining the results if all of those patients had heart attacks. This is unlikely, yes. But it allows us to bound the effectiveness of the intervention given the missing patients.
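Here's a rough sketch of that worst-case calculation. All of the counts are invented for illustration, and I'm assuming the harshest comparison: every lost patient in the intervention group had a heart attack, and none of the lost patients in the control group did.

```python
def event_rate(events, followed, lost, assume_lost_had_event):
    """Event rate over everyone randomized, under an assumption about lost patients."""
    randomized = followed + lost
    assumed_events = events + (lost if assume_lost_had_event else 0)
    return assumed_events / randomized

# Hypothetical trial: 1000 patients randomized per group, 5% lost in each.
observed_tx = 40 / 950    # heart attacks among intervention patients we could follow
observed_ctl = 80 / 950   # heart attacks among control patients we could follow

# Worst case for the drug: every lost intervention patient had a heart attack,
# and no lost control patient did.
worst_tx = event_rate(40, 950, 50, assume_lost_had_event=True)
best_ctl = event_rate(80, 950, 50, assume_lost_had_event=False)

print(f"observed:   intervention {observed_tx:.1%}, control {observed_ctl:.1%}")
print(f"worst case: intervention {worst_tx:.1%}, control {best_ctl:.1%}")
```

In this made-up example the drug looks good among the patients we followed, but under the worst-case assumption the intervention group does slightly worse than control, so a 5% loss would be enough to put the conclusion in doubt.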

Finally, the analysis. What about patients in the intervention group who don't adhere to the study? Are they evaluated in the intervention group, or the control group? Neither? What if some patients cross over between groups during the course of the study? Do the study authors use different analyses for determining benefits vs. risks? And what about sub-group analyses? Measuring the effects of intervention on sub-groups should be decided a priori, and not post-hoc based on observing an event cluster in a particular sub-group.

So, that's a list of things we need to think about when determining if a study has validity.


To determine the results of the study, we want to look at three basic aspects: statistical significance, precision, and power. Statistical significance is based on hypothesis testing using statistical methods, which I'm not going to discuss. But essentially, they tell us, when looking at these two groups of patients: if the intervention really made no difference, what is the probability that we'd see a difference in outcomes at least this large by chance alone? The industry standard is 5%. If that probability is less than 5% (we write: "p<0.05"), then we reject the "null hypothesis" (that the two groups are the same), and accept the alternative hypothesis that the difference between the two groups is due to the intervention we instituted.
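We didn't go through any test in class, so take this as one possible sketch rather than what a given paper will use. It runs a two-proportion z-test (a normal approximation that is reasonable for large groups) on invented counts. The resulting p-value is the probability of seeing a difference at least this large if the two groups truly had the same event rate.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(events_a, n_a, events_b, n_b):
    """Two-sided p-value for a difference in event rates (normal approximation)."""
    rate_a, rate_b = events_a / n_a, events_b / n_b
    # Under the null hypothesis both groups share one pooled event rate.
    pooled = (events_a + events_b) / (n_a + n_b)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_a - rate_b) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical trial: 40 of 1000 treated patients had the event, 80 of 1000 controls.
p_value = two_proportion_p_value(40, 1000, 80, 1000)
print(f"p = {p_value:.4f}; significant at 5%: {p_value < 0.05}")
```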

It's worth noting at this point that not all RCTs are placebo controlled. Many are controlled with "usual care". We can't, for example, test a new blood sugar controlling drug against nothing at all. The patients in the control group would still be allowed to use insulin, or their other blood sugar medication. It's unethical to withhold treatment just because the patients are in a study. So the usual care is the control, and the alternative hypothesis is that the intervention is superior to usual care (often abbreviated UC).

We also use 95% confidence intervals to judge statistical significance, and their width tells us how precise the estimate is. When computing an odds ratio, for example, which allows us to say that the odds of an event are x% greater given an intervention, we calculate the statistical bounds on that value. The strict definition is that if we repeated the study many times and computed an interval each time, 95% of those intervals would contain the "real" value. But the practical interpretation (because we can do only one RCT, generally, and certainly not many of them) is that given the estimate we have from the RCT on the event rate, we can be roughly 95% confident that the "real" value is within the CI. Then, if the confidence interval includes the value for "no increased likelihood" (so, if the odds ratio is 1.3, for a 30% increase, and the 95% CI includes 1.0, for 0% increase), we say that the result is not statistically significant. If the CI does not include this value, then we say it is a significant finding.
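As a sketch of how such an interval might be computed, here is the common log-based approximation for an odds ratio from a 2x2 table. The counts are invented, and the function name `odds_ratio_ci` is mine; a real paper may use a different method.

```python
from math import exp, log, sqrt
from statistics import NormalDist

def odds_ratio_ci(events_tx, no_events_tx, events_ctl, no_events_ctl, level=0.95):
    """Odds ratio with an approximate confidence interval computed on the log scale."""
    odds_ratio = (events_tx * no_events_ctl) / (no_events_tx * events_ctl)
    se_log_or = sqrt(1 / events_tx + 1 / no_events_tx
                     + 1 / events_ctl + 1 / no_events_ctl)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Hypothetical trial: 65 of 500 treated patients had the event, 50 of 500 controls.
odds_ratio, lower, upper = odds_ratio_ci(65, 435, 50, 450)
significant = not (lower <= 1.0 <= upper)
print(f"OR = {odds_ratio:.2f}, 95% CI ({lower:.2f}, {upper:.2f}), significant: {significant}")
```

With these made-up numbers the odds ratio comes out a bit above 1.3, but the interval straddles 1.0, so we would call the result not statistically significant.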

Power was not discussed in class, and I freely admit to not understanding it terribly well. Essentially, well-powered studies have enough patients in each group to detect a real effect if there is one, so we can trust the results. I'll add a clarification if we go into greater detail. I'm also sure someone like Mark CC could answer it in the comments if we bug him enough.


Applicability was described as having four basic elements: intervention, patients, outcomes, and environmental factors. We'd like to know if the paper, the results of this study, are applicable in our own circumstances. Does this study, this intervention, apply to me? Or, if I were a physician, to my patients? We ask if the intervention is reproducible. Do we have similar products, similar exposure, and do we have similar skills required to implement the intervention? This may be especially important if the intervention is a surgery requiring equipment and training.

We ask if we, or our patients, would meet the study inclusion criteria. Suppose that blood sugar drug was not tested on diabetics with renal failure. Is there a reason? Is it safe to use on patients with renal failure? This will matter to me if I'm a nephrologist with a lot of diabetic patients. The whole population in the real world is different from the sample population, which is different from the study population. The only group we know for sure the study applies to is the study population.

We ask if the outcomes are clinically relevant. Did we use primary or surrogate outcomes? Primary outcomes for a diabetic might be vision loss, or renal failure. The surrogate outcome often used is hemoglobin A1c, which is a measure of average blood glucose. We try to control the surrogate outcome because we know that it is highly associated with our primary outcomes. But this can fail, because a drug which is beneficial to the surrogate could be harmful to the primary through some other mechanism. We've seen this with drugs recalled from the market which may help control cholesterol, but actually increase cardiac mortality.

We ask if the benefits and risks were properly assessed. A drug which decreases risk of stroke might increase risk of fatal GI bleeds. We need to determine which of these effects dominates if we are going to recommend adopting the use of this drug. Maybe it decreases the risk of stroke by 10%, but increases the risk of GI bleed by 20%. Even then, that might be quite acceptable if the baseline risk of GI bleed is very low. We might be saving 10 strokes for every additional GI bleed.
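The arithmetic behind that last claim can be sketched as follows. The 10% and 20% figures are the relative changes from the example above; the baseline risks (10% for stroke, 0.5% for GI bleed) are numbers I picked so the trade-off is easy to see, not data from any study.

```python
def change_per_1000(baseline_risk, relative_change):
    """Change in events per 1000 patients treated; negative means events prevented."""
    return 1000 * baseline_risk * relative_change

# Assumed baseline risks (hypothetical): stroke 10%, GI bleed 0.5%.
stroke_change = change_per_1000(0.10, -0.10)  # 10% relative decrease in strokes
bleed_change = change_per_1000(0.005, 0.20)   # 20% relative increase in GI bleeds

strokes_prevented = -stroke_change
extra_bleeds = bleed_change
print(f"per 1000 treated: {strokes_prevented:.1f} strokes prevented, "
      f"{extra_bleeds:.1f} extra GI bleeds")
print(f"strokes prevented per extra bleed: {strokes_prevented / extra_bleeds:.1f}")
```

The point is that a relative change only means something once it's multiplied by the baseline risk: a 20% increase in a rare event can be far fewer patients than a 10% decrease in a common one.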

Finally, we look at the environmental and other factors: what are the patients' and providers' expectations? These will enhance or temper the placebo effect. Are there behaviors relating to adherence and persistence? An intervention which is highly effective at full dosage, but which dramatically loses effectiveness with a single missed pill might not be worth the expense.


So, I hope the above has given a little primer on how to read and interpret an RCT. I was going to include my own analysis of a paper now, but GASCKHK. I'm at 1720 words already, and I have more work to do tonight. So, here's your homework: go find a paper. Read it, and report back to me on your findings. Is it valid? What are the results? And how applicable are the results to you or your patient population? These are the questions, apparently, that physicians and epidemiologists ask when reading the medical lit.

Tomorrow? No idea! I'm having a fantabulous time. Oh look: here's Fonzie, my flying helper-lemur, with a cup of hot tea and a biscuit.
