The fascinating history of surgery: when placebo-controlled trials clash with common beliefs

This blog post takes a closer look at the fascinating history of surgical procedures. In contrast to makers of drugs or medical devices, surgeons do not need to provide evidence that a new procedure works before introducing it into medical practice. This results in a prolonged delay between the first use of a surgical intervention and the scientific experiments that test whether it is effective or not.

It’s this delay that’s so fascinating. It provides insight into how medical research works and why it’s important to control for various biases in clinical trials. Above all, it shows how misleading clinical intuition can be. Therefore, the history of surgery has important implications for patients with poorly understood conditions where research is of poor quality, and treatments are introduced prematurely.

The long list of surgeries that were once part of medical practice but are now abandoned

Most of the information in this blog post comes from the book ‘Surgery: the ultimate placebo’ by Australian professor and orthopedic surgeon Ian Harris.* Harris’s book gives a tantalizing account of how many invasive procedures came into use because they ‘made sense’ or had some preliminary evidence, but were eventually abandoned after testing in a proper scientific experiment.

The list of inadequately studied surgical procedures that were once part of medical practice, only to be abandoned after closer scrutiny, is rather long. It includes prefrontal lobotomy, routine tonsillectomy, gastric freezing, radical mastectomy, glomectomy for asthma, prophylactic portacaval shunting, and adrenalectomy for essential hypertension.

The most famous example, however, is therapeutic phlebotomy or bloodletting. For centuries it was common for physicians to make incisions to let their patients bleed in the hope that this would restore the internal balance of bodily fluids. The procedure was used for various illnesses including pneumonia, cancer, diabetes, and jaundice. It did more harm than good and likely hastened the death of many patients, George Washington being a famous example. Harris notes that “because its use was so widespread, it probably killed more people than any other medical treatment in history.”

Bloodletting provides a cautionary tale of how doctors can sometimes do more harm than good to their patients even with the best of intentions. Although its rationale may now seem absurd, for centuries bloodletting was an established procedure, promoted by some of the greatest physicians that ever lived, including the famous Canadian doctor William Osler.

Artery ligation: the first placebo-controlled trials for surgical interventions

Recent examples of surgical procedures that are now abandoned are even more interesting as they demonstrate how observational evidence can be misleading. Seeing patients improve after surgery is simply not enough to conclude that the procedure is effective. Perhaps patients would have improved faster or to a larger extent if they hadn’t received the surgery. What is needed to determine efficacy is a controlled trial in which patients are blinded and approximately half of them receive a placebo instead of the intervention.

Harris takes us back to the first placebo-controlled randomized trials for surgical interventions at the end of the 1950s. Two landmark studies tested the use of internal mammary artery ligation for angina pectoris (chest pain). The idea behind this procedure was to shut off certain arteries that were thought to be less essential so that more blood would flow to the heart. A 1957 article in Reader’s Digest heralded it as “New Surgery for Ailing Hearts” and described impressive success stories of patients who were able to return to work after getting the surgery. Harris writes that “the operation had everything going for it: biological plausibility (meaning it made sense, at least on superficial consideration), support from animal experiments, and good results from a series of patients who had the procedure.”

Luckily, some researchers were skeptical and decided to put this procedure to the test in a proper scientific experiment: a randomized placebo-controlled trial. To reduce bias, patients were randomly allocated to the intervention or control group. Patients in the control group received a skin incision without the surgical procedure so that conditions were closely matched. The results showed that patients in both groups improved, but without any significant difference between the two. Because of these two pioneering trials, the surgical procedure was quickly abandoned. One can only imagine how long it would have remained in practice if these placebo-controlled trials had never been performed. Many common surgical procedures that surgeons use today have never been put to the test like this.

“You can make up a biologically plausible mechanism for anything you want”

In more recent years, patients with coronary artery disease and restricted blood flow (ischemia) are frequently treated with a stent, a tiny tube that helps to keep the artery open. Although the procedure makes sense and is still frequently performed, randomized controlled trials have shown that it is not effective in treating heart disease that’s severe but stable. This year, for example, a large international study on more than 5000 patients with moderate or severe ischemia was published in the New England Journal of Medicine. It showed that invasive procedures are no better than medications and lifestyle advice in reducing the risk of ischemic cardiovascular events or death.

This brings me to an important point that Harris emphasizes repeatedly in his book: sometimes a treatment makes a lot of sense but still doesn’t work. A biologically plausible mechanism is required (if you don’t have one then your hypothesis is in trouble) but it shouldn’t impress anyone because such mechanisms are quite easy to manufacture. Or as Harris says in a lecture:

“You can make up a biologically plausible mechanism for anything you want […] it just means absolutely nothing to me to have a biologically plausible mechanism because nearly everything I’ve tested has a biologically plausible mechanism and doesn’t work. It’s not enough to have a biologically plausible mechanism: it has to actually work.”

In other words, even if a pudding looks delicious, the proof is in the eating. Harris also stresses that proof of efficacy in animal studies is not very convincing as frequently these results fail to translate to studies in humans.

A good example is fetal-tissue transplantation as a treatment for Parkinson’s disease. “In the 1990s many clinics were transplanting dopamine-producing cells from embryos into the brains of people with Parkinson’s disease”, Harris writes. “This procedure involved drilling holes in the skull, through which the cells were inserted. Animal studies showed that the cells could survive, and that the procedure could correct some of the movement disorders.” Despite these encouraging results, randomized placebo-controlled trials showed that, in the end, the treatment didn’t work. Even though the theory and results from open-label trials were promising, the treatment did not have a clinical benefit and was associated with side effects such as dyskinesias.

Liberation therapy for multiple sclerosis: an example of a research hype

Harris mentions a few other examples in his book. These are interesting case studies of how impressive anecdotes, observational studies, and open-label trials can be refuted by proper placebo-controlled randomized trials.

In contrast to drugs, there are no regulatory requirements to provide evidence of efficacy or safety when introducing a new surgical procedure into medical practice (unless it involves the use of medical devices that carry substantial risk). So once in a while, a pioneering surgeon comes along with a radical new theory and surgical procedure that elicits hope in desperate patient communities. This is usually followed by case studies describing impressive improvements or encouraging open-label studies that suggest the intervention works. As the surgical procedure spreads and more patients are treated, eventually some of the more skeptical surgeons perform a randomized placebo-controlled trial showing that the new surgery was not effective after all.

A good example of this is surgery for chronic cerebrospinal venous insufficiency (CCSVI) in patients with multiple sclerosis. In the 2000s, Italian researcher Paolo Zamboni reported obstructions of blood flow in the neck of patients with multiple sclerosis, a condition he called CCSVI. Zamboni argued that obstructions in veins lead to a build-up of iron in the central nervous system, which might trigger the autoimmune response seen in multiple sclerosis. His treatment for CCSVI consisted of surgery to clear the blockage of blood flow by inserting balloons or stents to dilate neck veins. Observational studies reported impressive results and news articles featured patients who experienced dramatic improvements and could now get out of their wheelchair. More and more patients with multiple sclerosis were being treated, especially in Canada where the hype was huge after a television program on the procedure. Eventually, two randomized placebo-controlled trials, including one by Zamboni himself, showed that the treatment did not work. CCSVI was not more common in MS patients than in controls as was originally reported and the abnormalities found by Zamboni and colleagues were likely due to misinterpretation of ultrasound data.

PFO closure to treat migraine

Another interesting example is PFO surgery for migraine. PFO stands for ‘patent foramen ovale’, a hole between the left and right upper chambers of the heart. PFO is a remnant of fetal anatomy. All fetuses have it because, at this developmental stage, blood circulation through the lungs isn’t functional yet. The hole normally closes in the months after birth when it is no longer necessary, but in approximately 25% of people, an opening remains. This is what PFO refers to. Most people with the condition never know they have it and they don’t need treatment.

In the 1990s, however, there were reports of an association between PFO and migraine with aura. Doctors started treating PFO surgically and described how patients’ migraines improved notably afterward. Eventually, three randomized controlled trials were conducted and they all showed a small, statistically non-significant improvement in migraine following PFO closure, suggesting the surgery wasn’t effective. Even the link between PFO and migraine is contested. A 2016 review argues that the quality of studies is poor and that “There is no good quality evidence to support a link between migraine and PFO.”

Vertebroplasty: as the methodological quality of the trials increased, the effect size shrank

Let’s take a look at another example that follows the same pattern: vertebroplasty. This was a procedure for patients with frequent fractures such as elderly persons with osteopenia or transplant recipients who have to take steroids to suppress their immune response. Vertebroplasty consists of injecting bone cement, or polymethylmethacrylate (PMMA), into the vertebral body to ease the pain and help fractures heal. Initial reports were more than encouraging and the procedure was quickly embraced by multiple surgeons. One editorial stated: “the opportunity to develop this procedure as a real contribution to the well-being of patients with spinal disorders could be enormous. […] Here is a chance for us to ply our surgical skills in an elegant way and really help people. Let’s do it well.” Within a short period, vertebroplasty for osteoporotic compression fractures became a common procedure.

As the methodological quality of the trials increased, however, the effect size shrank. As one review explains:

“What seems most impressive in this whole affair has been the relentless shrinking of the apparent vertebroplasty effect size and duration. With each increase in the quality of evidence, the apparent quantity of effect has diminished and diminished until we are searching for any traces of it at all. […] We have gone from 1998, when 90% of a large cohort had immediate and complete relief of symptoms, perhaps lasting a year or longer completely attributed to the procedure to arguing whether even the modest improvement now reported is a totally nonspecific effect, at even 6 weeks after treatment.”

It took approximately 10 years but eventually, two randomized, double-blind, sham-controlled clinical trials were conducted and these showed no benefit of the procedure. These trials also have their weaknesses so perhaps they should not be considered to be the final answer. But as the authors have argued convincingly “the onus is now on proponents of the procedure to disprove our findings in similar high-quality placebo-controlled trial.”

Arthroscopic surgery for osteoarthritis of the knee

In 2002, Moseley and colleagues published the results of what is perhaps the most notorious randomized placebo-controlled trial on a surgical intervention. It looked at arthroscopic surgery for osteoarthritis of the knee, one of the most commonly performed orthopedic surgeries.

Arthroscopy or keyhole surgery is an elegant and minimally invasive procedure. An endoscope is inserted into the knee joint through a small incision so that the joint doesn’t have to be fully opened. Surgeons usually clean the joint, remove particles, and smooth surfaces. In uncontrolled studies, about half the patients reported relief from pain after this procedure. Surgeons saw their patients improve and everyone thought the surgery was effective. Until Moseley and colleagues published the results of their trial…

Patients in the placebo group received skin incisions and underwent simulated smoothening without insertion of the arthroscope. The surprising result was that these patients did just as well. Or as the authors reported: “At no point did either of the intervention groups report less pain or better function than the placebo group. […] the 95 percent confidence intervals for the differences between the placebo group and the intervention groups exclude any clinically meaningful difference.”

The trial received a lot of criticism, but it was of high quality and its findings have been confirmed by other studies. Some surgeons argued that they mostly performed arthroscopy to treat meniscal tears, but then a high-quality randomized trial with a sham comparison was done, showing that this wasn’t an effective treatment either. For many surgeons, it came as a shock. Although these interventions are no longer recommended, they are still performed worldwide because it is hard to change habits and traditions.

Spinal surgery for back pain

Another example that Harris discusses in his book is fusion surgery for low back pain. The procedure involves fusing bones into a single unit to correct joint deformities and was originally used to treat severe scoliosis, spinal tuberculosis, or large fractures. Today, however, these indications account for only a small fraction of spinal-fusion procedures. Now, most spinal fusions are performed for spondylosis and other degenerative conditions that affect the lumbar spine such as arthritis.

In contrast to arthroscopy, this is a major procedure, one that costs tens of thousands of dollars. One paper states that “spinal fusion is one of the most lucrative areas of medicine and it generates billions of dollars for the hospitals and the surgeons.” There is, however, little evidence that fusion surgery for back pain is effective. As Harris states in his book, “It is very expensive (the implants alone are often tens of thousands of dollars per case), often leads to complications, often requires further surgery, is associated with increased mortality, and often does not even result in the spine being fused.” There are no placebo-controlled studies, but two randomized trials compared spinal fusion to non-operative treatment alternatives and reported that the latter did just as well. A third, Swedish study did initially report a beneficial effect, but at long-term follow-up there was little to no difference between the groups. Harris notes that “the surgical group didn’t do any better than in the other studies; the difference was that the non-operative group didn’t get better at all. This is because the non-operative treatment was not dressed up as something that might work (that is, it wasn’t a good placebo).”

Surgery for shoulder impingement syndrome

When you raise your arm, the long bone in your upper arm (the humerus) ‘impinges’ against the outer end of the shoulder blade, called the acromion. When this movement hurts, it is called ‘shoulder impingement syndrome’. Surgeons treat impingement syndrome by ‘decompressing’ the acromion, which means smoothing it and removing bone spurs and soft tissue to make extra space. It is one of the most frequently performed orthopedic procedures in the world. It also doesn’t work. Reviews previously indicated that it is no more effective than exercise therapy, and in 2017-2018 two randomized placebo-controlled trials were published showing it wasn’t superior to diagnostic arthroscopy (which was used as a placebo).

Why aren’t more placebo-controlled trials conducted?

These are just a couple of interesting examples but there are many more. Harris’s book also discusses hysterectomy, cesarean section, venous clot filters, laparoscopy for bowel adhesions, intradiscal electrothermal therapy, surgery for Meniere’s disease, floating kidney, and tendon rupture as interventions for which reliable evidence of effectiveness is missing.

A 2014 review of all randomized placebo-controlled trials for surgical interventions (51 in total) found that in half of these studies, the effect of surgery did not differ from that of a placebo. In the other half, surgery was superior to placebo but the magnitude of the effect size was generally small. These results are quite astonishing and make one wonder why researchers aren’t conducting more of these trials.

The most common objection is that it is unethical to give patients in the placebo group anesthesia or an incision without any intervention that might work. They get some of the risks but none of the benefits. That might explain why some of these placebo-controlled trials reported recruitment problems. The editor of the journal Arthroscopy took this argument to the extreme when he accused the New England Journal of Medicine of bias against arthroscopy because they were publishing so many negative findings. He and his co-editors wrote:

“Really, what patient in his or her right mind, no matter how well intentioned to participate in research, would consent to sham surgery? We would not consent to the possibility of anesthesia and sham surgery, nor do we believe our right-minded patients would do so. We have a concern that methods of sham surgical trials result in selection bias, where patients who may not be of entirely sound mind are selected as research subjects, and research performed on such individuals would not be generalizable to mentally healthy patients.”

I thought this was too funny not to share. The authors weren’t joking, however, indicating what skeptics of surgical trials are up against.

Another way to look at this ethical dilemma is in terms of odds. Patients who participate in placebo-controlled surgical trials get a certain chance to receive the novel intervention at the risk of getting a placebo incision. For some, this might be more appealing than not having a shot at a novel treatment at all. They also contribute to scientific understanding so that one day an effective treatment for their condition might be found. Some trials have also used a diagnostic procedure as a placebo such as the trials on shoulder impingement syndrome mentioned above.

Are placebo-controlled surgical trials unethical?

The best counter-argument to those who claim that randomized placebo-controlled surgical trials are unethical lies in the results of those trials. These have frequently shown that commonly used surgical procedures are ineffective and shouldn’t be used at all. These trials have likely stopped unnecessary surgery on millions of people over the years. From an ethical point of view, there is a strong case for continuing to perform placebo-controlled studies so that we know which surgeries are effective and which ones are redundant.

Harris argues that “it is unethical to perform new treatments without such proof of effectiveness.” He criticizes an ethical double standard where there is no ethical oversight on (new) surgical interventions but strict regulation for anyone who wants to put these treatments to the test. He writes “there are no ethical restrictions on what type of procedure you perform, but if you want to measure the results of that procedure, you need approval.”

The parachute analogy

Some critics of randomized placebo-controlled trials have used the parachute analogy. It states that for some interventions the evidence of effectiveness is so overwhelmingly clear that a randomized trial would be absurd and unnecessary, much like you don’t randomize people to not wearing a parachute when jumping out of an airplane. Interestingly, researchers reviewed the cases in which this analogy is invoked. They found that in many cases, the intervention being compared to a parachute was tested in a randomized controlled trial and shown to be ineffective. The authors conclude that “most parachute analogies in medicine are inappropriate, incorrect or misused.”

Why do patients in the placebo group improve so much?

Thus far our story can be summarized as follows: surgeons often think their intervention works because they see their patients improve. But in many cases, randomized trials show that patients who receive a placebo improve to a similar degree as those who receive the surgical procedure. This raises the question: ‘why do patients in the placebo group improve so much?’

Harris argues that surgery likely has a large placebo effect because it makes a big impression on the patient. But there are three other important mechanisms.

  • The first one is the natural progression of the illness. A good example of this is tennis elbow. Most patients with this condition improve regardless of any intervention. If you operate on them, as doctors frequently did, it gives a false impression that the surgery had excellent results. In reality, patients do just as well without.
  • A second mechanism is concomitant treatment: patients might take other drugs, receive other treatments, or make lifestyle changes whose effects overlap with the period in which the trial intervention is being tested.
  • A third mechanism is called regression to the mean. It’s the most underappreciated explanation. If you select something based on having an extreme value, it will likely move closer to normal the next time you measure it. It’s the explanation for why sports stars ‘underperform’ after appearing on the cover of Sports Illustrated. A lot of medical conditions have a fluctuating course and treatment is often initiated when patients are at their worst, making it probable that patients will be a little better at the next assessment.
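Regression to the mean is easier to believe once you see it in numbers. The following is a minimal Python sketch, not something taken from Harris’s book. All values in it are made-up assumptions for illustration: a 0-10 style pain score, a typical level of 5 per patient, day-to-day fluctuations, and an enrollment threshold of 7. Nobody in the simulation receives any treatment; patients are simply enrolled when their pain happens to be at its worst and then measured again later.

```python
import random
import statistics

def simulate_regression_to_mean(n_patients=10_000, threshold=7.0, seed=0):
    """Simulate fluctuating pain scores with no treatment effect at all.

    All parameters are illustrative assumptions, not data from any trial.
    """
    rng = random.Random(seed)
    baseline_scores, followup_scores = [], []
    for _ in range(n_patients):
        usual_pain = rng.gauss(5.0, 1.0)             # patient's long-run average pain
        baseline = usual_pain + rng.gauss(0.0, 1.5)  # fluctuation on the day of enrollment
        followup = usual_pain + rng.gauss(0.0, 1.5)  # independent fluctuation at follow-up
        # Patients are only enrolled (or "treated") when pain is at its worst.
        if baseline >= threshold:
            baseline_scores.append(baseline)
            followup_scores.append(followup)
    return statistics.mean(baseline_scores), statistics.mean(followup_scores)

baseline_mean, followup_mean = simulate_regression_to_mean()
print(f"mean pain at enrollment: {baseline_mean:.2f}")
print(f"mean pain at follow-up:  {followup_mean:.2f}")
```

Under these assumptions the enrolled group looks clearly better at follow-up even though nothing was done to them. In an uncontrolled study that apparent improvement would be credited to the surgery, which is why a comparison with a placebo group selected in the same way is needed.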

The history of surgery unveils how medicine works

I apologize that this blog post has gotten rather long but I think these examples are interesting because they tell us a lot about how medicine works. Because surgical procedures are largely unregulated, there is often a lag between their introduction into medical practice and the randomized controlled trials that test their efficacy. It’s this time lag that unveils how misleading clinical experience and open-label studies can be. I think Harris and others have made a strong case for more randomized, placebo-controlled, and blinded studies on surgical interventions.

I wonder how researchers of behavioral interventions look at this because trials that are blinded and placebo-controlled are not practically feasible in their field. How do we know if cognitive behavioral therapy is not like arthroscopic surgery for the knee?

For a lot of poorly understood chronic illnesses such as myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS), the situation shows many similarities to surgery. Research is generally of poor quality and treatments are often introduced based on clinical experience. I hope this history of surgery sparks a skeptical attitude as it shows how easily we can be misled by clinical intuition.

Harris, Ian. Surgery, the Ultimate Placebo: A surgeon cuts through the evidence. NewSouth: Sydney (Australia). 2016. 293 pages. ISBN: 9781742234571.

* Unfortunately, Harris has a problematic view of medically unexplained symptoms including CFS. He seems to think that these physical symptoms are the result of psychosocial factors, citing the Lancet article by Wessely and colleagues from 1999.

2 thoughts on “The fascinating history of surgery: when placebo-controlled trials clash with common beliefs”

  1. J says:

    This article is of course very relevant with regard to the outdated speculative theory that GET/CBT treatment has been based upon.

    I find the distinction important that;

    1) Also every successful new treatment will go through a phase where there is no placebo-controlled evidence. I sometimes encounter the attitude that a treatment is assumed to be ineffective citing that “there is no scientific proof of evidence”, discounting patients’ reports, but not considering whether studies exist. Although an absence of studies would be a cause for caution and only such studies can exclude a placebo effect, absence of studies would not automatically warrant the conclusion that a treatment is ineffective.

    2) When a treatment is shown to be ineffective in a large group of patients, if the patient group is inhomogeneous, treatment may still actually help a small subgroup of patients even though it is shown to not be effective on average. Also, the results of this subgroup could distort the study results. This is also an important aspect of study design which is especially relevant whenever the actual physical disease mechanism of an illness is not yet known (such as ME/CFS) or when the illness is a symptom/syndrome with a possibly large variety of causes (such as “chronic back pain”). Examples; If a GET study was based on older CFS criteria broad enough to include other fatigue patients that do not actually suffer from post-exertional malaise (and thus would probably not be considered ME/CFS patients today), those patients may possibly have benefitted from exercise. / Due to physiological variety observed within the ME/CFS patient group, a biomedical treatment e.g. certain supplements may improve wellbeing in one subgroup but not others.

  2. soppeflop says:

    really we’re getting off easy with just CBT/GET, at least when compared to a lobotomy, although a forced walk with no energy can feel like one
