Flottorp and colleagues have published a commentary in The Lancet in which they criticize the new NICE guideline on myalgic encephalomyelitis/chronic fatigue syndrome (ME/CFS). They argue that the guideline development process “was not driven by science but by ideology”. This blog post will examine each of their arguments. The commentary in question is:
Flottorp SA, Brurberg KG, Fink P, Knoop H, Wyller VBB. New NICE guideline on chronic fatigue syndrome: more ideology than science? The Lancet. 2022; 399(10325): 611-613. https://doi.org/10.1016/S0140-6736(22)00183-0.
First a little background for those who are new to the story. The previous NICE guideline on ME/CFS, published in 2007, recommended rehabilitative treatments such as graded exercise therapy (GET) and cognitive behavior therapy (CBT). Both interventions try to extend patients’ perceived limits by gradually increasing their activity level under the guidance of a healthcare professional.
The current NICE guidance, however, no longer recommends GET and CBT because it considers the evidence-base for both interventions to be of low to very low quality. In surveys, ME/CFS patients also frequently report that these rehabilitative interventions do more harm than good. Patients often say they would very much like to increase their activity level, but that they only get worse when they try. The latter phenomenon is often referred to as post-exertional malaise (PEM), a marked worsening of symptoms following exertion.
Some of the authors of the commentary in The Lancet have conducted research on GET or CBT for ME/CFS and believe that NICE is wrong to no longer recommend these treatments. Let’s now take a look at their arguments.
Subjective outcomes in non-blinded trials
The authors argue that, since a diagnosis of ME/CFS is based on subjective symptoms, these are also the most valid endpoints in clinical trials. The NICE guideline downgraded trials on GET and CBT because subjective outcomes were used while neither patients nor therapists could be blinded to treatment allocation. Flottorp et al. disagree with this judgment. They argue that “absence of blinding and subjective outcomes is common in studies of non-pharmacological interventions for conditions without objective criteria, but does not imply that all such studies are biased and should be downgraded.”
There are several problems with this line of reasoning. Just because something is common, doesn’t mean it is reliable. Similarly, a lack of objective measurements of ME/CFS doesn’t mean that subjective questionnaires suddenly become immune to response bias. And as the Cochrane handbook explains, “the potential for bias cannot be ignored even if the outcome assessor cannot be blinded.”
The problem with non-blinded trials is that participants know if they are receiving the treatment that is being tested or not. Patients who realize they are getting an active intervention rather than the control might be more optimistic about their health or report symptoms according to what they think will please the researchers.
The intervention itself may also change how patients report their symptoms. Booklets on GET and CBT explained how these treatments would help patients get better. They told patients that their symptoms might not result from an occult disease but from more benign causes such as deconditioning, stress, or anxiety. Therapists were instructed to encourage optimism and to clarify that impairments were reversible if patients committed to treatment. All these factors might have influenced how patients rated the severity of their fatigue. That is why subjective measurements, which can be informative outside the context of a clinical trial, are considered problematic as outcomes in non-blinded treatment trials.
This is all common knowledge. That a lack of blinding makes subjective outcomes problematic is explained in introductory textbooks. It is acknowledged as a source of bias in guidelines on assessing evidence (such as the GRADE and Cochrane handbooks), and in the principles that both the FDA and EMA use for clinical trials (formulated at the International Conference on Harmonisation, ICH). Some quotes and references are listed below for those interested.
| Statement on blinding | Source |
| --- | --- |
| “In research there is a particular risk of expectation influencing findings, most obviously when there is some subjectivity in assessment, leading to biased results. […] Blinding patients to the treatment they have received in a controlled trial is particularly important when the response criteria are subjective, such as alleviation of pain, but less important for objective criteria, such as death.” | Day & Altman. BMJ Statistical Notes. Blinding in clinical trials and other studies, 2000. |
| “The potential for bias cannot be ignored even if the outcome assessor cannot be blinded.” | Higgins et al. Cochrane Handbook, Version 6.2, 2021. |
| “A clinical trial should, ideally, have a double-blind design in order to avoid potential problems of bias during data collection and assessment. In studies where such a design is impossible, other measures to reduce potential bias are advocated” | Friedman et al. Fundamentals of Clinical Trials. 2010. |
| “In single-blind or open-label trials every effort should be made to minimize the various known sources of bias and primary variables should be as objective as possible.” | ICH Topic E9 Statistical Principles for Clinical Trials, 1998. |
| “Our results suggest that, as far as possible, clinical and policy decisions should not be based on trials in which blinding is not feasible and outcome measures are subjectively assessed. Therefore, trials in which blinding is not feasible should focus as far as possible on objectively measured outcomes, and should aim to blind outcome assessors.” | Savović et al. Influence of reported study design characteristics on intervention effect estimates from randomised controlled trials: combined analysis of meta-epidemiological studies. Health Technology Assessment. 2012;16(35):1-82. |
| “In a clinical trial where the blinding of patients and care providers is not possible but no improvement is found for an ‘objective outcome’ (eg, peak flow), it seems reasonable to be less confident in an improvement of a ‘subjective outcome’ (eg, quality of life) as this may not be caused by the intervention as such.” | Moustgaard H, Bello S, Miller FG, Hróbjartsson A. Subjective and objective outcomes in randomized clinical trials: definitions differed in methods publications and were often absent from trial reports. Journal of Clinical Epidemiology. 2014;67(12):1327-34. |
| “Using subjective outcomes in an open-label study undermines its internal validity because it makes it impossible to determine how much of the reported effect is related to the investigated treatment and how much is related to various forms of bias.” | Wartolowska K, Beard D, Carr A. Blinding in trials of interventional procedures is possible and worthwhile. F1000Research. 2017;6. |
When blinding cannot be implemented, it is usually recommended that researchers also use objective outcomes. Some trials on GET and CBT did use objective outcomes such as employment figures, fitness tests, and activity levels, but these showed no consistent improvements.
Post-exertional malaise and case definitions
Flottorp and colleagues criticize the NICE guideline for making a “new non-validated diagnostic definition of CFS/ME, making post-exertional malaise (PEM) a required criterion.” They also criticize the NICE guideline committee for downgrading trials that did not require participants to have PEM.
Old case definitions of ME/CFS focused almost solely on fatigue. This wasn’t very helpful given that fatigue is present in almost all medical conditions. Newer case definitions, therefore, put more emphasis on PEM, a worsening of symptoms when patients exceed their energy limit. Many experts consider this a characteristic symptom of ME/CFS.
It isn’t the NICE Committee that decided to make PEM a required criterion. The Canadian Consensus Criteria (2003), the International Consensus Criteria (2011), and the Institute of Medicine (IOM) definition (2015) all require patients to experience this marked flare-up of symptoms following physical or mental activity. In fact, the description used in the previous NICE guideline also highlighted PEM as a key feature which, if absent, requires the diagnosis of ME/CFS to be reconsidered.
Trials that used any of these case definitions weren’t downgraded. This only happened for trials that used older case definitions such as the Oxford (1991) and Fukuda (1994) criteria that do not require PEM.
The NICE committee also didn’t create a new case definition out of the blue. They only slightly tweaked the IOM definition that is in use in the United States, for example by the Centers for Disease Control and Prevention (CDC).
The longest follow-up
Flottorp and colleagues criticize the guideline Committee for extracting trial outcomes at long-term follow-up. They argue that the Committee failed to consider that at this point, some patients in the control group might have crossed over to receive GET or CBT.
The NICE Committee already responded to this criticism. It considered “long-term data of treatments for ME/CFS to be more reflective of real-world efficacy and more helpful for decision making and implementation in clinical practice.”
The data from the largest and most recent trials on GET and CBT, the PACE, FINE, and GETSET trials, all failed to find significant benefits of the interventions at long-term follow-up. In the long run, patients in the control group seem to perform just as well. We highlighted this shortcoming in a previous blog post, but it is often ignored in the literature. It might suggest that initial improvements reported post-treatment are due to response bias, rather than genuine improvements in health.
What about the possibility that the lack of differences between groups at long-term follow-up is due to additional therapy received after the trial ended? In both GETSET and PACE, a sensitivity analysis showed that this was unlikely. The authors of GETSET, for example, reported that “there is no evidence that the improvements observed in the SMC group [the control group] were due to them having received more exposure to therapy than the GES group [the intervention group] after trial completion.” Similarly, in the PACE trial the control group caught up with the intervention group, and “there was some evidence from an exploratory analysis that improvement after the 1 year trial final outcome was not associated with receipt of additional treatment with CBT or GET.” In other words, at long-term follow-up patients who received GET or CBT seem to do just as poorly as those who did not.
The Cochrane review as a starting point
Next argument. Flottorp and colleagues criticize the guideline Committee for not including Cochrane reviews on GET and CBT. This is a rather strange argument because the trials included in those reviews were also evaluated by the NICE Committee, so it is unclear why the Committee should have included the reviews as well.
The Cochrane review on CBT for ME/CFS, for example, dates from 2008 and has not been updated since. It has an editorial note explaining “it should not be used for clinical decision‐making. The author team is no longer available to maintain the review.”
One of the authors of the critique in The Lancet, Kjetil Brurberg, was an author of the Cochrane review on GET for ME/CFS. So perhaps there was some disappointment that NICE did not use this review as a starting point.
There were, however, good reasons not to. The review by Brurberg and colleagues has repeatedly been criticized for methodological flaws. An internal review by Cochrane indicated that some of these criticisms were correct, and the review was therefore amended in 2019. Some problems, however, required a full update with a change of protocol. Cochrane thus decided to conduct a full update of its review on GET for ME/CFS with an entirely new author team. This process started in 2019 and is still ongoing.
NICE did not conduct their own data synthesis
Another criticism made is that the NICE committee “did not conduct their own data synthesis of clinical trials.” This is also a bit peculiar because NICE has published hundreds of pages full of GRADE tables and meta-analyses.
We suspect Flottorp et al. are mainly disappointed that NICE didn’t perform the data synthesis as they would like to see it done, namely by combining as many GET and CBT trials as possible. This is the approach used by the Cochrane review on GET by Brurberg and colleagues. It adds more data so that a review can draw stronger conclusions.
But it can also make a review look more impressive than it is. One problem with combining many studies is that it can result in high heterogeneity, which signals that the trial designs or interventions differed too much to be pooled in the same analysis, like apples and oranges. The NICE review therefore chose to split these comparisons into interventions that are more alike, an approach that is similar to a 2015 review on ME/CFS by a National Institutes of Health Working Group in the US.
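To make the apples-and-oranges point concrete: heterogeneity is often summarized with the I² statistic, which estimates how much of the variation between trial results is due to genuine differences rather than chance. The following is a minimal sketch with made-up, purely illustrative effect sizes (not real GET/CBT trial data), showing how pooling dissimilar interventions inflates I²:

```python
def i_squared(effects, variances):
    """Cochran's Q and the I^2 statistic for a fixed-effect pool."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    return max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

# Three similar trials: effects cluster together, so heterogeneity is low.
print(i_squared([0.30, 0.32, 0.28], [0.01, 0.01, 0.01]))  # → 0.0

# Adding two dissimilar trials to the pool inflates heterogeneity sharply.
print(i_squared([0.30, 0.32, 0.28, 0.90, -0.10], [0.01] * 5))  # ≈ 92
```

An I² above roughly 75% is conventionally read as considerable heterogeneity, which is when splitting the analysis into more homogeneous comparisons, as NICE did, becomes the more defensible choice.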
There are arguments for both approaches. In our view, it might be best to simply read the trial reports rather than reviews and meta-analyses as the latter are more subjective and biased than people often think.
The Committee was biased
This brings us to the next point. According to Flottorp et al., the NICE Committee was biased because some members had previously expressed negative views about GET or CBT. They write: “we know from social media that some of the committee members and two of the three expert witnesses had negative opinions regarding the interventions considered.”
What they do not mention is that many of the other Committee members had previously expressed positive views of GET or CBT. These included researchers who had published on GET/CBT (one member, for example, was a co-author of the PACE trial findings) and physicians who prescribed these treatments.
One could argue that such interests form a greater bias than those resulting from what somebody posted on social media. It’s not easy for a researcher to conclude that a trial they worked on for many years is flawed, or for a doctor to admit that the treatment they prescribed is ineffective or even harmful.
When the NICE Committee was formed, many thought it was biased in favor of supporters of GET and CBT. Patients and patient organizations wrote to NICE to express their discontent about this. It seems that only after the first draft came out did proponents of GET and CBT publicly claim that the Committee was biased in the other direction.
NICE did not listen to critiques from the Royal Colleges
Flottorp and colleagues also argue that the NICE Committee didn’t listen to the critical comments submitted by the Royal Colleges. These comments, however, weren’t always well-drafted, and some favored treatments based on pseudoscientific principles such as neurolinguistic programming. It is unclear how many people within the colleges proofread these submissions. NICE had good reasons not to follow some of these proposals.