Original Article |
1 Department of Surgery, Divisions of General Surgery and Surgical Research, University of Basel, Basel, Switzerland
2 Department of Surgery, University of Toronto, Toronto, Canada
Correspondence: Address correspondence and reprint requests to: Ulrich Guller, MD, MHS, Department of Surgical Oncology, Princess Margaret Hospital, Rm 3-130, 600, University Avenue, Toronto, ON, Canada, E-mail: uguller{at}yahoo.com
| ABSTRACT |
|---|
Key Words: Non-inferiority trial; Surgical oncology; Statistics; Ethics; Assay sensitivity; Biocreep
| INTRODUCTION |
|---|
For instance, in surgical oncology, clinical trials frequently have several endpoints. One endpoint, often overall survival, is designated as the primary endpoint. However, there are other relevant outcomes such as post-operative quality of life, short- and long-term morbidity, cosmesis, and costs, all of which are important for both patients and health care providers. For instance, a small decrease in overall survival may be justified, or even desired by patients, if there is a corresponding large gain in postoperative quality of life. In this setting, a non-inferiority trial may be the most appropriate study design.
Non-inferiority trials are performed to assess whether a new, investigational treatment or surgical procedure has equivalent (non-inferior) efficacy in treating the illness (e.g., similar overall survival) to an established treatment while offering other, relevant advantages. In surgical oncology, we are challenged by the rapid pace of change in surgical techniques. These new surgical techniques may be less invasive, more convenient and faster to perform, less expensive, or associated with an improved post-operative quality of life, compared to the existing procedure. However, prior to abandoning the standard technique, it is essential to assess whether the new procedure has equivalent (or non-inferior) efficacy regarding overall survival.
Moreover, if an effective therapy already exists, performing a placebo-controlled randomized clinical trial may pose ethical questions.15 In this setting, it is appropriate either to conduct a superiority trial comparing a new treatment to a standard treatment, or to design a non-inferiority trial to compare a new therapeutic approach to an established standard therapy, assuming the new therapy to be equivalent (non-inferior) to the standard therapy with respect to efficacy while having other significant advantages.
| STATISTICS IN THE DESIGN OF NON-INFERIORITY TRIALS |
|---|
).6 This margin states how similar the performance of a new intervention must be relative to the standard treatment to be considered "non-inferior" or "equivalent". Hence, delta can be described as the chosen definition of inferiority in a non-inferiority trial. It is critically important that the non-inferiority margin be specified prior to the initiation of the study (a priori hypothesis).10 The choice of delta will be the chief determinant of the sample size.11 The sample size is proportional to 1/delta²: the smaller delta, the larger the sample size (and vice versa). Halving delta, for example, roughly quadruples the required sample size.

Choosing the appropriate non-inferiority margin for a given study is a challenging but essential task.12 The first step is to estimate the expected effect of the standard therapy. Some investigators have suggested that the non-inferiority margin should be smaller than the expected effect size of the standard treatment to ensure that the investigational treatment has at least some efficacy.2,10,12 However, in many contexts this premise is not sufficient, and investigators require that the investigational treatment preserve a larger fraction of the efficacy of the standard one. The choice of a non-inferiority margin usually depends on all of the following factors: (a) known effect of standard therapy over placebo, (b) severity of the disease under investigation, (c) toxicity concerns, and (d) primary study endpoint.13 Therefore, choosing the appropriate delta represents a difficult task that requires extensive knowledge and input from both statisticians and clinicians.10,14

The non-inferiority or equivalence margin may reflect the smallest clinically relevant difference. A new therapy may then be accepted in a non-inferiority trial if the difference between investigational and standard treatment is smaller than delta. This precludes the possibility that there exists a clinically relevant difference between the two treatments.15 Ideally, a smaller delta should be selected if a more severe condition is being studied and/or if mortality is the primary endpoint. However, there is no true consensus for selection of non-inferiority margins,9 which may vary even among trials in similar therapeutic areas.
Research Hypotheses in Non-Inferiority Trials
Traditional randomized clinical trials aim to prove a clinically relevant difference between standard and investigational therapy (alternative hypothesis, Ha), while the null hypothesis (H0) states that no difference between the treatments under investigation exists.19 Importantly, a traditional superiority trial failing to reject the null hypothesis of no difference between treatments does not demonstrate equivalence of the treatments.
In the design of a non-inferiority trial, the null hypothesis (H0) is that the treatment in arm 2 (investigational treatment) is worse than the treatment in arm 1 (standard treatment) by a difference equal to or greater than the non-inferiority margin (Table 1).20 If, however, the difference between the investigational and the standard treatment is smaller than the pre-specified equivalence margin (delta) or if the investigational treatment is superior to the standard treatment, the null hypothesis (H0) can be rejected, providing evidence supporting the adoption of the new treatment (Table 1).
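The two hypotheses can also be written compactly. The following is only a sketch under stated assumptions: the outcome is taken to be a proportion for which higher is better, and the symbols $p_S$ and $p_N$ (true efficacy of the standard and of the new, investigational treatment) and $\Delta$ (the pre-specified non-inferiority margin, delta) are notation introduced here for illustration rather than taken from the tables of the article.

```latex
\begin{align*}
H_0 &:\; p_S - p_N \ge \Delta
  && \text{(the new treatment is worse by at least the margin)} \\
H_a &:\; p_S - p_N < \Delta
  && \text{(the new treatment is non-inferior, or superior)}
\end{align*}
```

Written this way, rejecting $H_0$ is what supports non-inferiority, which is the reverse of the logic of a conventional superiority trial.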
Type I and Type II Error
A type I error (synonym: alpha) is a false-positive result21 regardless of the type of study. In a non-inferiority trial, a type I error is to falsely conclude that the new therapy is equivalent (= non-inferior) or even superior to the standard therapy when this is not the case. Conversely, a type II error (beta, false-negative result)21 in a non-inferiority trial is to falsely conclude that the new therapy is not equivalent (= inferior) to the standard therapy when both treatments are in fact equally efficacious (Table 2).6,19

In a non-inferiority trial, committing a type I error (obtaining a false-positive result) puts future patients at risk of having a poorer outcome (e.g., overall survival) as a result of the use of the new therapy, which in reality is inferior to the standard treatment.22 The risk of obtaining a false-positive result can be minimized either by setting the type I error probability below the standard of 0.05 (e.g., alpha = 0.01) or by choosing a very small non-inferiority margin (delta). Such precautionary methods may, however, lead to very large sample sizes,6,11,19,23 potentially associated with enormous costs and effort, thereby threatening the feasibility of the study.
Importantly, although the chosen non-inferiority margin influences the probability of committing type I and II errors, the selected delta must be set based on meaningful clinical criteria, and should not be driven by statistical constraints.
Sample Size
Sample size computations are crucial in the design of clinical trials and help to estimate the number of subjects needed for a given study. The importance of appropriate sample size computations is obvious: if the sample size is too large, resources will be wasted and the study may even be ethically questionable. Conversely, if the sample size is too small, the study will be underpowered, thereby hindering researchers from drawing meaningful conclusions.12,19,24
In a conventional superiority trial, the power of a study is the probability of finding a statistically significant result for a specified true difference in the overall population.19,21 A study that is underpowered because its sample size is too small may yield a statistically non-significant difference (a false-negative result), even if a clinically relevant difference exists, and hence may lead to erroneous conclusions.19,24,25
In a non-inferiority trial, the power of a study is the probability of finding equivalence (non-inferiority) between standard and investigational treatment if in fact the two therapies are equivalent in the overall population. Power is related to the alternative hypothesis, i.e., generally "no difference" in a non-inferiority trial. Furthermore, power is associated with the probability of committing a type II error (β, false-negative result): power = 1 − β. The higher the power, the lower the risk of committing a type II error. Conversely, an underpowered study will increase the risk of committing a type II error and may bias the study toward the null hypothesis (a false-negative result), even if the two treatments are equivalent.3,26 It is therefore absolutely critical to perform sample size computations early in the process of designing the study. Only thoroughly performed sample size estimates, and the adequately large patient samples that result from them, will prevent false-negative findings and hence avoid depriving patients of the benefits of the investigational treatment.
The sample size in a non-inferiority trial depends, among other parameters, on the accepted rate of false-positive results (chosen type I error, alpha), the accepted rate of false-negative results (chosen type II error, beta), and the selected non-inferiority margin (delta). Sample size, alpha, beta, and delta are the four important parameters in the design of a non-inferiority trial. Investigators can specify only three of these parameters and subsequently determine the fourth. Generally, a margin of non-inferiority (delta) is first determined, which must be based on the expected efficacy of both standard and investigational treatments. The accepted rate of false-positivity (alpha) is then chosen as well as the power of the study (complementary to beta). Based on these parameters an adequate sample size can be calculated.
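The interplay of alpha, beta, delta, and sample size can be illustrated with a short calculation. The sketch below is not the method used by any trial discussed here; it assumes a binary outcome where a higher proportion is better, a normal approximation, equal allocation to both arms, and a one-sided alpha. The function name `noninferiority_n_per_arm` and its parameters are hypothetical names chosen for this illustration.

```python
from math import ceil
from statistics import NormalDist


def noninferiority_n_per_arm(p_standard, p_new, delta, alpha=0.05, power=0.80):
    """Approximate number of patients per arm for a non-inferiority
    comparison of two proportions.

    Assumptions (illustrative, not from the article): normal approximation,
    equal allocation, one-sided alpha, higher proportion = better outcome.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    # Distance between the margin and the assumed true difference.
    effect = delta - (p_standard - p_new)
    if effect <= 0:
        raise ValueError("assumed true difference already reaches the margin")
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided type I error
    z_beta = NormalDist().inv_cdf(power)       # power = 1 - beta
    variance = p_standard * (1 - p_standard) + p_new * (1 - p_new)
    return ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Both treatments assumed truly equal at 50%; only the margin changes.
n_margin_10 = noninferiority_n_per_arm(0.50, 0.50, delta=0.10)
n_margin_05 = noninferiority_n_per_arm(0.50, 0.50, delta=0.05)
print(n_margin_10, n_margin_05)
```

Halving the margin from 10% to 5% roughly quadruples the required number of patients per arm, which is the 1/delta² relationship in practice. Published trials often rely on survival-based methods and additional assumptions, so their reported sample sizes need not match this approximation.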
| CHALLENGES AND CAVEATS IN THE CONDUCT AND INTERPRETATION OF NON-INFERIORITY TRIALS |
|---|
The challenge of assay sensitivity could be removed if a three-arm (standard treatment vs. investigational treatment vs. placebo) non-inferiority trial were designed.11 The alternative hypothesis would then be that the new, investigational treatment is non-inferior to the standard treatment but performs significantly better than placebo. However, the use of surgical placebos (sham surgery) is highly controversial and thus, surgical placebos are rarely used.25
Improper Study Design, Conduct, or Interpretation
Improper study design and conduct (e.g., non-compliance of patients, drop-outs, inclusion of ineligible patients, etc.) is likely to decrease an existing difference between two treatments.12,23 In a traditional superiority trial, investigators are keen to avoid such "sloppiness",3 in order to preclude the possible blurring of an existing difference. Furthermore, a failure to find a statistically significant difference between standard and investigational therapy might be due to an underpowered study design, to the occurrence of too few events, or to an imprecise measurement of the outcome (large variability).19,24,26
Conversely, in a non-inferiority trial, improper study design or conduct may bias the results towards a finding of equivalence (the alternative hypothesis).23 Non-inferiority trials are particularly sensitive to protocol deviations. For instance, deviations from the inclusion criteria, from the intended treatment (crossover), or from the treatment schedule may result in smaller differences between standard and investigational treatment and hence make a conclusion of non-inferiority more likely. Thus, the investigators may wrongly conclude that the new investigational treatment is non-inferior to the standard treatment (false-positive result), whereas in fact the standard treatment is superior with respect to efficacy. Future patients who undergo the new treatment may be at risk of having a poorer outcome due to this false conclusion. Thus, it is critically important that non-inferiority trials be performed with the greatest attention to methodological rigor and study conduct.12 Indeed, the existing literature documents that only a few trials concluding no difference between two treatments under investigation were actually adequately designed and conducted as non-inferiority trials with sufficient power. Greene et al.9 recently reviewed 88 published controlled clinical trials and found that only 23% of the studies reporting equivalence had actually set an equivalence boundary and confirmed it statistically. In addition, only 33% of reports calculated a sample size necessary to confirm their results prior to conducting the trial. The authors suggest that many investigators conclude similarity based upon inappropriate statistical tests or inadequate sample sizes, possibly leading to unreliable and potentially dangerous information being employed in clinical practice.9
"Intention-to-Treat" Versus "Per Protocol" Analysis
Intention-to-treat represents a central tenet of performing randomized clinical superiority trials. Intention-to-treat refers to the axiom that patients enrolled in a randomized clinical trial are analyzed according to their initial arm assignment, regardless of whether or not they received the assigned treatment. It is well known that some patients cross over from the assigned treatment arm to the other arm. In a conventional superiority trial, an intention-to-treat analysis usually decreases the difference in the outcome between the study groups and provides a conservative approach to analysis by shifting results towards the null hypothesis (the hypothesis of no difference). Similarly, in a non-inferiority trial an intention-to-treat analysis may decrease observed differences between the compared arms. Here, however, a smaller observed difference biases the results towards the alternative hypothesis of non-inferiority, so the analysis is no longer conservative. Therefore, we share the opinion that a non-inferiority trial should be evaluated both by "intention-to-treat" as well as "per protocol" analyses.2,3 If non-inferiority is found in both analyses, we can be reasonably certain that both treatments are in fact similar if study design and conduct were appropriate.
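How crossover shrinks the difference seen in an intention-to-treat analysis can be shown with simple arithmetic. The sketch below rests on simplifying assumptions that are not from the article: the same fraction of patients crosses over in each arm, and a patient's chance of a good outcome depends only on the treatment actually received. The function name and the efficacy values are hypothetical.

```python
def itt_observed_difference(p_standard, p_new, crossover):
    """Expected ITT difference (standard minus new) when a fraction
    `crossover` of each arm actually receives the other arm's treatment.

    Assumes symmetric crossover and outcomes driven only by the
    treatment received (illustrative simplifications).
    """
    if not 0.0 <= crossover <= 1.0:
        raise ValueError("crossover must be between 0 and 1")
    arm_standard = (1 - crossover) * p_standard + crossover * p_new
    arm_new = (1 - crossover) * p_new + crossover * p_standard
    return arm_standard - arm_new


# Hypothetical true efficacies: 75% (standard) versus 60% (new).
for fraction in (0.0, 0.1, 0.2):
    diff = itt_observed_difference(0.75, 0.60, fraction)
    print(f"crossover {fraction:.0%}: ITT difference {diff:.2f}")
```

Under these assumptions the observed difference equals the true difference multiplied by (1 − 2 × crossover fraction), so a true difference that exceeds a given margin can fall below it once enough patients cross over. This is why a per-protocol analysis is a useful companion in the non-inferiority setting.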
Bio-Creep
Bio-creep refers to the phenomenon that the efficacy of investigational treatments could degrade over time as they are repeatedly compared to less efficacious treatments.2 Although the term bio-creep is mostly used in the context of drug studies, the phenomenon may also occur in surgical clinical trials. For instance, a standard surgical procedure (A) with an efficacy of 75% was originally determined to be superior to an initial procedure (0), which had an efficacy rate of 55%. In a non-inferiority trial, a new, less-invasive procedure (B) was compared to the standard procedure (A) and was found to be non-inferior with a delta less than 20% and efficacy rates of 60% (procedure B) and 75% (procedure A). Another investigational surgical procedure (C) was later found to be non-inferior to procedure (B) with a delta less than 20% and efficacy rates of 45% (procedure C) and 60% (procedure B). Thus, by these criteria, one could erroneously assume procedure (C) = procedure (B) = standard procedure (A), which was superior to the initial procedure (0), when in fact the efficacy of procedure C may be worse than historically demonstrated for procedure (0) (Fig. 1). This phenomenon is termed "bio-creep" and underscores the cardinal importance of selecting an appropriate comparator therapy and non-inferiority margin for a non-inferiority trial.2
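The arithmetic of this example can be checked in a few lines. The sketch below uses the efficacy rates and the 20% margin quoted in the text, but it compares point estimates only and ignores confidence intervals, which is a deliberate simplification; the helper name `point_estimate_noninferior` is hypothetical.

```python
def point_estimate_noninferior(p_reference, p_candidate, delta):
    """True if the candidate is worse than the reference by less than delta.

    Point estimates only; a real analysis would use a confidence bound.
    """
    return (p_reference - p_candidate) < delta


# Efficacy rates from the example in the text.
efficacy = {"0": 0.55, "A": 0.75, "B": 0.60, "C": 0.45}
DELTA = 0.20

b_vs_a = point_estimate_noninferior(efficacy["A"], efficacy["B"], DELTA)
c_vs_b = point_estimate_noninferior(efficacy["B"], efficacy["C"], DELTA)
c_vs_a = point_estimate_noninferior(efficacy["A"], efficacy["C"], DELTA)
c_worse_than_initial = efficacy["C"] < efficacy["0"]

print("B non-inferior to A:", b_vs_a)
print("C non-inferior to B:", c_vs_b)
print("C non-inferior to A:", c_vs_a)
print("C worse than initial procedure 0:", c_worse_than_initial)
```

Each step passes the margin on its own, yet the cumulative loss from A to C exceeds the margin and C falls below the initial procedure. Comparing each new procedure against the original standard, rather than against the most recent non-inferior one, avoids this drift.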
| EXAMPLES OF NON-INFERIORITY TRIALS IN SURGICAL ONCOLOGY |
|---|
On the basis of all these advantages, laparoscopic colon surgery would seem preferable to open colectomy. However, until recently, it was unknown whether laparoscopic surgery was as good as (non-inferior to) open surgery with respect to disease-free and overall survival. Thus, the aim of this trial was to evaluate whether disease-free and overall survival are equivalent (non-inferior) after the laparoscopic approach compared to the conventional open approach. The trial was designed with a power of 81% (type II error probability: 19%) to declare that the laparoscopic procedure is inferior if indeed it were associated with at least a 23% increase (non-inferiority margin, delta) in cancer recurrence. Alpha (type I error probability) was set at 5%.
After a median follow-up of 4.4 years, both disease-free and overall survival were similar in both arms, suggesting that the laparoscopic-assisted colectomy is a viable alternative with respect to efficacy (disease-free and overall survival) while having significant advantages over the conventional approach.
Example 2
A multicenter randomized clinical trial in surgical oncology serves as another excellent example of a non-inferiority trial.39 This study enrolled patients with cancer of the distal half of the stomach. Patients were randomized to undergo either total gastrectomy (arm 1) or subtotal gastrectomy (arm 2). The primary endpoint of the study was overall survival at 5-year follow-up.
This non-inferiority trial aimed to resolve a decade-long controversy over whether to perform a total or subtotal gastrectomy in patients with distal gastric cancer. Previously, a single randomized controlled trial addressed this relevant research question in 1989.40 However, this previous study was methodologically suboptimal (e.g., underpowered trial, lacking control for prognostic variables). A total gastrectomy may reduce the likelihood of recurrence at the proximal resection margin, and may eliminate the risk of developing a metachronous carcinoma in the gastric remnant, suggesting a superiority in terms of oncologic outcome. However, former non-randomized studies did not show a better overall survival in patients undergoing total gastrectomy compared to those having a subtotal gastrectomy.41–44 Furthermore, it is well known that a total gastrectomy, which requires an esophageal anastomosis, is the more extensive operation and may be associated with increased morbidity, nutritional deficiencies, loss of the physiologic functions of the gastric remnant, and a reduced quality of life.45–47 Therefore, it would be of great clinical and potentially economic benefit if the more extensive total gastrectomy could be replaced by a less traumatic procedure, such as the distal gastrectomy in patients with distal gastric cancer.

Hence, a non-inferiority trial was designed to compare 5-year overall survival rates between patients undergoing total and subtotal gastrectomy. Total gastrectomy was taken as a reference treatment, with an expected 5-year survival of 50%. The null hypothesis (H0) stated that the 5-year survival for subtotal gastrectomy was 40% or less. Conversely, the alternative hypothesis (Ha) stated equivalent 5-year overall survival rates for total and subtotal gastrectomies, with a non-inferiority margin of 10%, a type I error (alpha) probability set at 0.05, and a type II error (β) probability set at 0.2. Power calculations based on these assumptions resulted in a total sample size of 600 patients. A total of 618 patients were recruited; hence this trial was adequately powered.

The estimated 5-year overall survival was 62.4% in the total gastrectomy group and 65.3% in the subtotal gastrectomy group. Therefore, the null hypothesis was rejected and the alternative hypothesis of equivalence of both treatment options was accepted. This non-inferiority trial considerably contributed to our current surgical practice of performing subtotal gastrectomy in patients with distal gastric cancer, and helped to improve the post-operative quality of life for many of these patients.
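The decision rule behind such a conclusion can be sketched with a one-sided confidence bound on the survival difference. The code below is an illustration only and not the analysis performed in the trial. It treats the reported 5-year survival estimates as simple proportions (ignoring censoring and survival-analysis methods) and assumes the 618 patients were split equally, 309 per arm, because the arm sizes are not given in this text. The function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def noninferiority_upper_bound(p_standard, n_standard, p_new, n_new, alpha=0.05):
    """One-sided upper confidence bound for (standard minus new).

    Normal approximation for two independent proportions; non-inferiority
    is supported when this bound stays below the margin delta.
    """
    diff = p_standard - p_new
    se = sqrt(p_standard * (1 - p_standard) / n_standard
              + p_new * (1 - p_new) / n_new)
    return diff + NormalDist().inv_cdf(1 - alpha) * se


DELTA = 0.10          # non-inferiority margin quoted in the text
N_PER_ARM = 309       # ASSUMPTION: equal split of the 618 recruited patients

upper = noninferiority_upper_bound(0.624, N_PER_ARM, 0.653, N_PER_ARM)
is_noninferior = upper < DELTA
print(f"upper bound of survival deficit: {upper:.3f}, non-inferior: {is_noninferior}")
```

Under these assumptions the upper bound of the plausible survival deficit for subtotal gastrectomy stays well below the 10% margin, which is consistent with the rejection of the null hypothesis described above.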
| CONCLUSIONS |
|---|
We hope that the present overview clarifies some of the most important aspects of non-inferiority trials and helps the surgical oncologist in the interpretation and implementation of their results.
| QUALITY OF A NON-INFERIORITY TRIAL: A CHECKLIST FOR THE SURGEON |
|---|
Received for publication August 14, 2006. Accepted for publication October 27, 2006.
| REFERENCES |
|---|