HER2 and Herceptin represent the biggest success story in breast
cancer of the last two decades. All might be as advertised, but growing
evidence raises key questions: Does HER2 drive breast cancer? Is it prognostic?
Is the definition of HER2-positive clinically validated? Are HER2 assays
reproducible? Can re-testing HER2 status change trial outcomes? Does HER2
predict benefit from Herceptin? Does Herceptin’s mechanism of action depend on
HER2? Has Herceptin increased five-year breast cancer survival at the
population level? Would Herceptin be approved by the FDA today? In the most
recent trials of Herceptin and T-DM1, can the spectacular success of adding
pertuzumab to Herceptin be squared with the failure of pertuzumab to make any
difference when given with Herceptin-based T-DM1?
Examination of these questions (ten in total) casts doubt on
the validity and clinical utility of the HER2 subtype and current treatment
guidelines for Herceptin in breast cancer.
1. Does HER2 drive breast cancer?
Summary: Evidence
conflicts regarding whether HER2 drives cancer in humans; the decades-old
experiments should be repeated to resolve the question definitively. Today, HER2
is the sole “Class I” oncogene that works by overexpression or amplification
rather than mutation; no other type of cancer with a targeted therapy is driven
by overexpression or amplification of the target. GRB7 is frequently
co-amplified with HER2 and might be required to drive cancer, or GRB7 might contribute
to oncogenesis independently of HER2. Conceivably, neither HER2, GRB7 nor any
of the genes on the amplicon they share drive oncogenesis.
Robert Weinberg’s lab discovered that mutation
of neu in rats drives cancer. But the
analogous gene in humans, HER2, is
rarely mutated in breast cancer. Instead HER2 is either amplified or its
protein product overexpressed. However, Weinberg’s lab found
that amplifying neu 100-fold and increasing
expression 10-fold did not transform mouse cells from normal to cancerous. Subsequently,
the lab of Stuart Aaronson performed
two similar experiments in the same mouse NIH/3T3 cell line but amplifying HER2
rather than neu. The first experiment
confirmed Weinberg’s finding. But a second test used a different promoter that
ratcheted HER2 expression five to ten times higher than in the first experiment,
resulting in transformed cells, according to the researchers. Three decades
later, however, Weinberg seems unpersuaded: the “report of Aaronson… may or may
not have been independently replicated over the past 30 years since it appeared,”
Weinberg wrote in an email.
The experiments should be repeated. (I was not able to
determine if HER2 has been shown to be transforming in human cell lines.)
HER2 is the only “Class I” oncogene that drives by overexpression and/or amplification
A 2004 census
of amplified and overexpressed oncogenes found that just six met the most stringent
“Class I” criteria. But by 2010, this fell to three: EGFR, AR
and HER2. Now in 2016, HER2 stands alone as a Class I gene.
Class I genes must meet the lesser criteria of Class II and
III genes and also require “that a drug that targets the encoded protein is
used to treat patients for which efficacy must have been shown in clinical
trials.” For EGFR in colorectal cancer, the targeted agent is Erbitux. However,
although the FDA label
for Erbitux lists EGFR/ERBB1 expression as an indication, the “consensus is
that ERBB1 expression not required for therapeutic success,” according to Bert
Vogelstein at Johns Hopkins University. In prostate cancer, amplification of the
androgen receptor gene (AR) does not initiate prostate cancer but results from treatment.
That leaves HER2 in a class by itself.
I asked Mike Stratton, a co-author of the 2010 census paper
and director of the Sanger Institute, if HER2 was indeed the sole Class I gene.
Stratton replied: “I think that what you say is correct.”
A fundamental principle of targeted therapy is to attack
tumor cells without harming normal cells. Consequently, targets are usually
mutations. Herceptin’s ostensible target, however, is unmutated HER2. There is
no driving oncogene for any human cancer where a targeted therapy aims for an
unmutated target—except HER2 breast cancer.
GRB7
In cancer cells, amplified regions are quite common. The
near absence of Class I genes, according to the 2010 census paper, “reflects
the difficulty encountered in identifying the true cancer gene on amplicons
that often include several candidate genes.” GRB7 resides on the same amplicon
as HER2, and one research group found that both genes
are co-amplified in 15% of invasive breast cancers. The same group suggested
HER2 was not transforming by itself but required GRB7 co-amplification, creating
the possibility that a “combination of multiple genes, which do not have
independent transforming activity, causes transformation.” That is, HER2 might
not be transforming. An analysis of the
Cancer Genome Atlas concurred that “GRB7 may
be necessary for cancer cells harboring this amplicon, as previously suggested.”
Betsy Ramsey’s lab also studied GRB7
and HER2. In an email, Ramsey summarized: “Certainly GRB7 seems to be a player
with or perhaps without HER2.”
2. Is HER2 prognostic in breast cancer?
Summary: Individual
studies come to disparate conclusions about HER2 as prognostic in breast cancer.
Reviews of the literature have been tagged with Expressions of Concern, leaving
belief that HER2 is prognostic unsupported.
Women testing HER2-positive opt for more radical surgery
than patients found to be HER2-negative. According to an analysis of more than 113,000 women, “mastectomy
rates were higher in women with HER2-positive tumors than in those with HER2-negative
tumors.” The reason is not known although the negative prognosis associated
with HER2 breast cancer, long considered as “aggressive,” might be leading
doctors and patients to pursue more aggressive treatment. But the evidence no
longer supports HER2 as prognostic.
The HER2 prognostic literature begins with a 1987 paper from Slamon et al. which
considered 86 node-positive patients from an unrelated clinical trial. (This might
have made the cases non-random. In addition, the authors did not discuss
whether the 86 were all from the same arm, raising the possibility of treatment
confounding the analysis.) Slamon and colleagues found a statistically
significant association between degree of HER2 amplification and recurrence
and, to a lesser degree, survival: greater HER2 amplification increased the
risk of recurrence and shortened survival. But when simply separating cases
into amplified (n=52) vs. non-amplified (n=34), no difference in recurrence or
survival emerged. To show such a difference and that HER2 was prognostic, the
researchers dropped patients with a middling HER2 copy number of 2-4 (n=23)
from the analysis. Based on the few remaining patients (n=11), with gene copy
numbers of five or more, the authors reported a statistically significant
difference in disease free survival (p=0.015) although still no difference in
survival (p=0.06). Thus was born HER2’s reputation as an aggressive form of
breast cancer.
More than one hundred studies followed Slamon et al., with
an array of disparate results that might be expected given the methodology of
the founding paper. However, the contradictory mess resolved into iron
scientific consensus: “Initial conflicting reports regarding the prognostic
relevance of HER2 were resolved with improved methodologies,” wrote Mark
Moasser, “and the overwhelming data now confirms this initial [Slamon et al.] landmark
genetic-biologic finding.” The overwhelming data, according to Moasser, was
“nicely reviewed” in a paper by Jeffrey Ross and colleagues.
Ross served as first author on three reviews of the HER2
prognostic literature, published in 1999, 2003, and 2009. However, in March of
this year, all three reviews received an Expression of
Concern from The Oncologist. (The EOC was prompted by my re-analysis
of the Ross et al. reviews.)
The 2009 review included 107 papers. No search or inclusion
criteria were specified, creating the possibility of selection bias. More
important than how the 107 papers were chosen, however, the review contains
errors on 30. More than one in four (28%) of the papers reviewed are
mis-reported or should not have been included. For example, a paper from Battifora et
al. is misreported: Ross et al. categorized it as “yes” under multivariate
analysis of prognostic factors when the paper clearly did not find that HER2
was independently prognostic:
“This analysis identified independent prognostic factors of
DFS and OS when all variables were considered together. Independent predictors
of DFS included stage of disease, histology, and nuclear grade. Nuclear grade
and stage were the only significant predictors of OS.”
There are 10 errors of this particularly blatant sort among
the 107 papers reviewed in Ross et al. 2009. A separate group of seven of the
107 papers conducted no multivariate analysis of whether HER2 was prognostic,
but Ross et al. reported that those studies did and that each of the seven
found HER2 independently prognostic. Six of the seven did not conduct a
multivariate analysis of any kind; one of the seven did but all patients were HER2-positive.
An additional set of 11 papers should have been excluded.
Nine of these correlated HER2 with different biomarkers, not clinical outcomes.
Among these, one paper
included 3,655 patients, by far the largest study in the review. Together,
these 11 papers contributed 7,511 (19%) of the 39,730 patients in the review.
Of these extraneous patients, 7,213 (96%) were adduced in support of HER2 being
independently prognostic.
I contacted Jeffrey Ross regarding these errors. Concerning
those in his 2003 paper, Ross acknowledged “scattered errors.” Ross disputed
none of the 30 errors I identified.
There do not appear to be other literature reviews showing
HER2 to be prognostic in breast cancer, leaving that belief unsupported.
3. Is the definition of HER2-positive clinically validated?
Summary: There is no
gold standard assay for definitively identifying HER2-positive tumors. The
preferred assay and cutoff values have changed and changed back over time. But the
modified standards are arbitrary, not clinically validated. The changes may
reduce interobserver disagreement but do not increase accuracy in identifying
true HER2-positives.
Changing assays: IHC vs. FISH
“There is no gold standard at present,” according to the
most recent guidelines
for HER2 testing, published in 2013. Perhaps for that reason, the preferred
method for determining HER2 status has changed over time. Early research
had shown that “amplification added little predictive value to the expression
data.” Instead, the assay of choice, immunohistochemical staining (IHC),
measured HER2 overexpression. The senior author of the paper, Dennis Slamon,
subsequently led the first Phase III trial of
Herceptin, for metastatic breast cancer. That study used immunohistochemical
analysis and led to the first FDA approval of Herceptin in 1998.
However, not long after, re-analysis by the
trialists showed that amplification measured by fluorescence in situ
hybridization (FISH) predicted response better than IHC:
“FISH assays have higher sensitivity and higher accuracy and
more frequently correctly identify altered HER-2/neu status
(amplification/overexpression) in previously molecularly characterized
specimens than did the FDA-approved immunohistochemistry assays interpreted
manually.”
Of note, two of the authors, Michael Press and Dennis
Slamon, also co-wrote the paper nine years before which found that “amplification
added little predictive value to the expression data.” Now it was the opposite.
Slamon-led researchers ultimately dismissed
IHC with prejudice in 2005: “We do not consider immunohistochemistry screening
for entry to clinical trials or for selection to Herceptin immunotherapy to be
an acceptable strategy.”
The FDA, not long after approving Herceptin and IHC, voiced
its displeasure about HER2 testing and the “many unanswered questions regarding
HER2 detection systems…” The agency threatened to change the Herceptin label
because of the “considerable confusion and misunderstanding on the part of the
oncology community,” which was “significant enough to warrant general
precautionary comment in the trastuzumab [Herceptin] label…”
By 2006, HER2 testing was broken, “a disorganized practice”
with a “high rate of inaccuracy,” according to guidelines published that
year by the American Society of Clinical Oncology (ASCO) and the College of
American Pathologists (CAP). Both organizations had previously recommended HER2
testing but without seeing a need to specify how to do it.
Instead of choosing sides in the IHC vs. FISH debate, the
new guidelines put the two assays together. No single kind of test sufficed to identify
all HER2-positive tumors. An equivocal result from one assay would trigger
another test using the other assay. The UK used this “two tier” system, and the
ASCO/CAP 2006 guidelines proposed the same approach for the United States.
Although the UK guidelines
detailed testing procedures, they did not cite clinical evidence in their support.
Instead, the UK authors “expected that emerging data on accuracy of prediction
of response to HER2 targeted treatments will influence the choice of testing
method.” The US guidelines also adduced no evidence in support of the two tier
system, acknowledging that, “current data are insufficient to define whether
these patients represent true- or false-positives.”
Some of the ASCO/CAP panelists had wanted to scrap IHC
entirely: “A minority view expressed within the panel was that IHC is not a
sufficiently accurate assay to determine HER2 status and that FISH should be
preferentially used.” Instead, between 80 and 90% of primary HER2 testing in
the United States is done with IHC, and only 10 to 20% uses FISH, according to figures from
2008. A 2010 paper
reported that 80% of HER2 assessments in the United States used the IHC-based HercepTest,
manufactured by Dako. FISH costs more and requires more expensive equipment, a
possible barrier for some hospitals and conceivably part of why FISH is not
mandatory.
Herceptin trials were no help in shaping the ASCO/CAP
guidelines: “the large prospective randomized clinical trials of trastuzumab
were not prospectively designed to answer these questions.” Instead,
retrospective “correlative studies” would have to suffice. Also, instead of
seeking a true gold standard, any new assays would be measured by how well they
reproduced the results of the old assays. According to the guidelines:
“Although a new HER2 assay ideally should have its clinical
utility validated using specimens from prospective therapeutic trials that
tested the effects of anti-HER2 therapy, the Update Committee recognizes that
the rarity of these valuable specimens requires that new HER2 assays be
approved on the basis of concordance studies comparing them with other
established HER2 tests.”
In addition, the guidelines aimed for concordance rather
than accurate diagnosis, with concordance substituting
for accuracy. Inaccurate diagnosis is not possible to detect because there is
no gold standard. By contrast, discordance can be measured, causes
embarrassment and raises concerns at the FDA. As the guidelines recognized,
however, interobserver agreement is not the same thing as accurate diagnosis: “concordance
of assays does not assure accuracy (i.e., how close the measured values are to
a supposed true value…).” Early UK guidelines
strove to reduce “interobserver variation in the assessment of staining” by
standardizing scoring “against known positive, negative, and borderline cases.”
I asked Ian Ellis, corresponding author of the long-ago UK guidelines paper,
whether the cases were clinically validated or if the staining patterns were
selected for the purpose of minimizing interobserver disagreement. Ellis did
not reply to multiple inquiries.
Cutoffs
In 2006, widespread HER2 assay discordance prompted ASCO/CAP
to announce more stringent cutoffs than the FDA because “the original US Food
and Drug Administration-approved interpretation guidelines provide insufficient
specificity.” The FDA had approved an IHC staining threshold of 10% of cells. But
the panelists believed this resulted in an unacceptably large number of false
positives. The guidelines recommended a higher cutoff of 30%, not based on published
evidence but anecdote, “the cumulative experience of panel members that usually
a high percentage of the cells will be positive if it is a true IHC 3+.”
The guidelines referred to “published reports using cutoff
values higher than 10%,” but the footnote pointed to a single study in
France which achieved 95% concordance between IHC and FISH by using a vastly higher
staining cutoff of 60%. Why the revised US guidelines recommended a 30% cutoff
is not clear.
The cutoff for FISH HER2/CEP 17 ratio was also raised, from
2.0 to >2.2. In addition to changing the definition of IHC3+, IHC 2+
patients were no longer considered HER2-positive as they had been in the original,
FDA approval-winning trial of Herceptin.
But then seven years after raising the IHC thresholds, the
ASCO/CAP guidelines rolled them back. The 2013 guidelines committee
“decided to revert to the previously used IHC criterion of more than 10% cells
staining.” Re-examination
of one trial with 2,904 patients found that the higher threshold only excluded 107
or 3.7%. This seemed to contradict the “cumulative experience” of the 2006
panel that “usually a high percentage of the cells will be positive if it is a
true IHC 3+.” Now it seemed to be remarkably rare for an IHC 3+ tumor to have
staining that exceeded 10% but fell short of 30%.
The re-examination found that patients not meeting the
higher ASCO/CAP guidelines showed no difference (p=.55) in disease free
survival whether they received Herceptin or not. But the guidelines were
switched back nonetheless.
The threshold for IHC had been raised in 2006 out of concern
for false positives. But a then-ongoing study
assuaged this worry. The 2013 guidelines said that study found “less than 6% of
patients initially considered eligible were not subsequently centrally
confirmed as being HER2-positive.” The US, at least, appeared to have its house
in order.
The 6% figure came from a central laboratory at the Mayo
Clinic that re-examined about one thousand locally-tested patients. However, central
vs. local testing in Europe of more than 8,000 patients yielded discordant
results in 15% of cases. And when the two US and European central labs later compared
results, examining only samples known to be false negatives, they differed on IHC
scores for 6 of 25 cases (24%) while FISH scores differed for 3 of 25 (12%).
Moreover, the Mayo Clinic systematically assigned higher IHC and FISH scores to
a set of 23 cases previously judged as equivocal in local testing. Of the 23, the
Mayo Clinic found 15 to be HER2-positive, versus 11 according to the
European lab. In other words, the Mayo Clinic might have generated a high
percentage of false positives, the concern that had led to raising the IHC
staining threshold to 30%, which the 2013 guidelines rolled back.
Which central lab was right? Did the high US concordance
rate mean US labs were correctly identifying true HER2-positive patients and
the European lab was wrong? “It is not possible to know,” according to the ring
study paper, “which central laboratory determination of HER2 status… was
biologically correct in terms of distinguishing patients who do or do not
benefit from HER2-targeted… therapies.”
4. Are HER2 assays reproducible?
Summary: Neither IHC nor FISH, whether performed locally or centrally,
generates reliably reproducible HER2 results. Re-examination
of the clinical trials leading to FDA approval of Herceptin found discordant
HER2 status in as many as 26% of patients. Technical shortcomings of both IHC and
FISH assays contribute to reproducibility problems. Each laboratory may be
using its own cutoff criteria in judging these semi-quantitative assays. The
type of assay, its manufacturer and who performs the test can decisively
influence the HER2 status of any given patient.
The major trials leading to FDA approvals of Herceptin in
breast cancer have been re-examined, producing substantially different results for
patients’ HER2 status. As mentioned, the breakthrough trial leading to the
first FDA approval of Herceptin depended on IHC. But re-examination found
amplification measured by FISH predicted response to Herceptin more accurately;
i.e., FISH did not reproduce IHC.
Central testing frequently contradicts local results, a
problem which affected both US trials that supported FDA approval of Herceptin
in the adjuvant setting. Central retesting of patients in NCCTG N9831 failed to
reproduce a local HER2-positive result in as many as 26% of re-examined
cases. Similarly, retrospective analysis of NSABP B-31 found 18% of tumors were HER2-positive
according to local testing but HER2-negative by central testing using both IHC
and FISH.
Trial MA.31
compared Herceptin and lapatinib in metastatic breast cancer in 652 patients
found HER2-positive by local testing. However, central re-testing found 115
patients (18%) were not HER2-positive, although they had been enrolled in the trial
and treated with an anti-HER2 therapy.
False negatives are also a problem. A 2014 study of 552 locally HER2-negative patients found that
4% were HER2-positive by central testing.
Non-reproducibility also impacts central laboratory testing.
As mentioned, a US and European lab each examined the same set of 23 equivocal
HER2 cases. The US lab found 15 (65%) HER2-positive while the European lab
found only 11 (48%) HER2-positive.
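Those two counts alone put bounds on how often the labs could have agreed. What follows is a minimal sketch of that arithmetic, using only the figures quoted above (23 cases, 15 and 11 positives). The function name is mine, and because I did not have the per-case calls, the true overlap between the two labs is unknown.

```python
def disagreement_bounds(n_cases, pos_a, pos_b):
    """Return (fewest, most) cases on which two labs must disagree,
    given only how many positives each lab called on the same cases."""
    # The positives shared by both labs cannot exceed the smaller count ...
    max_both_pos = min(pos_a, pos_b)
    # ... and cannot fall below what the total number of cases forces.
    min_both_pos = max(0, pos_a + pos_b - n_cases)
    # A case is discordant when exactly one lab calls it positive.
    fewest = pos_a + pos_b - 2 * max_both_pos
    most = pos_a + pos_b - 2 * min_both_pos
    return fewest, most


# Hypothetical labels: "US lab" = 15 positives, "European lab" = 11 positives.
fewest, most = disagreement_bounds(23, 15, 11)
best_case_agreement = (23 - fewest) / 23
print(f"discordant cases: at least {fewest}, at most {most} of 23")
print(f"best-case agreement: {best_case_agreement:.0%}")
```

Even if every European positive was also a US positive, the two central labs would still have disagreed on at least 4 of the 23 equivocal cases.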
Contributing to reproducibility problems are shortcomings of
the tests themselves. In 2006, the ASCO/CAP HER2 testing panel had considered
throwing out IHC, the original technology for identifying HER2-positive breast
cancer. Skepticism, even condemnation, of IHC continues to this day. “The IHC
assay is lousy,” according to Bert Vogelstein at Johns Hopkins University. “No
IHC assay is great, many are inaccurate,” he added. FISH, according to Vogelstein,
“is not that great either, but it’s the best that the pathologists have.”
For both IHC and FISH, the handling of samples before the
test can affect test results. Also, the assays are only semi-quantitative. As
the FDA observed
in 2001, it “views both IHC and FISH as semi-quantitative if performed under
ideal circumstances.” In addition, “Both methods require subjective
interpretation.”
According to David Rimm, a HER2 testing expert at Yale
Medical School, each lab has its own cutoffs, which he considered a “dirty
secret.” (In reply to my initial inquiries about issues with HER2 testing, Rimm
replied: “You are about to uncover a landmine.”) The College of American
Pathologists (CAP) sends HER2 testing facilities samples to measure and
encourage adherence to common, cross-laboratory criteria for HER2-positives and
negatives. But according to Rimm, “the College doesn’t send them too many hard
ones,” possibly to avoid generating discordant, non-reproducible results.
An additional complicating and underappreciated problem is
that tumors are sometimes heterogeneous. A biopsy from one part of a tumor can
test HER2-positive while a sample from a different part of the tumor tests
negative. Researchers reporting such
a case wrote: “We do not know the frequency with which a disparity of this
degree occurs, but it is not even mentioned in reviews on this subject or
consensus guidelines published previously. We therefore assume that it must be
a rare phenomenon or one clearly underappreciated.”
According to Kornelia Polyak of the Dana-Farber Cancer
Institute, the phenomenon is not rare: “This is a pretty serious problem as we
see that ~30-40% of HER2+ tumors have high heterogeneity for HER2 itself within
the tumor.”
Not only do different labs sometimes disagree about assay
results for a given specimen; where the tissue sample comes from
in the tumor can also determine whether a patient is deemed HER2-positive or HER2-negative.
(Also tumor cells can interconvert between HER2-positive and HER2-negative
states. See question 6.)
5. Can non-reproducible HER2 assays alter trial outcomes?
Summary: Central and
local testing can produce conflicting HER2 assessments. In at least one trial,
central retesting changed the outcome of the study. There are implications not
only for bioethics but also for a forthcoming meta-analysis that attempts to measure Herceptin’s
effects across trials, some with multiple, conflicting HER2 assessments.
The outcome of trial MA.31
depended on whether local or central HER2 determinations were used. By central
testing, Herceptin extended life more than lapatinib; by local testing, there
was no difference. Recall that in MA.31, local testing identified 652 patients as
HER2-positive but central re-testing later found 115 (18%) weren’t HER2-positive.
This makes trial interpretation difficult or impossible, while
the timing of the tests created an ethical conundrum. The central re-testing
occurred during MA.31 according to
Karen Gelmon, the study’s corresponding author. Regarding the unusual timing,
Gelmon said: “the thought was to make it easier for the patients and doctors to
use local HER2 for randomization to avoid a delay in starting treatment…” However,
in speeding patients into the possible benefits of an untested treatment
regimen, the design and conduct of the trial resulted in centrally HER2-negative
patients being treated with anti-HER2 therapies.
Gelmon said “the central results are considered definitive.”
But most patients, including those that were centrally HER2-negative, appear to
have completed the study. Regarding this ethical conundrum, Gelmon said: “If
the central confirmation showed negative results it was up to the treating
physician to decide how to treat, which is always how it is, and they could
continue the Herceptin if they thought the local testing was valid.” Gelmon did
not reply when asked how many patients stopped receiving anti-HER2 treatment
after being found HER2-negative by central testing. It is not clear if patients
were informed of the equivocal test results. Anti-HER2 therapies are of course not
approved for use in HER2-negative patients because the benefits, if any, are
outweighed by toxicities and other side effects.
Although this ethical dilemma arose in a clinical trial, it
conceivably impacts every breast cancer patient. It is unclear whether to believe
local testing, central testing or neither. Gelmon doubled down on central
testing: “yes – central or validated testing is what should be recommended.” However,
the ASCO/CAP guidelines recommend only that the testing laboratory be
accredited. Gelmon simultaneously regards central testing as definitive but
supports optionally ignoring it. Edith Perez, prompted by FISH-negative cases
who later turned out HER2-positive by IHC, recommended
that “in the case of negative results, it’s advisable to repeat the test you
started with or to run a different test,” perhaps making it sound like testing
should be continued until the result is positive. Perez led the NCCTG N9831 study.
It is unclear that a coherent testing algorithm is
obtainable from the recommendations and practices coming from these clinical
trials.
CTSU
The Clinical Trials Service Unit (CTSU) at Oxford is
conducting a meta-analysis of Herceptin in early breast cancer. But how will
the study define “HER2-positive”? For trials with two sets of assessments, the meta-analysis
will have to choose between them (and ignore one set) or present results for
both local and central testing. However, not all trials re-tested HER2 status,
further complicating the aggregation of Herceptin’s effects across trials.
In addition, the type of assay used might be important and
worth reporting. “Scientists who actually do these assays (rather than see the
reports of the results) know that neither of these assays (FISH or IHC) are
particularly reliable on clinical samples,” said Bert Vogelstein. Given the
actual complexity and uncertainties in HER2 assessments, CTSU could (and
should) examine their accuracy, computing the likelihood of an assessment being
correct and/or linking assessment accuracy estimates to the confidence interval
around Herceptin’s clinical benefit. Since there is no way to identify “true” HER2-positives,
perhaps the best that can be done is to calculate the likelihood of concordance
or discordance if a sample were subjected to re-testing.
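As a rough illustration of that last suggestion, the sketch below estimates the chance that a locally HER2-positive sample is not confirmed on central re-testing, using the MA.31 counts cited earlier (115 of 652). The function name and the choice of a 95% Wilson score interval are my assumptions, not anything CTSU has said it will do, and a single trial's discordance rate need not carry over to other trials or assays.

```python
import math


def wilson_interval(events, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = events / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# MA.31: 115 of 652 locally HER2-positive patients were not confirmed centrally.
discordant, tested = 115, 652
rate = discordant / tested
low, high = wilson_interval(discordant, tested)
print(f"discordance {rate:.1%}, approx. 95% interval {low:.1%} to {high:.1%}")
```

A meta-analysis could report an interval of this kind alongside each trial's treatment effect, so readers can see how much uncertainty in HER2 status sits underneath the estimate of Herceptin's benefit.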
I asked CTSU’s Richard Gray: “Will your study use the
initial or retested results for those trials? How will your meta-analysis deal
with the mixture of protocols for determining HER2 status across trials?” Gray
replied indirectly: “One prime aim of the meta-analysis will be to investigate
whether there is benefit in HER2 receptor equivocal patients, and we’ll collect
results of all available local and central assays to look at this.”
However, arguably the real problem is how to perform a meta-analysis
of randomized clinical trials where a main variable, HER2, was not controlled. Kornelia Polyak believes Herceptin works, but observed:
“If you pick variable patients by definition you will have variable responses
leading to confusion.”
6. Is HER2 a valid biomarker that predicts benefit from Herceptin in breast cancer?
Summary: The published
literature no longer supports the validity of HER2 as a biomarker for Herceptin
in breast cancer. Both US trials leading to FDA approval of Herceptin in early
breast cancer later found HER2 did not predict benefit from Herceptin. In small
subgroups, HER2-negative patients appeared to benefit more than HER2-positive
patients, and Herceptin is now being tested in HER2-negative patients. Alternative
biomarkers for Herceptin have been proposed but none accepted.
The FDA approved Herceptin in 2006 for early breast cancer
based mainly on two US trials: NCCTG N9831 and NSABP B-31. Both trials later
announced some patients had been misdiagnosed as HER2-positive, making it
possible to examine clinical outcomes of patients negative for HER2 by central
testing who had been treated with Herceptin.
In 2007, one year after helping win FDA approval of
Herceptin, B-31 trialists reported neither FISH nor IHC predicted response
to Herceptin: “No statistical interaction was found between DFS benefit
from trastuzumab and levels of protein (p=0.26) or HER2 gene copy number
(p=0.60).” However, although the B-31 trialists wrote of “no statistical
interaction,” it appears that HER2-negative patients benefited more than HER2-positive patients. The
subgroups were small but nonetheless the researchers reported significant
values for each of them, turning the HER2 world on its head.
| Group | Assay result | Relative risk | p-value |
|---|---|---|---|
| HER2-negative | IHC- (0-2+) | 0.28 | 0.0033 |
| | FISH- IHC- (0-2+) | 0.36 | 0.032 |
| | FISH- | 0.40 | 0.026 |
| HER2-positive | IHC 3+ | 0.45 | <0.0001 |
| | FISH+ | 0.47 | <0.0001 |
HER2-negative patients
benefited more from Herceptin than HER2-positive patients in NSABP B-31. (Adapted
from Paik et al., 2007)
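To read the figures above: a relative risk below 1 favors Herceptin, and the lower the value, the larger the apparent reduction in risk. Here is a minimal sketch of that conversion, using only the point estimates reported above. It ignores confidence intervals, which are likely to be wide for subgroups this small, and the helper name is mine.

```python
# Point estimates of relative risk as reported above (adapted from Paik et al., 2007).
relative_risk = {
    "IHC- (0-2+)": 0.28,
    "FISH- IHC- (0-2+)": 0.36,
    "FISH-": 0.40,
    "IHC 3+": 0.45,
    "FISH+": 0.47,
}


def risk_reduction(rr):
    """Relative risk reduction implied by a relative risk: 1 - RR."""
    return 1.0 - rr


# Order subgroups from largest to smallest apparent benefit (lowest RR first).
ranked = sorted(relative_risk, key=relative_risk.get)
for subgroup in ranked:
    rr = relative_risk[subgroup]
    print(f"{subgroup:<18} RR {rr:.2f} -> {risk_reduction(rr):.0%} lower risk")
```

Ranked this way, all three HER2-negative subgroups come ahead of both HER2-positive subgroups, though point estimates alone cannot show whether those differences are statistically meaningful.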
In 2013, B-31 trialists sought
a new biomarker, reiterating that “HER2 itself failed to show predictive interaction
with trastuzumab…”
The second trial key to FDA approval, NCCTG N9831, corroborated
B-31’s finding that FISH did not predict response to Herceptin. A 2010 re-analysis
of N9831, again by the original investigators, found “Trastuzumab
benefit seemed independent of HER2/centromere 17 ratio and chromosome 17 copy
number,” i.e. independent of FISH.
The two trials which had established HER2 by IHC and/or FISH
as the biomarker for Herceptin subsequently disestablished both. On this basis
alone, HER2 would seem to be no longer a valid biomarker for predicting
response to Herceptin and, logically, no longer a valid
indication for treatment with Herceptin. We have known this since 2010.
FISH and IHC might have found redemption in Hera, the trial
that led to approval of Herceptin in the adjuvant setting in Europe and also supported
FDA approval in the US. Post-approval, Hera trialists examined the
relationship between degree of HER2 amplification by FISH and benefit from
Herceptin. But they chose not to examine 41 patients with a FISH ratio under
two, i.e. HER2-negative by FISH. “We deemed it inappropriate to analyze this
small group,” wrote the investigators. Consequently, they could not say how
FISH-negative patients responded to Herceptin and whether or not IHC by itself
predicted response.
Without looking at the FISH-negative group, researchers
continued to posit “a strong threshold effect whereby any degree of
amplification above the cutoff ratio of 2.0 is of equal clinical significance.”
However, neither B-31, reported earlier, nor N9831 found such a
threshold among a combined 330 FISH-negative patients. The Hera team simply
looked away.
The Hera trialists also examined IHC
staining intensity and clinical outcomes. This time, IHC-negative patients were
not included, which prevented analysis of whether FISH by itself predicted response to
Herceptin. Hera investigator Mitch Dowsett explained, in June 2014: “Because of
our policy on recruiting only centrally confirmed HER2-positive cases to Hera
we were not in a position to do this.” However, apprised that Hera enrolled at
least 299 centrally confirmed, HER2-positive patients who were IHC-negative,
Dowsett revised his explanation. “I think ‘policy’ is overstating things. We
could and maybe should have looked at this group in more detail previously.” But,
“prompted by a UK pathologist,” rather than the failure of IHC to predict response
in B-31 and N9831, Dowsett said the Hera trialists would examine the IHC-negative,
FISH-positive subgroup.
In August 2015, more than a year later, I asked Dowsett how
the project was going. “The work was conducted and a manuscript created,” he
replied. But then the primary investigator, Bharat Jasani, “left for [a] job in
Kazakhstan,” said Dowsett, stalling the investigation. I emailed Jasani and
asked: “Were you examining IHC-negative, FISH-positive cases from Hera before
leaving for Kazakhstan?” Jasani seemed to contradict Dowsett: “The simple
answer is no and I would like to confirm once again that I have not examined at
any time any IHC-negative, FISH-positive cases from Hera.”
Analyses of key subgroups in the Hera trial appear to have
been avoided. As it stands, every re-test of any assay used to assess HER2 in
the FDA approval-winning trials in early breast cancer found that that assay
did not predict benefit from Herceptin, or that being HER2-negative predicted
greater benefit.
Similar to the Hera trialists’ evasions, HER2 testing experts
also avoided addressing HER2’s validity as a biomarker when I raised the issue
to them in 2014. I emailed John Bartlett, at the Ontario Institute for Cancer
Research, asking: “what established HER2 as a biomarker and what data informed
the cutoff point for positive vs. negative?” Bartlett previously co-authored HER2 testing
guidelines. His assistant replied: “John says he should be able to answer it
via email.” However, Bartlett eventually wanted to speak by phone. When I
requested email, the assistant wrote back: “Unfortunately Dr. Bartlett is
unable to answer this question.”
The lead author of HER2 testing guidelines, Antonio Wolff,
wrote me that FISH “Absolutely yes” predicts response to Herceptin even though
N9831 and B-31 showed it did not. “I do fear that the dots you are connecting
don’t quite tell a story,” Wolff said. Rather than explain, he wrote: “I think
I will stop here.” He asked to speak by telephone but would not allow the call
to be recorded: “Recording our conversation will not be ok and you do not have my
permission.” He added: “My goal was to walk you through your questions
informally as an expert source.” Arguably, Wolff declined to go on record to
explain why FISH remained valid.
Reliably identifying HER2-positive patients might be
impossible. According to Daniel Haber, at Massachusetts General Hospital, “Whether
there are ‘true HER2’ tumors or not is up for discussion.” Instead of HER2
status predicting response to Herceptin, response to Herceptin determines who
is HER2-positive. Said Haber: “the real definition is probably whether they [patients]
respond to HER2 therapy or not…” But if so, HER2 is not a valid biomarker for
Herceptin, and Herceptin has no valid biomarker, and prescribing Herceptin for HER2-positive
patients makes no medical sense.
7. Does Herceptin’s mechanism of action depend on HER2?
Summary: Recent
research suggests Herceptin does not block HER2 signaling, once considered its
mechanism of action. No new mechanism of action has been clearly established.
Some researchers believe Herceptin might work in HER2 0 patients, i.e.
independently of HER2 status.
Herceptin does not block HER2 signaling
“[T]he talking points,
the posters, the advertisements, are all about ‘HER2 blockade.’ It makes a good story, much simpler to
understand, very pretty pictures, and nicely amenable to commercialization.
Unfortunately it's not true.”
So wrote Mark Moasser, at the University of California at
San Francisco, in an email. More formally, in a published paper, Moasser
wrote that Herceptin “was developed on the basis of 1980s understanding of
HER2, and it is now clear that it does not actually inhibit HER2 signaling
functions very well.” Tyrosine kinase inhibitors like lapatinib do block HER2 signaling but the clinical
benefits of lapatinib are scant. Dual anti-HER2 therapy in which lapatinib is
added to Herceptin showed no survival benefit in either the adjuvant or
neoadjuvant settings as tested in the ALTTO and NeoALTTO trials. A
lapatinib-only arm in ALTTO was closed early due to futility.
Additionally, there appears to be no consensus on whether a greater
degree of HER2 positivity, i.e. greater amplification or overexpression,
leads to more pronounced clinical benefit from Herceptin.
Also unexplained is how Herceptin might work in patients positive for HER2 by amplification
but negative for overexpression. Krop and Burstein further fragment the HER2
edifice: in wondering “qui bono” or
who benefits from Herceptin, they posit that “the mechanisms may differ in
early- and late-stage breast cancer.”
Also, tumor cells appear to convert back and forth between HER2-positive
and HER2-negative. Thus HER2 expression “identifies dynamic functional states,”
according to Jordan
et al. Interconversion may make tumors
heterogeneous for any HER2 signal and might partly explain difficulties linking
HER2 expression to any tumor phenotype.
NSABP B-47
HER2 may have nothing to do with Herceptin’s mechanism of
action: “We don’t know that trastuzumab would not work in the adjuvant setting
for HER2 0 patients,” according to Lou Fehrenbacher. Fehrenbacher is leading a trial, NSABP B-47,
which tests Herceptin in HER2-negative patients. The trialists considered
enrolling HER2 0 patients, but according to Fehrenbacher, this was deemed “too
adventurous.” It might have undermined the entire HER2/Herceptin story. Instead
of asking the question: “do Herceptin’s effects have anything to do with HER2,”
B-47 answers the question: “Should HER2 low patients also be treated with
Herceptin?” a stepwise distancing from current orthodoxy rather than quick, complete
abandonment. B-47, which only includes HER2 1+ or 2+ patients, might also
result in a large increase in patients treated with Herceptin. According to
Fehrenbacher:
“[T]he number of women with 1+ and 2+ non HER2-positive tumors
in the US, is 4x the number with HER2-positive. So if the trial is successful
the number of women benefiting from trastuzumab will rise to a level 500% of
the current number.”
By contrast, a trial design including HER2 0 patients might have
shown no relationship between Herceptin and HER2 or perhaps an inverse
relationship like the re-analysis of B-31.
B-47 represents an opportunity to test both whether
Herceptin works in HER2 0 patients and whether the degree of HER2 positivity
predicts greater benefit from Herceptin. B-47 should add an arm of HER2 0 patients
allocated to Herceptin or placebo. In addition, a partial arm of 3+ patients
should be added, all receiving Herceptin, to allow comparison of the drug’s
effect across the range of HER2 positivity, from 0 to 3+.
ADCC
There is no agreed-upon alternative mechanism of action for
Herceptin. A leading but unproven candidate is antibody-dependent cellular
cytotoxicity (ADCC). The current FDA label
says “Herceptin is a mediator of antibody-dependent cellular cytotoxicity,” but
only based on in vitro evidence. For clinical evidence, Mark Moasser pointed to
“A recent landmark study… that showed response/resistance to trastuzumab is
powerfully predicted by the immunological signature.” This re-examination of NCCTG
N9831 found “that
immune function genes are strongly linked to clinical outcome.” The authors proposed
a complicated signature consisting “of any nine or more of 14 immune function
genes at or above the 0.40 quantile for the population.”
But a critique by NSABP
B-31 trialists found that randomly selecting any 14 genes at any expression level yielded an interaction
p-value below 0.01 in 92% of 10,000 runs conducted using data from
731 patients in B-31. Consequently, “the conclusion that immune-related genes
are driving the observation may not be valid because this criterion can be
eliminated without effect on model performance.”
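The signature rule quoted above can be made concrete. What follows is a minimal Python sketch, not the authors’ code: it flags a patient as signature-positive when at least nine of 14 chosen genes sit at or above that gene’s 0.40 quantile, and it draws a random 14-gene set in the manner the B-31 critique describes. The function names, the linear-interpolation quantile and the synthetic expression values are all assumptions made for illustration. The interaction test itself is not reproduced here.

```python
import random


def quantile(values, q):
    """Linear-interpolation quantile. This convention is an assumption;
    the source does not say how the 0.40 quantile was computed."""
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def signature_positive(expression, genes, q=0.40, min_genes=9):
    """expression maps gene name -> list of per-patient values.
    Returns one bool per patient: True when at least min_genes of the
    chosen genes are at or above that gene's q-quantile."""
    n_patients = len(next(iter(expression.values())))
    cutoffs = {g: quantile(expression[g], q) for g in genes}
    flags = []
    for i in range(n_patients):
        hits = sum(expression[g][i] >= cutoffs[g] for g in genes)
        flags.append(hits >= min_genes)
    return flags


def random_gene_set(all_genes, size=14, rng=None):
    """Draw a random gene set, as in the critique's repeated runs."""
    rng = rng or random.Random()
    return rng.sample(sorted(all_genes), size)


# Synthetic data only: 50 hypothetical genes, 200 hypothetical patients.
rng = random.Random(0)
all_genes = [f"gene_{k}" for k in range(50)]
expression = {g: [rng.gauss(0, 1) for _ in range(200)] for g in all_genes}
chosen = random_gene_set(all_genes, rng=rng)
flags = signature_positive(expression, chosen)
print(sum(flags), "of", len(flags), "synthetic patients flagged")
```

Repeating the random draw many times and testing each resulting signature for interaction with treatment would be the next step in reproducing the critique; that step needs the trial’s outcome data and is left out.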
ADCC remains a hypothesis. “As far as I’m concerned, the
jury is still out whether Herceptin works by ADCC, through other indirect
mechanisms, or through interrupting some signaling pathway,” according to Bert
Vogelstein. “If it does involve ADCC, it would have to discriminate between low
and high amounts of cell surface ERBB2 protein.” It is “not so obvious” how
Herceptin might do this given the widely varying levels of ERBB2 protein on
cancer cells even in HER2-positive tumors.
In 2010, Edith Perez recommended
against Herceptin for HER2-negative patients. One reason: “It doesn’t make any
biological sense,” according to Perez. If Herceptin’s mechanism of action is
independent of HER2, seemingly it would not make biological sense to recommend
Herceptin for HER2-positive cases.
Ultimately, however, Mark Moasser is not worried that there is no
longer an agreed-upon mechanism of action for Herceptin: “At the end of
the day, it doesn’t really matter what the mechanism is, as long as it works.”
8. Would the FDA approve Herceptin today?
Summary: Herceptin’s
approval in the metastatic setting benefited from a new FDA fast track. The
single phase III trial providing the basis for approval underwent extensive
mid-trial modifications—adding different treatment arms and unlike
patients—practices that are no longer permitted. Avastin later faced a
different FDA process which led to revocation of its approval.
In early breast
cancer, had Herceptin’s FDA application been based on the re-analyses of NCCTG
N9831 and NSABP B-31, it presumably would have been rejected. The initially
impressive results presented to the FDA likely depended on heavy modifications
to the trials, including merging N9831 and B-31 while dropping one arm
that showed no survival benefit for Herceptin. It is unlikely such changes
would be allowed today. The trials were enabled and shaped by new NCI policies that
allowed cooperative groups like NCCTG and NSABP to conduct phase III trials in
support of FDA approval while permitting those groups to accept funding from
pharmaceutical companies.
Herceptin in metastatic breast cancer
Fast track
Genentech’s Herceptin first won FDA approval in 1998 for
metastatic breast cancer. With the process taking just five months, Herceptin benefitted
from being the second drug to come off a new
FDA fast track. Public perception at the time was that an approval logjam
was blocking life-saving cancer drugs from reaching patients. But going faster
required relaxing standards. On the fast track, “potential” effectiveness was
to be considered, the standard being only that “potential effectiveness of the treatment
should outweigh its toxicities.” These educated guesses would not necessarily have
to be checked later: “A post-approval study will not necessarily be required in
the exact population for which the approval was granted.”
Trial H0648g provided the basis of Herceptin’s FDA
application. But according to the FDA review,
“multiple major changes in the protocol were enacted during the conduct of the
study.” The biggest mid-course change added entirely new arms to the trial
after enrollment of only about 100 patients. The original design tested
Adriamycin and cyclophosphamide (AC) against AC + Herceptin (H). The new arms
tested a taxane (T) against T + H.
The new arms were then pooled with the original arms—to the
chagrin of the FDA. It considered patients in the AC and T arms as “clinically
distinct.” The taxane cohort represented a “different prognostic group” than
the AC patients and “baseline characteristics differed markedly between
paclitaxel and AC patients regardless of assignment to Herceptin therapy or
not.” However, the FDA acquiesced on pooling.
Remarkably, as arms were added, the double-blind with
placebo design was dropped and the trial became open label. “Patients and
investigators object to the placebo,” said the FDA, again accepting a fait accompli.
The trial found adding Herceptin to AC made no difference in
overall survival. Similarly, in the new taxane arms, adding Herceptin did not
increase survival. But the pooling of
the AC and taxane arms, which the FDA had frowned upon, produced a
statistically significant overall survival benefit, albeit with a confidence
interval touching 1.0. Absent the large, mid-course alterations to the trial, Herceptin
would have shown no survival benefit.
Avastin
Avastin, also from Roche/Genentech, lost its FDA approval
for treating breast cancer after post-approval trials failed to demonstrate a
survival benefit. Genentech proposed Avastin for treatment of metastatic HER2-negative
breast cancer. As with Herceptin earlier, an accelerated FDA application for
Avastin relied on an open label trial, E2100. Although the FDA initially approved
Avastin, the review
scolded the drug sponsor: “Genentech did not meet with FDA to reach agreement
on the design of Study E2100 prior to study initiation.” The FDA found a host
of problems with trial E2100 including the open label design and loss of
patients to follow-up:
“[T]he effect on PFS by an independent group, masked to
treatment assignment, was not implemented during the conduct of the trial. Retrospective
analyses by an endpoint review team masked to treatment assignment to
independently confirm the E2100 results was marred by substantial loss to
follow-up prior to the independent review team’s confirmation of disease
progression.”
In addition, the lack of independent review led to
investigator bias—toward Avastin. According to the FDA, “the discordance rates
are slightly different for the two study arms, with the difference favoring the
PAC/Bev [Paclitaxel/Avastin] arm over the PAC arm in ECOG
investigator-determined assessment of PFS.” The FDA also looked at missing
data and found that a worst case analysis resulted in “elimination of the treatment
effect altogether.”
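To illustrate what a worst case analysis does, here is a simplified Python sketch. It is not the FDA’s method for E2100, which concerned progression-free survival as a time-to-event endpoint; the sketch reduces the outcome to a yes/no event and uses invented counts. The idea is the same: every patient with a missing outcome in the treatment arm is counted as having the event, and every such patient in the control arm is counted as event-free, which is the least favorable assumption for the drug.

```python
def complete_case_risk_difference(treatment, control):
    """Control risk minus treatment risk, ignoring patients with missing
    outcomes. A positive value favors the treatment."""
    risk_t = treatment["events"] / (treatment["events"] + treatment["event_free"])
    risk_c = control["events"] / (control["events"] + control["event_free"])
    return risk_c - risk_t


def worst_case_risk_difference(treatment, control):
    """Same difference after worst-case imputation: missing treatment-arm
    patients count as events, missing control-arm patients as event-free."""
    n_t = sum(treatment.values())
    n_c = sum(control.values())
    risk_t = (treatment["events"] + treatment["missing"]) / n_t
    risk_c = control["events"] / n_c
    return risk_c - risk_t


# Hypothetical counts for illustration only; not taken from any trial.
treatment_arm = {"events": 40, "event_free": 45, "missing": 15}
control_arm = {"events": 50, "event_free": 35, "missing": 15}

print(complete_case_risk_difference(treatment_arm, control_arm))
print(worst_case_risk_difference(treatment_arm, control_arm))
```

With these invented counts the apparent advantage in the complete cases disappears under worst-case imputation. The more outcomes are missing, the easier it is for this to happen.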
The FDA examined financial ties to the sponsor and found that
five of the sixteen members of the data monitoring committee received
payments greater than $25,000 from Genentech. A sixth reported compensation
that “could be affected by the study outcome.” In addition, “Eight out of 26
investigators (30%) who provided financial disclosure in the E2100 study
administration body and data monitoring committee reported financial conflict
of interest for receiving payment from pharmaceutical companies.” One of the study
co-chairs “failed to reply to the Financial Disclosure requests.”
For Herceptin, by contrast, the FDA did not examine financial
ties of trial investigators. But when the study was published
in the New England Journal of Medicine, nine of the 12 authors reported
relationships with Genentech. The FDA allowed arms to be added mid-trial for
Herceptin whereas for Avastin, simply starting a trial without it being OK’d by
the FDA drew censure.
Subsequent testing of Avastin in a double-blind, placebo-controlled
design required by the FDA found no overall survival benefit, and the FDA revoked
its approval of Avastin for breast cancer. For Herceptin, showing a survival
benefit in the metastatic setting had required pooling arms the FDA regarded
as distinct, in an open label design.
Herceptin’s approval hurdles were lower; it might not have
met later, higher standards.
Re-analyses re-visited
In early breast cancer, the FDA approved Herceptin in 2006 based
on a joint analysis of NCCTG N9831 and NSABP B-31. But by 2007, B-31 trialists reported that HER2 didn’t predict response to Herceptin, whether measured by IHC or
FISH: “No statistical interaction was found between DFS benefit from
trastuzumab and levels of protein (p=0.26) or HER2 gene copy number (p=0.60).”
And although the authors reported no statistical interaction, HER2-negative
patients appeared to benefit more
than HER2-positive cases. Corroborating B-31’s results, N9831 trialists reported in
2010 that FISH did not predict response to Herceptin: “Trastuzumab benefit
seemed independent of HER2/centromere 17 ratio and chromosome 17 copy number…”
Had these results been presented to the FDA when considering
the application for Herceptin in early breast cancer, the application presumably
would have been rejected.
“Joint” trial
What the FDA saw in the Herceptin application was a single
successful trial, which was actually made from two studies merged together,
with one arm discarded. The unplanned changes were made while the trials were
in progress.
In N9831, Arm B tested sequential Herceptin in roughly one
thousand women and ultimately showed no
overall survival benefit from Herceptin: five-year survival for arm B was
89.3% versus 88.4% in the control arm. Arm B was dropped when N9831 was joined
to NSABP B-31.
Although the FDA went along with merging the trials, the
oncology community was divided. In 2006, one specialist noted that “In
terms of combining the data from the two trials, some oncologists were
initially questioning whether that was legitimate.” Sandra Swain, who was at
NCI when the trials were joined, answered it was “clearly legitimate.” She
asserted that the trials were combined because they were going well: “No one
had any idea that we’d have the benefit that we do.” However, joining trials
increases statistical power, enabling detection of weaker effects while,
obviously, dropping a low or non-performing group of patients might have
enhanced the perceived effects of Herceptin in the remaining arms.
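The point about statistical power can be sketched numerically. The Python below uses Schoenfeld’s approximation for a two-sided log-rank test with equal allocation, in which power depends on the hazard ratio and the number of events. The hazard ratio and event counts are hypothetical and do not come from N9831 or B-31; the function name is likewise an illustrative choice.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(hazard_ratio, events, alpha=0.05):
    """Approximate power of a two-sided log-rank test with 1:1 allocation
    (Schoenfeld approximation). The negligible chance of a significant
    result in the wrong direction is ignored."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    signal = abs(log(hazard_ratio)) * sqrt(events) / 2
    return NormalDist().cdf(signal - z_alpha)


# Hypothetical scenario: a modest true effect (HR 0.80), with 200 events
# in a single trial versus 400 events when two such trials are pooled.
single_trial = logrank_power(0.80, 200)
pooled_trials = logrank_power(0.80, 400)
print(f"single trial: {single_trial:.2f}, pooled: {pooled_trials:.2f}")
```

Under these assumptions, pooling substantially raises the chance of detecting the same modest effect. The sketch says nothing about the separate issue of dropping an arm, which changes the estimated effect itself rather than the power to detect it.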
An FDA spokesperson offered conflicting answers in 2014
regarding whether the individual trials would have met their endpoints,
initially saying: “The FDA cannot speculate on if the trials would or would not
have met their original endpoints.” But subsequently the spokesperson
speculated that the trials would have been “likely to demonstrate efficacy as
individual trials…”
No stand-alone efficacy results for B-31 have been published. A number of
researchers suggested in a letter, “Trastuzumab: possible publication bias,” published
by the Lancet in 2008, that the results of the individual trials should be
published separately. Asked in 2014 for efficacy data, NSABP’s Soon Paik
declined, saying only that “they are essentially the same as what is in the
combined analysis.”
A meta-analysis of Herceptin being conducted by the Clinical
Trials Services Unit (CTSU) at Oxford will include all N9831 patients,
including Arm B. According to CTSU’s Richard Gray: “We will analyse the
combined concurrent and sequential trastuzumab arms versus no trastuzumab from
the 3-way randomisation periods of N9831,” as well as the concurrent and
sequential arms separately.
Cooperative Groups
The N9831 and B-31 trials were conducted by cooperative
groups, the North Central Cancer Treatment Group (NCCTG) and the National
Surgical Adjuvant Breast and Bowel Project (NSABP) respectively. Originally,
cooperative groups were funded by NCI. However, in 2000, NCI allowed
cooperative groups to accept industry funding. And two years before, the FDA said it
would accept trials performed by cooperative groups to support applications for
FDA approval. Previously, cooperative groups mostly conducted phase II trials.
Arguably, these decisions transformed a public and publicly funded system
into one dominated by pharmaceutical companies. B-31 and N9831 were started
around the time of the new NCI and FDA policies, in July 1999 and April 2000.
In 2002, the FDA sought to tighten a number of clinical
trials policies, but they were opposed
by the cooperative groups. NSABP, joined by NCCTG, challenged the FDA reforms
in a letter signed by John Bryant, the statistician for the joint N9831/B-31
trial. The FDA had sought to treat the cooperative groups as a sponsor, perhaps
because they had begun receiving industry funding. Also, the FDA wanted more
blinding of study teams and greater independence of statisticians preparing
reports. But the NSABP letter answered
that “it will not be practical to arrange for statisticians independent of the
Cooperative Groups to prepare and present interim reports…” There were too many
trials and “simply not enough qualified personnel available to do so.” The
cooperative groups claimed these and other proposed changes would have a
“substantial negative impact” on clinical trials including even patient safety.
Concern about industry funding of previously trustworthy
cooperative groups surfaced at a 2009 NCI workshop on “Multi-Center Phase III
Clinical Trials and NCI Cooperative Groups.” As one participant said:
“If we do not have a robust independent review of these
trials, the criticism will be raised quite quickly that these trials are being
done by industry and that public dollars should not pay for them. What will
protect these trials is that they have a very robust independent review, not
just a cooperative group–only review.”
The FDA audited none
of the US Herceptin trials. According to the FDA medical review of the joint
N9831/B-31 trial, “A DSI [Division of Scientific Investigations] inspection was
not performed for this application; given the large number of sites and small
percentage of patients enrolled at any individual site, no single study or
limited number of sites would have substantial impact on the study results.” If
multiple sites and widely distributed patients protect against improprieties,
then perhaps no phase III trial would ever need to be audited.
The FDA was unable to confirm that the cooperative groups
audited their Herceptin trials: “Because of the nature of the conduct and
reporting of the clinical site audits, it cannot be determined whether a
specific study was audited during the clinical site inspection…” A statement by
the sponsor about audits provided “no information on the actual results of site
audits,” according to the FDA review of Herceptin.
(I suggested to Richard Gray that the CTSU Herceptin meta-analysis
could attempt to reproduce the results of the individual studies as one kind of
check on the un-audited trials.)
The possibility of investigator bias was not examined. The FDA
“did not request confirmation of the events by an independent endpoint
assessment panel that was masked to treatment assignment.” The FDA reported “approximately
4% of the population in the ITT efficacy dataset had missing information with
respect to surgical type, nodal status, hormone receptor status, tumor size,
histological grade and histologic type.” However, the FDA did not examine whether
the gaps could have influenced trial endpoints, whereas in the case of Avastin,
a worst case analysis found that missing data eliminated the reported treatment
effect.
Irregularities
In 2014, the FDA modified the Herceptin label
to state for the first time that the drug increases overall survival in early
breast cancer. However, the benefit was found in an “efficacy evaluable”
population rather than the gold-standard intention-to-treat (ITT) population. In
an April 2014 conference call, the FDA asserted that the ITT and efficacy
evaluable populations were identical and that the sponsor, Roche/Genentech,
requested that the label read “efficacy evaluable.” Why a pharmaceutical
company would request a lower grade of evidence for the lifesaving benefits of
its drug is not clear.
Also, in the joint trial, the benefit in disease-free survival shrinks with longer follow-up while
the benefit in overall survival grows. Perhaps only Provenge demonstrates a similar pattern
among cancer drugs. Provenge does not enjoy the same reputation for efficacy as
Herceptin.
“It is what [it] is,” N9831 statistician Vera Suman wrote in
an email.
| Report | Median follow-up | HR: disease event | HR: death |
| 2005 NEJM | 2.5 years | 0.48 (0.39-0.59) | 0.67 (0.48-0.93) |
| 2011 JCO | 3.9 years | 0.52 (0.45-0.60) | 0.61 (0.50-0.75) |
| 2012 SABC | 8.4 years | 0.60 (0.53-0.68) | 0.63 (0.54-0.73) |
Joint N9831/B-31 trial results
over time (Source: Vera Suman personal communication, 18 October 2013)
Also, in the final report on the joint study, years of
median follow-up took an unusually large, 4.5-year leap in the space of
approximately one calendar year. Rebecca Gelman, statistician at the Dana
Farber Cancer Institute, brought this to my attention in 2013:
“As a side comment,
this all leads me to wonder about the ‘8.4 years of follow-up’ in the 2012 SABC
abstract, since it is so much longer than the 2011 JCO paper. Either someone
did a big update of survival in 2012 (by calling all the patients or by
checking the National Death Index), or else the SABC abstract was reporting OS
at a time past the median survival).”
Another statistician described the leap in follow-up as
“impossible,” saying that median follow-up usually goes up about one year for
every calendar year. The FDA said the difference might be explained by the data
lock dates for the two papers. However, the agency didn’t provide dates that would
allow verifying their explanation.
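The rule of thumb that median follow-up rises about one year per calendar year can be shown with a small Python sketch. It assumes the simplest definition, time from each patient’s enrollment to the data lock date, with every patient followed to the lock. Real trials often estimate follow-up differently and stop counting at death, so this is an illustration rather than a reconstruction. The enrollment and lock dates are invented.

```python
from datetime import date
from statistics import median


def median_follow_up_years(enrollment_dates, data_lock):
    """Median time from enrollment to data lock, in years. Simplification:
    every patient is assumed to be followed up to the lock date."""
    return median((data_lock - d).days / 365.25 for d in enrollment_dates)


# Hypothetical enrollment dates, one patient per year from 2000 to 2004.
enrolled = [date(2000 + k, 1, 1) for k in range(5)]

at_first_lock = median_follow_up_years(enrolled, date(2010, 1, 1))
at_second_lock = median_follow_up_years(enrolled, date(2011, 1, 1))
print(at_second_lock - at_first_lock)  # close to one year
```

Under this definition, moving the lock date forward by one year moves the median forward by one year. The reported jump from 3.9 to 8.4 years, a gain of 4.5 years, would therefore need some other explanation, such as widely separated data lock dates or a large one-off update of survival status.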
9. Can trial estimates of survival increases be squared with population-level survival figures?
Summary: Some medical
researchers have suggested that therapies containing Herceptin may cure breast
cancer. A Genentech-funded study estimated Herceptin saved 156,413 total life
years in the United States from 1999 to 2013 for metastatic breast cancer alone.
However, NCI reports only a 1.1% increase in five-year survival over a similar
period. Estimates of Herceptin’s life-extending benefits should be compared to
population-level figures.
HER2 prevalence at the
population level is only 15%, according to NCI, well below early estimates of
25-30%.
Impact of Herceptin on five-year survival at the population level
At the 2012 SABC, presenting
joint N9831/B-31 results, co-primary investigator Edith Perez advanced the idea
that Herceptin cures breast cancer: “We believe that the data support the
concept that many patients who present with HER2-positive breast cancer may be
cured with combination strategies.” Herceptin had come a long way. Dennis
Slamon, a main progenitor of Herceptin, originally believed that, by itself,
Herceptin was only cytostatic, halting tumor growth which “resumed on
termination of antibody therapy, indicating a cytostatic effect.”
According to a Genentech-funded study, Herceptin has saved 156,413 total life years in the
United States from 1999 to 2013 for metastatic breast cancer alone. But it is
unclear if population level statistics corroborate Herceptin’s curative powers.
It is not known whether five-year survival has increased as much as it would
need to in order to match the Genentech-funded estimate of years of life added.
“We did not try to triangulate our results to the overall population,” said
corresponding author of the study, Mark Danese.
In the overall population, according to NCI’s Jenny Haliski,
“we are seeing a small increase in survival since 1998,” the year of
Herceptin’s first FDA approval. Haliski is NCI Media Branch Chief. “Part of
this increase can be attributed to improvements in treatment,” said Haliski. However,
clinical trials are conducted in “ideal situations and usually include younger
patients without comorbidity,” according to Haliski. “Thus, treatment efficacy
in a clinical trial is usually higher than treatment effectiveness at the
population level.”
It ought to be possible and instructive to decompose the
1.1% increase in five-year survival from 1999 to 2012 to determine the
contribution from Herceptin. As I have suggested to Richard Gray, CTSU’s
meta-study could and perhaps should try to square its estimate of Herceptin’s
benefits with population-based survival figures.
Herceptin won FDA
approval for metastatic breast cancer in 1998 and early breast cancer in 2006.
(Chart source: National Cancer Institute, SEER Cancer Statistics Review
1975-2013, Table 4.13, all ages, all races)
HER2 prevalence
Early estimates for HER2 breast cancer prevalence of 25-30%
(e.g. Slamon
et al.) have not been reproduced at the population level. NCI puts
prevalence at only 14.9%
based on SEER reporting.
10. Can the Cleopatra and Marianne trials be reconciled?
Summary: The Cleopatra
trial, which added pertuzumab to Herceptin and a taxane, produced the largest
survival increases of any clinical trial of Herceptin, nearly 16 months. But
the Marianne trial seems to contradict Cleopatra. Marianne tested a version of
Herceptin, T-DM1. Adding pertuzumab provided no more clinical benefit than T-DM1
alone. The phase II NeoSphere trial of pertuzumab and Herceptin also did not produce
the remarkable results of Cleopatra.
Adding pertuzumab to Herceptin and a taxane in the Cleopatra
trial yielded a remarkably large increase in survival, nearly 16 months longer
than the standard of care, Herceptin + taxane. However, the Marianne trial
seems to contradict Cleopatra: an arm testing pertuzumab with the
Herceptin-based T-DM1 did no better than Herceptin + taxane. As a notice on the
ASCO website said: “the addition of pertuzumab to T-DM1 provided no efficacy
benefit.” T-DM1 conjugates the cytotoxic emtansine to the Herceptin antibody.
Similarly, in the neoadjuvant setting, adding pertuzumab to
Herceptin showed no benefit in the NeoSphere trial which reported
that “progression-free survival and disease-free survival at 5-year follow-up
show large and overlapping CIs.” Pertuzumab by itself showed
very little single agent activity in a phase II trial, so the benefit of
combination with Herceptin is presumably synergistic. Why would it not also be
synergistic with T-DM1?
Paul Ellis, who led the Marianne trial, pointed to a “number
of possibilities and probably a mix of a number of issues” that explained why
pertuzumab showed no benefit. In Cleopatra, said Ellis, “patients have Taxol/Taxotere
as a backup” if they do not respond to Herceptin. However, T-DM1 by
itself performed just as well as Herceptin plus a taxane. No “backup” needed,
and the question is why including pertuzumab added nothing in Marianne.
Ellis also observed that the “Herceptin dose per week [was] higher
than T-DM1.” Yet the dose of T-DM1 was apparently high enough to perform as
well as H + T. And there does not appear to be support for another trial with a
different dose. Said Ellis, T-DM1 “will now never see the light of day” in
early breast cancer.
Also figuring in Ellis’ possibilities were “slightly
different patient populations.” However, the differences would need to be
extreme rather than slight: no response at all to pertuzumab in Marianne and
incredible life-extending responses among Cleopatra patients.
That leaves the idea that “maybe [T-DM1] binds differently
and alters configuration in a different way” than Herceptin. However, Ellis
acknowledged this directly contradicted expectation: “Every senior clinician I
know in his area expected Marianne to be positive!” In addition, prior to
Marianne, one research group reported
“T-DM1 plus pertuzumab resulted in synergistic inhibition of cell proliferation
and induction of apoptotic cell death” while another found
“Trastuzumab-DM1 (T-DM1) retains all the mechanisms of action of trastuzumab.”
Commented Ellis: “I think this study [Marianne] has forced them to go back into
the lab and try and understand it better.” According to Ellis, “even the guy at
Genentech who invented both Pertuzumab and T-DM1 can’t really understand why”
pertuzumab did nothing in Marianne.
In other words, it appears that conjugating emtansine to
Herceptin completely cancels synergy with pertuzumab, even though both drugs were
designed by the same person. Alternatively, Marianne disconfirms the results of
Cleopatra.
I also asked Allan Lipton about the Cleopatra-Marianne
dissonance. Lipton replied: “I do not think I am the right person to answer
your Cleopatra questions.” But Lipton has co-authored several papers on
alternative assays for determining HER2 status and investigated HER2:HER3
dimerization and pertuzumab. I replied to Lipton: “I wonder if you aren’t the
ideal person to answer such questions.” He demurred: “I don't think I have any
answers for you on these observations from clinical trials.”
Although pertuzumab is frequently described as completing
the blockade of HER2 and HER3, according to Mark Moasser, “pertuzumab doesn’t
interfere with dimerization when HER2 is overexpressed.” HER2 overexpression has
been thought of as the sine qua non
of HER2-positive breast cancer. Moasser emphasized that it is “very true” that
pertuzumab doesn’t block HER2 signaling when HER2 is overexpressed. Instead,
“trastuzumab and pertuzumab work through immunologic mechanisms in HER2-positive
cancers, and two antibodies provides double the tumor cell coverage and better
immunologic targeting by the immune system.” He added: “This is not universally
accepted by everyone but at this point the data is pretty clear to me and many
others.”
Moasser attributes the disappointing performance of
pertuzumab + T-DM1 in Marianne to the absence of a taxane: “I would say it's
because taxol (or taxotere) is so effective, it’s not a shortcoming of T-DM1.” Paul
Ellis advanced a similar argument. However, T-DM1 by itself performed as well
as Herceptin and a taxane. In fact, progression-free survival with T-DM1 alone
was higher, 14.1 months vs. 13.7 months, although not significantly so. But adding
pertuzumab to T-DM1 did nothing.
According to Moasser:
“Chemos have a 12-hour high concentration exposure and cause
a lot of tumor cell kill in a short time leading to release of many cellular
antigens, etc. T-DM1 provides continuous
exposure and there is incremental tumor cell killing day-by-day rather than
mass killing on one day. That may be less immunogenic than the chemo method.”
However, emtansine provided enough immunologic kick for T-DM1
to equal the clinical benefits of Herceptin and a taxane. Thus Moasser’s
explanation for the futility of pertuzumab seems to require that pertuzumab has
different immunological prerequisites than T-DM1.
In the trial that won Herceptin initial FDA approval, adding Herceptin to a taxane delayed
disease progression by 3.9 months, while in Cleopatra, further adding
pertuzumab to the regimen added nearly 16 months. This quite massive effect is
unexplained. Said Moasser: “I don’t claim to know all the nuances of how chemo
and immunology interact with each other, and frankly nobody really does, the
field is still in its infancy.” The pharmacologists, however, have somehow hit
a home run with Herceptin and pertuzumab while swinging as if with their eyes
closed.
With EGFR inhibitors in lung cancers or BRAF inhibitors in
melanomas, the mechanisms of action are clear as are the clinical results.
However, said Bert Vogelstein, “we do not know how or why Herceptin works,” and
the conflicting results of Cleopatra and Marianne show “that all conclusions or
predictions are on thin ice,” according to Vogelstein.
Conclusion
The HER2 and Herceptin story used to be simple and
compelling: we knew who it worked for and why. Now we don’t, despite nearly two
decades of learning. The current balance of scientific evidence arguably no
longer supports the idea of a HER2 subtype in breast cancer.
There is conflicting evidence whether HER2 is even transforming
and whether it drives breast cancer. Also, the Ross et al. literature reviews
supporting the prognostic role of HER2 are especially dubious. (Those papers
should be corrected or retracted.) At present, the view that HER2 is prognostic
is unsupported.
Although medical diagnostics have gray areas, the
reproducibility of HER2 testing appears poor enough that the tests perhaps
should not be considered scientific. Different pre-analytic conditions,
different assays, different subjective assessment criteria, tumor heterogeneity
and the lack of any gold standard lead to conflicting results, which are
resolved arbitrarily. That the Hera trialists evade, or perhaps even dissimulate about,
investigations of particular subgroups that could help validate or
further discredit FISH and IHC might point to a widening disparity between appearance
and reality. The main Herceptin orthodoxy has broken down completely: Herceptin
does not block HER2 signaling, and its mechanism of action might have little or nothing
to do with HER2.
Nonetheless, Krop and Burstein contend: “Beyond
a doubt, trastuzumab works.” Yet absent questionable modifications to key trials,
Herceptin might not have won FDA approval. Avastin, which lost FDA approval,
also works for some breast cancer patients, but there is no biomarker to
predict response. The published literature demonstrates that HER2 does not
predict response to Herceptin, leaving Herceptin without a valid biomarker. To
paraphrase Daniel Haber: “HER2-positive” just means “responds to Herceptin.” Even
HER2-negative patients can benefit, perhaps even more than HER2-positive
patients. Seemingly, either all breast cancer patients should get Herceptin or
none should, the latter option representing the FDA’s decision for Avastin.
At present, the standard of care is for all breast cancer
patients to be tested for HER2. The tests suffer from considerable
reproducibility problems. In addition, based on the re-analysis of clinical
trials leading to FDA approval, HER2 doesn’t predict response to Herceptin. We
don’t know who should get Herceptin, but current guidelines pretend otherwise, relying on
HER2 tests that are too much like divining rods.
The clinical benefits of Herceptin might be smaller than
thought. The modifications of the trials leading to FDA approval might have
artificially pumped up the drug’s benefits. In addition, at the population
level, five-year survival has increased only about 1.1% since the introduction
of Herceptin. Converting that modest rise into median number of months of
increased survival per Herceptin patient might be instructive—perhaps
corrective—of strong claims regarding the curative powers of
Herceptin-containing treatment regimens.
The Cleopatra trial reported the largest increase in survival
of any Herceptin trial ever. The addition of pertuzumab to Herceptin and a
taxane pushed median survival up by nearly 16 months, whereas adding the
supposed workhorse of the two, Herceptin, to a taxane delayed disease
progression by only about 4 months. Furthermore, in the Marianne trial, adding pertuzumab to the
Herceptin-based T-DM1 did no better than T-DM1 alone, adding zero months of
survival instead of 16. Worryingly, researchers who might be able to explain
the seemingly contradictory results are silent. Somewhat as with conflicting
HER2 assessments, researchers and physicians can just choose what to believe.
A kind of HER2 fundamentalism has taken hold as foundational
truths have broken down: “clinicians should rely on established markers of HER2
expression for selecting patients,” suggested Krop and Burstein. But those very
same biomarkers are what have been dis-established. Also, “established” does not
mean valid; rather, physicians are counseled to use
the old knowledge from when the HER2/Herceptin story was compelling and
coherent.
Like efforts to keep the earth at the center of the solar
system, complicated epicycles have been devised to hold on to HER2 orthodoxies.
A simpler explanation might better fit the contradictory evidence: while HER2
overexpression and amplification are real phenomena, there might not be a
clinically meaningful HER2 breast cancer subtype.
Summary of Recommendations
- Reconduct the experiments addressing whether HER2 is transforming in mouse cell lines
- Add arms to B-47: include HER2 0 patients, receiving either Herceptin or a placebo, and a partial arm of HER2 3+ patients, all receiving Herceptin
- Decompose the 1.1% increase in five-year breast cancer survival from 1999 to 2012 to determine the contribution from Herceptin
The Herceptin meta-analysis being conducted by the Clinical Trial Service Unit at Oxford should:
- Attempt to reproduce findings of the individual studies, including the joint N9831/B-31 trial that led to FDA approval of Herceptin in early breast cancer
- Estimate the likelihood of assessed HER2 status being correct, if that is possible
- Allow confidence intervals around Herceptin’s clinical benefits to reflect estimated HER2 test accuracy
- Check estimates of Herceptin’s contribution to overall survival against population-based survival figures
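
As a rough illustration of the second and third meta-analysis recommendations, the sketch below applies Bayes' rule to estimate how likely a "HER2-positive" result is to be correct. Every input is a hypothetical placeholder rather than a figure from any trial or assay, and the function name is invented for illustration. Because HER2 testing lacks a gold standard, the sensitivity and specificity values would themselves have to be assumed or estimated.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """Probability that a positive test result is a true positive (Bayes' rule)."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    true_pos = sensitivity * prevalence                    # truly positive, tests positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # truly negative, tests positive
    if true_pos + false_pos == 0.0:
        raise ValueError("the test never returns a positive result")
    return true_pos / (true_pos + false_pos)


# Hypothetical inputs, chosen only to show how the calculation behaves.
HYPOTHETICAL_PREVALENCE = 0.20
for sens, spec in [(0.95, 0.95), (0.90, 0.90), (0.80, 0.85)]:
    ppv = positive_predictive_value(sens, spec, HYPOTHETICAL_PREVALENCE)
    print(f"sensitivity={sens:.2f} specificity={spec:.2f} -> PPV={ppv:.2f}")
```

Under these assumed inputs, even modest losses in test accuracy noticeably lower the share of "positive" results that are truly positive, which is why confidence intervals around Herceptin's benefit might reasonably be widened to reflect test accuracy.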