Help interpreting "five sigma" standard?


Solution 1

I think this question may arise from a difference between somewhat rough layman's-terms presentations and the more careful statistics which goes on in the actual labs. But even after a given body of data has been analyzed to death, there is no formal way to capture in full the evidence underlying the way knowledge of physics grows. The evidence surrounding the Higgs mechanism, for example, would not be nearly as convincing if the Higgs mechanism itself were not an elegant combination of ideas which already find their place in a coherent whole.

The hypothesis that one is gathering evidence against is always the hypothesis that we are mistaken as to how a given body of data (such as a peak in a spectrum) came about. The mistake could be quite simple, as for example when the underlying distribution is in fact flat and the peak is an artifact of random noise. But usually one has to consider the possibility that the peak is there but is due to something other than the mechanism under study. The hypothesis one is testing in the strict sense---the sense of ruling out at some level of confidence---is the set of all other ways we have so far thought of by which the data could arise. In this set we only need to consider ways that reflect known physics and known amounts of noise, etc., in the apparatus.

I think what the community of physicists does is a bit like Sherlock Holmes: we try to think of plausible other ways the data could arise, and then give reasons as to why those other ways can be ruled out. The final step, where we proceed to the claim that the leading candidate explanation is what really happened, is not a step that can be quantified by any statistical measure. This is because it relies not only on a given data set, but also on a judgement about the quality of the theory under consideration.

Solution 2

The Higgs-discovery experiment is a particle-counting experiment. Lots of particles are produced by collisions in the accelerator, and appear in its various detectors. Information about those particles is stored for later: when they appeared, the direction they were traveling, their kinetic energy, their charge, what other particles appeared elsewhere in the detector at the same time. Then you can reconstruct “events,” group them in different ways, and look at them in a histogram, like this one:

[Higgs-discovery histogram]

[Mea culpa: I remember this image, and others like it, from the Higgs discovery announcement, but I found it from an image search and I don’t have a proper source link.]

These are simultaneous detections of two photons (“diphotons”), grouped by the “equivalent mass” $m_{\gamma\gamma}$ of the pair. There are tons and tons and tons of photons rattling around these collisions, and directional tracking for photons is not very good, so most of these “pairs” are just random coincidences, unrelated photons that happened to reach different parts of the detector at the same time. Because each collision is independent of all the others, the filling of each little bin is subject to Poisson statistics: a bin with $N$ events in it has an intrinsic “one-sigma” statistical uncertainty of $\pm\sqrt N$. You can see the error bars in the total-minus-fit plot in the bottom panel: on the left side, where $N\approx 6000$ events per interval in the top figure, the error bars are roughly $\sqrt{6000}\approx 80$ events; on the right side, where there is less signal, the error bars are appropriately smaller.
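As a quick illustration of the $\sqrt N$ rule, here is a minimal Python sketch (the bin counts are made-up round numbers, not read off the ATLAS plot):

```python
import math

# Illustrative bin counts (round numbers, not the actual ATLAS data).
counts = [6000, 2500, 400]

for n in counts:
    sigma = math.sqrt(n)  # one-sigma Poisson uncertainty on a count of n
    print(f"N = {n:5d}  ->  +/- {sigma:3.0f} events  ({100 * sigma / n:.1f}% relative)")
```

Note how the relative uncertainty shrinks like $1/\sqrt N$: quadrupling the counts halves it, which is why the left side of the plot has visibly larger absolute error bars but smaller relative ones.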

The “one-sigma” confidence limit is 68%. Therefore, if those data were really independently generated by a Poissonian process whose average behavior were described by the fit line, you would expect the data points to be equally distributed above and below the fit, with about 68% of the error bars crossing the fit line. The other third-ish will miss the fit line, just from ordinary noise. In this plot we have thirty points, and about ten of them have error bars that don’t cross the fit line: totally reasonable. On average one point in twenty should be, randomly, two or more error bars away from the prediction (or, “two sigma” corresponds to a 95% confidence limit).
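The 68% coverage claim is easy to check numerically. The sketch below draws bin counts in the large-$N$ Gaussian limit of a Poisson distribution (the mean of 1000 events per bin is an arbitrary illustrative choice) and counts how often the $\pm\sqrt N$ error bar crosses the true mean:

```python
import math
import random

random.seed(42)

mu = 1000       # true expected counts per bin (arbitrary, large enough for the Gaussian limit)
trials = 20000  # number of simulated bins

covered = 0
for _ in range(trials):
    # For large mu, Poisson(mu) is well approximated by Gaussian(mu, sqrt(mu)).
    n = round(random.gauss(mu, math.sqrt(mu)))
    # Does the error bar n +/- sqrt(n) cross the true mean?
    if abs(n - mu) <= math.sqrt(n):
        covered += 1

print(f"error bars covering the truth: {covered / trials:.1%}")
```

The printed fraction comes out close to 68%, matching the expectation that about a third of well-behaved data points should miss the fit line.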

There are two remarkable bins in this histogram, centered on 125 GeV and 127 GeV, which are different from the background fit by (reading by eye) approximately $180\pm60$ and $260\pm60$ events. The “null hypothesis” is that these two differences, roughly $3\sigma$ and $4\sigma$, are both statistical flukes, just like the low bin at 143 GeV is probably a statistical fluke. You can see that this null hypothesis is strongly disfavored, relative to the hypothesis that “in some collisions, an object with mass near 125 GeV decays into two photons.”
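Converting those eyeballed excesses into significances is just a ratio plus a Gaussian tail probability; a hedged sketch, using the rough numbers quoted above:

```python
import math

# Eyeballed excesses from the histogram: (excess events, uncertainty).
bins = {"125 GeV": (180, 60), "127 GeV": (260, 60)}

for label, (excess, err) in bins.items():
    z = excess / err  # significance in sigmas
    # One-sided probability that background alone fluctuates up this far.
    p_fluke = 0.5 * math.erfc(z / math.sqrt(2))
    print(f"{label}: {z:.1f} sigma, fluke probability ~ {p_fluke:.1e}")
```

A single ~3-sigma bin alone is not rare enough to claim a discovery: its fluke probability is of order $10^{-3}$, and there are many bins in which a fluke could have appeared (the "look-elsewhere effect"), which is part of why the field insists on five sigma.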

This diphoton plot by itself doesn’t get you to a five-sigma discovery: that required data in multiple different Higgs decay channels, combined from both of the big CERN experiments, which required a great deal of statistical sophistication. An important part of the discovery was combining the data from all channels to determine the best estimate for the Higgs’s mass, charge, and spin. Another important result out of the discovery was the relative intensities of the different decay modes. As another answer says, it helped a lot that we already had a prediction there might be a particle with this mass. But I think this data set shows the null hypothesis nicely: most of ATLAS’s photon pairs come from a well-defined continuum background of accidental coincidences, and the null hypothesis is that there’s nothing special about any of the photon pairs which happen to have an equivalent mass of 125 GeV.

Solution 3

The null hypothesis here is that the data was generated by physics which obeys the effective field theory describing all the Standard Model particles except the Higgs. This model doesn't usually have a name, but could reasonably be called the 'Standard Model without Higgs'. It's a perfectly good effective field theory. Its predictions are barely different from those of the usual Standard Model (with Higgs).

Asking for a 5 sigma rejection of the null hypothesis in this case means accumulating a lot of data which is incompatible with the 'Standard Model without Higgs'. Enough data that a couple of 1 sigma experimental errors don't ruin the result.

Solution 4

Refresher on hypothesis testing
In (frequentist) hypothesis testing one always has (at least) two hypotheses: the null hypothesis and the alternative hypothesis. The p-value is the probability of observing a dataset at least as extreme as the one actually observed, given that the null hypothesis is true, whereas the power of the test is the probability of rejecting the null hypothesis given that the alternative is true.

If the p-value is smaller than a pre-defined threshold (the significance level), one rejects the null hypothesis as improbable. In the example given in the OP the data is supposed to follow the Gaussian/normal distribution, and five sigma determines the significance level in terms of this distribution (a rather stringent one).
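For concreteness, the translation between "sigmas" and p-values is a one-line Gaussian tail integral; a minimal sketch, using the one-sided convention common in particle physics:

```python
import math

def p_value(z: float) -> float:
    """One-sided tail probability of a standard normal beyond z sigmas."""
    return 0.5 * math.erfc(z / math.sqrt(2))

for z in (1, 2, 3, 5):
    print(f"{z} sigma -> p = {p_value(z):.2e}")
```

For $z = 5$ this gives $p \approx 2.9\times10^{-7}$, about 1 in 3.5 million, matching the number quoted in the question.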

What does it have to do with Popper?
From a statistical viewpoint, Popperian epistemology simply means that designing a test to reject a hypothesis and calculating its p-value is usually easier than calculating the test's power (which typically requires some ad hoc assumptions about the underlying probability distributions). In other words, disproving the null hypothesis is easier than proving that the alternative hypothesis is correct. One then chooses the null hypothesis in such a way that it can be disproved, rather than trying to prove it. Choosing 'the particle exists' as the null hypothesis and 'the particle does not exist' as the alternative, or vice versa, depends not on the philosophical meaning of either statement, but on our ability to disprove it.

Remark
In my opinion the chapter on statistical testing published by the Particle Data Group is one of the best crash courses on statistics for physicists.

Author: Him
Updated on May 22, 2021

Comments

  • Him
    Him over 2 years

    So, I am coming from a math/stats background. I was just wondering about this in the abstract, and tried Googling around, and came across this article which says the following regarding some experiments undertaken at CERN:

    it is the probability that if the particle does not exist, the data that CERN scientists collected in Geneva, Switzerland, would be at least as extreme as what they observed.

    But, "does not exist" doesn't seem to me to be a very well-defined hypothesis to test:

    In my understanding of frequentist hypothesis testing, tests are always designed with the intent to provide evidence against a particular hypothesis, in a very Popperian sort of epistemology. It just so happens that in a lot of the toy examples used in stats classes, and also in many real-life instances, the negation of the hypothesis one sets out to prove wrong is itself an interesting hypothesis. E.g. ACME corp hypothesizes that their ACME brand bird seed will attract >90% of roadrunners passing within 5m of a box of it. W.E. Coyote hypothesizes the negation. Either can set about gathering data to provide evidence against the hypothesis of the other, and because the hypotheses are logical negations of one another, evidence against ACME is evidence for W.E.C. and vice versa.

    In the quote above, they attempt to frame one hypothesis as "yes Higgs boson" and its negation as "no Higgs boson". It seems that if the intent is to provide evidence for "yes Higgs boson", then in normal frequentist methodology, one gathers evidence against "no Higgs boson" and can quantify that evidence into a p-value or just a number of standard errors of whatever quantity predicted by the theory we happen to be investigating. But this seems to me to be silly, since the negation of the physical model that includes the Higgs is an infinite space of models. OTOH, this is the only context in which the "five sigma" p-value surrogate seems to make any sense.

    In fact, this was my original thought when I set out Googling: the five sigma standard implies that we are gathering evidence against something, but modern physics theories seem to encompass such a breadth, and are yet so specific, that gathering evidence against their bare negation is nonsense.

    What am I missing here? What does "five sigma" evidence for the Higgs hypothesis (or other physics hypotheses) mean in this context?

    • Brick
      Brick over 2 years
      I suspect that the words right before where you started quoting this are critical. Without reading the article we don't know precisely what "it" is.
    • Him
      Him over 2 years
      @Brick """In short, five-sigma corresponds to a p-value, or probability, of 3x10-7, or about 1 in 3.5 million. This is not the probability that the Higgs boson does or doesn't exist; rather, it is the probability that if the particle does not exist, the data that CERN scientists collected in Geneva, Switzerland, would be at least as extreme as what they observed."""
    • Him
      Him over 2 years
      "it" is the five sigma standard, which I suppose in normal frequentist lingo is more akin to "alpha" than a p-value, but /shrug.
    • Richard Myers
      Richard Myers over 2 years
      It's worth pointing out, I think, that not all data analysis in particle physics is frequentist. Also, if you're already familiar with statistics and want to quickly get familiar with what, specifically, particle physicists are doing, I recommend the particle data group which has many reviews.
    • Daniel R. Collins
      Daniel R. Collins over 2 years
      @Him: Actually, no, the "it" at the start of the quote refers to a p-value. Suggest you edit the question to include the full quote.
    • Him
      Him over 2 years
      @DanielRCollins Sure, but the journalist presumably meant to refer to $\alpha$ and not $p$, and $\alpha$ is merely a different unit scale for five sigma. I included the quote as I did because I didn't want to have a conversation about unit conversion or about poor scientific reporting.
  • Him
    Him over 2 years
    My point is that "particle does not exist" is not a physical model that makes a prediction that can be disproved. How can one possibly provide evidence against this statement?
  • Roger Vadim
    Roger Vadim over 2 years
    @Him We are talking here about the experimental measurements: we observe certain results and explain them using a theory that assumes that the particle does not exist. E.g., two particles come into collision, and one can assume that they annihilate OR that they combine into a new particle. In some cases writing down a theory for two particles annihilating and testing it is easier than designing a description of a completely unknown particle.
  • Him
    Him over 2 years
    So, the "five sigma" interpretation is the normal one, and the evidence provided is, in fact, against some other well-defined model that makes a measurable prediction. I suppose that this is considered strong evidence for the Higgs theory because it is the only known competing theory, and there currently exists no evidence against the Higgs theory?
  • Roger Vadim
    Roger Vadim over 2 years
    One cannot prove the Higgs theory... but one could disprove the theory that assumes that there is no Higgs particle. And the probability that we are mistaken (p-value) is so small that it is negligible for all practical purposes.
  • Him
    Him over 2 years
    My point is that "the theory that assumes that there is no Higgs particle" is not simply the complement of "the theory that assumes that there is a Higgs particle". The theory that assumes that there is a Higgs particle makes quantitative predictions that we can measure. The complement of this theory is, in fact, an infinite set of theories that may each make their own various predictions. So one can disprove a theory that happens to not be the Higgs theory.
  • Him
    Him over 2 years
    I think that your comments here are making things somewhat clearer. You are saying that we mean "five sigma" as in: "The Higgs theory makes a measurable prediction X. We are pretty sure that any sane models that simultaneously agree with other well-established theories and also disagree enough with Higgs theory to be described as different theories make a measurable prediction of something very close to Y that we can compare with X. We make a sample of measurements, and get something that is five standard errors away from Y, but which is not so distant from X."
  • user1504
    user1504 over 2 years
    And of course the alternative hypothesis is that the data was generated by physics obeying the effective field theory we call the Standard Model.
  • Him
    Him over 2 years
    "barely different" I think that maybe something surrounding this statement is actually the crux of the matter. This theory is "barely different" in 99.99999% of ways, but is "very different" in this one particular prediction that we've decided to measure. The discussion with @RogerVadim has me thinking that the implication is that "Standard Model with Higgs" is considered to be "very different" in this particular way from any sensible extensions of the "Standard Model without Higgs" that might be reasonably considered actually different theories from the "Standard Model with Higgs"
  • user1504
    user1504 over 2 years
    @Him I'm afraid I can't parse your last sentence.
  • Him
    Him over 2 years
    my point here is that "the effective field theory describing all the Standard Model particles except the Higgs" is not a theory. It is a set of theories. There are many possible extensions of the Standard Model that may include a spike in particle event frequency at a given energy. Not all of those are sensible extensions. For example, the "Standard Model with 7TeV magical nano wizard". Not all of them are really different from "Standard Model with Higgs". For example, the "Standard Model with Biggs particle that is just like Higgs but with a "B""
  • Him
    Him over 2 years
    So, in this sense, even though "Standard Model without Higgs" is an infinite set of models, we say that we've got "five sigma" evidence against them all because we've got "five sigma" evidence against the subset of those hypotheses that we have imagined so far, and which are reasonable according to our current understanding of physics, and are not essentially equivalent to "Standard Model with Higgs"
  • Him
    Him over 2 years
    I think that @AndrewSteane's explanation is a good paraphrase of this.
  • Nat
    Nat over 2 years
    @Him: The null hypothesis was that there wasn't a particle of the approximate mass. This was rejected to 5-sigma confidence. The initial discovery was of such a particle of such mass; further arguments were used to make the case that the particle was the Higgs. This article kinda talks about it.
  • Andrew Steane
    Andrew Steane over 2 years
    Nice answer; I have a query about the red line on the graph. First, I think it would be appropriate to present the data without such a curve, so as not to pre-judge the issue of whether or not there is a significant bump at 126 GeV. But if we go ahead and fit a peak, then how come the one shown is slightly to the left of the data? It is also a tad too wide I think (compared to a least-squares best fit).
  • rob
    rob over 2 years
    @AndrewSteane I think the short answer to your question is that the best-fit mass and decay width for the Higgs aren’t a fit to just this diphoton data, but at the same time to other Higgs decay channels. (The long answer is probably “read the discovery papers.”) I agree with you that a best fit for these data alone would probably have the peak a little narrower and at slightly higher mass. But, in the spirit of this answer about the null hypothesis, I think you would also agree that the red curve is in no way excluded.