The primary objective of the CODATA Task Group on Fundamental Constants is ‘to periodically provide the scientific and technological communities with a self-consistent set of internationally recommended values of the basic constants and conversion factors of physics and chemistry based on all of the relevant data available at a given point in time’. I discuss why the availability of these recommended values is important and how it simplifies and improves science. I outline the process of determining the recommended values and introduce the principles that are used to deal with discrepant results. In particular, I discuss the specific challenges posed by the present situation of gravitational constant experimental results and how these principles were applied to the most recent 2010 recommended value. Finally, I speculate about what may be expected for the next recommended value of the gravitational constant scheduled for evaluation in 2014.
Early in the twentieth century, it became apparent that there was a need for consensus values of fundamental physical constants. With rapid advances in quantum mechanics and relativity, and their incorporation into physics, a new understanding of the physical universe emerged, which, in part, depended on the properties of quanta and elementary particles. The more recent advances in quantum electrodynamics, quantum chromodynamics and the standard model, as well as in other branches of physics, have continued this evolution, refining our understanding of these elementary particles and their physical relationships. To test and improve these theories and their universality, it is necessary to compare measurements, and this invariably leads to the seemingly simple problem of different scientists using different values of the same physical constants. Comparing results from a couple of decades that use different values (sometimes new, sometimes old and sometimes just different) quickly turns a simple problem into a mess. Like industry, physics needs standardization to compare things meaningfully over space and time.
Of course, comparison is important, but there are many relationships between different fundamental constants and it is equally important to have a self-consistent set of values that relates all of the values of the fundamental constants. Only in this way can experiments using different theories and different fundamental constants test the consistency of their results and, eventually, the consistency of physics itself. It is generally agreed that Birge was the first to attempt an adjustment of the values of the fundamental constants (and it seems most appropriate that this reference is the first page of the first volume of Reviews of Modern Physics). In the intervening time, others have made similar adjustments [2–6], but it eventually became apparent that a single source of these adjustments with broad international acceptance would provide the most benefits. This is why the CODATA Task Group on Fundamental Constants (TGFC) came into being.
2. The CODATA Task Group on Fundamental Constants
The CODATA TGFC has become the authoritative source for providing values and uncertainties for the fundamental physical constants of nature. To understand both why this is true and how it has happened, we have to consider the task group's place in the scientific community, its impact and its past accomplishments.
The International Council for Science (ICSU; http://www.icsu.org/) was established in 1931 (formerly International Council of Scientific Unions) and is the largest and oldest coordinating organization for worldwide science. It is non-governmental and has a global membership of 120 national scientific bodies and 31 international scientific unions. With a mission to strengthen international science for the benefit of society, it provides coordination, strategic focus and in some cases governance for many organizations, including the International Union of Pure and Applied Physics and the International Union of Pure and Applied Chemistry. But ICSU is more than just its members; it also has 17 interdisciplinary bodies addressing major issues of relevance to both science and society.
CODATA is the Committee on Data for Science and Technology (http://www.codata.org/) and was set up in 1966 as an interdisciplinary body sponsored by ICSU. The mission of CODATA is to strengthen international science for the benefit of society by promoting improved scientific and technical data management and use. CODATA is primarily concerned with the data management of quantitative data collected from many different fields, including physics, biology, chemistry, geology and the environment. The headquarters of both ICSU and CODATA are in Paris, France.
Many of CODATA's activities are performed through task groups, and the first, established in 1969, was the TGFC (http://physics.nist.gov/constants and http://www.bipm.org/extra/codata/). The purpose of the CODATA TGFC is ‘to periodically provide the scientific and technological communities with a self-consistent set of internationally recommended values of the basic constants and conversion factors of physics and chemistry based on all of the relevant data available at a given point in time’. CODATA recommended values of the fundamental constants have been published in 1973, 1986, 1998, 2002, 2006 and 2010 [7–12] and are all on open access on the Web (http://physics.nist.gov/constants). The next publication is planned for the end of 2014.
This brief history of scientific organizations illustrates that the CODATA TGFC has established a long-term record of its work. It also illustrates that the TGFC has been subjected to governance, oversight and review by international scientific organizations for a long time. ICSU reviewed the task group's activities just last year. This long history of scientific oversight and review is the first aspect of the TGFC's reputation.
The six publications of recommended values [7–12] are each very long comprehensive reviews detailing the experimental and theoretical results as well as providing recommended values. Taken together, these publications represent an immense archive of science related to fundamental constants and associated physics. These peer-reviewed articles have been accepted by the scientific community and are referred to and used by thousands of scientists, not just those pursuing studies about fundamental constants. The benefits and convenience of using a single set of recommended self-consistent values, which are easily accessible and used not just by one's colleagues but the entire scientific community, are too helpful to ignore. These benefits, easy access and wide acceptance translate into a broad impact, which is the second reason for the TGFC's authoritative reputation.
Each of the CODATA recommended value publications builds upon the previous publication and retains a common subject structure and style. The results are publicly available in journals and online (http://physics.nist.gov/constants). All of this is consciously promoted to maximize continuity. The transition from one set of recommended values to the next is made with extensive publication. The end result is, as much as possible, an abrupt transition to the latest recommended values and an equally easy relationship between results using different sets of recommended values. While this continuity is most important to a relatively small number of scientists actively involved in fundamental constants, it ensures the maximum continuity for experts and the general community alike. Continuity is the third reason for the TGFC's authoritative reputation.
3. The methodology of recommending a value
Earlier work [2–6] often used a least-squares analysis (LSA) to provide self-consistent values for a set or subset of fundamental constants. The TGFC continues this approach, but it has been extended to include virtually all of the essential fundamental constants and to account for functional and experimental correlations. For the rest of this paper, I refer to the analysis simply as an LSA. The present TGFC LSAs are variance-weighted, generalized, multivariate least-squares adjustments, which include covariances. Weighting by the inverse of the variance is consistent with the Guide to the expression of uncertainty in measurement (GUM) and provides the smallest possible uncertainty that can be supported by the uncertainties of the individual results.
Several TGFC policies have a direct impact on the LSA methodology. First, the uncertainties of recommended values are reported to two significant figures; this has the effect of excluding determinations with much poorer uncertainties from the LSA. Second, the CODATA recommended value publications are now prepared every 4 years. This regularizes the process, makes new determinations easier to identify and helps researchers schedule their objectives. The third policy is more fundamental: the task group does not alter the authors' published value or uncertainty. In a few special cases, and with the authors' permission, some small corrections have been included, usually to account for newly recognized effects in an older result. In all such cases, the corrections are fully described in the LSA text. This last policy has an impact on how discrepant data are handled and implies that the LSA cannot arbitrarily alter the values, uncertainties or inclusion of acceptable data points.
To understand how the analysis works, first consider a single fundamental constant. Let Y be the vector of the input data, the individual determinations yi of that fundamental constant minus the unknown weighted mean, ym, so that Yi=yi−ym. The variance, σi2, of each of the data points is separated into the sum of an uncorrelated part, ui2, and the parts correlated with other data points, cov(yi, yj). The covariance matrix, C, is then defined by Cii=σi2 and Cij=cov(yi, yj) for i≠j, and χ2 is given by χ2=YC−1YT.
Using a regression to minimize χ2 with respect to ym yields the variance-weighted mean and its uncertainty, while accounting for covariances.
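As a minimal illustration of this single-constant case, the sketch below (a Python illustration with invented numbers, and with correlations neglected so that C is diagonal) computes the variance-weighted mean, its uncertainty and the χ2 of the fit:

```python
# Variance-weighted mean of a single constant, assuming a diagonal
# covariance matrix (no correlations between determinations).
# The input values below are invented, purely for illustration.

def weighted_mean(values, sigmas):
    """Return (ym, u, chi2): the variance-weighted mean, its standard
    uncertainty and the chi-squared of the fit."""
    weights = [1.0 / s ** 2 for s in sigmas]     # inverse-variance weights
    wsum = sum(weights)
    ym = sum(w * y for w, y in zip(weights, values)) / wsum
    u = wsum ** -0.5                             # uncertainty of the mean
    chi2 = sum(w * (y - ym) ** 2 for w, y in zip(weights, values))
    return ym, u, chi2

values = [10.00, 10.20, 9.90]   # three hypothetical determinations
sigmas = [0.10, 0.20, 0.10]     # their claimed standard uncertainties
ym, u, chi2 = weighted_mean(values, sigmas)
```

Note how the point with the largest uncertainty contributes the least weight, and how the uncertainty of the mean is smaller than any individual uncertainty, as the GUM-compliant weighting guarantees.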
4. Goodness of fit
The consistency of the dataset and the goodness of the fit to a weighted mean are assessed in several ways. The χ2, together with the degrees of freedom, ν, is used to calculate a consistency probability, i.e. the probability that the dataset could be independently sampled from a common Gaussian probability distribution. If the input data and the analysis are GUM compliant, a probability of approximately 50% would be expected from a Gaussian distribution. As the probability gets larger, it suggests that the authors may have overestimated their uncertainties. As the probability gets smaller, it suggests that the authors may have underestimated their uncertainties or that there are unknown errors or biases. It may also suggest new physics, but that is outside our initial assumptions. Other indicators of consistency that are examined include the Birge ratio, which should tend to one, and both the mean and maximum normalized residuals from the fit. In much of the work undertaken by the task group, the number of data points or individual experimental results is relatively small. This limitation itself generates increasing uncertainty in a simple interpretation of any goodness-of-fit criterion and progressively forces more reliance on the expertise and judgement of the task group members.
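For even degrees of freedom the consistency probability has a closed form, so these indicators can be sketched without a statistics library. The following Python sketch (an illustration, not TGFC code) computes the consistency probability, the Birge ratio and the normalized residuals:

```python
import math

def chi2_sf(x, nu):
    """Consistency probability P(chi2 > x) for nu even degrees of freedom,
    using the closed form exp(-x/2) * sum_{k=0}^{nu/2-1} (x/2)^k / k!."""
    assert nu % 2 == 0, "closed form shown only for even nu"
    return math.exp(-x / 2) * sum((x / 2) ** k / math.factorial(k)
                                  for k in range(nu // 2))

def birge_ratio(chi2, nu):
    """Birge ratio R_B = sqrt(chi2 / nu); it should tend to one."""
    return math.sqrt(chi2 / nu)

def normalized_residuals(values, sigmas, ym):
    """Residual of each point from the fitted mean, in units of its
    claimed uncertainty (correlation with the mean neglected)."""
    return [(y - ym) / s for y, s in zip(values, sigmas)]
```

For example, chi2_sf(2.0, 2) equals e−1 ≈ 0.37, a 37% consistency probability, comfortably close to the 50% expected for a GUM-compliant dataset.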
The full LSA extends this approach to all of the fundamental constants, with observational equations that incorporate their functional interdependences. Observational equations relate the determinations of a particular fundamental constant to other fundamental constants. For example, RK=μ0c/2α, where RK is the von Klitzing constant, μ0 is the permeability of vacuum, c is the speed of light and α is the fine structure constant. The 2010 LSA had 160 input data values, 135 distinct types of observational equations and 83 adjusted constants or unknowns.
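The von Klitzing observational equation can be checked numerically. The sketch below uses the exact (pre-2019 SI) value of μ0, the exact value of c and an approximate 2010-era value of α, all stated explicitly in the comments:

```python
import math

# Observational equation RK = mu0 * c / (2 * alpha).
mu0 = 4 * math.pi * 1e-7     # permeability of vacuum, exact in the pre-2019 SI (N A^-2)
c = 299792458.0              # speed of light, exact by definition (m s^-1)
alpha = 1 / 137.035999074    # fine structure constant, approximate 2010-era value

RK = mu0 * c / (2 * alpha)   # von Klitzing constant, in ohms
print(RK)                    # approximately 25812.807 ohms
```

This reproduces the familiar RK ≈ 25 812.807 Ω, illustrating how a determination of RK constrains α (and vice versa) within the adjustment.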
5. Data selection
We are fortunate not to be performing the first LSA. That would require searching for all previous fundamental constant determinations, even though many would not be used in the final analysis. Instead, we rely upon the previous LSA and the extensive bibliography database (http://physics.nist.gov/constants) to obtain the older determinations, and we only have to add the new determinations obtained since the last LSA. Input data from publications in recognized peer-reviewed journals are always accepted; results that have been accepted for publication but are still awaiting it, or that have otherwise been made widely available, may also be considered. The closing date for acceptance of input data for the 2014 LSA is midnight EST, 31 December 2014.
Many thousands of determinations have been made with poor uncertainties, and including them all in a single LSA would be both computationally impracticable and intellectually unreasonable. To cull the dataset, several considerations are made. As only two significant figures are reported in the recommended uncertainty, input data that do not contribute more than 1% to the adjusted value can be excluded without affecting the result. This is often referred to as the self-sensitivity criterion. A single project may publish several new determinations as a result of improvements or modifications. In general, the most recent result supersedes an earlier result unless it is specifically demonstrated that the individual results are uncorrelated. This superseding principle is based, in part, on the belief that knowledge about the experiment improves with time (even if the claimed uncertainty increases). Exceptions to this guideline may be made by estimating the covariance between the results. Lastly, input data may be removed from the analysis at the authors' request. Occasionally, errors or effects are recognized after publication and a corrected publication is impracticable or impossible. In such cases, the TGFC seeks written approval from the authors to exclude the data and this action is noted in the LSA text.
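For the simple weighted-mean case, a point's self-sensitivity reduces to its share of the total weight, which makes the 1% cut easy to sketch (a Python illustration with invented uncertainties, not the TGFC's multivariate computation):

```python
def self_sensitivities(sigmas):
    """For a plain weighted mean, the self-sensitivity of point i is
    d(ym)/d(y_i) = w_i / sum(w): its share of the total weight."""
    weights = [1.0 / s ** 2 for s in sigmas]
    total = sum(weights)
    return [w / total for w in weights]

sigmas = [0.1, 0.1, 1.0]          # third point has a 10x poorer uncertainty
keep = [sc >= 0.01 for sc in self_sensitivities(sigmas)]
# The third point carries under 1% of the weight and is excluded.
```

Because weights go as 1/σ2, a determination ten times poorer than its competitors carries only about a two-hundredth of the weight, which is why it can be dropped without affecting a result quoted to two significant figures.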
6. Consistent data: the 2010 least-squares analysis Boltzmann constant
Before considering discrepant data, it may be useful to consider an acceptable dataset. The 2010 data for the Boltzmann constant are shown in figure 1.
The solid points of figure 1 are determinations of the Boltzmann constant that have passed the self-sensitivity criterion. The labels indicate the type of determination, the laboratory and date and are more fully documented in . The open points are the 2006 and 2010 CODATA recommended values and are not input data. This dataset has a χ2 of 2.04 and a probability of agreement of 96%. The Birge ratio is 0.51, and the maximum normalized residual is 0.83. From a purely statistical viewpoint, this dataset is not showing enough scatter. A 96% probability of agreement suggests that the uncertainties may be overestimated; however, the TGFC does not make any further changes to this analysis and it is accepted.
7. Discrepant data: the 2010 least-squares analysis gravitational constant
Unlike the Boltzmann data, the gravitational constant data are extremely discrepant. The full details of the experiments are beyond the scope of this publication, but two auxiliary issues are worth noting before describing the input data. Kuroda showed that the anelasticity of a torsion fibre could cause a systematic error in the results of some G determinations made with a low-Q fibre. Subsequently, attempts have been made to correct some of the earlier determinations, but most later determinations have included the Kuroda effect in their uncertainty budget evaluations. One means of eliminating this effect is to null the torsional deflection with feedback, usually using an electrostatic technique. However, losses occurring in dielectrics exposed to the fringing field have been observed, and these have also produced a systematic error in the results. Conversion to an AC servo seems to eliminate this particular problem.
There are 11 input data for the 2010 LSA that were included after the initial self-sensitivity test. The following is a list of the CODATA 2010 LSA label, some author details, additional comments and a result.
NIST-82 is the Luther & Towler  experiment from the National Bureau of Standards and the University of Virginia. It consisted of a torsion balance using the period-of-oscillation mode. A 12 μm CrAu-plated quartz fibre was used as well as magnetic damping. No correction was originally made for the Kuroda effect. G=(6.6726±0.0005)×10−11 m3 s−2 kg−1.
TR&D-97 is the Karagioz & Izmailov  experiment. A torsion balance using the period-of-oscillation mode was employed with a ferromagnetic torsion offset adjustment. An in situ fibre annealing was used and damping was magnetic. Again, no correction was made for the Kuroda effect. G=(6.6729±0.0005)×10−11 m3 s−2 kg−1.
LANL-97 is the Bagley & Luther experiment at the Los Alamos National Laboratory, which used a torsion balance in the period-of-oscillation mode. Two different tungsten fibres with different Qs were tested; the Q was varied by plating one fibre with gold, lowering it from 950 to 490. The Kuroda correction was applied. G=(6.6740±0.0007)×10−11 m3 s−2 kg−1.
UWash-00 is the Gundlach & Merkowitz experiment  at the University of Washington. Again, a torsion balance was used, but this time on a rotation table and using feedback to eliminate net rotation of the fibre (no Kuroda effect). There is a comprehensive uncertainty budget. G=(6.674215±0.000092)×10−11 m3 s−2 kg−1.
BIPM-01 is the Bureau International des Poids et Mesures (BIPM) experiment by Quinn et al. A torsion balance with a BeCu strip rather than a fibre was used. Both the electrostatic servo control and free deflection (also called the Cavendish method) techniques were studied. A very high Q (approx. 300 000) essentially eliminates the Kuroda effect. An AC servo was employed to overcome a loss mechanism introduced by fringing fields. Good agreement was achieved between the two methods and a comprehensive uncertainty budget was presented. G=(6.67559±0.00027)×10−11 m3 s−2 kg−1.
UWup-02 is the PhD thesis work of Kleinevoß  at the University of Wuppertal. A double pendulum with a microwave Fabry–Perot interferometer sensed the changes in the microwave cavity caused by movement of the attractor mass. G=(6.67422±0.00098)×10−11 m3 s−2 kg−1.
MSL-03 is the Armstrong & Fitzgerald experiment at the Measurement Standards Laboratory in New Zealand. An electrostatically compensated torsion balance (no Kuroda effect) was used with stainless-steel and copper attractor masses. G=(6.67387±0.00027)×10−11 m3 s−2 kg−1.
HUST-05 is the Hu et al. experiment at the Huazhong University of Science and Technology (HUST). A high-Q torsion balance was used in the period-of-oscillation mode. The experiment has gone through several iterations of improvements. G=(6.6723±0.0009)×10−11 m3 s−2 kg−1.
UZur-06 is the University of Zurich experiment by Schlamminger et al. A beam balance was used to measure G in the presence of the Earth's gravitational field. This was a very detailed and challenging experiment, quite different from the others presented. G=(6.674252±0.000124)×10−11 m3 s−2 kg−1.
HUST-09 is the work by Tu et al. at HUST. A high-Q torsion balance was used in the period-of-oscillation mode. The experiment was rebuilt after 2005 with new masses, improved symmetry, etc. A tungsten fibre was used, but anelastic effects were compared with a higher-Q quartz fibre. The experiment included an evaluation of fibre ageing effects, two independent G measurements and a very comprehensive uncertainty budget. G=(6.67349±0.00018)×10−11 m3 s−2 kg−1.
JILA-10 is the Parks & Faller experiment  performed at the Joint Institute for Laboratory Astrophysics. A double pendulum with a laser Fabry–Perot interferometer sensed the changes in distance caused by movement of the attractor masses. This seems to be a detailed, careful experiment. G=(6.67234±0.00014)×10−11 m3 s−2 kg−1.
The solid circles in figure 2 are determinations of the gravitational constant that have passed the self-sensitivity criterion. The open circles are the 2006 and 2010 CODATA values and are not input data. This dataset has a χ2 of 208.6 and a probability of agreement of less than 0.001%. The Birge ratio is 4.57, and the maximum normalized residual is 10.7! There is no obvious correlation with value, uncertainty or technique. This is one of the most discrepant datasets that the TGFC has ever encountered. It is not just one data point that appears to be an outlier, but many.
The task group deals with this discrepancy by applying a common multiplicative expansion factor to all of the input data uncertainties. The LSA is rerun until some acceptable combination of the consistency criteria is achieved. The task group decides on an expansion factor based on its expertise, experience and statistical indicators, but it does not simply apply a statistical criterion to a dataset that is clearly not sampled from a Gaussian distribution. It may seem surprising that the expansion factor is not chosen to exactly match a particular numerical criterion, for example one that sets the probability of agreement from the χ2 at exactly 50% or the Birge ratio exactly equal to one. Such an approach tries to force an inherently non-Gaussian dataset onto a Gaussian model with a single adjustable parameter. Letting the task group decide on the expansion factor when conventional statistics has failed may seem arbitrary, but the alternatives have their own problems.
Before discussing alternative approaches of dealing with discrepant data, let me comment on the use of the CODATA recommended values. The most important things about the recommended values are the values themselves, because many users only use the values. The uncertainties, while important, are of more concern to experts of fundamental constants and people trying to evaluate the general state of the art of physics. This suggests that our analysis when faced with a necessary compromise should be focused on preserving the value even at the expense of the uncertainty.
Over the years, many other statistical approaches of dealing with discrepant data have been considered, but so far all have been rejected for one or more reasons. In general, most approaches may be divided into three classes. The first class decreases the weights of one or more input data, and these include the rejection of outliers and maximum consistent subset approaches. This process shifts the recommended mean and reduces the recommended uncertainty while deprecating or ignoring some data. For the maximum consistent subset approach, especially, the analysis becomes redirected towards quantifying the consistency of some parts of physics instead of all of physics. This is also why the recommended uncertainty is reduced, and is no longer supported by the entire dataset.
The second class applies unknown biases to each of the input data. The size and sign of each bias is usually obtained from the minimization of an effective χ2, but the biases are usually constrained by the residuals of the original fit, the claimed uncertainties or some other parameter. The process shifts the recommended mean and reduces the recommended uncertainty. The existence of biases is often believed, but whether they can be modelled by such statistics is hotly debated. Again, the recommended uncertainty is reduced and is no longer supported by the entire dataset.
The third class is the application of expansion factors to the input uncertainties. The factors may be individual for each input datum or common to all, and they may be combined in quadrature or applied multiplicatively. The classic approach is to multiply each input uncertainty by a common expansion factor. This process does not shift the recommended mean and in that sense it is unbiased. In addition, the relative values of the uncertainties are retained. The process expands the recommended uncertainty and can result in it being supported by the ensemble of input data. However, with respect to some of the input data, the assigned uncertainty of the recommended value may seem unrealistically large.
In the case of the 2010 LSA of the gravitational constant, the task group finally accepted an expansion factor of 14. This produced a χ2 of 1.06, a probability of agreement of 1.00, a Birge ratio of 0.33 and a maximum normalized residual of 0.77, figures that may sound overly conservative even though the dataset, taken as a whole, still has elements of inconsistency. The 2010 recommended value for the gravitational constant is G=(6.67384±0.00080)×10−11 m3 s−2 kg−1. The CODATA-10 point in figure 2 is the result of this expansion factor.
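The arithmetic behind a common expansion factor is simple: multiplying every uncertainty by k divides χ2 by k2 and the Birge ratio by k, while leaving the weighted mean unchanged. A quick check (a Python sketch taking ν = 10 for the 11 retained G data points, one parameter fitted) against the χ2 of 208.6 quoted above:

```python
import math

chi2, nu, k = 208.6, 10, 14        # unexpanded chi2, degrees of freedom, expansion factor

chi2_expanded = chi2 / k ** 2      # uncertainties scale by k, so chi2 scales by 1/k^2
birge_expanded = math.sqrt(chi2_expanded / nu)  # Birge ratio of the expanded fit

print(round(chi2_expanded, 2), round(birge_expanded, 2))
```

This reproduces the expanded χ2 of about 1.06 and Birge ratio of about 0.33 quoted for the 2010 adjustment, and it also shows why the unexpanded Birge ratio of 4.57 pointed towards an expansion factor in the low teens.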
8. Conclusion and future expectations
The gravitational constant data pose a very difficult problem and one wonders if this can be resolved in a reasonable time scale. While the future is unknown, there are constraints that allow us to make some predictions. The task group has no expectation of the withdrawal of any of the existing data points, so only new data can affect the present dataset. Gravitational constant projects take many years to assemble and complete, and the task group is aware of most if not all projects and often their expected publication schedules and expected uncertainties as well. By December 2014, we will have one new data point from BIPM and possibly new points from University of California and HUST. Unfortunately, these data points are not expected to improve the present discrepancy and may well exacerbate the problem. The discrepancy problem of big G is likely to be with us for the 2014 and perhaps the 2018 LSAs.
The only hope for resolution of this problem is to obtain one or more new data points with a much smaller uncertainty than presently available. This would have the effect of eliminating some of the present data points by their reduced self-sensitivity. Achieving such a small uncertainty only seems possible with a revolutionary new measurement technique. Perhaps this is the real prediction—eventually someone will discover such a technique.
One contribution of 13 to a Theo Murphy Meeting Issue ‘The Newtonian constant of gravitation, a constant too difficult to measure?’
© 2014 The Author(s). Published by the Royal Society. All rights reserved.