The traditional crystallographic R-factor or R-value is defined as
$$R = \frac{\sum |F_{\mathrm{obs}} - F_{\mathrm{calc}}|}{\sum F_{\mathrm{obs}}}$$
This R-factor is a measure of the level of disagreement
between the (properly scaled) observed structure factors (Fobs)
and calculated structure factors (Fcalc).
It is usually reported in %, i.e. an R-factor of 0.18 is reported as 18%.
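
As a minimal sketch, the definition above could be computed in Python as follows. The function name `r_factor` and the four amplitude pairs are hypothetical and not taken from this page; the values are simply chosen so that the two sums come out as 4 and 30, and they are assumed to be on a common scale already.

```python
def r_factor(f_obs, f_calc):
    """Return sum(|Fobs - Fcalc|) / sum(Fobs) for paired amplitudes."""
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


# Hypothetical amplitudes (illustration only), already properly scaled.
f_obs = [10.0, 8.0, 7.0, 5.0]
f_calc = [9.0, 9.0, 6.0, 6.0]

r = r_factor(f_obs, f_calc)
print(f"R = {r:.3f} ({100 * r:.1f}%)")  # prints: R = 0.133 (13.3%)
```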
Note: This R-factor is distinct from Rmerge or Rsym, which report the quality of the experimental diffraction data as the average discrepancy between multiple measurements of the same reflection.
These R-factors are based on diffraction intensities:
$$R = \frac{\sum |I_{\mathrm{obs}}(i) - I_{\mathrm{mean}}|}{\sum I_{\mathrm{mean}}}$$
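
The intensity-based formula can be sketched in the same way. This is an illustration under stated assumptions, not the procedure of any particular data-processing program: the function name `r_merge`, the dictionary layout and the intensity values are all hypothetical, and both sums are taken over every individual measurement of every reflection.

```python
def r_merge(measurements):
    """measurements maps a reflection label to its list of repeated intensities."""
    numerator = 0.0
    denominator = 0.0
    for intensities in measurements.values():
        mean_i = sum(intensities) / len(intensities)
        # Each measurement contributes its deviation from the mean ...
        numerator += sum(abs(i - mean_i) for i in intensities)
        # ... and the mean itself to the denominator.
        denominator += mean_i * len(intensities)
    return numerator / denominator


# Hypothetical repeated measurements of three reflections.
data = {
    (1, 0, 0): [100.0, 110.0],
    (0, 1, 0): [50.0, 44.0, 47.0],
    (0, 0, 1): [20.0, 24.0],
}

value = r_merge(data)
print(f"Rmerge = {value:.3f} ({100 * value:.1f}%)")  # prints: Rmerge = 0.051 (5.1%)
```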
Example for R factor computed on Fs (structure factors),
like the crystallographic R factor:
R factor (F) = 4 / 30 = 0.133 or 13.3%
Example for R factor computed on equivalent Is (intensities
or squared structure factors):
R factor (I) = 82 / 308 = 0.266 or 26.6%
Example for R factor computed on equivalent Is (intensities)
as for Rsym or Rmerge:
Rmerge factor = 82 / 710 = 0.115 or 11.5%
The crystallographic R-factor is an attempt to report with a single number the degree to which a complex atomic model with four or more parameters per atom agrees with a large set of individual diffraction observations. Typically, thousands of atoms give rise to thousands of computed structure factors which are then compared to the observed structure factors to yield a single R-factor. This R-factor is not very sensitive to errors, especially in case of:
As a rule of thumb, structures with a resolution at or better than 2.5 Å and R-factors below 25% tend to be largely correct.
Crystallographic refinement is the modification of an atomic model to improve the agreement between observed and calculated structure factors (or intensities) while maintaining reasonable chemical restraints.
A major difficulty is the complexity of the task: thousands of atoms
must be moved to improve the agreement between the Fourier transform
of the model, F(calc), and the observed structure factors, F(obs),
until the best global fit is reached.
In principle, this type of non-linear optimization can be carried out by least-squares or conjugate-gradient approaches which minimize a target function, the so-called residual:
$$\mathrm{Resi} = \sum \left[ F_{\mathrm{obs}} - F_{\mathrm{calc}} \right]^2$$
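
To make the residual concrete, the sketch below evaluates it for a small set of hypothetical amplitudes and also shows one simple way to put F(calc) on the scale of F(obs): the single scale factor k that minimizes Sum [F(obs) - k F(calc)]^2, which follows from setting the derivative with respect to k to zero. The function names and numbers are assumptions for illustration; refinement programs generally use more elaborate scaling.

```python
def residual(f_obs, f_calc, scale=1.0):
    """Sum of squared differences between Fobs and scale * Fcalc."""
    return sum((fo - scale * fc) ** 2 for fo, fc in zip(f_obs, f_calc))


def least_squares_scale(f_obs, f_calc):
    """Scale k minimizing sum (Fobs - k * Fcalc)^2: k = sum(Fo*Fc) / sum(Fc^2)."""
    numerator = sum(fo * fc for fo, fc in zip(f_obs, f_calc))
    denominator = sum(fc * fc for fc in f_calc)
    return numerator / denominator


# Hypothetical amplitudes; f_calc is deliberately on half the scale of f_obs.
f_obs = [10.0, 8.0, 7.0, 5.0]
f_calc = [4.5, 4.5, 3.0, 3.0]

k = least_squares_scale(f_obs, f_calc)
print(f"scale k             = {k:.2f}")                           # 2.00
print(f"residual, unscaled  = {residual(f_obs, f_calc):.2f}")     # 62.50
print(f"residual, scaled    = {residual(f_obs, f_calc, k):.2f}")  # 4.00
```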
Another problem is that, because of the high dimensionality of the
problem, a vast number of local minima exist which standard
least-squares algorithms cannot easily distinguish from the global minimum.
Because these algorithms apply a linear approximation to a highly
non-linear problem, their radius of convergence is very limited.
The radius of convergence can be increased substantially
by using simulated annealing techniques (XPLOR, CNS): in simplified
terms, thermal motion during molecular dynamics helps the refinement
overcome local minima.
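
The idea of escaping local minima can be illustrated with a toy problem. The sketch below is not the molecular-dynamics annealing used by refinement programs; it swaps in a plain Metropolis Monte Carlo annealer on a hypothetical one-dimensional function with several minima, and compares it with simple gradient descent from the same starting point. All names, step sizes and temperatures are assumptions chosen for illustration.

```python
import math
import random


def target(x):
    """Hypothetical 1-D stand-in for a residual with several local minima."""
    return 0.1 * x * x + math.sin(3.0 * x) + 1.0


def gradient(x):
    return 0.2 * x + 3.0 * math.cos(3.0 * x)


def descend(x, step=0.01, iterations=2000):
    """Plain gradient descent: slides into the nearest local minimum."""
    for _ in range(iterations):
        x -= step * gradient(x)
    return x


def anneal(x, rng, t_start=2.0, t_end=0.01, iterations=5000):
    """Metropolis annealing: uphill moves are accepted with a probability
    that shrinks as the temperature is lowered."""
    current_f = target(x)
    best_x, best_f = x, current_f
    for n in range(iterations):
        temperature = t_start * (t_end / t_start) ** (n / iterations)
        candidate = x + rng.gauss(0.0, 0.5)
        candidate_f = target(candidate)
        delta = candidate_f - current_f
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            x, current_f = candidate, candidate_f
            if current_f < best_f:
                best_x, best_f = x, current_f
    return best_x


start = 4.0
local_x = descend(start)
# Anneal from the local minimum, then polish the best point by descent.
annealed_x = descend(anneal(local_x, random.Random(0)))

print(f"descent only : x = {local_x:.3f}, target = {target(local_x):.3f}")
print(f"with anneal  : x = {annealed_x:.3f}, target = {target(annealed_x):.3f}")
```

Because the annealer keeps track of the best point it has visited, the annealed result in this sketch can never be worse than the descent-only result; whether it reaches the global minimum depends on the temperature schedule and the random seed.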
A look at the equation for the residual above
shows that it is closely related to the numerator of the R-factor equation,
Sum |F(obs) - F(calc)|. Especially in low resolution cases,
where the observations-to-parameters ratio is low (less than 1),
it is therefore easy to refine to a low residual (the target
of the minimizer) and hence to a low R-factor, but probably at
the cost of a stereochemically ill-defined model. To prevent such
over-refinement, the Rfree (see section below) and the stereochemistry
of the model (see next set of pages) should be checked constantly.
If the starting model comes from molecular replacement,
it is likely to contain large systematic errors. A preliminary step
of rigid body refinement should be carried out to reduce the largest
errors in coordinates. In this technique the model is divided into
a small number of 'rigid bodies'. The number of parameters
in this refinement is much smaller (6 per rigid body as opposed to 3 per
atom), so one can refine using only low resolution data (for example
4-12 Å) and thus greatly increase the radius of convergence. Only once
the R-factors (R and Rfree, see below) have dropped to the low 40% range
or lower should one begin to gradually include the high resolution data
as well.
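
The parameter counts quoted above can be turned into a quick observations-to-parameters estimate. The numbers of atoms, rigid bodies and low-resolution reflections below are hypothetical and serve only to show the arithmetic.

```python
def parameter_counts(n_atoms, n_rigid_bodies):
    """Positional parameters: 3 per atom versus 6 per rigid body."""
    return {
        "per-atom refinement": 3 * n_atoms,
        "rigid body refinement": 6 * n_rigid_bodies,
    }


# Hypothetical model and data set (illustration only).
n_reflections = 1500  # low-resolution reflections, e.g. a 4-12 Å shell
counts = parameter_counts(n_atoms=2000, n_rigid_bodies=2)

for name, n_params in counts.items():
    ratio = n_reflections / n_params
    print(f"{name}: {n_params} parameters, observations/parameters = {ratio:.2f}")
```

With these assumed numbers the per-atom refinement has fewer observations than parameters (ratio 0.25), whereas the rigid body refinement is heavily overdetermined (ratio 125).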
One of the more recent advances in the field of macromolecular
refinement has been the introduction of the free R-factor or Rfree
concept by Bruenger and coworkers (1992). It is based on the statistical
method called cross-validation. The free R-factor was introduced
because it is much less strongly coupled to the target function used during
minimization (the residual) than the R-factor is. It measures the degree
to which the atomic model predicts a subset of the observed diffraction
data that has been omitted from the refinement.
So instead of letting the minimizer refine against the
complete dataset (all F(obs)), a random subset (5-10%) of the dataset is
set aside and labeled the free or test set. The remaining
90-95% of the dataset (the working set) is used to form the target
function for refinement and to compute the traditional crystallographic
R-factor.
In contrast, the Rfree is the R-factor calculated for the test set alone.
The Rfree is commonly 2-8% higher than the regular R-factor.
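
The bookkeeping behind the working and test sets can be sketched as follows. The function names, the 5% fraction, the seeds and the synthetic amplitudes are assumptions for illustration. No refinement is performed here, so the sketch only shows how the two R-factors are computed from disjoint subsets, not the gap that develops between them during refinement.

```python
import random


def r_factor(f_obs, f_calc):
    return sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def split_test_set(n_reflections, fraction=0.05, seed=0):
    """Randomly pick the indices of the reflections set aside as the test set."""
    rng = random.Random(seed)
    n_test = max(1, round(fraction * n_reflections))
    return set(rng.sample(range(n_reflections), n_test))


def r_work_and_r_free(f_obs, f_calc, test_indices):
    """R-factor over the working set and over the test set, kept separate."""
    work = [(fo, fc) for i, (fo, fc) in enumerate(zip(f_obs, f_calc))
            if i not in test_indices]
    free = [(fo, fc) for i, (fo, fc) in enumerate(zip(f_obs, f_calc))
            if i in test_indices]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work])
    r_free = r_factor([fo for fo, _ in free], [fc for _, fc in free])
    return r_work, r_free


# Synthetic data (illustration only): Fcalc deviates from Fobs by up to 20%.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(200)]
f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]

test_set = split_test_set(len(f_obs), fraction=0.05)
r_work, r_free = r_work_and_r_free(f_obs, f_calc, test_set)
print(f"test set size = {len(test_set)}")
print(f"R (working set) = {r_work:.3f}, Rfree (test set) = {r_free:.3f}")
```

In a real refinement the test set would be chosen once, before any refinement, and kept fixed afterwards, so that the test reflections never contribute to the target function.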
To avoid bias of the test set and thus of the Rfree:
[Table: R-factor and Rfree for a random structure, an initial MIR model, an initial MR model, a refined 2.5 Å structure without solvent, a fully refined 2.5 Å structure, a fully refined 1.0 Å protein structure, and a fully refined small molecule structure. The numerical values did not survive extraction.]