R-factor calculations and their significance

 

The traditional crystallographic R-factor or R-value


The traditional crystallographic R-factor or R-value is defined as
 

         Sum |F(obs) - F(calc)|
    R = ------------------------
               Sum F(obs)
 

This R-factor is a measure of the level of disagreement between the (properly scaled) observed structure factors (Fobs) and calculated structure factors (Fcalc).  It is usually reported in %, i.e. an R-factor of 0.18 is reported as 18%.
 

Note: This R-factor is distinct from the Rmerge or Rsym values, which report the quality of the experimental diffraction data as the average discrepancy between multiple measurements of the same reflection.  Those R-factors are based on diffraction intensities:
 

           Sum |I(obs(i)) - I(mean)|
      R = ---------------------------
                   Sum I(mean)
 
 
 

Example for R factor computed on Fs (structure factors), like the crystallographic R factor:
 

    F(obs)      F(calc)     |F(obs) - F(calc)|
    10          11          1
    12          10          2
     8           9          1
    Sum = 30    Sum = 30    Sum = 4

R factor (F) = 4 / 30 = 0.133 or 13.3%














Example for R factor computed on equivalent Is (intensities or squared structure factors):
 

    I(obs)       I(calc)      |I(obs) - I(calc)|
    100          121          21
    144          100          44
     64           81          17
    Sum = 308    Sum = 302    Sum = 82

R factor (I) = 82 / 308 = 0.266 or 26.6%
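
The two worked examples above can be reproduced in a few lines of Python. This is only a minimal sketch: the function name r_factor is an illustrative choice, and the observed and calculated values are assumed to be on a common scale already.

```python
def r_factor(obs, calc):
    """Return sum(|obs - calc|) / sum(obs) for two equal-length sequences."""
    if len(obs) != len(calc):
        raise ValueError("obs and calc must have the same length")
    numerator = sum(abs(o - c) for o, c in zip(obs, calc))
    denominator = sum(obs)
    return numerator / denominator

# Structure factor amplitudes from the first worked example.
f_obs = [10, 12, 8]
f_calc = [11, 10, 9]
r_f = r_factor(f_obs, f_calc)

# The equivalent intensities are the squared amplitudes.
i_obs = [f * f for f in f_obs]
i_calc = [f * f for f in f_calc]
r_i = r_factor(i_obs, i_calc)

print(f"R(F) = {r_f:.3f}")  # 4 / 30
print(f"R(I) = {r_i:.3f}")  # 82 / 308
```

For the same data the intensity-based value comes out at roughly twice the amplitude-based one, since squaring approximately doubles small relative differences.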








Example for R factor computed on equivalent Is (intensities) as for Rsym or Rmerge:
 

    I(1)         I(2)         I(mean)        |I(n) - I(mean)|
    100          121          110.5          10.5 / 10.5
    144          100          122            22 / 22
     64           81           72.5          8.5 / 8.5
    Sum = 308    Sum = 302    Sum = 2*305    Sum = 82

Rmerge factor = 82 / 610 = 0.134 or 13.4%
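
The Rmerge example can be sketched in the same way. The function name r_merge and the list-of-lists layout are assumptions made for illustration. The denominator counts I(mean) once per measurement, which corresponds to the "2*305" entry in the table, i.e. 610.

```python
def r_merge(measurements):
    """Rmerge for repeated intensity measurements.

    measurements: one inner list per unique reflection, holding all
    measurements of that reflection.
    """
    numerator = 0.0
    denominator = 0.0
    for obs in measurements:
        mean = sum(obs) / len(obs)
        numerator += sum(abs(i - mean) for i in obs)
        # I(mean) enters the denominator once per measurement.
        denominator += mean * len(obs)
    return numerator / denominator

# The three reflections of the example, each measured twice.
reflections = [[100, 121], [144, 100], [64, 81]]
print(f"Rmerge = {r_merge(reflections):.3f}")  # 82 / 610
```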





The crystallographic R-factor is an attempt to report with a single number the degree to which a complex atomic model with four or more parameters per atom agrees with a large set of individual diffraction observations.  Typically, thousands of atoms give rise to thousands of computed structure factors which are then compared to the observed structure factors to yield a single R-factor.  This R-factor is not very sensitive to errors, especially in case of:


 

As a rule of thumb, structures with a resolution of 2.5 Å or better and R-factors below 25% tend to be largely correct.


 

A little discourse on crystallographic refinement (supposed to have been covered in previous lectures)

 

Crystallographic refinement is the modification of an atomic model to improve the agreement between observed and calculated structure factors (or intensities) while maintaining reasonable chemical restraints.


 

A major difficulty is the sheer complexity of the task:  thousands of atoms must be moved to improve the agreement between F(calc), the Fourier transform of the atomic model, and the observed structure factors F(obs), with the aim of reaching the best global fit.
 

In principle, this type of non-linear optimization can be carried out by least-squares or conjugate-gradient approaches which minimize a target function, the so-called residual:

  Resi = Sum [F(obs) - F(calc)]^2
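
As a minimal sketch, the residual can be evaluated as follows. The least-squares scale factor k is an assumption added here for illustration (the text only says that the structure factors are properly scaled), and both function names are hypothetical.

```python
def least_squares_scale(f_obs, f_calc):
    """Scale k that minimizes sum((f_obs - k * f_calc)^2)."""
    return sum(o * c for o, c in zip(f_obs, f_calc)) / sum(c * c for c in f_calc)

def residual(f_obs, f_calc, k=1.0):
    """Sum of squared differences between observed and scaled calculated amplitudes."""
    return sum((o - k * c) ** 2 for o, c in zip(f_obs, f_calc))

# Same numbers as in the R-factor example on structure factors.
f_obs = [10, 12, 8]
f_calc = [11, 10, 9]

k = least_squares_scale(f_obs, f_calc)  # 302 / 302 for these numbers
print(k, residual(f_obs, f_calc, k))    # residual = 1 + 4 + 1
```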
 

Another problem is the fact that due to the high dimensionality of the problem a vast number of local minima exist which standard least-squares algorithms cannot easily distinguish from the global minimum.  This results in a very limited radius of convergence because of the use of a linear approximation to a highly non-linear problem.
 
 
 

The radius of convergence can be increased substantially by using simulated annealing techniques (XPLOR, CNS):  in simplified terms, thermal motions during molecular dynamics are used to help the refinement overcome local minima.
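
The idea of escaping local minima can be illustrated with a toy problem. Note that this sketch is not the method used by XPLOR or CNS: it swaps molecular dynamics for a simple Monte Carlo (Metropolis) search, and the one-dimensional two-atom "crystal", the cooling schedule and all names are assumptions chosen only to keep the example short.

```python
import cmath
import math
import random

def amplitudes(x, h_max=8):
    """|F(h)|, h = 1..h_max, for a 1D toy cell with unit scatterers at 0 and x."""
    return [abs(1 + cmath.exp(2j * math.pi * h * x)) for h in range(1, h_max + 1)]

def residual(x, f_obs):
    """Sum of squared amplitude differences for the trial position x."""
    return sum((fo - fc) ** 2 for fo, fc in zip(f_obs, amplitudes(x, len(f_obs))))

def anneal(f_obs, x_start, t_start=5.0, t_end=0.01, n_steps=5000,
           max_shift=0.05, seed=0):
    """Metropolis search with geometric cooling; returns the best (x, residual) seen."""
    rng = random.Random(seed)
    x, e = x_start, residual(x_start, f_obs)
    best_x, best_e = x, e
    for step in range(n_steps):
        # Temperature falls geometrically from t_start to t_end.
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        x_new = (x + rng.uniform(-max_shift, max_shift)) % 1.0
        e_new = residual(x_new, f_obs)
        # Downhill moves are always accepted, uphill moves with
        # probability exp(-(increase) / t).
        if e_new <= e or rng.random() < math.exp((e - e_new) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

f_obs = amplitudes(0.30)   # synthetic "observed" data from the true position
x_start = 0.05             # deliberately poor starting model
best_x, best_e = anneal(f_obs, x_start)
print(residual(x_start, f_obs), best_x, best_e)
```

Because only amplitudes are compared, x and 1 - x give identical residuals in this toy cell, so either 0.30 or 0.70 counts as the correct answer. The accepted uphill moves at high temperature are what allow the search to leave a local minimum that a purely downhill minimizer would stay in.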
 

One look at the equation for the residual above shows that it is closely related to the numerator of the R-factor equation which is the Sum (|F(obs) - F(calc)|).  So especially for low resolution cases where the observations-to-parameters ratio is low (less than 1) it is easy to refine to a low residual (the target of the minimizer) which will result in a low R factor, but probably at the cost of a stereochemically ill-defined model.  To prevent such over-refinement the Rfree (see section below) and the stereochemistry of the model (see next set of pages) should be checked constantly.
 
 
 

Practical refinement hints

If the starting model for refinement comes from the interpretation of MIR maps, the model should be fairly complete (80%+ of the main chain atoms and 50%+ of the side chain atoms) before meaningful refinement can be attempted.
 

If the starting model comes from molecular replacement it is likely to contain large systematic errors.  A preliminary step of rigid body refinement should be carried out to reduce the largest errors in coordinates.  In this technique the model is divided into a small number of 'rigid bodies'.  The number of parameters in this refinement is much smaller (6 per rigid body as opposed to 3 per atom), so one can refine using only low resolution data (for example 4-12 Å) and thus greatly increase the radius of convergence.  Only once the R-factors (R and Rfree, see below) have dropped to the low 40% range or lower should one begin to gradually include the high resolution data as well.
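
The reduction in parameters, and the restriction to low resolution data, can be made concrete with a short sketch. The model size, the number of rigid bodies and the d-spacings below are made-up illustration values, and the function names are hypothetical.

```python
def n_params_per_atom(n_atoms, params_per_atom=3):
    """Positional parameters (x, y, z) for individual-atom refinement."""
    return n_atoms * params_per_atom

def n_params_rigid_body(n_bodies, params_per_body=6):
    """Three rotations plus three translations per rigid body."""
    return n_bodies * params_per_body

def select_resolution_range(d_spacings, d_min=4.0, d_max=12.0):
    """Keep reflections whose d-spacing (in Å) lies within [d_min, d_max]."""
    return [d for d in d_spacings if d_min <= d <= d_max]

n_atoms, n_bodies = 2000, 2            # hypothetical model: 2000 atoms, two domains
print(n_params_per_atom(n_atoms))      # 6000
print(n_params_rigid_body(n_bodies))   # 12

d_spacings = [15.0, 11.2, 8.0, 5.5, 4.0, 3.1, 2.5]   # made-up values in Å
print(select_resolution_range(d_spacings))           # [11.2, 8.0, 5.5, 4.0]
```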
 
 
 

The Free R factor


One of the more recent advances in the field of macromolecular refinement has been the introduction of the free R-factor or Rfree concept by Bruenger and coworkers (1992).  It is based on the statistical method called cross-validation.  The free R-factor was introduced because it is much less strongly coupled to the target function used during minimization (the residual) than the R-factor is.  It measures the degree to which the atomic model predicts a subset of the observed diffraction data that has been omitted from the refinement.
 

So instead of letting the minimizer refine against the complete dataset (all F(obs)), a random subset (5-10%) of the dataset is set aside and labeled the free or test set.  The remaining 90-95% of the dataset (the working set) is used to form the target function for refinement and to compute the traditional crystallographic R-factor.
 

In contrast, the Rfree is the R-factor calculated for the test set alone.  The Rfree is commonly 2-8% higher than the regular R-factor.
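
The bookkeeping behind the working and test sets might look like the sketch below. The data are synthetic stand-ins, not real measurements, and the function names and the 5% fraction are illustrative choices taken from the range given above.

```python
import random

def r_factor(obs, calc):
    """sum(|obs - calc|) / sum(obs)."""
    return sum(abs(o - c) for o, c in zip(obs, calc)) / sum(obs)

def split_free_set(n_reflections, fraction=0.05, seed=0):
    """Return one boolean per reflection, True if it belongs to the test (free) set."""
    rng = random.Random(seed)
    n_free = round(n_reflections * fraction)
    free_indices = set(rng.sample(range(n_reflections), n_free))
    return [i in free_indices for i in range(n_reflections)]

# Synthetic data: "calculated" values are the observed ones with 20% Gaussian error.
rng = random.Random(1)
f_obs = [rng.uniform(5.0, 50.0) for _ in range(2000)]
f_calc = [f * (1.0 + rng.gauss(0.0, 0.2)) for f in f_obs]

is_free = split_free_set(len(f_obs), fraction=0.05)
work_set = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if not free]
free_set = [(o, c) for o, c, free in zip(f_obs, f_calc, is_free) if free]

r_work = r_factor(*zip(*work_set))   # R-factor of the working set
r_free = r_factor(*zip(*free_set))   # Rfree, computed from the test set alone
print(f"R = {r_work:.3f}, Rfree = {r_free:.3f}")
```

No refinement is performed in this sketch, so the two values differ only by sampling noise. The gap between R and Rfree described in the text appears only once a model has been fitted against the working set, and the flags must stay fixed for the whole refinement for Rfree to remain meaningful.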
 

To avoid bias of the test set and thus of the Rfree:

 
 
 
 
 
 
 
 

Typical R-factors

 
    Type                                       R-factor    Free R-factor
    Random structure                           57+%
    Initial MIR model                          40-50%
    Initial MR model                           35-52%
    Refined 2.5 Å structure without solvent    25%         30%
    Fully refined 2.5 Å structure              22%         27%
    Fully refined 1.0 Å protein structure      8.8%        10.8%
    Fully refined small molecule structure     3%

 
 
 

R-factor cosmetics

 

 

    Sigma cutoff for inclusion of Fs    R-factor    Rfree    Completeness of highest-resolution shell
    0                                   23.5%       31.9%    82.8%
    2                                   22.1%       30.7%    75.2%
    3                                   21.7%       30.3%    71.4%
    4                                   21.3%       29.9%    67.5%
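
The trend in the table, a lower R-factor bought at the price of lower completeness, can be imitated with a small sketch. The five reflections are invented for illustration, the function names are hypothetical, and the fraction of reflections kept stands in for the completeness of a resolution shell.

```python
def apply_sigma_cutoff(reflections, cutoff):
    """Keep reflections with F(obs) >= cutoff * sigma(F).

    reflections: list of (f_obs, sigma, f_calc) tuples.
    """
    return [r for r in reflections if r[0] >= cutoff * r[1]]

def r_factor(reflections):
    """sum(|F(obs) - F(calc)|) / sum(F(obs)) over the given reflections."""
    numerator = sum(abs(fo - fc) for fo, _, fc in reflections)
    return numerator / sum(fo for fo, _, _ in reflections)

# Invented data: the weak reflections carry the largest relative errors.
data = [
    (50.0, 2.0, 48.0),
    (30.0, 2.0, 32.0),
    (10.0, 2.0, 12.0),
    (5.0, 2.0, 8.0),
    (3.0, 2.0, 6.0),
]

for cutoff in (0, 2, 3, 4):
    kept = apply_sigma_cutoff(data, cutoff)
    fraction_kept = len(kept) / len(data)
    print(f"cutoff {cutoff}: kept {fraction_kept:.0%}, R = {r_factor(kept):.3f}")
```

Discarding weak reflections removes the terms with the worst relative agreement, so the R-factor falls without the model having improved at all.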

 
 
 

References & weblinks

  1. Crystallographic refinement by simulated annealing: methods and applications by A.T. Bruenger, Meth. Enz. 277, 243-269 (1997).
  2. Free R Value: Cross-Validation in Crystallography by A.T. Bruenger, Meth. Enz. 277, 366-396 (1997).

 
