Data reporting for publications

 

You have solved, refined & analyzed your MIR/MR structure. Now what?

Assemble the data collection and refinement statistics, usually in the form of a table.
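
Two entries that such a table normally contains are R_work and R_free. The conventional crystallographic R-factor is R = sum(| |F_obs| - k|F_calc| |) / sum(|F_obs|). R_work is evaluated over the reflections used in refinement, and R_free over the test set held back from refinement (5% of the reflections in the 1C3W header shown later in this section). A minimal sketch, assuming a single overall scale factor k (real refinement programs use more elaborate scaling, including bulk solvent) and made-up toy amplitudes:

```python
from typing import Sequence


def r_factor(f_obs: Sequence[float], f_calc: Sequence[float], k: float = 1.0) -> float:
    """R = sum(| |Fobs| - k|Fcalc| |) / sum(|Fobs|).

    k is one overall scale factor; this is a simplifying assumption.
    """
    if len(f_obs) != len(f_calc):
        raise ValueError("f_obs and f_calc must have the same length")
    num = sum(abs(abs(fo) - k * abs(fc)) for fo, fc in zip(f_obs, f_calc))
    den = sum(abs(fo) for fo in f_obs)
    if den == 0:
        raise ValueError("no reflections (or all |Fobs| are zero)")
    return num / den


def r_work_r_free(f_obs, f_calc, free_flags, k: float = 1.0):
    """Split reflections by a boolean test-set flag and return (R_work, R_free)."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, free_flags) if free]
    r_work = r_factor([fo for fo, _ in work], [fc for _, fc in work], k)
    r_free = r_factor([fo for fo, _ in test], [fc for _, fc in test], k)
    return r_work, r_free


if __name__ == "__main__":
    # Toy amplitudes, purely illustrative (not data from any real structure).
    f_obs = [10.0, 20.0, 30.0, 40.0]
    f_calc = [11.0, 18.0, 30.0, 44.0]
    free = [False, False, False, True]  # last reflection is in the test set
    print(r_work_r_free(f_obs, f_calc, free))
```

Because the test-set reflections never enter refinement, R_free cannot be lowered simply by over-fitting the model, which is why both values are reported side by side.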



Prepare the atomic coordinates as a PDB-format file for deposition

In a PDB file every atom is represented by one line starting with 'ATOM' ('HETATM' for ligands, waters and other hetero groups). The record label is followed, in fixed-column format, by the atom number, atom name, residue name, [chain name], residue number, Cartesian coordinates (x, y, z in Å), occupancy and B-factor:

ATOM      1  N   GLN A   3      -7.473   2.493   8.251  1.00 91.59
ATOM      2  CA  GLN A   3      -6.052   2.183   8.590  1.00 91.36
ATOM      3  C   GLN A   3      -5.650   2.944   9.849  1.00 90.93
ATOM      4  O   GLN A   3      -4.559   3.527   9.928  1.00 90.73
ATOM      5  CB  GLN A   3      -5.130   2.542   7.420  1.00 91.75
ATOM      6  N   ILE A   4      -6.557   2.930  10.824  1.00 90.45
ATOM      7  CA  ILE A   4      -6.368   3.594  12.115  1.00 90.40
ATOM      8  C   ILE A   4      -5.029   3.198  12.762  1.00 90.28
ATOM      9  O   ILE A   4      -4.109   4.023  12.872  1.00 89.61
ATOM     10  CB  ILE A   4      -7.552   3.258  13.085  1.00 90.26
ATOM     11  CG1 ILE A   4      -7.256   3.762  14.504  1.00 89.86
ATOM     12  CG2 ILE A   4      -7.855   1.754  13.068  1.00 89.61
ATOM     13  CD1 ILE A   4      -8.232   3.270  15.565  1.00 89.31
...etc...
END
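
Because the format is fixed-column, a record should be read by column position rather than by splitting on whitespace: adjacent fields can run together (e.g. a coordinate like -100.123 fills its whole 8-character field). A minimal sketch, assuming the standard PDB column layout (x in columns 31-38, y in 39-46, z in 47-54, occupancy in 55-60, B-factor in 61-66) and ignoring alternate locations, insertion codes and element symbols; the class and function names are made up for illustration:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Atom:
    serial: int
    name: str       # atom name, e.g. "CA"
    res_name: str   # residue name, e.g. "GLN"
    chain: str      # chain identifier (may be blank)
    res_seq: int    # residue number
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float


def parse_atom_line(line: str) -> Atom:
    """Parse one ATOM/HETATM record by fixed column position.

    PDB columns are 1-based and inclusive; Python slices are 0-based and
    end-exclusive, so e.g. x in columns 31-38 becomes line[30:38].
    """
    record = line[0:6].strip()
    if record not in ("ATOM", "HETATM"):
        raise ValueError(f"not an ATOM/HETATM record: {record!r}")
    return Atom(
        serial=int(line[6:11]),
        name=line[12:16].strip(),
        res_name=line[17:20].strip(),
        chain=line[21:22].strip(),
        res_seq=int(line[22:26]),
        x=float(line[30:38]),
        y=float(line[38:46]),
        z=float(line[46:54]),
        occupancy=float(line[54:60]),
        b_factor=float(line[60:66]),
    )


def read_atoms(pdb_text: str) -> List[Atom]:
    """Return all ATOM/HETATM records; other records (HEADER, END, ...) are skipped."""
    return [parse_atom_line(line) for line in pdb_text.splitlines()
            if line.startswith(("ATOM  ", "HETATM"))]


if __name__ == "__main__":
    sample = (
        "ATOM      1  N   GLN A   3      -7.473   2.493   8.251  1.00 91.59\n"
        "ATOM      2  CA  GLN A   3      -6.052   2.183   8.590  1.00 91.36\n"
        "END\n"
    )
    atoms = read_atoms(sample)
    mean_b = sum(a.b_factor for a in atoms) / len(atoms)
    print(len(atoms), atoms[1].name, atoms[1].res_name, mean_b)
```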
 
 

Final PDB files also have an extensive header section that contains many details about the macromolecule, references, cofactors, data collection, refinement, etc. This header usually contains more detail than the primary reference. Here is an excerpt from entry 1C3W:

HEADER    ION TRANSPORT                           28-JUL-99   1C3W
TITLE     BACTERIORHODOPSIN/LIPID COMPLEX AT 1.55 A RESOLUTION
COMPND    MOL_ID: 1;
COMPND   2 MOLECULE: BACTERIORHODOPSIN (GROUND STATE WILD TYPE "BR");
COMPND   3 CHAIN: A;
COMPND   4 ENGINEERED: YES;
COMPND   5 BIOLOGICAL_UNIT: HOMOTRIMER;
COMPND   6 OTHER_DETAILS: SCHIFF BASE LINKAGE BETWEEN LYS 216 (NZ)
COMPND   7 AND RET 301 (C15) DIETHER LIPID BILAYER
SOURCE    MOL_ID: 1;
SOURCE   2 ORGANISM_SCIENTIFIC: HALOBACTERIUM SALINARUM;
SOURCE   3 CELLULAR_LOCATION: PLASMA MEMBRANE;
SOURCE   4 EXPRESSION_SYSTEM: HALOBACTERIUM SALINARUM;
SOURCE   5 EXPRESSION_SYSTEM_CELLULAR_LOCATION: CYTOPLASM;
SOURCE   6 OTHER_DETAILS: THIS SEQUENCE OCCURS NATURALLY IN H.
SOURCE   7 SALINARUM
KEYWDS    ION PUMP, MEMBRANE PROTEIN, RETINAL PROTEIN, LIPIDS,
KEYWDS   2 PHOTORECEPTOR, HALOARCHAEA, 7-TRANSMEMBRANE, SERPENTINE,
KEYWDS   3 ION TRANSPORT, MEROHEDRAL TWINNING
EXPDTA    X-RAY DIFFRACTION
AUTHOR    H.LUECKE
REVDAT   2   22-SEP-99 1C3W    1       REMARK HETNAM
REVDAT   1   15-SEP-99 1C3W    0
JRNL        AUTH   H.LUECKE,B.SCHOBERT,H.-T.RICHTER,J.-P.CARTAILLER,
JRNL        AUTH 2 J.K.LANYI
JRNL        TITL   STRUCTURE OF BACTERIORHODOPSIN AT 1.55 ANGSTROM
JRNL        TITL 2 RESOLUTION
JRNL        REF    J.MOL.BIOL.                   V. 291   899 1999
JRNL        REFN   ASTM JMOBAK  UK ISSN 0022-2836                 0070
REMARK   1
REMARK   1 REFERENCE 1
REMARK   1  AUTH   H.LUECKE,H.-T.RICHTER,J.K.LANYI
REMARK   1  TITL   PROTON TRANSFER PATHWAYS IN BACTERIORHODOPSIN AT
REMARK   1  TITL 2 2.3 ANGSTROM RESOLUTION
REMARK   1  REF    SCIENCE                       V. 280  1934 1998
REMARK   1  REFN   ASTM SCIEAS  US ISSN 0036-8075                 0038
REMARK   2
REMARK   2 RESOLUTION. 1.55 ANGSTROMS.
REMARK   3
REMARK   3 REFINEMENT.
REMARK   3   PROGRAM     : SHELXL-97
REMARK   3   AUTHORS     : G.M.SHELDRICK
REMARK   3
REMARK   3  DATA USED IN REFINEMENT.
REMARK   3   RESOLUTION RANGE HIGH (ANGSTROMS) : 1.55
REMARK   3   RESOLUTION RANGE LOW  (ANGSTROMS) : 12.0
REMARK   3   DATA CUTOFF            (SIGMA(F)) : 0.000
REMARK   3   COMPLETENESS FOR RANGE        (%) : 99.1
REMARK   3   CROSS-VALIDATION METHOD           : THROUGHOUT
REMARK   3   FREE R VALUE TEST SET SELECTION   : THIN RESOLUTION
REMARK   3                                       SHELLS
REMARK   3
REMARK   3  FIT TO DATA USED IN REFINEMENT (NO CUTOFF).
REMARK   3   R VALUE   (WORKING + TEST SET, NO CUTOFF) : NULL
REMARK   3   R VALUE          (WORKING SET, NO CUTOFF) : 0.158
REMARK   3   FREE R VALUE                  (NO CUTOFF) : 0.225
REMARK   3   FREE R VALUE TEST SET SIZE (%, NO CUTOFF) : 5.00
REMARK   3   FREE R VALUE TEST SET COUNT   (NO CUTOFF) : 1687
REMARK   3   TOTAL NUMBER OF REFLECTIONS   (NO CUTOFF) : 32249
REMARK   3
REMARK   3  FIT/AGREEMENT OF MODEL FOR DATA WITH F>4SIG(F).
REMARK   3   R VALUE   (WORKING + TEST SET, F>4SIG(F)) : NULL
REMARK   3   R VALUE          (WORKING SET, F>4SIG(F)) : 0.140
REMARK   3   FREE R VALUE                  (F>4SIG(F)) : 0.201
REMARK   3   FREE R VALUE TEST SET SIZE (%, F>4SIG(F)) : 5.00
REMARK   3   FREE R VALUE TEST SET COUNT   (F>4SIG(F)) : 1390
REMARK   3   TOTAL NUMBER OF REFLECTIONS   (F>4SIG(F)) : 26270
REMARK   3
REMARK   3  NUMBER OF NON-HYDROGEN ATOMS USED IN REFINEMENT.
REMARK   3   PROTEIN ATOMS      :       1720
REMARK   3   NUCLEIC ACID ATOMS :       0
REMARK   3   HETEROGEN ATOMS    :       330
REMARK   3   SOLVENT ATOMS      :       23
REMARK   3
REMARK   3  MODEL REFINEMENT.
REMARK   3   OCCUPANCY SUM OF NON-HYDROGEN ATOMS      : 2073.00
REMARK   3   OCCUPANCY SUM OF HYDROGEN ATOMS          : NULL
REMARK   3   NUMBER OF DISCRETELY DISORDERED RESIDUES : 0
REMARK   3   NUMBER OF LEAST-SQUARES PARAMETERS       : 8300
REMARK   3   NUMBER OF RESTRAINTS                     : 8209
REMARK   3
REMARK   3  RMS DEVIATIONS FROM RESTRAINT TARGET VALUES.
REMARK   3   BOND LENGTHS                         (A) : 0.010
REMARK   3   ANGLE DISTANCES                      (A) : 0.030
REMARK   3   SIMILAR DISTANCES (NO TARGET VALUES) (A) : NULL
REMARK   3   DISTANCES FROM RESTRAINT PLANES      (A) : 0.264
REMARK   3   ZERO CHIRAL VOLUMES               (A**3) : 0.072
REMARK   3   NON-ZERO CHIRAL VOLUMES           (A**3) : 0.079
REMARK   3   ANTI-BUMPING DISTANCE RESTRAINTS     (A) : 0.013
REMARK   3   RIGID-BOND ADP COMPONENTS         (A**2) : NULL
REMARK   3   SIMILAR ADP COMPONENTS            (A**2) : NULL
REMARK   3   APPROXIMATELY ISOTROPIC ADPS      (A**2) : NULL
REMARK   3
REMARK   3  BULK SOLVENT MODELING.
REMARK   3   METHOD USED: SHELXL-97 SWAT, BABINET'S PRINCIPLE
...
HELIX    1   1 GLU A    9  GLY A   31  1                                  23
HELIX    2   2 ASP A   36  GLY A   63  1                                  28
HELIX    3   3 TRP A   80  VAL A  101  1                                  22
HELIX    4   4 ASP A  104  THR A  128  1                                  25
HELIX    5   5 VAL A  130  GLY A  155  1                                  26
HELIX    6   6 ARG A  164  GLY A  192  1                                  29
HELIX    7   7 PRO A  200  SER A  226  1                                  27
HELIX    8   8 ARG A  227  PHE A  230  5                                   4
SHEET    1   A 2 LEU A  66  PHE A  71  0
SHEET    2   A 2 GLU A  74  TYR A  79 -1  N  GLU A  74   O  PHE A  71
LINK         NZ  LYS A 216                 C15 RET A 301
CRYST1   60.631   60.631  108.156  90.00  90.00 120.00 P 63
...
ATOM      1  N   THR A   5      24.150  25.374 -13.588  1.00 61.71           N
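
Header records are also fixed-format or "KEY : VALUE" style, which makes it easy to pull numbers out for the statistics table. The sketch below (function names are hypothetical) reads the unit cell and space group from CRYST1, computes the cell volume with the general formula V = abc * sqrt(1 - cos²α - cos²β - cos²γ + 2 cosα cosβ cosγ), and collects the "KEY : VALUE" pairs of REMARK 3. It assumes the standard CRYST1 column layout and simply ignores REMARK 3 continuation lines (such as "SHELLS" above):

```python
import math
from typing import Dict


def parse_cryst1(line: str) -> dict:
    """Unit cell (Å, degrees) and space group from a CRYST1 record (fixed columns)."""
    return {
        "a": float(line[6:15]), "b": float(line[15:24]), "c": float(line[24:33]),
        "alpha": float(line[33:40]), "beta": float(line[40:47]),
        "gamma": float(line[47:54]),
        "space_group": line[55:66].strip(),
    }


def cell_volume(a, b, c, alpha, beta, gamma) -> float:
    """General (triclinic) unit-cell volume in Å^3; angles in degrees."""
    ca, cb, cg = (math.cos(math.radians(t)) for t in (alpha, beta, gamma))
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


def remark3_values(header: str) -> Dict[str, str]:
    """Collect 'KEY : VALUE' pairs from REMARK 3 lines.

    Keys are whitespace-normalized; lines without a colon (section titles,
    continuation lines) are skipped, which is a simplification.
    """
    values: Dict[str, str] = {}
    for line in header.splitlines():
        if not line.startswith("REMARK   3"):
            continue
        body = line[10:]
        if ":" in body:
            key, _, value = body.partition(":")
            values[" ".join(key.split())] = value.strip()
    return values


if __name__ == "__main__":
    cell = parse_cryst1("CRYST1   60.631   60.631  108.156  90.00  90.00 120.00 P 63")
    vol = cell_volume(*(cell[k] for k in ("a", "b", "c", "alpha", "beta", "gamma")))
    print(cell["space_group"], round(vol))
    hdr = "REMARK   3   R VALUE          (WORKING SET, NO CUTOFF) : 0.158"
    print(remark3_values(hdr))
```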
 
 

Structure factor file format


The mmCIF format is an extension of the CIF (Crystallographic Information File) format, which the IUCr (International Union of Crystallography) developed for small-molecule X-ray structures. The objective of mmCIF is to provide a syntax powerful enough to allow a complete description of any macromolecular structure. mmCIF is currently used only for the deposition of observed structure factors. The file excerpt below begins with a declaration (header) portion that defines the columns of the data portion. Each data line contains the three Miller indices (h, k, l), the observed structure factor (in this case its square, i.e. the intensity) and the standard deviation of the squared observed structure factor:

data_r403dsf

loop_
 _refln.index_h
 _refln.index_k
 _refln.index_l
 _refln.F_squared_meas
 _refln.F_squared_sigma
   1   0   0     1.05    0.75
   2   0   0  3184.56   49.78
   4   0   0    79.85    7.26
   5   0   0     9.50    3.84
   6   0   0  1116.30   39.13
   7   0   0    13.63    7.03
   8   0   0   145.03    7.33
  10   0   0    68.84   12.00
  12   0   0    98.49   15.63
  14   0   0    42.32   12.84
  15   0   0    16.88   10.46
  16   0   0    38.74   12.35
   1   0   1  3905.92   48.39
   2   0   1    41.17    2.84
   3   0   1  1517.18   41.30
...
#END OF REFLECTIONS
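
A full CIF parser has to handle quoted strings, multi-line values, '?' and '.' placeholders and several loops per file. For a single _refln loop like the excerpt above, a much simpler reader is enough to get h, k, l, F² and sigma(F²) and to compute the signal-to-noise ratio F²/sigma(F²). The following is a sketch under those simplifying assumptions (function names are hypothetical), not a substitute for a proper CIF library:

```python
from typing import Dict, List, Union

Number = Union[int, float]


def _number(token: str) -> Number:
    """Convert a numeric CIF token to int if possible, else float."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_refln_loop(text: str) -> List[Dict[str, Number]]:
    """Read the rows of a single loop_ such as the _refln excerpt above.

    Assumes one reflection per line, whitespace-separated numeric values,
    no quoted strings, no '?' or '.' placeholders and only one loop.
    """
    tags: List[str] = []
    rows: List[Dict[str, Number]] = []
    in_loop = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue                      # blank lines and comments
        if line == "loop_":
            in_loop, tags = True, []
            continue
        if not in_loop:
            continue                      # e.g. the data_ block name
        if line.startswith("_"):
            tags.append(line.split()[0])  # column declaration
            continue
        values = line.split()
        if len(values) != len(tags):
            continue                      # e.g. an elision marker such as '...'
        rows.append({t: _number(v) for t, v in zip(tags, values)})
    return rows


def i_over_sigma(row: Dict[str, Number]) -> float:
    """Signal-to-noise ratio F^2 / sigma(F^2) of one reflection."""
    return row["_refln.F_squared_meas"] / row["_refln.F_squared_sigma"]


if __name__ == "__main__":
    text = """data_r403dsf
loop_
 _refln.index_h
 _refln.index_k
 _refln.index_l
 _refln.F_squared_meas
 _refln.F_squared_sigma
   1   0   0     1.05    0.75
   2   0   0  3184.56   49.78
#END OF REFLECTIONS"""
    rows = parse_refln_loop(text)
    print(len(rows), sum(i_over_sigma(r) for r in rows) / len(rows))
```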
 

 
 
 

Deposition of the atomic coordinates and observed diffraction data

Run PROCHECK and/or WHAT IF yourself before submitting!
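
PROCHECK and WHAT IF check far more than this (Ramachandran plot, bond angles, planarity, chirality, close contacts and so on). As a toy illustration of just one of the simplest ideas behind such checks, the sketch below compares backbone bond lengths with target values. The targets are the commonly used Engh & Huber (1991) backbone values, and the 4-sigma threshold is an arbitrary assumption chosen for illustration:

```python
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]

# Backbone bond-length targets (mean, sigma) in Å from Engh & Huber (1991).
# Only three intra-residue backbone bonds are included in this illustration.
TARGETS: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("N", "CA"): (1.458, 0.019),
    ("CA", "C"): (1.525, 0.021),
    ("C", "O"): (1.231, 0.020),
}


def distance(p: Vec, q: Vec) -> float:
    """Euclidean distance between two points in Å."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def check_backbone(residue: Dict[str, Vec], n_sigma: float = 4.0) -> List[str]:
    """List backbone bonds of one residue deviating by more than n_sigma."""
    problems = []
    for (atom1, atom2), (target, sigma) in TARGETS.items():
        if atom1 in residue and atom2 in residue:
            d = distance(residue[atom1], residue[atom2])
            if abs(d - target) > n_sigma * sigma:
                problems.append(f"{atom1}-{atom2}: {d:.3f} A (target {target:.3f} A)")
    return problems


if __name__ == "__main__":
    # Backbone atoms of GLN A 3 from the ATOM excerpt earlier on this page.
    gln3 = {
        "N": (-7.473, 2.493, 8.251),
        "CA": (-6.052, 2.183, 8.590),
        "C": (-5.650, 2.944, 9.849),
        "O": (-4.559, 3.527, 9.928),
    }
    print(check_backbone(gln3))  # prints []
```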

ADIT (AutoDep Input Tool) is used to submit X-ray diffraction, electron diffraction, NMR or theoretical structures for inclusion in the PDB archive. It is also used internally by the PDB for data processing and validation. To deposit a structure, the user uploads the relevant coordinate and experimental data files and then adds any additional information. A session ID is provided so that depositors can continue a deposition session at a later time. ADIT is a web-based tool that uses frames; it works with most browsers, although there are known problems with Netscape on Linux.




Structures deposited using ADIT are processed immediately by the PDB staff and returned to the author for final approval. In most cases, files are fully processed within nine days and are released according to the release status specified by the author.


For publication in (almost) all journals you will need the four-character PDB code (like 1C3W above) at the galley stage.
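
A classic PDB code consists of a digit from 1 to 9 followed by three letters or digits, and codes are case-insensitive. A small sketch of a format check, e.g. for catching typos in a manuscript; the function name is hypothetical, and it deliberately handles only the classic four-character form:

```python
import re

# Classic four-character PDB identifier: a digit 1-9 followed by three
# letters or digits. Longer identifier formats are not handled here.
_PDB_ID = re.compile(r"[1-9][A-Za-z0-9]{3}")


def is_pdb_id(code: str) -> bool:
    """True if code looks like a classic four-character PDB ID."""
    return _PDB_ID.fullmatch(code.strip()) is not None


if __name__ == "__main__":
    for code in ("1C3W", "1c3w", "0ABC", "1C3"):
        print(code, is_pdb_id(code))
```

Note that this only checks the form of the code; whether an entry with that code actually exists can only be confirmed against the PDB itself.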




 
 

Size & Growth of the PDB


The number of PDB entries as of June 6, 2000 was 12,474. Growth of the database has been nearly exponential over the past 25 years, and with Structural Genomics on the horizon this growth rate is unlikely to change anytime soon.

Growth in number of new folds


Number of "new folds" (blue) and "old folds" (red) for a given year.  Note: A chain fold was considered old if it was similar to a deposited fold according to the following criteria:

Otherwise the chain fold was considered new:

Proportion of "new folds" to all chains for a given year:





References & weblinks

  1. Protein Data Bank archives of three-dimensional macromolecular structures by E.E. Abola et al., Meth. Enz. 277, 556-571 (1997).
  2. Website of the Protein Data Bank (PDB)
  3. Macromolecular Crystallographic Information File by P.E. Bourne et al., Meth. Enz. 277, 571-590 (1997).
  4. http://ndbserver.rutgers.edu/mmcif/
