A post by Deepak points out that the PDB will stop accepting computational models of protein structures as of this October, and notes the pressing need for a peer-reviewed knowledgebase of protein models to exist alongside the PDB. The field of “structural genomics” has envisaged a marriage of experimental and theoretical structure determination since its beginnings in 1998, but since then one gets the impression that the US-funded Protein Structure Initiative (PSI) has been sidetracked by the difficulty of experimentally determining protein structures in numbers approaching its somewhat optimistic initial forecasts. It is encouraging to see via these press releases that computational modeling is still within the scope of the PSI. Large-scale funding for protein structure modeling such as this may deliver the kind of quality protein model databases that are needed in light of the PDB's decision.
Neil Saunders is wondering whether there is any standard software for managing the data involved in structural genomics projects. Unfortunately, most structural genomics laboratories (e.g. the JCSG, the TB consortium) have had to tackle this problem independently.
A couple of years back I worked on a LIMS for structural genomics, which you can read about here. In theory the SQL schema and frontend are available to anyone, but a fair amount of tweaking would be involved if another lab wanted to use it, and I think the same would be true of other systems. There have been efforts at the EBI to develop standard data models for structural genomics, though.
This is another example of the importance of data standardization in bioinformatics / post-genomic biology. Everyone in the structural genomics community agrees that standardizing the data is very important, because mining the results of thousands of individual experiments along the structural genomics pipeline might yield some really important results. For example, by comparing the set of proteins that couldn’t be crystallized with the set that could, we could find the properties of proteins that are conducive to crystallization. Some attempts at this kind of retrospective data mining of experimental results have already been made. With a view to that, the PDB has a target registration database to which all SG centres submit their data in XML format, although the flexibility of the DTD for this data is questionable and, more importantly, there is no guarantee that the reported data is up to date. It will be interesting to see how the structural genomics community continues to deal with all its data.
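To make the crystallization comparison above a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the record layout (`sequence`, `crystallized`), the example sequences, and the choice of hydrophobic fraction as the property to compare are all made up, and none of it reflects the actual target registration format or any centre's real pipeline.

```python
from statistics import mean

# Hypothetical target records; the sequences and outcomes are invented
# for illustration and do not correspond to real structural genomics targets.
TARGETS = [
    {"id": "T1", "sequence": "MKKLEEDRKSAQ", "crystallized": True},
    {"id": "T2", "sequence": "MSTEEKRDNQGA", "crystallized": True},
    {"id": "T3", "sequence": "MVLLIAFWVLAI", "crystallized": False},
    {"id": "T4", "sequence": "MAILVFFLKWIA", "crystallized": False},
]

# One simple (assumed) definition of "hydrophobic" residues.
HYDROPHOBIC = set("AVILMFWC")


def hydrophobic_fraction(sequence):
    """Return the fraction of residues in the sequence that are hydrophobic."""
    if not sequence:
        return 0.0
    return sum(1 for residue in sequence if residue in HYDROPHOBIC) / len(sequence)


def compare_property(targets, prop):
    """Compare the mean of a sequence property between the two outcome groups.

    Both groups must be non-empty, otherwise statistics.mean raises
    StatisticsError.
    """
    succeeded = [prop(t["sequence"]) for t in targets if t["crystallized"]]
    failed = [prop(t["sequence"]) for t in targets if not t["crystallized"]]
    return {
        "crystallized": mean(succeeded),
        "failed": mean(failed),
        "difference": mean(succeeded) - mean(failed),
    }


if __name__ == "__main__":
    print(compare_property(TARGETS, hydrophobic_fraction))
```

A real analysis would need far more targets, a proper statistical test rather than a difference of means, and consistently reported outcomes, which is exactly where standardized and up-to-date data would matter.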