Neil Saunders is wondering whether there is any standard software for managing the data involved in structural genomics projects. Unfortunately, most structural genomics laboratories (e.g. the JCSG, the TB consortium) have had to tackle this problem independently.
A couple of years back I worked on a LIMS for structural genomics, which you can read about here. In theory the SQL schema and frontend are available to anyone, but another lab would need to do a fair amount of tweaking before it could use them, and I think the same would be true of other systems. There have been efforts at the EBI to develop standard data models for structural genomics, though.
This is another example of the importance of data standardization in bioinformatics / post-genomic biology. Everyone in the structural genomics community agrees that standardizing the data is very important, because mining the results of thousands of individual experiments along the structural genomics pipeline might yield some really important results. For example, by comparing the set of proteins that couldn’t be crystallized with the set that could, we could find the properties of proteins that are conducive to crystallization. Some attempts at this kind of retrospective data mining of experimental results have already been made. With that in mind, the PDB has a target registration database to which all SG centres submit their data in XML format, although it is questionable whether the DTD for this data is flexible enough, and more importantly, there is no guarantee that the reported data is up to date. It will be interesting to see how the structural genomics community continues to deal with all its data.
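To make the crystallization comparison a bit more concrete, here is a minimal Python sketch of the idea. Everything in it is an assumption for illustration: the target IDs and sequences are made up, `hydrophobic_fraction` is just one arbitrary property you might try, and a real analysis would pull thousands of targets from standardized records and use proper statistics rather than a difference of means.

```python
from statistics import mean

# Hypothetical toy records: (target id, amino-acid sequence, crystallized?).
# These sequences are invented; they are not real structural genomics targets.
TARGETS = [
    ("T001", "MKKLLEEAGK", True),
    ("T002", "MDEKRKEELK", True),
    ("T003", "MLLVVAIILF", False),
    ("T004", "MAVLIFWGLL", False),
]

# A crude, assumed definition of "hydrophobic" residues for this sketch.
HYDROPHOBIC = set("AVLIMFW")


def hydrophobic_fraction(seq):
    """Fraction of residues in seq that belong to the HYDROPHOBIC set."""
    return sum(res in HYDROPHOBIC for res in seq) / len(seq)


def compare_groups(targets, prop):
    """Return the mean of prop for crystallized and non-crystallized targets."""
    ok = [prop(seq) for _, seq, crystallized in targets if crystallized]
    failed = [prop(seq) for _, seq, crystallized in targets if not crystallized]
    return mean(ok), mean(failed)


ok_mean, failed_mean = compare_groups(TARGETS, hydrophobic_fraction)
print(f"crystallized: {ok_mean:.2f}, not crystallized: {failed_mean:.2f}")
```

The point of the sketch is that a comparison like this only works if every centre records the outcome of each target in the same way, which is exactly where standardization comes in.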