The Final Report
Greetings everyone,
I am glad to write the final report for the project "SBML-JSON converter and SBML scheme". I have been working on this project for three months now, and have been involved in it for about 5-6 months. I must say that I have learned quite a lot in that time, thanks to Matthias and Andreas for all their help. It has been a very nice journey with both of them, marked by daily chats and weekly meetings.
We have covered everything we set out in the initial proposal, and the complete code has been submitted to COBRApy. Two of our pull requests have been merged, one is under review and should be merged soon, and one may take some time because it contains functionality that can only be used once libsbml (the interfacing library between SBML and COBRApy) incorporates the fbc-v3 package.
All the pull requests submitted so far are summarised in this spreadsheet.
Here is a summary of the work done in this project:
- The JSON schema v1 had some issues that made it unsuitable for use. It has been updated accordingly. Here is the corresponding pull request.
- Infinite values were not initially supported for storing reaction bounds and other quantities. This has now been fixed. Here is the corresponding pull request.
- Metadata Information: The main task of this project was a fully lossless conversion of metadata information from SBML to JSON, and into COBRA models as well. Metadata previously had very limited support, and there were many issues with its initial data format. The metadata classes implemented now not only solve the existing issues, they are also backward compatible. So even if we have a model with the old annotation form in JSON or some other format, it will first be read, then fixed, and then written in the new complete form. All these classes have internal helper functions to set and parse data easily. The implementations for metadata information are as follows:
- CVTerms Class: This class stores the Controlled Vocabulary annotation data in a fashion similar to SBML. It is basically a dictionary whose keys are the biology qualifiers defining the relationship between a component and its resources, and whose values are lists (called CVList) storing all sets of resources defined for that biology qualifier (basically the alternative resources for this qualifier). A single item in this list is an object of the ExternalResource class with two attributes: “resource”, for storing references to data in other databases, and “nested_data”, for defining the resources themselves. The CVTerms object is synchronized with another dictionary that stores the annotation data in the old format, and simply accessing the “annotation” attribute of the object returns this old-format data, ensuring complete backward compatibility. The new-format data can be accessed via object.annotation.cvterms.
- History Class: The History class is also part of the annotation attribute of a component and can be accessed via object.annotation.history. It stores information about the creators, the created date and the modified date of that component. Initially, the history object was only supported on the main model, but now history can be attached to any component derived from the Object class in COBRApy.
- KeyValuePair Class: The KeyValuePair class is used for storing any kind of key-value pair data that does not fit anywhere else in the model. It is also part of the annotation, and hence lives inside the annotation attribute of the component (accessed using object.annotation.keyvaluedata). This feature is part of fbc-v3.
- Notes Class: The Notes class is used to store the notes data of a component. Initially, COBRApy extracted only the key-value pairs present inside the notes and discarded everything else. The current implementation, however, stores the complete notes string, as well as a dictionary synchronised with the notes string. The data in this dictionary can only be updated; no new key-value pair can be added to it, because notes are not the right place to store machine-parsable information and should contain only human-readable information.
- Group to JSON: Although the Group package was supported by COBRApy, it was not being written to JSON and other formats. Support for the Group package has therefore been added to JSON and other formats.
- JSON schemas: Version 2 of the JSON schema has been created, and JSON schema version 1 has been updated to fix some issues. All newly implemented features are covered by JSON schema version 2, and a function for validating JSON models has been added.
- User-Defined-Constraints Class: User-defined constraints are a new feature of the fbc-v3 package and were not supported by COBRApy, nor by JSON and other formats. Support has been added by implementing classes for them, along with helper functions for adding constraints easily. One can now add a constraint by simply passing a string expression for the constraint together with its lower and upper bounds. Here's the pull request corresponding to it.
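To make the CVTerms layout described above a bit more concrete, here is a minimal sketch in plain Python. It is not the COBRApy implementation: the class names ExternalResourceSketch and CVTermsSketch, the add and to_old_format helpers, and the assumption that the old format maps a database name to a list of identifiers taken from identifiers.org-style URIs are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ExternalResourceSketch:
    """One set of alternative resources for a qualifier (hypothetical stand-in)."""

    # References to data in other databases, assumed here to be URIs.
    resource: List[str] = field(default_factory=list)
    # Optional nested annotation describing the resources themselves.
    nested_data: Optional["CVTermsSketch"] = None


class CVTermsSketch(dict):
    """Maps a biology qualifier to a list of ExternalResourceSketch objects."""

    def add(self, qualifier: str, uris: List[str]) -> None:
        # Each call appends one more alternative set for this qualifier.
        self.setdefault(qualifier, []).append(ExternalResourceSketch(list(uris)))

    def to_old_format(self) -> Dict[str, List[str]]:
        """Flatten to {database: [identifier, ...]}, dropping the qualifiers."""
        flat: Dict[str, List[str]] = {}
        for resource_sets in self.values():
            for ext in resource_sets:
                for uri in ext.resource:
                    # Assumes URIs shaped like https://identifiers.org/<db>/<id>.
                    database, identifier = uri.rstrip("/").split("/")[-2:]
                    ids = flat.setdefault(database, [])
                    if identifier not in ids:
                        ids.append(identifier)
        return flat


terms = CVTermsSketch()
terms.add("bqb_is", ["https://identifiers.org/chebi/CHEBI:17234"])
terms.add("bqb_is", ["https://identifiers.org/kegg.compound/C00031"])
old_style = terms.to_old_format()
```

In this sketch the old-style view is recomputed on demand rather than kept synchronized on every change, which is simpler to show but is only one possible way to provide the backward-compatible dictionary.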