The Final Report

Greetings everyone,

I am glad to write the final report for the project "SBML-JSON converter and SBML scheme". I have been working on this project for three months now, and have been involved with it for about 5-6 months. I have learned quite a lot along the way, thanks to Matthias and Andreas for all their help. It has been a very nice journey with both of them, marked by daily chats and weekly meetings.

We have covered everything we set out in the initial proposal, and the complete code has been submitted to COBRApy. Two of our pull requests have been merged, one is under review and should be merged soon, and one may take longer because it contains functionality that can only be used once libsbml (the interfacing library between SBML and COBRApy) incorporates the fbc-v3 package.

All the pull requests submitted so far are summarised in this spreadsheet.

Here is a summary of the work done in this project:

JSON schema v1 had some issues that made it unsuitable for use. It has been updated accordingly. Here is the corresponding pull request.
Infinite values were initially not supported for reaction bounds and similar fields. This has now been fixed. Here is the corresponding pull request.
Metadata Information: The main task of this project was a fully lossless conversion of metadata information from SBML to JSON, and into COBRA models as well. Metadata previously had very limited support, and its initial data format had many issues. The metadata classes implemented now not only solve all of those issues, they are also backward compatible. So even if a model carries the old annotation form in JSON or some other format, it is first read, fixed, and then written in the new complete form. All of these classes have internal helper functions to set and parse data easily. The implementations for metadata information are as follows:

CVTerms Class: This class stores the Controlled Vocabulary annotation data in a fashion similar to SBML. It is essentially a dictionary whose keys are the biology qualifiers defining the relationship between a component and its resources, and whose values are lists (called CVList) holding all sets of resources defined for that qualifier (that is, the alternative resources for this qualifier). A single item in this list is an object of the ExternalResource class with two attributes: “resource”, for storing references to data in other databases, and “nested_data”, for describing the resources themselves. The CVTerms object is synchronized with another dictionary that stores the annotation data in the old format. Simply calling the “annotation” attribute of the object returns this old-format data, which ensures complete backward compatibility. The new-format data can be accessed via object.annotation.cvterms.
History Class: The History class is also part of the annotation attribute of a component and can be accessed via object.annotation.history. It stores information about the creators, the creation date, and the modification dates of that component. Initially, the history object was only supported on the main model, but now history can be attached to any component derived from the Object class in COBRApy.
KeyValuePair Class: The KeyValuePair class is used for storing any kind of key-value data that does not fit anywhere else in the model. It is also part of the annotation, and hence lives inside the annotation attribute of a component (accessed using object.annotation.keyvaluedata). This feature is part of fbc-v3.
Notes Class: The Notes class is used to store the notes data of a component. Initially, COBRApy only extracted the key-value pairs present inside the notes and discarded everything else. The current implementation stores the complete notes string, as well as a dictionary synchronised with that string. The data in this dictionary can only be updated; no new key-value pair can be added to it, because notes are not the right place to store machine-parsable information and should contain only human-readable information.
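To make the CVTerms layout described above a bit more concrete, here is a minimal sketch in plain Python. It is not the COBRApy implementation: the class name CVTermsSketch, the add and flat_annotation helpers, the qualifier strings and the example identifiers are all assumptions made for illustration. It only shows the idea of a qualifier-keyed dictionary of ExternalResource alternatives, together with a flattened old-style view.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ExternalResource:
    """One alternative set of resources for a qualifier (illustrative only)."""
    resource: List[str] = field(default_factory=list)   # URIs into other databases
    nested_data: Optional["CVTermsSketch"] = None       # annotations on the resources


class CVTermsSketch(dict):
    """Hypothetical stand-in: biology qualifier -> list of ExternalResource."""

    def add(self, qualifier, uris, nested=None):
        # Each call appends one alternative set of resources for the qualifier.
        self.setdefault(qualifier, []).append(ExternalResource(list(uris), nested))

    def flat_annotation(self):
        # Old-style view: provider -> sorted identifiers, qualifiers dropped.
        flat = {}
        for alternatives in self.values():
            for ext in alternatives:
                for uri in ext.resource:
                    provider, identifier = uri.rstrip("/").rsplit("/", 2)[-2:]
                    flat.setdefault(provider, set()).add(identifier)
        return {provider: sorted(ids) for provider, ids in flat.items()}


terms = CVTermsSketch()
terms.add("bqb_is", ["https://identifiers.org/chebi/CHEBI:17234",
                     "https://identifiers.org/kegg.compound/C00031"])
terms.add("bqb_isVersionOf", ["https://identifiers.org/chebi/CHEBI:4167"])
flat = terms.flat_annotation()
```

In this sketch the flat view is computed on demand rather than kept in sync with a second dictionary, which is a simplification of the synchronisation described above.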

The following graph shows what percentage of meta-information is preserved in an SBML-JSON-SBML roundtrip. I tested it on 1382 models from the memote models repository. Different types of models were tested, as shown in the figure (agora, bigg, etc.). Some models were incorrect and produced errors, such as the model not being readable. They are shown with zero percent of data parsed.

The corresponding pull request for metadata implementation is here.

Group to JSON: Although the Group package was already supported by COBRApy, its data was not written to JSON and other formats. Support for the Group package has now been added to JSON and other formats.
JSON schemas: Version 2 of the JSON schema has been created, and JSON schema version 1 has been updated to fix some issues. All newly implemented features are included in JSON schema version 2, and a function for validating JSON models has been added.

For both of the above features, the pull request is the same as the one for the metadata classes.

User-Defined-Constraints Class: User-defined constraints are a new feature of the fbc-v3 package, and were not supported by COBRApy or by JSON and the other formats. Support has been added by implementing classes for them, along with helper functions that make adding constraints easy. One can now add a constraint by simply passing a string expression for the constraint together with its lower and upper bound. Here's the pull request corresponding to it.
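As a rough illustration of what "a string expression plus lower and upper bound" can look like, here is a small standalone sketch. It does not use the COBRApy API: ConstraintSketch, evaluate, and the reaction names PGI and PFK are assumptions for this example only. It parses a simple arithmetic expression over flux names and checks whether its value lies within the bounds.

```python
import ast
import operator
from dataclasses import dataclass

# Only +, - and * are allowed in this sketch.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def evaluate(expression, values):
    """Evaluate an arithmetic expression over named fluxes without eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -walk(node.operand)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.Name):
            return values[node.id]
        raise ValueError("unsupported expression element")
    return walk(ast.parse(expression, mode="eval"))


@dataclass
class ConstraintSketch:
    """Hypothetical constraint: lower_bound <= expression <= upper_bound."""
    expression: str
    lower_bound: float
    upper_bound: float

    def is_satisfied(self, fluxes):
        return self.lower_bound <= evaluate(self.expression, fluxes) <= self.upper_bound


constraint = ConstraintSketch("2 * PGI - PFK + 1", lower_bound=0, upper_bound=10)
ok = constraint.is_satisfied({"PGI": 4.0, "PFK": 3.0})        # value is 6.0
violated = constraint.is_satisfied({"PGI": 10.0, "PFK": 1.0})  # value is 20.0
```

A real implementation would hand the expression to the solver as part of the optimisation problem rather than checking it after the fact; the check here is only meant to show the shape of the data.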

Along with all these new implementations, the existing issues stated in the proposal have also been fixed. Tests for each new piece of functionality have been added to check that it works as intended.
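For readers wondering why infinite bounds were a problem in the first place: standard JSON has no literal for infinity, and Python's json module only writes the non-standard token Infinity unless told otherwise. The sketch below shows one possible workaround, encoding infinite bounds as strings. The helper names encode_bound and decode_bound and the example reaction are assumptions for illustration, and this is not necessarily how the fix was done in the pull request.

```python
import json
import math


def encode_bound(value):
    """Replace infinite floats with a string marker that is valid JSON."""
    if isinstance(value, float) and math.isinf(value):
        return "inf" if value > 0 else "-inf"
    return value


def decode_bound(value):
    """Turn the string markers back into floats; leave everything else alone."""
    if value in ("inf", "-inf"):
        return float(value)
    return value


reaction = {"id": "EX_glc", "lower_bound": -10.0, "upper_bound": float("inf")}

# allow_nan=False makes json.dumps refuse to emit the non-standard Infinity token.
text = json.dumps({k: encode_bound(v) for k, v in reaction.items()}, allow_nan=False)
restored = {k: decode_bound(v) for k, v in json.loads(text).items()}
```

After the roundtrip, restored compares equal to the original reaction dictionary, and the intermediate text can be read by any standards-compliant JSON parser.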

Wrapping up, Google Summer of Code has given me a very good experience, in which I worked with people from outside my own country on a fully dedicated project. I learned a lot about open source development and professional code writing, and also about Python, JSON, and SBML, which were the main technologies used in this project. My mentors and I will keep in touch even after this project, and we may make the metadata classes more modular and move them into a separate package.

With this, I complete this season's Google Summer of Code blog posts. I hope you enjoyed them. For any queries or discussion, feel free to comment here, or reach me by email (hemant@cs.iitr.ac.in).

Thank you

SBML-JSON Converter: GSoC 2020