JPhyloIO Metadata Demo

This example application demonstrates how to read and write metadata using JPhyloIO. It extends the tree demo application, so you should understand the tree demo before proceeding here.


Overview

Figure 1  UML class diagram providing an overview of the classes making up this example application and their relation to their superclasses in the tree demo. The general design of the metadata demo application is the same as for the tree demo application: it consists of an application main class, which implements the GUI and can read trees using MetadataTreeReader and write them using application-specific data adapter implementations. Code for general tree processing is reused by inheritance from tree demo classes, and the additional implementations necessary for reading and writing metadata are added by the respective subclasses in this demo.
Figure 2  The main window of this example application. The example tree provided with this demo application has been loaded, and the annotations of its nodes and edges can be edited.

Since this demo application extends the functionality of the tree demo instead of reimplementing the general tree processing, it extends classes from that demo and overrides the methods that need to be adjusted to read and write metadata. As shown in Figure 1, the main class MetadataApplication extends Application and overrides the methods readTree() and writeTree() as well as the getter methods providing the application name, version and URL.

Trees are read by MetadataTreeReader, which is derived from TreeReader as implemented in the tree demo. Writing the metadata attached to tree nodes and edges is implemented in adapters.NodeListDataAdapter and adapters.EdgeListDataAdapter, which are both derived from the tree demo class NodeEdgeListDataAdapter.

In addition to the topology read by the tree demo application, this demo program also models specific metadata attached to nodes and branches. Nodes carry two literal metadata string values (the scientific name and the NCBI taxonomy ID of the taxon), which are grouped by a parent resource metadata element modeling the taxonomy information of that node as a whole. Furthermore, nodes may carry a list of size measurements of individuals of that species, which is stored in a single literal metadata element. Edges carry a numeric support value.

As shown in Figure 2, all metadata of a node and its afferent edge can be edited in the GUI after selecting that node. To store this metadata in formats that support it, the predicates identifying each metadata element modeled by this application are defined in the interface IOConstants. (Note that established existing ontologies should generally be used if available, but to keep this demo application simple, we use an application-specific example ontology.)

The following examples demonstrate which types of annotations can be represented in which formats and how maximum format independence can be achieved. Since NeXML offers the most comprehensive metadata model, all data can be represented there, while the other formats support only a subset of it.

Simple values attached to edges (the support value example)

The first metadata example in this application is a numeric support value attached to the edges of a phylogenetic tree. (Note that it makes a difference whether a value is attached to a node or to an edge, e.g. if a tree is rerooted. The GUI of this demo application allows editing the support value of the afferent edge of the selected node.) This simple numeric annotation can be represented in all tree formats supported by JPhyloIO, and no format-specific implementations for reading or writing are necessary.

Writing

The support value is written to a file in the writeContentData() method of EdgeListDataAdapter. The easiest way to do that is to use the following tool method of JPhyloIOWritingUtils:

JPhyloIOWritingUtils.writeSimpleLiteralMetadata(receiver, id + DEFAULT_META_ID_PREFIX + "Support", null, 
        PREDICATE_HAS_SUPPORT, DATA_TYPE_DOUBLE, support, null);

You can refer to the JavaDoc of JPhyloIOWritingUtils.writeSimpleLiteralMetadata() for details on all parameters. The event ID parameter deserves special attention. As shown above, it is a concatenation of the edge ID (id), a meta ID prefix (DEFAULT_META_ID_PREFIX as defined in ReadWriteConstants) and the string literal "Support". Such IDs must be unique within a written document, and they must be identical for the same event each time the method writeContentData() of EdgeListDataAdapter is called, which is achieved by the implementation here. (Note that some writers call this method multiple times when writing a single document and do not work properly if the ID of an event changes between calls, e.g. because IDs are generated dynamically by a counter.) The other parameters define the predicate for this type of metadata as declared in IOConstants, the data type xs:double and the double value to be written (support).
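To illustrate why counter-based IDs are problematic, here is a minimal plain-Java sketch (not JPhyloIO code) that simulates a writer requesting the ID of the same edge twice. The class and method names are hypothetical, and the prefix value "meta" is an assumption inferred from the NeXML output below (edge1metaSupport), not read from ReadWriteConstants.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: stable vs. counter-based metadata event IDs.
class MetaIdSketch {
    // Assumed value of DEFAULT_META_ID_PREFIX, inferred from IDs such as "edge1metaSupport".
    static final String META_ID_PREFIX = "meta";

    private static final AtomicInteger counter = new AtomicInteger();

    // Stable: derived only from the edge ID, so repeated calls yield the same ID.
    static String stableSupportId(String edgeId) {
        return edgeId + META_ID_PREFIX + "Support";
    }

    // Unstable: every call produces a new ID, which breaks writers that call
    // writeContentData() more than once per document.
    static String counterBasedId() {
        return META_ID_PREFIX + counter.incrementAndGet();
    }

    public static void main(String[] args) {
        // Simulate two passes of a writer over the same edge.
        System.out.println(stableSupportId("edge1").equals(stableSupportId("edge1")));  // prints true
        System.out.println(counterBasedId().equals(counterBasedId()));  // prints false
    }
}
```

The stable variant depends only on data that does not change between writer passes, which is the property the ID in EdgeListDataAdapter relies on.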

Representation in the different formats

The support value annotation can be read and written by JPhyloIO in different formats, each of which uses a different representation of such data. The data folder of the demo application in the Subversion repository contains example outputs in all supported formats. You can also try this yourself by loading a tree into the demo application and selecting Save as... from the main menu.

NeXML models this annotation using its meta tag. See NeXML.xml for the full document.

<edge source="node0" target="node1" id="edge1" about="#edge1">
    <meta id="edge1metaSupport" xsi:type="nex:LiteralMeta" property="a:isSupportedWith" datatype="xsd:double">99.0</meta>
</edge>

Newick and Nexus are able to represent simple edge annotations as hot comments inside the Newick string. See Newick.nwk and Nexus.nex for the full documents.

((Guinea_pig, Louse)Animals[&][&isSupportedWith=99.0], (Grass, Dandelion)Plants)Root;

The node annotations (described below) have been left out here. Note that the empty hot comment ([&]) preceding the support value hot comment is necessary: the first hot comment is interpreted as a node annotation and the second as an edge annotation, and the node Animals carries no node annotations. The empty hot comment could be omitted if a branch length had been specified. Since Newick and Nexus use string keys instead of CURIE predicates and offer no way to declare namespaces, only the local part of the URI (isSupportedWith) is written to the document.
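The placement rule for node and edge hot comments can be sketched in a few lines. This is a hypothetical, simplified renderer (not NewickEventWriter's implementation), and it assumes that, when a branch length is present, the edge hot comment follows it (label[node comment]:length[edge comment]), which is one reading of the rule stated above.

```java
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of the hot comment placement rule; all names are illustrative.
class HotCommentSketch {
    // Formats annotations as one hot comment, e.g. [&isSupportedWith=99.0].
    static String hotComment(Map<String, String> annotations) {
        return annotations.entrySet().stream()
                .map(entry -> entry.getKey() + "=" + entry.getValue())
                .collect(Collectors.joining(", ", "[&", "]"));
    }

    // Renders a node label, its node annotations, an optional branch length and the
    // annotations of its afferent edge.
    static String renderNode(String label, Map<String, String> nodeAnnotations,
            Double branchLength, Map<String, String> edgeAnnotations) {
        StringBuilder result = new StringBuilder(label);
        if (!nodeAnnotations.isEmpty()) {
            result.append(hotComment(nodeAnnotations));
        }
        else if (!edgeAnnotations.isEmpty() && (branchLength == null)) {
            // Without a branch length, the first hot comment would be read as a node
            // annotation, so an empty placeholder is written.
            result.append("[&]");
        }
        if (branchLength != null) {
            result.append(':').append(branchLength);
        }
        if (!edgeAnnotations.isEmpty()) {
            result.append(hotComment(edgeAnnotations));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        Map<String, String> support = Map.of("isSupportedWith", "99.0");
        System.out.println(renderNode("Animals", Map.of(), null, support));  // prints Animals[&][&isSupportedWith=99.0]
        System.out.println(renderNode("Animals", Map.of(), 0.5, support));   // prints Animals:0.5[&isSupportedWith=99.0]
    }
}
```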

PhyloXML uses property tags to represent custom annotations that are not modeled by dedicated tags. (Note that PhyloXML does offer the confidence tag specifically for support values, which should usually be used. This could be achieved using a PhyloXML-specific predicate, but to keep this example simple, we write the support value in the standard way here to demonstrate how JPhyloIO generally treats simple literal annotations in PhyloXML. The taxonomy annotation described below shows how to make use of special annotation tags in PhyloXML.) See PhyloXML.xml for the full document.

<clade id_source="node1">
    ...
    <property ref="a:isSupportedWith" datatype="xsd:double" applies_to="parent_branch">99.0</property>
    ...
</clade>

The ref attribute specifies a predicate, similar to the property attribute of the meta tag in NeXML. The datatype attributes of both formats are also similar, but PhyloXML is more limited here: it does not allow externally defined data types, only the set of values defined in its schema. This is not a problem here, but the size measurement example below shows that some workarounds are necessary if the property tag is to be used for data types that PhyloXML does not support. The applies_to attribute determines whether an annotation is attached to the node or to the edge.

(Note that although the ID of the literal metadata start event written by JPhyloIOWritingUtils.writeSimpleLiteralMetadata() does not appear in the PhyloXML file, because the format does not model metadata IDs, PhyloXMLEventWriter still requires a document-wide unique event ID to function properly.)

Reading

MetadataTreeReader overrides the method readEdgeContents() inherited from TreeReader to implement reading the support value. Analogous to writing, reading is implemented using the utility method readLiteralMetadataContentAsObject() from JPhyloIOReadingUtils.

JPhyloIOEvent event = reader.next();
while (reader.hasNextEvent() && !event.getType().getTopologyType().equals(EventTopologyType.END)) {
	if (event.getType().getTopologyType().equals(EventTopologyType.START)) {
		if (event.getType().getContentType().equals(EventContentType.LITERAL_META)) { 
			LiteralMetadataEvent literalEvent = event.asLiteralMetadataEvent();
			
			// Load possible support value:
			if (PREDICATE_HAS_SUPPORT.equals(literalEvent.getPredicate().getURI())) {
				((NodeData)targetNode.getUserObject()).setSupport(
						JPhyloIOReadingUtils.readLiteralMetadataContentAsObject(reader, Number.class).doubleValue());
			}
			else {  // Skip all nested events and the end event if another literal metadata element is nested.
				JPhyloIOReadingUtils.reachElementEnd(reader);
			}
		}
		else {  // Skip possible other event subsequences.
			JPhyloIOReadingUtils.reachElementEnd(reader);
		}
	}
	event = reader.next();
}

The utility method readLiteralMetadataContentAsObject() consumes the whole event sequence from the literal metadata start event to the respective end event and ignores any comment events. If an application is interested in such comments, or expects very large string values that are split over multiple LiteralMetadataContentEvent instances, it should process the event sequence directly instead of using this method. This method is also not applicable to literal metadata annotations with XML content.

The above code snippet, taken from MetadataTreeReader.readEdgeContents(), reads events from the underlying JPhyloIO reader until the edge end event is reached. Within that loop it checks whether a literal metadata start event with the predicate PREDICATE_HAS_SUPPORT (declared in IOConstants) is encountered. If so, its content is read using JPhyloIOReadingUtils.readLiteralMetadataContentAsObject() and stored in the NodeData instance of the respective tree node (which is the application's data model). Any other start event is skipped together with its nested events using JPhyloIOReadingUtils.reachElementEnd(). Since we expect a double value here, we request an object that implements Number. Alternatively, we could directly specify Double.class as the class parameter, but this way the implementation is more flexible and can also read files declaring other numeric data types like Float, Integer or BigDecimal.
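To make the control flow of this loop easier to follow outside of JPhyloIO, here is a minimal, self-contained sketch. It assumes a hypothetical, simplified event model (Event, Topology and skipToElementEnd() are illustrative names, not JPhyloIO API), represents predicates as plain strings and assumes each literal element contains exactly one content event. It shows nested elements being skipped by depth counting, similar in effect to reachElementEnd(), and a support value stored as BigDecimal being accepted through Number.doubleValue().

```java
import java.math.BigDecimal;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified event model; none of these types are JPhyloIO classes.
class EdgeLoopSketch {
    enum Topology { START, SOLE, END }

    static final class Event {
        final Topology topology;
        final String predicate;  // Set for start events only.
        final Object value;      // Set for sole (content) events only.
        Event(Topology topology, String predicate, Object value) {
            this.topology = topology;
            this.predicate = predicate;
            this.value = value;
        }
    }

    static Event start(String predicate) { return new Event(Topology.START, predicate, null); }
    static Event content(Object value) { return new Event(Topology.SOLE, null, value); }
    static Event end() { return new Event(Topology.END, null, null); }

    // Consumes events up to and including the end event that matches an already
    // consumed start event, tracking the nesting depth.
    static void skipToElementEnd(Iterator<Event> events) {
        int depth = 1;
        while ((depth > 0) && events.hasNext()) {
            Topology topology = events.next().topology;
            if (topology == Topology.START) {
                depth++;
            }
            else if (topology == Topology.END) {
                depth--;
            }
        }
    }

    // Reads the events of one edge until its end event and returns the support value, if any.
    static Double readSupport(Iterator<Event> events, String supportPredicate) {
        Double support = null;
        while (events.hasNext()) {
            Event event = events.next();
            if (event.topology == Topology.END) {
                break;  // End of the edge reached.
            }
            if (event.topology == Topology.START) {
                if (supportPredicate.equals(event.predicate) && events.hasNext()) {
                    Object value = events.next().value;  // Assumes exactly one content event.
                    if (value instanceof Number) {  // Accepts Double, Float, Integer, BigDecimal, ...
                        support = ((Number)value).doubleValue();
                    }
                }
                skipToElementEnd(events);  // Consume the rest of this element, including nested elements.
            }
        }
        return support;
    }

    public static void main(String[] args) {
        List<Event> edgeEvents = List.of(
                start("hasTaxonomy"), start("hasNCBIID"), content("10141"), end(), end(),  // Skipped.
                start("isSupportedWith"), content(new BigDecimal("99")), end(),
                end());  // End of the edge.
        System.out.println(readSupport(edgeEvents.iterator(), "isSupportedWith"));  // prints 99.0
    }
}
```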

As mentioned above, Newick and Nexus use string keys instead of URIs to label metadata. Therefore, a LiteralMetadataEvent produced by NewickEventReader or NexusEventReader only has a string representation as its predicate and no QName. To load simple annotations (support values in this case) from Newick or Nexus documents as well, the condition in the readEdgeContents() code snippet above needs to be extended:

if (PREDICATE_HAS_SUPPORT.equals(literalEvent.getPredicate().getURI()) ||
        PREDICATE_HAS_SUPPORT.getLocalPart().equals(literalEvent.getPredicate().getStringRepresentation())) {

    ...
}

The second condition accepts predicates that only have the local part of the URI as their string representations. This way such metadata can be read from Newick and Nexus as well, but care should be taken if predicates from different ontologies (under different namespaces) are combined. In such cases it may happen that the string keys are ambiguous.
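The extended condition can be tried out in isolation with the standard javax.xml.namespace.QName class. In this sketch the namespace URI is a placeholder (the real one is declared in IOConstants), and isSupportPredicate() is a hypothetical helper. The last call shows the ambiguity mentioned above: a string key alone cannot tell which namespace it came from.

```java
import javax.xml.namespace.QName;

// Hypothetical sketch of the extended predicate check; the namespace URI is a placeholder.
class PredicateMatchSketch {
    static final QName PREDICATE_HAS_SUPPORT =
            new QName("http://example.org/metadataDemo/", "isSupportedWith", "a");

    // predicateURI may be null (e.g. for Newick or Nexus), stringKey may be null (e.g. for NeXML).
    static boolean isSupportPredicate(QName predicateURI, String stringKey) {
        return PREDICATE_HAS_SUPPORT.equals(predicateURI) ||  // QName.equals() ignores the prefix.
                PREDICATE_HAS_SUPPORT.getLocalPart().equals(stringKey);
    }

    public static void main(String[] args) {
        // NeXML-like input: full URI, prefix differs but is irrelevant.
        System.out.println(isSupportPredicate(new QName("http://example.org/metadataDemo/", "isSupportedWith"), null));  // prints true
        // Newick-like input: only the string key is available.
        System.out.println(isSupportPredicate(null, "isSupportedWith"));  // prints true
        // Same local part under another namespace: rejected as a URI ...
        System.out.println(isSupportPredicate(new QName("http://other.example.org/", "isSupportedWith"), null));  // prints false
        // ... but as a bare string key it would have been accepted by the second condition.
    }
}
```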

Nested RDF-annotations (the taxonomy example)

Having seen how to read and write simple (e.g. numeric) literal metadata annotations in the previous section, we now look at how to handle more complex metadata. As an example, we read and write taxonomic information about tree nodes: the scientific name and the NCBI taxonomy ID of the terminal nodes of the tree. These two string literal metadata elements are grouped under a resource metadata element that models the taxonomy information as a whole. The three elements are connected by the following predicates (declared in IOConstants):

  • PREDICATE_HAS_TAXONOMY connects the resource metadata element modeling the taxonomy information as a whole to a tree node.
  • PREDICATE_HAS_SCIENTIFIC_NAME connects the taxonomy resource metadata subject with the literal metadata object that contains the scientific name as a string.
  • PREDICATE_HAS_NCBI_ID connects the taxonomy resource metadata subject with the literal metadata object that contains the NCBI taxonomy ID as a string.

The following code writes the taxonomy resource element with its two nested literal elements. It is taken from NodeListDataAdapter.writeContentData().

receiver.add(new ResourceMetadataEvent(id + DEFAULT_META_ID_PREFIX + "Tax1", null, 
        new URIOrStringIdentifier(null, PREDICATE_HAS_TAXONOMY), null, null));

if ((data.getTaxonomy().getNCBIID() != null) && !data.getTaxonomy().getNCBIID().isEmpty()) {
    JPhyloIOWritingUtils.writeSimpleLiteralMetadata(receiver, id + DEFAULT_META_ID_PREFIX + "Tax2", null,
            PREDICATE_HAS_NCBI_ID, DATA_TYPE_STRING, data.getTaxonomy().getNCBIID(), null);
}

if ((data.getTaxonomy().getScientificName() != null) && !data.getTaxonomy().getScientificName().isEmpty()) {
    JPhyloIOWritingUtils.writeSimpleLiteralMetadata(receiver, id + DEFAULT_META_ID_PREFIX + "Tax3", null,
            PREDICATE_HAS_SCIENTIFIC_NAME, DATA_TYPE_STRING, data.getTaxonomy().getScientificName(), null);
}

receiver.add(ConcreteJPhyloIOEvent.createEndEvent(EventContentType.RESOURCE_META));

The utility method writeSimpleLiteralMetadata() is used in the same way as in the section above, and its two calls are enclosed by the code that writes an instance of ResourceMetadataEvent and its respective end event. (The full file contains additional code that writes an alternative event sequence with PhyloXML-specific predicates. This is only necessary if the special annotation tags available in PhyloXML are to be supported and is explained below.)

The NeXML output resulting from executing the code above looks like this:

<node id="node2" about="#node2" label="Guinea pig">
    <meta id="node2metaTax1" xsi:type="nex:ResourceMeta" rel="a:hasTaxonomy">
        <meta id="node2metaTax2" xsi:type="nex:LiteralMeta" property="a:hasNCBIID" datatype="xsd:string">10141</meta>
        <meta id="node2metaTax3" xsi:type="nex:LiteralMeta" property="a:hasScientificName" datatype="xsd:string">Cavia porcellus</meta>
    </meta>
    ...
</node>

Reading files with nested metadata elements is similar to reading other nested events (e.g. the nodes within a tree or the sequences within an alignment), which was explained in the previous demos. In this case, the method readStandardTaxonomy() was added to MetadataTreeReader. It is called when the parent taxonomy resource start event is encountered and reads the nested event sequence. The two literal metadata elements are read within this method in the same way as the support value above. (Note that the alternative method readPhyloXMLTaxonomy() exists as well. It is only necessary to read data from the PhyloXML taxonomy tag and is explained below.)

Using PhyloXML-specific annotations

Unlike NeXML, PhyloXML does not allow representing custom nested annotations (e.g. using property tags), but it offers a set of predefined annotations represented by special tags. For taxonomic information there is the taxonomy tag, which has a number of child elements including a scientific_name and an id tag.

To make use of these special tags, NodeListDataAdapter checks in writeContentData() whether the target format is PhyloXML. If so, it writes a different event sequence than for other formats, using PhyloXML-specific predicates:

if (parameters.getObject(KEY_WRITER_INSTANCE, null, JPhyloIOEventWriter.class).getFormatID().equals(
        JPhyloIOFormatIDs.PHYLOXML_FORMAT_ID)) {
	
	
    receiver.add(new ResourceMetadataEvent(id + DEFAULT_META_ID_PREFIX + "Tax1", null, 
            new URIOrStringIdentifier(null, PhyloXMLConstants.PREDICATE_TAXONOMY), null, null));

    // Write NCBI taxonomy ID to JPhyloIO:
    if ((data.getTaxonomy().getNCBIID() != null) && !data.getTaxonomy().getNCBIID().isEmpty()) {
        receiver.add(new ResourceMetadataEvent(id + DEFAULT_META_ID_PREFIX + "Tax3", null, 
                new URIOrStringIdentifier(null, PhyloXMLConstants.PREDICATE_TAXONOMY_ID), null, null));
 
        JPhyloIOWritingUtils.writeSimpleLiteralMetadata(receiver, id + DEFAULT_META_ID_PREFIX + "Tax4", null,
                PhyloXMLConstants.PREDICATE_TAXONOMY_ID_ATTR_PROVIDER, DATA_TYPE_STRING, PHYLOXML_ID_PROVIDER_NCBI, null);
        JPhyloIOWritingUtils.writeSimpleLiteralMetadata(receiver, id + DEFAULT_META_ID_PREFIX + "Tax5", null,
                PhyloXMLConstants.PREDICATE_TAXONOMY_ID_VALUE, DATA_TYPE_STRING, data.getTaxonomy().getNCBIID(), null);
 
        receiver.add(ConcreteJPhyloIOEvent.createEndEvent(EventContentType.RESOURCE_META));  // Terminate the taxonomy ID resource metadata element.
    }

    // Write scientific name to JPhyloIO:
    if ((data.getTaxonomy().getScientificName() != null) && !data.getTaxonomy().getScientificName().isEmpty()) {
        JPhyloIOWritingUtils.writeSimpleLiteralMetadata(receiver, id + DEFAULT_META_ID_PREFIX + "Tax2", null,
                PhyloXMLConstants.PREDICATE_TAXONOMY_SCIENTIFIC_NAME, DATA_TYPE_STRING, data.getTaxonomy().getScientificName(), null);
    }
}
else {  // Write events for other formats
    ...  // See code snippet above.
}

There is a predicate constant declared in PhyloXMLConstants for every metadata tag of PhyloXML. To write such specialized metadata tags, metadata events with these predicates can be written as shown above. Parent tags of PhyloXML are modeled as ResourceMetadataEvents, while attributes and text within such tags are modeled as LiteralMetadataEvents nested within the resource metadata events. Note that the events must be written in the order defined by the PhyloXML schema. Refer to the documentation of PhyloXMLEventWriter for further details.

The above code produces the following output in PhyloXML:

<clade id_source="node2">
    ...
    <taxonomy>
        <id provider="ncbi_taxonomy">10141</id>
        <scientific_name>Cavia porcellus</scientific_name>
    </taxonomy>
    ...
</clade>
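In this output the id tag precedes scientific_name, matching the order in which the events were written, which in turn must follow the PhyloXML schema. As a hedged illustration (not part of JPhyloIO), the following hypothetical helper checks a sequence of child tag names against an expected order. Only the two taxonomy children used in this demo are listed; the complete sequence has to be taken from the PhyloXML schema.

```java
import java.util.List;

// Hypothetical helper, not part of JPhyloIO: checks that child tags appear in the
// order prescribed by a schema sequence.
class SchemaOrderSketch {
    // Excerpt only: the relative order of the two taxonomy children used in this demo.
    static final List<String> TAXONOMY_ORDER_EXCERPT = List.of("id", "scientific_name");

    static boolean isInSchemaOrder(List<String> childTags, List<String> schemaOrder) {
        int lastIndex = -1;
        for (String tag : childTags) {
            int index = schemaOrder.indexOf(tag);
            if ((index < 0) || (index < lastIndex)) {
                return false;  // Unknown tag, or tag written after one that must follow it.
            }
            lastIndex = index;  // Equal indices are allowed for repeated tags.
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isInSchemaOrder(List.of("id", "scientific_name"), TAXONOMY_ORDER_EXCERPT));  // prints true
        System.out.println(isInSchemaOrder(List.of("scientific_name", "id"), TAXONOMY_ORDER_EXCERPT));  // prints false
    }
}
```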

The following event sequence is produced by JPhyloIO from the PhyloXML snippet above:

  • Start event of the type EventContentType.RESOURCE_META with the rel PhyloXMLConstants.PREDICATE_TAXONOMY
    • Start event of the type EventContentType.RESOURCE_META with the rel PhyloXMLConstants.PREDICATE_TAXONOMY_ID
      • Start event of the type EventContentType.LITERAL_META with the predicate PREDICATE_TAXONOMY_ID_ATTR_PROVIDER
        • Sole event EventContentType.LITERAL_META_CONTENT with the content "ncbi_taxonomy"
      • End event of the type EventContentType.LITERAL_META
      • Start event of the type EventContentType.LITERAL_META with the predicate PREDICATE_TAXONOMY_ID_VALUE
        • Sole event EventContentType.LITERAL_META_CONTENT with the content "10141"
      • End event of the type EventContentType.LITERAL_META
    • End event of the type EventContentType.RESOURCE_META
    • Start event of the type EventContentType.LITERAL_META with the predicate PREDICATE_TAXONOMY_SCIENTIFIC_NAME
      • Sole event EventContentType.LITERAL_META_CONTENT with the content "Cavia porcellus"
    • End event of the type EventContentType.LITERAL_META
  • End event of the type EventContentType.RESOURCE_META

Reading taxonomy information from PhyloXML is implemented in the methods readPhyloXMLTaxonomy() and readPhyloXMLTaxonomyID() of MetadataTreeReader, which process the event sequence modeling the content of one taxonomy tag as shown above. Since the contents of the PhyloXML id tag (the text value and the provider attribute) are nested under another resource metadata element, the additional method readPhyloXMLTaxonomyID() was implemented to process the events nested under this resource metadata event.
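The two-method structure can be sketched with a hypothetical, simplified event model: one method per resource level, with the nested id resource delegated to its own method. The event types and the predicate strings ("id", "provider", "idValue", "scientificName") are placeholders, not JPhyloIO classes or PhyloXMLConstants values, and each literal element is assumed to contain exactly one content event. Unknown elements (such as a rank tag) are skipped by depth counting.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; event model and predicate strings are simplified placeholders.
class PhyloXMLTaxonomySketch {
    enum Kind { START, CONTENT, END }

    static final class Event {
        final Kind kind;
        final String text;  // Predicate for START events, value for CONTENT events.
        Event(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    static Event start(String predicate) { return new Event(Kind.START, predicate); }
    static Event content(String value) { return new Event(Kind.CONTENT, value); }
    static Event end() { return new Event(Kind.END, null); }

    static final class Taxonomy {
        String scientificName;
        String ncbiID;
    }

    // Reads a literal element whose START event was already consumed.
    static String readLiteral(Iterator<Event> events) {
        String value = events.next().text;  // Assumes exactly one CONTENT event.
        events.next();  // Consume the END event.
        return value;
    }

    // Skips the rest of an element whose START event was already consumed.
    static void skipElement(Iterator<Event> events) {
        int depth = 1;
        while (depth > 0) {
            Kind kind = events.next().kind;
            if (kind == Kind.START) {
                depth++;
            }
            else if (kind == Kind.END) {
                depth--;
            }
        }
    }

    // Counterpart of readPhyloXMLTaxonomyID(): processes the events nested in the id resource.
    static void readTaxonomyID(Iterator<Event> events, Taxonomy taxonomy) {
        String provider = null;
        String value = null;
        Event event;
        while ((event = events.next()).kind != Kind.END) {
            if ("provider".equals(event.text)) {
                provider = readLiteral(events);
            }
            else if ("idValue".equals(event.text)) {
                value = readLiteral(events);
            }
            else if (event.kind == Kind.START) {
                skipElement(events);
            }
        }
        if ("ncbi_taxonomy".equals(provider)) {  // Ignore IDs of other providers.
            taxonomy.ncbiID = value;
        }
    }

    // Counterpart of readPhyloXMLTaxonomy(): called after the taxonomy START event.
    static Taxonomy readTaxonomy(Iterator<Event> events) {
        Taxonomy taxonomy = new Taxonomy();
        Event event;
        while ((event = events.next()).kind != Kind.END) {
            if ("id".equals(event.text)) {
                readTaxonomyID(events, taxonomy);  // Nested resource element.
            }
            else if ("scientificName".equals(event.text)) {
                taxonomy.scientificName = readLiteral(events);
            }
            else if (event.kind == Kind.START) {
                skipElement(events);
            }
        }
        return taxonomy;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(  // Events following the taxonomy START event.
                start("id"), start("provider"), content("ncbi_taxonomy"), end(),
                start("idValue"), content("10141"), end(), end(),
                start("scientificName"), content("Cavia porcellus"), end(),
                end());
        Taxonomy taxonomy = readTaxonomy(events.iterator());
        System.out.println(taxonomy.scientificName + " / " + taxonomy.ncbiID);  // prints Cavia porcellus / 10141
    }
}
```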

All code explained in this section is only necessary if PhyloXML-specific tags are to be supported. Without such format-specific implementations, the genus and species name would be written into property tags on the top level, and the information about the parent resource metadata element (as it is written to NeXML) would be lost (in the same way as described for Newick and Nexus below). Note that the parameter KEY_PHYLOXML_METADATA_TREATMENT can be used to customize how PhyloXMLEventWriter handles nested metadata if no PhyloXML-specific predicates are used. PhyloXMLMetadataTreatment enumerates the different options.

Nested annotations in Newick and Nexus

The Newick and Nexus formats do not support nested annotations. NewickEventWriter and NexusEventWriter only write terminal metadata events and ignore all resource elements that contain nested elements. Therefore, the hasTaxonomy root element is lost when writing to these formats, and the genus and species annotations end up on the top level:

((Guinea_pig[&hasGenus=Cavia, hasSpecies=porcellus], Louse[&hasGenus=Gyropus, hasSpecies=ovalis]) ... )Root;
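The flattening effect can be reproduced with a small hypothetical sketch (not the implementation of NewickEventWriter or NexusEventWriter): metadata is modeled as a tree of resource and literal elements, and only the literal leaves are collected into one hot comment, so the resource key hasTaxonomy disappears.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the flattening described above; all names are illustrative.
class FlatteningSketch {
    // A metadata element is either a resource (with children) or a literal (with a value).
    static final class Meta {
        final String key;
        final String value;         // null for resource elements
        final List<Meta> children;  // empty for literal elements
        Meta(String key, String value, List<Meta> children) {
            this.key = key;
            this.value = value;
            this.children = children;
        }
    }

    static Meta resource(String key, Meta... children) { return new Meta(key, null, List.of(children)); }
    static Meta literal(String key, String value) { return new Meta(key, value, List.of()); }

    // Collects only terminal (literal) elements; resource keys are dropped.
    static void collectLiterals(Meta meta, List<String> out) {
        if (meta.value != null) {
            out.add(meta.key + "=" + meta.value);
        }
        for (Meta child : meta.children) {
            collectLiterals(child, out);
        }
    }

    static String toHotComment(List<Meta> topLevel) {
        List<String> entries = new ArrayList<>();
        for (Meta meta : topLevel) {
            collectLiterals(meta, entries);
        }
        return "[&" + String.join(", ", entries) + "]";
    }

    public static void main(String[] args) {
        Meta taxonomy = resource("hasTaxonomy", literal("hasGenus", "Cavia"), literal("hasSpecies", "porcellus"));
        System.out.println("Guinea_pig" + toHotComment(List.of(taxonomy)));
        // prints Guinea_pig[&hasGenus=Cavia, hasSpecies=porcellus]
    }
}
```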

Because the genus and species annotations are shifted to the top level, this demo application cannot read them back from Newick or Nexus, which is intended to show which formats support nested annotations and which don't. It would of course be possible to adjust MetadataTreeReader so that it also reads the genus and species name from top-level literal metadata events with the respective string predicates. Such a specialized implementation is omitted in this example to keep it from becoming too complex.

Lists of simple values as annotations (the size measurement example)

In this last example, a list of double values is attached to tree nodes. It represents size measurements obtained from different individuals of the respective species. Newick and Nexus support special hot comments that allow attaching a list of strings or numeric values to a node or an edge. If a literal metadata event with any implementation of Iterable as its object value is passed to a JPhyloIO writer as shown below, it is automatically written as a list. The list ↔ string conversion is implemented in ListTranslator.

JPhyloIOWritingUtils.writeSimpleLiteralMetadata(receiver, id + DEFAULT_META_ID_PREFIX + "Sizes", null, 
        PREDICATE_HAS_SIZE_MEASUREMENTS, DATA_TYPE_SIMPLE_VALUE_LIST, data.getSizeMeasurements(), null);

If the code above is executed, the result in Newick and Nexus will look like this:

((Guinea_pig[&hasGenus=Cavia, hasSpecies=porcellus, hasSizeMeasurements={0.3, 0.31, 0.24}], 
        Louse[&hasGenus=Gyropus, hasSpecies=ovalis, hasSizeMeasurements={0.0012, 9.0E-4, 0.0011, 0.0013, 0.0012}]) ... )Root;
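The {a, b, c} notation shown above can be illustrated by a minimal numeric round trip. This is a hypothetical sketch, not ListTranslator (whose API is not reproduced here and which also supports string values); it relies on Double.toString(), which produces forms such as 9.0E-4 for small values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical sketch of a numeric list <-> string conversion in the {a, b, c} notation.
class SimpleListSketch {
    static String format(List<Double> values) {
        StringJoiner joiner = new StringJoiner(", ", "{", "}");
        for (Double value : values) {
            joiner.add(String.valueOf(value));  // Uses Double.toString(), e.g. 9.0E-4.
        }
        return joiner.toString();
    }

    static List<Double> parse(String text) {
        String trimmed = text.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}") || (trimmed.length() < 2)) {
            throw new IllegalArgumentException("Not a list: " + text);
        }
        List<Double> result = new ArrayList<>();
        String inner = trimmed.substring(1, trimmed.length() - 1).trim();
        if (!inner.isEmpty()) {
            for (String element : inner.split(",")) {
                result.add(Double.parseDouble(element.trim()));  // Throws NumberFormatException for non-numbers.
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(format(List.of(0.0012, 9.0E-4, 0.0011, 0.0013, 0.0012)));
        // prints {0.0012, 9.0E-4, 0.0011, 0.0013, 0.0012}
        System.out.println(parse("{0.3, 0.31, 0.24}"));  // prints [0.3, 0.31, 0.24]
    }
}
```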

To achieve maximal format independence, JPhyloIO also supports writing such lists to XML formats (although these would of course allow a dedicated XML representation of a list). To achieve this, the data type DATA_TYPE_SIMPLE_VALUE_LIST (declared in ReadWriteConstants) must be specified for such a LiteralMetadataEvent.

The output in NeXML would be the following:

<node id="node2" about="#node2" label="Guinea pig">
    ...
    <meta id="node2metaSizes" xsi:type="nex:LiteralMeta" property="a:hasSizeMeasurements"
            datatype="jpd:simpleValueList">{0.3, 0.31, 0.24}</meta>
</node>

PhyloXML-specific reading

The representation in PhyloXML looks like this:

<property ref="a:hasSizeMeasurements" datatype="xsd:string" applies_to="node">{0.3, 0.31, 0.24}</property>

As mentioned above, PhyloXML only allows a predefined set of data type declarations, which is why the JPhyloIO-specific data type jpd:simpleValueList cannot be written to a document. PhyloXMLEventWriter uses xsd:string instead. As a consequence, ListTranslator is not used to interpret the string representation of the list when reading, and the value is modeled as string literal metadata instead. As a workaround for this limitation of the PhyloXML format, this example application tests whether the read object is an instance of List and, if not, converts the string manually:

Object list = JPhyloIOReadingUtils.readLiteralMetadataContentAsObject(reader, Object.class);
if (list instanceof List) {  // This case is used when reading valid documents of all formats but PhyloXML.
    // If the document is invalid, the list does not necessarily contain only double values.
    // A real-world application would have to check this to avoid exceptions.
    data.setSizeMeasurements((List<Double>)list);
}
else if (list instanceof String) {  // This block is used when reading valid PhyloXML documents.
    try {
        data.setSizeMeasurements((List)ListTranslator.parseList((String)list));
    } 
    catch (UnsupportedOperationException | InvalidObjectSourceDataException e) {}  // Invalid list strings are ignored in this demo.
}

The code snippet above is taken from MetadataTreeReader.
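The comment in this snippet notes that a real-world application would have to check the element types of the list. A minimal sketch of such a check, assuming any numeric element type should be accepted, could look like this. SizeListCheckSketch and toDoubleList() are hypothetical names; the sketch only covers the List branch, not the PhyloXML string branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: accepts any List and keeps it only if every element is a Number.
class SizeListCheckSketch {
    // Returns the values as doubles, or null if the object is not a list of numbers.
    static List<Double> toDoubleList(Object object) {
        if (!(object instanceof List)) {
            return null;
        }
        List<Double> result = new ArrayList<>();
        for (Object element : (List<?>)object) {
            if (!(element instanceof Number)) {
                return null;  // Invalid document: non-numeric list element.
            }
            result.add(((Number)element).doubleValue());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(toDoubleList(List.of(1, 2.5)));       // prints [1.0, 2.5]
        System.out.println(toDoubleList(List.of("a", 0.3)));     // prints null
    }
}
```

Converting through Number.doubleValue() avoids the unchecked cast to List<Double> in the original snippet, at the cost of copying the list.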

bioinfweb - Biology & Informatics Website