Help - Alignment and data models

From LibrAlign Documentation

LibrAlign defines interfaces to be implemented by model classes providing the data for the Alignment GUI components, as well as abstract and complete model implementations. The main interface for alignment models is AlignmentModel.

Using alignment models

Details on the methods of an alignment model can be found in the API documentation of AlignmentModel. This section gives an overview of how to create model instances and how to edit and access sequences and tokens. The sample code developed below can also be found in the class CreateAlignmentModel in the demo repository.

Creating a model instance

The following example code creates a new model instance to hold DNA nucleotide data:

// Create new model instance:
AlignmentModel<Character> model = new PackedAlignmentModel<Character>(CharacterTokenSet.newDNAInstance());

The interface AlignmentModel has a generic parameter defining the token class to be used. In this example we chose Character to represent nucleotides. LibrAlign does not make any demands on the token class, so any other type (e.g. String, NucleotideCompound from BioJava or any custom class) would have been valid here. The only requirement is that the alignment model instance is created using an appropriate token set that can deal with the specified token type. (See Token sets for details on how to create custom token sets.)

PackedAlignmentModel was chosen as the alignment model implementation for this example. Packed alignment models use only a small amount of memory to store the sequence data and are usually a good choice, especially for large sequences. Their only disadvantage is that they need to know the number of different tokens when they are constructed. (Their token set cannot be extended later on.) If such an extension might be necessary because the full token set is not known at construction time, another model implementation (e.g. ArrayListAlignmentModel, see below) can be used.

The constructor of PackedAlignmentModel expects a token set that is compatible with the generic token type. Since we chose Character as the token type, it was possible to use CharacterTokenSet, which offers a set of static methods providing predefined sets for DNA, RNA or amino acid data. Alternatively, tokens could have been added manually to the set. (See Token sets for further details.)
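To illustrate why a packed model needs to know its token set up front, the following is a minimal sketch of fixed-width token packing. It is not the implementation used by PackedAlignmentModel; the class PackedTokenSketch and all of its methods are hypothetical names made up for this illustration. The idea shown is that each token is stored as an index into the token set, using just enough bits to distinguish all indices, so the bit width depends on the size of the set at construction time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch (not LibrAlign code): tokens stored as fixed-width indices. */
class PackedTokenSketch {
    private final List<Character> tokenSet;  // fixed when the object is constructed
    private final int bitsPerToken;          // bits needed to encode one token index
    private final int tokensPerWord;         // token indices stored in one 64-bit word
    private long[] words = new long[1];
    private int length = 0;

    PackedTokenSketch(List<Character> tokenSet) {
        if (tokenSet.isEmpty()) {
            throw new IllegalArgumentException("The token set must not be empty.");
        }
        this.tokenSet = new ArrayList<>(tokenSet);
        // Smallest bit count that can distinguish all token indices (at least 1):
        bitsPerToken = Math.max(1, 32 - Integer.numberOfLeadingZeros(tokenSet.size() - 1));
        tokensPerWord = 64 / bitsPerToken;
    }

    int bitsPerToken() { return bitsPerToken; }

    int length() { return length; }

    void append(char token) {
        int index = tokenSet.indexOf(Character.valueOf(token));
        if (index < 0) {
            // The bit width was derived from the set size, so unknown tokens are rejected.
            throw new IllegalArgumentException("Token not in token set: " + token);
        }
        int word = length / tokensPerWord;
        if (word == words.length) {
            words = Arrays.copyOf(words, words.length * 2);  // grow the backing array
        }
        words[word] |= ((long) index) << ((length % tokensPerWord) * bitsPerToken);
        length++;
    }

    char get(int column) {
        if (column < 0 || column >= length) {
            throw new IndexOutOfBoundsException("Column: " + column);
        }
        long mask = (1L << bitsPerToken) - 1;
        int shift = (column % tokensPerWord) * bitsPerToken;
        // Constant-time access: locate the word, shift and mask out the index.
        return tokenSet.get((int) ((words[column / tokensPerWord] >>> shift) & mask));
    }

    public static void main(String[] args) {
        PackedTokenSketch seq = new PackedTokenSketch(Arrays.asList('A', 'T', 'C', 'G', '-'));
        for (char c : "AT-G".toCharArray()) {
            seq.append(c);
        }
        System.out.println(seq.bitsPerToken() + " bits per token, column 2 = " + seq.get(2));
    }
}
```

With the five tokens used here, three bits per token are sufficient. Adding a token later could require a wider encoding for all stored data, which is one plausible reason for a packed model to forbid extending its token set.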

Adding sequences and tokens

After the model instance is created, we are going to add the first sequence:

// Add one sequence and add single tokens:
int id = model.addSequence("Seq1");
model.appendToken(id, 'A');
model.appendToken(id, 'T');
model.appendToken(id, 'C');
model.appendToken(id, 'G');

This example shows how to manually add sequences and tokens to a model instance, to make you familiar with the architecture of alignment models in LibrAlign. (Note that tokens and sequences would usually be added by the application user through a corresponding GUI component or loaded from a data source.)

To be able to add tokens, we first need to add a sequence to our empty model. This is done using the addSequence() method, which returns an integer that identifies this sequence from now on. (Note that sequences are identified by these IDs in LibrAlign instead of their names. That makes it easy to keep valid references to sequences, even if they are renamed.)

The subsequent appendToken() calls append four single tokens to the new sequence using the sequence ID that was stored in id.
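The benefit of identifying sequences by ID rather than by name can be sketched without LibrAlign. The class SequenceIDSketch below is a hypothetical stand-in, not part of the library; its method names only loosely mirror the ones used in this example, and renameSequence() is an assumption made for the illustration. It shows that an ID obtained when a sequence is added stays valid after the sequence is renamed.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch (not LibrAlign code): sequences are looked up by a stable integer ID. */
class SequenceIDSketch {
    private final Map<Integer, String> namesByID = new LinkedHashMap<>();
    private int nextID = 0;

    /** Registers a new sequence and returns the ID that identifies it from now on. */
    int addSequence(String name) {
        int id = nextID++;
        namesByID.put(id, name);
        return id;
    }

    /** Changes the name only; the ID held by callers is not affected. */
    void renameSequence(int id, String newName) {
        if (!namesByID.containsKey(id)) {
            throw new IllegalArgumentException("Unknown sequence ID: " + id);
        }
        namesByID.put(id, newName);
    }

    String sequenceNameByID(int id) {
        return namesByID.get(id);
    }

    public static void main(String[] args) {
        SequenceIDSketch model = new SequenceIDSketch();
        int id = model.addSequence("Seq1");
        model.renameSequence(id, "Renamed sequence");
        // The reference stored in id still points to the same sequence:
        System.out.println(id + " -> " + model.sequenceNameByID(id));
    }
}
```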

With the following code, we are going to add another sequence to our model. This time we will add a whole list of tokens in one step, instead of adding each token with a separate method call:

// Add another sequence and add a list of tokens:
id = model.addSequence("Seq2");
model.appendTokens(id, AlignmentModelUtils.charSequenceToTokenList("AT-G", model.getTokenSet()));

The appendTokens() method defined by AlignmentModel allows a collection of tokens to be added in one step. In this case we used a convenience method of AlignmentModelUtils to convert a string into a Collection of Character objects. Another advantage of using appendTokens() instead of multiple calls of appendToken() is that only one event will be generated by the model (see below).
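The difference in the number of generated events can be made concrete with a small stand-alone sketch. The class BulkAppendEventSketch is hypothetical and does not use LibrAlign's listener interfaces; a plain IntConsumer receiving the number of inserted tokens is assumed here in place of a real change event. It only demonstrates the pattern: a bulk operation notifies listeners once, while repeated single operations notify them once per call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.function.IntConsumer;

/** Hypothetical sketch (not LibrAlign code): one notification per modifying call. */
class BulkAppendEventSketch {
    private final List<Character> tokens = new ArrayList<>();
    private final List<IntConsumer> listeners = new ArrayList<>();

    /** Listeners receive the number of tokens inserted by one call. */
    void addListener(IntConsumer listener) {
        listeners.add(listener);
    }

    void appendToken(char token) {
        tokens.add(token);
        fireTokensInserted(1);
    }

    void appendTokens(Collection<Character> newTokens) {
        tokens.addAll(newTokens);
        fireTokensInserted(newTokens.size());  // a single event for the whole collection
    }

    int length() {
        return tokens.size();
    }

    private void fireTokensInserted(int count) {
        for (IntConsumer listener : listeners) {
            listener.accept(count);
        }
    }

    public static void main(String[] args) {
        BulkAppendEventSketch model = new BulkAppendEventSketch();
        int[] eventCount = {0};
        model.addListener(count -> eventCount[0]++);

        for (char c : "ATCG".toCharArray()) {
            model.appendToken(c);  // four calls, four events
        }
        model.appendTokens(Arrays.asList('A', 'T', '-', 'G'));  // one call, one event
        System.out.println("Events received: " + eventCount[0]);
    }
}
```

Fewer events matter in practice because every listener (for example a GUI component that repaints itself) reacts to each one.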

Accessing sequences and tokens

At the end of our example, we will access the sequences and tokens from the model again and print them to standard output. (This is just for demonstration purposes, since usually an AlignmentArea would display the contents of a model.)

// Test output of the alignment:
Iterator<Integer> idIterator = model.sequenceIDIterator();
while (idIterator.hasNext()) {
	id = idIterator.next();
	
	// Print sequence name:
	System.out.print(model.sequenceNameByID(id) + "\t");
	
	// Print tokens (nucleotides):
	for (int column = 0; column < model.getSequenceLength(id); column++) {
		System.out.print(model.getTokenAt(id, column));
	}
	System.out.println();
}

To access the sequences contained in the model, we used an iterator over all sequence IDs returned by the sequenceIDIterator() method. The length of each sequence can be determined by calling getSequenceLength() with the corresponding ID, and single tokens can be accessed using getTokenAt() with the sequence ID and the column index.

The output looks as follows:

Seq1	ATCG
Seq2	AT-G

Available alignment models

LibrAlign offers the following implementations of AlignmentModel that can be used directly:

  • PackedAlignmentModel: Compresses the stored data, but still allows access in constant time. Token sets cannot be extended after creation of the model.
  • ArrayListAlignmentModel: An implementation of AlignmentModel using ArrayList objects internally. Its token set may be extended after the model has been created, but it uses more memory than PackedAlignmentModel.
  • CharSequenceAlignmentModel: An alignment model implementation backed by a set of CharSequence implementations (e.g. String objects). Single tokens are characters.
  • BioJava adapter alignment models.

Although the model implementations mentioned above should be sufficient for most cases, LibrAlign additionally offers a set of abstract model implementations to simplify the creation of new custom models. The following list contains the most basic available classes. Additional classes are shown in the UML diagram below.

  • AbstractAlignmentModel: This is the base class for all alignment models and alignment model decorators. It implements the common change listener functionality.
  • AbstractUndecoratedAlignmentModel: Implements sequence ID management, token set storage and efficient max length calculation for all direct alignment model implementations (that are not decorators of another model, see below).
  • AbstractMapBasedAlignmentModel: Base class that inherits from AbstractUndecoratedAlignmentModel and provides a map to organize sequence objects.

Alignment model decorators

Alignment model decorators are implementations of AlignmentModel that provide modified access to another underlying alignment model. Such modifications could be the replacement of tokens (e.g. viewing a DNA sequence as an RNA sequence) or inserting additional gaps (e.g. in AlignmentComparator). For decorators LibrAlign also provides abstract implementations to inherit custom decorators from, as well as full implementations.

The following abstract decorator implementations are available in LibrAlign:

  • AbstractAlignmentModelDecorator: Basic implementation for alignment model decorators that stores a reference to the underlying model and implements basic event forwarding from the underlying model.
  • AbstractTokenReplacementAlignmentModelDecorator: Decorators that simply replace tokens without changing their position can be inherited from this class. It extends AbstractAlignmentModelDecorator by implementing many inherited abstract methods for that purpose.

In addition the following concrete decorator implementations are available to date:

  • DNAAlignmentModelDecorator: Shows the underlying nucleotide data source as a DNA sequence, i.e. replaces all uracil tokens by thymine tokens.
  • RNAAlignmentModelDecorator: Shows the underlying nucleotide data source as an RNA sequence, i.e. replaces all thymine tokens by uracil tokens.
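The token replacement idea behind these decorators can be sketched in a few lines of plain Java. The types below (TokenSourceSketch, StringTokenSourceSketch, RNAViewSketch) are hypothetical and much simpler than the AlignmentModel interface; they are not the classes listed above. The sketch only shows the principle: the decorator holds a reference to an underlying source and translates tokens on read access, without changing their positions or the stored data.

```java
/** Hypothetical, strongly simplified stand-in for a read-only sequence model. */
interface TokenSourceSketch {
    int length();
    char tokenAt(int column);
}

/** Underlying data source backed by a string. */
class StringTokenSourceSketch implements TokenSourceSketch {
    private final String tokens;

    StringTokenSourceSketch(String tokens) {
        this.tokens = tokens;
    }

    @Override
    public int length() {
        return tokens.length();
    }

    @Override
    public char tokenAt(int column) {
        return tokens.charAt(column);
    }
}

/** Decorator that presents the underlying nucleotides as RNA (T is shown as U). */
class RNAViewSketch implements TokenSourceSketch {
    private final TokenSourceSketch underlying;

    RNAViewSketch(TokenSourceSketch underlying) {
        this.underlying = underlying;
    }

    @Override
    public int length() {
        return underlying.length();  // positions are unchanged, so the length is forwarded
    }

    @Override
    public char tokenAt(int column) {
        char token = underlying.tokenAt(column);
        switch (token) {
            case 'T': return 'U';
            case 't': return 'u';
            default: return token;  // all other tokens (including gaps) pass through
        }
    }

    /** Helper for the demonstration: concatenates all tokens of a source. */
    static String render(TokenSourceSketch source) {
        StringBuilder result = new StringBuilder();
        for (int column = 0; column < source.length(); column++) {
            result.append(source.tokenAt(column));
        }
        return result.toString();
    }

    public static void main(String[] args) {
        TokenSourceSketch dna = new StringTokenSourceSketch("AT-G");
        TokenSourceSketch rna = new RNAViewSketch(dna);
        System.out.println(render(dna) + " viewed as RNA: " + render(rna));
    }
}
```

Because the replacement happens on access, the underlying source stays untouched and several different views of the same data can exist at the same time.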

Note that decorators are a way of converting one sequence type into another (e.g. converting between DNA, RNA or amino acids) at the model level. Alternatively, the output of an AlignmentArea can be modified using different token painters to perform the conversion at the view level. The latter option may be preferable, for example, if a user should be able to switch the way sequences are displayed in an AlignmentArea (e.g. between nucleotide and amino acid view) dynamically at runtime.

Overview on available alignment model classes

UML class diagram of the alignment models available in LibrAlign. On the left direct (abstract and concrete) implementations of AlignmentModel are shown, while decorator implementations are shown on the right.

Listening to model events

Data models