CLUSTALW

../../_images/ClustalW.jpg

About

ClustalW2 [1] is a widely used computer program for the multiple alignment of nucleic acid and protein sequences.

The program accepts a wide range of input formats, including NBRF/PIR, FASTA, EMBL/Swissprot, Clustal, GCG/MSF, GCG9 RSF, and GDE, and executes the following workflow:

  • Pairwise alignment;
  • Creation of a phylogenetic tree (or use of a user-defined tree);
  • Use of the phylogenetic tree to carry out a multiple alignment.

Users can align the sequences with the default settings, but it is occasionally useful to customize the parameters. The main parameters are the gap opening penalty and the gap extension penalty.
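As an illustration of how the two gap penalties can be set, here is a minimal Python sketch that builds a ClustalW command line. It assumes the executable is named clustalw2 and is available on the PATH; the function name build_clustalw_command and the input file name are hypothetical, and this is not necessarily how the portlet itself invokes the program.

```python
from typing import List, Optional


def build_clustalw_command(
    infile: str,
    seq_type: str = "DNA",
    gap_open: Optional[float] = None,
    gap_ext: Optional[float] = None,
    executable: str = "clustalw2",  # assumption: binary name on PATH
) -> List[str]:
    """Return the argument list for a ClustalW multiple alignment run."""
    seq_type = seq_type.upper()
    if seq_type not in ("DNA", "PROTEIN"):
        raise ValueError("seq_type must be 'DNA' or 'PROTEIN'")

    cmd = [executable, f"-INFILE={infile}", "-ALIGN", f"-TYPE={seq_type}"]

    # Override the penalties only when requested, so that ClustalW
    # falls back to its own defaults otherwise.
    if gap_open is not None:
        cmd.append(f"-GAPOPEN={gap_open}")
    if gap_ext is not None:
        cmd.append(f"-GAPEXT={gap_ext}")
    return cmd


if __name__ == "__main__":
    # Hypothetical input file; the command is printed, not executed.
    print(" ".join(build_clustalw_command("sequences.fasta", "PROTEIN", 10.0, 0.2)))
```

The returned list can be passed to subprocess.run on a machine where ClustalW is installed.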

For more information:

  • Consult the official documentation [2];
  • Clustal W and Clustal X version 2.0 [3];
  • A video (MP4) [4] showing how to use ClustalW from the Africa Grid Science Gateway [5].

Installation

To install this portlet, deploy the WAR file into the application server.

As soon as the portlet has been successfully deployed on the Science Gateway, the administrator has to configure:

  • the list of e-Infrastructures where the application can be executed;
  • some additional application settings.

1.) To configure a generic e-Infrastructure, the following settings have to be provided:

Enabled: A true/false flag which enables or disables the generic e-Infrastructure;

Infrastructure: The acronym to reference the e-Infrastructure;

VOName: The VO for this e-Infrastructure;

TopBDII: The Top BDII for this e-Infrastructure;

WMS Endpoint: A list of WMS endpoints for this e-Infrastructure (max. 10);

MyProxyServer: The MyProxyServer for this e-Infrastructure;

eTokenServer: The eTokenServer for this e-Infrastructure;

Port: The eTokenServer port for this e-Infrastructure;

Serial Number: The MD5SUM of the robot certificate to be used for this e-Infrastructure;
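The settings listed above can be pictured as a single record. The following Python sketch is only a way to check a set of values before entering them in the portlet preferences: the class InfrastructureSettings, its field names and the validation rules are hypothetical and are not part of the portlet. It assumes that the serial number is written as 32 hexadecimal characters, the usual form of an MD5 sum.

```python
import re
from dataclasses import dataclass
from typing import List

MAX_WMS_ENDPOINTS = 10  # limit stated in the settings list above


@dataclass
class InfrastructureSettings:
    """Hypothetical container mirroring the portlet's e-Infrastructure fields."""

    enabled: bool
    infrastructure: str
    vo_name: str
    top_bdii: str
    wms_endpoints: List[str]
    myproxy_server: str
    etoken_server: str
    etoken_port: int
    serial_number: str  # MD5 sum of the robot certificate

    def validate(self) -> List[str]:
        """Return a list of problems; the list is empty when all checks pass."""
        problems = []
        if len(self.wms_endpoints) > MAX_WMS_ENDPOINTS:
            problems.append(f"more than {MAX_WMS_ENDPOINTS} WMS endpoints")
        if not 1 <= self.etoken_port <= 65535:
            problems.append("eTokenServer port out of range")
        # Assumption: the MD5 sum is given as 32 hexadecimal characters.
        if not re.fullmatch(r"[0-9a-fA-F]{32}", self.serial_number):
            problems.append("serial number is not a 32-character MD5 sum")
        return problems
```

All values passed to this class in a real check would come from the administrator; none are provided here.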

The following figure shows how the portlet has been configured to run simulations on the EUMEDGRIDSupport e-Infrastructure [6].

../../_images/CLUSTALW_settings.jpg

2.) To configure the application, the following settings have to be provided:

AppID: The ApplicationID as registered in the UserTracking MySQL database (GridOperations table);

Software TAG: The list of software tags requested by the application;

SMTP Host: The SMTP server used to send notifications to users;

Sender: The FROM e-mail address used to send users notification messages about job execution;

The figure below shows how the application settings have been configured to run on the Africa Grid Science Gateway [5].

../../_images/CLUSTALW_settings2.jpg

Usage

To perform a multiple sequence alignment of DNA or protein sequences, the user has to:

  • click on the third accordion of the portlet,
  • choose the sequence type (DNA or protein),
  • upload the sequences as an ASCII file OR use the pre-configured default by clicking in the textarea below,
  • configure additional settings if needed, as shown in the figure below:
../../_images/CLUSTALW_inputs.jpg

Each simulation will produce:

  • std.out: the standard output file;
  • std.err: the standard error file;
  • .tar.gz: an archive containing the results of the sequence alignment.

A typical simulation produces, at the end, the following files:

]$ tree SequenceAlignmentSimulationStarted_126163/
SequenceAlignmentSimulationStarted_126163/
├── std.err
├── std.out
├── output.README
└── outputs.tar.gz

]$ tar zxvf outputs.tar.gz
20150601120928_larocca.aln
20150601120928_larocca.dnd
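
The archive can also be unpacked from a script. The sketch below uses only the Python standard library; the function name extract_results and the destination directory are hypothetical. It relies on the usual ClustalW convention that the .aln file holds the alignment and the .dnd file holds the guide tree.

```python
import tarfile
from pathlib import Path
from typing import Dict, List


def extract_results(archive: str, dest: str) -> Dict[str, List[Path]]:
    """Unpack a results archive and group the extracted files by role."""
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive, "r:gz") as tar:
        # Keep regular files only, and skip names that would escape dest_dir.
        members = [
            m for m in tar.getmembers()
            if m.isfile()
            and not Path(m.name).is_absolute()
            and ".." not in Path(m.name).parts
        ]
        tar.extractall(dest_dir, members=members)

    extracted = sorted(dest_dir / m.name for m in members)
    return {
        "alignment": [p for p in extracted if p.suffix == ".aln"],
        "guide_tree": [p for p in extracted if p.suffix == ".dnd"],
    }


if __name__ == "__main__":
    # Assumes outputs.tar.gz is in the current directory.
    results = extract_results("outputs.tar.gz", "results")
    for role, paths in results.items():
        print(role, [p.name for p in paths])
```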

Contributor(s)

Please feel free to contact us any time if you have any questions or comments.

Authors:

Roberto BARBERA - Italian National Institute of Nuclear Physics (INFN),

Giuseppe LA ROCCA - Italian National Institute of Nuclear Physics (INFN)