Antimicrobial peptides are considered a promising alternative to conventional antibiotics, especially now that advances in computing make it possible to predict antimicrobial activity algorithmically and thus to focus experimental work. These efforts have produced a number of antimicrobial peptide databases, such as
- APD2 (Wang et al., 2009),
- CAMP (Shaini et al., 2009),
- AMSDb (Tossi and Sandri, 2002),
- AMPer (Fjell et al., 2007),
- YADAMP (Piotto et al., 2012),
- DAMPD (Sundararajan et al., 2011) and
- BACTIBASE (Hammami et al., 2007),
as well as prediction tools. Although plants have been studied for their pharmacological properties since antiquity, only one database is dedicated to plant antimicrobial peptides: PhytAMP (Hammami et al., 2008), which is small in size but rich in experimental data.
We carried out a large-scale exploration of the predicted antimicrobial activity of all possible peptides (with lengths from 5 AA to 100 AA)
derived from all the proteins of all plant species. Our classifier is a Support Vector Machine (SVM) implementation based on LIBSVM, with
built-in support for probability estimates for each class.
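To make the scale of this exploration concrete, the following is a minimal sketch of how the candidate peptides of a single protein could be enumerated, assuming that "all possible peptides" means every contiguous subsequence of 5 to 100 residues. The function names are hypothetical and are not taken from the C-PAmP code.

```python
from typing import Iterator, Tuple

MIN_LEN = 5    # shortest peptide considered (from the text)
MAX_LEN = 100  # longest peptide considered (from the text)


def enumerate_peptides(protein: str, min_len: int = MIN_LEN,
                       max_len: int = MAX_LEN) -> Iterator[Tuple[int, str]]:
    """Yield (start, peptide) for every contiguous subsequence in the length range."""
    n = len(protein)
    for length in range(min_len, min(max_len, n) + 1):
        for start in range(n - length + 1):
            yield start, protein[start:start + length]


def count_peptides(n: int, min_len: int = MIN_LEN, max_len: int = MAX_LEN) -> int:
    """Number of candidate peptides in a protein of n residues."""
    return sum(n - length + 1 for length in range(min_len, min(max_len, n) + 1))
```

Under this assumption a single 300-residue protein already yields `count_peptides(300)` = 23,856 candidates, which is why each candidate has to be scored automatically by the classifier.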
We chose the pseudo amino acid composition (REF) with respect to the E1 amino acid descriptor of (REF) as our feature vector.
- The E1 descriptor is one of a set of 5 descriptors derived from an original set of 237 amino acid descriptors,
in such a way that the distribution of amino acids in this 5-dimensional property space is roughly the same as that in the original 237-dimensional space.
As a result, each of these descriptors is correlated with numerous physicochemical properties;
the E1 descriptor is tightly coupled with measures of hydrophobicity/hydrophilicity, polarity and charge, all of which have been associated with antimicrobial activity.
- Moreover, the pseudo amino acid composition allows us to take into account both amino acid composition and relative position of amino acids within a sequence, and has been used in the past for protein functional characterization.
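The feature vector described above can be sketched as follows. This is a minimal illustration of a pseudo amino acid composition built on a single property, not the exact C-PAmP implementation: the number of correlation tiers `lam`, the weight `weight`, and the squared-difference correlation function are assumptions, and the Kyte-Doolittle hydropathy scale is used only as a stand-in because the E1 values are not given in the text.

```python
from statistics import mean, pstdev
from typing import Dict, List

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Stand-in property (Kyte-Doolittle hydropathy). These are NOT the E1 values.
KYTE_DOOLITTLE: Dict[str, float] = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def standardize(prop: Dict[str, float]) -> Dict[str, float]:
    """Rescale a property to zero mean and unit variance over the 20 amino acids."""
    mu = mean(prop.values())
    sd = pstdev(prop.values())
    return {aa: (value - mu) / sd for aa, value in prop.items()}


def pseaac(seq: str, prop: Dict[str, float],
           lam: int = 4, weight: float = 0.05) -> List[float]:
    """Return 20 composition components followed by lam sequence-order components."""
    if any(residue not in prop for residue in seq):
        raise ValueError("sequence contains a residue without a property value")
    n = len(seq)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    h = standardize(prop)
    # Plain amino acid composition (sums to 1).
    freqs = [seq.count(aa) / n for aa in AMINO_ACIDS]
    # Tier-k correlation: mean squared property difference k positions apart.
    thetas = [
        mean([(h[seq[i]] - h[seq[i + k]]) ** 2 for i in range(n - k)])
        for k in range(1, lam + 1)
    ]
    denom = sum(freqs) + weight * sum(thetas)
    return [f / denom for f in freqs] + [weight * t / denom for t in thetas]
```

With `lam = 4` the vector has 24 components and can still be computed for the shortest peptides considered here (5 AA). The first 20 components carry the composition and the last 4 carry the positional information.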
Our initial positive dataset comprised all the APD peptides and all the experimentally validated CAMP peptides, filtered at 85% sequence identity with CD-HIT. Our negative dataset comprised three kinds of sequences: random subsequences of SwissProt proteins that have not been described/annotated as antimicrobial, synthetic amino acid sequences following a uniform amino acid distribution, and synthetic amino acid sequences following the amino acid distribution of SwissProt.
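The three kinds of negative sequences could be generated along the following lines. This is a hedged sketch only: the function names are hypothetical, the background distribution is estimated from whatever protein list is supplied (standing in for SwissProt), and the filtering of antimicrobial annotations is assumed to have been done beforehand.

```python
import random
from typing import Dict, Iterable

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def random_subsequence(protein: str, length: int, rng: random.Random) -> str:
    """Cut a random contiguous fragment out of a (non-antimicrobial) protein."""
    start = rng.randrange(len(protein) - length + 1)
    return protein[start:start + length]


def synthetic_uniform(length: int, rng: random.Random) -> str:
    """Synthetic sequence in which all 20 amino acids are equally likely."""
    return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))


def background_frequencies(proteins: Iterable[str]) -> Dict[str, float]:
    """Estimate the amino acid distribution of a protein collection."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for protein in proteins:
        for residue in protein:
            if residue in counts:
                counts[residue] += 1
    total = sum(counts.values())
    return {aa: count / total for aa, count in counts.items()}


def synthetic_from_distribution(length: int, freqs: Dict[str, float],
                                rng: random.Random) -> str:
    """Synthetic sequence drawn residue by residue from a given distribution."""
    residues = list(freqs)
    weights = [freqs[aa] for aa in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which matters when the same negative set has to be regenerated for later comparisons.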
The classifier performed well in 10-fold cross-validation, with an average accuracy of 0.91, an average sensitivity of 0.93 and an average specificity of 0.90.
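For reference, the three reported measures can be computed per fold and then averaged as in the sketch below. It assumes the usual definitions (sensitivity as the fraction of true antimicrobial peptides recovered, specificity as the fraction of negatives correctly rejected) and labels of 1 for antimicrobial and 0 otherwise. The function names are illustrative.

```python
from typing import Dict, List, Sequence, Tuple


def fold_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, sensitivity and specificity for one cross-validation fold."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
    }


def average_metrics(
    folds: List[Tuple[Sequence[int], Sequence[int]]]
) -> Dict[str, float]:
    """Average each measure over the (y_true, y_pred) pairs of all folds."""
    per_fold = [fold_metrics(y_true, y_pred) for y_true, y_pred in folds]
    return {
        name: sum(m[name] for m in per_fold) / len(per_fold)
        for name in ("accuracy", "sensitivity", "specificity")
    }
```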
It is worth noting that, of the 273 antimicrobial peptides in PhytAMP, 67 were in our training set; our classifier classified 99.5% of the remaining 206 as antimicrobial, whereas the corresponding percentage for the CAMP SVM classifier is 91.2%.
The C-PAmP database contains 15,174,905 computationally predicted peptides, with lengths from 5 to 100 AA, from 2,112 plant species. It supports multiple types of queries. The user may search by
- peptide sequence,
- protein or
- species.
The user can also retrieve potential antimicrobial peptides within a given length range or above a given score, for higher confidence in the predicted antimicrobial activity. Each peptide retrieved from the database is accompanied by both its C-PAmP score and its CAMP score, to enrich the antimicrobial scoring information. In addition, a protein search returns an antimicrobial heat map for each protein, which graphically represents the antimicrobially 'hot' regions of that protein.
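The length/score filtering and the heat map idea can be illustrated with a small sketch. How C-PAmP actually builds its heat maps is not described here, so the aggregation rule below (each residue takes the highest score of any predicted peptide covering it) is purely an assumption, and the `Hit` record and function names are hypothetical.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    start: int    # 0-based start position of the peptide in its protein
    length: int   # peptide length in residues
    score: float  # predicted antimicrobial score in [0, 1]


def filter_hits(hits: List[Hit], min_len: int, max_len: int,
                min_score: float) -> List[Hit]:
    """Keep peptides inside a length range and at or above a score threshold."""
    return [h for h in hits
            if min_len <= h.length <= max_len and h.score >= min_score]


def residue_heat(protein_length: int, hits: List[Hit]) -> List[float]:
    """Per-residue 'heat': the best score of any peptide covering the residue."""
    heat = [0.0] * protein_length
    for hit in hits:
        for pos in range(hit.start, min(hit.start + hit.length, protein_length)):
            heat[pos] = max(heat[pos], hit.score)
    return heat
```

Taking the maximum highlights any region covered by at least one high-scoring peptide. Averaging the covering scores would be an equally plausible choice and would give smoother, more conservative maps.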
C-PAmP has been developed and is hosted by the Academy of Athens Foundation for Biomedical Research.
For more information about the responsible research group, visit the Biomedical Informatics Group.