Downloadable FireSat algorithm implementations
The Octave/Matlab versions of FireSat detect tandem repeats (TRs) whose repetitive motif length ranges from 10 to 250. Implementations of FireSat 1, 2 and 3 are provided. Note that FireSat3 has been analysed and optimised for accuracy by means of a Recall-Precision analysis, a variant of Receiver Operating Characteristic (ROC) analysis. The provided synthetic data has a TR at every 1000 nucleotides.
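As an illustration of how recall and precision can be scored on synthetic data of this kind, the following minimal Python sketch compares detected TR start positions with TRs planted every 1000 nucleotides. It is not part of the FireSat download. The function name, the assumption that the planted TRs start exactly at multiples of 1000, and the position tolerance `tol` are all hypothetical choices made for the example.

```python
def precision_recall(detected, seq_len, spacing=1000, tol=10):
    """Score detected TR start positions against TRs planted every `spacing` nt.

    Assumptions (not taken from FireSat): planted TRs start at
    spacing, 2*spacing, ... and a detection counts as correct when it
    lies within `tol` nucleotides of a planted start.
    """
    expected = set(range(spacing, seq_len + 1, spacing))
    matched = set()  # planted positions already credited to a detection
    for pos in detected:
        nearest = round(pos / spacing) * spacing  # closest multiple of spacing
        if nearest in expected and abs(pos - nearest) <= tol:
            matched.add(nearest)  # duplicates of one TR are credited once
    true_pos = len(matched)
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(expected) if expected else 0.0
    return precision, recall


# Four detections on a 5000 nt sequence; the one at 2500 is a false positive.
print(precision_recall([1002, 1998, 2500, 4001], 5000))  # (0.75, 0.6)
```

Sweeping a detection threshold and recording one (recall, precision) pair per setting gives the points of a Recall-Precision curve.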
The main file "/phd_cdr_implement/firesat_3/15_fsat/latest/try_do_find_trs_110_default.m" tests 20 datasets covering the motif lengths [10, 25, 50, 100 and 200]. For each motif length there are files with error percentages of [2, 5, 10 and 20]%.
Parameters are currently set to their default values, but they can be tweaked by editing the preamble of this file. Either Matlab or Octave can be used. On a Dell Precision laptop with a 6th-generation i7 processor, the run time for a single dataset is 26 s in Octave and 2.6 s in Matlab.
An alternative main file, prompting the user to select files from the provided synthetic input data sets, can be found in the downloaded zip at: "/phd_cdr_implement/firesat_3/15_fsat/latest/find_trs_user_input.m".
The functional interface to FireSat3 is "/phd_cdr_implement/firesat_3/15_fsat/latest/fsat.m". It can be executed without parameters on a synthetic dataset, and help is provided by typing help fsat.

fsat - functional interface

function [trs, mots] = fsat(sfname, mlen, motif_errperc_max, mis_max_perc, ...
    del_max_perc, ins_max_perc, p_mis, p_del, p_ins, ...
    sigma_max, beta, match_score_tfactor)

Parameters
'sfname' - FASTA format genetic sequence to be searched for TRs.
'mlen' - |rho|, the motif length (PTRE-length), range [10-250].
'motif_errperc_max' - var-epsilon max percentage, the maximum motif error percentage allowed for an ATRE, range [0-50] (default = 20%).
'mis_max_perc, del_max_perc, ins_max_perc' - the maximum error percentage allowed for m, d and i (mismatches, deletions and insertions) respectively. These are optional parameters (default = 20%).
'p_mis, p_del, p_ins' - the penalties for mismatches, deletions and insertions. The default penalties are [1 1 1].
'sigma_max' - the maximum substring error percentage allowed. sigma_max is calculated as follows: '(p_mis*nm + p_del*nd + p_ins*ni)*100/mlen = sigma_max', where nm, nd and ni are the number of mismatches, deletions and insertions respectively. sigma_max is calculated over a TRE. The range of sigma_max is [0-100] (default = 40%).
'beta' - beta-min, the minimum number of TREs that should occur before a TR is valid (default = 2).
'match_score_tfactor' - the threshold factor, range [0,1] (default 1.0), which is multiplied with a motif-length-dependent match score threshold function. The value obtained is compared with the LC-norm (Levenshtein correspondence based match score) to validate a TRE detection.

Outputs
'trs' - an Nx6 array where each row denotes a detected TR: [mlen, motif_errperc_max, tr_pos, tr_len, ntres, mean(LC_norm)]
'mots' - the introductory motifs in a cell-array.
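The sigma_max formula above can be worked through with a small example. The Python sketch below is illustrative only and is not FireSat code. It reads the formula as the penalty-weighted error percentage of a single TRE, which is then compared with sigma_max. The function names and the use of "less than or equal" for the comparison are assumptions.

```python
def substring_error_percent(nm, nd, ni, mlen, p_mis=1, p_del=1, p_ins=1):
    """Weighted substring error percentage of one TRE.

    nm, nd, ni: number of mismatches, deletions and insertions in the TRE.
    mlen: motif length. Penalties default to [1 1 1], as in the help text.
    """
    return (p_mis * nm + p_del * nd + p_ins * ni) * 100 / mlen


def within_sigma_max(nm, nd, ni, mlen, sigma_max=40, **penalties):
    """Assumed acceptance rule: the error percentage may not exceed sigma_max."""
    return substring_error_percent(nm, nd, ni, mlen, **penalties) <= sigma_max


# Motif length 50 with 5 mismatches, 2 deletions and 1 insertion:
print(substring_error_percent(5, 2, 1, 50))  # 16.0
print(within_sigma_max(5, 2, 1, 50))         # True

# The same motif length with 15 mismatches, 4 deletions and 3 insertions:
print(substring_error_percent(15, 4, 3, 50))  # 44.0
print(within_sigma_max(15, 4, 3, 50))         # False
```

With the default penalties, the percentage is simply the number of edit operations relative to the motif length. Raising a penalty such as p_del makes the corresponding error type count more heavily against the limit.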
The advanced command-line version runs on Windows and Linux. Its added functionality includes finding TRs of length 1 to 6. TranscriptInfo can be supplied as a parameter, and an input file can be parsed only partially if so desired. A batch file is included in the zip as an example of usage.
FireSat was tested on 64-bit Octave version 4.2.1 and 64-bit Matlab version 2016b, and runs on Windows 7 or later.
The absolute substring version runs on Windows and was implemented by calculating thresholds based on the absolute substring error as defined in this document.
The relative substring error version runs on Windows and was implemented by calculating thresholds based on the relative substring error. The relative substring error is defined as the substring error here and here.
The FireµSat2 ReadMe file provides instructions for the installation and execution of the FireµSat2-GUI version as well as an explanation of the options to execute the command line version.
The FireµSat2 Input/Output document provides a description of the usage and range for the input parameters of FireµSat2. A discussion of the output format of FireµSat2 is also included (applicable to all versions).
- Results of the verification of a published comparative study on PTR detection
The following table presents the scripts used to verify the results of 3 packages when searching for PTRs on the specified dataset, as published in the article titled "Review of tandem repeat search tools: a systematic approach to evaluating algorithmic performance" in the journal Briefings in Bioinformatics.
| ptrfind.m | none needed | none | none needed | 596 PTRs len>=20 |
| FireµSat2 | runall6err0_no_trinfo_20.bat | article_seq_20.zip | try_firemusat.m | 596 PTRs detected |
| T-Reks on 24 Sept '12 | sim=1.0, overlaps=OFF, indels=0% | t_reksm.zip | try_treks.m | 393 PTRs detected |
| INVERTER on 24 Sept '12 | Min-len=20, Subset=ON | inverter_new.csv | try_inverter.m | 125 PTRs detected |