
Perl Scripts

I have been teaching myself Perl recently and have written a few scripts that I find useful. On the off-chance that someone else might want to use them, I've linked them below.


########################## Scripts: #############################

These scripts are fairly verbose and may contain bugs. They are designed to speed up tasks that I find monotonous and time-consuming, not for general user-friendliness. Having said that, do let me know if you find any bugs. I'll try to remember to post updated versions as I add functionality and fix bugs.

Most scripts rely on a library file of subroutines (library.pm), which needs to be in the same folder as the scripts (I think).
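A note on the "same folder" requirement: since Perl 5.26 the current directory is no longer searched for modules by default, so a script may fail to find library.pm even when the two sit side by side. One common way around this, shown here only as a sketch and not as what the scripts themselves do, is to add the script's own directory to the module search path using the core FindBin module. How library.pm is actually loaded (and whether it declares a package) is an assumption here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# FindBin is a core module; $FindBin::Bin holds the directory of the running script.
use FindBin;

# Add that directory to @INC so files stored beside the script can be found.
use lib $FindBin::Bin;

# Assumption: library.pm sits next to this script. Load it only if it is there,
# so that this sketch still runs on its own.
if (-e "$FindBin::Bin/library.pm") {
    require "library.pm";    # searched for via @INC, which now includes the script folder
    print "Loaded library.pm from $FindBin::Bin\n";
}
else {
    print "library.pm not found in $FindBin::Bin\n";
}
```

With these two `use` lines at the top, a script can be run from any working directory and still pick up a library file kept in its own folder.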

PS. All my new-found Perl expertise comes from James Tisdall, via his excellent book 'Beginning Perl for Bioinformatics' (O'Reilly Media Inc.). As such, elements of my code may look similar in style to those in his book, though I've tried to write from scratch as much as possible.

####################### FASTA File Scripts #########################

### FASTA_DNA.pl ###

Reads in data from a FASTA format sequence file, changes case, removes whitespace and line returns, characterises base composition and writes an outfile with the sequence as a single word, or in lines of user-defined length. Recommended for cleaning up any file prior to use in other scripts.

##############################################################

### FASTA_protein.pl ###

Reads in data from a FASTA format sequence file, changes case, removes whitespace and line returns and writes an outfile with the sequence as a single word, or in lines of user-defined length. Recommended for cleaning up any file prior to use in other scripts.

##############################################################

### Rev_Comp.pl ###

Reads in a DNA sequence FASTA file and produces the reverse complement as a FASTA file.

##############################################################

### Translate_DNA.pl ###

Reads in a DNA sequence FASTA file and translates it in all 6 frames. Creates an outfile of all predicted ORFs, as a multiple FASTA file. The FASTA tags give the contig number, the length of the contig, the start/stop positions and the length of the ORF. The minimum ORF size is set by $cutoff, defined within library.pm.

##############################################################

### RE_digest.pl ###

Given a FASTA format DNA file, searches for restriction enzyme sites in the sequence. Creates an outfile of results. Needs a REBASE format RE database file, available from the NEB website. (Note: remove the file extension from the database file prior to use.)

##############################################################

### DNA_extract.pl ###

Given a FASTA sequence, outputs a substring of the sequence as a FASTA file, starting and stopping at user-defined positions. Loops until terminated.
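Two of the operations described above, reverse complementing a sequence and extracting a substring by position, can be sketched in a few lines of Perl. This is only a minimal illustration, not the code from Rev_Comp.pl or DNA_extract.pl: the subroutine names `reverse_complement` and `extract_region` are made up for this example, and the 1-based, inclusive coordinate convention is an assumption.

```perl
use strict;
use warnings;

# Reverse complement of a DNA string. Upper- and lower-case A, C, G and T are
# complemented; any other character (e.g. N) is left unchanged.
sub reverse_complement {
    my ($dna) = @_;
    my $revcomp = scalar reverse $dna;     # reverse the string
    $revcomp =~ tr/ACGTacgt/TGCAtgca/;     # then complement each base
    return $revcomp;
}

# Extract positions $start..$stop from a sequence.
# Assumption: coordinates are 1-based and inclusive.
# Returns undef (in scalar context) if the coordinates fall outside the sequence.
sub extract_region {
    my ($seq, $start, $stop) = @_;
    return if $start < 1 || $stop < $start || $stop > length($seq);
    return substr($seq, $start - 1, $stop - $start + 1);
}

my $dna = 'ATGCCGTTAGC';
print reverse_complement($dna), "\n";      # prints GCTAACGGCAT
print extract_region($dna, 4, 8), "\n";    # prints CCGTT
```

Using `tr///` for the complement avoids a chain of substitutions, which would otherwise convert A to T and then T straight back to A.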
##############################################################

### DNA_extract_with_flanking.pl ###

Given a FASTA sequence, outputs a substring of the sequence as a FASTA file, starting and stopping at user-defined positions. Loops until terminated. Returns 3 kbp of flanking sequence on either side if possible. (This can be altered easily within the script.)

##################### Multiple FASTA File Scripts ######################

### multiFASTA.pl ###

Given a multiple FASTA format file, spits out a FASTA file containing only one of the contigs. Does this repeatedly until terminated by the user.

##############################################################

### Translate_multiFASTA.pl ###

Takes in a multiple FASTA file and formats it as a hash. Then translates all sequences (cutoff defined within library.pm) and prints to OUTFILE.

########################## Genbank Scripts ########################

### Genbank_format.pl ###

Given a GenBank DNA file, extracts the DNA sequence, formats it, and returns it as a FASTA file.

########################### Other Scripts #########################

### Random_Sequence_Generator.pl ###

Given a user-defined GC content and desired length, outputs a random sequence as a FASTA file.

##############################################################
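As an illustration of the last entry, here is a minimal sketch of generating a random sequence with a chosen GC content and writing it out in FASTA format. It is not the code from Random_Sequence_Generator.pl: the subroutine names `random_dna` and `to_fasta`, the header text and the default line width of 60 are all assumptions made for the example. The GC content is treated as a per-base probability, so the realised GC fraction of any one sequence will vary around the requested value rather than match it exactly.

```perl
use strict;
use warnings;

# Build a random DNA string of $length bases. Each base is G or C with
# probability $gc (a fraction between 0 and 1), otherwise A or T.
sub random_dna {
    my ($length, $gc) = @_;
    my $seq = '';
    for (1 .. $length) {
        my @pool = rand() < $gc ? ('G', 'C') : ('A', 'T');
        $seq .= $pool[ int rand(@pool) ];    # pick one of the two bases at random
    }
    return $seq;
}

# Format a sequence as a FASTA record, wrapped to $width bases per line.
sub to_fasta {
    my ($header, $seq, $width) = @_;
    $width ||= 60;                           # assumed default line width
    my $fasta = ">$header\n";
    for (my $pos = 0; $pos < length($seq); $pos += $width) {
        $fasta .= substr($seq, $pos, $width) . "\n";
    }
    return $fasta;
}

# Example: 120 bases at roughly 60% GC, printed as FASTA.
print to_fasta('random_sequence length=120 gc=0.60', random_dna(120, 0.6), 60);
```

If an exact GC count is needed instead, one alternative is to build a list with the required number of G/C and A/T bases and shuffle it, for example with `shuffle` from the core List::Util module.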