In the previous article, I introduced Bionitio and provided a simplified template. Here, we build upon the template by adding two new features: logging and error catching.
Logging
The below logging function does a few things:
Automatically starts logging when the programme is run
Adds time and date to every log entry
The last line adds the user-supplied command-line arguments to the log
import logging
import sys
def init_logging(file_location):
'''If the log_filename is defined, then
initialise the logging facility, and write log statement
indicating the program has started, and also write out the
command line from sys.argv
Arguments:
log_filename: either None, if logging is not required, or the
string name of the log file to write to
Result:
None
'''
logging.basicConfig(filename=file_location + '.log.txt',
level=logging.INFO,
format='%(asctime)s %(levelname)s - %(message)s',
datefmt="%Y-%m-%dT%H:%M:%S%z",
force = True)
logging.info('program started')
logging.info('command line: %s', ' '.join(sys.argv))
If you want to add a log entry later in the script, simply use logging.info('your_message'). For example, I add a log for every sequence I reverse complement below (see bold). I used an f-string so that every log entry will carry the unique ID of the record.
def reverse_complement_all(fasta):
# Reads fasta file
fasta_records = SeqIO.parse(fasta, "fasta")
# Create empty list to store reverse complemented seqs
reverse_complement_list = []
# Loop through fasta_records and
# add reverse complement seq to empty list
for record in fasta_records:
forward_seq = Seq(str(record.seq))
reverse_complement_list += [ SeqRecord(forward_seq.reverse_complement(),
id = record.id,
description = '') ]
logging.info(f"{record.id} is reverse complemented.")
return reverse_complement_list
Let's run our programme and see what happens:
$ python3 rcomper.py --out test.fasta ./input/test_multiple.fasta
An output log file is produced, which shows everything works. Nice.
2022-01-29T14:47:05+0800 INFO - program started
2022-01-29T14:47:05+0800 INFO - command line: rcomper.py --out test.fasta ./input/test_multiple.fasta
2022-01-29T14:47:05+0800 INFO - 1 is reverse complemented.
2022-01-29T14:47:05+0800 INFO - 2 is reverse complemented.
2022-01-29T14:47:05+0800 INFO - 3 is reverse complemented.
Catching errors and exiting
If a critical error occurs, we must stop the programme and return a non-zero integer. The reason is explained by gnu.org:
For the shell’s purposes, a command which exits with a zero exit status has succeeded. A non-zero exit status indicates failure. This seemingly counter-intuitive scheme is used so there is one well-defined way to indicate success and a variety of ways to indicate various failure modes.
Let's start with understanding Bionitio's error exit function. It does the following:
If called, error logging is performed
The terminal will receive your error message
System exit stops the programme from running
def exit_with_error(message, exit_status):
'''Print an error message to stderr, prefixed by the program name and 'ERROR'.
Then exit program with supplied exit status.
Arguments:
message: an error message as a string.
exit_status: a positive integer representing the exit status of the
program.
'''
logging.error(message)
print("{} ERROR: {}, exiting".format(PROGRAM_NAME, message), file=sys.stderr)
sys.exit(exit_status)
Here is an example from our script showing how the error exit function works. We first supply the programme with a non-existing fasta file and see what error is called.
$ python3 rcomper.py --out test.fasta ./input/nonexisting.fasta
Traceback (most recent call last):
File "<ipython-input-5-b880fb73ba91>", line 1, in <module>
fasta_records = SeqIO.parse(fasta, "fasta")
File "E:\anaconda\lib\site-packages\Bio\SeqIO\__init__.py", line 605, in parse
return iterator_generator(handle)
File "E:\anaconda\lib\site-packages\Bio\SeqIO\FastaIO.py", line 183, in __init__
super().__init__(source, mode="t", fmt="Fasta")
File "E:\anaconda\lib\site-packages\Bio\SeqIO\Interfaces.py", line 48, in __init__
self.stream = open(source, "r" + mode)
FileNotFoundError: [Errno 2] No such file or directory: 'out.fasta'
Now we include the Try-Except block of codes. This means we try to read the fasta file and if it is non-existent, then we catch the FileNotFoundError. Then, the exit_with_error function comes into play where we update the log and return a specific non-zero value for this error (in this case the value of 1).
FASTA_FILE_NOT_FOUND = 1
def reverse_complement_all(fasta):
# Reads fasta file
try:
fasta_records = SeqIO.parse(fasta, "fasta")
except FileNotFoundError:
exit_with_error("Input fasta file not found. " +
"Please check directory and name of fasta file.",
FASTA_FILE_NOT_FOUND)
# Create empty list to store reverse complemented seqs
reverse_complement_list = []
# Loop through fasta_records and
# add reverse complement seq to empty list
for record in fasta_records:
forward_seq = Seq(str(record.seq))
reverse_complement_list += [ SeqRecord(forward_seq.reverse_complement(),
id = record.id,
description = '') ]
logging.info(f"{record.id} is reverse complemented.")
return reverse_complement_list
In our script, write_out() is the last computation we run. We only call write_out() when the programme is run smoothly and correctly to the end, so we return zero to signify everything's working well.
def write_out(reverse_comp_records, file_location):
'''Write reverse complement fasta file to user-specified location'''
SeqIO.write(reverse_comp_records, file_location, "fasta")
return 0
Let's try the new error exit feature:
$ python3 rcomper.py --out test.fasta ./input/nonexisting.fasta
rcomper ERROR: Fasta file not found. Please check directory and name of fasta input., exiting
Summary
With logging and error catching and exiting implemented, we have a more complete script. We could now rearrange everything for it to look neater and we are one step closer to the ideal:
'''
Module : Main
Description : The main entry point for the program.
Copyright : (c) WEIYUAN, 23JAN2021
License : MIT_LICENSE
Maintainer : BIO.WEIYUAN@GMAIL.COM
Portability : POSIX
The program accepts a list of DNA sequences in FASTA format and
returns a list of their reverse complement in FASTA format
'''
from argparse import ArgumentParser
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import logging
import sys
PROGRAM_NAME = "rcomper"
PROGRAM_VERSION = 1.00
FASTA_FILE_NOT_FOUND = 1
def init_logging(file_location):
'''If the log_filename is defined, then
initialise the logging facility, and write log statement
indicating the program has started, and also write out the
command line from sys.argv
Arguments:
log_filename: either None, if logging is not required, or the
string name of the log file to write to
Result:
None
'''
logging.basicConfig(filename=file_location + '.log.txt',
level=logging.INFO,
format='%(asctime)s %(levelname)s - %(message)s',
datefmt="%Y-%m-%dT%H:%M:%S%z",
force = True)
logging.info('program started')
logging.info('command line: %s', ' '.join(sys.argv))
def exit_with_error(message, exit_status):
'''Print an error message to stderr, prefixed by the program name and 'ERROR'.
Then exit program with supplied exit status.
Arguments:
message: an error message as a string.
exit_status: a positive integer representing the exit status of the
program.
'''
logging.error(message)
print("{} ERROR: {}, exiting".format(PROGRAM_NAME, message), file=sys.stderr)
sys.exit(exit_status)
def parse_args():
'''Parse command line arguments.
Returns Options object with command line argument values as attributes.
Will exit the program on a command line error.
'''
description = 'Read a list of fasta sequences and returns their reverse complement'
parser = ArgumentParser(description=description)
parser.add_argument('--out',
type=str,
default = './out.fasta',
help='Output file location and file name')
parser.add_argument('fasta_file',
type=str,
help='Input FASTA files')
return parser.parse_args()
def reverse_complement_all(fasta):
# Reads fasta file
try:
fasta_records = SeqIO.parse(fasta, "fasta")
except FileNotFoundError:
exit_with_error("Input fasta file not found. " +
"Please check directory and name of fasta file.",
FASTA_FILE_NOT_FOUND)
# Create empty list to store reverse complemented seqs
reverse_complement_list = []
# Loop through fasta_records and
# add reverse complement seq to empty list
for record in fasta_records:
forward_seq = Seq(str(record.seq))
reverse_complement_list += [ SeqRecord(forward_seq.reverse_complement(),
id = record.id,
description = '') ]
logging.info(f"{record.id} is reverse complemented.")
return reverse_complement_list
def write_out(reverse_comp_records, file_location):
'''Write reverse complement fasta file to user-specified location'''
SeqIO.write(reverse_comp_records, file_location, "fasta")
return 0
def main():
"Orchestrate the execution of the program"
# Get command-line input
args = parse_args()
fasta = args.fasta_file
file_location = args.out
# Log
init_logging(file_location)
# Run function
reverse_comp_records = reverse_complement_all(fasta)
write_out(reverse_comp_records, file_location)
# If this script is run from the command line then call the main function.
if __name__ == '__main__':
main()
In the next few articles, I will cover what's critically missing: automated testing, smooth installation of the software and also the writing of a Docker image.
Комментарии