Bionitio | Logging and errors

Updated: Feb 1



In the previous article, I introduced Bionitio and provided a simplified template. Here, we build upon the template by adding two new features: logging and error catching.


Logging

The below logging function does a few things:

  1. Automatically starts logging when the programme is run

  2. Adds time and date to every log entry

  3. The last line adds the user-supplied command-line arguments to the log

import logging
import sys

def init_logging(file_location):
    '''If the log_filename is defined, then
    initialise the logging facility, and write log statement
    indicating the program has started, and also write out the
    command line from sys.argv
    Arguments:
        log_filename: either None, if logging is not required, or the
            string name of the log file to write to
    Result:
        None
    '''
    logging.basicConfig(filename=file_location + '.log.txt',
                        level=logging.INFO,
                        format='%(asctime)s %(levelname)s - %(message)s',
                        datefmt="%Y-%m-%dT%H:%M:%S%z",
                        force = True)
    logging.info('program started')
    logging.info('command line: %s', ' '.join(sys.argv))


If you want to add a log entry later in the script, simply use logging.info('your_message'). For example, I add a log for every sequence I reverse complement below (see bold). I used an f-string so that every log entry will carry the unique ID of the record.

def reverse_complement_all(fasta):
    # Reads fasta file
    fasta_records = SeqIO.parse(fasta, "fasta")
    # Create empty list to store reverse complemented seqs
    reverse_complement_list = [] 
    # Loop through fasta_records and 
    # add reverse complement seq to empty list
    for record in fasta_records:
        forward_seq = Seq(str(record.seq))
        reverse_complement_list += [ SeqRecord(forward_seq.reverse_complement(),
                                    id = record.id,
                                    description = '') ]
        logging.info(f"{record.id} is reverse complemented.")
    return reverse_complement_list

Let's run our programme and see what happens:

$ python3 rcomper.py --out test.fasta ./input/test_multiple.fasta

An output log file is produced, which shows everything works. Nice.

2022-01-29T14:47:05+0800 INFO - program started
2022-01-29T14:47:05+0800 INFO - command line: rcomper.py --out test.fasta ./input/test_multiple.fasta
2022-01-29T14:47:05+0800 INFO - 1 is reverse complemented.
2022-01-29T14:47:05+0800 INFO - 2 is reverse complemented.
2022-01-29T14:47:05+0800 INFO - 3 is reverse complemented.


Catching errors and exiting

If a critical error occurs, we must stop the programme and return a non-zero integer. The reason is explained by gnu.org:

For the shell’s purposes, a command which exits with a zero exit status has succeeded. A non-zero exit status indicates failure. This seemingly counter-intuitive scheme is used so there is one well-defined way to indicate success and a variety of ways to indicate various failure modes.

Let's start with understanding Bionitio's error exit function. It does the following:

  1. If called, error logging is performed

  2. The terminal will receive your error message

  3. System exit stops the programme from running

def exit_with_error(message, exit_status):
    '''Print an error message to stderr, prefixed by the program name and 'ERROR'.
    Then exit program with supplied exit status.
    Arguments:
        message: an error message as a string.
        exit_status: a positive integer representing the exit status of the
            program.
    '''
    logging.error(message)
    print("{} ERROR: {}, exiting".format(PROGRAM_NAME, message), file=sys.stderr)
    sys.exit(exit_status)

Here is an example from our script showing how the error exit function works. We first supply the programme with a non-existing fasta file and see what error is called.

$ python3 rcomper.py --out test.fasta ./input/nonexisting.fasta

Traceback (most recent call last):

  File "<ipython-input-5-b880fb73ba91>", line 1, in <module>
    fasta_records = SeqIO.parse(fasta, "fasta")

  File "E:\anaconda\lib\site-packages\Bio\SeqIO\__init__.py", line 605, in parse
    return iterator_generator(handle)

  File "E:\anaconda\lib\site-packages\Bio\SeqIO\FastaIO.py", line 183, in __init__
    super().__init__(source, mode="t", fmt="Fasta")

  File "E:\anaconda\lib\site-packages\Bio\SeqIO\Interfaces.py", line 48, in __init__
    self.stream = open(source, "r" + mode)

FileNotFoundError: [Errno 2] No such file or directory: 'out.fasta'

Now we include the Try-Except block of codes. This means we try to read the fasta file and if it is non-existent, then we catch the FileNotFoundError. Then, the exit_with_error function comes into play where we update the log and return a specific non-zero value for this error (in this case the value of 1).

FASTA_FILE_NOT_FOUND = 1

def reverse_complement_all(fasta):
    # Reads fasta file
    try:
        fasta_records = SeqIO.parse(fasta, "fasta")
    except FileNotFoundError:
        exit_with_error("Input fasta file not found. " + 
                        "Please check directory and name of fasta file.", 
                        FASTA_FILE_NOT_FOUND)
    # Create empty list to store reverse complemented seqs
    reverse_complement_list = [] 
    # Loop through fasta_records and 
    # add reverse complement seq to empty list
    for record in fasta_records:
        forward_seq = Seq(str(record.seq))
        reverse_complement_list += [ SeqRecord(forward_seq.reverse_complement(),
                                    id = record.id,
                                    description = '') ]
        logging.info(f"{record.id} is reverse complemented.")
    return reverse_complement_list

In our script, write_out() is the last computation we run. We only call write_out() when the programme is run smoothly and correctly to the end, so we return zero to signify everything's working well.

def write_out(reverse_comp_records, file_location):
    '''Write reverse complement fasta file to user-specified location'''
    SeqIO.write(reverse_comp_records, file_location, "fasta")
    return 0

Let's try the new error exit feature:

$ python3 rcomper.py --out test.fasta ./input/nonexisting.fasta
rcomper ERROR: Fasta file not found. Please check directory and name of fasta input., exiting

Summary

With logging and error catching and exiting implemented, we have a more complete script. We could now rearrange everything for it to look neater and we are one step closer to the ideal:

'''
Module      : Main
Description : The main entry point for the program.
Copyright   : (c) WEIYUAN, 23JAN2021 
License     : MIT_LICENSE 
Maintainer  : BIO.WEIYUAN@GMAIL.COM 
Portability : POSIX

The program accepts a list of DNA sequences in FASTA format and
returns a list of their reverse complement in FASTA format
'''

from argparse import ArgumentParser
from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
import logging
import sys

PROGRAM_NAME = "rcomper"
PROGRAM_VERSION = 1.00
FASTA_FILE_NOT_FOUND = 1

def init_logging(file_location):
    '''If the log_filename is defined, then
    initialise the logging facility, and write log statement
    indicating the program has started, and also write out the
    command line from sys.argv
    Arguments:
        log_filename: either None, if logging is not required, or the
            string name of the log file to write to
    Result:
        None
    '''
    logging.basicConfig(filename=file_location + '.log.txt',
                        level=logging.INFO,
                        format='%(asctime)s %(levelname)s - %(message)s',
                        datefmt="%Y-%m-%dT%H:%M:%S%z",
                        force = True)
    logging.info('program started')
    logging.info('command line: %s', ' '.join(sys.argv))
    
def exit_with_error(message, exit_status):
    '''Print an error message to stderr, prefixed by the program name and 'ERROR'.
    Then exit program with supplied exit status.
    Arguments:
        message: an error message as a string.
        exit_status: a positive integer representing the exit status of the
            program.
    '''
    logging.error(message)
    print("{} ERROR: {}, exiting".format(PROGRAM_NAME, message), file=sys.stderr)
    sys.exit(exit_status)
    
def parse_args():
    '''Parse command line arguments.
    Returns Options object with command line argument values as attributes.
    Will exit the program on a command line error.
    '''
    description = 'Read a list of fasta sequences and returns their reverse complement'
    parser = ArgumentParser(description=description)
    parser.add_argument('--out',
                        type=str,
                        default = './out.fasta',
                        help='Output file location and file name')
    parser.add_argument('fasta_file',
                        type=str,
                        help='Input FASTA files')
    return parser.parse_args()

def reverse_complement_all(fasta):
    # Reads fasta file
    try:
        fasta_records = SeqIO.parse(fasta, "fasta")
    except FileNotFoundError:
        exit_with_error("Input fasta file not found. " + 
                        "Please check directory and name of fasta file.", 
                        FASTA_FILE_NOT_FOUND)
    # Create empty list to store reverse complemented seqs
    reverse_complement_list = [] 
    # Loop through fasta_records and 
    # add reverse complement seq to empty list
    for record in fasta_records:
        forward_seq = Seq(str(record.seq))
        reverse_complement_list += [ SeqRecord(forward_seq.reverse_complement(),
                                    id = record.id,
                                    description = '') ]
        logging.info(f"{record.id} is reverse complemented.")
    return reverse_complement_list

def write_out(reverse_comp_records, file_location):
    '''Write reverse complement fasta file to user-specified location'''
    SeqIO.write(reverse_comp_records, file_location, "fasta")
    return 0

def main():
    "Orchestrate the execution of the program"
    
    # Get command-line input
    args = parse_args()
    fasta = args.fasta_file
    file_location = args.out
    
    # Log
    init_logging(file_location)
    
    # Run function
    reverse_comp_records = reverse_complement_all(fasta)
    write_out(reverse_comp_records, file_location)
    
# If this script is run from the command line then call the main function.
if __name__ == '__main__':
    main()

In the next few articles, I will cover what's critically missing: automated testing, smooth installation of the software and also the writing of a Docker image.

10 views0 comments

Recent Posts

See All