Python classes - Inheritance

In the previous post we saw how to create Python classes and define methods in them. We created a DNA class representing a DNA sequence.

However, we can treat a DNA sequence as a string: a special kind of string that consists of only four letters, namely ‘A’, ‘T’, ‘G’ and ‘C’, representing the nucleotides adenine, thymine, guanine and cytosine, respectively.

The DNA sequence should not contain any other characters. For ease of use we will accept both lowercase and uppercase letters, which are converted to uppercase inside the class definition.

Here we will create a class that inherits properties from the built-in str class.

class subclass(parent_class):
    # class definition

To do so we just have to put the parent class in parentheses while defining our class. In the same way, a subclass can itself serve as the parent of another subclass, to any depth.

class subclass(parent_class):
    # class definition
    
class subclass_2(subclass):
    # class definition
    
class subclass_3(subclass_2):
    # class definition
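
As a small runnable sketch of such a chain, the class names below (Sequence, NucleicAcid, RNA) are hypothetical and chosen only for illustration. The example checks that an instance of the last class is still recognised as a str, and prints the order in which Python looks up methods.

```python
# Hypothetical class names, used only to illustrate a chain of inheritance.
class Sequence(str):
    """A generic sequence; inherits everything from str."""
    pass


class NucleicAcid(Sequence):
    """Inherits from Sequence, and therefore also from str."""
    pass


class RNA(NucleicAcid):
    """Inherits from NucleicAcid, Sequence and str."""
    pass


rna = RNA('augc')

# Every level of the chain is recognised by isinstance().
print(isinstance(rna, NucleicAcid))  # True
print(isinstance(rna, str))          # True

# The method resolution order lists the classes Python searches, in order.
print([cls.__name__ for cls in RNA.__mro__])
# ['RNA', 'NucleicAcid', 'Sequence', 'str', 'object']
```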

For the purpose of this write-up, we will make only one subclass, DNA, which inherits from the class str.

class DNA(str):
    
    def __init__(self, seq):
        self.seq = seq.upper()
        

We will then create an instance of DNA.

seq1 = DNA('atgcttaacggcattggcat')

To see what methods it has inherited from the parent class, we will use the dir() function.

print(dir(seq1))
['__add__', '__class__', '__contains__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__module__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'removeprefix', 'removesuffix', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'seq', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']

We can see that, even though we have not defined any methods in the DNA class apart from __init__, a lot of methods are available for use with this newly created class.

All these methods are inherited from the parent class str.

This is called class inheritance, and it is a very powerful technique when building our own software with custom classes. Code for the methods of parent classes can be reused in child classes without being written again, which reduces redundancy in the code.
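
To make this concrete, here is a minimal sketch that calls a few of the inherited str methods on seq1. It repeats the DNA class from above so that it runs on its own. One thing worth noticing is that the string value of the instance keeps the case it was created with; only the seq attribute holds the uppercase copy.

```python
class DNA(str):

    def __init__(self, seq):
        self.seq = seq.upper()


seq1 = DNA('atgcttaacggcattggcat')

# Methods inherited from str work directly on the instance.
print(len(seq1))                # 20
print(seq1.count('g'))          # 5
print(seq1.startswith('atg'))   # True

# The string value itself is unchanged; the uppercase copy lives in seq.
print(seq1)      # atgcttaacggcattggcat
print(seq1.seq)  # ATGCTTAACGGCATTGGCAT
```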

Validity of DNA class

An instance of DNA can be created using the above code, but the sequence can still contain any character that is allowed in a string.

seq2 = DNA('fghr$@fhakf_fmg')
seq2.seq
'FGHR$@FHAKF_FMG'

But a DNA sequence can contain only four characters: A, T, G and C. Validation of the DNA sequence can be implemented as below.

class DNA(str):
    
    def __init__(self, seq):
        self.seq = seq.upper()
        
        self.check_validity()
    
    def check_validity(self):
        if self.seq.count('A') + self.seq.count('T') + self.seq.count('G') + self.seq.count('C') != len(self.seq):
            raise Exception('The DNA sequence is not valid')

See how it works with the same sequence, seq2.

seq2 = DNA('fghr$@fhakf_fmg')
---------------------------------------------------------------------------

Exception                                 Traceback (most recent call last)

Cell In[26], line 1
----> 1 seq2 = DNA('fghr$@fhakf_fmg')


Cell In[25], line 6, in DNA.__init__(self, seq)
      3 def __init__(self, seq):
      4     self.seq = seq.upper()
----> 6     self.check_validity()


Cell In[25], line 10, in DNA.check_validity(self)
      8 def check_validity(self):
      9     if self.seq.count('A') + self.seq.count('T') + self.seq.count('G') + self.seq.count('C') != len(self.seq):
---> 10         raise Exception('The DNA sequence is not valid')


Exception: The DNA sequence is not valid

The check_validity method raises an exception if the sequence contains any character other than A, T, G or C.

Similarly, we can write other relevant methods for the DNA class. A few that come to mind would serve the following purposes:
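
The same check could also be written with a set difference, which has the side benefit of telling us which characters were unexpected. This is only an alternative sketch, not the version used above: the VALID_BASES attribute is a name I am assuming here, and I have swapped Exception for the more specific built-in ValueError.

```python
class DNA(str):

    # Assumed class attribute listing the allowed nucleotide letters.
    VALID_BASES = frozenset('ATGC')

    def __init__(self, seq):
        self.seq = seq.upper()
        self.check_validity()

    def check_validity(self):
        # Characters present in the sequence but not in the allowed set.
        invalid = set(self.seq) - self.VALID_BASES
        if invalid:
            raise ValueError(
                'The DNA sequence is not valid; unexpected characters: '
                + ', '.join(sorted(invalid))
            )


seq1 = DNA('atgcttaacggcattggcat')  # passes validation
print(seq1.seq)  # ATGCTTAACGGCATTGGCAT

try:
    DNA('atgxcn')
except ValueError as err:
    message = str(err)
    print(message)
# The DNA sequence is not valid; unexpected characters: N, X
```

Since ValueError is a subclass of Exception, any code that catches Exception would still catch it.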

  • GC content
  • Complement
  • Reverse complement
  • Transcribed RNA sequence
  • Translated sequence
  • Recognition sites and their counts
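
As a rough sketch of how the first three items in the list might look, the method names gc_content, complement and reverse_complement below are my own assumptions. Validation is left out to keep the example short, and the methods return plain strings rather than new DNA instances.

```python
class DNA(str):

    def __init__(self, seq):
        self.seq = seq.upper()

    def gc_content(self):
        """Return the fraction of G and C bases (0.0 for an empty sequence)."""
        if not self.seq:
            return 0.0
        return (self.seq.count('G') + self.seq.count('C')) / len(self.seq)

    def complement(self):
        """Return the complementary strand as a plain string."""
        table = str.maketrans('ATGC', 'TACG')
        return self.seq.translate(table)

    def reverse_complement(self):
        """Return the complement read in the opposite direction."""
        return self.complement()[::-1]


seq1 = DNA('atgcttaacggcattggcat')
print(seq1.gc_content())          # 0.45
print(seq1.complement())          # TACGAATTGCCGTAACCGTA
print(seq1.reverse_complement())  # ATGCCAATGCCGTTAAGCAT
```

Note that complement uses str.maketrans and str.translate, both of which come from the built-in str class.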
