Python classes: introduction

A class is a user-defined type of object. Python has several built-in types, such as int, float and str, and programmers can define additional types that suit their own programs.

We have already seen some of Python's built-in objects. For example:

a = 3
print(type(a))
<class 'int'>

Here, a is an int type of object.

 

Similarly,

b = 'letter'
print(type(b))
<class 'str'>

Here, b is a str or string type of object.

d = {
    'a' : [1,2,3],
    'b' : 'numbers'
}
print(type(d))
<class 'dict'>

Here, d is a dict or dictionary object.

Python objects have their own attributes and methods attached to them. We can list their names with the dir() function.

# print the names of the attributes and methods of the dictionary object d
print(dir(d))
['__class__', '__class_getitem__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__ior__', '__iter__', '__le__', '__len__', '__lt__', '__ne__', '__new__', '__or__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__ror__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'copy', 'fromkeys', 'get', 'items', 'keys', 'pop', 'popitem', 'setdefault', 'update', 'values']

Above, we can see the methods associated with the dictionary d. Some of them have a double underscore (__) on either side of the name. These are special methods, which are normally not called directly; Python calls them behind the scenes when we use operators or built-in functions on the object.
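
As a small illustration of how special methods are triggered indirectly, the sketch below puts a few everyday expressions next to the special method each one relies on. It uses the same dictionary d as above; calling the special methods by hand is done here only for comparison.

```python
d = {
    'a': [1, 2, 3],
    'b': 'numbers'
}

# len(d) asks the dictionary for its length through d.__len__()
print(len(d), d.__len__())

# the 'in' operator relies on d.__contains__()
print('a' in d, d.__contains__('a'))

# square-bracket lookup relies on d.__getitem__()
print(d['b'], d.__getitem__('b'))
```

Each line prints the same value twice, because both forms end up running the same special method.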

The methods without double underscores are meant to be called directly. For example, the values() method returns a view of the values in the dictionary.

d.values()
dict_values([[1, 2, 3], 'numbers'])
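
The dict_values object is a view onto the dictionary, not a separate copy. A minimal sketch of what that means, where the key 'c' and its value are made up for illustration:

```python
d = {
    'a': [1, 2, 3],
    'b': 'numbers'
}

vals = d.values()        # a dict_values view, not a list
d['c'] = 'new entry'     # change the dictionary after taking the view

# the view reflects the change; list() turns it into a regular list
print(list(vals))
```

If an independent snapshot is needed, list(d.values()) can be taken before the dictionary is changed.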

User-defined classes can have several advantages when coding our own Python project.

  • Methods and attributes keep behaviour and data together with the object they belong to
  • The type of an object helps to distinguish between objects that otherwise appear similar
  • Methods can be inherited by subclasses

Creating a python class

Let’s go with an example where we want to create a class for DNA sequences. A DNA sequence is essentially a sequence of nucleotides, and several properties of the DNA can be inferred from the sequence alone.

Let’s first define a basic DNA class and initialize it with a sequence using the __init__ method.
It can be done as follows:

class DNA:

    def __init__(self, seq):
        # store the sequence on the instance, converted to upper case
        self.seq = seq.upper()

The code above is the class definition.
We can now create objects of type DNA as follows.

seq1 = DNA('atgccggta')

To be clear, we will print the type of the object.

print(type(seq1))
<class '__main__.DNA'>

So, seq1 is an object of type DNA.
seq1 is called an instance of the class DNA.
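
Each call to DNA(...) creates a separate instance with its own seq. A minimal sketch, where seq2 and its sequence are made up for illustration:

```python
class DNA:

    def __init__(self, seq):
        self.seq = seq.upper()


seq1 = DNA('atgccggta')
seq2 = DNA('ttgaca')      # hypothetical second sequence

# both objects are instances of the same class
print(isinstance(seq1, DNA), isinstance(seq2, DNA))

# but they are distinct objects, each holding its own sequence
print(seq1 is seq2)
print(seq1.seq, seq2.seq)
```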

Let’s see which attributes and methods are associated with it.

print(dir(seq1))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'seq']

So, apart from the special methods and attributes, only seq is available for the DNA object. Note that seq is a data attribute that stores the sequence, not a method.
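
To make the role of seq concrete, the sketch below reads it directly and uses the built-in vars() to show the data stored on the instance.

```python
class DNA:

    def __init__(self, seq):
        self.seq = seq.upper()


seq1 = DNA('atgccggta')

# seq is read without parentheses because it is plain data, not a method
print(seq1.seq)

# vars() returns the instance's __dict__, i.e. its stored attributes
print(vars(seq1))

# a string attribute cannot be called like a method
print(callable(seq1.seq))
```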

Creating methods of a class

Let’s create a method to get the length of the sequence.

class DNA:
    
    def __init__(self, seq):
        self.seq = seq.upper()
        
    def length(self):
        return len(self.seq)

We can now get the length of a DNA object using the length() method.

seq1 = DNA('atgccggta')
seq1.length()
9
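
Note that length() is defined with a self parameter but called without any argument. A short sketch of why this works: Python passes the instance as self automatically, so the two calls below are equivalent.

```python
class DNA:

    def __init__(self, seq):
        self.seq = seq.upper()

    def length(self):
        return len(self.seq)


seq1 = DNA('atgccggta')

# the usual way: the instance before the dot becomes self
print(seq1.length())

# the explicit way: call the function on the class and pass the instance
print(DNA.length(seq1))
```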

But we also want to be able to use the built-in Python function len() to get the length of the sequence.
As of now, this functionality is not available for our DNA class.

seq1 = DNA('atgccggta')
len(seq1)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Cell In[41], line 2
      1 seq1 = DNA('atgccggta')
----> 2 len(seq1)


TypeError: object of type 'DNA' has no len()

Calling len() on a DNA object raises a TypeError.
We can resolve this by adding the special method __len__ to the class definition.

class DNA:
    
    def __init__(self, seq):
        self.seq = seq.upper()
        
    def __len__(self):
        return len(self.seq)
        
    def length(self):
        return len(self.seq)

Now, we can use the len() function on the DNA object instance.

seq1 = DNA('atgccggta')
len(seq1)
9
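
A side effect worth knowing: when a class defines __len__ but not __bool__, Python also uses __len__ to decide whether an object counts as true or false. A minimal sketch, where the empty sequence is made up for illustration:

```python
class DNA:

    def __init__(self, seq):
        self.seq = seq.upper()

    def __len__(self):
        return len(self.seq)


seq1 = DNA('atgccggta')
empty = DNA('')           # hypothetical empty sequence

# an object whose __len__ returns 0 is treated as False
print(bool(seq1))
print(bool(empty))

if not empty:
    print('empty sequence')
```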

Similarly, we can control what print() shows for a DNA object by adding a __str__ method.
First, let’s see what print() shows for seq1 without it.

seq1 = DNA('atgccggta')
print(seq1)
<__main__.DNA object at 0x000001DAED5A9250>

By default, it prints the object type and the memory address of the instance.

Let’s try this again after adding the __str__ special method.

class DNA:
    
    def __init__(self, seq):
        self.seq = seq.upper()
        
    def __len__(self):
        return len(self.seq)
    
    def __str__(self):
        return self.seq
        
    def length(self):
        return len(self.seq)

seq1 = DNA('atgccggta')
print(seq1)
ATGCCGGTA

Now we can see that the print() function prints the sequence of the DNA object.
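
__str__ has a companion, __repr__, which Python uses when an object is shown inside a container or echoed in an interactive session. The sketch below adds one; the constructor-like format it returns is an assumption chosen for illustration, not something the class requires.

```python
class DNA:

    def __init__(self, seq):
        self.seq = seq.upper()

    def __str__(self):
        return self.seq

    def __repr__(self):
        # assumed format: mimic the call that would recreate the object
        return f"DNA('{self.seq}')"


seq1 = DNA('atgccggta')

print(seq1)      # print() uses __str__
print([seq1])    # items inside a list are shown with __repr__
```

Without __repr__, the list would show the default form with the object type and memory address.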

Using dir(), we can see the new methods available for the DNA object.

print(dir(seq1))
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__len__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'length', 'seq']

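We see that length and __len__ have been added to the list. __str__ was already listed before, because every class inherits a default version of it; our definition now overrides that default.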

This was the simplest tutorial I could imagine on how to create classes in Python.

We created a class called DNA, which could represent a DNA sequence in the real world. However, we created it as a standalone class that does not explicitly inherit from any other class.

But, in reality, a DNA sequence is a type of nucleotide sequence, and all nucleotide sequences can be considered special types of strings. We can mirror this relationship using something called class inheritance.

In this case: 

  • str (string) would be the parent class
  • SEQ could be a subclass of str
  • DNA could be a subclass of SEQ

But we will cover this in the next post.
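
In the meantime, here is a rough sketch of what such a hierarchy might look like. It is only one possible layout under the assumptions listed above; in particular, doing the upper-case conversion in __new__ is a choice made here because str objects are immutable and cannot be changed in __init__.

```python
class SEQ(str):
    """Generic sequence: a string that is always stored in upper case."""

    def __new__(cls, seq):
        # str is immutable, so the value must be set when the object is created
        return super().__new__(cls, seq.upper())


class DNA(SEQ):
    """DNA sequence; inherits everything from SEQ, and through it from str."""
    pass


seq1 = DNA('atgccggta')

# len() and print() work without defining __len__ or __str__ ourselves
print(len(seq1))
print(seq1)

# the instance is a DNA, a SEQ and a str at the same time
print(isinstance(seq1, SEQ), isinstance(seq1, str))
```

Because DNA inherits from str here, string methods such as count() are also available on seq1 without any extra code.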

