In Part 1 of this series, you saw a few practical examples of how Object-Oriented Programming (OOP) can be used to help you resolve some code design problems. Make sure you've checked that out if you haven't already!
The language around OOP can seem intimidating. You've seen some of this language in the example in Part 1, but let's make it a little more concrete. Firstly, let's start with probably the most basic question: what's the difference between a class and an object?
- Classes - The definition of the data and procedures available to a given structure. In other words, a class defines what data it refers to, and what procedures (methods) can be used on this data.
- Objects - Concrete instances of classes. For example, in the example in Part 1 you defined the class `Rectangle` and instantiated this class to produce a `Rectangle` object.
There are also some important differences between class methods and variables, and instance methods and variables, that can have an impact on the behaviour of your code:
- Instance variables - These are data elements 'belonging' to each instance of a class (i.e. each object), for example the `length` and `width` variables on a `Rectangle` object.
- Class variables - These are data elements 'belonging' to all instances of a class – there is a single copy for all instances of that class.
To clarify this a little, here's a look at the difference between the two:
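    import math

    class NewCircle(Shape):
        pi = math.pi  # class variable: one copy shared by every instance

        def __init__(self, radius: float) -> None:
            self.radius = radius  # instance variable: one copy per object

        def area(self) -> float:
            return self.pi * self.radius ** 2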
In this case, `pi` is a class variable, and `radius` is an instance variable. Practically, `pi` is shared across all instances of the class, so if you do this:
    a, b = NewCircle(1), NewCircle(1)
    print(a.area(), b.area())  # 3.141592653589793 3.141592653589793
    NewCircle.pi = 3.14  # this changes `pi` on both `a` and `b`.
    print(a.area(), b.area())  # 3.14 3.14
You'll see that updating the class variable with `NewCircle.pi` changes the area of both circles, whereas:
    a, b = NewCircle(1), NewCircle(1)
    print(a.area(), b.area())  # 3.141592653589793 3.141592653589793
    a.pi = 3  # update only the copy of `pi` on the instance `a`.
    print(a.area(), b.area())  # 3 3.141592653589793
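will only change `pi` for the instance `a`. The assignment creates a new instance attribute called `pi` on `a` that shadows the class variable, so `NewCircle.pi` and every other instance of this class are left untouched.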
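If you want to see where each value actually lives, you can inspect the `__dict__` of each object. The sketch below is self-contained, so it uses a stripped-down stand-in called `Circle` (a hypothetical name, with no `Shape` parent) rather than the `NewCircle` class itself; the behaviour it illustrates is the same.

```python
import math


class Circle:
    """Stand-in for NewCircle, without the Shape parent."""

    pi = math.pi  # class variable, stored on the class

    def __init__(self, radius: float) -> None:
        self.radius = radius  # instance variable, stored on the object

    def area(self) -> float:
        return self.pi * self.radius ** 2


a, b = Circle(1), Circle(1)

a.pi = 3  # creates an instance attribute on `a` that shadows Circle.pi
print(a.__dict__)  # {'radius': 1, 'pi': 3}
print(b.__dict__)  # {'radius': 1}
print(Circle.pi == math.pi)  # True: the class variable is unchanged

del a.pi  # remove the shadowing attribute; lookups fall back to the class
print(a.area() == b.area())  # True
```

Deleting the instance attribute is enough to 'restore' the shared value, because attribute lookup checks the instance first and then falls back to the class.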
How about methods – the things that operate on the data? As you saw in Part 1, methods can be thought of as functions that are members of (i.e. 'belong to') a class. There are two particularly important kinds that mirror the variable definitions above:
- Instance methods - Much like instance variables, instance methods 'belong' to individual objects. These methods can access the data and methods encapsulated by the object, including other methods, instance variables and class variables.
- Class methods - In contrast, class methods belong to the class itself and are available to all instances of that class, but may only access other class methods and class variables of that class.
Class methods (and a couple of other varieties too) are supported in Python, but understanding their use in Python requires a solid understanding of a couple of intermediate-level Python language features (including decorators), so an example and discussion on these will be left to a future post.
Now to formally introduce some of the bigger ideas of OOP.
In the example in Part 1 you saw the definition of the `Rectangle` class. To recap, you had:

    class Rectangle(Shape):
        def __init__(self, length: float, width: float) -> None:
            self.length = length
            self.width = width

        def area(self) -> float:
            return self.length * self.width

        def perimeter(self) -> float:
            return (self.length + self.width) * 2.0
In OOP, encapsulation refers to the bundling of data and functions (methods) into a single structure (a class). Practically, encapsulation is used to hide the state of an object. Part of this information hiding involves defining how specific variables and methods may be accessed in order to limit misuse and ensure stability. This is the origin of the concept of public, protected and private 'access modifiers' used in some languages. Let's take a look at what that means.
In the example here, all the instance variables and methods in `Rectangle` can be described as public – they are 'visible' (accessible) to any code interacting with any `Rectangle` instance. However, what happens if you decide that you don't want your users interfering with the `length` and `width` instance variables after you've instantiated a `Rectangle`? One approach could be to make your variables and methods protected or private. This would prevent (or, in some cases in Python, merely discourage) someone using `Rectangle` directly from accessing `length` or `width`. Concretely, you can define members of a class as having one of three levels of access:
- Public - Visible to any code using the class.
- Protected - Only visible to the class that defined the member, and any subclasses of that class.
- Private - Only visible to the class that defined the member.
    class Rectangle(Shape):
        def __init__(self, length: float, width: float) -> None:
            self._length = length  # single underscore: protected by convention
            self.__width = width  # double underscore: private (name-mangled)

        def area(self) -> float:
            return self._length * self.__width

        def perimeter(self) -> float:
            return (self._length + self.__width) * 2.0
This snippet follows the Python naming conventions, which indicate that the `_length` instance variable is a protected member (i.e. accessible to subclasses) and that `__width` is a private member (i.e. only accessible to `Rectangle`). This means that if you were to create `class Square(Rectangle)`, this new class should not use the `__width` variable at all. In fact, Python rewrites ('mangles') the name to `_Rectangle__width`, so a reference to `self.__width` written inside `Square` will not find it.
Additionally, you should not access the variable `_length` on instances of `Rectangle` from outside the class (e.g. `Rectangle(10, 5)._length`). If you're using a linter, you'll notice that it'll give you warnings if you attempt to violate these rules. Moreover, while Python doesn't enforce either protected or private members in a conventional way, many languages do, and the ability to control access to said members (even in Python's more limited approach) can be a useful design feature.
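To see what Python does (and doesn't) enforce here, the following minimal sketch uses a simplified `Rectangle` without the `Shape` parent, plus a hypothetical `Square` subclass with a hypothetical `width` method added purely for illustration.

```python
class Rectangle:
    """Simplified version of the class above, without the Shape parent."""

    def __init__(self, length: float, width: float) -> None:
        self._length = length  # 'protected' by convention only
        self.__width = width  # 'private': stored as _Rectangle__width

    def area(self) -> float:
        return self._length * self.__width


class Square(Rectangle):
    def __init__(self, side: float) -> None:
        super().__init__(side, side)

    def width(self) -> float:
        # Inside Square, `self.__width` is rewritten to `self._Square__width`,
        # which was never set, so calling this raises AttributeError.
        return self.__width


square = Square(4)
print(square.area())  # 16: Rectangle's own methods can still use __width
print(square._length)  # 4: this works, the underscore is only a convention

mangled_lookup_failed = False
try:
    square.width()
except AttributeError:
    mangled_lookup_failed = True
print(mangled_lookup_failed)  # True

print(square._Rectangle__width)  # 4: the mangled name is still reachable
```

In other words, a single underscore is purely a signal to other developers, while a double underscore makes accidental access from subclasses fail without making the value truly inaccessible.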
For example, it can be useful to break up intermediate steps in a calculation into distinct protected methods, but only expose one public method to be used by your users. In other words: hide implementation details that end-users should not have access to, and expose only those they should have access to.
Consequently, it's often a good idea to make only a minimal subset of methods public (or conversely, you should default to making variables and methods protected unless you have a specific reason to make them public). This helps keep the ways in which users interact with a class as narrow as possible, which in turn reduces the 'surface area' of the API you're exposing to them, which then generally reduces the development effort required to support and maintain that API.
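As a rough illustration of keeping the public 'surface area' small, here is a sketch of a class that exposes a single public method and keeps an intermediate calculation step protected. The class name `TriangleFromSides` and its three-sides constructor are hypothetical and are not taken from Part 1.

```python
import math


class TriangleFromSides:
    """Hypothetical triangle defined by its three side lengths."""

    def __init__(self, a: float, b: float, c: float) -> None:
        self._a, self._b, self._c = a, b, c

    def _semi_perimeter(self) -> float:
        # Implementation detail: callers never need this value directly.
        return (self._a + self._b + self._c) / 2.0

    def area(self) -> float:
        # The only public method, computed here with Heron's formula.
        s = self._semi_perimeter()
        return math.sqrt(s * (s - self._a) * (s - self._b) * (s - self._c))


print(TriangleFromSides(3, 4, 5).area())  # 6.0
```

Because `_semi_perimeter` is not part of the public interface, you stay free to rename it, inline it or swap the formula later without breaking any code that only calls `area`.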
Let's revisit the refactored `Shape` example from Part 1 one more time:

    class Shape:
        def area(self) -> float:
            ...

    class Rectangle(Shape):
        def __init__(self, length: float, width: float) -> None:
            ...

        def area(self) -> float:
            ...

    class Triangle(Shape):
        def __init__(self, base: float, height: float) -> None:
            ...

        def area(self) -> float:
            ...

    shapes = [Rectangle(5, 10), Triangle(1, 2)]
    area = 0
    for shape in shapes:
        area += shape.area()
This snippet captures a couple of key ideas related to the concept of polymorphism. Technically, polymorphism refers to the concept that objects of different types can expose a single interface. In the example code here, `Rectangle` and `Triangle` both expose the same method/s, so the code calling those methods can be indifferent to the type of object it is operating on. In other words, your loop over the list of `shapes` only needs the guarantee that the objects it operates on implement the `Shape` interface, and if they do, it'll always work just fine.
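To make that guarantee tangible, here is a runnable sketch with the elided method bodies filled in. The bodies, the `total_area` helper and the `Circle` class are assumptions added for illustration; they are not taken from Part 1.

```python
import math
from typing import Iterable


class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Rectangle(Shape):
    def __init__(self, length: float, width: float) -> None:
        self.length = length
        self.width = width

    def area(self) -> float:
        return self.length * self.width


class Triangle(Shape):
    def __init__(self, base: float, height: float) -> None:
        self.base = base
        self.height = height

    def area(self) -> float:
        return 0.5 * self.base * self.height


class Circle(Shape):  # hypothetical new shape, added later
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


def total_area(shapes: Iterable[Shape]) -> float:
    # Only relies on the Shape interface, never on the concrete type.
    return sum(shape.area() for shape in shapes)


print(total_area([Rectangle(5, 10), Triangle(1, 2)]))  # 51.0
print(total_area([Rectangle(5, 10), Triangle(1, 2), Circle(1)]))
```

Notice that `total_area` did not need to change when `Circle` was introduced: any object that implements `area` can be dropped into the list.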
This is a very powerful concept. Used well, it allows you to construct clean, extensible APIs that are easy to use and simple to debug. This specific concept is used as the basis of many popular frameworks: the exposed interfaces capture a model of a domain or problem which you can then interact with or extend.
Concretely, take a popular machine learning (ML) library like Scikit-Learn. If you've ever used this library, you'll no doubt be familiar with the classic `fit` and `predict` methods (among others) that characterise models in the library. This interface is simple and clean – if sometimes restrictive (by defining what something is, you also end up defining what it is not, after all) – and allows users to build ML pipelines that leverage it without worrying about the specific model variant being used by the pipeline (indeed, that's precisely what Scikit-Learn Pipelines do!).
Consequently, other providers can implement versions of their own models that conform to this interface, which in turn can immediately be used in any pipeline set up to use Scikit-Learn models. You may recall that other popular libraries such as LightGBM, XGBoost and TensorFlow provide Scikit-Learn compliant interfaces. This is part of why there exists such a vibrant ecosystem of Scikit-Learn compliant tools, and why this fact is so useful (and important) from an engineering perspective: it helps you to separate the logic of what you're actually doing with a model from the implementation details of the specific model variant. This is enabled (in part) through polymorphism.
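The idea can be sketched in plain Python without installing anything. Note that this is not Scikit-Learn's actual API: the `Model` protocol, the two toy models and the `train_and_predict` helper are all hypothetical, and only mirror the general shape of a `fit`/`predict` interface.

```python
from typing import List, Protocol, Sequence


class Model(Protocol):
    """Hypothetical interface: anything offering fit and predict."""

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "Model": ...

    def predict(self, X: Sequence[float]) -> List[float]: ...


class MeanModel:
    """Toy model: always predicts the mean of the training targets."""

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "MeanModel":
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X: Sequence[float]) -> List[float]:
        return [self.mean_ for _ in X]


class LastValueModel:
    """Toy model: always predicts the last training target it saw."""

    def fit(self, X: Sequence[float], y: Sequence[float]) -> "LastValueModel":
        self.last_ = y[-1]
        return self

    def predict(self, X: Sequence[float]) -> List[float]:
        return [self.last_ for _ in X]


def train_and_predict(
    model: Model,
    X_train: Sequence[float],
    y_train: Sequence[float],
    X_new: Sequence[float],
) -> List[float]:
    # The 'pipeline' only knows about fit and predict, not the model type.
    return model.fit(X_train, y_train).predict(X_new)


print(train_and_predict(MeanModel(), [1, 2, 3], [2.0, 4.0, 6.0], [4, 5]))  # [4.0, 4.0]
print(train_and_predict(LastValueModel(), [1, 2, 3], [2.0, 4.0, 6.0], [4, 5]))  # [6.0, 6.0]
```

Swapping one model for the other requires no change to `train_and_predict`, which is the same separation of concerns described above, just at a much smaller scale.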
If you're interested in getting a more formal grasp on the ideas behind the various forms of polymorphism, you might find it useful to read up on related ideas, including the Liskov Substitution Principle. Additionally, polymorphism is sometimes mistaken for a specific aspect of OOP itself. Instead, it is a more general programming concept, and variants can be found in many different paradigms in one form or another, including in Functional Programming (another pre-eminent programming paradigm).
A third major feature of OOP is inheritance. The key idea here is that inheritance allows you to express "is a type of" relationships between classes. For example, in the `Shape` example you saw in Part 1, you could express the relationship `class Triangle(Shape)` as: `Triangle` is a type of `Shape`. Similarly, you could express `class RightTriangle(Triangle)` as: `RightTriangle` is a type of `Triangle`. You might then start to see that you're building a hierarchy of classes. In the case of this simple example, `Shape` sits at the top of the hierarchy, `Triangle` sits beneath it, and `RightTriangle` sits beneath `Triangle`.
It is common for the 'root node' in these sorts of hierarchy structures (i.e. `Shape` in this case) to be referred to as a base class. It is also quite common for these classes to be abstract: they do not specify their own implementation, but instead define an interface (and perhaps a partial implementation). An abstract class is not designed to be instantiated directly: it is designed to be subclassed. Many languages actively enforce this fact and prevent you from attempting to directly instantiate an abstract class. This behaviour can be achieved in Python too. Methods that are defined on these classes that do not provide an implementation (like `area` in the example) are referred to as abstract methods (or, in some languages/contexts, as pure virtual methods).
To make this all a little more concrete, an abstract class can be defined as:
- Abstract Class - A class with one or more abstract methods.
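In Python, one way to get this enforcement is the standard library's `abc` module. The sketch below is a minimal illustration rather than the definition used in Part 1; it relies on the `@abstractmethod` decorator, and the `Rectangle` body is filled in only so the example runs.

```python
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Return the area of the shape."""


class Rectangle(Shape):
    def __init__(self, length: float, width: float) -> None:
        self.length = length
        self.width = width

    def area(self) -> float:
        return self.length * self.width


try:
    Shape()  # abstract classes cannot be instantiated directly
    instantiated = True
except TypeError:
    instantiated = False

print(instantiated)  # False
print(Rectangle(5, 10).area())  # 50
```

A subclass that forgets to implement `area` would fail in the same way when you try to instantiate it, which catches an incomplete implementation early.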
So why is this useful? Inheritance (in theory) enables you to easily extend and modify classes, which in turn can make it easier to add features and functionality to your code. Take the example above: you saw how the `Triangle` class was extended to quickly and easily implement a new method `perimeter` on two new types of `Triangle` without having to 'touch' the parent class.
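A sketch of what such an extension might look like is shown below. `RightTriangle` is named in this post, but its body here is an assumption, and `IsoscelesTriangle` is a hypothetical second subtype invented for this example; both inherit `area` from `Triangle` unchanged.

```python
import math


class Shape:
    def area(self) -> float:
        raise NotImplementedError


class Triangle(Shape):
    def __init__(self, base: float, height: float) -> None:
        self.base = base
        self.height = height

    def area(self) -> float:
        return 0.5 * self.base * self.height


class RightTriangle(Triangle):
    """Assumes the right angle sits between the base and the height."""

    def perimeter(self) -> float:
        hypotenuse = math.hypot(self.base, self.height)
        return self.base + self.height + hypotenuse


class IsoscelesTriangle(Triangle):
    """Hypothetical subtype: two equal sides meeting above the base's midpoint."""

    def perimeter(self) -> float:
        side = math.hypot(self.base / 2.0, self.height)
        return self.base + 2.0 * side


right = RightTriangle(3, 4)
print(right.area(), right.perimeter())  # 6.0 12.0

isosceles = IsoscelesTriangle(6, 4)
print(isosceles.area(), isosceles.perimeter())  # 12.0 16.0
```

Neither subclass redefines `__init__` or `area`, and `Triangle` itself is left exactly as it was.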
You may be able to see how this could be used in a business context: you may choose to capture different types of customers, transactions or other business entities as a class hierarchy, and then use the ideas you've seen from polymorphism to create some nice and generic business logic to operate on these different types of object. You should certainly explore this idea, but do so with caution.
This post has really only scratched the surface of OOP: it is a big area with a huge array of tools, ideas and implementations that underpin its use in a modern software project. If you choose to dig deeper into the world of OOP, you'll notice a lot of similarities across languages of certain lineages (e.g. C++, Java), as well as a fair few distinctions too. Some languages and tools deliberately adopt specific subsets of the features discussed here, while others implement more sophisticated versions as well. If you spend the time to learn these ideas – particularly across languages to help you compare and contrast ideas and approaches – you will find OOP to be an invaluable tool in your programming toolkit. However...
A word of warning
So far, you've seen how OOP can be used to help you structure and address problems. In experienced hands, it is a powerful tool. However, when used indiscriminately, OOP can be problematic. Inappropriate/excessive use of OOP concepts and capabilities can very easily act against you. As is always the case when learning new knowledge and skills, it is common for individuals new to the concepts of OOP to fall into the trap set by 'the law of the hammer': when you have a hammer, everything looks like a nail.
Practically, the very same features of encapsulation, polymorphism and inheritance you saw above can increase complexity and hinder debugging, performance and maintenance of your code if you use them without care and forethought. For example, excessively complex and/or poorly designed class hierarchies are a common way development teams end up tying themselves in knots – their class hierarchies can become large entangled structures that are hard to reason about and technically difficult to extend.
As with any skill, understanding when and how to apply OOP concepts comes with practice and – frankly – occasional failure. You will make some code unnecessarily complex and unwieldy. You may well break things. Ultimately, you'll need to apply the ideas of OOP to your own problems a few times before you'll be able to get a feel for what works and what doesn't. Furthermore, a 'purely' technical grasp of the concepts is not enough: you need to remember to take a step back when you're starting on a new project (or joining an existing one) and think about the bigger picture and how best to use the tools available to you.
As you've seen, the concepts provided by OOP and specific implementations of OO language features can help you design and implement your code to function in quite elegant ways. However, they do not add anything 'new', in a sense: you can write code that solves any problems you're likely to face without ever needing to reach for the tools OOP provides. That said, judicious use of OOP ideas may make you a much more productive programmer, and may ease the adoption and reuse of your code by others. Additionally, a good grasp of OOP will also help you better understand and reason about the behaviour and design of many popular software frameworks.
As always, knowing when to reach for a specific tool (and having that tool there waiting to be used) is a valuable skill to develop. It's important that you try to reason about how and where to apply OOP ideas to your own work. For example, if you're sure you're writing a solution to a one-off problem, then creating an elaborate class hierarchy may well be both over-engineering your solution and spawning more problems than it solves. However, if you're starting a project that you know is going to be widely used and extended, it may really pay off.