If you've been programming for at least a little while, you'll likely have come across (and perhaps used) Object Oriented Programming (OOP) concepts and language features. This programming paradigm has been a foundational concept in mainstream software engineering since at least the mid-1990s, and can provide some very powerful capabilities to programmers – especially when used carefully.
However, it isn't uncommon for programmers to swirl around concepts like OOP for many years – perhaps gaining the odd bit of insight here and there – but not consolidating that understanding into a clear set of ideas. For beginners too, the concepts of OOP can be a little bewildering, with some guides utilising language-specific OOP implementations to illustrate ideas and many using subtly distinct or overloaded language, all of which in turn can sometimes obfuscate OOP concepts in the more generic sense.
This post is aimed at taking a brief but practical tour of the core concepts of OOP. It'll give you some examples and comparisons written in Python between the type of code you may already be writing, and the type of code OOP facilitates. It's aimed at folks who are familiar with programming, but with a limited formal understanding of OOP and who'd like to sharpen their understanding in this area. It won't be providing a deep dive into specific aspects and applications of OOP or the various 'flavours' of OOP you might come across in different language implementations – it'll try and stay as generic as possible. If this sounds good, read on!
A beginner's approach
Let's get started. Say you are writing a program to compute the total area of a group of shapes. Maybe you're working on a mapping tool, or maybe a bounding-box type computer vision problem. You know you need to calculate the total area of a group of rectangles and circles. Simple enough, right? You set off and write two functions:
import math


def area_of_rectangle(length, width):
    return length * width


def area_of_circle(radius):
    return math.pi * radius ** 2
Great. A nice clean start. Now, to calculate the area of a specific set of shapes, you could write:
area_of_rectangle(10, 5) + area_of_circle(5) + area_of_circle(10)
If you only ever need to calculate these three specific shapes then congratulations, you're done! But how about if you need to calculate arbitrary sets of circles and rectangles? Perhaps you'd do something like this:
circles = [5, 10, 25]  # each element is the radius of a given circle.
rectangles = [(10, 5), (6, 8)]  # each element is a (length, width) pair for a given rectangle.

# calculate area of rectangles
area_rectangles = 0
for (l, w) in rectangles:
    area_rectangles += area_of_rectangle(l, w)

# calculate area of circles
area_circles = 0
for r in circles:
    area_circles += area_of_circle(r)

# get total area
total_area = area_circles + area_rectangles
Again, nice and simple, and as before: perhaps all you need to do. But you may already be able to see some problems creeping into this code: each shape requires its own for loop, and each loop is structurally very similar, which leads to a degree of code duplication. However, if you are absolutely certain that you will only ever be asked to compute the total area of a set of circles and rectangles, then this is great.
That said, if you work in the messy world of commercial software, you may be able to predict what's coming. Someone 'upstream' from you releases a new feature that introduces triangles to the mix of shapes, and they need you to calculate the area of these triangles, too. Now, in the above code you could go in and add something like:
def area_of_triangle(base, height):
    return (base * height) / 2.0
You could then add another for loop, another list of triangles and update the total_area to reflect your needs. Simple enough, but your code is likely going to start looking repetitive and a little verbose. Perhaps that's still tolerable, in which case, fair enough.
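As a rough sketch of where this leads, the procedural version with triangles added might look something like the following. The triangles list and its (base, height) tuple layout are assumptions made for illustration, mirroring the rectangles list above.

```python
import math


def area_of_rectangle(length, width):
    return length * width


def area_of_circle(radius):
    return math.pi * radius ** 2


def area_of_triangle(base, height):
    return (base * height) / 2.0


circles = [5, 10, 25]  # radii
rectangles = [(10, 5), (6, 8)]  # (length, width) pairs
triangles = [(10, 5)]  # assumed layout: (base, height) pairs

# one near-identical loop per shape type
area_rectangles = 0
for (l, w) in rectangles:
    area_rectangles += area_of_rectangle(l, w)

area_circles = 0
for r in circles:
    area_circles += area_of_circle(r)

area_triangles = 0
for (b, h) in triangles:
    area_triangles += area_of_triangle(b, h)

total_area = area_circles + area_rectangles + area_triangles
```

Three shape types already mean three loops and three accumulators, and every further shape adds another of each.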
However, you now get a request to have your program calculate both the area and the perimeter of each shape. What do you do? You can add something along the lines of:
def perimeter_of_rectangle(length, width):
    return 2 * length + 2 * width
For each new shape or operation, you will have to duplicate (perhaps even copy-and-paste! 😱) many more lines of code.
Additionally, the core problem of calculating the area (and perimeter) is becoming a little obfuscated by the surrounding logic. Plus, as your code grows, it becomes easier for mistakes to start slipping in, more difficult to debug these sorts of problems, and harder for new people to pick up and change. While this specific example is a bit contrived, in general, this sort of approach is not very extensible, it's pretty verbose, and it has a lot of code duplication/repetition. All of this can make your code a bit of a nightmare when things start getting still more complicated down the line – as they inevitably do in a professional setting.
This is where (carefully applied) usage of OOP can become a boon. Time to start the shape-area calculation problem again from the very beginning. At a glance, it is clear that your little program above has a lot of structural similarity – you have a set of shapes, and each shape can be said to have an area. What varies is the information required to define each shape: the radius for a circle, the length and width for a rectangle, and how that information is used to compute the area. How can you capture this observation programmatically? That's where classes can come in handy. Here's a new Shape class:

class Shape:
    def area(self) -> float:
        ...
What is this telling you? It says that there's a structure called Shape which has a method area attached to it. In other words: all Shapes have an area associated with them. You might think that area looks a lot like a standard Python function, and you'd be right. Functions that are members of classes are typically referred to as methods.
Importantly, this specific snippet plays the role of an abstract class: you have a completely abstract 'concept' of Shape. It's abstract because it doesn't define how area is calculated, only that all Shapes 'have' an area that you can access. Why is this helpful? Well, you can now define subclasses – specific variants of the abstract Shape that implement this information for specific shapes. Here's what a Rectangle and a Circle might look like using this:
class Rectangle(Shape):
    def __init__(self, length: float, width: float) -> None:
        self.length = length
        self.width = width

    def area(self) -> float:
        return self.length * self.width


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2
Let's unpack this a little. You'll see that each of these subclasses is defined along the lines of class Circle(Shape). You could read this as 'a Circle is a type of Shape'. Technically, this in turn means that Circle will immediately inherit the area method from Shape. Logically: all Circles also have an area. You can see here that for each shape the area method is now implemented as appropriate for the shape in question.
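One caveat worth knowing: the Shape class above is abstract only by convention, since Python will happily let you create a plain Shape(). If you want the language to enforce the contract, one option is the standard library's abc module. The sketch below is an assumption about how you might do that, and is not a required part of the approach in this post:

```python
import math
from abc import ABC, abstractmethod


class Shape(ABC):
    @abstractmethod
    def area(self) -> float:
        """Return the area of the shape."""


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


circle = Circle(2)
print(circle.area())

try:
    Shape()  # abstract classes cannot be instantiated directly
except TypeError as error:
    print(f"Cannot create a bare Shape: {error}")
```

With this variant, a subclass that forgets to implement area fails as soon as you try to create an instance of it, instead of quietly returning None later on.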
You'll also see that there's an odd-looking method __init__ on these classes now too. In Python, this is technically referred to as the 'initializer'. However, it is functionally very similar to what is more generally known as a constructor method, and for generality, this post will refer to it as such (the difference between the two is a topic for a much later discussion).
A constructor is a special method that provides instructions on how to construct a new instance of a given class. In the example above, you can see that the 'visible' difference between the two types of shape is captured in the constructor: a Rectangle takes a length and a width, while a Circle takes a radius. This means that all other distinctions (e.g. how area is calculated) are hidden from the rest of the code. Time to see what this means for your initial example:
shapes = [Circle(5), Rectangle(10, 5), Circle(10), Rectangle(6, 8), Circle(25)]

area = 0
for shape in shapes:
    area += shape.area()
As you can see, the core code itself is much less verbose, and arguably much easier to understand: given a set of shapes, iterate over those shapes and sum their areas. You might even choose to use a generator expression instead:
area = sum(shape.area() for shape in shapes)
Much more succinct and explicit. Clearly, the implementation of area is unimportant (and not visible) to this logic.
Extending your code
Now, back to the problem of introducing new features to your code. In the original example, you were required to add support for computing the area of triangles. Using what you've seen, you could create a new type of shape Triangle as follows:
class Triangle(Shape):
    def __init__(self, base: float, height: float) -> None:
        self.base = base
        self.height = height

    def area(self) -> float:
        return (self.base * self.height) / 2.0
Again, the implementation of area is class-specific, but is hidden from code using instances of the class. Consequently, the core logic of the code (in some sense 'the interesting bit') remains identical. All you need to do is add a Triangle to your list of Shapes. This idea goes further: any instance of a class that is a 'valid' Shape can be used anywhere a Shape can be used with no other changes required. You can think of the abstract Shape class as defining a contract that all code using a Shape can rely on to provide the required functionality (in this case, the area of the shape). Concretely, your shapes list would now look like:
shapes = [Circle(5), Rectangle(10, 5), Circle(10), Rectangle(6, 8), Circle(25), Triangle(10, 5)]
You might be able to see that the code has become more extensible with the introduction of some basic OOP concepts. You can add any number of shape variants and you need not do any work to change your 'core' logic. You might notice that this also isolates your 'business logic' (sum areas of a set of shapes) from implementation details (the specifics of how each shape's area is calculated).
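To make the 'contract' idea concrete, here is a small sketch. The total_area helper and the Square class are hypothetical names introduced purely for illustration; the point is only that the summing logic never needs to know which kinds of shape exist.

```python
import math
from typing import Iterable


class Shape:
    def area(self) -> float:
        ...


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2


class Square(Shape):  # hypothetical new shape, added later
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


def total_area(shapes: Iterable[Shape]) -> float:
    # relies only on the Shape contract: every shape provides area()
    return sum(shape.area() for shape in shapes)


print(total_area([Square(2), Square(3)]))  # 13
```

Adding Square required no change to total_area, which is the decoupling described above.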
But there was one more request: you were then asked to compute the perimeter for each shape too. How do you do this?
class Shape:
    def area(self) -> float:
        ...

    def perimeter(self) -> float:
        ...


class Rectangle(Shape):
    ...

    def perimeter(self) -> float:
        return (self.length + self.width) * 2.0
In this case, you can see that a new method perimeter has been added to the Shape class. As you might expect, this informs the program that all valid Shape classes should implement the perimeter method. Going back to your Rectangle class, you can now implement said perimeter method as shown in the snippet. You can do the same for each other Shape too. Note that for brevity the ellipsis in Rectangle is used as shorthand for the constructor and area methods defined above.
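For instance, a Circle might implement perimeter as its circumference. This is a sketch following the same pattern as Rectangle, using the standard formula 2πr:

```python
import math


class Shape:
    def area(self) -> float:
        ...

    def perimeter(self) -> float:
        ...


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self.radius = radius

    def area(self) -> float:
        return math.pi * self.radius ** 2

    def perimeter(self) -> float:
        # circumference of a circle
        return 2.0 * math.pi * self.radius
```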
However, you may preempt another issue here: computing the perimeter of a triangle from its base and height alone is not possible in general, because the perimeter differs by type of triangle even though the area formula remains the same. This is a case where making subclasses of Triangle, such as RightTriangle and EquilateralTriangle, might make sense:
class RightTriangle(Triangle):
    def perimeter(self) -> float:
        # base and height are the two perpendicular sides; add the hypotenuse
        return self.base + self.height + math.sqrt(self.base ** 2 + self.height ** 2)


class EquilateralTriangle(Triangle):
    def __init__(self, base: float) -> None:
        # the height of an equilateral triangle is sqrt(3) / 2 times its side
        super().__init__(base, base * math.sqrt(3) / 2.0)

    def perimeter(self) -> float:
        return 3.0 * self.base

What does this mean? You'll notice that in both cases there's now a type-specific implementation of the perimeter method. You might also notice that RightTriangle doesn't define a constructor of its own. This is because the constructor for this type of Triangle is identical to that of the parent class, and by not overriding this method in the subclass, RightTriangle simply inherits the constructor defined on Triangle.

EquilateralTriangle does override the __init__ method. Clearly, for an equilateral triangle, you need only the length of one side to fully specify the shape. Here, the constructor has been modified to only require the base to reflect this. You'll see that the implementation then executes the line super().__init__(base, base * math.sqrt(3) / 2.0). This line calls the constructor of the parent class Triangle, with both of its positional arguments (base and height) derived from just the base. The height of an equilateral triangle is √3/2 times its side, so the inherited area method still returns the correct value.

Putting this together, you'd have:
shapes = [Circle(5), Rectangle(10, 5), Circle(10), Rectangle(6, 8), Circle(25), RightTriangle(10, 5), EquilateralTriangle(5)]

perimeter = 0
for shape in shapes:
    perimeter += shape.perimeter()
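As a quick sanity check of the inherited constructor, consider a 3-4-5 right triangle. The sketch below repeats the relevant classes so that it runs on its own:

```python
import math


class Shape:
    def area(self) -> float:
        ...

    def perimeter(self) -> float:
        ...


class Triangle(Shape):
    def __init__(self, base: float, height: float) -> None:
        self.base = base
        self.height = height

    def area(self) -> float:
        return (self.base * self.height) / 2.0


class RightTriangle(Triangle):
    def perimeter(self) -> float:
        return self.base + self.height + math.sqrt(self.base ** 2 + self.height ** 2)


# RightTriangle defines no __init__, so Triangle.__init__ is used
triangle = RightTriangle(3, 4)
print(triangle.area())       # 6.0
print(triangle.perimeter())  # 12.0
```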
Hopefully you can see that changes to the 'core' logic are minimal, and this little program is now much more extensible than the initial example. You have created an ontology that is captured by your class hierarchy. This allows you to add new features to your code by either extending your class hierarchy or by updating your core business logic. You may see that this has (largely) decoupled these tasks, making it easier for a new user to focus just on one or on the other. How is this useful in practice, you ask?
Let's take this away from shapes for a moment. Imagine you were working for a company like Stripe (a digital payments company). You have some code that processes each transaction on your platform as it occurs. You have defined a simple
Transaction class that captures information about the transaction. Your business logic pipeline then takes instances of this class, checks for signs of fraud, updates your internal records of transactions, sends a push notification to your customer and then archives the transaction metadata somewhere.
This flow would likely be quite complex, and clearly performance and reliability are essential to your service. You'll likely have a battery of test suites sat around this pipeline, and perhaps a particularly fearsome code review process too. Basically: you don't want to make spurious changes to your code. However, the business world loves spurious changes. In this example, perhaps there's a regulatory change, or maybe you need to operate in a new territory that requires you to process additional data. Practically, you still want one well-designed pipeline – you don't want to update the whole pipeline for each small business change or for each new territory you expand into.
With this example, you may be able to see how applying the concepts in this post could come in handy. By isolating what varies (i.e. transactional info) from your pipeline implementation using a combination of OOP techniques, you can reduce the amount of code you need to change for each new business-specific use-case that you come across, and keep your business-critical code rock solid in the meantime. Cool, eh?
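To give a flavour of what that might look like, here is a deliberately simplified sketch. Every name in it (Transaction's fields, EUTransaction, audit_record, vat_number, process) is hypothetical and invented for this illustration; it does not reflect how Stripe or any real payments system is built.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    transaction_id: str
    amount: float
    currency: str

    def audit_record(self) -> dict:
        # the information the pipeline archives for every transaction
        return {
            "id": self.transaction_id,
            "amount": self.amount,
            "currency": self.currency,
        }


@dataclass
class EUTransaction(Transaction):
    # hypothetical extra field required in a new territory
    vat_number: str = ""

    def audit_record(self) -> dict:
        record = super().audit_record()
        record["vat_number"] = self.vat_number
        return record


def process(transactions: list) -> list:
    # the pipeline depends only on the Transaction contract, so it does
    # not change when a new kind of transaction is introduced
    return [transaction.audit_record() for transaction in transactions]


records = process([
    Transaction("t1", 10.0, "USD"),
    EUTransaction("t2", 20.0, "EUR", vat_number="EXAMPLE-123"),
])
```

Here process stands in for the fraud checks, notifications and archiving described above. Supporting the new territory meant adding a subclass, while the pipeline function itself stayed untouched.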
Until next time
That's it for Part 1. You've seen some of the practical capabilities of OOP on a toy problem, and how they can change how you think about designing and building software. You've also seen a brief example of the benefits of OOP for something of a more 'real-world' application. Hopefully you've found it useful.
However, so far you've escaped a more technical look at the language and concepts underpinning OOP. Plus, you've seen a perhaps overly-optimistic view of the application of OOP too: there are a fair few reasons to be cautious in your use of OOP, and it's important you understand what those reasons are before applying it to your projects.
Fortunately for you, that's what you'll get from Part 2! Tune in next time!
If you've got any questions or feedback, get in touch with me on Twitter.