State of the Blog: 2020

Why, hello there!

I've been a bit late in getting my December newsletter out of the door. I decided to take the last couple of weeks of December off to get some much-needed R&R. However, here it is, belatedly!

This newsletter is going to be a little different though. Rather than give a shortlist of some of the great Machine Learning and tech content I've come across in December, I'm instead going to lay out a few changes that I'm intending to make to the blog in 2021, explain what that means for subscribers, and highlight some of the posts I'm most pleased with from 2020. Let's dig in.

Incoming changes

The primary aim of this blog and newsletter is to help me develop a consistent, high-quality writing practice, and to improve my ability to communicate technical concepts to different audiences. It's also a means through which I've been able to connect with the data science and machine learning community, and to better understand the interests and trends in that community.

Since relaunching the blog in summer 2020, I've learned a lot about what makes a good article, and how to draw in a high-quality readership. I've also realised how much I have yet to learn. These learnings have motivated me to make a few tweaks to the type of content that'll appear on this blog, and to change the publishing cadence I've been targeting to better address the priorities I've identified for myself. What does this mean for you? Here's what you can expect:

  • The current monthly 'trends' newsletter will instead become a longer, more data-driven review of trends in the world of Machine Learning and software sent out on a quarterly basis. It'll borrow some aspects from my recent article on data science technology trends in 2020.
  • A focus on longer, richer articles released approximately once per month. These articles will also tend to be part of broader themes and/or series.
  • Some articles and/or series will move to being subscriber-only after 30 days.

I'm also looking at adding some new features to the blog. Adding a comments section is high on my list of 'nice-to-haves', for example. I'm tentatively considering experimenting with adding some paid content as well. However, if I do this and you're already subscribed by the time I add paid content, I'll upgrade your account so you'll get any paid content for free. I'll keep you posted as I figure things out a bit more.

Best of 2020

Right, time for my highlights of 2020. While I'm rarely content with my writing, there have been a few posts I've been more pleased with than others. Fortunately, many of you agreed with me (feedback for which I'm very grateful!).

One theme of my 'research' interests this year has been how best to go about scaling out ML systems to be robust in very high-demand use cases, while also being cost-effective. That led me to a couple of deep dives into the world of serverless computing, which is the subject of the first article I'd like to highlight:

Serverless ML: Deploying Lightweight Models at Scale
Deploying ML models ‘into production’ as scalable APIs can be tricky. This post looks at how Serverless Functions can make deployment easier for some applications, and gives an example project to get you started deploying your own models as Google Cloud Functions.

Another big 'discovery' for me in 2020 was Streamlit, the hugely productive, well-designed Python library that enables you to build and deploy data-intensive web apps in a matter of moments. Better yet, it's designed to allow you to add the capabilities it offers into existing code. I've turned many of my little scripts into Streamlit apps, including a bunch of personal finance scripts I've cobbled together over the years! Here's an article I put together explaining the basics in more detail, and how to share a Streamlit app on Google Cloud:

Deploying Streamlit Apps to GCP
Streamlit is a minimal, modern data visualization framework that’s rapidly becoming the go-to data app framework in the Python ecosystem. This post introduces Streamlit, and shows you how to securely and scalably deploy your Streamlit apps with Google App Engine.

My most popular blog post of the year was also my last. Sometime in November I decided to pull statistics from various blogs around the web to 'apply Data Science to Data Science' and understand what trends we'd seen as a community over the last couple of years. This all culminated in an overview of key technology trends in 2020. Here's that article:

Data Science in 2020: Technology
This article analyses 30,000 unique Data Science blog posts from the last year to get to the bottom of what the Data Science community has been discussing. This post looks at the most discussed -- and most popular -- technologies of the year.

Finally, if you aren't already aware, I've started reposting some content from this blog over on Medium on the Towards Data Science publication. Here's my current most popular post on that site:

A Gentle Introduction: Automating Machine Learning Pipelines
Deploying software regularly and reliably is hard. Deploying software that utilises Machine Learning (ML) models regularly and reliably can be harder still. At the end of the day, the long-term value…

That's a wrap

That's it for this newsletter. I've enjoyed the first six months or so of my blogging trip, and I hope you've found it useful too. You should expect my first 'data-driven trends' newsletter at the end of March 2021. In the meantime, if you have any questions or feedback, feel free to drop me a line on LinkedIn or Twitter.

I wish you happiness and good health for 2021.

Mark Douthwaite
