Retrieval-Augmented Generation for LLMs: A Gentle Introduction If you’ve been on or near the Generative AI (Gen AI) rollercoaster that has characterised the last year or so, it is likely you will have come across the acronym ‘RAG’. RAG stands for ‘Retrieval-Augmented Generation’. It is an approach commonly credited to a 2021 paper by a team
Personal 7 Reasons To Work At A Startup, And 1 Reason Not To Inspired by Chip Huyen's similarly titled post, this post summarises some of my thoughts on why joining a startup might just be the best decision you could make for your career, and a word of caution for those considering doing just that.
Engineering Skills for Data Scientists Load Testing a Machine Learning Model API Deploying a Machine Learning (ML) model as a live service to be consumed by a business-critical system or directly by end-users can be a scary prospect. This post looks at how you can perform load testing on your model APIs to ensure they can stand up to even the highest-demand situations.
Programming Featured Flask in Production: Minimal Web APIs Flask is a popular 'micro-framework' for building web APIs in Python. However, getting a Flask API 'into production' can be a little tricky for newcomers. This post provides a minimal template project for a Flask API, and gives some tips on how to build out basic production Flask APIs.
Personal Books of 2020 A rundown of my 5 favourite books from 2020, with a few honourable mentions to boot.
Data Science Featured Data Science in 2020: Technology This article analyses 30,000 unique Data Science blog posts from the last year to get to the bottom of what the Data Science community has been discussing. This post looks at the most discussed -- and most popular -- technologies of the year.
Engineering Skills for Data Scientists A Brief Introduction to HDF5 Data models and data formats are an easily overlooked but critical aspect of modern data infrastructure and development work. This post gives an introduction to HDF5 and how to get started using it in Go.
Computer Science Fundamentals Object-Oriented Programming: A Practical Introduction (Part 2) In Part 1 of this mini-series, you saw how OOP concepts can be used to structure and manipulate code. In this part, you'll see how these ideas are formally defined, and look at a couple of more advanced concepts too.
Data Science How Good is Xanthus? Xanthus is a Deep Learning (DL) library built on top of TensorFlow that uses the Keras API to implement various neural recommendation model architectures. This post benchmarks the models implemented in Xanthus against some popular 'classic' matrix factorisation models.
Technology Deploying Streamlit Apps to GCP Streamlit is a minimal, modern data visualization framework that's rapidly becoming the go-to data app framework in the Python ecosystem. This post introduces Streamlit, and shows you how to securely and scalably deploy your Streamlit apps with Google App Engine.
Computer Science Fundamentals Object-Oriented Programming: A Practical Introduction (Part 1) Whether you're a fan or not, OOP is a valuable tool in your programming toolkit. It's also sometimes a little bewildering for new programmers (and some more experienced ones too). This post provides a (brief) practical introduction to OOP concepts.
Engineering Skills for Data Scientists Featured MLOps: Building Continuous Training and Delivery Pipelines MLOps is an emerging engineering movement aimed at accelerating the delivery of reliable, working ML software on an ongoing basis. This post provides an intro to MLOps and gives you an example project to get you started with building your own ML pipelines using GitHub Actions and Google Cloud.
Engineering Skills for Data Scientists Featured Serverless ML: Deploying Lightweight Models at Scale Deploying ML models 'into production' as scalable APIs can be tricky. This post looks at how Serverless Functions can make deployment easier for some applications, and gives an example project to get you started deploying your own models as Google Cloud Functions.
Engineering Skills for Data Scientists A Brief Introduction to Serverless Computing This post introduces the concepts behind 'serverless computing' -- a way of quickly and easily deploying lightweight apps (e.g. APIs). It looks at the associated advantages and disadvantages of serverless, and gives a short example showing how to deploy your own serverless function to Google Cloud.
Artificial Intelligence Featured Introducing Xanthus This post introduces Xanthus - a new open source Deep Learning package built on TensorFlow 2.0 for quickly and easily building state of the art recommendation models in Python.
Literature The Freedom of the Press It's easy to feel that the present times are 'unprecedented'. Yet sometimes it's clear that the events of today are echoes of problems familiar to those in the past. This post shares an essay written by Orwell in 1945 that feels particularly relevant to the times.
Engineering Skills for Data Scientists Fire: Simple CLIs done right Creating CLIs can help improve accessibility and reuse of your scripts and packages, but they can also be a bit of a pain to set up and maintain. Fire makes building CLIs for your latest ML pipeline a breeze.
Personal Books of 2019 Here's my annual list of my favourite books from the past year. There's even a business book on here for the first time.
Programming Cython: Lightspeed Python Python is a wonderful language for many applications, but it is not renowned for its speed. This post looks at how you can quickly and easily use Cython to dramatically accelerate your Python code (in some cases).
Computer Science Featured What is Reservoir Computing? What if there was a framework that could be used to exploit the properties of matter for computation? Originally posted on The Conversation.
Personal Books of 2018 A rundown of my 5 favourite books from 2018, plus a few honourable mentions for good measure.
Technology Tracking Monster Jobs with TQDM TQDM is a tiny Python package that lets you add customisable progress bars to your code. Ideal for those nasty multi-hour model training jobs.
Personal Self Important Rant About Books Because sometimes you need to rant. About reading. And books. And generally disposable media.
Artificial Intelligence The Side Effects of Autonomous Vehicles Autonomous vehicles are starting to pop up all over the place, even in Milton Keynes. But what does this mean for the rest of us? Originally posted on The Conversation.
Technology Graphene: The Key to Next-Gen Batteries? Graphene has had a fair bit of buzz in the last decade. One major possible use case is to significantly improve battery tech. This article looks at how. Originally posted on The Conversation.