• Configuration Data Formats in Data Engineering August 20, 2024
  • When I started out, I didn’t give much thought to how I stored my configurations. A JSON file here, a YAML file there, whatever came to my mind first. As projects grew more complex, the way I structured my configuration files began to matter a lot more.

  • Logging for Data Engineers August 9, 2024
  • Through years of building data pipelines, from ones that lived in janky notebooks to ones that automate complex workflows in production environments, logging wasn’t something I paid a lot of attention to.

  • Unit Testing in Data Engineering April 21, 2024
  • I believe we can all agree that writing tests can feel as mundane as doing the laundry; most of us would probably rather be coding new features or learning something new. We would also agree that testing is undeniably a critical component of data engineering, and the skill it demands sets individuals apart.

  • Event-Driven Data Pipelines in AWS - Part 2 April 6, 2024
  • A data engineer doesn’t just code all day; they also design, plan, tinker, analyse, fix bugs, attend meetings and manage trade-offs between best practices. Being familiar with Terraform is a great skill to have for automating the provisioning and management of infrastructure. As a cross-functional skill, it also aligns data engineers with DevOps principles, fostering continuous integration, delivery, and deployment.

  • Event-Driven Data Pipelines in AWS - Part 1 March 25, 2024
  • In my early days as an analyst, I was always eager to automate tasks, often finding myself repeating lines of code without scalable solutions in sight. As I delved deeper into the field of data engineering, I started tinkering with the cloud, where I found myself hooked on the possibilities.

  • Introduction to Concurrency in Python for Data Engineers November 12, 2023
  • Concurrency is a complex topic that takes some unpacking, especially when it comes to implementing it in Python. Put simply, it refers to the ability to execute multiple things simultaneously, but it is a nuanced term because of the different implementations.

  • First Impressions of Mage July 1, 2023
  • Mage is not just any ordinary data orchestration tool; it could be the superhero of modern data engineering. It streamlines and optimises data processing effortlessly and will make your other data pipeline tools look like rusty old relics.

  • Breaking into Data Engineering as a Self-Taught Developer May 30, 2023
  • In the ever-evolving field of data, many analytics roles offer growth, ownership and innovation opportunities. Data engineers play a pivotal role in designing, building, and maintaining systems that enable organisations to effectively leverage their data. They work closely with data scientists to ensure that data is properly stored, secured, and made available for analysis. Aside from the core concepts that a data engineer should be familiar with, there is also a plethora of tools out there to work with.
