Havelock London

Data Engineering Kickstart: Migrating and Modernising Havelock London’s Data Processes

The Challenge

Havelock London is a boutique investment management company that analyses financial data from various sources to inform their investment decisions.

They needed to move their ETL pipelines from a legacy semi-manual setup to a modern fully automated cloud-based system using Apache Airflow, but lacked the necessary experience internally.

They asked us to design a solution, build the platform, migrate the first pipelines and provide handover and training so their internal teams could take it forward and migrate the remaining pipelines.

The Solution

Designing the platform

We implemented a solution using Apache Airflow, designed to wrap around Havelock’s existing proprietary ETL logic; this ensured that the underlying data processing remained consistent, while adding the capability for Havelock to extract and store data files as soon as they became available.

We used Azure Kubernetes Service for deployment, integrating with Havelock’s existing cloud infrastructure to maximise ease of configuration and maintenance.

Kubernetes infrastructure changes were managed through templated Helm charts, again allowing configurability with a low maintenance burden.

Migrating the first pipeline

We migrated a self-contained piece of Havelock’s existing data-processing functionality to Airflow, fully documenting the process so that it could serve as a model for Havelock to follow when migrating further pipelines.

This included setting up local, staging and production environments and demonstrating Airflow development best practices, such as keeping individual Airflow tasks atomic and making use of Airflow’s provider packages, to establish patterns that could be easily followed in the future.
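As an illustration only of what "atomic" means here, the following is a minimal sketch in plain Python rather than Havelock's actual pipeline code. The function names (`extract`, `transform`, `run_once`), the directory layout and the CSV columns are all hypothetical. Each function does one job, derives its output path from the run date alone and overwrites that path, so rerunning a step for the same date produces the same result. In Airflow, each function would typically become its own task.

```python
import csv
import tempfile
from datetime import date
from pathlib import Path


def extract(run_date: date, raw_dir: Path) -> Path:
    """Hypothetical extract step: write the raw file for one run date."""
    raw_dir.mkdir(parents=True, exist_ok=True)
    out = raw_dir / f"prices_{run_date.isoformat()}.csv"
    # Stand-in for fetching from a live source; the file is overwritten on rerun.
    rows = [("ticker", "price"), ("AAA", "10.5"), ("BBB", "20.0")]
    with out.open("w", newline="") as f:
        csv.writer(f).writerows(rows)
    return out


def transform(raw_path: Path, clean_dir: Path) -> Path:
    """Hypothetical transform step: read one raw file, write one clean file."""
    clean_dir.mkdir(parents=True, exist_ok=True)
    out = clean_dir / raw_path.name
    with raw_path.open(newline="") as src, out.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=["ticker", "price_pence"])
        writer.writeheader()
        for row in reader:
            pence = round(float(row["price"]) * 100)
            writer.writerow({"ticker": row["ticker"], "price_pence": pence})
    return out


def run_once(base_dir: Path, run_date: date) -> str:
    """Run both steps in order and return the cleaned file's contents."""
    raw = extract(run_date, base_dir / "raw")
    clean = transform(raw, base_dir / "clean")
    return clean.read_text()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Running twice for the same date yields identical output.
        print(run_once(Path(tmp), date(2024, 1, 2)))
        print(run_once(Path(tmp), date(2024, 1, 2)))
```

Because each step depends only on its inputs and the run date, a failed step can be retried on its own without repeating the steps before it.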

Our solution was backed by automated unit tests written using the Pytest testing framework, along with continuous integration and delivery using Azure Pipelines, with test results and Cobertura coverage reports published automatically. This made test failures easy to identify, provided key metrics such as code coverage, and ensured that both code and infrastructure changes could be deployed quickly, safely and reliably.
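To give a flavour of the kind of unit test involved, here is a minimal sketch. The function `normalise_ticker` is a hypothetical stand-in for a piece of transformation logic and is not taken from Havelock's proprietary package. Pytest collects functions whose names start with `test_` and reports a failure whenever a plain `assert` does not hold.

```python
def normalise_ticker(raw: str) -> str:
    """Hypothetical transformation: trim whitespace and upper-case a ticker."""
    cleaned = raw.strip().upper()
    if not cleaned:
        raise ValueError("ticker must not be empty")
    return cleaned


def test_normalise_ticker_strips_and_uppercases():
    assert normalise_ticker("  hvl ") == "HVL"


def test_normalise_ticker_rejects_blank_input():
    # A plain try/except keeps this sketch free of a pytest import;
    # pytest.raises would be the more idiomatic choice in a real suite.
    try:
        normalise_ticker("   ")
    except ValueError:
        return
    raise AssertionError("expected ValueError for a blank ticker")
```

Assuming the pytest-cov plugin is installed, running `pytest --cov --cov-report=xml --junitxml=results.xml` would write a Cobertura-compatible coverage file alongside a JUnit-style results file, which a CI system can then publish. The file name `results.xml` is an arbitrary choice for this example.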

Setting Havelock up to succeed after we left

As we created the data pipeline, we streamlined Havelock’s data and development processes. We adapted their existing continuous integration and delivery pipelines to publish their proprietary Python package to a private registry, allowing for centralised versioning of their data transformation logic.

In addition, we migrated Havelock’s data file-storage strategy to a cloud-based solution using Azure Blob Storage, including creation of data lifecycle-management policies handling file retention and deletion.
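For readers unfamiliar with lifecycle-management policies, the sketch below builds one rule in the JSON shape that Azure Blob Storage accepts. The rule name, the `raw-data/incoming` prefix and the 30-day and 365-day thresholds are assumptions chosen for illustration and do not reflect Havelock's actual retention settings.

```python
import json


def build_lifecycle_policy(prefix: str, cool_after_days: int, delete_after_days: int) -> dict:
    """Build a single-rule lifecycle policy (all values are illustrative)."""
    if delete_after_days <= cool_after_days:
        raise ValueError("delete_after_days must be greater than cool_after_days")
    return {
        "rules": [
            {
                "enabled": True,
                "name": "rawFileRetention",  # hypothetical rule name
                "type": "Lifecycle",
                "definition": {
                    # Apply only to block blobs under the given container/prefix.
                    "filters": {"blobTypes": ["blockBlob"], "prefixMatch": [prefix]},
                    "actions": {
                        "baseBlob": {
                            # Move to the cool tier, then delete, based on days since last modification.
                            "tierToCool": {"daysAfterModificationGreaterThan": cool_after_days},
                            "delete": {"daysAfterModificationGreaterThan": delete_after_days},
                        }
                    },
                },
            }
        ]
    }


policy = build_lifecycle_policy("raw-data/incoming", cool_after_days=30, delete_after_days=365)
print(json.dumps(policy, indent=2))
```

Keeping the policy as generated JSON means retention rules can be reviewed and versioned alongside the rest of the infrastructure configuration.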

Our team worked closely with senior stakeholders to understand Havelock’s key business drivers and motivations, which guided every part of our development, and we fed progress back through weekly demos.

Throughout the project, we created in-depth documentation to cover both code and infrastructure implementation. We combined this with an end-of-project knowledge transfer session to ensure that the solution would be effectively handed over.

The Result

Over six weeks, our small team built the foundations of Havelock’s new platform and migrated a key data pipeline. This first data pipeline automatically extracts and transforms data from live sources twice a day, with the output data immediately available for Havelock to use.

This automation removes the reliance on the manual work involved in triggering these pipelines, meaning that the Havelock team can instead focus on analysis.

We also designed and scoped the technical work needed to migrate the rest of Havelock’s existing pipelines, enabling development to be taken forward by Havelock. The Havelock team have since continued the work, using the knowledge gained from working with Softwire.

“As a small team we have an incredibly limited bandwidth, and so decided to contract out the project to move our data processing into a more robust and automated framework. This was the first time that we have used third parties in this way, and so were not without reservations. From our first contact point we had a very positive experience of working with Softwire, with the transparency that they provided at each stage helping to boost our confidence that the project was heading in the correct direction.

The agile approach to the project, with daily stand-ups, worked well as we were able to provide feedback as the project progressed. The project was delivered on time and on budget, with the team members assigned to it being high calibre and a pleasure to work with. This all speaks volumes to the culture at Softwire, where we were left with a sense that they took pride in delivering good outcomes for their clients.”

– Matthew Beddall, CEO, Havelock London

Client
Havelock London

Technologies
Airflow, Python, Azure Kubernetes Service, Helm, PostgreSQL

The project
Migrating and modernising Havelock London's ETL pipelines from a legacy semi-manual setup to a modern fully automated cloud-based system using Apache Airflow.

The results
Automation means Havelock are no longer reliant on manual work to trigger pipelines, freeing up the team to focus on analysis.