Blood cancers, such as leukaemia, are the fifth-most common type of cancer in the UK.

There are more than 240,000 people living with blood cancer in the UK and 40,000 people are diagnosed with blood cancer each year.

A stem cell transplant is a possible treatment option for those with blood cancer. Some people needing a transplant can find a suitable donor within their family, but the remainder rely on finding a stranger on the donor register to save their lives.

Anthony Nolan is a charity that maintains a register of over 700,000 adult stem cell donors. It is also the hub for the Anthony Nolan and NHS Stem Cell Register, which comprises the four UK donor registers: Anthony Nolan, the British Bone Marrow Registry (BBMR), the Welsh Bone Marrow Donor Registry (WBMDR) and DKMS UK.

When a patient needs a transplant, clinicians can ask Anthony Nolan to search the various registers to see if they can find an appropriately matched donor.

As part of a wider digital transformation programme that Softwire has been working on with Anthony Nolan, we helped write and test a new search system for the donor registers.

In this paper, we take a look at the testing process we applied to the new system.

A brief look at how stem cell transplants work

To maximise the chances of a good post-transplant outcome for a patient, their genetic makeup must be similar to that of the donor. The genes that determine the compatibility of a transplant are known as HLA (human leukocyte antigen), or histocompatibility antigens. Genes are made up of DNA, and chromosomes in turn are made up of genes. Some genes come in a variety of different forms (known as alleles), each of which is located at the same position, or “genetic locus”, of a chromosome.

A more detailed explanation of the science behind the genetics involved in matching is out of the scope of this article, but the book The Compatibility Gene by Daniel M. Davis gives a good overview of the history and science behind the process.

HLA nomenclature

All known HLA alleles are named and documented by an international body, the WHO Nomenclature Committee for Factors of the HLA System. Each allele is given a string representation.

An example string representation of an allele looks like this: “01:01”. In many cases there is some ambiguity in the typing, and a string representing multiple possible alleles is used instead.

| Name | Example | Notes |
|------|---------|-------|
| Allele | *01:01:01:01, *01:02:03, *01:05 | Can be two-, three- or four-field. Fields are separated by colons. |
| Allele string | *01:01/02/03, *01:01/01:02/01:03 (i.e. *01:01 OR *01:02 OR *01:03) | Come in two forms. Subtypes: here we assume that the first field is shared by all possible alleles. Strings: one of multiple possible alleles (known to a certain resolution), where the first field is not necessarily the same. |
| NMDP code | *01:AB = *01:01/01:02 (i.e. *01:01 OR *01:02) | NMDP codes are a compressed way of writing long allele strings, which can otherwise run to hundreds of characters. The code refers to all but the first field of an allele string of subtypes. |
| Truncated allele | *01:02 (truncated form of *01:02:03:04) | Sometimes only part of the allele name is displayed. This only applies when a longer form of the allele exists: e.g. *01:02 is a truncated form of *01:02:03:04. |
| XX code | *01:XX (i.e. *01:01 OR *01:02 OR *01:03 etc.) | Can be considered a wildcard, which always refers to all possible subtypes: i.e. only the first field is known. |
| Serology | 1 | An older, less accurate typing method based on immune reactions to antigens on the cell surface, rather than on the sequence of the DNA. A serological type maps to a number of related alleles, often sharing a first field. |
| Untyped | – | For some donors, alleles at some loci may not have been typed (i.e. we do not know which alleles they have at that gene). Untyped genes are considered a potential match. |
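To make these formats concrete, here is a minimal sketch of expanding an ambiguous typing into the candidate alleles it could refer to. This is illustrative Python, not the production C# code, and the `NMDP_CODES` mapping is invented for the example – real NMDP codes come from a published lookup table.

```python
# Hypothetical NMDP lookup: *01:AB -> *01:01 / *01:02 (invented for illustration)
NMDP_CODES = {"AB": ["01", "02"]}

def expand_typing(typing: str, all_subtypes=None) -> list[str]:
    """Return the candidate alleles an ambiguous typing string could refer to."""
    typing = typing.lstrip("*")
    first, _, rest = typing.partition(":")
    if rest == "XX":
        # XX code: every known subtype of the first field
        return sorted(all_subtypes or [])
    if rest in NMDP_CODES:
        # NMDP code: look up the compressed second fields
        return [f"{first}:{second}" for second in NMDP_CODES[rest]]
    if "/" in rest:
        # Allele string: either subtypes (01:01/02/03) or full alleles (01:01/01:02)
        return [part if ":" in part else f"{first}:{part}" for part in rest.split("/")]
    # Single allele (possibly truncated)
    return [typing]

print(expand_typing("*01:01/02/03"))  # ['01:01', '01:02', '01:03']
print(expand_typing("*01:AB"))        # ['01:01', '01:02']
```

Note that both forms of allele string expand to the same candidates; only the notation differs.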

The genes we consider within the algorithm are known by the names HLA-A, -B, -C, -DPB1, -DQB1, and -DRB1. We have two copies of each gene (one from each parent), so the two copies are numbered 1 and 2 (e.g. A1 and A2).

The search algorithm

The system we’re testing will, for a given set of strings representing a patient’s HLA, search the registry for all matching donors (with a given acceptable match strength, indicated by the number of allowed mismatches), and return a list of donors ranked by match strength.

Internally, the algorithm performs two main functions.

(a) Matching

The algorithm searches a database of registered donors, returning any who are considered a potential match. (An exact match is preferable, but we can allow for a certain number of ‘mismatches’ at specific loci.)

(b) Scoring

For each of the returned donors, the algorithm performs analysis on the match and gives it an appropriate grade and confidence.

The ‘Grade’ indicates how good the match is (e.g. the same allele is the strongest match possible, but other, similar alleles may still be considered a less good match)

The ‘Confidence’ indicates how likely the match is (i.e. how much ambiguity there is in the typing resolution of the donor – see the above tables for possible resolutions)
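As a sketch of how ranking by grade and confidence might work – the category names below are our own illustration, not Anthony Nolan's actual grading scale:

```python
from enum import IntEnum

# Illustrative categories only; the real system's grades and confidences differ.
class Grade(IntEnum):
    MISMATCH = 0
    PERMISSIVE = 1
    MATCH = 2

class Confidence(IntEnum):
    POTENTIAL = 0
    EXACT = 1

donors = [
    {"id": "D1", "grade": Grade.MATCH, "confidence": Confidence.POTENTIAL},
    {"id": "D2", "grade": Grade.MATCH, "confidence": Confidence.EXACT},
    {"id": "D3", "grade": Grade.PERMISSIVE, "confidence": Confidence.EXACT},
]

# Rank by grade first, then by confidence, best first.
ranked = sorted(donors, key=lambda d: (d["grade"], d["confidence"]), reverse=True)
print([d["id"] for d in ranked])  # ['D2', 'D1', 'D3']
```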

Testing requirements

Having written the algorithm, we required a suite of automated tests to validate that it performs according to specification. In addition to asserting the behaviour of the algorithm, the test suites must be:

  • Understandable to non-technical team members
  • Fast enough to be run as part of deployment
  • Covering realistic HLA data

In addition to these requirements, we decided that to best test a wide array of possible inputs, we wanted our test data to be semi-randomly generated. So we can add another requirement to our list:

  • Run on programmatically selected data

In addition to enabling testing of a wide range of test data, pseudo-random data-generation removes the need for manual selection of expected inputs/outputs, which, for a large test suite, would be both tedious to curate and very prone to human error.

Test data requirements

The HLA types chosen for our test donors must be valid (existing) HLA types. We should be able to specify details about the expected match, for example which loci are a match, and to what resolution the donor is typed.

We must also be able to assert that a given lower resolution or ambiguous typing is matched when searching for a higher-resolution one – e.g. 01:01/02 covers two alleles, 01:01 and 01:02. Our tests should be able to assert that the ambiguous typing 01:01/02 is matched when just one of the alleles in the string is matched – i.e. when searching for 01:01. The same is true for all ambiguous typings.

Introducing the “meta-donor”

To achieve the necessary correlation between typing resolutions, we devised the concept of the ‘meta-donor’. A meta-donor is a hypothetical donor, for whom all loci are as strongly typed as possible – i.e. single allele resolution.

When provided with a meta-donor, the test framework can ‘dumb-down’ the meta-donor’s single allele to any of the possible lower resolutions.

Some of these conversions are easily computed

e.g. XX codes: 01:01:01:01 => 01:XX

Some require the selection of additional data

e.g. allele strings: 01:01 => 01:01/01:02/01:03

And some require corresponding information to be looked up externally

e.g. NMDP codes: 01:01 => 01:NMDP

This enables us to simplify the selection of underlying test data itself: We need only choose from a list of available alleles, and any lower resolutions will be computed.
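The 'dumbing-down' step can be sketched as follows (Python for brevity – the real framework is C#, and the helper names are our own):

```python
def truncate(allele: str, fields: int) -> str:
    """Keep only the first `fields` fields: *01:01:01:01 -> *01:01 for fields=2."""
    name = allele.lstrip("*")
    return "*" + ":".join(name.split(":")[:fields])

def to_xx_code(allele: str) -> str:
    """*01:01:01:01 -> *01:XX (only the first field is retained)."""
    return truncate(allele, 1) + ":XX"

def to_allele_string(allele: str, other_subtypes: list[str]) -> str:
    """Combine with additionally selected subtypes: *01:01 -> *01:01/02/03."""
    return truncate(allele, 2) + "/" + "/".join(other_subtypes)

allele = "*01:01:01:01"
print(to_xx_code(allele))                      # *01:XX
print(truncate(allele, 2))                     # *01:01
print(to_allele_string(allele, ["02", "03"]))  # *01:01/02/03
```

NMDP codes are the case that cannot be computed: they need the corresponding code looked up from external data, which is why our curated allele lists store them alongside each allele.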

As such, we curated a list of high-frequency alleles for each locus, along with corresponding serology and NMDP code values. The alleles for each meta-donor were selected randomly from these lists.

Several allele datasets were created, because certain test cases require particularly specific alleles to be used. Each meta-donor definition specifies which dataset its alleles should be chosen from.

Each meta-donor specified in our test data file also contains a list of resolutions. For each resolution in the list, the alleles for that meta-donor would be downgraded to the specified resolution and added to our database. This means that for each meta-donor, we can have multiple donors in the database. As far as the system under test knows, these are unrelated, but for our purposes, we named them ‘database donors’, to distinguish them from the meta-donors.

Data selection

Consider the case where we must assert that a lower-resolution donor is returned when searching for matching high-resolution HLA.

Worked example:

That an XX code typed donor is returned when searching for matching single alleles

| Patient | Donor | Locus |
|---------|-------|-------|
| 01:01 | 01:XX | A1 |
| 01:02 | 01:XX | A2 |
| 02:09 | 02:XX | B1 |
| etc... | | |

(1) Meta-donor selection

To test this case, we need to select three things. Each of these three selectors was designed to be a stateless, standalone component. Each has an input of a set of criteria, and will return corresponding selected data. This architecture enabled us to easily write unit tests for the bulk of the selection logic, giving us greater confidence that our test inputs/outputs will always be what we asked for.

A meta-donor must be selected to be used for the test case.

In our example, we must find a meta-donor that has been mapped to a database donor with XX code resolution, at all loci.

The variables that could affect which meta-donor is appropriate for a test case were collated into a set of ‘meta-donor selection criteria’.

Examples of such variables are as follows (for each locus/position):

  • Donor type (adult vs umbilical cord blood)
  • Donor’s registry
  • Desired typing resolution of donor
  • Whether a locus should be homozygous (same allele at each position within a locus)
  • Whether a donor’s HLA contains a null allele
  • What match level is desired (certain match levels are only possible with certain datasets)

The selector then uses these criteria to search our defined meta-donors. The first meta-donor that matches all the specified criteria is returned. If none are found, an exception is thrown (indicating that a suitable meta-donor needs defining).
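The shape of this selector can be sketched as follows (a Python simplification with flat criteria dictionaries; the real criteria are per-locus and considerably richer):

```python
def select_meta_donor(criteria: dict, meta_donors: list[dict]) -> dict:
    """Return the first meta-donor satisfying every specified criterion."""
    for donor in meta_donors:
        if all(donor.get(key) == value for key, value in criteria.items()):
            return donor
    # No match: the test data is insufficient and a meta-donor needs defining.
    raise LookupError(f"No meta-donor matches {criteria}; one may need defining")

# Invented example meta-donors for illustration.
meta_donors = [
    {"id": "MD1", "donor_type": "adult", "resolutions": ("xx_code",)},
    {"id": "MD2", "donor_type": "cord", "resolutions": ("xx_code", "nmdp")},
]
print(select_meta_donor({"donor_type": "cord"}, meta_donors)["id"])  # MD2
```

Because the selector is stateless and purely functional, behaviour like the exception on missing data is straightforward to cover with unit tests.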

(2) Database donor selection

Once we know that the test data exists and which meta-donor to use, we must decide which database donor we expect to be returned by the search.

In our example, whichever donor has XX code resolution at all loci.

This selector is much simpler, as it only concerns the typing resolution of the individual donor. Provided with a meta-donor, and a set of resolutions, it will return the ID of the database donor that is typed at the expected resolution at all loci.

(3) Patient HLA selection

Finally, having selected what data we expect to be returned by the algorithm, we can select our input data. We can specify that the patient HLA be of lower resolution as well, but by default, we will use the exact typing stored with the meta-donor.

In our example, we will use the high-resolution typings of the meta-donor, to ensure a match at all loci.

In addition to specifying the typing resolution of the patient HLA, we can request (per-locus):

  • Whether a match is requested at all
  • What level of match is requested
  • Whether the locus is homozygous (same allele at each position within a locus)
  • Whether the match orientation should be cross or direct
    • If A1 (donor) matches A1 (patient), and A2 (donor) matches A2 (patient), we call the match ‘direct’
    • If A1 (donor) matches A2 (patient), and A2 (donor) matches A1 (patient), we call the match ‘cross’
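The direct/cross distinction can be sketched as a small function (a deliberate simplification – the real scoring also handles partial, per-position matches):

```python
def orientation(patient: tuple[str, str], donor: tuple[str, str]) -> str:
    """Classify the match orientation at a single locus: direct, cross, or none."""
    if patient[0] == donor[0] and patient[1] == donor[1]:
        return "direct"
    if patient[0] == donor[1] and patient[1] == donor[0]:
        return "cross"
    return "none"

print(orientation(("*01:01", "*02:01"), ("*01:01", "*02:01")))  # direct
print(orientation(("*01:01", "*02:01"), ("*02:01", "*01:01")))  # cross
```

For a homozygous locus the two orientations coincide, which is one reason homozygosity is a criterion the selectors need to know about.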

Once it’s provided with the selected meta-donor, the patient HLA selector will reduce the HLA resolution as appropriate, or select ‘HLA known not to match at all’ if a match is not required.

Putting it all together

As mentioned earlier, the three selectors are stateless, purely functional components. In practice, to build up our set of criteria, we must track the criteria state for each validation test case. We use what we call the ‘Patient Data Factory’ to maintain these criteria. It exposes an interface that enables us to change individual criteria at a per-locus level, along with some methods to retrieve expected input HLA / expected output donor IDs, which invoke the relevant selectors.

This factory also accounts for the fact that some of the criteria are directly related. Without it, we could risk updating one set of criteria but not the other.

E.g. If asking for a donor at a specific resolution, both the meta-donor selector (to ensure such a database donor exists for the selected meta-donor), and database donor selector (to select the correct database donor ID) must be informed.
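The factory's criteria-syncing role can be sketched as follows (class and method names are our own, not the production API):

```python
class PatientDataFactory:
    """Tracks per-locus criteria state and keeps related criteria in sync."""

    def __init__(self, loci=("A", "B", "C", "DPB1", "DQB1", "DRB1")):
        self.meta_donor_criteria = {locus: {} for locus in loci}
        self.database_donor_criteria = {locus: {} for locus in loci}

    def set_donor_resolution(self, locus: str, resolution: str) -> None:
        # One call updates BOTH related criteria sets, so the meta-donor
        # selector and database donor selector can never disagree.
        self.meta_donor_criteria[locus]["resolution"] = resolution
        self.database_donor_criteria[locus]["resolution"] = resolution

factory = PatientDataFactory()
factory.set_donor_resolution("A", "xx_code")
print(factory.meta_donor_criteria["A"])      # {'resolution': 'xx_code'}
print(factory.database_donor_criteria["A"])  # {'resolution': 'xx_code'}
```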

With these three selections complete, we now have some input data, an expected output, and a guarantee that the requested scenario is possible with the existing test data (if it wasn’t, one of the three selection steps would throw an exception).

Cucumber

Recalling the first requirement for the test suite:

Understandable to non-technical team members

One purpose of the test suite is to demonstrate to non-technical team members, who will use the algorithm in production, that the search fulfils the specification to their satisfaction. With this in mind, we chose to write our tests with Cucumber – a popular testing framework whose Gherkin syntax enables tests to be written in a very human-readable format.

Scenario: 10/10 Search with a NMDP code typed match

Given a patient has a match

And the matching donor is NMDP code typed at each locus

And the match orientation is direct at locus B

When I run a 10/10 search

Then the results should contain the specified donor

Behind the scenes, these are mapped by regex to a series of step definitions written in C#. The step definitions perform some string-matching themselves, and set the appropriate criteria in the patient data factory.

Let’s work through the example, line by line.

Given a patient has a match

This merely tells us that the patient data we select should exactly match the selected donor. We set this on the patient HLA criteria, at all loci.

And the matching donor is NMDP code typed at each locus

This rule maps to a step definition via the regex:

the matching donor is (.*) typed at (.*)

‘Each locus’ tells us to set the criteria for all loci, and ‘NMDP code typed’ tells us to set the expected donor resolution to ‘NMDP code’ in the meta-donor and database donor criteria.
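The regex-to-step mapping works roughly like this – a Python sketch of the mechanism only, since the real step definitions are written in C#, and `criteria` here stands in for the patient data factory:

```python
import re

criteria = {}

LOCI = ["A", "B", "C", "DPB1", "DQB1", "DRB1"]

def set_donor_resolution(resolution: str, loci: str) -> None:
    """Step handler: record the expected donor resolution for the named loci."""
    targets = LOCI if loci == "each locus" else [loci.removeprefix("locus ")]
    for locus in targets:
        criteria[locus] = resolution

# Each step definition pairs a regex with a handler; capture groups become args.
STEPS = [
    (re.compile(r"the matching donor is (.*) typed at (.*)"), set_donor_resolution),
]

def run_step(text: str) -> None:
    for pattern, handler in STEPS:
        match = pattern.fullmatch(text)
        if match:
            handler(*match.groups())
            return
    raise LookupError(f"No step definition found for: {text}")

run_step("the matching donor is NMDP code typed at each locus")
print(criteria["A"])  # NMDP code
```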

And the match orientation is direct at locus B

This line tells us the match should be a direct one at a specific locus. We set this on the patient HLA criteria.

When I run a 10/10 search

This line pertains to the search criteria themselves. At this stage, we use the criteria we’ve built up to retrieve patient HLA, and we hit the API with a search request.

It’s at this stage a meta-donor will be selected, along with patient HLA.

‘10/10’ refers to the required match count: 10 matching alleles out of 10 across the 5 searched loci (two positions per locus) – i.e. 0 mismatches are allowed.

Then the results should contain the specified donor

This is the most common assertion throughout the test suite. We again make use of our selectors – this time the database donor selector – to check which database donor we are expecting to match.

Having retrieved this, we can check it against the results of the algorithm. If the donor was returned, the test passes. If it wasn’t, the test fails.

Conclusions

At the time of writing, we’ve written a suite of these Cucumber tests covering all the scenarios in the algorithm specification. Have they achieved everything that we wanted them to?

Let’s have another look over the initial goals of our test suite.

Understandable to non-technical team members

As shown in our Cucumber example, the tests themselves are written in full sentences, following a clear Given/When/Then format. This format of automated tests may take some getting used to for anyone new to it, but the tests themselves are very much human readable.

Fast enough to be run as part of deployment

This test suite was always going to be slower than a unit test suite, but it still runs in a reasonable timeframe. Locally, the full suite takes a few minutes to complete – the majority of that time is pre-processing and data import. While we might not want to run the full suite on every commit, speed shouldn’t be a barrier to running it as a prerequisite to release.

Covering realistic HLA data

All the test alleles used to generate the test donors have been observed in real donors, so they are realistic. There are some intricacies that are not covered (for example, certain pairs of alleles are more likely to appear together). As the algorithm does not itself consider allele frequency or co-occurrence, we can be confident in the tests without covering these phenomena.

Run on programmatically selected data

Apart from the test cases where very specific data is required, the donors are generated pseudo-randomly from the curated allele lists.

Maintainability

The logic that selects patient, donor and meta-donor details has become quite complex. The unit tests covering this data-selection provide confidence that the test suite itself is working as intended, but the system as a whole may be quite daunting for new developers to make changes to.

However, within the existing rules, writing new tests requires no knowledge of the system’s internals: they can be added solely by writing more Cucumber features. In addition, it is possible to specify donor and patient HLA directly from the Cucumber files, if a particularly specific case is required in future.

So for the majority of tests that may need adding in future, the nuts and bolts of test data generation and selection are sufficiently encapsulated from the test files that new tests will be simple to add, even for non-developers.

We hope you’ve enjoyed reading this and that it’s perhaps provided some inspiration for your own projects. If you’d like to be involved in the life-saving process we’ve been discussing, Anthony Nolan is always looking for more people aged 16-30 to join the register – you can sign up here.