Typing Lodash in TypeScript, with Generic Union Types


5 November 2015, by

TypeScript is a powerful compile-to-JS language for the browser and node, designed to act as a superset of JavaScript, with optional static type annotations. We’ve written a detailed series of posts on it recently (start here), but in this post I want to talk about some specific open-source work we’ve done with it around Lodash, and some of the interesting details around how the types involved work.

TypeScript

For those of you who haven’t read the whole series: TypeScript gives you a language very similar to JavaScript, but including future features (due to the compile step) such as classes and arrow functions, and support for more structure, with it’s own module system, and optional type annotations. It allows you to annotate variables with these type annotations as you see fit, and then uses an extremely powerful type inference engine to automatically infer types for much of the rest of your code from there, automatically catching whole classes of bugs for you immediately. This is totally optional though, and any variables without types are implicitly assigned the ‘any’ type, opting them out of type checks entirely.

This all works really well, and TypeScript has quite a few fans over here at Softwire. It gets more difficult when you’re using code written outside your project though, as most of the JavaScript ecosystem is written in plain JavaScript, without type annotations. This takes away some of your new exciting benefits; every library object is treated as having ‘any’ type, so all method calls return ‘any’ type, and passing data through other libraries quickly untypes it.

Fortunately the open-source community stepped up and built DefinitelyTyped, a compilation of external type annotations for other existing libraries. These ‘type definitions’ can be dropped into projects alongside the real library code to let you write completely type-safe code, using non-TypeScript libraries.

This is great! Sadly, it’s not that simple in practice. These type definitions need to be maintained, and can sometimes be inaccurate and out of date.

In this article I want to take a look at a particular example of that, around Lodash’s _.flatten() function, and use this to look at some of the more exciting newer features in TypeScript’s type system, and how that can give us types to effectively describe fairly complex APIs with ease.

What’s _.flatten?

Let’s step back. Lodash is a great library that provides utility functions for all sorts of things that are useful in JavaScript, notably including array manipulation.

Flatten is one of these methods. Flattening an array unwraps any arrays that appear nested within it, and includes the values within those nested arrays instead. Flatten also takes an optional second boolean parameter, defining whether this processes should be recursive. An example:

 

_.flatten([1, 2, 3]);                     // returns [1, 2, 3] - does nothing

_.flatten([[1], [2, 3]]);                 // returns [1, 2, 3] - unwraps both inner arrays

_.flatten([[1], [2, 3], 4]);              // returns [1, 2, 3, 4] - unwraps both inner arrays,
                                          // and includes the existing non-list element

_.flatten([[1], [2, 3], [[4, 5]]]);       // returns [1, 2, 3, [4, 5]] - unwraps all arrays,
                                          // but only one level

_.flatten([[1], [2, 3], [[4, 5]]], true); // returns [1, 2, 3, 4, 5] - unwraps all arrays 
                                          // recursively

 

This is frequently very useful, especially in a collection pipeline, and is fairly easy to describe and understand. Sadly it’s not that easy to type, and the previous DefinitelyTyped type definitions didn’t provide static typing over these operations.

What’s wrong with the previous flatten type definitions?

Lots of things! The _.flatten definitions include some method overloads that don’t exist, have some unnecessary duplication, incorrect documentation, but most interestingly their return type isn’t based on their input, which takes away your strict typing. Specifically, the method I’m concerned with has a type definition like the below:

 

interface LoDashStatic {
  flatten<T>(array: List<any>, isDeep?: boolean): List<T>;
}

 

This type says that the flatten method on the LoDashStatic interface (the interface that _ implements) takes a list of anything, and an optional boolean argument, and returns an array of T’s, where T is a generic parameter to flatten. Because T only appears in the output though, not the type of our ‘array’ parameter, this isn’t useful! We can pass a list of numbers, and tell TypeScript we’re expecting a list of strings back, and it won’t know any better.

We can definitely do better than that. Intuitively, you can think of the type of this method as being (for any X, e.g. string, number, or HTMLElement):

 

_.flatten(list of X): returns a list of X
_.flatten(list of lists of X): returns a list of X
_.flatten(list of both X and lists of X): returns a list of X

_.flatten(list of lists of lists of X): returns a list of list of X (unwraps one level)
_.flatten(list of lists of lists of X, true): returns a list of X (unwraps all levels)

 

(Ignoring the case where you pass false as the 2nd argument, just for the moment)

Turning this into a TypeScript type definition is a little more involved, but this gives us a reasonable idea of what’s going on here that we can start with.

How do we describe these types in TypeScript?

Let’s start with our core feature: unwrapping a nested list with _.flatten(list of lists of X). The type of this looks like:

flatten<T>(array: List<List<T>>): List<T>;

Here, we say that when I pass flatten a list that only contains lists, which contain elements of some common type T, then I’ll get back a list containing only type T elements. Thus if I call _.flatten([[1], [2, 3]]), TypeScript knows that the only valid T for this is ‘number’, where the input is List<List<number>>, and the output will therefore definitely be a List<number>, and TypeScript can quickly find your mistake if you try to do stupid things with that.

That’s not sufficient though. This covers the [[1], [2, 3]] case, but not the ultra-simple case ([1, 2, 3]) or the mixed case ([[1], [2, 3], 4]). We need something more general that will let TypeScript automatically know that in all those cases the result will be a List<number>.

Fortunately, union types let us define general structures like that. Union types allow you to say a variable is of either type X or type Y, with syntax like: var myVariable: X|Y;. We can use this to handle the mixed value/lists of values case (and thereby both other single-level cases too), with a type definition like:

flatten<T>(array: List<T | List<T>>): List<T>;

I.e. if given a list of items that are either type T, or lists of type T, then you’ll get a list of T back. Neat! This works very nicely, and for clarity (and because we’re going to reuse it elsewhere) we can refactor it out with a type alias, giving a full implementation like:

 

interface MaybeNestedList<T> extends List<T | List<T>> { }

interface LoDashStatic {
  flatten<T>(array: MaybeNestedList<T>): List<T>;
}

 

Can we describe the recursive flatten type?

That fully solves the one-level case. Now, can we solve the recursive case as well, where we provide a 2nd boolean parameter to optionally apply this process recursively to the list structure?

No, sadly.

Unfortunately, in this case the return type depends not just on the types of the parameters provided, but the actual runtime values. _.flatten(xs, false) is the same as _.flatten(xs), so has the same type as before, but _.flatten(xs, true) has a different return type, and we can’t necessarily know which was called at compile time.

(As an aside: with constant values technically we could know this at compile-time, and TypeScript does actually have support for overloading on constants for strings. Not booleans yet though, although I’ve opened an issue to look into it)

We can get close though. To start with, let’s ignore the false argument case. Can we type a recursive flatten? Our previous MaybeNested type doesn’t work, as it only allows lists of X or lists of lists of X, and we want to allow ‘lists of (X or lists of)* X’ (i.e. any depth of list, with an eventually common contained type). We can do this by defining a type alias similar to MaybeNested, but making it recursive. With that, a basic type definition for this (again, ignoring the isDeep = false case) might look something like:

 

interface RecursiveList<T> extends List<T | RecursiveList<T>> { }

interface LoDashStatic {
  flatten<T>(array: RecursiveList<T>, isDeep: boolean): List<T>;
}

 

Neat, we can write optionally recursive type definitions! Even better, the TypeScript inference engine is capable of working out what this means, and inferring the types for this (well, sometimes. It may be ambiguous, in which case we’ll have to explicitly specify T, although that is then checked to guarantee it’s a valid candidate).

Unfortunately when we pass isDeep = false, this isn’t correct: _.flatten([[[1]]], false) would be expected to potentially return a List<number>, but because it’s not recursive it’ll actually always return [[1]] (a List<List<number>>).

Union types save the day again though. Let’s make this a little more general (at the cost of being a little less specific):

flatten<T>(array: RecursiveList<T>, isDeep: boolean): List<T> | RecursiveList<T>;

We can make the return type more general, to include both potential cases explicitly. Either we’re returning a totally unwrapped list of T, or we’re returning list that contains at least one more level of nesting (conveniently, this has the same definition as our recursive list input). This is actually a little dupicative, List<T> is a RecursiveList<T>, but including both definitions is a bit clearer, to my eye. This isn’t quite as specific as we’d like, but it is now always correct, and still much closer than the original type (where we essentially had to blind cast things, or accept any-types everywhere).

Putting all this together

These two types together allow us to replace the original definition. We can be extra specific and include both by removing the optional parameter from the original type definition, and instead including two separate definitions, as we can be more specific about the case where the boolean parameter is omitted. Wrapping all that up, this takes us from our original definition:

 

interface LoDashStatic {
  flatten<T>(array: List<any>, isDeep?: boolean): List<T>;
}

 

to our new, improved, and more typesafe definition:

 

interface RecursiveList<T> extends List<T | RecursiveList<T>> { }
interface MaybeNestedList<T> extends List<T | List<T>> { }

interface LoDashStatic {
  flatten<T>(array: MaybeNestedList<T>): List<T>;
  flatten<T>(array: RecursiveList<T>, isDeep: boolean): List<T> | RecursiveList<T>;
}

 

You can play around with this for yourself, and examine the errors and the compiler output, using the TypeScript Playground.

We’ve submitted this back to the DefinitelyTyped project (with other related changes), in https://github.com/borisyankov/DefinitelyTyped/pull/4791, and this has now been merged, fixing this for Lodash lovers everywhere!

Tags: , , , , ,

Categories: Technical

«
»

Leave a Reply

* Mandatory fields


seven − = 1

Submit Comment