Softwire Blog


Your Build Smells: Nervous Pushes


6 January 2016, by

This post is the second post in a series looking at Build Smells – signs of major issues in team build processes – and the things we can do to try and solve them. You might also be interested in the first post on painful merges, and do watch this space or follow us on twitter to see more of these as we publish them.

Nervous Pushes

Actually pushing code frequently is important for lots of reasons. As we covered in our previous post, sharing code earlier with your team can significantly reduce the time you spend fighting your version control system, but even beyond that, pushing code brings you closer to actually releasing it, and generally getting useful things done.

Anything that stops you pushing code is something that’s slowing and hurting your team, but unfortunately it’s easy to get into a situation where your build does exactly that. If pushing code normally means something breaks then not only do you lose the time required to investigate and fix the issue (if there even is one), but you also lose the focus and concentration you had up until that point, as you have to context switch out of whatever new task you’d picked up in the meantime back to the fallout from your last one.

All this time and effort is something we humans instinctively avoid, and very quickly you’ll find if pushing code creates hassle, then people will push code less often. At this point, your build is making your team worse at development. This is a build smell.

Confidence

All of this fundamentally undermines your confidence in your build. If you have confidence in your build then you can write code, quickly find out whether it works (and quickly deal with it if it doesn’t), and iterate at high speed to code that actually solves your problem. Confidence lets you go fast, focusing on each task in hand sequentially and quickly moving toward success there, rather than having to keep one distracted eye nervously watching everything else to see what’ll break next.

This is one of the underlying components of flow, the much-discussed feeling of complete focus and being ‘in the zone’ that lets you zoom through development tasks with speed and control. This only works if you can be quickly confident in whether your code works as you write it; you need to be able to keep moving forwards without distractions and uncertainty, have your mind full with only what you’re building, and concentrate on your goals. Without confidence in your build, that’s not possible.

When confidence really falls apart you end up assuming the build is wrong more often than it's right, and things get much worse; not only do you lose the time required to fix issues, you also normalize build failures, start to ignore them, and lose the ability to deal effectively with real failures when they come up. Once you start automatically rerunning the build a couple of times whenever it fails, fixing the build takes even longer still, the problem compounds, and the whole situation quickly spirals.

If we can set up our builds to give us confidence instead, then we can build better software, faster (and enjoy ourselves more). If your build doesn’t give you this, then you have a problem to fix.

How can we fix this?

This all revolves around the core quality-checking steps of your build, by which I mean anything that could legitimately fail because of potentially incorrect code changes: automated tests, linting and style checks, and even compilation phases.

To get confidence in your build, these steps need to give you accuracy, thoroughness, speed and simplicity.

Accuracy

If the code actually works, the build should pass.

The easiest way to totally undermine confidence is to have a build that fails half the time. As soon as this becomes normal, it becomes very difficult to quickly or easily get confidence in the code you write, and your development speed (and quality) drops significantly. There are a few things we've found particularly effective for helping here:

  • Reduce test dependencies – The most common cause of inaccuracy is intermittent issues with something outside your codebase, be that external systems, time and timezones, or the current operating system. Dependencies on any of these will bite you eventually, and working to remove or limit them really helps. Set up and tear down dependencies within your tests, and take care about the assumptions you make about your environment (see the clock sketch after this list).
  • Keep environments identical – Builds that fail only on the build server are hard to trust, frustrating, and difficult to fix. Try to keep environments consistent to avoid that; at the very least, make it possible to create an environment locally that's identical to CI, for debugging. There are lots of great tools nowadays to support this; Docker and Vagrant are both good options, for example.
  • Have zero tolerance for inaccuracy – Lack of accuracy will snowball; once you've got one test that randomly fails, more will join it without you noticing, and the frustration of this makes it difficult for people to invest in test quality generally, making things worse still. You have to fight disproportionately hard to beat these issues when they happen, or they'll get far worse.
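
As one concrete example of the time problem above, here's a minimal sketch (in TypeScript; the Clock interface and Greeter class are hypothetical stand-ins for your own code) of isolating a test from the system clock:

// Instead of reading the system time directly, depend on a clock you control:
interface Clock {
  now(): Date;
}

class Greeter {
  constructor(private clock: Clock) {}

  greeting(): string {
    return this.clock.now().getHours() < 12 ? "Good morning" : "Good afternoon";
  }
}

// In production, inject the real system clock:
const systemClock: Clock = { now: () => new Date() };

// In tests, inject a fixed time, so the result doesn't depend on when the tests run:
const fixedClock: Clock = { now: () => new Date(2016, 0, 6, 9, 0) };
console.log(new Greeter(fixedClock).greeting()); // always "Good morning"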

Thoroughness

If the build passes, the code should actually work.

Passing tests need to give you real confidence in the quality of the codebase, so that once your tests pass you can move forwards and focus on the next problem, without nagging doubts that you've broken something and not noticed.

  • Monitor Coverage (a bit) – Code coverage is an oft-rejected metric, and strict adherence to arbitrary goals can be harmful, but overall coverage trends and patterns reliably contain useful data. It's very useful to know that coverage has been falling recently, or that one area is much better covered than another. Measure (and examine) these patterns to get the real value, and keep your team focused on thoroughness.
  • Ensure every potential risk is covered – The key thing here is not to skip testing things just because testing them is hard. Sometimes you can't usefully unit test something, but that doesn't mean you should skip testing it entirely. Add thorough integration tests instead, and ensure every potential risk gets caught somewhere.
  • Test what might fail – Overtesting can bite you too, increasing maintenance cost, and sapping morale and team investment in good testing. If there are no reasons you can think of why a test could fail, it's not useful (even Kent Beck agrees), and you should potentially move the test up to a higher level. Instead of writing tests for every simple getter, test them as part of the code using them (see the sketch below).
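
To illustrate that last point, a minimal sketch (using Node's assert module; Invoice here is a hypothetical example class): the total getter gets no standalone test, but is still covered through the behaviour that uses it.

import * as assert from "assert";

class Invoice {
  constructor(private items: number[]) {}

  get total(): number {
    return this.items.reduce((sum, x) => sum + x, 0);
  }

  isPayable(): boolean {
    return this.total > 0;
  }
}

// No test for the getter itself; these behaviour tests cover it anyway:
assert.strictEqual(new Invoice([10, 20]).isPayable(), true);
assert.strictEqual(new Invoice([]).isPayable(), false);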

Speed

Builds should run fast enough to let you quickly know their result.

Speed powers confidence, allowing you to quickly check how you're doing and iterate. Time spent waiting for builds is typically unproductive time, but it also creates painful context switching: if you get pulled back to your previous change three hours later when the build fails, then you're no longer in the best place to fix it, and you've lost any focus on whatever you moved onto next too.

  • Quick feedback – Effective team flow and the ability to move quickly depends on a quick feedback cycle. Waiting 4 hours to know whether your change worked makes it very hard to quickly iterate on changes. Anything you can do to reduce the time to feedback makes this much easier, and helps you and your team develop faster and better. Tools like Wallaby (for JS) and NCrunch (for .Net) are fantastic for this.
  • Watchable unit tests – Unit tests that you are prepared to look at while they run (so 10 seconds max, unless they’re somehow fantastic to watch) are tests that you’ll be much happier to run all the time. Longer, and your mind will wander – taking your focus with it – and resulting in you running the tests far less. Fight to keep unit tests running fast.
  • Smoke test suites – You won't manage to get integration or system test suites down to this kind of time. Once you've got a 4 hour system test suite though, you can be sure that nobody on the team is running it before every commit. You can make that easier by picking a very small representative set of these tests to use as smoke tests, which are quick enough that people can run them as they work. You can also set these up as an early build step in CI, to ensure major problems are caught quickly (and to avoid waiting 4 hours to discover that 100% of your tests failed).

Simplicity

Builds should be easy enough to run locally and fix that people actually do so.

People are lazy, and that includes you, me, and everybody on your team. If running your build takes work, they’ll avoid it. Keeping the build simple and easy to use encourages people to actually go and use it, rather than avoiding it as an obstacle on their road to getting things done.

  • One click – If your build takes a series of actions, people will forget one, do them wrong, or skip to ‘only the important ones’. It's typically easier than you'd expect to get it down to a single click (or a simple command line step), and providing that kind of simple interface makes it drastically more likely that people will use it.
  • Zero setup – Manual setup wastes time, and you'll do it wrong anyway. You'll also have big problems when the setup changes: you have to migrate users from setup A to setup B, and you start finding tests that only pass in one setup. Again, this is often easier than you'd expect (especially with tools like Docker/Vagrant and Puppet/Ansible/Chef), and has big value.
  • Easy debugging – Eventually builds will fail, and making it easy to work out why will let people fix them more quickly and more effectively. Have Selenium take screenshots at the end of failing tests (see the sketch after this list), make sure your build and CI setup provides detailed info on exactly what failed, have any logs from the system under test easily accessible, and ensure it's easy to run individual tests standalone locally.
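
As a rough sketch of the screenshot idea (assuming selenium-webdriver with Mocha-style hooks; the file naming is illustrative):

import * as fs from "fs";
import { Builder, WebDriver } from "selenium-webdriver";

let driver: WebDriver;

before(async () => {
  driver = await new Builder().forBrowser("chrome").build();
});

// After each failing test, save a screenshot named after the test:
afterEach(async function () {
  if (this.currentTest && this.currentTest.state === "failed") {
    const image = await driver.takeScreenshot(); // base64-encoded PNG
    fs.writeFileSync(`${this.currentTest.title}.png`, image, "base64");
  }
});

after(() => driver.quit());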

Put all this together, and you can put your project in a state where your tests are easy and quick to run, and reliably give you a good clear idea of whether your system works. All of this powers confidence, and makes it much easier to quickly and successfully build high-quality software every day.

Hopefully that’s an interesting insight into the kind of issues we see in projects, and some of our approaches to solving them!

We do encourage all our teams to find and play with their own solutions for problems they face, and we studiously avoid dogmatically sticking to any one approach, so I should note that these are just an example of a range of practices we've found effective in different cases. Don't blindly follow advice from the internet just because it sounds convincing! Take a look at the concrete problems your team is facing, and consider whether some of the approaches and practices here might help you to effectively solve your real issues.

Let us know your thoughts on all this below, or on twitter! In the next post, coming soon, we'll take a look at the next smell in the series: Rare Releases.

Your Build Smells: Avoiding Painful Merges


23 December 2015, by

This post is the first post in a series looking at Build Smells – signs of major issues in team build processes – and the things we can do to try and solve them. Watch this space, or follow us on twitter to see more of these as we publish them.

Painful Merges

If you’re not careful, it’s easy to end up in a situation on a project where you end up consistently wasting time fighting merge conflicts, rather than getting useful things done. This isn’t unavoidable, and it’s a build smell: a sign that all is not quite right here.

We see cases where this is not just occasional, but actually becomes completely standard, and there’s simply an accepted expectation that teams working together will burn time every couple of weeks, as somebody spends a day merging everybody’s changes together (or a week a month, or worse). It’s not uncommon to see cases where this is even encouraged: where everybody in the team automatically starts a branch every sprint, and everybody merges them all together all at once at the end (nearly two weeks later). This is a recipe for pain.

This wastes time and saps morale. It’s even worse than that in reality though, because if you’re continually expecting to hit merge conflicts then you can’t really refactor; any wide-ranging changes to the codebase either require everybody to down tools, or to risk huge time-wasting conflicts. You end up in a situation where your build processes and tools are pressuring you not to refactor, where your own build is encouraging you to write worse code. Renaming a method becomes a painful and potentially costly process, so you don’t.

This is a build smell. Merges should be frequent, small, and typically easy. If you see people burning large amounts of time merging, actively avoiding merging their code, and racing each other to push changes to avoid having to deal with the repercussions, you’re seeing this in action.

How do these happen?

We can examine this more closely by looking at some of the most common forms of merge conflict:

  • Textual conflicts

    Conflicts where two changes have been made that are totally independent, but which don’t have a clear resolution when merging textually.

    For example: two people adding two new fields to a class in Java, on the same line. The fact they chose the same line typically makes no difference, and the fields are typically independent, but all major merge tools will ask you to resolve this manually, as they have no way to decide which line should appear above the other, or whether that reordering can be done safely.

    Interestingly, this conflict transparently disappears if we make these changes in sequence. If one person adds a field and pushes the code, and another pulls the code and adds a field, then there are no merge conflicts, and the situation is invisibly resolved with zero extra effort from either party.

  • Simple semantic conflicts

    Conflicts where two changes have some interaction that produces incorrect behaviour, but aren’t necessarily in direct opposition. Changing a getDistance() method to return a distance in meters instead of centimeters for example, while somebody else adds some code that calls that method.

    These changes may well not textually conflict, but do have a relationship: code that’s expecting centimeters will (probably) not act correctly when given a different unit.

    (As an aside: issues like this are a good reason to include units of measure in your types – see the sketch after this list – to use things like F#'s built-in unit of measure support where possible in languages that support it, or to at least use clearer method names).

    Again though, these conflicts are trivial to resolve if the changes are made in sequence. If you are already aware that the method returns meters, because that change has already been shared, then you'll ‘resolve’ the conflict without even noticing, by writing your code with that consideration built in from the start.

  • Fundamental semantic conflicts

    These are the conflicts where you and somebody else are trying to make changes that are fundamentally in opposition.

    Perhaps I need a button to be green, and you need it to be red, or I need to make a major refactoring of a class that makes some functionality impossible, and you’re writing some new code that depends fundamentally on that functionality existing.

    These are conflicts without easy resolutions. To solve this, we’re going to have to have a discussion, and come to some compromise in our goals and overall approach.

    These issues are also far rarer than people imagine though. Try to think of a conflict recently that would’ve been an issue if you’d made your changes entirely in sequence, rather than in parallel. It’s rare that two developers on the team are chasing mutually exclusive goals at the same time.

    Notably when this does happen it’s still much easier if we can resolve these immediately. If I find out that we can’t make buttons red when I make the first button red, I’ve got a problem to solve. If I find out that we can’t make it red after I’ve spent three days redesigning the whole application in shades of burgundy then I’ve wasted a huge amount of time, I have to undo all that, and I still have the same problem to solve too.
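
On the units aside above: here's a minimal TypeScript sketch of putting units into the type system. A bare alias like type Meters = number adds no checking (any number still matches), but small wrapper classes with private fields are distinct types:

class Meters {
  constructor(private m: number) {}
  get value(): number { return this.m; }
}

class Centimeters {
  constructor(private cm: number) {}
  get value(): number { return this.cm; }
}

// The getDistance() change from the example, now explicit about its unit:
function getDistance(): Meters {
  return new Meters(1.5);
}

const distance: Meters = getDistance();      // fine
// const wrong: Centimeters = getDistance(); // compile error, rather than a silent bug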

When we hit painful merges, it’s rarely because we’ve actually hit fundamental conflicts. Most of the time, we’ve just built up a large set of textual and relatively simple semantic conflicts all together, and they’ve snowballed into a huge conglomerated mess that we now have to unravel.

The underlying conflicts here come from isolation. Code that’s kept on a branch or kept locally unpushed is code that’s isolated from everybody else on the team. They’re secret changes you’ve made, that you’re not sharing with everybody else. Because you haven’t shared them, nobody can take them into account in their own changes, and conflicts quickly proliferate.

This gets dramatically worse as isolation increases. If you keep code on a branch for a day, you have one day of your changes to potentially combine with one day of everybody else’s on the team. If you all stay on your branches for a week, you’ve got a whole week’s worth of your own changes to combine, with every other person’s week-sized chunk of work too. This quickly gets out of hand.

What can we do about it?

The solution here comes neatly out of the above: large conflicts are typically a collection of many relatively simple conflicts, and each of those would almost disappear if the changes were made in sequence. Therefore, if we want to get rid of merge conflicts, we need to move as far as possible away from isolating our codebases, and get as close to being a team of people making small changes in sequence as we can.

To fix this smell, share your code early. As early as possible; I’d suggest you integrate your code with the rest of your team every time you have code that doesn’t break anything. As soon as you have code that compiles, passes your tests, and functionally works (to a level where you’d be happy showing it to a customer): share it with your team. Sharing this won’t break anything (by definition), and completely eliminates any risk of conflicts within that code.

That’s all well and good, but much easier said than done. Let’s look at some examples:

  • Sharing tests

    If you’ve written a couple of new passing tests in a clean codebase (for example, just before you’re about to refactor something), you can commit them and share them immediately. They’re passing tests, so you can be pretty confident you’re not going to break anything, and once pushed you remove any risk that they’ll conflict with others in the team.

    If you don’t do this for a while, you build up isolated changes. When somebody attempts to refactor test code in the meantime, touch any of the dependencies you’re using, or actually touch the code you’re testing, they have no way to easily take your changes into consideration.

    Meanwhile, if you have pushed your tests, then it's easy for others to check that their changes don't break them, or to apply their wider refactorings to your tests too (often automatically, with tool-applied refactorings).

    Doing TDD, and you've written a failing test instead? There's reasonable mileage to be gained in many cases in ignoring the test (e.g. with @Ignore in JUnit), and sharing that (although this does require discipline to ensure you don't leave tests ignored long term).

    You don’t gain any guarantees in this case that the behaviour under test won’t change, but you do ensure your code will continue to compile, and others making wider changes that affect your code will fix it in the process of those changes like any other code in the codebase. By sharing your code early here, you allow people making other changes to do so with respect to yours immediately, rather than leaving you to tidy up the mess later.

  • Sharing model and component changes

    If you’ve just added a field, or a new method to the models or internal components of your system, you’re almost certainly very quickly back in a functionally working state where your changes could be shared. A remarkable amount of the code you add as you progress towards building new functionality doesn’t break your application for your users or other developers, and can be easily shared immediately.

    Added a new calculateCost() method to your Invoice class as the first step in the feature you’re building? Share that now, and the other developer who needs to refactor Invoice tomorrow for their changes won’t step on your toes, and you’ll both live happier lives.

    This is especially relevant in wider-ranging cross-cutting changes. If you've refactored big chunks of code, changed how logging works, or renamed some fields, it should be very unlikely you've made any user-facing changes to your application at all, and your tests should catch any such issues if you have. In addition, the stakes are higher: refactorings collect conflicts particularly quickly, while the majority of them won't have any fundamental conflicts at all (rename field/variable, extract method/class/interface, etc.), and the fundamental refactoring conflicts that do appear will explode far more painfully if left to ferment for a few weeks.

    Sharing changes like these early drastically reduces both the chance and the damage of conflicts, and is well worth pushing for where possible.

  • Sharing trickier changes

    Those are two examples where sharing early is particularly easy, but they don't cover every case. Often you want to make a change that is fairly large in itself, where the individual changes within it would leave broken or incomplete functionality. Rewriting the interface for a page of your application is a good example, as is performing a huge refactoring, or rewriting a major algorithm within your application; in many cases this might be a large and challenging change, and nobody wants to share a broken or incomplete page or algorithm with the rest of their team (let alone their users).

    There are things we can do in this case however. Using feature flags/toggles and branching by abstraction are two particular approaches. Both revolve around sharing code changes immediately, but isolating the user-facing functionality within the codebase (rather than isolating the functionality and the code changes, using version control).

    Branch by abstraction works by introducing an abstraction layer above the code you’re going to change, which delegates to either the old or new implementations. Initially this calls the old implementation (changing nothing), but once standalone parts of the new implementation are completed and ready the abstraction can be changed to call through to new code instead, and eventually the old implementation (and potentially the abstraction) can be removed.

    Feature flags work somewhat similarly, but dynamically, with configuration, either application-wide or in per-user settings. A conditional statement is added at a relevant branching point for your change, switching between the two implementations based on the current configuration (return flags.newUserPageEnabled ? renderNewUserPage() : renderOldUserPage() – see the sketch after this list).

    This has a few advantages, notably making local development of visible features like new pages easier (just enable the config property only on your machine), and allowing you to take this even further to become a feature in itself (as done by Github, Flickr and Travis CI), giving you fine-grained control over your feature release process, for progressive rollouts or A/B testing.
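
Here's a minimal sketch of that feature flag branching point (in TypeScript; the flag name and render functions are hypothetical stand-ins for your own code):

interface FeatureFlags {
  newUserPageEnabled: boolean;
}

// Stand-ins for your own configuration loading and rendering code:
declare function loadFlagsFromConfig(): FeatureFlags;
declare function renderNewUserPage(): HTMLElement;
declare function renderOldUserPage(): HTMLElement;

// Application-wide (or per-user) configuration, loaded at startup:
const flags: FeatureFlags = loadFlagsFromConfig();

// Both implementations live in the shared codebase, but only the configured
// one is visible to users:
function renderUserPage(): HTMLElement {
  return flags.newUserPageEnabled ? renderNewUserPage() : renderOldUserPage();
}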

All of these techniques help you share code earlier, and sharing code early enough can almost totally remove conflicts from your team’s process, and make them easier to deal with when they do happen.

I should be clear that I’m not saying branching is always a bad idea. Branching has many sensible practical and effective uses, and it’s an incredibly powerful tool. Unfortunately many teams fall back to it as their goto tool for every change they make, and they start to build the way they work around actively keeping things on branches.

Branching comes with a cost that rapidly increases over the lifetime of the branch. That's fine if you've got a specific benefit you want to get from a branch, but not a good idea if you don't. Some good reasons to use branches, for example: very experimental code that's unlikely to make it into production, situations where you can't allow everybody push access to master directly (as in open-source and some regulatory environments), and situations where your review processes don't work effectively without branches (one of the main reasons I see for not following this, as Crucible is one of few good tools that will easily let you review commits on master).

Nonetheless, in each of these cases it's still worth aiming to merge more frequently. Encourage many small pull requests to add a feature, don't allow experiments to go on too long without deciding whether they're worth committing to, and push for more, smaller reviews with minimal turnaround time before merging.

In general, smaller and more frequently shared changes disproportionately reduce the risk of each change, make refactoring vastly easier, and can often make conflicts disappear with no extra effort. A large proportion of merge conflicts are simply invisible if the changes are made in sequence instead, and sharing code earlier moves you closer and closer to this situation.

Finally, it’s notable that this way of working isn’t just a benefit for conflict resolution. It is a good idea all the time to aim to break your work into many small chunks, with many steps at which the code is fully working. Doing so helps avoid yak shaving, makes each of your changes easier to understand, and has lots of benefits for time management and estimation too. There’s a lot of upsides.

Caveats

Hopefully that’s an interesting insight into the kind of issue we see in projects, and some of our approaches to improving them!

We do encourage all our teams to find their own solutions independently for their problems however, and I’d be lying if I claimed this is our standard go-to approach in every case. It is an approach that many of us have found effective in places though, and it’s well worth considering as an option if you’re facing these same challenges in your team.

I should be clear also that I’m not suggesting that branches are never useful, that this is the One True Way of managing version control, or that you should dive into this 100% immediately. If your team is facing concrete issues with conflicts and merges, experiment with reducing isolation and sharing code earlier and more frequently, and measure and discuss the changes you see in practice. We’ve found benefits in many cases, but that is by no means definitive, and there will be cases where this is not the right choice.

If you’re interested in more thoughts on this topic, I do recommend the writings of Martin Fowler, particularly around feature branches, and Jez Humble and Dave Farley’s book ‘Continuous Delivery‘.

Let us know your thoughts on all this below, or on twitter! In the next post, coming soon, we'll take a look at the next smell in the series: Nervous Pushes.

24 Pull Requests: The Hackathon


18 December 2015, by

We at Softwire care about open-source, and for the last 3 years every December we’ve been very involved in 24 Pull Requests, an initiative to encourage people to give back to the open-source community in the 24 days of advent up to Christmas.

That’s still ongoing (although we’re doing great so far, and watch this space for details once we’re done), but this year we decided to kick it up a notch further, and run an internal full-day hackathon, which we recently ran, on the 7th. This came with free holiday for employees (with 1/2 a day of holiday from our morale budget and 1/2 a day from our holiday charity matching scheme), to make it as easy as possible for people to get involved, and prizes for the first 3 contributions and the best 3 contributions (judged at the end).

The Softwire team enjoy a good breakfast

We kicked the day off with a cooked breakfast buffet at 9, followed by a brief bit of warm-up and setup, and then a team stand-up with the 20 or so participants. Everybody quickly went round and outlined what they were looking at (or what they wanted to do, if they didn't have an idea yet), so people doing similar things could find each other, and people with more enthusiasm than specific plans could find inspiration or others to pair with and help out. And from there, we were off!

We then managed to get the prizes for the first 3 pull requests all claimed by 11.30, with Rob Pethick adding missing resources in Humanizer, Hereward Mills removing some unused code from Cachet, and Dan Corder moving AssertJ from JUnit parameterized tests to JUnitParams (quickly followed by a stream of further PRs just slightly behind).

This pace then accelerated further through the afternoon, as people found projects and got more set up, resulting in a very impressive 46 pull requests total by the end of the day (taking us to a total of 89 for the month so far).

These covered all sorts of projects and languages: a whole range of interesting and slightly larger changes, along with the many small typo fixes, documentation updates, setup instruction fixes and so on that are so important in really making open-source projects polished, reliable and usable.

Finally, the most credit must of course go to our winning three pull requests of the day:

  1. Adding some detailed and fiddly tests for various statistical distributions in jStat, and fixing the bugs that they found too, by David Simons
  2. Fixing a tricky Sequelize bug, to support column removal with MSSQL DBs, by Michael Kearns
  3. Improving the accuracy of error highlighting in the output of the TypeScript compiler (despite knowing neither the compiler nor even the language beforehand), by Iain Monro

Our team hard at work

Great work all around. We enjoyed an excellent day of open-sourcing, made a lot of progress, and we’re now in a great place to do more and more open-source over the coming weeks, and beyond once 24PRs is complete.

We’re now up to 96 pull requests in total, and counting, and we’re hoping to keep making progress up the 24PRs teams leaderboard as it continues!

Watch this space for more updates on the tools we’ve been working on and interesting changes we’ve been making, or follow us on twitter for extra details.

Typing Lodash in TypeScript, with Generic Union Types


5 November 2015, by

TypeScript is a powerful compile-to-JS language for the browser and node, designed to act as a superset of JavaScript, with optional static type annotations. We’ve written a detailed series of posts on it recently (start here), but in this post I want to talk about some specific open-source work we’ve done with it around Lodash, and some of the interesting details around how the types involved work.

TypeScript

For those of you who haven't read the whole series: TypeScript gives you a language very similar to JavaScript, but including future features (due to the compile step) such as classes and arrow functions, and support for more structure, with its own module system and optional type annotations. It allows you to annotate variables with these type annotations as you see fit, and then uses an extremely powerful type inference engine to automatically infer types for much of the rest of your code from there, automatically catching whole classes of bugs for you immediately. This is totally optional though, and any variables without types are implicitly assigned the ‘any’ type, opting them out of type checks entirely.

This all works really well, and TypeScript has quite a few fans over here at Softwire. It gets more difficult when you're using code written outside your project though, as most of the JavaScript ecosystem is written in plain JavaScript, without type annotations. This takes away some of your exciting new benefits; every library object is treated as having ‘any’ type, so all method calls return ‘any’ type, and passing data through other libraries quickly untypes it.

Fortunately the open-source community stepped up and built DefinitelyTyped, a compilation of external type annotations for other existing libraries. These ‘type definitions’ can be dropped into projects alongside the real library code to let you write completely type-safe code, using non-TypeScript libraries.

This is great! Sadly, it’s not that simple in practice. These type definitions need to be maintained, and can sometimes be inaccurate and out of date.

In this article I want to take a look at a particular example of that, around Lodash’s _.flatten() function, and use this to look at some of the more exciting newer features in TypeScript’s type system, and how that can give us types to effectively describe fairly complex APIs with ease.

What’s _.flatten?

Let’s step back. Lodash is a great library that provides utility functions for all sorts of things that are useful in JavaScript, notably including array manipulation.

Flatten is one of these methods. Flattening an array unwraps any arrays that appear nested within it, and includes the values within those nested arrays instead. Flatten also takes an optional second boolean parameter, defining whether this process should be recursive. An example:

 

_.flatten([1, 2, 3]);                     // returns [1, 2, 3] - does nothing

_.flatten([[1], [2, 3]]);                 // returns [1, 2, 3] - unwraps both inner arrays

_.flatten([[1], [2, 3], 4]);              // returns [1, 2, 3, 4] - unwraps both inner arrays,
                                          // and includes the existing non-list element

_.flatten([[1], [2, 3], [[4, 5]]]);       // returns [1, 2, 3, [4, 5]] - unwraps all arrays,
                                          // but only one level

_.flatten([[1], [2, 3], [[4, 5]]], true); // returns [1, 2, 3, 4, 5] - unwraps all arrays 
                                          // recursively

 

This is frequently very useful, especially in a collection pipeline, and is fairly easy to describe and understand. Sadly it’s not that easy to type, and the previous DefinitelyTyped type definitions didn’t provide static typing over these operations.

What’s wrong with the previous flatten type definitions?

Lots of things! The _.flatten definitions include some method overloads that don't exist, some unnecessary duplication, and incorrect documentation, but most interestingly their return type isn't based on their input, which takes away your strict typing. Specifically, the method I'm concerned with has a type definition like the below:

 

interface LoDashStatic {
  flatten<T>(array: List<any>, isDeep?: boolean): List<T>;
}

 

This type says that the flatten method on the LoDashStatic interface (the interface that _ implements) takes a list of anything, and an optional boolean argument, and returns an array of T’s, where T is a generic parameter to flatten. Because T only appears in the output though, not the type of our ‘array’ parameter, this isn’t useful! We can pass a list of numbers, and tell TypeScript we’re expecting a list of strings back, and it won’t know any better.

We can definitely do better than that. Intuitively, you can think of the type of this method as being (for any X, e.g. string, number, or HTMLElement):

 

_.flatten(list of X): returns a list of X
_.flatten(list of lists of X): returns a list of X
_.flatten(list of both X and lists of X): returns a list of X

_.flatten(list of lists of lists of X): returns a list of list of X (unwraps one level)
_.flatten(list of lists of lists of X, true): returns a list of X (unwraps all levels)

 

(Ignoring the case where you pass false as the 2nd argument, just for the moment)

Turning this into a TypeScript type definition is a little more involved, but this gives us a reasonable idea of what’s going on here that we can start with.

How do we describe these types in TypeScript?

Let’s start with our core feature: unwrapping a nested list with _.flatten(list of lists of X). The type of this looks like:

flatten<T>(array: List<List<T>>): List<T>;

Here, we say that when I pass flatten a list that only contains lists, which contain elements of some common type T, then I’ll get back a list containing only type T elements. Thus if I call _.flatten([[1], [2, 3]]), TypeScript knows that the only valid T for this is ‘number’, where the input is List<List<number>>, and the output will therefore definitely be a List<number>, and TypeScript can quickly find your mistake if you try to do stupid things with that.
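
For example, a quick sketch of what that catches:

var ns = _.flatten([[1], [2, 3]]); // T inferred as number; ns is List<number>

// A wrong expectation is now a compile error, not a runtime surprise:
// var ss: List<string> = _.flatten([[1], [2, 3]]); // error: number is not assignable to string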

That’s not sufficient though. This covers the [[1], [2, 3]] case, but not the ultra-simple case ([1, 2, 3]) or the mixed case ([[1], [2, 3], 4]). We need something more general that will let TypeScript automatically know that in all those cases the result will be a List<number>.

Fortunately, union types let us define general structures like that. Union types allow you to say a variable is of either type X or type Y, with syntax like: var myVariable: X|Y;. We can use this to handle the mixed value/lists of values case (and thereby both other single-level cases too), with a type definition like:

flatten<T>(array: List<T | List<T>>): List<T>;

I.e. if given a list of items that are either type T, or lists of type T, then you’ll get a list of T back. Neat! This works very nicely, and for clarity (and because we’re going to reuse it elsewhere) we can refactor it out with a type alias, giving a full implementation like:

 

interface MaybeNestedList<T> extends List<T | List<T>> { }

interface LoDashStatic {
  flatten<T>(array: MaybeNestedList<T>): List<T>;
}

 

Can we describe the recursive flatten type?

That fully solves the one-level case. Now, can we solve the recursive case as well, where we provide a 2nd boolean parameter to optionally apply this process recursively to the list structure?

No, sadly.

Unfortunately, in this case the return type depends not just on the types of the parameters provided, but the actual runtime values. _.flatten(xs, false) is the same as _.flatten(xs), so has the same type as before, but _.flatten(xs, true) has a different return type, and we can’t necessarily know which was called at compile time.

(As an aside: with constant values technically we could know this at compile-time, and TypeScript does actually have support for overloading on constants for strings. Not booleans yet though, although I’ve opened an issue to look into it)

We can get close though. To start with, let's ignore the false argument case. Can we type a recursive flatten? Our previous MaybeNestedList type doesn't work, as it only allows lists of X or lists of lists of X, and we want to allow ‘lists of (X or lists of)* X’ (i.e. any depth of list, with an eventually common contained type). We can do this by defining a type alias similar to MaybeNestedList, but making it recursive. With that, a basic type definition for this (again, ignoring the isDeep = false case) might look something like:

 

interface RecursiveList<T> extends List<T | RecursiveList<T>> { }

interface LoDashStatic {
  flatten<T>(array: RecursiveList<T>, isDeep: boolean): List<T>;
}

 

Neat, we can write optionally recursive type definitions! Even better, the TypeScript inference engine is capable of working out what this means, and inferring the types for this (well, sometimes. It may be ambiguous, in which case we’ll have to explicitly specify T, although that is then checked to guarantee it’s a valid candidate).

Unfortunately when we pass isDeep = false, this isn’t correct: _.flatten([[[1]]], false) would be expected to potentially return a List<number>, but because it’s not recursive it’ll actually always return [[1]] (a List<List<number>>).

Union types save the day again though. Let’s make this a little more general (at the cost of being a little less specific):

flatten<T>(array: RecursiveList<T>, isDeep: boolean): List<T> | RecursiveList<T>;

We can make the return type more general, to include both potential cases explicitly. Either we're returning a totally unwrapped list of T, or we're returning a list that contains at least one more level of nesting (conveniently, this has the same definition as our recursive list input). This is actually a little duplicative, as List<T> is itself a RecursiveList<T>, but including both definitions is a bit clearer, to my eye. This isn't quite as specific as we'd like, but it is now always correct, and still much closer than the original type (where we essentially had to blindly cast things, or accept any-types everywhere).

Putting all this together

These two types together allow us to replace the original definition. We can be extra specific and include both by removing the optional parameter from the original type definition, and instead including two separate definitions, as we can be more specific about the case where the boolean parameter is omitted. Wrapping all that up, this takes us from our original definition:

 

interface LoDashStatic {
  flatten<T>(array: List<any>, isDeep?: boolean): List<T>;
}

 

to our new, improved, and more typesafe definition:

 

interface RecursiveList<T> extends List<T | RecursiveList<T>> { }
interface MaybeNestedList<T> extends List<T | List<T>> { }

interface LoDashStatic {
  flatten<T>(array: MaybeNestedList<T>): List<T>;
  flatten<T>(array: RecursiveList<T>, isDeep: boolean): List<T> | RecursiveList<T>;
}

 

You can play around with this for yourself, and examine the errors and the compiler output, using the TypeScript Playground.

We’ve submitted this back to the DefinitelyTyped project (with other related changes), in https://github.com/borisyankov/DefinitelyTyped/pull/4791, and this has now been merged, fixing this for Lodash lovers everywhere!

Open-Source at Softwire


4 September 2015, by

Here at Softwire, we’re pretty keen on open-source. A huge amount of the work we do day to day depends on the community-developed tools that power the ecosystems around languages like Java, C# and JavaScript.

Sometime it’s nice to give back. There’s the lovely warm feeling of Doing Good, but then we’ve found it’s also a great way to improve your knowledge of your tools, train your development skills outside of normal project work and give ourselves (and everybody else) better tools to use in future. It feels good to be able to look back at the improvements you’ve made to, and to reflect on work we’ve been doing recently.

That’s what I want to do with this post: highlight some of the great contributions Softwire people have made back to the open-source community so far this year, and talk about options for finding things to help out with and getting more involved (both for us, and to encourage any of you who feel keen after reading this!).

Recent open-source contributions we’re proud of:

Want to get involved?

Inspired by any of the above, and interested in trying your hand at a bit of open-source contribution yourself? We’ve put together a list of useful ways we’ve found to find and dig into interesting projects:

Help write Firefox’s Dev Tools

Mostly just needs JS/HTML/CSS skills. firefox-dev.tools has a list of bugs for total beginners to get started with, including a filter to find ‘mentored’ bugs, where there’s somebody assigned to help whoever picks the bug up first get set up and going. wiki.mozilla.org/DevTools/GetInvolved has more details.

Interested in helping out with other Mozilla open-source projects (Firefox OS, Rust, Servo, Firefox itself)? Take a look through whatcanidoformozilla.org.

Subscribe to interesting projects on CodeTriage

CodeTriage lets you search projects by their number of open issues and language, e.g. Java, C# or JavaScript. Get emailed bugs in projects you subscribe to, and have a go at fixing any that sound interesting.

Help write better documentation

DocsDoctor will send you bits of documentation from projects you subscribe to; take a look through them, think about whether they could be improved or clarified, and put in a quick patch to make everything more understandable for everybody.

For the more hardcore, you can also sign up to get sent totally undocumented methods and classes, and try to put together some useful notes on what they're for and how to use them.

Make the tools you use day to day better for everybody

What libraries are you using at the moment? Do they work perfectly? Anything in the API that’s confusing, any poor documentation, or weird behaviour you have to work around? Send them a quick bug on Github, or have a look at fixing it yourself, and make your life (and everybody else’s) easier forevermore.

Come work for Softwire

We love this stuff, and we’re hiring.

 

Introduction to TypeScript: More Than Just Types


27 August 2015, by

In this post we’re going to take a look at the exciting extra features of TypeScript, and see what provides beyond just JavaScript with types.

This is the final post in our 4-part series on TypeScript, so do take a look at our previous posts on type inference, type annotations and type declarations for more background.

ES2015 (aka ES6)

ES2015 has introduced a whole swathe of features, including arrow functions, classes, let/const and promises. Browser vendors are rapidly working to add these features, and so is TypeScript, in its ongoing effort to act as a superset of standard JavaScript syntax. Progress on this so far is good, with around 52% support for the new features added, compared to 48% for Chrome, 67% for Firefox and 66% for Edge. That 52% figure does require a browser that supports that ES6 syntax though (or a polyfill, like core-js), in addition to you compiling the TypeScript with the ‘--target ES6‘ option. Still, this means you can typically immediately start using ES6 features supported by your target browser set in TypeScript.

We can do better than this though. In lots of cases ES2015 features can be compiled back into ES5 or even ES3-compatible code, allowing you to use new ES2015 features right now, and run it even in substantially older browsers. This works to varying degrees, and is difficult to accurately measure since there’s a few specific cases that can’t be effectively worked around in special circumstances (like closuring a ‘let’ declaration within a loop), but overall TypeScript does currently support around 30% of the ES2015 features even when targeting ES5.

That means out of the box when writing TypeScript in an environment that doesn’t support ES2015, you can still immediately use:

  • Arrow functions:  (x) => x * 2
  • Destructuring:  var {a, b} = {a:1, b:2}
  • Spread operator:  function f(...x) { }; var xs = [1, 2, ...iterable, 5, 6];
  • For/of:  for (var x of [1, 2, 3]) { ... }
  • Let/const:  let x = 1; const y = "hi"
  • Template strings:  y = `hello ${myVar}`
  • Tagged template strings:  y = escapeHtml`<script>...</script>`
  • Classes (as seen in our previous post)
  • ES2015 modules (see ‘module systems’ below)
  • Unicode characters outside BMP:  "\u{1f4a9}"
  • Default parameters:  function f(a = 1) { }

And probably more, although there doesn’t seem to be an authoritative list anywhere. All of these get successfully compiled back into their non-ES2015 equivalents, allowing you to write enjoyably modern and clean ES2015 code without giving up on compatibility with older browsers. The TypeScript team are aiming to extend this further, wherever it’s possible to do so with reasonably simple and performant equivalents. All of the above fit that, and a few cases have no runtime impact at all: let/const are emitted just as ‘var’ statements for example, but checked for correct usage at compile time instead.

Paste any of above into the TypeScript Playground and take a look at the resulting compiled ES5 code that appears, if you want to see how this works in practice.
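
For example, here's a small sketch of two of those features, and (roughly) the ES5 output the compiler produces for them:

// ES2015-style TypeScript input:
const double = (x: number) => x * 2;
let greeting = `result: ${double(21)}`;

// Compiles to roughly this ES5 output:
//   var double = function (x) { return x * 2; };
//   var greeting = "result: " + double(21);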

Extended class syntax

In addition to supporting the ES2015 class syntax, TypeScript also includes a few extensions. Some of these we’ve seen in previous posts, such as field visibility (public/private/protected), but there’s a couple of other interesting features.

Constructor parameter properties

class MyClass {
  constructor(private x: string) {}
}

In a syntax similar to that of Scala, TypeScript provides a shorthand to let you take a parameter in your constructor and immediately assign it to a public, private or protected field. The above code is exactly equivalent to (but shorter and simpler than):

class MyClass {
  private x: string;
  constructor(x: string) {
    this.x = x;
  }
}

Decorators

Taking a leaf from the current ES2016 (ES7) drafts, in turn heavily inspired by Python, TypeScript 1.5 includes support for class, method, property and parameter decorators. These are quite complicated and very new (feel free to read the ES7 proposed implementation docs for all the gory details), but provide some exciting new options for flexibility and DRY code when defining classes.

The usage of the 4 various decorator types looks like:

@myClassDecorator
class AClass {

  @myPropertyDecorator
  private x: string;
  
  @myMethodDecorator
  public f(@myParamDecorator x: number) { }  

}

In each case, the decorator wraps the decorated element, and potentially redefines or extends how it works.

You can for example create a @log method decorator that logs a message before and after the method is called, to give you a trace you can analyse to understand the flow in your application. Each form of decorator works in a slightly different way to allow this kind of wrapping; a class decorator is given the class’s constructor for example, and must return the new constructor that will replace it.
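
As a minimal sketch of that @log idea (a method decorator receives the target, the method name and its property descriptor; this needs the --experimentalDecorators compiler flag):

function log(target: any, name: string, descriptor: PropertyDescriptor): PropertyDescriptor {
  const original = descriptor.value;
  // Replace the method with a wrapper that logs entry and exit:
  descriptor.value = function (...args: any[]) {
    console.log(`Entering ${name}`, args);
    const result = original.apply(this, args);
    console.log(`Leaving ${name}`, result);
    return result;
  };
  return descriptor;
}

class Calculator {
  @log
  add(a: number, b: number): number {
    return a + b;
  }
}

new Calculator().add(1, 2); // logs entry with [1, 2], then exit with 3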

Digging into the depths of this is really out of the scope of this post, but decorators are a complicated and powerful tool for sharing cross-cutting logic between multiple classes that’s well worth learning to keep your code clean as your codebase expands. Take a look at this excellent TypeScript writeup if you’re keen to start looking into understanding and using these immediately (but I’d recommend getting to grips with the rest of TypeScript first).

Note again, as with many of the ES2015 features, that this is backward compatible and happily compiles down to standard ES5 code, so you can start writing code using it immediately.

Module systems

For a long time JavaScript has had an ongoing battle between various approaches to modularization, from globals to IIFE to CommonJS and AMD, to ES6 modules, and more, each introducing different syntax, functionality, and new sets of problems.

TypeScript lets you push straight past this. TypeScript includes its own standard built-in module system, sticking closely to the standard ES6 approach, and automatically hooked into the type system, so it can understand the type of the value that's just been imported by looking at its corresponding source at compile time. The syntax for this is fairly simple and easy to understand if you've ever used either CommonJS or the new ES6 modules:

file1.ts:

export default function myFunction(): boolean {
  ...
  return true;
}

file2.ts:

import myFunction from "./file1";

var y: boolean = myFunction(); // understands types from the other file automatically

There’s more to it than this, but that’s the essence. The reason this is particularly intesting though, rather than just being another ES2015 feature, is that the output module format of TypeScript compilation is configurable. By specifying the ‘--module‘ argument, you can compile from this to code that supports any other module format supported. Currently that’s:

  • CommonJS
  • AMD
  • UMD
  • SystemJS
  • ES6 (by specifying ‘--target es6‘ and not specifying the ‘module’ argument)

This lets you write code in the most modern and powerful format available (ES2015 modules), while transparently compiling back to whatever format you need to use to integrate with the rest of your existing code and libraries.

Coming soon…

All this is already in TypeScript, but there’s more coming soon too! The full roadmap is available at github.com/Microsoft/TypeScript/wiki/Roadmap, and it’s worth being aware of some of the particular highlights:

  • Async/await – drawn from ES2016, inspired in turn by C#. Synchronous looking and feeling APIs for async code, to make asynchrony as frictionless as possible
  • Generators – another ES2015 feature, this one coming from Python. Syntax for describing functions that can repeatedly return values on demand, letting you treat them as iterables
  • JSX support – fluent syntax for creating HTML elements when using React

The End

This concludes our 4-part series on TypeScript. Hopefully it's been an interesting insight into the language, and how and why you might use it, and has given you an itch to try it out! TypeScript is an exciting language, providing safety, structure, and a host of powerful features that can be used incrementally on top of the JavaScript you're already writing, giving you remarkable flexibility to progressively pull modern development practices from a range of other languages into your JS.

If you are interested in having a closer look, the easiest way to quickly have a go yourself is with the TypeScript playground. Setting it up in a real project isn't much more complicated: you can build your code directly with the official command-line tool, or there are plugins for grunt, gulp, broccoli, maven, gradle and probably whatever the latest flavour-of-the-month build tool is. There are also integrations in plenty of development tools, including IntelliJ, Visual Studio, Atom and Sublime Text.

Play around, try it out on your next project, and let us know your thoughts either at @SoftwireUK, or in the comments below.

Introduction to TypeScript: Type Declarations


19 August 2015, by

This is the 3rd post in our series on TypeScript. Take a look at the first and second posts in this series for an introduction to the basics of TypeScript, type inference, and type annotations. In this post we're going to take a more detailed look at how we can define our own types in TypeScript, to be used by the inference engine and in our own annotations.

Defining your own types

Up until now we’ve looked at ways to describe types in terms of the predefined types available (e.g. string, number or HTMLElement). This is powerful alone, but defining our own types on top of this dramatically increases its usefulness. In TypeScript there are a few main ways to define a new type:

    • Type aliases: type ElementGenerator = (a: string, b: string, c: boolean) => HTMLElement

      The easiest option is to just define a new, simpler name for an existing type. Type aliases (added in 1.4) let you do this. The new type can then be used exactly as you would any other type, but with a shorter, snappier name.

      Note that this is mostly just adding readability to your code: unlike equivalents in other languages (such as Haskell's newtype), structural typing means this doesn't increase type safety. Since types are still matched only structurally, you can't define type minutes = number and use it to check that a variable is only set with values explicitly specified as being minutes; any number is always a valid minute.

    • Interfaces
      interface MyInterface {
        property1: number;
        anotherProperty: boolean[];
        aMethod(element: HTMLElement): boolean;
        eventListener: (e: Event) => void;  
      }
      
      // Interfaces can be extended, as in most other modern OO languages
      interface My2ndInterface extends MyInterface {
        yetAnotherProperty: HTMLElement;
      }
      
      // Similarly, interfaces can be generic as in other languages
      interface MyGenericInterface<T> {
        getValue(): T;
      }
      
      interface FunctionInterface {
        // Objects fulfilling this are callable functions, taking a number and returning a boolean
        (x: number): boolean;
      
        // We can also have hybrid interfaces: functions that also have properties (like jQuery's $)
        myProperty: string;
      }
      
      interface DictionaryInterface {
        // Objects fulfilling this act as dictionaries, with arbitrary string keys and numeric values
        [key: string]: number;   
      }
      

      Interfaces act mostly as you'd expect from languages like Java, C# and Swift, with a few extra features. Any function or variable that is annotated or inferred to have the type of the interface will only accept values that match it, giving you guarantees that your values are always what you expect. Note too that all of this is just for compile-time checking; it's all thrown away by the compiler after compilation, and your interfaces don't exist in the resulting compiled JS.

      The key difference from most other languages is that this type checking is done purely structurally. A value matches an interface simply because it has the same shape, not because it’s explicitly declared to be a member of that interface anywhere else.

    • Classes
      class MyClass extends MySuperclass implements AnInterface {
        // Fields and methods can have private/public modifiers (they're otherwise public by default)
        private myField: number;
      
        constructor(input: number) {
          super("hi"); // Subclasses have to call superclasses constructors before initializing themselves
          this.myField = input * 2;
        }
      
        calculateAnImportantValue(newInput: number): number {
          return this.myField * newInput + 1;
        }
      
        // Classes can include property accessors
        get propertyAccessor(): number {
          return this.myField;
        }
      
        static myStaticFunction(xs: number[]): MyClass[] {
          return xs.map(function (x) {
            return new MyClass(x);
          });
        }
      }
      
      // Classes are instantiated exactly as in vanilla JavaScript.
      // (The type annotation here is just for clarity; it'd be inferred anyway)
      
      var instance: MyClass = new MyClass(10);
      

      Classes simultaneously define two things: a new type (the corresponding interface for the class), and an implementation of that type. The implementation acts the same as the new built-in class functionality in ES2015 (ES6) – defining a constructor function and attaching methods to its prototype.

      You’re only allowed one implementation per function in TypeScript (including the methods and constructor here), to ensure compatibility when the compiled code is used from pure JavaScript, so there’s no method overloading like you might’ve seen elsewhere. You can have multiple type definitions for a single function though to simulate this yourself, although it’s a bit fiddly. Take a look at function overloads in the TypeScript handbook for the gory details.

    • Enums
      enum Color {
        Red,
        Green,
        Blue
      }
      
      var c: Color = Color.Blue;
      

      Another nice feature, straight from the standard OO playbook. Enums let you define a fixed set of values with friendly, usable names (underneath they’re each just numbers, plus an object that lets you look up numbers and names from one another).

      Structural typing is actually something of a hindrance here however, limiting the value of enums compared to elsewhere. Enums are far more powerful in nominal type systems; in TypeScript you sadly can’t check that a method taking a Color from above isn’t handed any old potentially invalid number instead, for example. Take a look at the TypeScript playground at http://goo.gl/FZMzaj for a happily compiling but totally incorrect example, or the sketch just after this list.

      Nonetheless, while enums’ safety-giving power is limited, they can still bring quite a bit of clarity and simplicity to your code, and are definitely a useful tool regardless.
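
To make those caveats concrete, here’s a minimal sketch of both limitations (the wait and paint functions are invented for illustration, and the permissive behaviour shown is as the compiler behaves at the time of writing):

// Aliases add readability, not nominal safety
type minutes = number;

function wait(m: minutes): void { /* ... */ }
wait(999999); // compiles happily: any number is a valid 'minutes'

// Enums are similarly permissive: they're just numbers underneath
enum Color { Red, Green, Blue }

function paint(c: Color): void { /* ... */ }
paint(42); // also compiles, despite 42 being no Color at all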

Describing external types

Sometimes you want to use code that you didn’t write from TypeScript, but you’d still like it to be safely typed. Fortunately TypeScript supports exactly that.

Using the type definitions above we can describe the full shape of an external library entirely outside it, without having to change its code at all. Structural typing means the original library doesn’t need to specify which interfaces it supports; we just need a definition of the library’s interface, and to tell TypeScript that a variable matching that interface is available.

To do that, we use ambient modules; these describe modules of code that are defined outside our codebase. They can be either ‘internal’ (the variables declared are already defined in the global scope, typically by a script tag somewhere) or ‘external’ (the variables declared are exposed by a library that needs to be loaded through a module loader like CommonJS or AMD – we’ll look at TypeScript modules in a later post).
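
As a flavour of this, a minimal ambient declaration for a hypothetical global library (one exposed by a script tag, say) looks something like:

// Declares a global 'legacyLib' that exists at runtime but is defined outside our codebase
declare var legacyLib: {
  version: string;
  transform(input: string): string;
};

// Our usage is now fully type checked, with no changes to the library itself
var transformed = legacyLib.transform("hello"); // inferred as string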

This is all very interesting, but helpfully you don’t really need to go much deeper than that for now. The syntax here isn’t particularly important for day-to-day TypeScript development (although the relevant section of the TypeScript handbook includes a few illustrative examples), because it’s already been done for you, for almost every library you’ll use, as part of a project called DefinitelyTyped.

DefinitelyTyped includes not only type definitions for every library you might want (e.g. jQuery, lodash or loglevel), but also its own package manager, TSD, to automatically retrieve and update these type definitions for you.

To get started with this, just install TSD (npm install tsd -g), install the type definitions you need (tsd install jquery knockout moment --save), and ensure they’re referenced in your compilation process (e.g. include them as files to compile on the command line to tsc, add them to your input files list in grunt-ts, or use <reference> tags). TypeScript will know the variables exposed by each library and their types, giving you strong static typing wherever they’re used.
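
For example, assuming a typings folder laid out as TSD creates it, referencing the jQuery definition immediately gives you typed jQuery:

/// <reference path="typings/jquery/jquery.d.ts" />

// TypeScript now knows the type of the global '$'
var heading = $("h1").text(); // inferred as string
var oops = $("h1").textt();   // compile error: no such method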

Bonus TypeScript Features

With this 3rd post, you’ve now seen the core of everything TypeScript has to offer: automatically inferring types, manually annotating types yourself, and defining your own types to extend inference and annotation even further.

That’s not all TypeScript has to offer though. On top of this, TypeScript adds a selection of interesting bonus features, drawn from both ES2015 (ES6) and other languages, but compiling down into backward-compatible JavaScript you can run anywhere. Watch this space for the 4th and final post in this series, where we’ll take a closer look at exactly what’s available there, and how you can put it to use.

Introduction to TypeScript: Type Annotations


14 August 2015, by

This is the 2nd post in our series on TypeScript. Take a look at the first post in this series for a bit more of an introduction to the basics of TypeScript, and the powers of type inference. In this post we’re going to take a more detailed look at the type annotations TypeScript provides to explicitly describe types, extending the power of static typing over the code that type inference can’t quite cover.

Extending types beyond pure inference

The simplest approach to typing your TypeScript code is to let type inference do it for you. This works well for locally defined and used variables, but falls down where TypeScript can’t see enough context to know exactly what values we’re expecting. TypeScript infers types for variables from the values they’re initialised to, and from seeing them either returned by or provided as arguments to functions for which it already has types. That still leaves many cases uncovered though, particularly the types of arguments in new function definitions, types for variables that aren’t immediately initialised, and any use of variables from outside the compiled TypeScript code (e.g. code coming from external JavaScript libraries).

This doesn’t necessarily result in a failure to compile your code. Variables that don’t have inferable types are given the ‘any’ type: a dynamic type that opts them out of type checking, and blindly trusts their usage. You can disable this by enabling the noImplicitAny flag to require strict typing everywhere, but the default is often useful behaviour initially; treating unknown variables as any allows you to gradually type a codebase, rather than forcing you to ensure everything is fully typed immediately. It’s rarely what you want in the long term though. Types catch bugs, and the more specific you can be about the types you’re expecting, the more mistakes you’re going to catch at compile time, before they hurt.
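
As a quick (hypothetical) example of what that implicit ‘any’ lets through:

// 'input' has no inferable type, so it's implicitly 'any' and its usage is blindly trusted
function shout(input) {
  return input.toUpperCase() + "!";
}

shout(42); // compiles by default, but crashes at runtime; --noImplicitAny rejects the untyped definition itself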

In these cases then, where TypeScript can’t infer our types, how do we specify them? First, the basics:

var x: string;

function aFunctionWithTypedArguments(a: number, b: Array<number>): void { ... }

function aFunctionReturningAnElement(): HTMLElement { ... }

function aGenericFunction<T>(arg: T): Array<T> { ... }

Here we annotate types on a variable, function arguments, function return types, and a generic function’s argument and return types. With these in place the compiler can now validate that these types are respected (refusing any attempt to call the 1st function with two numbers, for example), and can use them in future inferences (for example automatically inferring the type of ‘c’ in var c = aFunctionReturningAnElement();).
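
For instance, given the annotations above, both of the lines below become compile-time errors (a sketch, assuming the functions have real bodies):

aFunctionWithTypedArguments(1, 2); // error: 2 is not an Array<number>

var c = aFunctionReturningAnElement(); // c is inferred as HTMLElement...
c.toFixed(2);                          // ...so this is an error: HTMLElement has no toFixed method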

Hopefully this is fairly intuitive to anybody who’s written code in a statically typed language before; all of this acts just as you’d expect coming from languages like Java or C#.

Note the generic code particularly. While this might look complex to anybody only familiar with JavaScript, it’s fundamentally the same as the generics used throughout many popular statically typed languages. aGenericFunction here is a function that takes an argument of any type, and returns an array of that type: e.g. aGenericFunction(1) is guaranteed to return an array of numbers.
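
One possible implementation makes the relationship clear – whatever type goes in, an array of that type comes out:

function aGenericFunction<T>(arg: T): Array<T> {
  return [arg]; // just for illustration
}

var nums = aGenericFunction(1);    // inferred as Array<number>
var words = aGenericFunction("a"); // inferred as Array<string>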

More complex type annotations

That’s it for the simple case. Unfortunately JavaScript has quite a few more complicated types than this though, and TypeScript aims to let you to describe all of the types that we see in real world JS. To do this TypeScript provides some more unusual types to effectively describe more complex structure:

  • Inline function types: var callback: (e: Event) => boolean

    JavaScript APIs tend to be very fond of passing functions around, particularly for callbacks, and the type system has to be able to keep up. The signature for these is simple: brackets listing the argument types, and an arrow (=>) pointing to the return type.

  • Anonymous object types: var x: { name: string };

    TypeScript has a structural type system: a variable matches a type if it has the same structure. X matches type T if X has all the properties that T has, with the same types. You can think of this as compile-time duck typing. This differs drastically from languages like C# or Java with nominal type systems, where types match only if there’s an explicit relationship between them (e.g. one is a subclass of the other, or explicitly implements its interface). The end result is that you can define types by their structure alone. Above, for example, is a variable that can be assigned any object with a name property that’s a string. TypeScript will then allow you to set it to any kind of object from any source, as long as it fulfils that description, and catch any cases that don’t fit at compile time (see the sketch after this list). This is key, as lots of existing JavaScript depends on duck typing, and would be extremely difficult to externally type with a more traditional OO type system.

  • Union types: var x: string|number

    Union types are a fairly new TypeScript feature, added in 1.4. They allow you to describe a variable as being of either type A or type B, and will only allow you to perform operations valid on both. To then pick a specific one of the two options you can use type guards: if (typeof x === "string") { ... }. TypeScript’s inference engine understands type guard expressions like these, and will allow you to use the variable as the specific type you’ve checked for within the body of the if block (again, see the sketch after this list). In addition TypeScript also has explicit casting, like many statically typed languages, for cases where you want to tell the compiler you’re already sure what type something is (var y = <number> x;). Like structural typing, union types are useful because they closely match common patterns in existing JavaScript code. Many libraries (e.g. jQuery) return completely different types of variable depending on the specific arguments provided at runtime, and this provides a very effective way of explicitly describing and handling that case.
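
To make the last two ideas concrete, here's a minimal sketch (greet and describe are invented for illustration):

// Structural typing: anything with a string 'name' property matches
function greet(subject: { name: string }): string {
  return "Hello, " + subject.name;
}

var lib = { name: "TypeScript", version: 1.5 };
greet(lib); // fine: extra properties don't matter, the shape matches

// A union type, narrowed with a typeof type guard
function describe(x: string|number): string {
  if (typeof x === "string") {
    return x.toUpperCase(); // x is narrowed to string in this block
  }
  return x.toFixed(2); // and to number here
}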

Defining your own types

That’s the essence of how you annotate your types in TypeScript. With this, you can add annotations describing the core structure and contracts of your code, to get type checking across your codebase.

This is still a bit limited though: we can only use built-in types (number, string, HTMLElement), or combinations and structures we build from those explicitly. In the next post in this series we’ll take a closer look at the tools TypeScript provides to let you define your own types, with classes, enums, and more.

Introduction to TypeScript: Type Inference In Practice


3 August 2015, by

TypeScript is a powerful compile-to-JS language for the browser and node, designed to act as a superset of JavaScript, with optional static type annotations. We touched on it years ago when it was first released, but version 1.5 is coming soon, and it’s time for a closer look!

In this post, I want to first take a look at what type inference TypeScript gives you on vanilla JS, and why that might be something you care about, before we later start digging into the extra constructs TypeScript provides.

What is TypeScript?

TypeScript as a language is very similar to JavaScript, but brings in type inference, access to future features (thanks to the compile step), support for more structure, and optional type annotations, all while remaining a strict superset of JavaScript.

Type inference and annotation are the killer features here: you can annotate variables with types as you see fit, and an extremely powerful type inference engine then automatically infers types for much of the rest of your code from there, catching whole classes of bugs for you immediately. This is totally optional though: any variables without types are implicitly assigned the ‘any’ type, opting them out of type checks entirely, allowing you to progressively add types to your codebase only where they’re useful, according to your preferences.
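
Even a two-line sketch shows the effect – no annotations anywhere, but the mistake is caught at compile time:

var greeting = "hello"; // inferred as string
greeting = 42;          // compile error: number is not assignable to string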

Catching bugs for free (almost) with static type inference

Below is a very simple example chunk of vanilla standalone JavaScript. There’s a selection of showstopping bugs in it; without reading on, how many can you spot?


navigator.geolocation.getCurrentPosition(function onSuccess(position) {
  var lat = position.latitude;
  var long = position.longitude;

  var lastUpdated = Date.parse(position.timestamp);
  var now = new Date();
  var positionIsCurrent = now.getYear() === lastUpdated.getYear();
	
  if (positionIsCurrent) {
    var div = document.createElement("div");
    div.class = "message";
    div.style = "width: 100%; height: 100px; background-color: red;";		
    div.text = "Up to date position: " + lat + ", " + long;
    document.body.append(div);
  } else {
    var messageDivs = document.querySelectorAll("div.message");
    messageDivs.forEach(function (message) {
      message.style.display = false;
    });
  }
}, { enableHighAccuracy: "never" });


Done?

In total there are actually 12 bugs in here, ranging from the easily spottable (it’s .appendChild(), not .append(), which will crash with an undefined method exception as soon as you start testing), to the more subtle (enableHighAccuracy is a boolean and “never” is truthy, so this unintentionally turns on your GPS and waits until the position is accurate enough, while various assignments to incorrect properties just silently do nothing). All 12 of these are caught by TypeScript automatically however, when run on this vanilla JavaScript source with type inference alone – no type annotations.

Take a look at a commented version for some more details, with these issues automatically caught by the TypeScript compiler, in the TypeScript playground at http://goo.gl/L9qp8o.

This then gets even more powerful once you do start annotating parameters, providing more information in the cases the compiler can’t automatically spot for you. Without this, how much time would it take to track these bugs down? How does a compiler that points out each of them half a second after you write it sound?

Types are powerful. They don’t catch all bugs by any means, and good test coverage remains important, but type checking is an easy and effective way to immediately remove an entire class of bugs from your codebase, letting your testing focus on actual functionality and behaviour, rather than just checking that your code makes sense.

A key point here is the speed of feedback: writing and running your tests is always going to be a slower, far more time-consuming process than compiling your code, and good IDEs (both Visual Studio and Intellij) will give you line-by-line feedback in sub-second times when you write code that won’t compile. Great tools like JSHint provide something similar, and while they’re definitely useful, without understanding the types in a codebase they’re severely hampered: in the above code, JSHint sees no issues whatsoever.

TypeScript meanwhile catches these issues, working with your vanilla JS code out of the box, for near-zero effort (near-zero: there’s a small bit of setup required to add the compile step, but it’s very small, and it shouldn’t require any code changes). This is no panacea: there’ll still be bugs and you’ll still need to test your code heavily, but with types you’ll at least already have the confidence that there’s nothing drastically wrong before you do so.

Going beyond this

That’s enough for one post, but hopefully starts to give you a taste of what TypeScript can provide above and beyond JavaScript.

This post is the start of a series on TypeScript; in the next post we’ll take a look at type annotations, and how you can extend this power to cover all your code, not just code where the types are easily inferrable, to take this even further. If you’re hungry for more in the meantime, take a look at TypeScriptLang.org for the full guide into the details of how TypeScript works, or ping us on Twitter with your questions and thoughts.

Fixing aggregation over relationships with Sequelize


17 July 2015, by

I’ve been setting up Sequelize recently for a new Softwire Node.js project that’s starting up. As part of the initial work we wanted to investigate Sequelize (the go-to Node SQL ORM) in a little depth, to make sure it could neatly handle some of the trickier operations we wanted to perform, without us having to fall back to raw SQL all the time.

Most of these came out very easily in the wash, but one was trickier and needed investigation and upstream patches, and I’d like to take a closer look at that in this post. The challenging case: aggregating values across a relationship (i.e. SUM over a column from a JOIN).

Some Background

The project is sadly confidential, but the core operation has an easy equivalent in the classic Blog model of Posts and Comments. We have lots of Posts on our blog, and each Post has 0 or more Comments. For the purposes of this example, Comments can have some number of likes. Defining this model in Sequelize looks something like:

var Sequelize = require('sequelize');
var db = require('./db-connection');

var Post = db.define('Post', {
    publishDate: { type: Sequelize.DATEONLY }
});
var Comment = db.define('Comment', {
    likes: { type: Sequelize.INTEGER }
});

Post.hasMany(Comment);

With this model in place, Sequelize makes it easy for us to do some basic querying operations, using its Promise-based API:

db.sync().then(function () {
    return Post.findAll({
        // 'Include' joins the Post table with the Comments table, to load both together
        include: [ { model: Comment } ]
    });
}).then(function (results) {
    // Logs all Posts, each with a 'Comments' field containing a nested array of their related Comments
    console.log(JSON.stringify(results));
}).catch(console.error);

The Problem

Given this Post/Comment model, we want to get the total number of likes across all comments for a matched set of articles (‘how many likes did we get in total for this month’s articles?’). A great result would be a SQL query like:

SELECT SUM(comment.likes) AS totalLikes
FROM dbo.Posts AS post
LEFT OUTER JOIN dbo.Comments AS comment ON post.id = comment.postId
WHERE post.publishDate >= '2015-05-01'

Sequelize in principle supports queries like this. It allows an ‘include’ option (in the example above), a ‘where’ option (for filtering) and an ‘attributes’ option (specifying the fields to return). There is also Sequelize.fn, to call SQL functions as part of expressions (such as the attributes we want returned). Combining all of these together suggests we can build the above with something like:

db.sync().then(function () {
    return Post.findAll({
        // Join with 'Comment', but don't actually return the comments
        include: [ { model: Comment, attributes: [] } ],
        // Return SUM(Comment.likes) as the only attribute
        attributes: [[db.fn('SUM', db.col('Comments.likes')), 'totalLikes']],
        // Filter on publishDate
        where: { publishDate: { gte: new Date('2015-05-01') } }
    });
}).then(function (result) {
    console.log(JSON.stringify(result));
}).catch(console.error);

Sadly this doesn’t work. It instead prints “Column ‘Posts.id’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause”, because the generated SQL looks like:

SELECT Post.id, SUM(Comments.likes) AS totalLikes
FROM Posts AS Post
LEFT OUTER JOIN Comments AS Comments ON Post.id = Comments.PostId
WHERE Post.publishDate >= '2015-05-01 00:00:00.000 +00:00';

This is wrong! It attempts to load the total aggregate result, but also to load the ids of all the Posts involved alongside it, which isn’t meaningful in SQL. If we group by Post.id this will work (and that’s possible in Sequelize), but in reality there are a large number of Posts here, and we’d like a single total, not a per-Post total that we have to load and add up ourselves later.
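
For reference, the grouped per-Post version looks something like the sketch below. It runs, but returns one row per Post rather than the single total we want:

Post.findAll({
    include: [{ model: Comment, attributes: [] }],
    // Return each post's id alongside its own total of comment likes
    attributes: ['id', [db.fn('SUM', db.col('Comments.likes')), 'totalLikes']],
    group: ['Post.id'],
    where: { publishDate: { gte: new Date('2015-05-01') } }
});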

Making this Work

Unfortunately it turns out that there’s no easy way to do this in Sequelize without getting involved in the internals. Fortunately it’s open-source, so we can do exactly that.

The real problem here is that the ‘attributes’ array we provide isn’t being honoured, and Posts.id is being added to it. After a quick bit of analysis tracing back where ‘attributes’ get changed, it turns out the cause of this is inside findAll, in Sequelize’s model.js. Take a look at the specific code in lib/model.js lines 1176-1187. This code ensures that if you ever use an ‘include’ (JOIN), you must always return the primary key in your results, even if you explicitly set ‘attributes’ to not do that. Not helpful.

The reason this exists is to ensure that Sequelize can internally interpret the results when building models from them, reliably deduplicating when the same post comes back twice with two different joined comments, for example. That’s not something we need here though, as we’re just trying to load an aggregate and don’t want populated ‘Post’ models back, and it causes a fairly annoying problem (for us and various others). There is a ‘raw’ option that disables building models from these results, but that sadly doesn’t make any difference to the behaviour here.

In the short-term, Sequelize has ‘hooks’ functionality that lets you tie your own code into its query generation. Using that, we can put together a very simple workaround by changing our connection setup code to look something like the below (and this is what we’ve done, for the very short-term).

function resetAttributes(options) {
    // Restore the attributes originally requested (which Sequelize keeps in
    // options.originalAttributes before mangling them), recursing through any includes
    if (options.originalAttributes !== undefined) {
        options.attributes = options.originalAttributes;
        if (options.include) {
            options.include.forEach(resetAttributes);
        }
    }
}

var db = new Sequelize(database, username, password, {
    "hooks": {
        "beforeFindAfterOptions": function (options) {
            // For raw queries, undo Sequelize's attribute mangling before the query runs
            if (options.raw) resetAttributes(options);
        }
    }
});

If you’re in this situation right now, the above will fix it. It changes query generation to drop Sequelize’s overrides of your ‘attributes’ whenever ‘raw’ is set on the query, solving this issue, so that running the aggregation query above with ‘raw: true’ then works. Magic.
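
With that hook in place, the earlier aggregation query just needs ‘raw: true’ added, something like:

Post.findAll({
    include: [{ model: Comment, attributes: [] }],
    attributes: [[db.fn('SUM', db.col('Comments.likes')), 'totalLikes']],
    where: { publishDate: { gte: new Date('2015-05-01') } },
    // 'raw' skips model building, and (via our hook) keeps our attributes intact
    raw: true
}).then(function (results) {
    console.log(results[0].totalLikes); // the single total we were after
});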

Solving this Permanently

That’s all very well for now, but it feels like a bit of a hack, and this behaviour seems like something that’s not desirable for Sequelize generally anyway.

Fortunately, we’ve now fixed it for you, in a pull request up at https://github.com/sequelize/sequelize/pull/4029.

This PR solves the issue properly, updating the internals of model.js so that it doesn’t change the specified attributes when it’s not necessary (i.e. when ‘raw’ is true), both for this case (attributes on the query root) and for the include case (the attributes of your JOINed models). That PR has recently been merged, solving this issue long-term in a cleaner way, and it should be available from the next Sequelize release.

Once that’s in place, this blog post becomes much shorter: if you want to aggregate over an include in Sequelize, add ‘raw: true’ to your query. Phew!