Softwire Blog


Fixing aggregation over relationships with Sequelize


17 July 2015, by

I’ve been setting up Sequelize recently for a new Softwire Node.js project that’s starting up. As part of the initial work we wanted to investigate Sequelize (the go-to Node SQL ORM) in a little depth, to make sure it could neatly handle some on the trickier operations we wanted to perform, without us having to fall back to raw SQL all the time.

Most of these came out very easily in the wash, but one was trickier and needed investigation and upstream patches, and I’d like to take a closer look at that in this post. The challenging case: aggregating values across a relationship (i.e. SUM over a column from a JOIN).

Some Background

The project is sadly confidential, but the core operation has an easy equivalent in the classic Blog model of Posts and Comments. We have lots of Posts on our blog, and each Post has 0 or more Comments. For the purposes of this example, Comments can have some number of likes. Defining this model in Sequelize looks something like:

var Sequelize = require('sequelize');
var db = require('./db-connection');

var Post = db.define('Post', {
    publishDate: { type: Sequelize.DATEONLY }
});
var Comment = db.define('Comment', {
    likes: { type: Sequelize.INTEGER }
});

Post.hasMany(Comment);

With this model in place, Sequelize makes it easy for us to do some basic querying operations, using its Promise-based API:

db.sync().then(function () {
    return Post.findAll({
        // 'Include' joins the Post table with the Comments table, to load both together
        include: [ { model: Comment } ]
    });
}).then(function (results) {
    // Logs all Posts, each with a 'Comments' field containing a nested array of their related Comments
    console.log(JSON.stringify(results));
}).catch(console.error);

The Problem

Given this Post/Comment model, we want to get the total number of likes across all comments for matched set of articles (‘how many likes did we get in total for this month’s articles?’). A great result would be a SQL query like:

SELECT SUM(comment.likes) AS totalLikes
FROM dbo.Posts AS post
LEFT OUTER JOIN dbo.Comments AS comment ON post.id = comment.postId
WHERE post.publishDate >= '2015-05-01'

Sequelize in principle supports queries like this. It allows an ‘include’ option (in the example above), a ‘where’ option (for filtering) and an ‘attributes’ option (specifying the fields to return). There is also Sequelize.fn, to call SQL functions as part of expressions (such as the attributes we want returned). Combining all of these together suggests we can build the above with something like:

db.sync().then(function () {
    return Post.findAll({
        // Join with 'Comment', but don't actually return the comments
        include: [ { model: Comment, attributes: [] } ],
        // Return SUM(Comment.likes) as the only attribute
        attributes: [[db.fn('SUM', db.col('Comments.likes')), 'totalLikes']],
        // Filter on publishDate
        where: { publishDate: { gte: new Date('2015-05-01') } }
    });
}).then(function (result) {
    console.log(JSON.stringify(result));
}).catch(console.error);

Sadly this doesn’t work. It instead prints “Column ‘Posts.id’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause”, because the generated SQL looks like:

SELECT Post.id, SUM(Comments.likes) AS totalLikes
FROM Posts AS Post
LEFT OUTER JOIN Comments AS Comments ON Post.id = Comments.PostId
WHERE Post.publishDate >= '2015-05-01 00:00:00.000 +00:00';

This is wrong! This attempts to load the total aggregate result, but also to load it along with all the Post ids involved, which isn’t really meaningful in SQL, unfortunately. If we group by Post.id this will work (and that is possible in Sequelize), but in reality there are a large number of Posts here, and we’d just like a single total, not a total per-Post that we have to load and add up later ourselves.

Making this Work

Unfortunately it turns out that there is no easy way to do this in the Sequelize, without getting involved in the internals. Fortunately it’s open-source, so we can do exactly that.

The real problem here is that the ‘attributes’ array we provide isn’t being honoured, and Posts.id is being added to it. After a quick bit of analysis tracing back where ‘attributes’ get changed, it turns out the cause of this is inside findAll, in Sequelize’s model.js. Take a look at the specific code in lib/model.js lines 1176-1187. This code ensures that if you ever use an ‘include’ (JOIN), you must always return the primary key in your results, even if you explicitly set ‘attributes’ to not do that. Not helpful.

The reason this exists is to ensure that Sequelize can internally interpret these results when building models from them, to reliably deduplicate when the same post comes back twice with two different joined comments, for example. That’s not something we need here though, as we’re just trying to load an aggregate and we don’t want populated ‘Post’ models back from this, and it causes a fairly annoying problem (for us and various others). There is a ‘raw’ option that disables building a model from these results, but that sadly doesn’t make any differences to the behaviour here.

In the short-term, Sequelize has ‘hooks’ functionality that lets you tie your own code into its query generation. Using that, we can put together a very simple workaround by changing our connection setup code to look something like the below (and this is what we’ve done, for the very short-term).

function resetAttributes(options) {
    if (options.originalAttributes !== undefined) {
        options.attributes = options.originalAttributes;
        if (options.include) {
            options.include.forEach(resetAttributes);
        }
    }
}

var db = new Sequelize(db, username, password, {
    "hooks": {
        "beforeFindAfterOptions": function (options) {
            if (options.raw) resetAttributes(options);
        }
    }
}

If you’re in this situation right now, the above will fix it. It changes query generation to drop all ‘attributes’ overrides if ‘raw’ is set on the query, solving this issue, so that running the aggregation query above with ‘raw: true’ then works. Magic.

Solving this Permanently

That’s all very well for now, but it feels like a bit of a hack, and this behaviour seems like something that’s not desirable for Sequelize generally anyway.

Fortunately, we’ve now fixed it for you, in a pull request up at https://github.com/sequelize/sequelize/pull/4029.

This PR solves this issue properly, updating the internals of model.js to not change the specified attributes if it’s not necessary (if ‘raw’ is set to true) both for this case (attributes on the query root), and the include case (the attributes of your JOINed models). That PR’s recently been merged, solving this issue long-term in a cleaner way, and should be available from the next Sequelize release.

Once that’s in place, this blog post becomes much shorter: if you want to aggregate over an include in Sequelize, add ‘raw: true’ to your query. Phew!

Testing and Debugging Webpages for BlackBerry Mobiles


22 January 2013, by

If you are testing a webpage you have developed for BlackBerry mobiles, then the BlackBerry website is a useful place to visit as it has simulators of all their devices available to download. Once installed, the simulators that are running OS6 or higher can be used to view your webpage in their browser straight away. To get the browser working on OS5 simulators or older, the BlackBerry Mobile Data Service simulator (MDS) needs to be installed and run, instructions for doing this on Windows 7 x64 can be found in this helpful blog post here (this is a link to the Google cache version as the site is down at the time of writing).

The most useful thing I have found is, since BlackBerry OS7, there is now a way to debug webpages, which is very useful as the BlackBerry browser has a lot of quirks, despite being Webkit based since OS6. The following can be done in either the simulator or on an actual device:

First of all you need to go to the Browser Options and ensure ‘Enable Developer Tools’ is selected. In the browser you then need to bring up the menu by pressing the BlackBerry button and select Developer Tools > Enable Web Inspector. A popup will then display an IP address and port number.

You can then connect to that IP address using a Webkit based browser on your PC (or a PC on the same network as the device if you aren’t using the simulator) and select the site that is being displayed by the BlackBerry.

Once you have selected the webpage you have access to the full debug tools that are available to Webkit browsers, which are very useful for figuring out why that <select> is appearing twice as large as it is on every other browser.

Apache Bench (ab) stalls after 16000 requests


2 September 2012, by

I’ve been doing some trivial benchmarking of Play 2 with ab (Apache Bench) just to get an idea of its raw capabilities for serving simple requests – and because it’s what I always do when picking up a new framework so I know what I’m dealing with. In doing so I ran into a bit of a puzzler that had me thinking Play 2 was bugged – but my spidey sense soon kicked in and told me it was more likely to be an OS or ab issue. I had done approximately the following, using Play 2.0.1 on OS X 10.7.3, and I’m pretty certain you’ll see the same results if you do this on a Mac:

(more…)

An actual Java bug: Investigating a JVM crash on multi-threaded string access


8 July 2012, by

When your program doesn’t work as you expected, it is often tempting to blame a bug in the framework. Over the years, I have come to regard these claims with extreme suspicion. I have seen apparent “framework” bugs turn out to be programming mistakes hundreds of times. However, when the JVM crashed in some java-only string manipulation code (or gave different answers on different runs with the same input), I realised I had stumbled upon a real Java bug.

(more…)

Debugging Third-Party Applications – Part 3


7 February 2012, by

previous article in series

This is the last in a short series of posts about a problem I had with Microsoft Exchange’s Outlook Web Access (OWA): reading the Old New Thing blog through OWA suddenly stopped working with error:

This link has been disabled for your security.

In the second part I tracked down the .NET code that was rejecting the blog article URL as unsafe and I guessed that the problem was the URL was relative not absolute – and so looked to OWA like an attempt to reference one of its own resources, potentially maliciously, and not a real link to a page on the internet.

(more…)

Debugging Third-Party Applications – Part 2


19 January 2012, by

previous article in series

This is a short series of posts about a problem I had with Microsoft Exchange’s Outlook Web Access (OWA): reading the Old New Thing blog through OWA suddenly stopped working with this error:

This link has been disabled for your security

In the first part I tracked down this error message to a static page UrlBlockedError.aspx in the OWA files on the Exchange server.

(more…)

Debugging Third-Party Applications – Part 1


12 January 2012, by

This is a short series of posts as a worked-example of how to debug into someone else’s code. It covers a problem I ran into with Microsoft Exchange’s Outlook Web Access (OWA), which Microsoft has since fixed, but the advice is intended to apply to many other situations.

(more…)