Softwire Blog


Using sqlite for dev nodeJS environments


29 November 2016, by

This post will explain how to get one click DBs working in nodeJS.

Problems with your zero to hero

I was recently working on a NodeJS project, and it was set up to use a mySQL database. This was fine for production, but meant that people joining the project had several manual steps to get the project working.

  • Install SQL server
  • Set up a SQL database
  • Update the NodeJS config to point to the database.

This isn’t too difficult to do. But I wanted to reduce the ramp up to getting the project up and running on a dev machine.

This meant moving to a database that could configure itself from whatever was checked into the codebase.

One option would have been to translate the steps above into code so that running a command would create a SQL database that was configured to be used by the code.

This would have been overkill, as there are much better options.

I chose to instead configure dev and test environments to use a local sqlite database.

Local sqlite database

A sqlite database can be added as an npm dependency, so that it is installed as part of dependency management when a new developer checks out the code.

"dependencies": {
  "sqlite3": "3.1.4"
}

Now that sqlite has been added as a dependency, it can be called from the ORM that we’re using. In this example, I’ve used Sequelize as the ORM, as once set up, it allows easy mapping of Javascript objects to database objects, as well as having built in support for database migrations.

The next thing to do is to have a configuration file which details how to connect to the database in different environments:

module.exports = {
  development: {
    dialect: 'sqlite',
    storage: 'data/dev-db.sqlite3'
  },
  test: {
    dialect: 'sqlite',
    storage: 'data/test-db.sqlite3'
  },
  production: {
    username: process.env.RDS_USERNAME,
    password: process.env.RDS_PASSWORD,
    database: 'ebdb',
    port: process.env.RDS_PORT,
    host: process.env.RDS_HOSTNAME,
    dialect: 'mysql'
  }
};

This config.js file shows that in dev and test mode, we’ll connect to the sqlite database which is stored in the file “data/<env>-db.sqlite3”.

In production, it instead connects to an RDS instance (the production machine in this project was running on AWS), with the connection details stored in the environment on the cloud machine (rather than being checked into the code).

Now we need to setup sequelize to use the correct database when it is initialised. This is done in the index.js file under the models folder:

var path = require('path');
var Sequelize = require('sequelize');
const basename = path.basename(module.filename);
const env = process.env.NODE_ENV || 'development';
const config = require(path.join(__dirname, '..', 'config', 'config.js'))[env];
const db = {};
var sequelize = new Sequelize(config.database, config.username, config.password, config);

This code fetches the relevant section of the config.js file, and uses it to initialise the ORM. What’s not shown above is the code that loads all the models into Sequelize. We’ll leave that as an excercise for the reader.

As you can see. This file is using an environment variable to determine what mode we are running the code in. All that’s required to run in production is set NODE_ENV in the environment variables to be equal to “production”.

The app can now be started using “npm start”, which will start, using the dev database.

Migrations

Just a quick note here on using Sequelize for database migrations. You can hook your migrations into the “npm start” command above by adding the following to your package.json file:

"scripts": {
    "prestart": "sequelize db:migrate -c config/config.js"
}

Summary

So what have we learnt?

  • You should try and ensure setting up a project is as easy as possible for new developers
  • You can use different environments variables to separate out dev/test/production environments (and default to test).
  • This method can work, even if you are deploying to the cloud.
  • You can easily add database migrations into the start command of your application.

This post originally appeared on Chris Arnott’s personal blog.

Fixing aggregation over relationships with Sequelize


17 July 2015, by

I’ve been setting up Sequelize recently for a new Softwire Node.js project that’s starting up. As part of the initial work we wanted to investigate Sequelize (the go-to Node SQL ORM) in a little depth, to make sure it could neatly handle some on the trickier operations we wanted to perform, without us having to fall back to raw SQL all the time.

Most of these came out very easily in the wash, but one was trickier and needed investigation and upstream patches, and I’d like to take a closer look at that in this post. The challenging case: aggregating values across a relationship (i.e. SUM over a column from a JOIN).

Some Background

The project is sadly confidential, but the core operation has an easy equivalent in the classic Blog model of Posts and Comments. We have lots of Posts on our blog, and each Post has 0 or more Comments. For the purposes of this example, Comments can have some number of likes. Defining this model in Sequelize looks something like:

var Sequelize = require('sequelize');
var db = require('./db-connection');

var Post = db.define('Post', {
    publishDate: { type: Sequelize.DATEONLY }
});
var Comment = db.define('Comment', {
    likes: { type: Sequelize.INTEGER }
});

Post.hasMany(Comment);

With this model in place, Sequelize makes it easy for us to do some basic querying operations, using its Promise-based API:

db.sync().then(function () {
    return Post.findAll({
        // 'Include' joins the Post table with the Comments table, to load both together
        include: [ { model: Comment } ]
    });
}).then(function (results) {
    // Logs all Posts, each with a 'Comments' field containing a nested array of their related Comments
    console.log(JSON.stringify(results));
}).catch(console.error);

The Problem

Given this Post/Comment model, we want to get the total number of likes across all comments for matched set of articles (‘how many likes did we get in total for this month’s articles?’). A great result would be a SQL query like:

SELECT SUM(comment.likes) AS totalLikes
FROM dbo.Posts AS post
LEFT OUTER JOIN dbo.Comments AS comment ON post.id = comment.postId
WHERE post.publishDate >= '2015-05-01'

Sequelize in principle supports queries like this. It allows an ‘include’ option (in the example above), a ‘where’ option (for filtering) and an ‘attributes’ option (specifying the fields to return). There is also Sequelize.fn, to call SQL functions as part of expressions (such as the attributes we want returned). Combining all of these together suggests we can build the above with something like:

db.sync().then(function () {
    return Post.findAll({
        // Join with 'Comment', but don't actually return the comments
        include: [ { model: Comment, attributes: [] } ],
        // Return SUM(Comment.likes) as the only attribute
        attributes: [[db.fn('SUM', db.col('Comments.likes')), 'totalLikes']],
        // Filter on publishDate
        where: { publishDate: { gte: new Date('2015-05-01') } }
    });
}).then(function (result) {
    console.log(JSON.stringify(result));
}).catch(console.error);

Sadly this doesn’t work. It instead prints “Column ‘Posts.id’ is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause”, because the generated SQL looks like:

SELECT Post.id, SUM(Comments.likes) AS totalLikes
FROM Posts AS Post
LEFT OUTER JOIN Comments AS Comments ON Post.id = Comments.PostId
WHERE Post.publishDate >= '2015-05-01 00:00:00.000 +00:00';

This is wrong! This attempts to load the total aggregate result, but also to load it along with all the Post ids involved, which isn’t really meaningful in SQL, unfortunately. If we group by Post.id this will work (and that is possible in Sequelize), but in reality there are a large number of Posts here, and we’d just like a single total, not a total per-Post that we have to load and add up later ourselves.

Making this Work

Unfortunately it turns out that there is no easy way to do this in the Sequelize, without getting involved in the internals. Fortunately it’s open-source, so we can do exactly that.

The real problem here is that the ‘attributes’ array we provide isn’t being honoured, and Posts.id is being added to it. After a quick bit of analysis tracing back where ‘attributes’ get changed, it turns out the cause of this is inside findAll, in Sequelize’s model.js. Take a look at the specific code in lib/model.js lines 1176-1187. This code ensures that if you ever use an ‘include’ (JOIN), you must always return the primary key in your results, even if you explicitly set ‘attributes’ to not do that. Not helpful.

The reason this exists is to ensure that Sequelize can internally interpret these results when building models from them, to reliably deduplicate when the same post comes back twice with two different joined comments, for example. That’s not something we need here though, as we’re just trying to load an aggregate and we don’t want populated ‘Post’ models back from this, and it causes a fairly annoying problem (for us and various others). There is a ‘raw’ option that disables building a model from these results, but that sadly doesn’t make any differences to the behaviour here.

In the short-term, Sequelize has ‘hooks’ functionality that lets you tie your own code into its query generation. Using that, we can put together a very simple workaround by changing our connection setup code to look something like the below (and this is what we’ve done, for the very short-term).

function resetAttributes(options) {
    if (options.originalAttributes !== undefined) {
        options.attributes = options.originalAttributes;
        if (options.include) {
            options.include.forEach(resetAttributes);
        }
    }
}

var db = new Sequelize(db, username, password, {
    "hooks": {
        "beforeFindAfterOptions": function (options) {
            if (options.raw) resetAttributes(options);
        }
    }
}

If you’re in this situation right now, the above will fix it. It changes query generation to drop all ‘attributes’ overrides if ‘raw’ is set on the query, solving this issue, so that running the aggregation query above with ‘raw: true’ then works. Magic.

Solving this Permanently

That’s all very well for now, but it feels like a bit of a hack, and this behaviour seems like something that’s not desirable for Sequelize generally anyway.

Fortunately, we’ve now fixed it for you, in a pull request up at https://github.com/sequelize/sequelize/pull/4029.

This PR solves this issue properly, updating the internals of model.js to not change the specified attributes if it’s not necessary (if ‘raw’ is set to true) both for this case (attributes on the query root), and the include case (the attributes of your JOINed models). That PR’s recently been merged, solving this issue long-term in a cleaner way, and should be available from the next Sequelize release.

Once that’s in place, this blog post becomes much shorter: if you want to aggregate over an include in Sequelize, add ‘raw: true’ to your query. Phew!

SQL Server Maintenance Plans (Part 2)


25 October 2012, by

previous article in series

Welcome to the second part of my post on SQL Server Maintenance Plans.

Last time, we looked at the Control Flow UI, and wrapped our heads around how to get the Plan to do the actions in the sequence that we want. This time, I’ll show you a GOTCHA relating to making those actions do what you want. Specifically, about …

Executing Agent Jobs from Maintenance Plans

Disclaimer: The following comes from SQL Server 2005 and may not apply to later versions, although I looked briefly at SQL 2008 and it appeared to be set up the same way.

(more…)

SQL Server Maintenance Plans (Part 1)


18 October 2012, by

I recently had to overhaul the Backups System for one of our customers using SQL Server 2005 Maintenance Plans, and my main take away is that it’s a pain in the rump. It’s a fairly frustrating interface in general, and there are a couple of particular gotchas that I thought I should share with anyone who ends up doing the same sort of thing.

(more…)