Archiverify


15 June 2015, by


A little while ago, whilst going through some of my digital photos, I noticed that some of them weren’t displaying correctly. Somehow some of my files had got corrupted; oh dear.

I do have a back-up system, but it simply copied all the files from my NAS (network-attached storage device) on to a removable drive. I used two removable drives in rotation, so I only had a fairly short history to look back through. Fortunately some of the files had become corrupted recently enough that a good copy still survived on one of the drives, but others had gone bad too long ago and were now permanently broken.

So I decided I needed some way to notice this kind of problem in future without regularly checking each of my thousands of files by hand. Looking around the internet, I found a few free tools that would probably work but didn’t do exactly what I wanted (and did a whole lot more that I didn’t want), as well as the usual cloud back-up services, which would also solve my problem, but for a fee.

At the same time I wanted to do a bit more coding, so I decided to write something new, focussed on my particular needs. Specifically, I wanted something that would:

  • Take two directories of photos (although it would work for any set of unchanging files, e.g. a music library)
  • Check that the files in each haven’t changed, and if possible fix any that have
  • Find any new files and copy them into the other directory
  • Be cross platform (because I wanted it to be useful to other people)
  • Allow me to learn something new
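Finding new files and copying them into the other directory amounts to a one-way mirror run in each direction. The sketch below is not Archiverify's actual code; it assumes that a file counts as "new" when nothing exists at the same relative path in the other directory, and the class and method names (`NewFileCopier`, `copyMissing`, `syncBothWays`) are hypothetical. It uses only the standard `java.nio.file` API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: "new" is assumed to mean "no file at the same relative path".
public class NewFileCopier {

    // Copies every regular file under source that is missing under target,
    // preserving the relative path, and returns the relative paths it copied.
    public static List<Path> copyMissing(Path source, Path target) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(source)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        List<Path> copied = new ArrayList<>();
        for (Path file : files) {
            Path relative = source.relativize(file);
            Path destination = target.resolve(relative);
            if (Files.notExists(destination)) {
                Files.createDirectories(destination.getParent()); // no-op if it already exists
                Files.copy(file, destination, StandardCopyOption.COPY_ATTRIBUTES);
                copied.add(relative);
            }
        }
        return copied;
    }

    // Runs the copy in both directions so each directory ends up with the union of both.
    public static void syncBothWays(Path a, Path b) throws IOException {
        copyMissing(a, b);
        copyMissing(b, a);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java NewFileCopier <dirA> <dirB>");
            return;
        }
        syncBothWays(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

A real tool would also need to skip its own hash files and decide what to do about deletions and renames, which this sketch ignores.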

In the end I settled on a design that computes a hash for each file it finds and stores the hashes alongside the files. On each run, the files in both directories are read, their hashes are compared to the stored hashes in both directories, and any inconsistencies are flagged up.
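To make the comparison step concrete, here is a minimal sketch of one way it could work. It assumes SHA-256 as the hash and a hypothetical sidecar file named `<file>.sha256` holding each file's hex digest; the post doesn't say which algorithm or storage format Archiverify uses, and `HashCheck`, `classify` and `check` are illustrative names only. The idea is that when the stored hashes in both directories agree, a copy that still matches them can be trusted and used to repair a copy that no longer does.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch only: SHA-256 and the "<file>.sha256" sidecar format are assumptions.
public class HashCheck {

    public enum Verdict { OK, REPAIR_A_FROM_B, REPAIR_B_FROM_A, NEEDS_ATTENTION }

    // Streams the file through SHA-256 and returns the digest as lowercase hex.
    public static String sha256Hex(Path file) throws IOException {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 not available", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    // Hypothetical sidecar location: photo.jpg -> photo.jpg.sha256 in the same directory.
    public static Path sidecar(Path file) {
        return file.resolveSibling(file.getFileName() + ".sha256");
    }

    // Records the file's current hash; intended to run when a file is first seen.
    public static void storeHash(Path file) throws IOException {
        Files.write(sidecar(file), sha256Hex(file).getBytes(StandardCharsets.US_ASCII));
    }

    public static String storedHash(Path file) throws IOException {
        return new String(Files.readAllBytes(sidecar(file)), StandardCharsets.US_ASCII).trim();
    }

    // Pure decision logic, separated out so every combination is easy to test.
    public static Verdict classify(String storedA, String storedB, String actualA, String actualB) {
        if (!storedA.equals(storedB)) {
            return Verdict.NEEDS_ATTENTION; // the records disagree, so neither copy is trusted automatically
        }
        boolean aGood = actualA.equals(storedA);
        boolean bGood = actualB.equals(storedB);
        if (aGood && bGood) return Verdict.OK;
        if (aGood) return Verdict.REPAIR_B_FROM_A;
        if (bGood) return Verdict.REPAIR_A_FROM_B;
        return Verdict.NEEDS_ATTENTION; // both copies have changed since their hashes were stored
    }

    // Checks one file present in both directories; assumes both sidecars already exist.
    public static Verdict check(Path copyA, Path copyB) throws IOException {
        return classify(storedHash(copyA), storedHash(copyB), sha256Hex(copyA), sha256Hex(copyB));
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: java HashCheck <copy-in-A> <copy-in-B>");
            return;
        }
        System.out.println(check(Paths.get(args[0]), Paths.get(args[1])));
    }
}
```

For example, after `storeHash` has been run on both copies of a photo, corrupting the copy in directory B makes `check` return `REPAIR_B_FROM_A`. Keeping the decision in a pure `classify` method also makes it a natural target for a data-driven test that runs the same check over every combination of inputs.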

I decided to implement the program in Java so that it would be cross-platform. As this was a one-person project, I wanted to use Test Driven Development to make sure I ended up with good automated test coverage and didn’t have to spend time on manual testing, which quickly gets painful if you want to be thorough. I had heard some good things about Groovy (http://www.groovy-lang.org/) and Spock (https://code.google.com/p/spock/), so I decided to use them for my tests to tick the “learn something new” box.

Softwire have semi-regular “pizza and programming” evenings after work, where the company buys pizza and anyone who wants to can stay and work on any kind of coding project. I started going to these to work on my project, which definitely helped me make progress: I was able to discuss ideas with other people, and I had some well-defined time, free from other distractions, to actually get on with it.

I quickly found I wanted to use a few libraries in my project, but I also wanted to be able to ship a single JAR file rather than a whole folder of them. I ended up using One-JAR (http://one-jar.sourceforge.net/) to do this for me, as it could be run from within a build script. The build script itself is a Gradle (https://gradle.org/) script, which builds the JAR, adds a version number to the file name, and runs the automated tests. There are still a few kinks in the script in less common cases that I need to work out, but generally it works well.

So after some time I had a working application that I could easily build and test, and that was actually useful (to me at least). What next? I decided I needed a small website, as GitHub’s default repository pages aren’t particularly user-friendly for non-programmers. Fortunately GitHub knows this and provides “GitHub Pages”, which lets you push HTML pages to a special branch of your repository; GitHub then displays those pages at http://username.github.io. Not only that, but GitHub Pages supports Jekyll, so you can build your website in a more modular way than with plain HTML.

Of the various new technologies I used, I found most of them fairly straightforward to get started with and they were all useful. To call out a few specific things of interest:

  • I found Spock’s ability to run the same test with multiple different inputs very useful (other test frameworks do this too).
  • I also found Spock’s test failure messages very readable and useful.
  • Gradle has possibly too much documentation! I found that to do some simple things I had to wade through a lot of documentation to try to find the relevant parts. This is also why the build script still isn’t 100% right in some cases.
  • I find it slightly odd that Java still relies on unofficial third-party tools to bundle a program and its library JARs into a single file.
  • I really like being able to publish a web page by just writing some markdown and pushing it to GitHub.

If you’d like to learn more about Archiverify, download the new v2.0.0, or simply see what a Jekyll-based GitHub Pages site can look like, visit the Archiverify site at http://dancorder.github.io/Archiverify/index.html.

