A continuous integration server is a server that compiles a software project and runs its automated tests for each commit. Travis offers continuous integration as a service, free of charge for open-source projects.
However, beyond compilation and testing, a continuous integration server is able to run arbitrary code. Why not use it to perform computational scientific experiments? This is what we have set up over the last weeks. In this post, I present a scheme to perform computational scientific experiments in a pure open science way: the experimental code is open source, the results are open as well, and the experiments are fully reproducible.
Beyond building each commit, Travis also builds each newly created branch. It can be attached to GitHub, which means that for each branch pushed to GitHub, Travis runs the compilation and the tests.
Using Travis for scientific experiments means two things: first, replacing the compilation script with the experimental script; second, configuring the experiment in a branch.
For the first one, it means changing Travis’ config file, .travis.yml, to do more than compilation, for instance:
language: java
install: mvn compile
script: ./run-experiment.sh
To configure the experiment, you simply have to add parameters to your script. You can create a new branch ( git checkout -b my-experiment-10-20 ) and modify the experiment parameters as the following commit diff shows:
- script: ./run-experiment.sh
+ script: ./run-experiment.sh 10 20
Then you run the experiment by committing and pushing to GitHub:
git commit -m "setting up experiment with params 10 and 20" -a
git push origin my-experiment-10-20
The push triggers Travis to check out the branch my-experiment-10-20 and run the experiment script.
This is what we do for our perturbation experiments: https://github.com/Spirals-Team/jPerturb-experiments/branches
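To make the parameter passing concrete, here is a minimal sketch of what a parameterized run-experiment.sh might look like. It is not the script used in our experiments: the function name, the output format and the multiplication standing in for the real computation are all placeholders.

```shell
#!/bin/sh
# Hypothetical sketch of run-experiment.sh. The names, the output format and
# the computation are placeholders, not the script from the original experiments.

# run_experiment PARAM1 PARAM2: run one experiment with two integer parameters
# and print a single result line to stdout (which ends up in the Travis log).
run_experiment() {
    if [ "$#" -ne 2 ]; then
        echo "usage: run-experiment.sh PARAM1 PARAM2" >&2
        return 2
    fi
    param1=$1
    param2=$2
    # Placeholder for the real computation.
    result=$((param1 * param2))
    echo "params=$param1,$param2 result=$result"
}

# When the script is called with arguments (as in the script: line of
# .travis.yml), run the experiment with them. Without arguments it does nothing.
if [ "$#" -gt 0 ]; then
    run_experiment "$@"
fi
```

With such a script, the only thing that differs between two experimental branches is the script: line of .travis.yml, so the branch itself documents the configuration of the run.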
Collecting the output
If you print to the console, you can view the output of the experiment by looking at the Travis build log. This has two drawbacks: the output should not exceed 4MB, and you have to parse it.
There are two other options to collect the output of the experiments.
You can push the outputs to a file transfer service such as http://transfer.sh.
Alternatively, you can push the output files to GitHub directly, in the same branch or in another one. In this case, make sure to disable Travis for this commit; otherwise, you’ll enter an infinite recursion (GitHub → Travis → GitHub → Travis, etc.).
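One way to break that loop is Travis's commit-message convention: a commit whose message contains [ci skip] is not built. The sketch below relies on that convention and assumes the build already has push credentials for the repository (not shown here). The function names and the message wording are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for pushing result files back to GitHub from a build.

# results_commit_message BRANCH: build a commit message that asks Travis
# not to build the resulting commit.
results_commit_message() {
    echo "add results for $1 [ci skip]"
}

# push_results BRANCH FILE...: commit the given result files and push them
# to BRANCH. Travis builds run on a detached HEAD, hence the HEAD:BRANCH form.
# Assumes push credentials are already configured (assumption, not shown).
push_results() {
    branch=$1
    shift
    git add "$@" &&
        git commit -m "$(results_commit_message "$branch")" &&
        git push origin "HEAD:$branch"
}
```

At the end of the experiment script, a call such as push_results my-experiment-10-20 results.csv (results.csv being a made-up file name) would store the results next to the code that produced them.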
This scheme allows absolute replicability as long as GitHub and Travis remain alive. To replicate an experiment, one simply has to execute Travis again on a given experimental branch:
git checkout my-experiment-10-20
git commit -m "trigger travis" --allow-empty
git push origin my-experiment-10-20
This is what we have done for our TSE’16 paper.
If you use Travis, the experiment has to run in less than 45 minutes (the limit as of April 2016). However, if you use your own continuous integration server, this limit disappears.
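If an experiment risks exceeding such a limit, one option is to stop it yourself a little earlier, so that the build ends cleanly and whatever has been printed so far is kept. The sketch below assumes the timeout command from GNU coreutils is available on the build machine; the function name and the message are hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper that enforces a time budget on an experiment.
# Assumes the GNU coreutils `timeout` command is installed.

# run_with_budget SECONDS COMMAND...: run COMMAND and stop it after SECONDS.
# Returns the exit status of COMMAND, or 124 if the budget was exhausted.
run_with_budget() {
    budget=$1
    shift
    status=0
    timeout "$budget" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        echo "experiment stopped after the ${budget}s budget" >&2
    fi
    return "$status"
}
```

For instance, run_with_budget 2400 ./run-experiment.sh 10 20 would give the experiment 40 minutes, leaving some margin below the 45-minute limit.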
Continuous integration and computational experiments may seem quite different. However, with what I’ve discussed here, they converge into the exact same infrastructure. The reason is that they have the same requirements: computation as a service and version management.
The open question is whether Travis would consider open science to be related to open source, and hence whether it’s legitimate to use their free CPU muscle to do science.