(This article was first published on OpenCPU , and kindly contributed toR-bloggers)
Yesterday a new version of the jsonlite package was released to CRAN. This update includes no new features, it only introduces performance optimizations.
The jsonlite package was already highly optimized for converting vectors and data frames to json. However Gregory Jefferis and Duncan Murdoch had found that conversion of tall matrices as used by rglwidget was slower than expected.
It turned out this was indeed an edge case that I had overlooked. The new version of jsonlite fixes this problem and matrix conversion should be about 200 times faster than before. Technical details follow below; first a benchmark:
# Old version! > system.time(j<-toJSON(matrix(1L, ncol = 3, nrow = 50000))) user system elapsed 4.715 0.015 4.729 # New version! > system.time(j<-toJSON(matrix(1L, ncol = 3, nrow = 50000))) user system elapsed 0.022 0.002 0.023
This artificial example (every field has the number 1) highlights the improvement. The relative improvement might be less for matrices with actual data because of additional time spent on number formatting double/integer values (which was already optimized in jsonlite a while ago ).
So what was the problem? The previous version of jsonlite had an elegant solution that would recurse through the dimensions of a matrix/array and apply json conversion on each of its elements. E.g. for a matrix (2D array) it would convert each row to json, and then combine the results. However it turns out that the
apply call below is really slow.
# Technical example, don't use this code ! x <- matrix(1L, ncol = 3, nrow = 50000) rows <- apply(x, 1, jsonlite:::asJSON) json <- jsonlite:::collapse(rows, indent = NA)
The new version exploits the fact that matrices and arrays are homogenous (i.e. all elements have the same type). It first removes the dimensions from the array using
c(x) and converts all of the individual elements to json with a single call to
asJSON . This results in a significant speedup because
asJSON is only called once rather than
# Technical example, don't use this code ! str <- jsonlite:::asJSON(c(x), collapse = FALSE) dim(str) <- dim(x) rows <- apply(str, 1, jsonlite:::collapse, indent = NA) json <- jsonlite:::collapse(rows, indent = NA)
Things get a bit more complicated for higher dimensional arrays, especially with
toJSON(x, pretty = TRUE) but this illustrates the core issue.
You might be thinking: can we avoid
apply alltogether? Yes! For the important case of 2 dimensional arrays jsonlite has a complete C implementation which makes
toJSON on matrices is extra fast. For higher dimensional arrays it currently still uses the solution above, which performs quite well. We might be able to further optimize this case by porting this to C as well, but working with high dimensional arrays in C makes my head hurt.