Burst Reporting On A Budget

Creating a burst reporting application using R, Quarto, and Typst.
R
Author

Joe Kirincic

Published

November 24, 2025

Introduction

Reports. Love them or hate them, we’ve all made them. If there’s one thing corporate loves to get data people to do that isn’t dashboards or AI, it’s reports. In the digital age, we see companies starting to move away from PDF reports toward reports with dynamic web interfaces. But some companies still benefit from PDFs. Healthcare comes to mind. Many healthcare facilities, especially ones in rural areas, often lack either up-to-date technology or strong internet connections. In circumstances like this, employees can’t afford to lose access to information because of lagging internet, so they print a lot of the documents they need to review. This post is for folks who find themselves in such a situation. More specifically, it’s for the folks who have to cater to this need. Even more specifically, we’re going to talk about a particular kind of reporting feature that these folks benefit greatly from, and that’s burst reporting.

The idea behind burst reporting is that, given a dataset, you generate one report for each chosen segment of that dataset. If you have a sales dataset that is broken up by region, you would use burst reporting to generate one report for each region in the sales dataset. This capability is valuable for analysts. We can’t usually send one big report to all the heads of regional sales: the different regions aren’t given access to each other’s data. Moreover, it can be cumbersome to navigate a large document composed mostly of information that’s irrelevant to you. By providing one report tailored to each region, we give everyone the exact data they need to remain informed. And by automating this process, we’re able to divvy out information quickly and efficiently. Good deal.

I hadn’t heard the term “burst reporting” until I saw it used by a company called Jaspersoft, a tech company that offers a myriad of BI tools for creating publication-quality reports and dashboards. It’s like Tableau or Power BI, but more geared towards PDF generation than web interfaces (though they’re covering the web space, too). While I was looking through their website, I wondered whether I could jerry-rig a useful burst reporting application using R, Quarto, and Typst. And so this post was born. We’re going to cover the solution I put together and evaluate its efficacy against some fictional corporate use cases. So let’s get to it.

Bird’s eye view

What we’re building is simple to describe. We need an application that can take a dataset, split the dataset into a bunch of subsets based on whatever dimensions we choose to split on, and generate a report for each of those subsets. In code, it’d look something like this.

library(tibble)
library(dplyr)

source("my-functions.R")  # defines load_dataset() and generate_report()

df <- load_dataset()

df |>
  group_by(region) |>
  group_walk(
    \(x, y) {
      # x: one region's rows; y: a one-row tibble of group keys
      generate_report(x, y)
    }
  )

We basically iterate over all the subsets created by grouping on region, calling the generate_report function on each subset. If you’re wondering about the second argument y in the anonymous function, it’s a tibble containing one row, with one column for each column passed to group_by. Anyway, at this level, the mechanics are pretty clear. To make this burst reporting app fast, there are two things to focus on:

1. Making generate_report as fast as possible.
2. Parallelizing the processing of our subsets.
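As a quick illustration of those group keys, here’s a toy run on a made-up sales tibble (the data and the printed messages are purely for demonstration, not from the real app):

```r
library(dplyr)
library(tibble)

# Toy stand-in for the sales dataset (values are illustrative).
df <- tibble(
  region = c("East", "East", "West"),
  sales  = c(100, 150, 200)
)

# group_walk() hands the callback two things:
#   x — one group's rows, minus the grouping column
#   y — a one-row tibble of that group's keys
df |>
  group_by(region) |>
  group_walk(\(x, y) {
    cat("region =", y$region, "->", nrow(x), "row(s)\n")
  })
```

In the real app, the callback body would be the generate_report call instead of the cat.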

Making generate_report efficient

What’s going on inside of generate_report? This function is a thin wrapper around the quarto_render function from the {quarto} R package, which is itself just a wrapper around the Quarto CLI. It passes the group keys as Quarto parameters, which the document uses to filter the data down to the target subset before the PDF is rendered. In this way, the document is rendered with only that subset’s data. We could make this a little faster by preemptively writing each subset to disk as an .rda file and passing the .rda file’s path as the parameter instead. By doing this, we spare Quarto from filtering the dataset over and over, letting it focus on rendering the document. The table below shows the difference in rendering time between the two approaches.

| Approach | Reports (n) | Median total exec. time (min) | Median exec. time per report (sec) |
|---|---|---|---|
| Writing subsets to .rda files | 34 | 1.6 | 3.0 |
| Filtering dataset based on group keys | 34 | 1.8 | 3.2 |

We do see a slight improvement using the second approach. But is it worth it? It depends on your use case. If the dimension you’re bursting on has low cardinality (e.g. 10 or fewer values), this probably doesn’t matter much. But if you’re bursting on a high-cardinality dimension, like generating reports for individual patients across different hospitals, then things are different. Little things add up to big things, so any optimizations we make here can drive down processing time in a big way, which is important if we want our reporting system to flourish at scale.
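For concreteness, here’s a minimal sketch of what the two generate_report variants might look like. The template name report-template.qmd, the region and data_path parameter names, and the report_filename helper are all my own assumptions, not the author’s actual code:

```r
# Hypothetical helper: derive an output filename from the group keys.
report_filename <- function(keys) {
  paste0("report-", keys$region, ".pdf")
}

# Variant 1: pass the group key; the .qmd filters the full dataset itself.
generate_report <- function(data, keys) {
  quarto::quarto_render(
    input = "report-template.qmd",        # assumed template name
    output_file = report_filename(keys),
    execute_params = list(region = keys$region)
  )
}

# Variant 2: write the subset to an .rda up front and pass only its path,
# so the .qmd loads pre-filtered data instead of filtering the full dataset.
generate_report_rda <- function(data, keys) {
  subset_path <- file.path(tempdir(), paste0(keys$region, ".rda"))
  save(data, file = subset_path)
  quarto::quarto_render(
    input = "report-template.qmd",
    output_file = report_filename(keys),
    execute_params = list(data_path = subset_path)
  )
}
```

Either way, the .qmd template declares the corresponding key under params: in its YAML header and reads it via params$region or params$data_path.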

There are other things we could do to improve performance further. We could redesign the plots in our reports so they display aggregated data instead of the raw data. We could also swap the graphics device, reaching for something with performance advantages over the default device used by ggplot2 (e.g. ragg). While we may be able to squeeze more performance out of this system with these strategies, they come at the cost of flexibility. I want to see how far we can take performance without sacrificing flexibility. This will give us a clearer idea of how a solution like this compares to proprietary ones. So we’ll end our analysis of the generate_report function here, and turn our focus towards parallelizing the code using the {mirai} package.

Parallelizing the code using {mirai}

Generating reports over distinct subsets of a dataset is an example of what’s called an embarrassingly parallel problem. There’s no overlap in these subsets whatsoever, so we don’t have to worry about processing one report impacting the processing of any other reports. In this way, we can generate as many reports at any given time as we have cores. This sort of problem is well suited for a package like {mirai}, which we can use to launch these tasks in parallel. We’ll try that now, and see where it lands us in terms of performance enhancement.

There are a couple of things we need to handle for this to work properly. If you just call mirai({ # business logic here }), a majority of the reports will fail to generate. That’s because, when the background R sessions call generate_report, they each start writing files to the same directory, causing work to be overwritten and report compilation to fail. So we need each background R session to do its work inside a working directory that’s isolated from the other sessions.
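One way to get that isolation, sketched with {mirai} below. The directory naming and the placeholder body are my own assumptions; in the real app, the generate_report call would run where the comment indicates:

```r
library(mirai)

daemons(4)  # launch four background R sessions

regions <- c("East", "West", "North", "South")  # illustrative group keys

tasks <- lapply(regions, function(r) {
  mirai(
    {
      # each task works inside its own directory, so concurrent renders
      # can't overwrite each other's intermediate files
      work_dir <- file.path(tempdir(), paste0("report-", r))
      dir.create(work_dir, recursive = TRUE, showWarnings = FALSE)
      old_wd <- setwd(work_dir)
      on.exit(setwd(old_wd))
      # generate_report(<subset>, <keys>) would run here
      basename(getwd())
    },
    r = r  # named arguments are passed into the daemon's environment
  )
})

done <- vapply(tasks, \(t) t[], character(1))  # `[]` blocks until resolved
daemons(0)  # shut the daemons down
```

If generate_report lives in a script like my-functions.R, it also needs to be made available to every daemon, e.g. with mirai’s everywhere() helper.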

With that solved, we can compare the performance of the two implementations. The following table shows the difference in median execution time between them.

| Approach | Reports (n) | Median total exec. time (min) | Median exec. time per report (sec) |
|---|---|---|---|
| Without parallelism | 34 | 1.8 | 3.2 |
| With parallelism | 34 | 0.3 | 0.6 |

Based on total execution time, the parallelized version is six times faster than the original! Awesome. By fully utilizing the available cores on our machine, we were able to drive down the total execution time considerably.

Should we do this?

That’s the real question, right? Sure, we could do this, but will we be better off doing it? As with all important questions that hope for a clear-cut answer, it depends. By implementing a system like this, you save yourself from buying the proprietary software. You also get a solution that’s code-first, which is much easier to scale out through automation. Click-ops is great for enabling fast prototyping and letting non-technical team members get involved, but it can become painful to scale. We could identify all sorts of reasons for or against this approach to burst reporting, but I’m going to go about it another way. I’ll ask myself a series of questions with respect to whatever use cases I’m considering, and the answers will determine whether this is a viable alternative to purchasing proprietary software.

Suppose we’re preparing a reporting service for a hospital system that needs reports for each patient updated every hour. Across all of our customers, we’re looking at hundreds of patients a day. If we can produce reports for those hundreds of patients within 40 minutes without exceeding 80% CPU utilization, then I’d say this is worth doing. Otherwise, we’re probably better off going with a proprietary solution.

Conclusion

In this post, we’ve covered burst reporting, what it is and why it matters. We went through an example implementation, highlighting the key technologies involved, and detailing ways we can improve the performance of the basic implementation. Finally, we evaluated hand-rolling your own burst reporting solution instead of shelling out for a proprietary solution. I encourage you to try this out for yourself and see if this is a solution that works for you.