Pivoting tables in MariaDB (and awk)

While raw data can be useful, it’s not uncommon for Business Analysts, Project Managers, Product Owners, etc. to request data in different formats. Pivoting tables is a common request, because often, a different perspective on the data is more useful than the raw output of a SELECT *.

In this article, we’ll look at a few examples on dealing with common patterns that may appear in reporting tasks.

The Problem

Consider the following table: you already did the hard part, and counted up the number of times a product was traded:

product_id,culture,year,no_of_trades
1,UK,2022,100
2,DE,2011,50
3,UK,2013,93
4,ES,2024,204
5,ES,2024,78
6,AT,2022,34
7,NO,2022,11
8,PT,2025,60
9,ES,2025,149
10,DE,2018,182
11,IE,2016,55
12,GR,2020,11
13,NL,2014,201
14,FI,2017,5
15,DE,2020,107

Where

-- product_id = unique numerical id
-- culture = country issuing the product
-- year = year in which transaction happened
-- no_of_trades = number of trades for product per culture per year

Now this example table is only 15 products, and a handful of countries, with just a few columns, but of course, in reality, datasets will usually be much larger: orders of magnitude more products than just 15, dozens of countries, and years of history spanning decades.

The necessary SQL statements to re-create the test table will be provided in the appendix.

Summary by total trades for each culture

A simple first question would be “how many trades did we have in total, for each culture?”

Based on the table data of this problem, the following SQL statement should suffice:

SELECT culture,
    SUM(CASE WHEN culture='UK' THEN no_of_trades ELSE 0 END) AS UK,
    SUM(CASE WHEN culture='DE' THEN no_of_trades ELSE 0 END) AS DE,
    SUM(CASE WHEN culture='ES' THEN no_of_trades ELSE 0 END) AS ES,
    SUM(CASE WHEN culture='AT' THEN no_of_trades ELSE 0 END) AS AT,
    SUM(CASE WHEN culture='NO' THEN no_of_trades ELSE 0 END) AS NO,
    SUM(CASE WHEN culture='PT' THEN no_of_trades ELSE 0 END) AS PT,
    SUM(CASE WHEN culture='IE' THEN no_of_trades ELSE 0 END) AS IE,
    SUM(CASE WHEN culture='NL' THEN no_of_trades ELSE 0 END) AS NL,
    SUM(CASE WHEN culture='GR' THEN no_of_trades ELSE 0 END) AS GR,
    SUM(CASE WHEN culture='FI' THEN no_of_trades ELSE 0 END) AS FI
FROM trades
GROUP BY culture;

This works, but it’s not pleasant to maintain. Should a culture be added or removed, this SQL needs to be updated. If it’s part of a stored procedure, that likely means a full deployment approval and process.

However, if you are creating a report, and the user does not need to execute the queries, you can simplify the problem with a scripting language like awk and pipe the result into mailx:

awk 'BEGIN{FS=","; OFS=","; print "Culture,Sum of Trades"}(NR>1){a[$2]+=$4}END{for(x in a){print x,a[x]}}'

Output:

Culture,Sum of Trades
FI,5
NO,11
UK,193
AT,34
DE,339
PT,60
ES,431
GR,11
NL,201
IE,55

Let’s break down what happens here:

A simple one liner, which can do the process of this pivot in a single pass, keeping only the running totals in memory. This makes it well scalable for tasks like this. This is the approach that I use for these kinds of reporting tasks. Awk is almost always already installed on most unix-like systems, so you can always pipe query results into awk, and set up the report via a simple cron job.

Summary by total trades for each year

Now the user might ask a simple change, “Oh, sorry, I meant year, not culture.” In this case, the SQL query needs a lot of changes, since the WHEN statements now need to refer to the years. In SQL this is getting tedious… again, what if the years go back to 1980? You’re probably already reaching for awk to generate the SQL you were otherwise going to write anyway:

SELECT year,
    SUM(CASE WHEN year=2025 THEN no_of_trades ELSE 0 END) AS '2025',
    SUM(CASE WHEN year=2024 THEN no_of_trades ELSE 0 END) AS '2024',
    SUM(CASE WHEN year=2023 THEN no_of_trades ELSE 0 END) AS '2023'
    ...
FROM trades
GROUP BY year;

In awk, the only change that needs to be made, other than a title update, is to swap out which column is going to get summed, and that’s it.

awk 'BEGIN{FS=","; OFS=","; print "Year,Sum of Trades"}(NR>1){a[$3]+=$4}END{for(x in a){print x,a[x]}}'

Output:

Year,Sum of Trades
2011,50
2013,93
2014,201
2016,55
2017,5
2018,182
2020,118
2022,145
2024,282
2025,209

Summary of year vs culture

Now, an interesting follow up question would be to get a summary of year vs culture overview. A true 2x2 overview of everything. This is again, something that gets fairly tedious if the number of cultures are more than a handful than the example in this article, but here is the SQL solution:

SELECT year,
    SUM(CASE WHEN culture='UK' THEN no_of_trades ELSE 0 END) AS UK,
    SUM(CASE WHEN culture='DE' THEN no_of_trades ELSE 0 END) AS DE,
    SUM(CASE WHEN culture='ES' THEN no_of_trades ELSE 0 END) AS ES,
    SUM(CASE WHEN culture='AT' THEN no_of_trades ELSE 0 END) AS AT,
    SUM(CASE WHEN culture='NO' THEN no_of_trades ELSE 0 END) AS NO,
    SUM(CASE WHEN culture='PT' THEN no_of_trades ELSE 0 END) AS PT,
    SUM(CASE WHEN culture='IE' THEN no_of_trades ELSE 0 END) AS IE,
    SUM(CASE WHEN culture='NL' THEN no_of_trades ELSE 0 END) AS NL,
    SUM(CASE WHEN culture='GR' THEN no_of_trades ELSE 0 END) AS GR,
    SUM(CASE WHEN culture='FI' THEN no_of_trades ELSE 0 END) AS FI
FROM trades
GROUP BY year
ORDER BY year;

Output:

+------+------+------+------+------+------+------+------+------+------+------+
| year | UK   | DE   | ES   | AT   | NO   | PT   | IE   | NL   | GR   | FI   |
+------+------+------+------+------+------+------+------+------+------+------+
| 2011 |    0 |   50 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
| 2013 |   93 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
| 2014 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |  201 |    0 |    0 |
| 2016 |    0 |    0 |    0 |    0 |    0 |    0 |   55 |    0 |    0 |    0 |
| 2017 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    5 |
| 2018 |    0 |  182 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
| 2020 |    0 |  107 |    0 |    0 |    0 |    0 |    0 |    0 |   11 |    0 |
| 2022 |  100 |    0 |    0 |   34 |   11 |    0 |    0 |    0 |    0 |    0 |
| 2024 |    0 |    0 |  282 |    0 |    0 |    0 |    0 |    0 |    0 |    0 |
| 2025 |    0 |    0 |  149 |    0 |    0 |   60 |    0 |    0 |    0 |    0 |
+------+------+------+------+------+------+------+------+------+------+------+
10 rows in set (0.000 sec)

But can we reproduce this result in awk? Is it easy? Let’s compare.

Quick recap on printing 2x2 results with awk

This case gets slightly more complicated when it comes to scripting, since unlike before, it is now necessary to take care of a proper 2x2 output. The concept is not very difficult, even if the syntax might look a little daunting. The way to do this, is to use loops. One loop to print the column headers, and then one nested-loop to print out the rows.

Consider a simple awk script to print a small 2x2 table:

BEGIN{
    FS=",";
    OFS=",";

    # here is a synthetic 2x2 "report" with 4 elements
    t["A"1"]=10;
    t["A"2"]=20;
    t["B"1"]=30;
    t["B"2"]=40;

    rows[1]="A";
    rows[2]="B";
    cols[1]="1";
    cols[2]="2";

    printf "x";
    for (i=1; i<=length(cols); i++) {
        printf "%s%s", OFS, cols[i];
    }
    print ";

    for (j=1; j<=length(rows); j++) {
        printf "%s", rows[j];
        for (i=1; i<=length(cols); i++) {
            printf "%s%s", OFS, t[rows[j] cols[i]];
        }
        print ";
    }
}

Now going back to the problem, it should be more clear what is happening in the following snippet:

BEGIN{
    FS=",";
    OFS=",";
}
(NR>1) {
    t[$3$2]+=$4;
    yr[$3]++;
    c[$2]++;
}
END{
    n=asorti(c,c1);
    printf "x";
    for(x=1; x<=n; x++) {
        printf "%s%s",FS,c1[x];
    };
    print ";

    n2=asorti(yr,c2);
    for(y=1; y<=n2; y++) {
        printf "%s", c2[y];
        for(z=1; z<=n; z++) {
            printf "%s%s",FS,t[c2[y]c1[z]]
        };
        print ";
    }
}

Output:

x,AT,DE,ES,FI,GR,IE,NL,NO,PT,UK
2011,,50,,,,,,,,
2013,,,,,,,,,,93
2014,,,,,,,201,,,
2016,,,,,,55,,,,
2017,,,,5,,,,,,
2018,,182,,,,,,,,
2020,,107,,,11,,,,,
2022,34,,,,,,,11,,100
2024,,,282,,,,,,,
2025,,,149,,,,,,60,

The output will look like a big mess in the terminal, but you can redirect it to a csv, and mail it off, and libreoffice (or excel) will import it as a spreadsheet just fine. If you wish to add a 0 in case there’s no value, you can force awk to do so by sticking a +0 to the result set, in our case t[c2[y]c1[z]]+0, then the output looks like this:

x,AT,DE,ES,FI,GR,IE,NL,NO,PT,UK
2011,0,50,0,0,0,0,0,0,0,0
2013,0,0,0,0,0,0,0,0,0,93
2014,0,0,0,0,0,0,201,0,0,0
2016,0,0,0,0,0,55,0,0,0,0
2017,0,0,0,5,0,0,0,0,0,0
2018,0,182,0,0,0,0,0,0,0,0
2020,0,107,0,0,11,0,0,0,0,0
2022,34,0,0,0,0,0,0,11,0,100
2024,0,0,282,0,0,0,0,0,0,0
2025,0,0,149,0,0,0,0,0,60,0

Caveats

Conclusion

While MariaDB can pivot tables using conditional aggregation, maintaining these queries becomes tedious as columns and fields in a table might change. While a stored procedure with dynamic SQL generation might sidestep that problem, it may be a slower and more time-consuming approach than a simple awk script when deployment approvals and testing requirements come into the picture.

Appendix

These are the queries needed to create the table and data:

CREATE TABLE trades (
    product_id INT(11) NOT NULL AUTO_INCREMENT,
    culture VARCHAR(255) NOT NULL,
    year INT(11) NOT NULL,
    no_of_trades INT(11) NOT NULL,
    PRIMARY KEY (product_id)
) ENGINE=InnoDB;

INSERT INTO trades (product_id,culture,year,no_of_trades) VALUES
(1,'UK',2022,100),
(2,'DE',2011,50),
(3,'UK',2013,93),
(4,'ES',2024,204),
(5,'ES',2024,78),
(6,'AT',2022,34),
(7,'NO',2022,11),
(8,'PT',2025,60),
(9,'ES',2025,149),
(10,'DE',2018,182),
(11,'IE',2016,55),
(12,'GR',2020,11),
(13,'NL',2014,201),
(14,'FI',2017,5),
(15,'DE',2020,107);