There. I spilled the not-so-big secret. Excel files from Excel 2007 and above (.xlsx) are automatically compressed. A feature which, in all my years of using Excel, I never knew about.

I once received a large Excel file from finance for analysis. Normally I would convert said file to CSV (comma-separated values) as the latter:

  1. …is just data, no formatting. Exactly what I need for a data extract and nothing more.
  2. …tends to be more malleable across multiple applications.
  3. …and because of #s 1 and 2, tends to have a smaller file size.

So imagine my surprise when, upon converting to CSV, my 29 MB file ballooned to 115 MB.


Usually it’s the other way around. With all the formatting and formulas removed, the file size usually shrinks.

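But apparently this is no longer the case when you have a lot of data. An .xlsx file is a ZIP archive of XML, so its contents are stored compressed, while a CSV is uncompressed plain text. Past a certain point, the compression saves more space than the formatting and formulas cost.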
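If you want to see the effect on your own workbook, Python's standard zipfile module can open an .xlsx directly. Here is a minimal sketch that adds up the compressed and uncompressed sizes of everything inside the archive; the function name `zip_size_summary` is just my label for illustration.

```python
import zipfile


def zip_size_summary(path):
    """Return (compressed_bytes, uncompressed_bytes) summed over all members."""
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
        # compress_size is the size stored inside the archive,
        # file_size is the size of the member once extracted.
        compressed = sum(info.compress_size for info in infos)
        uncompressed = sum(info.file_size for info in infos)
    return compressed, uncompressed
```

Calling `zip_size_summary("FY2018.xlsx")` on a data-heavy workbook should return a second number far larger than the first. Note that the uncompressed figure measures the XML inside the workbook, not the CSV you would export, so treat it as an indication rather than a prediction of the CSV size.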

Fortunately .xlsx is compatible with Power BI, which is where I was going to plug the data into anyway. I let the file type stay as is.

Makes for a convincing argument for utilizing the Microsoft suite, eh?

(And in case your answer is no, let me argue that even technology research group Gartner agrees with me by crowning Microsoft king in business intelligence and analytics platforms.)


  1. By the way, you can also use advzip from AdvanceCOMP to recompress the files in place (although a backup should be done first in case the computer crashes in the middle of the process).

    For example, the command advzip -z --shrink-insane FY2018.xlsx will recompress the file using an extreme (and slow) compression algorithm.

    These are my results with a 1 MB file on an Intel i7-3770 CPU.

    Algorithm   Size        Size [%]   Time [s]
    N/A         1,056,266   100.00%     N/A
    store       9,027,710   854.68%     0.029s
    fast          926,150    87.68%     1.018s
    normal        710,561    67.27%     1.357s
    extra         710,527    67.27%    11.210s
    insane        707,581    66.99%    37.896s

    The first line represents the case when nothing is done with the file.
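The backup-then-recompress advice from the comment above could be scripted roughly like this. It is a minimal sketch, assuming advzip is installed and on the PATH. The function name `backup_then_recompress`, the `.bak` suffix, and the injectable `run` parameter are my own choices for illustration, not part of AdvanceCOMP. The flags are the ones from the command quoted above, with the level names taken from the table.

```python
import shutil
import subprocess
from pathlib import Path

# Level names as they appear in the results table.
LEVELS = ("store", "fast", "normal", "extra", "insane")


def backup_then_recompress(path, level="insane", run=subprocess.run):
    """Copy the file to <name>.bak, then recompress the original in place."""
    if level not in LEVELS:
        raise ValueError(f"unknown level: {level!r}")
    src = Path(path)
    # Hypothetical naming scheme: FY2018.xlsx -> FY2018.xlsx.bak
    backup = src.with_name(src.name + ".bak")
    shutil.copy2(src, backup)
    cmd = ["advzip", "-z", f"--shrink-{level}", str(src)]
    # check=True raises CalledProcessError if advzip exits with an error.
    run(cmd, check=True)
    return backup, cmd
```

For example, `backup_then_recompress("FY2018.xlsx")` leaves an untouched FY2018.xlsx.bak next to the recompressed file. Delete the backup only after confirming the workbook still opens.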


