Prevent massive read of data #40

@alexandre-lecoq

Description


Reproduction steps

  • Generate an Excel file using a query that reads 2,000,000 rows across several databases.

Observed

  • Reading the data takes hours, and merging it takes hours more. The result can be around 9 GB of data, and garbage collection takes hours as well. It is not possible to produce an Excel file in a reasonable time, if at all.

Expected

  • Refuse to attempt operations that cannot succeed, instead of running for hours and failing.

How to fix

  • Hard-limit the size of tables at every step (reading, merging, generating). If a table exceeds the hard limit, drop the table and log an error (see the sketch below).
  • Add two parameters, --maximumTableRows and --maximumTableColumns, to lower the limit to a smaller value.
  • The limit cannot be increased beyond the hard limit, only lowered.
  • The limit should not apply when results are discarded using --discardresults.

The hard limit should be 1,048,576 rows and 16,384 columns, per the Excel specifications.
See https://support.office.com/en-us/article/excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3
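
A minimal sketch of the intended limit check, in Python purely for illustration. The Table type, function names, and field names are hypothetical; --maximumTableRows, --maximumTableColumns and --discardresults are the parameters proposed above.

```python
import logging
from dataclasses import dataclass

# Hard limits per the Excel specifications (per worksheet).
EXCEL_MAX_ROWS = 1_048_576
EXCEL_MAX_COLUMNS = 16_384

@dataclass
class Table:
    """Minimal stand-in for a result table; only the counts matter here."""
    name: str
    row_count: int
    column_count: int

def effective_limits(maximum_table_rows=None, maximum_table_columns=None):
    """Combine --maximumTableRows / --maximumTableColumns with the hard limits.

    The limits can only be lowered; anything above the Excel limit is clamped.
    """
    rows = min(maximum_table_rows or EXCEL_MAX_ROWS, EXCEL_MAX_ROWS)
    columns = min(maximum_table_columns or EXCEL_MAX_COLUMNS, EXCEL_MAX_COLUMNS)
    return rows, columns

def enforce_limits(table, step, max_rows, max_columns, discard_results=False):
    """Check a table after one step (reading, merging, generating).

    Oversized tables are dropped and an error is logged; with --discardresults
    the check is skipped entirely.
    """
    if discard_results:
        return table
    if table.row_count > max_rows or table.column_count > max_columns:
        logging.error(
            "Dropping table '%s' after %s: %d rows x %d columns exceeds limit of %d x %d.",
            table.name, step, table.row_count, table.column_count, max_rows, max_columns)
        return None  # table is dropped; processing continues without it
    return table
```

Clamping with min() (rather than rejecting larger values) means that passing, say, --maximumTableRows 5000000 still results in the Excel hard limit, so the limit can never be raised above it.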

If #48 is implemented, this fix would be questionable.
