Firm up the definition of ColumnObject.type #1

@TomAugspurger

The ColumnObject has a loosely specified type field that's intended to capture the column's dtype:

> Data type of the column. If using a file format with a type system (like Parquet), we recommend you use those types.

This definition is pretty vague, but maybe that's OK; it depends on what the field is used for. When the Asset is a typed file format like Parquet, the type in ColumnObject is irrelevant when you're actually loading the data, since the file carries its own types. In that scenario, I think it's mostly useful for humans learning about the data. But if the Asset is something like a CSV, a reader might want to use the types from table:columns to avoid inferring a data type from the values.
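
To make the CSV case concrete, here is a minimal sketch of a reader that takes its dtypes from table:columns instead of inferring them. The type names (`int64`, `float64`, `string`, `bool`), the `CONVERTERS` mapping, and the `read_typed_csv` helper are all assumptions made for illustration, as is the use of a `name` key to identify each column. The spec does not currently say which values type can take, which is exactly the gap this issue is about.

```python
import csv
import io

# Hypothetical mapping from ColumnObject.type strings to Python converters.
# The spec does not define this vocabulary; these names are assumed.
CONVERTERS = {
    "int64": int,
    "float64": float,
    "string": str,
    "bool": lambda v: v.strip().lower() == "true",
}


def read_typed_csv(text, columns):
    """Parse CSV text, converting each field with its declared column type."""
    by_name = {c["name"]: CONVERTERS[c["type"]] for c in columns}
    reader = csv.DictReader(io.StringIO(text))
    return [
        {name: by_name[name](value) for name, value in row.items()}
        for row in reader
    ]


# Column metadata as it might appear under table:columns (assumed shape).
columns = [
    {"name": "id", "type": "int64"},
    {"name": "zip_code", "type": "string"},
    {"name": "area", "type": "float64"},
]

rows = read_typed_csv("id,zip_code,area\n1,02139,12.5\n", columns)
print(rows)
```

Here `zip_code` stays a string and keeps its leading zero, which value-based inference would likely turn into an integer. That only works if the reader and the provider agree on what the type strings mean.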

IMO, we have a few choices:

  1. Leave type as is: don't take a strong stance on what values type can take. Make it clear to users that this field is primarily informational.
  2. Adopt a specific type system (e.g. parquet or json-table-schema's) and require users to translate the Asset's type to that system.
  3. Drop type in favor of multiple fields like parquet_type, arrow_type, jsontableschema_type, etc. Let the provider choose which ones they provide.
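
As a rough sketch of how a reader might cope with option 3 while still tolerating option 1, the snippet below looks for per-system type fields in a preferred order and falls back to the free-form type. The `resolve_type` helper, the `TYPE_FIELDS` ordering, and the example values are hypothetical, not part of any spec.

```python
# Hypothetical per-system fields from option 3, in this reader's order of preference.
TYPE_FIELDS = ("arrow_type", "parquet_type")


def resolve_type(column, preferred=TYPE_FIELDS):
    """Return (field, value) for the first type field the provider supplied."""
    for field in preferred:
        if field in column:
            return field, column[field]
    # Option 1 fallback: the free-form, primarily informational `type` field.
    return "type", column.get("type")


col_a = {"name": "area", "parquet_type": "DOUBLE"}
col_b = {"name": "area", "type": "float"}

print(resolve_type(col_a))
print(resolve_type(col_b))
```

The trade-off is visible here: option 3 pushes the choice of type system onto each reader, whereas option 2 would let readers rely on a single, well-defined vocabulary.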

Labels: question (Further information is requested)
