Feature request: Add fenn.tabular module for data exploration #22

@blkdmr

Description

Introduce a new fenn.tabular module that provides a small, composable toolbox for tabular data exploration built on top of pandas and NumPy. This should focus on ergonomic helpers for common Exploratory Data Analysis (EDA) patterns rather than re-implementing core pandas/NumPy features.

Goal

Create an initial version of fenn.tabular with a clear, well-documented API to simplify early-stage data exploration workflows. The module should make it easy to quickly inspect, summarize, and sanity-check datasets represented as pandas.DataFrame or numpy.ndarray objects.

Proposed features

Ideas for a first iteration (not all are mandatory for a single PR):

  • summary(dataframe): one-shot overview combining shape, dtypes, basic stats, missing-value counts, and simple cardinality info for categorical-like columns.
  • quick_sample(dataframe): convenience wrapper around head/random sampling with optional column subset and seed.
  • missing_report(dataframe): compact report of missing-value counts and percentages per column, with flags for all-null or almost-all-null columns.
  • unique_report(dataframe): show number of unique values per column and, for low-cardinality columns, a small frequency table.
  • numeric_profile(dataframe): describe numeric columns only (min, max, mean, std, quantiles) with optional clipping of extreme quantiles.
  • corr_overview(dataframe): compute correlations between numeric columns and return the strongest pairs as a tidy table.
  • array_summary(ndarray): NumPy-oriented helper for shape, dtype, basic stats, and NaN checks on ndarray.
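
To make the proposal more concrete, here is a minimal sketch of what two of these helpers might look like. Nothing here is settled API: the parameter names (almost_all_threshold, top), the output column names, and the 0.95 default are assumptions made only for illustration.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame, almost_all_threshold: float = 0.95) -> pd.DataFrame:
    """Per-column missing-value report (hypothetical signature and column names)."""
    is_na = df.isna()
    frac = is_na.mean()  # fraction of missing rows per column
    return pd.DataFrame(
        {
            "n_missing": is_na.sum(),
            "frac_missing": frac,
            "all_null": frac == 1.0,
            # Threshold is an assumed default, not something the issue specifies.
            "almost_all_null": frac >= almost_all_threshold,
        }
    )


def corr_overview(df: pd.DataFrame, top: int = 5) -> pd.DataFrame:
    """Strongest pairwise correlations among numeric columns, as a tidy table."""
    corr = df.select_dtypes(include="number").corr()
    cols = corr.columns
    # Strict upper triangle: each pair once, no self-correlations.
    i, j = np.triu_indices(len(cols), k=1)
    pairs = pd.DataFrame(
        {"col_a": cols[i], "col_b": cols[j], "corr": corr.to_numpy()[i, j]}
    ).dropna(subset=["corr"])
    # Rank by absolute value so strong negative correlations are not hidden.
    pairs = pairs.sort_values("corr", key=lambda s: s.abs(), ascending=False)
    return pairs.head(top).reset_index(drop=True)
```

Both functions build and return new DataFrames, so the input is left untouched. Ranking by absolute correlation is one possible reading of "strongest pairs"; a signed ranking would be an equally valid choice to discuss.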

These functions should be pure utilities and should not alter the input objects in place. Whether to offer optional plotting is still an open question.
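
As an illustration of the no-mutation requirement, a quick_sample helper could be sketched as follows. The signature (n, columns, seed, random) is hypothetical; the point is only that the function returns a fresh copy rather than touching its input.

```python
from typing import Optional, Sequence

import pandas as pd


def quick_sample(
    df: pd.DataFrame,
    n: int = 5,
    columns: Optional[Sequence[str]] = None,
    seed: Optional[int] = None,
    random: bool = True,
) -> pd.DataFrame:
    """Return up to n rows as a new DataFrame (hypothetical signature)."""
    view = df if columns is None else df[list(columns)]
    n = min(n, len(view))  # never ask for more rows than exist
    out = view.sample(n=n, random_state=seed) if random else view.head(n)
    # Explicit copy so later edits to the result cannot reach the caller's data.
    return out.copy()
```

Passing the same seed twice yields the same rows, which keeps exploratory notebooks reproducible.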

Tasks

  • Create the fenn/tabular/__init__.py module and basic package structure.
  • Implement a first subset of utilities (for example: summary, missing_report, array_summary).
  • Add type hints and docstrings with small usage examples.
  • Add unit tests with small, synthetic DataFrames/arrays.
  • Integrate the new module into the public API (if applicable) and update the documentation/README with a short usage section.
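
For the type-hint, docstring, and unit-test tasks, a possible shape is sketched below using array_summary. The returned dictionary keys, the decision to skip statistics for non-numeric dtypes, and the pytest-style test name are all assumptions, not an agreed design.

```python
from typing import Any, Dict

import numpy as np


def array_summary(arr: np.ndarray) -> Dict[str, Any]:
    """Summarize shape, dtype, NaN count, and basic stats of an array.

    Example (hypothetical API):
        info = array_summary(np.array([1.0, np.nan, 3.0]))
        info["n_nan"]  # number of NaN entries
    """
    arr = np.asarray(arr)  # does not modify the caller's object
    info: Dict[str, Any] = {"shape": arr.shape, "dtype": str(arr.dtype), "n_nan": 0}
    is_float = np.issubdtype(arr.dtype, np.floating)
    is_int = np.issubdtype(arr.dtype, np.integer)
    if is_float:
        info["n_nan"] = int(np.isnan(arr).sum())
    # Only compute stats when at least one non-NaN numeric value exists.
    if (is_float or is_int) and arr.size > info["n_nan"]:
        info["min"] = float(np.nanmin(arr))
        info["max"] = float(np.nanmax(arr))
        info["mean"] = float(np.nanmean(arr))
        info["std"] = float(np.nanstd(arr))
    return info


def test_array_summary_counts_nans() -> None:
    """Small synthetic input, in the style a pytest suite might use."""
    arr = np.array([[1.0, np.nan], [3.0, 5.0]])
    info = array_summary(arr)
    assert info["shape"] == (2, 2)
    assert info["n_nan"] == 1
    assert info["min"] == 1.0 and info["max"] == 5.0
    assert info["mean"] == 3.0
```

Using the NaN-aware reductions means a single missing value does not turn every statistic into NaN; whether that should be the default behaviour is worth deciding in review.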

Contributing

If you want to work on this:

  • Please read the CONTRIBUTING guide before starting.
  • Comment on this issue to claim a part of it, and join the Discord server to discuss which subset of functions you plan to implement first.
