Description
Introduce a new fenn.tabular module that provides a small, composable toolbox for tabular data exploration built on top of pandas and NumPy. This should focus on ergonomic helpers for common Exploratory Data Analysis (EDA) patterns rather than re-implementing core pandas/NumPy features.
Goal
Create an initial version of fenn.tabular with a clear, well-documented API to simplify early-stage data exploration workflows. The module should make it easy to quickly inspect, summarize, and sanity-check datasets represented as pandas.DataFrame or numpy.ndarray objects.
Proposed features
Ideas for a first iteration (not all are mandatory for a single PR):
- summary(dataframe): one-shot overview combining shape, dtypes, basic stats, missing value counts, and simple cardinality info for categorical-like columns.
- quick_sample(dataframe): convenience wrapper around head/random sampling with optional column subset and seed.
- missing_report(dataframe): compact report of missing values per column, percentage, and flags for all-null or almost-all-null columns.
- unique_report(dataframe): show number of unique values per column and, for low-cardinality columns, a small frequency table.
- numeric_profile(dataframe): describe numeric columns only (min, max, mean, std, quantiles) with optional clipping of extreme quantiles.
- corr_overview(dataframe): compute correlations between numeric columns and return the strongest pairs as a tidy table.
- array_summary(ndarray): NumPy-oriented helper for shape, dtype, basic stats, and NaN checks on ndarray.
These functions should be pure utilities and should not alter the input objects in place. Whether any of them should offer optional plotting is an open question.
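To make the intended shape concrete, here is a rough sketch of what missing_report could look like. It is only an illustration: the almost_null_threshold parameter, the output column names, and the sort order are assumptions made for this example, not decisions already taken for fenn.tabular.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame, almost_null_threshold: float = 0.95) -> pd.DataFrame:
    """Return per-column missing counts, percentages, and null flags.

    The input DataFrame is only read, never modified.
    `almost_null_threshold` is a hypothetical parameter: the fraction of
    missing values at or above which a column is flagged as almost all null.
    """
    n_rows = len(df)
    missing = df.isna().sum()
    # Avoid dividing by zero when the frame has no rows.
    fraction = missing / n_rows if n_rows else missing.astype(float)
    report = pd.DataFrame(
        {
            "missing": missing,
            "missing_pct": (fraction * 100).round(2),
            "all_null": (missing == n_rows) & (n_rows > 0),
            "almost_all_null": fraction >= almost_null_threshold,
        }
    )
    # Columns with the most missing values come first.
    return report.sort_values("missing", ascending=False)


# Small synthetic example.
df = pd.DataFrame(
    {
        "a": [1.0, np.nan, 3.0, 4.0],
        "b": [np.nan, np.nan, np.nan, np.nan],
        "c": ["w", "x", "y", "z"],
    }
)
print(missing_report(df))
```

Returning a new DataFrame indexed by column name keeps the helper pure and lets callers filter or sort the report with ordinary pandas operations.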
Tasks
- Create the fenn/tabular/__init__.py module and basic package structure.
- Implement a first subset of utilities (for example: summary, missing_report, array_summary).
- Add type hints and docstrings with small usage examples.
- Add unit tests with small, synthetic DataFrames/arrays.
- Integrate the new module into the public API (if applicable) and update the documentation/README with a short usage section.
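As a reference point for the tasks above, the following sketch shows one possible array_summary with type hints, a docstring example, and a pytest-style unit test on a small synthetic array. The returned dictionary keys and the choice to compute stats only for integer and float dtypes are assumptions for illustration, not a settled API.

```python
import numpy as np


def array_summary(arr: np.ndarray) -> dict:
    """Summarize shape, dtype, NaN count, and basic stats of an array.

    The dictionary keys used here are illustrative assumptions.

    Example:
        >>> array_summary(np.array([1.0, 2.0, np.nan]))["nan_count"]
        1
    """
    arr = np.asarray(arr)
    summary = {"shape": arr.shape, "dtype": str(arr.dtype), "size": arr.size}
    # Only integer and float arrays get numeric stats in this sketch.
    if arr.dtype.kind in "iuf" and arr.size > 0:
        values = arr.astype(float)  # astype returns a copy, so the input is untouched
        nan_count = int(np.isnan(values).sum())
        summary["nan_count"] = nan_count
        if nan_count < values.size:
            # NaN-aware reductions ignore missing entries.
            summary["min"] = float(np.nanmin(values))
            summary["max"] = float(np.nanmax(values))
            summary["mean"] = float(np.nanmean(values))
            summary["std"] = float(np.nanstd(values))
    return summary


def test_array_summary_counts_nans_without_mutation() -> None:
    data = np.array([[1.0, np.nan], [3.0, 4.0]])
    original = data.copy()
    result = array_summary(data)
    assert result["shape"] == (2, 2)
    assert result["nan_count"] == 1
    assert result["min"] == 1.0
    assert result["max"] == 4.0
    # The helper must not alter its input.
    assert np.array_equal(data, original, equal_nan=True)
```

Tests in this style need no fixtures or files, which keeps them fast and makes the expected values easy to verify by hand.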
Contributing
If you want to work on this:
- Please read the CONTRIBUTING guide before starting.
- Comment on this issue to claim a part of it, and join the Discord server to discuss which subset of functions you plan to implement first.