Load RTTMs with keep_default_na=False to avoid nan speaker label #115

Open
cifkao wants to merge 3 commits into pyannote:develop from cifkao:patch-1

cifkao commented Jan 22, 2026

Currently, load_rttm calls pd.read_csv() with keep_default_na=True. As a result, speaker labels such as "<NA>", "nan", "NA", "N/A", and "NULL" are all parsed as the float NaN value.
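To make the effect concrete, here is a minimal sketch that reads a few RTTM-style lines with pandas' default NA handling. The column names, the sample lines, and the dtype mapping are made up for illustration and are not taken from load_rttm itself.

```python
import io

import pandas as pd

# Hypothetical column names for the ten space-separated RTTM fields.
NAMES = [
    "type", "uri", "channel", "start", "duration",
    "ortho", "stype", "speaker", "conf", "slat",
]

# Made-up sample: three of the four speakers have labels that pandas
# treats as missing-value markers by default.
RTTM_TEXT = (
    "SPEAKER file1 1 0.00 1.50 <NA> <NA> alice <NA> <NA>\n"
    "SPEAKER file1 1 1.50 2.00 <NA> <NA> NA <NA> <NA>\n"
    "SPEAKER file1 1 3.50 1.00 <NA> <NA> nan <NA> <NA>\n"
    "SPEAKER file1 1 4.50 1.00 <NA> <NA> NULL <NA> <NA>\n"
)

# keep_default_na=True is the pandas default; it is spelled out here
# only to mirror the behaviour described above.
df_default = pd.read_csv(
    io.StringIO(RTTM_TEXT),
    names=NAMES,
    sep=" ",
    dtype={"uri": str, "speaker": str},
    keep_default_na=True,
)

# True where the speaker label was swallowed as a missing value.
lost_labels = df_default["speaker"].isna().tolist()
print(lost_labels)
```

In this sketch only "alice" survives as a string; the "NA", "nan", and "NULL" labels come back as missing values even though the column was requested as str.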

The only value that can legitimately represent a missing value in an RTTM file appears to be <NA>. Even so, it would be unexpected in most of the meaningful columns, and reading it as a float is clearly incorrect for the string columns (uri, speaker).

I therefore propose passing keep_default_na=False and na_values=["<NA>"], and converting NaN values to None for uri and speaker.
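A minimal sketch of this approach, assuming a simplified reader rather than the actual load_rttm code: the helper names read_rttm_frame and none_if_missing, the column names, and the sample lines are all hypothetical.

```python
import io

import pandas as pd

# Hypothetical column names for the ten space-separated RTTM fields.
NAMES = [
    "type", "uri", "channel", "start", "duration",
    "ortho", "stype", "speaker", "conf", "slat",
]


def read_rttm_frame(source):
    """Read RTTM lines, treating only the literal <NA> as missing."""
    return pd.read_csv(
        source,
        names=NAMES,
        sep=" ",
        dtype={"uri": str, "speaker": str},
        keep_default_na=False,   # do not treat "NA", "nan", "NULL", ... as missing
        na_values=["<NA>"],      # the only marker still read as missing
    )


def none_if_missing(value):
    """Map a pandas missing value to None; leave everything else as-is."""
    return None if pd.isna(value) else value


# Made-up sample: the last row has a genuinely missing speaker field.
RTTM_TEXT = (
    "SPEAKER file1 1 0.00 1.50 <NA> <NA> alice <NA> <NA>\n"
    "SPEAKER file1 1 1.50 2.00 <NA> <NA> NA <NA> <NA>\n"
    "SPEAKER file1 1 3.50 1.00 <NA> <NA> nan <NA> <NA>\n"
    "SPEAKER file1 1 4.50 1.00 <NA> <NA> NULL <NA> <NA>\n"
    "SPEAKER file1 1 5.50 1.00 <NA> <NA> <NA> <NA> <NA>\n"
)

df_fixed = read_rttm_frame(io.StringIO(RTTM_TEXT))

# Convert to plain Python values so a missing label is None, not a float.
speakers = [none_if_missing(v) for v in df_fixed["speaker"]]
uris = [none_if_missing(v) for v in df_fixed["uri"]]
print(speakers)
```

With these options the labels "NA", "nan", and "NULL" stay ordinary strings, and only the row whose speaker field is literally <NA> ends up as None. The conversion is done on plain Python lists here to keep the sketch independent of how a given pandas version stores None in a string column.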
