To download into repository root:
- Sample_Program_Service_Data.csv - download from Amazon. Sample of pre-aggregated financial and text data from Form 990s.
In repository:
- manula_datakind_labels.csv - ~345 hand-labeled SDG rows (about ten per SDG)
- NTEE_EINs_EOBMF.csv - maps between EINs and the NTEE taxonomy
- goals.txt - contains the SDGs and their descriptions
This code requires Python 3 and pip. We recommend using virtual environments (via virtualenv or conda).
- Run pip install -r requirements.txt to get all Python dependencies.
- 990_analysis.ipynb - requires the missing files nonprofit-descriptions_2016.csv and NTEE_EINs.csv
- merged_data_models.ipynb - trains and evaluates a set of NLP models on the sample dataset to predict SDGs
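Script to produce NTEE_EINs_EOBMF.csv from the IRS Exempt Organizations Business Master File (EO BMF):
import pandas as pd

# Keep only the EIN and NTEE code columns, renamed to the headers used in this repository.
df = pd.read_csv("irs_eobmf.csv")
df = df[["ein", "ntee_cd"]].rename(columns={"ein": "EIN", "ntee_cd": "NTEE"})
df.to_csv("NTEE_EINs_EOBMF.csv", index=False)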
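Once the mapping file exists, it can be read back as a plain EIN-to-NTEE lookup. The following is a minimal sketch using only the standard library, assuming the EIN and NTEE column headers written by the script above. The helper name `load_ntee_lookup` and the sample rows are hypothetical and serve only as illustration. EINs are nine digits long, so the sketch pads them with leading zeros in case they were stored as integers.

```python
import csv
import io


def load_ntee_lookup(csv_file):
    """Return a dict mapping zero-padded 9-digit EIN strings to NTEE codes."""
    lookup = {}
    for row in csv.DictReader(csv_file):
        ntee = (row.get("NTEE") or "").strip()
        if not ntee:
            continue  # skip organizations that have no NTEE code
        # Restore leading zeros that are lost when EINs are saved as numbers.
        ein = row["EIN"].strip().zfill(9)
        lookup[ein] = ntee
    return lookup


# Made-up rows for illustration only; these are not real organizations.
sample = io.StringIO("EIN,NTEE\n10000001,B20\n123456789,\n987654321,P30\n")
lookup = load_ntee_lookup(sample)
print(lookup)
```

To use it on the real file, pass an open file handle instead of the in-memory sample, for example `open("NTEE_EINs_EOBMF.csv", newline="")`.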