Note: The model assumes a discrete latent variable (the cluster variable) that changes the parameters of the normal distribution for part of the population.
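Written out under assumptions the note leaves implicit (a univariate outcome y_i, K clusters, and cluster probabilities π_k that do not depend on covariates), this is a finite mixture of normals. Here Z_i is the latent cluster of observation i and φ(·; μ, σ²) is the normal density:

```latex
f(y_i) = \sum_{k=1}^{K} \pi_k \, \phi\!\left(y_i;\, \mu_k, \sigma_k^2\right),
\qquad \pi_k = \Pr(Z_i = k),
\qquad \sum_{k=1}^{K} \pi_k = 1
```

With K = 2 only one probability parameter is free, since π_2 = 1 − π_1.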
A three-stage algorithm for estimating heterogeneous treatment effects in populations made up of groups with distinct distributions:
Stage 1, sub-stage 1: initial maximum-likelihood estimation of the normal distribution parameters.
Stage 1, sub-stage 2: initial maximum-likelihood estimation of the cluster probability parameters.
Sub-stages 1 and 2 are iterated, each time holding the parameters from the other sub-stage fixed. This is done for purely practical reasons: estimating the full model from the outset often failed to converge.
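The alternating scheme of stage 1 can be sketched with SciPy as follows. This is a minimal sketch assuming two clusters, a single constant probability pi = P(cluster 2) estimated on the logit scale, and log standard deviations so that sigma stays positive. The function names (neg_loglik, fit_stage1), the quantile-based starting values, and the stopping rule are illustrative choices, not taken from the original:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def neg_loglik(mu, log_sigma, a, y):
    """Negative log-likelihood of a two-component normal mixture.

    a is the logit of pi = P(cluster 2); pi is assumed constant across observations.
    """
    sigma = np.exp(log_sigma)
    log_w1 = -np.logaddexp(0.0, a)   # log(1 - pi), computed stably
    log_w2 = -np.logaddexp(0.0, -a)  # log(pi), computed stably
    ll = np.logaddexp(log_w1 + norm.logpdf(y, mu[0], sigma[0]),
                      log_w2 + norm.logpdf(y, mu[1], sigma[1]))
    return -ll.sum()


def fit_stage1(y, n_iter=50, tol=1e-4):
    """Alternate sub-stages 1 and 2 until the likelihood stops improving."""
    y = np.asarray(y, dtype=float)
    mu = np.quantile(y, [0.25, 0.75])         # crude starting means (assumption)
    log_sigma = np.full(2, np.log(y.std()))   # both clusters start with the overall spread
    a, prev = 0.0, np.inf                     # a = 0 means pi = 0.5 to start
    for _ in range(n_iter):
        # Sub-stage 1: normal parameters, probability parameter held fixed.
        r1 = minimize(lambda th: neg_loglik(th[:2], th[2:], a, y),
                      np.concatenate([mu, log_sigma]), method="Nelder-Mead",
                      options={"maxiter": 5000})
        mu, log_sigma = r1.x[:2], r1.x[2:]
        # Sub-stage 2: probability parameter, normal parameters held fixed.
        r2 = minimize(lambda th: neg_loglik(mu, log_sigma, th[0], y),
                      np.array([a]), method="Nelder-Mead")
        a = r2.x[0]
        if prev - r2.fun < tol:  # objective no longer decreasing meaningfully
            break
        prev = r2.fun
    return mu, np.exp(log_sigma), 1.0 / (1.0 + np.exp(-a))
```

For example, on a sample drawn 70/30 from N(0, 1) and N(5, 1), fit_stage1 should return means near 0 and 5, standard deviations near 1, and pi near 0.3; these estimates then serve as starting values for the full model.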
Stage 2: use the parameters from the last iteration of sub-stages 1 and 2 as starting values to estimate the full model by maximum likelihood.
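A sketch of stage 2 under the same assumptions (two clusters, constant pi on the logit scale): all five parameters are now optimized jointly, starting from the stage-1 values. The parameter ordering in theta and the name fit_full are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm


def neg_loglik(theta, y):
    """Negative log-likelihood of a two-component normal mixture.

    theta = (mu1, mu2, log_sigma1, log_sigma2, a), where a = logit(pi)
    and pi = P(cluster 2) is assumed constant across observations.
    """
    mu, sigma, a = theta[:2], np.exp(theta[2:4]), theta[4]
    log_w1 = -np.logaddexp(0.0, a)   # log(1 - pi)
    log_w2 = -np.logaddexp(0.0, -a)  # log(pi)
    ll = np.logaddexp(log_w1 + norm.logpdf(y, mu[0], sigma[0]),
                      log_w2 + norm.logpdf(y, mu[1], sigma[1]))
    return -ll.sum()


def fit_full(y, start):
    """Estimate all parameters jointly, starting from the stage-1 values."""
    res = minimize(neg_loglik, np.asarray(start, dtype=float),
                   args=(np.asarray(y, dtype=float),),
                   method="Nelder-Mead", options={"maxiter": 10000})
    mu, sigma = res.x[:2], np.exp(res.x[2:4])
    pi = 1.0 / (1.0 + np.exp(-res.x[4]))  # back-transform the logit
    return mu, sigma, pi
```

The only difference from stage 1 is that nothing is held fixed; the stage-1 estimates serve purely as starting values.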
Stage 3: for every observation, compute the conditional (posterior) probability of belonging to each cluster via Bayes' theorem, using the fitted density functions.
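By Bayes' theorem, the posterior for observation i and cluster k is p_ik = π_k φ(y_i; μ_k, σ_k²) / Σ_j π_j φ(y_i; μ_j, σ_j²). A minimal NumPy/SciPy sketch follows, computed on the log scale to avoid underflow for observations far from a cluster mean; the function name and array layout are assumptions:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm


def posterior_probs(y, mu, sigma, pi):
    """Return an (n, K) array with P(Z_i = k | y_i) via Bayes' theorem.

    mu, sigma, pi are length-K sequences of fitted means, standard deviations,
    and cluster probabilities (pi must sum to 1).
    """
    y = np.asarray(y, dtype=float)[:, None]                      # shape (n, 1)
    log_joint = np.log(pi) + norm.logpdf(y, mu, sigma)           # log(pi_k * phi_k(y_i)), shape (n, K)
    log_marginal = logsumexp(log_joint, axis=1, keepdims=True)   # log f(y_i), shape (n, 1)
    return np.exp(log_joint - log_marginal)                      # each row sums to 1
```

For example, with two equally weighted clusters of equal spread centred at 0 and 5, an observation at 2.5 gets probability 0.5 for each cluster, while observations at 0 or 5 are assigned almost entirely to the nearer cluster.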
Use these predicted probabilities to estimate the heterogeneous treatment effect (HTE) by OLS. In the case of two clusters:
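One possible two-cluster specification, offered as a hypothetical illustration rather than the author's equation, regresses the outcome on treatment T_i, the predicted probability p_i of belonging to cluster 2, and their interaction: y_i = b0 + b1 T_i + b2 p_i + b3 T_i p_i + e_i. When p_i is close to 0 or 1, b1 approximates the treatment effect in cluster 1 and b1 + b3 the effect in cluster 2. The function name hte_ols is an assumption:

```python
import numpy as np


def hte_ols(y, treat, p):
    """OLS of y on [1, T, p, T*p]; returns the coefficients (b0, b1, b2, b3).

    Hypothetical two-cluster specification: the effect in cluster 1 is b1,
    and in cluster 2 it is b1 + b3.
    """
    y, treat, p = (np.asarray(a, dtype=float) for a in (y, treat, p))
    X = np.column_stack([np.ones_like(y), treat, p, treat * p])  # design matrix
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)                 # least-squares fit
    return beta
```

For example, if y = 1 + 2z + T(0.5 + 1.5z) with no noise and p is set to the true cluster indicator z, hte_ols returns approximately (1, 0.5, 2, 1.5), i.e. an effect of 0.5 in cluster 1 and 2.0 in cluster 2.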
This is only a proof of concept; further work will be needed to relax its rather restrictive assumptions.