I saw that @yuanchun-li mentioned using visual methods to detect UI elements in https://arxiv.org/pdf/1901.02633.
Are you considering (or already trying) tagging UI elements with computer vision (CV) methods?
That way, apps whose UI structure cannot be obtained as XML markup, such as Flutter apps and web pages, could also be supported by this approach.
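To make the idea concrete, here is a minimal sketch of screenshot-only UI element tagging. It is a classical connected-component heuristic, not the method from the linked paper or from any existing tool. It assumes a grayscale screenshot given as a list of rows of 0–255 ints and a mostly flat background color. The function name `detect_ui_boxes` and its parameters (`bg`, `tol`, `min_area`) are hypothetical.

```python
from typing import List, Tuple

# (left, top, right, bottom), all inclusive pixel coordinates
Box = Tuple[int, int, int, int]


def detect_ui_boxes(gray: List[List[int]], bg: int = 255,
                    tol: int = 10, min_area: int = 4) -> List[Box]:
    """Hypothetical helper: return bounding boxes of connected regions
    whose pixels differ from the assumed flat background color `bg`
    by more than `tol`. Regions smaller than `min_area` pixels are
    treated as noise and dropped."""
    h = len(gray)
    w = len(gray[0]) if h else 0
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []

    def is_fg(x: int, y: int) -> bool:
        # A pixel counts as foreground if it is far enough from the background.
        return abs(gray[y][x] - bg) > tol

    for y in range(h):
        for x in range(w):
            if seen[y][x] or not is_fg(x, y):
                continue
            # Flood fill one region (4-connectivity) using an explicit stack.
            seen[y][x] = True
            stack = [(x, y)]
            left, top, right, bottom = x, y, x, y
            area = 0
            while stack:
                cx, cy = stack.pop()
                area += 1
                left, right = min(left, cx), max(right, cx)
                top, bottom = min(top, cy), max(bottom, cy)
                for nx, ny in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                    if 0 <= nx < w and 0 <= ny < h and not seen[ny][nx] and is_fg(nx, ny):
                        seen[ny][nx] = True
                        stack.append((nx, ny))
            if area >= min_area:
                boxes.append((left, top, right, bottom))
    return boxes


def box_center(box: Box) -> Tuple[int, int]:
    """Hypothetical helper: a tap point for a detected element."""
    left, top, right, bottom = box
    return ((left + right) // 2, (top + bottom) // 2)
```

Each detected box could then stand in for a node of the XML hierarchy, for example by tapping at `box_center(box)`. This heuristic only holds up on flat, high-contrast layouts. A real CV-based tagger would more likely use a trained object detector, and that would be a design decision for the project.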