Consider high-resolution images, such as a 20k x 12k image captured in a nadir view with the objects far away, at distances of 800 m to 1500 m. Also, the images in the datasets are mostly collected from video files, so each image has high overlap with the previous and next ones. How will this model perform under a practical capture condition with 60% overlap, high-resolution images, and distant objects?