Make use of those multi-line "images from text" responses. Cards/suggestions could work like buttons that are activated by voice. Tic-tac-toe?