A Phrase Dataset with Difficulty Ratings Under Simulated Touchscreen Input

Dylan Gaines, Keith Vertanen

MobileHCI 2022 Workshop on Shaping Text Entry Research for 2030, 2022 (to appear).

We extract phrases from the web forum Reddit for use in text entry studies. We simulate the input of these phrases on a touchscreen keyboard with auto-correct and word completions, varying the input noise level and the language model size. We rate the difficulty of each phrase on a scale from 1 to 10 based on the character error rate of the simulation. We release the final phrases and metadata so that researchers can select phrases according to the needs of their study. We conjecture that more difficult phrases will be useful for testing interface features designed to help users detect, avoid, or correct recognition errors.
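As a rough illustration of how a difficulty rating could be derived from character error rate, the sketch below computes the error rate as the Levenshtein distance between a reference phrase and the simulated output, divided by the reference length, and then assigns ratings by sorting phrases into equal-sized rank bins. The function names and the equal-sized binning are assumptions made for this example; the abstract does not specify how error rates are mapped to the 1 to 10 scale.

```python
from typing import List


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(ref: str, hyp: str) -> float:
    """Edit distance normalized by the reference length."""
    if not ref:
        raise ValueError("reference phrase must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def difficulty_ratings(cers: List[float], levels: int = 10) -> List[int]:
    """Assumed scheme: sort phrases by error rate and split them into
    `levels` equal-sized rank bins, with 1 the easiest bin."""
    order = sorted(range(len(cers)), key=lambda i: cers[i])
    ratings = [0] * len(cers)
    for rank, idx in enumerate(order):
        ratings[idx] = 1 + (rank * levels) // len(cers)
    return ratings
```

With a rank-based scheme like this one, each rating covers roughly the same number of phrases, which would make it straightforward to sample a study set at a chosen difficulty.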

Reference: