AI generates video game levels and characters from text prompts

A generative AI model trained on small datasets was able to create maps and 2D character models for video games on demand

A simple generative AI tool can create video game maps, character models and emojis from a single-sentence prompt within milliseconds.

Julian Togelius and Timothy Merino at New York University and their colleagues designed the system as a way of understanding how simple an AI model can be while still proving useful.

“We tried as a starting point to figure out the most naive, simple approach we could do for text-to-map generation,” says Merino. “It was surprisingly effective.”

The model was trained on databases of 882 game maps, 100 game sprites and 10,000 emojis, all of which were labelled with descriptions of what the images showed. “All of our datasets were fairly small, and that was by design,” says Merino.

The labelling avoided listing the specific names of characters, instead describing Mario as “a man with a moustache dressed in red”, for example. Alternative labels for training the model were also generated using GPT-4, the large language model behind ChatGPT.

The AI model itself uses a simple neural network that omits many of the modern developments powering the current generation of AI. For example, the network doesn’t include any feedback loops, meaning information flows in one direction only, from input to output.
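
To make the idea of a one-directional network concrete, here is a minimal sketch in Python. It is not the researchers' model: the vocabulary, tile names, map size and layer sizes are all hypothetical, and the weights are random rather than trained, so the output map is meaningless. It only illustrates how a text prompt can pass through a network once, from input to output, with no feedback loops.

```python
import random

# Hypothetical vocabulary and tile set, chosen only for illustration.
VOCAB = ["grassy", "field", "flowers", "island", "trees", "river", "flooded", "village"]
TILES = ["grass", "water", "tree", "flower"]
MAP_W, MAP_H = 4, 4   # hypothetical map size
HIDDEN = 16           # hypothetical hidden-layer width


def encode(prompt):
    """Bag-of-words encoding: 1.0 for each vocabulary word present in the prompt."""
    words = prompt.lower().split()
    return [1.0 if w in words else 0.0 for w in VOCAB]


def make_layer(n_in, n_out, rng):
    """Random, untrained weight matrix with n_out rows of n_in weights."""
    return [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]


def dense(weights, x):
    """Multiply a weight matrix by an input vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def generate_map(prompt, w1, w2):
    """One forward pass: prompt -> hidden layer (ReLU) -> tile scores per cell."""
    hidden = [max(0.0, h) for h in dense(w1, encode(prompt))]
    scores = dense(w2, hidden)
    grid = []
    for y in range(MAP_H):
        row = []
        for x in range(MAP_W):
            start = (y * MAP_W + x) * len(TILES)
            cell = scores[start:start + len(TILES)]
            # Pick the highest-scoring tile type for this cell.
            row.append(TILES[cell.index(max(cell))])
        grid.append(row)
    return grid


rng = random.Random(0)
W1 = make_layer(len(VOCAB), HIDDEN, rng)
W2 = make_layer(HIDDEN, MAP_W * MAP_H * len(TILES), rng)
grid = generate_map("a grassy field with some flowers", W1, W2)
for row in grid:
    print(" ".join(row))
```

Because nothing is fed back into earlier layers, producing a map takes a fixed, small number of multiplications, which is one reason a network of this kind can run quickly on modest hardware.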

Despite its simplicity, the system was able to produce accurate depictions of what users asked for in text prompts, such as “a grassy field with some flowers”, “an island of trees in the river” or “a flooded village”.

The model shows what can be done with limited computing power, says Togelius. “A lot of people are aware of the potential for AI changing how games work,” he says. “But also, a lot of these things are just massive models that require massive amounts of data to train. This thing is trained on your home computer and runs on your phone, basically, in blazingly fast time.”