I’m a sucker for celebrity drama, and the recent news of legendary quarterback Tom Brady’s divorce from Brazilian supermodel Gisele Bündchen has me captivated. Was it amicable? (I hope so!) Should Tom have retired sooner? (Probably.) Will he reverse course and un-retire a second time? (The NFL would be so lucky.) Would Gisele beat Tom in a rap battle? (I’d bet on her.)
Enter AI Rap Battle, a new generative tech product that allows users to enter the names of two famous people, pitting their stick-figure forms against each other in a robot-voice rap battle, all set to a steady hip-hop beat.
The demo website was built in a single day during Scale AI’s hackathon on Jan. 21, which saw more than 300 coders in roughly 80 groups hacking away at new generative technologies at the company’s SF headquarters. The team of four included developers Justin Torre, Calum Bird, Miguel Acevedo and Kaelan Mikowicz, who all work for other generative tech companies.
The idea came first to Torre, who’s not actually a rapper.
“I wanted to just make something that would combine image generation with text generation, and I had already done that with a choose-your-own-adventure game,” Torre said. “I wanted to do something like that where it’s fun, because I knew we only had a day [to make the product], and Calum and I usually talk about serious things anyway.”
And fun it was: The team took one of the top prizes, in addition to snagging the audience favorite award. Videos from the hackathon show hackers laughing at a demo presentation of the rap battle, pitting Barack Obama against Shakespeare.
Though the product was created with absurdity in mind, its creators speculate that AI Rap Battle won over the hackathon’s judges because it cleverly combined multiple AI models: text processing (GPT-3), image generation (Stable Diffusion) and voice production (Waveform).
And by testing out a few different ways to train their AI models to produce a functioning, coherent rap battle (a difficult enough feat for most humans), they built a site that lets users pair not only a whole slew of real-life celebrities, but also famous figures from fiction and art, such as Harry Potter or the Mona Lisa.
“If you structure how you’re talking to GPT-3 in the correct way, you can convince it to do certain things,” Bird said. “We really just told it to make sure the words rhymed, and it just takes care of it for us.”
Bird’s team had a few clever tricks up its sleeve to improve the quality of its diss tracks, including telling each “rapper” to adhere to the following rules: Lines must rhyme, be creative, flaunt the speaker’s own successes and make fun of the opponent, all with a degree of aggressiveness for an added dose of spite.
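For the curious, here is a rough sketch of how rules like those might be folded into a single prompt for a text model. It is an illustration only, not the team’s actual code: the function name `build_verse_prompt`, the wording of the rules, and the four-line verse length are all assumptions, and no real model API is called.

```python
# Hypothetical sketch: assemble a rap-battle prompt from the rules the team described.
# Nothing here comes from the AI Rap Battle codebase; names and wording are assumptions.

RULES = [
    "Your lines must rhyme.",
    "Be creative.",
    "Flaunt your own successes.",
    "Make fun of your opponent.",
    "Keep the tone aggressive.",
]


def build_verse_prompt(speaker, opponent, previous_verses=()):
    """Return a text prompt asking a model for the speaker's next verse."""
    rules = "\n".join(f"- {rule}" for rule in RULES)
    # Include earlier verses so each reply can respond to the last one.
    history = "\n\n".join(previous_verses) if previous_verses else "(none yet)"
    return (
        f"You are {speaker} in a rap battle against {opponent}.\n"
        f"Follow these rules:\n{rules}\n\n"
        f"Verses so far:\n{history}\n\n"
        # The four-line length is an assumption based on the sample verses.
        f"Write {speaker}'s next four-line verse:"
    )


if __name__ == "__main__":
    print(build_verse_prompt("Gisele", "Tom"))
```

The resulting string would then be sent to whichever text model is in use; alternating the speaker and appending each reply to `previous_verses` would produce the back-and-forth.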
And though the demo was created with levity in mind, the AI Rap Battle team says that using multiple AI models together demonstrated just how much generative tech could do.
“These are all very powerful technologies,” Acevedo said. “When you combine them, you can create an interface instead of just reading text on your computer. You can put a face on it; you can put a voice on this AI-generated thing.”
“And it’s worth pointing out that people use this kind of ‘multimodal’ stuff all the time, like when you’re searching Google for images,” Bird said. “This is exactly what’s happening behind the scenes.”
We Just Had To, OK?
Of course, a program that could get the Mona Lisa and Barack Obama onscreen rapping diss tracks at each other won third place at Scale’s hackathon, beating out products that proposed ways to combat climate change or improve voice-assistance technology.
Naturally, I wondered what boundaries we could push with this product. The possible showdowns are endless. What if we put Tucker Carlson against his biggest nemesis, the de-sexy-fied Green M&M lady? Would Marjorie Taylor Greene have smoother bars than in-the-know rapper stan Alexandria Ocasio-Cortez? Would Chesa Boudin stand a better chance at winning if he took on Brooke Jenkins in a rap battle?
We tested out a few non-human entries, too, and were pleasantly surprised to find that they worked. “Dog” dissed “cat” effectively, San Francisco successfully duked it out against Los Angeles, and “chair” owned “table” with the excellent bar, “My back is straight, and my arms are strong / I’m the king of furniture, so don’t get it wrong.”
Back to Gisele and Tom: What better way to negotiate the end of a marriage than by spitting some fire bars? Here’s a sample of their AI-generated showdown:
Gisele:
I’m the queen of this rap battle, no need to tussle,
My beauty and brains will leave you in a muddle.
You can’t compete with my success or style,
Your rhymes are weak so I’ll make them pile.
Tom:
I’m the GOAT, I can’t be beat,
My success is something to eat.
Gisele’s got nothing on me,
She’ll never catch up you see.
Gisele:
Tom may think he’s so great,
But his skills are just a plate.
He thinks he knows it all,
But my fame will make him fall!
Tom:
My career’s been on the rise,
I’m a star that never dies.
Gisele can’t compete with me,
She’ll always be second to see.
I think we’ll call it a draw?