Inspiration

I saw a Blue Man Group performance in Vegas and started thinking about all the work that goes into creating a production like that. I also saw a few snippets of the Taylor Swift Eras Tour on Disney+ and thought about all the visual effects programming that must go into such an extravagant show: the writing, directing, lighting, set design, etc. I wondered whether generative AI is advanced enough to choreograph all the different elements of a simple stage show.

I have always admired the work of Disney Imagineers, who create animatronic shows and rides that implement creative elements, such as the Haunted Mansion. I wondered if we could use generative AI to make every show or ride unique.

Last, I enjoyed Mr. Maeda's Cozy AI Kitchen video and saw an opportunity to take on his multimodal challenge.

What it does

GenAI Theater is a set of NodeJS scripts that leverage GPT-4 Turbo, DALL-E, and Azure Speech services to create a unique comedy stage production starring two pint-size programmers.

How I built it

I used Visual Studio Code to build the NodeJS scripts and leaned heavily on GitHub Copilot and ChatGPT to help with the code.

Architecture and Software

The GPT-4 Turbo API handles many aspects of the show creation process, including:

  • Writing the character scripts
  • Designing the lighting given a DMX lighting configuration in JSON
  • Writing DALL-E prompts to create the set background images
  • Picking which sound effects from the JSON library document to play
  • Choreographing character movements via RC servos

The output from GPT-4 is a JSON script document with various cues, including lights, delays, script lines, sound effects, background changes, and animations.
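The exact schema isn't shown here, but a cue document along these lines would support the cue types listed above (all property names below are illustrative, not the project's actual schema):

```javascript
// Hypothetical example of a generated show script. The cue types mirror
// the ones described above; the property names are made up for illustration.
const show = {
  title: "Bug Hunt",
  cues: [
    { type: "bg", image: "server-room" },            // background change (DALL-E image)
    { type: "light", channels: { 1: 255, 2: 128 } }, // DMX channel levels
    { type: "line", who: "Ada", text: "It works on my machine!" },
    { type: "sfx", id: "rimshot" },                  // from the sound effect library
    { type: "anim", who: "Ada", angle: 45 },         // RC servo position
    { type: "delay", ms: 1500 }                      // pause before the next cue
  ]
};
```

Because the output is plain JSON, the generation and playback scripts can consume it without any human in the loop.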

After GPT-4 produces the show and outputs a JSON script, the NodeJS generation app uses the Azure Text-to-Speech API to generate the speech audio and the DALL-E API to generate the backgrounds.

Once the show is generated, the run-show.js script runs it, parsing each cue in the JSON script and dispatching it to one of a few different outputs:

  • Script lines and sound effects - The cmdmp3.exe command-line audio player by Jim Lawless plays each line and sound effect.
  • Lighting - The NodeJS enttec-open-dmx-usb library by Moritz Ruth is used to interface with an Enttec - Open DMX USB interface to control the DMX lighting.
  • Scene Background - A computer monitor serves as the stage background, and IrfanView by Irfan Skiljan is used to display the DALL-E generated images (16:9) in full-screen mode.
  • Character Animations - The NodeJS Johnny-Five library, by Rick Waldron and other contributors, is used to control servos that rotate each character via an Arduino running Firmata.
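The dispatch step above can be sketched as a small cue-type-to-handler table. This is a simplified stand-in for run-show.js: the handlers here just return a description of the action instead of driving cmdmp3.exe, the DMX interface, IrfanView, or the servos, and the cue schema is hypothetical.

```javascript
// Minimal sketch of a cue dispatcher. Each handler is a placeholder for
// the real output (audio player, DMX library, image viewer, Johnny-Five).
const handlers = {
  line:  (cue) => `play line audio for ${cue.who}`,        // cmdmp3.exe
  sfx:   (cue) => `play sfx/${cue.id}.mp3`,                // cmdmp3.exe
  light: (cue) => `dmx ${JSON.stringify(cue.channels)}`,   // enttec-open-dmx-usb
  bg:    (cue) => `display ${cue.image} full-screen`,      // IrfanView
  anim:  (cue) => `rotate ${cue.who} servo to ${cue.angle}`, // Johnny-Five
  delay: (cue) => `wait ${cue.ms}ms`
};

function runShow(cues) {
  return cues.map((cue) => {
    const handler = handlers[cue.type];
    if (!handler) throw new Error(`Unknown cue type: ${cue.type}`);
    return handler(cue);
  });
}
```

Walking the cue list in order like this keeps the playback logic independent of whatever GPT-4 decided to put in the show.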

Physical Components

Stage - The stage is built from scrap wood and held together with hot melt glue and finishing nails. I spray-painted it matte black to keep focus on the characters.

Servo Animation - The servos that turn the toy dolls are standard hobby servos recycled from a previous project and mounted to the stage with screws.

Toy Dolls - The toy dolls came from Walmart and are secured to the servo horns using screws and hot melt glue.

Lights - The DMX lighting fixtures are recycled from a couple of previous projects and connected to the Enttec Open DMX USB interface via the same DMX cables you might find in a full-size stage production.

Arduino - The Arduino is recycled from previous projects and runs Firmata. It is powered by an external battery so the connected servos don't draw too much current from the USB bus.

Challenges I ran into

I ran into issues with the GPT-4 output size limit. I modified my output JSON to use shorter property names, which seemed to increase the usable content length. I think I can scale the output further by making multiple requests and leveraging the context.
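The shorter-names trick might look something like the following, where GPT-4 is asked to emit compact keys and the pipeline expands them back to descriptive names (the key mapping here is purely illustrative):

```javascript
// Hypothetical mapping from the short keys GPT-4 is asked to emit
// to the descriptive names the rest of the pipeline uses.
const KEY_MAP = { t: "type", w: "who", x: "text", c: "channels", m: "ms" };

function expandCue(shortCue) {
  const cue = {};
  for (const [k, v] of Object.entries(shortCue)) {
    cue[KEY_MAP[k] ?? k] = v; // pass unknown keys through unchanged
  }
  return cue;
}

// { t: "line", w: "Ada", x: "Hi!" } expands to
// { type: "line", who: "Ada", text: "Hi!" } before the show runs.
```

Every character saved on repeated property names is a token the model can spend on more cues instead.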

Accomplishments that we're proud of

I was able to leverage the Azure Speech service's speech styles and styledegree property to make the characters much more emotive. This was a huge improvement to the overall dialogue.
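In Azure Text-to-Speech, those styles are applied through the mstts:express-as element in SSML. A minimal sketch of building such a payload (the voice, style, and line are example values, not the ones used in the show):

```javascript
// Builds an SSML payload for Azure Text-to-Speech using the
// mstts:express-as style and styledegree attributes.
function buildSsml({ voice, style, degree, text }) {
  return [
    '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis"',
    ' xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">',
    `<voice name="${voice}">`,
    `<mstts:express-as style="${style}" styledegree="${degree}">`,
    text,
    "</mstts:express-as></voice></speak>"
  ].join("");
}

const ssml = buildSsml({
  voice: "en-US-JennyNeural", // example neural voice that supports styles
  style: "cheerful",
  degree: 2,                  // styledegree: 0.01–2, intensity of the style
  text: "It compiles! Ship it!"
});
```

Varying the style and styledegree per line is what lets each character's delivery match the emotion GPT-4 wrote into the script.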

What I learned

I learned how to leverage GPT4 for a creative process by providing it with assets and configurations via JSON documents. This allows me to generate consistent output that can be consumed by another process without human intervention.
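Concretely, that means embedding the asset and configuration JSON directly in the prompt so the model can only reference things that actually exist on the stage. A hedged sketch (the fixture names, sound library, and prompt wording are all illustrative):

```javascript
// Illustrative: hand GPT-4 the stage's real capabilities as JSON so its
// output only references fixtures and sounds the show can actually use.
const lighting = {
  fixtures: [
    { name: "wash-left", channel: 1 },
    { name: "wash-right", channel: 2 }
  ]
};
const sounds = ["rimshot", "applause", "sad-trombone"];

const systemPrompt = [
  "You are a stage director. Produce a show as a JSON cue list.",
  "Only use the lighting fixtures and sound effects defined below.",
  `Lighting config: ${JSON.stringify(lighting)}`,
  `Sound library: ${JSON.stringify(sounds)}`
].join("\n");
```

Constraining the model with machine-readable inputs is what makes its output consistent enough to feed straight into the generation and playback scripts.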

What's next for GenAI Theater

The next step is to add support for longer-format programs using multiple API calls, leveraging the larger context size to help GPT-4 Turbo see where it left off.
