Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Flavio Schneider, Zhijing Jin, B. Schölkopf
2023 · DOI: 10.48550/arXiv.2301.11757
arXiv.org · Cited 71 times
TLDR
This work develops a cascading latent diffusion approach that generates multiple minutes of high-quality stereo music at 48 kHz from textual descriptions, targeting real-time generation on a single consumer GPU.
