Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Flavio Schneider, Zhijing Jin, B. Schölkopf
2023 · DOI: 10.48550/arXiv.2301.11757
arXiv.org · 71 citations
TLDR
This work develops a cascading latent diffusion approach that generates multiple minutes of high-quality stereo music at 48 kHz from textual descriptions, targeting real-time inference on a single consumer GPU.
