Media platform that aims to connect creative minds through generative AI

ShoggyR79/MuseAI

MuseAI

By: Du Duong, Binh Ho, Rana Khan, Zhanqi Zhu

Inspiration

Our words have creative power. Many of us come up with artistic ideas yet can only express them in words: not everyone has the skills to realize the imagination those words contain. For example, a person might picture a cyberpunk robot in front of a dystopian background as their Discord profile picture, but without artistic skills they cannot turn that idea into reality. With MuseAI, our passion is to realize the creativity in words and give it the breath of life it deserves. With the help of state-of-the-art artificial intelligence models, we believe MuseAI users will be able to express their creativity beyond the boundaries of words and share their imagination with other members of a truly open-source and collaborative artistic community, connecting like-minded people around the world through AI-generated, human-inspired art and music. Lastly, we want to welcome a diverse set of ideas, in keeping with our goal of connecting people around the world regardless of language barriers. Some ideas are easier to express in one language than in another, so MuseAI accepts prompts in many languages. Welcome to creativity without language barriers!

What it does

MuseAI is a web application that can understand the semantics of prompts in over 100 languages. Our goal for MuseAI is to create a media platform where people can come together and express their creative minds. Powered by large language models, MuseAI is a one-of-a-kind application that uses diffusion models to generate images and then uses semantic tokens from each image to generate music to go along with it. From deaf people who cannot enjoy art in the form of music to blind people who want something to please their ears, MuseAI allows creative people from all walks of life to muse. MuseAI also lets users share their creations with people around the world, communicate, and view what their peers have been imagining. Aware that newer technology brings more responsibility, we implemented an NSFW filter so that users do not accidentally encounter content they would not be comfortable with and so that the technology is harder to misuse. To promote MuseAI as an open, friendly, and caring media platform, users can personalize each picture and show support through likes and views, so that independent creators can take pride in their work.

How we built it

For the front-end of the application, we primarily used JavaScript with React.js as the backbone of our project. We chose React.js because it is a modern, widely used framework for front-end development and because of its robustness and ease of use. We also relied heavily on CSS to create animations that add touches of personalization and elegance to the front-end while keeping the overall layout clean, responsive, and user-friendly.

Instead of building a traditional backend with something like Node.js, we chose to handle all backend-related functions with Google’s Firebase, which we found well suited to building small-scale projects quickly. As a group, we researched and used the functions Firebase provides to handle authentication, data storage, and data queries. Skipping a traditional backend saved us a lot of programming time.

We used Flask to build the REST API that connects the front-end to our diffusion and language models. The API parses the JSON arguments sent by the front-end, converts them into strings that the models can use, and returns the resulting Muse IDs and error status to the front-end as JSON. We implemented a denoising pipeline for diffusion probabilistic models, using techniques derived from the literature [3, Song and Ermon], as well as inference pipelines. We used Hugging Face, PyTorch, and the Mubert API to implement the diffusion models, and OpenAI’s CLIP model as our language model to give semantics to our prompts. For translation, we use Google’s Translate API.
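As a rough illustration of the JSON handling described above, here is a minimal sketch in plain Python of the logic that could sit behind a Flask route. It is not the project's actual code: the field names `prompt`, `tags`, `muse_id`, and `error`, the function names, and the stand-in `fake_generate` are all assumptions made for the example.

```python
import json
import uuid
from typing import Callable, Tuple


def build_model_prompt(payload: dict) -> str:
    """Turn a parsed JSON request body into one prompt string for the model.

    The "prompt" and "tags" field names are hypothetical.
    """
    prompt = str(payload.get("prompt") or "").strip()
    if not prompt:
        raise ValueError("prompt must be a non-empty string")
    tags = [str(t).strip() for t in (payload.get("tags") or [])]
    # Append non-empty tags to the prompt as comma-separated modifiers.
    return ", ".join([prompt] + [t for t in tags if t])


def handle_generate(raw_body: str, generate: Callable[[str], str]) -> Tuple[str, int]:
    """Parse the request, call the model, and return (json_body, http_status)."""
    try:
        payload = json.loads(raw_body)
        if not isinstance(payload, dict):
            raise ValueError("request body must be a JSON object")
        prompt = build_model_prompt(payload)
    except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
        return json.dumps({"muse_id": None, "error": str(exc)}), 400
    muse_id = generate(prompt)
    return json.dumps({"muse_id": muse_id, "error": None}), 200


def fake_generate(prompt: str) -> str:
    """Stand-in for the diffusion and music models; returns a random ID."""
    return uuid.uuid4().hex


if __name__ == "__main__":
    body = '{"prompt": "cyberpunk robot", "tags": ["dystopian background"]}'
    print(handle_generate(body, fake_generate))
```

In a Flask app, a route handler would pass the request body to `handle_generate` and return the resulting JSON string and status code. Keeping the parsing separate from the web framework makes it easy to test without a GPU or a running server.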

Challenges we ran into

This is an ambitious project, so challenges along the way were inevitable. For example, while planning the project, we explored too many options and features that we wanted to deploy. We soon realized that they were not realistic, so we trimmed the project down to fit the time frame of the event, and it was hard to decide which features to drop and which to keep. Another challenge that greatly slowed our development was that we were working with completely new technologies. We had to read the documentation and watch tutorial videos for each part of the tech stack before we could even start adapting it to our needs. One final challenge, which prevented us from shipping this project, is that our backend needs to be deployed on a server with a good GPU. Since we are just 4 college students looking for SWE internships, we cannot afford it just yet :( But that is one direction we would like to head towards. These challenges are also part of the reason why this project was so rewarding.

Accomplishments that we're proud of

Coming into the project, we wanted to challenge ourselves by trying new technologies. Before this hackathon, none of us had much experience with React, Flask, or Firebase. We are proud that we learned all of these technologies and communicated well enough with one another to build a tangible, functioning project. The sense of relief after solving issue after issue and finally having a running program was both exhilarating and fun.

What we learned

Front-end work is sometimes tedious, but being able to see our progress as the application grew kept us motivated throughout the project. We also had the chance to practice putting more thought into front-end design in order to improve the user experience through intuitive, user-friendly, and eye-catching UIs and layouts.

What's next for MuseAI

So far, MuseAI can create images and music from prompts and tags. We would eventually like our model to convert images to music and vice versa. We already have an implementation of this, but it needs to be optimized, and we did not have time to do so within a time-limited hackathon project. In the future, we will try to move into creating videos from prompts, background music for videos, and music from images. We would also like to try building a game engine that lets users create their own game characters and environments, with images and music, from the words they input. Further, we would eventually like to generate higher-resolution environments for people who want to see their music animated, and let them produce music videos with a single click. As for scalability, we would like to host the backend so that it can accommodate every user. Throughout, our motivation is to demystify the often overcomplicated world of artificial intelligence and allow laypeople to use and capitalize on these complex technologies.

Citations

  • High-Resolution Image Synthesis with Latent Diffusion Models. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
  • Hierarchical Text-Conditional Image Generation with CLIP Latents. Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, Mark Chen
  • Generative Modeling by Estimating Gradients of the Data Distribution. Yang Song, Stefano Ermon
