Inspiration

I pursued this project to get familiar with Azure OpenAI and the models it offers. Paired with CosmosDB, it reinforced my understanding of how powerful embeddings and vector search can be: an AI copilot that you chat with can produce useful answers even about proprietary data.

What it does

The submission lets users ask a copilot questions. Each question is converted into a vector embedding and compared against a multi-dimensional vector store to find the most similar documents, which is made possible by CosmosDB and tools like LangChain.
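As a rough illustration of the similarity lookup described above, here is a minimal sketch in plain Python. It assumes cosine similarity as the distance measure and uses a tiny in-memory dictionary with made-up 3-dimensional vectors in place of the real vector store; the names `cosine_similarity`, `top_k`, and `store` are hypothetical and not taken from the project.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as dissimilar
    return dot / (norm_a * norm_b)


def top_k(query_vector, store, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [
        (cosine_similarity(query_vector, vector), doc_id)
        for doc_id, vector in store.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [doc_id for _, doc_id in scored[:k]]


# Toy stand-in for the vector store (illustrative values only).
store = {
    "pricing": [0.9, 0.1, 0.0],
    "shipping": [0.1, 0.9, 0.0],
    "returns": [0.2, 0.8, 0.1],
}

print(top_k([0.0, 1.0, 0.0], store, k=2))
```

A real vector store would use an index rather than scoring every document, but the ranking idea is the same.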

How I built it

I built it by setting up Azure resources for several tools and integrating them with a virtual environment for a Jupyter Notebook. This let me experiment with the CosmosDB database and generate vector embeddings for my MongoDB JSON documents. With those pieces in place, I built a copilot that follows the RAG (retrieval-augmented generation) pattern: rather than retraining the model, it runs a similarity search over the embeddings in the vector store and passes the closest documents to the model as context, so it can answer proprietary questions as well as generic ones.
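The RAG step mentioned above can be sketched as simple prompt assembly. This is an assumption-laden illustration, not the project's actual code: `build_rag_prompt`, the prompt wording, and the sample documents are all hypothetical, and the call to the chat model itself is left out.

```python
def build_rag_prompt(question, retrieved_docs, max_docs=3):
    """Combine the user's question with retrieved documents into one prompt.

    `retrieved_docs` is assumed to be a list of text snippets already ranked
    by similarity to the question (most similar first).
    """
    selected = retrieved_docs[:max_docs]
    context = "\n\n".join(
        f"[{i}] {doc}" for i, doc in enumerate(selected, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is not enough, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical snippets standing in for documents returned by vector search.
docs = [
    "Standard shipping takes 3 to 5 business days.",
    "Returns are accepted within 30 days of delivery.",
]
prompt = build_rag_prompt("How long does shipping take?", docs)
print(prompt)
```

The resulting string would then be sent to the chat model, which is how proprietary content reaches the model without any retraining.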

Challenges I ran into

All my data initially lived in JSON documents for MongoDB. When I vectorized those documents into the vector store, the embedding calls for the longer documents kept timing out. I resolved this by setting an explicit batch size on the find() calls that feed the embedding step, so that large documents were processed in manageable chunks. The second challenge was that my Docker container image could not be deployed to Azure because it defaulted to linux/arm64 (I was building on an M1 MacBook). I resolved it by reading the Docker man pages and changing a few of the Docker build command flags so that the image targets linux/amd64, which Azure Container Apps requires.
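The batching fix described above can be sketched generically. This example does not use the MongoDB driver; it only shows the idea of feeding documents to an embedding function in fixed-size batches. `batched`, `embed_in_batches`, and `fake_embed` are hypothetical names, and `fake_embed` is a stand-in for a real embedding call.

```python
from itertools import islice


def batched(iterable, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(iterable)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def embed_in_batches(documents, embed_fn, batch_size=50):
    """Embed documents batch by batch so that no single call grows too large."""
    vectors = []
    for batch in batched(documents, batch_size):
        vectors.extend(embed_fn(batch))
    return vectors


def fake_embed(texts):
    """Stand-in for a real embedding call: one 1-dimensional vector per text."""
    return [[float(len(text))] for text in texts]


documents = ["alpha", "be", "gamma", "d", "epsilon"]
vectors = embed_in_batches(documents, fake_embed, batch_size=2)
print(vectors)
```

Smaller batches mean more round trips but shorter individual calls, which is the trade-off that tends to help with timeouts.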

Accomplishments that I'm proud of

I'm happy that I got up to speed with Azure quickly and worked out how its different resources and cloud services interact, while making steady progress throughout. That understanding helped me through several stages of development, including the ones where bugs were hard to diagnose and resolve.

What I learned

I learned how to work with various Azure resources, such as vCore-based CosmosDB and OpenAI, and how to generate vector embeddings for MongoDB documents. The second skill is especially valuable: it gave me a much clearer picture of how a neural network can use multi-dimensional embeddings as inputs to its hidden layers, particularly when generating constructive responses.

What's next for MS Open AI Copilot

I hope to apply what I've learned and the systems I've experimented with to real applications, where data stores are converted to vector embeddings in a more scalable and versatile way. That could lead to something like a digital assistant that uses RAG patterns, LangChain, and more to give better-tailored responses.
