Inspiration

We run a company called Addy AI that lets e-commerce merchants train custom LLMs to respond to their customer inquiries 24/7.

To train these models, companies have to upload their data to our servers so we can handle ingestion, vectorization, similarity search, and so on.

However, not every company is willing to do that. We recently lost a customer in Europe because a new EU law requires company data to be hosted only in databases the company controls.

So we made Drive LLM, which lets you host AI apps with your data 100% in your own cloud, starting with Google Drive.

What it does

This library allows companies to perform full in-context learning in their own cloud, with a sample Q/A chatbot to demonstrate it.

In one click, you use OAuth to connect your Google account and grant the app permission to access Google Drive.

From there, you use our drive_utils API to access a suite of CRUD operations, just as you would with any vector database.
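To make the idea concrete, here is a minimal in-memory sketch of what a drive_utils-style vector store exposes. The names (`VectorStore`, `upsert`, `remove`, `query`) and the cosine-similarity search are illustrative assumptions, not the library's actual API:

```javascript
// Cosine similarity between two equal-length vectors.
function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Hypothetical CRUD surface of a drive_utils-backed vector store.
class VectorStore {
  constructor() {
    this.records = new Map(); // id -> { vector, text }
  }
  upsert(id, vector, text) {
    this.records.set(id, { vector, text }); // create or update
  }
  remove(id) {
    this.records.delete(id);
  }
  // Read: return the topK records most similar to the query vector.
  query(vector, topK = 3) {
    return [...this.records.entries()]
      .map(([id, r]) => ({ id, text: r.text, score: cosine(vector, r.vector) }))
      .sort((a, b) => b.score - a.score)
      .slice(0, topK);
  }
}
```

In the real library the backing storage is a file in your Drive rather than process memory, but the operation set is the same shape.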

When your files change, hooks detect the change and update the corresponding vectors.
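A sketch of how such a hook can work, assuming we track each file's `modifiedTime` and re-index only files that changed since the last pass (`syncChangedFiles` and `reindex` are hypothetical names, not the library's API):

```javascript
// Re-vectorize only the files whose modifiedTime is newer than when we
// last indexed them. `files` is a list of { id, modifiedTime } objects,
// `lastIndexed` maps file id -> last indexed timestamp, and `reindex`
// recomputes that file's vectors.
function syncChangedFiles(files, lastIndexed, reindex) {
  const updated = [];
  for (const file of files) {
    const seen = lastIndexed.get(file.id) || 0;
    if (file.modifiedTime > seen) {
      reindex(file); // recompute this file's vectors
      lastIndexed.set(file.id, file.modifiedTime);
      updated.push(file.id);
    }
  }
  return updated; // ids whose vectors were refreshed
}
```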

Developers can install this package via NPM to handle backend functions only. It provides code both for handling backend requests and for the accompanying chatbot UI that issues them. Individuals may use a free instance of this service by visiting addy-ai.com. Contributors can fork a copy, or contribute a PR, by following the instructions and documentation below. Google OAuth2 keys are required to run your own instance.

How we built it

The entire database is stored in your Google Drive: the drive utils create a SQLite file in your Drive that acts as a proxy database.

We use Node.js to create the drive_utils, which manipulate the files in your Drive. These utils are served via Express, and you can run your own instance on Heroku in one click.
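The proxy-db idea boils down to the whole store living in a single file that Drive syncs. Here is a dependency-free sketch where a JSON blob stands in for the SQLite file; `saveDb` and `loadDb` are illustrative names, not the real API:

```javascript
// Serialize the in-memory store (a Map of id -> record) into the bytes
// we would upload to Drive as the proxy-db file.
function saveDb(records) {
  return Buffer.from(JSON.stringify([...records.entries()]));
}

// Rebuild the in-memory store from bytes downloaded from Drive.
function loadDb(buffer) {
  return new Map(JSON.parse(buffer.toString()));
}
```

Keeping the database as one file is what lets Drive's own sync and permission model do the heavy lifting: the server never holds the data, it only reads and writes the user's file.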

Challenges we ran into

  • OAuth is a pain to get right.
  • The Hugging Face pipeline for LangChain support was hard to implement.
  • Open models on Hugging Face need to be converted to ONNX to work with JavaScript.

Accomplishments that we're proud of

Implementing CRUD operations from scratch. As far as we know, this might be the first SQLite implementation that wraps a vector DB in a private cloud.

What we learned

Just because you have a model doesn't mean it works with LangChain.

Write operations to the vector store on Google Drive can be triggered by something as simple as a one-line change in a doc that is being watched. This may not scale well for documents with a high volume of changes and saves, making the database very write-heavy.

Instead, a better way is to have cron jobs that run intermittently to batch the update operations, except when a developer wants to trigger a write manually.
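A minimal sketch of that write-optimized approach: buffer vector updates per file id so repeated saves coalesce, then flush them as one batch on a timer or on demand. The `WriteBuffer` class and its method names are illustrative assumptions, not our actual implementation:

```javascript
// Coalesce per-file updates and flush them in one batch, instead of
// writing to Drive on every keystroke-level change.
class WriteBuffer {
  constructor(flushFn) {
    this.pending = new Map(); // fileId -> latest update (older ones coalesce)
    this.flushFn = flushFn;   // called with the batch, e.g. one Drive write
  }
  enqueue(fileId, update) {
    this.pending.set(fileId, update); // only the newest update survives
  }
  flush() {
    const batch = [...this.pending.entries()];
    this.pending.clear();
    if (batch.length > 0) this.flushFn(batch);
    return batch.length; // how many files were written this cycle
  }
  // Cron-like schedule: flush every `ms` milliseconds.
  start(ms) {
    this.timer = setInterval(() => this.flush(), ms);
  }
  stop() {
    clearInterval(this.timer);
  }
}
```

A developer who wants an immediate write can still call `flush()` directly, which matches the manual-trigger escape hatch described above.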

So we learned how to make a database write-optimized.

What's next for Drive LLM

We want to extend in two ways:

  1. To add other cloud-drive providers beyond Google Drive: iCloud, OneDrive, Dropbox, etc.
  2. To support open-source models on Hugging Face.

We're open-sourcing this
