Inspiration

Space! We are all members of a project team at UMich that develops AR interfaces for NASA astronauts. We firmly believe that AR has great potential not just in space, but in our daily lives. We all love cooking, so we made Cosmocook!, an AR cooking assistant, to showcase the vast potential of AR.

What it does

Cosmocook! is an augmented reality cooking assistant that combines audio, text, and video input to give personalized help to people learning to cook in an exploratory way. Cosmocook! currently offers the following features:

  • Step-by-step instructions delivered in a dynamic 3D interface
  • Context-aware guidance (based on the recipe, video feed, etc.) for ingredient substitutions, unit conversions, and cooking technique improvements
  • Automatic ingredient detection
  • Convenient recipe search with Google Search API
  • Versatile recipe scraping and contextualization with Gemini 1.5
  • Integrated reference images powered by Google image search
  • Touchless input system with voice commands (activated by voice) and hand gestures
  • Recipe suggestions based on the ingredients available in the fridge or pantry

How we built it

We split development across two main systems: the AR interface and the backend server. We built the AR interface with the HoloLens 2, Mixed Reality Toolkit (MRTK), and Unity. We built the backend server with Flask, native websockets, Redis (for caching results), the Gemini 1.5 Pro API, the Google Search API, Beautiful Soup, and OpenCV. The AR interface and the backend API communicate over a websocket connection. To contextualize visual information with minimal latency, we implemented a multithreaded video retrieval buffer that keeps the most recent 15 seconds of video from the HoloLens.
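The rolling video buffer described above could look roughly like the sketch below. This is an illustration under assumptions, not our actual code: the class name `RollingFrameBuffer`, its methods, and the use of plain Python objects as frames are all hypothetical. A receiver thread would call `push` for each incoming frame, and request handlers would call `snapshot` to get the recent footage.

```python
import threading
import time
from collections import deque


class RollingFrameBuffer:
    """Thread-safe buffer that keeps only the most recent `window_s` seconds of frames.

    Hypothetical sketch: names and structure are assumptions for illustration.
    """

    def __init__(self, window_s=15.0):
        self.window_s = window_s
        self._frames = deque()  # (timestamp, frame) pairs, oldest first
        self._lock = threading.Lock()

    def push(self, frame, timestamp=None):
        """Add a frame (called from the receiver thread) and evict expired ones."""
        ts = time.monotonic() if timestamp is None else timestamp
        with self._lock:
            self._frames.append((ts, frame))
            # Drop frames that are older than the time window.
            while self._frames and ts - self._frames[0][0] > self.window_s:
                self._frames.popleft()

    def snapshot(self):
        """Return a copy of the buffered frames, oldest first (safe for other threads)."""
        with self._lock:
            return [frame for _, frame in self._frames]
```

Evicting by timestamp rather than by frame count keeps the window at 15 seconds even if the incoming frame rate varies.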

Challenges we ran into

We implemented many features simultaneously, so organizing the connections between the AR interface and the web server was challenging. Building so many features was also difficult because AR development lacks many of the frameworks and tools that give web development its fast turnaround. We also ran into poorly formatted or hallucinated responses from Gemini, which we addressed by using system instructions and function calling to make responses consistent and easier to validate. Finally, the Gemini API only allowed about 2 requests/min. To work within that limit, we created a Redis cache that stores the output of each Gemini API request, which helped us avoid redundant requests during testing and development.
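A minimal sketch of the caching idea follows. It is hypothetical rather than our actual code: `cached_generate`, `cache_key`, and `DictStore` are names invented for this example, and `generate` stands in for whatever function actually calls Gemini. The `store` argument only needs `get` and `set` methods, which a `redis.Redis` client also provides, so the in-memory `DictStore` here is just a stand-in that lets the example run without a Redis server.

```python
import hashlib
import json


class DictStore:
    """In-memory stand-in with the same get/set calls we would use on a Redis client."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        return self._data.get(key)

    def set(self, key, value):
        self._data[key] = value


def cache_key(model, prompt):
    """Build a stable key from the model name and prompt text."""
    payload = json.dumps({"model": model, "prompt": prompt}, sort_keys=True)
    return "gemini:" + hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_generate(store, model, prompt, generate):
    """Return a cached response if one exists; otherwise call `generate` and store it."""
    key = cache_key(model, prompt)
    hit = store.get(key)
    if hit is not None:
        # A Redis client returns bytes by default, so decode when needed.
        return hit.decode("utf-8") if isinstance(hit, bytes) else hit
    result = generate(prompt)  # the only place a real API request would happen
    store.set(key, result)
    return result
```

With this shape, repeating the same prompt during testing costs no API requests after the first one, which matters when the quota is only a couple of requests per minute.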

Accomplishments that we're proud of

We are proud of the range of features we built in about 24 hours. We are also happy that we integrated all of them into the application, giving it comprehensive functionality to guide users through cooking.

What we learned

We learned how important it is to build in redundancies when working with LLMs, because their output can be inconsistent. We also learned that LLMs are highly capable, but that deliberate prompt engineering is crucial to getting useful results from them.

What's next for Cosmocook!

We plan to improve the stability of our API framework and to investigate alternative AR systems and their tradeoffs. We also have other ideas, such as returning video responses (for example, from YouTube) when the user asks how to perform a certain task, and creating a "My Kitchen" feature that keeps track of your ingredients and suggests recipes and food ideas on a day-to-day basis.
