Cerebro

Inspiration

For this hack, we wanted to make technology more accessible to people who have difficulty with traditional input methods such as a keyboard and mouse. By focusing on hands-free browsing, we aimed to provide an alternative that not only addresses these challenges but also empowers users to browse the internet with greater freedom and efficiency. We hope to provide a more inclusive online experience where everyone can benefit from modern-day technologies, regardless of their physical abilities.

What it does

Cerebro provides an accessible, hands-free way to browse the internet for people with physical impairments. It uses computer vision to detect minor facial movements that:

Vertically scroll the webpage
Interact with detected points of interest (hyperlinks, input fields, etc.)

It also uses voice commands to:

Navigate through web browser history
Type with voice input

How we built it

The computer vision was implemented with face-api.js, a JavaScript library that specializes in detecting and tracking faces and facial landmarks.

The voice commands use Google’s Web Speech API, which let us easily transcribe and handle microphone input.
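To make the voice-command flow concrete, here is a minimal sketch of how a final transcript from the Web Speech API could be routed to either a history command or typed text. The command phrases, the action names, and the helpers `interpretTranscript` and `startListening` are assumptions for illustration, not Cerebro's actual vocabulary or code.

```javascript
// Hypothetical command table: these phrases are assumptions, not Cerebro's real ones.
const NAVIGATION_COMMANDS = new Map([
  ["go back", "history-back"],
  ["go forward", "history-forward"],
]);

// Turn one final transcript into an action: a navigation command or text to type.
function interpretTranscript(transcript) {
  const text = transcript.trim();
  // Ignore case and trailing punctuation when matching commands.
  const normalized = text.toLowerCase().replace(/[.!?]+$/, "");
  const action = NAVIGATION_COMMANDS.get(normalized);
  if (action) return { type: action };
  return { type: "type-text", text };
}

// Browser-only wiring: returns null where speech recognition is unavailable.
function startListening(onAction) {
  const Recognition =
    globalThis.SpeechRecognition || globalThis.webkitSpeechRecognition;
  if (!Recognition) return null;

  const recognition = new Recognition();
  recognition.continuous = true; // keep listening between phrases
  recognition.interimResults = false; // only act on final results
  recognition.onresult = (event) => {
    for (let i = event.resultIndex; i < event.results.length; i++) {
      if (event.results[i].isFinal) {
        onAction(interpretTranscript(event.results[i][0].transcript));
      }
    }
  };
  recognition.start();
  return recognition;
}
```

Keeping the transcript interpretation separate from the recognition wiring means the command table can be tested and extended without a microphone.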

The link between the Chrome extension window and the current browser window relies on extension content scripts, which allow us to securely inject JavaScript from the extension into the page. The injected scripts and the extension then communicate through Chrome’s built-in runtime messaging events.
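As a rough illustration of that messaging link, the sketch below shows a content script handling actions sent by the extension. It assumes a Manifest V3 extension; the message shape (`action`, `amount`) and the helpers `handleExtensionMessage` and `sendToActiveTab` are hypothetical names chosen for this example, not taken from Cerebro's source.

```javascript
// Content-script side: apply one action to the page.
// `page` is normally `window`; it is a parameter so the logic can be tested.
function handleExtensionMessage(message, page) {
  switch (message.action) {
    case "scroll":
      page.scrollBy(0, message.amount); // positive amount scrolls down
      return { ok: true };
    case "history-back":
      page.history.back();
      return { ok: true };
    case "history-forward":
      page.history.forward();
      return { ok: true };
    default:
      return { ok: false, error: `unknown action: ${message.action}` };
  }
}

// Register the listener only where the extension API actually exists.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onMessage) {
  chrome.runtime.onMessage.addListener((message, _sender, sendResponse) => {
    sendResponse(handleExtensionMessage(message, window));
  });
}

// Extension side (hypothetical helper): send an action to the active tab.
// Assumes the extension UI shares a window with the tab being controlled;
// a detached extension window would need a different tab query.
async function sendToActiveTab(message) {
  const [tab] = await chrome.tabs.query({ active: true, currentWindow: true });
  return chrome.tabs.sendMessage(tab.id, message);
}
```

With this split, the extension page could call something like `sendToActiveTab({ action: "scroll", amount: 200 })` whenever a gesture is detected.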

Challenges we ran into

The first major challenge we ran into was displaying our live camera feed in our Chrome extension. When we attempted this, we hit multiple permissions issues with Chrome that we tried to resolve in many different ways. It was quite a niche problem with little documentation online, but we eventually found a solution: creating an options page within the extension that requests camera permission on startup.
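The options-page workaround can be sketched roughly as follows. This is an assumption about how such a page might work, not Cerebro's actual code: the helper `requestCameraAccess` is hypothetical, and opening the options page on install assumes an options page is declared in the extension manifest.

```javascript
// Options-page side: trigger Chrome's camera prompt for the extension.
// `mediaDevices` is normally `navigator.mediaDevices`; it is a parameter
// so the logic can be exercised without a real camera.
async function requestCameraAccess(mediaDevices) {
  try {
    const stream = await mediaDevices.getUserMedia({ video: true });
    // Only the permission grant matters here, so release the camera right away.
    stream.getTracks().forEach((track) => track.stop());
    return true;
  } catch (err) {
    // The user denied the prompt or no camera is available.
    return false;
  }
}

// Background side: open the options page once the extension is installed,
// so the permission prompt appears before the camera feed is needed.
if (typeof chrome !== "undefined" && chrome.runtime && chrome.runtime.onInstalled) {
  chrome.runtime.onInstalled.addListener(() => chrome.runtime.openOptionsPage());
}
```

On the options page itself, the call would be `requestCameraAccess(navigator.mediaDevices)`.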

Selecting a speech-to-text API for our extension was another difficult task. We experimented with several options, including OpenAI’s Whisper and AssemblyAI, before settling on Google’s Web Speech API. Latency in voice typing was a significant problem with the other tools, and Google’s solution proved to be the best fit because it is built into Chrome and did not require us to make any external fetch requests from our own code.

Finally, deciding how users would control element selection was one of our biggest challenges. Our initial approach was to move between elements using winks: winking the right eye to go forwards and the left eye to go backwards. We tried to implement the approach detailed in a 2016 paper called “Real-Time Eye Blink Detection using Facial Landmarks”, which uses an equation that calculates an eye aspect ratio to measure how open the eye is. However, with confounding variables such as the distance from the camera and the angle of the face, this implementation turned out to be nearly impossible to control. Eventually, we abandoned this approach and decided to detect eyebrow movements instead, which turned out to be surprisingly ergonomic and effective. We created our own equation for detecting changes in relative eyebrow position, making element selection much more user-friendly.
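For reference, the eye aspect ratio from that paper divides the sum of the two vertical eye distances by twice the horizontal eye width, using six landmarks per eye. The sketch below computes it, followed by one possible eyebrow measure. The eyebrow functions are purely an assumption for illustration (our own equation is not reproduced here), and the names `eyebrowRaiseRatio`, `isEyebrowRaised`, and the default threshold are hypothetical.

```javascript
// Euclidean distance between two landmark points of the form { x, y }.
const distance = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);

// Eye aspect ratio for six eye landmarks ordered p1..p6:
// p1 and p4 are the eye corners, p2/p3 the upper lid, p6/p5 the lower lid.
function eyeAspectRatio(eye) {
  const [p1, p2, p3, p4, p5, p6] = eye;
  return (distance(p2, p6) + distance(p3, p5)) / (2 * distance(p1, p4));
}

const meanY = (points) =>
  points.reduce((sum, p) => sum + p.y, 0) / points.length;

// Hypothetical eyebrow measure (NOT Cerebro's equation): the vertical gap
// between eyebrow and eye, divided by eye width so that moving closer to or
// farther from the camera scales both terms equally.
// Image coordinates grow downward, so a raised eyebrow increases the gap.
function eyebrowRaiseRatio(eyebrow, eye) {
  const eyeWidth = distance(eye[0], eye[3]);
  return (meanY(eye) - meanY(eyebrow)) / eyeWidth;
}

// Compare against a resting baseline captured while the face is relaxed.
// The threshold value is an assumed starting point that would need tuning.
function isEyebrowRaised(ratio, restingRatio, threshold = 0.15) {
  return ratio - restingRatio > threshold;
}
```

Normalizing by a distance measured on the same face is what makes a ratio like this less sensitive to how far the user sits from the camera, though it does not correct for head rotation.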

Accomplishments that we're proud of

There are many aspects of Cerebro that we’re proud of. Above all, we’re proud of how well our camera and audio input handling performs. It detects facial movements and transcribes audio far more accurately than we had anticipated.

We’re also very proud of the control scheme we engineered. Our control gestures are intuitive and simple for users, which supports our goal of making Cerebro a tool that is accessible to everyone.

This is also our first hackathon, and our first time using a lot of these tools. While the learning process may have been frustrating at times, we are very proud of what we’ve been able to accomplish over this weekend.

What we learned

We learned many things while building Cerebro:

How to develop a Chrome extension, and what an appropriate workflow for collaborative extension development looks like
A lot about various computer vision techniques
How to effectively transcribe microphone input
Adaptability! (Possibly a generic answer, but still very true. There were many times when we had to rethink the design we had in mind for Cerebro.)

What's next for Cerebro?

As of right now, Cerebro is a proof of concept. There is a lot that we would like to develop further for this project to reach its true potential.

We plan to:

Fine-tune the CV algorithms to be more accurate and consistent
Further develop the UI to provide a more pleasant UX
Add more functionality such as zooming, tab navigation, and cursor movement
Possibly move beyond a Chrome-only extension to support multiple browsers

Built With

Submitted to

Bitcamp 2024
- Winner Best Razzle Dazzle Hack - Bitcamp

Created by

I worked on the front end for the Chrome extension web page. It was my first time using HTML and CSS, so I learned a lot about how the languages work.

Ryan Shechtman
I worked on establishing the connection between the extension and the browser using extension content scripts. Additionally, I built the algorithm that discovers all the points of interest on a webpage for hands-free element selection. I also built the voice command functionality using Google's Web Speech API. It was my first time doing either of these things, and I learned a lot!

Declan Scott
I implemented the hands-free element selection and scrolling features of our Chrome extension using the face-api.js computer vision library. I used the 68-point facial landmark detection model to track useful areas of the face in real time and engineered several creative solutions to make the control gestures as user-friendly as possible. Working on this project was a challenging and exhilarating experience. I learned a lot about computer vision in JavaScript and built a Chrome extension for the first time!

Tanay Naik