Inspiration
For this hack, we wanted to make technology more accessible to people who have difficulty with traditional input methods such as a keyboard and mouse. By focusing on hands-free browsing, we aimed to provide an alternative that not only addresses these challenges but also empowers users to browse the internet with greater freedom and efficiency. Through this hack, we hope to offer a more inclusive online experience in which everyone can benefit from modern technology, regardless of their physical abilities.
What it does
Cerebro provides an accessible, hands-free alternative to traditional internet browsing for people with physical impairments. It uses computer vision to detect minor facial movements that:
- Vertically scroll the webpage
- Interact with detected points of interest (hyperlinks, input fields, etc.)
It also uses voice commands to:
- Navigate through web browser history
- Type with voice input
How we built it
The computer vision was implemented with face-api.js, a JavaScript library that specializes in detecting and tracking facial landmarks.
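As a rough sketch of how that detection loop might look (assuming face-api.js is loaded as the global `faceapi`, a `<video>` element shows the camera feed, and `handleGesture` is a hypothetical dispatcher of ours; the model paths are placeholders):

```javascript
// Sketch of the landmark-tracking loop. Assumes face-api.js is loaded as the
// global `faceapi` and `camera-feed` is a <video> element showing the webcam.
const video = document.getElementById('camera-feed');

async function startTracking() {
  // Load the lightweight face detector and the 68-point landmark model.
  await faceapi.nets.tinyFaceDetector.loadFromUri('/models');
  await faceapi.nets.faceLandmark68Net.loadFromUri('/models');

  setInterval(async () => {
    const detection = await faceapi
      .detectSingleFace(video, new faceapi.TinyFaceDetectorOptions())
      .withFaceLandmarks();
    if (!detection) return;

    // The landmark groups that drive our controls: eyebrows for element
    // selection, eyes for scrolling. handleGesture is a hypothetical dispatcher.
    const landmarks = detection.landmarks;
    handleGesture(landmarks.getLeftEyeBrow(), landmarks.getLeftEye());
  }, 100); // poll roughly ten times per second
}
```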
The voice commands were implemented with Google's Web Speech API, which let us transcribe and handle microphone input with minimal setup.
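A minimal sketch of the recognition setup, assuming Chrome's `webkit`-prefixed constructor and a hypothetical `handleVoiceCommand` dispatcher:

```javascript
// Continuous speech recognition with the Web Speech API. In Chrome the
// constructor is exposed as webkitSpeechRecognition.
const recognition = new webkitSpeechRecognition();
recognition.continuous = true;      // keep listening across utterances
recognition.interimResults = true;  // stream partial results for responsiveness

recognition.onresult = (event) => {
  const result = event.results[event.results.length - 1];
  if (result.isFinal) {
    // Dispatch the finished transcript ("back", "type hello world", ...).
    handleVoiceCommand(result[0].transcript.trim());
  }
};

recognition.start();
```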
The link between the Chrome extension window and the current browser window was achieved with extension content scripts, which let us securely inject JavaScript from the extension into the page. Communication between the injected scripts and the extension is then handled through Chrome's built-in runtime messaging system.
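In broad strokes, the two halves talk roughly like this (the `SCROLL` message shape here is made up for illustration):

```javascript
// content-script.js — injected into the current page
chrome.runtime.onMessage.addListener((message, sender, sendResponse) => {
  if (message.type === 'SCROLL') {
    window.scrollBy({ top: message.amount, behavior: 'smooth' });
  }
  sendResponse({ ok: true });
});

// extension side — forward a detected gesture to the active tab
chrome.tabs.query({ active: true, currentWindow: true }, ([tab]) => {
  chrome.tabs.sendMessage(tab.id, { type: 'SCROLL', amount: 200 });
});
```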
Challenges we ran into
The first major challenge we ran into was displaying our live camera feed inside the Chrome extension. We hit multiple Chrome permissions issues and tried to resolve them in several different ways. It was quite a niche problem with little documentation online, but we eventually came across a solution: an options page within the extension that grants the camera permission on startup.
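The gist of the workaround, sketched below: request the camera once from the options page, which runs on the extension's own origin, so that Chrome remembers the grant for the whole extension.

```javascript
// options.js — ask for the camera from the extension's own page so the
// permission sticks to the extension origin.
navigator.mediaDevices.getUserMedia({ video: true })
  .then((stream) => {
    // We only needed the permission prompt, not the stream, so release it.
    stream.getTracks().forEach((track) => track.stop());
  })
  .catch((err) => console.error('Camera permission was denied:', err));
```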
Selecting a speech-to-text API for our extension was another difficult task. We experimented with several options, including OpenAI's Whisper and AssemblyAI, before settling on Google's Web Speech API. Voice-typing latency was a significant problem with the other tools, and Google's solution proved to be the best fit: it is fully integrated into Chrome and requires no external fetch requests.
Finally, settling on the controls for element selection was one of our biggest challenges. Our initial approach moved between elements using winks: winking the right eye to go forward and the left eye to go backward. We tried to implement the approach from the 2016 paper "Real-Time Eye Blink Detection using Facial Landmarks", which computes an eye aspect ratio as a measure of how open the eye is. However, with confounding variables such as distance from the camera and the angle of the face, this implementation turned out to be nearly impossible to control. We eventually abandoned it and detected eyebrow movements instead, which proved surprisingly ergonomic and effective. We devised our own measure of changes in relative eyebrow position, making element selection much more user-friendly.
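For reference, this is the paper's eye aspect ratio, sketched against the six per-eye landmarks that face-api.js returns:

```javascript
// Eye aspect ratio (EAR) from the 2016 paper:
//   EAR = (|p2 - p6| + |p3 - p5|) / (2 * |p1 - p4|)
// where p1..p6 are the six landmarks around one eye, p1 and p4 the corners.
const dist = (a, b) => Math.hypot(a.x - b.x, a.y - b.y);

function eyeAspectRatio([p1, p2, p3, p4, p5, p6]) {
  return (dist(p2, p6) + dist(p3, p5)) / (2 * dist(p1, p4));
}

// e.g. eyeAspectRatio(landmarks.getLeftEye()) — the value drops sharply when
// the eye closes, but as noted above it also varies with distance and pose.
```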
Accomplishments that we're proud of
There are many aspects of Cerebro that we're proud of. First and foremost is the overall performance of the AI pipelines that handle camera and audio input: they detected facial movements and transcribed audio far more accurately than we had anticipated.
We're also proud of the control scheme we engineered. The gestures are intuitive and simple for users, which supports our goal of making Cerebro a tool that is fully accessible to everyone.
This was also our first hackathon, and our first time using many of these tools. While the learning process was frustrating at times, we are very proud of what we've been able to accomplish over this weekend.
What we learned
There are many things that we learned when building Cerebro.
- We learned not only how to develop Chrome extensions, but also what an appropriate workflow for collaborative extension development looks like.
- A lot about various computer vision techniques.
- How to effectively transcribe microphone input.
- Adaptability! (Possibly a generic answer, but still very true. There were many times when we had to rethink the design we had in mind for Cerebro).
What's next for Cerebro?
As of right now, Cerebro is a proof of concept. There is a lot we would like to develop further for this project to reach its true potential.
We plan to:
- Fine-tune the CV algorithms to be more accurate and consistent
- Polish the UI for a more pleasant user experience
- Add more functionality such as zooming, tab navigation, and cursor movement
- Possibly pivot beyond a Chrome extension to support multiple browsers and platforms