Inspiration
This project was inspired by my experience helping at a farmers' market last year. Orders there were processed with a Square POS app by searching and tapping through the UI. I have always found tapping through a series of menus a little cumbersome, and it gets worse when your hands are needed for something else or the screen gets wet. So it seemed like a good idea to let orders be processed via dictation.
What it does
This app lets a Square user dictate an order from their catalog using a microphone. The app fills in each order item, its variation, and its quantity, then presents the order to the user to edit further and submit. Functionally, it replaces selecting items through a click-based menu with a speech-to-text model.
How we built it
VoicePOS was built in Python 3.10 with a simple Flask app and some HTML/JavaScript. It reads a given Square application's catalog by calling the Square API "List Catalog" endpoint. The app then shows a "Record" button that the user presses to dictate the items for an order. VoicePOS stores that recording in a Google Cloud Storage bucket and uses Google's speech-to-text API to convert the recording into text. The extracted text is then pattern-matched against the catalog with regex to compile an order that can be submitted back to the Square API "Submit Order" endpoint.
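To make the catalog-to-order step concrete, here is a minimal sketch of how a catalog response could be indexed and turned into order line items. It is not the actual VoicePOS code: the helper names `index_catalog` and `to_line_items` and the sample data are hypothetical, and it assumes the response follows the Square catalog shape (`objects`, `item_data`, `variations`, `item_variation_data`) and that line items reference a variation through `catalog_object_id` with a string `quantity`.

```python
from typing import Dict, List, Tuple


def index_catalog(catalog_response: dict) -> Dict[Tuple[str, str], str]:
    """Map (item name, variation name), lowercased, to the variation's object id."""
    index: Dict[Tuple[str, str], str] = {}
    for obj in catalog_response.get("objects", []):
        if obj.get("type") != "ITEM":
            continue  # skip categories, taxes, discounts, and so on
        item_data = obj.get("item_data", {})
        item_name = item_data.get("name", "").lower()
        for variation in item_data.get("variations", []):
            var_data = variation.get("item_variation_data", {})
            index[(item_name, var_data.get("name", "").lower())] = variation["id"]
    return index


def to_line_items(
    matches: List[Tuple[str, str, int]],
    index: Dict[Tuple[str, str], str],
) -> List[dict]:
    """Turn (item, variation, quantity) matches into order line items."""
    line_items = []
    for item_name, var_name, quantity in matches:
        key = (item_name.lower(), var_name.lower())
        if key in index:  # silently drop anything not in the catalog
            line_items.append(
                {"catalog_object_id": index[key], "quantity": str(quantity)}
            )
    return line_items


# Hypothetical catalog data, trimmed to the fields used above.
sample_catalog = {
    "objects": [
        {
            "type": "ITEM",
            "id": "ITEM_COFFEE",
            "item_data": {
                "name": "Coffee",
                "variations": [
                    {"id": "VAR_COFFEE_S", "item_variation_data": {"name": "Small"}},
                    {"id": "VAR_COFFEE_L", "item_variation_data": {"name": "Large"}},
                ],
            },
        },
        {
            "type": "ITEM",
            "id": "ITEM_MUFFIN",
            "item_data": {
                "name": "Muffin",
                "variations": [
                    {"id": "VAR_MUFFIN", "item_variation_data": {"name": "Regular"}},
                ],
            },
        },
        {"type": "CATEGORY", "id": "CAT_DRINKS"},
    ]
}

catalog_index = index_catalog(sample_catalog)
line_items = to_line_items(
    [("Coffee", "Large", 2), ("Muffin", "Regular", 1), ("Bagel", "Plain", 1)],
    catalog_index,
)
```

Keying the index on lowercased names keeps the lookup tolerant of how the speech-to-text output happens to be capitalized.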
Challenges we ran into
A main challenge was that I wasn't sure how to integrate this app into the existing Square UI. My vision is to simply have a microphone button next to the catalog UI, with the app running from there. But either that kind of app extension isn't really supported in the Square Developer world, or I couldn't find it. So for a proof-of-concept, I just built a web app.
Another challenge was to produce a regex method that adequately parses text into separate order line items. Because of the proof-of-concept nature of this submission, the text-to-order parsing component is rather brittle and supports English only. It could definitely be improved.
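As an illustration of the kind of regex parsing described here, the sketch below splits a transcript into line items. It is not the VoicePOS implementation: `NUMBER_WORDS`, `build_pattern`, and `parse_order` are hypothetical names, and it assumes English input, quantities spoken before the item, an optional variation word between the two, and simple "-s" plurals. Those assumptions are also what make an approach like this brittle.

```python
import re
from typing import List, Optional

# Spoken quantities the sketch understands; digits are handled separately.
NUMBER_WORDS = {
    "a": 1, "an": 1, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10,
}


def _alternation(words: List[str]) -> str:
    """Escape words and join them longest-first so longer names win."""
    return "|".join(re.escape(w.lower()) for w in sorted(words, key=len, reverse=True))


def build_pattern(item_names: List[str], variation_names: List[str]) -> re.Pattern:
    """Compile '<quantity> [<variation>] <item>[s]' for the given catalog names."""
    qty = r"\d+|" + _alternation(list(NUMBER_WORDS))
    return re.compile(
        rf"\b(?P<qty>{qty})\s+"
        rf"(?:(?P<var>{_alternation(variation_names)})\s+)?"
        rf"(?P<item>{_alternation(item_names)})s?\b"
    )


def parse_order(transcript: str, pattern: re.Pattern) -> List[dict]:
    """Return one dict per recognized line item, in spoken order."""
    results = []
    for match in pattern.finditer(transcript.lower()):
        qty_text = match.group("qty")
        quantity = int(qty_text) if qty_text.isdigit() else NUMBER_WORDS[qty_text]
        variation: Optional[str] = match.group("var")  # None if not spoken
        results.append(
            {"item": match.group("item"), "variation": variation, "quantity": quantity}
        )
    return results


pattern = build_pattern(["Coffee", "Muffin"], ["Small", "Large"])
parsed = parse_order("Two large coffees and a muffin, plus 3 small coffees", pattern)
```

Anything the pattern does not recognize, such as an item missing from the catalog or a quantity spoken after the item, is skipped without warning, which is one reason the resulting order is best shown to the user for review before it is submitted.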
Thirdly, this project was built without a dataset to measure anything like retrieval performance, and without anything like online learning for fine-tuning the speech-to-text model. Therefore it is more of a proof-of-concept that dictation can work and that it can be nice not to have to tap into a big inventory interface. It would be very cool to develop this model and produce some user statistics.
Accomplishments that we're proud of
It works pretty well for a demo, and it doesn't use very much code at all.
What we learned
I learned to find a balance in the flexibility-usability tradeoff, and to focus on the app's most important feature: order composition through a speech-to-text model.
What's next for VoicePOS
Figure out how to integrate this into the actual Square mobile app UI. Then gather some user statistics, improve the model pipeline, and expand!
Built With
- flask
- gcloud
- google-ai
- google-app-engine
- python
- square