Inspiration

Inspired by a recent experience in which our basketball was stolen at a court in our neighborhood, we started thinking about ways to build something that could help real detectives solve real mysteries. In big cities, where organized crime and gangs are prevalent and the same criminals commit similar crimes repeatedly, wouldn't it be helpful to create software that could help identify and catch them? We came up with Suspect Watch.

What it does

Connected to a network of surveillance cameras within a city or area, the front end of Suspect Watch lets a detective or law enforcement officer input a detailed description of a suspect, along with a general timeframe for the suspect's prior criminal behavior. By continuously monitoring the cameras and comparing incoming images against suspect profiles and details, Suspect Watch provides real-time notifications when a suspect re-enters a monitored area. This speeds up the process of solving crimes that would otherwise remain unsolved mysteries.
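The matching step behind those notifications can be sketched as follows. This is a minimal illustration, not our production code: the function names, the 0.8 threshold, and the plain cosine-similarity comparison are our own illustrative choices.

```python
import numpy as np

MATCH_THRESHOLD = 0.8  # illustrative value; in practice this would be tuned

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def check_frame(person_embeddings, suspect_profiles, notify):
    """Compare each person detected in a frame against every suspect profile
    and fire a real-time notification when similarity clears the threshold."""
    alerts = []
    for emb in person_embeddings:
        for suspect_id, profile_vec in suspect_profiles.items():
            sim = cosine(emb, profile_vec)
            if sim >= MATCH_THRESHOLD:
                notify(suspect_id, sim)
                alerts.append((suspect_id, sim))
    return alerts
```

Running this per frame keeps alerting simple: a hit on any camera immediately surfaces the suspect ID and match confidence to the officer.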

How we built it

The front end sends CCTV footage to the back end. There, a YOLOv8 segmentation model identifies each person in every frame, and a vision transformer maps each detected person to an embedding in a shared multidimensional space. The suspect's description is also sent to the back end, where an attention-based text transformer embeds it into the same space. The image vectors are stored in a vector database and queried using the text embedding.

To train the vision and text transformers to produce corresponding embeddings, so that a person and their description land at nearly the same coordinates, we needed an image–text paired dataset of people, which isn't available online. To build one, we took COCO, a general image dataset, ran YOLOv8 on thousands of images to crop out the people, and then used the GPT-4o API to generate descriptions of the extracted people. Finally, we trained the text and vision transformers together using contrastive learning, reaching a median distinguishability of 96%.
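The store-and-query step can be sketched with a toy in-memory stand-in for the vector database. The class and method names here are our own illustration; the original write-up doesn't name the specific vector database we used.

```python
import numpy as np

class PersonVectorStore:
    """Toy in-memory stand-in for the vector database (illustrative names)."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []   # person embeddings from the vision transformer
        self.metadata = []  # e.g. (camera_id, frame_timestamp)

    def add(self, embedding, meta):
        v = np.asarray(embedding, dtype=float)
        self.vectors.append(v / np.linalg.norm(v))  # store unit vectors
        self.metadata.append(meta)

    def query(self, text_embedding, top_k=3):
        """Return the top_k closest person sightings to a text embedding,
        ranked by cosine similarity."""
        q = np.asarray(text_embedding, dtype=float)
        q = q / np.linalg.norm(q)
        sims = np.stack(self.vectors) @ q  # cosine similarity on unit vectors
        order = np.argsort(-sims)[:top_k]
        return [(self.metadata[i], float(sims[i])) for i in order]
```

Because both encoders are trained into the same space, querying the image vectors with the embedded suspect description directly returns the sightings that best match the text.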
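The contrastive objective we trained with can be sketched as the symmetric, CLIP-style loss below: each image embedding should be closest to its own caption's embedding among all captions in the batch, and vice versa. The function name, the 0.07 temperature, and the NumPy implementation are illustrative; our actual training loop used a deep learning framework.

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric contrastive loss over a batch of paired embeddings.
    Row i of img_emb and row i of txt_emb are a matching pair."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature  # (N, N) pairwise similarities

    def xent_on_diagonal(l):
        # cross-entropy where the correct class for row i is column i
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_probs))

    # average the image->text and text->image directions
    return 0.5 * (xent_on_diagonal(logits) + xent_on_diagonal(logits.T))
```

Minimizing this pulls a person's image embedding and their generated description toward the same coordinates while pushing apart mismatched pairs, which is what makes the text-to-image query in the vector database work.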

Challenges we ran into

One challenge we ran into was that the API wasn't returning usable descriptions, but thankfully we were able to fix this. Further, the model we used at first produced tensors with the wrong number of values for our pipeline. Another issue was that the suspect description wasn't reaching the back end, which we also had to debug.

Accomplishments that we're proud of

An accomplishment we are proud of is working through the numerous challenges we faced to build a solid, running project that could genuinely be useful to real detectives. There are so many unsolved crimes and mysteries out there, and with software like Suspect Watch, some justice could be served.

What we learned

Developing Suspect Watch sharpened our problem-solving skills: we had to overcome challenges with API integration, tensor shapes, and getting the description from the front end to the back end. It also built technical expertise, from integrating front-end and back-end components to working with computer vision models (such as YOLOv8 and vision transformers) and handling data processing, all of which will be valuable for future projects. Above all, we learned that understanding how technology can address real-world problems is crucial, and that a project like this could make a real-world impact.

What's next for Suspect Watch

Next, we plan to add vehicle and license-plate detection. We also want larger cloud storage, and we plan to keep updating Suspect Watch as new technologies emerge so that it remains a modern, useful product.
