Inspiration
In BC time (before ChatGPT), Sean worked on a project with the CIA to aid knowledge sharing and retention between analyst rotations. That work, together with the plethora of new vision-language models, inspired us to explore what tools we could build to make better use of imagery data without a significant engineering investment for each new analysis feature.
What it does
Problem: Intelligence analysts are inundated with unstructured open source data that takes significant effort to analyze and use. Imagery data from different sources is particularly hard to catalog and discover in a time-efficient manner, and building tools to analyze imagery data requires significant engineering investment.
Solution: Provide tools for extracting insights from imagery data, and use semantic search across imagery and text data sources for data discoverability.
Key Functionalities:
- Semantic search across social media text and imagery
- Satellite imagery zero-shot object detection and feature extraction
- Automated analysis of satellite imagery
How we built it
- Open source, locally running detection, captioning, and VQA models:
  - OWL-ViTv2 for open-vocabulary object detection
  - CogVLM for image captioning and visual question answering
  - RAM++ for image tagging
- PostgreSQL + pgvector for semantic search
- Streamlit for lightweight UIs
- OpenAI GPT and text embedding APIs
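As a rough illustration of the semantic search piece, the sketch below ranks stored items by cosine distance to a query embedding, which is the metric pgvector exposes through its `<=>` operator. It is not the project's actual code: the `items` table, the row fields, and the toy 3-dimensional vectors are all assumptions made for the example, and real embeddings would come from a text embedding API applied to post text and to image captions or tags.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity) between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def search(query_embedding, rows, top_k=3):
    """Return the top_k rows closest to the query embedding, nearest first."""
    ranked = sorted(
        rows, key=lambda row: cosine_distance(query_embedding, row["embedding"])
    )
    return ranked[:top_k]


# Hypothetical rows: text posts and image captions share one embedding space.
# The vectors are made up for illustration only.
ROWS = [
    {"id": 1, "kind": "text", "content": "Cargo ship arriving at the harbor",
     "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "kind": "image", "content": "Caption: two vessels docked at a pier",
     "embedding": [0.8, 0.2, 0.1]},
    {"id": 3, "kind": "text", "content": "Traffic jam on the highway",
     "embedding": [0.0, 0.1, 0.9]},
]

# Roughly the same ranking expressed as a pgvector query, assuming a
# hypothetical `items` table with an `embedding` vector column.
EQUIVALENT_SQL = """
SELECT id, kind, content
FROM items
ORDER BY embedding <=> %(query)s
LIMIT %(top_k)s;
"""

# A made-up query vector standing in for an embedded search phrase.
results = search([1.0, 0.0, 0.0], ROWS, top_k=2)
for row in results:
    print(row["id"], row["kind"], row["content"])
```

Storing captions and tags for images next to raw text in one table is one plausible way to let a single text query reach both kinds of data; in Postgres the sort would be pushed into the database rather than done in Python.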
Challenges we ran into
- Long inference times on CPU
- Data access issues on some datasets
Accomplishments that we're proud of
- Setting up a full semantic search datastore over text and imagery
- Impressive detection fidelity on satellite imagery with open source models
- Built a few different analysis tools with a team of two
What we learned
- Open source visual question-answering tools are viable for imagery analysis across diverse viewpoints (satellite, first person, security footage).
- Open-vocabulary trackers for video are scarce in the open source community.
What's next for Imagery Adventures: Semantic Analysis and Extraction
- Analytics extraction from video sources, for example:
  - How many people exited a building in the last hour?
  - How many ships docked in the last day?
- Continuous analysis pipelines from imagery sources
- Open-vocabulary object detection and tracking
- Improved imagery tagging for semantic search over image crops
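Once a detection and tracking pipeline emits timestamped events, questions like the two examples above reduce to counting events inside a time window. The sketch below shows that idea only; the event schema (`track_id`, `label`, `type`, `time`) and the sample data are hypothetical, and nothing here reflects an existing implementation.

```python
from datetime import datetime, timedelta


def count_events(events, label, event_type, now, window):
    """Count events of one label and type with a timestamp in (now - window, now]."""
    start = now - window
    return sum(
        1
        for event in events
        if event["label"] == label
        and event["type"] == event_type
        and start < event["time"] <= now
    )


# Illustrative events, as a tracker might emit them (made-up data).
NOW = datetime(2024, 1, 1, 12, 0)
EVENTS = [
    {"track_id": 1, "label": "person", "type": "exit", "time": datetime(2024, 1, 1, 11, 30)},
    {"track_id": 2, "label": "person", "type": "exit", "time": datetime(2024, 1, 1, 10, 45)},
    {"track_id": 3, "label": "person", "type": "enter", "time": datetime(2024, 1, 1, 11, 50)},
    {"track_id": 4, "label": "ship", "type": "dock", "time": datetime(2023, 12, 31, 18, 0)},
]

# "How many people exited a building in the last hour?"
people_exited = count_events(EVENTS, "person", "exit", NOW, timedelta(hours=1))

# "How many ships docked in the last day?"
ships_docked = count_events(EVENTS, "ship", "dock", NOW, timedelta(days=1))

print(people_exited, ships_docked)
```

The hard part is producing reliable events in the first place, which is why open-vocabulary tracking appears on the same list.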
Demo Deck: https://docs.google.com/presentation/d/1NnpZpbe55q8T-tL0j6mvWFpfmkL1DBC_mVnTUYks_4Y/edit?usp=sharing
Built With
- cogvlm
- detection
- embedding
- gpt
- object
- open-vocabulary
- openai
- pgvector
- postgresql
- ram++
- streamlit
- text
- vqa