Inspiration

In BC time (before ChatGPT), Sean worked on a project with the CIA to aid knowledge sharing and retention between analyst rotations. That work, together with the plethora of new vision-language models, inspired us to explore what tools we could build to make better use of imagery data without a significant engineering investment for each new analysis feature.

What it does

Problem: Intelligence analysts are inundated with unstructured open source data that takes significant effort to analyze and use. Imagery data from different sources is particularly hard to catalog and discover in a time-efficient manner, and building tools to analyze it requires significant engineering investment.

Solution: Provide tools for extracting insights from imagery data, and use semantic search across imagery and text data sources to make that data discoverable.

Key Functionalities:

  • Semantic search across social media text and imagery
  • Satellite imagery zero-shot object detection and feature extraction
  • Automated analysis of satellite imagery
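As a rough illustration of how zero-shot detections could feed automated analysis, the sketch below filters detector output by confidence and counts objects per text query. The `Detection` record, the `summarize` helper, and the 0.3 threshold are assumptions made for this example, not the project's actual code; a real open-vocabulary detector returns its own output format that would need to be mapped into something like this.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    """Hypothetical record for one detector hit (not the project's schema)."""
    label: str    # text query that matched, e.g. "ship"
    score: float  # detector confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def summarize(detections, min_score=0.3):
    """Count detections per label, keeping only those at or above min_score.

    The default threshold is an arbitrary illustrative value.
    """
    kept = [d for d in detections if d.score >= min_score]
    return Counter(d.label for d in kept)


# Toy detections standing in for the output of a zero-shot detector.
SAMPLE = [
    Detection("ship", 0.82, (10, 20, 60, 45)),
    Detection("ship", 0.15, (200, 40, 230, 60)),   # dropped: below threshold
    Detection("crane", 0.51, (120, 15, 140, 90)),
]

counts = summarize(SAMPLE)
```

Per-label counts like these are the kind of structured feature that could then be stored alongside the image for later search.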

How we built it

  • Open source, locally running detection, captioning, and VQA models:
      • OWL-ViT v2 open-vocabulary object detection
      • CogVLM image captioning and visual question answering
      • RAM++ image tagging
  • PostgreSQL + pgvector for semantic search
  • Streamlit for lightweight UIs
  • OpenAI GPT and text embedding APIs
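To make the semantic search piece concrete, here is a minimal sketch of nearest-neighbor ranking by cosine distance, which is what pgvector's `<=>` operator computes. The table name `items`, its columns, the query parameters, and the toy 3-dimensional embeddings are all assumptions for illustration; the project's real schema and embedding dimensions are not described above.

```python
import math

# Hypothetical pgvector query; table and column names are assumptions.
# `<=>` is pgvector's cosine-distance operator.
SEARCH_SQL = """
SELECT id, source, caption
FROM items
ORDER BY embedding <=> %(query_embedding)s::vector
LIMIT %(k)s;
"""


def cosine_distance(a, b):
    """Cosine distance: 1 minus the cosine similarity of two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def search(query_embedding, rows, k=3):
    """In-memory stand-in for the SQL above: the k nearest rows, nearest first."""
    ranked = sorted(
        rows, key=lambda row: cosine_distance(query_embedding, row["embedding"])
    )
    return ranked[:k]


# Toy rows mixing text and image-derived captions in one index.
ROWS = [
    {"id": 1, "source": "social_text", "caption": "ships docked at the port",
     "embedding": [0.9, 0.1, 0.0]},
    {"id": 2, "source": "satellite_caption", "caption": "aircraft parked on an apron",
     "embedding": [0.0, 1.0, 0.1]},
    {"id": 3, "source": "satellite_caption", "caption": "cargo vessel near a pier",
     "embedding": [0.8, 0.2, 0.1]},
]

results = search([1.0, 0.0, 0.0], ROWS, k=2)
```

Storing captions and tags generated from images as text embeddings in the same table is one way a single query could search across both text and imagery.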

Challenges we ran into

  • Long inference times on CPU
  • Data access issues on some datasets

Accomplishments that we're proud of

  • Setting up a full semantic search datastore over text and imagery
  • Achieving impressive detection fidelity on satellite imagery with open source models
  • Building a few different analysis tools with a team of two

What we learned

  • Open source visual question-answering tools are viable for imagery analysis across diverse viewpoints (satellite, first person, security footage).
  • Open-vocabulary trackers for video are scarce in the open source community.

What's next for Imagery Adventures: Semantic Analysis and Extraction

  • Analytics extraction from video sources
      • How many people exited a building in the last hour?
      • How many ships docked in the last day?
  • Continuous analysis pipelines from imagery sources
  • Open-vocabulary object detection and tracking
  • Improved imagery tagging for semantic search over image crops
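Questions like the ones above reduce to counting timestamped events inside a time window once a tracker emits them. The sketch below assumes a hypothetical event stream of `(timestamp, kind)` pairs; the event kinds, the `count_events` helper, and the sample data are illustrative only, since this pipeline has not been built yet.

```python
from datetime import datetime, timedelta


def count_events(events, kind, now, window):
    """Count events of the given kind with now - window < timestamp <= now."""
    start = now - window
    return sum(
        1 for timestamp, event_kind in events
        if event_kind == kind and start < timestamp <= now
    )


# Hypothetical tracker output: (timestamp, event kind) pairs.
EVENTS = [
    (datetime(2024, 1, 1, 10, 30), "exit"),   # outside the last hour
    (datetime(2024, 1, 1, 11, 10), "exit"),
    (datetime(2024, 1, 1, 11, 30), "enter"),  # different kind
    (datetime(2024, 1, 1, 11, 50), "exit"),
]

NOW = datetime(2024, 1, 1, 12, 0)
exits_last_hour = count_events(EVENTS, "exit", NOW, timedelta(hours=1))
```

The same helper would answer the ship question with `kind="dock"` and `timedelta(days=1)`, assuming the tracker emitted such events.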

Demo Deck: https://docs.google.com/presentation/d/1NnpZpbe55q8T-tL0j6mvWFpfmkL1DBC_mVnTUYks_4Y/edit?usp=sharing

Built With

  • cogvlm
  • detection
  • embedding
  • gpt
  • object
  • open-vocabulary
  • openai
  • pgvector
  • postgresql
  • ram++
  • streamlit
  • text
  • vqa