Inspiration

Wide-spread UAV use is now a reality of modern combat. With more drones, the challenge of ISR scales well beyond our warfighter's capacity to oversee and analyze, let alone make meaningful tactical decisions.

We were inspired by conversations with John from SOCOM and Mahmud from Ukraine in the Discord, who both have hands-on experience in the Ukraine conflict. Mahmud’s specialty is in ingesting real-time drone footage and gave us valuable insights into the problem space.

What it does

Overwatch is a platform for analysts and operators to ingest UAV footage and rapidly get relevant information to make tactical decisions with.

Video Segmentation Models

  • Operators/Analysts can provide prompts about “objects of interest” which we’ll track. We use SAM which spits back a highlighted semantic map of said objects which we’ll track for the duration of the video. LLM-reasoning traces & semantic scene search
  • We further process “events of interest” extracted (i.e. an exploding tank) from the video segmentation models.
  • Semantic scene understanding/search allows us to precise snippets of hours-long video for the operator based on actual events and queries.

How we built it

There basically aren’t drone warfare datasets, so we chose an off-the-shelf, general purpose segmentation model (i.e. Meta’s Segment Anything Model).

We then use GroundingDINO to figure out where in the scene the semantic labels are.

We then use GPT-4-Turbo to run “reasoning traces” on the semantic labels and some scene data for further reasoning about events.

Then we stitched this together with some simple Uvicorn/FastAPI server and a NextJS frontend.

Challenges we ran into

  • Running GroundingDINO
  • Getting drone photogrammetry to work

Accomplishments that we're proud of

Working prototype!

We’re able to input the few combat UAV videos we were able to get our hands on and get somewhat meaningful segmentations / analyses! We were able to get instances of armored vehicles and track them across the video. We also took a stab at localization without GPS by correlating satellite imagery with images extracted from drone footage.

What we learned

We were surprised at how well off-the-shelf works! It gets us even more excited for when we can tune on real data.

What's next for Overwatch

  • Fine-tune model on military specific data
  • Local inference on drone (TensorRT & Quantization)
  • Improve localization (heatmap)

Built With

Share this project:

Updates