AdImpact Databot

Instance #1 of the databot looks at AdImpact political advertising data.
It allows natural language query of complex datasets, and can explain the SQL statements it generates.
Using Cortex's TRANSLATE() and SUMMARIZE() functions, we can extract short summaries of ad text to understand what politicians are saying.
Instance #2 of the Databot looks at AdImpact automotive ads -- it takes just a minor configuration change to look at a different dataset.
The state machine used by the agent to mediate among the user, the Arctic LLM, the SQL Analyzer, and the Snowflake Database.
The AdImpact Databot Team: Michael, Daniel, Minita, and Joe.

Inspiration

The idea for AdImpact Databot was born out of the need to empower our analysts, who excel with Excel and Tableau but lack SQL expertise, to harness the full potential of our data. We recognized the potential of Large Language Models (LLMs) to bridge this gap by converting natural language instructions into working SQL statements. Our goal was to create a tool that not only answers queries but also educates users on SQL, ensuring secure and safe data interactions.

What it Does

AdImpact Databot enables users to query databases using natural language, translating these queries into SQL. It provides answers and explanations, guiding users through the nuances of SQL. The bot ensures security by parsing and validating SQL statements, restricting them to safe read-only operations. Users interact with the Snowflake Arctic LLM only through the SqlAgent mediator, which manages state transitions and ensures secure, meaningful responses. The version exposed for the contest has access to 2024 US political ad and spending information; pointing it at different tables would need only a simple configuration change.

How We Built It

We built the AdImpact Databot on top of Snowflake, Arctic, and Streamlit, plus our own leading-edge LLM agent code. Some key aspects of the tool:

Configurability: Allows easy setup of new instances with minimal configuration changes, adapting to different data sets and user needs.
SQL Agent with State Machine: Manages interaction states: awaiting input, prompting the LLM, running SQL commands, and sending output.
SQL Parser and Validator: Ensures SQL statements are valid and read-only.
Security Measures: Configured Snowflake users and roles to limit access and prevent security breaches.
Performance Optimization: Improved response times through database tuning and session state maintenance.

Challenges We Ran Into

Security and Safety: Ensuring the LLM-generated SQL statements were secure and adhered to read-only operations.
Performance Issues: Initial response times were slow, requiring extensive optimization.
Testing and Usability: Designing and testing the chatbot required diverse user interactions to uncover potential issues and improve usability.
Context Window Limitations: Dealing with the Arctic LLM’s token limit required careful prompt construction and metadata management.

Accomplishments that We're Proud Of

Security Architecture: Successfully implemented a robust security system to prevent SQL injection and unauthorized data access.
Configurability and Scalability: Achieved a highly configurable system that can be easily adapted to different use cases with minimal effort.
Data Enrichment: Used Cortex TRANSLATE() and SUMMARIZE() functions to turn ad transcripts, some in Spanish and some very long, into pithy English extracts.
Performance Improvements: Significantly reduced response times, ensuring a smoother user experience.
User Empowerment: Enabled non-technical users to interact with and understand complex data through natural language queries.

What We Learned

Importance of Diverse Testing: Engaging multiple testers revealed unexpected issues and improved the system’s robustness.
Effective Metadata Utilization: Embedding database documentation in view creation statements enhanced the LLM’s ability to infer and explain data structures.
Prompt Engineering: Providing sample results and focusing on reporting tables improved the LLM’s query accuracy and user comprehension.

What's Next for AdImpact Databot

Refinement of Capabilities: Experiment with different system prompts to elicit varied behaviors and further optimize performance.
Integration with Existing Products: Embed AdImpact Databot’s capabilities into our existing products to provide faster, more accurate information to users.
Enhanced User Training and Documentation: Develop comprehensive guides and training programs to maximize user engagement and understanding.
Advanced Security Features: Implement real-time monitoring and dynamic access controls to further enhance security.
Scalability and Load Management: Ensure the system can handle increased loads and implement load balancing techniques to distribute computational resources effectively.

By continuing to refine and expand AdImpact Databot, we aim to create an indispensable tool that not only answers queries but also educates and empowers our users, making data interaction more intuitive and secure.

About the Team

AdImpact CTO Michael Schader uses his PhD in Swarm Robotics to turn everything into a multiagent system. Full Stack Developer Daniel Brewer hacks on LLMs and video processing pipelines just for fun. Database Manager Minita Surti created and maintains a logical data architecture that's easy for both humans and machines to work with. Database Analyst Joe Bieda makes it all make sense with his command of the data dictionaries.

Built With

amazon-web-services
arctic
cortex
foundationdb
java
python
snowflake
state-machine
streamlit

Submitted to

THE FUTURE OF AI IS OPEN

Created by

I worked on testing the application and some security tips and testing against SQL injection, or improper permissions and accesses.

Daniel Brewer
I worked on testing the application, and I gave the bot working definitions of each column that would translate to every day speech.

JosefBieda
I was primarily engaged in the development of the backend tables and the underlying database architecture. This involved overseeing the end-to-end process of creating the tables and ensuring their operational functionality

Minita Surti
Michael Schader
AdImpact's founding CTO and the architect behind the company's solutions. Also uses his CS PhD to do research in AI and swarm robotics.

Updates

Michael Schader started this project — May 21, 2024 06:21 PM EDT

Leave feedback in the comments!

Log in or sign up for Devpost to join the conversation.