Contact Center Post Interaction Task Automation (CC-PITA)

Sequence Diagram
System Architecture (With PITA Portal)
System Architecture (With Call Recording Software)
Whiteboard Design (Brainstorming idea session)
Whiteboard Design (Brainstorming idea session) 2
Google AI Studio - Sample Prompt Template
CC-PITA Prompt Template

Inspiration

Having worked closely in the contact center domain, we are well aware of the long-standing problems in customer service and are always looking for ways to alleviate them. Because customer service is of utmost importance to any business, contact center agents need to be sharp, knowledgeable and highly productive. Training and supervisor guidance help, but manual, tedious and monotonous tasks fatigue agents and ultimately reduce their productivity.

One of the non-negotiable activities for call center agents is after-call work, which includes submitting information about the incident, case or interaction they handled. Agents have to take notes during the call and then, once it ends, recall the conversation and the important details captured, all within a couple of minutes. Until this information is submitted, the agent cannot receive the next call, which hurts their performance and may even lead to escalations. The result is a frustrated and uninspired agent.

Contact centers are a gold mine of data: customer interactions, pain points, customer feedback and more. A key source is the audio recording of the conversation between the caller and the IVR (IVA, VoiceAI, VoiceBot) plus agent, or between the caller and the agent alone. Several parameters can be derived from these recordings alone, such as customer sentiment, intent, feedback and a call summary, so we decided to put them to good use and build an automation that submits the critical information on the agent's behalf, relieving agents of time-consuming manual tasks they have always had to do. This takes the pressure off agents so that they can focus completely on helping customers rather than on taking notes or worrying about missing information.

What it does

The PITA application works on the call recording (the conversation between the caller and the IVR plus agent, or between the caller and the agent) stored in Google Cloud Storage. In this demo the recording is uploaded by a user of the PITA portal (a Streamlit-based frontend app); in a production scenario it would be uploaded by the recording software. PITA uses the multimodal capabilities of Gemini Pro 1.5 to transcribe the audio to text and analyse that text using a prompt template and prompt engineering. The analysis provides insights such as the intent of the conversation, the caller's sentiment and a summary of the conversation, among others. This data is converted to JSON, again through the prompt template and prompt engineering. The JSON is stored in Google Firestore and used to display data in the PITA portal. The whole process is driven by backend services written in Spring Boot and Node.js and deployed on GCP's Cloud Run service.
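
As a rough illustration of the "analysis converted to JSON" step, here is a minimal Python sketch of how a model reply could be turned into a checked record before it is stored. It is not the project's actual code (the real services are written in Spring Boot and Node.js). The field names intent, sentiment and summary follow the insights named above, but the exact keys, the REQUIRED_FIELDS tuple and the parse_model_json helper are assumptions made for this example.

```python
import json

# Built programmatically so the marker does not clash with this code fence.
FENCE = "`" * 3

# Assumed keys: the write-up names intent, sentiment and summary,
# but the real JSON schema is not given.
REQUIRED_FIELDS = ("intent", "sentiment", "summary")


def parse_model_json(response_text: str) -> dict:
    """Extract a JSON object from a model reply and check the expected keys."""
    text = response_text.strip()

    # Model replies sometimes wrap JSON in a Markdown code fence; drop it if present.
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]
        text = "\n".join(lines)

    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("expected a JSON object")

    missing = [field for field in REQUIRED_FIELDS if field not in record]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return record
```

Validating the reply before it is written to the database keeps a malformed model response from reaching the portal; whether the real pipeline does this is not stated in the write-up.
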
In a nutshell, the suite consists of two main components:

UI Component - Developed using Streamlit and hosted on Streamlit Cloud.
Backend Component - Consists of a Cloud Function, the PITA Orchestrator and the PITA Backend.
- The Cloud Function detects a new object upload event on Google Cloud Storage and triggers the REST API of the Orchestrator running on Cloud Run.
- The Orchestrator runs a sequence of tasks through a pipeline implemented in a Spring Boot application, which is dockerized and hosted on Cloud Run. It invokes the PITA Backend API, passing it the call recording details.
- The PITA Backend uploads the media file for Gemini Pro 1.5 multimodal processing and then invokes the generative AI functionality to produce parameters (key details such as intent, summary and sentiment) from the audio recording, which are sent back to the Orchestrator.
- Finally, the process ends by inserting this key information into Cloud Firestore for further analysis and for display on the PITA portal.
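
The flow in the list above can be sketched in a few lines of Python. This is a simplified stand-in, not the actual Cloud Function or Spring Boot pipeline: the Recording class and the recording_from_event and run_pipeline functions are hypothetical names, and the backend call and the Firestore insert are passed in as plain callables so the sketch runs without any cloud services. The only external detail relied on is that a Cloud Storage upload event carries the "bucket" and "name" of the new object.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Recording:
    bucket: str
    name: str

    @property
    def uri(self) -> str:
        # Standard gs:// form for an object in a Cloud Storage bucket.
        return f"gs://{self.bucket}/{self.name}"


def recording_from_event(event: dict) -> Recording:
    """Cloud Function step: read the bucket and object name from the upload event."""
    return Recording(bucket=event["bucket"], name=event["name"])


def run_pipeline(
    recording: Recording,
    analyse: Callable[[Recording], dict],
    store: Callable[[dict], None],
) -> dict:
    """Orchestrator step: analyse the recording, then persist the result."""
    insights = analyse(recording)  # stands in for the PITA Backend API call
    record = {"recording": recording.uri, **insights}
    store(record)  # stands in for the Cloud Firestore insert
    return record
```

In a test or demo, analyse can be a function that returns canned insights and store can be the append method of a list, which makes it possible to exercise the sequence end to end before wiring in the real services.
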

Please note that the PITA Portal was developed for this hackathon to show how the solution works. In a real-world production scenario, contact center call recording software (such as Verint or NICE) would upload the call recording file in real time to the assigned Cloud Storage bucket instead of the PITA portal. The rest of the process and pipeline execution remains the same. This is shown in the System Architecture (With Call Recording Software) diagram.

How we built it

We used several tools and technologies to build PITA. Here is our tech stack:

Gemini Pro 1.5 Multimodal with prompt template
Google Cloud Storage (GCS)
Google Firestore
Google Cloud Run
Docker
Streamlit
Streamlit Cloud
Python
Spring Boot
Node.js

Challenges we ran into

We did not find a Java SDK for integrating with multimodal Gemini Pro 1.5. We therefore implemented this part in a Node.js/Express app, which consumes the File API and the multimodal generateContent API.
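
To make the two-step pattern concrete (upload the file, then reference it in a generateContent call), here is a hedged Python sketch that only builds the JSON body for the second step. The project itself does this in Node.js, and this is not its code. The body shape (a "contents" list whose "parts" hold a text prompt and a "file_data" entry with "mime_type" and "file_uri") reflects our reading of the public Gemini REST examples and should be checked against the current API reference. The build_generate_content_body helper and the sample values in the usage note are hypothetical.

```python
def build_generate_content_body(prompt: str, file_uri: str, mime_type: str) -> dict:
    """Build a request body pairing a text prompt with a previously uploaded file.

    file_uri is assumed to be the URI returned by the File API upload step.
    """
    if not prompt:
        raise ValueError("prompt must not be empty")
    if not file_uri:
        raise ValueError("file_uri must not be empty")
    return {
        "contents": [
            {
                "parts": [
                    {"text": prompt},
                    # Assumed field names; verify against the API reference.
                    {"file_data": {"mime_type": mime_type, "file_uri": file_uri}},
                ]
            }
        ]
    }
```

A caller would serialize the returned dictionary with json.dumps and send it as the POST body, for example with a prompt asking for intent, sentiment and summary as JSON and a MIME type such as "audio/wav".
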

Accomplishments that we're proud of

Use of Gemini multimodal, which eliminated the need for a transcription service. Without it, we would have had to use a separate transcription service, adding complexity, latency and application maintenance.
Diverse technical stack.
Extensive use of Google Cloud Platform.
Our ability to pick up GUI portal development despite being voice and conversational application developers.

What we learned

Gemini Pro 1.5
Streamlit

What's next for Contact Center Post Interaction Task Automation (CC-PITA)

Highlight the newly added record in the "Post Interaction Task Records" table after a file is uploaded for processing in the playground.
Allow switching between dark and light themes once Streamlit allows this programmatically.
Support multiple audio file uploads on the playground screen.
Customize and add more varieties of charts and metrics in the "Charts and Metrics" section.
Login/RBAC (role-based access control) support for the portal.
Add filtering on Post Interaction Task Records.
Allow selecting records and exporting them to CSV in the Post Interaction Task Records section.

Built With

gcp
google-cloud
java
node.js
python
springboot
streamlit

Submitted to

Google AI Hackathon

Created by

I designed the flow and contributed to Java, GCP Cloud Functions, the Spring Boot backend, GCP deployment, documentation and execution steps.

Shivaprasad Mohanrao
I worked on the Streamlit front end and the Node.js backend. It was a good learning experience. I also learned about Gemini Pro 1.5, the first multimodal model I have tried. Overall, I enjoyed building this solution, as I could apply my contact centre experience and knowledge and learn a few new things as well.
- Priti Sharma

Priti Sharma