Inspiration
It all started with our own experience of online learning in school during the pandemic. The video calls and lectures we attended day after day were informative, but they were not an efficient way of teaching students in a virtual setting. The tests we took were not analysed properly by teachers, even though they could have provided essential insight into where our understanding was weak. The same goes for live lectures, which could be analysed through students' body language to see exactly where they lost track, just as in a physical classroom.
Thus we set out to bring that idea to life by analysing live lectures to support students in their studies, something that remains just as relevant today. Our model currently supports these features for one-on-one student-teacher classes.
What it does
Detection of Emotion and Attentiveness of a Student
We used transfer learning on an EfficientNetB7 model with a custom dataset, combined with open-source LLMs, to detect the student's emotion and attentiveness from the recording of the lecture that the student attended through their webcam during the session.
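Below is a minimal, illustrative sketch of what such a transfer-learning setup can look like in Keras; the dataset directory, image size, class labels, and classification head are assumptions for illustration, not our exact configuration.

```python
# Hypothetical sketch of transfer learning on EfficientNetB7 (Keras).
# Dataset path, class count, and head layers are illustrative assumptions.
import tensorflow as tf

NUM_CLASSES = 5  # e.g. attentive, bored, confused, happy, neutral (assumed labels)

base = tf.keras.applications.EfficientNetB7(
    include_top=False, weights="imagenet", input_shape=(600, 600, 3)
)
base.trainable = False  # freeze the pretrained backbone; train only the new head

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/webcam_frames", image_size=(600, 600), label_mode="categorical"
)
model.fit(train_ds, epochs=5)
```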
Topic Modelling for Each Segment where Student lost attention
-> Using the Gemini Pro SDK and the OpenAI Whisper module, each segment where a loss of interest is detected is transcribed into text, and the most relevant keywords are extracted (topic modelling) to surface the topics where the student needs clarification.
-> The teacher is thus presented with an analysis of when the student lost attention during the video lecture and on which topic(s).
Automatic Video Time-stamping into Different Topics - Saving the Teacher's Time
-> This feature automatically time-stamps videos based on a hyperparameter (time interval) chosen exclusively by the teacher.
Reverse Video Search
-> Instead of scrubbing through the video again and again for a particular keyword, the student can simply type the keyword and get the exact timestamp(s) where it appears.
Generation of Questionnaire on topics where students lacked attention
-> This feature lets the teacher generate a questionnaire through an LLM (or upload their own) on the topics the student struggled with, so the student can double-check their understanding and plan further steps.
-> The student can also get recommendations for the most relevant YouTube videos on those topics, or the teacher can suggest materials or videos from their own course.
Advanced RAG-based QnA Bot (Handwritten/Non-Handwritten)
-> This feature allows the user to ask questions about their uploaded PDFs, whether handwritten or non-handwritten (under development).
-> The student can quickly find relevant information or explanations for a particular topic of interest without wasting precious time searching for it.
Advanced RAG-based QnA Bot for Videos/Tutorials
-> This feature allows the user to ask questions about their videos/tutorials.
-> The student can quickly find relevant information or explanations for a particular topic of interest without searching through the entire video.
How we built it
Backend
Video Analysis
-> This segment was built using the Gemini Pro Vision model, MTCNN, and a few Python libraries.
-> The video is downloaded from the Drive link and stored locally. A Python library then splits it into clips at a specified interval, and each clip is processed by the MTCNN model to detect the student's face.
-> The detected face images are then passed to the Gemini Pro Vision model, which predicts the student's "Emotional State" and "Attentiveness Factor/State" at that point.
-> Finally, using Gemini Pro and the OpenAI Whisper module, each segment where a loss of interest is detected is transcribed into text, and the most relevant keywords are extracted (topic modelling) to surface the topics where the student needs clarification.
-> The teacher is thus presented with an analysis of when the student lost attention during the video lecture and on which topic(s).
-> The results, along with their time intervals, are displayed on the webpage, pinpointing exactly where the student lacked responsiveness.
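The following is a simplified sketch of this pipeline, assuming moviepy for frame sampling, the mtcnn package for face detection, and the google-generativeai SDK for Gemini Pro Vision; the prompt wording, sampling interval, and API-key handling are illustrative, not our exact implementation.

```python
# Sketch: sample frames at a fixed interval, crop the student's face with MTCNN,
# then ask Gemini Pro Vision for emotional state and attentiveness.
import google.generativeai as genai
from moviepy.editor import VideoFileClip
from mtcnn import MTCNN
from PIL import Image

genai.configure(api_key="YOUR_API_KEY")           # assumed to be configured elsewhere
vision_model = genai.GenerativeModel("gemini-pro-vision")
detector = MTCNN()

def analyse_lecture(video_path: str, interval_s: int = 30):
    clip = VideoFileClip(video_path)
    results = []
    for t in range(0, int(clip.duration), interval_s):
        frame = clip.get_frame(t)                 # RGB numpy array at time t
        faces = detector.detect_faces(frame)      # MTCNN face detection
        if not faces:
            continue
        x, y, w, h = faces[0]["box"]
        face_img = Image.fromarray(frame[y:y + h, x:x + w])
        prompt = ("Describe the student's emotional state and whether they look "
                  "attentive or distracted. Answer as 'emotion, attentive/distracted'.")
        response = vision_model.generate_content([prompt, face_img])
        results.append({"time_s": t, "analysis": response.text})
    return results
```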
Video Keyword-oriented Timestamp Generation
For teachers
-> This feature automatically time-stamps videos based on a hyperparameter (time interval) chosen exclusively by the teacher.
-> Using Gemini Pro and the OpenAI Whisper module, each detected segment is transcribed into text, and the most relevant keywords are extracted (topic modelling) to surface the topics where students need clarification (a rough sketch follows below).
For students
-> Students can also quickly get relevant keywords for an interval of their choosing, so they know what a particular video lecture covers before diving in.
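A rough sketch of the interval-wise keyword extraction, assuming the openai-whisper package and the google-generativeai SDK; the interval, model size, and prompt are illustrative placeholders.

```python
# Sketch: transcribe with Whisper, bucket segments into the chosen interval,
# and ask Gemini Pro for keywords summarising each bucket.
import google.generativeai as genai
import whisper

genai.configure(api_key="YOUR_API_KEY")
llm = genai.GenerativeModel("gemini-pro")
asr = whisper.load_model("base")

def timestamp_topics(video_path: str, interval_s: int = 120):
    segments = asr.transcribe(video_path)["segments"]   # each has start/end/text
    topics = []
    for start in range(0, int(segments[-1]["end"]), interval_s):
        chunk = " ".join(s["text"] for s in segments
                         if start <= s["start"] < start + interval_s)
        if not chunk.strip():
            continue
        reply = llm.generate_content(
            f"Give 3-5 keywords summarising the topics in this lecture excerpt:\n{chunk}"
        )
        topics.append({"start_s": start, "keywords": reply.text})
    return topics
```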
Search within the Video Lecture
-> Instead of scrubbing through the video again and again for a particular keyword, the student can type the keyword and get the exact timestamp(s) where it appears.
-> First, the whole lecture is divided into chunks and transcribed with OpenAI Whisper. Embeddings for each chunk are then generated through the Gemini Pro API, and cosine similarity between the stored embeddings and the query's embedding yields the output timestamp.
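A minimal sketch of this reverse search, assuming the google-generativeai embedding endpoint and plain NumPy cosine similarity; the chunk format and embedding model name are assumptions for illustration.

```python
# Sketch: embed each transcribed chunk and the query with Gemini embeddings,
# then return the timestamp of the most similar chunk (cosine similarity).
import numpy as np
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")

def embed(text: str) -> np.ndarray:
    res = genai.embed_content(model="models/embedding-001", content=text)
    return np.array(res["embedding"])

def search_timestamp(query: str, chunks: list[dict]) -> dict:
    """chunks: [{'start_s': 0, 'text': '...'}, ...] from the Whisper transcript."""
    q = embed(query)
    scores = []
    for c in chunks:
        v = embed(c["text"])
        scores.append(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v)))
    best = int(np.argmax(scores))
    return {"timestamp_s": chunks[best]["start_s"], "score": float(scores[best])}
```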
QnA Bot for Video Lectures
-> First, the whole lecture is divided into chunks and transcribed with OpenAI Whisper. Embeddings are then generated through the Gemini Pro API, and with the FAISS vector store and LangChain prompt templates, a conversational chain is set up as a RAG pipeline around an LLM instance of Gemini Pro.
-> The fetched results are displayed on the website.
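A minimal sketch of the RAG setup, assuming the langchain-google-genai integration; the chunk sizes and prompt template are illustrative rather than our exact values.

```python
# Sketch: chunk the transcript, index it in FAISS with Gemini embeddings,
# and answer questions with a "stuff" QA chain around Gemini Pro.
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.prompts import PromptTemplate
from langchain.chains.question_answering import load_qa_chain
from langchain_community.vectorstores import FAISS
from langchain_google_genai import ChatGoogleGenerativeAI, GoogleGenerativeAIEmbeddings

def build_video_qna(transcript: str):
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_text(transcript)
    embeddings = GoogleGenerativeAIEmbeddings(model="models/embedding-001")
    store = FAISS.from_texts(chunks, embedding=embeddings)      # vector index of the lecture

    prompt = PromptTemplate(
        template=("Answer the question from the lecture context.\n"
                  "Context:\n{context}\n\nQuestion: {question}\nAnswer:"),
        input_variables=["context", "question"],
    )
    llm = ChatGoogleGenerativeAI(model="gemini-pro", temperature=0.3)
    chain = load_qa_chain(llm, chain_type="stuff", prompt=prompt)
    return store, chain

def answer(store, chain, question: str) -> str:
    docs = store.similarity_search(question)                    # retrieve relevant chunks
    return chain({"input_documents": docs, "question": question},
                 return_only_outputs=True)["output_text"]
```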
QnA Bot for PDFs
Non-Handwritten
-> First, the whole PDF goes through the embedding-generation phase using LangChain with Gemini Pro and FAISS.
-> With FAISS and LangChain prompt templates, a conversational chain is set up as a RAG pipeline around an LLM instance of Gemini Pro.
-> The fetched results are displayed on the website.
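For the non-handwritten path, the text-extraction step can be sketched as below, assuming PyPDF2; the extracted text then feeds the same chunking, embedding, and FAISS indexing sketched above. The file name is hypothetical.

```python
# Sketch: pull raw text out of regular (non-handwritten) PDFs with PyPDF2.
from PyPDF2 import PdfReader

def pdf_to_text(pdf_paths: list[str]) -> str:
    text = ""
    for path in pdf_paths:
        for page in PdfReader(path).pages:
            text += page.extract_text() or ""     # some pages may have no extractable text
    return text

# store, chain = build_video_qna(pdf_to_text(["notes.pdf"]))  # reuse the RAG sketch above
```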
Handwritten
-> The PDF of the notes is split into separate pages with a Python library and converted into images. These are passed to the Gemini Pro Vision model for OCR, and the generated text is stored.
-> Embeddings are then generated, and with LangChain tools and prompt templates, a conversational chain is set up as a RAG pipeline around an LLM instance of Gemini Pro, which generates the response to the query and displays it on the webpage.
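A sketch of the handwritten path, assuming pdf2image for the page-to-image conversion and Gemini Pro Vision as the OCR step; the prompt is a simplified stand-in for the one we actually use.

```python
# Sketch: convert each PDF page to an image, then use Gemini Pro Vision as OCR.
import google.generativeai as genai
from pdf2image import convert_from_path

genai.configure(api_key="YOUR_API_KEY")
vision_model = genai.GenerativeModel("gemini-pro-vision")

def ocr_handwritten_pdf(pdf_path: str) -> str:
    pages = convert_from_path(pdf_path)           # one PIL image per PDF page
    extracted = []
    for img in pages:
        reply = vision_model.generate_content(
            ["Transcribe all handwritten text in this page exactly as written.", img]
        )
        extracted.append(reply.text)
    return "\n".join(extracted)                   # then embedded and indexed as above
```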
Frontend
Building the Frontend with Next.js and TypeScript
We built Insight-Ed's user interface using Next.js, a popular React framework for server-side rendering and static site generation. This choice offered several advantages:
Improved Performance: Next.js pre-renders pages on the server, resulting in faster initial load times for users. This is crucial for a smooth learning experience.
Enhanced SEO: Pre-rendered pages are easily indexed by search engines, potentially increasing organic user acquisition.
We also opted for TypeScript, a superset of JavaScript that adds static type checking. This provided several benefits:
Improved Code Maintainability: TypeScript enforces data types, catching potential errors early in development and making the code easier to understand for both us and future contributors.
Enhanced Developer Experience: IDE features like autocompletion and type inference improve development speed and code clarity.
Challenges we ran into
Frontend
While Next.js and TypeScript offered a strong foundation, we encountered some challenges during frontend development:
Managing Complex State: Insight-Ed juggles various data points like student engagement, video timestamps, and quiz generation. Managing this complex state efficiently across components required careful planning.
Designing for Different User Roles: Our platform caters to both teachers and students. We had to ensure a clear and intuitive user experience for each role, employing conditional rendering and role-based access control mechanisms.
Building Interactive Features: Features like reverse video search and the QnA bot required us to leverage JavaScript libraries and APIs to create a seamless and engaging user experience. Integrating these functionalities with the Next.js framework presented some initial learning curves.
Backend
Detection of Handwritten Text: Recognising handwritten text proved difficult, yet it was necessary for the QnA bot to work correctly. We experimented with different methods and open-source projects, and approaches based on multimodal LLMs like Gemini Pro Vision proved to be among the best. Although this is not fully supported in the API, we came up with a simple prompt that gets the job done, though it will need more work in the near future.
Deployment: Deploying the Flask server on GCP (VM instances) proved difficult, requiring multiple corrections to version numbers, resolution of conflicts between tool versions, missing packages, and overall compatibility issues. We then came across Docker and decided to containerize the whole project to eliminate most of these issues; we successfully deployed it and made it available on Docker Hub.
What we learned
Overcoming these challenges provided valuable learning experiences:
Importance of Modular Design: By breaking down our frontend into well-defined, reusable components, we simplified state management and improved maintainability.
Leveraging the Community: The vast developer communities for Next.js and TypeScript offered a wealth of resources, tutorials, and best practices to help us navigate complex functionalities.
Prioritizing User Experience: Throughout development, we constantly evaluated features from the perspective of both teachers and students. This focus ensured an intuitive and user-friendly interface.
OCR (Handwritten PDFs): Delving into OCR and studying the techniques that best fit our use case was a fruitful experience that sharpened our research skills, which will aid us in many ways in the future.
Deployment: We learned about Docker and how containerizing our product makes it easy and reliable to manage, and lets us apply DevOps principles to streamline our production environment. We also learned about serverless architecture, scalability issues, and other cloud-related topics that will prove immensely useful in the future.
What's next for Insight-Ed
We envision Insight-Ed evolving into a comprehensive online learning ecosystem with features like:
Real-time Collaboration Tools: Allowing teachers and students to interact and collaborate during live lectures.
Gamification Elements: Introducing badges, points, and leaderboards to enhance student motivation and engagement.
Advanced Analytics Dashboard: Providing teachers with in-depth insights into student progress, engagement patterns, and areas requiring additional focus.
Fully-Fledged Tutor Bot: Lets the student get clarifications and explanations on the topics they have studied from an AI chatbot designed to act as a substitute teacher, with broad knowledge of the student's subjects.
We believe Insight-Ed has the potential to revolutionize online learning by leveraging AI and user-centric design to bridge the gap between teachers and students. We are excited to continue developing and iterating on the platform based on user feedback and future technological advancements.
Built With
- chromadb
- cloudrun
- docker
- faiss
- flask
- gcp
- gemini
- huggingface
- langchain
- mtcnn
- next.js
- postgresql
- python
- tailwind
- typescript
- vercel
- whisper