Marites Profiler - Better Personalisation

Gallery images: cover photo; generalised news; profiler page; personalised view; topics extracted; what the data looks like in TigerGraph.

Inspiration

Personalisation means tailoring a business's content or offerings to an individual's preferences. It is widely reported to benefit both businesses and their customers: better conversions, improved customer loyalty, a better user experience, increased sales, and higher retention.

Personalisation is generally achieved through recommender engines that use one of two filtering approaches: collaborative filtering or content-based filtering.

Collaborative filtering focuses on collecting and analysing data on user behaviour, activities and preferences to predict what a person will like based on their similarity to other users. It makes decisions based on what is known about the users.
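
As a rough illustration of the idea (this is not code from the project), the sketch below measures how alike two users are with cosine similarity over made-up ratings, then suggests whatever the most similar user rated that the target user has not. The users, items and numbers are all hypothetical.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating}. Illustrative data only.
ratings = {
    "ana": {"tech": 5, "sport": 1, "music": 4},
    "ben": {"tech": 4, "sport": 1, "music": 5, "travel": 5},
    "cat": {"tech": 1, "sport": 5, "travel": 2},
}


def cosine_similarity(a, b):
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings):
    """Suggest items the most similar other user rated that `user` has not."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))
    return sorted(set(ratings[nearest]) - set(ratings[user]))
```

With this data, "ana" is closest to "ben", so the only suggestion is the one item "ben" rated that "ana" has not seen.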

Content-based filtering, on the other hand, works on the principle that if you like a particular item, you will also like items similar to it. This approach is generally based on the customer's preferences and a description of each item.
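
A minimal sketch of that principle, assuming each item is described by a set of tags: items are ranked by how much their tags overlap with an item the user already likes (Jaccard similarity). The catalogue and tag names are invented for the example and are not taken from the project.

```python
# Hypothetical catalogue: item -> descriptive tags. Illustrative data only.
catalogue = {
    "article_a": {"python", "aws", "serverless"},
    "article_b": {"aws", "serverless", "graph"},
    "article_c": {"cooking", "travel"},
}


def jaccard(a, b):
    """Overlap between two tag sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_items(liked, catalogue):
    """Rank the other items by how closely their tags match the liked item."""
    scores = {
        item: jaccard(catalogue[liked], tags)
        for item, tags in catalogue.items()
        if item != liked
    }
    return sorted(scores, key=scores.get, reverse=True)
```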

These two approaches are generally combined to reap the most benefit. However, if the user doesn't spend enough time using the application, it can take a very long time to gather enough data for them to be effective for personalised marketing.

I wanted to find a way to personalise my content as soon as the user registers for the site. I was inspired by a research paper that suggested aggregated social media data could produce better recommendations, and I wanted to put that theory into practice.

What it does

It uses Amazon Comprehend's new targeted sentiment analysis feature to create a model that represents the user and their interests by analysing their posts and the posts of the people they follow.

Targeted sentiment analysis extracts keywords from the posts and maps them to their sentiment. This data is uploaded into TigerGraph so that I can make complex queries on a user's likes and dislikes to provide personalised data.
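
To make the keyword-to-sentiment mapping concrete, here is a hedged sketch that flattens one targeted sentiment result into (topic, sentiment) pairs. It assumes the result follows the shape described in the Amazon Comprehend documentation (an "Entities" list whose "Mentions" each carry a "Text" and a "MentionSentiment"). The function name and the sample are made up, and the project's actual parsing code may differ.

```python
def topics_with_sentiment(result):
    """Flatten one targeted sentiment result into (topic, sentiment) pairs."""
    pairs = []
    for entity in result.get("Entities", []):
        for mention in entity.get("Mentions", []):
            topic = mention["Text"].lower()
            sentiment = mention["MentionSentiment"]["Sentiment"]
            pairs.append((topic, sentiment))
    return pairs


# Made-up sample, trimmed to the fields used above.
sample = {
    "Entities": [
        {"Mentions": [{"Text": "Hackathons",
                       "MentionSentiment": {"Sentiment": "POSITIVE"}}]},
        {"Mentions": [{"Text": "traffic",
                       "MentionSentiment": {"Sentiment": "NEGATIVE"}}]},
    ]
}
```

Each pair could then become a topic vertex plus a sentiment-labelled edge in the graph.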

You can find a live demo of the website here. It will be live until the end of May this year.

How I built it

The majority of the technology stack is hosted on AWS. I used AWS CDK during development so that I could build and tear down my entire AWS infrastructure with a single command, which also makes my solution easier to reproduce for anyone else interested in using it.

The frontend is built using NextJS and React. It communicates with the backend through an Amazon API Gateway.

The backend consists of AWS Lambda functions that are stitched together by the API Gateway. It supports endpoints for analysing the user, retrieving user information, and retrieving website content.

The analysis endpoint uses the Twitter API to collect raw information about the user. This is uploaded to an S3 bucket that serves as the input to Amazon Comprehend. Once the files are uploaded, an Amazon Comprehend targeted sentiment analysis job is triggered and runs in the background. New users generally take around 20 to 30 minutes to process, depending on the number of followers and posts they have. Once Amazon Comprehend finishes analysing the raw data, it writes the output to another S3 bucket.
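
For readers who want to try this step, the sketch below shows one way to start such a job with boto3's start_targeted_sentiment_detection_job. The bucket URIs, role ARN and job name are placeholders, the helper names are hypothetical, and the input format and language code are assumptions. This is not the project's actual code.

```python
def build_job_request(input_uri, output_uri, role_arn, job_name):
    """Assemble the parameters for a targeted sentiment detection job."""
    return {
        # Assumption: each uploaded file holds one document.
        "InputDataConfig": {"S3Uri": input_uri, "InputFormat": "ONE_DOC_PER_FILE"},
        "OutputDataConfig": {"S3Uri": output_uri},
        "DataAccessRoleArn": role_arn,
        "LanguageCode": "en",  # assumption: English posts
        "JobName": job_name,
    }


def start_analysis(request):
    """Submit the asynchronous job and return its id (needs AWS credentials)."""
    import boto3  # imported here so build_job_request works without boto3 installed

    client = boto3.client("comprehend")
    response = client.start_targeted_sentiment_detection_job(**request)
    return response["JobId"]


# Placeholder values for illustration only.
request = build_job_request(
    "s3://example-input-bucket/raw/",
    "s3://example-output-bucket/results/",
    "arn:aws:iam::123456789012:role/example-comprehend-role",
    "profile-example-user",
)
```

The job runs asynchronously, so the caller gets a job id back straight away while the analysis carries on in the background.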

Each S3 bucket has an internal Lambda function that is triggered whenever a file is uploaded to it. These internal Lambda functions are responsible for creating vertices and edges in TigerGraph.
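
A minimal sketch of what such a trigger might look like, assuming the standard S3 notification event shape. The handler only works out which objects were uploaded. The TigerGraph write is left as a comment because the project's loading code is not shown here, and the function names are hypothetical.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Return (bucket, key) for every object named in an S3 notification event."""
    return [
        # Object keys arrive URL-encoded in S3 events, so decode them first.
        (record["s3"]["bucket"]["name"], unquote_plus(record["s3"]["object"]["key"]))
        for record in event.get("Records", [])
    ]


def handler(event, context):
    """Lambda entry point invoked by the bucket's upload notification."""
    objects = uploaded_objects(event)
    for bucket, key in objects:
        # Hypothetical step: read the file and upsert vertices and edges into TigerGraph.
        print(f"would process s3://{bucket}/{key}")
    return {"processed": len(objects)}
```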

TigerGraph stores a large pool of data about users, the people they follow, their posts, the topics related to those posts, and each post's sentiment towards those topics. I retrieve a user's interests by querying only the topics that have positive sentiment.
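
The traversal can be pictured with a small in-memory stand-in for the graph. In the real project this would be a TigerGraph query. The dictionaries below only mimic the user, post and topic relationships, and every name in them is invented.

```python
# Hypothetical stand-ins for graph edges. Illustrative data only.
follows = {"alice": ["bob"]}
posts = {"alice": ["p1"], "bob": ["p2", "p3"]}
mentions = {
    "p1": [("graph databases", "POSITIVE")],
    "p2": [("football", "NEGATIVE"), ("hackathons", "POSITIVE")],
    "p3": [("graph databases", "POSITIVE")],
}


def interests(user):
    """Topics with positive sentiment in posts by `user` or the people they follow."""
    authors = [user] + follows.get(user, [])
    found = []
    for author in authors:
        for post in posts.get(author, []):
            for topic, sentiment in mentions.get(post, []):
                # Keep each positive topic once, in the order first seen.
                if sentiment == "POSITIVE" and topic not in found:
                    found.append(topic)
    return found
```

Here "football" is dropped because its only mention is negative, and "graph databases" appears once even though two posts mention it.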

Whenever the user "logs in", the frontend first queries TigerGraph for information and only performs the analysis if the user has no followers (we haven't analysed them yet, but we have stored information about them from a previous analysis) or the user doesn't exist (we haven't analysed them at all).
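
That decision boils down to a small predicate. The sketch below assumes the lookup returns None for an unknown user and otherwise a dictionary with a "followers" list. Both the shape and the function name are hypothetical.

```python
def needs_analysis(profile):
    """Decide whether to run the analysis for a user looked up in the graph."""
    if profile is None:
        # The user doesn't exist: they have never been analysed at all.
        return True
    # The user exists only as a by-product of someone else's analysis
    # when no followers are stored for them.
    return len(profile.get("followers", [])) == 0
```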

The interests are then passed into the "news" / content endpoint to retrieve news articles associated with the query. The content endpoint currently aggregates data from News API and Newscatcher API.
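
One possible way to aggregate results from two providers is to concatenate them and drop repeats by URL, as sketched below. Whether the real endpoint de-duplicates is an assumption on my part. The "url" field and the sample articles are hypothetical, and no real API is called.

```python
def merge_articles(*sources):
    """Combine article lists, keeping the first article seen for each URL."""
    seen = set()
    merged = []
    for articles in sources:
        for article in articles:
            url = article["url"]
            if url not in seen:
                seen.add(url)
                merged.append(article)
    return merged


# Made-up results standing in for the two providers' responses.
from_first = [{"url": "https://example.com/a", "title": "A"}]
from_second = [
    {"url": "https://example.com/a", "title": "A (duplicate)"},
    {"url": "https://example.com/b", "title": "B"},
]
```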

The Lambda functions are written in Python. The AWS CDK stack is built using TypeScript and Node.js.

Full details about the architecture, its components and the logical flow can be found here.

Challenges I ran into

The most difficult part of this project was figuring out what sort of problem I could, and wanted to, solve without knowing anything about TigerGraph or AI at the beginning.

Of the four weeks I spent developing this project, the first one and a half went into reading books about machine learning and AI, scanning through documentation, and watching videos about TigerGraph to identify what sort of problem I could solve.

Coincidentally, I got into a conversation at work about sales and marketing techniques such as cross-selling and personalisation. I got curious about how they worked and started looking into it. That's when I came across recommender engines, the research paper about aggregated social media data, and the rest.

Since I'm a one-man team, the second difficulty was juggling my full-time job with finding time to work on this hackathon. I didn't want to join a team because I didn't want to feel terrible if I ended up skipping the hackathon altogether due to work commitments.

The last challenge I encountered was using the TigerGraph SDK in Lambda functions. I encountered A LOT of errors that weren't caused by my implementation. The library simply didn't work at times, and I had to find workarounds by reading through the SDK's source code and figuring out where the problems were. It felt as if this was the first time the TigerGraph Python SDK had been used in a Lambda function... but I didn't mind. I sent a lot of messages on the support channel and hope they help the TG team improve their software in the future.

Accomplishments that I'm proud of

I'm honestly just proud that I finished this project in time. As someone who didn't know anything about AI at the beginning, I feel proud of myself for building a full-stack application with an AI component in two and a half weeks. It really shows me how far I've come since I started competing in hackathons during my second year of university.

I'm also proud of how useful my product is! The application uncovered things that I didn't even know about myself.

What I learned

Most of the things I used in this hackathon were completely new to me. I was familiar with the concepts of Lambda functions, API gateways, CDK and so on, but I didn't know how to use them in practice or set them up. Now I do.

It also made me realise that there are databases beyond relational and NoSQL databases. This may sound cheesy, but TigerGraph changed my perspective: instead of seeing a database as a collection of "stuff", I now look at the data as a whole.

I also learnt a lot about the basics of machine learning and AI that I wish to continue studying in the future.

What's next for Marites Profiler - Better Personalisation

I'm going to bring this creation to my workplace and see if we can improve our personalisation with it.

I also plan on applying it to other use cases like competition analysis after finding interesting things about companies using my tool.

Built With

amazon-web-services
cdk
comprehend
heroku
jupyter
lambda
newsapi
newscatcher
nextjs
node.js
python
react
s3
tigergraph
typescript

Updates

Chris Rabe posted an update — Apr 26, 2022 03:10 AM EDT

UPDATE: Due to a usage limit warning from Vercel, I may take down the demo site early to avoid account suspension. I have asked the judges to allow me to make a small fix in my frontend code, and asked Vercel support to give me 48 hours to resolve the issue. If I don't hear from the judges within 24 hours, or I'm not allowed to make the change, the site will be taken down tomorrow. Sorry for the inconvenience.

Chris Rabe started this project — Apr 19, 2022 07:09 AM EDT
