Question
We are looking for someone to help us figure out the high-level strategy for building a system that works.
Suppose you have tons of groups (like Facebook groups) on different topics, which sometimes overlap and sometimes don't.
Our goal is to achieve 2 things:
- Based on the content and links a group shares (and potentially even its conversations), identify the topics relevant to that group, e.g. machine learning, React, etc.
- Based on the combined content shared by all these groups, suggest content from one group to another, as long as it falls within that group's interests (or close to them).
If you were to tackle this with data science, what would be the best approach in each case?
We're just looking for an overall picture; we'll hire experts for detailed guidance later.
Answer
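1. To extract relevant topics or keywords from text (posts, tags, conversations), a simple NER (Named Entity Recognition) system can be used. It produces a list of the relevant topics in the text. Keep in mind that off-the-shelf NER models mostly tag people, organizations, and places, so topics such as "machine learning" will likely need a custom-trained model.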
2. Once you have this list, it can be fed into another DL algorithm: a recommendation system.
The challenge with a recommendation system is that it requires a lot of existing training data and substantial compute resources to train. (It is only viable if you already have a good amount of tagged data and no constraints on using larger resources.)
3. Another simpler, faster, but less accurate way is to use a clustering model (either classical ML or DL, depending on the data). This approach creates multiple clusters and assigns each element in the NER list to one of them.
Then, using a distance measure within the cluster, you can find the topics most closely related to a given element in the NER list.
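To make steps 1 and 3 a bit more concrete, here is a minimal sketch in plain Python. It is not a NER model: it swaps in TF-IDF keyword weighting as a rough stand-in for topic extraction, and it skips the clustering step and goes straight to the distance part, using cosine similarity between groups. The group names, the sample texts, and all function names are made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy data: group name -> text of everything shared in that group.
GROUPS = {
    "ml_club": "neural network training gradient descent neural model data",
    "data_sci": "model evaluation data pipeline gradient boosting training",
    "frontend": "react component hooks state css react router",
    "web_dev": "css layout react component javascript bundler",
}


def tfidf_vectors(docs):
    """Return {group: {term: weight}} using term frequency times a smoothed idf."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokens)
    # Number of groups in which each term appears at least once.
    doc_freq = Counter(term for words in tokens.values() for term in set(words))
    vectors = {}
    for name, words in tokens.items():
        counts = Counter(words)
        vectors[name] = {
            term: (count / len(words))
            * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in counts.items()
        }
    return vectors


def top_topics(vector, k=3):
    """Highest-weighted terms of one group (ties broken alphabetically)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def nearest_groups(name, vectors, k=1):
    """Groups most similar to `name`; their content is a recommendation candidate."""
    scores = [
        (cosine(vectors[name], vec), other)
        for other, vec in vectors.items()
        if other != name
    ]
    return [other for _, other in sorted(scores, key=lambda s: (-s[0], s[1]))[:k]]


VECTORS = tfidf_vectors(GROUPS)
print(top_topics(VECTORS["frontend"], 1))   # ['react']
print(nearest_groups("ml_club", VECTORS))   # ['data_sci']
```

On real data you would replace the whitespace tokenizer with proper text cleaning, and with many groups you would probably cluster first and only compare groups within the same cluster, as described in step 3.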
If you'd like more details on the approach, let me know.
Regards,
Deepesh