We are looking for someone to help us figure out the high-level strategy for building a system that works.
Suppose you have a large number of groups (like Facebook groups) on different topics, some of which overlap and some of which don't.
Our goal is to achieve 2 things:
- Based on the content and links shared by a group (and potentially even its conversations), identify the topics relevant to that group. Ex: machine learning, react, etc.
- Based on the collective list of content shared by these groups, suggest content from one group to another, as long as it's within the receiving group's interests (or in the close neighborhood).
If you were to tackle this with data science, which would be the best approaches in each case?
Just looking for an overall picture for now; we'll hire experts' time for detailed guidance.
1) Look through the posts and tally which "side-topics" show up most often within the last week, then rank them. The top two side-topics for each group are the most promising ones to suggest to that group in the future. You'll basically be making a 'word cloud', like this: https://www.jasondavies.com/wordcloud/
2) To test your new hypothesis, occasionally start making your own posts on the groups with links to articles about the most promising side-topics you've identified. Either make brand new posts, or cross-post from other groups. See how popular the posts are.
3) If the posts aren't popular, move to the next most popular side-topic and run some tests with that.
4) Repeat steps 1 - 3 (each time you do step 1, use the most recent week of posts)
If you'd like more detailed advice on how to do this, and on testing its effectiveness with your specific groups, let me know.
best,
Lee
1. To extract relevant topics or keywords from text (posts, tags, conversations), a simple NER (Named Entity Recognition) system can be used. It produces a list of the relevant topics mentioned in the text.
2. Once you have that list, it can be fed into another DL algorithm: a recommendation system.
The challenge with a recommendation system is that it needs a lot of training data and substantial compute to train. (This is only viable if you already have a good amount of tagged data and no constraints on using larger resources.)
3. A simpler and faster, but less accurate, option is a clustering model (classical ML or DL, depending on the data). This approach creates multiple clusters and tags each element in the NER list with one of them.
Then, using a distance measure within each cluster, you can find the topics most closely related to a given element in the NER list.
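As a rough illustration of the distance step in point 3, here is a small Python sketch. It is only a sketch under my own assumptions: the counts are made up, the group and function names are hypothetical, and I use cosine similarity on topic-by-group counts as the distance measure. It also skips the clustering step itself and ranks all topics directly; with a real clustering model you would only compare topics inside the same cluster.

```python
import math

# Hypothetical data: how often each extracted topic appeared in each group.
topic_counts = {
    "machine learning": {"ml_group": 9, "data_group": 6},
    "deep learning":    {"ml_group": 7, "data_group": 3},
    "react":            {"frontend_group": 8, "js_group": 5},
    "typescript":       {"frontend_group": 4, "js_group": 6, "data_group": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_topics(topic, counts, k=2):
    """Return the k topics whose group usage is most similar to `topic`."""
    target = counts[topic]
    scored = [(cosine_similarity(target, vec), other)
              for other, vec in counts.items() if other != topic]
    scored.sort(reverse=True)  # highest similarity first
    return [other for _, other in scored[:k]]

print(nearest_topics("machine learning", topic_counts, k=1))
print(nearest_topics("react", topic_counts, k=1))
```

In this toy data "deep learning" comes out closest to "machine learning" and "typescript" closest to "react", so content on the neighboring topic would be a candidate to suggest to groups that already discuss the first one.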
If you'd like more details on the approach let me know.
Regards,
Deepesh