How can I aggregate data from online sources about a specific topic?
Answers
There are many ways to do this. Do you need this data for yourself, or are you planning to build a product around it?
From what I see, you can use the Twitter API and the Facebook Graph API (are you comfortable programming?). Most students are active on social media, so you will find lots of data. The Facebook Graph API gives you the number of likes and comments on your competitors' posts, so you can analyze everything your competitors publish. With the Twitter API you can get the tweets that use certain hashtags or mentions. If you are not into coding but still want social media information, take a look at tools like IBM Watson Analytics ($30 for personal use): it connects natively to the Twitter API, and you don't have to be a programmer at all. It is intuitive and easy to learn. Analytics Canvas connects to the Facebook Graph API (it's free for a 30-day trial).
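If you are comfortable programming, a hashtag search can look roughly like the sketch below. It assumes the Twitter API v2 "recent search" endpoint (`https://api.twitter.com/2/tweets/search/recent`, which only covers roughly the last seven days) and a bearer token from a developer account; access tiers and pricing have changed several times, so check the current documentation first. The function names and the chosen `tweet.fields` are illustrative, not part of any official client.

```python
import json
import urllib.parse
import urllib.request

# Twitter API v2 "recent search" endpoint (roughly the last 7 days of tweets).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query: str, bearer_token: str, max_results: int = 10) -> urllib.request.Request:
    """Build (but do not send) a recent-search request for a hashtag/mention query."""
    params = urllib.parse.urlencode({
        "query": query,
        "max_results": max_results,  # the endpoint accepts 10-100 per page
        "tweet.fields": "created_at,public_metrics",
    })
    return urllib.request.Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


def extract_tweets(payload: dict) -> list[dict]:
    """Flatten the JSON response into simple rows; "data" is absent when nothing matches."""
    rows = []
    for tweet in payload.get("data", []):
        metrics = tweet.get("public_metrics", {})
        rows.append({
            "id": tweet["id"],
            "text": tweet["text"],
            "likes": metrics.get("like_count", 0),
        })
    return rows


def search(query: str, bearer_token: str) -> list[dict]:
    """Send the request and parse the reply (needs a valid token and network access)."""
    with urllib.request.urlopen(build_search_request(query, bearer_token)) as response:
        return extract_tweets(json.load(response))
```

For example, `search("#halifax -is:retweet", token)` would return one row per matching original tweet; `-is:retweet` is a standard search operator that drops retweets.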
Unfortunately, you won't be able to collect personal information (age, income, gender, etc.) from social media at large scale, because doing so would violate privacy laws and the platforms' own terms. You can use census data instead.
Google Sheets is a very handy tool if you plan to use this information for personal research. You can set up a spreadsheet and add some JavaScript (Google Apps Script) to make it collect information from competitors' blogs and from sites like Reddit.
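Many blogs publish an RSS feed, which is easier and more stable to read than the page HTML. Google Sheets has a built-in `IMPORTFEED` function for this; if you would rather script it outside Sheets, a minimal Python sketch could look like the following. It assumes an RSS 2.0 feed (`<rss><channel><item>`); Atom feeds use different element names and namespaces, and the feed URL is whatever the blog you are tracking exposes.

```python
import urllib.request
import xml.etree.ElementTree as ET


def parse_rss(document) -> list[dict]:
    """Return title/link/date rows from an RSS 2.0 document (str or bytes)."""
    root = ET.fromstring(document)
    rows = []
    for item in root.iter("item"):  # only <item> entries, not the channel's own <title>
        rows.append({
            "title": item.findtext("title", default=""),
            "link": item.findtext("link", default=""),
            "published": item.findtext("pubDate", default=""),  # missing dates become ""
        })
    return rows


def fetch_feed(feed_url: str) -> list[dict]:
    """Download a feed and parse it (needs network access)."""
    with urllib.request.urlopen(feed_url, timeout=30) as response:
        return parse_rss(response.read())  # bytes, so the XML declaration's encoding is honoured
```

Running `fetch_feed` on a schedule (for example, once a day per competitor blog) gives you a growing list of new posts without touching the page layout at all.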
Finally, you can try web scraping (it's not the best approach, but it can speed up the process). A tool like OutWit Hub collects information from websites (such as reviews) based on a structure you provide (the HTML tags to select). If you automate it (paid version), you can collect thousands of reviews in a day. It is very easy to use.
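OutWit Hub does this through a point-and-click interface. To show what "collect based on the HTML tags you select" means under the hood, here is a rough equivalent using only Python's standard `html.parser`. The default `div` tag and `review` class are hypothetical placeholders; you would replace them with whatever markup the target site actually uses, after checking that its terms allow scraping.

```python
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Collect the text of every <tag class="css_class"> element, including nested markup."""

    def __init__(self, tag: str = "div", css_class: str = "review"):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.depth = 0  # > 0 while we are inside a matching element
        self.current: list[str] = []
        self.reviews: list[str] = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == self.tag:  # same tag nested inside a review: count it so we close correctly
                self.depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == self.tag and self.css_class in classes:
            self.depth = 1
            self.current = []

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                # Collapse runs of whitespace so each review becomes one clean line.
                self.reviews.append(" ".join("".join(self.current).split()))

    def handle_data(self, data):
        if self.depth:
            self.current.append(data)


def extract_reviews(html: str, tag: str = "div", css_class: str = "review") -> list[str]:
    parser = ReviewExtractor(tag, css_class)
    parser.feed(html)
    parser.close()
    return parser.reviews
```

Matching on a class name is the same idea as "selecting HTML tags" in a visual tool; if the site changes its markup, you only update the `tag`/`css_class` arguments.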
Note: not all websites allow this method; review their policies to make sure you are not violating their terms of service. Reviews belong to the website where they were published.
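A quick programmatic first check is the site's `robots.txt`, which lists the paths automated clients are asked to avoid. It does not replace reading the terms of service, but Python's standard `urllib.robotparser` can evaluate it for you. In this sketch the user-agent string and URLs are made up for illustration.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_lines, user_agent: str, url: str) -> bool:
    """Evaluate already-downloaded robots.txt lines for a single URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


def allowed_live(site_root: str, user_agent: str, url: str) -> bool:
    """Download <site_root>/robots.txt and evaluate it (needs network access)."""
    parser = RobotFileParser(site_root.rstrip("/") + "/robots.txt")
    parser.read()
    return parser.can_fetch(user_agent, url)
```

For example, with the rules `User-agent: *` and `Disallow: /reviews/`, a review page is off-limits to every crawler while `/about` is still allowed.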
If you REALLY need personal data (such as how much students earn and how much they spend), just print 100 questionnaires and go to the Student Union Building at Dalhousie University. Most students will share personal data in exchange for a Tim Hortons gift card that gets them a free coffee. It is probably the least technical and fastest way to get all the data you need.
Hope this helps.
This one is tricky because all social media platforms have terms and site rules around collecting personal data. Facebook in particular is touchy about this, so I would not recommend it. Twitter has an API (https://developer.twitter.com/en/docs/tweets/search/overview) that will give you raw data for a small cost, so you can obtain your hashtag information that way. If you are serious about this, I would identify the top 50-100 businesses in the area that employ students and whose pay scale is in the range you're looking for. Then use those companies' social media pages to identify employees. You are now zeroed in on your audience; you just need to fine-tune it to filter for the students who eat out, perhaps by cross-referencing Yelp/Google reviews with the employee data you obtained.
I would like to re-emphasize that this sort of data is tricky to obtain, as many of the sites containing it have terms and conditions that discourage collecting personal data.
Are you hoping for a dynamic system that identifies new data sources as they become available? Or will you identify existing sites/pages that give you the data you want and focus on what is generated there? Either way, you can then obtain the data manually or programmatically. Manually, you visit those pages on a regular basis, copy and paste the data, and then format it to be usable. This is not a scalable solution, but it can work if the data sources are limited and you have the time. Programmatically, you can build small scripts that make the site requests for you. If you don't program, there are many freelancers/developers who have this skill set and are reasonably priced. There are also third-party platforms that are well priced for small-scale web crawling.
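If you (or a freelancer) take the programmatic route, the core of such a script is usually a loop that fetches each page you identified, pauses between requests, and skips failures instead of crashing. This sketch assumes plain HTTP GETs with `urllib`; the five-second delay and the user-agent string are arbitrary illustrative choices, not requirements of any site. The `fetch` parameter lets you swap in another HTTP client, or a stub for testing.

```python
import time
import urllib.request
from typing import Callable, Iterable

# Hypothetical identifier; use something that honestly identifies you and your script.
USER_AGENT = "topic-research-script/0.1"


def fetch_page(url: str) -> str:
    """Plain GET with an explicit User-Agent and a timeout."""
    request = urllib.request.Request(url, headers={"User-Agent": USER_AGENT})
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def collect_pages(
    urls: Iterable[str],
    fetch: Callable[[str], str] = fetch_page,
    delay_seconds: float = 5.0,
) -> dict[str, str]:
    """Fetch each URL once, pausing between requests; failed URLs are reported and skipped."""
    pages: dict[str, str] = {}
    for index, url in enumerate(urls):
        if index:
            time.sleep(delay_seconds)  # be polite: never hit the site back-to-back
        try:
            pages[url] = fetch(url)
        except OSError as error:  # URLError, HTTPError and socket timeouts are OSError subclasses
            print(f"skipping {url}: {error}")
    return pages
```

Run on a schedule (cron, Task Scheduler, or a small cloud function), this replaces the manual visit-copy-paste cycle; the raw pages can then go through a parser such as an HTML or RSS extractor.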
To sum up: identify where online you'll get your data from (the income and personal data will be the trickiest), then decide how you want to obtain it (manually or programmatically), and finally format it into a usable dataset (CSV and JSON are simple and universal).
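For the last step, Python's standard `csv` and `json` modules are enough to turn a list of collected records into both formats. This sketch assumes each record is a flat dictionary (for example, one tweet or one review per row); the CSV columns are the union of all keys, so records with missing fields simply get empty cells.

```python
import csv
import io
import json
from typing import IO


def write_csv(rows: list[dict], out: IO[str]) -> None:
    """Write flat dict records as CSV; columns are the union of keys in first-seen order."""
    if not rows:
        return
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    writer = csv.DictWriter(out, fieldnames=fieldnames)  # missing keys are written as ""
    writer.writeheader()
    writer.writerows(rows)


def write_json(rows: list[dict], out: IO[str]) -> None:
    """Write the same records as a pretty-printed JSON array, keeping non-ASCII text readable."""
    json.dump(rows, out, indent=2, ensure_ascii=False)


def to_csv_string(rows: list[dict]) -> str:
    """Convenience wrapper that returns the CSV text instead of writing a file."""
    buffer = io.StringIO()
    write_csv(rows, buffer)
    return buffer.getvalue()


def to_json_string(rows: list[dict]) -> str:
    buffer = io.StringIO()
    write_json(rows, buffer)
    return buffer.getvalue()
```

When writing to disk, open the CSV file with `newline=""`, as the `csv` module documentation recommends, e.g. `open("dataset.csv", "w", newline="", encoding="utf-8")`, and pass the file object to `write_csv`.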
I have the solution. Contact me.
Related Questions
-
How can big data evolve if our overall ability to convert data into truly valuable information that can then drive effective decision-making is so limited?
I would disagree with the premise that "big data" isn't being converted into valuable information. Modern advertising is driven by exactly this ability. The problem lies not with the information, or lack thereof, but with the decision makers. Lack of vision at both the strategic and tactical levels of decision-making and fear of change, among other things, are at the root of the problem.
-
How can I know when to go with my gut or when to trust the data when making a business decision?
In my experience coaching hundreds of people, anytime you go against your gut (aka intuition) you will lose. Data is important to consider, but your gut often knows the path.
-
Affordable analytics platform for product, marketing, sales, UX, tech support and employee satisfaction?
You mention that you can code and maintain things yourself, and it sounds like your data may be dispersed across multiple sources. You also want to minimise license costs. In my business we use R a lot for both analysis and reporting. It is free (open source), but you will have to build programs. It will read any format and write output to any destination you want. So you could start off with a mix of R, Excel and PDF, just to get things going. However, it could make sense to build a database early on, so you familiarise yourself with data warehouse thinking, since you want to expand with marketing automation. At that point you will need a database for monitoring responses, sales, etc., so it is important to build the right foundation as early as possible. If you don't want to code but want point-and-click, there are boatloads of software for that, but probably with higher license costs (unless you can find open source for that). So if I were you, I would start off with something smaller. After all, you want to focus on the value you create for the business and your colleagues, and the time you save, rather than on a smooth IT infrastructure. I have lots of experience building reporting and analysis in the areas you mention, so if you want to discuss further, feel free to set up a call. Good luck with your development. Best regards, Kenneth Wolstrup
-
If you were to build a freelance marketplace for data scientists and data analysts, what kind of companies and projects would you target?
It's unlikely that companies would outsource such a critical component, and it would be near impossible to build trust around third parties accessing their data, especially via an intermediary service.
-
What is the viability of big data in education? Is learning analytics a potentially viable market considering current restraints on public education?
Qualifications on the answer: I created the first website ever to be commercially licensed by a ministry of education for in-school use (Brainium, 1996), and I was a VC associate at a $500,000,000 firm called Knowledge Universe, focused on everything education-related that intersected with technology. Short answer: yes and no. Longer answer: Learning analytics is one of those "lightning in a bottle" industries. Everyone knows it's going to happen; whoever makes it happen at scale first will have a huge advantage, and even more niche plays in this space will still create massive value for shareholders and significantly transform education as we know it. That's the good news. The problem is that funding education startups that depend on "permission" being granted by existing institutions and their employees is very much hit or miss. It's getting easier to raise a $500-800k seed round for a good idea with a good team, but getting the next, bigger round is very challenging: most of these seed-funded companies are not growing fast enough to convince institutional investors that the idea can become very big in a relatively short time. So if you're intent on pursuing this, I'd focus entirely on how you can get growth and adoption while requiring little to no institutional buy-in. That sounds almost impossible on the surface of the area you're exploring, but I believe it isn't. I'm very passionate about this space and want something like what you're describing to succeed, so I would be happy to do a quick call to hear where you're at and see if I can provide some actionable ideas on how to reduce friction and improve adoption of your analytics product.
Copyright © 2025 Startups.com. All rights reserved.