How can I aggregate data from online sources about a specific topic?
Answers
There are so many ways to do it... Do you need this data for yourself, or are you planning to make a product around it?
From what I see, you can use the Twitter API and the Facebook Graph API (are you comfortable programming?). Most students are active on social media, so you will find lots of data. The Facebook Graph API will give you the number of likes and comments on all of your competitors' posts, so you can analyze them. Using the Twitter API, you can get all the tweets that use certain hashtags or mentions. If you are not into coding but still want social media information, take a look at tools like IBM Watson Analytics ($30 for personal use); it connects natively to the Twitter API, and you don't have to be a programmer at all. It is intuitive and easy to learn. Analytics Canvas connects to the Facebook Graph API (it has a free 30-day trial).
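Once posts have been collected (through an API or an export from one of these tools), counting hashtags and mentions is simple. This is only a minimal sketch: the sample posts are made up, and `count_tags` and `posts_with_hashtag` are hypothetical helper names, not part of any Twitter or Facebook library.

```python
import re
from collections import Counter

# Made-up sample posts; in practice these would come from an API export or a tool.
SAMPLE_POSTS = [
    "Cheap eats near campus! #StudentDeals #Halifax",
    "Thanks @CampusCafe for the free coffee #studentdeals",
    "Exam week again...",
]

HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def count_tags(posts):
    """Count hashtags and mentions across posts, ignoring case."""
    hashtags = Counter()
    mentions = Counter()
    for text in posts:
        hashtags.update(tag.lower() for tag in HASHTAG_RE.findall(text))
        mentions.update(name.lower() for name in MENTION_RE.findall(text))
    return hashtags, mentions


def posts_with_hashtag(posts, tag):
    """Return the posts containing the given hashtag (given without the '#')."""
    wanted = tag.lower()
    return [
        p for p in posts
        if wanted in (t.lower() for t in HASHTAG_RE.findall(p))
    ]
```

For example, `count_tags(SAMPLE_POSTS)` returns two counters, and `posts_with_hashtag(SAMPLE_POSTS, "StudentDeals")` returns the two posts that use that tag in any capitalization.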
Unfortunately, you will not be able to collect personal information (age, income, gender, etc.) from social media at a large scale, because doing so violates privacy laws. You can use census data instead.
Google Sheets is a very handy tool if you are planning to use this information for personal research. You can set up a spreadsheet and add some JavaScript (Google Apps Script) to make it collect information from competitors' blogs and from sites like Reddit.
Finally, you can try web scraping (it's not the best, but it can speed up the process). A tool like OutWit Hub will collect information from websites (such as website reviews) based on the structure you provide (you select the HTML tags). You can collect thousands of reviews in one day if you automate it (paid version). It is very easy to use.
Note: not all websites are open to this method; review their policies to make sure you are not violating their terms of service. Reviews belong to the website where they were published.
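If you would rather script it than use a tool, the same idea of pulling out content based on HTML tags can be sketched with Python's standard library. This is a rough illustration, not how OutWit Hub works internally: the sample markup and the `review` class name are made up, and a real page will have its own structure.

```python
from html.parser import HTMLParser

# Made-up sample markup; a real page would use its own tags and class names.
SAMPLE_HTML = """
<div class="review"><p>Great food, fair prices.</p></div>
<div class="ad"><p>Buy now!</p></div>
<div class="review"><p>Slow service on weekends.</p></div>
"""


class ReviewExtractor(HTMLParser):
    """Collect the text inside every <div class="review"> element."""

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._depth = 0    # greater than 0 while inside a review <div>
        self._chunks = []  # text pieces of the review being read

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self._depth:
            self._depth += 1  # nested <div> inside a review
        elif "review" in (dict(attrs).get("class") or "").split():
            self._depth = 1   # entering a review block
            self._chunks = []

    def handle_endtag(self, tag):
        if tag == "div" and self._depth:
            self._depth -= 1
            if self._depth == 0:  # leaving the review block
                self.reviews.append(" ".join(self._chunks))

    def handle_data(self, data):
        if self._depth and data.strip():
            self._chunks.append(data.strip())


def extract_reviews(html):
    """Return the review texts found in the given HTML string."""
    parser = ReviewExtractor()
    parser.feed(html)
    parser.close()
    return parser.reviews
```

Calling `extract_reviews(SAMPLE_HTML)` keeps the two review texts and skips the ad block.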
If you REALLY need personal data (like how much they earn and how much they spend, etc.), just print out 100 questionnaires and go to the Student Union Building of Dalhousie University. Most students will share personal data in exchange for a Tim Hortons gift card that gets them a free coffee. It is probably the least technical and fastest way to get all the data you need.
Hope this helps.
This one is tricky because all social media platforms have terms and site rules around collecting personal data. Facebook in particular is touchy about this, so I would not recommend it. Twitter has an API (https://developer.twitter.com/en/docs/tweets/search/overview) that will give you raw data for a small cost, so your hashtag information can be obtained that way. If you are serious about this, I would identify the top 50-100 businesses in the area that employ students and keep their pay scale in the range you're looking for. Then use those companies' social media pages to identify employees. You are now zeroed in on your audience; you just need to fine-tune it to filter for those students who eat out, perhaps by cross-referencing Yelp/Google reviews with the employee data obtained.
I would like to re-emphasize here that this sort of data is tricky to obtain as many of the sites containing this data have terms and conditions which discourage collecting personal data.
Are you hoping for a dynamic system that identifies new data sources as they become available? Or will you identify existing sites/pages that give you the data you want and focus on what is generated there? From there, you can obtain the data manually or programmatically. Manually, you will actually be visiting those pages on a regular basis, copying and pasting the data there, and then formatting the data to be usable. This is not a scalable solution, but it can work if the data sources are limited and you've got the time. Programmatically, you can build small scripts that will make the site requests for you. If you don't program, there are many freelancers/developers who have this skill set and are reasonably priced. There are also third-party platforms that are well priced for small-scale web crawling.
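As a rough idea of what such a small script could look like, here is a sketch using only Python's standard library. It filters a list of URLs against a site's robots.txt rules before fetching anything. The robots.txt content, the `my-research-bot` user agent, and the function names are placeholders for illustration, and passing a robots.txt check does not replace reading the site's terms of service.

```python
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for a hypothetical site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def allowed_urls(robots_txt, urls, user_agent="my-research-bot"):
    """Return the URLs that the given robots.txt rules allow this agent to fetch."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return [u for u in urls if rules.can_fetch(user_agent, u)]


def fetch(url, user_agent="my-research-bot"):
    """Download one page and return its body as text (makes a network request)."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

In practice you would download the real robots.txt from the site, pass the pages you care about through `allowed_urls`, and call `fetch` on what remains, with a pause between requests so you don't overload the site.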
To sum up, you need to identify where you'll get your data from online (the income and personal data will be the trickiest), then identify how you want to obtain that data (manually or programmatically), and finally format the data into a usable dataset (CSV and JSON being simple and universal).
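The last step, formatting into CSV or JSON, could look something like the sketch below. The records and field names are invented placeholders; your own columns will depend on what you collect.

```python
import csv
import io
import json

# Invented placeholder records; real field names depend on your sources.
RECORDS = [
    {"source": "blog", "topic": "student dining", "mentions": 12},
    {"source": "reddit", "topic": "student dining", "mentions": 30},
]


def to_csv(records):
    """Serialize a list of dicts to CSV text, using the first record's keys as the header."""
    if not records:
        return ""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()))
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize a list of dicts to indented JSON text."""
    return json.dumps(records, indent=2)
```

Both functions return text, so you can write the result to a file or paste it into a spreadsheet or another tool.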
I have the solution. Contact me
Related Questions
-
How can I assess the value of my data assets?
I have been helping clients design, build, and deploy data platforms for many years, initially on premise, but now in the cloud. This is a really good question. My immediate thought is, yes, there is value in this data. The more difficult questions to answer are where and how. The obvious answer is monetisation, i.e. what people or organisations would be willing to pay for this data. More importantly, is this my data (vs. customer data) to sell? The data may also represent your intellectual property or be your "secret sauce", so you might not necessarily want to sell it. It is then worth asking questions like "what would the impact on my organisation be if I did not have this data?" or "what would the impact on my organisation be if my competitors had this data?". Hopefully this has given you some ideas. Feel free to book in a call if you have any questions or would like to delve into the detail. - GR
-
How to implement a meritocracy using analytics?
Ugh, this is a big, hairy question for many businesses. From past consulting experience, whether it succeeds or fails will depend entirely on your specific business, its existing operational behavior, and your employees' comfort level in adopting new tools. Integrating across existing software and tools is always the problem here, and it is what kills almost every effort to improve a business's systems (sales/biz-dev optimization in your case). There are so many team and project management services online that it would be pointless to list them without knowing more about your existing systems and your business's size, stage, and employees. HOWEVER, to end on your point about 'value for the business', as opposed to the bias of attributing raw REVENUE creation to each employee: it sounds like you want a simplified way of logging events that are abstract in value but that could be reviewed, curated, and compiled at the end of, say, each week or month into metrics, tags, or categories for a manager or CEO to review for each department, employee, or manager. Quantification is what really hurts here, as every business is complex in its own specific value offer and delivery. So all I can think of to end on is: let each level of management comment on, tag, or note their own interpretation of the 'quantified or tagged' value, be it $10, $10k, a satisfied customer, a repeat buyer, a new business market identified, a marketing opportunity identified, or a social media result (hearts, likes, retweets, etc.). Hope that helps. - CW
-
What are the best traits to look for in a data scientist/data analyst?
First off, I have several people I could introduce. I'd also like to know the industry you're operating in, what the data looks like, where it comes from, and how much it needs to be cleaned up if you are putting it into a relational database, or whether the better solution would be a NoSQL document store like MongoDB, where you don't necessarily need to normalize the data. I'd also like to know whether you're a startup or a well-established company with many existing customers, and whether this is for a new initiative. Assuming you're working with a relational database, which it sounds like you are, you will want to implement something like Tableau or build out a custom dashboard using Google Charts, Highcharts, D3.js, or any number of other dashboarding/visualization solutions, which usually involves some programming/scripting in JavaScript. There are paid solutions like Tableau (which is amazingly powerful), and then there are free/open source options. I'd be happy to talk about possible ways to architect the solution and work out who you would need once I understand the variables better. If you're building a web application, then you will likely need someone who is also a full stack developer, meaning they can handle building the back end and the front end in addition to the data requirements. Many early startups choose Ruby on Rails (because there is a ton of open source code out there for it) with Twitter Bootstrap (modified), and in order to visualize the data, they will need to work with JavaScript. It makes sense to have this person act as the initial product person and derive the insights from the data. They're pretty much the only person who can do this anyway, because they're the one working on the data.
If you're in an early stage startup, I would recommend that the strongest business owner (usually the early stage startup's CEO) be directly involved with this person in communicating what value your solution brings to clients and what they pay you for, and in brainstorming potential features and reports. Once the solution becomes established and many customers start using it directly, there should be a different product person interfacing with those customers over time, gathering feature requests and bringing them back to the Data Scientist/Analyst, who spends their time working on the data.
Depending on whether the solutions are SQL, NoSQL, or hybrid, there are three types of Data Science professionals you should consider: the Data Scientist, the Data Engineer, and the Data Modeler/Analyst.
1. The Data Scientist handles experimenting with the data and is able to prove statistically significant events and make statistically valid arguments. Normally, this person would have modeling skills with Matlab, R, or perhaps SAS, and they should also have some programming/scripting skills with C++ or Python. It really depends on your whole environment and the flow of data. In my experience, Data Scientists who exclusively use SAS are sometimes extremely skilled PhD-level statisticians focused exclusively on the accuracy of the models (which is okay), but they are often not sufficiently skilled to fit within an early startup's big data environment in today's world and handle all of the responsibilities described in your question. I am not bad-mouthing SAS people, as they are often the MOST talented mathematicians and I have a great deal of respect for their minds, but if they do not have the programming skills, they become isolated within a group unless a Data Engineer helps them along. Often a SAS user trying to fit into this environment will force you to use a stack of technologies that a skilled Data Architect would not recommend. It takes programming in some object oriented language to fit into today's big data environments, and the better Data Scientists are using hybrid functional and OOP programming languages like Scala. The extremely hard-to-find Data Scientists can also work with graph databases and graph processing frameworks like Neo4j, Titan, or Apache Giraph.
2. The Data Engineer: if you're dealing with a firehose of data like Twitter and capturing it into a NoSQL architecture, this is the person who would prepare the data for the Data Scientist to analyze. They are often capable of using machine learning libraries like WEKA to transform data, or techniques like MapReduce on Hadoop.
3. The Data Modeler/Analyst is someone who can use a tool like SAS, SPSS, Matlab, or even R, and is probably a very strong advanced Excel user, but likely won't be a strong programmer, although perhaps they will have a computer science degree and some academic programming experience.
The most important thing to watch out for is someone who is too academic and has not been proven to deliver a solution in the real world. This will really screw you up if you're a startup, and could be the reason you fail. Often, the startup will run out of money due to the time it takes to deliver a complete solution or, in the startup's case, a minimum viable product. Ask for examples of their work, and specifically dig into what it is that they did for that solution.
I've tried to cover a pretty broad range of possibilities here, but it's best to talk in specifics. I'd be happy to discuss this with you in detail. To answer your question, it is perfectly reasonable for someone to handle all of the responsibilities described in your question, if you find the right type of person with the appropriate skills and a history of success. - SE
-
How do I find the Total Addressable Market for a Big Data product like Datameer?
As someone who has built TAM models for various industries, I can attest this isn't always easy. Many analyst firms (former analyst here) typically do some surveys, research public company revenue reports for the year, and then do some pixie-dust extrapolating. The hard thing about a market where startups like Datameer play is that it is a) a nascent market and b) not a "zero sum" game. This makes it difficult to a) fully understand WHICH companies actually are in the market for this flavor of BI and b) know how the space is developing in terms of white space AND disrupting the entrenched BI giants. You might want to look at the press-published versions (read: free) of analyst market share reports, and extrapolate from there. Also, back-channel gossip may give you some revenue numbers on these players, and if you assume pipeline is 3-4x revenue, you can build a model from there. Hope this helps. - MS
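The pipeline step can be put into a few lines of arithmetic. This is only a sketch of that one assumption (pipeline at 3-4x revenue), not a full TAM model; the vendor names and revenue figures are placeholders, not real numbers.

```python
# Placeholder revenue estimates in USD millions; these are not real figures.
REVENUE_ESTIMATES = {"Vendor A": 10.0, "Vendor B": 25.0}


def pipeline_range(revenue, low=3.0, high=4.0):
    """Return (low, high) pipeline estimates for a given revenue figure."""
    return revenue * low, revenue * high


def market_pipeline(estimates, low=3.0, high=4.0):
    """Sum revenue across all players and apply the pipeline multipliers."""
    total_revenue = sum(estimates.values())
    return pipeline_range(total_revenue, low, high)
```

With the placeholder figures above, the combined revenue is 35, so `market_pipeline(REVENUE_ESTIMATES)` gives a pipeline range of 105 to 140 (in USD millions).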
-
Affordable analytics platform for product, marketing, sales, UX, tech support and employee satisfaction?
You mention that you can code and maintain things yourself, and at the same time it sounds like your data may be dispersed across multiple sources. You also want to minimise license cost. In my business we use R a lot for both analysis and reporting. It is free (open source), but you will have to build programs. It will read most formats and can write output to most destinations you want. So you could start off with a mix of R, Excel, and PDF, just to get things going. However, it could make sense for you to build a database early so you familiarise yourself with data warehouse thinking, since you want to expand with marketing automation. At that point you will need a database for monitoring response, sales, etc. So it is important to build the right foundation as early as possible. If you don't want to code but want point-and-click, there are boatloads of software for that, but probably at more license cost (unless you can find open source for that). So if I were you, I would start off with something smaller. After all, you want to focus on the value you create for the business and your colleagues, and the time you save, rather than on a smooth IT infrastructure for this. I have lots of experience building reporting and analysis in the areas you mention, so if you want to discuss further, feel free to set up a call. Good luck with your development. Best regards, Kenneth Wolstrup
Copyright © 2025 Startups.com. All rights reserved.