How can big data evolve if our overall ability to convert data into really valuable information that can then drive effective decision making is so limited?
Answers
I would disagree with the premise that "big data" isn't being converted into valuable information. Modern advertising is driven by exactly this ability. The problem lies not with the information, or lack thereof, but with the decision makers. Lack of vision at both the strategic and tactical levels of decision making and fear of change, among other things, are at the root of the problem.


I also agree. It really depends on what you do with the data. Even a small bit of data can mean a lot.
The other problem is that the people gathering the data and analyzing it are sometimes people with PhDs and not MBAs. We are talking about a very technical thing here, and the industry is growing. Just give it some time and I guarantee it'll evolve. So how, you ask? Time, that's how. Trial and error. The risk takers will lead the way.
To be honest, I do believe many people are wasting too much money on huge data warehouses. I firmly believe we can get the data we need to act on without going so crazy. We're all too impressed with size. I mean, maybe if you're trying to clone Dolly or are the NSA or something, you can't avoid all that data...But size doesn't matter. Results do.
For example, a 5 terabyte data set of Twitter messages from random people no one cares about talking about what they ate for breakfast? Useless.
I know, it's unfair...We've spent millions of dollars learning about breakfast foods from Twitter and Tony the Tiger's family is still starving. Yup, it happens.
Just give it time. Big data and data mining on the web are still in their infancy in terms of application.


I am not at all convinced that our ability to convert data to actionable decision making metrics is limited.
In fact, I do just that every day.
There are definitely limiting factors, but the ability to make use of the data is much stronger than people commonly believe.
For structured data you just need a good understanding of the data structures and their relationships. Unstructured data (tweets, statuses, etc.) is much more difficult, but several technologies, both available and under development, tackle unstructured data specifically. This even includes some high-end tools that monitor voice data.
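As a minimal sketch of the structured-data case, assuming a hypothetical schema of customers and orders (the table names, column names, and figures below are invented for illustration), knowing how the two tables relate is enough to turn raw rows into a metric someone can act on:

```python
import sqlite3

# Hypothetical schema and sample rows; none of this comes from a real system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(id),
        amount REAL
    );
    INSERT INTO customers VALUES (1, 'East'), (2, 'West'), (3, 'East');
    INSERT INTO orders VALUES
        (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0), (4, 3, 100.0);
""")


def revenue_by_region(connection):
    """Join orders to customers and total the order amounts per region."""
    query = """
        SELECT c.region, SUM(o.amount) AS revenue
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        GROUP BY c.region
        ORDER BY revenue DESC
    """
    return connection.execute(query).fetchall()


if __name__ == "__main__":
    for region, revenue in revenue_by_region(conn):
        print(region, revenue)
```

The only relationship the query relies on is that each order points at one customer; everything else is ordinary aggregation.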
If you have a specific data scenario in mind I can be more specific. If you'd like to arrange a call I'd be happy to provide more detail.
Related Questions
-
How can I know when to go with my gut or when to trust the data when making a business decision?
In my experience coaching hundreds of people, anytime you go against your gut (aka intuition) you will lose. Data is important to consider, but your gut often knows the path.
-
Do social monitoring tools like Mention.com, Socialmention.com, and Nitrogram rely solely on APIs, or do they crawl or cache the metadata?
It's hard to get a definite answer to this, as these companies will not tell us how their algorithms work. Having said that, based on my experience working with these APIs, it is entirely possible to implement a real-time search like "socialmention.com" purely on top of the APIs from Google, Twitter, Yahoo, or Facebook. On the other hand, crawling and saving all that data just in case a customer might search for it is probably not economically viable and would also violate the Terms of Service of most of these APIs. Bottom line: they are probably not crawling and caching.
-
How can I aggregate data from online sources about a specific topic?
There are many ways to do it. Do you need this data for yourself, or are you planning to build a product around it? From what I see, you can use the Twitter API and the Facebook Graph API (are you comfortable programming?). Most students are active on social media, so you will find lots of data. The Facebook Graph API will give you the number of likes and comments on all of your competitors' posts, so you can analyze them. Using the Twitter API you can get all the tweets that use certain hashtags or mentions.
If you are not into coding but still want social media information, take a look at tools like IBM Watson Analytics ($30 for personal use). It natively connects to the Twitter API, and you don't have to be a programmer at all; it is intuitive and easy to learn. Analytics Canvas connects to the Facebook Graph API (it has a free 30-day trial).
Unfortunately, you will not be able to collect personal information (age, income, gender, etc.) from social media at large scale, because that violates privacy laws on the Internet. You can use census data instead.
Google Sheets is a very handy tool if you are planning to use this information for personal research. You can set up a spreadsheet and add some JavaScript to make it collect information from competitors' blogs and from sites like Reddit.
Finally, you can try web scraping (it's not the best, but it can speed up the process). A tool like OutWitHub will collect information from websites (such as reviews) based on the structure you provide (selected HTML tags). You can collect thousands of reviews in one day if you automate it (paid version), and it is very easy to use. Note: not all websites are open to this method, so review their policies to make sure you are not violating their terms of service. Reviews belong to the website where they were published.
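To make the tag-based scraping idea concrete, here is a rough sketch using only Python's standard library. It is not how OutWitHub or any other tool works internally; the `<p class="review">` structure, the `ReviewExtractor` class, and the sample page are all assumptions made up for illustration, and a real page would need its own selectors (and permission under the site's terms of service).

```python
from html.parser import HTMLParser


class ReviewExtractor(HTMLParser):
    """Collect the text inside every <p> element whose class list has "review".

    The tag name and class name are assumptions about the page structure.
    """

    def __init__(self):
        super().__init__()
        self.reviews = []
        self._inside_review = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "review" in classes:
            self._inside_review = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside_review:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._inside_review:
            self.reviews.append("".join(self._buffer).strip())
            self._inside_review = False


# Invented sample page standing in for a downloaded HTML document.
SAMPLE_PAGE = """
<html><body>
  <p class="review">Great course, very practical.</p>
  <p class="ad">Buy now!</p>
  <p class="review featured">Too expensive for students.</p>
</body></html>
"""


def extract_reviews(html):
    """Return the text of every review paragraph found in the HTML string."""
    parser = ReviewExtractor()
    parser.feed(html)
    parser.close()
    return parser.reviews


if __name__ == "__main__":
    for review in extract_reviews(SAMPLE_PAGE):
        print(review)
```

The same pattern extends to other tags or attributes by changing the condition in `handle_starttag`.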
If you REALLY need personal data (like how much they earn and how much they spend, etc.), just print out 100 questionnaires and go to the Student Union Building of Dalhousie University. Most students will share personal data in exchange for a Tim Hortons gift card that gets them a free coffee. It is probably the least technical and fastest way to get all the data you need. Hope this helps.
-
What are the best traits to look for in a data scientist/data analyst?
First off, I have several people I could introduce. I'd also like to know the industry you're operating in, what the data looks like, where it comes from, and how much it needs to be cleaned up if it is going into a relational database, or whether the better solution would be a NoSQL document store like MongoDB, where you don't necessarily need to normalize the data. I'd also want to know whether you're a startup or a well-established company with many existing customers, and whether this is for a new initiative.
Assuming you're working with a relational database, which it sounds like you are, you will want to implement something like Tableau or build out a custom dashboard using Google Charts, Highcharts, D3.js, or any number of other dashboarding/visualization solutions, which usually involves some programming/scripting in JavaScript. There are paid solutions like Tableau (which is amazingly powerful), and then there are free/open source options. I'd be happy to talk about possible ways to architect the solution and work out who you would need once I understand the variables better.
If you're building a web application, then you will likely need someone who is also a full stack developer, meaning they can handle building the back end and the front end in addition to the data requirements. Many early startups choose Ruby on Rails (because there is a ton of open source code out there for it) with Twitter Bootstrap (modified), and in order to visualize the data they will need to work with JavaScript. It makes sense to have this person act as the initial product person and derive the insights from the data. They're pretty much the only person who can do this anyway, because they're the one working on the data.
If you're in an early stage startup, I would recommend that the strongest business owner (usually the CEO) be directly involved with this person in communicating what value your solution brings to clients and what they pay you for, and in brainstorming potential features and reports. Once the solution becomes established and many customers start using it directly, a different product person should interface with those customers over time, gathering feature requests and bringing them back to the Data Scientist/Analyst, who spends their time working on the data.
Depending on whether the solutions are SQL, NoSQL, or hybrid, there are different types of Data Science professionals you should consider:
1. Data Scientist
2. Data Engineer
3. Data Modeler/Analyst
1. The Data Scientist handles experimenting with the data, and is able to prove statistically significant events and make statistically valid arguments. Normally, this person would have modeling skills with Matlab, R, or perhaps SAS, and they should also have some programming/scripting skills with C++ or Python. It really depends on your whole environment and the flow of data. In my experience, Data Scientists who exclusively use SAS are sometimes extremely skilled PhD-level statisticians focused exclusively on the accuracy of the models (which is okay), but often not sufficiently skilled to fit within an early startup's big data environment and handle all of the responsibilities described in your question. I am not bad-mouthing SAS people, as they are often the MOST talented mathematicians and I have a great deal of respect for their minds, but if they do not have the programming skills, they become isolated within a group unless a Data Engineer helps them along. Often a SAS user trying to fit into this environment will force you to use a stack of technologies that a skilled Data Architect would not recommend. It takes programming in some object-oriented language to fit into today's big data environments, and the better Data Scientists are using hybrid functional and OOP languages like Scala. The hardest Data Scientists to find can also work with graph databases like Neo4j or Titan, or graph processing frameworks like Apache Giraph.
2. The Data Engineer: if you're dealing with a firehose of data like Twitter and capturing it into a NoSQL architecture, this is the person who would prepare the data for the Data Scientist to analyze. They are often capable of using machine learning libraries like WEKA to transform data, or techniques like MapReduce on Hadoop.
3. The Data Modeler/Analyst is someone who can use a tool like SAS, SPSS, Matlab, or even R, and is probably a very strong advanced Excel user, but likely won't be a strong programmer, although they may have a computer science degree and some academic programming experience.
The most important thing to watch out for is someone who is too academic and has not proven they can deliver a solution in the real world. This will really hurt you if you're a startup, and could be the reason you fail. Often, the startup will run out of money because of the time it takes to deliver a complete solution or, in the startup's case, a minimum viable product. Ask for examples of their work, and dig specifically into what they did for that solution. I've tried to cover a pretty broad range of possibilities here, but it's best to talk in specifics. I'd be happy to discuss this with you in detail. To answer your question, it is perfectly reasonable for one person to handle all of the responsibilities described in your question, if you find the right type of person with the appropriate skills and a history of success.
-
How can I use my cohort analysis by revenue to determine if I'm spending too much or too little in advertising?
You'll find that looking at top line revenue alone won't provide much value if you are asking the question "Is my advertising spend worth it?" Typically, in advertising I want to know the Lifetime Value of the customer or the Average Revenue Per User (ARPU). Then it boils down to this: if I spend $100 to acquire a new customer and he spends $150 over the lifetime of his paid relationship with me, then yes, my ad spend is worth it. On the other hand, I could spend 10K this month on ads and sign up 20K in users/customers/top line revenue, but my data may tell me I spent MORE per customer/user than they are actually worth over the average lifetime. So these two questions usually answer the "Is it worth it?" question in any advertising scenario: 1. How much does it cost to acquire the user? 2. How much does the user spend over the average lifetime? If the numbers don't make sense, it's not worth it, regardless of what your top line revenue tells you. You might find this blog post on cohort analysis valuable, as it goes into more detail. (I have no affiliation.) https://apsalar.com/blog/2012/01/how-to-use-cohort-analysis-to-improve-revenue/
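The two questions above reduce to simple per-cohort arithmetic. The sketch below is only an illustration: the function names and the cohort figures are made up, and it compares raw revenue to acquisition cost, whereas a real analysis would likely use margin and account for incomplete customer lifetimes.

```python
def acquisition_cost(ad_spend, customers):
    """Average advertising cost to acquire one customer in a cohort."""
    return ad_spend / customers


def lifetime_value(lifetime_revenue, customers):
    """Average revenue per customer over the lifetime of the relationship."""
    return lifetime_revenue / customers


def ad_spend_is_worth_it(ad_spend, customers, lifetime_revenue):
    """True when the average customer brings in more than it cost to acquire."""
    return lifetime_value(lifetime_revenue, customers) > acquisition_cost(
        ad_spend, customers
    )


if __name__ == "__main__":
    # Hypothetical cohorts: (name, ad spend, customers acquired, lifetime revenue)
    cohorts = [
        ("January", 10_000, 100, 15_000),  # $100 to acquire, $150 lifetime value
        ("February", 10_000, 50, 6_000),   # $200 to acquire, $120 lifetime value
    ]
    for name, spend, customers, revenue in cohorts:
        print(
            name,
            acquisition_cost(spend, customers),
            lifetime_value(revenue, customers),
            ad_spend_is_worth_it(spend, customers, revenue),
        )
```

In this made-up data both months spend the same amount on ads, but only the January cohort earns back more per customer than it cost to acquire.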