I use Excel heavily to manipulate data on my desktop PC, but as soon as I get up to 500,000 rows and complex calculations, performance slows down massively and the work becomes hugely cumbersome.
I'm considering taking the jump and retraining to learn Python, so that I can store and query the data in a database. However, as I'd like others to be able to access the data, I wanted to explore other options with Excel.
Can I set up a virtual machine (with a huge amount of memory) running Excel that I can remote into? Will this allow me to handle massive datasets (up to 1m rows) with ease?
A good enough computer would be able to handle huge datasets in Excel. But if you yourself need a supercomputer to process the data in Excel, the data isn't really in a form that other people can access, which is what you want.
I would strongly recommend using Matlab to process the data. It's incredibly easy, intuitive, powerful, and well documented and supported online. There will be almost no learning curve. After processing and analyzing the data in Matlab you can save the relevant chunks as .csv files that others could open on a normal computer running Excel.
Alternatively, they (or you) could use Octave, a free, open-source alternative that is largely compatible with Matlab (http://www.gnu.org/software/octave/).
Just in case you don't know, 1 million rows is close to the absolute maximum an Excel 2010 worksheet can hold, which is 1,048,576 rows (https://support.office.com/en-nz/article/Excel-specifications-and-limits-1672b34d-7043-467e-8e27-269d656771c3).
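Since Python came up in the question, here is a rough sketch of how a big CSV could be cut into pieces that stay under that row limit, so each piece opens on a normal computer. It only uses Python's standard library. The function name `split_csv`, the `part_001.csv` naming and the 500,000-row default are my own assumptions for illustration, not anything Excel or Matlab prescribes.

```python
import csv
from pathlib import Path

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit in Excel 2007 and later


def split_csv(source, out_dir, rows_per_file=500_000):
    """Split a large CSV into smaller files, repeating the header in each one.

    Hypothetical helper: the output file names (part_001.csv, ...) are an
    arbitrary choice made for this example.
    """
    if rows_per_file + 1 > EXCEL_MAX_ROWS:  # +1 accounts for the header row
        raise ValueError("rows_per_file would exceed Excel's row limit")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    with open(source, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input: nothing to split
            return written
        out, writer, count = None, None, 0
        for row in reader:
            # Start a new output file at the beginning and whenever one is full.
            if writer is None or count == rows_per_file:
                if out is not None:
                    out.close()
                path = out_dir / f"part_{len(written) + 1:03d}.csv"
                out = open(path, "w", newline="")
                writer = csv.writer(out)
                writer.writerow(header)
                written.append(path)
                count = 0
            writer.writerow(row)
            count += 1
        if out is not None:
            out.close()
    return written
```

Calling something like `split_csv("big_export.csv", "chunks")` (both names are placeholders) would return the list of files written. The input is read one row at a time, so memory use stays small no matter how large the source file is.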
If you want additional advice on splitting huge datasets, using Matlab, etc., send me a message.
All the best,
Excel's flexibility and interactive UI are appealing at first, but it does eventually hit a limit. That's why you should definitely switch to MySQL or even MongoDB (the best choice if you haven't yet figured out the presentation format for your other users). Matlab/Octave are designed for scientific and research purposes and therefore come with an even steeper learning curve. There are plenty of Big Data MOOCs out there that would give you an edge by teaching R (good for statistics / machine learning) or even Python, as you suggested yourself.
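To give a feel for the "store and query in a database" route with Python, here is a minimal sketch. Note that it uses SQLite rather than MySQL or MongoDB, simply because the `sqlite3` module ships with Python and needs no server. The table name `sales` and the columns `region` and `amount` are made up for the example, so substitute your own.

```python
import csv
import sqlite3


def load_sales(conn, csv_path):
    """Copy rows from a CSV into SQLite.

    Assumes the CSV has columns named 'region' and 'amount'
    (hypothetical names chosen for this example).
    """
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    with open(csv_path, newline="") as f:
        # Stream rows straight from the file into the database.
        rows = ((r["region"], float(r["amount"])) for r in csv.DictReader(f))
        conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def total_by_region(conn):
    """Aggregate inside the database instead of in spreadsheet formulas."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()
```

You would open a database file with `sqlite3.connect("sales.db")` (the file name is a placeholder), call `load_sales` once, and then run queries like `total_by_region` as often as you like. The same SQL would carry over to MySQL with a different connection library, and small query results can still be exported to CSV for colleagues who prefer Excel.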