I need to remove 6,000+ URLs from a website that Google has already crawled.
What is the best way to handle those URLs: return a 404, add noindex/nofollow, use a canonical, or something else?
Please suggest what would be best from an SEO point of view.
You've got a few options here.
1) Add a noindex tag to the URLs (or an X-Robots-Tag header), and then, after 3-4 weeks, apply a robots.txt block (specifying the directory you want to remove, for instance). This should help the search engines see that you don't want these pages indexed. Once they've crawled the pages and seen the noindex, you then apply the robots.txt block to ensure they don't recrawl them.
The above assumes that you've removed the internal crawl path to the pages.
2) Return a 404 or 410 on the offending URLs that you want removed.
3) Execute URL removals via Webmaster Tools. Here you can manually request that URLs are removed. They are often removed within 24 hours. If you're still linking to these pages (in your internal nav or via XML sitemaps) then these pages could return.
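As a rough illustration of options 1 and 2, here is a minimal Python WSGI sketch. The `/old-catalog/` directory and the `/promo-...` paths are hypothetical placeholders, not anything from your site, and a real setup would more likely do this in the web server or CMS config.

```python
# Hypothetical directory whose pages should drop out of the index (option 1).
NOINDEX_PREFIX = "/old-catalog/"
# Hypothetical individual URLs that are permanently gone (option 2).
GONE_PATHS = {"/promo-2012", "/promo-2013"}


def app(environ, start_response):
    """Minimal WSGI app: 410 for retired URLs, noindex header for a directory."""
    path = environ.get("PATH_INFO", "")
    headers = [("Content-Type", "text/plain; charset=utf-8")]

    if path in GONE_PATHS:
        # Option 2: tell crawlers the page has been removed for good.
        start_response("410 Gone", headers)
        return [b"Gone\n"]

    if path.startswith(NOINDEX_PREFIX):
        # Option 1: the page still loads, but asks not to be indexed.
        headers.append(("X-Robots-Tag", "noindex"))

    start_response("200 OK", headers)
    return [b"OK\n"]
```

Note that the noindex pages must stay crawlable until they have been recrawled, which is why the robots.txt block only comes later.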
I hope that helps, and do give me a shout if you have any other questions. I've dealt with issues like this, a lot, with enterprise level sites.
My first question to you is: what are these URLs? It's important to know what role these URLs play in your overall website (i.e. do they carry important search signals, or does crawling them actually dilute those signals?).
Based on your answer:
If they are important, or duplicates, then you should either 301 them or, if that takes too much dev time, go with the second choice, which is rel=canonical. (PS: a canonical is a suggestion/hint to Google and does not act like a directive.)
If they are not important (search pages, doorway pages, or whatever), then you need to remove them with either a 404 or a 410. After a couple of months, once you see that all of these URLs have disappeared from the index, apply a robots.txt exclusion so they are not crawled anymore.
Please note that the URL removal request works for just 90 days, and if you haven't applied a robots.txt rule to stop bots from crawling the URLs again, they will reappear.
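Before relying on the robots.txt exclusion, it is worth checking that the rule really covers the URLs you mean. A small sketch using Python's standard `urllib.robotparser` follows; the `/old-catalog/` directory and the sample paths are assumptions for illustration only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rule covering the directory being retired.
ROBOTS_TXT = """\
User-agent: *
Disallow: /old-catalog/
"""


def is_blocked(path, robots_txt=ROBOTS_TXT):
    """Return True if the rules forbid all crawlers from fetching `path`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch("*", path)


if __name__ == "__main__":
    for sample in ("/old-catalog/page-1", "/about"):
        print(sample, "blocked" if is_blocked(sample) else "crawlable")
```

This only checks the rule syntax against sample paths; it says nothing about whether the pages have already left the index.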
Hope this helps.
This completely depends on your goals...
Want to retain link juice? 301 redirect.
Want to get rid of poisonous results? Just 404 them today.
Are they all the exact same page? Use canonicals.
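For the 301 and canonical cases, a minimal Python sketch might look like the following. The `REDIRECTS` mapping, the helper names, and the example URLs are all made up for illustration; most sites would configure this in the web server or CMS instead.

```python
from html import escape

# Hypothetical mapping from retired URLs to their replacements.
REDIRECTS = {"/old-shoes": "/shoes"}


def redirect_for(path):
    """Return (status, headers) for a permanent redirect, or None if unmapped."""
    target = REDIRECTS.get(path)
    if target is None:
        return None
    return "301 Moved Permanently", [("Location", target)]


def canonical_tag(preferred_url):
    """Build the <link> element that duplicate pages would carry in <head>."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'
```

The redirect passes visitors and link signals on to the new URL, while the canonical tag leaves the duplicate reachable and only hints at the preferred version.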