We submitted a sitemap several months ago, but our site is large enough that generating a new one is time-intensive. How important is it for Google/Bing/Yahoo to have a current sitemap, or can they crawl the site appropriately once the initial one is submitted?
Hi there -
There are two parts to the answer. First, it's very important that your sitemap is current and contains only URLs that return a 200 status code. Even 301-redirected URLs should not be in there. Also, a single sitemap should not contain more than 50,000 URLs; if yours has more, create multiple sitemaps and list them in a sitemap index.
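To make the split concrete, here's a minimal sketch of chunking a URL list into sitemaps of at most 50,000 entries and generating an index that lists them. The function name, the filename prefix, and returning strings rather than writing files are all my own choices for illustration; the XML namespace and the 50,000-URL limit come from the sitemaps.org protocol.

```python
from xml.sax.saxutils import escape

MAX_URLS = 50_000  # per-file limit in the sitemaps.org protocol

def build_sitemaps(urls, base_url, prefix="sitemap"):
    """Split `urls` into chunks of at most MAX_URLS and return a
    {filename: xml_string} dict, plus an index file listing every chunk."""
    ns = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
    files = {}
    chunks = [urls[i:i + MAX_URLS] for i in range(0, len(urls), MAX_URLS)]
    for n, chunk in enumerate(chunks, start=1):
        body = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in chunk)
        files[f"{prefix}-{n}.xml"] = (
            f'<?xml version="1.0" encoding="UTF-8"?><urlset {ns}>{body}</urlset>'
        )
    # The index references each chunk by its public URL.
    entries = "".join(
        f"<sitemap><loc>{base_url}/{name}</loc></sitemap>" for name in files
    )
    files[f"{prefix}-index.xml"] = (
        f'<?xml version="1.0" encoding="UTF-8"?>'
        f'<sitemapindex {ns}>{entries}</sitemapindex>'
    )
    return files
```

You'd then submit only the index file to the search engines; they fetch the chunks from there.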
Second, site maps are a great way to get content *discovered*, but they won't necessarily rank well. You need the URLs to also be easily discoverable on your site through internal linking.
I hope this helps. Feel free to book a call with me if you'd like to discuss more.
It is essential to provide Google with a constantly updated sitemap, especially for large sites. If generating the sitemap takes a long time, split it into several files, each containing a limited number of links, and update only the file that contains your newest and recently updated content.
I agree with what was said above. It will be time-intensive, but worth it if you implement a system that updates your sitemap automatically in the future. There may also be an existing solution for your platform that will save you time.
Dividing your sitemap using a sitemap index will also 1) help you diagnose possible indexing problems via Webmaster Tools, and 2) force you to review your site architecture, which can lead to valuable insights for improving your site.
Relevant article on the subject: http://www.distilled.net/blog/seo/indexation-problems-diagnosis-using-google-webmaster-tools/
Just to add a couple of items to the stack of already great answers.
1) For a large site, absolutely. I would split the XML sitemaps by category or page type (assuming we're only talking about static pages, not images/videos). This will help you understand potential indexation problems, alongside investigating your server logs and landing-page traffic.
2) Use a delayed job/cron job to update your sitemaps on a regular basis (e.g., have them run at midnight) so that any new pages are added and junk pages removed. You're ultimately working to ensure that the search engines can see all of your content, including the latest pages.
3) It's worth paying close attention to your XML sitemaps: ensure, as John rightly pointed out, that they are free of broken URLs (400s and 500s), free of redirects (301s and 302s), and don't exceed 50,000 URLs each.
4) And lastly... don't get obsessed with having a MASSIVE site with hundreds of thousands of thin pages and very poor indexation. Instead (and particularly post-Panda), focus on 'making every page count'.
Cut down the bloat. Improve your overall site quality. Consolidate your internal and external link flow, and maximise the effectiveness of your crawl budget.
As your Domain and Page Authority improve, you'll be able to handle a larger site...
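Point 3 above (keeping the sitemap free of non-200 URLs) is easy to automate. Here's a rough sketch that parses the `<loc>` entries out of a sitemap and flags everything that doesn't answer with a clean 200. The function names are mine; the `fetch` parameter is injectable so you can plug in your own HTTP client or a stub, and the default implementation deliberately does not follow redirects, so 301s/302s are surfaced rather than silently resolved.

```python
import urllib.error
import urllib.request
import xml.etree.ElementTree as ET

NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def extract_locs(sitemap_xml):
    """Pull every <loc> URL out of a sitemap (or sitemap index)."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.iter(f"{NS}loc")]

def audit_urls(urls, fetch=None):
    """Return {url: status} for every URL that is NOT a clean 200 —
    the 3xx/4xx/5xx responses that shouldn't be in the sitemap."""
    if fetch is None:
        def fetch(url):
            class NoRedirect(urllib.request.HTTPRedirectHandler):
                def redirect_request(self, *args, **kwargs):
                    return None  # surface the 3xx instead of following it
            opener = urllib.request.build_opener(NoRedirect)
            req = urllib.request.Request(url, method="HEAD")
            try:
                return opener.open(req).status
            except urllib.error.HTTPError as e:
                return e.code
    return {u: code for u in urls if (code := fetch(u)) != 200}
```

Run this against each chunk of the sitemap index on a schedule and you'll catch stray redirects and dead pages before the crawlers do.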
For a marketplace, this is essential.
Hope this helps.
At the end of the day, your sitemap is "just a file" with links to all the pages on your site. It is one of many items on a good SEO/website checklist, but it shouldn't be ignored.
The SEO guys may scream in pain here, but there are lots of sites that rank fine without a sitemap because they are doing damn well in other, more important areas.
There are lots of automated ways to make sure your sitemap stays up to date and doesn't deliver bad results, and once you automate it you should, in theory, be able to "virtually walk away".
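As a sketch of what "automate and walk away" might look like: a small nightly job that rewrites only a `sitemap-latest.xml` file with pages changed in the last day, leaving the older chunks untouched. Everything here is hypothetical scaffolding — `fetch_recent` stands in for whatever query your own data layer provides, and the output directory and filename are placeholders.

```python
import datetime
import pathlib
from xml.sax.saxutils import escape

# Hypothetical output location; point this at your web root.
SITEMAP_DIR = pathlib.Path("public/sitemaps")

def render_sitemap(entries):
    """Render an iterable of (url, lastmod_date) pairs as sitemap XML."""
    body = "".join(
        f"<url><loc>{escape(url)}</loc>"
        f"<lastmod>{day.isoformat()}</lastmod></url>"
        for url, day in entries
    )
    return ('<?xml version="1.0" encoding="UTF-8"?>'
            '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
            f"{body}</urlset>")

def refresh_latest(fetch_recent):
    """Rewrite only sitemap-latest.xml with pages changed since yesterday.
    `fetch_recent(since)` is your own data-layer query (assumed)."""
    since = datetime.date.today() - datetime.timedelta(days=1)
    xml = render_sitemap(fetch_recent(since))
    SITEMAP_DIR.mkdir(parents=True, exist_ok=True)
    (SITEMAP_DIR / "sitemap-latest.xml").write_text(xml)

# Scheduled via cron, e.g.:
#   0 0 * * * /usr/bin/python3 /path/to/refresh_sitemap.py
```

Keep `sitemap-latest.xml` listed in your sitemap index alongside the stable chunks and the search engines pick up new content without a full regeneration.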