I have a Q&A website, and I'm facing an issue with Google search indexing. Please have a look at the following use case:
1. A user posts a question: the question gets submitted to Google for indexing through the XML sitemap.
2. Suppose the next day somebody answers that question.
Here is what happens: if my question gets indexed before it is answered, Google does not update its cache as soon as the content changes. Please let me know how I can tell Google to recrawl the content of my page as soon as somebody adds a new answer to it.
Hi
Your concern is very common. Without knowing your site, its structure, age, or underlying technology, I hope the points below will help.
You can't really force Google to crawl your site, nor can you force it to crawl a specific page.
You can resubmit your sitemap and hope it gets crawled. The best method is to be active on your website in general; Google will then crawl it more regularly.
If your site is light on content and pretty new (a few weeks old), it won't get crawled as often as you would like.
You will read all sorts of stuff about how you could fool Google, but from my experience the thing to do is to keep a regular content stream going. That will indicate to Google that you are an active site.
You may ask, what about the specific questions and answers? Don't worry; Google will get to them if you are creating content in other areas.
Let me give you a recent example of one of my niche sites.
I created the site initially with 10-12 core pages (Day 1 - Week 1).
Over the following days I created 2-3 posts/pages on the site (Day 2 - Day 14).
The site pages weren't visible till about Day 4-5.
I continue on a daily basis to create the best content possible. Length helps of course but don't create words just for the sake of it.
You may say that your Q&A website is a different model, but I would look at the big Q&A websites like Quora, study some of the core pages that surround the site, and create them.
If you are light on questions then reach out to your network and ask them to either ask questions or help with answers.
Glad to do a quick call with you if this is unclear.
Google discovers new web pages by crawling the web, and then they add those pages to their index. They do this using a web spider called Googlebot. Let us define a few key terms:
1. Crawling: The process of following hyperlinks on the web to discover new content.
2. Indexing: The process of storing every web page in a vast database.
3. Web spider: A piece of software designed to carry out the crawling process at scale.
4. Googlebot: Google’s web spider.
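To make the crawling and indexing terms above concrete, here is a toy sketch in Python. The link graph, URLs, and page names are all invented for illustration; a real spider fetches pages over HTTP rather than reading a dictionary, but the discovery logic is the same.

```python
from collections import deque

def crawl(link_graph, start):
    """Discover pages breadth-first by following hyperlinks, the way a
    web spider such as Googlebot discovers new content. `link_graph` is
    a toy, in-memory stand-in for the web: it maps each URL to the URLs
    that page links to."""
    seen = {start}
    queue = deque([start])
    index = []  # "indexing": the pages stored after discovery
    while queue:
        page = queue.popleft()
        index.append(page)
        for link in link_graph.get(page, []):
            if link not in seen:  # follow only hyperlinks we haven't seen yet
                seen.add(link)
                queue.append(link)
    return index

# A question page linked from the home page gets discovered; a page
# with no incoming links (an "orphan", see tactic 5 below) never does.
site = {
    "/": ["/questions/1", "/about"],
    "/questions/1": ["/"],
    "/orphan": [],
}
print(crawl(site, "/"))  # ['/', '/questions/1', '/about']
```

Note that `/orphan` never appears in the result: no page links to it, so crawling alone cannot find it.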
When you Google something, you are asking Google to return all relevant pages from their index. Because there are often millions of pages that fit the bill, Google’s ranking algorithm does its best to sort the pages so that you see the best and most relevant results first.
Go to Google and search for site:yourwebsite.com. The number of results shows roughly how many of your pages Google has indexed. To check the index status of a specific URL, use the same operator on that URL: site:yourwebsite.com/web-page-slug. No results will show up if the page is not indexed.
Now, it is worth noting that if you are a Google Search Console user, you can use the Coverage report to get a more accurate insight into the index status of your website. Just go to: Google Search Console > Index > Coverage
Look at the number of valid pages (with and without warnings). If these two numbers total anything but zero, then Google has at least some of the pages on your website indexed. If not, then you have a severe problem because none of your web pages are indexed.
You can also use Search Console to check whether a specific page is indexed. To do that, paste the URL into the URL Inspection tool. If that page is indexed, it will say “URL is on Google.” If the page is not indexed, you will see the words “URL is not on Google.”
Found that your website or web page is not indexed in Google? Try this:
1. Go to Google Search Console
2. Navigate to the URL inspection tool
3. Paste the URL you would like Google to index into the search bar.
4. Wait for Google to check the URL
5. Click the “Request indexing” button
This process is good practice when you publish a new post or page. You are effectively telling Google that you have added something new to your site and that they should look at it. However, requesting indexing is unlikely to solve underlying problems preventing Google from indexing old pages. If that is the case, follow the checklist below to diagnose and fix the problem.
Here are the tactics, in case you have already tried some:
1) Remove crawl blocks in your robots.txt file: Is Google not indexing your entire website? It could be due to a crawl block in something called a robots.txt file. To check for this issue, go to yourdomain.com/robots.txt. A crawl block in robots.txt could also be the culprit if Google isn’t indexing a single web page. To check if this is the case, paste the URL into the URL inspection tool in Google Search Console. Click on the Coverage block to reveal more details, then look for the “Crawl allowed? No: blocked by robots.txt” error.
This indicates that the page is blocked in robots.txt. If that’s the case, recheck your robots.txt file for any “disallow” rules relating to the page or related subsection. Remove where necessary.
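If you'd rather test rules programmatically than eyeball the file, Python's standard library can evaluate robots.txt rules. A minimal sketch; the robots.txt content and URLs below are invented for illustration:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a Q&A site; these Disallow rules are
# examples, not rules from any real site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /questions/drafts/
"""

def is_crawl_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may crawl `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)

print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/questions/123"))      # True
print(is_crawl_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/questions/drafts/1")) # False
```

In practice you would fetch yourdomain.com/robots.txt and feed its contents in; the check itself stays the same.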
2) Remove rogue noindex tags: Google will not index pages if you tell them not to. This is useful for keeping some web pages private. There are two ways to do it:
Method 1: meta tag
Pages with either of these meta tags in their <head> section will not be indexed by Google:
<meta name="robots" content="noindex">
<meta name="googlebot" content="noindex">
This is a meta robots tag, and it tells search engines whether they can or can’t index the page. To find all pages with a noindex meta tag on your site, run a crawl with Ahrefs’ Site Audit. Go to the Indexability report. Look for “Noindex page” warnings. Click through to see all affected pages. Remove the noindex meta tag from any pages where it does not belong.
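To check a single page for these tags without a crawler, you can scan the HTML yourself. A small sketch using Python's standard html.parser; the sample HTML snippets are hypothetical:

```python
from html.parser import HTMLParser

class NoindexFinder(HTMLParser):
    """Flags <meta name="robots"> or <meta name="googlebot"> tags whose
    content attribute contains a noindex directive."""
    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        content = (attr.get("content") or "").lower()
        if name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True

def page_has_noindex(html: str) -> bool:
    finder = NoindexFinder()
    finder.feed(html)
    return finder.noindex

print(page_has_noindex('<head><meta name="robots" content="noindex"></head>'))       # True
print(page_has_noindex('<head><meta name="robots" content="index, follow"></head>')) # False
```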
Method 2: X‑Robots-Tag: Crawlers also respect the X‑Robots-Tag HTTP response header. You can implement this using a server-side scripting language like PHP, or in your .htaccess file, or by changing your server configuration. The URL inspection tool in Search Console tells you whether Google is blocked from crawling a page because of this header. Just enter your URL, then look for the “Indexing allowed? No: ‘noindex’ detected in ‘X‑Robots-Tag’ http header”. If you want to check for this issue across your site, run a crawl in Ahrefs’ Site Audit tool, then use the “Robots information in HTTP header” filter in the Page Explorer. You can exclude pages you want indexing from returning this header.
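The header check can be sketched the same way. This simplified parser assumes the two common shapes of an X-Robots-Tag value (a bare directive list that applies to all bots, or a list prefixed with a user agent) and ignores rarer forms such as unavailable_after:

```python
def header_blocks_indexing(header_value: str, bot: str = "googlebot") -> bool:
    """Check whether an X-Robots-Tag response header value contains a
    noindex directive that applies to `bot`. Handles "noindex, nofollow"
    (applies to all bots) and "googlebot: noindex" (scoped to one bot)."""
    value = header_value.strip().lower()
    if ":" in value:
        agent, _, directives = value.partition(":")
        if agent.strip() != bot:
            return False  # directives are scoped to a different bot
        value = directives
    return "noindex" in (d.strip() for d in value.split(","))

print(header_blocks_indexing("noindex"))                       # True
print(header_blocks_indexing("googlebot: noindex, nofollow"))  # True
print(header_blocks_indexing("otherbot: noindex"))             # False
print(header_blocks_indexing("noarchive"))                     # False
```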
3) Include the page in your sitemap: A sitemap tells Google which pages on your site are important, and which are not. It may also give some guidance on how often they should be re-crawled. Google should be able to find pages on your website regardless of whether they are in your sitemap, but it is still good practice to include them. After all, there is no point making Google’s life difficult. To check if a page is in your sitemap, use the URL inspection tool in Search Console. If you see the “URL is not on Google” error and “Sitemap: N/A,” then it is not in your sitemap or indexed. Not using Search Console? Head to your sitemap URL—usually, yourdomain.com/sitemap.xml—and search for the page.
Or, if you want to find all the crawlable and indexable pages that aren't in your sitemap, run a crawl in Ahrefs' Site Audit. These pages should be in your sitemap, so add them. Once done, let Google know that you've updated your sitemap by pinging this URL: http://www.google.com/ping?sitemap=http://yourwebsite.com/sitemap_url.xml
Replace that last part with your sitemap URL.
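Searching a large sitemap by hand is tedious; since a standard sitemap is plain XML, a short script can do the membership check. A sketch with an invented example sitemap:

```python
import xml.etree.ElementTree as ET

# Namespace used by the standard sitemap protocol
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def urls_in_sitemap(sitemap_xml: str) -> set:
    """Extract every <loc> URL from a standard XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return {loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS)}

# A minimal example sitemap (not from any real site)
sitemap = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/questions/123</loc></url>
</urlset>"""

print("https://example.com/questions/123" in urls_in_sitemap(sitemap))  # True
print("https://example.com/questions/999" in urls_in_sitemap(sitemap))  # False
```

In practice you would download yourdomain.com/sitemap.xml and pass its contents to the function.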
4) Remove rogue canonical tags: A canonical tag tells Google which is the preferred version of a page. It looks something like this: <link rel="canonical" href="/page.html">
Most pages either have no canonical tag, or what’s called a self-referencing canonical tag. That tells Google the page itself is the preferred and probably the only version. In other words, you want this page to be indexed.
But if your page has a rogue canonical tag, then it could be telling Google about a preferred version of this page that does not exist. In which case, your page will not get indexed. To check for a canonical, use Google’s URL inspection tool. You will see an “Alternate page with canonical tag” warning if the canonical points to another page.
If this should not be there, and you want to index the page, remove the canonical tag. Canonical tags are not always bad. Most pages with these tags will have them for a reason. If you see that your page has a canonical set, then check the canonical page. If this is indeed the preferred version of the page, and there is no need to index the page in question as well, then the canonical tag should stay. If you want a quick way to find rogue canonical tags across your entire site, run a crawl in Ahrefs’ Site Audit tool. This looks for pages in your sitemap with non-self-referencing canonical tags. Because you almost certainly want to index the pages in your sitemap, you should investigate further if this filter returns any results. It’s highly likely that these pages either have a rogue canonical or shouldn’t be in your sitemap in the first place.
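To spot a non-self-referencing canonical on a single page without a crawler, you can extract the tag yourself. A rough sketch; the rel handling is simplified (it ignores multi-valued rel attributes) and the URLs are made up:

```python
from html.parser import HTMLParser

class CanonicalFinder(HTMLParser):
    """Captures the href of the first <link rel="canonical"> tag."""
    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if (tag == "link" and (attr.get("rel") or "").lower() == "canonical"
                and self.canonical is None):
            self.canonical = attr.get("href")

def is_self_canonical(html: str, page_url: str) -> bool:
    """True if the page has no canonical tag, or one pointing at itself."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical is None or finder.canonical == page_url

html = '<head><link rel="canonical" href="https://example.com/page"></head>'
print(is_self_canonical(html, "https://example.com/page"))   # True
print(is_self_canonical(html, "https://example.com/other"))  # False
```

A False result means the canonical points elsewhere, which is exactly the "Alternate page with canonical tag" situation worth investigating.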
5) Check that the page is not orphaned: Orphan pages are those without internal links pointing to them. Because Google discovers new content by crawling the web, they are unable to discover orphan pages through that process. Website visitors will not be able to find them either. To check for orphan pages, crawl your site with Ahrefs’ Site Audit. Next, check the Links report for “Orphan page (has no incoming internal links)” errors. This shows all pages that are both indexable and present in your sitemap, yet have no internal links pointing to them. This process only works when two things are true:
1. All the pages you want indexing are in your sitemaps
2. You checked the box to use the pages in your sitemaps as starting points for the crawl when setting up the project in Ahrefs’ Site Audit.
Not confident that all the pages you want to be indexed are in your sitemap? Try this:
1. Download a full list of pages on your site (via your CMS)
2. Crawl your website (using a tool like Ahrefs’ Site Audit)
3. Cross-reference the two lists of URLs
Any URLs not found during the crawl are orphan pages.
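The cross-reference in step 3 is just a set difference. For example, with two invented URL lists:

```python
def find_orphans(cms_urls, crawled_urls):
    """Return the URLs the CMS knows about that the crawl never reached."""
    return sorted(set(cms_urls) - set(crawled_urls))

# Hypothetical data: a CMS export and the URLs a site crawl discovered
cms_export = ["/", "/questions/1", "/questions/2", "/old-landing-page"]
crawl_results = ["/", "/questions/1", "/questions/2"]

print(find_orphans(cms_export, crawl_results))  # ['/old-landing-page']
```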
You can fix orphan pages in one of two ways:
1. If the page is unimportant, delete it and remove it from your sitemap.
2. If the page is important, incorporate it into the internal link structure of your website.
6) Fix nofollow internal links: Nofollow links are links with a rel="nofollow" attribute. They prevent the transfer of PageRank to the destination URL. Google also doesn't crawl nofollow links.
Here’s what Google says about the matter:
Essentially, using nofollow causes us to drop the target links from our overall graph of the web. However, the target pages may still appear in our index if other sites link to them without using nofollow, or if the URLs are submitted to Google in a Sitemap.
In short, you should make sure that all internal links to indexable pages are followed. To do this, use Ahrefs’ Site Audit tool to crawl your site. Check the Links report for indexable pages with “Page has nofollow incoming internal links only” errors. Remove the nofollow tag from these internal links, if you want Google to index the page. If not, either delete the page or noindex it.
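If you want a quick look at one page's links before running a full crawl, you can bucket them by rel yourself. A sketch with the standard-library html.parser; the links are examples:

```python
from html.parser import HTMLParser

class LinkAuditor(HTMLParser):
    """Splits <a href> links into followed and nofollow buckets based on
    the rel attribute."""
    def __init__(self):
        super().__init__()
        self.followed = []
        self.nofollow = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        href = attr.get("href")
        if not href:
            return
        rel = (attr.get("rel") or "").lower().split()
        (self.nofollow if "nofollow" in rel else self.followed).append(href)

auditor = LinkAuditor()
auditor.feed('<a href="/a" rel="nofollow">A</a> <a href="/b">B</a>')
print(auditor.nofollow)  # ['/a']
print(auditor.followed)  # ['/b']
```

A page that only ever appears in the nofollow bucket across your site is the case worth fixing.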
7) Add “powerful” internal links: Google discovers new content by crawling your website. If you neglect to internally link to the page in question, then they may not be able to find it. One easy solution to this problem is to add some internal links to the page. You can do that from any other web page that Google can crawl and index. However, if you want Google to index the page as fast as possible, it makes sense to do so from one of your more “powerful” pages. Why? Because Google is likely to recrawl such pages faster than less important pages. To do this, head over to Ahrefs’ Site Explorer, enter your domain, then visit the Best by links report. This shows all the pages on your website sorted by URL Rating (UR). In other words, it shows the most authoritative pages first. Skim this list and look for relevant pages from which to add internal links to the page in question. For example, if we were looking to add an internal link to our guest posting guide, our link building guide would probably offer a relevant place from which to do so. Google will then see and follow that link next time they recrawl the page. Paste the page from which you added the internal link into Google’s URL inspection tool. Hit the “Request indexing” button to let Google know that something on the page has changed and that they should recrawl it as soon as possible. This may speed up the process of them discovering the internal link and consequently, the page you want indexing.
8) Make sure the page is valuable and unique: Google is unlikely to index low-quality pages because they hold no value for its users. Here's what Google's John Mueller said about indexing in 2018: "We never index all known URLs, that's pretty normal. I'd focus on making the site awesome and inspiring, then things usually work out better." He implies that if you want Google to index your website or web page, it needs to be "awesome and inspiring." If you have ruled out technical issues for the lack of indexing, then a lack of value could be the culprit. For that reason, it is worth reviewing the page with fresh eyes and asking yourself: Is this page genuinely valuable? Would a user find value in this page if they clicked on it from the search results? If the answer is no to either of those questions, then you need to improve your content.
You can find more potentially low-quality pages that aren't indexed using Ahrefs' Site Audit tool and URL Profiler. This will return "thin" pages that are indexable and currently get no organic traffic. In other words, there's a decent chance they aren't indexed. Export the report, then paste all the URLs into URL Profiler and run a Google Indexation check. It's recommended to use proxies if you're doing this for lots of pages (i.e., over 100). Otherwise, you run the risk of your IP getting banned by Google. If you cannot do that, another alternative is to search Google for a "free bulk Google indexation checker." There are a few of these tools around, but most of them are limited to <25 pages at a time.
Check any non-indexed pages for quality issues. Improve where necessary, then request reindexing in Google Search Console. You should also aim to fix issues with duplicate content. Google is unlikely to index duplicate or near-duplicate pages. Use the Duplicate content report in Site Audit to check for these issues.
9) Remove low-quality pages (to optimize “crawl budget”): Having too many low-quality pages on your website serves only to waste crawl budget.
Here’s what Google says on the matter:
Wasting server resources on [low-value-add pages] will drain crawl activity from pages that do have value, which may cause a significant delay in discovering great content on a site.
Think of it like a teacher grading essays, one of which is yours. If they have ten essays to grade, they are going to get to yours quite quickly. If they have a hundred, it will take them a bit longer. If they have thousands, their workload is too high, and they may never get around to grading your essay.
Google does state that “crawl budget […] is not something most publishers have to worry about,” and that “if a site has fewer than a few thousand URLs, most of the time it will be crawled efficiently.”
Still, removing low-quality pages from your website is never a bad thing. It can only have a positive effect on crawl budget.
You can use our content audit template to find potentially low-quality and irrelevant pages that can be deleted.
10) Build high-quality backlinks: Backlinks tell Google that a web page is important. After all, if someone is linking to it, then it must hold some value. These are pages that Google wants to index. For full transparency, Google does not only index web pages with backlinks. There are plenty (billions) of indexed pages with no backlinks. However, because Google sees pages with high-quality links as more important, it is likely to crawl, and re-crawl, such pages faster than those without. That leads to faster indexing.
Besides, if you do have any questions, give me a call: https://clarity.fm/joy-brotonath