• Resolved Jim

    (@jwmc)


    Google Search Console notified me that I have a URL that is “Indexed, though blocked by robots.txt”. The full URL is shown above, but it goes to /wp-admin/upload.php. I searched for any links to this URL on the site and couldn’t find any.

    My robots.txt simply shows

    User-agent: *
    Disallow: /wp-admin/
    Allow: /wp-admin/admin-ajax.php

    Sitemap: https://-----.org/sitemap.xml

    User-agent: *
    Disallow: /wp-content/uploads/wpo-plugins-tables-list.json

    I’ve struggled to understand how this happened and can’t figure out how to fix it.


Viewing 3 replies - 1 through 3 (of 3 total)
  • Plugin Author Sybre Waaijer

    (@cybr)

    Hello!

    What you’re finding is normal, intended, and expected behavior.

    Somewhere on your site, a link to the /wp-admin/upload.php was provided.

    Because we disallow /wp-admin/, all links leading to /wp-admin/* will be blocked — this includes the /wp-admin/upload.php file.

    There’s but one exception: /wp-admin/admin-ajax.php. We allow this one file because it’s also used in non-admin scenarios (such as loading a page via AJAX on the front-end); blocking that would stop crawling the AJAX-loaded page.
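    To make the precedence concrete, here is a minimal Python sketch of how a crawler that follows Google's documented longest-match rule might evaluate the rules from the robots.txt quoted in the first post. The names `RULES` and `is_allowed` are illustrative only and are not part of any plugin or library. The sketch assumes plain prefix matching and ignores the `*` and `$` wildcards, so treat it as an approximation rather than a full parser.

```python
# Rules for "User-agent: *", copied from the robots.txt in the first post.
# Both "User-agent: *" groups are merged into a single rule list here.
RULES = [
    ("disallow", "/wp-admin/"),
    ("allow", "/wp-admin/admin-ajax.php"),
    ("disallow", "/wp-content/uploads/wpo-plugins-tables-list.json"),
]


def is_allowed(path, rules=RULES):
    """Return True if `path` may be crawled under a simplified longest-match rule.

    The longest matching prefix wins; if an allow and a disallow rule of the
    same length both match, the allow rule wins. Wildcards are not handled.
    """
    matches = [
        (len(prefix), kind == "allow")
        for kind, prefix in rules
        if path.startswith(prefix)
    ]
    if not matches:
        return True  # no rule applies, so crawling is permitted
    # Tuples compare by length first, then by the allow flag (True > False).
    return max(matches)[1]


if __name__ == "__main__":
    for candidate in (
        "/wp-admin/upload.php",
        "/wp-admin/admin-ajax.php",
        "/sample-page/",
    ):
        print(candidate, "->", "allowed" if is_allowed(candidate) else "blocked")
```

    Under these assumptions, /wp-admin/upload.php is blocked by the shorter /wp-admin/ rule, while /wp-admin/admin-ajax.php is allowed because the longer, more specific Allow rule takes precedence.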

    In the Indexing report, under “Blocked by robots.txt”, click on the URL listed under Examples.

    Then, click on “Inspect URL.” You may be able to find the “Referring page” there.

    The “Referring page” is where Google found the URL and followed it.

    Sometimes Google even picks up plain text that merely looks like a URL and follows it. That happened to us, where Google followed a URL from a code snippet.

    You can learn more about robots.txt here: https://developers.google.com/search/docs/crawling-indexing/robots/intro.

    I hope this explains everything well. Let me know if you have any more questions!

    Thread Starter Jim

    (@jwmc)

    Thank you for your reply and sorry to take this up again after a long delay. As mentioned, there is no link to this URL on my site. But following your advice and clicking around in Search Console, I eventually found the referring page. It is some nonsense website that seems to be somehow copying support emails or forum posts with Wordfence. I apparently mentioned that URL 4 years ago while communicating with them!

    The referring page is (leaving off the https:// in hopes not to reinforce Google’s indexing): ditted24.rssing.com/chan-12949233/all_p1233.html

    • This reply was modified 1 month, 2 weeks ago by Jim.
    Plugin Author Sybre Waaijer

    (@cybr)

    Hi again!

    I’m glad you found the source. Perhaps you can disavow the backlink, or otherwise ask the site owner to remove the post. Still, this “backlink” is not harmful, and it might be best to ignore it. The web is full of interlinked nonsense; cleaning up would take many lifetimes.

    Note that Google will crawl anything that even barely looks like a URL… so even that link you posted will cause them to crawl it again. 🥲

    In any case, I hope you have a lovely weekend!
