Hi fellow Amino users, does anyone have an idea whether you can block these "uninvited strangers", which at best do no harm but are really quite annoying (and perhaps harmful?)? In purely practical terms ("GA-technically") you can of course just choose advanced settings -> without crawlers every time, and thereby filter out this disturbing factor. But I would very much like to hear what the rest of you, who may have experienced the same problems, do about it.
Hi Nicolai
I have had the same problem, which our hosting provider made us aware of. Here is the first email I got from them:
Hello,
System administration has identified your account as using a high amount of resources on the server housing your account. This is impacting other users, and we may be forced to suspend, or may have already suspended, your site in order to stabilize the server.
We noticed that your site is being heavily 'crawled' by search engines. Search engines tend to mimic the effect of hundreds of visitors going through every portion of your site, often all at once.
You may wish to implement a robots.txt file in order to reduce this effect. This file contains instructions for well-behaved 'robots' on how to crawl your site. You can find more information about this here:
http://www.robotstxt.org/.
The basic format would be as follows, blocking robots from the following (example) directories as well as setting a 10-second delay between requests:
User-agent: *
Crawl-delay: 10
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /tmp/
Disallow: /private/
Crawl-delay is an unofficial extension to the robots.txt standard, but one that most popular search engines honor. One notable exception, however, is Google's crawlers, which instead require you to set this delay in Google Webmaster Tools. We have a step-by-step guide on doing so at this URL:
http://www.inmotionhosting.com/support/website/google-tools/setting-a-crawl-delay-in-google-webmaster-tools
The delay and the disallowed directories are particularly useful for parts of your site, such as forums or 'tag clouds', that are useful to human visitors but troublesome because robots pass through them aggressively and repeatedly.
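If you want to sanity-check that a robots.txt like the one above does what you expect, a small sketch using Python's standard urllib.robotparser can do it (the example.com domain is just a placeholder for your own site):

from urllib.robotparser import RobotFileParser

# Placeholder domain - point this at your own site's robots.txt
rp = RobotFileParser("https://www.example.com/robots.txt")
rp.read()

# Paths matching the example rules above
print(rp.can_fetch("*", "https://www.example.com/images/logo.png"))  # expect: False
print(rp.can_fetch("*", "https://www.example.com/index.html"))       # expect: True

# The Crawl-delay declared for the given user agent, if any
print(rp.crawl_delay("*"))  # expect: 10

Bear in mind this only tells you what well-behaved crawlers are asked to do; aggressive bots can ignore robots.txt entirely, which is where the .htaccess approach below comes in.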
Below was the solution they implemented themselves.
Hello,
We once again noticed a high CPU load on your server, due to bot crawling. As we further investigated the issue, it appears that one particular bot was causing it: the "80legs" crawler. This crawler is known to be rather aggressive. As it is not a major contributor to search engine rankings or information, we have blocked this bot via the following code in your .htaccess file:
ErrorDocument 503 "Site temporarily disabled for crawling"
RewriteEngine On
RewriteCond %{HTTP_USER_AGENT} ^.*(80legs).*$ [NC]
RewriteRule .* - [R=503,L]
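A quick way to check that the block works is to request a page once with a normal User-Agent and once with one containing "80legs"; the latter should get the 503 back. A rough sketch in Python (the domain is a placeholder and the 80legs-style User-Agent string is only illustrative; the rewrite rule just looks for "80legs" anywhere in it):

import urllib.request
import urllib.error

def status_for(user_agent):
    # Placeholder domain - replace with your own site
    req = urllib.request.Request("https://www.example.com/",
                                 headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(req) as resp:
            return resp.getcode()
    except urllib.error.HTTPError as e:
        return e.code

print(status_for("Mozilla/5.0"))                                        # expect: 200
print(status_for("008/0.83 (+http://www.80legs.com/webcrawler.html)"))  # expect: 503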