Wiki's robots.txt inappropriate
Original Reporter info from Mantis: jwdietrich @jwdietrich21
- Reporter name: Johannes W. Dietrich
Description:
The robots.txt file of the Lazarus wiki should be changed.
In the current version it is:
User-agent: *
Crawl-delay: 1
Disallow: /index.php
Disallow: /Special:WhatLinksHere
Disallow: /Special:Contributions/
Disallow: /Special:Upload
Disallow: /Special:Log
Disallow: /skins/
The first Disallow rule ("Disallow: /index.php") prevents some crawlers, e.g. that of the important Internet Archive, from accessing the site (although others, including Google, aren't blocked).
This happens because article URLs like http://wiki.lazarus.freepascal.org/FPSpreadsheet are in fact shortcuts for URLs like http://wiki.lazarus.freepascal.org/index.php?title=FPSpreadsheet .
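The effect of the rule can be checked locally. The following is a minimal sketch using Python's standard urllib.robotparser, which does plain prefix matching; the helper name is_allowed and the agent string "ExampleBot" are made up for illustration, and real crawlers (including the Internet Archive's) may interpret robots.txt differently.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt currently served by the wiki, as quoted in this report.
CURRENT_RULES = """\
User-agent: *
Crawl-delay: 1
Disallow: /index.php
Disallow: /Special:WhatLinksHere
Disallow: /Special:Contributions/
Disallow: /Special:Upload
Disallow: /Special:Log
Disallow: /skins/
"""


def is_allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text.

    `agent` is a hypothetical crawler name; it falls under "User-agent: *".
    """
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


short_url = "http://wiki.lazarus.freepascal.org/FPSpreadsheet"
long_url = "http://wiki.lazarus.freepascal.org/index.php?title=FPSpreadsheet"

print(is_allowed(CURRENT_RULES, short_url))  # True: no rule matches /FPSpreadsheet
print(is_allowed(CURRENT_RULES, long_url))   # False: matched by "Disallow: /index.php"
```

A crawler that resolves the short URL to its index.php form before checking robots.txt would therefore treat every article as disallowed.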
A more appropriate version would be:
User-agent: *
Crawl-delay: 1
Disallow: /Special:WhatLinksHere
Disallow: /Special:Contributions/
Disallow: /Special:Upload
Disallow: /Special:Log
Disallow: /skins/
Disallow: /User:
Disallow: /Talk:
Disallow: /User talk:
Disallow: /index.php?title=/Special:WhatLinksHere
Disallow: /index.php?title=/Special:Contributions/
Disallow: /index.php?title=/Special:Upload
Disallow: /index.php?title=/Special:Log
Disallow: /index.php?title=/skins/
Disallow: /index.php?title=/User:
Disallow: /index.php?title=/Talk:
Disallow: /index.php?title=/User talk:
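The proposed version can be checked the same way. This is again only a sketch with Python's urllib.robotparser (prefix matching); is_allowed and "ExampleBot" are illustrative names, not part of any crawler.

```python
from urllib.robotparser import RobotFileParser

# The proposed robots.txt, copied from this report without changes.
PROPOSED_RULES = """\
User-agent: *
Crawl-delay: 1
Disallow: /Special:WhatLinksHere
Disallow: /Special:Contributions/
Disallow: /Special:Upload
Disallow: /Special:Log
Disallow: /skins/
Disallow: /User:
Disallow: /Talk:
Disallow: /User talk:
Disallow: /index.php?title=/Special:WhatLinksHere
Disallow: /index.php?title=/Special:Contributions/
Disallow: /index.php?title=/Special:Upload
Disallow: /index.php?title=/Special:Log
Disallow: /index.php?title=/skins/
Disallow: /index.php?title=/User:
Disallow: /index.php?title=/Talk:
Disallow: /index.php?title=/User talk:
"""


def is_allowed(rules: str, url: str, agent: str = "ExampleBot") -> bool:
    """Return True if the hypothetical crawler `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, url)


base = "http://wiki.lazarus.freepascal.org"

# Articles are reachable through both URL forms.
print(is_allowed(PROPOSED_RULES, base + "/FPSpreadsheet"))                  # True
print(is_allowed(PROPOSED_RULES, base + "/index.php?title=FPSpreadsheet"))  # True

# Special pages stay blocked in their short form.
print(is_allowed(PROPOSED_RULES, base + "/Special:Upload"))                 # False

# The long form without a slash after "title=" is not matched by the
# "title=/Special:..." patterns, so this parser reports it as allowed.
print(is_allowed(PROPOSED_RULES, base + "/index.php?title=Special:Upload")) # True
```

Under this parser the "title=/..." patterns only match URLs that literally contain a slash after "title=", so whether the leading slash is intended may be worth double-checking before the file is deployed.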
Steps to reproduce:
Access http://wiki.lazarus.freepascal.org/robots.txt, then try to save the site at https://archive.org/web/
Mantis conversion info:
- Mantis ID: 26047
- Monitored by: » @jwdietrich21 (Johannes W. Dietrich)