With great power comes great responsibility
Most of the basic settings are described on the Options and Rules pages. Below are the more advanced and obscure settings which are found on the 'Advanced' tab.
You can change the user-agent string to make Scrutiny appear to the server to be a browser (known as 'spoofing'). Choose one of the regular browsers from the drop-down menu or paste in one of your own.
The contents of this box are sent as the Accept-Language request header. Usually, * is the best setting. Occasionally, a website may use this field to determine which language to send. If this is the case with your site, then you can control which language version of your site is scanned.
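To see what these two settings amount to on the wire, here is a minimal sketch using Python's standard library. It is not Scrutiny's own code. The function name `build_request`, the example URL and the Safari-style user-agent string are all illustrative assumptions.

```python
from urllib.request import Request

# Illustrative browser-style user-agent string (an assumption, not a value taken from Scrutiny).
SAFARI_LIKE_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15"
)

def build_request(url, user_agent=SAFARI_LIKE_UA, accept_language="*"):
    """Return a request that carries the chosen User-Agent and Accept-Language headers."""
    return Request(url, headers={
        "User-Agent": user_agent,            # what the crawler claims to be
        "Accept-Language": accept_language,  # "*" means any language is acceptable
    })

# Ask for the French version of a site that switches language on this header.
req = build_request("https://www.example.com/", accept_language="fr")
```

The request is only built here, not sent, so the sketch can be run without touching a server.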
If you're getting timeouts you should first reduce the number of threads you're using.
Your server may not respond to many simultaneous requests - it may have trouble coping, or it may deliberately stop responding if it is being bombarded from the same IP. If you get many timeouts at the same time, there are a couple of things you can do. First of all, move the threads slider to the extreme left. Scrutiny will then send one request at a time and process the result before sending the next. This alone may work.
If not, then you can now specify the maximum number of requests that Scrutiny makes per minute.
You don't need to do any maths; it's not 'per thread'. Scrutiny will calculate things according to the number of threads you've set (and using a few threads will help to keep things running smoothly). It will reduce the number of threads if appropriate for your specified maximum requests.
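The idea of one limit shared by all threads, rather than a limit per thread, can be sketched as follows. This is only an illustration of the general technique, not a description of how Scrutiny implements it. The class name `RateLimiter` and its parameters are hypothetical.

```python
import threading
import time

class RateLimiter:
    """Space out requests so that all threads together stay under a per-minute cap."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / max_per_minute  # seconds between request start times
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._next_slot = None  # earliest time at which the next request may start

    def wait(self):
        """Block until the calling thread is allowed to send its request."""
        with self._lock:
            now = self._clock()
            if self._next_slot is None or self._next_slot < now:
                self._next_slot = now
            delay = self._next_slot - now
            self._next_slot += self.interval  # reserve the following slot
        if delay > 0:
            self._sleep(delay)  # sleep outside the lock so other threads can reserve slots
```

Every worker thread would call `wait()` on the same limiter before each request. With a maximum of 30 per minute, requests start at least two seconds apart no matter how many threads are running.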
If your server is simply being slow to respond or your connection is busy, you can increase the timeout (in seconds).
This slider sets the number of requests that Scrutiny can make at once. Using more threads may crawl your site faster, but it will use more of your computer's resources and your internet bandwidth, and also hit your website harder.
Using fewer will allow you to use your computer while the crawl is going on with the minimum disruption.
The default is 12, the minimum is 1 and the maximum is 40. Start with the slider about half way and experiment to find the optimum crawl speed.
Beware - your site may start to give timeouts, errors or other problems if you have this setting too high. In some cases, too many threads may stop the server from responding at all, or from responding to your IP. If moving the number of threads to the minimum doesn't cure this problem, see 'Timeout and Rate limiting'.
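As a rough picture of what the threads setting controls, the sketch below checks a list of urls with a fixed number of worker threads. It assumes nothing about Scrutiny's internals. The names `clamp_threads` and `check_all` are hypothetical, and `fetch` stands for whatever function requests a single url.

```python
from concurrent.futures import ThreadPoolExecutor

# Limits quoted above: minimum 1, default 12, maximum 40.
MIN_THREADS, DEFAULT_THREADS, MAX_THREADS = 1, 12, 40

def clamp_threads(requested=DEFAULT_THREADS):
    """Keep the requested thread count within the slider's range."""
    return max(MIN_THREADS, min(MAX_THREADS, requested))

def check_all(urls, fetch, threads=DEFAULT_THREADS):
    """Apply fetch to every url, with at most `threads` requests in flight at once."""
    with ThreadPoolExecutor(max_workers=clamp_threads(threads)) as pool:
        return list(pool.map(fetch, urls))  # results come back in the order of urls
```

With `threads=1` the requests are made strictly one after another, which is the gentlest setting for the server.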
If your first scan didn't proceed or finish as expected, here are some important settings which may need changing to suit your site.
The default settings of these controls will suit most sites, but if you have problems, read the descriptions below and decide whether you may need to change them.
The querystring is information within the url of a page. It follows a '?' - for example www.mysite.co.uk/index.html?thisis=thequerystring. If you don't use querystrings on your site, then it won't matter whether you set this option. If your page is the same with or without the querystring (for example, if it contains a session id) then check 'ignore querystrings'. If the querystring determines which page appears (for example, if it contains the page id) then you shouldn't ignore querystrings, because Scrutiny won't crawl your site properly.
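The effect of ignoring querystrings can be illustrated with a short sketch using Python's standard `urllib.parse` module. The function name `strip_querystring` is hypothetical and this is not Scrutiny's own code.

```python
from urllib.parse import urlsplit

def strip_querystring(url):
    """Return the url with everything after the '?' removed."""
    return urlsplit(url)._replace(query="").geturl()

# Two urls that differ only in their session id collapse to the same page.
a = strip_querystring("http://www.mysite.co.uk/index.html?sid=111")
b = strip_querystring("http://www.mysite.co.uk/index.html?sid=222")
```

Both `a` and `b` become http://www.mysite.co.uk/index.html, so the page is logged once. The same treatment would wrongly merge index.html?page=1 and index.html?page=2, which is why the option must stay off when the querystring selects the page.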
If you have to allow querystrings because there's a page id in there, but a session id or some other parameter is causing the crawl to go on forever, then Scrutiny now has the option to ignore only the session id (or another single parameter). See the 'Advanced' tab.
Some content management systems have urls where a page name is included in the starting url but has no file extension, e.g. mysite.com/mypage/. If this is your starting url, Scrutiny cannot know whether it is a directory or a page. If it's a directory, Scrutiny would limit its scan to the directory /mypage/. If it's a page, the scan would be limited to mysite.com. This situation is auto-detected and you should be asked the question when it arises, but you can also change the setting manually here.
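The difference the setting makes to the scope of the scan can be sketched like this. The function name `scan_scope` and its `is_directory` flag are hypothetical, and the sketch only mirrors the behaviour described above rather than Scrutiny's actual logic.

```python
from urllib.parse import urlsplit

def scan_scope(start_url, is_directory):
    """Return the url prefix that a link must start with to be inside the scan."""
    parts = urlsplit(start_url)
    site = f"{parts.scheme}://{parts.netloc}"
    if is_directory:
        # Treat the last path component as a folder and stay inside it.
        path = parts.path if parts.path.endswith("/") else parts.path + "/"
        return site + path
    # Treat it as a page, so the whole site is in scope.
    return site + "/"

as_directory = scan_scope("https://mysite.com/mypage/", is_directory=True)
as_page = scan_scope("https://mysite.com/mypage/", is_directory=False)
```

Here `as_directory` is https://mysite.com/mypage/ and `as_page` is https://mysite.com/, so the wrong choice either cuts the scan short or lets it spread over the whole site.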
foo.com is considered the same as foo.com/ with this setting switched on. For most sites it'll be correct to leave this setting on, but some sites are touchy about the trailing slash being present or not.
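In code terms, the comparison might look like the minimal sketch below. The function name `same_page` and its flag are hypothetical.

```python
def same_page(a, b, ignore_trailing_slash=True):
    """Compare two urls, optionally treating a trailing slash as insignificant."""
    if ignore_trailing_slash:
        a, b = a.rstrip("/"), b.rstrip("/")
    return a == b
```

With the setting on, `same_page("foo.com", "foo.com/")` is true. With it off, the two are treated as different urls, which is what a site that is touchy about the trailing slash needs.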
If a page requires javascript to populate some or all content, it may display its 'noscript' text in browsers that have javascript disabled and that may be what Scrutiny sees. If your site requires javascript to be switched on before content or navigation links are visible then Scrutiny can render the page before scanning the page.
Check 'Site is inaccessible without clientside rendering' to switch on this feature. The scan will be slower and use far more resources, so only use this option if you're sure that it's necessary.
Note that 'onload' scripts will be executed, but Scrutiny can't perform user actions like clicking menus or scrolling, and it can't trawl through javascript searching for links.
This setting will ignore the session id within the querystring, but leave the rest of the querystring intact. Some sites assign a session id in the querystring (?sid=12345) and this may change during the scan, leading to the same page being logged many times and the scan never finishing. One solution is to set 'ignore querystrings', but sometimes this isn't possible because other parameters within the querystring are essential. Using this setting, you can ask Scrutiny to remove just the session id (or any other single parameter) from the querystring.
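Removing one parameter while keeping the rest can be sketched with Python's standard `urllib.parse` module. The function name `drop_parameter` and the `page` parameter in the example are illustrative assumptions, not part of Scrutiny.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

def drop_parameter(url, name):
    """Return the url with the named querystring parameter removed and the others kept."""
    parts = urlsplit(url)
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key != name]
    return parts._replace(query=urlencode(kept)).geturl()

cleaned = drop_parameter("http://www.mysite.co.uk/index.html?page=7&sid=12345", "sid")
```

The result keeps `page=7`, so each page is still distinguished by its page id, while urls that differ only in `sid` are treated as the same page.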