Scrutiny 9 Preferences and global settings

You choose


preferences menu item

General

User Agent String

You can change the user-agent string to make Scrutiny appear to the server to be a browser (known as 'spoofing'). Choose one of the regular browsers from the drop-down menu or paste in one of your own.
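As a rough illustration of what spoofing means at the HTTP level, the sketch below builds a request that carries a browser-style User-Agent header. It is not Scrutiny's own code; the function name `build_request` and the sample Safari-style string are assumptions chosen for the example.

```python
import urllib.request

# Illustrative browser-style string (an assumption for this example);
# any user-agent string could be pasted here instead.
SAFARI_LIKE_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15"
)

def build_request(url: str, user_agent: str) -> urllib.request.Request:
    """Return a request object whose User-Agent header is the given string."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})

# The request is only constructed here; nothing is sent over the network.
req = build_request("https://example.com/", SAFARI_LIKE_UA)
```

A server that receives this request sees the supplied string rather than the default one of the HTTP library.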

Tolerance

It's pretty common for pages to contain relative links like '../somepage.html', where ../ means 'the directory above'. If this literally takes the user above the root of the domain, then it is technically an error. Most browsers are tolerant of this and assume that you mean the root (which you probably do). But it's a situation you might want to know about and fix. Switching on 'Ignore ../ that travels above the domain' makes Scrutiny tolerant of this in the same way that browsers usually are.
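To make the situation concrete, here is a minimal sketch of how a checker might detect a '../' that climbs above the root, alongside the tolerant resolution that browsers apply. The helper names `travels_above_root` and `resolve_tolerantly` are hypothetical, the code assumes `href` is a plain relative path (no scheme, query or fragment), and it does not describe how Scrutiny itself works.

```python
from urllib.parse import urljoin, urlsplit

def travels_above_root(page_url: str, href: str) -> bool:
    """Return True if a relative href climbs above the site root."""
    # Count the directories the linking page sits in (drop its file name).
    base_path = urlsplit(page_url).path
    depth = len([s for s in base_path.split("/")[:-1] if s])
    for segment in href.split("/"):
        if segment == "..":
            depth -= 1
            if depth < 0:
                return True  # stepped above the root of the domain
        elif segment not in ("", "."):
            depth += 1
    return False

def resolve_tolerantly(page_url: str, href: str) -> str:
    """Resolve the link as browsers do, dropping any excess '../'."""
    # urljoin follows RFC 3986 (Python 3.5+), which discards '..'
    # segments that would go above the root.
    return urljoin(page_url, href)
```

With these helpers, a link to '../somepage.html' on a page at the root is flagged as travelling above the domain, while the tolerant resolution still yields a url at the root.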

Limits

The settings for 'stop at X links' and 'crawl maximum X clicks from home' prevent Scrutiny from running for ever. There are a number of reasons why a web crawler might find itself in a recursive loop. Each site you create can have its own values in the site's settings, but the defaults are set here.
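The effect of these two limits can be sketched with a breadth-first crawl over an in-memory link graph. This is only an illustration under simple assumptions: the function `crawl`, its parameters `max_links` and `max_depth`, and the dictionary of links are all hypothetical, and a real crawler fetches pages rather than reading a dictionary.

```python
from collections import deque

def crawl(links, start, max_links=1000, max_depth=5):
    """Breadth-first crawl that stops at max_links urls or max_depth clicks.

    links maps each page to the list of pages it links to.
    """
    seen = {start}
    queue = deque([(start, 0)])  # (page, clicks from the start page)
    while queue:
        page, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not follow links any further from home
        for target in links.get(page, []):
            if target in seen:
                continue  # already found; this also breaks simple loops
            if len(seen) >= max_links:
                return seen  # link limit reached
            seen.add(target)
            queue.append((target, depth + 1))
    return seen

# A small site where page "b" links back to "a".
site = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["e"]}
```

The 'seen' set handles links that point back to pages already found, but a site that generates endless new urls (a calendar, for example) would never terminate without the two limits.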

'Don't log more than X occurrences of each link' can save memory when scanning extremely large sites.

Options

With Autosave switched on, data is autosaved for each site you scan (only the most recent scan for each of your sites). Beware: if you scan a very large site, this will use a significant amount of disk space. As an alternative, you can save data manually by choosing File > Save Data and load it back in using File > Open.

Only the data for the most recent scan of any site is kept, but your autosaved data can build up if you scan large sites or scan different sites regularly. You can manage this data using Tools > Manage Autosave Data (⌘5).

'Show charts while scan is running' shows pie and radar charts by the progress bar while a scan is in progress. The scan will be a little more efficient with this setting switched off.

Labels

'Labels' refers to the colouring of the table rows. Red means a server error (a 5xx server response code) or a client error (a 4xx server response code). Orange is a warning, usually meaning a redirect (a 3xx server response code). Click the colour wells to change the colours if you want to. You can switch off the colour labelling altogether. If you don't want to see the orange warning for redirects, you can change the colour to white or transparent, or if you're not interested in redirects at all, you can switch this feature off on the Preferences > Links tab.
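The mapping from response code to default label colour described above can be written as a small function. This is a sketch only; the name `label_for_status` is hypothetical, and orange is applied here to every 3xx code even though the text says the warning only usually means a redirect.

```python
from typing import Optional

def label_for_status(code: int) -> Optional[str]:
    """Return the default label colour for an HTTP response code."""
    if 400 <= code <= 599:
        return "red"     # client error (4xx) or server error (5xx)
    if 300 <= code <= 399:
        return "orange"  # warning, usually a redirect (3xx)
    return None          # no label for other codes, such as 200
```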

Verbosity

Any setting other than silent causes text to be sent to the Console and a logfile. This takes time and disk space, so only use a setting other than silent if you experience problems that you want to analyse. This setting will be reduced if necessary to 'a few warnings' on launch.

Links

Display 'Appears on' as

Certain columns of certain tables show which page each occurrence of each link appears on. These columns can display the url or the title of the page in question.

Redirects

It's not uncommon for a link's url to redirect to another page. There may even be more than one redirect before it reaches its final destination. Some webmasters like to correct links so that they point at that final destination. Others don't consider this a problem at all. If you like, you can switch off the reporting of such redirects and only see the final url and status.
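A redirect chain and its final destination can be illustrated with a small sketch that works on canned responses instead of real HTTP requests. The function `resolve_redirects`, the `max_hops` parameter and the sample urls are assumptions made for the example.

```python
def resolve_redirects(responses, url, max_hops=10):
    """Follow redirects and return (chain of urls, final status).

    responses maps a url to a (status, location) pair, where location
    is the redirect target or None.
    """
    chain = [url]
    status = None
    for _ in range(max_hops):
        status, location = responses[url]
        if not (300 <= status < 400 and location):
            return chain, status  # reached the final destination
        url = location
        chain.append(url)
    # Gave up after max_hops; status is that of the last redirect seen.
    return chain, status

# Two redirects before the final page.
canned = {
    "/old": (301, "/newer"),
    "/newer": (302, "/final"),
    "/final": (200, None),
}
chain, final_status = resolve_redirects(canned, "/old")
```

Reporting redirects corresponds to showing the whole chain; switching the reporting off corresponds to showing only the last url in the chain and its status.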

Check for soft 404s

A 'soft 404' is a page that looks like an error page, but which returns a 200 response code. The page may say 'Page not found' or it may be a default page such as the home page or a special page set up for the purpose.

If the page doesn't state that the requested page hasn't been found, it's confusing for the visitor. And unless the page returns a 404 or 410 code, it's very difficult for a web crawler to detect the broken link.

Google doesn't like such pages - they don't want to index a page which isn't the intended page. It's best if your site returns a 404 or 410 code when a page isn't found.

If your site does return soft 404s, or if you want to find external links that lead to soft 404s, Scrutiny can try to spot and highlight them. It does this by searching the page for certain text (such as 'page not found'). If you know that your site generates soft 404s, switch the feature on and type into the box a piece of text that is found only on your soft 404 page.
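The text-search idea can be sketched as follows. This is a minimal illustration, not Scrutiny's implementation: the function `is_soft_404`, the `markers` parameter and the case-insensitive match are assumptions.

```python
def is_soft_404(status: int, body: str, markers=("page not found",)) -> bool:
    """Guess whether a page is a soft 404.

    A soft 404 returns 200 but contains text that marks it as an error page.
    """
    if status != 200:
        return False  # a real 404 or 410 is already reported as an error
    text = body.lower()
    return any(marker.lower() in text for marker in markers)
```

The more specific the marker text is to the error page, the fewer ordinary pages will be flagged by mistake.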

When scanning a secure (https) site

There are many ways that Scrutiny can help you to find insecure links or insecure content - details are here.

Sitemap

Here are some basic settings which configure the xml sitemap file that Scrutiny produces. You can include images if you like, and include pdf files as if they were web pages. Note that, at present, if you change the lastmod or pdf setting, you'll need to re-scan.

You can also edit the template which wraps your urls in the xml file.
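For orientation, the sketch below builds a minimal xml sitemap with the standard urlset, url, loc and lastmod elements of the sitemaps.org format. It does not reproduce Scrutiny's own template; the function `build_sitemap` and the sample page are assumptions.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

def build_sitemap(pages) -> str:
    """Build sitemap xml from (url, lastmod) pairs; lastmod may be None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            # lastmod is optional in the sitemap format
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")

sitemap_xml = build_sitemap([
    ("https://example.com/", "2024-01-01"),
    ("https://example.com/about.html", None),
])
```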