How to write a link checker in the browser with Vanilla JavaScript

TL;DR: A simple link checker run from a snippet or the console has some secondary advantages, such as jumping to the links and showing CSP and CORB errors.

I’ve been experimenting more with JavaScript and working more from the console.

It was great to see Santhosh Tuppad using a lot of JavaScript from the console in his security workshop at LTG Gathering. I thought his encouragement for everyone to create a local, unpacked, ad hoc Chrome extension was a good idea.

I created a YouTube tutorial showing how to create a Chrome extension:

https://www.youtube.com/watch?v=Olz4wo-ILwI

And I have a couple of example extensions on GitHub:

one that I’m building with Viv Richards: https://github.com/eviltester/usefuljssnippetextension
one that I use for my own ad hoc experiments: https://github.com/eviltester/javascriptbotshowcase

Working with Chrome extensions gives you access to a few more APIs, which avoid some of the cross-origin restrictions that browsers place on page scripts to limit cross-site attacks.

External Link Checkers

I use link checkers like Total Validator to externally crawl my site for errors.

External crawlers are important for finding the HTTP status of each page, e.g. 404 or 200.

Building a Link Checker

As a quick experiment, I wanted to see how much of a link checker I could build in JavaScript and run from snippets.

I have uploaded all the code as a Gist:

https://gist.github.com/eviltester/8a27ca23f7475d6b47fc99fc11ad3198

Find all the links

In essence, what I do is:

find all the links
iterate over them

Which looks like this:

var links = document.querySelectorAll("a"); 
var linkReport = [];
links.forEach(function(link){
    var reportLine = {url: link.getAttribute('href'), status:0, message : "", element : link};
    linkReport.push(reportLine);
    // do stuff to the reportLine and link here
});
console.table(linkReport);

You could run this from the console, or add it as a snippet.

When it finishes, it uses console.table to output all the report objects.

The objects are created in the line:

var reportLine = {url: link.getAttribute('href'), status:0, message : "", element : link};

In this form it doesn’t really do anything, but…

…if anything caught your eye in the table, say the link at index 70, then you could, in the console…

Scroll it into view:

linkReport[70].element.scrollIntoView()

And highlight it on screen with:

linkReport[70].element.style.backgroundColor = "red"

So it might be useful in that simple form.
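The two console commands above could be wrapped in a small helper so that jumping to a row from the table is a single call. A minimal sketch; highlightLink is a hypothetical name, and it assumes report lines shaped like the ones built above, with the anchor stored in the element property:

```javascript
// Hypothetical helper: scroll a report line's link into view and highlight it.
// Returns the previous background colour so the highlight can be undone later.
function highlightLink(report, index, colour = "red") {
  const element = report[index].element;
  const previous = element.style.backgroundColor;
  element.scrollIntoView({ block: "center" }); // centre the link in the viewport
  element.style.backgroundColor = colour;
  return previous;
}
```

In the console, var before = highlightLink(linkReport, 70); highlights the link, and linkReport[70].element.style.backgroundColor = before; restores it.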

Checking Links

I wanted to check the links by making a HEAD request.

I knew this wouldn’t work for all links, because the browser blocks some cross-origin requests (through CORS and Content Security Policy) to protect against cross-site attacks.

Extensions like Check My Links use Chrome extension APIs to avoid these cross-origin restrictions.

But I carried on regardless to see if anything interesting would happen.
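Not every anchor is worth a HEAD request. getAttribute('href') returns the attribute exactly as written, so it can be null (an anchor with no href), a relative path, a same-page #fragment, or a non-HTTP scheme such as mailto: or javascript:. A minimal sketch of a filter, using a hypothetical toCheckableUrl function that resolves the value with the standard URL constructor and returns null for anything that is not http: or https: (in the browser the base would be document.baseURI):

```javascript
// Hypothetical helper: resolve an href against a base URL and return it only
// if it is an http: or https: URL that a HEAD request could check.
function toCheckableUrl(href, baseUrl) {
  if (href === null || href.trim() === "") return null; // anchor without a usable href
  if (href.trim().startsWith("#")) return null;         // same-page fragment
  let url;
  try {
    url = new URL(href, baseUrl); // resolves relative paths against the base
  } catch (e) {
    return null;                  // not parseable as a URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") {
    return null;                  // mailto:, tel:, javascript: and so on
  }
  url.hash = "";                  // the fragment is never sent to the server
  return url.href;
}

// Browser usage (not run here):
// var url = toCheckableUrl(link.getAttribute('href'), document.baseURI);
// if (url !== null) { /* create the reportLine and check it */ }
```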

I initially used XMLHttpRequest:

    var http = new XMLHttpRequest();
    http.open('HEAD', reportLine.url);

    http.onreadystatechange = (function(line, xhttp) {
        return function(){
            if (xhttp.readyState == xhttp.DONE) {
                line.status = xhttp.status;
                line.message = xhttp.responseText + xhttp.statusText;
                linksChecked++;
                console.table(xhttp);
            }
        }
    })(reportLine, http);
    http.send();

This logs each XMLHttpRequest object to the console as its request completes.

Because this is callback-based, outputting the table straight after the loop would miss most of the request statuses. So I maintain a count of links checked (linksChecked++) and added a polling mechanism after the loop:

var finishReport = setInterval(function(){
    if(linksChecked >= linkReport.length){
        console.table(linkReport);
        clearInterval(finishReport);
    }
}, 3000);

This way, the final console.table report is only shown once the number of links checked matches the number of links in the array.
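Polling works, but the final report could instead be printed as soon as the last check has finished, by collecting one promise per link and waiting on Promise.allSettled, which resolves once every promise has either fulfilled or rejected. This is a swapped-in technique rather than the approach used in this post; a minimal sketch, assuming a hypothetical runChecks function and a check function that takes a report line, fills it in, and returns a promise:

```javascript
// Hypothetical helper: run a check for every report line and resolve with the
// report once all of them have settled, instead of polling with setInterval.
function runChecks(report, check) {
  const pending = report.map(function (line) {
    // Promise.resolve().then(...) also turns a synchronous throw into a rejection.
    return Promise.resolve().then(function () { return check(line); });
  });
  return Promise.allSettled(pending).then(function (results) {
    results.forEach(function (result, index) {
      if (result.status === "rejected") {
        report[index].message = String(result.reason); // keep the error text in the report
      }
    });
    return report;
  });
}

// Browser usage (not run here):
// runChecks(linkReport, someCheck).then(function (report) { console.table(report); });
```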

Simple link checker using XMLHttpRequest

This gives me a simple link checker:

var links = document.querySelectorAll("a");
var linkReport = [];
var linksChecked = 0;
links.forEach(function(link){
    var http = new XMLHttpRequest();
    var reportLine = {url: link.getAttribute('href'), status: 0, message: "", element: link};

    http.open('HEAD', reportLine.url);
    linkReport.push(reportLine);

    http.onreadystatechange = (function(line, xhttp) {
        return function(){
            if (xhttp.readyState == xhttp.DONE) {
                line.status = xhttp.status;
                linksChecked++;
                line.message = xhttp.responseText + xhttp.statusText;
                console.table(xhttp);
            }
        }
    })(reportLine, http);
    http.send();
});
var finishReport = setInterval(function(){
    if(linksChecked >= linkReport.length){
        console.table(linkReport);
        clearInterval(finishReport);
    }
}, 3000);

Again, I can scroll to a link and make it visible.

One of the issues I have with Check My Links is that when a link fails, it can sometimes be hard to find on screen. This way, I can use JavaScript in the console to jump to it.
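The XMLHttpRequest callback could also be wrapped in a Promise, which makes each check awaitable and lets it be combined with Promise.all or Promise.allSettled. A minimal sketch, assuming a hypothetical headStatus function; it listens for loadend, which fires once the request has finished whether it loaded, failed, was aborted or timed out. The createXhr parameter is an assumption that only exists so a fake can be substituted outside the browser:

```javascript
// Hypothetical helper: make a HEAD request with XMLHttpRequest and resolve
// with the status once the request has finished, successfully or not.
function headStatus(url, createXhr = () => new XMLHttpRequest()) {
  return new Promise(function (resolve) {
    const xhr = createXhr();
    xhr.open("HEAD", url); // a synchronous throw here rejects the promise
    // loadend fires after load, error, abort or timeout, so this always settles.
    xhr.onloadend = function () {
      // status is 0 when the browser blocked the request or it failed at network level.
      resolve({ url: url, status: xhr.status, message: xhr.statusText });
    };
    xhr.send();
  });
}
```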

Using Fetch

I thought I’d try with Fetch and see how different the output was:

    fetch(reportLine.url, {
        method: 'HEAD'
    })
    .then(function(response) {
        linksChecked++;
        reportLine.status = response.status;
        // Headers has no enumerable properties, so convert it to a plain
        // object first, otherwise JSON.stringify just returns "{}"
        reportLine.message = response.statusText + " | " +
                             response.type + " | " +
                             (response.redirected ? "redirected | " : "") +
                             JSON.stringify(Object.fromEntries(response.headers));
        console.table(response);
    })
    .catch(function(error){
        // store the error text so it reads well in console.table
        reportLine.message = String(error);
        console.table(error);
        linksChecked++;
    });

This was a little easier to use, and the response had more useful information, so I crudely concatenated the response fields I was interested in into the message property of the report line.
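When turning a Response into a report message, note that a Headers object keeps its entries internally rather than as enumerable properties, so passing it straight to JSON.stringify yields {}. A minimal sketch of a hypothetical describeResponse helper that converts the headers with Object.fromEntries first; for cross-origin responses the browser only exposes the CORS-safelisted headers unless the server sends Access-Control-Expose-Headers:

```javascript
// Hypothetical helper: summarise the useful parts of a fetch Response.
// Uses only standard Response/Headers properties (browsers, Node 18+).
function describeResponse(response) {
  // Headers iterates as [name, value] pairs, so Object.fromEntries can
  // turn it into a plain object that JSON.stringify understands.
  const headers = Object.fromEntries(response.headers);
  return [
    response.statusText,
    response.type,                            // e.g. "basic", "cors" or "opaque"
    response.redirected ? "redirected" : "",
    JSON.stringify(headers)
  ]
    .filter(function (part) { return part !== ""; }) // drop empty fields
    .join(" | ");
}
```

Inside the then callback this would be reportLine.message = describeResponse(response);. Filtering empty fields matters because statusText is often empty for HTTP/2 responses, which carry no reason phrase.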

Errors

When the link checker runs, it shows me all the CSP errors in the console:

VM14:1 Refused to connect to 'https://help.github.com/' because it violates the document's Content Security Policy.

And all the CORB errors:

Cross-Origin Read Blocking (CORB) blocked cross-origin response https://gist.githubusercontent.com/eviltester with MIME type text/plain. See https://www.chromestatus.com/feature/5629709824032768 for more details.

This was a useful side effect. The table report shows a status of 0 for these links, but I can look in the console for the underlying errors.

I don’t see these warnings with external link checkers, and it is important to be able to check that the various cross-origin security policies (CSP, CORS, CORB) are in place, or have been deliberately relaxed for some servers as appropriate.

I don’t think I have any other tools which provide me with this information easily.
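Since a status of 0 lumps together every blocked or failed request, it can help to label each report line before printing the table, so the rows that need a look in the console stand out. A minimal sketch with hypothetical classifyStatus and problemIndexes functions; the category labels are illustrative, since the browser only supplies the number:

```javascript
// Hypothetical helper: turn a numeric status into a rough, illustrative label.
function classifyStatus(status) {
  if (status === 0) return "blocked or failed (see console)";
  if (status >= 200 && status < 300) return "ok";
  if (status >= 300 && status < 400) return "redirect";
  if (status >= 400 && status < 500) return "client error";
  if (status >= 500) return "server error";
  return "unknown";
}

// Hypothetical helper: indexes of report lines worth investigating
// (blocked/failed requests and HTTP error statuses).
function problemIndexes(report) {
  const indexes = [];
  report.forEach(function (line, index) {
    if (line.status === 0 || line.status >= 400) indexes.push(index);
  });
  return indexes;
}

// Browser usage (not run here):
// linkReport.forEach(function (line) { line.category = classifyStatus(line.status); });
// console.table(linkReport);
// problemIndexes(linkReport); // then e.g. linkReport[i].element.scrollIntoView()
```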

Could I check status for these?

To try to add even more information, I thought I’d see if I could check the status of anything that was throwing errors in the initial run.

So I used a quick hack that I learned in Santhosh’s workshop.

Image tags are often used in XSS attacks to pass information to another site, but I wanted to see whether they could give me any status information.

function imgreport(links){
    links.forEach(function(link){
        if(link.status == 0){
            // trigger error messages with status
            // to the console for status of 0
            var img = new Image();
            img.src = link.url;
        }
    });
}

The above function creates a new image for each link whose fetch failed (status 0) and sets the image’s src to that link’s URL.

Would this provide more information?

It did.

With the Fetch I learned:

Access to fetch at 'https://twitter.com/eviltester' from origin 'https://www.eviltester.com' has been blocked by CORS policy: No 'Access-Control-Allow-Origin' header is present on the requested resource. If an opaque response serves your needs, set the request's mode to 'no-cors' to fetch the resource with CORS disabled.

For the same URL with the image I learned:

GET https://twitter.com/eviltester 403
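The 403 above is only visible in the DevTools console and network log; the page's own script receives nothing more than a load or error event from the image. A minimal sketch that wraps the image trick in a Promise so at least that outcome can be recorded in the report, assuming a hypothetical probeWithImage function (ImageCtor is an assumption that allows a fake Image outside the browser). A page that returns 200 with HTML also fires error, because it is not a valid image:

```javascript
// Hypothetical helper: request a URL through an image and report whether the
// browser could load it as an image. The HTTP status itself is not exposed to
// script; it only appears in the DevTools console and network panel.
function probeWithImage(url, ImageCtor = Image) {
  return new Promise(function (resolve) {
    const img = new ImageCtor();
    img.onload = function () { resolve({ url: url, outcome: "loaded" }); };
    // Fires for HTTP errors, network failures and non-image responses alike.
    img.onerror = function () { resolve({ url: url, outcome: "error" }); };
    img.src = url; // setting src starts the request
  });
}
```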

What else could fetch do?

I had a look at the fetch documentation and saw that it could follow redirects for me (redirect: 'follow' is actually the default, but setting it makes the intent explicit):

    fetch(reportLine.url, {
      method: 'HEAD',
      mode: 'cors',
      redirect: 'follow'
    })

So I added the URL it was redirected to into the report:

        if(response.redirected){
            reportLine.redirectedTo = response.url;
        }

My final code

The final code for my link checker used the fetch version, as it gave more actionable and useful information.

var links = document.querySelectorAll("a");
var linkReport = [];
var linksChecked = 0;

links.forEach(function(link){

    var reportLine = {url: link.getAttribute('href'), status: 0, redirectedTo: "", message: "", element: link};
    linkReport.push(reportLine);

    console.log("HEAD " + reportLine.url);

    fetch(reportLine.url, {
        method: 'HEAD',
        mode: 'cors',
        //mode: 'no-cors',   // no-cors gives opaque responses with status 0
        redirect: 'follow'
    })
    .then(function(response) {
        linksChecked++;
        reportLine.status = response.status;
        // convert Headers to a plain object, otherwise JSON.stringify returns "{}"
        reportLine.message = response.statusText + " | " +
                             response.type + " | " +
                             JSON.stringify(Object.fromEntries(response.headers));
        if(response.redirected){
            reportLine.redirectedTo = response.url;
        }
        console.table(response);
    })
    .catch(function(error){
        // requests blocked by CORS/CSP or failing at network level end up here
        reportLine.message = String(error);
        console.table(error);
        linksChecked++;
    });

});

function imgreport(links){
    links.forEach(function(link){
        if(link.status == 0){
            // trigger error messages with status
            // to the console for status of 0
            var img = new Image();
            img.src = link.url;
        }
    });
}

// poll every 3 seconds until every link has been checked
var finishReport = setInterval(function(){
    if(linksChecked >= linkReport.length){
        console.table(linkReport);
        imgreport(linkReport);
        clearInterval(finishReport);
    }
}, 3000);
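One weakness of the polling approach is that a slow server can hold up the final report for a long time. A minimal sketch of a per-link check that gives up after a timeout, using AbortSignal.timeout (available in current browsers and Node 17.3+); headCheck, its 10 second default and the fetchFn parameter are assumptions for illustration, not part of the code above:

```javascript
// Hypothetical helper: HEAD-check one report line, giving up after timeoutMs.
// Always resolves with the (updated) report line, never rejects.
function headCheck(reportLine, timeoutMs = 10000, fetchFn = fetch) {
  return fetchFn(reportLine.url, {
    method: "HEAD",
    mode: "cors",
    redirect: "follow",
    signal: AbortSignal.timeout(timeoutMs) // aborts with a TimeoutError when time runs out
  })
    .then(function (response) {
      reportLine.status = response.status;
      reportLine.message = response.statusText + " | " + response.type;
      if (response.redirected) reportLine.redirectedTo = response.url;
      return reportLine;
    })
    .catch(function (error) {
      // e.g. a TypeError for CORS/CSP blocks, a TimeoutError for slow servers
      reportLine.status = 0;
      reportLine.message = String(error);
      return reportLine;
    });
}
```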

Not an everyday link checker

I found that a useful exercise.

The link checker report is useful to me because it reveals issues on the page that Chrome only hints at with icons, but which are very visible in the fetch error messages.

Using console.table lets me sort the ‘report’ in the console, which makes the investigation easier, and I learned a bit more about fetch.
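console.table also accepts a second argument listing which columns to show, which keeps the bulky element column out of the way. A minimal sketch of a hypothetical sortedRows helper that copies the report into plain rows sorted by status, keeping each row's original index so linkReport[index].element still works for jumping to the link:

```javascript
// Hypothetical helper: copy the report into plain rows sorted by status,
// keeping the original index so the element can still be found afterwards.
function sortedRows(report) {
  return report
    .map(function (line, index) {
      return {
        index: index,
        url: line.url,
        status: line.status,
        redirectedTo: line.redirectedTo || "",
        message: String(line.message) // Error objects become readable text
      };
    })
    .sort(function (a, b) { return a.status - b.status; }); // status 0 rows first
}

// In the console (not run here):
// console.table(sortedRows(linkReport), ["index", "url", "status", "redirectedTo", "message"]);
```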

All the code is easy to copy and paste to experiment with from this gist: https://gist.github.com/eviltester/8a27ca23f7475d6b47fc99fc11ad3198

And if you wanted to learn a bit more JavaScript then: