Link Checking Module - 1st attempt

March 19th, 2008 by Aaron

So I wrote some code the other day. It sat in my code repository and I never tested it. I was pretty certain it was going to be some good code, though.

A few weeks later I came back to it, looked through it, and laughed! Can anyone figure out where ALL the holes are in this code?

<?php
class linkChecker
{
    protected $_links = array();
 
    protected $_sites = array();
 
 
    public function __construct() 
    {
 
    }
 
    public function addSite($site)
    {
        if (in_array($site, $this->_sites)) {
            throw new linkException("Site already in list");
        }
 
        $this->_sites[] = $site;
    }
 
    public function processSites()
    {
        foreach ($this->_sites as $site) {
            $this->_processLinks($site);
        }
    }
 
    protected function _processLinks($url)
    {
        $this->_addLink($url, $url);
        $d = new DomDocument;
        @$d->loadHTMLFile($url);
        foreach ($d->getElementsByTagName('a') as $link) {
            $this->_addLink($link->getAttribute('href'), $url);
        }
        unset($d);
    }
 
    protected function _addLink($link, $url)
    {
        $l = new checkableLink($link, $url);
        if (!isset($this->_links[$l->url])) {
            $this->_checkLink($l);
            $this->_links[$l->url] = $l;
        }
        unset($l);
    }
 
    protected function _checkLink(checkableLink &$checkableLink)
    {
        $d = new DomDocument;
        $d->loadHTMLFile($checkableLink->url) or $checkableLink->valid = false;
    }
}
 
 
 
 
 
 
class checkableLink
{
    public $host = null;
 
    public $url = null;
 
    public $checked = false;
 
    public $valid = true;
 
    public function __construct($link = null, $url = null)
    {
        if (stripos($link, '/') === 0) {
            $this->url = $url . $link;
        }
        else {
            $this->url = $url;
        }
    }
}
 
class linkException extends exception
{}
?>


2 Responses to “Link Checking Module - 1st attempt”

  1. Todd Says:

    For starters, I think I would validate the sites being added via the public addSite() method. I would open a socket and throw a HEAD request out there just to make sure it existed. Or are you looking for a little more than that ;-)

  2. Aaron Says:

    Yeah - this was just an off-the-top-of-my-head code snippet. I think if I DO write one, I’ll be more apt to look at the various HTTP protocol bits and handle redirects and all that :)
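
The HEAD-request check Todd suggests, together with the redirect handling Aaron mentions, could be sketched roughly as below. This is only an illustration under a few assumptions: the function names (buildHeadRequest, parseHeadResponse, headStatus) are hypothetical and not part of the linkChecker class above, https URLs need PHP's openssl extension, and only absolute and root-relative Location headers are followed.

```php
<?php
// Hypothetical helpers for illustration; none of these exist in the original class.

// Build the raw HTTP/1.1 HEAD request for a host and path.
function buildHeadRequest($host, $path)
{
    return "HEAD {$path} HTTP/1.1\r\n"
         . "Host: {$host}\r\n"
         . "Connection: close\r\n\r\n";
}

// Extract the status code and any Location header from raw response headers.
function parseHeadResponse($raw)
{
    $status   = 0;
    $location = null;
    foreach (explode("\r\n", (string) $raw) as $i => $line) {
        if ($i === 0 && preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        } elseif (stripos($line, 'Location:') === 0) {
            $location = trim(substr($line, strlen('Location:')));
        }
    }
    return array('status' => $status, 'location' => $location);
}

// Send a HEAD request over a socket, following up to $maxRedirects redirects.
// Returns the final HTTP status code, or 0 if the URL is unusable or the
// connection fails.
function headStatus($url, $maxRedirects = 5)
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return 0;
    }
    $scheme = isset($parts['scheme']) ? strtolower($parts['scheme']) : 'http';
    $ssl    = ($scheme === 'https');
    $port   = isset($parts['port']) ? $parts['port'] : ($ssl ? 443 : 80);
    $path   = isset($parts['path']) ? $parts['path'] : '/';
    if (isset($parts['query'])) {
        $path .= '?' . $parts['query'];
    }

    $fp = @fsockopen(($ssl ? 'ssl://' : '') . $parts['host'], $port, $errno, $errstr, 5);
    if (!$fp) {
        return 0;
    }
    stream_set_timeout($fp, 5);
    fwrite($fp, buildHeadRequest($parts['host'], $path));
    $raw = stream_get_contents($fp);
    fclose($fp);

    $r = parseHeadResponse($raw);
    $isRedirect = $r['status'] >= 300 && $r['status'] < 400 && $r['location'] !== null;
    if ($isRedirect && $maxRedirects > 0) {
        $next = $r['location'];
        // Root-relative redirect: reuse the current scheme and host
        // (a non-default port is dropped in this sketch).
        if (strpos($next, '/') === 0 && strpos($next, '//') !== 0) {
            $next = $scheme . '://' . $parts['host'] . $next;
        }
        return headStatus($next, $maxRedirects - 1);
    }
    return $r['status'];
}
```

With something like this in place, addSite() could call headStatus($site) and refuse any site whose status comes back as 0 or in the 4xx/5xx range. Keep in mind that some servers answer HEAD differently from GET, so a failed HEAD is a hint rather than proof that a link is dead.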

©2008 102 Degrees LLC - All Rights Reserved