Link Checking Module - 1st attempt
March 19th, 2008 by Aaron
So I wrote some code the other day. It sat in my code repository and I never tested it. I was pretty certain it was going to be some good code, though.
A few weeks later I came back to it and looked through it - and laughed!! Can anyone figure out where ALL the holes are in this code?
<?php
class linkChecker
{
    protected $_links = array();
    protected $_sites = array();

    public function __construct()
    {
    }

    public function addSite($site)
    {
        if (in_array($site, $this->_sites)) {
            throw new linkException("Site already in list");
        }
        $this->_sites[] = $site;
    }

    public function processSites()
    {
        foreach ($this->_sites as $site) {
            $this->_processLinks($site);
        }
    }

    protected function _processLinks($url)
    {
        $this->_addLink($url, $url);
        $d = new DomDocument;
        @$d->loadHTMLFile($url);
        foreach ($d->getElementsByTagName('a') as $link) {
            $this->_addLink($link->getAttribute('href'), $url);
        }
        unset($d);
    }

    protected function _addLink($link, $url)
    {
        $l = new checkableLink($link, $url);
        if (!isset($this->_links[$l->url])) {
            $this->_checkLink($l);
            $this->_links[$l->url] = $l;
        }
        unset($l);
    }

    protected function _checkLink(checkableLink &$checkableLink)
    {
        $d = new DomDocument;
        $d->loadHTMLFile($checkableLink->url) or $checkableLink->valid = false;
    }
}

class checkableLink
{
    public $host = null;
    public $url = null;
    public $checked = false;
    public $valid = true;

    public function __construct($link = null, $url = null)
    {
        if (stripos($link, '/') === 0) {
            $this->url = $url . $link;
        } else {
            $this->url = $url;
        }
    }
}

class linkException extends exception {}
?>
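One thing any link checker has to get right is turning each href it finds into an absolute URL before checking it (in the code above, that job falls to the checkableLink constructor). Here is a minimal sketch of one way to do it. It assumes PHP 7.1 or later, assumes only http/https links are worth checking, and is a simplified take on the RFC 3986 reference-resolution rules. resolveHref() and removeDotSegments() are hypothetical names, not part of the original module.

```php
<?php
// Hypothetical helpers (not from the original module): resolve an href found
// on $pageUrl into an absolute http(s) URL, or return null to skip it.

function removeDotSegments(string $path): string
{
    // Set aside any query string or fragment; only the path is normalized.
    $cut = strcspn($path, '?#');
    $suffix = substr($path, $cut);
    $path = substr($path, 0, $cut);

    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $segment) {
        if ($segment === '..') {
            // Step up one level, but never above the root.
            if (count($out) > 1) {
                array_pop($out);
            }
        } elseif ($segment !== '.') {
            $out[] = $segment;
        }
    }
    // "/a/b/.." should become "/a/", not "/a".
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = '';
    }
    return implode('/', $out) . $suffix;
}

function resolveHref(string $href, string $pageUrl): ?string
{
    $href = trim($href);
    // Skip empty hrefs and in-page anchors.
    if ($href === '' || $href[0] === '#') {
        return null;
    }
    // Already has a scheme: keep http/https, skip mailto:, javascript:, etc.
    if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
        return preg_match('#^https?://#i', $href) ? $href : null;
    }

    $base = parse_url($pageUrl);
    if ($base === false || !isset($base['host'])) {
        return null; // the page URL itself is not absolute
    }
    $scheme = $base['scheme'] ?? 'http';
    if (substr($href, 0, 2) === '//') {
        // Protocol-relative: borrow the page's scheme.
        return $scheme . ':' . $href;
    }

    $origin = $scheme . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');
    $basePath = $base['path'] ?? '/';

    if ($href[0] === '/') {
        $path = $href;                  // root-relative
    } elseif ($href[0] === '?') {
        $path = $basePath . $href;      // query only: same page, new query
    } else {
        // Document-relative: resolve against the page's directory.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $href;
    }
    return $origin . removeDotSegments($path);
}
```

With this, "../img/a.png" found on http://example.com/blog/post.html resolves to http://example.com/img/a.png, and anchors or mailto: links are skipped instead of being checked.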
Tags: PHP
This entry was posted on Wednesday, March 19th, 2008 at 6:11 pm and is filed under PHP • Website Monitoring Project.

April 17th, 2008 at 8:59 am
For starters, I think I would validate the sites being added via the public addSite() method. I would open a socket and throw a HEAD request out there just to make sure the site exists. Or are you looking for a little more than that?
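The HEAD-request check suggested in the comment above could look roughly like this. It is a sketch only, assuming PHP 7.1 or later and plain fsockopen() rather than any HTTP library; headStatus(), buildHeadRequest() and parseStatusCode() are hypothetical names. It sends a HEAD request and reads back just the status line.

```php
<?php
// Sketch of the HEAD-request check suggested above. headStatus(),
// buildHeadRequest() and parseStatusCode() are hypothetical names.

function buildHeadRequest(string $hostHeader, string $path): string
{
    // HTTP/1.1 requires a Host header; "Connection: close" asks the server
    // not to keep the socket open afterwards.
    return "HEAD {$path} HTTP/1.1\r\n"
        . "Host: {$hostHeader}\r\n"
        . "Connection: close\r\n\r\n";
}

function parseStatusCode(string $statusLine): ?int
{
    // "HTTP/1.1 200 OK" -> 200
    if (preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $statusLine, $m)) {
        return (int) $m[1];
    }
    return null;
}

// Returns the HTTP status code for $url, or null if nothing usable came back.
function headStatus(string $url, float $timeout = 5.0): ?int
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['host'])) {
        return null;
    }
    $https = strtolower($parts['scheme'] ?? 'http') === 'https';
    $port = $parts['port'] ?? ($https ? 443 : 80);
    $path = ($parts['path'] ?? '/')
        . (isset($parts['query']) ? '?' . $parts['query'] : '');
    $hostHeader = $parts['host'] . (isset($parts['port']) ? ':' . $parts['port'] : '');

    // "ssl://" makes fsockopen wrap the connection in TLS for https URLs.
    $fp = @fsockopen(($https ? 'ssl://' : '') . $parts['host'], $port, $errno, $errstr, $timeout);
    if ($fp === false) {
        return null; // DNS failure, refused connection, timeout, ...
    }
    fwrite($fp, buildHeadRequest($hostHeader, $path));
    $statusLine = fgets($fp); // only the status line is needed
    fclose($fp);

    return $statusLine === false ? null : parseStatusCode($statusLine);
}
```

addSite() could then refuse a site when headStatus($site) returns null or a code of 400 and above, for example by throwing linkException. Some servers answer HEAD with 405 Method Not Allowed, so falling back to GET in that case may be worth considering.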
May 3rd, 2008 at 1:20 pm
Yeah - this was just a top-of-my-head code snippet. I think if I DO write one, I’ll be more apt to look at the various HTTP protocol details and handle redirects and all that.
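Handling redirects, as mentioned in the reply, mostly means not trusting the first response: a 301 or 302 only says where the page went. A rough sketch of following redirects by hand, assuming PHP 7.1 or later (for the context argument of get_headers() and [$a, $b] destructuring). followRedirects() and fetchStatusAndLocation() are hypothetical names. The fetcher is passed in as a callable so the redirect logic can be exercised without a network.

```php
<?php
// Sketch of following redirects by hand. followRedirects() and
// fetchStatusAndLocation() are hypothetical names. $fetch takes a URL and
// returns [status code or null, Location header or null].

function followRedirects(string $url, callable $fetch, int $maxRedirects = 5): array
{
    $seen = [];
    for ($hop = 0; $hop <= $maxRedirects; $hop++) {
        if (isset($seen[$url])) {
            return ['url' => $url, 'status' => null, 'error' => 'redirect loop'];
        }
        $seen[$url] = true;

        [$status, $location] = $fetch($url);
        if ($status === null) {
            return ['url' => $url, 'status' => null, 'error' => 'no response'];
        }
        $isRedirect = in_array($status, [301, 302, 303, 307, 308], true);
        if (!$isRedirect || $location === null || $location === '') {
            // Final answer: the caller decides what counts as broken (e.g. >= 400).
            return ['url' => $url, 'status' => $status, 'error' => null];
        }
        // Location may be root-relative ("/new"); other relative forms are not handled here.
        if ($location[0] === '/' && substr($location, 0, 2) !== '//') {
            $p = parse_url($url);
            $location = $p['scheme'] . '://' . $p['host']
                . (isset($p['port']) ? ':' . $p['port'] : '') . $location;
        }
        $url = $location;
    }
    return ['url' => $url, 'status' => null, 'error' => 'too many redirects'];
}

// Real fetcher: one HEAD request via PHP's http stream wrapper, without
// letting the wrapper follow the redirect itself.
function fetchStatusAndLocation(string $url): array
{
    $context = stream_context_create(['http' => [
        'method' => 'HEAD',
        'follow_location' => 0,
        'timeout' => 5,
    ]]);
    $headers = @get_headers($url, true, $context);
    if ($headers === false || !isset($headers[0])) {
        return [null, null];
    }
    $headers = array_change_key_case($headers, CASE_LOWER);
    $status = preg_match('#^HTTP/\S+\s+(\d{3})#', $headers[0], $m) ? (int) $m[1] : null;
    $location = $headers['location'] ?? null;
    // With follow_location off there should be one Location, but be defensive.
    return [$status, is_array($location) ? end($location) : $location];
}
```

Calling followRedirects($url, 'fetchStatusAndLocation') would then give the final URL and status; a link could count as broken when the status is null or 400 and above, and the final URL shows which links point at pages that have moved.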