Crawling for Bing results with Simple HTML DOM

Simple HTML DOM is a PHP library that parses HTML into a DOM and lets you find elements in it quickly with CSS-style selectors, so you don't have to spend hours writing your own parsing code in plain PHP. There is a similar library called PHP Selector.

In this case, we want to grab the results from Bing.


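<?php
// Load the library; assumes simple_html_dom.php sits next to this script.
require_once __DIR__ . '/simple_html_dom.php';

// Pass the keyword as the first argument: php this.php "your keyword"
$keyword = $argv[1] ?? '';
// Put the Bing search URL prefix, ending in the query parameter, in the empty string.
$bing = '' . urlencode($keyword) . '&count=50';
// We do it with Bing, but it is almost the same with other search engines.
echo "#####################################\n";
echo "###        SEARCHING IN BING     ####\n";
echo "#####################################\n";
$html = file_get_html($bing);
if (!$html) {
    exit("Could not load the results page\n");
}
// Each result title is a link inside an <h2> inside an <li>.
$linkObjs = $html->find('li h2 a');
foreach ($linkObjs as $linkObj) {
    $title = trim($linkObj->plaintext);
    $link  = trim($linkObj->href);
    // If it is not a direct link but a URL reference is found inside it, extract it.
    if (!preg_match('/^https?/', $link) && preg_match('/q=(.+)&amp;sa=/U', $link, $matches) && preg_match('/^https?/', $matches[1])) {
        $link = $matches[1];
    } elseif (!preg_match('/^https?/', $link)) { // skip if it is not a valid link
        continue;
    }
    // Double quotes are needed so that "\n" becomes a real newline.
    print '<p>Title: ' . $title . '<br />' . "\n";
    print 'Link: ' . $link . '</p>' . "\n";
}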

That is practically it: the find() function from the Simple HTML DOM library searches the parsed DOM for the result links.
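The link check inside the loop can also be pulled out into its own function, which makes it easy to try without hitting the network. This is only a sketch in plain PHP: resolveResultLink() is a hypothetical helper name, not part of Simple HTML DOM, and accepting both "&" and "&amp;" as the separator, plus the urldecode() call, are my assumptions about how redirect links may look.

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): normalizes one href.
// Returns the usable URL, or null when the link should be skipped.
function resolveResultLink(string $href): ?string
{
    $href = trim($href);

    // Already an absolute http(s) URL: use it as-is.
    if (preg_match('/^https?:\/\//', $href)) {
        return $href;
    }

    // Redirect-style link: pull the target out of the q= parameter.
    // Assumption: the separator may be "&" or the HTML entity "&amp;".
    if (preg_match('/q=(.+?)&(?:amp;)?sa=/', $href, $matches)) {
        // Assumption: the target may be percent-encoded, so decode it first.
        $target = urldecode($matches[1]);
        if (preg_match('/^https?:\/\//', $target)) {
            return $target;
        }
    }

    return null; // neither a direct link nor a recognizable redirect
}

// Made-up sample hrefs, just to exercise the three branches.
$samples = [
    'https://example.com/page',
    '/url?q=https://example.com/a&amp;sa=U',
    '/images/search?view=detail',
];
foreach ($samples as $sample) {
    $resolved = resolveResultLink($sample);
    echo $sample . ' => ' . ($resolved ?? '(skipped)') . "\n";
}
```

Inside the crawler loop you would call it on trim($linkObj->href) and continue whenever it returns null.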

If you need more help with this, remember that I can help with custom bots 😉
