{"id":17,"date":"2016-07-15T00:49:45","date_gmt":"2016-07-15T00:49:45","guid":{"rendered":"http:\/\/wizardofbots.com\/network\/?p=17"},"modified":"2016-07-15T03:14:07","modified_gmt":"2016-07-15T03:14:07","slug":"crawling-for-google-bing-and-yahoo-results-with-simple-html-dom","status":"publish","type":"post","link":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/","title":{"rendered":"Crawling for Bing results with Simple HTML DOM"},"content":{"rendered":"<p><a href=\"http:\/\/simplehtmldom.sourceforge.net\/\">Simple HTML DOM<\/a> is a PHP library that parses HTML into a DOM and lets you find elements in it quickly with CSS-style selectors, instead of spending hours writing your own parsing code in plain PHP. A similar library is <a href=\"https:\/\/github.com\/tj\/php-selector\">PHP Selector<\/a>.<\/p>\n<p>In this case, we want to grab the search results from Bing.<\/p>\n<p>&nbsp;<\/p>\n<pre class=\"lang:php decode:true \">&lt;?php\r\n$keyword = $argv[1]; \/\/ pass the keyword as an argument when you run the script, e.g. php this.php 'your keyword'\r\nrequire_once('simple_html_dom.php');\r\n$bing = 'http:\/\/www.bing.com\/search?q=' . urlencode($keyword) . 
'&amp;count=50';\r\n\/\/ We use Bing here, but the approach is almost the same for the other search engines.\r\necho '#####################################' . PHP_EOL;\r\necho '###        SEARCHING IN BING     ####' . PHP_EOL;\r\necho '#####################################' . PHP_EOL;\r\n$html = file_get_html($bing);\r\nif (!$html) { \/\/ file_get_html() returns false when the page could not be loaded\r\n    exit('Could not load the results page' . PHP_EOL);\r\n}\r\n$linkObjs = $html-&gt;find('li h2 a');\r\nforeach ($linkObjs as $linkObj) {\r\n    $title = trim($linkObj-&gt;plaintext);\r\n    $link  = trim($linkObj-&gt;href);\r\n    \r\n    \/\/ if it is not a direct link but a URL is embedded in it (a redirect-style link), extract that URL\r\n    if (!preg_match('\/^https?\/', $link) &amp;&amp; preg_match('\/q=(.+)&amp;amp;sa=\/U', $link, $matches) &amp;&amp; preg_match('\/^https?\/', $matches[1])) {\r\n        $link = $matches[1];\r\n    } else if (!preg_match('\/^https?\/', $link)) { \/\/ skip if it is not a valid link\r\n        continue;\r\n    }\r\n    \r\n    \/\/ PHP_EOL is used because \\n is not interpreted inside single-quoted strings\r\n    print '&lt;p&gt;Title: ' . $title . '&lt;br \/&gt;' . PHP_EOL;\r\n    print 'Link: ' . $link . '&lt;\/p&gt;' . PHP_EOL;\r\n}<\/pre>\n<p>That is practically it: the find() method from the Simple HTML DOM library searches the parsed DOM with the selector <code>li h2 a<\/code> and returns the result links.<\/p>\n<p>If you need more help with this, remember that I can help with custom bots \ud83d\ude09<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Simple HTML DOM is a PHP library that helps you parse the DOM and get to find things inside the DOM very fast, instead of using plain PHP that will take you hours to make your own libraries. There is another similar that is called PHP Selector. 
What we want to do is to grab [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":24,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[10],"tags":[12,14,11,13],"class_list":["post-17","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-php","tag-bing-results","tag-php","tag-scraping","tag-search-engines"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Crawling for Bing results with Simple HTML DOM - Wizard Of Bots<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Crawling for Bing results with Simple HTML DOM - Wizard Of Bots\" \/>\n<meta property=\"og:description\" content=\"Simple HTML DOM is a PHP library that helps you parse the DOM and get to find things inside the DOM very fast, instead of using plain PHP that will take you hours to make your own libraries. There is another similar that is called PHP Selector. 
What we want to do is to grab [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/\" \/>\n<meta property=\"og:site_name\" content=\"Wizard Of Bots\" \/>\n<meta property=\"article:published_time\" content=\"2016-07-15T00:49:45+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-07-15T03:14:07+00:00\" \/>\n<meta property=\"og:image\" content=\"http:\/\/wizardofbots.com\/network\/wp-content\/uploads\/2016\/07\/php-simple-html-dom-parser.png\" \/>\n\t<meta property=\"og:image:width\" content=\"650\" \/>\n\t<meta property=\"og:image:height\" content=\"150\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"wizardofbots\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"wizardofbots\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/\",\"url\":\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/\",\"name\":\"Crawling for Bing results with Simple HTML DOM - Wizard Of Bots\",\"isPartOf\":{\"@id\":\"https:\/\/wizardofbots.com\/network\/#website\"},\"primaryImageOfPage\":{\"@id\":\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#primaryimage\"},\"image\":{\"@id\":\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#primaryimage\"},\"thumbnailUrl\":\"http:\/\/wizardofbots.com\/network\/wp-content\/uploads\/2016\/07\/php-simple-html-dom-parser.png\",\"datePublished\":\"2016-07-15T00:49:45+00:00\",\"dateModified\":\"2016-07-15T03:14:07+00:00\",\"author\":{\"@id\":\"https:\/\/wizardofbots.com\/network\/#\/schema\/person\/31f9e486da1c11791d94a861854a2a9f\"},\"breadcrumb\":{\"@id\":\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#primaryimage\",\"url\":\"http:\/\/wizardofbots.com\/network\/wp-content\/uploads\/2016\/07\/php-simple-html-dom-parser.png\",\"contentUrl\":\"http:\/\/wizardofbots.com\/network\/wp-content\/uploads\/2016\/07\/php-simple-html-dom-parser.png\",\"width\":650,\"height\":150},{\"@type\":\"
BreadcrumbList\",\"@id\":\"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/wizardofbots.com\/network\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Crawling for Bing results with Simple HTML DOM\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/wizardofbots.com\/network\/#website\",\"url\":\"https:\/\/wizardofbots.com\/network\/\",\"name\":\"Wizard Of Bots\",\"description\":\"Botting and AI community\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/wizardofbots.com\/network\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/wizardofbots.com\/network\/#\/schema\/person\/31f9e486da1c11791d94a861854a2a9f\",\"name\":\"wizardofbots\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/wizardofbots.com\/network\/#\/schema\/person\/image\/\",\"url\":\"http:\/\/2.gravatar.com\/avatar\/584eebc303f64610559ab9f305f6928d?s=96&d=mm&r=g\",\"contentUrl\":\"http:\/\/2.gravatar.com\/avatar\/584eebc303f64610559ab9f305f6928d?s=96&d=mm&r=g\",\"caption\":\"wizardofbots\"},\"url\":\"http:\/\/wizardofbots.com\/network\/author\/wizardofbots\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. 
-->","yoast_head_json":{"title":"Crawling for Bing results with Simple HTML DOM - Wizard Of Bots","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/","og_locale":"en_US","og_type":"article","og_title":"Crawling for Bing results with Simple HTML DOM - Wizard Of Bots","og_description":"Simple HTML DOM is a PHP library that helps you parse the DOM and get to find things inside the DOM very fast, instead of using plain PHP that will take you hours to make your own libraries. There is another similar that is called PHP Selector. What we want to do is to grab [&hellip;]","og_url":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/","og_site_name":"Wizard Of Bots","article_published_time":"2016-07-15T00:49:45+00:00","article_modified_time":"2016-07-15T03:14:07+00:00","og_image":[{"width":650,"height":150,"url":"http:\/\/wizardofbots.com\/network\/wp-content\/uploads\/2016\/07\/php-simple-html-dom-parser.png","type":"image\/png"}],"author":"wizardofbots","twitter_card":"summary_large_image","twitter_misc":{"Written by":"wizardofbots","Est. 
reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/","url":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/","name":"Crawling for Bing results with Simple HTML DOM - Wizard Of Bots","isPartOf":{"@id":"https:\/\/wizardofbots.com\/network\/#website"},"primaryImageOfPage":{"@id":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#primaryimage"},"image":{"@id":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#primaryimage"},"thumbnailUrl":"http:\/\/wizardofbots.com\/network\/wp-content\/uploads\/2016\/07\/php-simple-html-dom-parser.png","datePublished":"2016-07-15T00:49:45+00:00","dateModified":"2016-07-15T03:14:07+00:00","author":{"@id":"https:\/\/wizardofbots.com\/network\/#\/schema\/person\/31f9e486da1c11791d94a861854a2a9f"},"breadcrumb":{"@id":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#primaryimage","url":"http:\/\/wizardofbots.com\/network\/wp-content\/uploads\/2016\/07\/php-simple-html-dom-parser.png","contentUrl":"http:\/\/wizardofbots.com\/network\/wp-content\/uploads\/2016\/07\/php-simple-html-dom-parser.png","width":650,"height":150},{"@type":"BreadcrumbList","@id":"http:\/\/wizardofbots.com\/network\/crawling-for-google-bing-and-yahoo-results-with-simple-html-dom\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"htt
ps:\/\/wizardofbots.com\/network\/"},{"@type":"ListItem","position":2,"name":"Crawling for Bing results with Simple HTML DOM"}]},{"@type":"WebSite","@id":"https:\/\/wizardofbots.com\/network\/#website","url":"https:\/\/wizardofbots.com\/network\/","name":"Wizard Of Bots","description":"Botting and AI community","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/wizardofbots.com\/network\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/wizardofbots.com\/network\/#\/schema\/person\/31f9e486da1c11791d94a861854a2a9f","name":"wizardofbots","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/wizardofbots.com\/network\/#\/schema\/person\/image\/","url":"http:\/\/2.gravatar.com\/avatar\/584eebc303f64610559ab9f305f6928d?s=96&d=mm&r=g","contentUrl":"http:\/\/2.gravatar.com\/avatar\/584eebc303f64610559ab9f305f6928d?s=96&d=mm&r=g","caption":"wizardofbots"},"url":"http:\/\/wizardofbots.com\/network\/author\/wizardofbots\/"}]}},"_links":{"self":[{"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/posts\/17"}],"collection":[{"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/comments?post=17"}],"version-history":[{"count":9,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/posts\/17\/revisions"}],"predecessor-version":[{"id":34,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/posts\/17\/revisions\/34"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/media\/24"}],"wp:attachment":[{"href":"http:\/\/wi
zardofbots.com\/network\/wp-json\/wp\/v2\/media?parent=17"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/categories?post=17"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/tags?post=17"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}