{"id":67,"date":"2016-07-19T19:14:12","date_gmt":"2016-07-19T19:14:12","guid":{"rendered":"http:\/\/wizardofbots.com\/network\/?p=67"},"modified":"2016-07-19T21:32:07","modified_gmt":"2016-07-19T21:32:07","slug":"osmosis-one-of-the-best-web-scrapers-in-nodejs","status":"publish","type":"post","link":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/","title":{"rendered":"Osmosis one of the best web scrapers in NodeJS"},"content":{"rendered":"<p><img loading=\"lazy\" decoding=\"async\" class=\"alignright\" src=\"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif\" width=\"240\" height=\"121\" \/><\/p>\n<p>Hell yeah, this one also deserves a place alongside nightmareJS. It doesn't use Electron to simulate a browser; instead it is a headless parser built on native libxml C bindings, and it does a great job.<\/p>\n<p>To start, make sure you have <strong>nodejs<\/strong> installed along with <strong>npm<\/strong>. If you get an error for the <strong>libxmljs<\/strong> library, install these packages:<\/p>\n<pre class=\"lang-js prettyprint prettyprinted\"><code>npm install node-gyp\r\nnpm install libxmljs\r\nnpm install osmosis<\/code><\/pre>\n<p>After that, it should work properly: create a file.js containing the example script and run it with <code>node file.js<\/code>.<\/p>\n<p>Here is an example to copy and paste to test it. If you want me to explain it further, post your questions or requests for video tutorials. 
I might open a premium spot for it, so you better fucking buy me a cup of oil so I can keep up my commitment to you.<\/p>\n<p>Craigslist example:<\/p>\n<pre class=\"lang:js decode:true \">var osmosis = require('osmosis');\r\n\r\nosmosis\r\n\/\/ start at the list of Craigslist sites and store each link's text as the location\r\n.get('www.craigslist.org\/about\/sites')\r\n.find('h1 + div a')\r\n.set('location')\r\n.follow('@href')\r\n\/\/ on each location page, store each category link's text and follow it\r\n.find('header + div + div li &gt; a')\r\n.set('category')\r\n.follow('@href')\r\n\/\/ keep following the \"next\" button through the result pages\r\n.paginate('.totallink + a.button.next:first')\r\n\/\/ open every listing and extract its fields\r\n.find('p &gt; a')\r\n.follow('@href')\r\n.set({\r\n    'title':        'section &gt; h2',\r\n    'description':  '#postingbody',\r\n    'subcategory':  'div.breadbox &gt; span[4]',\r\n    'date':         'time@datetime',\r\n    'latitude':     '#map@data-latitude',\r\n    'longitude':    '#map@data-longitude',\r\n    'images':       ['img@src']\r\n})\r\n\/\/ called once per scraped listing\r\n.data(function(listing) {\r\n    \/\/ do something with listing data\r\n})\r\n.log(console.log)\r\n.error(console.log)\r\n.debug(console.log)<\/pre>\n<p>This is the official repo for Osmosis: <a href=\"https:\/\/github.com\/rchipka\/node-osmosis\" target=\"_blank\">https:\/\/github.com\/rchipka\/node-osmosis<\/a><\/p>\n<p>It works like this:<\/p>\n<p><iframe loading=\"lazy\" width=\"640\" height=\"360\" src=\"https:\/\/www.youtube.com\/embed\/8KmZmgEMfD4\" frameborder=\"0\" allowfullscreen><\/iframe><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Hell yeah, this also deserves the place along nightmareJS, even though this doesnt use Electron for simulating browser, but a headless parser native libxml C bindings that will do a great job. 
To start, you have to make sure you have previously installed nodejs libraries along with npm and if you get an error for [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_bbp_topic_count":0,"_bbp_reply_count":0,"_bbp_total_topic_count":0,"_bbp_total_reply_count":0,"_bbp_voice_count":0,"_bbp_anonymous_reply_count":0,"_bbp_topic_count_hidden":0,"_bbp_reply_count_hidden":0,"_bbp_forum_subforum_count":0,"footnotes":""},"categories":[33],"tags":[39,40,38],"class_list":["post-67","post","type-post","status-publish","format-standard","hentry","category-javascript","tag-nodejs","tag-osmosis","tag-scrapers"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v23.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Osmosis one of the best web scrapers in NodeJS - Wizard Of Bots<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Osmosis one of the best web scrapers in NodeJS - Wizard Of Bots\" \/>\n<meta property=\"og:description\" content=\"Hell yeah, this also deserves the place along nightmareJS, even though this doesnt use Electron for simulating browser, but a headless parser native libxml C bindings that will do a great job. 
To start, you have to make sure you have previously installed nodejs libraries along with npm and if you get an error for [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/\" \/>\n<meta property=\"og:site_name\" content=\"Wizard Of Bots\" \/>\n<meta property=\"article:published_time\" content=\"2016-07-19T19:14:12+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2016-07-19T21:32:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif\" \/>\n<meta name=\"author\" content=\"wizardofbots\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"wizardofbots\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/\",\"url\":\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/\",\"name\":\"Osmosis one of the best web scrapers in NodeJS - Wizard Of 
Bots\",\"isPartOf\":{\"@id\":\"http:\/\/wizardofbots.com\/network\/#website\"},\"primaryImageOfPage\":{\"@id\":\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#primaryimage\"},\"image\":{\"@id\":\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif\",\"datePublished\":\"2016-07-19T19:14:12+00:00\",\"dateModified\":\"2016-07-19T21:32:07+00:00\",\"author\":{\"@id\":\"http:\/\/wizardofbots.com\/network\/#\/schema\/person\/31f9e486da1c11791d94a861854a2a9f\"},\"breadcrumb\":{\"@id\":\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#primaryimage\",\"url\":\"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif\",\"contentUrl\":\"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"http:\/\/wizardofbots.com\/network\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Osmosis one of the best web scrapers in NodeJS\"}]},{\"@type\":\"WebSite\",\"@id\":\"http:\/\/wizardofbots.com\/network\/#website\",\"url\":\"http:\/\/wizardofbots.com\/network\/\",\"name\":\"Wizard Of Bots\",\"description\":\"Botting and AI 
community\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"http:\/\/wizardofbots.com\/network\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"http:\/\/wizardofbots.com\/network\/#\/schema\/person\/31f9e486da1c11791d94a861854a2a9f\",\"name\":\"wizardofbots\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"http:\/\/wizardofbots.com\/network\/#\/schema\/person\/image\/\",\"url\":\"http:\/\/2.gravatar.com\/avatar\/584eebc303f64610559ab9f305f6928d?s=96&d=mm&r=g\",\"contentUrl\":\"http:\/\/2.gravatar.com\/avatar\/584eebc303f64610559ab9f305f6928d?s=96&d=mm&r=g\",\"caption\":\"wizardofbots\"},\"url\":\"http:\/\/wizardofbots.com\/network\/author\/wizardofbots\/\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Osmosis one of the best web scrapers in NodeJS - Wizard Of Bots","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/","og_locale":"en_US","og_type":"article","og_title":"Osmosis one of the best web scrapers in NodeJS - Wizard Of Bots","og_description":"Hell yeah, this also deserves the place along nightmareJS, even though this doesnt use Electron for simulating browser, but a headless parser native libxml C bindings that will do a great job. 
To start, you have to make sure you have previously installed nodejs libraries along with npm and if you get an error for [&hellip;]","og_url":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/","og_site_name":"Wizard Of Bots","article_published_time":"2016-07-19T19:14:12+00:00","article_modified_time":"2016-07-19T21:32:07+00:00","og_image":[{"url":"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif"}],"author":"wizardofbots","twitter_card":"summary_large_image","twitter_misc":{"Written by":"wizardofbots","Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/","url":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/","name":"Osmosis one of the best web scrapers in NodeJS - Wizard Of Bots","isPartOf":{"@id":"http:\/\/wizardofbots.com\/network\/#website"},"primaryImageOfPage":{"@id":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#primaryimage"},"image":{"@id":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#primaryimage"},"thumbnailUrl":"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif","datePublished":"2016-07-19T19:14:12+00:00","dateModified":"2016-07-19T21:32:07+00:00","author":{"@id":"http:\/\/wizardofbots.com\/network\/#\/schema\/person\/31f9e486da1c11791d94a861854a2a9f"},"breadcrumb":{"@id":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#primaryimage","url":"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif","c
ontentUrl":"https:\/\/media.giphy.com\/media\/Urf7wFa0FPlba\/giphy.gif"},{"@type":"BreadcrumbList","@id":"http:\/\/wizardofbots.com\/network\/osmosis-one-of-the-best-web-scrapers-in-nodejs\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"http:\/\/wizardofbots.com\/network\/"},{"@type":"ListItem","position":2,"name":"Osmosis one of the best web scrapers in NodeJS"}]},{"@type":"WebSite","@id":"http:\/\/wizardofbots.com\/network\/#website","url":"http:\/\/wizardofbots.com\/network\/","name":"Wizard Of Bots","description":"Botting and AI community","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"http:\/\/wizardofbots.com\/network\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Person","@id":"http:\/\/wizardofbots.com\/network\/#\/schema\/person\/31f9e486da1c11791d94a861854a2a9f","name":"wizardofbots","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"http:\/\/wizardofbots.com\/network\/#\/schema\/person\/image\/","url":"http:\/\/2.gravatar.com\/avatar\/584eebc303f64610559ab9f305f6928d?s=96&d=mm&r=g","contentUrl":"http:\/\/2.gravatar.com\/avatar\/584eebc303f64610559ab9f305f6928d?s=96&d=mm&r=g","caption":"wizardofbots"},"url":"http:\/\/wizardofbots.com\/network\/author\/wizardofbots\/"}]}},"_links":{"self":[{"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/posts\/67"}],"collection":[{"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/comments?post=67"}],"version-history":[{"count":7,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/posts\/67\/revisions"
}],"predecessor-version":[{"id":74,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/posts\/67\/revisions\/74"}],"wp:attachment":[{"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/media?parent=67"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/categories?post=67"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/wizardofbots.com\/network\/wp-json\/wp\/v2\/tags?post=67"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}