Osmosis: one of the best web scrapers in Node.js

Hell yeah, this one also deserves a place alongside NightmareJS. It doesn't use Electron to simulate a browser; instead it relies on a headless parser built on native libxml C bindings, and it does a great job.

To start, make sure you have Node.js and npm installed. If you get an error about the libxmljs library, make sure you install these packages:

npm install node-gyp 
npm install libxmljs 
npm install osmosis

After that, everything should work: create a file such as file.js, paste in the example script, and run it with node file.js.

Here are a bunch of examples you can copy and paste to test it. If you want me to explain it further, send me your questions or requests for video tutorials. I might open a premium spot for that, so you'd better fucking buy me a cup of oil so I can keep up my commitment to you.

Craigslist example:

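const osmosis = require('osmosis');

osmosis
// Load the page that lists every Craigslist site
.get('https://www.craigslist.org/about/sites')
// Links inside the div right after each <h1> are the cities/regions
.find('h1 + div a')
.set('location')          // store the link text as "location"
.follow('@href')          // open that city's page
// Category links on the city page
.find('header + div + div li > a')
.set('category')
.follow('@href')
// Keep following the "next" button through every results page
.paginate('.totallink + a.button.next:first')
// Each listing link on a results page
.find('p > a')
.follow('@href')
// Extract fields from the listing page; "selector@attr" reads an attribute
.set({
    'title':        'section > h2',
    'description':  '#postingbody',
    'subcategory':  'div.breadbox > span[4]',   // 4th <span> in the breadcrumb
    'date':         'time@datetime',
    'latitude':     '#map@data-latitude',
    'longitude':    '#map@data-longitude',
    'images':       ['img@src']                 // array: every matching image
})
// Called once per listing with the collected data
.data(function(listing) {
    // do something with listing data
})
.log(console.log)    // progress messages
.error(console.log)  // request/parse errors
.debug(console.log); // verbose debugging output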
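
Each object passed to the .data() callback carries the keys declared in the .set({...}) call, and the scraped values come back as text. Below is a minimal sketch of a hypothetical normalizeListing helper (not part of osmosis) that you could call inside that callback, assuming the field names from this example. It trims text, turns the map coordinates into numbers, parses the date, and drops duplicate image URLs. It does not depend on osmosis, so you can test it on a plain object.

```javascript
// Hypothetical helper, not part of osmosis: cleans up one scraped listing.
// Field names match the .set({...}) call in the Craigslist example.
function toNumberOrNull(value) {
  const n = parseFloat(value);
  return Number.isFinite(n) ? n : null;
}

function normalizeListing(listing) {
  // Missing or non-string fields become empty strings.
  const text = (value) => (typeof value === 'string' ? value.trim() : '');
  const date = listing.date ? new Date(listing.date) : null;
  const images = Array.isArray(listing.images) ? listing.images : [];
  return {
    location: text(listing.location),
    category: text(listing.category),
    title: text(listing.title),
    description: text(listing.description),
    subcategory: text(listing.subcategory),
    // An unparseable date becomes null instead of an Invalid Date object.
    date: date && !Number.isNaN(date.getTime()) ? date : null,
    latitude: toNumberOrNull(listing.latitude),
    longitude: toNumberOrNull(listing.longitude),
    // A Set drops duplicate image URLs while keeping first-seen order.
    images: Array.from(new Set(images)),
  };
}
```

Inside the osmosis chain you would call it as .data(function (listing) { results.push(normalizeListing(listing)); }), where results is an array you declare yourself.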

This is the official repo for Osmosis: https://github.com/rchipka/node-osmosis

It works like this:
