Hey fellas, sorry for being absent for so long; I was mostly buried in work on other projects.

In this post I am going to show you how to screen scrape using NodeJS and cheerio, a library that gives you jQuery-style selectors on the server. It’s relatively easy, here is the code:

var request = require('request'); // HTTP client library
var cheerio = require('cheerio'); // jQuery-style selectors for server-side HTML
// set some defaults
var req = request.defaults({
  jar: true,                 // keep cookies between requests
  rejectUnauthorized: false, // skip TLS certificate validation (insecure, for testing only)
  followAllRedirects: true   // follow redirects, also after non-GET requests
});
// scrape the page
req.get({
    url: "http://www.whatsmyip.org/",
    headers: {
        'User-Agent': 'Google' // You can put the user-agent that you want
     }
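  }, function(err, resp, body) {
  // stop early if the request itself failed
  if (err) {
    console.error('Request failed: ' + err.message);
    return;
  }
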
  // load the html into cheerio
  var $ = cheerio.load(body);
  
  // get the data and output to console
  console.log( 'IP: ' + $('#ip').text() );  //scrape using CSS selector
  console.log( 'Host: ' + $('#hostname').text() );
  console.log( 'User-Agent: ' + $('#useragent').text() );
});

 

Hey fellas, here is a little exercise I did: a mass URL shortener that uses the TinyURL API as an example. I also added a big list of URL shortening services like Bit.ly, Tiny.cc and more.

You might not have any use for this, but it’s good for learning. We don’t need cURL here because the API is open: a simple file_get_contents call on the API endpoint, with the long URL as a parameter, returns the shortened link. Check it out:

<?php
// we first create the function to not repeat ourselves
function tinyurl($longUrl) {
	// We use the TinyURL API; urlencode() keeps "&" and "?" in the long URL from breaking the query string
	$short_url = file_get_contents('http://tinyurl.com/api-create.php?url=' . urlencode($longUrl));
	return $short_url; // return the short URL (false if the request failed)
}
// we specify the file, which is in the same folder as this script
// make sure to paste all the URLs into links.txt, one per line
$filename = 'links.txt';
// read the file into an array of lines, dropping line breaks and empty lines
$links = file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
$links_bucket = array(); // create our array beforehand
// iterate over the lines returned by file()
foreach($links as $link) {
	// time to use the function that returns the shortened URL
	$tinyURL = tinyurl(trim($link));
	// and we push it into the array that we will var_dump
	array_push($links_bucket, $tinyURL);
}

var_dump($links_bucket);

With this example you can write your own functions, either using file_get_contents for open APIs or using cURL to make POST requests with proper credentials for the services that require them.
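
As a rough sketch of that idea, the shortener function can take the API endpoint as a parameter, so any open service that accepts the long URL in a query string can be plugged in. The function name shorten_with, the {url} placeholder and the injectable $fetch callback are assumptions made for this illustration (the callback only exists so the function can be tried without hitting the network). Only the TinyURL endpoint comes from the example above.

```php
<?php
// Hypothetical helper: shorten $longUrl through any open GET-style API.
// $apiTemplate must contain the placeholder {url}, e.g. the TinyURL endpoint from this post.
// $fetch is any callable that takes a URL and returns the response body (or false on failure).
function shorten_with($apiTemplate, $longUrl, $fetch = 'file_get_contents') {
	$longUrl = trim($longUrl);
	if ($longUrl === '') {
		return false; // nothing to shorten
	}
	// urlencode() so characters such as "&" and "?" survive inside the query string
	$requestUrl = str_replace('{url}', urlencode($longUrl), $apiTemplate);
	$response = call_user_func($fetch, $requestUrl);
	if ($response === false) {
		return false; // the request failed
	}
	return trim($response);
}

// Usage with the TinyURL endpoint used above (this call needs network access):
// echo shorten_with('http://tinyurl.com/api-create.php?url={url}', 'http://example.com/?a=1&b=2');
```

Other services from the list below may need a different parameter name, an API key or a POST request, so check each one before pointing the template at it.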

Here is the promised list of URL shortener services. I don’t know whether any of them help with cloaking or anything like that; you will have to try them:

  • bit.ly
  • goo.gl
  • tinyurl.com
  • is.gd
  • cli.gs
  • pic.gd (TweetPhoto)
  • DwarfURL.com
  • ow.ly
  • yfrog.com
  • migre.me
  • ff.im
  • tiny.cc
  • url4.eu
  • tr.im
  • twit.ac
  • su.pr
  • twurl.nl
  • snipurl.com
  • BudURL.com
  • short.to
  • ping.fm
  • Digg.com
  • post.ly
  • Just.as
  • .tk
  • bkite.com
  • snipr.com
  • flic.kr
  • loopt.us
  • doiop.com
  • twitthis.com
  • htxt.it
  • AltURL.com
  • RedirX.com
  • DigBig.com
  • short.ie
  • u.mavrev.com
  • kl.am
  • wp.me
  • u.nu
  • rubyurl.com
  • om.ly
  • linkbee.com
  • Yep.it
  • posted.at
  • xrl.us
  • metamark.net
  • sn.im
  • hurl.ws
  • eepurl.com
  • idek.net
  • urlpire.com
  • chilp.it
  • moourl.com
  • snurl.com
  • xr.com
  • lin.cr
  • EasyURI.com
  • zz.gd
  • ur1.ca
  • URL.ie
  • adjix.com
  • twurl.cc
  • s7y.us (Shrinkify)
  • EasyURL.net
  • atu.ca
  • sp2.ro
  • Profile.to
  • ub0.cc
  • minurl.fr
  • cort.as
  • fire.to
  • 2tu.us
  • twiturl.de
  • to.ly
  • BurnURL.com
  • nn.nf
  • clck.ru
  • notlong.com
  • thrdl.es
  • spedr.com
  • vl.am
  • miniurl.com
  • virl.com
  • PiURL.com
  • 1url.com
  • gri.ms
  • tr.my
  • Sharein.com
  • urlzen.com
  • fon.gs
  • Shrinkify.com
  • ri.ms
  • b23.ru
  • Fly2.ws
  • xrl.in
  • Fhurl.com
  • wipi.es
  • korta.nu
  • shortna.me
  • fa.b
  • WapURL.co.uk
  • urlcut.com
  • 6url.com
  • abbrr.com
  • SimURL.com
  • klck.me
  • x.se
  • 2big.at
  • url.co.uk
  • ewerl.com
  • inreply.to
  • TightURL.com
  • a.gg
  • tinytw.it
  • zi.pe
  • riz.gd
  • hex.io
  • fwd4.me
  • bacn.me
  • shrt.st
  • ln-s.ru
  • tiny.pl
  • o-x.fr
  • StartURL.com
  • jijr.com
  • shorl.com
  • icanhaz.com
  • updating.me
  • kissa.be
  • hellotxt.com
  • pnt.me
  • nsfw.in
  • xurl.jp
  • yweb.com
  • urlkiss.com
  • QLNK.net
  • w3t.org
  • lt.tl
  • twirl.at
  • zipmyurl.com
  • urlot.com
  • a.nf
  • hurl.me
  • URLHawk.com
  • Tnij.org
  • 4url.cc
  • firsturl.de
  • Hurl.it
  • sturly.com
  • shrinkster.com
  • ln-s.net
  • go2cut.com
  • liip.to
  • shw.me
  • XeeURL.com
  • liltext.com
  • lnk.gd
  • xzb.cc
  • linkbun.ch
  • href.in
  • urlbrief.com
  • 2ya.com
  • safe.mn
  • shrunkin.com
  • bloat.me
  • krunchd.com
  • minilien.com
  • ShortLinks.co.uk
  • qicute.com
  • rb6.me
  • urlx.ie
  • pd.am
  • go2.me
  • tinyarro.ws
  • tinyvid.io
  • lurl.no
  • ru.ly
  • lru.jp
  • rickroll.it
  • togoto.us
  • ClickMeter.com
  • hugeurl.com
  • tinyuri.ca
  • shrten.com
  • shorturl.com
  • Quip-Art.com
  • urlao.com
  • a2a.me
  • tcrn.ch
  • goshrink.com
  • DecentURL.com
  • zi.ma
  • 1link.in
  • sharetabs.com
  • shoturl.us
  • fff.to
  • hover.com
  • lnk.in
  • jmp2.net
  • dy.fi
  • urlcover.com
  • 2pl.us
  • tweetburner.com
  • u6e.de
  • xaddr.com
  • gl.am
  • dfl8.me
  • go.9nl.com
  • gurl.es
  • C-O.IN
  • TraceURL.com
  • liurl.cn
  • MyURL.in
  • urlenco.de
  • ne1.net
  • buk.me
  • rsmonkey.com
  • cuturl.com
  • turo.us
  • sqrl.it
  • iterasi.net
  • tiny123.com
  • EsyURL.com
  • urlx.org
  • IsCool.net
  • twitterpan.com
  • GoWat.ch
  • poprl.com
  • njx.me

Try it yourself and let me know how it worked 😉

I’m done for today. I might come back with other stuff later. I’m on a PHP spree right now, though that might change soon, so don’t question why I haven’t used other languages. I just feel like it.

 

cURL might be confusing for many, but it is just a matter of learning the basics: how to configure the options and which method to use (GET, POST, PUT). But...

Why the hell would you use cURL?

  • Sending requests with parameters to APIs and consuming the responses easily.
  • Requesting pages and downloading files from websites.
  • Crawling websites: fetching pages so you can parse the DOM and follow links.
  • And many more...

// the best function so you stop wasting time.
function curlwiz($uri, $method='GET', $data=null, $curl_headers=array(), $curl_options=array()) {
  // default curl options, which will almost always stay the same; you can modify them if you want
  $default_curl_options = array(
    CURLOPT_SSL_VERIFYPEER => false, // skips TLS certificate checks (insecure, keep it for quick tests only)
    CURLOPT_HEADER => true,          // include the response headers in the output
    CURLOPT_RETURNTRANSFER => true,  // return the response instead of printing it
    CURLOPT_TIMEOUT => 3,            // give up after 3 seconds
  );
  // you can set the default headers in this array; usually you don't need them.
  $default_headers = array();

  // We need to trim the method passed and change it to uppercase
  $method = strtoupper(trim($method));
  $allowed_methods = array('GET', 'POST', 'PUT', 'DELETE'); // array with allowed methods.

  if(!in_array($method, $allowed_methods)) // if the method from input is not in the allowed_methods array, throw an error.
    throw new \Exception("'$method' is not valid cURL HTTP method.");

  if(!empty($data) && !is_string($data))
    throw new \Exception("Invalid data for cURL request '$method $uri'");

  // init
  $curl = curl_init($uri);

  // apply default options
  curl_setopt_array($curl, $default_curl_options);

  // apply method specific options
  switch($method) {
    case 'GET':
      break;
    case 'POST':
      if(!is_string($data))
        throw new \Exception("Invalid data for cURL request '$method $uri'");
      curl_setopt($curl, CURLOPT_POST, true);
      curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
      break;
    case 'PUT':
      if(!is_string($data))
        throw new \Exception("Invalid data for cURL request '$method $uri'");
      curl_setopt($curl, CURLOPT_CUSTOMREQUEST, $method);
      curl_setopt($curl, CURLOPT_POSTFIELDS, $data);
      break;
    case 'DELETE':
      curl_setopt($curl, CURLOPT_CUSTOMREQUEST, $method);
      break;
  }

  // apply user options
  curl_setopt_array($curl, $curl_options);

  // add headers
  curl_setopt($curl, CURLOPT_HTTPHEADER, array_merge($default_headers, $curl_headers));

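  // run the request; curl_exec() returns false on failure
  $raw = curl_exec($curl);
  $headers = array();
  $content = '';
  if($raw !== false) {
    // CURLINFO_HEADER_SIZE is the byte length of all header blocks (including those from redirects);
    // this split relies on CURLOPT_HEADER staying true
    $header_size = curl_getinfo($curl, CURLINFO_HEADER_SIZE);
    // split the header part line by line, dropping the blank separator lines
    $headers = array_values(array_filter(explode("\r\n", substr($raw, 0, $header_size)), 'strlen'));
    // everything after the headers is the body, kept exactly as received
    $content = (string) substr($raw, $header_size);
  } else {
    $raw = '';
  }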
  $error = curl_error($curl);

  curl_close($curl);

  // return
  return array(
    'raw' => $raw,
    'headers' => $headers,
    'content' => $content,
    'error' => $error
  );
}

$response = curlwiz('http://facebook.com', 'GET');
var_dump($response['headers']);

With this you will be able to make cURL requests with just one line of code: the setup is pre-configured inside a function, and a switch inside it picks the configuration each method needs, one for GET, another for POST, and so on.
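
The 'headers' entry that curlwiz returns is a flat list of raw lines, so one more small step is usually needed before the response is convenient to use. Here is a minimal sketch of that step, assuming a helper named parse_header_lines (hypothetical, not part of curlwiz) and sample header lines made up for the demonstration:

```php
<?php
// Hypothetical helper: turn raw header lines into a status code plus a name => value map.
function parse_header_lines(array $lines) {
	$status = null;
	$map = array();
	foreach ($lines as $line) {
		// a status line looks like "HTTP/1.1 200 OK"; after redirects the last one wins
		if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
			$status = (int) $m[1];
			$map = array(); // start over for each response in a redirect chain
			continue;
		}
		// every other line is "Name: value"; skip anything that does not fit
		$parts = explode(':', $line, 2);
		if (count($parts) === 2) {
			// header names are case-insensitive, so normalise them to lowercase
			$map[strtolower(trim($parts[0]))] = trim($parts[1]);
		}
	}
	return array('status' => $status, 'headers' => $map);
}

// Sample lines, shaped like the 'headers' array curlwiz returns
$parsed = parse_header_lines(array(
	'HTTP/1.1 301 Moved Permanently',
	'Location: https://example.com/',
	'HTTP/1.1 200 OK',
	'Content-Type: text/html; charset=UTF-8',
));
echo $parsed['status'], ' ', $parsed['headers']['content-type'], "\n";
```

With real output you would pass $response['headers'] from a curlwiz call instead of the sample array. Keeping this as a separate function leaves curlwiz itself small.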

There are many guides on how to configure cURL, but one is better than all the rest:

http://php.net/manual/en/curl.examples-basic.php

There are also plenty of wrappers built by other coders, and now you understand how they make them so easy to use: many use OOP and many others use functions with a switch (which I like much more).

List of wrappers I found:

With that! you will be able to code like this lazy fat: