Building A Page Scraper With Ruby and Mechanize

https://buyanddefi.com

Just like interest rates, staking/DeFi rates change periodically. Collecting this information by hand from different crypto exchange websites is a painstaking task, and unfortunately very few exchanges provide an API for it. So, I decided to write a page scraper to collect the data and upload it to a database.

Enter Mechanize

Years ago, I used a Perl module called Mechanize to “browse” the web and collect data from websites. I was happy to see that there is a Ruby gem with the same name and capabilities. Here’s a description from the GitHub repo:

The Mechanize library is used for automating interaction with websites. Mechanize automatically stores and sends cookies, follows redirects, and can follow links and submit forms. Form fields can be populated and submitted. Mechanize also keeps track of the sites that you have visited as a history.

So, I decided to try it out on Binance US.

When you visit https://binance.us/staking, the staking offers are listed in a series of cards on the page. To get that information, I needed to write a script that did the following:

  1. Visit the staking page
  2. Identify all the cards on the page
  3. Extract the Symbol, Name, and Rate from each card
  4. Store it in an object (JSON)

Visiting the staking page

After installing the mechanize gem, the first thing I needed to do was create a Mechanize object and set its user agent to 'Windows Chrome'.

require 'mechanize'

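# 'Windows Chrome' is one of Mechanize's built-in aliases; user_agent_alias
# expands it to a full browser User-Agent string, whereas user_agent= would
# send the literal text 'Windows Chrome'
a = Mechanize.new { |agent|
  agent.user_agent_alias = 'Windows Chrome'
}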

Then, I needed to visit the staking page.

page = a.get('https://www.binance.us/staking')

To see what Mechanize received, you can inspect the page object (for example with pp page), which prints the following:

#<Mechanize::Page
{url #<URI::HTTPS https://www.binance.us/staking>}
{meta_refresh}
{title "Earn Rewards With Crypto Staking | Binance.US"}
{iframes
#<Mechanize::Page::Frame
nil
"https://www.googletagmanager.com/ns.html?id=GTM-5PWJS9J">}
{frames}

Identify all the cards on the page

Mechanize provides a method called search. Passing it an XPath expression, I was able to collect all the elements on the page that represented a card.

xpath = '//*[@id="__APP"]/div/div[1]/div/div[2]/div/div/div/div/child::*'
cards = page.search(xpath)

Extract the Symbol, Name, and Rate from each card

Now comes the fun part. Inspecting the HTML of one of the cards collected by Mechanize, I could see that everything I needed was in the markup returned by the search method.

<div class="card-item">
<div class="sc-bcXHqe fLkoyl item-intro">
<div class="item-icon">
<img style="border-radius:100%;overflow:hidden;display:block" width="40" height="40" src="https://static.binance.us/image/admin_mgs_image_upload/20210202/b2a3e592-3659-4e5d-9ef6-474cb9f61963.png">
</div>
<div class="sc-bcXHqe lcVLsA item-info">
<div>
<h3 class="sc-bcXHqe gcebil">ETH</h3>
<p style="white-space:nowrap">Ethereum</p>
</div>
<div class="item-info-rate" style="color:#828384">
<span style="font-size:20px">5.00<!-- -->%</span><span style="font-size:20px"> APY</span>
</div>
</div>
</div>
<div class="sc-bcXHqe lcVLsA item-btns">
<button data-bn-type="button" class="sc-bcXHqe ibyizy">Stake <!-- -->ETH</button><button data-bn-type="button" class="sc-bcXHqe kqxVKL">Buy <!-- -->ETH</button>
</div>
</div>

The values that need to be extracted are all in this HTML:

  • Symbol (ETH)
  • Name (Ethereum)
  • APY (5.00)

Playing around with the methods element_children, elements, children, and text, I was able to traverse the HTML and find the node whose text contained the symbol ETH.

coin[:symbol] = card.element_children[0].element_children[0].elements.children.children[0].text

Repeating the process, I retrieved the name and the APY. Because I’m only interested in the numeric rate, I dropped the % symbol during extraction using the regular expression /(.*)%/.

coin[:name] = card.element_children[0].element_children[0].elements.children.children[1].text
coin[:rate] = card.element_children[0].element_children[0].elements.children.children[2].text[/(.*)%/, 1]
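
To make the regular expression step concrete, here is a small standalone sketch of how String#[] behaves when given a pattern and a capture-group index. The "5.00%" string is the text of the rate span in the card above; the "APY" string and the PERCENT constant are only there for illustration.

```ruby
# String#[] accepts a regular expression plus a capture-group index and
# returns the text matched by that group, or nil when nothing matches.
PERCENT = /(.*)%/

rate    = "5.00%"[PERCENT, 1]   # text of the rate <span> in the card above
missing = "APY"[PERCENT, 1]     # no % sign, so there is nothing to capture

puts rate.inspect
puts missing.inspect
```

The nil case is worth keeping in mind: if a card ever shows a rate without a % sign, the hash will hold nil rather than raise an error.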

With one card down, I added a loop that iterates through all the cards collected by the earlier search call, stores each card's information in a hash, and appends that hash to an array.

# initialize an array to hold one hash per coin
binance = []
cards.each do |card|
  # the symbol, name, and rate nodes sit at the same depth, so walk there once
  fields = card.element_children[0].element_children[0].elements.children.children
  # initialize a hash to store the extracted information
  coin = {}
  coin[:symbol] = fields[0].text
  coin[:name] = fields[1].text
  coin[:rate] = fields[2].text[/(.*)%/, 1]
  # add the coin to the array
  binance << coin
end

And there you have it. As of this writing, the staking rates from Binance US are:

[{:symbol=>"ETH", :name=>"Ethereum", :rate=>"5.00"},
{:symbol=>"BNB", :name=>"BNB", :rate=>"2.50"},
{:symbol=>"ADA", :name=>"Cardano", :rate=>"4.30"},
{:symbol=>"MATIC", :name=>"Polygon", :rate=>"6.40"},
{:symbol=>"VET", :name=>"VeChain", :rate=>"2.00"},
{:symbol=>"SOL", :name=>"Solana", :rate=>"7.70"},
{:symbol=>"ATOM", :name=>"Cosmos", :rate=>"15.00"},
{:symbol=>"ONE", :name=>"Harmony", :rate=>"7.20"},
{:symbol=>"AVAX", :name=>"Avalanche", :rate=>"5.40"},
{:symbol=>"DOT", :name=>"Polkadot", :rate=>"10.30"},
{:symbol=>"NEAR", :name=>"NEAR Protocol", :rate=>"7.00"},
{:symbol=>"ALGO", :name=>"Algorand", :rate=>"6.80"},
{:symbol=>"FTM", :name=>"Fantom ", :rate=>"3.40"},
{:symbol=>"GRT", :name=>"The Graph", :rate=>"6.70"},
{:symbol=>"XTZ", :name=>"Tezos ", :rate=>"3.00"},
{:symbol=>"ROSE", :name=>"Oasis Network", :rate=>"4.50"},
{:symbol=>"FLOW", :name=>"FLOW", :rate=>"7.00"},
{:symbol=>"TRX", :name=>"TRON", :rate=>"4.90"},
{:symbol=>"AUDIO", :name=>"Audius", :rate=>"12.80"},
{:symbol=>"LPT", :name=>"Livepeer", :rate=>"11.40"},
{:symbol=>"FET", :name=>"Fetch.ai", :rate=>"3.60"},
{:symbol=>"KSM", :name=>"Kusama", :rate=>"11.80"},
{:symbol=>"CELR", :name=>"Celer", :rate=>"4.80"},
{:symbol=>"BAND", :name=>"Band Protocol", :rate=>"12.00"},
{:symbol=>"SKL", :name=>"SKALE", :rate=>"7.00"},
{:symbol=>"T", :name=>"Threshold", :rate=>"5.50"}]
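
The scraped hashes still hold each rate as a string, and a couple of names ("Fantom " and "Tezos ") carry trailing whitespace. Before loading the data anywhere, a small cleanup pass along these lines might help. This is only a sketch: the normalize helper is hypothetical, and the three sample rows are copied from the output above.

```ruby
require 'json'

# Three rows copied from the scraped output above.
scraped = [
  { symbol: "ETH", name: "Ethereum", rate: "5.00" },
  { symbol: "FTM", name: "Fantom ",  rate: "3.40" },
  { symbol: "XTZ", name: "Tezos ",   rate: "3.00" }
]

# Hypothetical helper: trims stray whitespace and converts the rate to a Float.
def normalize(coin)
  {
    symbol: coin[:symbol].strip,
    name:   coin[:name].strip,
    rate:   Float(coin[:rate])
  }
end

cleaned = scraped.map { |coin| normalize(coin) }
puts JSON.pretty_generate(cleaned)
```

Float() is used here instead of to_f because it raises on malformed or missing input, which surfaces a scraping problem early instead of silently storing 0.0.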

Stay tuned to see how I use Kiba, an ETL framework, to upload this information to Buy And Defi’s database.
