
Scraping Lab


Last updated 5 years ago

Objectives

  • Scrape a web page's HTML using Nokogiri.

  • Use scraped data to give attributes to Ruby objects.

  • Write an object oriented Scraper class.

Overview

One of the most common use-cases for web scraping involves you, the programmer, scraping data that you will then use to instantiate your own Ruby objects. In this lab, we'll be scraping a static site that displays the course offerings of the Flatiron School. We'll be taking the scraped data to create our own Course objects. Each instance of the Course class will have a series of attributes. The values of each of these attributes will be scraped from the website.

We'll be creating two classes, Scraper and Course. The Scraper class will be responsible for using Nokogiri to scrape the relevant data. It will also be responsible for taking that data and using it to instantiate instances of the Course class.

Code Along I: The Course Class

Fork and clone this lab to get started!

Before we build our Scraper, we'll build the Course class. We know that the purpose of our scrape is to get data to assign to the attributes of Course class instances.

Let's take a look at the site we'll be scraping in order to get a sense of what attributes we should give our Course class instances. Go ahead and open up http://learn-co-curriculum.github.io/site-for-scraping/courses in your browser.

Scroll down until you are looking at the list of course offerings:

We can see that each course has a title, a schedule (either Part- or Full-Time) and a description. This seems like a great place to start in terms of defining our own Course class objects.

Open up lib/course.rb and define your class:

Now, let's run just the Course specs by typing rspec spec/course_spec.rb in the terminal. You should see the following test output:

Looks like we have some methods to define. Let's start with the instance method tests. The test output tells us that we need setters and getters for title, schedule and description. Let's make them attr_accessors:

class Course
  attr_accessor :title, :schedule, :description
end
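
As a rough sketch of what attr_accessor is doing for us, the class below spells out one getter and setter pair by hand. The name LonghandCourse is made up purely for illustration and is not part of the lab:

```ruby
# Hypothetical class for illustration only; the lab itself uses Course.
class LonghandCourse
  # Getter: returns whatever is stored in @title.
  def title
    @title
  end

  # Setter: stores a new value in @title.
  def title=(value)
    @title = value
  end
end

course = LonghandCourse.new
course.title = "Web Development Immersive"
puts course.title
```

Writing attr_accessor :title gives us both of these methods in a single line, which is why we prefer it in the Course class.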

Go ahead and run the test suite again. Now we should be passing all of our instance method tests! But not our class method tests...

Let's write the .all and .reset_all class methods. You should be familiar with the usage of the .all class method at this point. Remember that every time an instance of the class is initialized, we should push it into the @@all class variable. Get these tests passing with the following code:

class Course

  attr_accessor :title, :schedule, :description

  @@all = []

  def initialize
    @@all << self
  end

  def self.all
    @@all
  end

  def self.reset_all
    @@all.clear
  end

end
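
To see the @@all bookkeeping in action, here is a small sketch you could paste into irb. It repeats the Course class from above so that it runs on its own; the sample title is just a placeholder value:

```ruby
# Same Course class as above, repeated so this snippet is self-contained.
class Course
  attr_accessor :title, :schedule, :description

  @@all = []

  def initialize
    @@all << self
  end

  def self.all
    @@all
  end

  def self.reset_all
    @@all.clear
  end
end

first = Course.new
first.title = "Web Development Immersive"
second = Course.new

# Both instances were pushed into @@all when they were initialized.
puts Course.all.length
puts Course.all.first.title

# reset_all empties the collection without touching the objects themselves.
Course.reset_all
puts Course.all.empty?
```

Note that nothing removes courses automatically, so calling code that creates courses twice will leave duplicates in Course.all unless .reset_all is called in between.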

Run the Course specs again and we should be passing all the tests. We're ready to write our Scraper!

Code Along II: The Scraper Class

Start by running the Scraper specs with the rspec spec/scraper_spec.rb line in your terminal. You should see failing tests that describe a number of methods. Let's run through the desired behavior of each method:

#get_page

The #get_page instance method will be responsible for using Nokogiri and open-uri to grab the entire HTML document from the web page.

#get_courses

The #get_courses instance method will be responsible for using a CSS selector to grab all of the HTML elements that contain a course. In other words, the return value of this method should be a collection of Nokogiri XML elements, each of which describes a course offering. We're going to have to examine the page with the element inspector to find the CSS selector that contains the courses.

#make_courses

The #make_courses method will be responsible for actually instantiating Course objects and giving each course object the correct title, schedule and description attribute that we scraped from the page.

#print_courses

We made the #print_courses method for you! It calls #make_courses and then iterates over all of the courses that get created to puts out a list of course offerings. We gave you this freebie so that we can easily see how cool it is to scrape data and make real live Ruby objects with it.

Now that we have a basic concept of the methods we're expected to build, we're going to ignore them (surprise!). We've already discussed how tricky it is to scrape data from a web page. It is a very precise process and it takes a lot of playing around in Pry to find the right CSS selectors for the desired data. So, we're going to start by building our #get_page method. As soon as we get the HTML document using Nokogiri, we will drop into our program using Pry and play around with CSS selectors until we find what we're looking for. Once we have working code, we'll worry about organizing the appropriate code into the above-described methods.

Getting the HTML Doc and Finding Our Selectors

Open up lib/scraper.rb and define the #get_page method:

require 'nokogiri'
require 'open-uri'
require 'pry'

require_relative './course.rb'

class Scraper

  def get_page
      # more code coming soon!
  end

end

Notice that we are already requiring Nokogiri, open-uri and Pry at the top of the file. We are ready to use Nokogiri and open-uri to get our HTML. Add the following line to your #get_page method (on Ruby 3.0 and later, open-uri is called through URI.open rather than a bare open):

doc = Nokogiri::HTML(URI.open("http://learn-co-curriculum.github.io/site-for-scraping/courses"))

Then, we'll place a binding.pry on the next line. At the bottom of the file, outside of the class definition, we'll call Scraper.new.get_page. That way, we'll hit our binding and be able to play around with the HTML document in the terminal in order to find the CSS selectors we're looking for:

require 'nokogiri'
require 'open-uri'
require 'pry'

require_relative './course.rb'

class Scraper

  def get_page
      doc = Nokogiri::HTML(URI.open("http://learn-co-curriculum.github.io/site-for-scraping/courses"))

      binding.pry
  end

end

Scraper.new.get_page

Once your file looks like the code above, run the file with ruby lib/scraper.rb in your terminal. Once you hit your binding, type the doc variable into the terminal and you should see the HTML document, retrieved for us by Nokogiri and open-uri. You should see something like this:

Note: When you hit the pry, you may end up with a prompt that looks like this :. That just means there's a lot of text and you can scroll up and down using the arrow keys. If you need to escape this prompt so you can type in doc, then you simply have to press q or the esc key.

If you scroll down in your terminal, you should see more and more of the HTML document.

Okay, we're ready to find the CSS selector that will grab the course offering from the HTML document. How should we go about doing this? Should we guess? Should we manually read the entire HTML document, looking for the HTML elements that contain the course offerings? Nope. We're going to revisit the Flatiron website in the browser and use the developer tools of our browser to inspect the elements.

Let's take a closer look at the highlighted line in the element inspector:

<article class="post same-height-left" style="height: 489px;">

Looks like the element that contains an individual course has a class of "post". Let's use this CSS selector of .post to try to grab all courses.

Go back to your terminal and execute the following line:

doc.css(".post")

You should see something like this:

Whoa! That's a lot of XML. But, if you take a closer look at the content, you'll see that these Nokogiri XML elements do describe the individual courses. You'll notice course titles and descriptions, among other pieces of information.

Top-Tip: You can scroll down and view more of a long document like this in Pry by using the down arrow key. To stop scrolling and free up the command line so that you are still in Pry but able to type in and execute lines of code, hit q.

Okay, now that we have a working line of code for grabbing all of the courses from the page, let's operate on those courses in order to find the title, schedule and description of each one.

Finding CSS Selectors for The Desired Attributes

We know that a collection of Nokogiri XML elements functions like an array. So, it makes sense that we can iterate over the collection with an enumerator like .each or .collect in order to grab the title, schedule and description of each one. BUT, before we worry about iterating, let's grab just one element and try to identify the correct CSS selectors for title, schedule and description.

In your terminal, execute doc.css(".post").first. This will grab us just the first element from the collection. You should see something like this:

This describes just one course offering. If you look closely, you'll see it contains all the info we need. You can see the title, the schedule and the description. The easiest way to ID the correct CSS selector for extracting this information, however, is to revisit the web page and examine a course offering with our "inspect element" tool.

Scraping Course Title

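Hovering over the first course's title with the element inspector shows a tag reading h2 750.428 x 28px. We don't care about the height and width, but we do care about the selector, h2.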

Test the following code in your terminal:

doc.css(".post").first.css("h2")

You should see the following returned to you:

[#<Nokogiri::XML::Element:0x3fc350c47c34 name="h2" children=[#<Nokogiri::XML::Text:0x3fc350c4793c "Web Development Immersive">]>]

We're so close! The course title is right there, inside the Nokogiri::XML::Text element. Let's grab it:

doc.css(".post").first.css("h2").text

You should see the following return value:

"Web Development Immersive"

We did it! We found the code for grabbing an individual course's title. Let's do the same for schedule and description.

Scraping Course Schedule

In the element inspector, the schedule element shows up as em.date, so it has a class of "date". Let's use that CSS selector to grab the date of the first course.

In your terminal, execute:

doc.css(".post").first.css(".date").text

You should see the course's schedule (the "Part-Time" or "Full-Time" line) returned to you as a string.

Great, now we have the code for grabbing an individual course's schedule. Let's get that description.

Scraping Course Description

Once again, use the magnifying glass to hover over the first course's description. You should see a tag appear with the following text: p 750. blah blah some pixels. Okay, it looks like we have our selector: the p tag.

Try out the following line in your console:

doc.css(".post").first.css("p").text

You should see returned to you:

"An intensive, Ruby and Javascript course that teaches the skills necessary to start a career as a full-stack software developer."

We did it! We have the working code for grabbing:

  • The page itself:

    • doc = Nokogiri::HTML(URI.open("http://learn-co-curriculum.github.io/site-for-scraping/courses"))

  • The collection of course offerings:

    • doc.css(".post")

  • The title of an individual course offering:

    • doc.css(".post").first.css("h2").text

  • The schedule of an individual course offering:

    • doc.css(".post").first.css(".date").text

  • The description of an individual course offering:

    • doc.css(".post").first.css("p").text

Now we're ready to use our code to create Course objects and give them attributes.

Creating Course Objects with Scraped Attributes

Notice that the scraper.rb file includes this line near the top:

require_relative './course.rb'

We are requiring our Course class file so that our Scraper can make new courses and give them attributes scraped from the web page.

We know how to grab an array-like collection of course elements from the page with the doc.css(".post") line. We also know what code will grab us the title, schedule and description of an individual member of that collection.

So, we can iterate over the collection, make a new Course instance for each course offering element we are iterating over, and assign that instance the scraped title, schedule and description, using the working code for those attributes that we already figured out.

In your #get_page method of the Scraper class, place the following code:

require 'nokogiri'
require 'open-uri'
require 'pry'

require_relative './course.rb'

class Scraper

  def get_page

    doc = Nokogiri::HTML(URI.open("http://learn-co-curriculum.github.io/site-for-scraping/courses"))

    doc.css(".post").each do |post|
      course = Course.new
      course.title = post.css("h2").text
      course.schedule = post.css(".date").text
      course.description = post.css("p").text
    end
  end

end

For each iteration over the collection of Nokogiri XML elements returned to us by the doc.css(".post") line, we are making a new instance of the Course class and giving that instance the title, schedule and description extracted from the XML.

Place a binding.pry at the end of the method. Now, run the code in this file with ruby lib/scraper.rb. When you hit the binding, enter Course.all into your terminal and take a look at all the courses we made:

Wow! We have a collection of Course objects, each of which has attributes that we scraped from the website. We are such good programmers.

Extracting Our Code into Methods

Okay, we have some great working code. But, it doesn't really all belong in the #get_page method. The #get_page method should be responsible for just getting the page. Let's do some refactoring and get our Scraper tests passing!

#get_page

This method should contain only the code for getting the HTML document. Place the following code in your #get_page method and comment out the rest of that method. We'll need to refer to that code to get our other tests passing.

require 'nokogiri'
require 'open-uri'
require 'pry'

require_relative './course.rb'

class Scraper

  def get_page

    Nokogiri::HTML(URI.open("http://learn-co-curriculum.github.io/site-for-scraping/courses"))

    #doc.css(".post").each do |post|
      #course = Course.new
      #course.title = post.css("h2").text
      #course.schedule = post.css(".date").text
      #course.description = post.css("p").text
    #end
  end

end

Run your Scraper test suite with rspec spec/scraper_spec.rb. Your first test should be passing.

#get_courses

The #get_courses method should operate on the HTML page (which is the return value of the #get_page method) and return the collection of Nokogiri XML elements that describe each course. So, we'll call on our #get_page method inside the #get_courses method.

def get_courses
  self.get_page.css(".post")
end

Run the test suite again and the second test should be passing.

#make_courses

The #make_courses method should operate on the collection of course offering Nokogiri XML elements that was returned by the #get_courses method. The #make_courses method should iterate over the collection and make a new instance of the Course class for each one while assigning it the appropriate attributes:

def make_courses
  self.get_courses.each do |post|
    course = Course.new
    course.title = post.css("h2").text
    course.schedule = post.css(".date").text
    course.description = post.css("p").text
  end
end

Run the test suite again and all of your tests should be passing!

Now, just for fun, place the following line at the bottom of lib/scraper.rb:

Scraper.new.print_courses

Ta-da! We did it. Check out all of those awesome courses printed out to your terminal. If you're still having trouble getting your tests to pass, check out the final code below:

Final Code: The Scraper Class

require 'nokogiri'
require 'open-uri'

require_relative './course.rb'

class Scraper

  def get_page
    Nokogiri::HTML(URI.open("http://learn-co-curriculum.github.io/site-for-scraping/courses"))
  end

  def get_courses
    self.get_page.css(".post")
  end

  def make_courses
    self.get_courses.each do |post|
      course = Course.new
      course.title = post.css("h2").text
      course.schedule = post.css(".date").text
      course.description = post.css("p").text
    end
  end

  def print_courses
    self.make_courses
    Course.all.each do |course|
      if course.title && course.title != ""
        puts "Title: #{course.title}"
        puts "  Schedule: #{course.schedule}"
        puts "  Description: #{course.description}"
      end
    end
  end

end

Scraper.new.print_courses

For the course container: open http://learn-co-curriculum.github.io/site-for-scraping/courses and once again scroll down to the section of the page that lists the course offerings. Right-click on any course offering and select "inspect element". You should see something like this in your browser:

For the course title: go back to the site and open up the element inspector again. Click the symbol in the upper left of your console (it looks like an arrow cursor pointing into a box), then hover over the title of the first course offering. You should see a tag appear when you hover over the course title with this tool. The tag should say h2 750.428 x 28px.

For the course schedule: go back to the site and open up the element inspector again. Use the magnifying glass symbol to hover over the schedule of the first course offering. You should see a tag appear when you hover over the schedule (the line that reads "Part-Time" or "Full-Time") that reads em.date ...
