👨🏿‍💻
Learn
  • Learn And The Power Of Community
  • Intro
    • learn-co-curriculum/welcome-to-learn-verified
    • learn-co-curriculum/your-first-lab
    • learn-co-curriculum/environment-setup
  • Intro to tic tac toe
    • matz-readme
    • what-is-a-program?
    • hello world
    • Intro to irb
    • Reading-error-messages
    • Data-types
    • variable
    • Variable-assignment lab
    • String interpolation
    • Interpolation-super-power
    • Welcome to tic tac toe
    • Array
    • Tic tac toe board
    • Intro to methods
    • Puts print and return
    • Intro-to-tdd-rspec-and-learn
    • Methods and arguments
    • Say hello (lab)
    • Methods-default-arguments
    • ttt-3-display_board-example
    • ttt-4-display-board-rb
    • Intro-to-cli-applications
    • Greeting-cli
    • cli-applications in Ruby
    • Ruby-gets-input
    • Tic tac toe move
    • Truthiness-in-ruby-readme
    • booleans
    • conditional (if)
    • ttt-6-position-taken
    • ttt-7-valid-move
    • rspec-fizzbuzz
    • Looping-introduction
    • Loop
    • while-and-until-loop
    • Tic Tac Toe Turn lab
    • looping-while-until lab
    • Tic Tac Toe Play Loop (lab)
    • Tic Tac Toe Current Player (lab)
    • Intro to ruby iterators
    • Nested Arrays
    • Boolean Enumerators
    • Search Enumerators
    • Tic Tac Toe Game Status
    • tic-tac-toe
  • OOP tic tac toe
    • intro to oop
    • Intro-to-classes-and-instances
    • Classes-and-instances-lab
    • Instance-methods
    • Instance-methods-lab
    • Object Attributes
    • object-attributes-lab
    • Object Initialization
    • Object-initialize-lab
    • oop barking dog lab
    • Procedural-vs-oop
    • oop tic tac toe
  • Git and github
    • Intro to Version Control
    • Git Repository Basics
    • Git-basics-quiz
    • Forks-and-clones
    • Git Remotes and Github
    • Git Remotes and Github Codealong
    • Thinking Ahead: GitHub as Career Differentiator
    • Github Pull Requests
    • Git Collaboration
    • Git-collaboration-quiz
    • Git Basics Quiz
  • HTML
    • A-quick-tour-of-the-web
    • The-web-is-made-of-strings
    • What-makes-the-web-possible?
    • html-introduction
    • Your first-html-tag-lab
    • Nested-tags-and-attributes
    • Well-formed-html-document-lab
    • HTML elements types overview
    • Researching-HTML-elements
    • Separation-of-content-and-presentation
  • CSS
    • Introduction-to-css
    • introduction-to-css-code-along
  • Procedural Ruby
    • Regex-what-is-a-pattern
    • Regex-basics
    • Regex-lab
    • Regex-match-scan-grep-methods
    • learn-co-curriculum/method-arguments-lab
    • Method-scope
    • Return Values Lab
    • Debugging-with-pry
    • Method-scope-lab
    • Truthiness-code-challenge
    • If Statements Lab
    • Case-statements
    • Case-statements-quiz
    • Logic and Conditionals Quiz
    • Ternary Operators and Statement Modifiers lab
    • Looping Lab
    • looping-quiz
    • learn-co-curriculum/looping-times
    • countdown-to-midnight lab
    • Array introduction
    • Using Arrays
    • Array-CRUD-lab
    • Array-methods
    • Array-methods-lab
    • Square array lab
    • Collect and Return Values
    • Collect Lab
    • Badges and Schedules Lab
    • Oxford comma lab
    • Deli counter lab
    • Reverse Each Word Lab
    • Yield-and-blocks
    • Each Lab
    • Return from Yield Statements
    • My All? Lab
    • My Find Lab
    • Cartoon Collections Lab
    • Enumerators Code Challenge
    • Prime? Lab
    • Sorting
    • Sorting Lab
    • Introduction to Hashes
    • Using Hashes lab
    • Ruby-symbols
    • Hash iteration
    • Hash Iteration Lab
    • Hash Iteration with Collect
    • Intro to Nested Hashes
    • Building Nested Hashes
    • Building Nested Hashes Lab
    • Nested Hash Iteration
    • Nested Hashes Lab
    • Multitype Collections Lab
    • Iterating over Nested Hashes Codealong
    • Other Hashes Codealong
    • Hashes Manipulation Lab
  • OOP Ruby
    • OO Ruby Video: Object Orientation Overview
    • Object Accessors
    • Instance Variables lab
    • Video Review: Object Properties
    • Meowing Cat
    • Intro to Object Orientation - Ruby
    • oo basics lab
    • OO Basics with Class Constants
    • Self
    • OO School Domain lab
    • OO Counting Sentences lab
    • Class Variables and Methods
    • Class Variables and Methods Lab
    • Remembering Objects
    • Puppy Lab
    • Advanced Class Methods
    • Advanced Class Methods Lab
    • Video Review: Object Models
    • OO Email Parser lab
    • OO Anagram Detector lab
    • OO Cash Register lab
    • Intro to Object Relationships
    • Belongs to Object Lab
    • Has Many Object
    • Has Many Object Lab
    • Collaborating Objects Review
    • Collaborating Objects Lab
    • OO My Pets
    • oo kickstarter lab
    • OO Banking lab
    • Has Many Objects Through
    • Has Many Objects Through Lab
    • Intro to Inheritance
    • Intro to Inheritance Lab
    • Super
    • Super Lab
    • Intro to Modules
    • Intro to Modules Lab
    • Mass Assignment
    • Mass Assignment and Metaprogramming
    • Mass Assignment Lab
    • Custom Errors lab
    • OO Triangle lab
  • Scraping and project
    • Gems and Bundler
    • Scraping
    • Scraping Lab
    • Kickstarter Scraping Lab
    • Video Review: Object Orientation and Scraping
    • OO Ruby Object Orientation Video Review
    • Music Library CLI
    • Video Review: Music Library CLI
    • Tic-tac-toe with AI project
    • Student Scraper
    • CLI Data Gem Portfolio Project
    • CLI Data Gem Walkthrough
    • CLI Data Gem Walkthrough: Creating a CLI Scraper Gem
    • Common Anti-Patterns in CLI Data Gem
    • Student Example 1: Refactoring CLI Gem
    • Student Example 2: Refactoring CLI Gem
  • SQL
    • What is SQL
    • SQL Intro and Installation
    • SQL Database Basics
    • SQL Databases and Text Editors
    • SQL Data Types
    • SQL Inserting, Updating, and Selecting
    • Basic SQL Queries
    • SQL Aggregate Functions
    • SQL Aggregate Functions Lab
    • SQL Bear Organizer Lab
    • Edgar Codd and Table Relations
    • Table Relations
    • SQL JOINS
    • SQL Complex Joins
    • SQL Join Tables
    • Grouping and Sorting Data
    • SQL Joins Review Lectures
    • SQL Crowdfunding Lab
    • SQL Library Lab
    • Pokemon Scraper Lab
  • ORM And Active record
    • Why an ORM is Useful
    • Mapping Ruby Classes to Database Tables
    • Mapping Classes to Tables Lab
    • Mapping Database Tables to Ruby Objects
    • Mapping Database Rows to Objects Lab
    • Updating Records in an ORM
    • Updating Records in an ORM Lab
    • Preventing Record Duplication
    • ORMs Lab: Bringing It All Together lab
    • Dynamic ORMs
    • Dynamic ORMs with Inheritance
    • ActiveRecord Mechanics
    • Translating from ORM to ActiveRecord
    • Intro to Rake
    • Mechanics of Migrations
    • Writing Our Own Migrations
    • Migrations and Active Record Lab
    • ActiveRecord CRUD Lab
    • Advanced Finding Lab
    • ActiveRecord Associations
    • ActiveRecord Associations Lab
    • ActiveRecord Associations Lab II
    • ActiveRecord Associations Video Review
    • ActiveRecord Associations Video Review II
    • Video Review: Aliasing ActiveRecord Associations
    • Video Review: Blog CLI with ActiveRecord and Associations
  • Rack
    • How the Internet Works
    • Increasing Layers of Abstraction
    • Inspecting the Web with Rack (lab)
    • The HTTP Request
    • Dynamic URL Routes
    • Dynamic Web Apps with Rack (lab)
    • Rack Responses Lab
    • Rack Routes and GET Params Lab
    • HTTP Status Codes
    • Dynamic URLs and Status Codes Lab
    • Video Review: How The Web Works, Pt 1
    • Video Review: How the Web Works, Pt 2
  • Html
    • How the Web Works
    • Site Planning
    • HTML Fundamentals
    • HTTP Status Codes
    • video review how the web works pt 1
    • How the Web Works, Part 2: Overview
    • Setting Up a New Site
    • Document Structure
    • Text Formatting
    • HTML Tables
    • Html-images
    • HTML Links
    • Html backing-up changes
    • HTML Validation
    • Quiz - HTML Fundamentals
    • Dev Tools Super Power
    • HTML Lists
    • Html issue bot 9000 (lab)
    • HTML Forms and Iframes
    • HTML Map and Contact Form Code-along
    • HTML5 Media
    • HTML5 Video Embed Code-Along
    • HTML5 Semantic Elements
    • HTML5 Semantic Containers Code-along
    • HTML5 Quiz
  • CSS
    • CSS Fundamentals
    • CSS Styling Code Along
    • My Little Rainbow
    • CSS Kitten Wheelbarrow
    • CSS Graffiti Override Lab
    • CSS Issue Bot 9000
    • Your first deployment
    • The Box Model
    • Layout Types
    • Float
    • Clearfix
    • Centering
    • Column Structure
    • CSS Columns Code Along Exercise (lab)
    • Box Model & Page Layout
    • Using Z Index
    • Positioning
    • ZHW Shoes Layout (lab)
    • Zetsy (lab)
    • CSS Box Style Code Along
    • Animal Save (lab)
    • Building Responsive Sites
    • Intro to Responsive Media
    • CSS Media Queries
    • Working with Responsive Type
    • Responsive layout
    • The Viewport Property
    • Responsive Features Code-Along (lab)
    • Bootstrap Introduction
    • Bootstrap Code-Along
    • Bootstrap Grid System
    • Grid Layout Code-Along
    • Bootstrap Navbar Code-Along
  • Sinatra
    • What is Sinatra?
    • Sinatra From Scratch
    • Using the Shotgun Development Server (lab)
    • Sinatra Basics
    • Sinatra Hello World Basics (lab)
    • Routes in Sinatra
    • Sinatra Routes Lab
    • Intro To MVC
    • Sinatra MVC File Structure (lab)
    • Sinatra Views: Using ERB
    • Sinatra Views (lab)
    • Sinatra Basic Views Lab
    • Sinatra Views Lab II
    • Intro To Capybara
    • Dynamic Routes in Sinatra
    • HTML Forms and Params
    • Passing Data Between Views and Controllers in Sinatra
    • Sinatra Forms Lab
    • Sinatra Yield Readme
    • Integrating Models Sinatra Code-along
    • Sinatra MVC Lab - Pig Latinizer
    • Sinatra Basic Forms Lab
    • Sinatra Forms
    • Nested Forms Readme
    • Sinatra Nested Forms Lab: Pirates!
    • Lab Review-- Sinatra Nested Forms Lab: Pirates
    • Sinatra Nested Forms Lab: Superheroes!
    • Sessions and Cookies
    • Mechanics of Sessions
    • Sinatra Basic Sessions Lab
    • Using Sessions
    • Sinatra and Active Record CRUD
    • Sinatra Activerecord Setup
    • Sinatra ActiveRecord CRUD
    • User Authentication in Sinatra
    • Sinatra Sessions Lab - User Logins
    • Securing Passwords
    • Secure Password Lab
    • Sinatra Authentication- Overview
    • RESTful Routes
    • Restful Routes Basic Lab
    • Sinatra ActiveRecord Associations: Join Tables
    • Using Tux in Sinatra with ActiveRecord
    • ActiveRecord Associations in Sinatra
    • Sinatra Multiple Controllers
    • Sinatra and Active Record: Associations and Complex Forms
    • Sinatra Playlister (lab)
    • Welcome to NYC Sinatra! (lab)
    • Building a Site Generator, Part 1- Overview
    • Building a Site Generator, Part 2- Overview
    • Fwitter Group Project
  • Rails
    • Welcome To Rails
      • Rails Application Basics
      • Rails Static Request
      • Rails Hello World Lab
      • Rails Model View Controller
      • Intro to Rails- Overview
    • Intro to REST
    • Active Record Models and Rails
    • ActiveRecord Model Rails Lab
    • RESTful Index Action Lab
    • Rails Dynamic Request
    • Rails Dynamic Request Lab
    • Rails URL Helpers
    • Rails URL Helpers Lab
    • Rails form_tag
    • Rails form_tag Lab
    • Create Action
    • Create Action Lab
    • Index, Show, New, Create Lab
    • Edit/Update Action
    • form_for on Edit
    • Strong Params Basics
    • form_for Lab
    • Rails Generators
    • CRU with form_for Lab
    • Resource and Scaffold Generator
    • Rails Blog scaffold
    • Todo mvc assets and managing lists
    • Rails Forms Overview
    • ActiveRecord Validations
    • ActiveRecord Validations Lab
    • Validations in Controller Actions
    • Validations In Controller Actions Lab
    • Validations with form_tag
    • Validations with form_for
    • DELETE Forms and Requests
    • Testing in Rails
    • Validations with form_tag
    • CRUD With Validations Lab
    • Join the Fun rails
    • Activerecord lifecycle reading
    • Displaying Associations Rails
    • Active Record Associations Review
    • Forms And Basic Associations Rails
    • Forms And Basic Associations Rails Lab
    • Basic Nested Forms
    • Displaying Has Many Through Rails
    • Displaying Has Many Through Rails Lab
    • Has Many Through Forms Rails
    • Has Many Through Forms Rails Labs
    • Has Many Through in Forms Lab Review- Overview
    • Deep Dive into Nested Forms- Overview
    • Layouts And Templates in Rails
    • Rails Layouts And Templates Lab
    • Simple Partials
    • Simple Partials Lab
    • Partials with Locals
    • Partials with Locals
    • Refresher on MVC
    • Refactoring Views With Helpers
    • Refactoring Views With Helpers Lab
    • Model Class Methods
    • Optimal Queries using Active Record (lab)
    • Routing And Nested Resources
    • Nested Resource Routing Lab
    • Modifying Nested Resources
    • Modifying Nested Resources Lab
    • Namespaced Routes
    • Namespaced Routes Lab
    • Todomvc 2 lists have items
    • TodoMVC 3: Mark Items Complete
    • Todomvc 4 refactoring with partials and helpers
    • Todomvc 5 deleting items
    • Introduction to Authentication and Authorization
      • Cookies and sessions
      • Cookies and Sessions Lab
      • Sessions Controller
      • Sessions Controller Lab
      • Login Required Readme
      • Login Required Lab
      • Using has_secure_password
      • Has_secure_password lab
      • Authentication- Overviewn
      • OmniAuth
      • Omniauth Lab
      • Omniauth review lecture in todomvc
      • Authentication and authorization recap and gems
    • Rails Amusement Park lab
    • How to Find Gems
  • JavaScript
    • Intro to JavaScript
      • JavaScript Data Types
      • JavaScript Data Types Quiz
      • JavaScript Variables
      • JavaScript Comparisons
      • Conditionals
      • Logical Operators
      • Functions
      • Intro to Debugging
      • Intro to Testing
      • JavaScript Basics Quiz
    • Scope
      • Scope chain
      • JavaScript Practice Scope Lab
      • Lexical scoping
      • Errors and Stack Traces
      • Hoisting
    • Arrays And Objects
      • Objects
      • JavaScript: Objects and Arrays Quiz
      • Object Iteration
      • JavaScript Logging
      • Traversing Nested Objects
      • Filter
      • Map
    • Functions Revised
      • First-Class Functions Lab
      • First-Class Functions
      • First-Class Functions Practice
      • First-Class Functions Practice Lab
    • OOP
      • Creating Objects
      • Object Methods and Classes
      • Using Prototypes
      • Using Classes in Javascript
      • JavaScript This Walkthrough
      • This Code-along
      • Bind, Call, and Apply Readme
      • Bind, Call, Apply Lab
      • Object Relations
      • Association Methods in Javascript
      • Class Relations Lab
      • JavaScript Closures and Higher Order Functions
      • Closures Lab
      • JavaScript Arrow Functions
      • Daily Lunch Lab
    • DOM
      • Introduction to the DOM
      • Introduction to the DOM Lab
      • More on the DOM
      • Creating and Inserting DOM Nodes
      • The DOM Is a Tree
      • Listening to Nodes
      • Modify HTML With jQuery
      • Modifying HTML Lab
      • jQuery Selectors
      • Document.ready
      • Acting On Events Lab
      • DOM Quiz
    • Templates
      • Introduction to CSS
      • CSS Quiz
      • CSS Libraries
      • CSS Libraries Lab
      • Intro to Templates
      • Template Engines
      • Template Engines Lab
      • Advanced Templating
      • Advanced Templating Lab
    • Asynchronous JavaScript
      • Intro to XHR Code Along
      • Hitting APIs Lab
      • Advanced AJAX Lab
      • AJAX and Callbacks
      • AJAX and Callbacks Lab
      • REST Refresher
      • REST Quiz
      • Fetch
      • JavaScript fetch() Lab
      • Intro to Mocha
      • Testing with Spies
      • Testing with Mocks and Stubs
  • Rails and JavaScript
Powered by GitBook
On this page
  • Objectives
  • Overview
  • Before You Begin
  • Instructions
  • The Scraper Class
  • The Student Class
  • Our Code in Action
  1. Scraping and project

Student Scraper

PreviousTic-tac-toe with AI projectNextCLI Data Gem Portfolio Project

Last updated 5 years ago

Objectives

  • Build two classes, a class that is responsible for scraping data from a web page and a class that uses that data to instantiate new objects.

  • Understand the contents of a third class that is responsible for the command line interface.

  • Use meta-programming to instantiate new instances of a class and add attributes to instances of a class.

Overview

In this lab, you'll be scraping your Learn.co student website. You'll use the index page to grab a list of current students and instantiate a series of Student objects. You'll scrape the individual profile pages of each student to add attributes to each individual student.

We've given you a third class, CommandLineInterface that is responsible for generating students using both the Student and Scraper classes. As you go through this lab, take some time to read through the code in the CommandLineInterface class. Try to understand how it works and how it uses the code in our other classes to actually create, add attributes, and display students to the user via the command-line interface.

We've also given you an executable file in bin/run that you can execute once you get your tests passing to see your code in action!

Before You Begin

For this project, we'll be scraping data from the student profiles website. Sort of. We can't really give you a project with a full test suite and base those tests on the real live Learn.co website on the real live internet. Why? Because websites change! They get new styling or new information, or they break because someone did something wrong. It happens! So, any tests we write would be based on the website at a given point in time. The code you write to pass those tests would assume that you are scraping a website that may have since changed. Your scraper would pass tests but fail to actually scrape the web page if you tried to run your code by sending a real web request to the real website. That would be terrible! Then you couldn't see your code in action or view the page you were writing code to scrape. Just awful.

Don't worry! We've very cleverly solved this problem for the purposes of this project. We've created a mock-up student site,

This copy is being maintained only for the purposes of this project, so we don't have to worry about things like the styling changing or the code-breaking and affecting our scraper code.

Instructions

Run bundle install first.

The Scraper Class

Let's start with the Scraper class in lib/scraper.rb. In this class you are responsible for defining two methods. The #scrape_index_page method is responsible for scraping the index page that lists all of the students and the #scrape_profile_page method is responsible for scraping an individual student's profile page to get further information about that student.

The .scrape_index_page Method

This is a class method that should take in an argument of the URL of the index page, for the purposes of our test the URL will be https://learn-co-curriculum.github.io/student-scraper-test-page/index.html. This method should use Nokogiri and Open-URI to access that page. The return value of this method should be an array of hashes in which each hash represents a single student. The keys of the individual student hashes should be :name, :location and :profile_url.

Here's a look at the desired behavior:

Scraper.scrape_index_page(index_url)
# => [
        {:name => "Abby Smith", :location => "Brooklyn, NY", :profile_url => "students/abby-smith.html"},
        {:name => "Joe Jones", :location => "Paris, France", :profile_url => "students/joe-jonas.html"},
        {:name => "Carlos Rodriguez", :location => "New York, NY", :profile_url => "students/carlos-rodriguez.html"},
        {:name => "Lorenzo Oro", :location => "Los Angeles, CA", :profile_url => "students/lorenzo-oro.html"},
        {:name => "Marisa Royer", :location => "Tampa, FL", :profile_url => "students/marisa-royer.html"}
      ]

Top-Tip: Remember to use the element inspector in your browser's developer tools to examine each element whose value you are trying to scrape. You can also use binding.pry and experiment with different element selectors in your terminal. It takes a lot of trial and error to find the correct selectors for the desired element.

The .scrape_profile_page Method

This is a class method that should take in an argument of a student's profile URL. It should use Nokogiri and Open-URI to access that page. The return value of this method should be a hash in which the key/value pairs describe an individual student. Some students don't have a Twitter or some other social link. Be sure to be able to handle that. Once written, the following code should return a hash containing info from the provided profile_url:

Scraper.scrape_profile_page(profile_url)

And the returned hash should look like the following:

{
  :twitter=>"http://twitter.com/flatironschool",
  :linkedin=>"https://www.linkedin.com/in/flatironschool",
  :github=>"https://github.com/learn-co",
  :blog=>"http://flatironschool.com",
  :profile_quote=>"\"Forget safety. Live where you fear to live. Destroy your reputation. Be notorious.\" - Rumi",
  :bio=> "I'm a school"
}

The only attributes you need to scrape from a student's profile page are the ones listed above: the Twitter URL, LinkedIn URL, GitHub URL, blog URL, profile quote, and bio. The hash you build using those attributes should be formatted like the one in the example above.

Why class methods?

Why are our scraping methods being defined as class methods? Well, we don't need to store any information about the Scraper once it has completed the job of scraping. We simply need to scrape some information and pass that information along to our Student class. So, we don't need to produce instances of Scraper that maintain their own attributes.

The Student Class

We've already given you the attr_accessors that you are required to have for each individual student.

The student class will use the information returned by the above methods from our Scraper class to create students and add attributes to individual students. However, the Student class shouldn't know about the Scraper class. This means that the Student class shouldn't directly interact with the Scraper class––it shouldn't call on the Scraper class in any of its methods or take in the Scraper class itself as an argument. Why is this? We want our program to be as flexible as possible.

We can imagine any number of applications that use a Student model. So we don't want our Student model to be dependent on how it gets information regarding the students it creates. It should simply be ready to take in that information, regardless of its source (be it scraping, a .csv file, or a form on a website).

The #create_from_collection(students_array)

This class method should take in an array of hashes. In fact, we will call Student.create_from_collection with the return value of the Scraper.scrape_index_page method as the argument. The #create_from_collection method should iterate over the array of hashes and create a new individual student using each hash. This brings us to the #initialize method on our Student class.

The #initialize Method

The #initialize method should take in an argument of a hash and use meta-programming to assign the newly created student attributes and values per the key/value pairs of the hash. Use the #send method to achieve this. This method should also add the newly created student to the Student class' @@all array of all students. You'll need to create this class variable and set it equal to an empty array at the top of your class. Push self into the array at the end of the #initialize method.

The #add_student_attributes Method

This instance method should take in a hash whose key/value pairs describe additional attributes of an individual student. In fact, we will be calling student.add_student_attributes with the return value of the Scraper.scrape_profile_page method as the argument.

The #add_student_attributes method should iterate over the given hash and use meta-programming to dynamically assign the student attributes and values per the key/value pairs of the hash. Use the #send method to achieve this.

Important: The return value of this method should be the student itself. Use the self keyword.

The .all Method

This class method should return the contents of the @@all array.

Our Code in Action

Now that you have all your tests passing, you can run our executable file, which relies on our CommandLineInterface class.

We've provided you with all of the code in the CommandLineInterface class. Take a few minutes to read through this class and gain a strong understanding of how it uses the code you wrote in your Scraper and Student classes to make a request to the local files and scrape the students.

Now run the executable file with ruby bin/run. You should see all of the students you scraped and instantiated puts-ed out to the terminal. Great job!

Reminder: Nokogiri will take care of parsing the actual HTML page. Once parsed, we can use Nokogiri's .css method to access specific elements based on their class or id. After reading the instructions below, inspect the in a browser to find the appropriate CSS classes for elements you need to scrape. For more information, check out for additional information

View on Learn.co and start learning to code for free.

available here as a GitHub Page
website
Nokogiri's reference materials
Student Scraper