How to Use Scrapy to Feed an SQLite Database

In-Depth Analysis of the Website

Before creating a database, we need to decide which data to import and how many tables are needed. That means analyzing the website we want to scrape first.

It’s important to choose a website with dynamic URLs, so that the data retrieved depends on the URL. The example used in this tutorial is the class enrollment listing offered by Towson University.

[callout type=”warning” size=”lg”]

Dynamic URLs

If the site you want to scrape doesn’t use dynamic URLs, for instance because its content is loaded by JavaScript, Scrapy on its own cannot retrieve that content, since it does not execute JavaScript.

[/callout]

Let’s browse the classes from Spring 2018:

[Screenshot: TU mobile class listing, Spring 2018]

Browsing the classes, we can see that they are arranged by class name, in the form: XXXX - main title

Let’s open the ACCT Accounting classes:

[Screenshot: TU mobile class listing, ACCT]

Each sub-class has a leading number and a title. Let’s open the first one:

[Screenshot: TU mobile class listing, ACCT 201]

Each class has sections, with different classrooms, meeting times and teachers. If we open a section, we can see all the attributes of that class:

[Screenshot: TU mobile class listing, ACCT 201, section 001]

This looks good and easy to scrape! The attributes are on the left and the values on the right, line by line, which makes our job simple. Each attribute will be a column in our database.
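To make the "one attribute, one column" idea concrete, here is a minimal sketch using Python's built-in sqlite3 module. The attribute labels and values below are only placeholders (the real ones come from the section page), and the table name `sections` and the helper `column_name` are hypothetical names chosen for this illustration.

```python
import sqlite3

# Hypothetical attribute/value pairs, standing in for what one section
# page shows (labels on the left, values on the right).
section = {
    "Class": "ACCT 201",
    "Section": "001",
    "Instructor": "Jane Doe",   # made-up value for illustration
    "Room": "ST 101",           # made-up value for illustration
}


def column_name(attribute):
    """Turn a page label into a simple lowercase column name."""
    return "".join(c if c.isalnum() else "_" for c in attribute.strip().lower())


columns = [column_name(a) for a in section]

# An in-memory database keeps the sketch self-contained;
# a real project would pass a file path instead.
conn = sqlite3.connect(":memory:")

# One TEXT column per attribute. Names are double-quoted so that a label
# that happens to be an SQL keyword would still be accepted.
column_defs = ", ".join(f'"{c}" TEXT' for c in columns)
conn.execute(f"CREATE TABLE sections ({column_defs})")

# Values are passed as parameters rather than pasted into the SQL string.
column_list = ", ".join(f'"{c}"' for c in columns)
placeholders = ", ".join("?" for _ in columns)
conn.execute(
    f"INSERT INTO sections ({column_list}) VALUES ({placeholders})",
    list(section.values()),
)
conn.commit()

row = conn.execute("SELECT * FROM sections").fetchone()
print(row)
```

Storing everything as TEXT is the simplest choice for a first import; column types can be tightened once the actual attributes are known.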

Take note of this page's URL, as you will use it in the console.