The world’s leading publication for data science, AI, and ML professionals.

Get started with SQLite3 in Python Creating Tables & Fetching Rows

Learn to use SQLite – The most used DBMS in the world

Photo by Sunder Muthukumaran on Unsplash
Photo by Sunder Muthukumaran on Unsplash

Overview

  1. Introduction – What is SQLite and SQLite3?
  2. Creating our First SQLite Database
  3. Connectors and Cursors
  4. Creating Tables
  5. Inserting Rows Into the Database
  6. Fetching Rows From the Database
  7. Wrapping Up

Introduction – What is SQLite and SQLite3?

One of the core skills for most modern IT professionals is Structured Query Language (SQL). This is a declarative language that is used to interact with relational databases. Data engineers and analysts regularly use SQL to run data pipelines and investigate useful relationships within the data.

Going straight to common database management systems (DBMS) like PostgreSQL or MySQL can be a bit intimidating when you don’t have any SQL experience. Luckily, SQLite is a great option for learning the basis of SQL. It is simple to set up and easy to manage since it has no separate server process. So although data engineers and data analysts will typically use different database management systems than SQLite, it is a great place to learn SQL. In fact, SQLite is the most commonly used DBMS in the world!

Additionally, the Python library sqlite3 is a simple interface for interacting with SQLite. In this blog post, we will use SQLite and the sqlite3 library to learn two major concepts:

  • Some elementary ways of using the most basic and useful commands of Sql such as CREATE TABLE, INSERT INTO, and SELECT - FROM.
  • How a programming language (in our case Python) can be used to interact with a relational database.

We will set up a Sqlite database, create a database connection from Python with sqlite3, and insert/fetch some rows in the database. The goal is not that you become an expert on SQL, but that you see how it can be used and learn some basic commands to get started. If you want to learn more, I have 8 free videos on YouTube that start the same as this blog post, but go much further:


Creating our First SQLite Database

On the official webpage for SQLite, you can find information about downloading SQLite. However, for most of us, this is not necessary as SQLite is already included on most machines. You also need the sqlite3 library in Python, but this is in the standard library, and thus included with most Python distributions. So most likely, there is nothing to install 😃

To check if everything is already installed, open a new Python file and write the single command:

import sqlite3

If the above file runs fine, then both SQLite and the Python library sqlite3 are installed. We are ready to go!

After the import step, we need to create a connection to the database. This is done by using the connect() function in the sqlite3 library:

# Create a connection to the database
connection = sqlite3.connect("music.db")

The argument passed into the connect() function will be the name of the database. Since we don’t have a database yet, this will simply create a new database for us. If you now run the Python file, then a new file will appear in the directory you are working in called music.db. This is our database!

A relational database consists of various tables. If you are new to this, then you can think about this like a collection of Excel sheets. This is undercutting how powerful relational databases are, but is a nice mental model in the beginning.

After creating a connection object, we need to create a cursor. A cursor can execute SQL commands against the database. To create this, we use the .cursor() method on the connection object as follows:

# Create a cursor
cursor = connection.cursor()

The variable cursor now holds a cursor object that we can use to insert and fetch data from the database. So far, you should have the following code:

import sqlite3

# Create a connection to the database
connection = sqlite3.connect("music.db")

# Create a cursor
cursor = connection.cursor()

Creating Tables

First of all, we need a table in our database. We are going to be working with data representing songs from the 80s. On the cursor object, we can call the method execute() to execute SQL statements. The first statement we are going to learn is the CREATE TABLE statement:

# Create a table
cursor.execute("CREATE TABLE IF NOT EXISTS songs(name, artist, album, year, duration)")

As you can see from the command above, we create a table called songs that has five columns: name, artist, album, year, and duration. The optional part IF NOT EXISTS ensures that the table is only created if it does not already exist. If it does exist, then the command does nothing.

Even though our table is empty for now, the schema is clear. We are setting up a table that records relevant information about songs. The information we want to track for each song is the name, artist, album, year, and duration. We will soon populate this table with rows that represent various songs.

After running the Python file, your immediate idea might be to open the database file music.db in your current directory to investigate what has happened. However, the information in music.db is not intended to be accessed like this. You will only see scrambled information there as this is a format not intended for viewing like this. We will have to write more SQL commands to read the information in the database.

Congratulations on learning your first SQL command! Make sure to separate in your mind what are SQL commands and what is the Python library sqlite3. It is only the sentence CREATE TABLE ... that is an SQL command. The connection and cursor are Python objects that are used to interact with the database.


Inserting Rows Into the Database

We now have a database with a single table. But the table is empty! For a database to be useful, we need some data in it. Let us now see how we can insert data into tables in our database with the SQL keywords INSERT INTO. We first create a list of songs, where each song is represented as a tuple of information:

# Rows for the songs table
songs = [
    ("I Wanna Dance with Somebody (Who Loves Me)", "Whitney Houston", "Whitney", 1987, 291),
    ("Dancing in the Dark", "Bruce Springsteen", "Born In The U.S.A.", 1984, 241),
    ("Take On Me", "a-ha", "Hunting High and Low", 1985, 225),
    ("Africa", "TOTO", "Toto IV", 1982, 295),
    ("Never Gonna Give You Up", "Rick Astley", "Whenever You Need Somebody", 1987, 213)
]

As you can see, each tuple has five parts that correspond to the columns in the songs table. For the first song, we have:

  • The name will be I Wanna Dance with Somebody (Who Loves Me).
  • The artist will be Whitney Houston.
  • The album will be Born In The U.S.A..
  • The year will be 1987.
  • The duration (in seconds) will be 291.

Now that we have our rows ready, we need to insert them into the songs table in the music.db database.

One way to do this is to insert a single row into the table at a time. The following code inserts the first song into the table:

# Insert a single value into the database
cursor.execute("INSERT INTO songs VALUES(?, ?, ?, ?, ?)", songs[0])
connection.commit()

Here we have picked out the first song in the songs list and insert this into the table songs. We use the SQL command INSERT INTO table VALUES where table is the table we want to insert into. Finally, we use the .commit() method to ensure that the transaction is fully complete.

By repeating this approach with a for-loop in Python, we can write the following code to insert all the rows into the table:

# Insert all the values into the table by looping
for song in songs:
    cursor.execute("INSERT INTO songs VALUES(?, ?, ?, ?, ?)", song)
    connection.commit()

There is no new SQL command here, only some Python logic to ensure that all the rows are inserted into the songs table.

The disadvantage of the above method is that it is pretty slow when you have a lot of rows to insert. In our example, everything is quick since we only have a few songs. But tables in relational databases can have millions or even billions of rows. Then looping in Python can slow down the insertion of rows.

The solution to this is to insert all of the rows at once, rather than loop through them. We can do this by using the .executemany() method on the cursor object, rather than the .execute() method that we have used so far. The following code inserts all the rows in a single batch:

# Can insert all the values at the same time with a batch approach
cursor.executemany("INSERT INTO songs VALUES(?, ?, ?, ?, ?)", songs)
connection.commit()

We now have a table songs in the database music.db that have some rows inserted. The code we have written so far (without comments) looks like this:

import sqlite3

songs = [
    ("I Wanna Dance with Somebody (Who Loves Me)", "Whitney Houston", "Whitney", 1987, 291),
    ("Dancing in the Dark", "Bruce Springsteen", "Born In The U.S.A.", 1984, 241),
    ("Take On Me", "a-ha", "Hunting High and Low", 1985, 225),
    ("Africa", "TOTO", "Toto IV", 1982, 295),
    ("Never Gonna Give You Up", "Rick Astley", "Whenever You Need Somebody", 1987, 213)
]

connection = sqlite3.connect("music.db")

cursor = connection.cursor()

cursor.execute("DROP TABLE IF EXISTS songs")

cursor.execute("CREATE TABLE IF NOT EXISTS songs(name, artist, album, year, duration)")

cursor.executemany("INSERT INTO songs VALUES(?, ?, ?, ?, ?)", songs)
connection.commit()

Looking closely, you will see that I have sneaked in a new line in the code. This is the line that executes the SQL command DROP TABLE IF EXISTS songs. If you run the code above, it first removes the table if it exists, and then builds it again.

This will avoid the experimentation we do leaving us with different results. By running the Python file above, we reset the state of the database and should get the same results in the next section. A statement like this in a production system would be very costly as the entire table is rebuilt every time we insert rows. However, this is fine for experimentation, which we are doing here.


Fetching Rows from the Database

Photo by Andy Powell on Unsplash
Photo by Andy Powell on Unsplash

It is now time to fetch rows back from the database. We will use the SQL keywords SELECT and FROM to do this. Let us start with just getting a single song back from the database:

# Fetch a single song
single_song = cursor.execute("SELECT * FROM songs").fetchone()
print(single_song)

As usual, we use the .execute() method on the cursor object to execute an SQL statement. The statement SELECT * FROM songs fetches all the columns and all the rows from the database. So this gives us everything. However, we use the .fetchone() method in sqlite3 to only fetch a single one of those rows. By doing this, we only print out a single song when running our Python file.

We have used the wildcard symbol * to retrieve all the columns back. If you only need some of the columns, then you can specify them as follows:

# Fetch only name and artist column of a single song
name_and_artist = cursor.execute("SELECT name, artist FROM songs").fetchone()
print(name_and_artist)

In addition to the method .fetchone(), you can also use the methods .fetchmany(number_of_rows) and .fetchall() to get more rows. The following code selects all the songs with the .fetchall() method:

# Getting all the rows and columns back
full_songs = cursor.execute("SELECT * FROM songs").fetchall()
print(full_songs)

Once you have the information back into Python, you can use standard Python logic to get useful insights. The following code exemplifies this, by finding the average duration for all the songs in our database:

# Get the average duration with Python logic
average_duration = 0
for song in full_songs:
    average_duration += song[-1]
average_duration = average_duration / len(full_songs)
print(f"The average 80s song is {int(average_duration // 60)} minutes and {int(average_duration % 60)} seconds long.")

You might think that we are going a bit back and forth. We already have the original songs list in the Python script, so why do we first need to insert this into the database, and then retrieve it again? This is where a tutorial is a bit artificial. In practice, the Python scripts that insert and fetch data from the database are not the same. There might also be multiple Python scripts (or other interfaces) that insert data into the database. Hence fetching the data from the database and calculating the average duration might be our only option to get this information.

Finally, before finishing up we need to close the database connection. We opened the connection to the database with the connection() function earlier. If we don’t close it, it will remain open and can cause performance and persistence problems in more complicated applications. It is a good practice always to ensure the connection to the database is closed. To do this in sqlite3, we can use the .close() method on the connection object:

# Close the connection
connection.close()

The following code demonstrates everything we have done:

import sqlite3

songs = [
    ("I Wanna Dance with Somebody (Who Loves Me)", "Whitney Houston", "Whitney", 1987, 291),
    ("Dancing in the Dark", "Bruce Springsteen", "Born In The U.S.A.", 1984, 241),
    ("Take On Me", "a-ha", "Hunting High and Low", 1985, 225),
    ("Africa", "TOTO", "Toto IV", 1982, 295),
    ("Never Gonna Give You Up", "Rick Astley", "Whenever You Need Somebody", 1987, 213)
]

connection = sqlite3.connect("music.db")

cursor = connection.cursor()

cursor.execute("DROP TABLE IF EXISTS songs")

cursor.execute("CREATE TABLE IF NOT EXISTS songs(name, artist, album, year, duration)")

cursor.executemany("INSERT INTO songs VALUES(?, ?, ?, ?, ?)", songs)
connection.commit()

full_songs = cursor.execute("SELECT name, artist, album, year, duration FROM songs").fetchall()

average_duration = 0
for song in full_songs:
    average_duration += song[-1]
average_duration = average_duration / len(full_songs)
print(f"The average 80s song is {int(average_duration // 60)} minutes and {int(average_duration % 60)} seconds long.")

connection.close()

Wrapping Up

Photo by Spencer Bergen on Unsplash
Photo by Spencer Bergen on Unsplash

I hope this blog post helped you understand SQL commands and the sqlite3 library in Python. If you are interested in AI, data science, or data engineering, then feel free to follow me or connect on LinkedIn.

Like my writing? Check out some of my other posts for more content:


Related Articles