As Data scientists, most of us are used to filter tables using pandas (on Python) __ or dplyr (on R). Indeed, there are certain wrangling operations that are more convenient to do in both libraries and in the context of the languages we use to code our algorithms.
But, if your data source is sitting on a database somewhere (and you are probably accessing it using some SQL interface), some filters and table operations would be more efficient using SQL code and performing those operations on the server side.
Filtering tables is one of the most common operations one does when working with SQL. On the surface, they seem pretty straightforward, you apply a WHERE
clause and that’s it! But.. filters have more than meets the eye and can get complex quite fast.
Learning how to apply the different filtering methods and leverage theWHERE
clause with IN
, AND
, OR
, precedence and wildcards will improve your SQL game enormously. In this post, we’ll explore those extra spices that can be applied when filtering rows from a table. By the end of the post, you should be able to understand most complex WHERE
statements in the majority of SQL queries out there.
I’ll use a sample table that contains data for students in a school – this table is completely made up and all names and information are fictional. I’m creating this table inside a sandbox
database:
create table sandbox.students (
student_id integer auto_increment primary key,
student_full_name varchar(30),
favorite_classes varchar(150),
student_age integer,
birth_date date,
degree varchar(30)
);
insert into sandbox.students (
student_full_name, favorite_classes, student_age, birth_date, degree
) values ('John Smith', 'Mathematics, Political Science, Biology', 23, '1999-03-01', 'Engineering'),
('Amy Smith', 'History, Political Science, Biology', 23, '1999-03-01', 'History'),
('Joe Adams', 'History, Mathematics', 23, '1999-06-01', 'History'),
('Joe Williams', 'Biology, Chemistry', 22, '2000-03-01', 'Chemistry'),
('Anne Williams', 'Mathematics, Arts', 22, '2000-03-16', 'Mathematics'),
('May Taylor', 'History, Geography', 22, '2000-05-19', 'History'),
('Zoe Brown', 'Physical Education, History', 21, '2001-02-18', 'Physical Education'),
('Jennifer Davies', 'Biology, Chemistry', 21, '2001-03-19', 'Chemistry'),
('Robert Jones', 'Chemistry, Biology, Mathematics', 21, '2001-06-02', 'Chemistry');
Our students table contains data of 9 college students – although a low sample, this should be enough to understand the main difference between the filtering concepts we will explore.
Let’s improve our Sql game!
AND / OR
The difference between these two clauses – AND
, OR
– is ultra relevant in filters and takes into account logical operations commonly used in other programming languages.
Imagine that, from our students table, we wanted to subset all the students from the "Chemistry" major. That would be easy with a standard WHERE
clause, like:
select * from sandbox.students
where degree = 'Chemistry'
This will output all the Chemistry students, 3 in this case:

But, what if we just wanted to subset the students that are 21 years Chemistry students? We can do that with an AND
clause:
select * from sandbox.students
where degree = 'Chemistry' AND student_age = 21;

The AND
clause enables us to select both conditions and only the rows that match exactly both conditions. In this case, we only select degree = 'Chemistry'
ANDstudent_age = 21
– both conditions must be matched by the row to be returned in the output set.
Naturally, a question arises. What if we wanted to match only one of the conditions? Meaning we want to select rows that match the first condition OR the second condition – this is also super easy in SQL, using the OR
clause:
select * from sandbox.students
where degree = 'Chemistry' OR student_age = 21;

Notice that we have 4 students in this result – this happened because we are choosing students that have degree = 'Chemistry'
OR student_age = 21
. In this case, a student can match only one of the conditions in the WHERE
clause.
That’s why we have a Physical Education student in there, that student has 21 years old and matches that condition only.
This is one of the most simple and effective concepts in SQL filters. Nevertheless, you’ll notice that when you start to play around with multiple AND
and OR
clauses, things can become quite messy. And that’s where filtering precedences enter the picture.
Operator Precedence
SQL enables the combination of multiple AND
and OR
conditions in the same WHERE
. This is really cool but also upgrades the complexity of our queries.
Luckily, by learning about filtering precedence, we are able to navigate the logic behind more complex filters. Operator precedence logic mimic what we know from mathematical equations – for instance, imagine the following SQL query with our students table:
select * from sandbox.students
where degree = 'Chemistry' OR student_age = 23 AND favorite_classes = 'History, Mathematics';
What will be the output of this query?
If you read the query directly, do you think SQL will select all the students that have the Chemistry Degree or are 23 years old but at the same time they have their favorite classes being History and Mathematics?
Wow, this sentence was confusing. Let’s break it apart.
Extremely important – AND
has priority over OR
! This means that the clauses connected with AND
will be tied together as a single clause – so we are searching for:
- Students that are 23 years old
AND
their favorite classes are "History, Mathematics";
Because of the AND
these two clauses are treated simultaneously, let’s call this clause 1.
Ok, so where will the degree = 'Chemistry'
enter the picture? Well, this clause is the other clause that we are including in the search:
- Every student that has
degree = 'Chemistry'
Let’s call this Clause 2.
Resuming, we are searching for all students that match Clause 1 OR
Clause 2. Meaning that a student that is not 23 years old and doesn’t have its favorite classes as History and Mathematics can still be selected in our query if its favorite degree is Chemistry.

Bottom line, first commandment of SQL filtering is: AND
has a priority over OR
. Second commandment is that we can override these rules using parenthesis, just like we would do in mathematical equations:
select * from sandbox.students
where (degree = 'Chemistry' OR student_age = 23) AND favorite_classes = 'History, Mathematics';
In this case, the operations are a bit different – clause 1 will be:
- Students that have a degree in Chemistry or are 23 years old.
And clause 2 is:
- Students with favorite Classes History and Mathematics.
In this case we want: "Every student with class History and Mathematics that is either in Chemistry or has 23 years old. In this case, only one student matches the condition:

Notice how a parenthesis changed the whole result of the query! Operator precedence are really important and completely change the result of your return data. In summary:
AND
has priority overOR
- Parenthesis change the precedence and artificially create a priority for the bit of code inside the parenthesis.
Nice! Let’s continue with another concept that reduces a significant portion of our code with multiple OR
conditions.
IN / NOT IN
IN
and NOT IN
operators are really cool ways to write conditions where you need to match multiple elements at the same time. For instance, imagine that I would ask you to filter the following students from the table: "Joe Adams" , "John Smith" and "Amy Adams".
Knowing the OR
condition, you could immediately write the following query:
select * from sandbox.students
where student_full_name = 'Joe Adams' or student_full_name = 'John Smith' or student_full_name = 'Amy Smith';

This seems a reasonable result but.. imagine you would like to subset twenty names to select? Do you really want to write the OR
condition 20 times? I think not!
Luckily, we have the IN
operator to will help us with that!
select * from sandbox.students
where student_full_name IN ('Joe Adams','John Smith','Amy Smith');
The IN
operator acts as a multiple OR
condition but written in a more elegant way. Inside the IN
, we can add multiple elements to match without repeating the "or student_full_name ="
multiple times.
The NOT IN
works exactly in the same way but it excludes the elements inside the parenthesis:
select * from sandbox.students
where student_full_name NOT IN ('Joe Adams','John Smith','Amy Smith');
This will select all the students except the ones inside the IN
clause:

The IN
clause is super cool and handy – we can use it to avoid writing really long SQL statements.
Wildcards
Our last SQL filtering concept is the wildcard. Wildcard are super useful to filter based on a "subset" condition.
For instance, imagine that we would like to select all students with "Smith" as last name – we could, of course, split the column to do that but, that would be extra code that is not needed if we know how to apply wildcards.
Wildcards are centered around three different elements:
- The
LIKE
clause; - The
%
character; - The
_
character;
Starting with our example – selecting all the students with last name "Smith":
select * from sandbox.students
where student_full_name LIKE '%Smith';
In this wildcard selection, the %
works as "does not matter what’s before Smith". Translating this query into a sentence we are selecting "every row where the full name ends with "Smith", regardless of what’s before that".

What if we want to select all the "Joes" from our table? That’s easy! We just put the wildcard %
character to the end of the element we are searching:
select * from sandbox.students
where student_full_name LIKE 'Joe%';

Cool! So the %
always works in this way – it makes our query agnostic to what’s before or after the string we are searching. We can also use a double sided %
:
select * from sandbox.students
where student_full_name LIKE '%am%';

This last query will search for all students that contain an "am" in their name somewhere. Amy Smith, Joe Adams, Joe Williams and Anne Williams. The double sided %
doesn’t care of what comes before or after the string we are searching, simultaneously.
Finally, let’s check the _
character:
select * from sandbox.students
where student_full_name LIKE '_o%';
The _
specifically points to a position where we are searching for a specific letter or string. For instance, here we are searching for all the names that contain an "o" in the second position, this will yield:

All these students have an "o" as the second letter of their letter. What if we only want the students with an "e" on the third letter of their name? That’s also easy with the _
character!
select * from sandbox.students
where student_full_name LIKE '__e%';
As we now wanted the third position of the "name" – we added two underscores: __
. This query yields:

Wildcards are really cool filtering elements! Mastering wildcards will simplify your queries and make them really efficient.
And we’re done! Thank you for taking the time to read this post. SQL is a really cool language to pump up our data wrangling game.
As a data scientist, there’s a really high chance that you will end up working with data stored in databases – grasping some of these concepts will improve your productivity and code efficiency.
I’ve set up a course on learning SQL from Scratch on Udemy where I’ve included a deep dive into these concepts and more – the course is suitable for absolute beginners and I would love to have you around!
