In this article, we will go over a Sql question from an Amazon data science interview. Hope the procedure explained in this article would help you become more effective in writing SQL queries.
SQL Question:
Best Selling Item
Find the best-selling item for each month (no need to separate months by year) where the biggest total invoice was paid. The best-selling item is calculated using the formula (unitprice * quantity). Output the description of the item along with the amount paid.
Source: stratascratch
I would recommend clicking the link to practice writing queries for this exercise.
Table: online_retail

Step 1: let’s check out the raw data first.
- invoiceno: this field includes invoice ID which can uniquely identify a transaction.
- stockcode: this field includes product ID which can uniquely identify a product.
- description: this field includes product description which can also uniquely identify a product
- invoicedate: this field includes the date of a transaction. We need this field to create the month variable.
- quanity, unitprice: these fields include quantity and unit price of the sold product in a given transaction.
- customerid, country: these fields are customer ID and country information, which are not relevant for this problem.
Step 2: let’s brainstorm how to solve the problem.
Ultimately we need to return the best-selling product in each month. To start we need to create two new variables,
- amount: as instructed, the dollar amount of a transaction can be computed as (unitprice * quantity)
- month: we need to extract the month information from a transaction date
Next, we can compute the total transaction dollars of each product in a given month, then rank them based on transaction dollars in each month. Lastly, we return the best-selling product in each month.
Step 3: let’s prepare the data and get it ready for analysis. For this exercise, we would need to prepare the data in several mini steps.
We can create the two new variables we identify in step 2 using the following code.
SELECT *,
quantity * unitprice AS amount,
DATE_PART('month', invoicedate) AS month
FROM online_retail
ORDER BY DATE_PART('month', invoicedate), stockcode
Let’s understand the code in detail:
DATE_PART
: this function allows us to extract time information, such as year, month, day, hour, minutes, and second from a date or timestamp value. See more examples,
SELECT
DATE_PART('year', '2014-04-01 06:16:36'::TIMESTAMP), -- returns 2014
DATE_PART('month', '2014-04-01 06:16:36'::TIMESTAMP), -- returns 4
DATE_PART('day', '2014-04-01 06:16:36'::TIMESTAMP), -- returns 1
DATE_PART('hour', '2014-04-01 06:16:36'::TIMESTAMP), -- returns 6
DATE_PART('minute', '2014-04-01 06:16:36'::TIMESTAMP), -- returns 16
DATE_PART('second', '2014-04-01 06:16:36'::TIMESTAMP), -- returns 36
DATE_PART('year', '2014-04-01'::DATE), -- returns 2014
DATE_PART('month', '2014-04-01'::DATE), -- returns 4
DATE_PART('day', '2014-04-01'::DATE), -- returns 1
DATE_PART('hour', '2014-04-01'::DATE), -- returns 0 based on 00:00:00
DATE_PART('minute', '2014-04-01'::DATE), -- returns 0
DATE_PART('second', '2014-04-01'::DATE) -- returns 0
When we run the above code, we can produce a table like the following.

Next, we can use the aggregate function SUM()
to compute the total transaction dollars for each product in a given month with GROUP BY month, description
. Since we are going to prepare the data in several steps, it is recommended to use WITH
statement to create temporary tables.
WITH cte AS(
SELECT *,
quantity * unitprice AS amount,
DATE_PART('month', invoicedate) AS month
FROM online_retail
)
SELECT month, description, SUM(amount) AS tot_dollar
FROM cte
GROUP BY month, description
ORDER BY month, tot_dollar DESC
When we run the above code, we can produce a table like the following.

Then we rank the products based on total transaction dollars in each month. We can use the window function RANK()
in each partition (i.e., month) with OVER(PARTITION BY month ORDER BY tot_dollar DESC)
.
OVER
: this indicates the function we use here is a window function, not an aggregate function.PARTITION BY
: this partitions the rows in the data table, so that we can define which rows the window function would be applied to. For this exercise, we define a partition based on the month variable.ORDER BY
is used to sort the observations within each partition. In this exercise, we would sort total transaction dollars in each partition.DESC
would make sure we sort the rows from the highest value to the lowest.
WITH cte AS(
SELECT *,
quantity * unitprice AS amount,
DATE_PART('month', invoicedate) AS month
FROM online_retail
),
monthly_total_by_product AS(
SELECT month, description, SUM(amount) AS tot_dollar
FROM cte
GROUP BY month, description
)
SELECT month, description, tot_dollar,
RANK() OVER(PARTITION BY month ORDER BY tot_dollar DESC) AS rank
FROM monthly_total_by_product
When we run the above code, we can produce a table like the following.

Step 4: Once the data is prepared in step 3, it becomes straightforward to keep the best-selling product in each month using WHERE RANK = 1
.
Final solution:
WITH cte AS(
SELECT *,
quantity * unitprice AS amount,
DATE_PART('month', invoicedate) AS month
FROM online_retail
),
monthly_total_by_product AS(
SELECT month, description, SUM(amount) AS tot_dollar
FROM cte
GROUP BY month, description
),
monthly_ranking AS (
SELECT month, description, tot_dollar,
RANK() OVER(PARTITION BY month ORDER BY tot_dollar DESC) AS rank
FROM monthly_total_by_product
)
SELECT month, description, tot_dollar
FROM monthly_ranking
WHERE rank = 1
ORDER BY MONTH
Answer:

If you would like to explore more Sql Interview Questions, please check out my articles:
- Comprehensive SQL Cheat Sheet
- Crack SQL Interview Question with Useful Procedure
- Crack SQL Interview Question: Subquery vs. CTE
- Crack SQL Interview Question: Join vs Case-When Statement
- Crack SQL Interview Question: Window Functions with Partition-By
- Crack SQL Interview Question: Date_Part Function
- Crack SQL Interview Questions: ROW_NUMBER, RANK and DENSE_RANK
- Crack SQL Interview Question: UNNEST, STRING_TO_ARRAY
- Crack SQL Interview Question: GENERATE_SERIES, STRING_AGG, SPLIT_PART
- Crack SQL Interview Question: Self-Join and Non-Equi Join
- Crack SQL Interview Question: ANY Operator
- Crack SQL Interview Question: Subquery
Thank you for reading !!!
If you enjoy this article and would like to Buy Me a Coffee, please click here.
You can sign up for a membership to unlock full access to my articles, and have unlimited access to everything on Medium. Please subscribe if you’d like to get an email notification whenever I post a new article.