This project involved a full-stack data analysis process, from data cleaning and database setup to querying for insights.
Before writing the SQL queries, the dataset was cleaned using Microsoft Excel. The data cleaning process included:
The analysis was performed on a PostgreSQL database. The setup process included the following steps:
paintings
.
The queries operate on the following key tables and their relationships:
artist
: Contains artist details (artist_id
, full_name
, nationality
, artist_style
, birth_year
, death_year
).work
: Contains painting details (work_id
, work_name
, artist_id
, museum_id
, art_style
).museum
: Contains museum details (museum_id
, museum_name
, city
, country
, postal_code
, url
).museum_hours
: Stores daily opening and closing times for museums (museum_id
, day_of_week
, opening_time
, closing_time
).product_details
: Links paintings to pricing information (work_id
, size_id
, sale_price
, regular_price
).subject
: Lists the subjects of each painting (work_id
, subject
).canvas_size
: Details about the paintings canvasFinds the artists with the most paintings in the database.
SELECT a.artist_id, a.full_name, a.nationality, COUNT(w.work_id) AS number_of_paintings
FROM work AS w
JOIN artist AS a ON w.artist_id = a.artist_id
GROUP BY a.artist_id, a.full_name, a.nationality
ORDER BY number_of_paintings DESC
LIMIT 5;
Finds artists whose work is displayed in museums in more than one country.
SELECT a.full_name, COUNT(DISTINCT m.country) AS number_of_countries
FROM artist AS a
JOIN work AS w ON a.artist_id = w.artist_id
JOIN museum AS m ON w.museum_id = m.museum_id
GROUP BY a.full_name
HAVING COUNT(DISTINCT m.country) > 1
ORDER BY number_of_countries DESC;
Defines popularity by the total number of paintings held by a museum
SELECT m.museum_name, m.city, COUNT(w.work_id) AS count_of_paintings
FROM work AS w
JOIN museum AS m ON w.museum_id = m.museum_id
GROUP BY m.museum_name, m.city
ORDER BY count_of_paintings DESC
LIMIT 5;
Finds museums that have entries for all seven days of the week in the museum_hours table.
SELECT m.museum_name, m.city
FROM museum AS m
WHERE m.museum_id IN
(
SELECT mh.museum_id
FROM museum_hours AS mh
GROUP BY mh.museum_id
HAVING COUNT(mh.day_of_week) = 7
);
Find the Most and Least Expensive Paintings
This query uses UNION ALL to combine two result sets, identifying both the highest and lowest-priced paintings in the database.
(
SELECT
'Most Expensive' AS price_category,
a.full_name AS artist_name,
w.work_name,
pd.sale_price,
m.museum_name,
m.city,
cs.canvas_label
FROM product_details AS pd
JOIN work AS w ON pd.work_id = w.work_id
JOIN artist AS a ON a.artist_id = w.artist_id
JOIN museum AS m ON w.museum_id = m.museum_id
JOIN canvas_size AS cs ON cs.size_id = pd.size_id
ORDER BY pd.sale_price DESC
LIMIT 1
)
UNION ALL
(
SELECT
'Least Expensive' AS price_category,
a.full_name AS artist_name,
w.work_name,
pd.sale_price,
m.museum_name,
m.city,
cs.canvas_label
FROM product_details AS pd
JOIN work AS w ON pd.work_id = w.work_id
JOIN artist AS a ON a.artist_id = w.artist_id
JOIN museum AS m ON w.museum_id = m.museum_id
JOIN canvas_size AS cs ON cs.size_id = pd.size_id
ORDER BY pd.sale_price ASC
LIMIT 1
);
Identifies painting styles with the highest and lowest number of works.
-- Which are the 3 most popular painting styles?
SELECT w.art_style, COUNT(w.work_id) AS number_of_paintings
FROM work AS w
GROUP BY w.art_style
ORDER BY number_of_paintings DESC
LIMIT 3;
-- Which are the 3 least popular painting styles?
SELECT w.art_style, COUNT(w.work_id) AS number_of_paintings
FROM work AS w
GROUP BY w.art_style
ORDER BY number_of_paintings
LIMIT 3;