웅진X유데미 STARTERS/TIL (Today I Learned)

[유데미 스타터스 취업 부트캠프 4기] SQL 최종평가 공부 (쿼리)

라밍쏭 2023. 4. 22. 18:05
728x90

1. 월 별 매출액 분석

1.1 월 별 매출액, 주문건수, 구매자수 추출

SELECT to_char(o.order_date, 'YYYY-MM') AS order_month
	 , sum(od.unit_price * od.quantity * (1-od.discount)) AS sales
	 , count(DISTINCT o.order_id) AS 주문건수
	 , count(DISTINCT c.customer_id) AS 구매자수
FROM orders o 
	INNER JOIN order_details od 
		ON o.order_id = od.order_id
	INNER JOIN customers c 
		ON o.customer_id = c.customer_id 
GROUP BY 1
ORDER BY 1;

 

1.2 월 별 인당 평균 매출액, 건당 평균 매출액 추출

SELECT to_char(o.order_date, 'YYYY-MM') AS order_date
	 , sum(od.unit_price * od.quantity * (1-od.discount)) AS sales
	 , sum(od.unit_price * od.quantity * (1-od.discount)) / count(DISTINCT c.customer_id) AS 인당매출액
	 , sum(od.unit_price * od.quantity * (1-od.discount)) / count(DISTINCT o.order_id) AS 건당매출액
FROM orders o 
	INNER JOIN order_details od 
		ON o.order_id = od.order_id
	INNER JOIN customers c 
		ON c.customer_id = o.customer_id
GROUP BY 1;

 

1.3 월 별 평균 매출액, 주문수, 고객수 상관관계

WITH cte_sales AS (
	SELECT to_char(o.order_date, 'YYYY-MM') AS order_date
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
		 , count(DISTINCT o.order_id) AS 주문수
		 , count(DISTINCT c.customer_id) AS 고객수
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN customers c 
			ON c.customer_id = o.customer_id
	GROUP BY 1)
SELECT corr(매출액, 주문수) AS sales_order
	 , corr(매출액, 고객수) AS sales_customer
	 , corr(주문수, 고객수) AS order_customer
FROM cte_sales;

 

2. 윈도우 함수

2.1 전체 집계

WITH cte_orders AS (
	SELECT to_char(o.order_date, 'YYYY-MM-DD') AS order_date
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN customers c 
			ON c.customer_id = o.customer_id
	GROUP BY 1)
SELECT *
	 , sum(매출액) over() AS 전체매출액
FROM cte_orders;

 

2.2 누적 합계

- 월 단위로 누적 합계 (→ 월이 변경되면 새롭게 합계가 누적됨)
    - `PARTITION BY YEAR, MONTH` : 누적 합계 그룹
    - `ORDER BY YEAR, MONTH, DAY` : 순서

WITH cte_orders AS (
	SELECT to_char(o.order_date, 'YYYY') AS YEAR
		 , to_char(o.order_date, 'MM') AS MONTH
		 , to_char(o.order_date, 'DD') AS DAY
		 , od.unit_price * od.quantity * (1-od.discount) AS sales
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN customers c 
			ON c.customer_id = o.customer_id)
SELECT *
	 , sum(sales) over(PARTITION BY YEAR, MONTH ORDER BY YEAR, MONTH, DAY) AS 전체매출액
FROM cte_orders;

 

2.3 lag 함수 (이전 행 출력)

WITH cte_orders AS (
	SELECT to_char(o.order_date, 'YYYY') AS YEAR
		 , to_char(o.order_date, 'MM') AS MONTH
		 , to_char(o.order_date, 'DD') AS DAY
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN customers c 
			ON c.customer_id = o.customer_id
	GROUP BY 1, 2, 3
	ORDER BY 1, 2, 3)
SELECT *
	 , lag(매출액) over(PARTITION BY MONTH ORDER BY YEAR, MONTH, DAY) AS 전월매출액
FROM cte_orders;

 

2.4 lead 함수 (다음 행 출력)

WITH cte_orders AS (
	SELECT to_char(o.order_date, 'YYYY') AS YEAR
		 , to_char(o.order_date, 'MM') AS MONTH
		 , to_char(o.order_date, 'DD') AS DAY
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN customers c 
			ON c.customer_id = o.customer_id
	GROUP BY 1, 2, 3
	ORDER BY 1, 2, 3)
SELECT *
	 , LEAD(매출액) over(PARTITION BY YEAR, MONTH ORDER BY YEAR, MONTH, DAY) AS 익월매출액
FROM cte_orders;

 

2.5 윈도우 프레임 지정 (이동평균)

WITH cte_orders AS (
	SELECT to_char(o.order_date, 'YYYY') AS YEAR
		 , to_char(o.order_date, 'MM') AS MONTH
		 , to_char(o.order_date, 'DD') AS DAY
		 , avg(od.unit_price * od.quantity * (1-od.discount)) AS avg_sales
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN customers c 
			ON c.customer_id = o.customer_id
	GROUP BY 1, 2, 3
	ORDER BY 1, 2, 3)
-- 일 기준 1~4번 데이터는 5개 미만의 데이터의 값들을 이동평균 계산하므로 5 이상인 애들만 이동평균 계산
SELECT *
	 , CASE WHEN row_number() over(ORDER BY YEAR, MONTH, DAY) >= 5 
	 			THEN avg(avg_sales) over(ORDER BY YEAR, MONTH, DAY ROWS BETWEEN 4 PRECEDING AND CURRENT ROW) 
	 		END AS "5일이동평균"
FROM cte_orders;

 

3. 매출 증감 분석

1. YEAR, MONTH를 기준으로 총 매출액을 출력
2. LAG() 함수를 활용하여 전월매출액을 계산
3. 총 매출액과 전월 매출액의 차이를 구하여 비율까지 구하기

-- 1) 월 별 매출액 계산
WITH cte_orders AS (
	SELECT to_char(o.order_date, 'YYYY-MM') AS order_month
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN customers c 
			ON c.customer_id = o.customer_id
	GROUP BY 1
	ORDER BY 1)
-- 2) lag 함수로 전월 매출액 계산
-- 3) 전월대비 증감비율 계산
SELECT *
	 , 매출액
	 , lag(매출액) over(ORDER BY order_month) AS 전월매출액
	 , 매출액 - lag(매출액) over(ORDER BY order_month) AS 전월대비증감액
	 , round((매출액 / lag(매출액) over(ORDER BY order_month) * 100)::numeric, 2) AS 전월대비증감비율
	 , CASE WHEN 매출액 - lag(매출액) over(ORDER BY order_month) < 0 THEN '-'
	 		WHEN 매출액 - lag(매출액) over(ORDER BY order_month) > 0 THEN '+'
	 		ELSE '해당없음'
	 	END AS 증감여부
FROM cte_orders;

 

4. ABC 분석

✅ 순서
1. 제품 별 매출액 구하기
2. 제품 기준 매출액 누계 구하기
3. 매출 비율 구하기
4. 등급 매기기

-- 1) 제품 별 매출액 구하기
WITH cte_sales AS (
	SELECT p.product_name 
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN products p  
			ON p.product_id = od.product_id 
	GROUP BY 1
	ORDER BY 2 DESC)
-- 2) 제품 기준 매출 비율 구하기
, cte_num_sum AS (
	SELECT product_name
		 , 매출액
		 , (매출액 / sum(매출액) over())*100 AS 매출비율
	FROM cte_sales)
-- 3) 매출 비율 누계 구하기
, cte_percent AS (
	SELECT *
		 , sum(매출비율) over(ORDER BY 매출액 DESC) AS 매출비율누계
	FROM cte_num_sum)
SELECT *
	 , CASE WHEN 매출비율누계 <= 70 THEN 'A'
	 		WHEN 매출비율누계 <= 90 THEN 'B'
	 		ELSE 'C'
	 	END AS 등급
FROM cte_percent;

 

5. 피벗 테이블

✅ 국가별 판매수량 TOP 5
1. 필요한 기본 테이블 만들기
2. 판매수량, 매출액 기준 rank 산정
3. 피벗 테이블 만들기 (case when 구문 활용)

-- 1) 필요한 기본 테이블 만들기
WITH cte_products AS (
	SELECT c.country 
		 , p.product_name
		 , sum(od.quantity) AS 판매수량
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN products p  
			ON p.product_id = od.product_id
		INNER JOIN customers c
			ON c.customer_id = o.customer_id
	GROUP BY 1, 2)
-- 2) 판매수량, 매출액 기준 rank 산정
, cte_rank AS (
	SELECT rank() over(PARTITION BY country ORDER BY 판매수량 DESC, 매출액 DESC) AS RANK
		 , country
		 , product_name
		 , 매출액
	FROM cte_products)
-- 3) 피벗 테이블 만들기
SELECT country
	 , max(CASE WHEN RANK = 1 THEN product_name END) AS "rank_1"
	 , max(CASE WHEN RANK = 2 THEN product_name END) AS "rank_2"
	 , max(CASE WHEN RANK = 3 THEN product_name END) AS "rank_3"
	 , max(CASE WHEN RANK = 4 THEN product_name END) AS "rank_4"
	 , max(CASE WHEN RANK = 5 THEN product_name END) AS "rank_5"
FROM cte_rank
GROUP BY 1
ORDER BY sum(매출액) desc;

 

6. Z차트

1. 기본 테이블 만들기 (월 별 매출액 계산)
2. 이동 연계 계산 (11개월 이동합계 (매출액))
3. 매출 누계 계산
4. 기간 : 1997-06 ~ 1998-04

-- 1) 기본 테이블 만들기 (월 별 매출액 계산)
WITH cte_sales AS (
	SELECT to_char(o.order_date, 'YYYY-MM') AS order_month
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 월별매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
	GROUP BY 1
	ORDER BY 1)
-- 2) 이동 연계 계산 (11개월 이동합계 (매출액))
, cte_moving_sum AS (
	SELECT *
		 , sum(월별매출액) over(ORDER BY order_month ROWS BETWEEN 10 PRECEDING AND CURRENT ROW) AS 이동연계
	FROM cte_sales)
-- 3) 매출 누계 계산
SELECT order_month
	 , 월별매출액
	 , sum(월별매출액) over(ORDER BY order_month) AS 매출액누계
	 , 이동연계
FROM cte_moving_sum
WHERE order_month BETWEEN '1997-06' AND '1998-04';

 

7. 그룹함수

- GROUPING SETS
- CUBE
- ROLLUP

 

7.1 GROUPING SETS

각 그룹 별로 집계한 값들이 출력됨 (각 컬럼 별로 해당되지 않는 데이터는 null 값으로 출력됨
→ 카테고리, 제품, 공급업체, 국가, (국가, 도시), 전체 매출액

 

WITH cte_sales AS (
	SELECT c2.category_name
		 , p.product_name 
		 , c.company_name 
		 , c.country 
		 , c.city 
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN products p  
			ON p.product_id = od.product_id
		INNER JOIN customers c
			ON c.customer_id = o.customer_id
		INNER JOIN categories c2 
			ON c2.category_id = p.category_id
	GROUP BY GROUPING SETS (c2.category_name, p.product_name, c.company_name, c.country, (c.country, c.city), ()))
SELECT *
FROM cte_sales;

 

-- CTE 테이블 생략 --
-- category_name 별 매출액만 보기
SELECT category_name
	 , 매출액
FROM cte_sales
WHERE category_name IS NOT null;

 

7.2 ROLLUP

→ 각 컬럼 별 소계를 나타냄

WITH cte_rollup AS (
	SELECT c.category_name
		 , p.product_name 
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN products p  
			ON p.product_id = od.product_id
		INNER JOIN categories c
			ON c.category_id = p.category_id
	GROUP BY ROLLUP (c.category_name, p.product_name))
SELECT *
FROM cte_rollup
ORDER BY 1, 2;

 

7.3 CUBE

→ 컬럼 조합의 모든 경우의 수에 해당하는 소계 출력

WITH cte_cube AS (
	SELECT c.category_name
		 , p.product_name 
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN products p  
			ON p.product_id = od.product_id
		INNER JOIN categories c
			ON c.category_id = p.category_id
	GROUP BY CUBE (c.category_name, p.product_name))
SELECT *
FROM cte_cube
ORDER BY 1, 2;

 

8. Decil 분석

① 고객의 총 매출액 기준으로 정렬

② 상위부터 10%씩 나누어 10개의 그룹 할당

③ decil 별 매출합계

④ decil 별 구성비

⑤ decil 별 구성비 누계

-- ① 고객의 총 매출액 기준으로 정렬
WITH cte_customers AS (
	SELECT o.customer_id 
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS 매출액
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
	GROUP BY 1
	ORDER BY 2 DESC)
-- ② 상위부터 10%씩 나누어 10개의 그룹 할당
, cte_ntile AS (
	SELECT *
		, ntile(10) over(ORDER BY 매출액 DESC) AS 그룹
	FROM cte_customers)
-- ③ decil 별 매출합계
, cte_group_sales AS (
	SELECT 그룹
		 , sum(매출액) AS 매출액
	FROM cte_ntile
	GROUP BY 1
	ORDER BY 1)
--④ decil 별 구성비
, cte_group_percent AS (
	SELECT 그룹
		 , 매출액
		 , round((매출액 / sum(매출액) over() * 100)::NUMERIC, 2) AS 구성비
	FROM cte_group_sales)
--⑤ decil 별 구성비 누계
SELECT *
	 , sum(구성비) over(ORDER BY 구성비 DESC) AS 구성비누계
FROM cte_group_percent;

 

9. RFM 분석 (고객 분석)

-- 1) 기본 테이블 생성
WITH cte_customers AS (
	SELECT c.customer_id -- 고객분석이기 때문에 customer_id가 필요함
		 , o.order_id -- F를 계산하기 위해 필요함
		 , sum(od.unit_price * od.quantity * (1-od.discount)) AS sales
		 , max(o.order_date) over() - max(o.order_date) AS date_diff
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
		INNER JOIN customers c
			ON c.customer_id = o.customer_id
	GROUP BY 1, 2)
-- 2) rfm 점수 계산
, cte_rfm AS (
	SELECT customer_id 
		 , min(date_diff) AS R   -- datediff에서 min 값 계산하기
		 , count(DISTINCT order_id) AS F
		 , sum(sales) AS M
	FROM cte_customers
	GROUP BY 1) --customer_id로 그룹화하기
-- 3) rfm 별 등급 (선택) → 개별 기준 또는 ntile 함수를 활용하여 score를 계산하기
SELECT *
FROM cte_rfm;

 

10. 재구매율 (self join)

-- 1) 고객 별 구매 연도 출력
WITH cte_customers AS (
	SELECT DISTINCT o.customer_id -- 고객 별 재구매율이기 때문에 distinct 추가 (+ 많은 컬럼을 select 하는 것이 아니기에 distinct, 만약 컬럼이 많다면 고민해봐야 함)
		 , to_char(o.order_date, 'YYYY') AS YEAR
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
	ORDER BY 1, 2)
-- 2) self join(left join)을 활용하여 기준 해, 다음 해 컬럼 생성
, cte_self_join AS (
	SELECT c1.customer_id
		 , c1.YEAR
		 , c2.YEAR AS next_year
	FROM cte_customers c1
		LEFT JOIN cte_customers c2
			ON c1.customer_id = c2.customer_id
			AND c1.YEAR::integer + 1 = c2.YEAR::integer)
-- 3) 재구매율 구하기 (각 연 컬럼의 count로 구할 수 있음)
SELECT YEAR
	 , count(year) AS 당해구매자수
	 , count(next_year) AS 다음해재구매자수
	 , round((count(next_year) / count(year)::numeric)*100, 2) AS 재구매율
FROM cte_self_join
GROUP BY year;

 

11. 이탈고객

--① 고객 별 최종 구매일 추출
WITH cte_customers AS (
	SELECT o.customer_id
		 , max(o.order_date) AS 고객최종구매일
	FROM orders o 
		INNER JOIN order_details od 
			ON o.order_id = od.order_id
	GROUP BY 1)
--② 경과일 계산
, cte_date_diff AS (
	SELECT *
		 , max(고객최종구매일) over() - 고객최종구매일 AS 경과일
	FROM cte_customers
	ORDER BY 3 DESC)
--③ 이탈 고객 여부 (마지막 구매일 이후 90일 이상 경과한 고객)
, cte_bye_cust AS (
	SELECT customer_id
		 , CASE WHEN 경과일 >= 90 THEN 1
		 		ELSE 0
		 	END AS 이탈고객여부
	FROM cte_date_diff)
--④ 이탈률 계산
SELECT count(*) AS 전체고객수
	 , sum(이탈고객여부) AS 이탈고객수
	 , round(sum(이탈고객여부) / count(*)::NUMERIC *100, 2) AS 이탈률
FROM cte_bye_cust;

 

12. 백분위수, 최빈값

✔ percentile_cont 함수

: 백분위수를 연속값으로 계산
→ 입력받은 수치값이 어느 두 값의 사이일 경우 두 값의 평균을 계산하여 리턴


✔ percentile_disc 함수

: 백분위수를 이산값으로 반환
→ 입력받은 수치값이 어느 두 값의 사이일 경우 두 값중 작은 값을 리턴


✔ mode 함수

: 최빈값 반환

 

12.1 백분위수 구하기

-- 1) percentile_cont
-- 두 값일 경우 평균 계산
WITH numbers(n) AS (
	VALUES (1), (1), (2), (3), (4), (5))
SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY n)
FROM numbers;

-- 2) percentile_disc
-- 두 값일 경우 작은 수 리턴
WITH numbers(n) AS (
	VALUES (1), (1), (2), (3), (4), (5))
SELECT percentile_disc(0.5) WITHIN GROUP (ORDER BY n)
FROM numbers;

-- 3) 적용 (사분위수 구하기)
SELECT percentile_cont(0.25) WITHIN GROUP (ORDER BY unit_price) AS q1
	 , percentile_cont(0.5) WITHIN GROUP (ORDER BY unit_price) AS q2
	 , percentile_cont(0.75) WITHIN GROUP (ORDER BY unit_price) AS q3
FROM products;

 

12.2 최빈값 구하기

/* 13. 최빈값 구하기 */
-- 1) mode 함수
WITH numbers(n) AS (
	VALUES (1), (1), (2), (3), (4), (5))
SELECT mode() WITHIN GROUP (ORDER BY n)
FROM numbers;

-- 2) 적용
SELECT mode() WITHIN GROUP (ORDER BY to_char(order_date, 'MM'))
FROM orders;

 

13. 기본 문법

 

[스타터스 TIL] 46일차.SQL 실전 트레이닝 (2) - 컬럼 연산자, 날짜/시간형 데이터, 다중 행 함수

1. 데이터 추출 1.1 컬럼 추출 ✔ customer 테이블의 country, city 데이터를 유니크하게 추출하고 country 오름차순, city 내림차순으로 정렬하기 SELECT DISTINCT country, city FROM customers ORDER BY country, city DESC; ✔

ars420.tistory.com

 

 

 

[스타터스 TIL] 51일차.SQL 실전 트레이닝 (7) - RFM 분석, 재구매율, 이탈고객 분석, 백분위수, 최빈값

1. RFM 분석 - 얼마나 최근에 구매를 했는지 - 얼마나 빈번하게 구매를 했는지 - 얼마나 많은 금액을 지불을 했는지를 확인하기 위한 지표 1.1 RFM 집계 WITH cte_customers AS ( SELECT c.customer_id, c.company_name

ars420.tistory.com

 

728x90