Skip to content

jaypark72/bigquery_style_guide

Folders and files

NameName
Last commit message
Last commit date

Latest commit

ย 

History

19 Commits
ย 
ย 
ย 
ย 

Repository files navigation

BigQuery Style Guide

๋ชฉ์ฐจ (Table of Contents)


์ผ๋ฐ˜ ๊ฐ€์ด๋“œ (General Guide)

์ด ๋ฌธ์„œ๋Š” ๋น…๋ฐ์ดํ„ฐ๋ฅผ ๋‹ค๋ฃจ๋Š” ์—”์ง€๋‹ˆ์–ด(data engineer), ๋ฐ์ดํ„ฐ ๋ถ„์„๊ฐ€(data analyst)์™€ ๋ฐ์ดํ„ฐ ๊ณผํ•™์ž(data scientist)๊ฐ€ BigQuery ์–ธ์–ด๋กœ ์ฝ”๋“œ ์ž‘์„ฑ์‹œ์˜ ์ง€์นจ(guide)๊ณผ ์Šคํƒ€์ผ(style)์„ ์ œ์‹œํ•˜๊ณ  ์žˆ๋‹ค.

ํ•ต์‹ฌ์›์น™ (core principle)

  • ๋ฌธ์žฅ ๊ตฌ์กฐ์™€ ์ฝ”๋“œ ์Šคํƒ€์ผ์˜ ์ผ๊ด€์„ฑ์„ ์œ ์ง€ํ•˜์—ฌ ๊ฐ€๋…์„ฑ์ด ๋†’์€ ์ฝ”๋“œ๊ฐ€ ๋˜๋„๋ก ํ•œ๋‹ค.
  • ์ฝ๊ธฐ ์‰ฝ๊ณ  ์œ ์ง€๋ณด์ˆ˜๊ฐ€ ์šฉ์ดํ•œ ์ฝ”๋“œ๋ฅผ ์ตœ์šฐ์„ ์œผ๋กœ ํ•œ๋‹ค.
  • ์ค‘์–ธ๋ถ€์–ธ์ด๋‚˜ ๊ตฐ๋”๋”๊ธฐ ์—†๋Š” ๊ฐ„๊ฒฐํ•œ ์ฝ”๋“œ๊ฐ€ ๋˜๋„๋ก ํž˜์“ด๋‹ค.
  • ์ฝ”๋“œ๋งŒ์œผ๋กœ ํ‘œํ˜„ํ•˜์ง€ ๋ชปํ•˜๋Š” ๋งฅ๋ฝ์€ ์ฃผ์„์„ ์ถ”๊ฐ€ํ•˜์—ฌ ์ดํ•ด๋ฅผ ๋•๋Š”๋‹ค.

๊ณตํ†ต ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ (common style guide)

  • ์ค„๋ฐ”๊ฟˆ๊ณผ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ์ ์ ˆํžˆ ์‚ฌ์šฉํ•˜์—ฌ ๋ฌธ์žฅ์ด ์˜๋ฏธ ๋‹จ์œ„๋กœ ์ฝํž ์ˆ˜ ์žˆ๋„๋ก ๋งŒ๋“ ๋‹ค.
  • ๋“ค์—ฌ์“ฐ๊ธฐ๋Š” ํƒญ์ด ์•„๋‹Œ ๊ณต๋ฐฑ์„ ์ด์šฉํ•˜๋ฉฐ 2์นธ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ๊ธฐ๋ณธ์œผ๋กœ ํ•œ๋‹ค.
  • ๋ฌธ์žฅ(statement)์˜ ๊ฐ ์ ˆ(clause)์€ ์ƒˆ๋กœ์šด ๋ผ์ธ์—์„œ ์‹œ์ž‘ํ•˜๋ฉฐ ๊ฐ ์ ˆ์˜ ์‹œ์ž‘ ํ‚ค์›Œ๋“œ๊ฐ€ ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ์ด ๋˜๋„๋ก ํ•œ๋‹ค.
    • DDL(Data Definition Language) ๋ฌธ์€ ์˜ˆ์™ธ์ ์œผ๋กœ ์™ผ์ชฝ์œผ๋กœ ์ •๋ ฌ์‹œํ‚จ๋‹ค.
  • ์ค„๋ฐ”๊ฟˆ์ด ์ง€๋‚˜์ณ ์ข๊ณ  ๊ธธ์ญ‰ํ•œ (skinny) ๊ตฌ์กฐ๊ฐ€ ๋˜๋Š” ๊ฒฝ์šฐ ์—ฐ๊ด€๋œ ์ฝ”๋“œ ๋ญ‰์น˜๋ฅผ ํ•˜๋‚˜์˜ ๋ผ์ธ์œผ๋กœ ํ•ฉ์น  ์ˆ˜ ์žˆ๋‹ค.
  • ๋ฐ˜๋Œ€๋กœ ํ•œ ๋ผ์ธ์˜ ๊ธธ์ด๊ฐ€ 80 ๊ธ€์ž ๋‚ด์™ธ์˜ ์‹œ์•ผ ๋ฒ”์œ„๋ฅผ ๋„˜์–ด์„œ๋Š” ๊ฒฝ์šฐ ๊ฐ€๋กœ๋กœ ์ง€๋‚˜์น˜๊ฒŒ ๊ธธ์–ด์ง€์ง€ (flat and wide) ์•Š๋„๋ก ์ค„๋ฐ”๊ฟˆ์„ ํ†ตํ•ด ์ ์ ˆํ•œ ํญ์„ ์œ ์ง€ํ•œ๋‹ค.
  • ์–ธ์–ด์—์„œ ์ œ๊ณตํ•˜๋Š” ์žฅ์น˜๋“ค์„ ์ด์šฉํ•˜์—ฌ ๋ฐ˜๋ณต์„ ์ตœ์†Œํ™”ํ•œ๋‹ค. ๊ณตํ†ต ํ…Œ์ด๋ธ” ํ‘œํ˜„์‹ CTE(Common Table Expression)๊ณผ ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜ UDF(User Defined Function) ๋“ฑ์€ ์ฝ”๋“œ์˜ ๋ชจ๋“ˆํ™”๋ฅผ ๋•๋Š” ์žฅ์น˜๋“ค๋กœ ์ค‘๋ณต์„ ์ œ๊ฑฐํ•˜๋Š”๋ฐ ์œ ์šฉํ•˜๋‹ค.

์Šคํƒ€์ผ ๊ฐ€์ด๋“œ๋ฅผ ์ ์šฉํ•œ ์˜ˆ์‹œ (examples)

SQL ๋ฌธ์žฅ์€ ์•„๋ž˜์™€ ๊ฐ™์ด ์„ธ๋ถ„ํ™”ํ•  ์ˆ˜ ์žˆ๋‹ค. ๊ฐ ๋ฌธ์žฅ์— ๋Œ€ํ•ด ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ๋ฅผ ์ ์šฉํ•œ ์˜ˆ์‹œ๋“ค์„ ๋ช‡๊ฐ€์ง€ ์‚ดํŽด๋ณด๋„๋ก ํ•˜์ž.

  • SEL statement - SELECT
  • DDL(Data Definition Language) statement - CREATE, ALTER, DROP
  • DML(Data Manipulation Language) statement - INSERT, UPDATE, DELETE, TRUNCATE, MERGE
  • DCL(Data Constraint Language) statement - BigQuery์—์„œ๋Š” Data Constraint Language ๊ตฌ๋ฌธ๋ฅผ ์ง€์›ํ•˜์ง€ ์•Š๋Š”๋‹ค.

SELECT๋Š” DML์˜ ํ•œ ์ข…๋ฅ˜์ด์ง€๋งŒ ๋‹ค๋ฅธ DML ๋ฌธ๊ณผ ๋‹ฌ๋ฆฌ ํ…Œ์ด๋ธ”์˜ ๋‚ด์šฉ์„ ๋ณ€๊ฒฝํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ์ ์—์„œ ๊ตฌ๋ถ„๋˜๊ธฐ๋„ ํ•œ๋‹ค.

  • SELECT statement ์˜ˆ์‹œ

๋ณธ ๊ฐ€์ด๋“œ๋Š” SELECT ๋ฌธ์žฅ์˜ ๊ตฌ์กฐ๋ฅผ ๊ตฌ์„ฑํ•˜๋Š” ํ‚ค์›Œ๋“œ๋Š” ๋Œ€๋ฌธ์ž๋กœ ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ์„ ์‹œํ‚ค๊ณ , ๋น„์ฆˆ๋‹ˆ์Šค ๋กœ์ง์€ ์†Œ๋ฌธ์ž๋ฅผ ์œ„์ฃผ๋กœ ์™ผ์ชฝ ์ •๋ ฌ์‹œ์ผœ ๋ฌธ์žฅ์˜ ๊ตฌ์กฐ(Syntax)์™€ ์˜๋ฏธ(Semantics)๊ฐ€ ๊ตฌ๋ถ„๋˜๋„๋ก ํ•˜๊ณ  ์žˆ๋‹ค.

select_structure

/* ๊ฐ ์ ˆ์˜ ์‹œ์ž‘ ํ‚ค์›Œ๋“œ์ธ SELECT, FROM, LEFT, WHERE, GROUP ๋“ฑ์„ ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ */
SELECT station_id,
       name,
       status,
       latitude, longitude,  -- ์—ฐ๊ด€ ์ปฌ๋Ÿผ๋“ค์„ ๊ฐ™์€ ๋ผ์ธ์— ๋‚˜์—ดํ•˜๋Š” ๊ฒƒ๋„ ๊ฐ€๋Šฅ
       ST_DISTANCE(
         ST_GEOGPOINT(longitude, latitude), ST_GEOGPOINT(-0.118092, 51.509865)
       ) AS distance_from_city_centre,
       COUNT(1) AS cnt,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations` s
  LEFT JOIN `bigquery-public-data.austin_bikeshare.bikeshare_trips` t
    ON s.station_id = t.start_station_id
 WHERE s.status IN ('active')
   AND s.latitude > 15.0
 GROUP BY 1, 2, 3, 4, 5
HAVING cnt > 1
 LIMIT 1000
;
  • DDL statement ์˜ˆ์‹œ

DDL๋ฌธ์€ SEL ๋ฌธ์žฅ๊ณผ ๋‹ฌ๋ฆฌ ์‹œ์ž‘ ํ‚ค์›Œ๋“œ๋ฅผ ์™ผ์ชฝ์œผ๋กœ ์ •๋ ฌ์„ ์‹œํ‚จ๋‹ค.

-- https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#creating_a_new_table

CREATE TABLE my_dataset.new_table (
  x INT64 OPTIONS (description = 'An optional INTEGER field'),
  y STRUCT<
    a ARRAY<STRING> OPTIONS (description = 'A repeated STRING field'),
    b BOOL
  >,
)
PARTITION BY _PARTITIONDATE
OPTIONS (
  expiration_timestamp = TIMESTAMP '2025-01-01 00:00:00 UTC',
  partition_expiration_days = 1,
  description = 'a table that expires in 2025, with each partition living for 24 hours',
  labels=[("org_unit", "development")]
)
;
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#examples_6

ALTER TABLE mydataset.mytable
SET OPTIONS (
      expiration_timestamp = TIMESTAMP_ADD(CURRENT_TIMESTAMP(), INTERVAL 7 DAY),
      description = "Table that expires seven days from now"
    )
;
  • DML statement ์˜ˆ์‹œ
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#insert_using_explicit_values

INSERT dataset.Inventory (
         product,
         quantity
       )
VALUES ('top load washer', 10),
       ('front load washer', 20),
       ('dryer', 30),
       ('refrigerator', 10),
       ('microwave', 20),
       ('dishwasher', 30),
       ('oven', 5)
;
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#update_using_joins

UPDATE dataset.Inventory
   SET quantity = quantity + (
         SELECT quantity
           FROM dataset.NewArrivals
          WHERE Inventory.product = NewArrivals.product
       ),
       supply_constrained = false
 WHERE product IN (SELECT product FROM dataset.NewArrivals)
;

UPDATE dataset.Inventory i
   SET quantity = i.quantity + n.quantity,
       supply_constrained = false
  FROM dataset.NewArrivals n
 WHERE i.product = n.product
;
-- https://cloud.google.com/bigquery/docs/reference/standard-sql/dml-syntax#merge_examples

MERGE dataset.DetailedInventory T
USING dataset.Inventory S
   ON T.product = S.product
 WHEN NOT MATCHED AND quantity < 20 THEN
      INSERT (product, quantity, supply_constrained, comments)
      VALUES (
        product, quantity, true,
        ARRAY<STRUCT<created DATE, comment STRING>>[
          (DATE('2016-01-01'), 'comment1')
        ]
      )
 WHEN NOT MATCHED THEN
      INSERT (product, quantity, supply_constrained)
      VALUES (product, quantity, false)

์ด๋ฆ„์ง“๊ธฐ์™€ ์‚ฌ์šฉํ•˜๊ธฐ (Naming Conventions)

์ด ์žฅ์—์„œ๋Š” ๋ฌธ์žฅ์„ ์ด๋ฃจ๋Š” ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ ๋‹จ์œ„์ธ ํ‚ค์›Œ๋“œ์™€ ์‹๋ณ„์ž ์ด๋ฆ„์— ๋Œ€ํ•œ ๋‚ด์šฉ์„ ๋‹ค๋ฃฌ๋‹ค.

๋ช…๋ช… ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

์‹๋ณ„์ž(identifier)๋Š” ๋ฌธ์žฅ๋‚ด์—์„œ ์œ ์ผํ•˜๊ฒŒ ๊ตฌ๋ณ„๋˜์–ด ์ธ์‹๋˜๋Š” ์ด๋ฆ„์œผ๋กœ ์•„๋ž˜ ํ•ญ๋ชฉ๋“ค์„ ํฌํ•จํ•œ๋‹ค.

  1. ์–ธ์–ด์˜ ๋ฌธ๋ฒ•๊ตฌ์กฐ๋ฅผ ๊ธฐ์ˆ ํ•˜๋Š” ํ† ํฐ(token) - ex. SELECT, JOIN, FROM, ...
  2. ํ…Œ์ด๋ธ”, ์ปฌ๋Ÿผ์˜ ๋ช…์นญ์ด๋‚˜ ๋ณ„์นญ(alias)
  3. ๊ธฐ๋ณธ ์ œ๊ณต ํ•จ์ˆ˜ ๋˜๋Š” ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜ ์ด๋ฆ„ - ex. SUM(), STRPOS(), ...

ํ‚ค์›Œ๋“œ(keyword) ๋Š” ์˜ˆ์•ฝ๋œ ์‹๋ณ„์ž(reserved identifier) ๋กœ์„œ SQL ๋ช…์„ธ์— ์˜ํ•ด ๊ทธ ์“ฐ์ž„์ด ์ด๋ฏธ ์ •ํ•ด์ ธ ์žˆ๋Š” ์ด๋ฆ„์„ ๋งํ•œ๋‹ค.

๋ณธ ๊ฐ€์ด๋“œ์—์„œ๋Š” ์•„๋ž˜ ์„œ์ˆ ๋œ ๊ด€๋ก€์— ๋”ฐ๋ผ ์ด๋ฆ„์„ ์ •์˜ํ•˜๊ณ  ํ‘œ๊ธฐํ•˜๋Š” ๊ฒƒ์„ ๊ถŒ์žฅํ•œ๋‹ค.

  • ํ‚ค์›Œ๋“œ๋Š” ๋Œ€๋ฌธ์ž๋กœ ๊ทธ ์™ธ์˜ ์‹๋ณ„์ž๋Š” ์†Œ๋ฌธ์ž๋กœ ์ž‘์„ฑํ•˜๋„๋ก ํ•œ๋‹ค.
  • ์—ฌ๋Ÿฌ ๋‹จ์–ด๋กœ ์ด๋ฃจ์–ด์ง„ ์‹๋ณ„์ž๋Š” camelCase๊ฐ€ ์•„๋‹Œ snake_case์˜ ํ˜•ํƒœ๋กœ ์ž‘์„ฑํ† ๋ก ํ•œ๋‹ค.
  • ๊ธฐ๋ณธ ์ œ๊ณต ํ•จ์ˆ˜ ํ˜น์€ ๋‚ด์žฅ ํ•จ์ˆ˜(built-in function)๋Š” ๋Œ€์†Œ๋ฌธ์ž๋ฅผ ๊ตฌ๋ถ„ํ•˜์ง€ ์•Š์œผ๋‚˜ ๋Œ€๋ฌธ์ž๋กœ ์ž‘์„ฑํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•œ๋‹ค.
  • ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜(UDF, user-defined function)๋Š” ์†Œ๋ฌธ์ž๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ํ•จ์ˆ˜ ์ด๋ฆ„์„ ์ •์˜ํ•˜๋„๋ก ํ•œ๋‹ค. UDF ์ด๋ฆ„์€ ๋Œ€์†Œ๋ฌธ์ž๋ฅผ ๊ตฌ๋ถ„ํ•œ๋‹ค.

์œ„์˜ ๊ด€๋ก€์— ๋”ฐ๋ฅธ ์˜ˆ์‹œ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT first_name, last_name FROM my_dataset.my_table

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

select firstName, lastName from myDataset.myTable -- camel case ์‚ฌ์šฉ
-- OR
SELECT FIRST_NAME, LAST_NAME FROM MY_DATASET.MY_TABLE

๋‚ด์žฅ ํ•จ์ˆ˜์™€ ๋‹ฌ๋ฆฌ ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜์˜ ์ด๋ฆ„์— ์†Œ๋ฌธ์ž๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ์ด์œ ๋Š” ๋‹ค์Œ๊ณผ ๊ฐ™๋‹ค.

์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜๋ฅผ ์˜์† ํ•จ์ˆ˜ (persistent function) ๋กœ ์ •์˜ํ•˜๋Š” ๊ฒฝ์šฐ ๋ฐ์ดํ„ฐ์…‹ ์ด๋ฆ„์„ ํฌํ•จํ•œ ํ•œ์ •๋œ ํ•จ์ˆ˜ ์ด๋ฆ„ (qualified function name)์„ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค. ์ด๋•Œ ํ•จ์ˆ˜๋ฅผ ํ•œ์ •์‹œํ‚ค๊ธฐ ์œ„ํ•ด ์‚ฌ์šฉํ•˜๋Š” ํ”„๋กœ์ ํŠธ ์ด๋ฆ„์ด๋‚˜ ๋ฐ์ดํ„ฐ์…‹์˜ ์ด๋ฆ„์ด ์†Œ๋ฌธ์ž๋กœ ์ž‘์„ฑ๋˜๊ธฐ ๋•Œ๋ฌธ์— ํ•˜๋‚˜์˜ ์˜๋ฏธ ๋‹จ์œ„๋‚ด์—์„œ ๋Œ€์†Œ๋ฌธ์ž๊ฐ€ ํ˜ผ์šฉ๋˜์ง€ ์•Š๊ฒŒ ํ•˜๊ธฐ ์œ„ํ•ด์„œ์ด๋‹ค.

์ด๋ฅผ ํ†ตํ•ด ์•„๋ž˜์— ์„ค๋ช…ํ•  FQTN (Fully Qualified Table Name) ํ…Œ์ด๋ธ” ์ด๋ฆ„ ๊ทœ์น™๊ณผ์˜ ์ผ๊ด€์„ฑ๋„ ์œ ์ง€ํ•  ์ˆ˜ ์žˆ๋‹ค.

-- ์™„์ „ ํ•œ์ •๋œ ํ•จ์ˆ˜ ์ด๋ฆ„์˜ ์˜ˆ - fully qualified function name
SELECT bqutils.fn.last_day('2021-03-07')

๋Œ€์†Œ๋ฌธ์ž์˜ ๊ตฌ๋ถ„์€ ๋ฌธ์žฅ๋‚ด์—์„œ ๊ฐ ์ด๋ฆ„์ด ์ œ ์—ญํ• ์„ ๋“œ๋Ÿฌ๋‚ด๋„๋ก ํ•˜๋Š”๋ฐ ๋„์›€์„ ์ค€๋‹ค.

๋Œ€๋ฌธ์ž SELECT, FROM ๋“ฑ์˜ ํ‚ค์›Œ๋“œ๋Š” ์ฃผ๋กœ ๋ฌธ์žฅ ๊ตฌ์กฐ์˜ ํ‘œํ˜„์— ์‚ฌ์šฉ๋˜๋Š”๋ฐ ์†Œ๋ฌธ์ž๋กœ ํ‘œ๊ธฐ๋˜๋Š” ๊ฐœ์ฒด(entity)๋‚˜ ์†์„ฑ(attribute)์˜ ์‹๋ณ„์ž ์ด๋ฆ„๊ณผ ์‹œ๊ฐ์ ์œผ๋กœ ๊ตฌ๋ถ„์ด ๋œ๋‹ค. ์ด๋Š” ์ „์ฒด์ ์ธ ๋ฌธ์žฅ ๊ตฌ์กฐ์˜ ํŒŒ์•…์„ ์šฉ์ดํ•˜๊ฒŒ ํ•œ๋‹ค.

๊ธฐํƒ€ ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ

  • ํ‚ค์›Œ๋“œ๊ฐ€ ๋ถ€๋“์ดํ•˜๊ฒŒ ์ปฌ๋Ÿผ์˜ ์ด๋ฆ„์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ backtick(`)์„ ์ด์šฉํ•˜์—ฌ ๊ฐ์‹ธ์ค€๋‹ค (quoted identifiers).
-- from ์˜ˆ์•ฝ์–ด๊ฐ€ ์ปฌ๋Ÿผ๋ช…์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ
SELECT 'asia-northeast3' AS `from` FROM ...
  • ๋ฌธ์ž์—ด ๋ฆฌํ„ฐ๋Ÿด(string literal)์€ ์ž‘์€๋”ฐ์˜ดํ‘œ(')๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.
  • ์ƒ์ˆ˜(constant)๊ฐ’์„ ๊ฐ–๋Š” ๋ณ€์ˆ˜ ์ด๋ฆ„์€ ๋Œ€๋ฌธ์ž๋กœ ํ•œ๋‹ค.
DECLARE START_DATE DATE DEFAULT '2021-02-19'

์ฐธ๊ณ  (references)


์ปฌ๋Ÿผ ์ด๋ฆ„ ์ง“๊ธฐ์™€ ์‚ฌ์šฉํ•˜๊ธฐ (Column Naming)

์ปฌ๋Ÿผ ์ด๋ฆ„ ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

์ปฌ๋Ÿผ์ด๋ฆ„์€ camelCase๊ฐ€ ์•„๋‹Œ ์†Œ๋ฌธ์ž๋ฅผ ์ด์šฉํ•˜์—ฌ snake_case ๋กœ ํ‘œํ˜„ํ•˜๋ฉฐ ํ—๊ฐ€๋ฆฌ์–ธ ํ‘œ๊ธฐ ๋‚˜ ์˜๋ฏธ๋ฅผ ์œ ์ถ”ํ•˜๊ธฐ ํž˜๋“  ์ง€๋‚˜์น˜๊ฒŒ ์ถ•์•ฝ๋œ ์ ‘๋‘์–ด(prefix), ์ ‘๋ฏธ์–ด(suffix)์˜ ์‚ฌ์šฉ์€ ์ž์ œํ•œ๋‹ค.

๊ธฐ์ค€์ •๋ณด์™€ ๊ฐ™์ด ํŒ€๋‚ด์—์„œ ํ‘œ์ค€์œผ๋กœ ๊ด€๋ฆฌ๋˜๋Š” ์†์„ฑ๋“ค์ด ์žˆ๋Š” ๊ฒฝ์šฐ์—๋Š” ํ•ด๋‹น ํ‘œ์ค€์—์„œ ์ •์˜๋œ ๋ช…๋ช…๊ทœ์น™์„ ์šฐ์„ ํ•œ๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, ๊ตญ๊ฐ€์ฝ”๋“œ๋“ฑ ๊ธฐ์ค€์ •๋ณด๋กœ ์‚ฌ์šฉ๋˜๋Š” ์ปฌ๋Ÿผ์˜ ์ด๋ฆ„์€ ํ‘œ์ค€์—์„œ ์ •์˜๋œ ๋ช…๋ช… ๊ทœ์น™์„ ์ฐจ์šฉํ•ด์„œ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค.

SELECT cnty_cd, mcc, mnc, csc, FROM ...

๋ฐฐ์—ดํ˜•์ด๋‚˜ ๋ณต์ˆ˜์˜ ์˜๋ฏธ๋ฅผ ์ง€๋‹ˆ๋Š” ์ปฌ๋Ÿผ์ธ ๊ฒฝ์šฐ๋Š” ์ด๋ฆ„์„ ๋ณต์ˆ˜ํ˜•์œผ๋กœ ํ•  ์ˆ˜ ์žˆ๋‹ค.

SELECT hits,          -- ARRAY<STRUCT<>> ํ˜•
       totals.visits, -- INT64ํ˜•, ์ด ์„ธ์…˜ ํšŸ์ˆ˜
       totals.hits,   -- INT64ํ˜•, ์„ธ์…˜๋‚ด ์กฐํšŒ์ˆ˜
  FROM `bigquery-public-data.google_analytics_sample.ga_sessions_20170801`
;

์ ‘๋‘์–ด์™€ ์ ‘๋ฏธ์–ด (prefix, suffix)

์ ‘๋‘์–ด๋‚˜ ์ ‘๋ฏธ์–ด๋Š” ์ด๋ฆ„์— ๋ถ€๊ฐ€์ ์ธ ์˜๋ฏธ(semantics)๋ฅผ ๋ถ€์—ฌํ•˜๊ธฐ ์œ„ํ•œ ๋ชฉ์ ์œผ๋กœ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•˜๋‚˜ ์ง€๋‚˜์น˜๊ฒŒ ์ถ•์•ฝ๋œ ์•ฝ์–ด๋‚˜ ๊ณผ๋„ํ•œ ์‚ฌ์šฉ์€ ์˜คํžˆ๋ ค ๊ฐ€๋…์„ฑ์„ ๋–จ์–ด๋œจ๋ฆด ์ˆ˜ ์žˆ์–ด ์‚ฌ์šฉ์— ์‹ ์ค‘ํ•ด์•ผ ํ•œ๋‹ค.

์•„๋ž˜๋Š” ๋ช‡๊ฐ€์ง€ ์‚ฌ์šฉ ๊ฐ€๋Šฅํ•œ ์˜ˆ์‹œ์ด๋‹ค.

  • Boolean ํ˜•์˜ ๊ฒฝ์šฐ is_, has_์™€ ๊ฐ™์€ ์ ‘๋‘์–ด๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.
  • Dateํ˜•์€ _dt ์ ‘๋ฏธ์–ด๋ฅผ ๋ถ™์ผ ์ˆ˜ ์žˆ๋‹ค.
  • DateTime์ด๋‚˜ Timestampํ˜•์€ _at ์ ‘๋ฏธ์–ด๊ฐ€ ๊ฐ€๋Šฅํ•˜๋‹ค.
  • user_id, device_id ์ฒ˜๋Ÿผ ์‹๋ณ„์ž(id) ์ ‘๋ฏธ์–ด์˜ ์‚ฌ์šฉ์ด ๊ฐ€๋Šฅํ•˜๋‹ค.
SELECT is_active,
       registered_dt,
       updated_at,
  FROM ...

์ปฌ๋Ÿผ ๋ณ„์นญ (column_alias)

์ปฌ๋Ÿผ์˜ ๋ณ„์นญ ์ด๋ฆ„์€ ์ปฌ๋Ÿผ ์ด๋ฆ„ ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ์— ์ค€ํ•˜์—ฌ ์ž‘์„ฑํ•œ๋‹ค. ๋ณ„์นญ ์•ž์˜ AS ํ‚ค์›Œ๋“œ ๋Š” ์ƒ๋žต ๊ฐ€๋Šฅํ•˜๋‚˜ ๋ณธ ๊ฐ€์ด๋“œ์—์„œ๋Š” ๋ช…์‹œ์ ์œผ๋กœ ์‚ฌ์šฉํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•œ๋‹ค.

๋ณ„์นญ์€ ๋ชฉ์ ์— ๋”ฐ๋ผ์„œ ์ปฌ๋Ÿผ ์ด๋ฆ„๋ณด๋‹ค ์ƒ์„ธํžˆ ์ž‘์„ฑ๋˜๋Š” ๊ฒฝ์šฐ๋„ ์žˆ๊ณ  ๋ฐ˜๋Œ€๋กœ ๊ฐ„๋žตํ•˜๊ฒŒ ์ถ•์•ฝ๋œ ํ˜•ํƒœ๋กœ ์‚ฌ์šฉํ•˜๊ธฐ๋„ ํ•œ๋‹ค.

์ปฌ๋Ÿผ์˜ ์˜๋ฏธ๋ฅผ ๋ช…ํ™•ํžˆ ํ•˜๊ธฐ ์œ„ํ•ด์„œ ์•„๋ž˜์™€ ๊ฐ™์ด ๊ธด ์ด๋ฆ„์˜ ๋ณ„์นญ์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.

SELECT fname AS first_name,
       lname AS last_name,
  FROM ...

๋ฐ˜๋ฉด์— ์ปฌ๋Ÿผ ์ด๋ฆ„์ด ์œ ๋„๋œ ์ปฌ๋Ÿผ(derived column) ๋˜๋Š” ๊ณ„์‚ฐ๋œ ์ปฌ๋Ÿผ(calculated column)์„ ๋งŒ๋“ค์–ด ๋‚ด๋Š” ์—ฐ์‚ฐ์‹์—์„œ ๋ฐ˜๋ณต์ ์œผ๋กœ ์‚ฌ์šฉ๋˜์–ด์ง€๋Š” ๊ฒฝ์šฐ๋Š” ๊ฐ„๊ฒฐํ•จ์„ ์œ„ํ•ด์„œ ์ถ•์•ฝ๋œ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜๊ธฐ๋„ ํ•œ๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

WITH locations AS (
  SELECT x_position,
         y_position,
         city_centre_x_position,
         city_center_y_position,
    FROM ...
)
SELECT SQRT(
         (city_center_x_position - x_position) * (city_center_x_position - x_position) +
         (city_center_y_position - y_position) * (city_center_y_position - y_position)
       ) AS distince,  -- calculated(or derived) column
  FROM locations

์œ„์˜ ์˜ˆ์‹œ๋Š” ํ…Œ์ด๋ธ”์ด ๊ฐ€์ง€๋Š” ๊ธฐ๋ณธ ์ปฌ๋Ÿผ (base column) ์œผ๋กœ๋ถ€ํ„ฐ ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ํŒŒ์ƒ์‹œํ‚ค๋Š” ์—ฐ์‚ฐ์‹์— ์ปฌ๋Ÿผ ์ด๋ฆ„์ด ๋ฐ˜๋ณต์ ์œผ๋กœ ๋“ฑ์žฅํ•˜์—ฌ ๋ผ์ธ์ด ๊ธธ์–ด์ง€๊ณ  ์žˆ๋‹ค. ์ด๋ฅผ ์•„๋ž˜์™€ ๊ฐ™์ด ์ถ•์•ฝ๋œ ๋ณ„์นญ์„ ์‚ฌ์šฉํ•˜์—ฌ ์—ฐ์‚ฐ์‹์„ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ํ‘œํ˜„ํ•  ์ˆ˜ ์žˆ๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

WITH location AS (
  SELECT x_position AS x,
         y_position AS y,
         city_centre_x_position AS cx,
         city_center_y_position AS cy,
    FROM ...
)
SELECT SQRT((cx - x) * (cx - x) + (cy - y) * (cy - y)) AS distince,
  FROM location

ํ…Œ์ด๋ธ” ์ด๋ฆ„ ์ง“๊ธฐ์™€ ์‚ฌ์šฉํ•˜๊ธฐ (Table Naming)

ํ…Œ์ด๋ธ” ์ด๋ฆ„ ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

ํ…Œ์ด๋ธ” ์ด๋ฆ„์€ ์†Œ๋ฌธ์ž ์‚ฌ์šฉ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•˜๋˜ ์˜๊ตฌ ํ…Œ์ด๋ธ”(permanent table)์— ๋Œ€ํ•œ ํŒ€๋‚ด ๋ช…๋ช… ๊ทœ์น™์ด ๋ณ„๋„๋กœ ์ •ํ•ด์ ธ ์žˆ๋Š” ๊ฒฝ์šฐ ๊ทธ๊ฒƒ์„ ์šฐ์„ ํ•œ๋‹ค.

๋‹ค๋งŒ, ์ž„์‹œํ…Œ์ด๋ธ”(temporary table)์ด๋‚˜ ๊ณตํ†ต ํ…Œ์ด๋ธ” ํ‘œํ˜„์‹(CTEs) ์ด๋ฆ„์˜ ๊ฒฝ์šฐ๋Š” ์†Œ๋ฌธ์ž๋ฅผ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค. ํ…Œ์ด๋ธ” ์ด๋ฆ„์€ ๋Œ€์†Œ๋ฌธ์ž๋ฅผ ๊ตฌ๋ถ„ํ•œ๋‹ค.

  • ์—ฌ๋ž˜ ๊ฐœ์˜ ๋‹จ์–ด๋ฅผ ์กฐํ•ฉํ•˜์—ฌ ์ด๋ฆ„๋ฅผ ์ง€์„ ๊ฒฝ์šฐ dash(-)๊ฐ€ ์•„๋‹Œ underscore(_)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ snake_case ์ด๋ฆ„์ด ๋˜๋„๋ก ํ•œ๋‹ค.
  • ๊ณผ์ œ๋ณ„ ๋ช…๋ช…๊ทœ์น™์— ์˜ํ•ด ์ด๋ฏธ ์ƒ์„ฑ๋˜์–ด ์šด์˜๋˜๊ณ  ์žˆ๋Š” ๋ฐ์ดํ„ฐ์…‹๊ณผ ํ…Œ์ด๋ธ” ์ด๋ฆ„๋“ค์€ ๊ฐ€์ด๋“œ์˜ ๊ถŒ๊ณ ๋ฅผ ๋”ฐ๋ฅด์ง€ ์•Š๋”๋ผ๋„ As-Is ์ด๋ฆ„์„ ๊ทธ๋Œ€๋กœ ์‚ฌ์šฉํ† ๋ก ํ•œ๋‹ค.

โ–  Fully Qualified Table Name (FQTN)

'์™„์ „ํ•˜๊ฒŒ ํ•œ์ •๋œ ํ…Œ์ด๋ธ” ์ด๋ฆ„ ๋˜๋Š” ์ •๊ทœํ™”๋œ ํ…Œ์ด๋ธ” ์ด๋ฆ„'์˜ ์˜๋ฏธ๋Š” ๊ทธ ์ด๋ฆ„์„ ๊ฐ€์ง€๊ณ  ์‹คํ–‰์ปจํ…์ŠคํŠธ์™€ ์ƒ๊ด€์—†์ด ์œ ์ผํ•˜๊ฒŒ ํ…Œ์ด๋ธ”๋ฅผ ํŠน์ •ํ•  ์ˆ˜ ์žˆ๋‹ค๋Š” ๋ง์ด๋‹ค. ์˜ˆ๋ฅผ ๋“ค์–ด, customer.devices ๋ผ๋Š” ์ผ๋ถ€ ํ•œ์ •๋œ ํ…Œ์ด๋ธ” ์ด๋ฆ„ ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ์ฟผ๋ฆฌ๊ฐ€ ์‹คํ–‰๋˜๋Š” ํ”„๋กœ์ ํŠธ์— ๋”ฐ๋ผ ๊ฐ๊ธฐ ๋‹ค๋ฅธ customer.devices ํ…Œ์ด๋ธ”์— ์ ‘๊ทผ๋  ์ˆ˜ ์žˆ๋‹ค.

BigQuery๋กœ DWH(Data Ware House)๋ฅผ ๊ตฌ์ถ•ํ•˜๋Š” ๊ฒฝ์šฐ ๋ฐ์ดํ„ฐ๊ฐ€ ์ €์žฅ๋œ ํ”„๋กœ์ ํŠธ์™€ BigQuery Job์„ ์‹คํ–‰์‹œ์ผœ์ฃผ๋Š” ํ”„๋กœ์ ํŠธ๊ฐ€ ๋‹ค๋ฅผ ์ˆ˜ ์žˆ๊ธฐ ๋•Œ๋ฌธ์— ์‹คํ–‰์ปจํ…์ŠคํŠธ์˜ ์˜ํ–ฅ์„ ๋ฐ›์ง€ ์•Š๋„๋ก ์ •๊ทœํ™”๋œ ํ˜•ํƒœ์˜ ํ…Œ์ด๋ธ” ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค. ํ…Œ์ด๋ธ” ์ด๋ฆ„์— ํ”„๋กœ์ ํŠธ ๋ช…์„ ์ƒ๋žตํ•˜๋Š” ๊ฒฝ์šฐ BigQuery Job ์ด ์ˆ˜ํ–‰๋˜๊ณ  ์žˆ๋Š” ํ”„๋กœ์ ํŠธ ์ด๋ฆ„์„ ์•”๋ฌต์ ์œผ๋กœ ์‚ฌ์šฉํ•œ๋‹ค.

์ด์™ธ์—, ๋‹จ์ผ ๋ฌธ์žฅ(single statement)์˜ ์ฟผ๋ฆฌ๋ฅผ ์‹คํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ๊ฐ€ ์•„๋‹Œ BigQuery ์Šคํฌ๋ฆฝํŠธ(script)๋กœ ๋ณต์ˆ˜์˜ ๋ฌธ์žฅ(multiple statements)์„ ์‹คํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” GCP Project ์ด๋ฆ„์— ํฌํ•จ๋œ dash(-)๊ฐ€ ๋บ„์…ˆ ๋“ฑ์˜ ๋‹ค๋ฅธ ์˜๋ฏธ๋กœ ํ•ด์„์ด ๋˜์ง€ ์•Š๋„๋ก ์ •๊ทœํ™”๋œ ํ…Œ์ด๋ธ” ์ด๋ฆ„ ์„ backtick(`)์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ์‹ธ์ค€๋‹ค.

์ด๋•Œ dash๊ฐ€ ํฌํ•จ๋œ ํ”„๋กœ์ ํŠธ ์ด๋ฆ„๋งŒ backtick์œผ๋กœ ๊ฐ์‹ธ์ฃผ์–ด๋„ ์˜ค๋ฅ˜๊ฐ€ ๋ฐœ์ƒํ•˜์ง€๋Š” ์•Š์œผ๋‚˜ FQTN ์ „์ฒด๋ฅผ backtick์„ ๊ฐ์‹ธ์ฃผ์–ด ์˜๋ฏธ์ ์œผ๋กœ ํ•˜๋‚˜์˜ ๋‹จ์œ„์ž„์„ ๋ช…์‹œ์ ์œผ๋กœ ๋‚˜ํƒ€๋‚ด๋„๋ก ํ•œ๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT * FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT * FROM `bigquery-public-data.austin_bikeshare`.bikeshare_stations
-- or
SELECT * FROM `bigquery-public-data.austin_bikeshare`.`bikeshare_stations`
-- or
SELECT * FROM `bigquery-public-data`.`austin_bikeshare`.`bikeshare_stations`

๋ณ„ํ‘œ(*)๋ฅผ ํฌํ•จํ•˜๋Š” ์™€์ผ๋“œ์นด๋“œ(wildcard) ํ…Œ์ด๋ธ” ์ด๋ฆ„๋„ backtick(`)์œผ๋กœ ๊ฐ์‹ธ์ฃผ์–ด์•ผ ํ•œ๋‹ค.

/* Valid standard SQL query with wildcard table name*/
SELECT max
  FROM `bigquery-public-data.noaa_gsod.gsod*`
 WHERE max != 9999.9 # code for missing data
   AND _TABLE_SUFFIX = '1929'
 ORDER BY max DESC
;

โ–  ์ฃผ์˜

ํ…Œ์ด๋ธ” ์ด๋ฆ„์ด ๋”ฐ์˜ดํ‘œ๋กœ ๋ฌถ์ธ ์‹๋ณ„์ž(quoted identifier)์ธ ๊ฒฝ์šฐ ์ฐธ์กฐ(reference)์‹œ ๋ฌถ์Œ ๋‹จ์œ„๋กœ ์‚ฌ์šฉํ•ด์•ผ ํ•œ๋‹ค. ์—ฌ๋Ÿฌ ํ…Œ์ด๋ธ”์„ ๋‹ค๋ฃจ๋Š” JOIN ๊ตฌ๋ฌธ์—์„œ ์•„๋ž˜์™€ ๊ฐ™์ด ํ…Œ์ด๋ธ”์„ ์ฐธ์กฐํ•  ๊ฒฝ์šฐ ์—๋Ÿฌ๊ฐ€ ๋ฐœ์ƒํ•œ๋‹ค.

-- Error
SELECT *
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  JOIN `bigquery-public-data.san_francisco.bikeshare_trips`
    ON bikeshare_stations.station_id = bikeshare_trips.start_station_id

Unrecognized name: bikeshare_stations

์ด๋Š” FQTN ์ „์ฒด๊ฐ€ ๋”ฐ์˜ดํ‘œ๋กœ ๋ฌถ์ธ ์‹๋ณ„์ž์—ฌ์„œ ํ•˜๋‚˜์˜ ๋ฌถ์Œ์œผ๋กœ ์ฐธ์กฐ๋˜์–ด์•ผ ํ•˜๊ณ  ๋ฌถ์Œ ์•ˆ์˜ ํ…Œ์ด๋ธ” ์ด๋ฆ„์„ ๋”ฐ๋กœ ๊บผ๋‚ด ์‚ฌ์šฉํ•  ์ˆ˜ ์—†๊ธฐ ๋•Œ๋ฌธ์ด๋‹ค.

FQTN ์ „์ฒด๋ฅผ ์ฐธ์กฐํ•˜๋Š” ์•„๋ž˜ ์ฟผ๋ฆฌ๋Š” ์ •์ƒ ์ˆ˜ํ–‰๋˜๋‚˜ ๋ณด๊ธฐ์— ์ข‹์€ ์ฝ”๋“œ๋Š” ์•„๋‹ˆ๋‹ค.

-- working but looks ugly
SELECT *
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  JOIN `bigquery-public-data.san_francisco.bikeshare_trips`
    ON `bigquery-public-data.austin_bikeshare.bikeshare_stations`.station_id =
       `bigquery-public-data.san_francisco.bikeshare_trips`.start_station_id

์ด๋Ÿฐ ๊ฒฝ์šฐ ํ…Œ์ด๋ธ” ๋ณ„์นญ(alias)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋”ฐ์˜ดํ‘œ๋กœ ๋ฌถ์ธ FQTN ์ด ์•„๋‹Œ ๋ณ„์นญ์„ ์ฐธ์กฐํ•˜๋„๋ก ํ•˜์—ฌ ํ…Œ์ด๋ธ” ์ด๋ฆ„ ์ง“๊ธฐ ์ง€์นจ์„ ๋”ฐ๋ฅด๋ฉด์„œ๋„ ๊ฐ„๊ฒฐํ•จ์„ ์œ ์ง€ํ•  ์ˆ˜ ์žˆ๋‹ค.

SELECT *
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations` s
  JOIN `bigquery-public-data.san_francisco.bikeshare_trips` t
    ON s.station_id = t.start_station_id

๋ฌธ์žฅ ๊ตฌ์กฐํ™” ํ•˜๊ธฐ (Statement Structure)

SQL ๋ฌธ๋ฒ•์— ์ •์˜๋œ ํ‚ค์›Œ๋“œ๋“ค๊ณผ ์•ž์„œ์˜ ๋ช…๋ช… ๊ทœ์น™์— ์˜ํ•ด ์ง€์–ด์ง„ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ฐ ๊ตฌ์ ˆ์„ ๊ธฐ์ˆ ํ•˜๊ณ  ์ด๋ฅผ ๋ธ”๋Ÿญ์ฒ˜๋Ÿผ ์Œ“์•„ ๋ฌธ์žฅ์œผ๋กœ ๊ตฌ์กฐํ™”ํ•ด ๋ณด๋„๋ก ํ•œ๋‹ค.

SELECT ์ปฌ๋Ÿผ ๋ฆฌ์ŠคํŠธ

SQL์˜ ๊ฐ€์žฅ ๊ธฐ๋ณธ์ ์ธ SELECT ๋ฌธ์žฅ์˜ ์ปฌ๋Ÿผ ๋ฆฌ์ŠคํŠธ๋Š” ์•„๋ž˜์˜ ์ง€์นจ์— ๋”ฐ๋ผ ์ž‘์„ฑํ•œ๋‹ค.

  • SELECT ๋ฆฌ์ŠคํŠธ์— ๋“ฑ์žฅํ•˜๋Š” ์ปฌ๋Ÿผ์ด ๋ณต์ˆ˜๊ฐœ์ผ ๊ฒฝ์šฐ ํ•œ ๋ผ์ธ์— ํ•˜๋‚˜์”ฉ ์ž‘์„ฑํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•œ๋‹ค.
  • SELECT * ๋กœ ์ „์ฒด ์ปฌ๋Ÿผ์„ ๊ฐ€์ ธ์˜ค๊ธฐ ๋ณด๋‹ค๋Š” ํ•„์š”ํ•œ ์ปฌ๋Ÿผ๋งŒ์„ SELECT ๋ฆฌ์ŠคํŠธ์— ์—ด๊ฑฐํ•˜๋„๋ก ํ•œ๋‹ค.
SELECT station_id,
       name,
       status,
       latitude,
       longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  • ์ผ๋ถ€ ์ปฌ๋Ÿผ์„ ์ œ์™ธํ•œ ์ „์ฒด ์ปฌ๋Ÿผ์ด ํ•„์š”ํ•œ ๊ฒฝ์šฐ๋Š” EXCEPT ๊ตฌ๋ฌธ์„ ํ™œ์šฉํ•œ๋‹ค.
SELECT * EXCEPT(latitude, longitude)
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  • ์˜๋ฏธ์ ์œผ๋กœ ๊ด€๋ จ์ด ์žˆ๊ฑฐ๋‚˜ ์‹œ์•ผ ๋ฒ”์œ„๋‚ด์—์„œ ํ•œ ๋ˆˆ์— ์ฝํžˆ๊ณ  ์ดํ•ด๋  ์ˆ˜ ์žˆ๋‹ค๋ฉด 80์ž๋ฅผ ๋„˜์ง€ ์•Š๋Š” ์„ ์—์„œ ํ•˜๋‚˜์˜ ๋ผ์ธ์— ๋ณต์ˆ˜์˜ ์ปฌ๋Ÿผ๋“ค์„ ๊ธฐ์ˆ ํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ์ฃผ์„์„ ์‚ฌ์šฉํ•˜์—ฌ ์ปฌ๋Ÿผ์˜ ์˜๋ฏธ๋ฅผ ๋ถ€์—ฐ ์„ค๋ช…ํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ๋ช…์˜ ๋’ค์— ์ฝค๋งˆ(,)๋ฅผ ์ถ”๊ฐ€ํ•œ๋‹ค.
SELECT station_id,
       name,
       status,
       latitude, longitude,  -- ํ•˜๋‚˜์˜ ์ปฌ๋Ÿผ ๋ญ‰์น˜(Column Family)๋กœ ์˜๋ฏธ์ ์œผ๋กœ ๋ฌถ์ž„
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
;
-- ๋˜๋Š”
SELECT station_id, name, status, latitude, longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
;
  • ์ปฌ๋Ÿผ ๋ณ„์นญ(alias)์„ ํฌํ•จํ•˜๋Š” ๊ฒฝ์šฐ์—๋„ ์œ„์˜ ๋‚ด์šฉ๋“ค์ด ๋™์ผํ•˜๊ฒŒ ์ ์šฉ๋œ๋‹ค.
  • ์ปฌ๋Ÿผ ๋ณ„์นญ์€ AS ํ‚ค์›Œ๋“œ ๋‹ค์Œ์— ์œ„์น˜์‹œํ‚ค๋ฉฐ ๋ณ„์นญ์˜ ์‹œ์ž‘ ์œ„์น˜๋Š” ๋”ฐ๋กœ ์ •๋ ฌ์‹œํ‚ค์ง€ ์•Š๋Š”๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT station_id AS station_id,
       name AS station_name,
       status AS bike_status,
       latitude,
       longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT station_id         AS station_id,
       name               AS station_name,
       status             AS bike_status,
       latitude,
       longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

์•ž์„  ์‰ผํ‘œ์™€ ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ (leading comma vs. trailing comma)

BigQuery์—์„œ๋Š” SELECT ๋ฆฌ์ŠคํŠธ์˜ ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ ๋’ค์— ์ฝค๋งˆ(,)๋ฅผ ํ—ˆ์šฉํ•˜๊ธฐ ๋•Œ๋ฌธ์— ๋ณธ ๊ฐ€์ด๋“œ์—์„œ๋Š” ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ๋ฅผ ์‚ฌ์šฉํ•œ๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ recommend

SELECT station_id,
       name,
       status,
       latitude,
       longitude,  -- ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ ๋’ค ์ฝค๋งˆ(,) ์ถ”๊ฐ€
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

๐Ÿ˜ž ๊ธฐํ”ผ avoid

SELECT station_id
     , name
     , status
     , latitude
     , longitude
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`

๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ๋ฅผ ์ง€์›ํ•˜์ง€ ์•Š๋Š” SQL ์–ธ์–ด์—์„œ๋Š” ์•ž์„  ์‰ผํ‘œ๋ฅผ ์‚ฌ์šฉํ•ด์„œ SELECT ๋ฆฌ์ŠคํŠธ์— ์ƒˆ๋กœ์šด ์ปฌ๋Ÿผ์„ ์ถ”๊ฐ€ํ•ด๋„ ์ง์ „ ๋ผ์ธ์— ์˜ํ–ฅ์„ ์ฃผ์ง€ ์•Š๋„๋ก ํ•˜์˜€์œผ๋‚˜ BigQuery์—์„œ๋Š” ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ๋ฅผ ์ง€์›ํ•˜๋ฏ€๋กœ ์ด๋ฅผ ๊ธฐ๋ณธ ์Šคํƒ€์ผ๋กœ ์‚ผ๋Š”๋‹ค.

-- ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ ์‚ฌ์šฉ
SELECT station_id,
       name
  FROM ...

-- ์ปฌ๋Ÿผ ์ถ”๊ฐ€์‹œ
SELECT station_id,
       name,    -- ์•„๋ž˜ ๋•Œ๋ฌธ์— comma๊ฐ€ ์ถ”๊ฐ€๋จ.
       status   -- ์‹ ๊ทœ์ปฌ๋Ÿผ ์ถ”๊ฐ€
  FROM ...
-- ์•ž์„  ์‰ผํ‘œ ์‚ฌ์šฉ
SELECT station_id
     , name
  FROM ...

-- ์ปฌ๋Ÿผ ์ถ”๊ฐ€์‹œ
SELECT station_id
     , name     -- ์•„๋ž˜ ์ปฌ๋Ÿผ ์ถ”๊ฐ€๋กœ ๋ณ€๊ฒฝ๋˜๋Š” ๋‚ด์šฉ์ด ์—†์Œ
     , status   -- ์‹ ๊ทœ์ปฌ๋Ÿผ ์ถ”๊ฐ€
  FROM ...

์ด์™ธ์— ๋ฒ”์šฉ ํ”„๋กœ๊ทธ๋ž˜๋ฐ ์–ธ์–ด๋‚˜ ๋™์ (dynamic) SQL์„ ์‚ฌ์šฉํ•˜์—ฌ SELECT ๋ฌธ์˜ ์ปฌ๋Ÿผ ๋ฆฌ์ŠคํŠธ๋ฅผ ์ž๋™ ์ƒ์„ฑํ•  ๋•Œ ๋’ค๋”ฐ๋ฅด๋Š” ์‰ผํ‘œ๋ฅผ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ๋ฆฌ์ŠคํŠธ์˜ ๋งˆ์ง€๋ง‰ ์ปฌ๋Ÿผ์— ์‰ผํ‘œ๊ฐ€ ์ƒ๋žต๋˜๋„๋ก ์˜ˆ์™ธ ์ฒ˜๋ฆฌํ•  ํ•„์š”๊ฐ€ ์—†์–ด ๊ตฌํ˜„ ๋กœ์ง์ด ๋‹จ์ˆœํ•ด์ง€๋Š” ์žฅ์ ์ด ์žˆ๋‹ค.

์ปฌ๋Ÿผ ์ˆœ์„œ ๊ด€๋ก€ (column order conventions)

  • ๊ธฐ๋ณธํ‚ค(PK, Primary Key)์— ํ•ด๋‹นํ•˜๋Š” ์ปฌ๋Ÿผ๋ช…์„ ๋จผ์ € ์œ„์น˜์‹œํ‚จ๋‹ค.
  • (optional) ํŒŒํ‹ฐ์…˜๋œ ํ…Œ์ด๋ธ” ์‚ฌ์šฉ์‹œ ํŒŒํ‹ฐ์…˜ ์ปฌ๋Ÿผ๋ช…์€ ๋งˆ์ง€๋ง‰์— ์œ„์น˜์‹œํ‚จ๋‹ค. ์šด์˜์ค‘์— ์Šคํ‚ค๋งˆ ๋ณ€๊ฒฝ์œผ๋กœ ์ธํ•œ ์ปฌ๋Ÿผ ์ถ”๊ฐ€์‹œ๋Š” ์˜ˆ์™ธ๊ฐ€ ๋œ๋‹ค.
  • GROUP BY ๋ฅผ ๋™๋ฐ˜ํ•œ ์ง‘๊ณ„ ์ฟผ๋ฆฌ์—์„œ๋Š” ์ฐจ์›(dimension)์— ํ•ด๋‹นํ•˜๋Š” ์ปฌ๋Ÿผ๋ช…์„ ๋จผ์ € ๋‚˜์—ดํ•œ ํ›„ ์ง€ํ‘œ(metric)์— ํ•ด๋‹นํ•˜๋Š” ์ปฌ๋Ÿผ๋ช…์„ ๊ธฐ์ˆ ํ•œ๋‹ค.

FROM ์ ˆ

FROM ์ ˆ์—๋Š” ํ…Œ์ด๋ธ” ๋˜๋Š” ์ด์— ์ƒ์‘ํ•˜๋Š” ์„œ๋ธŒ์ฟผ๋ฆฌ๊ฐ€ ์œ„์น˜ํ•˜๊ฒŒ ๋œ๋‹ค. ์„œ๋ธŒ์ฟผ๋ฆฌ์— ๋Œ€ํ•œ ๋Œ€ํ•œ ์Šคํƒ€์ผ์€ ๋‹ค์Œ ์žฅ์—์„œ ๋‹ค์‹œ ์–ธ๊ธ‰ํ•˜๋„๋ก ํ•œ๋‹ค.

  • FROM ์ ˆ์€ SELECT ๋ฆฌ์ŠคํŠธ์˜ ๋‹ค์Œ ๋ผ์ธ์— ์œ„์น˜ํ•˜๋ฉฐ ํ…Œ์ด๋ธ” ์ด๋ฆ„, ์„œ๋ธŒ์ฟผ๋ฆฌ ํ˜น์€ UNNEST์™€ ๊ฐ™์€ ํ…Œ์ด๋ธ” ํ•จ์ˆ˜ ์ด๋ฆ„์ด FROM ํ‚ค์›Œ๋“œ์— ์ด์–ด ์ž‘์„ฑ๋œ๋‹ค.
  • FROM ํ‚ค์›Œ๋“œ๋Š” SELECT ํ‚ค์›Œ๋“œ์— ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ(right-aligned)์ด ๋˜๋„๋ก ์œ„์น˜์‹œํ‚จ๋‹ค.

์˜๊ตฌ ํ…Œ์ด๋ธ” ์ด๋ฆ„๊ณผ ์ž„์‹œ ํ…Œ์ด๋ธ” ์ด๋ฆ„ (permanent and temporary table name)

  • FROM ํ‚ค์›Œ๋“œ ๋‹ค์Œ์— ๋“ฑ์žฅํ•˜๋Š” ํ…Œ์ด๋ธ” ์ด๋ฆ„์œผ๋กœ๋Š” ์˜๊ตฌ ํ…Œ์ด๋ธ”(permanent table)๊ณผ ์ž„์‹œ ํ…Œ์ด๋ธ”(temporary table) ์ด๋ฆ„ ๋˜๋Š” CTE(common table expression) ๊ตฌ๋ฌธ์— ์˜ํ•ด ์ •์˜๋œ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ๋‹ค.
  • ์˜๊ตฌ ํ…Œ์ด๋ธ” ์ด๋ฆ„์€ FQTN ํ˜•์‹์„ ์‚ฌ์šฉํ•˜๋ฉฐ ์ž„์‹œ ํ…Œ์ด๋ธ”์ด๋‚˜ CTE ์ด๋ฆ„์€ snake_case ํ˜•์‹์„ ์‚ฌ์šฉํ•œ๋‹ค.
  • ์ž„์‹œ ํ…Œ์ด๋ธ”์ด๋‚˜ CTE ์ด๋ฆ„์€ ํ…Œ์ด๋ธ”์˜ ์‚ฌ์šฉ๋ชฉ์ ์ด ๋ช…ํ™•ํžˆ ๋“ค์–ด๋‚˜๋„๋ก ์ž‘์„ฑํ•œ๋‹ค.
-- Permanent Table
SELECT *
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`;
-- Temporara Table (ex. CTAS ๊ตฌ๋ฌธ์œผ๋กœ ์ž„์‹œํ…Œ์ด๋ธ” ์ƒ์„ฑ)
CREATE TEMP TABLE tmp_table AS SELECT ... ;

SELECT * FROM tmp_table;
-- CTEs - named subquery
WITH cte_name AS ( -- snake_case ๊ทœ์น™์— ๋”ฐ๋ผ ๊ธฐ์ˆ 
  SELECT ... -- ์™ผ์ชฝ ๋‘์นธ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ํ•œ๋‹ค.
    FROM ...
)
SELECT * FROM cte_name;

์„œ๋ธŒ์ฟผ๋ฆฌ (subqueries)

ํ…Œ์ด๋ธ” ์ด๋ฆ„์ด๋‚˜ CTE ์ด๋ฆ„์ด ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ FROM ์ ˆ์„ ๊ตฌ์„ฑํ•  ๊ฒฝ์šฐ์— ์•„๋ž˜์™€ ๊ฐ™์ด ๋‹ค์–‘ํ•œ ์Šคํƒ€์ผ์ด ๊ฐ€๋Šฅํ•˜๋‹ค. ์ด ๊ฐ€์ด๋“œ์—์„œ๋Š” from_style_#2 ๋ฅผ ์‚ฌ์šฉํ•œ ๊ตฌ์„ฑ์„ ๊ถŒ์žฅํ•œ๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

-- ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ ์Šคํƒ€์ผ #1 (from_style_#1)
SELECT *
  FROM (SELECT station_id,
               name,
               status
          FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`)

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

-- ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ ์Šคํƒ€์ผ #2 (from_style_#2)
SELECT *
  FROM (
    SELECT station_id, -- FROM ํ‚ค์›Œ๋“œ๋กœ๋ถ€ํ„ฐ ๋‘ ์นธ ๋“ค์—ฌ์“ฐ๊ธฐ
           name,
           status,
      FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  ) -- ๋‹ซ๋Š” ๊ด„ํ˜ธ๋Š” ์Œ์ด ๋˜๋Š” ์—ฌ๋Š” ๊ด„ํ˜ธ์˜ ํ‚ค์›Œ๋“œ ์‹œ์ž‘์ ๊ณผ ๋ผ์ธ์„ ๋งž์ถ˜๋‹ค.

๋ณธ ๊ฐ€์ด๋“œ์—์„œ from_style_#2 ๋ฅผ ์„ ํ˜ธํ•˜๋Š” ์ด์œ ๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค. ์šฐ์„  ์œ„์˜ ์ฟผ๋ฆฌ๋ฅผ JOIN์„ ์ด์šฉํ•˜์—ฌ ํ™•์žฅํ•ด ๋ณด๋„๋ก ํ•˜์ž.

/* JOIN์„ ์ด์šฉํ•˜์—ฌ ์ฟผ๋ฆฌ๋ฅผ ํ™•์žฅํ•œ ์˜ˆ์‹œ */
-- ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ ์Šคํƒ€์ผ #1 (from_style_#1)
SELECT *
  FROM (SELECT station_id,
               name,
               status
          FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`) AS a
  LEFT JOIN (SELECT station_id,
                    name,
                    status
               FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`) AS b
    ON a.station_id = b.station_id
;
-- ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ ์Šคํƒ€์ผ #2 (from_style_#2)
SELECT *
  FROM (
    SELECT station_id,
           name,
           status,
      FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  ) AS a
  LEFT JOIN (
    SELECT station_id,
           name,
           status,
      FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
  ) AS b
    ON a.station_id = b.station_id
;

from_style_#1 ์€ ๋™์ผํ•œ ์ˆ˜์ค€(level or depth)์— ์œ„์น˜ํ•œ ๋‘ ์„œ๋ธŒ์ฟผ๋ฆฌ a, b์˜ ๋“ค์—ฌ์“ฐ๊ธฐ๊ฐ€ ์„œ๋กœ ๋‹ฌ๋ผ์ ธ ์žˆ๋‹ค. ๋ฐ˜๋ฉด์— from_style_#2 ์€ ๋‘ ์„œ๋ธŒ์ฟผ๋ฆฌ๊ฐ€ ๋™์ผํ•œ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ์œ ์ง€ํ•˜๋ฉด์„œ ๋™์‹œ์— ํ…Œ์ด๋ธ” ๋ณ„์นญ(alias)์„ ๋ผ์ธ์˜ ๋์ด ์•„๋‹Œ ์•ž์ชฝ์— ์œ„์น˜ํ•จ์œผ๋กœ์จ ON์ ˆ์—์„œ ์‹œ์„ ์„ ํฌ๊ฒŒ ์ด๋™์‹œํ‚ค์ง€ ์•Š๊ณ ์„œ๋„ ํ™•์ธ ๊ฐ€๋Šฅํ•˜๋‹ค.

WHERE ์ ˆ

WHERE ์ ˆ์—๋Š” FROM ์ ˆ์˜ ๊ฒฐ๊ณผ๋ฅผ ํ•„ํ„ฐ๋งํ•˜๊ธฐ ์œ„ํ•œ ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•œ๋‹ค. WHERE ์ ˆ์€ ์•„๋ž˜์˜ ๊ฐ€์ด๋“œ์— ๋”ฐ๋ผ ์ž‘์„ฑํ•œ๋‹ค.

  • ๋น„๊ต ์—ฐ์‚ฐ์ž์˜ ์•ž ๋’ค๋กœ ๊ณต๋ฐฑ์„ ๋‘๋„๋ก ํ•œ๋‹ค.
  • ๋น„๋™๋“ฑ์—ฐ์‚ฐ์ž๋Š” !=์™€ <> ๋ชจ๋‘ ์‚ฌ์šฉ๊ฐ€๋Šฅํ•˜๋‹ค. SQL-1999์—์„œ๋Š” <>๋ฅผ ํ‘œ์ค€์œผ๋กœ ์ •์˜ํ•˜๊ณ  ์žˆ๋‹ค.
  • ์กฐ๊ฑด์ ˆ์€ AND์™€ OR ๋…ผ๋ฆฌ ์—ฐ์‚ฐ์ž์— ์˜ํ•ด ๋ณต์ˆ˜๊ฐœ๊ฐ€ ์—ฐ๊ฒฐ(chaining)๋  ์ˆ˜ ์žˆ๋‹ค.
  • ๋ณต์ˆ˜๊ฐœ์˜ ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•˜๋Š” ๊ฒฝ์šฐ๋Š” ํ•˜๋‚˜์˜ ๋ผ์ธ์— ํ•˜๋‚˜์˜ ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•˜๋Š” ๊ฒƒ์„ ๊ธฐ๋ณธ์œผ๋กœ ํ•˜๋˜ ์—ฐ๊ด€๋œ ์กฐ๊ฑด์˜ ๊ฒฝ์šฐ์—๋Š” 80์ž๋ฅผ ๋„˜์ง€ ์•Š๋Š” ์„ ์—์„œ ํ•˜๋‚˜์˜ ๋ผ์ธ์— ๊ฐ™์ด ์ž‘์„ฑ ๊ฐ€๋Šฅํ•˜๋‹ค.
  • AND์™€ OR ๋…ผ๋ฆฌ ์—ฐ์‚ฐ์ž๋Š” WHERE ํ‚ค์›Œ๋“œ์— ์˜ค๋ฅธ์ชฝ ์ •๋ ฌ์ด ๋˜๋„๋ก ํ•œ๋‹ค. ๋‹จ, ํ˜ผ์šฉ๋˜์–ด ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ๋Š” ๋…ผ๋ฆฌ ์—ฐ์‚ฐ์ž์˜ ํ˜ผ์šฉ (mixed logical operators) ๊ฐ€์ด๋“œ๋ฅผ ๋”ฐ๋ฅธ๋‹ค.
  • BETWEEN์ด๋‚˜ IN ์—ฐ์‚ฐ์ž๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ๋ณต์ˆ˜๊ฐœ์˜ ์กฐ๊ฑด์„ ๋ฌถ์„ ์ˆ˜ ์žˆ๋Š” ๊ฒฝ์šฐ๋Š” ์—ฐ์‚ฐ์ž๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์กฐ๊ฑด์„ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ํ‘œํ˜„ํ•œ๋‹ค.
SELECT station_id,
       name,
       status,
       latitude, longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE status = 'active'
   AND latitude > 135.0   -- AND ๋กœ ์กฐ๊ฑด ์—ฐ๊ฒฐ
;

๋…ผ๋ฆฌ ์—ฐ์‚ฐ์ž์˜ ํ˜ผ์šฉ (mixed logical operators)

ํ…Œ์ด๋ธ” ์กฐํšŒ์‹œ AND์™€ OR๋ฅผ ํ˜ผ์šฉํ•˜์—ฌ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ์•„๋ž˜์˜ ๊ฐ€์ด๋“œ์— ๋”ฐ๋ผ ์ฝ”๋“œ๋ฅผ ๊ตฌ์กฐํ™”ํ•œ๋‹ค.

  1. ์ฃผ์–ด์ง„ ํ…Œ์ด๋ธ”์—์„œ AND ์กฐ๊ฑด์œผ๋กœ ๊ฒฐ๊ณผ์…‹(result set)์„ ์ค„์—ฌ๋‚˜๊ฐ€๋Š”์ง€ ๋ฐ˜๋Œ€๋กœ OR ์กฐ๊ฑด์œผ๋กœ ๋Œ€์ƒ์„ ๋Š˜๋ ค๋‚˜๊ฐ€๋Š”์ง€๋ฅผ ๋ณด๊ณ  ์ตœ์ƒ์œ„ ์กฐ๊ฑด์ ˆ์˜ ๊ตฌ์กฐ๋ฅผ ๋จผ์ € ํŒŒ์•…ํ•œ๋‹ค.
  2. ๊ฐ ์ƒ์œ„ ์กฐ๊ฑด๋“ค์ด ๋‹ค์‹œ ์„ธ๋ถ€์ ์ธ ์กฐ๊ฑด์œผ๋กœ ๋‚˜๋‰˜๋Š” ๊ฒฝ์šฐ์— ๊ด„ํ˜ธ๋ฅผ ์ด์šฉํ•˜์—ฌ ํ•ด๋‹น ์„ธ๋ถ€์กฐ๊ฑด์„ ๋ฌถ๋Š”๋‹ค. ๊ด„ํ˜ธ๋ฅผ ํ†ตํ•ด์„œ ์—ฐ์‚ฐ์ž ์šฐ์„ ์ˆœ์œ„์— ๋”ฐ๋ฅธ ๋ถ€์ž‘์šฉ(side effect)์„ ์—†์•ค๋‹ค.
  3. ์„ธ๋ถ€ ์กฐ๊ฑด์— ๋“ฑ์žฅํ•˜๋Š” AND ๋˜๋Š” OR ์—ฐ์‚ฐ์ž๋Š” ๋ผ์ธ์˜ ๋์— ์œ„์น˜์‹œํ‚ค๋„๋ก ํ•œ๋‹ค.
  4. 3 ๋‹จ๊ณ„ ์ด์ƒ์˜ ๊นŠ์ด์—์„œ ๋…ผ๋ฆฌ์—ฐ์‚ฐ์ด ์ด๋ฃจ์–ด์ง€๋Š” ๊ฒฝ์šฐ ๋ถ„๋ฐฐ ๋ฒ•์น™์„ ์ด์šฉํ•˜์—ฌ ๊นŠ์ด๋ฅผ ๋‚ฎ์ถœ ์ˆ˜ ์žˆ๋‹ค.
-- ์ตœ์ƒ์œ„ ์กฐ๊ฑด๋“ค์„ AND๋กœ ๋ฌถ๊ณ  ์„ธ๋ถ€์กฐ๊ฑด์ด OR๋กœ ๋ฌถ์ด๋Š” ๊ฒฝ์šฐ
SELECT station_id,
       name,
       status,
       latitude, longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE status = 'active'
   AND (latitude > 135.0 OR
        longitude > 80.0)
   AND ...
;
-- ๋˜๋Š”, ์ตœ์ƒ์œ„ ์กฐ๊ฑด๋“ค์„ OR๋กœ ๋ฌถ๊ณ  ์„ธ๋ถ€์กฐ๊ฑด์„ AND๋กœ ๋ฌถ๋Š” ๊ฒฝ์šฐ
SELECT station_id,
       name,
       status,
       latitude, longitude,
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE status = 'active'
    OR (latitude > 135.0 AND longitude > 80.0) -- ์˜๋ฏธ๋‹จ์œ„๋กœ ๋ฌถ์–ด์„œ ํ•œ ์ค„์— ํ‘œํ˜„
    OR ...
SELECT *
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE status = 'active'
    OR (latitude > 135.0 AND (longitude = 80.0 OR longitude = 90.0))
    OR ...
-- ์œ„์—์„œ์™€ ๊ฐ™์ด 3 ๋‹จ๊ณ„ ์ด์ƒ์˜ ๊นŠ์ด์—์„œ ๋…ผ๋ฆฌ์—ฐ์‚ฐ ์ด๋ฃจ์–ด์ง€๋Š” ๊ฒฝ์šฐ ๋ถ„๋ฐฐ๋ฒ•์น™์„ ์ด์šฉํ•˜์—ฌ ์•„๋ž˜์™€ ๊ฐ™์ด ๊ฒฐ๊ณผ์ ์œผ๋กœ ๋™์น˜์ธ ๋ฌธ์žฅ์œผ๋กœ ํ’€์–ด์“ธ ์ˆ˜ ์žˆ๋‹ค. a ยท (b + c) = (a ยท b) + (a ยท c)
 WHERE status = 'active'
    OR (latitude > 135.0 AND longitude = 80.0)
    OR (latitude > 135.0 AND longitude = 90.0)
    OR ...

GROUP BY ์ ˆ

  • ์ง‘๊ณ„ํ•จ์ˆ˜์— ์˜ํ•ด์„œ ๋งŒ๋“ค์–ด์ง€๋Š” ์ง€ํ‘œ(metric) ์ปฌ๋Ÿผ์€ ๋ณ„์นญ(alias)๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ต๋ช…(anonymous)์ด ์•„๋‹Œ ์ด๋ฆ„์„ ๋ถ€์—ฌํ•œ๋‹ค.
  • GROUP BY ๋ฆฌ์ŠคํŠธ์—๋Š” ์ปฌ๋Ÿผ์˜ ๋ณ„์นญ์ด๋‚˜ ์ปฌ๋Ÿผ ์ˆœ์„œ(ordinal position)๋ฅผ ์‚ฌ์šฉํ•  ์ˆ˜ ์žˆ์œผ๋ฏ€๋กœ ์ปฌ๋Ÿผ ์ˆœ์„œ๋ฅผ ์šฐ์„ ํ•ด์„œ ์‚ฌ์šฉํ•˜๋„๋ก ํ•œ๋‹ค.
  • ํŠนํžˆ, ๊ธฐ๋ณธ์ปฌ๋Ÿผ์—์„œ CASE๋ฌธ ๋“ฑ์œผ๋กœ ๊ณ„์‚ฐ๋œ ์ปฌ๋Ÿผ(calculated column)์ด ์ฐจ์›์œผ๋กœ ์‚ฌ์šฉ๋˜๋Š” ๊ฒฝ์šฐ์—๋Š” ๋ฐ˜๋“œ์‹œ ๋ณ„์นญ์ด๋‚˜ ์ปฌ๋Ÿผ ์ˆœ์„œ๋ฅผ ์‚ฌ์šฉํ•˜์—ฌ ์ฝ”๋“œ๊ฐ€ ๋ฐ˜๋ณต๋˜์–ด ์‚ฌ์šฉ๋˜์ง€ ์•Š๋„๋ก ํ•œ๋‹ค.
  • HAVING ์€ ์ง‘๊ณ„๊ฒฐ๊ณผ๋ฅผ ํ•„ํ„ฐ๋งํ•˜๊ธฐ ์œ„ํ•œ ์ง‘๊ณ„๊ฐ’์˜ ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•˜๋Š” ๊ตฌ๋ฌธ์œผ๋กœ ORDER BY ์—์„œ์™€ ๊ฐ™์ด ์ปฌ๋Ÿผ ๋ณ„์นญ ์‚ฌ์šฉ์ด ๊ฐ€๋Šฅํ•˜๋‹ค.
SELECT station_id,
       name AS station_name,
       status,
       CASE
         WHEN ST_DISTANCE (
                ST_GEOGPOINT(longitude, latitude),
                ST_GEOGPOINT(-0.118092, 51.509865)
              ) < 7000000 THEN 'downtown'
         ELSE 'outskirt'
       END AS downtown_or_outskirt,
       COUNT(1) AS cnt,    -- ์ง‘๊ณ„ํ•จ์ˆ˜์— ์˜ํ•ด ๊ณ„์‚ฐ๋œ metric ์ปฌ๋Ÿผ์— ๋ณ„์นญ ๋ถ€์—ฌ
  FROM `bigquery-public-data.austin_bikeshare.bikeshare_stations`
 WHERE s.status IN ('active')
   AND s.latitude > 135.0
 GROUP BY 1, 2, 3  -- ordinal position
HAVING cnt > 1 -- column alias
;

์•„๋ž˜ ์˜ˆ์‹œ์ฒ˜๋Ÿผ SELECT ๋ฆฌ์ŠคํŠธ์— ๊ณ„์‚ฐ๋œ ์ปฌ๋Ÿผ์„ ์ •์˜ํ•˜๊ณ  ํ•ด๋‹น ์ปฌ๋Ÿผ์„ ์ฐจ์›์œผ๋กœ GROUP BY ์ง‘๊ณ„๋ฅผ ํ•˜๋Š” ๊ฒฝ์šฐ GROUP BY ๋ฆฌ์ŠคํŠธ์—์„œ ์ฝ”๋“œ๊ฐ€ ๋ฐ˜๋ณต๋˜์–ด ๋ฌธ์žฅ์ด ๊ธธ์–ด์ง€๊ณ  ํ–ฅํ›„ ์œ ์ง€๋ณด์ˆ˜๊ฐ€ ์–ด๋ ค์›Œ์ง€๊ฒŒ ๋œ๋‹ค.

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT CASE
         WHEN is_male = TRUE THEN 'Male'
         WHEN is_male = FALSE THEN 'Female'
       END AS gender,
       COUNT(1) AS cnt,
  FROM `bigquery-public-data.samples.natality`
 GROUP BY 1
HAVING cnt > 1000
;

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT CASE
         WHEN is_male = TRUE THEN 'Male'
         WHEN is_male = FALSE THEN 'Female'
       END,
       COUNT(1) AS cnt,
  FROM `bigquery-public-data.samples.natality`
 GROUP BY
       CASE
         WHEN is_male = TRUE THEN 'Male'
         WHEN is_male = FALSE THEN 'Female'
       END
;

๋˜ํ•œ, ๊ณ„์‚ฐ๋œ ์ปฌ๋Ÿผ์„ ์ง€ํ‘œ๋กœ ์‚ฌ์šฉํ•˜์—ฌ ์ง‘๊ณ„๋ฅผ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์— ์ปฌ๋Ÿผ ์ด๋ฆ„๊ณผ ๋ณ„์นญ์„ ํ˜ผ๋™ํ•˜์—ฌ ์‚ฌ์šฉํ•  ๊ฒฝ์šฐ ์•„๋ž˜์™€ ๊ฐ™์ด ์›์น˜ ์•Š๋Š” ๊ฒฐ๊ณผ๊ฐ€ ๋งŒ๋“ค์–ด์งˆ ์ˆ˜ ์žˆ์œผ๋‹ˆ ์ฃผ์˜๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

DECLARE data ARRAY<STRUCT<region STRING,  amount INT64>> DEFAULT [
    ('', 1), ('NULL', 2), ('KOR', 2), ('USA', 3), ('KOR', 4), ('CHN', 1)
];

SELECT IF(d.region IN ('', 'NULL'), 'UNKNOWN', d.region) AS rgn,
       SUM(d.amount) AS totals,
  FROM UNNEST(data) d GROUP BY region; -- ๋ณ„์นญ์ด ์•„๋‹Œ ์ปฌ๋Ÿผ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ทธ๋ฃนํ•‘
ํ–‰ rgn totals
1 UNKNOWN 1
2 UNKNOWN 2
3 KOR 6
4 USA 3
5 CHN 1

๊ณตํ†ต ํ…Œ์ด๋ธ” ํ‘œํ˜„์‹ Common Table Expressions, CTEs

Common Table Expression ์€ ์„œ๋ธŒ ์ฟผ๋ฆฌ์— ์ด๋ฆ„์„ ๋ถ€์—ฌํ•˜์—ฌ ์ฐธ์กฐํ•  ์ˆ˜ ์žˆ๋„๋ก ํ•˜๋Š” ํ‘œํ˜„๋ฐฉ์‹์ด๋‹ค. ์„œ๋ธŒ์ฟผ๋ฆฌ์— ์ด๋ฆ„์„ ๋ถ€์—ฌํ•œ Named Subquery ์ธ ์…ˆ์ด๋‹ค.

์„œ๋ธŒ์ฟผ๋ฆฌ๋Š” ์ •์˜๋œ ํ•ด๋‹น ์œ„์น˜์—์„œ๋งŒ ์ฐธ์กฐ ๊ฐ€๋Šฅํ•˜์ง€๋งŒ, ๊ณตํ†ต ํ…Œ์ด๋ธ” ํ‘œํ˜„์‹์€ ๋ฌธ์žฅ ๋‚ด ์—ฌ๋Ÿฌ ๊ณณ์—์„œ ๋ฐ˜๋ณต์ ์œผ๋กœ ์ฐธ์กฐ๋  ์ˆ˜ ์žˆ๋‹ค.

CTE๋Š” ๋ณต์žกํ•œ ์ฟผ๋ฆฌ ๊ตฌ์กฐ๋ฅผ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ์žฌ๊ตฌ์„ฑํ•˜๋Š”๋ฐ ํ•ต์‹ฌ์ ์ธ ์—ญํ• ์„ ํ•œ๋‹ค.

CTE ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

์ฝ”๋“œ ๋‚ด์— 2๋‹จ๊ณ„๋ฅผ ๋„˜์–ด ์ค‘์ฒฉ๋˜๋Š” ์„œ๋ธŒ์ฟผ๋ฆฌ๊ฐ€ ์กด์žฌํ•˜๋Š” ๊ฒฝ์šฐ์—๋Š” ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ CTE ๊ตฌ๋ฌธ์œผ๋กœ ๋Œ€์ฒดํ•˜์—ฌ ์ „์ฒด ์ฟผ๋ฆฌ ๊ตฌ์กฐ๊ฐ€ ์ ์ ˆํ•œ ๊นŠ์ด(2 depth)๋ฅผ ์œ ์ง€ ํ•˜๋„๋ก ํ•œ๋‹ค.

CTE๋Š” ํ…Œ์ด๋ธ” ์„œ๋ธŒ์ฟผ๋ฆฌ(table subquery)์™ธ์—๋„ ์Šค์นผ๋ผ ์„œ๋ธŒ์ฟผ๋ฆฌ(scalar subquery)๋ฅผ ๋Œ€์ฒดํ•˜์—ฌ ์ฝ”๋“œ๋ฅผ ๊ฐ„๊ฒฐํ•˜๊ฒŒ ๋งŒ๋“œ๋Š”๋ฐ๋„ ์œ ์šฉํ•˜๋‹ค.

์•„๋ž˜์˜ ์˜ˆ์‹œ์—์„œ๋Š” 2๋‹จ๊ณ„ ์ด์ƒ ์ค‘์ฒฉ๋œ ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ CTE ๊ตฌ๋ฌธ์„ ์‚ฌ์šฉํ•˜์—ฌ ์ฝ”๋“œ์˜ ๊นŠ์ด๊ฐ€ ํ•œ ๋‹จ๊ณ„๊ฐ€ ๋˜๋„๋ก ํ•˜๊ณ  ์žˆ๋‹ค. ์ดํ›„ ๋‹จ๊ณ„๋ณ„ ์ค‘๊ฐ„ ํ…Œ์ด๋ธ” ์—ญ์‹œ CTE๋กœ ์ •์˜ํ•˜์—ฌ ์‚ฌ์šฉํ•˜๋ฉด์„œ ์ตœ์ข…์ ์œผ๋กœ๋Š” ๋™์ผํ•œ ๊ฒฐ๊ณผ๋ฅผ ๋งŒ๋“ค์–ด๋‚ด๊ฒŒ ๋œ๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ avoid

SELECT *
  FROM (
    SELECT *
      FROM (
        SELECT *
          FROM base_table
      ) AS first_subquery
  ) AS second_subquery

๐Ÿ˜„ ๊ถŒ์žฅ recommend

WITH first_subquery (  -- ๊ฐ€์žฅ ์•ˆ์ชฝ์˜ ์„œ๋ธŒ์ฟผ๋ฆฌ
SELECT *
  FROM base_table
),
second_subquery ( -- ์ƒ์œ„ ์„œ๋ธŒ์ฟผ๋ฆฌ
  SELECT *
    FROM first_subquery
)
SELECT *  -- ๋ฉ”์ธ ์ฟผ๋ฆฌ
  FROM second_subquery
;

FROM ์ ˆ์— ์ง์ ‘ ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ์•„๋ž˜์ชฝ์— ์œ„์น˜ํ•œ ์„œ๋ธŒ์ฟผ๋ฆฌ๋ฅผ ๋จผ์ € ์ฝ๊ณ  ๋‹ค์‹œ ์œ„์ชฝ์˜ ์ฟผ๋ฆฌ๋ฅผ ์ฝ์–ด์•ผ ํ•˜๋Š” ๋ฐ˜๋ฉด, CTE๋ฅผ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ๋Š” ์œ„์—์„œ๋ถ€ํ„ฐ ์ž์—ฐ์Šค๋Ÿฝ๊ฒŒ ์•„๋ž˜๋กœ ์ฝ”๋“œ ์ฝ๊ธฐ๊ฐ€ ๊ฐ€๋Šฅํ•œ ์žฅ์ ์ด ์žˆ๋‹ค.

์•„๋ž˜๋Š” CTEs๋ฅผ ์ด์šฉํ•˜์—ฌ ๊ณ„์‚ฐ๋œ ์ปฌ๋Ÿผ์ด ๋ฐ˜๋ณต๋˜์ง€ ์•Š๋„๋ก ๊ตฌ์กฐ๋ฅผ ๋ณ€๊ฒฝํ•œ ์˜ˆ์‹œ์ด๋‹ค. ์ค‘๋ณต์€ ์ฝ”๋“œ ๊ธธ์ด๋ฅผ ์ฆ๊ฐ€์‹œํ‚ค๊ณ  ์œ ์ง€๋ณด์ˆ˜๋ฅผ ์–ด๋ ต๊ฒŒ ํ•˜๊ธฐ ๋•Œ๋ฌธ์— ์„œ๋ธŒ์ฟผ๋ฆฌ๋‚˜ CTE๋ฅผ ์ด์šฉํ•˜์—ฌ ์ฝ”๋“œ๊ฐ€ ๋ฐ˜๋ณต๋˜์ง€ ์•Š๋„๋ก ๋งŒ๋“œ๋Š” ๊ฒƒ์ด ์ข‹๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT name AS station_name,
       ST_DISTINCE (
         ST_GEOPOINT(longitude, latitude), ST_GEOPOINT(-0.118092, 51.509865)
       ) AS distance_from_city_centre_m
  FROM `bigquery-public-data.london_bicycles.cycle_stations`
 WHERE ST_DISTINCE (
         ST_GEOPOINT(longitude, latitude),
         ST_GEOPOINT(-0.118092, 51.509865)
       ) <= 500
 ORDER BY distance_from_city_centre_m

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

WITH stations (
  SELECT name AS station_name,
         ST_DISTINCE (
           ST_GEOPOINT(longitude, latitude), ST_GEOPOINT(-0.118092, 51.509865)
         ) AS distance_from_city_centre_m
    FROM `bigquery-public-data.london_bicycles.cycle_stations`
)
SELECT *
  FROM stations
 WHERE distance_from_city_centre_m <= 500
 ORDER BY distance_from_city_centre_m

JOIN ์ ˆ

JOIN ๊ธฐ๋ณธ ๊ฐ€์ด๋“œ

  • INNER์™€ OUTER ํ‚ค์›Œ๋“œ๋Š” ์ƒ๋žต๋˜๋”๋ผ๋„ ์˜๋ฏธ์ ์œผ๋กœ ํ˜ผ๋™์„ ์ฃผ์ง€ ์•Š๊ธฐ ๋•Œ๋ฌธ์— ์ƒ๋žตํ•œ๋‹ค. ๋‚ด๋ถ€์กฐ์ธ์€ JOIN, ์™ธ๋ถ€์กฐ์ธ์€ LEFT JOIN, OUTER JOIN์„ ์‚ฌ์šฉํ•œ๋‹ค.
  • Cartesian Product๋Š” CROSS JOIN ์‚ฌ์šฉํ•œ ์กฐ์ธ ํ˜•์‹์„ ์ฝค๋งˆ(,)๋ฅผ ์‚ฌ์šฉํ•œ ์˜ค๋ผํด ์กฐ์ธ ํ˜•์‹ ๋ณด๋‹ค ์šฐ์„ ํ•œ๋‹ค. ๋‹ค๋งŒ, ์•„๋ž˜์—์„œ ์–ธ๊ธ‰ํ•  ์ƒ๊ด€ ํฌ๋กœ์Šค ์กฐ์ธ (correlated cross-join)์˜ ๊ฒฝ์šฐ๋Š” ์ฝค๋งˆ ์กฐ์ธ ํ˜•์‹์„ ์šฐ์„ ํ•œ๋‹ค.
  • ON ์ ˆ์€ JOIN์ ˆ ๋ฐ”๋กœ ์•„๋ž˜์— ์ž‘์„ฑํ•˜๋ฉฐ ๊ฒฐํ•ฉ์กฐ๊ฑด์ด ์—ฌ๋Ÿฌ ๊ฐœ ๋“ฑ์žฅํ•˜๋Š” ๊ฒฝ์šฐ๋Š” WHERE ์ ˆ์„ ์ฐธ๊ณ ํ•˜์—ฌ ๋ณต์ˆ˜๊ฐœ์˜ ๊ฒฐํ•ฉ ์กฐ๊ฑด์„ ๊ธฐ์ˆ ํ•œ๋‹ค.
  • ํ…Œ์ด๋ธ” ๋ณ„์นญ์„ ์‚ฌ์šฉํ•˜์—ฌ ๊ธด ํ…Œ์ด๋ธ” ์ด๋ฆ„์ด ๋ฐ˜๋ณต๋˜์ง€ ์•Š๋„๋ก ํ•œ๋‹ค.

๐Ÿ˜ž ๊ธฐํ”ผ (avoid)

SELECT m.*,
       d.*,
  FROM masters
  LEFT OUTER JOIN details
    ON masters.mid = details.did

๐Ÿ˜„ ๊ถŒ์žฅ (recommend)

SELECT m.*,
       d.*,
  FROM masters AS m
  LEFT JOIN details AS d
    ON m.mid = d.did
   AND m.status = d.sts

์ƒ๊ด€ ํฌ๋กœ์Šค ์กฐ์ธ (correlated cross-join)

BigQuery์˜ ๋ฐฐ์—ด์„ ํ‰ํƒ„ํ™”(flattening) ํ•œ ํ›„์˜ CROSS JOIN ์„ ์ƒ๊ด€ ํฌ๋กœ์Šค ์กฐ์ธ์ด๋ผํ•˜๊ณ  ์—ฌ๊ธฐ์„œ์˜ ์ƒ๊ด€(correleated)์€ ํฌ๋กœ์Šค ์กฐ์ธ์˜ ๋ฒ”์œ„๊ฐ€ ํ…Œ์ด๋ธ” ์ „์ฒด ํ–‰(row)์ด ์•„๋‹Œ ๋ฐฐ์—ด์„ ๋ณด์œ ํ•˜๊ณ  ์žˆ๋Š” ํ–‰(row)์— ์—ฎ์—ฌ ๊ตญํ•œ๋œ๋‹ค๋Š” ์˜๋ฏธ์ด๋‹ค.

์•„๋ž˜ ์˜ˆ์‹œ์˜ ๊ฒฝ์šฐ data ํ…Œ์ด๋ธ”๊ณผ y ๋ฐฐ์—ด์ด ์˜ค๋ผํด ํ˜•์‹์˜ ํฌ๋กœ์Šค ์กฐ์ธ์„ ์ˆ˜ํ–‰ํ•˜๊ณ  ์žˆ์œผ๋‚˜, ์ผ๋ฐ˜์ ์ธ ํฌ๋กœ์Šค ์กฐ์ธ ์ฒ˜๋Ÿผ y ๋ฐฐ์—ด์˜ ์š”์†Œ๋“ค์ด data ํ…Œ์ด๋ธ”์˜ ๋ชจ๋“  ํ–‰(row) ๊ณผ ๊ฒฐํ•ฉ๋˜์–ด m X n์˜ ์กฐํ•ฉ์„ ๋งŒ๋“ค์ง€ ์•Š๊ณ  y๋ฐฐ์—ด์„ ํฌํ•จํ•œ ํ•˜๋‚˜์˜ ํ–‰๊ณผ 1 X n(y๋ฐฐ์—ดํฌ๊ธฐ) ์˜ ํ•œ์ •๋œ ์กฐ์ธ ๊ฒฐ๊ณผ๋ฅผ ๋งŒ๋“ค์–ด ๋‚ธ๋‹ค.

-- Correlated Cross-Join
WITH data AS (
  SELECT 1 AS w, 'a' AS x, ['P', 'Q'] AS y
   UNION ALL
  SELECT 2 AS w, 'b' AS x, ['R', 'S', 'T'] AS y
)
-- Comma Join --> data CROSS JOIN data.y ๋™์ผ
SELECT w, x, y_ FROM data, data.y y_
;
-- ์œ„์—์„œ ๋ฉ”์ธ `SELECT` ๋ฌธ์žฅ์€ ์•„๋ž˜ ์ฝ”๋“œ๊ฐ€ ์ถ•์•ฝ๋œ ํ˜•ํƒœ์ด๋‹ค.
-- UNNEST()๋ฅผ ์ƒ๋žตํ•˜๊ณ  y ์ปฌ๋Ÿผ์ด๋ฆ„์„ data.y๋กœ ํ•œ์ •์‹œํ‚ด.
SELECT w, x, y_ FROM data, UNNEST(y) y_

์•„๋ž˜๋Š” UNNEST๋ฅผ ์—ฐ์‡„์ ์œผ๋กœ ์ƒ๊ด€ ์กฐ์ธ์„ ์ˆ˜ํ–‰ํ•˜๋Š” ๊ฒฝ์šฐ์˜ ์Šคํƒ€์ผ ์˜ˆ์‹œ์ด๋‹ค. ** FROM(์„œ๋ธŒ์ฟผ๋ฆฌ)** ์ฒ˜๋Ÿผ ๋ฐฐ์—ด์„ ์„œ๋ธŒ์ฟผ๋ฆฌ๋กœ ๊ฐ„์ฃผํ•˜๊ณ  ๋น„์Šทํ•˜๊ฒŒ ์Šคํƒ€์ผ๋งํ•˜๋Š” ๋ณ€ํ˜•์ด ๊ฐ€๋Šฅํ•˜๋‹ค.

DECLARE rectangular_table ARRAY<STRUCT<A STRING, B STRING, C STRING>> DEFAULT [
  ('1', '1', '1'),  ('2', 'other', '2'), ('TRUE', '3', '3'), ('4', '4', '4')
];

-- Style #1 - preferred
SELECT SUM(SAFE_CAST(v AS FLOAT64)) AS `sum`
  FROM UNNEST(rectangular_table) t,
       UNNEST(REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':"?([-0-9.]*)"?[,}]')) v
;
-- Style #2 -- UNNEST๋ฅผ FROM๊ณผ ์œ ์‚ฌํ•˜๊ฒŒ ๊ฐ„์ฃผ
SELECT SUM(SAFE_CAST(v AS FLOAT64)) AS `sum`
  FROM UNNEST(rectangular_table) t,
UNNEST (REGEXP_EXTRACT_ALL(TO_JSON_STRING(t), r':"?([-0-9.]*)"?[,}]')) v

Window Function

์œˆ๋„์šฐ ํ•จ์ˆ˜ ๊ธฐ๋ณธ ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ

๋ถ„์„ ํ•จ์ˆ˜ ๋˜๋Š” ์œˆ๋„์šฐ ํ•จ์ˆ˜๋ฅผ ์‚ฌ์šฉํ•œ ์ปฌ๋Ÿผ์˜ ๊ตฌ์กฐ๋Š” ๋Œ€๋žต ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

  • OVER ํ‚ค์›Œ๋“œ๋ฅผ ์ค‘์‹ฌ์œผ๋กœ ์™ผ์ชฝ์— ๋ถ„์„ ํ•จ์ˆ˜๊ฐ€ ๋†“์ด๊ณ  ์˜ค๋ฅธ์ชฝ์—๋Š” ๋ถ„์„ ํ•จ์ˆ˜๋ฅผ ์ ์šฉํ•  ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„ (window frame)์„ ์ •์˜ํ•˜๋Š” ๊ตฌ๋ฌธ์ด ๋‚˜์˜จ๋‹ค.
  • ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„(window frame) ์ •์˜ ๊ตฌ๋ฌธ์€ ๋‹ค์‹œ ์•„๋ž˜์˜ 3๊ฐœ์˜ ๊ตฌ๋ฌธ์œผ๋กœ ์„ธ๋ถ„ํ™” ๋œ๋‹ค. 3๊ฐ€์ง€ ๊ตฌ๋ฌธ์€ ๋ชจ๋‘ ์„ ํƒ์ ์œผ๋กœ(optional)๋กœ ์‚ฌ์šฉ๋œ๋‹ค.
    • PARTITION BY๋กœ ํŒŒํ‹ฐ์…˜(partition)์„ ์ •์˜
    • ORDER BY๋กœ ๊ฐ ํŒŒํ‹ฐ์…˜์„ ์–ด๋–ค ๊ธฐ์ค€์œผ๋กœ ์ •๋ ฌํ• ์ง€ ์ •์˜
    • ROWS ๋‚˜ RANGE ๋กœ ๊ฐ ํŒŒํ‹ฐ์…˜์•ˆ์—์„œ ๋ฌผ๋ฆฌ์  ๋˜๋Š” ๋…ผ๋ฆฌ์ ์ธ ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„(window frame)์„ ์–ด๋–ป๊ฒŒ ์ ์šฉํ• ์ง€ ์ •์˜

์ด๋Ÿฌํ•œ ๊ตฌ์กฐ์— ๋Œ€ํ•œ ์ฝ”๋”ฉ ์Šคํƒ€์ผ์€ ๋‹ค์Œ ์ง€์นจ์— ๋”ฐ๋ผ ์ž‘์„ฑํ† ๋ก ํ•œ๋‹ค.

  • ๋ถ„์„ ํ•จ์ˆ˜์™€ OVER ํ‚ค์›Œ๋“œ๋Š” ์ปฌ๋Ÿผ์˜ ์ฒซ๋ฒˆ์งธ ๋ผ์ธ์— ์œ„์น˜์‹œํ‚จ๋‹ค.
  • OVER () ๊ตฌ๋ฌธ๋‚ด์— ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„์„ ์ •์˜ํ•˜๋Š” ๊ตฌ๋ฌธ์ด ๊ธธ์–ด์ง€๋Š” ๊ฒฝ์šฐ ์ค„๋ฐ”๊ฟˆ์„ ํ•œ ํ›„ ๊ฐ ๊ตฌ๋ฌธ์„ ํ•˜๋‚˜์˜ ๋ผ์ธ์— ๊ธฐ์ˆ ํ•œ๋‹ค.
  • ์ค„๋ฐ”๊ฟˆ๋œ ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„ ์ •์˜ ๊ด€๋ จ ํ‚ค์›Œ๋“œ๋“ค์€ ์œˆ๋„์šฐ ํ•จ์ˆ˜ ์ด๋ฆ„์„ ๊ธฐ์ค€์œผ๋กœ ๋‘ ์นธ ๋“ค์—ฌ์“ฐ๊ธฐ๋ฅผ ํ•œ ํ›„ ์™ผ์ชฝ ์ •๋ ฌ์‹œํ‚จ๋‹ค.
  • select-list ์ ˆ์— ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„ ์ •์˜ ๊ตฌ๋ฌธ ์ž‘์„ฑ์‹œ ๋ผ์ธ์ด ๊ธธ์–ด์ง€๋ฏ€๋กœ WINDOW ์ ˆ์— ์ •์˜ํ•˜๊ณ  ์ •์˜๋œ ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„ ์ด๋ฆ„์„ ์‚ฌ์šฉํ•œ๋‹ค.

์œ„์˜ ์Šคํƒ€์ผ์„ ์ ์šฉํ•˜์—ฌ ์ž‘์„ฑํ•œ ์ฝ”๋“œ ์˜ˆ์‹œ๋Š” ์•„๋ž˜์™€ ๊ฐ™๋‹ค.

  • ์ฒซ๋ฒˆ์งธ ์˜ˆ์‹œ
SELECT item, purchases, category,
       SUM(purchases) OVER (
         PARTITION BY category  -- ๋‘ ์นธ ๋“ค์—ฌ์“ฐ๊ธฐ ๋ฐ
         ORDER BY purchases     -- PARTITION, ORDER ํ‚ค์›Œ๋“œ ์™ผ์ชฝ ์ •๋ ฌ
       ) AS total_purchases,    -- ๋ถ„์„ํ•จ์ˆ˜ ์ปฌ๋Ÿผ์— ๋Œ€ํ•œ ๋ณ„์นญ(alias)

       SUM(purchases) OVER (
         PARTITION BY category
         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS cumulative_purchases,

       LAST_VALUE(item) OVER (
         PARTITION BY category ORDER BY purchases
         ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
       ) AS most_popular,
  FROM Produce
;

SELECT animal, population, category,
       COUNT(*) OVER (
         ORDER BY population
         RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING
       ) AS similar_population,
  FROM Farm
;

WINDOW ๊ตฌ๋ฌธ์„ ์‚ฌ์šฉํ•œ ์Šคํƒ€์ผ ์˜ˆ์‹œ์ด๋‹ค. ์ด๋ฆ„์„ ๊ฐ€์ง„ ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„์€ ์ด์–ด์ง€๋Š” ์œˆ๋„์šฐ ํ”„๋ ˆ์ž„์—์„œ ์ฐธ์กฐ ๊ฐ€๋Šฅํ•˜๋ฏ€๋กœ ๋ฐ˜๋ณต๋˜๋Š” ์ •์˜๋ฅผ ์ค„์ผ ์ˆ˜ ์žˆ๋‹ค.

SELECT item, purchases, category,
       SUM(purchases) OVER w1 AS total_purchases,
       SUM(purchases) OVER w2 AS cumulative_purchases,
       LAST_VALUE(item) OVER w3 AS most_popular,
  FROM Produce
WINDOW w0 AS (PARTITION BY category),
       w1 AS (W0 ORDER BY purchases),
       W2 AS (W0 ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
       W2 AS (W1 ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
;
  • ๋‘ ๋ฒˆ์งธ ์˜ˆ์‹œ
SELECT name,
       division,
       FORMAT_TIMESTAMP('%X', finish_time) AS finish_time,
       FORMAT_TIMESTAMP('%X', fastest_time) AS fastest_time,
       TIMESTAMP_DIFF(finish_time, fastest_time, SECOND) AS delta_in_seconds
  FROM (
    SELECT name, finish_time, division,
           FIRST_VALUE(finish_time) OVER (
             PARTITION BY division ORDER BY finish_time ASC
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS fastest_time
      FROM finishers
);

User Defined Function

BigQuery์˜ ์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜๋Š” SQL๊ณผ JavaScript ๋‘ ๊ฐ€์ง€ ์–ธ์–ด๋กœ ์ž‘์„ฑ ๊ฐ€๋Šฅํ•˜๋‹ค.

  • ํ•จ์ˆ˜ ๋ณธ๋ฌธ(function body)์„ ์ž‘์„ฑํ•  ๋•Œ๋Š” ๊ฐ ์–ธ์–ด์˜ ๊ธฐ๋ณธ ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ๋ฅผ ๋”ฐ๋ฅธ๋‹ค.
  • SQL ์–ธ์–ด๋กœ ์ž‘์„ฑ์‹œ๋Š” ์ด ๋ฌธ์„œ์˜ ๋‚ด์šฉ์„ ์ฐธ๊ณ ํ•œ๋‹ค.
  • JavaScript ์–ธ์–ด๋กœ ์ž‘์„ฑ์‹œ๋Š” ํŒ€ ๋‚ด ์‚ฌ์šฉ์ค‘์ธ ์Šคํƒ€์ผ ๊ฐ€์ด๋“œ๋ฅผ ๋”ฐ๋ฅธ๋‹ค. ์—†์„ ๊ฒฝ์šฐ์—๋Š” ๋‹ค์Œ์„ ์ฐธ์กฐํ•œ๋‹ค.

์‚ฌ์šฉ์ž ์ •์˜ ํ•จ์ˆ˜ ์Šคํƒ€์ผ ์˜ˆ์‹œ

  • JavaScript UDF
CREATE TEMP FUNCTION multiply_inputs(x FLOAT64, y FLOAT64)
RETURNS FLOAT64
LANGUAGE js AS r"""
  return x * y;
""";
SELECT multiplyInputs(3, 5) as product;
  • SQL UDF
CREATE TEMP FUNCTION multiplyInputs(x FLOAT64, y FLOAT64) AS (
  x * y
);
SELECT multiplyInputs(3, 5) as product;
CREATE TEMPORARY FUNCTION dayOfWeek(x TIMESTAMP) AS (
  ['Sun','Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat']
  [ORDINAL(EXTRACT(DAYOFWEEK from x))]
);

CREATE TEMPORARY FUNCTION getDate(x TIMESTAMP) AS (
  EXTRACT(DATE FROM x)
);

WITH overnight_trips AS (
  SELECT duration, dayOfWeek(start_date) AS start_day
    FROM `bigquery-public-data.london_bicycles.cycle_hire`
   WHERE getDate(start_date) != getDate(end_date)
)
SELECT * FROM overnight_trips LIMIT 100;

SQL UDF ์—์„œ SELECT ๋ฌธ์„ ์‚ฌ์šฉํ•˜๋Š” ๊ฒฝ์šฐ ์ผ๋ฐ˜์ ์ธ ์Šค์นผ๋ผ ์„œ๋ธŒ์ฟผ๋ฆฌ(scalar subquery)์™€ ๋งˆ์ฐฌ๊ฐ€์ง€๋กœ ๊ด„ํ˜ธ๋กœ ๊ฐ์‹ธ์ค˜์•ผ ํ•œ๋‹ค. ์ด ๋•Œ ๊ด„ํ˜ธ๋Š” ์•„๋ž˜์ฒ˜๋Ÿผ ํ•จ์ˆ˜ ๋ณธ๋ฌธ (_functon body)์˜ ๊ด„ํ˜ธ์™€ ๊ฐ™์ด ์‚ฌ์šฉํ•˜์ด ์ด์ค‘ ๊ด„ํ˜ธ ํ˜•ํƒœ๊ฐ€ ๋˜๋„๋ก ํ•˜์—ฌ UDF ๋ณธ๋ฌธ์ด ๊ด„ํ˜ธ๊ฐ€ ์•„๋‹Œ SELECT๋กœ ์‹œ์ž‘๋˜๋„๋ก ๋งŒ๋“ ๋‹ค.

CREATE OR REPLACE FUNCTION `jaeseok-park.fn.median` (arr ANY TYPE) AS ((
  SELECT IF(MOD(ARRAY_LENGTH(arr), 2) = 0,
            (arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2) - 1)] +
             arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))] ) / 2,

            arr[OFFSET(DIV(ARRAY_LENGTH(arr), 2))]
         ) AS median
    FROM (SELECT ARRAY_AGG(x ORDER BY x) AS arr FROM UNNEST(arr) AS x)
));

BigQuery Scripting

TBD


BigQuery ์Šคํƒ€์ผ ์˜ˆ์‹œ

์•„๋ž˜๋Š” ์ง€๊ธˆ๊นŒ์ง€์˜ ๊ฐ€์ด๋“œ์— ๋”ฐ๋ผ ์‹ค์ œ ๋ถ„์„์šฉ ์ฟผ๋ฆฌ๋ฅผ ๊ตฌ์กฐํ™”ํ•œ ์˜ˆ์‹œ์ด๋‹ค. ์ฟผ๋ฆฌ๋Š” ์•„๋ž˜ ๋งํฌ์˜ ๋‚ด์šฉ์„ ์‚ฌ์šฉํ•˜์˜€๋‹ค.

#standardSQL
WITH variants AS (
  -- Retrieve the variants in this cohort, flattening by alternate bases and
  -- counting affected alleles.
  SELECT REPLACE(reference_name, 'chr', '') as reference_name,
         start_position,
         end_position,
         reference_bases,
         alternate_bases.alt AS alt,
         (SELECT COUNTIF(gt = alt_offset + 1) FROM v.call, call.genotype gt) AS num_variant_alleles,
         (SELECT COUNTIF(gt >= 0) FROM v.call, call.genotype gt) AS total_num_alleles,
    FROM `bigquery-public-data.human_genome_variants.platinum_genomes_deepvariant_variants_20180823`,
         UNNEST(alternate_bases) alternate_bases WITH OFFSET alt_offset
),
intervals AS (
  -- Define an inline table that uses five rows
  -- selected from silver-wall-555.TuteTable.hg19.
  SELECT *
    FROM UNNEST([
           STRUCT<Gene STRING, Chr STRING,
                  gene_start INT64, gene_end INT64,
                  region_start INT64, region_end INT64>
           ('PRCC',  '1', 156736274, 156771607, 156636274, 156871607),
           ('NTRK1', '1', 156785541, 156852640, 156685541, 156952640),
           ('PAX8',  '2', 113972574, 114037496, 113872574, 114137496),
           ('FHIT',  '3',  59734036,  61238131,  59634036,  61338131),
           ('PPARG', '3',  12328349,  12476853,  12228349,  12576853)
         ])
),
gene_variants AS (
  --
  -- JOIN the variants with the genomic intervals overlapping
  -- the genes of interest.
  --
  -- The JOIN criteria is complicated because the task is to see if
  -- an SNP overlaps an interval.  With standard SQL you can use complex
  -- JOIN predicates, including arbitrary expressions.
  SELECT reference_name,
         start_position,
         reference_bases,
         alt,
         num_variant_alleles,
         total_num_alleles,
    FROM variants v
    JOIN intervals i
      ON v.reference_name = i.Chr
     AND i.region_start <= v.start_position
     AND i.region_end >= v.end_position
)
--
-- And finally JOIN the variants in the regions of interest
-- with annotations for rare variants.
SELECT DISTINCT
       Chr,
       annots.Start AS Start,
       Ref,
       annots.Alt,
       Func,
       Gene,
       PopFreqMax,
       ExonicFunc,
       num_variant_alleles,
       total_num_alleles,
  FROM `silver-wall-555.TuteTable.hg19` AS annots
  JOIN gene_variants AS vars
    ON vars.reference_name = annots.Chr
   AND vars.start_position = annots.Start
   AND vars.reference_bases = annots.Ref
   AND vars.alt = annots.Alt
 WHERE PopFreqMax <= 0.01 -- Retrieve annotations for rare variants only.
 ORDER BY Chr, Start
;

์ฐธ๊ณ  (references)

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •