CSV

CSV (Comma-Separated Values)

A file format that stores data as values separated by commas (,).
• CSV files are used to store tabular data, such as spreadsheet or database tables.
• Each line of a CSV file represents one row of data, and the values within a row are separated by commas.
• CSV files are widely used to exchange data between different applications.

Example

Name, Age, City
Cheolsu, 25, Seoul
Younghee, 30, Busan
Minsu, 28, Daegu
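As a minimal sketch of how a table like the example above is interpreted, the snippet below parses it with Python's built-in csv module. The text is held in an in-memory string rather than a real file, and skipinitialspace=True is used only because the example has a space after each comma; the variable names are illustrative.

```python
import csv
import io

# The example table as CSV text (an in-memory stand-in for a file).
sample_text = "Name, Age, City\nCheolsu, 25, Seoul\nYounghee, 30, Busan\nMinsu, 28, Daegu\n"

# skipinitialspace=True drops the space that follows each comma.
reader = csv.reader(io.StringIO(sample_text), skipinitialspace=True)
header = next(reader)   # first row: column names
rows = list(reader)     # remaining rows: data records

print(header)
for row in rows:
    print(row)
```

Every value comes back as a string, so numeric columns such as Age still need an explicit conversion before arithmetic.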

pandas module

A Python library for data manipulation and analysis.
It is especially useful for working with structured data: its DataFrame data structure lets you process tabular data efficiently.
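To make the DataFrame idea concrete, here is a small sketch that builds one from made-up data and runs a column statistic and a row filter. It assumes pandas is installed; the data and variable names are illustrative only.

```python
import pandas as pd

# Build a small DataFrame from a dict of columns (illustrative data).
people = pd.DataFrame({
    "Name": ["Cheolsu", "Younghee", "Minsu"],
    "Age": [25, 30, 28],
    "City": ["Seoul", "Busan", "Daegu"],
})

print(people.shape)                    # (number of rows, number of columns)
print(people["Age"].mean())            # a column-wise statistic
older = people[people["Age"] >= 28]    # boolean row filter
print(older["Name"].tolist())
```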

Working with CSV Files in Python

• Reading and writing CSV files
• Filtering specific rows
• Selecting specific columns
• Selecting contiguous rows
• Adding a header
• Reading multiple CSV files
• Concatenating data from multiple files
• Calculating the sum and average of values per file

Reading and writing CSV files

This section shows how to read and write files with Python.
• Reading and writing CSV with base Python
  ◦ Basic file I/O
  ◦ I/O with the csv module
• Reading and writing CSV with the pandas library

Reading and writing CSV with base Python

• Basic file I/O
• I/O with the csv module

๊ธฐ๋ณธ ํŒŒ์ผ ์ž…์ถœ๋ ฅ

CSVํŒŒ์ผ์ž…์ถœ๋ ฅ.py

import sys input_file = sys.argv[1] output_file = sys.argv[2] with open(input_file, 'r', newline='') as filereader: with open(output_file, 'w', newline='') as filewriter: header = filereader.readline() header = header.strip() header_list = header.split(',') print(header_list) filewriter.write(','.join(map(str,header_list))+'\n') for row in filereader: row = row.strip() row_list = row.split(',') print(row_list) filewriter.write(','.join(map(str,row_list))+'\n')
Python
๋ณต์‚ฌ
• How to run

python FILE_NAME "INPUT_CSV_PATH" "OUTPUT_CSV_PATH"

Pass the absolute paths of the input file and the output file as quoted strings in the command-line arguments when running the program.
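One caveat of splitting each line on commas by hand: a quoted field that itself contains a comma (for example an amount such as "$1,234.00") is broken into two values. The sketch below uses a made-up line to contrast str.split with csv.reader.

```python
import csv
import io

# A made-up record whose Cost field contains a comma inside quotes.
line = 'Supplier X,001-1001,2341,"$1,234.00",1/20/14'

naive_fields = line.split(',')                        # also splits inside the quotes
parsed_fields = next(csv.reader(io.StringIO(line)))   # honors the quoting

print(len(naive_fields), naive_fields)
print(len(parsed_fields), parsed_fields)
```

The hand-split version yields six fragments for a five-column record, while csv.reader keeps the quoted amount as a single value.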

I/O with the csv module

import csv
import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file, delimiter=',')
        filewriter = csv.writer(csv_out_file, delimiter=',')
        # Copy every row, header included, from the input to the output
        for row_list in filereader:
            filewriter.writerow(row_list)

ํŒ๋‹ค์Šค ๋ผ์ด๋ธŒ๋Ÿฌ๋ฆฌ๋ฅผ ์ด์šฉํ•˜์—ฌ CSV ์ฝ๊ณ  ์“ฐ๊ธฐ

ํŒ๋‹ค์ŠคCSVํŒŒ์ผ์ž…์ถœ๋ ฅ.py

import sys import pandas as pd input_file = sys.argv[1] output_file = sys.argv[2] data_frame = pd.read_csv(input_file) print(data_frame) data_frame.to_csv(output_file, index=False)
Python
๋ณต์‚ฌ
(Error)
import pandas as pd
ModuleNotFoundError: No module named 'pandas'

If the pandas module is not installed, it has to be installed first.

Installing the pandas module

pip install pandas

Making file paths easier to specify

The example code so far uses the sys module to obtain the file paths as follows.

import sys

input_file = sys.argv[1]
output_file = sys.argv[2]

With this approach you have to type the python command in the terminal for every run and paste in the full file paths each time, which is tedious.

python FILE_NAME "INPUT_CSV_PATH" "OUTPUT_CSV_PATH"

Running the program this way often fails because a path is entered incorrectly or contains a typo.
So instead, the remaining examples derive the working directory from the program's own location and ask only for the input and output file names before loading and analyzing the data.

Practice folder structure

📦 workspace
├── 📁 path
│   ├── 📁 input
│   │   ├── 📜 example.csv
│   │   ├── 📜 example2.xlsx
│   │   └── ...
│   ├── 📁 output
│   │   ├── 📜 output.csv
│   │   ├── 📜 output2.xlsx
│   │   └── ...
│   ├── 📄 example.py
│   └── 📄 example2.py
└── 📄 README.md

• workspace : the working folder
• path : a folder that separates each set of examples
  ◦ input : stores the input data files.
  ◦ output : stores the output data files.

Code that specifies the input and output files

import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

The code above uses the os module to get the absolute path of the running program and derives the program's directory from that path.
It then prompts for file names and resolves them inside the input and output folders prepared under that directory, giving the input file and the output file to use.
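The same idea can be sketched with pathlib, which joins path components with the / operator instead of string concatenation. The helper name build_io_paths and the literal script path are hypothetical and exist only for this illustration; in a real script the first argument would be __file__.

```python
from pathlib import Path

def build_io_paths(script_file, input_name, output_name):
    """Hypothetical helper: return (input_path, output_path) next to the script."""
    base_dir = Path(script_file).resolve().parent
    return base_dir / "input" / input_name, base_dir / "output" / output_name

# A literal path is used here so the example runs anywhere;
# a real script would pass __file__ instead.
in_path, out_path = build_io_paths("/workspace/path/example.py", "example.csv", "output.csv")
print(in_path)
print(out_path)
```

Keeping the path logic in one function avoids repeating the same lines at the top of every example script.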

Filtering specific rows

• Filtering by condition
• Filtering by set membership
• Filtering by regular expression

Filtering by condition

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = os.path.join(path, 'input', input('Input file: '))
output_file = os.path.join(path, 'output', input('Output file: '))

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)    # create the csv reader object
        filewriter = csv.writer(csv_out_file)   # create the csv writer object
        header = next(filereader)               # read the first row
        filewriter.writerow(header)             # write the first row
        for row_list in filereader:
            supplier = str(row_list[0]).strip()                   # supplier name
            cost = str(row_list[3]).strip('$').replace(',', '')   # cost
            # Keep only the rows that meet the condition
            if supplier == 'Supplier Z' and float(cost) > 600.0:
                filewriter.writerow(row_list)

Using the pandas module

import pandas as pd
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

# Read the CSV into a DataFrame
data_frame = pd.read_csv(input_file)

# Remove the $ sign from the Cost column and convert it to float (needed for the comparison).
# Thousands separators are removed as well, as in the csv module version,
# because astype(float) fails on values such as '1,234.00'.
data_frame['Cost'] = data_frame['Cost'].str.strip('$')\
    .str.replace(',', '', regex=False).astype(float)

# Filter specific rows
# loc[row labels, column labels]
#   : selects the specified rows and columns of the DataFrame
data_frame_value_meets_condition = data_frame.loc[(data_frame['Supplier Name']\
    .str.contains('Z')) | (data_frame['Cost'] > 600.0), :]
# OR : |, AND : &
# data_frame.loc[(A | B), :]
#   : selects the rows that satisfy condition A or condition B, with all columns.
# loc[row labels, column labels]
#   A bare : selects all rows (or all columns) instead of specific ones.
# | : OR operation (A or B)
# (data_frame['Supplier Name'].str.contains('Z'))
#   1 - takes the 'Supplier Name' column of the DataFrame as strings
#   2 - returns whether each value contains 'Z' (True, False)
# (data_frame['Cost'] > 600.0)
#   1 - takes the 'Cost' column of the DataFrame (float)
#   2 - returns whether each value is greater than 600.0 (True, False)

# Write the DataFrame to a CSV file
data_frame_value_meets_condition.to_csv(output_file, index=False)
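Note that the csv module version combines its two conditions with and (and tests for the exact name 'Supplier Z'), while the pandas version combines them with |, so the two scripts can select different rows. The sketch below uses made-up rows to show how | and & differ; it assumes pandas is installed, and the variable names are illustrative.

```python
import pandas as pd

# Made-up rows for illustration only.
df = pd.DataFrame({
    "Supplier Name": ["Supplier X", "Supplier Z", "Supplier Z", "Supplier Y"],
    "Cost": [750.0, 615.0, 500.0, 300.0],
})

is_z = df["Supplier Name"].str.contains("Z")   # boolean Series, one value per row
is_expensive = df["Cost"] > 600.0

either = df.loc[is_z | is_expensive, :]   # OR: at least one condition holds
both = df.loc[is_z & is_expensive, :]     # AND: both conditions hold

print(either)
print(both)
```

Each condition needs its own parentheses when written inline, because | and & bind more tightly than comparison operators such as >.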

Filtering by set membership

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

important_dates = ['1/20/14', '1/30/14']   # the set of dates of interest, declared as a list

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        header = next(filereader)
        filewriter.writerow(header)
        for row_list in filereader:
            a_date = row_list[4]             # purchase date
            if a_date in important_dates:    # is the date in the list?
                filewriter.writerow(row_list)

Using the pandas module

import pandas as pd
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

data_frame = pd.read_csv(input_file)

important_dates = ['1/20/14', '1/30/14']   # the set of dates of interest, declared as a list

# DataFrame.loc[row labels, column labels]
#   : selects specific rows and columns of the DataFrame
data_frame_value_in_set = data_frame.loc[data_frame['Purchase Date']\
    .isin(important_dates), :]
# data_frame['Purchase Date'] : a Series object
#   -> selecting a single column of a DataFrame yields a Series.
# isin()
#   : returns, for each value of the Series, whether it belongs to the given values (True, False)

data_frame_value_in_set.to_csv(output_file, index=False)

Filtering by regular expression

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import re
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

# Set up the regular expression pattern
pattern = re.compile(r'(?P<my_pattern_group>^001-.*)', re.I)
# re.I : match without regard to case
print('pattern : {}'.format(pattern))
# ^001-.*
#   1 - ^001- : matches text that starts with 001-
#   2 - .*    : . stands for any single character, * means zero or more repetitions
#   -> matches 001- followed by any characters (possibly none)

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        header = next(filereader)
        filewriter.writerow(header)
        for row_list in filereader:
            invoice_number = row_list[1]            # invoice number
            if pattern.search(invoice_number):      # check against the pattern
                filewriter.writerow(row_list)
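To check what the pattern accepts, the sketch below runs it over a few made-up invoice numbers (the named group is omitted for brevity). Because * allows zero repetitions, the bare prefix '001-' also matches, and the ^ anchor rejects strings where '001-' appears later.

```python
import re

# Same core pattern as in the script, without the named group.
pattern = re.compile(r'^001-.*', re.I)

# Made-up invoice numbers for illustration.
candidates = ['001-1001', '001-', '50-9501', 'A001-77']
matches = [text for text in candidates if pattern.search(text)]
print(matches)
```

If at least one character after the prefix were required, the pattern would use .+ instead of .*.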

Using the pandas module

import pandas as pd
import re
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

data_frame = pd.read_csv(input_file)

# ix[ , ]
#   : deprecated (no longer recommended) and replaced by newer syntax in later versions
#   ix[ , ] -> loc[ , ]

# Select the rows whose invoice number starts with '001-'
# condition = data_frame['Invoice Number'].str.startswith("001-")

# Select the rows whose supplier name ends with 'Z'
# condition = data_frame['Supplier Name'].str.endswith('Z')

# match(regular expression) : tests each string against the regular expression
pattern = re.compile(r'(?P<my_pattern_group>^001-.*)', re.I)
condition = data_frame['Invoice Number'].str.match(pattern)

data_frame_value_matches_pattern = data_frame.loc[condition, :]

data_frame_value_matches_pattern.to_csv(output_file, index=False)

Selecting specific columns

• Selecting columns by index
• Selecting columns by header name

Selecting columns by index

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

# List of the column indexes to select: 0 and 3
my_columns = [0, 3]

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        for row_list in filereader:
            row_list_output = []
            # Loop over my_columns - index_value : 0, 3
            for index_value in my_columns:
                # row_list[0] : supplier name
                # row_list[3] : cost
                row_list_output.append(row_list[index_value])
            filewriter.writerow(row_list_output)

Using the pandas module

import pandas as pd
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

data_frame = pd.read_csv(input_file)

# loc[row labels, column labels]
#   : selects data by row and column labels
# iloc[row index, column index]
#   : selects rows and columns of the DataFrame by integer position

# iloc[rows, columns [0, 3]]
#   : selects the columns at index 0 and 3
# data_frame_column_by_index = data_frame.iloc[:, [0, 3]]

# Select the first three columns (Supplier Name, Invoice Number, Part Number)
#   - by listing the indexes to select
# data_frame_column_by_index = data_frame.iloc[:, [0, 1, 2]]
#   - by an index range
data_frame_column_by_index = data_frame.iloc[:, 0:3]

data_frame_column_by_index.to_csv(output_file, index=False)
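The difference between passing a list and passing a slice to iloc is easy to miss, so here is a small sketch on a made-up single-row table with the same column names. It assumes pandas is installed; the data values are invented.

```python
import pandas as pd

# Made-up single-row table with the same column names as the examples.
df = pd.DataFrame(
    [["Supplier X", "001-1001", 2341, "$500.00", "1/20/14"]],
    columns=["Supplier Name", "Invoice Number", "Part Number", "Cost", "Purchase Date"],
)

by_list = df.iloc[:, [0, 3]]   # exactly the columns at positions 0 and 3
by_slice = df.iloc[:, 0:3]     # positions 0, 1, 2 (the stop position 3 is excluded)

print(by_list.columns.tolist())
print(by_slice.columns.tolist())
```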

Selecting columns by header name

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

my_columns = ['Invoice Number', 'Purchase Date']
my_columns_index = []

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        header = next(filereader)
        # Find the positions of the wanted header names
        for index_value in range(len(header)):
            if header[index_value] in my_columns:
                my_columns_index.append(index_value)   # [1, 4]
        filewriter.writerow(my_columns)
        for row_list in filereader:
            row_list_output = []
            for index_value in my_columns_index:
                row_list_output.append(row_list[index_value])
            filewriter.writerow(row_list_output)

Using the pandas module

import pandas as pd
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

data_frame = pd.read_csv(input_file)

# List of columns to select
select_list = ['Invoice Number', 'Purchase Date']

# loc[row labels, column labels]
data_frame_column_by_name = data_frame.loc[:, select_list]

data_frame_column_by_name.to_csv(output_file, index=False)

Selecting contiguous rows

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

row_counter = 0
with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        for row in filereader:
            # Select the contiguous rows at indexes 3 through 10
            if row_counter >= 3 and row_counter <= 10:
                filewriter.writerow([value.strip() for value in row])
            row_counter += 1
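As an alternative sketch (not the approach used above), itertools.islice can take the same window of rows without a manual counter. The sample text below is generated in memory purely for illustration.

```python
import csv
import io
from itertools import islice

# Twelve numbered rows standing in for a CSV file.
sample_text = ''.join('row{0},{0}\n'.format(i) for i in range(12))
reader = csv.reader(io.StringIO(sample_text))

# islice(iterable, start, stop): start is inclusive and stop is exclusive,
# so (3, 11) yields the rows at indexes 3 through 10.
selected = list(islice(reader, 3, 11))
print(selected[0], selected[-1], len(selected))
```

Unlike the counter loop, islice stops reading once the stop index is reached, which matters for large files.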

Using the pandas module

import pandas as pd
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

data_frame = pd.read_csv(input_file, header=None)
print('Before drop')
print(data_frame)

header = data_frame.iloc[0]

# drop()
#   : removes the specified rows from the DataFrame
data_frame = data_frame.drop([0, 1, 2, 3, 4])
print('After drop')
print(data_frame)

# iloc[0]
#   : selects rows and columns by integer position; the header above was taken from row 0
data_frame.columns = header
print('After iloc[0]')
print(data_frame)

# reindex()
#   : conforms the rows of the DataFrame to a new index
# data_frame.reindex(data_frame.index.drop(3))
#   - drops index 3 and returns a new DataFrame built on the remaining index
# new_index = range(len(data_frame))   # range(8) -> 0..7
# data_frame = data_frame.reindex([0, 1, 2, 3, 4, 5, 6, 7])
# print('reindex() : after rebuilding the index')
# print(data_frame)

# Rebuild the index
data_frame.reset_index(drop=True, inplace=True)
# data_frame.reset_index()
print('reset_index() ')
print(data_frame)

data_frame.to_csv(output_file, index=True)

Adding a header

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

with open(input_file, 'r', newline='') as csv_in_file:
    with open(output_file, 'w', newline='') as csv_out_file:
        filereader = csv.reader(csv_in_file)
        filewriter = csv.writer(csv_out_file)
        # Write the header row first, then copy the data rows
        header_list = ['Supplier Name', 'Invoice Number', \
                       'Part Number', 'Cost', 'Purchase Date']
        filewriter.writerow(header_list)
        for row in filereader:
            filewriter.writerow(row)

Using the pandas module

import pandas as pd
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input file, output file
input_file = path + '/input/' + input('Input file: ')
output_file = path + '/output/' + input('Output file: ')

header_list = ['Supplier Name', 'Invoice Number', \
               'Part Number', 'Cost', 'Purchase Date']

# header=None : read the file as having no header row
# names=[headers to add] : use these names as the column headers
data_frame = pd.read_csv(input_file, header=None, names=header_list)

data_frame.to_csv(output_file, index=False)

Reading multiple CSV files

• Using the csv module

Using the csv module

import csv
import glob
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from its input folder
path = os.path.dirname(program_path)
# Input folder
input_path = path + '/input/'

file_counter = 0
# glob.glob() builds the paths of all files in input_path whose names start with sales_
for input_file in glob.glob(os.path.join(input_path, 'sales_*')):
    row_counter = 1   # row count for this file, starting at 1 for the header row
    with open(input_file, 'r', newline='') as csv_in_file:
        filereader = csv.reader(csv_in_file)
        header = next(filereader)   # the first row is the header
        for row in filereader:
            row_counter += 1        # count each data row
    # Print the file name, row count, and column count
    print('{0!s}: \t{1:d} rows \t{2:d} columns'.format(\
        os.path.basename(input_file), row_counter, len(header)))
    file_counter += 1               # count the file
print('Number of files: {0:d}'.format(file_counter))   # print the total number of files

Concatenating data from multiple files

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import glob
import os

# Path of the running program
program_path = os.path.abspath(__file__)
# Directory of the program; files are read from and written to its input and output folders
path = os.path.dirname(program_path)
# Input folder, output file
input_path = path + '/input/'
output_file = path + '/output/' + input('Output file: ')

first_file = True
# The glob module matches multiple files with wildcards such as *
for input_file in glob.glob(os.path.join(input_path, 'sales_*')):
    print(os.path.basename(input_file))
    with open(input_file, 'r', newline='') as csv_in_file:
        with open(output_file, 'a', newline='') as csv_out_file:
            filereader = csv.reader(csv_in_file)
            filewriter = csv.writer(csv_out_file)
            if first_file:
                # First file: copy every row, header included
                for row in filereader:
                    filewriter.writerow(row)
                first_file = False
            else:
                # Later files: skip the header and copy only the data rows
                header = next(filereader)
                for row in filereader:
                    filewriter.writerow(row)
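Because the script above opens the output in append mode ('a'), running it twice with the same output name adds the data a second time. A possible variation, sketched below under the assumption that the output should be rebuilt on every run, opens the output once in 'w' mode. The helper name merge_csv_files and the temporary sample files are hypothetical.

```python
import csv
import glob
import os
import tempfile

def merge_csv_files(input_pattern, output_file):
    """Hypothetical helper: write the header once, then the data rows of every match."""
    with open(output_file, 'w', newline='') as csv_out_file:   # 'w' discards old output
        filewriter = csv.writer(csv_out_file)
        for file_number, input_file in enumerate(sorted(glob.glob(input_pattern))):
            with open(input_file, 'r', newline='') as csv_in_file:
                filereader = csv.reader(csv_in_file)
                header = next(filereader)
                if file_number == 0:
                    filewriter.writerow(header)   # header only from the first file
                filewriter.writerows(filereader)  # remaining rows are data

# Demonstration with two throwaway files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name, value in [('sales_a.csv', '1'), ('sales_b.csv', '2')]:
        with open(os.path.join(tmp_dir, name), 'w', newline='') as f:
            csv.writer(f).writerows([['id', 'amount'], [name, value]])
    merged_path = os.path.join(tmp_dir, 'merged.csv')
    merge_csv_files(os.path.join(tmp_dir, 'sales_*'), merged_path)
    merge_csv_files(os.path.join(tmp_dir, 'sales_*'), merged_path)   # rerun: no duplicates
    with open(merged_path, newline='') as f:
        merged_rows = list(csv.reader(f))

print(merged_rows)
```

Sorting the matched paths also makes the row order reproducible, since glob.glob does not guarantee an order.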

Using the pandas module

import pandas as pd
import glob
import os

program_path = os.path.abspath(__file__)
path = os.path.dirname(program_path)
# Input folder, output file
input_path = path + '/input/'
output_file = path + '/output/' + input('Output file: ')

# glob.glob() builds the paths of all files in input_path whose names start with sales_
all_files = glob.glob(os.path.join(input_path, 'sales_*'))

all_data_frames = []
# Loop over the file paths and read each CSV file
for file in all_files:
    data_frame = pd.read_csv(file, index_col=None)   # read the CSV file
    all_data_frames.append(data_frame)               # add the DataFrame to the list

# concat() merges the list of DataFrames into a single DataFrame
data_frame_concat = pd.concat(all_data_frames, axis=0, ignore_index=True)

# Write the DataFrame to a CSV file
data_frame_concat.to_csv(output_file, index=False)

Calculating the sum and average of values per file

• Using the csv module
• Using the pandas module

Using the csv module

import csv
import glob
import os

program_path = os.path.abspath(__file__)
path = os.path.dirname(program_path)
# Input folder, output file
input_path = path + '/input/'
output_file = path + '/output/' + input('Output file: ')

output_header_list = ['file_name', 'total_sales', 'average_sales']

csv_out_file = open(output_file, 'a', newline='')
filewriter = csv.writer(csv_out_file)
filewriter.writerow(output_header_list)

# Build the paths of the files whose names start with "sales_"
for input_file in glob.glob(os.path.join(input_path, 'sales_*')):
    # Reads sales_2013.csv, sales_2014.csv, ... in turn
    with open(input_file, 'r', newline='') as csv_in_file:
        filereader = csv.reader(csv_in_file)
        output_list = []
        # Add the name of the CSV file being read to the output row
        output_list.append(os.path.basename(input_file))
        header = next(filereader)
        # Declare the total and count variables
        total_sales = 0.0
        number_of_sales = 0.0
        for row in filereader:
            sale_amount = row[3]
            # Accumulate the total
            total_sales += float(str(sale_amount).strip('$').replace(',', ''))
            # Count the sale
            number_of_sales += 1.0
        # Compute the average
        average_sales = '{0:.2f}'.format(total_sales / number_of_sales)
        # Add the total to the output row
        output_list.append(total_sales)
        # Add the average to the output row
        output_list.append(average_sales)
        # Write one line in the form [input file name, total, average]
        filewriter.writerow(output_list)
csv_out_file.close()
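The amount-cleaning expression can be isolated in a small function, which also makes it easy to guard the division: the script above raises ZeroDivisionError for a file that has a header but no data rows. The sketch below uses made-up amounts; the helper name parse_amount is hypothetical.

```python
def parse_amount(text):
    """Hypothetical helper: turn a string such as '$1,234.50' into a float."""
    return float(str(text).strip().strip('$').replace(',', ''))

amounts = ['$1,200.00', '$300.50', '$99.50']   # made-up sample values
values = [parse_amount(amount) for amount in amounts]

total_sales = sum(values)
# Guard against an empty list to avoid dividing by zero.
average_sales = total_sales / len(values) if values else 0.0

print('{0:.2f} {1:.2f}'.format(total_sales, average_sales))
```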

Using the pandas module

import pandas as pd
import glob
import os

program_path = os.path.abspath(__file__)
path = os.path.dirname(program_path)
# Input folder, output file
input_path = path + '/input/'
output_file = path + '/output/' + input('Output file: ')

all_files = glob.glob(os.path.join(input_path, 'sales_*'))
all_data_frames = []
for input_file in all_files:
    data_frame = pd.read_csv(input_file, index_col=None)

    # Total
    # 1. Select the 'Sale Amount' column of the DataFrame
    # 2. Remove the $ and , symbols with a list comprehension
    # 3. Sum the values with the pandas DataFrame sum() function
    total_sales = pd.DataFrame([float(str(value).strip('$').replace(',', '')) \
        for value in data_frame.loc[:, 'Sale Amount']]).sum()

    # Average
    # 1. Select the 'Sale Amount' column of the DataFrame
    # 2. Remove the $ and , symbols with a list comprehension
    # 3. Average the values with the pandas DataFrame mean() function
    average_sales = pd.DataFrame([float(str(value).strip('$').replace(',', '')) \
        for value in data_frame.loc[:, 'Sale Amount']]).mean()

    # Declare the data dictionary
    data = {'file_name': os.path.basename(input_file),
            'total_sales': total_sales,
            'average_sales': average_sales}

    # Build a DataFrame from the dictionary with named columns and add this
    # file's total and average to the list of DataFrames
    all_data_frames.append(pd.DataFrame(data, columns=['file_name', 'total_sales', 'average_sales']))

# Merge the list of DataFrames (one total/average row per CSV file) into a single DataFrame
data_frames_concat = pd.concat(all_data_frames, axis=0, ignore_index=True)

# Write the DataFrame to a CSV file
data_frames_concat.to_csv(output_file, index=False)