Pandas IO Operations

Pandas IO operation examples

The two main functions for reading text files are read_csv() and read_table(). Both use the same parsing code to intelligently convert tabular data into a DataFrame object:

    pandas.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer',
                    names=None, index_col=None, usecols=None, ...)

    pandas.read_table(filepath_or_buffer, sep='\t', delimiter=None, header='infer',
                      names=None, index_col=None, usecols=None, ...)
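
To illustrate that the two functions share one parser and differ mainly in their default separator, here is a minimal sketch. The in-memory sample text and the variable names are assumptions made for illustration; io.StringIO stands in for a file on disk.

```python
import io

import pandas as pd

# Hypothetical sample data, kept in memory so the sketch needs no file.
csv_text = "S.No,Name,Age\n1,Tom,28\n2,Lee,32\n"
tsv_text = csv_text.replace(",", "\t")

# read_csv defaults to sep=',' and read_table defaults to sep='\t'.
df_csv = pd.read_csv(io.StringIO(csv_text))
df_tsv = pd.read_table(io.StringIO(tsv_text))

# read_table can also parse comma-separated text when sep is given.
df_table_comma = pd.read_table(io.StringIO(csv_text), sep=",")

print(df_csv.equals(df_tsv))
print(df_csv.equals(df_table_comma))
```

Both comparisons should report that the frames are equal, since only the separator differs between the calls.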

Save the following data as temp.csv; the examples below operate on it.

    S.No,Name,Age,City,Salary
    1,Tom,28,Toronto,20000
    2,Lee,32,HongKong,3000
    3,Steven,43,Bay Area,8300
    4,Ram,38,Hyderabad,3900
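
If you prefer to create the file from Python rather than by hand, the following sketch writes the same rows to temp.csv in the current working directory. The variable names are illustrative, and the script assumes the directory is writable.

```python
from pathlib import Path

# The sample rows from the tutorial, one CSV record per line.
SAMPLE = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Write the file next to the script (current working directory).
path = Path("temp.csv")
path.write_text(SAMPLE, encoding="utf-8")

# Read it back to confirm what was written.
print(path.read_text(encoding="utf-8"))
```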

read_csv

read_csv reads data from a CSV file and creates a DataFrame object.

Example

    import pandas as pd

    # Read the CSV file into a DataFrame; the first row becomes the header.
    df = pd.read_csv("temp.csv")
    print(df)

The output is as follows:

   S.No    Name  Age       City  Salary
0     1     Tom   28    Toronto   20000
1     2     Lee   32   HongKong    3000
2     3  Steven   43   Bay Area    8300
3     4     Ram   38  Hyderabad    3900

Custom Indexing

Use the index_col parameter to specify a column in the CSV file to serve as the custom index.

Example

    import pandas as pd

    # Use the S.No column as the row index instead of the default 0..n-1.
    df = pd.read_csv("temp.csv", index_col=['S.No'])
    print(df)

The output is as follows:

        Name  Age       City  Salary
S.No
1        Tom   28    Toronto   20000
2        Lee   32   HongKong    3000
3     Steven   43   Bay Area    8300
4        Ram   38  Hyderabad    3900
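
Once a column is the index, rows can be looked up by its values. The sketch below assumes the same sample data, held in memory through io.StringIO so it runs without temp.csv.

```python
import io

import pandas as pd

# Same rows as temp.csv, kept in memory for a self-contained example.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(io.StringIO(csv_text), index_col="S.No")

# S.No is now the index, so it no longer appears among the columns.
print(df.index.name)
print(list(df.columns))

# Label-based lookup: the row whose S.No is 3.
print(df.loc[3, "Name"])
```

Passing a single column name instead of a one-element list yields the same single-level index here.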

Converter

The dtype of each column can be passed as a dict through the dtype parameter.

Example

    import numpy as np
    import pandas as pd

    # Force the Salary column to float64 while reading.
    df = pd.read_csv("temp.csv", dtype={'Salary': np.float64})
    print(df.dtypes)

The output is as follows:

S.No        int64
Name       object
Age         int64
City       object
Salary    float64
dtype: object

By default, the Salary column would be read as int64. Because the dtype was set explicitly, the column is stored as float64, so the data looks like this:

   S.No    Name  Age       City   Salary
0     1     Tom   28    Toronto  20000.0
1     2     Lee   32   HongKong   3000.0
2     3  Steven   43   Bay Area   8300.0
3     4     Ram   38  Hyderabad   3900.0
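
Besides dtype, read_csv also accepts a converters dict that maps a column to a function applied to each raw value. The sketch below combines the two on different columns; the choice of str.upper for Name is purely illustrative, and the data is held in memory so the example runs without temp.csv.

```python
import io

import pandas as pd

# Same rows as temp.csv, kept in memory for a self-contained example.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

df = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"Salary": "float64"},      # cast one column while parsing
    converters={"Name": str.upper},   # transform each raw Name string
)

print(df["Salary"].dtype)
print(df["Name"].tolist())
```

The two options target different columns here on purpose, so that each column has exactly one conversion rule.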

Header Names

Use the names parameter to specify the header names.

Example

    import pandas as pd

    # Supply custom column names; the file's own header row is not removed.
    df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'])
    print(df)

The output is as follows:

      a       b    c          d       e
0  S.No    Name  Age       City  Salary
1     1     Tom   28    Toronto   20000
2     2     Lee   32   HongKong    3000
3     3  Steven   43   Bay Area    8300
4     4     Ram   38  Hyderabad    3900

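Note that the custom names were applied as the header, but the file's own header row was not removed; it now appears as the first data row. Use the header parameter to remove it.

If the header is not in the first row, pass its row number to header. The rows before it are then skipped.

Example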

    import pandas as pd

    # header=0 marks the first row as the header, which names then replaces.
    df = pd.read_csv("temp.csv", names=['a', 'b', 'c', 'd', 'e'], header=0)
    print(df)

The output is as follows:

   a       b   c          d      e
0  1     Tom  28    Toronto  20000
1  2     Lee  32   HongKong   3000
2  3  Steven  43   Bay Area   8300
3  4     Ram  38  Hyderabad   3900
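
The case where the header is not in the first row can be sketched as follows. The leading "exported by ..." line is a made-up example of a preamble, not part of temp.csv, and the data is kept in memory for simplicity.

```python
import io

import pandas as pd

# Hypothetical file whose real header sits on the second line (row 1).
text = """exported by a hypothetical tool
S.No,Name,Age
1,Tom,28
2,Lee,32
"""

# header=1 uses row 1 as the header and skips everything before it.
df = pd.read_csv(io.StringIO(text), header=1)
print(df)

# Combined with names, the header row is located and then replaced.
df_named = pd.read_csv(io.StringIO(text), header=1, names=["a", "b", "c"])
print(df_named)
```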

skiprows

skiprows skips the specified number of rows.

Example

    import pandas as pd

    # Skip the first two lines of the file (the header and the first record).
    df = pd.read_csv("temp.csv", skiprows=2)
    print(df)

The output is as follows:

   2     Lee  32   HongKong  3000
0  3  Steven  43   Bay Area  8300
1  4     Ram  38  Hyderabad  3900
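
Because an integer skiprows also discards the header line, the next remaining line is used as the header, as the output above shows. One way to keep the original header is to pass a list of 0-based line numbers instead. This is a sketch on the same sample data, held in memory so it runs without temp.csv.

```python
import io

import pandas as pd

# Same rows as temp.csv, kept in memory for a self-contained example.
csv_text = """S.No,Name,Age,City,Salary
1,Tom,28,Toronto,20000
2,Lee,32,HongKong,3000
3,Steven,43,Bay Area,8300
4,Ram,38,Hyderabad,3900
"""

# Skip file lines 1 and 2 (the first two records); line 0, the header, stays.
df = pd.read_csv(io.StringIO(csv_text), skiprows=[1, 2])
print(df)
```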
