Pandas DataFrame

Pandas DataFrame Basic Operations

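A DataFrame is a two-dimensional data structure in which data is arranged in a table of rows and columns.

DataFrame Features

Columns can be of different types
Size is mutable
Axes are labeled (rows and columns)
Arithmetic operations can be performed on rows and columns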
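The features listed above can be seen in a few lines. This is a minimal sketch; the student records and the column names (name, score, passed, scaled) are made up for illustration.

```python
import pandas as pd

# Hypothetical student records, used only for illustration.
df = pd.DataFrame(
    {"name": ["Ann", "Ben", "Cho"], "score": [81.5, 92.0, 77.0]},
    index=["s1", "s2", "s3"],  # labeled row axis
)

# Columns may hold different types (text and numbers here).
print(df.dtypes)

# Size is mutable: assigning a new column grows the frame.
df["passed"] = df["score"] >= 80
print(df.shape)

# Arithmetic applies to a whole column at once.
df["scaled"] = df["score"] / 100
print(df)
```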

Structure

pandas.Series

The structure of a Series is as follows:

Suppose we are creating a data frame from student data.

You can think of it as an SQL table or a spreadsheet-style representation of the data.

pandas.DataFrame

A pandas DataFrame can be created with the following constructor:

    pandas.DataFrame(data, index, columns, dtype, copy)

Parameter descriptions:

data: Accepts several forms, such as an ndarray, Series, map, list, dict, constant, or another DataFrame.
index: Row labels for the resulting frame. Optional; defaults to np.arange(n) if no index is passed.
columns: Column labels. Optional; defaults to np.arange(n) if no column labels are passed.
dtype: Data type of each column.
copy: Whether the input data is copied. The default listed for this tutorial's pandas version is False.
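As a rough illustration of how the five parameters fit together, the sketch below passes each one explicitly. The array contents and the labels r1..r3, x and y are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

values = np.array([[1, 2], [3, 4], [5, 6]])

df = pd.DataFrame(
    data=values,
    index=["r1", "r2", "r3"],   # row labels
    columns=["x", "y"],         # column labels
    dtype=float,                # every column stored as float64
    copy=True,                  # work on a copy of the input array
)
print(df)

# With no index or columns given, both axes are numbered from 0.
default_df = pd.DataFrame(values)
print(default_df.index.tolist(), default_df.columns.tolist())

# The frame holds its own copy, so changing the array does not affect it.
values[0, 0] = 99
print(df.loc["r1", "x"])
```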

Creating a DataFrame

A pandas DataFrame can be created from various inputs:

Lists
Dicts
Series
NumPy ndarrays
Another DataFrame

The later sections of this chapter show how to create a DataFrame from these inputs.
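For the input types in this list that do not get a dedicated section later (a plain ndarray, a single Series, another DataFrame), a minimal sketch might look like this. The variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# From a 2-D NumPy array: each array row becomes a DataFrame row.
arr = np.arange(6).reshape(3, 2)
from_array = pd.DataFrame(arr, columns=["a", "b"])
print(from_array)

# From a single Series: the Series name becomes the column label.
from_series = pd.DataFrame(pd.Series([10, 20], name="v"))
print(from_series)

# From another DataFrame: columns= selects and reorders existing columns.
from_frame = pd.DataFrame(from_array, columns=["b", "a"])
print(from_frame)
```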

Creating an Empty DataFrame

The most basic DataFrame that can be created is an empty one.

Example

    # Filename : pandas.py
    # author by : ko.oldtoolbag.com
    # Import pandas dependency package and alias
    import pandas as pd
    df = pd.DataFrame()
    print(df)

Output:

    Empty DataFrame
    Columns: []
    Index: []

Creating a DataFrame from Lists

Example

    import pandas as pd
    data = [1, 2, 3, 4, 5]
    df = pd.DataFrame(data)
    print(df)

Output:

       0
    0  1
    1  2
    2  3
    3  4
    4  5

Example

    import pandas as pd
    data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
    df = pd.DataFrame(data, columns=['Name', 'Age'])
    print(df)

Output:

         Name  Age
    0    Alex   10
    1     Bob   12
    2  Clarke   13

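Example

    import pandas as pd
    data = [['Alex', 10], ['Bob', 12], ['Clarke', 13]]
    df = pd.DataFrame(data, columns=['Name', 'Age'])
    # Cast only the numeric column to float
    df = df.astype({'Age': float})
    print(df)

Output:

         Name   Age
    0    Alex  10.0
    1     Bob  12.0
    2  Clarke  13.0

Note: the Age column is converted to floating point. Passing dtype=float to the constructor applies to every column, and recent pandas versions raise an error when it meets the text in the Name column, so the cast is applied to Age alone.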

Creating a DataFrame from a Dict of ndarrays / Lists

All the ndarrays must have the same length. If an index is passed, its length must equal the length of the arrays.
If no index is passed, the index defaults to range(n), where n is the array length.
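The length rule can be checked directly. The sketch below uses made-up height and weight arrays and shows that an index of the wrong length is rejected with a ValueError.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements; both arrays have length 3.
data = {
    "height": np.array([1.6, 1.7, 1.8]),
    "weight": np.array([55, 70, 82]),
}

# An index of matching length is accepted.
ok = pd.DataFrame(data, index=["p1", "p2", "p3"])
print(ok)

# An index that is too short is rejected.
try:
    pd.DataFrame(data, index=["p1", "p2"])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = exc

print(type(mismatch_error).__name__)
```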

Example

    import pandas as pd
    data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
    df = pd.DataFrame(data)
    print(df)

Output:

        Name  Age
    0    Tom   28
    1   Jack   34
    2  Steve   29
    3  Ricky   42

Note: observe the values 0, 1, 2, 3. They are the default index assigned to each row by range(n).

Now let us create an indexed DataFrame using arrays.

Example

    import pandas as pd
    data = {'Name': ['Tom', 'Jack', 'Steve', 'Ricky'], 'Age': [28, 34, 29, 42]}
    df = pd.DataFrame(data, index=['rank1', 'rank2', 'rank3', 'rank4'])
    print(df)

Output:

            Name  Age
    rank1    Tom   28
    rank2   Jack   34
    rank3  Steve   29
    rank4  Ricky   42

Note: the index parameter assigns a label to each row.

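Creating a DataFrame from a List of Dicts

A list of dictionaries can be passed as input data to create a DataFrame. By default, the dictionary keys are used as column names.
The following example shows how to create a DataFrame by passing a list of dictionaries.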

Example

    import pandas as pd
    data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
    df = pd.DataFrame(data)
    print(df)

Output:

       a   b     c
    0  1   2   NaN
    1  5  10  20.0

Note: NaN (Not a Number) is filled in wherever a value is missing.
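To see where the missing values ended up, and what they do to the column type, something like the following sketch can help. It reuses the same two dictionaries; replacing NaN with 0 is only an example choice, not a recommendation.

```python
import pandas as pd

data = [{"a": 1, "b": 2}, {"a": 5, "b": 10, "c": 20}]
df = pd.DataFrame(data)

# True marks the cells that were filled with NaN.
print(df.isna())

# A column containing NaN is stored as float64, hence 20.0 rather than 20.
print(df["c"].dtype)

# Replace the missing entries with a chosen value (0 here).
filled = df.fillna(0)
print(filled)
```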

The following example shows how to create a DataFrame by passing a list of dictionaries together with row indices.

Example

    import pandas as pd
    data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
    df = pd.DataFrame(data, index=['first', 'second'])
    print(df)

Output:

            a   b     c
    first   1   2   NaN
    second  5  10  20.0

The following example shows how to create a DataFrame from a list of dictionaries, row indices, and column indices.

Example

    import pandas as pd
    data = [{'a': 1, 'b': 2}, {'a': 5, 'b': 10, 'c': 20}]
    # Two column labels, both matching dictionary keys
    df1 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b'])
    # Two column labels, one of which (b1) is not a dictionary key
    df2 = pd.DataFrame(data, index=['first', 'second'], columns=['a', 'b1'])
    print(df1)
    print(df2)

Output:

    #df1 output
            a   b
    first   1   2
    second  5  10
    #df2 output
            a   b1
    first   1  NaN
    second  5  NaN

Note: df2 is created with a column label (b1) that is not a dictionary key, so NaN is filled in for that column. df1 is created with column labels that match the dictionary keys, so it contains no NaN values.

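Creating a DataFrame from a Dict of Series

A dictionary of Series can be passed to form a DataFrame. The resulting index is the union of the indexes of all the Series passed.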

Example

    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df)

Output:

       one  two
    a  1.0    1
    b  2.0    2
    c  3.0    3
    d  NaN    4

The first Series has no label 'd', so in the result NaN is filled in for the 'd' label of column one.
Now let us look at column selection, addition, and deletion through examples.

Column Selection

We will see how this works by selecting a column from the DataFrame.

Example

    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df['one'])

Output:

    a    1.0
    b    2.0
    c    3.0
    d    NaN
    Name: one, dtype: float64

Column Addition

We will see how this works by adding a new column to an existing data frame.

Example

    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    # Add a new column to the existing DataFrame by assigning a Series to a new column label
    print("Add new column by passing as Series:")
    df['three'] = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
    print(df)
    print("Add new column using existing columns in DataFrame:")
    df['four'] = df['one'] + df['three']
    print(df)

Output:

    Add new column by passing as Series:
       one  two  three
    a  1.0    1   10.0
    b  2.0    2   20.0
    c  3.0    3   30.0
    d  NaN    4    NaN
    Add new column using existing columns in DataFrame:
       one  two  three  four
    a  1.0    1   10.0  11.0
    b  2.0    2   20.0  22.0
    c  3.0    3   30.0  33.0
    d  NaN    4    NaN   NaN
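Bracket assignment is not the only way to add a column. As a sketch under the same sample data, DataFrame.assign returns a new frame and leaves the original alone, while DataFrame.insert places a column at a chosen position. The column names total and zero are invented for this example.

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
     "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

# assign() builds a new DataFrame; df itself is not modified.
with_total = df.assign(total=df["one"] + df["two"])
print(with_total)

# insert() modifies df in place and puts the column at position 0.
df.insert(0, "zero", 0)
print(df)
```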

Column Deletion

Columns can be deleted or popped; let us see how with an example.

Example

    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd']),
         'three': pd.Series([10, 20, 30], index=['a', 'b', 'c'])}
    df = pd.DataFrame(d)
    print("Our dataframe is:")
    print(df)
    # using the del statement
    print("Deleting the first column using del function:")
    del df['one']
    print(df)
    # using the pop method
    print("Deleting another column using POP function:")
    df.pop('two')
    print(df)

Output:

    Our dataframe is:
       one  two  three
    a  1.0    1   10.0
    b  2.0    2   20.0
    c  3.0    3   30.0
    d  NaN    4    NaN
    Deleting the first column using del function:
       two  three
    a    1   10.0
    b    2   20.0
    c    3   30.0
    d    4    NaN
    Deleting another column using POP function:
       three
    a   10.0
    b   20.0
    c   30.0
    d    NaN
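Two details of column deletion are easy to miss: pop hands back the removed column, and DataFrame.drop with columns= returns a new frame instead of changing the original. A small sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2], "two": [3, 4], "three": [5, 6]})

# pop() removes the column from df and returns it as a Series.
popped = df.pop("two")
print(popped)

# drop(columns=...) returns a new DataFrame; df keeps its columns.
trimmed = df.drop(columns=["one"])
print(trimmed)
print(df.columns.tolist())
```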

Row Selection, Addition, and Deletion

Now let us look at row selection, addition, and deletion through examples. We begin with selection.

Selection by Label

Rows can be selected by passing a row label to loc.

Example

    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df.loc['b'])

Output:

    one    2.0
    two    2.0
    Name: b, dtype: float64

The result is a Series whose labels are the column names of the DataFrame, and the name of the Series is the label that was used to retrieve it.
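loc also accepts more than a single label. The sketch below, using the same sample data, selects several rows with a list of labels and a single cell with a row label plus a column label.

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
     "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

# A list of labels returns a DataFrame with just those rows.
subset = df.loc[["a", "c"]]
print(subset)

# A row label and a column label together return one value.
cell = df.loc["b", "two"]
print(cell)
```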

Selection by Integer Location

Rows can be selected by passing an integer location to iloc.

Example

    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df.iloc[2])

Output:

    one    3.0
    two    3.0
    Name: c, dtype: float64
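iloc follows ordinary Python positional rules, so negative positions and row/column pairs work as well. A brief sketch on the same data:

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
     "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

# A negative position counts from the end; -1 is the last row.
last_row = df.iloc[-1]
print(last_row)

# Row positions 0 and 1 of the first column (position 0).
block = df.iloc[0:2, 0]
print(block)
```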

Slicing Rows

Multiple rows can be selected using the ':' operator.

Example

    import pandas as pd
    d = {'one': pd.Series([1, 2, 3], index=['a', 'b', 'c']),
         'two': pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])}
    df = pd.DataFrame(d)
    print(df[2:4])

Output:

       one  two
    c  3.0    3
    d  NaN    4
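One point worth checking when slicing is the end bound: positional slices leave it out, while label slices through loc include it. A minimal sketch on the same data:

```python
import pandas as pd

d = {"one": pd.Series([1, 2, 3], index=["a", "b", "c"]),
     "two": pd.Series([1, 2, 3, 4], index=["a", "b", "c", "d"])}
df = pd.DataFrame(d)

# Positional slice: rows at positions 1 and 2; position 3 is excluded.
by_position = df.iloc[1:3]
print(by_position)

# Label slice: rows 'b' through 'd', with 'd' included.
by_label = df.loc["b":"d"]
print(by_label)
```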

Adding Rows

Add new rows to a DataFrame with pandas.concat, which places the rows of the second frame after those of the first. (Older versions of this example use DataFrame.append, which was removed in pandas 2.0.)

Example

    import pandas as pd
    df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
    # Stack the rows of df2 below the rows of df
    df = pd.concat([df, df2])
    print(df)

Output:

       a  b
    0  1  2
    1  3  4
    0  5  6
    1  7  8
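When two frames are stacked, each keeps its own row labels, so labels can repeat. As a sketch, pandas.concat with ignore_index=True assigns fresh labels instead. The names first, second, stacked and renumbered are illustrative.

```python
import pandas as pd

first = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
second = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])

# Default behavior: original labels are kept, so 0 and 1 each appear twice.
stacked = pd.concat([first, second])
print(stacked)

# ignore_index=True renumbers the rows from 0.
renumbered = pd.concat([first, second], ignore_index=True)
print(renumbered)
```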

Deleting Rows

Use an index label to delete, or drop, rows from a DataFrame. If the label is duplicated, multiple rows are dropped.
In the example above, the labels are duplicated. Let us drop one label and see how many rows are removed.

Example

    import pandas as pd
    df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
    df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])
    df = pd.concat([df, df2])
    # Drop rows with label 0
    df = df.drop(0)
    print(df)

Output:

       a  b
    1  3  4
    1  7  8

In the example above, two rows were dropped because both carry the same label 0.
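If only one of the duplicated rows should go, one option is to make the labels unique first. The sketch below uses reset_index(drop=True) for that; the variable names are hypothetical.

```python
import pandas as pd

first = pd.DataFrame([[1, 2], [3, 4]], columns=["a", "b"])
second = pd.DataFrame([[5, 6], [7, 8]], columns=["a", "b"])
combined = pd.concat([first, second])

# Label 0 occurs twice, so both of those rows are removed.
dropped_by_label = combined.drop(0)
print(dropped_by_label)

# After renumbering, label 0 refers to exactly one row.
unique = combined.reset_index(drop=True)
dropped_one = unique.drop(0)
print(dropped_one)
```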
