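#!/usr/bin/python3

import pandas as pd


def process():
    # An empty DataFrame has no rows and no columns
    print("create empty dataframe")
    df = pd.DataFrame()
    print(df)
    print('\n')

    # Row-wise: each inner list is one row; column names are passed separately
    print("create dataframe by using list")
    df = pd.DataFrame(data=[[1, 2], [3, 4]], columns=['A', 'B'])
    print(df)
    print('\n')

    # Column-wise: each dict key is a column name, each value is that column's data
    print("create dataframe by using dictionary")
    df = pd.DataFrame(data={'A': [1, 3], 'B': [2, 4]})
    print(df)
    print('\n')


def main():
    process()


if __name__ == "__main__":
    main()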

Output
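The printed output itself is not reproduced in the text. As a quick check of what the three constructions produce, here is a minimal sketch; it only uses the same constructor calls as the code above and compares the list-based and dict-based results with `DataFrame.equals`.

```python
import pandas as pd

# Empty DataFrame: no rows, no columns
empty = pd.DataFrame()
print(empty.shape)  # (0, 0)

# Row-wise construction: a list of rows plus column names
from_list = pd.DataFrame(data=[[1, 2], [3, 4]], columns=["A", "B"])

# Column-wise construction: a dict of column name -> column values
from_dict = pd.DataFrame(data={"A": [1, 3], "B": [2, 4]})

print(from_list)
print(from_list.equals(from_dict))  # True: same shape, labels, and values
```

The list form describes the data row by row, the dict form column by column; both end up with a default 0-based index.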

reset_index

The function definition is given below. After manipulating a DataFrame (sorting, filtering, and so on), if the index has changed, it is generally a good idea to reset it with reset_index.

pandas.DataFrame.reset_index

DataFrame.reset_index(level=None, drop=False, inplace=False, col_level=0, col_fill='')

pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.reset_index.html
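Why resetting matters is easiest to see after a sort: each row keeps its old label, so label-based lookups (`.loc`) and position-based lookups (`.iloc`) no longer agree. A minimal sketch with a small hypothetical DataFrame (the `name`/`score` columns are made up for illustration):

```python
import pandas as pd

# Hypothetical data, for illustration only
df = pd.DataFrame({"name": ["c", "a", "b"], "score": [30, 10, 20]})

# Sorting keeps each row's original index label
df_sorted = df.sort_values("score")
print(df_sorted.index.tolist())   # [1, 2, 0]

# .loc looks up by label: label 0 is now the LAST row
print(df_sorted.loc[0, "name"])   # c
# .iloc looks up by position: position 0 is the first row
print(df_sorted.iloc[0]["name"])  # a

# After reset_index(drop=True), labels and positions line up again
df_sorted = df_sorted.reset_index(drop=True)
print(df_sorted.loc[0, "name"])   # a
```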

Of its parameters, inplace and drop are the ones used most often, and inplace=True, drop=True seems to be the most common combination.

original data

filtered data

- Filter only the records where columns B and C are both 1.

- The filtered result keeps the original index labels 0, 2, and 6, so a new index needs to be assigned.
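The author's sample.csv is not shown, so here is a minimal sketch with hypothetical data (columns `KEY`, `B`, `C`), chosen so that the matching rows fall at labels 0, 2, and 6 as described above. It shows that a boolean filter keeps the original labels rather than renumbering.

```python
import pandas as pd

# Hypothetical stand-in for sample.csv (not the author's actual data)
df = pd.DataFrame({
    "KEY": ["k1", "k1", "k2", "k2", "k3", "k3", "k4", "k4"],
    "B":   [1, 0, 1, 1, 0, 1, 1, 0],
    "C":   [1, 1, 1, 0, 0, 0, 1, 1],
})

# Keep rows where B and C are both 1; the surviving rows keep their old labels
df_filtered = df[(df["B"] == 1) & (df["C"] == 1)]
print(df_filtered.index.tolist())  # [0, 2, 6]
```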

inplace=True

- Returned value: none (reset_index returns None).

- Original df: the index is replaced with a new one, and the old index is kept in a new column named index.

inplace=False

- Returned df: a new object whose index is replaced with a new one, with the old index kept in a column named index.

- Original df: unchanged.
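The two inplace variants can be checked side by side with a minimal sketch on hypothetical data (not the author's sample.csv). The `.copy()` before the in-place call is an addition, not part of the original code: depending on the pandas version, modifying a filtered result in place may emit a SettingWithCopyWarning, and an explicit copy makes it clear the change applies to an independent DataFrame.

```python
import pandas as pd

# Hypothetical data standing in for sample.csv
df = pd.DataFrame({
    "KEY": ["k1", "k1", "k2", "k2", "k3", "k3", "k4", "k4"],
    "B":   [1, 0, 1, 1, 0, 1, 1, 0],
    "C":   [1, 1, 1, 0, 0, 0, 1, 1],
})
mask = (df["B"] == 1) & (df["C"] == 1)

# inplace=False (the default): a new DataFrame is returned
filtered = df[mask]
new_df = filtered.reset_index(inplace=False)
print(new_df.columns.tolist())   # ['index', 'KEY', 'B', 'C']
print(new_df["index"].tolist())  # [0, 2, 6]
print(filtered.index.tolist())   # [0, 2, 6]  (filtered itself is unchanged)

# inplace=True: the object is modified and the call returns None.
# .copy() turns the filtered rows into an explicit, independent DataFrame first.
filtered_copy = df[mask].copy()
ret = filtered_copy.reset_index(inplace=True)
print(ret)                             # None
print(filtered_copy.columns.tolist())  # ['index', 'KEY', 'B', 'C']
print(filtered_copy.index.tolist())    # [0, 1, 2]
```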

inplace=True, drop=True

- Returned value: none (reset_index returns None).

- Original df: the index is replaced with a new one, and the old index is discarded instead of being added as a column.

inplace=False, drop=True

- Returned df: a new object with a new index; the old index is discarded instead of being added as a column.

- Original df: unchanged.
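A minimal sketch of the two drop=True variants, again on hypothetical data rather than the author's sample.csv; the `.copy()` before the in-place call is an added precaution, not part of the original code.

```python
import pandas as pd

# Hypothetical data standing in for sample.csv
df = pd.DataFrame({
    "KEY": ["k1", "k1", "k2", "k2", "k3", "k3", "k4", "k4"],
    "B":   [1, 0, 1, 1, 0, 1, 1, 0],
    "C":   [1, 1, 1, 0, 0, 0, 1, 1],
})
mask = (df["B"] == 1) & (df["C"] == 1)

# inplace=False, drop=True: a new DataFrame, old labels thrown away
filtered = df[mask]
new_df = filtered.reset_index(drop=True)
print(new_df.columns.tolist())   # ['KEY', 'B', 'C']  (no 'index' column)
print(new_df.index.tolist())     # [0, 1, 2]
print(filtered.index.tolist())   # [0, 2, 6]  (unchanged)

# inplace=True, drop=True: modifies the copy itself and returns None
filtered_copy = df[mask].copy()
ret = filtered_copy.reset_index(inplace=True, drop=True)
print(ret)                             # None
print(filtered_copy.columns.tolist())  # ['KEY', 'B', 'C']
print(filtered_copy.index.tolist())    # [0, 1, 2]
```

Both variants end in the same data; the assignment form (`new_df = ....reset_index(drop=True)`) has the advantage that it can be chained directly after a filter.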

Code

This is the code I ran.

#!/usr/bin/python3

import pandas as pd

df = None


def load_data():
    global df
    df = pd.read_csv("./sample.csv")


def process():
    # Combine both conditions in one mask; each comparison needs parentheses
    mask = (df['B'] == 1) & (df['C'] == 1)

    print("original df")
    print(df)
    print('\n')

    print("filtered df")
    df_filtered = df[mask]
    print(df_filtered)
    print('\n')

    print("inplace=True")
    # .copy() makes an independent DataFrame before modifying it in place
    df_filtered = df[mask].copy()
    df_rst = df_filtered.reset_index(inplace=True)
    print(df_rst)        # None: inplace=True returns nothing
    print(df_filtered)   # new index; old index kept in the 'index' column
    print('\n')

    print("inplace=False")
    df_filtered = df[mask]
    df_rst = df_filtered.reset_index(inplace=False)
    print(df_rst)        # new DataFrame with the old index in the 'index' column
    print(df_filtered)   # unchanged
    print('\n')

    print("inplace=True, drop=True")
    df_filtered = df[mask].copy()
    df_rst = df_filtered.reset_index(inplace=True, drop=True)
    print(df_rst)        # None
    print(df_filtered)   # new index; old index discarded
    print('\n')

    print("inplace=False, drop=True")
    df_filtered = df[mask]
    df_rst = df_filtered.reset_index(inplace=False, drop=True)
    print(df_rst)        # new DataFrame; old index discarded
    print(df_filtered)   # unchanged
    print('\n')


def main():
    load_data()
    process()


if __name__ == "__main__":
    main()

Basic Filtering

In a DataFrame, you often want to extract only the records whose column values meet certain conditions.

The basics of filtering are summarized below through a scenario.

Scenario

From the original DataFrame, for every KEY that has at least one record where columns B and C are both 1, output all of that KEY's records.

original dataframe

filtered dataframe

From the records where columns B and C are both 1, build a list of the unique KEY values.
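A minimal sketch of this step, using a small hypothetical DataFrame in place of the author's sample.csv. `Series.unique()` returns the distinct values in order of first appearance.

```python
import pandas as pd

# Hypothetical stand-in for sample.csv
df = pd.DataFrame({
    "KEY": ["k1", "k1", "k2", "k2", "k3", "k3", "k4", "k4"],
    "B":   [1, 0, 1, 1, 0, 1, 1, 0],
    "C":   [1, 1, 1, 0, 0, 0, 1, 1],
})

# Records where B and C are both 1 (labels 0, 2, 6 here)
df_filtered = df[(df["B"] == 1) & (df["C"] == 1)]

# Distinct KEY values among those records, in order of first appearance
unique_key = df_filtered["KEY"].unique()
print(list(unique_key))  # ['k1', 'k2', 'k4']
```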

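※ If you need more filter conditions, you can keep chaining square brackets [], e.g. df[cond1][cond2], but combining them inside a single bracket with &, e.g. df[(cond1) & (cond2)], is more idiomatic and avoids pandas' UserWarning that the boolean key will be reindexed.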
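To compare the two ways of adding conditions, here is a minimal sketch with a small hypothetical DataFrame (columns KEY, B, C; not the author's sample.csv). Both give the same rows; the difference is how the second mask is aligned.

```python
import pandas as pd

# Hypothetical data for illustration
df = pd.DataFrame({
    "KEY": ["k1", "k1", "k2", "k2", "k3", "k3", "k4", "k4"],
    "B":   [1, 0, 1, 1, 0, 1, 1, 0],
    "C":   [1, 1, 1, 0, 0, 0, 1, 1],
})

# Chained brackets: the second mask is computed on df (8 rows) but applied to
# the 5-row result of the first filter, so pandas realigns it by index label
# and emits a UserWarning about reindexing the boolean key.
chained = df[df["B"] == 1][df["C"] == 1]

# One bracket: both masks come from df, so they line up row for row.
# The parentheses are required because & binds more tightly than ==.
combined = df[(df["B"] == 1) & (df["C"] == 1)]

print(chained.equals(combined))  # True
print(combined.index.tolist())   # [0, 2, 6]
```

The same pattern works with `|` for OR and `~` for NOT.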

result dataframe

Output all records whose KEY is one of these unique KEY values.

After building the final data, always call reset_index.
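A minimal sketch of this last step with hypothetical data (not the author's sample.csv): `Series.isin` keeps every record whose KEY is in the list, and `reset_index(drop=True)` renumbers the result from 0.

```python
import pandas as pd

# Hypothetical stand-in for sample.csv
df = pd.DataFrame({
    "KEY": ["k1", "k1", "k2", "k2", "k3", "k3", "k4", "k4"],
    "B":   [1, 0, 1, 1, 0, 1, 1, 0],
    "C":   [1, 1, 1, 0, 0, 0, 1, 1],
})

# Unique KEYs that have at least one record with B == 1 and C == 1
unique_key = df.loc[(df["B"] == 1) & (df["C"] == 1), "KEY"].unique()

# All records for those KEYs, renumbered from 0
df_rst = df[df["KEY"].isin(unique_key)].reset_index(drop=True)
print(df_rst)
print(df_rst.index.tolist())   # [0, 1, 2, 3, 4, 5]
print(df_rst["KEY"].tolist())  # ['k1', 'k1', 'k2', 'k2', 'k4', 'k4']
```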

Code

#!/usr/bin/python3

import pandas as pd

df = None


def load_data():
    global df
    df = pd.read_csv("./sample.csv")


def process():
    print("original df")
    print(df)
    print('\n')

    print("filtered df")
    # Records where B and C are both 1
    df_filtered = df[(df['B'] == 1) & (df['C'] == 1)]
    print(df_filtered)
    print('\n')

    print("unique KEY")
    # Distinct KEY values among the matching records
    unique_key = df_filtered['KEY'].unique()
    print(unique_key)
    print('\n')

    print("result df")
    # Every record whose KEY is one of the matching keys, renumbered from 0
    df_rst = df[df['KEY'].isin(unique_key)].reset_index(drop=True)
    print(df_rst)
    print('\n')


def main():
    load_data()
    process()


if __name__ == "__main__":
    main()
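As an alternative to the unique-then-isin approach in the code above (a different technique, not the author's method), the same scenario can be expressed with `groupby(...).transform("any")`, which marks every row whose KEY group contains at least one match. A minimal sketch on hypothetical data:

```python
import pandas as pd

# Hypothetical stand-in for sample.csv
df = pd.DataFrame({
    "KEY": ["k1", "k1", "k2", "k2", "k3", "k3", "k4", "k4"],
    "B":   [1, 0, 1, 1, 0, 1, 1, 0],
    "C":   [1, 1, 1, 0, 0, 0, 1, 1],
})

# True for rows where both B and C are 1
match = (df["B"] == 1) & (df["C"] == 1)

# For each KEY, broadcast "does any row in this group match?" back to every row
key_has_match = match.groupby(df["KEY"]).transform("any")

df_rst = df[key_has_match].reset_index(drop=True)
print(df_rst["KEY"].tolist())  # ['k1', 'k1', 'k2', 'k2', 'k4', 'k4']
```

This avoids building the intermediate KEY list; the isin version is arguably easier to read step by step.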

Translation

(To be continued)


책 읽는 개발자: a space mostly about an ordinary office worker's daily life, reading, and SW development. aiemag.lhh@gmail.com
