How to select rows in a DataFrame between two values in Python Pandas?

lovecheck 2023. 7. 6. 22:19

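I am trying to modify a DataFrame df so that it only contains the rows whose values in the column closing_price are between 99 and 101, and I am trying to do this with the code below.

However, I get the error

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

and I am wondering whether there is a way to do this without using loops.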

df = df[(99 <= df['closing_price'] <= 101)]
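The error follows from how Python evaluates a chained comparison: `a <= x <= b` is shorthand for `(a <= x) and (x <= b)`, and `and` needs a single True/False from the first Series. The sketch below reproduces this on made-up prices; the data and the helper name `chained_comparison_fails` are illustrative assumptions, not part of the question.

```python
import pandas as pd

# Illustrative data (assumption), not taken from the question.
prices = pd.Series([98, 99, 100, 101, 102], name="closing_price")

def chained_comparison_fails(s):
    """Return the message of the ValueError raised by a chained comparison, or None."""
    try:
        99 <= s <= 101  # evaluated as (99 <= s) and (s <= 101)
    except ValueError as exc:
        return str(exc)
    return None

message = chained_comparison_fails(prices)
print(message)

# Each half on its own is fine and yields an element-wise boolean Series.
lower_ok = 99 <= prices
upper_ok = prices <= 101
print((lower_ok & upper_ok).tolist())
```

Combining the two halves with `&` instead of `and` keeps the comparison element-wise, which is what the answers below rely on.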

Consider also Series.between:

df = df[df['closing_price'].between(99, 101)]
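As a small sketch of what this returns, assuming a made-up frame: `Series.between` builds a boolean mask that includes both endpoints by default, and indexing with that mask keeps the matching rows.

```python
import pandas as pd

# Illustrative data (assumption), not taken from the question.
df = pd.DataFrame({"closing_price": [98, 99, 100, 101, 102]})

# Both endpoints (99 and 101) are included by default.
mask = df["closing_price"].between(99, 101)
selected = df[mask]
print(selected)
```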

You should use () to group your boolean vectors to remove the ambiguity.

df = df[(df['closing_price'] >= 99) & (df['closing_price'] <= 101)]

There is a nicer alternative - use the query() method:

In [58]: df = pd.DataFrame({'closing_price': np.random.randint(95, 105, 10)})

In [59]: df
Out[59]:
   closing_price
0            104
1             99
2             98
3             95
4            103
5            101
6            101
7             99
8             95
9             96

In [60]: df.query('99 <= closing_price <= 101')
Out[60]:
   closing_price
1             99
5            101
6            101
7             99

UPDATE: answering the comment:

I like the syntax here but fell down when trying to combine it with an expression; df.query('(mean + 2 *sd) <= closing_price <=(mean + 2 *sd)')

In [161]: qry = "(closing_price.mean() - 2*closing_price.std())" +\
     ...:       " <= closing_price <= " + \
     ...:       "(closing_price.mean() + 2*closing_price.std())"
     ...:

In [162]: df.query(qry)
Out[162]:
   closing_price
0             97
1            101
2             97
3             95
4            100
5             99
6            100
7            101
8             99
9             95

newdf = df.query('closing_price.mean() <= closing_price <= closing_price.std()')

or

mean = df['closing_price'].mean()
std = df['closing_price'].std()

newdf = df.query('@mean <= closing_price <= @std')
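The `@` prefix lets `query()` read local Python variables, so the bounds can be computed once outside the query string. A minimal sketch, assuming made-up prices and a band of mean ± 2 standard deviations (the names `lower` and `upper` are illustrative):

```python
import pandas as pd

# Illustrative data (assumption); 130 is a deliberate outlier.
df = pd.DataFrame({"closing_price": [97, 101, 97, 95, 100, 99, 100, 101, 99, 130]})

mean = df["closing_price"].mean()
std = df["closing_price"].std()
lower, upper = mean - 2 * std, mean + 2 * std

# @lower and @upper refer to the local variables defined above.
newdf = df.query("@lower <= closing_price <= @upper")

# The same selection written with an explicit boolean mask, for comparison.
expected = df[(df["closing_price"] >= lower) & (df["closing_price"] <= upper)]
print(newdf)
```

Computing the bounds beforehand also avoids recomputing `mean()` and `std()` inside the query expression.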

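If one has to call pd.Series.between(l, r) repeatedly (for different bounds l and r), a lot of work is repeated unnecessarily. In this case, it is beneficial to sort the frame/series once and then use pd.Series.searchsorted(). I measured a speedup of up to 25x, see below.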

def between_indices(x, lower, upper, inclusive=True):
    """
    Returns positions (i, j) such that x.iloc[i:j] contains exactly the
    values with lower <= x[k] <= upper, under the assumption that x is sorted.
    """
    i = x.searchsorted(lower, side="left" if inclusive else "right")
    j = x.searchsorted(upper, side="right" if inclusive else "left")
    return i, j

# Sort x once before repeated calls of between()
x = x.sort_values().reset_index(drop=True)
# x = x.sort_values(ignore_index=True) # for pandas>=1.0
ret1 = between_indices(x, lower=0.1, upper=0.9)
ret2 = between_indices(x, lower=0.2, upper=0.8)
ret3 = ...
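To make the returned positions concrete, here is a small sketch on made-up values. It repeats the function so that it runs on its own; the sample Series is an assumption for illustration.

```python
import pandas as pd

def between_indices(x, lower, upper, inclusive=True):
    # Assumption: x is sorted in ascending order.
    i = x.searchsorted(lower, side="left" if inclusive else "right")
    j = x.searchsorted(upper, side="right" if inclusive else "left")
    return i, j

# Illustrative data (assumption), sorted once up front.
x = pd.Series([0.5, 0.1, 0.9, 0.3, 0.7, 0.2])
x = x.sort_values().reset_index(drop=True)  # 0.1, 0.2, 0.3, 0.5, 0.7, 0.9

# Inclusive bounds: j is one past the last matching position.
i, j = between_indices(x, lower=0.2, upper=0.7)
inside = x.iloc[i:j]

# Exclusive bounds drop the endpoints themselves.
i2, j2 = between_indices(x, lower=0.2, upper=0.7, inclusive=False)
strictly_inside = x.iloc[i2:j2]

print(inside.tolist(), strictly_inside.tolist())
```

Because the result is a contiguous slice of the sorted Series, no boolean mask over all elements is needed for each new pair of bounds.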

Benchmark

Measure repeated evaluations (n_reps=100) of pd.Series.between() as well as of the method based on pd.Series.searchsorted(), for different arguments lower and upper. On my MacBook Pro 2015 with Python v3.8.0 and Pandas v1.0.3, the code below produces the following output.

# pd.Series.searchsorted()
# 5.87 ms ± 321 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
# pd.Series.between(lower, upper)
# 155 ms ± 6.08 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
# Logical expressions: (x>=lower) & (x<=upper)
# 153 ms ± 3.52 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

import numpy as np
import pandas as pd

def between_indices(x, lower, upper, inclusive=True):
    # Assumption: x is sorted.
    i = x.searchsorted(lower, side="left" if inclusive else "right")
    j = x.searchsorted(upper, side="right" if inclusive else "left")
    return i, j

def between_fast(x, lower, upper, inclusive=True):
    """
    Equivalent to pd.Series.between() under the assumption that x is sorted.
    """
    i, j = between_indices(x, lower, upper, inclusive)
    if True:
        return x.iloc[i:j]
    else:
        # Mask creation is slow.
        mask = np.zeros_like(x, dtype=bool)
        mask[i:j] = True
        mask = pd.Series(mask, index=x.index)
        return x[mask]

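def between(x, lower, upper, inclusive=True):
    # pandas>=1.3 expects inclusive="both"/"neither"; on older versions
    # (such as the 1.0.3 used for the timings above) pass the boolean directly.
    mask = x.between(lower, upper, inclusive="both" if inclusive else "neither")
    return x[mask]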

def between_expr(x, lower, upper, inclusive=True):
    if inclusive:
        mask = (x>=lower) & (x<=upper)
    else:
        mask = (x>lower) & (x<upper)
    return x[mask]

def benchmark(func, x, lowers, uppers):
    for l,u in zip(lowers, uppers):
        func(x,lower=l,upper=u)

n_samples = 1000
n_reps = 100
x = pd.Series(np.random.randn(n_samples))
# Sort the Series.
# For pandas>=1.0:
# x = x.sort_values(ignore_index=True)
x = x.sort_values().reset_index(drop=True)

# Assert equivalence of different methods.
assert(between_fast(x, 0, 1, True ).equals(between(x, 0, 1, True)))
assert(between_expr(x, 0, 1, True ).equals(between(x, 0, 1, True)))
assert(between_fast(x, 0, 1, False).equals(between(x, 0, 1, False)))
assert(between_expr(x, 0, 1, False).equals(between(x, 0, 1, False)))

# Benchmark repeated evaluations of between().
uppers = np.linspace(0, 3, n_reps)
lowers = -uppers
%timeit benchmark(between_fast, x, lowers, uppers)
%timeit benchmark(between, x, lowers, uppers)
%timeit benchmark(between_expr, x, lowers, uppers)

Instead of this

df = df[(99 <= df['closing_price'] <= 101)]

you should use this

df = df[(df['closing_price']>=99 ) & (df['closing_price']<=101)]

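We have to use NumPy's bitwise logic operators |, &, ~, ^ to build compound queries. The parentheses also matter because of operator precedence.

For more information, see the link: Comparisons, Masks, and Boolean Logic.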
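A short sketch of the four operators on made-up prices, plus what happens when the parentheses are dropped. Since `&` binds more tightly than `>=` and `<=`, the unparenthesized form turns into a chained comparison and raises the same ValueError as in the question. The data and variable names are illustrative assumptions.

```python
import pandas as pd

# Illustrative data (assumption).
df = pd.DataFrame({"closing_price": [95, 99, 100, 101, 104]})
p = df["closing_price"]

inside = df[(p >= 99) & (p <= 101)]          # AND: within [99, 101]
outside = df[(p < 99) | (p > 101)]           # OR: below 99 or above 101
not_inside = df[~((p >= 99) & (p <= 101))]   # NOT: complement of the range
exactly_one = df[(p >= 99) ^ (p >= 101)]     # XOR: only one condition holds

# Without parentheses this parses as p >= (99 & p) <= 101,
# a chained comparison, which raises ValueError.
try:
    df[p >= 99 & p <= 101]
    precedence_error = None
except ValueError as exc:
    precedence_error = str(exc)

print(inside["closing_price"].tolist(), precedence_error)
```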

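If you are dealing with multiple values and multiple inputs, you can also set up an apply function like this. In this case it filters a DataFrame for GPS locations that fall within certain ranges.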

def filter_values(lat,lon):
    if abs(lat - 33.77) < .01 and abs(lon - -118.16) < .01:
        return True
    elif abs(lat - 37.79) < .01 and abs(lon - -122.39) < .01:
        return True
    else:
        return False


df = df[df.apply(lambda x: filter_values(x['lat'],x['lon']),axis=1)]
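Because `apply` with `axis=1` calls the Python function once per row, a column-wise mask may be worth considering for larger frames. The following is a sketch of that alternative, not the answer's own method: it reuses the two centers and the 0.01 tolerance from `filter_values`, while the sample rows and the names `centers`, `tol`, and `filtered` are assumptions.

```python
import pandas as pd

# Illustrative rows (assumption): two near the centers, two elsewhere.
df = pd.DataFrame({
    "lat": [33.771, 37.795, 40.0, 33.90],
    "lon": [-118.161, -122.388, -74.0, -118.16],
})

# Centers and tolerance copied from filter_values above.
centers = [(33.77, -118.16), (37.79, -122.39)]
tol = 0.01

# Start with an all-False mask and OR in one vectorized condition per center.
mask = pd.Series(False, index=df.index)
for lat0, lon0 in centers:
    mask |= ((df["lat"] - lat0).abs() < tol) & ((df["lon"] - lon0).abs() < tol)

filtered = df[mask]
print(filtered)
```

The loop here runs once per center rather than once per row, so its cost does not grow with the number of rows.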

Reference URL: https://stackoverflow.com/questions/31617845/how-to-select-rows-in-a-dataframe-between-two-values-in-python-pandas
