Advanced Project Day 2 - TIL


이준민1 2024. 6. 18. 20:07

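The biggest problem on today's advanced machine learning project was that our EDA did not produce any meaningful visualizations.

When I asked the tutor about it, they suggested that instead of trying to draw meaningful insights out of EDA, we should train machine learning models and keep improving the hyperparameters, the models, and the preprocessing until the evaluation results become meaningful. We decided to go with that.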
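Iterating on hyperparameters can be automated instead of done by hand. Below is a minimal sketch using scikit-learn's `GridSearchCV` with a `DecisionTreeRegressor`, assuming synthetic data in place of the project's DataFrame; the grid values are illustrative assumptions, not values tuned for this project.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data (hypothetical; the project's real features would go here)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + rng.normal(0, 0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Illustrative search space (assumed values)
param_grid = {
    "max_depth": [2, 4, 6, 8, None],
    "min_samples_leaf": [1, 5, 10],
}

# 5-fold cross-validation on the training set, scored by R^2;
# the best combination is refit on the whole training set afterwards
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    scoring="r2",
    cv=5,
)
search.fit(X_train, y_train)

print("best params:", search.best_params_)
print("best CV R^2:", search.best_score_)
print("test R^2:", search.score(X_test, y_test))
```

The held-out test set is only touched once at the end, so the reported test R² is not biased by the search itself.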

Model training approach: trim the data by repeatedly dropping the least relevant feature (backward feature elimination).
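A minimal sketch of that feature-dropping loop, assuming a fully numeric DataFrame and using the mean 5-fold cross-validated R² of a `LinearRegression` as the relevance criterion. The function `backward_eliminate` and the demo columns `a`, `b`, `c`, `d` are hypothetical and only illustrate the idea.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def backward_eliminate(df, target, min_features=1, cv=5):
    """Hypothetical helper: repeatedly drop the feature whose removal
    leaves the highest cross-validated R^2 (i.e. the least relevant one)."""
    features = [c for c in df.columns if c != target]
    y = df[target]
    history = []
    while len(features) > min_features:
        scores = {}
        for f in features:
            remaining = [c for c in features if c != f]
            scores[f] = cross_val_score(
                LinearRegression(), df[remaining], y, cv=cv, scoring="r2"
            ).mean()
        drop = max(scores, key=scores.get)  # removing this one hurts the least
        features.remove(drop)
        history.append((drop, scores[drop]))
    return features, history


# Synthetic demo data: target depends on a and b; c and d are pure noise
rng = np.random.default_rng(0)
n = 200
demo = pd.DataFrame({
    "a": rng.normal(size=n),
    "b": rng.normal(size=n),
    "c": rng.normal(size=n),
    "d": rng.normal(size=n),
})
demo["target"] = 3 * demo["a"] + 2 * demo["b"] + rng.normal(scale=0.5, size=n)

kept, history = backward_eliminate(demo, "target", min_features=2)
print("kept:", kept)
for dropped, score in history:
    print(f"dropped {dropped}: CV R^2 = {score:.3f}")
```

scikit-learn also provides this strategy as `SequentialFeatureSelector(estimator, direction="backward", n_features_to_select=...)` in `sklearn.feature_selection`, which may be preferable for larger feature sets.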

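My role today was to analyze the correlations between columns.

I came up with a way to do it using for loops: for every pair of columns, train a model that predicts one column from the other, once with LinearRegression and once with DecisionTreeRegressor, and compare the scores.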
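Running both models is worthwhile because a single-feature linear regression only detects linear relationships, while a decision tree can also follow non-linear ones. A small illustration on hypothetical synthetic data (not from the project), where y depends on x quadratically:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hypothetical data: a symmetric, non-linear relationship y ≈ x^2
rng = np.random.default_rng(42)
x = rng.uniform(-3, 3, size=(500, 1))
y = x[:, 0] ** 2 + rng.normal(0, 0.3, size=500)

X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=42)

# Same pairwise setup as the loops: one feature in, one column out
lin_r2 = r2_score(y_test, LinearRegression().fit(X_train, y_train).predict(X_test))
tree_r2 = r2_score(
    y_test,
    DecisionTreeRegressor(max_depth=4, random_state=42).fit(X_train, y_train).predict(X_test),
)

print(f"LinearRegression R^2:      {lin_r2:.3f}")   # near 0: there is no linear trend
print(f"DecisionTreeRegressor R^2: {tree_r2:.3f}")  # high: the tree follows the curve
```

A pair where the tree scores well but the linear model does not hints at a non-linear dependency that a plain correlation coefficient would miss.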

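import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# df is the project's DataFrame, assumed to be loaded earlier
columns = df.select_dtypes(include="number").columns  # assumption: compare all numeric columns

# Evaluate a linear regression model using each column as X and each other column as y
for x_column in columns:
    for y_column in columns:
        if x_column != y_column:  # run only when X and y are different columns
            X = df[[x_column]]
            y = df[y_column]

            # Split into training and test data (70% train, 30% test)
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

            # Create and train the linear regression model
            model = LinearRegression()
            model.fit(X_train, y_train)

            # Predict on the test data
            y_pred = model.predict(X_test)

            # Evaluate performance
            mse = mean_squared_error(y_test, y_pred)
            mae = mean_absolute_error(y_test, y_pred)
            r2 = r2_score(y_test, y_pred)
            rmse = np.sqrt(mse)

            # Print the results
            print(f"X: {x_column}, y: {y_column}")
            print(f"Mean Squared Error (MSE): {mse}")
            print(f"Mean Absolute Error (MAE): {mae}")
            print(f"R^2 Score: {r2}")
            print(f"Root Mean Squared Error (RMSE): {rmse}")
            print("-" * 50)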
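For a single feature, linear regression R² is directly tied to the Pearson correlation: when the model is fit and scored on the same rows, R² equals r². So pandas' `DataFrame.corr()` gives the whole linear-correlation matrix in one call. A sketch on a hypothetical DataFrame `df_demo` (made-up columns and data) comparing the two:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: y depends linearly on x, z is unrelated
rng = np.random.default_rng(0)
n = 300
df_demo = pd.DataFrame({"x": rng.normal(size=n)})
df_demo["y"] = 2 * df_demo["x"] + rng.normal(size=n)
df_demo["z"] = rng.normal(size=n)

# Full pairwise Pearson correlation matrix in one call
corr = df_demo.corr()
print(corr.round(3))

# In-sample R^2 of a single-feature linear regression equals r^2
model = LinearRegression().fit(df_demo[["x"]], df_demo["y"])
r2_in_sample = model.score(df_demo[["x"]], df_demo["y"])
print(f"R^2 = {r2_in_sample:.4f}, r^2 = {corr.loc['x', 'y'] ** 2:.4f}")
```

The R² printed by the loop differs somewhat from r² because it is computed on held-out test rows; a negative test R² simply means the fitted line predicts worse than the mean of the test targets.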

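from sklearn.tree import DecisionTreeRegressor

# Evaluate a decision tree regressor using each column as X and each other column as y
for x_column in columns:
    for y_column in columns:
        if x_column != y_column:  # run only when X and y are different columns
            X = df[[x_column]]
            y = df[y_column]

            # Split into training and test data (80% train, 20% test);
            # note: the linear regression loop used a 30% test set, so the splits differ
            X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

            # Create and train the Decision Tree Regressor model
            model = DecisionTreeRegressor(random_state=42)
            model.fit(X_train, y_train)

            # Predict on the test data
            y_pred = model.predict(X_test)

            # Evaluate the model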