One of the nice things about XGBoost is that you can use a function you define yourself as the objective function.

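When I tried this in the past, though, I got stuck on what kind of function I was supposed to write, so I'm leaving a memo here. (The detailed procedure is described in the XGBoost documentation.)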

Functions you need to define

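Gradient boosted decision tree algorithms such as XGBoost use two kinds of loss functions:

an evaluation function (metric) used for early stopping¹
an objective function (objective) used to train each individual tree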

Metric function

A metric is something like the sum of squared errors (SSE)

L(\theta) = \sum_i (\hat{y}_i-y_i)^2

or the mean squared error (MSE): the same kind of function you often see with other machine learning algorithms.

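In XGBoost, you define a function that takes the array of predictions and the DMatrix holding the true labels, and returns the metric's name (a string) together with its value (a float). For SSE it looks like this: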

import numpy as np
import xgboost as xgb

def my_sse(y_hat: np.ndarray, dtrain: xgb.DMatrix) -> tuple[str, float]:
    """custom metric: sum of squared error"""
    y = dtrain.get_label()
    error = np.sum((y_hat - y)**2)
    return "sse", float(error)

Objective function

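In XGBoost, the objective function is implemented as a function that takes the array of predictions and the DMatrix holding the true labels, and returns the first- and second-order derivatives (gradient and hessian) of the per-sample loss with respect to each prediction.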

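When the metric is SSE, the starting point for the objective is the squared error before summation. Multiplying it by 1/2 to simplify the derivatives gives

L(\theta) = \frac{1}{2}(\hat{y}_i-y_i)^2

Differentiating with respect to each \hat{y}_i and stacking the results into vectors, the first- and second-order gradients are

\frac{\partial L}{\partial \hat{\boldsymbol{y}}} = \hat{\boldsymbol{y}} - \boldsymbol{y}

\frac{\partial^2 L}{\partial \hat{\boldsymbol{y}}^2} = \begin{pmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{pmatrix}

The second-order term is the diagonal of the Hessian; since the loss is a sum of per-sample terms, these per-sample values are all XGBoost needs.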
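To double-check the derivation numerically, the sketch below compares central finite differences of the per-sample loss with the closed forms. The helper squared_error_loss, the toy arrays and eps are arbitrary choices for illustration. Perturbing all elements at once is fine here because each output element depends only on its own prediction.

```python
import numpy as np


def squared_error_loss(y_hat: np.ndarray, y: np.ndarray) -> np.ndarray:
    """per-sample loss (1/2)(y_hat - y)^2, returned elementwise"""
    return 0.5 * (y_hat - y) ** 2


y = np.array([1.0, 2.0, 3.0])
y_hat = np.array([1.5, 1.0, 3.0])
eps = 1e-3

# central differences for the first and second derivative of each sample's loss
grad_fd = (squared_error_loss(y_hat + eps, y) - squared_error_loss(y_hat - eps, y)) / (2 * eps)
hess_fd = (squared_error_loss(y_hat + eps, y)
           - 2 * squared_error_loss(y_hat, y)
           + squared_error_loss(y_hat - eps, y)) / eps**2

print(np.allclose(grad_fd, y_hat - y))             # -> True
print(np.allclose(hess_fd, np.ones_like(y_hat)))   # -> True
```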

def my_squared_error(y_hat: np.ndarray, dtrain: xgb.DMatrix) -> tuple[np.ndarray, np.ndarray]:
    """custom objective: squared error"""
    y = dtrain.get_label()
    N = y_hat.shape[0]
    gradient = y_hat - y    # first derivative of (1/2)(y_hat - y)^2
    hessian = np.ones(N)    # second derivative is the constant 1
    return gradient, hessian
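Why does XGBoost want the hessian as well as the gradient? According to XGBoost's "Introduction to Boosted Trees" documentation, the optimal weight of a leaf is w* = -G / (H + λ), where G and H are the sums of the gradients and hessians of the samples in that leaf. The sketch below is a simplified illustration for a single leaf, ignoring the learning rate (eta) and alpha; optimal_leaf_weight is a hypothetical helper, not an XGBoost API. With the squared-error gradient and hessian, the leaf value becomes the mean residual, shrunk toward 0 by λ.

```python
import numpy as np


def optimal_leaf_weight(gradient: np.ndarray, hessian: np.ndarray,
                        reg_lambda: float = 1.0) -> float:
    """hypothetical helper: w* = -G / (H + lambda) for the samples in one leaf"""
    return float(-gradient.sum() / (hessian.sum() + reg_lambda))


y = np.array([3.0, 5.0, 7.0])
y_hat = np.array([4.0, 4.0, 4.0])  # current predictions of the ensemble
gradient = y_hat - y               # same formula as my_squared_error
hessian = np.ones_like(y_hat)      # same formula as my_squared_error

print(optimal_leaf_weight(gradient, hessian, reg_lambda=0.0))  # -> 1.0 (mean residual)
print(optimal_leaf_weight(gradient, hessian, reg_lambda=1.0))  # -> 0.75 (shrunk toward 0)
```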

Usage example

Let's check that the custom squared error above works correctly by comparing it with XGBoost's built-in reg:squarederror.

# note: load_boston was removed in scikit-learn 1.2, so this example needs scikit-learn < 1.2
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import xgboost as xgb
import numpy as np


def my_sse(y_hat: np.ndarray, dtrain: xgb.DMatrix) -> tuple[str, float]:
    """custom metric: sum of squared error"""
    y = dtrain.get_label()
    error = np.sum((y_hat - y)**2)
    return "sse", float(error)


def my_squared_error(y_hat: np.ndarray, dtrain: xgb.DMatrix) -> tuple[np.ndarray, np.ndarray]:
    """custom objective: squared error"""
    y = dtrain.get_label()
    N = y_hat.shape[0]
    gradient = y_hat - y    # first derivative of (1/2)(y_hat - y)^2
    hessian = np.ones(N)    # second derivative is the constant 1
    return gradient, hessian


# load data
boston = load_boston()
X = boston["data"]
y = boston["target"]

# split data
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2, random_state=0)
# for early stopping
X_train, X_eval, y_train, y_eval = train_test_split(X_train, y_train,
                                                    test_size=0.2, random_state=0)

# make DMatrix data
dtrain = xgb.DMatrix(X_train, label=y_train)
deval = xgb.DMatrix(X_eval, label=y_eval)
dtest = xgb.DMatrix(X_test, label=y_test)

# the built-in metrics have rmse but no mse, so use rmse
params = {'objective': 'reg:squarederror', 'eval_metric': 'rmse'}
watchlist = [(deval, 'eval')]
n_trees = 100
# train with the built-in squared error
model1 = xgb.train(params, dtrain, num_boost_round=n_trees,
                   evals=watchlist, early_stopping_rounds=2)
# train with the custom objective and custom metric
model2 = xgb.train(params, dtrain, num_boost_round=n_trees,
                   evals=watchlist, early_stopping_rounds=2,
                   obj=my_squared_error, feval=my_sse)

# check whether the predictions match
pred1 = model1.predict(dtest)
print(f"head of pred1: {pred1[0:5]}")
pred2 = model2.predict(dtest)
print(f"head of pred2: {pred2[0:5]}")
n_match = int(np.sum(pred1 == pred2))
print(f"Matching predictions: {n_match} ({n_match / X_test.shape[0]:.0%})")

head of pred1: [23.791985 26.905416 23.237139 10.941677 21.959444]
head of pred2: [23.791985 26.905416 23.237139 10.941677 21.959444]
Matching predictions: 102 (100%)

The results match, so the custom squared error / MSE (RMSE) appears to be implemented correctly.
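One reason the two models can stop at the same round even though they monitor different metrics: on a fixed eval set, RMSE = sqrt(SSE / n) is a strictly increasing function of SSE, so the round with the smallest SSE is also the round with the smallest RMSE. (As far as I understand, when several metrics are evaluated XGBoost uses the last one for early stopping, which for model2 would be the custom sse.) The sketch below checks this with made-up per-round predictions; none of the numbers come from the actual training run above.

```python
import numpy as np

rng = np.random.default_rng(0)
y_eval = rng.normal(size=50)

# hypothetical eval-set predictions from five successive boosting rounds
preds_per_round = [y_eval + rng.normal(scale=s, size=50) for s in (1.0, 0.5, 0.3, 0.4, 0.6)]

sse = np.array([np.sum((p - y_eval) ** 2) for p in preds_per_round])
rmse = np.sqrt(sse / y_eval.size)

# the best round is identical whichever metric is monitored
print(int(np.argmin(sse)) == int(np.argmin(rmse)))  # -> True
```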