Linear Regression

Linear regression with scikit-learn

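import pandas as pd
from sklearn.linear_model import LinearRegression

# `df` (with 'Time' and 'NumVehicles' columns) and `plot_params` are assumed
# to have been defined earlier.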

# Training data
X = df.loc[:, ['Time']]  # features
y = df.loc[:, 'NumVehicles']  # target

# Train the model
model = LinearRegression()
model.fit(X, y)

# Store the fitted values as a time series with the same time index as
# the training data
y_pred = pd.Series(model.predict(X), index=X.index)

ax = y.plot(**plot_params)
ax = y_pred.plot(ax=ax, linewidth=3)
ax.set_title('Time Plot of Tunnel Traffic')

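scikit-learn expects the features as a 2-D matrix. So \(X\) must be two-dimensional (here a DataFrame, obtained by selecting with a list of columns, `['Time']`), while \(y\) is a 1-D vector (a Series).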
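The shape requirement can be checked on a small example. This is a minimal sketch: the toy `df` below is made up and only stands in for the real tunnel-traffic data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data standing in for the real frame
df = pd.DataFrame({
    "Time": np.arange(5),
    "NumVehicles": [10.0, 12.0, 14.0, 16.0, 18.0],
})

X = df.loc[:, ["Time"]]       # list of columns -> DataFrame, 2-D
y = df.loc[:, "NumVehicles"]  # single label    -> Series, 1-D
print(X.shape, y.shape)

# The 2-D X is accepted
model = LinearRegression().fit(X, y)

# A 1-D Series passed as X is rejected with a ValueError
try:
    LinearRegression().fit(df["Time"], y)
    raised = False
except ValueError:
    raised = True
print("1-D X rejected:", raised)
```

Selecting with `['Time']` (a list) rather than `'Time'` (a label) is what keeps \(X\) two-dimensional.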

Prediction

model.predict(X)

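This uses the previously fitted linear regression model `model` to predict from the input features X.

It returns a 1-D NumPy array whose length equals the number of samples (1-D because the model was fitted on a 1-D y). The array carries no index; it is just a sequence of numbers. Wrapping it in a pandas Series with `index=X.index` restores the time index, which makes later manipulation and plotting easier.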
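A small sketch of this conversion, using made-up data with a daily index (the values and dates are assumptions for illustration only):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data with a daily DatetimeIndex
idx = pd.date_range("2020-01-01", periods=4, freq="D")
X = pd.DataFrame({"Time": np.arange(4)}, index=idx)
y = pd.Series([1.0, 3.0, 5.0, 7.0], index=idx)

model = LinearRegression().fit(X, y)

raw = model.predict(X)                   # plain ndarray, no index
y_pred = pd.Series(raw, index=X.index)   # re-attach the time index

print(type(raw), raw.shape)
print(y_pred)
```

Because `y_pred` shares its index with `y`, the two can be plotted on the same axes or subtracted directly to get residuals.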

Trend

Series.rolling

series.rolling(window, min_periods=None, center=False).function()

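This defines a sliding window over a Series (or over each column of a DataFrame) and then computes a statistic on the data inside each window (mean, sum, max, standard deviation, and so on).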

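window: the window size, either an integer number of observations or a time offset such as '7D'
min_periods: the minimum number of non-missing values a window needs to produce a result; defaults to the window size for an integer window (and to 1 for an offset window)
center: whether the result is labelled at the center of the window (True) or at its right edge (False)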
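The effect of the three parameters is easiest to see on a tiny series. A minimal sketch (the five values are arbitrary):

```python
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# Default: result labelled at the right edge, full window required
trailing = s.rolling(window=3).mean()

# Centered: result labelled at the middle of the window
centered = s.rolling(window=3, center=True).mean()

# Centered, but a window with only 2 valid values is also accepted
relaxed = s.rolling(window=3, center=True, min_periods=2).mean()

print(pd.DataFrame({"trailing": trailing,
                    "centered": centered,
                    "relaxed": relaxed}))
```

With the defaults, the first two positions are NaN; centering moves one NaN to each end, and lowering `min_periods` fills the ends with averages over the shorter, partial windows.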

Computing the trend

moving_average = tunnel.rolling(
    window=365,       # 365-day window
    center=True,      # puts the average at the center of the window
    min_periods=183,  # choose about half the window size
).mean()
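Why `min_periods` is set to about half the window can be sketched on synthetic data. The 400-day series below is an assumption standing in for `tunnel`:

```python
import numpy as np
import pandas as pd

# Hypothetical daily series, 400 observations
idx = pd.date_range("2020-01-01", periods=400, freq="D")
series = pd.Series(np.arange(400, dtype=float), index=idx)

# Full window required: most of the series has no complete window
strict = series.rolling(window=365, center=True).mean()

# Half a window is enough: the edges get an average too
relaxed = series.rolling(window=365, center=True, min_periods=183).mean()

print("NaN with default min_periods:", strict.isna().sum())
print("NaN with min_periods=183:   ", relaxed.isna().sum())
```

A centered 365-day window reaches 182 days to each side, so even at the first and last date it still contains 183 observations. Setting `min_periods=183` therefore keeps the moving average defined over the whole series, at the cost of less smoothing near the edges.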

DeterministicProcess

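from statsmodels.tsa.deterministic import DeterministicProcess

dp = DeterministicProcess(
    index=tunnel.index,  # dates from the training data
    constant=True,       # include a constant column (dummy feature for the bias / intercept)
    order=1,             # polynomial order of the time dummy; 1 means a linear trend
    drop=True,           # drop terms if necessary to avoid collinearity
)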
X = dp.in_sample() # creates features for the dates given in the `index` argument

DeterministicProcess is a utility from statsmodels for generating deterministic features of a time series (trend, seasonality, and so on).

It takes care of:

the intercept term (bias)
polynomial trends (linear, quadratic, cubic, etc.)
avoiding collinearity problems

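This makes fitting a time series with linear regression safer and more stable, because collinear (redundant) features leave the fitted coefficients poorly determined.

In short, it generates deterministic sequences to be used as features.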
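To see what these features look like, here is a minimal sketch on a made-up five-day index (the dates are an assumption, not the tunnel data):

```python
import pandas as pd
from statsmodels.tsa.deterministic import DeterministicProcess

# Hypothetical five-day index
idx = pd.date_range("2020-01-01", periods=5, freq="D")

dp = DeterministicProcess(
    index=idx,
    constant=True,  # adds a `const` column of ones
    order=1,        # adds a `trend` column counting 1, 2, 3, ...
    drop=True,
)
X = dp.in_sample()
print(X)
```

The result is a DataFrame indexed by the same dates, with one column per deterministic term. The values depend only on the position in the index, not on the observed series.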

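No intercept

# X from DeterministicProcess already contains a `const` column, so the
# model's own intercept is turned off to avoid duplicating it.
model = LinearRegression(fit_intercept=False)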
model.fit(X, y)
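A small sketch of what `fit_intercept=False` does when the features already include a constant column. The data is synthetic, generated from an assumed line y = 2 + 3 * trend:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

trend = np.arange(1, 11, dtype=float)
X = np.column_stack([np.ones_like(trend), trend])  # columns: const, trend
y = 2.0 + 3.0 * trend                              # assumed true line

model = LinearRegression(fit_intercept=False).fit(X, y)

# The coefficient on the constant column plays the role of the intercept
print("coef_:", model.coef_)
print("intercept_:", model.intercept_)
```

The intercept is learned as the weight of the `const` column, and `model.intercept_` stays at zero. Leaving `fit_intercept=True` would give the model two ways to express the same constant.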

Forecasting the future

Generate the future \(X\)

X_fore = dp.out_of_sample(steps=30)  # the next 30 steps (30 days for daily data)
y_fore = pd.Series(model.predict(X_fore), index=X_fore.index)
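The whole fit-then-forecast flow can be sketched end to end on synthetic data. The ten-day series below follows an assumed line y = 5 + 2 * trend, and the name `y_fore` is chosen here for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from statsmodels.tsa.deterministic import DeterministicProcess

# Hypothetical training series: ten days on an exact line
idx = pd.date_range("2020-01-01", periods=10, freq="D")
y = pd.Series(5.0 + 2.0 * np.arange(1, 11), index=idx)

dp = DeterministicProcess(index=idx, constant=True, order=1, drop=True)
X = dp.in_sample()
model = LinearRegression(fit_intercept=False).fit(X, y)

# Features for the three days after the training index
X_fore = dp.out_of_sample(steps=3)
y_fore = pd.Series(model.predict(X_fore), index=X_fore.index)

print(X_fore)
print(y_fore)
```

`out_of_sample` continues both the dates and the trend counter from where the training index ended, so the forecast must be computed from `X_fore` and indexed by `X_fore.index`, not by the training `X`.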