线性回归

回归：为一个或多个自变量与因变量之间的关系建模的一类方法

应用：预测问题

假设：

自变量\(\mathbf x\)和因变量\(y\)之间的关系是呈线性的
任何噪声比较正常（遵循正态分布）

概念：

训练（数据）集
样本 / 数据点 / 数据实例
标签 / 目标
特征 / 协变量

表示：

样本数\(n\)
索引为\(i\)的样本：\(\mathbf x^{(i)}=[x_1^{(i)},x_2^{(i)}]^\mathsf{T}\mapsto y^{(i)}\)

基本元素：

线性模型：\(\mathbf{\hat y}=\mathbf{Xw}+b\)
损失函数：模型质量的度量方式

\[ L(\mathbf w,b)=\dfrac{1}{n}\sum_{i=1}^n\dfrac{1}{2}\left(w^{\mathsf T}\mathbf x^{(i)}+b-y^{(i)}\right)^2 \]

模型训练时，期望寻找\((\mathbf w^*,b^*)\)最小化总损失，即

\[ \mathbf w^*,b^*=\mathop{\arg\!\min}\limits_{\mathbf w,b}\ L(\mathbf w,b) \]

解析解：当优化问题简单时，可用公式简单地表达该优化问题的解

令\(X\leftarrow\begin{bmatrix}X&1\end{bmatrix},\mathbf w\leftarrow\begin{bmatrix}\mathbf{w}\\b\end{bmatrix}\)

\[ \begin{array}{l} L(\mathbf X,\mathbf y,\mathbf w)=\dfrac{1}{2n}\left\Vert\mathbf y-\mathbf {Xw}\right\Vert^2\\ \dfrac{\partial}{\partial\mathbf w}L(\mathbf X,\mathbf y,\mathbf w)=-\dfrac{1}{n}(\mathbf{y-Xw})^{\mathsf T}\mathbf X\\ \dfrac{\partial}{\partial\mathbf w}L(\mathbf X,\mathbf y,\mathbf w)=0\Rightarrow \mathbf w^*=(\mathbf{X^{\mathsf T}X})^{-1}\mathbf{Xy} \end{array} \]

并非所有问题都存在解析解

随机梯度下降：几乎可以优化所有深度学习模型

\[ \begin{array}{l} \mathbf w\leftarrow\mathbf w-\dfrac{\eta}{|B|}\sum\limits_{i\in B}\partial_{\mathbf w}l^{(i)}(\mathbf w,b)=\mathbf w-\dfrac{\eta}{|B|}\sum\limits_{i\in B}x^{(i)}(\mathbf w^{\mathsf T}\mathbf x^{(i)}+b-y^{(i)})\\ b\leftarrow b-\dfrac{\eta}{|B|}\sum\limits_{i\in B}\partial_{b}l^{(i)}(\mathbf w,b)=b-\dfrac{\eta}{|B|}\sum\limits_{i\in B}(\mathbf w^{\mathsf T}\mathbf x^{(i)}+b-y^{(i)}) \end{array} \]

预测（推断）：给定特征的情况下，利用已学习的线性回归模型\(\mathbf{\hat w^{\mathsf T}x}+\hat b\)估计目标

神经网络描述：单层神经网络

输入：\(x_1,\cdots,x_d\) 特征维度 / 输入数：\(d\)
输出：\(o_1\) 输出数：1
层数：不考虑输入层，层数为1
全连接层 / 稠密层：每个输入都与每个输出相连