| | __init__ (self, np.ndarray x, Callable|float step, Callable|None grad=None, Callable|None prox=None, float|SingleItemArray b1=0.9, float b2=0.999, float eps=1e-8, float p=0.25, np.ndarray|None m0=None, np.ndarray|None v0=None, np.ndarray|None vhat0=None, str scheme="amsgrad", float prox_e_rel=1e-6) |
| | update (self, int it, np.ndarray input_grad, *args) |
| AdaproxParameter | __deepcopy__ (self, dict[int, Any]|None memo=None) |
| AdaproxParameter | __copy__ (self) |

| | __init__ (self, np.ndarray x, dict[str, np.ndarray] helpers, Callable|float step, Callable|None grad=None, Callable|None prox=None) |
| float | step (self) |
| tuple[int,...] | shape (self) |
| npt.DTypeLike | dtype (self) |
| Parameter | __copy__ (self) |
| Parameter | __deepcopy__ (self, dict[int, Any]|None memo=None) |
| Parameter | copy (self, bool deep=False) |
| | resize (self, Box old_box, Box new_box) |

Operator updated using the Proximal ADAM algorithm.
Supports several variants of adaptive quasi-Newton gradient descent:
* Adam (Kingma & Ba 2015)
* NAdam (Dozat 2016)
* AMSGrad (Reddi, Kale & Kumar 2018)
* PAdam (Chen & Gu 2018)
* AdamX (Phuong & Phong 2019)
* RAdam (Liu et al. 2019)
See the respective papers for details of each algorithm.
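
As a rough illustration of what one of these variants does, the sketch below applies a single AMSGrad-style step followed by a proximal projection. It is not this class's implementation: the function name `amsgrad_prox_step` is hypothetical, bias correction is omitted, and `prox` is applied directly to the updated array rather than in the metric defined by the second moment. These are simplifying assumptions made for the example.

```python
import numpy as np


def amsgrad_prox_step(x, g, m, v, vhat, step, b1=0.9, b2=0.999, eps=1e-8, prox=None):
    """One AMSGrad-style update of `x` given the gradient `g` (illustrative only)."""
    # Exponential moving averages of the gradient and the squared gradient.
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g**2
    # AMSGrad keeps the running maximum of the second moment.
    vhat = np.maximum(vhat, v)
    # Gradient step scaled by the square root of the maximum second moment.
    x = x - step * m / (np.sqrt(vhat) + eps)
    # Simplification: apply the proximal operator directly to the new value.
    if prox is not None:
        x = prox(x)
    return x, m, v, vhat


# Toy problem (an assumption for this example): minimize 0.5 * ||x - target||^2
# subject to x >= 0, using a non-negativity projection as the proximal operator.
target = np.array([1.0, -2.0])
x = np.zeros(2)
m = np.zeros(2)
v = np.zeros(2)
vhat = np.full(2, -np.inf)

for _ in range(2000):
    g = x - target  # gradient of the quadratic objective
    x, m, v, vhat = amsgrad_prox_step(
        x, g, m, v, vhat, step=0.05, prox=lambda z: np.maximum(z, 0)
    )
```

In this toy setup the first coordinate should settle near its unconstrained optimum while the second is held at the boundary by the projection.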
Parameters
----------
x:
    The array of values that is being fit.
step:
    A numerical step value or a function that calculates the step for a
    given `x`.
grad:
    A function that calculates the gradient of `x`.
prox:
    A function that applies the proximal operator to `x`.
b1:
    The decay rate of the first moment (mean) of the gradient.
b2:
    The decay rate of the second moment (uncentered variance) of the gradient.
eps:
    A small constant added for numerical stability.
p:
    The power used by the ``PAdam`` scheme.
m0:
    The initial value of the first moment.
    If `None` then an array of zeros is used.
v0:
    The initial value of the second moment.
    If `None` then an array of zeros is used.
vhat0:
    The initial value of the maximum second moment.
    If `None` then an array of ``-inf`` is used.
scheme:
    The name of the ADAM variant to use to update the parameter.
prox_e_rel:
    The relative error used by the proximal operator.
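
The defaults described above for `m0`, `v0`, `vhat0`, and `step` can be sketched as follows. This is a minimal sketch, not the library's code: the helper names `init_moments` and `as_step_function`, the dictionary keys, and the `(x, it)` call signature for a step function are assumptions made for the example, and `x` is assumed to be a floating-point array.

```python
import numpy as np


def init_moments(x, m0=None, v0=None, vhat0=None):
    """Build the initial moment arrays using the documented defaults."""
    # First and second moments default to zeros with the same shape as x.
    m = np.zeros(x.shape, dtype=x.dtype) if m0 is None else m0
    v = np.zeros(x.shape, dtype=x.dtype) if v0 is None else v0
    # The maximum second moment defaults to -inf, so the first
    # np.maximum(vhat, v) simply returns v.
    vhat = np.full(x.shape, -np.inf, dtype=x.dtype) if vhat0 is None else vhat0
    return {"m": m, "v": v, "vhat": vhat}


def as_step_function(step):
    """Wrap a constant step so that it can be called like a step function."""
    if callable(step):
        return step
    # Hypothetical signature: the step may depend on x and the iteration.
    return lambda x, it: step


x = np.ones((2, 3))
state = init_moments(x)
step_fn = as_step_function(0.1)
first_v = 0.001 * x**2
first_vhat = np.maximum(state["vhat"], first_v)  # equals first_v
```

Starting `vhat` at ``-inf`` rather than zero means the running maximum carries no information until the first gradient has been seen.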