Questions tagged [stochastic-approximation]

31 questions
10
votes
2 answers

Is Markov chain based sampling the "best" for Monte Carlo sampling? Are there alternative schemes available?

Markov Chain Monte Carlo is a method based on Markov chains that allows us to obtain samples (in a Monte Carlo setting) from non-standard distributions from which we cannot draw samples directly. My question is why Markov chain is…
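The excerpt above describes MCMC in general terms; a minimal random-walk Metropolis–Hastings sketch (target, proposal scale, and function names chosen here purely for illustration):

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis: propose x' ~ N(x, step^2) and accept with
    probability min(1, target(x') / target(x)); otherwise keep x."""
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_samples):
        x_prop = x + rng.gauss(0.0, step)
        log_alpha = log_target(x_prop) - log_target(x)
        if log_alpha >= 0 or rng.random() < math.exp(log_alpha):
            x = x_prop
        samples.append(x)
    return samples

# Example: sample from a standard normal, given only its log-density
# up to an additive constant (the usual MCMC setting).
samples = metropolis_hastings(lambda x: -0.5 * x * x, x0=0.0, n_samples=20000)
mean = sum(samples) / len(samples)
```

The chain only needs the target density up to a normalising constant, which is exactly why MCMC applies to non-standard distributions we cannot sample directly.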
6
votes
3 answers

Stochastic Differential Equations - A Few General Questions

I just have a few questions about stochastic differential equations. I generally did a lot of pure math but signed up for a course on probability models and stochastic differential equations because I wanted to try something different. I have really…
5
votes
1 answer

Confusion about Robbins-Monro algorithm in Bishop PRML

This is basically how Robbins-Monro is presented in chapter 2.3 of Bishop's PRML book (from his slides): In the general update equation, $$ \theta^{(N)} = \theta^{(N-1)} - \alpha_{N-1} z(\theta^{(N-1)}) $$ $z(\theta^{(N)})$ is an observed value of $z$ when…
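Bishop's running example of this update is sequential estimation of a mean; a minimal sketch under that assumption (function name and stream are illustrative, not from the book):

```python
import random

def robbins_monro_mean(stream, theta0=0.0):
    """Sequential mean estimation as a Robbins-Monro root-finding problem:
    z(theta) = theta - x is a noisy observation of f(theta) = theta - E[x],
    whose root is E[x]. Step sizes a_N = 1/N satisfy the Robbins-Monro
    conditions (sum a_N diverges, sum a_N^2 converges)."""
    theta = theta0
    for n, x in enumerate(stream, start=1):
        theta = theta - (1.0 / n) * (theta - x)
    return theta

# Feed in a stream with true mean 3; the iterate converges toward E[x].
rng = random.Random(1)
data = [rng.gauss(3.0, 1.0) for _ in range(5000)]
estimate = robbins_monro_mean(data)
```

With $\alpha_N = 1/N$ this recursion is algebraically identical to the running sample mean, which is the point of Bishop's example.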
4
votes
0 answers

Difference between Stochastic Approximation (SA) and Stochastic Gradient Descent (SGD)

I understand the intended use cases for both stochastic approximation algorithms like SPSA or FDSA, and for SGD algorithms like Adam. SPSA is intended for noisy objective functions, and Adam for randomized mini batches. So for me it looks like the…
4
votes
1 answer

Stochastic gradient descent: why randomise training set

I'm given a dataset of 200 million training examples. The stochastic gradient descent method requires me to sample these randomly, to avoid it getting 'stuck'. First of all, I don't see how it gets stuck. So the fact the sample needs to be…
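The usual practice the excerpt alludes to is reshuffling the training set once per epoch; a minimal sketch on a toy 1-D problem (all names and the toy loss are illustrative):

```python
import random

def sgd_epoch(data, w, lr, grad, rng):
    """One SGD epoch: reshuffle the index order so each pass visits the
    examples in a fresh random order, avoiding systematic ordering
    effects (e.g. all examples of one class arriving in a block)."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for i in order:
        w = [wj - lr * gj for wj, gj in zip(w, grad(w, data[i]))]
    return w

# Toy example: fit a scalar w by minimising (w - x)^2 per example,
# whose minimiser is the sample mean (true mean here is 2).
rng = random.Random(0)
data = [rng.gauss(2.0, 0.5) for _ in range(500)]
grad = lambda w, x: [2.0 * (w[0] - x)]
w = [0.0]
for _ in range(20):
    w = sgd_epoch(data, w, lr=0.01, grad=grad, rng=rng)
```

Shuffling per epoch keeps the per-step gradients approximately unbiased samples of the full-batch gradient, which is the assumption behind SGD's convergence guarantees.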
4
votes
1 answer

100 m sprint world record times - lower bound

Can we find the lowest attainable bound for 100 m sprint times, i.e. the quickest it can ever be run, using past data? Every now and again the record gets broken and we can map the new record time. But surely there must be a particular…
4
votes
0 answers

Rootfinding in the presence of one-sided noise

I'm faced with a practical problem of solving for the root of a 1-D function which has noise, so I find myself in the territory of stochastic approximation (and I am well out of my comfort zone here!). I know there is some literature on the problem, but I've yet…
3
votes
1 answer

Root-finding via Robbins-Monro method: A real and simple example

I am looking for a real and simple example of the Robbins-Monro (RM) method, but most of the googled results are theoretical and abstract. To understand RM, I used a simple function \begin{equation*} f(x) = \frac{1}{1+\exp(-0.5x)}…
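A concrete sketch of the kind of example the excerpt asks for, using the same logistic function: find the $x^*$ where noisy observations of $f$ average to a target level. The target value, noise level, and gain constant below are illustrative choices, not from the question:

```python
import math
import random

def f(x):
    return 1.0 / (1.0 + math.exp(-0.5 * x))

def robbins_monro_root(noisy_f, target, x0=0.0, n_iter=20000, seed=0):
    """Solve E[noisy_f(x*)] = target by the Robbins-Monro recursion
    x <- x - a_n (y - target), with gain a_n = 10/n (the constant 10
    is tuned to this function's shallow slope near the root)."""
    rng = random.Random(seed)
    x = x0
    for n in range(1, n_iter + 1):
        y = noisy_f(x, rng)
        x = x - (10.0 / n) * (y - target)
    return x

# Observe f(x) corrupted by Gaussian noise; solve f(x) = 0.75.
# The exact root is 2*ln(3) ≈ 2.197, so the iterate should land nearby.
noisy = lambda x, rng: f(x) + rng.gauss(0.0, 0.1)
root = robbins_monro_root(noisy, target=0.75)
```

Note that only noisy evaluations of $f$ are used; no derivative and no noise-free oracle is needed, which is the defining feature of the RM scheme.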
3
votes
1 answer

How do I compare the sampling distribution of the minimum of a distribution across sample sizes

I saw this question (link) but when I read it, I see that it has a fixed $N$, so I thought it was asking about a finite sample size. When I read the answer of the question it was suggested to be a duplicate of (link), that answer is analytic, and in terms of…
3
votes
2 answers

Frequency distribution of Chinese Restaurant Process?

Set-up: I was simulating the Generalized Chinese Restaurant Process as shown on the Wikipedia page [link], with a discount $\alpha$ and concentration parameter $\theta$. For $n=5$ total customers being seated with $\alpha=.3$ and $\theta=1.5$ and…
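A minimal simulator for the seating rule the excerpt describes (the two-parameter, or Pitman–Yor, CRP); parameter values match the excerpt, the function name is illustrative:

```python
import random

def crp_sample(n, alpha, theta, rng):
    """Seat n customers by the two-parameter CRP: with i customers already
    seated and K tables open, occupied table k (with n_k customers) gets
    the next customer w.p. (n_k - alpha)/(i + theta), and a new table
    opens w.p. (theta + alpha*K)/(i + theta)."""
    tables = []  # customer count per table
    for i in range(n):
        if not tables:
            tables.append(1)  # first customer always opens a table
            continue
        r = rng.random() * (i + theta)
        acc = 0.0
        seated = False
        for k in range(len(tables)):
            acc += tables[k] - alpha
            if r < acc:
                tables[k] += 1
                seated = True
                break
        if not seated:  # remaining mass theta + alpha*K: open a new table
            tables.append(1)
    return tables

# The excerpt's setting: n=5 customers, discount .3, concentration 1.5.
rng = random.Random(42)
partition = crp_sample(5, alpha=0.3, theta=1.5, rng=rng)
```

Repeating the draw many times and tabulating the resulting partitions gives the frequency distribution the question asks about.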
2
votes
1 answer

Variational inference with discrete variational parameters

Typically, Variational Inference relies on taking gradient steps on the KL divergence between the variational and true posterior, or on the ELBO. This does not seem valid when variational parameters are discrete (since gradients wrt those arguments are…
2
votes
1 answer

Is it possible to combine SPSA and Adam?

In SGD algorithms such as Adam, you generally make a bad estimate of the gradient of the loss function and use that gradient to move the parameters in the desired direction. Gradient-free methods such as SPSA do basically the same: they make a bad…
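For reference, the SPSA gradient estimate the excerpt contrasts with Adam's: perturb all coordinates at once with a random $\pm 1$ vector and use two loss evaluations. A minimal sketch (the toy loss, gain schedules, and names are illustrative):

```python
import random

def spsa_step(loss, x, a, c, rng):
    """One SPSA step: simultaneous +/-1 perturbation of every coordinate,
    gradient estimated from just two (noisy) loss evaluations, then a
    plain gradient-descent move."""
    delta = [rng.choice((-1.0, 1.0)) for _ in x]
    x_plus = [xi + c * di for xi, di in zip(x, delta)]
    x_minus = [xi - c * di for xi, di in zip(x, delta)]
    g_scale = (loss(x_plus) - loss(x_minus)) / (2.0 * c)
    # Coordinate-wise estimate is g_scale / delta_i (delta_i = +/-1).
    return [xi - a * g_scale / di for xi, di in zip(x, delta)]

# Toy example: minimise a noisy quadratic with minimum at (1, -2),
# using the standard decaying gain sequences a_k ~ k^-0.602, c_k ~ k^-0.101.
rng = random.Random(0)
loss = lambda x: (x[0] - 1.0) ** 2 + (x[1] + 2.0) ** 2 + rng.gauss(0.0, 0.01)
x = [0.0, 0.0]
for k in range(1, 2001):
    x = spsa_step(loss, x, a=0.1 / k ** 0.602, c=0.1 / k ** 0.101, rng=rng)
```

In principle nothing stops this two-evaluation gradient estimate from being fed into Adam's moment accumulators instead of the plain descent step above, which is essentially what the question proposes.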
2
votes
0 answers

Log Euler simulation scheme for Cox–Ingersoll–Ross model

https://en.wikipedia.org/wiki/Cox%E2%80%93Ingersoll%E2%80%93Ross_model In this article the Cox–Ingersoll–Ross model is given. I want to design a simulation scheme for this process. A discretization of the continuous SDE is also given in the article. However…
2
votes
0 answers

SGD shows the same convergence behaviour as batch gradient descent when using adaptive learning rate?

SGD shows the same convergence behaviour as batch gradient descent when using an adaptive learning rate? I don't understand why he claimed that; I couldn't find any reference for it in any paper. However, it has been shown that when we slowly…
2
votes
1 answer

Simulation of Secretary problem: optimal pool size given k=2?

Question: Is it incorrect to think there is a "sweet spot" where more samples slightly decrease the likelihood of a "best pick" in the Secretary Problem? Details: The "Secretary Problem" from "optimal stopping" is a classic in decision theory. …
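A simulation of the kind the question is based on: one trial of the classic $1/e$ stopping rule for a given pool size, which can then be swept over $n$ to look for the conjectured "sweet spot". This sketch covers the standard $k=1$ rule only; names and trial counts are illustrative:

```python
import math
import random

def secretary_success(n, rng):
    """One trial of the classic 1/e rule: observe the first floor(n/e)
    candidates without committing, then accept the first candidate
    better than everyone seen so far. Rank 0 denotes the best candidate."""
    ranks = list(range(n))
    rng.shuffle(ranks)
    cutoff = int(n / math.e)
    best_seen = min(ranks[:cutoff], default=n)
    for r in ranks[cutoff:]:
        if r < best_seen:
            return r == 0  # committed: success iff this is the best
    return ranks[-1] == 0  # never committed: forced to take the last one

# Estimate the success probability for a pool of 20 candidates;
# the theoretical rate for moderate n is close to 1/e ≈ 0.368.
rng = random.Random(7)
trials = 20000
p_hat = sum(secretary_success(20, rng) for _ in range(trials)) / trials
```

Wrapping the estimate in a loop over pool sizes `n` gives the success-versus-pool-size curve the question wants to examine.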