Quick Experiment – Breaking the sine in one line

As we discussed today in class, and as Laurent mentioned in his post, adding Gaussian noise is important to prevent convergence to a constant. In my last post, I ran into exactly this kind of behaviour, which results in a flatline.

Recall Laurent’s post:

P(x_{t} \mid x_{t-k},\ldots,x_{t-1})=\mathcal{N}(x_{t} \mid f(x_{t-k},\ldots,x_{t-1}),\hat{\sigma}^{2})=\mathcal{N}(x_{t} \mid \hat{x}_{t},\hat{\sigma}^{2})

where k is the size of the window.

The estimator of the variance is the mean squared error (MSE) of the training predictions:

\hat{\sigma}^{2}=\frac{1}{n}\sum_{i=1}^{n}\left(\hat{x}_{i}-x_{i}\right)^{2}

where n is the number of training examples.
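As a sanity check, here is a minimal numpy sketch of this estimator; the dummy arrays are placeholders standing in for the real network outputs and true next samples, not actual training data:

```python
import numpy as np

def estimate_variance(predictions, targets):
    """MSE of the training predictions, used as the estimate of sigma^2."""
    return np.mean((predictions - targets) ** 2)

# Toy check with dummy data standing in for the network outputs
# and the true next samples over the n training examples.
rng = np.random.RandomState(0)
targets = rng.randn(1000)
predictions = targets + 0.1 * rng.randn(1000)
print(estimate_variance(predictions, targets))  # close to 0.01
```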

Compared to my previous post, I only had to modify the output of the neural network to generate speech. Concretely, I sampled from a Gaussian distribution whose mean equals the output of the neural network and whose variance equals the MSE on the training examples. You will find this in the same mlp.py script (go to line 380).
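I won't reproduce the exact code from mlp.py here, but the change boils down to something like the following sketch; `predict` stands for the trained network's forward pass and `sigma2_hat` for the MSE estimate above (both names are mine, not the script's):

```python
import numpy as np

def sample_next(predict, window, sigma2_hat, rng):
    """Draw x_t ~ N(f(x_{t-k}, ..., x_{t-1}), sigma^2 hat)."""
    mu = predict(window)                           # network output = mean
    return mu + np.sqrt(sigma2_hat) * rng.randn()  # add Gaussian noise
```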

Here are the results (note that I used the same hyperparameters as before):

[Figure: Learning curve. Note that I normalized the samples by dividing them by 560, which is roughly their standard deviation.]

[Figure: Acoustic generation from an NN trained for 100 epochs. It starts at sample 2500 of SX397.WAV and then generates the next 30,000 samples.]
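For those curious about the mechanics, here is a rough sketch of what such a generation loop can look like; the seed window, the window size k, and all names are illustrative, not the actual code at line 380 of mlp.py:

```python
import numpy as np

def generate(predict, seed_window, n_samples, sigma2_hat, seed=1234):
    """Autoregressively generate n_samples, feeding each newly sampled
    value back into the k-sample input window."""
    rng = np.random.RandomState(seed)
    window = list(seed_window)           # k most recent normalized samples
    out = []
    for _ in range(n_samples):
        mu = predict(np.asarray(window))             # predicted mean
        x = mu + np.sqrt(sigma2_hat) * rng.randn()   # Gaussian sample
        out.append(x)
        window = window[1:] + [x]                    # slide the window
    return np.array(out) * 560.0         # undo the /560 normalization
```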

For those interested in listening to the resulting generated speech (in case the audio player doesn't work, here is the link to the .wav file):

https://dl.dropboxusercontent.com/u/43075537/generated_data.wav

In conclusion, it doesn't sound like Georges Brassens, but it does break the undesired behaviour described above!

3 comments

  1. This is a very interesting result! I analysed the spectrum of your model's output file, and it looks a lot like speech should look on average (high energy in the low-frequency components, with a 10 dB/octave decay), which probably means the model was able to capture this aspect of the speech distribution.
