Opened 8 years ago

Last modified 8 years ago

#35 new task

What are dp and db supposed to be?

Reported by: dneise Owned by: somebody
Priority: minor Milestone: milestone:milestone1
Component: component1 Version:
Keywords: Cc:

Description

In line 188 and 189 of ratecontrol.cc
two values called dp and db are caluclated.
What are they supposed to mean?

Change History (3)

comment:1 by tbretz, 8 years ago

This is the 68% quantile around the median.

comment:2 by dneise, 8 years ago

Ah ... well ... in that case I believe there is a little bug in the implementation.

I show this using the example of the patch rates with a little python program, you can copy this into your ipython shell

import numpy as np
np.random.seed(0)
# let's get 160 random rates around the value 70, with a sigma of 10.
patch_rates = np.random.normal(70, 10, 160)

# let's print the median and the std-deviation for reference
print('median from numpy:', np.median(patch_rates))
print('68% around the median:', np.diff(np.percentile(patch_rates, [50-34, 50+34])) / 2)

## from this point on I use variable names from ratecontrol.cc and re-do what is done there

medp = np.copy(patch_rates)       # make new vector called `medp` containing the patch_rates
medp = np.sort(medp)              # sort it in place
devp = np.abs(patch_rates - medp)  # make a new vector `devp`, containing the elementwise difference between the original unsorted patch_rates and `medp`.

devp = np.sort(devp)              # sort `devp` in place

mp = (medp[79]+medp[80])/2;
dp = devp[109];

print('median from ratecontrol:', mp)
print('68% quantile from ratecontrol:', dp)

We see, the median works well, but dp is not the same as the 68% quantile. It is a little larger.

median from numpy: 70.9409611944
68% around the median: [ 10.35922294]
median from ratecontrol: 70.9409611944
68% quantile from ratecontrol: 14.5027351115

Even more interesting, dp depends on the order of the input data,
because of line 180 in ratecontrol.cc. One can see it in this example.

import numpy as np
np.random.seed(0)
patch_rates = np.random.normal(70, 10, 160)

# let's print the median and the std-deviation for reference

print('68% around the median:', np.diff(np.percentile(patch_rates, [50-34, 50+34])) / 2)

# now let's repeat it 3 times but we shuffle the input data
for test in range(3):
    np.random.shuffle(patch_rates)

    medp = np.copy(patch_rates)       # make new vector called `medp` containing the patch_rates
    medp = np.sort(medp)              # sort it in place
    devp = np.abs(patch_rates - medp)  # make a new vector `devp`, containing the elementwise difference between the original unsorted patch_rates and `medp`.

    devp = np.sort(devp)              # sort `devp` in place

    mp = (medp[79]+medp[80])/2;
    dp = devp[109];


    print('68% quantile from ratecontrol:', dp)

In my case the output is

68% around the median: [ 10.35922294]
68% quantile from ratecontrol: 14.0650573838
68% quantile from ratecontrol: 15.5652876525
68% quantile from ratecontrol: 14.8008871673

I think the C++ implementation of this could e.g. look like this:

// Calculate Median and deviation
vector<float> medp(sdata.fPatchRate, sdata.fPatchRate+160);
sort(medp.begin(), medp.end());

const double mp = (medp[79]+medp[80])/2;
const double dp1 = medp[135] - mp;
// or 
const double dp2 = (medp[135] - medp[25]) /2 ;

the result from a python implementation looks like this then:

median from numpy: 70.9409611944
68% around the median: [ 10.35922294]
mp: 70.9409611944
dp1: 11.0828372935
dp2: 10.5760103711

Last but not least ... this little bug, I believe, has a very limited impact on operation, if it has an impact at all. Since dp, is multiplied with 3.5 in line 236, I think, and this 3.5 has been chosen carefully by hand. So one would fix the implementation and thus reduce dp a little bit, one would simply have to increase this factor 3.5 a little bit an nothing would change overall.

Anyway I though I should mention this, since I was puzzled quite a lot when reading the code trying to understand it. From the variable name, I was guessing that dp, might stand for "delta_p" which is sometimes used as "sigma_p" depending on context, but due to the bug in the implementation I was unsure if I misinterpreted the variable name or whether it was an actual bug.

Now it's clear, thanks a lot.

Last edited 8 years ago by dneise (previous) (diff)

comment:3 by tbretz, 8 years ago

I agree. There is a bug :( Fortunately, it seems that a solution is not very urgent, given that it works already (by chance) now for several years. However, I will look into it when I find the time.

Note: See TracTickets for help on using tickets.