# module 'scipy.stats' has no attribute 'median_grouped'

I have a DataFrame called "tabla". The class column ("Rango") contains the class intervals into which the data are grouped. The "fi" column contains the absolute frequencies of the observations, and the "Fi" column contains the cumulative frequencies. Using scipy.stats, I am trying to get the grouped median:

import numpy as np
import pandas as pd
from scipy import stats

# Median of a sample of observations
tabla = {'Rango': ["0-5", "5-10", "10-15", "15-20", "20-25", "25-30"],
         "LimExaInf": [0, 5, 10, 15, 20, 25],
         "LimExaSup": [5, 10, 15, 20, 25, 30],
         "fi": [6, 10, 3, 5, 9, 4],
         "Fi": [6, 16, 19, 24, 33, 37]}
tabla = pd.DataFrame(tabla, dtype=np.float64)
print("\nMedian:", stats.median_grouped(tabla["fi"]))  # raises AttributeError

The script returns the error mentioned in the title.

AttributeError: module 'scipy.stats' has no attribute 'median_grouped'

Performing the calculation by applying the formula

The median interval is the first interval whose cumulative frequency exceeds N/2, where N is the total number of observations.

Li is the lower limit of the median interval.
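Written out, the formula implemented by the script that follows is presumably the usual interpolation for the median of grouped data. This is a reconstruction from the script's variables, not a copy of the original figure: $L_i$ is the lower limit of the interval containing the median, $N$ the total number of observations, $F_{i-1}$ the cumulative frequency of the preceding interval, $f_i$ the absolute frequency of the median interval and $c_i$ its width.

$$
Me = L_i + \frac{\frac{N}{2} - F_{i-1}}{f_i}\, c_i
$$

With the table above, $N = 37$, so $N/2 = 18.5$ falls in the interval 10-15, where $L_i = 10$, $F_{i-1} = 16$, $f_i = 3$ and $c_i = 5$:

$$
Me = 10 + \frac{18.5 - 16}{3} \cdot 5 \approx 14.17
$$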

If we apply the formula, for which I have developed the following script.

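N = tabla["fi"].sum()
for i in range(tabla.shape[0]):
    # The first interval whose cumulative frequency exceeds N/2 contains the median
    if tabla["Fi"][i] > N / 2:
        Li = tabla["LimExaInf"][i]
        Ls = tabla["LimExaSup"][i]
        ci = Ls - Li  # interval width
        # Cumulative frequency before this interval (0 for the first interval)
        Fi_prev = tabla["Fi"][i - 1] if i > 0 else 0
        M = Li + ((N / 2 - Fi_prev) / tabla["fi"][i]) * ci
        break
print("\nMedian:", M)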

The result is 14.17.

I would appreciate help in correctly applying the above function.

I've been thinking about it and trying things out, and I think that statistics.median_grouped(data, interval) is intended for raw data, not for frequency counts. I created a couple of tables with values that fit within your class intervals and frequencies, and I get 14 when I use an interval of 6; with another table for which I know the result, I also get a similar value. I still don't know whether there is a rounding error or I am missing something, but I think these functions are meant to be used directly on the raw data.

1 vote

Jose Rodriguez Points 51

As the error message points out, scipy.stats has no attribute called median_grouped. It does have median_absolute_deviation() and median_test().

Following the comments, I have written this version of the calculation you need, based on your formula and the attached table rather than on a built-in function. It could be optimized further at the cost of readability, but I think this version is easier to understand. With a little more time and motivation you could turn it into a function that takes the table's values as arguments (median_grouped(*table) could be a good start), but I leave that as an exercise for the reader.

If you have any questions, leave another comment. Best regards.

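tabla = {
    "LimExacInt": ["0-5", "5-10", "10-15", "15-20", "20-25", "25-30"],
    "fi": [6, 10, 3, 5, 9, 4],
    "Fi": [6, 16, 19, 24, 33, 37],
}

intervalo = 5  # Class interval width
n_2 = max(tabla["Fi"]) / 2  # Half the total number of observations
for indice, cf in enumerate(tabla["Fi"]):
    if n_2 < cf:  # First interval whose cumulative frequency exceeds N/2
        # Lower limit; assumes equal-width intervals starting at 0
        l_inf = indice * intervalo
        # Previous cumulative frequency (0 for the first interval)
        pcf = tabla["Fi"][indice - 1] if indice > 0 else 0
        f = tabla["fi"][indice]
        median_grouped = l_inf + ((n_2 - pcf) / f) * intervalo
        print(median_grouped)
        break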
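The function left as an exercise could look something like the sketch below. The name `grouped_median` and its parameters are hypothetical (they are not part of scipy or of the statistics module), and the sketch assumes the intervals are sorted and contiguous and are passed as plain lists of lower limits, upper limits and absolute frequencies.

```python
from itertools import accumulate


def grouped_median(lower_limits, upper_limits, frequencies):
    """Median of a frequency table (hypothetical helper, not a library function).

    Applies Li + ((N/2 - F_prev) / fi) * ci to the first interval whose
    cumulative frequency exceeds N/2.
    """
    cumulative = list(accumulate(frequencies))  # the "Fi" column
    if not cumulative or cumulative[-1] == 0:
        raise ValueError("the table has no observations")
    half = cumulative[-1] / 2  # N/2
    for i, cum in enumerate(cumulative):
        if cum > half:
            prev_cum = cumulative[i - 1] if i > 0 else 0
            width = upper_limits[i] - lower_limits[i]
            return lower_limits[i] + (half - prev_cum) / frequencies[i] * width


result = grouped_median(
    [0, 5, 10, 15, 20, 25],
    [5, 10, 15, 20, 25, 30],
    [6, 10, 3, 5, 9, 4],
)
print(round(result, 2))  # 14.17
```

Computing the cumulative frequencies inside the function means only the "fi" column has to be supplied; the "Fi" column of the table is not needed.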

If we replace "data1" with tabla["fi"], the answer is not correct. I have looked into the statistics module to try to understand what this function does, and now I am even more confused: I don't understand the difference between the "statistics.median" and "statistics.median_grouped" functions in that module. The basic question is how to find a function that calculates the median starting from a list of absolute frequencies. I could use any module.

Well, you know more about statistics than I do. But from a quick look at the concept, I understand that the median is not the average of all the values but the value in the middle (or the average of the two central values if there is an even number of elements), and that median_grouped does the same thing but with the data grouped in intervals of 2, 3, 4, etc. Is that right? If so, I think your result of 14.17 is incorrect. Please correct me if I am wrong.

The median is the value of the variable that leaves as many observations to its left as to its right. When the data are grouped in intervals, the formula for calculating it is the one I inserted in the figure; applying it gives 14.17. If "median_grouped" is meant to calculate the median of grouped data, it should, using the variables, internally calculate the absolute frequencies, cumulative frequencies, intervals, interval widths and number of sequences, and return the same result as the formula.

0 votes

efueyo Points 182

Jose Rodriguez, my reply is a bit long, so I am posting it here.

The median is the value of the variable that has as many observations to its left as to its right. When the data are grouped in intervals, the formula for calculating it is the one I inserted in the figure. Applying the formula gives 14.17.

If "median_grouped" is meant to calculate the median of grouped data, it should, using the variables, internally calculate the absolute frequencies, cumulative frequencies, intervals, interval widths and the sum of sequences, and return the same result as the formula. I ran a test by creating the following array:

array([65, 36, 49, 84, 79, 56, 28, 43, 67, 36, 43, 78, 37, 40, 68, 72, 55,
62, 22, 82, 88, 50, 60, 56, 57, 46, 39, 57, 73, 65, 59, 48, 76, 74,
70, 51, 40,  7, 56, 45, 35, 62, 52, 63, 32, 80, 64, 53, 74, 34, 76,
60, 48, 55, 51, 54, 45, 44, 35, 51, 21, 35, 61, 45, 33, 61, 77, 60,
85, 68, 45, 53, 34, 67, 42, 69, 52, 68, 52, 47, 62, 65, 55, 71, 73,
50, 53, 59, 41, 54, 41, 74, 82, 58, 26, 35, 47, 50, 38, 70])

If you apply the functions statistics.median and statistics.median_grouped to this array, you will see that both return the same result, 54.5.

So I understand that "median_grouped" is not the function I am looking for and I don't know what it is for.
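For what it is worth, `statistics.median_grouped(data, interval)` in the standard library treats each value in `data` as the midpoint of a class of width `interval`. A minimal sketch under that assumption, using the frequency table from the question: repeat each class midpoint as many times as its absolute frequency, then pass the class width as `interval`. The variable names are illustrative only.

```python
import statistics

lower_limits = [0, 5, 10, 15, 20, 25]
frequencies = [6, 10, 3, 5, 9, 4]  # the "fi" column
interval = 5                       # class width

# Represent every observation by the midpoint of its class
midpoints = [low + interval / 2 for low in lower_limits]
data = [mid for mid, freq in zip(midpoints, frequencies) for _ in range(freq)]

grouped = statistics.median_grouped(data, interval=interval)
print(round(grouped, 2))        # 14.17, same as the formula
print(statistics.median(data))  # 12.5, the midpoint of the median class
```

With raw integer data and the default `interval` of 1, each value x is treated as the class from x - 0.5 to x + 0.5, which may explain why `median` and `median_grouped` can coincide on an array like the one above.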