 # Python – How to calculate the mean and standard deviation of similarity matrix

dataframe, numpy, pandas, python, similarity

I am working with CSV files and I have code that calculates the similarity between documents. Post 1 provides the code; the details of the data and the output are as follows:

The data.csv file looks like this:

```
idx         messages
112  I have a car and it is blue
114  I have a bike and it is red
115  I don't have any car
117  I don't have any bike
```

The output is:

```
    id     112    114    115    117
id
112      100.0   78.0   51.0   50.0
114       78.0  100.0   47.0   54.0
115       51.0   47.0  100.0   83.0
117       50.0   54.0   83.0  100.0
```

Now I would like to calculate the mean and standard deviation of the lower triangle of the similarity matrix (the matrix is symmetric, so the upper triangle holds the same values), excluding the diagonal (the 100.0 self-similarity values).
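Written out, for an $n \times n$ symmetric similarity matrix $S$ with entries $s_{ij}$, the requested statistics run over the $m = n(n-1)/2$ entries strictly below the diagonal ($i > j$). This uses the population form, which divides by $m$; the sample form would divide by $m - 1$ inside the square root instead:

$$
m = \frac{n(n-1)}{2}, \qquad
\mu = \frac{1}{m} \sum_{i>j} s_{ij}, \qquad
\sigma = \sqrt{\frac{1}{m} \sum_{i>j} \left(s_{ij} - \mu\right)^2}
$$

For the 4 × 4 matrix above, $m = 6$, $\mu = 363/6 = 60.5$, and $\sigma = \sqrt{1237.5/6} = \sqrt{206.25} \approx 14.36$.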

I tried the pandas built-in `mean` and `std`:

```python
df_std = df.std()
df_Mean = df.mean()
```

But these use every value in the output, including the diagonal and the upper triangle, and they return one result per column rather than a single number.
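To see why the built-in reductions don't give a single lower-triangle figure, here is a small sketch that rebuilds the matrix by hand from the output shown above (the original similarity code is not reproduced, so the values are simply copied) and prints the per-column means:

```python
import pandas as pd

# Matrix copied from the output above; not produced by the original code.
ids = [112, 114, 115, 117]
df = pd.DataFrame(
    [[100.0, 78.0, 51.0, 50.0],
     [78.0, 100.0, 47.0, 54.0],
     [51.0, 47.0, 100.0, 83.0],
     [50.0, 54.0, 83.0, 100.0]],
    index=ids,
    columns=ids,
)

# DataFrame.mean() and DataFrame.std() reduce each column separately (axis=0),
# so every result mixes the 100.0 diagonal entry with upper-triangle values.
col_means = df.mean()
col_stds = df.std()
print(col_means)  # one value per column: 69.75, 69.75, 70.25, 71.75
print(col_stds)
```

Each column mean is pulled up by its 100.0 self-similarity, and the off-diagonal values are counted twice across the columns.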

Is there a way to calculate the mean and standard deviation in the way I described?

#### Best Solution

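Use `numpy.tril` with `k=-1` to zero out the diagonal and everything above it, then keep only the nonzero entries (this assumes no genuine similarity is exactly 0):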

```python
import numpy as np

ltri = np.tril(df.values, -1)   # zero the diagonal and the upper triangle
ltri = ltri[np.nonzero(ltri)]   # keep the nonzero entries as a 1-D array
```

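The intermediate result of `np.tril(df.values, -1)` is shown below; after the `np.nonzero` filter, `ltri` becomes the 1-D array `[78., 51., 47., 50., 54., 83.]`.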

```
array([[ 0.,  0.,  0.,  0.],
       [78.,  0.,  0.,  0.],
       [51., 47.,  0.,  0.],
       [50., 54., 83.,  0.]])
```

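Now you can call `ltri.std()` and `ltri.mean()`. Note that NumPy's `std` defaults to the population standard deviation (`ddof=0`), whereas pandas' `std` defaults to the sample standard deviation (`ddof=1`):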

```python
ltri.std(), ltri.mean()
# (14.361406616345072, 60.5)
```
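If a real similarity could ever be exactly 0, filtering with `np.nonzero` would silently drop it. A minimal alternative sketch selects the entries by position with `np.tril_indices(n, k=-1)` instead; `lower_triangle_values` is a hypothetical helper name, and the matrix is again copied by hand from the question's output:

```python
import numpy as np
import pandas as pd


def lower_triangle_values(matrix):
    """Hypothetical helper: return the entries strictly below the diagonal.

    Selects by position (np.tril_indices with k=-1) rather than by
    filtering out zeros, so a genuine 0.0 similarity is kept.
    """
    arr = np.asarray(matrix, dtype=float)
    rows, cols = np.tril_indices(arr.shape[0], k=-1)
    return arr[rows, cols]


# Matrix copied from the question's output.
ids = [112, 114, 115, 117]
df = pd.DataFrame(
    [[100.0, 78.0, 51.0, 50.0],
     [78.0, 100.0, 47.0, 54.0],
     [51.0, 47.0, 100.0, 83.0],
     [50.0, 54.0, 83.0, 100.0]],
    index=ids,
    columns=ids,
)

lower = lower_triangle_values(df)
print(lower)                      # [78. 51. 47. 50. 54. 83.]
print(lower.mean(), lower.std())  # mean and population std (ddof=0)
# Sample standard deviation, the convention pandas' Series.std() uses by default:
print(lower.std(ddof=1), pd.Series(lower).std())
```

The positional selection returns the values in row-major order, the same order `np.nonzero` produces when no entry is zero, so the results match the answer above on this data.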