
Describing the sample

We need to describe our sample so that readers can judge whether it is representative of the population we are talking about, and so that we can discuss how generalisable our conclusions are. For example, if our sample includes only men, our findings may not generalise to women.

To describe our sample, we calculate the averages and the standard deviations (how much the values vary around the average).

We want to describe the people who were given the treatment separately from those in the control group.

So we need to sort the data.


Sorting is where most people ruin the database. Before we sort, we should save the database, so that if we make a mistake we can go back to the saved copy. Then highlight ALL the data EXCEPT the cells that contain calculations.

The most common mistake is not highlighting all the data. If only some cells are highlighted, only those cells will be sorted, so the values no longer line up with the people they belong to. This makes a complete mess of the database.

Alternatively, if we highlight the formulas and calculations we have created, they will be sorted in with the collected data, which also makes a mess of the data.

So highlight the column titles (username, gender, etc.) and ALL the cells with collected data (not the formulas at the bottom of the database).

Select Data from the headings at the top

Select Sort

Under "My data range has", select "Header row"

Go to "Sort by" and select "group" by clicking on the arrow.

Then press "OK"
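
For readers who also keep their data outside a spreadsheet, the same sort can be sketched in Python. This is only an illustration: the column names and the four participants below are hypothetical, not the real database. It shows the two points made above: the header row stays on top, and whole rows move together so each person's values stay lined up.

```python
# Hypothetical data: column titles first, then one row per participant.
header = ["username", "gender", "group", "age"]
rows = [
    ["p01", "F", 1, 54],
    ["p02", "M", 0, 61],
    ["p03", "F", 0, 47],
    ["p04", "M", 1, 58],
]


def sort_by_column(header, rows, column):
    """Return the rows sorted by one column, moving each whole row together."""
    index = header.index(column)
    # sorted() returns a new list, so the original rows are left untouched
    # (the equivalent of keeping a saved copy). It is also stable, so people
    # in the same group keep their original order.
    return sorted(rows, key=lambda row: row[index])


sorted_rows = sort_by_column(header, rows, "group")

# Print the header first, then the sorted rows.
for row in [header] + sorted_rows:
    print(row)
```

Sorting only the "group" column on its own would be the mistake described earlier: the group numbers would move but the usernames and ages would not.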


We want to know the average and standard deviation of age, disease duration and age at onset for each group.

After sorting, the people in group 0 should be in rows 2 to 18.

The people in group 1 should be in rows 19 to 31.

Leave a bit of space between the end of the data and the start of the calculation and write:

Under the age at first symptoms column:





Copy this formula and paste it under the columns for age, disease duration and baseline wellbeing.
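
As a cross-check on the spreadsheet results, the average and standard deviation per group can also be sketched in Python. The values below are made up for illustration, and the sketch assumes the sample standard deviation (the n-1 form) is wanted, which the text does not state.

```python
from statistics import mean, stdev

# Hypothetical (group, age at first symptoms) pairs, already sorted by group.
records = [(0, 40), (0, 44), (0, 48), (1, 50), (1, 56), (1, 62)]


def describe_by_group(records):
    """Return {group: (average, sample standard deviation)}."""
    values_by_group = {}
    for group, value in records:
        values_by_group.setdefault(group, []).append(value)
    # stdev() uses the n-1 (sample) form; this choice is an assumption.
    return {g: (mean(v), stdev(v)) for g, v in values_by_group.items()}


summary = describe_by_group(records)
for group, (average, sd) in summary.items():
    print(f"group {group}: average = {average}, SD = {sd:.1f}")
```

The same function can be reused for each column in turn by passing in that column's values, which mirrors pasting the formula under each column.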