The dataset below is also available at www.uic.edu/classes/idsc/ids270sls/days_ill.xls.
Keep the Excel workbook open in one window and these instructions in another, so that you can switch back and forth between the two.
One rule of thumb for the number of bins that is often reasonable is:
If n is about 2^k, use k+1 bins. In this example, n = 50, which lies between 32 = 2^5 and 64 = 2^6, the fifth and sixth powers of 2. The rule of thumb therefore suggests six or seven bins.
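The arithmetic behind this rule can be checked with a short Python sketch. It is not part of the Excel exercise, and the function name suggested_bins is made up for illustration. It brackets n between two consecutive powers of 2 and applies the k+1 rule to each.

```python
import math

def suggested_bins(n):
    """Return the (low, high) bin counts from the 'n is about 2^k, use k+1 bins' rule."""
    k_low = math.floor(math.log2(n))   # largest k with 2^k <= n
    k_high = math.ceil(math.log2(n))   # smallest k with 2^k >= n
    return k_low + 1, k_high + 1

low, high = suggested_bins(50)
print(low, high)
```

For n = 50 the two values are 6 and 7, matching the six or seven bins suggested above.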
Using Excel, make histograms with different numbers of bins, as follows.
From the menu bar, choose Tools > Data Analysis > Histogram.
The min is 0, the max is 18. Accordingly, first have Excel group the data into the seven bins 0, 1-3, 4-6, 7-9, 10-12, 13-15, 16-18. This means that in a range of cells in the spreadsheet you put in the bin upper limits 0, 3, 6, 9, 12, 15, and 18.
In Excel's Histogram dialog, select the chart output option. You will get a bar chart. Right-click on a bar, go to Options, and change the gap width from 150 to 0, so that adjacent bars touch as they should in a histogram.
Next, group the data into the ten bins 0, 1-2, 3-4, 5-6, 7-8, 9-10, 11-12, 13-14, 15-16, 17-18. This means that in a range of cells in the spreadsheet you put in the bin upper limits 0, 2, 4, 6, 8, 10, 12, 14, 16, 18. Again, change the gap width in the bar chart to 0.
Finally, don't group the data. Just use the nineteen bins 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18. This means that in a range of cells in the spreadsheet you put in these numbers as the bin upper limits. Again, change the gap width in the bar chart to 0.
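As a cross-check on Excel's output, the bin counts can also be tabulated directly. The following Python sketch is not part of the original exercise. It uses the 50 values from the data listing in this handout, and the helper name bin_counts is made up for illustration. Like Excel's Histogram tool, it counts each value in the first bin whose upper limit is greater than or equal to it. It assumes that no value exceeds the last limit, so Excel's "More" row would be empty.

```python
from bisect import bisect_left

# The 50 observations, copied from the data listing in this handout.
days = [7, 6, 3, 7, 2, 4, 1, 3, 8, 3,
        5, 2, 9, 14, 2, 0, 1, 7, 8, 12,
        8, 5, 3, 6, 2, 9, 5, 6, 8, 4,
        1, 0, 6, 8, 14, 11, 17, 2, 18, 12,
        7, 12, 13, 12, 9, 5, 3, 6, 8, 5]

def bin_counts(values, upper_limits):
    """Count values per bin: above the previous limit, up to and including its own."""
    counts = [0] * len(upper_limits)
    for v in values:
        # Index of the first upper limit that is >= v.
        counts[bisect_left(upper_limits, v)] += 1
    return counts

seven = bin_counts(days, [0, 3, 6, 9, 12, 15, 18])
ten = bin_counts(days, list(range(0, 19, 2)))     # limits 0, 2, ..., 18
nineteen = bin_counts(days, list(range(0, 19)))   # limits 0, 1, ..., 18

for name, counts in [("7 bins", seven), ("10 bins", ten), ("19 bins", nineteen)]:
    print(name, counts)
```

Each list of counts should sum to 50 and should agree with the Frequency column that Excel reports for the same bin limits.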
What are some advantages and disadvantages of the three histograms?
# DaysILL DAT
#
# Adapted from Kenkel
# This version Copyright (C) 1995 Stanley Louis Sclove
#
# Days work lost due to illness, one year period
# Sample of 50 coal miners
7
6
3
7
2
4
1
3
8
3
5
2
9
14
2
0
1
7
8
12
8
5
3
6
2
9
5
6
8
4
1
0
6
8
14
11
17
2
18
12
7
12
13
12
9
5
3
6
8
5