Histogram
A histogram is a graphical representation of the distribution of a continuous variable.
A histogram is only an estimate of the distribution. To construct one, you divide the range of the variable's values into a sequence of equal-length intervals (bins) and then count the values that fall into each bin. Histograms can display counts, or they can display proportions, which are counts divided by the total number of values.
Because a histogram is built from bins, its main parameter is the bin width. The bin width is not given directly; it is computed from the number of bins and from the minimum and maximum of the range of values. The range of values can be computed automatically from the data, or it can be specified when the histogram is built.
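The computation described above can be sketched in plain Java. This is a minimal illustration, not the library's implementation: the class BinWidthSketch and its methods are hypothetical names, and the choices to skip values outside the range and to place the maximum in the last bin are assumptions.

```java
// Hypothetical helper for illustration only; not part of the plotting library.
final class BinWidthSketch {

    // Width of each bin when [min, max] is split into `bins` equal intervals.
    static double binWidth(double min, double max, int bins) {
        if (bins <= 0 || !(max > min)) {
            throw new IllegalArgumentException("need bins > 0 and max > min");
        }
        return (max - min) / bins;
    }

    // Counts the values that fall into each bin.
    // Assumption: values outside [min, max] and NaN values are skipped.
    static int[] counts(double[] values, double min, double max, int bins) {
        double width = binWidth(min, max, bins);
        int[] counts = new int[bins];
        for (double v : values) {
            if (Double.isNaN(v) || v < min || v > max) {
                continue;
            }
            int index = (int) ((v - min) / width);
            // Assumption: the maximum itself is placed in the last bin.
            index = Math.min(index, bins - 1);
            counts[index]++;
        }
        return counts;
    }

    // Proportions are counts divided by the total number of values.
    static double[] proportions(int[] counts, int total) {
        double[] result = new double[counts.length];
        for (int i = 0; i < counts.length; i++) {
            result[i] = (double) counts[i] / total;
        }
        return result;
    }
}
```

For a range of 0 to 10 split into 40 bins, binWidth returns 0.25, which is the width used in Example 2 below.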
The number of bins can be omitted, in which case it is also estimated from the data, using the Freedman-Diaconis rule. See the Freedman-Diaconis Wikipedia page for more details.
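As a rough sketch of the rule: Freedman-Diaconis sets the bin width to 2 * IQR / n^(1/3), where IQR is the interquartile range and n is the number of values; the number of bins is then the range divided by that width, rounded up. The code below is an illustration under assumptions, not the library's code: the class and method names are hypothetical, and quartiles are computed by linear interpolation between order statistics, which is only one of several common conventions and may differ from what the library does.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of the plotting library.
final class FreedmanDiaconisSketch {

    // Quantile of an already sorted array, by linear interpolation
    // between neighbouring order statistics (an assumed convention).
    static double quantile(double[] sorted, double p) {
        double pos = p * (sorted.length - 1);
        int lo = (int) Math.floor(pos);
        int hi = (int) Math.ceil(pos);
        return sorted[lo] + (pos - lo) * (sorted[hi] - sorted[lo]);
    }

    // Bin width h = 2 * IQR / n^(1/3).
    static double binWidth(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double iqr = quantile(sorted, 0.75) - quantile(sorted, 0.25);
        return 2.0 * iqr / Math.cbrt(sorted.length);
    }

    // Number of bins needed to cover [min, max] with that width.
    static int binCount(double[] values) {
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        double range = sorted[sorted.length - 1] - sorted[0];
        double width = binWidth(values);
        if (!(width > 0)) {
            return 1; // degenerate case: the IQR is zero, fall back to one bin
        }
        return Math.max(1, (int) Math.ceil(range / width));
    }
}
```

Because the width depends on the interquartile range rather than on the full range, a few extreme values widen the covered interval but do not change the width of individual bins.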
Example 1
Scope: Build a histogram with default values to estimate the pdf of the sepal-length variable from the iris data set.
Solution:
WS.draw(hist(iris.getVar("sepal-length")));
Example 2
Scope: Build two overlapping histograms to estimate the pdf of the sepal-length and petal-length variables from the iris data set. We want bins of width 0.25 over the range 0 to 10, colored red and blue, with strong transparency so that both histograms remain visible.
Solution:
WS.draw(plot(alpha(0.3f))
        .hist(iris.getVar("sepal-length"), 0, 10, bins(40), color(1))
        .hist(iris.getVar("petal-length"), 0, 10, bins(40), color(2))
        .legend(7, 20, labels("sepal-length", "petal-length"), color(1, 2))
        .xLab("variable"));
plot(alpha(0.3f)) - builds an empty plot; this is used only to pass a default alpha value to all plot components, otherwise the plot construct would not be needed
hist - adds a histogram to the current plot
iris.getVar("sepal-length") - the variable used to build the histogram
0, 10 - specifies the range used to compute the bins
bins(40) - specifies the number of bins for the histogram
color(1) - specifies the color used to draw the histogram, which is the color with index 1 in the color palette (red in this case)
legend(7, 20, ...) - draws a legend at the specified coordinates; the values are in the units of the data
labels(..) - specifies the labels for the legend
color(1, 2) - specifies the colors for the legend
xLab - specifies the label text for the horizontal axis