pgfplots: histogram density too small by a factor of 10

First: as you should already know from your linked question, your example using hist=density already gives the desired result, namely a density normalized to a total mass (area) of 1.

(You can check this roughly by imagining a triangle from (20, 0.02) to (110, 0) and back to (20, 0); the axis shows 0.02 as "2" with a scale factor of 10^-2. Its area is about 90*0.02/2 = 0.9, which is roughly 1. Another "proof": multiply all values by 10 by appending a "0" to each number. Plotting this with hist=density then shows all x values multiplied by 10 and all y values divided by 10, so the area stays the same.)
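
The estimate above can be written out as a short worked calculation. This is only a sketch: the symbols $N$ (number of data points), $n_i$ (count in bin $i$), $\Delta x$ (bin width) and $d_i$ (bar height) are notation introduced here, not pgfplots names, and equal-width bins are assumed.

```latex
% Bar height in density mode (equal bin widths assumed),
% so that the bar areas sum to 1:
\[
  d_i = \frac{n_i}{N\,\Delta x}
  \qquad\Longrightarrow\qquad
  \sum_i d_i\,\Delta x = \frac{1}{N}\sum_i n_i = 1 .
\]
% Rough triangle check with base 110 - 20 = 90 and height 0.02:
\[
  \frac{1}{2}\cdot 90 \cdot 0.02 = 0.9 \approx 1 .
\]
% Multiplying every data value by 10 widens the bins and lowers the bars:
\[
  x \mapsto 10\,x
  \quad\Longrightarrow\quad
  \Delta x \mapsto 10\,\Delta x ,
  \qquad
  d_i \mapsto \frac{n_i}{N\cdot 10\,\Delta x} = \frac{d_i}{10} .
\]
```

In both cases the product of bar height and bar width is unchanged, which is why the total area stays at 1.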

Now back to your "problem": to avoid counting the data points in your data file by hand, you can use the \pgfplotstablegetrowsof command of the pgfplotstable package. It stores the number of rows in \pgfplotsretval. For more details, have a look at the comments in the code.

\documentclass[border=2mm]{standalone}
\usepackage{pgfplots}
\usepackage{pgfplotstable}
\begin{filecontents*}{data.txt}
18
30
64
68
27
29
99
19
27
72
64
37
62
50
104
39
55
37
37
30
37
115
51
32
31
59
50
19
43
40
82
24
26
28
62
24
56
63
35
58
18
32
53
44
42
25
31
38
52
62
51
49
47
89
26
58
41
39
37
31
53
54
62
18
53
22
20
22
65
51
52
23
39
29
37
77
31
46
34
23
28
96
27
34
69
30
33
34
72
32
41
37
48
41
57
42
31
30
39
18
81
23
101
80
45
20
39
20
30
57
96
50
28
68
53
55
70
56
40
45
91
70
32
26
49
40
20
38
47
48
31
23
42
43
56
37
34
33
36
41
82
49
29
31
28
23
26
53
49
24
90
28
50
57
51
25
40
100
58
30
53
43
44
88
78
85
21
56
41
37
26
52
30
68
21
46
33
76
64
53
51
123
24
45
114
31
47
58
30
50
80
25
24
20
28
43
25
41
51
40
24
52
112
25
20
131
77
66
38
72
46
29
38
41
55
48
24
29
31
18
40
63
49
34
18
49
30
67
32
31
38
54
22
25
62
78
67
45
76
29
128
27
44
95
120
51
56
47
26
61
44
39
31
23
21
42
122
29
70
28
37
33
39
34
25
23
24
33
135
43
74
68
24
25
49
18
19
24
23
49
54
78
77
98
30
56
52
36
46
80
24
74
93
83
36
59
110
22
50
23
45
68
71
78
54
46
62
62
77
34
87
44
36
85
20
41
31
76
27
34
26
45
24
42
\end{filecontents*}
\begin{document}
    \begin{tikzpicture}
        % read the data file and store it in `\data'
        \pgfplotstableread{data.txt}\data
        % get number of data points in `\data' ...
        \pgfplotstablegetrowsof{\data}
        % ... and store it in `\N'
        \pgfmathsetmacro{\N}{\pgfplotsretval}
        \begin{axis}[
            ybar,
            % print y values in percent
            % (see e.g. <https://tex.stackexchange.com/a/87431/95441>)
            yticklabel={%
                \pgfmathparse{\tick*100}%
                \pgfmathprintnumber{\pgfmathresult}\,\%},
        ]
            % (just for debugging purposes:
            % show the number of rows in the upper right corner)
            \node [draw,gray,anchor=north east,align=left,font=\scriptsize]
                at (axis description cs:0.98,0.98) {Number of \\ data rows: \\ \N};

            \addplot+ [
                hist,
                % calculate the relative frequency per bin (a fraction of the
                % data, not a density normalized to a mass of 1) by dividing
                % the bin count by the number of data points
                y filter/.expression={y/\N},
            ] table [y index=0] {\data};
        \end{axis}
    \end{tikzpicture}
\end{document}

image showing the result of above code
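
As a hedged side note on what the y filter in the code above computes: writing $N$ for the number of data rows, $n_i$ for the count in bin $i$ and $\Delta x$ for the bin width (notation introduced here, not pgfplots names, and assuming equal-width bins), the plotted fraction and the hist=density value differ only by the factor $\Delta x$.

```latex
% Fraction of the data in bin i (what y/\N yields)
% versus the density value of the same bin:
\[
  \frac{n_i}{N} = \frac{n_i}{N\,\Delta x}\cdot\Delta x .
\]
% Example with an approximate first-bar density of 0.02 and width of 10:
\[
  0.02 \cdot 10 = 0.2 = 20\,\% .
\]
```

So a density reading of 0.02 and a bin fraction of 20% describe the same bar once the bin width of roughly 10 is taken into account.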

Author: Mace

Updated on September 10, 2020

Comments

  • Mace
    Mace about 3 years

    I have a set of data which I want to plot using the hist=density feature. The first two bins which are created already contain more than 20% of the data each. Unfortunately, the output graphic states only 2%.

    Interestingly, if you put hist={density,cumulative}, the calculated values are correct.

    Here is my MWE including the sample data:

    \documentclass{standalone}
    \usepackage{pgfplots,pgfplotstable}
    \pgfplotsset{compat=1.13}
    \usepackage{filecontents}
    %
    \begin{filecontents*}{data}
    18
    30
    64
    68
    27
    29
    99  
    \end{filecontents*}
    %
    \begin{document}
        \begin{tikzpicture}
            \begin{axis}[ybar]
                \addplot +[hist=density] table [y index=0] {data};
            \end{axis}
        \end{tikzpicture}
    %
        \begin{tikzpicture}
            \begin{axis}[ybar]
                \addplot +[hist={density,cumulative}] table [y index=0] {data};
            \end{axis}
        \end{tikzpicture}
    \end{document}
    

    This is the output: graphs created using the code above

    The workaround of calculating the density manually from this question works well, but as I have a lot of data sets with varying numbers of values, I would greatly appreciate it if anybody could help me solve this issue. Thanks!

    • Mace
      Mace about 7 years
      I edited the amount of data in the MWE but didn't change the graphic.
  • Mace
    Mace about 7 years
    Please correct me if I'm wrong, but I still think that my "problem" actually exists and that I described it correctly: my chart looks like yours, but as you can see, the y scale is set to 10^-2. Hence, the area would add up to 0.1 instead of 1.
  • Stefan Pinnow
    Stefan Pinnow about 7 years
    Sorry, but I think you are wrong. Do you agree that $2 \cdot 10^{-2} = 0.02$? With an approximate $\Delta x$ of the first bar of 10 the area under that bar is 0.02*10 = 0.2 = 20%, right?
  • Mace
    Mace about 7 years
    Now I get it. I certainly do agree with your answer. Apparently I misunderstood the behaviour of the density function. I neglected the width of the bins, assuming that the sum of the y values alone should add up to 1. Thanks for the solution and the explanation!