Dynamically count words in chapter and insert word count at start of chapter


Solution 1

If the file count.tex is not created, the command in \write18 is probably not being run at all, so the first thing to check is that TeXcount is actually being executed.

First, for \write18 to execute, LaTeX must be run with a command-line option, either --enable-write18 or --shell-escape, as explained in the TeXcount FAQ.
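
To check from inside the document whether shell escape is active, one option is the shellesc package, which exposes the status as \ShellEscapeStatus (0 disabled, 1 enabled, 2 restricted). This is a minimal sketch and not part of the original setup:

```latex
\documentclass{article}
\usepackage{shellesc} % provides \ShellEscapeStatus
\begin{document}
% \ifcase selects a branch by number: 0, then \or 1, then \or 2
\ifcase\ShellEscapeStatus
  Shell escape is disabled, so \texttt{\string\write18} will not run.
\or
  Shell escape is fully enabled.
\or
  Shell escape is restricted: only whitelisted commands run.
\fi
\end{document}
```

If this reports "disabled" or "restricted", the texcount call is most likely being skipped, whatever the rest of the macro does.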

Next, try running TeXcount without the pipes, using the option -out=filename to make TeXcount write its output directly to a file. A minimal example is texcount -out=\jobname.out \jobname.tex. If no output file appears, TeXcount is probably not being run at all.
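
A minimal sketch of that test as a complete document, assuming pdfLaTeX with shell escape enabled. The file name tc-probe.out is a made-up name for illustration; it is used instead of \jobname.out because hyperref also writes a file with that name.

```latex
\documentclass{article}
\begin{document}
% Ask TeXcount to write its report straight to a file (no pipes, no redirection).
\immediate\write18{texcount -out=tc-probe.out \jobname.tex}
% Report in the log/terminal whether the file exists afterwards.
\IfFileExists{tc-probe.out}
  {\typeout{TeXcount probe: tc-probe.out exists, so the command ran.}}
  {\typeout{TeXcount probe: tc-probe.out is missing, so the command did not run.}}
Some text to count.
\end{document}
```

Delete tc-probe.out before each run, since a file left over from an earlier attempt would make the check pass even when the command is no longer executed.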

Maybe you need to provide the full path to texcount, although I think that would be unlikely to be a problem on Linux.

Is there any information in the LaTeX log? I think it should log that \write18 is being run or not, and perhaps provide some error message if something has gone seriously wrong.


Once you have TeXcount running using \write18, you need to run it with options that make it process the included files and produce statistics either per file or per section, depending on what you need.

By default, TeXcount only parses the main file, not files included through \input or \include. To make TeXcount process these, you need to use one of two options: either -inc which makes TeXcount parse each file separately, or -merge which makes it merge the files together and process it as if it was one big file.

What would be closest to the original example would be

texcount -merge -sub=section \jobname.tex

which would merge the files and produce per section summary counts. I think the grep and sed commands might work as in the example with this command: otherwise, it should work with some adjustments.

I recommend running TeXcount from the command line first to see the full output, and then adding the greps and seds to check if they work as desired.

Since your files lie in subfolders, you might want to verify what the value of \thesection is at each relevant point.

A slightly different approach could be to use per-file statistics from TeXcount by running something like

texcount -inc -brief \jobname.tex

which should return one line per file plus one for the total. Potential problems with this approach would be that you'd need the file name (or path) to extract the correct line from the TeXcount output, and that the section headers in your example would be counted as part of the main file rather than the corresponding file.


As a side note, there are other ways to provide per-section counts. One is to run TeXcount on each included file, rather than on the whole document, and grep out the relevant section. Another is to use a template to customise the output from TeXcount so that it produces LaTeX code: I'll see if I can find or come up with an example of how to do that.

Alternative solution giving counts per file

You can define an alternative file input command which does the counting per file:

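\newcommand\countinput[1]{%
  \input{#1}%
  % Count the file just read and store the result in count.txt (needs shell escape)
  \immediate\write18{texcount "#1.tex" -1 -sum > count.txt}%
  % Read the count back in; plain \input is used since \oldinput is not defined here
  \footnote{FILE #1 CONTAINS \input{count.txt} WORDS}%
}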

You may experiment with TeXcount options -1, -sum, -brief to find combinations that give you what you want. There is also the -template option for additional customisation, but that might get a bit more tricky.

You can even redefine the existing \input along these lines:

\let\oldinput=\input
\newcommand\countinput[1]{
\oldinput{#1}
\immediate\write18{texcount "#1.tex" -1 -sum > count.txt}
\footnote{FILE #1 CONTAINS \oldinput{count.txt} WORDS}
}
\let\input=\countinput

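Do note that you now have to use \oldinput instead of \input to include the counts file. The redefined \input also only accepts the braced form \input{file}: any class or package code that uses the unbraced form (\input file) will be misparsed, so it is safest to make the redefinition only after all packages have been loaded.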

For experimenting, it might be easier to use \verbatiminput from the verbatim package to include the counts file since the counts file tends to contain characters that TeX treats as special characters: eg "#". That way, you can use the full default output from TeXcount with per section counts should you wish.
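
A minimal sketch of that experiment, assuming shell escape is enabled. It uses the -merge and -sub=section options mentioned earlier; count.txt is the same scratch file name as above.

```latex
\documentclass{article}
\usepackage{verbatim} % provides \verbatiminput
\begin{document}
\section{Introduction}
Some text to count.

% Write the full TeXcount report, including per-section subcounts, to count.txt
\immediate\write18{texcount -merge -sub=section \jobname.tex > count.txt}
% Typeset the report verbatim so characters such as # and _ do no harm
\verbatiminput{count.txt}
\end{document}
```

Once the report looks right, the options can be tightened (for example to -1 -sum) and \verbatiminput replaced by a plain \input.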

Do note that the per file counts will not include the chapter headers as those are part of the main file rather than included in the subfiles.
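
Since the question asks for the count at the start of the chapter, the same idea can be reordered so that TeXcount runs before the file is read. This is only a sketch: the macro name \countinputfirst is made up for illustration, and it assumes that texcount with -1 -sum prints just a single number for the file.

```latex
\documentclass{report}
% Hypothetical variant of \countinput: count first, then input the file.
\newcommand\countinputfirst[1]{%
  % Count the words in the file before it is read (needs shell escape)
  \immediate\write18{texcount "#1.tex" -1 -sum > count.txt}%
  % Print the count at the top, directly below the chapter heading
  \noindent(Word count: \input{count.txt})\par
  \input{#1}%
}
\begin{document}
\chapter{Introduction}
\countinputfirst{sections/introduction}
\end{document}
```

As with the per-file counts above, the chapter heading itself is not included, because it lives in the main file.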

Solution 2

Here is a different approach. It is likely not as precise as what you are otherwise trying to accomplish, and it requires the invocation of a Chapter environment. The upside is that only a single pass is required, with no external file writes. It uses the listofitems approach.

\documentclass{book}
\usepackage[T1]{fontenc}
\usepackage{lmodern,listofitems,environ}
\makeatletter\let\gobble\@gobble\makeatother
\NewEnviron{Chapter}[1]{%
  \chapter{#1}
  \ignoreemptyitems%
  \setsepchar{ ||\par||\ }%
  \let\z\expandafter%
  \z\z\z\greadlist\z\z\z\thewordcount\z\z\z{\z\gobble\BODY}%
  \noindent Chapter word count is: \listlen\thewordcount[]%

  \z\z\z\greadlist\z\z\z\thewordcount\z\z\z{\z\gobble\BODY\ #1}%
  \noindent Chapter word count (including title) is: \listlen\thewordcount[]%
  \vspace{\baselineskip}%

  \BODY
}
\begin{document}
\begin{Chapter}{Here is a chapter}

When in the course of human events, we resort to math
\begin{equation}
 E = m c^2
\end{equation}
Note that \textit{arguments only count as a single word}!
And now we are done
\end{Chapter}

\vspace{1in}
This is how the words are digested:

\showitems*\thewordcount
\end{document}
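
The counting mechanism in the example above can be seen in isolation in a smaller document. This is a stripped-down sketch, not the answer's code: it splits a fixed string at spaces only and prints how many items result (\words is an arbitrary name).

```latex
\documentclass{article}
\usepackage{listofitems}
\begin{document}
\ignoreemptyitems % consecutive spaces do not create empty "words"
\setsepchar{ }    % split the input at spaces
\readlist\words{When in the course of human events}
% \listlen with empty brackets gives the number of top-level items
Number of words: \listlen\words[]
\end{document}
```

The Chapter environment applies the same splitting to the whole chapter body, with \par and "\ " as additional separators.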


Author by

ratuk_

Updated on August 01, 2022

Comments

  • ratuk_
    ratuk_ over 1 year

    TL;DR Einar's first alternative solution of making a new input command that automatically generates counts at the end of each input section is what I used to generate counts at the end of each chapter:

    \newcommand\countinput[1]{
    \input{#1}
    \immediate\write18{texcount "#1.tex" -1 -sum > count.txt}
    \footnote{FILE \#1 CONTAINS \input{count.txt} WORDS}
    }
    

    However, this creates a new command and means counting the entire doc is then harder because you're using 'countinput' instead of 'input'; but for now that's fine for my purposes and my problem of counting each chapter is solved in the short term.

    Meanwhile, never mind the need to enable write18 ... you need to make sure you've done --shell-escape (not --shell-escapee as Einar wrote and I kept trying, urggg!) or checked the relevant box in the settings if you're using Atom (screenshot not preserved).

    Question:

    I am trying to adapt this suggestion of how to dynamically count words in sections, but I want to use it for a large report format doc (a thesis) where it displays a count of the words in each chapter (and at the start of the chapter, if possible).

    Their solution proposes this using texcount:

    \documentclass{article}
    \newcommand\wordcount{
        \immediate\write18{texcount -sub=section \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
    (\input{count.txt}words)}
    
    \begin{document}
    \section{Introduction}
    In publishing and graphic design, lorem ipsum is placeholder text (filler text) commonly used to demonstrate the graphics elements of a document or visual presentation, such as font, typography, and layout. The lorem ipsum text is typically a section of a Latin text by Cicero with words altered, added and removed that make it nonsensical in meaning and not proper Latin.
    
    \wordcount
    \section{Main Stuff}
    Even though "lorem ipsum" may arouse curiosity because of its resemblance to classical Latin, it is not intended to have meaning. Where text is comprehensible in a document, people tend to focus on the textual content rather than upon overall presentation, so publishers use lorem ipsum when displaying a typeface or design elements and page layout in order to direct the focus to the publication style and not the meaning of the text. In spite of its basis in Latin, use of lorem ipsum is often referred to as greeking, from the phrase "it's all Greek to me," which indicates that this is not meant to be readable text.
    
     \wordcount
    \section{Conclusion}
    Today's popular version of lorem ipsum was first created for Aldus Corporation's first desktop publishing program Aldus PageMaker in the mid-1980s for the Apple Macintosh. Art director Laura Perry adapted older forms of the lorem text from typography samples — it was, for example, widely used in Letraset catalogs in the 1960s and 1970s (anecdotes suggest that the original use of the "Lorem ipsum" text was by Letraset, which was used for print layouts by advertising agencies as early as the 1970s.) The text was frequently used in PageMaker templates.
    
    \wordcount
    \end{document}
    

    However, my document has chapters where the main tex file works as follows:

    \documentclass{report}
    \newcommand\wordcount{
        \immediate\write18{texcount -sub=section \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
    (\input{count.txt}words)}
    
    \begin{document}
    
    \chapter{Introduction}
    \input{sections/introduction}
    \wordcount
    
    \chapter{Many Chapters}
    \input{sections/chapters}
    \wordcount
    
    \chapter{Conclusion}
    \input{sections/conclusion}
    \wordcount
    

    Now I know that the \wordcount command should probably sit inside the chapter files themselves, and at the start of the file if that's where I want it, but either way it fails.

    And, it fails because "File 'count.tex' not found."

    If it is relevant, I use atom with latex package.

    Is there any way I can adapt @Jake's solution to suit my use case?

    EDIT:

    Following Einar's solution of using this in a magic comment

    --enable-write18
    

    I can now get the module to count the words in the main tex file (not many, just abstract), but I still can't seem to get it to work for specific chapters or sections as per original need. If I do the original method above it just says

    ( words)
    

    in the output file.

    EDIT 2: Just to clarify, my current code still generates the blank count output above (I am calling \wordcount at the end of each chapter file, e.g. chapter1.tex, which is called from main.tex via \input{sections/chapter1.tex}), and my code is:

    In the main.tex file:

    % !TEX --enable-write18
    
    \documentclass{report}
    
    \newcommand\wordcount{
        \immediate\write18{texcount -merge -sub=section  \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
    (\input{count.txt}words)}
    
    \begin{document}
    
    \chapter{Introduction}
    \input{sections/introduction.tex}
    
    \chapter{Many Chapters}
    \input{sections/chapter1.tex}
    
    \chapter{Conclusion}
    \input{sections/conclusion.tex}
    

    At the end of each chapter, e.g. chapter1.tex:

    A little bit of intro text.
    
    \section{Section 1}
    
    Lorem ipsum la la la and all that.
    
    \section{Section 2}
    
    Lorem ipsum la la la and all that. Except this section might be longer, more lorem ipsum dolor lalala sit amet, and so on.
    
    \wordcount
    

    However, this does not produce a count (blank again as per above), nor does it work with:

    \newcommand\wordcount{
        \immediate\write18{texcount -inc -brief -sub=section  \jobname.tex  | grep "Section" | sed -e 's/+.*//' | sed -n \thesection p > 'count.txt'}
    (\input{count.txt}words)}
    

    Therefore... it seems everything works with texcount but the problem is all in my \newcommand settings... and in essence, I am back to my original question -- how do I setup this \newcommand to call texcount to count the text in the chapter (e.g. chapter1.tex) and display it in the chapter?

    EDIT 3:

    A big problem is that the merge command isn't including or counting the content of the \input chapter files, only their chapter titles, e.g. output of standard texcount on main.tex is:

    File: main.tex
    Encoding: ascii
    Words in text: 333
    Words in headers: 30
    Words outside text (captions, etc.): 2
    Number of headers: 12
    Number of floats/tables/figures: 0
    Number of math inlines: 0
    Number of math displayed: 0
    Subcounts:
    text+headers+captions (#headers/#floats/#inlines/#displayed)
    0+9+2 (1/0/0/0) _top_
    333+1+0 (1/0/0/0) Chapter: Abstract
    0+1+0 (1/0/0/0) Chapter: Outline
    0+1+0 (1/0/0/0) Chapter: Introduction
    0+2+0 (1/0/0/0) Chapter: Some Stuff
    0+1+0 (1/0/0/0) Chapter: Conclusion
    

    EDIT 4:

    After trying Einar's alternative solutions I am still stumped, mainly because texcount is still NOT counting the input files from the sections directory... for example, the first solution looked promising as:

    \newcommand\countinput[1]{
    \input{#1}
    \immediate\write18{texcount "#1.tex" -1 -sum > count.txt}
    \footnote{FILE #1 CONTAINS \input{count.txt} WORDS}
    }
    

    (note I have changed \oldinput{count.txt} to \input{count.txt} because \oldinput was an undefined control sequence)

    and

    \chapter{Outline}
    \countinput{sections/outline}
    

    I still get no word count (and count.txt is produced but remains empty).

    If I try the second alternative solution I get bigger problems: "no file b.tex" ?!?!

    I am sure the first alternative suggestion would work, if only texcount would actually count the words in files, e.g. "sections/file.tex" etc.

    EDIT 5:

    The problem appears to be twofold, compounded by an issue with Atom somehow not being able to create and write to count.txt ... it seems to be a permissions issue that shell-escapee and enabling write18 don't solve. I am looking into this... but will mark Einar's solution as sound... it essentially works but just not for me :(

    • Einar Rødland
      Einar Rødland about 5 years
      To debug, you should run the texcount command on the command line, then add the pipes one by one and see when problems start. For one, the grep should probably look for "Chapter" rather than "Section" since you're using chapters. Make sure the whole command line does what you expect it to before you start running it from within LaTeX.
    • ratuk_
      ratuk_ about 5 years
      I understand what you're saying but at the moment it simply isn't counting the included chapters... if it did then grep etc. would be fine presumably.
    • Einar Rødland
      Einar Rødland about 5 years
      The TeXcount output looks like the files are not included. You could run with the option -v to get the whole TeX code output which might help indicate what is being parsed. You can also try running TeXcount on the command line with -inc instead of -merge to verify that TeXcount finds the files. Note that TeXcount only tries to mimic what TeX/LaTeX does, without doing the actual TeX processing, so it's fully possible that something can go badly wrong in ways that does not immediately make sense from a TeX perspective.
    • Einar Rødland
      Einar Rødland about 5 years
      Even with the correct output from TeXcount, I had some difficulty getting the grep+sed to work as \thesection, or \thechapter which would be more appropriate for you, seems to return the chapter number rather than the title. Instead, I'm adding an alternative approach.
    • ratuk_
      ratuk_ about 5 years
      Hi Einar, I really appreciate your efforts but texcount is simply not counting the words in the files still (see edit 4 above)
    • ratuk_
      ratuk_ about 5 years
      ...I think this is all down to some weird permissions thing about running texcount from Atom... I am not sure what it is or how to fix but I think that's why texcount won't run properly and keep writing to count.txt
    • Einar Rødland
      Einar Rødland about 5 years
      You might test running texcount with the option -out=count.txt instead of using > count.txt. If the file does not get written, it is likely texcount didn't run at all. You could try providing the complete path to texcount, or specify the command as perl texcount (maybe with whole path to texcount) since it is actually Perl that is executed. You could also try replacing texcount with some standard instruction (eg ls, pwd, or echo ... combined with > count.txt) to check if the command actually gets run by TeX.
    • ratuk_
      ratuk_ about 5 years
      Hi Einar -- thanks for solution... I've explained what my problem was at the top of my question (see TL;DR haha). And for now I am happy with your 'first alternative solution'... it's not ideal bc it changes a command but it'll do and is definitely the best thing I have that works. Thank you!
    • Einar Rødland
      Einar Rødland about 5 years
      If you don't want to replace \input with \countinput, you can use the last version of the code in my answer. This redefines the \input macro to that of \countinput.
  • ratuk_
    ratuk_ about 5 years
    the file count.tex is now created, but it's empty (see edits to original question above)
  • Einar Rødland
    Einar Rødland about 5 years
    @ratuk_: Is that with grep+sed or without? What do you get if you run TeXcount directly on the command line, ie without piping it through grep+sed?
  • ratuk_
    ratuk_ about 5 years
    If I run texcount normally I only get the count for the main.tex file, not the subsections/chapter files in my 'sections' folder. It's the "adjust the grep and/or sed commands" part that I am struggling with...
  • Einar Rødland
    Einar Rødland about 5 years
    @ratuk_: Oh, I totally missed this part! TeXcount by default only processes the main file, not the included files. I have added explanation of how to use options -merge or -inc to add the included files to the count.
  • ratuk_
    ratuk_ about 5 years
    Hi Einar, thanks for adding the info -- I've clarified where I am at in 'EDIT2' ... unfortunately your suggestion doesn't work, but I feel like it's nearly there!
  • ratuk_
    ratuk_ about 5 years
    Appreciate the suggestion very much, especially since texcount isn't playing ball, but I really need a proper count solution that is at least a texcount equivalent :(
  • Steven B. Segletes
    Steven B. Segletes over 2 years