The version in wc/wc.c in this repository sits at 43 lines. It reads from stdin when it is given no arguments, and uses the POSIX `read` function to read one character at a time. It has no flags; instead, the extra/ folder holds separate utilities for counting characters and lines, at 33 and 35 lines of code respectively. This version also has little error checking.
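To make the "one `read` per character" approach concrete, here is a minimal sketch. It is not the code in wc/wc.c: the function name `count_words_fd` and the choice of space, tab and newline as the only separators are assumptions made for illustration.

```c
#include <unistd.h>

/* Hypothetical helper (not the repository's code): count the words
 * available on file descriptor fd, issuing one read() per byte.
 * Only space, tab and newline are treated as separators here.
 * Returns the word count, or -1 if read() reports an error. */
long count_words_fd(int fd)
{
    long words = 0;
    int in_word = 0;
    char c;
    ssize_t n;

    while ((n = read(fd, &c, 1)) == 1) {
        if (c == ' ' || c == '\t' || c == '\n') {
            in_word = 0;            /* separator ends the current word */
        } else if (!in_word) {
            in_word = 1;            /* first byte of a new word */
            words++;
        }
    }
    return n < 0 ? -1 : words;
}
```

A `main` built on this would pass `STDIN_FILENO` when `argc == 1`, and otherwise the descriptor returned by `open(argv[1], O_RDONLY)`. One system call per byte is slow, but it keeps the program very short.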
[Here](https://github.com/dspinellis/unix-history-repo/blob/Research-V7-Snapshot-Development/usr/src/cmd/wc.c) is a version of wc from UNIX V7, at 86 lines. It allows for counting characters, words and lines. I couldn't find a version in UNIX V6, so I'm guessing this is one of the earliest versions of this program (?). It decides to read from stdin if the number of arguments fed to it is zero, and reads character by character using the standard C `getc` function.
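For comparison, a `getc`-based loop that tracks all three counts might look like the sketch below. It is written from scratch in the spirit of the description above, not copied from the V7 source; `struct counts` and `count_stream` are hypothetical names.

```c
#include <stdio.h>

/* Hypothetical result type: the three totals wc reports. */
struct counts {
    long lines;
    long words;
    long chars;
};

/* Hypothetical helper: read fp one character at a time with getc()
 * and count lines, words and characters (bytes). Only space, tab
 * and newline are treated as word separators. */
struct counts count_stream(FILE *fp)
{
    struct counts n = {0, 0, 0};
    int c;              /* int, not char, so EOF can be told apart */
    int in_word = 0;

    while ((c = getc(fp)) != EOF) {
        n.chars++;
        if (c == '\n')
            n.lines++;
        if (c == ' ' || c == '\t' || c == '\n') {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            n.words++;
        }
    }
    return n;
}
```

Because `getc` reads through stdio's buffer, this makes far fewer system calls than a `read` per byte, at the cost of depending on the C standard library.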
The busybox version ([git.busybox.net](https://git.busybox.net/busybox/tree/coreutils/wc.c)) of wc sits at 257 lines (162 with comments stripped). It strives to be [POSIX-compliant](https://pubs.opengroup.org/onlinepubs/9699919799/), meaning it has a fair number of flags and a bit of complexity. It reads character by character using the standard `getc` function, and decides whether to read from stdin using its own `fopen_or_warn_stdin` function. It uses two `goto`s to get around, and has some incomplete Unicode support.
The [Plan 9](https://9p.io/sources/plan9/sys/src/cmd/wc.c) version implements some sort of table method in 331 lines. It uses Plan 9 rather than Unix libraries and methods, and seems to read from stdin if the number of args is 0.
The plan9port version of wc ([github](https://github.com/9fans/plan9port/blob/master/src/cmd/wc.c)) also implements some sort of table method, in 352 lines. It reads from stdin if the number of args is 0, and uses the POSIX `read` function to read character by character.
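I haven't traced how the Plan 9 tables actually work, so the following is only a generic illustration of what a table-driven counter can look like, not a reconstruction of that code. The names `is_space` and `count_words_table` are hypothetical, and the table only knows about space, tab and newline.

```c
#include <stddef.h>

/* Hypothetical lookup table: nonzero for bytes that separate words.
 * Every entry not listed is zero-initialized. */
static const unsigned char is_space[256] = {
    [' ']  = 1,
    ['\t'] = 1,
    ['\n'] = 1,
};

/* Hypothetical helper: count words in buf[0..len) with one table
 * lookup per byte instead of a chain of comparisons. */
long count_words_table(const unsigned char *buf, size_t len)
{
    long words = 0;
    int in_word = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        if (is_space[buf[i]]) {
            in_word = 0;
        } else if (!in_word) {
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

The appeal of a table is that adding more byte classes means adding entries rather than branches, which matters more once multi-byte encodings are involved.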
The GNU coreutils version ([github](https://github.com/coreutils/coreutils/tree/master/src/wc.c), [savannah](http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/wc.c;hb=HEAD)) is a bit over 1K lines of C. It does many things and checks many possible failure modes. I think it detects whether it should be reading from stdin using some heavily wrapped `fstat`, and it reads character by character using its own custom function.
So this utility started out reasonably small, then got more and more complex. [The POSIX committee](https://pubs.opengroup.org/onlinepubs/9699919799/) ended up codifying that implementation, and now we are stuck with it, because even implementations like busybox, which strive to be quite small, try to keep to POSIX.
Does one really need to spend 1K lines of C code to count characters, words and lines? There are many versions of this rant one could give, but the best, and probably the best known, is [this one](to do: locate) by cat-v, a site named after the `cat -v` flag as a symbol of the explosion of options.
[ add sad busybox comment on its cat implementation ]
- [ ] Possible follow-up: Write simple versions for other coreutils. <https://git.busybox.net/busybox/tree/coreutils/>? Would be a really nice project.
- Get it working on a DuskOS/CollapseOS machine? Or, find a minimalistic kernel that could use them?
- [ ] add chc, or charcounter (cc is "c compiler")
- [ ] Pitch to lwn.net?
Discarded:
- ~~[ ] Could use zig? => Not for now~~
- ~~Maybe make some pull requests, if I'm doing something better? => doesn't seem like it~~