README tweaks

NunoSempere 2023-09-15 11:21:05 +03:00
parent 59ad6bda13
commit 4d31e2dbf7


@@ -2,18 +2,18 @@
## Desiderata
- Simplicity: Just count words as delimited by spaces, tabs, newlines.
- Simple: Just count words as delimited by spaces, tabs, newlines.
- Allow: reading files, piping to the utility, and reading from stdin.
- Separate utilities for counting different things, like lines and characters.
- Separate utilities for counting different things, like lines and characters, into their own tools.
- Avoid off-by-one errors.
- Linux only.
- Small.
## Comparison with other historical versions of wc
The version in wc/wc.c in this repository sits at 43 lines. It decides to read from stdin if the number of arguments fed to it is otherwise zero, and uses the Linux `read` function to read character by character. It doesn't have flags; instead, there are further utilities in the extra/ folder for counting characters and lines, sitting at 33 and 35 lines of code, respectively. This version also has little error checking.
The version in wc/wc.c in this repository sits at 43 lines. It decides to read from stdin if the number of arguments fed to it is otherwise zero, and uses the Linux `read` function to read character by character. It doesn't have flags; instead, there are further utilities in the src/extra/ folder for counting characters and lines, sitting at 33 and 35 lines of code, respectively. This version also has little error checking.
[Here](https://github.com/dspinellis/unix-history-repo/blob/Research-V7-Snapshot-Development/usr/src/cmd/wc.c) is a version of wc from UNIX V7, at 86 lines. It allows for counting characters, words and lines. I couldn't find a version in UNIX V6, so I'm guessing this is one of the earliest versions of this program (?). It decides to read from stdin if the number of arguments fed to it is zero, and reads character by character using the standard C `getc` function.
[Here](https://github.com/dspinellis/unix-history-repo/blob/Research-V7-Snapshot-Development/usr/src/cmd/wc.c) is a version of wc from UNIX V7, at 86 lines. It allows for counting characters, words and lines. I couldn't find a version in UNIX V6, so I'm guessing this is one of the earliest versions of this program. It decides to read from stdin if the number of arguments fed to it is zero, and reads character by character using the standard C `getc` function.
The busybox version ([git.busybox.net](https://git.busybox.net/busybox/tree/coreutils/wc.c)) of wc sits at 257 lines (162 with comments stripped), while striving to be [POSIX-compliant](https://pubs.opengroup.org/onlinepubs/9699919799/), meaning it has a fair number of flags and a bit of complexity. It reads character by character by using the standard `getc` function, and decides to read from stdin or not using its own `fopen_or_warn_stdin` function. It uses two GOTOs to get around, and has some incomplete Unicode support.
@@ -21,7 +21,9 @@ The [plan9](https://9p.io/sources/plan9/sys/src/cmd/wc.c) version implements som
The plan9port version of wc ([github](https://github.com/9fans/plan9port/blob/master/src/cmd/wc.c)) also implements some sort of table method, in 352 lines. It reads from stdin if the number of args is 0, and uses the Linux `read` function to read character by character.
The GNU utils version ([github](https://github.com/coreutils/coreutils/tree/master/src/wc.c), [savannah](http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/wc.c;hb=HEAD)) is a bit over 1K lines of C. It does many things and checks many possible failure modes. I think it detects whether it should be reading from stdin using some very wrapped fstat, and it reads character by character using its own custom function.
The [FreeBSD version](https://cgit.freebsd.org/src/tree/usr.bin/wc/wc.c) sits at 367 lines. It has enough new things that I can't parse all that it's doing: in lines 137-143, what is capabilities mode? What is casper? Otherwise, it decides whether to read from stdin by the number of arguments, in line 157. It uses a combination of fstat and read, depending on the type of file.
Finally, the GNU utils version ([github](https://github.com/coreutils/coreutils/tree/master/src/wc.c), [savannah](http://git.savannah.gnu.org/gitweb/?p=coreutils.git;a=blob;f=src/wc.c;hb=HEAD)) is a bit over 1K lines of C. It does many things and checks many possible failure modes. I think it detects whether it should be reading from stdin using some very wrapped fstat, and it reads character by character using its own custom function.
So this utility started out reasonably small, then started getting more and more complex. [The POSIX committee](https://pubs.opengroup.org/onlinepubs/9699919799/) ended up codifying that implementation, and now we are stuck with it because even implementations like busybox which strive to be quite small try to keep to POSIX.
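
For comparison, the approach described above for the version in this repository (words delimited by spaces, tabs and newlines, read one character at a time with `read`) can be sketched roughly as follows. This is an illustrative sketch, not the actual contents of wc/wc.c: the function name `count_words_fd` and its return convention are assumptions made for this example.

```c
#include <unistd.h>

/* Hypothetical helper (not taken from wc/wc.c): counts words on an
 * already-open file descriptor, where a word is any run of bytes
 * delimited by spaces, tabs or newlines.
 * Returns the word count, or -1 if read() reports an error. */
long count_words_fd(int fd)
{
    long words = 0;
    int in_word = 0; /* 1 while we are inside a word */
    char c;
    ssize_t n;

    /* Read one byte at a time, as the README describes. */
    while ((n = read(fd, &c, 1)) > 0) {
        if (c == ' ' || c == '\t' || c == '\n') {
            if (in_word) {
                /* A delimiter just ended a word. */
                words++;
                in_word = 0;
            }
        } else {
            in_word = 1;
        }
    }
    if (n < 0) {
        return -1;
    }
    /* Count a trailing word that is not followed by a delimiter;
     * forgetting this is a typical off-by-one error. */
    if (in_word) {
        words++;
    }
    return words;
}
```

A `main` built on this sketch could pass file descriptor 0 (stdin) when no arguments are given, and otherwise pass the descriptor returned by `open(argv[1], O_RDONLY)`.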