wc in 44 lines of C

ww: count words in 50 lines of C

Desiderata

  • Simplicity: Just count words, as delimited by: spaces, tabs, newlines.
  • No flags.
  • Avoid off-by-one errors.
  • Allow piping, as well as reading files.
  • Small.
  • Linux only.
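
As an illustration of the first desideratum, here is a minimal sketch of a counter that treats only spaces, tabs and newlines as delimiters. It is not the code in this repository: the function name `count_words` and the use of stdio's `getc` are assumptions made for the example.

```c
#include <stdio.h>

/* Hypothetical helper, not the repository's implementation.
 * Counts words in a stream, where a word is a maximal run of bytes
 * that are not a space, a tab, or a newline. */
long count_words(FILE *fp)
{
    long words = 0;
    int in_word = 0; /* 1 while we are inside a word */
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == ' ' || c == '\t' || c == '\n') {
            in_word = 0;
        } else if (!in_word) {
            /* First byte of a new word: count it once, here. */
            in_word = 1;
            words++;
        }
    }
    return words;
}
```

Counting a word when it starts, rather than when a delimiter ends it, means the last word is still counted when the input does not end in a delimiter, which is one way to avoid the off-by-one error mentioned above.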

Comparison with wc.

The GNU coreutils version (github, savannah) is a bit over 1K lines of C. It does many things and checks many possible failure modes. I think it decides whether to read from stdin through a heavily wrapped call to fstat.

The busybox version (git.busybox.net) of wc is much shorter, at 257 lines, while striving to be POSIX-compliant, meaning it has flags.

The plan9port version of wc (github) implements some sort of table method, in 352 lines. So does the plan9 version, which is less well documented, but shorter.

Here is a version of wc from UNIX V7, at 86 lines, and allowing for both word and line counts. I couldn't find a version in UNIX V6. Of all the versions, I think I understand this one best.

Steps:

  • Look into how C utilities both read from stdin and from files.
  • Program first version of the utility
  • Compare with other implementations, see how they do it, after I've written my own version
    • Compare with gnu utils.
    • Compare with musl/busybox implementations.
    • Maybe make some pull requests, if I'm doing something better? => doesn't seem like it

  • Install to ww, but check that the name ww is not already taken (installing to wc2 or something similar would not save many keypresses vs wc -w)
  • [ ] Could use zig? => Not for now
  • Look specifically at how other versions do stuff.
    • Distinguish between reading from stdin and reading from a file
      • If it doesn't have arguments, read from stdin.
    • Open files, read characters.
  • Write version that counts lines
  • Document reading from stdin typed by the user (end with Ctrl+D)
  • Write man files?
  • Write versions of other coreutils (see https://git.busybox.net/busybox/tree/coreutils/)? Would be a really nice project.
    • Simple utils.
  • zig?
  • https://github.com/leecannon/zig-coreutils
  • https://github.com/keiranrowan/tiny-core/tree/master
  • Add lc
    • Take into account what happens if file doesn't end in newline.
  • add chc (cc is "c compiler")
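
Two of the steps above, reading from stdin when there are no arguments and handling a file that does not end in a newline, can be sketched as follows. This is a rough illustration rather than the code in lc: the names `open_input` and `count_lines` are hypothetical, and counting an unterminated final line is a design choice assumed here (wc -l counts newline characters only).

```c
#include <stdio.h>

/* Hypothetical helper: with no arguments, read from stdin;
 * otherwise open the first argument. Returns NULL if fopen fails. */
FILE *open_input(int argc, char **argv)
{
    if (argc < 2)
        return stdin;
    return fopen(argv[1], "r");
}

/* Hypothetical helper: counts lines, including a final line
 * that is not terminated by a newline. */
long count_lines(FILE *fp)
{
    long lines = 0;
    int last = '\n'; /* so that an empty input counts as 0 lines */
    int c;

    while ((c = getc(fp)) != EOF) {
        if (c == '\n')
            lines++;
        last = c;
    }
    if (last != '\n')
        lines++; /* the input ended in the middle of a line */
    return lines;
}
```

A main function would call `open_input(argc, argv)`, report an error if it returns NULL, and print the result of `count_lines`. When stdin is a terminal, the loop ends once the user presses Ctrl+D.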