Without this, the sender drops the connection before the "close" message
has made it to the server, which leaves the mailbox hanging until it
expires. It still lives in a 'd.addBoth()' slot, so it gets closed even
if some error occurrs, but we wait for it's Deferred to fire in both
success and failure cases.
We already hard-code 'relay.sqlite', so I don't see a lot of value in
making making the stats file configurable too. That said, if it makes
life easier for packagers (e.g. start-stop-daemon or systemd wanting
these files to go into /var/run/something/ , and if it isn't sufficient
to just use /var/run/something/ as the CWD), I'd accept a patch to
add it back.
This changes the DB schema and rewrites the expiration/pruning
algorithm. The previous code had several bugs which failed to clean up
nameplates, mailboxes, and messages when clients didn't explicitly close
them before disconnecting (and sometimes even when they did).
The new code runs an expiration loop every 10 minutes, and prunes
anything that is more than 11 minutes old. Clients with a connected
listener (a websocket that has open()ed a Mailbox) update the "in-use"
timestamp at the beginning of the loop. As a result, nameplates and
mailboxes should be pruned within between 10 and 20 minutes after the
last activity.
Clients are expected to reconnect within 9 minutes of a connection being
lost. The current release does not survive a dropped connection, but a
future one will. By raising the 10min/11min settings for specific
applications, a "Persistent Wormhole" mode will tolerate even longer
periods of being offline.
The server now has an option to write stats to a file at the end of each
expiration loop, and the munin plugins have been rewritten to use this
file instead of reading the database directly. The file includes a
timestamp so the plugins can avoid using stale data.
When this new server is run against an old (v2) database, it should
automatically upgrade it to the new v3 schema. The nameplate and mailbox
tables will be erased, but the usage (history) will be preserved.
The DB queries this uses aren't particularly efficient, and when the
time it takes to run starts to become a problem, we should do an
optimization pass.
This counts the number of "standalone" mailboxes we create, which
happens when a client does open() without first using a nameplate. The
current client doesn't do this, but future clients might.
This moves responsibility for the periodic prune-everything Timer up to
RelayServer too. That way we can be sure the stats are dumped
immediately after prune, and we can incorporate stats from Transit as
well.
The new approach runs every 10 minutes and keeps a
nameplate/mailbox/messages "channel" alive if the mailbox has been
updated within 11 minutes, or if there has been an attached listener
within that time.
Also remove the "nameplates.updated" column. Now we only track "updated"
timestamps on the "mailboxes" table, and a new mailbox will preserve any
attached nameplate.
Unless/until people start writing new applications (with different
app-ids), this code is unlikely to get used very much, and the code is
simpler without it.
I changed my mind, it's actually easier if 'wormhole-server stop' (and
'restart') does *not* throw an error when there wasn't already a server
running in that directory. Specifically that lets me use 'restart' as an
idempotent "make sure a server is running" command.
pynacl-1.0.1 has a bug, on systems which have the sodium runtime library
installed (/usr/lib/libsodium.so, as provided by a package like
"libsodium13"), but not the development headers (/usr/include/sodium.h,
as provided by "libsodium-dev"). The pynacl setup.py tries to detect
whether a system libsodium can be used (instead of the bundled copy), by
using cffi to load libsodium.so and check the version strings. However
this test doesn't check that sodium.h is available, which is needed to
build the glue code.
The long-term fix is to change pynacl, either to improve the test (to
look for a header file, which sounds tricky), or to always use the
bundled sodium unless explicitly told otherwise (removing the test
altogether). The short term fix is to tell magic-wormhole builders to
install libsodium-dev if they run into this error, or use the
environment-variable override when installing.
closes#10
refs https://github.com/pyca/pynacl/issues/184
These weren't running because Click complained about an ASCII locale
when running under py3, which triggered an error check that was there to
detect broken virtualenvs, skipping those tests.
The fix appears to be to force the en_US.UTF-8 locale when running the
wormhole program in a subprocess.
This adds a test for database upgrades, which I developed on a branch
that added a new DB schema (v3) and an upgrader to match, but then I
changed my mind about the schema and removed that part. The test will be
useful some time in the future when I change the schema in a small
enough way that I bother to write an upgrader for the change. For now,
the test is disabled.
In addition, the upgrader test is kind of lame. I'd really prefer to
assert that the upgraded schema is identical to the schema of a
brand-new (latest-version) database, but ALTER TABLE doesn't quite work
that way (comments are omitted, and the order of the columns is slightly
different).
This also adds database.dump_db() for the tests.
There was some vestigal server-cli code (leftover in the client-side
wormhole.cli.cli_args) that used port 3000/3001, and it accidentally got
used for the new Click-based parser, rather than the actual server-cli
code (in wormhole.server.cli_args) that uses port 4000/4001. This
changes the port numbers to match (everything uses 4000/4001 these days,
to avoid confusing interactions with the old 0.7.6 server that might
still be listening on the old ports).