Thoughts about writing a Discord logbot.
These are just some thoughts. I have some plans but no guarantees I'll put any of these into actual action!
So uh... I have kind of a paranoia problem. I don't like it when conversations I saw previously get hidden from me or are made permanently inaccessible. I get that in some cases it might be needed, but for the most part it's more of an irritation than an improvement, as it makes it harder to reconstruct a coherent chain of events once someone opts to remove messages.
One of the platforms I use the most is Discord. It's basically IRC without the dated hassle that IRC has become. Discord also boasts a lot of very good moderation tools, including bans and kicks and all the usual. The issue is that it permits administrative members of a guild to remove messages. This has been used to censor certain discussions (by removing large swathes of them), and as a result it is very annoying to follow a discussion if the admins decided to remove a message. It also makes it very difficult to see whether a user was misbehaving and then removed their own messages, which users can do.
To this end, I forked a logbot for Discord called panopticon about 10 months ago. It's a very good logbot, but it currently has a couple of shortcomings.
With the original maintainer of panopticon (ihaveahax) releasing the source code for a new version of Panopticon to the public, I'm currently considering writing my own logbot. I've attempted to work with Panopticon-2 itself, but I am unable to get it to function the way I want, and my fork mostly reflects things I ran into that weren't being logged once I wanted them logged.
So this post will hold some thoughts about my experiences with logging and what I would improve when writing a logbot from scratch. This is not indicative of me starting a project just yet, but I want to get an idea of where I stand and what I prioritize in a logbot.
Problem 1: What library
This is an interesting subject of discussion. The Discord API has wrappers in a lot of languages. The original panopticon was written against discord.py's async branch, but I patched it with a couple of minor changes to work on the rewrite version.
That said, those were at their core still patches. They were simple hacks made to have it work on rewrite, not a core rewrite that attempts to make maximum use of the d.py rewrite's functionality.
Secondly, it's clear that this bot was adjusted while I was still learning discord.py and how to best use it. It shows: the quality of the code varies wildly, and the dev branch has undergone several major changes and rewrites.
That said, I'd totally stick with discord.py when writing another logger.
Problem 2: What storage medium
That covers the language for the bot. Now let's consider the storage medium. The original panopticon used simple text files and simply appended to the files when new messages and events came in.
This system was already getting pushed to its limit with my own changes and brings with it various other problems, all of which I made attempts at fixing, but those always fell flat because it's such a major PITA.
But, let's talk quickly about the advantages of this method:
- Messages are easily accessible, logs are always realtime and you can simply cat them from your server or set up an h5ai and access them on the web.
- It's extremely easily readable
Now let's go over the disadvantages:
- No clean method of programmatically getting the messages back into a piece of software. I've written attempts, mostly awful regex, to try to do this properly.
- No easy method to properly store discord Embeds.
- Edits are disjointed from the message, indicated with an
- Server nicknames aren't properly shown.
- Other events on the server are disjointedly logged.
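To illustrate why reading plain-text logs back in is so fragile, here is a minimal sketch of the regex approach. The line format below is made up for the example (it is not panopticon's actual format), and `parse_line` is a hypothetical helper.

```python
import re
from datetime import datetime

# Hypothetical line format, NOT panopticon's real one:
# [2018-03-04 12:34:56] <SomeUser#1234> message text
LINE_RE = re.compile(
    r"^\[(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"<(?P<author>[^>]+)> (?P<content>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if the line does not match."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    return {
        "timestamp": datetime.strptime(match["timestamp"], "%Y-%m-%d %H:%M:%S"),
        "author": match["author"],
        "content": match["content"],
    }
```

A message that contains a newline already breaks this: its second line has no timestamp prefix, so `parse_line` returns None for it and the text is lost unless yet another special case is bolted on.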
The obvious alternative would be to put it in a database. Something which panopticon-2 took to heart, as it uses a PostgreSQL database to which it directly talks.
That said... this is hardly an easy or acceptable solution. PostgreSQL is annoying to set up, like every other database, and it's incredibly difficult to do quick tests on the data given PostgreSQL doesn't have something I can install like phpMyAdmin. Retrieving messages is a similar PITA because SQL is an incredibly annoying language.
This is more of a general criticism of SQL than a criticism of Postgres specifically, although the lack of an easy interface is a direct criticism I have towards it.
So... what should we try to use instead?
I personally would still stick with a database. That said, databases don't translate well to objects at all. Unless we had a library to specifically convert an actual object to a database table and relations and all those things.
Enter SQLAlchemy, one of the easiest SQL ORM tools to use. It's so easy that it makes storing data in a database actually maintainable, instead of a massive headache of writing a data layer that directly executes SQL. SQLAlchemy removes the SQL part of writing said data layer and makes it accessible through OOP methods.
Mapping things is incredibly easy. Hell, look at this example of a simple SQL ORM for a starboard. It's super clean and easy to use. All you would have to do is map the values you want to store to the objects and SQLAlchemy helps handle the annoying writing of SQL. Actually doing things with the ORM is just as painless.
On top of that, SQLAlchemy is by design DBMS agnostic. I can put it in an SQLite database if I want to do it small-scale or just for local testing, MySQL if I want to do it in production and PostgreSQL if I want to do it in big-scale production. All I need to give it is the proper connection settings and it'll take it from there.
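As a rough idea of what such a mapping could look like for logged messages, here is a minimal sketch, assuming SQLAlchemy 1.4 or newer. The `Message` class and its columns are made up for illustration; they are not taken from panopticon or panopticon-2.

```python
from datetime import datetime

from sqlalchemy import BigInteger, Column, DateTime, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Message(Base):
    """Hypothetical table for logged messages; the columns are illustrative."""

    __tablename__ = "messages"

    # Discord already hands out unique IDs, so use them as the primary key.
    id = Column(BigInteger, primary_key=True, autoincrement=False)
    channel_id = Column(BigInteger, nullable=False)
    author_id = Column(BigInteger, nullable=False)
    content = Column(Text, nullable=False)
    created_at = Column(DateTime, nullable=False)


# Only this URL changes between backends, e.g. "sqlite:///logs.db" or
# "postgresql://user:password@localhost/logs" (placeholder credentials).
engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(
    Message(
        id=1,
        channel_id=10,
        author_id=100,
        content="hello",
        created_at=datetime(2018, 3, 4, 12, 0, 0),
    )
)
session.commit()

# Look the row up again by primary key; no SQL written by hand.
stored = session.get(Message, 1)
print(stored.content)
```

The in-memory SQLite URL is only there to make the sketch runnable without any database server.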
Problem 3: Viewing the data
For the original panopticon, I touched this already in problem 2. Panopticon-2 currently doesn't have a solution for it (outside of a couple of optional views and postgres functions), so I'm free to imagine this on my own.
I would personally make this a web interface, not unlike what Discord already provides. There are various HTML snippets and templates out there that can aptly fake Discord's UI, and writing a Flask implementation shouldn't be too difficult when using SQLAlchemy, as the bot and the web app can share the ORM models.
This application would roughly have the following endpoints and ideas:
- The main endpoint will not be directly locked or have any authorization attached. Anyone who can get to it is able to read it. While this sounds horrible on paper, in reality, if password protection was needed for any endpoints, I'd put it behind a simple HTTP auth mechanism through nginx's reverse proxy system. Apache probably has similar features.
- API endpoints. These endpoints would be all of the endpoints described below but exposed and returned as JSON.
- Logging Accounts endpoint. A specific feature of my fork was the ability to log on multiple accounts. This was done by adding a configuration option and ensuring that all logged data was put in directories that were identified based on the logging account. I would like to keep this approach, but I would also like it to be able to not identify the account used, to prevent people from finding out which account is the logging selfbot if users opt to log through a selfbot. Perhaps an avatar/image option for each account and a default if none are used? In addition, a “single account” mode that forgoes this page and instantly redirects to the first logging account should be an option.
Then, for each logging account, I'd add the following endpoints:
- Searching endpoint. Global searching would probably be a bad idea and of little use, so I'm not gonna do that, but having the ability to search things per-guild should be possible. Syntax should be similar to Discord's own. SQLAlchemy has a filter function that looks promising for retrieving data, so perhaps using that could be handy.
- Guilds endpoint. Merely a list of all guilds that are logged and links to the channels endpoint for each guild.
- Individual guild details endpoint. (API wise, this one would go under the previous one, but web-wise splitting these up is wiser). Should list owners, previous names and IDs. Logging guild icons might be useful as these aren't often subject to change.
- Channels endpoint. See the guilds endpoint. Lists and links channels that a message has been logged in. Perhaps also show hidden channels and what can be retrieved from those?
- Individual channel endpoint. A list of dates that have been logged at and links to go to.
- Channel details endpoint. (API wise, this one would go under the previous one, but web-wise splitting these up is wiser). Would list details like topic changes, pinned messages, old names and so on.
- Message list endpoint. Should list all messages in a channel on a certain date, including edits and the IDs. Images should at least be linked to (storing them locally I'd say is infeasible, but providing links to the ones on discord should be a possibility).
- User account endpoint. Lists an individual user in the database. This should have their current avatar, their username (and previous usernames), creation date and their ID. Logging Nitro or Hypesquad status is not really needed or possible to maintain cross-account abilities. Should also list role history, nicknames and joins and leaves per server (perhaps on a sub-endpoint).
- DMs and DM groups endpoint. Just a list of links to individual channel endpoints for these situations.
This is by no means an easy Web UI, and what is currently logged in any incarnation of panopticon would need to be severely increased to maintain this level of logging, but all these things should be passively obtainable thanks to discord.py's own caching, so rate limits should not be an issue.
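For the searching endpoint, a per-guild lookup built on SQLAlchemy's filter function could look roughly like this. It is a minimal sketch, assuming SQLAlchemy 1.4 or newer; the `Message` model and the `search_guild` helper are hypothetical, and parsing a Discord-like search syntax into these arguments is left out.

```python
from sqlalchemy import BigInteger, Column, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Message(Base):
    """Hypothetical, trimmed-down message table for the search sketch."""

    __tablename__ = "messages"

    id = Column(BigInteger, primary_key=True, autoincrement=False)
    guild_id = Column(BigInteger, nullable=False)
    author_id = Column(BigInteger, nullable=False)
    content = Column(Text, nullable=False)


def search_guild(session, guild_id, text, author_id=None):
    """Return messages in one guild whose content contains `text`."""
    query = session.query(Message).filter(
        Message.guild_id == guild_id,
        Message.content.contains(text),  # rendered as a LIKE '%text%' clause
    )
    if author_id is not None:
        # Roughly what a "from:" term in the search box would map to.
        query = query.filter(Message.author_id == author_id)
    return query.order_by(Message.id).all()


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()
session.add_all(
    [
        Message(id=1, guild_id=1, author_id=100, content="logbot ideas"),
        Message(id=2, guild_id=1, author_id=200, content="more logbot talk"),
        Message(id=3, guild_id=2, author_id=100, content="logbot in another guild"),
        Message(id=4, guild_id=1, author_id=100, content="unrelated"),
    ]
)
session.commit()

all_hits = [m.id for m in search_guild(session, 1, "logbot")]
by_author = [m.id for m in search_guild(session, 1, "logbot", author_id=200)]
```

Because the guild ID is always part of the filter, a search never leaks results from another guild, which matches the "no global search" decision above.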
Problem 4: Importing existing data
Importing existing messages and guild data has always been a bit of a hack in my fork of panopticon. Initially it would cache all channel messages before writing them to a file, which ran the risk of running out of memory if a channel held too many messages. In addition, the import would have to be logged in a separate directory, as the log folders could not be properly merged.
SQLAlchemy would resolve this, as discord uniquely identifies everything with an ID, and if the ID for a message already exists in the database, it can just be skipped over.
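A minimal sketch of that idea, assuming SQLAlchemy 1.4 or newer: the importer consumes messages one at a time from any iterable, skips IDs that are already stored and commits in batches, so nothing has to be held in memory all at once. The `Message` model, the `import_messages` helper and the `(id, content)` tuple shape are assumptions made for the example.

```python
from sqlalchemy import BigInteger, Column, Text, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Message(Base):
    """Hypothetical, trimmed-down message table for the import sketch."""

    __tablename__ = "messages"

    id = Column(BigInteger, primary_key=True, autoincrement=False)
    content = Column(Text, nullable=False)


def import_messages(session, messages, batch_size=500):
    """Store (id, content) pairs, skipping known IDs; return how many were new."""
    inserted = 0
    for message_id, content in messages:
        if session.get(Message, message_id) is not None:
            continue  # already logged, e.g. by the live bot or an earlier import
        session.add(Message(id=message_id, content=content))
        inserted += 1
        if inserted % batch_size == 0:
            session.commit()  # write out a full batch before reading further
    session.commit()
    return inserted


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

first_run = import_messages(session, [(1, "a"), (2, "b")])
second_run = import_messages(session, [(2, "b"), (3, "c")])  # overlaps on ID 2
total = session.query(Message).count()
```

In a real bot the `messages` iterable would be fed from the channel history instead of a list, which is what keeps memory use flat.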
Problem 5: Writing this bloody thing
Yeah, this is a huge project. I'm not even sure if I can undertake it. I might though. Panopticon itself was an incredible logger, but panopticon-2 doesn't really fit my needs, nor can I nicely modify it, and my own fork is reaching the limit of maintainability due to my own incompetence when I made the fork.
That said, I do want to try and provide this project. I'm not sure if it gets anywhere or if I'll be able to fully implement all this, but I think it'd be really cool if I were able to.
So um... yeah. Those are my thoughts on writing a new and uh bigger, discord logbot.