Private message attachments

Started by namida, September 26, 2015, 11:28:29 PM

namida

While running backups of the site, I'm noticing that there are some very large private message attachments, in some cases from a very long time ago, still on the server. (If I'm not mistaken, they're only removed when everyone has deleted the message - including the sender if they've saved it in their Sent Items).

These do add to site storage costs, and seeing as they're unlikely to be downloaded again (compared to those in regular posts, which new users might want to download), I'm considering whether to implement a system where PM attachments are removed from the server if:
- They've been downloaded at least once
- They haven't been downloaded within the last 30 days (but I do need to check whether SMF actually tracks this; I don't think it does)
- They're over a certain size, probably around 200KB (as small files such as levels have a negligible impact on storage costs; I'm referring to the larger, multi-MB files here)
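
A minimal sketch of how the three conditions above could be combined into one check. The `PMAttachment` record and its field names are hypothetical and are not SMF's actual schema; in particular, the post notes that SMF may not track the last download time at all, so `last_downloaded` is an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

SIZE_THRESHOLD_BYTES = 200 * 1024   # "around 200KB" from the post
IDLE_PERIOD = timedelta(days=30)    # "within the last 30 days"


@dataclass
class PMAttachment:
    # Hypothetical record; fields are illustrative, not SMF's real columns.
    size_bytes: int
    download_count: int
    last_downloaded: Optional[datetime]  # None if never downloaded


def should_prune(att: PMAttachment, now: datetime) -> bool:
    """Return True only if all three proposed conditions hold."""
    if att.download_count < 1 or att.last_downloaded is None:
        return False  # never downloaded: keep it
    if now - att.last_downloaded < IDLE_PERIOD:
        return False  # downloaded within the idle period: keep it
    return att.size_bytes > SIZE_THRESHOLD_BYTES  # only large files qualify
```

Because all three conditions must hold, a large attachment that nobody has downloaded yet is never removed, however old it is.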

This would not affect the messages themselves, only the attachments. I could possibly also look into having the site send a warning message before this happens, if people are worried about possibly-important attachments they may have in their older messages.

(One alternative that people may suggest is limiting the total size of PM attachments a single user can have. I don't like this idea, as I'm perfectly fine with someone having multiple large attachments in their inbox/outbox if they're there because they're actually being sent to the person, and not just sitting there taking up space from a PM several months ago.)

Implementing such a measure could also mean I'd be more comfortable with increasing the size limit a bit; this has been mentioned in the past.

Just to be clear, no such system has been implemented yet, and it will not be implemented without a warning significantly in advance. It's just something I'm considering.
My projects
2D Lemmings: NeoLemmix (engine) | Lemmings Plus Series (level packs) | Doomsday Lemmings (level pack)
3D Lemmings: Loap (engine) | L3DEdit (level / graphics editor) | L3DUtils (replay / etc utility) | Lemmings Plus 3D (level pack)
Non-Lemmings: Commander Keen: Galaxy Reimagined (a Commander Keen fangame)

mobius

the first solution sounds fine. I'm okay with that.
everything by me: https://www.lemmingsforums.net/index.php?topic=5982.msg96035#msg96035

"Not knowing how near the truth is, we seek it far away."
-Hakuin Ekaku

"I have seen a heap of trouble in my life, and most of it has never come to pass" - Mark Twain


Leo

namida, you are too polite. Whatever is too big and/or you think is useless, just delete.

namida

I can't tell what the content is - I can't read private messages or view attachments. (Okay, if I really wanted to put some effort into doing so, I could by looking up in the database the content of messages and/or what the file types are, or examining the files with a hex editor to work out their types... but that's more than I can be bothered doing.) So the only information that's instantly available to me - and visible any time I run a backup - is the age and filesize of the attachments.

In regards to the filesize, it would not make sense IMO to set the limit at 16MB, but then turn around and say "well actually you're not allowed to have PM attachments larger than <insert smaller size here>" - if I intended to do that, I'd simply set the limit lower in the first place.

And yes, while technically I could say "I'm the admin, so I'm making your files disappear", I would rather not do that, especially without warning. While most likely I'm going to go with the method suggested in the first post (once I can be bothered implementing it, that is - so we might be waiting a while), the main reason behind making this post is to see if anyone has alternative suggestions, too. :)

Simon

Larger files tend to be hosted on rummage heaps (own web space, Dropbox, etc.).

Some people still suffer from the simple-looking problem of getting a file quickly to somebody else. PM is a reasonable solution for those who lack their own rummage heap.

30 days seems really short. What's the cost of garbage-collecting large PM attachments older than 2 years instead? Neglect number of downloads then.

Bonus info, even though nobody has asked: I find publicly visible rummage heaps very interesting. They tell a story, much more than a Facebook page. Rummage heaps should be used extensively by more people.

-- Simon

Nepster

I agree that deletion after 30 days may be a bit early, especially if the attachment is a beta-version of a NeoLemmix level pack and updates are done via single level files. How about 3 months?

Fun fact: I have only two PMs with attachments above 100kB - both of them were sent by namida ;P.

namida

2 years is a bit excessive. On the other hand, 3 months sounds reasonable.

The cost for any single attachment isn't huge, but it does add up when lots of them are considered. Currently, on the server, there are no PM attachments over 10MB (for the record, the size limit is 16MB), 8 (1) that are between 5MB and 10MB, 33 (5) that are between 1MB and 5MB, and 19 (7) that are between 200KB and 1MB. This is out of 564 (151) total PM attachments.

The ones in all these ranges combined (i.e. those that are 200KB or larger) add up to 155.78MB, out of 172.82MB total for PM attachments. So despite being a small fraction of the total number, they account for an overwhelming majority of the total size.
(While we're putting random statistics out there, for comparison, the total size of non-PM attachments is 337.88 MB; I won't do a breakdown by size since I'm not even remotely considering any kind of pruning for regular post attachments.)

(Numbers in parentheses are those that were sent "recently" - which I have defined as on or after the 1st of August this year.)
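
The figures above can be tallied quickly to show how skewed the distribution is. This is only a sketch that reuses the numbers quoted in the post; the variable names are illustrative.

```python
# (total, recent) counts per size range, as quoted in the post
large_ranges = {
    "5MB-10MB": (8, 1),
    "1MB-5MB": (33, 5),
    "200KB-1MB": (19, 7),
}
total_count = 564        # all PM attachments
large_size_mb = 155.78   # combined size of attachments of 200KB or more
total_size_mb = 172.82   # combined size of all PM attachments

large_count = sum(total for total, _ in large_ranges.values())
large_recent = sum(recent for _, recent in large_ranges.values())

count_share = large_count / total_count
size_share = large_size_mb / total_size_mb

print(f"{large_count} large attachments ({large_recent} recent)")
print(f"{count_share:.1%} of attachments, {size_share:.1%} of total size")
# 60 large attachments (13 recent)
# 10.6% of attachments, 90.1% of total size
```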

Simon

Quote: The ones in all these brackets combined add up to 155.78MB, out of 172.82MB total for PM attachments.
[...]
(Numbers in brackets are those that were sent "recently" - which I have defined as on or after the 1st of August this year.)

Do I understand correctly that people have sent 17 MB worth of attachments during >= 6 years until 2015-08-01, and there has been a recent traffic spike with 155 MB sent since 2015-08-01?

-- Simon

namida

Not quite. My bad, I used the word "brackets" in two different contexts there. The 60 attachments that are 200KB+ (only 13 of which were since 2015-08-01) add up to 155.78MB total; while the remaining 504 (of which 138 were since 2015-08-01) only add up to 17.04MB total.

I edited that post so it should be clearer what I mean.

ccexplore

Quote from: Simon on September 27, 2015, 10:25:21 AM
30 days seems really short.

But namida's proposal is strictly about PM attachments, and doesn't even prune the file until it's been downloaded at least once. Unlike regular attachments, attachments in PMs are more likely to be a one-time file transfer from person A to person B (as opposed to "here's my levelpak available for everyone from now til eternity who's interested to download", the use case common for regular attachments). Given that, what is the expected scenario in which you want the file to stick around for years wasting apparently valuable space, long after both source and recipient each have their own personal copy of it somewhere in their own storage?

If the answer is something along the lines of "in case I accidentally lost my own copy of the file" or similar, then sorry, the real answer for such usage scenarios is to look into a proper file backup solution (one which incidentally would cover more than just files from the forums here), either through the many choices of free cloud storage available online nowadays, or an old-school backup to your own physical media. Those (especially the cloud variety) appear to be much more cost-effective compared to the limited storage here.

Quote from: Nepster on September 27, 2015, 11:47:27 AM
I agree that deletion after 30 days may be a bit early, especially if the attachment is a beta-version of a NeoLemmix level pack and updates are done via single level files.

I was under the impression that PMs' message contents can't even be edited once sent, let alone attachments.  It's possible I'm wrong on both counts though since I guess I used PMs somewhat less than regular postings.  If attachments in PMs can be updated then I guess the 30-day countdown should reset on each update (yes, it does open up a loophole of sorts, but someone who actually decides to waste their time utilizing the loophole is strongly urged to consider a far superior solution like the many choices of free online cloud storage available).

A longer countdown is fine if it proves sufficiently effective for dealing with the underlying storage space issue.

===============

I also realize from what namida said that apparently it's not trivial to find out the file types of the largest offenders, but you must forgive me for wondering whether this may be an unfortunate side effect of NL's "pack == EXE" idea? ;P

namida

Quote: I was under the impression that PMs' message contents can't even be edited once sent, let alone attachments.
They can't be. I think what he referred to here is when people assist in testing a level pack, and may for whatever reason want to go back to an old LVL version that they received in an older PM, or redownload the main EXE (perhaps on a different computer), or so on. I'm not personally aware of any case where a testing period has gone on for long enough for this to be an issue, but I do kind of see where the claim is coming from.

Quote: I also realize from what namida said that apparently it's not trivial to find out the file types of the largest offenders, but you must forgive me for wondering whether this may be an unfortunate side effect of NL's "pack == EXE" idea?
It's possible that these are contributing. But at the same time, the EXE "container" only adds about 1MB to the size - so the majority of content in such a pack is still the actual content (most NeoLemmix EXEs tend to be at least 3MB or so), usually. With that being said, not doing it this way would mean that an updated copy of the pack as a whole wouldn't necessarily need to include the music again, which is definitely a significant contributor to the size of most packs.

ccexplore

It would be helpful to have more details on the storage situation in terms of costs and current usage trends.  For example, it sounds like most of the storage is still occupied by regular attachments anyway.  Are PM attachments' storage requirements growing much faster than regular attachments?  How close are we to hitting some sort of cap over which we actually start incurring new additional costs?

namida

Quote: Are PM attachments' storage requirements growing much faster than regular attachments?

Not at all. It's just an area where I've seen that there's a lot of data stored that probably doesn't need to be.

Quote: How close are we to hitting some sort of cap over which we actually start incurring new additional costs?

Storage costs on NFS are linear, not based on various caps at which a new cost applies; it's $1 per GB, per month (not including the database). Thus, any additional data is an extra cost - even if that's small at first - and vice versa for any removed data. I know it doesn't seem like a huge amount, but I'm not exactly a millionaire. :P
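
As a rough illustration of the linear pricing described above, here is a sketch that applies the quoted $1 per GB per month rate to the sizes mentioned earlier in the thread. Treating 1 GB as 1024 MB is an assumption; the thread does not say how the host counts a gigabyte, and the function name is hypothetical.

```python
RATE_USD_PER_GB_MONTH = 1.00  # rate quoted in the post; database excluded
MB_PER_GB = 1024              # assumption: binary gigabytes


def monthly_cost_usd(size_mb: float) -> float:
    """Linear cost: every extra MB adds the same amount, with no tiers."""
    return size_mb / MB_PER_GB * RATE_USD_PER_GB_MONTH


pm_total = monthly_cost_usd(172.82)   # all PM attachments
pm_large = monthly_cost_usd(155.78)   # PM attachments of 200KB or more
regular = monthly_cost_usd(337.88)    # regular post attachments

for label, cost in [("PM total", pm_total),
                    ("PM 200KB+", pm_large),
                    ("regular", regular)]:
    print(f"{label}: ${cost:.2f} per month")
```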

Simon

With the information so far, I judge that at most 100 MB worth of PM baggage is generated per year. If we prune every 3 months, we get rid of 175 MB compared to pruning 2-year-old baggage. Pulling 175 MB of extra weight costs $2 per year.

I'm willing to pitch in $10 5 years from now if we resort to pruning only 2-year-old attachments.
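
The estimate above can be reproduced as follows. This is a sketch under stated assumptions: a steady 100 MB of new PM attachments per year (the upper bound given in the post), retained data equal to growth rate times retention period, and 1 GB counted as 1024 MB.

```python
GROWTH_MB_PER_YEAR = 100.0    # upper bound from the post
RATE_USD_PER_GB_MONTH = 1.00  # rate quoted earlier in the thread
MB_PER_GB = 1024              # assumption: binary gigabytes


def retained_mb(retention_years: float) -> float:
    """Steady-state data kept on the server for a given retention period."""
    return GROWTH_MB_PER_YEAR * retention_years


# Extra data kept when pruning after 2 years instead of after 3 months
extra_mb = retained_mb(2.0) - retained_mb(0.25)
extra_cost_per_year = extra_mb / MB_PER_GB * RATE_USD_PER_GB_MONTH * 12

print(f"extra data: {extra_mb:.0f} MB")
print(f"extra cost: ${extra_cost_per_year:.2f} per year, "
      f"${extra_cost_per_year * 5:.2f} over 5 years")
```

Under these assumptions the difference comes to 175 MB and roughly $2 per year, which is where the $10 over 5 years comes from.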

And yeah, don't put anything but machine code into a binary blob.

-- Simon