Data degradation: alternative solutions without the use of ECC Ram and ZFS (or similar filesystem)?

wireless82@alien.top · 2 years ago

Data degradation: alternative solutions without the use of ECC Ram and ZFS (or similar filesystem)?

bobj33@alien.top · 2 years ago

copy / paste of my previous post

Silent bit rot, where a bit flips but there is no hardware error, is extremely rare. My stats say once a year on 300TB of data. A statistics major can correct me, but if someone has 1TB of data then they should see a single bit flip about once in 300 years, so maybe their great great great grandchildren will see it and report back to them in a time machine.
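To make that back-of-the-envelope scaling explicit, here is a rough sketch. It assumes the flip rate scales linearly with the amount of data stored, which is a simplification, and the function name is made up for illustration.

```python
def years_per_bit_flip(data_tb, observed_flips_per_year=1.0, observed_data_tb=300.0):
    """Expected years between silent bit flips for `data_tb` terabytes.

    Assumes the rate observed on one collection (default: 1 flip per year
    on 300 TB) scales linearly with data size.
    """
    return observed_data_tb / (observed_flips_per_year * data_tb)


print(years_per_bit_flip(1))    # 1 TB of data
print(years_per_bit_flip(300))  # the 300 TB collection itself
```

Under that assumption, 1 TB works out to one flip per 300 years and 300 TB to one per year.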

All of my data is on ordinary ext4 hard drives. I buy all my drives in groups of 3. I have my local file server, local backup, and remote backup. I have 2 drives in the local file server dedicated for snapraid parity and run “snapraid sync” every night.

https://www.snapraid.it

Snapraid has a data scrub feature. I run that once every 6 months to verify that my primary copy of my data in my file server is still correct.

Then I run cshatag on every file, which generates SHA256 checksums and stores them as ext4 extended attribute metadata. On later runs it compares the stored checksum and stored timestamp against the file, and if the contents have changed but the timestamp hasn’t, it reports the file as corrupt.
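The decision rule described above can be sketched roughly like this. It is not cshatag's actual code: a plain dict stands in for the extended attributes so the example runs anywhere, and the status names and function names are my own.

```python
import hashlib
import os


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files never have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path, store):
    """Compare a file with its stored (checksum, mtime) record.

    `store` is a dict standing in for extended attributes (an assumption
    made for this sketch). Returns "new", "ok", "outdated" or "corrupt".
    """
    actual = (sha256_of(path), os.stat(path).st_mtime_ns)
    stored = store.get(path)
    if stored is None:
        status = "new"          # never seen before: just record it
    elif stored == actual:
        status = "ok"           # same content, same timestamp
    elif stored[1] != actual[1]:
        status = "outdated"     # timestamp moved: treat as a normal edit
    else:
        return "corrupt"        # content changed, timestamp did not; keep old record
    store[path] = actual
    return status
```

A file reported as "corrupt" keeps its old record, so the mismatch is reported again on the next run until the file is restored from a good copy.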

https://github.com/rfjakob/cshatag

Then I back up all my media drives with rsync -RHXva. This data is almost never modified; new files are just added. The -X option also copies over the extended attribute metadata. Then I run the same cshatag check on the local backup and remote backup server. This takes about 1 day to run. On literally 90 million files across 300TB it finds a single file about once a year that has been silently corrupted. I have 2 other copies that match each other, so I overwrite the bad file with one of the good copies.
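Picking the good copy when one of three disagrees is a majority vote over checksums. A minimal sketch of that idea, with made-up function names, not a tool I actually run:

```python
import hashlib
from collections import Counter


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_bad_copies(paths):
    """Split copies of one file into (good, bad) by majority vote on SHA256.

    Raises ValueError when no digest is shared by more than half of the
    copies, because then there is no way to tell which copy is right.
    """
    digests = {p: sha256_of(p) for p in paths}
    majority_digest, votes = Counter(digests.values()).most_common(1)[0]
    if votes <= len(paths) // 2:
        raise ValueError("no majority: cannot tell which copy is good")
    good = [p for p in paths if digests[p] == majority_digest]
    bad = [p for p in paths if digests[p] != majority_digest]
    return good, bad
```

With three copies, `bad` holds at most one path, and it can be overwritten from `good[0]`, for example with `shutil.copy2`.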

I only run rsnapshot on /home because that is where my frequently changing files are. The other 99% of my data is effectively write-once, so I just use rsync from the main file server to the two backups. Before I run rsync for real I use rsync --dry-run to show what WOULD change without actually changing anything. If I see the files I expect to be written then I run it for real. If I were to see thousands of files that would be changed I would stop and investigate. Was this a cryptolocker virus that updated thousands of files?
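I do that dry-run check by eye, but it could be automated roughly as below. This sketch assumes the dry run is made with `--itemize-changes` and that new files show `+` in the flag column (e.g. `>f+++++++++`) while rewritten existing files do not (e.g. `>f.st......`). The function names and the threshold of 100 are hypothetical.

```python
def summarize_dry_run(itemized_output):
    """Split `rsync --dry-run --itemize-changes` output into new and changed files."""
    new_files, changed_files = [], []
    for line in itemized_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue
        flags, name = parts
        # Only regular files that would be transferred ("<f" or ">f");
        # this skips directories, symlinks and attribute-only updates.
        if len(flags) < 2 or flags[0] not in "<>" or flags[1] != "f":
            continue
        if "+" in flags:
            new_files.append(name)       # file does not exist on the backup yet
        else:
            changed_files.append(name)   # existing backup file would be overwritten
    return new_files, changed_files


def looks_suspicious(itemized_output, max_changed=100):
    """True when more existing files would be overwritten than expected."""
    _, changed = summarize_dry_run(itemized_output)
    return len(changed) > max_changed
```

On a mostly write-once archive the changed list should be close to empty, so a large count there is the signal to stop and investigate before running rsync for real.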

As for backing up the operating system I have the /etc and /root account backed up every hour through rsnapshot along with /home

I’m not running a business. I can reinstall Linux in 15 minutes on a new SSD and copy over the handful of files I need from the /etc backup

wireless82@alien.top · 2 years ago

Man… thanks! You are a real master! The steps after the snapraid part are not clear to me yet, but I think I need to read it again more carefully.

Cheap-Explanation662@alien.top · 2 years ago

Check out Linux builtin dm-integrity

DTangent@alien.top · 2 years ago

Store backups using rar with a data recovery record of 6% or larger and you won’t have to worry about bit rot or minor sectors going bad.

Far_Marsupial6303@alien.top · 2 years ago

…but data might be corrupted and so (my) easy backup strategy - that includes two rsynced copies in different places and time, one on ssd and another on hdd - cannot be enough.

It isn’t enough if you don’t perform a CRC and generate a HASH.

Obsessing about preventing bitrot is like taking multiple vitamins then crossing the street without checking for traffic. Mantras: Any storage media/device can fail at any time, for any reason, with or without notice. Reliability and longevity is BACKUP, BACKUP, BACKUP!

Is bitflip/bitrot real? I believe it is. However, the likelihood of it corrupting multiple drives/media at the same time, as long as they’re continually checked, verified and copied to new drives/media is infinitesimally small.

wireless82@alien.top · 2 years ago

But if I back up data that is already bit-rotten at the source, won’t the errors be propagated to the copies? I am missing something…

snatch1e@alien.top · 2 years ago

Backups with versioning should solve the issue as soon as you would be able to identify which data got corrupted.

wireless82@alien.top · 2 years ago

And is there a best practice for doing so…?

snatch1e@alien.top · 2 years ago

Best practices of making backups?

bofh2023@alien.top · 2 years ago

Checksum your files, as you say. Just md5sum or sfv tools will do this and are quick and easy to check at any time.
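For anyone who would rather script it than use md5sum or an sfv tool, a checksum manifest can be sketched like this. The function names are made up, and the default here is SHA256 rather than MD5; passing `algorithm="md5"` mirrors md5sum more closely.

```python
import hashlib
import os


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Hash one file in chunks with the named hashlib algorithm."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(root, algorithm="sha256"):
    """Map each file's path (relative to `root`) to its digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = file_digest(full, algorithm)
    return manifest


def verify_manifest(root, manifest, algorithm="sha256"):
    """Return relative paths that are missing or no longer match the manifest."""
    problems = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or file_digest(full, algorithm) != expected:
            problems.append(rel)
    return problems
```

Build the manifest once while the files are known to be good, keep it alongside the backups, and run the verify step against the source or any copy whenever you like.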

If you want a way to recover as well, create a certain % of parity files with a tool like par2. If you’re just worried about the occasional flipped bit, a very small % of parity will go a really long way. Creating par2 is NOT super fast but you only have to do it once.

This is basically file-level RAID.
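To show the parity idea in its simplest form, here is a toy sketch using a single XOR parity block, the way RAID 5 does. par2 itself uses Reed-Solomon codes, which can repair several damaged blocks, so this is only an illustration of the principle and not how par2 works internally. The function names are made up.

```python
def xor_parity(blocks):
    """XOR equal-length byte blocks together into one parity block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(parity):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def rebuild_missing(blocks, parity):
    """Rebuild the single block marked None from the survivors plus parity."""
    missing = [i for i, block in enumerate(blocks) if block is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [block for block in blocks if block is not None]
    repaired = list(blocks)
    # XOR of all surviving blocks and the parity block equals the lost block.
    repaired[missing[0]] = xor_parity(survivors + [parity])
    return repaired


data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data)
damaged = [data[0], None, data[2]]   # pretend the second block was lost
print(rebuild_missing(damaged, parity) == data)
```

The parity block costs one extra block regardless of how many data blocks it covers, which is why a small percentage of parity goes a long way against an occasional flipped bit.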

wireless82@alien.top · 2 years ago

Interesting, I will dig deeper. I think I should add to or rephrase the question, or else I am missing something: I first have to avoid - or check - that the source files are not bitrotten, otherwise all backups will have bitrotten data, right? I mean, I might checksum, but on working data how can I have evidence of it? And on “cold” data - e.g., pictures - how can I be sure that ruined source files are not copied onto the backups? I can’t - because of time - checksum 1TB of data every night…