A thread on why I use #ZFS. Each toot is one reason.
1/ Protection against Silent Data Corruption - data corruption that is not detected by any other component of the system. ZFS checksums or hashes every bit of data in the filesystem, data or metadata, and you can schedule a periodic scan ("scrub") to verify that all data is intact. On redundant setups (mirror or raidz), ZFS will automatically repair any corruption it finds. I've witnessed the value of this, particularly on petabyte-scale installations. https://en.wikipedia.org/wiki/Data_corruption#Silent
Why I use #ZFS
2/ Atomic snapshots are simple, easy, and lightweight. #Backup strategies do not generally need special handling for databases since the snapshot is atomic. Lightweight clones of snapshots are easy, and zero-storage snapshots called bookmarks are also available, to use as a basis for incremental zfs sends.
Why I use #ZFS
3/ zfs send and receive make #backups super easy. After the first full backup, there is never a need to scan the entire FS, and only changed blocks -- not changed files -- are sent. Safeguards abound. Backups every 10 minutes are fairly easy, every minute are possible, and I've been backing up my machines every hour for years now, without any load or battery life concerns. Rockets through in a few seconds on most of my boxes. More at https://changelog.complete.org/archives/10186-more-topics-on-store-and-forward-possibly-airgapped-zfs-and-non-zfs-backups-with-nncp (at the end)
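To make the incremental flow above concrete, here is a minimal sketch that only builds the command lines for one backup round and does not run them. The dataset, snapshot, and target names (tank/home, hourly-1, hourly-2, backup/home) are made-up placeholders, and the helper function is hypothetical, not part of any ZFS tooling.

```python
import shlex


def incremental_backup_commands(dataset, prev_snap, new_snap, target):
    """Build (but do not execute) the commands for one incremental backup.

    All names passed in are placeholders chosen by the caller.
    """
    # Take a new atomic snapshot of the dataset.
    snapshot_cmd = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # Send only the blocks that changed since the previous snapshot.
    send_cmd = ["zfs", "send", "-i", f"@{prev_snap}", f"{dataset}@{new_snap}"]
    # Apply the stream on the backup side.
    recv_cmd = ["zfs", "receive", target]
    return [
        shlex.join(snapshot_cmd),
        f"{shlex.join(send_cmd)} | {shlex.join(recv_cmd)}",
    ]


if __name__ == "__main__":
    for line in incremental_backup_commands(
        "tank/home", "hourly-1", "hourly-2", "backup/home"
    ):
        print(line)
```

In practice the receive side often sits behind ssh or a store-and-forward tool, which this sketch leaves out.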
Why I use #ZFS
4/ zfs #encryption is more flexible than LUKS. You can apply it to certain parts of the filesystem, use different keys for different parts, etc. Even better, zfs send lets you choose whether to send raw encrypted or decrypted packets. You can make #backups for which no component ever sees the plaintext data. You can also snapshot, clone, back up, etc. without needing decryption keys.
Why I use #ZFS
6/ zvols. You can do all of this with chunks that are presented as block devices, too. Want to back up that Windows virtual machine running under KVM or VirtualBox? Create /dev/zvol/foo with ZFS, attach it to the VM, and you can snapshot, back up, and clone it just as you would a filesystem.
Why I use #ZFS
8/ More stable than #btrfs. Although btrfs provides many of these benefits - plus the theoretical benefit of more flexibility with rearranging data - every time I have tried btrfs, I have run into serious bugs. Some of them have been corrected, but not all. https://btrfs.wiki.kernel.org/index.php/Status gives the btrfs status. It still has pathologies with many snapshots and, I believe, with databases as well, though I think the hardlink problems have been fixed.
And to conclude, 9/ reasons I don't use #ZFS.
Although I use it on every machine I can, it is somewhat RAM hungry, so I don't run it on my #RaspberryPi machines or other very old hardware. Though I now have an 8GB Pi on hand that I intend to try it with.
It always makes me nervous, though. Running without protection against silent data corruption feels... unsafe 🙂
#ZFS 10/ OK, an addendum. With crypto, you can even do incremental #backups (with zfs send) without ever mounting the filesystem or knowing its decryption keys - the output stream will have encrypted data, but zfs send just looks at changed blocks so you're good! Also since ZFS is copy-on-write it has better properties for SSDs that have issues with sudden power loss than most filesystems.
@jgoerzen I have been a btrfs user on my media server, but switched to zfs with a recent HDD upgrade.
Now I can't copy from one drive array to another and successfully serve a video over NFS at the same time.
There definitely isn't an ideal filesystem. Btrfs wants to be an ideal fs, but won't ever get there. ZFS only wants to be ideal for enterprise.
@LovesTha I fully agree that there is not one filesystem that is best in every use case. Even I don't use ZFS everywhere, but I most definitely do use ZFS outside the enterprise (including on laptops). It is a bit hard to troubleshoot by toot, but the behavior you describe definitely doesn't sound like what you should be seeing. Serving video uses fairly little bandwidth compared to what a zpool should be capable of, so there may be some misconfiguration somewhere.
@LovesTha Some things to look at: ZFS is much better when it is in control of RAID rather than a layer beneath it. If you have hardware RAID or any other layer on the drive array, shut it off and let ZFS do it. Also make sure the zpool has ashift=12, assuming you have 4K sectors. If the system you are streaming from is the one you are copying from, then readahead at the ZFS or OS level may be causing issues. Different considerations apply if you're writing. RAM and ARC size are worth checking too.
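For anyone wondering where ashift=12 comes from: ashift is the base-2 logarithm of the sector size, so 4K (4096-byte) sectors give 12. A small sketch of that arithmetic follows; the function name is just an illustration.

```python
def ashift_for_sector_size(sector_bytes: int) -> int:
    """Return log2 of the sector size, which is the value ashift expresses."""
    # Sector sizes are powers of two; reject anything else.
    if sector_bytes <= 0 or sector_bytes & (sector_bytes - 1):
        raise ValueError("sector size must be a positive power of two")
    return sector_bytes.bit_length() - 1


if __name__ == "__main__":
    print(ashift_for_sector_size(512))   # 9
    print(ashift_for_sector_size(4096))  # 12
```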
@jgoerzen destination for copy and NFS share is 3 drive raidz directly on entire drives. Source for copy is ext4. It probably just doesn't have enough RAM.
I will fix it one day, but as zfs fixes are mostly "destroy the array and start over", I'll just use something that works
@LovesTha A couple of quick settings you may play with, BTW: zfs set logbias=latency or logbias=throughput, try both. atime=off. dedup=off. These can all be set at runtime
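Spelled out as full commands, those runtime settings might look like the output of the sketch below. It only prints the command lines rather than running them, and the dataset name tank/media is a made-up placeholder, as is the helper function.

```python
import shlex


def zfs_set_commands(dataset, properties):
    """Build one `zfs set property=value dataset` command per property."""
    return [
        shlex.join(["zfs", "set", f"{name}={value}", dataset])
        for name, value in properties.items()
    ]


if __name__ == "__main__":
    # Try logbias=latency as well as throughput and compare.
    tunables = {"logbias": "throughput", "atime": "off", "dedup": "off"}
    for line in zfs_set_commands("tank/media", tunables):
        print(line)
```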
@LovesTha Also the kernel I/O scheduler should probably be disabled (set to "none" or "noop") on your zpool devices if it isn't already
@LovesTha Great! If you happen to know which one did the trick for you, I'd be glad to know. There are a lot of tunables at the zfs module level as well (modinfo zfs to see them) but it's as easy to make things worse with many of those.
@jgoerzen Scheduler appeared to be off already. log bias didn't help. Applied the others at the same time, but my money is on dedupe, which I'd understood to be impossible to turn off....
The things you learn.
So setting dedup=off doesn't undo the deduplication that already happened, but it does mean future writes won't have to go through that expensive process. You might also zfs set checksum=on because it may have switched to the expensive sha256 for dedup purposes.
It won't help for videos, but zfs set compress=lz4 can save a lot of space w/o hurting perf
@LovesTha Anyhow, if dedup was on, then I would strongly suspect your guess is correct. Dedup effectively trades RAM and CPU for disk space, and for most people that trade is in precisely the wrong direction (disk space being relatively plentiful compared to RAM and CPU cycles). The manpage recommends 1.25GB RAM per TB stored for dedup, but for some workloads that significantly under-estimates the RAM needed.
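To put the 1.25GB-per-TB figure into numbers, here is a rough sketch of the arithmetic. It treats the figure as a floor, since some workloads need considerably more, and the 8 TB example is made up.

```python
# Rule of thumb quoted above: 1.25 GB of RAM per TB of stored data.
DEDUP_GB_RAM_PER_TB = 1.25


def dedup_ram_estimate_gb(stored_tb: float) -> float:
    """Rough lower bound on the RAM that dedup wants for a given amount of data."""
    if stored_tb < 0:
        raise ValueError("stored_tb must be non-negative")
    return DEDUP_GB_RAM_PER_TB * stored_tb


if __name__ == "__main__":
    # A hypothetical 8 TB of stored data.
    print(dedup_ram_estimate_gb(8))  # 10.0
```

On a small home server, that estimate alone can exceed the installed RAM before the ARC or anything else gets a share.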
Anyhow, glad it's working better for you!
@jgoerzen the enterprise dig is about being greedy with RAM and not supporting removing drives, something I have done many times
@LovesTha Those are both absolutely fair points (and I did try to mention both in my thread). RAM usage is the #1 reason I don't run it on my Raspberry Pis :-)