Discussion:
Low performance on sparse files with hypervisors
Jerome
2012-11-06 09:37:52 UTC
Permalink
Hello all,

I'm experiencing a strange issue with the latest version (11) of ZoL.
I don't know if it is related to this specific version, as I haven't tried
others.

I created a Linux (Debian 64-bit) box with a 35 TB (RAID5) device (presented
as /dev/sdX). On this device I created a zpool.
On this zpool I created a ZFS filesystem shared via NFS (over a 1 Gbps
Ethernet card) to hypervisor hosts (XenServer / VMware) for testing.
ZFS does not have any compression or deduplication enabled.
  pool: XXXXX
 state: ONLINE
  scan: scrub canceled on Tue Nov  6 10:26:30 2012
config:

        NAME        STATE     READ WRITE CKSUM
        local       ONLINE       0     0     0
          sdc       ONLINE       0     0     0

errors: No known data errors

On this storage each hypervisor creates VHD or VMDK files (sparse files).
Each file is attached as a block device to the virtual machines.

In these virtual machines I see very low performance when writing with dd:
we get roughly 5-10 MB/s.

Doing the same to an ext4 filesystem on the same server gives about 90 MB/s.
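For reference, the kind of write test used in both cases, inside a VM and
directly on the server, is roughly the following (paths and sizes are only
illustrative):

# Inside a VM: sequential write, flushing data to disk before reporting a rate
dd if=/dev/zero of=/mnt/test.bin bs=1M count=1024 conv=fdatasync

# Directly on the ZFS filesystem on the server, for comparison
dd if=/dev/zero of=/XXXXX/testfs/test.bin bs=1M count=1024 conv=fdatasync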

Has anyone experienced this?

Do you have any idea what is going on?

Thanks for your help
Jefferson Diego Gomes Rosa
2012-11-06 11:33:28 UTC
Permalink
What is the sector size of your hard disks, and what alignment did you use
when creating the pool?
I had the same problem when I was using a 4K-sector hard disk with the pool
aligned to 512 B (ashift=9).
When I recreated the pool with the correct option (zpool create -o
ashift=12) I got a performance improvement of at least 300% (jumping from
9 MB/s to ~30 MB/s).
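For reference, this is the kind of pool creation I mean (the pool name and
device here are just placeholders):

# Force 4 KiB alignment regardless of what the drives report
zpool create -o ashift=12 tank /dev/sdc

# Check which ashift the pool is actually using
zdb -C tank | grep ashift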
--
Best Regards,

Jefferson "Diede" Diego
Linux System Administrator

Nextel: 55*86*248729  iDEN: +55-11-7763-9947
Mobile: +55-11-9.7598-6750
Skype: jeffersondiego8
Linux User: 449363
Jerome
2012-11-06 11:43:46 UTC
Permalink
Thanks, I'll give it a try.
Jerome
2012-11-06 14:48:43 UTC
Permalink
The disk's sectors seem to be 512 B:

Disk /dev/sdc: 39999.8 GB, 39999777013760 bytes
256 heads, 63 sectors/track, 4844033 cylinders
Units = cylinders of 16128 * 512 = 8257536 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x00000000

So using 4K sectors may not improve performance.
Am I missing something?

Moreover, it seems that my RAID card cannot do JBOD for only some drives
(it's all or nothing), which is not possible with my current configuration.
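One way to cross-check what fdisk reports, keeping in mind that with a
hardware RAID controller in front of the disks these values describe the
virtual /dev/sdc device rather than the member drives (so the controller's
own tools, or smartctl with a controller-specific -d option, may be needed
to see the real disks):

# Sector sizes as the kernel sees them for the RAID virtual device
cat /sys/block/sdc/queue/logical_block_size
cat /sys/block/sdc/queue/physical_block_size

# Ask the device itself (the RAID controller may mask the real geometry)
hdparm -I /dev/sdc | grep -i 'sector size'
smartctl -i /dev/sdc | grep -i 'sector size'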
RB
2012-11-06 15:35:08 UTC
Permalink
Never, ever trust what the disk reports to the OS on this. Check the
actual model # to see if it's an advanced-format drive.
Jerome
2012-11-06 16:10:27 UTC
Permalink
I'm looking for this information right now.
Disk model is: WDC WD2003FYYS-02W0B1 (2TB).
This information seems a bit hard to find.
Jefferson Diego Gomes Rosa
2012-11-06 16:16:13 UTC
Permalink
OK, according to this spec sheet:
http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701338.pdf
your hard disk has 512 B sectors.
Daniel Smedegaard Buus
2012-11-06 16:51:40 UTC
Permalink
But... that's unusual for a 2TB disk these days. For the sake of being able
to replace a disk at any point in the future, I would *never* create a pool
without ashift=12, regardless of the actual sector size of the current
disks.
Jerome
2012-11-06 16:58:23 UTC
Permalink
OK, so it's not related to block size/alignment.
Since creating files over NFS is "fast" (80 MB/s), while accessing/creating
data inside VMDK/VHD files on that same NFS share is slow (10 MB/s at most),
I guess it has something to do with large (multi-GB) sparse files.

I suspect I'm missing something important... but I'm really stuck on this
problem.
Dan Swartzendruber
2012-11-06 17:01:37 UTC
Permalink
Curious here: what is the recordsize on the dataset behind your NFS share?
VMware recommends 8 KB records for datastores; I believe the default is
128 KB. Can you try changing that? I haven't read all the posts on this - I
assume you have already eliminated sync mode as the culprit?
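Something along these lines, assuming the datastore lives on a dataset
called XXXXX/datastore (the dataset name is a placeholder):

# Current values on the dataset backing the NFS datastore
zfs get recordsize,sync XXXXX/datastore

# Smaller records for VM images (only affects newly written blocks)
zfs set recordsize=8K XXXXX/datastore

# To rule sync writes in or out, temporarily disable them (unsafe - test only)
zfs set sync=disabled XXXXX/datastore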
Ryan How
2013-01-04 02:12:38 UTC
Permalink
Hi,

I am having what appears to be a very similar problem, only with VirtualBox
VDI files, and I am not going over NFS but working directly on the
filesystem.

Did you end up getting anywhere? I was going to try fixed-size virtual disk
files instead of sparse ones to see if that helps, but then I'd probably be
better off using zvols instead. Performance for me gets so bad under heavy
I/O in a VM that it hangs the entire system.

Perhaps it is a fragmentation issue, or block alignment? I don't know, but
for me performance is so bad it is actually unusable and causes VMs to crash
due to what they see as disk failure.

Thanks, Ryan
RB
2012-11-06 16:42:46 UTC
Permalink
Post by Jerome
I'm looking for this information right now.
Disk model is: WDC WD2003FYYS-02W0B1 (2TB).
This information seems a bit hard to find.
As a 2TB drive I would presume it probably uses 4K sectors; as an RE4 I'm
not so sure. Western Digital's spec sheet
(http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701338.pdf)
seems to indicate 512 B, but I've seen so many of those be wrong that I'm a
little jaded.

Looking back over the issue, I'm not so sure this is an AF issue. Although
alignment is one of the most common performance problems, it wouldn't
explain why dd inside a VM is so much slower than the same test on ext4.
I'm not sure what the actual problem is, though - if I did, I'd have two of
the items on my TODO list completed, as I'm seeing overly slow virtual I/O
over NFS to ZFS myself and have so far just masked it with aggressive
(write-back) host-side caching and async NFS. Let's hope the UPS
holds out... :-/
Jerome
2012-11-06 17:01:55 UTC
Permalink
What do you mean by "just masked it by aggressive (write-back) host-side
caching and async NFS. Let's hope the UPS holds out... :-/"?
RB
2012-11-06 17:07:46 UTC
Permalink
I use KVM as my virtualization solution, and I can set individual guest
drives' cache mode to write-through (the default) or write-back
(aggressive). Exporting the disk share with the "async" option in
/etc/exports lets NFS ignore the sync requirements of write operations and
immediately report them as successful, whether or not they have actually
reached disk yet. Both of these options assume that my power is reliable
and that the underlying systems will be shut down gracefully before losing
power. Together they let me (unsafely) increase my IOPS by assuming that
anything still in memory is safe.
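As a concrete sketch of those two knobs (the export path, network range,
and image name are made up):

# /etc/exports - "async" acknowledges writes before they reach stable storage
/srv/nfs/datastore  192.168.0.0/24(rw,async,no_subtree_check)

# Re-export after editing
exportfs -ra

# KVM guest disk with aggressive host-side caching (qemu -drive fragment):
#   -drive file=/srv/nfs/datastore/guest.raw,format=raw,cache=writeback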
Massimo Maggi
2012-11-06 12:04:14 UTC
Permalink
Post by Jerome
I created a Linux (Debian 64-bit) box with a 35 TB (RAID5) device (presented
as /dev/sdX). On this device I created a zpool.
The recommended configuration for ZFS is to pass the underlying disks to it
as individual block devices and add them to the pool without using hardware
RAID. With that configuration you get the best of ZFS from both the
data-integrity and the performance points of view.

Post by Jerome
On this zpool I created a ZFS filesystem shared via NFS (over a 1 Gbps
Ethernet card) to hypervisor hosts (XenServer / VMware) for testing.
NFS write performance benefits greatly from having an SSD (even a few GB;
you can use the remaining capacity as an L2ARC cache) as a log device in
the pool, because NFS defaults to requiring synchronous operation.
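For example, something along these lines (device paths are placeholders;
the pool name matches the one above):

# Dedicated low-latency SSD partition as the separate ZFS intent log (SLOG)
zpool add XXXXX log /dev/disk/by-id/ata-SOME_SSD-part1

# Remaining capacity of the same SSD as L2ARC read cache
zpool add XXXXX cache /dev/disk/by-id/ata-SOME_SSD-part2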

Regards,
Massimo Maggi
Jerome
2012-11-06 14:42:41 UTC
Permalink
Something weird is that accessing files directly through NFS (i.e., not
going through VMDK files) is very fast.
It seems that accessing sparse files is a performance killer when using ZFS.
Dead Horse
2012-11-08 04:08:18 UTC
Permalink
I use ZoL as the back end for my oVirt storage domains, which are accessed
via NFS. I spent some extensive time testing the best way to achieve maximum
throughput to KVM-based VMs with ZoL. My storage interconnects are multiple
port-channeled 10 GbE links on a 10 GbE switch stack between the VM hosts
and my ZoL storage. My zpools on each storage node consist of anywhere from
24 to 48 disks (multipathed). Each zpool has two low-latency SSDs striped as
an L2ARC and two more striped as the ZIL.

I found that using zvols formatted with ext4 and a larger block size yielded
the best performance when combined with KVM and raw thick- or
thin-provisioned file-backed disks.
- I create a zvol as follows (example): zfs create -V 100G -o
volblocksize=64K das0/foo
- After that, a simple mkfs.ext4 -L <zvolname> /dev/das0/foo
- Mount command (example): mount /dev/das0/foo /some/mount/point -o noatime
- /some/mount/point is exported via NFS v3
- You can enable NFS async for additional performance, provided you
understand the implications of doing so
Additionally, the qemu/KVM VM disk cache policy is set to none, with the
I/O policy set to threaded.
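Put together, the sequence looks roughly like this (pool and zvol names as
in my example above; the mount point, export path, and client network are
placeholders):

# Create the zvol with a 64K volblocksize and put ext4 on it
zfs create -V 100G -o volblocksize=64K das0/foo
mkfs.ext4 -L foo /dev/zvol/das0/foo

# Mount without atime updates and export over NFSv3
mkdir -p /export/foo
mount -o noatime /dev/zvol/das0/foo /export/foo
echo '/export/foo 10.0.0.0/24(rw,no_subtree_check)' >> /etc/exports
exportfs -ra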
Jerome
2012-11-08 17:32:04 UTC
Permalink
Hello,

Formatting such a zvol with an ext4 filesystem is not working.
When trying to mount it I get:
mount: wrong fs type, bad option, bad superblock on /dev/zd16,
missing codepage or helper program, or other error
In some cases useful info is found in syslog - try
dmesg | tail or so
Dead Horse
2012-11-09 02:55:43 UTC
Permalink
Could you post the output of "zfs list" on your system?
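A few other things worth checking, assuming the zvol was created as in my
earlier example (the das0/foo names come from that example, not from your
system):

# Does the kernel see an ext4 filesystem on the zvol device at all?
blkid /dev/zd16
dmesg | tail

# The stable path avoids guessing which /dev/zdN belongs to which zvol
ls -l /dev/zvol/das0/
mount -o noatime /dev/zvol/das0/foo /some/mount/point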