a***@whisperpc.com
2014-12-08 18:06:30 UTC
I'm running ZFS on top of CentOS 6 (recently patched) on a small file
server (8TB after RAID-Z2). After about a month, the system runs out of
memory. I believe the memory leak is in ZFS.
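For reference, a quick way to compare what the ARC thinks it is using against
what the kernel is actually holding (these are just the stock /proc interfaces,
nothing specific to this box):

# ARC's own view of its size and limits
grep -E '^(size|c|c_max) ' /proc/spl/kstat/zfs/arcstats
# Kernel-wide picture (CentOS 6 kernels have no MemAvailable, so MemFree/Cached/Slab)
grep -E '^(MemTotal|MemFree|Cached|Slab)' /proc/meminfo
# Per-cache SPL slab usage; a large gap between this and the arcstats "size"
# value would point at slab overhead/fragmentation rather than a plain ARC leak
cat /proc/spl/kmem/slab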
I collected hourly copies of /proc/spl/kstat/zfs/arcstats, starting at
20141112.101839, and ending at 20141208.161002 (last copy prior to the
system hanging). I combined the results into a CSV file (attached - I
hope it goes through). I also added slab information (/proc/spl/kmem/slab) to
the data being gathered, but only just after the most recent boot, so there is
no useful data from it yet.
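In case anyone wants to gather the same data on their own system, something
along these lines is enough (the /var/log/arcstats path and the choice of
fields are illustrative, not a transcript of my setup):

# Illustrative crontab entry: snapshot arcstats once an hour
# (note the escaped % signs, which cron would otherwise treat as newlines)
0 * * * *  cp /proc/spl/kstat/zfs/arcstats /var/log/arcstats/arcstats.$(date +\%Y\%m\%d.\%H\%M\%S)

# Illustrative reduction of the snapshots to a CSV of a few interesting fields
for f in /var/log/arcstats/arcstats.*; do
    awk -v f="$f" '$1=="size"||$1=="c"||$1=="arc_meta_used" {v[$1]=$3}
        END {print f "," v["size"] "," v["c"] "," v["arc_meta_used"]}' "$f"
done > arcstats.csv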
Does anyone have any ideas about solving this problem?
The configuration of the pool (zpool status) is as follows:
pool: data
state: ONLINE
scan: scrub repaired 0 in 5h45m with 0 errors on Fri Oct 3 03:05:30 2014
config:
        NAME                                           STATE     READ WRITE CKSUM
        data                                           ONLINE       0     0     0
          raidz2-0                                     ONLINE       0     0     0
            ata-ST91000640NS_9XG6JS7B                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6JTCF                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6JSX9                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6JSMC                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6K3AK                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6LGZ8                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6JT27                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6JSFW                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6JS7N                  ONLINE       0     0     0
            ata-ST91000640NS_9XG6JSWG                  ONLINE       0     0     0
        logs
          mirror-1                                     ONLINE       0     0     0
            ata-INTEL_SSDSC2BA100G3_BTTV42130229100FGN ONLINE       0     0     0
            ata-INTEL_SSDSC2BA100G3_BTTV4213020K100FGN ONLINE       0     0     0
        cache
          ata-INTEL_SSDSC2BA100G3_BTTV42130229100FGN   ONLINE       0     0     0
          ata-INTEL_SSDSC2BA100G3_BTTV4213020K100FGN   ONLINE       0     0     0
errors: No known data errors
The data disks are 1TB 7200RPM Nearline SATA drives. The log slices are
16GiB, and the cache slices are what's left of the 100GB SSDs (~80GiB).
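For anyone wondering how the SSDs are carved up, the layout is roughly what the
sketch below would produce; the sgdisk commands, partition labels, and the
-part1/-part2 names are illustrative, not a transcript of what was actually run:

# Illustrative partitioning of each 100GB SSD: 16GiB for the log, the rest for L2ARC
sgdisk -n 1:0:+16G -c 1:slog  /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3_BTTV42130229100FGN
sgdisk -n 2:0:0    -c 2:l2arc /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3_BTTV42130229100FGN
# (same for the second SSD)

# Mirrored log on the small slices, both large slices as cache
zpool add data log mirror \
    /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3_BTTV42130229100FGN-part1 \
    /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3_BTTV4213020K100FGN-part1
zpool add data cache \
    /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3_BTTV42130229100FGN-part2 \
    /dev/disk/by-id/ata-INTEL_SSDSC2BA100G3_BTTV4213020K100FGN-part2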
The /etc/modprobe.d/zfs.conf file is as follows:
#
# Set ZFS tuning parameters.
#
# System memory = 64GB
# L2ARC - two fast SSDs
# Minimum - 24GB - Don't allow the system to shrink the ARC to less
# than 24GB
#
# The ARC can choose to be smaller, but it can't be forced to be
# smaller by memory pressure.
options zfs zfs_arc_min=25769803776
# 32GB
# options zfs zfs_arc_min=34359738368
# Maximum - 40GB - Don't allow the ARC to grow to be larger than 40GB
options zfs zfs_arc_max=42949672960
# 48GB
# options zfs zfs_arc_max=51539607552
# Shrink the ARC in steps of 1/256 of its current size per reclaim
# (at most ~160MB at the 40GB maximum)
#
# The process of shrinking the ARC is very time consuming. Freeing
# large amounts at a time can cause a huge latency spike, which is
# bad for interactive response.
options zfs zfs_arc_shrink_shift=8
# Set the L2ARC write buffer size to 24MB
options zfs l2arc_write_max=25165824
# Set the write buffer size to 48MB while the L2ARC is doing its initial fill
options zfs l2arc_write_boost=50331648
# Scan up to 4x l2arc_write_max ahead of the L2ARC write hand when looking
# for buffers to cache
options zfs l2arc_headroom=4
# Sync every second. This will keep the amount of data per sync down,
# delivering smoother operation.
options zfs zfs_txg_timeout=1
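As a sanity check, the values the loaded module is actually using can be read
(and most of them adjusted) at runtime through /sys; the byte values above are
plain GiB multiples, as shown below:

# Confirm what the loaded module is actually using
grep . /sys/module/zfs/parameters/zfs_arc_min \
       /sys/module/zfs/parameters/zfs_arc_max \
       /sys/module/zfs/parameters/zfs_arc_shrink_shift

# The byte values in zfs.conf are straight GiB conversions, e.g.:
echo $((24 * 1024**3))   # 25769803776 -> zfs_arc_min
echo $((40 * 1024**3))   # 42949672960 -> zfs_arc_max

# zfs_arc_max can also be lowered on a running system, e.g. to 32GB:
# echo $((32 * 1024**3)) > /sys/module/zfs/parameters/zfs_arc_max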