ConnectX software may overwrite the owner of FreeBSD system directories

This is a follow-up to the “system directories' owner changes to unknown UID 6151” problem.

I listed the system directories owned by non-root users with the following command.

# find / -type d ! -uid 0 ! -path "*/home/*" ! -path "*/zdata/*" ! -path "*/zbackup/*" -print0 | xargs -0 stat -f "%u %g %N" | tee ~/non_root_owner_dirs.txt

All of them turned out to be owned by UID 6151 (the columns are UID, GID and path):

6151 0 /etc
6151 0 /etc/mft
6151 0 /etc/mft/fwtrace_cfg
6151 0 /usr
6151 0 /usr/bin
6151 0 /usr/include
6151 0 /usr/lib
6151 0 /usr/lib/bash_libs
6151 0 /usr/lib/mft
6151 0 /usr/lib/mft/mtcr_plugins
6151 0 /usr/lib/mft/python_tools
6151 0 /usr/lib/mft/python_tools/mlxmcg
6151 0 /usr/lib/mft/python_tools/mst
6151 0 /usr/lib/mft/python_tools/mstdump
6151 0 /usr/lib/mft/tcl
6151 0 /usr/lib/mft/tcl/bin
6151 0 /usr/lib/mft/tcl/lib
6151 0 /usr/lib/mft/tcl/lib/tcl8.4
6151 0 /usr/share
6151 0 /usr/share/man
6151 0 /usr/share/man/man1
6151 0 /usr/share/mft
6151 0 /usr/share/mft/mlxconfig_dbs
6151 0 /usr/share/mft/mstdump_dbs

I checked some of them and found that a few files inside them had also had their UID changed to 6151. In addition, most of them had a modification date of 2018/11/23.

Judging from the file names and modification dates, I suspect that installing Mellanox's ConnectX-3 utility software changed them. Come on, Mellanox.

Fortunately, I found no strange files other than the ones owned by UID 6151, so I changed their owner to root with the command below, although I'm not sure whether root was actually the original owner.

# find / -uid 6151 ! -path "*/home/*" ! -path "*/zdata/*" ! -path "*/zbackup/*" -print0 | xargs -0 chown root
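Before running a blanket chown like this, it may be safer to review the affected paths first. The following POSIX sh sketch is an assumed, more cautious variant, not the command I actually ran: it keeps the same home/zdata/zbackup exclusions but uses -prune so find stops at those trees instead of walking them and filtering afterwards, and it uses chown -h so that a symbolic link owned by the stray UID is changed itself rather than the file it points to. The function names list_uid_files and fix_uid_files are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not the exact command used above.
# Both skip everything below a directory named home, zdata or zbackup (like the
# original -path filters), but -prune keeps find from descending into them.

# list_uid_files ROOT UID: dry run, print every path under ROOT owned by UID.
list_uid_files() {
    find "$1" \( -path '*/home/*' -o -path '*/zdata/*' -o -path '*/zbackup/*' \) \
        -prune -o -uid "$2" -print
}

# fix_uid_files ROOT UID OWNER: give the same paths to OWNER.
# chown -h changes a symbolic link itself instead of the file it points to.
fix_uid_files() {
    find "$1" \( -path '*/home/*' -o -path '*/zdata/*' -o -path '*/zbackup/*' \) \
        -prune -o -uid "$2" -exec chown -h "$3" {} +
}
```

For example, run `list_uid_files / 6151` and check the list, then run `fix_uid_files / 6151 root` as root once it looks right.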

Some QSFP+ transceivers may prevent 40GbE connections from linking up again on FreeBSD

Five months have passed since I set up a 40GBASE-SR4 network between my PC (Windows 10) and my home server (FreeBSD) using ConnectX-3 cards. For some reason, whenever the PC went to sleep and resumed, the 40GBASE-SR4 connection could not link up again unless I physically unplugged and replugged the QSFP+ transceiver on the server side.

When this happens, the server side is clearly not sending a TX signal: in the output below, lane 3 shows a TX power of 0.00 mW (-30.46 dBm).

$ ifconfig -v mlxen0
        ether e4:1d:2d:74:16:e0
        hwaddr e4:1d:2d:74:16:e0
        media: Ethernet 40Gbase-CR4 <full-duplex> (autoselect)
        status: no carrier
        plugged: QSFP+ 40GBASE-SR4 (MPO Parallel Optic)
        vendor: Mellanox PN: MC2210411-SR4 SN: MEQSRIC0115 DATE: 2015-03-23
        compliance level: Unspecified
        nominal bitrate: 10300 Mbps
        module temperature: 40.00 C voltage: 3.22 Volts
        lane 1: RX: 0.57 mW (-2.37 dBm) TX: 0.36 mW (-4.38 dBm)
        lane 2: RX: 1.06 mW (0.26 dBm) TX: 0.37 mW (-4.30 dBm)
        lane 3: RX: 0.96 mW (-0.17 dBm) TX: 0.00 mW (-30.46 dBm)
        lane 4: RX: 1.12 mW (0.52 dBm) TX: 0.37 mW (-4.20 dBm)

It would make sense if the PC side did this because of electrical instability during sleep and resume, but I have no idea why the server side does. All things considered, it looks like a compatibility issue. Two months ago I replaced the server-side transceiver, a 10Gtek compatible module AMQ10-SR4-M1, with an Avago AFBR-79EQPZ, and the problem has not recurred since. The PC side still uses the AMQ10-SR4-M1 as before and works fine.

It has to be a compatibility issue, doesn't it?
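To catch the stuck state without reading the whole ifconfig output, a small filter can print the lanes whose TX power has dropped out. This is a hedged POSIX sh/awk sketch that assumes the `lane N: RX: ... TX: x mW (y dBm)` line format shown in the output above; the function name dead_tx_lanes and the -20 dBm default threshold are my own arbitrary choices, not a vendor specification.

```shell
#!/bin/sh
# dead_tx_lanes [THRESHOLD_DBM]: read "ifconfig -v" output on stdin and print
# the number of each lane whose TX power is below THRESHOLD_DBM.
# The -20 dBm default is an arbitrary assumption, not a transceiver spec.
dead_tx_lanes() {
    awk -v thr="${1:--20}" '
        $1 == "lane" && /TX:/ {
            lane = $2; sub(/:$/, "", lane)     # "3:" -> "3"
            tx = $0
            sub(/.*TX:[^(]*\(/, "", tx)         # drop everything up to "TX: ... ("
            sub(/ dBm\).*/, "", tx)             # "-30.46 dBm)" -> "-30.46"
            if (tx + 0 < thr + 0) print lane
        }'
}
```

Usage: `ifconfig -v mlxen0 | dead_tx_lanes` prints 3 for the output above, so it could, for example, be called from a periodic check that alerts when the output is non-empty.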

Windows' Tiered Storage Space causes weird hitching

I built a tiered Storage Space on Windows Storage Server 2016 and created an NTFS volume on it. After that, I ran into a problem: whenever another machine was writing data to the server via CIFS, the server hitched and took a painfully long time to open folders. It looks to me as if the problem occurs once the SSD tier of the Storage Space fills up, but I'm still looking for a good solution.

The storage configuration is as follows:

  • SSD-tier
    • Intel DC S3500 240GB x2 (RAID-0. Only 160GB is assigned to the Storage Space.)
  • HDD-tier
    • 8TB 7200RPM SATA x6 (RAID-10: three mirrored pairs striped together. All HDDs are CMR.)
  • One NTFS volume is allocated 100% of the pool.

Both tiers are logical drives provided by a hardware RAID card, so the server sees each tier as a single drive. (Admittedly, this configuration is not actually recommended.)
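For a sense of scale, here is a back-of-the-envelope capacity calculation for the configuration above, in decimal units and assuming the whole HDD tier is assigned to the pool (the configuration list does not say so explicitly):

```latex
\begin{aligned}
C_{\text{HDD}} &= \frac{6 \times 8\,\text{TB}}{2} = 24\,\text{TB} && \text{(RAID-10: half the raw capacity)}\\
C_{\text{SSD,raw}} &= 2 \times 240\,\text{GB} = 480\,\text{GB} && \text{(RAID-0), of which } C_{\text{SSD}} = 160\,\text{GB is assigned}\\
\frac{C_{\text{SSD}}}{C_{\text{SSD}} + C_{\text{HDD}}} &= \frac{160\,\text{GB}}{160\,\text{GB} + 24\,000\,\text{GB}} \approx 0.66\,\%
\end{aligned}
```

In other words, the SSD tier is well under 1% of the pool's capacity.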

I tried copying 96 files ranging from 1KB to 4GB (25.2GB in total) to the volume. The copy goes well at first but suddenly stalls partway through. In this state, Task Manager shows that Storage Tiers Management (記憶域階層管理) is active, so a process moving data from the SSD tier to the HDD tier may be running.

Resource Monitor then showed I/O response times of over 1000ms for the target files and very long disk queues. (Normally the queue length stays under 1, or around 2 or 3 at most when things are going well.) That is far too much latency just for moving data between tiers…

Judging from the actual responsiveness, the system seems to block file I/O until the SSD tier regains a certain amount of free space. According to Microsoft, IOPS drop drastically to HDD levels when the SSD tier is full, but that should not mean I/O stops completely.

Most of the tiered Storage Space usage examples I can find online involve Hyper-V, so I wonder whether it is unsuitable for file servers in the first place. Even so, placing frequently accessed data on the SSD tier to speed things up seems like a perfectly ordinary use case for a file server.

I once ran into performance problems handling tons of files with Samba, so I chose Windows Server, thinking the official CIFS implementation would be comfortable. Sigh… and this is only the beginning.

(2018-08-16 EDIT)

I found a report of the same problem on the Microsoft Japan forum, which was about as useful to me as a fart in a lift: 記憶域スペースで階層化構築時における急激なパフォーマンス低下について (“On the sudden performance drop when building tiers with Storage Spaces”)

The completely useless answer there left me with nothing but a hollow laugh.

Make sure zfs_enable="YES" when ZFS isn't mounted automatically

If ZFS pools other than the root pool aren't mounted automatically at boot, make sure zfs_enable="YES" is set in /etc/rc.conf.

The root pool is mounted automatically even without this setting, which makes the problem hard to spot. I checked the canmount and mountpoint properties of the unmounted pool, and they were fine, so it took me a while to track down the cause.

Be especially careful about this when installing FreeBSD manually without bsdinstall.
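On an installed system, `sysrc -n zfs_enable` prints the effective value and `sysrc zfs_enable="YES"` sets it. To check an rc.conf file directly, for instance one prepared during a manual installation, a minimal POSIX sh sketch like the following may help. The function name zfs_enabled_in is hypothetical; it assumes the file contains only plain variable assignments (as rc.conf normally does) and accepts the same YES/TRUE/ON/1 values as rc.subr's checkyesno.

```shell
#!/bin/sh
# zfs_enabled_in FILE: succeed if FILE, an rc.conf-style file, sets zfs_enable
# to a value that rc.subr's checkyesno treats as true (YES/TRUE/ON/1, any case).
# Assumption: FILE holds plain variable assignments, so sourcing it in a
# subshell is harmless. /etc/defaults/rc.conf and rc.conf.local are not read.
# Pass a path containing a slash (e.g. /etc/rc.conf); "." searches PATH otherwise.
zfs_enabled_in() (
    zfs_enable=          # ignore any value inherited from the caller
    . "$1" || exit 1
    case $zfs_enable in
        [Yy][Ee][Ss]|[Tt][Rr][Uu][Ee]|[Oo][Nn]|1) exit 0 ;;
        *) exit 1 ;;
    esac
)
```

For example, `zfs_enabled_in /etc/rc.conf || echo "zfs_enable is not YES"`. After fixing the setting, `zfs mount -a` mounts the remaining datasets without a reboot.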

Finally understood why Samba 4.7.4 eats huge amounts of RAM on NAS4Free

While investigating misbehaving CIFS sharing on a friend's NAS, I found the Samba daemons wolfing down memory. Each process used memory on the order of gigabytes, and together they consumed 16GB of physical memory and 64GB of swap. I had no choice but to force the machine to shut down. That was clearly abnormal. I think the memory shortage was the proximate cause of the problem: ARC couldn't get enough memory, so storage performance was poor.

After fiddling with some options, it looked like the shadow copy option was what brought on the disaster. The following picture compares top output with the option 'On' and 'Off'.

Left-side is “Shadow copy enabled”, right-side is “disabled.”

The difference is alarming at a glance: memory usage differs literally by an order of magnitude. With shadow copy enabled, Samba nearly exhausted the memory in less than a day, whereas with it disabled, it has been running fine for four days, although the load average goes up to 13. FYI, the file sharing service keeps working in this state.
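To compare the two settings with numbers rather than screenshots, the resident memory of the smbd processes can be summed from ps output. A minimal sketch, assuming FreeBSD's ps with `-o rss= -o comm=` (RSS in KiB, no header line); the function name smbd_rss_kib is hypothetical:

```shell
#!/bin/sh
# smbd_rss_kib: read "ps -ax -o rss= -o comm=" output on stdin (RSS in KiB,
# then the command name) and summarise the smbd processes.
# RSS excludes swapped-out pages, so it understates usage once swap is in play.
smbd_rss_kib() {
    awk '$2 == "smbd" { n++; total += $1; if ($1 + 0 > max + 0) max = $1 }
         END { printf "smbd processes: %d, total RSS: %d KiB, largest: %d KiB\n", n, total, max }'
}
```

Usage: `ps -ax -o rss= -o comm= | smbd_rss_kib`, run periodically (e.g. from cron) with the option on and off.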

Samba's VSS support is the vfs_shadow_copy2 module, so that module must have some bug, right? My memory is quite dim, but I don't think enabling the option caused any problem back when NAS4Free was at version 9 or 10.

  • Last modified: 2021-02-02 14:37
  • by Decomo