15 Oct

1. Failure Description

The task was to expand a Linux file system at the customer's request. While mapping the new hard disk, creating the PV, and adding it to the VG, pvs reported an "unknown device" in VG vm2t:

```bash
[root@KMS-Svr cache]# pvs
Incorrect metadata area header checksum on /dev/sdb1 at offset 4096
Couldn't find device with uuid ZPy1sa-fXhe-qrcQ-HFhi-eazU-4Mg3-toY3Xv.
PV             VG            Fmt  Attr PSize   PFree
/dev/sda2      vg_testkmssvr lvm2 a--  149.51g      0
/dev/sdb1                    lvm2 a--  499.99g 499.99g
/dev/sdc1      vg_testkmssvr lvm2 a--  100.00g      0
/dev/sdd       vm2t          lvm2 a--    2.00t      0
/dev/sde       vm2t          lvm2 a--    2.20t      0
/dev/sdf1      vm2t          lvm2 a--    2.00t      0
/dev/sdg       vm2t          lvm2 a--    2.00t      0
/dev/sdh       vm2t          lvm2 a--    2.00t      0
/dev/sdi       vm2t          lvm2 a--    2.00t      0
/dev/sdj       vm2t          lvm2 a--    2.00t      0
/dev/sdk1      vm2t          lvm2 a--    1.86t  74.36g
/dev/sdk2      vg_testkmssvr lvm2 a--  140.64g  19.63g
unknown device vm2t          lvm2 a-m    2.00t   2.00t
```
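In the pvs output above, the missing PV is the row whose attribute field ends in m (missing). The sketch below picks such rows out of machine-readable pvs output. It assumes an LVM2 release whose pvs supports --noheadings, --separator and the pv_name, vg_name and pv_attr fields. find_missing_pvs is a hypothetical helper name. It reads stdin so it can be tried against captured text. Matching on the attribute flag rather than on the "unknown device" text should also cover releases that may print the name of a missing PV differently.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print "<vg_name>|<pv_name>" for every PV whose third
# pv_attr character is "m" (missing). Expects input produced by:
#   pvs --noheadings --separator '|' -o pv_name,vg_name,pv_attr
find_missing_pvs() {
  awk -F'|' '
    {
      # --noheadings output is indented; trim blanks around every field
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      if (substr($3, 3, 1) == "m") print $2 "|" $1
    }'
}

# Example use on the affected host (not executed here):
#   pvs --noheadings --separator '|' -o pv_name,vg_name,pv_attr | find_missing_pvs
```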

2. Troubleshooting

1. Log Analysis

No specific errors were found in /var/log/messages. However, dmesg showed the following for the new disk, sdl:

```plaintext
sd 2:0:12:0: [sdl] Very big device. Trying to use READ CAPACITY(16).
sd 2:0:12:0: [sdl] Cache data unavailable
sd 2:0:12:0: [sdl] Assuming drive cache: write through
```

This showed that the disk was recognized at the OS level. The customer confirmed that the disk had previously been used by another host, which had since been taken offline. The engineer therefore suspected that stale VG/PV metadata from that host was still on the disk.
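One way to test that suspicion is to compare the UUID that LVM reports as missing with the PV UUID actually stored on the disk. The sketch below is a minimal version of that check. missing_uuid_from_warning and same_pv_uuid are hypothetical helpers. It also assumes that blkid reports an LVM2_member UUID in the same dashed format that LVM prints.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for checking whether a disk carries the PV that LVM
# reports as missing.

# Print the UUID from warnings such as:
#   Couldn't find device with uuid ZPy1sa-fXhe-qrcQ-HFhi-eazU-4Mg3-toY3Xv.
missing_uuid_from_warning() {
  sed -n "s/.*Couldn't find device with uuid \([A-Za-z0-9-]*\)\..*/\1/p"
}

# Succeed only when both arguments are non-empty and identical.
same_pv_uuid() {
  [ -n "$1" ] && [ "$1" = "$2" ]
}

# Example use on the affected host (run as root; not executed here):
#   missing=$(pvs 2>&1 >/dev/null | missing_uuid_from_warning | head -n 1)
#   ondisk=$(blkid -s UUID -o value /dev/sdl1)
#   same_pv_uuid "$missing" "$ondisk" && echo "sdl1 holds the missing PV label"
```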

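2. Removing Problematic Disks

The missing PV in vm2t showed PFree equal to PSize (2.00t), so no extents were allocated on it and it could be removed without affecting any LV. vgreduce --removemissing was run to remove it from the VG: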

```bash
[root@KMS-Svr cache]# vgreduce vm2t --removemissing
Incorrect metadata area header checksum on /dev/sdb1 at offset 4096
Couldn't find device with uuid ZPy1sa-fXhe-qrcQ-HFhi-eazU-4Mg3-toY3Xv.
Wrote out consistent volume group vm2t
```
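The removal was only harmless because no LV used the missing PV. A hedged sketch of checking that before running vgreduce --removemissing follows. It assumes pvs supports the pv_pe_alloc_count field together with --noheadings and --separator. missing_pvs_are_empty is a hypothetical name. It checks every missing PV on the host, not only those in one VG.

```bash
#!/usr/bin/env bash
# Hypothetical helper. Expects input produced by:
#   pvs --noheadings --separator '|' -o pv_attr,pv_pe_alloc_count
# Exits 0 when every missing PV (third pv_attr character "m") has zero
# allocated extents, i.e. no LV data lives on it; exits 1 otherwise.
missing_pvs_are_empty() {
  awk -F'|' '
    {
      # trim the indentation and padding that pvs adds around fields
      for (i = 1; i <= NF; i++) gsub(/^[ \t]+|[ \t]+$/, "", $i)
      if (substr($1, 3, 1) == "m" && $2 + 0 != 0) busy = 1
    }
    END { exit (busy ? 1 : 0) }'
}

# Example use on the affected host (root; not executed here):
#   pvs --noheadings --separator '|' -o pv_attr,pv_pe_alloc_count |
#     missing_pvs_are_empty && vgreduce --removemissing vm2t
```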

After vgreduce, the VG metadata was consistent again, but the new disk still did not appear in the pvs output. fdisk -l showed the disk and its LVM partition:

```plaintext
Disk /dev/sdl: 2199.0 GB, 2199023255552 bytes
Device Boot      Start         End      Blocks   Id  System
/dev/sdl1               1      267349  2147480811   8e  Linux LVM
```
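The size in the fdisk output also explains the "Very big device" line in dmesg. Assuming 512-byte logical sectors (the sector size is not shown above), 2199023255552 bytes is exactly 2^32 sectors, so the last LBA is 0xFFFFFFFF. The Linux sd driver treats that READ CAPACITY(10) result as too large and retries with READ CAPACITY(16). The sketch below only reproduces that arithmetic. needs_read_capacity_16 is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: given a disk size in bytes and a logical sector size
# (512 if omitted), print the sector count and last LBA, and succeed when the
# last LBA reaches 0xFFFFFFFF, the point at which READ CAPACITY(10) can no
# longer describe the disk and READ CAPACITY(16) is needed.
needs_read_capacity_16() {
  local bytes=$1 sector=${2:-512}
  local sectors=$(( bytes / sector ))
  local last_lba=$(( sectors - 1 ))
  echo "sectors=$sectors last_lba=$(printf '0x%X' "$last_lba")"
  (( last_lba >= 0xFFFFFFFF ))
}

# Example: needs_read_capacity_16 2199023255552
```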

lsblk also listed disk sdl and partition sdl1, so the kernel could see the partition. Because LVM did not list it as a PV, the PV had to be recreated.

3. Recreating the PV

The PV was recreated on the partition with:

```bash
pvcreate /dev/sdl1 --force
```

pvs confirmed that the PV had been created. vgextend then added /dev/sdl1 to the VG:

```bash
vgextend vm2t /dev/sdl1
```
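The PV recreation and VG extension can be wrapped in a small script that prints the commands by default and only executes them when DRY_RUN=0 is set. This is a sketch, not the procedure used in the incident. run, add_disk_to_vg and DRY_RUN are hypothetical names. Check the device and VG passed in against lsblk and pvs first, since pvcreate --force writes a new label without asking for confirmation.

```bash
#!/usr/bin/env bash
# Sketch of the PV recreation steps. DRY_RUN=1 (default) prints each command;
# DRY_RUN=0 executes it (requires root). All names here are hypothetical.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

add_disk_to_vg() {
  local vg=$1 dev=$2
  run pvcreate --force "$dev" || return  # write a new PV label, no prompts
  run vgextend "$vg" "$dev" || return    # add the new PV to the VG
  run pvs "$dev"                         # confirm the PV now shows the VG name
}

# Example: DRY_RUN=1 add_disk_to_vg vm2t /dev/sdl1
```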

3. Lessons Learned

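When pvs reports an "unknown device", run vgscan to rescan the block devices for LVM metadata. Then remove the missing PVs from the VG with vgreduce --removemissing to return the VG to a consistent state. First check that the missing PV has no allocated extents; otherwise removing it affects the LVs that use it.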
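The order above can be sketched the same way. vgreduce, like other LVM2 commands, accepts --test, which runs the command without updating metadata, so the removal can be previewed first. recover_missing_pv, run and DRY_RUN are hypothetical names. Commands are only printed unless DRY_RUN=0.

```bash
#!/usr/bin/env bash
# Sketch: rescan, preview, then remove missing PVs from a VG.
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 executes it (root only).
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

recover_missing_pv() {
  local vg=$1
  run vgscan || return                                 # rescan devices for LVM metadata
  run vgreduce --removemissing --test "$vg" || return  # preview: no metadata written
  run vgreduce --removemissing "$vg"                   # drop PVs the VG lists but cannot find
}

# Example: DRY_RUN=1 recover_missing_pv vm2t
```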

4. Additional Knowledge

  1. vgscan command usage: vgscan scans all block devices for LVM metadata. Its options include --cache (send the metadata found to the lvmetad daemon), --ignorelockingfailure (continue with read-only metadata operations if locking fails), --mknodes (create missing LVM special files in /dev and remove unused ones), and --reportformat (select the output format).
  2. LVM file system size limits: resizing can fail on systems with outdated kernels, which may need an upgrade before they can handle larger file systems.
  3. LVM and kernel versions: the maximum logical volume size depends on the CPU architecture and the Linux kernel version:
    • Linux kernel 2.4.x: Max 2TB
    • Linux kernel 2.6.x (32-bit): Max 16TB
    • Linux kernel 2.6.x (64-bit): Max 8EB
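These three limits match the explanations commonly given for them, which are not stated in the original post:

- 2.4 kernels address block devices with a 32-bit count of 512-byte sectors.
- 32-bit 2.6 kernels index the page cache with a 32-bit page number over 4 KiB pages (the x86 page size is assumed).
- 64-bit kernels use signed 64-bit byte offsets.

The TB and EB figures in the list are binary units. pow2_human is a hypothetical helper that turns a power-of-two byte count into a readable size.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print 2^exp bytes in the largest binary unit (up to EiB).
pow2_human() {
  local exp=$1 i=0
  local -a units=(B KiB MiB GiB TiB PiB EiB)
  while (( exp >= 10 && i < 6 )); do
    exp=$(( exp - 10 ))
    i=$(( i + 1 ))
  done
  echo "$(( 1 << exp )) ${units[i]}"
}

pow2_human $(( 32 + 9 ))   # 2^32 sectors x 512 B      -> 2.4 kernels
pow2_human $(( 32 + 12 ))  # 2^32 pages x 4 KiB        -> 2.6 kernels, 32-bit
pow2_human 63              # signed 64-bit byte offset -> 2.6 kernels, 64-bit
```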