970 EVO Plus with Media Errors and ZFS errors after 30TBW/3000hours

I have two Samsung 970 EVO Plus 1TB NVMe SSDs on a generic NVMe PCIe adaptor
(from Amazon) that have been running for less than 150 days in a ZFS stripe.

Already I’m getting read and checksum failures in ZFS and SMART isn’t happy
either. Is this expected?

The data isn’t critical (monitoring databases, temporary storage) and I have
backups of it anyway, but the issues are affecting my usage.

zpool status:

pool: shasta
 state: DEGRADED
status: One or more devices has experienced an error resulting in data
        corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
        entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub repaired 0B in 0 days 00:17:43 with 1 errors on Sun Jan 29 14:32:08 2023
remove: Removal of /dev/mapper/shasta0_crypt canceled on Sun Jan 29 14:13:49 2023
config:

        NAME             STATE     READ WRITE CKSUM
        shasta           DEGRADED     0     0     0
          shasta0_crypt  DEGRADED     6     0    38  too many errors
          shasta1_crypt  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:
<snip>

And here’s smartctl for nvme0:

smartctl -a /dev/nvme0

smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-135-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      <redacted>
Firmware Version:                   3B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            380,888,137,728 [380 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5b1140c3a7
Local Time is:                      Fri Feb  3 04:49:54 2023 PST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.54W       -        -    0  0  0  0        0       0
 1 +     7.54W       -        -    1  1  1  1        0     200
 2 +     7.54W       -        -    2  2  2  2        0    1000
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        36 Celsius
Available Spare:                    95%
Available Spare Threshold:          10%
Percentage Used:                    2%
Data Units Read:                    54,168,819 [27.7 TB]
Data Units Written:                 47,512,583 [24.3 TB]
Host Read Commands:                 260,828,947
Host Write Commands:                684,198,598
Controller Busy Time:               2,284
Power Cycles:                       10
Power On Hours:                     2,820
Unsafe Shutdowns:                   4
Media and Data Integrity Errors:    36
Error Information Log Entries:      36
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               36 Celsius
Temperature Sensor 2:               39 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0         36     1  0x00ae  0x4502  0x000    915751488     1     -
  1         35     5  0x01a6  0x4502  0x000    915751312     1     -
  2         34     2  0x00f4  0x4502  0x000    915750928     1     -
  3         33     4  0x028f  0x4502  0x000    915742112     1     -
  4         32     4  0x02be  0xc502  0x000    232311792     1     -
  5         31     3  0x007f  0x4502  0x000    232311792     1     -
  6         30     1  0x00a2  0x4502  0x000    232311792     1     -
  7         29     8  0x0278  0x4502  0x000    219233800     1     -
  8         28     8  0x0277  0x4502  0x000    915751488     1     -
  9         27     1  0x0083  0x4502  0x000    915751312     1     -
 10         26     3  0x0043  0x4502  0x000    915750928     1     -
 11         25     6  0x02b6  0x4502  0x000    915742112     1     -
 12         24     1  0x00b7  0xc502  0x000    232311792     1     -
 13         23     3  0x005f  0x4502  0x000    232311792     1     -
 14         22     6  0x02ae  0x4502  0x000    915751488     1     -
 15         21     1  0x00b2  0x4502  0x000    232311664     1     -
... (20 entries not shown)
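
As a sanity check on the nvme0 counters above, here is a small Python sketch of how the bracketed totals follow from the raw values. It assumes the NVMe convention that one data unit is 1000 blocks of 512 bytes; the function names are my own and purely illustrative.

```python
# One NVMe "data unit" is 1000 x 512 bytes (assumption: the standard
# SMART/Health log definition applies to these drives).
BYTES_PER_DATA_UNIT = 1000 * 512


def data_units_to_tb(units: int) -> float:
    """Convert NVMe data units to decimal terabytes (10**12 bytes)."""
    return units * BYTES_PER_DATA_UNIT / 1e12


def hours_to_days(hours: int) -> float:
    """Convert power-on hours to days."""
    return hours / 24


if __name__ == "__main__":
    # Values copied from the nvme0 output above.
    print(f"written:    {data_units_to_tb(47_512_583):.1f} TB")
    print(f"powered on: {hours_to_days(2_820):.1f} days")
```

For nvme0 this gives roughly 24.3 TB written over about 117.5 days of power-on time, which is consistent with the "less than 150 days" figure.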

And for nvme1:

smartctl -a /dev/nvme1
smartctl 7.1 2019-12-30 r5022 [x86_64-linux-5.4.0-135-generic] (local build)
Copyright (C) 2002-19, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Model Number:                       Samsung SSD 970 EVO Plus 1TB
Serial Number:                      <snip>
Firmware Version:                   3B2QEXM7
PCI Vendor/Subsystem ID:            0x144d
IEEE OUI Identifier:                0x002538
Total NVM Capacity:                 1,000,204,886,016 [1.00 TB]
Unallocated NVM Capacity:           0
Controller ID:                      6
Number of Namespaces:               1
Namespace 1 Size/Capacity:          1,000,204,886,016 [1.00 TB]
Namespace 1 Utilization:            340,720,836,608 [340 GB]
Namespace 1 Formatted LBA Size:     512
Namespace 1 IEEE EUI-64:            002538 5b1140c2bf
Local Time is:                      Fri Feb  3 04:49:56 2023 PST
Firmware Updates (0x16):            3 Slots, no Reset required
Optional Admin Commands (0x0017):   Security Format Frmw_DL Self_Test
Optional NVM Commands (0x0057):     Comp Wr_Unc DS_Mngmt Sav/Sel_Feat Timestmp
Maximum Data Transfer Size:         128 Pages
Warning  Comp. Temp. Threshold:     82 Celsius
Critical Comp. Temp. Threshold:     85 Celsius

Supported Power States
St Op     Max   Active     Idle   RL RT WL WT  Ent_Lat  Ex_Lat
 0 +     7.54W       -        -    0  0  0  0        0       0
 1 +     7.54W       -        -    1  1  1  1        0     200
 2 +     7.54W       -        -    2  2  2  2        0    1000
 3 -   0.0500W       -        -    3  3  3  3     2000    1200
 4 -   0.0050W       -        -    4  4  4  4      500    9500

Supported LBA Sizes (NSID 0x1)
Id Fmt  Data  Metadt  Rel_Perf
 0 +     512       0         0

=== START OF SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART/Health Information (NVMe Log 0x02)
Critical Warning:                   0x00
Temperature:                        46 Celsius
Available Spare:                    96%
Available Spare Threshold:          10%
Percentage Used:                    3%
Data Units Read:                    60,734,021 [31.0 TB]
Data Units Written:                 54,941,078 [28.1 TB]
Host Read Commands:                 287,178,400
Host Write Commands:                728,180,577
Controller Busy Time:               2,680
Power Cycles:                       10
Power On Hours:                     3,180
Unsafe Shutdowns:                   4
Media and Data Integrity Errors:    10
Error Information Log Entries:      10
Warning  Comp. Temperature Time:    0
Critical Comp. Temperature Time:    0
Temperature Sensor 1:               46 Celsius
Temperature Sensor 2:               56 Celsius

Error Information (NVMe Log 0x01, max 64 entries)
Num   ErrCount  SQId   CmdId  Status  PELoc          LBA  NSID    VS
  0         10     3  0x03e5  0xc502  0x000    176284544     1     -
  1          9     5  0x033b  0x4502  0x000    176284544     1     -
  2          8     8  0x0149  0xc502  0x000    176284544     1     -
  3          7     1  0x00ba  0x4502  0x000    176284544     1     -
  4          6     7  0x00c4  0xc502  0x000    176284544     1     -
  5          5     1  0x008d  0x4502  0x000    176284544     1     -
  6          4     5  0x033c  0xc502  0x000    176284544     1     -
  7          3     2  0x0133  0x4502  0x000    176284544     1     -
  8          2     1  0x00a0  0xc502  0x000    176284544     1     -
  9          1     2  0x0111  0x4502  0x000    176284544     1     -
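
For anyone trying to read the Status column in the error logs, here is a rough Python sketch of how the values might be decoded. It assumes this smartctl version prints the raw 16-bit NVMe status field including the phase bit in bit 0, which I have not verified against the source; `decode_status` and the lookup table are hypothetical helpers, and the table only covers the one code seen here.

```python
# Assumed layout of the raw 16-bit status field (phase bit included):
#   bit 0      phase tag
#   bits 8:1   status code (SC)
#   bits 11:9  status code type (SCT)
#   bits 13:12 command retry delay (CRD)
#   bit 14     More
#   bit 15     Do Not Retry (DNR)

# Only the (SCT, SC) pair seen in this post; not a complete table.
KNOWN_CODES = {
    (0x2, 0x81): "Media and Data Integrity Errors: Unrecovered Read Error",
}


def decode_status(raw: int) -> dict:
    """Split a raw status value into its fields under the layout above."""
    sc = (raw >> 1) & 0xFF
    sct = (raw >> 9) & 0x7
    return {
        "sc": sc,
        "sct": sct,
        "more": bool((raw >> 14) & 0x1),
        "dnr": bool((raw >> 15) & 0x1),
        "meaning": KNOWN_CODES.get((sct, sc), "unknown"),
    }


if __name__ == "__main__":
    for raw in (0x4502, 0xC502):
        print(f"0x{raw:04x}: {decode_status(raw)}")
```

Under that assumption, both 0x4502 and 0xc502 decode to an unrecovered read error; the only difference is that 0xc502 also has the Do Not Retry bit set.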

I ended up RMAing them and got two replacements, both with the 4B2QEXM7 firmware.