Diffstat (limited to 'content/notes/stuff-about-pcie.md')
-rw-r--r-- content/notes/stuff-about-pcie.md | 125
1 files changed, 62 insertions, 63 deletions
diff --git a/content/notes/stuff-about-pcie.md b/content/notes/stuff-about-pcie.md
index b783924..b540d24 100644
--- a/content/notes/stuff-about-pcie.md
+++ b/content/notes/stuff-about-pcie.md
@@ -1,9 +1,6 @@
---
title: Stuff about PCIe
date: 2022-01-03
-tags:
- - linux
- - harwdare
---
## Speed
@@ -12,7 +9,7 @@ The most common versions are 3 and 4, while 5 is starting to be
available with newer Intel processors.
| ver | encoding | transfer rate | x1 | x2 | x4 | x8 | x16 |
-|-----|-----------|---------------|------------|-------------|------------|------------|-------------|
+| --- | --------- | ------------- | ---------- | ----------- | ---------- | ---------- | ----------- |
| 1 | 8b/10b | 2.5GT/s | 250MB/s | 500MB/s | 1GB/s | 2GB/s | 4GB/s |
| 2 | 8b/10b | 5.0GT/s | 500MB/s | 1GB/s | 2GB/s | 4GB/s | 8GB/s |
| 3 | 128b/130b | 8.0GT/s | 984.6 MB/s | 1.969 GB/s | 3.94 GB/s | 7.88 GB/s | 15.75 GB/s |
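The per-lane figures in the table follow from the transfer rate and the encoding overhead. A minimal sketch of that arithmetic, assuming 1 MB = 10^6 bytes; the function name `link_bandwidth_mb_s` and the `ENCODING` table are illustrative, not part of any tool:

```python
# Encoding efficiency as (payload bits, total bits on the wire).
ENCODING = {"8b/10b": (8, 10), "128b/130b": (128, 130)}


def link_bandwidth_mb_s(transfer_rate_gt_s: float, encoding: str, lanes: int = 1) -> float:
    """Usable bandwidth of a link in MB/s (1 MB = 10**6 bytes)."""
    payload_bits, total_bits = ENCODING[encoding]
    # One transfer carries one bit per lane; divide by 8 to go from bits to bytes.
    bytes_per_second = transfer_rate_gt_s * 1e9 * payload_bits / total_bits / 8
    return bytes_per_second * lanes / 1e6


if __name__ == "__main__":
    rows = [(1, 2.5, "8b/10b"), (2, 5.0, "8b/10b"), (3, 8.0, "128b/130b")]
    for version, rate, encoding in rows:
        values = [round(link_bandwidth_mb_s(rate, encoding, n), 1) for n in (1, 2, 4, 8, 16)]
        print(version, encoding, values)
```

For gen3 this gives 8 × 128/130 / 8 ≈ 0.9846 GB/s per lane, which matches the 984.6 MB/s entry in the x1 column.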
@@ -76,12 +73,14 @@ An easy way to see the PCIe topology is with `lspci`:
\-18.7 Advanced Micro Devices, Inc. [AMD] Family 17h (Models 00h-0fh) Data Fabric: Device 18h; Function 7
Now, how do we read this?
+
```
+-[10000:00]-+-02.0-[01]----00.0 Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
| \-03.0-[02]----00.0 Intel Corporation NVMe Datacenter SSD [3DNAND, Beta Rock Controller]
```
This is a lot of information. How do we read it?
+
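- The first part in brackets (`[10000:00]`) is the domain and the bus.
- The second part (`02.0`) is the device and function number of the bridge on that bus.
- The third part, in brackets (`[01]`), is the secondary bus behind that bridge; the trailing `00.0` is the device and function of the endpoint on it.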
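A full PCI address such as `0000:01:00.0` is written as `domain:bus:device.function`, with every field in hexadecimal. The sketch below splits such a string into its fields; `PciAddress` and `parse_pci_address` are hypothetical helper names, and the optional-domain handling is an assumption based on `lspci` omitting the domain when it is zero.

```python
import re
from typing import NamedTuple


class PciAddress(NamedTuple):
    domain: int
    bus: int
    device: int
    function: int


# domain is optional; bus and device are two hex digits, function is 0-7.
_ADDRESS = re.compile(
    r"^(?:([0-9a-fA-F]{4,}):)?([0-9a-fA-F]{2}):([0-9a-fA-F]{2})\.([0-7])$"
)


def parse_pci_address(text: str) -> PciAddress:
    """Split 'domain:bus:device.function' into integer fields."""
    match = _ADDRESS.match(text.strip())
    if match is None:
        raise ValueError(f"not a PCI address: {text!r}")
    domain, bus, device, function = match.groups()
    return PciAddress(int(domain or "0", 16), int(bus, 16), int(device, 16), int(function))


if __name__ == "__main__":
    print(parse_pci_address("0000:01:00.0"))
    print(parse_pci_address("10000:02:00.0"))
```

The second call uses the five-digit domain `10000` seen in the tree output, which is why the pattern accepts four or more hex digits for the domain.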
@@ -171,18 +170,18 @@ lspci -v -s 0000:01:00.0
A few things to note from this output:
-- **GT/s** (gigatransfers per second) is the raw transfer rate of each
- lane (here, 8 billion transfers per second). This is a gen3
- controller (gen1 is 2.5 GT/s and gen2 is 5 GT/s).
-- **LnkCap** lists the link capabilities the device advertises, and
- **LnkSta** is the current link status. You want them to report the same
- values. If they don't, you are not using the hardware as it is
- intended (here I'm assuming the hardware is intended to work as a
- gen3 controller). In case the device is downgraded, the output will
- be like this: `LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)`
-- **width** is the number of lanes that can be used by the device
- (here, we can use 4 lanes)
-- **MaxPayload** is the maximum payload size of a PCIe packet (TLP)
+- **GT/s** (gigatransfers per second) is the raw transfer rate of each
+ lane (here, 8 billion transfers per second). This is a gen3
+ controller (gen1 is 2.5 GT/s and gen2 is 5 GT/s).
+- **LnkCap** lists the link capabilities the device advertises, and
+ **LnkSta** is the current link status. You want them to report the same
+ values. If they don't, you are not using the hardware as it is
+ intended (here I'm assuming the hardware is intended to work as a
+ gen3 controller). In case the device is downgraded, the output will
+ be like this: `LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)`
+- **width** is the number of lanes that can be used by the device
+ (here, we can use 4 lanes)
+- **MaxPayload** is the maximum payload size of a PCIe packet (TLP)
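The comparison between the advertised capabilities and the current link status can be automated. This is a rough sketch, assuming the `Speed ...GT/s` and `Width x...` fields appear in the form quoted above; `parse_link` and `is_downgraded` are hypothetical helpers, and the `LnkCap` sample line is made up for illustration.

```python
import re
from typing import Optional, Tuple

# Matches e.g. "Speed 2.5GT/s (downgraded), Width x16 (ok)".
_LINK = re.compile(r"Speed\s+([\d.]+)GT/s.*?Width\s+x(\d+)")


def parse_link(line: str) -> Optional[Tuple[float, int]]:
    """Return (speed in GT/s, lane count) from a LnkCap or LnkSta line."""
    match = _LINK.search(line)
    if match is None:
        return None
    return float(match.group(1)), int(match.group(2))


def is_downgraded(lnkcap: str, lnksta: str) -> bool:
    """True when the current link is slower or narrower than advertised."""
    cap, sta = parse_link(lnkcap), parse_link(lnksta)
    if cap is None or sta is None:
        raise ValueError("could not find Speed/Width in the given lines")
    return sta[0] < cap[0] or sta[1] < cap[1]


if __name__ == "__main__":
    # Hypothetical capability line; the status line is the one quoted in the text.
    cap_line = "LnkCap: Port #0, Speed 8GT/s, Width x16, ASPM L1, Exit Latency L1 <64us"
    sta_line = "LnkSta: Speed 2.5GT/s (downgraded), Width x16 (ok)"
    print(is_downgraded(cap_line, sta_line))
```

Comparing the parsed numbers rather than looking for the word `downgraded` keeps the check usable with `lspci` versions that do not print that annotation.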
## Debugging
@@ -213,53 +212,53 @@ that have not been completed).
CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+
AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-
-- The Uncorrectable Error Status (UESta) reports error status of
- individual uncorrectable error sources (no bits are set above):
- - Data Link Protocol Error (DLP)
- - Surprise Down Error (SDES)
- - Poisoned TLP (TLP)
- - Flow Control Protocol Error (FCP)
- - Completion Timeout (CmpltTO)
- - Completer Abort (CmpltAbrt)
- - Unexpected Completion (UnxCmplt)
- - Receiver Overflow (RxOF)
- - Malformed TLP (MalfTLP)
- - ECRC Error (ECRC)
- - Unsupported Request Error (UnsupReq)
- - ACS Violation (ACSViol)
-- The Uncorrectable Error Mask (UEMsk) controls reporting of
- individual errors by the device to the PCIe root complex. A masked
- error (bit set) is not recorded or reported. The output above shows
- that no errors are being masked.
-- The Uncorrectable Error Severity (UESvrt) controls whether an
- individual error is reported as a Non-fatal (clear) or Fatal error (set).
-- The Correctable Error Status (CESta) reports error status of individual
- correctable error sources (no bits are set above):
- - Receiver Error (RxErr)
- - Bad TLP status (BadTLP)
- - Bad DLLP status (BadDLLP)
- - Replay Timer Timeout status (Timeout)
- - REPLAY NUM Rollover status (Rollover)
- - Advisory Non-Fatal Error (NonFatalErr)
-- The Correctable Error Mask (CEMsk) controls reporting of individual
- errors by the device to the PCIe root complex. A masked error (bit
- set) is not reported to the RC. Above shows that Advisory Non-Fatal
- Errors are being masked - this bit is set by default to enable
- compatibility with software that does not comprehend Role-Based
- error reporting.
-- The Advanced Error Capabilities and Control Register (AERCap)
- reports and enables ECRC features (the output above indicates the device
- is capable of generating and checking ECRC, but neither is enabled):
- - First Error Pointer identifies the bit position of the first
- error reported in the Uncorrectable Error Status register
- - ECRC Generation Capable (GenCap) indicates if set that the
- function is capable of generating ECRC
- - ECRC Generation Enable (GenEn) indicates if ECRC generation is
- enabled (set)
- - ECRC Check Capable (ChkCap) indicates if set that the function
- is capable of checking ECRC
- - ECRC Check Enable (ChkEn) indicates if ECRC checking is enabled
+- The Uncorrectable Error Status (UESta) reports error status of
+ individual uncorrectable error sources (no bits are set above):
+ - Data Link Protocol Error (DLP)
+ - Surprise Down Error (SDES)
+ - Poisoned TLP (TLP)
+ - Flow Control Protocol Error (FCP)
+ - Completion Timeout (CmpltTO)
+ - Completer Abort (CmpltAbrt)
+ - Unexpected Completion (UnxCmplt)
+ - Receiver Overflow (RxOF)
+ - Malformed TLP (MalfTLP)
+ - ECRC Error (ECRC)
+ - Unsupported Request Error (UnsupReq)
+ - ACS Violation (ACSViol)
+- The Uncorrectable Error Mask (UEMsk) controls reporting of
+ individual errors by the device to the PCIe root complex. A masked
+ error (bit set) is not recorded or reported. The output above shows
+ that no errors are being masked.
+- The Uncorrectable Error Severity (UESvrt) controls whether an
+ individual error is reported as a Non-fatal (clear) or Fatal error (set).
+- The Correctable Error Status (CESta) reports error status of individual
+ correctable error sources (no bits are set above):
+ - Receiver Error (RxErr)
+ - Bad TLP status (BadTLP)
+ - Bad DLLP status (BadDLLP)
+ - Replay Timer Timeout status (Timeout)
+ - REPLAY NUM Rollover status (Rollover)
+ - Advisory Non-Fatal Error (NonFatalErr)
+- The Correctable Error Mask (CEMsk) controls reporting of individual
+ errors by the device to the PCIe root complex. A masked error (bit
+ set) is not reported to the RC. Above shows that Advisory Non-Fatal
+ Errors are being masked - this bit is set by default to enable
+ compatibility with software that does not comprehend Role-Based
+ error reporting.
+- The Advanced Error Capabilities and Control Register (AERCap)
+ reports and enables ECRC features (the output above indicates the device
+ is capable of generating and checking ECRC, but neither is enabled):
+ - First Error Pointer identifies the bit position of the first
+ error reported in the Uncorrectable Error Status register
+ - ECRC Generation Capable (GenCap) indicates if set that the
+ function is capable of generating ECRC
+ - ECRC Generation Enable (GenEn) indicates if ECRC generation is
+ enabled (set)
+ - ECRC Check Capable (ChkCap) indicates if set that the function
+ is capable of checking ECRC
+ - ECRC Check Enable (ChkEn) indicates if ECRC checking is enabled
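Each register line in the AER output is a list of flag names suffixed with `+` (bit set) or `-` (bit clear). A small sketch of decoding such a line into a dictionary, using the `CEMsk` and `AERCap` lines shown above as input; `parse_flags` is a hypothetical helper, not part of `lspci`.

```python
from typing import Dict, Tuple


def parse_flags(line: str) -> Tuple[str, Dict[str, bool]]:
    """Split 'Name: Flag+ Other-' into the register name and a flag map."""
    name, _, rest = line.partition(":")
    flags: Dict[str, bool] = {}
    for token in rest.split():
        # Only tokens ending in '+' or '-' are flags; other fields are skipped.
        if token[-1] in "+-":
            flags[token[:-1]] = token.endswith("+")
    return name.strip(), flags


if __name__ == "__main__":
    lines = [
        "CEMsk: RxErr- BadTLP- BadDLLP- Rollover- Timeout- NonFatalErr+",
        "AERCap: First Error Pointer: 00, GenCap+ CGenEn- ChkCap+ ChkEn-",
    ]
    for line in lines:
        register, flags = parse_flags(line)
        print(register, [flag for flag, is_set in flags.items() if is_set])
```

Listing only the set bits makes it quick to confirm, for example, that `NonFatalErr` is the only masked correctable error.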
## Compute Express Link (CXL)
-[Compute Express Link](https://en.wikipedia.org/wiki/Compute_Express_Link) (CXL) is an open standard for high-speed central processing unit (CPU)-to-device and CPU-to-memory connections, designed for high performance data center computers. The standard is built on top of the PCIe physical interface with protocols for I/O, memory, and cache coherence.
+[Compute Express Link](https://en.wikipedia.org/wiki/Compute_Express_Link) (CXL) is an open standard for high-speed central processing unit (CPU)-to-device and CPU-to-memory connections, designed for high performance data center computers. The standard is built on top of the PCIe physical interface with protocols for I/O, memory, and cache coherence.