In the Linux kernel, the following vulnerability has been resolved:
iommu/vt-d: Clear Present bit before tearing down PASID entry
The Intel VT-d Scalable Mode PASID table entry consists of 512 bits (64
bytes). When tearing down an entry, the current implementation zeros the
entire 64-byte structure immediately using multiple 64-bit writes.
Since the IOMMU hardware may fetch these 64 bytes using multiple
internal transactions (e.g., four 128-bit bursts), updating or zeroing
the entire entry while it is active (P=1) risks a "torn" read. If a
hardware fetch occurs simultaneously with the CPU zeroing the entry, the
hardware could observe an inconsistent state, leading to unpredictable
behavior or spurious faults.
Follow the "Guidance to Software for Invalidations" in the VT-d spec
(Section 6.5.3.3) by implementing the recommended ownership handshake:
1. Clear only the 'Present' (P) bit of the PASID entry.
2. Use a dma_wmb() to ensure the cleared bit is visible to hardware
before proceeding.
3. Execute the required invalidation sequence (PASID cache, IOTLB, and
Device-TLB flush) to ensure the hardware has released all cached
references.
4. Only after the flushes are complete, zero out the remaining fields
of the PASID entry.
Also, add a dma_wmb() in pasid_set_present() to ensure that all other
fields of the PASID entry are visible to the hardware before the Present
bit is set.
In the Linux kernel, the following vulnerability has been resolved:
apparmor: Fix & Optimize table creation from possibly unaligned memory
Source blob may come from userspace and might be unaligned.
Try to optize the copying process by avoiding unaligned memory accesses.
- Added Fixes tag
- Added "Fix &" to description as this doesn't just optimize but fixes
a potential unaligned memory access
[jj: remove duplicate word "convert" in comment trigger checkpatch warning]
In the Linux kernel, the following vulnerability has been resolved:
ext4: drop extent cache after doing PARTIAL_VALID1 zeroout
When splitting an unwritten extent in the middle and converting it to
initialized in ext4_split_extent() with the EXT4_EXT_MAY_ZEROOUT and
EXT4_EXT_DATA_VALID2 flags set, it could leave a stale unwritten extent.
Assume we have an unwritten file and buffered write in the middle of it
without dioread_nolock enabled, it will allocate blocks as written
extent.
0 A B N
[UUUUUUUUUUUU] on-disk extent U: unwritten extent
[UUUUUUUUUUUU] extent status tree
[--DDDDDDDD--] D: valid data
|<- ->| ----> this range needs to be initialized
ext4_split_extent() first try to split this extent at B with
EXT4_EXT_DATA_PARTIAL_VALID1 and EXT4_EXT_MAY_ZEROOUT flag set, but
ext4_split_extent_at() failed to split this extent due to temporary lack
of space. It zeroout B to N and leave the entire extent as unwritten.
0 A B N
[UUUUUUUUUUUU] on-disk extent
[UUUUUUUUUUUU] extent status tree
[--DDDDDDDDZZ] Z: zeroed data
ext4_split_extent() then try to split this extent at A with
EXT4_EXT_DATA_VALID2 flag set. This time, it split successfully and
leave an written extent from A to N.
0 A B N
[UUWWWWWWWWWW] on-disk extent W: written extent
[UUUUUUUUUUUU] extent status tree
[--DDDDDDDDZZ]
Finally ext4_map_create_blocks() only insert extent A to B to the extent
status tree, and leave an stale unwritten extent in the status tree.
0 A B N
[UUWWWWWWWWWW] on-disk extent W: written extent
[UUWWWWWWWWUU] extent status tree
[--DDDDDDDDZZ]
Fix this issue by always cached extent status entry after zeroing out
the second part.
In the Linux kernel, the following vulnerability has been resolved:
net: hns3: fix double free issue for tx spare buffer
In hns3_set_ringparam(), a temporary copy (tmp_rings) of the ring structure
is created for rollback. However, the tx_spare pointer in the original
ring handle is incorrectly left pointing to the old backup memory.
Later, if memory allocation fails in hns3_init_all_ring() during the setup,
the error path attempts to free all newly allocated rings. Since tx_spare
contains a stale (non-NULL) pointer from the backup, it is mistaken for
a newly allocated buffer and is erroneously freed, leading to a double-free
of the backup memory.
The root cause is that the tx_spare field was not cleared after its value
was saved in tmp_rings, leaving a dangling pointer.
Fix this by setting tx_spare to NULL in the original ring structure
when the creation of the new `tx_spare` fails. This ensures the
error cleanup path only frees genuinely newly allocated buffers.
In the Linux kernel, the following vulnerability has been resolved:
xen-netback: reject zero-queue configuration from guest
A malicious or buggy Xen guest can write "0" to the xenbus key
"multi-queue-num-queues". The connect() function in the backend only
validates the upper bound (requested_num_queues > xenvif_max_queues)
but not zero, allowing requested_num_queues=0 to reach
vzalloc(array_size(0, sizeof(struct xenvif_queue))), which triggers
WARN_ON_ONCE(!size) in __vmalloc_node_range().
On systems with panic_on_warn=1, this allows a guest-to-host denial
of service.
The Xen network interface specification requires
the queue count to be "greater than zero".
Add a zero check to match the validation already present
in xen-blkback, which has included this
guard since its multi-queue support was added.
In the Linux kernel, the following vulnerability has been resolved:
mptcp: do not account for OoO in mptcp_rcvbuf_grow()
MPTCP-level OoOs are physiological when multiple subflows are active
concurrently and will not cause retransmissions nor are caused by
drops.
Accounting for them in mptcp_rcvbuf_grow() causes the rcvbuf slowly
drifting towards tcp_rmem[2].
Remove such accounting. Note that subflows will still account for TCP-level
OoO when the MPTCP-level rcvbuf is propagated.
This also closes a subtle and very unlikely race condition with rcvspace
init; active sockets with user-space holding the msk-level socket lock,
could complete such initialization in the receive callback, after that the
first OoO data reaches the rcvbuf and potentially triggering a divide by
zero Oops.
In the Linux kernel, the following vulnerability has been resolved:
md/raid1: fix memory leak in raid1_run()
raid1_run() calls setup_conf() which registers a thread via
md_register_thread(). If raid1_set_limits() fails, the previously
registered thread is not unregistered, resulting in a memory leak
of the md_thread structure and the thread resource itself.
Add md_unregister_thread() to the error path to properly cleanup
the thread, which aligns with the error handling logic of other paths
in this function.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
In the Linux kernel, the following vulnerability has been resolved:
af_unix: Fix memleak of newsk in unix_stream_connect().
When prepare_peercred() fails in unix_stream_connect(),
unix_release_sock() is not called for newsk, and the memory
is leaked.
Let's move prepare_peercred() before unix_create1().
In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix bpf_xdp_store_bytes proto for read-only arg
While making some maps in Cilium read-only from the BPF side, we noticed
that the bpf_xdp_store_bytes proto is incorrect. In particular, the
verifier was throwing the following error:
; ret = ctx_store_bytes(ctx, l3_off + offsetof(struct iphdr, saddr),
&nat->address, 4, 0);
635: (79) r1 = *(u64 *)(r10 -144) ; R1=ctx() R10=fp0 fp-144=ctx()
636: (b4) w2 = 26 ; R2=26
637: (b4) w4 = 4 ; R4=4
638: (b4) w5 = 0 ; R5=0
639: (85) call bpf_xdp_store_bytes#190
write into map forbidden, value_size=6 off=0 size=4
nat comes from a BPF_F_RDONLY_PROG map, so R3 is a PTR_TO_MAP_VALUE.
The verifier checks the helper's memory access to R3 in
check_mem_size_reg, as it reaches ARG_CONST_SIZE argument. The third
argument has expected type ARG_PTR_TO_UNINIT_MEM, which includes the
MEM_WRITE flag. The verifier thus checks for a BPF_WRITE access on R3.
Given R3 points to a read-only map, the check fails.
Conversely, ARG_PTR_TO_UNINIT_MEM can also lead to the helper reading
from uninitialized memory.
This patch simply fixes the expected argument type to match that of
bpf_skb_store_bytes.
In the Linux kernel, the following vulnerability has been resolved:
power: supply: cpcap-battery: Fix use-after-free in power_supply_changed()
Using the `devm_` variant for requesting IRQ _before_ the `devm_`
variant for allocating/registering the `power_supply` handle, means that
the `power_supply` handle will be deallocated/unregistered _before_ the
interrupt handler (since `devm_` naturally deallocates in reverse
allocation order). This means that during removal, there is a race
condition where an interrupt can fire just _after_ the `power_supply`
handle has been freed, *but* just _before_ the corresponding
unregistration of the IRQ handler has run.
This will lead to the IRQ handler calling `power_supply_changed()` with
a freed `power_supply` handle. Which usually crashes the system or
otherwise silently corrupts the memory...
Note that there is a similar situation which can also happen during
`probe()`; the possibility of an interrupt firing _before_ registering
the `power_supply` handle. This would then lead to the nasty situation
of using the `power_supply` handle *uninitialized* in
`power_supply_changed()`.
Fix this racy use-after-free by making sure the IRQ is requested _after_
the registration of the `power_supply` handle.
In the Linux kernel, the following vulnerability has been resolved:
apparmor: avoid per-cpu hold underflow in aa_get_buffer
When aa_get_buffer() pulls from the per-cpu list it unconditionally
decrements cache->hold. If hold reaches 0 while count is still non-zero,
the unsigned decrement wraps to UINT_MAX. This keeps hold non-zero for a
very long time, so aa_put_buffer() never returns buffers to the global
list, which can starve other CPUs and force repeated kmalloc(aa_g_path_max)
allocations.
Guard the decrement so hold never underflows.
In the Linux kernel, the following vulnerability has been resolved:
iio: sca3000: Fix a resource leak in sca3000_probe()
spi->irq from request_threaded_irq() not released when
iio_device_register() fails. Add an return value check and jump to a
common error handler when iio_device_register() fails.
In the Linux kernel, the following vulnerability has been resolved:
power: supply: pm8916_bms_vm: Fix use-after-free in power_supply_changed()
Using the `devm_` variant for requesting IRQ _before_ the `devm_`
variant for allocating/registering the `power_supply` handle, means that
the `power_supply` handle will be deallocated/unregistered _before_ the
interrupt handler (since `devm_` naturally deallocates in reverse
allocation order). This means that during removal, there is a race
condition where an interrupt can fire just _after_ the `power_supply`
handle has been freed, *but* just _before_ the corresponding
unregistration of the IRQ handler has run.
This will lead to the IRQ handler calling `power_supply_changed()` with
a freed `power_supply` handle. Which usually crashes the system or
otherwise silently corrupts the memory...
Note that there is a similar situation which can also happen during
`probe()`; the possibility of an interrupt firing _before_ registering
the `power_supply` handle. This would then lead to the nasty situation
of using the `power_supply` handle *uninitialized* in
`power_supply_changed()`.
Fix this racy use-after-free by making sure the IRQ is requested _after_
the registration of the `power_supply` handle.
In the Linux kernel, the following vulnerability has been resolved:
soc: mediatek: svs: Fix memory leak in svs_enable_debug_write()
In svs_enable_debug_write(), the buf allocated by memdup_user_nul()
is leaked if kstrtoint() fails.
Fix this by using __free(kfree) to automatically free buf, eliminating
the need for explicit kfree() calls and preventing leaks.
[Angelo: Added missing cleanup.h inclusion]
In the Linux kernel, the following vulnerability has been resolved:
PCI/P2PDMA: Release per-CPU pgmap ref when vm_insert_page() fails
When vm_insert_page() fails in p2pmem_alloc_mmap(), p2pmem_alloc_mmap()
doesn't invoke percpu_ref_put() to free the per-CPU ref of pgmap acquired
after gen_pool_alloc_owner(), and memunmap_pages() will hang forever when
trying to remove the PCI device.
Fix it by adding the missed percpu_ref_put().
In the Linux kernel, the following vulnerability has been resolved:
power: supply: bq25980: Fix use-after-free in power_supply_changed()
Using the `devm_` variant for requesting IRQ _before_ the `devm_`
variant for allocating/registering the `power_supply` handle, means that
the `power_supply` handle will be deallocated/unregistered _before_ the
interrupt handler (since `devm_` naturally deallocates in reverse
allocation order). This means that during removal, there is a race
condition where an interrupt can fire just _after_ the `power_supply`
handle has been freed, *but* just _before_ the corresponding
unregistration of the IRQ handler has run.
This will lead to the IRQ handler calling `power_supply_changed()` with
a freed `power_supply` handle. Which usually crashes the system or
otherwise silently corrupts the memory...
Note that there is a similar situation which can also happen during
`probe()`; the possibility of an interrupt firing _before_ registering
the `power_supply` handle. This would then lead to the nasty situation
of using the `power_supply` handle *uninitialized* in
`power_supply_changed()`.
Fix this racy use-after-free by making sure the IRQ is requested _after_
the registration of the `power_supply` handle.
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: Fix watch_id bounds checking in debug address watch v2
The address watch clear code receives watch_id as an unsigned value
(u32), but some helper functions were using a signed int and checked
bits by shifting with watch_id.
If a very large watch_id is passed from userspace, it can be converted
to a negative value. This can cause invalid shifts and may access
memory outside the watch_points array.
drm/amdkfd: Fix watch_id bounds checking in debug address watch v2
Fix this by checking that watch_id is within MAX_WATCH_ADDRESSES before
using it. Also use BIT(watch_id) to test and clear bits safely.
This keeps the behavior unchanged for valid watch IDs and avoids
undefined behavior for invalid ones.
Fixes the below:
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c:448
kfd_dbg_trap_clear_dev_address_watch() error: buffer overflow
'pdd->watch_points' 4 <= u32max user_rl='0-3,2147483648-u32max' uncapped
drivers/gpu/drm/amd/amdgpu/../amdkfd/kfd_debug.c
433 int kfd_dbg_trap_clear_dev_address_watch(struct kfd_process_device *pdd,
434 uint32_t watch_id)
435 {
436 int r;
437
438 if (!kfd_dbg_owns_dev_watch_id(pdd, watch_id))
kfd_dbg_owns_dev_watch_id() doesn't check for negative values so if
watch_id is larger than INT_MAX it leads to a buffer overflow.
(Negative shifts are undefined).
439 return -EINVAL;
440
441 if (!pdd->dev->kfd->shared_resources.enable_mes) {
442 r = debug_lock_and_unmap(pdd->dev->dqm);
443 if (r)
444 return r;
445 }
446
447 amdgpu_gfx_off_ctrl(pdd->dev->adev, false);
--> 448 pdd->watch_points[watch_id] = pdd->dev->kfd2kgd->clear_address_watch(
449 pdd->dev->adev,
450 watch_id);
v2: (as per, Jonathan Kim)
- Add early watch_id >= MAX_WATCH_ADDRESSES validation in the set path to
match the clear path.
- Drop the redundant bounds check in kfd_dbg_owns_dev_watch_id().
In the Linux kernel, the following vulnerability has been resolved:
HID: intel-ish-hid: fix NULL-ptr-deref in ishtp_bus_remove_all_clients
During a warm reset flow, the cl->device pointer may be NULL if the
reset occurs while clients are still being enumerated. Accessing
cl->device->reference_count without a NULL check leads to a kernel panic.
This issue was identified during multi-unit warm reboot stress clycles.
Add a defensive NULL check for cl->device to ensure stability under
such intensive testing conditions.
KASAN: null-ptr-deref in range [0000000000000000-0000000000000007]
Workqueue: ish_fw_update_wq fw_reset_work_fn
Call Trace:
ishtp_bus_remove_all_clients+0xbe/0x130 [intel_ishtp]
ishtp_reset_handler+0x85/0x1a0 [intel_ishtp]
fw_reset_work_fn+0x8a/0xc0 [intel_ish_ipc]
In the Linux kernel, the following vulnerability has been resolved:
arm64/gcs: Fix error handling in arch_set_shadow_stack_status()
alloc_gcs() returns an error-encoded pointer on failure, which comes
from do_mmap(), not NULL.
The current NULL check fails to detect errors, which could lead to using
an invalid GCS address.
Use IS_ERR_VALUE() to properly detect errors, consistent with the
check in gcs_alloc_thread_stack().
In the Linux kernel, the following vulnerability has been resolved:
mfd: arizona: Fix regulator resource leak on wm5102_clear_write_sequencer() failure
The wm5102_clear_write_sequencer() helper may return an error
and just return, bypassing the cleanup sequence and causing
regulators to remain enabled, leading to a resource leak.
Change the direct return to jump to the err_reset label to
properly free the resources.
In the Linux kernel, the following vulnerability has been resolved:
phy: freescale: imx8qm-hsio: fix NULL pointer dereference
During the probe the refclk_pad pointer is set to NULL if the
'fsl,refclk-pad-mode' property is not defined in the devicetree node. But
in imx_hsio_configure_clk_pad() this pointer is unconditionally used which
could result in a NULL pointer dereference. So check the pointer before to
use it.
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nft_set_rbtree: check for partial overlaps in anonymous sets
Userspace provides an optimized representation in case intervals are
adjacent, where the end element is omitted.
The existing partial overlap detection logic skips anonymous set checks
on start elements for this reason.
However, it is possible to add intervals that overlap to this anonymous
where two start elements with the same, eg. A-B, A-C where C < B.
start end
A B
start end
A C
Restore the check on overlapping start elements to report an overlap.
In the Linux kernel, the following vulnerability has been resolved:
scsi: smartpqi: Fix memory leak in pqi_report_phys_luns()
pqi_report_phys_luns() fails to release the rpl_list buffer when
encountering an unsupported data format or when the allocation for
rpl_16byte_wwid_list fails. These early returns bypass the cleanup logic,
leading to memory leaks.
Consolidate the error handling by adding an out_free_rpl_list label and use
goto statements to ensure rpl_list is consistently freed on failure.
Compile tested only. Issue found using a prototype static analysis tool and
code review.
In the Linux kernel, the following vulnerability has been resolved:
tpm: st33zp24: Fix missing cleanup on get_burstcount() error
get_burstcount() can return -EBUSY on timeout. When this happens,
st33zp24_send() returns directly without releasing the locality
acquired earlier.
Use goto out_err to ensure proper cleanup when get_burstcount() fails.
In the Linux kernel, the following vulnerability has been resolved:
SUNRPC: auth_gss: fix memory leaks in XDR decoding error paths
The gssx_dec_ctx(), gssx_dec_status(), and gssx_dec_name()
functions allocate memory via gssx_dec_buffer(), which calls
kmemdup(). When a subsequent decode operation fails, these
functions return immediately without freeing previously
allocated buffers, causing memory leaks.
The leak in gssx_dec_ctx() is particularly relevant because
the caller (gssp_accept_sec_context_upcall) initializes several
buffer length fields to non-zero values, resulting in memory
allocation:
struct gssx_ctx rctxh = {
.exported_context_token.len = GSSX_max_output_handle_sz,
.mech.len = GSS_OID_MAX_LEN,
.src_name.display_name.len = GSSX_max_princ_sz,
.targ_name.display_name.len = GSSX_max_princ_sz
};
If, for example, gssx_dec_name() succeeds for src_name but
fails for targ_name, the memory allocated for
exported_context_token, mech, and src_name.display_name
remains unreferenced and cannot be reclaimed.
Add error handling with goto-based cleanup to free any
previously allocated buffers before returning an error.
In the Linux kernel, the following vulnerability has been resolved:
power: supply: wm97xx: Fix NULL pointer dereference in power_supply_changed()
In `probe()`, `request_irq()` is called before allocating/registering a
`power_supply` handle. If an interrupt is fired between the call to
`request_irq()` and `power_supply_register()`, the `power_supply` handle
will be used uninitialized in `power_supply_changed()` in
`wm97xx_bat_update()` (triggered from the interrupt handler). This will
lead to a `NULL` pointer dereference since
Fix this racy `NULL` pointer dereference by making sure the IRQ is
requested _after_ the registration of the `power_supply` handle. Since
the IRQ is the last thing requests in the `probe()` now, remove the
error path for freeing it. Instead add one for unregistering the
`power_supply` handle when IRQ request fails.
In the Linux kernel, the following vulnerability has been resolved:
pinctrl: single: fix refcount leak in pcs_add_gpio_func()
of_parse_phandle_with_args() returns a device_node pointer with refcount
incremented in gpiospec.np. The loop iterates through all phandles but
never releases the reference, causing a refcount leak on each iteration.
Add of_node_put() calls to release the reference after extracting the
needed arguments and on the error path when devm_kzalloc() fails.
This bug was detected by our static analysis tool and verified by my
code review.
In the Linux kernel, the following vulnerability has been resolved:
power: supply: act8945a: Fix use-after-free in power_supply_changed()
Using the `devm_` variant for requesting IRQ _before_ the `devm_`
variant for allocating/registering the `power_supply` handle, means that
the `power_supply` handle will be deallocated/unregistered _before_ the
interrupt handler (since `devm_` naturally deallocates in reverse
allocation order). This means that during removal, there is a race
condition where an interrupt can fire just _after_ the `power_supply`
handle has been freed, *but* just _before_ the corresponding
unregistration of the IRQ handler has run.
This will lead to the IRQ handler calling `power_supply_changed()` with
a freed `power_supply` handle. Which usually crashes the system or
otherwise silently corrupts the memory...
Note that there is a similar situation which can also happen during
`probe()`; the possibility of an interrupt firing _before_ registering
the `power_supply` handle. This would then lead to the nasty situation
of using the `power_supply` handle *uninitialized* in
`power_supply_changed()`.
Fix this racy use-after-free by making sure the IRQ is requested _after_
the registration of the `power_supply` handle.
In the Linux kernel, the following vulnerability has been resolved:
serial: caif: fix use-after-free in caif_serial ldisc_close()
There is a use-after-free bug in caif_serial where handle_tx() may
access ser->tty after the tty has been freed.
The race condition occurs between ldisc_close() and packet transmission:
CPU 0 (close) CPU 1 (xmit)
------------- ------------
ldisc_close()
tty_kref_put(ser->tty)
[tty may be freed here]
<-- race window -->
caif_xmit()
handle_tx()
tty = ser->tty // dangling ptr
tty->ops->write() // UAF!
schedule_work()
ser_release()
unregister_netdevice()
The root cause is that tty_kref_put() is called in ldisc_close() while
the network device is still active and can receive packets.
Since ser and tty have a 1:1 binding relationship with consistent
lifecycles (ser is allocated in ldisc_open and freed in ser_release
via unregister_netdevice, and each ser binds exactly one tty), we can
safely defer the tty reference release to ser_release() where the
network device is unregistered.
Fix this by moving tty_kref_put() from ldisc_close() to ser_release(),
after unregister_netdevice(). This ensures the tty reference is held
as long as the network device exists, preventing the UAF.
Note: We save ser->tty before unregister_netdevice() because ser is
embedded in netdev's private data and will be freed along with netdev
(needs_free_netdev = true).
How to reproduce: Add mdelay(500) at the beginning of ldisc_close()
to widen the race window, then run the reproducer program [1].
Note: There is a separate deadloop issue in handle_tx() when using
PORT_UNKNOWN serial ports (e.g., /dev/ttyS3 in QEMU without proper
serial backend). This deadloop exists even without this patch,
and is likely caused by inconsistency between uart_write_room() and
uart_write() in serial core. It has been addressed in a separate
patch [2].
KASAN report:
==================================================================
BUG: KASAN: slab-use-after-free in handle_tx+0x5d1/0x620
Read of size 1 at addr ffff8881131e1490 by task caif_uaf_trigge/9929
Call Trace:
<TASK>
dump_stack_lvl+0x10e/0x1f0
print_report+0xd0/0x630
kasan_report+0xe4/0x120
handle_tx+0x5d1/0x620
dev_hard_start_xmit+0x9d/0x6c0
__dev_queue_xmit+0x6e2/0x4410
packet_xmit+0x243/0x360
packet_sendmsg+0x26cf/0x5500
__sys_sendto+0x4a3/0x520
__x64_sys_sendto+0xe0/0x1c0
do_syscall_64+0xc9/0xf80
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7f615df2c0d7
Allocated by task 9930:
Freed by task 64:
Last potentially related work creation:
The buggy address belongs to the object at ffff8881131e1000
which belongs to the cache kmalloc-cg-2k of size 2048
The buggy address is located 1168 bytes inside of
freed 2048-byte region [ffff8881131e1000, ffff8881131e1800)
The buggy address belongs to the physical page:
page_owner tracks the page as allocated
page last free pid 9778 tgid 9778 stack trace:
Memory state around the buggy address:
ffff8881131e1380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8881131e1400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff8881131e1480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff8881131e1500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff8881131e1580: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
[1]: https://gist.github.com/mrpre/f683f244544f7b11e7fa87df9e6c2eeb
[2]: https://lore.kernel.org/linux-serial/20260204074327.226165-1-jiayuan.chen@linux.dev/T/#u
In the Linux kernel, the following vulnerability has been resolved:
mctp i2c: initialise event handler read bytes
Set a 0xff value for i2c reads of an mctp-i2c device. Otherwise reads
will return "val" from the i2c bus driver. For i2c-aspeed and
i2c-npcm7xx that is a stack uninitialised u8.
Tested with "i2ctransfer -y 1 r10@0x34" where 0x34 is a mctp-i2c
instance, now it returns all 0xff.
In the Linux kernel, the following vulnerability has been resolved:
fs/ntfs3: prevent infinite loops caused by the next valid being the same
When processing valid within the range [valid : pos), if valid cannot
be retrieved correctly, for example, if the retrieved valid value is
always the same, this can trigger a potential infinite loop, similar
to the hung problem reported by syzbot [1].
Adding a check for the valid value within the loop body, and terminating
the loop and returning -EINVAL if the value is the same as the current
value, can prevent this.
[1]
INFO: task syz.4.21:6056 blocked for more than 143 seconds.
Call Trace:
rwbase_write_lock+0x14f/0x750 kernel/locking/rwbase_rt.c:244
inode_lock include/linux/fs.h:1027 [inline]
ntfs_file_write_iter+0xe6/0x870 fs/ntfs3/file.c:1284
In the Linux kernel, the following vulnerability has been resolved:
i3c: dw: Fix memory leak in dw_i3c_master_i2c_xfers()
The dw_i3c_master_i2c_xfers() function allocates memory for the xfer
structure using dw_i3c_master_alloc_xfer(). If pm_runtime_resume_and_get()
fails, the function returns without freeing the allocated xfer, resulting
in a memory leak.
Add a dw_i3c_master_free_xfer() call to the error path to ensure the
allocated memory is properly freed.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
In the Linux kernel, the following vulnerability has been resolved:
iommu/vt-d: Flush cache for PASID table before using it
When writing the address of a freshly allocated zero-initialized PASID
table to a PASID directory entry, do that after the CPU cache flush for
this PASID table, not before it, to avoid the time window when this
PASID table may be already used by non-coherent IOMMU hardware while
its contents in RAM is still some random old data, not zero-initialized.
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Fix slab-use-after-free in qd_put
Commit a475c5dd16e5 ("gfs2: Free quota data objects synchronously")
started freeing quota data objects during filesystem shutdown instead of
putting them back onto the LRU list, but it failed to remove these
objects from the LRU list, causing LRU list corruption. This caused
use-after-free when the shrinker (gfs2_qd_shrink_scan) tried to access
already-freed objects on the LRU list.
Fix this by removing qd objects from the LRU list before freeing them in
qd_put().
Initial fix from Deepanshu Kartikey <kartikey406@gmail.com>.
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nf_conncount: increase the connection clean up limit to 64
After the optimization to only perform one GC per jiffy, a new problem
was introduced. If more than 8 new connections are tracked per jiffy the
list won't be cleaned up fast enough possibly reaching the limit
wrongly.
In order to prevent this issue, only skip the GC if it was already
triggered during the same jiffy and the increment is lower than the
clean up limit. In addition, increase the clean up limit to 64
connections to avoid triggering GC too often and do more effective GCs.
This has been tested using a HTTP server and several
performance tools while having nft_connlimit/xt_connlimit or OVS limit
configured.
Output of slowhttptest + OVS limit at 52000 connections:
slow HTTP test status on 340th second:
initializing: 0
pending: 432
connected: 51998
error: 0
closed: 0
service available: YES
In the Linux kernel, the following vulnerability has been resolved:
netfilter: nfnetlink_queue: do shared-unconfirmed check before segmentation
Ulrich reports a regression with nfqueue:
If an application did not set the 'F_GSO' capability flag and a gso
packet with an unconfirmed nf_conn entry is received all packets are
now dropped instead of queued, because the check happens after
skb_gso_segment(). In that case, we did have exclusive ownership
of the skb and its associated conntrack entry. The elevated use
count is due to skb_clone happening via skb_gso_segment().
Move the check so that its peformed vs. the aggregated packet.
Then, annotate the individual segments except the first one so we
can do a 2nd check at reinject time.
For the normal case, where userspace does in-order reinjects, this avoids
packet drops: first reinjected segment continues traversal and confirms
entry, remaining segments observe the confirmed entry.
While at it, simplify nf_ct_drop_unconfirmed(): We only care about
unconfirmed entries with a refcnt > 1, there is no need to special-case
dying entries.
This only happens with UDP. With TCP, the only unconfirmed packet will
be the TCP SYN, those aren't aggregated by GRO.
Next patch adds a udpgro test case to cover this scenario.
In the Linux kernel, the following vulnerability has been resolved:
ext4: don't zero the entire extent if EXT4_EXT_DATA_PARTIAL_VALID1
When allocating initialized blocks from a large unwritten extent, or
when splitting an unwritten extent during end I/O and converting it to
initialized, there is currently a potential issue of stale data if the
extent needs to be split in the middle.
0 A B N
[UUUUUUUUUUUU] U: unwritten extent
[--DDDDDDDD--] D: valid data
|<- ->| ----> this range needs to be initialized
ext4_split_extent() first try to split this extent at B with
EXT4_EXT_DATA_ENTIRE_VALID1 and EXT4_EXT_MAY_ZEROOUT flag set, but
ext4_split_extent_at() failed to split this extent due to temporary lack
of space. It zeroout B to N and mark the entire extent from 0 to N
as written.
0 A B N
[WWWWWWWWWWWW] W: written extent
[SSDDDDDDDDZZ] Z: zeroed, S: stale data
ext4_split_extent() then try to split this extent at A with
EXT4_EXT_DATA_VALID2 flag set. This time, it split successfully and left
a stale written extent from 0 to A.
0 A B N
[WW|WWWWWWWWWW]
[SS|DDDDDDDDZZ]
Fix this by pass EXT4_EXT_DATA_PARTIAL_VALID1 to ext4_split_extent_at()
when splitting at B, don't convert the entire extent to written and left
it as unwritten after zeroing out B to N. The remaining work is just
like the standard two-part split. ext4_split_extent() will pass the
EXT4_EXT_DATA_VALID2 flag when it calls ext4_split_extent_at() for the
second time, allowing it to properly handle the split. If the split is
successful, it will keep extent from 0 to A as unwritten.
In the Linux kernel, the following vulnerability has been resolved:
scsi: csiostor: Fix dereference of null pointer rn
The error exit path when rn is NULL ends up deferencing the null pointer rn
via the use of the macro CSIO_INC_STATS. Fix this by adding a new error
return path label after the use of the macro to avoid the deference.
In the Linux kernel, the following vulnerability has been resolved:
RDMA/uverbs: Validate wqe_size before using it in ib_uverbs_post_send
ib_uverbs_post_send() uses cmd.wqe_size from userspace without any
validation before passing it to kmalloc() and using the allocated
buffer as struct ib_uverbs_send_wr.
If a user provides a small wqe_size value (e.g., 1), kmalloc() will
succeed, but subsequent accesses to user_wr->opcode, user_wr->num_sge,
and other fields will read beyond the allocated buffer, resulting in
an out-of-bounds read from kernel heap memory. This could potentially
leak sensitive kernel information to userspace.
Additionally, providing an excessively large wqe_size can trigger a
WARNING in the memory allocation path, as reported by syzkaller.
This is inconsistent with ib_uverbs_unmarshall_recv() which properly
validates that wqe_size >= sizeof(struct ib_uverbs_recv_wr) before
proceeding.
Add the same validation for ib_uverbs_post_send() to ensure wqe_size
is at least sizeof(struct ib_uverbs_send_wr).
In the Linux kernel, the following vulnerability has been resolved:
ata: libata-scsi: avoid Non-NCQ command starvation
When a non-NCQ command is issued while NCQ commands are being executed,
ata_scsi_qc_issue() indicates to the SCSI layer that the command issuing
should be deferred by returning SCSI_MLQUEUE_XXX_BUSY. This command
deferring is correct and as mandated by the ACS specifications since
NCQ and non-NCQ commands cannot be mixed.
However, in the case of a host adapter using multiple submission queues,
when the target device is under a constant load of NCQ commands, there
are no guarantees that requeueing the non-NCQ command will be executed
later and it may be deferred again repeatedly as other submission queues
can constantly issue NCQ commands from different CPUs ahead of the
non-NCQ command. This can lead to very long delays for the execution of
non-NCQ commands, and even complete starvation for these commands in the
worst case scenario.
Since the block layer and the SCSI layer do not distinguish between
queueable (NCQ) and non queueable (non-NCQ) commands, libata-scsi SAT
implementation must ensure forward progress for non-NCQ commands in the
presence of NCQ command traffic. This is similar to what SAS HBAs with a
hardware/firmware based SAT implementation do.
Implement such forward progress guarantee by limiting requeueing of
non-NCQ commands from ata_scsi_qc_issue(): when a non-NCQ command is
received and NCQ commands are in-flight, do not force a requeue of the
non-NCQ command by returning SCSI_MLQUEUE_XXX_BUSY and instead return 0
to indicate that the command was accepted but hold on to the qc using
the new deferred_qc field of struct ata_port.
This deferred qc will be issued using the work item deferred_qc_work
running the function ata_scsi_deferred_qc_work() once all in-flight
commands complete, which is checked with the port qc_defer() callback
return value indicating that no further delay is necessary. This check
is done using the helper function ata_scsi_schedule_deferred_qc() which
is called from ata_scsi_qc_complete(). This thus excludes this mechanism
from all internal non-NCQ commands issued by ATA EH.
When a port deferred_qc is non NULL, that is, the port has a command
waiting for the device queue to drain, the issuing of all incoming
commands (both NCQ and non-NCQ) is deferred using the regular busy
mechanism. This simplifies the code and also avoids potential denial of
service problems if a user issues too many non-NCQ commands.
Finally, whenever ata EH is scheduled, regardless of the reason, a
deferred qc is always requeued so that it can be retried once EH
completes. This is done by calling the function
ata_scsi_requeue_deferred_qc() from ata_eh_set_pending(). This avoids
the need for any special processing for the deferred qc in case of NCQ
error, link or device reset, or device timeout.
In the Linux kernel, the following vulnerability has been resolved:
crypto: inside-secure/eip93 - unregister only available algorithm
EIP93 has an options register. This register indicates which crypto
algorithms are implemented in silicon. Supported algorithms are
registered on this basis. Unregister algorithms on the same basis.
Currently, all algorithms are unregistered, even those not supported
by HW. This results in panic on platforms that don't have all options
implemented in silicon.
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: Use kvfree instead of kfree in amdgpu_gmc_get_nps_memranges()
amdgpu_discovery_get_nps_info() internally allocates memory for ranges
using kvcalloc(), which may use vmalloc() for large allocation. Using
kfree() to release vmalloc memory will lead to a memory corruption.
Use kvfree() to safely handle both kmalloc and vmalloc allocations.
Compile tested only. Issue found using a prototype static analysis tool
and code review.
In the Linux kernel, the following vulnerability has been resolved:
RDMA/rxe: Fix double free in rxe_srq_from_init
In rxe_srq_from_init(), the queue pointer 'q' is assigned to
'srq->rq.queue' before copying the SRQ number to user space.
If copy_to_user() fails, the function calls rxe_queue_cleanup()
to free the queue, but leaves the now-invalid pointer in
'srq->rq.queue'.
The caller of rxe_srq_from_init() (rxe_create_srq) eventually
calls rxe_srq_cleanup() upon receiving the error, which triggers
a second rxe_queue_cleanup() on the same memory, leading to a
double free.
The call trace looks like this:
kmem_cache_free+0x.../0x...
rxe_queue_cleanup+0x1a/0x30 [rdma_rxe]
rxe_srq_cleanup+0x42/0x60 [rdma_rxe]
rxe_elem_release+0x31/0x70 [rdma_rxe]
rxe_create_srq+0x12b/0x1a0 [rdma_rxe]
ib_create_srq_user+0x9a/0x150 [ib_core]
Fix this by moving 'srq->rq.queue = q' after copy_to_user.
In the Linux kernel, the following vulnerability has been resolved:
efi: Fix reservation of unaccepted memory table
The reserve_unaccepted() function incorrectly calculates the size of the
memblock reservation for the unaccepted memory table. It aligns the
size of the table, but fails to account for cases where the table's
starting physical address (efi.unaccepted) is not page-aligned.
If the table starts at an offset within a page and its end crosses into
a subsequent page that the aligned size does not cover, the end of the
table will not be reserved. This can lead to the table being overwritten
or inaccessible, causing a kernel panic in accept_memory().
This issue was observed when starting Intel TDX VMs with specific memory
sizes (e.g., > 64GB).
Fix this by calculating the end address first (including the unaligned
start) and then aligning it up, ensuring the entire range is covered
by the reservation.
In the Linux kernel, the following vulnerability has been resolved:
ipvs: skip ipv6 extension headers for csum checks
Protocol checksum validation fails for IPv6 if there are extension
headers before the protocol header. iph->len already contains its
offset, so use it to fix the problem.
In the Linux kernel, the following vulnerability has been resolved:
net: mscc: ocelot: add missing lock protection in ocelot_port_xmit_inj()
ocelot_port_xmit_inj() calls ocelot_can_inject() and
ocelot_port_inject_frame() without holding the injection group lock.
Both functions contain lockdep_assert_held() for the injection lock,
and the correct caller felix_port_deferred_xmit() properly acquires
the lock using ocelot_lock_inj_grp() before calling these functions.
Add ocelot_lock_inj_grp()/ocelot_unlock_inj_grp() around the register
injection path to fix the missing lock protection. The FDMA path is not
affected as it uses its own locking mechanism.
In the Linux kernel, the following vulnerability has been resolved:
apparmor: fix NULL sock in aa_sock_file_perm
Deal with the potential that sock and sock-sk can be NULL during
socket setup or teardown. This could lead to an oops. The fix for NULL
pointer dereference in __unix_needs_revalidation shows this is at
least possible for af_unix sockets. While the fix for af_unix sockets
applies for newer mediation this is still the fall back path for older
af_unix mediation and other sockets, so ensure it is covered.
In the Linux kernel, the following vulnerability has been resolved:
net: remove WARN_ON_ONCE when accessing forward path array
Although unlikely, recent support for IPIP tunnels increases chances of
reaching this WARN_ON_ONCE if userspace manages to build a sufficiently
long forward path.
Remove it.
Improper Certificate Validation vulnerability in Erlang OTP public_key (pubkey_ocsp module) allows forged OCSP responses signed with an expired responder certificate to be accepted as valid.
OCSP response verification in pubkey_ocsp:verify_response/5 and pubkey_ocsp:is_authorized_responder/3 in lib/public_key/src/pubkey_ocsp.erl does not check the validity period (notBefore/notAfter) of the OCSP responder certificate. An attacker who has obtained the private key of an expired CA-designated OCSP responder certificate can forge OCSP responses that Erlang/OTP accepts as valid.
This affects TLS clients using OCSP stapling via the ssl application: a malicious or compromised server can present a revoked TLS certificate together with a forged OCSP response signed by an expired responder key, and the client will accept the revoked certificate as valid. It also affects applications calling public_key:pkix_ocsp_validate/5 directly, where the impact depends on the use case — server-side client certificate validation using this API may allow authentication bypass with a revoked client certificate.
This issue affects OTP from OTP 27.0 before OTP 27.3.4.12, 28.5.0.1, and 29.0.1 corresponding to public_key from 1.16 before 1.17.1.3, 1.20.3.1, and 1.21.1.
Improper Following of a Certificate's Chain of Trust vulnerability in Erlang OTP public_key (pubkey_cert module) allows a non-CA certificate to be accepted as an intermediate issuer, enabling certificate chain forgery.
In lib/public_key/src/pubkey_cert.erl, pubkey_cert:validate_extensions/7 contains two flaws that together allow a certificate with basicConstraints cA:false and no keyUsage extension to be used as an intermediate issuer in a chain passed to public_key:pkix_path_validation/3: the cA:false clause recurses into the remaining extensions without rejecting the certificate when it is in issuer position, and the keyUsage check only fires when the extension is present, so a certificate lacking keyUsage entirely bypasses the keyCertSign enforcement.
Any party holding an end-entity certificate with basicConstraints cA:false and no keyUsage extension, issued by any CA in the victim's trust store, can use that certificate's private key to sign forged leaf certificates for arbitrary identities. public_key:pkix_path_validation/3 accepts the resulting chain, and by extension every TLS or mTLS endpoint built on the OTP ssl application that relies on the default verifier is affected, including server identity verification on the client side and client certificate verification on mTLS servers.
This issue affects OTP from OTP 17.0 before OTP 26.2.5.21, 27.3.4.12, 28.5.0.1, and 29.0.1 corresponding to public_key from 0.22 before 1.15.1.7, 1.17.1.3, 1.20.3.1, and 1.21.1.