In the Linux kernel, the following vulnerability has been resolved:
ext4: fix bounds check in check_xattrs() to prevent out-of-bounds access
The bounds check for the next xattr entry in check_xattrs() uses
(void *)next >= end, which allows next to point within sizeof(u32)
bytes of end. On the next loop iteration, IS_LAST_ENTRY() reads 4
bytes via *(__u32 *)(entry), which can overrun the valid xattr region.
For example, if next lands at end - 1, the check passes since
next < end, but IS_LAST_ENTRY() reads 4 bytes starting at end - 1,
accessing 3 bytes beyond the valid region.
Fix this by changing the check to (void *)next + sizeof(u32) > end,
ensuring there is always enough space for the IS_LAST_ENTRY() read
on the subsequent iteration.
In the Linux kernel, the following vulnerability has been resolved:
mm/vmalloc: take vmap_purge_lock in shrinker
decay_va_pool_node() can be invoked concurrently from two paths:
__purge_vmap_area_lazy() when pools are being purged, and the shrinker via
vmap_node_shrink_scan().
However, decay_va_pool_node() is not safe to run concurrently, and the
shrinker path currently lacks serialization, leading to races and possible
leaks.
Protect decay_va_pool_node() by taking vmap_purge_lock in the shrinker
path to ensure serialization with purge users.
In the Linux kernel, the following vulnerability has been resolved:
wifi: rtw88: check for PCI upstream bridge existence
pci_upstream_bridge() returns NULL if the device is on a root bus. If
8821CE is installed in the system with such a PCI topology, the probing
routine will crash. This has probably been unnoticed as 8821CE is mostly
supplied in laptops where there is a PCI-to-PCI bridge located upstream
from the device. However the card might be installed on a system with
different configuration.
Check if the bridge does exist for the specific workaround to be applied.
Found by Linux Verification Center (linuxtesting.org) with Svace static
analysis tool.
In the Linux kernel, the following vulnerability has been resolved:
media: rc: igorplugusb: heed coherency rules
In a control request, the USB request structure
can be subject to DMA on some HCs. Hence it must obey
the rules for DMA coherency. Allocate it separately.
In the Linux kernel, the following vulnerability has been resolved:
ALSA: aloop: Fix peer runtime UAF during format-change stop
loopback_check_format() may stop the capture side when playback starts
with parameters that no longer match a running capture stream. Commit
826af7fa62e3 ("ALSA: aloop: Fix racy access at PCM trigger") moved
the peer lookup under cable->lock, but the actual snd_pcm_stop() still
runs after dropping that lock.
A concurrent close can clear the capture entry from cable->streams[] and
detach or free its runtime while the playback trigger path still holds a
stale peer substream pointer.
Keep a per-cable count of in-flight peer stops before dropping
cable->lock, and make free_cable() wait for those stops before
detaching the runtime. This preserves the existing behavior while
making the peer runtime lifetime explicit.
In the Linux kernel, the following vulnerability has been resolved:
zram: do not forget to endio for partial discard requests
As reported by Qu Wenruo and Avinesh Kumar, the following
getconf PAGESIZE
65536
blkdiscard -p 4k /dev/zram0
takes literally forever to complete. zram doesn't support partial
discards and just returns immediately w/o doing any discard work in such
cases. The problem is that we forget to endio on our way out, so
blkdiscard sleeps forever in submit_bio_wait(). Fix this by jumping to
end_bio label, which does bio_endio().
In the Linux kernel, the following vulnerability has been resolved:
ALSA: control: Validate buf_len before strnlen() in snd_ctl_elem_init_enum_names()
snd_ctl_elem_init_enum_names() advances pointer p through the names
buffer while decrementing buf_len. If buf_len reaches zero but items
remain, the next iteration calls strnlen(p, 0).
While strnlen(p, 0) returns 0 and would hit the existing name_len == 0
error path, CONFIG_FORTIFY_SOURCE's fortified strnlen() first checks
maxlen against __builtin_dynamic_object_size(). When Clang loses track
of p's object size inside the loop, this triggers a BRK exception panic
before the return value is examined.
Add a buf_len == 0 guard at the loop entry to prevent calling fortified
strnlen() on an exhausted buffer.
Found by kernel fuzz testing through Xiaomi Smartphone.
In the Linux kernel, the following vulnerability has been resolved:
mm/damon/stat: fix memory leak on damon_start() failure in damon_stat_start()
Destroy the DAMON context and reset the global pointer when damon_start()
fails. Otherwise, the context allocated by damon_stat_build_ctx() is
leaked, and the stale damon_stat_context pointer will be overwritten on
the next enable attempt, making the old allocation permanently
unreachable.
In the Linux kernel, the following vulnerability has been resolved:
net: bridge: use a stable FDB dst snapshot in RCU readers
Local FDB entries can be rewritten in place by `fdb_delete_local()`, which
updates `f->dst` to another port or to `NULL` while keeping the entry
alive. Several bridge RCU readers inspect `f->dst`, including
`br_fdb_fillbuf()` through the `brforward_read()` sysfs path.
These readers currently load `f->dst` multiple times and can therefore
observe inconsistent values across the check and later dereference.
In `br_fdb_fillbuf()`, this means a concurrent local-FDB update can change
`f->dst` after the NULL check and before the `port_no` dereference,
leading to a NULL-ptr-deref.
Fix this by taking a single `READ_ONCE()` snapshot of `f->dst` in each
affected RCU reader and using that snapshot for the rest of the access
sequence. Also publish the in-place `f->dst` updates in `fdb_delete_local()`
with `WRITE_ONCE()` so the readers and writer use matching access patterns.
In the Linux kernel, the following vulnerability has been resolved:
rxrpc: Fix rxkad crypto unalignment handling
Fix handling of a packet with a misaligned crypto length. Also handle
non-ENOMEM errors from decryption by aborting. Further, remove the
WARN_ON_ONCE() so that it can't be remotely triggered (a trace line can
still be emitted).
In the Linux kernel, the following vulnerability has been resolved:
RDMA/mana_ib: Disable RX steering on RSS QP destroy
When an RSS QP is destroyed (e.g. DPDK exit), mana_ib_destroy_qp_rss()
destroys the RX WQ objects but does not disable vPort RX steering in
firmware. This leaves stale steering configuration that still points to
the destroyed RX objects.
If traffic continues to arrive (e.g. peer VM is still transmitting) and
the VF interface is subsequently brought up (mana_open), the firmware
may deliver completions using stale CQ IDs from the old RX objects.
These CQ IDs can be reused by the ethernet driver for new TX CQs,
causing RX completions to land on TX CQs:
WARNING: mana_poll_tx_cq+0x1b8/0x220 [mana] (is_sq == false)
WARNING: mana_gd_process_eq_events+0x209/0x290 (cq_table lookup fails)
Fix this by disabling vPort RX steering before destroying RX WQ objects.
Note that mana_fence_rqs() cannot be used here because the fence
completion is delivered on the CQ, which is polled by user-mode (e.g.
DPDK) and not visible to the kernel driver.
Refactor the disable logic into a shared mana_disable_vport_rx() in
mana_en, exported for use by mana_ib, replacing the duplicate code.
The ethernet driver's mana_dealloc_queues() is also updated to call
this common function.
In the Linux kernel, the following vulnerability has been resolved:
spi: fix resource leaks on device setup failure
Make sure to call controller cleanup() if spi_setup() fails while
registering a device to avoid leaking any resources allocated by
setup().
In the Linux kernel, the following vulnerability has been resolved:
KVM: SVM: Inject #UD for INVLPGA if EFER.SVME=0
INVLPGA should cause a #UD when EFER.SVME is not set. Add a check to
properly inject #UD when EFER.SVME=0.
[sean: tag for stable@]
In the Linux kernel, the following vulnerability has been resolved:
crypto: acomp - fix wrong pointer stored by acomp_save_req()
acomp_save_req() stores &req->chain in req->base.data. When
acomp_reqchain_done() is invoked on asynchronous completion, it receives
&req->chain as the data argument but casts it directly to struct
acomp_req. Since data points to the chain member, all subsequent field
accesses are at a wrong offset, resulting in memory corruption.
The issue occurs when an asynchronous hardware implementation, such as
the QAT driver, completes a request that uses the DMA virtual address
interface (e.g. acomp_request_set_src_dma()). This combination causes
crypto_acomp_compress() to enter the acomp_do_req_chain() path, which
sets acomp_reqchain_done() as the completion callback via
acomp_save_req().
With KASAN enabled, this manifests as a general protection fault in
acomp_reqchain_done():
general protection fault, probably for non-canonical address 0xe000040000000000
KASAN: probably user-memory-access in range [0x0000400000000000-0x0000400000000007]
RIP: 0010:acomp_reqchain_done+0x15b/0x4e0
Call Trace:
<IRQ>
qat_comp_alg_callback+0x5d/0xa0 [intel_qat]
adf_ring_response_handler+0x376/0x8b0 [intel_qat]
adf_response_handler+0x60/0x170 [intel_qat]
tasklet_action_common+0x223/0x820
handle_softirqs+0x1ab/0x640
</IRQ>
Fix this by storing the request itself in req->base.data instead of
&req->chain, so that acomp_reqchain_done() receives the correct pointer.
Simplify acomp_restore_req() accordingly to access req->chain directly.
In the Linux kernel, the following vulnerability has been resolved:
ocfs2: split transactions in dio completion to avoid credit exhaustion
During ocfs2 dio operations, JBD2 may report warnings via following
call trace:
ocfs2_dio_end_io_write
ocfs2_mark_extent_written
ocfs2_change_extent_flag
ocfs2_split_extent
ocfs2_try_to_merge_extent
ocfs2_extend_rotate_transaction
ocfs2_extend_trans
jbd2__journal_restart
start_this_handle
output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449
To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to
handle extents in a batch of transaction.
Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode
should only be removed from the orphan list after the extent tree update
is complete. This ensures that if a crash occurs in the middle of extent
tree updates, we won't leave stale blocks beyond EOF.
This patch also changes the logic for updating the inode size and removing
orphan, making it similar to ext4_dio_write_end_io(). Both operations are
performed only when everything looks good.
Finally, thanks to Jans and Joseph for providing the bug fix prototype and
suggestions.
In the Linux kernel, the following vulnerability has been resolved:
erofs: fix the out-of-bounds nameoff handling for trailing dirents
Currently we already have boundary-checks for nameoffs, but the trailing
dirents are special since the namelens are calculated with strnlen()
with unchecked nameoffs.
If a crafted EROFS has a trailing dirent with nameoff >= maxsize,
maxsize - nameoff can underflow, causing strnlen() to read past the
directory block.
nameoff0 should also be verified to be a multiple of
`sizeof(struct erofs_dirent)` as well [1].
[1] https://sashiko.dev/#/patchset/20260416063511.3173774-1-hsiangkao%40linux.alibaba.com
In the Linux kernel, the following vulnerability has been resolved:
crypto: atmel-tdes - fix DMA sync direction
Before DMA output is consumed by the CPU, ->dma_addr_out must be synced
with dma_sync_single_for_cpu() instead of dma_sync_single_for_device().
Using the wrong direction can return stale cache data on non-coherent
platforms.
In the Linux kernel, the following vulnerability has been resolved:
KVM: nSVM: Raise #UD if unhandled VMMCALL isn't intercepted by L1
Explicitly synthesize a #UD for VMMCALL if L2 is active, L1 does NOT want
to intercept VMMCALL, nested_svm_l2_tlb_flush_enabled() is true, and the
hypercall is something other than one of the supported Hyper-V hypercalls.
When all of the above conditions are met, KVM will intercept VMMCALL but
never forward it to L1, i.e. will let L2 make hypercalls as if it were L1.
The TLFS says a whole lot of nothing about this scenario, so go with the
architectural behavior, which says that VMMCALL #UDs if it's not
intercepted.
Opportunistically do a 2-for-1 stub trade by stub-ifying the new API
instead of the helpers it uses. The last remaining "single" stub will
soon be dropped as well.
[sean: rewrite changelog and comment, tag for stable, remove defunct stubs]
In the Linux kernel, the following vulnerability has been resolved:
crypto: atmel-sha204a - Fix potential UAF and memory leak in remove path
Unregister the hwrng to prevent new ->read() calls and flush the Atmel
I2C workqueue before teardown to prevent a potential UAF if a queued
callback runs while the device is being removed.
Drop the early return to ensure sysfs entries are removed and
->hwrng.priv is freed, preventing a memory leak.
In the Linux kernel, the following vulnerability has been resolved:
spi: ch341: fix memory leaks on probe failures
Make sure to deregister the controller, disable pins, and kill and free
the RX URB on probe failures to mirror disconnect and avoid memory
leaks and use-after-free.
Also add an explicit URB kill on disconnect for symmetry (even if that
is not strictly required as USB core would have stopped it in the
current setup).
In the Linux kernel, the following vulnerability has been resolved:
hwmon: (powerz) Fix missing usb_kill_urb() on signal interrupt
wait_for_completion_interruptible_timeout() returns -ERESTARTSYS when
interrupted. This needs to abort the URB and return an error. No data
has been received from the device so any reads from the transfer
buffer are invalid.
The original code tests !ret, which only catches the timeout case (0).
On signal delivery (-ERESTARTSYS), !ret is false so the function skips
usb_kill_urb() and falls through to read from the unfilled transfer
buffer.
Fix by capturing the return value into a long (matching the function
return type) and handling signal (negative) and timeout (zero) cases
with separate checks that both call usb_kill_urb() before returning.
In the Linux kernel, the following vulnerability has been resolved:
ntfs3: add buffer boundary checks to run_unpack()
run_unpack() checks `run_buf < run_last` at the top of the while loop
but then reads size_size and offset_size bytes via run_unpack_s64()
without verifying they fit within the remaining buffer. A crafted NTFS
image with truncated run data in an MFT attribute triggers an OOB heap
read of up to 15 bytes when the filesystem is mounted.
Add boundary checks before each run_unpack_s64() call to ensure the
declared field size does not exceed the remaining buffer.
Found by fuzzing with a source-patched harness (LibAFL + QEMU).
In the Linux kernel, the following vulnerability has been resolved:
KVM: nSVM: Avoid clearing VMCB_LBR in vmcb12
svm_copy_lbrs() always marks VMCB_LBR dirty in the destination VMCB.
However, nested_svm_vmexit() uses it to copy LBRs to vmcb12, and
clearing clean bits in vmcb12 is not architecturally defined.
Move vmcb_mark_dirty() to callers and drop it for vmcb12.
This also facilitates incoming refactoring that does not pass the entire
VMCB to svm_copy_lbrs().
In the Linux kernel, the following vulnerability has been resolved:
md/raid5: validate payload size before accessing journal metadata
r5c_recovery_analyze_meta_block() and
r5l_recovery_verify_data_checksum_for_mb() iterate over payloads in a
journal metadata block using on-disk payload size fields without
validating them against the remaining space in the metadata block.
A corrupted journal contains payload sizes extending beyond the PAGE_SIZE
boundary can cause out-of-bounds reads when accessing payload fields or
computing offsets.
Add bounds validation for each payload type to ensure the full payload
fits within meta_size before processing.
In the Linux kernel, the following vulnerability has been resolved:
wifi: mwifiex: fix use-after-free in mwifiex_adapter_cleanup()
The mwifiex_adapter_cleanup() function uses timer_delete()
(non-synchronous) for the wakeup_timer before the adapter structure is
freed. This is incorrect because timer_delete() does not wait for any
running timer callback to complete.
If the wakeup_timer callback (wakeup_timer_fn) is executing when
mwifiex_adapter_cleanup() is called, the callback will continue to
access adapter fields (adapter->hw_status, adapter->if_ops.card_reset,
etc.) which may be freed by mwifiex_free_adapter() called later in the
mwifiex_remove_card() path.
Use timer_delete_sync() instead to ensure any running timer callback has
completed before returning.
In the Linux kernel, the following vulnerability has been resolved:
crypto: nx - fix bounce buffer leaks in nx842_crypto_{alloc,free}_ctx
The bounce buffers are allocated with __get_free_pages() using
BOUNCE_BUFFER_ORDER (order 2 = 4 pages), but both the allocation error
path and nx842_crypto_free_ctx() release the buffers with free_page().
Use free_pages() with the matching order instead.
In the Linux kernel, the following vulnerability has been resolved:
mm/damon/core: validate damos_quota_goal->nid for node_memcg_{used,free}_bp
Users can set damos_quota_goal->nid with arbitrary value for
node_memcg_{used,free}_bp. But DAMON core is using those for NODE-DATA()
without a validation of the value. This can result in out of bounds
memory access. The issue can actually triggered using DAMON user-space
tool (damo), like below.
$ sudo mkdir /sys/fs/cgroup/foo
$ sudo ./damo start --damos_action stat --damos_quota_interval 1s \
--damos_quota_goal node_memcg_used_bp 50% -1 /foo
$ sudo dmseg
[...]
[ 524.181426] Unable to handle kernel paging request at virtual address 0000000000002c00
Fix this issue by adding the validation of the given node id. If an
invalid node id is given, it returns 0% for used memory ratio, and 100%
for free memory ratio.
In the Linux kernel, the following vulnerability has been resolved:
ceph: fix num_ops off-by-one when crypto allocation fails
move_dirty_folio_in_page_array() may fail if the file is encrypted, the
dirty folio is not the first in the batch, and it fails to allocate a
bounce buffer to hold the ciphertext. When that happens,
ceph_process_folio_batch() simply redirties the folio and flushes the
current batch -- it can retry that folio in a future batch.
However, if this failed folio is not contiguous with the last folio that
did make it into the batch, then ceph_process_folio_batch() has already
incremented `ceph_wbc->num_ops`; because it doesn't follow through and
add the discontiguous folio to the array, ceph_submit_write() -- which
expects that `ceph_wbc->num_ops` accurately reflects the number of
contiguous ranges (and therefore the required number of "write extent"
ops) in the writeback -- will panic the kernel:
BUG_ON(ceph_wbc->op_idx + 1 != req->r_num_ops);
This issue can be reproduced on affected kernels by writing to
fscrypt-enabled CephFS file(s) with a 4KiB-written/4KiB-skipped/repeat
pattern (total filesize should not matter) and gradually increasing the
system's memory pressure until a bounce buffer allocation fails.
Fix this crash by decrementing `ceph_wbc->num_ops` back to the correct
value when move_dirty_folio_in_page_array() fails, but the folio already
started counting a new (i.e. still-empty) extent.
The defect corrected by this patch has existed since 2022 (see first
`Fixes:`), but another bug blocked multi-folio encrypted writeback until
recently (see second `Fixes:`). The second commit made it into 6.18.16,
6.19.6, and 7.0-rc1, unmasking the panic in those versions. This patch
therefore fixes a regression (panic) introduced by cac190c7674f.
In the Linux kernel, the following vulnerability has been resolved:
fbdev: defio: Disconnect deferred I/O from the lifetime of struct fb_info
Hold state of deferred I/O in struct fb_deferred_io_state. Allocate an
instance as part of initializing deferred I/O and remove it only after
the final mapping has been closed. If the fb_info and the contained
deferred I/O meanwhile goes away, clear struct fb_deferred_io_state.info
to invalidate the mapping. Any access will then result in a SIGBUS
signal.
Fixes a long-standing problem, where a device hot-unplug happens while
user space still has an active mapping of the graphics memory. The hot-
unplug frees the instance of struct fb_info. Accessing the memory will
operate on undefined state.
In the Linux kernel, the following vulnerability has been resolved:
ibmasm: fix heap over-read in ibmasm_send_i2o_message()
The ibmasm_send_i2o_message() function uses get_dot_command_size() to
compute the byte count for memcpy_toio(), but this value is derived from
user-controlled fields in the dot_command_header (command_size: u8,
data_size: u16) and is never validated against the actual allocation size.
A root user can write a small buffer with inflated header fields, causing
memcpy_toio() to read up to ~65 KB past the end of the allocation into
adjacent kernel heap, which is then forwarded to the service processor
over MMIO.
Silently clamping the copy size is not sufficient: if the header fields
claim a larger size than the buffer, the SP receives a dot command whose
own header is inconsistent with the I2O message length, which can cause
the SP to desynchronize. Reject such commands outright by returning
failure.
Validate command_size before calling get_mfa_inbound() to avoid leaking
an I2O message frame: reading INBOUND_QUEUE_PORT dequeues a hardware
frame from the controller's free pool, and returning without a
corresponding set_mfa_inbound() call would permanently exhaust it.
Additionally, clamp command_size to I2O_COMMAND_SIZE before the
memcpy_toio() so the MMIO write stays within the I2O message frame,
consistent with the clamping already performed by outgoing_message_size()
for the header field.
In the Linux kernel, the following vulnerability has been resolved:
x86/shstk: Prevent deadlock during shstk sigreturn
During sigreturn the shadow stack signal frame is popped. The kernel does
this by reading the shadow stack using normal read accesses. When it can't
assume the memory is shadow stack, it takes extra steps to makes sure it is
reading actual shadow stack memory and not other normal readable memory. It
does this by holding the mmap read lock while doing the access and checking
the flags of the VMA.
Unfortunately that is not safe. If the read of the shadow stack sigframe
hits a page fault, the fault handler will try to recursively grab another
mmap read lock. This normally works ok, but if a writer on another CPU is
also waiting, the second read lock could fail and cause a deadlock.
Fix this by not holding mmap lock during the read access to userspace.
Instead use mmap_lock_speculate_...() to watch for changes between dropping
mmap lock and the userspace access. Retry if anything grabbed an mmap write
lock in between and could have changed the VMA.
These mmap_lock_speculate_...() helpers use mm::mm_lock_seq, which is only
available when PER_VMA_LOCK is configured. So make X86_USER_SHADOW_STACK
depend on it. On x86, PER_VMA_LOCK is a default configuration for SMP
kernels. So drop support for the other configs under the assumption that
the !SMP shadow stack user base does not exist.
Currently there is a check that skips the lookup work when the SSP can be
assumed to be on a shadow stack. While reorganizing the function, remove
the optimization to make the tricky code flows more common, such that
issues like this cannot escape detection for so long.
In the Linux kernel, the following vulnerability has been resolved:
ntfs3: fix integer overflow in run_unpack() volume boundary check
The volume boundary check `lcn + len > sbi->used.bitmap.nbits` uses raw
addition which can wrap around for large lcn and len values, bypassing
the validation. Use check_add_overflow() as is already done for the
adjacent prev_lcn + dlcn and vcn64 + len checks added by commit
3ac37e100385 ("ntfs3: Fix integer overflow in run_unpack()").
Found by fuzzing with a source-patched harness (LibAFL + QEMU).
In the Linux kernel, the following vulnerability has been resolved:
jbd2: fix deadlock in jbd2_journal_cancel_revoke()
Commit f76d4c28a46a ("fs/jbd2: use sleeping version of
__find_get_block()") changed jbd2_journal_cancel_revoke() to use
__find_get_block_nonatomic() which holds the folio lock instead of
i_private_lock. This breaks the lock ordering (folio -> buffer) and
causes an ABBA deadlock when the filesystem blocksize < pagesize:
T1 T2
ext4_mkdir()
ext4_init_new_dir()
ext4_append()
ext4_getblk()
lock_buffer() <- A
sync_blockdev()
blkdev_writepages()
writeback_iter()
writeback_get_folio()
folio_lock() <- B
ext4_journal_get_create_access()
jbd2_journal_cancel_revoke()
__find_get_block_nonatomic()
folio_lock() <- B
block_write_full_folio()
lock_buffer() <- A
This can occasionally cause generic/013 to hang.
Fix by only calling __find_get_block_nonatomic() when the passed
buffer_head doesn't belong to the bdev, which is the only case that we
need to look up its bdev alias. Otherwise, the lookup is redundant since
the found buffer_head is equal to the one we passed in.
In the Linux kernel, the following vulnerability has been resolved:
crypto: qat - fix IRQ cleanup on 6xxx probe failure
When adf_dev_up() partially completes and then fails, the IRQ
handlers registered during adf_isr_resource_alloc() are not detached
before the MSI-X vectors are released.
Since the device is enabled with pcim_enable_device(), calling
pci_alloc_irq_vectors() internally registers pcim_msi_release() as a
devres action. On probe failure, devres runs pcim_msi_release() which
calls pci_free_irq_vectors(), tearing down the MSI-X vectors while IRQ
handlers (for example 'qat0-bundle0') are still attached. This causes
remove_proc_entry() warnings:
[ 22.163964] remove_proc_entry: removing non-empty directory 'irq/143', leaking at least 'qat0-bundle0'
Moving the devm_add_action_or_reset() before adf_dev_up() does not solve
the problem since devres runs in LIFO order and pcim_msi_release(),
registered later inside adf_dev_up(), would still fire before
adf_device_down().
Fix by calling adf_dev_down() explicitly when adf_dev_up() fails, to
properly free IRQ handlers before devres releases the MSI-X vectors.
In the Linux kernel, the following vulnerability has been resolved:
KVM: nSVM: Always use NextRIP as vmcb02's NextRIP after first L2 VMRUN
For guests with NRIPS disabled, L1 does not provide NextRIP when running
an L2 with an injected soft interrupt, instead it advances the current RIP
before running it. KVM uses the current RIP as the NextRIP in vmcb02 to
emulate a CPU without NRIPS.
However, after L2 runs the first time, NextRIP will be updated by the CPU
and/or KVM, and the current RIP is no longer the correct value to use in
vmcb02. Hence, after save/restore, use the current RIP if and only if a
nested run is pending, otherwise use NextRIP. Give soft_int_next_rip the
same treatment, as it's the same logic, just for a narrower use case.
[sean: give soft_int_next_rip the same treatment]
In the Linux kernel, the following vulnerability has been resolved:
media: amphion: Fix race between m2m job_abort and device_run
Fix kernel panic caused by race condition where v4l2_m2m_ctx_release()
frees m2m_ctx while v4l2_m2m_try_run() is about to call device_run
with the same context.
Race sequence:
v4l2_m2m_try_run(): v4l2_m2m_ctx_release():
lock/unlock v4l2_m2m_cancel_job()
job_abort()
v4l2_m2m_job_finish()
kfree(m2m_ctx) <- frees ctx
device_run() <- use-after-free crash at 0x538
Crash trace:
Unable to handle kernel read from unreadable memory at virtual address
0000000000000538
v4l2_m2m_try_run+0x78/0x138
v4l2_m2m_device_run_work+0x14/0x20
The amphion vpu driver does not rely on the m2m framework's device_run
callback to perform encode/decode operations.
Fix the race by preventing m2m framework job scheduling entirely:
- Add job_ready callback returning 0 (no jobs ready for m2m framework)
- Remove job_abort callback to avoid the race condition
In the Linux kernel, the following vulnerability has been resolved:
landlock: Fix LOG_SUBDOMAINS_OFF inheritance across fork()
hook_cred_transfer() only copies the Landlock security blob when the
source credential has a domain. This is inconsistent with
landlock_restrict_self() which can set LOG_SUBDOMAINS_OFF on a
credential without creating a domain (via the ruleset_fd=-1 path): the
field is committed but not preserved across fork() because the child's
prepare_creds() calls hook_cred_transfer() which skips the copy when
domain is NULL.
This breaks the documented use case where a process mutes subdomain logs
before forking sandboxed children: the children lose the muting and
their domains produce unexpected audit records.
Fix this by unconditionally copying the Landlock credential blob.
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: hci_event: fix potential UAF in SSP passkey handlers
hci_conn lookup and field access must be covered by hdev lock in
hci_user_passkey_notify_evt() and hci_keypress_notify_evt(), otherwise
the connection can be freed concurrently.
Extend the hci_dev_lock critical section to cover all conn usage in both
handlers.
Keep the existing keypress notification behavior unchanged by routing
the early exits through a common unlock path.
In the Linux kernel, the following vulnerability has been resolved:
selinux: fix overlayfs mmap() and mprotect() access checks
The existing SELinux security model for overlayfs is to allow access if
the current task is able to access the top level file (the "user" file)
and the mounter's credentials are sufficient to access the lower
level file (the "backing" file). Unfortunately, the current code does
not properly enforce these access controls for both mmap() and mprotect()
operations on overlayfs filesystems.
This patch makes use of the newly created security_mmap_backing_file()
LSM hook to provide the missing backing file enforcement for mmap()
operations, and leverages the backing file API and new LSM blob to
provide the necessary information to properly enforce the mprotect()
access controls.
In the Linux kernel, the following vulnerability has been resolved:
net: rds: fix MR cleanup on copy error
__rds_rdma_map() hands sg/pages ownership to the transport after
get_mr() succeeds. If copying the generated cookie back to user space
fails after that point, the error path must not free those resources
again before dropping the MR reference.
Remove the duplicate unpin/free from the put_user() failure branch so
that MR teardown is handled only through the existing final cleanup
path.
In the Linux kernel, the following vulnerability has been resolved:
ceph: only d_add() negative dentries when they are unhashed
Ceph can call d_add(dentry, NULL) on a negative dentry that is already
present in the primary dcache hash.
In the current VFS that is not safe. d_add() goes through __d_add()
to __d_rehash(), which unconditionally reinserts dentry->d_hash into
the hlist_bl bucket. If the dentry is already hashed, reinserting the
same node can corrupt the bucket, including creating a self-loop.
Once that happens, __d_lookup() can spin forever in the hlist_bl walk,
typically looping only on the d_name.hash mismatch check and
eventually triggering RCU stall reports like this one:
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 87-....: (2100 ticks this GP) idle=3a4c/1/0x4000000000000000 softirq=25003319/25003319 fqs=829
rcu: (t=2101 jiffies g=79058445 q=698988 ncpus=192)
CPU: 87 UID: 2952868916 PID: 3933303 Comm: php-cgi8.3 Not tainted 6.18.17-i1-amd #950 NONE
Hardware name: Dell Inc. PowerEdge R7615/0G9DHV, BIOS 1.6.6 09/22/2023
RIP: 0010:__d_lookup+0x46/0xb0
Code: c1 e8 07 48 8d 04 c2 48 8b 00 49 89 fc 49 89 f5 48 89 c3 48 83 e3 fe 48 83 f8 01 77 0f eb 2d 0f 1f 44 00 00 48 8b 1b 48 85 db <74> 20 39 6b 18 75 f3 48 8d 7b 78 e8 ba 85 d0 00 4c 39 63 10 74 1f
RSP: 0018:ff745a70c8253898 EFLAGS: 00000282
RAX: ff26e470054cb208 RBX: ff26e470054cb208 RCX: 000000006e958966
RDX: ff26e48267340000 RSI: ff745a70c82539b0 RDI: ff26e458f74655c0
RBP: 000000006e958966 R08: 0000000000000180 R09: 9cd08d909b919a89
R10: ff26e458f74655c0 R11: 0000000000000000 R12: ff26e458f74655c0
R13: ff745a70c82539b0 R14: d0d0d0d0d0d0d0d0 R15: 2f2f2f2f2f2f2f2f
FS: 00007f5770896980(0000) GS:ff26e482c5d88000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5764de50c0 CR3: 000000a72abb5001 CR4: 0000000000771ef0
PKRU: 55555554
Call Trace:
<TASK>
lookup_fast+0x9f/0x100
walk_component+0x1f/0x150
link_path_walk+0x20e/0x3d0
path_lookupat+0x68/0x180
filename_lookup+0xdc/0x1e0
vfs_statx+0x6c/0x140
vfs_fstatat+0x67/0xa0
__do_sys_newfstatat+0x24/0x60
do_syscall_64+0x6a/0x230
entry_SYSCALL_64_after_hwframe+0x76/0x7e
This is reachable with reused cached negative dentries. A Ceph lookup
or atomic_open can be handed a negative dentry that is already hashed,
and fs/ceph/dir.c then hits one of two paths that incorrectly assume
"negative" also means "unhashed":
- ceph_finish_lookup():
MDS reply is -ENOENT with no trace
-> d_add(dentry, NULL)
- ceph_lookup():
local ENOENT fast path for a complete directory with shared caps
-> d_add(dentry, NULL)
Both paths can therefore re-add an already-hashed negative dentry.
Ceph already uses the correct pattern elsewhere: ceph_fill_trace() only
calls d_add(dn, NULL) for a negative null-dentry reply when d_unhashed(dn)
is true.
Fix both fs/ceph/dir.c sites the same way: only call d_add() for a
negative dentry when it is actually unhashed. If the negative dentry
is already hashed, leave it in place and reuse it as-is.
This preserves the existing behavior for unhashed dentries while
avoiding d_hash list corruption for reused hashed negatives.
In the Linux kernel, the following vulnerability has been resolved:
md/raid5: fix soft lockup in retry_aligned_read()
When retry_aligned_read() encounters an overlapped stripe, it releases
the stripe via raid5_release_stripe() which puts it on the lockless
released_stripes llist. In the next raid5d loop iteration,
release_stripe_list() drains the stripe onto handle_list (since
STRIPE_HANDLE is set by the original IO), but retry_aligned_read()
runs before handle_active_stripes() and removes the stripe from
handle_list via find_get_stripe() -> list_del_init(). This prevents
handle_stripe() from ever processing the stripe to resolve the
overlap, causing an infinite loop and soft lockup.
Fix this by using __release_stripe() with temp_inactive_list instead
of raid5_release_stripe() in the failure path, so the stripe does not
go through the released_stripes llist. This allows raid5d to break out
of its loop, and the overlap will be resolved when the stripe is
eventually processed by handle_stripe().
In the Linux kernel, the following vulnerability has been resolved:
md/raid10: fix deadlock with check operation and nowait requests
When an array check is running it will raise the barrier at which point
normal requests will become blocked and increment the nr_pending value to
signal there is work pending inside of wait_barrier(). NOWAIT requests
do not block and so will return immediately with an error, and additionally
do not increment nr_pending in wait_barrier(). Upstream change commit
43806c3d5b9b ("raid10: cleanup memleak at raid10_make_request") added a
call to raid_end_bio_io() to fix a memory leak when NOWAIT requests hit
this condition. raid_end_bio_io() eventually calls allow_barrier() and
it will unconditionally do an atomic_dec_and_test(&conf->nr_pending) even
though the corresponding increment on nr_pending didn't happen in the
NOWAIT case.
This can be easily seen by starting a check operation while an application
is doing nowait IO on the same array. This results in a deadlocked state
due to nr_pending value underflowing and so the md resync thread gets stuck
waiting for nr_pending to == 0.
Output of r10conf state of the array when we hit this condition:
crash> struct r10conf
barrier = 1,
nr_pending = {
counter = -41
},
nr_waiting = 15,
nr_queued = 0,
Example of md_sync thread stuck waiting on raise_barrier() and other
requests stuck in wait_barrier():
md1_resync
[<0>] raise_barrier+0xce/0x1c0
[<0>] raid10_sync_request+0x1ca/0x1ed0
[<0>] md_do_sync+0x779/0x1110
[<0>] md_thread+0x90/0x160
[<0>] kthread+0xbe/0xf0
[<0>] ret_from_fork+0x34/0x50
[<0>] ret_from_fork_asm+0x1a/0x30
kworker/u1040:2+flush-253:4
[<0>] wait_barrier+0x1de/0x220
[<0>] regular_request_wait+0x30/0x180
[<0>] raid10_make_request+0x261/0x1000
[<0>] md_handle_request+0x13b/0x230
[<0>] __submit_bio+0x107/0x1f0
[<0>] submit_bio_noacct_nocheck+0x16f/0x390
[<0>] ext4_io_submit+0x24/0x40
[<0>] ext4_do_writepages+0x254/0xc80
[<0>] ext4_writepages+0x84/0x120
[<0>] do_writepages+0x7a/0x260
[<0>] __writeback_single_inode+0x3d/0x300
[<0>] writeback_sb_inodes+0x1dd/0x470
[<0>] __writeback_inodes_wb+0x4c/0xe0
[<0>] wb_writeback+0x18b/0x2d0
[<0>] wb_workfn+0x2a1/0x400
[<0>] process_one_work+0x149/0x330
[<0>] worker_thread+0x2d2/0x410
[<0>] kthread+0xbe/0xf0
[<0>] ret_from_fork+0x34/0x50
[<0>] ret_from_fork_asm+0x1a/0x30
In the Linux kernel, the following vulnerability has been resolved:
ALSA: ctxfi: Add fallback to default RSR for S/PDIF
spdif_passthru_playback_get_resources() uses atc->pll_rate as the RSR
for the MSR calculation loop. However, pll_rate is only updated in
atc_pll_init() and not in hw_pll_init(), so it remains 0 after the
card init.
When spdif_passthru_playback_setup() skips atc_pll_init() for
32000 Hz, (rsr * desc.msr) always becomes 0, causing the loop to spin
indefinitely.
Add fallback to use atc->rsr when atc->pll_rate is 0. This reflects
the hardware state, since hw_card_init() already configures the PLL
to the default RSR.
In the Linux kernel, the following vulnerability has been resolved:
ALSA: caiaq: fix usb_dev refcount leak on probe failure
create_card() takes a reference on the USB device with usb_get_dev()
and stores the matching usb_put_dev() in card_free(), which is
installed as the snd_card's ->private_free destructor.
However, ->private_free is only assigned near the end of init_card(),
after several failure points (usb_set_interface(), EP type checks,
usb_submit_urb(), the EP1_CMD_GET_DEVICE_INFO exchange, and its
timeout). When any of those fail, init_card() returns an error to
snd_probe(), which calls snd_card_free(card). Because ->private_free
is still NULL, card_free() never runs, the usb_get_dev() reference
is not dropped, and the struct usb_device leaks along with its
descriptor allocations and device_private.
syzbot reproduces this with a malformed UAC3 device whose only valid
altsetting is 0; init_card()'s usb_set_interface(usb_dev, 0, 1) call
fails with -EIO and triggers the leak.
Move the ->private_free assignment into create_card(), immediately
after usb_get_dev(), so that every error path reaching snd_card_free()
balances the reference. card_free()'s callees (snd_usb_caiaq_input_free,
free_urbs, kfree) already tolerate the partially-initialized state
because the chip private area is zero-initialized by snd_card_new().
In the Linux kernel, the following vulnerability has been resolved:
net: qrtr: ns: Fix use-after-free in driver remove()
In the remove callback, if a packet arrives after destroy_workqueue() is
called, but before sock_release(), the qrtr_ns_data_ready() callback will
try to queue the work, causing use-after-free issue.
Fix this issue by saving the default 'sk_data_ready' callback during
qrtr_ns_init() and use it to replace the qrtr_ns_data_ready() callback at
the start of remove(). This ensures that even if a packet arrives after
destroy_workqueue(), the work struct will not be dereferenced.
Note that it is also required to ensure that the RX threads are completed
before destroying the workqueue, because the threads could be using the
qrtr_ns_data_ready() callback.
In the Linux kernel, the following vulnerability has been resolved:
ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all()
The commit c8e008b60492 ("ext4: ignore xattrs past end")
introduced a refcount leak in when block_csum is false.
ext4_xattr_inode_dec_ref_all() calls ext4_get_inode_loc() to
get iloc.bh, but never releases it with brelse().
In the Linux kernel, the following vulnerability has been resolved:
md/md-llbitmap: skip reading rdevs that are not in_sync
When reading bitmap pages from member disks, the code iterates through
all rdevs and attempts to read from the first available one. However,
it only checks for raid_disk assignment and Faulty flag, missing the
In_sync flag check.
This can cause bitmap data to be read from spare disks that are still
being rebuilt and don't have valid bitmap information yet. Reading
stale or uninitialized bitmap data from such disks can lead to
incorrect dirty bit tracking, potentially causing data corruption
during recovery or normal operation.
Add the In_sync flag check to ensure bitmap pages are only read from
fully synchronized member disks that have valid bitmap data.