In the Linux kernel, the following vulnerability has been resolved:
f2fs: fix fsck inconsistency caused by FGGC of node block
During FGGC node block migration, fsck may incorrectly treat the
migrated node block as fsync-written data.
The reproduction scenario:
root@vm:/mnt/f2fs# seq 1 2048 | xargs -n 1 ./test_sync // write inline inode and sync
root@vm:/mnt/f2fs# rm -f 1
root@vm:/mnt/f2fs# sync
root@vm:/mnt/f2fs# f2fs_io gc_range // move data block in sync mode and not write CP
SPO, "fsck --dry-run" find inode has already checkpointed but still
with DENT_BIT_SHIFT set
The root cause is that GC does not clear the dentry mark and fsync mark
during node block migration, leading fsck to misinterpret them as
user-issued fsync writes.
In BGGC mode, node block migration is handled by f2fs_sync_node_pages(),
which guarantees the dentry and fsync marks are cleared before writing.
This patch move the set/clear of the fsync|dentry marks into
__write_node_folio to make the logic clearer, and ensures the
fsync|dentry mark is cleared in FGGC.
In the Linux kernel, the following vulnerability has been resolved:
x86/CPU/AMD: Prevent improper isolation of shared resources in Zen2's op cache
Make sure resources are not improperly shared in the op cache and
cause instruction corruption this way.
In the Linux kernel, the following vulnerability has been resolved:
exit: prevent preemption of oopsing TASK_DEAD task
When an already-exiting task oopses, make_task_dead() currently calls
do_task_dead() with preemption enabled. That is forbidden:
do_task_dead() calls __schedule(), which has a comment saying "WARNING:
must be called with preemption disabled!".
If an oopsing task is preempted in do_task_dead(), between becoming
TASK_DEAD and entering the scheduler explicitly, bad things happen:
finish_task_switch() assumes that once the scheduler has switched away
from a TASK_DEAD task, the task can never run again and its stack is no
longer needed; but that assumption apparently doesn't hold if the dead
task was preempted (the SM_PREEMPT case).
This means that the scheduler ends up repeatedly dropping references on
the dead task's stack, which can lead to use-after-free or double-free
of the entire task stack; in other words, two tasks can end up running
on the same stack, resulting in various kinds of memory corruption.
(This does not just affect "recursively oopsing" tasks; it is enough to
oops once during task exit, for example in a file_operations::release
handler)
In the Linux kernel, the following vulnerability has been resolved:
ipv6: xfrm6: release dst on error in xfrm6_rcv_encap()
xfrm6_rcv_encap() performs an IPv6 route lookup when the skb does not
already have a dst attached. ip6_route_input_lookup() returns a
referenced dst entry even when the lookup resolves to an error route.
If dst->error is set, xfrm6_rcv_encap() drops the skb without attaching
the dst to the skb and without releasing the reference returned by the
lookup. Repeated packets hitting this path therefore leak dst entries.
Release the dst before jumping to the drop path.
In the Linux kernel, the following vulnerability has been resolved:
riscv: kvm: fix vector context allocation leak
When the second kzalloc (host_context.vector.datap) fails in
kvm_riscv_vcpu_alloc_vector_context, the first allocation
(guest_context.vector.datap) is leaked. Free it before returning.
In the Linux kernel, the following vulnerability has been resolved:
mptcp: pm: ADD_ADDR rtx: free sk if last
When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer(),
and released at the end.
If at that moment, it was the last reference being held, the sk would
not be freed. sock_put() should then be called instead of __sock_put().
But that's not enough: if it is the last reference, sock_put() will call
sk_free(), which will end up calling sk_stop_timer_sync() on the same
timer, and waiting indefinitely to finish. So it is needed to mark that
the timer is done at the end of the timer handler when it has not been
rescheduled, not to call sk_stop_timer_sync() on "itself".
In the Linux kernel, the following vulnerability has been resolved:
hfsplus: fix uninit-value by validating catalog record size
Syzbot reported a KMSAN uninit-value issue in hfsplus_strcasecmp(). The
root cause is that hfs_brec_read() doesn't validate that the on-disk
record size matches the expected size for the record type being read.
When mounting a corrupted filesystem, hfs_brec_read() may read less data
than expected. For example, when reading a catalog thread record, the
debug output showed:
HFSPLUS_BREC_READ: rec_len=520, fd->entrylength=26
HFSPLUS_BREC_READ: WARNING - entrylength (26) < rec_len (520) - PARTIAL READ!
hfs_brec_read() only validates that entrylength is not greater than the
buffer size, but doesn't check if it's less than expected. It successfully
reads 26 bytes into a 520-byte structure and returns success, leaving 494
bytes uninitialized.
This uninitialized data in tmp.thread.nodeName then gets copied by
hfsplus_cat_build_key_uni() and used by hfsplus_strcasecmp(), triggering
the KMSAN warning when the uninitialized bytes are used as array indices
in case_fold().
Fix by introducing hfsplus_brec_read_cat() wrapper that:
1. Calls hfs_brec_read() to read the data
2. Validates the record size based on the type field:
- Fixed size for folder and file records
- Variable size for thread records (depends on string length)
3. Returns -EIO if size doesn't match expected
For thread records, check against HFSPLUS_MIN_THREAD_SZ before reading
nodeName.length to avoid reading uninitialized data at call sites that
don't zero-initialize the entry structure.
Also initialize the tmp variable in hfsplus_find_cat() as defensive
programming to ensure no uninitialized data even if validation is
bypassed.
In the Linux kernel, the following vulnerability has been resolved:
mptcp: fix scheduling with atomic in timestamp sockopt
Using lock_sock_fast() (atomic context) around sock_set_timestamp()
and sock_set_timestamping() is unsafe, as both helpers can sleep.
Replace lock_sock_fast() with sleepable lock_sock()/release_sock()
to avoid scheduling while atomic panic.
In the Linux kernel, the following vulnerability has been resolved:
usb: usblp: fix uninitialized heap leak via LPGETSTATUS ioctl
Just like in a previous problem in this driver, usblp_ctrl_msg() will
collapse the usb_control_msg() return value to 0/-errno, discarding the
actual number of bytes transferred.
Ideally that short command should be detected and error out, but many
printers are known to send "incorrect" responses back so we can't just
do that.
statusbuf is kmalloc(8) at probe time and never filled before the first
LPGETSTATUS ioctl.
usblp_read_status() requests 1 byte. If a malicious printer responds
with zero bytes, *statusbuf is one byte of stale kmalloc heap,
sign-extended into the local int status, which the LPGETSTATUS path then
copy_to_user()s directly to the ioctl caller.
Fix this all by just zapping out the memory buffer when allocated at
probe time. If a later call does a short read, the data will be
identical to what the device sent it the last time, so there is no
"leak" of information happening.
In the Linux kernel, the following vulnerability has been resolved:
wifi: mac80211: use safe list iteration in radar detect work
The call to ieee80211_dfs_cac_cancel can cause the iterated chanctx to
be freed and removed from the list. Guard against this to avoid a
slab-use-after-free error.
In the Linux kernel, the following vulnerability has been resolved:
openvswitch: vport: fix self-deadlock on release of tunnel ports
vports are used concurrently and protected by RCU, so netdev_put()
must happen after the RCU grace period. So, either in an RCU call or
after the synchronize_net(). The rtnl_delete_link() must happen under
RTNL and so can't be executed in RCU context. Calling synchronize_net()
while holding RTNL is not a good idea for performance and system
stability under load in general, so calling netdev_put() in RCU call
is the right solution here.
However,
when the device is deleted, rtnl_unlock() will call netdev_run_todo()
and block until all the references are gone. In the current code this
means that we never reach the call_rcu() and the vport is never freed
and the reference is never released, causing a self-deadlock on device
removal.
Fix that by moving the rcu_call() before the rtnl_unlock(), so the
scheduled RCU callback will be executed when synchronize_net() is
called from the rtnl_unlock()->netdev_run_todo() while the RTNL itself
is already released.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix double free in create_space_info_sub_group() error path
When kobject_init_and_add() fails, the call chain is:
create_space_info_sub_group()
-> btrfs_sysfs_add_space_info_type()
-> kobject_init_and_add()
-> failure
-> kobject_put(&sub_group->kobj)
-> space_info_release()
-> kfree(sub_group)
Then control returns to create_space_info_sub_group(), where:
btrfs_sysfs_add_space_info_type() returns error
-> kfree(sub_group)
Thus, sub_group is freed twice.
Keep parent->sub_group[index] = NULL for the failure path, but after
btrfs_sysfs_add_space_info_type() has called kobject_put(), let the
kobject release callback handle the cleanup.
In the Linux kernel, the following vulnerability has been resolved:
wifi: b43legacy: enforce bounds check on firmware key index in RX path
Same fix as b43: the firmware-controlled key index in b43legacy_rx()
can exceed dev->max_nr_keys. The existing B43legacy_WARN_ON is
non-enforcing in production builds, allowing an out-of-bounds read of
dev->key[].
Make the check enforcing by dropping the frame for invalid indices.
In the Linux kernel, the following vulnerability has been resolved:
ice: fix double free in ice_sf_eth_activate() error path
When auxiliary_device_add() fails, ice_sf_eth_activate() jumps to
aux_dev_uninit and calls auxiliary_device_uninit(&sf_dev->adev).
The device release callback ice_sf_dev_release() frees sf_dev, but
the current error path falls through to sf_dev_free and calls
kfree(sf_dev) again, causing a double free.
Keep kfree(sf_dev) for the auxiliary_device_init() failure path, but
avoid falling through to sf_dev_free after auxiliary_device_uninit().
In the Linux kernel, the following vulnerability has been resolved:
md/raid10: fix divide-by-zero in setup_geo() with zero far_copies
setup_geo() extracts near_copies (nc) and far_copies (fc) from the
user-provided layout parameter without checking for zero. When fc=0
with the "improved" far set layout selected, 'geo->far_set_size =
disks / fc' triggers a divide-by-zero.
Validate nc and fc immediately after extraction, returning -1 if
either is zero.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix btrfs_ioctl_space_info() slot_count TOCTOU which can lead to info-leak
btrfs_ioctl_space_info() has a TOCTOU race between two passes over the
block group RAID type lists. The first pass counts entries to determine
the allocation size, then the second pass fills the buffer. The
groups_sem rwlock is released between passes, allowing concurrent block
group removal to reduce the entry count.
When the second pass fills fewer entries than the first pass counted,
copy_to_user() copies the full alloc_size bytes including trailing
uninitialized kmalloc bytes to userspace.
Fix by copying only total_spaces entries (the actually-filled count from
the second pass) instead of alloc_size bytes, and switch to kzalloc so
any future copy size mismatch cannot leak heap data.
In the Linux kernel, the following vulnerability has been resolved:
mptcp: pm: ADD_ADDR rtx: always decrease sk refcount
When an ADD_ADDR is retransmitted, the sk is held in sk_reset_timer().
It should then be released in all cases at the end.
Some (unlikely) checks were returning directly instead of calling
sock_put() to decrease the refcount. Jump to a new 'exit' label to call
__sock_put() (which will become sock_put() in the next commit) to fix
this potential leak.
While at it, drop the '!msk' check which cannot happen because it is
never reset, and explicitly mark the remaining one as "unlikely".
In the Linux kernel, the following vulnerability has been resolved:
ALSA: pcm: oss: Fix data race at accessing runtime.oss.trigger
Currently the runtime.oss.trigger field may be accessed concurrently
without protection, which may lead to the data race. And, in this
case, it may lead to more severe problem because it's a bit field; as
writing the data, it may overwrite other bit fields as well, which
confuses the operation completely, as spotted by fuzzing.
Fix it by covering runtime.oss.trigger bit fled also with the existing
params_lock mutex in both snd_pcm_oss_get_trigger() and
snd_pcm_oss_poll().
In the Linux kernel, the following vulnerability has been resolved:
smb/client: fix out-of-bounds read in smb2_compound_op()
If a server sends a truncated response but a large OutputBufferLength, and
terminates the EA list early, check_wsl_eas() returns success without
validating that the entire OutputBufferLength fits within iov_len.
Then smb2_compound_op() does:
memcpy(idata->wsl.eas, data[0], size[0]);
Where size[0] is OutputBufferLength. If iov_len is smaller than size[0],
memcpy can read beyond the end of the rsp_iov allocation and leak adjacent
kernel heap memory.
In the Linux kernel, the following vulnerability has been resolved:
sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters
scx_group_set_{weight,idle,bandwidth}() cache scx_root before acquiring
scx_cgroup_ops_rwsem, so the pointer can be stale by the time the op runs.
If the loaded scheduler is disabled and freed (via RCU work) and another is
enabled between the naked load and the rwsem acquire, the reader sees
scx_cgroup_enabled=true (the new scheduler's) but dereferences the freed one
- UAF on SCX_HAS_OP(sch, ...) / SCX_CALL_OP(sch, ...).
scx_cgroup_enabled is toggled only under scx_cgroup_ops_rwsem write
(scx_cgroup_{init,exit}), so reading scx_root inside the rwsem read section
correlates @sch with the enabled snapshot.
In the Linux kernel, the following vulnerability has been resolved:
8021q: delete cleared egress QoS mappings
vlan_dev_set_egress_priority() currently keeps cleared egress
priority mappings in the hash as tombstones. Repeated set/clear cycles
with distinct skb priorities therefore accumulate mapping nodes until
device teardown and leak memory.
Delete mappings when vlan_prio is cleared instead of keeping tombstones.
Now that the egress mapping lists are RCU protected, the node can be
unlinked safely and freed after a grace period.
In the Linux kernel, the following vulnerability has been resolved:
wifi: mac80211: drop stray 'static' from fast-RX rx_result
ieee80211_invoke_fast_rx() is documented as safe for parallel RX, but
its per-invocation rx_result is declared static. Concurrent callers then
share one instance and can overwrite each other's result between
ieee80211_rx_mesh_data() and the switch on res.
That can make a packet that was queued or consumed by
ieee80211_rx_mesh_data() fall through into ieee80211_rx_8023(), or make
a packet that should continue return as queued.
Make res an automatic variable so each invocation keeps its own result.
In the Linux kernel, the following vulnerability has been resolved:
usb: usblp: fix heap leak in IEEE 1284 device ID via short response
usblp_ctrl_msg() collapses the usb_control_msg() return value to
0/-errno, discarding the actual number of bytes transferred. A broken
printer can complete the GET_DEVICE_ID control transfer short and the
driver has no way to know.
usblp_cache_device_id_string() reads the 2-byte big-endian length prefix
from the response and trusts it (clamped only to the buffer bounds).
The buffer is kmalloc(1024) at probe time. A device that sends exactly
two bytes (e.g. 0x03 0xFF, claiming a 1023-byte ID) leaves
device_id_string[2..1022] holding stale kmalloc heap.
That stale data is then exposed:
- via the ieee1284_id sysfs attribute (sprintf("%s", buf+2), truncated
at the first NUL in the stale heap), and
- via the IOCNR_GET_DEVICE_ID ioctl, which copy_to_user()s the full
claimed length regardless of NULs, up to 1021 bytes of uninitialized
heap, with the leak size chosen by the device.
Fix this up by just zapping the buffer with zeros before each request
sent to the device.
In the Linux kernel, the following vulnerability has been resolved:
fanotify: fix false positive on permission events
fsnotify_get_mark_safe() may return false for a mark on an unrelated group,
which results in bypassing the permission check.
Fix by skipping over detached marks that are not in the current group.
In the Linux kernel, the following vulnerability has been resolved:
scsi: target: configfs: Bound snprintf() return in tg_pt_gp_members_show()
target_tg_pt_gp_members_show() formats LUN paths with snprintf() into a
256-byte stack buffer, then will memcpy() cur_len bytes from that
buffer. snprintf() returns the length the output would have had, which
can exceed the buffer size when the fabric WWN is long because iSCSI IQN
names can be up to 223 bytes. The check at the memcpy() site only
guards the destination page write, not the source read, so memcpy() will
read past the stack buffer and copy adjacent stack contents to the sysfs
reader, which when CONFIG_FORTIFY_SOURCE is enabled, fortify_panic()
will be triggered.
Commit 27e06650a5ea ("scsi: target: target_core_configfs: Add length
check to avoid buffer overflow") added the same bound to the
target_lu_gp_members_show() but the tg_pt_gp variant was missed so
resolve that here.
In the Linux kernel, the following vulnerability has been resolved:
spi: microchip-core-qspi: control built-in cs manually
The coreQSPI IP supports only a single chip select, which is
automagically operated by the hardware - set low when the transmit
buffer first gets written to and set high when the number of bytes
written to the TOTALBYTES field of the FRAMES register have been sent on
the bus. Additional devices must use GPIOs for their chip selects.
It was reported to me that if there are two devices attached to this
QSPI controller that the in-built chip select is set low while linux
tries to access the device attached to the GPIO.
This went undetected as the boards that connected multiple devices to
the SPI controller all exclusively used GPIOs for chip selects, not
relying on the built-in chip select at all. It turns out that this was
because the built-in chip select, when controlled automagically, is set
low when active and high when inactive, thereby ruling out its use for
active-high devices or devices that need to transmit with the chip
select disabled.
Modify the driver so that it controls chip select directly, retaining
the behaviour for mem_ops of setting the chip select active for the
entire duration of the transfer in the exec_op callback. For regular
transfers, implement the set_cs callback for the core to use.
As part of this, the existing setup callback, mchp_coreqspi_setup_op(),
is removed. Modifying the CLKIDLE field is not safe to do during
operation when there are multiple devices, so this code is removed
entirely. Setting the MASTER and ENABLE fields is something that can be
done once at probe, it doesn't need to be re-run for each device.
Instead the new setup callback sets the built-in chip select to its
inactive state for active-low devices, as the reset value of the chip
select in software controlled mode is low.
In the Linux kernel, the following vulnerability has been resolved:
KVM: arm64: Fix pin leak and publication ordering in __pkvm_init_vcpu()
Two bugs exist in the vCPU initialisation path:
1. If a check fails after hyp_pin_shared_mem() succeeds, the cleanup
path jumps to 'unlock' without calling unpin_host_vcpu() or
unpin_host_sve_state(), permanently leaking pin references on the
host vCPU and SVE state pages.
Extract a register_hyp_vcpu() helper that performs the checks and
the store. When register_hyp_vcpu() returns an error, call
unpin_host_vcpu() and unpin_host_sve_state() inline before falling
through to the existing 'unlock' label.
2. register_hyp_vcpu() publishes the new vCPU pointer into
'hyp_vm->vcpus[]' with a bare store, allowing a concurrent caller
of pkvm_load_hyp_vcpu() to observe a partially initialised vCPU
object.
Ensure the store uses smp_store_release() and the load uses
smp_load_acquire(). While 'vm_table_lock' currently serialises the
store and the load, these barriers ensure the reader sees the fully
initialised 'hyp_vcpu' object even if there were a lockless path or
if the lock's own ordering guarantees were insufficient for nested
object initialization.
In the Linux kernel, the following vulnerability has been resolved:
ALSA: usb-audio: Avoid potential endless loop in convert_chmap_v3()
The convert_chmap_v3() has a loop with its increment size of
cs_desc->wLength, but we forgot to validate cs_desc->wLength itself,
which may lead to potential endless loop by a malformed descriptor.
Add a proper size check to abort the loop for plugging the hole.
In the Linux kernel, the following vulnerability has been resolved:
RDMA/mana: Validate rx_hash_key_len
Sashiko points out that rx_hash_key_len comes from a uAPI structure and is
blindly passed to memcpy, allowing the userspace to trash kernel
memory. Bounds check it so the memcpy cannot overflow.
In the Linux kernel, the following vulnerability has been resolved:
RDMA/mana: Fix error unwind in mana_ib_create_qp_rss()
Sashiko points out that mana_ib_cfg_vport_steering() is leaked, the normal
destroy path cleans it up.
In the Linux kernel, the following vulnerability has been resolved:
ASoC: qcom: q6apm-lpass-dai: Fix multiple graph opens
As prepare can be called mulitple times, this can result in multiple
graph opens for playback path.
This will result in a memory leaks, fix this by adding a check before
opening.
In the Linux kernel, the following vulnerability has been resolved:
net: libwx: fix VF illegal register access
Register WX_CFG_PORT_ST is a PF restricted register. When a VF is
initialized, attempting to read this register triggers an illegal
register access, which lead to a system hang.
When the device is VF, the bus function ID can be obtained directly from
the PCI_FUNC(pdev->devfn).
In the Linux kernel, the following vulnerability has been resolved:
powerpc/xive: fix kmemleak caused by incorrect chip_data lookup
The kmemleak reports the following memory leak:
Unreferenced object 0xc0000002a7fbc640 (size 64):
comm "kworker/8:1", pid 540, jiffies 4294937872
hex dump (first 32 bytes):
01 00 00 00 00 00 00 00 00 00 09 04 00 04 00 00 ................
00 00 a7 81 00 00 0a c0 00 00 08 04 00 04 00 00 ................
backtrace (crc 177d48f6):
__kmalloc_cache_noprof+0x520/0x730
xive_irq_alloc_data.constprop.0+0x40/0xe0
xive_irq_domain_alloc+0xd0/0x1b0
irq_domain_alloc_irqs_parent+0x44/0x6c
pseries_irq_domain_alloc+0x1cc/0x354
irq_domain_alloc_irqs_parent+0x44/0x6c
msi_domain_alloc+0xb0/0x220
irq_domain_alloc_irqs_locked+0x138/0x4d0
__irq_domain_alloc_irqs+0x8c/0xfc
__msi_domain_alloc_irqs+0x214/0x4d8
msi_domain_alloc_irqs_all_locked+0x70/0xf8
pci_msi_setup_msi_irqs+0x60/0x78
__pci_enable_msix_range+0x54c/0x98c
pci_alloc_irq_vectors_affinity+0x16c/0x1d4
nvme_pci_enable+0xac/0x9c0 [nvme]
nvme_probe+0x340/0x764 [nvme]
This occurs when allocating MSI-X vectors for an NVMe device. During
allocation the XIVE code creates a struct xive_irq_data and stores it
in irq_data->chip_data.
When the MSI-X irqdomain is later freed, xive_irq_free_data() is
responsible for retrieving this structure and freeing it. However,
after commit cc0cc23babc9 ("powerpc/xive: Untangle xive from child
interrupt controller drivers"), xive_irq_free_data() retrieves the
chip_data using irq_get_chip_data(), which looks up the data through
the child domain.
This is incorrect because the XIVE-specific irq data is associated with
the XIVE (parent) domain. As a result the lookup fails and the allocated
struct xive_irq_data is never freed, leading to the kmemleak report
shown above.
Fix this by retrieving the irq_data from the correct domain using
irq_domain_get_irq_data() and then accessing the chip_data via
irq_data_get_irq_chip_data().
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: btmtk: validate WMT event SKB length before struct access
btmtk_usb_hci_wmt_sync() casts the WMT event response SKB data to
struct btmtk_hci_wmt_evt (7 bytes) and struct btmtk_hci_wmt_evt_funcc
(9 bytes) without first checking that the SKB contains enough data.
A short firmware response causes out-of-bounds reads from SKB tailroom.
Use skb_pull_data() to validate and advance past the base WMT event
header. For the FUNC_CTRL case, pull the additional status field bytes
before accessing them.
In the Linux kernel, the following vulnerability has been resolved:
smb: client: use kzalloc to zero-initialize security descriptor buffer
Commit 62e7dd0a39c2d ("smb: common: change the data type of num_aces
to le16") split struct smb_acl's __le32 num_aces field into __le16
num_aces and __le16 reserved. The reserved field corresponds to Sbz2
in the MS-DTYP ACL wire format, which must be zero [1].
When building an ACL descriptor in build_sec_desc(), we are using a
kmalloc()'ed descriptor buffer and writing the fields explicitly using
le16() writes now. This never writes to the 2 byte reserved field,
leaving it as uninitialized heap data.
When the reserved field happens to contain non-zero slab garbage,
Samba rejects the security descriptor with "ndr_pull_security_descriptor
failed: Range Error", causing chmod to fail with EINVAL.
Change kmalloc() to kzalloc() to ensure the entire buffer is
zero-initialized.
[1] https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-dtyp/20233ed8-a6c6-4097-aafa-dd545ed24428
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: hci_event: Fix OOB read and infinite loop in hci_le_create_big_complete_evt
hci_le_create_big_complete_evt() iterates over BT_BOUND connections for
a BIG handle using a while loop, accessing ev->bis_handle[i++] on each
iteration. However, there is no check that i stays within ev->num_bis
before the array access.
When a controller sends a LE_Create_BIG_Complete event with fewer
bis_handle entries than there are BT_BOUND connections for that BIG,
or with num_bis=0, the loop reads beyond the valid bis_handle[] flex
array into adjacent heap memory. Since the out-of-bounds values
typically exceed HCI_CONN_HANDLE_MAX (0x0EFF), hci_conn_set_handle()
rejects them and the connection remains in BT_BOUND state. The same
connection is then found again by hci_conn_hash_lookup_big_state(),
creating an infinite loop with hci_dev_lock held.
Fix this by terminating the BIG if in case not all BIS could be setup
properly.
In the Linux kernel, the following vulnerability has been resolved:
mptcp: pm: ADD_ADDR rtx: fix potential data-race
This mptcp_pm_add_timer() helper is executed as a timer callback in
softirq context. To avoid any data races, the socket lock needs to be
held with bh_lock_sock().
If the socket is in use, retry again soon after, similar to what is done
with the keepalive timer.
In the Linux kernel, the following vulnerability has been resolved:
wifi: mt76: mt7921: fix a potential clc buffer length underflow
The buf_len is used to limit the iterations for retrieving the country
power setting and may underflow under certain conditions due to changes
in the power table in CLC.
This underflow leads to an almost infinite loop or an invalid power
setting resulting in driver initialization failure.
In the Linux kernel, the following vulnerability has been resolved:
nvmet-tcp: fix race between ICReq handling and queue teardown
nvmet_tcp_handle_icreq() updates queue->state after sending an
Initialization Connection Response (ICResp), but it does so without
serializing against target-side queue teardown.
If an NVMe/TCP host sends an Initialization Connection Request
(ICReq) and immediately closes the connection, target-side teardown
may start in softirq context before io_work drains the already
buffered ICReq. In that case, nvmet_tcp_schedule_release_queue()
sets queue->state to NVMET_TCP_Q_DISCONNECTING and drops the queue
reference under state_lock.
If io_work later processes that ICReq, nvmet_tcp_handle_icreq() can
still overwrite the state back to NVMET_TCP_Q_LIVE. That defeats the
DISCONNECTING-state guard in nvmet_tcp_schedule_release_queue() and
allows a later socket state change to re-enter teardown and issue a
second kref_put() on an already released queue.
The ICResp send failure path has the same problem. If teardown has
already moved the queue to DISCONNECTING, a send error can still
overwrite the state with NVMET_TCP_Q_FAILED, again reopening the
window for a second teardown path to drop the queue reference.
Fix this by serializing both post-send state transitions with
state_lock and bailing out if teardown has already started.
Use -ESHUTDOWN as an internal sentinel for that bail-out path rather
than propagating it as a transport error like -ECONNRESET. Keep
nvmet_tcp_socket_error() setting rcv_state to NVMET_TCP_RECV_ERR before
honoring that sentinel so receive-side parsing stays quiesced until the
existing release path completes.
In the Linux kernel, the following vulnerability has been resolved:
platform/chrome: cros_ec_typec: Init mutex in Thunderbolt registration
cros_typec_register_thunderbolt() missed initializing the `adata->lock`
mutex. This leads to a NULL dereference when the mutex is later
acquired (e.g. in cros_typec_altmode_work()).
Initialize the mutex in cros_typec_register_thunderbolt() to fix the
issue.
In the Linux kernel, the following vulnerability has been resolved:
RDMA/rxe: Reject unknown opcodes before ICRC processing
Even after applying commit 7244491dab34 ("RDMA/rxe: Validate pad and ICRC
before payload_size() in rxe_rcv"), a single unauthenticated UDP packet
can still trigger panic. That patch handled payload_size() underflow only
for valid opcodes with short packets, not for packets carrying an unknown
opcode. The unknown-opcode OOB read described below predates that commit
and reaches back to the initial Soft RoCE driver.
The check added there reads
pkt->paylen < header_size(pkt) + bth_pad(pkt) + RXE_ICRC_SIZE
where header_size(pkt) expands to rxe_opcode[pkt->opcode].length. The
rxe_opcode[] array has 256 entries but is only populated for defined IB
opcodes; any other entry (for example opcode 0xff) is zero-initialized, so
length == 0 and the check degenerates to
pkt->paylen < 0 + bth_pad(pkt) + RXE_ICRC_SIZE
which does not constrain pkt->paylen enough. rxe_icrc_hdr() then computes
rxe_opcode[pkt->opcode].length - RXE_BTH_BYTES
which underflows when length == 0 and passes a huge value to rxe_crc32(),
causing an out-of-bounds read of the skb payload.
Reproduced on v7.0-rc7 with that fix applied, QEMU/KVM with
CONFIG_RDMA_RXE=y and CONFIG_KASAN=y, after
rdma link add rxe0 type rxe netdev eth0
A single 48-byte UDP packet to port 4791 with BTH opcode=0xff and
QPN=IB_MULTICAST_QPN triggers:
BUG: KASAN: slab-out-of-bounds in crc32_le+0x115/0x170
Read of size 1 at addr ...
The buggy address is located 0 bytes to the right of
allocated 704-byte region
Call Trace:
crc32_le+0x115/0x170
rxe_icrc_hdr.isra.0+0x226/0x300
rxe_icrc_check+0x13f/0x3a0
rxe_rcv+0x6e1/0x16e0
rxe_udp_encap_recv+0x20a/0x320
udp_queue_rcv_one_skb+0x7ed/0x12c0
Subsequent packets with the same shape fault on unmapped memory and panic
the kernel. The trigger requires only module load and "rdma link add"; no
QP, no connection, and no authentication.
Fix this by rejecting packets whose opcode has no rxe_opcode[] entry,
detected via the zero mask or zero length, before any length arithmetic
runs.
In the Linux kernel, the following vulnerability has been resolved:
net: rtnetlink: zero ifla_vf_broadcast to avoid stack infoleak in rtnl_fill_vfinfo
rtnl_fill_vfinfo() declares struct ifla_vf_broadcast on the stack
without initialisation:
struct ifla_vf_broadcast vf_broadcast;
The struct contains a single fixed 32-byte field:
/* include/uapi/linux/if_link.h */
struct ifla_vf_broadcast {
__u8 broadcast[32];
};
The function then copies dev->broadcast into it using dev->addr_len
as the length:
memcpy(vf_broadcast.broadcast, dev->broadcast, dev->addr_len);
On Ethernet devices (the overwhelming majority of SR-IOV NICs)
dev->addr_len is 6, so only the first 6 bytes of broadcast[] are
written. The remaining 26 bytes retain whatever was previously on
the kernel stack. The full struct is then handed to userspace via:
nla_put(skb, IFLA_VF_BROADCAST,
sizeof(vf_broadcast), &vf_broadcast)
leaking up to 26 bytes of uninitialised kernel stack per VF per
RTM_GETLINK request, repeatable.
The other vf_* structs in the same function are explicitly zeroed
for exactly this reason - see the memset() calls for ivi,
vf_vlan_info, node_guid and port_guid a few lines above.
vf_broadcast was simply missed when it was added.
Reachability: any unprivileged local process can open AF_NETLINK /
NETLINK_ROUTE without capabilities and send RTM_GETLINK with an
IFLA_EXT_MASK attribute carrying RTEXT_FILTER_VF. The kernel walks
each VF and emits IFLA_VF_BROADCAST, leaking 26 bytes of stack per
VF per request. Stack residue at this call site can include return
addresses and transient sensitive data; KASAN with stack
instrumentation, or KMSAN, will flag the nla_put() when reproduced.
Zero the on-stack struct before the partial memcpy, matching the
existing pattern used for the other vf_* structs in the same
function.
In the Linux kernel, the following vulnerability has been resolved:
KVM: x86: check for nEPT/nNPT in slow flush hypercalls
Checking is_guest_mode(vcpu) is incorrect, because translate_nested_gpa()
is only valid if an L2 guest is running *with nested EPT/NPT enabled*.
Instead use the same condition as translate_nested_gpa() itself.
In the Linux kernel, the following vulnerability has been resolved:
dm-verity-fec: fix reading parity bytes split across blocks (take 3)
fec_decode_bufs() assumes that the parity bytes of the first RS codeword
it decodes are never split across parity blocks.
This assumption is false. Consider v->fec->block_size == 4096 &&
v->fec->roots == 17 && fio->nbufs == 1, for example. In that case, each
call to fec_decode_bufs() consumes v->fec->roots * (fio->nbufs <<
DM_VERITY_FEC_BUF_RS_BITS) = 272 parity bytes.
Considering that the parity data for each message block starts on a
block boundary, the byte alignment in the parity data will iterate
through 272*i mod 4096 until the 3 parity blocks have been consumed. On
the 16th call (i=15), the alignment will be 4080 bytes into the first
block. Only 16 bytes remain in that block, but 17 parity bytes will be
needed. The code reads out-of-bounds from the parity block buffer.
Fortunately this doesn't normally happen, since it can occur only for
certain non-default values of fec_roots *and* when the maximum number of
buffers couldn't be allocated due to low memory. For example with
block_size=4096 only the following cases are affected:
fec_roots=17: nbufs in [1, 3, 5, 15]
fec_roots=19: nbufs in [1, 229]
fec_roots=21: nbufs in [1, 3, 5, 13, 15, 39, 65, 195]
fec_roots=23: nbufs in [1, 89]
Regardless, fix it by refactoring how the parity blocks are read.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix double free in create_space_info() error path
When kobject_init_and_add() fails, the call chain is:
create_space_info()
-> btrfs_sysfs_add_space_info_type()
-> kobject_init_and_add()
-> failure
-> kobject_put(&space_info->kobj)
-> space_info_release()
-> kfree(space_info)
Then control returns to create_space_info():
btrfs_sysfs_add_space_info_type() returns error
-> goto out_free
-> kfree(space_info)
This causes a double free.
Keep the direct kfree(space_info) for the earlier failure path, but
after btrfs_sysfs_add_space_info_type() has called kobject_put(), let
the kobject release callback handle the cleanup.
In the Linux kernel, the following vulnerability has been resolved:
ipmi: Check event message buffer response for bad data
The event message buffer response data size got checked later when
processing, but check it right after the response comes back. It
appears some BMCs may return an empty message instead of an error
when fetching events.
There are apparently some new BMCs that make this error, so we need to
compensate.
In the Linux kernel, the following vulnerability has been resolved:
RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp()
Sashiko points out that pd->uctx isn't initialized until late in the
function so all these error flow references are NULL and will crash. Use
the uctx that isn't NULL.
In the Linux kernel, the following vulnerability has been resolved:
RDMA/mana: Fix mana_destroy_wq_obj() cleanup in mana_ib_create_qp_rss()
Sashiko points out there are two bugs here in the error unwind flow, both
related to how the WQ table is unwound.
First there is a double i-- on the first failure path due to the while loop
having a i--, remove it.
Second if mana_ib_install_cq_cb() fails then mana_create_wq_obj() is not
undone due to the above i--.