In the Linux kernel, the following vulnerability has been resolved:
x86/kexec: Disable KCOV instrumentation after load_segments()
The load_segments() function changes segment registers, invalidating GS base
(which KCOV relies on for per-cpu data). When CONFIG_KCOV is enabled, any
subsequent instrumented C code call (e.g. native_gdt_invalidate()) begins
crashing the kernel in an endless loop.
To reproduce the problem, it's sufficient to do kexec on a KCOV-instrumented
kernel:
$ kexec -l /boot/otherKernel
$ kexec -e
The real-world context for this problem is enabling crash dump collection in
syzkaller. For this, the tool loads a panic kernel before fuzzing and then
calls makedumpfile after the panic. This workflow requires both CONFIG_KEXEC
and CONFIG_KCOV to be enabled simultaneously.
Adding safeguards directly to the KCOV fast-path (__sanitizer_cov_trace_pc())
is also undesirable as it would introduce an extra performance overhead.
Disabling instrumentation for the individual functions would be too fragile,
so disable KCOV instrumentation for the entire machine_kexec_64.c and
physaddr.c. If coverage-guided fuzzing ever needs these components in the
future, other approaches should be considered.
The problem is not relevant for 32 bit kernels as CONFIG_KCOV is not supported
there.
[ bp: Space out comment for better readability. ]
In the Linux kernel, the following vulnerability has been resolved:
crypto: caam - fix overflow on long hmac keys
When a key longer than block size is supplied, it is copied and then
hashed into the real key. The memory allocated for the copy needs to
be rounded to DMA cache alignment, as otherwise the hashed key may
corrupt neighbouring memory.
The copying is performed using kmemdup, however this leads to an overflow:
reading more bytes (aligned_len - keylen) from the keylen source buffer.
Fix this by replacing kmemdup with kmalloc, followed by memcpy.
In the Linux kernel, the following vulnerability has been resolved:
netfilter: flowtable: strictly check for maximum number of actions
The maximum number of flowtable hardware offload actions in IPv6 is:
* ethernet mangling (4 payload actions, 2 for each ethernet address)
* SNAT (4 payload actions)
* DNAT (4 payload actions)
* Double VLAN (4 vlan actions, 2 for popping vlan, and 2 for pushing)
for QinQ.
* Redirect (1 action)
Which makes 17, while the maximum is 16. But act_ct supports for tunnels
actions too. Note that payload action operates at 32-bit word level, so
mangling an IPv6 address takes 4 payload actions.
Update flow_action_entry_next() calls to check for the maximum number of
supported actions.
While at it, rise the maximum number of actions per flow from 16 to 24
so this works fine with IPv6 setups.
In the Linux kernel, the following vulnerability has been resolved:
cpufreq: governor: fix double free in cpufreq_dbs_governor_init() error path
When kobject_init_and_add() fails, cpufreq_dbs_governor_init() calls
kobject_put(&dbs_data->attr_set.kobj).
The kobject release callback cpufreq_dbs_data_release() calls
gov->exit(dbs_data) and kfree(dbs_data), but the current error path
then calls gov->exit(dbs_data) and kfree(dbs_data) again, causing a
double free.
Keep the direct kfree(dbs_data) for the gov->init() failure path, but
after kobject_init_and_add() has been called, let kobject_put() handle
the cleanup through cpufreq_dbs_data_release().
In the Linux kernel, the following vulnerability has been resolved:
USB: dummy-hcd: Fix locking/synchronization error
Syzbot testing was able to provoke an addressing exception and crash
in the usb_gadget_udc_reset() routine in
drivers/usb/gadgets/udc/core.c, resulting from the fact that the
routine was called with a second ("driver") argument of NULL. The bad
caller was set_link_state() in dummy_hcd.c, and the problem arose
because of a race between a USB reset and driver unbind.
These sorts of races were not supposed to be possible; commit
7dbd8f4cabd9 ("USB: dummy-hcd: Fix erroneous synchronization change"),
along with a few followup commits, was written specifically to prevent
them. As it turns out, there are (at least) two errors remaining in
the code. Another patch will address the second error; this one is
concerned with the first.
The error responsible for the syzbot crash occurred because the
stop_activity() routine will sometimes drop and then re-acquire the
dum->lock spinlock. A call to stop_activity() occurs in
set_link_state() when handling an emulated USB reset, after the test
of dum->ints_enabled and before the increment of dum->callback_usage.
This allowed another thread (doing a driver unbind) to sneak in and
grab the spinlock, and then clear dum->ints_enabled and dum->driver.
Normally this other thread would have to wait for dum->callback_usage
to go down to 0 before it would clear dum->driver, but in this case it
didn't have to wait since dum->callback_usage had not yet been
incremented.
The fix is to increment dum->callback_usage _before_ calling
stop_activity() instead of after. Then the thread doing the unbind
will not clear dum->driver until after the call to
usb_gadget_udc_reset() safely returns and dum->callback_usage has been
decremented again.
In the Linux kernel, the following vulnerability has been resolved:
sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback
SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() using
smp_cond_load_acquire() until the target CPU's kick_sync advances. Because
the irq_work runs in hardirq context, the waiting CPU cannot reschedule and
its own kick_sync never advances. If multiple CPUs form a wait cycle, all
CPUs deadlock.
Replace the busy-wait in kick_cpus_irq_workfn() with resched_curr() to
force the CPU through do_pick_task_scx(), which queues a balance callback
to perform the wait. The balance callback drops the rq lock and enables
IRQs following the sched_core_balance() pattern, so the CPU can process
IPIs while waiting. The local CPU's kick_sync is advanced on entry to
do_pick_task_scx() and continuously during the wait, ensuring any CPU that
starts waiting for us sees the advancement and cannot form cyclic
dependencies.
In the Linux kernel, the following vulnerability has been resolved:
wifi: iwlwifi: mvm: don't send a 6E related command when not supported
MCC_ALLOWED_AP_TYPE_CMD is related to 6E support. Do not send it if the
device doesn't support 6E.
Apparently, the firmware is mistakenly advertising support for this
command even on AX201 which does not support 6E and then the firmware
crashes.
In the Linux kernel, the following vulnerability has been resolved:
USB: dummy-hcd: Fix interrupt synchronization error
This fixes an error in synchronization in the dummy-hcd driver. The
error has a somewhat involved history. The synchronization mechanism
was introduced by commit 7dbd8f4cabd9 ("USB: dummy-hcd: Fix erroneous
synchronization change"), which added an emulated "interrupts enabled"
flag together with code emulating synchronize_irq() (it waits until
all current handler callbacks have returned).
But the emulated interrupt-disable occurred too late, after the driver
containing the handler callback routines had been told that it was
unbound and no more callbacks would occur. Commit 4a5d797a9f9c ("usb:
gadget: dummy_hcd: fix gpf in gadget_setup") tried to fix this by
moving the synchronize_irq() emulation code from dummy_stop() to
dummy_pullup(), which runs before the unbind callback.
There still were races, though, because the emulated interrupt-disable
still occurred too late. It couldn't be moved to dummy_pullup(),
because that routine can be called for reasons other than an impending
unbind. Therefore commits 7dc0c55e9f30 ("USB: UDC core: Add
udc_async_callbacks gadget op") and 04145a03db9d ("USB: UDC: Implement
udc_async_callbacks in dummy-hcd") added an API allowing the UDC core
to tell dummy-hcd exactly when emulated interrupts and their callbacks
should be disabled.
That brings us to the current state of things, which is still wrong
because the emulated synchronize_irq() occurs before the emulated
interrupt-disable! That's no good, beause it means that more emulated
interrupts can occur after the synchronize_irq() emulation has run,
leading to the possibility that a callback handler may be running when
the gadget driver is unbound.
To fix this, we have to move the synchronize_irq() emulation code yet
again, to the dummy_udc_async_callbacks() routine, which takes care of
enabling and disabling emulated interrupt requests. The
synchronization will now run immediately after emulated interrupts are
disabled, which is where it belongs.
In the Linux kernel, the following vulnerability has been resolved:
sched/fair: Fix zero_vruntime tracking fix
John reported that stress-ng-yield could make his machine unhappy and
managed to bisect it to commit b3d99f43c72b ("sched/fair: Fix
zero_vruntime tracking").
The combination of yield and that commit was specific enough to
hypothesize the following scenario:
Suppose we have 2 runnable tasks, both doing yield. Then one will be
eligible and one will not be, because the average position must be in
between these two entities.
Therefore, the runnable task will be eligible, and be promoted a full
slice (all the tasks do is yield after all). This causes it to jump over
the other task and now the other task is eligible and current is no
longer. So we schedule.
Since we are runnable, there is no {de,en}queue. All we have is the
__{en,de}queue_entity() from {put_prev,set_next}_task(). But per the
fingered commit, those two no longer move zero_vruntime.
All that moves zero_vruntime are tick and full {de,en}queue.
This means, that if the two tasks playing leapfrog can reach the
critical speed to reach the overflow point inside one tick's worth of
time, we're up a creek.
Additionally, when multiple cgroups are involved, there is no guarantee
the tick will in fact hit every cgroup in a timely manner. Statistically
speaking it will, but that same statistics does not rule out the
possibility of one cgroup not getting a tick for a significant amount of
time -- however unlikely.
Therefore, just like with the yield() case, force an update at the end
of every slice. This ensures the update is never more than a single
slice behind and the whole thing is within 2 lag bounds as per the
comment on entity_key().
In the Linux kernel, the following vulnerability has been resolved:
bpf: Properly mark live registers for indirect jumps
For a `gotox rX` instruction the rX register should be marked as used
in the compute_insn_live_regs() function. Fix this.
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Fix dsc eDP issue
[why]
Need to add function hook check before use
In the Linux kernel, the following vulnerability has been resolved:
spi: spidev: fix lock inversion between spi_lock and buf_lock
The spidev driver previously used two mutexes, spi_lock and buf_lock,
but acquired them in different orders depending on the code path:
write()/read(): buf_lock -> spi_lock
ioctl(): spi_lock -> buf_lock
This AB-BA locking pattern triggers lockdep warnings and can
cause real deadlocks:
WARNING: possible circular locking dependency detected
spidev_ioctl() -> mutex_lock(&spidev->buf_lock)
spidev_sync_write() -> mutex_lock(&spidev->spi_lock)
*** DEADLOCK ***
The issue is reproducible with a simple userspace program that
performs write() and SPI_IOC_WR_MAX_SPEED_HZ ioctl() calls from
separate threads on the same spidev file descriptor.
Fix this by simplifying the locking model and removing the lock
inversion entirely. spidev_sync() no longer performs any locking,
and all callers serialize access using spi_lock.
buf_lock is removed since its functionality is fully covered by
spi_lock, eliminating the possibility of lock ordering issues.
This removes the lock inversion and prevents deadlocks without
changing userspace ABI or behaviour.
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix sync handling in amdgpu_dma_buf_move_notify
Invalidating a dmabuf will impact other users of the shared BO.
In the scenario where process A moves the BO, it needs to inform
process B about the move and process B will need to update its
page table.
The commit fixes a synchronisation bug caused by the use of the
ticket: it made amdgpu_vm_handle_moved behave as if updating
the page table immediately was correct but in this case it's not.
An example is the following scenario, with 2 GPUs and glxgears
running on GPU0 and Xorg running on GPU1, on a system where P2P
PCI isn't supported:
glxgears:
export linear buffer from GPU0 and import using GPU1
submit frame rendering to GPU0
submit tiled->linear blit
Xorg:
copy of linear buffer
The sequence of jobs would be:
drm_sched_job_run # GPU0, frame rendering
drm_sched_job_queue # GPU0, blit
drm_sched_job_done # GPU0, frame rendering
drm_sched_job_run # GPU0, blit
move linear buffer for GPU1 access #
amdgpu_dma_buf_move_notify -> update pt # GPU0
It this point the blit job on GPU0 is still running and would
likely produce a page fault.
In the Linux kernel, the following vulnerability has been resolved:
most: core: fix leak on early registration failure
A recent commit fixed a resource leak on early registration failures but
for some reason left out the first error path which still leaks the
resources associated with the interface.
Fix up also the first error path so that the interface is always
released on errors.
In the Linux kernel, the following vulnerability has been resolved:
media: solo6x10: Check for out of bounds chip_id
Clang with CONFIG_UBSAN_SHIFT=y noticed a condition where a signed type
(literal "1" is an "int") could end up being shifted beyond 32 bits,
so instrumentation was added (and due to the double is_tw286x() call
seen via inlining), Clang decides the second one must now be undefined
behavior and elides the rest of the function[1]. This is a known problem
with Clang (that is still being worked on), but we can avoid the entire
problem by actually checking the existing max chip ID, and now there is
no runtime instrumentation added at all since everything is known to be
within bounds.
Additionally use an unsigned value for the shift to remove the
instrumentation even without the explicit bounds checking.
[hverkuil: fix checkpatch warning for is_tw286x]
In the Linux kernel, the following vulnerability has been resolved:
KVM: nSVM: Remove a user-triggerable WARN on nested_svm_load_cr3() succeeding
Drop the WARN in svm_set_nested_state() on nested_svm_load_cr3() failing
as it is trivially easy to trigger from userspace by modifying CPUID after
loading CR3. E.g. modifying the state restoration selftest like so:
--- tools/testing/selftests/kvm/x86/state_test.c
+++ tools/testing/selftests/kvm/x86/state_test.c
@@ -280,7 +280,16 @@ int main(int argc, char *argv[])
/* Restore state in a new VM. */
vcpu = vm_recreate_with_one_vcpu(vm);
- vcpu_load_state(vcpu, state);
+
+ if (stage == 4) {
+ state->sregs.cr3 = BIT(44);
+ vcpu_load_state(vcpu, state);
+
+ vcpu_set_cpuid_property(vcpu, X86_PROPERTY_MAX_PHY_ADDR, 36);
+ __vcpu_nested_state_set(vcpu, &state->nested);
+ } else {
+ vcpu_load_state(vcpu, state);
+ }
/*
* Restore XSAVE state in a dummy vCPU, first without doing
generates:
WARNING: CPU: 30 PID: 938 at arch/x86/kvm/svm/nested.c:1877 svm_set_nested_state+0x34a/0x360 [kvm_amd]
Modules linked in: kvm_amd kvm irqbypass [last unloaded: kvm]
CPU: 30 UID: 1000 PID: 938 Comm: state_test Tainted: G W 6.18.0-rc7-58e10b63777d-next-vm
Tainted: [W]=WARN
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015
RIP: 0010:svm_set_nested_state+0x34a/0x360 [kvm_amd]
Call Trace:
<TASK>
kvm_arch_vcpu_ioctl+0xf33/0x1700 [kvm]
kvm_vcpu_ioctl+0x4e6/0x8f0 [kvm]
__x64_sys_ioctl+0x8f/0xd0
do_syscall_64+0x61/0xad0
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Simply delete the WARN instead of trying to prevent userspace from shoving
"illegal" state into CR3. For better or worse, KVM's ABI allows userspace
to set CPUID after SREGS, and vice versa, and KVM is very permissive when
it comes to guest CPUID. I.e. attempting to enforce the virtual CPU model
when setting CPUID could break userspace. Given that the WARN doesn't
provide any meaningful protection for KVM or benefit for userspace, simply
drop it even though the odds of breaking userspace are minuscule.
Opportunistically delete a spurious newline.
In the Linux kernel, the following vulnerability has been resolved:
dm: remove fake timeout to avoid leak request
Since commit 15f73f5b3e59 ("blk-mq: move failure injection out of
blk_mq_complete_request"), drivers are responsible for calling
blk_should_fake_timeout() at appropriate code paths and opportunities.
However, the dm driver does not implement its own timeout handler and
relies on the timeout handling of its slave devices.
If an io-timeout-fail error is injected to a dm device, the request
will be leaked and never completed, causing tasks to hang indefinitely.
Reproduce:
1. prepare dm which has iscsi slave device
2. inject io-timeout-fail to dm
echo 1 >/sys/class/block/dm-0/io-timeout-fail
echo 100 >/sys/kernel/debug/fail_io_timeout/probability
echo 10 >/sys/kernel/debug/fail_io_timeout/times
3. read/write dm
4. iscsiadm -m node -u
Result: hang task like below
[ 862.243768] INFO: task kworker/u514:2:151 blocked for more than 122 seconds.
[ 862.244133] Tainted: G E 6.19.0-rc1+ #51
[ 862.244337] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 862.244718] task:kworker/u514:2 state:D stack:0 pid:151 tgid:151 ppid:2 task_flags:0x4288060 flags:0x00080000
[ 862.245024] Workqueue: iscsi_ctrl_3:1 __iscsi_unbind_session [scsi_transport_iscsi]
[ 862.245264] Call Trace:
[ 862.245587] <TASK>
[ 862.245814] __schedule+0x810/0x15c0
[ 862.246557] schedule+0x69/0x180
[ 862.246760] blk_mq_freeze_queue_wait+0xde/0x120
[ 862.247688] elevator_change+0x16d/0x460
[ 862.247893] elevator_set_none+0x87/0xf0
[ 862.248798] blk_unregister_queue+0x12e/0x2a0
[ 862.248995] __del_gendisk+0x231/0x7e0
[ 862.250143] del_gendisk+0x12f/0x1d0
[ 862.250339] sd_remove+0x85/0x130 [sd_mod]
[ 862.250650] device_release_driver_internal+0x36d/0x530
[ 862.250849] bus_remove_device+0x1dd/0x3f0
[ 862.251042] device_del+0x38a/0x930
[ 862.252095] __scsi_remove_device+0x293/0x360
[ 862.252291] scsi_remove_target+0x486/0x760
[ 862.252654] __iscsi_unbind_session+0x18a/0x3e0 [scsi_transport_iscsi]
[ 862.252886] process_one_work+0x633/0xe50
[ 862.253101] worker_thread+0x6df/0xf10
[ 862.253647] kthread+0x36d/0x720
[ 862.254533] ret_from_fork+0x2a6/0x470
[ 862.255852] ret_from_fork_asm+0x1a/0x30
[ 862.256037] </TASK>
Remove the blk_should_fake_timeout() check from dm, as dm has no
native timeout handling and should not attempt to fake timeouts.
In the Linux kernel, the following vulnerability has been resolved:
ACPI: processor: Fix NULL-pointer dereference in acpi_processor_errata_piix4()
In acpi_processor_errata_piix4(), the pointer dev is first assigned an IDE
device and then reassigned an ISA device:
dev = pci_get_subsys(..., PCI_DEVICE_ID_INTEL_82371AB, ...);
dev = pci_get_subsys(..., PCI_DEVICE_ID_INTEL_82371AB_0, ...);
If the first lookup succeeds but the second fails, dev becomes NULL. This
leads to a potential null-pointer dereference when dev_dbg() is called:
if (errata.piix4.bmisx)
dev_dbg(&dev->dev, ...);
To prevent this, use two temporary pointers and retrieve each device
independently, avoiding overwriting dev with a possible NULL value.
[ rjw: Subject adjustment, added an empty code line ]
In the Linux kernel, the following vulnerability has been resolved:
media: i2c: ov5647: Initialize subdev before controls
In ov5647_init_controls() we call v4l2_get_subdevdata, but it is
initialized by v4l2_i2c_subdev_init() in the probe, which currently
happens after init_controls(). This can result in a segfault if the
error condition is hit, and we try to access i2c_client, so fix the
order.
In the Linux kernel, the following vulnerability has been resolved:
soc/tegra: pmc: Fix unsafe generic_handle_irq() call
Currently, when resuming from system suspend on Tegra platforms,
the following warning is observed:
WARNING: CPU: 0 PID: 14459 at kernel/irq/irqdesc.c:666
Call trace:
handle_irq_desc+0x20/0x58 (P)
tegra186_pmc_wake_syscore_resume+0xe4/0x15c
syscore_resume+0x3c/0xb8
suspend_devices_and_enter+0x510/0x540
pm_suspend+0x16c/0x1d8
The warning occurs because generic_handle_irq() is being called from
a non-interrupt context which is considered as unsafe.
Fix this warning by deferring generic_handle_irq() call to an IRQ work
which gets executed in hard IRQ context where generic_handle_irq()
can be called safely.
When PREEMPT_RT kernels are used, regular IRQ work (initialized with
init_irq_work) is deferred to run in per-CPU kthreads in preemptible
context rather than hard IRQ context. Hence, use the IRQ_WORK_INIT_HARD
variant so that with PREEMPT_RT kernels, the IRQ work is processed in
hardirq context instead of being deferred to a thread which is required
for calling generic_handle_irq().
On non-PREEMPT_RT kernels, both init_irq_work() and IRQ_WORK_INIT_HARD()
execute in IRQ context, so this change has no functional impact for
standard kernel configurations.
[treding@nvidia.com: miscellaneous cleanups]
In the Linux kernel, the following vulnerability has been resolved:
media: verisilicon: Avoid G2 bus error while decoding H.264 and HEVC
For the i.MX8MQ platform, there is a hardware limitation: the g1 VPU and
g2 VPU cannot decode simultaneously; otherwise, it will cause below bus
error and produce corrupted pictures, even potentially lead to system hang.
[ 110.527986] hantro-vpu 38310000.video-codec: frame decode timed out.
[ 110.583517] hantro-vpu 38310000.video-codec: bus error detected.
Therefore, it is necessary to ensure that g1 and g2 operate alternately.
This allows for successful multi-instance decoding of H.264 and HEVC.
To achieve this, g1 and g2 share the same v4l2_m2m_dev, and then the
v4l2_m2m_dev can handle the scheduling.
In the Linux kernel, the following vulnerability has been resolved:
md raid: fix hang when stopping arrays with metadata through dm-raid
When using device-mapper's dm-raid target, stopping a RAID array can cause
the system to hang under specific conditions.
This occurs when:
- A dm-raid managed device tree is suspended from top to bottom
(the top-level RAID device is suspended first, followed by its
underlying metadata and data devices)
- The top-level RAID device is then removed
Removing the top-level device triggers a hang in the following sequence:
the dm-raid destructor calls md_stop(), which tries to flush the
write-intent bitmap by writing to the metadata sub-devices. However, these
devices are already suspended, making them unable to complete the write-intent
operations and causing an indefinite block.
Fix:
- Prevent bitmap flushing when md_stop() is called from dm-raid
destructor context
and avoid a quiescing/unquescing cycle which could also cause I/O
- Still allow write-intent bitmap flushing when called from dm-raid
suspend context
This ensures that RAID array teardown can complete successfully even when the
underlying devices are in a suspended state.
This second patch uses md_is_rdwr() to distinguish between suspend and
destructor paths as elaborated on above.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: don't BUG() on unexpected delayed ref type in run_one_delayed_ref()
There is no need to BUG(), we can just return an error and log an error
message.
In the Linux kernel, the following vulnerability has been resolved:
iio: accel: adxl380: Avoid reading more entries than present in FIFO
The interrupt handler reads FIFO entries in batches of N samples, where N
is the number of scan elements that have been enabled. However, the sensor
fills the FIFO one sample at a time, even when more than one channel is
enabled. Therefore,the number of entries reported by the FIFO status
registers may not be a multiple of N; if this number is not a multiple, the
number of entries read from the FIFO may exceed the number of entries
actually present.
To fix the above issue, round down the number of FIFO entries read from the
status registers so that it is always a multiple of N.
In the Linux kernel, the following vulnerability has been resolved:
bpf: crypto: Use the correct destructor kfunc type
With CONFIG_CFI enabled, the kernel strictly enforces that indirect
function calls use a function pointer type that matches the target
function. I ran into the following type mismatch when running BPF
self-tests:
CFI failure at bpf_obj_free_fields+0x190/0x238 (target:
bpf_crypto_ctx_release+0x0/0x94; expected type: 0xa488ebfc)
Internal error: Oops - CFI: 00000000f2008228 [#1] SMP
...
As bpf_crypto_ctx_release() is also used in BPF programs and using
a void pointer as the argument would make the verifier unhappy, add
a simple stub function with the correct type and register it as the
destructor kfunc instead.
In the Linux kernel, the following vulnerability has been resolved:
drm/amd/display: Fix mismatched unlock for DMUB HW lock in HWSS fast path
[Why]
The evaluation for whether we need to use the DMUB HW lock isn't the
same as whether we need to unlock which results in a hang when the
fast path is used for ASIC without FAMS support.
[How]
Store a flag that indicates whether we should use the lock and use
that same flag to specify whether unlocking is needed.
In the Linux kernel, the following vulnerability has been resolved:
libceph: define and enforce CEPH_MAX_KEY_LEN
When decoding the key, verify that the key material would fit into
a fixed-size buffer in process_auth_done() and generally has a sane
length.
The new CEPH_MAX_KEY_LEN check replaces the existing check for a key
with no key material which is a) not universal since CEPH_CRYPTO_NONE
has to be excluded and b) doesn't provide much value since a smaller
than needed key is just as invalid as no key -- this has to be handled
elsewhere anyway.
In the Linux kernel, the following vulnerability has been resolved:
mm/page_alloc: clear page->private in free_pages_prepare()
Several subsystems (slub, shmem, ttm, etc.) use page->private but don't
clear it before freeing pages. When these pages are later allocated as
high-order pages and split via split_page(), tail pages retain stale
page->private values.
This causes a use-after-free in the swap subsystem. The swap code uses
page->private to track swap count continuations, assuming freshly
allocated pages have page->private == 0. When stale values are present,
swap_count_continued() incorrectly assumes the continuation list is valid
and iterates over uninitialized page->lru containing LIST_POISON values,
causing a crash:
KASAN: maybe wild-memory-access in range [0xdead000000000100-0xdead000000000107]
RIP: 0010:__do_sys_swapoff+0x1151/0x1860
Fix this by clearing page->private in free_pages_prepare(), ensuring all
freed pages have clean state regardless of previous use.
In the Linux kernel, the following vulnerability has been resolved:
media: chips-media: wave5: Fix PM runtime usage count underflow
Replace pm_runtime_put_sync() with pm_runtime_dont_use_autosuspend() in
the remove path to properly pair with pm_runtime_use_autosuspend() from
probe. This allows pm_runtime_disable() to handle reference count cleanup
correctly regardless of current suspend state.
The driver calls pm_runtime_put_sync() unconditionally in remove, but the
device may already be suspended due to autosuspend configured in probe.
When autosuspend has already suspended the device, the usage count is 0,
and pm_runtime_put_sync() decrements it to -1.
This causes the following warning on module unload:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 963 at kernel/kthread.c:1430
kthread_destroy_worker+0x84/0x98
...
vdec 30210000.video-codec: Runtime PM usage count underflow!
In the Linux kernel, the following vulnerability has been resolved:
drm/panel: Fix a possible null-pointer dereference in jdi_panel_dsi_remove()
In jdi_panel_dsi_remove(), jdi is explicitly checked, indicating that it
may be NULL:
if (!jdi)
mipi_dsi_detach(dsi);
However, when jdi is NULL, the function does not return and continues by
calling jdi_panel_disable():
err = jdi_panel_disable(&jdi->base);
Inside jdi_panel_disable(), jdi is dereferenced unconditionally, which can
lead to a NULL-pointer dereference:
struct jdi_panel *jdi = to_panel_jdi(panel);
backlight_disable(jdi->backlight);
To prevent such a potential NULL-pointer dereference, return early from
jdi_panel_dsi_remove() when jdi is NULL.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: do not ASSERT() when the fs flips RO inside btrfs_repair_io_failure()
[BUG]
There is a bug report that when btrfs hits ENOSPC error in a critical
path, btrfs flips RO (this part is expected, although the ENOSPC bug
still needs to be addressed).
The problem is after the RO flip, if there is a read repair pending, we
can hit the ASSERT() inside btrfs_repair_io_failure() like the following:
BTRFS info (device vdc): relocating block group 30408704 flags metadata|raid1
------------[ cut here ]------------
BTRFS: Transaction aborted (error -28)
WARNING: fs/btrfs/extent-tree.c:3235 at __btrfs_free_extent.isra.0+0x453/0xfd0, CPU#1: btrfs/383844
Modules linked in: kvm_intel kvm irqbypass
[...]
---[ end trace 0000000000000000 ]---
BTRFS info (device vdc state EA): 2 enospc errors during balance
BTRFS info (device vdc state EA): balance: ended with status: -30
BTRFS error (device vdc state EA): parent transid verify failed on logical 30556160 mirror 2 wanted 8 found 6
BTRFS error (device vdc state EA): bdev /dev/nvme0n1 errs: wr 0, rd 0, flush 0, corrupt 10, gen 0
[...]
assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
------------[ cut here ]------------
assertion failed: !(fs_info->sb->s_flags & SB_RDONLY) :: 0, in fs/btrfs/bio.c:938
kernel BUG at fs/btrfs/bio.c:938!
Oops: invalid opcode: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 868 Comm: kworker/u8:13 Tainted: G W N 6.19.0-rc6+ #4788 PREEMPT(full)
Tainted: [W]=WARN, [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.17.0-0-gb52ca86e094d-prebuilt.qemu.org 04/01/2014
Workqueue: btrfs-endio simple_end_io_work
RIP: 0010:btrfs_repair_io_failure.cold+0xb2/0x120
RSP: 0000:ffffc90001d2bcf0 EFLAGS: 00010246
RAX: 0000000000000051 RBX: 0000000000001000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8305cf42 RDI: 00000000ffffffff
RBP: 0000000000000002 R08: 00000000fffeffff R09: ffffffff837fa988
R10: ffffffff8327a9e0 R11: 6f69747265737361 R12: ffff88813018d310
R13: ffff888168b8a000 R14: ffffc90001d2bd90 R15: ffff88810a169000
FS: 0000000000000000(0000) GS:ffff8885e752c000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
------------[ cut here ]------------
[CAUSE]
The cause of -ENOSPC error during the test case btrfs/124 is still
unknown, although it's known that we still have cases where metadata can
be over-committed but can not be fulfilled correctly, thus if we hit
such ENOSPC error inside a critical path, we have no choice but abort
the current transaction.
This will mark the fs read-only.
The problem is inside the btrfs_repair_io_failure() path that we require
the fs not to be mount read-only. This is normally fine, but if we are
doing a read-repair meanwhile the fs flips RO due to a critical error,
we can enter btrfs_repair_io_failure() with super block set to
read-only, thus triggering the above crash.
[FIX]
Just replace the ASSERT() with a proper return if the fs is already
read-only.
In the Linux kernel, the following vulnerability has been resolved:
media: rockchip: rga: Fix possible ERR_PTR dereference in rga_buf_init()
rga_get_frame() can return ERR_PTR(-EINVAL) when buffer type is
unsupported or invalid. rga_buf_init() does not check the return value
and unconditionally dereferences the pointer when accessing f->size.
Add proper ERR_PTR checking and return the error to prevent
dereferencing an invalid pointer.
In the Linux kernel, the following vulnerability has been resolved:
octeontx2-af: Workaround SQM/PSE stalls by disabling sticky
NIX SQ manager sticky mode is known to cause stalls when multiple SQs
share an SMQ and transmit concurrently. Additionally, PSE may deadlock
on transitions between sticky and non-sticky transmissions. There is
also a credit drop issue observed when certain condition clocks are
gated.
work around these hardware errata by:
- Disabling SQM sticky operation:
- Clear TM6 (bit 15)
- Clear TM11 (bit 14)
- Disabling sticky → non-sticky transition path that can deadlock PSE:
- Clear TM5 (bit 23)
- Preventing credit drops by keeping the control-flow clock enabled:
- Set TM9 (bit 21)
These changes are applied via NIX_AF_SQM_DBG_CTL_STATUS. With this
configuration the SQM/PSE maintain forward progress under load without
credit loss, at the cost of disabling sticky optimizations.
In the Linux kernel, the following vulnerability has been resolved:
rapidio: replace rio_free_net() with kfree() in rio_scan_alloc_net()
When idtab allocation fails, net is not registered with rio_add_net() yet,
so kfree(net) is sufficient to release the memory. Set mport->net to NULL
to avoid dangling pointer.
In the Linux kernel, the following vulnerability has been resolved:
drm: renesas: rz-du: mipi_dsi: fix kernel panic when rebooting for some panels
Since commit 56de5e305d4b ("clk: renesas: r9a07g044: Add MSTOP for RZ/G2L")
we may get the following kernel panic, for some panels, when rebooting:
systemd-shutdown[1]: Rebooting.
Call trace:
...
do_serror+0x28/0x68
el1h_64_error_handler+0x34/0x50
el1h_64_error+0x6c/0x70
rzg2l_mipi_dsi_host_transfer+0x114/0x458 (P)
mipi_dsi_device_transfer+0x44/0x58
mipi_dsi_dcs_set_display_off_multi+0x9c/0xc4
ili9881c_unprepare+0x38/0x88
drm_panel_unprepare+0xbc/0x108
This happens for panels that need to send MIPI-DSI commands in their
unprepare() callback. Since the MIPI-DSI interface is stopped at that
point, rzg2l_mipi_dsi_host_transfer() triggers the kernel panic.
Fix by moving rzg2l_mipi_dsi_stop() to new callback function
rzg2l_mipi_dsi_atomic_post_disable().
With this change we now have the correct power-down/stop sequence:
systemd-shutdown[1]: Rebooting.
rzg2l-mipi-dsi 10850000.dsi: rzg2l_mipi_dsi_atomic_disable(): entry
ili9881c-dsi 10850000.dsi.0: ili9881c_unprepare(): entry
rzg2l-mipi-dsi 10850000.dsi: rzg2l_mipi_dsi_atomic_post_disable(): entry
reboot: Restarting system
In the Linux kernel, the following vulnerability has been resolved:
media: chips-media: wave5: Fix kthread worker destruction in polling mode
Fix the cleanup order in polling mode (irq < 0) to prevent kernel warnings
during module removal. Cancel the hrtimer before destroying the kthread
worker to ensure work queues are empty.
In polling mode, the driver uses hrtimer to periodically trigger
wave5_vpu_timer_callback() which queues work via kthread_queue_work().
The kthread_destroy_worker() function validates that both work queues
are empty with WARN_ON(!list_empty(&worker->work_list)) and
WARN_ON(!list_empty(&worker->delayed_work_list)).
The original code called kthread_destroy_worker() before hrtimer_cancel(),
creating a race condition where the timer could fire during worker
destruction and queue new work, triggering the WARN_ON.
This causes the following warning on every module unload in polling mode:
------------[ cut here ]------------
WARNING: CPU: 2 PID: 1034 at kernel/kthread.c:1430
kthread_destroy_worker+0x84/0x98
Modules linked in: wave5(-) rpmsg_ctrl rpmsg_char ...
Call trace:
kthread_destroy_worker+0x84/0x98
wave5_vpu_remove+0xc8/0xe0 [wave5]
platform_remove+0x30/0x58
...
---[ end trace 0000000000000000 ]---
In the Linux kernel, the following vulnerability has been resolved:
mm/vmalloc: prevent RCU stalls in kasan_release_vmalloc_node
When CONFIG_PAGE_OWNER is enabled, freeing KASAN shadow pages during
vmalloc cleanup triggers expensive stack unwinding that acquires RCU read
locks. Processing a large purge_list without rescheduling can cause the
task to hold CPU for extended periods (10+ seconds), leading to RCU stalls
and potential OOM conditions.
The issue manifests in purge_vmap_node() -> kasan_release_vmalloc_node()
where iterating through hundreds or thousands of vmap_area entries and
freeing their associated shadow pages causes:
rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
rcu: Tasks blocked on level-0 rcu_node (CPUs 0-1): P6229/1:b..l
...
task:kworker/0:17 state:R running task stack:28840 pid:6229
...
kasan_release_vmalloc_node+0x1ba/0xad0 mm/vmalloc.c:2299
purge_vmap_node+0x1ba/0xad0 mm/vmalloc.c:2299
Each call to kasan_release_vmalloc() can free many pages, and with
page_owner tracking, each free triggers save_stack() which performs stack
unwinding under RCU read lock. Without yielding, this creates an
unbounded RCU critical section.
Add periodic cond_resched() calls within the loop to allow:
- RCU grace periods to complete
- Other tasks to run
- Scheduler to preempt when needed
The fix uses need_resched() for immediate response under load, with a
batch count of 32 as a guaranteed upper bound to prevent worst-case stalls
even under light load.
In the Linux kernel, the following vulnerability has been resolved:
net: nfc: nci: Fix parameter validation for packet data
Since commit 9c328f54741b ("net: nfc: nci: Add parameter validation for
packet data") communication with nci nfc chips is not working any more.
The mentioned commit tries to fix access of uninitialized data, but
failed to understand that in some cases the data packet is of variable
length and can therefore not be compared to the maximum packet length
given by the sizeof(struct).
In the Linux kernel, the following vulnerability has been resolved:
media: uvcvideo: Return queued buffers on start_streaming() failure
Return buffers if streaming fails to start due to uvc_pm_get() error.
This bug may be responsible for a warning I got running
while :; do yavta -c3 /dev/video0; done
on an xHCI controller which failed under this workload.
I had no luck reproducing this warning again to confirm.
xhci_hcd 0000:09:00.0: HC died; cleaning up
usb 13-2: USB disconnect, device number 2
WARNING: CPU: 2 PID: 29386 at drivers/media/common/videobuf2/videobuf2-core.c:1803 vb2_start_streaming+0xac/0x120
In the Linux kernel, the following vulnerability has been resolved:
kexec: derive purgatory entry from symbol
kexec_load_purgatory() derives image->start by locating e_entry inside an
SHF_EXECINSTR section. If the purgatory object contains multiple
executable sections with overlapping sh_addr, the entrypoint check can
match more than once and trigger a WARN.
Derive the entry section from the purgatory_start symbol when present and
compute image->start from its final placement. Keep the existing e_entry
fallback for purgatories that do not expose the symbol.
WARNING: kernel/kexec_file.c:1009 at kexec_load_purgatory+0x395/0x3c0, CPU#10: kexec/1784
Call Trace:
<TASK>
bzImage64_load+0x133/0xa00
__do_sys_kexec_file_load+0x2b3/0x5c0
do_syscall_64+0x81/0x610
entry_SYSCALL_64_after_hwframe+0x76/0x7e
[me@linux.beauty: move helper to avoid forward declaration, per Baoquan]
In the Linux kernel, the following vulnerability has been resolved:
ext4: move ext4_percpu_param_init() before ext4_mb_init()
When running `kvm-xfstests -c ext4/1k -C 1 generic/383` with the
`DOUBLE_CHECK` macro defined, the following panic is triggered:
==================================================================
EXT4-fs error (device vdc): ext4_validate_block_bitmap:423:
comm mount: bg 0: bad block bitmap checksum
BUG: unable to handle page fault for address: ff110000fa2cc000
PGD 3e01067 P4D 3e02067 PUD 0
Oops: Oops: 0000 [#1] SMP NOPTI
CPU: 0 UID: 0 PID: 2386 Comm: mount Tainted: G W
6.18.0-gba65a4e7120a-dirty #1152 PREEMPT(none)
RIP: 0010:percpu_counter_add_batch+0x13/0xa0
Call Trace:
<TASK>
ext4_mark_group_bitmap_corrupted+0xcb/0xe0
ext4_validate_block_bitmap+0x2a1/0x2f0
ext4_read_block_bitmap+0x33/0x50
mb_group_bb_bitmap_alloc+0x33/0x80
ext4_mb_add_groupinfo+0x190/0x250
ext4_mb_init_backend+0x87/0x290
ext4_mb_init+0x456/0x640
__ext4_fill_super+0x1072/0x1680
ext4_fill_super+0xd3/0x280
get_tree_bdev_flags+0x132/0x1d0
vfs_get_tree+0x29/0xd0
vfs_cmd_create+0x59/0xe0
__do_sys_fsconfig+0x4f6/0x6b0
do_syscall_64+0x50/0x1f0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
==================================================================
This issue can be reproduced using the following commands:
mkfs.ext4 -F -q -b 1024 /dev/sda 5G
tune2fs -O quota,project /dev/sda
mount /dev/sda /tmp/test
With DOUBLE_CHECK defined, mb_group_bb_bitmap_alloc() reads
and validates the block bitmap. When the validation fails,
ext4_mark_group_bitmap_corrupted() attempts to update
sbi->s_freeclusters_counter. However, this percpu_counter has not been
initialized yet at this point, which leads to the panic described above.
Fix this by moving the execution of ext4_percpu_param_init() to occur
before ext4_mb_init(), ensuring the per-CPU counters are initialized
before they are used.
In the Linux kernel, the following vulnerability has been resolved:
drm: Account property blob allocations to memcg
DRM_IOCTL_MODE_CREATEPROPBLOB allows userspace to allocate arbitrary-sized
property blobs backed by kernel memory.
Currently, the blob data allocation is not accounted to the allocating
process's memory cgroup, allowing unprivileged users to trigger unbounded
kernel memory consumption and potentially cause system-wide OOM.
Mark the property blob data allocation with GFP_KERNEL_ACCOUNT so that the memory
is properly charged to the caller's memcg. This ensures existing cgroup
memory limits apply and prevents uncontrolled kernel memory growth without
introducing additional policy or per-file limits.
In the Linux kernel, the following vulnerability has been resolved:
mm/hugetlb: restore failed global reservations to subpool
Commit a833a693a490 ("mm: hugetlb: fix incorrect fallback for subpool")
fixed an underflow error for hstate->resv_huge_pages caused by incorrectly
attributing globally requested pages to the subpool's reservation.
Unfortunately, this fix also introduced the opposite problem, which would
leave spool->used_hpages elevated if the globally requested pages could
not be acquired. This is because while a subpool's reserve pages only
accounts for what is requested and allocated from the subpool, its "used"
counter keeps track of what is consumed in total, both from the subpool
and globally. Thus, we need to adjust spool->used_hpages in the other
direction, and make sure that globally requested pages are uncharged from
the subpool's used counter.
Each failed allocation attempt increments the used_hpages counter by how
many pages were requested from the global pool. Ultimately, this renders
the subpool unusable, as used_hpages approaches the max limit.
The issue can be reproduced as follows:
1. Allocate 4 hugetlb pages
2. Create a hugetlb mount with max=4, min=2
3. Consume 2 pages globally
4. Request 3 pages from the subpool (2 from subpool + 1 from global)
4.1 hugepage_subpool_get_pages(spool, 3) succeeds.
used_hpages += 3
4.2 hugetlb_acct_memory(h, 1) fails: no global pages left
used_hpages -= 2
5. Subpool now has used_hpages = 1, despite not being able to
successfully allocate any hugepages. It believes it can now only
allocate 3 more hugepages, not 4.
With each failed allocation attempt incrementing the used counter, the
subpool eventually reaches a point where its used counter equals its
max counter. At that point, any future allocations that try to
allocate hugeTLB pages from the subpool will fail, despite the subpool
not having any of its hugeTLB pages consumed by any user.
Once this happens, there is no way to make the subpool usable again,
since there is no way to decrement the used counter as no process is
really consuming the hugeTLB pages.
The underflow issue that the original commit fixes still remains fixed
as well.
Without this fix, used_hpages would keep on leaking if
hugetlb_acct_memory() fails.
In the Linux kernel, the following vulnerability has been resolved:
mm/slab: do not access current->mems_allowed_seq if !allow_spin
Lockdep complains when get_from_any_partial() is called in an NMI
context, because current->mems_allowed_seq is seqcount_spinlock_t and
not NMI-safe:
================================
WARNING: inconsistent lock state
6.19.0-rc5-kfree-rcu+ #315 Tainted: G N
--------------------------------
inconsistent {INITIAL USE} -> {IN-NMI} usage.
kunit_try_catch/9989 [HC1[1]:SC0[0]:HE0:SE1] takes:
ffff889085799820 (&____s->seqcount#3){.-.-}-{0:0}, at: ___slab_alloc+0x58f/0xc00
{INITIAL USE} state was registered at:
lock_acquire+0x185/0x320
kernel_init_freeable+0x391/0x1150
kernel_init+0x1f/0x220
ret_from_fork+0x736/0x8f0
ret_from_fork_asm+0x1a/0x30
irq event stamp: 56
hardirqs last enabled at (55): [<ffffffff850a68d7>] _raw_spin_unlock_irq+0x27/0x70
hardirqs last disabled at (56): [<ffffffff850858ca>] __schedule+0x2a8a/0x6630
softirqs last enabled at (0): [<ffffffff81536711>] copy_process+0x1dc1/0x6a10
softirqs last disabled at (0): [<0000000000000000>] 0x0
other info that might help us debug this:
Possible unsafe locking scenario:
CPU0
----
lock(&____s->seqcount#3);
<Interrupt>
lock(&____s->seqcount#3);
*** DEADLOCK ***
According to Documentation/locking/seqlock.rst, seqcount_t is not
NMI-safe and seqcount_latch_t should be used when read path can interrupt
the write-side critical section. In this case, do not access
current->mems_allowed_seq and avoid retry.
ai-scanner is an AI model safety scanner built on NVIDIA garak. From version 1.0.0 to before version 1.4.1, there is a remote code execution vulnerability via JavaScript injection in `BrowserAutomation::PlaywrightService`. This issue has been patched in version 1.4.1.
CROSS implementation contains reference and optimized implementations of the CROSS post-quantum signature algorithm. Prior to commit fc6b7e7, there is a buffer overflow in crypto_sign_open() caused by an underflow of the integer mlen. This issue has been patched via commit fc6b7e7.
math-codegen generates code from mathematical expressions. Prior to version 0.4.3, string literal content passed to cg.parse() is injected verbatim into a new Function() body without sanitization. This allows an attacker to execute arbitrary system commands when user-controlled input reaches the parser. Any application exposing a math evaluation endpoint where user input flows into cg.parse() is vulnerable to full RCE. This issue has been patched in version 0.4.3.