In the Linux kernel, the following vulnerability has been resolved:
media: intel/ipu6: do not handle interrupts when device is disabled
Some IPU6 devices have shared interrupts. We need to handle properly
case when interrupt is triggered from other device on shared irq line
and IPU6 itself disabled. In such case we get 0xffffffff from
ISR_STATUS register and handle all irq's cases, for what we are not
not prepared and usually hang the whole system.
To avoid the issue use pm_runtime_get_if_active() to check if
the device is enabled and prevent suspending it when we handle irq
until the end of irq. Additionally use synchronize_irq() in suspend
In the Linux kernel, the following vulnerability has been resolved:
octeontx2-pf: handle otx2_mbox_get_rsp errors in otx2_common.c
Add error pointer check after calling otx2_mbox_get_rsp().
In the Linux kernel, the following vulnerability has been resolved:
powerpc/fadump: Move fadump_cma_init to setup_arch() after initmem_init()
During early init CMA_MIN_ALIGNMENT_BYTES can be PAGE_SIZE,
since pageblock_order is still zero and it gets initialized
later during initmem_init() e.g.
setup_arch() -> initmem_init() -> sparse_init() -> set_pageblock_order()
One such use case where this causes issue is -
early_setup() -> early_init_devtree() -> fadump_reserve_mem() -> fadump_cma_init()
This causes CMA memory alignment check to be bypassed in
cma_init_reserved_mem(). Then later cma_activate_area() can hit
a VM_BUG_ON_PAGE(pfn & ((1 << order) - 1)) if the reserved memory
area was not pageblock_order aligned.
Fix it by moving the fadump_cma_init() after initmem_init(),
where other such cma reservations also gets called.
<stack trace>
==============
page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x10010
flags: 0x13ffff800000000(node=1|zone=0|lastcpupid=0x7ffff) CMA
raw: 013ffff800000000 5deadbeef0000100 5deadbeef0000122 0000000000000000
raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
page dumped because: VM_BUG_ON_PAGE(pfn & ((1 << order) - 1))
------------[ cut here ]------------
kernel BUG at mm/page_alloc.c:778!
Call Trace:
__free_one_page+0x57c/0x7b0 (unreliable)
free_pcppages_bulk+0x1a8/0x2c8
free_unref_page_commit+0x3d4/0x4e4
free_unref_page+0x458/0x6d0
init_cma_reserved_pageblock+0x114/0x198
cma_init_reserved_areas+0x270/0x3e0
do_one_initcall+0x80/0x2f8
kernel_init_freeable+0x33c/0x530
kernel_init+0x34/0x26c
ret_from_kernel_user_thread+0x14/0x1c
Huawei HiLink AI Life product has an identity authentication bypass vulnerability. Successful exploitation of this vulnerability may allow attackers to access restricted functions.(Vulnerability ID:HWPSIRT-2022-42291)
This vulnerability has been assigned a (CVE)ID:CVE-2022-48470
An issue was discovered in Kurmi Provisioning Suite before 7.9.0.35, 7.10.x through 7.10.0.18, and 7.11.x through 7.11.0.15. An Observable Response Discrepancy vulnerability in the sendPasswordReinitLink action of the unlogged.do page allows remote attackers to test whether a username is valid or not. This allows confirmation of valid usernames.
An issue was discovered in Kurmi Provisioning Suite before 7.9.0.35 and 7.10.x through 7.10.0.18. A Directory Traversal and Local File Inclusion vulnerability in the logsSys.do page allows remote attackers (authenticated as administrators) to trigger the display of unintended files. Any file accessible to the Kurmi user account could be displayed, e.g., configuration files with information such as the database password.
A cross-site scripting (XSS) vulnerability in the graphicCustomization.do page in Kurmi Provisioning Suite before 7.9.0.38, 7.10.x through 7.10.0.18, and 7.11.x through 7.11.0.15 allows remote attackers (authenticated as system administrators) to inject arbitrary web script or HTML via the COMPONENT_fields(htmlTitle) field, which is rendered in other pages of the application for all users (if the graphical customization has been activated by a super-administrator).
A race condition vulnerability in SimplCommerce at commit 230310c8d7a0408569b292c5a805c459d47a1d8f allows attackers to bypass inventory restrictions by simultaneously submitting purchase requests from multiple accounts for the same product. This can lead to overselling when stock is limited, as the system fails to accurately track inventory under high concurrency, resulting in potential loss and unfulfilled orders.
A vulnerability was found in ruifang-tech Rebuild 3.8.6. It has been classified as problematic. This affects an unknown part of the file /user/admin-verify of the component Admin Verification Page. The manipulation of the argument nexturl with the input http://localhost/evil.html leads to open redirect. It is possible to initiate the attack remotely. The exploit has been disclosed to the public and may be used. The vendor was contacted early about this disclosure but did not respond in any way.
A vulnerability was found in WISI Tangram GT31 up to 20241214 and classified as problematic. Affected by this issue is some unknown functionality of the component HTTP Request Handler. The manipulation leads to server-side request forgery. The attack may be launched remotely. The vendor was contacted early about this disclosure but did not respond in any way.
LinkAce is a self-hosted archive to collect links of your favorite websites. Prior to 1.15.6, a reflected cross-site scripting (XSS) vulnerability exists in the LinkAce. This issue occurs in the "URL" field of the "Edit Link" module, where user input is not properly sanitized or encoded before being reflected in the HTML response. This allows attackers to inject and execute arbitrary JavaScript in the context of the victimβs browser, leading to potential session hijacking, data theft, and unauthorized actions. This vulnerability is fixed in 1.15.6.
In the Linux kernel, the following vulnerability has been resolved:
virtio_net: correct netdev_tx_reset_queue() invocation point
When virtnet_close is followed by virtnet_open, some TX completions can
possibly remain unconsumed, until they are finally processed during the
first NAPI poll after the netdev_tx_reset_queue(), resulting in a crash
[1]. Commit b96ed2c97c79 ("virtio_net: move netdev_tx_reset_queue() call
before RX napi enable") was not sufficient to eliminate all BQL crash
cases for virtio-net.
This issue can be reproduced with the latest net-next master by running:
`while :; do ip l set DEV down; ip l set DEV up; done` under heavy network
TX load from inside the machine.
netdev_tx_reset_queue() can actually be dropped from virtnet_open path;
the device is not stopped in any case. For BQL core part, it's just like
traffic nearly ceases to exist for some period. For stall detector added
to BQL, even if virtnet_close could somehow lead to some TX completions
delayed for long, followed by virtnet_open, we can just take it as stall
as mentioned in commit 6025b9135f7a ("net: dqs: add NIC stall detector
based on BQL"). Note also that users can still reset stall_max via sysfs.
So, drop netdev_tx_reset_queue() from virtnet_enable_queue_pair(). This
eliminates the BQL crashes. As a result, netdev_tx_reset_queue() is now
explicitly required in freeze/restore path. This patch adds it to
immediately after free_unused_bufs(), following the rule of thumb:
netdev_tx_reset_queue() should follow any SKB freeing not followed by
netdev_tx_completed_queue(). This seems the most consistent and
streamlined approach, and now netdev_tx_reset_queue() runs whenever
free_unused_bufs() is done.
[1]:
------------[ cut here ]------------
kernel BUG at lib/dynamic_queue_limits.c:99!
Oops: invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
CPU: 7 UID: 0 PID: 1598 Comm: ip Tainted: G N 6.12.0net-next_main+ #2
Tainted: [N]=TEST
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), \
BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:dql_completed+0x26b/0x290
Code: b7 c2 49 89 e9 44 89 da 89 c6 4c 89 d7 e8 ed 17 47 00 58 65 ff 0d
4d 27 90 7e 0f 85 fd fe ff ff e8 ea 53 8d ff e9 f3 fe ff ff <0f> 0b 01
d2 44 89 d1 29 d1 ba 00 00 00 00 0f 48 ca e9 28 ff ff ff
RSP: 0018:ffffc900002b0d08 EFLAGS: 00010297
RAX: 0000000000000000 RBX: ffff888102398c80 RCX: 0000000080190009
RDX: 0000000000000000 RSI: 000000000000006a RDI: 0000000000000000
RBP: ffff888102398c00 R08: 0000000000000000 R09: 0000000000000000
R10: 00000000000000ca R11: 0000000000015681 R12: 0000000000000001
R13: ffffc900002b0d68 R14: ffff88811115e000 R15: ffff8881107aca40
FS: 00007f41ded69500(0000) GS:ffff888667dc0000(0000)
knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000556ccc2dc1a0 CR3: 0000000104fd8003 CR4: 0000000000772ef0
PKRU: 55555554
Call Trace:
<IRQ>
? die+0x32/0x80
? do_trap+0xd9/0x100
? dql_completed+0x26b/0x290
? dql_completed+0x26b/0x290
? do_error_trap+0x6d/0xb0
? dql_completed+0x26b/0x290
? exc_invalid_op+0x4c/0x60
? dql_completed+0x26b/0x290
? asm_exc_invalid_op+0x16/0x20
? dql_completed+0x26b/0x290
__free_old_xmit+0xff/0x170 [virtio_net]
free_old_xmit+0x54/0xc0 [virtio_net]
virtnet_poll+0xf4/0xe30 [virtio_net]
? __update_load_avg_cfs_rq+0x264/0x2d0
? update_curr+0x35/0x260
? reweight_entity+0x1be/0x260
__napi_poll.constprop.0+0x28/0x1c0
net_rx_action+0x329/0x420
? enqueue_hrtimer+0x35/0x90
? trace_hardirqs_on+0x1d/0x80
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? kvm_sched_clock_read+0xd/0x20
? sched_clock+0xc/0x30
? sched_clock_cpu+0xd/0x1a0
handle_softirqs+0x138/0x3e0
do_softirq.part.0+0x89/0xc0
</IRQ>
<TASK>
__local_bh_enable_ip+0xa7/0xb0
virtnet_open+0xc8/0x310 [virtio_net]
__dev_open+0xfa/0x1b0
__dev_change_flags+0x1de/0x250
dev_change_flags+0x22/0x60
do_setlink.isra.0+0x2df/0x10b0
? rtnetlink_rcv_msg+0x34f/0x3f0
? netlink_rcv_skb+0x54/0x100
? netlink_unicas
---truncated---
In the Linux kernel, the following vulnerability has been resolved:
gpio: graniterapids: Fix vGPIO driver crash
Move setting irq_chip.name from probe() function to the initialization
of "irq_chip" struct in order to fix vGPIO driver crash during bootup.
Crash was caused by unauthorized modification of irq_chip.name field
where irq_chip struct was initialized as const.
This behavior is a consequence of suboptimal implementation of
gpio_irq_chip_set_chip(), which should be changed to avoid
casting away const qualifier.
Crash log:
BUG: unable to handle page fault for address: ffffffffc0ba81c0
/#PF: supervisor write access in kernel mode
/#PF: error_code(0x0003) - permissions violation
CPU: 33 UID: 0 PID: 1075 Comm: systemd-udevd Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7 #1
Hardware name: Intel Corporation Kaseyville RP/Kaseyville RP, BIOS KVLDCRB1.PGS.0026.D73.2410081258 10/08/2024
RIP: 0010:gnr_gpio_probe+0x171/0x220 [gpio_graniterapids]
In the Linux kernel, the following vulnerability has been resolved:
usb: gadget: u_serial: Fix the issue that gs_start_io crashed due to accessing null pointer
Considering that in some extreme cases,
when u_serial driver is accessed by multiple threads,
Thread A is executing the open operation and calling the gs_open,
Thread B is executing the disconnect operation and calling the
gserial_disconnect function,The port->port_usb pointer will be set to NULL.
E.g.
Thread A Thread B
gs_open() gadget_unbind_driver()
gs_start_io() composite_disconnect()
gs_start_rx() gserial_disconnect()
... ...
spin_unlock(&port->port_lock)
status = usb_ep_queue() spin_lock(&port->port_lock)
spin_lock(&port->port_lock) port->port_usb = NULL
gs_free_requests(port->port_usb->in) spin_unlock(&port->port_lock)
Crash
This causes thread A to access a null pointer (port->port_usb is null)
when calling the gs_free_requests function, causing a crash.
If port_usb is NULL, the release request will be skipped as it
will be done by gserial_disconnect.
So add a null pointer check to gs_start_io before attempting
to access the value of the pointer port->port_usb.
Call trace:
gs_start_io+0x164/0x25c
gs_open+0x108/0x13c
tty_open+0x314/0x638
chrdev_open+0x1b8/0x258
do_dentry_open+0x2c4/0x700
vfs_open+0x2c/0x3c
path_openat+0xa64/0xc60
do_filp_open+0xb8/0x164
do_sys_openat2+0x84/0xf0
__arm64_sys_openat+0x70/0x9c
invoke_syscall+0x58/0x114
el0_svc_common+0x80/0xe0
do_el0_svc+0x1c/0x28
el0_svc+0x38/0x68
In the Linux kernel, the following vulnerability has been resolved:
iommu/vt-d: Fix qi_batch NULL pointer with nested parent domain
The qi_batch is allocated when assigning cache tag for a domain. While
for nested parent domain, it is missed. Hence, when trying to map pages
to the nested parent, NULL dereference occurred. Also, there is potential
memleak since there is no lock around domain->qi_batch allocation.
To solve it, add a helper for qi_batch allocation, and call it in both
the __cache_tag_assign_domain() and __cache_tag_assign_parent_domain().
BUG: kernel NULL pointer dereference, address: 0000000000000200
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not-present page
PGD 8104795067 P4D 0
Oops: Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 223 UID: 0 PID: 4357 Comm: qemu-system-x86 Not tainted 6.13.0-rc1-00028-g4b50c3c3b998-dirty #2632
Call Trace:
? __die+0x24/0x70
? page_fault_oops+0x80/0x150
? do_user_addr_fault+0x63/0x7b0
? exc_page_fault+0x7c/0x220
? asm_exc_page_fault+0x26/0x30
? cache_tag_flush_range_np+0x13c/0x260
intel_iommu_iotlb_sync_map+0x1a/0x30
iommu_map+0x61/0xf0
batch_to_domain+0x188/0x250
iopt_area_fill_domains+0x125/0x320
? rcu_is_watching+0x11/0x50
iopt_map_pages+0x63/0x100
iopt_map_common.isra.0+0xa7/0x190
iopt_map_user_pages+0x6a/0x80
iommufd_ioas_map+0xcd/0x1d0
iommufd_fops_ioctl+0x118/0x1c0
__x64_sys_ioctl+0x93/0xc0
do_syscall_64+0x71/0x140
entry_SYSCALL_64_after_hwframe+0x76/0x7e
In the Linux kernel, the following vulnerability has been resolved:
drm/i915: Fix NULL pointer dereference in capture_engine
When the intel_context structure contains NULL,
it raises a NULL pointer dereference error in drm_info().
(cherry picked from commit 754302a5bc1bd8fd3b7d85c168b0a1af6d4bba4d)
In the Linux kernel, the following vulnerability has been resolved:
drm/amdkfd: Dereference null return value
In the function pqm_uninit there is a call-assignment of "pdd =
kfd_get_process_device_data" which could be null, and this value was
later dereferenced without checking.
In the Linux kernel, the following vulnerability has been resolved:
bpf,perf: Fix invalid prog_array access in perf_event_detach_bpf_prog
Syzbot reported [1] crash that happens for following tracing scenario:
- create tracepoint perf event with attr.inherit=1, attach it to the
process and set bpf program to it
- attached process forks -> chid creates inherited event
the new child event shares the parent's bpf program and tp_event
(hence prog_array) which is global for tracepoint
- exit both process and its child -> release both events
- first perf_event_detach_bpf_prog call will release tp_event->prog_array
and second perf_event_detach_bpf_prog will crash, because
tp_event->prog_array is NULL
The fix makes sure the perf_event_detach_bpf_prog checks prog_array
is valid before it tries to remove the bpf program from it.
[1] https://lore.kernel.org/bpf/Z1MR6dCIKajNS6nU@krava/T/#m91dbf0688221ec7a7fc95e896a7ef9ff93b0b8ad
In the Linux kernel, the following vulnerability has been resolved:
acpi: nfit: vmalloc-out-of-bounds Read in acpi_nfit_ctl
Fix an issue detected by syzbot with KASAN:
BUG: KASAN: vmalloc-out-of-bounds in cmd_to_func drivers/acpi/nfit/
core.c:416 [inline]
BUG: KASAN: vmalloc-out-of-bounds in acpi_nfit_ctl+0x20e8/0x24a0
drivers/acpi/nfit/core.c:459
The issue occurs in cmd_to_func when the call_pkg->nd_reserved2
array is accessed without verifying that call_pkg points to a buffer
that is appropriately sized as a struct nd_cmd_pkg. This can lead
to out-of-bounds access and undefined behavior if the buffer does not
have sufficient space.
To address this, a check was added in acpi_nfit_ctl() to ensure that
buf is not NULL and that buf_len is less than sizeof(*call_pkg)
before accessing it. This ensures safe access to the members of
call_pkg, including the nd_reserved2 array.
In the Linux kernel, the following vulnerability has been resolved:
net/mlx5: DR, prevent potential error pointer dereference
The dr_domain_add_vport_cap() function generally returns NULL on error
but sometimes we want it to return ERR_PTR(-EBUSY) so the caller can
retry. The problem here is that "ret" can be either -EBUSY or -ENOMEM
and if it's and -ENOMEM then the error pointer is propogated back and
eventually dereferenced in dr_ste_v0_build_src_gvmi_qpn_tag().
In the Linux kernel, the following vulnerability has been resolved:
ALSA: control: Avoid WARN() for symlink errors
Using WARN() for showing the error of symlink creations don't give
more information than telling that something goes wrong, since the
usual code path is a lregister callback from each control element
creation. More badly, the use of WARN() rather confuses fuzzer as if
it were serious issues.
This patch downgrades the warning messages to use the normal dev_err()
instead of WARN(). For making it clearer, add the function name to
the prefix, too.
In the Linux kernel, the following vulnerability has been resolved:
bnxt_en: Fix aggregation ID mask to prevent oops on 5760X chips
The 5760X (P7) chip's HW GRO/LRO interface is very similar to that of
the previous generation (5750X or P5). However, the aggregation ID
fields in the completion structures on P7 have been redefined from
16 bits to 12 bits. The freed up 4 bits are redefined for part of the
metadata such as the VLAN ID. The aggregation ID mask was not modified
when adding support for P7 chips. Including the extra 4 bits for the
aggregation ID can potentially cause the driver to store or fetch the
packet header of GRO/LRO packets in the wrong TPA buffer. It may hit
the BUG() condition in __skb_pull() because the SKB contains no valid
packet header:
kernel BUG at include/linux/skbuff.h:2766!
Oops: invalid opcode: 0000 1 PREEMPT SMP NOPTI
CPU: 4 UID: 0 PID: 0 Comm: swapper/4 Kdump: loaded Tainted: G OE 6.12.0-rc2+ #7
Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
Hardware name: Dell Inc. PowerEdge R760/0VRV9X, BIOS 1.0.1 12/27/2022
RIP: 0010:eth_type_trans+0xda/0x140
Code: 80 00 00 00 eb c1 8b 47 70 2b 47 74 48 8b 97 d0 00 00 00 83 f8 01 7e 1b 48 85 d2 74 06 66 83 3a ff 74 09 b8 00 04 00 00 eb a5 <0f> 0b b8 00 01 00 00 eb 9c 48 85 ff 74 eb 31 f6 b9 02 00 00 00 48
RSP: 0018:ff615003803fcc28 EFLAGS: 00010283
RAX: 00000000000022d2 RBX: 0000000000000003 RCX: ff2e8c25da334040
RDX: 0000000000000040 RSI: ff2e8c25c1ce8000 RDI: ff2e8c25869f9000
RBP: ff2e8c258c31c000 R08: ff2e8c25da334000 R09: 0000000000000001
R10: ff2e8c25da3342c0 R11: ff2e8c25c1ce89c0 R12: ff2e8c258e0990b0
R13: ff2e8c25bb120000 R14: ff2e8c25c1ce89c0 R15: ff2e8c25869f9000
FS: 0000000000000000(0000) GS:ff2e8c34be300000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055f05317e4c8 CR3: 000000108bac6006 CR4: 0000000000773ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
<IRQ>
? die+0x33/0x90
? do_trap+0xd9/0x100
? eth_type_trans+0xda/0x140
? do_error_trap+0x65/0x80
? eth_type_trans+0xda/0x140
? exc_invalid_op+0x4e/0x70
? eth_type_trans+0xda/0x140
? asm_exc_invalid_op+0x16/0x20
? eth_type_trans+0xda/0x140
bnxt_tpa_end+0x10b/0x6b0 [bnxt_en]
? bnxt_tpa_start+0x195/0x320 [bnxt_en]
bnxt_rx_pkt+0x902/0xd90 [bnxt_en]
? __bnxt_tx_int.constprop.0+0x89/0x300 [bnxt_en]
? kmem_cache_free+0x343/0x440
? __bnxt_tx_int.constprop.0+0x24f/0x300 [bnxt_en]
__bnxt_poll_work+0x193/0x370 [bnxt_en]
bnxt_poll_p5+0x9a/0x300 [bnxt_en]
? try_to_wake_up+0x209/0x670
__napi_poll+0x29/0x1b0
Fix it by redefining the aggregation ID mask for P5_PLUS chips to be
12 bits. This will work because the maximum aggregation ID is less
than 4096 on all P5_PLUS chips.
In the Linux kernel, the following vulnerability has been resolved:
Bluetooth: hci_event: Fix using rcu_read_(un)lock while iterating
The usage of rcu_read_(un)lock while inside list_for_each_entry_rcu is
not safe since for the most part entries fetched this way shall be
treated as rcu_dereference:
Note that the value returned by rcu_dereference() is valid
only within the enclosing RCU read-side critical section [1]_.
For example, the following is **not** legal::
rcu_read_lock();
p = rcu_dereference(head.next);
rcu_read_unlock();
x = p->address; /* BUG!!! */
rcu_read_lock();
y = p->data; /* BUG!!! */
rcu_read_unlock();
In the Linux kernel, the following vulnerability has been resolved:
net: enetc: Do not configure preemptible TCs if SIs do not support
Both ENETC PF and VF drivers share enetc_setup_tc_mqprio() to configure
MQPRIO. And enetc_setup_tc_mqprio() calls enetc_change_preemptible_tcs()
to configure preemptible TCs. However, only PF is able to configure
preemptible TCs. Because only PF has related registers, while VF does not
have these registers. So for VF, its hw->port pointer is NULL. Therefore,
VF will access an invalid pointer when accessing a non-existent register,
which will cause a crash issue. The simplified log is as follows.
root@ls1028ardb:~# tc qdisc add dev eno0vf0 parent root handle 100: \
mqprio num_tc 4 map 0 0 1 1 2 2 3 3 queues 1@0 1@1 1@2 1@3 hw 1
[ 187.290775] Unable to handle kernel paging request at virtual address 0000000000001f00
[ 187.424831] pc : enetc_mm_commit_preemptible_tcs+0x1c4/0x400
[ 187.430518] lr : enetc_mm_commit_preemptible_tcs+0x30c/0x400
[ 187.511140] Call trace:
[ 187.513588] enetc_mm_commit_preemptible_tcs+0x1c4/0x400
[ 187.518918] enetc_setup_tc_mqprio+0x180/0x214
[ 187.523374] enetc_vf_setup_tc+0x1c/0x30
[ 187.527306] mqprio_enable_offload+0x144/0x178
[ 187.531766] mqprio_init+0x3ec/0x668
[ 187.535351] qdisc_create+0x15c/0x488
[ 187.539023] tc_modify_qdisc+0x398/0x73c
[ 187.542958] rtnetlink_rcv_msg+0x128/0x378
[ 187.547064] netlink_rcv_skb+0x60/0x130
[ 187.550910] rtnetlink_rcv+0x18/0x24
[ 187.554492] netlink_unicast+0x300/0x36c
[ 187.558425] netlink_sendmsg+0x1a8/0x420
[ 187.606759] ---[ end trace 0000000000000000 ]---
In addition, some PFs also do not support configuring preemptible TCs,
such as eno1 and eno3 on LS1028A. It won't crash like it does for VFs,
but we should prevent these PFs from accessing these unimplemented
registers.
In the Linux kernel, the following vulnerability has been resolved:
net: Fix icmp host relookup triggering ip_rt_bug
arp link failure may trigger ip_rt_bug while xfrm enabled, call trace is:
WARNING: CPU: 0 PID: 0 at net/ipv4/route.c:1241 ip_rt_bug+0x14/0x20
Modules linked in:
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc6-00077-g2e1b3cc9d7f7
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014
RIP: 0010:ip_rt_bug+0x14/0x20
Call Trace:
<IRQ>
ip_send_skb+0x14/0x40
__icmp_send+0x42d/0x6a0
ipv4_link_failure+0xe2/0x1d0
arp_error_report+0x3c/0x50
neigh_invalidate+0x8d/0x100
neigh_timer_handler+0x2e1/0x330
call_timer_fn+0x21/0x120
__run_timer_base.part.0+0x1c9/0x270
run_timer_softirq+0x4c/0x80
handle_softirqs+0xac/0x280
irq_exit_rcu+0x62/0x80
sysvec_apic_timer_interrupt+0x77/0x90
The script below reproduces this scenario:
ip xfrm policy add src 0.0.0.0/0 dst 0.0.0.0/0 \
dir out priority 0 ptype main flag localok icmp
ip l a veth1 type veth
ip a a 192.168.141.111/24 dev veth0
ip l s veth0 up
ping 192.168.141.155 -c 1
icmp_route_lookup() create input routes for locally generated packets
while xfrm relookup ICMP traffic.Then it will set input route
(dst->out = ip_rt_bug) to skb for DESTUNREACH.
For ICMP err triggered by locally generated packets, dst->dev of output
route is loopback. Generally, xfrm relookup verification is not required
on loopback interfaces (net.ipv4.conf.lo.disable_xfrm = 1).
Skip icmp relookup for locally generated packets to fix it.
In the Linux kernel, the following vulnerability has been resolved:
can: j1939: j1939_session_new(): fix skb reference counting
Since j1939_session_skb_queue() does an extra skb_get() for each new
skb, do the same for the initial one in j1939_session_new() to avoid
refcount underflow.
[mkl: clean up commit message]
In the Linux kernel, the following vulnerability has been resolved:
net/ipv6: release expired exception dst cached in socket
Dst objects get leaked in ip6_negative_advice() when this function is
executed for an expired IPv6 route located in the exception table. There
are several conditions that must be fulfilled for the leak to occur:
* an ICMPv6 packet indicating a change of the MTU for the path is received,
resulting in an exception dst being created
* a TCP connection that uses the exception dst for routing packets must
start timing out so that TCP begins retransmissions
* after the exception dst expires, the FIB6 garbage collector must not run
before TCP executes ip6_negative_advice() for the expired exception dst
When TCP executes ip6_negative_advice() for an exception dst that has
expired and if no other socket holds a reference to the exception dst, the
refcount of the exception dst is 2, which corresponds to the increment
made by dst_init() and the increment made by the TCP socket for which the
connection is timing out. The refcount made by the socket is never
released. The refcount of the dst is decremented in sk_dst_reset() but
that decrement is counteracted by a dst_hold() intentionally placed just
before the sk_dst_reset() in ip6_negative_advice(). After
ip6_negative_advice() has finished, there is no other object tied to the
dst. The socket lost its reference stored in sk_dst_cache and the dst is
no longer in the exception table. The exception dst becomes a leaked
object.
As a result of this dst leak, an unbalanced refcount is reported for the
loopback device of a net namespace being destroyed under kernels that do
not contain e5f80fcf869a ("ipv6: give an IPv6 dev to blackhole_netdev"):
unregister_netdevice: waiting for lo to become free. Usage count = 2
Fix the dst leak by removing the dst_hold() in ip6_negative_advice(). The
patch that introduced the dst_hold() in ip6_negative_advice() was
92f1655aa2b22 ("net: fix __dst_negative_advice() race"). But 92f1655aa2b22
merely refactored the code with regards to the dst refcount so the issue
was present even before 92f1655aa2b22. The bug was introduced in
54c1a859efd9f ("ipv6: Don't drop cache route entry unless timer actually
expired.") where the expired cached route is deleted and the sk_dst_cache
member of the socket is set to NULL by calling dst_negative_advice() but
the refcount belonging to the socket is left unbalanced.
The IPv4 version - ipv4_negative_advice() - is not affected by this bug.
When the TCP connection times out ipv4_negative_advice() merely resets the
sk_dst_cache of the socket while decrementing the refcount of the
exception dst.
In the Linux kernel, the following vulnerability has been resolved:
dccp: Fix memory leak in dccp_feat_change_recv
If dccp_feat_push_confirm() fails after new value for SP feature was accepted
without reconciliation ('entry == NULL' branch), memory allocated for that value
with dccp_feat_clone_sp_val() is never freed.
Here is the kmemleak stack for this:
unreferenced object 0xffff88801d4ab488 (size 8):
comm "syz-executor310", pid 1127, jiffies 4295085598 (age 41.666s)
hex dump (first 8 bytes):
01 b4 4a 1d 80 88 ff ff ..J.....
backtrace:
[<00000000db7cabfe>] kmemdup+0x23/0x50 mm/util.c:128
[<0000000019b38405>] kmemdup include/linux/string.h:465 [inline]
[<0000000019b38405>] dccp_feat_clone_sp_val net/dccp/feat.c:371 [inline]
[<0000000019b38405>] dccp_feat_clone_sp_val net/dccp/feat.c:367 [inline]
[<0000000019b38405>] dccp_feat_change_recv net/dccp/feat.c:1145 [inline]
[<0000000019b38405>] dccp_feat_parse_options+0x1196/0x2180 net/dccp/feat.c:1416
[<00000000b1f6d94a>] dccp_parse_options+0xa2a/0x1260 net/dccp/options.c:125
[<0000000030d7b621>] dccp_rcv_state_process+0x197/0x13d0 net/dccp/input.c:650
[<000000001f74c72e>] dccp_v4_do_rcv+0xf9/0x1a0 net/dccp/ipv4.c:688
[<00000000a6c24128>] sk_backlog_rcv include/net/sock.h:1041 [inline]
[<00000000a6c24128>] __release_sock+0x139/0x3b0 net/core/sock.c:2570
[<00000000cf1f3a53>] release_sock+0x54/0x1b0 net/core/sock.c:3111
[<000000008422fa23>] inet_wait_for_connect net/ipv4/af_inet.c:603 [inline]
[<000000008422fa23>] __inet_stream_connect+0x5d0/0xf70 net/ipv4/af_inet.c:696
[<0000000015b6f64d>] inet_stream_connect+0x53/0xa0 net/ipv4/af_inet.c:735
[<0000000010122488>] __sys_connect_file+0x15c/0x1a0 net/socket.c:1865
[<00000000b4b70023>] __sys_connect+0x165/0x1a0 net/socket.c:1882
[<00000000f4cb3815>] __do_sys_connect net/socket.c:1892 [inline]
[<00000000f4cb3815>] __se_sys_connect net/socket.c:1889 [inline]
[<00000000f4cb3815>] __x64_sys_connect+0x6e/0xb0 net/socket.c:1889
[<00000000e7b1e839>] do_syscall_64+0x33/0x40 arch/x86/entry/common.c:46
[<0000000055e91434>] entry_SYSCALL_64_after_hwframe+0x67/0xd1
Clean up the allocated memory in case of dccp_feat_push_confirm() failure
and bail out with an error reset code.
Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
In the Linux kernel, the following vulnerability has been resolved:
net/smc: initialize close_work early to avoid warning
We encountered a warning that close_work was canceled before
initialization.
WARNING: CPU: 7 PID: 111103 at kernel/workqueue.c:3047 __flush_work+0x19e/0x1b0
Workqueue: events smc_lgr_terminate_work [smc]
RIP: 0010:__flush_work+0x19e/0x1b0
Call Trace:
? __wake_up_common+0x7a/0x190
? work_busy+0x80/0x80
__cancel_work_timer+0xe3/0x160
smc_close_cancel_work+0x1a/0x70 [smc]
smc_close_active_abort+0x207/0x360 [smc]
__smc_lgr_terminate.part.38+0xc8/0x180 [smc]
process_one_work+0x19e/0x340
worker_thread+0x30/0x370
? process_one_work+0x340/0x340
kthread+0x117/0x130
? __kthread_cancel_work+0x50/0x50
ret_from_fork+0x22/0x30
This is because when smc_close_cancel_work is triggered, e.g. the RDMA
driver is rmmod and the LGR is terminated, the conn->close_work is
flushed before initialization, resulting in WARN_ON(!work->func).
__smc_lgr_terminate | smc_connect_{rdma|ism}
-------------------------------------------------------------
| smc_conn_create
| \- smc_lgr_register_conn
for conn in lgr->conns_all |
\- smc_conn_kill |
\- smc_close_active_abort |
\- smc_close_cancel_work |
\- cancel_work_sync |
\- __flush_work |
(close_work) |
| smc_close_init
| \- INIT_WORK(&close_work)
So fix this by initializing close_work before establishing the
connection.
In the Linux kernel, the following vulnerability has been resolved:
netfilter: ipset: Hold module reference while requesting a module
User space may unload ip_set.ko while it is itself requesting a set type
backend module, leading to a kernel crash. The race condition may be
provoked by inserting an mdelay() right after the nfnl_unlock() call.
In the Linux kernel, the following vulnerability has been resolved:
gpio: grgpio: Add NULL check in grgpio_probe
devm_kasprintf() can return a NULL pointer on failure,but this
returned value in grgpio_probe is not checked.
Add NULL check in grgpio_probe, to handle kernel NULL
pointer dereference error.
In the Linux kernel, the following vulnerability has been resolved:
nvme-tcp: fix the memleak while create new ctrl failed
Now while we create new ctrl failed, we have not free the
tagset occupied by admin_q, here try to fix it.
In the Linux kernel, the following vulnerability has been resolved:
ocfs2: free inode when ocfs2_get_init_inode() fails
syzbot is reporting busy inodes after unmount, for commit 9c89fe0af826
("ocfs2: Handle error from dquot_initialize()") forgot to call iput() when
new_inode() succeeded and dquot_initialize() failed.
In the Linux kernel, the following vulnerability has been resolved:
can: dev: can_set_termination(): allow sleeping GPIOs
In commit 6e86a1543c37 ("can: dev: provide optional GPIO based
termination support") GPIO based termination support was added.
For no particular reason that patch uses gpiod_set_value() to set the
GPIO. This leads to the following warning, if the systems uses a
sleeping GPIO, i.e. behind an I2C port expander:
| WARNING: CPU: 0 PID: 379 at /drivers/gpio/gpiolib.c:3496 gpiod_set_value+0x50/0x6c
| CPU: 0 UID: 0 PID: 379 Comm: ip Not tainted 6.11.0-20241016-1 #1 823affae360cc91126e4d316d7a614a8bf86236c
Replace gpiod_set_value() by gpiod_set_value_cansleep() to allow the
use of sleeping GPIOs.
In the Linux kernel, the following vulnerability has been resolved:
scsi: qla2xxx: Fix use after free on unload
System crash is observed with stack trace warning of use after
free. There are 2 signals to tell dpc_thread to terminate (UNLOADING
flag and kthread_stop).
On setting the UNLOADING flag when dpc_thread happens to run at the time
and sees the flag, this causes dpc_thread to exit and clean up
itself. When kthread_stop is called for final cleanup, this causes use
after free.
Remove UNLOADING signal to terminate dpc_thread. Use the kthread_stop
as the main signal to exit dpc_thread.
[596663.812935] kernel BUG at mm/slub.c:294!
[596663.812950] invalid opcode: 0000 [#1] SMP PTI
[596663.812957] CPU: 13 PID: 1475935 Comm: rmmod Kdump: loaded Tainted: G IOE --------- - - 4.18.0-240.el8.x86_64 #1
[596663.812960] Hardware name: HP ProLiant DL380p Gen8, BIOS P70 08/20/2012
[596663.812974] RIP: 0010:__slab_free+0x17d/0x360
...
[596663.813008] Call Trace:
[596663.813022] ? __dentry_kill+0x121/0x170
[596663.813030] ? _cond_resched+0x15/0x30
[596663.813034] ? _cond_resched+0x15/0x30
[596663.813039] ? wait_for_completion+0x35/0x190
[596663.813048] ? try_to_wake_up+0x63/0x540
[596663.813055] free_task+0x5a/0x60
[596663.813061] kthread_stop+0xf3/0x100
[596663.813103] qla2x00_remove_one+0x284/0x440 [qla2xxx]
In the Linux kernel, the following vulnerability has been resolved:
scsi: ufs: core: sysfs: Prevent div by zero
Prevent a division by 0 when monitoring is not enabled.
In the Linux kernel, the following vulnerability has been resolved:
pmdomain: imx: gpcv2: Adjust delay after power up handshake
The udelay(5) is not enough, sometimes below kernel panic
still be triggered:
[ 4.012973] Kernel panic - not syncing: Asynchronous SError Interrupt
[ 4.012976] CPU: 2 UID: 0 PID: 186 Comm: (udev-worker) Not tainted 6.12.0-rc2-0.0.0-devel-00004-g8b1b79e88956 #1
[ 4.012982] Hardware name: Toradex Verdin iMX8M Plus WB on Dahlia Board (DT)
[ 4.012985] Call trace:
[...]
[ 4.013029] arm64_serror_panic+0x64/0x70
[ 4.013034] do_serror+0x3c/0x70
[ 4.013039] el1h_64_error_handler+0x30/0x54
[ 4.013046] el1h_64_error+0x64/0x68
[ 4.013050] clk_imx8mp_audiomix_runtime_resume+0x38/0x48
[ 4.013059] __genpd_runtime_resume+0x30/0x80
[ 4.013066] genpd_runtime_resume+0x114/0x29c
[ 4.013073] __rpm_callback+0x48/0x1e0
[ 4.013079] rpm_callback+0x68/0x80
[ 4.013084] rpm_resume+0x3bc/0x6a0
[ 4.013089] __pm_runtime_resume+0x50/0x9c
[ 4.013095] pm_runtime_get_suppliers+0x60/0x8c
[ 4.013101] __driver_probe_device+0x4c/0x14c
[ 4.013108] driver_probe_device+0x3c/0x120
[ 4.013114] __driver_attach+0xc4/0x200
[ 4.013119] bus_for_each_dev+0x7c/0xe0
[ 4.013125] driver_attach+0x24/0x30
[ 4.013130] bus_add_driver+0x110/0x240
[ 4.013135] driver_register+0x68/0x124
[ 4.013142] __platform_driver_register+0x24/0x30
[ 4.013149] sdma_driver_init+0x20/0x1000 [imx_sdma]
[ 4.013163] do_one_initcall+0x60/0x1e0
[ 4.013168] do_init_module+0x5c/0x21c
[ 4.013175] load_module+0x1a98/0x205c
[ 4.013181] init_module_from_file+0x88/0xd4
[ 4.013187] __arm64_sys_finit_module+0x258/0x350
[ 4.013194] invoke_syscall.constprop.0+0x50/0xe0
[ 4.013202] do_el0_svc+0xa8/0xe0
[ 4.013208] el0_svc+0x3c/0x140
[ 4.013215] el0t_64_sync_handler+0x120/0x12c
[ 4.013222] el0t_64_sync+0x190/0x194
[ 4.013228] SMP: stopping secondary CPUs
The correct way is to wait handshake, but it needs BUS clock of
BLK-CTL be enabled, which is in separate driver. So delay is the
only option here. The udelay(10) is a data got by experiment.
In the Linux kernel, the following vulnerability has been resolved:
cacheinfo: Allocate memory during CPU hotplug if not done from the primary CPU
Commit
5944ce092b97 ("arch_topology: Build cacheinfo from primary CPU")
adds functionality that architectures can use to optionally allocate and
build cacheinfo early during boot. Commit
6539cffa9495 ("cacheinfo: Add arch specific early level initializer")
lets secondary CPUs correct (and reallocate memory) cacheinfo data if
needed.
If the early build functionality is not used and cacheinfo does not need
correction, memory for cacheinfo is never allocated. x86 does not use
the early build functionality. Consequently, during the cacheinfo CPU
hotplug callback, last_level_cache_is_valid() attempts to dereference
a NULL pointer:
BUG: kernel NULL pointer dereference, address: 0000000000000100
#PF: supervisor read access in kernel mode
#PF: error_code(0x0000) - not present page
PGD 0 P4D 0
Oops: 0000 [#1] PREEPMT SMP NOPTI
CPU: 0 PID 19 Comm: cpuhp/0 Not tainted 6.4.0-rc2 #1
RIP: 0010: last_level_cache_is_valid+0x95/0xe0a
Allocate memory for cacheinfo during the cacheinfo CPU hotplug callback
if not done earlier.
Moreover, before determining the validity of the last-level cache info,
ensure that it has been allocated. Simply checking for non-zero
cache_leaves() is not sufficient, as some architectures (e.g., Intel
processors) have non-zero cache_leaves() before allocation.
Dereferencing NULL cacheinfo can occur in update_per_cpu_data_slice_size().
This function iterates over all online CPUs. However, a CPU may have come
online recently, but its cacheinfo may not have been allocated yet.
While here, remove an unnecessary indentation in allocate_cache_info().
[ bp: Massage. ]
In the Linux kernel, the following vulnerability has been resolved:
sched/numa: fix memory leak due to the overwritten vma->numab_state
[Problem Description]
When running the hackbench program of LTP, the following memory leak is
reported by kmemleak.
# /opt/ltp/testcases/bin/hackbench 20 thread 1000
Running with 20*40 (== 800) tasks.
# dmesg | grep kmemleak
...
kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff888cd8ca2c40 (size 64):
comm "hackbench", pid 17142, jiffies 4299780315
hex dump (first 32 bytes):
ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00 .tI.....L.I.....
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace (crc bff18fd4):
[<ffffffff81419a89>] __kmalloc_cache_noprof+0x2f9/0x3f0
[<ffffffff8113f715>] task_numa_work+0x725/0xa00
[<ffffffff8110f878>] task_work_run+0x58/0x90
[<ffffffff81ddd9f8>] syscall_exit_to_user_mode+0x1c8/0x1e0
[<ffffffff81dd78d5>] do_syscall_64+0x85/0x150
[<ffffffff81e0012b>] entry_SYSCALL_64_after_hwframe+0x76/0x7e
...
This issue can be consistently reproduced on three different servers:
* a 448-core server
* a 256-core server
* a 192-core server
[Root Cause]
Since multiple threads are created by the hackbench program (along with
the command argument 'thread'), a shared vma might be accessed by two or
more cores simultaneously. When two or more cores observe that
vma->numab_state is NULL at the same time, vma->numab_state will be
overwritten.
Although current code ensures that only one thread scans the VMAs in a
single 'numa_scan_period', there might be a chance for another thread
to enter in the next 'numa_scan_period' while we have not gotten till
numab_state allocation [1].
Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000`
cannot the reproduce the issue. It is verified with 200+ test runs.
[Solution]
Use the cmpxchg atomic operation to ensure that only one thread executes
the vma->numab_state assignment.
[1] https://lore.kernel.org/lkml/1794be3c-358c-4cdc-a43d-a1f841d91ef7@amd.com/
In the Linux kernel, the following vulnerability has been resolved:
mm/gup: handle NULL pages in unpin_user_pages()
The recent addition of "pofs" (pages or folios) handling to gup has a
flaw: it assumes that unpin_user_pages() handles NULL pages in the pages**
array. That's not the case, as I discovered when I ran on a new
configuration on my test machine.
Fix this by skipping NULL pages in unpin_user_pages(), just like
unpin_folios() already does.
Details: when booting on x86 with "numa=fake=2 movablecore=4G" on Linux
6.12, and running this:
tools/testing/selftests/mm/gup_longterm
...I get the following crash:
BUG: kernel NULL pointer dereference, address: 0000000000000008
RIP: 0010:sanity_check_pinned_pages+0x3a/0x2d0
...
Call Trace:
<TASK>
? __die_body+0x66/0xb0
? page_fault_oops+0x30c/0x3b0
? do_user_addr_fault+0x6c3/0x720
? irqentry_enter+0x34/0x60
? exc_page_fault+0x68/0x100
? asm_exc_page_fault+0x22/0x30
? sanity_check_pinned_pages+0x3a/0x2d0
unpin_user_pages+0x24/0xe0
check_and_migrate_movable_pages_or_folios+0x455/0x4b0
__gup_longterm_locked+0x3bf/0x820
? mmap_read_lock_killable+0x12/0x50
? __pfx_mmap_read_lock_killable+0x10/0x10
pin_user_pages+0x66/0xa0
gup_test_ioctl+0x358/0xb20
__se_sys_ioctl+0x6b/0xc0
do_syscall_64+0x7b/0x150
entry_SYSCALL_64_after_hwframe+0x76/0x7e
In the Linux kernel, the following vulnerability has been resolved:
mm/mempolicy: fix migrate_to_node() assuming there is at least one VMA in a MM
We currently assume that there is at least one VMA in a MM, which isn't
true.
So we might end up having find_vma() return NULL, to then de-reference
NULL. So properly handle find_vma() returning NULL.
This fixes the report:
Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN PTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 UID: 0 PID: 6021 Comm: syz-executor284 Not tainted 6.12.0-rc7-syzkaller-00187-gf868cd251776 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/30/2024
RIP: 0010:migrate_to_node mm/mempolicy.c:1090 [inline]
RIP: 0010:do_migrate_pages+0x403/0x6f0 mm/mempolicy.c:1194
Code: ...
RSP: 0018:ffffc9000375fd08 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffffc9000375fd78 RCX: 0000000000000000
RDX: ffff88807e171300 RSI: dffffc0000000000 RDI: ffff88803390c044
RBP: ffff88807e171428 R08: 0000000000000014 R09: fffffbfff2039ef1
R10: ffffffff901cf78f R11: 0000000000000000 R12: 0000000000000003
R13: ffffc9000375fe90 R14: ffffc9000375fe98 R15: ffffc9000375fdf8
FS: 00005555919e1380(0000) GS:ffff8880b8700000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00005555919e1ca8 CR3: 000000007f12a000 CR4: 00000000003526f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
kernel_migrate_pages+0x5b2/0x750 mm/mempolicy.c:1709
__do_sys_migrate_pages mm/mempolicy.c:1727 [inline]
__se_sys_migrate_pages mm/mempolicy.c:1723 [inline]
__x64_sys_migrate_pages+0x96/0x100 mm/mempolicy.c:1723
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xcd/0x250 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x77/0x7f
[akpm@linux-foundation.org: add unlikely()]
In the Linux kernel, the following vulnerability has been resolved:
kcsan: Turn report_filterlist_lock into a raw_spinlock
Ran Xiaokai reports that with a KCSAN-enabled PREEMPT_RT kernel, we can see
splats like:
| BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
| in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/1
| preempt_count: 10002, expected: 0
| RCU nest depth: 0, expected: 0
| no locks held by swapper/1/0.
| irq event stamp: 156674
| hardirqs last enabled at (156673): [<ffffffff81130bd9>] do_idle+0x1f9/0x240
| hardirqs last disabled at (156674): [<ffffffff82254f84>] sysvec_apic_timer_interrupt+0x14/0xc0
| softirqs last enabled at (0): [<ffffffff81099f47>] copy_process+0xfc7/0x4b60
| softirqs last disabled at (0): [<0000000000000000>] 0x0
| Preemption disabled at:
| [<ffffffff814a3e2a>] paint_ptr+0x2a/0x90
| CPU: 1 UID: 0 PID: 0 Comm: swapper/1 Not tainted 6.11.0+ #3
| Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
| Call Trace:
| <IRQ>
| dump_stack_lvl+0x7e/0xc0
| dump_stack+0x1d/0x30
| __might_resched+0x1a2/0x270
| rt_spin_lock+0x68/0x170
| kcsan_skip_report_debugfs+0x43/0xe0
| print_report+0xb5/0x590
| kcsan_report_known_origin+0x1b1/0x1d0
| kcsan_setup_watchpoint+0x348/0x650
| __tsan_unaligned_write1+0x16d/0x1d0
| hrtimer_interrupt+0x3d6/0x430
| __sysvec_apic_timer_interrupt+0xe8/0x3a0
| sysvec_apic_timer_interrupt+0x97/0xc0
| </IRQ>
On a detected data race, KCSAN's reporting logic checks if it should
filter the report. That list is protected by the report_filterlist_lock
*non-raw* spinlock which may sleep on RT kernels.
Since KCSAN may report data races in any context, convert it to a
raw_spinlock.
This requires being careful about when to allocate memory for the filter
list itself which can be done via KCSAN's debugfs interface. Concurrent
modification of the filter list via debugfs should be rare: the chosen
strategy is to optimistically pre-allocate memory before the critical
section and discard if unused.