In the Linux kernel, the following vulnerability has been resolved:
media: vidtv: Fix use-after-free in vidtv_bridge_dvb_init()
KASAN reports a use-after-free:
BUG: KASAN: use-after-free in dvb_dmxdev_release+0x4d5/0x5d0 [dvb_core]
Call Trace:
...
dvb_dmxdev_release+0x4d5/0x5d0 [dvb_core]
vidtv_bridge_probe+0x7bf/0xa40 [dvb_vidtv_bridge]
platform_probe+0xb6/0x170
...
Allocated by task 1238:
...
dvb_register_device+0x1a7/0xa70 [dvb_core]
dvb_dmxdev_init+0x2af/0x4a0 [dvb_core]
vidtv_bridge_probe+0x766/0xa40 [dvb_vidtv_bridge]
...
Freed by task 1238:
dvb_register_device+0x6d2/0xa70 [dvb_core]
dvb_dmxdev_init+0x2af/0x4a0 [dvb_core]
vidtv_bridge_probe+0x766/0xa40 [dvb_vidtv_bridge]
...
It is because the error handling in vidtv_bridge_dvb_init() is wrong.
First, vidtv_bridge_dmx(dev)_init() will clean themselves when fail, but
goto fail_dmx(_dev): calls release functions again, which causes
use-after-free.
Also, in fail_fe, fail_tuner_probe and fail_demod_probe, j = i will cause
out-of-bound when i finished its loop (i == NUM_FE). And the loop
releasing is wrong, although now NUM_FE is 1 so it won't cause problem.
Fix this by correctly releasing everything.
In the Linux kernel, the following vulnerability has been resolved:
bnxt_en: fix memory leak in bnxt_nvm_test()
Free the kzalloc'ed buffer before returning in the success path.
In the Linux kernel, the following vulnerability has been resolved:
media: ipu3-imgu: Fix NULL pointer dereference in active selection access
What the IMGU driver did was that it first acquired the pointers to active
and try V4L2 subdev state, and only then figured out which one to use.
The problem with that approach and a later patch (see Fixes: tag) is that
as sd_state argument to v4l2_subdev_get_try_crop() et al is NULL, there is
now an attempt to dereference that.
Fix this.
Also rewrap lines a little.
In the Linux kernel, the following vulnerability has been resolved:
dmaengine: qcom-adm: fix wrong calling convention for prep_slave_sg
The calling convention for pre_slave_sg is to return NULL on error and
provide an error log to the system. Qcom-adm instead provide error
pointer when an error occur. This indirectly cause kernel panic for
example for the nandc driver that checks only if the pointer returned by
device_prep_slave_sg is not NULL. Returning an error pointer makes nandc
think the device_prep_slave_sg function correctly completed and makes
the kernel panics later in the code.
While nandc is the one that makes the kernel crash, it was pointed out
that the real problem is qcom-adm not following calling convention for
that function.
To fix this, drop returning error pointer and return NULL with an error
log.
In the Linux kernel, the following vulnerability has been resolved:
x86/apic: Don't disable x2APIC if locked
The APIC supports two modes, legacy APIC (or xAPIC), and Extended APIC
(or x2APIC). X2APIC mode is mostly compatible with legacy APIC, but
it disables the memory-mapped APIC interface in favor of one that uses
MSRs. The APIC mode is controlled by the EXT bit in the APIC MSR.
The MMIO/xAPIC interface has some problems, most notably the APIC LEAK
[1]. This bug allows an attacker to use the APIC MMIO interface to
extract data from the SGX enclave.
Introduce support for a new feature that will allow the BIOS to lock
the APIC in x2APIC mode. If the APIC is locked in x2APIC mode and the
kernel tries to disable the APIC or revert to legacy APIC mode a GP
fault will occur.
Introduce support for a new MSR (IA32_XAPIC_DISABLE_STATUS) and handle
the new locked mode when the LEGACY_XAPIC_DISABLED bit is set by
preventing the kernel from trying to disable the x2APIC.
On platforms with the IA32_XAPIC_DISABLE_STATUS MSR, if SGX or TDX are
enabled the LEGACY_XAPIC_DISABLED will be set by the BIOS. If
legacy APIC is required, then it SGX and TDX need to be disabled in the
BIOS.
[1]: https://aepicleak.com/aepicleak.pdf
In the Linux kernel, the following vulnerability has been resolved:
ALSA: line6: fix stack overflow in line6_midi_transmit
Correctly calculate available space including the size of the chunk
buffer. This fixes a buffer overflow when multiple MIDI sysex
messages are sent to a PODxt device.
In the Linux kernel, the following vulnerability has been resolved:
drm/amdgpu: fix pci device refcount leak
As comment of pci_get_domain_bus_and_slot() says, it returns
a pci device with refcount increment, when finish using it,
the caller must decrement the reference count by calling
pci_dev_put().
So before returning from amdgpu_device_resume|suspend_display_audio(),
pci_dev_put() is called to avoid refcount leak.
In the Linux kernel, the following vulnerability has been resolved:
nvmet-tcp: add bounds check on Transfer Tag
ttag is used as an index to get cmd in nvmet_tcp_handle_h2c_data_pdu(),
add a bounds check to avoid out-of-bounds access.
In the Linux kernel, the following vulnerability has been resolved:
wifi: mt76: mt7921e: fix rmmod crash in driver reload test
In insmod/rmmod stress test, the following crash dump shows up immediately.
The problem is caused by missing mt76_dev in mt7921_pci_remove(). We
should make sure the drvdata is ready before probe() finished.
[168.862789] ==================================================================
[168.862797] BUG: KASAN: user-memory-access in try_to_grab_pending+0x59/0x480
[168.862805] Write of size 8 at addr 0000000000006df0 by task rmmod/5361
[168.862812] CPU: 7 PID: 5361 Comm: rmmod Tainted: G OE 5.19.0-rc6 #1
[168.862816] Hardware name: Intel(R) Client Systems NUC8i7BEH/NUC8BEB, 05/04/2020
[168.862820] Call Trace:
[168.862822] <TASK>
[168.862825] dump_stack_lvl+0x49/0x63
[168.862832] print_report.cold+0x493/0x6b7
[168.862845] kasan_report+0xa7/0x120
[168.862857] kasan_check_range+0x163/0x200
[168.862861] __kasan_check_write+0x14/0x20
[168.862866] try_to_grab_pending+0x59/0x480
[168.862870] __cancel_work_timer+0xbb/0x340
[168.862898] cancel_work_sync+0x10/0x20
[168.862902] mt7921_pci_remove+0x61/0x1c0 [mt7921e]
[168.862909] pci_device_remove+0xa3/0x1d0
[168.862914] device_remove+0xc4/0x170
[168.862920] device_release_driver_internal+0x163/0x300
[168.862925] driver_detach+0xc7/0x1a0
[168.862930] bus_remove_driver+0xeb/0x2d0
[168.862935] driver_unregister+0x71/0xb0
[168.862939] pci_unregister_driver+0x30/0x230
[168.862944] mt7921_pci_driver_exit+0x10/0x1b [mt7921e]
[168.862949] __x64_sys_delete_module+0x2f9/0x4b0
[168.862968] do_syscall_64+0x38/0x90
[168.862973] entry_SYSCALL_64_after_hwframe+0x63/0xcd
Test steps:
1. insmode
2. do not ifup
3. rmmod quickly (within 1 second)
In the Linux kernel, the following vulnerability has been resolved:
clk: visconti: Fix memory leak in visconti_register_pll()
@pll->rate_table has allocated memory by kmemdup(), if clk_hw_register()
fails, it should be freed, otherwise it will cause memory leak issue,
this patch fixes it.
In the Linux kernel, the following vulnerability has been resolved:
devlink: hold region lock when flushing snapshots
Netdevsim triggers a splat on reload, when it destroys regions
with snapshots pending:
WARNING: CPU: 1 PID: 787 at net/core/devlink.c:6291 devlink_region_snapshot_del+0x12e/0x140
CPU: 1 PID: 787 Comm: devlink Not tainted 6.1.0-07460-g7ae9888d6e1c #580
RIP: 0010:devlink_region_snapshot_del+0x12e/0x140
Call Trace:
<TASK>
devl_region_destroy+0x70/0x140
nsim_dev_reload_down+0x2f/0x60 [netdevsim]
devlink_reload+0x1f7/0x360
devlink_nl_cmd_reload+0x6ce/0x860
genl_family_rcv_msg_doit.isra.0+0x145/0x1c0
This is the locking assert in devlink_region_snapshot_del(),
we're supposed to be holding the region->snapshot_lock here.
In the Linux kernel, the following vulnerability has been resolved:
isdn: mISDN: hfcsusb: fix memory leak in hfcsusb_probe()
In hfcsusb_probe(), the memory allocated for ctrl_urb gets leaked when
setup_instance() fails with an error code. Fix that by freeing the urb
before freeing the hw structure. Also change the error paths to use the
goto ladder style.
Compile tested only. Issue found using a prototype static analysis tool.
In the Linux kernel, the following vulnerability has been resolved:
smack: fix bug: unprivileged task can create labels
If an unprivileged task is allowed to relabel itself
(/smack/relabel-self is not empty),
it can freely create new labels by writing their
names into own /proc/PID/attr/smack/current
This occurs because do_setattr() imports
the provided label in advance,
before checking "relabel-self" list.
This change ensures that the "relabel-self" list
is checked before importing the label.
In the Linux kernel, the following vulnerability has been resolved:
gpu: host1x: Fix race in syncpt alloc/free
Fix race condition between host1x_syncpt_alloc()
and host1x_syncpt_put() by using kref_put_mutex()
instead of kref_put() + manual mutex locking.
This ensures no thread can acquire the
syncpt_mutex after the refcount drops to zero
but before syncpt_release acquires it.
This prevents races where syncpoints could
be allocated while still being cleaned up
from a previous release.
Remove explicit mutex locking in syncpt_release
as kref_put_mutex() handles this atomically.
In the Linux kernel, the following vulnerability has been resolved:
accel/amdxdna: Fix an integer overflow in aie2_query_ctx_status_array()
The unpublished smatch static checker reported a warning.
drivers/accel/amdxdna/aie2_pci.c:904 aie2_query_ctx_status_array()
warn: potential user controlled sizeof overflow
'args->num_element * args->element_size' '1-u32max(user) * 1-u32max(user)'
Even this will not cause a real issue, it is better to put a reasonable
limitation for element_size and num_element. Add condition to make sure
the input element_size <= 4K and num_element <= 1K.
In the Linux kernel, the following vulnerability has been resolved:
accel/ivpu: Fix page fault in ivpu_bo_unbind_all_bos_from_context()
Don't add BO to the vdev->bo_list in ivpu_gem_create_object().
When failure happens inside drm_gem_shmem_create(), the BO is not
fully created and ivpu_gem_bo_free() callback will not be called
causing a deleted BO to be left on the list.
In the Linux kernel, the following vulnerability has been resolved:
wifi: ath12k: Fix MSDU buffer types handling in RX error path
Currently, packets received on the REO exception ring from
unassociated peers are of MSDU buffer type, while the driver expects
link descriptor type packets. These packets are not parsed further due
to a return check on packet type in ath12k_hal_desc_reo_parse_err(),
but the associated skb is not freed. This may lead to kernel
crashes and buffer leaks.
Hence to fix, update the RX error handler to explicitly drop
MSDU buffer type packets received on the REO exception ring.
This prevents further processing of invalid packets and ensures
stability in the RX error handling path.
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
In the Linux kernel, the following vulnerability has been resolved:
ntfs3: fix uninit memory after failed mi_read in mi_format_new
Fix a KMSAN un-init bug found by syzkaller.
ntfs_get_bh() expects a buffer from sb_getblk(), that buffer may not be
uptodate. We do not bring the buffer uptodate before setting it as
uptodate. If the buffer were to not be uptodate, it could mean adding a
buffer with un-init data to the mi record. Attempting to load that record
will trigger KMSAN.
Avoid this by setting the buffer as uptodate, if it’s not already, by
overwriting it.
In the Linux kernel, the following vulnerability has been resolved:
ntfs3: Fix uninit buffer allocated by __getname()
Fix uninit errors caused after buffer allocation given to 'de'; by
initializing the buffer with zeroes. The fix was found by using KMSAN.
In the Linux kernel, the following vulnerability has been resolved:
crypto: aead - Fix reqsize handling
Commit afddce13ce81d ("crypto: api - Add reqsize to crypto_alg")
introduced cra_reqsize field in crypto_alg struct to replace type
specific reqsize fields. It looks like this was introduced specifically
for ahash and acomp from the commit description as subsequent commits
add necessary changes in these alg frameworks.
However, this is being recommended for use in all crypto algs
instead of setting reqsize using crypto_*_set_reqsize(). Using
cra_reqsize in aead algorithms, hence, causes memory corruptions and
crashes as the underlying functions in the algorithm framework have not
been updated to set the reqsize properly from cra_reqsize. [1]
Add proper set_reqsize calls in the aead init function to properly
initialize reqsize for these algorithms in the framework.
[1]: https://gist.github.com/Pratham-T/24247446f1faf4b7843e4014d5089f6b
In the Linux kernel, the following vulnerability has been resolved:
bpf: Do not let BPF test infra emit invalid GSO types to stack
Yinhao et al. reported that their fuzzer tool was able to trigger a
skb_warn_bad_offload() from netif_skb_features() -> gso_features_check().
When a BPF program - triggered via BPF test infra - pushes the packet
to the loopback device via bpf_clone_redirect() then mentioned offload
warning can be seen. GSO-related features are then rightfully disabled.
We get into this situation due to convert___skb_to_skb() setting
gso_segs and gso_size but not gso_type. Technically, it makes sense
that this warning triggers since the GSO properties are malformed due
to the gso_type. Potentially, the gso_type could be marked non-trustworthy
through setting it at least to SKB_GSO_DODGY without any other specific
assumptions, but that also feels wrong given we should not go further
into the GSO engine in the first place.
The checks were added in 121d57af308d ("gso: validate gso_type in GSO
handlers") because there were malicious (syzbot) senders that combine
a protocol with a non-matching gso_type. If we would want to drop such
packets, gso_features_check() currently only returns feature flags via
netif_skb_features(), so one location for potentially dropping such skbs
could be validate_xmit_unreadable_skb(), but then otoh it would be
an additional check in the fast-path for a very corner case. Given
bpf_clone_redirect() is the only place where BPF test infra could emit
such packets, lets reject them right there.
In the Linux kernel, the following vulnerability has been resolved:
crypto: asymmetric_keys - prevent overflow in asymmetric_key_generate_id
Use check_add_overflow() to guard against potential integer overflows
when adding the binary blob lengths and the size of an asymmetric_key_id
structure and return ERR_PTR(-EOVERFLOW) accordingly. This prevents a
possible buffer overflow when copying data from potentially malicious
X.509 certificate fields that can be arbitrarily large, such as ASN.1
INTEGER serial numbers, issuer names, etc.
In the Linux kernel, the following vulnerability has been resolved:
wifi: ath11k: fix peer HE MCS assignment
In ath11k_wmi_send_peer_assoc_cmd(), peer's transmit MCS is sent to
firmware as receive MCS while peer's receive MCS sent as transmit MCS,
which goes against firmwire's definition.
While connecting to a misbehaved AP that advertises 0xffff (meaning not
supported) for 160 MHz transmit MCS map, firmware crashes due to 0xffff
is assigned to he_mcs->rx_mcs_set field.
Ext Tag: HE Capabilities
[...]
Supported HE-MCS and NSS Set
[...]
Rx and Tx MCS Maps 160 MHz
[...]
Tx HE-MCS Map 160 MHz: 0xffff
Swap the assignment to fix this issue.
As the HE rate control mask is meant to limit our own transmit MCS, it
needs to go via he_mcs->rx_mcs_set field. With the aforementioned swapping
done, change is needed as well to apply it to the peer's receive MCS.
Tested-on: WCN6855 hw2.1 PCI WLAN.HSP.1.1-03125-QCAHSPSWPL_V1_V2_SILICONZ_LITE-3.6510.41
Tested-on: QCN9274 hw2.0 PCI WLAN.WBE.1.4.1-00199-QCAHKSWPL_SILICONZ-1
In the Linux kernel, the following vulnerability has been resolved:
bpf: Fix stackmap overflow check in __bpf_get_stackid()
Syzkaller reported a KASAN slab-out-of-bounds write in __bpf_get_stackid()
when copying stack trace data. The issue occurs when the perf trace
contains more stack entries than the stack map bucket can hold,
leading to an out-of-bounds write in the bucket's data array.
In the Linux kernel, the following vulnerability has been resolved:
ns: initialize ns_list_node for initial namespaces
Make sure that the list is always initialized for initial namespaces.
In the Linux kernel, the following vulnerability has been resolved:
coresight: ETR: Fix ETR buffer use-after-free issue
When ETR is enabled as CS_MODE_SYSFS, if the buffer size is changed
and enabled again, currently sysfs_buf will point to the newly
allocated memory(buf_new) and free the old memory(buf_old). But the
etr_buf that is being used by the ETR remains pointed to buf_old, not
updated to buf_new. In this case, it will result in a memory
use-after-free issue.
Fix this by checking ETR's mode before updating and releasing buf_old,
if the mode is CS_MODE_SYSFS, then skip updating and releasing it.
In the Linux kernel, the following vulnerability has been resolved:
perf/x86: Fix NULL event access and potential PEBS record loss
When intel_pmu_drain_pebs_icl() is called to drain PEBS records, the
perf_event_overflow() could be called to process the last PEBS record.
While perf_event_overflow() could trigger the interrupt throttle and
stop all events of the group, like what the below call-chain shows.
perf_event_overflow()
-> __perf_event_overflow()
->__perf_event_account_interrupt()
-> perf_event_throttle_group()
-> perf_event_throttle()
-> event->pmu->stop()
-> x86_pmu_stop()
The side effect of stopping the events is that all corresponding event
pointers in cpuc->events[] array are cleared to NULL.
Assume there are two PEBS events (event a and event b) in a group. When
intel_pmu_drain_pebs_icl() calls perf_event_overflow() to process the
last PEBS record of PEBS event a, interrupt throttle is triggered and
all pointers of event a and event b are cleared to NULL. Then
intel_pmu_drain_pebs_icl() tries to process the last PEBS record of
event b and encounters NULL pointer access.
To avoid this issue, move cpuc->events[] clearing from x86_pmu_stop()
to x86_pmu_del(). It's safe since cpuc->active_mask or
cpuc->pebs_enabled is always checked before access the event pointer
from cpuc->events[].
In the Linux kernel, the following vulnerability has been resolved:
md: fix rcu protection in md_wakeup_thread
We attempted to use RCU to protect the pointer 'thread', but directly
passed the value when calling md_wakeup_thread(). This means that the
RCU pointer has been acquired before rcu_read_lock(), which renders
rcu_read_lock() ineffective and could lead to a use-after-free.
In the Linux kernel, the following vulnerability has been resolved:
md: avoid repeated calls to del_gendisk
There is a uaf problem which is found by case 23rdev-lifetime:
Oops: general protection fault, probably for non-canonical address 0xdead000000000122
RIP: 0010:bdi_unregister+0x4b/0x170
Call Trace:
<TASK>
__del_gendisk+0x356/0x3e0
mddev_unlock+0x351/0x360
rdev_attr_store+0x217/0x280
kernfs_fop_write_iter+0x14a/0x210
vfs_write+0x29e/0x550
ksys_write+0x74/0xf0
do_syscall_64+0xbb/0x380
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff5250a177e
The sequence is:
1. rdev remove path gets reconfig_mutex
2. rdev remove path release reconfig_mutex in mddev_unlock
3. md stop calls do_md_stop and sets MD_DELETED
4. rdev remove path calls del_gendisk because MD_DELETED is set
5. md stop path release reconfig_mutex and calls del_gendisk again
So there is a race condition we should resolve. This patch adds a
flag MD_DO_DELETE to avoid the race condition.
In the Linux kernel, the following vulnerability has been resolved:
nbd: defer config put in recv_work
There is one uaf issue in recv_work when running NBD_CLEAR_SOCK and
NBD_CMD_RECONFIGURE:
nbd_genl_connect // conf_ref=2 (connect and recv_work A)
nbd_open // conf_ref=3
recv_work A done // conf_ref=2
NBD_CLEAR_SOCK // conf_ref=1
nbd_genl_reconfigure // conf_ref=2 (trigger recv_work B)
close nbd // conf_ref=1
recv_work B
config_put // conf_ref=0
atomic_dec(&config->recv_threads); -> UAF
Or only running NBD_CLEAR_SOCK:
nbd_genl_connect // conf_ref=2
nbd_open // conf_ref=3
NBD_CLEAR_SOCK // conf_ref=2
close nbd
nbd_release
config_put // conf_ref=1
recv_work
config_put // conf_ref=0
atomic_dec(&config->recv_threads); -> UAF
Commit 87aac3a80af5 ("nbd: call nbd_config_put() before notifying the
waiter") moved nbd_config_put() to run before waking up the waiter in
recv_work, in order to ensure that nbd_start_device_ioctl() would not
be woken up while nbd->task_recv was still uncleared.
However, in nbd_start_device_ioctl(), after being woken up it explicitly
calls flush_workqueue() to make sure all current works are finished.
Therefore, there is no need to move the config put ahead of the wakeup.
Move nbd_config_put() to the end of recv_work, so that the reference is
held for the whole lifetime of the worker thread. This makes sure the
config cannot be freed while recv_work is still running, even if clear
+ reconfigure interleave.
In addition, we don't need to worry about recv_work dropping the last
nbd_put (which causes deadlock):
path A (netlink with NBD_CFLAG_DESTROY_ON_DISCONNECT):
connect // nbd_refs=1 (trigger recv_work)
open nbd // nbd_refs=2
NBD_CLEAR_SOCK
close nbd
nbd_release
nbd_disconnect_and_put
flush_workqueue // recv_work done
nbd_config_put
nbd_put // nbd_refs=1
nbd_put // nbd_refs=0
queue_work
path B (netlink without NBD_CFLAG_DESTROY_ON_DISCONNECT):
connect // nbd_refs=2 (trigger recv_work)
open nbd // nbd_refs=3
NBD_CLEAR_SOCK // conf_refs=2
close nbd
nbd_release
nbd_config_put // conf_refs=1
nbd_put // nbd_refs=2
recv_work done // conf_refs=0, nbd_refs=1
rmmod // nbd_refs=0
Depends-on: e2daec488c57 ("nbd: Fix hungtask when nbd_config_put")
In the Linux kernel, the following vulnerability has been resolved:
scsi: smartpqi: Fix device resources accessed after device removal
Correct possible race conditions during device removal.
Previously, a scheduled work item to reset a LUN could still execute
after the device was removed, leading to use-after-free and other
resource access issues.
This race condition occurs because the abort handler may schedule a LUN
reset concurrently with device removal via sdev_destroy(), leading to
use-after-free and improper access to freed resources.
- Check in the device reset handler if the device is still present in
the controller's SCSI device list before running; if not, the reset
is skipped.
- Cancel any pending TMF work that has not started in sdev_destroy().
- Ensure device freeing in sdev_destroy() is done while holding the
LUN reset mutex to avoid races with ongoing resets.
In the Linux kernel, the following vulnerability has been resolved:
coresight: tmc: add the handle of the event to the path
The handle is essential for retrieving the AUX_EVENT of each CPU and is
required in perf mode. It has been added to the coresight_path so that
dependent devices can access it from the path when needed.
The existing bug can be reproduced with:
perf record -e cs_etm//k -C 0-9 dd if=/dev/zero of=/dev/null
Showing an oops as follows:
Unable to handle kernel paging request at virtual address 000f6e84934ed19e
Call trace:
tmc_etr_get_buffer+0x30/0x80 [coresight_tmc] (P)
catu_enable_hw+0xbc/0x3d0 [coresight_catu]
catu_enable+0x70/0xe0 [coresight_catu]
coresight_enable_path+0xb0/0x258 [coresight]
In the Linux kernel, the following vulnerability has been resolved:
ntfs3: init run lock for extend inode
After setting the inode mode of $Extend to a regular file, executing the
truncate system call will enter the do_truncate() routine, causing the
run_lock uninitialized error reported by syzbot.
Prior to patch 4e8011ffec79, if the inode mode of $Extend was not set to
a regular file, the do_truncate() routine would not be entered.
Add the run_lock initialization when loading $Extend.
syzbot reported:
INFO: trying to register non-static key.
Call Trace:
dump_stack_lvl+0x189/0x250 lib/dump_stack.c:120
assign_lock_key+0x133/0x150 kernel/locking/lockdep.c:984
register_lock_class+0x105/0x320 kernel/locking/lockdep.c:1299
__lock_acquire+0x99/0xd20 kernel/locking/lockdep.c:5112
lock_acquire+0x120/0x360 kernel/locking/lockdep.c:5868
down_write+0x96/0x1f0 kernel/locking/rwsem.c:1590
ntfs_set_size+0x140/0x200 fs/ntfs3/inode.c:860
ntfs_extend+0x1d9/0x970 fs/ntfs3/file.c:387
ntfs_setattr+0x2e8/0xbe0 fs/ntfs3/file.c:808
In the Linux kernel, the following vulnerability has been resolved:
md: init bioset in mddev_init
IO operations may be needed before md_run(), such as updating metadata
after writing sysfs. Without bioset, this triggers a NULL pointer
dereference as below:
BUG: kernel NULL pointer dereference, address: 0000000000000020
Call Trace:
md_update_sb+0x658/0xe00
new_level_store+0xc5/0x120
md_attr_store+0xc9/0x1e0
sysfs_kf_write+0x6f/0xa0
kernfs_fop_write_iter+0x141/0x2a0
vfs_write+0x1fc/0x5a0
ksys_write+0x79/0x180
__x64_sys_write+0x1d/0x30
x64_sys_call+0x2818/0x2880
do_syscall_64+0xa9/0x580
entry_SYSCALL_64_after_hwframe+0x4b/0x53
Reproducer
```
mdadm -CR /dev/md0 -l1 -n2 /dev/sd[cd]
echo inactive > /sys/block/md0/md/array_state
echo 10 > /sys/block/md0/md/new_level
```
mddev_init() can only be called once per mddev, no need to test if bioset
has been initialized anymore.
In the Linux kernel, the following vulnerability has been resolved:
nbd: defer config unlock in nbd_genl_connect
There is one use-after-free warning when running NBD_CMD_CONNECT and
NBD_CLEAR_SOCK:
nbd_genl_connect
nbd_alloc_and_init_config // config_refs=1
nbd_start_device // config_refs=2
set NBD_RT_HAS_CONFIG_REF open nbd // config_refs=3
recv_work done // config_refs=2
NBD_CLEAR_SOCK // config_refs=1
close nbd // config_refs=0
refcount_inc -> uaf
------------[ cut here ]------------
refcount_t: addition on 0; use-after-free.
WARNING: CPU: 24 PID: 1014 at lib/refcount.c:25 refcount_warn_saturate+0x12e/0x290
nbd_genl_connect+0x16d0/0x1ab0
genl_family_rcv_msg_doit+0x1f3/0x310
genl_rcv_msg+0x44a/0x790
The issue can be easily reproduced by adding a small delay before
refcount_inc(&nbd->config_refs) in nbd_genl_connect():
mutex_unlock(&nbd->config_lock);
if (!ret) {
set_bit(NBD_RT_HAS_CONFIG_REF, &config->runtime_flags);
+ printk("before sleep\n");
+ mdelay(5 * 1000);
+ printk("after sleep\n");
refcount_inc(&nbd->config_refs);
nbd_connect_reply(info, nbd->index);
}
In the Linux kernel, the following vulnerability has been resolved:
fs/ntfs3: Initialize allocated memory before use
KMSAN reports: Multiple uninitialized values detected:
- KMSAN: uninit-value in ntfs_read_hdr (3)
- KMSAN: uninit-value in bcmp (3)
Memory is allocated by __getname(), which is a wrapper for
kmem_cache_alloc(). This memory is used before being properly
cleared. Change kmem_cache_alloc() to kmem_cache_zalloc() to
properly allocate and clear memory before use.
In the Linux kernel, the following vulnerability has been resolved:
ocfs2: relax BUG() to ocfs2_error() in __ocfs2_move_extent()
In '__ocfs2_move_extent()', relax 'BUG()' to 'ocfs2_error()' just
to avoid crashing the whole kernel due to a filesystem corruption.
In the Linux kernel, the following vulnerability has been resolved:
bpf: Check skb->transport_header is set in bpf_skb_check_mtu
The bpf_skb_check_mtu helper needs to use skb->transport_header when
the BPF_MTU_CHK_SEGS flag is used:
bpf_skb_check_mtu(skb, ifindex, &mtu_len, 0, BPF_MTU_CHK_SEGS)
The transport_header is not always set. There is a WARN_ON_ONCE
report when CONFIG_DEBUG_NET is enabled + skb->gso_size is set +
bpf_prog_test_run is used:
WARNING: CPU: 1 PID: 2216 at ./include/linux/skbuff.h:3071
skb_gso_validate_network_len
bpf_skb_check_mtu
bpf_prog_3920e25740a41171_tc_chk_segs_flag # A test in the next patch
bpf_test_run
bpf_prog_test_run_skb
For a normal ingress skb (not test_run), skb_reset_transport_header
is performed but there is plan to avoid setting it as described in
commit 2170a1f09148 ("net: no longer reset transport_header in __netif_receive_skb_core()").
This patch fixes the bpf helper by checking
skb_transport_header_was_set(). The check is done just before
skb->transport_header is used, to avoid breaking the existing bpf prog.
The WARN_ON_ONCE is limited to bpf_prog_test_run, so targeting bpf-next.
In the Linux kernel, the following vulnerability has been resolved:
wifi: rtl818x: rtl8187: Fix potential buffer underflow in rtl8187_rx_cb()
The rtl8187_rx_cb() calculates the rx descriptor header address
by subtracting its size from the skb tail pointer.
However, it does not validate if the received packet
(skb->len from urb->actual_length) is large enough to contain this
header.
If a truncated packet is received, this will lead to a buffer
underflow, reading memory before the start of the skb data area,
and causing a kernel panic.
Add length checks for both rtl8187 and rtl8187b descriptor headers
before attempting to access them, dropping the packet cleanly if the
check fails.
In the Linux kernel, the following vulnerability has been resolved:
erofs: limit the level of fs stacking for file-backed mounts
Otherwise, it could cause potential kernel stack overflow (e.g., EROFS
mounting itself).
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix double free of qgroup record after failure to add delayed ref head
In the previous code it was possible to incur into a double kfree()
scenario when calling add_delayed_ref_head(). This could happen if the
record was reported to already exist in the
btrfs_qgroup_trace_extent_nolock() call, but then there was an error
later on add_delayed_ref_head(). In this case, since
add_delayed_ref_head() returned an error, the caller went to free the
record. Since add_delayed_ref_head() couldn't set this kfree'd pointer
to NULL, then kfree() would have acted on a non-NULL 'record' object
which was pointing to memory already freed by the callee.
The problem comes from the fact that the responsibility to kfree the
object is on both the caller and the callee at the same time. Hence, the
fix for this is to shift the ownership of the 'qrecord' object out of
the add_delayed_ref_head(). That is, we will never attempt to kfree()
the given object inside of this function, and will expect the caller to
act on the 'qrecord' object on its own. The only exception where the
'qrecord' object cannot be kfree'd is if it was inserted into the
tracing logic, for which we already have the 'qrecord_inserted_ret'
boolean to account for this. Hence, the caller has to kfree the object
only if add_delayed_ref_head() reports not to have inserted it on the
tracing logic.
As a side-effect of the above, we must guarantee that
'qrecord_inserted_ret' is properly initialized at the start of the
function, not at the end, and then set when an actual insert
happens. This way we avoid 'qrecord_inserted_ret' having an invalid
value on an early exit.
The documentation from the add_delayed_ref_head() has also been updated
to reflect on the exact ownership of the 'qrecord' object.
In the Linux kernel, the following vulnerability has been resolved:
btrfs: fix racy bitfield write in btrfs_clear_space_info_full()
From the memory-barriers.txt document regarding memory barrier ordering
guarantees:
(*) These guarantees do not apply to bitfields, because compilers often
generate code to modify these using non-atomic read-modify-write
sequences. Do not attempt to use bitfields to synchronize parallel
algorithms.
(*) Even in cases where bitfields are protected by locks, all fields
in a given bitfield must be protected by one lock. If two fields
in a given bitfield are protected by different locks, the compiler's
non-atomic read-modify-write sequences can cause an update to one
field to corrupt the value of an adjacent field.
btrfs_space_info has a bitfield sharing an underlying word consisting of
the fields full, chunk_alloc, and flush:
struct btrfs_space_info {
struct btrfs_fs_info * fs_info; /* 0 8 */
struct btrfs_space_info * parent; /* 8 8 */
...
int clamp; /* 172 4 */
unsigned int full:1; /* 176: 0 4 */
unsigned int chunk_alloc:1; /* 176: 1 4 */
unsigned int flush:1; /* 176: 2 4 */
...
Therefore, to be safe from parallel read-modify-writes losing a write to
one of the bitfield members protected by a lock, all writes to all the
bitfields must use the lock. They almost universally do, except for
btrfs_clear_space_info_full() which iterates over the space_infos and
writes out found->full = 0 without a lock.
Imagine that we have one thread completing a transaction in which we
finished deleting a block_group and are thus calling
btrfs_clear_space_info_full() while simultaneously the data reclaim
ticket infrastructure is running do_async_reclaim_data_space():
T1 T2
btrfs_commit_transaction
btrfs_clear_space_info_full
data_sinfo->full = 0
READ: full:0, chunk_alloc:0, flush:1
do_async_reclaim_data_space(data_sinfo)
spin_lock(&space_info->lock);
if(list_empty(tickets))
space_info->flush = 0;
READ: full: 0, chunk_alloc:0, flush:1
MOD/WRITE: full: 0, chunk_alloc:0, flush:0
spin_unlock(&space_info->lock);
return;
MOD/WRITE: full:0, chunk_alloc:0, flush:1
and now data_sinfo->flush is 1 but the reclaim worker has exited. This
breaks the invariant that flush is 0 iff there is no work queued or
running. Once this invariant is violated, future allocations that go
into __reserve_bytes() will add tickets to space_info->tickets but will
see space_info->flush is set to 1 and not queue the work. After this,
they will block forever on the resulting ticket, as it is now impossible
to kick the worker again.
I also confirmed by looking at the assembly of the affected kernel that
it is doing RMW operations. For example, to set the flush (3rd) bit to 0,
the assembly is:
andb $0xfb,0x60(%rbx)
and similarly for setting the full (1st) bit to 0:
andb $0xfe,-0x20(%rax)
So I think this is really a bug on practical systems. I have observed
a number of systems in this exact state, but am currently unable to
reproduce it.
Rather than leaving this footgun lying around for the future, take
advantage of the fact that there is room in the struct anyway, and that
it is already quite large and simply change the three bitfield members to
bools. This avoids writes to space_info->full having any effect on
---truncated---
In the Linux kernel, the following vulnerability has been resolved:
iomap: allocate s_dio_done_wq for async reads as well
Since commit 222f2c7c6d14 ("iomap: always run error completions in user
context"), read error completions are deferred to s_dio_done_wq. This
means the workqueue also needs to be allocated for async reads.
In the Linux kernel, the following vulnerability has been resolved:
gfs2: Prevent recursive memory reclaim
Function new_inode() returns a new inode with inode->i_mapping->gfp_mask
set to GFP_HIGHUSER_MOVABLE. This value includes the __GFP_FS flag, so
allocations in that address space can recurse into filesystem memory
reclaim. We don't want that to happen because it can consume a
significant amount of stack memory.
Worse than that is that it can also deadlock: for example, in several
places, gfs2_unstuff_dinode() is called inside filesystem transactions.
This calls filemap_grab_folio(), which can allocate a new folio, which
can trigger memory reclaim. If memory reclaim recurses into the
filesystem and starts another transaction, a deadlock will ensue.
To fix these kinds of problems, prevent memory reclaim from recursing
into filesystem code by making sure that the gfp_mask of inode address
spaces doesn't include __GFP_FS.
The "meta" and resource group address spaces were already using GFP_NOFS
as their gfp_mask (which doesn't include __GFP_FS). The default value
of GFP_HIGHUSER_MOVABLE is less restrictive than GFP_NOFS, though. To
avoid being overly limiting, use the default value and only knock off
the __GFP_FS flag. I'm not sure if this will actually make a
difference, but it also shouldn't hurt.
This patch is loosely based on commit ad22c7a043c2 ("xfs: prevent stack
overflows from page cache allocation").
Fixes xfstest generic/273.