summaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)Author
2024-03-01fs/ntfs3: Enhance the attribute size checkEdward Lo
commit 4f082a7531223a438c757bb20e304f4c941c67a8 upstream. This combines the overflow and boundary check so that all attribute size will be properly examined while enumerating them. [ 169.181521] BUG: KASAN: slab-out-of-bounds in run_unpack+0x2e3/0x570 [ 169.183161] Read of size 1 at addr ffff8880094b6240 by task mount/247 [ 169.184046] [ 169.184925] CPU: 0 PID: 247 Comm: mount Not tainted 6.0.0-rc7+ #3 [ 169.185908] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 169.187066] Call Trace: [ 169.187492] <TASK> [ 169.188049] dump_stack_lvl+0x49/0x63 [ 169.188495] print_report.cold+0xf5/0x689 [ 169.188964] ? run_unpack+0x2e3/0x570 [ 169.189331] kasan_report+0xa7/0x130 [ 169.189714] ? run_unpack+0x2e3/0x570 [ 169.190079] __asan_load1+0x51/0x60 [ 169.190634] run_unpack+0x2e3/0x570 [ 169.191290] ? run_pack+0x840/0x840 [ 169.191569] ? run_lookup_entry+0xb3/0x1f0 [ 169.192443] ? mi_enum_attr+0x20a/0x230 [ 169.192886] run_unpack_ex+0xad/0x3e0 [ 169.193276] ? run_unpack+0x570/0x570 [ 169.193557] ? ni_load_mi+0x80/0x80 [ 169.193889] ? debug_smp_processor_id+0x17/0x20 [ 169.194236] ? mi_init+0x4a/0x70 [ 169.194496] attr_load_runs_vcn+0x166/0x1c0 [ 169.194851] ? attr_data_write_resident+0x250/0x250 [ 169.195188] mi_read+0x133/0x2c0 [ 169.195481] ntfs_iget5+0x277/0x1780 [ 169.196017] ? call_rcu+0x1c7/0x330 [ 169.196392] ? ntfs_get_block_bmap+0x70/0x70 [ 169.196708] ? evict+0x223/0x280 [ 169.197014] ? __kmalloc+0x33/0x540 [ 169.197305] ? wnd_init+0x15b/0x1b0 [ 169.197599] ntfs_fill_super+0x1026/0x1ba0 [ 169.197994] ? put_ntfs+0x1d0/0x1d0 [ 169.198299] ? vsprintf+0x20/0x20 [ 169.198583] ? mutex_unlock+0x81/0xd0 [ 169.198930] ? set_blocksize+0x95/0x150 [ 169.199269] get_tree_bdev+0x232/0x370 [ 169.199750] ? put_ntfs+0x1d0/0x1d0 [ 169.200094] ntfs_fs_get_tree+0x15/0x20 [ 169.200431] vfs_get_tree+0x4c/0x130 [ 169.200714] path_mount+0x654/0xfe0 [ 169.201067] ? putname+0x80/0xa0 [ 169.201358] ? finish_automount+0x2e0/0x2e0 [ 169.201965] ? putname+0x80/0xa0 [ 169.202445] ? kmem_cache_free+0x1c4/0x440 [ 169.203075] ? putname+0x80/0xa0 [ 169.203414] do_mount+0xd6/0xf0 [ 169.203719] ? path_mount+0xfe0/0xfe0 [ 169.203977] ? __kasan_check_write+0x14/0x20 [ 169.204382] __x64_sys_mount+0xca/0x110 [ 169.204711] do_syscall_64+0x3b/0x90 [ 169.205059] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 169.205571] RIP: 0033:0x7f67a80e948a [ 169.206327] Code: 48 8b 0d 11 fa 2a 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 49 89 ca b8 a5 00 00 008 [ 169.208296] RSP: 002b:00007ffddf020f58 EFLAGS: 00000202 ORIG_RAX: 00000000000000a5 [ 169.209253] RAX: ffffffffffffffda RBX: 000055e2547a6060 RCX: 00007f67a80e948a [ 169.209777] RDX: 000055e2547a6260 RSI: 000055e2547a62e0 RDI: 000055e2547aeaf0 [ 169.210342] RBP: 0000000000000000 R08: 000055e2547a6280 R09: 0000000000000020 [ 169.210843] R10: 00000000c0ed0000 R11: 0000000000000202 R12: 000055e2547aeaf0 [ 169.211307] R13: 000055e2547a6260 R14: 0000000000000000 R15: 00000000ffffffff [ 169.211913] </TASK> [ 169.212304] [ 169.212680] Allocated by task 0: [ 169.212963] (stack is not available) [ 169.213200] [ 169.213472] The buggy address belongs to the object at ffff8880094b5e00 [ 169.213472] which belongs to the cache UDP of size 1152 [ 169.214095] The buggy address is located 1088 bytes inside of [ 169.214095] 1152-byte region [ffff8880094b5e00, ffff8880094b6280) [ 169.214639] [ 169.215004] The buggy address belongs to the physical page: [ 169.215766] page:000000002e324c8c refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x94b4 [ 169.218412] head:000000002e324c8c order:2 compound_mapcount:0 compound_pincount:0 [ 169.219078] flags: 0xfffffc0010200(slab|head|node=0|zone=1|lastcpupid=0x1fffff) [ 169.220272] raw: 000fffffc0010200 0000000000000000 dead000000000122 ffff888002409b40 [ 169.221006] raw: 0000000000000000 00000000800c000c 00000001ffffffff 0000000000000000 [ 169.222320] page dumped because: kasan: bad access detected [ 169.222922] [ 169.223119] Memory state around the buggy address: [ 169.224056] ffff8880094b6100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 169.224908] ffff8880094b6180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 169.225677] >ffff8880094b6200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 169.226445] ^ [ 169.227055] ffff8880094b6280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc [ 169.227638] ffff8880094b6300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb Signed-off-by: Edward Lo <edward.lo@ambergroup.io> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: "Doebel, Bjoern" <doebel@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01fs/aio: Restrict kiocb_set_cancel_fn() to I/O submitted via libaioBart Van Assche
commit b820de741ae48ccf50dd95e297889c286ff4f760 upstream. If kiocb_set_cancel_fn() is called for I/O submitted via io_uring, the following kernel warning appears: WARNING: CPU: 3 PID: 368 at fs/aio.c:598 kiocb_set_cancel_fn+0x9c/0xa8 Call trace: kiocb_set_cancel_fn+0x9c/0xa8 ffs_epfile_read_iter+0x144/0x1d0 io_read+0x19c/0x498 io_issue_sqe+0x118/0x27c io_submit_sqes+0x25c/0x5fc __arm64_sys_io_uring_enter+0x104/0xab0 invoke_syscall+0x58/0x11c el0_svc_common+0xb4/0xf4 do_el0_svc+0x2c/0xb0 el0_svc+0x2c/0xa4 el0t_64_sync_handler+0x68/0xb4 el0t_64_sync+0x1a4/0x1a8 Fix this by setting the IOCB_AIO_RW flag for read and write I/O that is submitted by libaio. Suggested-by: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: Avi Kivity <avi@scylladb.com> Cc: Sandeep Dhavale <dhavale@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: stable@vger.kernel.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20240215204739.2677806-2-bvanassche@acm.org Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01erofs: fix inconsistent per-file compression formatGao Xiang
commit 118a8cf504d7dfa519562d000f423ee3ca75d2c4 upstream. EROFS can select compression algorithms on a per-file basis, and each per-file compression algorithm needs to be marked in the on-disk superblock for initialization. However, syzkaller can generate inconsistent crafted images that use an unsupported algorithmtype for specific inodes, e.g. use MicroLZMA algorithmtype even it's not set in `sbi->available_compr_algs`. This can lead to an unexpected "BUG: kernel NULL pointer dereference" if the corresponding decompressor isn't built-in. Fix this by checking against `sbi->available_compr_algs` for each m_algorithmformat request. Incorrect !erofs_sb_has_compr_cfgs preset bitmap is now fixed together since it was harmless previously. Reported-by: <bugreport@ubisectech.com> Fixes: 8f89926290c4 ("erofs: get compression algorithms directly on mapping") Fixes: 622ceaddb764 ("erofs: lzma compression support") Reviewed-by: Yue Hu <huyue2@coolpad.com> Link: https://lore.kernel.org/r/20240113150602.1471050-1-hsiangkao@linux.alibaba.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01erofs: simplify compression configuration parserGao Xiang
commit efb4fb02cef3ab410b603c8f0e1c67f61d55f542 upstream. Move erofs_load_compr_cfgs() into decompressor.c as well as introduce a callback instead of a hard-coded switch for each algorithm for simplicity. Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Link: https://lore.kernel.org/r/20231022130957.11398-1-xiang@kernel.org Stable-dep-of: 118a8cf504d7 ("erofs: fix inconsistent per-file compression format") Signed-off-by: Yue Hu <huyue2@coolpad.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01afs: Increase buffer size in afs_update_volume_status()Daniil Dulov
[ Upstream commit 6ea38e2aeb72349cad50e38899b0ba6fbcb2af3d ] The max length of volume->vid value is 20 characters. So increase idbuf[] size up to 24 to avoid overflow. Found by Linux Verification Center (linuxtesting.org) with SVACE. [DH: Actually, it's 20 + NUL, so increase it to 24 and use snprintf()] Fixes: d2ddc776a458 ("afs: Overhaul volume and server record caching and fileserver rotation") Signed-off-by: Daniil Dulov <d.dulov@aladdin.ru> Signed-off-by: David Howells <dhowells@redhat.com> Link: https://lore.kernel.org/r/20240211150442.3416-1-d.dulov@aladdin.ru/ # v1 Link: https://lore.kernel.org/r/20240212083347.10742-1-d.dulov@aladdin.ru/ # v2 Link: https://lore.kernel.org/r/20240219143906.138346-3-dhowells@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01erofs: fix refcount on the metabuf used for inode lookupSandeep Dhavale
commit 56ee7db31187dc36d501622cb5f1415e88e01c2a upstream. In erofs_find_target_block() when erofs_dirnamecmp() returns 0, we do not assign the target metabuf. This causes the caller erofs_namei()'s erofs_put_metabuf() at the end to be not effective leaving the refcount on the page. As the page from metabuf (buf->page) is never put, such page cannot be migrated or reclaimed. Fix it now by putting the metabuf from previous loop and assigning the current metabuf to target before returning so caller erofs_namei() can do the final put as it was intended. Fixes: 500edd095648 ("erofs: use meta buffers for inode lookup") Cc: <stable@vger.kernel.org> # 5.18+ Signed-off-by: Sandeep Dhavale <dhavale@google.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Chao Yu <chao@kernel.org> Link: https://lore.kernel.org/r/20240221210348.3667795-1-dhavale@google.com Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01cachefiles: fix memory leak in cachefiles_add_cache()Baokun Li
commit e21a2f17566cbd64926fb8f16323972f7a064444 upstream. The following memory leak was reported after unbinding /dev/cachefiles: ================================================================== unreferenced object 0xffff9b674176e3c0 (size 192): comm "cachefilesd2", pid 680, jiffies 4294881224 hex dump (first 32 bytes): 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace (crc ea38a44b): [<ffffffff8eb8a1a5>] kmem_cache_alloc+0x2d5/0x370 [<ffffffff8e917f86>] prepare_creds+0x26/0x2e0 [<ffffffffc002eeef>] cachefiles_determine_cache_security+0x1f/0x120 [<ffffffffc00243ec>] cachefiles_add_cache+0x13c/0x3a0 [<ffffffffc0025216>] cachefiles_daemon_write+0x146/0x1c0 [<ffffffff8ebc4a3b>] vfs_write+0xcb/0x520 [<ffffffff8ebc5069>] ksys_write+0x69/0xf0 [<ffffffff8f6d4662>] do_syscall_64+0x72/0x140 [<ffffffff8f8000aa>] entry_SYSCALL_64_after_hwframe+0x6e/0x76 ================================================================== Put the reference count of cache_cred in cachefiles_daemon_unbind() to fix the problem. And also put cache_cred in cachefiles_add_cache() error branch to avoid memory leaks. Fixes: 9ae326a69004 ("CacheFiles: A cache that backs onto a mounted filesystem") CC: stable@vger.kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://lore.kernel.org/r/20240217081431.796809-1-libaokun1@huawei.com Acked-by: David Howells <dhowells@redhat.com> Reviewed-by: Jingbo Xu <jefflexu@linux.alibaba.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-03-01smb3: clarify mount warningSteve French
[ Upstream commit a5cc98eba2592d6e3c5a4351319595ddde2a5901 ] When a user tries to use the "sec=krb5p" mount parameter to encrypt data on connection to a server (when authenticating with Kerberos), we indicate that it is not supported, but do not note the equivalent recommended mount parameter ("sec=krb5,seal") which turns on encryption for that mount (and uses Kerberos for auth). Update the warning message. Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01smb: client: set correct d_type for reparse points under DFS mountsPaulo Alcantara
[ Upstream commit 55c7788c37242702868bfac7861cdf0c358d6c3d ] Send query dir requests with an info level of SMB_FIND_FILE_FULL_DIRECTORY_INFO rather than SMB_FIND_FILE_DIRECTORY_INFO when the client is generating its own inode numbers (e.g. noserverino) so that reparse tags still can be parsed directly from the responses, but server won't send UniqueId (server inode number) Signed-off-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Fix oob in ntfs_listxattrEdward Adam Davis
[ Upstream commit 731ab1f9828800df871c5a7ab9ffe965317d3f15 ] The length of name cannot exceed the space occupied by ea. Reported-and-tested-by: syzbot+65e940cfb8f99a97aca7@syzkaller.appspotmail.com Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Update inode->i_size after success write into compressed fileKonstantin Komarov
[ Upstream commit d68968440b1a75dee05cfac7f368f1aa139e1911 ] Reported-by: Giovanni Santini <giovannisantini93@yahoo.it> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Correct function is_rst_area_validKonstantin Komarov
[ Upstream commit 1b7dd28e14c4728ae1a815605ca33ffb4ce1b309 ] Reported-by: Robert Morris <rtm@csail.mit.edu> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Prevent generic message "attempt to access beyond end of device"Konstantin Komarov
[ Upstream commit 5ca87d01eba7bdfe9536a157ca33c1455bb8d16c ] It used in test environment. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: use non-movable memory for ntfs3 MFT buffer cacheIsm Hong
[ Upstream commit d6d33f03baa43d763fe094ca926eeae7d3421d07 ] Since the buffer cache for ntfs3 metadata is not released until the file system is unmounted, allocating from the movable zone may result in cma allocation failures. This is due to the page still being used by ntfs3, leading to migration failures. To address this, this commit use sb_bread_umovable() instead of sb_bread(). This change prevents allocation from the movable zone, ensuring compatibility with scenarios where the buffer head is not released until unmount. This patch is inspired by commit a8ac900b8163("ext4: use non-movable memory for the ext4 superblock"). The issue is found when playing video files stored in NTFS on the Android TV platform. During this process, the media parser reads the video file, causing ntfs3 to allocate buffer cache from the CMA area. Subsequently, the hardware decoder attempts to allocate memory from the same CMA area. However, the page is still in use by ntfs3, resulting in a migrate failure in alloc_contig_range(). The pinned page and allocating stacktrace reported by page owner shows below: page:ffffffff00b68880 refcount:3 mapcount:0 mapping:ffffff80046aa828 index:0xc0040 pfn:0x20fa4 aops:def_blk_aops ino:0 flags: 0x2020(active|private) page dumped because: migration failure page last allocated via order 0, migratetype Movable, gfp_mask 0x108c48 (GFP_NOFS|__GFP_NOFAIL|__GFP_HARDWALL|__GFP_MOVABLE), page_owner tracks the page as allocated prep_new_page get_page_from_freelist __alloc_pages_nodemask pagecache_get_page __getblk_gfp __bread_gfp ntfs_read_run_nb ntfs_read_bh mi_read ntfs_iget5 dir_search_u ntfs_lookup __lookup_slow lookup_slow walk_component path_lookupat Signed-off-by: Ism Hong <ism.hong@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Disable ATTR_LIST_ENTRY size checkKonstantin Komarov
[ Upstream commit 4cdfb6e7bc9c80142d33bf1d4653a73fa678ba56 ] The use of sizeof(struct ATTR_LIST_ENTRY) has been replaced with le_size(0) due to alignment peculiarities on different platforms. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202312071005.g6YrbaIe-lkp@intel.com/ Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Add NULL ptr dereference checking at the end of attr_allocate_frame()Konstantin Komarov
[ Upstream commit aaab47f204aaf47838241d57bf8662c8840de60a ] It is preferable to exit through the out: label because internal debugging functions are located there. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Fix detected field-spanning write (size 8) of single field "le->name"Konstantin Komarov
[ Upstream commit d155617006ebc172a80d3eb013c4b867f9a8ada4 ] Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Print warning while fixing hard links countKonstantin Komarov
[ Upstream commit 85ba2a75faee759809a7e43b4c103ac59bac1026 ] Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Correct hard links updating when dealing with DOS namesKonstantin Komarov
[ Upstream commit 1918c10e137eae266b8eb0ab1cc14421dcb0e3e2 ] Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Improve ntfs_dir_countKonstantin Komarov
[ Upstream commit 6a799c928b78b14999b7705c4cca0f88e297fe96 ] Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01fs/ntfs3: Modified fix directory element type detectionKonstantin Komarov
[ Upstream commit 22457c047ed971f2f2e33be593ddfabd9639a409 ] Unfortunately reparse attribute is used for many purposes (several dozens). It is not possible here to know is this name symlink or not. To get exactly the type of name we should to open inode (read mft). getattr for opened file (fstat) correctly returns symlink. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01ext4: correct the hole length returned by ext4_map_blocks()Zhang Yi
[ Upstream commit 6430dea07e85958fa87d0276c0c4388dd51e630b ] In ext4_map_blocks(), if we can't find a range of mapping in the extents cache, we are calling ext4_ext_map_blocks() to search the real path and ext4_ext_determine_hole() to determine the hole range. But if the querying range was partially or completely overlaped by a delalloc extent, we can't find it in the real extent path, so the returned hole length could be incorrect. Fortunately, ext4_ext_put_gap_in_cache() have already handle delalloc extent, but it searches start from the expanded hole_start, doesn't start from the querying range, so the delalloc extent found could not be the one that overlaped the querying range, plus, it also didn't adjust the hole length. Let's just remove ext4_ext_put_gap_in_cache(), handle delalloc and insert adjusted hole extent in ext4_ext_determine_hole(). Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240127015825.1608160-4-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01smb: client: increase number of PDUs allowed in a compound requestPaulo Alcantara
[ Upstream commit 11d4d1dba3315f73d2d1d386f5bf4811a8241d45 ] With the introduction of SMB2_OP_QUERY_WSL_EA, the client may now send 5 commands in a single compound request in order to query xattrs from potential WSL reparse points, which should be fine as we currently allow up to 5 PDUs in a single compound request. However, if encryption is enabled (e.g. 'seal' mount option) or enforced by the server, current MAX_COMPOUND(5) won't be enough as we require an extra PDU for the transform header. Fix this by increasing MAX_COMPOUND to 7 and, while we're at it, add an WARN_ON_ONCE() and return -EIO instead of -ENOMEM in case we attempt to send a compound request that couldn't include the extra transform header. Signed-off-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01ext4: avoid allocating blocks from corrupted group in ext4_mb_find_by_goal()Baokun Li
[ Upstream commit 832698373a25950942c04a512daa652c18a9b513 ] Places the logic for checking if the group's block bitmap is corrupt under the protection of the group lock to avoid allocating blocks from the group with a corrupted block bitmap. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-8-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01ext4: avoid allocating blocks from corrupted group in ext4_mb_try_best_found()Baokun Li
[ Upstream commit 4530b3660d396a646aad91a787b6ab37cf604b53 ] Determine if the group block bitmap is corrupted before using ac_b_ex in ext4_mb_try_best_found() to avoid allocating blocks from a group with a corrupted block bitmap in the following concurrency and making the situation worse. ext4_mb_regular_allocator ext4_lock_group(sb, group) ext4_mb_good_group // check if the group bbitmap is corrupted ext4_mb_complex_scan_group // Scan group gets ac_b_ex but doesn't use it ext4_unlock_group(sb, group) ext4_mark_group_bitmap_corrupted(group) // The block bitmap was corrupted during // the group unlock gap. ext4_mb_try_best_found ext4_lock_group(ac->ac_sb, group) ext4_mb_use_best_found mb_mark_used // Allocating blocks in block bitmap corrupted group Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-7-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01ext4: avoid dividing by 0 in mb_update_avg_fragment_size() when block bitmap ↵Baokun Li
corrupt [ Upstream commit 993bf0f4c393b3667830918f9247438a8f6fdb5b ] Determine if bb_fragments is 0 instead of determining bb_free to eliminate the risk of dividing by zero when the block bitmap is corrupted. Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-6-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01cifs: translate network errors on send to -ECONNABORTEDShyam Prasad N
[ Upstream commit a68106a6928e0a6680f12bcc7338c0dddcfe4d11 ] When the network stack returns various errors, we today bubble up the error to the user (in case of soft mounts). This change translates all network errors except -EINTR and -EAGAIN to -ECONNABORTED. A similar approach is taken when we receive network errors when reading from the socket. The change also forces the cifsd thread to reconnect during it's next activity. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01smb: Work around Clang __bdos() type confusionKees Cook
[ Upstream commit 8deb05c84b63b4fdb8549e08942867a68924a5b8 ] Recent versions of Clang gets confused about the possible size of the "user" allocation, and CONFIG_FORTIFY_SOURCE ends up emitting a warning[1]: repro.c:126:4: warning: call to '__write_overflow_field' declared with 'warning' attribute: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Wattribute-warning] 126 | __write_overflow_field(p_size_field, size); | ^ for this memset(): int len; __le16 *user; ... len = ses->user_name ? strlen(ses->user_name) : 0; user = kmalloc(2 + (len * 2), GFP_KERNEL); ... if (len) { ... } else { memset(user, '\0', 2); } While Clang works on this bug[2], switch to using a direct assignment, which avoids memset() entirely which both simplifies the code and silences the false positive warning. (Making "len" size_t also silences the warning, but the direct assignment seems better.) Reported-by: Nathan Chancellor <nathan@kernel.org> Closes: https://github.com/ClangBuiltLinux/linux/issues/1966 [1] Link: https://github.com/llvm/llvm-project/issues/77813 [2] Cc: Steve French <sfrench@samba.org> Cc: Paulo Alcantara <pc@manguebit.com> Cc: Ronnie Sahlberg <ronniesahlberg@gmail.com> Cc: Shyam Prasad N <sprasad@microsoft.com> Cc: Tom Talpey <tom@talpey.com> Cc: linux-cifs@vger.kernel.org Cc: llvm@lists.linux.dev Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-03-01cifs: open_cached_dir should not rely on primary channelShyam Prasad N
[ Upstream commit 936eba9cfb5cfbf6a2c762cd163605f2b784e03e ] open_cached_dir today selects ses->server a.k.a primary channel to send requests. When multichannel is used, the primary channel maybe down. So it does not make sense to rely only on that channel. This fix makes this function pick a channel with the standard helper function cifs_pick_channel. Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23fs/ntfs3: Add null pointer checksKonstantin Komarov
commit fc4992458e0aa2d2e82a25c922e6ac36c2d91083 upstream. Added null pointer checks in function ntfs_security_init. Also added le32_to_cpu in functions ntfs_security_init and indx_read. Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com> Cc: "Doebel, Bjoern" <doebel@amazon.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nilfs2: replace WARN_ONs for invalid DAT metadata block requestsRyusuke Konishi
commit 5124a0a549857c4b87173280e192eea24dea72ad upstream. If DAT metadata file block access fails due to corruption of the DAT file or abnormal virtual block numbers held by b-trees or inodes, a kernel warning is generated. This replaces the WARN_ONs by error output, so that a kernel, booted with panic_on_warn, does not panic. This patch also replaces the detected return code -ENOENT with another internal code -EINVAL to notify the bmap layer of metadata corruption. When the bmap layer sees -EINVAL, it handles the abnormal situation with nilfs_bmap_convert_error() and finally returns code -EIO as it should. Link: https://lkml.kernel.org/r/0000000000005cc3d205ea23ddcf@google.com Link: https://lkml.kernel.org/r/20230126164114.6911-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: <syzbot+5d5d25f90f195a3cfcb4@syzkaller.appspotmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nilfs2: fix potential bug in end_buffer_async_writeRyusuke Konishi
commit 5bc09b397cbf1221f8a8aacb1152650c9195b02b upstream. According to a syzbot report, end_buffer_async_write(), which handles the completion of block device writes, may detect abnormal condition of the buffer async_write flag and cause a BUG_ON failure when using nilfs2. Nilfs2 itself does not use end_buffer_async_write(). But, the async_write flag is now used as a marker by commit 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") as a means of resolving double list insertion of dirty blocks in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the resulting crash. This modification is safe as long as it is used for file data and b-tree node blocks where the page caches are independent. However, it was irrelevant and redundant to also introduce async_write for segment summary and super root blocks that share buffers with the backing device. This led to the possibility that the BUG_ON check in end_buffer_async_write would fail as described above, if independent writebacks of the backing device occurred in parallel. The use of async_write for segment summary buffers has already been removed in a previous change. Fix this issue by removing the manipulation of the async_write flag for the remaining super root block buffer. Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com Fixes: 7f42ec394156 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23smb: client: fix parsing of SMB3.1.1 POSIX create contextPaulo Alcantara
commit 76025cc2285d9ede3d717fe4305d66f8be2d9346 upstream. The data offset for the SMB3.1.1 POSIX create context will always be 8-byte aligned so having the check 'noff + nlen >= doff' in smb2_parse_contexts() is wrong as it will lead to -EINVAL because noff + nlen == doff. Fix the sanity check to correctly handle aligned create context data. Fixes: af1689a9b770 ("smb: client: fix potential OOBs in smb2_parse_contexts()") Signed-off-by: Paulo Alcantara <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Guruswamy Basavaiah <guruswamy.basavaiah@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23smb: client: fix potential OOBs in smb2_parse_contexts()Paulo Alcantara
commit af1689a9b7701d9907dfc84d2a4b57c4bc907144 upstream. Validate offsets and lengths before dereferencing create contexts in smb2_parse_contexts(). This fixes following oops when accessing invalid create contexts from server: BUG: unable to handle page fault for address: ffff8881178d8cc3 #PF: supervisor read access in kernel mode #PF: error_code(0x0000) - not-present page PGD 4a01067 P4D 4a01067 PUD 0 Oops: 0000 [#1] PREEMPT SMP NOPTI CPU: 3 PID: 1736 Comm: mount.cifs Not tainted 6.7.0-rc4 #1 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.2-3-gd478f380-rebuilt.opensuse.org 04/01/2014 RIP: 0010:smb2_parse_contexts+0xa0/0x3a0 [cifs] Code: f8 10 75 13 48 b8 93 ad 25 50 9c b4 11 e7 49 39 06 0f 84 d2 00 00 00 8b 45 00 85 c0 74 61 41 29 c5 48 01 c5 41 83 fd 0f 76 55 <0f> b7 7d 04 0f b7 45 06 4c 8d 74 3d 00 66 83 f8 04 75 bc ba 04 00 RSP: 0018:ffffc900007939e0 EFLAGS: 00010216 RAX: ffffc90000793c78 RBX: ffff8880180cc000 RCX: ffffc90000793c90 RDX: ffffc90000793cc0 RSI: ffff8880178d8cc0 RDI: ffff8880180cc000 RBP: ffff8881178d8cbf R08: ffffc90000793c22 R09: 0000000000000000 R10: ffff8880180cc000 R11: 0000000000000024 R12: 0000000000000000 R13: 0000000000000020 R14: 0000000000000000 R15: ffffc90000793c22 FS: 00007f873753cbc0(0000) GS:ffff88806bc00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: ffff8881178d8cc3 CR3: 00000000181ca000 CR4: 0000000000750ef0 PKRU: 55555554 Call Trace: <TASK> ? __die+0x23/0x70 ? page_fault_oops+0x181/0x480 ? search_module_extables+0x19/0x60 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x1b6/0x1c0 ? asm_exc_page_fault+0x26/0x30 ? smb2_parse_contexts+0xa0/0x3a0 [cifs] SMB2_open+0x38d/0x5f0 [cifs] ? smb2_is_path_accessible+0x138/0x260 [cifs] smb2_is_path_accessible+0x138/0x260 [cifs] cifs_is_path_remote+0x8d/0x230 [cifs] cifs_mount+0x7e/0x350 [cifs] cifs_smb3_do_mount+0x128/0x780 [cifs] smb3_get_tree+0xd9/0x290 [cifs] vfs_get_tree+0x2c/0x100 ? capable+0x37/0x70 path_mount+0x2d7/0xb80 ? srso_alias_return_thunk+0x5/0xfbef5 ? _raw_spin_unlock_irqrestore+0x44/0x60 __x64_sys_mount+0x11a/0x150 do_syscall_64+0x47/0xf0 entry_SYSCALL_64_after_hwframe+0x6f/0x77 RIP: 0033:0x7f8737657b1e Reported-by: Robert Morris <rtm@csail.mit.edu> Cc: stable@vger.kernel.org Signed-off-by: Paulo Alcantara (SUSE) <pc@manguebit.com> Signed-off-by: Steve French <stfrench@microsoft.com> [Guru: Modified the patch to be applicable to the cached_dir.c file.] Signed-off-by: Guruswamy Basavaiah <guruswamy.basavaiah@broadcom.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nfsd: don't take fi_lock in nfsd_break_deleg_cb()NeilBrown
[ Upstream commit 5ea9a7c5fe4149f165f0e3b624fe08df02b6c301 ] A recent change to check_for_locks() changed it to take ->flc_lock while holding ->fi_lock. This creates a lock inversion (reported by lockdep) because there is a case where ->fi_lock is taken while holding ->flc_lock. ->flc_lock is held across ->fl_lmops callbacks, and nfsd_break_deleg_cb() is one of those and does take ->fi_lock. However it doesn't need to. Prior to v4.17-rc1~110^2~22 ("nfsd: create a separate lease for each delegation") nfsd_break_deleg_cb() would walk the ->fi_delegations list and so needed the lock. Since then it doesn't walk the list and doesn't need the lock. Two actions are performed under the lock. One is to call nfsd_break_one_deleg which calls nfsd4_run_cb(). These doesn't act on the nfs4_file at all, so don't need the lock. The other is to set ->fi_had_conflict which is in the nfs4_file. This field is only ever set here (except when initialised to false) so there is no possible problem will multiple threads racing when setting it. The field is tested twice in nfs4_set_delegation(). The first test does not hold a lock and is documented as an opportunistic optimisation, so it doesn't impose any need to hold ->fi_lock while setting ->fi_had_conflict. The second test in nfs4_set_delegation() *is* make under ->fi_lock, so removing the locking when ->fi_had_conflict is set could make a change. The change could only be interesting if ->fi_had_conflict tested as false even though nfsd_break_one_deleg() ran before ->fi_lock was unlocked. i.e. while hash_delegation_locked() was running. As hash_delegation_lock() doesn't interact in any way with nfs4_run_cb() there can be no importance to this interaction. So this patch removes the locking from nfsd_break_one_deleg() and moves the final test on ->fi_had_conflict out of the locked region to make it clear that locking isn't important to the test. It is still tested *after* vfs_setlease() has succeeded. This might be significant and as vfs_setlease() takes ->flc_lock, and nfsd_break_one_deleg() is called under ->flc_lock this "after" is a true ordering provided by a spinlock. Fixes: edcf9725150e ("nfsd: fix RELEASE_LOCKOWNER") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nfsd: fix RELEASE_LOCKOWNERNeilBrown
[ Upstream commit edcf9725150e42beeca42d085149f4c88fa97afd ] The test on so_count in nfsd4_release_lockowner() is nonsense and harmful. Revert to using check_for_locks(), changing that to not sleep. First: harmful. As is documented in the kdoc comment for nfsd4_release_lockowner(), the test on so_count can transiently return a false positive resulting in a return of NFS4ERR_LOCKS_HELD when in fact no locks are held. This is clearly a protocol violation and with the Linux NFS client it can cause incorrect behaviour. If RELEASE_LOCKOWNER is sent while some other thread is still processing a LOCK request which failed because, at the time that request was received, the given owner held a conflicting lock, then the nfsd thread processing that LOCK request can hold a reference (conflock) to the lock owner that causes nfsd4_release_lockowner() to return an incorrect error. The Linux NFS client ignores that NFS4ERR_LOCKS_HELD error because it never sends NFS4_RELEASE_LOCKOWNER without first releasing any locks, so it knows that the error is impossible. It assumes the lock owner was in fact released so it feels free to use the same lock owner identifier in some later locking request. When it does reuse a lock owner identifier for which a previous RELEASE failed, it will naturally use a lock_seqid of zero. However the server, which didn't release the lock owner, will expect a larger lock_seqid and so will respond with NFS4ERR_BAD_SEQID. So clearly it is harmful to allow a false positive, which testing so_count allows. The test is nonsense because ... well... it doesn't mean anything. so_count is the sum of three different counts. 1/ the set of states listed on so_stateids 2/ the set of active vfs locks owned by any of those states 3/ various transient counts such as for conflicting locks. When it is tested against '2' it is clear that one of these is the transient reference obtained by find_lockowner_str_locked(). It is not clear what the other one is expected to be. In practice, the count is often 2 because there is precisely one state on so_stateids. If there were more, this would fail. In my testing I see two circumstances when RELEASE_LOCKOWNER is called. In one case, CLOSE is called before RELEASE_LOCKOWNER. That results in all the lock states being removed, and so the lockowner being discarded (it is removed when there are no more references which usually happens when the lock state is discarded). When nfsd4_release_lockowner() finds that the lock owner doesn't exist, it returns success. The other case shows an so_count of '2' and precisely one state listed in so_stateid. It appears that the Linux client uses a separate lock owner for each file resulting in one lock state per lock owner, so this test on '2' is safe. For another client it might not be safe. So this patch changes check_for_locks() to use the (newish) find_any_file_locked() so that it doesn't take a reference on the nfs4_file and so never calls nfsd_file_put(), and so never sleeps. With this check is it safe to restore the use of check_for_locks() rather than testing so_count against the mysterious '2'. Fixes: ce3c4ad7f4ce ("NFSD: Fix possible sleep during nfsd4_release_lockowner()") Signed-off-by: NeilBrown <neilb@suse.de> Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org # v6.2+ Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23fs/proc: do_task_stat: move thread_group_cputime_adjusted() outside of ↵Oleg Nesterov
lock_task_sighand() commit 60f92acb60a989b14e4b744501a0df0f82ef30a3 upstream. Patch series "fs/proc: do_task_stat: use sig->stats_". do_task_stat() has the same problem as getrusage() had before "getrusage: use sig->stats_lock rather than lock_task_sighand()": a hard lockup. If NR_CPUS threads call lock_task_sighand() at the same time and the process has NR_THREADS, spin_lock_irq will spin with irqs disabled O(NR_CPUS * NR_THREADS) time. This patch (of 3): thread_group_cputime() does its own locking, we can safely shift thread_group_cputime_adjusted() which does another for_each_thread loop outside of ->siglock protected section. Not only this removes for_each_thread() from the critical section with irqs disabled, this removes another case when stats_lock is taken with siglock held. We want to remove this dependency, then we can change the users of stats_lock to not disable irqs. Link: https://lkml.kernel.org/r/20240123153313.GA21832@redhat.com Link: https://lkml.kernel.org/r/20240123153355.GA21854@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Dylan Hatch <dylanbhatch@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23mm: hugetlb pages should not be reserved by shmat() if SHM_NORESERVEPrakash Sangappa
commit e656c7a9e59607d1672d85ffa9a89031876ffe67 upstream. For shared memory of type SHM_HUGETLB, hugetlb pages are reserved in shmget() call. If SHM_NORESERVE flags is specified then the hugetlb pages are not reserved. However when the shared memory is attached with the shmat() call the hugetlb pages are getting reserved incorrectly for SHM_HUGETLB shared memory created with SHM_NORESERVE which is a bug. ------------------------------- Following test shows the issue. $cat shmhtb.c int main() { int shmflags = 0660 | IPC_CREAT | SHM_HUGETLB | SHM_NORESERVE; int shmid; shmid = shmget(SKEY, SHMSZ, shmflags); if (shmid < 0) { printf("shmat: shmget() failed, %d\n", errno); return 1; } printf("After shmget()\n"); system("cat /proc/meminfo | grep -i hugepages_"); shmat(shmid, NULL, 0); printf("\nAfter shmat()\n"); system("cat /proc/meminfo | grep -i hugepages_"); shmctl(shmid, IPC_RMID, NULL); return 0; } #sysctl -w vm.nr_hugepages=20 #./shmhtb After shmget() HugePages_Total: 20 HugePages_Free: 20 HugePages_Rsvd: 0 HugePages_Surp: 0 After shmat() HugePages_Total: 20 HugePages_Free: 20 HugePages_Rsvd: 5 <-- HugePages_Surp: 0 -------------------------------- Fix is to ensure that hugetlb pages are not reserved for SHM_HUGETLB shared memory in the shmat() call. Link: https://lkml.kernel.org/r/1706040282-12388-1-git-send-email-prakash.sangappa@oracle.com Signed-off-by: Prakash Sangappa <prakash.sangappa@oracle.com> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23fs,hugetlb: fix NULL pointer dereference in hugetlbs_fill_superOscar Salvador
commit 79d72c68c58784a3e1cd2378669d51bfd0cb7498 upstream. When configuring a hugetlb filesystem via the fsconfig() syscall, there is a possible NULL dereference in hugetlbfs_fill_super() caused by assigning NULL to ctx->hstate in hugetlbfs_parse_param() when the requested pagesize is non valid. E.g: Taking the following steps: fd = fsopen("hugetlbfs", FSOPEN_CLOEXEC); fsconfig(fd, FSCONFIG_SET_STRING, "pagesize", "1024", 0); fsconfig(fd, FSCONFIG_CMD_CREATE, NULL, NULL, 0); Given that the requested "pagesize" is invalid, ctxt->hstate will be replaced with NULL, losing its previous value, and we will print an error: ... ... case Opt_pagesize: ps = memparse(param->string, &rest); ctx->hstate = h; if (!ctx->hstate) { pr_err("Unsupported page size %lu MB\n", ps / SZ_1M); return -EINVAL; } return 0; ... ... This is a problem because later on, we will dereference ctxt->hstate in hugetlbfs_fill_super() ... ... sb->s_blocksize = huge_page_size(ctx->hstate); ... ... Causing below Oops. Fix this by replacing cxt->hstate value only when then pagesize is known to be valid. kernel: hugetlbfs: Unsupported page size 0 MB kernel: BUG: kernel NULL pointer dereference, address: 0000000000000028 kernel: #PF: supervisor read access in kernel mode kernel: #PF: error_code(0x0000) - not-present page kernel: PGD 800000010f66c067 P4D 800000010f66c067 PUD 1b22f8067 PMD 0 kernel: Oops: 0000 [#1] PREEMPT SMP PTI kernel: CPU: 4 PID: 5659 Comm: syscall Tainted: G E 6.8.0-rc2-default+ #22 5a47c3fef76212addcc6eb71344aabc35190ae8f kernel: Hardware name: Intel Corp. GROVEPORT/GROVEPORT, BIOS GVPRCRB1.86B.0016.D04.1705030402 05/03/2017 kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 kernel: Call Trace: kernel: <TASK> kernel: ? __die_body+0x1a/0x60 kernel: ? page_fault_oops+0x16f/0x4a0 kernel: ? search_bpf_extables+0x65/0x70 kernel: ? fixup_exception+0x22/0x310 kernel: ? exc_page_fault+0x69/0x150 kernel: ? asm_exc_page_fault+0x22/0x30 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: ? hugetlbfs_fill_super+0xb4/0x1a0 kernel: ? hugetlbfs_fill_super+0x28/0x1a0 kernel: ? __pfx_hugetlbfs_fill_super+0x10/0x10 kernel: vfs_get_super+0x40/0xa0 kernel: ? __pfx_bpf_lsm_capable+0x10/0x10 kernel: vfs_get_tree+0x25/0xd0 kernel: vfs_cmd_create+0x64/0xe0 kernel: __x64_sys_fsconfig+0x395/0x410 kernel: do_syscall_64+0x80/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? syscall_exit_to_user_mode+0x82/0x240 kernel: ? do_syscall_64+0x8d/0x160 kernel: ? exc_page_fault+0x69/0x150 kernel: entry_SYSCALL_64_after_hwframe+0x6e/0x76 kernel: RIP: 0033:0x7ffbc0cb87c9 kernel: Code: 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 97 96 0d 00 f7 d8 64 89 01 48 kernel: RSP: 002b:00007ffc29d2f388 EFLAGS: 00000206 ORIG_RAX: 00000000000001af kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffbc0cb87c9 kernel: RDX: 0000000000000000 RSI: 0000000000000006 RDI: 0000000000000003 kernel: RBP: 00007ffc29d2f3b0 R08: 0000000000000000 R09: 0000000000000000 kernel: R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000 kernel: R13: 00007ffc29d2f4c0 R14: 0000000000000000 R15: 0000000000000000 kernel: </TASK> kernel: Modules linked in: rpcsec_gss_krb5(E) auth_rpcgss(E) nfsv4(E) dns_resolver(E) nfs(E) lockd(E) grace(E) sunrpc(E) netfs(E) af_packet(E) bridge(E) stp(E) llc(E) iscsi_ibft(E) iscsi_boot_sysfs(E) intel_rapl_msr(E) intel_rapl_common(E) iTCO_wdt(E) intel_pmc_bxt(E) sb_edac(E) iTCO_vendor_support(E) x86_pkg_temp_thermal(E) intel_powerclamp(E) coretemp(E) kvm_intel(E) rfkill(E) ipmi_ssif(E) kvm(E) acpi_ipmi(E) irqbypass(E) pcspkr(E) igb(E) ipmi_si(E) mei_me(E) i2c_i801(E) joydev(E) intel_pch_thermal(E) i2c_smbus(E) dca(E) lpc_ich(E) mei(E) ipmi_devintf(E) ipmi_msghandler(E) acpi_pad(E) tiny_power_button(E) button(E) fuse(E) efi_pstore(E) configfs(E) ip_tables(E) x_tables(E) ext4(E) mbcache(E) jbd2(E) hid_generic(E) usbhid(E) sd_mod(E) t10_pi(E) crct10dif_pclmul(E) crc32_pclmul(E) crc32c_intel(E) polyval_clmulni(E) ahci(E) xhci_pci(E) polyval_generic(E) gf128mul(E) ghash_clmulni_intel(E) sha512_ssse3(E) sha256_ssse3(E) xhci_pci_renesas(E) libahci(E) ehci_pci(E) sha1_ssse3(E) xhci_hcd(E) ehci_hcd(E) libata(E) kernel: mgag200(E) i2c_algo_bit(E) usbcore(E) wmi(E) sg(E) dm_multipath(E) dm_mod(E) scsi_dh_rdac(E) scsi_dh_emc(E) scsi_dh_alua(E) scsi_mod(E) scsi_common(E) aesni_intel(E) crypto_simd(E) cryptd(E) kernel: Unloaded tainted modules: acpi_cpufreq(E):1 fjes(E):1 kernel: CR2: 0000000000000028 kernel: ---[ end trace 0000000000000000 ]--- kernel: RIP: 0010:hugetlbfs_fill_super+0xb4/0x1a0 kernel: Code: 48 8b 3b e8 3e c6 ed ff 48 85 c0 48 89 45 20 0f 84 d6 00 00 00 48 b8 ff ff ff ff ff ff ff 7f 4c 89 e7 49 89 44 24 20 48 8b 03 <8b> 48 28 b8 00 10 00 00 48 d3 e0 49 89 44 24 18 48 8b 03 8b 40 28 kernel: RSP: 0018:ffffbe9960fcbd48 EFLAGS: 00010246 kernel: RAX: 0000000000000000 RBX: ffff9af5272ae780 RCX: 0000000000372004 kernel: RDX: ffffffffffffffff RSI: ffffffffffffffff RDI: ffff9af555e9b000 kernel: RBP: ffff9af52ee66b00 R08: 0000000000000040 R09: 0000000000370004 kernel: R10: ffffbe9960fcbd48 R11: 0000000000000040 R12: ffff9af555e9b000 kernel: R13: ffffffffa66b86c0 R14: ffff9af507d2f400 R15: ffff9af507d2f400 kernel: FS: 00007ffbc0ba4740(0000) GS:ffff9b0bd7000000(0000) knlGS:0000000000000000 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 kernel: CR2: 0000000000000028 CR3: 00000001b1ee0000 CR4: 00000000001506f0 Link: https://lkml.kernel.org/r/20240130210418.3771-1-osalvador@suse.de Fixes: 32021982a324 ("hugetlbfs: Convert to fs_context") Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Muchun Song <muchun.song@linux.dev> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23ceph: prevent use-after-free in encode_cap_msg()Rishabh Dave
commit cda4672da1c26835dcbd7aec2bfed954eda9b5ef upstream. In fs/ceph/caps.c, in encode_cap_msg(), "use after free" error was caught by KASAN at this line - 'ceph_buffer_get(arg->xattr_buf);'. This implies before the refcount could be increment here, it was freed. In same file, in "handle_cap_grant()" refcount is decremented by this line - 'ceph_buffer_put(ci->i_xattrs.blob);'. It appears that a race occurred and resource was freed by the latter line before the former line could increment it. encode_cap_msg() is called by __send_cap() and __send_cap() is called by ceph_check_caps() after calling __prep_cap(). __prep_cap() is where arg->xattr_buf is assigned to ci->i_xattrs.blob. This is the spot where the refcount must be increased to prevent "use after free" error. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/59259 Signed-off-by: Rishabh Dave <ridave@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Xiubo Li <xiubli@redhat.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23fs: relax mount_setattr() permission checksChristian Brauner
commit 46f5ab762d048dad224436978315cbc2fa79c630 upstream. When we added mount_setattr() I added additional checks compared to the legacy do_reconfigure_mnt() and do_change_type() helpers used by regular mount(2). If that mount had a parent then verify that the caller and the mount namespace the mount is attached to match and if not make sure that it's an anonymous mount. The real rootfs falls into neither category. It is neither an anoymous mount because it is obviously attached to the initial mount namespace but it also obviously doesn't have a parent mount. So that means legacy mount(2) allows changing mount properties on the real rootfs but mount_setattr(2) blocks this. I never thought much about this but of course someone on this planet of earth changes properties on the real rootfs as can be seen in [1]. Since util-linux finally switched to the new mount api in 2.39 not so long ago it also relies on mount_setattr() and that surfaced this issue when Fedora 39 finally switched to it. Fix this. Link: https://bugzilla.redhat.com/show_bug.cgi?id=2256843 Link: https://lore.kernel.org/r/20240206-vfs-mount-rootfs-v1-1-19b335eee133@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Reported-by: Karel Zak <kzak@redhat.com> Cc: stable@vger.kernel.org # v5.12+ Signed-off-by: Christian Brauner <brauner@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23zonefs: Improve error handlingDamien Le Moal
commit 14db5f64a971fce3d8ea35de4dfc7f443a3efb92 upstream. Write error handling is racy and can sometime lead to the error recovery path wrongly changing the inode size of a sequential zone file to an incorrect value which results in garbage data being readable at the end of a file. There are 2 problems: 1) zonefs_file_dio_write() updates a zone file write pointer offset after issuing a direct IO with iomap_dio_rw(). This update is done only if the IO succeed for synchronous direct writes. However, for asynchronous direct writes, the update is done without waiting for the IO completion so that the next asynchronous IO can be immediately issued. However, if an asynchronous IO completes with a failure right before the i_truncate_mutex lock protecting the update, the update may change the value of the inode write pointer offset that was corrected by the error path (zonefs_io_error() function). 2) zonefs_io_error() is called when a read or write error occurs. This function executes a report zone operation using the callback function zonefs_io_error_cb(), which does all the error recovery handling based on the current zone condition, write pointer position and according to the mount options being used. However, depending on the zoned device being used, a report zone callback may be executed in a context that is different from the context of __zonefs_io_error(). As a result, zonefs_io_error_cb() may be executed without the inode truncate mutex lock held, which can lead to invalid error processing. Fix both problems as follows: - Problem 1: Perform the inode write pointer offset update before a direct write is issued with iomap_dio_rw(). This is safe to do as partial direct writes are not supported (IOMAP_DIO_PARTIAL is not set) and any failed IO will trigger the execution of zonefs_io_error() which will correct the inode write pointer offset to reflect the current state of the one on the device. - Problem 2: Change zonefs_io_error_cb() into zonefs_handle_io_error() and call this function directly from __zonefs_io_error() after obtaining the zone information using blkdev_report_zones() with a simple callback function that copies to a local stack variable the struct blk_zone obtained from the device. This ensures that error handling is performed holding the inode truncate mutex. This change also simplifies error handling for conventional zone files by bypassing the execution of report zones entirely. This is safe to do because the condition of conventional zones cannot be read-only or offline and conventional zone files are always fully mapped with a constant file size. Reported-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: 8dcc1a9d90c1 ("fs: New zonefs file system") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nilfs2: fix hang in nilfs_lookup_dirty_data_buffers()Ryusuke Konishi
commit 38296afe3c6ee07319e01bb249aa4bb47c07b534 upstream. Syzbot reported a hang issue in migrate_pages_batch() called by mbind() and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2. While migrate_pages_batch() locks a folio and waits for the writeback to complete, the log writer thread that should bring the writeback to completion picks up the folio being written back in nilfs_lookup_dirty_data_buffers() that it calls for subsequent log creation and was trying to lock the folio. Thus causing a deadlock. In the first place, it is unexpected that folios/pages in the middle of writeback will be updated and become dirty. Nilfs2 adds a checksum to verify the validity of the log being written and uses it for recovery at mount, so data changes during writeback are suppressed. Since this is broken, an unclean shutdown could potentially cause recovery to fail. Investigation revealed that the root cause is that the wait for writeback completion in nilfs_page_mkwrite() is conditional, and if the backing device does not require stable writes, data may be modified without waiting. Fix these issues by making nilfs_page_mkwrite() wait for writeback to finish regardless of the stable write requirement of the backing device. Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com Fixes: 1d1d1a767206 ("mm: only enforce stable page writes if the backing device requires it") Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by: syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23nilfs2: fix data corruption in dsync block recovery for small block sizesRyusuke Konishi
commit 67b8bcbaed4777871bb0dcc888fb02a614a98ab1 upstream. The helper function nilfs_recovery_copy_block() of nilfs_recovery_dsync_blocks(), which recovers data from logs created by data sync writes during a mount after an unclean shutdown, incorrectly calculates the on-page offset when copying repair data to the file's page cache. In environments where the block size is smaller than the page size, this flaw can cause data corruption and leak uninitialized memory bytes during the recovery process. Fix these issues by correcting this byte offset calculation on the page. Link: https://lkml.kernel.org/r/20240124121936.10575-1-konishi.ryusuke@gmail.com Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23ksmbd: free aux buffer if ksmbd_iov_pin_rsp_read failsFedor Pchelkin
commit 108a020c64434fed4b69762879d78cd24088b4c7 upstream. ksmbd_iov_pin_rsp_read() doesn't free the provided aux buffer if it fails. Seems to be the caller's responsibility to clear the buffer in error case. Found by Linux Verification Center (linuxtesting.org). Fixes: e2b76ab8b5c9 ("ksmbd: add support for read compound") Cc: stable@vger.kernel.org Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23ext4: avoid bb_free and bb_fragments inconsistency in mb_free_blocks()Baokun Li
commit 2331fd4a49864e1571b4f50aa3aa1536ed6220d0 upstream. After updating bb_free in mb_free_blocks, it is possible to return without updating bb_fragments because the block being freed is found to have already been freed, which leads to inconsistency between bb_free and bb_fragments. Since the group may be unlocked in ext4_grp_locked_error(), this can lead to problems such as dividing by zero when calculating the average fragment length. Hence move the update of bb_free to after the block double-free check guarantees that the corresponding statistics are updated only after the core block bitmap is modified. Fixes: eabe0444df90 ("ext4: speed-up releasing blocks on commit") CC: <stable@vger.kernel.org> # 3.10 Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-5-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23ext4: fix double-free of blocks due to wrong extents moved_lenBaokun Li
commit 55583e899a5357308274601364741a83e78d6ac4 upstream. In ext4_move_extents(), moved_len is only updated when all moves are successfully executed, and only discards orig_inode and donor_inode preallocations when moved_len is not zero. When the loop fails to exit after successfully moving some extents, moved_len is not updated and remains at 0, so it does not discard the preallocations. If the moved extents overlap with the preallocated extents, the overlapped extents are freed twice in ext4_mb_release_inode_pa() and ext4_process_freed_data() (as described in commit 94d7c16cbbbd ("ext4: Fix double-free of blocks with EXT4_IOC_MOVE_EXT")), and bb_free is incremented twice. Hence when trim is executed, a zero-division bug is triggered in mb_update_avg_fragment_size() because bb_free is not zero and bb_fragments is zero. Therefore, update move_len after each extent move to avoid the issue. Reported-by: Wei Chen <harperchen1110@gmail.com> Reported-by: xingwei lee <xrivendell7@gmail.com> Closes: https://lore.kernel.org/r/CAO4mrferzqBUnCag8R3m2zf897ts9UEuhjFQGPtODT92rYyR2Q@mail.gmail.com Fixes: fcf6b1b729bc ("ext4: refactor ext4_move_extents code base") CC: <stable@vger.kernel.org> # 3.18 Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-2-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23cifs: fix underflow in parse_server_interfaces()Dan Carpenter
[ Upstream commit cffe487026be13eaf37ea28b783d9638ab147204 ] In this loop, we step through the buffer and after each item we check if the size_left is greater than the minimum size we need. However, the problem is that "bytes_left" is type ssize_t while sizeof() is type size_t. That means that because of type promotion, the comparison is done as an unsigned and if we have negative bytes left the loop continues instead of ending. Fixes: fe856be475f7 ("CIFS: parse and store info on iface queries") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-02-23btrfs: don't drop extent_map for free space inode on write errorJosef Bacik
commit 5571e41ec6e56e35f34ae9f5b3a335ef510e0ade upstream. While running the CI for an unrelated change I hit the following panic with generic/648 on btrfs_holes_spacecache. assertion failed: block_start != EXTENT_MAP_HOLE, in fs/btrfs/extent_io.c:1385 ------------[ cut here ]------------ kernel BUG at fs/btrfs/extent_io.c:1385! invalid opcode: 0000 [#1] PREEMPT SMP NOPTI CPU: 1 PID: 2695096 Comm: fsstress Kdump: loaded Tainted: G W 6.8.0-rc2+ #1 RIP: 0010:__extent_writepage_io.constprop.0+0x4c1/0x5c0 Call Trace: <TASK> extent_write_cache_pages+0x2ac/0x8f0 extent_writepages+0x87/0x110 do_writepages+0xd5/0x1f0 filemap_fdatawrite_wbc+0x63/0x90 __filemap_fdatawrite_range+0x5c/0x80 btrfs_fdatawrite_range+0x1f/0x50 btrfs_write_out_cache+0x507/0x560 btrfs_write_dirty_block_groups+0x32a/0x420 commit_cowonly_roots+0x21b/0x290 btrfs_commit_transaction+0x813/0x1360 btrfs_sync_file+0x51a/0x640 __x64_sys_fdatasync+0x52/0x90 do_syscall_64+0x9c/0x190 entry_SYSCALL_64_after_hwframe+0x6e/0x76 This happens because we fail to write out the free space cache in one instance, come back around and attempt to write it again. However on the second pass through we go to call btrfs_get_extent() on the inode to get the extent mapping. Because this is a new block group, and with the free space inode we always search the commit root to avoid deadlocking with the tree, we find nothing and return a EXTENT_MAP_HOLE for the requested range. This happens because the first time we try to write the space cache out we hit an error, and on an error we drop the extent mapping. This is normal for normal files, but the free space cache inode is special. We always expect the extent map to be correct. Thus the second time through we end up with a bogus extent map. Since we're deprecating this feature, the most straightforward way to fix this is to simply skip dropping the extent map range for this failed range. I shortened the test by using error injection to stress the area to make it easier to reproduce. With this patch in place we no longer panic with my error injection test. CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-02-23btrfs: reject encoded write if inode has nodatasum flag setFilipe Manana
commit 1bd96c92c6a0a4d43815eb685c15aa4b78879dc9 upstream. Currently we allow an encoded write against inodes that have the NODATASUM flag set, either because they are NOCOW files or they were created while the filesystem was mounted with "-o nodatasum". This results in having compressed extents without corresponding checksums, which is a filesystem inconsistency reported by 'btrfs check'. For example, running btrfs/281 with MOUNT_OPTIONS="-o nodatacow" triggers this and 'btrfs check' errors out with: [1/7] checking root items [2/7] checking extents [3/7] checking free space tree [4/7] checking fs roots root 256 inode 257 errors 1040, bad file extent, some csum missing root 256 inode 258 errors 1040, bad file extent, some csum missing ERROR: errors found in fs roots (...) So reject encoded writes if the target inode has NODATASUM set. CC: stable@vger.kernel.org # 6.1+ Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>