summaryrefslogtreecommitdiff
path: root/drivers/vhost
AgeCommit message (Collapse)Author
2023-11-08vhost: Allow null msg.size on VHOST_IOTLB_INVALIDATEEric Auger
commit ca50ec377c2e94b0a9f8735de2856cd0f13beab4 upstream. Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Forbade vhost iotlb msg with null size to prevent entries with size = start = 0 and last = ULONG_MAX to end up in the iotlb. Then commit 95932ab2ea07 ("vhost: allow batching hint without size") only applied the check for VHOST_IOTLB_UPDATE and VHOST_IOTLB_INVALIDATE message types to fix a regression observed with batching hit. Still, the introduction of that check introduced a regression for some users attempting to invalidate the whole ULONG_MAX range by setting the size to 0. This is the case with qemu/smmuv3/vhost integration which does not work anymore. It Looks safe to partially revert the original commit and allow VHOST_IOTLB_INVALIDATE messages with null size. vhost_iotlb_del_range() will compute a correct end iova. Same for vhost_vdpa_iotlb_unmap(). Signed-off-by: Eric Auger <eric.auger@redhat.com> Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Cc: stable@vger.kernel.org # v5.17+ Acked-by: Jason Wang <jasowang@redhat.com> Message-Id: <20230927140544.205088-1-eric.auger@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-10-10vringh: don't use vringh_kiov_advance() in vringh_iov_xfer()Stefano Garzarella
commit 7aed44babc7f97e82b38e9a68515e699692cc100 upstream. In the while loop of vringh_iov_xfer(), `partlen` could be 0 if one of the `iov` has 0 lenght. In this case, we should skip the iov and go to the next one. But calling vringh_kiov_advance() with 0 lenght does not cause the advancement, since it returns immediately if asked to advance by 0 bytes. Let's restore the code that was there before commit b8c06ad4d67d ("vringh: implement vringh_kiov_advance()"), avoiding using vringh_kiov_advance(). Fixes: b8c06ad4d67d ("vringh: implement vringh_kiov_advance()") Cc: stable@vger.kernel.org Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-06-28vhost_net: revert upend_idx only on retriable errorAndrey Smetanin
[ Upstream commit 1f5d2e3bab16369d5d4b4020a25db4ab1f4f082c ] Fix possible virtqueue used buffers leak and corresponding stuck in case of temporary -EIO from sendmsg() which is produced by tun driver while backend device is not up. In case of no-retriable error and zcopy do not revert upend_idx to pass packet data (that is update used_idx in corresponding vhost_zerocopy_signal_used()) as if packet data has been transferred successfully. v2: set vq->heads[ubuf->desc].len equal to VHOST_DMA_DONE_LEN in case of fake successful transmit. Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru> Message-Id: <20230424204411.24888-1-asmetanin@yandex-team.ru> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Andrey Smetanin <asmetanin@yandex-team.ru> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-14vhost_vdpa: support PACKED when setting-getting vring_baseShannon Nelson
[ Upstream commit beee7fdb5b56a46415a4992d28dd4c2d06eb52df ] Use the right structs for PACKED or split vqs when setting and getting the vring base. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230424225031.18947-4-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-06-14vhost: support PACKED when setting-getting vring_baseShannon Nelson
[ Upstream commit 55d8122f5cd62d5aaa225d7167dcd14a44c850b9 ] Use the right structs for PACKED or split vqs when setting and getting the vring base. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Signed-off-by: Shannon Nelson <shannon.nelson@amd.com> Message-Id: <20230424225031.18947-3-shannon.nelson@amd.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-02-09vhost/net: Clear the pending messages when the backend is removedEric Auger
[ Upstream commit 9526f9a2b762af16be94a72aca5d65c677d28f50 ] When the vhost iotlb is used along with a guest virtual iommu and the guest gets rebooted, some MISS messages may have been recorded just before the reboot and spuriously executed by the virtual iommu after the reboot. As vhost does not have any explicit reset user API, VHOST_NET_SET_BACKEND looks a reasonable point where to clear the pending messages, in case the backend is removed. Export vhost_clear_msg() and call it in vhost_net_set_backend() when fd == -1. Signed-off-by: Eric Auger <eric.auger@redhat.com> Suggested-by: Jason Wang <jasowang@redhat.com> Fixes: 6b1e6cc7855b0 ("vhost: new device IOTLB API") Message-Id: <20230117151518.44725-3-eric.auger@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12vhost: fix range used in translate_desc()Stefano Garzarella
[ Upstream commit 98047313cdb46828093894d0ac8b1183b8b317f9 ] vhost_iotlb_itree_first() requires `start` and `last` parameters to search for a mapping that overlaps the range. In translate_desc() we cyclically call vhost_iotlb_itree_first(), incrementing `addr` by the amount already translated, so rightly we move the `start` parameter passed to vhost_iotlb_itree_first(), but we should hold the `last` parameter constant. Let's fix it by saving the `last` parameter value before incrementing `addr` in the loop. Fixes: a9709d6874d5 ("vhost: convert pre sorted vhost memory array to interval tree") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221109102503.18816-3-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12vringh: fix range used in iotlb_translate()Stefano Garzarella
[ Upstream commit f85efa9b0f5381874f727bd98f56787840313f0b ] vhost_iotlb_itree_first() requires `start` and `last` parameters to search for a mapping that overlaps the range. In iotlb_translate() we cyclically call vhost_iotlb_itree_first(), incrementing `addr` by the amount already translated, so rightly we move the `start` parameter passed to vhost_iotlb_itree_first(), but we should hold the `last` parameter constant. Let's fix it by saving the `last` parameter value before incrementing `addr` in the loop. Fixes: 9ad9c49cfe97 ("vringh: IOTLB support") Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Message-Id: <20221109102503.18816-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-01-12vhost/vsock: Fix error handling in vhost_vsock_init()Yuan Can
[ Upstream commit 7a4efe182ca61fb3e5307e69b261c57cbf434cd4 ] A problem about modprobe vhost_vsock failed is triggered with the following log given: modprobe: ERROR: could not insert 'vhost_vsock': Device or resource busy The reason is that vhost_vsock_init() returns misc_register() directly without checking its return value, if misc_register() failed, it returns without calling vsock_core_unregister() on vhost_transport, resulting the vhost_vsock can never be installed later. A simple call graph is shown as below: vhost_vsock_init() vsock_core_register() # register vhost_transport misc_register() device_create_with_groups() device_create_groups_vargs() dev = kzalloc(...) # OOM happened # return without unregister vhost_transport Fix by calling vsock_core_unregister() when misc_register() returns error. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Yuan Can <yuancan@huawei.com> Message-Id: <20221108101705.45981-1-yuancan@huawei.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-26vhost/vsock: Use kvmalloc/kvfree for larger packets.Junichi Uekawa
[ Upstream commit 0e3f72931fc47bb81686020cc643cde5d9cd0bb8 ] When copying a large file over sftp over vsock, data size is usually 32kB, and kmalloc seems to fail to try to allocate 32 32kB regions. vhost-5837: page allocation failure: order:4, mode:0x24040c0 Call Trace: [<ffffffffb6a0df64>] dump_stack+0x97/0xdb [<ffffffffb68d6aed>] warn_alloc_failed+0x10f/0x138 [<ffffffffb68d868a>] ? __alloc_pages_direct_compact+0x38/0xc8 [<ffffffffb664619f>] __alloc_pages_nodemask+0x84c/0x90d [<ffffffffb6646e56>] alloc_kmem_pages+0x17/0x19 [<ffffffffb6653a26>] kmalloc_order_trace+0x2b/0xdb [<ffffffffb66682f3>] __kmalloc+0x177/0x1f7 [<ffffffffb66e0d94>] ? copy_from_iter+0x8d/0x31d [<ffffffffc0689ab7>] vhost_vsock_handle_tx_kick+0x1fa/0x301 [vhost_vsock] [<ffffffffc06828d9>] vhost_worker+0xf7/0x157 [vhost] [<ffffffffb683ddce>] kthread+0xfd/0x105 [<ffffffffc06827e2>] ? vhost_dev_set_owner+0x22e/0x22e [vhost] [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 [<ffffffffb6eb332e>] ret_from_fork+0x4e/0x80 [<ffffffffb683dcd1>] ? flush_kthread_worker+0xf3/0xf3 Work around by doing kvmalloc instead. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Signed-off-by: Junichi Uekawa <uekawa@chromium.org> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Link: https://lore.kernel.org/r/20220928064538.667678-1-uekawa@chromium.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-06-14vringh: Fix loop descriptors check in the indirect casesXie Yongji
[ Upstream commit dbd29e0752286af74243cf891accf472b2f3edd8 ] We should use size of descriptor chain to test loop condition in the indirect case. And another statistical count is also introduced for indirect descriptors to avoid conflict with the statistical count of direct descriptors. Fixes: f87d0fbb5798 ("vringh: host-side implementation of virtio rings.") Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Fam Zheng <fam.zheng@bytedance.com> Message-Id: <20220505100910.137-1-xieyongji@bytedance.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-05-25Fix double fget() in vhost_net_set_backend()Al Viro
commit fb4554c2232e44d595920f4d5c66cf8f7d13f9bc upstream. Descriptor table is a shared resource; two fget() on the same descriptor may return different struct file references. get_tap_ptr_ring() is called after we'd found (and pinned) the socket we'll be using and it tries to find the private tun/tap data structures associated with it. Redoing the lookup by the same file descriptor we'd used to get the socket is racy - we need to same struct file. Thanks to Jason for spotting a braino in the original variant of patch - I'd missed the use of fd == -1 for disabling backend, and in that case we can end up with sock == NULL and sock != oldsock. Cc: stable@kernel.org Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-05-25vhost_vdpa: don't setup irq offloading when irq_num < 0Zhu Lingshan
[ Upstream commit cce0ab2b2a39072d81f98017f7b076f3410ef740 ] When irq number is negative(e.g., -EINVAL), the virtqueue may be disabled or the virtqueues are sharing a device irq. In such case, we should not setup irq offloading for a virtqueue. Signed-off-by: Zhu Lingshan <lingshan.zhu@intel.com> Link: https://lore.kernel.org/r/20220222115428.998334-3-lingshan.zhu@intel.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-13tuntap: add sanity checks about msg_controllen in sendmsgHarold Huang
[ Upstream commit 74a335a07a17d131b9263bfdbdcb5e40673ca9ca ] In patch [1], tun_msg_ctl was added to allow pass batched xdp buffers to tun_sendmsg. Although we donot use msg_controllen in this path, we should check msg_controllen to make sure the caller pass a valid msg_ctl. [1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=fe8dd45bb7556246c6b76277b1ba4296c91c2505 Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Harold Huang <baymaxhuang@gmail.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220303022441.383865-1-baymaxhuang@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-04-08vhost: handle error while adding split ranges to iotlbAnirudh Rayabharam
commit 03a91c9af2c42ae14afafb829a4b7e6589ab5892 upstream. vhost_iotlb_add_range_ctx() handles the range [0, ULONG_MAX] by splitting it into two ranges and adding them separately. The return value of adding the first range to the iotlb is currently ignored. Check the return value and bail out in case of an error. Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20220312141121.4981-1-mail@anirudhrb.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-23vsock: each transport cycles only on its own socketsJiyong Park
[ Upstream commit 8e6ed963763fe21429eabfc76c69ce2b0163a3dd ] When iterating over sockets using vsock_for_each_connected_socket, make sure that a transport filters out sockets that don't belong to the transport. There actually was an issue caused by this; in a nested VM configuration, destroying the nested VM (which often involves the closing of /dev/vhost-vsock if there was h2g connections to the nested VM) kills not only the h2g connections, but also all existing g2h connections to the (outmost) host which are totally unrelated. Tested: Executed the following steps on Cuttlefish (Android running on a VM) [1]: (1) Enter into an `adb shell` session - to have a g2h connection inside the VM, (2) open and then close /dev/vhost-vsock by `exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb session is not reset. [1] https://android.googlesource.com/device/google/cuttlefish/ Fixes: c0cfa2d8a788 ("vsock: add multi-transports support") Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Jiyong Park <jiyong@google.com> Link: https://lore.kernel.org/r/20220311020017.1509316-1-jiyong@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-16vhost: allow batching hint without sizeJason Wang
commit 95932ab2ea07b79cdb33121e2f40ccda9e6a73b5 upstream. Commit e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") tries to reject the IOTLB message whose size is zero. But the size is not necessarily meaningful, one example is the batching hint, so the commit breaks that. Fixing this be reject zero size message only if the message is used to update/invalidate the IOTLB. Fixes: e2ae38cf3d91 ("vhost: fix hung thread due to erroneous iotlb entries") Reported-by: Eli Cohen <elic@nvidia.com> Cc: Anirudh Rayabharam <mail@anirudhrb.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20220310075211.4801-1-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Tested-by: Eli Cohen <elic@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-03-16vhost: fix hung thread due to erroneous iotlb entriesAnirudh Rayabharam
[ Upstream commit e2ae38cf3d91837a493cb2093c87700ff3cbe667 ] In vhost_iotlb_add_range_ctx(), range size can overflow to 0 when start is 0 and last is ULONG_MAX. One instance where it can happen is when userspace sends an IOTLB message with iova=size=uaddr=0 (vhost_process_iotlb_msg). So, an entry with size = 0, start = 0, last = ULONG_MAX ends up in the iotlb. Next time a packet is sent, iotlb_access_ok() loops indefinitely due to that erroneous entry. Call Trace: <TASK> iotlb_access_ok+0x21b/0x3e0 drivers/vhost/vhost.c:1340 vq_meta_prefetch+0xbc/0x280 drivers/vhost/vhost.c:1366 vhost_transport_do_send_pkt+0xe0/0xfd0 drivers/vhost/vsock.c:104 vhost_worker+0x23d/0x3d0 drivers/vhost/vhost.c:372 kthread+0x2e9/0x3a0 kernel/kthread.c:377 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295 </TASK> Reported by syzbot at: https://syzkaller.appspot.com/bug?extid=0abd373e2e50d704db87 To fix this, do two things: 1. Return -EINVAL in vhost_chr_write_iter() when userspace asks to map a range with size 0. 2. Fix vhost_iotlb_add_range_ctx() to handle the range [0, ULONG_MAX] by splitting it into two entries. Fixes: 0bbe30668d89e ("vhost: factor out IOTLB") Reported-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com Tested-by: syzbot+0abd373e2e50d704db87@syzkaller.appspotmail.com Signed-off-by: Anirudh Rayabharam <mail@anirudhrb.com> Link: https://lore.kernel.org/r/20220305095525.5145-1-mail@anirudhrb.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-03-02vhost/vsock: don't check owner in vhost_vsock_stop() while releasingStefano Garzarella
commit a58da53ffd70294ebea8ecd0eb45fd0d74add9f9 upstream. vhost_vsock_stop() calls vhost_dev_check_owner() to check the device ownership. It expects current->mm to be valid. vhost_vsock_stop() is also called by vhost_vsock_dev_release() when the user has not done close(), so when we are in do_exit(). In this case current->mm is invalid and we're releasing the device, so we should clean it anyway. Let's check the owner only when vhost_vsock_stop() is called by an ioctl. When invoked from release we can not fail so we don't check return code of vhost_vsock_stop(). We need to stop vsock even if it's not the owner. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Cc: stable@vger.kernel.org Reported-by: syzbot+1e3ea63db39f2b4440e0@syzkaller.appspotmail.com Reported-and-tested-by: syzbot+3140b17cb44a7b174008@syzkaller.appspotmail.com Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-22vdpa: check that offsets are within boundsDan Carpenter
commit 3ed21c1451a14d139e1ceb18f2fa70865ce3195a upstream. In this function "c->off" is a u32 and "size" is a long. On 64bit systems if "c->off" is greater than "size" then "size - c->off" is a negative and we always return -E2BIG. But on 32bit systems the subtraction is type promoted to a high positive u32 value and basically any "c->len" is accepted. Fixes: 4c8cf31885f6 ("vhost: introduce vDPA-based backend") Reported-by: Xie Yongji <xieyongji@bytedance.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20211208103337.GA4047@kili Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-12-01vhost/vsock: fix incorrect used length reported to the guestStefano Garzarella
commit 49d8c5ffad07ca014cfae72a1b9b8c52b6ad9cb8 upstream. The "used length" reported by calling vhost_add_used() must be the number of bytes written by the device (using "in" buffers). In vhost_vsock_handle_tx_kick() the device only reads the guest buffers (they are all "out" buffers), without writing anything, so we must pass 0 as "used length" to comply virtio spec. Fixes: 433fc58e6bf2 ("VSOCK: Introduce vhost_vsock.ko") Cc: stable@vger.kernel.org Reported-by: Halil Pasic <pasic@linux.ibm.com> Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20211122163525.294024-2-sgarzare@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Halil Pasic <pasic@linux.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-10-17Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio fixes from Michael Tsirkin: "Fixes up some issues in rc5" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vhost-vdpa: Fix the wrong input in config_cb VDUSE: fix documentation underline warning Revert "virtio-blk: Add validation for block size in config space" vhost_vdpa: unset vq irq before freeing irq virtio: write back F_VERSION_1 before validate
2021-10-13vhost-vdpa: Fix the wrong input in config_cbCindy Lu
Fix the wrong input in for config_cb. In function vhost_vdpa_config_cb, the input cb.private was used as struct vhost_vdpa, so the input was wrong here, fix this issue Fixes: 776f395004d8 ("vhost_vdpa: Support config interrupt in vdpa") Signed-off-by: Cindy Lu <lulu@redhat.com> Link: https://lore.kernel.org/r/20210929090933.20465-1-lulu@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-10-13vhost_vdpa: unset vq irq before freeing irqWu Zongyong
Currently we unset vq irq after freeing irq and that will result in error messages: pi_update_irte: failed to update PI IRTE irq bypass consumer (token 000000005a07a12b) unregistration fails: -22 This patch solves this. Signed-off-by: Wu Zongyong <wuzongyong@linux.alibaba.com> Link: https://lore.kernel.org/r/02637d38dcf4e4b836c5b3a65055fe92bf812b3b.1631687872.git.wuzongyong@linux.alibaba.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
2021-09-28Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio/vdpa fixes from Michael Tsirkin: "Fixes up some issues in rc1" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa: potential uninitialized return in vhost_vdpa_va_map() vdpa/mlx5: Avoid executing set_vq_ready() if device is reset vdpa/mlx5: Clear ready indication for control VQ vduse: Cleanup the old kernel states after reset failure vduse: missing error code in vduse_init() virtio: don't fail on !of_device_is_compatible
2021-09-16Merge tag 'net-5.15-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf. Current release - regressions: - vhost_net: fix OoB on sendmsg() failure - mlx5: bridge, fix uninitialized variable usage - bnxt_en: fix error recovery regression Current release - new code bugs: - bpf, mm: fix lockdep warning triggered by stack_map_get_build_id_offset() Previous releases - regressions: - r6040: restore MDIO clock frequency after MAC reset - tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() - dsa: flush switchdev workqueue before tearing down CPU/DSA ports Previous releases - always broken: - ptp: dp83640: don't define PAGE0, avoid compiler warning - igc: fix tunnel segmentation offloads - phylink: update SFP selected interface on advertising changes - stmmac: fix system hang caused by eee_ctrl_timer during suspend/resume - mlx5e: fix mutual exclusion between CQE compression and HW TS Misc: - bpf, cgroups: fix cgroup v2 fallback on v1/v2 mixed mode - sfc: fallback for lack of xdp tx queues - hns3: add option to turn off page pool feature" * tag 'net-5.15-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits) mlxbf_gige: clear valid_polarity upon open igc: fix tunnel offloading net/{mlx5|nfp|bnxt}: Remove unnecessary RTNL lock assert net: wan: wanxl: define CROSS_COMPILE_M68K selftests: nci: replace unsigned int with int net: dsa: flush switchdev workqueue before tearing down CPU/DSA ports Revert "net: phy: Uniform PHY driver access" net: dsa: destroy the phylink instance on any error in dsa_slave_phy_setup ptp: dp83640: don't define PAGE0 bnx2x: Fix enabling network interfaces without VFs Revert "Revert "ipv4: fix memory leaks in ip_cmsg_send() callers"" tcp: fix tp->undo_retrans accounting in tcp_sacktag_one() net-caif: avoid user-triggerable WARN_ON(1) bpf, selftests: Add test case for mixed cgroup v1/v2 bpf, selftests: Add cgroup v1 net_cls classid helpers bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode bpf: Add oversize check before call kvcalloc() net: hns3: fix the timing issue of VF clearing interrupt sources net: hns3: fix the exception when query imp info net: hns3: disable mac in flr process ...
2021-09-14vdpa: potential uninitialized return in vhost_vdpa_va_map()Dan Carpenter
The concern here is that "ret" can be uninitialized if we hit the "goto next" condition on every iteration through the loop. Fixes: 41ba1b5f9d4b ("vdpa: Support transferring virtual addressing during DMA mapping") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Link: https://lore.kernel.org/r/20210907073253.GB18254@kili Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-09-11Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - vduse driver ("vDPA Device in Userspace") supporting emulated virtio block devices - virtio-vsock support for end of record with SEQPACKET - vdpa: mac and mq support for ifcvf and mlx5 - vdpa: management netlink for ifcvf - virtio-i2c, gpio dt bindings - misc fixes and cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (39 commits) Documentation: Add documentation for VDUSE vduse: Introduce VDUSE - vDPA Device in Userspace vduse: Implement an MMU-based software IOTLB vdpa: Support transferring virtual addressing during DMA mapping vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap() vdpa: Add an opaque pointer for vdpa_config_ops.dma_map() vhost-iotlb: Add an opaque pointer for vhost IOTLB vhost-vdpa: Handle the failure of vdpa_reset() vdpa: Add reset callback in vdpa_config_ops vdpa: Fix some coding style issues file: Export receive_fd() to modules eventfd: Export eventfd_wake_count to modules iova: Export alloc_iova_fast() and free_iova_fast() virtio-blk: remove unneeded "likely" statements virtio-balloon: Use virtio_find_vqs() helper vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro vsock_test: update message bounds test for MSG_EOR af_vsock: rename variables in receive loop virtio/vsock: support MSG_EOR bit processing vhost/vsock: support MSG_EOR bit processing ...
2021-09-09vhost_net: fix OoB on sendmsg() failure.Paolo Abeni
If the sendmsg() call in vhost_tx_batch() fails, both the 'batched_xdp' and 'done_idx' indexes are left unchanged. If such failure happens when batched_xdp == VHOST_NET_BATCH, the next call to vhost_net_build_xdp() will access and write memory outside the xdp buffers area. Since sendmsg() can only error with EBADFD, this change addresses the issue explicitly freeing the XDP buffers batch on error. Fixes: 0a0be13b8fe2 ("vhost_net: batch submitting XDP buffers to underlayer sockets") Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-09-06vdpa: Support transferring virtual addressing during DMA mappingXie Yongji
This patch introduces an attribute for vDPA device to indicate whether virtual address can be used. If vDPA device driver set it, vhost-vdpa bus driver will not pin user page and transfer userspace virtual address instead of physical address during DMA mapping. And corresponding vma->vm_file and offset will be also passed as an opaque pointer. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-11-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vdpa: factor out vhost_vdpa_pa_map() and vhost_vdpa_pa_unmap()Xie Yongji
The upcoming patch is going to support VA mapping/unmapping. So let's factor out the logic of PA mapping/unmapping firstly to make the code more readable. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-10-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vdpa: Add an opaque pointer for vdpa_config_ops.dma_map()Xie Yongji
Add an opaque pointer for DMA mapping. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-9-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vhost-iotlb: Add an opaque pointer for vhost IOTLBXie Yongji
Add an opaque pointer for vhost IOTLB. And introduce vhost_iotlb_add_range_ctx() to accept it. Suggested-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-8-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vhost-vdpa: Handle the failure of vdpa_reset()Xie Yongji
The vdpa_reset() may fail now. This adds check to its return value and fail the vhost_vdpa_open(). Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210831103634.33-7-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vdpa: Add reset callback in vdpa_config_opsXie Yongji
This adds a new callback to support device specific reset behavior. The vdpa bus driver will call the reset function instead of setting status to zero during resetting. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Link: https://lore.kernel.org/r/20210831103634.33-6-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-06vdpa: Make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macroCai Huoqing
it's a nice refactor to make use of PFN_PHYS/PFN_UP/PFN_DOWN helper macro Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210802013717.851-1-caihuoqing@baidu.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-05vhost/vsock: support MSG_EOR bit processingArseny Krasnov
'MSG_EOR' handling has similar logic as 'MSG_EOM' - if bit present in packet's header, reset it to 0. Then restore it back if packet processing wasn't completed. Instead of bool variable for each flag, bit mask variable was added: it has logical OR of 'MSG_EOR' and 'MSG_EOM' if needed, to restore flags, this variable is ORed with flags field of packet. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Link: https://lore.kernel.org/r/20210903123238.3273526-1-arseny.krasnov@kaspersky.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
2021-09-05virtio/vsock: rename 'EOR' to 'EOM' bit.Arseny Krasnov
This current implemented bit is used to mark end of messages ('EOM' - end of message), not records('EOR' - end of record). Also rename 'record' to 'message' in implementation as it is different things. Signed-off-by: Arseny Krasnov <arseny.krasnov@kaspersky.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Link: https://lore.kernel.org/r/20210903123109.3273053-1-arseny.krasnov@kaspersky.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-09-05vhost scsi: Convert to SPDX identifierCai Huoqing
use SPDX-License-Identifier instead of a verbose license text Signed-off-by: Cai Huoqing <caihuoqing@baidu.com> Link: https://lore.kernel.org/r/20210821123320.734-1-caihuoqing@baidu.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2021-08-26sock: remove one redundant SKB_FRAG_PAGE_ORDER macroYunsheng Lin
Both SKB_FRAG_PAGE_ORDER are defined to the same value in net/core/sock.c and drivers/vhost/net.c. Move the SKB_FRAG_PAGE_ORDER definition to net/core/sock.h, as both net/core/sock.c and drivers/vhost/net.c include it, and it seems a reasonable file to put the macro. Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2021-08-11vringh: Use wiov->used to check for read/write desc orderNeeraj Upadhyay
As __vringh_iov() traverses a descriptor chain, it populates each descriptor entry into either read or write vring iov and increments that iov's ->used member. So, as we iterate over a descriptor chain, at any point, (riov/wriov)->used value gives the number of descriptor enteries available, which are to be read or written by the device. As all read iovs must precede the write iovs, wiov->used should be zero when we are traversing a read descriptor. Current code checks for wiov->i, to figure out whether any previous entry in the current descriptor chain was a write descriptor. However, iov->i is only incremented, when these vring iovs are consumed, at a later point, and remain 0 in __vringh_iov(). So, correct the check for read and write descriptor order, to use wiov->used. Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Neeraj Upadhyay <neeraju@codeaurora.org> Link: https://lore.kernel.org/r/1624591502-4827-1-git-send-email-neeraju@codeaurora.org Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-11vhost: Fix the calculation in vhost_overflow()Xie Yongji
This fixes the incorrect calculation for integer overflow when the last address of iova range is 0xffffffff. Fixes: ec33d031a14b ("vhost: detect 32 bit integer wrap around") Reported-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210728130756.97-2-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-08-10vhost-vdpa: Fix integer overflow in vhost_vdpa_process_iotlb_update()Xie Yongji
The "msg->iova + msg->size" addition can have an integer overflow if the iotlb message is from a malicious user space application. So let's fix it. Fixes: 1b48dc03e575 ("vhost: vdpa: report iova range") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210728130756.97-1-xieyongji@bytedance.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-08vdpa: support packed virtqueue for set/get_vq_state()Jason Wang
This patch extends the vdpa_vq_state to support packed virtqueue state which is basically the device/driver ring wrap counters and the avail and used index. This will be used for the virito-vdpa support for the packed virtqueue and the future vhost/vhost-vdpa support for the packed virtqueue. Signed-off-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210602021536.39525-2-jasowang@redhat.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Eli Cohen <elic@nvidia.com>
2021-07-03vhost: fix up vhost_work coding styleMike Christie
Switch from a mix of tabs and spaces to just tabs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20210525174733.6212-6-michael.christie@oracle.com Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost: fix poll coding styleMike Christie
We use 3 coding styles in this struct. Switch to just tabs. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210525174733.6212-5-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost-scsi: reduce flushes during endpoint clearingMike Christie
vhost_scsi_flush will flush everything, so we can clear the backends then flush, then destroy. We don't need to flush before each vq destruction because after the flush we will have made sure there can be no new cmds started and there are no running cmds. Signed-off-by: Mike Christie <michael.christie@oracle.com> Link: https://lore.kernel.org/r/20210525174733.6212-4-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost-scsi: remove extra flushesMike Christie
The vhost work flush function was flushing the entire work queue, so there is no need for the double vhost_work_dev_flush calls in vhost_scsi_flush. And we do not need to call vhost_poll_flush for each poller because that call also ends up flushing the same work queue thread the vhost_work_dev_flush call flushed. Signed-off-by: Mike Christie <michael.christie@oracle.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Link: https://lore.kernel.org/r/20210525174733.6212-3-michael.christie@oracle.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost: remove work arg from vhost_work_flushMike Christie
vhost_work_flush doesn't do anything with the work arg. This patch drops it and then renames vhost_work_flush to vhost_work_dev_flush to reflect that the function flushes all the works in the dev and not just a specific queue or work item. Signed-off-by: Mike Christie <michael.christie@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Link: https://lore.kernel.org/r/20210525174733.6212-2-michael.christie@oracle.com Reviewed-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2021-07-03vhost: Remove the repeated declarationShaokun Zhang
Function 'vhost_vring_ioctl' is declared twice, remove the repeated declaration. Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Link: https://lore.kernel.org/r/1621857884-19964-1-git-send-email-zhangshaokun@hisilicon.com Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>