summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2018-05-30KVM: Fix spelling mistake: "cop_unsuable" -> "cop_unusable"Colin Ian King
commit ba3696e94d9d590d9a7e55f68e81c25dba515191 upstream. Trivial fix to spelling mistake in debugfs_entries text. Fixes: 669e846e6c4e ("KVM/MIPS32: MIPS arch specific APIs for KVM") Signed-off-by: Colin Ian King <colin.king@canonical.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: kernel-janitors@vger.kernel.org Cc: <stable@vger.kernel.org> # 3.10+ Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30MIPS: Fix ptrace(2) PTRACE_PEEKUSR and PTRACE_POKEUSR accesses to o32 FGRsMaciej W. Rozycki
commit 9a3a92ccfe3620743d4ae57c987dc8e9c5f88996 upstream. Check the TIF_32BIT_FPREGS task setting of the tracee rather than the tracer in determining the layout of floating-point general registers in the floating-point context, correcting access to odd-numbered registers for o32 tracees where the setting disagrees between the two processes. Fixes: 597ce1723e0f ("MIPS: Support for 64-bit FP with O32 binaries") Signed-off-by: Maciej W. Rozycki <macro@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 3.14+ Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-30MIPS: ptrace: Expose FIR register through FP regsetMaciej W. Rozycki
commit 71e909c0cdad28a1df1fa14442929e68615dee45 upstream. Correct commit 7aeb753b5353 ("MIPS: Implement task_user_regset_view.") and expose the FIR register using the unused 4 bytes at the end of the NT_PRFPREG regset. Without that register included clients cannot use the PTRACE_GETREGSET request to retrieve the complete FPU register set and have to resort to one of the older interfaces, either PTRACE_PEEKUSR or PTRACE_GETFPREGS, to retrieve the missing piece of data. Also the register is irreversibly missing from core dumps. This register is architecturally hardwired and read-only so the write path does not matter. Ignore data supplied on writes then. Fixes: 7aeb753b5353 ("MIPS: Implement task_user_regset_view.") Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Maciej W. Rozycki <macro@mips.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: <stable@vger.kernel.org> # 3.13+ Patchwork: https://patchwork.linux-mips.org/patch/19273/ Signed-off-by: James Hogan <jhogan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26Linux 4.4.133v4.4.133Greg Kroah-Hartman
2018-05-26x86/kexec: Avoid double free_page() upon do_kexec_load() failureTetsuo Handa
commit a466ef76b815b86748d9870ef2a430af7b39c710 upstream. >From ff82bedd3e12f0d3353282054ae48c3bd8c72012 Mon Sep 17 00:00:00 2001 From: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Date: Wed, 9 May 2018 12:12:39 +0900 Subject: x86/kexec: Avoid double free_page() upon do_kexec_load() failure syzbot is reporting crashes after memory allocation failure inside do_kexec_load() [1]. This is because free_transition_pgtable() is called by both init_transition_pgtable() and machine_kexec_cleanup() when memory allocation failed inside init_transition_pgtable(). Regarding 32bit code, machine_kexec_free_page_tables() is called by both machine_kexec_alloc_page_tables() and machine_kexec_cleanup() when memory allocation failed inside machine_kexec_alloc_page_tables(). Fix this by leaving the error handling to machine_kexec_cleanup() (and optionally setting NULL after free_page()). [1] https://syzkaller.appspot.com/bug?id=91e52396168cf2bdd572fe1e1bc0bc645c1c6b40 Fixes: f5deb79679af6eb4 ("x86: kexec: Use one page table in x86_64 machine_kexec") Fixes: 92be3d6bdf2cb349 ("kexec/i386: allocate page table pages dynamically") Reported-by: syzbot <syzbot+d96f60296ef613fe1d69@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Baoquan He <bhe@redhat.com> Cc: thomas.lendacky@amd.com Cc: prudo@linux.vnet.ibm.com Cc: Huang Ying <ying.huang@intel.com> Cc: syzkaller-bugs@googlegroups.com Cc: takahiro.akashi@linaro.org Cc: H. Peter Anvin <hpa@zytor.com> Cc: akpm@linux-foundation.org Cc: dyoung@redhat.com Cc: kirill.shutemov@linux.intel.com Link: https://lkml.kernel.org/r/201805091942.DGG12448.tMFVFSJFQOOLHO@I-love.SAKURA.ne.jp Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26hfsplus: stop workqueue when fill_super() failedTetsuo Handa
commit 66072c29328717072fd84aaff3e070e3f008ba77 upstream. syzbot is reporting ODEBUG messages at hfsplus_fill_super() [1]. This is because hfsplus_fill_super() forgot to call cancel_delayed_work_sync(). As far as I can see, it is hfsplus_mark_mdb_dirty() from hfsplus_new_inode() in hfsplus_fill_super() that calls queue_delayed_work(). Therefore, I assume that hfsplus_new_inode() does not fail if queue_delayed_work() was called, and the out_put_hidden_dir label is the appropriate location to call cancel_delayed_work_sync(). [1] https://syzkaller.appspot.com/bug?id=a66f45e96fdbeb76b796bf46eb25ea878c42a6c9 Link: http://lkml.kernel.org/r/964a8b27-cd69-357c-fe78-76b066056201@I-love.SAKURA.ne.jp Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Reported-by: syzbot <syzbot+4f2e5f086147d543ab03@syzkaller.appspotmail.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: David Howells <dhowells@redhat.com> Cc: Ernesto A. Fernandez <ernesto.mnd.fernandez@gmail.com> Cc: Vyacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26cfg80211: limit wiphy names to 128 bytesJohannes Berg
commit a7cfebcb7594a24609268f91299ab85ba064bf82 upstream. There's currently no limit on wiphy names, other than netlink message size and memory limitations, but that causes issues when, for example, the wiphy name is used in a uevent, e.g. in rfkill where we use the same name for the rfkill instance, and then the buffer there is "only" 2k for the environment variables. This was reported by syzkaller, which used a 4k name. Limit the name to something reasonable, I randomly picked 128. Reported-by: syzbot+230d9e642a85d3fec29c@syzkaller.appspotmail.com Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26gpio: rcar: Add Runtime PM handling for interruptsGeert Uytterhoeven
commit b26a719bdba9aa926ceaadecc66e07623d2b8a53 upstream. The R-Car GPIO driver handles Runtime PM for requested GPIOs only. When using a GPIO purely as an interrupt source, no Runtime PM handling is done, and the GPIO module's clock may not be enabled. To fix this: - Add .irq_request_resources() and .irq_release_resources() callbacks to handle Runtime PM when an interrupt is requested, - Add irq_bus_lock() and sync_unlock() callbacks to handle Runtime PM when e.g. disabling/enabling an interrupt, or configuring the interrupt type. Fixes: d5c3d84657db57bd "net: phy: Avoid polling PHY with PHY_IGNORE_INTERRUPTS" Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> [fabrizio: cherry-pick to v4.4.y. Use container_of instead of gpiochip_get_data.] Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Reviewed-by: Biju Das <biju.das@bp.renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accountingJohn Stultz
commit 3d88d56c5873f6eebe23e05c3da701960146b801 upstream. Due to how the MONOTONIC_RAW accumulation logic was handled, there is the potential for a 1ns discontinuity when we do accumulations. This small discontinuity has for the most part gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW in their vDSO clock_gettime implementation, we've seen failures with the inconsistency-check test in kselftest. This patch addresses the issue by using the same sub-ns accumulation handling that CLOCK_MONOTONIC uses, which avoids the issue for in-kernel users. Since the ARM64 vDSO implementation has its own clock_gettime calculation logic, this patch reduces the frequency of errors, but failures are still seen. The ARM64 vDSO will need to be updated to include the sub-nanosecond xtime_nsec values in its calculation for this issue to be completely fixed. Signed-off-by: John Stultz <john.stultz@linaro.org> Tested-by: Daniel Mentz <danielmentz@google.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Richard Cochran <richardcochran@gmail.com> Cc: Stephen Boyd <stephen.boyd@linaro.org> Cc: Will Deacon <will.deacon@arm.com> Cc: "stable #4 . 8+" <stable@vger.kernel.org> Cc: Miroslav Lichvar <mlichvar@redhat.com> Link: http://lkml.kernel.org/r/1496965462-20003-3-git-send-email-john.stultz@linaro.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> [fabrizio: cherry-pick to 4.4. Kept cycle_t type for function logarithmic_accumulation local variable "interval". Dropped casting of "interval" variable] Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Signed-off-by: Biju Das <biju.das@bp.renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26dmaengine: ensure dmaengine helpers check valid callbackVinod Koul
commit 757d12e5849be549076901b0d33c60d5f360269c upstream. dmaengine has various device callbacks and exposes helper functions to invoke these. These helpers should check if channel, device and callback is valid or not before invoking them. Reported-by: Jon Hunter <jonathanh@nvidia.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Fabrizio Castro <fabrizio.castro@bp.renesas.com> Signed-off-by: Jianming Qiao <jianming.qiao@bp.renesas.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26scsi: zfcp: fix infinite iteration on ERP ready listJens Remus
commit fa89adba1941e4f3b213399b81732a5c12fd9131 upstream. zfcp_erp_adapter_reopen() schedules blocking of all of the adapter's rports via zfcp_scsi_schedule_rports_block() and enqueues a reopen adapter ERP action via zfcp_erp_action_enqueue(). Both are separately processed asynchronously and concurrently. Blocking of rports is done in a kworker by zfcp_scsi_rport_work(). It calls zfcp_scsi_rport_block(), which then traces a DBF REC "scpdely" via zfcp_dbf_rec_trig(). zfcp_dbf_rec_trig() acquires the DBF REC spin lock and then iterates with list_for_each() over the adapter's ERP ready list without holding the ERP lock. This opens a race window in which the current list entry can be moved to another list, causing list_for_each() to iterate forever on the wrong list, as the erp_ready_head is never encountered as terminal condition. Meanwhile the ERP action can be processed in the ERP thread by zfcp_erp_thread(). It calls zfcp_erp_strategy(), which acquires the ERP lock and then calls zfcp_erp_action_to_running() to move the ERP action from the ready to the running list. zfcp_erp_action_to_running() can move the ERP action using list_move() just during the aforementioned race window. It then traces a REC RUN "erator1" via zfcp_dbf_rec_run(). zfcp_dbf_rec_run() tries to acquire the DBF REC spin lock. If this is held by the infinitely looping kworker, it effectively spins forever. Example Sequence Diagram: Process ERP Thread rport_work ------------------- ------------------- ------------------- zfcp_erp_adapter_reopen() zfcp_erp_adapter_block() zfcp_scsi_schedule_rports_block() lock ERP zfcp_scsi_rport_work() zfcp_erp_action_enqueue(ZFCP_ERP_ACTION_REOPEN_ADAPTER) list_add_tail() on ready !(rport_task==RPORT_ADD) wake_up() ERP thread zfcp_scsi_rport_block() zfcp_dbf_rec_trig() zfcp_erp_strategy() zfcp_dbf_rec_trig() unlock ERP lock DBF REC zfcp_erp_wait() lock ERP | zfcp_erp_action_to_running() | list_for_each() ready | list_move() current entry | ready to running | zfcp_dbf_rec_run() endless loop over running | zfcp_dbf_rec_run_lvl() | lock DBF REC spins forever Any adapter recovery can trigger this, such as setting the device offline or reboot. V4.9 commit 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone") introduced additional tracing of (un)blocking of rports. It missed that the adapter->erp_lock must be held when calling zfcp_dbf_rec_trig(). This fix uses the approach formerly introduced by commit aa0fec62391c ("[SCSI] zfcp: Fix sparse warning by providing new entry in dbf") that got later removed by commit ae0904f60fab ("[SCSI] zfcp: Redesign of the debug tracing for recovery actions."). Introduce zfcp_dbf_rec_trig_lock(), a wrapper for zfcp_dbf_rec_trig() that acquires and releases the adapter->erp_lock for read. Reported-by: Sebastian Ott <sebott@linux.ibm.com> Signed-off-by: Jens Remus <jremus@linux.ibm.com> Fixes: 4eeaa4f3f1d6 ("zfcp: close window with unblocked rport during rport gone") Cc: <stable@vger.kernel.org> # 2.6.32+ Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26scsi: sg: allocate with __GFP_ZERO in sg_build_indirect()Alexander Potapenko
commit a45b599ad808c3c982fdcdc12b0b8611c2f92824 upstream. This shall help avoid copying uninitialized memory to the userspace when calling ioctl(fd, SG_IO) with an empty command. Reported-by: syzbot+7d26fc1eea198488deab@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Signed-off-by: Alexander Potapenko <glider@google.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26scsi: libsas: defer ata device eh commands to libataJason Yan
commit 318aaf34f1179b39fa9c30fa0f3288b645beee39 upstream. When ata device doing EH, some commands still attached with tasks are not passed to libata when abort failed or recover failed, so libata did not handle these commands. After these commands done, sas task is freed, but ata qc is not freed. This will cause ata qc leak and trigger a warning like below: WARNING: CPU: 0 PID: 28512 at drivers/ata/libata-eh.c:4037 ata_eh_finish+0xb4/0xcc CPU: 0 PID: 28512 Comm: kworker/u32:2 Tainted: G W OE 4.14.0#1 ...... Call trace: [<ffff0000088b7bd0>] ata_eh_finish+0xb4/0xcc [<ffff0000088b8420>] ata_do_eh+0xc4/0xd8 [<ffff0000088b8478>] ata_std_error_handler+0x44/0x8c [<ffff0000088b8068>] ata_scsi_port_error_handler+0x480/0x694 [<ffff000008875fc4>] async_sas_ata_eh+0x4c/0x80 [<ffff0000080f6be8>] async_run_entry_fn+0x4c/0x170 [<ffff0000080ebd70>] process_one_work+0x144/0x390 [<ffff0000080ec100>] worker_thread+0x144/0x418 [<ffff0000080f2c98>] kthread+0x10c/0x138 [<ffff0000080855dc>] ret_from_fork+0x10/0x18 If ata qc leaked too many, ata tag allocation will fail and io blocked for ever. As suggested by Dan Williams, defer ata device commands to libata and merge sas_eh_finish_cmd() with sas_eh_defer_cmd(). libata will handle ata qcs correctly after this. Signed-off-by: Jason Yan <yanaijie@huawei.com> CC: Xiaofei Tan <tanxiaofei@huawei.com> CC: John Garry <john.garry@huawei.com> CC: Dan Williams <dan.j.williams@intel.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390: use expoline thunks in the BPF JITMartin Schwidefsky
[ Upstream commit de5cb6eb514ebe241e3edeb290cb41deb380b81d ] The BPF JIT need safe guarding against spectre v2 in the sk_load_xxx assembler stubs and the indirect branches generated by the JIT itself need to be converted to expolines. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390: extend expoline to BC instructionsMartin Schwidefsky
[ Upstream commit 6deaa3bbca804b2a3627fd685f75de64da7be535 ] The BPF JIT uses a 'b <disp>(%r<x>)' instruction in the definition of the sk_load_word and sk_load_half functions. Add support for branch-on-condition instructions contained in the thunk code of an expoline. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390: move spectre sysfs attribute codeMartin Schwidefsky
[ Upstream commit 4253b0e0627ee3461e64c2495c616f1c8f6b127b ] The nospec-branch.c file is compiled without the gcc options to generate expoline thunks. The return branch of the sysfs show functions cpu_show_spectre_v1 and cpu_show_spectre_v2 is an indirect branch as well. These need to be compiled with expolines. Move the sysfs functions for spectre reporting to a separate file and loose an '.' for one of the messages. Cc: stable@vger.kernel.org # 4.16 Fixes: d424986f1d ("s390: add sysfs attributes for spectre") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390/kernel: use expoline for indirect branchesMartin Schwidefsky
[ Upstream commit c50c84c3ac4d5db683904bdb3257798b6ef980ae ] The assember code in arch/s390/kernel uses a few more indirect branches which need to be done with execute trampolines for CONFIG_EXPOLINE=y. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390/lib: use expoline for indirect branchesMartin Schwidefsky
[ Upstream commit 97489e0663fa700d6e7febddc43b58df98d7bcda ] The return from the memmove, memset, memcpy, __memset16, __memset32 and __memset64 functions are done with "br %r14". These are indirect branches as well and need to use execute trampolines for CONFIG_EXPOLINE=y. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390: move expoline assembler macros to a headerMartin Schwidefsky
[ Upstream commit 6dd85fbb87d1d6b87a3b1f02ca28d7b2abd2e7ba ] To be able to use the expoline branches in different assembler files move the associated macros from entry.S to a new header nospec-insn.h. While we are at it make the macros a bit nicer to use. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390: add assembler macros for CPU alternativesMartin Schwidefsky
[ Upstream commit fba9eb7946251d6e420df3bdf7bc45195be7be9a ] Add a header with macros usable in assembler files to emit alternative code sequences. It works analog to the alternatives for inline assmeblies in C files, with the same restrictions and capabilities. The syntax is ALTERNATIVE "<default instructions sequence>", \ "<alternative instructions sequence>", \ "<features-bit>" and ALTERNATIVE_2 "<default instructions sequence>", \ "<alternative instructions sqeuence #1>", \ "<feature-bit #1>", "<alternative instructions sqeuence #2>", \ "<feature-bit #2>" Reviewed-by: Vasily Gorbik <gor@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26ext2: fix a block leakAl Viro
commit 5aa1437d2d9a068c0334bd7c9dafa8ec4f97f13b upstream. open file, unlink it, then use ioctl(2) to make it immutable or append only. Now close it and watch the blocks *not* freed... Immutable/append-only checks belong in ->setattr(). Note: the bug is old and backport to anything prior to 737f2e93b972 ("ext2: convert to use the new truncate convention") will need these checks lifted into ext2_setattr(). Cc: stable@kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26tcp: purge write queue in tcp_connect_init()Eric Dumazet
[ Upstream commit 7f582b248d0a86bae5788c548d7bb5bca6f7691a ] syzkaller found a reliable way to crash the host, hitting a BUG() in __tcp_retransmit_skb() Malicous MSG_FASTOPEN is the root cause. We need to purge write queue in tcp_connect_init() at the point we init snd_una/write_seq. This patch also replaces the BUG() by a less intrusive WARN_ON_ONCE() kernel BUG at net/ipv4/tcp_output.c:2837! invalid opcode: 0000 [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 0 PID: 5276 Comm: syz-executor0 Not tainted 4.17.0-rc3+ #51 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:__tcp_retransmit_skb+0x2992/0x2eb0 net/ipv4/tcp_output.c:2837 RSP: 0000:ffff8801dae06ff8 EFLAGS: 00010206 RAX: ffff8801b9fe61c0 RBX: 00000000ffc18a16 RCX: ffffffff864e1a49 RDX: 0000000000000100 RSI: ffffffff864e2e12 RDI: 0000000000000005 RBP: ffff8801dae073a0 R08: ffff8801b9fe61c0 R09: ffffed0039c40dd2 R10: ffffed0039c40dd2 R11: ffff8801ce206e93 R12: 00000000421eeaad R13: ffff8801ce206d4e R14: ffff8801ce206cc0 R15: ffff8801cd4f4a80 FS: 0000000000000000(0000) GS:ffff8801dae00000(0063) knlGS:00000000096bc900 CS: 0010 DS: 002b ES: 002b CR0: 0000000080050033 CR2: 0000000020000000 CR3: 00000001c47b6000 CR4: 00000000001406f0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 Call Trace: <IRQ> tcp_retransmit_skb+0x2e/0x250 net/ipv4/tcp_output.c:2923 tcp_retransmit_timer+0xc50/0x3060 net/ipv4/tcp_timer.c:488 tcp_write_timer_handler+0x339/0x960 net/ipv4/tcp_timer.c:573 tcp_write_timer+0x111/0x1d0 net/ipv4/tcp_timer.c:593 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 Fixes: cf60af03ca4e ("net-tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Yuchung Cheng <ycheng@google.com> Cc: Neal Cardwell <ncardwell@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26sock_diag: fix use-after-free read in __sk_freeEric Dumazet
[ Upstream commit 9709020c86f6bf8439ca3effc58cfca49a5de192 ] We must not call sock_diag_has_destroy_listeners(sk) on a socket that has no reference on net structure. BUG: KASAN: use-after-free in sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline] BUG: KASAN: use-after-free in __sk_free+0x329/0x340 net/core/sock.c:1609 Read of size 8 at addr ffff88018a02e3a0 by task swapper/1/0 CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.17.0-rc5+ #54 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x1b9/0x294 lib/dump_stack.c:113 print_address_description+0x6c/0x20b mm/kasan/report.c:256 kasan_report_error mm/kasan/report.c:354 [inline] kasan_report.cold.7+0x242/0x2fe mm/kasan/report.c:412 __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433 sock_diag_has_destroy_listeners include/linux/sock_diag.h:75 [inline] __sk_free+0x329/0x340 net/core/sock.c:1609 sk_free+0x42/0x50 net/core/sock.c:1623 sock_put include/net/sock.h:1664 [inline] reqsk_free include/net/request_sock.h:116 [inline] reqsk_put include/net/request_sock.h:124 [inline] inet_csk_reqsk_queue_drop_and_put net/ipv4/inet_connection_sock.c:672 [inline] reqsk_timer_handler+0xe27/0x10e0 net/ipv4/inet_connection_sock.c:739 call_timer_fn+0x230/0x940 kernel/time/timer.c:1326 expire_timers kernel/time/timer.c:1363 [inline] __run_timers+0x79e/0xc50 kernel/time/timer.c:1666 run_timer_softirq+0x4c/0x70 kernel/time/timer.c:1692 __do_softirq+0x2e0/0xaf5 kernel/softirq.c:285 invoke_softirq kernel/softirq.c:365 [inline] irq_exit+0x1d1/0x200 kernel/softirq.c:405 exiting_irq arch/x86/include/asm/apic.h:525 [inline] smp_apic_timer_interrupt+0x17e/0x710 arch/x86/kernel/apic/apic.c:1052 apic_timer_interrupt+0xf/0x20 arch/x86/entry/entry_64.S:863 </IRQ> RIP: 0010:native_safe_halt+0x6/0x10 arch/x86/include/asm/irqflags.h:54 RSP: 0018:ffff8801d9ae7c38 EFLAGS: 00000282 ORIG_RAX: ffffffffffffff13 RAX: dffffc0000000000 RBX: 1ffff1003b35cf8a RCX: 0000000000000000 RDX: 1ffffffff11a30d0 RSI: 0000000000000001 RDI: ffffffff88d18680 RBP: ffff8801d9ae7c38 R08: ffffed003b5e46c3 R09: 0000000000000000 R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000001 R13: ffff8801d9ae7cf0 R14: ffffffff897bef20 R15: 0000000000000000 arch_safe_halt arch/x86/include/asm/paravirt.h:94 [inline] default_idle+0xc2/0x440 arch/x86/kernel/process.c:354 arch_cpu_idle+0x10/0x20 arch/x86/kernel/process.c:345 default_idle_call+0x6d/0x90 kernel/sched/idle.c:93 cpuidle_idle_call kernel/sched/idle.c:153 [inline] do_idle+0x395/0x560 kernel/sched/idle.c:262 cpu_startup_entry+0x104/0x120 kernel/sched/idle.c:368 start_secondary+0x426/0x5b0 arch/x86/kernel/smpboot.c:269 secondary_startup_64+0xa5/0xb0 arch/x86/kernel/head_64.S:242 Allocated by task 4557: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] kasan_kmalloc+0xc4/0xe0 mm/kasan/kasan.c:553 kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490 kmem_cache_alloc+0x12e/0x760 mm/slab.c:3554 kmem_cache_zalloc include/linux/slab.h:691 [inline] net_alloc net/core/net_namespace.c:383 [inline] copy_net_ns+0x159/0x4c0 net/core/net_namespace.c:423 create_new_namespaces+0x69d/0x8f0 kernel/nsproxy.c:107 unshare_nsproxy_namespaces+0xc3/0x1f0 kernel/nsproxy.c:206 ksys_unshare+0x708/0xf90 kernel/fork.c:2408 __do_sys_unshare kernel/fork.c:2476 [inline] __se_sys_unshare kernel/fork.c:2474 [inline] __x64_sys_unshare+0x31/0x40 kernel/fork.c:2474 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe Freed by task 69: save_stack+0x43/0xd0 mm/kasan/kasan.c:448 set_track mm/kasan/kasan.c:460 [inline] __kasan_slab_free+0x11a/0x170 mm/kasan/kasan.c:521 kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528 __cache_free mm/slab.c:3498 [inline] kmem_cache_free+0x86/0x2d0 mm/slab.c:3756 net_free net/core/net_namespace.c:399 [inline] net_drop_ns.part.14+0x11a/0x130 net/core/net_namespace.c:406 net_drop_ns net/core/net_namespace.c:405 [inline] cleanup_net+0x6a1/0xb20 net/core/net_namespace.c:541 process_one_work+0xc1e/0x1b50 kernel/workqueue.c:2145 worker_thread+0x1cc/0x1440 kernel/workqueue.c:2279 kthread+0x345/0x410 kernel/kthread.c:240 ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:412 The buggy address belongs to the object at ffff88018a02c140 which belongs to the cache net_namespace of size 8832 The buggy address is located 8800 bytes inside of 8832-byte region [ffff88018a02c140, ffff88018a02e3c0) The buggy address belongs to the page: page:ffffea0006280b00 count:1 mapcount:0 mapping:ffff88018a02c140 index:0x0 compound_mapcount: 0 flags: 0x2fffc0000008100(slab|head) raw: 02fffc0000008100 ffff88018a02c140 0000000000000000 0000000100000001 raw: ffffea00062a1320 ffffea0006268020 ffff8801d9bdde40 0000000000000000 page dumped because: kasan: bad access detected Fixes: b922622ec6ef ("sock_diag: don't broadcast kernel sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Craig Gallek <kraig@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26packet: in packet_snd start writing at link layer allocationWillem de Bruijn
[ Upstream commit b84bbaf7a6c8cca24f8acf25a2c8e46913a947ba ] Packet sockets allow construction of packets shorter than dev->hard_header_len to accommodate protocols with variable length link layer headers. These packets are padded to dev->hard_header_len, because some device drivers interpret that as a minimum packet size. packet_snd reserves dev->hard_header_len bytes on allocation. SOCK_DGRAM sockets call skb_push in dev_hard_header() to ensure that link layer headers are stored in the reserved range. SOCK_RAW sockets do the same in tpacket_snd, but not in packet_snd. Syzbot was able to send a zero byte packet to a device with massive 116B link layer header, causing padding to cross over into skb_shinfo. Fix this by writing from the start of the llheader reserved range also in the case of packet_snd/SOCK_RAW. Update skb_set_network_header to the new offset. This also corrects it for SOCK_DGRAM, where it incorrectly double counted reserve due to the skb_push in dev_hard_header. Fixes: 9ed988cd5915 ("packet: validate variable length ll headers") Reported-by: syzbot+71d74a5406d02057d559@syzkaller.appspotmail.com Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26net: test tailroom before appending to linear skbWillem de Bruijn
[ Upstream commit 113f99c3358564a0647d444c2ae34e8b1abfd5b9 ] Device features may change during transmission. In particular with corking, a device may toggle scatter-gather in between allocating and writing to an skb. Do not unconditionally assume that !NETIF_F_SG at write time implies that the same held at alloc time and thus the skb has sufficient tailroom. This issue predates git history. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26btrfs: fix reading stale metadata blocks after degraded raid1 mountsLiu Bo
commit 02a3307aa9c20b4f6626255b028f07f6cfa16feb upstream. If a btree block, aka. extent buffer, is not available in the extent buffer cache, it'll be read out from the disk instead, i.e. btrfs_search_slot() read_block_for_search() # hold parent and its lock, go to read child btrfs_release_path() read_tree_block() # read child Unfortunately, the parent lock got released before reading child, so commit 5bdd3536cbbe ("Btrfs: Fix block generation verification race") had used 0 as parent transid to read the child block. It forces read_tree_block() not to check if parent transid is different with the generation id of the child that it reads out from disk. A simple PoC is included in btrfs/124, 0. A two-disk raid1 btrfs, 1. Right after mkfs.btrfs, block A is allocated to be device tree's root. 2. Mount this filesystem and put it in use, after a while, device tree's root got COW but block A hasn't been allocated/overwritten yet. 3. Umount it and reload the btrfs module to remove both disks from the global @fs_devices list. 4. mount -odegraded dev1 and write some data, so now block A is allocated to be a leaf in checksum tree. Note that only dev1 has the latest metadata of this filesystem. 5. Umount it and mount it again normally (with both disks), since raid1 can pick up one disk by the writer task's pid, if btrfs_search_slot() needs to read block A, dev2 which does NOT have the latest metadata might be read for block A, then we got a stale block A. 6. As parent transid is not checked, block A is marked as uptodate and put into the extent buffer cache, so the future search won't bother to read disk again, which means it'll make changes on this stale one and make it dirty and flush it onto disk. To avoid the problem, parent transid needs to be passed to read_tree_block(). In order to get a valid parent transid, we need to hold the parent's lock until finishing reading child. This patch needs to be slightly adapted for stable kernels, the &first_key parameter added to read_tree_block() is from 4.16+ (581c1760415c4). The fix is to replace 0 by 'gen'. Fixes: 5bdd3536cbbe ("Btrfs: Fix block generation verification race") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26btrfs: fix crash when trying to resume balance without the resume flagAnand Jain
commit 02ee654d3a04563c67bfe658a05384548b9bb105 upstream. We set the BTRFS_BALANCE_RESUME flag in the btrfs_recover_balance() only, which isn't called during the remount. So when resuming from the paused balance we hit the bug: kernel: kernel BUG at fs/btrfs/volumes.c:3890! :: kernel: balance_kthread+0x51/0x60 [btrfs] kernel: kthread+0x111/0x130 :: kernel: RIP: btrfs_balance+0x12e1/0x1570 [btrfs] RSP: ffffba7d0090bde8 Reproducer: On a mounted filesystem: btrfs balance start --full-balance /btrfs btrfs balance pause /btrfs mount -o remount,ro /dev/sdb /btrfs mount -o remount,rw /dev/sdb /btrfs To fix this set the BTRFS_BALANCE_RESUME flag in btrfs_resume_balance_async(). CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Anand Jain <anand.jain@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26Btrfs: fix xattr loss after power failureFilipe Manana
commit 9a8fca62aacc1599fea8e813d01e1955513e4fad upstream. If a file has xattrs, we fsync it, to ensure we clear the flags BTRFS_INODE_NEEDS_FULL_SYNC and BTRFS_INODE_COPY_EVERYTHING from its inode, the current transaction commits and then we fsync it (without either of those bits being set in its inode), we end up not logging all its xattrs. This results in deleting all xattrs when replying the log after a power failure. Trivial reproducer $ mkfs.btrfs -f /dev/sdb $ mount /dev/sdb /mnt $ touch /mnt/foobar $ setfattr -n user.xa -v qwerty /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar $ sync $ xfs_io -c "pwrite -S 0xab 0 64K" /mnt/foobar $ xfs_io -c "fsync" /mnt/foobar <power failure> $ mount /dev/sdb /mnt $ getfattr --absolute-names --dump /mnt/foobar <empty output> $ So fix this by making sure all xattrs are logged if we log a file's inode item and neither the flags BTRFS_INODE_NEEDS_FULL_SYNC nor BTRFS_INODE_COPY_EVERYTHING were set in the inode. Fixes: 36283bf777d9 ("Btrfs: fix fsync xattr loss in the fast fsync path") Cc: <stable@vger.kernel.org> # 4.2+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26ARM: 8772/1: kprobes: Prohibit kprobes on get_user functionsMasami Hiramatsu
commit 0d73c3f8e7f6ee2aab1bb350f60c180f5ae21a2c upstream. Since do_undefinstr() uses get_user to get the undefined instruction, it can be called before kprobes processes recursive check. This can cause an infinit recursive exception. Prohibit probing on get_user functions. Fixes: 24ba613c9d6c ("ARM kprobes: core code") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26ARM: 8770/1: kprobes: Prohibit probing on optimized_callbackMasami Hiramatsu
commit 70948c05fdde0aac32f9667856a88725c192fa40 upstream. Prohibit probing on optimized_callback() because it is called from kprobes itself. If we put a kprobes on it, that will cause a recursive call loop. Mark it NOKPROBE_SYMBOL. Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26ARM: 8769/1: kprobes: Fix to use get_kprobe_ctlblk after irq-disabedMasami Hiramatsu
commit 69af7e23a6870df2ea6fa79ca16493d59b3eebeb upstream. Since get_kprobe_ctlblk() uses smp_processor_id() to access per-cpu variable, it hits smp_processor_id sanity check as below. [ 7.006928] BUG: using smp_processor_id() in preemptible [00000000] code: swapper/0/1 [ 7.007859] caller is debug_smp_processor_id+0x20/0x24 [ 7.008438] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.16.0-rc1-00192-g4eb17253e4b5 #1 [ 7.008890] Hardware name: Generic DT based system [ 7.009917] [<c0313f0c>] (unwind_backtrace) from [<c030e6d8>] (show_stack+0x20/0x24) [ 7.010473] [<c030e6d8>] (show_stack) from [<c0c64694>] (dump_stack+0x84/0x98) [ 7.010990] [<c0c64694>] (dump_stack) from [<c071ca5c>] (check_preemption_disabled+0x138/0x13c) [ 7.011592] [<c071ca5c>] (check_preemption_disabled) from [<c071ca80>] (debug_smp_processor_id+0x20/0x24) [ 7.012214] [<c071ca80>] (debug_smp_processor_id) from [<c03335e0>] (optimized_callback+0x2c/0xe4) [ 7.013077] [<c03335e0>] (optimized_callback) from [<bf0021b0>] (0xbf0021b0) To fix this issue, call get_kprobe_ctlblk() right after irq-disabled since that disables preemption. Fixes: 0dc016dbd820 ("ARM: kprobes: enable OPTPROBES for ARM 32") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26tick/broadcast: Use for_each_cpu() specially on UP kernelsDexuan Cui
commit 5596fe34495cf0f645f417eb928ef224df3e3cb4 upstream. for_each_cpu() unintuitively reports CPU0 as set independent of the actual cpumask content on UP kernels. This causes an unexpected PIT interrupt storm on a UP kernel running in an SMP virtual machine on Hyper-V, and as a result, the virtual machine can suffer from a strange random delay of 1~20 minutes during boot-up, and sometimes it can hang forever. Protect if by checking whether the cpumask is empty before entering the for_each_cpu() loop. [ tglx: Use !IS_ENABLED(CONFIG_SMP) instead of #ifdeffery ] Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poulson <jopoulso@microsoft.com> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org Cc: Rakib Mullick <rakib.mullick@gmail.com> Cc: Jork Loeser <Jork.Loeser@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: KY Srinivasan <kys@microsoft.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Link: https://lkml.kernel.org/r/KL1P15301MB000678289FE55BA365B3279ABF990@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM Link: https://lkml.kernel.org/r/KL1P15301MB0006FA63BC22BEB64902EAA0BF930@KL1P15301MB0006.APCP153.PROD.OUTLOOK.COM Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26ARM: 8771/1: kprobes: Prohibit kprobes on do_undefinstrMasami Hiramatsu
commit eb0146daefdde65665b7f076fbff7b49dade95b9 upstream. Prohibit kprobes on do_undefinstr because kprobes on arm is implemented by undefined instruction. This means if we probe do_undefinstr(), it can cause infinit recursive exception. Fixes: 24ba613c9d6c ("ARM kprobes: core code") Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26efi: Avoid potential crashes, fix the 'struct efi_pci_io_protocol_32' ↵Ard Biesheuvel
definition for mixed mode commit 0b3225ab9407f557a8e20f23f37aa7236c10a9b1 upstream. Mixed mode allows a kernel built for x86_64 to interact with 32-bit EFI firmware, but requires us to define all struct definitions carefully when it comes to pointer sizes. 'struct efi_pci_io_protocol_32' currently uses a 'void *' for the 'romimage' field, which will be interpreted as a 64-bit field on such kernels, potentially resulting in bogus memory references and subsequent crashes. Tested-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-efi@vger.kernel.org Link: http://lkml.kernel.org/r/20180504060003.19618-13-ard.biesheuvel@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390: remove indirect branch from do_softirq_own_stackMartin Schwidefsky
commit 9f18fff63cfd6f559daa1eaae60640372c65f84b upstream. The inline assembly to call __do_softirq on the irq stack uses an indirect branch. This can be replaced with a normal relative branch. Cc: stable@vger.kernel.org # 4.16 Fixes: f19fbd5ed6 ("s390: introduce execute-trampolines for branches") Reviewed-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390/qdio: don't release memory in qdio_setup_irq()Julian Wiedmann
commit 2e68adcd2fb21b7188ba449f0fab3bee2910e500 upstream. Calling qdio_release_memory() on error is just plain wrong. It frees the main qdio_irq struct, when following code still uses it. Also, no other error path in qdio_establish() does this. So trust callers to clean up via qdio_free() if some step of the QDIO initialization fails. Fixes: 779e6e1c724d ("[S390] qdio: new qdio driver.") Cc: <stable@vger.kernel.org> #v2.6.27+ Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390/cpum_sf: ensure sample frequency of perf event attributes is non-zeroHendrik Brueckner
commit 4bbaf2584b86b0772413edeac22ff448f36351b1 upstream. Correct a trinity finding for the perf_event_open() system call with a perf event attribute structure that uses a frequency but has the sampling frequency set to zero. This causes a FP divide exception during the sample rate initialization for the hardware sampling facility. Fixes: 8c069ff4bd606 ("s390/perf: add support for the CPU-Measurement Sampling Facility") Cc: stable@vger.kernel.org # 3.14+ Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26s390/qdio: fix access to uninitialized qdio_q fieldsJulian Wiedmann
commit e521813468f786271a87e78e8644243bead48fad upstream. Ever since CQ/QAOB support was added, calling qdio_free() straight after qdio_alloc() results in qdio_release_memory() accessing uninitialized memory (ie. q->u.out.use_cq and q->u.out.aobs). Followed by a kmem_cache_free() on the random AOB addresses. For older kernels that don't have 6e30c549f6ca, the same applies if qdio_establish() fails in the DEV_STATE_ONLINE check. While initializing q->u.out.use_cq would be enough to fix this particular bug, the more future-proof change is to just zero-alloc the whole struct. Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks") Cc: <stable@vger.kernel.org> #v3.2+ Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26mm: don't allow deferred pages with NEED_PER_CPU_KMPavel Tatashin
commit ab1e8d8960b68f54af42b6484b5950bd13a4054b upstream. It is unsafe to do virtual to physical translations before mm_init() is called if struct page is needed in order to determine the memory section number (see SECTION_IN_PAGE_FLAGS). This is because only in mm_init() we initialize struct pages for all the allocated memory when deferred struct pages are used. My recent fix in commit c9e97a1997 ("mm: initialize pages on demand during boot") exposed this problem, because it greatly reduced number of pages that are initialized before mm_init(), but the problem existed even before my fix, as Fengguang Wu found. Below is a more detailed explanation of the problem. We initialize struct pages in four places: 1. Early in boot a small set of struct pages is initialized to fill the first section, and lower zones. 2. During mm_init() we initialize "struct pages" for all the memory that is allocated, i.e reserved in memblock. 3. Using on-demand logic when pages are allocated after mm_init call (when memblock is finished) 4. After smp_init() when the rest free deferred pages are initialized. The problem occurs if we try to do va to phys translation of a memory between steps 1 and 2. Because we have not yet initialized struct pages for all the reserved pages, it is inherently unsafe to do va to phys if the translation itself requires access of "struct page" as in case of this combination: CONFIG_SPARSE && !CONFIG_SPARSE_VMEMMAP The following path exposes the problem: start_kernel() trap_init() setup_cpu_entry_areas() setup_cpu_entry_area(cpu) get_cpu_gdt_paddr(cpu) per_cpu_ptr_to_phys(addr) pcpu_addr_to_page(addr) virt_to_page(addr) pfn_to_page(__pa(addr) >> PAGE_SHIFT) We disable this path by not allowing NEED_PER_CPU_KM with deferred struct pages feature. The problems are discussed in these threads: http://lkml.kernel.org/r/20180418135300.inazvpxjxowogyge@wfg-t540p.sh.intel.com http://lkml.kernel.org/r/20180419013128.iurzouiqxvcnpbvz@wfg-t540p.sh.intel.com http://lkml.kernel.org/r/20180426202619.2768-1-pasha.tatashin@oracle.com Link: http://lkml.kernel.org/r/20180515175124.1770-1-pasha.tatashin@oracle.com Fixes: 3a80a7fa7989 ("mm: meminit: initialise a subset of struct pages if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set") Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Steven Sistare <steven.sistare@oracle.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Fengguang Wu <fengguang.wu@intel.com> Cc: Dennis Zhou <dennisszhou@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26powerpc/powernv: Fix NVRAM sleep in invalid context when crashingNicholas Piggin
commit c1d2a31397ec51f0370f6bd17b19b39152c263cb upstream. Similarly to opal_event_shutdown, opal_nvram_write can be called in the crash path with irqs disabled. Special case the delay to avoid sleeping in invalid context. Fixes: 3b8070335f75 ("powerpc/powernv: Fix OPAL NVRAM driver OPAL_BUSY loops") Cc: stable@vger.kernel.org # v3.2 Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26procfs: fix pthread cross-thread naming if !PR_DUMPABLEJanis Danisevskis
commit 1b3044e39a89cb1d4d5313da477e8dfea2b5232d upstream. The PR_DUMPABLE flag causes the pid related paths of the proc file system to be owned by ROOT. The implementation of pthread_set/getname_np however needs access to /proc/<pid>/task/<tid>/comm. If PR_DUMPABLE is false this implementation is locked out. This patch installs a special permission function for the file "comm" that grants read and write access to all threads of the same group regardless of the ownership of the inode. For all other threads the function falls back to the generic inode permission check. [akpm@linux-foundation.org: fix spello in comment] Signed-off-by: Janis Danisevskis <jdanis@google.com> Acked-by: Kees Cook <keescook@chromium.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: David Rientjes <rientjes@google.com> Cc: Minfei Huang <mnfhuang@gmail.com> Cc: John Stultz <john.stultz@linaro.org> Cc: Calvin Owens <calvinowens@fb.com> Cc: Jann Horn <jann@thejh.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26proc read mm's {arg,env}_{start,end} with mmap semaphore taken.Mateusz Guzik
commit a3b609ef9f8b1dbfe97034ccad6cd3fe71fbe7ab upstream. Only functions doing more than one read are modified. Consumeres happened to deal with possibly changing data, but it does not seem like a good thing to rely on. Signed-off-by: Mateusz Guzik <mguzik@redhat.com> Acked-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Jarod Wilson <jarod@redhat.com> Cc: Jan Stancek <jstancek@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anshuman Khandual <anshuman.linux@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26tracing/x86/xen: Remove zero data size trace events ↵Steven Rostedt (VMware)
trace_xen_mmu_flush_tlb{_all} commit 45dd9b0666a162f8e4be76096716670cf1741f0e upstream. Doing an audit of trace events, I discovered two trace events in the xen subsystem that use a hack to create zero data size trace events. This is not what trace events are for. Trace events add memory footprint overhead, and if all you need to do is see if a function is hit or not, simply make that function noinline and use function tracer filtering. Worse yet, the hack used was: __array(char, x, 0) Which creates a static string of zero in length. There's assumptions about such constructs in ftrace that this is a dynamic string that is nul terminated. This is not the case with these tracepoints and can cause problems in various parts of ftrace. Nuke the trace events! Link: http://lkml.kernel.org/r/20180509144605.5a220327@gandalf.local.home Cc: stable@vger.kernel.org Fixes: 95a7d76897c1e ("xen/mmu: Use Xen specific TLB flush instead of the generic one.") Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26cpufreq: intel_pstate: Enable HWP by defaultSrinivas Pandruvada
commit 7791e4aa59ad724e0b4c8b4dea547a5735108972 upstream. If the processor supports HWP, enable it by default without checking for the cpu model. This will allow to enable HWP in all supported processors without driver change. Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26signals: avoid unnecessary taking of sighand->siglockWaiman Long
commit c7be96af89d4b53211862d8599b2430e8900ed92 upstream. When running certain database workload on a high-end system with many CPUs, it was found that spinlock contention in the sigprocmask syscalls became a significant portion of the overall CPU cycles as shown below. 9.30% 9.30% 905387 dataserver /proc/kcore 0x7fff8163f4d2 [k] _raw_spin_lock_irq | ---_raw_spin_lock_irq | |--99.34%-- __set_current_blocked | sigprocmask | sys_rt_sigprocmask | system_call_fastpath | | | |--50.63%-- __swapcontext | | | | | |--99.91%-- upsleepgeneric | | | |--49.36%-- __setcontext | | ktskRun Looking further into the swapcontext function in glibc, it was found that the function always call sigprocmask() without checking if there are changes in the signal mask. A check was added to the __set_current_blocked() function to avoid taking the sighand->siglock spinlock if there is no change in the signal mask. This will prevent unneeded spinlock contention when many threads are trying to call sigprocmask(). With this patch applied, the spinlock contention in sigprocmask() was gone. Link: http://lkml.kernel.org/r/1474979209-11867-1-git-send-email-Waiman.Long@hpe.com Signed-off-by: Waiman Long <Waiman.Long@hpe.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Stas Sergeev <stsp@list.ru> Cc: Scott J Norton <scott.norton@hpe.com> Cc: Douglas Hatch <doug.hatch@hpe.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26mm: filemap: avoid unnecessary calls to lock_page when waiting for IO to ↵Mel Gorman
complete during a read commit ebded02788b5d7c7600f8cff26ae07896d568649 upstream. In the generic read paths the kernel looks up a page in the page cache and if it's up to date, it is used. If not, the page lock is acquired to wait for IO to complete and then check the page. If multiple processes are waiting on IO, they all serialise against the lock and duplicate the checks. This is unnecessary. The page lock in itself does not give any guarantees to the callers about the page state as it can be immediately truncated or reclaimed after the page is unlocked. It's sufficient to wait_on_page_locked and then continue if the page is up to date on wakeup. It is possible that a truncated but up-to-date page is returned but the reference taken during read prevents it disappearing underneath the caller and the data is still valid if PageUptodate. The overall impact is small as even if processes serialise on the lock, the lock section is tiny once the IO is complete. Profiles indicated that unlock_page and friends are generally a tiny portion of a read-intensive workload. An artificial test was created that had instances of dd access a cache-cold file on an ext4 filesystem and measure how long the read took. paralleldd 4.4.0 4.4.0 vanilla avoidlock Amean Elapsd-1 5.28 ( 0.00%) 5.15 ( 2.50%) Amean Elapsd-4 5.29 ( 0.00%) 5.17 ( 2.12%) Amean Elapsd-7 5.28 ( 0.00%) 5.18 ( 1.78%) Amean Elapsd-12 5.20 ( 0.00%) 5.33 ( -2.50%) Amean Elapsd-21 5.14 ( 0.00%) 5.21 ( -1.41%) Amean Elapsd-30 5.30 ( 0.00%) 5.12 ( 3.38%) Amean Elapsd-48 5.78 ( 0.00%) 5.42 ( 6.21%) Amean Elapsd-79 6.78 ( 0.00%) 6.62 ( 2.46%) Amean Elapsd-110 9.09 ( 0.00%) 8.99 ( 1.15%) Amean Elapsd-128 10.60 ( 0.00%) 10.43 ( 1.66%) The impact is small but intuitively, it makes sense to avoid unnecessary calls to lock_page. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26mm: filemap: remove redundant code in do_read_cache_pageMel Gorman
commit 32b635298ff4e991d8d8f64dc23782b02eec29c3 upstream. do_read_cache_page and __read_cache_page duplicate page filler code when filling the page for the first time. This patch simply removes the duplicate logic. Signed-off-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Jan Kara <jack@suse.cz> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26proc: meminfo: estimate available memory more conservativelyJohannes Weiner
commit 84ad5802a33a4964a49b8f7d24d80a214a096b19 upstream. The MemAvailable item in /proc/meminfo is to give users a hint of how much memory is allocatable without causing swapping, so it excludes the zones' low watermarks as unavailable to userspace. However, for a userspace allocation, kswapd will actually reclaim until the free pages hit a combination of the high watermark and the page allocator's lowmem protection that keeps a certain amount of DMA and DMA32 memory from userspace as well. Subtract the full amount we know to be unavailable to userspace from the number of free pages when calculating MemAvailable. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Cc: Rik van Riel <riel@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26vmscan: do not force-scan file lru if its absolute size is smallVladimir Davydov
commit 316bda0e6cc5f36f94b4af8bded16d642c90ad75 upstream. We assume there is enough inactive page cache if the size of inactive file lru is greater than the size of active file lru, in which case we force-scan file lru ignoring anonymous pages. While this logic works fine when there are plenty of page cache pages, it fails if the size of file lru is small (several MB): in this case (lru_size >> prio) will be 0 for normal scan priorities, as a result, if inactive file lru happens to be larger than active file lru, anonymous pages of a cgroup will never get evicted unless the system experiences severe memory pressure, even if there are gigabytes of unused anonymous memory there, which is unfair in respect to other cgroups, whose workloads might be page cache oriented. This patch attempts to fix this by elaborating the "enough inactive page cache" check: it makes it not only check that inactive lru size > active lru size, but also that we will scan something from the cgroup at the current scan priority. If these conditions do not hold, we proceed to SCAN_FRACT as usual. Signed-off-by: Vladimir Davydov <vdavydov@virtuozzo.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-05-26powerpc: Don't preempt_disable() in show_cpuinfo()Benjamin Herrenschmidt
commit 349524bc0da698ec77f2057cf4a4948eb6349265 upstream. This causes warnings from cpufreq mutex code. This is also rather unnecessary and ineffective. If we really want to prevent concurrent unplug, we could take the unplug read lock but I don't see this being critical. Fixes: cd77b5ce208c ("powerpc/powernv/cpufreq: Fix the frequency read by /proc/cpuinfo") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>