summaryrefslogtreecommitdiff
path: root/drivers/scsi/lpfc
AgeCommit message (Collapse)Author
2022-02-23scsi: lpfc: Fix pt2pt NVMe PRLI reject LOGO loopJames Smart
commit 7f4c5a26f735dea4bbc0eb8eb9da99cda95a8563 upstream. When connected point to point, the driver does not know the FC4's supported by the other end. In Fabrics, it can query the nameserver. Thus the driver must send PRLIs for the FC4s it supports and enable support based on the acc(ept) or rej(ect) of the respective FC4 PRLI. Currently the driver supports SCSI and NVMe PRLIs. Unfortunately, although the behavior is per standard, many devices have come to expect only SCSI PRLIs. In this particular example, the NVMe PRLI is properly RJT'd but the target decided that it must LOGO after seeing the unexpected NVMe PRLI. The LOGO causes the sequence to restart and login is now in an infinite failure loop. Fix the problem by having the driver, on a pt2pt link, remember NVMe PRLI accept or reject status across logout as long as the link stays "up". When retrying login, if the prior NVMe PRLI was rejected, it will not be sent on the next login. Link: https://lore.kernel.org/r/20220212163120.15385-1-jsmart2021@gmail.com Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-02-16scsi: lpfc: Remove NVMe support if kernel has NVME_FC disabledJames Smart
commit c80b27cfd93ba9f5161383f798414609e84729f3 upstream. The driver is initiating NVMe PRLIs to determine device NVMe support. This should not be occurring if CONFIG_NVME_FC support is disabled. Correct this by changing the default value for FC4 support. Currently it defaults to FCP and NVMe. With change, when NVME_FC support is not enabled in the kernel, the default value is just FCP. Link: https://lore.kernel.org/r/20220207180516.73052-1-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-01-27scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanupJames Smart
[ Upstream commit 7dd2e2a923173d637c272e483966be8e96a72b64 ] Extraneous teardown routines are present in the firmware dump path causing altered states in firmware captures. When a firmware dump is requested via sysfs, trigger the dump immediately without tearing down structures and changing adapter state. The driver shall rely on pre-existing firmware error state clean up handlers to restore the adapter. Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-01-05scsi: lpfc: Terminate string in lpfc_debugfs_nvmeio_trc_write()Dan Carpenter
[ Upstream commit 9020be114a47bf7ff33e179b3bb0016b91a098e6 ] The "mybuf" string comes from the user, so we need to ensure that it is NUL terminated. Link: https://lore.kernel.org/r/20211214070527.GA27934@kili Fixes: bd2cdd5e400f ("scsi: lpfc: NVME Initiator: Add debugfs support") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-11-26scsi: lpfc: Fix list_add() corruption in lpfc_drain_txq()James Smart
[ Upstream commit 99154581b05c8fb22607afb7c3d66c1bace6aa5d ] When parsing the txq list in lpfc_drain_txq(), the driver attempts to pass the requests to the adapter. If such an attempt fails, a local "fail_msg" string is set and a log message output. The job is then added to a completions list for cancellation. Processing of any further jobs from the txq list continues, but since "fail_msg" remains set, jobs are added to the completions list regardless of whether a wqe was passed to the adapter. If successfully added to txcmplq, jobs are added to both lists resulting in list corruption. Fix by clearing the fail_msg string after adding a job to the completions list. This stops the subsequent jobs from being added to the completions list unless they had an appropriate failure. Link: https://lore.kernel.org/r/20210910233159.115896-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-09-30scsi: lpfc: Use correct scnprintf() limitDan Carpenter
[ Upstream commit 6dacc371b77f473770ec646e220303a84fe96c11 ] The limit should be "PAGE_SIZE - len" instead of "PAGE_SIZE". We're not going to hit the limit so this fix will not affect runtime. Link: https://lore.kernel.org/r/20210916132331.GE25094@kili Fixes: 5b9e70b22cc5 ("scsi: lpfc: raise sg count for nvme to use available sg resources") Reviewed-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20scsi: lpfc: Fix crash when lpfc_sli4_hba_setup() fails to initialize the SGLsJames Smart
[ Upstream commit 5aa615d195f1e142c662cb2253f057c9baec7531 ] The driver is encountering a crash in lpfc_free_iocb_list() while performing initial attachment. Code review found this to be an errant failure path that was taken, jumping to a tag that then referenced structures that were uninitialized. Fix the failure path. Link: https://lore.kernel.org/r/20210514195559.119853-9-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-07-20scsi: lpfc: Fix "Unexpected timeout" error in direct attach topologyJames Smart
[ Upstream commit e30d55137edef47434c40d7570276a0846fe922c ] An 'unexpected timeout' message may be seen in a point-2-point topology. The message occurs when a PLOGI is received before the driver is notified of FLOGI completion. The FLOGI completion failure causes discovery to be triggered for a second time. The discovery timer is restarted but no new discovery activity is initiated, thus the timeout message eventually appears. In point-2-point, when discovery has progressed before the FLOGI completion is processed, it is not a failure. Add code to FLOGI completion to detect that discovery has progressed and exit the FLOGI handling (noop'ing it). Link: https://lore.kernel.org/r/20210514195559.119853-4-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-22scsi: lpfc: Fix illegal memory access on Abort IOCBsJames Smart
[ Upstream commit e1364711359f3ced054bda9920477c8bf93b74c5 ] In devloss timer handler and in backend calls to terminate remote port I/O, there is logic to walk through all active IOCBs and validate them to potentially trigger an abort request. This logic is causing illegal memory accesses which leads to a crash. Abort IOCBs, which may be on the list, do not have an associated lpfc_io_buf struct. The driver is trying to map an lpfc_io_buf struct on the IOCB and which results in a bogus address thus the issue. Fix by skipping over ABORT IOCBs (CLOSE IOCBs are ABORTS that don't send ABTS) in the IOCB scan logic. Link: https://lore.kernel.org/r/20210421234433.102079-1-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11scsi: lpfc: Remove unsupported mbox PORT_CAPABILITIES logicJames Smart
[ Upstream commit b62232ba8caccaf1954e197058104a6478fac1af ] SLI-4 does not contain a PORT_CAPABILITIES mailbox command (only SLI-3 does, and SLI-3 doesn't use it), yet there are SLI-4 code paths that have code to issue the command. The command will always fail. Remove the code for the mailbox command and leave only the resulting "failure path" logic. Link: https://lore.kernel.org/r/20210412013127.2387-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11scsi: lpfc: Fix error handling for mailboxes completed in MBX_POLL modeJames Smart
[ Upstream commit 304ee43238fed517faa123e034b593905b8679f8 ] In SLI-4, when performing a mailbox command with MBX_POLL, the driver uses the BMBX register to send the command rather than the MQ. A flag is set indicating the BMBX register is active and saves the mailbox job struct (mboxq) in the mbox_active element of the adapter. The routine then waits for completion or timeout. The mailbox job struct is not freed by the routine. In cases of timeout, the adapter will be reset. The lpfc_sli_mbox_sys_flush() routine will clean up the mbox in preparation for the reset. It clears the BMBX active flag and marks the job structure as MBX_NOT_FINISHED. But, it never frees the mboxq job structure. Expectation in both normal completion and timeout cases is that the issuer of the mbx command will free the structure. Unfortunately, not all calling paths are freeing the memory in cases of error. All calling paths were looked at and updated, if missing, to free the mboxq memory regardless of completion status. Link: https://lore.kernel.org/r/20210412013127.2387-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11scsi: lpfc: Fix crash when a REG_RPI mailbox fails triggering a LOGO responseJames Smart
[ Upstream commit fffd18ec6579c2d9c72b212169259062fe747888 ] Fix a crash caused by a double put on the node when the driver completed an ACC for an unsolicted abort on the same node. The second put was executed by lpfc_nlp_not_used() and is wrong because the completion routine executes the nlp_put when the iocbq was released. Additionally, the driver is issuing a LOGO then immediately calls lpfc_nlp_set_state to put the node into NPR. This call does nothing. Remove the lpfc_nlp_not_used call and additional set_state in the completion routine. Remove the lpfc_nlp_set_state post issue_logo. Isn't necessary. Link: https://lore.kernel.org/r/20210412013127.2387-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11scsi: lpfc: Fix pt2pt connection does not recover after LOGOJames Smart
[ Upstream commit bd4f5100424d17d4e560d6653902ef8e49b2fc1f ] On a pt2pt setup, between 2 initiators, if one side issues a a LOGO, there is no relogin attempt. The FC specs are grey in this area on which port (higher wwn or not) is to re-login. As there is no spec guidance, unconditionally re-PLOGI after the logout to ensure a login is re-established. Link: https://lore.kernel.org/r/20210301171821.3427-8-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-05-11scsi: lpfc: Fix incorrect dbde assignment when building target abts wqeJames Smart
[ Upstream commit 9302154c07bff4e7f7f43c506a1ac84540303d06 ] The wqe_dbde field indicates whether a Data BDE is present in Words 0:2 and should therefore should be clear in the abts request wqe. By setting the bit we can be misleading fw into error cases. Clear the wqe_dbde field. Link: https://lore.kernel.org/r/20210301171821.3427-2-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2021-03-24scsi: lpfc: Fix some error codes in debugfsDan Carpenter
commit 19f1bc7edf0f97186810e13a88f5b62069d89097 upstream. If copy_from_user() or kstrtoull() fail then the correct behavior is to return a negative error code. Link: https://lore.kernel.org/r/YEsbU/UxYypVrC7/@mwanda Fixes: f9bb2da11db8 ("[SCSI] lpfc 8.3.27: T10 additions for SLI4") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23scsi: lpfc: Make lpfc_defer_acc_rsp staticYueHaibing
commit fdb827e4a3f84cb92e286a821114ac0ad79c8281 upstream. Fix sparse warning: drivers/scsi/lpfc/lpfc_nportdisc.c:344:1: warning: symbol 'lpfc_defer_acc_rsp' was not declared. Should it be static? Link: https://lore.kernel.org/r/20200107014956.41748-1-yuehaibing@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2021-01-23scsi: lpfc: Make function lpfc_defer_pt2pt_acc staticzhengbin
commit f7cb0d0945ebc9879aff72cf7b3342fd1040ffaa upstream. Fix sparse warnings: drivers/scsi/lpfc/lpfc_nportdisc.c:290:1: warning: symbol 'lpfc_defer_pt2pt_acc' was not declared. Should it be static? Link: https://lore.kernel.org/r/1570183477-137273-1-git-send-email-zhengbin13@huawei.com Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhengbin <zhengbin13@huawei.com> Reviewed-by: Dick Kennedy <dick.kennedy@broadcom.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30scsi: lpfc: Re-fix use after free in lpfc_rq_buf_free()James Smart
commit e5785d3ec32f5f44dd88cd7b398e496742630469 upstream. Commit 9816ef6ecbc1 ("scsi: lpfc: Use after free in lpfc_rq_buf_free()") was made to correct a use after free condition in lpfc_rq_buf_free(). Unfortunately, a subsequent patch cut on a tree without the fix inadvertently reverted the fix. Put the fix back: Move the freeing of the rqb_entry to after the print function that references it. Link: https://lore.kernel.org/r/20201020202719.54726-4-james.smart@broadcom.com Fixes: 411de511c694 ("scsi: lpfc: Fix RQ empty firmware trap") Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-12-30scsi: lpfc: Fix invalid sleeping context in lpfc_sli4_nvmet_alloc()James Smart
commit 62e3a931db60daf94fdb3159d685a5bc6ad4d0cf upstream. The following calltrace was seen: BUG: sleeping function called from invalid context at mm/slab.h:494 ... Call Trace: dump_stack+0x9a/0xf0 ___might_sleep.cold.63+0x13d/0x178 slab_pre_alloc_hook+0x6a/0x90 kmem_cache_alloc_trace+0x3a/0x2d0 lpfc_sli4_nvmet_alloc+0x4c/0x280 [lpfc] lpfc_post_rq_buffer+0x2e7/0xa60 [lpfc] lpfc_sli4_hba_setup+0x6b4c/0xa4b0 [lpfc] lpfc_pci_probe_one_s4.isra.15+0x14f8/0x2280 [lpfc] lpfc_pci_probe_one+0x260/0x2880 [lpfc] local_pci_probe+0xd4/0x180 work_for_cpu_fn+0x51/0xa0 process_one_work+0x8f0/0x17b0 worker_thread+0x536/0xb50 kthread+0x30c/0x3d0 ret_from_fork+0x3a/0x50 A prior patch introduced a spin_lock_irqsave(hbalock) in the lpfc_post_rq_buffer() routine. Call trace is seen as the hbalock is held with interrupts disabled during a GFP_KERNEL allocation in lpfc_sli4_nvmet_alloc(). Fix by reordering locking so that hbalock not held when calling sli4_nvmet_alloc() (aka rqb_buf_list()). Link: https://lore.kernel.org/r/20201020202719.54726-2-james.smart@broadcom.com Fixes: 411de511c694 ("scsi: lpfc: Fix RQ empty firmware trap") Cc: <stable@vger.kernel.org> # v4.17+ Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-01scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supportedJames Smart
commit 7f04839ec4483563f38062b4dd90253e45447198 upstream. Initial FLOGIs are failing with the following message: lpfc 0000:13:00.1: 1:(0):0820 FLOGI Failed (x300). BBCredit Not Supported In a prior patch, the READ_SPARAM command was re-ordered to post after CONFIG_LINK as the driver is expected to update the driver's copy of the service parameters for the FLOGI payload. If the bb-credit recovery feature is enabled, this is fine. But on adapters were bb-credit recovery isn't enabled, it would cause the FLOGI to fail. Fix by restoring the original command order (READ_SPARAM before CONFIG_LINK), and after issuing CONFIG_LINK, detect bb-credit recovery support and reissuing READ_SPARAM to obtain the updated service parameters (effectively adding in the fix command order). [mkp: corrected SHA] Link: https://lore.kernel.org/r/20200911200147.110826-1-james.smart@broadcom.com Fixes: 835214f5d5f5 ("scsi: lpfc: Fix broken Credit Recovery after driver load") CC: <stable@vger.kernel.org> # v5.7+ Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-10-01scsi: lpfc: Fix coverity errors in fmdi attribute handlingJames Smart
[ Upstream commit 4cb9e1ddaa145be9ed67b6a7de98ca705a43f998 ] Coverity reported a memory corruption error for the fdmi attributes routines: CID 15768 [Memory Corruption] Out-of-bounds access on FDMI Sloppy coding of the fmdi structures. In both the lpfc_fdmi_attr_def and lpfc_fdmi_reg_port_list structures, a field was placed at the start of payload that may have variable content. The field was given an arbitrary type (uint32_t). The code then uses the field name to derive an address, which it used in things such as memset and memcpy. The memset sizes or memcpy lengths were larger than the arbitrary type, thus coverity reported an error. Fix by replacing the arbitrary fields with the real field structures describing the payload. Link: https://lore.kernel.org/r/20200128002312.16346-8-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01scsi: lpfc: Fix release of hwq to clear the eq relationshipJames Smart
[ Upstream commit 821bc882accaaaf1bbecf5c0ecef659443e3e8cb ] When performing reset testing, the eq's list for related hwqs was getting corrupted. In cases where there is not a 1:1 eq to hwq, the eq is shared. The eq maintains a list of hwqs utilizing it in case of cpu offlining and polling. During the reset, the hwqs are being torn down so they can be recreated. The recreation was getting confused by seeing a non-null eq assignment on the eq and the eq list became corrupt. Correct by clearing the hdwq eq assignment when the hwq is cleaned up. Link: https://lore.kernel.org/r/20200128002312.16346-6-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01scsi: lpfc: Fix RQ buffer leakage when no IOCBs availableJames Smart
[ Upstream commit 39c4f1a965a9244c3ba60695e8ff8da065ec6ac4 ] The driver is occasionally seeing the following SLI Port error, requiring reset and reinit: Port Status Event: ... error 1=0x52004a01, error 2=0x218 The failure means an RQ timeout. That is, the adapter had received asynchronous receive frames, ran out of buffer slots to place the frames, and the driver did not replenish the buffer slots before a timeout occurred. The driver should not be so slow in replenishing buffers that a timeout can occur. When the driver received all the frames of a sequence, it allocates an IOCB to put the frames in. In a situation where there was no IOCB available for the frame of a sequence, the RQ buffer corresponding to the first frame of the sequence was not returned to the FW. Eventually, with enough traffic encountering the situation, the timeout occurred. Fix by releasing the buffer back to firmware whenever there is no IOCB for the first frame. [mkp: typo] Link: https://lore.kernel.org/r/20200128002312.16346-2-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01scsi: lpfc: Fix incomplete NVME discovery when targetJames Smart
[ Upstream commit be0709e449ac9d9753a5c17e5b770d6e5e930e4a ] NVMe device re-discovery does not complete. Dev_loss_tmo messages seen on initiator after recovery from a link disturbance. The failing case is the following: When the driver (as a NVME target) receives a PLOGI, the driver initiates an "unreg rpi" mailbox command. While the mailbox command is in progress, the driver requests that an ACC be sent to the initiator. The target's ACC is received by the initiator and the initiator then transmits a PLOGI. The driver receives the PLOGI prior to receiving the completion for the PLOGI response WQE that sent the ACC. (Different delivery sources from the hw so the race is very possible). Given the PLOGI is prior to the ACC completion (signifying PLOGI exchange complete), the driver LS_RJT's the PRLI. The "unreg rpi" mailbox then completes. Since PRLI has been received, the driver transmits a PLOGI to restart discovery, which the initiator then ACC's. If the driver processes the (re)PLOGI ACC prior to the completing the handling for the earlier ACC it sent the intiators original PLOGI, there is no state change for completion of the (re)PLOGI. The ndlp remains in "PLOGI Sent" and the initiator continues sending PRLI's which are rejected by the target until timeout or retry is reached. Fix by: When in target mode, defer sending an ACC for the received PLOGI until unreg RPI completes. Link: https://lore.kernel.org/r/20191218235808.31922-2-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01scsi: lpfc: Fix kernel crash at lpfc_nvme_info_show during remote port bounceJames Smart
[ Upstream commit 6c1e803eac846f886cd35131e6516fc51a8414b9 ] When reading sysfs nvme_info file while a remote port leaves and comes back, a NULL pointer is encountered. The issue is due to ndlp list corruption as the the nvme_info_show does not use the same lock as the rest of the code. Correct by removing the rcu_xxx_lock calls and replace by the host_lock and phba->hbaLock spinlocks that are used by the rest of the driver. Given we're called from sysfs, we are safe to use _irq rather than _irqsave. Link: https://lore.kernel.org/r/20191105005708.7399-4-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-10-01scsi: lpfc: Fix pt2pt discovery on SLI3 HBAsJames Smart
[ Upstream commit 359e10f087dbb7b9c9f3035a8cc4391af45bd651 ] After exchanging PLOGI on an SLI-3 adapter, the PRLI exchange failed. Link trace showed the port was assigned a non-zero n_port_id, but didn't use the address on the PRLI. The assigned address is set on the port by the CONFIG_LINK mailbox command. The driver responded to the PRLI before the mailbox command completed. Thus the PRLI response used the old n_port_id. Defer the PRLI response until CONFIG_LINK completes. Link: https://lore.kernel.org/r/20190922035906.10977-2-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-23scsi: lpfc: Fix FLOGI/PLOGI receive race condition in pt2pt discoveryJames Smart
[ Upstream commit 7b08e89f98cee9907895fabb64cf437bc505ce9a ] The driver is unable to successfully login with remote device. During pt2pt login, the driver completes its FLOGI request with the remote device having WWN precedence. The remote device issues its own (delayed) FLOGI after accepting the driver's and, upon transmitting the FLOGI, immediately recognizes it has already processed the driver's FLOGI thus it transitions to sending a PLOGI before waiting for an ACC to its FLOGI. In the driver, the FLOGI is received and an ACC sent, followed by the PLOGI being received and an ACC sent. The issue is that the PLOGI reception occurs before the response from the adapter from the FLOGI ACC is received. Processing of the PLOGI sets state flags to perform the REG_RPI mailbox command and proceed with the rest of discovery on the port. The same completion routine used by both FLOGI and PLOGI is generic in nature. One of the things it does is clear flags, and those flags happen to drive the rest of discovery. So what happened was the PLOGI processing set the flags, the FLOGI ACC completion cleared them, thus when the PLOGI ACC completes it doesn't see the flags and stops. Fix by modifying the generic completion routine to not clear the rest of discovery flag (NLP_ACC_REGLOGIN) unless the completion is also associated with performing a mailbox command as part of its handling. For things such as FLOGI ACC, there isn't a subsequent action to perform with the adapter, thus there is no mailbox cmd ptr. PLOGI ACC though will perform REG_RPI upon completion, thus there is a mailbox cmd ptr. Link: https://lore.kernel.org/r/20200828175332.130300-3-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-09-03scsi: lpfc: Fix shost refcount mismatch when deleting vportDick Kennedy
[ Upstream commit 03dbfe0668e6692917ac278883e0586cd7f7d753 ] When vports are deleted, it is observed that there is memory/kthread leakage as the vport isn't fully being released. There is a shost reference taken in scsi_add_host_dma that is not released during scsi_remove_host. It was noticed that other drivers resolve this by doing a scsi_host_put after calling scsi_remove_host. The vport_delete routine is taking two references one that corresponds to an access to the scsi_host in the vport_delete routine and another that is released after the adapter mailbox command completes that destroys the VPI that corresponds to the vport. Remove one of the references taken such that the second reference that is put will complete the missing scsi_add_host_dma reference and the shost will be terminated. Link: https://lore.kernel.org/r/20200630215001.70793-8-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-08-21scsi: lpfc: nvmet: Avoid hang / use-after-free again when destroying targetportEwan D. Milne
[ Upstream commit af6de8c60fe9433afa73cea6fcccdccd98ad3e5e ] We cannot wait on a completion object in the lpfc_nvme_targetport structure in the _destroy_targetport() code path because the NVMe/fc transport will free that structure immediately after the .targetport_delete() callback. This results in a use-after-free, and a crash if slub_debug=FZPU is enabled. An earlier fix put put the completion on the stack, but commit 2a0fb340fcc8 ("scsi: lpfc: Correct localport timeout duration error") subsequently changed the code to reference the completion through a pointer in the object rather than the local stack variable. Fix this by using the stack variable directly. Link: https://lore.kernel.org/r/20200729231011.13240-1-emilne@redhat.com Fixes: 2a0fb340fcc8 ("scsi: lpfc: Correct localport timeout duration error") Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-30scsi: lpfc: Avoid another null dereference in lpfc_sli4_hba_unset()SeongJae Park
[ Upstream commit 46da547e21d6cefceec3fb3dba5ebbca056627fc ] Commit cdb42becdd40 ("scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu") has introduced static checker warnings for potential null dereferences in 'lpfc_sli4_hba_unset()' and commit 1ffdd2c0440d ("scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset") has tried to fix it. However, yet another potential null dereference is remaining. This commit fixes it. This bug was discovered and resolved using Coverity Static Analysis Security Testing (SAST) by Synopsys, Inc. Link: https://lore.kernel.org/r/20200623084122.30633-1-sjpark@amazon.com Fixes: 1ffdd2c0440d ("scsi: lpfc: resolve static checker warning inlpfc_sli4_hba_unset") Fixes: cdb42becdd40 ("scsi: lpfc: Replace io_channels for nvme and fcp with general hdw_queues per cpu") Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: SeongJae Park <sjpark@amazon.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-24scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited eventXiyu Yang
[ Upstream commit 7217e6e694da3aae6d17db8a7f7460c8d4817ebf ] In order to create or activate a new node, lpfc_els_unsol_buffer() invokes lpfc_nlp_init() or lpfc_enable_node() or lpfc_nlp_get(), all of them will return a reference of the specified lpfc_nodelist object to "ndlp" with increased refcnt. When lpfc_els_unsol_buffer() returns, local variable "ndlp" becomes invalid, so the refcount should be decreased to keep refcount balanced. The reference counting issue happens in one exception handling path of lpfc_els_unsol_buffer(). When "ndlp" in DEV_LOSS, the function forgets to decrease the refcnt increased by lpfc_nlp_init() or lpfc_enable_node() or lpfc_nlp_get(), causing a refcnt leak. Fix this issue by calling lpfc_nlp_put() when "ndlp" in DEV_LOSS. Link: https://lore.kernel.org/r/1590416184-52592-1-git-send-email-xiyuyang19@fudan.edu.cn Reviewed-by: Daniel Wagner <dwagner@suse.de> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Xiyu Yang <xiyuyang19@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-06-17scsi: lpfc: Fix negation of else clause in lpfc_prep_node_fc4typeDick Kennedy
commit f809da6db68a8be49e317f0ccfbced1af9258839 upstream. Implementation of a previous patch added a condition to an if check that always end up with the if test being true. Execution of the else clause was inadvertently negated. The additional condition check was incorrect and unnecessary after the other modifications had been done in that patch. Remove the check from the if series. Link: https://lore.kernel.org/r/20200501214310.91713-5-jsmart2021@gmail.com Fixes: b95b21193c85 ("scsi: lpfc: Fix loss of remote port after devloss due to lack of RPIs") Cc: <stable@vger.kernel.org> # v5.4+ Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-29scsi: lpfc: Fix crash in target side cable pulls hitting WAIT_FOR_UNREGJames Smart
[ Upstream commit 807e7353d8a7105ce884d22b0dbc034993c6679c ] Kernel is crashing with the following stacktrace: BUG: unable to handle kernel NULL pointer dereference at 00000000000005bc IP: lpfc_nvme_register_port+0x1a8/0x3a0 [lpfc] ... Call Trace: lpfc_nlp_state_cleanup+0x2b2/0x500 [lpfc] lpfc_nlp_set_state+0xd7/0x1a0 [lpfc] lpfc_cmpl_prli_prli_issue+0x1f7/0x450 [lpfc] lpfc_disc_state_machine+0x7a/0x1e0 [lpfc] lpfc_cmpl_els_prli+0x16f/0x1e0 [lpfc] lpfc_sli_sp_handle_rspiocb+0x5b2/0x690 [lpfc] lpfc_sli_handle_slow_ring_event_s4+0x182/0x230 [lpfc] lpfc_do_work+0x87f/0x1570 [lpfc] kthread+0x10d/0x130 ret_from_fork+0x35/0x40 During target side fault injections, it is possible to hit the NLP_WAIT_FOR_UNREG case in lpfc_nvme_remoteport_delete. A prior commit fixed a rebind and delete race condition, but called lpfc_nlp_put unconditionally. This triggered a deletion and the crash. Fix by movng nlp_put to inside the NLP_WAIT_FOR_UNREG case, where the nlp will be being unregistered/removed. Leave the reference if the flag isn't set. Link: https://lore.kernel.org/r/20200322181304.37655-8-jsmart2021@gmail.com Fixes: b15bd3e6212e ("scsi: lpfc: Fix nvme remoteport registration race conditions") Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29scsi: lpfc: Fix crash after handling a pci errorJames Smart
[ Upstream commit 4cd70891308dfb875ef31060c4a4aa8872630a2e ] Injecting EEH on a 32GB card is causing kernel oops The pci error handler is doing an IO flush and the offline code is also doing an IO flush. When the 1st flush is complete the hdwq is destroyed (freed), yet the second flush accesses the hdwq and crashes. Added a check in lpfc_sli4_fush_io_rings to check both the HBA_IOQ_FLUSH flag and the hdwq pointer to see if it is already set and not already freed. Link: https://lore.kernel.org/r/20200322181304.37655-6-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-29scsi: lpfc: Fix kasan slab-out-of-bounds error in lpfc_unreg_loginJames Smart
[ Upstream commit 38503943c89f0bafd9e3742f63f872301d44cbea ] The following kasan bug was called out: BUG: KASAN: slab-out-of-bounds in lpfc_unreg_login+0x7c/0xc0 [lpfc] Read of size 2 at addr ffff889fc7c50a22 by task lpfc_worker_3/6676 ... Call Trace: dump_stack+0x96/0xe0 ? lpfc_unreg_login+0x7c/0xc0 [lpfc] print_address_description.constprop.6+0x1b/0x220 ? lpfc_unreg_login+0x7c/0xc0 [lpfc] ? lpfc_unreg_login+0x7c/0xc0 [lpfc] __kasan_report.cold.9+0x37/0x7c ? lpfc_unreg_login+0x7c/0xc0 [lpfc] kasan_report+0xe/0x20 lpfc_unreg_login+0x7c/0xc0 [lpfc] lpfc_sli_def_mbox_cmpl+0x334/0x430 [lpfc] ... When processing the completion of a "Reg Rpi" login mailbox command in lpfc_sli_def_mbox_cmpl, a call may be made to lpfc_unreg_login. The vpi is extracted from the completing mailbox context and passed as an input for the next. However, the vpi stored in the mailbox command context is an absolute vpi, which for SLI4 represents both base + offset. When used with a non-zero base component, (function id > 0) this results in an out-of-range access beyond the allocated phba->vpi_ids array. Fix by subtracting the function's base value to get an accurate vpi number. Link: https://lore.kernel.org/r/20200322181304.37655-2-jsmart2021@gmail.com Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17scsi: lpfc: fix inlining of lpfc_sli4_cleanup_poll_list()James Smart
[ Upstream commit d480e57809a043333a3b9e755c0bdd43e10a9f12 ] Compilation can fail due to having an inline function reference where the function body is not present. Fix by removing the inline tag. Fixes: 93a4d6f40198 ("scsi: lpfc: Add registration for CPU Offline/Online events") Link: https://lore.kernel.org/r/20191111230401.12958-4-jsmart2021@gmail.com Reviewed-by: Ewan D. Milne <emilne@redhat.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17scsi: lpfc: Fix broken Credit Recovery after driver loadJames Smart
[ Upstream commit 835214f5d5f516a38069bc077c879c7da00d6108 ] When driver is set to enable bb credit recovery, the switch displayed the setting as inactive. If the link bounces, it switches to Active. During link up processing, the driver currently does a MBX_READ_SPARAM followed by a MBX_CONFIG_LINK. These mbox commands are queued to be executed, one at a time and the completion is processed by the worker thread. Since the MBX_READ_SPARAM is done BEFORE the MBX_CONFIG_LINK, the BB_SC_N bit is never set the the returned values. BB Credit recovery status only gets set after the driver requests the feature in CONFIG_LINK, which is done after the link up. Thus the ordering of READ_SPARAM needs to follow the CONFIG_LINK. Fix by reordering so that READ_SPARAM is done after CONFIG_LINK. Added a HBA_DEFER_FLOGI flag so that any FLOGI handling waits until after the READ_SPARAM is done so that the proper BB credit value is set in the FLOGI payload. Fixes: 6bfb16208298 ("scsi: lpfc: Fix configuration of BB credit recovery in service parameters") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200128002312.16346-4-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17scsi: lpfc: Fix configuration of BB credit recovery in service parametersJames Smart
[ Upstream commit 6bfb1620829825c01e1dcdd63b6a7700352babd9 ] The driver today is reading service parameters from the firmware and then overwriting the firmware-provided values with values of its own. There are some switch features that require preliminary FLOGI's that are switch-specific and done prior to the actual fabric FLOGI for traffic. The fw will perform those FLOGIs and will revise the service parameters for the features configured. As the driver later overwrites those values with its own values, it misconfigures things like BBSCN use by doing so. Correct by eliminating the driver-overwrite of firmware values. The driver correctly re-reads the service parameters after each link up to obtain the latest values from firmware. Link: https://lore.kernel.org/r/20191105005708.7399-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17scsi: lpfc: Fix Fabric hostname registration if system hostname changesJames Smart
[ Upstream commit e3ba04c9bad1d1c7f15df43da25e878045150777 ] There are reports of multiple ports on the same system displaying different hostnames in fabric FDMI displays. Currently, the driver registers the hostname at initialization and obtains the hostname via init_utsname()->nodename queried at the time the FC link comes up. Unfortunately, if the machine hostname is updated after initialization, such as via DHCP or admin command, the value registered initially will be incorrect. Fix by having the driver save the hostname that was registered with FDMI. The driver then runs a heartbeat action that will check the hostname. If the name changes, reregister the FMDI data. The hostname is used in RSNN_NN, FDMI RPA and FDMI RHBA. Link: https://lore.kernel.org/r/20191218235808.31922-5-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17scsi: lpfc: Add registration for CPU Offline/Online eventsJames Smart
[ Upstream commit 93a4d6f40198dffcca35d9a928c409f9290f1fe0 ] The recent affinitization didn't address cpu offlining/onlining. If an interrupt vector is shared and the low order cpu owning the vector is offlined, as interrupts are managed, the vector is taken offline. This causes the other CPUs sharing the vector will hang as they can't get io completions. Correct by registering callbacks with the system for Offline/Online events. When a cpu is taken offline, its eq, which is tied to an interrupt vector is found. If the cpu is the "owner" of the vector and if the eq/vector is shared by other CPUs, the eq is placed into a polled mode. Additionally, code paths that perform io submission on the "sharing CPUs" will check the eq state and poll for completion after submission of new io to a wq that uses the eq. Similarly, when a cpu comes back online and owns an offlined vector, the eq is taken out of polled mode and rearmed to start driving interrupts for eq. Link: https://lore.kernel.org/r/20191105005708.7399-9-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-04-17scsi: lpfc: Fix lpfc_io_buf resource leak in lpfc_get_scsi_buf_s4 error pathJames Smart
commit 0ab384a49c548baf132ccef249f78d9c6c506380 upstream. If a call to lpfc_get_cmd_rsp_buf_per_hdwq returns NULL (memory allocation failure), a previously allocated lpfc_io_buf resource is leaked. Fix by releasing the lpfc_io_buf resource in the failure path. Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Cc: <stable@vger.kernel.org> # v5.4+ Link: https://lore.kernel.org/r/20200128002312.16346-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-04-17nvme-fc: Revert "add module to ops template to allow module references"James Smart
commit 8c5c660529209a0e324c1c1a35ce3f83d67a2aa5 upstream. The original patch was to resolve the lldd being able to be unloaded while being used to talk to the boot device of the system. However, the end result of the original patch is that any driver unload while a nvme controller is live via the lldd is now being prohibited. Given the module reference, the module teardown routine can't be called, thus there's no way, other than manual actions to terminate the controllers. Fixes: 863fbae929c7 ("nvme_fc: add module to ops template to allow module references") Cc: <stable@vger.kernel.org> # v5.4+ Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-02-24scsi: lpfc: Fix: Rework setting of fdmi symbolic node name registrationJames Smart
[ Upstream commit df9166bfa7750bade5737ffc91fbd432e0354442 ] This patch reworks the fdmi symbolic node name data for the following two issues: - Correcting extraneous periods following the DV and HN fdmi data fields. - Avoiding buffer overflow issues when formatting the data. The fix to the fist issue is to just remove the characters. The fix to the second issue has all data being staged in temporary storage before being moved to the real buffer. Link: https://lore.kernel.org/r/20191218235808.31922-3-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23scsi: lpfc: use hdwq assigned cpu for allocationJames Smart
[ Upstream commit 4583a4f66b323c6e4d774be2649e83a4e7c7b78c ] Looking at the recent conversion from smp_processor_id() to raw_smp_processor_id(), realized that the allocation should be based on the cpu the hdwq is bound to, not the executing cpu. Revise to pull cpu number from the hdwq Fixes: 765ab6cdac3b ("scsi: lpfc: Fix a kernel warning triggered by lpfc_get_sgl_per_hdwq()") Link: https://lore.kernel.org/r/20191116003847.6141-1-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2020-01-23scsi: lpfc: Fix a kernel warning triggered by lpfc_get_sgl_per_hdwq()Bart Van Assche
commit 765ab6cdac3b681952da0e22184bf6cf1ae41cf8 upstream. Fix the following kernel bug report: BUG: using smp_processor_id() in preemptible [00000000] code: systemd-udevd/954 Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20191107052158.25788-2-bvanassche@acm.org Signed-off-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23scsi: lpfc: Fix hdwq sgl locks and irq handlingJames Smart
commit a4c21acca2be6729ecbe72eda9b08092725b0a77 upstream. Many of the sgl-per-hdwq paths are locking with spin_lock_irq() and spin_unlock_irq() and may unwittingly raising irq when it shouldn't. Hard deadlocks were seen around lpfc_scsi_prep_cmnd(). Fix by converting the locks to irqsave/irqrestore. Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-16-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23scsi: lpfc: Fix list corruption detected in lpfc_put_sgl_per_hdwqJames Smart
commit 35a635af54ce79881eb35ba20b64dcb1e81b0389 upstream. In lpfc_release_io_buf, an lpfc_io_buf is returned to the 'available' pool before any associated sgl or cmd and rsp buffers are returned via their respective 'put' routines. If xri rebalancing occurs and an lpfc_io_buf structure is reused quickly, there may be a race condition between release of old and association of new resources. Re-ordered lpfc_release_io_buf to release sgl and cmd/rsp buffer lists before releasing the lpfc_io_buf structure for re-use. Fixes: d79c9e9d4b3d ("scsi: lpfc: Support dynamic unbounded SGL lists on G7 hardware.") Link: https://lore.kernel.org/r/20190922035906.10977-17-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-23scsi: lpfc: fix: Coverity: lpfc_get_scsi_buf_s3(): Null pointer dereferencesJames Smart
commit 6f23f8c5c9f1be4eb17c035129c80e49000c18a7 upstream. Coverity reported the following:
2020-01-09scsi: lpfc: Fix rpi release when deleting vportJames Smart
commit 97acd0019d5dadd9c0e111c2083c889bfe548f25 upstream. A prior use-after-free mailbox fix solved it's problem by null'ing a ndlp pointer. However, further testing has shown that this change causes a later state change to occasionally be skipped, which results in a reference count never being decremented thus the rpi is never released, which causes a vport delete to never succeed. Revise the fix in the prior patch to no longer null the ndlp. Instead the RELEASE_RPI flag is set which will drive the release of the rpi. Given the new code was added at a deep indentation level, refactor the code block using a new routine that avoids the indentation issues. Fixes: 9b1640686470 ("scsi: lpfc: Fix use-after-free mailbox cmd completion") Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2020-01-09scsi: lpfc: Fix memory leak on lpfc_bsg_write_ebuf_set funcBo Wu
[ Upstream commit 9a1b0b9a6dab452fb0e39fe96880c4faf3878369 ] When phba->mbox_ext_buf_ctx.seqNum != phba->mbox_ext_buf_ctx.numBuf, dd_data should be freed before return SLI_CONFIG_HANDLED. When lpfc_sli_issue_mbox func return fails, pmboxq should be also freed in job_error tag. Link: https://lore.kernel.org/r/EDBAAA0BBBA2AC4E9C8B6B81DEEE1D6915E7A966@DGGEML525-MBS.china.huawei.com Signed-off-by: Bo Wu <wubo40@huawei.com> Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reviewed-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>