summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/machine_kexec_64.c
AgeCommit message (Collapse)Author
2014-12-16x86, irq: Move IOAPIC related declarations from hw_irq.h into io_apic.hJiang Liu
Clean up code by moving IOAPIC related declarations from hw_irq.h into io_apic.h. Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Joerg Roedel <joro@8bytes.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Rafael J. Wysocki <rjw@rjwysocki.net> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Grant Likely <grant.likely@linaro.org> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Christian Gmeiner <christian.gmeiner@gmail.com> Cc: Aubrey <aubrey.li@linux.intel.com> Cc: Ryan Desfosses <ryan@desfo.org> Cc: Quentin Lambert <lambert.quentin@gmail.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: http://lkml.kernel.org/r/1414397531-28254-14-git-send-email-jiang.liu@linux.intel.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-08-29kexec: create a new config option CONFIG_KEXEC_FILE for new syscallVivek Goyal
Currently new system call kexec_file_load() and all the associated code compiles if CONFIG_KEXEC=y. But new syscall also compiles purgatory code which currently uses gcc option -mcmodel=large. This option seems to be available only gcc 4.4 onwards. Hiding new functionality behind a new config option will not break existing users of old gcc. Those who wish to enable new functionality will require new gcc. Having said that, I am trying to figure out how can I move away from using -mcmodel=large but that can take a while. I think there are other advantages of introducing this new config option. As this option will be enabled only on x86_64, other arches don't have to compile generic kexec code which will never be used. This new code selects CRYPTO=y and CRYPTO_SHA256=y. And all other arches had to do this for CONFIG_KEXEC. Now with introduction of new config option, we can remove crypto dependency from other arches. Now CONFIG_KEXEC_FILE is available only on x86_64. So whereever I had CONFIG_X86_64 defined, I got rid of that. For CONFIG_KEXEC_FILE, instead of doing select CRYPTO=y, I changed it to "depends on CRYPTO=y". This should be safer as "select" is not recursive. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Tested-by: Shaun Ruffell <sruffell@digium.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kexec: verify the signature of signed PE bzImageVivek Goyal
This is the final piece of the puzzle of verifying kernel image signature during kexec_file_load() syscall. This patch calls into PE file routines to verify signature of bzImage. If signature are valid, kexec_file_load() succeeds otherwise it fails. Two new config options have been introduced. First one is CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be validly signed otherwise kernel load will fail. If this option is not set, no signature verification will be done. Only exception will be when secureboot is enabled. In that case signature verification should be automatically enforced when secureboot is enabled. But that will happen when secureboot patches are merged. Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option enables signature verification support on bzImage. If this option is not set and previous one is set, kernel image loading will fail because kernel does not have support to verify signature of bzImage. I tested these patches with both "pesign" and "sbsign" signed bzImages. I used signing_key.priv key and signing_key.x509 cert for signing as generated during kernel build process (if module signing is enabled). Used following method to sign bzImage. pesign ====== - Convert DER format cert to PEM format cert openssl x509 -in signing_key.x509 -inform DER -out signing_key.x509.PEM -outform PEM - Generate a .p12 file from existing cert and private key file openssl pkcs12 -export -out kernel-key.p12 -inkey signing_key.priv -in signing_key.x509.PEM - Import .p12 file into pesign db pk12util -i /tmp/kernel-key.p12 -d /etc/pki/pesign - Sign bzImage pesign -i /boot/vmlinuz-3.16.0-rc3+ -o /boot/vmlinuz-3.16.0-rc3+.signed.pesign -c "Glacier signing key - Magrathea" -s sbsign ====== sbsign --key signing_key.priv --cert signing_key.x509.PEM --output /boot/vmlinuz-3.16.0-rc3+.signed.sbsign /boot/vmlinuz-3.16.0-rc3+ Patch details: Well all the hard work is done in previous patches. Now bzImage loader has just call into that code and verify whether bzImage signature are valid or not. Also create two config options. First one is CONFIG_KEXEC_VERIFY_SIG. This option enforces that kernel has to be validly signed otherwise kernel load will fail. If this option is not set, no signature verification will be done. Only exception will be when secureboot is enabled. In that case signature verification should be automatically enforced when secureboot is enabled. But that will happen when secureboot patches are merged. Second config option is CONFIG_KEXEC_BZIMAGE_VERIFY_SIG. This option enables signature verification support on bzImage. If this option is not set and previous one is set, kernel image loading will fail because kernel does not have support to verify signature of bzImage. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Cc: Matt Fleming <matt@console-pimps.org> Cc: David Howells <dhowells@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kexec: support for kexec on panic using new system callVivek Goyal
This patch adds support for loading a kexec on panic (kdump) kernel usning new system call. It prepares ELF headers for memory areas to be dumped and for saved cpu registers. Also prepares the memory map for second kernel and limits its boot to reserved areas only. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kexec-bzImage64: support for loading bzImage using 64bit entryVivek Goyal
This is loader specific code which can load bzImage and set it up for 64bit entry. This does not take care of 32bit entry or real mode entry. 32bit mode entry can be implemented if somebody needs it. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kexec: load and relocate purgatory at kernel load timeVivek Goyal
Load purgatory code in RAM and relocate it based on the location. Relocation code has been inspired by module relocation code and purgatory relocation code in kexec-tools. Also compute the checksums of loaded kexec segments and store them in purgatory. Arch independent code provides this functionality so that arch dependent bootloaders can make use of it. Helper functions are provided to get/set symbol values in purgatory which are used by bootloaders later to set things like stack and entry point of second kernel etc. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-08-08kexec: implementation of new syscall kexec_file_loadVivek Goyal
Previous patch provided the interface definition and this patch prvides implementation of new syscall. Previously segment list was prepared in user space. Now user space just passes kernel fd, initrd fd and command line and kernel will create a segment list internally. This patch contains generic part of the code. Actual segment preparation and loading is done by arch and image specific loader. Which comes in next patch. [akpm@linux-foundation.org: coding-style fixes] Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Cc: Borislav Petkov <bp@suse.de> Cc: Michael Kerrisk <mtk.manpages@gmail.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric Biederman <ebiederm@xmission.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Greg Kroah-Hartman <greg@kroah.com> Cc: Dave Young <dyoung@redhat.com> Cc: WANG Chao <chaowang@redhat.com> Cc: Baoquan He <bhe@redhat.com> Cc: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-02-25x86, kaslr: export offset in VMCOREINFO ELF notesEugene Surovegin
Include kASLR offset in VMCOREINFO ELF notes to assist in debugging. [ hpa: pushing this for v3.14 to avoid having a kernel version with kASLR where we can't debug output. ] Signed-off-by: Eugene Surovegin <surovegin@google.com> Link: http://lkml.kernel.org/r/20140123173120.GA25474@www.outflux.net Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, kexec, 64bit: Only set ident mapping for ram.Yinghai Lu
We should set mappings only for usable memory ranges under max_pfn Otherwise causes same problem that is fixed by x86, mm: Only direct map addresses that are marked as E820_RAM This patch exposes pfn_mapped array, and only sets ident mapping for ranges in that array. This patch relies on new kernel_ident_mapping_init that could handle existing pgd/pud between different calls. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-25-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, kexec: Replace ident_mapping_init and init_level4_pageYinghai Lu
Now ident_mapping_init is checking if pgd/pud is present for every 2M, so several 2Ms are in same PUD, it will keep checking if pud is there with same pud. init_level4_page just does not check existing pgd/pud. We could use generic mapping_init with different settings in info to replace those two local grown version functions. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-24-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, kexec: Set ident mapping for kernel that is above max_pfnYinghai Lu
When first kernel is booted with memmap= or mem= to limit max_pfn. kexec can load second kernel above that max_pfn. We need to set ident mapping for whole image in this case instead of just for first 2M. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-23-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-09-22x86, cleanups: Use clear_page/copy_page rather than memset/memcpyJan Beulich
When operating on whole pages, use clear_page() and copy_page() in favor of memset() and memcpy(); after all that's what they are intended for. Signed-off-by: Jan Beulich <jbeulich@novell.com> LKML-Reference: <4C7FB8CA0200007800013F51@vpn.id2.novell.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2010-03-30include cleanup: Update gfp.h and slab.h includes to prepare for breaking ↵Tejun Heo
implicit slab.h inclusion from percpu.h percpu.h is included by sched.h and module.h and thus ends up being included when building most .c files. percpu.h includes slab.h which in turn includes gfp.h making everything defined by the two files universally available and complicating inclusion dependencies. percpu.h -> slab.h dependency is about to be removed. Prepare for this change by updating users of gfp and slab facilities include those headers directly instead of assuming availability. As this conversion needs to touch large number of source files, the following script is used as the basis of conversion. http://userweb.kernel.org/~tj/misc/slabh-sweep.py The script does the followings. * Scan files for gfp and slab usages and update includes such that only the necessary includes are there. ie. if only gfp is used, gfp.h, if slab is used, slab.h. * When the script inserts a new include, it looks at the include blocks and try to put the new include such that its order conforms to its surrounding. It's put in the include block which contains core kernel includes, in the same order that the rest are ordered - alphabetical, Christmas tree, rev-Xmas-tree or at the end if there doesn't seem to be any matching order. * If the script can't find a place to put a new include (mostly because the file doesn't have fitting include block), it prints out an error message indicating which .h file needs to be added to the file. The conversion was done in the following steps. 1. The initial automatic conversion of all .c files updated slightly over 4000 files, deleting around 700 includes and adding ~480 gfp.h and ~3000 slab.h inclusions. The script emitted errors for ~400 files. 2. Each error was manually checked. Some didn't need the inclusion, some needed manual addition while adding it to implementation .h or embedding .c file was more appropriate for others. This step added inclusions to around 150 files. 3. The script was run again and the output was compared to the edits from #2 to make sure no file was left behind. 4. Several build tests were done and a couple of problems were fixed. e.g. lib/decompress_*.c used malloc/free() wrappers around slab APIs requiring slab.h to be added manually. 5. The script was run on all .h files but without automatically editing them as sprinkling gfp.h and slab.h inclusions around .h files could easily lead to inclusion dependency hell. Most gfp.h inclusion directives were ignored as stuff from gfp.h was usually wildly available and often used in preprocessor macros. Each slab.h inclusion directive was examined and added manually as necessary. 6. percpu.h was updated not to include slab.h. 7. Build test were done on the following configurations and failures were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my distributed build env didn't work with gcov compiles) and a few more options had to be turned off depending on archs to make things build (like ipr on powerpc/64 which failed due to missing writeq). * x86 and x86_64 UP and SMP allmodconfig and a custom test config. * powerpc and powerpc64 SMP allmodconfig * sparc and sparc64 SMP allmodconfig * ia64 SMP allmodconfig * s390 SMP allmodconfig * alpha SMP allmodconfig * um on x86_64 SMP allmodconfig 8. percpu.h modifications were reverted so that it could be applied as a separate patch and serve as bisection point. Given the fact that I had only a couple of failures from tests on step 6, I'm fairly confident about the coverage of this conversion patch. If there is a breakage, it's likely to be something in one of the arch headers which should be easily discoverable easily on most builds of the specific arch. Signed-off-by: Tejun Heo <tj@kernel.org> Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
2009-06-02hw-breakpoints: cleanup HW Breakpoint registers before kexecK.Prasad
This patch disables Hardware breakpoints before doing a 'kexec' on the machine so that the cpu doesn't keep debug registers values which would be out of sync for the new image. Original-patch-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: K.Prasad <prasad@linux.vnet.ibm.com> Reviewed-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2009-05-07x86, kexec: fix crashdump panic with CONFIG_KEXEC_JUMPHuang Ying
Tim Starling reported that crashdump will panic with kernel compiled with CONFIG_KEXEC_JUMP due to null pointer deference in machine_kexec_32.c: machine_kexec(), when deferencing kexec_image. Refering to: http://bugzilla.kernel.org/show_bug.cgi?id=13265 This patch fixes the BUG via replacing global variable reference: kexec_image in machine_kexec() with local variable reference: image, which is more appropriate, and will not be null. Same BUG is in machine_kexec_64.c too, so fixed too in the same way. [ Impact: fix crash on kexec ] Reported-by: Tim Starling <tstarling@wikimedia.org> Signed-off-by: Huang Ying <ying.huang@intel.com> LKML-Reference: <1241751101.6259.85.camel@yhuang-dev.sh.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-10x86, kexec: x86_64: add kexec jump support for x86_64Huang Ying
Impact: New major feature This patch add kexec jump support for x86_64. More information about kexec jump can be found in corresponding x86_32 support patch. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-10x86, kexec: x86_64: add identity map for pages at image->startHuang Ying
Impact: Fix corner case that cannot yet occur image->start may be outside of 0 ~ max_pfn, for example when jumping back to original kernel from kexeced kenrel. This patch add identity map for pages at image->start. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-03-10x86, kexec: fix kexec x86 coding styleHuang Ying
Impact: Cleanup Fix some coding style issue for kexec x86. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-02-03x86: kexec: Use one page table in x86_64 machine_kexecHuang Ying
Impact: reduce kernel BSS size by 7 pages, improve code readability Two page tables are used in current x86_64 kexec implementation. One is used to jump from kernel virtual address to identity map address, the other is used to map all physical memory. In fact, on x86_64, there is no conflict between kernel virtual address space and physical memory space, so just one page table is sufficient. The page table pages used to map control page are dynamically allocated to save memory if kexec image is not loaded. ASM code used to map control page is replaced by C code too. Signed-off-by: Huang Ying <ying.huang@intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2008-07-26kexec jumpHuang Ying
This patch provides an enhancement to kexec/kdump. It implements the following features: - Backup/restore memory used by the original kernel before/after kexec. - Save/restore CPU state before/after kexec. The features of this patch can be used as a general method to call program in physical mode (paging turning off). This can be used to call BIOS code under Linux. kexec-tools needs to be patched to support kexec jump. The patches and the precompiled kexec can be download from the following URL: source: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-src_git_kh10.tar.bz2 patches: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec-tools-patches_git_kh10.tar.bz2 binary: http://khibernation.sourceforge.net/download/release_v10/kexec-tools/kexec_git_kh10 Usage example of calling some physical mode code and return: 1. Compile and install patched kernel with following options selected: CONFIG_X86_32=y CONFIG_KEXEC=y CONFIG_PM=y CONFIG_KEXEC_JUMP=y 2. Build patched kexec-tool or download the pre-built one. 3. Build some physical mode executable named such as "phy_mode" 4. Boot kernel compiled in step 1. 5. Load physical mode executable with /sbin/kexec. The shell command line can be as follow: /sbin/kexec --load-preserve-context --args-none phy_mode 6. Call physical mode executable with following shell command line: /sbin/kexec -e Implementation point: To support jumping without reserving memory. One shadow backup page (source page) is allocated for each page used by kexeced code image (destination page). When do kexec_load, the image of kexeced code is loaded into source pages, and before executing, the destination pages and the source pages are swapped, so the contents of destination pages are backupped. Before jumping to the kexeced code image and after jumping back to the original kernel, the destination pages and the source pages are swapped too. C ABI (calling convention) is used as communication protocol between kernel and called code. A flag named KEXEC_PRESERVE_CONTEXT for sys_kexec_load is added to indicate that the loaded kernel image is used for jumping back. Now, only the i386 architecture is supported. Signed-off-by: Huang Ying <ying.huang@intel.com> Acked-by: Vivek Goyal <vgoyal@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Machek <pavel@ucw.cz> Cc: Nigel Cunningham <nigel@nigel.suspend2.net> Cc: "Rafael J. Wysocki" <rjw@sisk.pl> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-07-14Merge branch 'auto-ftrace-next' into tracing/for-linusIngo Molnar
Conflicts: arch/x86/kernel/entry_32.S arch/x86/kernel/process_32.c arch/x86/kernel/process_64.c arch/x86/lib/Makefile include/asm-x86/irqflags.h kernel/Makefile kernel/sched.c Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-07-08x86: remove end_pfn in 64bitYinghai Lu
and use max_pfn directly. Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-05-23ftrace: fix kexecIngo Molnar
disable the tracer while kexec pulls the rug from under the old kernel. Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-04-02vmcoreinfo: add the symbol "phys_base"Ken'ichi Ohmichi
Fix the problem that makedumpfile sometimes fails on x86_64 machine. This patch adds the symbol "phys_base" to a vmcoreinfo data. The vmcoreinfo data has the minimum debugging information only for dump filtering. makedumpfile (dump filtering command) gets it to distinguish unnecessary pages, and makedumpfile creates a small dumpfile. On x86_64 kernel which compiled with CONFIG_PHYSICAL_START=0x0 and CONFIG_RELOCATABLE=y, makedumpfile fails like the following: # makedumpfile -d31 /proc/vmcore dumpfile The kernel version is not supported. The created dumpfile may be incomplete. _exclude_free_page: Can't get next online node. makedumpfile Failed. # The cause is the lack of the symbol "phys_base" in a vmcoreinfo data. If the symbol "phys_base" does not exist, makedumpfile considers an x86_64 kernel as non relocatable. As the result, makedumpfile misunderstands the physical address where the kernel is loaded, and it cannot translate a kernel virtual address to physical address correctly. To fix this problem, this patch adds the symbol "phys_base" to a vmcoreinfo data. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-02-07vmcoreinfo: fix the configuration dependenciesKen'ichi Ohmichi
This patch fixes the configuration dependencies in the vmcoreinfo data. i386's "node_data" is defined in arch/x86/mm/discontig_32.c, and x86_64's one is defined in arch/x86/mm/numa_64.c. They depend on CONFIG_NUMA: arch/x86/mm/Makefile_32:7 obj-$(CONFIG_NUMA) += discontig_32.o arch/x86/mm/Makefile_64:7 obj-$(CONFIG_NUMA) += numa_64.o ia64's "pgdat_list" is defined in arch/ia64/mm/discontig.c, and it depends on CONFIG_DISCONTIGMEM and CONFIG_SPARSEMEM: arch/ia64/mm/Makefile:9-10 obj-$(CONFIG_DISCONTIGMEM) += discontig.o obj-$(CONFIG_SPARSEMEM) += discontig.o ia64's "node_memblk" is defined in arch/ia64/mm/numa.c, and it depends on CONFIG_NUMA: arch/ia64/mm/Makefile:8 obj-$(CONFIG_NUMA) += numa.o Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Acked-by: Simon Horman <horms@verge.net.au> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-01-30x86: 64-bit, make sparsemem vmemmap the only memory modelChristoph Lameter
Use sparsemem as the only memory model for UP, SMP and NUMA. Measurements indicate that DISCONTIGMEM has a higher overhead than sparsemem. And FLATMEMs benefits are minimal. So I think its best to simply standardize on sparsemem. Results of page allocator tests (test can be had via git from slab git tree branch tests) Measurements in cycle counts. 1000 allocations were performed and then the average cycle count was calculated. Order FlatMem Discontig SparseMem 0 639 665 641 1 567 647 593 2 679 774 692 3 763 967 781 4 961 1501 962 5 1356 2344 1392 6 2224 3982 2336 7 4869 7225 5074 8 12500 14048 12732 9 27926 28223 28165 10 58578 58714 58682 (Note that FlatMem is an SMP config and the rest NUMA configurations) Memory use: SMP Sparsemem ------------- Kernel size: text data bss dec hex filename 3849268 397739 1264856 5511863 541ab7 vmlinux total used free shared buffers cached Mem: 8242252 41164 8201088 0 352 11512 -/+ buffers/cache: 29300 8212952 Swap: 9775512 0 9775512 SMP Flatmem ----------- Kernel size: text data bss dec hex filename 3844612 397739 1264536 5506887 540747 vmlinux So 4.5k growth in text size vs. FLATMEM. total used free shared buffers cached Mem: 8244052 40544 8203508 0 352 11484 -/+ buffers/cache: 28708 8215344 2k growth in overall memory use after boot. NUMA discontig: text data bss dec hex filename 3888124 470659 1276504 5635287 55fcd7 vmlinux total used free shared buffers cached Mem: 8256256 56908 8199348 0 352 11496 -/+ buffers/cache: 45060 8211196 Swap: 9775512 0 9775512 NUMA sparse: text data bss dec hex filename 3896428 470659 1276824 5643911 561e87 vmlinux 8k text growth. Given that we fully inline virt_to_page and friends now that is rather good. total used free shared buffers cached Mem: 8264720 57240 8207480 0 352 11516 -/+ buffers/cache: 45372 8219348 Swap: 9775512 0 9775512 The total available memory is increased by 8k. This patch makes sparsemem the default and removes discontig and flatmem support from x86. [ akpm@linux-foundation.org: allnoconfig build fix ] Acked-by: Andi Kleen <ak@suse.de> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-27x86: Dump filtering supports x86_64 sparsememKen'ichi Ohmichi
This patch adds the symbol "init_level4_pgt" to the vmcoreinfo data so that makedumpfile (dump filtering command) supports x86_64 sparsemem kernel of linux-2.6.24. makedumpfile creates a small dumpfile by excluding unnecessary pages for the analysis. It checks attributes in page structures and distinguishes necessary pages and unnecessary ones. To check them, makedumpfile gets the vmcoreinfo data which has the minimum debugging information only for dump filtering. For older x86_64 kernel (linux-2.6.23 or before), makedumpfile translates the virtual address of page structure into physical address by subtracting PAGE_OFFSET from virtual address, but this translation isn't effective for linux-2.6.24 sparsemem kernel, because its page structures are in virtual memmap area. makedumpfile should translate their virtual address by 4-levels paging and it needs the symbol "init_level4_pgt". Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2007-10-19Use extended crashkernel command line on x86_64Bernhard Walle
This patch removes the crashkernel parsing from arch/x86_64/kernel/machine_kexec.c and calls the generic function, introduced in the last patch, in setup_bootmem_allocator(). This is necessary because the amount of System RAM must be known in this function now because of the new syntax. Signed-off-by: Bernhard Walle <bwalle@suse.de> Cc: Andi Kleen <ak@suse.de> Cc: Vivek Goyal <vgoyal@in.ibm.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17add-vmcore: add a prefix "VMCOREINFO_" to the vmcoreinfo macrosKen'ichi Ohmichi
Add a prefix "VMCOREINFO_" to the vmcoreinfo macros. Old vmcoreinfo macros were defined as generic names SYMBOL/SIZE/OFFSET /LENGTH/CONFIG, and it is impossible to grep for them. So these names should be changed. This discussion is the following: http://www.ussg.iu.edu/hypermail/linux/kernel/0709.1/0415.html Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17Add vmcoreinfoKen'ichi Ohmichi
This patch set frees the restriction that makedumpfile users should install a vmlinux file (including the debugging information) into each system. makedumpfile command is the dump filtering feature for kdump. It creates a small dumpfile by filtering unnecessary pages for the analysis. To distinguish unnecessary pages, it needs a vmlinux file including the debugging information. These days, the debugging package becomes a huge file, and it is hard to install it into each system. To solve the problem, kdump developers discussed it at lkml and kexec-ml. As the result, we reached the conclusion that necessary information for dump filtering (called "vmcoreinfo") should be embedded into the first kernel file and it should be accessed through /proc/vmcore during the second kernel. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.0/1806.html) Dan Aloni created the patch set for the above implementation. (http://www.uwsg.iu.edu/hypermail/linux/kernel/0707.1/1053.html) And I updated it for multi architectures and memory models. (http://lists.infradead.org/pipermail/kexec/2007-August/000479.html) Signed-off-by: Dan Aloni <da-x@monatomic.org> Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-13Delete filenames in comments.Dave Jones
Since the x86 merge, lots of files that referenced their own filenames are no longer correct. Rather than keep them up to date, just delete them, as they add no real value. Additionally: - fix up comment formatting in scx200_32.c - Remove a credit from myself in setup_64.c from a time when we had no SCM - remove longwinded history from tsc_32.c which can be figured out from git. Signed-off-by: Dave Jones <davej@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-11x86_64: move kernelThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>