summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2008-06-24x86: use BOOTMEM_EXCLUSIVE on 32-bitBernhard Walle
commit d3942cff620bea073fc4e3c8ed878eb1e84615ce upstream This patch uses the BOOTMEM_EXCLUSIVE for crashkernel reservation also for i386 and prints a error message on failure. The patch is still for 2.6.26 since it is only bug fixing. The unification of reserve_crashkernel() between i386 and x86_64 should be done for 2.6.27. Signed-off-by: Bernhard Walle <bwalle@suse.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-21x86: disable mwait for AMD family 10H/11H CPUsThomas Gleixner
back-ported from upstream commit e9623b35599fcdbc00c16535cbefbb4d5578f4ab by Vegard Nossum The previous revert of 0c07ee38c9d4eb081758f5ad14bbffa7197e1aec left out the mwait disable condition for AMD family 10H/11H CPUs. Andreas Herrman said: It depends on the CPU. For AMD CPUs that support MWAIT this is wrong. Family 0x10 and 0x11 CPUs will enter C1 on HLT. Powersavings then depend on a clock divisor and current Pstate of the core. If all cores of a processor are in halt state (C1) the processor can enter the C1E (C1 enhanced) state. If mwait is used this will never happen. Thus HLT saves more power than MWAIT here. It might be best to switch off the mwait flag for these AMD CPU families like it was introduced with commit f039b754714a422959027cb18bb33760eb8153f0 (x86: Don't use MWAIT on AMD Family 10) Re-add the AMD families 10H/11H check and disable the mwait usage for those. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-21x86: remove mwait capability C-state checkIngo Molnar
back-ported from upstream commit a738d897b7b03b83488ae74a9bc03d26a2875dc6 by Vegard Nossum Vegard Nossum reports: | powertop shows between 200-400 wakeups/second with the description | "<kernel IPI>: Rescheduling interrupts" when all processors have load (e.g. | I need to run two busy-loops on my 2-CPU system for this to show up). | | The bisect resulted in this commit: | | commit 0c07ee38c9d4eb081758f5ad14bbffa7197e1aec | Date: Wed Jan 30 13:33:16 2008 +0100 | | x86: use the correct cpuid method to detect MWAIT support for C states remove the functional effects of this patch and make mwait unconditional. A future patch will turn off mwait on specific CPUs where that causes power to be wasted. Bisected-by: Vegard Nossum <vegard.nossum@gmail.com> Tested-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Vegard Nossum <vegard.nossum@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-21x86-64: Fix "bytes left to copy" return value for copy_from_user()Linus Torvalds
commit 42a886af728c089df8da1b0017b0e7e6c81b5335 upstream Most users by far do not care about the exact return value (they only really care about whether the copy succeeded in its entirety or not), but a few special core routines actually care deeply about exactly how many bytes were copied from user space. And the unrolled versions of the x86-64 user copy routines would sometimes report that it had copied more bytes than it actually had. Very few uses actually have partial copies to begin with, but to make this bug even harder to trigger, most x86 CPU's use the "rep string" instructions for normal user copies, and that version didn't have this issue. To make it even harder to hit, the one user of this that really cared about the return value (and used the uncached version of the copy that doesn't use the "rep string" instructions) was the generic write routine, which pre-populated its source, once more hiding the problem by avoiding the exception case that triggers the bug. In other words, very special thanks to Bron Gondwana who not only triggered this, but created a test-program to show it, and bisected the behavior down to commit 08291429cfa6258c4cd95d8833beb40f828b194e ("mm: fix pagecache write deadlocks") which changed the access pattern just enough that you can now trigger it with 'writev()' with multiple iovec's. That commit itself was not the cause of the bug, it just allowed all the stars to align just right that you could trigger the problem. [ Side note: this is just the minimal fix to make the copy routines (with __copy_from_user_inatomic_nocache as the particular version that was involved in showing this) have the right return values. We really should improve on the exceptional case further - to make the copy do a byte-accurate copy up to the exact page limit that causes it to fail. As it is, the callers have to do extra work to handle the limit case gracefully. ] Reported-by: Bron Gondwana <brong@fastmail.fm> Cc: Nick Piggin <npiggin@suse.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Al Viro <viro@ZenIV.linux.org.uk> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-16x86: fix recursive dependenciesRoman Zippel
commit 823c248e7cc75b4f22da914b01f8e5433cff197e in mainline The proper dependency check uncovered a few dependency problems, the subarchitecture used a mixture of selects and depends on SMP and PCI dependency was messed up. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
2008-06-16Kconfig: introduce ARCH_DEFCONFIG to DEFCONFIG_LISTSam Ravnborg
commit 73531905ed53576d9e8707659a761e7046a60497 in mainline. init/Kconfig contains a list of configs that are searched for if 'make *config' are used with no .config present. Extend this list to look at the config identified by ARCH_DEFCONFIG. With this change we now try the defconfig targets last. This fixes a regression reported by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-09x86: fix bad pmd ffff810000207xxx(9090909090909090)Hugh Dickins
upstream commit: 2884f110d5409714f3a04eeb6d2ecd77da66b242 OGAWA Hirofumi and Fede have reported rare pmd_ERROR messages: mm/memory.c:127: bad pmd ffff810000207xxx(9090909090909090). Initialization's cleanup_highmap was leaving alignment filler behind in the pmd for MODULES_VADDR: when vmalloc's guard page would occupy a new page table, it's not allocated, and then module unload's vfree hits the bad 9090 pmd entry left over. Signed-off-by: Hugh Dickins <hugh@veritas.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Hugh notes: It's actually not a serious problem, but it does look as if it's a serious problem, so we should stamp it out. Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09x86, fpu: fix CONFIG_PREEMPT=y corruption of application's FPU stackSuresh Siddha
upstream commit: 870568b39064cab2dd971fe57969916036982862 Jürgen Mell reported an FPU state corruption bug under CONFIG_PREEMPT, and bisected it to commit v2.6.19-1363-gacc2076, "i386: add sleazy FPU optimization". Add tsk_used_math() checks to prevent calling math_state_restore() which can sleep in the case of !tsk_used_math(). This prevents making a blocking call in __switch_to(). Apparently "fpu_counter > 5" check is not enough, as in some signal handling and fork/exec scenarios, fpu_counter > 5 and !tsk_used_math() is possible. It's a side effect though. This is the failing scenario: process 'A' in save_i387_ia32() just after clear_used_math() Got an interrupt and pre-empted out. At the next context switch to process 'A' again, kernel tries to restore the math state proactively and sees a fpu_counter > 0 and !tsk_used_math() This results in init_fpu() during the __switch_to()'s math_state_restore() And resulting in fpu corruption which will be saved/restored (save_i387_fxsave and restore_i387_fxsave) during the remaining part of the signal handling after the context switch. Bisected-by: Jürgen Mell <j.mell@t-online.de> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Tested-by: Jürgen Mell <j.mell@t-online.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09CPUFREQ: Make acpi-cpufreq more robust against BIOS freq changes behind our ↵Venkatesh Pallipadi
back. upstream commit: e56a727b023d40d1adf660168883f30f2e6abe0a We checked the hardware freq with OS cached freq value in get_cur_freqon_cpu(). Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Dave Jones <davej@redhat.com> Cc: Thomas Renninger <trenn@suse.de> Cc: "Anthony L. Awtrey" <tony@awtrey.com> [chrisw: backport to 2.6.25.4] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09Revert "PCI: remove default PCI expansion ROM memory allocation"Linus Torvalds
upstream commit: 8d539108560ec121d59eee05160236488266221c This reverts commit 9f8daccaa05c14e5643bdd4faf5aed9cc8e6f11e, which was reported to break X startup (xf86-video-ati-6.8.0). See http://bugs.freedesktop.org/show_bug.cgi?id=15523 for details. Reported-by: Laurence Withers <l@lwithers.me.uk> Cc: Gary Hade <garyhade@us.ibm.com> Cc: Greg KH <greg@kroah.com> Cc: Jan Beulich <jbeulich@novell.com> Cc: "Jun'ichi Nomura" <j-nomura@ce.jp.nec.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> [cebbert@redhat.com: backport, remove first hunk to make port easier] [chrisw@sous-sol.org: add back first hunk] Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09x86: disable TSC for sched_clock() when calibration failedThomas Gleixner
upstream commit: 74dc51a3de06aa516e3b9fdc4017b2aeb38bf44b When the TSC calibration fails then TSC is still used in sched_clock(). Disable it completely in that case. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09x86: distangle user disabled TSC from unstableThomas Gleixner
upstream commit: 9ccc906c97e34fd91dc6aaf5b69b52d824386910 tsc_enabled is set to 0 from the command line switch "notsc" and from the mark_tsc_unstable code. Seperate those functionalities and replace tsc_enable with tsc_disable. This makes also the native_sched_clock() decision when to use TSC understandable. Preparatory patch to solve the sched_clock() issue on 32 bit. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09x86: if we cannot calibrate the TSC, we panic.Rusty Russell
upstream commit: 3c2047cd32b1a8c782d7efab72707e7daa251625 The current tsc_init() clears the TSC feature bit if the TSC khz cannot be calculated, causing us to panic in arch/x86/kernel/cpu/bugs.c check_config(). We should simply mark it unstable. Frankly, someone should take an axe to this code. mark_tsc_unstable() not only marks it unstable, but sets tsc_enabled to 0, which seems redundant but is actually important here because means it won't be used by sched_clock() either. Perhaps a tristate enum "UNUSABLE, UNSTABLE, OK" would be clearer, and separate mark_tsc_unstable() and mark_tsc_broken() functions? Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09x86: fix setup of cyc2ns in tsc_64.cThomas Gleixner
upstream commit: b6db80ee1331e7beaeb91b4b3d946dd16c72e388 When the TSC is calibrated against the PIT due to the nonavailability of PMTIMER/HPET or due to SMI interference then the setup of the per CPU cyc2ns variables is skipped. This is unlikely to happen but it would definitely render sched_clock() unusable. This was introduced with commit 53d517cdbaac704352b3d0c10fecb99e0b54572e x86: scale cyc_2_nsec according to CPU frequency Update the per CPU cyc2ns variables in all exit pathes of tsc_calibrate. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09x86: don't read maxlvt before checking if APIC is mappedChuck Ebbert
upstream commit: 2584a82deed7196f48066f1b1a7fad4ec5bea961 A check for unmapped apic was added before reading maxlvt but the early read of maxlvt wasn't removed. Signed-off-by: Chuck Ebbert <cebbert@redhat.com> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-06-09i386: fix asm constraint in do_IRQ()Jan Beulich
upstream commit: 5065dbafc299507f16731434e95b91dadff03006 i386: fix asm constraint in do_IRQ() Two prior changes resulted in the "ecx" clobber being lost. Signed-off-by: Jan Beulich <jbeulich@novell.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Chris Wright <chrisw@sous-sol.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-06-09x86: user_regset_view table fix for ia32 on 64-bitRoland McGrath
commit 1f465f4e475454b8bb590846c50a9d16e8046f3d upstream The user_regset_view table for the 32-bit regsets on the 64-bit build had the wrong sizes for the FP regsets. This bug had no user-visible effect (just on kernel modules using the user_regset interfaces and the like). But the fix is trivial and risk-free. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Chris Wright <chrisw@sous-sol.org>
2008-05-15x86: use defconfigs from x86/configs/*Sam Ravnborg
commit b9b39bfba5b0de3418305f01cfa7bc55a16004e1 upstream x86: use defconfigs from x86/configs/* Daniel Drake <dsd@gentoo.org> reported: In 2.6.23, if you unpacked a kernel source tarball and then ran "make menuconfig" you'd be presented with this message: # using defaults found in arch/i386/defconfig and the default options would be set. The same thing in 2.6.24 does not give you any "using defaults" message, and the default config options within menuconfig are rather blank (e.g. no PCI support). You can work around this by explicitly running "make defconfig" before menuconfig, but it would be nice to have the behaviour the way it was for 2.6.23 (and the way it still is for other archs). Fixed by adding a x86 specific defconfig list to Kconfig. Fixes: http://bugzilla.kernel.org/show_bug.cgi?id=10470 Tested-by: Daniel Drake <dsd@gentoo.org> Signed-off-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-09x86 PCI: call dmi_check_pciprobe()Yinghai Lu
This is a backport of the noted commit which is in 2.6.26-rc1 now. This is necessary to enable pci=bfsort automatically on a number of Dell and HP servers, as well as pci=assign-busses for a few other systems, which was broken between 2.6.22 and 2.6.23. commit 0df18ff366853cdf31e5238764ec5c63e6b5a398 upstream x86 PCI: call dmi_check_pciprobe() this change: | commit 08f1c192c3c32797068bfe97738babb3295bbf42 | Author: Muli Ben-Yehuda <muli@il.ibm.com> | Date: Sun Jul 22 00:23:39 2007 +0300 | | x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata | | This patch introduces struct pci_sysdata to x86 and x86-64, and | converts the existing two users (NUMA, Calgary) to use it. | | This lays the groundwork for having other users of sysdata, such as | the PCI domains work. | | The Calgary bits are tested, the NUMA bits just look ok. replaces pcibios_scan_root with pci_scan_bus_parented... but in pcibios_scan_root we have a DMI check: dmi_check_system(pciprobe_dmi_table); when when have several peer root buses this could be called multiple times (which is bad), so move that call to pci_access_init(). Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Matt Domsch <Matt_Domsch@dell.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01x86: Fix 32-bit x86 MSI-X allocation leakagePJ Waskiewicz
commit 9d9ad4b51d2b29b5bbeb4011f5e76f7538119cf9 upstream This bug was introduced in the 2.6.24 lguest merge, where MSI-X vector allocation will eventually fail. The cause is the new bit array tracking used vectors is not getting cleared properly on IRQ destruction on the 32-bit APIC code. This can be seen easily using the ixgbe 10 GbE driver on multi-core systems by simply loading and unloading the driver a few times. Depending on the number of available vectors on the host system, the MSI-X allocation will eventually fail, and the driver will only be able to use legacy interrupts. Signed-off-by: Peter P Waskiewicz Jr <peter.p.waskiewicz.jr@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-05-01x86, pci: fix off-by-one errors in some pirq warningsBjörn Steinbrink
commit 223ac2f42d49dd0324ca02ea15897ead1a2f5133 upstream. fix bogus pirq warnings reported in: http://bugzilla.kernel.org/show_bug.cgi?id=10366 safe to be backported to v2.6.25 and earlier. Signed-off-by: Björn Steinbrink <B.Steinbrink@gmx.de> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2008-04-15acpi: unneccessary to scan the PCI bus already scannedyakui.zhao@intel.com
http://bugzilla.kernel.org/show_bug.cgi?id=10124 this change: commit 08f1c192c3c32797068bfe97738babb3295bbf42 Author: Muli Ben-Yehuda <muli@il.ibm.com> Date: Sun Jul 22 00:23:39 2007 +0300 x86-64: introduce struct pci_sysdata to facilitate sharing of ->sysdata This patch introduces struct pci_sysdata to x86 and x86-64, and converts the existing two users (NUMA, Calgary) to use it. This lays the groundwork for having other users of sysdata, such as the PCI domains work. The Calgary bits are tested, the NUMA bits just look ok. replaces pcibios_scan_root by pci_scan_bus_parented... but in pcibios_scan_root we have a check about scanned busses. Cc: <yakui.zhao@intel.com> Cc: Stian Jordet <stian@jordet.net> Cc: Len Brown <lenb@kernel.org> Cc: Greg KH <greg@kroah.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "Yinghai Lu" <yhlu.kernel@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10asmlinkage_protect replaces prevent_tail_callRoland McGrath
The prevent_tail_call() macro works around the problem of the compiler clobbering argument words on the stack, which for asmlinkage functions is the caller's (user's) struct pt_regs. The tail/sibling-call optimization is not the only way that the compiler can decide to use stack argument words as scratch space, which we have to prevent. Other optimizations can do it too. Until we have new compiler support to make "asmlinkage" binding on the compiler's own use of the stack argument frame, we have work around all the manifestations of this issue that crop up. More cases seem to be prevented by also keeping the incoming argument variables live at the end of the function. This makes their original stack slots attractive places to leave those variables, so the compiler tends not clobber them for something else. It's still no guarantee, but it handles some observed cases that prevent_tail_call() did not. Signed-off-by: Roland McGrath <roland@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-10x86: Simplify cpu_idle_waitVenki Pallipadi
This patch also resolves hangs on boot: http://lkml.org/lkml/2008/2/23/263 http://bugzilla.kernel.org/show_bug.cgi?id=10093 The bug was causing once-in-few-reboots 10-15 sec wait during boot on certain laptops. Earlier commit 40d6a146629b98d8e322b6f9332b182c7cbff3df added smp_call_function in cpu_idle_wait() to kick cpus that are in tickless idle. Looking at cpu_idle_wait code at that time, code seemed to be over-engineered for a case which is rarely used (while changing idle handler). Below is a simplified version of cpu_idle_wait, which just makes a dummy smp_call_function to all cpus, to make them come out of old idle handler and start using the new idle handler. It eliminates code in the idle loop to handle cpu_idle_wait. Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-09pop previous section in alternative.cSteven Rostedt
gcc expects all toplevel assembly to return to the original section type. The code in alteranative.c does not do this. This caused some strange bugs in sched-devel where code would end up in the .rodata section and when the kernel sets the NX bit on all .rodata, the kernel would crash when executing this code. This patch adds a .previous marker to return the code back to the original section. Credit goes to Andrew Pinski for telling me it wasn't a gcc bug but a bug in the toplevel asm code in the kernel. ;-) Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-07x86: fix call to set_cyc2ns_scale() from time_cpufreq_notifier()Karsten Wiese
In time_cpufreq_notifier() the cpu id to act upon is held in freq->cpu. Use it instead of smp_processor_id() in the call to set_cyc2ns_scale(). This makes the preempt_*able() unnecessary and lets set_cyc2ns_scale() update the intended cpu's cyc2ns. Related mail/thread: http://lkml.org/lkml/2007/12/7/130 Signed-off-by: Karsten Wiese <fzu@wemgehoertderstaat.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-07revert "x86: tsc prevent time going backwards"Ingo Molnar
revert: | commit 47001d603375f857a7fab0e9c095d964a1ea0039 | Author: Thomas Gleixner <tglx@linutronix.de> | Date: Tue Apr 1 19:45:18 2008 +0200 | | x86: tsc prevent time going backwards it has been identified to cause suspend regression - and the commit fixes a longstanding bug that existed before 2.6.25 was opened - so it can wait some more until the effects are better understood. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-06Fix booting pentium+ with dodgy TSCRusty Russell
We handle a broken tsc these days, so no need to panic. We clear the TSC bit when tsc_init decides it's unreliable (eg. under lguest w/ bad host TSC), leading to bogus panic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Acked-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-04x86: revert assign IRQs to hpet timerThomas Gleixner
The commits: commit 37a47db8d7f0f38dac5acf5a13abbc8f401707fa Author: Balaji Rao <balajirrao@gmail.com> Date: Wed Jan 30 13:30:03 2008 +0100 x86: assign IRQs to HPET timers, fix and commit e3f37a54f690d3e64995ea7ecea08c5ab3070faf Author: Balaji Rao <balajirrao@gmail.com> Date: Wed Jan 30 13:30:03 2008 +0100 x86: assign IRQs to HPET timers have been identified to cause a regression on some platforms due to the assignement of legacy IRQs which makes the legacy devices connected to those IRQs disfunctional. Revert them. This fixes http://bugzilla.kernel.org/show_bug.cgi?id=10382 Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04x86: tsc prevent time going backwardsThomas Gleixner
We already catch most of the TSC problems by sanity checks, but there is a subtle bug which has been in the code for ever. This can cause time jumps in the range of hours. This was reported in: http://lkml.org/lkml/2007/8/23/96 and http://lkml.org/lkml/2008/3/31/23 I was able to reproduce the problem with a gettimeofday loop test on a dual core and a quad core machine which both have sychronized TSCs. The TSCs seems not to be perfectly in sync though, but the kernel is not able to detect the slight delta in the sync check. Still there exists an extremly small window where this delta can be observed with a real big time jump. So far I was only able to reproduce this with the vsyscall gettimeofday implementation, but in theory this might be observable with the syscall based version as well. CPU 0 updates the clock source variables under xtime/vyscall lock and CPU1, where the TSC is slighty behind CPU0, is reading the time right after the seqlock was unlocked. The clocksource reference data was updated with the TSC from CPU0 and the value which is read from TSC on CPU1 is less than the reference data. This results in a huge delta value due to the unsigned subtraction of the TSC value and the reference value. This algorithm can not be changed due to the support of wrapping clock sources like pm timer. The huge delta is converted to nanoseconds and added to xtime, which is then observable by the caller. The next gettimeofday call on CPU1 will show the correct time again as now the TSC has advanced above the reference value. To prevent this TSC specific wreckage we need to compare the TSC value against the reference value and return the latter when it is larger than the actual TSC value. I pondered to mark the TSC unstable when the readout is smaller than the reference value, but this would render an otherwise good and fast clocksource unusable without a real good reason. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04xen: Clear PG_pinned in release_{pt,pd}()Mark McLoughlin
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04xen: Do not pin/unpin PMD pagesMark McLoughlin
i.e. with this simple test case: int fd = open("/dev/zero", O_RDONLY); munmap(mmap((void *)0x40000000, 0x1000_LEN, PROT_READ, MAP_PRIVATE, fd, 0), 0x1000); close(fd); we currently get: kernel BUG at arch/x86/xen/enlighten.c:678! ... EIP is at xen_release_pt+0x79/0xa9 ... Call Trace: [<c041da25>] ? __pmd_free_tlb+0x1a/0x75 [<c047a192>] ? free_pgd_range+0x1d2/0x2b5 [<c047a2f3>] ? free_pgtables+0x7e/0x93 [<c047b272>] ? unmap_region+0xb9/0xf5 [<c047c1bd>] ? do_munmap+0x193/0x1f5 [<c047c24f>] ? sys_munmap+0x30/0x3f [<c0408cce>] ? syscall_call+0x7/0xb ======================= and xen complains: (XEN) mm.c:2241:d4 Mfn 1cc37 not pinned Further details at: https://bugzilla.redhat.com/436453 Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04xen: refactor xen_{alloc,release}_{pt,pd}()Mark McLoughlin
Signed-off-by: Mark McLoughlin <markmc@redhat.com> Cc: xen-devel@lists.xensource.com Cc: Mark McLoughlin <markmc@redhat.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04x86, agpgart: scary messages are fortunately obsoletePavel Machek
Fix obsolete printks in aperture-64. We used not to handle missing agpgart, but we handle it okay now. Signed-off-by: Pavel Machek <pavel@suse.cz> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04x86: print message if nmi_watchdog=2 cannot be enabledIngo Molnar
right now if there's no CPU support for nmi_watchdog=2 we'll just refuse it silently. print a useful warning. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-04x86: fix nmi_watchdog=2 on Pentium-D CPUsIngo Molnar
implement nmi_watchdog=2 on this class of CPUs: cpu family : 15 model : 6 model name : Intel(R) Pentium(R) D CPU 3.00GHz the watchdog's ->setup() method is safe anyway, so if the CPU cannot support it we'll bail out safely. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-03x86 ptrace: avoid unnecessary wrmsrRoland McGrath
This avoids using wrmsr on MSR_IA32_DEBUGCTLMSR when it's not needed. No wrmsr ever needs to be done if noone has ever used block stepping. Without this change, using ptrace on 2.6.25 on an x86 KVM guest will tickle KVM's missing support for the MSR and crash the guest kernel. Though host KVM is the buggy one, this makes for a regression in the guest behavior from 2.6.24->2.6.25 that we can easily avoid. I also corrected some bad whitespace. Signed-off-by: Roland McGrath <roland@redhat.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-04-02vmcoreinfo: add the symbol "phys_base"Ken'ichi Ohmichi
Fix the problem that makedumpfile sometimes fails on x86_64 machine. This patch adds the symbol "phys_base" to a vmcoreinfo data. The vmcoreinfo data has the minimum debugging information only for dump filtering. makedumpfile (dump filtering command) gets it to distinguish unnecessary pages, and makedumpfile creates a small dumpfile. On x86_64 kernel which compiled with CONFIG_PHYSICAL_START=0x0 and CONFIG_RELOCATABLE=y, makedumpfile fails like the following: # makedumpfile -d31 /proc/vmcore dumpfile The kernel version is not supported. The created dumpfile may be incomplete. _exclude_free_page: Can't get next online node. makedumpfile Failed. # The cause is the lack of the symbol "phys_base" in a vmcoreinfo data. If the symbol "phys_base" does not exist, makedumpfile considers an x86_64 kernel as non relocatable. As the result, makedumpfile misunderstands the physical address where the kernel is loaded, and it cannot translate a kernel virtual address to physical address correctly. To fix this problem, this patch adds the symbol "phys_base" to a vmcoreinfo data. Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: <stable@kernel.org> Acked-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28Avoid false positive warnings in kmap_atomic_prot() with DEBUG_HIGHMEMAndrew Morton
I believe http://bugzilla.kernel.org/show_bug.cgi?id=10318 is a false positive. There's no way in which networking will be using highmem pages here, so it won't be taking the KM_USER0 kmap slot, so there's no point in performing these checks. Cc: Pawel Staszewski <pstaszewski@artcom.pl> Cc: Ingo Molnar <mingo@elte.hu> Acked-by: Christoph Lameter <clameter@sgi.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> [ Really sad. We lose almost all real-life coverage of the debug tests with this patch. Now it will only report problems for the cases where people actually end up using a HIGHMEM page, not when they just _might_ use one. - Linus ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2008-03-28lguest: comment documentation update.Rusty Russell
Took some cycles to re-read the Lguest Journey end-to-end, fix some rot and tighten some phrases. Only comments change. No new jokes, but a couple of recycled old jokes. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2008-03-27x86: prefetch fix #2Ingo Molnar
Linus noticed a second bug and an uncleanliness: - we'd return on any instruction fetch fault - we'd use both the value of 16 and the PF_INSTR symbol which are the same and make no sense the cleanup nicely unifies this piece of logic. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27xen: fix UP setup of shared_infoJeremy Fitzhardinge
We need to set up the shared_info pointer once we've mapped the real shared_info into its fixmap slot. That needs to happen once the general pagetable setup has been done. Previously, the UP shared_info was set up one in xen_start_kernel, but that was left pointing to the dummy shared info. Unfortunately there's no really good place to do a later setup of the shared_info in UP, so just do it once the pagetable setup has been done. [ Stable: needed in 2.6.24.x ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable Kernel <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27xen: fix RMW when unmasking eventsJeremy Fitzhardinge
xen_irq_enable_direct and xen_sysexit were using "andw $0x00ff, XEN_vcpu_info_pending(vcpu)" to unmask events and test for pending ones in one instuction. Unfortunately, the pending flag must be modified with a locked operation since it can be set by another CPU, and the unlocked form of this operation was causing the pending flag to get lost, allowing the processor to return to usermode with pending events and ultimately deadlock. The simple fix would be to make it a locked operation, but that's rather costly and unnecessary. The fix here is to split the mask-clearing and pending-testing into two instructions; the interrupt window between them is of no concern because either way pending or new events will be processed. This should fix lingering bugs in using direct vcpu structure access too. [ Stable: needed in 2.6.24.x ] Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Cc: Stable <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27x86: stricter check in follow_huge_addr()Christoph Lameter
The first page of the compound page is determined in follow_huge_addr() but then PageCompound() only checks if the page is part of a compound page. PageHead() allows checking if this is indeed the first page of the compound. Cc: Jeremy Fitzhardinge <jeremy@goop.org> Signed-off-by: Christoph Lameter <clameter@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27rdc321x: GPIO routines bugfixesFlorian Fainelli
This patch fixes the use of GPIO routines which are in the PCI configuration space of the RDC321x, therefore reading/writing to this space without spinlock protection can be problematic. We also now request and free GPIOs and support the MGB100 board, previous code was very AR525W-centric. Signed-off-by: Volker Weiss <volker@tintuc.de> Signed-off-by: Florian Fainelli <florian.fainelli@telecomint.eu> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-27x86: ptrace.c: fix defined-but-unused warningsAndrew Morton
arch/x86/kernel/ptrace.c:548: warning: 'ptrace_bts_get_size' defined but not used arch/x86/kernel/ptrace.c:558: warning: 'ptrace_bts_read_record' defined but not used arch/x86/kernel/ptrace.c:607: warning: 'ptrace_bts_clear' defined but not used arch/x86/kernel/ptrace.c:617: warning: 'ptrace_bts_drain' defined but not used arch/x86/kernel/ptrace.c:720: warning: 'ptrace_bts_config' defined but not used arch/x86/kernel/ptrace.c:788: warning: 'ptrace_bts_status' defined but not used Cc: Roland McGrath <roland@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2008-03-27x86: fix prefetch workaroundIngo Molnar
some early Athlon XP's and Opterons generate bogus faults on prefetch instructions. The workaround for this regressed over .24 - reinstate it. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26x86: fix performance drop for glxSuresh Siddha
fix the 3D performance drop reported at: http://bugzilla.kernel.org/show_bug.cgi?id=10328 fb drivers are using ioremap()/ioremap_nocache(), followed by mtrr_add with WC attribute. Recent changes in page attribute code made both ioremap()/ioremap_nocache() mappings as UC (instead of previous UC-). This breaks the graphics performance, as the effective memory type is UC instead of expected WC. The correct way to fix this is to add ioremap_wc() (which uses UC- in the absence of PAT kernel support and WC with PAT) and change all the fb drivers to use this new ioremap_wc() API. We can take this correct and longer route for post 2.6.25. For now, revert back to the UC- behavior for ioremap/ioremap_nocache. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26x86: fix trim mtrr not to setup_memory two timesYinghai Lu
we could call find_max_pfn() directly instead of setup_memory() to get max_pfn needed for mtrr trimming. otherwise setup_memory() is called two times... that is duplicated... [ mingo@elte.hu: both Thomas and me simulated a double call to setup_bootmem_allocator() and can confirm that it is a real bug which can hang in certain configs. It's not been reported yet but that is probably due to the relatively scarce nature of MTRR-trimming systems. ] Signed-off-by: Yinghai Lu <yhlu.kernel@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-03-26x86: GEODE: add missing module.h includeAndres Salomon
On Wed, 26 Mar 2008 11:56:22 -0600 Jordan Crouse <jordan.crouse@amd.com> wrote: > On 26/03/08 14:31 +0100, Stefan Pfetzing wrote: > > Hello Jordan, > > > > I just tried to build your geodwdt driver for the geode watchdog. Therefore > > I pulled your repository from http://git.infradead.org/geode.git (or more, > > the git url). > > > > I tried to build the geodewdt driver as a module - which didn't work, and > > it failed with the same problem as earlier mentioned on lkmk [1]. I also > > checked the fix [2], but that seems to be already in your (or linus) tree - > > and so I'm unsure what the problem is. > > > > [1] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884074 > > [2] http://kerneltrap.org/mailarchive/linux-kernel/2008/2/17/884174 > > > > Building directly into the kernel seems to work. > > > > Maybe you have some idea? > > Hmm - that is strange. Exporting the symbols should work. I recommend > starting over with a clean tree. > > CCing Andres - any thoughts? > > Jordan > Er, yeah. The patch below should fix it. This should probably go into 2.6.25. Oops, EXPORT_SYMBOL_GPL wasn't being declared due to this header being missing. Signed-off-by: Andres Salomon <dilinger@debian.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>