blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	powerpc: Remove fpscr use from [kvm_]cvt_{fd,df}	Andreas Schwab	2010-09-02	6	-48/+26
\| \| \| \| \| \| \| \|	Neither lfs nor stfs touch the fpscr, so remove the restore/save of it around them. Signed-off-by: Andreas Schwab <schwab@linux-m68k.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Re-enable dispatch trace log userspace interface	Paul Mackerras	2010-09-02	3	-41/+179
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since the cpu accounting code uses the hypervisor dispatch trace log now when CONFIG_VIRT_CPU_ACCOUNTING = y, the previous commit disabled access to it via files in the /sys/kernel/debug/powerpc/dtl/ directory in that case. This restores those files. To do this, we now have a hook that the cpu accounting code will call as it processes each entry from the hypervisor dispatch trace log. The code in dtl.c now uses that to fill up its ring buffer, rather than having the hypervisor fill the ring buffer directly. This also fixes dtl_file_read() to handle overflow conditions a bit better and adds a spinlock to ensure that race conditions (multiple processes opening or reading the file concurrently) are handled correctly. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Account time using timebase rather than PURR	Paul Mackerras	2010-09-02	13	-194/+290
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently, when CONFIG_VIRT_CPU_ACCOUNTING is enabled, we use the PURR register for measuring the user and system time used by processes, as well as other related times such as hardirq and softirq times. This turns out to be quite confusing for users because it means that a program will often be measured as taking less time when run on a multi-threaded processor (SMT2 or SMT4 mode) than it does when run on a single-threaded processor (ST mode), even though the program takes longer to finish. The discrepancy is accounted for as stolen time, which is also confusing, particularly when there are no other partitions running. This changes the accounting to use the timebase instead, meaning that the reported user and system times are the actual number of real-time seconds that the program was executing on the processor thread, regardless of which SMT mode the processor is in. Thus a program will generally show greater user and system times when run on a multi-threaded processor than on a single-threaded processor. On pSeries systems on POWER5 or later processors, we measure the stolen time (time when this partition wasn't running) using the hypervisor dispatch trace log. We check for new entries in the log on every entry from user mode and on every transition from kernel process context to soft or hard IRQ context (i.e. when account_system_vtime() gets called). So that we can correctly distinguish time stolen from user time and time stolen from system time, without having to check the log on every exit to user mode, we store separate timestamps for exit to user mode and entry from user mode. On systems that have a SPURR (POWER6 and POWER7), we read the SPURR in account_system_vtime() (as before), and then apportion the SPURR ticks since the last time we read it between scaled user time and scaled system time according to the relative proportions of user time and system time over the same interval. This avoids having to read the SPURR on every kernel entry and exit. On systems that have PURR but not SPURR (i.e., POWER5), we do the same using the PURR rather than the SPURR. This disables the DTL user interface in /sys/debug/kernel/powerpc/dtl for now since it conflicts with the use of the dispatch trace log by the time accounting code. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Dynamically allocate most lppaca structs	Paul Mackerras	2010-09-02	2	-3/+69
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This arranges for the lppaca structs for most cpus to be dynamically allocated in the same manner as the paca structs. If we don't include support for legacy iSeries, only the first lppaca is statically allocated; the rest are dynamically allocated. If we include legacy iSeries support, then we statically allocate the first 64 lppaca structs, since the iSeries hypervisor requires that the lppaca structs be present in the data section of the kernel image, but legacy iSeries supports at most 64 cpus. With CONFIG_NR_CPUS, the kernel image size for a typical pSeries config went from: text data bss dec hex filename 9524478 4734564 8469944 22728986 15ad11a ../test-1024/vmlinux to: text data bss dec hex filename 9524482 3751508 8469944 21745934 14bd10e ../test-1024/vmlinux a reduction of 983052 bytes overall. Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Abstract indexing of lppaca structs	Paul Mackerras	2010-09-02	7	-18/+20
\| \| \| \| \| \| \| \| \| \| \| \|	Currently we have the lppaca structs as a simple array of NR_CPUS entries, taking up space in the data section of the kernel image. In future we would like to allocate them dynamically, so this abstracts out the accesses to the array, making it easier to change how we locate the lppaca for a given cpu in future. Specifically, lppaca[cpu] changes to lppaca_of(cpu). Signed-off-by: Paul Mackerras <paulus@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Move arch_sd_sibling_asym_packing() to smp.c	Michael Neuling	2010-09-02	2	-11/+9
\| \| \| \| \| \| \| \| \| \|	Simple cleanup by moving arch_sd_sibling_asym_packing from process.c to smp.c to save an #ifdef CONFIG_SMP No functionality change. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Check end of stack canary at oops time	Anton Blanchard	2010-09-02	1	-0/+6
\| \| \| \| \| \| \| \| \| \| \| \|	Add a check for the stack canary when we oops, similar to x86. This should make it clear that we overran our stack: Unable to handle kernel paging request for data at address 0x24652f63700ac689 Faulting instruction address: 0xc000000000063d24 Thread overran stack, or stack corrupted Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Feature nop out reservation clear when stcx checks address	Anton Blanchard	2010-09-02	2	-5/+31
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The POWER architecture does not require stcx to check that it is operating on the same address as the larx. This means it is possible for an an exception handler to execute a larx, get a reservation, decide not to do the stcx and then return back with an active reservation. If the interrupted code was in the middle of a larx/stcx sequence the stcx could incorrectly succeed. All recent POWER CPUs check the address before letting the stcx succeed so we can create a CPU feature and nop it out. As Ben suggested, we can only do this in our syscall path because there is a remote possibility some kernel code gets interrupted by an exception that ends up operating on the same cacheline. Thanks to Paul Mackerras and Derek Williams for the idea. To test this I used a very simple null syscall (actually getppid) testcase at http://ozlabs.org/~anton/junkcode/null_syscall.c I tested against 2.6.35-git10 with the following changes against the pseries_defconfig: CONFIG_VIRT_CPU_ACCOUNTING=n CONFIG_AUDIT=n CONFIG_PPC_4K_PAGES=n CONFIG_PPC_64K_PAGES=y CONFIG_FORCE_MAX_ZONEORDER=9 CONFIG_PPC_SUBPAGE_PROT=n CONFIG_FUNCTION_TRACER=n CONFIG_FUNCTION_GRAPH_TRACER=n CONFIG_IRQSOFF_TRACER=n CONFIG_STACK_TRACER=n to remove the overhead of virtual CPU accounting, syscall auditing and the ftrace mcount tracers. 64kB pages were enabled to minimise TLB misses. POWER6: +8.2% POWER7: +7.0% Another suggestion was to use a larx to something in the L1 instead of a stcx. This was almost as fast as removing the larx on POWER6, but only 3.5% faster on POWER7. We can use this to speed up the reservation clear in our exception exit code. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Add 64bit csum_and_copy_to_user	Anton Blanchard	2010-09-02	2	-0/+40
\| \| \| \| \| \| \| \| \|	This adds the equivalent of csum_and_copy_from_user for the receive side so we can copy and checksum in one pass. It is modelled on the generic checksum routine. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Optimise 64bit csum_partial_copy_generic and add ↵	Anton Blanchard	2010-09-02	4	-88/+276
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	csum_and_copy_from_user We use the same core loop as the new csum_partial, adding in the stores and exception handling code. To keep things simple we do all the exception fixup in csum_and_copy_from_user. This wrapper function is modelled on the generic checksum code and is careful to always calculate a complete checksum even if we only copied part of the data to userspace. To test this I forced checksumming on over loopback and ran socklib (a simple TCP benchmark). On a POWER6 575 throughput improved by 19% with this patch. If I forced both the sender and receiver onto the same cpu (with the hope of shifting the benchmark from being cache bandwidth limited to cpu limited), adding this patch improved performance by 55% Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc: Optimise 64bit csum_partial	Anton Blanchard	2010-09-02	1	-40/+153
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The main loop of csum_partial runs very slowly on recent POWER CPUs. After some analysis on both POWER6 and POWER7 I came up with routine below. First we get the source aligned to a double word, ignoring any odd alignment to keep things simple. Then we do 64 bytes at a time, with an entry and exit limb of a further 64 bytes. On both POWER6 and POWER7 this should be as fast as we can go since we are limited by the latency of the adde instructions. To test this I forced checksumming on over loopback and ran socklib (a simple TCP benchmark). On a POWER6 575 throughput improved by 11% with this patch. Signed-off-by: Anton Blanchard <anton@samba.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/pseries: Correct rtas_data_buf locking in dlpar code	Nathan Fontenot	2010-09-02	1	-13/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The dlpar code can cause a deadlock to occur when making the RTAS configure-connector call. This occurs because we make kmalloc calls, which can block, while parsing the rtas_data_buf and holding the rtas_data_buf_lock. This an cause issues if someone else attempts to grab the rtas_data_bug_lock. This patch alleviates this issue by copying the contents of the rtas_data_buf to a local buffer before parsing. This allows us to only hold the rtas_data_buf_lock around the RTAS configure-connector calls. Signed-off-by: Nathan Fontenot <nfont@austin.ibm.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/85xx: Add P1021 PCI IDs and quirks	Anton Vorontsov	2010-08-31	2	-0/+4
\| \| \| \| \| \| \|	This is needed for proper PCI-E support on P1021 SoCs. Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
*	arch/powerpc/sysdev/qe_lib/qe.c: Add of_node_put to avoid memory leak	Julia Lawall	2010-08-31	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a call to of_node_put in the error handling code following a call to of_find_compatible_node. The semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression x; expression E,E1; statement S; @@ x = (of_find_node_by_path \|of_find_node_by_name \|of_find_node_by_phandle \|of_get_parent \|of_get_next_parent \|of_get_next_child \|of_find_compatible_node \|of_match_node )(...); ... if (x == NULL) S <... when != x = E if (...) { ... when != of_node_put(x) when != if (...) { ... of_node_put(x); ... } ( return <+...x...+>; \| * return ...; ) } ...> of_node_put(x); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Acked-by: Timur Tabi <timur@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
*	arch/powerpc/platforms/83xx/mpc837x_mds.c: Add missing iounmap	Julia Lawall	2010-08-31	1	-3/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The function of_iomap returns the result of calling ioremap, so iounmap should be called on the result in the error handling code, as done in the normal exit of the function. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression x; expression E,E1; identifier l; statement S; @@ x = of_iomap(...); ... when != iounmap(x) when != if (...) { ... iounmap(x); ... } when != E = x when any ( if (x == NULL) S \| if (...) { ... when != iounmap(x) when != if (...) { ... iounmap(x); ... } ( return <+...x...+>; \| return ...; ) } ) ... when != x = E1 when any iounmap(x); // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
*	fsl_rio: fix compile errors	Li Yang	2010-08-31	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \|	Fixes the following compile problem on E500 platforms: arch/powerpc/sysdev/fsl_rio.c: In function 'fsl_rio_mcheck_exception': arch/powerpc/sysdev/fsl_rio.c:248: error: 'MCSR_MASK' undeclared (first use in this function) Also fixes the compile problem on non-E500 platforms. Signed-off-by: Li Yang <leoli@freescale.com> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
*	powerpc/85xx: Fix compile issue with p1022_ds due to lmb rename to memblock	Kumar Gala	2010-08-31	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \|	arch/powerpc/platforms/85xx/p1022_ds.c:22:23: error: linux/lmb.h: No such file or directory arch/powerpc/platforms/85xx/p1022_ds.c: In function 'p1022_ds_setup_arch': arch/powerpc/platforms/85xx/p1022_ds.c:100: error: implicit declaration of function 'memblock_end_of_DRAM' arch/powerpc/platforms/85xx/p1022_ds.c: At top level: arch/powerpc/platforms/85xx/p1022_ds.c:147: error: 'udbg_progress' undeclared here (not in a function) make[2]: *** [arch/powerpc/platforms/85xx/p1022_ds.o] Error 1 Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
*	powerpc/85xx: Fix compilation of mpc85xx_mds.c	Alexander Graf	2010-08-31	1	-0/+1
\| \| \| \| \| \| \| \| \|	Commit 99d8238f berobbed the for_each loop of its iterator! Let's be nice and give it back, so it compiles for us. CC: Anton Vorontsov <avorontsov@mvista.com> Signed-off-by: Alexander Graf <agraf@suse.de> Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
*	powerpc: Don't use kernel stack with translation off	Michael Neuling	2010-08-31	1	-3/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In f761622e59433130bc33ad086ce219feee9eb961 we changed early_setup_secondary so it's called using the proper kernel stack rather than the emergency one. Unfortunately, this stack pointer can't be used when translation is off on PHYP as this stack pointer might be outside the RMO. This results in the following on all non zero cpus: cpu 0x1: Vector: 300 (Data Access) at [c00000001639fd10] pc: 000000000001c50c lr: 000000000000821c sp: c00000001639ff90 msr: 8000000000001000 dar: c00000001639ffa0 dsisr: 42000000 current = 0xc000000016393540 paca = 0xc000000006e00200 pid = 0, comm = swapper The original patch was only tested on bare metal system, so it never caught this problem. This changes __secondary_start so that we calculate the new stack pointer but only start using it after we've called early_setup_secondary. With this patch, the above problem goes away. Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/perf_event: Reduce latency of calling perf_event_do_pending	Paul Mackerras	2010-08-31	1	-12/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Commit 0fe1ac48 ("powerpc/perf_event: Fix oops due to perf_event_do_pending call") moved the call to perf_event_do_pending in timer_interrupt() down so that it was after the irq_enter() call. Unfortunately this moved it after the code that checks whether it is time for the next decrementer clock event. The result is that the call to perf_event_do_pending() won't happen until the next decrementer clock event is due. This was pointed out by Milton Miller. This fixes it by moving the check for whether it's time for the next decrementer clock event down to the point where we're about to call the event handler, after we've called perf_event_do_pending. This has the side effect that on old pre-Core99 Powermacs where we use the ppc_n_lost_interrupts mechanism to replay interrupts, a replayed interrupt will incur a little more latency since it will now do the code from the irq_enter down to the irq_exit, that it used to skip. However, these machines are now old and rare enough that this doesn't matter. To make it clear that ppc_n_lost_interrupts is only used on Powermacs, and to speed up the code slightly on non-Powermac ppc32 machines, the code that tests ppc_n_lost_interrupts is now conditional on CONFIG_PMAC as well as CONFIG_PPC32. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: stable@kernel.org Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	powerpc/kexec: Adds correct calling convention for kexec purgatory	Matthew McClintock	2010-08-31	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Call kexec purgatory code correctly. We were getting lucky before. If you examine the powerpc 32bit kexec "purgatory" code you will see it expects the following: >From kexec-tools: purgatory/arch/ppc/v2wrap_32.S -> calling convention: -> r3 = physical number of this cpu (all cpus) -> r4 = address of this chunk (master only) As such, we need to set r3 to the current core, r4 happens to be unused by purgatory at the moment but we go ahead and set it here as well Signed-off-by: Matthew McClintock <msm@freescale.com> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
*	Linux 2.6.36-rc3v2.6.36-rc3	Linus Torvalds	2010-08-29	1	-1/+1
\|
*	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-08-29	5	-26/+50
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6: firewire: ohci: work around VIA and NEC PHY packet reception bug firewire: core: do not use del_timer_sync() in interrupt context firewire: net: fix unicast reception RCODE in failure paths firewire: sbp2: fix stall with "Unsolicited response" firewire: sbp2: fix memory leak in sbp2_cancel_orbs or at send error ieee1394: Adjust confusing if indentation
\| *	firewire: ohci: work around VIA and NEC PHY packet reception bug	Stefan Richter	2010-08-29	1	-1/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	VIA VT6306, VIA VT6308, and NEC OrangeLink controllers do not write packet event codes for received PHY packets (or perhaps write evt_no_status, hard to tell). Work around it by overwriting the packet's ACK by ack_complete, so that upper layers that listen to PHY packet reception get to see these packets. (Also tested: TI TSB82AA2, TI TSB43AB22/A, TI XIO2213A, Agere FW643, JMicron JMB381 --- these do not exhibit this bug.) Clemens proposed a quirks flag for that, IOW whitelist known misbehaving controllers for this workaround. Though to me it seems harmless enough to enable for all controllers. The log_ar_at_event() debug log will continue to show the original status from the DMA unit. Reported-by: Clemens Ladisch <clemens@ladisch.de> (VT6308) Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
\| *	firewire: core: do not use del_timer_sync() in interrupt context	Clemens Ladisch	2010-08-19	1	-3/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because we might be in interrupt context, replace del_timer_sync() with del_timer(). If the timer is already running, we know that it will clean up the transaction, so we do not need to do any further processing in the normal transaction handler. Many thanks to Yong Zhang for diagnosing this. Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: Clemens Ladisch <clemens@ladisch.de> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
\| *	firewire: net: fix unicast reception RCODE in failure paths	Stefan Richter	2010-08-19	1	-13/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The incoming request hander fwnet_receive_packet() expects subsequent datagram handling code to return non-zero on errors. However, almost none of the failure paths did so. Fix them all. (This error reporting is used to send and RCODE_CONFLICT_ERROR to the sender node in such failure cases. Two modes of failure exist: Out of memory, or firewire-net is unaware of any peer node to which a fragment or an ARP packet belongs. However, it is unclear whether a sender can actually make use of such information. A Linux peer apparently can't. Maybe it should all be simplified to void functions.) Reported-by: Julia Lawall <julia@diku.dk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
\| *	firewire: sbp2: fix stall with "Unsolicited response"	Stefan Richter	2010-08-19	1	-3/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix I/O stalls with some 4-bay RAID enclosures which are based on OXUF936QSE: - Onnto dataTale RSM4QO, old firmware (not anymore with current firmware), - inXtron Hydra Super-S LCM, old as well as current firmware when used in RAID-5 mode, perhaps also in other RAID modes. The stalls happen during heavy or moderate disk traffic in periods that are a multiple of 5 minutes, roughly twice per hour. They are caused by the target responding too late to an ORB_Pointer register write: The target responds after Split_Timeout, hence firewire-core cancels the transaction, and firewire-sbp2 fails the SCSI request. The SCSI core retries the request, that fails again (and again), hence SCSI core calls firewire-sbp2's abort handler (and even the Management_Agent register write in the abort handler has the transaction timeout problem). During all that, the process which issued the I/O is stalled in I/O wait state. Meanwhile, the target actually acts on the first failed SCSI request: It responds to the ORB_Pointer write later (seen in the kernel log as "firewire_core: Unsolicited response") and also finishes the SCSI request with proper status (seen in the kernel log as "firewire_sbp2: status write for unknown orb"). So let's just ignore RCODE_CANCELLED in the transaction callback and wait for the target to complete the ORB nevertheless. This requires a small modification is sbp2_cancel_orbs(); it now needs to call orb->callback() regardless whether fw_cancel_transaction() found the transaction unfinished or finished. A different solution is to increase Split_Timeout on the local node. (Tested: 2000ms timeout; maybe 1000ms or something like that works too. 200ms is insufficient. Standard is 100ms.) However, I rather not do this because any software on any node could change the Split_Timeout to something unsuitable. Or such a large Split_Timeout may be undesirable for other purposes. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
\| *	firewire: sbp2: fix memory leak in sbp2_cancel_orbs or at send error	Stefan Richter	2010-08-19	1	-5/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	When an ORB was canceled (Command ORB i.e. SCSI request timed out, or Management ORB timed out), or there was a send error in the initial transaction, we missed to drop one of the ORB's references and thus leaked memory. Background: In total, we hold 3 references to each Operation Request Block: - 1 during sbp2_scsi_queuecommand() or sbp2_send_management_orb() respectively, - 1 for the duration of the write transaction to the ORB_Pointer or Management_Agent register of the target, - 1 for as long as the ORB stays within the lu->orb_list, until the ORB is unlinked from the list and the orb->callback was executed. The latter one of these 3 references is finished - normally by sbp2_status_write() when the target wrote status for a pending ORB, - or by sbp2_cancel_orbs() in case of an ORB time-out, - or by complete_transaction() in case of a send error. Of them, the latter two lacked the kref_put. Add the missing kref_put()s. Add comments to the gets and puts of references for transaction callbacks and ORB callbacks so that it is easier to see what is supposed to happen. Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
\| *	ieee1394: Adjust confusing if indentation	Julia Lawall	2010-08-05	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Indent the branch of an if. Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
* \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6	Linus Torvalds	2010-08-28	15	-72/+98
\|\ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6: net/ipv4: Eliminate kstrdup memory leak net/caif/cfrfml.c: use asm/unaligned.h ax25: missplaced sock_put(sk) qlge: reset the chip before freeing the buffers l2tp: test for ethernet header in l2tp_eth_dev_recv() tcp: select(writefds) don't hang up when a peer close connection tcp: fix three tcp sysctls tuning tcp: Combat per-cpu skew in orphan tests. pxa168_eth: silence gcc warnings pxa168_eth: update call to phy_mii_ioctl() pxa168_eth: fix error handling in prope pxa168_eth: remove unneeded null check phylib: Fix race between returning phydev and calling adjust_link caif-driver: add HAS_DMA dependency 3c59x: Fix deadlock between boomerang_interrupt and boomerang_start_tx qlcnic: fix poll implementation netxen: fix poll implementation bridge: netfilter: fix a memory leak
\| * \|	net/ipv4: Eliminate kstrdup memory leak	Julia Lawall	2010-08-27	1	-2/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The string clone is only used as a temporary copy of the argument val within the while loop, and so it should be freed before leaving the function. The call to strsep, however, modifies clone, so a pointer to the front of the string is kept in saved_clone, to make it possible to free it. The sematic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @r exists@ local idexpression x; expression E; identifier l; statement S; @@ x= $kasprintf\\|kstrdup$(...); ... if (x == NULL) S ... when != kfree(x) when != E = x if (...) { <... when != kfree(x) goto l; ...> * return ...; } // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	net/caif/cfrfml.c: use asm/unaligned.h	Jeff Mahoney	2010-08-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	caif does not build on ia64 starting with 2.6.32-rc1. Using asm/unaligned.h instead of linux/unaligned/le_byteshift.h fixes the issue. include/linux/unaligned/le_byteshift.h:40:50: error: redefinition of 'get_unaligned_le16' include/linux/unaligned/le_byteshift.h:45:50: error: redefinition of 'get_unaligned_le32' include/linux/unaligned/le_byteshift.h:50:50: error: redefinition of 'get_unaligned_le64' include/linux/unaligned/le_byteshift.h:55:51: error: redefinition of 'put_unaligned_le16' include/linux/unaligned/le_byteshift.h:60:51: error: redefinition of 'put_unaligned_le32' include/linux/unaligned/le_byteshift.h:65:51: error: redefinition of 'put_unaligned_le64' include/linux/unaligned/le_struct.h:31:51: note: previous definition of 'put_unaligned_le64' was here Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	ax25: missplaced sock_put(sk)	Bernard Pidoux F6BVP	2010-08-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch moves a missplaced sock_put(sk) after bh_unlock_sock(sk) like in other parts of AX25 driver. Signed-off-by: Bernard Pidoux <f6bvp@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	qlge: reset the chip before freeing the buffers	Breno Leitao	2010-08-26	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Qlge is freeing the buffers before stopping the card DMA, and this can cause some severe error, as a EEH event on PPC. This patch just stop the card and then free the resources. Signed-off-by: Breno Leitao <leitao@linux.vnet.ibm.com> Signed-off-by: Ron Mercer <ron.mercer@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	l2tp: test for ethernet header in l2tp_eth_dev_recv()	Eric Dumazet	2010-08-26	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	close https://bugzilla.kernel.org/show_bug.cgi?id=16529 Before calling dev_forward_skb(), we should make sure skb head contains at least an ethernet header, even if length included in upper layer said so. Use pskb_may_pull() to make sure this ethernet header is present in skb head. Reported-by: Thomas Heil <heil@terminal-consulting.de> Reported-by: Ian Campbell <Ian.Campbell@eu.citrix.com> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	tcp: select(writefds) don't hang up when a peer close connection	KOSAKI Motohiro	2010-08-25	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This issue come from ruby language community. Below test program hang up when only run on Linux. % uname -mrsv Linux 2.6.26-2-486 #1 Sat Dec 26 08:37:39 UTC 2009 i686 % ruby -rsocket -ve ' BasicSocket.do_not_reverse_lookup = true serv = TCPServer.open("127.0.0.1", 0) s1 = TCPSocket.open("127.0.0.1", serv.addr[1]) s2 = serv.accept s2.close s1.write("a") rescue p $! s1.write("a") rescue p $! Thread.new { s1.write("a") }.join' ruby 1.9.3dev (2010-07-06 trunk 28554) [i686-linux] #<Errno::EPIPE: Broken pipe> [Hang Here] FreeBSD, Solaris, Mac doesn't. because Ruby's write() method call select() internally. and tcp_poll has a bug. SUS defined 'ready for writing' of select() as following. \| A descriptor shall be considered ready for writing when a call to an output \| function with O_NONBLOCK clear would not block, whether or not the function \| would transfer data successfully. That said, EPIPE situation is clearly one of 'ready for writing'. We don't have read-side issue because tcp_poll() already has read side shutdown care. \| if (sk->sk_shutdown & RCV_SHUTDOWN) \| mask \|= POLLIN \| POLLRDNORM \| POLLRDHUP; So, Let's insert same logic in write side. - reference url http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31065 http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-core/31068 Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	tcp: fix three tcp sysctls tuning	Eric Dumazet	2010-08-25	1	-17/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discovered by Anton Blanchard, current code to autotune tcp_death_row.sysctl_max_tw_buckets, sysctl_tcp_max_orphans and sysctl_max_syn_backlog makes little sense. The bigger a page is, the less tcp_max_orphans is : 4096 on a 512GB machine in Anton's case. (tcp_hashinfo.bhash_size * sizeof(struct inet_bind_hashbucket)) is much bigger if spinlock debugging is on. Its wrong to select bigger limits in this case (where kernel structures are also bigger) bhash_size max is 65536, and we get this value even for small machines. A better ground is to use size of ehash table, this also makes code shorter and more obvious. Based on a patch from Anton, and another from David. Reported-and-tested-by: Anton Blanchard <anton@samba.org> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	tcp: Combat per-cpu skew in orphan tests.	David S. Miller	2010-08-25	3	-12/+19
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As reported by Anton Blanchard when we use percpu_counter_read_positive() to make our orphan socket limit checks, the check can be off by up to num_cpus_online() * batch (which is 32 by default) which on a 128 cpu machine can be as large as the default orphan limit itself. Fix this by doing the full expensive sum check if the optimized check triggers. Reported-by: Anton Blanchard <anton@samba.org> Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
\| * \|	pxa168_eth: silence gcc warnings	Dan Carpenter	2010-08-24	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Casting "pep->tx_desc_dma" to to a struct tx_desc pointer makes gcc complain: drivers/net/pxa168_eth.c:657: warning: cast to pointer from integer of different size Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	pxa168_eth: update call to phy_mii_ioctl()	Dan Carpenter	2010-08-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The phy_mii_ioctl() function changed recently. It now takes a struct ifreq pointer directly. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	pxa168_eth: fix error handling in prope	Dan Carpenter	2010-08-24	1	-22/+22
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A couple issues here: * Some resources weren't released. * If alloc_etherdev() failed it would have caused a NULL dereference because "pep" would be null when we checked "if (pep->clk)". * Also it's better to propagate the error codes from mdiobus_register() instead of just returning -ENOMEM. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	pxa168_eth: remove unneeded null check	Dan Carpenter	2010-08-24	1	-4/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"pep->pd" isn't checked consistently in this function. For example it's dereferenced unconditionally on the next line after the end of the if condition. This function is only called from pxa168_eth_probe() and pep->pd is always non-NULL so I removed the check. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	phylib: Fix race between returning phydev and calling adjust_link	Anton Vorontsov	2010-08-24	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It is possible that phylib will call adjust_link before returning from {,of_}phy_connect(), which may cause the following [very rare, though] oops upon reopening the device: Unable to handle kernel paging request for data at address 0x0000024c Oops: Kernel access of bad area, sig: 11 [#1] PREEMPT SMP NR_CPUS=2 LTT NESTING LEVEL : 0 P1021 RDB Modules linked in: NIP: c0345dac LR: c0345dac CTR: c0345d84 TASK = dffab6b0[30] 'events/0' THREAD: c0d24000 CPU: 0 [...] NIP [c0345dac] adjust_link+0x28/0x19c LR [c0345dac] adjust_link+0x28/0x19c Call Trace: [c0d25f00] [000045e1] 0x45e1 (unreliable) [c0d25f30] [c036c158] phy_state_machine+0x3ac/0x554 [...] Here is why. Drivers store phydev in their private structures, e.g. gianfar driver: static int init_phy(struct net_device dev) { ... priv->phydev = of_phy_connect(...); ... } So that adjust_link could retrieve it back: static void adjust_link(struct net_device dev) { ... struct phy_device *phydev = priv->phydev; ... } If the device has been opened before, then phydev->state is set to PHY_HALTED (or undefined if the driver didn't call phy_stop()). Now, phy_connect starts the PHY state machine before returning phydev to the driver: phy_start_machine(phydev, NULL); if (phydev->irq > 0) phy_start_interrupts(phydev); return phydev; The time between 'phy_start_machine()' and 'return phydev' is undefined. The start machine routine delays execution for 1 second, which is enough for most cases. But under heavy load, or if you're unlucky, it is quite possible that PHY state machine will execute before phy_connect() returns, and so adjust_link callback will try to dereference phydev, which is not yet ready. To fix the issue, simply initialize the PHY's state to PHY_READY during phy_attach(). This will ensure that phylib won't call adjust_link before phy_start(). Signed-off-by: Anton Vorontsov <avorontsov@mvista.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	caif-driver: add HAS_DMA dependency	Heiko Carstens	2010-08-24	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix this error on an s390 allyesconfig build: linux-2.6/drivers/net/caif/caif_spi.c:98: undefined reference to `dma_free_coherent' Cc: Sjur Braendeland <sjur.brandeland@stericsson.com> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	3c59x: Fix deadlock between boomerang_interrupt and boomerang_start_tx	Neil Horman	2010-08-24	1	-1/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If netconsole is in use, there is a possibility for deadlock in 3c59x between boomerang_interrupt and boomerang_start_xmit. Both routines take the vp->lock, and if netconsole is in use, a pr_* call from the boomerang_interrupt routine will result in the netconsole code attempting to trnasmit an skb, which can try to take the same spin lock, resulting in deadlock. The fix is pretty straightforward. This patch allocats a bit in the 3c59x private structure to indicate that its handling an interrupt. If we get into the transmit routine and that bit is set, we can be sure that we have recursed and will deadlock if we continue, so instead we just return NETDEV_TX_BUSY, so the stack requeues the skb to try again later. Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	qlcnic: fix poll implementation	Yinglin Luan	2010-08-23	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Function qlcnic_intr has pointer to qlcnic_host_sds_ring as second parameter not pointer to qlcnic_adapter. Signed-off-by: Yinglin Luan <synmyth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	netxen: fix poll implementation	Yinglin Luan	2010-08-23	1	-1/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Function netxen_intr has pointer to nx_host_sds_ring as second parameter not pointer to netxen_adapter. Signed-off-by: Yinglin Luan <synmyth@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
\| * \|	bridge: netfilter: fix a memory leak	Changli Gao	2010-08-23	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	nf_bridge_alloc() always reset the skb->nf_bridge, so we should always put the old one. Signed-off-by: Changli Gao <xiaosuo@gmail.com> Signed-off-by: Bart De Schuymer <bdschuym@pandora.be> Signed-off-by: David S. Miller <davem@davemloft.net>
* \| \|	Merge branch 'for-linus' of ↵	Linus Torvalds	2010-08-28	13	-581/+0
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin: Blackfin: bf52x/bf54x boards: drop unused nand page size Blackfin: punt duplicate SPORT MMR defines
\| * \| \|	Blackfin: bf52x/bf54x boards: drop unused nand page size	Barry Song	2010-08-27	5	-5/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Now that the driver for the Blackfin on-chip NFC no longer uses/respects the page_size from the platform resources (figures out the needs on the fly), drop it from the platform resources. This fixes some build errors since the defines no longer exists. Signed-off-by: Barry Song <barry.song@analog.com> Signed-off-by: Mike Frysinger <vapier@gentoo.org>