blackbird-op-linux - Blackbird™ Linux sources for OpenPOWER

	Commit message (Collapse)	Author	Age	Files	Lines
*	Merge git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6	Linus Torvalds	2009-04-03	45	-306/+2466
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6: (21 commits) xtensa: we don't need to include asm/io.h xtensa: only build platform or variant if they contain a Makefile xtensa: make startup code discardable xtensa: ccount clocksource xtensa: remove platform rtc hooks xtensa: use generic sched_clock() xtensa: platform: s6105 xtensa: let platform override KERNELOFFSET xtensa: s6000 variant xtensa: s6000 variant core definitions xtensa: variant irq set callbacks xtensa: variant-specific code xtensa: nommu support xtensa: add flat support xtensa: enforce slab alignment to maximum register width xtensa: cope with ram beginning at higher addresses xtensa: don't make bootmem bitmap larger than required xtensa: fix init_bootmem_node() argument order xtensa: use correct stack pointer for stack traces xtensa: beat Kconfig into shape ...
\| *	Merge branch 'master' of ↵	Chris Zankel	2009-04-03	366	-6286/+34561
\| \|\ \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into merge
\| * \|	xtensa: we don't need to include asm/io.h	Chris Zankel	2009-04-03	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove include statement to include asm/io.h. Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: only build platform or variant if they contain a Makefile	Chris Zankel	2009-04-03	3	-10/+8
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We only add the platform or variant directory to core-y if it contains a Makefile. Consequently, we can remove the Makefiles for the dc232b and fsf processor variants. Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: make startup code discardable	Daniel Glöckner	2009-04-02	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Move it from .text to .init.text to get rid of it after boot and prevent illegal section references. Signed-off-by: Daniel Glöckner <dg@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: ccount clocksource	Johannes Weiner	2009-04-02	2	-73/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Switch to GENERIC_TIME by using the ccount register as a clock source. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: remove platform rtc hooks	Johannes Weiner	2009-04-02	3	-41/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	platform_get/set_rtc_time() is not implemented by any of the supported xtensa platforms. Remove the facility completely. The initial seconds for xtime come from read_persistent_clock() which returns just 0 in the generic implementation. Platforms that sport a persistent clock can implement this function. This is needed to implement the ccount as a clock source. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: use generic sched_clock()	Johannes Weiner	2009-04-02	1	-9/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Current xtensa implementation of sched_clock() is the same as the generic one. Just remove it, the weak symbol in kernel/sched_clock.c will be used instead. Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: platform: s6105	Johannes Weiner	2009-04-02	9	-0/+712
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support for the S6105 IP Camera Reference Design Kit. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Oskar Schirmer <os@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: let platform override KERNELOFFSET	Johannes Weiner	2009-04-02	1	-0/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The linker script should not assume a fix offset in memory for the kernel, this is platform-specific, so let the platform set it. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: s6000 variant	Johannes Weiner	2009-04-02	8	-0/+481
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Support for the Stretch S6000 Xtensa core variant. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Oskar Schirmer <os@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: s6000 variant core definitions	Johannes Weiner	2009-04-02	3	-0/+926
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	S6000 core configuration files from Tensilica. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: variant irq set callbacks	Johannes Weiner	2009-04-02	3	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow the core variant code to provide irq enable/disable callbacks. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: variant-specific code	Johannes Weiner	2009-04-02	3	-0/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Allow the variant to provide real code. Add empty dummy Makefiles for the existing variants. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: nommu support	Johannes Weiner	2009-04-02	19	-75/+169
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add support for !CONFIG_MMU setups. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: add flat support	Oskar Schirmer	2009-04-02	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add the arch-specific header for flat support on xtensa in preparation for the Xtensa S6000 nommu port. Signed-off-by: Oskar Schirmer <os@emlix.com> Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: enforce slab alignment to maximum register width	Oskar Schirmer	2009-04-02	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	XCHAL_DATA_WIDTH is the maximum register width, slab caches should be aligned to this. Theoretical fix as all variants have had an XCHAL_DATA_WIDTH of 4 (wordsize) for now. But the S6000 variant will raise this to 16. Signed-off-by: Oskar Schirmer <os@emlix.com> Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: cope with ram beginning at higher addresses	Johannes Weiner	2009-04-02	2	-6/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The current assumption of the memory code is that the first RAM PFN in the system is 0. Adjust the relevant code to play well with setups where memory starts at higher addresses, indicated by PLATFORM_DEFAULT_MEM_START. The new memory model looks like this: +----------+--+----------------------+----------------+ \| \| \| \| \| \| \| \| RAM \| \| \| \| \| \| \| +----------+--+----------------------+----------------+ \| \| \| \| \| +- PFN 0 \| +- min_low_pfn +- max_low_pfn +- max_pfn \| +- ARCH_PFN_OFFSET +- PLATFORM_DEFAULT_MEM_START >> PAGE_SIZE The memory map contains pages starting from pfn ARCH_PFN_OFFSET up to max_low_pfn. The only zone used right now will span exactly the same region. Usually, ARCH_PFN_OFFSET and min_low_pfn are the same value. Handle them separately for robustness. Gapping pages will be in the memory map but marked as reserved and won't be touched. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: don't make bootmem bitmap larger than required	Johannes Weiner	2009-04-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If min_low_pfn is non-zero, the bitmap reserved for bootmem is bigger than needed. The number of pages bootmem has to maintain is the range from min_low_pfn to max_low_pfn. For now it has only been a theoretical mistake, min_low_pfn was always zero. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: fix init_bootmem_node() argument order	Johannes Weiner	2009-04-02	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The second argument to init_bootmem_node() is the PFN to place the bootmem bitmap at and the third argument is the first PFN on the node. This is currently backwards but never made any problems as both values were always zero. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: use correct stack pointer for stack traces	Johannes Weiner	2009-04-02	1	-8/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now, the xtensa stacktrace code reads the _current_ kernel stack pointer if nothing is supplied. With debugging facilities like sysrq this means that the backtrace of the sysrq-handler is printed instead of a trace of the given task's stack. When no stack pointer is specified in show_trace() and show_stack(), use the stack pointer that comes with the handed in task descriptor to make stack traces more useful. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: beat Kconfig into shape	Johannes Weiner	2009-04-02	1	-88/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Instead of making support code depend on variants or platforms, the latter should select what they need explicitely. Otherwise this starts looking weird when support code depends on !XTENSA_PLATFORM_FOO && !XTENSA_PLATFORM_BAR etc. This also includes some minor fixlets like converting bool and default to def_bool and fixing indentation and whitespace errors. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
\| * \|	xtensa: remove redefinition of XCHAL_MMU_ASID_BITS	Johannes Weiner	2009-04-02	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This constant is defined in all core headers. Remove the redundant definition which might error out if other includes lead to inclusion of <variant/core.h>. Signed-off-by: Johannes Weiner <jw@emlix.com> Signed-off-by: Chris Zankel <chris@zankel.net>
* \| \|	Merge git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable	Linus Torvalds	2009-04-03	17	-544/+982
\|\ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: BUG to BUG_ON changes Btrfs: remove dead code Btrfs: remove dead code Btrfs: fix typos in comments Btrfs: remove unused ftrace include Btrfs: fix __ucmpdi2 compile bug on 32 bit builds Btrfs: free inode struct when btrfs_new_inode fails Btrfs: fix race in worker_loop Btrfs: add flushoncommit mount option Btrfs: notreelog mount option Btrfs: introduce btrfs_show_options Btrfs: rework allocation clustering Btrfs: Optimize locking in btrfs_next_leaf() Btrfs: break up btrfs_search_slot into smaller pieces Btrfs: kill the pinned_mutex Btrfs: kill the block group alloc mutex Btrfs: clean up find_free_extent Btrfs: free space cache cleanups Btrfs: unplug in the async bio submission threads Btrfs: keep processing bios for a given bdev if our proc is batching
\| * \| \|	Btrfs: BUG to BUG_ON changes	Stoyan Gaydarov	2009-04-02	3	-6/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: remove dead code	Dan Carpenter	2009-04-02	1	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove an unneeded return statement and conditional Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: remove dead code	Dan Carpenter	2009-04-02	1	-1/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	merge is always NULL at this point. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: fix typos in comments	Wu Fengguang	2009-04-02	3	-12/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: remove unused ftrace include	Jim Owens	2009-04-02	2	-2/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Signed-off-by: jim owens <jowens@hp.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: fix __ucmpdi2 compile bug on 32 bit builds	Heiko Carstens	2009-04-03	1	-11/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We get this on 32 builds: fs/built-in.o: In function `extent_fiemap': (.text+0x1019f2): undefined reference to `__ucmpdi2' Happens because of a switch statement with a 64 bit argument. Convert this to an if statement to fix this. Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: free inode struct when btrfs_new_inode fails	Shen Feng	2009-04-02	1	-1/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	btrfs_new_inode doesn't call iput to free the inode when it fails. Signed-off-by: Shen Feng <shen@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: fix race in worker_loop	Amit Gud	2009-04-02	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Need to check kthread_should_stop after schedule_timeout() before calling schedule(). This causes threads to sleep with potentially no one to wake them up causing mount(2) to hang in btrfs_stop_workers waiting for threads to stop. Signed-off-by: Amit Gud <gud@ksu.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: add flushoncommit mount option	Sage Weil	2009-04-02	3	-5/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The 'flushoncommit' mount option forces any data dirtied by a write in a prior transaction to commit as part of the current commit. This makes the committed state a fully consistent view of the file system from the application's perspective (i.e., it includes all completed file system operations). This was previously the behavior only when a snapshot is created. This is used by Ceph to ensure that completed writes make it to the platter along with the metadata operations they are bound to (by BTRFS_IOC_TRANS_{START,END}). Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: notreelog mount option	Sage Weil	2009-04-02	3	-1/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add a 'notreelog' mount option to disable the tree log (used by fsync, O_SYNC writes). This is much slower, but the tree logging produces inconsistent views into the FS for ceph. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: introduce btrfs_show_options	Eric Paris	2009-04-02	1	-1/+33
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	btrfs options can change at times other than mount, yet /proc/mounts shows the options string used when the fs was mounted (an example would be when btrfs determines that barriers aren't useful and turns them off.) This patch instead outputs the actual options in use by btrfs. Signed-off-by: Eric Paris <eparis@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: rework allocation clustering	Chris Mason	2009-04-03	6	-74/+509
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Because btrfs is copy-on-write, we end up picking new locations for blocks very often. This makes it fairly difficult to maintain perfect read patterns over time, but we can at least do some optimizations for writes. This is done today by remembering the last place we allocated and trying to find a free space hole big enough to hold more than just one allocation. The end result is that we tend to write sequentially to the drive. This happens all the time for metadata and it happens for data when mounted -o ssd. But, the way we record it is fairly racey and it tends to fragment the free space over time because we are trying to allocate fairly large areas at once. This commit gets rid of the races by adding a free space cluster object with dedicated locking to make sure that only one process at a time is out replacing the cluster. The free space fragmentation is somewhat solved by allowing a cluster to be comprised of smaller free space extents. This part definitely adds some CPU time to the cluster allocations, but it allows the allocator to consume the small holes left behind by cow. Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: Optimize locking in btrfs_next_leaf()	Chris Mason	2009-04-03	1	-23/+65
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	btrfs_next_leaf was using blocking locks when it could have been using faster spinning ones instead. This adds a few extra checks around the pieces that block and switches over to spinning locks. Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: break up btrfs_search_slot into smaller pieces	Chris Mason	2009-04-03	1	-90/+131
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	btrfs_search_slot was doing too many things at once. This breaks it up into more reasonable units. Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: kill the pinned_mutex	Josef Bacik	2009-04-03	4	-16/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes the pinned_mutex. The extent io map has an internal tree lock that protects the tree itself, and since we only copy the extent io map when we are committing the transaction we don't need it there. We also don't need it when caching the block group since searching through the tree is also protected by the internal map spin lock. Signed-off-by: Josef Bacik <jbacik@redhat.com>
\| * \| \|	Btrfs: kill the block group alloc mutex	Josef Bacik	2009-04-03	3	-157/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch removes the block group alloc mutex used to protect the free space tree for allocations and replaces it with a spin lock which is used only to protect the free space rb tree. This means we only take the lock when we are directly manipulating the tree, which makes us a touch faster with multi-threaded workloads. This patch also gets rid of btrfs_find_free_space and replaces it with btrfs_find_space_for_alloc, which takes the number of bytes you want to allocate, and empty_size, which is used to indicate how much free space should be at the end of the allocation. It will return an offset for the allocator to use. If we don't end up using it we _must_ call btrfs_add_free_space to put it back. This is the tradeoff to kill the alloc_mutex, since we need to make sure nobody else comes along and takes our space. Signed-off-by: Josef Bacik <jbacik@redhat.com>
\| * \| \|	Btrfs: clean up find_free_extent	Josef Bacik	2009-04-03	1	-165/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	I've replaced the strange looping constructs with a list_for_each_entry on space_info->block_groups. If we have a hint we just jump into the loop with the block group and start looking for space. If we don't find anything we start at the beginning and start looking. We never come out of the loop with a ref on the block_group _unless_ we found space to use, then we drop it after we set the trans block_group. Signed-off-by: Josef Bacik <jbacik@redhat.com>
\| * \| \|	Btrfs: free space cache cleanups	Josef Bacik	2009-04-03	2	-51/+44
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch cleans up the free space cache code a bit. It better documents the idiosyncrasies of tree_search_offset and makes the code make a bit more sense. I took out the info allocation at the start of __btrfs_add_free_space and put it where it makes more sense. This was left over cruft from when alloc_mutex existed. Also all of the re-searches we do to make sure we inserted properly. Signed-off-by: Josef Bacik <jbacik@redhat.com>
\| * \| \|	Btrfs: unplug in the async bio submission threads	Chris Mason	2009-04-03	1	-1/+13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Btrfs pages being written get set to writeback, and then may go through a number of steps before they hit the block layer. This includes compression, checksumming and async bio submission. The end result is that someone who writes a page and then does wait_on_page_writeback is likely to unplug the queue before the bio they cared about got there. We could fix this by marking bios sync, or by doing more frequent unplugs, but this commit just changes the async bio submission code to unplug after it has processed all the bios for a device. The async bio submission does a fair job of collection bios, so this shouldn't be a huge problem for reducing merging at the elevator. For streaming O_DIRECT writes on a 5 drive array, it boosts performance from 386MB/s to 460MB/s. Thanks to Hisashi Hifumi for helping with this work. Signed-off-by: Chris Mason <chris.mason@oracle.com>
\| * \| \|	Btrfs: keep processing bios for a given bdev if our proc is batching	Chris Mason	2009-04-03	1	-0/+27
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Btrfs uses async helper threads to submit write bios so the checksumming helper threads don't block on the disk. The submit bio threads may process bios for more than one block device, so when they find one device congested they try to move on to other devices instead of blocking in get_request_wait for one device. This does a pretty good job of keeping multiple devices busy, but the congested flag has a number of problems. A congested device may still give you a request, and other procs that aren't backing off the congested device may starve you out. This commit uses the io_context stored in current to decide if our process has been made a batching process by the block layer. If so, it keeps sending IO down for at least one batch. This helps make sure we do a good amount of work each time we visit a bdev, and avoids large IO stalls in multi-device workloads. It's also very ugly. A better solution is in the works with Jens Axboe. Signed-off-by: Chris Mason <chris.mason@oracle.com>
* \| \| \|	x86, PAT: Remove duplicate memtype reserve in pci mmap	Suresh Siddha	2009-04-03	1	-46/+0
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	pci mmap code was doing memtype reserve for a while now. Recently we added memtype tracking in remap_pfn_range, and pci code indirectly calls remap_pfn_range. So, we don't need seperate tracking in pci code anymore. Which means a patch that removes ~50 lines of code :-). Also, recently we found out that the pci tracking is not working as we expect it to work in some cases. Specifically, userlevel X mmap of pci, with some recent version of X, is having a problem with vm_page_prot getting reset. The pci tracking uses vm_page_prot to pass on the protection type from parent to child during fork. a) Parent does a pci mmap b) We look at PAT and get either UC_MINUS or WC mapping for parent c) Store that mapping type in vma vm_page_prot for future use d) This thread does a fork e) Fork results in mmap_ops ->open for the child process f) We get the vm_page_prot from vma and reserve that type for the child process But, between c) and e) above, the vma vm_page_prot is getting reset to zero. This results in PAT reserve failing at the time of fork as in here. http://marc.info/?l=linux-kernel&m=123858163103240&w=2 This cleanup makes the above problem go away as we do not depend on vm_page_prot in our PAT code anymore. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \| \|	Merge branch 'upstream-linus' of ↵	Linus Torvalds	2009-04-03	30	-689/+4389
\|\ \ \ \ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2 * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits) ocfs2: recover orphans in offline slots during recovery and mount ocfs2: Pagecache usage optimization on ocfs2 ocfs2: fix rare stale inode errors when exporting via nfs ocfs2/dlm: Tweak mle_state output ocfs2/dlm: Do not purge lockres that is being migrated dlm_purge_lockres() ocfs2/dlm: Remove struct dlm_lock_name in struct dlm_master_list_entry ocfs2/dlm: Show the number of lockres/mles in dlm_state ocfs2/dlm: dlm_set_lockres_owner() and dlm_change_lockres_owner() inlined ocfs2/dlm: Improve lockres counts ocfs2/dlm: Track number of mles ocfs2/dlm: Indent dlm_cleanup_master_list() ocfs2/dlm: Activate dlm->master_hash for master list entries ocfs2/dlm: Create and destroy the dlm->master_hash ocfs2/dlm: Refactor dlm_clean_master_list() ocfs2/dlm: Clean up struct dlm_lock_name ocfs2/dlm: Encapsulate adding and removing of mle from dlm->master_list ocfs2: Optimize inode group allocation by recording last used group. ocfs2: Allocate inode groups from global_bitmap. ocfs2: Optimize inode allocation by remembering last group ocfs2: fix leaf start calculation in ocfs2_dx_dir_rebalance() ...
\| * \| \| \|	ocfs2: recover orphans in offline slots during recovery and mount	Srinivas Eeda	2009-04-03	4	-18/+132
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	During recovery, a node recovers orphans in it's slot and the dead node(s). But if the dead nodes were holding orphans in offline slots, they will be left unrecovered. If the dead node is the last one to die and is holding orphans in other slots and is the first one to mount, then it only recovers it's own slot, which leaves orphans in offline slots. This patch queues complete_recovery to clean orphans for all offline slots during mount and node recovery. Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \| \| \|	ocfs2: Pagecache usage optimization on ocfs2	Hisashi Hifumi	2009-04-03	1	-11/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A page can have multiple buffers and even if a page is not uptodate, some buffers can be uptodate on pagesize != blocksize environment. This aops checks that all buffers which correspond to a part of a file that we want to read are uptodate. If so, we do not have to issue actual read IO to HDD even if a page is not uptodate because the portion we want to read are uptodate. "block_is_partially_uptodate" function is already used by ext2/3/4. With the following patch random read/write mixed workloads or random read after random write workloads can be optimized and we can get performance improvement. Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \| \| \|	ocfs2: fix rare stale inode errors when exporting via nfs	wengang wang	2009-04-03	9	-8/+319
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	For nfs exporting, ocfs2_get_dentry() returns the dentry for fh. ocfs2_get_dentry() may read from disk when the inode is not in memory, without any cross cluster lock. this leads to the file system loading a stale inode. This patch fixes above problem. Solution is that in case of inode is not in memory, we get the cluster lock(PR) of alloc inode where the inode in question is allocated from (this causes node on which deletion is done sync the alloc inode) before reading out the inode itsself. then we check the bitmap in the group (the inode in question allcated from) to see if the bit is clear. if it's clear then it's stale. if the bit is set, we then check generation as the existing code does. We have to read out the inode in question from disk first to know its alloc slot and allot bit. And if its not stale we read it out using ocfs2_iget(). The second read should then be from cache. And also we have to add a per superblock nfs_sync_lock to cover the lock for alloc inode and that for inode in question. this is because ocfs2_get_dentry() and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so that mutliple ocfs2_delete_inode() can run concurrently in normal case. [mfasheh@suse.com: build warning fixes and comment cleanups] Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Acked-by: Joel Becker <joel.becker@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>
\| * \| \| \|	ocfs2/dlm: Tweak mle_state output	Sunil Mushran	2009-04-03	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The debugfs file, mle_state, now prints the number of largest number of mles in one hash link. Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com> Signed-off-by: Mark Fasheh <mfasheh@suse.com>