| Commit message (Collapse) | Author | Age | Files | Lines |
|\
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
* git://git.kernel.org/pub/scm/linux/kernel/git/czankel/xtensa-2.6: (21 commits)
xtensa: we don't need to include asm/io.h
xtensa: only build platform or variant if they contain a Makefile
xtensa: make startup code discardable
xtensa: ccount clocksource
xtensa: remove platform rtc hooks
xtensa: use generic sched_clock()
xtensa: platform: s6105
xtensa: let platform override KERNELOFFSET
xtensa: s6000 variant
xtensa: s6000 variant core definitions
xtensa: variant irq set callbacks
xtensa: variant-specific code
xtensa: nommu support
xtensa: add flat support
xtensa: enforce slab alignment to maximum register width
xtensa: cope with ram beginning at higher addresses
xtensa: don't make bootmem bitmap larger than required
xtensa: fix init_bootmem_node() argument order
xtensa: use correct stack pointer for stack traces
xtensa: beat Kconfig into shape
...
|
| |\
| | |
| | |
| | | |
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6 into merge
|
| | |
| | |
| | |
| | |
| | |
| | | |
Remove include statement to include asm/io.h.
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
We only add the platform or variant directory to core-y if it
contains a Makefile. Consequently, we can remove the Makefiles
for the dc232b and fsf processor variants.
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Move it from .text to .init.text to get rid of it after boot and
prevent illegal section references.
Signed-off-by: Daniel Glöckner <dg@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Switch to GENERIC_TIME by using the ccount register as a clock source.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
platform_get/set_rtc_time() is not implemented by any of the supported
xtensa platforms. Remove the facility completely.
The initial seconds for xtime come from read_persistent_clock() which
returns just 0 in the generic implementation. Platforms that sport a
persistent clock can implement this function.
This is needed to implement the ccount as a clock source.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Current xtensa implementation of sched_clock() is the same as the
generic one. Just remove it, the weak symbol in kernel/sched_clock.c
will be used instead.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Support for the S6105 IP Camera Reference Design Kit.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Oskar Schirmer <os@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The linker script should not assume a fix offset in memory for the
kernel, this is platform-specific, so let the platform set it.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Support for the Stretch S6000 Xtensa core variant.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Oskar Schirmer <os@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
S6000 core configuration files from Tensilica.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow the core variant code to provide irq enable/disable callbacks.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Allow the variant to provide real code. Add empty dummy Makefiles for
the existing variants.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add support for !CONFIG_MMU setups.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Add the arch-specific header for flat support on xtensa in preparation
for the Xtensa S6000 nommu port.
Signed-off-by: Oskar Schirmer <os@emlix.com>
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
XCHAL_DATA_WIDTH is the maximum register width, slab caches should be
aligned to this.
Theoretical fix as all variants have had an XCHAL_DATA_WIDTH of 4
(wordsize) for now. But the S6000 variant will raise this to 16.
Signed-off-by: Oskar Schirmer <os@emlix.com>
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The current assumption of the memory code is that the first RAM PFN in
the system is 0.
Adjust the relevant code to play well with setups where memory starts
at higher addresses, indicated by PLATFORM_DEFAULT_MEM_START.
The new memory model looks like this:
+----------+--+----------------------+----------------+
| | | | |
| | | RAM | |
| | | | |
+----------+--+----------------------+----------------+
| | | | |
+- PFN 0 | +- min_low_pfn +- max_low_pfn +- max_pfn
|
+- ARCH_PFN_OFFSET
+- PLATFORM_DEFAULT_MEM_START >> PAGE_SIZE
The memory map contains pages starting from pfn ARCH_PFN_OFFSET up to
max_low_pfn. The only zone used right now will span exactly the same
region.
Usually, ARCH_PFN_OFFSET and min_low_pfn are the same value. Handle
them separately for robustness. Gapping pages will be in the memory
map but marked as reserved and won't be touched.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
If min_low_pfn is non-zero, the bitmap reserved for bootmem is bigger
than needed. The number of pages bootmem has to maintain is the range
from min_low_pfn to max_low_pfn.
For now it has only been a theoretical mistake, min_low_pfn was always
zero.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
The second argument to init_bootmem_node() is the PFN to place the
bootmem bitmap at and the third argument is the first PFN on the node.
This is currently backwards but never made any problems as both values
were always zero.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Right now, the xtensa stacktrace code reads the _current_ kernel stack
pointer if nothing is supplied. With debugging facilities like sysrq
this means that the backtrace of the sysrq-handler is printed instead
of a trace of the given task's stack.
When no stack pointer is specified in show_trace() and show_stack(),
use the stack pointer that comes with the handed in task descriptor to
make stack traces more useful.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
Instead of making support code depend on variants or platforms, the
latter should select what they need explicitely.
Otherwise this starts looking weird when support code depends on
!XTENSA_PLATFORM_FOO && !XTENSA_PLATFORM_BAR etc.
This also includes some minor fixlets like converting bool and default
to def_bool and fixing indentation and whitespace errors.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
This constant is defined in all core headers. Remove the redundant
definition which might error out if other includes lead to inclusion
of <variant/core.h>.
Signed-off-by: Johannes Weiner <jw@emlix.com>
Signed-off-by: Chris Zankel <chris@zankel.net>
|
|\ \ \
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
* git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable:
Btrfs: BUG to BUG_ON changes
Btrfs: remove dead code
Btrfs: remove dead code
Btrfs: fix typos in comments
Btrfs: remove unused ftrace include
Btrfs: fix __ucmpdi2 compile bug on 32 bit builds
Btrfs: free inode struct when btrfs_new_inode fails
Btrfs: fix race in worker_loop
Btrfs: add flushoncommit mount option
Btrfs: notreelog mount option
Btrfs: introduce btrfs_show_options
Btrfs: rework allocation clustering
Btrfs: Optimize locking in btrfs_next_leaf()
Btrfs: break up btrfs_search_slot into smaller pieces
Btrfs: kill the pinned_mutex
Btrfs: kill the block group alloc mutex
Btrfs: clean up find_free_extent
Btrfs: free space cache cleanups
Btrfs: unplug in the async bio submission threads
Btrfs: keep processing bios for a given bdev if our proc is batching
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Remove an unneeded return statement and conditional
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
merge is always NULL at this point.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Signed-off-by: jim owens <jowens@hp.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
We get this on 32 builds:
fs/built-in.o: In function `extent_fiemap':
(.text+0x1019f2): undefined reference to `__ucmpdi2'
Happens because of a switch statement with a 64 bit argument.
Convert this to an if statement to fix this.
Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
btrfs_new_inode doesn't call iput to free the inode
when it fails.
Signed-off-by: Shen Feng <shen@cn.fujitsu.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Need to check kthread_should_stop after schedule_timeout() before calling
schedule(). This causes threads to sleep with potentially no one to wake them
up causing mount(2) to hang in btrfs_stop_workers waiting for threads to stop.
Signed-off-by: Amit Gud <gud@ksu.edu>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
The 'flushoncommit' mount option forces any data dirtied by a write in a
prior transaction to commit as part of the current commit. This makes
the committed state a fully consistent view of the file system from the
application's perspective (i.e., it includes all completed file system
operations). This was previously the behavior only when a snapshot is
created.
This is used by Ceph to ensure that completed writes make it to the
platter along with the metadata operations they are bound to (by
BTRFS_IOC_TRANS_{START,END}).
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Add a 'notreelog' mount option to disable the tree log (used by fsync,
O_SYNC writes). This is much slower, but the tree logging produces
inconsistent views into the FS for ceph.
Signed-off-by: Sage Weil <sage@newdream.net>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
btrfs options can change at times other than mount, yet /proc/mounts shows the
options string used when the fs was mounted (an example would be when btrfs
determines that barriers aren't useful and turns them off.) This patch
instead outputs the actual options in use by btrfs.
Signed-off-by: Eric Paris <eparis@redhat.com>
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Because btrfs is copy-on-write, we end up picking new locations for
blocks very often. This makes it fairly difficult to maintain perfect
read patterns over time, but we can at least do some optimizations
for writes.
This is done today by remembering the last place we allocated and
trying to find a free space hole big enough to hold more than just one
allocation. The end result is that we tend to write sequentially to
the drive.
This happens all the time for metadata and it happens for data
when mounted -o ssd. But, the way we record it is fairly racey
and it tends to fragment the free space over time because we are trying
to allocate fairly large areas at once.
This commit gets rid of the races by adding a free space cluster object
with dedicated locking to make sure that only one process at a time
is out replacing the cluster.
The free space fragmentation is somewhat solved by allowing a cluster
to be comprised of smaller free space extents. This part definitely
adds some CPU time to the cluster allocations, but it allows the allocator
to consume the small holes left behind by cow.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
btrfs_next_leaf was using blocking locks when it could have been using
faster spinning ones instead. This adds a few extra checks around
the pieces that block and switches over to spinning locks.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
btrfs_search_slot was doing too many things at once. This breaks
it up into more reasonable units.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch removes the pinned_mutex. The extent io map has an internal tree
lock that protects the tree itself, and since we only copy the extent io map
when we are committing the transaction we don't need it there. We also don't
need it when caching the block group since searching through the tree is also
protected by the internal map spin lock.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch removes the block group alloc mutex used to protect the free space
tree for allocations and replaces it with a spin lock which is used only to
protect the free space rb tree. This means we only take the lock when we are
directly manipulating the tree, which makes us a touch faster with
multi-threaded workloads.
This patch also gets rid of btrfs_find_free_space and replaces it with
btrfs_find_space_for_alloc, which takes the number of bytes you want to
allocate, and empty_size, which is used to indicate how much free space should
be at the end of the allocation.
It will return an offset for the allocator to use. If we don't end up using it
we _must_ call btrfs_add_free_space to put it back. This is the tradeoff to
kill the alloc_mutex, since we need to make sure nobody else comes along and
takes our space.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
I've replaced the strange looping constructs with a list_for_each_entry on
space_info->block_groups. If we have a hint we just jump into the loop with
the block group and start looking for space. If we don't find anything we
start at the beginning and start looking. We never come out of the loop with a
ref on the block_group _unless_ we found space to use, then we drop it after we
set the trans block_group.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
This patch cleans up the free space cache code a bit. It better documents the
idiosyncrasies of tree_search_offset and makes the code make a bit more sense.
I took out the info allocation at the start of __btrfs_add_free_space and put it
where it makes more sense. This was left over cruft from when alloc_mutex
existed. Also all of the re-searches we do to make sure we inserted properly.
Signed-off-by: Josef Bacik <jbacik@redhat.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Btrfs pages being written get set to writeback, and then may go through
a number of steps before they hit the block layer. This includes compression,
checksumming and async bio submission.
The end result is that someone who writes a page and then does
wait_on_page_writeback is likely to unplug the queue before the bio they
cared about got there.
We could fix this by marking bios sync, or by doing more frequent unplugs,
but this commit just changes the async bio submission code to unplug
after it has processed all the bios for a device. The async bio submission
does a fair job of collection bios, so this shouldn't be a huge problem
for reducing merging at the elevator.
For streaming O_DIRECT writes on a 5 drive array, it boosts performance
from 386MB/s to 460MB/s.
Thanks to Hisashi Hifumi for helping with this work.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
Btrfs uses async helper threads to submit write bios so the checksumming
helper threads don't block on the disk.
The submit bio threads may process bios for more than one block device,
so when they find one device congested they try to move on to other
devices instead of blocking in get_request_wait for one device.
This does a pretty good job of keeping multiple devices busy, but the
congested flag has a number of problems. A congested device may still
give you a request, and other procs that aren't backing off the congested
device may starve you out.
This commit uses the io_context stored in current to decide if our process
has been made a batching process by the block layer. If so, it keeps
sending IO down for at least one batch. This helps make sure we do
a good amount of work each time we visit a bdev, and avoids large IO
stalls in multi-device workloads.
It's also very ugly. A better solution is in the works with Jens Axboe.
Signed-off-by: Chris Mason <chris.mason@oracle.com>
|
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | |
| | | | |
pci mmap code was doing memtype reserve for a while now. Recently we
added memtype tracking in remap_pfn_range, and pci code indirectly calls
remap_pfn_range. So, we don't need seperate tracking in pci code
anymore. Which means a patch that removes ~50 lines of code :-).
Also, recently we found out that the pci tracking is not working as we expect
it to work in some cases. Specifically, userlevel X mmap of pci, with some
recent version of X, is having a problem with vm_page_prot getting reset.
The pci tracking uses vm_page_prot to pass on the protection type from parent
to child during fork.
a) Parent does a pci mmap
b) We look at PAT and get either UC_MINUS or WC mapping for parent
c) Store that mapping type in vma vm_page_prot for future use
d) This thread does a fork
e) Fork results in mmap_ops ->open for the child process
f) We get the vm_page_prot from vma and reserve that type for the child process
But, between c) and e) above, the vma vm_page_prot is getting reset to zero.
This results in PAT reserve failing at the time of fork as in here.
http://marc.info/?l=linux-kernel&m=123858163103240&w=2
This cleanup makes the above problem go away as we do not depend on
vm_page_prot in our PAT code anymore.
Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|\ \ \ \
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mfasheh/ocfs2: (32 commits)
ocfs2: recover orphans in offline slots during recovery and mount
ocfs2: Pagecache usage optimization on ocfs2
ocfs2: fix rare stale inode errors when exporting via nfs
ocfs2/dlm: Tweak mle_state output
ocfs2/dlm: Do not purge lockres that is being migrated dlm_purge_lockres()
ocfs2/dlm: Remove struct dlm_lock_name in struct dlm_master_list_entry
ocfs2/dlm: Show the number of lockres/mles in dlm_state
ocfs2/dlm: dlm_set_lockres_owner() and dlm_change_lockres_owner() inlined
ocfs2/dlm: Improve lockres counts
ocfs2/dlm: Track number of mles
ocfs2/dlm: Indent dlm_cleanup_master_list()
ocfs2/dlm: Activate dlm->master_hash for master list entries
ocfs2/dlm: Create and destroy the dlm->master_hash
ocfs2/dlm: Refactor dlm_clean_master_list()
ocfs2/dlm: Clean up struct dlm_lock_name
ocfs2/dlm: Encapsulate adding and removing of mle from dlm->master_list
ocfs2: Optimize inode group allocation by recording last used group.
ocfs2: Allocate inode groups from global_bitmap.
ocfs2: Optimize inode allocation by remembering last group
ocfs2: fix leaf start calculation in ocfs2_dx_dir_rebalance()
...
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
During recovery, a node recovers orphans in it's slot and the dead node(s). But
if the dead nodes were holding orphans in offline slots, they will be left
unrecovered.
If the dead node is the last one to die and is holding orphans in other slots
and is the first one to mount, then it only recovers it's own slot, which
leaves orphans in offline slots.
This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
A page can have multiple buffers and even if a page is not uptodate, some buffers
can be uptodate on pagesize != blocksize environment.
This aops checks that all buffers which correspond to a part of a file
that we want to read are uptodate. If so, we do not have to issue actual
read IO to HDD even if a page is not uptodate because the portion we
want to read are uptodate.
"block_is_partially_uptodate" function is already used by ext2/3/4.
With the following patch random read/write mixed workloads or random read after
random write workloads can be optimized and we can get performance improvement.
Signed-off-by: Hisashi Hifumi <hifumi.hisashi@oss.ntt.co.jp>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
For nfs exporting, ocfs2_get_dentry() returns the dentry for fh.
ocfs2_get_dentry() may read from disk when the inode is not in memory,
without any cross cluster lock. this leads to the file system loading a
stale inode.
This patch fixes above problem.
Solution is that in case of inode is not in memory, we get the cluster
lock(PR) of alloc inode where the inode in question is allocated from (this
causes node on which deletion is done sync the alloc inode) before reading
out the inode itsself. then we check the bitmap in the group (the inode in
question allcated from) to see if the bit is clear. if it's clear then it's
stale. if the bit is set, we then check generation as the existing code
does.
We have to read out the inode in question from disk first to know its alloc
slot and allot bit. And if its not stale we read it out using ocfs2_iget().
The second read should then be from cache.
And also we have to add a per superblock nfs_sync_lock to cover the lock for
alloc inode and that for inode in question. this is because ocfs2_get_dentry()
and ocfs2_delete_inode() lock on them in reverse order. nfs_sync_lock is locked
in EX mode in ocfs2_get_dentry() and in PR mode in ocfs2_delete_inode(). so
that mutliple ocfs2_delete_inode() can run concurrently in normal case.
[mfasheh@suse.com: build warning fixes and comment cleanups]
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | |
| | | | | |
The debugfs file, mle_state, now prints the number of largest number of mles
in one hash link.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
|