summaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
* UAPI: Strip _UAPI prefix on header install no matter the whitespaceDavid Howells2013-01-021-3/+3
| | | | | | | | | | | | | Commit 56c176c9cac9 ("UAPI: strip the _UAPI prefix from header guards during header installation") strips the _UAPI prefix from header guards, but only if there's a single space between the cpp directive and the label. Make it more flexible and able to handle tabs and multiple white space characters. Signed-off-by: David Howells <dhowell@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* UAPI: Remove empty Kbuild filesDavid Howells2013-01-028-8/+0
| | | | | | | | Empty files can get deleted by the patch program, so remove empty Kbuild files and their links from the parent Kbuilds. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* Merge tag 'ecryptfs-3.8-rc2-fixes' of ↵Linus Torvalds2013-01-023-6/+14
|\ | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs Pull ecryptfs fixes from Tyler Hicks: "Two self-explanatory fixes and a third patch which improves performance: when overwriting a full page in the eCryptfs page cache, skip reading in and decrypting the corresponding lower page." * tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs: fs/ecryptfs/crypto.c: make ecryptfs_encode_for_filename() static eCryptfs: fix to use list_for_each_entry_safe() when delete items eCryptfs: Avoid unnecessary disk read and data decryption during writing
| * fs/ecryptfs/crypto.c: make ecryptfs_encode_for_filename() staticCong Ding2012-12-181-1/+1
| | | | | | | | | | | | | | the function ecryptfs_encode_for_filename() is only used in this file Signed-off-by: Cong Ding <dinggnu@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
| * eCryptfs: fix to use list_for_each_entry_safe() when delete itemsWei Yongjun2012-12-181-3/+3
| | | | | | | | | | | | | | | | | | | | | | Since we will be removing items off the list using list_del() we need to use a safer version of the list_for_each_entry() macro aptly named list_for_each_entry_safe(). We should use the safe macro if the loop involves deletions of items. Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn> [tyhicks: Fixed compiler err - missing list_for_each_entry_safe() param] Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
| * eCryptfs: Avoid unnecessary disk read and data decryption during writingLi Wang2012-11-071-2/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ecryptfs_write_begin grabs a page from page cache for writing. If the page contains invalid data, or data older than the counterpart on the disk, eCryptfs will read out the corresponing data from the disk into the page, decrypt them, then perform writing. However, for this page, if the length of the data to be written into is equal to page size, that means the whole page of data will be overwritten, in which case, it does not matter whatever the data were before, it is beneficial to perform writing directly rather than bothering to read and decrypt first. With this optimization, according to our test on a machine with Intel Core 2 Duo processor, iozone 'write' operation on an existing file with write size being multiple of page size will enjoy a steady 3x speedup. Signed-off-by: Li Wang <wangli@kylinos.com.cn> Signed-off-by: Yunchuan Wen <wenyunchuan@kylinos.com.cn> Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
* | Merge branch 'for-linus' of ↵Linus Torvalds2013-01-022-28/+29
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client Pull Ceph fixes from Sage Weil: "Two of Alex's patches deal with a race when reseting server connections for open RBD images, one demotes some non-fatal BUGs to WARNs, and my patch fixes a protocol feature bit failure path." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: libceph: fix protocol feature mismatch failure path libceph: WARN, don't BUG on unexpected connection states libceph: always reset osds when kicking libceph: move linger requests sooner in kick_requests()
| * | libceph: fix protocol feature mismatch failure pathSage Weil2012-12-271-10/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We should not set con->state to CLOSED here; that happens in ceph_fault() in the caller, where it first asserts that the state is not yet CLOSED. Avoids a BUG when the features don't match. Since the fail_protocol() has become a trivial wrapper, replace calls to it with direct calls to reset_connection(). Signed-off-by: Sage Weil <sage@inktank.com> Reviewed-by: Alex Elder <elder@inktank.com>
| * | libceph: WARN, don't BUG on unexpected connection statesAlex Elder2012-12-271-6/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | A number of assertions in the ceph messenger are implemented with BUG_ON(), killing the system if connection's state doesn't match what's expected. At this point our state model is (evidently) not well understood enough for these assertions to trigger a BUG(). Convert all BUG_ON(con->state...) calls to be WARN_ON(con->state...) so we learn about these issues without killing the machine. We now recognize that a connection fault can occur due to a socket closure at any time, regardless of the state of the connection. So there is really nothing we can assert about the state of the connection at that point so eliminate that assertion. Reported-by: Ugis <ugis22@gmail.com> Tested-by: Ugis <ugis22@gmail.com> Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | libceph: always reset osds when kickingAlex Elder2012-12-271-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When ceph_osdc_handle_map() is called to process a new osd map, kick_requests() is called to ensure all affected requests are updated if necessary to reflect changes in the osd map. This happens in two cases: whenever an incremental map update is processed; and when a full map update (or the last one if there is more than one) gets processed. In the former case, the kick_requests() call is followed immediately by a call to reset_changed_osds() to ensure any connections to osds affected by the map change are reset. But for full map updates this isn't done. Both cases should be doing this osd reset. Rather than duplicating the reset_changed_osds() call, move it into the end of kick_requests(). Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
| * | libceph: move linger requests sooner in kick_requests()Alex Elder2012-12-271-11/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The kick_requests() function is called by ceph_osdc_handle_map() when an osd map change has been indicated. Its purpose is to re-queue any request whose target osd is different from what it was when it was originally sent. It is structured as two loops, one for incomplete but registered requests, and a second for handling completed linger requests. As a special case, in the first loop if a request marked to linger has not yet completed, it is moved from the request list to the linger list. This is as a quick and dirty way to have the second loop handle sending the request along with all the other linger requests. Because of the way it's done now, however, this quick and dirty solution can result in these incomplete linger requests never getting re-sent as desired. The problem lies in the fact that the second loop only arranges for a linger request to be sent if it appears its target osd has changed. This is the proper handling for *completed* linger requests (it avoids issuing the same linger request twice to the same osd). But although the linger requests added to the list in the first loop may have been sent, they have not yet completed, so they need to be re-sent regardless of whether their target osd has changed. The first required fix is we need to avoid calling __map_request() on any incomplete linger request. Otherwise the subsequent __map_request() call in the second loop will find the target osd has not changed and will therefore not re-send the request. Second, we need to be sure that a sent but incomplete linger request gets re-sent. If the target osd is the same with the new osd map as it was when the request was originally sent, this won't happen. This can be fixed through careful handling when we move these requests from the request list to the linger list, by unregistering the request *before* it is registered as a linger request. This works because a side-effect of unregistering the request is to make the request's r_osd pointer be NULL, and *that* will ensure the second loop actually re-sends the linger request. Processing of such a request is done at that point, so continue with the next one once it's been moved. Signed-off-by: Alex Elder <elder@inktank.com> Reviewed-by: Sage Weil <sage@inktank.com>
* | | mm: mempolicy: Convert shared_policy mutex to spinlockMel Gorman2013-01-022-21/+49
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Sasha was fuzzing with trinity and reported the following problem: BUG: sleeping function called from invalid context at kernel/mutex.c:269 in_atomic(): 1, irqs_disabled(): 0, pid: 6361, name: trinity-main 2 locks held by trinity-main/6361: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff810aa314>] __do_page_fault+0x1e4/0x4f0 #1: (&(&mm->page_table_lock)->rlock){+.+...}, at: [<ffffffff8122f017>] handle_pte_fault+0x3f7/0x6a0 Pid: 6361, comm: trinity-main Tainted: G W 3.7.0-rc2-next-20121024-sasha-00001-gd95ef01-dirty #74 Call Trace: __might_sleep+0x1c3/0x1e0 mutex_lock_nested+0x29/0x50 mpol_shared_policy_lookup+0x2e/0x90 shmem_get_policy+0x2e/0x30 get_vma_policy+0x5a/0xa0 mpol_misplaced+0x41/0x1d0 handle_pte_fault+0x465/0x6a0 This was triggered by a different version of automatic NUMA balancing but in theory the current version is vunerable to the same problem. do_numa_page -> numa_migrate_prep -> mpol_misplaced -> get_vma_policy -> shmem_get_policy It's very unlikely this will happen as shared pages are not marked pte_numa -- see the page_mapcount() check in change_pte_range() -- but it is possible. To address this, this patch restores sp->lock as originally implemented by Kosaki Motohiro. In the path where get_vma_policy() is called, it should not be calling sp_alloc() so it is not necessary to treat the PTL specially. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | Merge tag 'ext4_for_linus' of ↵Linus Torvalds2013-01-029-58/+152
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 bug fixes from Ted Ts'o: "Various bug fixes for ext4. Perhaps the most serious bug fixed is one which could cause file system corruptions when performing file punch operations." * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: avoid hang when mounting non-journal filesystems with orphan list ext4: lock i_mutex when truncating orphan inodes ext4: do not try to write superblock on ro remount w/o journal ext4: include journal blocks in df overhead calcs ext4: remove unaligned AIO warning printk ext4: fix an incorrect comment about i_mutex ext4: fix deadlock in journal_unmap_buffer() ext4: split off ext4_journalled_invalidatepage() jbd2: fix assertion failure in jbd2_journal_flush() ext4: check dioread_nolock on remount ext4: fix extent tree corruption caused by hole punch
| * | | ext4: avoid hang when mounting non-journal filesystems with orphan listTheodore Ts'o2012-12-271-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When trying to mount a file system which does not contain a journal, but which does have a orphan list containing an inode which needs to be truncated, the mount call with hang forever in ext4_orphan_cleanup() because ext4_orphan_del() will return immediately without removing the inode from the orphan list, leading to an uninterruptible loop in kernel code which will busy out one of the CPU's on the system. This can be trivially reproduced by trying to mount the file system found in tests/f_orphan_extents_inode/image.gz from the e2fsprogs source tree. If a malicious user were to put this on a USB stick, and mount it on a Linux desktop which has automatic mounts enabled, this could be considered a potential denial of service attack. (Not a big deal in practice, but professional paranoids worry about such things, and have even been known to allocate CVE numbers for such problems.) Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
| * | | ext4: lock i_mutex when truncating orphan inodesTheodore Ts'o2012-12-271-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit c278531d39 added a warning when ext4_flush_unwritten_io() is called without i_mutex being taken. It had previously not been taken during orphan cleanup since races weren't possible at that point in the mount process, but as a result of this c278531d39, we will now see a kernel WARN_ON in this case. Take the i_mutex in ext4_orphan_cleanup() to suppress this warning. Reported-by: Alexander Beregalov <a.beregalov@gmail.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Zheng Liu <wenqing.lz@taobao.com> Cc: stable@vger.kernel.org
| * | | ext4: do not try to write superblock on ro remount w/o journalMichael Tokarev2012-12-251-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When a journal-less ext4 filesystem is mounted on a read-only block device (blockdev --setro will do), each remount (for other, unrelated, flags, like suid=>nosuid etc) results in a series of scary messages from kernel telling about I/O errors on the device. This is becauese of the following code ext4_remount(): if (sbi->s_journal == NULL) ext4_commit_super(sb, 1); at the end of remount procedure, which forces writing (flushing) of a superblock regardless whenever it is dirty or not, if the filesystem is readonly or not, and whenever the device itself is readonly or not. We only need call ext4_commit_super when the file system had been previously mounted read/write. Thanks to Eric Sandeen for help in diagnosing this issue. Signed-off-By: Michael Tokarev <mjt@tls.msk.ru> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | | ext4: include journal blocks in df overhead calcsEric Sandeen2012-12-251-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | To more accurately calculate overhead for "bsd" style df reporting, we should count the journal blocks as overhead as well. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Tested-by: Eric Whitney <enwlinux@gmail.com>
| * | | ext4: remove unaligned AIO warning printkEric Sandeen2012-12-251-8/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Although I put this in, I now think it was a bad decision. For most users, there is very little to be done in this case. They get the message, once per day, with no real context or proposed action. TBH, it generates support calls when it probably does not need to; the message sounds more dire than the situation really is. Just nuke it. Normal investigation via blktrace or whatnot can reveal poor IO patterns if bad performance is encountered. Signed-off-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | ext4: fix an incorrect comment about i_mutexAndy Lutomirski2012-12-251-2/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | i_mutex is not held when ->sync_file is called. Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | ext4: fix deadlock in journal_unmap_buffer()Jan Kara2012-12-253-25/+86
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We cannot wait for transaction commit in journal_unmap_buffer() because we hold page lock which ranks below transaction start. We solve the issue by bailing out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY. Caller is then responsible for waiting for transaction commit to finish and try invalidation again. Since the issue can happen only for page stradding i_size, it is simple enough to manually call jbd2_journal_invalidatepage() for such page from ext4_setattr(), check the return value and wait if necessary. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | ext4: split off ext4_journalled_invalidatepage()Jan Kara2012-12-252-8/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In data=journal mode we don't need delalloc or DIO handling in invalidatepage and similarly in other modes we don't need the journal handling. So split invalidatepage implementations. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
| * | | jbd2: fix assertion failure in jbd2_journal_flush()Jan Kara2012-12-211-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The following race is possible between start_this_handle() and someone calling jbd2_journal_flush(). Process A Process B start_this_handle(). if (journal->j_barrier_count) # false if (!journal->j_running_transaction) { #true read_unlock(&journal->j_state_lock); jbd2_journal_lock_updates() jbd2_journal_flush() write_lock(&journal->j_state_lock); if (journal->j_running_transaction) { # false ... wait for committing trans ... write_unlock(&journal->j_state_lock); ... write_lock(&journal->j_state_lock); if (!journal->j_running_transaction) { # true jbd2_get_transaction(journal, new_transaction); write_unlock(&journal->j_state_lock); goto repeat; # eventually blocks on j_barrier_count > 0 ... J_ASSERT(!journal->j_running_transaction); # fails We fix the race by rechecking j_barrier_count after reacquiring j_state_lock in exclusive mode. Reported-by: yjwsignal@empal.com Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Cc: stable@vger.kernel.org
| * | | ext4: check dioread_nolock on remountJan Kara2012-12-201-11/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently we allow enabling dioread_nolock mount option on remount for filesystems where blocksize < PAGE_CACHE_SIZE. This isn't really supported so fix the bug by moving the check for blocksize != PAGE_CACHE_SIZE into parse_options(). Change the original PAGE_SIZE to PAGE_CACHE_SIZE along the way because that's what we are really interested in. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Eric Sandeen <sandeen@redhat.com> Cc: stable@vger.kernel.org
| * | | ext4: fix extent tree corruption caused by hole punchForrest Liu2012-12-171-4/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When depth of extent tree is greater than 1, logical start value of interior node is not correctly updated in ext4_ext_rm_idx. Signed-off-by: Forrest Liu <forrestl@synology.com> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu> Reviewed-by: Ashish Sangwan <ashishsangwan2@gmail.com> Cc: stable@vger.kernel.org
* | | | mempolicy: remove arg from mpol_parse_str, mpol_to_strHugh Dickins2013-01-024-14/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the unused argument (formerly no_context) from mpol_parse_str() and from mpol_to_str(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | tmpfs mempolicy: fix /proc/mounts corrupting memoryHugh Dickins2013-01-021-38/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA mempolicy testing. Very nasty. Reading /proc/mounts, /proc/pid/mounts or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often in a page table (causing "Bad swap" or "Bad page map" warning or "Bad pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere worse. "mpol=prefer" and "mpol=prefer:Node" are equally toxic. Recent NUMA enhancements are not to blame: this dates back to 2.6.35, when commit e17f74af351c "mempolicy: don't call mpol_set_nodemask() when no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(), which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags. With slab poisoning, you can then rely on mpol_to_str() to set the bit for node 0x6b6b, probably in the next page above the caller's stack. mpol_parse_str() is only called from shmem_parse_options(): no_context is always true, so call it unused for now, and remove !no_context code. Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might expect. Then mpol_to_str() can ignore its no_context argument also, the mpol being appropriately initialized whether contextualized or not. Rename its no_context unused too, and let subsequent patch remove them (that's not needed for stable backporting, which would involve rejects). I don't understand why MPOL_LOCAL is described as a pseudo-policy: it's a reasonable policy which suffers from a confusing implementation in terms of MPOL_PREFERRED with MPOL_F_LOCAL. I believe this would be much more robust if MPOL_LOCAL were recognized in switch statements throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly empty) nodes mask like everyone else, instead of its preferred_node variant (I presume an optimization from the days before MPOL_LOCAL). But that would take me too long to get right and fully tested. Signed-off-by: Hugh Dickins <hughd@google.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | epoll: prevent missed events on EPOLL_CTL_MODEric Wong2013-01-021-1/+21
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to ensure events are not missed. Since the modifications to the interest mask are not protected by the same lock as ep_poll_callback, we need to ensure the change is visible to other CPUs calling ep_poll_callback. We also need to ensure f_op->poll() has an up-to-date view of past events which occured before we modified the interest mask. So this barrier also pairs with the barrier in wq_has_sleeper(). This should guarantee either ep_poll_callback or f_op->poll() (or both) will notice the readiness of a recently-ready/modified item. This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in: http://thread.gmane.org/gmane.linux.kernel/1408782/ Signed-off-by: Eric Wong <normalperson@yhbt.net> Cc: Hans Verkuil <hans.verkuil@cisco.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Davide Libenzi <davidel@xmailserver.org> Cc: Hans de Goede <hdegoede@redhat.com> Cc: Mauro Carvalho Chehab <mchehab@infradead.org> Cc: David Miller <davem@davemloft.net> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andreas Voellmy <andreas.voellmy@yale.edu> Tested-by: "Junchang(Jason) Wang" <junchang.wang@yale.edu> Cc: netdev@vger.kernel.org Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* | | | Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linuxLinus Torvalds2012-12-3046-397/+880
|\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Pull DRM update from Dave Airlie: "This is a bit larger due to me not bothering to do anything since before Xmas, and other people working too hard after I had clearly given up. It's got the 3 main x86 driver fixes pulls, and a bunch of tegra fixes, doesn't fix the Ironlake bug yet, but that does seem to be getting closer. - radeon: gpu reset fixes and userspace packet support - i915: watermark fixes, workarounds, i830/845 fix, - nouveau: nvd9/kepler microcode fixes, accel is now enabled and working, gk106 support - tegra: misc fixes." * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (34 commits) Revert "drm: tegra: protect DC register access with mutex" drm: tegra: program only one window during modeset drm: tegra: clean out old gem prototypes drm: tegra: remove redundant tegra2_tmds_config entry drm: tegra: protect DC register access with mutex drm: tegra: don't leave clients host1x member uninitialized drm: tegra: fix front_porch <-> back_porch mixup drm/nve0/graph: fix fuc, and enable acceleration on all known chipsets drm/nvc0/graph: fix fuc, and enable acceleration on GF119 drm/nouveau/bios: cache ramcfg strap on later chipsets drm/nouveau/mxm: silence output if no bios data drm/nouveau/bios: parse/display extra version component drm/nouveau/bios: implement opcode 0xa9 drm/nouveau/bios: update gpio parsing apis to match current design drm/nouveau: initial support for GK106 drm/radeon: add WAIT_UNTIL to evergreen VM safe reg list drm/i915: disable shrinker lock stealing for create_mmap_offset drm/i915: optionally disable shrinker lock stealing drm/i915: fix flags in dma buf exporting drm/radeon: add support for MEM_WRITE packet ...
| * | | | Revert "drm: tegra: protect DC register access with mutex"Dave Airlie2012-12-302-14/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This reverts commit 83c0bcb694be31dcd6c04bdd935b96a95a0af548. Lucas pointed out this was a mistake, and I missed the discussion, so just revert it out to save a rebase. Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | drm: tegra: program only one window during modesetLucas Stach2012-12-301-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The intention is to program exactly WIN_A, not WIN_A and possibly others. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | drm: tegra: clean out old gem prototypesLucas Stach2012-12-301-18/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There is no gem.c anymore, those functions are implemented by the drm_cma_helpers now. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | drm: tegra: remove redundant tegra2_tmds_config entryLucas Stach2012-12-301-16/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The 720p and 1080p entries are completely redundant, as we are matching the table entries against <=pclk. Also generalize the comment, as we are using those table entries even when driving other modes than the standard TV ones. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | drm: tegra: protect DC register access with mutexLucas Stach2012-12-302-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Window properties are programmed through a shared aperture and have to happen atomically. Also we do the read-update-write dance on some of the shared regs. To make sure that different functions don't stumble over each other protect the register access with a mutex. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | drm: tegra: don't leave clients host1x member uninitializedLucas Stach2012-12-301-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | No real problem for now, as nothing is using this, but leaving it unitialized is asking for trouble later on. Signed-off-by: Lucas Stach <dev@lynxeye.de> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | drm: tegra: fix front_porch <-> back_porch mixupLucas Stach2012-12-302-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fixes wrong picture offset observed when using HDMI output with a Technisat HD TV. Signed-off-by: Lucas Stach <dev@lynxeye.de> Acked-by: Mark Zhang <markz@nvidia.com> Tested-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Dave Airlie <airlied@redhat.com>
| * | | | Merge branch 'drm-intel-fixes' of ↵Dave Airlie2012-12-3014-88/+356
| |\ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://people.freedesktop.org/~danvet/drm-intel into drm-next Some fixes for 3.8: - Watermark fixups from Chris Wilson (4 pieces). - 2 snb workarounds, seem to be recently added to our internal DB. - workaround for the infamous i830/i845 hang, seems now finally solid! Based on Chris' fix for SNA, now also for UXA/mesa&old SNA. - Some more fixlets for shrinker-pulls-the-rug issues (Chris&me). - Fix dma-buf flags when exporting (you). - Disable the VGA plane if it's enabled on lid open - similar fix in spirit to the one I've sent you last weeek, BIOS' really like to mess with the display when closing the lid (awesome debug work from Krzysztof Mazur). * 'drm-intel-fixes' of git://people.freedesktop.org/~danvet/drm-intel: drm/i915: disable shrinker lock stealing for create_mmap_offset drm/i915: optionally disable shrinker lock stealing drm/i915: fix flags in dma buf exporting i915: ensure that VGA plane is disabled drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm manager drm: Export routines for inserting preallocated nodes into the mm manager drm/i915: don't disable disconnected outputs drm/i915: Implement workaround for broken CS tlb on i830/845 drm/i915: Implement WaSetupGtModeTdRowDispatch drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabled drm/i915: Prefer CRTC 'active' rather than 'enabled' during WM computations drm/i915: Clear self-refresh watermarks when disabled drm/i915: Double the cursor self-refresh latency on Valleyview drm/i915: Fixup cursor latency used for IVB lp3 watermarks
| | * | | | drm/i915: disable shrinker lock stealing for create_mmap_offsetDaniel Vetter2012-12-201-3/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The mmap offset structure is not part of the drm/i915 code, but provided by gem helpers. To avoid leaky abstractions (by either depending upon implementation details of said helper wrt to preallocations, or reimplementing it in our code and so fuzzing around in internal details of that helpr) simply disable the shrinker lock stealing accross calls into the helper functions. This should fix igt/gem_tiled_swapping. v2: Fix cleanup path confusion bemoaned by Chris Wilson. Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: optionally disable shrinker lock stealingDaniel Vetter2012-12-202-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | commit 5774506f157a91400c587b85d1ce4de56f0d32f6 Author: Chris Wilson <chris@chris-wilson.co.uk> Date: Wed Nov 21 13:04:04 2012 +0000 drm/i915: Borrow our struct_mutex for the direct reclaim added a nice trick to steal the struct_mutex lock in the shrinker if it's the current task holding it. But this also caused the requirement that every place which allocates memory needs to be careful about the gem state of objects, since the shrinker could have pulled the rug out from under it. We've usually solved this by carefully preallocating things or ensure that buffers are pinned already. But the shrinker also reaps mmap offset, so allocating those needs to be careful, too. Now that code has been factored out into some common helpers, so either we have fragile code depending upon the common helper not doing something we don't want it to do. Or we need to reimplement the mmap offset creation and so also leak implementation details into our code. Since this all results in leaky abstraction, cop out by disabling the lock borrowing trick while calling down into the helpers. That way our craziness is nicely confined to files in drm/i915. v2: Split out the change to create_mmap_offset as request by Chris Wilson. Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: fix flags in dma buf exportingDave Airlie2012-12-201-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | As pointed out by Seung-Woo Kim this should have been passing flags like nouveau/radeon have. Cc: stable@vger.kernel.org Signed-off-by: Dave Airlie <airlied@redhat.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | i915: ensure that VGA plane is disabledKrzysztof Mazur2012-12-191-0/+19
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Some broken systems (like HP nc6120) in some cases, usually after LID close/open, enable VGA plane, making display unusable (black screen on LVDS, some strange mode on VGA output). We used to disable VGA plane only once at startup. Now we also check, if VGA plane is still disabled while changing mode, and fix that if something changed it. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57434 Signed-off-by: Krzysztof Mazur <krzysiek@podlesie.net> Cc: stable@vger.kernel.org Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: Preallocate the drm_mm_node prior to manipulating the GTT drm_mm ↵Chris Wilson2012-12-181-37/+25
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | manager As we may reap neighbouring objects in order to free up pages for allocations, we need to be careful not to allocate in the middle of the drm_mm manager. To accomplish this, we can simply allocate the drm_mm_node up front and then use the combined search & insert drm_mm routines, reducing our code footprint in the process. Fixes (partially) i-g-t/gem_tiled_swapping Reported-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> [danvet: Again fixup atomic bikeshed.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm: Export routines for inserting preallocated nodes into the mm managerChris Wilson2012-12-182-16/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Required by i915 in order to avoid the allocation in the middle of manipulating the drm_mm lists. Use a pair of stubs to preserve the existing EXPORT_SYMBOLs for backporting; to be removed later. Cc: Dave Airlie <airlied@redhat.com> Cc: dri-devel@lists.freedesktop.org Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jani Nikula <jani.nikula@intel.com> [danvet: bikeshedded-away the atomic parameter, it's not yet used anywhere.] Acked-by: Dave Airlie <airlied@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: don't disable disconnected outputsDaniel Vetter2012-12-181-4/+0
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This piece of neat lore has been ported painstakingly and bug-for-bug compatible from the old crtc helper code. Imo it's utter nonsense. If you disconnected a cable and before you reconnect it, userspace (or the kernel) does an set_crtc call, this will result in that connector getting disabled. Which will result in a nice black screen when plugging in the cable again. There's absolutely no reason the kernel does such policy enforcements - if userspace tries to set up a mode on something disconnected we might fail loudly (since the dp link training fails), but silently adjusting the output configuration behind userspace's back is a recipe for disaster. Specifically I think that this could explain some of our MI_WAIT hangs around suspend, where userspace issues a scanline wait on a disable pipe. This mechanisims here could explain how that pipe got disabled without userspace noticing. Note that this fixes a NULL deref at BIOS takeover when the firmware sets up a disconnected output in a clone configuration with a connected output on the 2nd pipe: When doing the full modeset we don't have a mode for the 2nd pipe and OOPS. On the first pipe this doesn't matter, since at boot-up the fbdev helpers will set up the choosen configuration on that on first. Since this is now the umptenth bug around handling this imo brain-dead semantics correctly, I think it's time to kill it and see whether there's any userspace out there which relies on this. It also nicely demonstrates that we have a tiny window where DP hotplug can still kill the driver. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=58396 Cc: stable@vger.kernel.org Tested-by: Peter Ujfalusi <peter.ujfalusi@gmail.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: Implement workaround for broken CS tlb on i830/845Daniel Vetter2012-12-177-8/+100
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Now that Chris Wilson demonstrated that the key for stability on early gen 2 is to simple _never_ exchange the physical backing storage of batch buffers I've tried a stab at a kernel solution. Doesn't look too nefarious imho, now that I don't try to be too clever for my own good any more. v2: After discussing the various techniques, we've decided to always blit batches on the suspect devices, but allow userspace to opt out of the kernel workaround assume full responsibility for providing coherent batches. The principal reason is that avoiding the blit does improve performance in a few key microbenchmarks and also in cairo-trace replays. Signed-Off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> [danvet: - Drop the hunk which uses HAS_BROKEN_CS_TLB to implement the ring wrap w/a. Suggested by Chris Wilson. - Also add the ACTHD check from Chris Wilson for the error state dumping, so that we still catch batches when userspace opts out of the w/a.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: Implement WaSetupGtModeTdRowDispatchDaniel Vetter2012-12-173-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | I'm not really sure, since the w/a entry is as thin on details as ever, and Bspec doesn't say anything about it. But I've figured only dispatching to rows 0&1 instead of all four should be the right thing for GT1. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> [danvet: Add the missing snb server GT1 to the check, spotted by Chris Wilson.] Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: Implement WaDisableHiZPlanesWhenMSAAEnabledDaniel Vetter2012-12-172-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Quoting from Bspec, 3D_CHICKEN1, bit 10 This bit needs to be set always to "1", Project: DevSNB " Reviewed-by: Rodrigo Vivi <rodrigo.vivi@gmail.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: Prefer CRTC 'active' rather than 'enabled' during WM computationsChris Wilson2012-12-171-7/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Only the intel_crtc->active is accurate at the point where we wish to perform WM computations, so use it instead of crtc->enabled. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: Clear self-refresh watermarks when disabledChris Wilson2012-12-171-6/+8
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If we elect to disable self-refresh as they require too many FIFO entries, clear the values prior to writing them into the registers. If they are too large they may occupy more bits than available and so corrupt neighbouring WM values. Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: Double the cursor self-refresh latency on ValleyviewChris Wilson2012-12-171-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It operates at twice the declared latency, so double the latency value used for the cursor watermark calculation. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50248 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> CC: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
| | * | | | drm/i915: Fixup cursor latency used for IVB lp3 watermarksChris Wilson2012-12-171-5/+112
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | It operates at twice the declared latency, so adjust the computation to avoid potential flicker at low power. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=50248 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> CC: Jesse Barnes <jbarnes@virtuousgeek.org> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
OpenPOWER on IntegriCloud