path: root/llvm/lib/Target
Commit message · Author · Age · Files · Lines
* ARM & AArch64: teach LowerVSETCC that output type size may differ from input. (Tim Northover, 2015-02-08; 2 files, -18/+27)
  While various DAG combines try to guarantee that a vector SETCC operation will have the same output size as input, there's nothing intrinsic to either creation or LegalizeTypes that actually guarantees it, so the function needs to be ready to handle a mismatch. Fortunately this is easy enough: just extend or truncate the naturally compared result.
  I couldn't reproduce the failure in other backends that I know have SIMD, so it's probably only an issue for these two due to shared heritage.
  Should fix PR21645.
  llvm-svn: 228518
* [X86] Add register use/def for wrmsr and rdmsr. (Craig Topper, 2015-02-07; 1 file, -0/+2)
  llvm-svn: 228515
* [X86] Add GETSEC instruction. (Craig Topper, 2015-02-07; 1 file, -0/+6)
  llvm-svn: 228514
* [X86][AVX] Added missing stack folding support + test for vptest ymm instruction (Simon Pilgrim, 2015-02-07; 1 file, -0/+1)
  llvm-svn: 228509
* Fix typos; NFC. (Andrea Di Biagio, 2015-02-07; 1 file, -4/+4)
  llvm-svn: 228493
* [PowerPC] Handle loop predecessor invokes (Hal Finkel, 2015-02-07; 1 file, -4/+12)
  If a loop predecessor has an invoke as its terminator, and the return value from that invoke is used to determine the loop iteration space, then we can't insert a computation based on that value in the loop predecessor prior to the terminator (oops). If there's such an invoke, or just no predecessor for that matter, insert a new loop preheader.
  llvm-svn: 228488
* [AArch64] Use the source location of the IR branch when creating Bcc (Ahmed Bougacha, 2015-02-06; 1 file, -2/+2)
  ...from a conditional branch fed by an add/sub/mul-with-overflow node. We previously used the SDLoc of the overflow node, for no good reason. In some cases, this led to the Bcc and B terminators having different source orders, and DBG_VALUEs being inserted between them.
  The real issue is with the code that can't handle DBG_VALUEs between terminators: the few places affected by this will be fixed soon. In the meantime, fixing the SDLoc is a positive change no matter what.
  No tests, as I have no idea how to get .loc emitted for branches?
  rdar://19347133
  llvm-svn: 228463
* Revert "r227976 - [PowerPC] Yet another approach to __tls_get_addr" and related fixups (Hal Finkel, 2015-02-06; 11 files, -234/+108)
  Unfortunately, even with the workaround of disabling the linker TLS optimizations in Clang restored (which has already been done), this still breaks self-hosting on my P7 machine (-O3 -DNDEBUG -mcpu=native). Bill is currently working on an alternate implementation to address the TLS issue in a way that also fully elides the linker bug (which, unfortunately, this approach did not fully), so I'm reverting this now.
  llvm-svn: 228460
* use local variables; NFC (Sanjay Patel, 2015-02-06; 1 file, -3/+2)
  llvm-svn: 228452
* Test commit to see if it triggers an email to llvm-commits. No change. (Cameron Esfahani, 2015-02-06; 1 file, -0/+1)
  llvm-svn: 228442
* Don't dllexport declarations (Reid Kleckner, 2015-02-06; 1 file, -2/+2)
  Fixes PR22488
  llvm-svn: 228411
* Make helper functions/classes/globals static. NFC. (Benjamin Kramer, 2015-02-06; 5 files, -13/+18)
  llvm-svn: 228410
* AArch64PromoteConstant: Modernize and resolve some Use<->User confusion. NFC. (Benjamin Kramer, 2015-02-06; 1 file, -87/+63)
  llvm-svn: 228399
* R600/SI: Don't enable WQM for V_INTERP_* instructions v2 (Michel Danzer, 2015-02-06; 1 file, -6/+0)
  Doesn't seem necessary anymore. I think this was mostly compensating for not enabling WQM for texture sampling instructions.
  v2: Add test coverage
  Reviewed-by: Tom Stellard <tom@stellard.net>
  llvm-svn: 228373
* R600/SI: Also enable WQM for image opcodes which calculate LOD v3 (Michel Danzer, 2015-02-06; 6 files, -56/+79)
  If whole quad mode isn't enabled for these, the level of detail is calculated incorrectly for pixels along diagonal triangle edges, causing artifacts.
  v2: Use a TSFlag instead of lots of switch cases
  v3: Add test coverage
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88642
  Reviewed-by: Tom Stellard <tom@stellard.net>
  llvm-svn: 228372
* [Hexagon] Renaming v4 compare-and-jump instructions. (Colin LeMahieu, 2015-02-05; 3 files, -46/+44)
  llvm-svn: 228349
* [Hexagon] Deleting unused patterns. (Colin LeMahieu, 2015-02-05; 1 file, -188/+0)
  llvm-svn: 228348
* [Hexagon] Simplifying and formatting several patterns. Changing a pattern multiply to be expanded. (Colin LeMahieu, 2015-02-05; 2 files, -154/+88)
  llvm-svn: 228347
* [Hexagon] Factoring a class out of some store patterns, deleting unused definitions and reformatting some patterns. (Colin LeMahieu, 2015-02-05; 1 file, -89/+53)
  llvm-svn: 228345
* [Hexagon] Factoring out a class for immediate transfers and cleaning up formatting. (Colin LeMahieu, 2015-02-05; 3 files, -61/+70)
  llvm-svn: 228343
* Remove the use of getSubtarget in the creation of the X86 PassManager instance. (Eric Christopher, 2015-02-05; 1 file, -6/+3)
  In one case we can make the determination from the Triple; in the other (the execution dependency pass), the pass will avoid running if we don't have any code that uses that register class, so go ahead and add it to the pipeline.
  llvm-svn: 228334
* Use cached subtargets inside X86FixupLEAs. (Eric Christopher, 2015-02-05; 1 file, -3/+2)
  llvm-svn: 228333
* Migrate the X86 AsmPrinter away from using the subtarget when dealing with module level emission. (Eric Christopher, 2015-02-05; 2 files, -14/+20)
  Currently this uses the Triple to make the determination, but eventually the logic should probably migrate to TLOF.
  llvm-svn: 228332
* Fix an incorrect identifier (Sylvestre Ledru, 2015-02-05; 1 file, -1/+2)
  Summary: EIEIO is not a correct declaration and breaks the build under Debian HURD. Instead, E_IEIO is used.
  From http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html:
  "Some additional classes of identifier names are reserved for future extensions to the C language or the POSIX.1 environment. While using these names for your own purposes right now might not cause a problem, they do raise the possibility of conflict with future versions of the C or POSIX standards, so you should avoid these names. ... Names beginning with a capital 'E' followed by a digit or uppercase letter may be used for additional error code names. See Error Reporting."
  Reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776965
  Patch written by Svante Signell.
  With this patch, LLVM, Clang & LLDB build under Debian HURD: https://buildd.debian.org/status/fetch.php?pkg=llvm-toolchain-3.6&arch=hurd-i386&ver=1%3A3.6~%2Brc2-2&stamp=1423040039
  Reviewers: hfinkel
  Reviewed By: hfinkel
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D7437
  llvm-svn: 228331
* [Hexagon] Renaming Y2_barrier. Fixing issues where doubleword variants of instructions can't be newvalue producers. (Colin LeMahieu, 2015-02-05; 2 files, -15/+23)
  llvm-svn: 228330
* [PowerPC] Prepare loops for pre-increment loads/stores (Hal Finkel, 2015-02-05; 4 files, -0/+383)
  PowerPC supports pre-increment load/store instructions (except for Altivec/VSX vector load/stores). Using these on embedded cores can be very important, but most loops are not naturally set up to use them. We can often change that, however, by placing loops into a non-canonical form. Generically, this means transforming loops like this:
    for (int i = 0; i < n; ++i)
      array[i] = c;
  to look like this:
    T *p = &array[-1];
    for (int i = 0; i < n; ++i)
      *++p = c;
  The key point is that the addresses accessed are pulled into dedicated PHIs and "pre-decremented" in the loop preheader. This allows the use of pre-increment load/store instructions without loop peeling.
  A target-specific late IR-level pass (running post-LSR), PPCLoopPreIncPrep, is introduced to perform this transformation. I've used this code out-of-tree for generating code for the PPC A2 for over a year. Somewhat to my surprise, running the test suite + externals on a P7 with this transformation enabled showed no performance regressions, and one speedup:
    External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk -2.32514% +/- 1.03736%
  So I'm going to enable it on everything for now. I was surprised by this because, on the POWER cores, these pre-increment load/store instructions are cracked (and, thus, harder to schedule effectively). But seeing no regressions, and feeling that it is generally easier to split instructions apart late than it is to combine them late, this might be the better approach regardless.
  In the future, we might want to integrate this functionality into LSR (but currently LSR does not create new PHI nodes, so (for that and other reasons) significant work would need to be done).
  llvm-svn: 228328
* [PowerPC] Generate pre-increment floating-point ld/st instructions (Hal Finkel, 2015-02-05; 1 file, -0/+4)
  PowerPC supports pre-increment floating-point load/store instructions, both r+r and r+i, and we had patterns for them, but they were not marked as legal. Mark them as legal (and add a test case).
  llvm-svn: 228327
* [Hexagon] Renaming A2_subri, A2_andir, A2_orir. Fixing formatting. (Colin LeMahieu, 2015-02-05; 3 files, -39/+39)
  llvm-svn: 228326
* [CodeGen] Add hook/combine to form vector extloads, enabled on X86. (Ahmed Bougacha, 2015-02-05; 2 files, -0/+7)
  The combine that forms extloads used to be disabled on vector types, because "None of the supported targets knows how to perform load and sign extend on vectors in one instruction." That's not entirely true, since at least SSE4.1 X86 knows how to do those sextloads/zextloads (with PMOVS/ZX). But there are several aspects to getting this right.
  First, vector extloads are controlled by a profitability callback. For instance, on ARM, several instructions have folded extload forms, so it's not always beneficial to create an extload node (and trying to match extloads is a whole 'nother can of worms).
  The interesting optimization enables folding of s/zextloads to illegal (splittable) vector types, expanding them into smaller legal extloads. It's not ideal (it introduces some legalization-like behavior in the combine) but it's better than the obvious alternative: form illegal extloads, and later try to split them up. If you do that, you might generate extloads that can't be split up, but have a valid ext+load expansion. At vector-op legalization time, it's too late to generate this kind of code, so you end up forced to scalarize. It's better to just avoid creating egregiously illegal nodes.
  This optimization is enabled unconditionally on X86.
  Note that the splitting combine is happy with "custom" extloads. As is, this bypasses the actual custom lowering, and just unrolls the extload. But from what I've seen, this is still much better than the current custom lowering, which does some kind of unrolling at the end anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added FIXME).
  Also note that the existing combine that forms extloads is now also enabled on legal vectors. This doesn't have a big effect on X86 (because sext+load is usually combined to sext_inreg+aextload). On ARM it fires on some rare occasions; that's for a separate commit.
  Differential Revision: http://reviews.llvm.org/D6904
  llvm-svn: 228325
* X86 ABI fix for return values > 24 bytes. (Andrew Trick, 2015-02-05; 1 file, -8/+9)
  The return value's address must be returned in %rax, i.e. the callee needs to copy the sret argument (%rdi) into the return value (%rax). This probably won't manifest as a bug when the caller is LLVM-compiled code, but it is an ABI guarantee and tools expect it.
  llvm-svn: 228321
* [Hexagon] Renaming A2_addi and formatting. (Colin LeMahieu, 2015-02-05; 7 files, -37/+34)
  llvm-svn: 228318
* move fold comments to the corresponding fold; NFC (Sanjay Patel, 2015-02-05; 1 file, -3/+9)
  llvm-svn: 228317
* [Hexagon] Since decoding conflicts have been resolved, isCodeGenOnly = 0 by default; remove explicit settings of it. (Colin LeMahieu, 2015-02-05; 6 files, -532/+207)
  llvm-svn: 228316
* R600/SI: Fix bug in TTI loop unrolling preferences (Tom Stellard, 2015-02-05; 1 file, -1/+1)
  We should be setting UnrollingPreferences::MaxCount to MAX_UINT instead of UnrollingPreferences::Count. Count is a 'forced unrolling factor', while MaxCount sets an upper limit to the unrolling factor. Setting Count to MAX_UINT was causing the loop in the testcase to be unrolled 15 times, when it only had a maximum of 4 iterations.
  llvm-svn: 228303
* R600/SI: Fix bug from insertion of llvm.SI.end.cf into loop headers (Tom Stellard, 2015-02-05; 1 file, -3/+24)
  The llvm.SI.end.cf intrinsic is used to mark the end of if-then blocks, if-then-else blocks, and loops. It is responsible for updating the exec mask to re-enable threads that had been masked during the preceding control flow block. For example:
    s_mov_b64 exec, 0x3                  ; Initial exec mask
    s_mov_b64 s[0:1], exec               ; Saved exec mask
    v_cmpx_gt_u32 exec, s[2:3], v0, 0    ; llvm.SI.if
    do_stuff()
    s_or_b64 exec, exec, s[0:1]          ; llvm.SI.end.cf
  The bug fixed by this patch was one where the llvm.SI.end.cf intrinsic was being inserted into the header of loops. This would happen when an if block terminated in a loop header, and we would end up with code like this:
    s_mov_b64 exec, 0x3                  ; Initial exec mask
    s_mov_b64 s[0:1], exec               ; Saved exec mask
    v_cmpx_gt_u32 exec, s[2:3], v0, 0    ; llvm.SI.if
    do_stuff()
    LOOP:                                ; Start of loop header
    s_or_b64 exec, exec, s[0:1]          ; llvm.SI.end.cf <- BUG: the exec mask has the same value at the beginning of each loop iteration
    do_stuff();
    s_cbranch_execnz LOOP
  The fix is to create a new basic block before the loop and insert the llvm.SI.end.cf there. This way the exec mask is restored before the start of the loop instead of at the beginning of each iteration.
  llvm-svn: 228302
* [PowerPC] Implement the vclz instructions for PWR8 (Bill Schmidt, 2015-02-05; 2 files, -4/+22)
  Patch by Kit Barton.
  Add the vector count leading zeros instruction for byte, halfword, word, and doubleword sizes. This is a fairly straightforward addition after the changes made for vpopcnt:
  1. Add the correct definitions for the various instructions in PPCInstrAltivec.td
  2. Make the CTLZ operation legal on vector types when using P8Altivec in PPCISelLowering.cpp
  Test Plan: Created new test case in test/CodeGen/PowerPC/vec_clz.ll to check the instructions are being generated when the CTLZ operation is used in LLVM. Check the encoding and decoding in test/MC/PowerPC/ppc_encoding_vmx.s and test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.
  llvm-svn: 228301
* [X86][MMX] Handle i32->mmx conversion using movd (Bruno Cardoso Lopes, 2015-02-05; 4 files, -0/+38)
  Implement a BITCAST dag combine to transform i32->mmx conversion patterns into an X86-specific node (MMX_MOVW2D) and guarantee that moves between i32 and x86mmx are better handled, i.e., don't use store-load to do the conversion.
  llvm-svn: 228293
* [X86][MMX] Move MMX DAG node to proper file (Bruno Cardoso Lopes, 2015-02-05; 2 files, -3/+8)
  llvm-svn: 228291
* [X86] Add xrstors/xsavec/xsaves/clflushopt/clwb/pcommit instructions (Craig Topper, 2015-02-05; 3 files, -4/+27)
  llvm-svn: 228283
* [X86] Remove two feature flags that covered sets of instructions that have no patterns or intrinsics. (Craig Topper, 2015-02-05; 6 files, -21/+4)
  Since we don't check feature flags in the assembler parser for any instruction sets, these flags don't provide any value. This frees up 2 of the fully utilized feature flags.
  llvm-svn: 228282
* R600/SI: Fix i64 truncate to i1 (Matt Arsenault, 2015-02-05; 1 file, -0/+6)
  llvm-svn: 228273
* Disable enumeral mismatch warning when compiling llvm with gcc. (Larisse Voufo, 2015-02-05; 1 file, -2/+3)
  Tested with gcc 4.9.2. Compiling with -Werror was producing:
    .../llvm/lib/Target/X86/X86ISelLowering.cpp: In function 'llvm::SDValue lowerVectorShuffleAsBitMask(llvm::SDLoc, llvm::MVT, llvm::SDValue, llvm::SDValue, llvm::ArrayRef<int>, llvm::SelectionDAG&)':
    .../llvm/lib/Target/X86/X86ISelLowering.cpp:7771:40: error: enumeral mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs 'llvm::ISD::NodeType' [-Werror=enum-compare]
       V = DAG.getNode(VT.isFloatingPoint() ? X86ISD::FAND : ISD::AND, DL, VT, V,
  llvm-svn: 228271
* Value soft float calls as more expensive in the inliner. (Cameron Esfahani, 2015-02-05; 3 files, -1/+23)
  Summary: When evaluating floating point instructions in the inliner, ask the TTI whether it is an expensive operation. By default, it's not an expensive operation; this keeps the default behavior the same as before. The ARM TTI has been updated to return TCC_Expensive for targets which don't have hardware floating point.
  Reviewers: chandlerc, echristo
  Reviewed By: echristo
  Subscribers: t.p.northover, aemerson, llvm-commits
  Differential Revision: http://reviews.llvm.org/D6936
  llvm-svn: 228263
* [Hexagon] Deleting unused instructions and adding isCodeGenOnly to some defs. (Colin LeMahieu, 2015-02-05; 3 files, -34/+8)
  llvm-svn: 228238
* [Hexagon] Updating load extend to i64 patterns. (Colin LeMahieu, 2015-02-04; 1 file, -85/+30)
  llvm-svn: 228237
* [Hexagon] Cleaning up i1 load and extension patterns. (Colin LeMahieu, 2015-02-04; 1 file, -24/+11)
  llvm-svn: 228232
* [Hexagon] Simplifying more load and store patterns and using new addressing patterns. (Colin LeMahieu, 2015-02-04; 1 file, -72/+41)
  llvm-svn: 228231
* R600/SI: Enable subreg liveness by default (Tom Stellard, 2015-02-04; 1 file, -0/+4)
  llvm-svn: 228228
* [Hexagon] Simplifying some load and store patterns. (Colin LeMahieu, 2015-02-04; 1 file, -68/+35)
  llvm-svn: 228227
* [Hexagon] Converting absolute-address load patterns to use AddrGP. (Colin LeMahieu, 2015-02-04; 1 file, -48/+13)
  llvm-svn: 228225