path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* InstCombine: Combine select sequences into a single select (Matthias Braun, 2015-02-06; 1 file, -0/+18)

    Normalize
      select(C0, select(C1, a, b), b) -> select((C0 & C1), a, b)
      select(C0, a, select(C1, a, b)) -> select((C0 | C1), a, b)

    This normal form may enable further combines on the And/Or and shortens
    paths for the values. Many targets prefer the other form but can go
    back easily in CodeGen.

    Differential Revision: http://reviews.llvm.org/D7399
    llvm-svn: 228409
* LiveInterval: Fix SubRange memory leak (Matthias Braun, 2015-02-06; 1 file, -1/+16)

    llvm-svn: 228405
* AArch64PromoteConstant: Modernize and resolve some Use<->User confusion (Benjamin Kramer, 2015-02-06; 1 file, -87/+63)

    NFC.
    llvm-svn: 228399
* IRCE: Demote template to ArrayRef and SmallVector to array (Benjamin Kramer, 2015-02-06; 1 file, -26/+15)

    NFC.
    llvm-svn: 228398
* Whitespace (Chad Rosier, 2015-02-06; 1 file, -2/+0)

    llvm-svn: 228397
* [PBQP] Fix comment wording. NFC (Arnaud A. de Grandmaison, 2015-02-06; 1 file, -1/+1)

    llvm-svn: 228390
* R600/SI: Don't enable WQM for V_INTERP_* instructions v2 (Michel Danzer, 2015-02-06; 1 file, -6/+0)

    Doesn't seem necessary anymore. I think this was mostly compensating
    for not enabling WQM for texture sampling instructions.

    v2: Add test coverage

    Reviewed-by: Tom Stellard <tom@stellard.net>
    llvm-svn: 228373
* R600/SI: Also enable WQM for image opcodes which calculate LOD v3 (Michel Danzer, 2015-02-06; 6 files, -56/+79)

    If whole quad mode isn't enabled for these, the level of detail is
    calculated incorrectly for pixels along diagonal triangle edges,
    causing artifacts.

    v2: Use a TSFlag instead of lots of switch cases
    v3: Add test coverage

    Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88642
    Reviewed-by: Tom Stellard <tom@stellard.net>
    llvm-svn: 228372
* Introduce print-memderefs to test isDereferenceablePointer (Ramkumar Ramachandra, 2015-02-06; 3 files, -0/+63)

    Since testing the function indirectly is tricky, introduce a direct
    print-memderefs pass, in the same spirit as print-memdeps, which prints
    dereferenceability information matched by FileCheck.

    Differential Revision: http://reviews.llvm.org/D7075
    llvm-svn: 228369
* Small cleanup of MachineLICM.cpp (Daniel Jasper, 2015-02-05; 1 file, -15/+12)

    Specifically:
    - Calculate the loop pre-header once at the start of HoistOutOfLoop, so:
      - We don't DFS-walk the MachineDomTree if we aren't going to do anything
      - We don't call getCurPreheader for each Scope
    - Don't needlessly use a do-while loop
    - Use early exit for Scopes.size() == 0

    No functional changes intended.
    llvm-svn: 228350
* [Hexagon] Renaming v4 compare-and-jump instructions (Colin LeMahieu, 2015-02-05; 3 files, -46/+44)

    llvm-svn: 228349
* [Hexagon] Deleting unused patterns (Colin LeMahieu, 2015-02-05; 1 file, -188/+0)

    llvm-svn: 228348
* [Hexagon] Simplifying and formatting several patterns. Changing a pattern multiply to be expanded (Colin LeMahieu, 2015-02-05; 2 files, -154/+88)

    llvm-svn: 228347
* [Hexagon] Factoring a class out of some store patterns, deleting unused definitions and reformatting some patterns (Colin LeMahieu, 2015-02-05; 1 file, -89/+53)

    llvm-svn: 228345
* [Hexagon] Factoring out a class for immediate transfers and cleaning up formatting (Colin LeMahieu, 2015-02-05; 3 files, -61/+70)

    llvm-svn: 228343
* [ASan] Enable -asan-stack-dynamic-alloca by default (Alexey Samsonov, 2015-02-05; 1 file, -1/+1)

    By default, store all local variables in a dynamic alloca instead of a
    static one. This reduces stack space usage in use-after-return mode
    (the dynamic alloca will not be called if the local variables are
    stored in a fake stack), and improves debug info quality for local
    variables (they will not be described relative to %rbp/%rsp, which are
    assumed to be clobbered by function calls).

    llvm-svn: 228336
* Remove the use of getSubtarget in the creation of the X86 PassManager instance (Eric Christopher, 2015-02-05; 1 file, -6/+3)

    In one case we can make the determination from the Triple; in the other
    (the execution dependency pass), the pass will avoid running if we
    don't have any code that uses that register class, so go ahead and add
    it to the pipeline.

    llvm-svn: 228334
* Use cached subtargets inside X86FixupLEAs (Eric Christopher, 2015-02-05; 1 file, -3/+2)

    llvm-svn: 228333
* Migrate the X86 AsmPrinter away from using the subtarget when dealing with module level emission (Eric Christopher, 2015-02-05; 2 files, -14/+20)

    Currently this uses the Triple to make the determination, but
    eventually the logic should probably migrate to TLOF.

    llvm-svn: 228332
* Fix an incorrect identifier (Sylvestre Ledru, 2015-02-05; 1 file, -1/+2)

    Summary:
    EIEIO is not a correct declaration and breaks the build under Debian
    HURD. Instead, E_IEIO is used.

    From http://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html:
    "Some additional classes of identifier names are reserved for future
    extensions to the C language or the POSIX.1 environment. While using
    these names for your own purposes right now might not cause a problem,
    they do raise the possibility of conflict with future versions of the
    C or POSIX standards, so you should avoid these names.
    ...
    Names beginning with a capital 'E' followed by a digit or uppercase
    letter may be used for additional error code names. See Error
    Reporting."

    Reported here: https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=776965
    Patch written by Svante Signell.

    With this patch, LLVM, Clang & LLDB build under Debian HURD:
    https://buildd.debian.org/status/fetch.php?pkg=llvm-toolchain-3.6&arch=hurd-i386&ver=1%3A3.6~%2Brc2-2&stamp=1423040039

    Reviewers: hfinkel
    Reviewed By: hfinkel
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D7437
    llvm-svn: 228331
* [Hexagon] Renaming Y2_barrier. Fixing issues where doubleword variants of instructions can't be newvalue producers (Colin LeMahieu, 2015-02-05; 2 files, -15/+23)

    llvm-svn: 228330
* [PowerPC] Prepare loops for pre-increment loads/stores (Hal Finkel, 2015-02-05; 4 files, -0/+383)

    PowerPC supports pre-increment load/store instructions (except for
    Altivec/VSX vector loads/stores). Using these on embedded cores can be
    very important, but most loops are not naturally set up to use them. We
    can often change that, however, by placing loops into a non-canonical
    form. Generically, this means transforming loops like this:

      for (int i = 0; i < n; ++i)
        array[i] = c;

    to look like this:

      T *p = &array[-1];
      for (int i = 0; i < n; ++i)
        *++p = c;

    The key point is that the addresses accessed are pulled into dedicated
    PHIs and "pre-decremented" in the loop preheader. This allows the use
    of pre-increment load/store instructions without loop peeling.

    A target-specific late IR-level pass (running post-LSR),
    PPCLoopPreIncPrep, is introduced to perform this transformation. I've
    used this code out-of-tree for generating code for the PPC A2 for over
    a year. Somewhat to my surprise, running the test suite + externals on
    a P7 with this transformation enabled showed no performance
    regressions, and one speedup:

      External/SPEC/CINT2006/483.xalancbmk/483.xalancbmk  -2.32514% +/- 1.03736%

    So I'm going to enable it on everything for now. I was surprised by
    this because, on the POWER cores, these pre-increment load/store
    instructions are cracked (and, thus, harder to schedule effectively).
    But seeing no regressions, and feeling that it is generally easier to
    split instructions apart late than it is to combine them late, this
    might be the better approach regardless.

    In the future, we might want to integrate this functionality into LSR
    (but currently LSR does not create new PHI nodes, so (for that and
    other reasons) significant work would need to be done).

    llvm-svn: 228328
* [PowerPC] Generate pre-increment floating-point ld/st instructions (Hal Finkel, 2015-02-05; 1 file, -0/+4)

    PowerPC supports pre-increment floating-point load/store instructions,
    both r+r and r+i, and we had patterns for them, but they were not
    marked as legal. Mark them as legal (and add a test case).

    llvm-svn: 228327
* [Hexagon] Renaming A2_subri, A2_andir, A2_orir. Fixing formatting (Colin LeMahieu, 2015-02-05; 3 files, -39/+39)

    llvm-svn: 228326
* [CodeGen] Add hook/combine to form vector extloads, enabled on X86 (Ahmed Bougacha, 2015-02-05; 3 files, -12/+128)

    The combine that forms extloads used to be disabled on vector types,
    because "None of the supported targets knows how to perform load and
    sign extend on vectors in one instruction."

    That's not entirely true, since at least SSE4.1 X86 knows how to do
    those sextloads/zextloads (with PMOVS/ZX). But there are several
    aspects to getting this right.

    First, vector extloads are controlled by a profitability callback. For
    instance, on ARM, several instructions have folded extload forms, so
    it's not always beneficial to create an extload node (and trying to
    match extloads is a whole 'nother can of worms).

    The interesting optimization enables folding of s/zextloads to illegal
    (splittable) vector types, expanding them into smaller legal extloads.

    It's not ideal (it introduces some legalization-like behavior in the
    combine) but it's better than the obvious alternative: form illegal
    extloads, and later try to split them up. If you do that, you might
    generate extloads that can't be split up, but have a valid ext+load
    expansion. At vector-op legalization time, it's too late to generate
    this kind of code, so you end up forced to scalarize. It's better to
    just avoid creating egregiously illegal nodes.

    This optimization is enabled unconditionally on X86.

    Note that the splitting combine is happy with "custom" extloads. As
    is, this bypasses the actual custom lowering, and just unrolls the
    extload. But from what I've seen, this is still much better than the
    current custom lowering, which does some kind of unrolling at the end
    anyway (see for instance load_sext_4i8_to_4i64 on SSE2, and the added
    FIXME).

    Also note that the existing combine that forms extloads is now also
    enabled on legal vectors. This doesn't have a big effect on X86
    (because sext+load is usually combined to sext_inreg+aextload). On ARM
    it fires on some rare occasions; that's for a separate commit.

    Differential Revision: http://reviews.llvm.org/D6904
    llvm-svn: 228325
* X86 ABI fix for return values > 24 bytes (Andrew Trick, 2015-02-05; 1 file, -8/+9)

    The return value's address must be returned in %rax, i.e. the callee
    needs to copy the sret argument (%rdi) into the return value (%rax).
    This probably won't manifest as a bug when the caller is LLVM-compiled
    code. But it is an ABI guarantee and tools expect it.

    llvm-svn: 228321
* [Hexagon] Renaming A2_addi and formatting (Colin LeMahieu, 2015-02-05; 7 files, -37/+34)

    llvm-svn: 228318
* move fold comments to the corresponding fold; NFC (Sanjay Patel, 2015-02-05; 1 file, -3/+9)

    llvm-svn: 228317
* [Hexagon] Since decoding conflicts have been resolved, isCodeGenOnly = 0 by default and remove explicitly setting it (Colin LeMahieu, 2015-02-05; 6 files, -532/+207)

    llvm-svn: 228316
* LowerSwitch: Use ConstantInt for CaseRange::{Low,High} (Hans Wennborg, 2015-02-05; 1 file, -20/+20)

    Case values are always ConstantInt. This allows us to remove a bunch of
    casts. NFC.

    llvm-svn: 228312
* LowerSwitch: remove default args from CaseRange ctor; NFC (Hans Wennborg, 2015-02-05; 1 file, -3/+2)

    llvm-svn: 228311
* R600/SI: Fix bug in TTI loop unrolling preferences (Tom Stellard, 2015-02-05; 1 file, -1/+1)

    We should be setting UnrollingPreferences::MaxCount to MAX_UINT instead
    of UnrollingPreferences::Count.

    Count is a 'forced unrolling factor', while MaxCount sets an upper
    limit on the unrolling factor. Setting Count to MAX_UINT was causing
    the loop in the testcase to be unrolled 15 times, when it only had a
    maximum of 4 iterations.

    llvm-svn: 228303
* R600/SI: Fix bug from insertion of llvm.SI.end.cf into loop headers (Tom Stellard, 2015-02-05; 1 file, -3/+24)

    The llvm.SI.end.cf intrinsic is used to mark the end of if-then blocks,
    if-then-else blocks, and loops. It is responsible for updating the exec
    mask to re-enable threads that had been masked during the preceding
    control flow block. For example:

      s_mov_b64 exec, 0x3                 ; Initial exec mask
      s_mov_b64 s[0:1], exec              ; Saved exec mask
      v_cmpx_gt_u32 exec, s[2:3], v0, 0   ; llvm.SI.if
      do_stuff()
      s_or_b64 exec, exec, s[0:1]         ; llvm.SI.end.cf

    The bug fixed by this patch was one where the llvm.SI.end.cf intrinsic
    was being inserted into the header of loops. This would happen when an
    if block terminated in a loop header and we would end up with code
    like this:

      s_mov_b64 exec, 0x3                 ; Initial exec mask
      s_mov_b64 s[0:1], exec              ; Saved exec mask
      v_cmpx_gt_u32 exec, s[2:3], v0, 0   ; llvm.SI.if
      do_stuff()
      LOOP:                               ; Start of loop header
      s_or_b64 exec, exec, s[0:1]         ; llvm.SI.end.cf <- BUG: The exec mask has the
                                          ; same value at the beginning of each iteration.
      do_stuff();
      s_cbranch_execnz LOOP

    The fix is to create a new basic block before the loop and insert the
    llvm.SI.end.cf there. This way the exec mask is restored before the
    start of the loop instead of at the beginning of each iteration.

    llvm-svn: 228302
* [PowerPC] Implement the vclz instructions for PWR8 (Bill Schmidt, 2015-02-05; 2 files, -4/+22)

    Patch by Kit Barton.

    Add the vector count leading zeros instruction for byte, halfword,
    word, and doubleword sizes. This is a fairly straightforward addition
    after the changes made for vpopcnt:

    1. Add the correct definitions for the various instructions in
       PPCInstrAltivec.td
    2. Make the CTLZ operation legal on vector types when using P8Altivec
       in PPCISelLowering.cpp

    Test Plan:
    Created a new test case in test/CodeGen/PowerPC/vec_clz.ll to check
    that the instructions are generated when the CTLZ operation is used in
    LLVM. Check the encoding and decoding in
    test/MC/PowerPC/ppc_encoding_vmx.s and
    test/Disassembler/PowerPC/ppc_encoding_vmx.txt respectively.

    llvm-svn: 228301
* Add a FIXME (Rafael Espindola, 2015-02-05; 1 file, -0/+3)

    Thanks to Eric for the suggestion.
    llvm-svn: 228300
* Removing an unused variable warning I accidentally introduced with my last warning fix; NFC (Aaron Ballman, 2015-02-05; 1 file, -1/+1)

    llvm-svn: 228295
* Silencing an MSVC warning about a switch statement with no cases; NFC (Aaron Ballman, 2015-02-05; 1 file, -8/+5)

    llvm-svn: 228294
* [X86][MMX] Handle i32->mmx conversion using movd (Bruno Cardoso Lopes, 2015-02-05; 4 files, -0/+38)

    Implement a BITCAST dag combine to transform i32->mmx conversion
    patterns into an X86-specific node (MMX_MOVW2D) and guarantee that
    moves between i32 and x86mmx are better handled, i.e., don't use
    store-load to do the conversion.

    llvm-svn: 228293
* [X86][MMX] Move MMX DAG node to proper file (Bruno Cardoso Lopes, 2015-02-05; 2 files, -3/+8)

    llvm-svn: 228291
* Teach isDereferenceablePointer() to look through bitcast constant expressions (Michael Kuperstein, 2015-02-05; 1 file, -1/+1)

    This fixes a LICM regression due to the new load+store pair
    canonicalization.

    Differential Revision: http://reviews.llvm.org/D7411
    llvm-svn: 228284
* [X86] Add xrstors/xsavec/xsaves/clflushopt/clwb/pcommit instructions (Craig Topper, 2015-02-05; 3 files, -4/+27)

    llvm-svn: 228283
* [X86] Remove two feature flags that covered sets of instructions that have no patterns or intrinsics (Craig Topper, 2015-02-05; 6 files, -21/+4)

    Since we don't check feature flags in the assembler parser for any
    instruction sets, these flags don't provide any value. This frees up 2
    of the fully utilized feature flags.

    llvm-svn: 228282
* R600/SI: Fix i64 truncate to i1 (Matt Arsenault, 2015-02-05; 1 file, -0/+6)

    llvm-svn: 228273
* Disable enumeral mismatch warning when compiling llvm with gcc (Larisse Voufo, 2015-02-05; 1 file, -2/+3)

    Tested with gcc 4.9.2. Compiling with -Werror was producing:

      .../llvm/lib/Target/X86/X86ISelLowering.cpp: In function 'llvm::SDValue
      lowerVectorShuffleAsBitMask(llvm::SDLoc, llvm::MVT, llvm::SDValue,
      llvm::SDValue, llvm::ArrayRef<int>, llvm::SelectionDAG&)':
      .../llvm/lib/Target/X86/X86ISelLowering.cpp:7771:40: error: enumeral
      mismatch in conditional expression: 'llvm::X86ISD::NodeType' vs
      'llvm::ISD::NodeType' [-Werror=enum-compare]
         V = DAG.getNode(VT.isFloatingPoint() ? X86ISD::FAND : ISD::AND, DL, VT, V,
                                              ^

    llvm-svn: 228271
* Implement new heuristic for complete loop unrolling (Michael Zolotukhin, 2015-02-05; 1 file, -2/+332)

    Complete loop unrolling can make some loads constant, thus enabling a
    lot of other optimizations. To catch such cases, we look for loads
    that might become constants and estimate the number of instructions
    that would be simplified or become dead after substitution.

    Example:
    Suppose we have:

      int a[] = {0, 1, 0};
      v = 0;
      for (i = 0; i < 3; i ++)
        v += b[i]*a[i];

    If we completely unroll the loop, we would get:

      v = b[0]*a[0] + b[1]*a[1] + b[2]*a[2]

    Which then will be simplified to:

      v = b[0]* 0 + b[1]* 1 + b[2]* 0

    And finally:

      v = b[1]

    llvm-svn: 228265
* Value soft float calls as more expensive in the inliner (Cameron Esfahani, 2015-02-05; 5 files, -1/+46)

    Summary: When evaluating floating point instructions in the inliner,
    ask the TTI whether it is an expensive operation. By default, it's not
    an expensive operation. This keeps the default behavior the same as
    before. The ARM TTI has been updated to return back TCC_Expensive for
    targets which don't have hardware floating point.

    Reviewers: chandlerc, echristo
    Reviewed By: echristo
    Subscribers: t.p.northover, aemerson, llvm-commits
    Differential Revision: http://reviews.llvm.org/D6936
    llvm-svn: 228263
* Try to fix the build in MCValue.cpp (Reid Kleckner, 2015-02-05; 1 file, -1/+1)

    llvm-svn: 228256
* Fixup (Sean Silva, 2015-02-05; 1 file, -2/+2)

    Didn't see these calls in my release build locally when testing.
    llvm-svn: 228254
* [MC] Remove various unused MCAsmInfo parameters (Sean Silva, 2015-02-05; 4 files, -15/+10)

    llvm-svn: 228244
* IR: Rename 'operator ==()' to 'isKeyOf()', NFC (Duncan P. N. Exon Smith, 2015-02-05; 1 file, -4/+4)

    `isKeyOf()` is a clearer name than overloading `operator==()`.
    llvm-svn: 228242