path: root/llvm/lib
Commit message | Author | Age | Files | Lines
* Minor refactor to make applying patches from 'Add a "probe-stack" attribute' review thread out of order easier. | Philip Reames | 2014-08-21 | 1 | -1/+5
  llvm-svn: 216241
* Use DILexicalBlockFile, rather than DILexicalBlock, to track discriminator changes to ensure discriminator changes don't introduce new DWARF DW_TAG_lexical_blocks. | David Blaikie | 2014-08-21 | 4 | -15/+13
  Somewhat unnoticed in the original implementation of discriminators, but it could cause instructions to end up in new, small, DW_TAG_lexical_blocks due to the use of DILexicalBlock to track discriminator changes. Instead, use DILexicalBlockFile which we already use to track file changes without introducing new scopes, so it works well to track discriminator changes in the same way.
  llvm-svn: 216239
* name change: isPow2DivCheap -> isPow2SDivCheap | Sanjay Patel | 2014-08-21 | 4 | -4/+4
  isPow2DivCheap
  That name doesn't specify signed or unsigned.
  Lazy as I am, I eventually read the function and variable comments. It turns out that this is strictly about signed div. But I discovered that the comments are wrong:
    srl/add/sra
  is not the general sequence for signed integer division by power-of-2. We need one more 'sra':
    sra/srl/add/sra
  That's the sequence produced in DAGCombiner. The first 'sra' may be removed when dividing by exactly '2', but that's a special case.
  This patch corrects the comments, changes the name of the flag bit, and changes the name of the accessor methods.
  No functional change intended.
  Differential Revision: http://reviews.llvm.org/D5010
  llvm-svn: 216237
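The four-step sequence named in the message above can be written out in plain C++. This is a minimal sketch, not the DAGCombiner code: the function name `sdiv_pow2` and the restriction to 32-bit values with `1 <= k <= 30` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Signed division of x by 2^k, rounding toward zero, as sra/srl/add/sra.
// Assumes >> on a negative int32_t is an arithmetic shift (guaranteed since
// C++20, and the behaviour of mainstream compilers before that).
int32_t sdiv_pow2(int32_t x, unsigned k) {
    assert(k >= 1 && k <= 30);
    int32_t sign = x >> 31;                                   // sra: 0 or -1
    uint32_t bias = static_cast<uint32_t>(sign) >> (32 - k);  // srl: 0 or 2^k - 1
    int32_t adjusted = x + static_cast<int32_t>(bias);        // add: never overflows
    return adjusted >> k;                                     // sra
}
```

For `k == 1` the `srl` shifts by 31, so it only ever sees the sign bit of `x`; that is why the leading `sra` becomes redundant in the divide-by-2 special case.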
* [PeepholeOptimizer] Enable the advanced copy optimization by default. | Quentin Colombet | 2014-08-21 | 1 | -1/+1
  The advanced copy optimization does not yield any difference on the whole llvm test-suite + SPECs, either in compile time or runtime (binaries are identical), but has a big potential when data go back and forth between register files as demonstrated with test/CodeGen/ARM/adv-copy-opt.ll.
  Note: This was measured for both Os and O3 for armv7s, arm64, and x86_64.
  <rdar://problem/12702965>
  llvm-svn: 216236
* Whitespace change to reduce diff in future patch. | Philip Reames | 2014-08-21 | 1 | -6/+6
  Patch 2 of 11 in 'Add a "probe-stack" attribute' review thread
  Patch by: john.kare.alsaker@gmail.com
  llvm-svn: 216235
* [X86] Split out the logic to select the stack probe function (NFC) | Philip Reames | 2014-08-21 | 2 | -11/+25
  Patch 1 of 11 in 'Add a "probe-stack" attribute' review thread.
  Patch by: <john.kare.alsaker@gmail.com>
  llvm-svn: 216233
* Rename AtomicExpandLoadLinked into AtomicExpand | Robin Morisset | 2014-08-21 | 8 | -27/+27
  AtomicExpandLoadLinked is currently rather ARM-specific. This patch is the first of a group that aim at making it more target-independent. See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075873.html for details
  The command line option is "atomic-expand"
  llvm-svn: 216231
* [PeepholeOptimizer] Update the kill flags when extending the live-range of the source of a copy. | Quentin Colombet | 2014-08-21 | 1 | -1/+5
  <rdar://problem/12702965>
  llvm-svn: 216229
* Fix a URL (NFC) | Justin Bogner | 2014-08-21 | 1 | -1/+1
  llvm-svn: 216228
* [FastISel][AArch64] Use the correct register class to make the MI verifier happy. | Juergen Ributzka | 2014-08-21 | 1 | -135/+140
  This is mostly achieved by providing the correct register class manually, because getRegClassFor always returns the GPR*AllRegClass for MVT::i32 and MVT::i64.
  Also clean up the code to use the FastEmitInst_* method whenever possible. This makes sure that the operands' register class is properly constrained. For all the remaining cases this adds the missing constrainOperandRegClass calls for each operand.
  llvm-svn: 216225
* Explicitly pass ownership of the MemoryBuffer to AddNewSourceBuffer using std::unique_ptr | David Blaikie | 2014-08-21 | 7 | -25/+19
  llvm-svn: 216223
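The ownership hand-off described above follows a common C++ idiom: a sink function takes `std::unique_ptr` by value, so the transfer is visible at the call site. The sketch below uses hypothetical stand-ins (`Buffer`, `SourceRegistry`, `add`); it does not reproduce the actual AddNewSourceBuffer signature.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration; not the LLVM classes.
struct Buffer { std::string text; };

class SourceRegistry {
    std::vector<std::unique_ptr<Buffer>> buffers_;
public:
    // Taking the unique_ptr by value states that the registry now owns the buffer.
    std::size_t add(std::unique_ptr<Buffer> buf) {
        assert(buf && "expected a non-null buffer");
        buffers_.push_back(std::move(buf));
        return buffers_.size();  // 1-based id of the stored buffer
    }
    const Buffer &get(std::size_t id) const { return *buffers_.at(id - 1); }
};
```

A caller has to write `reg.add(std::move(buf))`, after which `buf` is null, so an accidental double free or use after transfer is much harder to write than with a raw pointer parameter.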
* R600/SI: Teach moveToVALU how to handle more S_LOAD_* instructions | Tom Stellard | 2014-08-21 | 2 | -9/+127
  llvm-svn: 216220
* R600/SI: Make sure SCRATCH_WAVE_OFFSET is added as Live-In to the function | Tom Stellard | 2014-08-21 | 2 | -9/+7
  This fixes a crash in an ocl conformance test.
  llvm-svn: 216219
* R600/SI: Remove unused SGPR spilling code | Tom Stellard | 2014-08-21 | 2 | -80/+0
  llvm-svn: 216218
* R600/SI: Use eliminateFrameIndex() to expand SGPR spill pseudos | Tom Stellard | 2014-08-21 | 5 | -112/+159
  This will simplify the SGPR spilling and also allow us to use MachineFrameInfo for calculating offsets, which should be more reliable than our custom code.
  This fixes a crash in some cases where a register would be spilled in a branch such that the VGPR defined for spilling did not dominate all the uses when restoring.
  This fixes a crash in an ocl conformance test. The test requires register spilling and is too big to include.
  llvm-svn: 216217
* R600/SI: Handle VCC in SIRegisterInfo::getPhysRegSubReg() | Tom Stellard | 2014-08-21 | 1 | -0/+11
  This fixes a crash in an ocl conformance test. The test requires register spilling and is too big to include.
  llvm-svn: 216216
* Move some logic to populateLTOPassManager. | Rafael Espindola | 2014-08-21 | 2 | -27/+47
  This will avoid code duplication in the next commit which calls it directly from the gold plugin.
  llvm-svn: 216211
* [AVX512] Add class to group common template arguments related to vector type | Adam Nemet | 2014-08-21 | 1 | -18/+66
  We discussed the issue of generality vs. readability of the AVX512 classes recently. I proposed this approach to try to hide and centralize the mappings we commonly perform based on the vector type. A new class X86VectorVTInfo captures these. The idea is to pass an instance of this class to classes/multiclasses instead of the corresponding ValueType. Then the class/multiclass can use its field for things that derive from the type rather than passing all those as separate arguments.
  I modified avx512_valign to demonstrate this new approach. As you can see instead of 7 related template parameters we now have one. The downside is that we have to refer to fields for the derived values. I named the argument '_' in order to make this as invisible as possible. Please let me know if you absolutely hate this. (Also once we allow local initializations in multiclasses we can recover the original version by assigning the fields to local variables.)
  Another possible use-case for this class is to directly map things, e.g.:
    RegisterClass KRC = X86VectorVTInfo<32, i16>.KRC
  llvm-svn: 216209
* Coverage Mapping: add function's hash to coverage function records. | Alex Lorenz | 2014-08-21 | 1 | -2/+4
  The profile data format was recently updated and the new indexing api requires the code coverage tool to know the function's hash as well as the function's name to get the execution counts for a function.
  Differential Revision: http://reviews.llvm.org/D4994
  llvm-svn: 216207
* Respect LibraryInfo in populateLTOPassManager and use it. NFC. | Rafael Espindola | 2014-08-21 | 2 | -3/+6
  llvm-svn: 216203
* Remove dead code. NFC. | Rafael Espindola | 2014-08-21 | 1 | -8/+0
  llvm-svn: 216201
* [AArch64] Run a peephole pass right after AdvSIMD pass. | Quentin Colombet | 2014-08-21 | 1 | -1/+5
  The AdvSIMD pass may produce copies that are not coalescer-friendly. The peephole optimizer knows how to fix that as demonstrated in the test case.
  <rdar://problem/12702965>
  llvm-svn: 216200
* [FastISel][AArch64] Factor out ANDWri instruction generation into a helper function. NFCI. | Juergen Ributzka | 2014-08-21 | 1 | -42/+50
  llvm-svn: 216199
* Thumb1 load/store optimizer: Improve code to materialize new base register. | Moritz Roth | 2014-08-21 | 1 | -5/+13
  There are two add-immediate instructions in Thumb1: tADDi8 and tADDi3. Only the latter supports using different source and destination registers, so whenever we materialize a new base register (at a certain offset) we'd do so by moving the base register value to the new register and then adding in place. This patch changes the code to use a single tADDi3 if the offset is small enough to fit in 3 bits.
  Differential Revision: http://reviews.llvm.org/D5006
  llvm-svn: 216193
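The choice described above amounts to a range check on the offset. The following is a rough sketch under stated assumptions: `materializeBase` is a hypothetical helper, the register names are placeholders, and the strings are only mnemonic-like labels rather than real machine instructions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative only: returns readable labels for the instructions that would
// be emitted. The sketch covers offsets that fit an 8-bit immediate.
std::vector<std::string> materializeBase(unsigned offset) {
    assert(offset <= 255 && "sketch only covers offsets that fit tADDi8");
    if (offset <= 7)  // fits the 3-bit immediate: one three-operand add
        return {"tADDi3 newBase, base, #" + std::to_string(offset)};
    // Otherwise copy the base first, then add in place.
    return {"mov newBase, base",
            "tADDi8 newBase, #" + std::to_string(offset)};
}
```

With this shape, `materializeBase(4)` yields a single instruction, while `materializeBase(8)` falls back to the two-instruction move-then-add form.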
* Add a thread-model knob for lowering atomics on baremetal & single threaded systems | Jonathan Roelofs | 2014-08-21 | 2 | -3/+10
  http://reviews.llvm.org/D4984
  llvm-svn: 216182
* Handle inlining in populateLTOPassManager like in populateModulePassManager. | Rafael Espindola | 2014-08-21 | 2 | -6/+16
  No functionality change.
  llvm-svn: 216178
* [CLNUP] Remove return after llvm_unreachable. Thanks to Hal Finkel for pointing this out. | Zinovy Nis | 2014-08-21 | 1 | -1/+0
  llvm-svn: 216176
* DAGCombiner: Make concat_vector combine safe for EVTs and concat_vectors with many arguments. | Benjamin Kramer | 2014-08-21 | 1 | -1/+6
  PR20677
  llvm-svn: 216175
* Move DisableGVNLoadPRE from populateLTOPassManager to PassManagerBuilder. | Rafael Espindola | 2014-08-21 | 2 | -9/+11
  llvm-svn: 216174
* X86AsmPrinter MCJIT MSVC bug fix. | Josh Klontz | 2014-08-21 | 1 | -6/+7
  Summary: This bug was introduced in r213006 which makes an assumption that MCSection is COFF for Windows MSVC. This assumption is broken for MCJIT users where ELF is used instead [1]. The fix is to change the MCSection cast to a dyn_cast.
  [1] http://lists.cs.uiuc.edu/pipermail/llvmdev/2013-December/068407.html.
  Reviewers: majnemer
  Reviewed By: majnemer
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D4872
  llvm-svn: 216173
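The difference between an unconditional downcast and a checked one can be sketched with standard C++. Here `dynamic_cast` stands in for LLVM's `dyn_cast`, which uses its own RTTI scheme. The class names and the `comdatNameOrEmpty` helper are hypothetical and do not mirror the MC layer.

```cpp
#include <cassert>
#include <string>

// Hypothetical section hierarchy; stand-ins for illustration only.
struct Section { virtual ~Section() = default; };
struct CoffSection : Section { std::string comdat = "coff-comdat"; };
struct ElfSection : Section {};

// A checked downcast yields nullptr on a type mismatch, so the COFF-only
// handling is skipped instead of assuming every section is COFF.
std::string comdatNameOrEmpty(const Section &s) {
    if (const auto *coff = dynamic_cast<const CoffSection *>(&s))
        return coff->comdat;
    return "";
}
```

An unconditional cast in the same position would treat an ELF section as COFF, which is the kind of broken assumption the message describes for MCJIT users.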
* [ARM] Enable DP copy, load and store instructions for FPv4-SP | Oliver Stannard | 2014-08-21 | 7 | -53/+206
  The FPv4-SP floating-point unit is generally referred to as single-precision only, but it does have double-precision registers and load, store and GPR<->DPR move instructions which operate on them. This patch enables the use of these registers, the main advantage of which is that we now comply with the AAPCS-VFP calling convention. This partially reverts r209650, which added some AAPCS-VFP support, but did not handle return values or alignment of double arguments in registers.
  This patch also adds tests for Thumb2 code generation for floating-point instructions and intrinsics, which previously only existed for ARM.
  llvm-svn: 216172
* Reassociate x + -0.1234 * y into x - 0.1234 * y | Erik Verbruggen | 2014-08-21 | 2 | -40/+49
  This does not require -ffast-math, and it gives CSE/GVN more options to eliminate duplicate expressions in, e.g.:
    return ((x + 0.1234 * y) * (x - 0.1234 * y));
  Differential Revision: http://reviews.llvm.org/D4904
  llvm-svn: 216169
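Why this rewrite is safe without -ffast-math can be sketched at the source level. Under the default round-to-nearest mode, negating a constant and multiplying gives exactly the negation of the product, and adding a negated value is the same operation as subtracting it. The function names below are illustrative assumptions, not code from the patch.

```cpp
#include <cassert>

// The two spellings of the same value.
double viaNegatedConstant(double x, double y) { return x + -0.1234 * y; }
double viaSubtraction(double x, double y) { return x - 0.1234 * y; }

// Once both factors are written in terms of 0.1234 * y, that product is a
// common subexpression and only needs to be computed once.
double product(double x, double y) {
    const double t = 0.1234 * y;
    return (x + t) * (x - t);
}
```

The `product` function shows by hand what CSE/GVN can do once the canonical form exposes the shared `0.1234 * y` term.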
* X86: Turn redundant if into an assertion. | Benjamin Kramer | 2014-08-21 | 1 | -7/+5
  While there, remove noop casts.
  llvm-svn: 216168
* [x86] Added _addcarry_ and _subborrow_ intrinsics | Robert Khasanov | 2014-08-21 | 1 | -1/+9
  llvm-svn: 216164
* [x86] SMAP: added HasSMAP attribute for CLAC/STAC, corrected attributes | Robert Khasanov | 2014-08-21 | 1 | -1/+1
  llvm-svn: 216163
* [x86] Broadwell: ADOX/ADCX. Added _addcarryx_u{32|64} intrinsics to LLVM. | Robert Khasanov | 2014-08-21 | 2 | -21/+52
  llvm-svn: 216162
* [x86] Enable Broadwell target. | Robert Khasanov | 2014-08-21 | 4 | -0/+15
  Added FeatureSMAP.
  Broadwell ISA includes Haswell ISA + ADX + RDSEED + SMAP
  llvm-svn: 216161
* [INDVARS] Extend widening of induction variables to the cases of "sub nsw" and "mul nsw" instructions. | Zinovy Nis | 2014-08-21 | 1 | -4/+23
  Currently only "add nsw" instructions are widened. This patch eliminates tons of "sext" instructions for 64 bit code (and the corresponding target code) in cases like:
    int N = 100;
    float **A;
    void foo(int x0, int x1)
    {
      float * A_cur = &A[0][0];
      float * A_next = &A[1][0];
      for(int x = x0; x < x1; ++x)
      {
        // Currently only [x+N] case is widened. The other 2 cases lead to sext.
        // This patch fixes it, so all 3 cases do not need sext.
        const float div = A_cur[x + N] + A_cur[x - N] + A_cur[x * N];
        A_next[x] = div;
      }
    }
    ...
    > clang++ test.cpp -march=core-avx2 -Ofast -fno-unroll-loops -fno-tree-vectorize -S -o -
  Differential Revision: http://reviews.llvm.org/D4695
  llvm-svn: 216160
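The reasoning behind the widening can be sketched in C++: when the 32-bit operation cannot wrap, sign-extending its result equals doing the operation on already extended operands, so the per-iteration extension is unnecessary. The helper names are assumptions for illustration. Signed overflow is undefined in C++, which plays the role of the "nsw" flag here.

```cpp
#include <cassert>
#include <cstdint>

// Narrow form: 32-bit arithmetic, then a sign extension of the result.
int64_t subNarrow(int32_t x, int32_t n) { return static_cast<int64_t>(x - n); }
int64_t mulNarrow(int32_t x, int32_t n) { return static_cast<int64_t>(x * n); }

// Widened form: operands are extended once, the arithmetic is 64-bit.
int64_t subWide(int32_t x, int32_t n) { return static_cast<int64_t>(x) - n; }
int64_t mulWide(int32_t x, int32_t n) { return static_cast<int64_t>(x) * n; }

// True when both forms agree for every x in [x0, x1). The caller must pick a
// range in which the 32-bit operations do not overflow.
bool formsAgree(int32_t x0, int32_t x1, int32_t n) {
    assert(x0 <= x1);
    for (int32_t x = x0; x < x1; ++x)
        if (subNarrow(x, n) != subWide(x, n) || mulNarrow(x, n) != mulWide(x, n))
            return false;
    return true;
}
```

If the narrow operation were allowed to wrap, the two forms could differ, which is why the transformation depends on the no-signed-wrap guarantee.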
* IntelJITEventListener updates to fix breaks by recent changes to EngineBuilder and DIContext. | Elena Demikhovsky | 2014-08-21 | 1 | -1/+1
  By Arch Robison.
  llvm-svn: 216159
* Replace SmallPtrSet with SmallPtrSetImpl in function arguments to avoid needing to mention the size. | Craig Topper | 2014-08-21 | 41 | -105/+103
  llvm-svn: 216158
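The design behind this change is a size-erased base class: the inline capacity is a template parameter of the derived class only, so functions can take the base by reference. The miniature below is a hypothetical sketch of that pattern with `int` elements and no heap fallback. `TinySetImpl`, `TinySet` and `collectEvens` are invented names, not the LLVM implementation.

```cpp
#include <cassert>
#include <cstddef>

// Size-independent interface; storage is supplied by the derived class.
class TinySetImpl {
    int *slots_;
    std::size_t capacity_;
    std::size_t size_ = 0;
protected:
    TinySetImpl(int *slots, std::size_t capacity)
        : slots_(slots), capacity_(capacity) {}
public:
    TinySetImpl(const TinySetImpl &) = delete;  // copying would alias storage
    bool contains(int v) const {
        for (std::size_t i = 0; i < size_; ++i)
            if (slots_[i] == v) return true;
        return false;
    }
    // Returns true if v was newly inserted.
    bool insert(int v) {
        if (contains(v)) return false;
        assert(size_ < capacity_ && "sketch has no heap fallback");
        slots_[size_++] = v;
        return true;
    }
    std::size_t size() const { return size_; }
};

// Only the derived class knows the inline capacity N.
template <std::size_t N>
class TinySet : public TinySetImpl {
    int storage_[N];
public:
    TinySet() : TinySetImpl(storage_, N) {}
};

// The parameter type does not mention N, so any TinySet<N> can be passed.
void collectEvens(int limit, TinySetImpl &out) {
    for (int i = 0; i < limit; i += 2) out.insert(i);
}
```

Callers can then change a set's inline size without touching the signature of every function that receives it.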
* InstCombine: Fold ((A | B) & C1) ^ (B & C2) -> (A & C1) ^ B if C1^C2=-1 | David Majnemer | 2014-08-21 | 2 | -0/+46
  Adapted from a patch by Richard Smith, test-case written by me.
  llvm-svn: 216157
* Remove custom implementations of max/min in StringRef that were originally added to work around an old gcc bug. | Craig Topper | 2014-08-21 | 1 | -9/+9
  I believe it's been fixed by now.
  llvm-svn: 216156
* Fix a bug around truncating vector in const prop. | Jiangning Liu | 2014-08-21 | 1 | -0/+3
  In constant folding stage, "TRUNC" can't handle vector data type.
  llvm-svn: 216149
* Revert r216066, "Optimize ZERO_EXTEND and SIGN_EXTEND in both SelectionDAG Builder and type". | Jiangning Liu | 2014-08-21 | 2 | -48/+3
  llvm-svn: 216147
* [PeepholeOptimizer] Take advantage of the isInsertSubreg property in the advanced copy optimization. | Quentin Colombet | 2014-08-21 | 1 | -32/+15
  This is the final step patch toward transforming:
    udiv r0, r0, r2
    udiv r1, r1, r3
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
    bx lr
  into:
    udiv r0, r0, r2
    udiv r1, r1, r3
    bx lr
  Indeed, thanks to this patch, this optimization is able to look through
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
  and is able to rewrite the following sequence:
    vmov.32 d16[0], r0
    vmov.32 d16[1], r1
    vmov r0, r1, d16
  into simple generic GPR copies that the coalescer managed to remove.
  <rdar://problem/12702965>
  llvm-svn: 216144
* [ARM] Mark VSETLNi32 with the InsertSubreg property and implement the related target hook. | Quentin Colombet | 2014-08-21 | 3 | -0/+43
  This patch teaches the compiler that:
    dX = VSETLNi32 dY, rZ, imm
  is the same as:
    dX = INSERT_SUBREG dY, rZ, translateImmToSubIdx(imm)
  <rdar://problem/12702965>
  llvm-svn: 216143
* [LoopVectorize] Up the maximum unroll factor to 4 for AArch64 | James Molloy | 2014-08-21 | 1 | -1/+7
  Only for Cortex-A57 and Cyclone for now, where it has shown wins.
  llvm-svn: 216141
* [LoopVectorizer] Limit unroll factor in the presence of nested reductions. | James Molloy | 2014-08-20 | 1 | -0/+17
  If we have a scalar reduction, we can increase the critical path length if the loop we're unrolling is inside another loop. Limit, by default to 2, so the critical path only gets increased by one reduction operation.
  llvm-svn: 216140
* Add isInsertSubreg property. | Quentin Colombet | 2014-08-20 | 1 | -0/+26
  This patch adds a new property: isInsertSubreg and the related target hooks: TargetInstrInfo::getInsertSubregInputs and TargetInstrInfo::getInsertSubregLikeInputs to specify that a target specific instruction is a (kind of) INSERT_SUBREG.
  The approach is similar to r215394.
  <rdar://problem/12702965>
  llvm-svn: 216139
* Lower thumbv4t & thumbv5 lo->lo copies through a push-pop sequence | Jonathan Roelofs | 2014-08-20 | 1 | -2/+23
  On pre-v6 hardware, 'MOV lo, lo' gives undefined results, so such copies need to be avoided. This patch trades simplicity for implementation time at the expense of performance... As they say: correctness first, then performance.
  See http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-August/075998.html for a few ideas on how to make this better.
  llvm-svn: 216138