path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* Cleaned up test. NFCI. | Simon Pilgrim | 2015-08-12 | 1 | -9/+9
    llvm-svn: 244765
* Redo "Make global aliases have symbol size equal to their type" | John Brawn | 2015-08-12 | 4 | -0/+28
    r242520 was reverted in r244313 as the expected behaviour of the alias attribute in C is that the alias has the same size as the aliasee. However, we can re-introduce adding the size on the alias when the aliasee does not, from a source code or object perspective, exist as a discrete entity. This happens when the aliasee is not a symbol, or when that symbol is private.
    Differential Revision: http://reviews.llvm.org/D11943
    llvm-svn: 244752
* [GlobalMerge] Only emit aliases for internal linkage variables for non-Mach-O | John Brawn | 2015-08-12 | 2 | -0/+6
    On Mach-O, emitting aliases for the variables that make up a MergedGlobals variable can cause problems when linking with dead stripping enabled, so don't do that, except for external variables where we must emit an alias.
    llvm-svn: 244748
* [X86] Disable mul -> shl + lea combine when compiling for minsize | Michael Kuperstein | 2015-08-12 | 1 | -0/+18
    Differential Revision: http://reviews.llvm.org/D11904
    llvm-svn: 244740
* [X86] Allow x86 call frame optimization to fold more loads into pushes | Michael Kuperstein | 2015-08-12 | 1 | -0/+23
    This abstracts away the test for "when can we fold across a MachineInstruction" into the MI interface, and changes call-frame optimization to use the same test the peephole optimizer uses.
    Differential Revision: http://reviews.llvm.org/D11945
    llvm-svn: 244729
* AMDGPU: Fix assert on dbg_value instructions | Matt Arsenault | 2015-08-12 | 1 | -0/+37
    llvm-svn: 244728
* [InstCombine] Move SSE/AVX vector blend folding to instcombiner | Simon Pilgrim | 2015-08-12 | 3 | -179/+0
    As discussed in D11886, this patch moves the SSE/AVX vector blend folding to instcombiner from PerformINTRINSIC_WO_CHAINCombine (which allows us to remove this completely).
    InstCombiner already had partial support for this; I just had to add support for zero (ConstantAggregateZero) masks and also the case where both selection inputs were the same (allowing us to ignore the mask).
    I also moved all the relevant combine tests into InstCombine/blend_x86.ll
    Differential Revision: http://reviews.llvm.org/D11934
    llvm-svn: 244723
* [x86] enable machine combiner reassociations for 256-bit vector FP mul/add | Sanjay Patel | 2015-08-12 | 2 | -2/+62
    llvm-svn: 244705
* Use 32-bit divides instead of 64-bit divides where possible. | Mark Heffernan | 2015-08-11 | 1 | -0/+80
    For NVPTX, try to use 32-bit division instead of 64-bit division when the dividend and divisor fit in 32 bits. This speeds up some internal benchmarks significantly. The underlying reason is that many index computations are carried out in 64 bits but never actually exceed the capacity of a 32-bit word.
    llvm-svn: 244684
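The idea behind this change can be illustrated with a small sketch. It assumes unsigned operands and an explicit check that neither operand has bits set above bit 31; `udiv64_with_bypass` and the `div32`/`div64` labels are hypothetical names for illustration only, not LLVM APIs, and the real transformation happens inside the backend.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def udiv64_with_bypass(dividend, divisor):
    """Divide two unsigned 64-bit values, taking a 32-bit path when both fit."""
    dividend &= MASK64
    divisor &= MASK64
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    # If no bit above bit 31 is set in either operand, both fit in 32 bits
    # and the cheaper narrow division gives the same quotient.
    if (dividend | divisor) >> 32 == 0:
        return "div32", (dividend & MASK32) // (divisor & MASK32)
    return "div64", dividend // divisor
```

Index computations that are carried out in 64 bits but stay small would take the `div32` path in this sketch.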
* WebAssembly: implement comparison. | JF Bastien | 2015-08-11 | 4 | -0/+312
    Some of the FP comparisons (ueq, one, ult, ule, ugt, uge) are currently broken; I'll fix them in a follow-up.
    Reviewers: sunfish
    Subscribers: llvm-commits, jfb
    Differential Revision: http://reviews.llvm.org/D11924
    llvm-svn: 244665
* [x86] enable machine combiner reassociations for 128-bit vector single/double multiplies | Sanjay Patel | 2015-08-11 | 2 | -2/+46
    llvm-svn: 244657
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-11 | 1 | -0/+15
    Also, add a test for optsize because this was not part of any existing regression test.
    llvm-svn: 244651
* SelectionDAG: Prefer to combine multiplication with less uses for fma | Jingyue Wu | 2015-08-11 | 1 | -0/+13
    Summary: For example:
        s6 = s0*s5;
        s2 = s6*s6 + s6;
        ...
        s4 = s6*s3;
    We notice that it is possible for s2 to be folded to fma (s0, s5, fmul (s6, s6)). This only happens when Aggressive is true; otherwise the hasOneUse() check already prevents folding the multiplication with more uses.
    Test Plan: test/CodeGen/NVPTX/fma-assoc.ll
    Patch by Xuetian Weng
    Reviewers: hfinkel, apazos, jingyue, ohsallen, arsenm
    Subscribers: arsenm, jholewinski, llvm-commits
    Differential Revision: http://reviews.llvm.org/D11855
    llvm-svn: 244649
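To see why the choice of multiplication matters, the sketch below evaluates both candidate foldings of s2 from the example in this commit message, using exact integer arithmetic. The `fma` helper and the two function names are hypothetical stand-ins, and the operation counts are a simplification of what the DAG combiner actually weighs.

```python
def fma(a, b, c):
    """Stand-in for a fused multiply-add: a * b + c."""
    return a * b + c


def fold_multi_use_mul(s0, s5):
    # Folds s6 = s0*s5 into the fma even though s6 has other users,
    # so s6 is still computed separately and s6*s6 needs its own multiply.
    s6 = s0 * s5
    return fma(s0, s5, s6 * s6), {"mul": 2, "fma": 1}


def fold_single_use_mul(s0, s5):
    # Folds s6*s6, which is used only by s2, into the fma.
    s6 = s0 * s5
    return fma(s6, s6, s6), {"mul": 1, "fma": 1}
```

Both forms compute the same value in exact arithmetic; in this simplified count, folding the multiplication with fewer uses leaves one fewer standalone multiply.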
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-11 | 1 | -13/+4
    llvm-svn: 244631
* add missing tests for powi expansion with size optimizations | Sanjay Patel | 2015-08-11 | 1 | -0/+27
    The minsize test will be fixed in the next commit.
    llvm-svn: 244630
* fixed to use FileCheck | Sanjay Patel | 2015-08-11 | 1 | -5/+15
    llvm-svn: 244627
* fixed to test attribute, rather than CPU | Sanjay Patel | 2015-08-11 | 1 | -1/+1
    llvm-svn: 244625
* [GlobalMerge] Use private linkage for MergedGlobals variables | John Brawn | 2015-08-11 | 10 | -66/+57
    Other objects can never reference the MergedGlobals symbol so external linkage is never needed. Using private instead of internal linkage means the object is more similar to what it looks like when global merging is not enabled, with the only difference being that the merged variables are addressed indirectly relative to the start of the section they are in.
    Also add aliases for merged variables with internal linkage, as this also makes the object more like what it is when they are not merged.
    Differential Revision: http://reviews.llvm.org/D11942
    llvm-svn: 244615
* delete FIXME comment; it's fixed | Sanjay Patel | 2015-08-11 | 1 | -2/+0
    llvm-svn: 244605
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-11 | 1 | -4/+5
    llvm-svn: 244604
* add missing test for machine combiner when optimizing for size | Sanjay Patel | 2015-08-11 | 1 | -0/+30
    The minsize test will be fixed in the next commit.
    llvm-svn: 244603
* [X86] Allow merging of immediates within a basic block for code size savings | Michael Kuperstein | 2015-08-11 | 2 | -85/+82
    First step in preventing immediates that occur more than once within a single basic block from being pulled into their users, in order to prevent unnecessarily large instruction encodings. Currently enabled only when optimizing for size.
    Patch by: zia.ansari@intel.com
    Differential Revision: http://reviews.llvm.org/D11363
    llvm-svn: 244601
* [AArch64] Match fminnum/fmaxnum for vector fminnm/fmaxnm instead of an intrinsic. | James Molloy | 2015-08-11 | 2 | -11/+20
    Lower Intrinsic::aarch64_neon_fmin/fmax to fminnum/fmaxnum and match that instead.
    Minimal functional change:
      - Extra tests added because coverage of scalar fminnm/fmaxnm instructions was nonexistent.
      - f16 test updated because now that we actually generate scalar fminnm/fmaxnm, we no longer need to bail out to a libcall!
    llvm-svn: 244595
* [X86] When optimizing for minsize, use POP for small post-call stack clean-up | Michael Kuperstein | 2015-08-11 | 2 | -2/+63
    When optimizing for size, replace "addl $4, %esp" and "addl $8, %esp" following a call by one or two pops, respectively. We don't try to do it in general, but only when the stack adjustment immediately follows a call, which is the most common case. That allows taking a short-cut when trying to find a free register to pop into, instead of a full-blown liveness check. If the adjustment immediately follows a call, then every register the call clobbers but doesn't define should be dead at that point, and can be used.
    Differential Revision: http://reviews.llvm.org/D11749
    llvm-svn: 244578
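A rough sketch of the size trade-off follows. The byte counts are the standard 32-bit x86 encodings (`addl $imm8, %esp` is 3 bytes, `popl %reg` is 1 byte). The function name and the fixed choice of `%ecx` as the scratch register are assumptions for illustration; the patch itself picks a register the call clobbers but does not define.

```python
ADD_ESP_IMM8_BYTES = 3  # opcode 83 /0 with ModRM C4 and an 8-bit immediate
POP_REG_BYTES = 1       # opcode 58+r


def post_call_cleanup(adjust, minsize, scratch="%ecx"):
    """Return (instructions, encoded size in bytes) for a post-call stack adjustment."""
    if not 0 < adjust <= 127:
        # This sketch only models adjustments that fit a signed 8-bit immediate.
        raise ValueError("adjust must be in 1..127")
    if minsize and adjust in (4, 8):
        pops = adjust // 4
        return [f"popl {scratch}"] * pops, pops * POP_REG_BYTES
    return [f"addl ${adjust}, %esp"], ADD_ESP_IMM8_BYTES
```

Under these assumptions a 4-byte adjustment shrinks from 3 bytes to 1 and an 8-byte adjustment from 3 bytes to 2; larger adjustments keep the `addl` form.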
* Allow PeepholeOptimizer to fold a few more cases | Michael Kuperstein | 2015-08-11 | 2 | -13/+10
    The condition for clearing the folding candidate list was clamped together with the "uninteresting instruction" condition. This is too conservative, e.g. we don't need to clear the list when encountering an IMPLICIT_DEF.
    Differential Revision: http://reviews.llvm.org/D11591
    llvm-svn: 244577
* WebAssembly: add basic floating-point tests | JF Bastien | 2015-08-11 | 2 | -0/+138
    Summary: I somehow forgot to add these when I added the basic floating-point opcodes. Also remove ceil/floor/trunc/nearestint for now, and add them only when properly tested.
    Subscribers: llvm-commits, sunfish, jfb
    Differential Revision: http://reviews.llvm.org/D11927
    llvm-svn: 244562
* WebAssembly: simply assert on SNaN and NaNs with payloads | JF Bastien | 2015-08-11 | 1 | -2/+2
    Summary: convertToHexString doesn't represent them correctly at this point in time. This is a follow-up to sunfish's suggestion in D11914.
    Subscribers: llvm-commits, sunfish, jfb
    Differential Revision: http://reviews.llvm.org/D11925
    llvm-svn: 244551
* MIR Serialization: Serialize UsedPhysRegMask from the machine register info. | Alex Lorenz | 2015-08-11 | 1 | -0/+113
    This commit serializes the UsedPhysRegMask register mask from the machine register information class. The mask is serialized as an inverted 'calleeSavedRegisters' mask to keep the output minimal.
    This commit also allows the MIR parser to infer this mask from the register mask operands if the machine function doesn't specify it.
    Reviewers: Duncan P. N. Exon Smith
    llvm-svn: 244548
* MIR Parser: Report an error when a stack object is redefined. | Alex Lorenz | 2015-08-10 | 1 | -0/+38
    llvm-svn: 244536
* MIR Parser: Report an error when a fixed stack object is redefined. | Alex Lorenz | 2015-08-10 | 1 | -0/+30
    llvm-svn: 244534
* MIR Serialization: Serialize the liveout register mask machine operands. | Alex Lorenz | 2015-08-10 | 1 | -0/+43
    llvm-svn: 244529
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-10 | 1 | -0/+38
    llvm-svn: 244528
* WebAssembly: print immediates | JF Bastien | 2015-08-10 | 1 | -0/+174
    Summary: For now output using C99's hexadecimal floating-point representation.
    This patch also cleans up how machine operands are printed: instead of special-casing per type of machine instruction, the code now handles operands generically.
    Reviewers: sunfish
    Subscribers: llvm-commits, jfb
    Differential Revision: http://reviews.llvm.org/D11914
    llvm-svn: 244520
* MachineVerifier: Handle the optional def operand in a PATCHPOINT instruction. | Alex Lorenz | 2015-08-10 | 1 | -0/+43
    The PATCHPOINT instructions have a single optional defined register operand, but the machine verifier can't verify the optional defined register operands. This commit makes sure that the machine verifier won't report an error when a PATCHPOINT instruction doesn't have its optional defined register operand. This change will allow us to enable the machine verifier for the code generation tests for the patchpoint intrinsics.
    Reviewers: Juergen Ributzka
    llvm-svn: 244513
* StackMap: FastISel: Add an appropriate number of immediate operands to the frame setup instruction. | Alex Lorenz | 2015-08-10 | 3 | -0/+60
    This commit ensures that the stack map lowering code in FastISel adds an appropriate number of immediate operands to the frame setup instruction.
    The previous code added just one immediate operand, which was fine for a target like AArch64, but on X86 the ADJCALLSTACKDOWN64 instruction needs two explicit operands. This caused the machine verifier to report an error when the old code added just one.
    Reviewers: Juergen Ributzka
    Differential Revision: http://reviews.llvm.org/D11853
    llvm-svn: 244508
* x86: Emit LAHF/SAHF instead of PUSHF/POPF | JF Bastien | 2015-08-10 | 1 | -28/+96
    NaCl's sandbox doesn't allow PUSHF/POPF out of security concerns (privileged emulators have forgotten to mask system bits in the past, and EFLAGS's DF bit is a constant source of hilarity). Commit r220529 fixed PR20376 by saving cmpxchg's flags result using EFLAGS; this commit now generates LAHF/SAHF instead, for all of x86 (not just NaCl) because it leads to an overall performance gain over PUSHF/POPF.
    As with the previous patch this code generation is pretty bad because it occurs very late, after register allocation, and in many cases it rematerializes flags which were already available (e.g. already in a register through SETE). Fortunately it's somewhat rare that this code needs to fire.
    I did a bit of benchmarking (https://github.com/jfbastien/benchmark-x86-flags); the results on an Intel Haswell E5-2690 CPU at 2.9GHz are:
    | Time per call (ms) | Runtime (ms) | Benchmark              |
    | 0.000012514        | 6257         | sete.i386              |
    | 0.000012810        | 6405         | sete.i386-fast         |
    | 0.000010456        | 5228         | sete.x86-64            |
    | 0.000010496        | 5248         | sete.x86-64-fast       |
    | 0.000012906        | 6453         | lahf-sahf.i386         |
    | 0.000013236        | 6618         | lahf-sahf.i386-fast    |
    | 0.000010580        | 5290         | lahf-sahf.x86-64       |
    | 0.000010304        | 5152         | lahf-sahf.x86-64-fast  |
    | 0.000028056        | 14028        | pushf-popf.i386        |
    | 0.000027160        | 13580        | pushf-popf.i386-fast   |
    | 0.000023810        | 11905        | pushf-popf.x86-64      |
    | 0.000026468        | 13234        | pushf-popf.x86-64-fast |
    Clearly `PUSHF`/`POPF` are suboptimal. It doesn't really seem to be worth teaching LLVM about individual flags, at least not for this purpose.
    Reviewers: rnk, jvoung, t.p.northover
    Subscribers: llvm-commits
    Differential revision: http://reviews.llvm.org/D6629
    llvm-svn: 244503
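The benchmark figures in this commit message can be cross-checked with a little arithmetic. The sketch below copies six of the rows (the non-"fast" variants) and derives the implied number of calls and the relative slowdown; the dictionary layout and helper names are illustrative only and not part of the benchmark itself.

```python
# (time per call in ms, total runtime in ms), copied from the commit message
RESULTS = {
    "sete.i386": (0.000012514, 6257),
    "lahf-sahf.i386": (0.000012906, 6453),
    "pushf-popf.i386": (0.000028056, 14028),
    "sete.x86-64": (0.000010456, 5228),
    "lahf-sahf.x86-64": (0.000010580, 5290),
    "pushf-popf.x86-64": (0.000023810, 11905),
}


def implied_calls(name):
    """Number of calls implied by runtime divided by time per call."""
    per_call_ms, runtime_ms = RESULTS[name]
    return round(runtime_ms / per_call_ms)


def slowdown(slow, fast):
    """Ratio of total runtimes between two benchmarks."""
    return RESULTS[slow][1] / RESULTS[fast][1]
```

Every row copied here implies the same call count, and `slowdown("pushf-popf.i386", "lahf-sahf.i386")` comes out a little above 2, which matches the message's conclusion that `PUSHF`/`POPF` are suboptimal.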
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-10 | 1 | -3/+2
    llvm-svn: 244499
* [x86, SSE] add missing tests for load folding with partial register update | Sanjay Patel | 2015-08-10 | 1 | -0/+34
    The minsize case is wrong; that will be fixed in the next commit.
    llvm-svn: 244498
* [InstCombine] Move SSE2/AVX2 arithmetic vector shift folding to instcombiner | Simon Pilgrim | 2015-08-10 | 2 | -98/+0
    As discussed in D11760, this patch moves the (V)PSRA(WD) arithmetic shift-by-constant folding to InstCombine to match the logical shift implementations.
    Differential Revision: http://reviews.llvm.org/D11886
    llvm-svn: 244495
* Fix a few more cases of 'CHECK[^:]*$'. NFCI | Jonathan Roelofs | 2015-08-10 | 1 | -2/+2
    llvm-svn: 244491
* [Sparc] Implement i64 load/store support for 32-bit sparc. | James Y Knight | 2015-08-10 | 4 | -7/+258
    The LDD/STD instructions can load/store a 64bit quantity from/to memory to/from a consecutive even/odd pair of (32-bit) registers. They are part of SparcV8, and also present in SparcV9. (Although deprecated there, as you can store 64bits in one register).
    As recommended on llvmdev in the thread "How to enable use of 64bit load/store for 32bit architecture" from Apr 2015, I've modeled the 64-bit load/store operations as working on a v2i32 type, rather than making i64 a legal type, but with few legal operations. The latter does not (currently) work, as there is much code in llvm which assumes that if i64 is legal, operations like "add" will actually work on it.
    The same assumption does not hold for v2i32: for vector types, it is workable to support only load/store, and expand everything else.
    This patch:
      - Adds a new register class, IntPair, for even/odd pairs of registers.
      - Modifies the list of reserved registers, the stack spilling code, and register copying code to support the IntPair register class.
      - Adds support in AsmParser. (Note that in asm text, you write the name of the first register of the pair only, so the parser has to morph the single register into the equivalent paired register.)
      - Adds the new instructions themselves (LDD/STD/LDDA/STDA).
      - Hooks up the instructions and registers as a vector type v2i32. Adds custom legalizer to transform i64 load/stores into v2i32 load/stores and bitcasts, so that the new instructions can actually be generated, and marks all operations other than load/store on v2i32 as needing to be expanded.
      - Copies the unfortunate SelectInlineAsm hack from ARMISelDAGToDAG. This hack undoes the transformation of i64 operands into two arbitrarily-allocated separate i32 registers in SelectionDAGBuilder, and instead passes them in a single IntPair. (Arbitrarily allocated registers are not useful; asm code expects to be receiving a pair, which can be passed to ldd/std.)
    Also adds a bunch of test cases covering all the bugs I've added along the way.
    Differential Revision: http://reviews.llvm.org/D8713
    llvm-svn: 244484
* Fix a bunch of trivial cases of 'CHECK[^:]*$' in the tests. NFCI | Jonathan Roelofs | 2015-08-10 | 13 | -27/+27
    I looked into adding a warning / error for this to FileCheck, but there doesn't seem to be a good way to avoid it triggering on the instances of it in RUN lines.
    llvm-svn: 244481
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-10 | 1 | -6/+6
    llvm-svn: 244464
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-10 | 1 | -2/+2
    llvm-svn: 244463
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-10 | 1 | -1/+1
    llvm-svn: 244460
* fix minsize detection: minsize attribute implies optimizing for size | Sanjay Patel | 2015-08-10 | 1 | -6/+7
    llvm-svn: 244458
* Trace copies when checking for rematerializability in spill weight calculation | Robert Lougher | 2015-08-10 | 1 | -0/+148
    PR24139 contains an analysis of poor register allocation. One of the findings was that when calculating the spill weight, a rematerializable interval once split is no longer rematerializable. This is because the isRematerializable check in CalcSpillWeights.cpp does not follow the copies introduced by live range splitting (after splitting, the live interval register definition is a copy which is not rematerializable).
    Reviewers: qcolombet
    Differential Revision: http://reviews.llvm.org/D11686
    llvm-svn: 244439
* [x86] enable machine combiner reassociations for 128-bit vector single/double adds | Sanjay Patel | 2015-08-08 | 1 | -0/+44
    llvm-svn: 244403
* add a missing regression test for a DAGCombiner FDIV optimization | Sanjay Patel | 2015-08-07 | 1 | -0/+15
    There's no test for this transform in any backend. Discovered while debugging fast-math-flag propagation in the DAG (r244053).
    llvm-svn: 244373
* AMDGPU: Add pass to lower OpenCL image and sampler arguments. | Tom Stellard | 2015-08-07 | 3 | -0/+680
    The pass adds new kernel arguments for image attributes, and resolves calls to dummy attribute and resource id getter functions.
    Patch by: Zoltan Gilian
    llvm-svn: 244372