path: root/llvm/test/CodeGen
Commit message (Author, Date, Files, Lines changed)
* R600/SI: Remove SI_BUFFER_RSRC pseudo (Matt Arsenault, 2014-10-17, 1 file, -1/+1)
  Just use REG_SEQUENCE directly, so there are fewer instructions to deal with later.
  llvm-svn: 220056
* [Stackmaps] Enable invoking the patchpoint intrinsic. (Juergen Ributzka, 2014-10-17, 1 file, -0/+65)
  Patch by Kevin Modzelewski
  Reviewers: atrick, ributzka
  Reviewed By: ributzka
  Subscribers: llvm-commits, reames
  Differential Revision: http://reviews.llvm.org/D5634
  llvm-svn: 220055
* [X86] Fix missed selection of non-temporal store of zero vector. (Andrea Di Biagio, 2014-10-17, 1 file, -0/+31)
  When the input to a store instruction was a zero vector, the backend always selected a normal vector store regardless of the non-temporal hint. This patch fixes that; it also fixes PR19370.
  llvm-svn: 220054
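  A minimal C++ reproduction of the pattern this commit targets, using SSE intrinsics; with the fix, the stores below should honor the hint and select movntps rather than a plain vector store (function name is illustrative):
```
#include <xmmintrin.h>

// Streams an all-zero vector; before the fix the zero operand caused the
// backend to ignore the non-temporal hint and emit a regular vector store.
// dst is assumed 16-byte aligned, as _mm_stream_ps requires.
void clear_stream(float *dst, int n) {
  __m128 zero = _mm_setzero_ps();   // zero vector store operand
  for (int i = 0; i + 4 <= n; i += 4)
    _mm_stream_ps(dst + i, zero);   // non-temporal store (movntps)
}
```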
* [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering. (James Molloy, 2014-10-17, 1 file, -0/+22)
  We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same. While we're at it, avoid writing to std::vector::end()...
  Bug found with random testing and a lot of coffee.
  llvm-svn: 220051
* [PowerPC] Enable use of lxvw4x/stxvw4x in VSX code generation (Bill Schmidt, 2014-10-17, 7 files, -9/+142)
  Currently the VSX support enables use of lxvd2x and stxvd2x for 2x64 types, but does not yet use lxvw4x and stxvw4x for 4x32 types. This patch adds that support.
  As with lxvd2x/stxvd2x, this involves straightforward overriding of the patterns normally recognized for lvx/stvx, with preference given to the VSX patterns when VSX is enabled. In addition, the logic for permitting misaligned memory accesses is modified so that v4f32 and v4i32 are treated the same as v2f64 and v2i64 when VSX is enabled. Finally, the DAG generation for unaligned loads is changed to just use a normal LOAD (which will become lxvw4x) on P8 and later hardware, where unaligned loads are preferred over lvsl/lvx/lvx/vperm.
  A number of tests now generate the VSX loads/stores instead of lvx/stvx, so this patch adds VSX variants to those tests. I've also added <4 x float> tests to the vsx.ll test case, and created a vsx-p8.ll test case to be used for testing code generation for the P8Vector feature. For now, that simply tests the unaligned load/store behavior.
  This has been tested along with a temporary patch to enable the VSX and P8Vector features, with no new regressions encountered with or without the temporary patch applied.
  llvm-svn: 220047
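  For illustration, the kind of 4x32 access affected; with VSX enabled, a copy like this should now select lxvw4x/stxvw4x instead of lvx/stvx (a sketch using the GCC/Clang vector extension):
```
// A 16-byte vector of four floats, the v4f32 case the commit enables.
typedef float v4f32 __attribute__((vector_size(16)));

void copy_v4f32(v4f32 *dst, const v4f32 *src) {
  *dst = *src; // vector load + store; lxvw4x/stxvw4x under VSX
}
```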
* R600: Add EG to FMA test (Jan Vesely, 2014-10-17, 1 file, -1/+14)
  Reviewed-by: Tom Stellard <tom@stellard.net>
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  llvm-svn: 220045
* SelectionDAG: Add sext_inreg optimizations (Jan Vesely, 2014-10-17, 1 file, -0/+26)
  v2: use dyn_cast, fixup comments
  v3: use cast
  Reviewed-by: Matt Arsenault <arsenm2@gmail.com>
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  llvm-svn: 220044
* [PPC] Adjust some PowerPC tests to account for presence/absence of VSX (Bill Schmidt, 2014-10-17, 19 files, -37/+243)
  Patch by Bill Seurer; committed on his behalf.
  These test cases generate slightly different code sequences when VSX is activated and thus fail. The update turns off VSX explicitly for the existing checks and then adds a second set of checks for most of them that test the VSX instruction output.
  llvm-svn: 220019
* ARM: Fix a bug which was causing convergence failure in constant-island pass. (Akira Hatanaka, 2014-10-17, 1 file, -0/+395)
  The bug is in ARMConstantIslands::createNewWater where the upper bound of the new water split point is computed:

    // This could point off the end of the block if we've already got constant
    // pool entries following this block; only the last one is in the water list.
    // Back past any possible branches (allow for a conditional and a maximally
    // long unconditional).
    if (BaseInsertOffset + 8 >= UserBBI.postOffset()) {
      BaseInsertOffset = UserBBI.postOffset() - UPad - 8;
      DEBUG(dbgs() << format("Move inside block: %#x\n", BaseInsertOffset));
    }

  The split point is supposed to be somewhere between the machine instruction that loads from the constant pool entry and the end of the basic block, before branch instructions. The code above is fine if the basic block is large enough and there are a sufficient number of instructions following the machine instruction. However, if the machine instruction is near the end of the basic block, BaseInsertOffset can point to the machine instruction or another instruction that precedes it, and this can lead to convergence failure. This commit fixes this bug by ensuring BaseInsertOffset is larger than the offset of the instruction following the constant-loading instruction.
  rdar://problem/18581150
  llvm-svn: 220015
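  A hedged sketch of the invariant the fix enforces; variable names mirror the description above, not the actual code in ARMConstantIslands:
```
#include <algorithm>

// The split point must land strictly past the instruction that loads from
// the constant pool; otherwise new water can be placed at or before the
// user and the pass never converges.
unsigned clampSplitPoint(unsigned BaseInsertOffset, unsigned UserOffset,
                         unsigned UserInstrSize) {
  return std::max(BaseInsertOffset, UserOffset + UserInstrSize + 1);
}
```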
* Erase fence insertion from SelectionDAGBuilder.cpp (NFC) (Robin Morisset, 2014-10-16, 1 file, -3/+2)
  Summary:
  Backends can use setInsertFencesForAtomic to signal to the middle-end that monotonic is the only memory ordering they can accept for stores/loads/rmws/cmpxchg. The code lowering those accesses with a stronger ordering to fences + monotonic accesses currently lives in SelectionDAGBuilder.cpp. In this patch I propose moving this logic out of it, for several reasons:
  - There is lots of redundancy to avoid: extremely similar logic already exists in AtomicExpand.
  - The current code in SelectionDAGBuilder does not use any target hooks; it does the same transformation for every backend that requires it.
  - As a result it is plain *unsound*, as it was apparently designed for ARM. It happens to mostly work for the other targets because they are extremely conservative, but Power for example had to switch to AtomicExpand to be able to use lwsync safely (see r218331).
  - Because it produces IR-level fences, it cannot be made sound! This is noted in the C++11 standard (section 29.3, page 1140):
```
Fences cannot, in general, be used to restore sequential consistency for atomic
operations with weaker ordering semantics.
```
  It can also be seen in the following example (called IRIW in the literature):
```
atomic<int> x = y = 0;
int r1, r2, r3, r4;
Thread 0: x.store(1);
Thread 1: y.store(1);
Thread 2: r1 = x.load(); r2 = y.load();
Thread 3: r3 = y.load(); r4 = x.load();
```
  r1 = r3 = 1 and r2 = r4 = 0 is impossible as long as the accesses are all seq_cst. But if they are lowered to monotonic accesses, no amount of fences can prevent it.
  This patch does three things (I could cut it into parts, but then some of them would not be tested/testable, please tell me if you would prefer that):
  - It provides a default implementation for emitLeadingFence/emitTrailingFence in terms of IR-level fences that mimics the original logic of SelectionDAGBuilder. As we saw above, this is unsound, but it is the best that can be done without knowing the targets well (and there is a comment warning about this risk).
  - It then switches Mips/Sparc/XCore to use AtomicExpand, relying on this default implementation (which exactly replicates the logic of SelectionDAGBuilder, so no functional change).
  - It finally erases this logic from SelectionDAGBuilder, as it is dead code.
  Ideally, each target would define its own override for emitLeading/TrailingFence using target-specific fences, but I do not know the Sparc/Mips/XCore memory models well enough to do this, and they appear to be dealing fine with the ARM-inspired default expansion for now (probably because they are overly conservative, as Power was). If anyone wants to compile fences more aggressively on these platforms, the long comment should make it clear why they should first override emitLeading/TrailingFence.
  Test Plan: make check-all, no functional change
  Reviewers: jfb, t.p.northover
  Subscribers: aemerson, llvm-commits
  Differential Revision: http://reviews.llvm.org/D5474
  llvm-svn: 219957
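  For reference, one plausible shape of the ARM-inspired expansion discussed above, written as a hedged C++11 sketch: a stronger-than-monotonic access becomes fences plus a relaxed ("monotonic") access. The exact fence placement differs per ordering and target; as the commit stresses, no variant of this scheme can recover full sequential consistency:
```
#include <atomic>

// seq_cst store lowered fence-style: release fence, relaxed store, full fence.
void storeSeqCstExpanded(std::atomic<int> &x, int v) {
  std::atomic_thread_fence(std::memory_order_release);  // leading fence
  x.store(v, std::memory_order_relaxed);                // "monotonic" store
  std::atomic_thread_fence(std::memory_order_seq_cst);  // trailing fence
}

// seq_cst load lowered the same way: relaxed load, then an acquire fence.
int loadSeqCstExpanded(std::atomic<int> &x) {
  int v = x.load(std::memory_order_relaxed);            // "monotonic" load
  std::atomic_thread_fence(std::memory_order_acquire);  // trailing fence
  return v;
}
```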
* R600: Fix nonsensical implementation of computeKnownBits for BFE (Matt Arsenault, 2014-10-16, 1 file, -0/+15)
  This was resulting in invalid simplifications of sdiv.
  llvm-svn: 219953
* Delete -std-compile-opts. (Rafael Espindola, 2014-10-16, 6 files, -6/+6)
  These days -std-compile-opts was just a silly alias for -O3.
  llvm-svn: 219951
* [AArch64] Fix miscompile of sdiv-by-power-of-2. (Juergen Ributzka, 2014-10-16, 1 file, -1/+14)
  When the constant divisor was larger than 32 bits, the optimized code generated for the AArch64 backend would be wrong, because the shift was defined as a shift of a 32-bit constant '(1<<Lg2(divisor))' and we would lose the upper 32 bits.
  This fixes rdar://problem/18678801.
  llvm-svn: 219934
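  The width bug in miniature, as a hedged standalone example (not the backend code itself):
```
#include <cstdint>

// For Lg2 >= 32 the commented-out form is wrong: the literal 1 is a 32-bit
// int, so the shift overflows before any widening happens.
uint64_t powerOfTwoDivisor(unsigned Lg2) {
  // uint64_t bad = 1 << Lg2;    // loses the upper 32 bits when Lg2 >= 32
  return UINT64_C(1) << Lg2;     // shift performed in 64 bits
}
```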
* [mips] Account for endianness when expanding BuildPairF64/ExtractElementF64 nodes. (Vasileios Kalintiris, 2014-10-16, 1 file, -67/+31)
  Summary: In order to support big-endian targets for the BuildPairF64 nodes we just need to swap the low/high pair registers. Additionally, for the ExtractElementF64 nodes we have to calculate the correct stack offset with respect to the node's register/operand that we want to extract.
  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5753
  llvm-svn: 219931
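  A conceptual sketch of the register swap described in the summary; types and names are placeholders for the real lowering:
```
// On big-endian MIPS the 32-bit halves of an f64 register pair are
// reversed relative to little-endian, so BuildPairF64 just swaps them.
struct F64Pair { unsigned LoReg, HiReg; };

F64Pair buildPairF64(unsigned Reg0, unsigned Reg1, bool IsLittleEndian) {
  return IsLittleEndian ? F64Pair{Reg0, Reg1}   // Reg0 carries the low half
                        : F64Pair{Reg1, Reg0};  // swapped for big endian
}
```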
* [AVX512] Add DQ subvector inserts (Adam Nemet, 2014-10-15, 1 file, -3/+6)
  In AVX512f we support 64x2 and 32x8 inserts via matching them to 32x4 and 64x4 respectively. These are matched by "Alt" Pat<>'s (Alt stands for alternative VTs).
  Since DQ has native support for these instructions, I peeled off the non-"Alt" part of the baseclass into vinsert_for_size_no_alt. The DQ instructions are derived from this multiclass. The "Alt" Pat<>'s are disabled with DQ.
  Fixes <rdar://problem/18426089>
  llvm-svn: 219874
* [AVX512] Add SKX testing to avx512-insert-extract.ll (Adam Nemet, 2014-10-15, 1 file, -2/+4)
  This is in preparation for adding DQ subvector inserts to this testcase.
  llvm-svn: 219873
* [AVX512] Fix test to produce a defined value (Adam Nemet, 2014-10-15, 1 file, -1/+1)
  We're inserting into an 8-wide vector, so the index should be < 8.
  llvm-svn: 219872
* R600/SI: Fix bug where immediates were being used in DS addr operands (Tom Stellard, 2014-10-15, 1 file, -0/+18)
  The SelectDS1Addr1Offset complex pattern always tries to store constant lds pointers in the offset operand and store a zero value in the addr operand. Since the addr operand does not accept immediates, the zero value needs to first be copied to a register. This newly created zero value will not go through normal instruction selection, so we need to manually insert a V_MOV_B32_e32 in the complex pattern.
  This bug was hidden by the fact that if there was another zero value in the DAG that had not been selected yet, then the CSE done by the DAG would use the unselected node for the addr operand rather than the one that was just created. This would lead to the zero value being selected and the DAG automatically inserting a V_MOV_B32_e32 instruction.
  llvm-svn: 219848
* Reapply "[FastISel][AArch64] Add custom lowering for GEPs."Juergen Ributzka2014-10-152-3/+43
| | | | | | | | | | | This is mostly a copy of the existing FastISel GEP code, but we have to duplicate it for AArch64, because otherwise we would bail out even for simple cases. This is because the standard fastEmit functions don't cover MUL at all and ADD is lowered very inefficientily. The original commit had a bug in the add emit logic, which has been fixed. llvm-svn: 219831
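  A hedged sketch of what custom GEP lowering computes: constant indices fold into a single immediate, and only variable indices need a MUL and ADD at runtime (types stand in for the FastISel machinery):
```
#include <cstdint>
#include <vector>

struct GEPIndex {
  bool IsConstant;
  int64_t ConstValue;   // meaningful when IsConstant
  uint64_t ElemSize;    // size in bytes of the indexed element type
};

// Returns the folded constant byte offset; variable indices are collected
// for later MUL (index * ElemSize) + ADD emission.
int64_t foldGEPOffset(const std::vector<GEPIndex> &Indices,
                      std::vector<GEPIndex> &Variable) {
  int64_t TotalOffs = 0;
  for (const GEPIndex &I : Indices) {
    if (I.IsConstant)
      TotalOffs += I.ConstValue * (int64_t)I.ElemSize; // folded statically
    else
      Variable.push_back(I);                           // needs MUL + ADD
  }
  return TotalOffs;
}
```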
* R600/SI: Also try to use 0 base for misaligned 8-byte DS loads. (Matt Arsenault, 2014-10-15, 2 files, -0/+56)
  llvm-svn: 219823
* R600: Fix miscompiles when BFE has multiple uses (Matt Arsenault, 2014-10-15, 1 file, -0/+22)
  SimplifyDemandedBits would break the other uses of the operand.
  llvm-svn: 219819
* Revert "[FastISel][AArch64] Add custom lowering for GEPs."Juergen Ributzka2014-10-152-20/+3
| | | | | | This breaks our internal build bots. Reverting it to get the bots green again. llvm-svn: 219776
* [MachineSink] Use the real post dominator tree (Jingyue Wu, 2014-10-15, 3 files, -2/+66)
  Summary:
  Fixes a FIXME in MachineSinking. Instead of using the simple heuristics in isPostDominatedBy, use the real MachinePostDominatorTree and MachineLoopInfo. The old heuristics caused instructions to sink unnecessarily, and might create register pressure.
  This is the second try of the fix. The first one (D4814) caused a performance regression due to failing to sink instructions out of loops (PR21115). This patch fixes PR21115 by sinking an instruction from a deeper loop to a shallower one regardless of whether the target block post-dominates the source.
  Thanks Alexey Volkov for reporting PR21115!
  Test Plan:
  Added a NVPTX codegen test to verify that our change prevents the backend from over-sinking. It also shows the unnecessary register pressure caused by over-sinking. Added an X86 test to verify we can sink instructions out of loops regardless of the dominance relationship. This test is reduced from Alexey's test in PR21115. Updated an affected test in X86.
  Also ran SPEC CINT2006 and llvm-test-suite for compilation time and runtime performance. Results are attached separately in the review thread.
  Reviewers: Jiangning, resistor, hfinkel
  Reviewed By: hfinkel
  Subscribers: hfinkel, bruno, volkalexey, llvm-commits, meheff, eliben, jholewinski
  Differential Revision: http://reviews.llvm.org/D5633
  llvm-svn: 219773
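  The sinking criterion in miniature, as a hedged sketch of the rule described above:
```
// Sinking from a deeper loop into a shallower one is profitable on its
// own; otherwise require that the destination post-dominates the source.
bool shouldSink(unsigned FromLoopDepth, unsigned ToLoopDepth,
                bool ToPostDominatesFrom) {
  if (ToLoopDepth < FromLoopDepth)
    return true;                // escapes a loop: the PR21115 case
  return ToPostDominatesFrom;   // otherwise rely on the real post-dom tree
}
```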
* [AArch64] Optimize CSINC-branch sequence (Gerolf Hoflehner, 2014-10-14, 1 file, -0/+26)
  Peephole optimization that generates a single conditional branch for csinc-branch sequences like in the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also, the condition code may not be modified between the csinc and the original branch.
  Examples:
  1. Convert csinc w9, wzr, wzr, <CC>; tbnz w9, #0, 0x44 to b.<invCC>
  2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC>
  rdar://problem/18506500
  llvm-svn: 219742
* [X86][SSE] pslldq/psrldq shuffle mask decodes (Simon Pilgrim, 2014-10-14, 4 files, -146/+146)
  Patch to provide shuffle decodes and asm comments for the SSE2/AVX2 pslldq/psrldq byte shift instructions.
  Differential Revision: http://reviews.llvm.org/D5598
  llvm-svn: 219738
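  A hedged sketch of the decode itself: a byte shift is just a fixed shuffle, with -1 marking lanes that become zero (the convention shuffle comments use):
```
#include <array>

// PSRLDQ by N: destination byte i reads source byte i+N, or zero when the
// read would run off the end of the 16-byte register.
std::array<int, 16> decodePSRLDQ(unsigned N) {
  std::array<int, 16> Mask;
  for (unsigned i = 0; i != 16; ++i)
    Mask[i] = (i + N < 16) ? int(i + N) : -1;  // -1 => zeroed lane
  return Mask;
}
```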
* ARM: remove ARM/Thumb distinction for preferred alignment. (Tim Northover, 2014-10-14, 3 files, -9/+23)
  Thumb1 has legitimate reasons for preferring 32-bit alignment of types i1/i8/i16, since the 16-bit encoding of "add rD, sp, #imm" requires #imm to be a multiple of 4. However, this is a trade-off between code size and RAM usage; the DataLayout string is not the best place to represent it even if desired.
  So this patch removes the extra Thumb requirements, hopefully making ARM and Thumb completely compatible in this respect.
  llvm-svn: 219734
* ARM: allow misaligned local variables in Thumb1 mode. (Tim Northover, 2014-10-14, 1 file, -0/+2)
  There's no hard requirement on LLVM to align local variables to 32 bits, so the Thumb1 frame handling needs to be able to deal with variables that are only naturally aligned without falling over.
  llvm-svn: 219733
* [FastISel][AArch64] Add custom lowering for GEPs. (Juergen Ributzka, 2014-10-14, 2 files, -3/+20)
  This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail out even for simple cases, because the standard fastEmit functions don't cover MUL and ADD is lowered inefficiently.
  llvm-svn: 219726
* ARM: set preferred aggregate alignment to 32 universally. (Tim Northover, 2014-10-14, 1 file, -0/+6)
  Before, ARM and Thumb mode code had different preferred alignments, which could lead to some rather unexpected results. There's justification for reducing it from the default 64 bits (wasted space), but I don't think there is for going below 32 bits.
  There's no actual ABI change here, just to reassure people.
  llvm-svn: 219719
* [FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved. (Juergen Ributzka, 2014-10-14, 2 files, -0/+556)
  Sign-/zero-extend folding depended on the load and the integer extend both being selected by FastISel. This cannot always be guaranteed, and SelectionDAG might interfere. This commit adds additional checks to load and integer-extend lowering to catch this.
  Related to rdar://problem/18495928.
  llvm-svn: 219716
* Reapply "R600: Add new intrinsic to read work dimensions"Jan Vesely2014-10-141-0/+16
| | | | | | | | | This effectively reverts revert 219707. After fixing the test to work with new function name format and renamed intrinsic. Reviewed-by: Tom Stellard <tom@stellard.net> Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> llvm-svn: 219710
* Revert "R600: Add new intrinsic to read work dimensions"Rafael Espindola2014-10-141-16/+0
| | | | | | | | This reverts commit r219705. CodeGen/R600/work-item-intrinsics.ll was failing on linux. llvm-svn: 219707
* R600: Add new intrinsic to read work dimensions (Jan Vesely, 2014-10-14, 1 file, -0/+16)
  v2: Add SI lowering; add test
  v3: Place work dimensions after the kernel arguments.
  v4: Calculate offset while lowering arguments
  v5: rebase
  v6: change prefix to AMDGPU
  Reviewed-by: Tom Stellard <tom@stellard.net>
  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  llvm-svn: 219705
* R600/SI: Use DS offsets for constant addresses (Matt Arsenault, 2014-10-14, 2 files, -0/+45)
  Use 0 as the base address for a constant address, so if we have a constant address we can save moves and form read2/write2s.
  llvm-svn: 219698
* [AArch64] Fix crash with empty/pseudo-only blocks in A53 erratum (835769) workaround (Bradley Smith, 2014-10-14, 1 file, -0/+9)
  llvm-svn: 219684
* [AArch64] Select wide immediate offset into [Base+XReg] addressing mode (Hao Liu, 2014-10-14, 3 files, -15/+130)
  Currently we generate the following instructions if the immediate is too wide:
    MOV  X0, WideImmediate
    ADD  X1, BaseReg, X0
    LDR  X2, [X1, 0]
  Using the [Base+XReg] addressing mode can save one ADD:
    MOV  X0, WideImmediate
    LDR  X2, [BaseReg, X0]
  Differential Revision: http://reviews.llvm.org/D5477
  llvm-svn: 219665
* Fix a broadcast-related regression in the vector shuffle lowering. (Filipe Cabecinhas, 2014-10-13, 2 files, -0/+52)
  Summary: Test by Robert Lougher!
  Reviewers: chandlerc
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5745
  llvm-svn: 219617
* Adds support for the Cortex-A17 to the ARM backend (Renato Golin, 2014-10-13, 1 file, -0/+32)
  Patch by Matthew Wahab.
  llvm-svn: 219606
* [mips] Mark redundant instructions with a comment in test/CodeGen/Mips/Fast-ISel/icmpa.ll. (Daniel Sanders, 2014-10-13, 1 file, -2/+9)
  llvm-svn: 219605
* [AArch64] Add workaround for Cortex-A53 erratum (835769) (Bradley Smith, 2014-10-13, 1 file, -0/+525)
  Some early revisions of the Cortex-A53 have an erratum (835769) whereby it is possible for a 64-bit multiply-accumulate instruction in AArch64 state to generate an incorrect result. The details are quite complex and hard to determine statically, since branches in the code may exist in some circumstances, but all cases end with a memory (load, store, or prefetch) instruction followed immediately by the multiply-accumulate operation.
  The safest work-around for this issue is to make the compiler avoid emitting multiply-accumulate instructions immediately after memory instructions, and the simplest way to do this is to insert a NOP.
  This patch implements such a work-around in the backend, enabled via the option -aarch64-fix-cortex-a53-835769. The work-around code generation is not enabled by default.
  llvm-svn: 219603
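  A conceptual sketch of the workaround, reduced to instruction kinds; the real pass classifies MachineInstrs, and this only illustrates the scan-and-insert shape:
```
#include <cstddef>
#include <vector>

enum class Kind { Mem, Mac64, Nop, Other };

// Insert a NOP between any memory instruction and an immediately
// following 64-bit multiply-accumulate, breaking the erratum sequence.
void fixA53Erratum835769(std::vector<Kind> &Block) {
  for (std::size_t i = 1; i < Block.size(); ++i) {
    if (Block[i - 1] == Kind::Mem && Block[i] == Kind::Mac64) {
      Block.insert(Block.begin() + static_cast<std::ptrdiff_t>(i), Kind::Nop);
      ++i; // skip the inserted NOP; the MAC now sits at position i
    }
  }
}
```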
* Revert r219584, "[X86] Memory folding for commutative instructions." (NAKAMURA Takumi, 2014-10-13, 1 file, -16/+0)
  It broke i686 selfhosting.
  llvm-svn: 219595
* [X86] Memory folding for commutative instructions. (Simon Pilgrim, 2014-10-12, 1 file, -0/+16)
  This patch improves support for commutative instructions in the x86 memory folding implementation by attempting to fold a commuted version of the instruction if the original folding fails; if that folding fails as well, the instruction is 're-commuted' back to its original order before returning.
  This mainly helps the stack inliner better fold reloads of 3 (or more) operand instructions (VEX encoded SSE etc.), but by performing this in the lowest foldMemoryOperandImpl implementation it also replaces the X86InstrInfo::optimizeLoadInstr version and is now used by FastISel too.
  Differential Revision: http://reviews.llvm.org/D5701
  llvm-svn: 219584
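  A hedged sketch of the fold-commute-refold strategy, with placeholder hooks standing in for the target's fold and commute entry points:
```
// Try the straight fold; failing that, commute the operands and retry;
// failing that too, commute back so the instruction is left untouched.
template <typename Inst, typename TryFold, typename Commute>
Inst *tryFoldMaybeCommuted(Inst &MI, TryFold tryFold, Commute commute) {
  if (Inst *Folded = tryFold(MI))
    return Folded;               // folded in the original operand order
  if (!commute(MI))
    return nullptr;              // instruction is not commutable
  if (Inst *Folded = tryFold(MI))
    return Folded;               // folded the commuted form
  commute(MI);                   // 're-commute' back to the original order
  return nullptr;
}
```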
* llvm/test/CodeGen: Some tests don't REQUIRE asserts any more. Remove them. (NAKAMURA Takumi, 2014-10-12, 8 files, -8/+0)
  llvm-svn: 219581
* Add basic conditional branches in mips fast-isel (Reed Kotler, 2014-10-11, 1 file, -0/+34)
  Summary: Implement the most basic form of conditional branches in Mips fast-isel.
  Test Plan: br1.ll; ran 4 flavors of test-suite: mips32 r1/r2 at -O0/-O2
  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits, rfuhler
  Differential Revision: http://reviews.llvm.org/D5583
  llvm-svn: 219556
* R600/SI: Change how DS offsets are printed (Matt Arsenault, 2014-10-10, 16 files, -209/+213)
  Match SC by using offset/offset0/offset1 and printing in decimal.
  llvm-svn: 219537
* R600/SI: Match read2/write2 stride 64 versions (Matt Arsenault, 2014-10-10, 5 files, -9/+399)
  llvm-svn: 219536
* R600/SI: Add load / store machine optimizer pass. (Matt Arsenault, 2014-10-10, 3 files, -6/+843)
  Currently this only functions to match simple cases where ds_read2_* / ds_write2_* instructions can be used. In the future it might match some of the other weird load patterns, such as direct to LDS loads.
  Currently enabled only with a subtarget feature to enable easier testing.
  llvm-svn: 219533
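  A hedged sketch of the pairing check such a pass performs for the simple ds_read2_b32 case: both byte offsets must be dword-aligned and each scaled offset must fit the instruction's 8-bit offset fields (encoding details as understood; names are placeholders):
```
#include <cstdint>
#include <optional>

struct Read2Offsets { uint8_t Offset0, Offset1; };

// Two 4-byte DS reads off the same base merge into one ds_read2_b32 when
// both offsets are multiples of 4 and fit in 8 bits after scaling.
std::optional<Read2Offsets> pairDSReads(uint32_t ByteOff0, uint32_t ByteOff1) {
  if (ByteOff0 % 4 != 0 || ByteOff1 % 4 != 0)
    return std::nullopt;           // offsets are encoded in dword units
  uint32_t E0 = ByteOff0 / 4, E1 = ByteOff1 / 4;
  if (E0 > 255 || E1 > 255)
    return std::nullopt;           // each offset field is only 8 bits wide
  return Read2Offsets{uint8_t(E0), uint8_t(E1)};
}
```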
* Implement floating point compare for mips fast-isel (Reed Kotler, 2014-10-10, 1 file, -0/+254)
  Summary: Expand SelectCmp to handle floating point compare.
  Test Plan: fpcmpa.ll; ran 4 flavors of test-suite: mips32 r1/r2 at -O0/-O2
  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits, rfuhler
  Differential Revision: http://reviews.llvm.org/D5567
  llvm-svn: 219530
* Implement integer compare in mips fast-isel (Reed Kotler, 2014-10-10, 1 file, -0/+203)
  Summary: Implement SelectCmp (integer compare) in mips fast-isel.
  Test Plan: icmpa.ll; also ran 4 test-suite flavors: mips32 r1/r2 at -O0/-O2
  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits, rfuhler, mcrosier
  Differential Revision: http://reviews.llvm.org/D5566
  llvm-svn: 219518
* [MiSched] Fix a logic error in tryPressure() (Hal Finkel, 2014-10-10, 2 files, -3/+5)
  Fixes a logic error in the MachineScheduler found by Steve Montgomery (and confirmed by Andy). This has gone unfixed for months because the fix has been found to introduce some small performance regressions. However, Andy has recommended that, at this point, we fix this to avoid further dependence on the incorrect behavior (and then follow up separately on any regressions), and I agree.
  Fixes PR18883.
  llvm-svn: 219512