summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* GlobalISel: combine extracts & sequences created for legalizationTim Northover2016-08-302-10/+108
| | | | | | | | Legalization ends up creating many G_SEQUENCE/G_EXTRACT pairs which leads to inefficient codegen (even for -O0), so add a quick pass over the function to remove them again. llvm-svn: 280155
* AMDGPU: Relax SGPR asm constraint register classMatt Arsenault2016-08-301-0/+10
| | | | | | | s should be SReg_32 to be as general as possible. This can avoid a copy from m0. llvm-svn: 280154
* GlobalISel: forbid physical registers on generic MIs.Tim Northover2016-08-308-72/+160
| | | | | | | | | | We're intending to move to a world where the type of a register is determined by its (unique) def. This is incompatible with physregs, which are untyped. It also means the other passes don't have to worry quite so much about register-class compatibility and inserting COPYs appropriately. llvm-svn: 280132
* [PowerPC] Force entry alignment in .got2Hal Finkel2016-08-301-0/+1
| | | | | | | | | Implement Bill's suggested fix for 32-bit targets for PR22711 (for the alignment of each entry). As pointed out in the bug report, we could just force the section alignment, since we only add pointer-sized things currently, but this fix is somewhat more future-proof. llvm-svn: 280049
* [PowerPC] Add support for -mlongcallHal Finkel2016-08-301-0/+26
| | | | | | | | | | | The "long call" option forces the use of the indirect calling sequence for all calls (even those that don't really need it). GCC provides this option; This is helpful, under certain circumstances, for building very-large binaries, and some other specialized use cases. Fixes PR19098. llvm-svn: 280040
* [PowerPC] Add triple to test/CodeGen/PowerPC/atomic-2.ll for ppc64leHal Finkel2016-08-301-1/+1
| | | | | | Otherwise, running the test on Darwin systems will not work. llvm-svn: 280034
* [PowerPC] Fix i8/i16 atomics for little-Endian targets without partword atomicsHal Finkel2016-08-291-1/+14
| | | | | | | | | | | | | | For little-Endian PowerPC, we generally target only P8 and later by default. However, generic (older) 64-bit configurations are still an option, and in that case, partword atomics are not available (e.g. stbcx.). To lower i8/i16 atomics without true i8/i16 atomic operations, we emulate using i32 atomics in combination with a bunch of shifting and masking, etc. The amount by which to shift in little-Endian mode is different from the amount in big-Endian mode (it is inverted -- meaning we can leave off the xor when computing the amount). Fixes PR22923. llvm-svn: 280022
* Propagate TBAA info in SelectionDAG::getIndexedLoadKrzysztof Parzyszek2016-08-291-0/+37
| | | | | | Patch by Pranav Bhandarkar. llvm-svn: 279998
* AMDGPU/SI: Implement a custom MachineSchedStrategyTom Stellard2016-08-2923-62/+68
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: GCNSchedStrategy re-uses most of GenericScheduler, it's just uses a different method to compute the excess and critical register pressure limits. It's not enabled by default, to enable it you need to pass -misched=gcn to llc. Shader DB stats: 32464 shaders in 17874 tests Totals: SGPRS: 1542846 -> 1643125 (6.50 %) VGPRS: 1005595 -> 904653 (-10.04 %) Spilled SGPRs: 29929 -> 27745 (-7.30 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 36688188 -> 37034900 (0.95 %) bytes LDS: 1913 -> 1913 (0.00 %) blocks Max Waves: 254101 -> 265125 (4.34 %) Wait states: 0 -> 0 (0.00 %) Totals from affected shaders: SGPRS: 1338220 -> 1438499 (7.49 %) VGPRS: 886221 -> 785279 (-11.39 %) Spilled SGPRs: 29869 -> 27685 (-7.31 %) Spilled VGPRs: 334 -> 352 (5.39 %) Scratch VGPRs: 1612 -> 1624 (0.74 %) dwords per thread Code Size: 34315716 -> 34662428 (1.01 %) bytes LDS: 1551 -> 1551 (0.00 %) blocks Max Waves: 188127 -> 199151 (5.86 %) Wait states: 0 -> 0 (0.00 %) Reviewers: arsenm, mareko, nhaehnle, MatzeB, atrick Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23688 llvm-svn: 279995
* AMDGPU/SI: Improve SILoadStoreOptimizer and run it before the schedulerTom Stellard2016-08-2910-39/+61
| | | | | | | | | | | | | | | | | | | | Summary: The SILoadStoreOptimizer can now look ahead more then one instruction when looking for instructions to merge, which greatly improves the number of loads/stores that we are able to merge. Moving the pass before scheduling avoids increasing register pressure after the scheduler, so that the scheduler's register pressure estimates will be more accurate. It also gives more consistent results, since it is no longer affected by minor scheduling changes. Reviewers: arsenm Subscribers: arsenm, kzhuravl, llvm-commits Differential Revision: https://reviews.llvm.org/D23814 llvm-svn: 279991
* GlobalISel: legalize frem to a libcall on AArch64.Tim Northover2016-08-291-0/+14
| | | | llvm-svn: 279988
* AMDGPU/R600: Fix fixups used for constant arraysMatt Arsenault2016-08-291-0/+28
| | | | | | Fixes bug 29289 llvm-svn: 279986
* IfConversion: Fix branch predication bug.Kyle Butt2016-08-291-0/+37
| | | | | | | | | | | | This bug shows up with diamonds that share unpredicable, unanalyzable branches. There's an included test case from Hexagon. What was happening was that we were attempting to predicate the branch instruction despite the fact that it was checked to be the same. Now for unanalyzable branches we skip over the branch instructions when predicating the block. Differential Revision: https://reviews.llvm.org/D23939 llvm-svn: 279985
* Make vec_fabs.ll pass with MSVC 2013Reid Kleckner2016-08-291-4/+7
| | | | | | We should revert this change once we drop support for MSVC 2013. llvm-svn: 279979
* [TargetLowering] remove fdiv and frem from canOpTrap() (PR29114)Sanjay Patel2016-08-291-10/+2
| | | | | | | | | | | | | | | | | Assuming the default FP env, we should not treat fdiv and frem any differently in terms of trapping behavior than any other FP op. Ie, FP ops do not trap with the default FP env. This matches how we treat these ops in IR with isSafeToSpeculativelyExecute(). There's a similar bug in Constant::canTrap(). This bug manifests in PR29114: https://llvm.org/bugs/show_bug.cgi?id=29114 ...as a sequence of scalar divisions instead of a vector division on x86 for a <3 x float> type. Differential Revision: https://reviews.llvm.org/D23974 llvm-svn: 279970
* Do not use MRI::getMaxLaneMaskForVReg as a mask covering whole registerKrzysztof Parzyszek2016-08-291-0/+48
| | | | | | | | | | | | | MRI::getMaxLaneMaskForVReg does not always cover the whole register. For example, on X86 the upper 16 bits of EAX cannot be accessed via any subregister. Consequently, there is no lane mask that only covers that part of EAX. The getMaxLaneMaskForVReg will return the union of the lane masks for all subregisters, and in case of EAX, that union will not cover the upper 16 bits. This fixes https://llvm.org/bugs/show_bug.cgi?id=29132 llvm-svn: 279969
* AMDGPU/SI: Improve register allocation hints for sopk instructionsTom Stellard2016-08-291-2/+2
| | | | | | | | | | | | | | | | | | | Summary: For shrinking SOPK instructions, we were creating a hint to tell the register allocator to use the register allocated for src0 for the dst operand as well. However, this seems to not work sometimes depending on the order virtual registers are assigned physical registers. To fix this, I've added a second allocation hint which does the reverse, asks that the register allocated for dst is used for src0. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23862 llvm-svn: 279968
* Use the correct ctor/dtor section for dynamic-no-pic.Rafael Espindola2016-08-291-0/+4
| | | | llvm-svn: 279967
* Fixed a bug in type legalizer for masked gather.Igor Breger2016-08-291-0/+33
| | | | | | | | | The problem occurs when the Node doesn't updated in place , UpdateNodeOperation() return the node that already exist. In this case assert fail in PromoteIntegerOperand() , N have 2 results ( val + chain). Differential Revision: http://reviews.llvm.org/D23756 llvm-svn: 279961
* [AVX512] In some cases KORTEST instruction may be used instead of ZEXT + ↵Igor Breger2016-08-295-723/+273
| | | | | | | | TEST sequence. Differential Revision: http://reviews.llvm.org/D23490 llvm-svn: 279960
* [X86] Don't lower FABS/FNEG masking directly to a ConstantPool load. Just ↵Craig Topper2016-08-297-73/+188
| | | | | | | | create a ConstantFPSDNode and let that be lowered. This allows broadcast loads to used when available. llvm-svn: 279958
* [AVX-512] Add 512-bit fabs tests with and without AVX512DQ.Craig Topper2016-08-291-4/+84
| | | | llvm-svn: 279956
* [AVX-512] Add support for selecting 512-bit VPABSB/VPABSW when BWI is available.Craig Topper2016-08-281-8/+2
| | | | llvm-svn: 279951
* [AVX-512] Add testcases showing that we don't emit 512-bit vpabsb/vpabsw. ↵Craig Topper2016-08-281-5/+155
| | | | | | Will be fixed in a future commit. llvm-svn: 279949
* [x86] add tests for <3 x N> vector types (PR29114)Sanjay Patel2016-08-281-0/+40
| | | | llvm-svn: 279939
* [X86][AVX512] Only combine EVEX targets shuffles to shuffles of the same ↵Simon Pilgrim2016-08-281-4/+8
| | | | | | | | | | number of vector elements Over eager combing prevents the correct folding of writemasks. At the moment this occurs for ALL EVEX shuffles, in the future we need to check that the user of the root shuffle is a VSELECT that can fold to a writemask. llvm-svn: 279934
* [PowerPC] Implement lowering for atomicrmw min/max/umin/umaxHal Finkel2016-08-281-0/+435
| | | | | | Implement lowering for atomicrmw min/max/umin/umax. Fixes PR28818. llvm-svn: 279933
* [AVX-512] Promote AND/OR/XOR to v2i64/v4i64/v8i64 even when we have ↵Craig Topper2016-08-286-38/+53
| | | | | | | | | | AVX512F/AVX512VL. Previously we weren't creating masked logical operations if bitcasts appeared between the logic operation and the select. The IR optimizers can move bitcasts across logic operations and create these cases. To minimize the number of cases we need to handle, this change promotes all logic ops to an i64 vector type just like when only SSE or AVX is available. Unfortunately, this also has the consequence of making it difficult to select unmasked VPANDD/VPORD/VPXORD in all the cases it was previously used. This is the cause of most of the test change. This shouldn't result in any functional change though. llvm-svn: 279929
* [AVX-512] Add tests to show that we don't select masked logic ops if there ↵Craig Topper2016-08-281-0/+51
| | | | | | | | are bitcasts between the logic op and the select. This is taken from optimized IR of clang test cases for masked logic ops. llvm-svn: 279928
* AMDGPU/R600: Enable Load combineJan Vesely2016-08-276-135/+1951
| | | | | | | | Fix and improve tests Differential Revision: https://reviews.llvm.org/D23899 llvm-svn: 279925
* [X86] Enable FR32/FR64 cmpeq/cmpne/cmpunord/cmpord to be commuted.Craig Topper2016-08-271-4/+4
| | | | llvm-svn: 279913
* [AVX-512] Add load folding for EVEX vcmpps/pd/ss/sd.Craig Topper2016-08-272-0/+54
| | | | llvm-svn: 279912
* AMDGPU: Select mulhi 24-bit instructionsMatt Arsenault2016-08-272-37/+293
| | | | llvm-svn: 279902
* AMDGPU: Move cndmask pseudo to be isel pseudoMatt Arsenault2016-08-272-18/+18
| | | | | | | | There's only one use of this for the convenience of a pattern. I think v_mov_b64_pseudo should also be moved, but SIFoldOperands does currently make use of it. llvm-svn: 279901
* [GlobalISel] Add a fallback path to SDISel.Quentin Colombet2016-08-271-0/+19
| | | | | | | | | When global-isel fails on a MachineFunction MF, MF will be cleaned up and given to SDISel. Thanks to this fallback, we can already perform correctness test even if we support only a small portion of the functions in a test. llvm-svn: 279891
* [X86] Add baseline test for "odd" shuffles. NFC.Michael Kuperstein2016-08-271-0/+691
| | | | | | | | Adds a baseline test for lowering shuffles where the width of the output vector is not twice the size of the input vectors. Many of those sequences are suboptimal, and will hopefully be improved in follow-up patches. llvm-svn: 279888
* AMDGPU/SI: Canonicalize offset order for merged DS instructionsTom Stellard2016-08-266-22/+22
| | | | | | | | | | | | | | | | | | | Summary: If the scheduler clusters the loads, then the offsets will be sorted, but it is possible for the scheduler to scheduler loads together without out explicitly clustering them, which would give us non-sorted offsets. Also, we will want to do this if we move the load/store optimizer before the scheduler. Reviewers: arsenm Subscribers: arsenm, llvm-commits, kzhuravl Differential Revision: https://reviews.llvm.org/D23776 llvm-svn: 279870
* Swift Calling Convetion: add support for AArch64.Manman Ren2016-08-261-0/+11
| | | | | | | | It will just be the same as the regular calling convention. rdar://28029509 llvm-svn: 279853
* AArch64: avoid assertion on illegal types in performFDivCombine.Tim Northover2016-08-261-0/+43
| | | | | | | | In the code to detect fixed-point conversions and make use of AArch64's special instructions, we weren't prepared for weird types. The fptosi direction got fixed recently, but not the similar sitofp code. llvm-svn: 279852
* [AArch64] Avoid materializing constant values when generating csel instructions.Chad Rosier2016-08-261-0/+99
| | | | | | Differential Revision: https://reviews.llvm.org/D23677 llvm-svn: 279849
* GlobalISel: mark G_FPEXT legal from float to double.Tim Northover2016-08-261-0/+6
| | | | llvm-svn: 279845
* GlobalISel: mark G_FCMP legal on float & double.Tim Northover2016-08-261-0/+35
| | | | llvm-svn: 279844
* GlobalISel: simplify G_ICMP legalization regime.Tim Northover2016-08-261-19/+2
| | | | | | | | | | | | | | It's unclear how the old %res(32) = G_ICMP { s32, s32 } intpred(eq), %0, %1 is actually different from an s1 verison %res(1) = G_ICMP { s1, s32 } intpred(eq), %0, %1 so we'll remove it for now. llvm-svn: 279843
* GlobalISel: legalize sdiv and srem operations.Tim Northover2016-08-261-0/+52
| | | | llvm-svn: 279842
* GlobalISel: legalize under-width divisions.Tim Northover2016-08-261-0/+42
| | | | llvm-svn: 279841
* GlobalISel: mark selects legalTim Northover2016-08-261-0/+16
| | | | llvm-svn: 279840
* GlobalISel: mark float/int conversions legalTim Northover2016-08-261-0/+66
| | | | llvm-svn: 279839
* [AArch64] Avoid materializing constant 1 by using csinc, rather than csel.Chad Rosier2016-08-262-6/+44
| | | | | | | | This is similar to what was done in r261675, but for CSINC rather than CSINV. Differential Revision: https://reviews.llvm.org/D23892 llvm-svn: 279822
* Handle empty functions with debug info in load/store opt passPablo Barrio2016-08-261-0/+54
| | | | | | | | | | | | | | | | | | | | Summary: In fuctions that contained debug info but were empty otherwise, the ARM load/store optimizer could abort. This was because function MergeReturnIntoLDM handled the special case where a Machine Basic BLock is empty by calling MBB.empty(). However, this returns false in presence of debug info, although the function should be considered empty in the eyes of the load/store optimizer. This has been fixed by handling the case where searching through the block finds only debug instructions. Reviewers: rengolin, dexonsmith, llvm-commits, jmolloy Subscribers: t.p.northover, aemerson, rengolin, samparker Differential Revision: https://reviews.llvm.org/D23847 llvm-svn: 279820
* [X86][SSE4A] The EXTRQ/INSERTQ bit extraction/insertion ops should be in the ↵Simon Pilgrim2016-08-262-8/+145
| | | | | | integer domain llvm-svn: 279811
OpenPOWER on IntegriCloud