summaryrefslogtreecommitdiffstats
path: root/llvm/test/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [AArch64] Add test case for preferred function alignment (NFC). Florian Hahn2017-07-071-0/+26
| | | | | | | | | | | | Reviewers: evandro, joelkevinjones, mcrosier Reviewed By: joelkevinjones, mcrosier Subscribers: mcrosier, aemerson, llvm-commits, rengolin, evandro, javed.absar, joelkevinjones, kristof.beyls Differential Revision: https://reviews.llvm.org/D34951 llvm-svn: 307369
* [ARM] GlobalISel: Select hard G_FCMP for s32Diana Picus2017-07-072-0/+637
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We lower to a sequence consisting of: - MOVi 0 into a register - VCMPS to do the actual comparison and set the VFP flags - FMSTAT to move the flags out of the VFP unit - MOVCCi to either use the "zero register" that we have previously set with the MOVi, or move 1 into the result register, based on the values of the flags As was the case with soft-float, for some predicates (one, ueq) we actually need two comparisons instead of just one. When that happens, we generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of using the result of the first MOVCCi as the "zero register" for the second one. This is a bit overkill, since one comparison followed by two non-flag-setting conditional moves should be enough. In any case, the backend manages to CSE one of the comparisons away so it doesn't matter much. Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not VCMPES. This makes the code a lot simpler, and it also seems correct since the LLVM Lang Ref defines simple true/false returns if the operands are QNaN's. For SNaN's, even VCMPS throws an Invalid Operand exception, so they won't be slipping through unnoticed. Implementation-wise, this introduces a template so we can share the same code that we use for handling integer comparisons, since the only differences are in the details (exact opcodes to be used etc). Hopefully this will be easy to extend to s64 G_FCMP. llvm-svn: 307365
* RegisterScavenging: Fix PR33687Matthias Braun2017-07-071-0/+66
| | | | | | | | | When scavenging for a use in instruction MI, we will reload after that instruction and hence cannot spill uses/defs of this instruction. This fixes http://llvm.org/PR33687 llvm-svn: 307352
* Extend memcpy expansion in Transform/Utils to handle wider operand types.Sean Fertile2017-07-072-0/+73
| | | | | | | | | | | Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
* Reverting r307326 because it breaks clang tests.Michael Kuperstein2017-07-063-93/+0
| | | | llvm-svn: 307334
* [ConstHoisting] choose to hoist when frequency is the same.Wei Mi2017-07-061-4/+48
| | | | | | | | | | | | The patch is to adjust the strategy of frequency based consthoisting: Previously when the candidate block has the same frequency with the existing blocks containing a const, it will not hoist the const to the candidate block. For that case, now we change the strategy to hoist the const if only existing blocks have more than one block member. This is helpful for reducing code size. Differential Revision: https://reviews.llvm.org/D35084 llvm-svn: 307328
* [NVPTX] Add lowering of i128 params.Michael Kuperstein2017-07-063-0/+93
| | | | | | | | | | | | | | | | | The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 307326
* AMDGPU: Add macro fusion schedule DAG mutationMatt Arsenault2017-07-0614-305/+422
| | | | | | Try to increase opportunities to shrink vcc uses. llvm-svn: 307313
* AMDGPU: Remove unnecessary IR from MIR testsMatt Arsenault2017-07-067-358/+113
| | | | llvm-svn: 307311
* [AMDGPU] Always use rcp + mul with fast mathStanislav Mekhanoshin2017-07-062-43/+32
| | | | | | | | | | | | | | | | | | | | | | | | | Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag: %div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00} Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior. This change allows an "fdiv fast" to be always lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308
* [X86][SSE] Tests for bitcasting iX integers to vXi1 boolean vectorsSimon Pilgrim2017-07-063-0/+7447
| | | | | | Including sign/zero extension to legal types llvm-svn: 307301
* [X86][SSE] Dropped -mcpu from bitcast+setcc testsSimon Pilgrim2017-07-062-100/+475
| | | | | | | | Use triple and attribute only for consistency Added SSE2/AVX tests on 256-bit vectors to test PACKSS behaviour llvm-svn: 307289
* [LSR] Narrow search space by filtering non-optimal formulae with the same ↵Wei Mi2017-07-061-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ScaledReg and Scale. When the formulae search space is huge, LSR uses a series of heuristic to keep pruning the search space until the number of possible solutions are within certain limit. The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs, which picks the register which is used by the most LSRUses and deletes the other formulae which don't use the register. This is a effective way to prune the search space, but quite often not a good way to keep the best solution. We saw cases before that the heuristic pruned the best formula candidate out of search space. To relieve the problem, we introduce a new heuristic called NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to reduce the search space while keeping the best formula, we want to keep as many formulae with different Scale and ScaledReg as possible. That is because the central idea of LSR is to choose a group of loop induction variables and use those induction variables to represent LSRUses. An induction variable candidate is often represented by the Scale and ScaledReg in a formula. If we have more formulae with different ScaledReg and Scale to choose, we have better opportunity to find the best solution. That is why we believe pruning search space by only keeping the best formula with the same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use two criteria to choose the best formula with the same Scale and ScaledReg. The first criteria is to select the formula using less non shared registers, and the second criteria is to select the formula with less cost got from RateFormula. The patch implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the last resort. Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help on the two improved internal cases we saw. Differential Revision: https://reviews.llvm.org/D34583 llvm-svn: 307269
* [X86][SSE4A] Add support for shuffle combining to INSERTQI.Simon Pilgrim2017-07-061-13/+4
| | | | llvm-svn: 307268
* [X86][SSE4A] Add test showing missed opportunities to combine INSERTQI shuffleSimon Pilgrim2017-07-061-0/+20
| | | | llvm-svn: 307265
* [x86] fix over-specified triple and auto-generate checks; NFCSanjay Patel2017-07-061-17/+25
| | | | llvm-svn: 307262
* [MachineVerifier] Add check that tied physregs aren't different.Mikael Holmen2017-07-061-0/+22
| | | | | | | | | | | | | | | | Summary: Added MachineVerifier code to check register ties more thoroughly, especially so that physical registers that are tied are the same. This may help e.g. when creating MIR files. Original patch by Jesper Antonsson Reviewers: stoklund, sanjoy, qcolombet Reviewed By: qcolombet Subscribers: qcolombet, llvm-commits Differential Revision: https://reviews.llvm.org/D34394 llvm-svn: 307259
* [X86][SSE4A] Add support for shuffle combining to EXTRQ.Simon Pilgrim2017-07-061-24/+12
| | | | llvm-svn: 307254
* [X86][SSE4A] Add scheduling tests for SSE4A instructionsSimon Pilgrim2017-07-061-0/+95
| | | | llvm-svn: 307251
* [RegisterCoalescer] Fix for SubRange join unreachableDavid Stuttard2017-07-061-0/+162
| | | | | | | | | | | | | | | | Summary: During remat, some subranges might end up having invalid segments which caused problems for later coalescing. Added in a check to remove segments that are invalidated as part of the remat. See http://llvm.org/PR33524 Subscribers: MatzeB, qcolombet Differential Revision: https://reviews.llvm.org/D34391 llvm-svn: 307247
* [ARM] GlobalISel: Map s32 G_FCMP in reg bank selectDiana Picus2017-07-061-0/+29
| | | | | | Map hard G_FCMP operands to FPR and the result to GPR. llvm-svn: 307245
* [ARM] GlobalISel: Legalize G_FCMP for s32Diana Picus2017-07-061-0/+654
| | | | | | | | | | | | | | | | | | | | | This covers both hard and soft float. Hard float is easy, since it's just Legal. Soft float is more involved, because there are several different ways to handle it based on the predicate: one and ueq need not only one, but two libcalls to get a result. Furthermore, we have large differences between the values returned by the AEABI and GNU functions. AEABI functions return a nice 1 or 0 representing true and respectively false. GNU functions generally return a value that needs to be compared against 0 (e.g. for ogt, the value returned by the libcall is > 0 for true). We could introduce redundant comparisons for AEABI as well, but they don't seem easy to remove afterwards, so we do different processing based on whether or not the result really needs to be compared against something (and just truncate if it doesn't). llvm-svn: 307243
* [ARM] GlobalISel: Widen s1, s8, s16 G_CONSTANTDiana Picus2017-07-061-0/+15
| | | | | | Get the legalizer to widen small constants. llvm-svn: 307239
* Fix libcall expansion creating DAG nodes with invalid type post type ↵Vadim Chugunov2017-07-051-0/+21
| | | | | | | | | | | | legalization. If we are lowering a libcall after legalization, we'll split the return type into a pair of legal values. Patch by Jatin Bhateja and Eli Friedman. Differential Revision: https://reviews.llvm.org/D34240 llvm-svn: 307207
* {DAGCombiner] Fold (rot x, 0) -> xSimon Pilgrim2017-07-051-1/+0
| | | | llvm-svn: 307184
* [X86] Test bitfield loadstore tests on i686 as wellSimon Pilgrim2017-07-051-89/+162
| | | | llvm-svn: 307182
* [PowerPC] Make sure that we remove dead PHI nodes after the PPCCTRLoops pass.Sean Fertile2017-07-051-0/+38
| | | | | | | Commiting on behalf of Stefan Pintilie. Differential Revision: https://reviews.llvm.org/D34829 llvm-svn: 307180
* [DAGCombiner] visitRotate patch to optimize pair of ROTR/ROTL instructions ↵Andrew Zhogin2017-07-052-10/+5
| | | | | | | | | | into one with combined shift operand. For two ROTR operations with shifts C1, C2; combined shift operand will be (C1 + C2) % bitsize. Differential revision: https://reviews.llvm.org/D12833 llvm-svn: 307179
* [X86][SSE] Dropped -mcpu from bitcast+setcc mask testsSimon Pilgrim2017-07-052-130/+130
| | | | | | Use triple and attribute only for consistency llvm-svn: 307176
* [Power9] Exploit vector extract with variable index.Tony Jiang2017-07-051-0/+167
| | | | | | | | | | | | | | | | This patch adds the exploitation for new power 9 instructions which extract variable elements from vectors: VEXTUBLX VEXTUBRX VEXTUHLX VEXTUHRX VEXTUWLX VEXTUWRX Differential Revision: https://reviews.llvm.org/D34032 Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com) llvm-svn: 307174
* [Power9] Exploit vector integer extend instructions when indices aren't correct.Tony Jiang2017-07-051-28/+225
| | | | | | | | | | | | | | | This patch adds on to the exploitation added by https://reviews.llvm.org/D33510. This now catches build vector nodes where the inputs are coming from sign extended vector extract elements where the indices used by the vector extract are not correct. We can still use the new hardware instructions by adding a shuffle to move the elements to the correct indices. I introduced a new PPCISD node here because adding a vector_shuffle and changing the elements of the vector_extracts was getting undone by another DAG combine. Commit on behalf of Zaara Syeda (syzaara@ca.ibm.com) Differential Revision: https://reviews.llvm.org/D34009 llvm-svn: 307169
* [Hexagon] Preclude non-memory test from being optimized away. NFC.Nirav Dave2017-07-0511-38/+38
| | | | llvm-svn: 307153
* [GlobalIsel] allow x86_fp80 values to be dumped.Igor Breger2017-07-051-0/+18
| | | | | | | | | | | | | | | | Summary: Otherwise the fallback path fails with an assertion on x86_64 targets, when "x86_fp80" is encountered. Reviewers: t.p.northover, zvi, guyblank Reviewed By: zvi Subscribers: rovka, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D34975 llvm-svn: 307140
* Add the missing triple to the test case added as part of r307120.Nemanja Ivanovic2017-07-051-1/+1
| | | | llvm-svn: 307122
* [PowerPC] Fix for PR33636Nemanja Ivanovic2017-07-051-0/+702
| | | | | | | | Remove casts to a constant when a node can be an undef. Differential Revision: https://reviews.llvm.org/D34808 llvm-svn: 307120
* Rewrite areNonVolatileConsecutiveLoads to use BaseIndexOffsetNirav Dave2017-07-058-210/+166
| | | | | | | | | | | | | | | | | | | | | | | | | | | Relanding after rewriting undef.ll test to avoid host-dependant endianness. As discussed in D34087, rewrite areNonVolatileConsecutiveLoads using generic checks. Also, propagate missing local handling from there to BaseIndexOffset checks. Tests of note: * test/CodeGen/X86/build-vector* - Improved. * test/CodeGen/BPF/undef.ll - Improved store alignment allows an additional store merge * test/CodeGen/X86/clear_upper_vector_element_bits.ll - This is a case we already do not handle well. Here, the DAG is improved, but scheduling causes a code size degradation. Reviewers: RKSimon, craig.topper, spatel, andreadb, filcab Subscribers: nemanjai, llvm-commits Differential Revision: https://reviews.llvm.org/D34472 llvm-svn: 307114
* Revert "[AVR] Add the branch selection pass from the GitHub repository"Dylan McKay2017-07-053-7/+7
| | | | | | This reverts commit 602ef067c1d58ecb425d061f35f2bc4c7e92f4f3. llvm-svn: 307111
* [AVR] Add the branch selection pass from the GitHub repositoryDylan McKay2017-07-053-7/+7
| | | | | | | We should rewrite this using the generic branch relaxation pass, but for the moment having this pass is better than hitting an assertion error. llvm-svn: 307109
* NFC.Gadi Haber2017-07-041-210/+835
| | | | | | | | | | | | | | | Made some updates to the half.ll test under CodeGen to make it friendly to the update_llc_test_checks .py tool as follows: 1.Removing the llc flag -asm-verbose=false 2.Grouping the multiple check-prefix directives 3.Apply update_llc_test_checks.py tool on the test This change is needed to easily update scheduling changes in an upcoming patch. Reviewers: zvi, RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D34934 llvm-svn: 307108
* [ARM][test] Added test/CodeGen/ARM/ror.ll test. NFC precommit for D12833.Andrew Zhogin2017-07-041-0/+36
| | | | llvm-svn: 307103
* [X86][SSE4A] Add support for combining from non-v16i8 EXTRQI/INSERTQI shufflesSimon Pilgrim2017-07-042-36/+41
| | | | | | With the improved shuffle decoding we can now combine EXTRQI/INSERTQI shuffles from non-v16i8 vector types llvm-svn: 307099
* [AMDGPU] Switch scalarize global loads ON by defaultAlexander Timofeev2017-07-04141-559/+789
| | | | | | Differential revision: https://reviews.llvm.org/D34407 llvm-svn: 307097
* [FastISel] Move gc intrinsic test to X86 directoryAnna Thomas2017-07-041-0/+2
| | | | | | | | | Move from generic to X86 directory since gc intrinsics only supposed in X86 64 bit. Add target triple as well. Fixes build failure in i686-linux-RA caused by rL307084. llvm-svn: 307086
* [FastISel][SelectionDAG]Teach fastISel about GC intrinsicsAnna Thomas2017-07-041-0/+55
| | | | | | | | | | | | | | | | | | | | | | | Summary: We are crashing in LLC at O0 when gc intrinsics are present in the block. The reason being FastISel performs basic block ISel by modifying GC.relocates to be the first instruction in the block. This can cause us to visit the GC relocate before it's corresponding GC.statepoint is visited, which is incorrect. When we lower the statepoint, we record the base and derived pointers, along with the gc.relocates. After this we can visit the gc.relocate. This patch avoids fastISel from incorrectly creating the block with gc.relocate as the first instruction. Reviewers: qcolombet, skatkov, qikon, reames Reviewed by: skatkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34421 llvm-svn: 307084
* [X86] Add combine tests for vector rotatesSimon Pilgrim2017-07-041-0/+83
| | | | | | Reference tests for D12833 llvm-svn: 307073
* NFC commit.Gadi Haber2017-07-041-25/+26
| | | | | | | | | | | | Converting the Codegen test "extractelement-legalization-store-ordering.ll" to be "update_llc_test_checks" friendly. The changes to the test are needed for an upcoming scheduling patch. Reviewers: zvi, RKSimon Differential Revision: https://reviews.llvm.org/D34935 llvm-svn: 307066
* [X86] Add comment string for broadcast loads from the constant pool.Craig Topper2017-07-046-639/+1459
| | | | | | | | | | | | | | | | | Summary: When broadcasting from the constant pool its useful to print out the final vector similar to what we do for normal moves from the constant pool. I changed only a couple tests that were broadcast focused. One of them had been previously hand tweaked after running the script so that it could check the constant pool declaration. But I think this patch makes that unnecessary now since we can check the comment instead. Reviewers: spatel, RKSimon, zvi Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34923 llvm-svn: 307062
* [AVR] Fix bug which caused assertion errors for some FRMIDX instructionsDylan McKay2017-07-041-0/+33
| | | | | | | | | | | | | Previously, if a basic block ended with a FRMIDX instruction, we would end up doing something like this. *std::next(MBB.end()) Which would hit an error: "Assertion `!NodePtr->isKnownSentinel()' failed." llvm-svn: 307057
* Revert r307026, "[AMDGPU] Switch scalarize global loads ON by default"NAKAMURA Takumi2017-07-04140-788/+558
| | | | | | | | | It broke a testcase. Failing Tests (1): LLVM :: CodeGen/AMDGPU/alignbit-pat.ll llvm-svn: 307054
* [legalize-types] Clean up softening machinery.Anton Yartsev2017-07-041-0/+55
| | | | | | | | The patch makes SoftenFloatResult/Operand logic just the same as all other legalization routines have: SoftenFloatResult() now fills the SoftenFloats map and SoftenFloatOperand() perform all needed replacements. This prevents softening mashinery from leaving stale entries in SoftenFloats map (that resulted in errors during the legalize type checking) and clarifies softening. The patch replaces https://reviews.llvm.org/D29265. Differential Revision: https://reviews.llvm.org/D31946 llvm-svn: 307053
OpenPOWER on IntegriCloud