path: root/llvm/test/CodeGen
Commit message | Author | Age | Files | Lines
* Add llvm.codeview.annotation to implement MSVC __annotation | Reid Kleckner | 2017-09-05 | 1 | -0/+73
| Summary:
| This intrinsic represents a label with a list of associated metadata strings. It is modelled as reading and writing inaccessible memory so that it won't be removed as dead code. I think the intention is that the annotation strings should appear at most once in the debug info, so I marked it noduplicate. We are allowed to inline code with annotations as long as we strip the annotation, but that can be done later.
|
| Reviewers: majnemer
| Subscribers: eraman, llvm-commits, hiraditya
| Differential Revision: https://reviews.llvm.org/D36904
| llvm-svn: 312569
* [X86] Remove unnecessary (v4f32 (X86vzmovl (v4f32 (scalar_to_vector FR32X)))) patterns | Craig Topper | 2017-09-05 | 1 | -4/+4
| We had already disabled the pattern for SSE4.1 and SSE4.2. But it got re-enabled for AVX and AVX512. With SSE41 we rely on a separate (v4f32 (X86vzmovl VR128)) pattern to select blendps with a xorps to create zeroes. And a separate (v4f32 (scalar_to_vector FR32X)) to select a COPY_TO_REG_CLASS to move FR32 to VR128.
|
| The same thing can happen for AVX with vblendps and those separate patterns already exist.
|
| For AVX512, (v4f32 (X86vzmovl VR128)) will select a VMOVSS instruction instead of VBLENDPS because there is no EVEX VBLENDPS. This is what we were getting out of the larger pattern anyway. So the larger pattern is unneeded for AVX512 too.
|
| For SSE1-SSSE3 we can rely on (v4f32 (X86vzmovl VR128)) selecting a MOVSS similar to AVX512. Again this is what the larger pattern did too.
|
| So the only real change here is that AVX1/2 now properly outputs a VBLENDPS during isel instead of a VMOVSS to match SSE41. Most tests didn't notice because the two-address instruction pass knows how to turn VMOVSS into VBLENDPS to get an independent destination register.
|
| llvm-svn: 312564
* AMDGPU: Fix not accounting for tail call resource usage | Matt Arsenault | 2017-09-05 | 1 | -0/+31
| If the only call in a function is a tail call, the function isn't considered to have a call since it's a type of return.
|
| llvm-svn: 312561
* X86 Tests: Adding missing AVX512 fptoui coverage tests. NFC. | Zvi Rackover | 2017-09-05 | 1 | -0/+231
| Some of the cases show missing patterns that I intend to fix shortly.
|
| llvm-svn: 312560
* [AVX512] Remove patterns for (v8f32 (X86vzmovl (insert_subvector undef, (v4f32 (scalar_to_vector FR32X:)), (iPTR 0)))) and the same for v4f64. | Craig Topper | 2017-09-05 | 1 | -0/+1
| We don't have this same pattern for AVX2 so I don't believe we should have it for AVX512. We also didn't have it for v16f32.
|
| llvm-svn: 312543
* [AMDGPU] Added extra test checks to make D19325 diff clearer | Simon Pilgrim | 2017-09-05 | 1 | -5/+11
| llvm-svn: 312537
* [X86] Limit store merge size when implicitfloat is enabled (PR34421) | Simon Pilgrim | 2017-09-05 | 1 | -0/+40
| As suggested by @niravd: https://bugs.llvm.org/show_bug.cgi?id=34421#c2
|
| Differential Revision: https://reviews.llvm.org/D37464
| llvm-svn: 312534
* [X86] Regenerate scalar rotation tests | Simon Pilgrim | 2017-09-05 | 2 | -69/+207
| llvm-svn: 312530
* [X86][AVX512] Use AVX512 attributes instead of -mcpu in vector shift tests | Simon Pilgrim | 2017-09-05 | 9 | -38/+76
| llvm-svn: 312529
* [X86][AVX512] Use AVX512 attributes instead of -mcpu | Simon Pilgrim | 2017-09-05 | 3 | -8/+18
| llvm-svn: 312528
* [ARM] GlobalISel: Support global variables for RWPI | Diana Picus | 2017-09-05 | 3 | -12/+72
| In RWPI code, globals that are not read-only are accessed relative to the SB register (R9). This is achieved by explicitly generating an ADD instruction between SB and an offset that we either load from a constant pool or movw + movt into a register.
|
| llvm-svn: 312521
* [PowerPC] eliminate redundant compare instruction | Hiroshi Inoue | 2017-09-05 | 1 | -0/+722
| If multiple conditional branches are executed based on the same comparison, we can execute multiple conditional branches based on the result of one comparison on PPC. For example,
|
|   if (a == 0) { ... }
|   else if (a < 0) { ... }
|
| can be executed by one compare and two conditional branches instead of two pairs of a compare and a conditional branch.
|
| This patch identifies a code sequence of two pairs of a compare and a conditional branch and merges the compares if possible. To maximize the opportunity, we canonicalize the code sequence before merging compares. For the above example, the input for this pass looks like:
|
|   cmplwi r3, 0
|   beq 0, .LBB0_3
|   cmpwi r3, -1
|   bgt 0, .LBB0_4
|
| So, before merging two compares, we canonicalize it as
|
|   cmpwi r3, 0 ; cmplwi and cmpwi yield same result for beq
|   beq 0, .LBB0_3
|   cmpwi r3, 0 ; greater than -1 means greater or equal to 0
|   bge 0, .LBB0_4
|
| The generated code should be
|
|   cmpwi r3, 0
|   beq 0, .LBB0_3
|   bge 0, .LBB0_4
|
| Differential Revision: https://reviews.llvm.org/D37211
| llvm-svn: 312514
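The canonicalize-then-merge idea described in this commit message can be illustrated with a small model. This is only a sketch under stated assumptions, not the actual LLVM pass: the tuple representation and the helper names `canonicalize` and `merge_compares` are hypothetical, only the two rewrites mentioned in the message are modelled, and it assumes nothing between the two compare/branch pairs clobbers the compared register or the condition register.

```python
def canonicalize(cmp_op, imm, branch):
    """Rewrite one compare/branch pair into an equivalent canonical form.

    Hypothetical helper covering only the two rewrites from the commit message.
    """
    # Equality does not depend on signedness, so for beq/bne an unsigned
    # compare against a small non-negative immediate can be treated as signed.
    if cmp_op == "cmplwi" and branch in ("beq", "bne") and 0 <= imm < 0x8000:
        cmp_op = "cmpwi"
    # For signed integers, x > imm is the same as x >= imm + 1.
    if cmp_op == "cmpwi" and branch == "bgt":
        imm, branch = imm + 1, "bge"
    return cmp_op, imm, branch


def merge_compares(pairs):
    """Emit assembly text, dropping a compare identical to the previous one.

    `pairs` is a list of ((cmp_op, reg, imm), (branch, target)) tuples.
    """
    out = []
    last_cmp = None
    for (cmp_op, reg, imm), (branch, target) in pairs:
        cmp_op, imm, branch = canonicalize(cmp_op, imm, branch)
        cmp = (cmp_op, reg, imm)
        if cmp != last_cmp:
            # Only emit the compare if it differs from the one already done.
            out.append(f"{cmp_op} {reg}, {imm}")
            last_cmp = cmp
        out.append(f"{branch} 0, {target}")
    return out


example = [
    (("cmplwi", "r3", 0), ("beq", ".LBB0_3")),
    (("cmpwi", "r3", -1), ("bgt", ".LBB0_4")),
]
result = merge_compares(example)
```

For the input sequence quoted in the message, `result` holds the three instructions the message lists as the expected output: one `cmpwi` followed by the `beq` and `bge` branches.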
* [x86] add tests for vector store merge opportunity; NFC | Sanjay Patel | 2017-09-04 | 1 | -0/+139
| llvm-svn: 312504
* [x86] auto-generate complete checks; NFC | Sanjay Patel | 2017-09-04 | 1 | -7/+21
| llvm-svn: 312503
* [x86] add/regenerate complete checks; NFC | Sanjay Patel | 2017-09-04 | 3 | -78/+146
| llvm-svn: 312502
* [x86] add test for unnecessary cmp + masked store; NFC | Sanjay Patel | 2017-09-04 | 1 | -0/+28
| As noted in PR11210:
| https://bugs.llvm.org/show_bug.cgi?id=11210
| ...fixing this should allow us to eliminate x86-specific masked store intrinsics in IR. (Although more testing will be needed to confirm that.)
|
| llvm-svn: 312496
* Revert "Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding"" | Sam McCall | 2017-09-04 | 72 | -302/+319
| This crashes on boringSSL on PPC (will send reduced testcase)
|
| This reverts commit r312328.
|
| llvm-svn: 312490
* [X86][AVX512] Add support for VPERMILPS v16f32 shuffle lowering (PR34382) | Simon Pilgrim | 2017-09-04 | 2 | -42/+31
| Avoid use of VPERMPS where we don't need it by instead using the variable mask version of VPERMILPS for unary shuffles.
|
| llvm-svn: 312486
* Added shuffle test case from PR34382 | Simon Pilgrim | 2017-09-04 | 1 | -0/+11
| llvm-svn: 312485
* Added shuffle test case from PR34369 | Simon Pilgrim | 2017-09-04 | 1 | -0/+37
| llvm-svn: 312481
* [X86] Replace -mcpu option with -mattr in LIT tests added in https://reviews.llvm.org/rL312442 | Ayman Musa | 2017-09-04 | 13 | -952/+953
| llvm-svn: 312474
* [GlobalISel][X86] G_PHI support. | Igor Breger | 2017-09-04 | 4 | -1/+1321
| llvm-svn: 312473
* [XRay][CodeGen] Use PIC-friendly code in XRay sleds and remove synthetic references in .text | Dean Michael Berris | 2017-09-04 | 11 | -85/+73
| Summary:
| This is a re-roll of D36615 which uses PLT relocations in the back-end for the call to __xray_CustomEvent() when building in -fPIC and -fxray-instrument mode.
|
| Reviewers: pcc, djasper, bkramer
| Subscribers: sdardis, javed.absar, llvm-commits
| Differential Revision: https://reviews.llvm.org/D37373
| llvm-svn: 312466
* [X86] Add a combine to recognize when we have two insert subvectors that together write the whole vector, but the starting vector isn't undef. | Craig Topper | 2017-09-04 | 2 | -3/+0
| In this case we should replace the starting vector with undef.
|
| llvm-svn: 312462
* [X86] Add a combine to turn (insert_subvector zero, (insert_subvector zero, X, Idx), Idx) into an insert of X into the larger zero vector. | Craig Topper | 2017-09-03 | 2 | -6/+0
| llvm-svn: 312460
* [X86] Add more patterns to use moves to zero the upper portions of a vector register that I missed in r312450. | Craig Topper | 2017-09-03 | 2 | -30/+15
| llvm-svn: 312459
* [X86] Combine inserting a vector of zeros into a vector of zeros into just the larger vector. | Craig Topper | 2017-09-03 | 1 | -14/+4
| llvm-svn: 312458
* [X86] Add patterns to turn an insert into lower subvector of a zero vector into a move instruction which will implicitly zero the upper elements. | Craig Topper | 2017-09-03 | 8 | -186/+97
| Ideally we'd be able to emit the SUBREG_TO_REG without the explicit register->register move, but we'd need to be sure the producing operation would select something that guaranteed the upper bits were already zeroed.
|
| llvm-svn: 312450
* [X86] Add VBLENDPS/VPBLENDD to the execution domain fixing tables. | Craig Topper | 2017-09-03 | 26 | -276/+205
| llvm-svn: 312449
* [X86] Canonicalize (concat_vectors X, zero) -> (insert_subvector zero, X, 0). | Craig Topper | 2017-09-03 | 6 | -121/+93
| In a future patch, I plan to teach isel to use a small vector move with implicit zeroing of the upper elements when it sees the (insert_subvector zero, X, 0) pattern.
|
| llvm-svn: 312448
* [X86] Add -mtriple option to LIT tests added in https://reviews.llvm.org/rL312442 | Ayman Musa | 2017-09-03 | 13 | -13/+13
| llvm-svn: 312443
* [X86][AVX512] Add simple tests for all AVX512 shuffle instructions. | Ayman Musa | 2017-09-03 | 13 | -0/+26357
| As part of an effort to thoroughly check the behavior of CodeGen with the IR shufflevector instruction, we generated many tests while predicting the best X86 sequence that may be generated.
|
| This is a subset of the generated tests that we think may add value to our X86 set of tests.
|
| Some of the checks are not optimal and will be changed after fixing:
| 1. PR34394
| 2. PR34382
| 3. PR34380
| 4. PR34359
|
| Differential Revision: https://reviews.llvm.org/D37329
| llvm-svn: 312442
* [X86] Add RUN line for LIT test committed in "rL312438: [X86] Fix crash on assert of non-simple type after type-legalization.". | Ayman Musa | 2017-09-03 | 1 | -1/+3
| llvm-svn: 312439
* [X86] Fix crash on assert of non-simple type after type-legalization | Ayman Musa | 2017-09-03 | 1 | -0/+22
| The function combineShuffleToVectorExtend in DAGCombine might generate an illegally typed node after the "legalize types" phase, causing an assertion on non-simple types to fail afterwards.
| Add a type check for the case where the combine runs after the type legalization pass.
|
| Differential Revision: https://reviews.llvm.org/D37330
| llvm-svn: 312438
* [X86] Teach fastisel to handle zext/sext i8->i16 and sext i1->i8/i16/i32/i64 | Craig Topper | 2017-09-02 | 4 | -48/+464
| Summary:
| ZExt and SExt from i8 to i16 aren't implemented in the autogenerated fast isel table because normal isel does a zext/sext to 32-bits and a subreg extract to avoid a partial register write or false dependency on the upper bits of the destination. This means without handling in fast isel we end up triggering a fast isel abort.
|
| We had no custom sign extend handling at all so while I was there I went ahead and implemented sext i1->i8/i16/i32/i64 which was also missing. This generates an i1->i8 sign extend using a mask with 1, then an 8-bit negate, then continues with a sext from i8. A better sequence would be a wider and/negate, but would require more custom code.
|
| Fast isel tests are a mess and I couldn't find a good home for the tests so I created a new one.
|
| The test pr34381.ll had to have fast-isel removed because it was relying on a fast isel abort to hit the bug. The test case still seems valid with fast-isel disabled though some of the instructions changed.
|
| Reviewers: spatel, zvi, igorb, guyblank, RKSimon
| Reviewed By: guyblank
| Subscribers: llvm-commits
| Differential Revision: https://reviews.llvm.org/D37320
| llvm-svn: 312422
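The i1 sign-extension sequence described in this message (mask with 1, 8-bit negate, then a normal sext from i8) can be checked with plain integer arithmetic. The sketch below only models the arithmetic on Python integers; the function names are hypothetical and it says nothing about the instructions fast isel actually emits.

```python
def sext_i1_to_i8(x: int) -> int:
    """Model of the sequence: AND with 1, then an 8-bit negate."""
    masked = x & 1              # keep only the i1 payload bit
    return (-masked) & 0xFF     # 8-bit two's complement negate: 0 -> 0x00, 1 -> 0xFF


def sext_i8_to(bits: int, value: int) -> int:
    """Sign-extend an 8-bit pattern to `bits` bits, returned as an unsigned pattern."""
    signed = value - 0x100 if value & 0x80 else value
    return signed & ((1 << bits) - 1)


def sext_i1_to(bits: int, x: int) -> int:
    """i1 -> i8 via mask and negate, then continue with a sext from i8."""
    return sext_i8_to(bits, sext_i1_to_i8(x))
```

A true i1 becomes an all-ones pattern at every width and a false i1 becomes zero, which is the defining property of sign-extending a one-bit value.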
* [AMDGPU] Testcase for computeKnownBits recursion. NFC. | Stanislav Mekhanoshin | 2017-09-01 | 1 | -0/+69
| Testcase for rL312364: [AMDGPU] Prevent infinite recursion in DAG.computeKnownBits()
|
| llvm-svn: 312388
* [MIParser] Ensure getHexUint doesn't produce APInts with a bitwidth of 0 | Jessica Paquette | 2017-09-01 | 1 | -0/+39
| If getHexUint reads in a hex 0, it will create an APInt with a value of 0. The number of active bits on this APInt is used to calculate the bitwidth of Result. The number of active bits is defined as an APInt's bitwidth - its number of leading 0s. Since this APInt is 0, its bitwidth and number of leading 0s are equal. Thus, Result is constructed with a bitwidth of 0, triggering an APInt assert.
|
| This commit fixes that by checking if the APInt is equal to 0, and setting the bitwidth to 32 if it is. Otherwise, it sets the bitwidth using getActiveBits.
|
| This caused issues when compiling MIR files with successor probabilities. In the case that a successor is tagged with a probability of 0, this assert would fire on debug builds.
|
| https://reviews.llvm.org/D37401
| llvm-svn: 312387
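The bitwidth rule described in this message is easy to model outside LLVM. The following is a rough sketch using Python integers rather than APInt; `active_bits` and `hex_uint_bitwidth` are hypothetical names, and `int.bit_length()` stands in for "bitwidth minus leading zeros".

```python
def active_bits(value: int) -> int:
    """Bits needed to represent a non-negative value; 0 for the value 0."""
    return value.bit_length()


def hex_uint_bitwidth(literal: str) -> int:
    """Bitwidth chosen for a parsed hex literal, following the rule in the message."""
    value = int(literal, 16)    # accepts an optional 0x prefix
    if value == 0:
        # A zero value has no active bits, so fall back to 32
        # instead of constructing a zero-width integer.
        return 32
    return active_bits(value)
```

Without the zero check, a literal such as `0x0` would yield a width of 0, which is the case the message says triggered the assert.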
* [x86] eliminate redundant shuffle of horizontal math ops when both inputs are the same | Sanjay Patel | 2017-09-01 | 1 | -16/+0
| This is limited to a set of patterns based on the example in PR34111:
| https://bugs.llvm.org/show_bug.cgi?id=34111
| ...but as I was investigating this, I see that horizontal patterns can go wrong in many, many other ways that would not be handled by this patch. Each data type may even be handled differently in the DAG after starting with the same basic IR pattern, so even proper IR canonicalization won't fix it all.
|
| Differential Revision: https://reviews.llvm.org/D37357
| llvm-svn: 312379
* LiveIntervalAnalysis: Fix alias regunit reserved definition | Matthias Braun | 2017-09-01 | 1 | -0/+49
| A register in CodeGen can be marked as reserved: In that case we consider the register always live and do not use (or rather ignore) kill/dead/undef operand flags.
|
| LiveIntervalAnalysis however tracks liveness per register unit (not per register). We already needed adjustments for this in r292871 to deal with super/sub registers. However I did not look at aliased registers there. Looking at ARM:
|
| FPSCR (regunits FPSCR, FPSCR~FPSCR_NZCV) aliases with FPSCR_NZCV (regunits FPSCR_NZCV, FPSCR~FPSCR_NZCV) hence they share a register unit (FPSCR~FPSCR_NZCV) that represents the aliased parts of the registers. This shared register unit was previously considered non-reserved, however given that uses of the reserved FPSCR potentially violate some rules (like uses without defs) we should make FPSCR~FPSCR_NZCV reserved too and stop tracking liveness for it.
|
| This patch:
| - Defines a register unit as reserved when: At least for one root register, the root register and all its super registers are reserved.
| - Adjusts LiveIntervals::computeRegUnitRange() for the new reserved definition.
| - Adds MachineRegisterInfo::isReservedRegUnit() to have a canonical way of testing.
| - Stops computing LiveRanges for reserved register units in HMEditor even with UpdateFlags enabled.
| - Skips verification of uses of reserved reg units in the machine verifier (this usually didn't happen because there would be no cached liverange but there is no guarantee for that and I would run into this case before the HMEditor tweak, so may as well fix the verifier too).
|
| Note that this should only affect ARM's FPSCR/FPSCR_NZCV registers today; aliased registers are rarely used, the only other cases are Hexagon's P0-P3/P3_0 and C8/USR pairs which are not mixing reserved/non-reserved registers in an alias.
|
| Differential Revision: https://reviews.llvm.org/D37356
| llvm-svn: 312348
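The new reserved-unit definition from this message ("at least for one root register, the root register and all its super registers are reserved") can be written as a small predicate. This is a sketch over plain Python sets and dicts, not the MachineRegisterInfo API; the function name, the data layout, and the register tables below are illustrative assumptions (in particular, it assumes FPSCR is reserved, FPSCR_NZCV is not, and neither has super registers).

```python
def is_reserved_reg_unit(roots, super_regs, reserved):
    """True if some root register and all of its super registers are reserved.

    roots:      root registers of the register unit
    super_regs: mapping from a register to its super registers
    reserved:   set of reserved registers
    """
    return any(
        root in reserved
        and all(sup in reserved for sup in super_regs.get(root, ()))
        for root in roots
    )


# Shared unit FPSCR~FPSCR_NZCV from the message, with assumed reservation state.
shared_unit_roots = ["FPSCR", "FPSCR_NZCV"]
arm_reserved = {"FPSCR"}
shared_is_reserved = is_reserved_reg_unit(shared_unit_roots, {}, arm_reserved)

# Made-up counter-example: a reserved root whose super register is not reserved.
made_up_supers = {"LO": ["FULL"]}
partial_is_reserved = is_reserved_reg_unit(["LO"], made_up_supers, {"LO"})
```

Under these assumptions the shared FPSCR~FPSCR_NZCV unit counts as reserved because one of its roots (FPSCR) satisfies the condition, while the made-up `LO` unit does not because its super register `FULL` is not reserved.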
* AMDGPU: IMPLICIT_DEFs and DBG_VALUEs do not contribute to wait states | Nicolai Haehnle | 2017-09-01 | 1 | -0/+31
| Summary:
| This fixes a bug that was exposed on gfx9 in various GL45-CTS.shaders.loops.*_iterations.select_iteration_count_fragment tests, e.g. GL45-CTS.shaders.loops.do_while_uniform_iterations.select_iteration_count_fragment
|
| Reviewers: arsenm
| Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits
| Differential Revision: https://reviews.llvm.org/D36193
| llvm-svn: 312337
* [X86] Add test case I forgot to commit with r312285. | Craig Topper | 2017-09-01 | 1 | -0/+49
| llvm-svn: 312335
* [LoopVectorizer] Use two step casting for float to pointer types. | Manoj Gupta | 2017-09-01 | 2 | -0/+132
| Summary:
| LoopVectorizer is creating casts between vec<ptr> and vec<float> types on ARM when compiling OpenCV. Since it is illegal to directly cast a floating point type to a pointer type even if the types have the same size, this causes a crash. Fix the crash with a two-step cast that goes through an integer type: first to integer, then from integer to pointer/float.
|
| Fixes PR33804.
|
| Reviewers: mkuper, Ayal, dlj, rengolin, srhines
| Reviewed By: rengolin
| Subscribers: aemerson, kristof.beyls, mkazantsev, Meinersbur, rengolin, mzolotukhin, llvm-commits
| Differential Revision: https://reviews.llvm.org/D35498
| llvm-svn: 312331
* Re-enable "[MachineCopyPropagation] Extend pass to do COPY source forwarding" | Geoff Berry | 2017-09-01 | 72 | -319/+302
| Issues addressed since original review:
| - Moved removal of dead instructions found by LiveIntervals::shrinkToUses() outside of loop iterating over instructions to avoid instructions being deleted while pointed to by iterator.
| - Fixed ARMLoadStoreOptimizer bug exposed by this change in r311907.
| - The pass no longer forwards COPYs to physical register uses, since doing so can break code that implicitly relies on the physical register number of the use.
| - The pass no longer forwards COPYs to undef uses, since doing so can break the machine verifier by creating LiveRanges that don't end on a use (since the undef operand is not considered a use).
|
| [MachineCopyPropagation] Extend pass to do COPY source forwarding
|
| This change extends MachineCopyPropagation to do COPY source forwarding.
|
| This change also extends the MachineCopyPropagation pass to be able to be run during register allocation, after physical registers have been assigned, but before the virtual registers have been re-written, which allows it to remove virtual register COPY LiveIntervals that become dead through the forwarding of all of their uses.
|
| llvm-svn: 312328
* [ARM] GlobalISel: Support ROPI global variables | Diana Picus | 2017-09-01 | 3 | -3/+198
| In the ROPI relocation model, read-only variables are accessed relative to the PC. We use the (MOV|LDRLIT)_ga_pcrel pseudoinstructions for this.
|
| llvm-svn: 312323
* [ARM] GlobalISel: More tests. NFC. | Diana Picus | 2017-09-01 | 2 | -2/+116
| Test constants as well in the PIC tests. These are also represented as G_GLOBAL_VALUE, and although they are treated just like other globals for PIC, they won't be for ROPI, so it's good to have this coverage.
|
| llvm-svn: 312319
* [X86] Add isel patterns for memory forms of FMA3 intrinsic instructions | Craig Topper | 2017-09-01 | 1 | -48/+48
| llvm-svn: 312309
* AMDGPU: Fold clamp modifier for packed instructions | Matt Arsenault | 2017-08-31 | 1 | -15/+187
| llvm-svn: 312297
* [WebAssembly] Refactor load ISel tablegen patterns into classes | Derek Schuff | 2017-08-31 | 1 | -5/+2
| Not all of these will be able to be used by atomics because tablegen, but it still seems like a good change by itself.
|
| Differential Revision: https://reviews.llvm.org/D37345
| llvm-svn: 312287
* [MachineOutliner] Recommit r312194, missed optimization remarks | Jessica Paquette | 2017-08-31 | 1 | -0/+73
| Before, this commit caused a buildbot failure:
| http://bb.pgr.jp/builders/test-llvm-i686-linux-RA/builds/6026/steps/test_llvm/logs/LLVM%20%3A%3A%20CodeGen__AArch64__machine-outliner-remarks.ll
|
| This was caused by the Key value in DiagnosticInfoOptimizationBase being deallocated before emitting the remarks defined in MachineOutliner.cpp. As of r312277 this should no longer be an issue.
|
| llvm-svn: 312280
* [x86] add more tests for horizontal ops; NFC | Sanjay Patel | 2017-08-31 | 2 | -19/+159
| llvm-svn: 312279