summaryrefslogtreecommitdiffstats
path: root/llvm/test
Commit message (Collapse)AuthorAgeFilesLines
...
* Re-enable "[IndVars] Canonicalize comparisons between non-negative values ↵Max Kazantsev2017-07-084-6/+104
| | | | | | | | | | | | | | and indvars" The patch was reverted due to a bug. The bug was that if the IV is the 2nd operand of the icmp instruction, then the "Pred" variable gets swapped and differs from the instruction's predicate. In this patch we use the original predicate to do the transformation. Also added a test case that exercises this situation. Differentian Revision: https://reviews.llvm.org/D35107 llvm-svn: 307477
* [LoopVectorize] partly revert r307475Sanjay Patel2017-07-081-194/+19
| | | | | | Bots are failing because of the additional checks. llvm-svn: 307476
* [LoopVectorize] auto-generate complete checks; NFCSanjay Patel2017-07-082-29/+279
| | | | | | | | | I'm looking at a cmp transform in InstCombine that would affect these tests, but it's hard to know if it makes things better or worse without seeing the full IR. OTOH, maybe these tests shouldn't be running a bunch of transform passes in the first place? llvm-svn: 307475
* [x86] add SBB optimization for SETBE (ule) condition codeSanjay Patel2017-07-081-4/+2
| | | | | | | | | | | | | | | | | | | | | | | x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess with missing optimizations. We handle some patterns, but miss logical variants. To clean that up, we should convert all select-of-constants to logic/math and enhance the combining for the expected patterns from that. Selecting 0 or -1 needs extra attention to produce the optimal code as shown here. Attempt to verify that all of these IR forms are logically equivalent: http://rise4fun.com/Alive/plxs Earlier steps in this series: rL306040 rL306072 rL307404 (D34652) As acknowledged in the earlier review, there's a possibility that some Intel uarch would prefer to produce an xor to clear the fake register operand with sbb %eax, %eax. This will likely need to be addressed in a separate pass. llvm-svn: 307471
* Increase the import-threshold for crtical functions.Dehao Chen2017-07-072-2/+2
| | | | | | | | | | | | | | Summary: For interative sample-pgo, if a hot call site is inlined in the profiling binary, we should inline it in before profile annotation in the backend. Before that, the compile phase first collects all GUIDs that needs to be imported and creates virtual "hot" call edge in the summary. However, "hot" is not good enough to guarantee the callsites get inlined. This patch introduces "critical" call edge, and assign much higher importing threshold for those edges. Reviewers: tejohnson Reviewed By: tejohnson Subscribers: sanjoy, mehdi_amini, llvm-commits, eraman Differential Revision: https://reviews.llvm.org/D35096 llvm-svn: 307439
* [LoopUnrollRuntime] Support multiple exit blocks unrolling when prolog ↵Anna Thomas2017-07-071-79/+165
| | | | | | | | | | | | | | | remainder generated With the NFC refactoring in rL307417 (git SHA 987dd01), all the logic is in place to support multiple exit/exiting blocks when prolog remainder is generated. This patch removed the assert that multiple exit blocks unrolling is only supported when epilog remainder is generated. Also, added test runs and checks with PROLOG prefix in runtime-loop-multiple-exits.ll test cases. llvm-svn: 307435
* [RegAllocFast] Don't insert kill flags of super-register for partial killQuentin Colombet2017-07-071-0/+34
| | | | | | | | | | | | | | | | | When reusing a register for a new definition, the fast register allocator used to insert a kill flag at the previous last use of that register to inform later passes that this register is free between the redef and the last use. However, this may be wrong when subregisters are involved. Indeed, a partially redef would have trigger a kill of the full super register, potentially wrongly marking all the other subregisters as free. Given we don't track which lanes are still live, we cannot set the kill flag in such case. Note: This bug has been latent for about 7 years (r104056). llvmg.org/PR33677 llvm-svn: 307428
* [RegAllocFast] Add the proper initialize method to use the .mir infrastructureQuentin Colombet2017-07-071-0/+2
| | | | | | NFC llvm-svn: 307427
* [PPC CodeGen] Expand the bitreverse.i32 intrinsic.Tony Jiang2017-07-073-23/+109
| | | | | | | Differential Revision: https://reviews.llvm.org/D33572 Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093 llvm-svn: 307413
* [ARM] Implement interleaved access bug fix from r306334Matthew Simpson2017-07-071-0/+29
| | | | | | | r306334 fixed a bug in AArch64 dealing with wide interleaved accesses having pointer types. The bug also exists in ARM, so this patch copies over the fix. llvm-svn: 307409
* [x86] add SBB optimization for SETAE (uge) condition codeSanjay Patel2017-07-071-6/+2
| | | | | | | | | | | | | | | | | | | | | | x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess with missing optimizations. We handle some patterns, but miss logical variants. To clean that up, we should convert all select-of-constants to logic/math and enhance the combining for the expected patterns from that. DAGCombiner already has the foundation to allow the transforms, so we just need to fill in the holes for x86 math op lowering. Selecting 0 or -1 needs extra attention to produce the optimal code as shown here. Attempt to verify that all of these IR forms are logically equivalent: http://rise4fun.com/Alive/plxs Earlier steps in this series: rL306040 rL306072 Differential Revision: https://reviews.llvm.org/D34652 llvm-svn: 307404
* [AMDGPU][mc][gfx9] Added support of op_sel/op_sel_hi for V_MAD_MIX*Dmitry Preobrazhensky2017-07-073-56/+507
| | | | | | | | | | See https://bugs.llvm.org//show_bug.cgi?id=33595 Reviewers: vpykhtin, artem.tamazov, arsenm Differential Revision: https://reviews.llvm.org/D35021 llvm-svn: 307402
* [ValueTracking] Fix the identity case (LHS => RHS) when the LHS is false.Chad Rosier2017-07-071-0/+36
| | | | | | | | | Prior to this commit both of the added test cases were passing. However, in the latter case (test7) we were doing a lot more work to arrive at the same answer (i.e., we were using isImpliedCondMatchingOperands() to determine the implication.). llvm-svn: 307400
* NFC: I simply added CHECK-LABEL to prevent false matches in the tests.Andrew V. Tischenko2017-07-072-0/+2
| | | | llvm-svn: 307397
* [SafepointIRVerifier] Avoid false positives in GC verifier for compare ↵Anna Thomas2017-07-071-0/+85
| | | | | | | | | | | | | | | | | | | between pointers Today the safepoint IR verifier catches some unrelocated uses of base pointers that are actually valid. With this change, we narrow down the set of false positives. Specifically, the verifier knows about compares to null and compares between 2 unrelocated pointers. Reviewed by: skatkov Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D35057 llvm-svn: 307392
* [AArch64] Use 16 bytes as preferred function alignment on Cortex-A57.Florian Hahn2017-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | Summary: This change gives a 0.89% speed on execution time, a 0.94% improvement in benchmark scores and a 0.62% increase in binary size on a Cortex-A57. These numbers are the geomean results on a wide range of benchmarks from the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites. The software optimization guide for the Cortex-A57 recommends 16 byte branch alignment. Reviewers: t.p.northover, mcrosier, javed.absar, kristof.beyls, sbaranga Reviewed By: kristof.beyls Subscribers: aemerson, rengolin, llvm-commits Differential Revision: https://reviews.llvm.org/D34954 llvm-svn: 307389
* [AArch64] Use 16 bytes as preferred function alignment on Cortex-A72.Florian Hahn2017-07-071-1/+1
| | | | | | | | | | | | | | | | | | | | | | Summary: This change gives a 0.34% speed on execution time, a 0.61% improvement in benchmark scores and a 0.57% increase in binary size on a Cortex-A72. These numbers are the geomean results on a wide range of benchmarks from the test-suite, SPEC2000, SPEC2006 and a range of proprietary suites. The software optimization guide for the Cortex-A72 recommends 16 byte branch alignment. Reviewers: t.p.northover, kristof.beyls, rengolin, sbaranga, mcrosier, javed.absar Reviewed By: kristof.beyls Subscribers: llvm-commits, aemerson Differential Revision: https://reviews.llvm.org/D34961 llvm-svn: 307380
* [AArch64] Add test case for preferred function alignment (NFC). Florian Hahn2017-07-071-0/+26
| | | | | | | | | | | | Reviewers: evandro, joelkevinjones, mcrosier Reviewed By: joelkevinjones, mcrosier Subscribers: mcrosier, aemerson, llvm-commits, rengolin, evandro, javed.absar, joelkevinjones, kristof.beyls Differential Revision: https://reviews.llvm.org/D34951 llvm-svn: 307369
* [ARM] GlobalISel: Select hard G_FCMP for s32Diana Picus2017-07-072-0/+637
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We lower to a sequence consisting of: - MOVi 0 into a register - VCMPS to do the actual comparison and set the VFP flags - FMSTAT to move the flags out of the VFP unit - MOVCCi to either use the "zero register" that we have previously set with the MOVi, or move 1 into the result register, based on the values of the flags As was the case with soft-float, for some predicates (one, ueq) we actually need two comparisons instead of just one. When that happens, we generate two VCMPS-FMSTAT-MOVCCi sequences and chain them by means of using the result of the first MOVCCi as the "zero register" for the second one. This is a bit overkill, since one comparison followed by two non-flag-setting conditional moves should be enough. In any case, the backend manages to CSE one of the comparisons away so it doesn't matter much. Note that unlike SelectionDAG and FastISel, we always use VCMPS, and not VCMPES. This makes the code a lot simpler, and it also seems correct since the LLVM Lang Ref defines simple true/false returns if the operands are QNaN's. For SNaN's, even VCMPS throws an Invalid Operand exception, so they won't be slipping through unnoticed. Implementation-wise, this introduces a template so we can share the same code that we use for handling integer comparisons, since the only differences are in the details (exact opcodes to be used etc). Hopefully this will be easy to extend to s64 G_FCMP. llvm-svn: 307365
* [TableGen] Add a proper namespace to an Instruction in an AsmMatcher test. ↵Craig Topper2017-07-071-0/+1
| | | | | | This is required after r307358. llvm-svn: 307361
* [PDB] Teach libpdb to write DBI Stream ECNames.Zachary Turner2017-07-071-0/+50
| | | | | | | | | | | | | | | | | | | | | | | Based strictly on the name, this seems to have something to do width edit & continue. The goal of this patch has nothing to do with supporting edit and continue though. msvc link.exe writes very basic information into this area even when *not* compiling with support for E&C, and so the goal here is to bring lld-link to parity. Since we cannot know what assumptions standard tools make about the content of PDB files, we need to be as close as possible. This ECNames data structure is a standard PDB string hash table. link.exe puts a single string into this hash table, which is the full path to the PDB file on disk. It then references this string from the module descriptor for the compiler generated `* Linker *` module. With this patch, lld-link will generate the exact same sequence of bytes as MSVC link for this subsection for a given object file input (as reported by `llvm-pdbutil bytes -ec`). llvm-svn: 307356
* RegisterScavenging: Fix PR33687Matthias Braun2017-07-071-0/+66
| | | | | | | | | When scavenging for a use in instruction MI, we will reload after that instruction and hence cannot spill uses/defs of this instruction. This fixes http://llvm.org/PR33687 llvm-svn: 307352
* [InferAddressSpaces] Fix assertion about null pointerYaxun Liu2017-07-071-0/+12
| | | | | | | | | | | | | InferAddressSpaces does not check address space in collectFlatAddressExpressions, which causes values with non flat address space put into Postorder and causes assertion in cloneValueWithNewAddressSpace. This patch fixes assertion in OpenCL 2.0 conformance test generic_address_space subtest for amdgcn target. Differential Revision: https://reviews.llvm.org/D34991 llvm-svn: 307349
* [WebAssembly] Support weak defined symbolsSam Clegg2017-07-071-5/+27
| | | | | | | | | | | | Model weakly defined symbols as symbols that are both exports and imported and marked as weak. Local references to the symbols refer to the import but the linker can resolve this to the weak export if not strong symbol is found at link time. Differential Revision: https://reviews.llvm.org/D35029 llvm-svn: 307348
* Extend memcpy expansion in Transform/Utils to handle wider operand types.Sean Fertile2017-07-072-0/+73
| | | | | | | | | | | Adds loop expansions for known-size and unknown-sized memcpy calls, allowing the target to provide the operand types through TTI callbacks. The default values for the TTI callbacks use int8 operand types and matches the existing behaviour if they aren't overridden by the target. Differential revision: https://reviews.llvm.org/D32536 llvm-svn: 307346
* Revert r307342, r307343.Evgeniy Stepanov2017-07-071-48/+0
| | | | | | | | | | Revert "Copy arguments passed by value into explicit allocas for ASan." Revert "[asan] Add end-to-end tests for overflows of byval arguments." Build failure on lldb-x86_64-ubuntu-14.04-buildserver. Test failure on clang-cmake-aarch64-42vma and sanitizer-x86_64-linux-android. llvm-svn: 307345
* Copy arguments passed by value into explicit allocas for ASan.Evgeniy Stepanov2017-07-071-0/+48
| | | | | | | | | | | | | | ASan determines the stack layout from alloca instructions. Since arguments marked as "byval" do not have an explicit alloca instruction, ASan does not produce red zones for them. This commit produces an explicit alloca instruction and copies the byval argument into the allocated memory so that red zones are produced. Patch by Matt Morehouse. Differential revision: https://reviews.llvm.org/D34789 llvm-svn: 307342
* [ConstHoisting] Turn on consthoist-with-block-frequency by default.Wei Mi2017-07-071-4/+1
| | | | | | | | | | | Using profile information to guide consthoisting is generally helpful for performance, so the patch turns it on by default. No compile time or perf regression were found using spec2000 and spec2006 on x86. Some significant improvement (>20%) was seen on internal benchmarks. Differential Revision: https://reviews.llvm.org/D35063 llvm-svn: 307338
* Reverting r307326 because it breaks clang tests.Michael Kuperstein2017-07-063-93/+0
| | | | llvm-svn: 307334
* [ConstHoisting] choose to hoist when frequency is the same.Wei Mi2017-07-061-4/+48
| | | | | | | | | | | | The patch is to adjust the strategy of frequency based consthoisting: Previously when the candidate block has the same frequency with the existing blocks containing a const, it will not hoist the const to the candidate block. For that case, now we change the strategy to hoist the const if only existing blocks have more than one block member. This is helpful for reducing code size. Differential Revision: https://reviews.llvm.org/D35084 llvm-svn: 307328
* [NVPTX] Add lowering of i128 params.Michael Kuperstein2017-07-063-0/+93
| | | | | | | | | | | | | | | | | The patch adds support of i128 params lowering. The changes are quite trivial to support i128 as a "special case" of integer type. With this patch, we lower i128 params the same way as aggregates of size 16 bytes: .param .b8 _ [16]. Currently, NVPTX can't deal with the 128 bit integers: * in some cases because of failed assertions like ValVTs.size() == OutVals.size() && "Bad return value decomposition" * in other cases emitting PTX with .i128 or .u128 types (which are not valid [1]) [1] http://docs.nvidia.com/cuda/parallel-thread-execution/index.html#fundamental-types Differential Revision: https://reviews.llvm.org/D34555 Patch by: Denys Zariaiev (denys.zariaiev@gmail.com) llvm-svn: 307326
* Change remaining references to lit.util.capture to use subprocess.check_output.David L. Jones2017-07-062-4/+6
| | | | | | | | | | | | | | Summary: The capture() function was removed in r306625. This should fix PGO breakages reported by Michael Zolotukhin. Reviewers: mzolotukhin Subscribers: sanjoy, llvm-commits Differential Revision: https://reviews.llvm.org/D35088 llvm-svn: 307320
* Use @LINE in two more tests.Rafael Espindola2017-07-062-4/+2
| | | | llvm-svn: 307318
* AMDGPU: Add macro fusion schedule DAG mutationMatt Arsenault2017-07-0614-305/+422
| | | | | | Try to increase opportunities to shrink vcc uses. llvm-svn: 307313
* AMDGPU: Remove unnecessary IR from MIR testsMatt Arsenault2017-07-067-358/+113
| | | | llvm-svn: 307311
* [AMDGPU] Always use rcp + mul with fast mathStanislav Mekhanoshin2017-07-062-43/+32
| | | | | | | | | | | | | | | | | | | | | | | | | Regardless of relaxation options such as -cl-fast-relaxed-math we are producing rather long code for fdiv via amdgcn_fdiv_fast intrinsic. This intrinsic is used to replace fdiv with 2.5ulp metadata and does not handle denormals, thus believed to be fast. An fdiv instruction can also have fast math flag either by itself or together with fpmath metadata. Clang used with a relaxation flag always produces both metadata and fast flag: %div = fdiv fast float %v, %0, !fpmath !12 !12 = !{float 2.500000e+00} Current implementation ignores fast flag and favors metadata. An instruction with just fast flag would be lowered to a fastest rcp + mul, but that never happen on practice because of described mutual clang and BE behavior. This change allows an "fdiv fast" to be always lowered as rcp + mul. Differential Revision: https://reviews.llvm.org/D34844 llvm-svn: 307308
* [ValueTracking] Support icmps fed by 'and' and 'or'.Chad Rosier2017-07-062-0/+224
| | | | | | | | | | This patch adds support for handling some forms of ands and ors in ValueTracking's isImpliedCondition API. PR33611 https://reviews.llvm.org/D34901 llvm-svn: 307304
* [LTO] Fix the interaction between linker redefined symbols and ThinLTODavide Italiano2017-07-061-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | This is the same as r304719 but for ThinLTO. The substantial difference is that in this case we don't have whole visibility, just the summary. In the LTO case, when we got the resolution for the input file we could just see if the linker told us whether a symbol was linker redefined (using --wrap or --defsym) and switch the linkage directly for the GV. Here, we have the summary. So, we record that the linkage changed from <whatever it was> to $weakany to prevent IPOs across this symbol boundaries and actually just switch the linkage at FunctionImport time. This patch should also fixes the lld bits (as all the scaffolding for communicating if a symbol is linker redefined should be there & should be the same), but I'll make sure to add some tests there as well. Fixes PR33192. Differential Revision: https://reviews.llvm.org/D35064 llvm-svn: 307303
* [X86][SSE] Tests for bitcasting iX integers to vXi1 boolean vectorsSimon Pilgrim2017-07-063-0/+7447
| | | | | | Including sign/zero extension to legal types llvm-svn: 307301
* Add @LINE to checks in a test.Rafael Espindola2017-07-061-22/+29
| | | | | | This makes it a lot easier to see which error failed a check. llvm-svn: 307300
* Modify constraints in `llvm::canReplaceOperandWithVariable`Leo Li2017-07-061-0/+24
| | | | | | | | | | | | | | | | | Summary: `Instruction::Switch`: only first operand can be set to a non-constant value. `Instruction::InsertValue` both the first and the second operand can be set to a non-constant value. `Instruction::Alloca` return true for non-static allocation. Reviewers: efriedma Reviewed By: efriedma Subscribers: srhines, pirama, llvm-commits Differential Revision: https://reviews.llvm.org/D34905 llvm-svn: 307294
* [LoopUnrollRuntime] Bailout when multiple exiting blocks to the unique latch ↵Anna Thomas2017-07-061-0/+30
| | | | | | | | | | | | | | exit block Currently, we do not support multiple exiting blocks to the latch exit block. However, this bailout wasn't triggered when we had a unique exit block (which is the latch exit), with multiple exiting blocks to that unique exit. Moved the bailout so that it's triggered in both cases and added testcase. llvm-svn: 307291
* [X86][SSE] Dropped -mcpu from bitcast+setcc testsSimon Pilgrim2017-07-062-100/+475
| | | | | | | | Use triple and attribute only for consistency Added SSE2/AVX tests on 256-bit vectors to test PACKSS behaviour llvm-svn: 307289
* Bitcode: Include any strings added to the string table in the module hash.Peter Collingbourne2017-07-064-4/+39
| | | | | | Differential Revision: https://reviews.llvm.org/D35037 llvm-svn: 307286
* [InstCombine] Add single use checks to SimplifyBSwap to ensure we are really ↵Craig Topper2017-07-061-6/+4
| | | | | | | | | | | | saving instructions Bswap isn't a simple operation so we need to make sure we are really removing a call to it before doing these simplifications. For the case when both LHS and RHS are bswaps I've allowed it to be moved if either LHS or RHS has a single use since that at least allows us to move it later where it might find another bswap to combine with and it decreases the use count on the other side so maybe the other user can be optimized. Differential Revision: https://reviews.llvm.org/D34974 llvm-svn: 307273
* [LSR] Narrow search space by filtering non-optimal formulae with the same ↵Wei Mi2017-07-063-3/+63
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | ScaledReg and Scale. When the formulae search space is huge, LSR uses a series of heuristic to keep pruning the search space until the number of possible solutions are within certain limit. The big hammer of the series of heuristics is NarrowSearchSpaceByPickingWinnerRegs, which picks the register which is used by the most LSRUses and deletes the other formulae which don't use the register. This is a effective way to prune the search space, but quite often not a good way to keep the best solution. We saw cases before that the heuristic pruned the best formula candidate out of search space. To relieve the problem, we introduce a new heuristic called NarrowSearchSpaceByFilterFormulaWithSameScaledReg. The basic idea is in order to reduce the search space while keeping the best formula, we want to keep as many formulae with different Scale and ScaledReg as possible. That is because the central idea of LSR is to choose a group of loop induction variables and use those induction variables to represent LSRUses. An induction variable candidate is often represented by the Scale and ScaledReg in a formula. If we have more formulae with different ScaledReg and Scale to choose, we have better opportunity to find the best solution. That is why we believe pruning search space by only keeping the best formula with the same Scale and ScaledReg should be more effective than PickingWinnerReg. And we use two criteria to choose the best formula with the same Scale and ScaledReg. The first criteria is to select the formula using less non shared registers, and the second criteria is to select the formula with less cost got from RateFormula. The patch implements the heuristic before NarrowSearchSpaceByPickingWinnerRegs, which is the last resort. Testing shows we get 1.8% and 2% on two internal benchmarks on x86. llvm nightly testsuite performance is neutral. We also tried lsr-exp-narrow and it didn't help on the two improved internal cases we saw. Differential Revision: https://reviews.llvm.org/D34583 llvm-svn: 307269
* [X86][SSE4A] Add support for shuffle combining to INSERTQI.Simon Pilgrim2017-07-061-13/+4
| | | | llvm-svn: 307268
* [CGP, x86] update test checks; NFCSanjay Patel2017-07-061-38/+39
| | | | | | | | This was auto-generated using an older version of the script, and that version does not work with phis, so if we enable expansion it will go bad. llvm-svn: 307267
* [X86][SSE4A] Add test showing missed opportunities to combine INSERTQI shuffleSimon Pilgrim2017-07-061-0/+20
| | | | llvm-svn: 307265
* [x86] fix over-specified triple and auto-generate checks; NFCSanjay Patel2017-07-061-17/+25
| | | | llvm-svn: 307262
OpenPOWER on IntegriCloud