summaryrefslogtreecommitdiffstats
path: root/llvm/test
Commit message (Collapse)AuthorAgeFilesLines
* Revert "Add support for generating a call graph profile from Branch ↵Benjamin Kramer2018-06-285-123/+0
| | | | | | | | Frequency Info." This reverts commits r335794 and r335797. Breaks ThinLTO+FDO selfhost. llvm-svn: 335851
* [ARM] Parallel DSP PassSjoerd Meijer2018-06-2813-0/+677
| | | | | | | | | | | | | | | | | Armv6 introduced instructions to perform 32-bit SIMD operations. The purpose of this pass is to do some straightforward IR pattern matching to create ACLE DSP intrinsics, which map on these 32-bit SIMD operations. Currently, only the SMLAD instruction gets recognised. This instruction performs two multiplications with 16-bit operands, and stores the result in an accumulator. We will follow this up with patches to recognise SMLAD in more cases, and also to generate other DSP instructions (like e.g. SADD16). Patch by: Sam Parker and Sjoerd Meijer Differential Revision: https://reviews.llvm.org/D48128 llvm-svn: 335850
* AMDGPU: Error on calls from graphics shadersMatt Arsenault2018-06-281-0/+7
| | | | | | | | In principle nothing should stop these from working, but work is necessary to create an ABI for dealing with the stack related registers. llvm-svn: 335829
* AMDGPU: Fix assert on aggregate type kernel argumentsMatt Arsenault2018-06-281-0/+171
| | | | | | | | | | Just fix the crash for now by not doing the optimization since figuring out how to properly convert the bits for an arbitrary struct is a pain. Also fix a crash when there is only an empty struct argument. llvm-svn: 335827
* [DAGCombiner] Ensure we use the correct CC result type in visitSDIVSimon Pilgrim2018-06-282-6/+33
| | | | | | | | We could get away with it for constant folded cases, but not for rL335719. Thanks to Krzysztof Parzyszek for noticing. llvm-svn: 335821
* [SCCP] Mark CFG as preserved.Florian Hahn2018-06-284-6/+36
| | | | | | | | | | | | SCCP does not change the CFG, so we can mark it as preserved. Reviewers: dberlin, efriedma, davide Reviewed By: davide Differential Revision: https://reviews.llvm.org/D47149 llvm-svn: 335820
* [IndVarSimplify] Ignore unreachable users of truncsMax Kazantsev2018-06-281-0/+47
| | | | | | | If a trunc has a user in a block which is not reachable from entry, we can safely perform trunc elimination as if this user didn't exist. llvm-svn: 335816
* Add support for generating a call graph profile from Branch Frequency Info.Michael J. Spencer2018-06-275-0/+123
| | | | | | | | | | | | | | | | | | | | | === Generating the CG Profile === The CGProfile module pass simply gets the block profile count for each BB and scans for call instructions. For each call instruction it adds an edge from the current function to the called function with the current BB block profile count as the weight. After scanning all the functions, it generates an appending module flag containing the data. The format looks like: ``` !llvm.module.flags = !{!0} !0 = !{i32 5, !"CG Profile", !1} !1 = !{!2, !3, !4} ; List of edges !2 = !{void ()* @a, void ()* @b, i64 32} ; Edge from a to b with a weight of 32 !3 = !{void (i1)* @freq, void ()* @a, i64 11} !4 = !{void (i1)* @freq, void ()* @b, i64 20} ``` Differential Revision: https://reviews.llvm.org/D48105 llvm-svn: 335794
* [RISCV] Add machine function pass to merge base + offsetSameer AbuAsal2018-06-271-37/+38
| | | | | | | | | | | | | | | | | | | | | | | | Summary: In r333455 we added a peephole to fix the corner cases that result from separating base + offset lowering of global address.The peephole didn't handle some of the cases because it only has a basic block view instead of a function level view. This patch replaces that logic with a machine function pass. In addition to handling the original cases it handles uses of the global address across blocks in function and folding an offset from LW\SW instruction. This pass won't run for OptNone compilation, so there will be a negative impact overall vs the old approach at O0. Reviewers: asb, apazos, mgrang Reviewed By: asb Subscribers: MartinMosbeck, brucehoult, the_o, rogfer01, mgorny, rbar, johnrusso, simoncook, niosHD, kito-cheng, shiva0217, zzheng, llvm-commits, edward-jones Differential Revision: https://reviews.llvm.org/D47857 llvm-svn: 335786
* [InstCombine] add tests for vector-select-of-binops with 2 variables; NFCSanjay Patel2018-06-271-0/+263
| | | | llvm-svn: 335778
* [WebAssembly] Try fixing test/CodeGen/WebAssembly/vector_sdiv.llFangrui Song2018-06-271-3/+3
| | | | llvm-svn: 335771
* [X86] Teach the disassembler to use %eiz/%riz instead of NoRegister when the ↵Craig Topper2018-06-272-8/+38
| | | | | | | | | | SIB byte is present, but doesn't encode an index register and there was another shorter encoding that would achieve the same result. The %eiz/%riz are dummy registers that force the encoder to emit a SIB byte when it normally wouldn't. By emitting them in the disassembly output we ensure that assembling the disassembler output would also produce a SIB byte. This should match the behavior of objdump from binutils. llvm-svn: 335768
* [globalisel][legalizer] Add AtomicOrdering to LegalityQuery and use it in ↵Daniel Sanders2018-06-271-12/+12
| | | | | | | | | | | | | AArch64 Now that we have the ability to legalize based on MMO's. Add support for legalizing based on AtomicOrdering and use it to correct the legalization of the atomic instructions. Also extend all() to be a variadic template as this ruleset now requires 3 and 4 argument versions. llvm-svn: 335767
* [ThinLTO] Fix testTeresa Johnson2018-06-271-9/+7
| | | | | | | | | | | | Fix test changes added in r335760. Even though we are invoking llvm-lto2 in single threaded mode, the order of processing the modules in the backend is apparently not deterministic. Handle the expected debug messages in any order. (The determinism would be good to fix, but not related to this change.) This also undoes the change I made in r335764 to help debug this. llvm-svn: 335766
* [ThinLTO] Modify test to help diagnose bot failuresTeresa Johnson2018-06-271-1/+3
| | | | | | | | I am getting bot failures from r335760 that are difficult to diagnose since the stderr is getting redirected to FileCheck. Save and dump the debug output to stderr to help debug the issue. llvm-svn: 335764
* [DAGCombiner] restrict (float)((int) f) --> ftrunc with no-signed-zerosSanjay Patel2018-06-279-39/+79
| | | | | | | | | | | | | | | | | | | | | | | | | | | | As noted in the D44909 review, the transform from (fptosi+sitofp) to ftrunc can produce -0.0 where the original code does not: #include <stdio.h> int main(int argc) { float x; x = -0.8 * argc; printf("%f\n", (float)((int)x)); return 0; } $ clang -O0 -mavx fp.c ; ./a.out 0.000000 $ clang -O1 -mavx fp.c ; ./a.out -0.000000 Ideally, we'd use IR/node flags to predicate the transform, but the IR parser doesn't currently allow fast-math-flags on the cast instructions. So for now, just use the function attribute that corresponds to clang's "-fno-signed-zeros" option. Differential Revision: https://reviews.llvm.org/D48085 llvm-svn: 335761
* [ThinLTO] Print names in function import debug messages when availableTeresa Johnson2018-06-273-1/+58
| | | | | | | | | | | | | | | | Summary: Rather than just print the GUID, when it is available in the index, print the global name as well in the function import thin link debug messages. Names will be available when the combined index is being built by the same process, e.g. a linker or "llvm-lto2 run". Reviewers: davidxl Subscribers: mehdi_amini, inglorion, eraman, steven_wu, llvm-commits Differential Revision: https://reviews.llvm.org/D48612 llvm-svn: 335760
* [MachineOutliner] Don't outline sequences where x16/x17/nzcv are live acrossJessica Paquette2018-06-271-0/+183
| | | | | | | | | | | | | | It isn't safe to outline sequences of instructions where x16/x17/nzcv live across the sequence. This teaches the outliner to check whether or not a specific canidate has x16/x17/nzcv live across it and discard the candidate in the case that that is true. https://bugs.llvm.org/show_bug.cgi?id=37573 https://reviews.llvm.org/D47655 llvm-svn: 335758
* [InstCombine] add more tests for shuffle with different binops; NFCSanjay Patel2018-06-271-0/+36
| | | | llvm-svn: 335756
* [X86] Use bts/btr/btc for single bit set/clear/complement of a variable bit ↵Craig Topper2018-06-271-68/+33
| | | | | | | | | | position If we are just modifying a single bit at a variable bit position we can use the BT* instructions to make the change instead of shifting a 1(or rotating a -1) and doing a binop. These instruction also ignore the upper bits of their index input so we can also remove an and if one is present on the index. Fixes PR37938. llvm-svn: 335754
* [X86] Add test cases for D48606.Craig Topper2018-06-271-0/+995
| | | | llvm-svn: 335753
* [X86][SSE] Add missing AVX512 rotation testsSimon Pilgrim2018-06-273-230/+943
| | | | | | Increase coverage to make sure we're not doing anything stupid without AVX512BW llvm-svn: 335746
* [X86] Rename the autoupgraded of packed fp compare and fpclass intrinsics ↵Craig Topper2018-06-2711-129/+129
| | | | | | | | that don't take a mask as input to exclude '.mask.' from their name. I think the intrinsics named 'avx512.mask.' should refer to the previous behavior of taking a mask argument in the intrinsic instead of using a 'select' or 'and' instruction in IR to accomplish the masking. This is more consistent with the goal that eventually we will have no intrinsics that have masking builtin. When we reach that goal, we should have no intrinsics named "avx512.mask". llvm-svn: 335744
* [AMDGPU] Convert rcp to rcp_iflagStanislav Mekhanoshin2018-06-275-22/+43
| | | | | | | | | | | If a source of rcp instruction is a result of any conversion from an integer convert it into rcp_iflag instruction. No FP exception can ever happen except division by zero if a single precision rcp argument is a representation of an integral number. Differential Revision: https://reviews.llvm.org/D48569 llvm-svn: 335742
* [AArch64] Reverting FP16 vcvth_n_s64_f16 to fixLuke Geeson2018-06-271-20/+0
| | | | llvm-svn: 335737
* [AArch64] Add custom lowering for v4i8 trunc storeAdhemerval Zanella2018-06-273-2/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds a custom trunc store lowering for v4i8 vector types. Since there is not v.4b register, the v4i8 is promoted to v4i16 (v.4h) and default action for v4i8 is to extract each element and issue 4 byte stores. A better strategy would be to extended the promoted v4i16 to v8i16 (with undef elements) and extract and store the word lane which represents the v4i8 subvectores. The construction: define void @foo(<4 x i16> %x, i8* nocapture %p) { %0 = trunc <4 x i16> %x to <4 x i8> %1 = bitcast i8* %p to <4 x i8>* store <4 x i8> %0, <4 x i8>* %1, align 4, !tbaa !2 ret void } Can be optimized from: umov w8, v0.h[3] umov w9, v0.h[2] umov w10, v0.h[1] umov w11, v0.h[0] strb w8, [x0, #3] strb w9, [x0, #2] strb w10, [x0, #1] strb w11, [x0] ret To: xtn v0.8b, v0.8h str s0, [x0] ret The patch also adjust the memory cost for autovectorization, so the C code: void foo (const int *src, int width, unsigned char *dst) { for (int i = 0; i < width; i++) *dst++ = *src++; } can be vectorized to: .LBB0_4: // %vector.body // =>This Inner Loop Header: Depth=1 ldr q0, [x0], #16 subs x12, x12, #4 // =4 xtn v0.4h, v0.4s xtn v0.8b, v0.8h st1 { v0.s }[0], [x2], #4 b.ne .LBB0_4 Instead of byte operations. llvm-svn: 335735
* [NEON] Support vldNq intrinsics in AArch32 (LLVM part)Ivan A. Kosarev2018-06-271-0/+234
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for the q versions of the dup (load-to-all-lanes) NEON intrinsics, such as vld2q_dup_f16() for example. Currently, non-q versions of the dup intrinsics are implemented in clang by generating IR that first loads the elements of the structure into the first lane with the lane (to-single-lane) intrinsics, and then propagating it other lanes. There are at least two problems with this approach. First, there are no double-spaced to-single-lane byte-element instructions. For example, there is no such instruction as 'vld2.8 { d0[0], d2[0] }, [r0]'. That means we cannot rely on the to-single-lane intrinsics and instructions to implement the q versions of the dup intrinsics. Note that to-all-lanes instructions do support all sizes of data items, including bytes. The second problem with the current approach is that we need a separate vdup instruction to propagate the structure to each lane. So for vld4q_dup_f16() we would need four vdup instructions in addition to the initial vld instruction. This patch introduces dup LLVM intrinsics and reworks handling of the currently supported (non-q) NEON dup intrinsics to expand them into those LLVM intrinsics, thus eliminating the need for using to-single-lane intrinsics and instructions. Additionally, this patch adds support for u64 and s64 dup NEON intrinsics. These are marked as Arch64-only in the ARM NEON Reference, but it seems there are no reasons to not support them in AArch32 mode. Please correct, if that is wrong. That's what we generate with this patch applied: vld2q_dup_f16: vld2.16 {d0[], d2[]}, [r0] vld2.16 {d1[], d3[]}, [r0] vld3q_dup_f16: vld3.16 {d0[], d2[], d4[]}, [r0] vld3.16 {d1[], d3[], d5[]}, [r0] vld4q_dup_f16: vld4.16 {d0[], d2[], d4[], d6[]}, [r0] vld4.16 {d1[], d3[], d5[], d7[]}, [r0] Differential Revision: https://reviews.llvm.org/D48439 llvm-svn: 335733
* [DAGCombiner] visitSDIV - add special case handling for (sdiv X, 1) -> X in ↵Simon Pilgrim2018-06-271-3535/+1415
| | | | | | | | pow2 expansion For divisor = 1, perform a select of X - reduces scalarisation of simple SDIVs llvm-svn: 335727
* [X86][SSE] Include MIN_SIGNED element in non-uniform SDIV pow2 testsSimon Pilgrim2018-06-271-25/+17
| | | | llvm-svn: 335721
* [DAGCombiner] Fold SDIV(%X, MIN_SIGNED) -> SELECT(%X == MIN_SIGNED, 1, 0)Simon Pilgrim2018-06-271-54/+12
| | | | | | Fixes PR37569. llvm-svn: 335719
* [DAGCombiner] Don't accept signbit sdiv divisors in sdiv-by-pow2 vector ↵Simon Pilgrim2018-06-271-17/+63
| | | | | | expansion (PR37569) llvm-svn: 335717
* [AArch64] Remove Duplicate FP16 Patterns with same encoding, match on ↵Luke Geeson2018-06-271-29/+22
| | | | | | existing patterns llvm-svn: 335715
* [InstCombine] Avoid creating mis-sized dbg.values in commonCastTransforms()Vedant Kumar2018-06-271-0/+3
| | | | | | | | | | | | | | This prevents InstCombine from creating mis-sized dbg.values when replacing a sequence of casts with a simpler cast. For example, in: (fptrunc (floor (fpext X))) -> (floorf X) We no longer emit dbg.value(X) (with a 32-bit float operand) to describe (fpext X) (which is a 64-bit float). This was diagnosed by the debugify check added in r335682. llvm-svn: 335696
* [Debugify] Handle failure to get fragment size when checking dbg.valuesVedant Kumar2018-06-271-0/+1
| | | | | | | | | | It's not possible to get the fragment size of some dbg.values. Teach the mis-sized dbg.value diagnostic to detect this scenario and bail out. Tested with: $ find test/Transforms -print -exec opt -debugify-each -instcombine {} \; llvm-svn: 335695
* [Debugify] Diagnose mis-sized dbg.valuesVedant Kumar2018-06-261-0/+52
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Report an error in -check-debugify when the size of a dbg.value operand doesn't match up with the size of the variable it describes. Eventually this check should be moved into the IR verifier. For the moment, it's useful to include the check in -check-debugify as a means of catching regressions and finding existing bugs. Here are some instances of bugs the new check finds in the -O2 pipeline (all in InstCombine): 1) A float is used where a double is expected: ERROR: dbg.value operand has size 32, but its variable has size 64: call void @llvm.dbg.value(metadata float %expf, metadata !12, metadata !DIExpression()), !dbg !15 2) An i8 is used where an i32 is expected: ERROR: dbg.value operand has size 8, but its variable has size 32: call void @llvm.dbg.value(metadata i8 %t4, metadata !14, metadata !DIExpression()), !dbg !24 3) A <4 x i32> is used where something twice as large is expected (perhaps a <4 x i64>, I haven't double-checked): ERROR: dbg.value operand has size 128, but its variable has size 256: call void @llvm.dbg.value(metadata <4 x i32> %4, metadata !40, metadata !DIExpression()), !dbg !95 Differential Revision: https://reviews.llvm.org/D48408 llvm-svn: 335682
* Revert "[asan] Instrument comdat globals on COFF targets"Evgeniy Stepanov2018-06-261-46/+0
| | | | | | Causes false positive ODR violation reports on __llvm_profile_raw_version. llvm-svn: 335681
* [JumpThreading] Don't try to rewrite a use if it's already valid.Michael Zolotukhin2018-06-261-0/+19
| | | | | | | | | | | | | | | | | | | Summary: When recording uses we need to rewrite after cloning a loop we need to check if the use is not dominated by the original def. The initial assumption was that the cloned basic block will introduce a new path and thus the original def will only dominate the use if they are in the same BB, but as the reproducer from PR37745 shows it's not always the case. This fixes PR37745. Reviewers: haicheng, Ka-Ka Subscribers: hiraditya, llvm-commits Differential Revision: https://reviews.llvm.org/D48111 llvm-svn: 335675
* [X86] Add test for SDIV by sign bit (minsigned) valueSimon Pilgrim2018-06-261-0/+37
| | | | llvm-svn: 335671
* [ORC] Add LLJIT and LLLazyJIT, and replace OrcLazyJIT in LLI with LLLazyJIT.Lang Hames2018-06-261-1/+1
| | | | | | | | | | | | | | | | | | | LLJIT is a prefabricated ORC based JIT class that is meant to be the go-to replacement for MCJIT. Unlike OrcMCJITReplacement (which will continue to be supported) it is not API or bug-for-bug compatible, but targets the same use cases: Simple, non-lazy compilation and execution of LLVM IR. LLLazyJIT extends LLJIT with support for function-at-a-time lazy compilation, similar to what was provided by LLVM's original (now long deprecated) JIT APIs. This commit also contains some simple utility classes (CtorDtorRunner2, LocalCXXRuntimeOverrides2, JITTargetMachineBuilder) to support LLJIT and LLLazyJIT. Both of these classes are works in progress. Feedback from JIT clients is very welcome! llvm-svn: 335670
* [X86][AsmParser] Recommit r335658Jessica Paquette2018-06-261-0/+13
| | | | | | | Recommit of r335658 so that it does not change the behaviour of any existing error output. llvm-svn: 335668
* Revert "[X86][AsmParser] Emit an error when RIP-relative instructions are ↵Jessica Paquette2018-06-261-13/+0
| | | | | | | | | | used in 32-bit mode" This reverts commit 4850a9aae8b38c7deadc103d634ec7397e6c323b. It caused MC/X86/x86_errors.s to fail. Will fix and recommit shortly. llvm-svn: 335660
* [X86][AsmParser] Emit an error when RIP-relative instructions are used in ↵Jessica Paquette2018-06-261-0/+13
| | | | | | | | | | | | | 32-bit mode Right now, when we use RIP-relative instructions in 32-bit mode, we'll just assert and crash. This adds an error message which tells the user that they can't do that in 32-bit mode, so that we don't crash (and also can see the issue outside of assert builds). llvm-svn: 335658
* [AMDGPU] Add llvm.amdgcn.fmad.ftz intrinsicStanislav Mekhanoshin2018-06-261-0/+114
| | | | | | | | This intrinsic selects v_mad_f32 regardless of fp32 denorm support. Differential Revision: https://reviews.llvm.org/D48573 llvm-svn: 335654
* AMDGPU: Add pass to lower kernel arguments to loadsMatt Arsenault2018-06-26126-1539/+2828
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This replaces most argument uses with loads, but for now not all. The code in SelectionDAG for calling convention lowering is actively harmful for amdgpu_kernel. It attempts to split the argument types into register legal types, which results in low quality code for arbitary types. Since all kernel arguments are passed in memory, we just want the raw types. I've tried a couple of methods of mitigating this in SelectionDAG, but it's easier to just bypass this problem alltogether. It's possible to hack around the problem in the initial lowering, but the real problem is the DAG then expects to be able to use CopyToReg/CopyFromReg for uses of the arguments outside the block. Exposing the argument loads in the IR also has the advantage that the LoadStoreVectorizer can merge them. I'm not sure the best approach to dealing with the IR argument list is. The patch as-is just leaves the IR arguments in place, so all the existing code will still compute the same kernarg size and pointlessly lowers the arguments. Arguably the frontend should emit kernels with an empty argument list in the first place. Alternatively a dummy array could be inserted as a single argument just to reserve space. This does have some disadvantages. Local pointer kernel arguments can no longer have AssertZext placed on them as the equivalent !range metadata is not valid on pointer typed loads. This is mostly bad for SI which needs to know about the known bits in order to use the DS instruction offset, so in this case this is not done. More importantly, this skips noalias arguments since this pass does not yet convert this to the equivalent !alias.scope and !noalias metadata. Producing this metadata correctly seems to be tricky, although this logically is the same as inlining into a function which doesn't exist. Additionally, exposing these loads to the vectorizer may result in degraded aliasing information if a pointer load is merged with another argument load. I'm also not entirely sure this is preserving the current clover ABI, although I would greatly prefer if it would stop widening arguments and match the HSA ABI. As-is I think it is extending < 4-byte arguments to 4-bytes but doesn't align them to 4-bytes. llvm-svn: 335650
* ConstantFold: Don't fold global address vs. null for addrspace != 0Matt Arsenault2018-06-262-2/+13
| | | | | | | | | | | Not sure why this logic seems to be repeated in 2 different places, one called by the other. On AMDGPU addrspace(3) globals start allocating at 0, so these checks will be incorrect (not that real code actually tries to compare these addresses) llvm-svn: 335649
* [Debugify] Don't treat missing dbg.values as an error (PR37942)Vedant Kumar2018-06-261-1/+1
| | | | | | | | | When checking the debug info in a module, don't treat a missing dbg.value as an error. The dbg.value may simply have been DCE'd, in which case the debugger has enough information to display the variable as <optimized out>. llvm-svn: 335647
* LoopUnroll: Allow analyzing intrinsic call costsMatt Arsenault2018-06-261-0/+77
| | | | | | | | | | | I'm not sure why the code here is skipping calls since TTI does try to do something for general calls, but it at least should allow intrinsics. Skip intrinsics that should not be omitted as calls, which is by far the most common case on AMDGPU. llvm-svn: 335645
* [Hexagon] Add a "generic" cpuBrendon Cahoon2018-06-261-0/+7
| | | | | | | | | | Add the generic processor for Hexagon so that it can be used with 3rd party programs that create a back-end with the "generic" CPU. This patch also enables the JIT for Hexagon. Differential Revision: https://reviews.llvm.org/D48571 llvm-svn: 335641
* [DAGCombiner] Don't accept -1 sdiv divisors in sdiv-by-pow2 vector expansion ↵Simon Pilgrim2018-06-261-192/+185
| | | | | | | | (PR37119) Temporary fix until I've managed to get D45806 updated - both +1 and -1 special cases need to be properly supported. llvm-svn: 335637
* Move `REQUIRES:` line to the topFangrui Song2018-06-2611-14/+12
| | | | llvm-svn: 335635
OpenPOWER on IntegriCloud