summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* Broaden the definition of a "widenable branch"Philip Reames2019-11-213-26/+69
| | | | | | | | | | | | As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsics. The semantics of such a branch (derived from the semantics of WC) is that a new condition can be added into the condition arbitrarily without violating legality. Broaden the definition in two ways: Allow swapped operands to the br (and X, WC()) form Allow widenable branch w/trivial condition (i.e. true) which takes form of br i1 WC() The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The later is specifically important as partial unswitching of a widenable range check produces exactly this form above the loop. Differential Revision: https://reviews.llvm.org/D70502
* [SCEV] Add a mode to skip classification when printing analysisPhilip Reames2019-11-211-55/+63
| | | | For the various trip-count tests, the classification isn't useful and makes the auto-generated tests super verbose. By skipping it, we make the auto-gen tests closer to the manually written ones. Up next: auto-genning a bunch of the existings tests.
* [SCEV] Be robust against IR generated by simple-loop-unswitchPhilip Reames2019-11-211-0/+10
| | | | | | Simple loop unswitch likes to leave around unsimplified and/or/xors. SCEV today bails out on these idioms which is unfortunate in general, and specifically for the unswitch interaction. Differential Revision: https://reviews.llvm.org/D70459
* [BranchFolding] Fix PR43964 about branch folder not being debug invariantBjorn Pettersson2019-11-211-95/+48
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: The fix in BranchFolder related to non debug invariant problems done in commit ec32dff0b075055 actually introduced some new problems with debug invariance. Before that patch ComputeCommonTailLength would move iterators back, past debug instructions, in order to make ProfitableToMerge make consistent answers "when one block differs from the other only by whether debugging pseudos are present at the beginning". But the changes in ec32dff0b075055 undid that by moving the iterators forward again. This patch refactors ComputeCommonTailLength. The function was really complex, considering that the SkipTopCFIAndReturn part always moved the iterators forward to the first "real" instruction in the found tail after ec32dff0b075055. The patch also restores the logic to "back past possible debugging pseudos at beginning of block" to make sure ProfitableToMerge gives consistent answers independent of DBG_VALUE instructions before the tail. That is now done by ProfitableToMerge instead of being hidden as a side-effect in ComputeCommonTailLength. Reviewers: probinson, yechunliang, jmorse Reviewed By: jmorse Subscribers: Orlando, mehdi_amini, dexonsmith, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70091
* [InstCombine] add assert in SimplifyDemandedVectorElts and improve ↵Sanjay Patel2019-11-211-19/+22
| | | | readability; NFC
* [Hexagon] Remove incorrect intrinsic definition and invalid testcaseKrzysztof Parzyszek2019-11-211-3/+0
| | | | | | | | | The intrinsic int_hexagon_S2_asr_i_vh was mapped to S2_asr_r_vh, which is wrong. The testcase vasrh.select.ll was using an invalid immediate for that intrinsic. This is not a proper testcase, since at the MIR level such use of this intrinsic should never appear. Together with 824b25fc02, this completes the fix for llvm.org/PR44090.
* [mips] Add a 'generic' Mips CPUMiloš Stojanović2019-11-211-0/+1
| | | | | | | Having a generic CPU removes a warning when creating a subtarget without the CPU being explicitly specified. Differential Revision: https://reviews.llvm.org/D70490
* [LV] PreferPredicateOverEpilog respecting optionSjoerd Meijer2019-11-211-1/+5
| | | | | | | | Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not to predicate the loop. Differential Revision: https://reviews.llvm.org/D70382
* [Debuginfo][NFC] removes redundant semicolon.Alexey Lapshin2019-11-211-1/+1
|
* Statistic - Fix MSVC shadow warning against global PrintOnExit static ↵Simon Pilgrim2019-11-211-2/+2
| | | | variable. NFC.
* Reland 9f3fdb0d7fab: [Driver] Use VFS to check if sanitizer blacklists existIlya Biryukov2019-11-212-6/+11
| | | | | | | With updates to various LLVM tools that use SpecialCastList. It was tempting to use RealFileSystem as the default, but that makes it too easy to accidentally forget passing VFS in clang code.
* [mips] Remove unused `IsPCRelativeLoad` MIPS instructions attribute. NFCSimon Atanasyan2019-11-213-11/+3
| | | | This attribute is always set to zero.
* Revert "[DependenceAnalysis] Dependecies for loads marked with ↵Benjamin Kramer2019-11-211-19/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151" Summary: Revert "[DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151" This reverts commit 5f026b6d9e882941fde9b7e5dc0a2d807f7f24f5. We're (tensorflow.org/xla team) seeing some misscompiles with the new change, only at -O3, with fast math disabled. I'm still trying to come up with a useful/small/external example, but for now, the following IR: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" @0 = private unnamed_addr constant [4 x i8] c"\DB\0F\C9@" @1 = private unnamed_addr constant [4 x i8] c"\00\00\00?" ; Function Attrs: uwtable define void @jit_wrapped_fun.31(i8* %retval, i8* noalias %run_options, i8** noalias %params, i8** noalias %buffer_table, i64* noalias %prof_counters) #0 { entry: %fusion.invar_address.dim.2 = alloca i64 %fusion.invar_address.dim.1 = alloca i64 %fusion.invar_address.dim.0 = alloca i64 %fusion.1.invar_address.dim.2 = alloca i64 %fusion.1.invar_address.dim.1 = alloca i64 %fusion.1.invar_address.dim.0 = alloca i64 %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = load i8*, i8** %0, !invariant.load !0, !dereferenceable !1, !align !2 %parameter.3 = bitcast i8* %1 to [2 x [1 x [4 x float]]]* %2 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %3 = load i8*, i8** %2, !invariant.load !0, !dereferenceable !1, !align !2 %fusion.1 = bitcast i8* %3 to [2 x [1 x [4 x float]]]* store i64 0, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_header.dim.0: ; preds = %fusion.1.loop_exit.dim.1, %entry %fusion.1.indvar.dim.0 = load i64, i64* %fusion.1.invar_address.dim.0 %4 = icmp uge i64 %fusion.1.indvar.dim.0, 2 br i1 %4, label %fusion.1.loop_exit.dim.0, label %fusion.1.loop_body.dim.0 fusion.1.loop_body.dim.0: ; preds = %fusion.1.loop_header.dim.0 store i64 0, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_header.dim.1: ; preds = %fusion.1.loop_exit.dim.2, %fusion.1.loop_body.dim.0 %fusion.1.indvar.dim.1 = load i64, i64* %fusion.1.invar_address.dim.1 %5 = icmp uge i64 %fusion.1.indvar.dim.1, 1 br i1 %5, label %fusion.1.loop_exit.dim.1, label %fusion.1.loop_body.dim.1 fusion.1.loop_body.dim.1: ; preds = %fusion.1.loop_header.dim.1 store i64 0, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_header.dim.2: ; preds = %fusion.1.loop_body.dim.2, %fusion.1.loop_body.dim.1 %fusion.1.indvar.dim.2 = load i64, i64* %fusion.1.invar_address.dim.2 %6 = icmp uge i64 %fusion.1.indvar.dim.2, 4 br i1 %6, label %fusion.1.loop_exit.dim.2, label %fusion.1.loop_body.dim.2 fusion.1.loop_body.dim.2: ; preds = %fusion.1.loop_header.dim.2 %7 = load float, float* bitcast ([4 x i8]* @0 to float*) %8 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %9 = load float, float* %8, !invariant.load !0, !noalias !3 %10 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %11 = load float, float* %10, !invariant.load !0, !noalias !3 %12 = fmul float %9, %11 %13 = fmul float %7, %12 %14 = call float @llvm.log.f32(float %13) %15 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 store float %14, float* %15, !alias.scope !7, !noalias !8 %invar.inc2 = add nuw nsw i64 %fusion.1.indvar.dim.2, 1 store i64 %invar.inc2, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_exit.dim.2: ; preds = %fusion.1.loop_header.dim.2 %invar.inc1 = add nuw nsw i64 %fusion.1.indvar.dim.1, 1 store i64 %invar.inc1, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_exit.dim.1: ; preds = %fusion.1.loop_header.dim.1 %invar.inc = add nuw nsw i64 %fusion.1.indvar.dim.0, 1 store i64 %invar.inc, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_exit.dim.0: ; preds = %fusion.1.loop_header.dim.0 %16 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %17 = load i8*, i8** %16, !invariant.load !0, !dereferenceable !9, !align !2 %parameter.1 = bitcast i8* %17 to float* %18 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %19 = load i8*, i8** %18, !invariant.load !0, !dereferenceable !10, !align !2 %parameter.2 = bitcast i8* %19 to [3 x [1 x float]]* %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 0 %21 = load i8*, i8** %20, !invariant.load !0, !dereferenceable !11, !align !2 %fusion = bitcast i8* %21 to [2 x [3 x [4 x float]]]* store i64 0, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %fusion.1.loop_exit.dim.0 %fusion.indvar.dim.0 = load i64, i64* %fusion.invar_address.dim.0 %22 = icmp uge i64 %fusion.indvar.dim.0, 2 br i1 %22, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0 fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0 store i64 0, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_header.dim.1: ; preds = %fusion.loop_exit.dim.2, %fusion.loop_body.dim.0 %fusion.indvar.dim.1 = load i64, i64* %fusion.invar_address.dim.1 %23 = icmp uge i64 %fusion.indvar.dim.1, 3 br i1 %23, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1 fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1 store i64 0, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_header.dim.2: ; preds = %fusion.loop_body.dim.2, %fusion.loop_body.dim.1 %fusion.indvar.dim.2 = load i64, i64* %fusion.invar_address.dim.2 %24 = icmp uge i64 %fusion.indvar.dim.2, 4 br i1 %24, label %fusion.loop_exit.dim.2, label %fusion.loop_body.dim.2 fusion.loop_body.dim.2: ; preds = %fusion.loop_header.dim.2 %25 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %26 = add nuw nsw i64 0, %25 %27 = udiv i64 %26, 4 %28 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %29 = add nuw nsw i64 0, %28 %30 = udiv i64 %29, 2 %31 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %29, i64 0, i64 %26 %32 = load float, float* %31, !alias.scope !7, !noalias !8 %33 = mul nuw nsw i64 %fusion.indvar.dim.1, 1 %34 = add nuw nsw i64 0, %33 %35 = udiv i64 %34, 3 %36 = load float, float* %parameter.1, !invariant.load !0, !noalias !3 %37 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %parameter.2, i64 0, i64 %34, i64 0 %38 = load float, float* %37, !invariant.load !0, !noalias !3 %39 = fsub float %36, %38 %40 = fmul float %39, %39 %41 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %42 = add nuw nsw i64 0, %41 %43 = udiv i64 %42, 4 %44 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %45 = add nuw nsw i64 0, %44 %46 = udiv i64 %45, 2 %47 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %48 = load float, float* %47, !invariant.load !0, !noalias !3 %49 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %50 = load float, float* %49, !invariant.load !0, !noalias !3 %51 = fmul float %48, %50 %52 = fdiv float %40, %51 %53 = fadd float %32, %52 %54 = fneg float %53 %55 = load float, float* bitcast ([4 x i8]* @1 to float*) %56 = fmul float %54, %55 %57 = getelementptr inbounds [2 x [3 x [4 x float]]], [2 x [3 x [4 x float]]]* %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 %fusion.indvar.dim.1, i64 %fusion.indvar.dim.2 store float %56, float* %57, !alias.scope !8, !noalias !12 %invar.inc5 = add nuw nsw i64 %fusion.indvar.dim.2, 1 store i64 %invar.inc5, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_exit.dim.2: ; preds = %fusion.loop_header.dim.2 %invar.inc4 = add nuw nsw i64 %fusion.indvar.dim.1, 1 store i64 %invar.inc4, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_exit.dim.1: ; preds = %fusion.loop_header.dim.1 %invar.inc3 = add nuw nsw i64 %fusion.indvar.dim.0, 1 store i64 %invar.inc3, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0 %58 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %59 = load i8*, i8** %58, !invariant.load !0, !dereferenceable !2, !align !2 %tuple.30 = bitcast i8* %59 to [1 x i8*]* %60 = bitcast [2 x [3 x [4 x float]]]* %fusion to i8* %61 = getelementptr inbounds [1 x i8*], [1 x i8*]* %tuple.30, i64 0, i64 0 store i8* %60, i8** %61, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare float @llvm.log.f32(float) #1 attributes #0 = { uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` gets (correctly) optimized to the one below without the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]** %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]** %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %21 = bitcast i8** %20 to float** %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %24 = bitcast i8** %23 to [3 x [1 x float]]** %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle30 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = fmul <4 x float> %7, %7 %shuffle31 = shufflevector <4 x float> %38, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %39 = fdiv <8 x float> %shuffle30, %shuffle31 %40 = fadd <8 x float> %shuffle, %39 %41 = fmul <8 x float> %40, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %42 = bitcast i8* %26 to <8 x float>* store <8 x float> %41, <8 x float>* %42, align 8, !alias.scope !8, !noalias !12 %43 = getelementptr inbounds i8, i8* %26, i64 32 %44 = fdiv <4 x float> %37, %38 %45 = fadd <4 x float> %10, %44 %46 = fmul <4 x float> %45, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %47 = bitcast i8* %43 to <4 x float>* store <4 x float> %46, <4 x float>* %47, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %48 = bitcast float* %.phi.trans.insert to <4 x float>* %49 = load <4 x float>, <4 x float>* %48, align 8, !alias.scope !7, !noalias !8 %50 = bitcast float* %.phi.trans.insert12 to <4 x float>* %51 = load <4 x float>, <4 x float>* %50, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %49, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %52 = getelementptr inbounds i8, i8* %26, i64 48 %53 = fmul <4 x float> %51, %51 %shuffle31.1 = shufflevector <4 x float> %53, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %54 = fdiv <8 x float> %shuffle30, %shuffle31.1 %55 = fadd <8 x float> %shuffle.1, %54 %56 = fmul <8 x float> %55, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %57 = bitcast i8* %52 to <8 x float>* store <8 x float> %56, <8 x float>* %57, align 8, !alias.scope !8, !noalias !12 %58 = getelementptr inbounds i8, i8* %26, i64 80 %59 = fdiv <4 x float> %37, %53 %60 = fadd <4 x float> %49, %59 %61 = fmul <4 x float> %60, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %62 = bitcast i8* %58 to <4 x float>* store <4 x float> %61, <4 x float>* %62, align 8, !alias.scope !8, !noalias !12 %63 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %64 = bitcast i8** %63 to [1 x i8*]** %65 = load [1 x i8*]*, [1 x i8*]** %64, align 8, !invariant.load !0, !dereferenceable !2, !align !2 %66 = getelementptr inbounds [1 x i8*], [1 x i8*]* %65, i64 0, i64 0 store i8* %26, i8** %66, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) #1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` and (incorrectly) optimized to the one below with the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8** noalias nocapture readnone %params, i8** noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8*, i8** %buffer_table, i64 1 %1 = bitcast i8** %0 to [2 x [1 x [4 x float]]]** %2 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8*, i8** %buffer_table, i64 5 %4 = bitcast i8** %3 to [2 x [1 x [4 x float]]]** %5 = load [2 x [1 x [4 x float]]]*, [2 x [1 x [4 x float]]]** %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8*, i8** %buffer_table, i64 4 %21 = bitcast i8** %20 to float** %22 = load float*, float** %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8*, i8** %buffer_table, i64 2 %24 = bitcast i8** %23 to [3 x [1 x float]]** %25 = load [3 x [1 x float]]*, [3 x [1 x float]]** %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8*, i8** %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle32 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 0, i64 0, i64 3 %39 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 0, i64 0, i64 3 %40 = fmul <4 x float> %7, %7 %41 = shufflevector <4 x float> %40, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> %42 = fdiv <8 x float> %shuffle32, %41 %43 = fadd <8 x float> %shuffle, %42 %44 = fmul <8 x float> %43, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %45 = bitcast i8* %26 to <8 x float>* store <8 x float> %44, <8 x float>* %45, align 8, !alias.scope !8, !noalias !12 %46 = extractelement <4 x float> %10, i32 0 %47 = getelementptr inbounds i8, i8* %26, i64 32 %48 = extractelement <4 x float> %10, i32 1 %49 = extractelement <4 x float> %10, i32 2 %50 = load float, float* %38, align 4, !alias.scope !7, !noalias !8 %51 = load float, float* %39, align 4, !invariant.load !0, !noalias !3 %52 = fmul float %51, %51 %53 = insertelement <4 x float> undef, float %52, i32 3 %54 = fdiv <4 x float> %37, %53 %55 = insertelement <4 x float> undef, float %46, i32 0 %56 = insertelement <4 x float> %55, float %48, i32 1 %57 = insertelement <4 x float> %56, float %49, i32 2 %58 = insertelement <4 x float> %57, float %50, i32 3 %59 = fadd <4 x float> %58, %54 %60 = fmul <4 x float> %59, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %61 = bitcast i8* %47 to <4 x float>* store <4 x float> %60, <4 x float>* %61, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %62 = bitcast float* %.phi.trans.insert to <4 x float>* %63 = load <4 x float>, <4 x float>* %62, align 8, !alias.scope !7, !noalias !8 %64 = bitcast float* %.phi.trans.insert12 to <4 x float>* %65 = load <4 x float>, <4 x float>* %64, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %63, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %66 = getelementptr inbounds i8, i8* %26, i64 48 %67 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 3 %68 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 3 %69 = fmul <4 x float> %65, %65 %70 = shufflevector <4 x float> %69, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %71 = fdiv <8 x float> %shuffle32, %70 %72 = fadd <8 x float> %shuffle.1, %71 %73 = fmul <8 x float> %72, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %74 = bitcast i8* %66 to <8 x float>* store <8 x float> %73, <8 x float>* %74, align 8, !alias.scope !8, !noalias !12 %75 = extractelement <4 x float> %69, i32 0 %76 = extractelement <4 x float> %63, i32 0 %77 = getelementptr inbounds i8, i8* %26, i64 80 %78 = extractelement <4 x float> %69, i32 1 %79 = extractelement <4 x float> %63, i32 1 %80 = extractelement <4 x float> %69, i32 2 %81 = extractelement <4 x float> %63, i32 2 %82 = load float, float* %67, align 4, !alias.scope !7, !noalias !8 %83 = load float, float* %68, align 4, !invariant.load !0, !noalias !3 %84 = fmul float %83, %83 %85 = insertelement <4 x float> undef, float %75, i32 0 %86 = insertelement <4 x float> %85, float %78, i32 1 %87 = insertelement <4 x float> %86, float %80, i32 2 %88 = insertelement <4 x float> %87, float %84, i32 3 %89 = fdiv <4 x float> %37, %88 %90 = insertelement <4 x float> undef, float %76, i32 0 %91 = insertelement <4 x float> %90, float %79, i32 1 %92 = insertelement <4 x float> %91, float %81, i32 2 %93 = insertelement <4 x float> %92, float %82, i32 3 %94 = fadd <4 x float> %93, %89 %95 = fmul <4 x float> %94, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %96 = bitcast i8* %77 to <4 x float>* store <4 x float> %95, <4 x float>* %96, align 8, !alias.scope !8, !noalias !12 %97 = getelementptr inbounds i8*, i8** %buffer_table, i64 3 %98 = bitcast i8** %97 to [1 x i8*]** %99 = load [1 x i8*]*, [1 x i8*]** %98, align 8, !invariant.load !0, !dereferenceable !2, !align !2 %100 = getelementptr inbounds [1 x i8*], [1 x i8*]* %99, i64 0, i64 0 store i8* %26, i8** %100, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) #1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` This results in bad numerical answers when used through XLA. Again, it's not that easy to give a small fully-reproducible example, but the misscompare is: ``` Expected literal: ( f32[2,3,4] { { { nan, -inf, -3181.35, -inf }, { nan, -inf, -28.2577019, -inf }, { nan, -inf, -28.2577019, -inf } }, { { -inf, -inf, -inf, -inf }, { -6.60753046e+28, -1.47314833e+23, -inf, -inf }, { -2.43504347e+30, -5.42892693e+24, -inf, -inf } } } ) Actual literal: ( f32[2,3,4] { { { nan, -inf, -3181.35, -inf }, { nan, -inf, -inf, -inf }, { inf, -inf, -28.2577019, -inf } }, { { -inf, -inf, -inf, -inf }, { -6.60753046e+28, -1.47314833e+23, -inf, -inf }, { -2.43504347e+30, -5.42892693e+24, -inf, -inf } } } ) ``` Reviewers: sanjoy.google, sanjoy, ebrevnov, jdoerfert, reames, chandlerc Subscribers: hiraditya, Charusso, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70516
* Revert "[Driver] Use VFS to check if sanitizer blacklists exist"Ilya Biryukov2019-11-212-11/+6
| | | | | This reverts commit ba6f906854263375cff3257d22d241a8a259cf77. Commit caused compilation errors on llvm tests. Will fix and re-land.
* [Driver] Use VFS to check if sanitizer blacklists existIlya Biryukov2019-11-212-6/+11
| | | | | | | | | | | | | | | | | | | Summary: This is a follow-up to 590f279c456bbde632eca8ef89a85c478f15a249, which moved some of the callers to use VFS. It turned out more code in Driver calls into real filesystem APIs and also needs an update. Reviewers: gribozavr2, kadircet Reviewed By: kadircet Subscribers: ormris, mgorny, hiraditya, llvm-commits, jkorous, cfe-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D70440
* [AMDGPU] Add attribute for target loop unroll threshold defaultTim Corringham2019-11-211-1/+2
| | | | | | | | | | | | | | Summary: Add a function attribute to allow the target specific default loop unroll threshold to be specified on a per-function basis. This allows a front-end to give guidance where it has insight that is not available to the back-end, while still allowing the target specific heuristics to also have an effect. Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D68873
* [DebugInfo] Refactor DIExpression [SZ]Ext creation into function [NFC]David Stenberg2019-11-212-4/+11
| | | | | | | | | | | | | | | Summary: Also, replace the SmallVector with a normal C array. Reviewers: vsk Reviewed By: vsk Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70498
* [Support] Don't check XCR0 when detecting avx512 on Darwin.Florian Hahn2019-11-211-0/+15
| | | | | | | | | | | | | | | Darwin lazily saves the AVX512 context on first use [1]: instead of checking that it already does to figure out if the OS supports AVX512, trust that the kernel will do the right thing and always assume the context save support is available. [1] https://github.com/apple/darwin-xnu/blob/xnu-4903.221.2/osfmk/i386/fpu.c#L174 Reviewers: ab, RKSimon, craig.topper Reviewed By: craig.topper Differential Revision: https://reviews.llvm.org/D70453
* [DAGCombine][NFC] Use ArrayRef and correctly size SmallVectors.Clement Courbet2019-11-211-3/+3
| | | | In preparation for D70487.
* D'oh. Fix assert after a84922916e6eddf701b39fbd7fe0222cb0fee1d6.James Y Knight2019-11-201-2/+3
| | | | (Which was attempting to fix unused variable warning in NDEBUG mode after 8ba56f322abf848cec78ff7f814f3ad84cd778be)
* Fix unused variable warning in NDEBUG mode after ↵James Y Knight2019-11-201-2/+2
| | | | 8ba56f322abf848cec78ff7f814f3ad84cd778be
* Tablegen: Remove the error for duplicate include files.River Riddle2019-11-204-15/+8
| | | | | | | | | | | | | | | | | | | | | | | | This error was originally added a while(7 years) ago when including multiple files was basically always an error. Tablegen now has preprocessor support, which allows for building nice c/c++ style include guards. With the current error being reported, we unfortunately need to double guard when including files: * In user of MyFile.td #ifndef MYFILE_TD include MyFile.td #endif * In MyFile.td #ifndef MYFILE_TD #define MYFILE_TD ... #endif Differential Revision: https://reviews.llvm.org/D70410
* [Error] Remove a broken code fragment accidentally included in 76bcbaafab2.Lang Hames2019-11-201-2/+0
|
* [Orc][Modules] Fix Modules build fallout from a34680a33eb.Lang Hames2019-11-203-2/+12
| | | | | | | | | | | | In a34680a33eb OrcError.h and Orc/RPC/*.h were split out from the rest of ExecutionEngine in order to eliminate false dependencies for remote JIT targets (see https://reviews.llvm.org/D68732), however this broke modules builds (see https://reviews.llvm.org/D69817). This patch splits these headers out into a separate module, LLVM_OrcSupport, in order to fix the modules build. Fixes <rdar://56377508>.
* [MemorySSA] Moving at the end often means before terminator.Alina Sbirlea2019-11-205-5/+17
| | | | | | | | | | | | | Moving accesses in MemorySSA at InsertionPlace::End, when an instruction is moved into a block, almost always means insert at the end of the block, but before the block terminator. This matters when the block terminator is a MemoryAccess itself (an invoke), and the insertion must be done before the terminator for the update to be correct. Insert an additional position: InsertionPlace:BeforeTerminator and update current usages where this applies. Resolves PR44027.
* [X86] Fix i16->f128 sitofp to promote the i16 to i32 before trying to form a ↵Craig Topper2019-11-201-8/+9
| | | | | | libcall. Previously one of the test cases added here gave an error.
* [X86] Fix f128->i16 fptosi to promote the i16 to i32 before trying to form a ↵Craig Topper2019-11-201-15/+16
| | | | | | libcall. Previously one of the test cases added here gave an error.
* Fix an offset underflow bug in DwarfExpression when describing small values ↵Adrian Prantl2019-11-201-9/+8
| | | | | | | | | | | | | | | | | | with subregisters DwarfExpression::addMachineReg() knows how to build a larger register that isn't expressible in DWARF by combining multiple subregisters. However, if the entire value fits into just one subregister, it would still emit the other subregisters, leading to all sorts of inconsistencies down the line. This patch fixes that by moving an already existing(!) check whether the subregister's offset is before the end of the value to the right place. rdar://problem/57294211 Differential Revision: https://reviews.llvm.org/D70508
* [MemorySSA] Update analysis when the terminator is a memory instruction.Alina Sbirlea2019-11-202-1/+10
| | | | | Update MemorySSA when moving the terminator instruction, as that may be a memory touching instruction. Resolves PR44029.
* Revert "[AArch64] Add the pipeline model for Exynos M5"Eric Christopher2019-11-202-1014/+1
| | | | | | as it's causing test failures in llvm-mca. This reverts commit 9bdfee2a3bd13d405ce1592930182f23849d2897.
* Temporarily Revert "[SLP] allow forming 2-way reduction patterns" and update ↵Eric Christopher2019-11-201-29/+8
| | | | | | | | | | testcases. After speaking with Sanjay - seeing a number of miscompiles and working on tracking down a testcase. None of the follow on patches seem to have helped so far. This reverts commit 8a0aa5310bccbb42d16d11db090419fcefdd1376.
* Temporarily Revert "Temporarily Revert "[SLP] allow forming 2-way reduction ↵Eric Christopher2019-11-201-8/+29
| | | | | | | | patterns"" as there were testcase changes after that need to also be reverted. This reverts commit cd8748a15f2d18861b3548eb26ed2b52e5ee50b4.
* [BPF] Fix a bug in peephole optimizationYonghong Song2019-11-201-21/+59
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | One of current peephole optimiations is to remove SLL/SRL if the sub register has been zero extended. This phase has two bugs and one limitations. First, for the physical subregister used in pseudo insn COPY like below, it permits incorrect optimization. %0:gpr32 = COPY $w0 ... %4:gpr = MOV_32_64 %0:gpr32 %5:gpr = SLL_ri %4:gpr(tied-def 0), 32 %6:gpr = SRA_ri %5:gpr(tied-def 0), 32 The $w0 could be from the return value of a previous function call and its upper 32-bit value might contain some non-zero values. The same applies to function arguments. Second, the current code may permits removing SLL/SRA like below: %0:gpr32 = COPY $w0 %1:gpr32 = COPY %0:gpr32 ... %4:gpr = MOV_32_64 %1:gpr32 %5:gpr = SLL_ri %4:gpr(tied-def 0), 32 %6:gpr = SRA_ri %5:gpr(tied-def 0), 32 The reason is that it did not follow def-use chain to skip all intermediate 32bit-to-32bit COPY instructions. The current implementation is also very conservative for PHI instructions. If any PHI insn component is another PHI or COPY insn, it will just permit SLL/SRA. This patch fixed the issue as follows: - During def/use chain traversal, if any physical register is read, SLL/SRA will be preserved as these physical registers are mostly from function return values or current function arguments. - Recursively visit all COPY and PHI instructions.
* Temporarily Revert "[SLP] allow forming 2-way reduction patterns"Eric Christopher2019-11-201-29/+8
| | | | | | | | After speaking with Sanjay - seeing a number of miscompiles and working on tracking down a testcase. None of the follow on patches seem to have helped so far. This reverts commit 7ff57705ba196ce649d6034614b3b9df57e1f84f.
* [AArch64] Add the pipeline model for Exynos M5Evandro Menezes2019-11-202-1/+1014
| | | | Add the scheduling and cost models for Exynos M5.
* [AMDGPU][SILoadStoreOptimizer] Merge TBUFFER loads/storesPiotr Sobczak2019-11-204-8/+444
| | | | | | | | | | | | | | Summary: Extend SILoadStoreOptimizer to merge tbuffer loads and stores. Reviewers: nhaehnle Reviewed By: nhaehnle Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69794
* [X86] Mark vector STRICT_FP_ROUND as Legal instead of Custom.Craig Topper2019-11-201-3/+9
| | | | | | | | | The Custom handler doesn't do anything for these nodes anyway. SelectionDAGISel won't mutate them if they are Legal or Custom. X86 has custom code for mutating them due to missing isel patterns. When the isel patterns are added Legal will be the right answer. So go ahead a change it now since that's where we'll end up.
* Move widenable branch formation into makeGuardControlFlowExplicit helperPhilip Reames2019-11-203-17/+16
| | | | This is mostly NFC, but I removed the setting of the guard's calling convention onto the WC call. Why? Because it was untested, and was producing an ill defined output as the declaration's convention wasn't been changed leaving a mismatch which is UB.
* [AMDGPU] Keep consistent check of legal addressing mode.Michael Liao2019-11-202-14/+11
| | | | | | | | | | | | | | Summary: - Add test cases for GFX10, which has narrower offset range compared to GFX9. Reviewers: rampitec, arsenm Subscribers: kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70473
* [SelectionDAG][X86] Mutate strictFP nodes to non-strict in ↵Craig Topper2019-11-202-1/+8
| | | | | | | | | | DoInstructionSelection when the node is marked Expand rather than when it is not Legal. This allows operations that are marked Custom, but have some type combinations that are legal to get past this code. Add custom mutation code to X86's Select function for the nodes that don't have isel patterns yet.
* A fix of the bug introduced by previous lowering in asm patch.Xiangling Liao2019-11-201-2/+1
| | | | Differential Revision: https://reviews.llvm.org/D70243
* [AIX][XCOFF] Add support for generating assembly code for one-byte mergable ↵Xing Xue2019-11-202-3/+22
| | | | | | | | | | | | | | | | | | strings This patch adds support for generating assembly code for one-byte mergeable strings. Generating assembly code for multi-byte mergeable strings and the `XCOFF` object code for mergeable strings will be supported later. Reviewers: hubert.reinterpretcast, jasonliu, daltenty, sfertile, DiggerLin, Xiangling_L Reviewed by: daltenty Subscribers: wuzish, nemanjai, hiraditya, kbarton, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70310
* [AIX] Lowering jump table, constant pool and block address in asmXiangling Liao2019-11-204-7/+48
| | | | | | | | | | This patch lowering jump table, constant pool and block address in assembly. 1. On AIX, jump table index is always relative; 2. Put CPI and JTI into ReadOnlySection until we support unique data sections; 3. Create the temp symbol for block address symbol; 4. Update MIR testcases and add related assembly part; Differential Revision: https://reviews.llvm.org/D70243
* Recommit "[DWARF] Add an api to get "interpreted" location lists"Pavel Labath2019-11-204-1/+64
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This recommits 089c0f581492cd6e2a3d2927be3fbf60ea2d7e62, which was reverted due to failing tests on big endian machines. It includes a fix which I believe (I don't have BE machine) should fix this issue. The fix consists of correcting the invocation DWARFYAML::EmitDebugSections, which was missing one (default) function arguments, and so didn't actually force the little-endian mode. The original commit message follows. Summary: This patch adds DWARFDie::getLocations, which returns the location expressions for a given attribute (typically DW_AT_location). It handles both "inline" locations and references to the external location list sections (currently only of the DW_FORM_sec_offset type). It is implemented on top of DWARFUnit::findLoclistFromOffset, which is also added in this patch. I tried to make their signatures similar to the equivalent range list functionality. The actual location list interpretation logic is in DWARFLocationTable::visitAbsoluteLocationList. This part is not equivalent to the range list code, but this deviation is motivated by a desire to reuse the same location list parsing code within lldb. The functionality is tested via a c++ unit test of the DWARFDie API. Reviewers: dblaikie, JDevlieghere, SouraVX Subscribers: mgorny, hiraditya, cmtice, probinson, llvm-commits, aprantl Tags: #llvm Differential Revision: https://reviews.llvm.org/D70394
* [AMDGPU][GFX10] Disabled v_movrel*[sdwa|dpp] opcodes in codegenDmitry Preobrazhensky2019-11-202-0/+27
| | | | | | | | These opcodes use indirect register addressing so they need special handling by codegen (currently missing). Reviewers: vpykhtin, arsenm, rampitec Differential Revision: https://reviews.llvm.org/D70400
* [mips] Define mem_simm# operands using tblgen `foreach` loop. NFCSimon Atanasyan2019-11-201-29/+5
|
* [SelectionDAG] Combine U{ADD,SUB}O diamonds into {ADD,SUB}CARRYDavid Zarzycki2019-11-201-0/+99
| | | | | | | | | | | | | | | | | Summary: Convert (uaddo (uaddo x, y), carryIn) into addcarry x, y, carryIn if-and-only-if the carry flags of the first two uaddo are merged via OR or XOR. Work remaining: match ADD, etc. Reviewers: craig.topper, RKSimon, spatel, niravd, jonpa, uweigand, deadalnix, nikic, lebedev.ri, dmgreen, chfast Reviewed By: lebedev.ri Subscribers: chfast, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70079
* Revert "[DWARF] Add an api to get "interpreted" location lists"Pavel Labath2019-11-204-64/+1
| | | | | | | The test fails on big endian machines. This reverts commit 089c0f581492cd6e2a3d2927be3fbf60ea2d7e62 and the subsequent attempt to fix in 82dc32e2d456c75d08bc9ffe97def409ee5a03cd.
* [ARM][MVE] Select vqabsAnna Welker2019-11-201-0/+35
| | | | | | | | Adds a pattern to ARMInstrMVE.td to use a VQABS instruction if an equivalent multi-instruction construct is found. Differential revision: https://reviews.llvm.org/D70181
* [mips] Put conditions when we need to expand memory operand into a separate ↵Simon Atanasyan2019-11-201-29/+36
| | | | | | | | function. NFC `expandMemInst` expects instruction with 3 or 4 operands and the last operand requires expanding. It's redundant to scan all operands in a loop. We can check the last operands.
OpenPOWER on IntegriCloud