path: root/llvm/lib
Commit message (Author, Age, Files, Lines)
...
* [ConstantRange] Add umul_sat()/smul_sat() methods (Roman Lebedev, 2019-11-08, 1 file, -0/+35)
|
| Summary:
| To be used in `ConstantRange::mulWithNoOverflow()`; may in future be useful
| when saturating shift/mul ops are added.
|
| These are precise as far as I can tell.
|
| I initially thought I would need `APInt::[us]mul_sat()` for these, but it
| turned out to be much simpler to do what `ConstantRange::multiply()` does:
| perform the multiplication in twice the bitwidth, and then truncate.
| Though here we want saturating signed truncation.
|
| Reviewers: nikic, reames, spatel
|
| Reviewed By: nikic
|
| Subscribers: hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69994
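Note: the widen-then-saturate idea from the message above can be sketched on plain 32-bit scalars. This is only an illustration under stated assumptions: `umulSat32` and `smulSat32` are hypothetical helper names, not LLVM APIs, and the actual methods operate on `ConstantRange` bounds of arbitrary bit width rather than on single values.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical scalar helpers (not part of LLVM): multiply in twice the
// bit width, then clamp the wide result back into the 32-bit range.
inline uint32_t umulSat32(uint32_t a, uint32_t b) {
  const uint64_t wide = static_cast<uint64_t>(a) * b; // exact in 64 bits
  const uint64_t max = std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(wide > max ? max : wide);
}

inline int32_t smulSat32(int32_t a, int32_t b) {
  const int64_t wide = static_cast<int64_t>(a) * b; // exact in 64 bits
  const int64_t lo = std::numeric_limits<int32_t>::min();
  const int64_t hi = std::numeric_limits<int32_t>::max();
  // Saturating signed truncation: clamp first, then narrow.
  return static_cast<int32_t>(wide < lo ? lo : (wide > hi ? hi : wide));
}
```

For instance, `smulSat32(INT32_MIN, -1)` clamps to `INT32_MAX` instead of wrapping, because the exact product is still representable in the 64-bit intermediate.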
* [APInt] Add saturating truncation methods (Roman Lebedev, 2019-11-08, 1 file, -0/+25)
|
| Summary:
| The signed one is needed for implementation of `ConstantRange::smul_sat()`,
| unsigned is for completeness only.
|
| Reviewers: nikic, RKSimon, spatel
|
| Reviewed By: nikic
|
| Subscribers: hiraditya, dexonsmith, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69993
* [llvm-xray] Add AArch64 to llvm-xray extract (Aditya Kumar, 2019-11-08, 1 file, -5/+17)
|
| This required adding support for resolving R_AARCH64_ABS64 relocations to
| get accurate addresses when resolving function names.
|
| Authored by: ianlevesque (Ian Levesque)
| Reviewers: dberris, phosek, smeenai, tetsuo-cpp
| Differential Revision: https://reviews.llvm.org/D69967
* [XCOFF][AIX] Differentiate usage of label symbol and csect symbol (Jason Liu, 2019-11-08, 11 files, -74/+82)
|
| Summary:
| Previously we used symbols to represent labels and csects interchangeably,
| which could be a problem. There are cases where we need to add a storage
| mapping class to the symbol if that symbol is actually the name of a csect,
| but it is hard to figure out whether a given symbol is a label or a csect.
|
| This patch intends to do the following:
| 1. Construct a QualName (a name that includes the storage mapping class)
|    MCSymbolXCOFF for every MCSectionXCOFF.
| 2. Keep a pointer to that QualName inside MCSectionXCOFF.
| 3. Use that QualName whenever we need a symbol that refers to that
|    MCSectionXCOFF.
| 4. Adapt to the snowball effect of the above changes in
|    XCOFFObjectWriter.cpp.
|
| Reviewers: xingxue, DiggerLin, sfertile, daltenty, hubert.reinterpretcast
|
| Reviewed By: DiggerLin, daltenty
|
| Subscribers: wuzish, nemanjai, mgorny, hiraditya, kbarton, jsji, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69633
* [AMDGPU][MC] Corrected src0 for v_movrelsd_b32 and v_movrelsd_2_b32 (Dmitry Preobrazhensky, 2019-11-08, 1 file, -6/+8)
|
| See https://bugs.llvm.org/show_bug.cgi?id=40903
|
| Reviewers: arsenm, rampitec
|
| Differential Revision: https://reviews.llvm.org/D69888
* [LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI) (Gil Rapaport, 2019-11-08, 5 files, -125/+170)
|
| This recommits 100e797adb433724a17c9b42b6533cd634cb796b (reverted in
| 009e032634b3bd7fc32071ac2344b12142286477 for failing an assert). While the
| root cause was independently reverted in
| eaff3004019f97c64c88ab76da6b25106b659b30, this commit includes a LIT test to
| make sure IVDescriptor's SinkAfter logic does not try to sink branch
| instructions.
* BinaryStream - fix static analyzer warnings. NFCI. (Simon Pilgrim, 2019-11-08, 1 file, -4/+4)
|
| - uninitialized variables
| - documentation warnings
| - shadow variable names
* Reland: [TII] Use optional destination and source pair as a return value; NFC (Djordje Todorovic, 2019-11-08, 12 files, -106/+76)
|
| Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods
| to return optional machine operand pair of destination and source
| registers.
|
| Patch by Nikola Prica
|
| Differential Revision: https://reviews.llvm.org/D69622
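Note: the "optional destination/source pair" return shape can be illustrated without LLVM. The sketch below assumes C++17 `std::optional`; `Reg`, `Instr`, `DestSrc` and `asCopy` are hypothetical names invented for this example and do not mirror the real `TargetInstrInfo` signatures.

```cpp
#include <optional>
#include <string>

// Hypothetical stand-ins for machine operands and instructions.
struct Reg { unsigned Id; };
struct Instr { std::string Opcode; Reg Dst; Reg Src; };
struct DestSrc { Reg Dest; Reg Source; };

// Returns the (destination, source) pair when the instruction is a copy,
// and an empty optional otherwise, instead of using out-parameters.
inline std::optional<DestSrc> asCopy(const Instr &MI) {
  if (MI.Opcode != "COPY")
    return std::nullopt;
  return DestSrc{MI.Dst, MI.Src};
}
```

A caller can then write `if (auto Pair = asCopy(MI)) use(Pair->Dest, Pair->Source);`, which keeps the "is it a copy?" test and the operand access in one place.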
* Revert d91ed80 "[codeview] Reference types in type parent scopes" (Hans Wennborg, 2019-11-08, 2 files, -35/+14)
|
| This triggered asserts in the Chromium build, see https://crbug.com/1022729
| for details and reproducer.
|
| > Without this change, when a nested tag type of any kind (enum, class,
| > struct, union) is used as a variable type, it is emitted without
| > emitting the parent type. In CodeView, parent types point to their inner
| > types, and inner types do not point back to their parents. We already
| > walk over all of the parent scopes to build the fully qualified name.
| > This change simply requests their type indices as we go along to ensure
| > they are all emitted.
| >
| > Fixes PR43905
| >
| > Reviewers: akhuang, amccarth
| >
| > Differential Revision: https://reviews.llvm.org/D69924
* [RAGreedy] Enable -consider-local-interval-cost for AArch64 (Sanne Wouda, 2019-11-08, 2 files, -2/+6)
|
| Summary:
| The greedy register allocator occasionally decides to insert a large number
| of unnecessary copies, see below for an example. The
| -consider-local-interval-cost option (which X86 already enables by default)
| fixes this. We enable this option for AArch64 only after receiving feedback
| that this change is not beneficial for PowerPC.
|
| We evaluated the impact of this change on compile time, code size and
| performance benchmarks.
|
| This option has a small impact on compile time, measured on CTMark. A 0.1%
| geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most
| 0.5% on individual benchmarks.
|
| The effect on both code size and performance on AArch64 for the LLVM test
| suite is nil on the geomean with individual outliers (ignoring short
| exec_times) between:
|
|                best    worst
|   size..text   -3.3%   +0.0%
|   exec_time    -5.8%   +2.3%
|
| On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2%
| at most) in code size on some benchmarks, with a tiny movement (-0.01%) on
| the geomean. Neither intrate nor fprate show any change in performance.
|
| This patch makes the following changes.
|
| - For the AArch64 target, enableAdvancedRASplitCost() now returns true.
|
| - Ensures that -consider-local-interval-cost=false can disable the new
|   behaviour if necessary.
|
| This matrix multiply example:
|
|   $ cat test.c
|   long A[8][8];
|   long B[8][8];
|   long C[8][8];
|
|   void run_test() {
|     for (int k = 0; k < 8; k++) {
|       for (int i = 0; i < 8; i++) {
|         for (int j = 0; j < 8; j++) {
|           C[i][j] += A[i][k] * B[k][j];
|         }
|       }
|     }
|   }
|
| results in the following generated code on AArch64:
|
|   $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
|   [...]
|                                   // %for.cond1.preheader
|                                   // =>This Inner Loop Header: Depth=1
|     add x14, x11, x9
|     str q0, [sp, #16]             // 16-byte Folded Spill
|     ldr q0, [x14]
|     mov v2.16b, v15.16b
|     mov v15.16b, v14.16b
|     mov v14.16b, v13.16b
|     mov v13.16b, v12.16b
|     mov v12.16b, v11.16b
|     mov v11.16b, v10.16b
|     mov v10.16b, v9.16b
|     mov v9.16b, v8.16b
|     mov v8.16b, v31.16b
|     mov v31.16b, v30.16b
|     mov v30.16b, v29.16b
|     mov v29.16b, v28.16b
|     mov v28.16b, v27.16b
|     mov v27.16b, v26.16b
|     mov v26.16b, v25.16b
|     mov v25.16b, v24.16b
|     mov v24.16b, v23.16b
|     mov v23.16b, v22.16b
|     mov v22.16b, v21.16b
|     mov v21.16b, v20.16b
|     mov v20.16b, v19.16b
|     mov v19.16b, v18.16b
|     mov v18.16b, v17.16b
|     mov v17.16b, v16.16b
|     mov v16.16b, v7.16b
|     mov v7.16b, v6.16b
|     mov v6.16b, v5.16b
|     mov v5.16b, v4.16b
|     mov v4.16b, v3.16b
|     mov v3.16b, v1.16b
|     mov x12, v0.d[1]
|     fmov x15, d0
|     ldp q1, q0, [x14, #16]
|     ldur x1, [x10, #-256]
|     ldur x2, [x10, #-192]
|     add x9, x9, #64               // =64
|     mov x13, v1.d[1]
|     fmov x16, d1
|     ldr q1, [x14, #48]
|     mul x3, x15, x1
|     mov x14, v0.d[1]
|     fmov x17, d0
|     mov x18, v1.d[1]
|     fmov x0, d1
|     mov v1.16b, v3.16b
|     mov v3.16b, v4.16b
|     mov v4.16b, v5.16b
|     mov v5.16b, v6.16b
|     mov v6.16b, v7.16b
|     mov v7.16b, v16.16b
|     mov v16.16b, v17.16b
|     mov v17.16b, v18.16b
|     mov v18.16b, v19.16b
|     mov v19.16b, v20.16b
|     mov v20.16b, v21.16b
|     mov v21.16b, v22.16b
|     mov v22.16b, v23.16b
|     mov v23.16b, v24.16b
|     mov v24.16b, v25.16b
|     mov v25.16b, v26.16b
|     mov v26.16b, v27.16b
|     mov v27.16b, v28.16b
|     mov v28.16b, v29.16b
|     mov v29.16b, v30.16b
|     mov v30.16b, v31.16b
|     mov v31.16b, v8.16b
|     mov v8.16b, v9.16b
|     mov v9.16b, v10.16b
|     mov v10.16b, v11.16b
|     mov v11.16b, v12.16b
|     mov v12.16b, v13.16b
|     mov v13.16b, v14.16b
|     mov v14.16b, v15.16b
|     mov v15.16b, v2.16b
|     ldr q2, [sp]                  // 16-byte Folded Reload
|     fmov d0, x3
|     mul x3, x12, x1
|   [...]
|
| With -consider-local-interval-cost the same section of code results in the
| following:
|
|   $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
|   [...]
|   .LBB0_1:                        // %for.cond1.preheader
|                                   // =>This Inner Loop Header: Depth=1
|     add x14, x11, x9
|     ldp q0, q1, [x14]
|     ldur x1, [x10, #-256]
|     ldur x2, [x10, #-192]
|     add x9, x9, #64               // =64
|     mov x12, v0.d[1]
|     fmov x15, d0
|     mov x13, v1.d[1]
|     fmov x16, d1
|     ldp q0, q1, [x14, #32]
|     mul x3, x15, x1
|     cmp x9, #512                  // =512
|     mov x14, v0.d[1]
|     fmov x17, d0
|     fmov d0, x3
|     mul x3, x12, x1
|   [...]
|
| Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
|
| Reviewed By: dmgreen
|
| Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69437
* [RISCV] Fix evaluation of %pcrel_lo (Roger Ferrer Ibanez, 2019-11-08, 1 file, -3/+7)
|
| The following testcase
|
|   function:
|   .Lpcrel_label1:
|     auipc a0, %pcrel_hi(other_function)
|     addi  a1, a0, %pcrel_lo(.Lpcrel_label1)
|     .p2align 2          # Causes a new fragment to be emitted
|
|     .type other_function,@function
|   other_function:
|     ret
|
| exposes an odd behaviour in which only the %pcrel_hi relocation is
| evaluated but not the %pcrel_lo.
|
|   $ llvm-mc -triple riscv64 -filetype obj t.s | llvm-objdump -d -r -
|
|   <stdin>: file format ELF64-riscv
|
|   Disassembly of section .text:
|
|   0000000000000000 function:
|          0: 17 05 00 00   auipc a0, 0
|          4: 93 05 05 00   mv    a1, a0
|       0000000000000004:  R_RISCV_PCREL_LO12_I other_function+4
|
|   0000000000000008 other_function:
|          8: 67 80 00 00   ret
|
| The reason seems to be that in RISCVAsmBackend::shouldForceRelocation we
| only consider the fragment but in RISCVMCExpr::evaluatePCRelLo we consider
| the section. This usually works but there are cases where the section may
| still be the same but the fragment may be another one. In that case we end
| up forcing a %pcrel_lo relocation without any %pcrel_hi.
|
| This patch makes RISCVAsmBackend::shouldForceRelocation use the section,
| if any, to determine if the relocation must be forced or not.
|
| Differential Revision: https://reviews.llvm.org/D60657
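Note: for readers unfamiliar with the `%pcrel_hi`/`%pcrel_lo` pairing, the arithmetic that splits a PC-relative offset between `auipc` and `addi` can be sketched as follows. The helper and struct names are hypothetical, and the sketch covers only the immediate arithmetic, not LLVM's fragment or section handling.

```cpp
#include <cstdint>

// Hypothetical result type: the two immediates of an auipc/addi pair.
struct PcrelParts {
  uint32_t Hi20; // upper 20 bits, placed in auipc
  int32_t Lo12;  // signed 12-bit immediate, placed in addi
};

// Split a 32-bit PC-relative offset (taken relative to the auipc).
// Adding 0x800 before shifting compensates for addi sign-extending
// its 12-bit immediate.
inline PcrelParts splitPcrel(uint32_t Offset) {
  const uint32_t Hi20 = ((Offset + 0x800u) >> 12) & 0xFFFFFu;
  int32_t Lo12 = static_cast<int32_t>(Offset & 0xFFFu);
  if (Lo12 >= 0x800)
    Lo12 -= 0x1000; // reinterpret as a signed 12-bit value
  return {Hi20, Lo12};
}
```

In the testcase above the target is 8 bytes past the `auipc`, so the pair is `Hi20 = 0`, `Lo12 = 8`. Both halves are computed from the address of the `auipc`, which is why `%pcrel_lo` names the label `.Lpcrel_label1` rather than the target symbol.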
* [NFC][IndVarS] Adjust a comment (Daniil Suchkov, 2019-11-08, 1 file, -1/+1)
|
| (test commit)
* [CR] ConstantRange::sshl_sat(): check signedness of the min/max, not ranges (Roman Lebedev, 2019-11-08, 1 file, -2/+2)
|
| This was pointed out in review, but I forgot to stage this change into the
| commit itself.
* [ConstantRange] Add `ushl_sat()`/`sshl_sat()` methods. (Roman Lebedev, 2019-11-08, 1 file, -0/+20)
|
| Summary:
| To be used in `ConstantRange::shlWithNoOverflow()`; may in future be useful
| when saturating shift/mul ops are added.
|
| Unlike `ConstantRange::shl()`, these are precise.
|
| Reviewers: nikic, spatel, reames
|
| Reviewed By: nikic
|
| Subscribers: hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69960
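Note: as a scalar illustration of what a saturating left shift computes, here is a minimal sketch. `ushlSat32` and `sshlSat32` are hypothetical helpers, not the `ConstantRange` or `APInt` methods, and shift amounts of 32 or more are deliberately excluded by an assertion as a simplifying assumption of this example.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Unsigned: shift in 64 bits, then clamp to UINT32_MAX on overflow.
inline uint32_t ushlSat32(uint32_t V, unsigned Sh) {
  assert(Sh < 32 && "out-of-range shift amounts are outside this sketch");
  const uint64_t Wide = static_cast<uint64_t>(V) << Sh;
  const uint64_t Max = std::numeric_limits<uint32_t>::max();
  return static_cast<uint32_t>(Wide > Max ? Max : Wide);
}

// Signed: multiply by 2^Sh in 64 bits, then clamp to [INT32_MIN, INT32_MAX].
inline int32_t sshlSat32(int32_t V, unsigned Sh) {
  assert(Sh < 32 && "out-of-range shift amounts are outside this sketch");
  const int64_t Wide = static_cast<int64_t>(V) * (int64_t{1} << Sh);
  const int64_t Lo = std::numeric_limits<int32_t>::min();
  const int64_t Hi = std::numeric_limits<int32_t>::max();
  return static_cast<int32_t>(Wide < Lo ? Lo : (Wide > Hi ? Hi : Wide));
}
```

The sign of the saturated result follows the sign of the shifted value: negative inputs clamp to the minimum, positive inputs to the maximum.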
* [BPF] turn on -mattr=+alu32 for cpu version v3 and later (Yonghong Song, 2019-11-07, 1 file, -0/+1)
|
| -mattr=+alu32 has shown good performance compared to builds without this
| attribute. Based on the discussion at
|   https://lore.kernel.org/bpf/1ec37838-966f-ec0b-5223-ca9b6eb0860d@fb.com/T/#t
| cpu version v3 should support -mattr=+alu32.
| This patch enables alu32 if the cpu version is v3, whether specified by the
| user or probed by llvm.
|
| Differential Revision: https://reviews.llvm.org/D69957
* [PowerPC] Option for enabling absolute jumptables with command line (Nemanja Ivanovic, 2019-11-07, 1 file, -0/+5)
|
| This option allows the user to specify the use of absolute jumptables
| instead of relative ones, which are the default on most PPC subtargets.
|
| Patch by Kamauu Bridgeman
|
| Differential revision: https://reviews.llvm.org/D69108
* [InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement (Craig Topper, 2019-11-07, 1 file, -2/+3)
|
| x86_mmx is conceptually a vector already. Don't introduce an extra
| conversion between it and scalar i64.
|
| I'm using VectorType::isValidElementType which checks for floating point,
| integer, and pointers to hopefully make this more readable than just
| blacklisting x86_mmx.
|
| Differential Revision: https://reviews.llvm.org/D69964
* [debugify] Move the Debugify pass from tools/opt to lib/Transform/Utils (Daniel Sanders, 2019-11-07, 2 files, -0/+435)
|
| Summary:
| I need to make use of this pass from a driver program that isn't opt.
| Therefore this patch moves this pass into the LLVM library so that it is
| available for use elsewhere.
|
| There was one function I kept in tools/opt, which is
| exportDebugifyStats(). This is because it serializes the statistics into a
| human readable format, and this seemed more in keeping with opt than with a
| library function.
|
| Reviewers: vsk, aprantl
|
| Subscribers: mgorny, hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69926
* Revert "[MachineVerifier] Improve verification of live-in lists." (Galina Kistanova, 2019-11-07, 1 file, -26/+0)
|
| This reverts commit b7b170c to give the author more time to address failing
| tests on the expensive checks buildbots.
* [codeview] Reference types in type parent scopes (Reid Kleckner, 2019-11-07, 2 files, -14/+35)
|
| Without this change, when a nested tag type of any kind (enum, class,
| struct, union) is used as a variable type, it is emitted without
| emitting the parent type. In CodeView, parent types point to their inner
| types, and inner types do not point back to their parents. We already
| walk over all of the parent scopes to build the fully qualified name.
| This change simply requests their type indices as we go along to ensure
| they are all emitted.
|
| Fixes PR43905
|
| Reviewers: akhuang, amccarth
|
| Differential Revision: https://reviews.llvm.org/D69924
* Wrong debug info generated at -O2 (-O0 is correct) (Vedant Kumar, 2019-11-07, 3 files, -3/+7)
|
| The InstCombiner pass was erasing trivially dead instructions without
| updating the dependent llvm.dbg.value intrinsics, so the debugger did not
| show the programmer the current state of the affected variables.
|
| As part of this fix I did the following:
| iterate through all the users (llvm.dbg) of an instruction that is
| trivially dead and set each of them to undef before deleting the
| instruction.
|
| Now the user will see "optimized out" when trying to print those variables.
|
| This fixes
| https://bugs.llvm.org/show_bug.cgi?id=43893
|
| This is my first fix to llvm.
|
| Patch by kamlesh kumar!
|
| Differential Revision: https://reviews.llvm.org/D69809
* [AsmWritter] Fixed "null check after dereferencing" warning (Dávid Bolvanský, 2019-11-07, 1 file, -4/+2)
|
| Summary: The 'BB->getParent()' pointer was utilized before it was verified
| against nullptr. Check lines: 3567, 3581.
|
| Reviewers: jyknight, RKSimon
|
| Subscribers: hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69751
* Revert "[XCOFF] Fix link errors from explicit template instantiation" (Reid Kleckner, 2019-11-07, 1 file, -4/+0)
|
| This reverts commit c989993ba1a666f04f7aee7df51d9f4de0588b71.
|
| maskray already fixed the explicit instantiation definition in the .cpp
| file, and these extern template declarations seem to be causing warnings
| that I don't understand.
* [XCOFF] Fix link errors from explicit template instantiation (Reid Kleckner, 2019-11-07, 1 file, -0/+4)
|
| I happen to be using clang-cl+lld-link locally, and I get these link
| errors:
|
| lld-link: error: undefined symbol: public: unsigned short __cdecl llvm::object::XCOFFSectionHeader<struct llvm::object::XCOFFSectionHeader64>::getSectionType(void) const
| >>> referenced by C:\src\llvm-project\llvm\tools\llvm-readobj\XCOFFDumper.cpp:106
| >>>               tools\llvm-readobj\CMakeFiles\llvm-readobj.dir\XCOFFDumper.cpp.obj:(public: virtual void __cdecl `anonymous namespace'::XCOFFDumper::printSectionHeaders(void))
|
| I suspect this is because the explicit template instantiation appears
| before the inline method definitions in the .cpp file, so they aren't
| available at the point of instantiation. Move the explicit instantiation
| later.
|
| Also, forward declare the explicit instantiation for good measure.
* [XCOFF] Move explicit instantiations after member function definitions to fix clang builds (Fangrui Song, 2019-11-07, 1 file, -4/+4)
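Note: the ordering issue behind this fix can be shown in a few lines. An explicit instantiation definition of a class template instantiates only those member functions that have already been defined at that point, so the out-of-line definitions must come first. `Header` and `Header64` below are hypothetical types for illustration, not the XCOFF classes touched by the commit.

```cpp
// Hypothetical trait-like tag type used as the template argument.
struct Header64 {
  static constexpr int Bits = 64;
};

template <typename T> struct Header {
  int getBits() const; // declared in-class, defined out of line below
};

// 1. The member function definition comes first...
template <typename T> int Header<T>::getBits() const { return T::Bits; }

// 2. ...so this explicit instantiation definition can emit getBits().
//    Placed above the definition, it would not instantiate the member,
//    and other translation units could hit an undefined symbol at link time.
template struct Header<Header64>;
```

Other translation units would typically pair this with an `extern template struct Header<Header64>;` declaration, which is optional.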
* [InstCombine] canonicalize shift+logic+shift to reduce dependency chain (Sanjay Patel, 2019-11-07, 1 file, -0/+46)
|
| shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)
|
| This is an IR translation of an existing SDAG transform added here:
| rL370617
|
| So we again have 9 possible patterns with a commuted IR variant of each
| pattern:
| https://rise4fun.com/Alive/VlI
| https://rise4fun.com/Alive/n1m
| https://rise4fun.com/Alive/1Vn
|
| Part of the motivation is to allow easier recognition and subsequent
| canonicalization of bswap patterns as discussed in PR43146:
| https://bugs.llvm.org/show_bug.cgi?id=43146
|
| We had to delay this transform because it used to allow the SLP vectorizer
| to create awful reductions out of simple load-combines. That problem was
| fixed with:
| rL375025
| (we'll bring back load combining in IR someday...)
|
| The backend is also better equipped to deal with these patterns now using
| hooks like TLI.getShiftAmountThreshold().
|
| The only remaining potential controversy is that the -reassociate pass
| tends to reverse this kind of pattern (to help GVN?). But since
| -reassociate doesn't do anything with these specific patterns, there is no
| conflict currently.
|
| Finally, there's a new pass proposal at D67383 for general
| tree-height-reduction reassociation, and it could use a cost model to
| decide how to optimally rearrange these kinds of ops for a target. That
| patch appears to be stalled.
|
| Differential Revision: https://reviews.llvm.org/D69842
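Note: the rewrite rule above can be checked on ordinary integers. The sketch below shows only one of the nine shift/logic combinations (`shl` with `and`) on 32-bit unsigned values and assumes `C0 + C1 < 32`; the function names are hypothetical and the code is not taken from InstCombine.

```cpp
#include <cstdint>

// Original form: shift (logic (shift X, C0), Y), C1
// The second shift depends on the 'and', which depends on the first shift.
inline uint32_t chainedForm(uint32_t X, uint32_t Y, unsigned C0, unsigned C1) {
  return ((X << C0) & Y) << C1;
}

// Canonical form: logic (shift X, C0+C1), (shift Y, C1)
// The two shifts are independent of each other, so the dependency
// chain is one operation shorter.
inline uint32_t canonicalForm(uint32_t X, uint32_t Y, unsigned C0, unsigned C1) {
  return (X << (C0 + C1)) & (Y << C1);
}
```

Both functions return the same value for every `X` and `Y` because a left shift distributes over bitwise `and`, `or` and `xor`.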
* X86FrameLowering - fix bool to unsigned cast static analyzer warnings. NFCI. (Simon Pilgrim, 2019-11-07, 1 file, -7/+7)
|
* PostRAScheduler - fix uninitialized variable warning. NFCI. (Simon Pilgrim, 2019-11-07, 1 file, -1/+1)
|
* ManagedStringPool - pre-increment iterator. NFC. (Simon Pilgrim, 2019-11-07, 1 file, -1/+1)
|
* X86CondBrFolding - remove non-existent fixBranchProb function. NFC. (Simon Pilgrim, 2019-11-07, 1 file, -2/+0)
|
* Using crtp to refactor the xcoff section header (diggerlin, 2019-11-07, 1 file, -8/+19)
|
| SUMMARY:
| Following the review comment at
| https://reviews.llvm.org/D68575#inline-617586, create an NFC patch that:
|
| - uses CRTP to refactor the XCOFF section header;
| - moves the definitions of SectionFlagsReservedMask and
|   SectionFlagsTypeMask from XCOFFDumper.cpp to XCOFFObjectFile.h.
|
| Reviewers: hubert.reinterpretcast, jasonliu
| Subscribers: rupprecht, seiyai, hiraditya
|
| Differential Revision: https://reviews.llvm.org/D69131
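Note: CRTP (the curiously recurring template pattern) lets two header layouts share accessor code without virtual functions. The following is a minimal sketch with hypothetical names (`HeaderBase`, `Header32`, `Header64`, `TypeMask`) and an assumed mask value; it does not reproduce the actual XCOFF definitions.

```cpp
#include <cstdint>

// Assumed mask value, chosen only for this sketch.
constexpr uint32_t TypeMask = 0xFFFFu;

// The base class is parameterized on the derived class, so shared
// accessors can read fields that only the derived layout defines.
template <typename Derived> struct HeaderBase {
  uint16_t getSectionType() const {
    const Derived &Self = static_cast<const Derived &>(*this);
    return static_cast<uint16_t>(Self.Flags & TypeMask);
  }
};

struct Header32 : HeaderBase<Header32> {
  uint32_t Address; // 32-bit layout
  uint32_t Flags;
  Header32(uint32_t A, uint32_t F) : Address(A), Flags(F) {}
};

struct Header64 : HeaderBase<Header64> {
  uint64_t Address; // 64-bit layout
  uint32_t Flags;
  Header64(uint64_t A, uint32_t F) : Address(A), Flags(F) {}
};
```

The accessor is written once in `HeaderBase` and resolved at compile time for each layout, so the structs stay free of vtable pointers, which matters when they are meant to overlay on-disk data.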
* comment shiftamountthreshold (joanlluch, 2019-11-07, 1 file, -0/+1)
|
* [SDAG] reduce code duplication; NFC (Sanjay Patel, 2019-11-07, 1 file, -18/+11)
|
* [SDAG] reduce code duplication; NFC (Sanjay Patel, 2019-11-07, 1 file, -4/+4)
|
* [ConstantRange][LVI] Use overflow flags from `sub` to constrain the range (Roman Lebedev, 2019-11-07, 1 file, -0/+2)
|
| Summary:
| This notably improves non-negativity deduction:
| ```
| | statistic                               |     old |     new | delta | % change |
| | correlated-value-propagation.NumAShrs   |     209 |     227 |    18 |  8.6124% |
| | correlated-value-propagation.NumAddNSW  |    4972 |    4988 |    16 |  0.3218% |
| | correlated-value-propagation.NumAddNUW  |    7141 |    7148 |     7 |  0.0980% |
| | correlated-value-propagation.NumAddNW   |   12113 |   12136 |    23 |  0.1899% |
| | correlated-value-propagation.NumAnd     |     442 |     445 |     3 |  0.6787% |
| | correlated-value-propagation.NumNSW     |    7160 |    7176 |    16 |  0.2235% |
| | correlated-value-propagation.NumNUW     |   13306 |   13316 |    10 |  0.0752% |
| | correlated-value-propagation.NumNW      |   20466 |   20492 |    26 |  0.1270% |
| | correlated-value-propagation.NumSDivs   |     207 |     212 |     5 |  2.4155% |
| | correlated-value-propagation.NumSExt    |    6279 |    6679 |   400 |  6.3704% |
| | correlated-value-propagation.NumSRems   |      28 |      29 |     1 |  3.5714% |
| | correlated-value-propagation.NumShlNUW  |    2793 |    2796 |     3 |  0.1074% |
| | correlated-value-propagation.NumShlNW   |    3964 |    3967 |     3 |  0.0757% |
| | correlated-value-propagation.NumUDivs   |     353 |     358 |     5 |  1.4164% |
| | instcount.NumAShrInst                   |   13763 |   13741 |   -22 | -0.1598% |
| | instcount.NumAddInst                    |  277349 |  277348 |    -1 | -0.0004% |
| | instcount.NumLShrInst                   |   27437 |   27463 |    26 |  0.0948% |
| | instcount.NumOrInst                     |  102677 |  102678 |     1 |  0.0010% |
| | instcount.NumSDivInst                   |    8732 |    8727 |    -5 | -0.0573% |
| | instcount.NumSExtInst                   |   80872 |   80468 |  -404 | -0.4996% |
| | instcount.NumSRemInst                   |    1679 |    1678 |    -1 | -0.0596% |
| | instcount.NumTruncInst                  |   62154 |   62153 |    -1 | -0.0016% |
| | instcount.NumUDivInst                   |    2526 |    2527 |     1 |  0.0396% |
| | instcount.NumURemInst                   |    1589 |    1590 |     1 |  0.0629% |
| | instcount.NumZExtInst                   |   69405 |   69809 |   404 |  0.5821% |
| | instcount.TotalInsts                    | 7439575 | 7439574 |    -1 |  0.0000% |
| ```
|
| Reviewers: nikic, reames, spatel
|
| Reviewed By: nikic
|
| Subscribers: hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69942
* [ThinLTO] Import readonly vars with refs (evgeny, 2019-11-07, 5 files, -14/+48)
|
| This patch allows importing declarations of functions and variables that
| are referenced by the initializer of some other readonly variable.
|
| Differential revision: https://reviews.llvm.org/D69561
* [SLP] allow forming 2-way reduction patterns (Sanjay Patel, 2019-11-07, 1 file, -8/+29)
|
| We have a vector compare reduction problem seen in PR39665 comment 2:
| https://bugs.llvm.org/show_bug.cgi?id=39665#c2
|
| Or slightly reduced here:
|
|   define i1 @cmp2(<2 x double> %a0) {
|     %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
|     %b = extractelement <2 x i1> %a, i32 0
|     %c = extractelement <2 x i1> %a, i32 1
|     %d = and i1 %b, %c
|     ret i1 %d
|   }
|
| SLP would not attempt to turn this into a vector reduction because there
| is an artificial lower limit on that transform. We cannot completely
| remove that limit without inducing regressions though, so this patch just
| hacks an extra attempt at creating a 2-way reduction to the end of the
| analysis.
|
| As shown in the test file, we are still not getting some of the motivating
| cases, so follow-on patches will be needed to solve those cases.
|
| Differential Revision: https://reviews.llvm.org/D59710
* [mips] Write `AFL_EXT_OCTEONP` flag to the `.MIPS.abiflags` section (Simon Atanasyan, 2019-11-07, 1 file, -1/+3)
|
| Differential Revision: https://reviews.llvm.org/D69851
* [mips] Support `octeon+` CPU in the `.set arch=` directive (Simon Atanasyan, 2019-11-07, 1 file, -2/+3)
|
| Differential Revision: https://reviews.llvm.org/D69850
* [mips] Implement Octeon+ `saa` and `saad` instructions (Simon Atanasyan, 2019-11-07, 10 files, -16/+130)
|
| `saa` and `saad` are 32-bit and 64-bit store atomic add instructions.
|
|   memory[base] = memory[base] + rt
|
| These instructions are available for "Octeon+" CPU. The patch adds support
| for both instructions to the MIPS assembler and disassembler and introduces
| a new CPU type, "octeon+".
|
| Next patches will implement the `.set arch=octeon+` directive and
| `AFL_EXT_OCTEONP` ISA extension flag support.
|
| Differential Revision: https://reviews.llvm.org/D69849
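Note: the semantics line `memory[base] = memory[base] + rt` can be modelled in portable C++ with an atomic add whose previous value is discarded. This is only a behavioural sketch: `saaModel` and `saadModel` are hypothetical names, and the relaxed memory ordering is an assumption of this example, not a statement about the ordering guarantees of the hardware instructions.

```cpp
#include <atomic>
#include <cstdint>

// 32-bit model: atomically add Rt to the word in memory; nothing is returned.
inline void saaModel(std::atomic<uint32_t> &Memory, uint32_t Rt) {
  Memory.fetch_add(Rt, std::memory_order_relaxed); // old value is ignored
}

// 64-bit model: the same operation on a doubleword.
inline void saadModel(std::atomic<uint64_t> &Memory, uint64_t Rt) {
  Memory.fetch_add(Rt, std::memory_order_relaxed); // old value is ignored
}
```

Because the old value is never observed, the operation behaves like a store rather than a read-modify-write that returns data, which matches the "store atomic add" naming.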
* Revert f0c2a5a "[LV] Generalize conditions for sinking instrs for first order recurrences." (Hans Wennborg, 2019-11-07, 1 file, -26/+14)
|
| It broke Chromium, causing "Instruction does not dominate all uses!"
| errors. See
| https://bugs.chromium.org/p/chromium/issues/detail?id=1022297#c1 for a
| reproducer.
|
| > If the recurrence PHI node has a single user, we can sink any
| > instruction without side effects, given that all users are dominated by
| > the instruction computing the incoming value of the next iteration
| > ('Previous'). We can sink instructions that may cause traps, because
| > that only causes the trap to occur later, but not on any new paths.
| >
| > With the relaxed check, we also have to make sure that we do not have a
| > direct cycle (meaning PHI user == 'Previous'), which indicates a
| > reduction relation, which potentially gets missed by
| > ReductionDescriptor.
| >
| > As follow-ups, we can also sink stores, iff they do not alias with
| > other instructions we move them across and we could also support sinking
| > chains of instructions and multiple users of the PHI.
| >
| > Fixes PR43398.
| >
| > Reviewers: hsaito, dcaballe, Ayal, rengolin
| >
| > Reviewed By: Ayal
| >
| > Differential Revision: https://reviews.llvm.org/D69228
* [AMDGPU] Fix bug introduced in 47a5c36b37f0 (dfukalov, 2019-11-07, 1 file, -1/+1)
|
| Summary: [AMDGPU] Fix bug introduced in 47a5c36b37f0
|
| Reviewers: foad, arsenm
|
| Reviewed By: arsenm
|
| Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69915
* [X86] Remove unused variable. NFC (Craig Topper, 2019-11-06, 1 file, -1/+0)
|
* [X86] Remove dead code from combineStore. (Craig Topper, 2019-11-06, 1 file, -44/+10)
|
| Leftovers from before we switched to widening legalization.
|
| Fixes PR43919.
* Temporarily Revert "[LV] Apply sink-after & interleave-groups as VPlan transformations (NFC)" (Eric Christopher, 2019-11-06, 5 files, -170/+125)
|
| as it's causing assert failures.
|
| This reverts commit 100e797adb433724a17c9b42b6533cd634cb796b.
* Keep import function list for inlinee profile update (Wenlei He, 2019-11-06, 2 files, -8/+16)
|
| Summary: When adjusting function entry counts after inlining,
| Function::setEntryCount is called without providing an import function
| list. The side effect of that is that the previously set import function
| list is dropped. The import function list is used by ThinLTO to help
| import hot cross-module callees for LTO inlining, so dropping it during
| ThinLTO pre-link may adversely affect LTO inlining. The fix is to keep the
| list while updating entry counts for inlining.
|
| Reviewers: wmi, davidxl, tejohnson
|
| Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits
|
| Tags: #llvm
|
| Differential Revision: https://reviews.llvm.org/D69736
* [AArch64][SVE] Add remaining patterns and intrinsics for add/sub/mad patterns (Danilo Carvalho Grael, 2019-11-06, 2 files, -23/+38)
|
| Add pattern matching and intrinsics for the following instructions:
|
| predicated orr, eor, and, bic
| predicated mul, smulh, umulh, sdiv, udiv, sdivr, udivr
| predicated smax, umax, smin, umin, sabd, uabd
| mad, msb, mla, mls
|
| https://reviews.llvm.org/D69588
* AMDGPU: Select global atomicrmw fadd (Matt Arsenault, 2019-11-06, 5 files, -13/+21)
|
| This only works if there is no use of the return value.
* Temporarily Revert: (Eric Christopher, 2019-11-06, 1 file, -169/+98)
|
| "[SLP] Generalization of stores vectorization."
| "[SLP] Fix -Wunused-variable. NFC"
| "[SLP] Vectorize jumbled stores."
|
| As they're causing significant (10-30x) compile time regressions on
| vectorizable code.
|
| The primary cause of the compile-time regression is
| f228b5371647f471853c5fb3e6719823a42fe451.
|
| This reverts commits:
|
| f228b5371647f471853c5fb3e6719823a42fe451
| 5503455ccb3f5fcedced158332c016c8d3a7fa81
| 21d498c9c0f32dcab5bc89ac593aa813b533b43a
* [AMDGPU] Add handling of 160 bit registers in analyzeResourceUsage (Stanislav Mekhanoshin, 2019-11-06, 1 file, -0/+7)
|
| This was omitted. Also SReg_96Reg missed IsSGPR assignment.
|
| Differential Revision: https://reviews.llvm.org/D69919