bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Revert "[LVI] Normalize pointer behavior"	Nikita Popov	2019-11-08	1	-94/+89
	This reverts commit 15bc4dc9a8949f9cffd46ec647baf0818d28fb28.
	The clang-cmake-x86_64-sde-avx512-linux buildbot reported quite a few compile-time regressions in test-suite; will investigate.
*	[LVI] Normalize pointer behavior	Nikita Popov	2019-11-08	1	-89/+94
	Related to D69686. As noted there, LVI currently behaves differently for integer and pointer values: for integers, the block value is always valid inside the basic block, while for pointers it is only valid at the end of the basic block. I believe the integer behavior is the correct one, and CVP relies on it via its getConstantRange() uses.

	The reason for the special pointer behavior is that LVI checks whether a pointer is dereferenced in a given basic block and marks it as non-null in that case. Of course, this information is valid only after the dereferencing instruction or, as a conservative approximation, at the end of the block.

	This patch changes the treatment of dereferenceability: instead of including it inside the block value, we treat it as something similar to an assume (it essentially is a non-nullness assume) and incorporate this information in intersectAssumeOrGuardBlockValueConstantRange() if the context instruction is the terminator of the basic block. This happens either when determining an edge value internally in LVI, or when a terminator was explicitly passed to getValueAt(). The latter case makes this change not fully NFC, because we can now fold terminator icmps based on the dereferenceability information in the same block. This is the reason why I changed one JumpThreading test (it would optimize the condition away without the change).

	Of course, we do not want to recompute dereferenceability on each intersectAssume call, so we need a new cache for this. The dereferenceability analysis requires walking the entire basic block and computing underlying objects of all memory operands. This was previously done separately for each queried pointer value. In the new implementation (both because this makes the caching simpler, and because it is faster), I instead walk the full BB only once and cache all the dereferenced pointers. So the traversal is now performed once per BB, instead of once per queried pointer value.

	I think the overall model now makes more sense than before, and there will be no more pitfalls due to differing integer/pointer behavior.

	Differential Revision: https://reviews.llvm.org/D69914
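	The "walk the block once, cache every dereferenced pointer" idea can be illustrated with a small standalone sketch. This is not LVI code: `ToyInst`, `ToyBlock` and `DerefCache` are hypothetical stand-ins for LLVM's instruction, basic block and cache types, and pointers are modelled as plain names rather than underlying objects.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <utility>
#include <vector>

// Toy stand-ins for IR; these types do not exist in LLVM under these names.
struct ToyInst {
  bool accessesMemory;  // true for a load/store-like instruction
  std::string pointer;  // name of the pointer operand it dereferences
};
using ToyBlock = std::vector<ToyInst>;

class DerefCache {
public:
  // Returns true if `ptr` is dereferenced somewhere in `bb`. As in the commit
  // message, such a fact is only usable at the end of the block.
  bool isDereferencedIn(const ToyBlock *bb, const std::string &ptr) {
    auto it = cache.find(bb);
    if (it == cache.end()) {
      // First query for this block: walk it once and record every pointer.
      ++walks;
      std::unordered_set<std::string> derefs;
      for (const ToyInst &inst : *bb)
        if (inst.accessesMemory)
          derefs.insert(inst.pointer);
      it = cache.emplace(bb, std::move(derefs)).first;
    }
    return it->second.count(ptr) != 0;
  }

  int walks = 0;  // number of full block traversals performed so far

private:
  std::unordered_map<const ToyBlock *, std::unordered_set<std::string>> cache;
};
```

	Querying several pointers against the same block leaves `walks` at 1, which is the "once per BB instead of once per queried pointer value" property described above.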
*	TimeTraceProfiler - fix uninitialized variable warning. NFCI.	Simon Pilgrim	2019-11-08	1	-3/+3
*	[LICM] Support hoisting of dynamic allocas out of loops	Philip Reames	2019-11-08	1	-0/+45
	This patch implements a correct, but not terribly useful, transform. In particular, if we have a dynamic alloca in a loop which is guaranteed to execute, and provably not captured, we hoist the alloca out of the loop. The capture tracking is needed so that we can prove that each previous stack region dies before the next one is allocated. The transform decreases the amount of stack allocation needed by a linear factor (e.g. the iteration count of the loop).

	Now, I really hope no one is actually using dynamic allocas. As such, why this patch? Well, the actual problem I'm hoping to make progress on is allocation hoisting. There's a large draft patch out for review (https://reviews.llvm.org/D60056), and this patch was the smallest chunk of testable functionality I could come up with which takes a step vaguely in that direction.

	Once this is in, it makes motivating the changes to capture tracking mentioned in TODOs testable. After that, I hope to extend this to trivial malloc free regions (i.e. free dominating all loop exits) and allocation functions for GCed languages.

	Differential Revision: https://reviews.llvm.org/D69227
*	[LICM] Hoisting of widenable conditions out of loops	Philip Reames	2019-11-08	1	-0/+4
	The change itself is straightforward and obvious, but ... there's an existing test checking for exactly the opposite. Both Artur and I think this is simply conservatism in the initial implementation. If anyone bisects a problem to this, a counterexample will be very interesting.

	Differential Revision: https://reviews.llvm.org/D69907
*	[CostModel] Fixed isExtractSubvectorMask for undef index off end	Tim Renouf	2019-11-08	1	-1/+1
	ShuffleVectorInst::isExtractSubvectorMask, introduced in

	    [CostModel] Add SK_ExtractSubvector handling to getInstructionThroughput (PR39368)

	erroneously thought that

	    %340 = shufflevector <4 x float> %339, <4 x float> undef, <3 x i32> <i32 2, i32 3, i32 undef>

	is a subvector extract, even though it goes off the end of the parent vector with the undef index. That then caused an assert in BasicTTIImplBase::getExtractSubvectorOverhead.

	This commit fixes that, by not considering the above a subvector extract.

	Differential Revision: https://reviews.llvm.org/D70005
	Change-Id: I87b8b00b24bef19ffc9a1b82ef4eca3b8a246eaf
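	The bounds condition described above can be shown with a simplified, standalone mask check. This is a sketch, not the LLVM implementation: `isExtractSubvectorMaskToy` is a hypothetical name, undef lanes are encoded as -1, and only indices into the first source vector are accepted.

```cpp
#include <vector>

// Returns true and sets `index` if `mask` selects a contiguous run of
// `mask.size()` lanes that lies entirely inside a source of `numSrcElts` lanes.
// Undef lanes are encoded as -1 and match any position.
bool isExtractSubvectorMaskToy(const std::vector<int> &mask, int numSrcElts,
                               int &index) {
  const int maskSize = static_cast<int>(mask.size());
  if (maskSize >= numSrcElts)
    return false;  // not smaller than the source, so not an extract

  int start = -1;
  for (int i = 0; i < maskSize; ++i) {
    const int m = mask[i];
    if (m < 0)
      continue;  // undef lane
    if (m >= numSrcElts)
      return false;  // second operand: out of scope for this sketch
    const int offset = m - i;
    if (offset < 0 || (start >= 0 && start != offset))
      return false;  // lanes are not one contiguous run
    start = offset;
  }
  if (start < 0)
    return false;  // all lanes undef

  // The check this commit is about: trailing undef lanes must not push the
  // extracted run past the end of the parent vector.
  if (start + maskSize > numSrcElts)
    return false;

  index = start;
  return true;
}
```

	With a 4-lane source, the mask <2, 3, undef> from the commit message is rejected because the run would cover lanes 2 to 4, while <2, 3> is accepted with index 2.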
*	[PowerPC] Remove redundant CRSET/CRUNSET in custom lowering of known CR bit spills	Yi-Hong Lyu	2019-11-08	3	-3/+37
	We lower known CR bit spills (CRSET/CRUNSET) to load and spill the known value but forgot to remove the redundant spills.

	For example, this sequence was used to spill a CRUNSET:

	    crclr 4*cr5+lt
	    mfocrf r3,4
	    rlwinm r3,r3,20,0,0
	    stw r3,132(r1)

	Custom lowering of known CR bit spills lowers it to:

	    crxor 4*cr5+lt, 4*cr5+lt, 4*cr5+lt
	    li r3,0
	    stw r3,132(r1)

	The crxor is redundant if there is no use of 4*cr5+lt, so we should remove it.

	Differential revision: https://reviews.llvm.org/D67722
*	raw_ostream - fix static analyzer warnings. NFCI.	Simon Pilgrim	2019-11-08	1	-6/+6
	- uninitialized variables
	- make BufferKind a scoped enum class
*	[NFC] ConstantRange::subWithNoWrap(): fixup comment	Roman Lebedev	2019-11-08	1	-1/+1
*	[ConstantRange] Add umul_sat()/smul_sat() methods	Roman Lebedev	2019-11-08	1	-0/+35
	Summary: To be used in `ConstantRange::mulWithNoOverflow()`; may in future be useful for when saturating shift/mul ops are added.

	These are precise as far as I can tell.

	I initially thought I would need `APInt::[us]mul_sat()` for these, but it turned out much simpler to do what `ConstantRange::multiply()` does: perform the multiplication in twice the bitwidth, and then truncate. Though here we want saturating signed truncation.

	Reviewers: nikic, reames, spatel
	Reviewed By: nikic
	Subscribers: hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D69994
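	The "multiply in twice the bitwidth, then truncate with saturation" step can be sketched on plain 8-bit scalars. This only models the per-value operation, not the range logic in ConstantRange; `umul_sat8` and `smul_sat8` are hypothetical helper names.

```cpp
#include <cstdint>

// Unsigned saturating multiply: compute in 16 bits, clamp to [0, 255].
uint8_t umul_sat8(uint8_t a, uint8_t b) {
  const uint16_t wide = static_cast<uint16_t>(static_cast<unsigned>(a) * b);
  return wide > 0xFF ? static_cast<uint8_t>(0xFF) : static_cast<uint8_t>(wide);
}

// Signed saturating multiply: compute in 16 bits, clamp to [-128, 127].
int8_t smul_sat8(int8_t a, int8_t b) {
  const int16_t wide = static_cast<int16_t>(static_cast<int>(a) * b);
  if (wide > INT8_MAX)
    return INT8_MAX;
  if (wide < INT8_MIN)
    return INT8_MIN;
  return static_cast<int8_t>(wide);
}
```

	The product of two 8-bit values always fits in 16 bits, so the wide multiplication itself cannot overflow and all saturation happens in the final clamp.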
*	[APInt] Add saturating truncation methods	Roman Lebedev	2019-11-08	1	-0/+25
	Summary: The signed one is needed for implementation of `ConstantRange::smul_sat()`, unsigned is for completeness only.

	Reviewers: nikic, RKSimon, spatel
	Reviewed By: nikic
	Subscribers: hiraditya, dexonsmith, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D69993
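	A saturating truncation clamps a value to the range of the narrower type instead of discarding high bits. The following sketch works on 64-bit scalars rather than APInt; `trunc_ssat` and `trunc_usat` are hypothetical names, and `bits` is assumed to be between 1 and 63.

```cpp
#include <cstdint>

// Signed saturating truncation of `v` to `bits` bits (1 <= bits <= 63).
int64_t trunc_ssat(int64_t v, unsigned bits) {
  const int64_t max = (int64_t{1} << (bits - 1)) - 1;
  const int64_t min = -max - 1;
  if (v > max)
    return max;
  if (v < min)
    return min;
  return v;
}

// Unsigned saturating truncation of `v` to `bits` bits (1 <= bits <= 63).
uint64_t trunc_usat(uint64_t v, unsigned bits) {
  const uint64_t max = (uint64_t{1} << bits) - 1;
  return v > max ? max : v;
}
```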
*	[llvm-xray] Add AArch64 to llvm-xray extract	Aditya Kumar	2019-11-08	1	-5/+17
	This required adding support for resolving R_AARCH64_ABS64 relocations to get accurate addresses for function names to resolve.

	Authored by: ianlevesque (Ian Levesque)
	Reviewers: dberris, phosek, smeenai, tetsuo-cpp
	Differential Revision: https://reviews.llvm.org/D69967
*	[XCOFF][AIX] Differentiate usage of label symbol and csect symbol	Jason Liu	2019-11-08	11	-74/+82
	Summary: Until now we have used symbols to represent labels and csects interchangeably, which can be a problem. In some cases we need to add a storage mapping class to a symbol if that symbol is actually the name of a csect, but it is hard to figure out whether the symbol is a label or a csect.

	This patch intends to do the following:
	1. Construct a QualName (a name that includes the storage mapping class) MCSymbolXCOFF for every MCSectionXCOFF.
	2. Keep a pointer to that QualName inside MCSectionXCOFF.
	3. Use that QualName whenever we need a symbol that refers to that MCSectionXCOFF.
	4. Adapt to the snowball effect of the above changes in XCOFFObjectWriter.cpp.

	Reviewers: xingxue, DiggerLin, sfertile, daltenty, hubert.reinterpretcast
	Reviewed By: DiggerLin, daltenty
	Subscribers: wuzish, nemanjai, mgorny, hiraditya, kbarton, jsji, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D69633
*	[AMDGPU][MC] Corrected src0 for v_movrelsd_b32 and v_movrelsd_2_b32	Dmitry Preobrazhensky	2019-11-08	1	-6/+8
	See https://bugs.llvm.org/show_bug.cgi?id=40903

	Reviewers: arsenm, rampitec
	Differential Revision: https://reviews.llvm.org/D69888
*	[LV] Apply sink-after & interleave-groups as VPlan transformations (NFCI)	Gil Rapaport	2019-11-08	5	-125/+170
	This recommits 100e797adb433724a17c9b42b6533cd634cb796b (reverted in 009e032634b3bd7fc32071ac2344b12142286477 for failing an assert). While the root cause was independently reverted in eaff3004019f97c64c88ab76da6b25106b659b30, this commit includes a LIT test to make sure IVDescriptor's SinkAfter logic does not try to sink branch instructions.
*	BinaryStream - fix static analyzer warnings. NFCI.	Simon Pilgrim	2019-11-08	1	-4/+4
	- uninitialized variables
	- documentation warnings
	- shadow variable names
*	Reland: [TII] Use optional destination and source pair as a return value; NFC	Djordje Todorovic	2019-11-08	12	-106/+76
	Refactor usage of isCopyInstrImpl, isCopyInstr and isAddImmediate methods to return optional machine operand pair of destination and source registers.

	Patch by Nikola Prica
	Differential Revision: https://reviews.llvm.org/D69622
*	Revert d91ed80 "[codeview] Reference types in type parent scopes"	Hans Wennborg	2019-11-08	2	-35/+14
	This triggered asserts in the Chromium build, see https://crbug.com/1022729 for details and reproducer.

	> Without this change, when a nested tag type of any kind (enum, class,
	> struct, union) is used as a variable type, it is emitted without
	> emitting the parent type. In CodeView, parent types point to their inner
	> types, and inner types do not point back to their parents. We already
	> walk over all of the parent scopes to build the fully qualified name.
	> This change simply requests their type indices as we go along to ensure
	> they are all emitted.
	>
	> Fixes PR43905
	>
	> Reviewers: akhuang, amccarth
	>
	> Differential Revision: https://reviews.llvm.org/D69924
*	[RAGreedy] Enable -consider-local-interval-cost for AArch64	Sanne Wouda	2019-11-08	2	-2/+6
	Summary: The greedy register allocator occasionally decides to insert a large number of unnecessary copies, see below for an example. The -consider-local-interval-cost option (which X86 already enables by default) fixes this. We enable this option for AArch64 only after receiving feedback that this change is not beneficial for PowerPC.

	We evaluated the impact of this change on compile time, code size and performance benchmarks.

	This option has a small impact on compile time, measured on CTMark. A 0.1% geomean regression on -O1 and -O2, and 0.2% geomean for -O3, with at most 0.5% on individual benchmarks.

	The effect on both code size and performance on AArch64 for the LLVM test suite is nil on the geomean with individual outliers (ignoring short exec_times) between:

	                 best     worst
	    size..text   -3.3%    +0.0%
	    exec_time    -5.8%    +2.3%

	On SPEC CPU® 2017 (compiled for AArch64) there is a minor reduction (-0.2% at most) in code size on some benchmarks, with a tiny movement (-0.01%) on the geomean. Neither intrate nor fprate show any change in performance.

	This patch makes the following changes.
	- For the AArch64 target, enableAdvancedRASplitCost() now returns true.
	- Ensures that -consider-local-interval-cost=false can disable the new behaviour if necessary.

	This matrix multiply example:

	    $ cat test.c
	    long A[8][8];
	    long B[8][8];
	    long C[8][8];

	    void run_test() {
	      for (int k = 0; k < 8; k++) {
	        for (int i = 0; i < 8; i++) {
	          for (int j = 0; j < 8; j++) {
	            C[i][j] += A[i][k] * B[k][j];
	          }
	        }
	      }
	    }

	results in the following generated code on AArch64:

	    $ clang --target=aarch64-arm-none-eabi -O3 -S test.c -o -
	    [...]
	                                    // %for.cond1.preheader
	                                    // =>This Inner Loop Header: Depth=1
	        add x14, x11, x9
	        str q0, [sp, #16]           // 16-byte Folded Spill
	        ldr q0, [x14]
	        mov v2.16b, v15.16b
	        mov v15.16b, v14.16b
	        mov v14.16b, v13.16b
	        mov v13.16b, v12.16b
	        mov v12.16b, v11.16b
	        mov v11.16b, v10.16b
	        mov v10.16b, v9.16b
	        mov v9.16b, v8.16b
	        mov v8.16b, v31.16b
	        mov v31.16b, v30.16b
	        mov v30.16b, v29.16b
	        mov v29.16b, v28.16b
	        mov v28.16b, v27.16b
	        mov v27.16b, v26.16b
	        mov v26.16b, v25.16b
	        mov v25.16b, v24.16b
	        mov v24.16b, v23.16b
	        mov v23.16b, v22.16b
	        mov v22.16b, v21.16b
	        mov v21.16b, v20.16b
	        mov v20.16b, v19.16b
	        mov v19.16b, v18.16b
	        mov v18.16b, v17.16b
	        mov v17.16b, v16.16b
	        mov v16.16b, v7.16b
	        mov v7.16b, v6.16b
	        mov v6.16b, v5.16b
	        mov v5.16b, v4.16b
	        mov v4.16b, v3.16b
	        mov v3.16b, v1.16b
	        mov x12, v0.d[1]
	        fmov x15, d0
	        ldp q1, q0, [x14, #16]
	        ldur x1, [x10, #-256]
	        ldur x2, [x10, #-192]
	        add x9, x9, #64             // =64
	        mov x13, v1.d[1]
	        fmov x16, d1
	        ldr q1, [x14, #48]
	        mul x3, x15, x1
	        mov x14, v0.d[1]
	        fmov x17, d0
	        mov x18, v1.d[1]
	        fmov x0, d1
	        mov v1.16b, v3.16b
	        mov v3.16b, v4.16b
	        mov v4.16b, v5.16b
	        mov v5.16b, v6.16b
	        mov v6.16b, v7.16b
	        mov v7.16b, v16.16b
	        mov v16.16b, v17.16b
	        mov v17.16b, v18.16b
	        mov v18.16b, v19.16b
	        mov v19.16b, v20.16b
	        mov v20.16b, v21.16b
	        mov v21.16b, v22.16b
	        mov v22.16b, v23.16b
	        mov v23.16b, v24.16b
	        mov v24.16b, v25.16b
	        mov v25.16b, v26.16b
	        mov v26.16b, v27.16b
	        mov v27.16b, v28.16b
	        mov v28.16b, v29.16b
	        mov v29.16b, v30.16b
	        mov v30.16b, v31.16b
	        mov v31.16b, v8.16b
	        mov v8.16b, v9.16b
	        mov v9.16b, v10.16b
	        mov v10.16b, v11.16b
	        mov v11.16b, v12.16b
	        mov v12.16b, v13.16b
	        mov v13.16b, v14.16b
	        mov v14.16b, v15.16b
	        mov v15.16b, v2.16b
	        ldr q2, [sp]                // 16-byte Folded Reload
	        fmov d0, x3
	        mul x3, x12, x1
	    [...]

	With -consider-local-interval-cost the same section of code results in the following:

	    $ clang --target=aarch64-arm-none-eabi -mllvm -consider-local-interval-cost -O3 -S test.c -o -
	    [...]
	    .LBB0_1:                        // %for.cond1.preheader
	                                    // =>This Inner Loop Header: Depth=1
	        add x14, x11, x9
	        ldp q0, q1, [x14]
	        ldur x1, [x10, #-256]
	        ldur x2, [x10, #-192]
	        add x9, x9, #64             // =64
	        mov x12, v0.d[1]
	        fmov x15, d0
	        mov x13, v1.d[1]
	        fmov x16, d1
	        ldp q0, q1, [x14, #32]
	        mul x3, x15, x1
	        cmp x9, #512                // =512
	        mov x14, v0.d[1]
	        fmov x17, d0
	        fmov d0, x3
	        mul x3, x12, x1
	    [...]

	Reviewers: SjoerdMeijer, samparker, dmgreen, qcolombet
	Reviewed By: dmgreen
	Subscribers: ZhangKang, jsji, wuzish, ppc-slack, lkail, steven.zhang, MatzeB, qcolombet, kristof.beyls, hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D69437
*	[RISCV] Fix evaluation of %pcrel_lo	Roger Ferrer Ibanez	2019-11-08	1	-3/+7
	The following testcase

	    function:
	    .Lpcrel_label1:
	        auipc a0, %pcrel_hi(other_function)
	        addi a1, a0, %pcrel_lo(.Lpcrel_label1)
	        .p2align 2          # Causes a new fragment to be emitted

	        .type other_function,@function
	    other_function:
	        ret

	exposes an odd behaviour in which only the %pcrel_hi relocation is evaluated but not the %pcrel_lo.

	    $ llvm-mc -triple riscv64 -filetype obj t.s | llvm-objdump -d -r -

	    <stdin>: file format ELF64-riscv

	    Disassembly of section .text:
	    0000000000000000 function:
	           0: 17 05 00 00  auipc a0, 0
	           4: 93 05 05 00  mv a1, a0
	                    0000000000000004: R_RISCV_PCREL_LO12_I other_function+4

	    0000000000000008 other_function:
	           8: 67 80 00 00  ret

	The reason seems to be that in RISCVAsmBackend::shouldForceRelocation we only consider the fragment, but in RISCVMCExpr::evaluatePCRelLo we consider the section. This usually works, but there are cases where the section may still be the same while the fragment is a different one. In that case we end up forcing a %pcrel_lo relocation without any %pcrel_hi.

	This patch makes RISCVAsmBackend::shouldForceRelocation use the section, if any, to determine if the relocation must be forced or not.

	Differential Revision: https://reviews.llvm.org/D60657
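	For background on why the two relocations have to be evaluated together: %pcrel_hi and %pcrel_lo are two halves of one PC-relative offset, measured from the auipc instruction. The sketch below shows the usual split arithmetic on a plain 32-bit offset. It is not code from this patch, and `PcrelParts` and `splitPcrel` are hypothetical names.

```cpp
#include <cstdint>

struct PcrelParts {
  int32_t hi20;  // immediate for auipc (upper 20 bits, after rounding)
  int32_t lo12;  // signed 12-bit immediate for the paired addi/load/store
};

// Splits a PC-relative byte offset so that (hi20 << 12) + lo12 == offset.
// The low part is sign-extended, so the high part is rounded accordingly.
PcrelParts splitPcrel(int32_t offset) {
  int32_t lo = static_cast<int32_t>(static_cast<uint32_t>(offset) & 0xFFFu);
  if (lo >= 0x800)
    lo -= 0x1000;  // sign-extend the low 12 bits
  const int64_t hi = (static_cast<int64_t>(offset) - lo) / 4096;
  return {static_cast<int32_t>(hi), lo};
}
```

	In the testcase above the auipc sits at address 0 and other_function at address 8, so under this arithmetic the offset 8 splits into a high part of 0 and a low part of 8.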
*	[NFC][IndVarS] Adjust a comment	Daniil Suchkov	2019-11-08	1	-1/+1
	(test commit)
*	[CR] ConstantRange::sshl_sat(): check signedness of the min/max, not ranges	Roman Lebedev	2019-11-08	1	-2/+2
	This was pointed out in review, but I forgot to stage this change into the commit itself.
*	[ConstantRange] Add `ushl_sat()`/`sshl_sat()` methods.	Roman Lebedev	2019-11-08	1	-0/+20
	Summary: To be used in `ConstantRange::shlWithNoOverflow()`, may in future be useful for when saturating shift/mul ops are added. Unlike `ConstantRange::shl()`, these are precise.

	Reviewers: nikic, spatel, reames
	Reviewed By: nikic
	Subscribers: hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D69960
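	As an illustration of what a saturating left shift computes for a single value (not the range logic itself), here is a sketch on 8-bit scalars. `ushl_sat8` and `sshl_sat8` are hypothetical helpers, and the shift amount is assumed to be less than 8.

```cpp
#include <cstdint>

// Unsigned saturating shift left; `s` must be < 8.
uint8_t ushl_sat8(uint8_t x, unsigned s) {
  const uint32_t wide = static_cast<uint32_t>(x) << s;
  return wide > 0xFF ? static_cast<uint8_t>(0xFF) : static_cast<uint8_t>(wide);
}

// Signed saturating shift left; `s` must be < 8. The shift is written as a
// multiplication by 2^s so that negative inputs are well defined.
int8_t sshl_sat8(int8_t x, unsigned s) {
  const int32_t wide = static_cast<int32_t>(x) * (int32_t{1} << s);
  if (wide > INT8_MAX)
    return INT8_MAX;
  if (wide < INT8_MIN)
    return INT8_MIN;
  return static_cast<int8_t>(wide);
}
```

	On overflow the signed variant saturates toward the sign of the input: negative values clamp to -128 and non-negative values to 127.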
*	[BPF] turn on -mattr=+alu32 for cpu version v3 and later	Yonghong Song	2019-11-07	1	-0/+1
	-mattr=+alu32 has shown good performance vs. without this attribute. Based on the discussion at https://lore.kernel.org/bpf/1ec37838-966f-ec0b-5223-ca9b6eb0860d@fb.com/T/#t, cpu version v3 should support -mattr=+alu32. This patch enables alu32 if the cpu version is v3, either specified by the user or probed by LLVM.

	Differential Revision: https://reviews.llvm.org/D69957
*	[PowerPC] Option for enabling absolute jumptables with command line	Nemanja Ivanovic	2019-11-07	1	-0/+5
	This option allows the user to specify the use of absolute jumptables instead of relative which is the default on most PPC subtargets.

	Patch by Kamauu Bridgeman
	Differential revision: https://reviews.llvm.org/D69108
*	[InstCombine] Don't transform bitcasts between x86_mmx and v1i64 into insertelement/extractelement	Craig Topper	2019-11-07	1	-2/+3
	x86_mmx is conceptually a vector already. Don't introduce an extra conversion between it and scalar i64.

	I'm using VectorType::isValidElementType, which checks for floating point, integer, and pointer types, to hopefully make this more readable than just blacklisting x86_mmx.

	Differential Revision: https://reviews.llvm.org/D69964
*	[debugify] Move the Debugify pass from tools/opt to lib/Transform/Utils	Daniel Sanders	2019-11-07	2	-0/+435
	Summary: I need to make use of this pass from a driver program that isn't opt. Therefore this patch moves this pass into the LLVM library so that it is available for use elsewhere.

	There was one function I kept in tools/opt, exportDebugifyStats(), because it serializes the statistics into a human-readable format, which seemed more in keeping with opt than with a library function.

	Reviewers: vsk, aprantl
	Subscribers: mgorny, hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D69926
*	Revert "[MachineVerifier] Improve verification of live-in lists.	Galina Kistanova	2019-11-07	1	-26/+0
	This reverts commit b7b170c to give the author more time to address failing tests on the expensive checks buildbots.
*	[codeview] Reference types in type parent scopes	Reid Kleckner	2019-11-07	2	-14/+35
	Without this change, when a nested tag type of any kind (enum, class, struct, union) is used as a variable type, it is emitted without emitting the parent type. In CodeView, parent types point to their inner types, and inner types do not point back to their parents. We already walk over all of the parent scopes to build the fully qualified name. This change simply requests their type indices as we go along to ensure they are all emitted.

	Fixes PR43905

	Reviewers: akhuang, amccarth
	Differential Revision: https://reviews.llvm.org/D69924
*	Wrong debug info generated at -O2 (-O0 is correct)	Vedant Kumar	2019-11-07	3	-3/+7
	The InstCombine pass was erasing trivially dead instructions without updating the dependent llvm.dbg.value intrinsics, so the debugger did not show the programmer the current state of variables.

	As part of this fix, we iterate through all the debug users (llvm.dbg) of a trivially dead instruction and set each of them to undef before deleting the instruction. The user will now see "optimized out" when trying to print those variables.

	This fixes https://bugs.llvm.org/show_bug.cgi?id=43893

	This is my first fix to LLVM.

	Patch by kamlesh kumar!
	Differential Revision: https://reviews.llvm.org/D69809
*	[AsmWritter] Fixed "null check after dereferencing" warning	Dávid Bolvanský	2019-11-07	1	-4/+2
	Summary: The 'BB->getParent()' pointer was utilized before it was verified against nullptr. Check lines: 3567, 3581.

	Reviewers: jyknight, RKSimon
	Subscribers: hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D69751
*	Revert "[XCOFF] Fix link errors from explicit template instantiation"	Reid Kleckner	2019-11-07	1	-4/+0
	This reverts commit c989993ba1a666f04f7aee7df51d9f4de0588b71.
	maskray already fixed the explicit instantiation definition in the .cpp file, and these extern template declarations seem to be causing warnings that I don't understand.
*	[XCOFF] Fix link errors from explicit template instantiation	Reid Kleckner	2019-11-07	1	-0/+4
	I happen to be using clang-cl+lld-link locally, and I get these link errors:

	    lld-link: error: undefined symbol: public: unsigned short __cdecl llvm::object::XCOFFSectionHeader<struct llvm::object::XCOFFSectionHeader64>::getSectionType(void) const
	    >>> referenced by C:\src\llvm-project\llvm\tools\llvm-readobj\XCOFFDumper.cpp:106
	    >>>               tools\llvm-readobj\CMakeFiles\llvm-readobj.dir\XCOFFDumper.cpp.obj:(public: virtual void __cdecl `anonymous namespace'::XCOFFDumper::printSectionHeaders(void))

	I suspect this is because the explicit template instantiation appears before the inline method definitions in the .cpp file, so they aren't available at the point of instantiation. Move the explicit instantiation later.

	Also, forward declare the explicit instantiation for good measure.
*	[XCOFF] Move explicit instantiations after member function definitions to fix clang builds	Fangrui Song	2019-11-07	1	-4/+4
*	[InstCombine] canonicalize shift+logic+shift to reduce dependency chain	Sanjay Patel	2019-11-07	1	-0/+46
	    shift (logic (shift X, C0), Y), C1 --> logic (shift X, C0+C1), (shift Y, C1)

	This is an IR translation of an existing SDAG transform added here: rL370617

	So we again have 9 possible patterns with a commuted IR variant of each pattern:
	https://rise4fun.com/Alive/VlI
	https://rise4fun.com/Alive/n1m
	https://rise4fun.com/Alive/1Vn

	Part of the motivation is to allow easier recognition and subsequent canonicalization of bswap patterns as discussed in PR43146:
	https://bugs.llvm.org/show_bug.cgi?id=43146

	We had to delay this transform because it used to allow the SLP vectorizer to create awful reductions out of simple load-combines. That problem was fixed with: rL375025 (we'll bring back load combining in IR someday...)

	The backend is also better equipped to deal with these patterns now using hooks like TLI.getShiftAmountThreshold().

	The only remaining potential controversy is that the -reassociate pass tends to reverse this kind of pattern (to help GVN?). But since -reassociate doesn't do anything with these specific patterns, there is no conflict currently.

	Finally, there's a new pass proposal at D67383 for general tree-height-reduction reassociation, and it could use a cost model to decide how to optimally rearrange these kinds of ops for a target. That patch appears to be stalled.

	Differential Revision: https://reviews.llvm.org/D69842
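	One instance of the pattern, with shl as the shift and and as the logic op, can be checked on 32-bit scalars. The function names are hypothetical, and the sketch assumes C0 + C1 is less than the bit width, since shifting a 32-bit value by 32 or more is undefined in C++.

```cpp
#include <cstdint>

// Original form: shift (logic (shift X, C0), Y), C1
uint32_t shiftLogicShift(uint32_t x, uint32_t y, unsigned c0, unsigned c1) {
  return ((x << c0) & y) << c1;
}

// Canonical form: logic (shift X, C0+C1), (shift Y, C1)
// Requires c0 + c1 < 32.
uint32_t logicOfShifts(uint32_t x, uint32_t y, unsigned c0, unsigned c1) {
  return (x << (c0 + c1)) & (y << c1);
}
```

	In the first form the two shifts are chained through the logic op. In the second form the two shifts are independent of each other, which is the shorter dependency chain the commit title refers to.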
*	X86FrameLowering - fix bool to unsigned cast static analyzer warnings. NFCI.	Simon Pilgrim	2019-11-07	1	-7/+7
*	PostRAScheduler - fix uninitialized variable warning. NFCI.	Simon Pilgrim	2019-11-07	1	-1/+1
*	ManagedStringPool - pre-increment iterator. NFC.	Simon Pilgrim	2019-11-07	1	-1/+1
*	X86CondBrFolding - remove non-existent fixBranchProb function. NFC.	Simon Pilgrim	2019-11-07	1	-2/+0
*	Using crtp to refactor the xcoff section header	diggerlin	2019-11-07	1	-8/+19
	Summary: As requested in https://reviews.llvm.org/D68575#inline-617586, this is an NFC patch that:
	- uses CRTP to refactor the XCOFF section header;
	- moves the definitions of SectionFlagsReservedMask and SectionFlagsTypeMask from XCOFFDumper.cpp to XCOFFObjectFile.h.

	Reviewers: hubert.reinterpretcast, jasonliu
	Subscribers: rupprecht, seiyai, hiraditya
	Differential Revision: https://reviews.llvm.org/D69131
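	For readers unfamiliar with CRTP (the curiously recurring template pattern), a minimal sketch of sharing one accessor between a 32-bit and a 64-bit header layout follows. The type names, field layout and the mask value are illustrative assumptions, not the definitions in XCOFFObjectFile.h.

```cpp
#include <cstdint>

// Hypothetical mask for this sketch only.
constexpr uint32_t kSectionTypeMask = 0xffffu;

// CRTP base: the accessor is written once and reads the Flags field of
// whichever concrete header type derives from it, without virtual calls.
template <typename Derived> struct SectionHeaderBase {
  uint16_t sectionType() const {
    const Derived &self = static_cast<const Derived &>(*this);
    return static_cast<uint16_t>(self.Flags & kSectionTypeMask);
  }
};

// Two layouts with differently sized fields sharing the same accessor.
struct SectionHeader32 : SectionHeaderBase<SectionHeader32> {
  uint32_t Size;
  uint32_t Flags;
};

struct SectionHeader64 : SectionHeaderBase<SectionHeader64> {
  uint64_t Size;
  uint32_t Flags;
};
```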
*	comment shiftamountthreshold	joanlluch	2019-11-07	1	-0/+1
*	[SDAG] reduce code duplication; NFC	Sanjay Patel	2019-11-07	1	-18/+11
*	[SDAG] reduce code duplication; NFC	Sanjay Patel	2019-11-07	1	-4/+4
*	[ConstantRange][LVI] Use overflow flags from `sub` to constrain the range	Roman Lebedev	2019-11-07	1	-0/+2
	Summary: This notably improves non-negativity deduction:
```
| statistic | old | new | delta | % change |
| correlated-value-propagation.NumAShrs | 209 | 227 | 18 | 8.6124% |
| correlated-value-propagation.NumAddNSW | 4972 | 4988 | 16 | 0.3218% |
| correlated-value-propagation.NumAddNUW | 7141 | 7148 | 7 | 0.0980% |
| correlated-value-propagation.NumAddNW | 12113 | 12136 | 23 | 0.1899% |
| correlated-value-propagation.NumAnd | 442 | 445 | 3 | 0.6787% |
| correlated-value-propagation.NumNSW | 7160 | 7176 | 16 | 0.2235% |
| correlated-value-propagation.NumNUW | 13306 | 13316 | 10 | 0.0752% |
| correlated-value-propagation.NumNW | 20466 | 20492 | 26 | 0.1270% |
| correlated-value-propagation.NumSDivs | 207 | 212 | 5 | 2.4155% |
| correlated-value-propagation.NumSExt | 6279 | 6679 | 400 | 6.3704% |
| correlated-value-propagation.NumSRems | 28 | 29 | 1 | 3.5714% |
| correlated-value-propagation.NumShlNUW | 2793 | 2796 | 3 | 0.1074% |
| correlated-value-propagation.NumShlNW | 3964 | 3967 | 3 | 0.0757% |
| correlated-value-propagation.NumUDivs | 353 | 358 | 5 | 1.4164% |
| instcount.NumAShrInst | 13763 | 13741 | -22 | -0.1598% |
| instcount.NumAddInst | 277349 | 277348 | -1 | -0.0004% |
| instcount.NumLShrInst | 27437 | 27463 | 26 | 0.0948% |
| instcount.NumOrInst | 102677 | 102678 | 1 | 0.0010% |
| instcount.NumSDivInst | 8732 | 8727 | -5 | -0.0573% |
| instcount.NumSExtInst | 80872 | 80468 | -404 | -0.4996% |
| instcount.NumSRemInst | 1679 | 1678 | -1 | -0.0596% |
| instcount.NumTruncInst | 62154 | 62153 | -1 | -0.0016% |
| instcount.NumUDivInst | 2526 | 2527 | 1 | 0.0396% |
| instcount.NumURemInst | 1589 | 1590 | 1 | 0.0629% |
| instcount.NumZExtInst | 69405 | 69809 | 404 | 0.5821% |
| instcount.TotalInsts | 7439575 | 7439574 | -1 | 0.0000% |
```
	Reviewers: nikic, reames, spatel
	Reviewed By: nikic
	Subscribers: hiraditya, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D69942
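	To give a feel for how a no-wrap flag on `sub` can tighten a result range, here is a deliberately simplified sketch using 8-bit unsigned intervals and the `nuw` flag. The commit itself concerns ConstantRange and LVI. `URange` and `subRange` are hypothetical, ranges are inclusive and non-wrapping, and the result is a sound over-approximation rather than the exact range.

```cpp
#include <cstdint>

// Inclusive, non-wrapping unsigned 8-bit interval [lo, hi].
struct URange {
  uint8_t lo;
  uint8_t hi;
};

// Range of a - b for a in `a` and b in `b`. If `nuw` is set, the subtraction
// is known not to wrap, so results come only from pairs with a >= b.
URange subRange(URange a, URange b, bool nuw) {
  if (a.lo >= b.hi)  // wrapping is impossible for every pair
    return {static_cast<uint8_t>(a.lo - b.hi), static_cast<uint8_t>(a.hi - b.lo)};
  if (nuw && a.hi >= b.lo)  // some pairs would wrap, but the flag rules them out
    return {0, static_cast<uint8_t>(a.hi - b.lo)};
  return {0, 255};  // may wrap (or is always poison): fall back to the full range
}
```

	For two operands in [0, 10], the flagged subtraction is bounded by [0, 10], whereas without the flag the result may wrap and only the full range [0, 255] is safe.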
*	[ThinLTO] Import readonly vars with refs	evgeny	2019-11-07	5	-14/+48
	Patch allows importing declarations of functions and variables, referenced by the initializer of some other readonly variable.

	Differential revision: https://reviews.llvm.org/D69561
*	[SLP] allow forming 2-way reduction patterns	Sanjay Patel	2019-11-07	1	-8/+29
	We have a vector compare reduction problem seen in PR39665 comment 2:
	https://bugs.llvm.org/show_bug.cgi?id=39665#c2

	Or slightly reduced here:

	    define i1 @cmp2(<2 x double> %a0) {
	      %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
	      %b = extractelement <2 x i1> %a, i32 0
	      %c = extractelement <2 x i1> %a, i32 1
	      %d = and i1 %b, %c
	      ret i1 %d
	    }

	SLP would not attempt to turn this into a vector reduction because there is an artificial lower limit on that transform. We cannot completely remove that limit without inducing regressions though, so this patch just hacks an extra attempt at creating a 2-way reduction to the end of the analysis.

	As shown in the test file, we are still not getting some of the motivating cases, so follow-on patches will be needed to solve those cases.

	Differential Revision: https://reviews.llvm.org/D59710
*	[mips] Write `AFL_EXT_OCTEONP` flag to the `.MIPS.abiflags` section	Simon Atanasyan	2019-11-07	1	-1/+3
	Differential Revision: https://reviews.llvm.org/D69851
*	[mips] Support `octeon+` CPU in the `.set arch=` directive	Simon Atanasyan	2019-11-07	1	-2/+3
	Differential Revision: https://reviews.llvm.org/D69850
*	[mips] Implement Octeon+ `saa` and `saad` instructions	Simon Atanasyan	2019-11-07	10	-16/+130
	`saa` and `saad` are 32-bit and 64-bit store atomic add instructions.

	    memory[base] = memory[base] + rt

	These instructions are available for the "Octeon+" CPU. The patch adds support for both instructions to the MIPS assembler and disassembler and introduces a new CPU type, "octeon+".

	The next patches will implement the `.set arch=octeon+` directive and `AFL_EXT_OCTEONP` ISA extension flag support.

	Differential Revision: https://reviews.llvm.org/D69849
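	The one-line semantics above can be modelled in C++ with atomic fetch-and-add. This is only an illustration of `memory[base] = memory[base] + rt`; the function names are hypothetical and the relaxed memory ordering is an assumption of the sketch, not a statement about the hardware.

```cpp
#include <atomic>
#include <cstdint>

// Model of `saa`: atomically add a 32-bit register value to a memory word.
void saa(std::atomic<uint32_t> &word, uint32_t rt) {
  word.fetch_add(rt, std::memory_order_relaxed);
}

// Model of `saad`: the same operation on a 64-bit doubleword.
void saad(std::atomic<uint64_t> &dword, uint64_t rt) {
  dword.fetch_add(rt, std::memory_order_relaxed);
}
```

	Neither function returns the old value, matching the store-like form of the operation in the commit message.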
*	Revert f0c2a5a "[LV] Generalize conditions for sinking instrs for first order recurrences."	Hans Wennborg	2019-11-07	1	-26/+14
	It broke Chromium, causing "Instruction does not dominate all uses!" errors. See https://bugs.chromium.org/p/chromium/issues/detail?id=1022297#c1 for a reproducer.

	> If the recurrence PHI node has a single user, we can sink any
	> instruction without side effects, given that all users are dominated by
	> the instruction computing the incoming value of the next iteration
	> ('Previous'). We can sink instructions that may cause traps, because
	> that only causes the trap to occur later, but not on any new paths.
	>
	> With the relaxed check, we also have to make sure that we do not have a
	> direct cycle (meaning PHI user == 'Previous'), which indicates a
	> reduction relation, which potentially gets missed by
	> ReductionDescriptor.
	>
	> As follow-ups, we can also sink stores, iff they do not alias with
	> other instructions we move them across and we could also support sinking
	> chains of instructions and multiple users of the PHI.
	>
	> Fixes PR43398.
	>
	> Reviewers: hsaito, dcaballe, Ayal, rengolin
	>
	> Reviewed By: Ayal
	>
	> Differential Revision: https://reviews.llvm.org/D69228