bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[ConstantRange][LVI] Use overflow flags from `sub` to constrain the range	Roman Lebedev	2019-11-07	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This notably improves non-negativity deduction:
	```
	| statistic                              |     old |     new | delta | % change |
	| correlated-value-propagation.NumAShrs  |     209 |     227 |    18 |  8.6124% |
	| correlated-value-propagation.NumAddNSW |    4972 |    4988 |    16 |  0.3218% |
	| correlated-value-propagation.NumAddNUW |    7141 |    7148 |     7 |  0.0980% |
	| correlated-value-propagation.NumAddNW  |   12113 |   12136 |    23 |  0.1899% |
	| correlated-value-propagation.NumAnd    |     442 |     445 |     3 |  0.6787% |
	| correlated-value-propagation.NumNSW    |    7160 |    7176 |    16 |  0.2235% |
	| correlated-value-propagation.NumNUW    |   13306 |   13316 |    10 |  0.0752% |
	| correlated-value-propagation.NumNW     |   20466 |   20492 |    26 |  0.1270% |
	| correlated-value-propagation.NumSDivs  |     207 |     212 |     5 |  2.4155% |
	| correlated-value-propagation.NumSExt   |    6279 |    6679 |   400 |  6.3704% |
	| correlated-value-propagation.NumSRems  |      28 |      29 |     1 |  3.5714% |
	| correlated-value-propagation.NumShlNUW |    2793 |    2796 |     3 |  0.1074% |
	| correlated-value-propagation.NumShlNW  |    3964 |    3967 |     3 |  0.0757% |
	| correlated-value-propagation.NumUDivs  |     353 |     358 |     5 |  1.4164% |
	| instcount.NumAShrInst                  |   13763 |   13741 |   -22 | -0.1598% |
	| instcount.NumAddInst                   |  277349 |  277348 |    -1 | -0.0004% |
	| instcount.NumLShrInst                  |   27437 |   27463 |    26 |  0.0948% |
	| instcount.NumOrInst                    |  102677 |  102678 |     1 |  0.0010% |
	| instcount.NumSDivInst                  |    8732 |    8727 |    -5 | -0.0573% |
	| instcount.NumSExtInst                  |   80872 |   80468 |  -404 | -0.4996% |
	| instcount.NumSRemInst                  |    1679 |    1678 |    -1 | -0.0596% |
	| instcount.NumTruncInst                 |   62154 |   62153 |    -1 | -0.0016% |
	| instcount.NumUDivInst                  |    2526 |    2527 |     1 |  0.0396% |
	| instcount.NumURemInst                  |    1589 |    1590 |     1 |  0.0629% |
	| instcount.NumZExtInst                  |   69405 |   69809 |   404 |  0.5821% |
	| instcount.TotalInsts                   | 7439575 | 7439574 |    -1 |  0.0000% |
	```
	Reviewers: nikic, reames, spatel Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69942
*	[ThinLTO] Import readonly vars with refs	evgeny	2019-11-07	5	-14/+48
\| \| \| \| \| \|	Patch allows importing declarations of functions and variables, referenced by the initializer of some other readonly variable. Differential revision: https://reviews.llvm.org/D69561
*	[SLP] allow forming 2-way reduction patterns	Sanjay Patel	2019-11-07	1	-8/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have a vector compare reduction problem seen in PR39665 comment 2: https://bugs.llvm.org/show_bug.cgi?id=39665#c2 Or slightly reduced here:
	    define i1 @cmp2(<2 x double> %a0) {
	      %a = fcmp ogt <2 x double> %a0, <double 1.0, double 1.0>
	      %b = extractelement <2 x i1> %a, i32 0
	      %c = extractelement <2 x i1> %a, i32 1
	      %d = and i1 %b, %c
	      ret i1 %d
	    }
	SLP would not attempt to turn this into a vector reduction because there is an artificial lower limit on that transform. We can not completely remove that limit without inducing regressions though, so this patch just hacks an extra attempt at creating a 2-way reduction to the end of the analysis. As shown in the test file, we are still not getting some of the motivating cases, so follow-on patches will be needed to solve those cases. Differential Revision: https://reviews.llvm.org/D59710
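For readers less familiar with LLVM IR, the reduced `@cmp2` function in this message corresponds roughly to the scalar C++ below. This is only an illustrative sketch: the name `cmp2` mirrors the IR, while the use of `std::array` for the `<2 x double>` argument is an assumption made here to keep the example self-contained. `fcmp ogt` is an ordered comparison, so a NaN lane compares false, which matches the behaviour of C++ `>`.

```cpp
#include <array>
#include <cassert>

// Scalar model of the IR: compare both lanes against 1.0 and AND the
// two results together (a 2-way "and" reduction).
bool cmp2(const std::array<double, 2> &a0) {
  bool b = a0[0] > 1.0; // lane 0 of the fcmp ogt
  bool c = a0[1] > 1.0; // lane 1 of the fcmp ogt
  return b & c;         // and i1 %b, %c
}
```

A 2-way reduction performs the compare once on the whole vector and then reduces the two result lanes, instead of treating each lane as a separate scalar computation.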
*	[mips] Write `AFL_EXT_OCTEONP` flag to the `.MIPS.abiflags` section	Simon Atanasyan	2019-11-07	1	-1/+3
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D69851
*	[mips] Support `octeon+` CPU in the `.set arch=` directive	Simon Atanasyan	2019-11-07	1	-2/+3
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D69850
*	[mips] Implement Octeon+ `saa` and `saad` instructions	Simon Atanasyan	2019-11-07	10	-16/+130
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	`saa` and `saad` are 32-bit and 64-bit store atomic add instructions: memory[base] = memory[base] + rt. These instructions are available on "Octeon+" CPUs. The patch adds support for both instructions to the MIPS assembler and disassembler and introduces a new CPU type, "octeon+". The next patches will implement the `.set arch=octeon+` directive and `AFL_EXT_OCTEONP` ISA extension flag support. Differential Revision: https://reviews.llvm.org/D69849
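The update rule memory[base] = memory[base] + rt can be modelled in portable C++ with an atomic fetch-and-add. The sketch below is an analogy, not the instruction's definition: the helper names `saa` and `saad` are borrowed from the mnemonics, and the relaxed memory ordering is an assumption, since the commit message does not describe ordering guarantees.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Models the 32-bit form: memory[base] = memory[base] + rt, performed
// atomically. Relaxed ordering is an assumption made for this sketch.
void saa(std::atomic<std::uint32_t> &memory, std::uint32_t rt) {
  memory.fetch_add(rt, std::memory_order_relaxed);
}

// Models the 64-bit form in the same way.
void saad(std::atomic<std::uint64_t> &memory, std::uint64_t rt) {
  memory.fetch_add(rt, std::memory_order_relaxed);
}
```

Neither helper returns the old value, which reflects the "store" nature of the operation as described: only memory is updated.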
*	Revert f0c2a5a "[LV] Generalize conditions for sinking instrs for first ↵	Hans Wennborg	2019-11-07	1	-26/+14
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	order recurrences."
	It broke Chromium, causing "Instruction does not dominate all uses!" errors. See https://bugs.chromium.org/p/chromium/issues/detail?id=1022297#c1 for a reproducer.
	> If the recurrence PHI node has a single user, we can sink any instruction without side effects, given that all users are dominated by the instruction computing the incoming value of the next iteration ('Previous'). We can sink instructions that may cause traps, because that only causes the trap to occur later, but not on any new paths.
	>
	> With the relaxed check, we also have to make sure that we do not have a direct cycle (meaning PHI user == 'Previous'), which indicates a reduction relation, which potentially gets missed by ReductionDescriptor.
	>
	> As follow-ups, we can also sink stores, iff they do not alias with other instructions we move them across and we could also support sinking chains of instructions and multiple users of the PHI.
	>
	> Fixes PR43398.
	>
	> Reviewers: hsaito, dcaballe, Ayal, rengolin
	>
	> Reviewed By: Ayal
	>
	> Differential Revision: https://reviews.llvm.org/D69228
*	[AMDGPU] Fix bug introduced in 47a5c36b37f0	dfukalov	2019-11-07	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: [AMDGPU] Fix bug introduced in 47a5c36b37f0 Reviewers: foad, arsenm Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69915
*	[X86] Remove unused variable. NFC	Craig Topper	2019-11-06	1	-1/+0
\|
*	[X86] Remove dead code from combineStore.	Craig Topper	2019-11-06	1	-44/+10
\| \| \| \| \| \|	Leftovers from before we switched to widening legalization. Fixes PR43919.
*	Temporarily Revert "[LV] Apply sink-after & interleave-groups as VPlan ↵	Eric Christopher	2019-11-06	5	-170/+125
\| \| \| \| \| \| \| \|	transformations (NFC)" as it's causing assert failures. This reverts commit 100e797adb433724a17c9b42b6533cd634cb796b.
*	Keep import function list for inlinee profile update	Wenlei He	2019-11-06	2	-8/+16
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When adjusting function entry counts after inlining, Function::setEntryCount is called without providing an import function list. The side effect of that is the previously set import function list will be dropped. The import function list is used by ThinLTO to help import hot cross module callee for LTO inlining, so dropping that during ThinLTO pre-link may adversely affect LTO inlining. The fix is to keep the list while updating entry counts for inlining. Reviewers: wmi, davidxl, tejohnson Subscribers: mehdi_amini, hiraditya, dexonsmith, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69736
*	[AArch64][SVE] Add remaining patterns and intrinsics for add/sub/mad patterns	Danilo Carvalho Grael	2019-11-06	2	-23/+38
\| \| \| \| \| \| \| \| \| \| \|	Add pattern matching and intrinsics for the following instructions:
	predicated orr, eor, and, bic
	predicated mul, smulh, umulh, sdiv, udiv, sdivr, udivr
	predicated smax, umax, smin, umin, sabd, uabd
	mad, msb, mla, mls
	https://reviews.llvm.org/D69588
*	AMDGPU: Select global atomicrmw fadd	Matt Arsenault	2019-11-06	5	-13/+21
\| \| \| \|	This only works if there is no use of the return value.
*	Temporarily Revert:	Eric Christopher	2019-11-06	1	-169/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	"[SLP] Generalization of stores vectorization."
	"[SLP] Fix -Wunused-variable. NFC"
	"[SLP] Vectorize jumbled stores."
	As they're causing significant (10-30x) compile time regressions on vectorizable code. The primary cause of the compile-time regression is f228b5371647f471853c5fb3e6719823a42fe451.
	This reverts commits:
	f228b5371647f471853c5fb3e6719823a42fe451
	5503455ccb3f5fcedced158332c016c8d3a7fa81
	21d498c9c0f32dcab5bc89ac593aa813b533b43a
*	[AMDGPU] Add handling of 160 bit registers in analyzeResourceUsage	Stanislav Mekhanoshin	2019-11-06	1	-0/+7
\| \| \| \| \| \|	This was omitted. Also SReg_96Reg missed IsSGPR assignment. Differential Revision: https://reviews.llvm.org/D69919
*	[LoopPred] Enable new transformation by default	Philip Reames	2019-11-06	1	-1/+1
\| \| \| \| \| \| \| \|	The basic idea of the transform is to convert variant loop exit conditions into invariant exit conditions by changing the iteration on which the exit is taken when we know that the trip count is unobservable. See the original patch which introduced the code for a more complete explanation. The individual parts of this have been reviewed, the result has been fuzzed, and then further analyzed by hand, but despite all of that, I will not be surprised to see breakage here. If you see problems, please don't hesitate to revert - though please do provide a test case. The most likely class of issues are latent SCEV bugs and without a reduced test case, I'll be essentially stuck on reducing them. (Note: A bunch of tests were opted out of the new transform to preserve coverage. That landed in a previous commit to simplify revert cycles if they turn out to be needed.)
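The idea of replacing a loop-variant exit condition with a loop-invariant one can be illustrated at the source level. The sketch below is a hand-written analogy under stated assumptions, not the output of the pass: the function names `sumChecked` and `sumPredicated` are hypothetical, and the loop body is assumed to have no side effects that are visible before the failing exit is taken, which is what makes the trip count unobservable here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Original shape: the range check depends on i, so it is loop-variant.
// Returns -1 if the check fails on any iteration.
long sumChecked(const std::vector<int> &data, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (i >= data.size()) // loop-variant exit condition
      return -1;
    sum += data[i];
  }
  return sum;
}

// Predicated shape: the same failure is detected by a condition that
// does not depend on i, so the exit is taken on the first iteration.
long sumPredicated(const std::vector<int> &data, std::size_t n) {
  long sum = 0;
  for (std::size_t i = 0; i < n; ++i) {
    if (n > data.size()) // loop-invariant exit condition
      return -1;
    sum += data[i];
  }
  return sum;
}
```

Both functions return the same value for every input. The only difference is the iteration on which the failing exit is taken, and because the partial sum is discarded on that path, a caller cannot tell the two apart.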
*	When lowering calls and tail calls in AArch64, the register mask and	Eric Christopher	2019-11-06	1	-3/+3
\| \| \| \| \| \| \| \| \| \|	return value location depend on the calling convention of the callee. `F.getCallingConv()`, however, is the caller CC. Correct it to the callee CC from `CallLoweringInfo`. Fixes PR43449. Patch by Shu-Chun Weng!
*	[ConstantRange] Add `subWithNoWrap()` method	Roman Lebedev	2019-11-07	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Much like D67339, adds ConstantRange handling for when we know no-wrap behavior of the `sub`. Unlike addWithNoWrap(), we only get lucky re returning empty set for signed wrap. For unsigned, we must perform overflow check manually. A patch that makes use of this in LVI (CVP) to be posted later. Reviewers: nikic, shchenz, efriedma Reviewed By: nikic Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69918
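The manual overflow check for the unsigned case can be sketched with a much simpler range type than `ConstantRange`. Everything below is an assumption made for illustration: `URange` and `subNoUnsignedWrap` are hypothetical names, the interval is closed and never wraps, and the code is not taken from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Simplified closed interval [lo, hi] of unsigned values. `empty` marks
// the empty set. Unlike llvm::ConstantRange, wrapped ranges are not
// representable here.
struct URange {
  std::uint32_t lo;
  std::uint32_t hi;
  bool empty;
};

// Range of a - b given that the subtraction is known not to wrap
// unsigned, i.e. only pairs with a >= b are considered.
URange subNoUnsignedWrap(URange a, URange b) {
  if (a.empty || b.empty || a.hi < b.lo)
    return URange{0, 0, true}; // every pair would overflow
  // Smallest result: a.lo - b.hi, clamped at zero when that would wrap.
  std::uint32_t lo = a.lo >= b.hi ? a.lo - b.hi : 0;
  // Largest result: a.hi - b.lo, which cannot wrap after the check above.
  std::uint32_t hi = a.hi - b.lo;
  return URange{lo, hi, false};
}
```

For example, subtracting [5, 20] from [0, 10] under this model gives [0, 5], because only pairs with a >= b remain, and subtracting [4, 9] from [0, 3] gives the empty set.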
*	[ConstantRange] Cleanup addWithNoWrap() by just piggybacking on ↵	Roman Lebedev	2019-11-07	1	-32/+8
\| \| \| \| \| \| \| \| \|	sadd_sat()/uadd_sat() As discussed in https://reviews.llvm.org/D69918 that happens to work as intended, and returns empty set if there is always an overflow because we get lucky with intersection. Since there's now an explicit test for that, let's prefer cleaner code.
*	[JITLink] Refactor EH-frame handling to support eh-frames with existing relocs.	Lang Hames	2019-11-06	6	-320/+537
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Some targets (E.g. MachO/arm64) use relocations to fix some CFI record fields in the eh-frame section. When relocations are used the initial (pre-relocation) content of the eh-frame section can no longer be interpreted by following the eh-frame specification. This causes errors in the existing eh-frame parser. This patch moves eh-frame handling into two LinkGraph passes that are run after relocations have been parsed (but before they are applied). The first pass breaks up blocks in the eh-frame section into per-CFI-record blocks, and the second parses blocks of (potentially multiple) CFI records and adds the appropriate edges to any CFI fields that do not have existing relocations. These passes can be run independently of one another. By handling eh-frame splitting/fixing with LinkGraph passes we can both re-use existing relocations for CFI record fields and avoid applying eh-frame fixups before parsing the section (which would complicate the linker and require extra temporary allocations of working memory).
*	[Orc] Fix iterator usage after remove	Alexandre Ganea	2019-11-06	1	-1/+4
\| \| \| \|	Differential Revision: https://reviews.llvm.org/D69805
*	[JumpThreading] Factor out code to clone instructions (NFC)	Kazu Hirata	2019-11-06	1	-31/+45
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch factors out code to clone instructions -- partly for readability and partly to facilitate an upcoming patch of my own. Reviewers: wmi Subscribers: hiraditya, jfb, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69861
*	[WC] Fix a subtle bug in our definition of widenable branch	Philip Reames	2019-11-06	3	-7/+13
\| \| \| \| \| \| \| \| \| \| \| \|	We had a subtle, but nasty bug in our definition of a widenable branch, and thus in the transforms which used that utility. Specifically, we returned true for any branch which included a widenable condition within its condition, regardless of whether that widenable condition also had other uses. The problem is that the result of the WC() call is defined to be one particular value. As such, all users must agree as to what that value is. If we widen a branch without also updating all other users of the WC in the same way, we have broken the required semantics. Most of the textual diff is updating existing transforms not to leave dead uses hanging around. They're largely NFC as the dead instructions would be immediately deleted by other passes. The reason to make these changes is so that the transforms preserve the widenable branch form. In practice, we don't get bitten by this only because it isn't profitable to CSE WC() calls and the lowering pass from guards uses distinct WC calls per branch. Differential Revision: https://reviews.llvm.org/D69916
*	[Analysis] Attribute deref/deref_or_null should not prevent tail call ↵	Dávid Bolvanský	2019-11-06	1	-1/+5
\| \| \| \|	optimization
*	[LoopPred] Fix two subtle issues found by inspection	Philip Reames	2019-11-06	2	-8/+37
\| \| \| \| \| \| \| \| \| \| \| \|	This patch fixes two issues noticed by inspection when going to enable the loop predication code in IndVarSimplify.
	Issue 1 - Both the LoopPredication transform, and the already on by default optimizeLoopExits transform, modify the exit count of the exits they modify (either to 0 or Infinity). Looking at the code more closely, this was not reflected into SCEV and we were instead running later transforms with incorrect SCEVs. Fixing this requires forgetting the loop, weakening a too strong assert, and updating SCEV to not pessimize results when a loop is provably untaken. I haven't been able to find a test case to demonstrate the miscompile.
	Issue 2 - For modules without a data layout, we can end up with unsized pointer typed exit counts. Just bail out of this case.
	I think these are the last two issues which need to be addressed before we enable this by default. The code has already survived a decent amount of fuzzing without revealing either of the above. Differential Revision: https://reviews.llvm.org/D69695
*	[X86] Clamp large constant shift amounts for MMX shift intrinsics to 8-bits.	Craig Topper	2019-11-06	1	-2/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The MMX intrinsics for shift by immediate take a 32-bit shift amount but the hardware for shifting by immediate only encodes 8-bits. For the intrinsic we don't require the shift amount to fit in 8-bits in the frontend because we don't check that it's an immediate in the frontend. If it is not an immediate we move it to an MMX register and use the shift by register. But if it is an immediate we'll use the shift by immediate instruction. But we need to change the shift amount to 8-bits. We were previously doing this accidentally by masking it in the encoder. But this can make a large shift amount into a small in bounds shift amount. Instead we should clamp larger shift amounts to 255 so that they don't become in bounds. Fixes PR43922
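The difference between masking and clamping is easy to see with a shift amount of 256. The sketch below is a standalone model, not code from the patch: the helper names are hypothetical, and `shiftLeft64` assumes the usual behaviour of a 64-bit logical shift by immediate, where a count above 63 produces zero.

```cpp
#include <cassert>
#include <cstdint>

// Old (accidental) behaviour: keep only the low 8 bits of the amount.
std::uint8_t maskShiftAmount(std::uint32_t amount) {
  return static_cast<std::uint8_t>(amount & 0xFF);
}

// New behaviour: saturate to 255 so an out-of-range amount stays
// out of range after narrowing to 8 bits.
std::uint8_t clampShiftAmount(std::uint32_t amount) {
  return static_cast<std::uint8_t>(amount > 255 ? 255 : amount);
}

// Model of a 64-bit logical left shift by an 8-bit immediate
// (assumption: counts above 63 yield zero).
std::uint64_t shiftLeft64(std::uint64_t value, std::uint8_t imm) {
  return imm > 63 ? 0 : value << imm;
}
```

With masking, 256 becomes 0 and the value is left unshifted. With clamping, 256 becomes 255 and the result is zero, matching what a shift by 256 should produce.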
*	[AArch64] Re-add patterns for (s/u)mull2.	Eli Friedman	2019-11-06	1	-0/+19
\| \| \| \| \| \| \|	These patterns were added in D46009, but removed in D54276 due to missing test coverage. Differential Revision: https://reviews.llvm.org/D69831
*	Fix a typo in my previous commit	Steven Wu	2019-11-06	1	-1/+1
\|
*	[Object][MachO] Rewrite macho-invalid-fat-arch-size into YAML	Steven Wu	2019-11-06	1	-1/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Rewrite one of the invalid MachO test input files as a YAML file. The original invalid MachO file is breaking our internal test infrastructure because it is too broken to be copied around. Need to relax an assertion in the YAML/MachoEmitter to allow yaml2obj to write an invalid object like this. rdar://problem/56879982 Reviewers: beanz, mtrent Reviewed By: beanz Subscribers: hiraditya, jkorous, dexonsmith, ributzka, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69856
*	[X86TargetTransformInfo] Fixed warning: Expression 'ISD == ISD::UREM' is ↵	Dávid Bolvanský	2019-11-06	1	-1/+1
\| \| \| \|	always true. NFCI.
*	[X86] Fix SLM v2i64 ADD/Sub/CMPEQ instruction schedules	Simon Pilgrim	2019-11-06	1	-0/+16
\| \| \| \| \| \|	Noticed while fixing the reduction costs for D59710 - the SLM model doesn't account for the poor throughput of v2i64 ops. Numbers taken from Intel AOM (+ checked against Agner)
*	[X86] Fix SLM v2f64 ADD/MUL + FP BLEND/HADD instruction schedules	Simon Pilgrim	2019-11-06	1	-7/+7
\| \| \| \|	Noticed while fixing the reduction costs for D59710 - the SLM model doesn't account for the poor throughput of v2f64/v2i64 ops.
*	[X86ISelLowering] Fixed typo in assert. NFCI.	Dávid Bolvanský	2019-11-06	1	-1/+1
\|
*	[CostModel][X86] Improve add vXi64 + fadd vXf64 reduction tests for SLM	Simon Pilgrim	2019-11-06	1	-0/+26
\| \| \| \|	As noted on D59710 we weren't handling the high costs of these operations on SLM.
*	[CommandLine] Add inline ArgName printing	Don Hinton	2019-11-06	1	-14/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch adds PrintArgInline (after PrintArg) that strips the leading spaces from an argument before printing them, for usage inline. Related bug: PR42943 <https://bugs.llvm.org/show_bug.cgi?id=42943> Patch by Daan Sprenkels! Reviewers: jhenderson, chandlerc, hintonda Reviewed By: jhenderson Subscribers: hiraditya, kristina, llvm-commits, dsprenkels Tags: #llvm Differential Revision: https://reviews.llvm.org/D69501
*	DWARFDebugLoclists: Move to a incremental parsing model	Pavel Labath	2019-11-06	3	-123/+120
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch stems from the discussion D68270 (including some offline talks). The idea is to provide an "incremental" api for parsing location lists, which will avoid caching or materializing parsed data. An additional goal is to provide a high level location list api, which abstracts the differences between different encoding schemes, and can be used by users which don't care about those (such as LLDB). This patch implements the first part. It implements a call-back based "visitLocationList" api. This function parses a single location list, calling a user-specified callback for each entry. This is going to be the base api, which other location list functions (right now, just the dumping code) are going to be based on. Future patches will do something similar for the v4 location lists, and add a mechanism to translate raw entries into concrete address ranges. Reviewers: dblaikie, probinson, JDevlieghere, aprantl, SouraVX Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69672
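A callback-based visitor of the kind described here can be sketched on a toy encoding. The code below is not the DWARF location list format and not the API added by the patch: `Entry`, `visitEntries`, and the one-byte kind/value layout with 0 as the end-of-list marker are all assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Toy entry: a kind byte followed by one value byte (hypothetical layout).
struct Entry {
  std::uint8_t kind;
  std::uint8_t value;
};

// Parses one list starting at `offset`, invoking `callback` per entry
// without storing the parsed entries anywhere. Returns false if the
// data is truncated or lacks the end-of-list marker (kind == 0).
bool visitEntries(const std::vector<std::uint8_t> &data, std::size_t &offset,
                  const std::function<bool(const Entry &)> &callback) {
  while (offset < data.size()) {
    std::uint8_t kind = data[offset++];
    if (kind == 0)
      return true; // end-of-list marker
    if (offset >= data.size())
      return false; // truncated entry
    Entry entry{kind, data[offset++]};
    if (!callback(entry))
      return true; // caller asked to stop early
  }
  return false; // ran off the end without a terminator
}
```

Other operations, such as dumping, can then be written as callbacks on top of the visitor, so nothing needs to be cached or materialized between calls.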
*	[NFC][APInt] Fix typos in comments.	Miloš Stojanović	2019-11-06	1	-3/+3
\| \| \| \|	Testing git commit access.
*	[x86] avoid crashing when splitting AVX stores with non-simple type (PR43916)	Sanjay Patel	2019-11-06	1	-3/+5
\| \| \| \| \|	The store splitting transform was assuming a simple type (MVT), but that's not necessarily the case as shown in the test.
*	[Support] fix mingw-w64 build	Ilya Biryukov	2019-11-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	Older versions of Mingw-w64 do not define _beginthreadex_proc_type, so we replace it with `unsigned (__stdcall *ThreadFunc)(void *)`. Fixes https://github.com/clangd/clangd/issues/188 Patch by lh123! Differential Revision: https://reviews.llvm.org/D69879
*	[X86] Fix uninitialized variable warnings. NFCI.	Simon Pilgrim	2019-11-06	7	-26/+26
\|
*	[X86] LowerAVXExtend - fix dodgy self-comparison assert.	Simon Pilgrim	2019-11-06	1	-1/+1
\| \| \| \|	PVS Studio noticed that we were asserting "VT.getVectorNumElements() == VT.getVectorNumElements()" instead of "VT.getVectorNumElements() == InVT.getVectorNumElements()".
*	[AArch64] Move the branch relaxation pass after BTI insertion	Momchil Velikov	2019-11-06	1	-3/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Inserting BTI instructions can push branch destinations out of range. The branch relaxation pass itself cannot insert indirect branches since `TargetInstrInfo::insertIndirectBranch` is not implemented for AArch64 (guess +/-128 MB direct branch range is more than enough in practice). Testing this is a bit tricky. The original test case we have is 155kloc/6.1M. I've generated a test case using this program:
	```
	#include <iostream>

	int main() {
	  // Prologue of the generated test: declarations and the start of f().
	  std::cout << R"src(int test();
	void g0(), g1(), g2(), g3(), g4(), e();
	void f(int v) {
	  if ((test() & 2) == 0) {
	    switch (v) {
	    case 0: g0();
	    case 1: g1();
	    case 2: g2();
	    case 3: g3();
	    }
	)src";
	  const int N = 8176;
	  // Emit N declarations followed by N calls to bulk up the function body.
	  for (int i = 0; i < N; ++i)
	    std::cout << " void h" << i << "();\n";
	  for (int i = 0; i < N; ++i)
	    std::cout << " h" << i << "();\n";
	  // Epilogue: close the if/else and the function.
	  std::cout << R"src(
	  } else {
	    e();
	  }
	}
	)src";
	}
	```
	which is still a bit too much to commit as a regression test, IMHO. Reviewers: t.p.northover, ostannard Reviewed By: ostannard Subscribers: kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69118 Change-Id: Ide5c922bcde08ff4cf635da5e52365525a997a0a
*	[LoopUnroll] countToEliminateCompares(): fix handling of [in]equality ↵	Roman Lebedev	2019-11-06	1	-16/+38
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	predicates (PR43840) Summary: I believe this bisects to https://reviews.llvm.org/D44983 (`[LoopUnroll] Only peel if a predicate becomes known in the loop body.`) While that revision did contain tests that showed arguably-subpar peeling for [in]equality predicates that [not] happen in the middle of the loop, it also disabled peeling for the first loop iteration, because latch would be canonicalized to [in]equality comparison.. That was intentional as per https://reviews.llvm.org/D44983#1059583. I'm not 100% sure that i'm using correct checks here, but this fix appears to be going in the right direction.. Let me know if i'm missing some checks here.. Fixes [[ https://bugs.llvm.org/show_bug.cgi?id=43840 \| PR43840 ]]. Reviewers: fhahn, mkazantsev, efriedma Reviewed By: fhahn Subscribers: xbolva00, hiraditya, zzheng, llvm-commits, fhahn Tags: #llvm Differential Revision: https://reviews.llvm.org/D69617
*	[AMDGPU] Improve code size cost model (part 2)	dfukalov	2019-11-06	1	-18/+98
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Added estimations for ShuffleVector, some cast and arithmetic instructions Reviewers: rampitec Reviewed By: rampitec Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, hiraditya, zzheng, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69629
*	[TTI][LV] preferPredicateOverEpilogue	Sjoerd Meijer	2019-11-06	4	-5/+70
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We have two ways to steer creating a predicated vector body over creating a scalar epilogue. To force this, we have 1) a command line option and 2) a pragma available. This adds a third: a target hook to TargetTransformInfo that can be queried whether predication is preferred or not, which allows the vectoriser to make the decision without forcing it. While this change behaves as a non-functional change for now, it shows the required TTI plumbing, usage of this new hook in the vectoriser, and the beginning of an ARM MVE implementation. I will follow up on this with:
	- a complete MVE implementation, see D69845.
	- a patch to disable this, i.e. we should respect "vector_predicate(disable)" and its corresponding loophint.
	Differential Revision: https://reviews.llvm.org/D69040
*	[ARM,MVE] Add intrinsics for gather/scatter load/stores.	Simon Tatham	2019-11-06	1	-39/+143
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch adds two new families of intrinsics, both of which are memory accesses taking a vector of locations to load from / store to. The vldrq_gather_base / vstrq_scatter_base intrinsics take a vector of base addresses, and an immediate offset to be added consistently to each one. vldrq_gather_offset / vstrq_scatter_offset take a scalar base address, and a vector of offsets to add to it. The 'shifted_offset' variants also multiply each offset by the element size type, so that the vector is effectively of array indices. At the IR level, these operations are represented by a single set of four IR intrinsics: {gather,scatter} × {base,offset}. The other details (signed/unsigned, shift, and memory element size as opposed to vector element size) are all specified by IR intrinsic polymorphism and immediate operands, because that made the selection job easier than making a huge family of similarly named intrinsics. I considered using the standard IR representations such as llvm.masked.gather, but they're not a good fit. In order to use llvm.masked.gather to represent a gather_offset load with element size smaller than a pointer, you'd have to expand the <8 x i16> vector of offsets into an <8 x i16*> vector of pointers, which would be split up during legalization, so you'd spend most of your time undoing the mess it had made. Also, ISel support for llvm.masked.gather would be easy enough in a trivial way (you can expand it into a gather-base load with a zero immediate offset), but instruction-selecting lots of fiddly idioms back into all the _other_ MVE load instructions would be much more work. So I think dedicated IR intrinsics are the more sensible approach, at least for the moment. 
On the clang tablegen side, I've added two new features to the Tablegen source accepted by MveEmitter: a 'CopyKind' type node for defining a type that varies with the parameter type (it lets you ask for an unsigned integer type of the same width as the parameter), and an 'unsignedflag' value node for passing an immediate IR operand which is 0 for a signed integer type or 1 for an unsigned one. That lets me write each kind of intrinsic just once and get all its subtypes and immediate arguments generated automatically. Also I've tweaked the handling of pointer-typed values in the code generation part of MveEmitter: they're generated as Address rather than Value (i.e. including an alignment) so that they can be given to the ordinary IR load and store operations, but I'd omitted the code to convert them back to Value when they're going to be used as an argument to an IR intrinsic. On the MC side, I've enhanced MVEVectorVTInfo so that it can tell you not only the full assembly-language suffix for a given vector type (like 's32' or 'u16') but also the numeric-only one used by store instructions (just '32' or '16'). Reviewers: dmgreen Subscribers: kristof.beyls, hiraditya, cfe-commits, llvm-commits Tags: #clang, #llvm Differential Revision: https://reviews.llvm.org/D69791
*	YAML parser robustness improvements	Thomas Finch	2019-11-05	2	-14/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch fixes a number of bugs found in the YAML parser through fuzzing. In general, this makes the parser more robust against malformed inputs. The fixes are mostly improved null checking and returning errors in more cases. In some cases, asserts were changed to regular errors; this provides the same robustness but also protects release builds from the triggering conditions. This also improves the fuzzability of the YAML parser since asserts can act as a roadblock to further fuzzing once they're hit. Each fix has a corresponding test case:
	- TestAnchorMapError - Added proper null pointer handling in `Stream::printError` if N is null and `KeyValueNode::getValue` if getKey returns null, `Input::createHNodes` `dyn_casts` changed to `dyn_cast_or_null` so the null pointer checks are actually able to fail
	- TestFlowSequenceTokenErrors - Added case in `Document::parseBlockNode` for FlowMappingEnd, FlowSequenceEnd, or FlowEntry tokens outside of mappings or sequences
	- TestDirectiveMappingNoValue - Changed assert to regular error return in `Scanner::scanValue`
	- TestUnescapeInfiniteLoop - Fixed infinite loop in `ScalarNode::unescapeDoubleQuoted` by returning an error for unrecognized escape codes
	- TestScannerUnexpectedCharacter - Changed asserts to regular error returns in `Scanner::consume`
	- TestUnknownDirective - For both of the inputs the stream doesn't fail and correctly returns TK_Error, but there is no valid root node for the document. There's no reasonable way to make the scanner fail for unknown directives without breaking the YAML spec (see spec-07-01.test). I think the assert is unnecessary given that an error is still generated for this case.
	The `SimpleKeys.clear()` line fixes a bug found by AddressSanitizer triggered by multiple test cases - when TokenQueue is cleared SimpleKeys is still holding dangling pointers into it, so SimpleKeys should be cleared as well. Patch by Thomas Finch! Reviewers: chandlerc, Bigcheese, hintonda Reviewed By: Bigcheese, hintonda Subscribers: hintonda, kristina, beanz, dexonsmith, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D61608
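The pattern of returning an error for unrecognized escape codes, rather than asserting or looping, can be sketched as follows. This is a minimal standalone illustration, not the LLVM implementation: `unescapeSketch` is a hypothetical name, and only four escapes are handled, whereas the real YAML escape set is larger.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Unescapes a small subset of double-quoted escapes into `out`.
// Returns false for a dangling backslash or an unrecognized escape
// code, so malformed input ends in an error instead of a hang or assert.
bool unescapeSketch(const std::string &in, std::string &out) {
  out.clear();
  for (std::size_t i = 0; i < in.size(); ++i) {
    if (in[i] != '\\') {
      out.push_back(in[i]);
      continue;
    }
    if (++i == in.size())
      return false; // dangling backslash at end of input
    switch (in[i]) {
    case 'n':  out.push_back('\n'); break;
    case 't':  out.push_back('\t'); break;
    case '\\': out.push_back('\\'); break;
    case '"':  out.push_back('"');  break;
    default:
      return false; // unrecognized escape code
    }
  }
  return true;
}
```

Every path through the loop either consumes input or returns, so the function always terminates, and release builds report the same error that a debug build would.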
*	[PowerPC] Fix the incorrect 'RM' flag set on load/store instr	QingShan Zhang	2019-11-06	1	-1/+1
\| \| \| \| \| \|	The 'RM' flag models the "Rounding Mode" and it has nothing to do with the load/store instructions. Differential Revision: https://reviews.llvm.org/D69551
*	Implement `sys::getHostCPUName()` for Darwin ARM	Chris Bieneman	2019-11-05	1	-1/+28
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Currently there is no implementation of `sys::getHostCPUName()` for Darwin ARM targets. This patch makes it so that LLVM running on ARM makes reasonable guesses about the CPU features of the host CPU. Reviewers: t.p.northover, lhames, efriedma Reviewed By: efriedma Subscribers: rjmccall, efriedma, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69597