bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	Symbolize: Make DWPName a symbolizer option instead of an argument to ↵	Peter Collingbourne	2019-06-11	1	-10/+8
\| \| \| \| \| \| \| \| \| \| \| \|	symbolize{,Inlined}Code. This makes the interface simpler and more consistent with the interface for .dSYM files and fixes a bug where llvm-symbolizer would not read the dwp if it was asked to symbolize data before symbolizing code. Differential Revision: https://reviews.llvm.org/D63114 llvm-svn: 363025
*	AtomicExpand: Don't crash on non-0 alloca	Matt Arsenault	2019-06-11	2	-2/+7
\| \| \| \| \| \| \|	This now produces garbage on AMDGPU with a call to an nonexistent, anonymous libcall but won't assert. llvm-svn: 363022
*	AMDGPU: Expand < 32-bit atomics	Matt Arsenault	2019-06-11	2	-0/+6
\| \| \| \| \| \|	Also fix AtomicExpand asserting on atomicrmw fadd/fsub. llvm-svn: 363021
*	llvm-lib: Implement /machine: argument	Nico Weber	2019-06-11	2	-10/+40
\| \| \| \| \| \| \| \| \| \|	And share some code with lld-link. While here, also add a FIXME about PR42180 and merge r360150 to llvm-lib. Differential Revision: https://reviews.llvm.org/D63021 llvm-svn: 363016
*	[AArch64] Add more CPUs to host detection	Yi Kong	2019-06-11	1	-0/+6
\| \| \| \| \| \| \| \| \|	Returns "cortex-a73" for 3rd and 4th gen Kryo; not precisely correct, but close enough. Differential Revision: https://reviews.llvm.org/D63099 llvm-svn: 363013
*	[MIR-Canon] Fixing non-determinism that was breaking bots (NFC).	Puyan Lotfi	2019-06-11	1	-3/+7
\| \| \| \| \| \| \| \| \| \| \| \|	An earlier fix of a subtle iterator invalidation bug had uncovered a nondeterminism that was present in the MultiUsers bag. Problem was that MultiUsers was being looked up using pointers. This patch is an NFC change that numbers each multiuser and processes each in numbered order. This fixes the test failure on netbsd and will likely fix the green-dragon bot too. llvm-svn: 363012
*	[Support] Explicitly detect recursive response files	Shoaib Meenai	2019-06-10	1	-9/+50
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previous detection relied upon an arbitrary hard coded limit of 21 response files, which some code bases were running up against. The new detection maintains a stack of processing response files and explicitly checks if a newly encountered file is in the current stack. Some bookkeeping data is necessary in order to detect when to pop the stack. Patch by Chris Glover. Differential Revision: https://reviews.llvm.org/D62798 llvm-svn: 363005
*	[PGO] Fix the buildbot failure in r362995	Rong Xu	2019-06-10	1	-5/+2
\| \| \| \| \| \|	Fixed one unused variable warning. llvm-svn: 363004
*	[PGO] Handle cases of non-instrument BBs	Rong Xu	2019-06-10	1	-38/+85
\| \| \| \| \| \| \| \| \| \| \|	As shown in PR41279, some basic blocks (such as catchswitch) cannot be instrumented. This patch filters out these BBs in PGO instrumentation. It also sets the profile count to the fail-to-instrument edge, so that we can propagate the counts in the CFG. Differential Revision: https://reviews.llvm.org/D62700 llvm-svn: 362995
*	CMake: Make most target symbols hidden by default	Tom Stellard	2019-06-10	105	-105/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: For builds with LLVM_BUILD_LLVM_DYLIB=ON and BUILD_SHARED_LIBS=OFF this change makes all symbols in the target specific libraries hidden by default. A new macro called LLVM_EXTERNAL_VISIBILITY has been added to mark symbols in these libraries public, which is mainly needed for the definitions of the LLVMInitialize* functions. This patch reduces the number of public symbols in libLLVM.so by about 25%. This should improve load times for the dynamic library and also make abi checker tools, like abidiff require less memory when analyzing libLLVM.so One side-effect of this change is that for builds with LLVM_BUILD_LLVM_DYLIB=ON and LLVM_LINK_LLVM_DYLIB=ON some unittests that access symbols that are no longer public will need to be statically linked. Before and after public symbol counts (using gcc 8.2.1, ld.bfd 2.31.1): nm before/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 36221 nm after/libLLVM-9svn.so \| grep ' [A-Zuvw] ' \| wc -l 26278 Reviewers: chandlerc, beanz, mgorny, rnk, hans Reviewed By: rnk, hans Subscribers: Jim, hiraditya, michaelplatings, chapuni, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, javed.absar, sbc100, jgravelle-google, aheejin, kbarton, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, zzheng, edward-jones, mgrang, atanasyan, rogfer01, MartinMosbeck, brucehoult, the_o, PkmX, jocewei, kristina, jsji, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D54439 llvm-svn: 362990
*	[GlobalISel] Translate memset/memmove/memcpy from undef ptrs into nops	Jessica Paquette	2019-06-10	1	-0/+13
\| \| \| \| \| \| \| \| \| \| \| \| \|	If the source is undef, then just don't do anything. This matches SelectionDAG's behaviour in SelectionDAG.cpp. Also add a test showing that we do the right thing here. (irtranslator-memfunc-undef.ll) Differential Revision: https://reviews.llvm.org/D63095 llvm-svn: 362989
*	Factor out a helper function for readability and reuse in a future patch [NFC]	Philip Reames	2019-06-10	1	-2/+8
\| \| \| \|	llvm-svn: 362980
*	[LFTR] Use recomputed BE count	Philip Reames	2019-06-10	1	-9/+1
\| \| \| \| \| \| \| \| \| \| \| \|	This was discussed as part of D62880. The basic thought is that computing BE taken count after widening should produce (on average) an equally good backedge taken count as the one before widening. Since there's only one test in the suite which is impacted by this change, and it's essentially equivelent codegen, that seems to be a reasonable assertion. This change was separated from r362971 so that if this turns out to be problematic, the triggering piece is obvious and easily revertable. For the nestedIV example from elim-extend.ll, we end up with the following BE counts: BEFORE: (-2 + (-1 * %innercount) + %limit) AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>) Note that before is an i32 type, and the after is an i64. Truncating the i64 produces the i32. llvm-svn: 362975
*	[PowerPC][HTM]Fix $zero is not a GPRC register for builtin_ttest	Jinsong Ji	2019-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was found during HTM cleanup. Adding a test for builtin_ttest would expose following issue. * Bad machine code: Illegal physical register for instruction * - function: test10 - basic block: %bb.0 entry (0xf0e57497b58) - instruction: %5:crrc0 = TABORTWCI 0, $zero, 0 - operand 2: $zero $zero is not a GPRC register. LLVM ERROR: Found 1 machine code errors. Differential Revision: https://reviews.llvm.org/D63079 llvm-svn: 362974
*	Prepare for multi-exit LFTR [NFC]	Philip Reames	2019-06-10	1	-65/+77
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This change does the plumbing to wire an ExitingBB parameter through the LFTR implementation, and reorganizes the code to work in terms of a set of individual loop exits. Most of it is fairly obvious, but there's one key complexity which makes it worthy of consideration. The actual multi-exit LFTR patch is in D62625 for context. Specifically, it turns out the existing code uses the backedge taken count from before a IV is widened. Oddly, we can end up with a different (more expensive, but semantically equivelent) BE count for the loop when requerying after widening. For the nestedIV example from elim-extend, we end up with the following BE counts: BEFORE: (-2 + (-1 * %innercount) + %limit) AFTER: (-1 + (sext i32 (-1 + %limit) to i64) + (-1 * (sext i32 %innercount to i64))<nsw>) This is the only test in tree which seems sensitive to this difference. The actual result of using the wider BETC on this example is that we actually produce slightly better code. :) In review, we decided to accept that test change. This patch is structured to preserve the old behavior, but a separate change will immediate follow with the behavior change. (I wanted it separate for problem attribution purposes.) Differential Revision: https://reviews.llvm.org/D62880 llvm-svn: 362971
*	[FastISel] Skip creating unnecessary vregs for arguments	Francis Visoiu Mistrih	2019-06-10	2	-33/+36
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This behavior was added in r130928 for both FastISel and SD, and then disabled in r131156 for FastISel. This re-enables it for FastISel with the corresponding fix. This is triggered only when FastISel can't lower the arguments and falls back to SelectionDAG for it. FastISel contains a map of "register fixups" where at the end of the selection phase it replaces all uses of a register with another register that FastISel sometimes pre-assigned. Code at the end of SelectionDAGISel::runOnMachineFunction is doing the replacement at the very end of the function, while other pieces that come in before that look through the MachineFunction and assume everything is done. In this case, the real issue is that the code emitting COPY instructions for the liveins (physreg to vreg) (EmitLiveInCopies) is checking if the vreg assigned to the physreg is used, and if it's not, it will skip the COPY. If a register wasn't replaced with its assigned fixup yet, the copy will be skipped and we'll end up with uses of undefined registers. This fix moves the replacement of registers before the emission of copies for the live-ins. The initial motivation for this fix is to enable tail calls for swiftself functions, which were blocked because we couldn't prove that the swiftself argument (which is callee-save) comes from a function argument (live-in), because there was an extra copy (vreg to vreg). A few tests are affected by this: * llvm/test/CodeGen/AArch64/swifterror.ll: we used to spill x21 (callee-save) but never reload it because it's attached to the return. We now don't even spill it anymore. * llvm/test/CodeGen//swiftself.ll: we tail-call now. llvm/test/CodeGen/AMDGPU/mubuf-legalize-operands.ll: I believe this test was not really testing the right thing, but it worked because the same registers were re-used. * llvm/test/CodeGen/ARM/cmpxchg-O0.ll: regalloc changes * llvm/test/CodeGen/ARM/swifterror.ll: get rid of a copy * llvm/test/CodeGen/Mips/: get rid of spills and copies llvm/test/CodeGen/SystemZ/swift-return.ll: smaller stack * llvm/test/CodeGen/X86/atomic-unordered.ll: smaller stack * llvm/test/CodeGen/X86/swifterror.ll: same as AArch64 * llvm/test/DebugInfo/X86/dbg-declare-arg.ll: stack size changed Differential Revision: https://reviews.llvm.org/D62361 llvm-svn: 362963
*	[ExecutionEngine] Fix rL362941: Add UnaryOperator visitor to the interpreter	Cameron McInally	2019-06-10	1	-0/+2
\| \| \| \| \| \|	Missed break statements. This was D62881. llvm-svn: 362958
*	[AMDGPU] Optimize image_[load\|store]_mip	Piotr Sobczak	2019-06-10	4	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Replace image_load_mip/image_store_mip with image_load/image_store if lod is 0. Reviewers: arsenm, nhaehnle Reviewed By: arsenm Subscribers: arsenm, kzhuravl, jvesely, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63073 llvm-svn: 362957
*	Revert rL362953 and its followup rL362955.	Simon Tatham	2019-06-10	20	-1588/+90
\| \| \| \| \| \| \| \|	These caused a build failure because I managed not to notice they depended on a later unpushed commit in my current stack. Sorry about that. llvm-svn: 362956
*	[ARM] Add the non-MVE instructions in Arm v8.1-M.	Simon Tatham	2019-06-10	1	-17/+7
\| \| \| \| \| \| \| \|	This should have been part of r362953, but I had a finger-trouble incident and committed the old rather than new version of the patch. Sorry. llvm-svn: 362955
*	[InstCombine] allow unordered preds when canonicalizing to fabs()	Sanjay Patel	2019-06-10	1	-2/+2
\| \| \| \| \| \| \| \| \| \|	We have a known-never-nan value via 'nnan', so an unordered predicate is the same as its ordered sibling. Similar to: rL362937 llvm-svn: 362954
*	[ARM] Add the non-MVE instructions in Arm v8.1-M.	Simon Tatham	2019-06-10	20	-90/+1598
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This adds support for the new family of conditional selection / increment / negation instructions; the low-overhead branch instructions (e.g. BF, WLS, DLS); the CLRM instruction to zero a whole list of registers at once; the new VMRS/VMSR and VLDR/VSTR instructions to get data in and out of 8.1-M system registers, particularly including the new VPR register used by MVE vector predication. To support this, we also add a register name 'zr' (used by the CSEL family to force one of the inputs to the constant 0), and operand types for lists of registers that are also allowed to include APSR or VPR (used by CLRM). The VLDR/VSTR instructions also need some new addressing modes. The low-overhead branch instructions exist in their own separate architecture extension, which we treat as enabled by default, but you can say -mattr=-lob or equivalent to turn it off. Reviewers: dmgreen, samparker, SjoerdMeijer, t.p.northover Reviewed By: samparker Subscribers: miyuki, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62667 llvm-svn: 362953
*	[DebugInfo] Terminate all location-lists at end of block	Jeremy Morse	2019-06-10	2	-117/+113
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit reapplies r359426 (which was reverted in r360301 due to performance problems) and rolls in D61940 to address the performance problem. I've combined the two to avoid creating a span of slow-performance, and to ease reverting if more problems crop up. The summary of D61940: This patch removes the "ChangingRegs" facility in DbgEntityHistoryCalculator, as its overapproximate nature can produce incorrect variable locations. An unchanging register doesn't mean a variable doesn't change its location. The patch kills off everything that calculates the ChangingRegs vector. Previously ChangingRegs spotted epilogues and marked registers as unchanging if they weren't modified outside the epilogue, increasing the chance that we can emit a single-location variable record. Without this feature, debug-loc-offset.mir and pr19307.mir become temporarily XFAIL. They'll be re-enabled by D62314, using the FrameDestroy flag to identify epilogues, I've split this into two steps as FrameDestroy isn't necessarily supported by all backends. The logic for terminating variable locations at the end of a basic block now becomes much more enjoyably simple: we just terminate them all. Other test changes: inlined-argument.ll becomes XFAIL, but for a longer term. The current algorithm for detecting that a variable has a single-location doesn't work in this scenario (inlined function in multiple blocks), only other bugs were making this test work. fission-ranges.ll gets slightly refreshed too, as the location of "p" is now correctly determined to be a single location. Differential Revision: https://reviews.llvm.org/D61940 llvm-svn: 362951
*	[InstCombine] fix bug in canonicalization to fabs()	Sanjay Patel	2019-06-10	1	-2/+4
\| \| \| \| \| \|	Forgot to translate the predicate clauses in rL362943. llvm-svn: 362945
*	[InstCombine] change canonicalization to fabs() to use FMF on fsub	Sanjay Patel	2019-06-10	1	-19/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Similar to rL362909: This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fsub because they have the same operand. llvm-svn: 362943
*	[ARM] Disallow PC, and optionally SP, in VMOVRH and VMOVHR.	Simon Tatham	2019-06-10	1	-4/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Arm v8.1-M supports the VMOV instructions that move a half-precision value to and from a GPR, but not if the GPR is SP or PC. To fix this, I've changed those instructions to use the rGPR register class instead of GPR. rGPR always excludes PC, and it excludes SP except in the presence of the HasV8Ops target feature (i.e. Arm v8-A). So the effect is that VMOV.F16 to and from PC is now illegal everywhere, but VMOV.F16 to and from SP is illegal only on non-v8-A cores (which I believe is all as it should be). Reviewers: dmgreen, samparker, SjoerdMeijer, ostannard Reviewed By: ostannard Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D60704 llvm-svn: 362942
*	[ExecutionEngine] Add UnaryOperator visitor to the interpreter	Cameron McInally	2019-06-10	2	-0/+53
\| \| \| \| \| \| \| \|	This is to support the unary FNeg instruction. Differential Revision: https://reviews.llvm.org/D62881 llvm-svn: 362941
*	[InstCombine] allow unordered preds when canonicalizing to fabs()	Sanjay Patel	2019-06-10	1	-2/+4
\| \| \| \| \| \| \|	PR42179: https://bugs.llvm.org/show_bug.cgi?id=42179 llvm-svn: 362937
*	[MCA] Further refactor the bottleneck analysis view. NFCI.	Andrea Di Biagio	2019-06-10	1	-1/+2
\| \| \| \|	llvm-svn: 362933
*	[yaml2obj/obj2yaml] - Make RawContentSection::Content and ↵	George Rimar	2019-06-10	1	-4/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	RawContentSection::Size optional This is a follow-up for D62809. Content and Size fields should be optional as was discussed in comments of the D62809's thread. With that, we can describe a specific string table and symbol table sections in a more correct way and also show appropriate errors. The patch adds lots of test cases where the behavior is described in details. Differential revision: https://reviews.llvm.org/D62957 llvm-svn: 362931
*	[ARM] Enable Unroll UpperBound	David Green	2019-06-10	1	-0/+1
\| \| \| \| \| \| \| \| \| \| \|	This option allows loops with small max trip counts to be fully unrolled. This can help with code like the remainder loops from manually unrolled loops like those that appear in the cmsis dsp library. We would apparently previously runtime unroll them with the default unroll count (4). Differential Revision: https://reviews.llvm.org/D63064 llvm-svn: 362928
*	[DebugInfo] More strict debug range for stack variables	Nikola Prica	2019-06-10	6	-128/+174
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Variable's stack location can stretch longer than it should. If a variable is placed at the stack in a some nested basic block its range can be calculated to be up to the next occurrence of the variable's DBG_VALUE, or up to the end of the function, thus covering a basic blocks that should not be included in the variable’s location range. This happens because the DbgEntityHistoryCalculator ends register locations at the end of a basic block only if the variable’s location register has been changed throughout the function, which is not the case for the register used to reference stack objects. This patch also tries to produce a single value location if the location list builder managed to merge all the locations into one. Reviewers: aprantl, dstenb, jmorse Reviewed By: aprantl, dstenb, jmorse Subscribers: djtodoro, ivanbaev, asowda Tags: #debug-info Differential Revision: https://reviews.llvm.org/D61600 llvm-svn: 362923
*	[DAGCombine] Match a pattern where a wide type scalar value is stored by ↵	QingShan Zhang	2019-06-10	1	-0/+180
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	several narrow stores This opportunity is found from spec 2017 557.xz_r. And it is used by the sha encrypt/decrypt. See sha-2/sha512.c static void store64(u64 x, unsigned char* y) { for(int i = 0; i != 8; ++i) y[i] = (x >> ((7-i) * 8)) & 255; } static u64 load64(const unsigned char* y) { u64 res = 0; for(int i = 0; i != 8; ++i) res \|= (u64)(y[i]) << ((7-i) * 8); return res; } The load64 has been implemented by https://reviews.llvm.org/D26149 This patch is trying to implement the store pattern. Match a pattern where a wide type scalar value is stored by several narrow stores. Fold it into a single store or a BSWAP and a store if the targets supports it. Assuming little endian target: i8 p = ... i32 val = ... p[0] = (val >> 0) & 0xFF; p[1] = (val >> 8) & 0xFF; p[2] = (val >> 16) & 0xFF; p[3] = (val >> 24) & 0xFF; > ((i32)p) = val; i8 p = ... i32 val = ... p[0] = (val >> 24) & 0xFF; p[1] = (val >> 16) & 0xFF; p[2] = (val >> 8) & 0xFF; p[3] = (val >> 0) & 0xFF; > ((i32)p) = BSWAP(val); Differential Revision: https://reviews.llvm.org/D62897 llvm-svn: 362921
*	[X86] When promoting i16 compare with immediate to i32, try to use ↵	Craig Topper	2019-06-10	1	-19/+42
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	sign_extend for eq/ne if the input is truncated from a type with enough sign its. Summary: Our default behavior is to use sign_extend for signed comparisons and zero_extend for everything else. But for equality we have the freedom to use either extension. If we can prove the input has been truncated from something with enough sign bits, we can use sign_extend instead and let DAG combine optimize it out. A similar rule is used by type legalization in LegalizeIntegerTypes. This gets rid of the movzx in PR42189. The immediate will still take 4 bytes instead of the 2 bytes plus 0x66 prefix a cmp di, 32767 would get, but it avoids a length changing prefix. Reviewers: RKSimon, spatel, xbolva00 Reviewed By: xbolva00 Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63032 llvm-svn: 362920
*	[X86] Disable f32->f64 extload when sse2 is enabled	Craig Topper	2019-06-10	3	-26/+7
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: We can only use the memory form of cvtss2sd under optsize due to a partial register update. So previously we were emitting 2 instructions for extload when optimizing for speed. Also due to a late optimization in preprocessiseldag we had to handle (fpextend (loadf32)) under optsize. This patch forces extload to expand so that it will always be in the (fpextend (loadf32)) form during isel. And when optimizing for speed we can just let each of those pieces select an instruction independently. Reviewers: spatel, RKSimon Reviewed By: RKSimon Subscribers: hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D62710 llvm-svn: 362919
*	Do not derive no-recurse attribute if function does not have exact definition.	Vivek Pandya	2019-06-10	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \|	This is fix for https://bugs.llvm.org/show_bug.cgi?id=41336 Reviewers: jdoerfert Reviewed by: jdoerfert Differential Revision: https://reviews.llvm.org/D63045 llvm-svn: 362918
*	[X86] Use EVEX instructions for f128 FAND/FOR/FXOR when avx512vl is enabled.	Craig Topper	2019-06-10	1	-1/+22
\| \| \| \|	llvm-svn: 362915
*	[X86] Convert f32/f64 FANDN/FAND/FOR/FXOR to vector logic ops and ↵	Craig Topper	2019-06-10	3	-138/+43
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	scalar_to_vector/extract_vector_elts to reduce isel patterns. Previously we did the equivalent operation in isel patterns with COPY_TO_REGCLASS operations to transition. By inserting scalar_to_vetors and extract_vector_elts before isel we can allow each piece to be selected individually and accomplish the same final result. I ideally we'd use vector operations earlier in lowering/combine, but that looks to be more difficult. The scalar-fp-to-i64.ll changes are because we have a pattern for using movlpd for store+extract_vector_elt. While an f64 store uses movsd. The encoding sizes are the same. llvm-svn: 362914
*	Revert r361953 "[SVE][IR] Scalable Vector IR Type"	Nico Weber	2019-06-09	9	-99/+13
\| \| \| \| \| \| \|	This reverts commit f4fc01f8dd3a5dfd2060d1ad0df6b90e8351ddf7. It caused a 3-4x slowdown when doing thinlto links, PR42210. llvm-svn: 362913
*	[TargetLowering] Simplify (ctpop x) == 1	David Bolvansky	2019-06-09	1	-1/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: craig.topper, spatel, RKSimon, bkramer Reviewed By: spatel Subscribers: javed.absar, lebedev.ri, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63004 llvm-svn: 362912
*	[InstCombine] foldICmpWithLowBitMaskedVal(): 'icmp sgt/sle': avoid miscompiles	Roman Lebedev	2019-06-09	1	-0/+8
\| \| \| \| \| \| \| \| \| \| \| \| \|	A precondition 'x != 0' was forgotten by me: https://rise4fun.com/Alive/JFNP https://rise4fun.com/Alive/jHvL These 4 folds with non-constants could be re-enabled, but for now let's go for the simplest solution. https://bugs.llvm.org/show_bug.cgi?id=42198 llvm-svn: 362911
*	[InstCombine] change canonicalization to fabs() to use FMF on fneg	Sanjay Patel	2019-06-09	1	-13/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This isn't the ideal fix (use FMF on the select), but it's still an improvement until we have better FMF propagation to selects and other FP math operators. I don't think there's much risk of regression from this change by not including the FMF on the fcmp any more. The nsz/nnan FMF should be the same on the fcmp and the fneg (fsub) because they have the same operand. This works around the most glaring FMF logical inconsistency cited in PR38086: https://bugs.llvm.org/show_bug.cgi?id=38086 llvm-svn: 362909
*	[InstSimplify] reduce code duplication for fcmp folds; NFC	Sanjay Patel	2019-06-09	1	-10/+7
\| \| \| \|	llvm-svn: 362904
*	[InstSimplify] enhance fcmp fold with never-nan operand	Sanjay Patel	2019-06-09	1	-2/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is another step towards correcting our usage of fast-math-flags when applied on an fcmp. In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of the fcmp. But I'm leaving that clause in until we're more confident that we can stop relying on fcmp's FMF. By using the more general "isKnownNeverNaN()", we gain a simplification shown on the tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN). On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction in addition to the FMF on the fcmp. This is a continuation of D62979 / rL362879. llvm-svn: 362903
*	[MIR] Add simple PRE pass to MachineCSE	Anton Afanasyev	2019-06-09	1	-9/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is the second part of the commit fixing PR38917 (hoisting partitially redundant machine instruction). Most of PRE (partitial redundancy elimination) and CSE work is done on LLVM IR, but some of redundancy arises during DAG legalization. Machine CSE is not enough to deal with it. This simple PRE implementation works a little bit intricately: it passes before CSE, looking for partitial redundancy and transforming it to fully redundancy, anticipating that the next CSE step will eliminate this created redundancy. If CSE doesn't eliminate this, than created instruction will remain dead and eliminated later by Remove Dead Machine Instructions pass. The third part of the commit is supposed to refactor MachineCSE, to make it more clear and to merge MachinePRE with MachineCSE, so one need no rely on further Remove Dead pass to clear instrs not eliminated by CSE. First step: https://reviews.llvm.org/D54839 Fixes llvm.org/PR38917 This is fixed recommit of r361356 after PowerPC64 multistage build failure. llvm-svn: 362901
*	[CaptureTracking] Don't let comparisons against null escape inbounds pointers	Ayke van Laethem	2019-06-09	1	-5/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Pointers that are in-bounds (either through dereferenceable_or_null or thorough a getelementptr inbounds) cannot be captured with a comparison against null. There is no way to construct a pointer that is still in bounds but also NULL. This helps safe languages that insert null checks before load/store instructions. Without this patch, almost all pointers would be considered captured even for simple loads. With this patch, an icmp with null will not be seen as escaping as long as certain conditions are met. There was a lot of discussion about this patch. See the Phabricator thread for detals. Differential Revision: https://reviews.llvm.org/D60047 llvm-svn: 362900
*	[X86] NFCI : Comment updation for EVEX to VEX translation.	Jatin Bhateja	2019-06-09	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reviewers: llvm-commits, jbhateja Reviewed By: jbhateja Subscribers: llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D63055 llvm-svn: 362898
*	Use for-range loop. NFCI.	Simon Pilgrim	2019-06-09	1	-3/+1
\| \| \| \|	llvm-svn: 362897
*	[AArch64][GlobalISel] Select immediate forms of cmp instructions.	Amara Emerson	2019-06-09	1	-5/+17
\| \| \| \| \| \| \| \|	A simple re-use of the immediate operand matcher and renderer functions. rdar://43795178 llvm-svn: 362896
*	[X86] Remove (store (f32 (extractelt (v4f32))) isel patterns which is redundant.	Craig Topper	2019-06-09	2	-15/+0
\| \| \| \| \| \| \|	We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently. llvm-svn: 362895