bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[X86] NFCI : Comment updation for EVEX to VEX translation.	Jatin Bhateja	2019-06-09	1	-3/+3
	Reviewers: llvm-commits, jbhateja
	Reviewed By: jbhateja
	Subscribers: llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D63055
	llvm-svn: 362898
*	Use for-range loop. NFCI.	Simon Pilgrim	2019-06-09	1	-3/+1
	llvm-svn: 362897
*	[AArch64][GlobalISel] Select immediate forms of cmp instructions.	Amara Emerson	2019-06-09	1	-5/+17
	A simple re-use of the immediate operand matcher and renderer functions.
	rdar://43795178
	llvm-svn: 362896
*	[X86] Remove (store (f32 (extractelt (v4f32)))) isel patterns, which are redundant.	Craig Topper	2019-06-09	2	-15/+0
	We emit a MOVSSmr and a COPY_TO_REGCLASS, but that's what we would get from selecting the store and extractelt independently.
	llvm-svn: 362895
*	[X86] Mutate scalar fceil/ffloor/ftrunc/fnearbyint/frint into X86ISD::RNDSCALE during PreProcessIselDAG to cut down on number of isel patterns.	Craig Topper	2019-06-08	4	-121/+23
	Similar was done for vectors in r362535. Removes about 1200 bytes from the isel table.
	llvm-svn: 362894
*	[DAGCombine] visitAND - merge (zext_inreg ((s)extload x)) -> (zextload x) combines. NFCI.	Simon Pilgrim	2019-06-08	1	-21/+4
	Same codegen; the combines differ only in the oneuse limit for the sextload case.
	llvm-svn: 362880
*	[InstSimplify] enhance fcmp fold with never-nan operand	Sanjay Patel	2019-06-08	1	-1/+3
	This is 1 step towards correcting our usage of fast-math-flags when applied on an fcmp. In this case, we are checking for 'nnan' on the fcmp itself rather than the operand of the fcmp. But I'm leaving that clause in until we're more confident that we can stop relying on fcmp's FMF.
	By using the more general "isKnownNeverNaN()", we gain a simplification shown on the tests with 'uitofp' regardless of the FMF on the fcmp (uitofp never produces a NaN). On the tests with 'fabs', we are now relying on the FMF for the call fabs instruction in addition to the FMF on the fcmp.
	I'll update the 'ult' case below here as a follow-up assuming no problems here.
	Differential Revision: https://reviews.llvm.org/D62979
	llvm-svn: 362879
*	[ARM] Adjust isLegalT1AddressImmediate for non-legal types	David Green	2019-06-08	1	-2/+2
	Types such as float and i64 do not have legal loads in Thumb1, but will still be loaded with an LDR (or potentially multiple LDRs). As such we can treat the cost of addressing mode calculations the same as an i32 and get some optimisation benefits.
	Differential Revision: https://reviews.llvm.org/D62968
	llvm-svn: 362874
*	[ARM] Add MVE addressing to isLegalT2AddressImmediate	David Green	2019-06-08	1	-1/+20
	Now with MVE being added, we can add the vector addressing mode costs for it. These are generally imm7 multiplied by the size of the type being loaded / stored.
	Differential Revision: https://reviews.llvm.org/D62967
	llvm-svn: 362873
*	[ARM] Add fp16 addressing to isLegalT2AddressImmediate	David Green	2019-06-08	1	-0/+3
	The fp16 version of VLDR takes an imm8 multiplied by 2. This updates the costs to account for those, and adds extra testing. It is dependent upon hasFPRegs16 as this is what the load/store instructions require.
	Differential Revision: https://reviews.llvm.org/D62966
	llvm-svn: 362872
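The two addressing-mode commits above describe offsets as a fixed-width immediate multiplied by an access size (imm7 times the type size for MVE, imm8 times 2 for the fp16 VLDR). The following is a minimal sketch of that kind of range check, not the code in isLegalT2AddressImmediate. The helper names are hypothetical, and treating negative offsets by magnitude is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true when Offset can be written as +/-(imm * Scale),
// where imm is an unsigned field that is Bits wide.
// Assumption: negative offsets are checked by magnitude.
bool isScaledImmediate(int64_t Offset, unsigned Bits, int64_t Scale) {
  assert(Scale > 0 && Bits > 0 && Bits < 32 && "unsupported encoding");
  int64_t Magnitude = Offset < 0 ? -Offset : Offset;
  if (Magnitude % Scale != 0)
    return false; // not a multiple of the access size
  return Magnitude / Scale < (int64_t{1} << Bits);
}

// fp16 VLDR as described above: imm8 multiplied by 2.
bool isLegalFP16Offset(int64_t Offset) {
  return isScaledImmediate(Offset, 8, 2);
}

// MVE as described above: imm7 multiplied by the size of the type in bytes.
bool isLegalMVEOffset(int64_t Offset, int64_t TypeBytes) {
  return isScaledImmediate(Offset, 7, TypeBytes);
}
```

Under these assumptions an fp16 offset of 510 is accepted while 512 and 3 are rejected, since 512 / 2 no longer fits in 8 bits and 3 is not a multiple of 2.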
*	[ARM] Add HasNEON for all Neon patterns in ARMInstrNEON.td. NFCI	David Green	2019-06-08	1	-78/+177
	We are starting to add an entirely separate vector architecture to the ARM backend. To do that we need at least some separation between the existing NEON and the new MVE code. This patch just goes through the Neon patterns and ensures that they are predicated on HasNEON, giving MVE a stable place to start from.
	No tests yet as this is largely an NFC, and we don't have the other target that will treat any of these instructions as legal.
	Differential Revision: https://reviews.llvm.org/D62945
	llvm-svn: 362870
*	[SystemZ] Fix CMakeLists.txt for alphabetical order (NFC).	Jonas Paulsson	2019-06-08	1	-1/+1
	llvm-svn: 362869
*	[SystemZ, RegAlloc] Favor 3-address instructions during instruction selection.	Jonas Paulsson	2019-06-08	17	-160/+429
	This patch aims to reduce spilling and register moves by using the 3-address versions of instructions by default instead of the 2-address equivalents. Both spilling and register moves generally seem to improve noticeably.
	Regalloc hints are passed to increase conversions to 2-address instructions, which are done in SystemZShortenInst.cpp (after regalloc).
	Since the SystemZ reg/mem instructions are 2-address (dst and lhs regs are the same), foldMemoryOperandImpl() can no longer trivially fold a spilled source register since the reg/reg instruction is now 3-address. In order to remedy this, new 3-address pseudo memory instructions are used to perform the folding only when the dst and lhs virtual registers are known to be allocated to the same physreg. In order to not let MachineCopyPropagation run and change registers on these transformed instructions (making it 3-address), a new target pass called SystemZPostRewrite.cpp is run just after VirtRegRewriter, that immediately lowers the pseudo to a target instruction.
	If it had been possible to insert a COPY instruction and change a register operand (convert to 2-address) in foldMemoryOperandImpl() while trusting that the caller (e.g. InlineSpiller) would update/repair the involved LiveIntervals, the solution involving pseudo instructions would not have been needed. This is perhaps a potential improvement (see Phabricator post).
	Common code changes:
	* A new hook TargetPassConfig::addPostRewrite() is utilized to be able to run a target pass immediately before MachineCopyPropagation.
	* VirtRegMap is passed as an argument to foldMemoryOperand().
	Review: Ulrich Weigand, Quentin Colombet
	https://reviews.llvm.org/D60888
	llvm-svn: 362868
*	Factor out SelectionDAG's switch analysis and lowering into a separate component.	Amara Emerson	2019-06-08	5	-767/+573
	In order for GlobalISel to re-use the significant amount of analysis and optimization code in SDAG's switch lowering, we first have to extract it and create an interface to be used by both frameworks.
	No test changes as it's NFC.
	Differential Revision: https://reviews.llvm.org/D62745
	llvm-svn: 362857
*	[GVN] non-functional code movement	Keno Fischer	2019-06-07	2	-16/+16
	Summary: Move some code around, in preparation for later fixes to the non-integral addrspace handling (D59661).
	Patch By Jameson Nash <jameson@juliacomputing.com>
	Reviewed By: reames, loladiro
	Differential Revision: https://reviews.llvm.org/D59729
	llvm-svn: 362853
*	AMDGPU: Force skips around traps	Matt Arsenault	2019-06-07	1	-1/+1
	llvm-svn: 362852
*	[DomTreeUpdater] Add all insert before all delete updates to reduce compile time.	Alina Sbirlea	2019-06-07	1	-4/+10
	Summary: The cleanup in D62751 introduced a compile-time regression due to the way DT updates are performed. Add all insert edges then all delete edges in DTU to match the previous compile time.
	Compile time on the test provided by @mstorsjo before and after this patch on my machine: 113.046s vs 35.649s
	Repro: clang -target x86_64-w64-mingw32 -c -O3 glew-preproc.c; on https://martin.st/temp/glew-preproc.c.
	Reviewers: kuhar, NutshellySima, mstorsjo
	Subscribers: jlebar, mstorsjo, dmgreen, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D62981
	llvm-svn: 362839
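The ordering described in the DomTreeUpdater commit above (all insert edges first, then all delete edges) can be illustrated with a small standalone sketch. The `EdgeUpdate` type and the function name are hypothetical stand-ins, not LLVM's dominator tree update types, and the patch itself may simply build its list in this order rather than reorder an existing one.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

enum class UpdateKind { Insert, Delete };

// Hypothetical stand-in for a dominator tree edge update.
struct EdgeUpdate {
  UpdateKind Kind;
  int From;
  int To;
};

// Reorders a batch so that every insertion precedes every deletion while
// keeping the relative order inside each group.
void insertsBeforeDeletes(std::vector<EdgeUpdate> &Updates) {
  std::stable_partition(
      Updates.begin(), Updates.end(),
      [](const EdgeUpdate &U) { return U.Kind == UpdateKind::Insert; });
}
```

`std::stable_partition` is used here so that the updates within each group stay in the order they were produced.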
*	[X86] Remove unnecessary new line escape from the end of a macro. NFC	Craig Topper	2019-06-07	1	-1/+1
	llvm-svn: 362837
*	[GlobalISel] IRTranslator: Translate the intrinsics ignored by CodeGen	Volkan Keles	2019-06-07	1	-0/+5
	Summary: Translate `llvm.assume`, `llvm.var.annotation` and `llvm.sideeffect` to nothing as they have no effect on CodeGen.
	Reviewers: qcolombet, aditya_nandakumar, dsanders, paquette, aemerson, arsenm
	Reviewed By: arsenm
	Subscribers: hiraditya, wdng, rovka, kristof.beyls, javed.absar, Petar.Avramovic, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D63022
	llvm-svn: 362834
*	[APFloat] APFloat::Storage::Storage - refix use after move	Nick Desaulniers	2019-06-07	1	-1/+2
	Summary: Re-land r360675 after it was reverted in r360770.
	This was reported in: https://llvm.org/reports/scan-build/
	Based on feedback in: https://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20190513/652286.html
	Reviewers: RKSimon, efriedma
	Reviewed By: RKSimon, efriedma
	Subscribers: eli.friedman, hiraditya, llvm-commits, srhines
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D62767
	llvm-svn: 362833
*	[ORC] Update symbol lookup to use a single callback with a required symbol state rather than two callbacks.	Lang Hames	2019-06-07	8	-376/+251
	The asynchronous lookup API (which the synchronous lookup API wraps for convenience) used to take two callbacks: OnResolved (called once all requested symbols had an address assigned) and OnReady (called once all requested symbols were safe to access). This patch updates the asynchronous lookup API to take a single 'OnComplete' callback and a required state (SymbolState) to determine when the callback should be made. This simplifies the common use case (where the client is interested in a specific state) and will generalize neatly as new states are introduced to track runtime initialization of symbols.
	Clients who were making use of both callbacks in a single query will now need to issue two queries (one for SymbolState::Resolved and another for SymbolState::Ready). Synchronous lookup API clients who were explicitly passing the WaitOnReady argument will now need to pass a SymbolState instead (for 'WaitOnReady == true' use SymbolState::Ready, for 'WaitOnReady == false' use SymbolState::Resolved). Synchronous lookup API clients who were using default argument values should see no change.
	llvm-svn: 362832
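The shape of the change described in the ORC commit above (one completion callback plus a required symbol state) can be sketched with a toy model. This is not the ORC API: the `LookupQuery` class, its `notify` method, and the three-value enum are assumptions made for illustration, and the real SymbolState may contain more states.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Simplified model of symbol states; ordering matters (later is "further").
enum class SymbolState { Pending, Resolved, Ready };

// Toy query: fires OnComplete once every requested symbol has reached
// at least the required state.
class LookupQuery {
public:
  LookupQuery(const std::vector<std::string> &Names, SymbolState Required,
              std::function<void()> OnComplete)
      : Required(Required), OnComplete(std::move(OnComplete)) {
    for (const std::string &N : Names)
      Reached[N] = false;
    Outstanding = Reached.size();
  }

  // Called whenever a symbol advances to a new state.
  void notify(const std::string &Name, SymbolState S) {
    auto It = Reached.find(Name);
    if (It == Reached.end() || It->second || S < Required)
      return; // unknown symbol, already counted, or not far enough yet
    It->second = true;
    if (--Outstanding == 0 && OnComplete)
      OnComplete();
  }

private:
  SymbolState Required;
  std::function<void()> OnComplete;
  std::map<std::string, bool> Reached;
  std::size_t Outstanding = 0;
};
```

In this model a client that previously wanted both notifications creates two queries over the same names, one with `SymbolState::Resolved` and one with `SymbolState::Ready`, which mirrors the migration advice in the commit message.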
*	[DAGCombine] visitAND - fix local shadow variable warnings. NFCI.	Simon Pilgrim	2019-06-07	1	-24/+24
	llvm-svn: 362825
*	[DAGCombine] Use APInt::extractBits in "sub-splat" constant mask detection. NFCI.	Simon Pilgrim	2019-06-07	1	-3/+3
	llvm-svn: 362820
*	[Analysis] simplify code for getSplatValue(); NFC	Sanjay Patel	2019-06-07	1	-20/+11
	AFAIK, this is only currently called by TTI, but it could be used from instcombine or CGP to help solve problems like:
	https://bugs.llvm.org/show_bug.cgi?id=37428
	https://bugs.llvm.org/show_bug.cgi?id=42174
	llvm-svn: 362810
*	[MachineScheduler] checkResourceLimit boundary condition update	Jinsong Ji	2019-06-07	1	-5/+11
	When we call checkResourceLimit in bumpCycle or bumpNode and we know the resource count has just reached the limit (the equations are equal), we should return true to mark that we are resource limited for the next schedule. Otherwise we might continue to schedule in favor of latency for one more schedule and create a schedule that actually overbooks the resource.
	When we call checkResourceLimit to estimate the resource limit before scheduling, we don't need to return true even if the equations are equal, as it shouldn't limit the schedule.
	Differential Revision: https://reviews.llvm.org/D62345
	llvm-svn: 362805
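The boundary rule in the MachineScheduler commit above can be written down as a small standalone function. This is a sketch under stated assumptions rather than the scheduler's code: it assumes the test compares the scaled resource count, less the scaled latency, against the latency factor, and the function and parameter names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Sketch of the boundary condition. AfterSchedNode models the
// bumpCycle/bumpNode call sites; false models the pre-scheduling estimate.
bool isResourceLimited(unsigned LatencyFactor, unsigned ResourceCount,
                       unsigned Latency, bool AfterSchedNode) {
  int64_t Excess = static_cast<int64_t>(ResourceCount) -
                   static_cast<int64_t>(Latency) * LatencyFactor;
  if (AfterSchedNode)
    return Excess >= static_cast<int64_t>(LatencyFactor); // equality counts
  return Excess > static_cast<int64_t>(LatencyFactor);    // equality does not
}
```

With a latency factor of 2, a resource count of 10 and a latency of 4, the excess equals the factor, so the call is resource limited only when `AfterSchedNode` is true.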
*	test-commit	Stefan Stipanovic	2019-06-07	1	-1/+0
	llvm-svn: 362802
*	TailDuplicator: Remove no-op analyzeBranch call	Matt Arsenault	2019-06-07	1	-5/+0
	This could fail, which looked concerning. However, nothing was actually using the results of this. I assume this was intended to use the anti-feature of analyzeBranch of removing instructions, but wasn't actually calling it with AllowModify = true.
	Fixes bug 42162.
	llvm-svn: 362800
*	[NFC] Don't export helpers of ConstantFoldCall	Joerg Sonnenberger	2019-06-07	1	-9/+11
	llvm-svn: 362799
*	llvm-lib: Disallow mixing object files with different machine types	Nico Weber	2019-06-07	3	-1/+87
	lib.exe doesn't allow creating .lib files with object files that have differing machine types. Update llvm-lib to match.
	The motivation is to make it possible to infer the machine type of a .lib file in lld, so that it can warn when e.g. a 32-bit .lib file is passed to a 64-bit link (PR38965).
	Fixes PR38782.
	Differential Revision: https://reviews.llvm.org/D62913
	llvm-svn: 362798
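The rule in the llvm-lib commit above (every object file in a .lib must share one machine type) amounts to a single pass over the members. The sketch below is a standalone illustration with hypothetical types and names. It does not use LLVM's object file classes and does not reproduce llvm-lib's diagnostics.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical subset of machine types.
enum class Machine { X86, X64, ARM64 };

// Hypothetical stand-in for an archive member.
struct ObjectFile {
  std::string Name;
  Machine Type;
};

// Returns the first member whose machine type differs from that of the
// first member, or nullptr when all members agree (or the list is empty).
const ObjectFile *findMachineMismatch(const std::vector<ObjectFile> &Members) {
  for (const ObjectFile &O : Members)
    if (O.Type != Members.front().Type)
      return &O;
  return nullptr;
}
```

Because all members must agree, the machine type of a valid archive can then be read from any single member, which is what makes the inference mentioned in the commit message possible.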
*	[x86] narrow extract subvector of vector select	Sanjay Patel	2019-06-07	1	-0/+52
	This is a potentially large perf win for AVX1 targets because of the way we auto-vectorize to 256-bit but then expect the backend to legalize/optimize for the half-implemented AVX1 ISA.
	On the motivating example from PR37428 (even though this patch doesn't solve the vector shift issue): https://bugs.llvm.org/show_bug.cgi?id=37428 ...there's a 16% speedup when compiling with "-mavx" (perf tested on Haswell) because we eliminate the remaining 256-bit vblendv ops.
	I added comments on a couple of tests that require further work. If we have 256-bit logic ops separating the vselect and extract, we should probably narrow everything to 128-bit, but that requires a larger pattern match.
	Differential Revision: https://reviews.llvm.org/D62969
	llvm-svn: 362797
*	[ARM] Fix bugs introduced by the fp64/d32 rework.	Simon Tatham	2019-06-07	1	-79/+55
	Change D60691 caused some knock-on failures that weren't caught by the existing tests. Firstly, selecting a CPU that should have had a restricted FPU (e.g. `-mcpu=cortex-m4`, which should have 16 d-regs and no double precision) could give the unrestricted version, because `ARM::getFPUFeatures` returned a list of features including subtracted ones (here `-fp64`,`-d32`), but `ARMTargetInfo::initFeatureMap` threw away all the ones that didn't start with `+`. Secondly, the preprocessor macros didn't reliably match the actual compilation settings: for example, `-mfpu=softvfp` could still set `__ARM_FP` as if hardware FP was available, because the list of features on the cc1 command line would include things like `+vfp4`,`-vfp4d16` and clang didn't realise that one of those cancelled out the other.
	I've fixed both of these issues by rewriting `ARM::getFPUFeatures` so that it returns a list that enables every FP-related feature compatible with the selected FPU and disables every feature not compatible, which is more verbose but means clang doesn't have to understand the dependency relationships between the backend features. Meanwhile, `ARMTargetInfo::handleTargetFeatures` is testing for all the various forms of the FP feature names, so that it won't miss cases where it should have set `HW_FP` to feed into feature test macros.
	That in turn caused an ordering problem when handling `-mcpu=foo+bar` together with `-mfpu=something_that_turns_off_bar`. To fix that, I've arranged that the `+bar` suffixes on the end of `-mcpu` and `-march` cause feature names to be put into a separate vector which is concatenated after the output of `getFPUFeatures`.
	Another side effect of all this is to fix a bug where `clang -target armv8-eabi` by itself would fail to set `__ARM_FEATURE_FMA`, even though `armv8` (aka Arm v8-A) implies FP-Armv8 which has FMA. That was because `HW_FP` was being set to a value including only the `FPARMV8` bit, but that feature test macro was testing only the `VFP4FPU` bit. Now `HW_FP` ends up with all the bits set, so it gives the right answer.
	Changes to tests included in this patch:
	* `arm-target-features.c`: I had to change basically all the expected results. (The Cortex-M4 test in there should function as a regression test for the accidental double-precision bug.)
	* `arm-mfpu.c`, `armv8.1m.main.c`: switched to using `CHECK-DAG` everywhere so that those tests are no longer sensitive to the order of cc1 feature options on the command line.
	* `arm-acle-6.5.c`: updated to expect the right answer to that FMA test.
	* `Preprocessor/arm-target-features.c`: added a regression test for the `mfpu=softvfp` issue.
	Reviewers: SjoerdMeijer, dmgreen, ostannard, samparker, JamesNagurne
	Reviewed By: ostannard
	Subscribers: srhines, javed.absar, kristof.beyls, hiraditya, cfe-commits, llvm-commits
	Tags: #clang, #llvm
	Differential Revision: https://reviews.llvm.org/D62998
	llvm-svn: 362791
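The first failure described in the commit above (subtracted features such as `-fp64` and `-d32` being thrown away) can be reproduced in a toy model of `+name`/`-name` feature lists. The function names below are hypothetical and the code is not clang's; it only assumes that entries are applied in order and that a later entry overrides an earlier one.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Applies "+name"/"-name" entries in order, so a later entry overrides an
// earlier one for the same feature.
std::set<std::string> applyFeatures(std::set<std::string> Enabled,
                                    const std::vector<std::string> &Features) {
  for (const std::string &F : Features) {
    if (F.size() < 2)
      continue; // ignore malformed entries in this sketch
    if (F[0] == '+')
      Enabled.insert(F.substr(1));
    else if (F[0] == '-')
      Enabled.erase(F.substr(1));
  }
  return Enabled;
}

// Models the bug: every entry that does not start with '+' is dropped
// before the list is applied, so defaults can never be switched off.
std::set<std::string>
applyOnlyPositive(std::set<std::string> Enabled,
                  const std::vector<std::string> &Features) {
  std::vector<std::string> Kept;
  for (const std::string &F : Features)
    if (!F.empty() && F[0] == '+')
      Kept.push_back(F);
  return applyFeatures(std::move(Enabled), Kept);
}
```

Starting from defaults that include `fp64` and `d32`, the list `+vfp4`, `-fp64`, `-d32` leaves double precision enabled under `applyOnlyPositive` but disabled under `applyFeatures`. The same in-order rule also shows why the `+bar` suffixes from `-mcpu` have to be concatenated after the FPU-derived entries if they are meant to win.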
*	[RISCV] Support Bit-Preserving FP in F/D Extensions	Sam Elliott	2019-06-07	2	-0/+7
	Summary: This allows some integer bitwise operations to instead be performed by hardware fp instructions. This is correct because the RISC-V spec requires the F and D extensions to use the IEEE-754 standard representation, and fp register loads and stores to be bit-preserving.
	This is tested against the soft-float ABI, but with hardware float extensions enabled, so that the tests ensure the optimisation also fires in this case.
	Reviewers: asb, luismarques
	Reviewed By: asb
	Subscribers: hiraditya, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, kito-cheng, shiva0217, jrtc27, zzheng, edward-jones, rogfer01, MartinMosbeck, brucehoult, the_o, rkruppe, PkmX, jocewei, psnobl, benna, Jim, llvm-commits
	Tags: #llvm
	Differential Revision: https://reviews.llvm.org/D62900
	llvm-svn: 362790
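The kind of equivalence the RISC-V commit above relies on can be checked on any IEEE-754 host: clearing or flipping the sign bit of a float's bit pattern gives the same bits as taking its absolute value or negating it. The helper names below are hypothetical, and which exact integer patterns the backend rewrites is not stated in the commit message, so the two cases shown are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(uint32_t), "expects 32-bit float");

// Bit-preserving conversions between float and its 32-bit pattern.
uint32_t bitsOf(float F) {
  uint32_t B;
  std::memcpy(&B, &F, sizeof B);
  return B;
}

float floatOf(uint32_t B) {
  float F;
  std::memcpy(&F, &B, sizeof F);
  return F;
}

// Integer AND that clears the sign bit: same bits as an fp absolute value.
float absViaInteger(float F) { return floatOf(bitsOf(F) & 0x7fffffffu); }

// Integer XOR that flips the sign bit: same bits as an fp negation.
float negViaInteger(float F) { return floatOf(bitsOf(F) ^ 0x80000000u); }
```

Because the integer and floating-point forms agree bit for bit, a backend is free to pick whichever is cheaper for where the value currently lives.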
*	[AMDGPU] Constrain the AMDGPU inliner on maximum number of basic blocks in a caller function (compile time performance)	Valery Pykhtin	2019-06-07	1	-1/+15
	Differential revision: https://reviews.llvm.org/D62917
	llvm-svn: 362789
*	[AArch64][AsmParser] error on unexpected SVE predicate type suffix	Cullen Rhodes	2019-06-07	1	-2/+1
	Summary: This patch fixes a bug in the assembler that permitted a type suffix on predicate registers when not expected. For instance, the following was previously valid:
	    faddv h0, p0.q, z1.h
	This bug was present in all SVE instructions containing predicates with no type suffix and no predication form qualifier, i.e. /z or /m. The latter instructions are already caught with an appropriate error message by the assembler, e.g.:
	    .text
	    <stdin>:1:13: error: not expecting size suffix
	    cmpne p1.s, p0.b/z, z2.s, 0
	                ^
	A similar issue for SVE vector registers was fixed in: https://reviews.llvm.org/D59636
	Reviewed By: SjoerdMeijer
	Differential Revision: https://reviews.llvm.org/D62942
	llvm-svn: 362780
*	[AArch64][AsmParser] Provide better diagnostics for SVE predicates	Cullen Rhodes	2019-06-07	1	-1/+5
	Patch by Sander de Smalen (sdesmalen)
	Reviewed By: SjoerdMeijer
	Differential Revision: https://reviews.llvm.org/D62941
	llvm-svn: 362779
*	[X86] -march=cooperlake (llvm)	Pengfei Wang	2019-06-07	2	-1/+20
	Support Intel -march=cooperlake in LLVM.
	Patch by Shengchen Kan (skan)
	Differential Revision: https://reviews.llvm.org/D62836
	llvm-svn: 362776
*	Fix for lld buildbot	Sam Parker	2019-06-07	1	-2/+1
	Removed unused (in non-debug builds) variable.
	llvm-svn: 362775
*	[CodeGen] Generic Hardware Loop Support	Sam Parker	2019-06-07	11	-580/+805
	Patch which introduces a target-independent framework for generating hardware loops at the IR level. Most of the code has been taken from PowerPC CTRLoops and PowerPC has been ported over to use this generic pass. The target dependent parts have been moved into TargetTransformInfo, via isHardwareLoopProfitable, with HardwareLoopInfo introduced to transfer information from the backend.
	Three generic intrinsics have been introduced:
	- void @llvm.set_loop_iterations
	  Takes, as a single operand, the number of iterations to be executed.
	- i1 @llvm.loop_decrement(anyint)
	  Takes the maximum number of elements processed in an iteration of the loop body and subtracts this from the total count. Returns false when the loop should exit.
	- anyint @llvm.loop_decrement_reg(anyint, anyint)
	  Takes the number of elements remaining to be processed as well as the maximum number of elements processed in an iteration of the loop body. Returns the updated number of elements remaining.
	llvm-svn: 362774
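The behaviour the commit above describes for the first two intrinsics can be modelled in plain C++. This is a software model only, with hypothetical names; the saturation at zero and the do-while loop shape are assumptions, not something the commit message specifies.

```cpp
#include <cassert>
#include <cstdint>

// Software model of the loop counter state.
class LoopCounterModel {
  uint64_t Remaining = 0;

public:
  // Models @llvm.set_loop_iterations: record the total count.
  void setLoopIterations(uint64_t N) { Remaining = N; }

  // Models @llvm.loop_decrement: subtract the per-iteration amount and
  // return false when the loop should exit.
  // Assumption: the count saturates at zero instead of wrapping.
  bool loopDecrement(uint64_t Step) {
    Remaining = Step >= Remaining ? 0 : Remaining - Step;
    return Remaining != 0;
  }
};

// Counts how often a do-while loop body runs under this model.
unsigned countBodyExecutions(uint64_t Total, uint64_t Step) {
  assert(Total > 0 && Step > 0 && "model expects a non-empty loop");
  LoopCounterModel Counter;
  Counter.setLoopIterations(Total);
  unsigned Executions = 0;
  do {
    ++Executions; // the loop body would go here
  } while (Counter.loopDecrement(Step));
  return Executions;
}
```

With a step of 1 the body runs once per iteration. With a step larger than 1 the count behaves like a number of elements, so 10 elements processed 4 at a time take 3 trips through the body in this model.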
*	[AVR] Expand 16-bit rotations during the legalization stage	Dylan McKay	2019-06-07	1	-2/+2
	In r356860, the legalization logic for BSWAP was modified to use ISD::ROTL, rather than the old ISD::{SHL, SRL, OR} nodes.
	This works fine on AVR for 8-bit rotations, but 16-bit rotations are currently unimplemented - they always trigger an assertion error in the AVRExpandPseudoInsts pass ("RORW unimplemented").
	This patch instructs the legalizer to expand 16-bit rotations into the SHL, SRL, OR pattern it used previously.
	This fixes the 'issue-cannot-select-bswap.ll' test. Interestingly, this test failure seems flaky - it passes successfully on the avr-build-01 buildbot, but fails locally on my Arch Linux install.
	llvm-svn: 362773
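The SHL, SRL, OR expansion mentioned in the AVR commit above is the usual way to express a rotation with shifts. The sketch below shows it for 16-bit values, together with the fact that a 16-bit byte swap is a rotation by 8. The function names are hypothetical and this is ordinary C++, not the legalizer's code.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by N bits using only shift-left, shift-right and OR.
uint16_t rotl16(uint16_t X, unsigned N) {
  N &= 15; // rotation amounts wrap at the bit width
  if (N == 0)
    return X; // avoid shifting by the full width
  return static_cast<uint16_t>((X << N) | (X >> (16 - N)));
}

// A 16-bit byte swap is a rotation by 8 bits.
uint16_t bswap16(uint16_t X) { return rotl16(X, 8); }
```

The explicit `N == 0` case keeps the right shift amount below 16, which matters once the same pattern is applied at the full width of the promoted operand type.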
*	[MC][ELF] Don't create relocations with section symbols for STB_LOCAL ifunc	Fangrui Song	2019-06-07	1	-0/+6
	We should keep the symbol type (STT_GNU_IFUNC) for a local ifunc because it may result in an IRELATIVE reloc that the dynamic loader will use to resolve the address at startup time.
	There is another problem that is not fixed by this patch: a PC relative relocation should also create a relocation with the ifunc symbol.
	llvm-svn: 362767
*	[LV] Fix -Wunused-function after r362736	Fangrui Song	2019-06-07	1	-0/+2
	llvm-svn: 362762
*	AMDGPU: Don't count mask branch pseudo towards skip threshold	Matt Arsenault	2019-06-07	1	-10/+8
	llvm-svn: 362761
*	AMDGPU: Insert skips for blocks with FLAT	Matt Arsenault	2019-06-07	1	-1/+2
	This already forced a skip for VMEM, so it should also be done for flat. I'm somewhat skeptical about the benefit of this though.
	llvm-svn: 362760
*	[PowerPC] Exploit the vector min/max instructions	Nemanja Ivanovic	2019-06-06	3	-0/+65
	Use the PPC vector min/max instructions for computing the corresponding operation as these should be faster than the compare/select sequences we currently emit.
	Differential revision: https://reviews.llvm.org/D47332
	llvm-svn: 362759
*	AMDGPU: Insert skip branches over return blocks	Matt Arsenault	2019-06-06	2	-3/+4
	SIInsertSkips really doesn't understand the control flow, and makes very stupid assumptions about the block layout. This was able to get away with not skipping return blocks, since usually after structurization there is only one placed at the end of the function. Tail duplication can break this assumption.
	llvm-svn: 362754
*	[DebugInfo] Incorrect debug info record generated for loop counter.	Alexey Lapshin	2019-06-06	1	-19/+1
	An incorrect debug variable range was calculated during the "COMPUTING LIVE DEBUG VARIABLES" stage. The range for the debug variable "i" was computed according to the current state of the instructions inside the basic block, but the register allocator creates new instructions that were not taken into account when the live debug variables were computed. As a result, the DBG_VALUE instruction for the "i" variable was placed after these newly inserted instructions. This is incorrect: the debug value for the loop counter should be inserted before any loop instruction.
	Differential Revision: https://reviews.llvm.org/D62650
	llvm-svn: 362750
*	[AMDGPU] Partial revert for the ba447bae7448435c9986eece0811da1423972fdd	Alexander Timofeev	2019-06-06	4	-163/+107
	Partially reverts "Divergence driven ISel. Assign register class for cross block values according to the divergence.", which exposed a design flaw leading to several issues that need to be solved first. This change reverts the AMDGPU-specific changes and leaves the common part unaffected.
	llvm-svn: 362749
*	[X86] Make a bunch of merge masked binops commutable for load folding.	Craig Topper	2019-06-06	1	-8/+7
	This primarily affects add/fadd/mul/fmul/and/or/xor/pmuludq/pmuldq/max/min/fmaxc/fminc/pmaddwd/pavg.
	We already commuted the unmasked and zero masked versions.
	I've added 512-bit stack folding tests for most of the instructions affected. I've tested needing commuting and not commuting across unmasked, merged masked, and zero masked. The 128/256 bit instructions should behave similarly.
	llvm-svn: 362746
*	[CFLGraph] Add support for unary fneg instruction.	Craig Topper	2019-06-06	1	-0/+10
	Differential Revision: https://reviews.llvm.org/D62791
	llvm-svn: 362737
*	[LV] Wrap LV illegality reporting in a function. NFC.	Renato Golin	2019-06-06	1	-100/+120
	A function for loop vectorization illegality reporting has been introduced:
	    void LoopVectorizationLegality::reportVectorizationFailure(
	        const StringRef DebugMsg, const StringRef OREMsg,
	        const StringRef ORETag, Instruction * const I) const;
	The function prints a debug message when the debug for the compilation unit is enabled as well as invokes the optimization report emitter to generate a message with a specified tag. The function doesn't cover any complicated logic when a custom lambda should be passed to the emitter, only generating a message with a tag is supported.
	The function always prints the instruction `I` after the debug message whenever the instruction is specified, otherwise the debug message ends with a dot: 'LV: Not vectorizing: Disabled/already vectorized.'
	Patch by Pavel Samolysov <samolisov@gmail.com>
	llvm-svn: 362736
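The debug-message rule stated in the commit above (append the instruction when one is given, otherwise end with a dot) can be sketched as a small string formatter. The function name is hypothetical, the instruction is represented by its printed text, and the single space used as a separator is an assumption. The optimization remark side of the real function is not modelled.

```cpp
#include <cassert>
#include <string>

// Builds the debug line. Instr is the printed form of the instruction,
// or nullptr when no instruction is associated with the failure.
std::string formatFailureMessage(const std::string &DebugMsg,
                                 const std::string *Instr) {
  std::string Out = "LV: Not vectorizing: " + DebugMsg;
  if (Instr)
    Out += " " + *Instr; // assumption: a single space separates the two
  else
    Out += ".";
  return Out;
}
```

Called with "Disabled/already vectorized" and no instruction, this yields the example line quoted in the commit message.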