Minor refactor of DWARFUnit::getStringOffsetSectionItem().
Differential Revision: https://reviews.llvm.org/D53948
llvm-svn: 345776

lowerRangeToAssertZExt currently relies on something like EarlyCSE having
eliminated the constant range [0,1). At -O0 this leads to an assert.
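As an illustration (hypothetical IR, not from the patch), a degenerate
[0,1) range that nothing folds away at -O0:

  declare i64 @get()

  define i64 @f() {
    ; range [0,1) means the result is known to be 0; at -O2 EarlyCSE
    ; would normally fold this before it reaches lowering
    %v = call i64 @get(), !range !0
    ret i64 %v
  }
  !0 = !{i64 0, i64 1}
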
Differential Revision: https://reviews.llvm.org/D53888
llvm-svn: 345770

This feature is only relevant to shaders, and is no longer used. When disabled,
lowering of reserved registers for shaders causes a compiler crash.
Remove the feature and add a test for compilation of shaders at OptNone.
Differential Revision: https://reviews.llvm.org/D53829
llvm-svn: 345763

builds. NFC
llvm-svn: 345761

llvm-svn: 345758

We should be using the getShiftAmountTy value type for shift amounts.
llvm-svn: 345756

Reviewers: arsenm, spatel
Reviewed By: spatel
Subscribers: lebedev.ri, wdng, llvm-commits
Differential Revision: https://reviews.llvm.org/D53774
llvm-svn: 345751

Summary:
Also fix a couple of bugs where DILocations were lost. EntryBuilder wasn't
passing on debug locations for PHIs, constants, GLOBAL_VALUE, etc.
Reviewers: aprantl, vsk, bogner, aditya_nandakumar, volkan, rtereshin, aemerson
Reviewed By: aemerson
Subscribers: aemerson, rovka, kristof.beyls, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53740
llvm-svn: 345743

Before this patch, DbgInfoAvailable was set to true in
DwarfDebug::beginModule() or CodeViewDebug::CodeViewDebug(). This made
MIR testing awkward, since passes would suddenly stop dealing with debug
info just because the pipeline was stopped before the debug printers ran.
This patch changes the logic to initialize DbgInfoAvailable based on
whether debug_compile_units exist in the llvm Module. The debug printers
may then override it to false when debug printing is disabled.
Differential Revision: https://reviews.llvm.org/D53885
llvm-svn: 345740

Also, remove/replace/minimize/enhance the tests for this fold.
The code drops FMF, so it needs more tests and at least one fix.
llvm-svn: 345734

Make sure that -relocation-model=pic prevents use of GP-relative
addressing modes.
llvm-svn: 345731

This is the inverted case for the transform added with D53874 / rL345725.
llvm-svn: 345728

The 'OLT' case was updated at rL266175, so I assume it was just an
oversight that 'UGE' was not included because that patch handled
both predicates in InstSimplify.
llvm-svn: 345727

This re-raises some of the open questions about how to apply and use fast-math-flags in IR from PR38086:
https://bugs.llvm.org/show_bug.cgi?id=38086
...but given the current implementation (no FMF on casts), this is likely the only way to predicate the
transform.
This is part of solving PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
Differential Revision: https://reviews.llvm.org/D53874
llvm-svn: 345725

Unlike its legacy counterpart, the new pass manager's LoopUnrollPass does
not provide any means to select which flavors of unroll to run (runtime,
peeling, partial), relying instead on global defaults.
In some cases the ability to run a restricted LoopUnroll that does more
than LoopFullUnroll is needed.
Introduced LoopUnrollOptions to select optional unroll behaviors.
Added 'unroll<peeling>' to the PassRegistry, mainly for the sake of testing.
Reviewers: chandlerc, tejohnson
Differential Revision: https://reviews.llvm.org/D53440
llvm-svn: 345723

Reviewers: RKSimon, spatel, javed.absar, craig.topper, t.p.northover
Reviewed By: RKSimon
Subscribers: craig.topper, llvm-commits
Differential Revision: https://reviews.llvm.org/D52504
llvm-svn: 345721

Summary:
Instead of writing boolean values temporarily into 32-bit VGPRs
if they are involved in PHIs or are observed from outside a loop,
we use bitwise masking operations to combine lane masks in a way
that is consistent with wave control flow.
Move SIFixSGPRCopies to before this pass, since that pass
incorrectly attempts to move SGPR phis to VGPRs.
This should recover most of the code quality that was lost with
the bug fix in "AMDGPU: Remove PHI loop condition optimization".
There are still some relevant cases where code quality could be
improved, in particular:
- We often introduce redundant masks with EXEC. Ideally, we'd
have a generic computeKnownBits-like analysis to determine
whether masks are already masked by EXEC, so we can avoid this
masking both here and when lowering uniform control flow.
- The criterion we use to determine whether a def is observed
from outside a loop is conservative: it doesn't check whether
(loop) branch conditions are uniform.
Change-Id: Ibabdb373a7510e426b90deef00f5e16c5d56e64b
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, mgorny, yaxunl, dstuttard, t-tye, eraman, llvm-commits
Differential Revision: https://reviews.llvm.org/D53496
llvm-svn: 345719

Summary:
The optimization to break out of loops early if all threads are dead was
never fully implemented.
But the PHI-node analysis is actually causing a number of problems, so
remove all the extra code for it.
(This does actually regress code quality in a few places because it
ends up relying more heavily on PHIs of i1, which we don't do a
great job with. However, since it fixes real bugs in the wild, we
should take this change. I have some prototype changes to improve
i1 lowering in general -- not just for control flow -- which should
help recover the code quality; I just need to make those changes
fit for general consumption. -- Nicolai)
Change-Id: I6fc6c6c8961857ac6009fcfb9f7e5e48dc23fbb1
Patch-by: Christian König <christian.koenig@amd.com>
Reviewers: arsenm, rampitec, tpr
Subscribers: kzhuravl, jvesely, wdng, yaxunl, dstuttard, t-tye, llvm-commits
Differential Revision: https://reviews.llvm.org/D53359
llvm-svn: 345718

This is a fix for PR39475:
https://bugs.llvm.org/show_bug.cgi?id=39475
We managed to get some of these patterns using computeKnownBits in D47041,
but that can't be used for nabs(). Instead, put in some range-based logic
so we can fold icmp against a constant for both abs and nabs.
Alive proofs:
https://rise4fun.com/Alive/21r

Name: abs_nsw_is_positive
  %cmp = icmp slt i32 %x, 0
  %negx = sub nsw i32 0, %x
  %abs = select i1 %cmp, i32 %negx, i32 %x
  %r = icmp sgt i32 %abs, -1
=>
  %r = i1 true

Name: abs_nsw_is_not_negative
  %cmp = icmp slt i32 %x, 0
  %negx = sub nsw i32 0, %x
  %abs = select i1 %cmp, i32 %negx, i32 %x
  %r = icmp slt i32 %abs, 0
=>
  %r = i1 false

Name: nabs_is_negative_or_0
  %cmp = icmp slt i32 %x, 0
  %negx = sub i32 0, %x
  %nabs = select i1 %cmp, i32 %x, i32 %negx
  %r = icmp slt i32 %nabs, 1
=>
  %r = i1 true

Name: nabs_is_not_over_0
  %cmp = icmp slt i32 %x, 0
  %negx = sub i32 0, %x
  %nabs = select i1 %cmp, i32 %x, i32 %negx
  %r = icmp sgt i32 %nabs, 0
=>
  %r = i1 false
Differential Revision: https://reviews.llvm.org/D53844
llvm-svn: 345717

constraints on instruction operands.
Before this patch, class PredicateExpander only knew how to expand simple
predicates that performed checks on instruction operands.
In particular, the new scheduling predicate syntax was not rich enough to
express checks like this one:
Foo(MI->getOperand(0).getImm()) == ExpectedVal;
Here, the immediate operand value at index zero is passed as input to
function Foo, and ExpectedVal is compared against the value returned by Foo.
While this predicate pattern doesn't show up in any X86 model, it shows up in
other upstream targets. So, being able to support those predicates is
fundamental if we want to be able to modernize all the scheduling models
upstream.
With this patch, we allow users to specify whether a register/immediate
operand value needs to be passed as input to a function as part of the
predicate check. Now, register/immediate operand checks all derive from
base class CheckOperandBase.
This patch also changes where TIIPredicate definitions are expanded by the
instruction info emitter. Before, definitions were expanded in class
XXXGenInstrInfo (where XXX is a target name).
With the introduction of this new syntax, we may want to have TIIPredicates
expanded directly in XXXInstrInfo. That is because functions used by the new
operand predicates may only exist in the derived class (i.e. XXXInstrInfo).
This patch is a non-functional change for the existing scheduling models.
In the future, we will be able to use this richer syntax to better describe
complex scheduling predicates, and expose them to llvm-mca.
Differential Revision: https://reviews.llvm.org/D53880
llvm-svn: 345714

Our a16 support was only enabled for sample/gather and buffer
load/store, but not for image load/store operations (which take an i16
as the pixel index rather than a half).
Fix our isel lowering and add test cases to prove it out.
Differential Revision: https://reviews.llvm.org/D53750
llvm-svn: 345710

For some unclear reason rewriteLoopExitValues considers recalculation
after the loop profitable if the value has some "soft uses" outside the
loop (i.e. any use other than call and return), even if we have proved
that it has a user inside the loop which we think will not be optimized
away. There is no existing unit test that would explain this. This patch
provides an example where rematerialisation of the exit value is not
profitable, yet it passes this check due to the presence of a "soft use"
outside the loop.
It makes no sense to recalculate a value on exit if we are going to
compute it anyway due to some irremovable use within the loop. This patch
disallows applying this transform in the described situation.
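A sketch of the problematic shape (hypothetical IR, invented names):

  define i64 @f(i64* %p, i64 %n) {
  entry:
    br label %loop
  loop:
    %iv = phi i64 [ 0, %entry ], [ %iv.next, %loop ]
    %iv.next = add nuw i64 %iv, 1
    store i64 %iv.next, i64* %p          ; irremovable user inside the loop
    %cmp = icmp ult i64 %iv.next, %n
    br i1 %cmp, label %loop, label %exit
  exit:
    %r = add i64 %iv.next, 1             ; "soft use" outside the loop
    ret i64 %r
  }
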
Differential Revision: https://reviews.llvm.org/D51581
Reviewed By: etherzhhb
llvm-svn: 345708

optsize using masked wide loads
Under Opt for Size, the vectorizer does not vectorize interleave-groups
that have gaps at the end of the group (such as a loop that reads only
the even elements: a[2*i]), because that implies we would require a
scalar epilogue (which is not allowed under Opt for Size). This patch
extends the support for masked-interleave-groups (introduced by D53011
for conditional accesses) to also cover the case of gaps in a group of
loads; targets that enable the masked-interleave-group feature don't have
to invalidate interleave-groups of loads with gaps; they can now use
masked wide-loads and shuffles (if that's what the cost model selects).
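A sketch of such a loop (hypothetical IR):

  define void @even(i32* %a, i32* %b, i64 %n) {
  entry:
    br label %loop
  loop:
    %i = phi i64 [ 0, %entry ], [ %i.next, %loop ]
    %idx = shl i64 %i, 1            ; access a[2*i]; odd elements form a gap
    %pa = getelementptr inbounds i32, i32* %a, i64 %idx
    %v = load i32, i32* %pa
    %pb = getelementptr inbounds i32, i32* %b, i64 %i
    store i32 %v, i32* %pb
    %i.next = add nuw i64 %i, 1
    %cmp = icmp ult i64 %i.next, %n
    br i1 %cmp, label %loop, label %exit
  exit:
    ret void
  }
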
Reviewers: Ayal, hsaito, dcaballe, fhahn
Reviewed By: Ayal
Differential Revision: https://reviews.llvm.org/D53668
llvm-svn: 345705

Turns out it's not always possible to figure out whether an asm()
statement argument points to a valid memory region.
One example would be per-CPU objects in the Linux kernel, for which the
addresses are calculated using the FS register and a small offset in the
.data..percpu section.
To avoid pulling all sorts of checks into the instrumentation, we replace
the actual checking/unpoisoning code with calls to the
msan_instrument_asm_load(ptr, size) and
msan_instrument_asm_store(ptr, size) functions in the runtime.
This patch doesn't implement the runtime hooks in compiler-rt, as there's
been no demand for assembly instrumentation of userspace apps so far.
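A sketch of the instrumented result (hypothetical operand types and
sizes, hook names as above): an asm store through %p gets preceded by a
call to the runtime hook:

  %p8 = bitcast i64* %p to i8*
  call void @msan_instrument_asm_store(i8* %p8, i64 8)
  call void asm sideeffect "movq $1, $0", "=*m,r"(i64* %p, i64 %v)
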
llvm-svn: 345702

Emit pseudo instructions indicating unwind codes corresponding to each
instruction inside the prologue/epilogue. These are used by the MCLayer to
populate the .xdata section.
Differential Revision: https://reviews.llvm.org/D50288
llvm-svn: 345701

This is similar to SVN r311061 for ARM.
Differential Revision: https://reviews.llvm.org/D53878
llvm-svn: 345698

A plain "%x" format string will drop the high 32 bits. Use the PRIx64
macro instead.
llvm-svn: 345696

This patch caused some internal tests to break which are being investigated.
llvm-svn: 345687

llvm-svn: 345683

This is modeled after C++17 std::empty().
Differential Revision: https://reviews.llvm.org/D53909
llvm-svn: 345679

Make the code do what was mentioned in the comment: only skip the CU types.
This enables the lexical blocks to be verified as well.
llvm-svn: 345675

The debug-use flag must be set exactly for uses on DBG_VALUEs. This is
so obvious that it can be trivially inferred while parsing. This will
reduce noise when printing, while omitting information that has little
value to the user.
The parser will keep recognizing the flag for compatibility with old
`.mir` files.
Differential Revision: https://reviews.llvm.org/D53903
llvm-svn: 345671

It breaks mesa.
llvm-svn: 345662

Differential Revision: https://reviews.llvm.org/D53216
llvm-svn: 345650

llvm-svn: 345647

noop casts
InstCombine features an optimization that essentially replaces:
    if (a)
        free(a)
with:
    free(a)
Right now, this optimization is gated by the minsize attribute, and
therefore we only perform it if we can prove that we are going to be
able to eliminate the branch and the destination block.
However, when casts are involved the optimization would fail to apply,
because it was not smart enough to realize that it is possible to also
move the casts away from the destination block, and that this is
harmless to performance since they are just no-ops.
E.g.,
    foo(int *a)
        if (a)
            free((char*)a)
wouldn't be optimized by instcombine, because:
- We would refuse to hoist the `bitcast i32* %a to i8*` in the source block.
- We would fail to see that `bitcast i32* %a to i8*` and %a are the same value.
This patch fixes both problems:
- It teaches the pattern matching of the comparison how to look
through casts.
- It checks whether the additional instructions in the destination block
can be hoisted and are harmless performance-wise.
- It hoists all the code of the destination block into the source block.
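A before/after sketch in IR (hypothetical function; free(NULL) is a
no-op, so the null test can be dropped):

  declare void @free(i8*)

  define void @foo(i32* %a) {
  entry:
    %tobool = icmp eq i32* %a, null
    br i1 %tobool, label %end, label %then
  then:
    %cast = bitcast i32* %a to i8*
    call void @free(i8* %cast)
    br label %end
  end:
    ret void
  }
=>
  define void @foo(i32* %a) {
  entry:
    %cast = bitcast i32* %a to i8*
    call void @free(i8* %cast)
    ret void
  }
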
Differential Revision: https://reviews.llvm.org/D53356
llvm-svn: 345644

Summary:
Thunk functions in Windows are vararg functions that call a musttail
function to pass the arguments after the fixup is done. We need to make
sure that we forward the arguments from the vararg caller to the vararg
callee function. This is the same mechanism that is used for Windows on X86.
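In IR, the forwarding looks roughly like this (hypothetical sketch):

  declare void @callee(i8*, ...)

  define void @thunk(i8* %this, ...) {
    ; forward the fixed argument and all varargs to the callee
    musttail call void (i8*, ...) @callee(i8* %this, ...)
    ret void
  }
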
Reviewers: ssijaric, eli.friedman, TomTan, mgrang, mstorsjo, rnk, compnerd, efriedma
Reviewed By: efriedma
Subscribers: efriedma, kristof.beyls, chrib, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D53843
llvm-svn: 345641

used inside loops.
llvm-svn: 345638

Summary:
Normalize the offset for endianness before checking whether the store
covers the load in ForwardStoreValueToDirectLoad.
Without this we missed out on some optimizations for big-endian targets.
If, for example, a 4-byte store is followed by a 1-byte load that reads
the least significant byte of the stored value, the STCoversLD check
would fail (see @test4 in test/CodeGen/AArch64/load-store-forwarding.ll).
This patch also fixes a problem seen in an out-of-tree target. The target
has i40 as a legal type, it is big-endian, and the StoreSize for i40 is
48 bits. So when normalizing the offset for endianness we need to take
the StoreSize into account (assuming that padding added when storing
into a larger StoreSize is always added at the most significant end).
Reviewers: niravd
Reviewed By: niravd
Subscribers: javed.absar, kristof.beyls, llvm-commits, uabelho
Differential Revision: https://reviews.llvm.org/D53776
llvm-svn: 345636

Prevents the post-RA scheduler from modifying the prologue sequences
emitted by frame lowering. This is roughly similar to what we do for
other targets: TargetInstrInfo::isSchedulingBoundary checks
isPosition(), which checks for CFI_INSTRUCTION.
isSEHInstruction is taken from D50288; it'll land with whatever patch
lands first.
Differential Revision: https://reviews.llvm.org/D53851
llvm-svn: 345634

Re-apply r345315 with testcase fixes.
Include all of the store's source vector operands when creating the
MachineMemOperand. Previously, we were missing the first operand,
making the store size seem smaller than it really is.
Differential Revision: https://reviews.llvm.org/D52816
llvm-svn: 345631

work correctly if we already walked through a bitcast that changed the element size.
The CONCAT_VECTORS case was using the original mask element count to
determine how to adjust the broadcast index. But if we have looked
through a bitcast, the original mask size doesn't tell us anything about
the concat_vectors.
This patch switches to using the concat_vectors input element count
directly instead.
Differential Revision: https://reviews.llvm.org/D53823
llvm-svn: 345626

Summary:
After commit https://reviews.llvm.org/rL344228, function definitions have
a counter, but when a definition is on a single line the counter is wrong
(e.g. void foo() { }).
I added a test in: https://reviews.llvm.org/D53601
Reviewers: marco-c
Reviewed By: marco-c
Subscribers: llvm-commits, sylvestre.ledru
Differential Revision: https://reviews.llvm.org/D53600
llvm-svn: 345624

llvm-svn: 345623

We were using the wrong enum table when mapping enum values
to strings for public symbol flags.
llvm-svn: 345622

The LRV and STRV nodes carry an extra operand to indicate the
type of the memory access. This is redundant, since the nodes
are actually of class MemIntrinsicNode and therefore already hold that
same information as MemoryVT.
NFC intended.
llvm-svn: 345618

Correctly costing SK_ExtractSubvector requires the SubTy argument to
indicate the type/size of the extracted subvector. Unlike the rest of
the shuffle kinds, this means that the main Ty argument represents the
source vector type, not the destination!
I've done my best to fix a number of vectorizer uses:
- SLP: the reduction epilogue costs should be using a SK_PermuteSingleSrc
shuffle, as these all occur at the hardware vector width - we're not
extracting (illegal) subvector types. This is causing the cost model
diffs, as SK_ExtractSubvector costs are poorly handled and tend to just
return 1 at the moment.
- LV: I'm not clear on what SK_ExtractSubvector should represent for
recurrences - I've used a <1 x ?> subvector extraction, as that seems to
match the VF delta.
Differential Revision: https://reviews.llvm.org/D53573
llvm-svn: 345617

llvm-svn: 345613

shuffle (insert ?, Scalar, IndexC), V1, Mask --> insert V1, Scalar, IndexC'
The motivating case is at least a couple of steps away: I noticed that
SLPVectorizer does not analyze shuffles as well as sequences of
insert/extract in PR34724:
https://bugs.llvm.org/show_bug.cgi?id=34724
...so SLP may fail to vectorize when source code has shuffles to start
with or instcombine has converted insert/extract to shuffles.
Independent of that, an insertelement is always a simpler op for IR
analysis vs. a shuffle, so we should transform to insert when possible.
I don't think there's any codegen concern here - if a target can't insert
a scalar directly to some fixed element in a vector (x86?), then this
should get expanded to the insert+shuffle that we started with.
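For example (hypothetical values, in the style of the Alive proofs above):

  %i = insertelement <4 x float> undef, float %s, i32 0
  %r = shufflevector <4 x float> %i, <4 x float> %v,
                     <4 x i32> <i32 4, i32 5, i32 6, i32 0>
=>
  %r = insertelement <4 x float> %v, float %s, i32 3

The mask takes elements 0-2 from %v and element 0 of %i (which is %s),
so the whole shuffle is just an insert of %s into %v at index 3.
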
Differential Revision: https://reviews.llvm.org/D53507
llvm-svn: 345607

The SchedModel allows the addition of ReadAdvances to express that certain
operands of the instructions are needed at a later point than the others.
RegAlloc may add pseudo operands that are not part of the instruction
descriptor, and therefore cannot have any read advance entries. This meant
that in some cases the desired read advance was nullified by such a pseudo
operand, which still had the original latency.
This patch fixes this by making sure that such pseudo operands get a zero
latency during DAG construction.
Review: Matthias Braun, Ulrich Weigand.
https://reviews.llvm.org/D49671
llvm-svn: 345606