bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[AMDGPU] Add and update scalar instructions	Graham Sellers	2018-11-29	4	-34/+214
\| \| \| \| \| \| \| \| \|	This patch adds support for S_ANDN2, S_ORN2 32-bit and 64-bit instructions and adds splits to move them to the vector unit (for which there is no equivalent instruction). It modifies the way that the more complex scalar instructions are lowered to vector instructions by first breaking them down to sequences of simpler scalar instructions which are then lowered through the existing code paths. The pattern for S_XNOR has also been updated to apply inversion to one input rather than the output of the XOR as the result is equivalent and may allow leaving the NOT instruction on the scalar unit. A new tests for NAND, NOR, ANDN2 and ORN2 have been added, and existing tests now hit the new instructions (and have been modified accordingly). Differential: https://reviews.llvm.org/D54714 llvm-svn: 347877
*	Fix: Add support for TFE/LWE in image intrinsic	David Stuttard	2018-11-29	1	-2/+1
\| \| \| \| \| \| \| \|	My change svn-id: 347871 caused a buildbot failure due to an unused variable def (used in an assert). Change-Id: Ia882d18bb6fa79b4d7bbfda422b9ea5d23eab336 llvm-svn: 347876
*	Revert r347823 "[TextAPI] Switch back to a custom Platform enum."	Hans Wennborg	2018-11-29	13	-1396/+0
\| \| \| \| \| \| \| \| \|	It broke the Windows buildbots, e.g. http://lab.llvm.org:8011/builders/llvm-clang-lld-x86_64-scei-ps4-windows10pro-fast/builds/21829/steps/test/logs/stdio This also reverts the follow-ups: r347824, r347827, and r347836. llvm-svn: 347874
*	[CallSiteSplitting] Report edge deletion to DomTreeUpdater	Joseph Tremoulet	2018-11-29	1	-1/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: When splitting musttail calls, the split blocks' original terminators get removed; inform the DTU when this happens. Also add a testcase that fails an assertion in the DTU without this fix. Reviewers: fhahn, junbuml Reviewed By: fhahn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D55027 llvm-svn: 347872
*	Add support for TFE/LWE in image intrinsics	David Stuttard	2018-11-29	13	-57/+564
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	TFE and LWE support requires extra result registers that are written in the event of a failure in order to detect that failure case. The specific use-case that initiated these changes is sparse texture support. This means that if image intrinsics are used with either option turned on, the programmer must ensure that the return type can contain all of the expected results. This can result in redundant registers since the vector size must be a power-of-2. This change takes roughly 6 parts: 1. Modify the instruction defs in tablegen to add new instruction variants that can accomodate the extra return values. 2. Updates to lowerImage in SIISelLowering.cpp to accomodate setting TFE or LWE (where the bulk of the work for these instruction types is now done) 3. Extra verification code to catch cases where intrinsics have been used but insufficient return registers are used. 4. Modification to the adjustWritemask optimisation to account for TFE/LWE being enabled (requires extra registers to be maintained for error return value). 5. An extra pass to zero initialize the error value return - this is because if the error does not occur, the register is not written and thus must be zeroed before use. Also added a new (on by default) option to ensure ALL return values are zero-initialized that is required for sparse texture support. 6. Disable the inst_combine optimization in the presence of tfe/lwe (later TODO for this to re-enable and handle correctly). There's an additional fix now to avoid a dmask=0 For an image intrinsic with tfe where all result channels except tfe were unused, I was getting an image instruction with dmask=0 and only a single vgpr result for tfe. That is incorrect because the hardware assumes there is at least one vgpr result, plus the one for tfe. Fixed by forcing dmask to 1, which gives the desired two vgpr result with tfe in the second one. The TFE or LWE result is returned from the intrinsics using an aggregate type. Look in the test code provided to see how this works, but in essence IR code to invoke the intrinsic looks as follows: %v = call {<4 x float>,i32} @llvm.amdgcn.image.load.1d.v4f32i32.i32(i32 15, i32 %s, <8 x i32> %rsrc, i32 1, i32 0) %v.vec = extractvalue {<4 x float>, i32} %v, 0 %v.err = extractvalue {<4 x float>, i32} %v, 1 Differential revision: https://reviews.llvm.org/D48826 Change-Id: If222bc03642e76cf98059a6bef5d5bffeda38dda llvm-svn: 347871
*	[CVP] tidy processCmp(); NFC	Sanjay Patel	2018-11-29	1	-14/+14
\| \| \| \| \| \| \| \|	1. The variables were confusing: 'C' typically refers to a constant, but here it was the Cmp. 2. Formatting violations. 3. Simplify code to return true/false constant. llvm-svn: 347868
*	Revert "[LICM] Enable control flow hoisting by default" and "[LICM] Reapply ↵	Martin Storsjo	2018-11-29	1	-312/+15
\| \| \| \| \| \| \| \| \| \| \|	r347190 "Make LICM able to hoist phis" with fix" This reverts commits r347776 and r347778. The first one, r347776, caused significant compile time regressions for certain input files, see PR39836 for details. llvm-svn: 347867
*	Revert r347596 "Support for inserting profile-directed cache prefetches"	Hans Wennborg	2018-11-29	6	-395/+1
\| \| \| \| \| \| \| \| \| \| \| \|	It causes asserts building BoringSSL. See https://crbug.com/91009#c3 for repro. This also reverts the follow-ups: Revert r347724 "Do not insert prefetches with unsupported memory operands." Revert r347606 "[X86] Add dependency from X86 to ProfileData after rL347596" Revert r347607 "Add new passes to X86 pipeline tests" llvm-svn: 347864
*	[GlobalISel] Fix insertion of stack-protector epilogue	Petr Pavlu	2018-11-29	1	-8/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	* Tell the StackProtector pass to generate the epilogue instrumentation when GlobalISel is enabled because GISel currently does not implement the same deferred epilogue insertion as SelectionDAG. * Update StackProtector::InsertStackProtectors() to find a stack guard slot by searching for the llvm.stackprotector intrinsic when the prologue was not created by StackProtector itself but the pass still needs to generate the epilogue instrumentation. This fixes a problem when the pass would abort because the stack guard AllocInst pointer was null when generating the epilogue -- test CodeGen/AArch64/GlobalISel/arm64-irtranslator-stackprotect.ll. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347862
*	[GlobalISel] Make EnableGlobalISel always set when GISel is enabled	Petr Pavlu	2018-11-29	2	-14/+20
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Change meaning of TargetOptions::EnableGlobalISel. The flag was previously set only when a target switched on GlobalISel but it is now always set when the GlobalISel pipeline is enabled. This makes the flag consistent with TargetOptions::EnableFastISel and allows its use in other parts of the compiler to determine when GlobalISel is enabled. The EnableGlobalISel flag had previouly only one use in TargetPassConfig::isGlobalISelAbortEnabled(). The method used its value to determine if GlobalISel was enabled by a target and returned false in such a case. To preserve the current behaviour, a new flag TargetOptions::GlobalISelAbort is introduced to separately record the abort behaviour. Differential Revision: https://reviews.llvm.org/D54518 llvm-svn: 347861
*	[llvm-mca][MC] Add the ability to declare which processor resources model ↵	Andrea Di Biagio	2018-11-29	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	load/store queues (PR36666). This patch adds the ability to specify via tablegen which processor resources are load/store queue resources. A new tablegen class named MemoryQueue can be optionally used to mark resources that model load/store queues. Information about the load/store queue is collected at 'CodeGenSchedule' stage, and analyzed by the 'SubtargetEmitter' to initialize two new fields in struct MCExtraProcessorInfo named `LoadQueueID` and `StoreQueueID`. Those two fields are identifiers for buffered resources used to describe the load queue and the store queue. Field `BufferSize` is interpreted as the number of entries in the queue, while the number of units is a throughput indicator (i.e. number of available pickers for loads/stores). At construction time, LSUnit in llvm-mca checks for the presence of extra processor information (i.e. MCExtraProcessorInfo) in the scheduling model. If that information is available, and fields LoadQueueID and StoreQueueID are set to a value different than zero (i.e. the invalid processor resource index), then LSUnit initializes its LoadQueue/StoreQueue based on the BufferSize value declared by the two processor resources. With this patch, we more accurately track dynamic dispatch stalls caused by the lack of LS tokens (i.e. load/store queue full). This is also shown by the differences in two BdVer2 tests. Stalls that were previously classified as generic SCHEDULER FULL stalls, are not correctly classified either as "load queue full" or "store queue full". About the differences in the -scheduler-stats view: those differences are expected, because entries in the load/store queue are not released at instruction issue stage. Instead, those are released at instruction executed stage. This is the main reason why for the modified tests, the load/store queues gets full before PdEx is full. Differential Revision: https://reviews.llvm.org/D54957 llvm-svn: 347857
*	AMDGPU/InsertWaitcnts: Remove the dependence on MachineLoopInfo	Nicolai Haehnle	2018-11-29	1	-469/+256
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: MachineLoopInfo cannot be relied on for correctness, because it cannot properly recognize loops in irreducible control flow which can be introduced by late machine basic block optimization passes. See the new test case for the reduced form of an example that occurred in practice. Use a simple fixpoint iteration instead. In order to facilitate this change, refactor WaitcntBrackets so that it only tracks pending events and registers, rather than also maintaining state that is relevant for the high-level algorithm. Various accessor methods can be removed or made private as a consequence. Affects (in radv): - dEQP-VK.glsl.loops.special.{for,while}_uniform_iterations.select_iteration_count_{fragment,vertex} Fixes: r345719 ("AMDGPU: Rewrite SILowerI1Copies to always stay on SALU") Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54231 llvm-svn: 347853
*	AMDGPU/InsertWaitcnt: Consistently use uint32_t for scores / time points	Nicolai Haehnle	2018-11-29	1	-55/+49
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: There is one obsolete reference to using -1 as an indication of "unknown", but this isn't actually used anywhere. Using unsigned makes robust wrapping checks easier. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, llvm-commits, tpr, t-tye, hakzsam Differential Revision: https://reviews.llvm.org/D54230 llvm-svn: 347852
*	AMDGPU/InsertWaitcnt: Remove unused WaitAtBeginning	Nicolai Haehnle	2018-11-29	1	-27/+2
\| \| \| \| \| \| \| \| \| \|	Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54229 llvm-svn: 347851
*	AMDGPU/InsertWaitcnts: Simplify pending events tracking	Nicolai Haehnle	2018-11-29	1	-191/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Instead of storing the "score" (last time point) of the various relevant events, only store whether an event is pending or not. This is sufficient, because whenever only one event of a count type is pending, its last time point is naturally the upper bound of all time points of this count type, and when multiple event types are pending, the count type has gone out of order and an s_waitcnt to 0 is required to clear any pending event type (and will then clear all pending event types for that count type). This also removes the special handling of GDS_GPR_LOCK and EXP_GPR_LOCK. I do not understand what this special handling ever attempted to achieve. It has existed ever since the original port from an internal code base, so my best guess is that it solved a problem related to EXEC handling in that internal code base. Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54228 llvm-svn: 347850
*	AMDGPU/InsertWaitcnts: Use foreach loops for inst and wait event types	Nicolai Haehnle	2018-11-29	1	-26/+39
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: It hides the type casting ugliness, and I happened to have to add a new such loop (in a later patch). Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54227 llvm-svn: 347849
*	AMDGPU/InsertWaitcnts: Untangle some semi-global state	Nicolai Haehnle	2018-11-29	3	-242/+234
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Reduce the statefulness of the algorithm in two ways: 1. More clearly split generateWaitcntInstBefore into two phases: the first one which determines the required wait, if any, without changing the ScoreBrackets, and the second one which actually inserts the wait and updates the brackets. 2. Communicate pre-existing s_waitcnt instructions using an argument to generateWaitcntInstBefore instead of through the ScoreBrackets. To simplify these changes, a Waitcnt structure is introduced which carries the counts of an s_waitcnt instruction in decoded form. There are some functional changes: 1. The FIXME for the VCCZ bug workaround was implemented: we only wait for SMEM instructions as required instead of waiting on all counters. 2. We now properly track pre-existing waitcnt's in all cases, which leads to less conservative waitcnts being emitted in some cases. s_load_dword ... s_waitcnt lgkmcnt(0) <-- pre-existing wait count ds_read_b32 v0, ... ds_read_b32 v1, ... s_waitcnt lgkmcnt(0) <-- this is too conservative use(v0) more code use(v1) This increases code size a bit, but the reduced latency should still be a win in basically all cases. The worst code size regressions in my shader-db are: WORST REGRESSIONS - Code Size Before After Delta Percentage 1724 1736 12 0.70 % shaders/private/f1-2015/1334.shader_test [0] 2276 2284 8 0.35 % shaders/private/f1-2015/1306.shader_test [0] 4632 4640 8 0.17 % shaders/private/ue4_elemental/62.shader_test [0] 2376 2384 8 0.34 % shaders/private/f1-2015/1308.shader_test [0] 3284 3292 8 0.24 % shaders/private/talos_principle/1955.shader_test [0] Reviewers: msearles, rampitec, scott.linder, kanarayan Subscribers: arsenm, kzhuravl, jvesely, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits, hakzsam Differential Revision: https://reviews.llvm.org/D54226 llvm-svn: 347848
*	Disable TermFolding in LoopSimplifyCFG until PR39783 is fixed	Max Kazantsev	2018-11-29	1	-1/+1
\| \| \| \|	llvm-svn: 347844
*	[LoopStrengthReduce] ComplexityLimit as an option	Sam Parker	2018-11-29	1	-3/+5
\| \| \| \| \| \| \| \|	Convert ComplexityLimit into a command line value. Differential Revision: https://reviews.llvm.org/D54899 llvm-svn: 347843
*	[Inliner] Modify the merging of min-legal-vector-width attribute to better ↵	Craig Topper	2018-11-29	1	-12/+16
\| \| \| \| \| \| \| \| \| \|	handle when the caller or callee don't have the attribute. Lack of an attribute means that the function hasn't been checked for what vector width it requires. So if the caller or the callee doesn't have the attribute we should make sure the combined function after inlining does not have the attribute. If the caller already doesn't have the attribute we can just avoid adding it. Otherwise if the callee doesn't have the attribute just remove the caller's attribute. llvm-svn: 347841
*	[CGP] Improve compile time for complex addressing mode	Serguei Katkov	2018-11-29	1	-106/+58
\| \| \| \| \| \| \| \| \| \| \| \|	This is a fix for PR39625 with improvement the compile time by reducing the number of intermediate Phi nodes created. Reviewers: john.brawn, reames Reviewed By: john.brawn Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54932 llvm-svn: 347839
*	Revert "[TextAPI] Fix a memory leak in the TBD reader."	Juergen Ributzka	2018-11-29	1	-4/+2
\| \| \| \|	llvm-svn: 347838
*	[TextAPI] Fix a memory leak in the TBD reader.	Juergen Ributzka	2018-11-29	1	-2/+4
\| \| \| \| \| \| \|	This fixes an issue where we were leaking the YAML document if there was a parsing error. llvm-svn: 347837
*	[TextAPI] Switch back to a custom Platform enum.	Juergen Ributzka	2018-11-29	3	-25/+26
\| \| \| \| \| \| \| \| \|	Moving to PlatformType from BinaryFormat had some UB fallout when handing unknown platforms or malformed input files. This should fix the sanitizer bots. llvm-svn: 347836
*	[X86] Correct comment. NFC	Craig Topper	2018-11-29	1	-1/+1
\| \| \| \|	llvm-svn: 347835
*	Add Hurd target to LLVMSupport (1/2)	Kristina Brooks	2018-11-29	1	-0/+2
\| \| \| \| \| \| \| \| \| \| \|	Add the required target triples to LLVMSupport to support Hurd in LLVM (formally `pc-hurd-gnu`). Patch by sthibaul (Samuel Thibault) Differential Revision: https://reviews.llvm.org/D54378 llvm-svn: 347832
*	[PowerPC] Fix a conversion is not considered when the ISD::BR_CC node making ↵	Li Jia He	2018-11-29	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the instruction selection Summary: A signed comparison of i1 values produces the opposite result to an unsigned one if the condition code includes less-than or greater-than. This is so because 1 is the most negative signed i1 number and the most positive unsigned i1 number. The CR-logical operations used for such comparisons are non-commutative so for signed comparisons vs. unsigned ones, the input operands just need to be swapped. Reviewed By: steven.zhang Differential Revision: https://reviews.llvm.org/D54825 llvm-svn: 347831
*	NFC. Use unsigned type for uses counter in CaptureTracking	Artur Pilipenko	2018-11-29	1	-2/+2
\| \| \| \|	llvm-svn: 347826
*	[TextAPI] TBD Reader/Writer	Juergen Ributzka	2018-11-29	13	-0/+1395
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Add basic infrastructure for reading and writting TBD files (version 1 - 3). The TextAPI library is not used by anything yet (besides the unit tests). Tool support will be added in a separate commit. The TBD format is currently documented in the implementation file (TextStub.cpp). https://reviews.llvm.org/D53945 Update: This contains changes to fix issues discovered by the bots: - add parentheses to silence warnings. - rename variables - use PlatformType from BinaryFormat llvm-svn: 347823
*	[x86] try select simplification for target-specific nodes	Sanjay Patel	2018-11-28	1	-1/+6
\| \| \| \| \| \| \| \| \| \| \| \| \|	This failed to select (which might be a separate bug) in X86ISelDAGToDAG because we try to create a select node that can be simplified away after rL347227. This change avoids the problem by simplifying the SHRUNKBLEND node sooner. In the test case, we manage to realize that the true/false values of the select (SHRUNKBLEND) are the same thing, so it simplifies away completely. llvm-svn: 347818
*	Revert "[TextAPI] TBD Reader/Writer"	Juergen Ributzka	2018-11-28	13	-1400/+0
\| \| \| \| \| \|	Reverting to unbreak bots. llvm-svn: 347809
*	[TextAPI] TBD Reader/Writer	Juergen Ributzka	2018-11-28	13	-0/+1400
\| \| \| \| \| \| \| \| \| \| \| \| \|	Add basic infrastructure for reading and writting TBD files (version 1 - 3). The TextAPI library is not used by anything yet (besides the unit tests). Tool support will be added in a separate commit. The TBD format is currently documented in the implementation file (TextStub.cpp). https://reviews.llvm.org/D53945 llvm-svn: 347808
*	[DebugInfo] IR/Bitcode changes for DISubprogram flags.	Paul Robinson	2018-11-28	7	-39/+168
\| \| \| \| \| \| \| \| \|	Packing the flags into one bitcode word will save effort in adding new flags in the future. Differential Revision: https://reviews.llvm.org/D54755 llvm-svn: 347806
*	[X86] Make X86TTIImpl::getCastInstrCost properly handle the case where ↵	Craig Topper	2018-11-28	1	-23/+25
\| \| \| \| \| \| \| \| \| \| \| \|	AVX512 is enabled, but 512-bit vectors aren't legal. Unlike most cost model functions this code makes a lot of table lookups without using the results from getTypeLegalizationCost. This means 512-bit vectors can be looked up even when the type isn't legal. This patch adds a check around the two tables that contain 512-bit types to make sure that neither of the types would be split by type legalization. Meaning 512 bit types are illegal. I wanted to write this in a somewhat generic way that uses type legalization query hooks. But if prefered, I can switch to just using is512BitVector and the subtarget feature. Differential Revision: https://reviews.llvm.org/D54984 llvm-svn: 347786
*	[X86] Add some cost model entries for sext/zext for avx512bw	Craig Topper	2018-11-28	1	-0/+27
\| \| \| \| \| \| \| \| \| \|	This fixes some of scalarization costs reported for sext/zext using avx512bw. This does not fix all scalarization costs being reported. Just the worst. I've restricted this only to combinations of types that are legal with avx512bw like v32i1/v64i1/v32i16/v64i8 and conversions between vXi1 and vXi8/vXi16 with legal vXi8/vXi16 result types. Differential Revision: https://reviews.llvm.org/D54979 llvm-svn: 347785
*	[X86] Add a combine for back to back VSRAI instructions	Craig Topper	2018-11-28	1	-0/+11
\| \| \| \| \| \| \| \|	Expansion of SIGN_EXTEND_INREG can create a VSRAI instruction. If there is already a VSRAI after it, we should combine them into a larger VSRAI Differential Revision: https://reviews.llvm.org/D54959 llvm-svn: 347784
*	[DebugInfo] Give inlinable calls DILocs (PR39807)	Jeremy Morse	2018-11-28	1	-8/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In PR39807 we incorrectly handle circumstances where calls are common'd from conditional blocks into the parent BB. Calls that can be inlined must always have DebugLocs, however we strip them during commoning, which the IR verifier asserts on. Fix this by using applyMergedLocation: it will perform the same DebugLoc stripping of conditional Locs, but will also generate an unknown location DebugLoc that satisfies the requirement for inlinable calls to always have locations. Some of the prior logic for selecting a DebugLoc is now likely redundant; I'll generate a follow-up to remove it (involves editing more regression tests). Differential Revision: https://reviews.llvm.org/D54997 llvm-svn: 347782
*	[LICM] Enable control flow hoisting by default	John Brawn	2018-11-28	1	-1/+1
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D54949 llvm-svn: 347778
*	[LICM] Reapply r347190 "Make LICM able to hoist phis" with fix	John Brawn	2018-11-28	1	-15/+312
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit caused failures because it failed to correctly handle cases where we hoist a phi, then hoist a use of that phi, then have to rehoist that use. We need to make sure that we rehoist the use to _after_ the hoisted phi, which we do by always rehoisting to the immediate dominator instead of just rehoisting everything to the original preheader. An option is also added to control whether control flow is hoisted, which is off in this commit but will be turned on in a subsequent commit. Differential Revision: https://reviews.llvm.org/D52827 llvm-svn: 347776
*	[RISCV] Support .option push and .option pop	Alex Bradbury	2018-11-28	5	-1/+62
\| \| \| \| \| \| \| \| \|	This adds support in the RISCVAsmParser the storing of Subtarget feature bits to a stack so that they can be pushed/popped to enable/disable multiple features at once. Differential Revision: https://reviews.llvm.org/D46424 Patch by Lewis Revill. llvm-svn: 347774
*	[InstCombine] Combine saturating add/sub with constant operands	Nikita Popov	2018-11-28	1	-0/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Combine sat(sat(X + C1) + C2) -> sat(X + (C1+C2)) and sat(sat(X - C1) - C2) -> sat(X - (C1+C2)) if the sign of C1 and C2 matches. In the unsigned case we can compute C1+C2 with saturating arithmetic, and InstSimplify will reduce this just to the saturation value. For the signed case, we cannot perform the simplification if the result of the addition overflows. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347773
*	[InstCombine] Canonicalize ssub.sat to sadd.sat	Nikita Popov	2018-11-28	1	-0/+11
\| \| \| \| \| \| \| \| \|	Canonicalize ssub.sat(X, C) to ssub.sat(X, -C) if C is constant and not signed minimum. This will help further optimizations to apply. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347772
*	[ValueTracking] Determine always-overflow condition for unsigned sub	Nikita Popov	2018-11-28	1	-6/+11
\| \| \| \| \| \| \| \| \| \| \| \|	Always-overflow was already determined for unsigned addition, but not subtraction. This patch establishes parity. This allows us to perform some additional simplifications for signed saturating subtractions. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347771
*	[InstCombine] Use known overflow information for saturating add/sub	Nikita Popov	2018-11-28	1	-0/+38
\| \| \| \| \| \| \| \| \| \| \| \| \|	If ValueTracking can determine that the add/sub can newer overflow, replace it with the corresponding nuw/nsw add/sub. Additionally, for the unsigned case, if ValueTracking determines that the add/sub always overflows, replace the result with the saturation value. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347770
*	[InstCombine] Canonicalize const arg for saturating adds	Nikita Popov	2018-11-28	1	-0/+6
\| \| \| \| \| \| \| \| \|	If a saturating add intrinsic has one constant argument, make sure it is on the RHS. This will simplify further transformations. This change is part of https://reviews.llvm.org/D54534. llvm-svn: 347769
*	[Hexagon] Add missing flags to ELF YAMLIO	Krzysztof Parzyszek	2018-11-28	1	-0/+8
\| \| \| \|	llvm-svn: 347768
*	[ThinLTO] Correct linkonce_any function import linkage. NFC.	Xin Tong	2018-11-28	1	-5/+6
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This is a NFC as we do not import non-odr vague linkage when computing for import list for a module. Reviewers: tejohnson, pcc Subscribers: inglorion, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D54928 llvm-svn: 347763
*	Fix build error due to missing cctype include	David Spickett	2018-11-28	2	-1/+1
\| \| \| \| \| \|	in ARMTargetParser.cpp. llvm-svn: 347762
*	[SLP]Fix PR39774: Set ReductionRoot if the original instruction is vectorized.	Alexey Bataev	2018-11-28	1	-5/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If the original reduction root instruction was vectorized, it might be removed from the tree. It means that the insertion point may become invalidated and the whole vectorization of the reduction leads to the incorrect output result. The ReductionRoot instruction must be marked as externally used so it could not be removed. Otherwise it might cause inconsistency with the cost model and we may end up with too optimistic optimization. Reviewers: RKSimon, spatel, hfinkel, mkuper Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D54955 llvm-svn: 347759
*	[MachineScheduler] Add support for clustering mem ops with FI base operands	Francis Visoiu Mistrih	2018-11-28	2	-25/+91
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Before this patch, the following stores in `merge_fail` would fail to be merged, while they would get merged in `merge_ok`: ``` void use(unsigned long long ); void merge_fail(unsigned key, unsigned index) { unsigned long long args[8]; args[0] = key; args[1] = index; use(args); } void merge_ok(unsigned long long dst, unsigned a, unsigned b) { dst[0] = a; dst[1] = b; } ``` The reason is that `getMemOpBaseImmOfs` would return false for FI base operands. This adds support for this. Differential Revision: https://reviews.llvm.org/D54847 llvm-svn: 347747