bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[DAGCombine] Don't combine sext with extload if sextload is not supported ↵	Guozhi Wei	2017-10-27	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	and extload has multi users In function DAGCombiner::visitSIGN_EXTEND_INREG, sext can be combined with extload even if sextload is not supported by the target. If sext is the only user of extload, this makes no big difference: no harm, no benefit. If extload has more than one user, the combined sextload may block extload from combining with other zexts, causing extra zext instructions to be generated, as demonstrated by the attached test case. This patch adds the constraint that when sextload is not supported by the target, sext can only be combined with extload if it is the only user of extload. Differential Revision: https://reviews.llvm.org/D39108 llvm-svn: 316802
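The constraint described above can be sketched as a small predicate. This is a hypothetical Python model, not the DAGCombiner code: the function name and both parameters are illustrative, and the case where sextload is legal is simply modelled as "allowed".

```python
def can_combine_sext_with_extload(sextload_legal: bool, extload_use_count: int) -> bool:
    """Hypothetical model of the constraint in the commit message.

    sextload_legal: whether the target supports sextload (assumed input).
    extload_use_count: number of users of the extload node (assumed input).
    """
    if sextload_legal:
        # Modelled as always allowed; the real combiner has other conditions.
        return True
    # Without a legal sextload, combine only when sext is the sole user,
    # so other users (for example zext) can still combine with the extload.
    return extload_use_count == 1


if __name__ == "__main__":
    for legal in (True, False):
        for uses in (1, 2):
            print(legal, uses, can_combine_sext_with_extload(legal, uses))
```

The only combination that is rejected is an unsupported sextload whose extload has more than one user.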
*	Handle undefined weak hidden symbols on all architectures.	Rafael Espindola	2017-10-27	2	-2/+15
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	We were handling the non-hidden case in lib/Target/TargetMachine.cpp, but the hidden case was handled in architecture dependent code and only X86_64 and AArch64 were covered. While it is true that some code sequences in some ABIs might be able to produce the correct value at runtime, that doesn't seem to be the common case. I left the AArch64 code in place since it also forces a got access for non-pic code. It is not clear if that is needed, but it is probably better to change that in another commit. llvm-svn: 316799
*	[X86] Add fast-isel tests for integer shifts. We definitely had no coverage ↵	Craig Topper	2017-10-27	1	-0/+383
\| \| \| \| \| \|	of i16, and i32/i64 are only tested by larger tests. llvm-svn: 316796
*	Improve clamp recognition in ValueTracking.	Artur Gainullin	2017-10-27	1	-0/+9
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: ValueTracking was not recognizing all variations of clamp. Swapping of the true and false values of select was added to fix this problem. The first patch was reverted because it caused a miscompile in the NVPTX target. Added corresponding test cases. Reviewers: spatel, majnemer, efriedma, reames Subscribers: llvm-commits, jholewinski Differential Revision: https://reviews.llvm.org/D39240 llvm-svn: 316795
*	[X86] Add avx512vl command line to fast-isel-nontemporal.ll	Craig Topper	2017-10-27	1	-0/+27
\| \| \| \|	llvm-svn: 316789
*	[Hexagon] Fix an incorrect assertion in HexagonConstExtenders.cpp	Krzysztof Parzyszek	2017-10-27	1	-0/+45
\| \| \| \| \| \| \|	Making sure that an instruction has fewer operands than required, then attempting to access one out of range is going to fail. llvm-svn: 316785
*	[X86][SSE] Add tests for inserting all-bits (-1) into a vector	Simon Pilgrim	2017-10-27	1	-0/+504
\| \| \| \| \| \|	We should be able to do this by re-materializing an all-bits vector and then blending with it llvm-svn: 316779
*	[CodeGen][ExpandMemCmp][NFC] Simplify load sequence generation.	Clement Courbet	2017-10-27	1	-53/+82
\| \| \| \|	llvm-svn: 316763
*	DAG: Fold fma (fneg x), K, y -> fma x, -K, y	Matt Arsenault	2017-10-27	3	-4/+50
\| \| \| \|	llvm-svn: 316753
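A quick numeric sanity check of the fold in the subject line. This is only a sketch: Python's unfused multiply-add stands in for fma, and the sample values are arbitrary. It relies on IEEE negation being exact, so negating `x` or negating `K` yields the same product.

```python
import itertools


def fma_neg_x(x: float, k: float, y: float) -> float:
    # Models fma (fneg x), K, y with an unfused multiply-add.
    return (-x) * k + y


def fma_neg_k(x: float, k: float, y: float) -> float:
    # Models fma x, -K, y with an unfused multiply-add.
    return x * (-k) + y


# Arbitrary sample values chosen for illustration only.
SAMPLES = [0.0, 1.0, -2.5, 3.141592653589793, 1e-300, 1e300]

if __name__ == "__main__":
    for x, k, y in itertools.product(SAMPLES, repeat=3):
        assert fma_neg_x(x, k, y) == fma_neg_k(x, k, y)
    print("both forms agree on all samples")
```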
*	[CodeGen][ExpandMemcmp][NFC] Make tests more complete.	Clement Courbet	2017-10-27	1	-0/+14
\| \| \| \|	llvm-svn: 316749
*	Add subclass data to the FoldingSetNode for MemIntrinsicSDNodes.	Sean Fertile	2017-10-27	1	-0/+37
\| \| \| \| \| \| \| \| \| \| \|	Not having the subclass data on a MemIntrinsicSDNode meant it was possible to try to fold 2 nodes with the same operands but differing MMO flags. This would trip an assertion when trying to refine the alignment between the 2 MachineMemOperands. Differential Revision: https://reviews.llvm.org/D38898 llvm-svn: 316737
*	[ARM] Honor -mfloat-abi for libcall calling convention	Eli Friedman	2017-10-26	1	-22/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As far as I can tell, this matches gcc: -mfloat-abi determines the calling convention for all functions except those explicitly defined as soft-float in the ARM RTABI. This change only affects cases where the user specifies -mfloat-abi to override the default calling convention derived from the target triple. Fixes https://bugs.llvm.org//show_bug.cgi?id=34530. Differential Revision: https://reviews.llvm.org/D38299 llvm-svn: 316708
*	[X86] Improve handling of UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG to support ↵	Craig Topper	2017-10-26	1	-3/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	64-bit extensions. If the extend type is 64 bits, emit a 32-bit -> 64-bit extend after the UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG operation. This gives a shorter encoding for the second extend in the sext case, and allows us to completely remove the second extend in the zext case. This also adds known bits and num sign bits support for UDIVREM8_ZEXT_HREG/SDIVREM8_SEXT_HREG. Differential Revision: https://reviews.llvm.org/D38275 llvm-svn: 316702
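Splitting the 64-bit extend into two steps is value-preserving. The sketch below checks that for every 8-bit pattern, using hypothetical helper names; it models only the arithmetic, not the instruction selection. In the zext case the second step changes nothing, which is consistent with the commit's remark that the second extend can be removed.

```python
def sign_extend(value: int, from_bits: int, to_bits: int) -> int:
    """Sign-extend the low `from_bits` of `value` to a `to_bits`-wide pattern."""
    low_mask = (1 << from_bits) - 1
    value &= low_mask
    if value & (1 << (from_bits - 1)):
        # Sign bit set: fill bits from_bits..to_bits-1 with ones.
        value |= ((1 << to_bits) - 1) & ~low_mask
    return value


def zero_extend(value: int, from_bits: int) -> int:
    """Zero-extend: keep the low `from_bits`, upper bits are zero."""
    return value & ((1 << from_bits) - 1)


if __name__ == "__main__":
    for v in range(256):
        # 8 -> 32 -> 64 matches a direct 8 -> 64 sign extension.
        assert sign_extend(sign_extend(v, 8, 32), 32, 64) == sign_extend(v, 8, 64)
        # After 8 -> 32 zero extension, 32 -> 64 zero extension is a no-op.
        assert zero_extend(zero_extend(v, 8), 32) == zero_extend(v, 8)
    print(hex(sign_extend(0x80, 8, 64)))
```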
*	[x86] use an insert op to put one variable element into a constant of vectors	Sanjay Patel	2017-10-26	2	-552/+144
\| \| \| \| \| \| \| \|	Instead of loading (a potential ton of) scalar constants, load those as a vector and then insert into it. Differential Revision: https://reviews.llvm.org/D38756 llvm-svn: 316685
*	AMDGPU: Commit missing fence-barrier test	Konstantin Zhuravlyov	2017-10-26	1	-0/+197
\| \| \| \| \| \|	This should have been committed with memory model implementation llvm-svn: 316680
*	Represent runtime preemption in the IR.	Sean Fertile	2017-10-26	4	-0/+1013
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Currently we do not represent runtime preemption in the IR, which has several drawbacks: 1) The semantics of GlobalValues differ depending on the object file format you are targeting (as well as the relocation-model and -fPIE value). 2) We have no way of disabling inlining of run time interposable functions, since in the IR we only know if a function is link-time interposable. Because of this llvm cannot support elf-interposition semantics. 3) In LTO builds of executables we will have extra knowledge that a symbol resolved to a local definition and can't be preemptable, but have no way to propagate that knowledge through the compiler. This patch adds preemptability specifiers to the IR with the following meaning: dso_local --> means the compiler may assume the symbol will resolve to a definition within the current linkage unit and the symbol may be accessed directly even if the definition is not within this compilation unit. dso_preemptable --> means that the compiler must assume the GlobalValue may be replaced with a definition from outside the current linkage unit at runtime. To ease transitioning dso_preemptable is treated as a 'default' in that low-level codegen will still do the same checks it did previously to see if a symbol should be accessed indirectly. Eventually when IR producers emit the specifiers on all Globalvalues we can change dso_preemptable to mean 'always access indirectly', and remove the current logic. Differential Revision: https://reviews.llvm.org/D20217 llvm-svn: 316668
*	AMDGPU: Handle s_buffer_load_dword hazard on SI	Marek Olsak	2017-10-26	1	-0/+17
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D39171 llvm-svn: 316666
*	[mips] Fix PR35071	Simon Dardis	2017-10-26	1	-0/+73
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	PR35071 exposed the fact that MipsInstrInfo::removeBranch did not walk past debug instructions when removing branches for the control flow optimizer, which led to duplicated conditional branches. If the target of the branch was a removable block, only the conditional branch in the terminating position would have its MBB operands updated, leaving the first branch with a dangling MBB operand. The MIPS long branch pass would then trigger an assertion when attempting to examine the instruction with the dangling MBB operand. This resolves PR35071. Thanks to Alex Richardson for reporting the issue! Reviewers: atanasyan Differential Revision: https://reviews.llvm.org/D39288 llvm-svn: 316654
*	[PowerPC] Use record-form instruction for Less-or-Equal -1 and ↵	Hiroshi Inoue	2017-10-26	1	-0/+36
\| \| \| \| \| \| \| \| \| \| \|	Greater-or-Equal 1 Currently a record-form instruction is used for comparison of "greater than -1" and "less than 1" by modifying the predicate (e.g. LT 1 into LE 0) in addition to the naive case of comparison against 0. This patch also enables emitting a record-form instruction for "less than or equal to -1" (i.e. "less than 0") and "greater than or equal to 1" (i.e. "greater than 0") to increase the optimization opportunities. Differential Revision: https://reviews.llvm.org/D38941 llvm-svn: 316647
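The predicate rewrites mentioned above hold because the operands are integers. A minimal sketch that checks each equivalence over a small range (the range itself is an arbitrary choice for illustration):

```python
# Each pair is (original comparison, equivalent comparison against 0).
EQUIVALENCES = {
    "greater than -1 == greater or equal 0": (lambda x: x > -1, lambda x: x >= 0),
    "less than 1 == less or equal 0": (lambda x: x < 1, lambda x: x <= 0),
    "less or equal -1 == less than 0": (lambda x: x <= -1, lambda x: x < 0),
    "greater or equal 1 == greater than 0": (lambda x: x >= 1, lambda x: x > 0),
}


def all_equivalent(lo: int = -128, hi: int = 127) -> bool:
    """Return True if every pair agrees on all integers in [lo, hi]."""
    return all(
        original(x) == rewritten(x)
        for original, rewritten in EQUIVALENCES.values()
        for x in range(lo, hi + 1)
    )


if __name__ == "__main__":
    print(all_equivalent())
```

The last two entries are the cases this patch adds; the first two are the ones the message says were already handled.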
*	Fix CodeGen/AMDGPU/fcanonicalize-elimination.ll on FreeBSD 11.0	Alexander Richardson	2017-10-25	1	-0/+4
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: On FreeBSD11.0 the FileCheck NOT string "1.0" will be matched by `.amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802"` at the end of the file. Add a CHECK for that directive to avoid failing the test. Reviewers: rampitec, kzhuravl Reviewed By: rampitec, kzhuravl Subscribers: emaste, kzhuravl, wdng, nhaehnle, yaxunl, dstuttard, tpr, t-tye, llvm-commits, krytarowski Differential Revision: https://reviews.llvm.org/D39306 llvm-svn: 316616
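The failure mode is plain substring matching. The sketch below is a deliberately simplified model of a NOT check, not FileCheck itself: it assumes the forbidden string is searched for up to the next positive match, and the function name and the first line of the sample output are made up for illustration.

```python
from typing import Optional


def not_check_fires(output: str, forbidden: str, next_check: Optional[str] = None) -> bool:
    """Return True if `forbidden` occurs in the region a NOT check would scan.

    Simplified model: the region runs from the start of `output` to the first
    match of `next_check`, or to the end of `output` when there is none.
    """
    end = len(output)
    if next_check is not None:
        pos = output.find(next_check)
        if pos != -1:
            end = pos
    return forbidden in output[:end]


# Sample output: the first line is invented, the directive is from the commit message.
OUTPUT = 'v_mov_b32 v0, v1\n.amd_amdgpu_isa "amdgcn-unknown-freebsd11.0--gfx802"\n'

if __name__ == "__main__":
    # Without a positive check for the directive, "1.0" matches inside "freebsd11.0".
    print(not_check_fires(OUTPUT, "1.0"))
    # With a positive check for the directive, the scanned region ends before it.
    print(not_check_fires(OUTPUT, "1.0", next_check=".amd_amdgpu_isa"))
```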
*	[Hexagon] Account for negative offset when limiting max deviation	Krzysztof Parzyszek	2017-10-25	1	-0/+43
\| \| \| \| \| \| \| \| \| \| \| \| \|	In getOffsetRange, Max can be set to 0 to force the extender replacement to be at or below the original value. This would cause the new offset to be non-negative, which is preferred for memory instructions (to reduce the likelihood of it getting constant-extended due to predication). The problem happens when the range is shifted by an offset (present in the instruction being examined) and the offset is negative. The entire range for the allowable deviation will then be strictly negative. This creates a problem, since 0 is assumed to be a valid deviation. llvm-svn: 316601
*	AMDGPU: Cleanup memory legalizer load/store tests	Konstantin Zhuravlyov	2017-10-25	4	-378/+375
\| \| \| \|	llvm-svn: 316590
*	AMDGPU/NFC: Rename memory legalizer tests:	Konstantin Zhuravlyov	2017-10-25	2	-0/+0
\| \| \| \| \| \| \|	- memory-legalizer-atomic-load.ll -> memory-legalizer-load.ll - memory-legalizer-atomic-store.ll -> memory-legalizer-store.ll llvm-svn: 316586
*	[inlineasm] Fix crash when number of matched input constraint operands ↵	Daniil Fukalov	2017-10-25	1	-0/+12
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	overflows signed char When the number of output constraint operands that have matched input operands doesn't fit into a signed char, TargetLowering::ParseConstraints() can try to access ConstraintOperands (which is a std::vector) with a negative index. Reviewers: rampitec, arsenm Differential Review: https://reviews.llvm.org/D39125 llvm-svn: 316574
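To illustrate how a count that does not fit in a signed char becomes a negative index, here is a small Python model of 8-bit two's-complement wraparound. The helper names are hypothetical; the real code is C++, where indexing a std::vector with such a value is out of bounds rather than a wraparound from the end as in Python.

```python
def as_signed_char(n: int) -> int:
    """Model storing `n` in an 8-bit two's-complement signed char."""
    return ((n + 128) % 256) - 128


def checked_index(operands: list, index: int):
    """Reject indices a vector could not serve, instead of using them blindly."""
    if not 0 <= index < len(operands):
        raise IndexError(f"operand index {index} out of range")
    return operands[index]


if __name__ == "__main__":
    operands = list(range(200))
    print(as_signed_char(127))   # still fits
    print(as_signed_char(128))   # wraps to a negative value
    try:
        checked_index(operands, as_signed_char(128))
    except IndexError as err:
        print("rejected:", err)
```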
*	[ARM GlobalISel] Remove redundant testcases. NFC	Diana Picus	2017-10-25	1	-53/+0
\| \| \| \| \| \| \| \|	Remove the G_FADD testcases from arm-legalizer.mir, they are covered by arm-legalizer-fp.mir (I probably forgot to delete them when I created that test). llvm-svn: 316573
*	[ARM GlobalISel] Update test after r316479. NFC	Diana Picus	2017-10-25	1	-58/+11
\| \| \| \| \| \| \|	No need to check register classes in the register block anymore, since we can now much more conveniently check them at their def. llvm-svn: 316572
*	[ARM GlobalISel] Fix call opcodes	Diana Picus	2017-10-25	7	-159/+163
\| \| \| \| \| \| \| \|	We were generating BLX for all the calls, which was incorrect in most cases. Update ARMCallLowering to generate BL for direct calls, and BLX, BX_CALL or BMOVPCRX_CALL for indirect calls. llvm-svn: 316570
*	[ARM GlobalISel] Split test into 3. NFC	Diana Picus	2017-10-25	3	-499/+502
\| \| \| \| \| \| \| \| \| \| \|	Separate the test cases that deal with calls from the rest of the IR Translator tests. We split into 2 different files, one for testing parameter and result lowering, and one for testing the various different kinds of calls that can occur (BL, BLX, BX_CALL etc). llvm-svn: 316569
*	Re-land "[CodeGen][ExpandMemcmp][NFC] Allow memcmp to expand to vector loads ↵	Clement Courbet	2017-10-25	1	-0/+7
\| \| \| \| \| \| \| \| \| \| \| \| \|	(1)" Compute the actual decomposition only after deciding whether to expand or not. Otherwise, it's easy to make the compiler OOM with: `memcpy(dst, src, 0xffffffffffffffff);`, which typically happens if someone mistakenly passes a negative value. Add a test. This reverts commit f8fc02fbd4ab33383c010d33675acf9763d0bd44. llvm-svn: 316567
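The ordering described above (decide first, decompose later) can be sketched as follows. This is a hypothetical model and not the ExpandMemCmp code: the function names, the greedy power-of-two decomposition, and the `max_loads` limit are assumptions made for illustration, and `max_load_size` is assumed to be a power of two.

```python
from typing import List, Optional, Tuple


def num_loads_needed(size: int, max_load_size: int) -> int:
    """Count loads arithmetically, without building the list of loads."""
    count = 0
    load = max_load_size
    while size > 0 and load >= 1:
        count += size // load
        size %= load
        load //= 2
    return count


def expand(size: int, max_load_size: int, max_loads: int) -> Optional[List[Tuple[int, int]]]:
    """Return (offset, load_size) pairs, or None when expansion is rejected."""
    # Decide first: this costs a handful of divisions even for a huge size.
    if num_loads_needed(size, max_load_size) > max_loads:
        return None
    # Only now materialize the decomposition, which is known to be small.
    loads, offset, load = [], 0, max_load_size
    while size > 0:
        while load > size:
            load //= 2
        loads.append((offset, load))
        offset += load
        size -= load
    return loads


if __name__ == "__main__":
    print(expand(15, 8, 4))
    print(expand(0xFFFFFFFFFFFFFFFF, 8, 4))
```

Building the list before the check would need on the order of 2^61 entries for the second call, which is the out-of-memory scenario the message describes.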
*	[ARM] Swap cmp operands for automatic shifts	Sam Parker	2017-10-25	2	-52/+154
\| \| \| \| \| \| \| \| \| \|	Swap the compare operands if the lhs is a shift and the rhs isn't, as in arm and T2 the shift can be performed by the compare for its second operand. Differential Revision: https://reviews.llvm.org/D39004 llvm-svn: 316562
*	[AArch64] Add support for dllimport of values and functions	Martin Storsjo	2017-10-25	1	-0/+54
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Previously, the dllimport attribute did the right thing in terms of treating it as a pointer to a value, but this makes sure the names get mangled properly, and calls to such functions load the function from the __imp_ pointer. This is based on SVN r212431 and r212430 where the same was implemented for ARM. Differential Revision: https://reviews.llvm.org/D38530 llvm-svn: 316555
*	DAG: Fix creating select with wrong condition type	Matt Arsenault	2017-10-25	2	-29/+62
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This code added in r297930 assumed that it could create a select with a condition type that is just an integer bitcast of the selected type. For AMDGPU any vselect is going to be scalarized (although the vector types are legal), and all select conditions must be i1 (the same as getSetCCResultType). This logic doesn't really make sense to me, but there's never really been a consistent policy in what the select condition mask type is supposed to be. Try to extend the logic for skipping the transform for condition types that aren't setccs. It doesn't seem quite right to me though, but checking conditions that seem more sensible (like whether the vselect is going to be expanded) doesn't work since this seems to depend on that also. llvm-svn: 316554
*	[NVPTX] allow address space inference for volatile loads/stores.	Artem Belevich	2017-10-24	1	-0/+97
\| \| \| \| \| \| \| \| \| \|	If a particular target supports volatile memory access operations, we can avoid AS casting to generic AS. Currently it's only enabled in NVPTX for loads and stores that access global & shared AS. Differential Revision: https://reviews.llvm.org/D39026 llvm-svn: 316495
*	[X86][Broadwell] Added the instruction scheduling information for the ↵	Gadi Haber	2017-10-24	19	-1372/+1372
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Broadwell CPU. Adding the scheduling information for the Broadwell (BDW) CPU target. This patch adds the instruction scheduling information for the Broadwell (BDW) architecture target by adding the file X86SchedBroadwell.td located under the X86 Target. We used the scheduling information retrieved from the Broadwell architects in order to create the file. The scheduling information includes latency, number of micro-Ops and ports used by each BDW instruction. The patch continues the scheduling replacement and insertion effort started with the SandyBridge (SNB) target in r310792, the Haswell (HSW) target in r311879, the SkylakeClient (SKL) target in rL313613 + rL315978 and the SkylakeServer (SKX) in rL315175. Performance fluctuations may be expected due to code alignment effects. Reviewers: zvi, RKSimon, craig.topper Differential Revision: https://reviews.llvm.org/D39054 Change-Id: If6f799e5ff60e1091c8d43b05ea78c53581bae01 llvm-svn: 316492
*	MIR: Print the register class or bank in vreg defs	Justin Bogner	2017-10-24	245	-6937/+5380
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This updates the MIRPrinter to include the regclass when printing virtual register defs, which is already valid syntax for the parser. That is, given 64 bit %0 and %1 in a "gpr" regbank, %1(s64) = COPY %0(s64) would now be written as %1:gpr(s64) = COPY %0(s64) While this change alone introduces a bit of redundancy with the registers block, it allows us to update the tests to be more concise and understandable and brings us closer to being able to remove the registers block completely. Note: We generally only print the class in defs, but there is one exception. If there are uses without any defs whatsoever, we'll print the class on all uses. I'm not completely convinced this comes up in meaningful machine IR, but for now the MIRParser and MachineVerifier both accept that kind of stuff, so we don't want to have a situation where we can print something we can't parse. llvm-svn: 316479
*	[PowerPC] Try to simplify a Swap if it feeds a Splat	Stefan Pintilie	2017-10-24	2	-2/+136
\| \| \| \| \| \| \| \| \| \| \| \|	If we have the situation where a Swap feeds a Splat we can sometimes change the index on the Splat and then remove the Swap instruction. Fixed the test case that was failing and recommit after pulling the original commit. Original revision is here: https://reviews.llvm.org/D39009 llvm-svn: 316478
*	[X86][AVX] ComputeNumSignBitsForTargetNode - add support for X86ISD::VTRUNC	Simon Pilgrim	2017-10-24	2	-15/+5
\| \| \| \|	llvm-svn: 316462
*	[SelectionDAG] Add VSELECT support to ComputeNumSignBits	Simon Pilgrim	2017-10-24	1	-2/+2
\| \| \| \|	llvm-svn: 316457
*	[X86] truncateVectorCompareWithPACKSS - use PACKSSDW/PACKSSWB instead of ↵	Simon Pilgrim	2017-10-24	12	-287/+283
\| \| \| \| \| \| \| \|	just PACKSSWB. By using the widest type possible for PACKSS truncation we have a better chance of being able to peek through bitcasts and improves other combines driven by ComputeNumSignBits. llvm-svn: 316448
*	[x86] add more vector ISA variants for memcmp expansion; NFC	Sanjay Patel	2017-10-24	1	-4/+62
\| \| \| \| \| \|	...because every swiss cheese has different holes. llvm-svn: 316446
*	Update f16c instruction scheduling on btver2.	Andrew V. Tischenko	2017-10-24	1	-33/+33
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D39051 llvm-svn: 316435
*	X86CallFrameOptimization: Recognize 'store 0/-1 using and/or' idioms	Zvi Rackover	2017-10-24	3	-177/+259
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: r264440 added or/and patterns for storing -1 or 0 with the intention of decreasing code size. However, X86CallFrameOptimization does not recognize these memory accesses, so it will not replace them with pushes when profitable. This patch fixes this problem by teaching X86CallFrameOptimization these store 0/-1 idioms. An alternative fix would be to prevent the 'store 0/-1 idioms' patterns from firing when accessing the stack. This would save the need to teach the pass about these idioms. However, because X86CallFrameOptimization does not always fire, we may end up with cases where neither X86CallFrameOptimization nor the patterns for 'store 0/-1 idioms' fire. Fixes pr34863 Reviewers: DavidKreitzer, guyblank, aymanmus Reviewed By: aymanmus Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38738 llvm-svn: 316431
*	AMDGPU: Add new intrinsic llvm.amdgcn.kill(i1)	Marek Olsak	2017-10-24	2	-1/+242
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Kill the thread if operand 0 == false. llvm.amdgcn.wqm.vote can be applied to the operand. Also allow kill in all shader stages. Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D38544 llvm-svn: 316427
*	AMDGPU: Add llvm.amdgcn.wqm.vote intrinsic	Marek Olsak	2017-10-24	1	-0/+52
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, t-tye Differential Revision: https://reviews.llvm.org/D38543 llvm-svn: 316426
*	X86: Fix X86CallFrameOptimization to search for the COPY StackPointer	Zvi Rackover	2017-10-24	3	-41/+71
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	SelectionDAG inserts a copy of ESP into a virtual register. X86CallFrameOptimization assumed that the COPY, if present, is always right after the call-frame setup instruction (ADJCALLSTACKDOWN). This was a wrong assumption, as the COPY can be located anywhere between the call-frame setup instruction and its first use. If the COPY happened to be located in a different location than what X86CallFrameOptimization assumed, visiting it while processing the call chain would lead to a conservative bail-out. The fix is quite straightforward: scan ahead for the stack-pointer copy and make note of it so it can be ignored while processing the call chain. Fixes pr34903 Differential Revision: https://reviews.llvm.org/D38730 llvm-svn: 316416
*	[MC] Adding code padding for performance stability - infrastructure. NFC.	Omer Paparo Bivas	2017-10-24	1	-1/+5
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Infrastructure designed for padding code with nop instructions in key places such that a performance improvement will be achieved. The infrastructure is implemented such that the padding is done in the Assembler after the layout is done and all IPs and alignments are known. This patch by itself is NFC. Future patches will make use of this infrastructure to implement required policies for code padding. Reviewers: aaboud zvi craig.topper gadi.haber Differential revision: https://reviews.llvm.org/D34393 Change-Id: I92110d0c0a757080a8405636914a93ef6f8ad00e llvm-svn: 316413
*	X86: Register the X86CallFrameOptimization pass	Zvi Rackover	2017-10-24	1	-0/+125
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The motivation of this change is to enable .mir testing for this pass. Added one test case to cover the functionality, this same case will be improved by a future patch. Reviewers: igorb, guyblank, DavidKreitzer Reviewed By: guyblank, DavidKreitzer Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D38729 llvm-svn: 316412
*	[MachineOutliner] Add optimisation remarks for successful outlining	Jessica Paquette	2017-10-23	1	-9/+59
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit adds optimisation remarks for outlining which fire when a function is successfully outlined. To do this, OutlinedFunctions must now contain references to their Candidates. Since the Candidates must still be sorted and worked on separately, this is done by working on everything in terms of shared_ptrs to Candidates. This is good; it means that we can easily move everything to outlining in terms of the OutlinedFunctions rather than the individual Candidates. This is far more intuitive than what's currently there! (Remarks are output when a function is created for some group of Candidates. In a later commit, all of the outlining logic should be rewritten so that we loop over OutlinedFunctions rather than over Candidates.) llvm-svn: 316396
*	[GISel][ARM]: Fix illegal Generic copies in tests	Aditya Nandakumar	2017-10-23	4	-227/+382
\| \| \| \| \| \| \|	This is in preparation for a verifier check that makes sure copies are of the same size (when generic virtual registers are involved). llvm-svn: 316388
*	[GISel][AArch64]: Fix illegal Generic copies in tests	Aditya Nandakumar	2017-10-23	16	-77/+168
\| \| \| \| \| \| \|	This is in preparation for a verifier check that makes sure copies are of the same size (when generic virtual registers are involved). llvm-svn: 316387