bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[Mips][FastISel] Fix handling of icmp with i1 type	Petar Jovanovic	2018-07-17	2	-2/+15
\| \| \| \| \| \| \| \| \| \| \|	The Mips FastISel back-end does not extend i1 values while lowering icmp. Ensure that we bail into DAG ISel when handling this case. Patch by Dragan Mladjenovic. Differential Revision: https://reviews.llvm.org/D49290 llvm-svn: 337288
*	More fixes for subreg join failure in RegCoalescer	Tim Renouf	2018-07-17	1	-0/+319
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Part of the adjustCopiesBackFrom method wasn't correctly dealing with SubRange intervals when updating. 2 changes. The first to ensure that bogus SubRange Segments aren't propagated when encountering Segments of the form [1234r, 1234d:0) when preparing to merge value numbers. These can be removed in this case. The second forces a shrinkToUses call if SubRanges end on the copy index (instead of just the parent register). V2: Addressed review comments, plus MIR test instead of ll test Subscribers: MatzeB, qcolombet, nhaehnle Differential Revision: https://reviews.llvm.org/D40308 Change-Id: I1d2b2b4beea802fce11da01edf71feb2064aab05 llvm-svn: 337273
*	[DAGCombiner] Call SimplifyDemandedVectorElts from EXTRACT_VECTOR_ELT	Simon Pilgrim	2018-07-17	9	-592/+317
\| \| \| \| \| \| \| \|	If we are only extracting vector elements via EXTRACT_VECTOR_ELT(s) we may be able to use SimplifyDemandedVectorElts to avoid unnecessary vector ops. Differential Revision: https://reviews.llvm.org/D49262 llvm-svn: 337258
*	[Sparc] Do not depend on icc for ta 1	Daniel Cederman	2018-07-17	1	-1/+1
\| \| \| \| \| \| \| \| \| \|	The ta instruction will always trap, regardless of the value of the integer condition codes. TRAPri is marked as using icc, so we cannot use a pattern for TRAPri to implement ta 1, as verify-machineinstrs can complain that icc is not defined. Instead we implement ta 1 the same way as ta 5. llvm-svn: 337236
*	[X86] Add full set of patterns for turning ceil/floor/trunc/rint/nearbyint ↵	Craig Topper	2018-07-17	1	-408/+204
\| \| \| \| \| \| \| \|	into rndscale with loads, broadcast, and masking. This amounts to a pretty ridiculous number of patterns. Ideally we'd canonicalize the X86ISD::VRNDSCALE earlier to reuse those patterns. I briefly looked into doing that, but some strict FP operations could still get converted to rint and nearbyint during isel. It's probably still worthwhile to look into. This patch is meant as a starting point to work from. llvm-svn: 337234
*	[X86] Add test cases for selecting floor/ceil/trunc/rint/nearbyint to ↵	Craig Topper	2018-07-17	1	-0/+3597
\| \| \| \| \| \|	rndscale with masking, loading, and broadcasting. llvm-svn: 337233
*	[X86] Add a missing FMA3 scalar intrinsic pattern.	Craig Topper	2018-07-16	1	-0/+31
\| \| \| \| \| \|	This allows us to use 231 form to fold an insertelement on the add input to the fma. There is technically no software intrinsic that can use this until AVX512F, but it can be manually built up from other intrinsics. llvm-svn: 337223
*	[Intrinsics] define funnel shift IR intrinsics + DAG builder support	Sanjay Patel	2018-07-16	6	-0/+1926
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	As discussed here: http://lists.llvm.org/pipermail/llvm-dev/2018-May/123292.html http://lists.llvm.org/pipermail/llvm-dev/2018-July/124400.html We want to add rotate intrinsics because the IR expansion of that pattern is 4+ instructions, and we can lose pieces of the pattern before it gets to the backend. Generalizing the operation by allowing 2 different input values (plus the 3rd shift/rotate amount) gives us a "funnel shift" operation which may also be a single hardware instruction. Initially, I thought we needed to define new DAG nodes for these ops, and I spent time working on that (much larger patch), but then I concluded that we don't need it. At least as a first step, we have all of the backend support necessary to match these ops...because it was required. And shepherding these through the IR optimizer is the primary concern, so the IR intrinsics are likely all that we'll ever need. There was also a question about converting the intrinsics to the existing ROTL/ROTR DAG nodes (along with improving the oversized shift documentation). Again, I don't think that's strictly necessary (as the test results here prove). That can be an efficiency improvement as a small follow-up patch. So all we're left with is documentation, definition of the IR intrinsics, and DAG builder support. Differential Revision: https://reviews.llvm.org/D49242 llvm-svn: 337221
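The funnel shift described above can be modeled in a few lines. This is only a sketch of the semantics as the message describes them (two inputs concatenated, shifted, and one half kept), assuming the shift amount is taken modulo the bit width; the helper names `fshl`, `fshr`, `rotl` and `rotr` are chosen here for illustration and are not the DAG builder code.

```python
def fshl(a: int, b: int, c: int, bits: int = 32) -> int:
    """Funnel shift left: concatenate a:b, shift left, keep the high half."""
    mask = (1 << bits) - 1
    c %= bits  # assumption: shift amount is interpreted modulo the bit width
    wide = ((a & mask) << bits) | (b & mask)
    return ((wide << c) >> bits) & mask


def fshr(a: int, b: int, c: int, bits: int = 32) -> int:
    """Funnel shift right: concatenate a:b, shift right, keep the low half."""
    mask = (1 << bits) - 1
    c %= bits
    wide = ((a & mask) << bits) | (b & mask)
    return (wide >> c) & mask


def rotl(x: int, c: int, bits: int = 32) -> int:
    """A rotate is a funnel shift whose two inputs are the same value."""
    return fshl(x, x, c, bits)


def rotr(x: int, c: int, bits: int = 32) -> int:
    return fshr(x, x, c, bits)
```

Passing the same value for both inputs recovers the rotate that motivated the intrinsics, which is why a single generalized operation covers both cases.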
*	[AMDGPU] Support a fdot2 pattern.	Farhana Aleen	2018-07-16	1	-0/+232
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Optimize fma((float)S0.x, (float)S1.x, fma((float)S0.y, (float)S1.y, z)) -> fdot2((v2f16)S0, (v2f16)S1, (float)z) Author: FarhanaAleen Reviewed By: rampitec, b-sumner Subscribers: AMDGPU Differential Revision: https://reviews.llvm.org/D49146 llvm-svn: 337198
*	[RegAlloc] Skip global splitting if the live range is huge and its spill is	Wei Mi	2018-07-16	1	-0/+150
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	trivially rematerializable. We run into a case where machineLICM hoists a large number of live ranges outside of a big loop because it thinks those live ranges are trivially rematerializable. In regalloc, global splitting is tried out first for those live ranges before they are spilled and rematerialized. Because the global splitting algorithm is quadratic, a large increase in global splitting candidates causes a huge compile time increase (50s to 1400s on my local machine when compiling a module). However, we think that for live ranges which are very large and trivially rematerializable, it is better to just skip global splitting so as to save compile time with little chance of sacrificing performance. We use the segment size of the live range to indirectly evaluate whether the global splitting of the live range can introduce high cost, and use an option as a knob to adjust the size limit threshold. Differential Revision: https://reviews.llvm.org/D49353 llvm-svn: 337186
*	[x86/SLH] Completely rework how we sink post-load hardening past data	Chandler Carruth	2018-07-16	1	-0/+68
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	invariant instructions to be both more correct and much more powerful. While testing, I continued to find issues with sinking post-load hardening. Unfortunately, it was amazingly hard to create any useful tests of this because we were mostly sinking across copies and other loading instructions. The fact that we couldn't sink past normal arithmetic was really a big oversight. So first, I've ported roughly the same set of instructions from the data invariant loads to also have their non-loading varieties understood to be data invariant. I've also added a few instructions that came up so often it again made testing complicated: inc, dec, and lea. With this, I was able to shake out a few nasty bugs in the validity checking. We need to restrict to hardening single-def instructions with defined registers that match a particular form: GPRs that don't have a NOREX constraint directly attached to their register class. The (tiny!) test case included catches all of the issues I was seeing (once we can sink the hardening at all) except for the NOREX issue. The only test I have there is horrible. It is large, inexplicable, and doesn't even produce an error unless you try to emit encodings. I can keep looking for a way to test it, but I'm out of ideas really. Thanks to Ben for giving me at least a sanity-check review. I'll follow up with Craig to go over this more thoroughly post-commit, but without it SLH crashes everywhere so landing it for now. Differential Revision: https://reviews.llvm.org/D49378 llvm-svn: 337177
*	[MIPS GlobalISel] Select instructions to load and store i32 on stack	Petar Jovanovic	2018-07-16	4	-0/+219
\| \| \| \| \| \| \| \| \| \| \|	Add code for selection of G_LOAD, G_STORE, G_GEP, G_FRAMEINDEX and G_CONSTANT. Support loads and stores of i32 values. Patch by Petar Avramovic. Differential Revision: https://reviews.llvm.org/D48957 llvm-svn: 337168
*	[X86][AArch64][DAGCombine] Unfold 'check for [no] signed truncation' pattern	Roman Lebedev	2018-07-16	4	-138/+118
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: [[ https://bugs.llvm.org/show_bug.cgi?id=38149 \| PR38149 ]] As discussed in https://reviews.llvm.org/D49179#1158957 and later, the IR for 'check for [no] signed truncation' pattern can be improved: https://rise4fun.com/Alive/gBf ^ that pattern will be produced by Implicit Integer Truncation sanitizer, https://reviews.llvm.org/D48958 https://bugs.llvm.org/show_bug.cgi?id=21530 in signed case, therefore it is probably a good idea to improve it. But the IR-optimal pattern does not lower efficiently, so we want to undo it. This handles the simple pattern. There is a second pattern with predicate and constants inverted. NOTE: we do not check uses here; we always do the transform. Reviewers: spatel, craig.topper, RKSimon, javed.absar Reviewed By: spatel Subscribers: kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D49266 llvm-svn: 337166
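To make the 'check for [no] signed truncation' pattern concrete, here is a small model of two equivalent ways to ask whether a 16-bit value survives truncation to 8 bits. The add-and-compare form and the sign-extend-and-compare form are my reading of the two shapes involved, not a quote from the patch; the function names and the 16/8-bit widths are illustrative assumptions.

```python
def fits_add_form(x: int, bits: int = 16, kept: int = 8) -> bool:
    """Add half of the kept range, then do one unsigned compare."""
    mask = (1 << bits) - 1
    return ((x + (1 << (kept - 1))) & mask) < (1 << kept)


def sign_extend_inreg(x: int, bits: int, kept: int) -> int:
    """Sign-extend the low `kept` bits of x to a `bits`-wide bit pattern."""
    low = x & ((1 << kept) - 1)
    if low & (1 << (kept - 1)):
        low -= 1 << kept
    return low & ((1 << bits) - 1)


def fits_sext_form(x: int, bits: int = 16, kept: int = 8) -> bool:
    """Truncate, sign-extend back, and compare with the original value."""
    mask = (1 << bits) - 1
    return sign_extend_inreg(x, bits, kept) == (x & mask)
```

Both predicates take `x` as an unsigned bit pattern in the range 0 to 2^16 - 1, so the two forms can be compared exhaustively over all 65536 inputs.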
*	[Sparc] Use the correct encoding for ta 3	Daniel Cederman	2018-07-16	1	-3/+3
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The old encoding generated a "tn %g1 + 3" instruction instead of the expected "ta 3". Reviewers: venkatra, jyknight Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D49171 llvm-svn: 337165
*	[Sparc] Use the names .rem and .urem instead of __modsi3 and __umodsi3	Daniel Cederman	2018-07-16	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: These are the names used in libgcc. Reviewers: venkatra, jyknight, ekedaigle Reviewed By: jyknight Subscribers: joerg, fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48915 llvm-svn: 337164
*	[Sparc] Generate ta 1 for the @llvm.debugtrap intrinsic	Daniel Cederman	2018-07-16	1	-0/+10
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: Software trap number one is the trap used for breakpoints in the Sparc ABI. Reviewers: jyknight, venkatra Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48637 llvm-svn: 337163
*	Avoid losing Hi part when expanding VAARG nodes on big endian machines	Daniel Cederman	2018-07-16	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: If the high part of the load is not used the offset to the next element will not be set correctly. For example, on Sparc V8, the following code will read val2 from offset 4 instead of 8. ``` int val = __builtin_va_arg(va, long long); int val2 = __builtin_va_arg(va, int); ``` Reviewers: jyknight Reviewed By: jyknight Subscribers: fedor.sergeev, jrtc27, llvm-commits Differential Revision: https://reviews.llvm.org/D48595 llvm-svn: 337161
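The offsets in that example can be reproduced with a toy model of a big-endian argument area. This is only an illustration of why offset 4 is wrong, assuming an 8-byte `long long` followed directly by a 4-byte `int`; the `va_arg` helper and the sample values are hypothetical and have nothing to do with the actual legalizer code.

```python
import struct

# Big-endian argument area: an 8-byte long long followed by a 4-byte int.
area = struct.pack(">qi", 0x1122334455667788, 42)


def va_arg(buf: bytes, offset: int, fmt: str):
    """Read one value and return it with the offset of the next argument."""
    size = struct.calcsize(fmt)
    value = struct.unpack_from(fmt, buf, offset)[0]
    return value, offset + size


# Even when only the low 32 bits of the long long are used, the whole
# 8-byte slot must be consumed so the next read starts at offset 8.
wide, next_offset = va_arg(area, 0, ">q")
val = wide & 0xFFFFFFFF
val2, _ = va_arg(area, next_offset, ">i")

# Reading from offset 4 instead returns the Lo half of the long long.
wrong_val2, _ = va_arg(area, 4, ">i")
```

On a big-endian layout the Hi half sits at offset 0 and the Lo half at offset 4, so advancing by only one word makes the next `va_arg` re-read the Lo half of the previous argument.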
*	[x86/SLH] Fix a bug where we would try to post-load harden non-GPRs.	Chandler Carruth	2018-07-16	1	-0/+272
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Found cases that hit the assert I added. This patch factors the validity checking into a nice helper routine and calls it when deciding to harden post-load, and asserts it when doing so later. I've added tests for the various ways of loading a floating point type, as well as loading all vector permutations. Even though many of these go to identical instructions, it seems good to somewhat comprehensively test them. I'm confident there will be more fixes needed here; I'll try to add tests each time as I get this predicate adjusted. llvm-svn: 337160
*	run post-RA hazard recognizer pass late	Mark Searles	2018-07-16	3	-9/+51
\| \| \| \| \| \| \| \| \| \| \| \| \|	Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has the adverse side effect that any consecutive S_NOP 0's emitted by the hazard recognizer will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154
*	[X86] Merge the FR128 and VR128 regclass since they have identical spill and ↵	Craig Topper	2018-07-16	19	-442/+425
\| \| \| \| \| \| \| \| \| \|	alignment characteristics. This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128 and then we fail when something tries to get the register class for f128 which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to constrainRegClass finding FR128 first and passes like live range shrinking weren't handling that well. llvm-svn: 337147
*	[x86/SLH] Teach speculative load hardening to correctly harden the	Chandler Carruth	2018-07-16	1	-0/+955
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	indices used by AVX2 and AVX-512 gather instructions. The index vector is hardened by broadcasting the predicate state into a vector register and then or-ing. We don't even have to worry about EFLAGS here. I've added a test for all of the gather intrinsics to make sure that we don't miss one. A particularly interesting creation is the gather prefetch, which needs to be marked as potentially "loading" to get the correct behavior. It's a memory access in many ways, and is actually relevant for SLH. Based on discussion with Craig in review, I've moved it to be `mayLoad` and `mayStore` rather than generic side effects. This matches how we model other prefetch instructions. Many thanks to Craig for the review here. Differential Revision: https://reviews.llvm.org/D49336 llvm-svn: 337144
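The broadcast-and-or step can be pictured with plain integers standing in for vector lanes. This sketch assumes the predicate state is 0 on a correctly predicted path and all-ones on a misspeculated one, and that each lane is 64 bits wide; both are assumptions made for illustration, and `harden_indices` is a hypothetical name rather than anything in the pass.

```python
def harden_indices(indices, pred_state, bits=64):
    """OR a broadcast copy of the predicate state into every index lane."""
    mask = (1 << bits) - 1
    # Broadcast: one copy of the scalar predicate state per lane.
    broadcast = [pred_state & mask] * len(indices)
    return [(idx | p) & mask for idx, p in zip(indices, broadcast)]


ALL_ONES = (1 << 64) - 1

# Correct path: the state is 0 and the indices pass through unchanged.
good = harden_indices([0, 8, 16, 24], 0)

# Misspeculated path: every lane is forced to all-ones, so no lane
# carries an attacker-influenced index into the gather.
poisoned = harden_indices([0, 8, 16, 24], ALL_ONES)
```

Since the hardening is a plain bitwise or, it leaves the indices untouched on the correct path.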
*	[X86] Add custom execution domain fixing for 128/256-bit integer logic ↵	Craig Topper	2018-07-15	27	-1806/+1342
\| \| \| \| \| \| \| \| \| \| \| \|	operations with AVX512F, but not AVX512DQ. AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions. Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences. This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used. llvm-svn: 337137
*	[X86] Use 128-bit blends instead of vmovss/vmovsd for 512-bit vzmovl patterns ↵	Craig Topper	2018-07-15	2	-6/+6
\| \| \| \| \| \|	to match AVX. llvm-svn: 337135
*	[X86] Use 128-bit ops for 256-bit vzmovl patterns.	Craig Topper	2018-07-15	6	-21/+29
\| \| \| \| \| \| \| \|	128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2 since we can use a 128-bit VBLENDW without AVX2. The only bad thing I see here is that we failed to reuse a vxorps in some of the tests, but I think that's an already known issue. llvm-svn: 337134
*	[DAGCombiner] extend(ifpositive(X)) -> shift-right (not X)	Sanjay Patel	2018-07-15	9	-68/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This is almost the same as an existing IR canonicalization in instcombine, so I'm assuming this is a good early generic DAG combine too. The motivation comes from reduced bit-hacking for select-of-constants in IR after rL331486. We want to restore that functionality in the DAG as noted in the commit comments for that change and the llvm-dev discussion here: http://lists.llvm.org/pipermail/llvm-dev/2018-July/124433.html The PPC and AArch tests show that those targets are already doing something similar. x86 will be neutral in the minimal case and generally better when this pattern is extended with other ops as shown in the signbit-shift.ll tests. Note the asymmetry: we don't include the (extend (ifneg X)) transform because it already exists in SimplifySelectCC(), and that is verified in the later unchanged tests in the signbit-shift.ll files. Without the 'not' op, the general transform to use a shift is always a win because that's a single instruction. Alive proofs: https://rise4fun.com/Alive/ysli Name: if pos, get -1 %c = icmp sgt i16 %x, -1 %r = sext i1 %c to i16 => %n = xor i16 %x, -1 %r = ashr i16 %n, 15 Name: if pos, get 1 %c = icmp sgt i16 %x, -1 %r = zext i1 %c to i16 => %n = xor i16 %x, -1 %r = lshr i16 %n, 15 Differential Revision: https://reviews.llvm.org/D48970 llvm-svn: 337130
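The two Alive proofs above can also be checked by brute force over every i16 value. The following is a sketch that models i16 as Python integers masked to 16 bits; the function names are invented here and only mirror the IR in the proofs.

```python
BITS = 16
MASK = (1 << BITS) - 1


def to_signed(v: int) -> int:
    """Interpret a 16-bit pattern as a signed i16 value."""
    v &= MASK
    return v - (1 << BITS) if v & (1 << (BITS - 1)) else v


def sext_ifpos(x: int) -> int:
    """sext (icmp sgt x, -1): all-ones when x is non-negative, else 0."""
    return MASK if to_signed(x) > -1 else 0


def ashr_not(x: int) -> int:
    """ashr (xor x, -1), 15: Python's >> on a negative int is arithmetic."""
    return (to_signed(x ^ MASK) >> (BITS - 1)) & MASK


def zext_ifpos(x: int) -> int:
    """zext (icmp sgt x, -1): 1 when x is non-negative, else 0."""
    return 1 if to_signed(x) > -1 else 0


def lshr_not(x: int) -> int:
    """lshr (xor x, -1), 15: logical shift on the unsigned bit pattern."""
    return ((x ^ MASK) & MASK) >> (BITS - 1)
```

In both cases the `not` flips the sign bit, and the shift by 15 then either smears it across the word (ashr) or moves it to bit 0 (lshr).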
*	[AMDGPU] adjusted test checks because minnum with NaN gets simplified	Sanjay Patel	2018-07-15	1	-4/+5
\| \| \| \| \| \| \| \|	This was improved with rL337127, but I missed the failure in this test. I'm not sure what the expected result will be, so I've generalized it and added a FIXME comment. llvm-svn: 337128
*	[X86] Add some optsize patterns for 256-bit X86vzmovl.	Craig Topper	2018-07-15	1	-0/+64
\| \| \| \| \| \|	These patterns use VMOVSS/SD. Without optsize we use BLENDI instead. llvm-svn: 337119
*	[MachineOutliner] Check the last instruction from the sequence when updating ↵	Francis Visoiu Mistrih	2018-07-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	liveness. The MachineOutliner was doing a std::for_each from the call (inserted before the outlined sequence) to the iterator at the end of the sequence. std::for_each needs the iterator past the end, so the last instruction was not taken into account when propagating the liveness information. This fixes the machine verifier issue in machine-outliner-disubprogram.ll. Differential Revision: https://reviews.llvm.org/D49295 llvm-svn: 337090
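The off-by-one is easy to reproduce with a half-open range over a list. This is a Python analogy for the iterator pair handed to std::for_each, not the outliner code; the instruction names and the `for_each` helper are made up for the example.

```python
def for_each(seq, first, last, fn):
    """Visit seq[first:last], a half-open range like std::for_each's."""
    for i in range(first, last):
        fn(seq[i])


# Hypothetical sequence: the inserted call, then the outlined instructions.
instrs = ["call", "add", "mul", "ret"]
call_idx = 0
last_instr_idx = len(instrs) - 1  # index of the last instruction itself

# Passing the last instruction as the end of the range skips it.
buggy_visited = []
for_each(instrs, call_idx, last_instr_idx, buggy_visited.append)

# Passing one past the last instruction visits the whole sequence.
fixed_visited = []
for_each(instrs, call_idx, last_instr_idx + 1, fixed_visited.append)
```

The end of a half-open range is never visited, so it has to be one past the last instruction.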
*	[x86/SLH] Fix an issue where we wouldn't harden any loads if we found	Chandler Carruth	2018-07-14	1	-67/+88
\| \| \| \| \| \| \| \| \|	no conditions. This is only valid to do if we're hardening calls and rets with LFENCE which results in an LFENCE guarding the entire entry block for us. llvm-svn: 337089
*	[X86] Fix a subtle bug in the custom execution domain fixing for blends.	Craig Topper	2018-07-14	6	-83/+83
\| \| \| \| \| \| \| \|	The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted. Instead use getNumOperands() from the instruction description which will only count the operands that are defined in the td file. llvm-svn: 337088
*	[X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing ↵	Craig Topper	2018-07-14	20	-786/+361
\| \| \| \| \| \| \| \| \| \|	for size. AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX. This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure makes up for that. llvm-svn: 337083
*	Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"	Evgeniy Stepanov	2018-07-14	4	-206/+21
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This reverts commit r337021. WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7 #1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3 #2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3 #3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18 #4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23 #5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5 #6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const, llvm::raw_ostream&, char const) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5 #7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3 #8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26 [...] Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv' #0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192 llvm-svn: 337079
*	Add a CHECK line for r337072.	Tim Shen	2018-07-13	1	-0/+1
\| \| \| \|	llvm-svn: 337074
*	[Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs	Krzysztof Parzyszek	2018-07-13	1	-0/+26
\| \| \| \| \| \| \| \| \| \| \| \| \|	If an HVX vector register is to be coalesced into a vector pair, make sure that the vector pair will not have a function call in its live range, unless it already had one. All HVX vector registers are volatile, so any vector register live across a function call will have to be spilled. If a vector needs to be spilled, and it's coalesced into a vector pair then the whole pair will need to be spilled (even if only a part of it is live), taking extra stack space. llvm-svn: 337073
*	[LSR] If no Use is interesting, early return.	Tim Shen	2018-07-13	1	-0/+41
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: By looking at the callers of getUse(), we can see that even though IVUsers may offer uses, they may not be interesting to LSR. It's possible that none of them is interesting. Reviewers: sanjoy Subscribers: jlebar, hiraditya, bixia, llvm-commits Differential Revision: https://reviews.llvm.org/D49049 llvm-svn: 337072
*	AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnum and @llvm.maxnum	Tom Stellard	2018-07-13	2	-0/+131
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46172 llvm-svn: 337056
*	[X86][FastISel] Support uitofp with avx512.	Craig Topper	2018-07-13	2	-0/+229
\| \| \| \|	llvm-svn: 337055
*	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp	Tom Stellard	2018-07-13	1	-0/+33
\| \| \| \| \| \| \| \| \| \|	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45882 llvm-svn: 337046
*	[X86][FastISel] Add EVEX support to sitofp handling.	Craig Topper	2018-07-13	2	-0/+3
\| \| \| \|	llvm-svn: 337045
*	[X86] Try fixing r336768	Fangrui Song	2018-07-13	1	-8/+8
\| \| \| \|	llvm-svn: 337043
*	AMDGPU: Fix handling of alignment padding in DAG argument lowering	Matt Arsenault	2018-07-13	4	-21/+206
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR) llvm-svn: 337021
*	[NFC][X86][AArch64] Negative tests for 'check for [no] signed truncation' ↵	Roman Lebedev	2018-07-13	4	-0/+595
\| \| \| \| \| \| \| \| \| \| \| \| \|	pattern. See D49247, D49266. I'm only adding the sane negative tests, and not adding the one-use tests yet. Also, not adding negative tests for the second pattern with inverted operands yet, since its handling will be added in a later differential. llvm-svn: 337014
*	[PowerPC] Materialize more constants with CR-field set in late peephole	Nemanja Ivanovic	2018-07-13	2	-3/+422
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Revision r322373 fixed a bug in how we materialize constants when the CR-field needs to be set. However the fix is overly conservative. It will only do the transform if AND-ing the input with the new constant produces the same new constant. This is of course correct, but not necessarily required. If there are no further uses of the constant, the constant can be changed. If there are no uses of the GPR result, the final result of the materialization isn't important other than it needs to compare to zero correctly (lt, gt, eq). Differential revision: https://reviews.llvm.org/D42109 llvm-svn: 337008
*	[mips] Add microMIPS case to the tests and regenerate assertions using ↵	Simon Atanasyan	2018-07-13	1	-7/+114
\| \| \| \| \| \|	update_llc_test_checks.py. NFC llvm-svn: 337004
*	[SLH] Introduce a new pass to do Speculative Load Hardening to mitigate	Chandler Carruth	2018-07-13	1	-0/+571
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Spectre variant #1 for x86. There is a lengthy, detailed RFC thread on llvm-dev which discusses the high level issues. High level discussion is probably best there. I've split the design document out of this patch and will land it separately once I update it to reflect the latest edits and updates to the Google doc used in the RFC thread. This patch is really just an initial step. It isn't quite ready for prime time and is only exposed via debugging flags. It has three major limitations currently: 1) It only supports x86-64, and only certain ABIs. Many assumptions are currently hard-coded and need to be factored out of the code here. 2) It doesn't include any options for more fine-grained control, either of which control flow edges are significant or which loads are important to be hardened. 3) The code is still quite rough and the testing lighter than I'd like. However, this is enough for people to begin using. I have had numerous requests from people to be able to experiment with this patch to understand the trade-offs it presents and how to use it. We would also like to encourage work to similar effect in other toolchains. The ARM folks are actively developing a system based on this for AArch64. We hope to merge this with their efforts when both are far enough along. But we also don't want to block making this available on that effort. Many thanks to the numerous people who helped along the way here. For this patch in particular, both Eric and Craig did a ton of review to even have confidence in it as an early, rough cut at this functionality. Differential Revision: https://reviews.llvm.org/D44824 llvm-svn: 336990
*	[x86] Fix a capitalization that I failed to save in my editor before	Chandler Carruth	2018-07-13	1	-1/+1
\| \| \| \| \| \|	landing the patch. =/ llvm-svn: 336986
*	[x86] Teach the EFLAGS copy lowering to handle much more complex control	Chandler Carruth	2018-07-13	1	-0/+198
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	flow patterns including forks, merges, and even cycles. This tries to cover a reasonably comprehensive set of patterns that still don't require PHIs or PHI placement. The coverage was inspired by the amazing variety of patterns produced when copying EFLAGS and restoring it to implement Speculative Load Hardening. Without this patch, we simply cannot make such complex and invasive changes to x86 instruction sequences due to EFLAGS. I've added "just" one test, but this test covers many different complexities and corner cases of this approach. It is actually more comprehensive, as far as I can tell, than anything that I have encountered in the wild on SLH. Because the test is so complex, I've tried to give somewhat thorough comments and an ASCII-art diagram of the control flows to make it a bit easier to read and maintain long-term. Differential Revision: https://reviews.llvm.org/D49220 llvm-svn: 336985
*	[AArch64] Updated bigendian buildvector tests	Simon Pilgrim	2018-07-13	1	-128/+128
\| \| \| \| \| \|	As suggested by @efriedma on D49262 - changed the extractelement to a store to prevent SimplifyDemandedVectorElts from simplifying the build vectors - this keeps the immediate generation which was the point of the tests. llvm-svn: 336981
*	[ARM] Regenerated arg endian test	Simon Pilgrim	2018-07-13	1	-48/+224
\| \| \| \| \| \|	As requested on D49262 llvm-svn: 336980
*	[X86] Remove isel patterns that turn packed add/sub/mul/div+movss/sd into ↵	Craig Topper	2018-07-13	1	-304/+800
\| \| \| \| \| \| \| \| \| \|	scalar intrinsic instructions. This is not an optimization we should be doing in isel. This is more suitable for a DAG combine. My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occurred if we had done a packed op. A DAG combine would be better able to manage this. llvm-svn: 336971