bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[AMDGPU][Waitcnt] Re-apply fix "comparison of integers of different signs" build error	Mark Searles	2018-07-16	1	-1/+1
	Re-apply "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error" (fe0a456510131f268e388c4a18a92f575c0db183), which was inadvertently reverted via 2b2ee080f0164485562593b1b87291a48cea4a9a. llvm-svn: 337156
*	run post-RA hazard recognizer pass late	Mark Searles	2018-07-16	2	-4/+8
	Memory legalizer, waitcnt, and shrink passes can perturb the instructions, which means that the post-RA hazard recognizer pass should run after them. Otherwise, one of those passes may invalidate the work done by the hazard recognizer. Note that this has the adverse side effect that any consecutive S_NOP 0's emitted by the hazard recognizer will not be shrunk into a single S_NOP <N>. This should be addressed in a follow-on patch. Differential Revision: https://reviews.llvm.org/D49288 llvm-svn: 337154
*	Revert "[AMDGPU][Waitcnt] fix "comparison of integers of different signs" build error"	Mark Searles	2018-07-16	1	-1/+1
	This reverts commit fe0a456510131f268e388c4a18a92f575c0db183. llvm-svn: 337153
*	[X86] Merge the FR128 and VR128 regclass since they have identical spill and alignment characteristics.	Craig Topper	2018-07-16	7	-298/+328
	This unfortunately requires a bunch of bitcasts to be added to SUBREG_TO_REG, COPY_TO_REGCLASS, and instructions in output patterns. Otherwise tablegen seems to default to picking f128, and then we fail when something tries to get the register class for f128, which isn't always valid. The test changes are because we were previously mixing fr128 and vr128 due to constrainRegClass finding FR128 first, and passes like live range shrinking weren't handling that well. llvm-svn: 337147
*	[x86/SLH] Fix an unused variable warning in release builds after r337144.	Chandler Carruth	2018-07-16	1	-0/+1
	llvm-svn: 337145
*	[x86/SLH] Teach speculative load hardening to correctly harden the indices used by AVX2 and AVX-512 gather instructions.	Chandler Carruth	2018-07-16	2	-17/+92
	The index vector is hardened by broadcasting the predicate state into a vector register and then or-ing. We don't even have to worry about EFLAGS here. I've added a test for all of the gather intrinsics to make sure that we don't miss one. A particularly interesting creation is the gather prefetch, which needs to be marked as potentially "loading" to get the correct behavior. It's a memory access in many ways, and is actually relevant for SLH. Based on discussion with Craig in review, I've moved it to be `mayLoad` and `mayStore` rather than generic side effects. This matches how we model other prefetch instructions. Many thanks to Craig for the review here. Differential Revision: https://reviews.llvm.org/D49336 llvm-svn: 337144
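	The broadcast-and-OR step in this message can be pictured with a toy model. The sketch below is plain Python, not LLVM code. It assumes the predicate state is all-zeros on a correctly predicted path and all-ones under misspeculation; `LANE_BITS` and `harden_gather_indices` are names invented for illustration.

```python
LANE_BITS = 32                      # assumed lane width for this toy model
ALL_ONES = (1 << LANE_BITS) - 1

def harden_gather_indices(indices, predicate_state):
    """OR a broadcast copy of the predicate state into every index lane."""
    # Broadcast: one copy of the scalar predicate state per vector lane.
    broadcast = [predicate_state & ALL_ONES] * len(indices)
    # OR-ing with 0 leaves an index unchanged; OR-ing with all-ones
    # forces every lane to the same poisoned value.
    return [(i | p) & ALL_ONES for i, p in zip(indices, broadcast)]

indices = [0, 4, 8, 12]
print(harden_gather_indices(indices, 0))         # indices pass through unchanged
print(harden_gather_indices(indices, ALL_ONES))  # every lane becomes ALL_ONES
```

	Because the operation is a plain bitwise OR on a vector register, no flags are involved, which is consistent with the remark that EFLAGS is not a concern here.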
*	[x86/SLH] Extract one of the bits of logic to its own function. NFC.	Chandler Carruth	2018-07-15	1	-43/+48
	This is just a refactoring to start cleaning up the code here and make it more readable and approachable. llvm-svn: 337138
*	[X86] Add custom execution domain fixing for 128/256-bit integer logic operations with AVX512F, but not AVX512DQ.	Craig Topper	2018-07-15	1	-0/+85
	AVX512F only has integer domain logic instructions. AVX512DQ added FP domain logic instructions. Execution domain fixing runs before EVEX->VEX. So if we have AVX512F and not AVX512DQ we fail to do execution domain switching of the logic operations. This leads to mismatches in execution domain and more test differences. This patch adds custom domain fixing that switches EVEX integer logic operations to VEX fp logic operations if XMM16-31 are not used. llvm-svn: 337137
*	[X86] Add load patterns for cases where we select X86Movss/X86Movsd to blend instructions.	Craig Topper	2018-07-15	1	-0/+32
	This allows us to fold the load during isel without waiting for the peephole pass to do it. llvm-svn: 337136
*	[X86] Use 128-bit blends instead of vmovss/vmovsd for 512-bit vzmovl patterns to match AVX.	Craig Topper	2018-07-15	1	-12/+39
	llvm-svn: 337135
*	[X86] Use 128-bit ops for 256-bit vzmovl patterns.	Craig Topper	2018-07-15	1	-10/+17
	128-bit ops implicitly zero the upper bits. This should address the comment about domain crossing for the integer version without AVX2, since we can use a 128-bit VBLENDW without AVX2. The only bad thing I see here is that we failed to reuse a vxorps in some of the tests, but I think that's an already-known issue. llvm-svn: 337134
*	[llvm-mca][BtVer2] teach how to identify false dependencies on partially written registers.	Andrea Di Biagio	2018-07-15	1	-1/+8
	The goal of this patch is to improve the throughput analysis in llvm-mca for the case where instructions perform partial register writes. On x86, partial register writes are quite difficult to model, mainly because different processors tend to implement different register merging schemes in hardware. When the code contains partial register writes, the IPC (instructions per cycle) estimated by llvm-mca tends to diverge quite significantly from the observed IPC (using perf).
	Modern AMD processors (at least, from Bulldozer onwards) don't rename partial registers. Quoting Agner Fog's microarchitecture.pdf: "The processor always keeps the different parts of an integer register together. For example, AL and AH are not treated as independent by the out-of-order execution mechanism. An instruction that writes to part of a register will therefore have a false dependence on any previous write to the same register or any part of it."
	This patch is a first important step towards improving the analysis of partial register updates. It changes the semantics of RegisterFile descriptors in tablegen, and teaches llvm-mca how to identify false dependences in the presence of partial register writes (for more details, see the new code comments in include/Target/TargetSchedule.h, class RegisterFile).
	This patch doesn't address the case where a write to a part of a register is followed by a read from the whole register. On Intel chips, high8 registers (AH/BH/CH/DH) can be stored in separate physical registers. However, a later (dirty) read of the full register (example: AX/EAX) triggers a merge uOp, which adds extra latency (and potentially affects the pipe usage).
	This is a very interesting article on the subject with a very informative answer from Peter Cordes: https://stackoverflow.com/questions/45660139/how-exactly-do-partial-registers-on-haswell-skylake-perform-writing-al-seems-to
	In future, the definition of RegisterFile can be extended with extra information that may be used to identify delays caused by merge opcodes triggered by a dirty read of a partial write.
	Differential Revision: https://reviews.llvm.org/D49196 llvm-svn: 337123
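	The false-dependence rule quoted from Agner Fog can be sketched as a toy model. This is plain Python, not llvm-mca code. The alias table `FULL` and the function `false_dependencies` are invented for illustration, and the model assumes a machine that never renames partial registers.

```python
# Toy alias table: each register name maps to the full register it is part of.
FULL = {"al": "rax", "ah": "rax", "ax": "rax", "rax": "rax"}

def false_dependencies(writes):
    """Return (later, earlier) index pairs where a partial write must wait.

    Assumes partial registers are never renamed: a write to part of a
    register depends on the previous write to any part of that register.
    """
    last_writer = {}   # full register -> index of the most recent write
    deps = []
    for idx, reg in enumerate(writes):
        full = FULL[reg]
        is_partial = reg != full
        if is_partial and full in last_writer:
            deps.append((idx, last_writer[full]))
        last_writer[full] = idx
    return deps

print(false_dependencies(["al", "ah", "rax", "ax"]))  # [(1, 0), (3, 2)]
```

	In this model the write to AH waits on the earlier write to AL, while the full-width write to RAX carries no false dependence because it replaces the whole register.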
*	[AVR] Document some public functions	Dylan McKay	2018-07-15	1	-0/+2
	llvm-svn: 337122
*	[X86] Add some optsize patterns for 256-bit X86vzmovl.	Craig Topper	2018-07-15	1	-0/+19
	These patterns use VMOVSS/SD. Without optsize we use BLENDI instead. llvm-svn: 337119
*	[x86/SLH] Fix an issue where we wouldn't harden any loads if we found no conditions.	Chandler Carruth	2018-07-14	1	-3/+3
	This is only valid to do if we're hardening calls and rets with LFENCE, which results in an LFENCE guarding the entire entry block for us. llvm-svn: 337089
*	[X86] Fix a subtle bug in the custom execution domain fixing for blends.	Craig Topper	2018-07-14	1	-2/+2
	The code tried to find the immediate by using getNumOperands() on the MachineInstr, but there might be implicit-defs after the immediate that get counted. Instead use getNumOperands() from the instruction description, which will only count the operands that are defined in the td file. llvm-svn: 337088
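	The off-by-N hazard described here is easy to reproduce in a toy model. The sketch below is plain Python and does not use LLVM's classes. `InstrDesc`, `Instr`, and the two index helpers are hypothetical stand-ins for the instruction description and the MachineInstr.

```python
from dataclasses import dataclass

@dataclass
class InstrDesc:
    num_operands: int   # operands declared in the (hypothetical) .td definition

@dataclass
class Instr:
    desc: InstrDesc
    operands: list      # declared operands followed by any implicit ones

def imm_index_buggy(mi):
    # Counts every operand on the instruction, implicit ones included.
    return len(mi.operands) - 1

def imm_index_fixed(mi):
    # Counts only the operands the description declares.
    return mi.desc.num_operands - 1

# A blend-like instruction: dst, src1, src2, immediate, then an implicit-def.
blend = Instr(InstrDesc(4), ["dst", "src1", "src2", 0x0F, "implicit-def"])
print(blend.operands[imm_index_buggy(blend)])   # implicit-def
print(blend.operands[imm_index_fixed(blend)])   # 15
```

	The two helpers agree whenever no implicit operands follow the immediate, which is presumably why the bug was subtle.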
*	[X86] Prefer blendi over movss/sd when avx512 is enabled unless optimizing for size.	Craig Topper	2018-07-14	2	-11/+16
	AVX512 doesn't have an immediate controlled blend instruction. But blend throughput is still better than movss/sd on SKX. This commit changes AVX512 to use the AVX blend instructions instead of MOVSS/MOVSD. This constrains the register allocation since it won't be able to use XMM16-31, but hopefully the increased throughput and reduced port 5 pressure make up for that. llvm-svn: 337083
*	Revert "AMDGPU: Fix handling of alignment padding in DAG argument lowering"	Evgeniy Stepanov	2018-07-14	13	-214/+193
	This reverts commit r337021.
	WARNING: MemorySanitizer: use-of-uninitialized-value
	#0 0x1415cd65 in void write_signed<long>(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:95:7
	#1 0x1415c900 in llvm::write_integer(llvm::raw_ostream&, long, unsigned long, llvm::IntegerStyle) /code/llvm-project/llvm/lib/Support/NativeFormatting.cpp:121:3
	#2 0x1472357f in llvm::raw_ostream::operator<<(long) /code/llvm-project/llvm/lib/Support/raw_ostream.cpp:117:3
	#3 0x13bb9d4 in llvm::raw_ostream::operator<<(int) /code/llvm-project/llvm/include/llvm/Support/raw_ostream.h:210:18
	#4 0x3c2bc18 in void printField<unsigned int, &(amd_kernel_code_s::amd_kernel_code_version_major)>(llvm::StringRef, amd_kernel_code_s const&, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:78:23
	#5 0x3c250ba in llvm::printAmdKernelCodeField(amd_kernel_code_s const&, int, llvm::raw_ostream&) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:104:5
	#6 0x3c27ca3 in llvm::dumpAmdKernelCode(amd_kernel_code_s const, llvm::raw_ostream&, char const) /code/llvm-project/llvm/lib/Target/AMDGPU/Utils/AMDKernelCodeTUtils.cpp:113:5
	#7 0x3a46e6c in llvm::AMDGPUTargetAsmStreamer::EmitAMDKernelCodeT(amd_kernel_code_s const&) /code/llvm-project/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUTargetStreamer.cpp:161:3
	#8 0xd371e4 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:204:26
	[...]
	Uninitialized value was created by an allocation of 'KernelCode' in the stack frame of function '_ZN4llvm16AMDGPUAsmPrinter21EmitFunctionBodyStartEv'
	#0 0xd36650 in llvm::AMDGPUAsmPrinter::EmitFunctionBodyStart() /code/llvm-project/llvm/lib/Target/AMDGPU/AMDGPUAsmPrinter.cpp:192
	llvm-svn: 337079
*	[x86/SLH] Add an assert to catch if we ever end up trying to harden post-load a register that isn't valid for use with OR or SHRX.	Chandler Carruth	2018-07-14	1	-0/+8
	llvm-svn: 337078
*	[Hexagon] Avoid introducing calls into coalesced range of HVX vector pairs	Krzysztof Parzyszek	2018-07-13	2	-0/+54
	If an HVX vector register is to be coalesced into a vector pair, make sure that the vector pair will not have a function call in its live range, unless it already had one. All HVX vector registers are volatile, so any vector register live across a function call will have to be spilled. If a vector needs to be spilled and it's coalesced into a vector pair, then the whole pair will need to be spilled (even if only a part of it is live), taking extra stack space. llvm-svn: 337073
*	[X86][SLH] Remove PDEP and PEXT from isDataInvariantLoad	Craig Topper	2018-07-13	1	-4/+0
	Ryzen has something like an 18 cycle latency on these based on Agner's data. AMD's own xls is blank. So it seems like there might be something tricky here. Agner's data for Intel CPUs indicates these are a single uop there. Probably safest to remove them. We never generate them without an intrinsic so this should be ok. Differential Revision: https://reviews.llvm.org/D49315 llvm-svn: 337067
*	[X86][SLH] Add VEX and EVEX conversion instructions to isDataInvariantLoad	Craig Topper	2018-07-13	1	-13/+19
	- Drop the intrinsic versions of conversion instructions. These should be handled when we do vectors. They shouldn't show up in scalar code.
	- Add the float<->double conversions which were missing.
	- Add the AVX512 and AVX version of the conversion instructions, including the unsigned integer conversions unique to AVX512.
	Differential Revision: https://reviews.llvm.org/D49313 llvm-svn: 337066
*	[X86][SLH] Regroup the instructions in isDataInvariantLoad a little. NFC	Craig Topper	2018-07-13	1	-36/+43
	- Move BSF/BSR to the same group as TZCNT/LZCNT/POPCNT.
	- Split some of the bit manipulation instructions away from TZCNT/LZCNT/POPCNT. These are things like 'x & (x - 1)' which are composed of a few simple arithmetic operations. These aren't nearly as complicated/surprising as counting bits.
	- Move BEXTR/BZHI into their own group. They aren't like a simple arithmetic op or the bit manipulation instructions. They're more like a shift+and.
	Differential Revision: https://reviews.llvm.org/D49312 llvm-svn: 337065
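	The grouping argument is easier to follow with the operations written out. The sketch below is plain Python over 32-bit values, assuming in-range arguments only; hardware edge cases (out-of-range start or length, flag results) are deliberately not modeled. The function names mirror the mnemonics but are otherwise illustrative.

```python
MASK32 = 0xFFFFFFFF

def blsr(x):
    """Reset the lowest set bit: the 'x & (x - 1)' form from the message."""
    return x & (x - 1) & MASK32

def bextr(x, start, length):
    """Extract `length` bits starting at bit `start`: a shift, then an AND."""
    return (x >> start) & ((1 << length) - 1)

def bzhi(x, n):
    """Zero all bits from position `n` upward: an AND with a computed mask."""
    return x & ((1 << n) - 1)

print(bin(blsr(0b10110)))        # 0b10100
print(hex(bextr(0xABCD, 4, 8)))  # 0xbc
print(hex(bzhi(0xFF, 4)))        # 0xf
```

	Each helper is a short chain of simple arithmetic or shift-and-mask steps, in contrast to bit counting, where the result depends on scanning the whole value.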
*	[X86] Use the correct types in some recently added isel patterns.	Craig Topper	2018-07-13	1	-2/+2
	These were supposed to be integer types since we are selecting integer instructions. Found while preparing to remove these patterns for another patch. llvm-svn: 337057
*	AMDGPU/GlobalISel: Implement select() for 32-bit @llvm.minnum and @llvm.maxnum	Tom Stellard	2018-07-13	2	-0/+19
	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, llvm-commits, t-tye Differential Revision: https://reviews.llvm.org/D46172 llvm-svn: 337056
*	[X86][FastISel] Support uitofp with avx512.	Craig Topper	2018-07-13	1	-8/+26
	llvm-svn: 337055
*	[X86] Correct comment of TEST elimination in BSF/TZCNT	Fangrui Song	2018-07-13	1	-2/+2
	llvm-svn: 337052
*	AMDGPU/GlobalISel: Implement select() for @llvm.amdgcn.exp	Tom Stellard	2018-07-13	2	-0/+71
	Reviewers: arsenm, nhaehnle Subscribers: kzhuravl, wdng, yaxunl, rovka, kristof.beyls, dstuttard, tpr, t-tye, llvm-commits Differential Revision: https://reviews.llvm.org/D45882 llvm-svn: 337046
*	[X86][FastISel] Add EVEX support to sitofp handling.	Craig Topper	2018-07-13	1	-7/+16
	llvm-svn: 337045
*	[X86] Try fixing r336768	Fangrui Song	2018-07-13	1	-1/+1
	llvm-svn: 337043
*	AMDGPU: Properly handle shader inputs with split arguments	Matt Arsenault	2018-07-13	1	-12/+27
	This needs to refer to arguments by their original argument index, not the argument split index, which depends on what the type splitting decides to do. Also avoid incrementing PSInputNum for each split piece. llvm-svn: 337022
*	AMDGPU: Fix handling of alignment padding in DAG argument lowering	Matt Arsenault	2018-07-13	13	-193/+214
	This was completely broken if there was ever a struct argument, as this information is thrown away during the argument analysis. The offsets as passed in to LowerFormalArguments are not useful, as they partially depend on the legalized result register type, and they don't consider the alignment in the first place. Ignore the Ins array, and instead figure out from the raw IR type what we need to do. This seems to fix the padding computation if the DAG lowering is forced (and stops breaking arguments following padded arguments if the arguments were only partially lowered in the IR). llvm-svn: 337021
*	[AArch64] Armv8.4-A: LDAPR & STLR with immediate offset instructions (cont'd)	Sjoerd Meijer	2018-07-13	3	-22/+35
	Follow up of rL336913: fix base class description. Thanks to Ahmed Bougacha for pointing this out. Differential Revision: https://reviews.llvm.org/D49284 llvm-svn: 337009
*	[PowerPC] Materialize more constants with CR-field set in late peephole	Nemanja Ivanovic	2018-07-13	1	-5/+28
	Revision r322373 fixed a bug in how we materialize constants when the CR-field needs to be set. However, the fix is overly conservative. It will only do the transform if AND-ing the input with the new constant produces the same new constant. This is of course correct, but not necessarily required. If there are no further uses of the constant, the constant can be changed. If there are no uses of the GPR result, the final result of the materialization isn't important other than that it needs to compare to zero correctly (lt, gt, eq). Differential revision: https://reviews.llvm.org/D42109 llvm-svn: 337008
*	[cfi-verify] Support AArch64.	Joel Galenson	2018-07-13	3	-9/+15
	This patch adds support for AArch64 to cfi-verify. This required three changes to cfi-verify. First, it generalizes checking if an instruction is a trap by adding a new isTrap flag to TableGen (and defining it for x86 and AArch64). Second, the code that ensures that the operand register is not clobbered between the CFI check and the indirect call needs to allow a single dereference (in x86 this happens as part of the jump instruction). Third, we needed to ensure that return instructions are not counted as indirect branches. Technically, returns are indirect branches and can be covered by CFI, but LLVM's forward-edge CFI does not protect them, and x86 does not consider them, so we keep that behavior. In addition, we had to improve AArch64's code to evaluate the branch target of a MCInst to handle calls where the destination is not the first operand (which it often is not). Differential Revision: https://reviews.llvm.org/D48836 llvm-svn: 337007
*	Add parens to silence Wparentheses warning, introduced by 336990	Erich Keane	2018-07-13	1	-5/+3
	llvm-svn: 337002
*	[TableGen] Support multi-alternative pattern fragments	Ulrich Weigand	2018-07-13	4	-133/+74
	A TableGen instruction record usually contains a DAG pattern that will describe the SelectionDAG operation that can be implemented by this instruction. However, there will be cases where several different DAG patterns can all be implemented by the same instruction. The way to represent this today is to write additional patterns in the Pattern (or usually Pat) class that map those extra DAG patterns to the instruction. This usually also works fine.
	However, I've noticed cases where the current setup seems to require quite a bit of extra (and duplicated) text in the target .td files. For example, in the SystemZ back-end, there are quite a number of instructions that can implement an "add-with-overflow" operation. The same instructions also need to be used to implement just plain addition (simply ignoring the extra overflow output). The current solution requires creating an extra Pat pattern for every instruction, duplicating the information about which particular add operands map best to which particular instruction.
	This patch enhances TableGen to support a new PatFrags class, which can be used to encapsulate multiple alternative patterns that may all match to the same instruction. It operates the same way as the existing PatFrag class, except that it accepts a list of DAG patterns to match instead of just a single one. As an example, we can now define a PatFrags to match either an "add-with-overflow" or a regular add operation:
		def z_sadd : PatFrags<(ops node:$src1, node:$src2), [(z_saddo node:$src1, node:$src2), (add node:$src1, node:$src2)]>;
	and then use this in the add instruction pattern:
		defm AR : BinaryRRAndK<"ar", 0x1A, 0xB9F8, z_sadd, GR32, GR32>;
	These SystemZ target changes are implemented here as well.
	Note that PatFrag is now defined as a subclass of PatFrags, which means that some users of internals of PatFrag need to be updated. (E.g. instead of using PatFrag.Fragment you now need to use !head(PatFrag.Fragments).)
	The implementation is based on the following main ideas:
	- InlinePatternFragments may now replace each original pattern with several result patterns, not just one.
	- parseInstructionPattern delays calling InlinePatternFragments and InferAllTypes. Instead, it extracts a single DAG match pattern from the main instruction pattern.
	- Processing of the DAG match pattern part of the main instruction pattern now shares most code with processing match patterns from the Pattern class.
	- Direct use of main instruction patterns in InferFromPattern and EmitResultInstructionAsOperand is removed; everything now operates solely on DAG match patterns.
	Reviewed by: hfinkel Differential Revision: https://reviews.llvm.org/D48545 llvm-svn: 336999
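	The first implementation idea, one pattern expanding into several, can be illustrated with a toy expansion. The sketch below is plain Python and is not how TableGen is implemented. It is heavily simplified: a fragment only swaps the operator name, whereas real PatFrags alternatives are full DAG patterns with operand substitution. `FRAGMENTS` and `inline_fragments` are invented names.

```python
from itertools import product

# Hypothetical fragment table: fragment name -> alternative operator names.
FRAGMENTS = {"z_sadd": ["z_saddo", "add"]}

def inline_fragments(pattern):
    """Expand a pattern (op, *children) into every fragment-free variant."""
    if not isinstance(pattern, tuple):
        return [pattern]                      # leaf operand such as "$src1"
    op, *children = pattern
    ops = FRAGMENTS.get(op, [op])             # non-fragments expand to themselves
    child_variants = [inline_fragments(c) for c in children]
    # One result per combination of operator alternative and child variants.
    return [(o, *combo) for o in ops for combo in product(*child_variants)]

print(inline_fragments(("z_sadd", "$src1", "$src2")))
# [('z_saddo', '$src1', '$src2'), ('add', '$src1', '$src2')]
```

	A pattern that uses a two-alternative fragment therefore yields two result patterns, and nesting such fragments multiplies the count.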
*	[SLH] Introduce a new pass to do Speculative Load Hardening to mitigate Spectre variant #1 for x86.	Chandler Carruth	2018-07-13	4	-0/+1677
	There is a lengthy, detailed RFC thread on llvm-dev which discusses the high level issues. High level discussion is probably best there. I've split the design document out of this patch and will land it separately once I update it to reflect the latest edits and updates to the Google doc used in the RFC thread.
	This patch is really just an initial step. It isn't quite ready for prime time and is only exposed via debugging flags. It has three major limitations currently:
	1) It only supports x86-64, and only certain ABIs. Many assumptions are currently hard-coded and need to be factored out of the code here.
	2) It doesn't include any options for more fine-grained control, either of which control flow edges are significant or which loads are important to be hardened.
	3) The code is still quite rough and the testing lighter than I'd like.
	However, this is enough for people to begin using. I have had numerous requests from people to be able to experiment with this patch to understand the trade-offs it presents and how to use it. We would also like to encourage work to similar effect in other toolchains. The ARM folks are actively developing a system based on this for AArch64. We hope to merge this with their efforts when both are far enough along. But we also don't want to block making this available on that effort.
	Many thanks to the numerous people who helped along the way here. For this patch in particular, both Eric and Craig did a ton of review to even have confidence in it as an early, rough cut at this functionality.
	Differential Revision: https://reviews.llvm.org/D44824 llvm-svn: 336990
*	[x86] Teach the EFLAGS copy lowering to handle much more complex control flow patterns including forks, merges, and even cycles.	Chandler Carruth	2018-07-13	1	-44/+161
	This tries to cover a reasonably comprehensive set of patterns that still don't require PHIs or PHI placement. The coverage was inspired by the amazing variety of patterns produced when copying EFLAGS and restoring it to implement Speculative Load Hardening. Without this patch, we simply cannot make such complex and invasive changes to x86 instruction sequences due to EFLAGS. I've added "just" one test, but this test covers many different complexities and corner cases of this approach. It is actually more comprehensive, as far as I can tell, than anything that I have encountered in the wild on SLH. Because the test is so complex, I've tried to give somewhat thorough comments and an ASCII-art diagram of the control flows to make it a bit easier to read and maintain long-term. Differential Revision: https://reviews.llvm.org/D49220 llvm-svn: 336985
*	[AArch64][SVE] Asm: Vector Unpack Low/High instructions.	Sander de Smalen	2018-07-13	2	-0/+45
	This patch adds support for the following unpack instructions:
	- PUNPKLO, PUNPKHI: Unpack elements from low/high half and place into elements of twice their size. e.g. punpklo p0.h, p0.b
	- UUNPKLO, UUNPKHI, SUNPKLO, SUNPKHI: Unpack elements from low/high half and place into elements of twice their size after zero- or sign-extending the values. e.g. uunpklo z0.h, z0.b
	llvm-svn: 336982
*	[AArch64][SVE] Asm: Support for insert element (INSR) instructions.	Sander de Smalen	2018-07-13	2	-0/+51
	Insert general purpose register into shifted vector, e.g.
		insr z0.s, w0
		insr z0.d, x0
	Insert SIMD&FP scalar register into shifted vector, e.g.
		insr z0.b, b0
		insr z0.h, h0
		insr z0.s, s0
		insr z0.d, d0
	llvm-svn: 336979
*	[X86] Prefer MOVSS/SD over BLEND under optsize in isel.	Craig Topper	2018-07-13	2	-14/+46
	Previously we iseled to blend, commuted to another blend, and then commuted back to movss/movsd or blend depending on optsize. Now we do it directly. llvm-svn: 336976
*	[X86] Remove isel patterns that turn packed add/sub/mul/div+movss/sd into scalar intrinsic instructions.	Craig Topper	2018-07-13	2	-41/+33
	This is not an optimization we should be doing in isel. This is more suitable for a DAG combine. My main concern is a future time when we support more FPENV. Changing a packed op to a scalar op could cause us to miss some exceptions that should have occurred if we had done a packed op. A DAG combine would be better able to manage this. llvm-svn: 336971
*	CodeGen: Remove pipeline dependencies on StackProtector; NFC	Matthias Braun	2018-07-13	3	-0/+12
	This re-applies r336929 with a fix to accommodate the Mips target scheduling multiple SelectionDAG instances into the pass pipeline. PrologEpilogInserter and StackColoring depend on the StackProtector analysis being alive from the point it is run until PEI, which requires that they are all scheduled in the same FunctionPassManager. Inserting a (machine) ModulePass between StackProtector and PEI results in these passes being in separate FunctionPassManagers, and the StackProtector is not available for PEI. PEI and StackColoring don't use much information from the StackProtector pass, so transferring the required information to MachineFrameInfo is cleaner than keeping the StackProtector pass around. This commit moves the SSP layout information to MFI instead of keeping it in the pass. This patch set (D37580, D37581, D37582, D37583, D37584, D37585, D37586, D37587) is a first draft of the pagerando implementation described in http://lists.llvm.org/pipermail/llvm-dev/2017-June/113794.html. Patch by Stephen Crane <sjc@immunant.com> Differential Revision: https://reviews.llvm.org/D49256 llvm-svn: 336964
*	[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.	Craig Topper	2018-07-12	2	-17/+48
	These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD. llvm-svn: 336954
*	Revert r336950 and r336951 "[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions." and "foo"	Craig Topper	2018-07-12	2	-48/+17
	One of them had a bad title and they should have been squashed. llvm-svn: 336953
*	[X86] Add AVX512 equivalents of some isel patterns so we get EVEX instructions.	Craig Topper	2018-07-12	1	-0/+31
	These are the patterns for matching fceil, ffloor, and sqrt to intrinsic instructions if they have a MOVSS/SD. llvm-svn: 336951
*	foo	Craig Topper	2018-07-12	2	-17/+17
	llvm-svn: 336950
*	[X86][FastISel] Support EVEX version of sqrt.	Craig Topper	2018-07-12	1	-8/+11
	llvm-svn: 336939
*	AMDGPU: Fix assert in truncate combine with vectors	Matt Arsenault	2018-07-12	1	-1/+1
	The piece above probably has the same problem, but I need to try to come up with a test for it. llvm-svn: 336935