path: root/llvm/lib/Target/X86
Commit message (author, date, files changed, lines removed/added)
...
* [X86] Add initial support for unfolding broadcast loads from arithmetic instructions to enable LICM hoisting of the load (Craig Topper, 2019-09-01, 3 files, -10/+155)
  MachineLICM can hoist an invariant load, but if that load is folded it needs to be unfolded. On AVX512 this load is sometimes a broadcast load, which we were previously unable to unfold. This patch adds initial support for that with a very basic list of supported instructions as a starting point.
  Differential Revision: https://reviews.llvm.org/D67017
  llvm-svn: 370620
* [X86][AVX] Rename + cleanup lowerShuffleAsLanePermuteAndBlend. NFCI. (Simon Pilgrim, 2019-09-01, 1 file, -28/+30)
  Rename to lowerShuffleAsLanePermuteAndShuffle to make it clear that not just blends are performed. Clean up the in-lane shuffle mask generation to make it more obvious what's going on. Some prep work noticed while investigating the poor shuffle code mentioned in D66004.
  llvm-svn: 370613
* Fix shadow variable warning. NFCI. (Simon Pilgrim, 2019-09-01, 1 file, -3/+3)
  llvm-svn: 370610
* [X86] Replace some COPY_TO_REGCLASS from GR32/GR64 to VR128 in isel patterns with VMOVDI2PDIrr/VMOV64toPQIrr. (Craig Topper, 2019-08-31, 1 file, -22/+18)
  This is what the copies will eventually be turned into. We don't use COPY_TO_REGCLASS for scalar_to_vector patterns. So we should use the real instruction here too.
  llvm-svn: 370601
* [X86] Compress the flag bits in the folding tables to make room for more bits in an upcoming patch. (Craig Topper, 2019-08-31, 2 files, -13/+18)
  llvm-svn: 370600
* [X86] EltsFromConsecutiveLoads - Don't confuse elt count with vector element count (PR43170) (Simon Pilgrim, 2019-08-31, 1 file, -11/+16)
  EltsFromConsecutiveLoads was assuming that the number of input elts was the same as the number of elements in the output vector type when creating a zeroing shuffle, causing an assert when subvectors were being combined instead of just scalars.
  llvm-svn: 370592
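The element-count mix-up is easiest to see in the zeroing shuffle mask itself. The sketch below is a hypothetical standalone model, not the actual EltsFromConsecutiveLoads code: `EltKind` and `buildClearMask` are illustrative names, and it assumes every input elt covers the same number of output elements. The mask is sized by the output vector's element count, and each input elt expands to `scale` entries that select either the loaded vector (indices below `numOutElts`) or a zero vector (indices from `numOutElts` up).

```cpp
#include <cassert>
#include <vector>

// Hypothetical classification of each input elt.
enum class EltKind { Load, Zero, Undef };

// Build a two-input shuffle mask in *output* element units. Input 0 is
// the wide load, input 1 is a zero vector; -1 marks an undef mask entry.
// When the elts are subvectors, each one spans `scale` output elements.
std::vector<int> buildClearMask(const std::vector<EltKind>& elts,
                                unsigned numOutElts) {
  unsigned numElems = static_cast<unsigned>(elts.size());
  assert(numElems != 0 && numOutElts % numElems == 0);
  unsigned scale = numOutElts / numElems;
  std::vector<int> mask(numOutElts, -1);
  for (unsigned i = 0; i < numElems; ++i) {
    if (elts[i] == EltKind::Undef)
      continue;
    // Zero elts pick the matching lane of the second (zero) input.
    int offset = elts[i] == EltKind::Zero ? static_cast<int>(numOutElts) : 0;
    for (unsigned j = 0; j < scale; ++j) {
      int idx = static_cast<int>(i * scale + j);
      mask[idx] = idx + offset;
    }
  }
  return mask;
}
```

With scalar elts `scale` is 1 and the mask length happens to equal the elt count, which is why the original assumption held until subvectors were combined.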
* Fix shadow variable warning by making CondCodes names more explicit. NFCI. (Simon Pilgrim, 2019-08-31, 1 file, -10/+10)
  llvm-svn: 370589
* Fix shadow variable warning. NFCI. (Simon Pilgrim, 2019-08-31, 1 file, -2/+2)
  llvm-svn: 370585
* [X86ISelLowering] combineCMov - cleanup CMOV->LEA codegen. NFCI. (Simon Pilgrim, 2019-08-31, 1 file, -5/+5)
  Only compute the diff once; we don't need the truncation code (assert the bitwidth is correct just to be safe).
  llvm-svn: 370583
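As background for the CMOV->LEA combine, a select between two constants can be rewritten as arithmetic on the zero-extended condition: select(c, T, F) = F + zext(c) * (T - F), with the difference computed once. The scalar sketch below assumes the LEA-friendly differences are 1, 2, 3, 4, 5, 8 and 9 (index scales 1, 2, 4 and 8, plus base+index*scale for 3, 5 and 9); `selectViaLea` is a hypothetical name, and the real combine handles more cases than this model.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns the selected value computed the "LEA way", or nullopt when the
// difference is not one this sketch assumes LEA can encode.
std::optional<int64_t> selectViaLea(bool cond, int64_t trueC, int64_t falseC) {
  // Compute the difference once, in wrap-around unsigned arithmetic.
  uint64_t diff = static_cast<uint64_t>(trueC) - static_cast<uint64_t>(falseC);
  switch (diff) {
  case 1: case 2: case 3: case 4: case 5: case 8: case 9:
    break;
  default:
    return std::nullopt;  // keep the CMOV
  }
  // zext(cond) * diff + falseC: one LEA with falseC as the displacement.
  uint64_t c = cond ? 1 : 0;
  return static_cast<int64_t>(c * diff + static_cast<uint64_t>(falseC));
}
```

Doing the arithmetic in `uint64_t` mirrors fixed-width register behavior, so a negative `falseC` still produces the right result after wrap-around.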
* [X86ISelLowering] LowerSELECT - remove duplicate value type. NFCI. (Simon Pilgrim, 2019-08-31, 1 file, -2/+0)
  The VT of the SELECT result and of the selection ops will be the same.
  llvm-svn: 370581
* Fix SEH_NoReturn machine verifier error (Reid Kleckner, 2019-08-30, 1 file, -1/+1)
  llvm-svn: 370543
* [MC] Avoid crashes from improperly nested or wrong target .seh_handlerdata directives (Reid Kleckner, 2019-08-30, 1 file, -2/+3)
  llvm-svn: 370540
* [X86] Print register names in .seh_* directives (Reid Kleckner, 2019-08-30, 2 files, -13/+172)
  Also improve assembler parser register validation for .seh_ directives. This requires moving X86-specific seh directive handling into the x86 backend, which addresses some assembler FIXMEs.
  Differential Revision: https://reviews.llvm.org/D66625
  llvm-svn: 370533
* [Windows] Disable TrapUnreachable for Win64, add SEH_NoReturn (Reid Kleckner, 2019-08-30, 7 files, -10/+37)
  Users have complained that llvm.trap produces two ud2 instructions on Win64, one for the trap and one for unreachable. This change fixes that.

  TrapUnreachable was added and enabled for Win64 in r206684 (April 2014) to avoid poorly understood issues with the Windows unwinder. There seem to be two major things in play:
  - the unwinder
  - C++ EH, _CxxFrameHandler3 & co

  The unwinder disassembles forward from the return address to scan for epilogues. Inserting a ud2 had the effect of stopping the unwinder, and ensuring that it ran the EH personality function for the current frame. However, it's not clear what the unwinder does when the return address happens to be the last address of one function and the first address of the next function.

  The Visual C++ EH personality, _CxxFrameHandler3, needs to figure out what the current EH state number is. It does this by consulting the ip2state table, which maps from PC to state number. This seems to go wrong when the return address is the last PC of the function or catch funclet.

  I'm not sure precisely which system is involved here, but in order to address these real or hypothetical problems, I believe it is enough to insert int3 after a call site if it would otherwise be the last instruction in a function or funclet. I was able to reproduce some similar problems locally by arranging for a noreturn call to appear at the end of a catch block immediately before an unrelated function, and I confirmed that the problems go away when an extra trailing int3 instruction is added.

  MSVC inserts int3 after every noreturn function call, but I believe it's only necessary to do it if the call would be the last instruction. This change inserts a pseudo instruction that expands to int3 if it is in the last basic block of a function or funclet.

  I did what I could to run the Microsoft compiler EH tests, and the ones I was able to run showed no behavior difference before or after this change.
  Differential Revision: https://reviews.llvm.org/D66980
  llvm-svn: 370525
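The expansion rule for the pseudo can be modeled in a few lines. This is a toy sketch under stated assumptions, not LLVM's MachineInstr-based implementation: a function or funclet is a vector of blocks in layout order, each block is a list of mnemonic strings, and the pseudo is spelled `SEH_NoReturn`. The pseudo becomes `int3` only in the last block; elsewhere it is dropped, because later code in the same function already follows the call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Toy representation: one block = a list of instruction mnemonics.
struct Block {
  std::vector<std::string> insts;
};

// Expand every "SEH_NoReturn" pseudo in one function or funclet.
void expandSehNoReturn(std::vector<Block>& blocks) {
  for (std::size_t b = 0; b < blocks.size(); ++b) {
    // Only in the last block can a noreturn call end up as the final
    // instruction, leaving its return address past the function's end.
    const bool isLastBlock = b + 1 == blocks.size();
    std::vector<std::string> out;
    for (const std::string& inst : blocks[b].insts) {
      if (inst != "SEH_NoReturn")
        out.push_back(inst);
      else if (isLastBlock)
        out.push_back("int3");  // keeps the return address inside
      // Otherwise the pseudo expands to nothing.
    }
    blocks[b].insts = std::move(out);
  }
}
```

Compared with MSVC's approach of an int3 after every noreturn call, this only pays the extra byte where the return address would otherwise fall off the end.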
* [X86] Pass v32i16/v64i8 in zmm registers on KNL target. (Craig Topper, 2019-08-30, 1 file, -0/+15)
  gcc and icc pass these types in zmm registers. This patch implements a quick hack to override the register type before calling convention handling to one that is legal. Longer term we might want to do something similar to 256-bit integer registers on AVX1 where we just split all the operations.
  Fixes PR42957
  Differential Revision: https://reviews.llvm.org/D66708
  llvm-svn: 370495
* [X86] Merge X86InstrInfo::loadRegFromAddr/storeRegToAddr into their only call site. (Craig Topper, 2019-08-30, 2 files, -50/+21)
  I'm looking at unfolding broadcast loads on AVX512, which will require refactoring this code to select broadcast opcodes instead of regular loads/stores in some cases. Merging them avoids further complicating their interfaces.
  llvm-svn: 370484
* [X86] Explicitly list all the always trivially rematerializable instructions. (Craig Topper, 2019-08-30, 1 file, -5/+40)
  Add a default with an llvm_unreachable for anything we don't expect. This seems safer than just blindly returning true for anything missing from the switch.
  llvm-svn: 370424
* [X86] Don't emit unreachable stack adjustments (Reid Kleckner, 2019-08-29, 1 file, -2/+14)
  Summary: This is a minor improvement on our past attempts to do this. Fixes PR43155.
  Reviewers: hans
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66905
  llvm-svn: 370409
* Allow '@' to appear in x86 mingw symbols (Reid Kleckner, 2019-08-29, 1 file, -0/+2)
  Summary: There is no reason to differ in assembler behavior here between -msvc and -gnu targets. Without this setting, the text after the '@' is interpreted as a symbol variant, like foo@IMGREL.
  Reviewers: mstorsjo
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66974
  llvm-svn: 370408
* [X86][SSE] combinePMULDQ - pmuldq(x, 0) -> zero vector (PR43159) (Simon Pilgrim, 2019-08-29, 1 file, -3/+5)
  ISD::isBuildVectorAllZeros permits undef elements to be present, which means we can't return it as a zero vector. PMULDQ/PMULUDQ is an extending multiply, so a multiply by zero of the lower 32 bits should result in a zero 64-bit element.
  llvm-svn: 370404
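Why a zero multiplier is enough: PMULUDQ (and its signed twin PMULDQ) reads only the low 32 bits of each 64-bit lane, so any lane whose multiplier has zero low bits yields a zero 64-bit product, whatever sits in the upper bits. The scalar model below covers the unsigned form on a 128-bit vector; `V2x64` and `pmuludq` are illustrative names, not LLVM or intrinsic APIs.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using V2x64 = std::array<uint64_t, 2>;  // 128-bit vector of two i64 lanes

// Scalar model of PMULUDQ: each 64-bit result lane is the full unsigned
// product of the low 32 bits of the corresponding input lanes.
V2x64 pmuludq(const V2x64& a, const V2x64& b) {
  V2x64 r{};
  for (int i = 0; i < 2; ++i)
    r[i] = (a[i] & 0xFFFFFFFFu) * (b[i] & 0xFFFFFFFFu);  // fits in 64 bits
  return r;
}
```

The result is a genuine zero in every lane, which is why the combine should produce a fresh zero vector instead of reusing an all-zeros-or-undef build_vector operand.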
* [X86] Remove what little support we had for MPX (Craig Topper, 2019-08-29, 4 files, -25/+22)
  - Deprecate -mmpx and -mno-mpx command line options
  - Remove CPUID detection of mpx for -march=native
  - Remove MPX from all CPUs
  - Remove MPX preprocessor define
  I've left the "mpx" string in the backend so we don't fail on old IR, but it's not connected to anything.
  gcc has also deprecated these command line options. https://www.phoronix.com/scan.php?page=news_item&px=GCC-Patch-To-Drop-MPX
  Differential Revision: https://reviews.llvm.org/D66669
  llvm-svn: 370393
* [X86][CodeGen][NFC] Delay `combineIncDecVector()` from DAGCombine to X86DAGToDAGISel (Roman Lebedev, 2019-08-29, 3 files, -42/+58)
  Summary: We were previously doing it in DAGCombine. But we also want to do `sub %x, C` -> `add %x, (sub 0, C)` for vectors in DAGCombine. So if we had `sub %x, -1`, we'll transform it to `add %x, 1`, which `combineIncDecVector()` will immediately transform back into `sub %x, -1`, and here we go again...
  I've marked this as NFC since not a single test changes, but since that 'changes' DAGCombine, probably this isn't fully NFC.
  Reviewers: RKSimon, craig.topper, spatel
  Reviewed By: craig.topper
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62327
  llvm-svn: 370327
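The ping-pong is a classic pair of rewrite rules that undo each other. The toy model below is a hypothetical illustration, not DAGCombiner code: `BinOp` stands for an add/sub node with a constant operand, one rule canonicalizes `sub x, C` into `add x, -C`, and the other mirrors combineIncDecVector by turning `add x, 1` into `sub x, -1` (assuming the usual motivation that an all-ones vector is cheap to materialize, e.g. with pcmpeqd). Running both in one fixed-point loop never settles; running only the canonicalization and applying the inc/dec rewrite once afterwards, as at instruction selection, does.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Toy node: `op` is "add" or "sub", `c` is the constant operand.
struct BinOp {
  std::string op;
  int64_t c;
};

// Rule A: sub x, C -> add x, -C (the generic canonicalization).
bool canonicalizeSub(BinOp& n) {
  if (n.op != "sub") return false;
  n = BinOp{"add", -n.c};
  return true;
}

// Rule B: add x, 1 -> sub x, -1 (the combineIncDecVector-style rewrite).
bool incToSubAllOnes(BinOp& n) {
  if (n.op != "add" || n.c != 1) return false;
  n = BinOp{"sub", -1};
  return true;
}

// Apply the rules until nothing changes. Returns false if the node is
// still changing after maxIters rounds, i.e. the rules cycle forever.
bool reachesFixedPoint(BinOp& n, bool incDecInSameCombine, int maxIters = 100) {
  for (int i = 0; i < maxIters; ++i) {
    bool changed = canonicalizeSub(n);
    if (incDecInSameCombine)
      changed = incToSubAllOnes(n) || changed;
    if (!changed) return true;
  }
  return false;
}
```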
* [X86] Remove isel patterns with X86VBroadcast+scalar_to_vector+load. (Craig Topper, 2019-08-29, 2 files, -47/+1)
  The DAG should have these as X86VBroadcast+load.
  llvm-svn: 370299
* [X86] Remove some unneeded X86VBroadcast isel patterns that have larger than 128 bit input types. (Craig Topper, 2019-08-29, 2 files, -43/+0)
  We should always be shrinking the input to 128 bits or smaller when the node is created.
  llvm-svn: 370296
* [X86] Add a DAG combine to combine INSERTPS and VBROADCAST of a scalar load. Remove corresponding isel patterns. (Craig Topper, 2019-08-29, 2 files, -47/+45)
  We had an isel pattern to perform this, but it's better to do it in DAG combine as a simplification. This also fixes the lack of patterns for AVX512 targets.
  llvm-svn: 370294
* [X86] Make inline assembly 'x' and 'v' constraints work for f128. (Craig Topper, 2019-08-29, 1 file, -2/+3)
  This includes a type legalizer fix to make bitcast operand promotion work correctly when getSoftenedFloat returns f128 instead of i128.
  Fixes PR43157
  llvm-svn: 370293
* [X86] Fix a couple isel patterns to not shrink a volatile load. (Craig Topper, 2019-08-28, 1 file, -2/+4)
  Also add a FIXME because I'm not sure why these patterns exist; it looks like a missing combine. And another FIXME because the AVX512 equivalent of one of the patterns is missing.
  llvm-svn: 370276
* [SelectionDAG] Don't generate libcalls for wide shifts on Windows (PR42711) (Hans Wennborg, 2019-08-28, 2 files, -5/+9)
  Neither libgcc nor compiler-rt is usually used on Windows, so these functions can't be called.
  Differential revision: https://reviews.llvm.org/D66880
  llvm-svn: 370204
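Without a runtime helper such as `__ashlti3` (the libgcc/compiler-rt routine for 128-bit left shifts), a wide shift has to be expanded inline into operations on the legal 64-bit halves. The sketch below shows one such expansion for a variable left shift; `U128` and `shl128` are hypothetical names, and the code SelectionDAG actually emits may be structured differently (for example branch-free).

```cpp
#include <cassert>
#include <cstdint>

struct U128 {
  uint64_t lo;  // bits [63:0]
  uint64_t hi;  // bits [127:64]
};

// Shift a 128-bit value left by amt (assumed to be in [0, 127]) using
// only 64-bit operations instead of a libcall.
U128 shl128(U128 x, unsigned amt) {
  assert(amt < 128 && "shift amount must be below the bit width");
  if (amt == 0)
    return x;  // avoids the undefined 64-bit shift by 64 below
  if (amt >= 64)
    return {0, x.lo << (amt - 64)};  // low half moves entirely into high half
  // Bits shifted out of the low half carry into the high half.
  return {x.lo << amt, (x.hi << amt) | (x.lo >> (64 - amt))};
}
```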
* Revert "Change the X86 datalayout to add three address spaces for 32 bit signed," (Vlad Tsyrklevich, 2019-08-28, 1 file, -3/+0)
  This reverts commit r370083 because it caused check-lld failures on sanitizer-x86_64-linux-fast.
  llvm-svn: 370142
* Change the X86 datalayout to add three address spaces for 32 bit signed, 32 bit unsigned, and 64 bit pointers. (Amy Huang, 2019-08-27, 1 file, -0/+3)
  llvm-svn: 370083
* [X86] Remove encoding information from the TAILJMP instructions that are lowered by MCInstLowering. Fix LowerPATCHABLE_TAIL_CALL to also convert them to regular JMP/JCC instructions (Craig Topper, 2019-08-27, 2 files, -45/+87)
  There are 5 instructions here that are converted from TAILJMP opcodes to regular JMP/JCC opcodes during MCInstLowering, so normally their encoding information isn't used. The exception is when XRay wraps them in PATCHABLE_TAIL_CALL. For the ones that weren't already handled in MCInstLowering, add handling for those and remove their encoding information.
  This patch fixes PATCHABLE_TAIL_CALL to do the same opcode conversion as the regular lowering path, then removes the encoding information.
  Differential Revision: https://reviews.llvm.org/D66561
  llvm-svn: 370079
* [X86][AVX] Add SimplifyDemandedVectorElts support for KSHIFTL/KSHIFTR (Simon Pilgrim, 2019-08-27, 1 file, -0/+25)
  Differential Revision: https://reviews.llvm.org/D66527
  llvm-svn: 370055
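For a mask shift, demanded-elements analysis reduces to shifting the demanded mask the opposite way: result element i of KSHIFTL by S comes from source element i - S, and of KSHIFTR by S from source element i + S. The sketch below models this on plain 64-bit masks (bit i set means element i of a vXi1 vector is demanded); the function names are hypothetical and this is not LLVM's implementation.

```cpp
#include <cassert>
#include <cstdint>

// All-ones mask over the low numElts bits (numElts in [1, 64]).
uint64_t lowMask(unsigned numElts) {
  return numElts == 64 ? ~0ULL : ((1ULL << numElts) - 1);
}

// KSHIFTL by amt: result[i] = src[i - amt] (zero for i < amt), so a
// demanded result element i demands source element i - amt.
uint64_t demandedSrcForKShiftL(uint64_t demandedRes, unsigned amt,
                               unsigned numElts) {
  if (amt >= numElts) return 0;  // every result element is zero
  return (demandedRes & lowMask(numElts)) >> amt;
}

// KSHIFTR by amt: result[i] = src[i + amt] (zero past the top), so a
// demanded result element i demands source element i + amt.
uint64_t demandedSrcForKShiftR(uint64_t demandedRes, unsigned amt,
                               unsigned numElts) {
  if (amt >= numElts) return 0;
  return (demandedRes << amt) & lowMask(numElts);
}
```

Source elements outside the returned mask can then be simplified freely, since no demanded result element reads them.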
* [WinEH] Allocate space in funclets stack to save XMM CSRs (Pengfei Wang, 2019-08-27, 4 files, -7/+66)
  Summary: This is an alternate approach to D63396.
  Currently funclets reuse the same stack slots that are used in the parent function for saving callee-saved xmm registers. If the parent function modifies a callee-saved xmm register before an exception is thrown, the catch handler will overwrite the original saved value.
  This patch allocates space in the funclet's stack for saving callee-saved xmm registers and uses RSP instead of RBP to access memory.
  Signed-off-by: Pengfei Wang <pengfei.wang@intel.com>
  Reviewers: rnk, RKSimon, craig.topper, annita.zhang, LuoYuanke, andrew.w.kaylor
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66596
  llvm-svn: 370005
* [X86] Delay combineIncDecVector until after op legalization. (Craig Topper, 2019-08-26, 1 file, -5/+15)
  Probably better to keep add over sub in early DAG combines. It might make sense to push this to lowering or delay it all the way to isel. But this was the simplest change.
  llvm-svn: 369981
* [X86] Add a hack to combinePMULDQ to manually turn SIGN_EXTEND_VECTOR_INREG/ZERO_EXTEND_VECTOR_INREG inputs into an ANY_EXTEND_VECTOR_INREG style shuffle (Craig Topper, 2019-08-26, 1 file, -0/+28)
  ANY_EXTEND_VECTOR_INREG isn't currently marked Legal, which prevents SimplifyDemandedBits from turning SIGN/ZERO_EXTEND_VECTOR_INREG into it after op legalization. And even if we did make it Legal, combineExtInVec doesn't do shuffle combining on the VECTOR_INREG nodes until AVX1.
  This patch adds a quick hack to combinePMULDQ to directly emit a vector shuffle corresponding to an ANY_EXTEND_VECTOR_INREG operation. This avoids both of those issues without creating any other regressions on our tests.
  The xop-ifma.ll change here also showed up when I tried to resurrect D56306 and seemed to be the only improvement that patch creates now. This is a more direct way to get the benefit.
  Differential Revision: https://reviews.llvm.org/D66436
  llvm-svn: 369942
* [X86][DAGCombiner] Teach narrowShuffle to use concat_vectors instead of inserting into undef (Craig Topper, 2019-08-25, 1 file, -2/+10)
  Summary: Concat_vectors is more canonical during early DAG combine. For example, it's what's used by SelectionDAGBuilder when converting IR shuffles into SelectionDAG shuffles when element counts between inputs and mask don't match. We also have combines in DAGCombiner that can pull concat_vectors through a shuffle; see partitionShuffleOfConcats. So it seems like concat_vectors is a better operation to use here.
  I had to teach DAGCombiner's SimplifyVBinOp to also handle concat_vectors with undef. I haven't checked yet if we can remove the INSERT_SUBVECTOR version in there or not.
  I didn't want to mess with the other caller of getShuffleHalfVectors that's used during shuffle lowering, where insert_subvector probably is what we want to produce, so I've enabled this via a boolean passed to the function.
  Reviewers: spatel, RKSimon
  Reviewed By: RKSimon
  Subscribers: hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D66504
  llvm-svn: 369872
* [X86] Teach -Os immediate sharing code to not count constant uses that will become INC/DEC. (Craig Topper, 2019-08-25, 1 file, -0/+9)
  INC/DEC don't use an immediate so we don't need to count it. We also shouldn't use the custom isel for it.
  Fixes PR42998.
  llvm-svn: 369863
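The counting rule can be sketched as follows. This is a hypothetical model, not the X86 isel code: each use of one constant is reduced to its opcode string, add/sub by +1 or -1 are assumed to become INC/DEC (which encode no immediate), and the threshold of "more than one immediate-encoding use" is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// add/sub by +1 or -1 are selected as INC/DEC, which have no immediate
// field, so such uses never pay for encoding the constant.
bool becomesIncDec(const std::string& op, int64_t imm) {
  return (op == "add" || op == "sub") && (imm == 1 || imm == -1);
}

// Hypothetical -Os heuristic: materialize the constant in a register
// only if more than one use would otherwise encode it as an immediate.
bool shouldMaterializeInRegister(const std::vector<std::string>& userOps,
                                 int64_t imm) {
  int encodingUses = 0;
  for (const std::string& op : userOps)
    if (!becomesIncDec(op, imm))
      ++encodingUses;
  return encodingUses > 1;
}
```

Counting an INC/DEC use would wrongly push a constant with a single real immediate use into a register, costing a MOV without saving any encoding bytes.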
* [X86] Add isel patterns to match vpdpwssd avx512vnni instruction from add+pmaddwd nodes. (Craig Topper, 2019-08-24, 1 file, -0/+29)
  llvm-svn: 369859
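The new patterns rely on VPDPWSSD computing, per 32-bit lane, the accumulator plus the two signed 16x16-bit products of that lane's word pair, which matches add(acc, pmaddwd(a, b)) under 32-bit wrap-around. The scalar model below checks that identity on 128-bit vectors; all names are illustrative, and the saturating VPDPWSSDS is deliberately not modeled.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using W8 = std::array<int16_t, 8>;  // 128-bit vector of i16
using D4 = std::array<int32_t, 4>;  // 128-bit vector of i32

// 32-bit add with wrap-around, done in unsigned arithmetic to avoid
// signed-overflow undefined behavior.
int32_t wrapAdd(int32_t x, int32_t y) {
  return static_cast<int32_t>(static_cast<uint32_t>(x) + static_cast<uint32_t>(y));
}

// PMADDWD: multiply signed words, then add each adjacent pair of products.
D4 pmaddwd(const W8& a, const W8& b) {
  D4 r{};
  for (int j = 0; j < 4; ++j) {
    int32_t p0 = int32_t(a[2 * j]) * b[2 * j];
    int32_t p1 = int32_t(a[2 * j + 1]) * b[2 * j + 1];
    r[j] = wrapAdd(p0, p1);
  }
  return r;
}

// Plain lane-wise i32 add.
D4 addD4(const D4& x, const D4& y) {
  D4 r{};
  for (int j = 0; j < 4; ++j) r[j] = wrapAdd(x[j], y[j]);
  return r;
}

// VPDPWSSD (non-saturating): accumulate both word products of each lane
// directly into the 32-bit accumulator.
D4 vpdpwssd(const D4& acc, const W8& a, const W8& b) {
  D4 r{};
  for (int j = 0; j < 4; ++j) {
    int32_t p0 = int32_t(a[2 * j]) * b[2 * j];
    int32_t p1 = int32_t(a[2 * j + 1]) * b[2 * j + 1];
    r[j] = wrapAdd(wrapAdd(acc[j], p0), p1);
  }
  return r;
}
```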
* [X86] Add an assert to mark more code that needs to be removed when the vector widening legalization switch is removed again. (Craig Topper, 2019-08-24, 1 file, -1/+4)
  llvm-svn: 369837
* Do a sweep of symbol internalization. NFC. (Benjamin Kramer, 2019-08-23, 1 file, -1/+1)
  llvm-svn: 369803
* [X86] Move a transform out of combineConcatVectorOps so we don't prematurely turn CONCAT_VECTORS into INSERT_SUBVECTORS. (Craig Topper, 2019-08-23, 1 file, -9/+13)
  CONCAT_VECTORS and INSERT_SUBVECTORS can both call combineConcatVectorOps, but we shouldn't produce INSERT_SUBVECTORS from there. We should keep CONCAT_VECTORS until vector legalization.
  Noticed while looking at the madd_quad_reduction test from madd.ll
  llvm-svn: 369802
* [X86] Mark VPDPWSSD and VPDPWSSDS as commutable. Add stack folding tests. (Craig Topper, 2019-08-23, 2 files, -10/+33)
  llvm-svn: 369792
* [SelectionDAG][X86] Enable iX SimplifyDemandedBits to vXi1 SimplifyDemandedVectorElts simplification. (Craig Topper, 2019-08-23, 1 file, -0/+20)
  Add a hack to X86 to avoid a regression.
  Patch showing the effect of enabling bool vector oversimplification. Non-VLX builds can simplify a kshift shuffle, but VLX builds simplify:
    insert_subvector v8i zeroinitializer, v2i --> insert_subvector v8i undef, v2i
  This prevents the removal of the AND that clears the upper bits of the result.
  Differential Revision: https://reviews.llvm.org/D53022
  llvm-svn: 369780
* Use VT::getHalfNumVectorElementsVT helpers in a few places. NFCI. (Simon Pilgrim, 2019-08-23, 1 file, -5/+2)
  llvm-svn: 369751
* [X86][BtVer2] Add a read-advance to every implicit register use of CMPXCHG8B/16B. (Andrea Di Biagio, 2019-08-23, 1 file, -4/+10)
  This is a follow up of r369642. This patch assigns a ReadAfterLd to every implicit register use of instruction CMPXCHG8B and instruction CMPXCHG16B. Perf micro-benchmarks show that implicit registers are read after 3cy from the start of execution.
  llvm-svn: 369750
* [X86][BtVer2] Fix latency of ALU RMW instructions. (Andrea Di Biagio, 2019-08-23, 1 file, -3/+5)
  Excluding ADC/SBB and the bit-test instructions (BTR/BTS/BTC), the observed latency of all other RMW integer arithmetic/logic instructions is 6cy and not 5cy.
  Example (ADD):
  ```
  addb $0, (%rsp)      # Latency: 6cy
  addb $7, (%rsp)      # Latency: 6cy
  addb %sil, (%rsp)    # Latency: 6cy
  addw $0, (%rsp)      # Latency: 6cy
  addw $511, (%rsp)    # Latency: 6cy
  addw %si, (%rsp)     # Latency: 6cy
  addl $0, (%rsp)      # Latency: 6cy
  addl $511, (%rsp)    # Latency: 6cy
  addl %esi, (%rsp)    # Latency: 6cy
  addq $0, (%rsp)      # Latency: 6cy
  addq $511, (%rsp)    # Latency: 6cy
  addq %rsi, (%rsp)    # Latency: 6cy
  ```
  The same latency profile applies to SUB/AND/OR/XOR/INC/DEC.
  The observed latency of ADC/SBB is 7-8cy. So we need a different write to model those. Latency of BTS/BTR/BTC is not fixed by this patch (they are much slower than what the model for btver2 currently reports).
  Differential Revision: https://reviews.llvm.org/D66636
  llvm-svn: 369748
* [X86] Make combineLoopSADPattern use CONCAT_VECTORS instead of INSERT_SUBVECTORS for widening with zeros. (Craig Topper, 2019-08-23, 1 file, -3/+5)
  CONCAT_VECTORS is more canonical for the early DAG combine runs until we start getting into the op legalization phases.
  llvm-svn: 369734
* [X86] Improve lowering of v2i32 SAD handling in combineLoopSADPattern. (Craig Topper, 2019-08-23, 1 file, -3/+10)
  For v2i32 we only feed 2 i8 elements into the psadbw instructions with 0s in the other 14 bytes. The resulting psadbw instruction will produce zeros in bits [127:16] of the output. We need to take the result and feed it to a v2i32 add where the first element includes bits [15:0] of the sad result. The other element should be zero.
  Prior to this patch we were using a truncate to take 0 from bits 95:64 of the psadbw. This results in a pshufd to move those bits to 63:32. But since we also have zeroes in bits 63:32 of the psadbw output, we should just take those bits.
  The previous code probably worked better with promoting legalization, but now we use widening legalization. I've preserved the old behavior if -x86-experimental-vector-widening-legalization=false until we get that option removed.
  llvm-svn: 369733
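The bit-range argument can be checked with a scalar model of PSADBW: each 64-bit half holds the sum of absolute differences of its 8 unsigned bytes in its low bits, with zeros above. When only bytes 0 and 1 are non-zero, reinterpreting the low 64 bits as two i32 elements already gives {sad, 0}, so no shuffle is needed. `psadbw` and `asDwords` are illustrative names, and the model assumes little-endian element numbering.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using B16 = std::array<uint8_t, 16>;  // 128-bit vector of bytes
using Q2 = std::array<uint64_t, 2>;   // 128-bit vector of i64

// PSADBW: each 64-bit lane gets the sum of |a - b| over its 8 unsigned
// bytes; the sum is at most 8 * 255, so bits [63:16] are always zero.
Q2 psadbw(const B16& a, const B16& b) {
  Q2 r{};
  for (int lane = 0; lane < 2; ++lane) {
    uint64_t sum = 0;
    for (int i = 0; i < 8; ++i) {
      int d = int(a[lane * 8 + i]) - int(b[lane * 8 + i]);
      sum += static_cast<uint64_t>(d < 0 ? -d : d);
    }
    r[lane] = sum;
  }
  return r;
}

// Reinterpret the result as four i32 elements (element 0 = bits [31:0]).
std::array<uint32_t, 4> asDwords(const Q2& q) {
  return {static_cast<uint32_t>(q[0]), static_cast<uint32_t>(q[0] >> 32),
          static_cast<uint32_t>(q[1]), static_cast<uint32_t>(q[1] >> 32)};
}
```

Elements 0 and 1 (bits [63:0]) are already the {sad, 0} pair the v2i32 add needs, while taking the zero from element 2 (bits [95:64]) is what required the extra pshufd.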
* [MC] Minor cleanup to MCFixup::Kind handling. NFC. (Sam Clegg, 2019-08-23, 2 files, -11/+11)
  Prefer `MCFixupKind` where possible and add getTargetKind() to convert to `unsigned` when needed rather than scattering cast operators around the place.
  Differential Revision: https://reviews.llvm.org/D59890
  llvm-svn: 369720
* [MachO][TLOF] Use hasLocalLinkage to determine if indirect symbol is local (Francis Visoiu Mistrih, 2019-08-22, 2 files, -3/+4)
  Local symbols in the indirect symbol table contain the value `INDIRECT_SYMBOL_LOCAL` and the corresponding __pointers entry must contain the address of the target.
  In r349060, I added support for local symbols in the indirect symbol table, which was checking if the symbol `isDefined` && `!isExternal` to determine if the symbol is local or not.
  It turns out that `isDefined` will return false if the user of the symbol comes before its definition, and we'll again generate .long 0, which will be the symbol at the address 0x0.
  Instead of doing that, use GlobalValue::hasLocalLinkage() to check if the symbol is local.
  Differential Revision: https://reviews.llvm.org/D66563
  llvm-svn: 369671