path: root/llvm/lib/Target
* AArch64: Cyclone: Remove SlowMisaligned128Store tuning flag (Matthias Braun, 2018-01-24; 1 file, -1/+0)
  Remove FeatureSlowMisaligned128Store from the Cyclone flags. This flag caused 16-byte-wide stores to be split into two 8-byte stores. That was useful on older Apple CPUs, which were slow for 16-byte stores that were not 16-byte aligned; as the compiler often cannot predict the actual alignment, splitting was chosen. This has been a topic of much debate, as the splitting also decreases performance for some benchmarks. Measuring the effects on newer Apple chips (rdar://35525421) shows that it harms more cases than it helps, so it is time to retire this workaround.

  llvm-svn: 323289
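  For illustration, a minimal C sketch (function name hypothetical) of the kind of store affected, a 16-byte copy whose alignment the compiler cannot prove:

  ```
  #include <string.h>

  /* A 16-byte copy with unknown alignment. Under the removed flag this
   * was split into two 8-byte stores on Cyclone; without it, a single
   * 16-byte store can be emitted regardless of provable alignment. */
  void store16(void *dst, const void *src) {
      memcpy(dst, src, 16);
  }
  ```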
* [PPC] Avoid incorrect fp-i128-fp lowering (Tim Shen, 2018-01-23; 1 file, -2/+4)
  Summary: Fix an issue that's similar to what D41411 fixed: float(__int128(float_var)) shouldn't be optimized to xscvdpsxds + xscvsxdsp, as those mean (float)(int64_t)float_var.

  Reviewers: jtony, hfinkel, echristo
  Subscribers: sanjoy, nemanjai, hiraditya, llvm-commits, kbarton
  Differential Revision: https://reviews.llvm.org/D42400

  llvm-svn: 323270
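  To see why the two sequences are not interchangeable, a hedged C illustration (variable names hypothetical): for magnitudes beyond the int64_t range, conversion through __int128 stays well defined while conversion through int64_t does not.

  ```
  #include <stdint.h>
  #include <stdio.h>

  int main(void) {
      float f = 0x1p100f;                 /* 2^100 fits in __int128 but not int64_t */
      float via128 = (float)(__int128)f;  /* well defined: exactly 2^100 */
      float via64  = (float)(int64_t)f;   /* undefined behavior: out of range */
      printf("%a vs %a\n", (double)via128, (double)via64);
      return 0;
  }
  ```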
* [X86] Merge some regular expressions in Zen scheduler model and remove 2 unused classes (Craig Topper, 2018-01-23; 1 file, -14/+2)
  I don't know whether the unused classes were intended to be used, or whether the VEX versions are really different from the legacy SSE versions; Agner's tables don't show any differences. I'm just cleaning up on the assumption that the current behavior is correct.

  llvm-svn: 323263
* [X86] Remove 'Int_' from instregexs in Zen scheduler model (Craig Topper, 2018-01-23; 1 file, -12/+12)
  No instructions have Int_ at the beginning anymore; it's always at the end now, so it should be picked up as a prefix match.

  llvm-svn: 323262
* [X86] Move 'Int_' to the end of the names of the VCOMISS/VUCOMISS instructions to get them picked up by the scheduler model regexes (Craig Topper, 2018-01-23; 3 files, -40/+40)
  All other intrinsic instructions put the _Int on the end. This makes these instructions consistent and gets the prefix instregexs in the scheduler models to pick them up.

  llvm-svn: 323261
* [X86][AVX] LowerBUILD_VECTORAsVariablePermute - add support for VPERMILPV to v2i64/v2f64 (Simon Pilgrim, 2018-01-23; 1 file, -49/+72)
  Minor refactor to make it possible for LowerBUILD_VECTORAsVariablePermute to be used with a wider variety of shuffle ops and types.

  I'd have liked to add v4i32/v4f32 support as well, but we don't see v4i32 index extractions at the moment (which is why I created D42308).

  After this I intend to begin adding scaling support for PSHUFB (v8i16, v4i32, v2i64) and VPERMPS (v4f64, v4i64).

  Differential Revision: https://reviews.llvm.org/D42431

  llvm-svn: 323260
* Fix MSVC "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. (Simon Pilgrim, 2018-01-23; 1 file, -1/+1)
  llvm-svn: 323258
* [Hexagon] Add patterns for sext_inreg of HVX vector types (Krzysztof Parzyszek, 2018-01-23; 1 file, -0/+19)
  llvm-svn: 323250
* [Hexagon] Implement hasLoadFromStackSlot and hasStoreToStackSlot (Krzysztof Parzyszek, 2018-01-23; 2 files, -0/+50)
  If the instruction is a bundle, check the instructions inside of it.

  Patch by Suyog Sarda.

  llvm-svn: 323240
* [Hexagon] Fix unused variable warning in release build (Krzysztof Parzyszek, 2018-01-23; 1 file, -0/+1)
  llvm-svn: 323233
* [Hexagon] Implement basic vector operations on vectors vNi1 (Krzysztof Parzyszek, 2018-01-23; 8 files, -260/+970)
  In addition to that, make sure that there are no boolean vector types that are associated with multiple register classes. Specifically, remove v32i1 and v64i1 from integer register classes. These types will correspond to results of vector comparisons, and as such should belong to the vector predicate class. Having them in scalar registers as well makes legalization ambiguous.

  llvm-svn: 323229
* [X86][SSE] LowerBUILD_VECTORAsVariablePermute - extract subvector from oversized index vectors (Simon Pilgrim, 2018-01-23; 1 file, -6/+10)
  llvm-svn: 323223
* [WebAssembly] Add mem.* intrinsics (Dan Gohman, 2018-01-23; 1 file, -0/+9)
  The grow_memory and current_memory instructions are expected to be officially renamed to mem.grow and mem.size. Introduce new intrinsics with the new names. These new names aren't yet official, so for now, use them at your own risk.

  Also, take this opportunity to add arguments for the currently unused immediate field in those instructions.

  llvm-svn: 323222
* [X86] Rewrite vXi1 element insertion by using a vXi1 scalar_to_vector and inserting into a vXi1 vector (Craig Topper, 2018-01-23; 1 file, -67/+5)
  The existing code was already doing something very similar to subvector insertion, so this allows us to remove the nearly duplicate code. This patch is a little larger than it should be due to differences in the DQI handling between the two today.

  llvm-svn: 323212
* [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the source vector is not larger than the destination (Simon Pilgrim, 2018-01-23; 1 file, -1/+3)
  We might be able to support this in the future with VPERMV3, OR(PSHUFB, PSHUFB), etc.

  llvm-svn: 323210
* Use EVT::changeVectorElementTypeToInteger() to convert index type to integer (Simon Pilgrim, 2018-01-23; 1 file, -4/+4)
  llvm-svn: 323207
* [X86][SSE] LowerBUILD_VECTORAsVariablePermute - ensure that the index vector has the correct number of elements (Simon Pilgrim, 2018-01-23; 1 file, -0/+4)
  llvm-svn: 323206
* AArch64: get type from correct result when forming BFX (Tim Northover, 2018-01-23; 1 file, -1/+1)
  Some nodes produce multiple values, so when obtaining the type of an ISD::OR we need to make sure we ask for the correct one. Hopefully that's all of them.

  llvm-svn: 323205
* AArch64: get type from correct result when forming BFI/BFM (Tim Northover, 2018-01-23; 1 file, -1/+1)
  Some nodes produce multiple values, so when obtaining the type of an ISD::OR we need to make sure we ask for the correct one.

  llvm-svn: 323202
* [X86] Legalize v32i1 without BWI via splitting to v16i1 rather than the default of promoting to v32i8 (Craig Topper, 2018-01-23; 2 files, -0/+31)
  Summary: For the most part it's better to keep v32i1 as a mask type of a narrower width than to try promoting it to a ymm register. I had to add some overrides to the methods that get the types for the calling convention so that we still use v32i8 for argument/return purposes.

  There are still some regressions in here. I definitely saw some around shuffles. I think we probably should move vXi1 shuffle handling from lowering to a DAG combine, where I think the extend and truncate we have to emit would be better combined. I think we also need a DAG combine to remove trunc from (extract_vector_elt (trunc)).

  Overall this removes something like 13000 CHECK lines from lit tests.

  Reviewers: zvi, RKSimon, delena, spatel
  Reviewed By: RKSimon
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42031

  llvm-svn: 323201
* [X86] Add missing MOVSX/MOVZX instructions to load folding tables (Craig Topper, 2018-01-23; 1 file, -0/+3)
  I'm not sure there's any way to generate these folding cases, especially the movzx ones, since even the register form is never emitted by codegen. I'm just adding them to remove the difference with the autogenerated version of the folding table.

  llvm-svn: 323200
* [X86][SSE] LowerBUILD_VECTORAsVariablePermute - fix PSHUFB source/index operand ordering (Simon Pilgrim, 2018-01-23; 1 file, -2/+3)
  As detailed in rL317463, PSHUFB (like most variable shuffle instructions) uses Op[0] for the source vector and Op[1] for the shuffle index vector; VPERMV works in reverse, which is probably where the confusion comes from.

  Differential Revision: https://reviews.llvm.org/D42380

  llvm-svn: 323190
* [mips] Properly select abs and sqrt instructions (Stefan Maksimovic, 2018-01-23; 3 files, -23/+24)
  - Alter abs for microMIPS to have both AFGR64 and FGR64 variants, same as sqrt
  - Remove sqrt and abs from MicroMips32r6InstrInfo.td; use the microMIPS FGR64 variants
  - Restrict non-microMIPS abs/sqrt with the NotInMicroMips predicate

  Differential Revision: https://reviews.llvm.org/D41439

  llvm-svn: 323184
* [X86] Don't reorder (srl (and X, C1), C2) if (and X, C1) can be matched as a movzx (Craig Topper, 2018-01-23; 1 file, -0/+8)
  Summary: If we can match the and as a zero extend, there's no need to flip the order to get an encoding benefit: movzx is 3 bytes with independent source/dest registers, while the shortest 'and' we could make is also 3 bytes, unless we get lucky in the register allocator and it lands on AL/AX/EAX, which have a 2-byte encoding.

  This patch was more impressive before r322957 went in; it removed some of the same ANDs that got deleted by that patch.

  Reviewers: spatel, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42313

  llvm-svn: 323175
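  A small C sketch of the shape being preserved (function name hypothetical): the AND below is a 16-bit zero extend, so keeping it ahead of the shift lets it be selected as a movzx.

  ```
  /* (x & 0xFFFF) >> 4 equals ((x >> 4) & 0x0FFF), but the first form
   * keeps an AND that is a plain 16-bit zero extend and can be selected
   * as a 3-byte movzx with independent source/dest registers. */
  unsigned keep_movzx(unsigned x) {
      return (x & 0xFFFFu) >> 4;
  }
  ```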
* [X86] Remove 'NOREX' comment from the printing of _NOREX instructions (Craig Topper, 2018-01-23; 2 files, -7/+7)
  Some of the NOREX instructions are used in 32-bit mode, making this printing confusing. It also doesn't provide a lot of value, since you can see the h-register being used by the instruction.

  llvm-svn: 323174
* [X86] Various vXi1 insertion improvements (Craig Topper, 2018-01-23; 3 files, -17/+27)
  Add missing patterns for inserting v1i1 into a zero vector. Use insert_subvector to zero upper bits before inserting an element into a vXi1 vector. Replace the kshift-based isel pattern with an insert_subvector-based pattern now that the code that caused the pattern has been fixed to emit insert_subvector.

  llvm-svn: 323173
* Introduce the "retpoline" x86 mitigation technique for variant #2 of the speculative execution vulnerabilities disclosed today, specifically identified by CVE-2017-5715, "Branch Target Injection", which is one of the two halves of Spectre (Chandler Carruth, 2018-01-22; 17 files, -11/+523)
  Summary: First, we need to explain the core of the vulnerability. Note that this is a very incomplete description; please see the Project Zero blog post for details:
  https://googleprojectzero.blogspot.com/2018/01/reading-privileged-memory-with-side.html

  The basis for branch target injection is to direct speculative execution of the processor to some "gadget" of executable code by poisoning the prediction of indirect branches with the address of that gadget. The gadget in turn contains an operation that provides a side channel for reading data. Most commonly, this will look like a load of secret data followed by a branch on the loaded value and then a load of some predictable cache line. The attacker then uses timing of the processor's cache to determine which direction the branch took *in the speculative execution*, and in turn what one bit of the loaded value was. Due to the nature of these timing side channels and the branch predictor on Intel processors, this allows an attacker to leak data only accessible to a privileged domain (like the kernel) back into an unprivileged domain.

  The goal is simple: avoid generating code which contains an indirect branch that could have its prediction poisoned by an attacker. In many cases, the compiler can simply use directed conditional branches and a small search tree. LLVM already has support for lowering switches in this way, and the first step of this patch is to disable jump-table lowering of switches and introduce a pass to rewrite explicit indirectbr sequences into a switch over integers.

  However, there is no fully general alternative to indirect calls. We introduce a new construct we call a "retpoline" to implement indirect calls in a non-speculatable way. It can be thought of loosely as a trampoline for indirect calls which uses the RET instruction on x86. Further, we arrange for a specific call->ret sequence which ensures the processor predicts the return to go to a controlled, known location. The retpoline then "smashes" the return address pushed onto the stack by the call with the desired target of the original indirect call. The result is a predicted return to the next instruction after a call (which can be used to trap speculative execution within an infinite loop) and an actual indirect branch to an arbitrary address.

  On 64-bit x86 ABIs, this is especially easily done in the compiler by using a guaranteed scratch register to pass the target into this device. For 32-bit ABIs there isn't a guaranteed scratch register, so several different retpoline variants are introduced to use a scratch register if one is available in the calling convention and to otherwise use direct stack push/pop sequences to pass the target address.

  This "retpoline" mitigation is fully described in the following blog post:
  https://support.google.com/faqs/answer/7625886

  We also support a target feature that disables emission of the retpoline thunk by the compiler to allow for custom thunks if users want them. These are particularly useful in environments like kernels that routinely do hot-patching on boot and want to hot-patch their thunk to different code sequences. They can write this custom thunk and use `-mretpoline-external-thunk` *in addition* to `-mretpoline`. In this case, on x86-64 the thunk name must be:
  ```
  __llvm_external_retpoline_r11
  ```
  or on 32-bit:
  ```
  __llvm_external_retpoline_eax
  __llvm_external_retpoline_ecx
  __llvm_external_retpoline_edx
  __llvm_external_retpoline_push
  ```
  The target of the retpoline is passed in the named register, or in the case of the `push` suffix, on the top of the stack via a `pushl` instruction.

  There is one other important source of indirect branches in x86 ELF binaries: the PLT. These patches also include support for LLD to generate PLT entries that perform a retpoline-style indirection.

  The only other indirect branches remaining that we are aware of are from precompiled runtimes (such as crt0.o and similar). The ones we have found are not really attackable, so we have not focused on them here, but eventually these runtimes should also be replicated for retpoline-ed configurations for completeness.

  For kernels or other freestanding or fully static executables, the compiler switch `-mretpoline` is sufficient to fully mitigate this particular attack. For dynamic executables, you must compile *all* libraries with `-mretpoline` and additionally link the dynamic executable and all shared libraries with LLD and pass `-z retpolineplt` (or use similar functionality from some other linker). We strongly recommend also using `-z now`, as non-lazy binding allows the retpoline-mitigated PLT to be substantially smaller.

  When manually applying similar transformations to `-mretpoline` to the Linux kernel, we observed very small performance hits to applications running typical workloads, and relatively minor hits (approximately 2%) even for extremely syscall-heavy applications. This is largely due to the small number of indirect branches that occur in performance-sensitive paths of the kernel.

  When using these patches on statically linked applications, especially C++ applications, you should expect to see a much more dramatic performance hit. For microbenchmarks that are switch-, indirect-, or virtual-call heavy we have seen overheads ranging from 10% to 50%. However, real-world workloads exhibit substantially lower performance impact. Notably, techniques such as PGO and ThinLTO dramatically reduce the impact of hot indirect calls (by speculatively promoting them to direct calls) and allow optimized search trees to be used to lower switches. If you need to deploy these techniques in C++ applications, we *strongly* recommend that you ensure all hot call targets are statically linked (avoiding PLT indirection) and use both PGO and ThinLTO. Well-tuned servers using all of these techniques saw 5% - 10% overhead from the use of retpoline.

  We will add detailed documentation covering these components in subsequent patches, but wanted to make the core functionality available as soon as possible. Happy for more code review, but we'd really like to get these patches landed and backported ASAP for obvious reasons. We're planning to backport this to both 6.0 and 5.0 release streams and get a 5.0 release with just this cherry picked ASAP for distros and vendors.

  This patch is the work of a number of people over the past month: Eric, Reid, Rui, and myself. I'm mailing it out as a single commit due to the time-sensitive nature of landing this and the need to backport it. Huge thanks to everyone who helped out here, and everyone at Intel who helped out in discussions about how to craft this. Also, credit goes to Paul Turner (at Google, but not an LLVM contributor) for much of the underlying retpoline design.

  Reviewers: echristo, rnk, ruiu, craig.topper, DavidKreitzer
  Subscribers: sanjoy, emaste, mcrosier, mgorny, mehdi_amini, hiraditya, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41723

  llvm-svn: 323155
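  As a hedged illustration of what `-mretpoline` changes, consider an ordinary indirect call in C (names hypothetical):

  ```
  typedef int (*handler_t)(int);

  /* Unmitigated, this lowers to an indirect branch whose prediction an
   * attacker could poison. Under -mretpoline the target is instead
   * moved into a scratch register (r11 on x86-64) and the call is
   * routed through the retpoline thunk described above. */
  int dispatch(handler_t h, int v) {
      return h(v);
  }
  ```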
* [AMDGPU] SI Load Store Optimizer: When merging with offset, use V_ADD_{I|U}32_e64 (Mark Searles, 2018-01-22; 1 file, -14/+18)
  - Change the inserted add (V_ADD_{I|U}32_e32) to the _e64 version (V_ADD_{I|U}32_e64) so that the add uses a vreg for the carry; this prevents the inserted v_add from killing VCC. The _e64 version doesn't accept a literal in its encoding, so we need to introduce a mov instruction as well to get the immediate into a register.
  - Change the pass name to "SI Load Store Optimizer"; this removes the '/', which complicates scripts.

  Differential Revision: https://reviews.llvm.org/D42124

  llvm-svn: 323153
* [AArch64] Create a separate feature set for Exynos M3 (Evandro Menezes, 2018-01-22; 1 file, -2/+17)
  Distinguish the features from Exynos M2.

  llvm-svn: 323139
* [ARM] Cleanup part of ARMBaseInstrInfo::optimizeCompareInstr (NFCI) (Joel Galenson, 2018-01-22; 1 file, -12/+8)
  As noted in another review, this loop is confusing. This commit cleans it up somewhat.

  Differential Revision: https://reviews.llvm.org/D42312

  llvm-svn: 323136
* [mips] add warnings for using dsp and msa flags with inappropriate revisions (Petar Jovanovic, 2018-01-22; 2 files, -0/+44)
  Dsp and dspr2 require MIPS revision 2, while msa requires revision 5. Add warnings for cases when these flags are used with earlier revisions.

  Patch by Milos Stojanovic.

  Differential Revision: https://reviews.llvm.org/D40490

  llvm-svn: 323131
* [SystemZ] Fix bootstrap failure due to invalid DAG loop (Ulrich Weigand, 2018-01-22; 1 file, -2/+21)
  The change in r322988 caused a failure in the bootstrap build bot. The problem was that directly gluing a BR_CCMASK node to a compare-and-swap could lead to issues if other nodes were chained in between. There is then no way to create a topological sort that respects both the chain sequence and the glue property.

  Fixed for now by rejecting the optimization in this case. As a future enhancement, we may be able to handle additional cases by swapping chain links around.

  llvm-svn: 323129
* Fix bug in commit 323096 exposed by test in test-suite-verify-machineinstrs-x86_64h-O3 (Marina Yatsina, 2018-01-22; 1 file, -1/+1)
  Change-Id: I0a4b10d0d6c8de606d989c567ec07944ae283a87
  llvm-svn: 323126
* [AArch64][SVE] Asm: PTRUE and PTRUES instructions (Sander de Smalen, 2018-01-22; 2 files, -2/+66)
  Summary: These instructions initialize a predicate vector from a pattern/immediate.

  Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover, samparker, olista01
  Reviewed By: samparker
  Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41819

  llvm-svn: 323124
* [AArch64] optimise v4f16 fcmps to utilise vector instructions (Carey Williams, 2018-01-22; 1 file, -2/+15)
  Improves the code generation for v4f16 FCMP instructions when FullFP16 is not supported, generating FCVTL(s) rather than a longer series of FCVTs.

  Differential Revision: https://reviews.llvm.org/D41772

  llvm-svn: 323118
* [X86][SSE] Add ISD::VECTOR_SHUFFLE to faux shuffle decoding (Reapplied) (Simon Pilgrim, 2018-01-22; 1 file, -0/+11)
  Primarily, this allows us to use the aggressive extraction mechanisms in combineExtractWithShuffle earlier and make use of UNDEF elements that may be lost during lowering.

  Reapplied after rL322279 was reverted at rL322335 due to PR35918; the underlying issue was fixed at rL322644.

  llvm-svn: 323104
* [AArch64][SVE] Asm: Predicate patterns (Sander de Smalen, 2018-01-22; 6 files, -0/+116)
  Summary: This patch adds support for parsing/printing of named or unnamed patterns that are used in SVE's PTRUE instruction, amongst others. The pattern can be specified as a named pattern to initialize the predicate vector, or it can be specified as an immediate in the range 0-31.

  Reviewers: fhahn, rengolin, evandro, mcrosier, t.p.northover
  Reviewed By: fhahn
  Subscribers: aemerson, javed.absar, tschuett, kristof.beyls, llvm-commits
  Differential Revision: https://reviews.llvm.org/D41818

  llvm-svn: 323098
* Break false dependencies for POPCNT, LZCNT, TZCNT (Marina Yatsina, 2018-01-22; 4 files, -10/+66)
  Add POPCNT, LZCNT, and TZCNT to the list of instructions that have a false dependency. Add a test to make sure BreakFalseDeps breaks the dependencies for these instructions. Update affected tests.

  This fixes bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869

  This is the last of multiple patches that fix this bugzilla. Most of the patches are aimed at refactoring the existing code. Reviews of the refactoring done to enable this change:
  https://reviews.llvm.org/D40330
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40332
  https://reviews.llvm.org/D40333

  Differential Revision: https://reviews.llvm.org/D40334

  Change-Id: If95cbf1a3f5c7dccff8f1b22ecb397542147303d
  llvm-svn: 323096
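  For context, a minimal C sketch: the popcnt below carries a false output dependency on some Intel microarchitectures, which BreakFalseDeps now breaks (typically by inserting an xor to zero the destination register first).

  ```
  /* __builtin_popcountll lowers to the POPCNT instruction when the
   * target supports it; the instruction's destination register carries
   * a false dependency on affected CPUs. */
  int population(unsigned long long v) {
      return (int)__builtin_popcountll(v);
  }
  ```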
* Separate LoopTraversal, ReachingDefAnalysis and BreakFalseDeps into their own files (Marina Yatsina, 2018-01-22; 2 files, -2/+2)
  This is one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869. Most of the patches are aimed at refactoring the existing code.

  Additional relevant reviews:
  https://reviews.llvm.org/D40330
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40332
  https://reviews.llvm.org/D40334

  Differential Revision: https://reviews.llvm.org/D40333

  Change-Id: Ie5f8eb34d98cfdfae23a3072eb69b5794f0e2d56
  llvm-svn: 323095
* Rename ExecutionDepsFix files to ExecutionDomainFix (Marina Yatsina, 2018-01-22; 2 files, -2/+2)
  This is one of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869. Most of the patches are aimed at refactoring the existing code.

  Additional relevant reviews:
  https://reviews.llvm.org/D40330
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40333
  https://reviews.llvm.org/D40334

  Differential Revision: https://reviews.llvm.org/D40332

  Change-Id: I6a048cca7fdafbfc42fb1bac94343e483befded8
  llvm-svn: 323094
* Separate ExecutionDepsFix into 4 parts (Marina Yatsina, 2018-01-22; 7 files, -26/+35)
  1. ReachingDefsAnalysis - allows identifying, for each instruction, the "closest" reaching def of a certain register. Used by BreakFalseDeps (for clearance calculation) and ExecutionDomainFix (for arbitrating conflicting domains).
  2. ExecutionDomainFix - changes the variant of the instructions in order to minimize domain crossings.
  3. BreakFalseDeps - breaks false dependencies.
  4. LoopTraversal - creates a traversal order of the basic blocks that is optimal for loops (introduced in revision rL293571). Both ExecutionDomainFix and ReachingDefsAnalysis use this to determine the order in which they will traverse the basic blocks.

  This also included the following changes to the original ExecutionDepsFix logic:
  1. BreakFalseDeps and ReachingDefsAnalysis logic is no longer restricted by a register class.
  2. ReachingDefsAnalysis tracks liveness of reg units instead of reg indices into a given reg class.

  Additional changes in affected files:
  1. X86 and ARM targets now inherit from ExecutionDomainFix instead of ExecutionDepsFix. BreakFalseDeps was also added to the passes they activate.
  2. Comments and references to ExecutionDepsFix were replaced with ExecutionDomainFix and BreakFalseDeps, as appropriate.

  Additional refactoring changes will follow.

  This commit is (almost) NFC. The only functional change is that now BreakFalseDeps will break dependencies for all register classes. Since no additional instructions were added to the list of instructions that have false dependencies, there is no actual change yet. In a future commit several instructions (and tests) will be added.

  This is the first of multiple patches that fix bugzilla https://bugs.llvm.org/show_bug.cgi?id=33869. Most of the patches are aimed at refactoring the existing code.

  Additional relevant reviews:
  https://reviews.llvm.org/D40331
  https://reviews.llvm.org/D40332
  https://reviews.llvm.org/D40333
  https://reviews.llvm.org/D40334

  Differential Revision: https://reviews.llvm.org/D40330

  Change-Id: Icaeb75e014eff96a8f721377783f9a3e6c679275
  llvm-svn: 323087
* [NFC] fix trivial typos in comments (Hiroshi Inoue, 2018-01-22; 7 files, -9/+9)
  "the the" -> "the"

  llvm-svn: 323074
* [X86] Add an override of targetShrinkDemandedConstant to limit the damage that shrinkdemandedbits can do to zext_in_reg operations (Craig Topper, 2018-01-20; 3 files, -1/+67)
  Summary: This patch adds an implementation of targetShrinkDemandedConstant that tries to keep shrinkdemandedbits from removing bits that would otherwise have been recognized as a movzx.

  We still need a follow-up patch to stop moving ANDs across SRL if the AND could be represented as a movzx before the shift but not after.

  I think this should help with some of the cases that D42088 ended up removing during isel.

  Reviewers: spatel, RKSimon
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D42265

  llvm-svn: 323048
* [X86][SSE] Check for out of bounds PEXTR/PINSR indices during faux shuffle combining (Simon Pilgrim, 2018-01-20; 1 file, -2/+9)
  llvm-svn: 323045
* [X86] Teach X86 codegen to use vector width preference to avoid promoting to 512-bit types when VLX is enabled and the preference is for a smaller size (Craig Topper, 2018-01-20; 2 files, -17/+95)
  This change applies to places where we would turn 128/256-bit code into 512-bit in order to get a wider element type through sext/zext. Any 512-bit types that already existed in the IR/DAG will be left that way.

  The width preference has no effect on codegen behavior when the target does not have AVX512 enabled, so AVX/AVX2 codegen cannot be limited via this mechanism yet.

  If the preference is lower than 256, we may still use a 256-bit type to do the operation. Constraining to 128 bits makes it much more difficult to support some operations; for many of these cases we need to change element width while keeping the element count constant, which is most easily done by switching between 256 and 128 bit.

  The preference is only obeyed when AVX512 and VLX are available. This means the preference is not obeyed for KNL, but is obeyed for SKX, Cannonlake, and Icelake. For KNL, the only way to do a masked operation is on 512-bit registers, so we would have to completely disable masking to obey the preference; we would also lose support for gather, scatter, ctlz, vXi64 multiplies, etc. This may change in the future, but it simplifies the initial implementation.

  Differential Revision: https://reviews.llvm.org/D41895

  llvm-svn: 323016
* [X86] Add support for passing the 'prefer-vector-width' function attribute into X86Subtarget and exposing it via X86's getRegisterWidth TTI interface (Craig Topper, 2018-01-20; 5 files, -8/+54)
  This will cause the vectorizers to do some limiting of the vector widths they create. This is not a strict limit; there are reasons I know of that the loop vectorizer will generate larger vectors for.

  I've written this in such a way that the interface will only return a properly supported width (0/128/256/512) even if the attribute says something funny like 384 or 10.

  This has been split from D41895, with the remainder in a follow-up commit.

  llvm-svn: 323015
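  A hedged example of the attribute's effect: with AVX-512VL available but "prefer-vector-width"="256" attached to the function (for instance via clang's -mprefer-vector-width=256 driver flag, an assumption here), the vectorizers should pick 256-bit ymm operations for a loop like this rather than 512-bit zmm ones.

  ```
  /* Straightforward vectorization candidate; under a 256-bit width
   * preference it should be vectorized with ymm rather than zmm ops. */
  void saxpy(float *restrict y, const float *restrict x, float a, int n) {
      for (int i = 0; i < n; ++i)
          y[i] += a * x[i];
  }
  ```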
* [WebAssembly] Fix MSVC build (Derek Schuff, 2018-01-20; 1 file, -5/+6)
  nullptr_t can't be used on the left of a boolean &&.

  llvm-svn: 323012
* [SystemZ] Prefer LOCHI over generating IPM sequences (Ulrich Weigand, 2018-01-19; 1 file, -0/+5)
  On current machines we have load-on-condition instructions that can be used to directly implement the SETCC semantics. If we have those, it is always preferable to use them instead of generating the IPM sequence.

  llvm-svn: 322989
* [SystemZ] Directly use CC result of compare-and-swap (Ulrich Weigand, 2018-01-19; 2 files, -0/+126)
  In order to implement a test of whether a compare-and-swap succeeded, the SystemZ back-end currently emits a rather inefficient sequence of first converting the CC result into an integer and then testing that integer against zero. This commit changes the back-end to simply test the CC value set by the compare-and-swap instruction directly.

  llvm-svn: 322988
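  In C11 terms, the "test whether a compare-and-swap succeeded" pattern looks like this minimal sketch (function name hypothetical):

  ```
  #include <stdatomic.h>

  /* The boolean success result is what the back-end can now take
   * directly from the CC value set by the compare-and-swap instruction,
   * instead of first materializing an integer via IPM. */
  _Bool try_update(atomic_int *p, int expected, int desired) {
      return atomic_compare_exchange_strong(p, &expected, desired);
  }
  ```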
* [SystemZ] Rework IPM sequence generation (Ulrich Weigand, 2018-01-19; 4 files, -134/+238)
  The SystemZ back-end uses a sequence of IPM followed by arithmetic operations to implement the SETCC primitive. This is currently done early during SelectionDAG. This patch moves generation of those sequences to much later in SelectionDAG (during PreprocessISelDAG).

  This doesn't change much in the generated code by itself, but it allows further enhancements that will be checked in as follow-on commits.

  llvm-svn: 322987