path: root/llvm/lib/Target/AArch64
Commit message | Author | Age | Files | Lines
...
* [AArch64][SVE] Allow explicit size specifier for predicate operand | Momchil Velikov | 2019-07-25 | 1 | -8/+15
  ... for the vector forms of `{SQ,UQ,}{INC,DEC}P` instructions. Also continue supporting the existing behaviour of not requiring an explicit size specifier. The preferred disassembly is *with* the specifier.
  This is implemented by redefining instruction forms to require vector predicates with explicit size and adding aliases, which allow a predicate with no size.
  Differential Revision: https://reviews.llvm.org/D65145
  llvm-svn: 367019
* [ARM][AArch64] Support for Cortex-A65 & A65AE, Neoverse E1 & N1 | Pablo Barrio | 2019-07-25 | 3 | -2/+53
  Summary: Add support for Cortex-A65, Cortex-A65AE, Neoverse E1 and Neoverse N1. Neoverse E1 and Cortex-A65(&AE) only implement the AArch64 state of the Arm architecture. Neoverse N1 implements both AArch32 and AArch64.
  Cortex-A65: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65
  Cortex-A65AE: https://developer.arm.com/ip-products/processors/cortex-a/cortex-a65ae
  Neoverse E1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-e1
  Neoverse N1: https://developer.arm.com/ip-products/processors/neoverse/neoverse-n1
  Patch by Diogo Sampaio and Pablo Barrio
  Reviewers: samparker, LukeCheeseman, sbaranga, ostannard
  Reviewed By: ostannard
  Subscribers: ostannard, javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64406
  llvm-svn: 367007
* [AArch64][GlobalISel] Select immediate modes for ADD when selecting G_GEP | Jessica Paquette | 2019-07-24 | 1 | -2/+35
  Before, we weren't able to select things like this for G_GEP:
    add x0, x8, #8
  and instead we'd materialize the 8. This teaches GISel to do that. It gives considerable code size savings on 252.eon, about 4%!
  Differential Revision: https://reviews.llvm.org/D65248
  llvm-svn: 366959
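For reference, an AArch64 ADD (immediate) encodes a 12-bit unsigned value, optionally shifted left by 12, which is why a small G_GEP offset such as 8 can be folded instead of materialized. The C++ sketch below checks that encoding rule; `encodeAddImmediate` and `ArithImm` are hypothetical names, not LLVM APIs, and negative offsets (which would need SUB instead) are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical result type: the 12-bit field plus its shift (0 or 12).
struct ArithImm {
  uint32_t imm12;
  unsigned shift;
};

// Returns the ADD (immediate) encoding for a non-negative offset, or
// std::nullopt if the offset would have to be materialized in a register.
std::optional<ArithImm> encodeAddImmediate(uint64_t value) {
  // Plain form: value fits in 12 bits.
  if (value < (1u << 12))
    return ArithImm{static_cast<uint32_t>(value), 0};
  // Shifted form: low 12 bits are zero and the remaining bits fit in 12 bits.
  if ((value & 0xfff) == 0 && (value >> 12) < (1u << 12))
    return ArithImm{static_cast<uint32_t>(value >> 12), 12};
  return std::nullopt;
}
```

With this rule, `add x0, x8, #8` uses the plain form, while an offset such as 0x1001 would fall back to a register operand.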
* [AArch64][GlobalISel] Don't try to use GISel if subtarget doesn't have neon or fp. | Amara Emerson | 2019-07-24 | 1 | -0/+6
  Throughout the legalizerinfo we currently make the assumption that the target has neon and FP target features available. Fixing it will require a refactor of the whole thing, so until then make sure we fall back.
  Works around PR42734
  Differential Revision: https://reviews.llvm.org/D65244
  llvm-svn: 366957
* [Codegen] (X & (C l>>/<< Y)) ==/!= 0 --> ((X <</l>> Y) & C) ==/!= 0 fold | Roman Lebedev | 2019-07-24 | 2 | -0/+18
  Summary: This was originally reported in D62818. https://rise4fun.com/Alive/oPH
  InstCombine does the opposite fold, in the hope that the `C l>>/<< Y` expression will be hoisted out of a loop if `Y` is invariant and `X` is not. But as can be seen from the diffs here, if it didn't get hoisted, the produced assembly is almost universally worse.
  Much like with my recent "hoist add/sub by/from const" patches, we should get an almost universal win if we hoist the constant: there is almost always an "and/test by imm" instruction, but "shift of imm" not so much, so we may avoid having to materialize the immediate, and thus need one less register. And since we now shift not by a constant but by something else, the live range of that something else may shrink.
  Special care needs to be taken not to disturb the x86 `BT` / hexagon `tstbit` instruction patterns, and not to get into an endless combine loop.
  Reviewers: RKSimon, efriedma, t.p.northover, craig.topper, spatel, arsenm
  Reviewed By: spatel
  Subscribers: hiraditya, MaskRay, wuzish, xbolva00, nikic, nemanjai, jvesely, wdng, nhaehnle, javed.absar, tpr, kristof.beyls, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D62871
  llvm-svn: 366955
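The fold relies on an identity that holds on fixed-width integers for every in-range shift amount. The exhaustive C++ check below is a small sketch that verifies both directions for 8-bit values; it is not how the DAG combine is implemented, only a test of the equivalence it depends on.

```cpp
#include <cassert>
#include <cstdint>

// For every 8-bit X and C and every shift amount Y in [0, 8), checks
//   (X & (C l>> Y)) == 0  <=>  ((X << Y) & C) == 0
//   (X & (C << Y))  == 0  <=>  ((X l>> Y) & C) == 0
// with every intermediate result truncated back to 8 bits.
bool foldHoldsFor8Bit() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned c = 0; c < 256; ++c)
      for (unsigned y = 0; y < 8; ++y) {
        uint8_t X = static_cast<uint8_t>(x);
        uint8_t C = static_cast<uint8_t>(c);
        bool lshrForm = (X & static_cast<uint8_t>(C >> y)) == 0;
        bool lshrFolded = (static_cast<uint8_t>(X << y) & C) == 0;
        bool shlForm = (X & static_cast<uint8_t>(C << y)) == 0;
        bool shlFolded = (static_cast<uint8_t>(X >> y) & C) == 0;
        if (lshrForm != lshrFolded || shlForm != shlFolded)
          return false;
      }
  return true;
}
```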
* [AArch64][GlobalISel] Fold G_MUL into XRO load addressing mode when possible | Jessica Paquette | 2019-07-24 | 1 | -9/+40
  If we have a G_MUL, and either the LHS or the RHS of that mul is the legal shift value for a load addressing mode, we can fold it into the load.
  This gives some code size savings on some SPEC tests. The best are around 2% on 300.twolf and 3% on 254.gap.
  Differential Revision: https://reviews.llvm.org/D65173
  llvm-svn: 366954
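A sketch of the legality test behind this fold, assuming (from the XRO form, e.g. `ldr x1, [x2, x3, lsl #3]`) that the index register may only be shifted left by log2 of the access size, so a multiply folds only when its constant operand equals the access size in bytes. `foldableMulShift` is a hypothetical helper; real selection code must also find which mul operand is the constant, since either the LHS or the RHS may be.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Returns the LSL amount that replaces "index * MulConst" in an XRO load of
// AccessBytes bytes, or std::nullopt if the multiply cannot be folded.
std::optional<unsigned> foldableMulShift(uint64_t MulConst, unsigned AccessBytes) {
  // Access sizes are powers of two (1, 2, 4, 8, 16 bytes).
  if (AccessBytes == 0 || (AccessBytes & (AccessBytes - 1)) != 0)
    return std::nullopt;
  // Only a scale equal to the element size matches the addressing mode.
  if (MulConst != AccessBytes)
    return std::nullopt;
  unsigned Shift = 0;
  while ((1u << Shift) != AccessBytes)
    ++Shift;
  return Shift; // log2(AccessBytes)
}
```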
* [GlobalISel] Support for inlining memcpy, memset and memmove calls. | Amara Emerson | 2019-07-24 | 3 | -3/+86
  This introduces a new family of combiner helper routines that re-use the target specific cost model from SelectionDAG, and generate inline implementations of the memcpy family of intrinsics.
  The combines are only enabled at optimization levels higher than -O0, and give very substantial performance improvements.
  Differential Revision: https://reviews.llvm.org/D65167
  llvm-svn: 366951
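To illustrate what an inline implementation of a fixed-size memcpy amounts to, the C++ sketch below greedily splits the copy into the widest available load/store pairs, from 16-byte Q-register operations down to single bytes. This is a simplification under stated assumptions: `splitMemcpy` and its width list are hypothetical, and the real combiner consults the target's SelectionDAG cost model, which can make different choices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Splits a copy of Size bytes into access widths, widest first.
// Widths must be sorted in descending order and end with 1 so that every
// size can be covered exactly.
std::vector<unsigned> splitMemcpy(uint64_t Size,
                                  const std::vector<unsigned> &Widths = {16, 8, 4, 2, 1}) {
  std::vector<unsigned> Ops;
  for (unsigned W : Widths)
    while (Size >= W) {
      Ops.push_back(W); // one load/store pair of W bytes
      Size -= W;
    }
  return Ops;
}
```

For example, a 31-byte copy becomes one 16-, 8-, 4-, 2- and 1-byte access each, which is where s128 (Q register) loads and stores come in.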
* [AArch64][GlobalISel] Make vector dup optimization look at last elt of ZeroVec | Jessica Paquette | 2019-07-24 | 1 | -1/+1
  Fix an off-by-one error which made us not look at the last element of the zero vector. This caused a miscompile in 188.ammp.
  Differential Revision: https://reviews.llvm.org/D65168
  llvm-svn: 366930
* [AArch64] Fix -Wunused-variable in -DLLVM_ENABLE_ASSERTIONS=off builds after r366857 | Fangrui Song | 2019-07-24 | 1 | -0/+1
  llvm-svn: 366866
* [AArch64][GlobalISel] Add support for s128 loads, stores, extracts, truncs. | Amara Emerson | 2019-07-23 | 3 | -13/+92
  We need to be able to load and store s128 for memcpy inlining, where we want to generate Q register mem ops. Making these legal also requires that we add some support in other instructions. Regbankselect should also know about these, since no GPR register class can hold them, so they need special handling to live on the FPR bank.
  Differential Revision: https://reviews.llvm.org/D65166
  llvm-svn: 366857
* [GlobalISel][AArch64] Save a copy on G_SELECT by fixing condition to GPR | Jessica Paquette | 2019-07-23 | 1 | -5/+3
  The condition can never be fed by FPRs, so it should always be on a GPR.
  Differential Revision: https://reviews.llvm.org/D65157
  llvm-svn: 366854
* [GlobalISel][AArch64] Teach GISel to handle shifts in load addressing modes | Jessica Paquette | 2019-07-23 | 1 | -7/+124
  When we select the XRO variants of loads, we can pull in very specific shifts (of the size of an element). E.g.
  ```
  ldr x1, [x2, x3, lsl #3]
  ```
  This teaches GISel to handle these when they're coming from shifts specifically.
  This adds a new addressing mode function, `selectAddrModeShiftedExtendXReg`, which recognizes this pattern.
  This also packs this up with `selectAddrModeRegisterOffset` into `selectAddrModeXRO`. This is intended to be equivalent to `selectAddrModeXRO` in AArch64ISelDAGtoDAG.
  Also update load-addressing-modes to show that all of the cases here work.
  Differential Revision: https://reviews.llvm.org/D65119
  llvm-svn: 366819
* [GlobalISel][AArch64] Contract trivial same-size cross-bank copies into G_STOREs | Jessica Paquette | 2019-07-20 | 1 | -0/+49
  Sometimes, you can end up with cross-bank copies between same-sized GPRs and FPRs, which feed into G_STOREs. When these copies feed only into stores, they aren't necessary; we can just store using the original register bank.
  This provides some minor code size savings for some floating point SPEC benchmarks (around 0.2% for 453.povray and 450.soplex).
  This issue doesn't seem to show up due to regbankselect or anything similar. So, this patch introduces an early select function, `contractCrossBankCopyIntoStore`, which performs the contraction when possible. The selector then continues normally and selects the correct store opcode, eliminating needless copies along the way.
  Differential Revision: https://reviews.llvm.org/D65024
  llvm-svn: 366625
* [GlobalISel] Translate calls to memcpy et al to G_INTRINSIC_W_SIDE_EFFECTs and legalize later. | Amara Emerson | 2019-07-19 | 2 | -0/+23
  I plan on adding memcpy optimizations in the GlobalISel pipeline, but we can't do that unless we delay lowering to actual function calls. This patch changes the translator to generate G_INTRINSIC_W_SIDE_EFFECTS for these functions, and then has each target specify, using the new custom legalizer hook for intrinsics, that it wants them expanded to a libcall.
  Differential Revision: https://reviews.llvm.org/D64895
  llvm-svn: 366516
* [GlobalISel][AArch64] Add support for base register + offset register loads | Jessica Paquette | 2019-07-18 | 1 | -0/+93
  Add support for folding G_GEPs into loads of the form
  ```
  ldr reg, [base, off]
  ```
  when possible. This can save an add before the load. Currently, this is only supported for loads of 64 bits into 64 bit registers.
  Add a new addressing mode function, `selectAddrModeRegisterOffset`, which performs this folding when it is profitable.
  Also add a test for addressing modes for G_LOAD.
  Differential Revision: https://reviews.llvm.org/D64944
  llvm-svn: 366503
* MC: AArch64: Add support for prel_g* relocation specifiers. | Peter Collingbourne | 2019-07-18 | 4 | -10/+49
  Differential Revision: https://reviews.llvm.org/D64683
  llvm-svn: 366462
* AArch64: Unify relocation restrictions between MOVK/MOVN/MOVZ. | Peter Collingbourne | 2019-07-18 | 3 | -104/+51
  There doesn't seem to be a practical reason for these instructions to have different restrictions on the types of relocations that they may be used with, notwithstanding the language in the ELF AArch64 spec that implies that specific relocations are meant to be used with specific instructions. For example, we currently forbid the first instruction in the following sequence, despite it currently being used by clang to generate a global reference under -mcmodel=large:
    movz x0, #:abs_g0_nc:foo
    movk x0, #:abs_g1_nc:foo
    movk x0, #:abs_g2_nc:foo
    movk x0, #:abs_g3:foo
  Therefore, allow MOVK/MOVN/MOVZ to accept the union of the set of relocations that they currently accept individually.
  Differential Revision: https://reviews.llvm.org/D64466
  llvm-svn: 366461
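The four-instruction sequence in this commit builds a 64-bit address 16 bits at a time. A C++ sketch of that arithmetic, assuming the usual meaning of the group operators (`abs_g0` selects bits [15:0], `abs_g1` bits [31:16], `abs_g2` bits [47:32], `abs_g3` bits [63:48]); `absGroups` and `materialize` are illustrative names only.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Splits an address into the four 16-bit fields selected by abs_g0..abs_g3.
std::array<uint16_t, 4> absGroups(uint64_t Addr) {
  std::array<uint16_t, 4> G{};
  for (unsigned i = 0; i < 4; ++i)
    G[i] = static_cast<uint16_t>(Addr >> (16 * i));
  return G;
}

// Models the MOVZ/MOVK sequence: MOVZ writes field 0 and zeroes the rest of
// the register; each MOVK replaces one 16-bit field and keeps the others.
uint64_t materialize(const std::array<uint16_t, 4> &G) {
  uint64_t X = G[0];                   // movz x0, #:abs_g0_nc:foo
  for (unsigned i = 1; i < 4; ++i) {   // movk x0, #:abs_g<i>...:foo, lsl #16*i
    uint64_t Mask = uint64_t{0xffff} << (16 * i);
    X = (X & ~Mask) | (uint64_t{G[i]} << (16 * i));
  }
  return X;
}
```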
* [AArch64] Add dependency from AArch64CodeGen to TransformUtils to fix -DBUILD_SHARED_LIBS=on link error after D64173/r366361 | Fangrui Song | 2019-07-18 | 1 | -1/+1
  This fixes:
    ld.lld: error: undefined symbol: llvm::findAllocaForValue(llvm::Value*, llvm::DenseMap<llvm::Value*, llvm::AllocaInst*, llvm::DenseMapInfo<llvm::Value*>, llvm::detail::DenseMapPair<llvm::Value*, llvm::AllocaInst*> >&)
    >>> referenced by AArch64StackTagging.cpp
  llvm-svn: 366396
* Speculative fix for stack-tagging.ll failure. | Evgeniy Stepanov | 2019-07-17 | 1 | -2/+2
  Depending on the evaluation order of function call arguments, the current code may insert a use before def.
  llvm-svn: 366375
* Basic MTE stack tagging instrumentation. | Evgeniy Stepanov | 2019-07-17 | 4 | -0/+351
  Summary: Use MTE intrinsics to tag stack variables in functions with sanitize_memtag attribute.
  Reviewers: pcc, vitalybuka, hctim, ostannard
  Subscribers: srhines, mgorny, javed.absar, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64173
  llvm-svn: 366361
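For readers new to MTE: the architecture keeps a 4-bit logical tag in bits [59:56] of a pointer and tags memory in 16-byte granules, so a tagged stack variable has to cover whole granules. The C++ helpers below sketch that bookkeeping under those assumptions; they are hypothetical and do not model the IRG/ADDG/STG instructions or the intrinsics the pass actually emits.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kTagShift = 56;                             // tag lives in bits [59:56]
constexpr uint64_t kTagMask = uint64_t{0xf} << kTagShift;
constexpr uint64_t kGranuleSize = 16;                          // MTE tag granule in bytes

// Replaces the logical tag of Ptr with the low 4 bits of Tag.
uint64_t withTag(uint64_t Ptr, unsigned Tag) {
  return (Ptr & ~kTagMask) | (uint64_t{Tag & 0xf} << kTagShift);
}

// Extracts the logical tag of Ptr.
unsigned tagOf(uint64_t Ptr) {
  return static_cast<unsigned>((Ptr & kTagMask) >> kTagShift);
}

// Rounds an object size up to a whole number of granules.
uint64_t roundUpToGranule(uint64_t Size) {
  return (Size + kGranuleSize - 1) & ~(kGranuleSize - 1);
}
```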
* Basic codegen for MTE stack tagging. | Evgeniy Stepanov | 2019-07-17 | 12 | -7/+357
  Implement IR intrinsics for stack tagging. Generated code is very unoptimized for now.
  Two special intrinsics, llvm.aarch64.irg.sp and llvm.aarch64.tagp, are used to implement a tagged stack frame pointer in a virtual register.
  Differential Revision: https://reviews.llvm.org/D64172
  llvm-svn: 366360
* Revert [AArch64] Add support for Transactional Memory Extension (TME) | Momchil Velikov | 2019-07-17 | 4 | -77/+12
  This reverts r366322 (git commit 4b8da3a503e434ddbc08ecf66582475765f449bc)
  llvm-svn: 366355
* [AArch64] Add support for Transactional Memory Extension (TME) | Momchil Velikov | 2019-07-17 | 4 | -12/+77
  TME is a future architecture technology, documented in
    https://developer.arm.com/architectures/cpu-architecture/a-profile/exploration-tools
    https://developer.arm.com/docs/ddi0601/a
  More about the future architectures:
    https://community.arm.com/developer/ip-products/processors/b/processors-ip-blog/posts/new-technologies-for-the-arm-a-profile-architecture
  This patch adds support for the TME instructions TSTART, TTEST, TCOMMIT, and TCANCEL and the target feature/arch extension "tme".
  It also implements TME builtin functions, defined in ACLE Q2 2019 (https://developer.arm.com/docs/101028/latest)
  Patch by Javed Absar and Momchil Velikov
  Differential Revision: https://reviews.llvm.org/D64416
  llvm-svn: 366322
* [AArch64] Implement __jcvt intrinsic from Armv8.3-A | Kyrylo Tkachov | 2019-07-16 | 1 | -1/+3
  The jcvt intrinsic defined in ACLE [1] is available when __ARM_FEATURE_JCVT is defined.
  This change introduces the AArch64 intrinsic, wires it up to the instruction and a new clang builtin function. The __ARM_FEATURE_JCVT macro is now defined when an Armv8.3-A or higher target is used. I've implemented the target detection logic in Clang so that this feature is enabled for architectures from armv8.3-a onwards (so -march=armv8.4-a also enables this, for example).
  make check-all didn't show any new failures.
  [1] https://developer.arm.com/docs/101028/latest/data-processing-intrinsics
  Differential Revision: https://reviews.llvm.org/D64495
  llvm-svn: 366197
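FJCVTZS, the instruction behind `__jcvt`, performs JavaScript's double-to-int32 conversion. As a semantic reference only (not how the intrinsic is lowered, and ignoring the flag result the instruction also produces), a portable C++ model truncates toward zero, wraps modulo 2^32 into the signed range, and maps NaN and infinities to 0. `jsToInt32` is a hypothetical name.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Models JavaScript ToInt32 semantics for a double.
int32_t jsToInt32(double D) {
  if (!std::isfinite(D))
    return 0;                                     // NaN and +/-Inf become 0
  const double TwoPow32 = 4294967296.0;
  double M = std::fmod(std::trunc(D), TwoPow32);  // exact; M is in (-2^32, 2^32)
  if (M < 0)
    M += TwoPow32;                                // now in [0, 2^32)
  if (M >= 2147483648.0)
    M -= TwoPow32;                                // now in [-2^31, 2^31)
  return static_cast<int32_t>(M);
}
```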
* Fix parameter name comments using clang-tidy. NFC. | Rui Ueyama | 2019-07-16 | 1 | -3/+3
  This patch applies clang-tidy's bugprone-argument-comment tool to LLVM, clang and lld source trees. Here is how I created this patch:
    $ git clone https://github.com/llvm/llvm-project.git
    $ cd llvm-project
    $ mkdir build
    $ cd build
    $ cmake -GNinja -DCMAKE_BUILD_TYPE=Debug \
        -DLLVM_ENABLE_PROJECTS='clang;lld;clang-tools-extra' \
        -DCMAKE_EXPORT_COMPILE_COMMANDS=On -DLLVM_ENABLE_LLD=On \
        -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ ../llvm
    $ ninja
    $ parallel clang-tidy -checks='-*,bugprone-argument-comment' \
        -config='{CheckOptions: [{key: StrictMode, value: 1}]}' -fix \
        ::: ../llvm/lib/**/*.{cpp,h} ../clang/lib/**/*.{cpp,h} ../lld/**/*.{cpp,h}
  llvm-svn: 366177
* Factor out resolveFrameOffsetReference (NFC). | Evgeniy Stepanov | 2019-07-12 | 2 | -12/+26
  Split AArch64FrameLowering::resolveFrameIndexReference in two parts:
    * Finding frame offset for the index.
    * Finding base register and offset to that register.
  The second part will be used to implement a virtual frame pointer in armv8.5 MTE stack instrumentation lowering.
  Reviewers: pcc, vitalybuka, hctim, ostannard
  Subscribers: javed.absar, kristof.beyls, hiraditya, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64171
  llvm-svn: 365958
* [AArch64][GlobalISel] Optimize compare and branch cases with G_INTTOPTR and unknown values. | Amara Emerson | 2019-07-10 | 1 | -4/+15
  Since we have distinct types for pointers and scalars, G_INTTOPTRs can sometimes obstruct attempts to find constant source values. These usually come about when we try to do some kind of null pointer check. Teaching getConstantVRegValWithLookThrough about this operation allows the CBZ/CBNZ optimization to catch more cases.
  This change also improves the case where we can't find a constant source at all. Previously we would emit a cmp, cset and tbnz for that. Now we try to just emit a cmp and conditional branch, saving an instruction.
  The cumulative code size improvement of this change plus D64354 is 5.5% geomean on arm64 CTMark -O0.
  Differential Revision: https://reviews.llvm.org/D64377
  llvm-svn: 365690
* [GlobalISel][AArch64] Use getOpcodeDef instead of findMIFromReg | Jessica Paquette | 2019-07-10 | 1 | -14/+3
  Some minor cleanup. This function in Utils does the same thing as `findMIFromReg`. It also looks through copies, which `findMIFromReg` didn't.
  Delete `findMIFromReg` and use `getOpcodeDef` instead. This only happens in `tryOptVectorDup` right now.
  Update opt-shuffle-splat to show that we can look through the copies now, too.
  Differential Revision: https://reviews.llvm.org/D64520
  llvm-svn: 365684
* [GlobalISel][AArch64][NFC] Use getDefIgnoringCopies from Utils where we can | Jessica Paquette | 2019-07-10 | 1 | -22/+5
  There are a few places where we walk over copies throughout AArch64InstructionSelector.cpp. In Utils, there's a function that does exactly this which we can use instead.
  Note that the utility function works with the case where we run into a COPY from a physical register. We've run into bugs with this a couple times, so using it should defend us from similar future bugs.
  Also update opt-fold-compare.mir to show that we still handle physical registers properly.
  Differential Revision: https://reviews.llvm.org/D64513
  llvm-svn: 365683
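The idea of walking through copies can be shown with a toy definition map. This C++ sketch is hypothetical: integers stand in for registers, and a register with no entry stands in for a physical register whose definition is not tracked. It is not the `getDefIgnoringCopies` implementation, whose exact stopping conditions are in GlobalISel's Utils.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical definition record; Src is only meaningful for "COPY".
struct Def {
  std::string Opcode;
  int Src;
};

// Follows COPY chains back to the first non-COPY definition. If a COPY reads
// a register with no recorded definition (e.g. a physical register), the walk
// stops at that COPY instead of running off the end.
const Def *defIgnoringCopies(const std::map<int, Def> &Defs, int Reg) {
  auto It = Defs.find(Reg);
  if (It == Defs.end())
    return nullptr;
  const Def *D = &It->second;
  while (D->Opcode == "COPY") {
    auto Next = Defs.find(D->Src);
    if (Next == Defs.end())
      break; // source has no tracked definition
    D = &Next->second;
  }
  return D;
}
```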
* Revert "[System Model] [TTI] Update cache and prefetch TTI interfaces" | David Greene | 2019-07-10 | 3 | -4/+28
  This broke some PPC prefetching tests.
  This reverts commit 9fdfb045ae8bb643ab0d0455dcf9ecaea3b1eb3c.
  llvm-svn: 365680
* [System Model] [TTI] Update cache and prefetch TTI interfaces | David Greene | 2019-07-10 | 3 | -28/+4
  Rework the TTI cache and software prefetching APIs to prepare for the introduction of a general system model. Changes include:
    - Marking existing interfaces const and/or override as appropriate
    - Adding comments
    - Adding BasicTTIImpl interfaces that delegate to a subtarget implementation
    - Adding a default "no information" subtarget implementation
  Only a handful of targets use these interfaces currently: AArch64, Hexagon, PPC and SystemZ. AArch64 already has a custom subtarget implementation, so its custom TTI implementation is migrated to use the new facilities in BasicTTIImpl to invoke its custom subtarget implementation. The custom TTI implementations continue to exist for the other targets with this change. They are not moved over to subtarget-based implementations.
  The end goal is to have the default subtarget implementation defer to the system model defined by the target. With this change, the default subtarget implementation essentially returns "no information" for these interfaces. None of the existing users of TTI will hit that implementation because they define their own custom TTI implementations and won't use the BasicTTIImpl implementations.
  Once system models are in place for the targets that use these interfaces, their custom TTI implementations can be removed.
  Differential Revision: https://reviews.llvm.org/D63614
  llvm-svn: 365676
* MC: AArch64: Add support for pg_hi21_nc relocation specifier. | Peter Collingbourne | 2019-07-10 | 1 | -0/+2
  Differential Revision: https://reviews.llvm.org/D64455
  llvm-svn: 365661
* Fix "result of 32-bit shift implicitly converted to 64 bits" warning. NFCI. | Simon Pilgrim | 2019-07-10 | 1 | -1/+1
  llvm-svn: 365612
* hwasan: Improve precision of checks using short granule tags. | Peter Collingbourne | 2019-07-09 | 1 | -3/+68
  A short granule is a granule of size between 1 and `TG-1` bytes. The size of a short granule is stored at the location in shadow memory where the granule's tag is normally stored, while the granule's actual tag is stored in the last byte of the granule. This means that in order to verify that a pointer tag matches a memory tag, HWASAN must check for two possibilities:
    * the pointer tag is equal to the memory tag in shadow memory, or
    * the shadow memory tag is actually a short granule size, the value being loaded is in bounds of the granule and the pointer tag is equal to the last byte of the granule.
  Pointer tags between 1 and `TG-1` are possible and are as likely as any other tag. This means that these tags in memory have two interpretations: the full tag interpretation (where the pointer tag is between 1 and `TG-1` and the last byte of the granule is ordinary data) and the short tag interpretation (where the pointer tag is stored in the granule).
  When HWASAN detects an error near a memory tag between 1 and `TG-1`, it will show both the memory tag and the last byte of the granule. Currently, it is up to the user to disambiguate the two possibilities.
  Because this functionality obsoletes the right aligned heap feature of the HWASAN memory allocator (and because we can no longer easily test it), the feature is removed.
  Also update the documentation to cover both short granule tags and outlined checks.
  Differential Revision: https://reviews.llvm.org/D63908
  llvm-svn: 365551
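The two-way check described in this commit can be written down directly. The C++ function below is a sketch under stated assumptions: a granule size `TG` of 16 bytes, an access that does not cross a granule boundary, and hypothetical parameter names; it is not HWASAN's runtime or instrumentation code.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kGranule = 16; // assumed TG

// PtrTag:    tag carried in the pointer
// ShadowTag: byte stored in shadow memory for the granule
// LastByte:  last byte of the granule (holds the real tag of a short granule)
// Offset:    where the access starts within the granule
// Size:      number of bytes accessed
bool tagCheckPasses(uint8_t PtrTag, uint8_t ShadowTag, uint8_t LastByte,
                    unsigned Offset, unsigned Size) {
  if (PtrTag == ShadowTag)
    return true;                      // full-tag interpretation matches
  if (ShadowTag == 0 || ShadowTag >= kGranule)
    return false;                     // not a short granule size
  // Short granule: only the first ShadowTag bytes are addressable, and the
  // pointer tag must match the tag stored in the granule's last byte.
  bool InBounds = Offset + Size <= ShadowTag;
  return InBounds && PtrTag == LastByte;
}
```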
* [AArch64][GlobalISel] Optimize conditional branches followed by unconditional branches | Amara Emerson | 2019-07-09 | 1 | -0/+2
  If we have an icmp->brcond->br sequence where the brcond just branches to the next block jumping over the br, while the br takes the false edge, then we can modify the conditional branch to jump to the br's target while inverting the condition of the incoming icmp. This means we can eliminate the br as an unconditional branch to the fallthrough block.
  Differential Revision: https://reviews.llvm.org/D64354
  llvm-svn: 365510
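A toy C++ model of this transformation, with hypothetical types: when the conditional branch only jumps to the fallthrough block, invert its condition, retarget it at the unconditional branch's destination, and drop the unconditional branch. Condition inversion is shown for equality and signed orderings only.

```cpp
#include <cassert>
#include <optional>
#include <string>

enum class Cond { EQ, NE, LT, GE, GT, LE };

// Logical negation of an integer comparison.
Cond invert(Cond C) {
  switch (C) {
  case Cond::EQ: return Cond::NE;
  case Cond::NE: return Cond::EQ;
  case Cond::LT: return Cond::GE;
  case Cond::GE: return Cond::LT;
  case Cond::GT: return Cond::LE;
  case Cond::LE: return Cond::GT;
  }
  return C; // unreachable
}

// Hypothetical block terminators: "brcond C, CondTarget" then optional "br UncondTarget".
struct Terminators {
  Cond C;
  std::string CondTarget;
  std::optional<std::string> UncondTarget;
};

Terminators simplify(Terminators T, const std::string &Fallthrough) {
  if (T.UncondTarget && T.CondTarget == Fallthrough) {
    T.C = invert(T.C);                 // branch on the opposite condition...
    T.CondTarget = *T.UncondTarget;    // ...straight to the br's target
    T.UncondTarget.reset();            // the br is no longer needed
  }
  return T;
}
```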
* [AArch64][GlobalISel] Use TST for comparisons when possible | Jessica Paquette | 2019-07-08 | 1 | -45/+98
  Porting over the part of `emitComparison` in AArch64ISelLowering where we use TST to represent a compare.
    - Rename `tryOptCMN` to `tryFoldIntegerCompare`, since it now also emits TSTs when possible.
    - Add a utility function for emitting a TST with register operands.
    - Rename opt-fold-cmn.mir to opt-fold-compare.mir, since it now also tests the TST fold as well.
  Differential Revision: https://reviews.llvm.org/D64371
  llvm-svn: 365404
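Background on why a compare of `(a & b)` against zero can become a TST (general A64 flag semantics, not taken from the commit): TST is an ANDS whose result is discarded, so it sets N and Z from `a & b` and clears C and V, while `CMP r, #0` sets the same N, Z and V but always sets C. The C++ sketch below models both; `NZCV`, `tstFlags` and `cmpZeroFlags` are hypothetical names. It suggests the replacement is only safe for conditions that do not read C, such as EQ/NE and the signed comparisons.

```cpp
#include <cassert>
#include <cstdint>

struct NZCV {
  bool N, Z, C, V;
};

// Flags from TST A, B (ANDS with the result discarded): C and V are cleared.
NZCV tstFlags(uint64_t A, uint64_t B) {
  uint64_t R = A & B;
  return {(R >> 63) != 0, R == 0, false, false};
}

// Flags from CMP R, #0 (SUBS XZR, R, #0): subtracting zero never borrows,
// so C is set, and it never overflows, so V is clear.
NZCV cmpZeroFlags(uint64_t R) {
  return {(R >> 63) != 0, R == 0, true, false};
}
```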
* GlobalISel: Convert some build functions to using SrcOp/DstOp | Matt Arsenault | 2019-07-08 | 1 | -8/+6
  llvm-svn: 365343
* Fix precedence in assert from r364961 | Jessica Paquette | 2019-07-03 | 1 | -1/+2
  Precedence was wrong in an assert added in r364961. Add braces around the assertion condition to make it right.
  See: https://reviews.llvm.org/D64084
  llvm-svn: 365069
* [GlobalISel][AArch64] Use getConstantVRegValWithLookThrough for selectArithImmed | Jessica Paquette | 2019-07-03 | 1 | -6/+4
  Instead of only checking whether we have a G_CONSTANT, look through G_TRUNCs, G_SEXTs, and G_ZEXTs.
  This gives an average ~1.3% code size improvement on CINT2000 at -O3.
  Differential Revision: https://reviews.llvm.org/D64108
  llvm-svn: 365063
* [Codegen][X86][AArch64][ARM][PowerPC] Inc-of-add vs sub-of-not (PR42457) | Roman Lebedev | 2019-07-03 | 2 | -0/+7
  Summary: This is the backend part of PR42457 (https://bugs.llvm.org/show_bug.cgi?id=42457).
  In the middle-end, we'd want to prefer the form with two adds (D63992), but as this diff shows, not every target will prefer that pattern. Out of the 4 targets for which I added tests, all seem to be OK with inc-of-add for scalars, but only X86 prefers that same pattern for vectors.
  Here I'm adding a new TLI hook, always defaulting to inc-of-add, but adding AArch64, ARM and PowerPC overrides to prefer inc-of-add only for scalars.
  Reviewers: spatel, RKSimon, efriedma, t.p.northover, hfinkel
  Reviewed By: efriedma
  Subscribers: nemanjai, javed.absar, kristof.beyls, kbarton, jsji, llvm-commits
  Tags: #llvm
  Differential Revision: https://reviews.llvm.org/D64090
  llvm-svn: 365010
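The two patterns are interchangeable because `~Y == -Y - 1` in two's complement, so `X - ~Y == X + Y + 1`. The C++ sketch below checks this exhaustively for wrapping 8-bit arithmetic; the function names are illustrative only and say nothing about which form a given target prefers.

```cpp
#include <cassert>
#include <cstdint>

// inc-of-add: (X + Y) + 1, wrapping at 8 bits.
uint8_t incOfAdd(uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>(X + Y + 1);
}

// sub-of-not: X - ~Y, wrapping at 8 bits.
uint8_t subOfNot(uint8_t X, uint8_t Y) {
  return static_cast<uint8_t>(X - static_cast<uint8_t>(~Y));
}

// Returns true if both forms agree on every pair of 8-bit inputs.
bool formsAgreeOnAll8BitInputs() {
  for (unsigned X = 0; X < 256; ++X)
    for (unsigned Y = 0; Y < 256; ++Y)
      if (incOfAdd(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)) !=
          subOfNot(static_cast<uint8_t>(X), static_cast<uint8_t>(Y)))
        return false;
  return true;
}
```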
* [AArch64][GlobalISel] Overhaul legalization & isel of shifts to select immediate forms. | Amara Emerson | 2019-07-03 | 6 | -20/+234
  There are two main issues preventing us from generating immediate form shifts:
  1) We have partial SelectionDAG imported support for G_ASHR and G_LSHR shift immediate forms, but they currently don't work because the amount type is expected to be an s64 constant, but we only legalize them to have homogeneous types.
     To deal with this, first we introduce a custom legalizer to *only* custom legalize s32 shifts which have a constant operand into a s64.
     There is also an additional artifact combiner to fold zexts(g_constant) to a larger G_CONSTANT if it's legal, a counterpart to the anyext version committed in an earlier patch.
  2) For G_SHL the importer can't cope with the pattern. For this I introduced an early selection phase in the arm64 selector to select these forms manually before the tablegen selector pessimizes it to a register-register variant.
  Differential Revision: https://reviews.llvm.org/D63910
  llvm-svn: 364994
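For context (a general A64 fact, not a claim about why the importer fails): the immediate shifts are aliases of bitfield-move instructions, and `LSL Xd, Xn, #s` in particular is `UBFM Xd, Xn, #((64 - s) % 64), #(63 - s)`. The C++ sketch below computes that mapping for 64-bit registers and checks it against a simplified UBFM model; all helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Returns {immr, imms} for LSL #Shift (Shift in [0, 63]) on a 64-bit register.
std::pair<unsigned, unsigned> lslToUbfm64(unsigned Shift) {
  unsigned Immr = (64 - Shift) % 64;
  unsigned Imms = 63 - Shift;
  return {Immr, Imms};
}

// Mask with the low Width bits set, Width in [1, 64].
uint64_t lowMask(unsigned Width) {
  return Width == 64 ? ~uint64_t{0} : (uint64_t{1} << Width) - 1;
}

// Simplified 64-bit UBFM:
//  - imms >= immr: extract bits [imms:immr] to the bottom (UBFX/LSR form);
//  - imms <  immr: take bits [imms:0] and place them at bit (64 - immr)
//    (UBFIZ/LSL form).
uint64_t ubfm64(uint64_t Src, unsigned Immr, unsigned Imms) {
  if (Imms >= Immr)
    return (Src >> Immr) & lowMask(Imms - Immr + 1);
  return (Src & lowMask(Imms + 1)) << (64 - Immr);
}
```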
* [AArch64][GlobalISel] Teach tryOptSelect to handle G_ICMP | Jessica Paquette | 2019-07-02 | 1 | -106/+139
  This teaches `tryOptSelect` to handle folding G_ICMP, and removes the requirement that the G_SELECT we're dealing with is floating point. Some refactoring to make this work nicely as well:
    - Factor out the scalar case from the selection code for G_ICMP into `emitIntegerCompare`.
    - Make `tryOptCMN` return a MachineInstr* instead of a bool.
    - Make `tryOptCMN` not modify the instruction being selected.
    - Factor out the CMN emission into `emitCMN` for readability.
  By doing it this way, we can get all of the compare selection optimizations in select emission.
  Differential Revision: https://reviews.llvm.org/D64084
  llvm-svn: 364961
* AArch64/GlobalISel: Fix trying to select invalid MIR | Matt Arsenault | 2019-07-01 | 1 | -18/+15
  Physical registers are not allowed to be a phi operand.
  llvm-svn: 364810
* [AArch64 GlobalISel] Cleanup CallLowering. NFCI | Diana Picus | 2019-06-27 | 2 | -49/+12
  Now that lowerCall and lowerFormalArgs have been refactored, we can simplify splitToValueTypes.
  Differential Revision: https://reviews.llvm.org/D63552
  llvm-svn: 364513
* [GlobalISel] Accept multiple vregs for lowerCall's args | Diana Picus | 2019-06-27 | 1 | -2/+1
  Change the interface of CallLowering::lowerCall to accept several virtual registers for each argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549.
  With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register.
  ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions.
  NFCI for AMDGPU, Mips and X86.
  Differential Revision: https://reviews.llvm.org/D63551
  llvm-svn: 364512
* [GlobalISel] Accept multiple vregs for lowerCall's result | Diana Picus | 2019-06-27 | 1 | -9/+3
  Change the interface of CallLowering::lowerCall to accept several virtual registers for the call result, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660 and lowerFormalArguments in D63549.
  With this change, we no longer pack the virtual registers generated for aggregates into one big lump before delegating to the target. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register.
  ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions.
  NFCI for AMDGPU, Mips and X86.
  Differential Revision: https://reviews.llvm.org/D63550
  llvm-svn: 364511
* [GlobalISel] Accept multiple vregs in lowerFormalArgs | Diana Picus | 2019-06-27 | 2 | -20/+25
  Change the interface of CallLowering::lowerFormalArguments to accept several virtual registers for each formal argument, instead of just one. This is a follow-up to D46018. CallLowering::lowerReturn was similarly refactored in D49660. lowerCall will be refactored in the same way in follow-up patches.
  With this change, we forward the virtual registers generated for aggregates to CallLowering. Therefore, the target can decide itself whether it wants to handle them as separate pieces or use one big register. We also copy the pack/unpackRegs helpers to CallLowering to facilitate this.
  ARM and AArch64 have been updated to use the passed in virtual registers directly, which means we no longer need to generate so many merge/extract instructions.
  AArch64 seems to have had a bug when lowering e.g. [1 x i8*], which was put into a s64 instead of a p0. Added a test-case which illustrates the problem more clearly (it crashes without this patch) and fixed the existing test-case to expect p0.
  AMDGPU has been updated to unpack into the virtual registers for kernels. I think the other code paths fall back for aggregates, so this should be NFC.
  Mips doesn't support aggregates yet, so it's also NFC.
  x86 seems to have code for dealing with aggregates, but I couldn't find the tests for it, so I just added a fallback to DAGISel if we get more than one virtual register for an argument.
  Differential Revision: https://reviews.llvm.org/D63549
  llvm-svn: 364510
* [GlobalISel] Allow multiple VRegs in ArgInfo. NFC | Diana Picus | 2019-06-27 | 1 | -7/+10
  Allow CallLowering::ArgInfo to contain more than one virtual register. This is useful when passes split aggregates into several virtual registers, but need to also provide information about the original type to the call lowering. Used in follow-up patches.
  Differential Revision: https://reviews.llvm.org/D63548
  llvm-svn: 364509
* Don't look for the TargetFrameLowering in the implementation | Matt Arsenault | 2019-06-25 | 1 | -2/+1
  The same oddity was apparently copy-pasted between multiple targets.
  llvm-svn: 364349
* GlobalISel: Remove unsigned variant of SrcOp | Matt Arsenault | 2019-06-24 | 2 | -118/+118
  Force using Register. One downside is the generated register enums require explicit conversion.
  llvm-svn: 364194
OpenPOWER on IntegriCloud