summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AArch64
Commit message (Collapse)AuthorAgeFilesLines
* [SelectionDAG] Allow targets to specify legality of extloads' resultAhmed Bougacha2015-01-081-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | type (in addition to the memory type). The *LoadExt* legalization handling used to only have one type, the memory type. This forced users to assume that as long as the extload for the memory type was declared legal, and the result type was legal, the whole extload was legal. However, this isn't always the case. For instance, on X86, with AVX, this is legal: v4i32 load, zext from v4i8 but this isn't: v4i64 load, zext from v4i8 Whereas v4i64 is (arguably) legal, even without AVX2. Note that the same thing was done a while ago for truncstores (r46140), but I assume no one needed it yet for extloads, so here we go. Calls to getLoadExtAction were changed to add the value type, found manually in the surrounding code. Calls to setLoadExtAction were mechanically changed, by wrapping the call in a loop, to match previous behavior. The loop iterates over the MVT subrange corresponding to the memory type (FP vectors, etc...). I also pulled neighboring setTruncStoreActions into some of the loops; those shouldn't make a difference, as the additional types are illegal. (e.g., i128->i1 truncstores on PPC.) No functional change intended. Differential Revision: http://reviews.llvm.org/D6532 llvm-svn: 225421
* [CodeGen] Use MVT iterator_ranges in legality loops. NFC intended.Ahmed Bougacha2015-01-071-20/+15
| | | | | | | | A few loops do trickier things than just iterating on an MVT subset, so I'll leave them be for now. Follow-up of r225387. llvm-svn: 225392
* Revert r225165 and r225169Karthik Bhat2015-01-071-39/+0
| | | | | | | | Even thouh gcc produces simialr instructions as Owen pointed out the two patterns aren’t equivalent in the case where the original subtraction could have caused an overflow. Reverting the same. llvm-svn: 225341
* Revert r225048: It broke ObjC on AArch64.Lang Hames2015-01-062-32/+68
| | | | | | I've filed http://llvm.org/PR22100 to track this issue. llvm-svn: 225228
* [AArch64] Improve codegen of store lane instructions by avoiding GPR usage.Ahmed Bougacha2015-01-051-2/+2
| | | | | | | | | | | | | | | | | | | | We used to generate code similar to: umov.b w8, v0[2] strb w8, [x0, x1] because the STR*ro* patterns were preferred to ST1*. Instead, we can avoid going through GPRs, and generate: add x8, x0, x1 st1.b { v0 }[2], [x8] This patch increases the ST1* AddedComplexity to achieve that. rdar://16372710 Differential Revision: http://reviews.llvm.org/D6202 llvm-svn: 225183
* [AArch64] Improve codegen of store lane 0 instructions by directly storing ↵Ahmed Bougacha2015-01-051-0/+27
| | | | | | | | | | | | | | | | | | | | | | | the subregister. For 0-lane stores, we used to generate code similar to: fmov w8, s0 str w8, [x0, x1, lsl #2] instead of: str s0, [x0, x1, lsl #2] To correct that: for store lane 0 patterns, directly match to STR <subreg>0. Byte-sized instructions don't have the special case for a 0 index, because FPR8s are defined to have untyped content. rdar://16372710 Differential Revision: http://reviews.llvm.org/D6772 llvm-svn: 225181
* Select lower fsub,fabs pattern to fabd on AArch64Karthik Bhat2015-01-051-0/+12
| | | | | | | | | | | | This patch lowers patterns such as- fsub v0.4s, v0.4s, v1.4s fabs v0.4s, v0.4s to fabd v0.4s, v0.4s, v1.4s on AArch64. Review: http://reviews.llvm.org/D6791 llvm-svn: 225169
* Select lower sub,abs pattern to sabd on AArch64Karthik Bhat2015-01-051-0/+27
| | | | | | | | | | | | This patch lowers patterns such as- sub v0.4s, v0.4s, v1.4s abs v0.4s, v0.4s to sabd v0.4s, v0.4s, v1.4s on AArch64. Review: http://reviews.llvm.org/D6781 llvm-svn: 225165
* Replace several 'assert(false' with 'llvm_unreachable' or fold a condition ↵Craig Topper2015-01-053-5/+4
| | | | | | into the assert. llvm-svn: 225160
* ARM: permit tail calls to weak externals on COFFSaleem Abdulrasool2015-01-031-1/+3
| | | | | | | | | | | Weak externals are resolved statically, so we can actually generate the tail call on PE/COFF targets without breaking the requirements. It is questionable whether we want to propagate the current behaviour for MachO as the requirements are part of the ARM ELF specifications, and it seems that prior to the SVN r215890, we would have tail'ed the call. For now, be conservative and only permit it on PE/COFF where the call will always be fully resolved. llvm-svn: 225119
* Minor cleanup to all the switches after MatchInstructionImpl in all the ↵Craig Topper2015-01-031-1/+0
| | | | | | | | AsmParsers. Make sure they all have llvm_unreachable on the default path out of the switch. Remove unnecessary "default: break". Remove a 'return' after unreachable. Fix some indentation. llvm-svn: 225114
* Add r224985 back with a fix.Rafael Espindola2014-12-312-68/+32
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The issues was that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding to put a L symbol in the symbol table or not. Original message: Remove doesSectionRequireSymbols. In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 225048
* Revert "Remove doesSectionRequireSymbols."Rafael Espindola2014-12-312-23/+63
| | | | | | | | This reverts commit r224985. I am investigating why it made an Apple bot unhappy. llvm-svn: 225044
* Remove doesSectionRequireSymbols.Rafael Espindola2014-12-302-63/+23
| | | | | | | | | | | | | | | | | | | | | | | | | | | In an assembly expression like bar: .long L0 + 1 the intended semantics is that bar will contain a pointer one byte past L0. In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next. The solution used in ELF to use relocation with symbols if there is a non-zero addend. In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol. This patch implements the non-zero addend logic for MachO too. llvm-svn: 224985
* Lower multiply-negate operation to mneg on AArch64Karthik Bhat2014-12-221-0/+4
| | | | | | | | | | | This patch pattern matches code such as- neg w8, w8 mul w8, w9, w8 to mneg w8, w8, w9 Review: http://reviews.llvm.org/D6754 llvm-svn: 224706
* ARM/AArch64: Attach the FrameSetup MIFlag to CFI instructions.Adrian Prantl2014-12-161-6/+12
| | | | | | | | | | Debug info marks the first instruction without the FrameSetup flag as being the end of the function prologue. Any CFI instructions in the middle of the function prologue would cause debug info to end the prologue too early and worse, attach the line number of the CFI instruction, which incidentally is often 0. llvm-svn: 224294
* Silence more static analyzer warnings.Michael Ilseman2014-12-152-1/+4
| | | | | | | | Add in definedness checks for shift operators, null checks when pointers are assumed by the code to be non-null, and explicit unreachables. llvm-svn: 224255
* Enable MachineVerifier in debug mode for X86, ARM, AArch64, Mips.Matthias Braun2014-12-111-5/+5
| | | | llvm-svn: 224075
* [CodeGen] Add print and verify pass after each MachineFunctionPass by defaultMatthias Braun2014-12-111-17/+13
| | | | | | | | | | | | | | | | | | | Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. This is the 2nd attempt at this after realizing that PassManager::add() may actually delete the pass. llvm-svn: 224059
* This reverts commit r224043 and r224042.Rafael Espindola2014-12-111-8/+12
| | | | | | check-llvm was failing. llvm-svn: 224045
* Enable machineverifier in debug mode for X86, ARM, AArch64, MipsMatthias Braun2014-12-111-5/+5
| | | | llvm-svn: 224043
* [CodeGen] Add print and verify pass after each MachineFunctionPass by defaultMatthias Braun2014-12-111-17/+13
| | | | | | | | | | | | | | | | Previously print+verify passes were added in a very unsystematic way, which is annoying when debugging as you miss intermediate steps and allows bugs to stay unnotice when no verification is performed. To make this change practical I added the possibility to explicitely disable verification. I used this option on all places where no verification was performed previously (because alot of places actually don't pass the MachineVerifier). In the long term these problems should be fixed properly and verification enabled after each pass. I'll enable some more verification in subsequent commits. llvm-svn: 224042
* [AArch64] MachO large code-model: Materialize FP constants in code.Juergen Ributzka2014-12-103-0/+43
| | | | | | | | | | | | | | | In the large code model we have to first get the address of the GOT entry, load the address of the constant, and then load the constant itself. To avoid these loads and the GOT entry alltogether this commit changes the way how FP constants are materialized in the large code model. The constats are now materialized in a GPR and then bitconverted/moved into the FPR. Reviewed by Tim Northover Fixes rdar://problem/16572564. llvm-svn: 223941
* [FastISel][AArch64] Fix a missing nullptr check in 'computeAddress'.Juergen Ributzka2014-12-091-1/+1
| | | | | | | | | The load/store value type is currently not available when lowering the memcpy intrinsic. Add the missing nullptr check to support this in 'computeAddress'. Fixes rdar://problem/19178947. llvm-svn: 223818
* AArch64: treat HFAs containing "half" types as blocks too.Tim Northover2014-12-081-0/+5
| | | | llvm-svn: 223669
* Make the DenseMap bucket type configurable and use a smaller bucket for ↵Benjamin Kramer2014-12-061-1/+1
| | | | | | | | | | | | | | DenseSet. DenseSet used to be implemented as DenseMap<Key, char>, which usually doubled the memory footprint of the map. Now we use a compressed set so the second element uses no memory at all. This required some surgery on DenseMap as all accesses to the bucket now have to go through methods; this should have no impact on the behavior of DenseMap though. The new default bucket type for DenseMap is a slightly extended std::pair as we expose it through DenseMap's iterator and don't want to break any existing users. llvm-svn: 223588
* AArch64: use explicit MVT::i64 when creating EXTRACT_SUBVECTOR nodes.Tim Northover2014-12-061-10/+12
| | | | | | | | | All our patterns use MVT::i64, but the ISelLowering nodes were inconsistent in their choice. No functional change. llvm-svn: 223551
* [AArch64] Combining Load and IntToFp should check for neon availabilityWeiming Zhao2014-12-041-3/+4
| | | | llvm-svn: 223382
* Allow target to specify prefix for labelsMatt Arsenault2014-12-041-0/+2
| | | | | | | | Use the MCAsmInfo instead of the DataLayout, and allow specifying a custom prefix for labels specifically. HSAIL requires that labels begin with @, but global symbols with &. llvm-svn: 223323
* AArch64: fix wrong-endian parameter passing.Tim Northover2014-12-031-2/+4
| | | | | | | The blocked arguments code didn't take account of the hacks needed to support it. llvm-svn: 223247
* AArch64: strengthen Darwin ABI alignment assumptionsTim Northover2014-12-021-1/+1
| | | | | | | | | | A global variable without an explicit alignment specified should be assumed to be ABI-aligned according to its type, like on other platforms. This allows us to use better memory operations when accessing it. rdar://18533701 llvm-svn: 223180
* AArch64: don't be too greedy when folding :lo12: accesses into mem ops.Tim Northover2014-12-021-1/+22
| | | | | | | | | | | | | | | This frequently leads to cases like: ldr xD, [xN, :lo12:var] add xA, xN, :lo12:var ldr xD, [xA, #8] where the ADD would have been needed anyway, and the two distinct addressing modes can prevent the formation of an ldp. Because of how we handle ADRP (aggressively forming an ADRP/ADD pseudo-inst at ISel time), this pattern also results in duplicated ADRP instructions (one on its own to cover the ldr, and one combined with the add). llvm-svn: 223172
* [AArch64][Stackmaps] Optimize stackmap shadows on AArch64.Lang Hames2014-12-021-1/+16
| | | | | | | | | | Reduce the number of nops emitted for stackmap shadows on AArch64 by counting non-stackmap instructions up to the next branch target towards the requested shadow. <rdar://problem/14959522> llvm-svn: 223156
* AArch64: make register block rules apply to vector types too.Tim Northover2014-12-021-3/+3
| | | | | | | | The blocking code originated in ARM, which is more aggressive about casting types to a canonical representative before doing anything else, so I missed out most vector HFAs and broke the ABI. This should fix it. llvm-svn: 223126
* [AArch64] Don't combine "select (setcc i1 LHS, RHS), vL, vR".Ahmed Bougacha2014-12-011-0/+6
| | | | | | | | | | | | | r208210 introduced an optimization that improves the vector select codegen by doing the setcc on vectors directly. This is a problem they the setcc operands are i1s, because the optimization would create vectors of i1, which aren't legal. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6308 llvm-svn: 223075
* [AArch64] Fix v2i8->i16 bitcast legalization.Ahmed Bougacha2014-12-011-5/+4
| | | | | | | | | | | | | | | r213378 improved f16 bitcasts, so that they go directly through subregs, instead of through the stack. That code now causes an assertion failure for bitcasts from other 16-bits types (most importantly v2i8). Correct that by doing the custom lowering for i16 bitcasts only when the input is an f16. Part of PR21549. Differential Revision: http://reviews.llvm.org/D6307 llvm-svn: 223074
* Fix capitalization. NFC.Akira Hatanaka2014-12-011-2/+2
| | | | llvm-svn: 222988
* Add missing 'override' keyword.Craig Topper2014-11-281-1/+1
| | | | llvm-svn: 222911
* Stop using ArrayRef of a const type.Tim Northover2014-11-271-1/+1
| | | | | | I *think* this is what the GCC bots are complaining about. llvm-svn: 222905
* AArch64: treat [N x Ty] as a block during procedure calls.Tim Northover2014-11-275-0/+153
| | | | | | | | | | | | | | The AAPCS treats small structs and homogeneous floating (or vector) aggregates specially, and guarantees they either get passed as a contiguous block of registers, or prevent any future use of those registers and get passed on the stack. This concept can fit quite neatly into LLVM's own type system, mapping an HFA to [N x float] and so on, and small structs to [N x i64]. Doing so allows front-ends to emit AAPCS compliant code without having to duplicate the register counting logic. llvm-svn: 222903
* Update AArch64 ELF relocations to ABI 1.0Will Newton2014-11-261-1/+1
| | | | | | | | | | | | | | | | | | | | This mostly entails adding relocations, however there are a couple of changes to existing relocations: 1. R_AARCH64_NONE is defined to be zero rather than 256 R_AARCH64_NONE has been defined to be zero for a long time elsewhere e.g. binutils and glibc since the submission of the AArch64 port in 2012 so this is required for compatibility. 2. R_AARCH64_TLSDESC_ADR_PAGE renamed to R_AARCH64_TLSDESC_ADR_PAGE21 I don't think there is any way for relocation names to leak out of LLVM so this should not break anything. Tested with check-all with no regressions. llvm-svn: 222821
* Replace neverHasSideEffects=1 with hasSideEffects=0 in all .td files.Craig Topper2014-11-262-8/+8
| | | | llvm-svn: 222801
* [FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching.Juergen Ributzka2014-11-251-19/+20
| | | | | | | | | | The pattern matching failed to recognize all instances of "-1", because when comparing against "-1" we didn't use an APInt of the same bitwidth. This commit fixes this and also adds inverse versions of the conditon to catch more cases. llvm-svn: 222722
* [AArch64] Fix clobber computation in A57LoadBalancing pass.Chad Rosier2014-11-241-1/+7
| | | | | | | Extremely difficult to reproduce, so no test case included. PR21637 llvm-svn: 222677
* DAGCombiner: Allow the DAGCombiner to combine multiple FDIVs with the same ↵Hao Liu2014-11-212-0/+7
| | | | | | | | | | | | divisor info FMULs by the reciprocal. E.g., ( a / D; b / D ) -> ( recip = 1.0 / D; a * recip; b * recip) A hook is added to allow the target to control whether it needs to do such combine. Reviewed in http://reviews.llvm.org/D6334 llvm-svn: 222510
* Fix more instances of -Wsentinel on Windows with s/NULL/nullptr/Reid Kleckner2014-11-201-1/+1
| | | | | | Follow up to r221940, where I must not have caught em all. NFC llvm-svn: 222481
* Add out of line virtual destructors to all LLVMTargetMachine subclassesReid Kleckner2014-11-202-0/+4
| | | | | | | | | | | | | | | | | These recently all grew a unique_ptr<TargetLoweringObjectFile> member in r221878. When anyone calls a virtual method of a class, clang-cl requires all virtual methods to be semantically valid. This includes the implicit virtual destructor, which triggers instantiation of the unique_ptr destructor, which fails because the type being deleted is incomplete. This is just part of the ongoing saga of PR20337, which is affecting Blink as well. Because the MSVC ABI doesn't have key functions, we end up referencing the vtable and implicit destructor on any virtual call through a class. We don't actually end up emitting the dtor, so it'd be good if we could avoid this unneeded type completion work. llvm-svn: 222480
* Update SetVector to rely on the underlying set's insert to return a ↵David Blaikie2014-11-191-1/+1
| | | | | | | | | | | | | pair<iterator, bool> This is to be consistent with StringSet and ultimately with the standard library's associative container insert function. This lead to updating SmallSet::insert to return pair<iterator, bool>, and then to update SmallPtrSet::insert to return pair<iterator, bool>, and then to update all the existing users of those functions... llvm-svn: 222334
* [AArch64] Disable useAA for Cortex-A57.Hao Liu2014-11-191-1/+1
| | | | | | | | Using AA during CodeGen is very useful for in-order cores. It is less useful for ooo cores. Also I find enabling useAA for Cortex-A57 may generate worse code for some test cases. If useAA in codegen is improved and benefical for ooo cores, we can enable it again. llvm-svn: 222333
* [AArch64] Enable SeparateConstOffsetFromGEP, EarlyCSE and LICM passes on ↵Hao Liu2014-11-191-0/+18
| | | | | | | | | | | | AArch64 backend. SeparateConstOffsetFromGEP can gives more optimizaiton opportunities related to GEPs, which benefits EarlyCSE and LICM. By enabling these passes we can have better address calculations and generate a better addressing mode. Some SPEC 2006 benchmarks (astar, gobmk, namd) have obvious improvements on Cortex-A57. Reviewed in http://reviews.llvm.org/D5864. llvm-svn: 222331
OpenPOWER on IntegriCloud