path: root/llvm/lib/Target/AArch64
Commit message | Author | Age | Files | Lines
...
* Remove StringMap::GetOrCreateValue in favor of StringMap::insert | David Blaikie | 2014-11-19 | 1 | -1/+1
  Having two ways to do this doesn't seem terribly helpful, and consistently using the insert version (which we already have) seems like it'll make the code easier to understand for anyone working with standard data structures. (I also updated many references to the Entry's key and value to use first() and second instead of getKey{Data,Length,} and get/setValue - for similar consistency.)
  Also removes the GetOrCreateValue functions so there's less surface area to StringMap to fix/improve/change/accommodate move semantics, etc.
  llvm-svn: 222319
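The "insert version" mentioned above follows the same idiom as the standard associative containers. The following is a minimal sketch of that idiom using `std::unordered_map` as a stand-in; it does not use LLVM's `StringMap`, and the helper name `getOrCreate` is hypothetical.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical helper (not LLVM API): return the counter stored under `key`,
// creating it with the value 0 when the key is not present yet.
int &getOrCreate(std::unordered_map<std::string, int> &table,
                 const std::string &key) {
  // insert() returns {iterator, inserted}. When the key already exists the
  // map is left unchanged and the iterator points at the existing entry.
  auto result = table.insert({key, 0});
  return result.first->second;
}

// Exercise the helper: the second lookup must see the first write.
void checkGetOrCreate() {
  std::unordered_map<std::string, int> table;
  getOrCreate(table, "x") = 7;
  assert(getOrCreate(table, "x") == 7);
  assert(table.size() == 1);
}
```

Because `insert` already covers the "look up or create" case, a separate get-or-create entry point adds no capability, which appears to be the point the commit message makes.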
* [AArch64] Custom lowering of CTPOP to SIMD should check for NEON availability | Weiming Zhao | 2014-11-19 | 1 | -0/+3
  llvm-svn: 222292
* [FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmetic shift-right for booleans (i1). | Chad Rosier | 2014-11-18 | 1 | -2/+3
  Arithmetic shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact.
  llvm-svn: 222272
* [FastISel][AArch64] Also allow folding of sign-/zero-extend and logical shift-right for booleans (i1). | Chad Rosier | 2014-11-18 | 1 | -2/+3
  Logical shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact.
  llvm-svn: 222270
* [FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for "zero" shifts." | Juergen Ributzka | 2014-11-18 | 1 | -17/+26
  Shifts also perform sign-/zero-extends to larger types, which requires us to emit an integer extend instead of a simple COPY.
  Related to PR21594.
  llvm-svn: 222257
* [AArch64] Don't optimize all compare instructions. | Juergen Ributzka | 2014-11-18 | 1 | -26/+51
  "optimizeCompareInstr" converts compares (cmp/cmn) into plain sub/add instructions when the flags are not used anymore. This conversion is valid for most instructions, but not all. Some instructions that don't set the flags (e.g. sub with immediate) can set the SP, whereas the flag setting version uses the same encoding for the "zero" register.
  Update the code to also check for the return register before performing the optimization to make sure that a cmp doesn't suddenly turn into a sub that sets the stack pointer.
  I don't have a test case for this, because it isn't easy to trigger.
  llvm-svn: 222255
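The encoding overlap described above can be modelled in a few lines. This is a sketch under stated assumptions, not the code from the commit: the enum, `decodeDest`, and `safeToDropFlags` are hypothetical names, and the model only covers the add/sub-immediate case the message mentions, where destination field 31 names the zero register in the flag-setting form and the stack pointer in the plain form.

```cpp
#include <cassert>

// In the A64 add/sub (immediate) encodings, destination field 31 is special.
constexpr unsigned kReg31 = 31;

enum class DestKind { Ordinary, ZeroRegister, StackPointer };

// Hypothetical decoder for the destination field of add/sub (immediate).
DestKind decodeDest(unsigned rd, bool setsFlags) {
  if (rd != kReg31)
    return DestKind::Ordinary;
  // subs/adds write to the zero register, sub/add write to SP.
  return setsFlags ? DestKind::ZeroRegister : DestKind::StackPointer;
}

// Dropping the flag-setting bit is only safe when the plain form would not
// end up writing the stack pointer.
bool safeToDropFlags(unsigned rd) {
  return decodeDest(rd, /*setsFlags=*/false) != DestKind::StackPointer;
}

void checkSafeToDropFlags() {
  assert(safeToDropFlags(0));       // ordinary destination register
  assert(!safeToDropFlags(kReg31)); // a cmp would turn into a write to SP
}
```

A `cmp` is a flag-setting subtract whose result is discarded into the zero register, so it always has field 31 as its destination; that is why the conversion needs the extra check.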
* [FastISel][AArch64] Fix shift-immediate emission for "zero" shifts. | Juergen Ributzka | 2014-11-18 | 1 | -6/+33
  This change emits a COPY for a shift-immediate with a "zero" shift value. This fixes PR21594 where we emitted a shift instruction with an incorrect immediate operand.
  llvm-svn: 222247
* We can get the TLOF from the TargetMachine - so constructor no longer requires TargetLoweringObjectFile to be passed. | Aditya Nandakumar | 2014-11-13 | 1 | -1/+1
  llvm-svn: 221926
* [FastISel][AArch64] Don't bail during simple GEP instruction selection. | Juergen Ributzka | 2014-11-13 | 1 | -0/+23
  The generic FastISel code would bail, because it can't emit a sign-extend for AArch64. This copies the code over and uses AArch64 specific emit functions.
  This is not ideal and 'computeAddress' should handle this, so it can fold the address computation into the memory operation.
  I plan to clean up 'computeAddress' anyways, so I will add that in a future commit.
  Related to rdar://problem/18962471.
  llvm-svn: 221923
* This patch changes the ownership of TLOF from TargetLoweringBase to TargetMachine so that different subtargets could share the TLOF effectively | Aditya Nandakumar | 2014-11-13 | 3 | -10/+18
  llvm-svn: 221878
* [FastISel][AArch64] Optimize select when one of the operands is a 'true' or 'false' value. | Juergen Ributzka | 2014-11-13 | 1 | -0/+61
  Optimize selects of i1 in the presence of 'true' and 'false' operands to simple logic operations.
  This fixes rdar://problem/18960150.
  llvm-svn: 221848
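The rewrite of an i1 select into logic operations rests on four boolean identities. The sketch below checks them exhaustively; the function names are hypothetical, and the exact instructions FastISel emits for each case are not stated in the commit message, so treat the mapping as an assumption.

```cpp
#include <cassert>

// Reference semantics of "select c, t, f" on booleans.
bool selectRef(bool c, bool t, bool f) { return c ? t : f; }

// Candidate replacements when one operand is a constant.
bool selectTrueThen(bool c, bool f) { return c || f; }   // select c, true, f
bool selectFalseThen(bool c, bool f) { return !c && f; } // select c, false, f
bool selectTrueElse(bool c, bool t) { return !c || t; }  // select c, t, true
bool selectFalseElse(bool c, bool t) { return c && t; }  // select c, t, false

// Compare every replacement against the reference for all inputs.
void checkSelectIdentities() {
  for (int c = 0; c <= 1; ++c) {
    for (int x = 0; x <= 1; ++x) {
      assert(selectTrueThen(c, x) == selectRef(c, true, x));
      assert(selectFalseThen(c, x) == selectRef(c, false, x));
      assert(selectTrueElse(c, x) == selectRef(c, x, true));
      assert(selectFalseElse(c, x) == selectRef(c, x, false));
    }
  }
}
```

Each case needs at most one logical operation plus an optional inversion of the condition, which is cheaper than materializing the constant and emitting a conditional select.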
* [FastISel][AArch64] Fold the cmp into the select when possible. | Juergen Ributzka | 2014-11-13 | 1 | -0/+54
  This folds the compare emission into the select emission when possible, so we can directly use the flags and don't have to emit a separate compare.
  Related to rdar://problem/18960150.
  llvm-svn: 221847
* [FastISel][AArch64] Extend 'select' lowering to support also i1 to i16. | Juergen Ributzka | 2014-11-13 | 1 | -34/+46
  Related to rdar://problem/18960150.
  llvm-svn: 221846
* Pass an ArrayRef to MCDisassembler::getInstruction. | Rafael Espindola | 2014-11-12 | 2 | -6/+3
  With this patch MCDisassembler::getInstruction takes an ArrayRef<uint8_t> instead of a MemoryObject.
  Even on X86 there is a maximum size an instruction can have. Given that, it seems way simpler and more efficient to just pass an ArrayRef to the disassembler instead of a MemoryObject and have it do a virtual call every time it wants some extra bytes.
  llvm-svn: 221751
* [FastISel][AArch64] Add support for fabs intrinsic. | Juergen Ributzka | 2014-11-11 | 1 | -0/+26
  Lower the llvm.fabs intrinsic to the 'fabs' MI instruction.
  This fixes rdar://problem/18946552.
  llvm-svn: 221729
* MCAsmParserExtension has a copy of the MCAsmParser. Use it. | Rafael Espindola | 2014-11-11 | 1 | -10/+31
  Base classes were storing a second copy.
  llvm-svn: 221667
* [AArch64][FastISel] Fix kill flags for integer extends. | Juergen Ributzka | 2014-11-10 | 1 | -0/+8
  In the case we optimize an integer extend away and replace it directly with the source register, we also have to clear all kill flags at all its uses.
  This is necessary, because the original IR instruction might be trivially dead, but we replaced it with a nop at MI level.
  llvm-svn: 221628
* Misc style fixes. NFC. | Rafael Espindola | 2014-11-10 | 2 | -14/+13
  This fixes a few cases of:
  * Wrong variable name style.
  * Lines longer than 80 columns.
  * Repeated names in comments.
  * clang-format of the above.
  This makes the next patch a lot easier to read.
  llvm-svn: 221615
* [AArch64] Keep flags on condition vreg when instantiating a CB branch. | Ahmed Bougacha | 2014-11-07 | 1 | -1/+2
  Reversing a CB* instruction used to drop the flags on the condition. On the included testcase, this led to a read from an undefined vreg. Using addOperand keeps the flags, here <undef>.
  Differential Revision: http://reviews.llvm.org/D6159
  llvm-svn: 221507
* [AArch64] Use the correct register class for ORR. | Juergen Ributzka | 2014-11-04 | 1 | -1/+1
  While fixing up the register classes in the machine combiner in a previous commit I missed one. This fixes the last one and adds a test case.
  llvm-svn: 221308
* AArch64: Pattern match integer vector abs like we do on ARM. | Benjamin Kramer | 2014-11-04 | 1 | -0/+22
  This kind of pattern is emitted by the loop vectorizer.
  llvm-svn: 221289
* Rename variables to conform to llvm coding standards. | Akira Hatanaka | 2014-11-03 | 1 | -28/+28
  Differential Revision: http://reviews.llvm.org/D6062
  llvm-svn: 221204
* [AArch64] Make function processLogicalImmediate more efficient. NFC. | Akira Hatanaka | 2014-11-03 | 1 | -47/+42
  llvm-svn: 221199
* [AArch64] Fix miscompile of comparison with 0xffffffffffffffff | Oliver Stannard | 2014-11-03 | 1 | -4/+4
  Some literals in the AArch64 backend had 15 'f's rather than 16, causing comparisons with a constant 0xffffffffffffffff to be miscompiled.
  llvm-svn: 221157
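A short illustration of why one missing 'f' matters: a 15-digit literal covers only 60 bits, so it never equals the all-ones 64-bit value. The constant and function names below are hypothetical and are not taken from the backend.

```cpp
#include <cassert>
#include <cstdint>

// 15 hex digits cover 60 bits; 16 hex digits cover all 64 bits.
constexpr uint64_t kFifteenFs = 0xfffffffffffffffULL;
constexpr uint64_t kSixteenFs = 0xffffffffffffffffULL;

static_assert(kFifteenFs == (UINT64_C(1) << 60) - 1, "60 low bits set");
static_assert(kSixteenFs == ~UINT64_C(0), "all 64 bits set");

// Intended check: is the constant the all-ones 64-bit value?
bool isAllOnes(uint64_t v) { return v == kSixteenFs; }

// The same check written with the short literal misses the real value.
bool isAllOnesShortLiteral(uint64_t v) { return v == kFifteenFs; }

void checkAllOnes() {
  assert(isAllOnes(~UINT64_C(0)));
  assert(!isAllOnesShortLiteral(~UINT64_C(0)));
}
```

Writing such constants as `~UINT64_C(0)` or `UINT64_MAX` avoids counting digits by eye.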
* Remove redundant calls to isMaterializable. | Rafael Espindola | 2014-11-01 | 1 | -7/+1
  This removes calls to isMaterializable in the following cases:
  * It was redundant with a call to isDeclaration now that isDeclaration returns the correct answer for materializable functions.
  * It was followed by a call to Materialize. Just call Materialize and check EC.
  llvm-svn: 221050
* [AArch64] Check Dest Register Liveness in CondOpt pass. | Chad Rosier | 2014-10-31 | 1 | -6/+12
  Our internal testing revealed a case that should not be transformed:
      cmp x17, #3
      b.lt .LBB10_15
      ...
      subs x12, x12, #1
      b.gt .LBB10_1
  where x12 is a liveout, becomes:
      cmp x17, #2
      b.le .LBB10_15
      ...
      subs x12, x12, #2
      b.ge .LBB10_1
  Unable to provide test case as it's difficult to reproduce on community branch.
  http://reviews.llvm.org/D6048
  Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!
  llvm-svn: 220987
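The problem with the second rewrite can be reproduced with a small model. This is only a sketch: `SubsBranch`, `subsThenGt`, and `subsThenGe` are hypothetical names, and the model ignores overflow. It shows that adjusting the immediate keeps the branch decision but changes the value written to the destination register, which is wrong when that register is live afterwards.

```cpp
#include <cassert>
#include <cstdint>

// Outcome of a flag-setting subtract followed by a conditional branch.
struct SubsBranch {
  int64_t written; // value left in the destination register
  bool taken;      // whether the branch is taken
};

// Models "subs xd, xn, #imm ; b.gt": taken when xn > imm (signed).
SubsBranch subsThenGt(int64_t xn, int64_t imm) { return {xn - imm, xn > imm}; }

// Models "subs xd, xn, #imm ; b.ge": taken when xn >= imm (signed).
SubsBranch subsThenGe(int64_t xn, int64_t imm) { return {xn - imm, xn >= imm}; }

void checkCondOptRewrite() {
  for (int64_t x = -4; x <= 4; ++x) {
    SubsBranch before = subsThenGt(x, 1); // subs x12, x12, #1 ; b.gt
    SubsBranch after = subsThenGe(x, 2);  // subs x12, x12, #2 ; b.ge
    assert(before.taken == after.taken);      // same control flow
    assert(before.written != after.written);  // different live-out value
  }
}
```

For the plain `cmp` in the first pair the result is discarded, so the same adjustment is harmless there.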
* [AArch64] CondOpt pass is missing FCMP instructions when searching backward for a CMP which defines the flags used by B.CC. | Chad Rosier | 2014-10-31 | 1 | -0/+11
  http://reviews.llvm.org/D6047
  Patch by Zhaoshi Zheng <zhaoshiz@codeaurora.org>!
  llvm-svn: 220961
* AArch64: enable Cortex-A57 FP balancing on Cortex-A53. | Tim Northover | 2014-10-28 | 1 | -1/+2
  Benchmarks have shown that it's harmless to the performance there, and having a unified set of passes between the two cores where possible helps big.LITTLE deployment.
  Patch by Z. Zheng.
  llvm-svn: 220744
* AArch64InstrInfo.h: Fix a warning introduced in clang r220703. [-Winconsistent-missing-override] | NAKAMURA Takumi | 2014-10-27 | 1 | -1/+1
  llvm-svn: 220739
* [FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer check. | Juergen Ributzka | 2014-10-27 | 1 | -2/+6
  This is a minor change to use the immediate version when the operand is a null value. This should get rid of an unnecessary 'mov' instruction in debug builds and align the code more with the one generated by SelectionDAG.
  This fixes rdar://problem/18785125.
  llvm-svn: 220713
* [FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'. | Juergen Ributzka | 2014-10-27 | 1 | -0/+4
  Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and' instruction.
  This fixes rdar://problem/18784953.
  llvm-svn: 220712
* [FastISel][AArch64] Use 'cbz' also for null values (pointers). | Juergen Ributzka | 2014-10-27 | 1 | -15/+12
  The pattern matching for a 'ConstantInt' value was too restrictive. Checking for a 'Constant' with a null value is sufficient for using a 'cbz/cbnz' instruction.
  This fixes rdar://problem/18784732.
  llvm-svn: 220709
* [FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' instruction if it is in a different basic block. | Juergen Ributzka | 2014-10-27 | 1 | -2/+2
  This fixes a bug where the input register was not defined for the 'tbz/tbnz' instruction. This happened, because we folded the 'and' instruction from a different basic block.
  This fixes rdar://problem/18784013.
  llvm-svn: 220704
* [FastISel][AArch64] Fix load/store with frame indices. | Juergen Ributzka | 2014-10-27 | 1 | -23/+20
  At higher optimization levels the LLVM IR may contain more complex patterns for loads/stores from/to frame indices. The 'computeAddress' function wasn't able to handle this and triggered an assertion.
  This fix extends the possible addressing modes for frame indices.
  This fixes rdar://problem/18783298.
  llvm-svn: 220700
* [PBQP] Unique allowed-sets for nodes in the PBQP graph and use pairs of these sets as keys into a cache of interference matrix values in the Interference constraint adder. | Lang Hames | 2014-10-27 | 1 | -8/+8
  Creating interference matrices was one of the large remaining time-sinks in PBQP. Caching them reduces the total compile time (when using PBQP) on the nightly test suite by ~10%.
  llvm-svn: 220688
* [AArch64] Fix fast-isel of cbz of i1, i8, i16 | Oliver Stannard | 2014-10-24 | 1 | -0/+6
  This fixes a miscompilation in the AArch64 fast-isel which was triggered when a branch is based on an icmp with condition eq or ne, and type i1, i8 or i16. The cbz instruction compares the whole 32-bit register, so values with the bottom 1, 8 or 16 bits clear would cause the wrong branch to be taken.
  llvm-svn: 220553
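The failure mode can be shown with a small model of the register contents. This is a sketch with hypothetical function names; it assumes the narrow value lives in the low bits of a 32-bit register whose upper bits may hold leftover data, and that the fix amounts to clearing those bits before the comparison.

```cpp
#include <cassert>
#include <cstdint>

// cbz looks at the whole 32-bit register, not just the narrow value.
bool cbzTaken(uint32_t reg) { return reg == 0; }

// Keep only the low `bits` bits (1, 8 or 16), clearing everything above.
uint32_t zeroExtend(uint32_t reg, unsigned bits) {
  return reg & ((UINT32_C(1) << bits) - 1);
}

// Correct "narrow value == 0" test: mask first, then compare.
bool narrowIsZero(uint32_t reg, unsigned bits) {
  return cbzTaken(zeroExtend(reg, bits));
}

void checkNarrowCompare() {
  // The i8 value is 0, but bit 8 of the register holds leftover data.
  uint32_t reg = 0x100;
  assert(!cbzTaken(reg));       // unmasked cbz takes the wrong branch
  assert(narrowIsZero(reg, 8)); // masked compare sees the i8 zero
}
```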
* [AArch64] Add support for the .inst directive. | Chad Rosier | 2014-10-22 | 5 | -0/+137
  This has been implemented using the MCTargetStreamer interface as is done in the ARM, Mips and PPC backends.
  Phabricator: http://reviews.llvm.org/D5891
  PR20964
  llvm-svn: 220422
* [AArch64] Cleanup A57PBQPConstraints | Arnaud A. de Grandmaison | 2014-10-22 | 3 | -48/+50
  And add a long awaited testcase.
  llvm-svn: 220381
* [PBQP] Teach PassConfig to tell if the default register allocator is used. | Arnaud A. de Grandmaison | 2014-10-21 | 2 | -17/+2
  This enables targets to adapt their pass pipeline to the register allocator in use. For example, with the AArch64 backend, using PBQP with the cortex-a57, the FPLoadBalancing pass is no longer necessary.
  llvm-svn: 220321
* [AArch64] Fix a silent codegen fault in BUILD_VECTOR lowering. | James Molloy | 2014-10-17 | 1 | -9/+9
  We should be talking about the number of source elements, not the number of destination elements, given we know at this point that the source and dest element numbers are not the same.
  While we're at it, avoid writing to std::vector::end()...
  Bug found with random testing and a lot of coffee.
  llvm-svn: 220051
* [AArch64] Fix miscompile of sdiv-by-power-of-2. | Juergen Ributzka | 2014-10-16 | 2 | -4/+3
  When the constant divisor was larger than 32 bits, the optimized code generated for the AArch64 backend would be wrong, because the shift was defined as a shift of a 32-bit constant '(1<<Lg2(divisor))' and we would lose the upper 32 bits.
  This fixes rdar://problem/18678801.
  llvm-svn: 219934
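The lost upper bits can be illustrated by building the power of two at 64-bit width and then narrowing it. This is a sketch with hypothetical names; it models the bug as a truncation to 32 bits, which is an assumption about how the 32-bit constant behaved, and it deliberately avoids shifting a 32-bit value by 32 or more, which would be undefined behaviour in C++.

```cpp
#include <cassert>
#include <cstdint>

// Build 2^lg2 at full 64-bit width (lg2 must be below 64).
uint64_t pow2Wide(unsigned lg2) { return UINT64_C(1) << lg2; }

// Model of the bug: the same constant squeezed into 32 bits.
uint32_t pow2Truncated(unsigned lg2) {
  return static_cast<uint32_t>(pow2Wide(lg2));
}

void checkPow2Widths() {
  assert(pow2Truncated(4) == 16);             // small divisors still work
  assert(pow2Wide(33) == UINT64_C(0x200000000));
  assert(pow2Truncated(33) == 0);             // upper 32 bits are gone
}
```

With a divisor such as 2^33 the narrowed constant becomes 0, so any arithmetic built on it no longer corresponds to the original division.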
* Reapply "[FastISel][AArch64] Add custom lowering for GEPs." | Juergen Ributzka | 2014-10-15 | 1 | -0/+76
  This is mostly a copy of the existing FastISel GEP code, but we have to duplicate it for AArch64, because otherwise we would bail out even for simple cases. This is because the standard fastEmit functions don't cover MUL at all and ADD is lowered very inefficiently.
  The original commit had a bug in the add emit logic, which has been fixed.
  llvm-svn: 219831
* [FastISel][AArch64] Factor out add with immediate emission into a helper function. NFC. | Juergen Ributzka | 2014-10-15 | 1 | -13/+28
  Simplify add with immediate emission by factoring it out into a helper function.
  llvm-svn: 219830
* Simplify handling of --noexecstack by using getNonexecutableStackSection. | Rafael Espindola | 2014-10-15 | 3 | -7/+4
  llvm-svn: 219799
* Revert "[FastISel][AArch64] Add custom lowering for GEPs." | Juergen Ributzka | 2014-10-15 | 1 | -85/+0
  This breaks our internal build bots. Reverting it to get the bots green again.
  llvm-svn: 219776
* Remove unused variable. | Eric Christopher | 2014-10-15 | 1 | -1/+0
  llvm-svn: 219750
* [AArch64] Wrong CC access in CSINC-conditional branch sequence | Gerolf Hoflehner | 2014-10-14 | 1 | -5/+1
  This is a follow up to commit r219742. It removes the CCInMI variable and accesses the CC in CSINC directly. In the case of a conditional branch accessing the CC with CCInMI was wrong.
  llvm-svn: 219748
* [AArch64] Optimize CSINC-branch sequence | Gerolf Hoflehner | 2014-10-14 | 2 | -29/+137
  Peephole optimization that generates a single conditional branch for csinc-branch sequences like in the examples below. This is possible when the csinc sets or clears a register based on a condition code and the branch checks that register. Also the condition code may not be modified between the csinc and the original branch.
  Examples:
  1. Convert csinc w9, wzr, wzr, <CC>; tbnz w9, #0, 0x44 to b.<invCC>
  2. Convert csinc w9, wzr, wzr, <CC>; tbz w9, #0, 0x44 to b.<CC>
  rdar://problem/18506500
  llvm-svn: 219742
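The two conversions can be checked against the architectural behaviour of `csinc`, which writes its first source when the condition holds and its second source plus one otherwise. The sketch below models that with hypothetical helper names; with both sources set to the zero register the result is 0 when the condition holds and 1 when it does not.

```cpp
#include <cassert>
#include <cstdint>

// csinc rd, rn, rm, cond: rd = cond ? rn : rm + 1.
uint32_t csinc(uint32_t rn, uint32_t rm, bool cond) {
  return cond ? rn : rm + 1;
}

// tbnz r, #0: branch when bit 0 is set.  tbz r, #0: branch when it is clear.
bool tbnzBit0(uint32_t r) { return (r & 1) != 0; }
bool tbzBit0(uint32_t r) { return (r & 1) == 0; }

void checkCsincBranch() {
  for (int c = 0; c <= 1; ++c) {
    bool cond = (c != 0);
    uint32_t w9 = csinc(0, 0, cond); // csinc w9, wzr, wzr, <CC>
    assert(tbnzBit0(w9) == !cond);   // example 1: same as b.<invCC>
    assert(tbzBit0(w9) == cond);     // example 2: same as b.<CC>
  }
}
```

The equivalence only holds while the flags are unchanged between the `csinc` and the branch, which matches the restriction stated in the commit message.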
* [FastISel][AArch64] Add custom lowering for GEPs. | Juergen Ributzka | 2014-10-14 | 1 | -0/+85
  This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail out even for simple cases, because the standard fastEmit functions don't cover MUL and ADD is lowered inefficiently.
  llvm-svn: 219726
* [FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved. | Juergen Ributzka | 2014-10-14 | 1 | -39/+190
  Sign-/zero-extend folding depended on the load and the integer extend both being selected by FastISel. This cannot always be guaranteed, and SelectionDAG might interfere. This commit adds additional checks to load and integer extend lowering to catch this.
  Related to rdar://problem/18495928.
  llvm-svn: 219716