summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target/AArch64/AArch64FastISel.cpp
Commit message (Collapse)AuthorAgeFilesLines
...
* [AArch64][FastISel] Fix the setting of kill flags for MUL -> UMULH sequences.Quentin Colombet2015-05-011-2/+8
| | | | | | rdar://problem/20748715 llvm-svn: 236346
* [AArch64] Fix bad register class constraint in fast-isel for TST instruction.Quentin Colombet2015-04-301-1/+4
| | | | | | rdar://problem/20748715 llvm-svn: 236273
* Disable AArch64 fast-isel on big-endian call vector returns.Pete Cooper2015-04-161-0/+5
| | | | | | | | A big-endian vector return needs a byte-swap which we aren't doing right now. For now just bail on these cases to get correctness back. llvm-svn: 235133
* [AArch64][FastISel] Fix integer extend optimization.Juergen Ributzka2015-04-091-5/+6
| | | | | | | | | | | | | | The integer extend optimization tries to fold the extend into the load instruction. This requires us to identify if the extend has already been emitted or not and act accordingly on it. The check that was originally performed for this was not sufficient. Besides checking the ValueMap for a mapped register we also need to check if the virtual register has already an associated machine instruction that defines it. This fixes rdar://problem/20470788. llvm-svn: 234529
* Refactor: Simplify boolean expressions in AArch64 targetDavid Blaikie2015-03-241-1/+1
| | | | | | | | | | | | Simplify boolean expressions using `true` and `false` with `clang-tidy` Patch by Richard Thomson. Reviewed By: rengolin Differential Revision: http://reviews.llvm.org/D8525 llvm-svn: 233089
* Have getCallPreservedMask and getThisCallPreservedMask take aEric Christopher2015-03-111-1/+1
| | | | | | | MachineFunction argument so that we can grab subtarget specific features off of it. llvm-svn: 231979
* Clean up some uses of getSubtarget in AArch64.Eric Christopher2015-01-301-4/+4
| | | | llvm-svn: 227530
* Migrate AArch64 except for TTI and AsmPrinter away from getSubtargetImpl.Eric Christopher2015-01-281-1/+1
| | | | llvm-svn: 227293
* [AArch64] Implement GHC calling conventionGreg Fitzgerald2015-01-191-0/+2
| | | | | | | | | | Original patch by Luke Iannini. Minor improvements and test added by Erik de Castro Lopo. Differential Revision: http://reviews.llvm.org/D6877 From: Erik de Castro Lopo <erikd@mega-nerd.com> llvm-svn: 226473
* [AArch64] MachO large code-model: Materialize FP constants in code.Juergen Ributzka2014-12-101-0/+18
| | | | | | | | | | | | | | | In the large code model we have to first get the address of the GOT entry, load the address of the constant, and then load the constant itself. To avoid these loads and the GOT entry alltogether this commit changes the way how FP constants are materialized in the large code model. The constats are now materialized in a GPR and then bitconverted/moved into the FPR. Reviewed by Tim Northover Fixes rdar://problem/16572564. llvm-svn: 223941
* [FastISel][AArch64] Fix a missing nullptr check in 'computeAddress'.Juergen Ributzka2014-12-091-1/+1
| | | | | | | | | The load/store value type is currently not available when lowering the memcpy intrinsic. Add the missing nullptr check to support this in 'computeAddress'. Fixes rdar://problem/19178947. llvm-svn: 223818
* AArch64: treat [N x Ty] as a block during procedure calls.Tim Northover2014-11-271-0/+1
| | | | | | | | | | | | | | The AAPCS treats small structs and homogeneous floating (or vector) aggregates specially, and guarantees they either get passed as a contiguous block of registers, or prevent any future use of those registers and get passed on the stack. This concept can fit quite neatly into LLVM's own type system, mapping an HFA to [N x float] and so on, and small structs to [N x i64]. Doing so allows front-ends to emit AAPCS compliant code without having to duplicate the register counting logic. llvm-svn: 222903
* [FastISel][AArch64] Fix and extend the tbz/tbnz pattern matching.Juergen Ributzka2014-11-251-19/+20
| | | | | | | | | | The pattern matching failed to recognize all instances of "-1", because when comparing against "-1" we didn't use an APInt of the same bitwidth. This commit fixes this and also adds inverse versions of the conditon to catch more cases. llvm-svn: 222722
* [FastISel][AArch64] Also allow folding of sign-/zero-extend and arithmeticChad Rosier2014-11-181-2/+3
| | | | | | | | | shift-right for booleans (i1). Arithmetic shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. llvm-svn: 222272
* [FastISel][AArch64] Also allow folding of sign-/zero-extend and logicalChad Rosier2014-11-181-2/+3
| | | | | | | | | shift-right for booleans (i1). Logical shift-right immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. llvm-svn: 222270
* [FastISel][AArch64] Follow-up fix for "Fix shift-immediate emission for ↵Juergen Ributzka2014-11-181-17/+26
| | | | | | | | | | | "zero" shifts." Shifts also perform sign-/zero-extends to larger types, which requires us to emit an integer extend instead of a simple COPY. Related to PR21594. llvm-svn: 222257
* [FastISel][AArch64] Fix shift-immediate emission for "zero" shifts.Juergen Ributzka2014-11-181-6/+33
| | | | | | | | This change emits a COPY for a shift-immediate with a "zero" shift value. This fixes PR21594 where we emitted a shift instruction with an incorrect immediate operand. llvm-svn: 222247
* [FastISel][AArch64] Don't bail during simple GEP instruction selection.Juergen Ributzka2014-11-131-0/+23
| | | | | | | | | | | | | | | The generic FastISel code would bail, because it can't emit a sign-extend for AArch64. This copies the code over and uses AArch64 specific emit functions. This is not ideal and 'computeAddress' should handles this, so it can fold the address computation into the memory operation. I plan to clean up 'computeAddress' anyways, so I will add that in a future commit. Related to rdar://problem/18962471. llvm-svn: 221923
* [FastISel][AArch64] Optimize select when one of the operands is a 'true' or ↵Juergen Ributzka2014-11-131-0/+61
| | | | | | | | | | | 'false' value. Optimize selects of i1 in the presence of 'true' and 'false' operands to simple logic operations. This fixes rdar://problem/18960150. llvm-svn: 221848
* [FastISel][AArch64] Fold the cmp into the select when possible.Juergen Ributzka2014-11-131-0/+54
| | | | | | | | | This folds the compare emission into the select emission when possible, so we can directly use the flags and don't have to emit a separate compare. Related to rdar://problem/18960150. llvm-svn: 221847
* [FastISel][AArch64] Extend 'select' lowering to support also i1 to i16.Juergen Ributzka2014-11-131-34/+46
| | | | | | Related to rdar://problem/18960150. llvm-svn: 221846
* [FastISel][AArch64] Add support for fabs intrinsic.Juergen Ributzka2014-11-111-0/+26
| | | | | | | | Lower the llvm.fabs intrinsic to the 'fabs' MI instruction. This fixes rdar://problem/18946552. llvm-svn: 221729
* [AArch64][FastISel] Fix kill flags for integer extends.Juergen Ributzka2014-11-101-0/+8
| | | | | | | | | In the case we optimize an integer extend away and replace it directly with the source register, we also have to clear all kill flags at all its uses. This is necessary, because the orignal IR instruction might be trivially dead, but we replaced it with a nop at MI level. llvm-svn: 221628
* [FastISel][AArch64] Emit immediate version of icmp (subs) for null pointer ↵Juergen Ributzka2014-10-271-2/+6
| | | | | | | | | | | | check. This is a minor change to use the immediate version when the operand is a null value. This should get rid of an unnecessary 'mov' instruction in debug builds and align the code more with the one generated by SelectionDAG. This fixes rdar://problem/18785125. llvm-svn: 220713
* [FastISel][AArch64] Optimize compare-and-branch for i1 to use 'tbz'.Juergen Ributzka2014-10-271-0/+4
| | | | | | | | | Minor enhancement to use 'tbz' for i1 compare-and-branch to get rid of an 'and' instruction. This fixes rdar://problem/18784953. llvm-svn: 220712
* [FastISel][AArch64] Use 'cbz' also for null values (pointers).Juergen Ributzka2014-10-271-15/+12
| | | | | | | | | The pattern matching for a 'ConstantInt' value was too restrictive. Checking for a 'Constant' with a bull value is sufficient for using an 'cbz/cbnz' instruction. This fixes rdar://problem/18784732. llvm-svn: 220709
* [FastISel][AArch64] Don't fold the 'and' instruction into the 'tbz/tbnz' ↵Juergen Ributzka2014-10-271-2/+2
| | | | | | | | | | | | instruction if it is in a different basic block. This fixes a bug where the input register was not defined for the 'tbz/tbnz' instruction. This happened, because we folded the 'and' instruction from a different basic block. This fixes rdar://problem/18784013. llvm-svn: 220704
* [FastISel][AArch64] Fix load/store with frame indices.Juergen Ributzka2014-10-271-23/+20
| | | | | | | | | | | | At higher optimization levels the LLVM IR may contain more complex patterns for loads/stores from/to frame indices. The 'computeAddress' function wasn't able to handle this and triggered an assertion. This fix extends the possible addressing modes for frame indices. This fixes rdar://problem/18783298. llvm-svn: 220700
* [AArch64] Fix fast-isel of cbz of i1, i8, i16Oliver Stannard2014-10-241-0/+6
| | | | | | | | | | This fixes a miscompilation in the AArch64 fast-isel which was triggered when a branch is based on an icmp with condition eq or ne, and type i1, i8 or i16. The cbz instruction compares the whole 32-bit register, so values with the bottom 1, 8 or 16 bits clear would cause the wrong branch to be taken. llvm-svn: 220553
* [AArch64] Fix miscompile of sdiv-by-power-of-2.Juergen Ributzka2014-10-161-3/+2
| | | | | | | | | | | When the constant divisor was larger than 32bits, then the optimized code generated for the AArch64 backend would emit the wrong code, because the shift was defined as a shift of a 32bit constant '(1<<Lg2(divisor))' and we would loose the upper 32bits. This fixes rdar://problem/18678801. llvm-svn: 219934
* Reapply "[FastISel][AArch64] Add custom lowering for GEPs."Juergen Ributzka2014-10-151-0/+76
| | | | | | | | | | | This is mostly a copy of the existing FastISel GEP code, but we have to duplicate it for AArch64, because otherwise we would bail out even for simple cases. This is because the standard fastEmit functions don't cover MUL at all and ADD is lowered very inefficientily. The original commit had a bug in the add emit logic, which has been fixed. llvm-svn: 219831
* [FastISel][AArch64] Factor out add with immediate emission into a helper ↵Juergen Ributzka2014-10-151-13/+28
| | | | | | | | function. NFC. Simplify add with immediate emission by factoring it out into a helper function. llvm-svn: 219830
* Revert "[FastISel][AArch64] Add custom lowering for GEPs."Juergen Ributzka2014-10-151-85/+0
| | | | | | This breaks our internal build bots. Reverting it to get the bots green again. llvm-svn: 219776
* [FastISel][AArch64] Add custom lowering for GEPs.Juergen Ributzka2014-10-141-0/+85
| | | | | | | | This is mostly a copy of the existing FastISel GEP code, but on AArch64 we bail out even for simple cases, because the standard fastEmit functions don't cover MUL and ADD is lowered inefficientily. llvm-svn: 219726
* [FastISel][AArch64] Fix sign-/zero-extend folding when SelectionDAG is involved.Juergen Ributzka2014-10-141-39/+190
| | | | | | | | | | | Sign-/zero-extend folding depended on the load and the integer extend to be both selected by FastISel. This cannot always be garantueed and SelectionDAG might interfer. This commit adds additonal checks to load and integer extend lowering to catch this. Related to rdar://problem/18495928. llvm-svn: 219716
* [FastISel][AArch64] Teach the address computation code to also fold ↵Juergen Ributzka2014-10-071-0/+29
| | | | | | | | | | sign-/zero-extends. The code already folds sign-/zero-extends, but only if they are arguments to mul and shift instructions. This extends the code to also fold them when they are direct inputs. llvm-svn: 219187
* [FastISel][AArch64] Teach the address computation to also fold sub instructions.Juergen Ributzka2014-10-071-1/+12
| | | | | | | Tiny enhancement to the address computation code to also fold sub instructions if the rhs is constant and can be folded into the offset. llvm-svn: 219186
* [FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction."Juergen Ributzka2014-10-071-64/+90
| | | | | | | | | | This commit fixes an issue with sign-/zero-extending loads that was discovered by Richard Barton. We use now the correct load instructions for sign-extending loads to 64bit. Also updated and added more unit tests. llvm-svn: 219185
* Add fake use to suppress defined-but-unused warningsJingyue Wu2014-10-041-0/+1
| | | | llvm-svn: 219045
* Recommit r218010 [FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ.Juergen Ributzka2014-09-301-54/+116
| | | | | | | | | | | | | | Note: This version fixed an issue with the TBZ/TBNZ instructions that were generated in FastISel. The issue was that the 64bit version of TBZ (TBZX) automagically sets the upper bit of the immediate field that is used to specify the bit we want to test. To test for any of the lower 32bits we have to first extract the subregister and use the 32bit version of the TBZ instruction (TBZW). Original commit message: Teach selectBranch to fold bit test and branch into a single instruction (TBZ or TBNZ). llvm-svn: 218693
* [FastISel][AArch64] Fold sign-/zero-extends into the load instruction.Juergen Ributzka2014-09-301-135/+220
| | | | | | | | | | | | | | The sign-/zero-extension of the loaded value can be performed by the memory instruction for free. If the result of the load has only one use and the use is a sign-/zero-extend, then we emit the proper load instruction. The extend is only a register copy and will be optimized away later on. Other instructions that consume the sign-/zero-extended value are also made aware of this fact, so they don't fold the extend too. This fixes rdar://problem/18495928. llvm-svn: 218653
* [FastISel][AArch64] Factor out scale factor calculation. NFC.Juergen Ributzka2014-09-301-35/+29
| | | | | | | Factor out the code that determines the implicit scale factor of memory operations for a given value type. llvm-svn: 218652
* [FastISel][AArch64] Also allow folding of sign-/zero-extend and shift-left ↵Juergen Ributzka2014-09-221-2/+3
| | | | | | | | | | | for booleans (i1). Shift-left immediate with sign-/zero-extensions also works for boolean values. Update the assert and the test cases to reflect that fact. This should fix a bug found by Chad. llvm-svn: 218275
* [FastIsel][AArch64] Fix a think-o in address computation.Juergen Ributzka2014-09-191-20/+27
| | | | | | | | | | When looking through sign/zero-extensions the code would always assume there is such an extension instruction and use the wrong operand for the address. There was also a minor issue in the handling of 'AND' instructions. I accidentially used a 'cast' instead of a 'dyn_cast'. llvm-svn: 218161
* Revert "[FastISel][AArch64] Fold bit test and branch into TBZ and TBNZ."Juergen Ributzka2014-09-181-32/+8
| | | | | | Reverting it until I have time to investigate a regression. llvm-svn: 218035
* Fix previous commit: [FastISel][AArch64] Simplify XALU multiplies.Juergen Ributzka2014-09-181-8/+42
| | | | | | | | When folding the intrinsic flag into the branch or select we also have to consider the fact if the intrinsic got simplified, because it changes the flag we have to check for. llvm-svn: 218034
* [FastISel][AArch64] Simplify XALU multiplies.Juergen Ributzka2014-09-181-1/+22
| | | | | | Simplify {s|u}mul.with.overflow to {s|u}add.with.overflow when possible. llvm-svn: 218033
* [FastISel][AArch64] Followup commit for 218031 to handle negative offsets too.Juergen Ributzka2014-09-181-3/+7
| | | | llvm-svn: 218032
* [FastISel][AArch64] Try to fold the offset into the add instruction when ↵Juergen Ributzka2014-09-181-4/+10
| | | | | | | | | | | | simplifying a memory address. Small optimization in 'simplifyAddress'. When the offset cannot be encoded in the load/store instruction, then we need to materialize the address manually. The add instruction can encode a wider range of immediates than the load/store instructions. This change tries to fold the offset into the add instruction first before materializing the offset in a register. llvm-svn: 218031
* [FastISel][AArch64] Fold 'AND' instruction during the address computation.Juergen Ributzka2014-09-181-0/+54
| | | | | | | | | | | The 'AND' instruction could be used to mask out the lower 32 bits of a register. If this is done inside an address computation we might be able to fold the instruction into the memory instruction itself. and x1, x1, #0xffffffff ---> ldrb x0, [x0, w1, uxtw] ldrb x0, [x0, x1] llvm-svn: 218030
OpenPOWER on IntegriCloud