summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
...
* vec perm can go down either pipeline on P8.Eric Christopher2017-02-261-1/+1
| | | | | | No observable changes, spotted while looking at the scheduling description. llvm-svn: 296277
* [APInt] Add APInt::extractBits() method to extract APInt subrange (reapplied)Simon Pilgrim2017-02-252-8/+7
| | | | | | | | | | | | | | | | The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296272
* [AVX-512] Fix the execution domain for scalar FMA instructions.Craig Topper2017-02-251-1/+2
| | | | llvm-svn: 296271
* [AVX-512] Fix the execution domain on some instructions.Craig Topper2017-02-251-4/+13
| | | | llvm-svn: 296270
* [AVX-512] Remove unnecessary masked versions of VCVTSS2SD and VCVTSD2SS ↵Craig Topper2017-02-251-24/+11
| | | | | | using the scalar register class. We only have patterns for the masked intrinsics. llvm-svn: 296264
* In visitSTORE, always use FindBetterChain, rather than only when UseAA is ↵Nirav Dave2017-02-252-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | enabled. Recommiting after fixup of 32-bit aliasing sign offset bug in DAGCombiner. * Simplify Consecutive Merge Store Candidate Search Now that address aliasing is much less conservative, push through simplified store merging search and chain alias analysis which only checks for parallel stores through the chain subgraph. This is cleaner as the separation of non-interfering loads/stores from the store-merging logic. When merging stores search up the chain through a single load, and finds all possible stores by looking down from through a load and a TokenFactor to all stores visited. This improves the quality of the output SelectionDAG and the output Codegen (save perhaps for some ARM cases where we correctly constructs wider loads, but then promotes them to float operations which appear but requires more expensive constant generation). Some minor peephole optimizations to deal with improved SubDAG shapes (listed below) Additional Minor Changes: 1. Finishes removing unused AliasLoad code 2. Unifies the chain aggregation in the merged stores across code paths 3. Re-add the Store node to the worklist after calling SimplifyDemandedBits. 4. Increase GatherAllAliasesMaxDepth from 6 to 18. That number is arbitrary, but seems sufficient to not cause regressions in tests. 5. Remove Chain dependencies of Memory operations on CopyfromReg nodes as these are captured by data dependence 6. Forward loads-store values through tokenfactors containing {CopyToReg,CopyFromReg} Values. 7. Peephole to convert buildvector of extract_vector_elt to extract_subvector if possible (see CodeGen/AArch64/store-merge.ll) 8. Store merging for the ARM target is restricted to 32-bit as some in some contexts invalid 64-bit operations are being generated. This can be removed once appropriate checks are added. This finishes the change Matt Arsenault started in r246307 and jyknight's original patch. Many tests required some changes as memory operations are now reorderable, improving load-store forwarding. One test in particular is worth noting: CodeGen/PowerPC/ppc64-align-long-double.ll - Improved load-store forwarding converts a load-store pair into a parallel store and a memory-realized bitcast of the same value. However, because we lose the sharing of the explicit and implicit store values we must create another local store. A similar transformation happens before SelectionDAG as well. Reviewers: arsenm, hfinkel, tstellarAMD, jyknight, nhaehnle llvm-svn: 296252
* Minor code cleanup. NFC.Junmo Park2017-02-251-1/+1
| | | | llvm-svn: 296207
* [WebAssembly] Add support for using a wasm global for the stack pointer.Dan Gohman2017-02-245-42/+112
| | | | | | | This replaces the __stack_pointer variable which was allocated in linear memory. llvm-svn: 296201
* [Hexagon] Undo shift folding where it could simplify addressing modeKrzysztof Parzyszek2017-02-241-3/+75
| | | | | | | | | | | | For example, avoid (single shift): r0 = and(##536870908,lsr(r0,#3)) r0 = memw(r1+r0<<#0) in favor of (two shifts): r0 = lsr(r0,#5) r0 = memw(r1+r0<<#2) llvm-svn: 296196
* [WebAssembly] Basic support for Wasm object file encoding.Dan Gohman2017-02-2423-140/+588
| | | | | | | | | With the "wasm32-unknown-unknown-wasm" triple, this allows writing out simple wasm object files, and is another step in a larger series toward migrating from ELF to general wasm object support. Note that this code and the binary format itself is still experimental. llvm-svn: 296190
* [Hexagon] Prettify code in HexagonDAGToDAGISel::SelectKrzysztof Parzyszek2017-02-241-47/+13
| | | | llvm-svn: 296187
* AMDGPU : Replace FMAD with FMA when denormals are enabled.Wei Ding2017-02-244-1/+20
| | | | | | Differential Revision: http://reviews.llvm.org/D29958 llvm-svn: 296186
* Revert "Correct register pressure calculation in presence of subregs"Stanislav Mekhanoshin2017-02-242-22/+0
| | | | | | | | This reverts commit r296009. It broke one out of tree target and also does not account for all partial lines added or removed when calculating PressureDiff. llvm-svn: 296182
* [WebAssembly] Handle f16 in fast-isel.Dan Gohman2017-02-241-0/+2
| | | | llvm-svn: 296172
* [Target/MIPS] Kill dead code, no functional change intended.Davide Italiano2017-02-241-11/+0
| | | | | | Hopefully placates gcc with -Werror. llvm-svn: 296153
* Revert: r296141 [APInt] Add APInt::extractBits() method to extract APInt ↵Simon Pilgrim2017-02-242-7/+8
| | | | | | | | | | | | | | | | | | subrange The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296147
* [PowerPC] Use subfic instruction for subtract from immediateNemanja Ivanovic2017-02-241-0/+4
| | | | | | | | | | | Provide a 64-bit pattern to use SUBFIC for subtracting from a 16-bit immediate. The corresponding pattern already exists for 32-bit integers. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29387 llvm-svn: 296144
* [PowerPC] Use rldicr instruction for AND with an immediate if possibleNemanja Ivanovic2017-02-241-0/+13
| | | | | | | | | | | Emit clrrdi (extended mnemonic for rldicr) for AND-ing with masks that clear bits from the right hand size. Committing on behalf of Hiroshi Inoue. Differential Revision: https://reviews.llvm.org/D29388 llvm-svn: 296143
* [APInt] Add APInt::extractBits() method to extract APInt subrangeSimon Pilgrim2017-02-242-8/+7
| | | | | | | | | | | | | | | | The current pattern for extract bits in range is typically: Mask.lshr(BitOffset).trunc(SubSizeInBits); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation of memory for the temporary variable. This is another of the compile time issues identified in PR32037 (see also D30265). This patch adds the APInt::extractBits() helper method which avoids the temporary memory allocation. Differential Revision: https://reviews.llvm.org/D30336 llvm-svn: 296141
* Recommit "[mips] Fix atomic compare and swap at O0."Simon Dardis2017-02-247-151/+400
| | | | | | | | | | | | | | | | | | | | | | This time with the missing files. Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30257 llvm-svn: 296134
* Revert "[mips] Fix atomic compare and swap at O0."Simon Dardis2017-02-246-59/+151
| | | | | | This reverts r296132. I forgot to include the tests. llvm-svn: 296133
* [mips] Fix atomic compare and swap at O0.Simon Dardis2017-02-246-151/+59
| | | | | | | | | | | | | | | | | | | | Similar to PR/25526, fast-regalloc introduces spills at the end of basic blocks. When this occurs in between an ll and sc, the store can cause the atomic sequence to fail. This patch fixes the issue by introducing more pseudos to represent atomic operations and moving their lowering to after the expansion of postRA pseudos. This resolves PR/32020. Thanks to James Cowgill for reporting the issue! Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D30257 llvm-svn: 296132
* [globalisel] Decouple src pattern operands from dst pattern operands.Daniel Sanders2017-02-241-1/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: This isn't testable for AArch64 by itself so this patch also adds support for constant immediates in the pattern and physical register uses in the result. The new IntOperandMatcher matches the constant in patterns such as '(set $rd:GPR32, (G_XOR $rs:GPR32, -1))'. It's always safe to fold immediates into an instruction so this is the first rule that will match across multiple BB's. The Renderer hierarchy is responsible for adding operands to the result instruction. Renderers can copy operands (CopyRenderer) or add physical registers (in particular %wzr and %xzr) to the result instruction in any order (OperandMatchers now import the operand names from SelectionDAG to allow renderers to access any operand). This allows us to emit the result instruction for: %1 = G_XOR %0, -1 --> %1 = ORNWrr %wzr, %0 %1 = G_XOR -1, %0 --> %1 = ORNWrr %wzr, %0 although the latter is untested since the matcher/importer has not been taught about commutativity yet. Added BuildMIAction which can build new instructions and mutate them where possible. W.r.t the mutation aspect, MatchActions are now told the name of an instruction they can recycle and BuildMIAction will emit mutation code when the renderers are appropriate. They are appropriate when all operands are rendered using CopyRenderer and the indices are the same as the matcher. This currently assumes that all operands have at least one matcher. Finally, this change also fixes a crash in AArch64InstructionSelector::select() caused by an immediate operand passing isImm() rather than isCImm(). This was uncovered by the other changes and was detected by existing tests. Depends on D29711 Reviewers: t.p.northover, ab, qcolombet, rovka, aditya_nandakumar, javed.absar Reviewed By: rovka Subscribers: aemerson, dberris, kristof.beyls, llvm-commits Differential Revision: https://reviews.llvm.org/D29712 llvm-svn: 296131
* [X86][SSE] Target shuffle combine can try to combine up to 16 vectorsSimon Pilgrim2017-02-241-6/+6
| | | | | | Noticed while profiling PR32037, the target shuffle ops were being stored in SmallVector<*,8> types but the combiner could store as many as 16 ops at maximum depth (2 per depth). llvm-svn: 296130
* [x86] use DAG.getAllOnesConstant(); NFCISanjay Patel2017-02-241-18/+11
| | | | llvm-svn: 296128
* [mips] Handle 64 bit immediate in and/or/xor pseudo instructions on mips64Simon Dardis2017-02-243-15/+34
| | | | | | | | | | | | | | | | | | | | Previously LLVM was assuming 32-bit signed immediates which results in and with a bitmask that has bit 31 set to incorrectly include bits 63-32 in the result. After applying this patch I can now compile all of the FreeBSD mips assembly code with clang. This issue also affects the nor, slt and sltu macros and I will fix those in a separate review. Patch By: Alexander Richardson Commit message reformatted by sdardis. Reviewers: atanasyan, theraven, sdardis Differential Revision: https://reviews.llvm.org/D30298 llvm-svn: 296125
* [ARM] GlobalISel: Select G_STOREDiana Picus2017-02-241-16/+20
| | | | | | Same as selecting G_LOAD. llvm-svn: 296122
* [ARM] GlobalISel: Add reg bank mappings for storesDiana Picus2017-02-241-0/+2
| | | | | | Same as the ones for loads. llvm-svn: 296115
* [ARM] GlobalISel: Legalize storesDiana Picus2017-02-241-3/+6
| | | | | | Allow the same types that we allow for loads. llvm-svn: 296108
* [mips][mc] Fix a crash when disassembling odd sized sectionsSimon Dardis2017-02-241-30/+21
| | | | | | | | | | | | | | Make the MIPS disassembler consistent with the other targets in returning a Size of zero when the input buffer cannot contain an instruction due to it's size. Previously it reported the minimum instruction size when it failed due to the buffer not being big enough for an instruction causing llvm-objdump to crash when disassembling all sections. Reviewers: slthakur Differential Revision: https://reviews.llvm.org/D29984 llvm-svn: 296105
* Revert "[ARM] GlobalISel: Legalize stores"Diana Picus2017-02-241-5/+3
| | | | | | This reverts commit r296103 because the test broke on one of the bots. Sorry! llvm-svn: 296104
* [ARM] GlobalISel: Legalize storesDiana Picus2017-02-241-3/+5
| | | | | | Allow the same types that we allow for loads. llvm-svn: 296103
* [APInt] Add APInt::setBits() method to set all bits in rangeSimon Pilgrim2017-02-242-9/+6
| | | | | | | | | | | | | | | | | | The current pattern for setting bits in range is typically: Mask |= APInt::getBitsSet(MaskSizeInBits, LoPos, HiPos); Which can be particularly slow for large APInts (MaskSizeInBits > 64) as they require the allocation memory for the temporary variable. This is one of the key compile time issues identified in PR32037. This patch adds the APInt::setBits() helper method which avoids the temporary memory allocation completely, this first implementation uses setBit() internally instead but already significantly reduces the regression in PR32037 (~10% drop). Additional optimization may be possible. I investigated whether there is need for APInt::clearBits() and APInt::flipBits() equivalents but haven't seen these patterns to be particularly common, but reusing the code would be trivial. Differential Revision: https://reviews.llvm.org/D30265 llvm-svn: 296102
* [WebAssembly] Add a README.txt entry for mergeable sections.Dan Gohman2017-02-241-0/+5
| | | | llvm-svn: 296095
* [AVX-512] Separate the fadd/fsub/fmul/fdiv/fmax/fmin with rounding mode ISD ↵Craig Topper2017-02-245-26/+38
| | | | | | opcodes into separate packed and scalar opcodes. This is more consistent with the rest of the ISD opcodes. NFC llvm-svn: 296094
* [AVX-512] Remove lzcnt intrinsics and autoupgrade them to generic ctlz ↵Craig Topper2017-02-241-12/+0
| | | | | | | | intrinsics with select. Clang has been emitting cltz intrinsics for a while now. llvm-svn: 296091
* [Fuchsia] Use thread-pointer ABI slots for stack-protector and safe-stackPetr Hosek2017-02-245-47/+69
| | | | | | | | | | | | The Fuchsia ABI defines slots from the thread pointer where the stack-guard value for stack-protector, and the unsafe stack pointer for safe-stack, are stored. This parallels the Android ABI support. Patch by Roland McGrath Differential Revision: https://reviews.llvm.org/D30237 llvm-svn: 296081
* [NVPTX] Added support for .f16x2 instructions.Artem Belevich2017-02-2311-117/+714
| | | | | | | | | | | | | This patch enables support for .f16x2 operations. Added new register type Float16x2. Added support for .f16x2 instructions. Added handling of vectorized loads/stores of v2f16 values. Differential Revision: https://reviews.llvm.org/D30057 Differential Revision: https://reviews.llvm.org/D30310 llvm-svn: 296032
* ARM: make sure FastISel bails on f64 operations for Cortex-M4.Tim Northover2017-02-231-8/+13
| | | | | | | | | | | FastISel wasn't checking the isFPOnlySP subtarget feature before emitting double-precision operations, so it got completely invalid CodeGen for doubles on Cortex-M4F. The normal ISel testing wasn't spectacular either so I added a second RUN line to improve that while I was in the area. llvm-svn: 296031
* [Hexagon] Handle saturations in Hexagon bit trackerKrzysztof Parzyszek2017-02-231-0/+14
| | | | llvm-svn: 296026
* [Hexagon] Allow setting register in BitVal without storing into mapKrzysztof Parzyszek2017-02-232-6/+13
| | | | | | | | | | | In the bit tracker, references to other bit values in which the register is 0 are prohibited. This means that generating self-referential register cells like { w:32 [0-15]:s[0-15] [16-31]:s[15] } is impossible. In order to get a self-referential cell, it had to be stored into a map and then reloaded from it. To avoid this step, add a function that will set the register to a given value without going through the map. llvm-svn: 296025
* [AMDGPU] Shut the warning "getRegUnitWeight hides overload...". NFC.Stanislav Mekhanoshin2017-02-231-0/+2
| | | | | | | Clang issues warning about hidden overload. That was intended, so add "using AMDGPUGenRegisterInfo::getRegUnitWeight;" to mute it. llvm-svn: 296021
* Disable TLS for stack protector on Android API<17.Evgeniy Stepanov2017-02-232-6/+12
| | | | | | The TLS slot did not exist back then. llvm-svn: 296014
* Correct register pressure calculation in presence of subregsStanislav Mekhanoshin2017-02-232-0/+20
| | | | | | | | | | If a subreg is used in an instruction it counts as a whole superreg for the purpose of register pressure calculation. This patch corrects improper register pressure calculation by examining operand's lane mask. Differential Revision: https://reviews.llvm.org/D29835 llvm-svn: 296009
* [Hexagon] Avoid IMPLICIT_DEFs as new-value producersKrzysztof Parzyszek2017-02-231-0/+2
| | | | llvm-svn: 295997
* AMDGPU/SI: Fix trunc i16 patternJan Vesely2017-02-232-6/+5
| | | | | | | | Hit on ASICs that support 16bit instructions. Differential Revision: https://reviews.llvm.org/D30281 llvm-svn: 295990
* [Hexagon] Patterns for CTPOP, BSWAP and BITREVERSEKrzysztof Parzyszek2017-02-233-23/+16
| | | | llvm-svn: 295981
* [ARM] GlobalISel: Lower call returnsDiana Picus2017-02-231-11/+52
| | | | | | | | Introduce a common ValueHandler for call returns and formal arguments, and inherit two different versions for handling the differences (at the moment the only difference is the way physical registers are marked as used). llvm-svn: 295973
* [ARM] GlobalISel: Lower call parameters in regsDiana Picus2017-02-231-15/+39
| | | | | | | | Add support for lowering calls with parameters than can fit into regs. Use the same ValueHandler that we used for function returns, but rename it to match its new, extended purpose. llvm-svn: 295971
* [X86][AVX] Disable VCVTSS2SD & VCVTSD2SS memory folding and fix the register ↵Ayman Musa2017-02-232-6/+7
| | | | | | | | class of their first input when creating node in fast-isel. (Quick fix to buildbot failure after rL295940 commit). llvm-svn: 295970
OpenPOWER on IntegriCloud