summaryrefslogtreecommitdiffstats
path: root/llvm/lib/Target
Commit message (Collapse)AuthorAgeFilesLines
* Fix BXJ is undefined in AArch32.Charlie Turner2015-04-152-2/+10
| | | | | | | | | | | | | | BXJ was incorrectly said to be unsupported in ARMv8-A. It is not supported in the A64 instruction set, but it is supported in the T32 and A32 instruction sets, because it's listed as an instruction in the ARM ARM section F7.1.28. Using SP as an operand to BXJ changed from UNPREDICTABLE to PREDICTABLE in v8-A. This patch reflects that update as well. This was found by MCHammer. llvm-svn: 235024
* [X86] add an exedepfix entry for movq == movlps == movlpdSanjay Patel2015-04-151-0/+2
| | | | | | | | | | | | | This is a 1-line patch (with a TODO for AVX because that will affect even more regression tests) that lets us substitute the appropriate 64-bit store for the float/double/int domains. It's not clear to me exactly what the difference is between the 0xD6 (MOVPQI2QImr) and 0x7E (MOVSDto64mr) opcodes, but this is apparently the right choice. Differential Revision: http://reviews.llvm.org/D8691 llvm-svn: 235014
* [x86] Implement combineRepeatedFPDivisorsSanjay Patel2015-04-152-0/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Set the transform bar at 2 divisions because the fastest current x86 FP divider circuit is in SandyBridge / Haswell at 10 cycle latency (best case) relative to a 5 cycle multiplier. So that's the worst case for this transform (no latency win), but multiplies are obviously pipelined while divisions are not, so there's still a big throughput win which we would expect to show up in typical FP code. These are the sequences I'm comparing: divss %xmm2, %xmm0 mulss %xmm1, %xmm0 divss %xmm2, %xmm0 Becomes: movss LCPI0_0(%rip), %xmm3 ## xmm3 = mem[0],zero,zero,zero divss %xmm2, %xmm3 mulss %xmm3, %xmm0 mulss %xmm1, %xmm0 mulss %xmm3, %xmm0 [Ignore for the moment that we don't optimize the chain of 3 multiplies into 2 independent fmuls followed by 1 dependent fmul...this is the DAG version of: https://llvm.org/bugs/show_bug.cgi?id=21768 ...if we fix that, then the transform becomes even more profitable on all targets.] Differential Revision: http://reviews.llvm.org/D8941 llvm-svn: 235012
* [msp430] Only support the 'm' inline assembly memory constraint. NFC.Daniel Sanders2015-04-151-6/+0
| | | | | | | | | | | | | | Summary: MSP430 doesn't seem to have any additional constraints. Therefore remove the target hook. No functional change intended. Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8208 llvm-svn: 235003
* [mips] [IAS] Refactor the function which checks for the availability of AT. NFC.Toma Tabacu2015-04-151-8/+13
| | | | | | | | | | | | | | | | | | Summary: Refactor MipsAsmParser::getATReg to return an internal register number instead of a register index. Also change all the int's to unsigned, seeing as the current AT register index is stored as an unsigned in MipsAssemblerOptions. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8478 llvm-svn: 234996
* [bpf] fix buildAlexei Starovoitov2015-04-155-9/+10
| | | | | | fix build due to refactoring in DIL/MDL and raw_pwrite_stream llvm-svn: 234971
* Change range-based for-loops to be -Wrange-loop-analysis clean.Richard Trieu2015-04-151-1/+1
| | | | | | No functionality change. llvm-svn: 234963
* Use raw_pwrite_stream in the object writer/streamer.Rafael Espindola2015-04-1452-104/+117
| | | | | | The ELF object writer will take advantage of that in the next commit. llvm-svn: 234950
* Correct 'teh' and other typos / repeated words.Ed Maste2015-04-141-1/+1
| | | | | | | | Patch by Eitan Adler. Differential Revision: http://reviews.llvm.org/D8514 llvm-svn: 234939
* Refactor: Simplify boolean expressions in ARM targetAlexander Kornienko2015-04-142-28/+25
| | | | | | | | | | Simplify boolean expressions using `true` and `false` with `clang-tidy` http://reviews.llvm.org/D8524 Patch by Richard Thomson! llvm-svn: 234901
* [AArch64] Allow non-standard INS/DUP encodingsBradley Smith2015-04-142-10/+10
| | | | | | | | | | The ARMv8 ARMARM states that for these instructions in A64 state: "Unspecified bits in "imm5" are ignored but should be set to zero by an assembler.", (imm4 for INS). Make the disassembler accept any encoding with these ignored bits set to 1. llvm-svn: 234896
* R600/SI: Fix verifier error caused by SIAnnotateControlFlowTom Stellard2015-04-141-6/+13
| | | | | | | | | | | | | | This pass will always try to insert llvm.SI.ifbreak intrinsics in the same block that its conditional value is computed in. This is a problem when conditions for breaks or continue are computed outside of the loop, because the llvm.SI.ifbreak intrinsic ends up being inserted outside of the loop. This patch fixes this problem by inserting the llvm.SI.ifbreak intrinsics in the loop header when the condition is computed outside the loop. llvm-svn: 234891
* Re-enable target-specific relocation table sorting and use it for MipsPetar Jovanovic2015-04-141-0/+180
| | | | | | | | | | | | | | Some targets (ie. Mips) have additional rules for ordering the relocation table entries. Allow them to override generic sortRelocs(), which sorts entries by Offset. Then override this function for Mips, to emit HI16 and GOT16 relocations against the local symbol in pair with the corresponding LO16 relocation. Patch by Vladimir Stefanovic. Differential Revision: http://reviews.llvm.org/D7414 llvm-svn: 234883
* DebugInfo: Gut DISubprogram and DILexicalBlock*Duncan P. N. Exon Smith2015-04-141-3/+3
| | | | | | | Gut the `DIDescriptor` wrappers around `MDLocalScope` subclasses. Note that `DILexicalBlock` wraps `MDLexicalBlockBase`, not `MDLexicalBlock`. llvm-svn: 234850
* DebugInfo: Gut DIVariable and DIGlobalVariableDuncan P. N. Exon Smith2015-04-141-2/+2
| | | | | | | | | | Gut all the non-pointer API from the variable wrappers, except an implicit conversion from `DIGlobalVariable` to `DIDescriptor`. Note that if you're updating out-of-tree code, `DIVariable` wraps `MDLocalVariable` (`MDVariable` is a common base class shared with `MDGlobalVariable`). llvm-svn: 234840
* Expand ADDO/SUBO on HexagonKrzysztof Parzyszek2015-04-131-0/+8
| | | | llvm-svn: 234795
* Revert revisions r234755, r234759, r234760Jan Vesely2015-04-137-61/+2
| | | | | | | | | | | Revert "Remove default in fully-covered switch (to fix Clang -Werror -Wcovered-switch-default)" Revert "R600: Add carry and borrow instructions. Use them to implement UADDO/USUBO" Revert "LegalizeDAG: Try to use Overflow operations when expanding ADD/SUB" Using overflow operations fails CodeGen/Generic/2011-07-07-ScheduleDAGCrash.ll on hexagon, nvptx, and r600. Revert while I investigate. llvm-svn: 234768
* Allow memory intrinsics to be tail callsKrzysztof Parzyszek2015-04-1310-11/+20
| | | | llvm-svn: 234764
* R600: Add carry and borrow instructions. Use them to implement UADDO/USUBOJan Vesely2015-04-137-2/+61
| | | | | | | | | | | | | | | | v2: tighten the sub64 tests v3: rename to CARRY/BORROW v4: fixup test cmdline add known bits computation use sign extend instead of sub 0,x better add test v5: remove redundant break move lowering to separate functions fix comments Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> Reviewers: arsenm llvm-svn: 234759
* R600: Make FMIN/MAXNUM legal on all asicsJan Vesely2015-04-123-2/+7
| | | | | | | | v2: Add tests Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> reviewer: arsenm llvm-svn: 234716
* R600: remove manual BFE optimizationJan Vesely2015-04-121-8/+2
| | | | | | | | Fixed since r233079 Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu> reviewer: arsenm llvm-svn: 234715
* [PowerPC] Really iterate over all loops in PPCLoopDataPrefetch/PPCLoopPreIncPrepHal Finkel2015-04-122-14/+6
| | | | | | | | When I fixed these a couple of days ago to iterate over all loops, not just depth == 1 loops, I inadvertently made it such that we'd only look at the first top-level loop. Make sure that we really look at all of them. llvm-svn: 234705
* [PowerPC] Disable part-word atomics on the P7Hal Finkel2015-04-111-2/+2
| | | | | | | As it turns out, even though these are part of ISA 2.06, the P7 does not support them (or, at least, not any P7s we're tested so far). llvm-svn: 234686
* Add direct moves to/from VSR and exploit them for FP/INT conversionsNemanja Ivanovic2015-04-118-1/+134
| | | | | | | | | | This patch corresponds to review: http://reviews.llvm.org/D8928 It adds direct move instructions to/from VSX registers to GPR's. These are exploited for FP <-> INT conversions. llvm-svn: 234682
* Use 'override/final' instead of 'virtual' for overridden methodsAlexander Kornienko2015-04-1127-29/+31
| | | | | | | | | | | | | | The patch is generated using clang-tidy misc-use-override check. This command was used: tools/clang/tools/extra/clang-tidy/tool/run-clang-tidy.py \ -checks='-*,misc-use-override' -header-filter='llvm|clang' \ -j=32 -fix -format http://reviews.llvm.org/D8925 llvm-svn: 234679
* [PowerPC] Fix PPCLoopPreIncPrep for depth > 1 loopsHal Finkel2015-04-111-10/+27
| | | | | | | | | This pass had the same problem as the data-prefetching pass: it was only checking for depth == 1 loops in practice. Fix that, add some debugging statements, and make sure that, when we grab an AddRec, it is for the loop we expect. llvm-svn: 234670
* [CodeGen] Split -enable-global-merge into ARM and AArch64 options.Ahmed Bougacha2015-04-112-2/+16
| | | | | | | | | | | | | Currently, there's a single flag, checked by the pass itself. It can't force-enable the pass (and is on by default), because it might not even have been created, as that's the targets decision. Instead, have separate explicit flags, so that the decision is consistently made in the target. Keep the flag as a last-resort "force-disable GlobalMerge" for now, for backwards compatibility. llvm-svn: 234666
* [AArch64] Strengthen the code for the prologue insertion.Quentin Colombet2015-04-101-0/+2
| | | | | | | | | The spilled registers are pristine and thus, correctly handled by the register scavenger and so on, but the liveness information is strictly speaking wrong at this point. Fix that. llvm-svn: 234664
* [PowerPC] Prefetching should also consider depth > 1 loopsHal Finkel2015-04-101-2/+5
| | | | | | | Iterating over loops from the LoopInfo instance only provides top-level loops. We need to search the whole tree of loops to find the inner ones. llvm-svn: 234603
* [mips] [IAS] Improve comments in MipsAsmParser::expandLoadImm. NFC.Toma Tabacu2015-04-101-7/+5
| | | | llvm-svn: 234595
* [AArch64] Changes some SchedAlias to WriteRes for Cortex-A57.Chad Rosier2015-04-101-3/+8
| | | | | | | | | | | | Using SchedAliases is convenient and works well for latency and resource lookup for instructions. However, this creates an entry in AArch64WriteLatencyTable with a WriteResourceID of 0, breaking any SchedReadAdvance since the lookup will fail. http://reviews.llvm.org/D8043 Patch by Dave Estes <cestes@codeaurora.org>! llvm-svn: 234594
* [AArch64] Adjusts Cortex-A57 machine model to handle zero shift.Chad Rosier2015-04-101-0/+9
| | | | | | | http://reviews.llvm.org/D8043 Patch by Dave Estes <cestes@codeaurora.org>! llvm-svn: 234593
* Reduce dyn_cast<> to isa<> or cast<> where possible.Benjamin Kramer2015-04-1013-39/+36
| | | | | | No functional change intended. llvm-svn: 234586
* Divergence analysis for GPU programsJingyue Wu2015-04-102-0/+72
| | | | | | | | | | | | | | | | | | | Summary: Some optimizations such as jump threading and loop unswitching can negatively affect performance when applied to divergent branches. The divergence analysis added in this patch conservatively estimates which branches in a GPU program can diverge. This information can then help LLVM to run certain optimizations selectively. Test Plan: test/Analysis/DivergenceAnalysis/NVPTX/diverge.ll Reviewers: resistor, hfinkel, eliben, meheff, jholewinski Subscribers: broune, bjarke.roune, madhur13490, tstellarAMD, dberlin, echristo, jholewinski, llvm-commits Differential Revision: http://reviews.llvm.org/D8576 llvm-svn: 234567
* [PowerPC] Don't crash on PPC32 i64 fp_to_uint on modern coresHal Finkel2015-04-101-0/+1
| | | | | | | | | | When we have an instruction for this (and, thus, don't generate a runtime call), we need to custom type legalize this (in a trivial way, just as we do for fp_to_sint). Fixes PR23173. llvm-svn: 234561
* [AArch64] Promote f16 operations to f32.Ahmed Bougacha2015-04-101-8/+50
| | | | | | | | | | | | | | | | | | | | For the most common ones (such as fadd), we already did the promotion. Do the same thing for all the others. Currently, we'll just crash/assert on all these operations, as there's no hardware or libcall support whatsoever. f16 (half) is specified as an interchange - not arithmetic - format, and is expected to be promoted to single-precision for arithmetic operations. While there, teach the legalizer about promoting some of the (mostly floating-point) operations that we never needed before. Differential Revision: http://reviews.llvm.org/D8648 See related discussion on the thread for: http://reviews.llvm.org/D8755 llvm-svn: 234550
* Add LLVM support for remaining integer divide and permute instructions from ↵Nemanja Ivanovic2015-04-096-52/+133
| | | | | | | | | | | ISA 2.06 This is the patch corresponding to review: http://reviews.llvm.org/D8406 It adds some missing instructions from ISA 2.06 to the PPC back end. llvm-svn: 234546
* Simplify use of formatted_raw_ostream.Rafael Espindola2015-04-093-12/+14
| | | | | | | | | | | | | | | formatted_raw_ostream is a wrapper over another stream to add column and line number tracking. It is used only for asm printing. This patch moves the its creation down to where we know we are printing assembly. This has the following advantages: * Simpler lifetime management: std::unique_ptr * We don't compute column and line number of object files :-) llvm-svn: 234535
* [AArch64][FastISel] Fix integer extend optimization.Juergen Ributzka2015-04-091-5/+6
| | | | | | | | | | | | | | The integer extend optimization tries to fold the extend into the load instruction. This requires us to identify if the extend has already been emitted or not and act accordingly on it. The check that was originally performed for this was not sufficient. Besides checking the ValueMap for a mapped register we also need to check if the virtual register has already an associated machine instruction that defines it. This fixes rdar://problem/20470788. llvm-svn: 234529
* Remove duplicated code and consolidate initializers.Eric Christopher2015-04-092-15/+5
| | | | llvm-svn: 234525
* clang-format bits of code to make a followup patch easy to read.Rafael Espindola2015-04-0911-26/+16
| | | | llvm-svn: 234519
* Use a raw_svector_ostream instead of a raw_string_ostream.Rafael Espindola2015-04-091-6/+8
| | | | | | It saves a bit of copying. llvm-svn: 234507
* Don't repeat name in comment. NFC.Rafael Espindola2015-04-094-24/+22
| | | | llvm-svn: 234506
* This reverts commit r234460 and r234461.Rafael Espindola2015-04-093-8/+6
| | | | | | | | | Revert "Add classof implementations to the raw_ostream classes." Revert "Use the cast machinery to remove dummy uses of formatted_raw_ostream." The underlying issue can be fixed without classof. llvm-svn: 234495
* [ARM] support for Cortex-R4/R4FJaved Absar2015-04-092-1/+19
| | | | | | | | | Currently, llvm (backend) doesn't know cortex-r4, even though it is the default target for armv7r. Using "--target=armv7r-arm-none-eabi" provokes 'cortex-r4' is not a recognized processor for this target' by llvm. This patch adds support for cortex-r4 and, very closely related, r4f. llvm-svn: 234486
* [mips] Refactor saved-registers bitmask creation in ↵Toma Tabacu2015-04-091-20/+11
| | | | | | | | | | | | | | | | | | | MipsAsmPrinter::printSavedRegsBitmask. NFC. Summary: Make the code more readable by fusing the for-loops together and explicitly checking for each register class. Also, this version is more straightforward because it doesn't assume that FPU registers always come before CPU registers in the CalleeSavedInfo vector. Reviewers: dsanders Reviewed By: dsanders Subscribers: llvm-commits Differential Revision: http://reviews.llvm.org/D8033 llvm-svn: 234475
* [AArch64] Add support for dynamic stack alignmentKristof Beyls2015-04-094-43/+172
| | | | | | Differential Revision: http://reviews.llvm.org/D8876 llvm-svn: 234471
* [AArch64] Remove redundant -march option. Also fix a think-o from r234462.Lang Hames2015-04-091-1/+1
| | | | llvm-svn: 234467
* [AArch64] Teach AArch64TargetLowering::getOptimalMemOpType to consider alignmentLang Hames2015-04-091-1/+11
| | | | | | | | | | | | | restrictions when choosing a type for small-memcpy inlining in SelectionDAGBuilder. This ensures that the loads and stores output for the memcpy won't be further expanded during legalization, which would cause the total number of instructions for the memcpy to exceed (often significantly) the inlining thresholds. <rdar://problem/17829180> llvm-svn: 234462
* Use the cast machinery to remove dummy uses of formatted_raw_ostream.Rafael Espindola2015-04-093-6/+8
| | | | | | | If we know we are producing an object, we don't need to wrap the stream in a formatted_raw_ostream anymore. llvm-svn: 234461
OpenPOWER on IntegriCloud