path: root/llvm/lib/Target
Commit message (Author, Age, Files, Lines)
* Revert "Fix miscompile of MS inline assembly with stack realignment" (Reid Kleckner, 2013-12-10, 2 files, -16/+13)
| | | | | | | This reverts commit r196876. Its tests failed on the bots, so I'll figure it out tomorrow. llvm-svn: 196879
* Fix miscompile of MS inline assembly with stack realignment (Reid Kleckner, 2013-12-10, 2 files, -13/+16)
| For stack frames requiring realignment, three pointers may be needed:
| - ebp to address incoming arguments
| - esi (could be any callee-saved register) to address locals
| - esp to address outgoing arguments
|
| We would use esi unconditionally without verifying that it did not conflict with inline assembly. This change doesn't do the verification, it simply emits a fatal error on functions that use stack realignment, dynamic SP adjustments, and inline assembly.
|
| Because stack realignment is common on Windows, we also no longer assume that MS inline assembly clobbers esp. Instead, we analyze the inline instructions for implicit definitions and check if esp is there. If so, we require the use of a base pointer and consider it in the condition above.
|
| Mostly fixes PR16830, but we could try harder to find a non-conflicting base pointer.
|
| Reviewers: sunfish
| Differential Revision: http://llvm-reviews.chandlerc.com/D1317
|
| llvm-svn: 196876
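The condition described in this message can be sketched as a small predicate. This is only an illustration of the wording above, not the code from the commit; the struct, its fields, and both function names are hypothetical.

```cpp
#include <cassert>

// Hypothetical summary of the frame properties the commit message mentions.
struct FrameProps {
  bool needsStackRealignment;   // frame must be realigned
  bool hasDynamicSPAdjustment;  // esp moves at run time (e.g. dynamic allocas)
  bool hasInlineAsm;            // function contains inline assembly
  bool inlineAsmDefinesESP;     // inline asm implicitly defines esp
};

// A base pointer (esi in the message) is required when the frame is
// realigned and esp cannot be relied on to address locals.
inline bool needsBasePointer(const FrameProps &f) {
  return f.needsStackRealignment &&
         (f.hasDynamicSPAdjustment || f.inlineAsmDefinesESP);
}

// The change does not verify whether esi actually conflicts with the
// inline assembly; it rejects the whole combination.
inline bool wouldEmitFatalError(const FrameProps &f) {
  return needsBasePointer(f) && f.hasInlineAsm;
}

// Usage example.
inline void fatalErrorExamples() {
  assert(wouldEmitFatalError({true, true, true, false}));
  assert(!wouldEmitFatalError({true, false, true, false}));
}
```

Treating an esp-defining inline asm like a dynamic SP adjustment is one reading of "consider it in the condition above".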
* Add comments documenting the ARM datalayout string. (Rafael Espindola, 2013-12-10, 1 file, -0/+12)
| | | | llvm-svn: 196850
* Simplify further. (Rafael Espindola, 2013-12-10, 1 file, -12/+4)
| | | | | | Thanks to Jim Grosbach for noticing it. llvm-svn: 196846
* Refactor the construction of the DataLayout string on ARM. (Rafael Espindola, 2013-12-09, 1 file, -19/+39)
| | | | llvm-svn: 196843
* [AArch64] Refactor the NEON scalar reduce pairwise intrinsics, so that they use float/double rather than the vector equivalents when appropriate. (Chad Rosier, 2013-12-09, 1 file, -3/+3)
| llvm-svn: 196833
* [AArch64] Refactor NEON scalar reduce pairwise front-end codegen to remove unnecessary patterns in tablegen. (Chad Rosier, 2013-12-09, 1 file, -16/+1)
| llvm-svn: 196832
* [AArch64] Remove q and non-q intrinsic definitions in the NEON scalar reduce pairwise implementation, using an overloaded definition instead. (Chad Rosier, 2013-12-09, 1 file, -14/+13)
| llvm-svn: 196831
* get rid of superfluous comment (Reed Kotler, 2013-12-09, 1 file, -1/+0)
| | | | llvm-svn: 196829
* Delete some old code used for testing that is not needed anymore. (Reed Kotler, 2013-12-09, 1 file, -87/+33)
| | | | | | This is part of the mips16 epilogue/prologue cleanup. llvm-svn: 196824
* Don't add suffixes for stdcall/fastcall on 64 coff. (Rafael Espindola, 2013-12-09, 1 file, -0/+2)
| | | | | | This matches the behavior of both msvc and mingw. llvm-svn: 196814
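As a rough illustration of the rule in this message: on 32-bit COFF, stdcall and fastcall symbols carry an "@<argument bytes>" suffix, and on 64-bit COFF they do not. The sketch below models only the suffix; the enum and function names are hypothetical and are not the mangler's API.

```cpp
#include <cassert>
#include <string>

enum class CallConv { C, StdCall, FastCall };

// Returns the "@<bytes>" suffix for a symbol, or an empty string when no
// suffix applies. Prefix characters are deliberately not modelled here.
inline std::string argSizeSuffix(CallConv cc, unsigned argBytes,
                                 bool is64BitCOFF) {
  if (is64BitCOFF)
    return "";  // the behaviour this commit introduces
  if (cc == CallConv::StdCall || cc == CallConv::FastCall)
    return "@" + std::to_string(argBytes);
  return "";
}

// Usage example.
inline void suffixExamples() {
  assert(argSizeSuffix(CallConv::StdCall, 8, false) == "@8");
  assert(argSizeSuffix(CallConv::StdCall, 8, true).empty());
}
```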
* Don't set a variable to its default value. (Rafael Espindola, 2013-12-09, 1 file, -1/+0)
| | | | llvm-svn: 196807
* Fix pattern match for movi with 0D result (Ana Pazos, 2013-12-09, 1 file, -3/+2)
| Patch by Jiangning Liu.
|
| With some test case changes:
| - intrinsic test added to the existing /test/CodeGen/AArch64/neon-aba-abd.ll.
| - New test cases to cover movi 1D scenario without using the intrinsic in test/CodeGen/AArch64/neon-mov.ll.
|
| llvm-svn: 196806
* [mips][msa] Fix invalid generated code when lowering FrameIndex involving unaligned offsets. (Daniel Sanders, 2013-12-09, 1 file, -2/+21)
| Summary:
| The MSA ld.[bhwd] and st.[bhwd] instructions scale the immediate by the element size before use as an offset. The offset must therefore be a multiple of the element size to be valid in these instructions. However, an unaligned base address is valid in MSA.
|
| This commit causes the compiler to emit valid code when the calculated offset is not a multiple of the element size by accounting for the offset using addiu and using a zero offset in the load/store.
|
| Depends on D2338
|
| Reviewers: matheusalmeida
| Reviewed By: matheusalmeida
| Differential Revision: http://llvm-reviews.chandlerc.com/D2339
|
| llvm-svn: 196777
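The lowering decision described above can be sketched as follows. This is a simplified model and not the code from the patch: the type and function names are hypothetical, and the range limits of both the load/store immediate and addiu are ignored.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result of lowering "base + byteOffset" for an MSA load/store.
struct LoweredAddress {
  int64_t addiuAmount;  // added to the base register first (0 means no addiu)
  int64_t immediate;    // byte offset left in the ld/st instruction itself
};

inline LoweredAddress lowerFrameOffset(int64_t byteOffset, int64_t elemSize) {
  assert(elemSize > 0);
  // An offset that is a multiple of the element size can stay in the
  // instruction, because the hardware scales the encoded immediate.
  if (byteOffset % elemSize == 0)
    return {0, byteOffset};
  // Otherwise the whole offset is applied with addiu, and the load/store
  // uses a zero offset.
  return {byteOffset, 0};
}

// Usage example: a 4-byte-element access at offsets 16 and 6.
inline void lowerFrameOffsetExamples() {
  assert(lowerFrameOffset(16, 4).immediate == 16);
  assert(lowerFrameOffset(6, 4).addiuAmount == 6);
}
```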
* [mips][msa] Fix suboptimal FrameIndex lowering for ld.[hwd] and st.[hwd] (Daniel Sanders, 2013-12-09, 1 file, -15/+20)
| Summary:
| The immediate in these instructions is scaled before use as an offset. They therefore have a wider reach than ld.b/st.b.
|
| Reviewers: matheusalmeida
| Reviewed By: matheusalmeida
| Differential Revision: http://llvm-reviews.chandlerc.com/D2338
|
| llvm-svn: 196775
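To make "wider reach" concrete, the sketch below computes the byte range reachable by a scaled immediate. It assumes a signed 10-bit immediate field, which the message itself does not state; the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct OffsetRange {
  int64_t min;  // lowest reachable byte offset
  int64_t max;  // highest reachable byte offset
};

// Byte offsets reachable when the encoded immediate is multiplied by the
// element size. Assumption: the field is a signed 10-bit value.
inline OffsetRange reachableOffsets(int64_t elemSize) {
  assert(elemSize > 0);
  const int64_t immMin = -512;
  const int64_t immMax = 511;
  return {immMin * elemSize, immMax * elemSize};
}

// Usage example: byte elements versus doubleword elements.
inline void reachExamples() {
  assert(reachableOffsets(1).max == 511);
  assert(reachableOffsets(8).max == 4088);
}
```

Under that assumption, the doubleword forms reach eight times as far as ld.b/st.b.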
* Method parseSetAssignment treats every operand with '$' sign as register and the parsing is directed to set alias for register. (Vladimir Medic, 2013-12-09, 1 file, -16/+1)
| This will result in errors reported when expressions containing label references are parsed (for example, long jumps). As we can't make a complete solution now, it has been decided to enable the .set directive to handle long jump expressions. This will cause the parser to report errors when parsing integer-based register assignments, for example: .set r3, will be reported as error. Still, the need for expressions is higher priority, as the integer-based register assignments are Mips specific and can be avoided using register names.
|
| llvm-svn: 196773
* [SPARCV9]: Adjust the resultant pointer of DYNAMIC_STACKALLOC with the stack BIAS on sparcV9. (Venkatraman Govindaraju, 2013-12-09, 1 file, -3/+5)
| llvm-svn: 196755
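A minimal sketch of what adjusting for the stack bias means: in the 64-bit SPARC V9 ABI the stack pointer register holds the real address minus 2047, so a pointer derived from it has to add the bias back. The function name and the reservedBytes parameter are hypothetical and only illustrate the arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Stack bias of the 64-bit SPARC V9 ABI.
constexpr int64_t kStackBias = 2047;

// Address of a dynamically allocated block, given the biased stack pointer
// after the allocation. reservedBytes stands for whatever area sits between
// the stack pointer and the block (hypothetical parameter).
inline int64_t dynamicAllocaAddress(int64_t biasedSP, int64_t reservedBytes) {
  return biasedSP + kStackBias + reservedBytes;
}

// Usage example.
inline void stackBiasExample() {
  assert(dynamicAllocaAddress(1000, 0) == 3047);
}
```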
* [Sparc]: Implement getSetCCResultType() in SparcTargetLowering so that umulo/smulo can be lowered on sparcv9 without an assertion error. (Venkatraman Govindaraju, 2013-12-09, 2 files, -0/+9)
| llvm-svn: 196751
* [AArch64] Add missing pair intrinsics such as: (Hao Liu, 2013-12-09, 1 file, -0/+19)
| int32_t vminv_s32(int32x2_t a)
| which should be compiled into SMINP Vd.2S,Vn.2S,Vm.2S
|
| llvm-svn: 196749
* [AArch64] Pattern match failures for truncate store and extend load (Hao Liu, 2013-12-09, 1 file, -0/+19)
| | | | llvm-svn: 196748
* [SparcV9]: Expand MULHU/MULHS:i64 and UMUL_LOHI/SMUL_LOHI:i64 on sparcv9. (Venkatraman Govindaraju, 2013-12-08, 1 file, -0/+7)
| | | | | | This fixes PR18150. llvm-svn: 196735
* Revert 196544 due to internal bot failures. (Manman Ren, 2013-12-08, 1 file, -25/+0)
| | | | llvm-svn: 196732
* Make sure we mark these registers as defined. Previously this was done in the td file. (Reed Kotler, 2013-12-08, 1 file, -6/+10)
| llvm-svn: 196731
* Cleaning up of prologue/epilogue code for Mips16. First step here is to make save/restore into variable number of argument instructions. (Reed Kotler, 2013-12-08, 4 files, -33/+53)
| llvm-svn: 196726
* ARM: fix folding of stack-adjustment (yet again). (Tim Northover, 2013-12-08, 1 file, -3/+4)
| When trying to eliminate an "sub sp, sp, #N" instruction by folding it into an existing push/pop using dummy registers, we need to account for the fact that this might affect precisely how "fp" gets set in the prologue.
|
| We were attempting this, but assuming that *whenever* we performed a fold it would make a difference. This is false, for example, in:
|
|     push {r4, r7, lr}
|     add fp, sp, #4
|     vpush {d8}
|     sub sp, sp, #8
|
| we can fold the "sub" into the "vpush", forming "vpush {d7, d8}". However, in that case the "add fp" instruction mustn't change, which we were getting wrong before.
|
| Should fix PR18160.
|
| llvm-svn: 196725
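The bookkeeping in this message can be modelled with two small helpers. This is a simplified sketch under stated assumptions, not the code from the fix: the function names are hypothetical, and the "fold before fp setup" case assumes the dummy registers are numbered below the saved ones, so they occupy the lowest addresses of the push.

```cpp
#include <cassert>

// Number of dummy registers needed to absorb "sub sp, sp, #bytes" into a
// push/vpush whose registers are regSize bytes each. Returns -1 when the
// adjustment is not a whole number of registers.
inline int dummyRegsForFold(int bytes, int regSize) {
  assert(regSize > 0 && bytes >= 0);
  return bytes % regSize == 0 ? bytes / regSize : -1;
}

// Offset to use in "add fp, sp, #offset" after a fold. Only a fold into a
// push that executes before the fp setup moves sp relative to the saved
// frame pointer slot.
inline int fpOffsetAfterFold(int fpOffset, int foldedBytes,
                             bool foldedBeforeFpSetup) {
  return foldedBeforeFpSetup ? fpOffset + foldedBytes : fpOffset;
}

// Usage example: the case from the message. The 8-byte "sub" becomes one
// extra d register (d7) in the vpush, which comes after "add fp, sp, #4",
// so the fp offset stays at 4.
inline void foldExample() {
  assert(dummyRegsForFold(8, 8) == 1);
  assert(fpOffsetAfterFold(4, 8, false) == 4);
}
```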
* Remove the notion of primitive types. (Rafael Espindola, 2013-12-07, 3 files, -26/+26)
| | | | | | | | | | They were out of place since the introduction of arbitrary precision integer types. This also synchronizes the documentation to Types.h, so it refers to first class types and single value types. llvm-svn: 196661
* Add a RequireStructuredCFG Field to TargetMachine. (Vincent Lejeune, 2013-12-07, 2 files, -0/+2)
| | | | llvm-svn: 196634
* R600: Remove orphaned declarations (Vincent Lejeune, 2013-12-07, 1 file, -3/+0)
| | | | llvm-svn: 196633
* Added support for mcpu krait (Ana Pazos, 2013-12-06, 3 files, -3/+24)
| - krait processor currently modeled with the same features as A9.
| - Krait processor additionally has VFP4 (fused multiply add/sub) and hardware division features enabled.
| - krait has currently the same Schedule model as A9
| - krait cpu flag is not recognized by the GNU assembler yet, it is replaced with march=armv7-a to avoid a lower march from being used.
|
| llvm-svn: 196619
* Bug 18149: [AArch32] VSel instructions have no ARMCC field (Weiming Zhao, 2013-12-06, 1 file, -7/+34)
| The current peephole optimization for compare instructions assumes that an instruction that uses CPSR has an MO for the ARM condition code. However, for VSEL instructions (vseleq, vselge, vselgt, vselvs), there is no such operand, nor do they support modification of the condition code.
|
| llvm-svn: 196588
* Update AVX512 vector blend intrinsic names. (Cameron McInally, 2013-12-06, 1 file, -4/+4)
| | | | llvm-svn: 196581
* [SystemZ] Use LOAD AND TEST for comparisons with -0 (Richard Sandiford, 2013-12-06, 2 files, -5/+8)
| ...since it is equivalent to comparison with +0.
|
| llvm-svn: 196580
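The equivalence relied on here is an IEEE 754 property: -0 and +0 compare equal even though their sign bits differ. A short, target-independent check (the function name is hypothetical):

```cpp
#include <cassert>
#include <cmath>

// Comparing against -0.0 gives the same answer as comparing against +0.0.
inline bool comparesEqualToZero(double x) { return x == 0.0; }

// Usage example.
inline void negativeZeroExample() {
  assert(comparesEqualToZero(-0.0));
  assert(-0.0 == 0.0);
  // The two zeros are still distinct values: only the sign bit differs.
  assert(std::signbit(-0.0) && !std::signbit(0.0));
}
```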
* [SystemZ] Extend the use of C(L)GFR (Richard Sandiford, 2013-12-06, 1 file, -2/+19)
| | | | | | | instcombine prefers to put extended operands first, so this patch handles that case for C(L)GFR. llvm-svn: 196579
* [SystemZ] Optimize selects between 0 and -1 (Richard Sandiford, 2013-12-06, 1 file, -14/+44)
| Since z has no setcc instruction as such, the choice of setBooleanContents is a bit arbitrary. Currently it's set to ZeroOrOneBooleanContent, so we produced a branch-free form when selecting between 0 and 1, but not when selecting between 0 and -1. This patch handles the latter case too.
|
| At some point I'd like to measure whether it's better to use conditional moves for constant selects on z196, but that's future work.
|
| llvm-svn: 196578
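In source-level terms, the two branch-free selects discussed above look like this. It is a target-independent sketch of the idea, not the SystemZ lowering itself; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Select between 0 and 1: the condition is already the result.
inline int32_t selectZeroOrOne(bool cond) {
  return static_cast<int32_t>(cond);
}

// Select between 0 and -1: negate the 0/1 condition instead of branching.
inline int32_t selectZeroOrMinusOne(bool cond) {
  return -static_cast<int32_t>(cond);
}

// Usage example.
inline void selectExamples() {
  assert(selectZeroOrOne(true) == 1);
  assert(selectZeroOrMinusOne(true) == -1);
  assert(selectZeroOrMinusOne(false) == 0);
}
```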
* Fix an index array check. (Eric Christopher, 2013-12-06, 1 file, -1/+1)
| | | | | | Patch by Marius Wachtler. llvm-svn: 196561
* Delete dead code. (Reed Kotler, 2013-12-06, 2 files, -21/+0)
| | | | llvm-svn: 196551
* Apply transformation on OS X 10.9+ and iOS 7.0+: pow(10, x) -> __exp10(x) (Yi Jiang, 2013-12-05, 1 file, -0/+25)
| | | | llvm-svn: 196544
* Implemented vget/vset_lane_f16 intrinsics (Ana Pazos, 2013-12-05, 1 file, -1/+105)
| | | | llvm-svn: 196533
* MI-Sched: handle latency of in-order operations with the new machine model. (Andrew Trick, 2013-12-05, 1 file, -1/+1)
| The per-operand machine model allows the target to define "unbuffered" processor resources. This change is a quick, cheap way to model stalls caused by the latency of operations that use such resources. This only applies when the processor's micro-op buffer size is non-zero (Out-of-Order).
|
| We can't precisely model in-order stalls during out-of-order execution, but this is an easy and effective heuristic. It benefits cortex-a9 scheduling when using the new machine model, which is not yet on by default.
|
| MI-Sched for armv7 was evaluated on Swift (and only not enabled because of a performance bug related to predication). However, we never evaluated Cortex-A9 performance on MI-Sched in its current form. This change adds MI-Sched functionality to reach performance goals on A9. The only remaining change is to allow MI-Sched to run as a PostRA pass.
|
| I evaluated performance using a set of options to estimate the performance impact once MI sched is default on armv7:
| -mcpu=cortex-a9 -disable-post-ra -misched-bench -scheditins=false
|
| For a simple saxpy loop I see a 1.7x speedup.
|
| Here are the llvm-testsuite results:
| (min run time over 2 runs, filtering tiny changes)
|
| Speedups:
| | Benchmarks/BenchmarkGame/recursive | 52.39% |
| | Benchmarks/VersaBench/beamformer | 20.80% |
| | Benchmarks/Misc/pi | 19.97% |
| | Benchmarks/Misc/mandel-2 | 19.95% |
| | SPEC/CFP2000/188.ammp | 18.72% |
| | Benchmarks/McCat/08-main/main | 18.58% |
| | Benchmarks/Misc-C++/Large/sphereflake | 18.46% |
| | Benchmarks/Olden/power | 17.11% |
| | Benchmarks/Misc-C++/mandel-text | 16.47% |
| | Benchmarks/Misc/oourafft | 15.94% |
| | Benchmarks/Misc/flops-7 | 14.99% |
| | Benchmarks/FreeBench/distray | 14.26% |
| | SPEC/CFP2006/470.lbm | 14.00% |
| | mediabench/mpeg2/mpeg2dec/mpeg2decode | 12.28% |
| | Benchmarks/SmallPT/smallpt | 10.36% |
| | Benchmarks/Misc-C++/Large/ray | 8.97% |
| | Benchmarks/Misc/fp-convert | 8.75% |
| | Benchmarks/Olden/perimeter | 7.10% |
| | Benchmarks/Bullet/bullet | 7.03% |
| | Benchmarks/Misc/mandel | 6.75% |
| | Benchmarks/Olden/voronoi | 6.26% |
| | Benchmarks/Misc/flops-8 | 5.77% |
| | Benchmarks/Misc/matmul_f64_4x4 | 5.19% |
| | Benchmarks/MiBench/security-rijndael | 5.15% |
| | Benchmarks/Misc/flops-6 | 5.10% |
| | Benchmarks/Olden/tsp | 4.46% |
| | Benchmarks/MiBench/consumer-lame | 4.28% |
| | Benchmarks/Misc/flops-5 | 4.27% |
| | Benchmarks/mafft/pairlocalalign | 4.19% |
| | Benchmarks/Misc/himenobmtxpa | 4.07% |
| | Benchmarks/Misc/lowercase | 4.06% |
| | SPEC/CFP2006/433.milc | 3.99% |
| | Benchmarks/tramp3d-v4 | 3.79% |
| | Benchmarks/FreeBench/pifft | 3.66% |
| | Benchmarks/Ptrdist/ks | 3.21% |
| | Benchmarks/Adobe-C++/loop_unroll | 3.12% |
| | SPEC/CINT2000/175.vpr | 3.12% |
| | Benchmarks/nbench | 2.98% |
| | SPEC/CFP2000/183.equake | 2.91% |
| | Benchmarks/Misc/perlin | 2.85% |
| | Benchmarks/Misc/flops-1 | 2.82% |
| | Benchmarks/Misc-C++-EH/spirit | 2.80% |
| | Benchmarks/Misc/flops-2 | 2.77% |
| | Benchmarks/NPB-serial/is | 2.42% |
| | Benchmarks/ASC_Sequoia/CrystalMk | 2.33% |
| | Benchmarks/BenchmarkGame/n-body | 2.28% |
| | Benchmarks/SciMark2-C/scimark2 | 2.27% |
| | Benchmarks/Olden/bh | 2.03% |
| | skidmarks10/skidmarks | 1.81% |
| | Benchmarks/Misc/flops | 1.72% |
|
| Slowdowns:
| | Benchmarks/llubenchmark/llu | -14.14% |
| | Benchmarks/Polybench/stencils/seidel-2d | -5.67% |
| | Benchmarks/Adobe-C++/functionobjects | -5.25% |
| | Benchmarks/Misc-C++/oopack_v1p8 | -5.00% |
| | Benchmarks/Shootout/hash | -2.35% |
| | Benchmarks/Prolangs-C++/ocean | -2.01% |
| | Benchmarks/Polybench/medley/floyd-warshall | -1.98% |
| | Polybench/linear-algebra/kernels/3mm | -1.95% |
| | Benchmarks/McCat/09-vor/vor | -1.68% |
|
| llvm-svn: 196516
* Fix the A9 machine model. VTRN writes two registers. (Andrew Trick, 2013-12-05, 1 file, -1/+1)
| | | | llvm-svn: 196514
* Add a default constructor to get deterministic behavior. (Rafael Espindola, 2013-12-05, 1 file, -0/+1)
| | | | | | Should fix the msan and valgrind bots. llvm-svn: 196509
* [NVPTX] Fix off-by-one error when creating the VT list for an SDNode (Justin Holewinski, 2013-12-05, 1 file, -1/+1)
| | | | llvm-svn: 196503
* [mips] Small code generation improvement for conditional operator (select) in case the operands are constants and their difference is |1|. (Matheus Almeida, 2013-12-05, 1 file, -0/+33)
| It should be possible in those cases to rematerialize the result using MIPS's slt and similar instructions.
|
| The small update to some of the tests in cmov.ll, sel1c.ll and sel2c.ll was needed; otherwise the optimization implemented in this patch would have been triggered (the difference between the operands was 1), and that would have changed the semantics of the tests.
|
| llvm-svn: 196498
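The arithmetic behind this optimization can be sketched as follows: when the two constants differ by exactly 1, the select collapses to the "false" constant plus or minus the 0/1 value that a set-on-less-than style instruction produces. This is an illustration, not the code from performSELECTCombine; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Branch-free form of "cond ? onTrue : onFalse" for constants whose
// difference is +1 or -1.
inline int32_t selectAdjacent(bool cond, int32_t onTrue, int32_t onFalse) {
  const int64_t diff = static_cast<int64_t>(onTrue) - onFalse;
  assert(diff == 1 || diff == -1);
  const int32_t flag = cond ? 1 : 0;  // what an slt-like instruction yields
  return diff == 1 ? onFalse + flag : onFalse - flag;
}

// Usage example.
inline void selectAdjacentExamples() {
  assert(selectAdjacent(true, 5, 4) == 5);
  assert(selectAdjacent(false, 5, 4) == 4);
}
```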
* [mips] Add some comments related to the optimization performed in performSELECTCombine. (Matheus Almeida, 2013-12-05, 1 file, -8/+21)
| The structure of the code was slightly modified so that the next patch is easier to read/review.
|
| No functional changes.
|
| llvm-svn: 196496
* [mips][msa] Fix issue with immediate fields of LD/ST instructions not being correctly encoded/decoded. (Matheus Almeida, 2013-12-05, 5 files, -5/+95)
| In more detail, immediate fields of LD/ST instructions should be divided/multiplied by the size of the data format before encoding and after decoding, respectively.
|
| llvm-svn: 196494
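The divide-on-encode, multiply-on-decode rule can be written as a pair of helpers. This is a minimal sketch, assuming byte offsets that are already multiples of the data format size; the function names are hypothetical and are not the encoder or disassembler hooks from the patch.

```cpp
#include <cassert>
#include <cstdint>

// Value stored in the instruction's immediate field for a byte offset.
inline int64_t encodeScaledImm(int64_t byteOffset, int64_t formatSize) {
  assert(formatSize > 0 && byteOffset % formatSize == 0);
  return byteOffset / formatSize;
}

// Byte offset recovered from the instruction's immediate field.
inline int64_t decodeScaledImm(int64_t field, int64_t formatSize) {
  assert(formatSize > 0);
  return field * formatSize;
}

// Usage example: a word (4-byte) access at byte offset 16 round-trips.
inline void scaledImmExample() {
  assert(encodeScaledImm(16, 4) == 4);
  assert(decodeScaledImm(encodeScaledImm(16, 4), 4) == 16);
}
```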
* ARM: fix yet another stack-folding bug (Tim Northover, 2013-12-05, 1 file, -6/+1)
| | | | | | | | | | | We were trying to fold the stack adjustment into the wrong instruction in the situation where the entire basic-block was epilogue code. Really, it can only ever be valid to do the folding precisely where the "add sp, ..." would be placed so there's no need for a separate iterator to track that. Should fix PR18136. llvm-svn: 196493
* Remove the isImplicitlyPrivate argument of getNameWithPrefix. (Rafael Espindola, 2013-12-05, 4 files, -7/+14)
| | | | | | | | | | | | getSymbolWithGlobalValueBase use is to create a name of a new symbol based on the name of an existing GV. Assert that and then remove the last call to pass true to isImplicitlyPrivate. This gives the mangler API a 1:1 mapping from GV to names, which is what we need to drop the mangler dependency on the target (and use an extended datalayout instead). llvm-svn: 196472
* Correct word hyphenations (Alp Toker, 2013-12-05, 19 files, -21/+21)
| | | | | | | This patch tries to avoid unrelated changes other than fixing a few hyphen-related ambiguities and contractions in nearby lines. llvm-svn: 196471
* Hide the stub created for MO_ExternalSymbol too. (Rafael Espindola, 2013-12-05, 1 file, -5/+11)
| given
|
| declare void @llvm.memset.p0i8.i32(i8* nocapture, i8, i32, i32, i1)
| declare void @foo()
| define void @bar() {
|   call void @foo()
|   call void @llvm.memset.p0i8.i32(i8* null, i8 0, i32 188, i32 1, i1 false)
|   ret void
| }
|
| We used to produce
|
| L_foo$stub:
|   .indirect_symbol _foo
|   .ascii "\364\364\364\364\364"
| _memset$stub:
|   .indirect_symbol _memset
|   .ascii "\364\364\364\364\364"
|
| We now produce a private stub for memset too. Stubs are not needed with recent linkers, but we still produce them for darwin8.
|
| Thanks to David Fang for confirming that gcc used to do this too.
|
| llvm-svn: 196468
* R600/SI: Add comments for number of used registers. (Matt Arsenault, 2013-12-05, 2 files, -14/+56)
| | | | llvm-svn: 196467