path: root/llvm/lib/Target
Commit message | Author | Age | Files | Lines
...
* [X86] Don't transform atomic-load-add into an inc/dec when inc/dec is slow | Robin Morisset | 2014-10-08 | 1 | -3/+4
    llvm-svn: 219357
* [X86] Avoid generating inc/dec when slow for x.atomic_store(1 + x.atomic_load()) | Robin Morisset | 2014-10-08 | 1 | -2/+2
    Summary: I had forgotten to check for NotSlowIncDec in the patterns that can
    generate inc/dec for the above pattern (added in D4796). This currently
    applies to Atom Silvermont, KNL and SKX.
    Test Plan: New checks on atomic_mi.ll
    Reviewers: jfb, nadav
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D5677
    llvm-svn: 219336
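    For illustration, a minimal C++ sketch of the source pattern named in the subject;
    the variable name and memory orders here are hypothetical, not taken from the patch.
    It is the load-add-store shape that the x86 backend can turn into a memory-destination
    add, and into inc/dec when the increment is 1 and inc/dec is not slow on the target:

        #include <atomic>

        std::atomic<int> counter{0};

        // x.atomic_store(1 + x.atomic_load()): a separate load, an add of 1, and
        // a store. On x86 this can collapse into a single memory-destination add
        // (or inc), and inc/dec should be avoided on CPUs where they are slow.
        void bump() {
            counter.store(1 + counter.load(std::memory_order_relaxed),
                          std::memory_order_relaxed);
        }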
* [AVX512] Added intrinsics for 128-, 256- and 512-bit versions of VPCMP/VPCMPU{BWDQ} | Robert Khasanov | 2014-10-08 | 2 | -6/+38
    Added CMP_MASK_CC intrinsic type. Added tests for intrinsics.
    Patch by Sergey Lisitsyn <sergey.lisitsyn@intel.com>
    llvm-svn: 219316
* [AVX512] Refactoring of avx512_binop_rm multiclass through AVX512_masking. | Robert Khasanov | 2014-10-08 | 2 | -133/+60
    Added new arguments for AVX512_masking: InstrItinClass and bit isCommutable.
    No functional change.
    llvm-svn: 219310
* Emit unaligned access build attribute for ARM | Renato Golin | 2014-10-08 | 1 | -0/+7
    Patch by Charlie Turner.
    llvm-svn: 219301
* Refactor isThumb1Only() && isMClass() into a predicate called isV6M() | Renato Golin | 2014-10-08 | 2 | -5/+8
    This must be enforced for all v6M cores, not just the cortex-m0, regardless
    of the user-specified alignment.
    Patch by Charlie Turner.
    llvm-svn: 219300
* Simplify switch statement in ARM subtarget align access | Renato Golin | 2014-10-08 | 1 | -30/+24
    This switch can be reduced to a simpler if/else statement.
    Patch by Charlie Turner.
    llvm-svn: 219299
* Cache TargetLowering on SelectionDAGISel and update previous calls to getTargetLowering() with the cached variable. | Eric Christopher | 2014-10-08 | 3 | -47/+29
    llvm-svn: 219284
* [AArch64] Generate vector signed/unsigned mul and mla/mls long. | Chad Rosier | 2014-10-08 | 3 | -0/+248
    Phabricator Revision: http://reviews.llvm.org/D5589
    Patch by Balaram Makam <bmakam@codeaurora.org>!!
    llvm-svn: 219276
* [X86] Fix a bug with fetch_add(INT32_MIN) | Robin Morisset | 2014-10-07 | 1 | -0/+9
    Summary: Fix pr21099
    The pseudocode of what we were doing (spread through two functions) was:
        if (operand.doesNotFitIn32Bits())
            Opc.initializeWithFoo();
        if (operand < 0)
            operand = -operand;
        if (operand.doesFitIn8Bits())
            Opc.initializeWithBar();
        else if (operand.doesFitIn32Bits())
            Opc.initializeWithBlah();
        doStuff(Opc);
    So for operand == INT32_MIN, Opc was never initialized because the operand
    changes from fitting in 32 bits to not fitting, causing the various
    bugs/error messages noted by pr21099.
    This patch adds an extra test at the beginning for this case, and an
    llvm_unreachable to have a better error message if the operand ends up not
    fitting in 32 bits at the end.
    Test Plan: new test + make check
    Reviewers: jfb
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D5655
    llvm-svn: 219257
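    A standalone sketch (not the LLVM code itself) of why INT32_MIN is the awkward
    case in the pseudocode above: the value fits in 32 bits before the negation
    step, but its negation does not:

        #include <cstdint>
        #include <cstdio>

        int main() {
            // Mirror the pseudocode with a 64-bit temporary so the negation
            // itself is well defined.
            int64_t operand = INT32_MIN;   // -2147483648: fits in 32 bits
            if (operand < 0)
                operand = -operand;        // 2147483648: no longer fits in int32_t
            bool fits32 = operand >= INT32_MIN && operand <= INT32_MAX;
            std::printf("fits in 32 bits after negation: %s\n",
                        fits32 ? "yes" : "no");
            return 0;
        }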
* R600/SI: Refactor VOP3 instruction defs | Tom Stellard | 2014-10-07 | 2 | -61/+64
    llvm-svn: 219256
* R600/SI: Refactor VOPC instruction defs | Tom Stellard | 2014-10-07 | 2 | -213/+217
    llvm-svn: 219255
* R600/SI: Refactor VOP2 instruction defs | Tom Stellard | 2014-10-07 | 2 | -42/+46
    llvm-svn: 219254
* R600/SI: Refactor VOP1 instruction defs | Tom Stellard | 2014-10-07 | 3 | -72/+96
    llvm-svn: 219253
* R600: Remove dead code | Matt Arsenault | 2014-10-07 | 2 | -18/+1
    llvm-svn: 219242
* R600: Remove some redundant initializations from AMDGPUMCAsmInfo | Tom Stellard | 2014-10-07 | 1 | -10/+0
    llvm-svn: 219238
* R600: Use MCAsmInfoELF as AMDGPUMCAsmInfo base class | Tom Stellard | 2014-10-07 | 2 | -3/+8
    The main reason for this is that the MCAsmInfo class, which we were
    previously using as the base class, sets PrivateGlobalPrefix to "L", which
    causes all global functions that start with L to be treated as local
    symbols. MCAsmInfoELF sets PrivateGlobalPrefix to ".L", which is what we
    want, and it is probably a good idea to use this as the base class anyway,
    since we are emitting ELF binaries.
    llvm-svn: 219237
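    A conceptual C++ sketch of the prefix problem described above; this is a
    hypothetical helper, not the MC API, but it shows why a prefix of "L"
    swallows ordinary globals whose names merely start with L while ".L" does not:

        #include <string>

        // Hypothetical check: a symbol is treated as a private/local label if
        // its name begins with the private-global prefix. With prefix "L", a
        // global function named "LerpValues" would wrongly look private; with
        // the ELF convention ".L" it would not.
        bool looksLikePrivateGlobal(const std::string &name,
                                    const std::string &prefix) {
            return name.compare(0, prefix.size(), prefix) == 0;
        }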
* R600/SI: Remove assertion in SIInstrInfo::areLoadsFromSameBasePtr() | Tom Stellard | 2014-10-07 | 1 | -1/+4
    Added a FIXME comment instead; we need to handle the case where the two DS
    instructions being compared have different numbers of operands.
    llvm-svn: 219236
* [asan-asm-instrumentation] CFI directives are generated for .S files. | Yuri Gorshenin | 2014-10-07 | 3 | -10/+51
    Summary: CFI directives are generated for .S files.
    Reviewers: eugenis
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D5520
    llvm-svn: 219199
* [mips] Return {f128} correctly for N32/N64. | Daniel Sanders | 2014-10-07 | 2 | -2/+12
    Summary: According to the ABI documentation, f128 and {f128} should both be
    returned in $f0 and $f2. However, this doesn't match GCC's behaviour, which
    is to return f128 in $f0 and $f2, but {f128} in $f0 and $f1.
    Reviewers: vmedic
    Reviewed By: vmedic
    Subscribers: llvm-commits
    Differential Revision: http://reviews.llvm.org/D5578
    llvm-svn: 219196
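    For illustration only, a hypothetical C++-level analogue of the two return
    types discussed, assuming a GCC-style __float128 type is available for the
    target; the register assignments in the comments restate the commit message
    rather than adding new information:

        // Scalar f128 return: both the ABI documentation and GCC use $f0 and $f2.
        __float128 return_scalar();

        // Single-element aggregate {f128}: the ABI documentation says $f0 and
        // $f2, but GCC actually returns it in $f0 and $f1.
        struct Wrapped { __float128 value; };
        Wrapped return_wrapped();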
* [X86] Fix a bug where the disassembler was ignoring the VEX.W bit in 32-bit mode for certain instructions it shouldn't. | Craig Topper | 2014-10-07 | 1 | -0/+47
    Unfortunately, this isn't easy to fix since there's no simple way to figure
    out from the disassembler tables whether the W-bit is being used to select
    a 64-bit GPR or if it's a required part of the opcode. The fix implemented
    here just looks for "64" in the instruction name and ignores the W-bit in
    32-bit mode if it's present. Fixes PR21169.
    llvm-svn: 219194
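    A standalone sketch of the heuristic described above, with hypothetical
    names; this is not the actual disassembler code, only the shape of the check:

        #include <string>

        // Decide whether to ignore the VEX.W bit when disassembling in 32-bit
        // mode: the workaround treats W as irrelevant for instructions whose
        // names contain "64", i.e. where W selects a 64-bit GPR rather than
        // being a required part of the opcode.
        bool shouldIgnoreWBit(const std::string &instName, bool is32BitMode) {
            return is32BitMode && instName.find("64") != std::string::npos;
        }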
* Formatting fixes. Mostly putting 'else' on the same line as the preceding curly brace. | Craig Topper | 2014-10-07 | 1 | -38/+19
    llvm-svn: 219193
* Fix filename in header and use C++ version of the C header files. | Craig Topper | 2014-10-07 | 1 | -5/+5
    llvm-svn: 219192
* [FastISel][AArch64] Teach the address computation code to also fold sign-/zero-extends. | Juergen Ributzka | 2014-10-07 | 1 | -0/+29
    The code already folds sign-/zero-extends, but only if they are arguments
    to mul and shift instructions. This extends the code to also fold them when
    they are direct inputs.
    llvm-svn: 219187
* [FastISel][AArch64] Teach the address computation to also fold sub instructions. | Juergen Ributzka | 2014-10-07 | 1 | -1/+12
    Tiny enhancement to the address computation code to also fold sub
    instructions if the rhs is constant and can be folded into the offset.
    llvm-svn: 219186
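    A hedged C++ illustration of the kind of source shape this touches; the
    function name and the folded offset in the comment are mine, chosen only to
    show a subtraction with a constant right-hand side feeding an address:

        #include <cstdint>

        // The address feeding the load is (i - 4) scaled by the element size;
        // with a constant rhs the subtraction can be folded into the
        // addressing-mode offset (here -16 bytes) instead of materializing a
        // separate sub.
        int32_t load_with_negative_offset(const int32_t *p, int64_t i) {
            return p[i - 4];
        }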
* [FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction." | Juergen Ributzka | 2014-10-07 | 1 | -64/+90
    This commit fixes an issue with sign-/zero-extending loads that was
    discovered by Richard Barton. We now use the correct load instructions for
    sign-extending loads to 64 bit. Also updated and added more unit tests.
    llvm-svn: 219185
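    As an illustration, a minimal C++ example of the affected pattern: a 32-bit
    load whose result is sign-extended to 64 bits, which on AArch64 should select
    a single sign-extending load (ldrsw) rather than a plain load plus a separate
    extend. The function name is hypothetical:

        #include <cstdint>

        // Implicit sign extension from int32_t to int64_t on return; the load
        // and the extend should fold into one sign-extending load instruction.
        int64_t load_and_sign_extend(const int32_t *p) {
            return *p;
        }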
* ARMInstPrinter.cpp: Suppress a warning for -Asserts. [-Wunused-variable] | NAKAMURA Takumi | 2014-10-06 | 1 | -3/+2
    llvm-svn: 219172
* ARM: silence unused variable warning | Tim Northover | 2014-10-06 | 1 | -2/+2
    llvm-svn: 219128
* ARM: remove dead InstPrinting code | Tim Northover | 2014-10-06 | 2 | -29/+1
    This instruction form is handled by different AsmOperands now, so the code
    is completely dead (and wrong anyway).
    llvm-svn: 219127
* X86: Drop the isConvertibleTo3Addr bit from shufps/shufpd now that we don't convert them anymore. | Benjamin Kramer | 2014-10-06 | 1 | -9/+8
    llvm-svn: 219112
* Add subtarget caches to aarch64, arm, ppc, and x86. | Eric Christopher | 2014-10-06 | 8 | -4/+148
    These will make it easier to test further changes to the code generation
    and optimization pipelines as those are moved to subtargets initialized
    with target feature and target cpu.
    llvm-svn: 219106
* [x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd. | Chandler Carruth | 2014-10-05 | 1 | -28/+0
    This trades a (register-renamer-friendly) movaps for a floating point /
    integer domain cross. That is a very bad trade, even on architectures where
    domain crossing is relatively fast. On any chip where there is even a cycle
    stall, this is a Very Bad Idea. It doesn't even seem likely to cause a
    spill to be introduced because the reason for the copy is to destructively
    shuffle in place.
    Thanks to Ben Kramer for fixing a bug in this code that my new shuffle
    lowering exposed and highlighting that perhaps it should just go away. =]
    llvm-svn: 219090
* X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd. | Benjamin Kramer | 2014-10-05 | 1 | -1/+1
    It's debatable whether this transform is useful at all, but for now make
    sure we don't generate invalid asm.
    llvm-svn: 219084
* AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q. | Elena Demikhovsky | 2014-10-05 | 2 | -2/+50
    This instruction allows broadcasting a mask vector to a data vector.
    llvm-svn: 219083
* [x86] Fix PR21139, one of the last remaining regressions found in the new vector shuffle lowering. | Chandler Carruth | 2014-10-05 | 1 | -9/+17
    This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I
    refactored it a bit to use std::count_if and a mutable array ref but the
    core idea was exactly right.
    I also added some direct testing of this case.
    I believe PR21137 is now the only remaining regression.
    llvm-svn: 219081
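    A minimal, hypothetical sketch of the std::count_if idiom mentioned above,
    not the actual lowering code: counting how many shuffle-mask entries pick
    lanes from a given input:

        #include <algorithm>
        #include <array>

        // For a 4-lane shuffle, mask values 0..3 select lanes of the first
        // input and 4..7 select lanes of the second; count the latter.
        int countSecondInputLanes(const std::array<int, 4> &mask) {
            return static_cast<int>(std::count_if(
                mask.begin(), mask.end(), [](int m) { return m >= 4; }));
        }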
* [x86] Teach the new vector shuffle lowering how to lower 128-bit shuffles using AVX and AVX2 instructions. | Chandler Carruth | 2014-10-05 | 1 | -55/+102
    This fixes PR21138, one of the few remaining regressions impacting
    benchmarks from the new vector shuffle lowering. You may note that it
    "regresses" many of the vperm2x128 test cases -- these were actually
    "improved" by the naive lowering that the new shuffle lowering previously
    did.
    This regression gave me fits. I had this patch ready-to-go about an hour
    after flipping the switch but wasn't sure how to have the best of both
    worlds here and thought the correct solution might be a completely
    different approach to lowering these vector shuffles.
    I'm now convinced this is the correct lowering and the missed optimizations
    shown in vperm2x128 are actually due to missing target-independent DAG
    combines. I've even written most of the needed DAG combine and will submit
    it shortly, but this part is ready and should help some real-world
    benchmarks out.
    llvm-svn: 219079
* HexagonMCCodeEmitter.cpp: Prune 2nd redundant \brief. [-Wdocumentation] | NAKAMURA Takumi | 2014-10-05 | 1 | -1/+1
    llvm-svn: 219073
* HexagonDesc: Update LLVMBuild.txt. | NAKAMURA Takumi | 2014-10-05 | 1 | -1/+1
    llvm-svn: 219071
* [SystemZ] Make operator bool explicit. NFC. | Benjamin Kramer | 2014-10-04 | 2 | -2/+2
    llvm-svn: 219069
* Make AAMDNodes ctor and operator bool (!!!) explicit, mop up bugs and weirdness exposed by it. | Benjamin Kramer | 2014-10-04 | 1 | -1/+1
    llvm-svn: 219068
* Remove unnecessary copying or replace it with moves in a bunch of places. | Benjamin Kramer | 2014-10-04 | 6 | -57/+56
    NFC.
    llvm-svn: 219061
* [x86] Enable the new vector shuffle lowering by default. | Chandler Carruth | 2014-10-04 | 1 | -1/+1
    Update the entire regression test suite for the new shuffles. Remove most
    of the old testing which was devoted to the old shuffle lowering path and
    is no longer relevant really. Also remove a few other random tests that
    only really exercised shuffles and only incidentally or without any
    interesting aspects to them.
    Benchmarking that I have done shows a few small regressions with this on
    LNT, zero measurable regressions on real, large applications, and for
    several benchmarks where the loop vectorizer fires in the hot path it shows
    5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge
    machines. Running on AMD machines shows even more dramatic improvements.
    When using newer ISA vector extensions the gains are much more modest, but
    the code is still better on the whole. There are a few regressions being
    tracked (PR21137, PR21138, PR21139) but by and large this is expected to be
    a win for x86 generated code performance.
    It is also more correct than the code it replaces. I have fuzz tested this
    extensively with ISA extensions up through AVX2 and found no crashes or
    miscompiles (yet...). The old lowering had a few miscompiles and crashers
    after a somewhat smaller amount of fuzz testing.
    There is one significant area where the new code path lags behind and that
    is in AVX-512 support. However, there was *extremely little* support for
    that already and so this isn't a significant step backwards and the new
    framework will probably make it easier to implement lowering that uses the
    full power of AVX-512's table-based shuffle+blend (IMO).
    Many thanks to Quentin, Andrea, Robert, and others for benchmarking
    assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal,
    Eric, and *many* others for answering my incessant questions about how the
    backend actually works. =]
    I will leave the old code path in the tree until the 3 PRs above are at
    least resolved to folks' satisfaction. Then I will rip it (and 1000s of
    lines of code) out. =] I don't expect this flag to stay around for very
    long. It may not survive next week.
    llvm-svn: 219046
* Add fake use to suppress defined-but-unused warnings | Jingyue Wu | 2014-10-04 | 1 | -0/+1
    llvm-svn: 219045
* [x86] Fix a bug in the VZEXT DAG combine that I just made more powerful. | Chandler Carruth | 2014-10-04 | 1 | -3/+23
    It turns out this combine was always somewhat flawed -- there are cases
    where nested VZEXT nodes *can't* be combined: if their types have a
    mismatch that can be observed in the result. While none of these show up
    currently, once I switch to the new vector shuffle lowering a few test
    cases actually form such nested VZEXT nodes. I've not come up with any IR
    pattern that I can sensibly write to exercise this, but it will be covered
    by tests once I flip the switch.
    llvm-svn: 219044
* [x86] Sink a generic combine of VZEXT nodes from the lowering to VZEXT nodes to the DAG combining of them. | Chandler Carruth | 2014-10-04 | 1 | -40/+39
    This will allow the combine to fire on both old vector shuffle lowering and
    the new vector shuffle lowering and generally seems like a cleaner design.
    I've trimmed down the code a bit and tried to make it and the surrounding
    combine fairly clean while moving it around.
    llvm-svn: 219042
* R600/SI: Custom lower f64 -> i64 conversions | Matt Arsenault | 2014-10-03 | 3 | -3/+57
    llvm-svn: 219038
* R600: Custom lower [s|u]int_to_fp for i64 -> f64 | Matt Arsenault | 2014-10-03 | 3 | -2/+46
    llvm-svn: 219037
* R600/SI: Fix ftrunc f64 conformance failures. | Matt Arsenault | 2014-10-03 | 1 | -1/+1
    Re-add the tests since they were deleted at some point.
    llvm-svn: 219036
* [x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation. | Chandler Carruth | 2014-10-03 | 2 | -5/+194
    This then allows me to bail on the element insertion lowering path when we
    have SSE4.1 and are going to be doing a normal blend, which in turn
    restores the last of the blends lost from the new vector shuffle lowering
    when I got it to prioritize insertion in other cases (for example when we
    don't *have* a blend instruction).
    Without the patterns, using blends here would have regressed
    sse-scalar-fp-arith.ll *completely* with the new vector shuffle lowering.
    For completeness, I've added RUN-lines with the new lowering here. This is
    somewhat superfluous as I'm about to flip the default, but hey, it shows
    that this actually significantly changed behavior.
    The patterns I've added are just ridiculously repetitive. Suggestions on
    making them better very much welcome. In particular, handling the commuted
    form of the v2f64 patterns is somewhat obnoxious.
    llvm-svn: 219033
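    For illustration, a hedged intrinsics-level sketch of the re-blend shape
    these patterns target; it requires SSE4.1 and the function name is mine,
    not from the patch. A scalar math op touches only lane 0, so blending its
    result back into the original vector is the "re-blend with the high
    elements" the message describes:

        #include <immintrin.h>

        // addss already leaves lanes 1..3 of 'a' untouched, so the explicit
        // blend afterwards only re-selects lane 0; patterns that recognize this
        // blend form let a plain addss still be selected.
        __m128 scalar_add_then_blend(__m128 a, __m128 b) {
            __m128 sum = _mm_add_ss(a, b);     // lane 0 = a0 + b0, lanes 1..3 = a1..a3
            return _mm_blend_ps(a, sum, 0x1);  // lane 0 from sum, lanes 1..3 from a
        }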
* [x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available. | Chandler Carruth | 2014-10-03 | 2 | -47/+54
    For non-loads, blendps is *much* faster. It can execute on two ports in
    Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes one
    of the "regressions" from aggressively taking the "insertion" path in the
    new vector shuffle lowering.
    This does highlight one problem with blendps -- it isn't commuted as
    heavily as it should be. That's future work though.
    llvm-svn: 219022