path: root/llvm/test
Columns: commit message, author, age, files changed, lines removed/added
* DebugInfo+DeadArgElimination: Ensure llvm::Function*s from debug info are updated even when DAE removes both varargs and non-varargs arguments on the same function.
  David Blaikie, 2014-10-07, 1 file, -47/+52

  After some stellar (& inspired) help from Reid Kleckner providing a test case for some rather unstable undefined behavior showing up as assertions produced by r214761, I was able to fix this issue in DAE involving the application of both varargs removal, followed by normal argument removal.

  Indeed, I introduced this same bug into ArgumentPromotion (r212128) by copying the code from DAE, and when I fixed the bug in ArgPromo (r213805) I commented in that patch that I didn't need to address the same issue in DAE because it was a single pass. It turns out it's two passes, one for the varargs and one for the normal arguments, so the same fix is needed (at least during varargs removal). So here it is.

  (The observable/net effect of this bug, even when it didn't result in an assertion failure, is that debug info would describe the DAE'd function in the abstract but wouldn't provide high/low_pc, variable locations, line table, etc.; it would appear as though the function had been entirely optimized away. See the original PR14016 for details of the general problem.)

  I'm not recommitting the assertion just yet, as there's been another regression of it since I last tried. It might just be that a few test cases weren't adequately updated after Adrian's or Duncan's recent schema changes.

  llvm-svn: 219210
* Remove Extra lines. NFC.
  Suyog Sarda, 2014-10-07, 1 file, -2/+0
  llvm-svn: 219201
* [asan-asm-instrumentation] CFI directives are generated for .S files.
  Yuri Gorshenin, 2014-10-07, 4 files, -13/+80
  Summary: CFI directives are generated for .S files.
  Reviewers: eugenis
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5520
  llvm-svn: 219199
* [mips] Return {f128} correctly for N32/N64.
  Daniel Sanders, 2014-10-07, 1 file, -0/+36
  Summary: According to the ABI documentation, f128 and {f128} should both be returned in $f0 and $f2. However, this doesn't match GCC's behaviour, which is to return f128 in $f0 and $f2, but {f128} in $f0 and $f1.
  Reviewers: vmedic
  Reviewed By: vmedic
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5578
  llvm-svn: 219196
* [X86] Fix a bug where the disassembler was ignoring the VEX.W bit in 32-bit mode for certain instructions it shouldn't.
  Craig Topper, 2014-10-07, 1 file, -0/+3
  Unfortunately, this isn't easy to fix since there's no simple way to figure out from the disassembler tables whether the W-bit is being used to select a 64-bit GPR or if it's a required part of the opcode. The fix implemented here just looks for "64" in the instruction name and ignores the W-bit in 32-bit mode if it's present.
  Fixes PR21169.
  llvm-svn: 219194
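  The name-based heuristic can be sketched in C as below. This is an illustration only, not the disassembler's actual code; the helper `ignore_vex_w` and the instruction names used with it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper modeling the heuristic: in 32-bit mode, treat VEX.W
   as "don't care" when the instruction name contains "64", on the theory
   that W would select a 64-bit GPR there rather than being a required
   opcode bit. Outside 32-bit mode, W is always honored. */
bool ignore_vex_w(const char *insn_name, bool is_32bit_mode) {
    assert(insn_name != NULL);
    if (!is_32bit_mode)
        return false;
    return strstr(insn_name, "64") != NULL;
}
```

  The substring test is deliberately crude; it stands in for information the tables do not record, namely why W is set for a given opcode.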
* GlobalDCE: Don't drop any COMDAT members
  David Majnemer, 2014-10-07, 1 file, -0/+17
  If we require a single member of a comdat, require all of the other members as well.
  This fixes PR20981.
  llvm-svn: 219191
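  The rule amounts to a post-pass over a liveness result: comdats are kept or dropped as a unit. The C sketch below is hypothetical (the `Global` struct and `keep_whole_comdats` are illustrative names) and deliberately ignores the reference-following that GlobalDCE also performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a module's globals: comdat is a group id, or -1
   when the global is not in any comdat. */
typedef struct {
    const char *name;
    int comdat;
    bool live;
} Global;

/* If any member of a comdat is live, mark every member of that comdat live.
   No cascading is needed here because this model has no references between
   globals; every newly marked member belongs to a comdat already kept whole. */
void keep_whole_comdats(Global *globals, size_t n) {
    assert(globals != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (!globals[i].live || globals[i].comdat < 0)
            continue;
        for (size_t j = 0; j < n; ++j)
            if (globals[j].comdat == globals[i].comdat)
                globals[j].live = true;
    }
}
```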
* gold plugin: Handle gold selecting a linkonce GV when a weak is present.
  Rafael Espindola, 2014-10-07, 2 files, -0/+22
  The plugin API doesn't have the notion of linkonce, only weak. It is up to the plugin to figure out if a symbol used only for the symbol table can be dropped. In particular, it has to avoid dropping a linkonce_odr selected by gold if there is also a weak_odr.
  llvm-svn: 219188
* [FastISel][AArch64] Teach the address computation code to also fold sign-/zero-extends.
  Juergen Ributzka, 2014-10-07, 1 file, -20/+10
  The code already folds sign-/zero-extends, but only if they are arguments to mul and shift instructions. This extends the code to also fold them when they are direct inputs.
  llvm-svn: 219187
* [FastISel][AArch64] Teach the address computation to also fold sub instructions.
  Juergen Ributzka, 2014-10-07, 1 file, -10/+10
  Tiny enhancement to the address computation code to also fold sub instructions if the rhs is constant and can be folded into the offset.
  llvm-svn: 219186
* [FastISel][AArch64] Fix "Fold sign-/zero-extends into the load instruction."
  Juergen Ributzka, 2014-10-07, 1 file, -71/+382
  This commit fixes an issue with sign-/zero-extending loads that was discovered by Richard Barton. We now use the correct load instructions for sign-extending loads to 64-bit.
  Also updated and added more unit tests.
  llvm-svn: 219185
* gold plugin: create internal replacement with original linkage first.
  Rafael Espindola, 2014-10-07, 2 files, -2/+2
  The call to copyAttributesFrom will copy the visibility, which might assert if it were to produce something invalid like "internal hidden". We avoid it by first creating the replacement with the original linkage and then setting it to internal after the call to copyAttributesFrom.
  llvm-svn: 219184
* gold plugin: Remap function arguments when creating a replacement function.
  Rafael Espindola, 2014-10-07, 2 files, -16/+18
  When creating an internal function replacement for use in an alias we were not remapping the argument uses in the instructions to point to the new arguments.
  llvm-svn: 219177
* [InstCombine] re-commit r218721 icmp-select-icmp optimization
  Gerolf Hoflehner, 2014-10-07, 1 file, -1/+1
  Takes care of the assert that caused build failures. Rather than asserting, the code now checks that the definition and use are in the same block, and does not attempt to optimize when that is not the case.
  llvm-svn: 219175
* llvm/test/lit.cfg: Suppress dwarf stuff for targeting x86_64-mingw32 while investigating since r219108.
  NAKAMURA Takumi, 2014-10-06, 1 file, -1/+2
  llvm-svn: 219171
* [DAGCombine] Remove SIGN_EXTEND-related inf-loop
  Hal Finkel, 2014-10-06, 1 file, -0/+33
  The patch's author points out that, despite the function's documentation, getSetCCResultType is only used to get the SETCC result type (with one here-removed problematic exception). In one case, getSetCCResultType was being used to get the predicate type to use for a SELECT node, and then SIGN_EXTENDing (or truncating) to get the input predicate to match that type. Unfortunately, this was happening inside visitSIGN_EXTEND, and creating new SIGN_EXTEND nodes was causing an infinite loop. In addition, this behavior was wrong if a target was not using ZeroOrNegativeOneBooleanContent. Lastly, the extension/truncation seems unnecessary here: SELECT is defined as:

    Select(COND, TRUEVAL, FALSEVAL). If the type of the boolean COND is not i1 then the high bits must conform to getBooleanContents.

  So here we remove this use of getSetCCResultType and update getSetCCResultType's documentation to reflect its actual uses.
  Patch by deadal nix!
  llvm-svn: 219141
* Fast-math fold: x / (y * sqrt(z)) -> x * (rsqrt(z) / y)
  Sanjay Patel, 2014-10-06, 1 file, -0/+28
  The motivation is to recognize code such as this from /llvm/projects/test-suite/SingleSource/Benchmarks/BenchmarkGame/n-body.c:

    float distance = sqrt(dx * dx + dy * dy + dz * dz);
    float mag = dt / (distance * distance * distance);

  Without this patch, we don't match the sqrt as a reciprocal sqrt, so for PPC the new testcase in this patch produces:

    addis 3, 2, .LCPI4_2@toc@ha
    lfs 4, .LCPI4_2@toc@l(3)
    addis 3, 2, .LCPI4_1@toc@ha
    lfs 0, .LCPI4_1@toc@l(3)
    fcmpu 0, 1, 4
    beq 0, .LBB4_2
    # BB#1:
    frsqrtes 4, 1
    addis 3, 2, .LCPI4_0@toc@ha
    lfs 5, .LCPI4_0@toc@l(3)
    fnmsubs 13, 1, 5, 1
    fmuls 6, 4, 4
    fmadds 1, 13, 6, 5
    fmuls 1, 4, 1
    fres 4, 1          <--- reciprocal of reciprocal square root
    fnmsubs 1, 1, 4, 0
    fmadds 4, 4, 1, 4
    .LBB4_2:
    fmuls 1, 4, 2
    fres 2, 1
    fnmsubs 0, 1, 2, 0
    fmadds 0, 2, 0, 2
    fmuls 1, 3, 0
    blr

  After the patch, this simplifies to:

    frsqrtes 0, 1
    addis 3, 2, .LCPI4_1@toc@ha
    fres 5, 2
    lfs 4, .LCPI4_1@toc@l(3)
    addis 3, 2, .LCPI4_0@toc@ha
    lfs 7, .LCPI4_0@toc@l(3)
    fnmsubs 13, 1, 4, 1
    fmuls 6, 0, 0
    fnmsubs 2, 2, 5, 7
    fmadds 1, 13, 6, 4
    fmadds 2, 5, 2, 5
    fmuls 0, 0, 1
    fmuls 0, 0, 2
    fmuls 1, 3, 0
    blr

  Differential Revision: http://reviews.llvm.org/D5628
  llvm-svn: 219139
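  The fold rests on the identity below, which holds exactly over the reals. In floating point the two sides round differently, which is why the rewrite is restricted to fast-math. The payoff is that the reciprocal-square-root estimate is consumed directly instead of being inverted again (the `fres` applied to the `frsqrtes` result in the first listing).

```latex
\frac{x}{y\,\sqrt{z}}
  \;=\; x \cdot \frac{1}{\sqrt{z}} \cdot \frac{1}{y}
  \;=\; x \cdot \frac{\operatorname{rsqrt}(z)}{y},
  \qquad \operatorname{rsqrt}(z) := \frac{1}{\sqrt{z}}
```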
* [BasicAA] Revert "Revert r218714 - Make better use of zext and sign information."
  Hal Finkel, 2014-10-06, 2 files, -0/+88
  This reverts r218944, which reverted r218714, plus a bug fix.

  Description of the bug in r218714 (by Nick):
  The original patch forgot to check if the Scale in VariableGEPIndex flipped the sign of the variable. The BasicAA pass iterates over the instructions in the order they appear in the function, and so BasicAliasAnalysis::aliasGEP is called with the variable it first comes across as parameter GEP1. Adding a %reorder label puts the definition of %a after %b so aliasGEP is called with %b as the first parameter and %a as the second. aliasGEP later calculates that %a == %b + 1 - %idxprom where %idxprom >= 0 (if %a was passed as the first parameter it would calculate %b == %a - 1 + %idxprom where %idxprom >= 0); ignoring that %idxprom is scaled by -1 here led the patch to incorrectly conclude that %a > %b.

  Revised patch by Nick White, thanks! Thanks to Lang for isolating the bug. Slightly modified by me to add an early exit from the loop and avoid unnecessary, but expensive, function calls.

  Original commit message:
  Two related things:
  1. Fixes a bug when calculating the offset in GetLinearExpression. The code previously used zext to extend the offset, so negative offsets were converted to large positive ones.
  2. Enhance aliasGEP to deduce that, if the difference between two GEP allocations is positive and all the variables that govern the offset are also positive (i.e. the offset is strictly after the higher base pointer), then locations that fit in the gap between the two base pointers are NoAlias.

  Patch by Nick White!
  llvm-svn: 219135
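  The first item of the original message (zext turning negative offsets into large positive ones) can be reproduced with ordinary C integer conversions. A minimal sketch; the helper names are hypothetical and the 32-to-64-bit widths are only an example.

```c
#include <assert.h>
#include <stdint.h>

/* Zero-extension: the 32-bit pattern is reinterpreted as unsigned before
   widening, so a negative offset turns into a large positive one. This
   models the effect of the GetLinearExpression bug. */
int64_t widen_zext(int32_t offset) {
    return (int64_t)(uint64_t)(uint32_t)offset;
}

/* Sign-extension preserves the value, including its sign. */
int64_t widen_sext(int32_t offset) {
    return (int64_t)offset;
}
```

  For non-negative offsets the two agree, which is why the bug only surfaced for negative ones.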
* MachObjectWriter: optimize the string table for common suffixes
  Hans Wennborg, 2014-10-06, 24 files, -252/+252
  This is a follow-up to r207670 (ELF) and r218636 (COFF).
  Differential Revision: http://reviews.llvm.org/D5622
  llvm-svn: 219126
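  Suffix sharing (tail merging) works because string-table entries are NUL-terminated: if "bar" is a suffix of "foobar", the entry for "bar" can simply point 3 bytes into "foobar". Below is one straightforward way to do it in C, processing longer strings first. It is a hypothetical sketch, not MachObjectWriter's actual algorithm, and it ignores Mach-O specifics such as any reserved leading bytes of the table; `build_string_table` and its parameters are illustrative names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    const char *str;
    size_t len;
    size_t index; /* position in the caller's array */
} Entry;

/* qsort comparator: longer strings first. */
static int longer_first(const void *a, const void *b) {
    const Entry *ea = a, *eb = b;
    if (ea->len != eb->len)
        return ea->len < eb->len ? 1 : -1;
    return 0;
}

/* Returns 1 if s (length n) is a suffix of t (length m). */
static int is_suffix(const char *s, size_t n, const char *t, size_t m) {
    return n <= m && memcmp(t + (m - n), s, n) == 0;
}

/* Builds a NUL-separated table in buf (which must hold the sum of
   strlen + 1 over all strings) and stores each string's offset in
   offsets[i]. Returns the table size in bytes. */
size_t build_string_table(const char **strs, size_t n, char *buf, size_t *offsets) {
    Entry *e = malloc(n * sizeof *e);
    assert(e != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        e[i].str = strs[i];
        e[i].len = strlen(strs[i]);
        e[i].index = i;
    }
    qsort(e, n, sizeof *e, longer_first);

    size_t size = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t j;
        /* Any string this one could share with was processed earlier,
           because it is at least as long. */
        for (j = 0; j < i; ++j) {
            if (is_suffix(e[i].str, e[i].len, e[j].str, e[j].len)) {
                offsets[e[i].index] = offsets[e[j].index] + (e[j].len - e[i].len);
                break;
            }
        }
        if (j == i) { /* no earlier string ends with this one: append it */
            memcpy(buf + size, e[i].str, e[i].len + 1);
            offsets[e[i].index] = size;
            size += e[i].len + 1;
        }
    }
    free(e);
    return size;
}
```

  With the inputs "bar", "foobar" and "baz", only "foobar" and "baz" are stored (11 bytes instead of 15), and "bar" points into "foobar".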
* Fix dumping codeview line tables when there are multiple debug sections
  Timur Iskhodzhanov, 2014-10-06, 2 files, -19/+59
  Codeview line tables for functions in different sections refer to a common STRING_TABLE_SUBSECTION for filenames. This happens when building with -Gy or with inline functions with MSVC.
  Original patch by Jeff Muizelaar!
  llvm-svn: 219125
* [CFL-AA] Update for handling of globals and more tests
  Hal Finkel, 2014-10-06, 6 files, -0/+146
  We used to return PartialAlias if *either* variable being queried interacted with arguments or globals. AFAICT, we can change this to only returning MayAlias iff *both* variables being queried interacted with arguments or globals.
  Also, adding some basic functionality tests: some basic IPA tests, checking that we give conservative responses with arguments/globals thrown in the mix, and ensuring that we trace values through stores and loads.
  Note that saying that 'x' interacted with arguments or globals means that the Attributes of the StratifiedSet that 'x' belongs to has any bits set.
  Patch by George Burgess IV, thanks!
  llvm-svn: 219122
* For bi-endian targets like ARM and AArch64, it is useful to have the output of llvm-dwarfdump and llvm-objdump report the endianness used when the object files were generated.
  Eric Christopher, 2014-10-06, 4 files, -2/+40
  Patch by Charlie Turner.
  llvm-svn: 219110
* Add support for ARM and AArch64 big endian objects to RelocVisitor.
  Eric Christopher, 2014-10-06, 2 files, -0/+30
  Patch by Charlie Turner.
  llvm-svn: 219109
* Add some tests for RelocVisitor.
  Eric Christopher, 2014-10-06, 7 files, -0/+113
  Patch by Charlie Turner.
  llvm-svn: 219107
* [dwarfdump] Print the name for referenced specification of abstract_origin DIEs.
  Frederic Riss, 2014-10-06, 16 files, -53/+53
  Reviewers: dblaikie, samsonov, echristo, aprantl
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5466
  llvm-svn: 219099
* Give the Reassociate pass a bit more flexibility and autonomy when optimizing expressions.
  Owen Anderson, 2014-10-05, 1 file, -0/+15
  In particular, it addresses cases where Reassociate breaks Subtracts but then fails to optimize combinations like I1 + -I2 where I1 and I2 have the same rank and are identical.
  Patch by Dmitri Shtilman.
  llvm-svn: 219092
* [x86] Remove the 2-addr-to-3-addr "optimization" from shufps to pshufd.
  Chandler Carruth, 2014-10-05, 6 files, -26/+30
  This trades a (register-renamer-friendly) movaps for a floating point / integer domain cross. That is a very bad trade, even on architectures where domain crossing is relatively fast. On any chip where there is even a cycle stall, this is a Very Bad Idea. It doesn't even seem likely to cause a spill to be introduced because the reason for the copy is to destructively shuffle in place.
  Thanks to Ben Kramer for fixing a bug in this code that my new shuffle lowering exposed and highlighting that perhaps it should just go away. =]
  llvm-svn: 219090
* [x86, dag] Teach the DAG combiner to prune inputs to a vector_shuffle that are unused.
  Chandler Carruth, 2014-10-05, 2 files, -15/+71
  This allows the combiner to delete math feeding shuffles where the math isn't actually necessary. This improves some of the vperm2x128 tests that regressed when the vector shuffle lowering started actually generating vperm instructions rather than forcibly decomposing them.
  Sadly, this isn't enough to get this *really* right because we still form a completely unnecessary permutation. To fix that, we also need to fold shuffles which just rearrange concatenated or inserted subvectors.
  llvm-svn: 219086
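  Whether an input is "unused" can be read straight off the shuffle mask. The C sketch below assumes the usual two-input convention (for N-element inputs, mask values 0..N-1 pick from the first input, N..2N-1 from the second, and -1 marks an undef lane); `shuffle_uses_input` is a hypothetical name. An input for which it returns false can be replaced by undef, after which whatever computed it may become dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Returns true if any lane of the n-element shuffle mask reads from the
   given input (0 = first operand, 1 = second). Undef lanes (-1) read
   from neither input. */
bool shuffle_uses_input(const int *mask, size_t n, int input) {
    assert(input == 0 || input == 1);
    for (size_t i = 0; i < n; ++i) {
        int m = mask[i];
        if (m < 0)
            continue; /* undef lane */
        if ((size_t)m / n == (size_t)input)
            return true;
    }
    return false;
}
```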
* X86: Don't drop half of the mask when converting 2-address shufps into 3-address pshufd.
  Benjamin Kramer, 2014-10-05, 1 file, -0/+11
  It's debatable whether this transform is useful at all, but for now make sure we don't generate invalid asm.
  llvm-svn: 219084
* AVX-512-SKX: Added instruction VPMOVM2B/W/D/Q.
  Elena Demikhovsky, 2014-10-05, 1 file, -0/+29
  This instruction allows broadcasting a mask vector to a data vector.
  llvm-svn: 219083
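  Concretely, "broadcasting a mask to a data vector" means each mask bit becomes a whole element that is either all ones or all zeros. A scalar C model of the 512-bit doubleword form follows; `vpmovm2d_model` is a hypothetical name, and the B, W and Q forms differ only in element width and lane count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of 512-bit VPMOVM2D: bit i of the 16-bit mask becomes
   32-bit lane i, all ones (-1) if the bit is set and 0 if it is clear. */
void vpmovm2d_model(uint16_t mask, int32_t out[16]) {
    assert(out != NULL);
    for (int i = 0; i < 16; ++i)
        out[i] = ((mask >> i) & 1u) ? -1 : 0;
}
```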
* [x86] Fix PR21139, one of the last remaining regressions found in the new vector shuffle lowering.
  Chandler Carruth, 2014-10-05, 2 files, -7/+20
  This is loosely based on a patch by Marius Wachtler to the PR (thanks!). I refactored it a bit to use std::count_if and a mutable array ref but the core idea was exactly right. I also added some direct testing of this case.
  I believe PR21137 is now the only remaining regression.
  llvm-svn: 219081
* [x86] Teach the new vector shuffle lowering how to lower 128-bit shuffles using AVX and AVX2 instructions.
  Chandler Carruth, 2014-10-05, 4 files, -192/+116
  This fixes PR21138, one of the few remaining regressions impacting benchmarks from the new vector shuffle lowering.
  You may note that it "regresses" many of the vperm2x128 test cases -- these were actually "improved" by the naive lowering that the new shuffle lowering previously did. This regression gave me fits. I had this patch ready-to-go about an hour after flipping the switch but wasn't sure how to have the best of both worlds here and thought the correct solution might be a completely different approach to lowering these vector shuffles. I'm now convinced this is the correct lowering and the missed optimizations shown in vperm2x128 are actually due to missing target-independent DAG combines. I've even written most of the needed DAG combine and will submit it shortly, but this part is ready and should help some real-world benchmarks out.
  llvm-svn: 219079
* [InstCombine] Remove redundant @llvm.assume intrinsics
  Hal Finkel, 2014-10-04, 1 file, -0/+55
  For any @llvm.assume intrinsic, if there is another which dominates it and uses the same condition, then it is redundant and can be removed. While this does not alter the semantics of the @llvm.assume intrinsics, it makes subsequent handling more efficient (and the resulting IR easier to read).
  llvm-svn: 219067
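  The dominance rule is easiest to see within a single basic block, where every earlier instruction dominates every later one. The C sketch below covers only that straight-line case; the commit's rule applies to dominance in general. Conditions are modeled as integer ids, and `mark_redundant_assumes` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* assume_conds[i] identifies the condition of the i-th @llvm.assume in one
   block. An assume is redundant if an earlier (hence dominating) assume
   uses the same condition. Fills removable[] and returns how many assumes
   can be deleted. */
size_t mark_redundant_assumes(const int *assume_conds, size_t n, bool *removable) {
    assert(removable != NULL || n == 0);
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        removable[i] = false;
        for (size_t j = 0; j < i; ++j) {
            if (!removable[j] && assume_conds[j] == assume_conds[i]) {
                removable[i] = true;
                ++count;
                break;
            }
        }
    }
    return count;
}
```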
* [x86] Slap a triple on this test since it is poking around at the stack and calling conventions.
  Chandler Carruth, 2014-10-04, 1 file, -0/+2
  Otherwise it's too hard to craft a usefully generic set of assertions.
  llvm-svn: 219047
* [x86] Enable the new vector shuffle lowering by default.
  Chandler Carruth, 2014-10-04, 90 files, -3775/+1333
  Update the entire regression test suite for the new shuffles. Remove most of the old testing which was devoted to the old shuffle lowering path and is no longer relevant really. Also remove a few other random tests that only really exercised shuffles and only incidentally or without any interesting aspects to them.

  Benchmarking that I have done shows a few small regressions with this on LNT, zero measurable regressions on real, large applications, and for several benchmarks where the loop vectorizer fires in the hot path it shows 5% to 40% improvements for SSE2 and SSE3 code running on Sandy Bridge machines. Running on AMD machines shows even more dramatic improvements.

  When using newer ISA vector extensions the gains are much more modest, but the code is still better on the whole. There are a few regressions being tracked (PR21137, PR21138, PR21139) but by and large this is expected to be a win for x86 generated code performance.

  It is also more correct than the code it replaces. I have fuzz tested this extensively with ISA extensions up through AVX2 and found no crashes or miscompiles (yet...). The old lowering had a few miscompiles and crashers after a somewhat smaller amount of fuzz testing.

  There is one significant area where the new code path lags behind and that is in AVX-512 support. However, there was *extremely little* support for that already and so this isn't a significant step backwards and the new framework will probably make it easier to implement lowering that uses the full power of AVX-512's table-based shuffle+blend (IMO).

  Many thanks to Quentin, Andrea, Robert, and others for benchmarking assistance. Thanks to Adam and others for help with AVX-512. Thanks to Hal, Eric, and *many* others for answering my incessant questions about how the backend actually works. =]

  I will leave the old code path in the tree until the 3 PRs above are at least resolved to folks' satisfaction. Then I will rip it (and 1000s of lines of code) out. =] I don't expect this flag to stay around for very long. It may not survive next week.

  llvm-svn: 219046
* R600/SI: Custom lower f64 -> i64 conversions
  Matt Arsenault, 2014-10-03, 4 files, -34/+122
  llvm-svn: 219038
* R600: Custom lower [s|u]int_to_fp for i64 -> f64
  Matt Arsenault, 2014-10-03, 2 files, -3/+67
  llvm-svn: 219037
* R600/SI: Fix ftrunc f64 conformance failures.
  Matt Arsenault, 2014-10-03, 4 files, -3/+113
  Re-add the tests since they were deleted at some point.
  llvm-svn: 219036
* [x86] Add a really preposterous number of patterns for matching all of the various ways in which blends can be used to do vector element insertion for lowering with the scalar math instruction forms that effectively re-blend with the high elements after performing the operation.
  Chandler Carruth, 2014-10-03, 4 files, -35/+90
  This then allows me to bail on the element insertion lowering path when we have SSE4.1 and are going to be doing a normal blend, which in turn restores the last of the blends lost from the new vector shuffle lowering when I got it to prioritize insertion in other cases (for example when we don't *have* a blend instruction).
  Without the patterns, using blends here would have regressed sse-scalar-fp-arith.ll *completely* with the new vector shuffle lowering. For completeness, I've added RUN-lines with the new lowering here. This is somewhat superfluous as I'm about to flip the default, but hey, it shows that this actually significantly changed behavior.
  The patterns I've added are just ridiculously repetitive. Suggestions on making them better are very much welcome. In particular, handling the commuted form of the v2f64 patterns is somewhat obnoxious.
  llvm-svn: 219033
* [x86] Adjust the patterns for lowering X86vzmovl nodes which don't perform a load to use blendps rather than movss when it is available.
  Chandler Carruth, 2014-10-03, 5 files, -60/+186
  For non-loads, blendps is *much* faster. It can execute on two ports in Sandy Bridge and Ivy Bridge, and *three* ports on Haswell. This fixes one of the "regressions" from aggressively taking the "insertion" path in the new vector shuffle lowering.
  This does highlight one problem with blendps -- it isn't commuted as heavily as it should be. That's future work though.
  llvm-svn: 219022
* PR21145: Teach LLVM about C++14 sized deallocation functions.
  Richard Smith, 2014-10-03, 1 file, -0/+23
  C++14 adds new builtin signatures for 'operator delete'. This change allows new/delete pairs to be removed in C++14 onwards, as they were in C++11 and before.
  llvm-svn: 219014
* Revert "Revert "DI: Fold constant arguments into a single MDString""
  Duncan P. N. Exon Smith, 2014-10-03, 342 files, -5879/+5879
  This reverts commit r218918, effectively reapplying r218914 after fixing an Ocaml bindings test and an Asan crash. The root cause of the latter was a tightened-up check in `DILexicalBlock::Verify()`, so I'll file a PR to investigate who requires the loose check (and why).
  Original commit message follows.
  --
  This patch addresses the first stage of PR17891 by folding constant arguments together into a single MDString. Integers are stringified and a `\0` character is used as a separator.
  Part of PR17891.
  Note: I've attached my testcase upgrade scripts to the PR. If I've just broken your out-of-tree testcases, they might help.
  llvm-svn: 219010
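  The encoding described in the original message (integers stringified, `\0` as separator) can be sketched in C as follows. The function name and the sample field values are hypothetical; which fields LLVM actually folds is defined by the patch, not by this sketch. Because the result contains embedded NULs, its length has to be carried explicitly rather than recovered with `strlen`.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes fields[0..n) into buf as decimal text separated by '\0' and
   returns the number of bytes used, or 0 if buf is too small. */
size_t fold_fields(char *buf, size_t cap, const long *fields, size_t n) {
    assert(buf != NULL && fields != NULL);
    size_t len = 0;
    for (size_t i = 0; i < n; ++i) {
        if (i > 0) {
            if (len >= cap)
                return 0;
            buf[len++] = '\0'; /* field separator */
        }
        int w = snprintf(buf + len, cap - len, "%ld", fields[i]);
        if (w < 0 || (size_t)w >= cap - len)
            return 0; /* the field plus its terminator did not fit */
        len += (size_t)w;
    }
    return len;
}
```

  For example, the fields 42, 7 and -3 fold into the 7 bytes `42\0` `7\0` `-3`.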
* [ISel] Keep matching state consistent when folding during X86 address match
  Adam Nemet, 2014-10-03, 1 file, -0/+62
  In the X86 backend, matching an address is initiated by the 'addr' complex pattern and its friends. During this process we may reassociate and-of-shift into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the shift into the scale of the address.

  However, as demonstrated by the testcase, this can trigger CSE of not only the shift and the AND, which the code is prepared for, but also the underlying load node. In the testcase this node is sitting in the RecordedNode and MatchScope data structures of the matcher and becomes a deleted node upon CSE. Returning from the complex pattern function, we try to access it again, hitting an assert because the node is no longer a load even though this was checked before.

  Now obviously changing the DAG this late is bending the rules but I think it makes sense somewhat. Outside of addresses we prefer and-of-shift because it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better example because it creates a non-canonical node). We currently don't recognize addresses during DAGCombiner where arguably this canonicalization should be performed. On the other hand, having this in the matcher allows us to cover all the cases where an address can be used in an instruction.

  I've also talked a little bit to Dan Gohman on llvm-dev, who added the RAUW for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is responsible for initiating the recursive CSE on users (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but it is not strictly necessary since the shift is hooked into the visited user. Of course it's safer to keep the DAG consistent at all times (e.g. for accurate number of uses, etc.).

  So rather than changing the fundamentals, I've decided to continue along the previous patches and detect the CSE. This patch installs a very targeted DAGUpdateListener for the duration of a complex-pattern match and updates the matching state accordingly. (Previous patches used HandleSDNode to detect the CSE but that's not practical here.) The listener is only installed on X86.

  I tested that there is no measurable overhead due to this while running through the spec2k BC files with llc. The only thing we pay for is the creation of the listener. The callback never ever triggers in spec2k since this is a corner case.

  Fixes rdar://problem/18206171
  llvm-svn: 219009
* R600: Align functions to 256 bytes
  Tom Stellard, 2014-10-03, 1 file, -0/+2
  llvm-svn: 219002
* [Power] Delete redundant test Atomics-32.ll
  Robin Morisset, 2014-10-03, 1 file, -715/+0
  The test Atomics-32.ll was both redundant (all operations are also checked by atomics.ll at least) and not actually checking correctness (it was not using FileCheck, just verifying that the compiler does not crash).
  llvm-svn: 218997
* llvm-readobj: print out the fields of the COFF delay-import table
  Rui Ueyama, 2014-10-03, 1 file, -0/+12
  llvm-svn: 218996
* [Power] Use lwsync for non-seq_cst fences
  Robin Morisset, 2014-10-03, 1 file, -0/+29
  Summary: hwsync is only required for seq_cst fences; acquire and release ones can use the cheaper lwsync.
  Test Plan: Added some cases to atomics.ll + make check-all
  Reviewers: jfb, wschmidt
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5317
  llvm-svn: 218995
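  The selection rule reduces to a one-line choice. In the C sketch below the enum and function names are hypothetical, hwsync is spelled as the plain `sync` instruction, and acq_rel fences, which the commit does not mention, are assumed to take lwsync as well (as in the commonly published C/C++11-to-Power mapping).

```c
#include <assert.h>
#include <string.h>

/* Hypothetical names for the fence orderings a fence can carry. */
typedef enum { ORD_ACQUIRE, ORD_RELEASE, ORD_ACQ_REL, ORD_SEQ_CST } FenceOrdering;

/* Only seq_cst needs the heavyweight barrier (hwsync, i.e. "sync");
   acquire and release, and by assumption acq_rel, use lwsync. */
const char *power_fence_insn(FenceOrdering ord) {
    assert(ord == ORD_ACQUIRE || ord == ORD_RELEASE ||
           ord == ORD_ACQ_REL || ord == ORD_SEQ_CST);
    return ord == ORD_SEQ_CST ? "sync" : "lwsync";
}
```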
* [mips] Print warning when using register names not available in N32/64
  Daniel Sanders, 2014-10-03, 1 file, -3/+23
  Summary: The register names t4-t7 are not available in the N32 and N64 ABIs. This patch prints a warning when those names are used in N32/64, along with a fix-it with the correct register names.
  Patch by Vasileios Kalintiris
  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5272
  llvm-svn: 218989
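  The renaming a fix-it would suggest can be sketched as a lookup. This C sketch assumes the standard register naming in which O32's t4-t7 denote $12-$15 and N32/N64 call those same registers t0-t3. The helper name is hypothetical, and the exact wording of the warning and fix-it LLVM emits is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Maps an O32-only temporary name (t4-t7, i.e. $12-$15) to the name the
   N32/N64 ABIs use for the same register (t0-t3). Returns NULL for any
   other name. */
const char *n64_name_for_o32_temp(const char *name) {
    static const char *const o32[] = {"t4", "t5", "t6", "t7"};
    static const char *const n64[] = {"t0", "t1", "t2", "t3"};
    assert(name != NULL);
    for (size_t i = 0; i < 4; ++i)
        if (strcmp(name, o32[i]) == 0)
            return n64[i];
    return NULL;
}
```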
* [x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single element vector inserts.
  Chandler Carruth, 2014-10-03, 4 files, -123/+66
  This is particularly important because a number of patterns in the backend detect these patterns and leverage them to simplify things. It also fixes quite a few of the insertion bad code examples. However, it regresses a specific area: when available, blendps and blendpd are *dramatically* faster than movss and movsd respectively. But it doesn't really work to form the blend logic first because the blends *aren't* as crazy efficient when the data is coming from memory anyways, and thus will have a movss or movsd regardless. Also, doing that would block a bunch of the patterns that this is designed to hit.
  So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower them via blends when available. However that's a pretty invasive restructuring so it will need to be a follow-up patch.
  I have already gone into the patterns to lower MOVSS and MOVSD from memory using MOVLPD, etc. Without that, several of the test cases I already have regress.
  llvm-svn: 218985
* Revert 202433 - Provide a target override for the latest regalloc heuristic
  Renato Golin, 2014-10-03, 1 file, -2/+2
  That commit was introduced in order to help investigate a problem in ARM codegen breaking from commit 202304 (Add a limit to the heuristic that register allocates instructions in local order). Recent analysis indicated that the problem no longer exists, so I'm reverting this change.
  See PR18996.
  llvm-svn: 218981
* [x86] Fix the RUN-lines of this test to make sense.
  Chandler Carruth, 2014-10-03, 1 file, -3/+5
  I got them quite wrong when updating it and had the SSE4.1 run checked for SSE2 and the SSE2 run checked for SSE4.1. I think everything was actually generic SSE, but this still seems good to fix.
  While here, hoist the triple into the IR and make the flag set a bit more direct in what it is trying to test.
  llvm-svn: 218978