path: root/llvm/lib/Target
Commit log, most recent first. Each entry: commit message (author, date; files changed, lines -deleted/+added).
...
* PR21145: Teach LLVM about C++14 sized deallocation functions. (Richard Smith, 2014-10-03; 1 file, -0/+4)

  C++14 adds new builtin signatures for 'operator delete'. This change allows
  new/delete pairs to be removed in C++14 onwards, as they were in C++11 and
  before.

  llvm-svn: 219014
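  For context, a minimal sketch of the C++14 signatures involved (illustrative,
  not taken from the patch itself):

  ```cpp
  #include <cstddef>

  // The C++14 sized global deallocation signatures that this commit teaches
  // LLVM to recognize as forms of 'operator delete':
  void operator delete(void *p, std::size_t size) noexcept;
  void operator delete[](void *p, std::size_t size) noexcept;

  // With those recognized, a matched new/delete pair like this one is again
  // eligible for removal by the optimizer in C++14 mode:
  int removable() {
    int *p = new int(42);
    int v = *p;
    delete p; // may lower to the sized form; the whole pair can be elided
    return v;
  }
  ```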
* [ISel] Keep matching state consistent when folding during X86 address match (Adam Nemet, 2014-10-03; 1 file, -0/+7)

  In the X86 backend, matching an address is initiated by the 'addr' complex
  pattern and its friends. During this process we may reassociate and-of-shift
  into shift-of-and (FoldMaskedShiftToScaledMask) to allow folding of the
  shift into the scale of the address.

  However, as demonstrated by the testcase, this can trigger CSE of not only
  the shift and the AND which the code is prepared for, but also the
  underlying load node. In the testcase this node is sitting in the
  RecordedNode and MatchScope data structures of the matcher and becomes a
  deleted node upon CSE. Returning from the complex pattern function, we try
  to access it again, hitting an assert because the node is no longer a load
  even though this was checked before.

  Now obviously changing the DAG this late is bending the rules, but I think
  it makes sense somewhat. Outside of addresses we prefer and-of-shift because
  it may lead to smaller immediates (FoldMaskAndShiftToScale is an even better
  example because it creates a non-canonical node). We currently don't
  recognize addresses during DAGCombiner, where arguably this canonicalization
  should be performed. On the other hand, having this in the matcher allows us
  to cover all the cases where an address can be used in an instruction.

  I've also talked a little bit to Dan Gohman on llvm-dev, who added the RAUW
  for the new shift node in FoldMaskedShiftToScaledMask. This RAUW is
  responsible for initiating the recursive CSE on users
  (http://lists.cs.uiuc.edu/pipermail/llvmdev/2014-September/076903.html) but
  it is not strictly necessary since the shift is hooked into the visited
  user. Of course it's safer to keep the DAG consistent at all times (e.g. for
  an accurate number of uses, etc.).

  So rather than changing the fundamentals, I've decided to continue along the
  previous patches and detect the CSE. This patch installs a very targeted
  DAGUpdateListener for the duration of a complex-pattern match and updates
  the matching state accordingly. (Previous patches used HandleSDNode to
  detect the CSE but that's not practical here.)

  The listener is only installed on X86. I tested that there is no measurable
  overhead due to this while running through the spec2k BC files with llc. The
  only thing we pay for is the creation of the listener. The callback never
  triggers in spec2k since this is a corner case.

  Fixes rdar://problem/18206171

  llvm-svn: 219009
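  A rough illustration of such a targeted listener (hypothetical names and
  structure; the commit's actual class in the X86 backend may differ):

  ```cpp
  #include "llvm/CodeGen/SelectionDAG.h"
  #include "llvm/CodeGen/SelectionDAGNodes.h"

  using namespace llvm;

  // Sketch: watch one recorded node for the duration of a complex-pattern
  // match; if CSE replaces it, repoint the matcher's state at the surviving
  // node instead of leaving a dangling reference to the deleted one.
  class MatchStateUpdater : public SelectionDAG::DAGUpdateListener {
    SDNode *&RecordedNode; // the piece of matching state we keep consistent

  public:
    MatchStateUpdater(SelectionDAG &DAG, SDNode *&N)
        : SelectionDAG::DAGUpdateListener(DAG), RecordedNode(N) {}

    // Called when N is CSE'd into (or replaced by) E.
    void NodeDeleted(SDNode *N, SDNode *E) override {
      if (N == RecordedNode)
        RecordedNode = E;
    }
  };
  ```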
* R600: Align functions to 256 bytes (Tom Stellard, 2014-10-03; 2 files, -3/+12)
  llvm-svn: 219002
* Eliminate some deep std::vector copies. NFC. (Benjamin Kramer, 2014-10-03; 7 files, -18/+12)
  llvm-svn: 218999
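  The usual shape of such a cleanup (an illustrative before/after, not code
  from this commit): pass a non-owning view instead of copying the vector.

  ```cpp
  #include "llvm/ADT/ArrayRef.h"
  #include <vector>

  // Before: the argument is taken by value, so every call deep-copies the
  // entire vector.
  static int sumCopy(std::vector<int> Values) {
    int Sum = 0;
    for (int V : Values)
      Sum += V;
    return Sum;
  }

  // After: ArrayRef is a cheap, non-owning view over contiguous storage;
  // no allocation or copy happens at the call site.
  static int sumView(llvm::ArrayRef<int> Values) {
    int Sum = 0;
    for (int V : Values)
      Sum += V;
    return Sum;
  }
  ```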
* [Power] Use lwsync for non-seq_cst fences (Robin Morisset, 2014-10-03; 1 file, -1/+8)

  Summary: hwsync is only required for seq_cst fences; acquire and release
  ones can use the cheaper lwsync.

  Test Plan: Added some cases to atomics.ll + make check-all

  Reviewers: jfb, wschmidt
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5317

  llvm-svn: 218995
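  The rule being implemented, as a small self-contained sketch (the enum is a
  stand-in for LLVM's atomic ordering kinds, not the backend's actual code):

  ```cpp
  #include <cstdio>

  // Stand-in for LLVM's atomic orderings; IR fences are never weaker than
  // acquire, so only these cases matter here.
  enum class Ordering { Acquire, Release, AcquireRelease, SequentiallyConsistent };

  // Only a seq_cst fence needs the full-strength 'sync' (hwsync); acquire,
  // release, and acq_rel fences are satisfied by the cheaper 'lwsync'.
  const char *ppcBarrierFor(Ordering Ord) {
    return Ord == Ordering::SequentiallyConsistent ? "sync" : "lwsync";
  }

  int main() {
    std::printf("fence acquire -> %s\n", ppcBarrierFor(Ordering::Acquire));
    std::printf("fence seq_cst -> %s\n",
                ppcBarrierFor(Ordering::SequentiallyConsistent));
  }
  ```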
* MipsAsmParser.cpp: fix VS2012 build (Hans Wennborg, 2014-10-03; 1 file, -1/+1)
  llvm-svn: 218991
* HexagonMCCodeEmitter.h: deleted member functions are not supported in VS2012 (Hans Wennborg, 2014-10-03; 1 file, -2/+2)
  llvm-svn: 218990
* [mips] Print warning when using register names not available in N32/64 (Daniel Sanders, 2014-10-03; 1 file, -0/+30)

  Summary: The register names t4-t7 are not available in the N32 and N64 ABIs.
  This patch prints a warning when those names are used in N32/64, along with
  a fix-it suggesting the correct register names.

  Patch by Vasileios Kalintiris

  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5272

  llvm-svn: 218989
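  A toy version of the fix-it mapping (illustrative only, assuming the
  N32/N64 naming in which $12-$15, called t4-t7 under O32, become t0-t3):

  ```cpp
  #include <cstdio>

  // O32 names $12-$15 "t4"-"t7"; N32/N64 reuse those register numbers as
  // "t0"-"t3" (because $8-$11 become a4-a7). The warning's fix-it suggests
  // the N32/N64 name for the same physical register.
  const char *n32n64NameFor(const char *o32Name) {
    static const char *From[] = {"t4", "t5", "t6", "t7"};
    static const char *To[]   = {"t0", "t1", "t2", "t3"};
    for (int I = 0; I < 4; ++I)
      if (__builtin_strcmp(o32Name, From[I]) == 0)
        return To[I];
    return nullptr; // name is valid as-is
  }

  int main() {
    std::printf("$t4 under N32/N64 is $%s\n", n32n64NameFor("t4")); // t0
  }
  ```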
* Fix build break on Hexagon (Sid Manning, 2014-10-03; 1 file, -1/+2)

  Differential Revision: http://reviews.llvm.org/D5600

  llvm-svn: 218987
* Adding skeleton for unit testing Hexagon Code Emission (Sid Manning, 2014-10-03; 6 files, -8/+173)

  Adding and modifying CMakeLists.txt files to run unit tests under
  unittests/Target/* if the directory exists. Adding a basic unit test to
  check that the code emitter object can be retrieved.

  Differential Revision: http://reviews.llvm.org/D5523
  Change by: Colin LeMahieu

  llvm-svn: 218986
* [x86] Teach the new vector shuffle lowering to aggressively form MOVSS and MOVSD nodes for single-element vector inserts (Chandler Carruth, 2014-10-03; 2 files, -5/+37)

  This is particularly important because a number of patterns in the backend
  detect these patterns and leverage them to simplify things. It also fixes
  quite a few of the bad-code insertion examples.

  However, it regresses a specific area: when available, blendps and blendpd
  are *dramatically* faster than movss and movsd respectively. But it doesn't
  really work to form the blend logic first, because the blends *aren't* as
  crazy efficient when the data is coming from memory anyways, and thus will
  have a movss or movsd regardless. Also, doing that would block a bunch of
  the patterns that this is designed to hit.

  So my plan is to go into the patterns for lowering MOVSS and MOVSD and lower
  them via blends when available. However, that's a pretty invasive
  restructuring, so it will need to be a follow-up patch.

  I have already gone into the patterns to lower MOVSS and MOVSD from memory
  using MOVLPD, etc. Without that, several of the test cases I already have
  regress.

  llvm-svn: 218985
* Revert r202433 - Provide a target override for the latest regalloc heuristic (Renato Golin, 2014-10-03; 2 files, -7/+0)

  That commit was introduced in order to help investigate a problem in ARM
  codegen breaking from commit r202304 (Add a limit to the heuristic that
  register allocates instructions in local order). Recent analysis indicated
  that the problem no longer exists, so I'm reverting this change.

  See PR18996.

  llvm-svn: 218981
* [x86] Refactor the element insertion logic in the new vector shuffle lowering (Chandler Carruth, 2014-10-03; 1 file, -19/+21)

  Handle the potential mirroring of 2-element vectors (because we can't
  reliably sort them one way) in the caller rather than in the insertion
  logic. This will simplify things considerably as more ways to fail to match
  the insertion are added, because now we have a nice try-and-retry point.

  llvm-svn: 218980
* [x86] Significantly improve the ability of the new vector shuffle lowering to match VZEXT_MOVL patterns (Chandler Carruth, 2014-10-03; 1 file, -26/+30)

  I hadn't realized that these had sufficient pattern smarts in the backend to
  lower zext-ing from the low element of a vector without it being a
  scalar_to_vector node. They do, and this is how to match a bunch of patterns
  for movq, movss, etc.

  There is a weird propensity to end up using pshufd to place the element
  afterward even though it means domain crossing (or rather, to use
  xorps+movss to zext the element rather than movq), but that's an orthogonal
  problem with VZEXT_MOVL that someone should probably look at.

  llvm-svn: 218977
* [x86] Unbreak SSE1 with the new vector shuffle lowering (Chandler Carruth, 2014-10-03; 1 file, -4/+8)

  We can't widen element types to form illegal vector types. I've added a
  special SSE1 test case here that makes sure we don't break this going
  forward.

  llvm-svn: 218974
* constify TargetMachine parameter. (Eric Christopher, 2014-10-03; 4 files, -5/+6)
  llvm-svn: 218934
* constify TargetMachine argument. (Eric Christopher, 2014-10-03; 4 files, -4/+4)
  llvm-svn: 218930
* We can grab the options struct from the TargetMachine; no need to pass it down in the constructor. (Eric Christopher, 2014-10-03; 3 files, -5/+4)
  llvm-svn: 218929
* [AVX512] Pull pattern for subvector insert into the instruction definition (Adam Nemet, 2014-10-02; 1 file, -8/+4)

  No functional change intended. Very similar to the change I made for
  subvector extract in r218480. test/CodeGen/X86/avx512-insert-extract.ll
  covers this.

  llvm-svn: 218928
* [AVX512] Refactor subvector inserts (Adam Nemet, 2014-10-02; 1 file, -102/+55)

  No functional change. Very similar to the extract refactoring I did in
  r218478. Compared X86.td.expanded before and after.

  llvm-svn: 218927
* [AVX512] Fix i256mem->f256mem typo in VINSERTF64x4rm (Adam Nemet, 2014-10-02; 1 file, -1/+1)

  Just like in the case of extracts, the refactoring is uncovering some typos
  in the code.

  llvm-svn: 218926
* [PowerPC] Modern Book-E cores support sync (Hal Finkel, 2014-10-02; 4 files, -17/+24)

  Older Book-E cores, such as the PPC 440, support only msync (which has the
  same encoding as sync 0), but not any of the other sync forms. Newer Book-E
  cores, however, do support sync, and for performance reasons we should allow
  the use of the more general form.

  This refactors msync use into its own feature group so that it applies by
  default only to older Book-E cores (of the relevant cores, we only have
  definitions for the PPC440/450 currently).

  llvm-svn: 218923
* [Power] Improve the expansion of atomic loads/stores (Robin Morisset, 2014-10-02; 3 files, -4/+26)

  Summary: Atomic loads and stores of up to the native size (32 bits, or 64
  for PPC64) can be lowered to a simple load or store instruction (the
  synchronization is already handled by AtomicExpand, and the atomicity is
  guaranteed thanks to the alignment requirements of atomic accesses). This is
  exactly what this patch does. Previously, these were implemented by complex
  load-linked/store-conditional loops, an obvious performance problem.

  For example, this patch turns

  ```
  define void @store_i8_unordered(i8* %mem) {
    store atomic i8 42, i8* %mem unordered, align 1
    ret void
  }
  ```

  from

  ```
  _store_i8_unordered:            ; @store_i8_unordered
  ; BB#0:
    rlwinm r2, r3, 3, 27, 28
    li r4, 42
    xori r5, r2, 24
    rlwinm r2, r3, 0, 0, 29
    li r3, 255
    slw r4, r4, r5
    slw r3, r3, r5
    and r4, r4, r3
  LBB4_1:                         ; =>This Inner Loop Header: Depth=1
    lwarx r5, 0, r2
    andc r5, r5, r3
    or r5, r4, r5
    stwcx. r5, 0, r2
    bne cr0, LBB4_1
  ; BB#2:
    blr
  ```

  into

  ```
  _store_i8_unordered:            ; @store_i8_unordered
  ; BB#0:
    li r2, 42
    stb r2, 0(r3)
    blr
  ```

  which looks like a pretty clear win to me.

  Test Plan: fixed the tests + new test for indexed accesses + make check-all

  Reviewers: jfb, wschmidt, hfinkel
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5587

  llvm-svn: 218922
* [Stackmaps] Make the frame pointer required for stackmaps (Juergen Ributzka, 2014-10-02; 2 files, -2/+4)

  Do not eliminate the frame pointer if there is a stackmap or patchpoint in
  the function. All stackmap references should be FP relative.

  This fixes PR21107.

  llvm-svn: 218920
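  The mechanics amount to a check along these lines in the frame-lowering
  hasFP decision (an approximate sketch using the MachineFrameInfo flags; the
  surrounding conditions in the real X86 hook differ, and newer LLVM returns
  getFrameInfo() by reference rather than pointer):

  ```cpp
  #include "llvm/CodeGen/MachineFrameInfo.h"
  #include "llvm/CodeGen/MachineFunction.h"

  using namespace llvm;

  // Force a frame pointer whenever the function contains a stackmap or
  // patchpoint, so that stackmap references can stay FP-relative.
  static bool forcesFramePointer(const MachineFunction &MF) {
    const MachineFrameInfo *MFI = MF.getFrameInfo();
    return MFI->hasStackMap() || MFI->hasPatchPoint();
  }
  ```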
* [x86] Teach the new vector shuffle lowering to widen floating point elements as well as integer elements in order to form simpler shuffle patterns (Chandler Carruth, 2014-10-02; 2 files, -8/+19)

  This is the primary reason why we were failing to match some of the 2-and-2
  floating point shuffles such as PR21140. Even after fixing this we need to
  support some extra patterns in the backend in order to match the resulting
  X86ISD::UNPCKL nodes into the correct instructions. This commit should fix
  PR21140 and includes more comprehensive testing of insertion patterns in v4
  shuffles.

  Not all of the added tests are beautiful. For example, we don't have clever
  instructions to insert-via-load in the integer domain. There are also some
  places where we aren't sufficiently cunning with our use of movq and movd,
  but that's future work.

  llvm-svn: 218911
* [NVPTX] Remove dead code. (Tilmann Scheller, 2014-10-02; 1 file, -9/+3)

  Found by the Clang static analyzer.

  llvm-svn: 218874
* Support padding unaligned data in .text. (Joerg Sonnenberger, 2014-10-02; 1 file, -1/+6)
  llvm-svn: 218870
* [x86] Improve and correct how the new vector shuffle lowering was matching and lowering 64-bit insertions (Chandler Carruth, 2014-10-01; 1 file, -8/+32)

  The first problem was that we weren't looking through bitcasts to discover
  that we *could* lower as insertions. Once fixed, we in turn weren't looking
  through bitcasts to discover that we could fold a load into the lowering.
  Once fixed, we weren't forming a SCALAR_TO_VECTOR node around the inserted
  element and instead were passing a scalar to a DAG node that expected a
  vector. It turns out there are some patterns that will "lower" this into the
  correct asm, but the rest of the X86 backend is very unhappy with such
  antics.

  This should fix a few more edge-case regressions I've spotted going through
  the regression test suite to enable the new vector shuffle lowering.

  llvm-svn: 218839
* constify the TargetMachine argument used in the subtarget and lowering constructors. (Eric Christopher, 2014-10-01; 4 files, -4/+4)
  llvm-svn: 218832
* Lower FNEG(FABS(x)) -> FNABS(x) [X86 codegen] PR20578 (Sanjay Patel, 2014-10-01; 1 file, -6/+22)

  Negative FABS of either a scalar or vector should be handled the same way on
  x86 with SSE/AVX: a single OR instruction of the FP operand with a constant
  to light up the sign bit(s).

  http://llvm.org/bugs/show_bug.cgi?id=20578
  Differential Revision: http://reviews.llvm.org/D5201

  llvm-svn: 218822
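  At the bit level the transform just sets the IEEE-754 sign bit; a small
  illustrative C++ model of the idea (not the backend code):

  ```cpp
  #include <cmath>
  #include <cstdint>
  #include <cstring>

  // Model of FNEG(FABS(x)): -|x| always has the sign bit set, so a single OR
  // with the sign-bit mask produces the result regardless of x's sign.
  float fnabs(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits); // type-pun safely
    bits |= 0x80000000u;                 // light up the sign bit
    std::memcpy(&x, &bits, sizeof x);
    return x;                            // equals -std::fabs(x)
  }
  ```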
* Now that the optimization level is adjusting the feature string before we hit the subtarget, remove the constructor parameter. (Eric Christopher, 2014-10-01; 3 files, -9/+4)
  llvm-svn: 218817
* Rework the PPC TargetMachine so that the non-function-specific overrides happen at TargetMachine creation and not on every subtarget creation. (Eric Christopher, 2014-10-01; 2 files, -27/+32)
  llvm-svn: 218805
* constify TargetMachine parameter for X86TargetLowering. (Eric Christopher, 2014-10-01; 4 files, -5/+5)
  llvm-svn: 218804
* Don't repeat function/variable name in comment. NFC. (Sanjay Patel, 2014-10-01; 2 files, -99/+84)
  llvm-svn: 218791
* ARM: allow copying of CPSR when all else fails. (Tim Northover, 2014-10-01; 3 files, -1/+57)

  As with x86 and AArch64, certain situations can arise where we need to spill
  CPSR in the middle of a calculation. These should be avoided where possible
  (MRS/MSR is rather expensive), which ARM is actually better at than the
  other two since it tries to Glue defs to uses, but as a last-ditch effort,
  copying is better than crashing.

  rdar://problem/18011155

  llvm-svn: 218789
* Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics (Adrian Prantl, 2014-10-01; 3 files, -10/+11)

  Previously, DIVariable was a variable-length field that had an optional
  reference to a Metadata array consisting of a variable number of complex
  address expressions. In the case of OpPiece expressions this wastes a lot of
  storage in IR, because when an aggregate type is, e.g., SROA'd into all of
  its n individual members, the IR will contain n copies of the DIVariable,
  all alike, only differing in the complex address reference at the end.

  By making the complex address into an extra argument of the
  dbg.value/dbg.declare intrinsics, all of the pieces can reference the same
  variable and the complex address expressions can be uniqued across the CU,
  too. Down the road, this will allow us to move other flags, such as
  "indirection", out of the DIVariable, too.

  The new intrinsics look like this:

    declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
    declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

  This patch adds a new LLVM-local tag to DIExpressions, so we can detect and
  pretty-print DIExpression metadata nodes.

  What this patch doesn't do: it does not touch the "Indirect" field in
  DIVariable; moving that into the expression would be a natural next step.

  http://reviews.llvm.org/D4919
  rdar://problem/17994491

  Thanks to dblaikie and dexonsmith for reviewing this patch!

  Note: I accidentally committed a bogus older version of this patch
  previously.

  llvm-svn: 218787
* Add fptrunc to mips fast-isel (Reed Kotler, 2014-10-01; 1 file, -0/+25)

  Summary: Implement conversion of 64-bit to 32-bit floating point numbers
  (fptrunc) in mips fast-isel.

  Test Plan: fptrunc.ll; checked also with 4 internal mips build bot flavors
  (mips32r1/mips32r2) and at -O0 and -O2.

  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: rfuhler
  Differential Revision: http://reviews.llvm.org/D5553

  llvm-svn: 218785
* Revert r218778 while investigating buildbot breakage (Adrian Prantl, 2014-10-01; 3 files, -11/+10)

  "Move the complex address expression out of DIVariable and into an extra"

  llvm-svn: 218782
* Move the complex address expression out of DIVariable and into an extra argument of the llvm.dbg.declare/llvm.dbg.value intrinsics (Adrian Prantl, 2014-10-01; 3 files, -10/+11)

  (Original commit of the change re-landed as r218787 above; the description
  is the same.)

  Previously, DIVariable was a variable-length field that had an optional
  reference to a Metadata array consisting of a variable number of complex
  address expressions. In the case of OpPiece expressions this wastes a lot of
  storage in IR, because when an aggregate type is, e.g., SROA'd into all of
  its n individual members, the IR will contain n copies of the DIVariable,
  all alike, only differing in the complex address reference at the end.

  By making the complex address into an extra argument of the
  dbg.value/dbg.declare intrinsics, all of the pieces can reference the same
  variable and the complex address expressions can be uniqued across the CU,
  too. Down the road, this will allow us to move other flags, such as
  "indirection", out of the DIVariable, too.

  The new intrinsics look like this:

    declare void @llvm.dbg.declare(metadata %storage, metadata %var, metadata %expr)
    declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %var, metadata %expr)

  This patch adds a new LLVM-local tag to DIExpressions, so we can detect and
  pretty-print DIExpression metadata nodes.

  What this patch doesn't do: it does not touch the "Indirect" field in
  DIVariable; moving that into the expression would be a natural next step.

  http://reviews.llvm.org/D4919
  rdar://problem/17994491

  Thanks to dblaikie and dexonsmith for reviewing this patch!

  llvm-svn: 218778
* R600: Call EmitFunctionHeader() in the AsmPrinter to populate the ELF symbol table (Tom Stellard, 2014-10-01; 1 file, -1/+1)
  llvm-svn: 218776
* [mips] Rename emit and parse functions for the .cpload assembler directive. NFC. (Toma Tabacu, 2014-10-01; 3 files, -10/+10)

  Summary: It's better if we have a consistent name for .cpload-related
  functions.

  Reviewers: dsanders
  Reviewed By: dsanders
  Subscribers: llvm-commits
  Differential Revision: http://reviews.llvm.org/D5437

  llvm-svn: 218768
* R600/SI: Add a generic pseudo EXP instruction (Tom Stellard, 2014-10-01; 3 files, -8/+30)
  llvm-svn: 218767
* R600/SI: Add generic pseudo MTBUF instructions (Tom Stellard, 2014-10-01; 3 files, -31/+58)
  llvm-svn: 218766
* R600/SI: Add generic pseudo SMRD instructions (Tom Stellard, 2014-10-01; 2 files, -14/+39)
  llvm-svn: 218765
* [ARM] Allow selecting VRINT[APMXZR] and VCVT[BT] instructions for FPv5 (Oliver Stannard, 2014-10-01; 1 file, -12/+17)

  Currently, we only codegen the VRINT[APMXZR] and VCVT[BT] instructions when
  targeting ARMv8, but they are actually present on any target with FP-ARMv8.
  Note that FP-ARMv8 is called FPv5 when it is part of an M-profile core, but
  they have the same instructions, so we model them both as FPARMv8 in the ARM
  backend.

  llvm-svn: 218763
* [x86] Fix a few more tiny patterns with the new vector shuffle lowering that keep cropping up in the regression test suite (Chandler Carruth, 2014-10-01; 1 file, -5/+23)

  This also addresses one of the issues raised on the mailing list with
  failing to form 'movsd' in as many cases as we realistically should. There
  will be corresponding patches forthcoming for v4f32 at least. This was a lot
  of fuss for a relatively small gain, but all the fuss was on my end trying
  different ways of holding the pieces of the x86 fragment patterns *just
  right*. Now that it works, the code is reasonably simple.

  In the new test cases I'm adding here, v2i64 sticks out as just plain
  horrible. I've not come up with any great ideas here other than that it
  would be nice to recognize when we're *going* to take a domain-crossing hit
  and cross earlier to get the decent instructions. At least with AVX it is
  slightly less silly...

  llvm-svn: 218756
* [x86] Delete some extraneous logic from the new vector shuffle lowering. (Chandler Carruth, 2014-10-01; 1 file, -7/+0)

  Nothing was relying on this, and there are potentially some edge cases that
  it would not be correct under. Removing it seems better than trying to "fix"
  it, as nothing was relying on it.

  llvm-svn: 218755
* [AArch64] Allow access to all system registers with MRS/MSR instructions. (Tom Coxon, 2014-10-01; 5 files, -60/+26)

  The A64 instruction set includes a generic register syntax for accessing
  implementation-defined system registers. The syntax for these registers is:

    S<op0>_<op1>_<CRn>_<CRm>_<op2>

  The encoding space previously permitted for implementation-defined system
  registers was:

    op0  op1  CRn   CRm   op2
    11   xxx  1x11  xxxx  xxx

  The full encoding space can now be accessed:

    op0  op1  CRn   CRm   op2
    xx   xxx  xxxx  xxxx  xxx

  This is useful to anyone needing to write assembly code supporting new
  system registers before the assembler has learned the official names for
  them.

  llvm-svn: 218753
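  To make the generic syntax concrete, a small helper that renders an encoding
  as a register name (illustrative only; the encoding values are made up, and
  the assembler's exact case and 'C' prefix conventions for CRn/CRm may
  differ):

  ```cpp
  #include <cstdio>
  #include <string>

  // Format an implementation-defined system register name from its encoding
  // fields, following the generic A64 form S<op0>_<op1>_<CRn>_<CRm>_<op2>.
  std::string sysRegName(unsigned op0, unsigned op1, unsigned CRn,
                         unsigned CRm, unsigned op2) {
    char Buf[32];
    std::snprintf(Buf, sizeof Buf, "S%u_%u_C%u_C%u_%u", op0, op1, CRn, CRm, op2);
    return Buf;
  }

  int main() {
    // e.g. usable as:  mrs x0, S3_1_C11_C0_2   (hypothetical register)
    std::printf("%s\n", sysRegName(3, 1, 11, 0, 2).c_str());
  }
  ```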
* Add missing natural vector cast. (Asiri Rathnayake, 2014-10-01; 2 files, -0/+2)

  Summary: The natural vector cast node (similar to bitcast) AArch64ISD::NVCAST
  was introduced in r217159 and r217138. This patch adds a missing cast from
  v2f32 to v1i64 which is causing some compilation failures. Also added test
  cases to cover various modimm types and BUILD_VECTORs with i64 elements.

  llvm-svn: 218751
* [ARM] Add support for Cortex-M7, FPv5-SP and FPv5-DP (LLVM) (Oliver Stannard, 2014-10-01; 5 files, -1/+20)

  The Cortex-M7 has 3 options for its FPU: none, FPv5-SP-D16 and FPv5-DP-D16.
  FPv5 has the same instructions as FP-ARMv8, so it can be modelled using the
  same target feature, and all double-precision operations are already
  disabled by the fp-only-sp target feature.

  llvm-svn: 218747