summaryrefslogtreecommitdiffstats
path: root/llvm/lib
Commit message (Collapse)AuthorAgeFilesLines
...
* Move registering the execution of a basic block to the beginning rather than ↵Bill Wendling2013-08-201-7/+7
| | | | | | | | | | | | | | | | | the end. There are situations which can affect the correctness (or at least expectation) of the gcov output. For instance, if a call to __gcov_flush() occurs within a block before the execution count is registered and then the program aborts in some way, then that block will not be marked as executed. This is not normally what the user expects. If we move the code that's registering when a block is executed to the beginning, we can catch these types of situations. PR16893 llvm-svn: 188849
* [mips] Add support for mfhc1 and mthc1.Akira Hatanaka2013-08-203-19/+45
| | | | llvm-svn: 188848
* [mips] Add support for calling convention CC_MipsO32_FP64, which is used ↵Akira Hatanaka2013-08-206-32/+70
| | | | | | | | | | when the size of floating point registers is 64-bit. Test case will be added when support for mfhc1 and mthc1 is added. llvm-svn: 188847
* [mips] Remove predicates that were incorrectly or unnecessarily added.Akira Hatanaka2013-08-203-6/+6
| | | | llvm-svn: 188845
* Add some constantness.Jakub Staszak2013-08-201-5/+6
| | | | llvm-svn: 188844
* [mips] Define register class FGRH32 for the high half of the 64-bit floatingAkira Hatanaka2013-08-206-16/+60
| | | | | | | | point registers. We will need this register class later when we add definitions for instructions mfhc1 and mthc1. Also, remove sub-register indices sub_fpeven and sub_fpodd and use sub_lo and sub_hi instead. llvm-svn: 188842
* SLPVectorizer: Fix invalid iterator errorsArnold Schwaighofer2013-08-201-13/+51
| | | | | | | | | | | Update iterator when the SLP vectorizer changes the instructions in the basic block by restarting the traversal of the basic block. Patch by Yi Jiang! Fixes PR 16899. llvm-svn: 188832
* Teach ConstantFolding about pointer address spacesMatt Arsenault2013-08-201-33/+54
| | | | llvm-svn: 188831
* [mips] Resolve register classes dynamically using ptr_rc to reduce the number ofAkira Hatanaka2013-08-2010-371/+184
| | | | | | | | load/store instructions defined. Previously, we were defining load/store instructions for each pointer size (32 and 64-bit), but now we need just one definition. llvm-svn: 188830
* Add an option which permits the user to specify using a bitmask, that variousReed Kotler2013-08-201-6/+28
| | | | | | | | | functions be compiled as mips32, without having to add attributes. This is useful in certain situations where you don't want to have to edit the function attributes in the source. For now it's only an option used for the compiler developers when debugging the mips16 port. llvm-svn: 188826
* [mips] Guard micromips instructions with predicate InMicroMips. Also, fixAkira Hatanaka2013-08-202-12/+10
| | | | | | assembler predicate HasStdEnd so that it is false when the target is micromips. llvm-svn: 188824
* ARM: Fix fast-isel copy/paste-o.Jim Grosbach2013-08-201-1/+1
| | | | | | | | | | | | Update testcase to be more careful about checking register values. While regexes are general goodness for these sorts of testcases, in this example, the registers are constrained by the calling convention, so we can and should check their explicit values. rdar://14779513 llvm-svn: 188819
* Fix style issues in AsmParser.cppVladimir Medic2013-08-201-9/+11
| | | | llvm-svn: 188798
* AVX-512: Added more patterns for VMOVSS, VMOVSD, VMOVD, VMOVQElena Demikhovsky2013-08-202-12/+71
| | | | llvm-svn: 188786
* [mips][msa] Removed fcge, fcgt, fsge, fsgtDaniel Sanders2013-08-201-44/+0
| | | | | | | These instructions were present in a draft spec but were removed before publication. llvm-svn: 188782
* [SystemZ] Update READMERichard Sandiford2013-08-201-4/+2
| | | | | | We now use MVST, CLST and SRST for the obvious cases. llvm-svn: 188781
* [SystemZ] Use SRST to optimize memchrRichard Sandiford2013-08-206-3/+80
| | | | | | | | | | | | | | | | | | | SystemZTargetLowering::emitStringWrapper() previously loaded the character into R0 before the loop and made R0 live on entry. I'd forgotten that allocatable registers weren't allowed to be live across blocks at this stage, and it confused LiveVariables enough to cause a miscompilation of f3 in memchr-02.ll. This patch instead loads R0 in the loop and leaves LICM to hoist it after RA. This is actually what I'd tried originally, but I went for the manual optimisation after noticing that R0 often wasn't being hoisted. This bug forced me to go back and look at why, now fixed as r188774. We should also try to optimize null checks so that they test the CC result of the SRST directly. The select between null and the SRST GPR result could then usually be deleted as dead. llvm-svn: 188779
* memcmp is not a valid way to compare structs with padding in them.Benjamin Kramer2013-08-201-2/+9
| | | | llvm-svn: 188778
* [mips][msa] Added insveDaniel Sanders2013-08-201-0/+32
| | | | llvm-svn: 188777
* Fix overly pessimistic shortcut in post-RA MachineLICMRichard Sandiford2013-08-201-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Post-RA LICM keeps three sets of registers: PhysRegDefs, PhysRegClobbers and TermRegs. When it sees a definition of R it adds all aliases of R to the corresponding set, so that when it needs to test for membership it only needs to test a single register, rather than worrying about aliases there too. E.g. the final candidate loop just has: unsigned Def = Candidates[i].Def; if (!PhysRegClobbers.test(Def) && ...) { to test whether register Def is multiply defined. However, there was also a shortcut in ProcessMI to make sure we didn't add candidates if we already knew that they would fail the final test. This shortcut was more pessimistic than the final one because it checked whether _any alias_ of the defined register was multiply defined. This is too conservative for targets that define register pairs. E.g. on z, R0 and R1 are sometimes used as a pair, so there is a 128-bit register that aliases both R0 and R1. If a loop used R0 and R1 independently, and the definition of R0 came first, we would be able to hoist the R0 assignment (because that used the final test quoted above) but not the R1 assignment (because that meant we had two definitions of the paired R0/R1 register and would fail the shortcut in ProcessMI). This patch just uses the same check for the ProcessMI shortcut as we use in the final candidate loop. llvm-svn: 188774
* ARM: implement some simple f64 materializations.Tim Northover2013-08-201-10/+40
| | | | | | | | Previously we used a const-pool load for virtually all 64-bit floating values. Actually, we can get quite a few common values (including 0.0, 1.0) via "vmov" instructions of one stripe or another. llvm-svn: 188773
* [stackprotector] Small cleanup.Michael Gottesman2013-08-201-1/+2
| | | | llvm-svn: 188772
* [stackprotector] Small Bit of computation hoisting.Michael Gottesman2013-08-201-5/+5
| | | | llvm-svn: 188771
* [stackprotector] Added significantly longer comment to FindPotentialTailCall ↵Michael Gottesman2013-08-201-1/+6
| | | | | | to make clear its relationship to llvm::isInTailCallPosition. llvm-svn: 188770
* Removed trailing whitespace.Michael Gottesman2013-08-201-18/+18
| | | | llvm-svn: 188769
* [stackprotector] Removed stale TODO.Michael Gottesman2013-08-201-6/+3
| | | | llvm-svn: 188768
* [mips][msa] Added and.v, bmnz.v, bmz.v, bsel.v, nor.v, or.v, xor.vDaniel Sanders2013-08-202-0/+64
| | | | llvm-svn: 188767
* [stackprotector] Added support for emitting the llvm intrinsic stack ↵Michael Gottesman2013-08-201-49/+150
| | | | | | | | protector check. rdar://13935163 llvm-svn: 188766
* [stackprotector] Refactor out the end of isInTailCallPosition into the ↵Michael Gottesman2013-08-201-1/+8
| | | | | | | | | | function returnTypeIsEligibleForTailCall. This allows me to use returnTypeIsEligibleForTailCall in the stack protector pass. rdar://13935163 llvm-svn: 188765
* Remove unused variables that crept in.Michael Gottesman2013-08-201-6/+0
| | | | llvm-svn: 188761
* Teach selectiondag how to handle the stackprotectorcheck intrinsic.Michael Gottesman2013-08-203-4/+390
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Previously, generation of stack protectors was done exclusively in the pre-SelectionDAG Codegen LLVM IR Pass "Stack Protector". This necessitated splitting basic blocks at the IR level to create the success/failure basic blocks in the tail of the basic block in question. As a result of this, calls that would have qualified for the sibling call optimization were no longer eligible for optimization since said calls were no longer right in the "tail position" (i.e. the immediate predecessor of a ReturnInst instruction). Then it was noticed that since the sibling call optimization causes the callee to reuse the caller's stack, if we could delay the generation of the stack protector check until later in CodeGen after the sibling call decision was made, we get both the tail call optimization and the stack protector check! A few goals in solving this problem were: 1. Preserve the architecture independence of stack protector generation. 2. Preserve the normal IR level stack protector check for platforms like OpenBSD for which we support platform specific stack protector generation. The main problem that guided the present solution is that one can not solve this problem in an architecture independent manner at the IR level only. This is because: 1. The decision on whether or not to perform a sibling call on certain platforms (for instance i386) requires lower level information related to available registers that can not be known at the IR level. 2. Even if the previous point were not true, the decision on whether to perform a tail call is done in LowerCallTo in SelectionDAG which occurs after the Stack Protector Pass. As a result, one would need to put the relevant callinst into the stack protector check success basic block (where the return inst is placed) and then move it back later at SelectionDAG/MI time before the stack protector check if the tail call optimization failed. The MI level option was nixed immediately since it would require platform specific pattern matching. The SelectionDAG level option was nixed because SelectionDAG only processes one IR level basic block at a time implying one could not create a DAG Combine to move the callinst. To get around this problem a few things were realized: 1. While one can not handle multiple IR level basic blocks at the SelectionDAG Level, one can generate multiple machine basic blocks for one IR level basic block. This is how we handle bit tests and switches. 2. At the MI level, tail calls are represented via a special return MIInst called "tcreturn". Thus if we know the basic block in which we wish to insert the stack protector check, we get the correct behavior by always inserting the stack protector check right before the return statement. This is a "magical transformation" since no matter where the stack protector check intrinsic is, we always insert the stack protector check code at the end of the BB. Given the aforementioned constraints, the following solution was devised: 1. On platforms that do not support SelectionDAG stack protector check generation, allow for the normal IR level stack protector check generation to continue. 2. On platforms that do support SelectionDAG stack protector check generation: a. Use the IR level stack protector pass to decide if a stack protector is required/which BB we insert the stack protector check in by reusing the logic already therein. If we wish to generate a stack protector check in a basic block, we place a special IR intrinsic called llvm.stackprotectorcheck right before the BB's returninst or if there is a callinst that could potentially be sibling call optimized, before the call inst. b. Then when a BB with said intrinsic is processed, we codegen the BB normally via SelectBasicBlock. In said process, when we visit the stack protector check, we do not actually emit anything into the BB. Instead, we just initialize the stack protector descriptor class (which involves stashing information/creating the success mbbb and the failure mbb if we have not created one for this function yet) and export the guard variable that we are going to compare. c. After we finish selecting the basic block, in FinishBasicBlock if the StackProtectorDescriptor attached to the SelectionDAGBuilder is initialized, we first find a splice point in the parent basic block before the terminator and then splice the terminator of said basic block into the success basic block. Then we code-gen a new tail for the parent basic block consisting of the two loads, the comparison, and finally two branches to the success/failure basic blocks. We conclude by code-gening the failure basic block if we have not code-gened it already (all stack protector checks we generate in the same function, use the same failure basic block). llvm-svn: 188755
* Fix formatting. No functional change.Craig Topper2013-08-201-1/+1
| | | | llvm-svn: 188746
* Add AVX-512 and related features to the CPUID detection code.Craig Topper2013-08-201-3/+19
| | | | llvm-svn: 188745
* Move AVX and non-AVX replication inside a couple multiclasses to avoid ↵Craig Topper2013-08-201-87/+60
| | | | | | repeating each instruction for both individually. llvm-svn: 188743
* Add an error check for a typo I accidentally made in a td file that caused ↵Craig Topper2013-08-201-0/+3
| | | | | | an assert to fire. llvm-svn: 188742
* [PowerPC] More refactoring prior to real PPC emitPrologue/Epilogue changes.Bill Schmidt2013-08-201-271/+194
| | | | | | | | | | | | | | | | | | | | | | | | | | | | (Patch committed on behalf of Mark Minich, whose log entry follows.) This is a continuation of the refactorings performed in svn rev 188573 (see that rev's comments for more detail). This is my stage 2 refactoring: I combined the emitPrologue() & emitEpilogue() PPC32 & PPC64 code into a single flow, simplifying a lot of the code since in essence the PPC32 & PPC64 code generation logic is the same, only the instruction forms are different (in most cases). This simplification is necessary because my functional changes (yet to come) add significant complexity, and without the simplification of my stage 2 refactoring, the overall complexity of both emitPrologue() & emitEpilogue() would have become almost intractable for most mortal programmers (like me). This submission was intended to be a pure refactoring (no functional changes whatsoever). However, in the process of combining the PPC32 & PPC64 flows, I spotted a difference that I believe is a bug (see svn rev 186478 line 863, or svn rev 188573 line 888): This line appears to be restoring the BP with the original FP content, not the original BP content. When I merged the 32-bit and 64-bit code, I used the corresponding code from the 64-bit flow, which I believe uses the correct offset (BPOffset) for this operation. llvm-svn: 188741
* [Sparc] Use HWEncoding instead of unused Num field in Sparc register ↵Venkatraman Govindaraju2013-08-202-12/+9
| | | | | | definitions. Also, correct the definitions of RETL and RET instructions. llvm-svn: 188738
* Add a llvm.copysign intrinsicHal Finkel2013-08-197-0/+26
| | | | | | | | | | | | | | | | | | | | | This adds a llvm.copysign intrinsic; We already have Libfunc recognition for copysign (which is turned into the FCOPYSIGN SDAG node). In order to autovectorize calls to copysign in the loop vectorizer, we need a corresponding intrinsic as well. In addition to the expected changes to the language reference, the loop vectorizer, BasicTTI, and the SDAG builder (the intrinsic is transformed into an FCOPYSIGN node, just like the function call), this also adds FCOPYSIGN to a few lists in LegalizeVector{Ops,Types} so that vector copysigns can be expanded. In TargetLoweringBase::initActions, I've made the default action for FCOPYSIGN be Expand for vector types. This seems correct for all in-tree targets, and I think is the right thing to do because, previously, there was no way to generate vector-values FCOPYSIGN nodes (and most targets don't specify an action for vector-typed FCOPYSIGN). llvm-svn: 188728
* Don't form PPC CTR-based loops around a copysignl callHal Finkel2013-08-191-1/+2
| | | | | | | | | copysign/copysignf never become function calls (because the SDAG expansion code does not lower to the corresponding function call, but rather directly implements the associated logic), but copysignl almost always is lowered into a call to the requested libm functon (and, thus, might clobber CTR). llvm-svn: 188727
* Adding PIC support for ELF on x86_64 platformsAndrew Kaylor2013-08-194-16/+244
| | | | llvm-svn: 188726
* Introduce non-const overloads for GlobalAlias::{get,resolve}AliasedGlobal.Peter Collingbourne2013-08-191-8/+8
| | | | llvm-svn: 188725
* Use pop_back_val() instead of both back() and pop_back().Jakub Staszak2013-08-191-2/+1
| | | | llvm-svn: 188723
* Teach InstCombine visitGetElementPtr about address spacesMatt Arsenault2013-08-193-20/+26
| | | | llvm-svn: 188721
* Cleanup visitGetElementPtr to make address space change easierMatt Arsenault2013-08-191-11/+13
| | | | llvm-svn: 188720
* commonPointerCast cleanups to make address space change easierMatt Arsenault2013-08-191-5/+11
| | | | llvm-svn: 188719
* Fix assert with GEP ptr vector indexing structsMatt Arsenault2013-08-191-2/+12
| | | | | | | | Also fix it calculating the wrong value. The struct index is not a ConstantInt, so it was being interpreted as an array index. llvm-svn: 188713
* Use less verbose code and update comments.Eric Christopher2013-08-191-23/+16
| | | | llvm-svn: 188711
* Revert non-test parts of r188507Matt Arsenault2013-08-191-1/+9
| | | | | | Re-add the inboundsless tests I didn't add originally llvm-svn: 188710
* Turn on pubnames by default on linux.Eric Christopher2013-08-192-10/+22
| | | | | | | | | Until gdb supports the new accelerator tables we should add the pubnames section so that gdb_index can be generated from gold at link time. On darwin we already emit the accelerator tables and so don't need to worry about pubnames. llvm-svn: 188708
* Improve the widening of integral binary vector operationsPaul Redmond2013-08-192-10/+24
| | | | | | | | | | | | | | - split WidenVecRes_Binary into WidenVecRes_Binary and WidenVecRes_BinaryCanTrap - WidenVecRes_BinaryCanTrap preserves the original behaviour for operations that can trap - WidenVecRes_Binary simply widens the operation and improves codegen for 3-element vectors by allowing widening and promotion on x86 (matches the behaviour of unary and ternary operation widening) - use WidenVecRes_Binary for operations on integers. Reviewed by: nrotem llvm-svn: 188699
OpenPOWER on IntegriCloud