path: root/llvm/lib
Commit message, Author, Age, Files, Lines
* Move ashr optimization from InstCombineShift to InstSimplify.Suyog Sarda2014-07-172-5/+5
| | | | | | | | | Refactor code, no functionality change, test case moved from instcombine to instsimplify. Differential Revision: http://reviews.llvm.org/D4102 llvm-svn: 213231
* Use range forMatt Arsenault2014-07-171-6/+4
| | | | llvm-svn: 213230
* R600: Short circuit alloca check if address space isn't private.Matt Arsenault2014-07-171-1/+1
| | | | | | | Skip calling GetUnderlyingObject in cases where it obviously isn't from an alloca. This should only be a compile time improvement. llvm-svn: 213229
* Fix Typo (first commit to test commit access)Suyog Sarda2014-07-171-1/+1
| | | | llvm-svn: 213228
* MC: make WinEH opcode an opaque valueSaleem Abdulrasool2014-07-172-16/+29
| | | | | | | | | | | This makes the opcode an opaque value (unsigned int) rather than the enumeration. This permits the use of target specific operands. Split out the generic type into a MCWinEH header and add a supporting MCWin64EH::Instruction to abstract out the selection of the opcode and construction of the actual instruction. llvm-svn: 213221
* Improve BasicAA CS-CS queries (redux)Hal Finkel2014-07-173-130/+151
  This reverts, "r213024 - Revert r212572 "improve BasicAA CS-CS queries", it causes PR20303." with a fix for the bug in pr20303. As it turned out, the relevant code was both wrong and over-conservative (because, as with the code it replaced, it would return the overall ModRef mask even if just Ref had been implied by the argument aliasing results). Hopefully, this correctly fixes both problems.

  Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've cleaned up a little and added in DSE's test directory). The BasicAA test has also been updated to check for this error.

  Original commit message:

  BasicAA contains knowledge of certain intrinsics, such as memcpy and memset, and uses that information to form more-accurate answers to CallSite vs. Loc ModRef queries. Unfortunately, it did not use this information when answering CallSite vs. CallSite queries.

  Generically, when an intrinsic takes one or more pointers and the intrinsic is marked only to read/write from its arguments, the offset/size is unknown. As a result, the generic code that answers CallSite vs. CallSite (and CallSite vs. Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's arguments. While BasicAA's CallSite vs. Loc override could use more-accurate size information for some intrinsics, it did not do the same for CallSite vs. CallSite queries.

  This change refactors the intrinsic-specific logic in BasicAA into a generic AA query function: getArgLocation, which is overridden by BasicAA to supply the intrinsic-specific knowledge, and used by AA's generic implementation. This allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and CallSite vs. CallSite queries, and simplifies the BasicAA implementation.

  Currently, only one function, Mac's memset_pattern16, is handled by BasicAA (all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's getModRefBehavior override now also returns OnlyAccessesArgumentPointees for this function (which is an improvement).

  llvm-svn: 213219
* Partially revert r210444 due to performance regressionJingyue Wu2014-07-161-57/+1
  Summary: Converting outermost zext(a) to sext(a) causes worse code when the computation of zext(a) could be reused. For example, after converting

    ... = array[zext(a)]
    ... = array[zext(a) + 1]

  to

    ... = array[sext(a)]
    ... = array[zext(a) + 1],

  the program computes sext(a), which is actually unnecessary. I added one test in split-gep-and-gvn.ll to illustrate this scenario.

  Also, with r211281 and r211084, we annotate more "nuw" tags to computation involving CUDA intrinsics such as threadIdx.x. These annotations help with splitting GEP a lot, rendering the benefit we get from this reverted optimization only marginal.

  Test Plan: make check-all
  Reviewers: eliben, meheff
  Reviewed By: meheff
  Subscribers: jholewinski, llvm-commits
  Differential Revision: http://reviews.llvm.org/D4542
  llvm-svn: 213209
* Fixed formatting, removed bug reference, renamed testcaseSanjay Patel2014-07-161-3/+4
| | | | | | Thanks to Duncan Exon Smith for reviewing and cleanup suggestions. llvm-svn: 213205
* [FastISel] Local values shouldn't be alive across an inline asm call with side effects.Juergen Ributzka2014-07-161-0/+5
  This fixes an issue where a local value is defined before and used after an inline asm call with side effects. This fix simply flushes the local value map, which updates the insertion point for the inline asm call to be above any previously defined local values.

  This fixes <rdar://problem/17694203>
  llvm-svn: 213203
* [MCJIT] Improve a RuntimeDyldChecker diagnostic.Lang Hames2014-07-161-3/+7
| | | | | | | When a RuntimeDyldChecker test requests an invalid operand for an instruction, print the decoded instruction to aid diagnosis. llvm-svn: 213202
* trivial fix for PR20314Sanjay Patel2014-07-161-1/+4
| | | | | | Make sure that the AddrInst is an Instruction. llvm-svn: 213197
* Remove Atom references in description.Sanjay Patel2014-07-161-4/+3
| | | | | | Any CPU can run this pass. llvm-svn: 213190
* Utilize CastInst::CreatePointerBitCastOrAddrSpaceCast here.Manuel Jacob2014-07-161-9/+6
| | | | llvm-svn: 213189
* [RegisterCoalescer] Moving the RegisterCoalescer subtarget hook onto the TargetRegisterInfo instead of the TargetSubtargetInfo.Chris Bieneman2014-07-165-66/+68
  llvm-svn: 213188
* [NVPTX] Honor alignment on vector loads/storesJustin Holewinski2014-07-161-5/+31
  We were not considering the stated alignment on vector loads/stores, leading us to generate vector instructions even when we do not have sufficient alignment.

  Now, for IR like:

    %1 = load <4 x float>, <4 x float>* %ptr, align 4

  we will generate correct, conservative PTX like:

    ld.f32 ... [%ptr]
    ld.f32 ... [%ptr+4]
    ld.f32 ... [%ptr+8]
    ld.f32 ... [%ptr+12]

  Or if we have an alignment of 8 (for example), we can generate code like:

    ld.v2.f32 ... [%ptr]
    ld.v2.f32 ... [%ptr+8]

  llvm-svn: 213186
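The alignment rule described in this commit can be sketched in standalone C++. This is only an illustration, not the NVPTX backend's code: `splitVectorLoad` is a hypothetical helper, and it assumes that a vector access of width W is allowed only when W times the element size does not exceed the stated alignment, with PTX-style widths of 1, 2, or 4 elements.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not an LLVM API): returns the element count covered by
// each emitted load for a vector of NumElts elements of EltBytes bytes each,
// given the stated alignment in bytes.
std::vector<unsigned> splitVectorLoad(unsigned NumElts, unsigned EltBytes,
                                      unsigned AlignBytes) {
  // Pick the widest access (4, then 2 elements) whose total size does not
  // exceed the alignment and that divides the vector evenly; otherwise
  // fall back to scalar loads.
  unsigned Width = 1;
  for (unsigned W : {4u, 2u}) {
    if (W <= NumElts && NumElts % W == 0 && W * EltBytes <= AlignBytes) {
      Width = W;
      break;
    }
  }
  // NumElts / Width loads, each covering Width elements.
  return std::vector<unsigned>(NumElts / Width, Width);
}
```

Under these assumptions, a `<4 x float>` load with `align 4` splits into four scalar loads and with `align 8` into two two-element loads, which matches the PTX shown in the commit message.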
* Remove unnecessary/redundant std::moveDavid Blaikie2014-07-161-1/+1
| | | | | | (run returns unique_ptr by value already) llvm-svn: 213174
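A minimal sketch of why such a `std::move` is redundant, using hypothetical names (`Result`, `run`, `consume`) rather than the code touched by this commit: a function that returns `std::unique_ptr` by value already yields a temporary, so the result can initialize another `unique_ptr` directly.

```cpp
#include <cassert>
#include <memory>

struct Result {
  int Value;
};

// Hypothetical stand-in for a `run` function that returns unique_ptr by value.
std::unique_ptr<Result> run() {
  return std::unique_ptr<Result>(new Result{42});
}

int consume() {
  // The call expression is already an rvalue, so wrapping it as
  // std::move(run()) would add nothing; plain initialization is enough.
  std::unique_ptr<Result> P = run();
  return P->Value;
}
```

Calling `consume()` transfers ownership into `P` without any explicit `std::move`.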
* Added documentation for SizeMultiplier in the ARM subtarget hook for register coalescing.Chris Bieneman2014-07-161-2/+11
  Also fixed some 80 col violations. No functional code changes.
  llvm-svn: 213169
* [NVPTX] Rename registers %fl -> %fd and %rl -> %rdJustin Holewinski2014-07-164-8/+8
| | | | | | This matches the internal behavior of NVIDIA tools like libnvvm. llvm-svn: 213168
* CodeGen: don't form illegal EXTLOAD operations.Tim Northover2014-07-162-5/+8
  It turns out that in most cases (the main exception being i1-related types) once these operations are formed we cannot separate them and the targets end up having to deal with them whether they want to or not. This is not a good situation, and a more reasonable default can be formed by acknowledging this and having targets leave them as Legal.

  Only x86 seems to be affected (other targets don't even try marking the operation Expand).

  Mostly there's no visible change here yet, but it will be useful to have truly expanded EXTLOADS for MVT::f16 softening support.

  llvm-svn: 213162
* [mips][fp64a] Temporarily disable odd-numbered double-precision registers when using the FP64A ABI.Daniel Sanders2014-07-161-1/+4
  Summary: A few instructions (mostly cvt.d.w and similar) are causing problems with -mfp64 and -mno-odd-spreg and it looks like fixing it properly may take several weeks. In the meantime, let's disable the odd-numbered double-precision registers so that the generated code is at least valid.

  The problem is that instructions like cvt.d.w read from the 32-bit low subregister of a double-precision FPU register. This often leads the compiler to insert moves that transfer a GPR32 to an FGR32 using mtc1. Such moves violate the rules against 32-bit writes to odd-numbered FPU registers imposed by -mno-odd-spreg. By disabling the odd-numbered double-precision registers, it becomes impossible for the 32-bit low subregister to be odd-numbered.

  This fixes numerous test-suite failures when compiling for the FP64A ABI ('-mfp64 -mno-odd-spreg'). There is no LLVM test case because it's difficult to test that odd-numbered FPU registers are not allocatable. Instead, we depend on the assembler (GAS and -fintegrated-as) raising errors when the rules are violated.

  Differential Revision: http://reviews.llvm.org/D4532
  llvm-svn: 213160
* [X86] Add a check for 'isMOVHLPSMask' within method 'isShuffleMaskLegal'.Andrea Di Biagio2014-07-161-0/+1
| | | | | | | | | | | | | | | | Before this change, method 'isShuffleMaskLegal' didn't know that shuffles implementing a 'movhlps' operation were perfectly legal for SSE targets. This patch adds the missing check for 'isMOVHLPSMask' inside method 'isShuffleMaskLegal' to fix the problem. The reason why it is important to do this is because the DAGCombiner conservatively avoids combining a pair of shuffles if the resulting shuffle node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were wrongly considered not to be legal. This was the root cause of some poor-code generation bugs. llvm-svn: 213137
* Roundtrip the inalloca bit on allocas through bitcodeReid Kleckner2014-07-162-4/+15
| | | | | | | | | | | | | This was an oversight in the original support. As it is, I stuffed this bit into the alignment. The alignment is stored in log2 form, so it doesn't need more than 5 bits, given that Value::MaximumAlignment is 1 << 29. Reviewers: nicholas Differential Revision: http://reviews.llvm.org/D3943 llvm-svn: 213118
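The packing idea can be sketched as follows. The layout here is an assumption made for illustration (bits 0 to 4 hold the log2 alignment, bit 5 holds the inalloca flag) and is not claimed to match LLVM's actual bitcode record; `encodeAlloca`, `decodeLog2Align`, and `decodeInAlloca` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout, for illustration only:
//   bits 0-4: log2 of the alignment (29 is the largest value needed, since
//             the maximum alignment is 1 << 29, and 29 fits in 5 bits)
//   bit 5:    inalloca flag
constexpr uint64_t AlignMask = 0x1F;
constexpr uint64_t InAllocaBit = uint64_t(1) << 5;

uint64_t encodeAlloca(unsigned Log2Align, bool InAlloca) {
  assert(Log2Align <= 29 && "alignment exceeds 1 << 29");
  return (Log2Align & AlignMask) | (InAlloca ? InAllocaBit : 0);
}

unsigned decodeLog2Align(uint64_t Field) {
  return static_cast<unsigned>(Field & AlignMask);
}

bool decodeInAlloca(uint64_t Field) { return (Field & InAllocaBit) != 0; }
```

Because the alignment never needs more than five bits, the flag round-trips through the same field without disturbing the alignment value.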
* Fix comment in InstCombiner::visitAddrSpaceCast.Manuel Jacob2014-07-161-3/+3
  In the original version of the patch the behaviour was as described in the comment. This behaviour was changed before committing it without updating the comment.
  llvm-svn: 213117
* Perform wildcard expansion in Process::GetArgumentVector on Windows (PR17098)Hans Wennborg2014-07-161-19/+71
| | | | | | | | | | | | | | On Windows, wildcard expansion isn't performed by the shell, but left to the program itself. The common way to do this is to link with setargv.obj, which performs the expansion on argc/argv before main is entered. However, we don't use argv in Clang on Windows, but instead call GetCommandLineW so we can handle unicode arguments. This means we have to do wildcard expansion ourselves. A test case will be added on the Clang side. Differential Revision: http://reviews.llvm.org/D4529 llvm-svn: 213114
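A program that does its own wildcard expansion needs a pattern matcher and a pass over the argument list. The sketch below is not the code in Process::GetArgumentVector: `wildcardMatch` and `expandArgs` are hypothetical names, the candidate file names are passed in rather than read from the file system, and keeping an unmatched pattern unchanged is an assumption about the desired behaviour.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// '*' matches any run of characters (possibly empty), '?' matches exactly one.
bool wildcardMatch(const std::string &Pattern, const std::string &Name) {
  const std::size_t NoStar = std::string::npos;
  std::size_t P = 0, N = 0, StarP = NoStar, StarN = 0;
  while (N < Name.size()) {
    if (P < Pattern.size() && Pattern[P] == '*') {
      StarP = P++;  // remember the star and first try matching it to nothing
      StarN = N;
    } else if (P < Pattern.size() &&
               (Pattern[P] == '?' || Pattern[P] == Name[N])) {
      ++P;
      ++N;
    } else if (StarP != NoStar) {
      P = StarP + 1;  // backtrack: let the last star absorb one more character
      N = ++StarN;
    } else {
      return false;
    }
  }
  while (P < Pattern.size() && Pattern[P] == '*')
    ++P;  // trailing stars match the empty string
  return P == Pattern.size();
}

// Replaces each argument containing '*' or '?' with the matching file names.
// Assumption: an argument that matches nothing is passed through unchanged.
std::vector<std::string> expandArgs(const std::vector<std::string> &Args,
                                    const std::vector<std::string> &Files) {
  std::vector<std::string> Out;
  for (const std::string &Arg : Args) {
    const std::size_t Before = Out.size();
    if (Arg.find_first_of("*?") != std::string::npos)
      for (const std::string &F : Files)
        if (wildcardMatch(Arg, F))
          Out.push_back(F);
    if (Out.size() == Before)
      Out.push_back(Arg);
  }
  return Out;
}
```

With candidate files `a.c`, `b.h`, and `c.c`, the argument list `clang *.c -O2` expands to `clang a.c c.c -O2` under these assumptions.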
* Emit warnings if vectorization is forced and fails.Tyler Nowicki2014-07-162-10/+42
| | | | | | | | | | | This patch modifies the existing DiagnosticInfo system to create a generic base class that is inherited to produce diagnostic-based warnings. This is used by the loop vectorizer to trigger a warning when vectorization is forced and fails. Several tests have been added to verify this behavior. Reviewed by: Arnold Schwaighofer llvm-svn: 213110
* Remove TLI from isInTailCallPosition's arguments. NFC.Juergen Ributzka2014-07-163-5/+5
| | | | | | | There is no need to pass on TLI separately to the function. As Eric pointed out the Target Machine already provides everything we need. llvm-svn: 213108
* R600/SI: Allow using f32 rcp / rsq when denormals not handled.Matt Arsenault2014-07-153-10/+31
| | | | | | | These are precise enough to use for OpenCL unless denormals are handled. llvm-svn: 213107
* X86: Simplify X86WindowsTargetObjectFile::getSectionForConstantDavid Majnemer2014-07-151-9/+3
| | | | | | | | | There exists a helper function to abstract away the various differences between ConstantVector, ConstantDataVector, ConstantAggregateZero, etc. Use it to simplify X86WindowsTargetObjectFile::getSectionForConstant. llvm-svn: 213104
* Move Post RA Scheduling flag bit into SchedMachineModelSanjay Patel2014-07-1513-95/+65
  Refactoring; no functional changes intended

  Removed PostRAScheduler bits from subtargets (X86, ARM).
  Added PostRAScheduler bit to MCSchedModel class.
  This bit is set by a CPU's scheduling model (if it exists).
  Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
  Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
  Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
  Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.

  Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
  a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget.
  b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
  c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
  d. X86 is the only target that actually has postRA specified via sched model info.

  Differential Revision: http://reviews.llvm.org/D4217
  llvm-svn: 213101
* [dfsan] Introduce further optimization to reduce the number of union queries.Peter Collingbourne2014-07-151-0/+36
| | | | | | | Specifically, do not compute a union if it is statically known that one shadow set subsumes the other. llvm-svn: 213100
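The subsumption shortcut can be illustrated with ordinary sets. This is a loose runtime analogy, not dfsan's instrumentation (which makes the decision statically on shadow values); `LabelSet`, `subsumes`, `needsUnion`, and `combine` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <set>

using LabelSet = std::set<int>;

// True if every label in Sub is also present in Super.
bool subsumes(const LabelSet &Super, const LabelSet &Sub) {
  return std::includes(Super.begin(), Super.end(), Sub.begin(), Sub.end());
}

// A real union is only needed when neither set subsumes the other.
bool needsUnion(const LabelSet &A, const LabelSet &B) {
  return !subsumes(A, B) && !subsumes(B, A);
}

LabelSet combine(const LabelSet &A, const LabelSet &B) {
  if (subsumes(A, B))
    return A;  // A already contains B: reuse A, no union computed
  if (subsumes(B, A))
    return B;  // B already contains A: reuse B, no union computed
  LabelSet R = A;  // otherwise fall back to computing the union
  R.insert(B.begin(), B.end());
  return R;
}
```

When one operand already covers the other, `combine` returns it directly, which is the kind of union query the commit aims to avoid.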
* R600/SI: Fix select on i1Matt Arsenault2014-07-151-0/+3
| | | | llvm-svn: 213096
* R600/SI: Implement less wrong f32 fdivMatt Arsenault2014-07-153-7/+83
| | | | | | | Assuming single precision denormals and accurate sqrt/div are not reported, this passes the OpenCL conformance test. llvm-svn: 213089
* R600: Add predicate for UnsafeFPMathMatt Arsenault2014-07-151-0/+1
| | | | llvm-svn: 213088
* R600: Remove intrinsics that appear to be unusedMatt Arsenault2014-07-151-3/+0
| | | | llvm-svn: 213087
* [RuntimeDyld] Revert r211652 - MachO object GDB registration support.Lang Hames2014-07-153-149/+22
| | | | | | | | The registration scheme used in r211652 violated the read-only contract of MemoryBuffer. This caused crashes in llvm-rtdyld where macho objects were backed by read-only mmap'd memory. llvm-svn: 213086
* [RegisterCoalescer] Add new subtarget hook allowing targets to opt-out of coalescing.Chris Bieneman2014-07-154-0/+90
  The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing.

  This patch also implements such for ARM to lower register pressure when using lots of large register classes. This works around PR18825.

  llvm-svn: 213078
* Revert r213070. It's breaking the build in MCELFStreamer::EmitInstToData(...).Cameron McInally2014-07-151-6/+0
| | | | llvm-svn: 213073
* R600: Implement zero undef variants of ctlz/cttzJan Vesely2014-07-153-0/+17
  v2: use ffbh/l if available
  v3: Rebase on top of Matt's SI patches

  Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
  Reviewed-by: Tom Stellard <tom@stellard.net>
  llvm-svn: 213072
* [mips] Correct .MIPS.abiflags fp_abi field for -mfpxx and without .moduleDaniel Sanders2014-07-151-1/+1
| | | | | | | | Summary: Previously all the test cases set it after initialization with '.module fp=xx'. Differential Revision: http://reviews.llvm.org/D4489 llvm-svn: 213071
* Add x86 patterns to match a specific add-with-carry. Cameron McInally2014-07-151-0/+6
| | | | llvm-svn: 213070
* [DAGCombiner] Add more rules to fold shuffles.Andrea Di Biagio2014-07-151-7/+17
  This patch adds two new rules to the DAGCombiner:
  1. shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
  2. shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2

  We only do this if the combined shuffle is legal for the target.

  Example:
  ;;
  define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
    %1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32> <i32 6, i32 0, i32 1, i32 7>
    %2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32> <i32 1, i32 2, i32 4, i32 5>
    ret <4 x float> %2
  }
  ;;

  (using llc -mcpu=corei7 -march=x86-64)

  Before, the x86 backend generated:
    pshufd $120, %xmm0, %xmm0
    shufps $-108, %xmm0, %xmm1
    movaps %xmm1, %xmm0

  Now the x86 backend generates:
    movsd %xmm1, %xmm0

  llvm-svn: 213069
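The first rule amounts to composing the two masks. The following standalone sketch shows that composition under the usual shufflevector indexing (indices below N select from the first operand, indices from N upward select from the second, and -1 marks an undef lane). `foldShuffleMasks` is a hypothetical name, and the target legality check that the commit requires is deliberately left out.

```cpp
#include <cassert>
#include <vector>

// Composes the masks of shuffle(shuffle(A, undef, M0), B, M1) into a single
// mask M2 for shuffle(A, B, M2). Assumes both masks have the same length N.
std::vector<int> foldShuffleMasks(const std::vector<int> &M0,
                                  const std::vector<int> &M1) {
  const int N = static_cast<int>(M0.size());
  std::vector<int> M2;
  for (int Idx : M1) {
    if (Idx < 0) {
      M2.push_back(-1);   // lane was already undef
    } else if (Idx >= N) {
      M2.push_back(Idx);  // lane comes from B; its index is unchanged
    } else {
      // Lane comes from the inner shuffle: look through M0. Indices >= N
      // there refer to the undef operand, so the lane becomes undef.
      const int Inner = M0[Idx];
      M2.push_back(Inner >= 0 && Inner < N ? Inner : -1);
    }
  }
  return M2;
}
```

For the masks in the example, M0 = <6, 0, 1, 7> and M1 = <1, 2, 4, 5>, this composition yields M2 = <0, 1, 4, 5>.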
* Prune Redundant libdeps in CMake's target_link_libraries and LLVMBuild.txt.NAKAMURA Takumi2014-07-154-4/+4
  I checked this with Release+Asserts on x86_64-mingw32. Please restore partially if this was overkill.
  llvm-svn: 213064
* Silence a warning in conditional expression.Andrea Di Biagio2014-07-151-1/+1
| | | | | | | | Fixes a gcc warning caused by a typo. A redundant assignment operation was accidentally used as the third operand of a conditional expression. No functional change intended. llvm-svn: 213061
* MergeFunc patch from Björn Steinbrink.Stepan Dyatkovskiy2014-07-151-2/+12
| | | | | | | Phabricator ticket: D4246, Don't merge functions with different range metadata on call/invoke. Thanks! llvm-svn: 213060
* AArch64: fall back to generic code for out of range extract/insert.Tim Northover2014-07-151-6/+8
| | | | | | rdar://problem/17624784 llvm-svn: 213059
* Fix typo in commentDavid Majnemer2014-07-151-1/+1
| | | | | | No functionality changed. llvm-svn: 213052
* [FastISel][X86] Remove no longer needed functions.Juergen Ributzka2014-07-151-462/+0
| | | | llvm-svn: 213051
* [FastISel][X86] Implement the FastLowerIntrinsicCall hook.Juergen Ributzka2014-07-151-41/+41
| | | | | | | Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively implements the target hook. llvm-svn: 213050
* [FastISel][X86] Implement the FastLowerCall hook.Juergen Ributzka2014-07-151-9/+400
| | | | | | | | | | | | This implements the FastLowerCall hook, which is based on the DoSelectCall function. The implementation is very similar, but the target-independent call lowering part has been factored out. This should also enable patchpoint intrinsic lowering for FastISel on X86. Related to <rdar://problem/17427052>. llvm-svn: 213049
* Revert "[FastISel][X86] Remove no longer needed functions."Juergen Ributzka2014-07-151-244/+315
  Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook."
  Revert "[FastISel][X86] Implement the FastLowerCall hook."

  This reverts commit r213035, r213036, and r213037 to make the buildbots happy again.
  llvm-svn: 213048