Refactor code; no functionality change. The test case moved from instcombine to instsimplify.
Differential Revision: http://reviews.llvm.org/D4102
llvm-svn: 213231
llvm-svn: 213230
Skip calling GetUnderlyingObject in cases where it obviously
isn't from an alloca. This should only be a compile-time improvement.
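A minimal sketch of this kind of early-out, assuming the check lives in a small helper with access to the pointer operand (the helper name and surrounding code are illustrative, not the actual change):

#include "llvm/Analysis/ValueTracking.h"
#include "llvm/IR/Argument.h"
#include "llvm/IR/Constant.h"
#include "llvm/IR/Instructions.h"
using namespace llvm;

// Illustrative helper (hypothetical name): return true only when Ptr could
// plausibly trace back to an alloca, avoiding the use-def walk otherwise.
static bool mayComeFromAlloca(Value *Ptr, const DataLayout *DL) {
  // Cheap early-outs: arguments, globals, and other constants can never
  // resolve to an alloca, so skip GetUnderlyingObject entirely for them.
  if (isa<Argument>(Ptr) || isa<Constant>(Ptr))
    return false;
  return isa<AllocaInst>(GetUnderlyingObject(Ptr, DL));
}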
llvm-svn: 213229
llvm-svn: 213228
This makes the opcode an opaque value (unsigned int) rather than the
enumeration. This permits the use of target-specific operands.
Split out the generic type into an MCWinEH header and add a supporting
MCWin64EH::Instruction to abstract out the selection of the opcode and
construction of the actual instruction.
llvm-svn: 213221
This reverts "r213024 - Revert r212572 'improve BasicAA CS-CS queries', it
causes PR20303", with a fix for the bug in PR20303. As it turned out, the
relevant code was both wrong and over-conservative (because, as with the code
it replaced, it would return the overall ModRef mask even if just Ref had been
implied by the argument aliasing results). Hopefully, this correctly fixes both
problems.
Thanks to Nick Lewycky for reducing the test case for pr20303 (which I've
cleaned up a little and added in DSE's test directory). The BasicAA test has
also been updated to check for this error.
Original commit message:
BasicAA contains knowledge of certain intrinsics, such as memcpy and memset,
and uses that information to form more-accurate answers to CallSite vs. Loc
ModRef queries. Unfortunately, it did not use this information when answering
CallSite vs. CallSite queries.
Generically, when an intrinsic takes one or more pointers and the intrinsic is
marked only to read/write from its arguments, the offset/size is unknown. As a
result, the generic code that answers CallSite vs. CallSite (and CallSite vs.
Loc) queries in AA uses UnknownSize when forming Locs from an intrinsic's
arguments. While BasicAA's CallSite vs. Loc override could use more-accurate
size information for some intrinsics, it did not do the same for CallSite vs.
CallSite queries.
This change refactors the intrinsic-specific logic in BasicAA into a generic AA
query function: getArgLocation, which is overridden by BasicAA to supply the
intrinsic-specific knowledge, and used by AA's generic implementation. This
allows the intrinsic-specific knowledge to be used by both CallSite vs. Loc and
CallSite vs. CallSite queries, and simplifies the BasicAA implementation.
Currently, only one function, Mac's memset_pattern16, is handled by BasicAA
(all the rest are intrinsics). As a side-effect of this refactoring, BasicAA's
getModRefBehavior override now also returns OnlyAccessesArgumentPointees for
this function (which is an improvement).
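As a rough illustration of the kind of logic getArgLocation centralizes (the helper below is hypothetical and simplified, not the actual AA interface), a precise Location can be formed for a memset-like call when its length operand is a constant:

#include "llvm/Analysis/AliasAnalysis.h"
#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constants.h"
using namespace llvm;

// Hypothetical helper: form the Location for the destination pointer of a
// memset-like call (for llvm.memset, operand 2 is the length), using the
// constant length instead of UnknownSize when it is available.
static AliasAnalysis::Location getMemSetDestLocation(ImmutableCallSite CS) {
  const Value *Ptr = CS.getArgument(0);
  uint64_t Size = AliasAnalysis::UnknownSize;
  if (const ConstantInt *Len = dyn_cast<ConstantInt>(CS.getArgument(2)))
    Size = Len->getZExtValue();   // precise size when the length is constant
  return AliasAnalysis::Location(Ptr, Size);
}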
llvm-svn: 213219
Summary:
Converting outermost zext(a) to sext(a) causes worse code when the
computation of zext(a) could be reused. For example, after converting
... = array[zext(a)]
... = array[zext(a) + 1]
to
... = array[sext(a)]
... = array[zext(a) + 1],
the program computes sext(a), which is actually unnecessary. I added one
test in split-gep-and-gvn.ll to illustrate this scenario.
Also, with r211281 and r211084, we attach more "nuw" tags to
computations involving CUDA intrinsics such as threadIdx.x. These
annotations help with splitting GEP a lot, rendering the benefit we get
from this reverted optimization only marginal.
Test Plan: make check-all
Reviewers: eliben, meheff
Reviewed By: meheff
Subscribers: jholewinski, llvm-commits
Differential Revision: http://reviews.llvm.org/D4542
llvm-svn: 213209
Thanks to Duncan Exon Smith for reviewing and cleanup suggestions.
llvm-svn: 213205
side effects.
This fixes an issue where a local value is defined before and used after an
inline asm call with side effects.
This fix simply flushes the local value map, which updates the insertion point
for the inline asm call to be above any previously defined local values.
This fixes <rdar://problem/17694203>
llvm-svn: 213203
When a RuntimeDyldChecker test requests an invalid operand for an instruction,
print the decoded instruction to aid diagnosis.
llvm-svn: 213202
Make sure that the AddrInst is an Instruction.
llvm-svn: 213197
Any CPU can run this pass.
llvm-svn: 213190
llvm-svn: 213189
TargetRegisterInfo instead of the TargetSubtargetInfo.
llvm-svn: 213188
We were not considering the stated alignment on vector loads/stores,
leading us to generate vector instructions even when we do not have
sufficient alignment.
Now, for IR like:
%1 = load <4 x float>, <4 x float>* %ptr, align 4
we will generate correct, conservative PTX like:
ld.f32 ... [%ptr]
ld.f32 ... [%ptr+4]
ld.f32 ... [%ptr+8]
ld.f32 ... [%ptr+12]
Or if we have an alignment of 8 (for example), we can
generate code like:
ld.v2.f32 ... [%ptr]
ld.v2.f32 ... [%ptr+8]
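The underlying rule is simply that the stated alignment must cover the whole vector access; a minimal illustration of that check (not the actual NVPTX backend code):

// Illustrative only: a vector load/store of NumElts elements may be used
// only when the stated alignment covers the whole vector.
static bool canUseVectorAccess(unsigned AlignInBytes, unsigned NumElts,
                               unsigned EltSizeInBytes) {
  return AlignInBytes >= NumElts * EltSizeInBytes;
}
// canUseVectorAccess(4, 4, 4) -> false: fall back to four ld.f32
// canUseVectorAccess(8, 2, 4) -> true:  two ld.v2.f32 cover the <4 x float>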
llvm-svn: 213186
(run returns unique_ptr by value already)
llvm-svn: 213174
register coalescing. Also fixed some 80-column violations.
No functional code changes.
llvm-svn: 213169
This matches the internal behavior of NVIDIA tools like libnvvm.
llvm-svn: 213168
It turns out that in most cases (the main exception being i1-related
types) once these operations are formed we cannot separate them and
the targets end up having to deal with them whether they want to or
not.
This is not a good situation, and a more reasonable default can be
formed by acknowledging this and having targets leave them as Legal.
Only x86 seems to be affected (other targets don't even try marking
the operation Expand).
Mostly there's no visible change here yet, but it will be useful to
have truly expanded EXTLOADS for MVT::f16 softening support.
llvm-svn: 213162
when using the FP64A ABI.
Summary:
A few instructions (mostly cvt.d.w and similar) are causing problems with
-mfp64 and -mno-odd-spreg and it looks like fixing it properly may
take several weeks. In the meantime, let's disable the odd-numbered
double-precision registers so that the generated code is at least valid.
The problem is that instructions like cvt.d.w read from the 32-bit low
subregister of a double-precision FPU register. This often leads the compiler
to insert moves that transfer a GPR32 to an FGR32 using mtc1. Such moves
violate the rules against 32-bit writes to odd-numbered FPU registers imposed
by -mno-odd-spreg. By disabling the odd-numbered double-precision registers, it
becomes impossible for the 32-bit low subregister to be odd-numbered.
This fixes numerous test-suite failures when compiling for the FP64A ABI
('-mfp64 -mno-odd-spreg'). There is no LLVM test case because it's difficult to
test that odd-numbered FPU registers are not allocatable. Instead, we depend on
the assembler (GAS and -fintegrated-as) raising errors when the rules are
violated.
Differential Revision: http://reviews.llvm.org/D4532
llvm-svn: 213160
Before this change, method 'isShuffleMaskLegal' didn't know that shuffles
implementing a 'movhlps' operation were perfectly legal for SSE targets.
This patch adds the missing check for 'isMOVHLPSMask' inside method
'isShuffleMaskLegal' to fix the problem.
The reason why it is important to do this is because the DAGCombiner
conservatively avoids combining a pair of shuffles if the resulting shuffle
node has an illegal mask. Before this patch, shuffles with a MOVHLPS mask were
wrongly considered not to be legal. This was the root cause of some poor
code-generation bugs.
llvm-svn: 213137
This was an oversight in the original support. As it is, I stuffed this
bit into the alignment. The alignment is stored in log2 form, so it
doesn't need more than 5 bits, given that Value::MaximumAlignment is 1
<< 29.
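A quick check of the arithmetic behind that claim:

// log2(1 << 29) == 29, and a 5-bit field can hold values up to 31, so the
// log2-encoded alignment fits with room to spare.
static_assert(29 < (1u << 5), "log2 of Value::MaximumAlignment fits in 5 bits");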
Reviewers: nicholas
Differential Revision: http://reviews.llvm.org/D3943
llvm-svn: 213118
In the original version of the patch, the behaviour was as described in
the comment. This behaviour was changed before committing, without
updating the comment.
llvm-svn: 213117
On Windows, wildcard expansion isn't performed by the shell, but left to the
program itself. The common way to do this is to link with setargv.obj, which
performs the expansion on argc/argv before main is entered. However, we don't
use argv in Clang on Windows, but instead call GetCommandLineW so we can handle
Unicode arguments. This means we have to do wildcard expansion ourselves.
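A rough sketch of the general approach, using the Win32 FindFirstFileW/FindNextFileW API (illustrative only; the actual LLVM change differs in where it hooks in and in details such as preserving the directory prefix):

#include <windows.h>
#include <string>
#include <vector>

// Expand one command-line argument that may contain wildcards. Arguments
// that match nothing are kept as-is. Note: cFileName carries no directory
// component, which a real implementation would have to re-attach.
static void expandWildcard(const std::wstring &Arg,
                           std::vector<std::wstring> &Out) {
  WIN32_FIND_DATAW FindData;
  HANDLE H = FindFirstFileW(Arg.c_str(), &FindData);
  if (H == INVALID_HANDLE_VALUE) {
    Out.push_back(Arg);
    return;
  }
  do
    Out.push_back(FindData.cFileName);
  while (FindNextFileW(H, &FindData));
  FindClose(H);
}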
A test case will be added on the Clang side.
Differential Revision: http://reviews.llvm.org/D4529
llvm-svn: 213114
This patch modifies the existing DiagnosticInfo system to create a generic base
class that is inherited to produce diagnostic-based warnings. This is used by
the loop vectorizer to trigger a warning when vectorization is forced and
fails. Several tests have been added to verify this behavior.
Reviewed by: Arnold Schwaighofer
llvm-svn: 213110
There is no need to pass TLI to the function separately. As Eric pointed out,
the TargetMachine already provides everything we need.
llvm-svn: 213108
These are precise enough to use for OpenCL unless denormals
are handled.
llvm-svn: 213107
There exists a helper function to abstract away the various differences
between ConstantVector, ConstantDataVector, ConstantAggregateZero, etc.
Use it to simplify X86WindowsTargetObjectFile::getSectionForConstant.
llvm-svn: 213104
Refactoring; no functional changes intended
Removed PostRAScheduler bits from subtargets (X86, ARM).
Added PostRAScheduler bit to MCSchedModel class.
This bit is set by a CPU's scheduling model (if it exists).
Removed enablePostRAScheduler() function from TargetSubtargetInfo and subclasses.
Fixed the existing enablePostMachineScheduler() method to use the MCSchedModel (was just returning false!).
Added methods to TargetSubtargetInfo to allow overrides for AntiDepBreakMode, CriticalPathRCs, and OptLevel for PostRAScheduling.
Added enablePostRAScheduler() function to PostRAScheduler class which queries the subtarget for the above values.
Preserved existing scheduler behavior for ARM, MIPS, PPC, and X86:
a. ARM overrides the CPU's postRA settings by enabling postRA for any non-Thumb or Thumb2 subtarget.
b. MIPS overrides the CPU's postRA settings by enabling postRA for everything.
c. PPC overrides the CPU's postRA settings by enabling postRA for everything.
d. X86 is the only target that actually has postRA specified via sched model info.
Differential Revision: http://reviews.llvm.org/D4217
llvm-svn: 213101
Specifically, do not compute a union if it is statically known that one
shadow set subsumes the other.
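A heavily simplified sketch of the idea (the helper names below are hypothetical stand-ins, not DFSan's actual API):

#include "llvm/IR/Value.h"
using namespace llvm;

// Hypothetical helpers standing in for DFSan's internals.
bool isZeroShadow(Value *S);
Value *emitUnionCall(Value *S1, Value *S2);

static Value *combineShadows(Value *S1, Value *S2) {
  // If one shadow statically subsumes the other (identical shadows, or one
  // is the zero shadow), reuse it and skip the union computation.
  if (S1 == S2 || isZeroShadow(S2)) return S1;
  if (isZeroShadow(S1))             return S2;
  return emitUnionCall(S1, S2);   // general case: emit the union
}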
llvm-svn: 213100
llvm-svn: 213096
Assuming single precision denormals and accurate sqrt/div are not
reported, this passes the OpenCL conformance test.
llvm-svn: 213089
llvm-svn: 213088
llvm-svn: 213087
The registration scheme used in r211652 violated the read-only contract of
MemoryBuffer. This caused crashes in llvm-rtdyld, where Mach-O objects were backed
by read-only mmap'd memory.
llvm-svn: 213086
coalescing.
The coalescer is very aggressive at propagating constraints on the register classes, and the register allocator doesn’t know how to split sub-registers later to recover. This patch provides an escape valve for targets that encounter this problem to limit coalescing.
This patch also implements such a limit for ARM to lower register pressure when using lots of large register classes. This works around PR18825.
llvm-svn: 213078
llvm-svn: 213073
v2: use ffbh/l if available
v3: Rebase on top of Matt's SI patches
Signed-off-by: Jan Vesely <jan.vesely@rutgers.edu>
Reviewed-by: Tom Stellard <tom@stellard.net>
llvm-svn: 213072
Summary: Previously all the test cases set it after initialization with '.module fp=xx'.
Differential Revision: http://reviews.llvm.org/D4489
llvm-svn: 213071
llvm-svn: 213070
This patch adds two new rules to the DAGCombiner:
1. shuffle (shuffle A, Undef, M0), B, M1 -> shuffle A, B, M2
2. shuffle (shuffle A, Undef, M0), A, M1 -> shuffle A, Undef, M2
We only do this if the combined shuffle is legal for the target.
Example:
;;
define <4 x float> @test(<4 x float> %a, <4 x float> %b) {
%1 = shufflevector <4 x float> %a, <4 x float> undef, <4 x i32><i32 6, i32 0, i32 1, i32 7>
%2 = shufflevector <4 x float> %1, <4 x float> %b, <4 x i32><i32 1, i32 2, i32 4, i32 5>
ret <4 x float> %2
}
;;
(using llc -mcpu=corei7 -march=x86-64)
Before, the x86 backend generated:
pshufd $120, %xmm0, %xmm0
shufps $-108, %xmm0, %xmm1
movaps %xmm1, %xmm0
Now the x86 backend generates:
movsd %xmm1, %xmm0
llvm-svn: 213069
I checked this with Release+Asserts on x86_64-mingw32. Please partially restore it if this turns out to be overkill.
llvm-svn: 213064
Fixes a gcc warning caused by a typo. A redundant assignment operation was
accidentally used as the third operand of a conditional expression.
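For illustration only (not the actual code that was fixed), the pattern looks roughly like this:

// Typo: the third operand of ?: redundantly assigns the very variable being
// initialized, which GCC warns about.
unsigned pick(bool Cond, unsigned A, unsigned B) {
  unsigned Val = Cond ? A : (Val = B);   // accidental inner assignment
  return Val;                            // intended: Cond ? A : B
}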
No functional change intended.
llvm-svn: 213061
Phabricator ticket: D4246, Don't merge functions with different range metadata on call/invoke.
Thanks!
llvm-svn: 213060
rdar://problem/17624784
llvm-svn: 213059
No functionality changed.
llvm-svn: 213052
llvm-svn: 213051
Rename X86VisitIntrinsicCall -> FastLowerIntrinsicCall, which effectively
implements the target hook.
llvm-svn: 213050
This implements the FastLowerCall hook, which is based on the DoSelectCall
function. The implementation is very similar, but the target-independent call
lowering part has been factored out.
This should also enable patchpoint intrinsic lowering for FastISel on X86.
Related to <rdar://problem/17427052>.
llvm-svn: 213049
Revert "[FastISel][X86] Implement the FastLowerIntrinsicCall hook."
Revert "[FastISel][X86] Implement the FastLowerCall hook."
This reverts commits r213035, r213036, and r213037 to make the
buildbots happy again.
llvm-svn: 213048