| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Add SDep printer to make debugging sessions more productive.
Differential revision: https://reviews.llvm.org/D35144
llvm-svn: 307799
|
| |
|
|
|
|
|
|
|
|
| |
FastIsel can't handle them, so we would end up crashing during
register class selection.
Fixes PR26522.
Differential Revision: https://reviews.llvm.org/D35272
llvm-svn: 307797
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Continuing the work from https://reviews.llvm.org/D33240, this change introduces an element unordered-atomic memmove intrinsic. This intrinsic is essentially memmove with the implementation requirement that all loads/stores used for the copy are done with unordered-atomic loads/stores of a given element size.
Reviewers: eli.friedman, reames, mkazantsev, skatkov
Reviewed By: reames
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34884
llvm-svn: 307796
|
| |
|
|
| |
llvm-svn: 307790
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
NetBSD shell sh(1) does not support ">& /dev/null" construct.
This is bashism. The portable and POSIX solution is to use:
"> /dev/null 2>&1".
This change fixes 22 Unexpected Failures on NetBSD/amd64
for the "check-llvm" target.
Sponsored by <The NetBSD Foundation>
Reviewers: joerg, dim, rnk
Reviewed By: joerg, rnk
Subscribers: rnk, davide, llvm-commits
Differential Revision: https://reviews.llvm.org/D35277
llvm-svn: 307789
|
| |
|
|
|
|
|
|
|
| |
When we have a diamond ifcvt the fallthough block will have a branch at the end
of it that disappears when predicated, so discount it from the predication cost.
Differential Revision: https://reviews.llvm.org/D34952
llvm-svn: 307788
|
| |
|
|
|
|
| |
Improves test coverage for pre-AVX512 targets as well
llvm-svn: 307783
|
| |
|
|
|
|
|
| |
Very similar to how we select s32 G_FCMP, the only thing that is
different is the exact opcodes that we use.
llvm-svn: 307763
|
| |
|
|
|
|
| |
Adding base test for AVX512
llvm-svn: 307761
|
| |
|
|
|
|
| |
This should fix the problems on the greendragon build.
llvm-svn: 307747
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
OpenCL 2.0 introduces the notion of memory scopes in atomic operations to
global and local memory. These scopes restrict how synchronization is
achieved, which can result in improved performance.
This change extends existing notion of synchronization scopes in LLVM to
support arbitrary scopes expressed as target-specific strings, in addition to
the already defined scopes (single thread, system).
The LLVM IR and MIR syntax for expressing synchronization scopes has changed
to use *syncscope("<scope>")*, where <scope> can be "singlethread" (this
replaces *singlethread* keyword), or a target-specific name. As before, if
the scope is not specified, it defaults to CrossThread/System scope.
Implementation details:
- Mapping from synchronization scope name/string to synchronization scope id
is stored in LLVM context;
- CrossThread/System and SingleThread scopes are pre-defined to efficiently
check for known scopes without comparing strings;
- Synchronization scope names are stored in SYNC_SCOPE_NAMES_BLOCK in
the bitcode.
Differential Revision: https://reviews.llvm.org/D21723
llvm-svn: 307722
|
| |
|
|
| |
llvm-svn: 307718
|
| |
|
|
| |
llvm-svn: 307698
|
| |
|
|
| |
llvm-svn: 307691
|
| |
|
|
|
|
|
| |
Base test for avx512
adding new base test to trunk befor commit change on the test
llvm-svn: 307677
|
| |
|
|
| |
llvm-svn: 307675
|
| |
|
|
|
|
|
|
|
|
|
| |
1. The available program storage region of the red zone to compilers is 288
bytes rather than 244 bytes.
2. The formula for negative number alignment calculation should be
y = x & ~(n-1) rather than y = (x + (n-1)) & ~(n-1).
Differential Revision: https://reviews.llvm.org/D34337
llvm-svn: 307672
|
| |
|
|
|
|
|
|
| |
Patch by Michael Wu.
Differential Revision: https://reviews.llvm.org/D35104
llvm-svn: 307671
|
| |
|
|
|
|
|
|
| |
Use CHECK-NEXT for the comparison sequence, to make sure we don't get
any unexpected instructions in the middle of our flag manipulation
efforts.
llvm-svn: 307656
|
| |
|
|
| |
llvm-svn: 307654
|
| |
|
|
|
|
| |
Map the result into GPR and the operands into FPR.
llvm-svn: 307653
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Make sure that all the legalizer tests where the original instruction
needs to be removed check for the removal. We do this by adding
CHECK-NOT lines before and after the replacement sequence. This won't
catch pathological cases where the instruction remains somewhere in the
middle of the instruction sequence that's supposed to replace it, but
hopefully that won't occur in practice (since ideally we'd be setting
the insert point for the new instruction sequence either before or after
the original instruction and not fiddle with it while building the
sequence).
llvm-svn: 307647
|
| |
|
|
|
|
|
| |
We used to forget to erase the original instruction when replacing a
G_FCMP true/false. Fix this bug and make sure the tests check for it.
llvm-svn: 307639
|
| |
|
|
|
|
|
|
|
|
|
|
| |
TreePatternNode considers them to be plain integers but MachineInstr considers
them to be a distinct kind of operand.
The tweak to AArch64InstrInfo.td to produce a simple test case is a NFC for
everything except GlobalISelEmitter (confirmed by diffing the tablegenerated
files). GlobalISelEmitter is currently unable to infer the type of operands in
the Dst pattern from the operands in the Src pattern.
llvm-svn: 307634
|
| |
|
|
|
|
| |
Same as the s32 version, for both hard and soft float.
llvm-svn: 307633
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a second attempt to land this patch.
The first one resulted in a crash of clang sanitizer buildbot.
The fix is here and regression test is added.
This is a last fix for the corner case of PR32214. Actually this is not really corner case in general.
We should not do a loop rotation if we create an additional branch due to it.
Consider the case where we have a loop chain H, M, B, C , where
H is header with viable fallthrough from pre-header and exit from the loop
M - some middle block
B - backedge to Header but with exit from the loop also.
C - some cold block of the loop.
Let's H is determined as a best exit. If we do a loop rotation M, B, C, H we can introduce the extra branch.
Let's compute the change in number of branches:
+1 branch from pre-header to header
-1 branch from header to exit
+1 branch from header to middle block if there is such
-1 branch from cold bock to header if there is one
So if C is not a predecessor of H then we introduce extra branch.
This change actually prohibits rotation of the loop if both true
Best Exit has next element in chain as successor.
Last element in chain is not a predecessor of first element of chain.
Reviewers: iteratee, xur, sammccall, chandlerc
Reviewed By: iteratee
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34745
llvm-svn: 307631
|
| |
|
|
|
|
| |
AND8ri8 not supported in 64bit.
llvm-svn: 307630
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
CodeGenPrepare::optimizeMemoryInst contains a check that we do nothing
if all instructions combining the address for memory instruction is in the same
block as memory instruction itself.
However if any of these instruction are placed after memory instruction then
address calculation will not be folded to memory instruction.
The added test case shows an example.
Reviewers: loladiro, spatel, efriedma
Reviewed By: efriedma
Subscribers: llvm-commits
Differential Revision: https://reviews.llvm.org/D34862
llvm-svn: 307628
|
| |
|
|
| |
llvm-svn: 307617
|
| |
|
|
|
|
|
|
|
| |
Reverting as it breaks tramp3d-v4 in the llvm test-suite. I added some
comments to https://reviews.llvm.org/D33345 about it.
This reverts commit r307546.
llvm-svn: 307589
|
| |
|
|
| |
llvm-svn: 307576
|
| |
|
|
|
|
|
|
| |
Immediates can be folded as long as the immediate is a vreg.
Also undo commuting instructions if it didn't fold an immediate.
llvm-svn: 307575
|
| |
|
|
|
|
| |
This fixes https://llvm.org/PR33718.
llvm-svn: 307566
|
| |
|
|
| |
llvm-svn: 307564
|
| |
|
|
|
|
|
| |
Differential Revision: https://reviews.llvm.org/D34908
Fix PR: https://bugs.llvm.org/show_bug.cgi?id=33093
llvm-svn: 307563
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
use as an index register for X-Form loads/stores.
For this example:
float test (int *arr) {
return arr[2];
}
We currently generate the following code:
li r4, 8
lxsiwax f0, r3, r4
xscvsxdsp f1, f0
With this patch, we will now generate:
addi r3, r3, 8
lxsiwax f0, 0, r3
xscvsxdsp f1, f0
Originally reported in: https://bugs.llvm.org/show_bug.cgi?id=27204
Differential Revision: https://reviews.llvm.org/D35027
llvm-svn: 307553
|
| |
|
|
|
|
|
|
|
|
|
| |
(PR28573).
The new version of the model is definitely faster.
Differential Revision:
https://reviews.llvm.org/D35198
llvm-svn: 307552
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Memory accesses offset from frame indices may alias, e.g., we
may merge write from function arguments passed on the stack when they
are contiguous. As a result, when checking aliasing, we consider the
underlying frame index's offset from the stack pointer.
Static allocs are realized as stack objects in SelectionDAG, but its
offset is not set until post-DAG causing DAGCombiner's alias check to
consider access to static allocas to frequently alias. Modify isAlias
to consider access between static allocas and access from other frame
objects to be considered aliasing.
Many test changes are included here. Most are fixes for tests which
indirectly relied on our aliasing ability and needed to be modified to
preserve their original intent.
The remaining tests have minor improvements due to relaxed
ordering. The exception is CodeGen/X86/2011-10-19-widen_vselect.ll
which has a minor degradation dispite though the pre-legalized DAG is
improved.
Reviewers: rnk, mkuper, jonpa, hfinkel, uweigand
Reviewed By: rnk
Subscribers: sdardis, nemanjai, javed.absar, llvm-commits
Differential Revision: https://reviews.llvm.org/D33345
llvm-svn: 307546
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
SandyBridge architecture target by modifying the file X86SchedSandyBridge.td located under the X86 Target.
The SandyBridge architects have provided us with a more accurate information about each instruction latency, number of uOPs and used ports and I used it to replace the existing estimated SNB instructions scheduling and to add missing scheduling information.
Please note that the patch extensively affects the X86 MC instr scheduling for SNB.
Also note that this patch will be followed by additional patches for the remaining target architectures HSW, IVB, BDW, SKL and SKX.
The updated and extended information about each instruction includes the following details:
•static latency of the instruction
•number of uOps from which the instruction consists of
•all ports used by the instruction's' uOPs
For example, the following code dictates that instructions, ADC64mr, ADC8mr, SBB64mr, SBB8mr have a static latency of 9 cycles. Each of these instructions is decoded into 6 micro operations which use ports 4, ports 2 or 3 and port 0 and ports 0 or 1 or 5:
def SBWriteResGroup94 : SchedWriteRes<[SBPort4,SBPort23,SBPort0,SBPort015]> {
let Latency = 9;
let NumMicroOps = 6;
let ResourceCycles = [1,2,2,1];
}
def: InstRW<[SBWriteResGroup94], (instregex "ADC64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "ADC8mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB64mr")>;
def: InstRW<[SBWriteResGroup94], (instregex "SBB8mr")>;
Note that apart for the header, most of the X86SchedSandyBridge.td file was generated by a script.
Reviewers: zvi, chandlerc, RKSimon, m_zuckerman, craig.topper, igorb
Differential Revision: https://reviews.llvm.org/D35019#inline-304691
llvm-svn: 307529
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: Support G_LOAD/G_STORE i1.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, kristof.beyls, llvm-commits
Differential Revision: https://reviews.llvm.org/D35178
llvm-svn: 307527
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Mark G_ZEXT/G_SEXT i1 to i8/i16, i8 to i16 as legal.
Support G_ZEXT i1 to i8/i16 instruction selection ( C++ code).
This patch requred to support G_LOAD/G_STORE i1.
Reviewers: zvi, guyblank
Reviewed By: guyblank
Subscribers: rovka, llvm-commits, kristof.beyls
Differential Revision: https://reviews.llvm.org/D35177
llvm-svn: 307526
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
WidenVSELECTAndMask can fold (and it folds in this case) so we
get a BUILD_VECTOR of constants as mask. convertMask() seems to
work fine when the input is a vector of constants, and we still
need to call it to extend/add elements at the end. but the current
code just asserts on anything but a SETCC or AND/OR/XOR of 2xSETCC.
This change was discussed briefly with Simon Pilgrim, who also
suggests we might consider dropping this assertion in the future.
Fixes PR33715.
llvm-svn: 307508
|
| |
|
|
|
|
| |
Broken due to r307259.
llvm-svn: 307503
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This change fixes a bug in SelectionDAGBuilder::visitInsertValue and SelectionDAGBuilder::visitExtractValue where constant expressions (InsertValueConstantExpr and ExtractValueConstantExpr) would be treated as non-constant instructions (InsertValueInst and ExtractValueInst). This bug resulted in an incorrect memory access, which manifested as an assertion failure in SDValue::SDValue.
Fixes PR#33094.
Submitted on behalf of @Praetonus (Benoit Vey)
Differential Revision: https://reviews.llvm.org/D34538
llvm-svn: 307502
|
| |
|
|
|
|
| |
Show poor codegen on KNL targets as mentioned on D35179
llvm-svn: 307500
|
| |
|
|
| |
llvm-svn: 307494
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary: FastISel was marked as failed in case instruction selection succeeded.
Reviewers: qcolombet, zvi, rovka, ab
Reviewed By: zvi
Subscribers: javed.absar, ab, qcolombet, bogner, llvm-commits
Differential Revision: https://reviews.llvm.org/D34438
llvm-svn: 307489
|
| |
|
|
|
|
| |
sucessor -> successor
llvm-svn: 307488
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
x86 scalar select-of-constants (Cond ? C1 : C2) combining/lowering is a mess
with missing optimizations. We handle some patterns, but miss logical variants.
To clean that up, we should convert all select-of-constants to logic/math and
enhance the combining for the expected patterns from that. Selecting 0 or -1
needs extra attention to produce the optimal code as shown here.
Attempt to verify that all of these IR forms are logically equivalent:
http://rise4fun.com/Alive/plxs
Earlier steps in this series:
rL306040
rL306072
rL307404 (D34652)
As acknowledged in the earlier review, there's a possibility that some Intel
uarch would prefer to produce an xor to clear the fake register operand with
sbb %eax, %eax. This will likely need to be addressed in a separate pass.
llvm-svn: 307471
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When reusing a register for a new definition, the fast register allocator
used to insert a kill flag at the previous last use of that register to
inform later passes that this register is free between the redef and the
last use. However, this may be wrong when subregisters are involved.
Indeed, a partially redef would have trigger a kill of the full super
register, potentially wrongly marking all the other subregisters as
free. Given we don't track which lanes are still live, we cannot set the
kill flag in such case.
Note: This bug has been latent for about 7 years (r104056).
llvmg.org/PR33677
llvm-svn: 307428
|