Also fixes missing f32 test.
llvm-svn: 260780
Reviewers: nhaustov, cfang, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17159
llvm-svn: 260774
llvm-svn: 260773
Summary:
This patch skips DAG combine of fp_round (fp_round x) if it results in
an fp_round from f80 to f16.
fp_round from f80 to f16 always generates an expensive (and as yet,
unimplemented) libcall to __truncxfhf2.  This prevents selection of
native f16 conversion instructions from f32 or f64.  Moreover, the first
(value-preserving) fp_round from f80 to either f32 or f64 may become a
NOP on platforms like x86.
Reviewers: ab
Subscribers: srhines, llvm-commits
Differential Revision: http://reviews.llvm.org/D17221
llvm-svn: 260769
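
As an illustration of the pattern in question (not taken from the patch; it
assumes a compiler and target where _Float16 is usable), the following source
narrows an x86 f80 value in two steps, which the DAG expresses as an fp_round
of an fp_round:

  // Narrowing long double (f80 on x86) to half via float produces
  // fp_round (fp_round x) in the DAG. Combining the two rounds into a
  // single f80 -> f16 round would force the __truncxfhf2 libcall instead
  // of a native f32 -> f16 conversion.
  _Float16 narrow_in_two_steps(long double x) {
    float f = static_cast<float>(x);      // f80 -> f32
    return static_cast<_Float16>(f);      // f32 -> f16
  }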
llvm-svn: 260766
Reviewers: arsenm
Subscribers: mareko, MatzeB, qcolombet, arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D16603
llvm-svn: 260765
Differential Revision: http://reviews.llvm.org/D16837
llvm-svn: 260764
ops.
Computed gotos and RETURNADDR may never be supported; we can do
FRAMEADDR in the future.
llvm-svn: 260759
Replace spills to memory with spills to registers, if possible. This
applies mostly to predicate registers (both scalar and vector), since
they are very limited in number. A spill of a predicate register may
happen even if there is a general-purpose register available. In cases
like this the stack spill/reload may be eliminated completely.
This optimization will consider all stack objects, regardless of where
they came from, and try to match the live range of the stack slot with
a dead range of a register from an appropriate register class.
llvm-svn: 260758
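
A simplified, standalone sketch of the matching step (not the Hexagon
implementation; the names and data structures here are made up for
illustration): a stack slot's live range is compared against candidate
registers' dead ranges, and a register is chosen only if it is dead across
the entire range.

  #include <cstdio>
  #include <vector>

  // Toy model: a half-open interval of instruction indices.
  struct Range { unsigned Start, End; };

  // True if Free covers all of Slot.
  static bool covers(const Range &Free, const Range &Slot) {
    return Free.Start <= Slot.Start && Slot.End <= Free.End;
  }

  // Return the index of a register whose dead range covers the slot's
  // live range, or -1 if no candidate is free for the whole range.
  static int pickRegisterFor(const Range &SlotLive,
                             const std::vector<Range> &DeadRanges) {
    for (unsigned R = 0; R < DeadRanges.size(); ++R)
      if (covers(DeadRanges[R], SlotLive))
        return static_cast<int>(R);
    return -1;
  }

  int main() {
    // A predicate spill live from instruction 10 to 30; the second
    // candidate register is dead across the whole range, so the stack
    // spill/reload can become a pair of register copies.
    Range SlotLive = {10, 30};
    std::vector<Range> DeadRanges = {{0, 20}, {5, 40}};
    std::printf("use candidate %d\n", pickRegisterFor(SlotLive, DeadRanges));
    return 0;
  }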
llvm-svn: 260753
llvm-svn: 260750
llvm-svn: 260748
llvm-svn: 260746
Add another interface to the function annotateValueSite() which directly uses the
ValueData array.
Differential Revision: http://reviews.llvm.org/D17108
llvm-svn: 260741
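
A rough usage sketch of the new overload (the exact parameter names and order
are assumptions based on this description, not copied from the patch):

  #include "llvm/ADT/ArrayRef.h"
  #include "llvm/IR/Instruction.h"
  #include "llvm/IR/Module.h"
  #include "llvm/ProfileData/InstrProf.h"

  using namespace llvm;

  // Attach value-profile metadata for an indirect-call site directly from
  // an InstrProfValueData array, without building an InstrProfRecord first.
  void annotateIndirectCall(Module &M, Instruction &CallSite,
                            ArrayRef<InstrProfValueData> Targets,
                            uint64_t TotalCount) {
    annotateValueSite(M, CallSite, Targets, TotalCount,
                      IPVK_IndirectCallTarget, /*MaxMDCount=*/3);
  }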
llvm-svn: 260740
This should have landed in r260686.
llvm-svn: 260739
llvm-svn: 260737
never return
Differential Revision: http://reviews.llvm.org/D17208
llvm-svn: 260733
Last part of PR25166.
llvm-svn: 260732
llvm-svn: 260731
convergent functions.
Summary:
Performing this optimization duplicates the call to the convergent
function and adds new control-flow dependencies, which is a no-no.
Reviewers: jingyue
Subscribers: broune, hfinkel, tra, resistor, joker.eph, arsenm, llvm-commits, mzolotukhin
Differential Revision: http://reviews.llvm.org/D17128
llvm-svn: 260730
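
A standalone illustration of why the duplication is problematic (plain C++;
barrier() is a hypothetical stand-in for a convergent operation such as a GPU
work-group barrier, not a real API): duplicating the loop body under a
loop-invariant condition leaves each copy of the convergent call
control-dependent on a value it did not depend on before.

  extern void barrier();   // hypothetical stand-in for a convergent call

  void before(bool cond, int n, int *out) {
    for (int i = 0; i < n; ++i) {
      barrier();           // every "thread" reaches this same call site
      out[i] = cond ? 1 : 2;
    }
  }

  // After duplicating the body under the condition, each barrier() call
  // site is guarded by a particular value of cond, so threads that
  // disagree on cond no longer reach the same call site -- the new
  // control-flow dependence the convergent attribute forbids.
  void after(bool cond, int n, int *out) {
    if (cond)
      for (int i = 0; i < n; ++i) { barrier(); out[i] = 1; }
    else
      for (int i = 0; i < n; ++i) { barrier(); out[i] = 2; }
  }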
convergent function.
Summary:
Calls to convergent functions can be duplicated, but only if the
duplicates are not control-flow dependent on any additional values.
Loop rotation doesn't meet the bar.
Reviewers: jingyue
Subscribers: mzolotukhin, llvm-commits, arsenm, joker.eph, resistor, tra, hfinkel, broune
Differential Revision: http://reviews.llvm.org/D17127
llvm-svn: 260729
Summary: No functional changes.
Reviewers: jingyue, arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D17126
llvm-svn: 260728
llvm-svn: 260725
More to come, but those were easy.
llvm-svn: 260723
llvm-svn: 260722
The attached patch removes all of the block-local code for performing X-load forwarding by reusing the code used in the non-local case.
The motivation here is to remove duplication and, in the process, increase our test coverage of some fairly tricky code. I have some upcoming changes I'll be proposing in this area and wanted to have the code cleaned up a bit first.
Note: The review for this mostly happened in email which didn't make it to phabricator on the 258882 commit thread.
Differential Revision: http://reviews.llvm.org/D16608
llvm-svn: 260711
In short, before r252926 we were comparing an unsigned (StoreSize) against an
APInt (Stride), which is fine and well.  Afterwards, we were zero-extending the
Stride and then converting it to an unsigned, which is not the same thing.
Obviously, Strides can also be negative.  This commit just restores the original behavior.
AFAICT, it's not possible to write a test case to expose the issue because
the code already has checks to make sure the StoreSize can't overflow an
unsigned (which prevents the Stride from overflowing an unsigned as well).
llvm-svn: 260706
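
A standalone illustration of the general pitfall (plain integers standing in
for APInt; not the actual loop-idiom code): zero-extending a negative stride
does not preserve its value, so a later magnitude check behaves differently
than it does with the sign-preserved stride.

  #include <cstdint>
  #include <iostream>

  int main() {
    const int32_t Stride = -4;          // negative strides are legal
    const int64_t Signed = Stride;      // sign-preserving widening: -4
    const int64_t Zext =
        static_cast<int64_t>(static_cast<uint32_t>(Stride)); // 4294967292

    const int64_t StoreSize = 4;
    // A check like "StoreSize == -Stride" holds for the sign-preserved
    // value but not for the zero-extended one.
    std::cout << (StoreSize == -Signed) << " "
              << (StoreSize == -Zext) << "\n";   // prints "1 0"
    return 0;
  }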
As the title says. Modelled after similar code in SCEV.
This is useful when analysing induction variables in loops which have been canonicalized by other passes. I wrote the tests as non-loops specifically to avoid the generality introduced in http://reviews.llvm.org/D17174. While that can handle many induction variables without *needing* to exploit nsw, there's no reason not to use it if we've already proven it.
Differential Revision: http://reviews.llvm.org/D17177
llvm-svn: 260705
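
For intuition only (a source-level analogy, not the analysis code itself):
nsw arithmetic licenses the same kind of reasoning that signed-overflow
undefined behavior does in C++.

  // Because signed overflow is undefined, a compiler may assume the
  // increment below does not wrap, so "i + 1 > i" folds to true -- the
  // same style of fact an nsw add licenses at the IR level.
  bool obvious(int i) {
    return i + 1 > i;
  }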
Other components could not depend on an optional library in llvm-config.
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260701
llvm-svn: 260698
Rewrite the code to handle all pseudo-instructions in a single pass.
This temporarily reverts the spill-slot optimization that used general-
purpose registers to hold values of spilled predicate registers.
llvm-svn: 260696
In some cases, InstCombine replaces a sequence of an xor/sub instruction
followed by a cmp instruction with a single cmp instruction.
However, this replacement may produce a suboptimal result, especially when
the xor/sub has more than one use, as discussed in
bug 26465 (https://llvm.org/bugs/show_bug.cgi?id=26465).
This patch makes the replacement happen only when the xor/sub has only one
use.
Differential Revision: http://reviews.llvm.org/D16915
Patch by Taewook Oh!
llvm-svn: 260695
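
An illustrative source-level shape of the multi-use case (not taken from the
bug report): the subtraction feeds both the comparison and a later use, so
folding the compare does not make the sub go away.

  // With more than one use of the sub, turning "(a - b) == 0" into
  // "a == b" keeps the subtraction alive anyway and can pessimize the
  // final code; with a single use, the fold is a clear win.
  int multi_use(int a, int b) {
    int d = a - b;        // used by the compare and by the return below
    if (d == 0)
      return 0;
    return d;
  }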
assembler
Historically, AMD's internal sp3 assembler has used the flat_store* addr, data
format. To match existing code and enable reuse, change the LLVM
definitions accordingly.  Also update MC and CodeGen tests.
Differential Revision: http://reviews.llvm.org/D16927
Patch by: Nikolay Haustov
llvm-svn: 260694
Summary:
  It is possible for the loop condition to be a boolean constant (an infinite loop,
for example), so we should handle constant conditions when annotating a loop. This
patch adds this functionality to support annotating constant conditions.
Reviewers: tstellarAMD, arsenm
Subscribers: llvm-commits, arsenm
Differential Revision: http://reviews.llvm.org/D15093
llvm-svn: 260692
This code is dead. The expansion is now done in HexagonFrameLowering.
llvm-svn: 260691
We can generate the actual instructions from the intrinsics without the
need for pseudo-instructions. Also, since the intrinsics have a side-
effect in the form of a store, attempt to optimize away loads from the
store location.
llvm-svn: 260690
Summary:
Before this change, callee-save registers would be rounded up to even
pairs of GPRs and FPRs.  This change eliminates these extra padding
load/stores, though it does keep the stack allocation the same size
unless both the GPR and FPR sets have an odd size, in which case one
full pair stack slot (16 bytes) is saved.
This optimization cannot currently be done for MachO targets since they
rely on a fast-path .debug_frame equivalent that can only encode
callee-save registers as pairs.
Reviewers: t.p.northover, rengolin, mcrosier, jmolloy
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D17000
llvm-svn: 260689
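
As a rough worked example (register counts chosen purely for illustration,
assuming 8-byte save slots): saving 3 GPRs and 4 FPRs previously padded the
GPR set up to 4 registers, emitting padding stores/loads while using the same
64 bytes of stack; with this change the padding accesses disappear and the
allocation stays 64 bytes. If instead 3 GPRs and 3 FPRs are saved, both sets
are odd, so the 8 padded slots shrink to 6 and one full 16-byte pair slot is
saved.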
Create a virtual register that will hold the actual address and use it
with an offset of 0 in place of the original FI.
llvm-svn: 260688
Machine model description by Dave Estes <cestes@codeaurora.org>.
llvm-svn: 260686
llvm-svn: 260683
Summary:
This change merges adjacent 32-bit zero stores into a 64-bit zero store.
e.g.,
  str wzr, [x0]
  str wzr, [x0, #4]
becomes
  str xzr, [x0]
Therefore, four adjacent 32-bit zero stores become a single stp.
e.g.,
  str wzr, [x0]
  str wzr, [x0, #4]
  str wzr, [x0, #8]
  str wzr, [x0, #12]
becomes
  stp xzr, xzr, [x0]
Reviewers: mcrosier, jmolloy, gberry, t.p.northover
Subscribers: aemerson, rengolin, mcrosier, llvm-commits
Differential Revision: http://reviews.llvm.org/D16933
llvm-svn: 260682
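
For context, a source shape that plausibly produces four adjacent 32-bit zero
stores (illustrative only, not from the patch's tests):

  struct Quad { int a, b, c, d; };

  // Four adjacent 32-bit zero stores; with this change they can be
  // merged and paired down to a single stp xzr, xzr, [x0].
  void zero(Quad *q) {
    q->a = 0;
    q->b = 0;
    q->c = 0;
    q->d = 0;
  }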
The DataLayout can calculate alignment of vectors based on the alignment
of the element type and the number of elements. In fact, it is the product
of these two values. The problem is that for vectors of N x i1, this will
return the alignment of N bytes, since the alignment of i1 is 8 bits. The
vector types of vNi1 should be aligned to N bits instead. Provide explicit
alignment for HVX vectors to avoid such complications.
llvm-svn: 260678
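
For example, under that rule a vector of 64 x i1 would be given 64-byte
alignment, while the intended alignment is 64 bits, i.e. 8 bytes.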
Found by msan.
llvm-svn: 260676
node set rather than walking the SCC directly.
This directly exposes the functions and has already had null entries
filtered out. We also don't need to handle optnone, as it has
already been handled in the caller -- we never try to remove convergent
when there are optnone functions in the SCC.
With this change, the code for removing convergent should work with the
new pass manager and a different SCC analysis.
llvm-svn: 260668
with the test for a non-convergent intrinsic call.
While it is possible to use the call records to search for function
calls, we're going to do an instruction scan anyway to find the
intrinsics, so we can handle both cases while scanning instructions. This
will also make the logic more amenable to the new pass manager which
doesn't use the same call graph structure.
My next patch will remove use of CallGraphNode entirely and allow this
code to work with both the old and new pass manager. Fortunately, it
should also get strictly simpler without changing functionality.
llvm-svn: 260666
This was hardcoded to the static private size, but this
would miss the offset and additional size once we someday
have dynamic sizing.
Also stops always initializing flat_scratch even when unused.
In the future we should stop emitting this unless flat instructions
are used to access private memory. For example this will initialize
it almost always on VI because flat is used for global access.
llvm-svn: 260658
From: Mehdi Amini <mehdi.amini@apple.com>
llvm-svn: 260657
before I update it to be friendly with the new pass manager.
llvm-svn: 260653
Introduce a subtarget feature for this, and leave the default as
the current behavior, which assumes up to 16-byte loads/stores can
be used. The field also seems to have the ability to be set to 2 bytes,
but I'm not sure what that would be used for.
llvm-svn: 260651