| Commit message (Collapse) | Author | Age | Files | Lines | 
| | 
| 
| 
|  | 
llvm-svn: 256130
 | 
| | 
| 
| 
|  | 
llvm-svn: 256127
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
selection DAG knows how to optimize into a shift.
This allows "icmp ugt %a, 4294967295" and "icmp uge %a, 4294967296" to be optimized into right shifts by 32 which can fold the immediate into the shift instruction. These patterns show up with some regularity in real code.
Unfortunately, since getImmCost can't see the icmp predicate we can't be tell if we're only catching these specific cases.
llvm-svn: 256126
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
This adds the core AVR TableGen file, along with the register descriptions.
Lines in AVR.td which require other TableGen files which haven't been committed
yet are commented out.
This is a fairly trivial patch, and should only require a quick review.
I kept the line width smaller than 80 columns, but there are a few exceptions
because I'm not sure how to split a string over several lines.
Reviewers: stoklund
Subscribers: dylanmckay, agnat
Differential Revision: http://reviews.llvm.org/D14684
llvm-svn: 256120
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Thumb2
Summary:
r250697 fixed the mapping for ARM mode. We have to do the same for Thumb2 otherwise the same llvm.arm.ssat() will generate different saturating amount for ARM and Thumb.
r250697: http://reviews.llvm.org/rL250697
Reviewers: rmaprath
Subscribers: aemerson, llvm-commits, rengolin
Differential Revision: http://reviews.llvm.org/D15653
llvm-svn: 256115
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
The analysis of shader inputs was completely wrong.  We were passing the
wrong index to AttributeSet::hasAttribute() and the logic for which
inputs where in SGPRs was wrong too.
Reviewers: arsenm
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15608
llvm-svn: 256082
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
noduplicate prevents unrolling of small loops that happen to have
barriers in them. If a loop has a barrier in it, it is OK to duplicate
it for the unroll.
llvm-svn: 256075
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15629
llvm-svn: 256073
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
When copying aggregate registers within the same register class, there may
be an overlap between source and destination that forces us to do the copy
backwards.
Do the simplest possible thing that guarantees the correct order of moves
when there are overlaps, and does whatever when there is no overlap. (The
last part forces some trivial adjustments to test cases.)
Together with r255906, this fixes a VM fault in Unreal Elemental Demo.
While at it, change the generation of kill and def flags to something that
looks more reasonable. This method is used very late during compilation, so
it probably doesn't matter in practice, and to be honest, I don't know if
this change is actually correct because the semantics in connection with
aggregate registers vs. sub-registers are not clear to me.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=93264
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15622
llvm-svn: 256072
 | 
| | 
| 
| 
|  | 
llvm-svn: 256025
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary: This is just my first commit. Test!
    Reviewers: none
    Subscribers: none
    Differential Revision: none
llvm-svn: 256022
 | 
| | 
| 
| 
| 
| 
|  | 
This reverts commit a493cb636e0152ad28210934a47c6c44b1437193.
llvm-svn: 256021
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary: This is just my first commit. Test!
Reviewers: none
Subscribers: none
Differential Revision: none
llvm-svn: 256020
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This change promotes load instructions which directly read from stores by
replacing them with mov instructions. If the store is wider than the load,
the load will be replaced with a bitfield extract.
For example :
  STRWui %W1, %X0, 1
  %W0 = LDRHHui %X0, 3
becomes
  STRWui %W1, %X0, 1
  %W0 = UBFMWri %W1, 16, 31
llvm-svn: 256004
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
and WRDSP instructions
Differential Revision: http://reviews.llvm.org/D14429
llvm-svn: 255991
 | 
| | 
| 
| 
|  | 
llvm-svn: 255939
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Use the 3-byte (4 with REX prefix) push-pop sequence for materializing
small constants. This is smaller than using a mov (5, 6 or 7 bytes
depending on size and REX prefix), but it's likely to be slower, so
only used for 'minsize'.
This is a follow-up to r255656.
Differential Revision: http://reviews.llvm.org/D15549
llvm-svn: 255936
 | 
| | 
| 
| 
| 
| 
| 
|  | 
This reverts commit r255895. The patch breaks internal tests. Reverting until a
fix is ready.
llvm-svn: 255928
 | 
| | 
| 
| 
|  | 
llvm-svn: 255925
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Reviewers: tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15583
Patch by: Changpeng Fang
llvm-svn: 255908
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
The method insertNOPs expected the number of wait states to be passed as
parameter, while eliminateFrameIndex passed the immediate argument for the
S_NOP, leading to an off-by-one error. Rename the method to make the
meaning of its parameter clearer. The number of 4 / 5 wait states (which
is what the method has always _tried_ to do according to the comment) is
correct according to the hardware docs.
I stumbled upon this while trying to track down the cause of
https://bugs.freedesktop.org/show_bug.cgi?id=93264. While clearly needed,
this patch unfortunately does not fix that bug...
Reviewers: arsenm, tstellarAMD
Subscribers: arsenm, llvm-commits
Differential Revision: http://reviews.llvm.org/D15542
llvm-svn: 255906
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
These days relocations are created and stored in a deterministic way.
The order they are created is also suitable for the .o file, so we don't
need an explicit sort.
The last remaining exception is MIPS.
llvm-svn: 255902
 | 
| | 
| 
| 
| 
| 
|  | 
This reverts commit r255896. It broke the tests.
llvm-svn: 255899
 | 
| | 
| 
| 
| 
| 
| 
|  | 
Every target changing sortRelocs was first calling the parent
implementation. Just run that first.
llvm-svn: 255898
 | 
| | 
| 
| 
|  | 
llvm-svn: 255897
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch enables PostRAScheduler specifically for AArch64 generic build,
which is beneficial from the performance perspective.
Speedups up to 2 to 7% for some benchmarks on A57 and A53 are observed.
Also benchmarks from LLVM test-suite did not regress.
Differential Revision: http://reviews.llvm.org/D15557
llvm-svn: 255896
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This patch adds a DAG combine for (any_extend (extract_vector_elt v, i)) ->
(extract_vector_elt v, i). The combine enables us to better match some SMOV
patterns.
Differential Revision: http://reviews.llvm.org/D15515
llvm-svn: 255895
 | 
| | 
| 
| 
|  | 
llvm-svn: 255894
 | 
| | 
| 
| 
| 
| 
| 
|  | 
Add option to enable/disable LEA optimization pass. By default the pass is disabled.
Differential Revision: http://reviews.llvm.org/D15573
llvm-svn: 255881
 | 
| | 
| 
| 
|  | 
llvm-svn: 255877
 | 
| | 
| 
| 
| 
| 
|  | 
This is in preparation to an upcoming patch.
llvm-svn: 255872
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
This creates the initial infrastructure for writing ELF output files. It
doesn't yet have any implementation for encoding instructions.
Differential Revision: http://reviews.llvm.org/D15555
llvm-svn: 255869
 | 
| | 
| 
| 
| 
| 
|  | 
We now have 240 expected failures.
llvm-svn: 255858
 | 
| | 
| 
| 
|  | 
llvm-svn: 255847
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
Implement eliminateCallFramePsuedo to handle ADJCALLSTACKUP/DOWN
pseudo-instructions. Add a test calling a vararg function which causes non-0
adjustments. This revealed an issue with RegisterCoalescer wherein it
eliminates a COPY from SP32 to a vreg but failes to update the live ranges
of EXPR_STACK, causing a machineinstr verifier failure (so this test
is commented out).
Also add a dynamic alloca test, which causes a callseq_end dag node with
a 0 (instead of undef) second argument to be generated. We currently fail to
select that, so adjust the ADJCALLSTACKUP tablegen code to handle it.
Differential Revision: http://reviews.llvm.org/D15587
llvm-svn: 255844
 | 
| | 
| 
| 
| 
| 
|  | 
We don't need static_casts when we use the right Subtarget.
llvm-svn: 255836
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This matches the other MIB methods, none of which modify the builder.
Without this, we can't chain copyImplicitOps.
Also reformat the few users, in PPCEarlyReturn.
llvm-svn: 255828
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The access function has a short entry and a short exit, the initialization
block is only run the first time. To improve the performance, we want to
have a short frame at the entry and exit.
We explicitly handle most of the CSRs via copies. Only the CSRs that are not
handled via copies will be in CSR_SaveList.
Frame lowering and prologue/epilogue insertion will generate a short frame
in the entry and exit according to CSR_SaveList. The majority of the CSRs will
be handled by register allcoator. Register allocator will try to spill and
reload them in the initialization block.
We add CSRsViaCopy, it will be explicitly handled during lowering.
1> we first set FunctionLoweringInfo->SplitCSR if conditions are met (the target
   supports it for the given machine function and the function has only return
   exits). We also call TLI->initializeSplitCSR to perform initialization.
2> we call TLI->insertCopiesSplitCSR to insert copies from CSRsViaCopy to
   virtual registers at beginning of the entry block and copies from virtual
   registers to CSRsViaCopy at beginning of the exit blocks.
3> we also need to make sure the explicit copies will not be eliminated.
The target independent portion was committed as r255353.
rdar://problem/23557469
Differential Revision: http://reviews.llvm.org/D15341
llvm-svn: 255821
 | 
| | 
| 
| 
|  | 
llvm-svn: 255817
 | 
| | 
| 
| 
|  | 
llvm-svn: 255816
 | 
| | 
| 
| 
| 
| 
|  | 
Differential Revision: http://reviews.llvm.org/D15546
llvm-svn: 255815
 | 
| | 
| 
| 
|  | 
llvm-svn: 255811
 | 
| | 
| 
| 
|  | 
llvm-svn: 255807
 | 
| | 
| 
| 
| 
| 
|  | 
This reverts commit r255762.
llvm-svn: 255806
 | 
| | 
| 
| 
| 
| 
| 
|  | 
If a branch both branches to and falls through to the same block, treat it as
an explicit branch.
llvm-svn: 255803
 | 
| | 
| 
| 
| 
| 
|  | 
The default cost was 0 with the assumption that it is predictable.
llvm-svn: 255796
 | 
| | 
| 
| 
|  | 
llvm-svn: 255788
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The SystemZ linkers provide an optimization to transform a general-
or local-dynamic TLS sequence into an initial-exec sequence if possible.
Do do that, the compiler generates a function call to __tls_get_offset,
which is a brasl instruction annotated with *two* relocations:
- a R_390_PLT32DBL to install __tls_get_offset as branch target
- a R_390_TLS_GDCALL / R_390_TLS_LDCALL to inform the linker that
  the TLS optimization should be performed if possible
If the optimization is performed, the brasl is replaced by an ld load
instruction.
However, *both* relocs are processed independently by the linker.
Therefore it is crucial that the R_390_PLT32DBL is processed *first*
(installing the branch target for the brasl) and the R_390_TLS_GDCALL
is processed *second* (replacing the whole brasl with an ld).
If the relocs are swapped, the linker will first replace the brasl
with an ld, and *then* install the __tls_get_offset branch target
offset.  Since ld has a different layout than brasl, this may even
result in a completely different (or invalid) instruction; in any
case, the resulting code is corrupted.
Unfortunately, the way the MC common code sorts relocations causes
these two to *always* end up the wrong way around, resulting in
wrong code generation by the linker and crashes.
This patch overrides the sortRelocs routine to detect this particular
pair of relocs and enforce the required order.
llvm-svn: 255787
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When comparing a zero-extended value against a constant small enough to
be in range of the inner type, it doesn't matter whether a signed or
unsigned compare operation (for the outer type) is being used.  This is
why the code in adjustSubwordCmp had this assertion:
    assert(C.ICmpType == SystemZICMP::Any &&
           "Signedness shouldn't matter here.");
assuming the the caller had already detected that fact.  However, it
turns out that there cases, in particular with always-true or always-
false conditions that have not been eliminated when compiling at -O0,
where this is not true.
Instead of failing an assertion if C.ICmpType is not SystemZICMP::Any
here, we can simply *set* it safely to SystemZICMP::Any, however.
llvm-svn: 255786
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This removes an unpleasant hack involving a global variable for special
lowering of certain memcpy calls. These are now lowered as intended in
EmitTargetCodeForMemcpy in the same way that other targets do it.
llvm-svn: 255785
 |