| Commit message (Collapse) | Author | Age | Files | Lines | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
I'm under the impression that we used to infer the isCommutable flag from the
instruction-associated pattern. Regardless, we don't seem to do this (at least
by default) any more. I've gone through all of our instruction definitions, and
marked as commutative all of those that should be trivial to commute (by
exchanging the first two operands). There has been special code for the RL*
instructions, and that's not changed.
Before this change, we had the following commutative instructions:
 RLDIMI
 RLDIMIo
 RLWIMI
 RLWIMI8
 RLWIMI8o
 RLWIMIo
 XSADDDP
 XSMULDP
 XVADDDP
 XVADDSP
 XVMULDP
 XVMULSP
After:
 ADD4
 ADD4o
 ADD8
 ADD8o
 ADDC
 ADDC8
 ADDC8o
 ADDCo
 ADDE
 ADDE8
 ADDE8o
 ADDEo
 AND
 AND8
 AND8o
 ANDo
 CRAND
 CREQV
 CRNAND
 CRNOR
 CROR
 CRXOR
 EQV
 EQV8
 EQV8o
 EQVo
 FADD
 FADDS
 FADDSo
 FADDo
 FMADD
 FMADDS
 FMADDSo
 FMADDo
 FMSUB
 FMSUBS
 FMSUBSo
 FMSUBo
 FMUL
 FMULS
 FMULSo
 FMULo
 FNMADD
 FNMADDS
 FNMADDSo
 FNMADDo
 FNMSUB
 FNMSUBS
 FNMSUBSo
 FNMSUBo
 MULHD
 MULHDU
 MULHDUo
 MULHDo
 MULHW
 MULHWU
 MULHWUo
 MULHWo
 MULLD
 MULLDo
 MULLW
 MULLWo
 NAND
 NAND8
 NAND8o
 NANDo
 NOR
 NOR8
 NOR8o
 NORo
 OR
 OR8
 OR8o
 ORo
 RLDIMI
 RLDIMIo
 RLWIMI
 RLWIMI8
 RLWIMI8o
 RLWIMIo
 VADDCUW
 VADDFP
 VADDSBS
 VADDSHS
 VADDSWS
 VADDUBM
 VADDUBS
 VADDUHM
 VADDUHS
 VADDUWM
 VADDUWS
 VAND
 VAVGSB
 VAVGSH
 VAVGSW
 VAVGUB
 VAVGUH
 VAVGUW
 VMADDFP
 VMAXFP
 VMAXSB
 VMAXSH
 VMAXSW
 VMAXUB
 VMAXUH
 VMAXUW
 VMHADDSHS
 VMHRADDSHS
 VMINFP
 VMINSB
 VMINSH
 VMINSW
 VMINUB
 VMINUH
 VMINUW
 VMLADDUHM
 VMULESB
 VMULESH
 VMULEUB
 VMULEUH
 VMULOSB
 VMULOSH
 VMULOUB
 VMULOUH
 VNMSUBFP
 VOR
 VXOR
 XOR
 XOR8
 XOR8o
 XORo
 XSADDDP
 XSMADDADP
 XSMAXDP
 XSMINDP
 XSMSUBADP
 XSMULDP
 XSNMADDADP
 XSNMSUBADP
 XVADDDP
 XVADDSP
 XVMADDADP
 XVMADDASP
 XVMAXDP
 XVMAXSP
 XVMINDP
 XVMINSP
 XVMSUBADP
 XVMSUBASP
 XVMULDP
 XVMULSP
 XVNMADDADP
 XVNMADDASP
 XVNMSUBADP
 XVNMSUBASP
 XXLAND
 XXLNOR
 XXLOR
 XXLXOR
This is a by-inspection change, and I'm not sure how to write a reliable test
case. I would like advice on this, however.
llvm-svn: 204609
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
- If only two registers are passed to a three-register operation, then the
  first argument is both source and destination register.
- If a non-register is passed as the last argument, generate the immediate
  version of the instruction.
Also mark DADD commutative and add scheduling information (to the generic
scheduler), and implement DSUB.
Patch by David Chisnall
His work was sponsored by: DARPA, AFRL
CC: theraven
Differential Revision: http://llvm-reviews.chandlerc.com/D3148
llvm-svn: 204605
 | 
| | 
| 
| 
|  | 
llvm-svn: 204600
 | 
| | 
| 
| 
| 
| 
|  | 
There is no need to schedule this extra pass if it will have nothing to do.
llvm-svn: 204594
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
I've done some experimentation with this, and it looks like using the
lower-latency (but lower throughput) copy instruction is essentially always the
right thing to do.
My assumption is that, in order to be relatively sure that the higher-latency
copy will increase throughput, we'd want to have it unlikely to be in-flight
with its use. On the P7, the global completion table (GCT) can hold a maximum
of 120 instructions, shared among all active threads (up to 4), giving 30
instructions per thread.  So specifically, I'd require at least that many
instructions between the copy and the use before the high-latency variant is
used.
Trying this, however, over the entire test suite resulted in zero cases where
the high-latency form would be preferable. This may be a consequence of the
fact that the scheduler views copies as free, and so they tend to end up close
to their uses. For this experiment I created a function:
  unsigned chooseVSXCopy(MachineBasicBlock &MBB,
                         MachineBasicBlock::iterator I,
                         unsigned DestReg, unsigned SrcReg,
                         unsigned StartDist = 1,
                         unsigned Depth = 3) const;
with an implementation like:
  if (!Depth)
    return PPC::XXLOR;
  const unsigned MaxDist = 30;
  unsigned Dist = StartDist;
  for (auto J = I, JE = MBB.end(); J != JE && Dist <= MaxDist; ++J) {
    if (J->isTransient() && !J->isCopy())
      continue;
    if (J->isCall() || J->isReturn() || J->readsRegister(DestReg, TRI))
      return PPC::XXLOR;
    ++Dist;
  }
  // We've exceeded the required distance for the high-latency form, use it.
  if (Dist > MaxDist)
    return PPC::XVCPSGNDP;
  // If this is only an exit block, use the low-latency form.
  if (MBB.succ_empty())
    return PPC::XXLOR;
  // We've reached the end of the block, check the successor blocks (up to some
  // depth), and use the high-latency form if that is okay with all successors.
  for (auto J = MBB.succ_begin(), JE = MBB.succ_end(); J != JE; ++J) {
    if (chooseVSXCopy(**J, (*J)->begin(), DestReg, SrcReg,
                      Dist, --Depth) == PPC::XXLOR)
      return PPC::XXLOR;
  }
  // All of our successor blocks seem okay with the high-latency variant, so
  // we'll use it.
  return PPC::XVCPSGNDP;
and then changed the copy opcode selection from:
    Opc = PPC::XXLOR;
to:
    Opc = chooseVSXCopy(MBB, std::next(I), DestReg, SrcReg);
In conclusion, I'm removing the FIXME from the comment, because I believe that
there is, at least absent other examples, nothing to fix.
llvm-svn: 204591
 | 
| | 
| 
| 
|  | 
llvm-svn: 204583
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
We were already propagating the section in
a = b
With this patch we also propagate it for
a = b + 1
llvm-svn: 204581
 | 
| | 
| 
| 
| 
| 
|  | 
No functionality change.
llvm-svn: 204580
 | 
| | 
| 
| 
|  | 
llvm-svn: 204575
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
This is a pretty straight forward translation for COFF, we just need to
stick the function in a COMDAT section marked as
IMAGE_COMDAT_SELECT_NODUPLICATES.
llvm-svn: 204565
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
found with a smarter version of -Wunused-member-function that I'm playwing with.
Appologies in advance if I removed someone's WIP code.
 include/llvm/CodeGen/MachineSSAUpdater.h            |    1 
 include/llvm/IR/DebugInfo.h                         |    3 
 lib/CodeGen/MachineSSAUpdater.cpp                   |   10 --
 lib/CodeGen/PostRASchedulerList.cpp                 |    1 
 lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp    |   10 --
 lib/IR/DebugInfo.cpp                                |   12 --
 lib/MC/MCAsmStreamer.cpp                            |    2 
 lib/Support/YAMLParser.cpp                          |   39 ---------
 lib/TableGen/TGParser.cpp                           |   16 ---
 lib/TableGen/TGParser.h                             |    1 
 lib/Target/AArch64/AArch64TargetTransformInfo.cpp   |    9 --
 lib/Target/ARM/ARMCodeEmitter.cpp                   |   12 --
 lib/Target/ARM/ARMFastISel.cpp                      |   84 --------------------
 lib/Target/Mips/MipsCodeEmitter.cpp                 |   11 --
 lib/Target/Mips/MipsConstantIslandPass.cpp          |   12 --
 lib/Target/NVPTX/NVPTXISelDAGToDAG.cpp              |   21 -----
 lib/Target/NVPTX/NVPTXISelDAGToDAG.h                |    2 
 lib/Target/PowerPC/PPCFastISel.cpp                  |    1 
 lib/Transforms/Instrumentation/AddressSanitizer.cpp |    2 
 lib/Transforms/Instrumentation/BoundsChecking.cpp   |    2 
 lib/Transforms/Instrumentation/MemorySanitizer.cpp  |    1 
 lib/Transforms/Scalar/LoopIdiomRecognize.cpp        |    8 -
 lib/Transforms/Scalar/SCCP.cpp                      |    1 
 utils/TableGen/CodeEmitterGen.cpp                   |    2 
 24 files changed, 2 insertions(+), 261 deletions(-)
llvm-svn: 204560
 | 
| | 
| 
| 
| 
| 
| 
|  | 
When VSX is available, these instructions should be used in preference to the
older variants that only have access to the scalar floating-point registers.
llvm-svn: 204559
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
benchmarks.
<rdar://problem/16368461>
llvm-svn: 204558
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Since the profile can come from 32-bit machines, we need to check the
pointer size.  Change the magic number to facilitate this.
Adds tests for reading 32-bit and 64-bit binaries (both big- and
little-endian).  The tests write a binary using printf in RUN lines
(like raw-magic-but-no-header.test).  Assuming the bots don't complain,
this seems like a better way forward for testing RawInstrProfReader than
committing binary files.
<rdar://problem/16400648>
llvm-svn: 204557
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This is similar, but not identical to what gas does. The logic in MC is to just
compute the symbol table after parsing the entire file. GAS is mixed, given
.type b, @object
a = b
b:
.type b, @function
It will propagate the change and make 'a' a function. Given
.type b, @object
b:
a = b
.type b, @function
the type of 'a' is still object.
Since we do the computation in the end, we produce a function in both cases.
llvm-svn: 204555
 | 
| | 
| 
| 
|  | 
llvm-svn: 204553
 | 
| | 
| 
| 
|  | 
llvm-svn: 204548
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
When a label is parsed, check if there is type information available for the
label.  If so, check if the symbol is a function.  If the symbol is a function
and we are in thumb mode and no explicit thumb_func has been emitted, adjust the
symbol data to indicate that the function definition is a thumb function.
The application of this inferencing is improved value handling in the object
file (the required thumb bit is set on symbols which are thumb functions).  It
also helps improve compatibility with binutils.
The one complication that arises from this handling is the MCAsmStreamer.  The
default implementation of getOrCreateSymbolData in MCStreamer does not support
tracking the symbol data.  In order to support the semantics of thumb functions,
track symbol data in assembly streamer.  Although O(n) in number of labels in
the TU, this is already done in various other streamers and as such the memory
overhead is not a practical concern in this scenario.
llvm-svn: 204544
 | 
| | 
| 
| 
| 
| 
| 
|  | 
v2f64 values, like other 128-bit values, are returned under VSX in register
vs34 (Altivec register v2).
llvm-svn: 204543
 | 
| | 
| 
| 
| 
| 
| 
|  | 
The cleanup code that removes dead cast instructions only removed them from the
basic block, but didn't delete them. This fix erases them now too.
llvm-svn: 204538
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
A PHI node usually has only one value/basic block pair per incoming basic block.
In the case of a switch statement it is possible that a following PHI node may
have more than one such pair per incoming basic block. E.g.:
%0 = phi i64 [ 123456, %case2 ], [ 654321, %Entry ], [ 654321, %Entry ]
This is valid and the verfier doesn't complain, because both values are the
same.
Constant hoisting materializes the constant for each operand separately and the
value is still the same, but the variable names have changed. As a result the
verfier can't recognize anymore that they are the same value and complains.
This fix adds special update code for PHI node in constant hoisting to prevent
this corner case.
This fixes <rdar://problem/16394449>
llvm-svn: 204537
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
'BuildVectorSDNode::isConstantSplat'
This patch renames method 'isConstantSplat' as 'getConstantSplatValue'
(mainly for consistency reasons), and rewrites its logic to ensure
that we always perform a legal 'cast<ConstantSDNode>'.
Added test shift-combine-crash.ll to verify that DAGCombiner no longer crashes with an assertion failure in the attempt to simplify a vector shift by a vector of all undef counts.
llvm-svn: 204536
 | 
| | 
| 
| 
|  | 
llvm-svn: 204530
 | 
| | 
| 
| 
| 
| 
| 
|  | 
an ID, so this is a noop.
Thanks Manman for catching this!
llvm-svn: 204528
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
sym_a:
sym_d = sym_a + 1
This is the smallest fix I was able to extract from what got reverted in
r204203.
llvm-svn: 204527
 | 
| | 
| 
| 
|  | 
llvm-svn: 204526
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
We make sure a spill is not hoisted to a hotter outer loop by adding
a condition. Hoist a spill to outer loop if there are multiple dependents
(it can be beneficial if more than one dependents are hoisted) or
if DepSV (the hoisting source) is hotter than SV (the hoisting destination).
rdar://16268194
llvm-svn: 204522
 | 
| | 
| 
| 
| 
| 
|  | 
absolute so relative paths are properly handled in both Windows and Unix.
llvm-svn: 204520
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
Fixes 80-column violation at the same time.
<rdar://problem/15950346>
llvm-svn: 204516
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
Include non-text characters in the magic number so that text files can't
match.
<rdar://problem/15950346>
llvm-svn: 204513
 | 
| | 
| 
| 
| 
| 
|  | 
<rdar://problem/15950346>
llvm-svn: 204512
 | 
| | 
| 
| 
| 
| 
| 
| 
|  | 
No functionality change.
<rdar://problem/15950346>
llvm-svn: 204511
 | 
| | 
| 
| 
| 
| 
|  | 
<rdar://problem/15950346>
llvm-svn: 204510
 | 
| | 
| 
| 
|  | 
llvm-svn: 204508
 | 
| | 
| 
| 
|  | 
llvm-svn: 204507
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Type units have no addresses, so there's no need for DW_AT_addr_base.
This removes another relocation from every skeletal type unit and brings
LLVM's skeletal type units in line with GCC's (containing only
GNU_dwo_name (strp), comp_dir (strp), and GNU_pubnames (flag_present)).
Cary's got some ideas about using str_index in the .o file to reduce
those last two relocations (well, replace two relocations with one
relocation (pointing to the string index) and two indicies)
llvm-svn: 204506
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Previously, only regular AArch64 instructions were annotated with SchedRW lists.
This patch does the same for NEON enabling these instructions to be scheduled by
the MIScheduler. Additionally, store operations are now modeled and a few
SchedRW lists were updated for bug fixes (e.g. multiple def operands).
Reviewers: apazos, mcrosier, atrick
Patch by Dave Estes <cestes@codeaurora.org>!
llvm-svn: 204505
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Read a raw binary profile that corresponds to a memory dump from the
runtime profile.
The test is a binary file generated from
cfe/trunk/test/Profile/c-general.c with the new compiler-rt runtime and
the matching text version of the input.  It includes instructions on how
to regenerate.
<rdar://problem/15950346>
llvm-svn: 204496
 | 
| | 
| 
| 
|  | 
llvm-svn: 204494
 | 
| | 
| 
| 
| 
| 
| 
|  | 
Some of them also had the pattern on both, so this removes the
duplication.
llvm-svn: 204492
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This isn't a format we'll want to write out in practice, but moving it
to the writer library simplifies llvm-profdata and isolates it from
further changes to the format.
This also allows us to update the tests to not rely on the text output
format.
llvm-svn: 204489
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
This introduces the ProfileData library and updates llvm-profdata to
use this library for reading profiles. InstrProfReader is an abstract
base class that will be subclassed for both the raw instrprof data
from compiler-rt and the efficient instrprof format that will be used
for PGO.
llvm-svn: 204482
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
Summary:
VECTOR_SHUFFLE concatenates the vectors in an vectorwise fashion.
  <0b00, 0b01> + <0b10, 0b11> -> <0b00, 0b01, 0b10, 0b11>
VSHF concatenates the vectors in a bitwise fashion:
  <0b00, 0b01> + <0b10, 0b11> ->
  0b0100       + 0b1110       -> 0b01001110
                                 <0b10, 0b11, 0b00, 0b01>
We must therefore swap the operands to get the correct result.
The test case that discovered the issue was MultiSource/Benchmarks/nbench.
Reviewers: matheusalmeida
Reviewed By: matheusalmeida
Differential Revision: http://llvm-reviews.chandlerc.com/D3142
llvm-svn: 204480
 | 
| | 
| 
| 
|  | 
llvm-svn: 204476
 | 
| | 
| 
| 
|  | 
llvm-svn: 204475
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
|  | 
The SReg_(32|64) register classes contain special registers in addition
to the numbered SGPRs.  This can lead to machine verifier errors when
these register classes are used as sub-registers for SReg_128, since
SReg_128 only uses the numbered SGPRs.
Replacing SReg_(32|64) with SGPR_(32|64) fixes this problem, since
the SGPR_(32|64) register classes contain only numbered SGPRs.
Tests cases for this are comming in a later commit.
llvm-svn: 204474
 | 
| | 
| 
| 
| 
| 
| 
|  | 
CodeGen treats allocas outside the entry block as dynamically sized
stack objects.
llvm-svn: 204473
 | 
| | 
| 
| 
| 
| 
| 
| 
| 
|  | 
...instead of a separate Requires for each one.  This style was already
used in some places and seems more compact.
No behavioral change intended.
llvm-svn: 204452
 | 
| | 
| 
| 
| 
| 
|  | 
These complement the older float<->signed instructions.
llvm-svn: 204451
 | 
| | 
| 
| 
| 
| 
| 
|  | 
We should be using the llvm namespace and not an anonymous namespace
in a header file.
llvm-svn: 204450
 |