| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |
| |
in constant islands for Mips16. We introduce JalB16 as a synonym
for Jal16. It makes it easier to read and is also necessary because
Jal16 is a call instruction but JalB16 is being used as a branch.
Various parts of LLVM will not work properly even in this late stage of
the backend if we use what was declared as a call instruction to function
as a branch. For one, basic block labels may not get emitted in some
situations.
llvm-svn: 195968
|
| |
|
|
| |
llvm-svn: 195967
|
| |
|
|
| |
llvm-svn: 195965
|
| |
On the PPC A2, instructions are only issued after their input operands are
ready. Model this by specifying that input operands are read at dispatch (0
cycles after issue). This changes all input operand latencies from 1 to 0.
Significant test-suite performance changes (these are 99.5% confidence
intervals on 6 runs for both before and after):
speedups:
MultiSource/Benchmarks/sim/sim
-1.21915% +/- 0.175063%
MultiSource/Benchmarks/TSVC/LinearDependence-flt/LinearDependence-flt
-1.23946% +/- 1.05133%
SingleSource/Benchmarks/Misc/flops-2
-1.24237% +/- 0.681362%
MultiSource/Applications/JM/lencod/lencod
-1.33992% +/- 0.757498%
MultiSource/Benchmarks/TSVC/InductionVariable-flt/InductionVariable-flt
-1.51802% +/- 1.21468%
MultiSource/Benchmarks/TSVC/GlobalDataFlow-flt/GlobalDataFlow-flt
-2.18818% +/- 1.28605%
MultiSource/Benchmarks/TSVC/Packing-flt/Packing-flt
-2.21977% +/- 1.19499%
SingleSource/Benchmarks/BenchmarkGame/spectral-norm
-2.29822% +/- 0.671871%
MultiSource/Benchmarks/TSVC/Packing-dbl/Packing-dbl
-2.40975% +/- 0.355931%
SingleSource/Benchmarks/Misc/fp-convert
-2.41899% +/- 1.04751%
MultiSource/Benchmarks/TSVC/Searching-dbl/Searching-dbl
-2.50349% +/- 0.126765%
SingleSource/Benchmarks/Misc/flops-3
-3.00214% +/- 0.700795%
MultiSource/Benchmarks/TSVC/LoopRestructuring-flt/LoopRestructuring-flt
-3.56995% +/- 3.2929%
MultiSource/Applications/sgefa/sgefa
-4.24908% +/- 2.00413%
MultiSource/Benchmarks/ASC_Sequoia/IRSmk/IRSmk
-18.1294% +/- 3.96489%
regressions:
MultiSource/Benchmarks/TSVC/Reductions-dbl/Reductions-dbl
1.03249% +/- 0.178547%
MultiSource/Applications/hexxagon/hexxagon
1.16597% +/- 0.285235%
MultiSource/Benchmarks/TSVC/IndirectAddressing-flt/IndirectAddressing-flt
1.39576% +/- 1.07855%
SingleSource/Benchmarks/Misc-C++/stepanov_v1p2
1.71539% +/- 0.173182%
MultiSource/Benchmarks/Fhourstones-3.1/fhourstones3.1
1.90013% +/- 0.866472%
MultiSource/Benchmarks/TSVC/Recurrences-dbl/Recurrences-dbl
2.39854% +/- 1.05914%
MultiSource/Benchmarks/TSVC/ControlFlow-dbl/ControlFlow-dbl
2.4402% +/- 0.817904%
MultiSource/Benchmarks/TSVC/LoopRestructuring-dbl/LoopRestructuring-dbl
5.87997% +/- 3.3172%
MultiSource/Benchmarks/Trimaran/netbench-crc/netbench-crc
9.02643% +/- 5.79591%
MultiSource/Benchmarks/VersaBench/bmm/bmm
10.3517% +/- 1.227%
Obviously, there are data points on both sides of this; but I think, overall,
this supports making the change.
llvm-svn: 195951
|
| |
|
|
|
|
| |
constraints on their frame offsets.
llvm-svn: 195950
|
| |
|
|
|
|
|
| |
Some of the older PPC processor definitions don't have associated
SchedMachineModels; correct this for the PPC440.
llvm-svn: 195949
|
| |
|
|
|
|
|
|
| |
The operand latencies for loads and stores in the PPC440 itinerary were wrong
(the store operands are all inputs, and the "with update" (pre-increment)
instructions need a latency for the additional output).
llvm-svn: 195948
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The operand latencies for the PPC440 should be specified relative to dispatch,
not relative to the initial fetch-and-decode stages. Because most instructions
(ignoring bypass) wait in dispatch until their operands are ready, this is
modeled as reading input operands "at dispatch" (0 cycles after issue), and so
every input and output operand has 4 cycles subtracted from it.
This could alter scheduling slightly, but I don't expect a large effect.
llvm-svn: 195947
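A minimal sketch of why rebasing the cycles is not expected to change scheduling much. It assumes the latency between a producer and a consumer is the producer's output cycle minus the consumer's input-read cycle, plus one. The helper names `operandLatency` and `rebase` are hypothetical and are not part of the PPC440 itinerary. Under that assumption, subtracting the same 4 cycles from every operand leaves each producer/consumer difference unchanged.

```cpp
#include <vector>

// Hypothetical helper: latency seen by a consumer that reads its input at
// cycle useCycle when the producer makes its result available at cycle
// defCycle. Assumes a "def - use + 1" convention and clamps at zero.
int operandLatency(int defCycle, int useCycle) {
  int latency = defCycle - useCycle + 1;
  return latency < 0 ? 0 : latency;
}

// Hypothetical helper: express every operand cycle relative to dispatch
// instead of relative to the first pipeline stage.
std::vector<int> rebase(const std::vector<int> &cycles,
                        int stagesBeforeDispatch) {
  std::vector<int> out;
  out.reserve(cycles.size());
  for (int c : cycles)
    out.push_back(c - stagesBeforeDispatch);
  return out;
}
```

With made-up operand cycles {6, 4, 4} (one output, two inputs), rebasing by 4 gives {2, 0, 0}: the inputs are now read "at dispatch", and the output-to-input latency is the same before and after.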
|
| |
|
|
|
|
|
|
|
|
| |
Modeling the fetch and decode units in the PPC440 itinerary does not add
anything to the hazard detection capability (and so modeling them just wastes
compile time).
No functionality change intended.
llvm-svn: 195946
|
| |
|
|
| |
llvm-svn: 195945
|
| |
target independent.
Most of the x86 specific stackmap/patchpoint handling was necessitated by the
use of the native address-mode format for frame index operands. PEI has now
been modified to treat stackmap/patchpoint similarly to DEBUG_INFO, allowing
us to use a simple, platform independent register/offset pair for frame
indexes on stackmap/patchpoints.
Notes:
- Folding is now platform independent and automatically supported.
- Emitting patchpoints with direct memory references now just involves calling
the TargetLoweringBase::emitPatchPoint utility method from the target's
XXXTargetLowering::EmitInstrWithCustomInserter method. (See
X86TargetLowering for an example).
- No more ugly platform-specific operand parsers.
This patch shouldn't change the generated output for X86.
llvm-svn: 195944
|
| |
|
|
|
|
|
| |
Or we can generate some illegal instructions.
E.g. shrn2 v0.4s, v1.2d, #35. The legal range should be in [1, 16].
llvm-svn: 195941
|
| |
|
|
|
|
| |
argument double floating point.
llvm-svn: 195938
|
| |
|
|
| |
llvm-svn: 195936
|
| |
|
|
| |
llvm-svn: 195934
|
| |
|
|
| |
llvm-svn: 195933
|
| |
|
|
|
|
| |
No functionality change.
llvm-svn: 195932
|
| |
|
|
| |
llvm-svn: 195931
|
| |
|
|
|
|
|
|
|
|
| |
I think, in principle, intrinsics_gen may be added explicitly.
That said, it can be added incidentally, since each target already has dependencies to llvm-tblgen.
Almost all source files depend on both CommonTableGen and intrinsics_gen.
Explicit add_dependencies() have been pruned under lib/Target.
llvm-svn: 195929
|
| |
|
|
|
|
|
|
|
| |
CommonTableGen.
add_public_tablegen_target adds *CommonTableGen to LLVM_COMMON_DEPENDS.
LLVM_COMMON_DEPENDS affects add_llvm_library (and other add_target stuff) within its scope.
llvm-svn: 195927
|
| |
|
|
| |
llvm-svn: 195926
|
| |
|
|
|
|
| |
I forgot to commit them. They were staged in my local repo.
llvm-svn: 195924
|
| |
|
|
| |
llvm-svn: 195923
|
| |
|
|
|
|
| |
Will be reverted in the next commit
llvm-svn: 195922
|
| |
|
|
|
|
| |
sets them.
llvm-svn: 195921
|
| |
|
|
| |
llvm-svn: 195920
|
| |
|
|
| |
llvm-svn: 195911
|
| |
|
|
|
|
|
|
|
|
|
| |
Instead of sharing functional unit names between the various PPC itineraries,
give each core its own unit names prefixed with the core name. This follows
the convention used by other backends (such as ARM), and removes a non-obvious
ordering dependency between the various PPCSchedule*.td files.
No functionality change intended.
llvm-svn: 195908
|
| |
|
|
|
|
| |
caused by build options [-Werror,-Wunused-variable].
llvm-svn: 195905
|
| |
|
|
|
|
| |
vectors
llvm-svn: 195903
|
| |
|
|
|
|
|
|
|
|
|
|
| |
conditional branches for very large targets. That will be the next small
patch. Everything should now, in principle, work as well (functionality-wise)
as without constant islands, so we decided at Mips/Imagination to make
constant islands the default for Mips16 so that it gets exercised a lot.
This port is still experimental, though hopefully we will change that status
soon. Some more cleanup and code review are in order, but things are
converging fast.
llvm-svn: 195902
|
| |
|
|
|
|
| |
No functionality change.
llvm-svn: 195896
|
| |
|
|
|
|
|
|
| |
ARanges included extern variables referenced by pointer non-type
template parameters even though those variables aren't part of this
compilation unit.
llvm-svn: 195895
|
| |
|
|
| |
llvm-svn: 195894
|
| |
|
|
|
|
|
|
|
|
|
| |
make PIC calls a little more efficient:
1. Remove instructions setting up $gp if it is known that a function has been
called at least once.
2. Save the address of a called function in a register instead of loading
it from the GOT at every call site.
llvm-svn: 195892
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the IIC_ prefix to the instruction itinerary class names, giving the
PPC backend a naming convention for itinerary classes that is more consistent
with that used by the X86 and ARM backends.
Instruction scheduling in the PPC backend needs a bunch of cleanup and
improvement (especially for the ooo cores). This is just a preliminary step.
No functionality change intended.
llvm-svn: 195890
|
| |
|
|
| |
llvm-svn: 195884
|
| |
|
|
| |
llvm-svn: 195883
|
| |
|
|
|
| |
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195881
|
| |
SGPRs are spilled into VGPRs using the {READ,WRITE}LANE_B32 instructions.
v2:
- Fix encoding of Lane Mask
- Use correct register flags, so we don't overwrite the low dword
when restoring multi-dword registers.
v3:
- Register spilling seems to hang the GPU, so replace all shaders
that need spilling with a dummy shader.
v4:
- Fix *LANE definitions
- Change destination reg class for 32-bit SMRD instructions
v5:
- Remove small optimization that was crashing Serious Sam 3.
https://bugs.freedesktop.org/show_bug.cgi?id=68224
https://bugs.freedesktop.org/show_bug.cgi?id=71285
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195880
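The spilling idea can be illustrated with a toy model. This is only a sketch and not the backend's code: it assumes a 64-lane vector register, and the names `ToyVGPR`, `writeLane`, `readLane`, `spill64` and `restore64` are hypothetical. Each 32-bit scalar value is parked in one lane of a vector register and read back later. A multi-dword value occupies consecutive lanes, and restoring it must put each dword back in its own half without overwriting the low dword.

```cpp
#include <array>
#include <cstdint>

// Toy model of one vector register: 64 lanes of 32 bits each.
// The lane count is an assumption made for illustration.
struct ToyVGPR {
  std::array<uint32_t, 64> lanes{};
};

// Models a write-lane operation: store one scalar dword into one lane.
void writeLane(ToyVGPR &v, unsigned lane, uint32_t value) {
  v.lanes.at(lane) = value;
}

// Models a read-lane operation: fetch one scalar dword from one lane.
uint32_t readLane(const ToyVGPR &v, unsigned lane) {
  return v.lanes.at(lane);
}

// Spill a 64-bit scalar value as two dwords into consecutive lanes.
void spill64(ToyVGPR &v, unsigned firstLane, uint64_t value) {
  writeLane(v, firstLane, static_cast<uint32_t>(value));           // low dword
  writeLane(v, firstLane + 1, static_cast<uint32_t>(value >> 32)); // high dword
}

// Restore the 64-bit value, keeping the low and high dwords separate.
uint64_t restore64(const ToyVGPR &v, unsigned firstLane) {
  uint64_t lo = readLane(v, firstLane);
  uint64_t hi = readLane(v, firstLane + 1);
  return (hi << 32) | lo;
}
```

In this model, spilling a value to lanes 3 and 4 leaves every other lane untouched, so one vector register can hold many spilled scalar dwords.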
|
| |
|
|
|
|
|
|
| |
Writing to the M0 register from an SMRD instruction hangs the GPU, so
we need to use the SGPR_32 register class, which does not include M0.
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195879
|
| |
|
|
|
| |
NOTE: This is a candidate for the 3.4 branch.
llvm-svn: 195878
|
| |
|
|
|
|
|
| |
cross-reference debug output with encoded stack-maps, and to create stackmap
test-cases.
llvm-svn: 195874
|
| |
|
|
|
|
| |
MO_ExternalSymbol and MO_JumpTableIndex don't show up in inline asm.
llvm-svn: 195861
|
| |
|
|
| |
llvm-svn: 195859
|
| |
|
|
| |
llvm-svn: 195857
|
| |
We currently error in clang with:
"error: thread-local storage is unsupported for the current target", but we
can start to get the llvm level ready.
When compiling
template<typename T>
struct foo {
static __declspec(thread) int bar;
};
template<typename T>
__declspec(thread) int foo<T>::bar;
template struct foo<int>;
msvc produces
SECTION HEADER #3
.tls$ name
0 physical address
0 virtual address
4 size of raw data
12F file pointer to raw data (0000012F to 00000132)
0 file pointer to relocation table
0 file pointer to line numbers
0 number of relocations
0 number of line numbers
C0301040 flags
Initialized Data
COMDAT; sym= "public: static int foo<int>::bar" (?bar@?$foo@H@@2HA)
4 byte align
Read Write
gcc produces a ".data$__emutls_v.<symbol>" for the testcase with
__declspec(thread) replaced with thread_local.
llvm-svn: 195849
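The flags word in the MSVC dump above can be decoded by hand. The sketch below uses the section characteristic bits defined in the PE/COFF specification. The function and constant names (`sectionAlignment`, `describeFlags`, `kCntInitializedData`, and so on) are hypothetical and are not taken from LLVM or from dumpbin.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Section characteristic bits from the PE/COFF specification.
constexpr uint32_t kCntInitializedData = 0x00000040;
constexpr uint32_t kLnkComdat          = 0x00001000;
constexpr uint32_t kAlignMask          = 0x00F00000;
constexpr uint32_t kMemRead            = 0x40000000;
constexpr uint32_t kMemWrite           = 0x80000000;

// Alignment in bytes encoded in bits 20-23, or 0 if no alignment is given.
// Field values 1..14 encode 1, 2, 4, ..., 8192 bytes.
unsigned sectionAlignment(uint32_t flags) {
  uint32_t field = (flags & kAlignMask) >> 20;
  return field == 0 ? 0u : 1u << (field - 1);
}

// Human-readable names for the subset of flags handled here.
std::vector<std::string> describeFlags(uint32_t flags) {
  std::vector<std::string> out;
  if (flags & kCntInitializedData) out.push_back("Initialized Data");
  if (flags & kLnkComdat)          out.push_back("COMDAT");
  if (unsigned align = sectionAlignment(flags))
    out.push_back(std::to_string(align) + " byte align");
  if (flags & kMemRead)            out.push_back("Read");
  if (flags & kMemWrite)           out.push_back("Write");
  return out;
}
```

Calling `describeFlags(0xC0301040)` yields initialized data, COMDAT, 4 byte alignment, read and write, which matches the properties listed under the flags value in the dump.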
|
| |
|
|
|
|
|
| |
MO_ConstantPoolIndex is handled in printLeaMemReference.
MO_JumpTableIndex and MO_ExternalSymbol don't show up in inline asm.
llvm-svn: 195847
|
| |
|
|
|
|
| |
of ACLE intrinsics.
llvm-svn: 195843
|
| |
|
|
| |
llvm-svn: 195826
|