| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
llvm-svn: 221382
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This works around the limitation that PTX does not allow .param space
loads/stores with arbitrary pointers.
If a function has a by-val struct ptr arg, say foo(%struct.x *byval %d), then
add the following instructions to the first basic block :
%temp = alloca %struct.x, align 8
%tt1 = bitcast %struct.x * %d to i8 *
%tt2 = llvm.nvvm.cvt.gen.to.param %tt2
%tempd = bitcast i8 addrspace(101) * to %struct.x addrspace(101) *
%tv = load %struct.x addrspace(101) * %tempd
store %struct.x %tv, %struct.x * %temp, align 8
The above code allocates some space in the stack and copies the incoming
struct from param space to local space. Then replace all occurences of %d
by %temp.
Fixes PR21465.
llvm-svn: 221377
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Change `NamedMDNode::getOperator()` from returning `MDNode *` to
returning `Value *`. To reduce boilerplate at some call sites, add a
`getOperatorAsMDNode()` for named metadata that's expected to only
return `MDNode` -- for now, that's everything, but debug node named
metadata (such as llvm.dbg.cu and llvm.dbg.sp) will soon change. This
is part of PR21433.
Note that there's a follow-up patch to clang for the API change.
llvm-svn: 221375
|
| |
|
|
|
|
| |
Dead code identified by the Clang static analyzer.
llvm-svn: 221372
|
| |
|
|
| |
llvm-svn: 221371
|
| |
|
|
| |
llvm-svn: 221370
|
| |
|
|
| |
llvm-svn: 221369
|
| |
|
|
|
|
| |
Found by the Clang static analyzer.
llvm-svn: 221368
|
| |
|
|
| |
llvm-svn: 221367
|
| |
|
|
|
|
| |
Found by the Clang static analyzer.
llvm-svn: 221366
|
| |
|
|
| |
llvm-svn: 221358
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D6039
llvm-svn: 221355
|
| |
|
|
| |
llvm-svn: 221354
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D5797
llvm-svn: 221353
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D5933
llvm-svn: 221352
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D5163
llvm-svn: 221351
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This matches the format produced by the AMD proprietary driver.
//==================================================================//
// Shell script for converting .ll test cases: (Pass the .ll files
you want to convert to this script as arguments).
//==================================================================//
; This was necessary on my system so that A-Z in sed would match only
; upper case. I'm not sure why.
export LC_ALL='C'
TEST_FILES="$*"
MATCHES=`grep -v Patterns SIInstructions.td | grep -o '"[A-Z0-9_]\+["e]' | grep -o '[A-Z0-9_]\+' | sort -r`
for f in $TEST_FILES; do
# Check that there are SI tests:
grep -q -e 'verde' -e 'bonaire' -e 'SI' -e 'tahiti' $f
if [ $? -eq 0 ]; then
for match in $MATCHES; do
sed -i -e "s/\([ :]$match\)/\L\1/" $f
done
# Try to get check lines with partial instruction names
sed -i 's/\(;[ ]*SI[A-Z\\-]*: \)\([A-Z_0-9]\+\)/\1\L\2/' $f
fi
done
sed -i -e 's/bb0_1/BB0_1/g' ../../../test/CodeGen/R600/infinite-loop.ll
sed -i -e 's/SI-NOT: bfe/SI-NOT: {{[^@]}}bfe/g'../../../test/CodeGen/R600/llvm.AMDGPU.bfe.*32.ll ../../../test/CodeGen/R600/sext-in-reg.ll
sed -i -e 's/exp_IEEE/EXP_IEEE/g' ../../../test/CodeGen/R600/llvm.exp2.ll
sed -i -e 's/numVgprs/NumVgprs/g' ../../../test/CodeGen/R600/register-count-comments.ll
sed -i 's/\(; CHECK[-NOT]*: \)\([A-Z_0-9]\+\)/\1\L\2/' ../../../test/CodeGen/R600/select64.ll ../../../test/CodeGen/R600/sgpr-copy.ll
//==================================================================//
// Shell script for converting .td files (run this last)
//==================================================================//
export LC_ALL='C'
sed -i -e '/Patterns/!s/\("[A-Z0-9_]\+[ "e]\)/\L\1/g' SIInstructions.td
sed -i -e 's/"EXP/"exp/g' SIInstrInfo.td
llvm-svn: 221350
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This patch improves the folding of vector AND nodes into blend operations for
targets that feature SSE4.1. A vector AND node where one of the operands is
a constant build_vector with elements that are either zero or all-ones can be
converted into a blend.
This allows for example to simplify the following code:
define <4 x i32> @test(<4 x i32> %A, <4 x i32> %B) {
%1 = and <4 x i32> %A, <i32 0, i32 0, i32 0, i32 -1>
%2 = and <4 x i32> %B, <i32 -1, i32 -1, i32 -1, i32 0>
%3 = or <4 x i32> %1, %2
ret <4 x i32> %3
}
Before this patch llc (-mcpu=corei7) generated:
andps LCPI1_0(%rip), %xmm0, %xmm0
andps LCPI1_1(%rip), %xmm1, %xmm1
orps %xmm1, %xmm0, %xmm0
retq
With this patch we generate a single 'vpblendw'.
llvm-svn: 221343
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Some ARM FPUs only have 16 double-precision registers, rather than the
normal 32. LLVM represents this with the D16 target feature. This is
currently used by CodeGen to avoid using high registers when they are
not available, but the assembler and disassembler do not.
I fix this in the assmebler and disassembler rather than the
InstrInfo.td files, as the latter would require a large number of
changes everywhere one of the floating-point instructions is referenced
in the backend. This solution is similar to the one used for
co-processor numbers and MSR masks.
llvm-svn: 221341
|
| |
|
|
|
|
|
|
|
|
| |
We currently try to push an even number of registers to preserve 8-byte
alignment during a function's prologue, but only when the stack alignment is
prcisely 8. Many of the reasons for doing it apply also when that alignment > 8
(the extra store is often free, and can save another stack adjustment, though
less frequently for 16-byte stack alignment).
llvm-svn: 221321
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We were making an attempt to do this by adding an extra callee-saved GPR (so
that there was an even number in the list), but when that failed we went ahead
and pushed anyway.
This had a couple of potential issues:
+ The .cfi directives we emit misplaced dN because they were based on
PrologEpilogInserter's calculation.
+ Unaligned stores can be less efficient.
+ Unaligned stores can actually fault (likely only an issue in niche cases,
but possible).
This adds a final explicit stack adjustment if all other options fail, so that
the actual locations of the registers match up with where they should be.
llvm-svn: 221320
|
| |
|
|
|
|
|
|
|
|
| |
Patch to allow (v)blendps, (v)blendpd, (v)pblendw and vpblendd instructions to be commuted - swaps the src registers and inverts the blend mask.
This is primarily to improve memory folding (see new tests), but it also improves the quality of shuffles (see modified tests).
Differential Revision: http://reviews.llvm.org/D6015
llvm-svn: 221313
|
| |
|
|
|
|
|
|
|
| |
While fixing up the register classes in the machine combiner in a previous
commit I missed one.
This fixes the last one and adds a test case.
llvm-svn: 221308
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit r221299.
The tests
LLVM :: MC/Disassembler/Mips/mips32.txt
LLVM :: MC/Disassembler/Mips/mips32_le.txt
were failing.
llvm-svn: 221307
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
MipsInstrInfo.td. NFC.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5843
llvm-svn: 221300
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5763
llvm-svn: 221299
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
and FeatureSSE4A for all the bdver* cpus.
This patch adds 'FeatureSlowSHLD' to 'bdver3'.
According to the official AMD optimization guide for amdfam15: "Using
alternative code in place of SHLD achieves lower overall latency and
requires fewer execution resources. The 32-bit and 64-bit forms of
ADD, ADC, SHR, and LEA (except 16-bit form) are DirectPath
instructions, while SHLD is a VectorPath instruction."
This patch also explicitly sets feature AVX and SSE4A for all the bdver*
cpus. This part of the patch is a non-functional change and it is mainly
done for clarity reasons (Both XOP and FMA4 already imply AVX and SSE4A).
llvm-svn: 221296
|
| |
|
|
| |
llvm-svn: 221291
|
| |
|
|
|
|
| |
This kind of pattern is emitted by the loop vectorizer.
llvm-svn: 221289
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
Appropriately set/clear the FeatureBit for Mips16 when these assembler directives are used and also emit ".set nomips16" (previously, only ".set mips16" was being emitted).
These improvements allow for better testing of the .cpload/.cprestore assembler directives (which are not supposed to work when Mips16 is enabled).
Test Plan: The test is bare-bones because there are no MC tests for Mips16 instructions (there's only one, which checks that the Mips16 ELF header flag gets set), and that suggests to me that it has not been implemented yet in the IAS.
Reviewers: dsanders
Reviewed By: dsanders
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D5462
llvm-svn: 221277
|
| |
|
|
| |
llvm-svn: 221228
|
| |
|
|
| |
llvm-svn: 221210
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D6062
llvm-svn: 221204
|
| |
|
|
| |
llvm-svn: 221199
|
| |
|
|
|
|
| |
The opcodes were added in r220516, but I forgot to add the print names.
llvm-svn: 221185
|
| |
|
|
|
|
|
|
|
|
|
|
| |
register class tGPRRegClass if the target is thumb1.
This commit fixes a crash that occurs during register allocation which was
triggered when a virtual register defined by an inline-asm instruction had to
be spilled.
rdar://problem/18740489
llvm-svn: 221178
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
For 8-bit divrems where the remainder is used, we used to generate:
divb %sil
shrw $8, %ax
movzbl %al, %eax
That was to avoid an H-reg access, which is problematic mainly because
it isn't possible in REX-prefixed instructions.
This patch optimizes that to:
divb %sil
movzbl %ah, %eax
To do that, we explicitly extend AH, and extract the L-subreg in the
resulting register. The extension is done using the NOREX variants of
MOVZX. To support signed operations, MOVSX_NOREX is also added.
Further, this introduces a new SDNode type, [us]divrem_ext_hreg, which is
then lowered to a sequence containing a single zext (rather than 2).
Differential Revision: http://reviews.llvm.org/D6064
llvm-svn: 221176
|
| |
|
|
|
|
|
|
| |
Function calls aren't supported yet.
This was reverted due to build breakages, which should be fixed now.
llvm-svn: 221173
|
| |
|
|
|
|
|
| |
Change `Instruction::getAllMetadataOtherThanDebugLoc()` from a vector of
`MDNode` to one of `Value`. Part of PR21433.
llvm-svn: 221167
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CPU definition is redundant. The Cortex-A9 is defined as
supporting multiprocessing extensions. Remove its definition and
update appropriate tests.
LLVM defines both a cortex-a9 CPU and a cortex-a9-mp CPU. The only
difference between the two CPU definitions in ARM.td is that
cortex-a9-mp contains the feature FeatureMP for multiprocessing
extensions.
This is redundant since the Cortex-A9 is defined as having
multiprocessing extensions in the TRMs. armcc also defines the
Cortex-A9 as having multiprocessing extensions by default.
Change-Id: Ifcadaa6c322be0a33d9d2a39cfdd7da1d75981a7
llvm-svn: 221166
|
| |
|
|
|
|
|
| |
Some literals in the AArch64 backend had 15 'f's rather than 16, causing
comparisons with a constant 0xffffffffffffffff to be miscompiled.
llvm-svn: 221157
|
| |
|
|
|
|
|
|
|
| |
Hexagon was not calling InitializeELF and could not select between
ctors and init_array.
Phabricator revision: http://reviews.llvm.org/D6061
llvm-svn: 221156
|
| |
|
|
| |
llvm-svn: 221146
|
| |
|
|
| |
llvm-svn: 221119
|
| |
|
|
| |
llvm-svn: 221118
|
| |
|
|
|
|
|
|
|
| |
The problem is mostly that variadic output instruction
aren't handled, so it is rejected for having an inconsistent
number of operands, and then the right number of operands
isn't emitted.
llvm-svn: 221117
|
| |
|
|
|
|
|
|
|
| |
into MipsCCState as we did for returns. NFC."
sret arguments can never originate from an f128 argument so we detect
sret arguments and push false into OriginalArgWasF128.
llvm-svn: 221102
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
MipsCCState as we did for returns. NFC."
r221056 "[mips] Move F128 argument handling into MipsCCState as we did for returns. NFC."
r221058 "[mips] Fix unused variable warning introduced in r221056"
r221059 "[mips] Move all ByVal handling into CCState and tablegen-erated code. NFC."
r221061 "Renamed CCState members that appear to misspell 'Processed' as 'Proceed'. NFC."
It cuased an undefined behavior in LLVM :: CodeGen/Mips/return-vector.ll.
llvm-svn: 221081
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Reviewers: rnk
Reviewed By: rnk
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D5978
llvm-svn: 221061
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
CCState already contains a byval implementation that is very similar to the
Mips custom code. This patch merges the custom code into the existing
common code and tablegen-erated code.
Reviewers: vmedic
Reviewed By: vmedic
Subscribers: rnk, llvm-commits
Differential Revision: http://reviews.llvm.org/D5977
llvm-svn: 221059
|