| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
| |
Without that check it was possible to write test cases where the size
was not specified and we ended up with weird asserts down the road,
because the default value (1) would not make sense.
llvm-svn: 272226
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
backward-hot-prob consistently.
Summary:
Consider the following diamond CFG:
A
/ \
B C
\/
D
Suppose A->B and A->C have probabilities 81% and 19%. In block-placement, A->B is called a hot edge and the final placement should be ABDC. However, the current implementation outputs ABCD. This is because when choosing the next block of B, it checks if Freq(C->D) > Freq(B->D) * 20%, which is true (if Freq(A) = 100, then Freq(B->D) = 81, Freq(C->D) = 19, and 19 > 81*20%=16.2). Actually, we should use 25% instead of 20% as the probability here, so that we have 19 < 81*25%=20.25, and the desired ABDC layout will be generated.
Reviewers: djasper, davidxl
Subscribers: llvm-commits
Differential Revision: http://reviews.llvm.org/D20989
llvm-svn: 272203
|
| |
|
|
|
|
|
|
|
|
| |
Teach AArch64RegisterBankInfo that G_OR can be mapped on either GPR or
FPR for 64-bit or 32-bit values.
Add test cases demonstrating how this information is used to coalesce a
computation on a single register bank.
llvm-svn: 272170
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The MSR instructions can write to the CPSR, but we did not model this
fact, so we could emit them in the middle of IT blocks, changing the
condition flags for later instructions in the block.
The tests use two calls to llvm.write_register.i32 because it is valid
to use these instructions at the end of an IT block, which if conversion
does do in some cases. With two calls, the first clobbers the flags, so
a branch has to be used to make the second one conditional.
Differential Revision: http://reviews.llvm.org/D21139
llvm-svn: 272154
|
| |
|
|
|
|
|
| |
The MachineMemOperand parser lacked the code to handle %stack.X
references (%fixed-stack.X was working).
llvm-svn: 272082
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
The presence of this attribute indicates that VGPR outputs should be computed
in whole quad mode. This will be used by Mesa for prolog pixel shaders, so
that derivatives can be taken of shader inputs computed by the prolog, fixing
a bug.
The generated code could certainly be improved: if a prolog pixel shader is
used (which isn't common in modern OpenGL - they're used for gl_Color, polygon
stipples, and forcing per-sample interpolation), Mesa will use this attribute
unconditionally, because it has to be conservative. So WQM may be used in the
prolog when it isn't really needed, and furthermore a silly back-and-forth
switch is likely to happen at the boundary between prolog and main shader
parts.
Fixing this is a bit involved: we'd first have to add a mechanism by which
LLVM writes the WQM-related input requirements to the main shader part binary,
and then Mesa specializes the prolog part accordingly. At that point, we may
as well just compile a monolithic shader...
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
Reviewers: arsenm, tstellarAMD, mareko
Subscribers: arsenm, llvm-commits, kzhuravl
Differential Revision: http://reviews.llvm.org/D20839
llvm-svn: 272063
|
| |
|
|
|
|
| |
There are no VEX encoded versions of SSE4A instructions, make sure that AVX targets give the same output
llvm-svn: 272060
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Author: Wei Ding <wei.ding2@amd.com>
Date: Tue Jun 7 19:04:44 2016 +0000
Differential Revision: http://reviews.llvm.org/D20557
git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044
91177308-0d34-0410-b5e6-96231b3b80d8
as it was breaking the bots.
This reverts commit r272044.
llvm-svn: 272056
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Summary:
This patch is adding support for the MSVC buffer security check implementation
The buffer security check is turned on with the '/GS' compiler switch.
* https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
* To be added to clang here: http://reviews.llvm.org/D20347
Some overview of buffer security check feature and implementation:
* https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
* http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
* http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html
For the following example:
```
int example(int offset, int index) {
char buffer[10];
memset(buffer, 0xCC, index);
return buffer[index];
}
```
The MSVC compiler is adding these instructions to perform stack integrity check:
```
push ebp
mov ebp,esp
sub esp,50h
[1] mov eax,dword ptr [__security_cookie (01068024h)]
[2] xor eax,ebp
[3] mov dword ptr [ebp-4],eax
push ebx
push esi
push edi
mov eax,dword ptr [index]
push eax
push 0CCh
lea ecx,[buffer]
push ecx
call _memset (010610B9h)
add esp,0Ch
mov eax,dword ptr [index]
movsx eax,byte ptr buffer[eax]
pop edi
pop esi
pop ebx
[4] mov ecx,dword ptr [ebp-4]
[5] xor ecx,ebp
[6] call @__security_check_cookie@4 (01061276h)
mov esp,ebp
pop ebp
ret
```
The instrumentation above is:
* [1] is loading the global security canary,
* [3] is storing the local computed ([2]) canary to the guard slot,
* [4] is loading the guard slot and ([5]) re-compute the global canary,
* [6] is validating the resulting canary with the '__security_check_cookie' and performs error handling.
Overview of the current stack-protection implementation:
* lib/CodeGen/StackProtector.cpp
* There is a default stack-protection implementation applied on intermediate representation.
* The target can overload 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
* An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
* Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and errors handling.
* Guard manipulation and comparison are added directly to the intermediate representation.
* lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
* lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
* There is an implementation that adds instrumentation during instruction selection (for better handling of sibbling calls).
* see long comment above 'class StackProtectorDescriptor' declaration.
* The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
* 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
* The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
* include/llvm/Target/TargetLowering.h
* Contains function to retrieve the default Guard 'Value'; should be overriden by each target to select which implementation is used and provide Guard 'Value'.
* lib/Target/X86/X86ISelLowering.cpp
* Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.
Function-based Instrumentation:
* The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instructions.
* To support function-based instrumentation, this patch is
* adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
* If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
* modifying (SelectionDAGISel.cpp) do avoid producing basic blocks used for inline instrumentation,
* generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
* if FastISEL (not SelectionDAG), using the fallback which rely on the same function-based implemented over intermediate representation (StackProtector.cpp).
Modifications
* adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
* adding support function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)
Results
* IR generated instrumentation:
```
clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
```
```
*** Final LLVM Code input to ISel ***
; Function Attrs: nounwind sspstrong
define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
entry:
%StackGuardSlot = alloca i8* <<<-- Allocated guard slot
%0 = call i8* @llvm.stackguard() <<<-- Loading Stack Guard value
call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot) <<<-- Prologue intrinsic call (store to Guard slot)
%index.addr = alloca i32, align 4
%offset.addr = alloca i32, align 4
%buffer = alloca [10 x i8], align 1
store i32 %index, i32* %index.addr, align 4
store i32 %offset, i32* %offset.addr, align 4
%arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
%1 = load i32, i32* %index.addr, align 4
call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
%2 = load i32, i32* %index.addr, align 4
%arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
%3 = load i8, i8* %arrayidx, align 1
%conv = sext i8 %3 to i32
%4 = load volatile i8*, i8** %StackGuardSlot <<<-- Loading Guard slot
call void @__security_check_cookie(i8* %4) <<<-- Epilogue function-based check
ret i32 %conv
}
```
* SelectionDAG generated instrumentation:
```
clang-cl /GS test.cc /O1 /c /FA
```
```
"?example@@YAHHH@Z": # @"\01?example@@YAHHH@Z"
# BB#0: # %entry
pushl %esi
subl $16, %esp
movl ___security_cookie, %eax <<<-- Loading Stack Guard value
movl 28(%esp), %esi
movl %eax, 12(%esp) <<<-- Store to Guard slot
leal 2(%esp), %eax
pushl %esi
pushl $204
pushl %eax
calll _memset
addl $12, %esp
movsbl 2(%esp,%esi), %esi
movl 12(%esp), %ecx <<<-- Loading Guard slot
calll @__security_check_cookie@4 <<<-- Epilogue function-based check
movl %esi, %eax
addl $16, %esp
popl %esi
retl
```
Reviewers: kcc, pcc, eugenis, rnk
Subscribers: majnemer, llvm-commits, hans, thakis, rnk
Differential Revision: http://reviews.llvm.org/D20346
llvm-svn: 272053
|
| |
|
|
| |
llvm-svn: 272044
|
| |
|
|
|
|
|
|
| |
negative values.
Originally reviewed here: http://reviews.llvm.org/D17463
llvm-svn: 272023
|
| |
|
|
|
|
|
| |
This reverts commit r271930, r271915, r271923. They break a thumb selfhosting
bot.
llvm-svn: 272017
|
| |
|
|
| |
llvm-svn: 272016
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently the only way to use the (V)MOVNTDQA nontemporal vector loads instructions is through the int_x86_sse41_movntdqa style builtins.
This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch.
We currently still fold nontemporal loads into suitable instructions, we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that its dealing with a nontemporal memory transaction.
There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets.
Differential Review: http://reviews.llvm.org/D20965
llvm-svn: 272010
|
| |
|
|
|
|
| |
We can materialize these integers using a MOV; ADDi8 pair.
llvm-svn: 272007
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D21067
llvm-svn: 272006
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Using an LLVM IR aggregate return value type containing three
or more integer values causes an abort in the fast isel pass.
This patch adds two more registers to RetCC_PPC64_ELF_FIS to
allow returning up to four integers with fast isel, just the
same as is currently supported with regular isel (RetCC_PPC).
This is needed for Swift and (possibly) other non-clang frontends.
Fixes PR26190.
llvm-svn: 272005
|
| |
|
|
|
|
|
|
|
|
| |
shuffle mask directly
We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened.
This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases.
llvm-svn: 272003
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
A Thumb-2 post-indexed LDR instruction such as:
ldr.w r0, [r1], #4
Can be rewritten as:
ldm.n r1!, {r0}
LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode.
llvm-svn: 272002
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
If we have an LDM that uses only low registers and doesn't write to its base register:
ldm.w r0, {r1, r2, r3}
And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding:
ldm.n r0!, {r1, r2, r3}
Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't.
llvm-svn: 272000
|
| |
|
|
|
|
|
|
|
|
|
|
| |
TLS access requires an offset from the TLS index. The index itself is the
section-relative distance of the symbol. For ARM, the relevant relocation
(IMAGE_REL_ARM_SECREL) is applied as a constant. This means that the value may
not be an immediate and must be lowered into a constant pool. This offset will
not be base relocated. We were previously emitting the actual address of the
symbol which would be base relocated and would therefore be the vaue offset by
the ImageBase + TLS Offset.
llvm-svn: 271974
|
| |
|
|
|
|
|
|
| |
If we had a constant group address space cast the queue pointer
wasn't enabled for the function, resulting in a crash on noreg
later.
llvm-svn: 271935
|
| |
|
|
| |
llvm-svn: 271930
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
The code layout that TailMerging (inside BranchFolding) works on is not the
final layout optimized based on the branch probability. Generally, after
BlockPlacement, many new merging opportunities emerge.
This patch calls Tail Merging after MBP and calls MBP again if Tail Merging
merges anything.
Differential Revision: http://reviews.llvm.org/D20276
llvm-svn: 271925
|
| |
|
|
|
|
|
|
|
|
|
|
| |
src2 == VCC.
Another step for unification llvm assembler/disassembler with sp3.
Besides, CodeGen output is a bit improved, thus changes in CodeGen tests.
Assembler/Disassembler tests updated/added.
Differential Revision: http://reviews.llvm.org/D20796
llvm-svn: 271900
|
| |
|
|
|
|
| |
Differential Revision: http://reviews.llvm.org/D21013
llvm-svn: 271891
|
| |
|
|
|
|
| |
of vector shuffle and select.
llvm-svn: 271872
|
| |
|
|
| |
llvm-svn: 271870
|
| |
|
|
| |
llvm-svn: 271869
|
| |
|
|
|
|
| |
combines
llvm-svn: 271834
|
| |
|
|
| |
llvm-svn: 271831
|
| |
|
|
|
|
| |
Could do this for other types to, but this is what's needed to replace the instrinsic with native IR in clang.
llvm-svn: 271828
|
| |
|
|
|
|
| |
commit.
llvm-svn: 271827
|
| |
|
|
|
|
| |
v4i32/v8i32 ANDs aren't promoted to v2i64/v4i64 when VLX is enabled.
llvm-svn: 271826
|
| |
|
|
| |
llvm-svn: 271809
|
| |
|
|
|
|
|
| |
Windows itanium is nearly identical to windows-msvc (MS ABI for C, itanium for
C++). Enable the TLS support for the target similar to the MSVC model.
llvm-svn: 271797
|
| |
|
|
|
|
|
|
| |
The AVX2 v16i16 shift lowering works by unpacking to 2 x v8i32, performing the shift and then truncating the result.
The unpacking is used to place the values in the upper 16-bits so that we can correctly sign-extend for SRA shifts. Unfortunately we weren't ensuring that the lower 16-bits were zero to ensure that SHL correctly shifts in zero bits.
llvm-svn: 271796
|
| |
|
|
|
|
|
| |
This is allowed (though used rarely) and useful to keep your tests
short.
llvm-svn: 271752
|
| |
|
|
| |
llvm-svn: 271718
|
| |
|
|
|
|
|
| |
This is very similar to r271677, but for extracts from i32 with the SIGN_EXTEND
acting on a arithmetic shift.
llvm-svn: 271717
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Under emscripten, C code can take the address of a function implemented
in Javascript (which is exposed via an import in wasm). Because imports
do not have linear memory address in wasm, we need to generate a thunk
to be the target of the indirect call; it call the import directly.
To make this possible, LLVM needs to emit the type signatures for these
functions, because they may not be called directly or referred to other
than where the address is taken.
This uses s new .s directive (.functype) which specifies the signature.
Differential Revision: http://reviews.llvm.org/D20891
Re-apply r271599 but instead of bailing with an error when a declared
function has multiple returns, replace it with a pointer argument. Also
add the test case I forgot to 'git add' last time around.
llvm-svn: 271703
|
| |
|
|
|
|
|
|
| |
in more instructions than the libary call.
Differential Revision: http://reviews.llvm.org/D20958
llvm-svn: 271678
|
| |
|
|
|
|
|
|
|
|
| |
We were assuming all SBFX-like operations would have the shl/asr form, but often
when the field being extracted is an i8 or i16, we end up with a
SIGN_EXTEND_INREG acting on a shift instead.
This is a port of r213754 from ARM to AArch64.
llvm-svn: 271677
|
| |
|
|
| |
llvm-svn: 271673
|
| |
|
|
| |
llvm-svn: 271668
|
| |
|
|
|
|
| |
These currently all lower to regular loads, generic nontemporal load support will be added in a future patch
llvm-svn: 271659
|
| |
|
|
| |
llvm-svn: 271656
|
| |
|
|
|
|
| |
types
llvm-svn: 271654
|
| |
|
|
|
|
| |
vector types
llvm-svn: 271651
|
| |
|
|
| |
llvm-svn: 271646
|