bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
...
*	[RegBankSelect] Explain what it would take to support non-copy repairing.	Quentin Colombet	2016-06-08	1	-0/+9
	Copies are easy because we repair only when there is a mismatch. For non-copy repairing, i.e., cases that involve breaking down or gathering up the value, one of the operands may not have a register bank yet. Thus, deriving a cost from that requires more work. llvm-svn: 272157
*	[ARM] MSR instructions implicitly set CPSR	Oliver Stannard	2016-06-08	2	-0/+4
	The MSR instructions can write to the CPSR, but we did not model this fact, so we could emit them in the middle of IT blocks, changing the condition flags for later instructions in the block. The tests use two calls to llvm.write_register.i32 because it is valid to use these instructions at the end of an IT block, which if-conversion does do in some cases. With two calls, the first clobbers the flags, so a branch has to be used to make the second one conditional. Differential Revision: http://reviews.llvm.org/D21139 llvm-svn: 272154
*	Support: correct AArch64 TargetParser implementation	Saleem Abdulrasool	2016-06-08	1	-20/+21
	The architecture enumeration is shared across ARM and AArch64. However, the data is not. The code would incorrectly index into the array using the architecture index, which was offset by the ARMv7 architecture enumeration. We do not have a marker for indicating the architectural family to which the enumeration belongs, so we cannot be clever about offsetting the index (at least it is not immediately apparent to me). Instead, fall back to the tried-and-true method of slowly iterating the array (it's not a large array, so the impact of this is not too high). Because of the incorrect indexing, if we were lucky, we would crash, but usually we would return an invalid StringRef. We did not have any tests for the AArch64 target parser previously. Extend the previous tests I had added for ARM to cover AArch64 for ensuring that we return expected StringRefs. Take the opportunity to change some iterator types to references. This work is needed to support parsing `.arch name` directives in the AArch64 target asm parser. llvm-svn: 272145
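The indexing problem described in this message can be illustrated with a small sketch. The enumeration and table below are hypothetical stand-ins, not the real TargetParser data; they only show why a lookup that matches on the stored ID stays correct when an enumeration is shared between families but each family keeps its own table.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical shared enumeration: ARM entries first, AArch64 entries after.
enum class ArchKind { Invalid, ARMV7A, ARMV7M, ARMV8A, ARMV8_1A };

struct ArchName {
  ArchKind ID;
  const char *Name;
};

// Hypothetical AArch64-only table. Entry positions do not match enumerator
// values, so AArch64Names[static_cast<int>(AK)] would read the wrong slot
// or run past the end of the array.
static const ArchName AArch64Names[] = {
    {ArchKind::ARMV8A, "armv8-a"},
    {ArchKind::ARMV8_1A, "armv8.1-a"},
};

// Linear search by ID: slower than indexing, but never out of bounds.
const char *getArchName(ArchKind AK) {
  for (const ArchName &A : AArch64Names)
    if (A.ID == AK)
      return A.Name;
  return ""; // kinds that are not in this table yield an empty name
}
```

For a table of this size the linear scan costs little, which matches the trade-off the message describes.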
*	[PM] LoopSimplify. Remove unneeded pass dependencies. NFCI.	Davide Italiano	2016-06-08	1	-3/+0
	llvm-svn: 272140
*	[PM/SimplifyCFG] Preserve GlobalsAA even if the IR is mutated.	Davide Italiano	2016-06-08	1	-4/+5
	llvm-svn: 272139
*	[mips] Add a proper file header in MipsFastISel.cpp	Vasileios Kalintiris	2016-06-08	1	-2/+15
	llvm-svn: 272138
*	[Hexagon] Modify HexagonExpandCondsets to handle subregisters	Krzysztof Parzyszek	2016-06-08	1	-507/+454
	Also, switch to using functions from LiveIntervalAnalysis to update live intervals, instead of performing the updates manually. Re-committing r272045. llvm-svn: 272135
*	[ARM] Remove redundant check. NFC	Diana Picus	2016-06-08	1	-1/+1
	isSwift is tested earlier and known to be false when we reach this code. llvm-svn: 272127
*	Avoid copies of std::strings and APInt/APFloats where we only read from it	Benjamin Kramer	2016-06-08	22	-39/+39
	As suggested by clang-tidy's performance-unnecessary-copy-initialization. This can easily hit lifetime issues, so I audited every change and ran the tests under asan, which came back clean. llvm-svn: 272126
*	[AVX512] Fix cvtusi2sd instruction Opcode, it should be 0x7B instead of 0x2A.	Igor Breger	2016-06-08	1	-1/+1
	llvm-svn: 272122
*	Make LiveDebugValues preserve CFG	Matt Arsenault	2016-06-08	1	-0/+1
	llvm-svn: 272117
*	[libFuzzer] add 'weak' back to __sanitizer_malloc_hook and __sanitizer_free_hook	Kostya Serebryany	2016-06-08	1	-0/+2
	llvm-svn: 272116
*	[libFuzzer] add a test that is built w/o coverage instrumentation but has the coverage rt (it should now fail with a descriptive message)	Kostya Serebryany	2016-06-08	5	-1/+27
	llvm-svn: 272090
*	[AArch64][RegisterBankInfo] Use the generic implementation of copyCost.	Quentin Colombet	2016-06-08	1	-1/+2
	Long term we may want to assign a high cost to FPR to/from GPR copies. llvm-svn: 272086
*	[RegisterBankInfo] Add a size argument for the cost of copy.	Quentin Colombet	2016-06-08	3	-5/+12
	The cost of a copy may differ based on how many bits we have to copy around. E.g., an 8-bit copy may cost differently than a 32-bit copy. llvm-svn: 272084
*	[RegisterBankInfo] Move a hidden function into a static method. NFC.	Quentin Colombet	2016-06-08	1	-24/+22
	This will allow code reuse in the coming commits. llvm-svn: 272083
*	MIR: Fix parsing of stack object references in MachineMemOperands	Matthias Braun	2016-06-08	1	-1/+10
	The MachineMemOperand parser lacked the code to handle %stack.X references (%fixed-stack.X was working). llvm-svn: 272082
*	[pdb] Try to fix use after free.	Zachary Turner	2016-06-08	3	-0/+13
	llvm-svn: 272078
*	[pdbdump] Print out # of hash buckets.	Rui Ueyama	2016-06-07	1	-0/+1
	In the reference code, the field name is `cHashBuckets`. llvm-svn: 272075
*	[pdbdump] Print out TPI hash key size.	Rui Ueyama	2016-06-07	1	-0/+2
	llvm-svn: 272073
*	[LibFuzzer] Declare and use sanitizer functions in ``fuzzer::ExternalFunctions``	Dan Liew	2016-06-07	8	-73/+84
	This fixes linking problems on OSX. Unfortunately it turns out we need to use an instance of the ``fuzzer::ExternalFunctions`` object in several places, so this commit also replaces all instances with a single global instance. It also turns out that initializing a global ``fuzzer::ExternalFunctions`` before main is entered (i.e. letting the object be initialised by the global initializers) is not safe (on OSX the call to ``Printf()`` in the CTOR crashes if it is called from a global initializer), so we instead have a global ``fuzzer::ExternalFunctions`` and initialize it inside ``FuzzerDriver()``. Multiple unit tests also depend on the ``fuzzer::ExternalFunctions`` global, so a ``main()`` function has been added that initializes it before running any tests. Differential Revision: http://reviews.llvm.org/D20943 llvm-svn: 272072
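The initialization pattern this message describes can be sketched as follows. The struct and `Driver()` below are hypothetical stand-ins for ``fuzzer::ExternalFunctions`` and ``FuzzerDriver()``, and the pointer-based layout is an assumption made for illustration; the idea is only that the global holds no object until an entry point that runs after `main` starts creates it.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical stand-in for fuzzer::ExternalFunctions. Its constructor does
// work (printing here) that may be unsafe during static initialization.
struct ExternalFunctions {
  ExternalFunctions() { std::printf("resolving external functions\n"); }
  int Resolved = 1;
};

// Constant-initialized global pointer: no constructor runs before main().
ExternalFunctions *EF = nullptr;

// Hypothetical driver entry point: creates the single instance on first call.
int Driver() {
  if (!EF)
    EF = new ExternalFunctions(); // intentionally kept alive until exit
  return EF->Resolved;
}
```

A test binary that bypasses the driver would need its own `main()` that performs the same initialization before any test touches the global.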
*	[CFLAA] Kill dead code/fix comments in StratifiedSets.	George Burgess IV	2016-06-07	1	-87/+23
	Also use default/delete instead of hand-written ctors. Thanks to Jia Chen for bringing this stuff up. llvm-svn: 272064
*	AMDGPU: Add amdgpu-ps-wqm-outputs function attributes	Nicolai Haehnle	2016-06-07	1	-2/+27
	Summary:
	The presence of this attribute indicates that VGPR outputs should be computed in whole quad mode. This will be used by Mesa for prolog pixel shaders, so that derivatives can be taken of shader inputs computed by the prolog, fixing a bug.

	The generated code could certainly be improved: if a prolog pixel shader is used (which isn't common in modern OpenGL - they're used for gl_Color, polygon stipples, and forcing per-sample interpolation), Mesa will use this attribute unconditionally, because it has to be conservative. So WQM may be used in the prolog when it isn't really needed, and furthermore a silly back-and-forth switch is likely to happen at the boundary between prolog and main shader parts.

	Fixing this is a bit involved: we'd first have to add a mechanism by which LLVM writes the WQM-related input requirements to the main shader part binary, and then Mesa specializes the prolog part accordingly. At that point, we may as well just compile a monolithic shader...

	Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=95130
	Reviewers: arsenm, tstellarAMD, mareko
	Subscribers: arsenm, llvm-commits, kzhuravl
	Differential Revision: http://reviews.llvm.org/D20839
	llvm-svn: 272063
*	[LibFuzzer] Split the fuzzer-oom.test into two tests.	Dan Liew	2016-06-07	3	-1/+14
	This is necessary because the existing fuzzer-oom.test was Linux specific due to its use of __sanitizer_print_memory_profile() which is only available on Linux right now and so the test would fail on OSX. Differential Revision: http://reviews.llvm.org/D20977 llvm-svn: 272061
*	[pdb] Convert StringRefs to ArrayRef<uint8_t>s.	Zachary Turner	2016-06-07	2	-9/+11
	llvm-svn: 272058
*	Revert "Differential Revision: http://reviews.llvm.org/D20557"	Eric Christopher	2016-06-07	1	-55/+17
	Author: Wei Ding <wei.ding2@amd.com> Date: Tue Jun 7 19:04:44 2016 +0000 Differential Revision: http://reviews.llvm.org/D20557 git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@272044 91177308-0d34-0410-b5e6-96231b3b80d8 as it was breaking the bots. This reverts commit r272044. llvm-svn: 272056
*	[libfuzzer] custom crossover interface function.	Mike Aizatsky	2016-06-07	7	-0/+107
	Differential Revision: http://reviews.llvm.org/D21089 llvm-svn: 272054
*	[stack-protection] Add support for MSVC buffer security check	Etienne Bergeron	2016-06-07	7	-26/+155
	Summary:
	This patch adds support for the MSVC buffer security check implementation.

	The buffer security check is turned on with the '/GS' compiler switch.
	  * https://msdn.microsoft.com/en-us/library/8dbf701c.aspx
	  * To be added to clang here: http://reviews.llvm.org/D20347

	Some overview of the buffer security check feature and implementation:
	  * https://msdn.microsoft.com/en-us/library/aa290051(VS.71).aspx
	  * http://www.ksyash.com/2011/01/buffer-overflow-protection-3/
	  * http://blog.osom.info/2012/02/understanding-vs-c-compilers-buffer.html

	For the following example:
	```
	int example(int offset, int index) {
	  char buffer[10];
	  memset(buffer, 0xCC, index);
	  return buffer[index];
	}
	```

	The MSVC compiler adds these instructions to perform the stack integrity check:
	```
	        push        ebp
	        mov         ebp,esp
	        sub         esp,50h
	  [1]   mov         eax,dword ptr [__security_cookie (01068024h)]
	  [2]   xor         eax,ebp
	  [3]   mov         dword ptr [ebp-4],eax
	        push        ebx
	        push        esi
	        push        edi
	        mov         eax,dword ptr [index]
	        push        eax
	        push        0CCh
	        lea         ecx,[buffer]
	        push        ecx
	        call        _memset (010610B9h)
	        add         esp,0Ch
	        mov         eax,dword ptr [index]
	        movsx       eax,byte ptr buffer[eax]
	        pop         edi
	        pop         esi
	        pop         ebx
	  [4]   mov         ecx,dword ptr [ebp-4]
	  [5]   xor         ecx,ebp
	  [6]   call        @__security_check_cookie@4 (01061276h)
	        mov         esp,ebp
	        pop         ebp
	        ret
	```

	The instrumentation above is:
	  * [1] is loading the global security canary,
	  * [3] is storing the local computed ([2]) canary to the guard slot,
	  * [4] is loading the guard slot and ([5]) re-computing the global canary,
	  * [6] is validating the resulting canary with '__security_check_cookie' and performs error handling.

	Overview of the current stack-protection implementation:
	  * lib/CodeGen/StackProtector.cpp
	    * There is a default stack-protection implementation applied on intermediate representation.
	    * The target can overload the 'getIRStackGuard' method if it has a standard location for the stack protector cookie.
	    * An intrinsic 'Intrinsic::stackprotector' is added to the prologue. It will be expanded by the instruction selection pass (DAG or Fast).
	    * Basic Blocks are added to every instrumented function to receive the code for handling stack guard validation and error handling.
	    * Guard manipulation and comparison are added directly to the intermediate representation.
	  * lib/CodeGen/SelectionDAG/SelectionDAGISel.cpp
	  * lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
	    * There is an implementation that adds instrumentation during instruction selection (for better handling of sibling calls).
	      * see long comment above 'class StackProtectorDescriptor' declaration.
	    * The target needs to override 'getSDagStackGuard' to activate SDAG stack protection generation. (note: getIRStackGuard MUST be nullptr).
	      * 'getSDagStackGuard' returns the appropriate stack guard (security cookie)
	    * The code is generated by 'SelectionDAGBuilder.cpp' and 'SelectionDAGISel.cpp'.
	  * include/llvm/Target/TargetLowering.h
	    * Contains function to retrieve the default Guard 'Value'; should be overridden by each target to select which implementation is used and provide Guard 'Value'.
	  * lib/Target/X86/X86ISelLowering.cpp
	    * Contains the x86 specialisation; Guard 'Value' used by the SelectionDAG algorithm.

	Function-based Instrumentation:
	  * The MSVC doesn't inline the stack guard comparison in every function. Instead, a call to '__security_check_cookie' is added to the epilogue before every return instruction.
	  * To support function-based instrumentation, this patch is
	    * adding a function to get the function-based check (llvm 'Value', see include/llvm/Target/TargetLowering.h),
	      * If provided, the stack protection instrumentation won't be inlined and a call to that function will be added to the prologue.
	    * modifying (SelectionDAGISel.cpp) to avoid producing basic blocks used for inline instrumentation,
	    * generating the function-based instrumentation during the ISEL pass (SelectionDAGBuilder.cpp),
	    * if FastISEL (not SelectionDAG), using the fallback which relies on the same function-based implementation over intermediate representation (StackProtector.cpp).

	Modifications
	  * adding support for MSVC (lib/Target/X86/X86ISelLowering.cpp)
	  * adding support for function-based instrumentation (lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp, .h)

	Results
	  * IR generated instrumentation:
	```
	clang-cl /GS test.cc /Od /c -mllvm -print-isel-input
	```
	```
	* Final LLVM Code input to ISel *

	; Function Attrs: nounwind sspstrong
	define i32 @"\01?example@@YAHHH@Z"(i32 %offset, i32 %index) #0 {
	entry:
	  %StackGuardSlot = alloca i8*                                  <<<-- Allocated guard slot
	  %0 = call i8* @llvm.stackguard()                              <<<-- Loading Stack Guard value
	  call void @llvm.stackprotector(i8* %0, i8** %StackGuardSlot)  <<<-- Prologue intrinsic call (store to Guard slot)
	  %index.addr = alloca i32, align 4
	  %offset.addr = alloca i32, align 4
	  %buffer = alloca [10 x i8], align 1
	  store i32 %index, i32* %index.addr, align 4
	  store i32 %offset, i32* %offset.addr, align 4
	  %arraydecay = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 0
	  %1 = load i32, i32* %index.addr, align 4
	  call void @llvm.memset.p0i8.i32(i8* %arraydecay, i8 -52, i32 %1, i32 1, i1 false)
	  %2 = load i32, i32* %index.addr, align 4
	  %arrayidx = getelementptr inbounds [10 x i8], [10 x i8]* %buffer, i32 0, i32 %2
	  %3 = load i8, i8* %arrayidx, align 1
	  %conv = sext i8 %3 to i32
	  %4 = load volatile i8, i8* %StackGuardSlot                    <<<-- Loading Guard slot
	  call void @__security_check_cookie(i8* %4)                    <<<-- Epilogue function-based check
	  ret i32 %conv
	}
	```
	  * SelectionDAG generated instrumentation:
	```
	clang-cl /GS test.cc /O1 /c /FA
	```
	```
	"?example@@YAHHH@Z":                    # @"\01?example@@YAHHH@Z"
	# BB#0:                                 # %entry
	        pushl   %esi
	        subl    $16, %esp
	        movl    ___security_cookie, %eax        <<<-- Loading Stack Guard value
	        movl    28(%esp), %esi
	        movl    %eax, 12(%esp)                  <<<-- Store to Guard slot
	        leal    2(%esp), %eax
	        pushl   %esi
	        pushl   $204
	        pushl   %eax
	        calll   _memset
	        addl    $12, %esp
	        movsbl  2(%esp,%esi), %esi
	        movl    12(%esp), %ecx                  <<<-- Loading Guard slot
	        calll   @__security_check_cookie@4      <<<-- Epilogue function-based check
	        movl    %esi, %eax
	        addl    $16, %esp
	        popl    %esi
	        retl
	```

	Reviewers: kcc, pcc, eugenis, rnk
	Subscribers: majnemer, llvm-commits, hans, thakis, rnk
	Differential Revision: http://reviews.llvm.org/D20346
	llvm-svn: 272053
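Steps [1] to [6] of the MSVC listing in this message can be modelled in a few lines. This is a simplified sketch, not the MSVC runtime or the LLVM lowering: the cookie value is arbitrary, and `makeGuard` and `checkGuard` are hypothetical names. The LLVM-generated listing in the message stores the cookie without the XOR step, so the sketch follows the MSVC variant only.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical process-wide cookie with an arbitrary value; a real runtime
// would randomize it at startup.
static const std::uintptr_t SecurityCookie = 0xBB40E64Eu;

// Prologue, steps [1]-[3]: load the cookie, mix it with the frame pointer,
// and return the value that is stored in the guard slot.
std::uintptr_t makeGuard(std::uintptr_t FramePtr) {
  return SecurityCookie ^ FramePtr;
}

// Epilogue, steps [4]-[6]: reload the slot, undo the mix, and compare the
// result against the global cookie.
bool checkGuard(std::uintptr_t Slot, std::uintptr_t FramePtr) {
  return (Slot ^ FramePtr) == SecurityCookie;
}
```

An overflow that overwrites the guard slot changes `Slot`, so the recomputed value no longer matches the cookie and the check fails.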
*	Revert r272045 since GCC doesn't know how to compile it.	Krzysztof Parzyszek	2016-06-07	1	-449/+507
	llvm-svn: 272048
*	[Hexagon] Modify HexagonExpandCondsets to handle subregisters	Krzysztof Parzyszek	2016-06-07	1	-507/+449
	Also, switch to using functions from LiveIntervalAnalysis to update live intervals, instead of performing the updates manually. llvm-svn: 272045
*	Differential Revision: http://reviews.llvm.org/D20557	Wei Ding	2016-06-07	1	-17/+55
	llvm-svn: 272044
*	[pdb] Fix a potential overflow and remove unnecessary comments.	Zachary Turner	2016-06-07	1	-3/+0
	llvm-svn: 272043
*	[CFLAA] Add AttrEscaped, remove bit twiddling functions.	George Burgess IV	2016-06-07	2	-63/+58
	This patch does a few things:
	- Unifies AttrAll and AttrUnknown (since they were used for more or less the same purpose anyway).
	- Introduces AttrEscaped, an attribute that notes that a value escapes our analysis for a given set, but not that an unknown value flows into said set.
	- Removes functions that take bit indices, since we also had functions that took bitsets, and the use of both (with similar names) was unclear and bug-prone.
	Patch by Jia Chen. Differential Revision: http://reviews.llvm.org/D21000 llvm-svn: 272040
*	[libfuzzer] prune_corpus option for disabling pruning during the load.	Mike Aizatsky	2016-06-07	5	-1/+19
	Summary: The option is very useful for testing, plus I intend to measure its effect on fuzzer effectiveness. Differential Revision: http://reviews.llvm.org/D21084 llvm-svn: 272035
*	Reapply [AArch64] Fix isLegalAddImmediate() to return true for valid negative values.	Geoff Berry	2016-06-07	1	-2/+5
	Originally reviewed here: http://reviews.llvm.org/D17463 llvm-svn: 272023
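As a rough illustration of what "valid negative values" can mean here, the sketch below assumes the AArch64 ADD/SUB immediate form: a 12-bit unsigned value, optionally shifted left by 12, with a negative addend expressible as a subtraction of its magnitude. `isLegalAddImm` is a hypothetical helper written for this note, not a copy of the patched function.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>

// Returns true if Imm fits an add/sub immediate under the assumed encoding:
// a 12-bit unsigned value, optionally shifted left by 12 bits.
bool isLegalAddImm(std::int64_t Imm) {
  // Negating the minimum value would overflow, so reject it up front.
  if (Imm == std::numeric_limits<std::int64_t>::min())
    return false;
  // A negative addend is handled as a subtraction of its magnitude.
  std::int64_t Mag = Imm < 0 ? -Imm : Imm;
  bool FitsUnshifted = (Mag >> 12) == 0;
  bool FitsShifted = (Mag & 0xfff) == 0 && (Mag >> 24) == 0;
  return FitsUnshifted || FitsShifted;
}
```

Under these assumptions, -4095 and -4096 are accepted while 4097 is not.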
*	Revert "[MBP] Reduce code size by running tail merging in MBP."	Haicheng Wu	2016-06-07	4	-121/+34
	This reverts commit r271930, r271915, r271923. They break a thumb selfhosting bot. llvm-svn: 272017
*	[ARM] Accept conditional versions of BXNS and BLXNS	Oliver Stannard	2016-06-07	1	-0/+1
	These instructions end in "S" but are not flag-setting, so they need including in the list of special cases in the assembly parser. Differential Revision: http://reviews.llvm.org/D21077 llvm-svn: 272015
*	[LAA] Improve non-wrapping pointer detection by handling loop-invariant case.	Andrey Turetskiy	2016-06-07	1	-4/+14
	This fixes PR26314. This patch adds new helper “isNoWrap” with detection of loop-invariant pointer case. Patch by Roman Shirokiy. Ref: https://llvm.org/bugs/show_bug.cgi?id=26314 Differential Revision: http://reviews.llvm.org/D17268 llvm-svn: 272014
*	[Linker/IRMover] Simplify the code a bit. NFCI.	Davide Italiano	2016-06-07	1	-25/+7
	llvm-svn: 272013
*	[X86][SSE] Add general lowering of nontemporal vector loads (fixed bad merge)	Simon Pilgrim	2016-06-07	1	-9/+37
	Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272011
*	[X86][SSE] Add general lowering of nontemporal vector loads	Simon Pilgrim	2016-06-07	2	-0/+69
	Currently the only way to use the (V)MOVNTDQA nontemporal vector load instructions is through the int_x86_sse41_movntdqa style builtins. This patch adds support for lowering nontemporal loads from general IR, allowing us to remove the movntdqa builtins in a future patch. We currently still fold nontemporal loads into suitable instructions; we should probably look at removing this (and nontemporal stores as well) or at least make the target's folding implementation aware that it's dealing with a nontemporal memory transaction. There is also an issue that VMOVNTDQA only acts on 128-bit vectors on pre-AVX2 hardware - so currently a normal ymm load is still used on AVX1 targets. Differential Review: http://reviews.llvm.org/D20965 llvm-svn: 272010
*	[PM] Preserve GlobalsAA for SROA.	Davide Italiano	2016-06-07	1	-1/+6
	Differential Revision: http://reviews.llvm.org/D21040 llvm-svn: 272009
*	[Thumb-1] Add optimized constant materialization for integers [256..512)	James Molloy	2016-06-07	2	-0/+13
	We can materialize these integers using a MOV; ADDi8 pair. llvm-svn: 272007
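One way to picture a MOV; ADDi8 pair is as a split of the constant into two unsigned 8-bit immediates. The helper below is a hypothetical sketch of that arithmetic only, not the selection pattern from the patch. It assumes both immediates are limited to [0, 255], so this particular split reaches sums up to 510; how the patch treats the top of the stated range is not shown in the log.

```cpp
#include <cassert>
#include <utility>

// Splits C into (MovImm, AddImm) with both parts in [0, 255], using a MOV of
// 255 followed by an ADD of the remainder. Returns {-1, -1} when C lies
// outside the range this split can cover.
std::pair<int, int> splitMovAdd(int C) {
  if (C < 256 || C > 510)
    return {-1, -1};
  return {255, C - 255};
}
```

For example, 300 splits into a MOV of 255 and an ADD of 45.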
*	[AVX512] Fix load opcode for fast isel.	Igor Breger	2016-06-07	1	-1/+1
	Differential Revision: http://reviews.llvm.org/D21067 llvm-svn: 272006
*	[PowerPC] Support multiple return values with fast isel	Ulrich Weigand	2016-06-07	1	-1/+1
	Using an LLVM IR aggregate return value type containing three or more integer values causes an abort in the fast isel pass. This patch adds two more registers to RetCC_PPC64_ELF_FIS to allow returning up to four integers with fast isel, just the same as is currently supported with regular isel (RetCC_PPC). This is needed for Swift and (possibly) other non-clang frontends. Fixes PR26190. llvm-svn: 272005
*	[X86][SSE] Improved blend+zero target shuffle combining to use combined shuffle mask directly	Simon Pilgrim	2016-06-07	1	-7/+11
	We currently only combine to blend+zero if the target value type has 8 elements or less, but this was missing a lot of cases where the combined mask had been widened. This change makes it so we use the combined mask to determine the blend value type, allowing us to catch more widened cases. llvm-svn: 272003
*	[ARM] Shrink post-indexed LDR and STR to LDM/STM	James Molloy	2016-06-07	1	-0/+42
	A Thumb-2 post-indexed LDR instruction such as:
	  ldr.w r0, [r1], #4
	Can be rewritten as:
	  ldm.n r1!, {r0}
	LDMs can be more expensive than LDRs on some cores, so this has been enabled only in minsize mode. llvm-svn: 272002
*	[ARM] Transform LDMs into writeback form to save code size	James Molloy	2016-06-07	1	-3/+23
	If we have an LDM that uses only low registers and doesn't write to its base register:
	  ldm.w r0, {r1, r2, r3}
	And that base register is dead after the LDM, then we can convert it to writeback form and use a narrow encoding:
	  ldm.n r0!, {r1, r2, r3}
	Obviously, this introduces a new register write and so can cause WAW hazards, so I've enabled it only in minsize mode. This is a code size trick that ARM Compiler 5 ("armcc") does that we don't. llvm-svn: 272000
*	[ARM] Incorrect relocation type for Thumb2 B<cond>.w	Peter Smith	2016-06-07	1	-0/+2
	The Thumb2 conditional branch B<cond>.W has a different encoding (T3) to the unconditional branch B.W (T4) as it needs to record <cond>. As the encoding is different the B<cond>.W is given a different relocation type. ELF for the ARM Architecture 4.6.1.6 (Table-13) states that R_ARM_THM_JUMP19 should be used for B<cond>.W. At present the MC layer is using the R_ARM_THM_JUMP24 from B.W. This change makes B<cond>.W use R_ARM_THM_JUMP19 and alters the existing test that checks for R_ARM_THM_JUMP24 to expect R_ARM_THM_JUMP19. llvm-svn: 271997
*	[InstCombine][AVX2] Add support for simplifying AVX2 per-element shifts to native shifts	Simon Pilgrim	2016-06-07	1	-0/+125
	Unlike native shifts, the AVX2 per-element shift instructions VPSRAV/VPSRLV/VPSLLV handle out of range shift values (logical shifts set the result to zero, arithmetic shifts splat the sign bit). If the shift amount is constant we can sometimes convert these instructions to native shifts:
	1 - if all shift amounts are in range then the conversion is trivial.
	2 - out of range arithmetic shifts can be clamped to the (bitwidth - 1) (a legal shift amount) before conversion.
	3 - logical shifts just return zero if all elements have out of range shift amounts.
	In addition, UNDEF shift amounts are handled - either as an UNDEF shift amount in a native shift or as an UNDEF in the logical 'all out of range' zero constant special case for logical shifts.
	Differential Revision: http://reviews.llvm.org/D19675 llvm-svn: 271996
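The out-of-range behaviour this message relies on can be modelled one 32-bit lane at a time. The functions below are a scalar sketch with hypothetical names, not intrinsics or InstCombine code: a logical right shift by 32 or more yields zero, and an arithmetic right shift by 32 or more behaves like a shift by 31, which is why clamping to (bitwidth - 1) preserves the result.

```cpp
#include <cassert>
#include <cstdint>

// One lane of a per-element logical right shift: amounts of 32 or more
// produce zero instead of being undefined.
std::uint32_t laneSrlv(std::uint32_t X, std::uint32_t N) {
  return N < 32 ? X >> N : 0;
}

// One lane of a per-element arithmetic right shift: amounts of 32 or more
// are clamped to 31, which fills the lane with copies of the sign bit.
std::int32_t laneSrav(std::int32_t X, std::uint32_t N) {
  std::uint32_t Clamped = N < 32 ? N : 31;
  // Shift the complement for negative inputs so that the result does not
  // depend on how the compiler shifts negative signed values.
  return X < 0 ? ~(~X >> Clamped) : X >> Clamped;
}
```

With in-range amounts both functions agree with an ordinary shift, which corresponds to case 1 in the list above.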