path: root/llvm/test/CodeGen/ARM
Commit log, most recent first. Each entry: subject (author, date; files changed, lines -deleted/+added).
* DI: Reverse direction of subprogram -> function edge. (Peter Collingbourne, 2015-11-05; 17 files, -63/+63)

  Previously, subprograms contained a metadata reference to the function they described. Because most clients need to get or set a subprogram for a given function rather than the other way around, this created unneeded inefficiency. For example, many passes needed to call the function llvm::makeSubprogramMap() to build a mapping from functions to subprograms, and the IR linker needed to fix up function references in a way that caused quadratic complexity in the IR linking phase of LTO.

  This change reverses the direction of the edge by storing the subprogram as function-level metadata and removing DISubprogram's function field.

  Since this is an IR change, a bitcode upgrade has been provided. Fixes PR23367. An upgrade script for textual IR for out-of-tree clients is attached to the PR.

  Differential Revision: http://reviews.llvm.org/D14265

  llvm-svn: 252219
* [ARM] Combine CMOV into BFI where possible (James Molloy, 2015-11-04; 1 file, -0/+23)

  If we have a CMOV, OR and AND combination such as:

    if (x & CN)
      y |= CM;

  and:

  * CN is a single bit;
  * all bits covered by CM are known zero in y;

  then we can convert this to a sequence of BFI instructions. This will always be a win if CM is a single bit, will always be no worse than the TST & OR sequence if CM is two bits, and for thumb will be no worse if CM is three bits (due to the extra IT instruction).

  llvm-svn: 252057
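  As a rough illustration (not the commit's test case; the function and constants are hypothetical), the pattern looks like this in LLVM IR:

    ; x & 4 selects between y and y|2: CN = 4 is a single bit, and CM = 2
    ; covers only a bit known zero in y, so a BFI can be formed.
    define i32 @cmov_to_bfi(i32 %x, i32 %yin) {
      %y = and i32 %yin, 4294967293      ; clear bit 1 so it is known zero
      %and = and i32 %x, 4
      %cmp = icmp ne i32 %and, 0
      %or = or i32 %y, 2
      %sel = select i1 %cmp, i32 %or, i32 %y
      ret i32 %sel
    }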
* ARM: add extra test for watchOS ABI (Tim Northover, 2015-10-30; 1 file, -0/+153)

  llvm-svn: 251705
* ARM: add support for WatchOS's compact unwind information. (Tim Northover, 2015-10-28; 4 files, -3/+67)

  llvm-svn: 251573
* ARM: teach backend about WatchOS and TvOS libcalls. (Tim Northover, 2015-10-28; 2 files, -0/+170)

  The most substantial changes are again for watchOS: libcalls are hard-float if needed and sincos has a different calling convention.

  llvm-svn: 251571
* ARM: add backend support for the ABI used in WatchOS (Tim Northover, 2015-10-28; 1 file, -0/+146)

  At the LLVM level this ABI is essentially a minimal modification of AAPCS to support 16-byte alignment for vector types and the stack.

  llvm-svn: 251570
* [ARM] Expand ROTL and ROTR of vector value types (Charlie Turner, 2015-10-27; 1 file, -0/+14)

  Summary: After D13851 landed, we saw backend crashes when compiling the reduced test case included in this patch. The right fix seems to be to allow these vector types for expansion in instruction selection.

  Reviewers: rengolin, t.p.northover

  Subscribers: RKSimon, t.p.northover, aemerson, llvm-commits, rengolin

  Differential Revision: http://reviews.llvm.org/D14082

  llvm-svn: 251401
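  A hedged sketch (not the commit's reduced test case) of the kind of vector rotate that DAGCombine turns into an ISD::ROTL node, which then needs expansion:

    ; Rotate each lane left by 3; the shl/lshr/or idiom is recognized as a rotate.
    define <4 x i32> @rol3(<4 x i32> %x) {
      %hi = shl <4 x i32> %x, <i32 3, i32 3, i32 3, i32 3>
      %lo = lshr <4 x i32> %x, <i32 29, i32 29, i32 29, i32 29>
      %r = or <4 x i32> %hi, %lo
      ret <4 x i32> %r
    }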
* ARM: make sure VFP loads and stores are properly aligned. (Tim Northover, 2015-10-26; 1 file, -0/+98)

  Both VLDRS and VLDRD fault if the memory is not 4 byte aligned, which wasn't really being checked before, leading to faults at runtime.

  llvm-svn: 251352
* Fix tests. (Peter Collingbourne, 2015-10-26; 1 file, -1/+1)

  llvm-svn: 251343
* ARM/ELF: Better codegen for global variable addresses. (Peter Collingbourne, 2015-10-26; 4 files, -66/+35)

  In PIC mode we were previously computing global variable addresses (or GOT entry addresses) by adding the PC, the PC-relative GOT displacement and the GOT-relative symbol/GOT entry displacement. Because the latter two displacements are fixed, we ended up performing one more addition than necessary.

  This change causes us to compute addresses using a single PC-relative displacement, resulting in a shorter code sequence. This reduces code size by about 4% in a recent build of Chromium for Android.

  As a result of this change we no longer need to compute the GOT base address in the ARM backend, which allows us to remove the Global Base Reg pass and SDAG lowering for the GOT.

  We also now no longer use the GOT when addressing a symbol which is known to be defined in the same linkage unit. Specifically, the symbol must have either hidden visibility or a strong definition in the current module in order to not use the GOT.

  This is a change from the previous behaviour where we would use the GOT to address externally visible symbols defined in the same module. I think the only cases where this could matter are cases involving symbol interposition, but we don't really support that well anyway.

  Differential Revision: http://reviews.llvm.org/D13650

  llvm-svn: 251322
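  As a hedged illustration (the exact emitted sequence varies and is not quoted from the commit), taking a global's address under -relocation-model=pic:

    ; The address of @g was previously formed from three parts
    ; (PC + PC-to-GOT displacement + GOT-to-slot displacement); it can
    ; now be one PC-relative displacement plus a single add.
    @g = external global i32

    define i32* @addr_of_g() {
      ret i32* @g
    }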
* [ARM] Handle the inline asm constraint type 'o' (James Molloy, 2015-10-26; 1 file, -0/+11)

  This means "memory with offset" and requires very little plumbing to get working. This fixes PR25317.

  llvm-svn: 251280
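  A minimal sketch of what such a constraint looks like in IR (hypothetical asm string; 2015-era typed-pointer syntax):

    ; "o" asks for an offsettable memory operand as the indirect output.
    define void @store_via_o(i32* %p) {
      call void asm sideeffect "str $1, $0", "=*o,r"(i32* %p, i32 42)
      ret void
    }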
* [ARM CodeGen] @llvm.debugtrap call may be removed when restoring callee saved registers (Oleg Ranevskyy, 2015-10-23; 1 file, -0/+17)

  Summary: When ARMFrameLowering::emitPopInst generates a "pop" instruction to restore the callee saved registers, it checks if the LR register is among them. If so, the function may decide to remove the basic block's terminator and replace it with a "pop" to the PC register instead of LR.

  This leads to a problem when the block's terminator is preceded by a "llvm.debugtrap" call. The MI iterator points to the trap in such a case, which is also a terminator. If the function decides to restore LR to PC, it erroneously removes the trap.

  Reviewers: asl, rengolin

  Subscribers: aemerson, jfb, rengolin, dschuff, llvm-commits

  Differential Revision: http://reviews.llvm.org/D13672

  llvm-svn: 251123
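  A hedged sketch of the problematic shape (hypothetical function; the key point is a debugtrap immediately before the return in a function that saves and restores LR):

    declare void @llvm.debugtrap()
    declare void @callee()

    define void @f() {
      call void @callee()          ; forces LR to be saved and restored
      call void @llvm.debugtrap()
      ret void                     ; the pop-to-PC must not swallow the trap
    }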
* Fix incorrect target triple in fp16-promote.ll (Pirama Arumuga Nainar, 2015-10-22; 1 file, -96/+99)

  Summary: Hyphens were missing from the triple, causing it to be parsed incorrectly. This patch updates the triple and makes necessary changes to the expected output.

  Patch is from Vinicius Tinti.

  Reviewers: ab, tinti

  Subscribers: srhines, llvm-commits

  Differential Revision: http://reviews.llvm.org/D13792

  llvm-svn: 251020
* Add missing load/store flags to thumb2 instructions. (Pete Cooper, 2015-10-22; 1 file, -1/+1)

  These were the cause of a verifier error when building 7zip with -verify-machineinstrs. Running 'make check' with the verifier triggered the same error on the test here, so I've updated the test to run the verifier on one of its runs instead of adding a new one.

  While looking at this code, there was a stale comment that these instructions were only used for disassembly. This probably used to be the case, but they are now used in the 'ARM load / store optimization pass' too.

  This reapplies r242300 which was reverted in r242428 due to bot failures. Ultimately those failures were spurious and completely unrelated to this commit. I reverted this at the time because it was thought to be at fault.

  llvm-svn: 250969
* Adding support for TargetLoweringBase::LibCall (Artyom Skrobov, 2015-10-20; 1 file, -1/+1)

  Summary: TargetLoweringBase::Expand is defined as "Try to expand this to other ops, otherwise use a libcall." For ISD::UDIV and ISD::SDIV, the choice between the two possibilities was defined in a rather convoluted way:

  - if DIVREM is legal, expand to DIVREM
  - if DIVREM has a custom lowering, expand to DIVREM
  - if DIVREM libcall is defined and a remainder from the same division is computed elsewhere, expand to a DIVREM libcall
  - else, expand to a DIV libcall

  This had the undesirable effect that if both DIV and DIVREM are implemented as libcalls, then ISD::UDIV and ISD::SDIV are expanded to the heavier DIVREM libcall, even when the remainder isn't used.

  The new code adds a new LegalizeAction, TargetLoweringBase::LibCall, so that backends can directly control whether they prefer an expansion or a conversion to a libcall. This makes the generic lowering code even more generic, allowing its reuse in a wider range of target-specific configurations.

  The useful effect is that ARM backend will now generate a call to __aeabi_{i,u}div rather than __aeabi_{i,u}divmod in cases where it doesn't need the remainder. There's no functional change outside the ARM backend.

  Reviewers: t.p.northover, rengolin

  Subscribers: t.p.northover, llvm-commits, aemerson

  Differential Revision: http://reviews.llvm.org/D13862

  llvm-svn: 250826
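  A minimal illustration (hypothetical function name): a quotient whose remainder is never used can now lower to a __aeabi_idiv call instead of __aeabi_idivmod on such targets:

    define i32 @quotient_only(i32 %a, i32 %b) {
      %q = sdiv i32 %a, %b     ; no matching srem, so the lighter libcall suffices
      ret i32 %q
    }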
* Fix mapping of @llvm.arm.ssat/usat intrinsics to ssat/usat instructions (Asiri Rathnayake, 2015-10-19; 5 files, -0/+100)

  The mapping of these two intrinsics in ARMInstrInfo.td had a small omission which led to their operands not being validated/transformed before being lowered into usat and ssat instructions. This can cause incorrect instructions to be emitted.

  I've also added tests for the remaining two saturating arithmetic intrinsics @llvm.arm.qadd and @llvm.arm.qsub as they are missing codegen tests.

  llvm-svn: 250697
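  For reference, a hedged sketch of how the intrinsic is used (the saturation width of 8 is an arbitrary example):

    declare i32 @llvm.arm.ssat(i32, i32)

    define i32 @saturate(i32 %x) {
      %s = call i32 @llvm.arm.ssat(i32 %x, i32 8)   ; saturate to the signed 8-bit range
      ret i32 %s
    }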
* [ARM] Make sure we do not dereference the end iterator when accessing debug information. (Quentin Colombet, 2015-10-15; 1 file, -0/+57)

  Although the problem was always here, it would only be exposed when shrink-wrapping is enabled.

  rdar://problem/23110493

  llvm-svn: 250352
* ARM: tweak WoA frame lowering (Saleem Abdulrasool, 2015-10-09; 1 file, -0/+22)

  Accept r11 when targeting Windows on ARM rather than just low registers. Because we are in a thumb-2 only mode, this may be slightly more expensive in code size, but results in better code for the environment since it spills the frame register, which is generally desired for fast stack walking as per the ABI.

  llvm-svn: 249804
* [ARM] Promote helper function to SelectionDAG. (Chad Rosier, 2015-10-07; 1 file, -0/+9)

  I'll be using the function in a similar combine for AArch64. The helper was also improved to handle undef values.

  Part of http://reviews.llvm.org/D13442

  llvm-svn: 249572
* [ARM] Use correct half-precision functions in EABI mode (Oliver Stannard, 2015-10-07; 1 file, -26/+36)

  The ARM RTABI defines the half- to single-precision float conversion functions with an __aeabi prefix, but libgcc only has them with a __gnu prefix. Therefore we need to emit the __aeabi version when compiling with an eabi or eabihf triple, and the __gnu version with a gnueabi or gnueabihf triple.

  llvm-svn: 249565
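  A hedged sketch of IR that exercises such a conversion libcall (the lowered callee name depends on the triple as described above):

    define float @h2f(half %h) {
      %f = fpext half %h to float   ; lowers to the half-to-float libcall
      ret float %f
    }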
* [ARM] Prevent PerformVDIVCombine from combining a vcvt/vdiv with 8 lanes. (Chad Rosier, 2015-10-07; 1 file, -0/+8)

  This would result in a crash since the vcvt used does not support v8i32 types.

  llvm-svn: 249560
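  A hedged sketch (not the commit's test) of the conversion-plus-divide pattern this combine targets, at a width that must now be rejected:

    ; sitofp followed by a divide by a power of two; at 4 lanes this can
    ; become a fixed-point vcvt, but there is no such vcvt for v8i32.
    define <8 x float> @conv(<8 x i32> %x) {
      %f = sitofp <8 x i32> %x to <8 x float>
      %d = fdiv <8 x float> %f, <float 16.0, float 16.0, float 16.0, float 16.0, float 16.0, float 16.0, float 16.0, float 16.0>
      ret <8 x float> %d
    }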
* [ARM][AArch64] Only lower to interleaved load/store if the target has NEON (Jeroen Ketema, 2015-10-07; 1 file, -48/+89)

  Without an additional check for NEON, the compiler crashes during legalization of NEON ldN/stN.

  Differential Revision: http://reviews.llvm.org/D13508

  llvm-svn: 249550
* [ARM] Simplify tests and make checks more rigid. NFC. (Chad Rosier, 2015-10-06; 1 file, -67/+36)

  llvm-svn: 249432
* [ARM] Modify codegen for memcpy intrinsic to prefer LDM/STM. (Scott Douglass, 2015-10-05; 3 files, -2/+189)

  We were previously codegen'ing memcpy as regular load/store operations and hoping that the register allocator would allocate registers in ascending order so that we could apply an LDM/STM combine after register allocation. According to the commit that first introduced this code (r37179), we planned to teach the register allocator to allocate the registers in ascending order. This never got implemented, and up to now we've been stuck with very poor codegen.

  A much simpler approach for achieving better codegen is to create MEMCPY pseudo instructions, attach scratch virtual registers to them and then, post register allocation, expand the MEMCPYs into LDM/STM pairs using the scratch registers. The register allocator will have picked arbitrary registers which we sort when expanding the MEMCPY.

  This approach also avoids the need to repeatedly calculate offsets which ultimately ought to be eliminated pre-RA in order to decrease register pressure.

  Fixes PR9199 and PR23768.

  [This is based on Peter Collingbourne's r238473 which was reverted.]

  Differential Revision: http://reviews.llvm.org/D13239

  Change-Id: I727543c2e94136e0f80b8e22d5642d7b9ee5b458
  Author: Peter Collingbourne <peter@pcc.me.uk>

  llvm-svn: 249322
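  A hedged example (2015-era memcpy intrinsic signature, with the alignment operand) of a small fixed-size copy that benefits from the LDM/STM expansion:

    declare void @llvm.memcpy.p0i8.p0i8.i32(i8*, i8*, i32, i32, i1)

    define void @copy32(i8* %dst, i8* %src) {
      ; 32 bytes, 4-byte aligned, not volatile: a good LDM/STM candidate
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* %dst, i8* %src, i32 32, i32 4, i1 false)
      ret void
    }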
* [ARM] More care with Thumb1 writeback in ARMLoadStoreOptimizer (Scott Douglass, 2015-10-01; 1 file, -0/+27)

  Differential Revision: http://reviews.llvm.org/D13240

  llvm-svn: 249002
* [ARM][NEON] Use address space in vld([1234]|[234]lane) and vst([1234]|[234]lane) instructions (Jeroen Ketema, 2015-09-30; 33 files, -408/+568)

  This commit changes the interface of the vld[1234], vld[234]lane, and vst[1234], vst[234]lane ARM neon intrinsics and associates an address space with the pointer that these intrinsics take. This changes, e.g.,

    <2 x i32> @llvm.arm.neon.vld1.v2i32(i8*, i32)

  to

    <2 x i32> @llvm.arm.neon.vld1.v2i32.p0i8(i8*, i32)

  This change ensures that address spaces are fully taken into account in the ARM target during lowering of interleaved loads and stores.

  Differential Revision: http://reviews.llvm.org/D12985

  llvm-svn: 248887
* [ARM] Don't generate clrex for pre-v7 targets. (Ahmed Bougacha, 2015-09-26; 1 file, -1/+32)

  Since r248294, we emit clrex, but it doesn't exist on v6.

  llvm-svn: 248640
* ARM: address WoA division limitation (Saleem Abdulrasool, 2015-09-25; 2 files, -36/+49)

  We now emit the compiler generated divide by zero check that was needed for the MSVC routines. We construct a pseudo-instruction for the DBZ check as the operation requires splitting up the BB. For the 64-bit operations, we need to custom expand the node as we need to insert the DBZ check and then emit the libcall to the appropriate name. Because this is target specific, it seemed better to reproduce the expansion operation from the target-agnostic type legalization rather than sink this there to avoid the duplication. The division library calls now match MSVC semantically.

  llvm-svn: 248561
* Introduce target hook for optimizing register copies (Matt Arsenault, 2015-09-24; 4 files, -76/+98)

  Allow a target to do something other than search for copies that will avoid cross register bank copies.

  Implement for SI by only rewriting the most basic copies, so it should look through anything like a subregister extract. I'm not entirely satisfied with this because it seems like eliminating a reg_sequence that isn't fully used should work generically for all targets without them having to override something. However, it seems to be tricky to have a simple implementation of this without rewriting to invalid kinds of subregister copies on some targets.

  I'm not sure if there is currently a generic way to easily check if a subregister index would be valid for the current use. The current set of TargetRegisterInfo::get*Class functions don't quite behave like I would expect (e.g. getSubClassWithSubReg returns the maximal register class rather than the minimal), so I'm not sure how to make the generic test keep searching if SrcRC:SrcSubReg is a valid replacement for DefRC:DefSubReg.

  Making the default implementation to check for simple copies breaks a variety of ARM and x86 tests by producing illegal subregister uses. The ARM tests are not actually changed since it should still be using the same sharesSameRegisterFile implementation; this just relaxes them to not check for specific registers.

  llvm-svn: 248478
* ARM: fix folding stack adjustment (again again again...) (Tim Northover, 2015-09-23; 1 file, -1/+2)

  This time, the issue is that we weren't accounting for the possibility that aligned DPRs could have been stored after the final "push" in a prologue. When that happened we effectively moved a "sub sp, #N" from below the aligned stores to above them, and everything went to pot.

  To make it worse, I'd actually committed something testing that we produced wrong code, so the test update is tiny.

  llvm-svn: 248437
* [ARM] Emit clrex in the expanded cmpxchg fail block. (Ahmed Bougacha, 2015-09-22; 5 files, -69/+138)

  ARM counterpart to r248291:

  In the comparison failure block of a cmpxchg expansion, the initial ldrex/ldxr will not be followed by a matching strex/stxr. On ARM/AArch64, this unnecessarily ties up the execution monitor, which might have a negative performance impact on some uarchs. Instead, release the monitor in the failure block. The clrex instruction was designed for this: use it.

  Also see ARMARM v8-A B2.10.2: "Exclusive access instructions and Shareable memory locations".

  Differential Revision: http://reviews.llvm.org/D13033

  llvm-svn: 248294
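  A minimal sketch of IR whose expansion gains the clrex (hypothetical function; the orderings are arbitrary):

    define i32 @cas(i32* %p, i32 %old, i32 %new) {
      %pair = cmpxchg i32* %p, i32 %old, i32 %new seq_cst seq_cst
      %val = extractvalue { i32, i1 } %pair, 0   ; the failure path now ends in clrex
      ret i32 %val
    }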
* [ARM] Do not scale vext with a factor (Jeroen Ketema, 2015-09-21; 1 file, -0/+11)

  The vext pseudo-instruction takes the number of elements that need to be extracted, not the number of bytes. Hence, use the number of elements directly instead of scaling them with a factor.

  Reviewers: Silviu Baranga, James Molloy (not reflected in the differential revision)

  Differential Revision: http://reviews.llvm.org/D12974

  llvm-svn: 248208
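  A hedged sketch of a shuffle that lowers to VEXT, where the immediate (#1 here) counts elements, not bytes:

    define <4 x i16> @ext1(<4 x i16> %a, <4 x i16> %b) {
      %s = shufflevector <4 x i16> %a, <4 x i16> %b, <4 x i32> <i32 1, i32 2, i32 3, i32 4>
      ret <4 x i16> %s
    }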
* Update edge weights properly when merging blocks in if-conversion. (Cong Hou, 2015-09-18; 1 file, -0/+6)

  In if-conversion, there is a utility function MergeBlocks() that is used to merge blocks. However, when new edges are built in this function the edge weight is either not provided or not updated properly, leading to a modified CFG with incorrect edge weights. This patch corrects this issue.

  Differential Revision: http://reviews.llvm.org/D12513

  llvm-svn: 248030
* Limit the range of processors supported by ARM fast isel to v6 or later (Eric Christopher, 2015-09-18; 2 files, -36/+0)

  v6 or later is all that is tested right now.

  Fixes PR24858.

  llvm-svn: 248027
* Scaling up values in ARMBaseInstrInfo::isProfitableToIfCvt() before they are scaled by a probability to avoid precision issue (Cong Hou, 2015-09-18; 4 files, -9/+9)

  In ARMBaseInstrInfo::isProfitableToIfCvt(), there is a simple cost model in which the number of cycles is scaled by a probability to estimate the cost. However, when the number of cycles is small (which is usually the case), there is a precision issue after the computation. To avoid this issue, this patch scales those cycles by 1024 (chosen to make the multiplication a little faster) before they are scaled by the probability. Other variables are also scaled up for the final comparison.

  Differential Revision: http://reviews.llvm.org/D12742

  llvm-svn: 248018
* [ShrinkWrap] Refactor the handling of infinite loop in the analysis. (Quentin Colombet, 2015-09-17; 1 file, -0/+62)

  - Strengthen the logic to be sure we hoist the restore point out of the current loop. (This fixes a bug with infinite loops, with a test added as part of the patch.)
  - Walk over the exit blocks of the current loop to converge to the desired restore point in one iteration of the update loop.

  llvm-svn: 247958
* [ARM] Extract shifts out of multiply-by-constant (John Brawn, 2015-09-14; 1 file, -31/+182)

  Turning (op x (mul y k)) into (op x (lsl (mul y k>>n) n)) is beneficial when we can do the lsl as a shifted operand and the resulting multiply constant is simpler to generate.

  Do this by doing the transformation when trying to select a shifted operand, as that ensures that it actually turns out better (the alternative would be to do it in PreprocessISelDAG, but we don't know for sure there if extracting the shift would allow a shifted operand to be used).

  Differential Revision: http://reviews.llvm.org/D12196

  llvm-svn: 247569
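  A hedged, worked instance (constants arbitrary): with k = 40 = 5 << 3, (add x (mul y 40)) can become (add x (lsl (mul y 5) 3)), where the lsl folds into the add as a shifted operand:

    define i32 @muladd(i32 %x, i32 %y) {
      %m = mul i32 %y, 40      ; 40 = 5 << 3; multiply-by-5 is cheap to generate
      %r = add i32 %x, %m      ; the lsl #3 can fold into the add
      ret i32 %r
    }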
* [opaque pointer type] Add textual IR support for explicit type parameter for global aliases (David Blaikie, 2015-09-11; 2 files, -8/+8)

  update.py:

    import fileinput
    import sys
    import re

    alias_match_prefix = r"(.*(?:=|:|^)\s*(?:external |)(?:(?:private|internal|linkonce|linkonce_odr|weak|weak_odr|common|appending|extern_weak|available_externally) )?(?:default |hidden |protected )?(?:dllimport |dllexport )?(?:unnamed_addr |)(?:thread_local(?:\([a-z]*\))? )?alias"
    plain = re.compile(alias_match_prefix + r" (.*?))(| addrspace\(\d+\) *)\*($| *(?:%|@|null|undef|blockaddress|addrspacecast|\[\[[a-zA-Z]|\{\{).*$)")
    cast = re.compile(alias_match_prefix + r") ((?:bitcast|inttoptr|addrspacecast)\s*\(.* to (.*?)(| addrspace\(\d+\) *)\*\)\s*(?:;.*)?$)")
    gep = re.compile(alias_match_prefix + r") ((?:getelementptr)\s*(?:inbounds)?\s*\((?P<type>.*), (?P=type)(?:\s*addrspace\(\d+\)\s*)?\* .*\)\s*(?:;.*)?$)")

    def conv(line):
      m = re.match(cast, line)
      if m:
        return m.group(1) + " " + m.group(3) + ", " + m.group(2)
      m = re.match(gep, line)
      if m:
        return m.group(1) + " " + m.group(3) + ", " + m.group(2)
      m = re.match(plain, line)
      if m:
        return m.group(1) + ", " + m.group(2) + m.group(3) + "*" + m.group(4) + "\n"
      return line

    for line in sys.stdin:
      sys.stdout.write(conv(line))

  apply.sh:

    for name in "$@"
    do
      python3 `dirname "$0"`/update.py < "$name" > "$name.tmp" && mv "$name.tmp" "$name"
      rm -f "$name.tmp"
    done

  The actual commands:

  From llvm/src:

    find test/ -name *.ll | xargs ./apply.sh

  From llvm/src/tools/clang:

    find test/ -name *.mm -o -name *.m -o -name *.cpp -o -name *.c | xargs -I '{}' ../../apply.sh "{}"

  From llvm/src/tools/polly:

    find test/ -name *.ll | xargs ./apply.sh

  llvm-svn: 247378
* [ARM] Do not use vtrn for vectorshuffle if the order is reversed (James Molloy, 2015-09-10; 3 files, -0/+50)

  The tests in isVTRNMask and isVTRN_v_undef_Mask should also check that the elements of the upper and lower half of the vectorshuffle occur in the correct order when both halves are used. Without this test the code assumes that it is correct to use vector transpose (vtrn) for the masks <1, 1, 0, 0> and <1, 3, 0, 2>, among others, but the transpose actually incorrectly generates shuffles for <0, 0, 1, 1> and <0, 2, 1, 3> in this case.

  Patch by Jeroen Ketema!

  llvm-svn: 247254
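  A hedged sketch of one of the affected masks:

    ; Mask <1, 3, 0, 2>: lowering this with vtrn would really produce the
    ; shuffle <0, 2, 1, 3>, so vtrn must be rejected here.
    define <4 x i32> @reversed(<4 x i32> %a) {
      %s = shufflevector <4 x i32> %a, <4 x i32> undef, <4 x i32> <i32 1, i32 3, i32 0, i32 2>
      ret <4 x i32> %s
    }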
* [SelectionDAG] Swap commutative binops before constant-based folding (Hal Finkel, 2015-09-06; 1 file, -3/+7)

  In searching for a fix for the underlying code-quality bug highlighted by r246937 (that SDAG simplification can lead to us generating an ISD::OR node with a constant zero LHS), I ran across this:

  We generically canonicalize commutative binary-operation nodes in SDAG getNode so that, if only one operand is a constant, it will be on the RHS. However, we were doing this only after a bunch of constant-based simplification checks that all assume this canonical form (that any constant will be on the RHS). Moving the operand-swapping canonicalization prior to these checks seems like the right thing to do (and, as it turns out, causes SDAG to completely fold away the computation in test/CodeGen/ARM/2012-11-14-subs_carry.ll, just like InstCombine would do).

  llvm-svn: 246938
* [ARM] Add a test case for revision 243956. (Quentin Colombet, 2015-09-03; 1 file, -0/+35)

  llvm-svn: 246785
* [ARM] Don't abort on variable-idx extractelt in ReconstructShuffle. (Ahmed Bougacha, 2015-09-01; 1 file, -0/+16)

  The code introduced in r244314 assumed that EXTRACT_VECTOR_ELT only takes constant indices, but it does accept variables. Bail out for those: we can't use them, as the shuffles we want to reconstruct do require constant masks.

  llvm-svn: 246594
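  A hedged sketch of the previously crashing shape (hypothetical function): a vector built from extracts where one index is not a constant:

    define <4 x i16> @var_idx(<8 x i16> %v, i32 %i) {
      %e0 = extractelement <8 x i16> %v, i32 %i    ; non-constant index
      %e1 = extractelement <8 x i16> %v, i32 1
      %r0 = insertelement <4 x i16> undef, i16 %e0, i32 0
      %r1 = insertelement <4 x i16> %r0, i16 %e1, i32 1
      ret <4 x i16> %r1
    }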
* [ARM][AArch64] Turn on by default interleaved access lowering (Silviu Baranga, 2015-09-01; 2 files, -2/+2)

  Summary: Interleaved access lowering removes a memory operation and a sequence of vector shuffles and replaces it with a series of memory operations. This should be always beneficial. This pass is only enabled on ARM/AArch64.

  Reviewers: rengolin

  Subscribers: aemerson, llvm-commits, rengolin

  Differential Revision: http://reviews.llvm.org/D12145

  llvm-svn: 246540
* Distribute the weight on the edge from switch to default statement to edges generated in lowering switch. (Cong Hou, 2015-09-01; 1 file, -1/+1)

  Currently, when edge weights are assigned to edges that are created when lowering switch statement, the weight on the edge to default statement (let's call it "default weight" here) is not considered. We need to distribute this weight properly. However, without value profiling, we have no idea how to distribute it. In this patch, I applied the heuristic that this weight is evenly distributed to successors.

  For example, given a switch statement with cases 1, 2, 3, 5, 10, 11, 20, where every edge from switch to each successor has weight 10: if there is a binary search tree built to test if n < 10, then its two out-edges will have weight 4x10 + 10/2 = 45 and 3x10 + 10/2 = 35 respectively (currently they are 40 and 30 without considering the default weight). Each distribution (which is 5 here) will be stored in each SwitchWorkListItem for further distribution.

  There are some exceptions:

  - For a jump table header which doesn't have any edge to default statement, we don't distribute the default weight to it.
  - For a bit test header which covers a contiguous range and hence has no edges to default statement, we don't distribute the default weight to it.
  - When the branch checks a single value or a contiguous range with no edge to default statement, we don't distribute the default weight to it.

  In other cases, the default weight is evenly distributed to successors.

  Differential Revision: http://reviews.llvm.org/D12418

  llvm-svn: 246522
* Fix CHECK directives that weren't checking. (Hans Wennborg, 2015-08-31; 1 file, -3/+2)

  llvm-svn: 246485
* [ARM] Fix up buildbots after r246360 (James Molloy, 2015-08-29; 1 file, -2/+2)

  I have no idea how I missed this in my internal testing. Just no idea. Sorry for the bot-armageddon.

  llvm-svn: 246361
* [ARM] Hoist fabs/fneg above a conversion to float. (James Molloy, 2015-08-29; 1 file, -0/+41)

  This is especially visible in softfp mode, for example in the implementation of libm fabs/fneg functions.

  If we have:

    %1 = vmovdrr r0, r1
    %2 = fabs %1

  then move the fabs before the vmovdrr:

    %1 = and r1, #0x7FFFFFFF
    %2 = vmovdrr r0, r1

  This is never a loss, and could be a serious win because the vmovdrr may be followed by a vmovrrd, which would enable us to remove the conversion into FPRs completely.

  We already do this for f32, but not for f64. Tests are added for both.

  llvm-svn: 246360
* DI: Update tests before adding !dbg subprogram attachments (Duncan P. N. Exon Smith, 2015-08-28; 1 file, -1/+1)

  I'm working on adding !dbg attachments to functions (PR23367), which we'll use to determine the canonical subprogram for a function (instead of the `subprograms:` array in the compile units). This updates a few old tests in preparation.

  Transforms/Mem2Reg/ConvertDebugInfo2.ll had an old-style grep+count based test that would start to fail because I've added an extra line with `!dbg`. Instead, explicitly `CHECK` for what I think the test actually cares about.

  All three testcases have subprograms with a valid `function:` reference -- which means my upgrade script will add a `!dbg` attachment -- but that aren't referenced from any compile unit. I suspect these testcases were hand-reduced over-zealously (or have bitrotted?). Add a reference from the compile unit so that upcoming Verifier checks won't fail here.

  llvm-svn: 246351
* DI: Require subprogram definitions to be distinct (Duncan P. N. Exon Smith, 2015-08-28; 20 files, -38/+38)

  As a follow-up to r246098, require `DISubprogram` definitions (`isDefinition: true`) to be 'distinct'. Specifically, add an assembler check, a verifier check, and bitcode upgrading logic to combat testcase bitrot after the `DIBuilder` change.

  While working on the testcases, I realized that test/Linker/subprogram-linkonce-weak-odr.ll isn't relevant anymore. Its purpose was to check for a corner case in PR22792 where two subprogram definitions match exactly and share the same metadata node. The new verifier check, requiring that subprogram definitions are 'distinct', precludes that possibility.

  I updated almost all the IR with the following script:

    git grep -l -E -e '= !DISubprogram\(.* isDefinition: true' |
    grep -v test/Bitcode |
    xargs sed -i '' -e 's/= \(!DISubprogram(.*, isDefinition: true\)/= distinct \1/'

  Likely some variant of the above would work for out-of-tree testcases.

  llvm-svn: 246327
* Assign weights to edges to jump table / bit test header when lowering switch statement. (Cong Hou, 2015-08-26; 1 file, -1/+1)

  Currently, when lowering switch statement and a new basic block is built for jump table / bit test header, the edge to this new block is not assigned with a correct weight. This patch collects the edge weight from all its successors and assigns this sum of weights to the edge (and also the other fall-through edge). Test cases are adjusted accordingly.

  Differential Revision: http://reviews.llvm.org/D12166#fae6eca7

  llvm-svn: 246104