bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	Fix unsupported addressing mode assertion for pld	David Peixotto	2014-01-27	2	-22/+18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This commit gives an address mode to the PLD instruction. We were getting an assertion failure in the frame lowering code because we had code that was doing a pld of a stack allocated address. The frame lowering was checking the address mode and then asserting because pld had none defined. This commit fixes pld for arm mode. There was a previous fix for thumb mode in a separate commit. The commit for thumb mode added a test in a separate file because it would otherwise fail for arm. This commit moves the thumb test back into the prefetch.ll file and adds the corresponding arm test. Differential Revision: http://llvm-reviews.chandlerc.com/D2622 llvm-svn: 200248
*	[DAGCombiner] Teach how to fold sext/aext/zext of constant build vectors.	Andrea Di Biagio	2014-01-27	3	-12/+308
\| \| \| \| \| \| \| \| \| \| \| \| \|	This patch teaches the DAGCombiner how to fold a sext/aext/zext dag node when the operand in input is a build vector of constants (or UNDEFs). The inability to fold a sext/zext of a constant build_vector was the root cause of some pcg bugs affecting vselect expansion on x86-64 with AVX support. Before this change, the DAGCombiner only knew how to fold a sext/zext/aext of a ConstantSDNode. llvm-svn: 200234
*	MC: Add support for .cfi_startproc simple	David Majnemer	2014-01-27	1	-4/+12
\| \| \| \| \| \| \| \| \| \| \| \| \|	This commit allows LLVM MC to process .cfi_startproc directives when they are followed by an additional `simple' identifier. This signals to elide the emission of target specific CFI instructions that would normally occur initially. This fixes PR16587. Differential Revision: http://llvm-reviews.chandlerc.com/D2624 llvm-svn: 200227
*	[vectorize] Initial version of respecting PGO in the vectorizer: treat	Chandler Carruth	2014-01-27	1	-0/+25
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	cold loops as-if they were being optimized for size. Nothing fancy here. Simple test case included. The nice thing is that we can now incrementally build on top of this to drive other heuristics. All of the infrastructure work is done to get the profile information into this layer. The remaining work necessary to make this a fully general purpose loop unroller for very hot loops is to make it a fully general purpose loop unroller. Things I know of but am not going to have time to benchmark and fix in the immediate future: 1) Don't disable the entire pass when the target is lacking vector registers. This really doesn't make any sense any more. 2) Teach the unroller at least and the vectorizer potentially to handle non-if-converted loops. This is trivial for the unroller but hard for the vectorizer. 3) Compute the relative hotness of the loop and thread that down to the various places that make cost tradeoffs (very likely only the unroller makes sense here, and then only when dealing with loops that are small enough for unrolling to not completely blow out the LSD). I'm still dubious how useful hotness information will be. So far, my experiments show that if we can get the correct logic for determining when unrolling actually helps performance, the code size impact is completely unimportant and we can unroll in all cases. But at least we'll no longer burn code size on cold code. One somewhat unrelated idea that I've had forever but not had time to implement: mark all functions which are only reachable via the global constructors rigging in the module as optsize. This would also decrease the impact of any more aggressive heuristics here on code size. llvm-svn: 200219
*	ConstantHoisting: We can't insert instructions directly in front of a PHI node.	Benjamin Kramer	2014-01-27	2	-0/+52
\| \| \| \| \| \|	Insert before the terminating instruction of the dominating block instead. llvm-svn: 200218
*	[vectorizer] Add an override for the target instruction cost and use it	Chandler Carruth	2014-01-27	1	-1/+1
\| \| \| \| \| \| \|	to stabilize a test that really is trying to test generic behavior and not a specific target's behavior. llvm-svn: 200215
*	[vectorizer] Teach the loop vectorizer's unroller to only unroll by	Chandler Carruth	2014-01-27	1	-1/+11
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	powers of two. This is essentially always the correct thing given the impact on alignment, scaling factors that can be used in addressing modes, etc. Also, fix the management of the unroll vs. small loop cost to more accurately model things with this world. Enhance a test case to actually exercise more of the unroll machinery if using synthetic constants rather than a specific target model. Before this change, with the added flags this test will unroll 3 times instead of either 2 or 4 (the two sensible answers). While I don't expect this to make a huge difference, if there are lots of loops sitting right on the edge of hitting the 'small unroll' factor, they might change behavior. However, I've benchmarked moving the small loop cost up and down in many various ways and by a huge factor (2x) without seeing more than 0.2% code size growth. Small adjustments such as the series that led up here have led to about 1% improvement on some benchmarks, but it is very close to the noise floor so I mostly checked that nothing regressed. Let me know if you see bad behavior on other targets but I don't expect this to be a sufficiently dramatic change to trigger anything. llvm-svn: 200213
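The power-of-two restriction described in r200213 can be illustrated with a small sketch. This is not the vectorizer's code: `floorPowerOfTwo` is a hypothetical helper, and rounding down (rather than up) is an assumption made here only for illustration. It shows how a computed factor of 3 would be clamped to one of the "sensible answers" mentioned in the message.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not taken from LoopVectorize.cpp: clamp an unroll
// factor down to the largest power of two that does not exceed it.
inline std::uint32_t floorPowerOfTwo(std::uint32_t n) {
    assert(n != 0 && "unroll factor must be at least 1");
    std::uint32_t p = 1;
    // Keep doubling while the next power of two still fits under n.
    // Comparing against n / 2 avoids overflow for large n.
    while (p <= n / 2)
        p *= 2;
    return p;
}
```

With this helper a requested factor of 3 becomes 2, and factors that are already powers of two pass through unchanged.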
*	Fix crasher introduced in r200203 and caught by a libc++ buildbot. Don't ↵	Nick Lewycky	2014-01-27	1	-0/+9
\| \| \| \| \| \|	assume that getMulExpr returns a SCEVMulExpr, it may have simplified it to something else! llvm-svn: 200210
*	Teach SCEV to handle more cases of 'and X, CST', specifically where CST is ↵	Nick Lewycky	2014-01-27	4	-15/+35
\| \| \| \| \| \| \| \| \| \|	any number of contiguous 1 bits in a row, with any number of leading and trailing 0 bits. Unfortunately, this in turn led to some lower quality SCEVs due to some different paths through expression simplification, so add getUDivExactExpr and use it. This fixes all instances of the problems that I found, but we can make that function smarter as necessary. Merge test "xor-and.ll" into "and-xor.ll" since I needed to update it anyways. Test 'nsw-offset.ll' analyzes a little deeper, %n now gets a scev in terms of %no instead of a SCEVUnknown. llvm-svn: 200203
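The class of constants handled by r200203 (one run of contiguous 1 bits, with any number of leading and trailing 0 bits) can be recognized with a couple of bit tricks. The sketch below is not SCEV code; `isContiguousOnes` and `andViaShifts` are hypothetical names, and the second function only illustrates the arithmetic identity that lets such an 'and' be rewritten as a shift down, a truncation to the width of the run, and a shift back up.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: true if mask is a single run of contiguous 1 bits,
// with any number of leading and trailing 0 bits.
inline bool isContiguousOnes(std::uint64_t mask) {
    if (mask == 0)
        return false;
    // Fill the trailing zeros with ones; a valid mask becomes 0...01...1.
    std::uint64_t filled = mask | (mask - 1);
    // A value of the form 0...01...1 shares no bits with its successor.
    return (filled & (filled + 1)) == 0;
}

// Illustrative identity only: for such a mask, x & mask equals x shifted
// down past the trailing zeros, truncated to the run width, and shifted back.
inline std::uint64_t andViaShifts(std::uint64_t x, std::uint64_t mask) {
    assert(isContiguousOnes(mask) && "mask must be one run of 1 bits");
    unsigned trailingZeros = 0;
    while (((mask >> trailingZeros) & 1) == 0)
        ++trailingZeros;
    std::uint64_t run = mask >> trailingZeros; // low bits all set
    return ((x >> trailingZeros) & run) << trailingZeros;
}
```

For example, `0x0FF0` qualifies while `0x0F0F` does not, and `andViaShifts(x, 0x0FF0)` agrees with `x & 0x0FF0` for any `x`.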
*	Additional fix for 200201: due to dependence on bitwidth test was moved to ↵	Stepan Dyatkovskiy	2014-01-27	1	-0/+0
\| \| \| \| \| \|	X86 directory. llvm-svn: 200202
*	Fix for PR18102.	Stepan Dyatkovskiy	2014-01-27	1	-0/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Issue outcomes from DAGCombiner::MergeConsequtiveStores, more precisely from mem-ops sequence sorting. Consider, how MergeConsequtiveStores works for next example: store i8 1, a[0] store i8 2, a[1] store i8 3, a[1] ; a[1] again. return ; DAG starts here 1. Method will collect all the 3 stores. 2. It sorts them by distance from the base pointer (farthest with highest index). 3. It takes first consecutive non-overlapping stores and (if possible) replaces them with a single store instruction. The point is, we can't determine here which 'store' instruction would be the second after sorting ('store 2' or 'store 3'). It happens that 'store 3' would be the second, and 'store 2' would be the third. So after merging we have the next result: store i16 (1 \| 3 << 8), base ; is a[0] but bit-casted to i16 store i8 2, a[1] So actually we swapped 'store 3' and 'store 2' and got wrong contents in a[1]. Fix: In sort routine just also take into account mem-op sequence number. llvm-svn: 200201
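The fix in r200201 amounts to adding a tie-breaker to the sort. The sketch below is not the DAGCombiner code: `StoreRef`, `sortStores` and `finalValueAt` are hypothetical names, and it assumes that a larger `sequence` means a later store in program order. It only shows why comparing the sequence number makes the order of two stores to the same offset deterministic.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical record for a candidate store; field names are illustrative.
struct StoreRef {
    std::int64_t offset;  // distance from the base pointer, in bytes
    unsigned sequence;    // assumed: position of the store in program order
    std::uint8_t value;   // constant being stored
};

// Sort by offset first; break ties with the sequence number so that two
// stores to the same address never swap places arbitrarily.
inline void sortStores(std::vector<StoreRef>& stores) {
    std::sort(stores.begin(), stores.end(),
              [](const StoreRef& a, const StoreRef& b) {
                  if (a.offset != b.offset)
                      return a.offset < b.offset;
                  return a.sequence < b.sequence;
              });
}

// After sortStores, the last entry for an offset is the latest store to it,
// so its value is the one that must remain in memory.
inline std::uint8_t finalValueAt(const std::vector<StoreRef>& sorted,
                                 std::int64_t offset) {
    bool found = false;
    std::uint8_t value = 0;
    for (const StoreRef& s : sorted) {
        if (s.offset == offset) {
            value = s.value;
            found = true;
        }
    }
    assert(found && "no store writes this offset");
    return value;
}
```

Applied to the three stores from the commit message, the sorted order is 'store 1', 'store 2', 'store 3', and the value left in a[1] is 3 regardless of the order in which the stores were collected.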
*	R600/SI: Add intrinsic for BUFFER_LOAD_DWORD* instructions	Michel Danzer	2014-01-27	1	-0/+40
\| \| \| \| \|	Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 200196
*	R600/SI: Add intrinsic for S_SENDMSG instruction	Michel Danzer	2014-01-27	1	-0/+21
\| \| \| \| \|	Reviewed-by: Tom Stellard <thomas.stellard@amd.com> llvm-svn: 200195
*	Rename IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VA.	Rui Ueyama	2014-01-27	1	-1/+1
\| \| \| \| \| \| \|	editbin.exe and link.exe both accept the /highentropyva option to set this bit, so doing s/VIRTUAL_ADDRESS/VA/ should make sense. llvm-svn: 200191
*	[AArch64 NEON] Try to generate CONCAT_VECTOR when lowering BUILD_VECTOR or ↵	Kevin Qin	2014-01-27	1	-0/+270
\| \| \| \| \| \| \| \|	SHUFFLE_VECTOR. Replace r199791. llvm-svn: 200180
*	Revert r199791.	Kevin Qin	2014-01-27	1	-270/+0
\| \| \| \| \| \|	It's an old version which has some bugs. I'll commit the latest patch soon. llvm-svn: 200179
*	MC: fix test locations/name	Saleem Abdulrasool	2014-01-26	2	-0/+0
\| \| \| \| \| \| \|	Placed the MC variant diagnostics in the wrong directory accidentally. Move them into their respective architecture specific directories. llvm-svn: 200161
*	ARM: improve diagnostics for .word directive	Saleem Abdulrasool	2014-01-26	1	-0/+12
\| \| \| \| \| \| \| \| \|	If a complex expression was passed to the .word directive and the first part of the directive failed to parse, a secondary diagnostic would be produced that would clutter the error diagnostics. Improve the diagnostics by consuming the remainder of the statement. llvm-svn: 200160
*	AsmParser: improve diagnostics for invalid variants	Saleem Abdulrasool	2014-01-26	2	-0/+24
\| \| \| \| \| \| \| \| \|	An emitted diagnostic for an invalid relocation variant would place the caret on the token following the relocation variant indicator or at the end of the line if there was no following token. This change corrects the placement of the caret to point to the token. llvm-svn: 200159
*	Clean up the Legal/Expand logic for SPARC popc.	Jakob Stoklund Olesen	2014-01-26	1	-2/+2
\| \| \| \|	llvm-svn: 200141
*	Implement the missing bits corresponding to .mips_hack_elf_flags.	Rafael Espindola	2014-01-26	7	-33/+87
\| \| \| \| \| \| \| \| \| \| \| \|	These were: * noreorder handling on the target object streamer and asm parser. * setting the initial flag bits based on the enabled features. * setting the elf header flag for micromips It is really depressing I am the one doing this instead of someone at mips actually taking the time to understand the infrastructure. llvm-svn: 200138
*	Only generate the popc instruction for SPARC CPUs that implement it.	Jakob Stoklund Olesen	2014-01-26	1	-6/+6
\| \| \| \| \| \| \|	The popc instruction is defined in the SPARCv9 instruction set architecture, but it was emulated on CPUs older than Niagara 2. llvm-svn: 200131
*	Fix swapped CASA operands.	Jakob Stoklund Olesen	2014-01-26	1	-2/+2
\| \| \| \| \| \|	Found by SingleSource/UnitTests/AtomicOps.c llvm-svn: 200130
*	[Sparc] Add support for parsing DW_CFA_GNU_window_save.	Venkatraman Govindaraju	2014-01-26	2	-0/+74
\| \| \| \|	llvm-svn: 200127
*	COFF: Add a missing enum value for high entropy ASLR.	Rui Ueyama	2014-01-26	1	-0/+1
\| \| \| \| \| \| \| \| \|	That bit is not documented in the PE/COFF spec published by Microsoft, so we don't know the official name of it. I named this bit IMAGE_DLL_CHARACTERISTICS_HIGH_ENTROPY_VIRTUAL_ADDRESS because the bit is reported as "high entropy virtual address" by dumpbin.exe. llvm-svn: 200121
*	Improve pattern match from v1i8 to v1i32 for AArch64 Neon.	Jiangning Liu	2014-01-26	1	-2/+1
\| \| \| \|	llvm-svn: 200119
*	llvm-readobj: add support for PE32+ (Windows 64 bit executable).	Rui Ueyama	2014-01-26	2	-0/+82
\| \| \| \| \| \| \| \| \| \| \| \|	PE32+ supports 64 bit address space, but the file format remains 32 bit. So its file format is pretty similar to PE32 (32 bit executable). The differences compared to PE32 are (1) the lack of "BaseOfData" field and (2) some of its data members are 64 bit. In this patch, I added a new member function to get a PE32+ Header object to COFFObjectFile class and made llvm-readobj use it. llvm-svn: 200117
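The two differences listed in r200117 can be made concrete with a short sketch. This is not the COFFObjectFile API: `PEKind`, `classifyOptionalHeader`, `imageBaseOffset` and `imageBaseSize` are hypothetical names. The magic values (0x10b for PE32, 0x20b for PE32+) and the field offsets are taken from the published PE/COFF optional header layout, not from the commit itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical classification of the optional header by its Magic field.
enum class PEKind { PE32, PE32Plus, Unknown };

inline PEKind classifyOptionalHeader(std::uint16_t magic) {
    switch (magic) {
    case 0x10b: return PEKind::PE32;      // 32 bit executable
    case 0x20b: return PEKind::PE32Plus;  // 64 bit address space
    default:    return PEKind::Unknown;
    }
}

// Byte offset of ImageBase inside the optional header. PE32 has a 4-byte
// BaseOfData field at offset 24, which pushes ImageBase to offset 28;
// PE32+ has no BaseOfData, so ImageBase starts at offset 24.
inline std::size_t imageBaseOffset(PEKind kind) {
    assert(kind != PEKind::Unknown && "unrecognized optional header");
    return kind == PEKind::PE32 ? 28 : 24;
}

// Width of ImageBase in bytes: one of the members that grows to 64 bit.
inline std::size_t imageBaseSize(PEKind kind) {
    assert(kind != PEKind::Unknown && "unrecognized optional header");
    return kind == PEKind::PE32 ? 4 : 8;
}
```

In both formats the field after ImageBase starts at offset 32, which is one way to see why the two layouts stay so similar.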
*	Remove -print-hack-directives from a test where we already do the right thing.	Rafael Espindola	2014-01-26	1	-1/+1
\| \| \| \|	llvm-svn: 200116
*	Move tests that just use llc from test/MC/Mips to test/MC/Codegen.	Rafael Espindola	2014-01-26	4	-0/+0
\| \| \| \| \| \|	This is an expanded version of r200064. llvm-svn: 200115
*	Implement pattern match from v1xx to v1xx for AArch64 Neon.	Jiangning Liu	2014-01-26	1	-0/+114
\| \| \| \|	llvm-svn: 200113
*	[Sparc] Add support for sparc relocation types in ELF object file.	Venkatraman Govindaraju	2014-01-26	1	-0/+13
\| \| \| \|	llvm-svn: 200112
*	[AArch64 NEON] Add patterns for concat_vector on v2i32.	Kevin Qin	2014-01-26	1	-19/+46
\| \| \| \|	llvm-svn: 200111
*	[AArch64 NEON] Add test case for vector FP_ROUND.	Kevin Qin	2014-01-26	1	-0/+18
\| \| \| \|	llvm-svn: 200110
*	Re-enabling MCJIT tests on ARM	Renato Golin	2014-01-25	1	-2/+2
\| \| \| \| \| \| \| \| \|	After several refactorings on the MCJIT remote communication, things are finally looking good on Clang-compiled LLVM regarding MCJIT remote tests, so I'm re-enabling them to see how the self-hosting buildbot behaves over a longer period. llvm-svn: 200102
*	[Sparc] Add sparc to the list of XFAIL architecture. It seems that the ↵	Venkatraman Govindaraju	2014-01-25	1	-1/+1
\| \| \| \| \| \|	llvm-cov test is not supported in big-endian architectures. llvm-svn: 200101
*	Add a TBAA CodeGen failure test case	Hal Finkel	2014-01-25	1	-0/+41
\| \| \| \| \| \| \| \| \|	I disabled the use of TBAA in CodeGen in r200093. This adds a test case that demonstrates the problems with inttoptr and TBAA in CodeGen (and, specifically, the problem that causes LLVM to miscompile itself in Release mode). This test will currently fail if -use-tbaa-in-sched-mi is enabled. llvm-svn: 200097
*	XFAIL test/CodeGen/SystemZ/alias-01.ll which requires CodeGen TBAA	Hal Finkel	2014-01-25	1	-0/+3
\| \| \| \|	llvm-svn: 200094
*	Fix "llvm-objdump -d -r" to show relocations inline for ELF files	Mark Seaborn	2014-01-25	1	-0/+35
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This fixes a regression introduced by r182908, which broke llvm-objdump's ability to display relocations inline in a disassembly dump for ELF object files. That change removed a SectionRelocMap from Object/ELF.h, which we recreate in llvm-objdump.cpp. I discovered this regression via an out-of-tree test (test/NaCl/X86/pnacl-hides-sandbox-x86-64.ll) which used llvm-objdump. Note that the "Unknown" string in the test output on i386 isn't quite right, but this appears to be a pre-existing bug. Differential Revision: http://llvm-reviews.chandlerc.com/D2559 llvm-svn: 200090
*	Reverting r199886 (Prevent repetitive warnings for unrecognized processors ↵	Artyom Skrobov	2014-01-25	1	-15/+0
\| \| \| \| \| \|	and features) llvm-svn: 200083
*	This reverts commit r200064 and r200051.	Rafael Espindola	2014-01-25	4	-137/+78
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	r200064 depends on r200051. r200051 is broken: It tries to replace .mips_hack_elf_flags, which is a good thing, but what it replaces it with is even worse. The new emitMipsELFFlags it adds corresponds to no assembly directive, is not marked as a hack and is not even printed to the .s file. The patch also introduces more uses of hasRawTextSupport. The correct way to remove .mips_hack_elf_flags is to have the mips target streamer handle the default flags (and command line options). That way the same code path is used for asm and obj. The streamer interface should really correspond to what is printed in the .s file. llvm-svn: 200078
*	[LPM] Make LCSSA a utility with a FunctionPass that applies it to all	Chandler Carruth	2014-01-25	3	-14/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	the loops in a function, and teach LICM to work in the presence of LCSSA. Previously, LCSSA was a loop pass. That made passes requiring it also be loop passes and unable to depend on function analysis passes easily. It also caused outer loops to have a different "canonical" form from inner loops during analysis. Instead, we go into LCSSA form and preserve it through the loop pass manager run. Note that this has the same problem as LoopSimplify that prevents enabling its verification -- loop passes which run at the end of the loop pass manager and don't preserve these are valid, but the subsequent loop pass runs of outer loops that do preserve this pass trigger too much verification and fail because the inner loop no longer verifies. The other problem this exposed is that LICM was completely unable to handle LCSSA form. It didn't preserve it and it actually would give up on moving instructions in many cases when they were used by an LCSSA phi node. I've taught LICM to support detecting LCSSA-form PHI nodes and to hoist and sink around them. This may actually let LICM fire significantly more because we put everything into LCSSA form to rotate the loop before running LICM. =/ Now LICM should handle that fine and preserve it correctly. The down side is that LICM has to require LCSSA in order to preserve it. This is just a fact of life for LCSSA. It's entirely possible we should completely remove LCSSA from the optimizer. The test updates are essentially accommodating LCSSA phi nodes in the output of LICM, and the fact that we now completely sink every instruction in ashr-crash below the loop bodies prior to unrolling. With this change, LCSSA is computed only three times in the pass pipeline. One of them could be removed (and potentially a SCEV run and a separate LoopPassManager entirely!) if we had a LoopPass variant of InstCombine that ran InstCombine on the loop body but refused to combine away LCSSA PHI nodes. Currently, this also prevents loop unrolling from being in the same loop pass manager as rotate, LICM, and unswitch. There is one thing that I really don't like -- preserving LCSSA in LICM is quite expensive. We end up having to re-run LCSSA twice for some loops after LICM runs because LICM can undo LCSSA both in the current loop and the parent loop. I don't really see good solutions to this other than to completely move away from LCSSA and using tools like SSAUpdater instead. llvm-svn: 200067
*	[Mips] Move 2 test cases from MC to CodeGen.	Jack Carter	2014-01-25	2	-0/+0
\| \| \| \| \| \|	No code changes. Just reassignment of test case files. llvm-svn: 200064
*	Revert "Revert "Add Constant Hoisting Pass" (r200034)"	Juergen Ributzka	2014-01-25	2	-2/+71
\| \| \| \| \| \| \|	This reverts commit r200058 and adds the using directive for ARMTargetTransformInfo to silence two g++ overload warnings. llvm-svn: 200062
*	Revert "Add Constant Hoisting Pass" (r200034)	Hans Wennborg	2014-01-25	2	-71/+2
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This commit caused -Woverloaded-virtual warnings. The two new TargetTransformInfo::getIntImmCost functions were only added to the superclass, and to the X86 subclass. The other targets were not updated, and the warning highlighted this by pointing out that e.g. ARMTTI::getIntImmCost was hiding the two new getIntImmCost variants. We could pacify the warning by adding "using TargetTransformInfo::getIntImmCost" to the various subclasses, or turning it off, but I suspect that it's wrong to leave the functions unimplemented in those targets. The default implementations return TCC_Free, which I don't think is right e.g. for ARM. llvm-svn: 200058
*	[Mips] TargetStreamer ELF flag Support for default and commandline options.	Jack Carter	2014-01-25	3	-34/+93
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch uses a common MipsTargetStreamer interface for both MipsAsmPrinter and MipsAsmParser for recording default and commandline driven directives that affect ELF header flags. It has been noted that the .ll tests affected by this patch belong in test/Codegen/Mips. I will move them in a separate patch. Also, a number of directives do not get expressed by AsmPrinter in the resultant .s assembly such as setting the correct ASI. I have noted this in the tests and they will be addressed in later patches. llvm-svn: 200051
*	[AArch64] Removed unused i8 type from FPR8 register class.	Ana Pazos	2014-01-24	1	-0/+47
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	The i8 type is not registered with any register class. This causes a segmentation fault in MachineLICM::getRegisterClassIDAndCost. The code selects the first type associated with register class FPR8, which happens to be i8. It uses this type (i8) to get the representative class pointer, which is 0. It then uses this pointer to access a field, resulting in segmentation fault. Since i8 type is not being used for printing any neon instruction we can safely remove it. llvm-svn: 200046
*	Add Constant Hoisting Pass	Juergen Ributzka	2014-01-24	2	-2/+71
\| \| \| \| \| \| \| \|	Retry commit r200022 with a fix for the build bot errors. Constant expressions have (unlike instructions) module scope use lists and therefore may have users in different functions. The fix is to simply ignore these out-of-function uses. llvm-svn: 200034
*	Verify that attributes are not lost during linking.	Bill Wendling	2014-01-24	2	-0/+22
\| \| \| \| \| \| \| \|	We don't want to lose attributes when a function decl without them is merged with a function decl that has them. PR2382 llvm-svn: 200030
*	InstCombine: Don't try to use aggregate elements of ConstantExprs.	Benjamin Kramer	2014-01-24	1	-0/+8
\| \| \| \| \| \|	PR18600. llvm-svn: 200028
*	Add a testcase for the changes in r199938.	Lang Hames	2014-01-24	1	-3/+21
\| \| \| \| \| \|	<rdar://problem/15611947> llvm-svn: 200027