bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[SROA] Teach SROA how to much more intelligently handle split loads and stores.	Chandler Carruth	2015-01-01	2	-79/+36
	When there are accesses to an entire alloca with an integer load or store as well as accesses to small pieces of the alloca, SROA splits up the large integer accesses. In order to do that, it uses bit math to merge the small accesses into large integers. While this is effective, it produces insane IR that can cause significant problems in the rest of the optimizer:

	- It can cause load and store mismatches with GVN on the non-alloca side where we end up loading an i64 (or some such) rather than loading specific elements that are stored.
	- We can't always get rid of the integer bit math, which is why we can't always fix the loads and stores to work well with GVN.
	- This is especially bad when we have operations that mix poorly with integer bit math such as floating point operations.
	- It will block things like the vectorizer which might be able to handle the scalar stores that underlie the aggregate.

	At the same time, we can't just directly split up these loads and stores in all cases. If there is actual integer arithmetic involved on the values, then using integer bit math is actually the perfect lowering because we can often combine it heavily with the surrounding math.

	The solution this patch provides is to find places where SROA is partitioning aggregates into small elements, and look for splittable loads and stores that it can split all the way to some other adjacent load and store. These are uniformly the cases where failing to split the loads and stores hurts the optimizer that I have seen, and I've looked extensively at the code produced both from more and less aggressive approaches to this problem.

	However, it is quite tricky to actually do this in SROA. We may have loads and stores to the same alloca, or other complex patterns that are hard to handle. This complexity leads to the somewhat subtle algorithm implemented here. We have to do this entire process as a separate pass over the partitioning of the alloca, and split up all of the loads prior to splitting the stores so that we can safely handle the cases of overlapping, including partially overlapping, loads and stores to the same alloca. We also have to reconstitute the post-split slice configuration so we can avoid iterating again over all the alloca uses (the slow part of SROA). But we also have to ensure that when we split up loads and stores to other allocas, we do re-iterate over them in SROA to adapt to the more refined partitioning now required.

	With this, I actually think we can fix a long-standing TODO in SROA where I avoided splitting as many loads and stores as probably should be splittable. This limitation historically mitigated the fallout of all the bad things mentioned above. Now that we have more intelligent handling, I plan to remove the FIXME and more aggressively mark integer loads and stores as splittable. I'll do that in a follow-up patch to help with bisecting any fallout.

	The net result of this change should be more fine-grained and accurate scalars being formed out of aggregates. At the very least, Clang now generates perfect code for this high-level test case using std::complex<float>:

	    #include <complex>

	    void g1(std::complex<float> &x, float a, float b) {
	      x += std::complex<float>(a, b);
	    }

	    void g2(std::complex<float> &x, float a, float b) {
	      x -= std::complex<float>(a, b);
	    }

	    void foo(const std::complex<float> &x, float a, float b,
	             std::complex<float> &x1, std::complex<float> &x2) {
	      std::complex<float> l1 = x;
	      g1(l1, a, b);
	      std::complex<float> l2 = x;
	      g2(l2, a, b);
	      x1 = l1;
	      x2 = l2;
	    }

	This code isn't just hypothetical either. It was reduced out of the hot inner loops of essentially every part of the Eigen math library when using std::complex<float>. Those loops would consistently and pervasively hop between the floating point unit and the integer unit due to bit math extraction and insertion of floating point values that were "stored" in a 64-bit integer register around the loop backedge.

	So far, this change has passed a bootstrap and some other testing with no issues. That doesn't mean there won't be any though, so I'll be prepared to help with any fallout. If you see performance swings in particular, please let me know. I'm very curious what all the impact of this change will be. Stay tuned for the follow-up to also split more integer loads and stores.

	llvm-svn: 225061
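To make the "integer bit math" concrete, here is a minimal C++ sketch of what merging two float pieces into one 64-bit integer and extracting them again looks like at the source level. It is not SROA code: the helper names (`pack_complex`, `unpack_re`, `unpack_im`) are hypothetical, and placing the real part in the low 32 bits is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helpers for illustration only; none of this is LLVM code.
static std::uint32_t float_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);  // well-defined bit cast
    return u;
}

static float bits_float(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Merge two floats into one 64-bit integer with shift/or bit math.
// Assumption: the real part goes in the low half, the imaginary part in the high half.
std::uint64_t pack_complex(float re, float im) {
    return std::uint64_t(float_bits(re)) | (std::uint64_t(float_bits(im)) << 32);
}

// Extract each float again with truncation/shift bit math.
float unpack_re(std::uint64_t v) { return bits_float(std::uint32_t(v)); }
float unpack_im(std::uint64_t v) { return bits_float(std::uint32_t(v >> 32)); }

// Round-trip check: the values survive, but every access goes through integer ops.
void check_roundtrip() {
    std::uint64_t v = pack_complex(1.5f, -2.25f);
    assert(unpack_re(v) == 1.5f);
    assert(unpack_im(v) == -2.25f);
}
```

Every float access in this form passes through integer shifts and ors, which is the kind of hopping between the floating point and integer units that the message describes. Keeping the two floats as separate scalars avoids it.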
*	[PowerPC] Improve instruction selection bit-permuting operations (64-bit)	Hal Finkel	2015-01-01	1	-0/+239
	This is the second installment of improvements to instruction selection for "bit permutation" instruction sequences. r224318 added logic for instruction selection for 32-bit bit permutation sequences, and this adds lowering for 64-bit sequences. The 64-bit sequences are more complicated than the 32-bit ones because:

	a) the 64-bit versions of the 32-bit rotate-and-mask instructions work by replicating the lower 32 bits of the value-to-be-rotated into the upper 32 bits -- and integrating this into the cost modeling for the various bit group operations is non-trivial

	b) unlike the 32-bit instructions in 32-bit mode, the rotate-and-mask instructions cannot, in one instruction, specify the mask starting index, the mask ending index, and the rotation factor.

	Also, forming arbitrary 64-bit constants is more complicated than in 32-bit mode because the number of instructions necessary is value dependent.

	Plus, support for 'late masking' was added: it is sometimes more efficient to treat the overall value as if it had no mandatory zero bits when planning the bit-group insertions, and then mask them in at the very end. Unfortunately, as the structure of the bit groups is different in the two cases, the more feasible implementation technique was to generate both instruction sequences, and then pick the shorter one.

	And finally, we now generate reasonable code for i64 bswap:

	    rldicl 5, 3, 16, 0
	    rldicl 4, 3, 8, 0
	    rldicl 6, 3, 24, 0
	    rldimi 4, 5, 8, 48
	    rldicl 5, 3, 32, 0
	    rldimi 4, 6, 16, 40
	    rldicl 6, 3, 48, 0
	    rldimi 4, 5, 24, 32
	    rldicl 5, 3, 56, 0
	    rldimi 4, 6, 40, 16
	    rldimi 4, 5, 48, 8
	    rldimi 4, 3, 56, 0

	vs. what we used to produce:

	    li 4, 255
	    rldicl 5, 3, 24, 40
	    rldicl 6, 3, 40, 24
	    rldicl 7, 3, 56, 8
	    sldi 8, 3, 8
	    sldi 10, 3, 24
	    sldi 12, 3, 40
	    rldicl 0, 3, 8, 56
	    sldi 9, 4, 32
	    sldi 11, 4, 40
	    sldi 4, 4, 48
	    andi. 5, 5, 65280
	    andis. 6, 6, 255
	    andis. 7, 7, 65280
	    sldi 3, 3, 56
	    and 8, 8, 9
	    and 4, 12, 4
	    and 9, 10, 11
	    or 6, 7, 6
	    or 5, 5, 0
	    or 3, 3, 4
	    or 7, 9, 8
	    or 4, 6, 5
	    or 3, 3, 7
	    or 3, 3, 4

	which is 12 instructions, instead of 25, and seems optimal (at least in terms of code size).

	llvm-svn: 225056
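The new 12-instruction sequence can be checked by modelling the two instructions it uses in C++. This is a sketch, not LLVM code: the functions `rldicl` and `rldimi` model only the register results (rotate left then mask, and rotate left then insert under a mask running from bit MB to bit 63-SH, with bit 0 as the most significant bit), and the mask helper assumes the mask does not wrap around.

```cpp
#include <cassert>
#include <cstdint>

// Rotate left by n bits, n in [0, 63].
static std::uint64_t rotl64(std::uint64_t x, unsigned n) {
    return (x << n) | (x >> ((64 - n) & 63));
}

// Mask with bits mb..me set, using PowerPC numbering (bit 0 is the MSB).
// Assumption of this sketch: mb <= me, so the mask never wraps.
static std::uint64_t mask64(unsigned mb, unsigned me) {
    assert(mb <= me && me <= 63);
    return (~0ULL >> mb) & (~0ULL << (63 - me));
}

// Model of "rldicl RA, RS, SH, MB": rotate left, then clear bits left of MB.
static std::uint64_t rldicl(std::uint64_t rs, unsigned sh, unsigned mb) {
    return rotl64(rs, sh) & mask64(mb, 63);
}

// Model of "rldimi RA, RS, SH, MB": rotate left, then insert bits MB..63-SH into RA.
static std::uint64_t rldimi(std::uint64_t ra, std::uint64_t rs, unsigned sh, unsigned mb) {
    const std::uint64_t m = mask64(mb, 63 - sh);
    return (rotl64(rs, sh) & m) | (ra & ~m);
}

// The 12-instruction i64 bswap sequence from the commit message, with the
// input in r3 and the result in r4.
std::uint64_t bswap64_via_rotates(std::uint64_t r3) {
    std::uint64_t r5 = rldicl(r3, 16, 0);
    std::uint64_t r4 = rldicl(r3, 8, 0);
    std::uint64_t r6 = rldicl(r3, 24, 0);
    r4 = rldimi(r4, r5, 8, 48);
    r5 = rldicl(r3, 32, 0);
    r4 = rldimi(r4, r6, 16, 40);
    r6 = rldicl(r3, 48, 0);
    r4 = rldimi(r4, r5, 24, 32);
    r5 = rldicl(r3, 56, 0);
    r4 = rldimi(r4, r6, 40, 16);
    r4 = rldimi(r4, r5, 48, 8);
    r4 = rldimi(r4, r3, 56, 0);
    return r4;
}
```

The first rotate by 8 already leaves two of the eight bytes in their final positions, so only six inserts are needed to place the remaining bytes.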
*	InstCombine: fsub nsz 0, X ==> fsub nsz -0.0, X	Sanjay Patel	2014-12-31	1	-0/+8
	Some day the backend may handle instruction-level fast math flags and make this transform unnecessary, but it's still better practice to use the canonical representation of fneg when possible (use a -0.0). This is a partial fix for PR20870 ( http://llvm.org/bugs/show_bug.cgi?id=20870 ). See also http://reviews.llvm.org/D6723. Differential Revision: http://reviews.llvm.org/D6731 llvm-svn: 225050
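Why -0.0 is the canonical constant for negation can be seen with ordinary IEEE-754 doubles. The sketch below uses plain C++ arithmetic rather than LLVM IR, assumes the default round-to-nearest mode and no fast-math compiler flags, and uses hypothetical function names.

```cpp
#include <cassert>
#include <cmath>

// 0.0 - x: differs from -x when x is +0.0 (the result is +0.0, not -0.0).
double sub_from_pos_zero(double x) { return 0.0 - x; }

// -0.0 - x: matches -x for both signed zeros, so it can stand in for fneg.
double sub_from_neg_zero(double x) { return -0.0 - x; }

void check_signed_zero() {
    // For x = +0.0, true negation yields -0.0 (sign bit set).
    assert(!std::signbit(sub_from_pos_zero(0.0)));  // +0.0: sign is lost
    assert(std::signbit(sub_from_neg_zero(0.0)));   // -0.0: sign is kept
    // For non-zero inputs both forms agree.
    assert(sub_from_pos_zero(2.5) == sub_from_neg_zero(2.5));
}
```

The two forms differ only in the sign of a zero result. That is why the rewrite is restricted to `nsz` (no signed zeros) operations, where that sign is allowed to be ignored.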
*	Add r224985 back with a fix.	Rafael Espindola	2014-12-31	3	-0/+124
	The issue was that AArch64 has additional restrictions on when local relocations can be used. We have to take those into consideration when deciding whether to put an L symbol in the symbol table.

	Original message:

	Remove doesSectionRequireSymbols.

	In an assembly expression like

	    bar:
	    .long L0 + 1

	the intended semantics is that bar will contain a pointer one byte past L0.

	In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next.

	The solution used in ELF is to use relocation with symbols if there is a non-zero addend.

	In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol.

	This patch implements the non-zero addend logic for MachO too.

	llvm-svn: 225048
*	Reverting 225045 and 225043 and XFAIL multiline.ll on hexagon	Colin LeMahieu	2014-12-31	1	-0/+1
	llvm-svn: 225047
*	Add a test for the recent compiler-rt build failure.	Rafael Espindola	2014-12-31	1	-0/+14
	llvm-svn: 225046
*	Revert "Remove doesSectionRequireSymbols."	Rafael Espindola	2014-12-31	3	-124/+0
	This reverts commit r224985. I am investigating why it made an Apple bot unhappy. llvm-svn: 225044
*	[X86] Update disassembler tests for absolute move instructions to check the encodings.	Craig Topper	2014-12-31	1	-37/+37
	This provides testing for r225036. 64-bit mode is still broken. llvm-svn: 225037
*	InstCombine: try to transform A-B < 0 into A < B	David Majnemer	2014-12-31	1	-0/+36
	We are allowed to move the 'B' to the right hand side if we can prove there is no signed overflow and if the comparison itself is signed. llvm-svn: 225034
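The no-overflow precondition can be checked exhaustively on a small type. The following C++ sketch models 8-bit wrapping subtraction (it is not InstCombine code, and the function names are hypothetical) and counts the input pairs for which "A - B < 0" and "A < B" disagree.

```cpp
#include <cassert>
#include <cstdint>

// Sign bit of the 8-bit wrapping difference a - b, for a and b in [-128, 127].
static bool wrapped_sub_is_negative(int a, int b) {
    const std::uint8_t d = static_cast<std::uint8_t>(a - b);  // modular truncation
    return (d & 0x80u) != 0;
}

// Count 8-bit pairs where "a - b < 0" (wrapping) disagrees with "a < b".
// With skip_overflowing set, pairs whose exact difference does not fit in
// 8 signed bits are ignored, which models the no-signed-overflow precondition.
int count_mismatches(bool skip_overflowing) {
    int mismatches = 0;
    for (int a = -128; a <= 127; ++a) {
        for (int b = -128; b <= 127; ++b) {
            const int exact = a - b;
            const bool overflows = exact < -128 || exact > 127;
            if (skip_overflowing && overflows) continue;
            if (wrapped_sub_is_negative(a, b) != (a < b)) ++mismatches;
        }
    }
    return mismatches;
}
```

With overflowing pairs excluded the count is zero. With them included, a pair such as A = -128, B = 1 wraps to a positive difference even though A < B, so the rewrite would be wrong without the precondition.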
*	Revert "merge consecutive stores of extracted vector elements"	Alexey Samsonov	2014-12-31	1	-33/+0
	This reverts commit r224611. This change causes crashes in X86 DAG->DAG Instruction Selection. llvm-svn: 225031
*	[Hexagon] Adding accumulating add/sub, doubleword logic-not variants, doubleword bitfield extract, word parity, accumulating multiplies with saturation.	Colin LeMahieu	2014-12-31	3	-0/+24
	llvm-svn: 225024
*	Fix a test case to not depend on asm comment syntax, so as to be portable	David Blaikie	2014-12-30	1	-9/+9
	Too many different comment characters - instead of trying to account for them all, disable the comments and just check for end-of-line. llvm-svn: 225020
*	Generalize even further, for ARM comment syntax (@)	David Blaikie	2014-12-30	1	-8/+8
	llvm-svn: 225019
*	[Hexagon] Adding double-logic on predicate instructions.	Colin LeMahieu	2014-12-30	1	-2/+22
	llvm-svn: 225018
*	Generalize test case to handle different asm syntax (# or // comments)	David Blaikie	2014-12-30	1	-8/+8
	llvm-svn: 225017
*	[Hexagon] Adding newvalue compare and jumps.	Colin LeMahieu	2014-12-30	1	-0/+134
	llvm-svn: 225015
*	DebugInfo: Omit is_stmt from line table entries on the same line.	David Blaikie	2014-12-30	2	-1/+82
	GCC does this for non-zero discriminators and since GCC doesn't produce column info, that was the only place it comes up there. For LLVM, since we can emit discriminators and/or column info, it makes more sense to invert the condition and just test for changes in line number.

	This should resolve at least some of the GDB 7.5 test suite failures created by recent Clang changes that increase the location fidelity (which, since Clang includes column info on Linux by default, created a bunch of cases that confused GDB).

	In theory we could do this better/differently by grouping actual source statements together in a similar manner to the way lexical scopes are handled, but given that GDB isn't really in a position to consume that (& users are probably somewhat used to different lines being different 'statements') this seems the safest and cheapest change. (I'm concerned that doing this 'right' would bloat the debugloc data even further - something Duncan's working hard to address)

	llvm-svn: 225011
*	[Hexagon] Adding postincrement register newvalue stores.	Colin LeMahieu	2014-12-30	1	-0/+9
	llvm-svn: 225010
*	[Hexagon] Removing old newvalue store variants. Adding postincrement immediate newvalue stores.	Colin LeMahieu	2014-12-30	1	-0/+51
	llvm-svn: 225009
*	[mips][microMIPS] Relocate with symbol for micromips symbols	Zoran Jovanovic	2014-12-30	1	-0/+16
	Differential Revision: http://reviews.llvm.org/D6796 llvm-svn: 225008
*	[Hexagon] Adding indexed store new-value variants.	Colin LeMahieu	2014-12-30	1	-0/+51
	llvm-svn: 225007
*	[Hexagon] Adding indexed store of immediates.	Colin LeMahieu	2014-12-30	1	-0/+36
	llvm-svn: 225006
*	[Hexagon] Adding indexed stores.	Colin LeMahieu	2014-12-30	2	-0/+115
	llvm-svn: 225005
*	x86_64: Fix calls to __morestack under the large code model.	Peter Collingbourne	2014-12-30	1	-0/+15
	Under the large code model, we cannot assume that __morestack lives within 2^31 bytes of the call site, so we cannot use pc-relative addressing. We cannot perform the call via a temporary register, as the rax register may be used to store the static chain, and all other suitable registers may be either callee-save or used for parameter passing. We cannot use the stack at this point either because __morestack manipulates the stack directly.

	To avoid these issues, perform an indirect call via a read-only memory location containing the address. This solution is not perfect, as it assumes that the .rodata section is laid out within 2^31 bytes of each function body, but this seems to be sufficient for JIT.

	Differential Revision: http://reviews.llvm.org/D6787 llvm-svn: 225003
*	[asan] change _sanitizer_cov_module_init to accept int* instead of int**	Kostya Serebryany	2014-12-30	1	-1/+1
	llvm-svn: 224999
*	[COFF] Don't try to add quotes to already quoted linker directives	Michael Kuperstein	2014-12-30	1	-1/+2
	If a linker directive is already quoted, don't try to quote it again, otherwise it creates a mess. This pops up in places like: #pragma comment(linker,"\"/foo bar'\"") Differential Revision: http://reviews.llvm.org/D6792 llvm-svn: 224998
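The rule described above amounts to an idempotent quoting helper. The C++ sketch below is not the LLVM implementation: the name `quote_directive` is hypothetical, and the conditions used (quote only when the directive contains a space and does not already start with a double quote) are assumptions chosen to illustrate the idea.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper, not the LLVM code: wrap a linker directive in double
// quotes only if it needs them and does not already carry them.
std::string quote_directive(const std::string& directive) {
    const bool has_space = directive.find(' ') != std::string::npos;
    const bool already_quoted = !directive.empty() && directive.front() == '"';
    if (has_space && !already_quoted) {
        return "\"" + directive + "\"";
    }
    return directive;
}

void check_quoting() {
    // A directive with a space gets quoted once.
    assert(quote_directive("/foo bar") == "\"/foo bar\"");
    // Quoting again is a no-op, so no doubled quotes appear.
    assert(quote_directive(quote_directive("/foo bar")) == "\"/foo bar\"");
}
```

Making the helper idempotent means it does not matter whether the quotes were added by the user in the pragma or by an earlier pass.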
*	[Hexagon] Adding reg-reg indexed load forms.	Colin LeMahieu	2014-12-30	3	-9/+81
	llvm-svn: 224997
*	[Hexagon] Adding compare byte/halfword reg-reg/reg-imm forms. Adding compare to general register reg-imm form.	Colin LeMahieu	2014-12-30	2	-0/+28
	llvm-svn: 224991
*	[Hexagon] Updating constant extender def, adding alu-not instructions, compare to general register, and inverted compares.	Colin LeMahieu	2014-12-30	2	-4/+17
	llvm-svn: 224989
*	Remove doesSectionRequireSymbols.	Rafael Espindola	2014-12-30	3	-0/+124
	In an assembly expression like

	    bar:
	    .long L0 + 1

	the intended semantics is that bar will contain a pointer one byte past L0.

	In sections that are merged by content (strings, 4 byte constants, etc), a single position in the section doesn't give the linker enough information. For example, it would not be able to tell a relocation must point to the end of a string, since that would look just like the start of the next.

	The solution used in ELF is to use relocation with symbols if there is a non-zero addend.

	In MachO before this patch we would just keep all symbols in some sections. This would miss some cases (only cstrings on x86_64 were implemented) and was inefficient since most relocations have an addend of 0 and can be represented without the symbol.

	This patch implements the non-zero addend logic for MachO too.

	llvm-svn: 224985
*	Simplify test a bit.	Rafael Espindola	2014-12-30	1	-516/+3
	It looks like the original intent was to check which symbols were created. With macho-dump the sections were being checked just to match which symbol was in which section. llvm-objdump prints the section a symbol is in. llvm-svn: 224980
*	[OCaml] Fix bitrot in tests.	Peter Zotov	2014-12-30	1	-2/+2
	llvm-svn: 224979
*	[lit] Make config.llvm_lib_dir available on cmake, too.	Peter Zotov	2014-12-30	2	-2/+2
	The OCaml tests require config.llvm_lib_dir to determine the OCaml package search path. llvm-svn: 224978
*	Testcases for r224939.	Craig Topper	2014-12-30	1	-0/+17
	llvm-svn: 224976
*	Convert test to llvm-readobj. NFC.	Rafael Espindola	2014-12-30	1	-127/+214
	llvm-svn: 224973
*	Semantic tests for memory invalidation at statepoints	Philip Reames	2014-12-29	1	-0/+108
	These are simply a collection of tests intended to show that information about the contents of gc references in the heap is lost at a statepoint. I've tried to write them so that they don't disallow correct transformations, while still being fairly easy to understand. p.s. Ideas for additional tests are welcome. Differential Revision: http://reviews.llvm.org/D6491 llvm-svn: 224971
*	Carry facts about nullness and undef across GC relocation	Philip Reames	2014-12-29	1	-0/+52
	This change implements four basic optimizations:

	- If a relocated value isn't used, it doesn't need to be relocated.
	- If the value being relocated is null, relocation doesn't change that. (Technically, this might be collector specific. I don't know of one which it doesn't work for though.)
	- If the value being relocated is undef, the relocation is meaningless.
	- If the value being relocated was known nonnull, the relocated pointer also isn't null. (Since it points to the same source language object.)

	I outlined other planned work in comments.

	Differential Revision: http://reviews.llvm.org/D6600 llvm-svn: 224968
*	Refine the notion of MayThrow in LICM to include a header specific version	Philip Reames	2014-12-29	1	-0/+69
	In LICM, we have a check for an instruction which is guaranteed to execute and thus can't introduce any new faults if moved to the preheader. To handle a function which might unconditionally throw when first called, we check for any potentially throwing call in the loop and give up.

	This is unfortunate when the potentially throwing condition is down a rare path. It prevents essentially all LICM of potentially faulting instructions where the faulting condition is checked outside the loop. It also greatly diminishes the utility of loop unswitching since control dependent instructions - which are now likely in the loop's header block - will not be lifted by subsequent LICM runs.

	    define void @nothrow_header(i64 %x, i64 %y, i1 %cond) {
	    ; CHECK-LABEL: nothrow_header
	    ; CHECK-LABEL: entry
	    ; CHECK: %div = udiv i64 %x, %y
	    ; CHECK-LABEL: loop
	    ; CHECK: call void @use(i64 %div)
	    entry:
	      br label %loop
	    loop: ; preds = %entry, %for.inc
	      %div = udiv i64 %x, %y
	      br i1 %cond, label %loop-if, label %exit
	    loop-if:
	      call void @use(i64 %div)
	      br label %loop
	    exit:
	      ret void
	    }

	The current patch really only helps with non-memory instructions (i.e. divs, etc.) since the maythrow call down the rare path will be considered to alias an otherwise hoistable load. The one exception is that it does kick in for loads which are known to be invariant without regard to other possible stores, i.e. those marked with either !invariant.load metadata or tbaa 'is constant memory' metadata.

	Differential Revision: http://reviews.llvm.org/D6725 llvm-svn: 224965
*	Loading from null is valid outside of addrspace 0	Philip Reames	2014-12-29	1	-0/+20
	This patch fixes a miscompile where we were assuming that loading from null is undefined and thus we could assume it doesn't happen. This transform is perfectly legal in address space 0, but is not necessarily legal in other address spaces. We really should introduce a hook to control this property on a per target per address space basis. We may be losing valuable optimizations in some address spaces by being too conservative. Original patch by Thomas P Raoux (submitted to llvm-commits), tests and formatting fixes by me. llvm-svn: 224961
*	Convert test to llvm-readobj. NFC.	Rafael Espindola	2014-12-29	1	-16/+41
	llvm-svn: 224959
*	[Hexagon] Adding allocframe, post-increment circular immediate stores, post-increment circular register stores, and bit reversed post-increment stores.	Colin LeMahieu	2014-12-29	1	-1/+34
	llvm-svn: 224957
*	[Hexagon] Adding post-increment register form stores and register-immediate form stores with tests.	Colin LeMahieu	2014-12-29	3	-2/+84
	llvm-svn: 224952
*	[Hexagon] Replacing the remaining postincrement stores with versions that have encoding bits.	Colin LeMahieu	2014-12-29	1	-1/+40
	llvm-svn: 224951
*	Convert test to FileCheck. NFC.	Rafael Espindola	2014-12-29	1	-876/+877
	llvm-svn: 224950
*	[Hexagon] Renaming old multiclass for removal. Adding post-increment store classes and instruction defs.	Colin LeMahieu	2014-12-29	1	-0/+14
	llvm-svn: 224949
*	Add segmented stack support for DragonFlyBSD.	Rafael Espindola	2014-12-29	1	-0/+108
	Patch by Michael Neumann. llvm-svn: 224936
*	llvm/test/CodeGen/X86/fast-isel-call-bool.ll: Add explicit -mtriple=x86_64-unknown to satisfy x64.	NAKAMURA Takumi	2014-12-28	1	-1/+1
	llvm-svn: 224907
*	[X86][ISel] Fix a regression I introduced in r224884	Keno Fischer	2014-12-28	2	-2/+14
	In the else case, ResultReg was not checked for validity. To my surprise, this case was not hit in any of the existing test cases. This includes a new test case that tests this path. Also drop the `target triple` declaration from the original test as suggested by H.J. Lu, because apparently with it the test won't be run on Linux. llvm-svn: 224901
*	[X86] Add missing memory variants to AVX false dependency breaking	Michael Kuperstein	2014-12-28	2	-62/+73
	Adds missing memory instruction variants to AVX false dependency breaking handling. (SSE was handled in r224246) Differential Revision: http://reviews.llvm.org/D6780 llvm-svn: 224900
*	[CodeGenPrepare] Teach when it is profitable to speculate calls to @llvm.cttz/ctlz.	Andrea Di Biagio	2014-12-28	1	-0/+250
	If the control flow is modelling an if-statement where the only instruction in the 'then' basic block (excluding the terminator) is a call to cttz/ctlz, CodeGenPrepare can try to speculate the cttz/ctlz call and simplify the control flow graph.

	Example:

	    entry:
	      %cmp = icmp eq i64 %val, 0
	      br i1 %cmp, label %end.bb, label %then.bb
	    then.bb:
	      %c = tail call i64 @llvm.cttz.i64(i64 %val, i1 true)
	      br label %end.bb
	    end.bb:
	      %cond = phi i64 [ %c, %then.bb ], [ 64, %entry]

	In this example, basic block %then.bb is taken if value %val is not zero. Also, the phi node in %end.bb would propagate the size-of in bits of %val only if %val is equal to zero.

	With this patch, CodeGenPrepare will try to hoist the call to cttz from %then.bb into basic block %entry only if cttz is cheap to speculate for the target.

	Added two new hooks in TargetLowering.h to let targets customize the behavior (i.e. decide whether it is cheap or not to speculate calls to cttz/ctlz). The two new methods are 'isCheapToSpeculateCtlz' and 'isCheapToSpeculateCttz'. By default, both methods return 'false'. On X86, method 'isCheapToSpeculateCtlz' returns true only if the target has LZCNT. Method 'isCheapToSpeculateCttz' only returns true if the target has BMI.

	Differential Revision: http://reviews.llvm.org/D6728 llvm-svn: 224899
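The before and after shapes of this transform can be sketched in C++. This is a model only, not CodeGenPrepare code: all function names are hypothetical, the zero-undefined intrinsic is modelled with an assertion, and the sketch assumes the hoisted call must be one that is well defined for a zero input.

```cpp
#include <cassert>
#include <cstdint>

// Model of a count-trailing-zeros operation that is defined for zero input
// and returns the bit width (64) in that case.
unsigned cttz_defined_at_zero(std::uint64_t v) {
    unsigned n = 0;
    while (n < 64 && ((v >> n) & 1u) == 0) ++n;
    return n;
}

// Model of a cttz whose result is undefined for zero: the assertion stands
// in for "undefined" in this sketch.
unsigned cttz_zero_undef(std::uint64_t v) {
    assert(v != 0 && "result is undefined for zero in this model");
    return cttz_defined_at_zero(v);
}

// Before: the call sits in the 'then' block, guarded by a branch on zero.
unsigned guarded(std::uint64_t v) {
    if (v == 0) return 64;
    return cttz_zero_undef(v);
}

// After: the call is executed unconditionally (speculated), and the branch
// is replaced by a select between the constant and the call result.
unsigned speculated(std::uint64_t v) {
    const unsigned c = cttz_defined_at_zero(v);
    return v == 0 ? 64u : c;
}
```

Both forms return the same value for every input. The speculated form trades a branch for an unconditional count, which only pays off when that count is a cheap instruction on the target.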