bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message	Author	Age	Files	Lines
*	[LTOs] Allow generation of hotness information	Adam Nemet	2016-12-02	2	-0/+13
	The flag is passed by the clang driver. Differential Revision: https://reviews.llvm.org/D27331 llvm-svn: 288519
*	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."	Renato Golin	2016-12-02	1	-74/+66
	This reverts commit r288497, as it broke the AArch64 build of Compiler-RT's builtins (twice: once in r288412 and once in r288497). We should investigate this offline. llvm-svn: 288508
*	[DAGCombiner] do not fold (fmul (fadd X, 1), Y) -> (fmad X, Y, Y) by default	Nicolai Haehnle	2016-12-02	1	-5/+11
	Summary: When X = 0 and Y = inf, the original code produces inf, but the transformed code produces nan. So this transform (and its relatives) should only be used when the no-infs-fp-math flag is explicitly enabled. Also disable the transform using fmad (intermediate rounding) when unsafe-math is not enabled, since it can reduce the precision of the result. Consider this example with binary floating point numbers with two bits of mantissa, where =r denotes a rounding step: x = 1.01, y = 111; x * (y + 1) = 1.01 * 1000 = 1010 (this is the exact result; no rounding occurs at any step); x * y + x = 1000.11 + 1.01 =r 1000 + 1.01 = 1001.01 =r 1000 (with rounding towards zero). The example relies on rounding towards zero at least in the second step. Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=98578 Reviewers: RKSimon, tstellarAMD, spatel, arsenm Subscribers: wdng, llvm-commits Differential Revision: https://reviews.llvm.org/D26602 llvm-svn: 288506
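Both hazards can be reproduced with a small C++ model. The helpers `roundTowardZero`, `originalForm` and `fmadForm` are hypothetical and written for illustration only: each operation is rounded toward zero to three significant bits (one implicit bit plus two mantissa bits) to mimic the example above, which is an assumption about the arithmetic, not how LLVM evaluates either form. In this x/y naming, the inf case from the summary is x = inf, y = 0.

```cpp
#include <cmath>
#include <limits>

const double kInf = std::numeric_limits<double>::infinity();

// Round v toward zero, keeping `bits` significant binary digits
// (bits == 3 models 1 implicit bit + 2 mantissa bits). 0, inf and nan pass through.
double roundTowardZero(double v, int bits = 3) {
  if (v == 0.0 || !std::isfinite(v)) return v;
  int exp = 0;
  double m = std::frexp(v, &exp);                   // v == m * 2^exp, 0.5 <= |m| < 1
  double digits = std::trunc(std::ldexp(m, bits));  // integer with `bits` digits
  return std::ldexp(digits, exp - bits);
}

// Original form: x * (y + 1), rounding after every operation.
double originalForm(double x, double y) {
  return roundTowardZero(x * roundTowardZero(y + 1.0));
}

// Transformed form: x * y + x, with the product rounded before the add (fmad-style).
double fmadForm(double x, double y) {
  return roundTowardZero(roundTowardZero(x * y) + x);
}
```

With x = 1.25 (binary 1.01) and y = 7 (binary 111), `originalForm` returns 10 while `fmadForm` returns 8; with x = inf and y = 0, the first yields inf and the second nan.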
*	Tidyup code with indentation and clang-format. NFCI.	Simon Pilgrim	2016-12-02	1	-6/+6
	llvm-svn: 288505
*	[Sparc] Fix parsing of double-precision %f18, %f20, and %f22	Daniel Cederman	2016-12-02	1	-1/+1
	Summary: They are currently being parsed as %f14, %f16, and %f18. Reviewers: venkatra, jyknight Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D27342 llvm-svn: 288503
*	[X86][SSE] Add support for extracting constant bit data from broadcasted constants	Simon Pilgrim	2016-12-02	1	-24/+44
	llvm-svn: 288499
*	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements.	Alexey Bataev	2016-12-02	1	-66/+74
	When trying to vectorize trees that start at insertelement instructions, function tryToVectorizeList() uses a vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes this does not work, as the tree cost for this fixed vectorization factor is too high. The patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288497
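A minimal sketch of the factor sweep, under stated assumptions: the candidates are assumed to be visited by halving from the maximum factor down to MinVecRegSize/ScalarTypeSize, and `treeCost` is a hypothetical callback standing in for the SLP tree-cost computation, following the usual SLP convention that a negative cost means vectorizing is profitable. `powerOf2Floor`, `candidateVFs` and `bestVF` are illustrative names, not the pass's code.

```cpp
#include <algorithm>
#include <functional>
#include <vector>

// Largest power of two that is <= n (0 for n == 0).
unsigned powerOf2Floor(unsigned n) {
  if (n == 0) return 0;
  unsigned p = 1;
  while (p <= n / 2) p *= 2;
  return p;
}

// Candidate vectorization factors, widest first, halving down to the minimum.
std::vector<unsigned> candidateVFs(unsigned numValues, unsigned minVecRegSize,
                                   unsigned scalarTypeSize) {
  unsigned minVF = minVecRegSize / scalarTypeSize;
  unsigned maxVF = std::max(powerOf2Floor(numValues), minVF);
  std::vector<unsigned> vfs;
  for (unsigned vf = maxVF; vf >= minVF && vf > 0; vf /= 2)
    vfs.push_back(vf);
  return vfs;
}

// Pick the factor with the lowest cost; 0 means no factor was profitable.
unsigned bestVF(const std::vector<unsigned> &vfs,
                const std::function<int(unsigned)> &treeCost) {
  unsigned best = 0;
  int bestCost = 0;  // only strictly negative (profitable) costs qualify
  for (unsigned vf : vfs) {
    int cost = treeCost(vf);
    if (cost < bestCost) {
      best = vf;
      bestCost = cost;
    }
  }
  return best;
}
```

For 16 values with a 128-bit minimum register and 32-bit scalars, the candidates are 16, 8 and 4; for 3 values only 4 is tried, because the floor power of two (2) is below the minimum factor.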
*	[X86] Refactored getTargetConstantBitsFromNode to allow for expansion. NFCI.	Simon Pilgrim	2016-12-02	1	-44/+56
	getTargetConstantBitsFromNode currently only extracts constant pool vector data, but it will need to be generalized to support broadcast and scalar constant pool data as well. Converted Constant bit extraction and Bitset splitting to helper lambda functions. llvm-svn: 288496
*	[AVX-512] Add EVEX vpshuflw/vpshufhw/vpshufd instructions to load folding tables.	Craig Topper	2016-12-02	1	-0/+27
	llvm-svn: 288484
*	[AVX-512] Add EVEX PSHUFB instructions to load folding tables.	Craig Topper	2016-12-02	1	-0/+9
	llvm-svn: 288482
*	[AVX-512] Add masked VINSERTF/VINSERTI instructions to load folding tables.	Craig Topper	2016-12-02	1	-1/+25
	llvm-svn: 288481
*	IR: Move NumElements field from {Array,Vector}Type to SequentialType.	Peter Collingbourne	2016-12-02	7	-53/+20
	Now that PointerType is no longer a SequentialType, all SequentialTypes have an associated number of elements, so we can move that information to the base class, allowing for a number of simplifications. Differential Revision: https://reviews.llvm.org/D27122 llvm-svn: 288464
*	Change LoopUnrollPass cost from int to unsigned to make it consistent. (NFC)	Dehao Chen	2016-12-02	1	-5/+5
	llvm-svn: 288463
*	IR: Change PointerType to derive from Type rather than SequentialType.	Peter Collingbourne	2016-12-02	7	-19/+18
	As proposed on llvm-dev: http://lists.llvm.org/pipermail/llvm-dev/2016-October/106640.html
	This is for a couple of reasons:
	- Values of type PointerType are unlike the other SequentialTypes (arrays and vectors) in that they do not hold values of the element type. By moving PointerType we can unify certain aspects of how the other SequentialTypes are handled.
	- PointerType will have no place in the SequentialType hierarchy once pointee types are removed, so this is a necessary step towards removing pointee types.
	Differential Revision: https://reviews.llvm.org/D26595 llvm-svn: 288462
*	Fix GlobalISel build.	Peter Collingbourne	2016-12-02	1	-1/+1
	llvm-svn: 288460
*	ConstantFolding: Factor code into helper function	Matt Arsenault	2016-12-02	1	-23/+34
	llvm-svn: 288459
*	IR: Change the gep_type_iterator API to avoid always exposing the "current" type.	Peter Collingbourne	2016-12-02	28	-93/+70
	Instead, expose whether the current type is an array or a struct, if an array what the upper bound is, and if a struct the struct type itself. This is in preparation for a later change which will make PointerType derive from Type rather than SequentialType. Differential Revision: https://reviews.llvm.org/D26594 llvm-svn: 288458
*	[DWARF] Put linkage-name on abstract origin even when there's a declaration.	Paul Robinson	2016-12-02	1	-1/+3
	In r266692, we made it possible to emit linkage names for just inlined functions, putting the attribute on the abstract origin. Make sure we don't think the linkage-name was already emitted on a declaration. Differential Revision: http://reviews.llvm.org/D27320 llvm-svn: 288450
*	[ThinLTO] Stop importing constant global vars as copies in the backend	Teresa Johnson	2016-12-02	2	-23/+0
	Summary: We were doing an optimization in the ThinLTO backends of importing constant unnamed_addr globals unconditionally as a local copy (regardless of whether the thin link decided to import them). This should be done in the thin link instead, so that resulting exported references are marked and promoted appropriately, but will need a summary enhancement to mark these variables as constant unnamed_addr. The function import logic during the thin link was trying to handle this proactively, by conservatively marking all values referenced in the initializer lists of exported global variables as also exported. However, this only handled values referenced directly from the initializer list of an exported global variable. If the value is itself a constant unnamed_addr variable, we could end up exporting its references as well. This caused multiple issues. The first is that the transitively exported references weren't promoted. Secondly, some could not be promoted/renamed (e.g. they had a section or other constraint). recursively, instead of just adding the first level of initializer list references to the ExportList directly. Remove this optimization and the associated handling in the function import backend. SPEC measurements indicate we weren't getting much from it in any case. Fixes PR31052. Reviewers: mehdi_amini Subscribers: krasin, llvm-commits Differential Revision: https://reviews.llvm.org/D26880 llvm-svn: 288446
*	AMDGPU: Use wider scalar spills for SGPR spilling	Matt Arsenault	2016-12-02	1	-15/+70
	Since the spill is for the whole wave, these don't have the swizzling problems that vector stores do and a single 4-byte allocation is enough to spill a 64 element register. This should reduce the number of spill instructions and put all the spills for a register in the same cacheline. This should save allocated private size, but for now it doesn't. The extra slots are allocated for each component, but never used because the frame layout is essentially finalized before frame indices are replaced. For always using the scalar store path, this should probably be moved into processFunctionBeforeFrameFinalized. llvm-svn: 288445
*	When instructions are hoisted out of loops by MachineLICM, remove their debug loc.	Wolfgang Pieb	2016-12-02	1	-0/+5
	This prevents erratic stepping behavior as well as incorrect source attribution for sample profiling. Reviewers: dblakie Subscribers: llvm-commit Differential Revision: https://reviews.llvm.org/D27290 llvm-svn: 288442
*	SDAG: Avoid a large, usually empty SmallVector in a recursive function	Justin Bogner	2016-12-02	1	-2/+2
	This SmallVector is using up 128 bytes on the stack every time despite almost always being empty[1], and since this function can recurse quite deeply that adds up to a lot of overhead. We've seen this run afoul of ulimits in some cases with ASAN on. Replacing the SmallVector with a std::vector trades an occasional heap allocation for vastly less stack usage. [1]: I gathered some stats on an internal test suite and the vector was non-empty in only 45,000 of 10,000,000 calls to this function. llvm-svn: 288441
*	[AArch64] Fold more spilled/refilled COPYs.	Geoff Berry	2016-12-01	1	-10/+36
	Summary: Make AArch64InstrInfo::foldMemoryOperandImpl more general by folding all full COPYs between register classes of the same size that are either spilled or refilled. Reviewers: MatzeB, qcolombet Subscribers: aemerson, rengolin, mcrosier, llvm-commits Differential Revision: https://reviews.llvm.org/D27271 llvm-svn: 288439
*	[MC] Refactor emitELFSize to make usage more consistent. NFC.	Dan Gohman	2016-12-01	5	-12/+9
	Move the cast<MCSymbolELF> inside emitELFSize, so that:
	- it's done in one place instead of at each call
	- it's more consistent with similar functions like EmitCOFFSafeSEH
	- ambiguity between cast<> and dyn_cast<> is avoided (which also eliminates an unnecessary dyn_cast call)
	This also makes it easier to experiment with using ".size" directives on non-ELF targets. llvm-svn: 288437
*	[ARM] Fix for 64-bit CAS expansion on ARM32 with -O0	Oleg Ranevskyy	2016-12-01	1	-7/+4
	Summary: This patch fixes comparison of a 64-bit atomic with its expected value in CMP_SWAP_64 expansion. Currently, the low words are compared with CMP, while the high words are compared with SBC. SBC expects the carry flag to be set if CMP detects a difference. CMP might leave the carry unset for unequal arguments, though, if the first one is >= the second. This might cause the comparison logic to detect false equality. Example of the broken C++ code:
	```
	std::atomic<long long> at(2);
	long long ll = 1;
	std::atomic_compare_exchange_strong(&at, &ll, 3);
	```
	Even though the atomic `at` and the expected value `ll` are not equal and `atomic_compare_exchange_strong` returns `false`, `at` is changed to 3. The patch replaces SBC with CMPEQ. Reviewers: t.p.northover Subscribers: aemerson, rengolin, llvm-commits, asl Differential Revision: https://reviews.llvm.org/D27315 llvm-svn: 288433
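The flag logic of the two expansions can be modelled in C++. On ARM, the C flag after CMP is set when no borrow occurs (unsigned first operand >= second), and SBC computes Rn - Rm - NOT(C); the "carry" in the message above therefore behaves like a borrow. `buggyEqual` and `fixedEqual` are hypothetical helpers that model only the comparison, assuming the loaded value is the first operand and the expected value the second.

```cpp
#include <cstdint>

struct Halves {
  uint32_t lo, hi;
};

Halves split(uint64_t v) {
  return {static_cast<uint32_t>(v), static_cast<uint32_t>(v >> 32)};
}

// Old expansion: CMP lo, then SBC hi; equality is read from Z after the SBC only.
bool buggyEqual(uint64_t loaded, uint64_t expected) {
  Halves a = split(loaded), b = split(expected);
  bool carry = a.lo >= b.lo;                          // C after CMP lo (no borrow)
  uint32_t hiDiff = a.hi - b.hi - (carry ? 0u : 1u);  // SBC hi: Rn - Rm - NOT(C)
  return hiDiff == 0;                                 // Z flag
}

// New expansion: CMP lo, then CMPEQ hi (executed only if the low words matched).
bool fixedEqual(uint64_t loaded, uint64_t expected) {
  Halves a = split(loaded), b = split(expected);
  bool z = (a.lo == b.lo);    // Z after CMP lo
  if (z) z = (a.hi == b.hi);  // CMPEQ hi
  return z;
}
```

With the values from the example (loaded 2, expected 1), `buggyEqual` reports equality because the CMP of the low words leaves C set and the SBC of the equal high words then produces zero; `fixedEqual` correctly reports a mismatch.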
*	Revert "[SLP] Fix for PR6246: vectorization for scalar ops on vector elements."	Artem Belevich	2016-12-01	1	-72/+66
	This reverts r288412 which causes severe compile-time regression. llvm-svn: 288431
*	RegisterCoalescer: Only coalesce complete reserved registers.	Matthias Braun	2016-12-01	1	-1/+7
	The coalescer eliminates copies from reserved registers of the form: %vregX = COPY %rY in the case where %rY is a reserved register. However this turns out to be invalid if only some of the subregisters are reserved (see also https://reviews.llvm.org/D26648). Differential Revision: https://reviews.llvm.org/D26687 llvm-svn: 288428
*	[debug info] Minor cleanup from D27170/r288399	David Blaikie	2016-12-01	3	-3/+3
	llvm-svn: 288421
*	AArch64: fix 128-bit cmpxchg at -O0 (again, again).	Tim Northover	2016-12-01	1	-6/+14
	This time the issue is fortunately just a simple mistake rather than a horrible design spectre. I thought SUBS/SBCS provided sufficient NZCV flags for comparing two 64-bit values, but they don't. The fix is slightly clunkier in AArch64 because we can't use conditional execution to emit a pair of CMPs. Traditionally an "icmp ne i128" would map to an EOR/EOR/ORR/CBNZ, but that uses more registers so it's easier to go with a CSET/CINC/CBNZ combination. Slightly less efficient, but this is -O0 anyway. Thanks to Anton Korobeynikov for pointing out the issue. llvm-svn: 288418
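A model of the two comparison sequences mentioned above, assuming each 128-bit value is held as two 64-bit halves. Both functions are hypothetical and only reproduce the logic: CSET writes 1 when the condition holds, CINC increments its source register when the condition holds, and CBNZ branches on a non-zero register. The CSET/CINC variant needs one scratch register where EOR/EOR/ORR needs two, consistent with the register argument in the message.

```cpp
#include <cstdint>

// EOR/EOR/ORR/CBNZ: XOR each pair of halves, OR the results, branch if non-zero.
bool notEqualEorOrr(uint64_t aLo, uint64_t aHi, uint64_t bLo, uint64_t bHi) {
  uint64_t diff = (aLo ^ bLo) | (aHi ^ bHi);
  return diff != 0;
}

// CSET/CINC/CBNZ: materialize "low halves differ" as 0/1, add 1 if the high
// halves also differ, and branch if the counter is non-zero.
bool notEqualCsetCinc(uint64_t aLo, uint64_t aHi, uint64_t bLo, uint64_t bHi) {
  unsigned w = (aLo != bLo) ? 1u : 0u;  // CMP lo; CSET w, ne
  if (aHi != bHi) w += 1;               // CMP hi; CINC w, w, ne
  return w != 0;                        // CBNZ w
}
```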
*	Fix unused variable warning in Release builds. NFC.	Benjamin Kramer	2016-12-01	1	-1/+1
	llvm-svn: 288416
*	[PR29121] Don't fold if it would produce atomic vector loads or stores	Philip Reames	2016-12-01	1	-14/+28
	The instcombine code which folds loads and stores into their use types can trip up if the use is a bitcast to a type which we can't directly load or store in the IR. In principle, such types shouldn't exist, but in practice they do today. This is a workaround to avoid a bug while we work towards the long term goal. Differential Revision: https://reviews.llvm.org/D24365 llvm-svn: 288415
*	Factor out common parts of LVI and Float2Int into ConstantRange [NFCI]	Philip Reames	2016-12-01	3	-80/+98
	This just extracts out the transfer rules for constant ranges into a single shared point. As it happens, neither bit of code actually overlaps in terms of the handled operators, but with this change that could easily be tweaked in the future. I also want to have this separated out to make it easier to experiment with an eager value info implementation and possibly a ValueTracking-like fixed depth recursion peephole version. There's no reason all four of these can't share a common implementation, which reduces the chances of bugs. Differential Revision: https://reviews.llvm.org/D27294 llvm-svn: 288413
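A sketch of the idea of keeping transfer rules in one shared place. `Range`, `BinOp` and `applyBinOp` are hypothetical and deliberately simplified: a closed, non-wrapping interval of signed 64-bit values, whereas LLVM's ConstantRange is a half-open range of APInt values that may wrap.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical simplified range: closed interval [lo, hi], assumed not to overflow.
struct Range {
  int64_t lo, hi;
};

enum class BinOp { Add, Sub };

// Single shared transfer function that several analyses could call.
Range applyBinOp(BinOp op, Range a, Range b) {
  switch (op) {
  case BinOp::Add:
    return {a.lo + b.lo, a.hi + b.hi};
  case BinOp::Sub:
    return {a.lo - b.hi, a.hi - b.lo};
  }
  // Unknown operator: fall back to the full range.
  return {std::numeric_limits<int64_t>::min(), std::numeric_limits<int64_t>::max()};
}
```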
*	[SLP] Fix for PR6246: vectorization for scalar ops on vector elements.	Alexey Bataev	2016-12-01	1	-66/+72
	When trying to vectorize trees that start at insertelement instructions, function tryToVectorizeList() uses a vectorization factor calculated as MinVecRegSize/ScalarTypeSize. But sometimes this does not work, as the tree cost for this fixed vectorization factor is too high. The patch tries to improve the situation. It tries different vectorization factors from max(PowerOf2Floor(NumberOfVectorizedValues), MinVecRegSize/ScalarTypeSize) down to MinVecRegSize/ScalarTypeSize and tries to choose the best one. Differential Revision: https://reviews.llvm.org/D27215 llvm-svn: 288412
*	Refactored X86InterleavedAccess into a class. NFCI.	David L Kreitzer	2016-12-01	1	-67/+171
	Patch by Farhana Aleen Differential Revision: https://reviews.llvm.org/D25986 llvm-svn: 288410
*	Move most EH from MachineModuleInfo to MachineFunction	Matthias Braun	2016-12-01	20	-272/+260
	Recommitting r288293 with some extra fixes for GlobalISel code. Most of the exception handling members in MachineModuleInfo are actually per-function data (they talk about the "current function"), so it is better to keep them at the function instead of the module. This is a necessary step to have machine module passes work properly. Also:
	- Rename TidyLandingPads() to tidyLandingPads()
	- Use doxygen member groups instead of "//===- EH ---"... so it is clear where a group ends.
	- I had to add an ugly const_cast at two places in the AsmPrinter because the available MachineFunction pointers are const, but the code wants to call tidyLandingPads() in between (markFunctionEnd()/endFunction()).
	Differential Revision: https://reviews.llvm.org/D27227 llvm-svn: 288405
*	Fix unused variable warning in Release builds. NFC.	Benjamin Kramer	2016-12-01	1	-1/+1
	llvm-svn: 288401
*	This change removes the dependency on DwarfDebug that was used for DW_FORM_ref_addr by making a new DIEUnit class in DIE.cpp.	Greg Clayton	2016-12-01	9	-85/+63
	The DIEUnit class represents a compile or type unit and it owns the unit DIE as an instance variable. This allows anyone with a DIE to get the unit DIE, and then get back to its DIEUnit without adding any new ivars to the DIE class. Why was this needed? The DIE class has an Offset that is always the CU relative DIE offset, not the "offset in debug info section" as was commented in the header file (the comment has been corrected). This is great for performance because most DIE references are compile unit relative and this means most code that accessed the DIE's offset didn't need to make it into a compile unit relative offset because it already was. When we needed to emit a DW_FORM_ref_addr though, we needed to find the absolute offset of the DIE by finding the DIE's compile/type unit. That unit had the absolute debug info/type offset, which could be added to the CU relative offset to compute the absolute offset. With this change we can easily get back to a DIE's DIEUnit, which will have this needed offset. Prior to this, it required having a DwarfDebug and calling: DwarfCompileUnit *DwarfDebug::lookupUnit(const DIE *CU) const; Now we can use the DIEUnit class to do so without needing DwarfDebug. All clients now use DIEUnit objects (the DwarfDebug stack and the DwarfLinker). A follow-on patch for the DWARF generator will also take advantage of this. Differential Revision: https://reviews.llvm.org/D27170 llvm-svn: 288399
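A simplified model of the lookup the commit enables: walk from any DIE up to its unit DIE, reach the owning unit, and add the unit's section offset to the CU-relative offset to get the section offset that DW_FORM_ref_addr needs. `DIEModel`, `DIEUnitModel`, `getUnit` and `refAddrOffset` are hypothetical; the model stores explicit `parent` and `unit` pointers for readability, whereas the commit stresses that no new fields were added to the real DIE class.

```cpp
#include <cstdint>

struct DIEUnitModel {
  uint64_t debugSectionOffset;  // start of this unit in the .debug_info section
};

// Simplified DIE: the unit DIE has no parent and points at its unit;
// every other DIE has a parent and a null unit pointer.
struct DIEModel {
  uint64_t offset;           // relative to the start of its compile/type unit
  const DIEModel *parent;    // nullptr for the unit DIE
  const DIEUnitModel *unit;  // set only on the unit DIE
};

// Walk up to the unit DIE, then return the unit that owns it.
const DIEUnitModel *getUnit(const DIEModel &die) {
  const DIEModel *d = &die;
  while (d->parent) d = d->parent;
  return d->unit;
}

// Section-relative offset for DW_FORM_ref_addr = unit start + CU-relative offset.
uint64_t refAddrOffset(const DIEModel &die) {
  return getUnit(die)->debugSectionOffset + die.offset;
}
```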
*	[SLP] Fixed cost model for horizontal reduction.	Alexey Bataev	2016-12-01	1	-1/+2
	Currently, when the cost of scalar operations is evaluated, the vector type is used for the scalar operations. The patch fixes this issue and fixes evaluation of the vector operations cost. Several tests showed that the vector cost model is too optimistic. It allowed vectorization of 8 or fewer add/fadd operations, though scalar code is faster. Actually, vector code provides better performance only for 16 or more operations. Differential Revision: https://reviews.llvm.org/D26277 llvm-svn: 288398
*	[llvm] Implement support for -defsym assembler option	Mandeep Singh Grang	2016-12-01	1	-0/+16
	Summary: Changes to llvm-mc to move common logic to separate function. Related clang patch: https://reviews.llvm.org/D26213 Reviewers: rafael, t.p.northover, colinl, echristo, rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26214 llvm-svn: 288396
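The option takes a symbol=value pair, as GNU as does with --defsym. The hypothetical `parseDefsym` below is not llvm-mc's code; it sketches the parsing and validation such an option needs, and accepting decimal, hex and octal values via base-0 `strtoll` is an assumption.

```cpp
#include <cstdint>
#include <cstdlib>
#include <string>

// Hypothetical helper: split a -defsym argument "name=value" and parse the value.
// Returns false for a missing '=', an empty name, or a non-numeric value.
bool parseDefsym(const std::string &arg, std::string &name, int64_t &value) {
  std::string::size_type eq = arg.find('=');
  if (eq == std::string::npos || eq == 0)
    return false;
  std::string rhs = arg.substr(eq + 1);
  if (rhs.empty())
    return false;
  char *end = nullptr;
  long long parsed = std::strtoll(rhs.c_str(), &end, 0);  // base 0: 16, 0x10, 020
  if (*end != '\0')
    return false;  // trailing garbage, e.g. "foo=bar"
  name = arg.substr(0, eq);
  value = parsed;
  return true;
}
```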
*	[X86][SSE] Moved shuffle mask widening/narrowing helper functions earlier in the file.	Simon Pilgrim	2016-12-01	1	-78/+84
	Will be necessary for a future patch. llvm-svn: 288395
*	[libFuzzer] add a test for r288389 (-rss_limit_mb=0 means no limit).	Kostya Serebryany	2016-12-01	1	-0/+3
	llvm-svn: 288392
*	[SystemZ] Fix fallout from r288374	Ulrich Weigand	2016-12-01	1	-1/+2
	Avoid undefined behavior due to too-large shift count. llvm-svn: 288391
*	[AsmParser] Diagnose empty symbol for .set directive	Weiming Zhao	2016-12-01	1	-0/+3
	Summary: Diagnose empty symbol to avoid hitting an assertion in MCContext::getOrCreateSymbol Reviewers: eli.friedman, rengolin Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D26728 llvm-svn: 288390
*	[libFuzzer] treat -rss_limit_mb=0 as no limit	Kostya Serebryany	2016-12-01	1	-1/+1
	llvm-svn: 288389
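The semantics of the flag reduce to a one-line check in which a limit of zero disables the comparison. `exceedsRssLimit` is a hypothetical helper, not libFuzzer's code, and the strict greater-than comparison is an assumption.

```cpp
#include <cstddef>

// Hypothetical check: a limit of 0 means "no limit".
bool exceedsRssLimit(std::size_t rssMb, std::size_t rssLimitMb) {
  return rssLimitMb != 0 && rssMb > rssLimitMb;
}
```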
*	[GVN, OptDiag] Print the interesting instructions involved in missed load-elimination	Adam Nemet	2016-12-01	1	-0/+36
	[recommitting after the fix in r288307] This includes the intervening store and the load/store that we're trying to forward from in the optimization remark for the missed load elimination. This is hooked up under a new mode in ORE that allows for compile-time budget for a bit more analysis to print more insightful messages. This mode is currently enabled for -fsave-optimization-record (-Rpass is trickier since it is controlled in the front-end). With this we can now print the red remark in http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 Differential Revision: https://reviews.llvm.org/D26490 llvm-svn: 288381
*	[GVN, OptDiag] Include the value that is forwarded in load elimination	Adam Nemet	2016-12-01	2	-7/+30
	[recommitting after the fix in r288307] This requires some changes to the opt-diag API. Hal and I have discussed this at the Dev Meeting and came up with a streaming delimiter (setExtraArgs) to solve this. Arguments after this delimiter are only included in the optimization records and not in the remarks printed in the compiler output. (Note how, in the test, the content of the YAML file changes but the remarks in the compiler output don't.) This implements the green GVN message with a bug fix at line http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L446 The fix is that now we properly include the constant value in the message: "load of type i32 eliminated in favor of 7" Differential Revision: https://reviews.llvm.org/D26489 llvm-svn: 288380
*	[SystemZ] Fix applyFixup for 12-bit fixups	Ulrich Weigand	2016-12-01	1	-1/+3
	Now that we have fixups that only fill parts of a byte, it turns out we have to mask off the bits outside the fixup area when applying them. Failing to do so caused invalid object code to be emitted for bprp with a negative 12-bit displacement. llvm-svn: 288374
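A sketch of why the mask matters, using an assumed layout: a 16-bit big-endian field whose low 12 bits receive the fixup and whose top 4 bits belong to a neighbouring operand. The hypothetical `applyFixupBits` ORs the value into the existing bytes, so a negative displacement, which is sign-extended to 64 bits, clobbers the neighbouring bits unless it is first masked to the fixup width.

```cpp
#include <cstdint>

// Hypothetical model: apply a fixup of `bitSize` bits occupying the low bits
// of a big-endian field `numBytes` long, ORing the bits into `data`.
void applyFixupBits(uint8_t *data, unsigned numBytes, unsigned bitSize,
                    int64_t value, bool maskToFixup) {
  uint64_t v = static_cast<uint64_t>(value);
  if (maskToFixup)
    v &= (uint64_t(1) << bitSize) - 1;  // drop sign-extension bits outside the fixup
  for (unsigned i = 0; i < numBytes; ++i) {
    unsigned shift = (numBytes - 1 - i) * 8;  // big-endian: first byte is most significant
    data[i] |= static_cast<uint8_t>(v >> shift);
  }
}
```

Applying a displacement of -2 to the bytes 0x30 0x00 gives 0x3F 0xFE with masking, but turns the first byte into 0xFF without it.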
*	[GVN] Basic optimization remark support	Adam Nemet	2016-12-01	2	-3/+26
	[recommitting after the fix in r288307] Follow-on patches will add more interesting cases. The goal of this patch-set is to get the GVN messages printed in opt-viewer from Dhrystone as was presented in my Dev Meeting talk. This is the optimization view for the function (the last remark in the function has a bug which is fixed in this series): http://lab.llvm.org:8080/artifacts/opt-view_test-suite/build/SingleSource/Benchmarks/Dhrystone/CMakeFiles/dry.dir/html/_org_test-suite_SingleSource_Benchmarks_Dhrystone_dry.c.html#L430 Differential Revision: https://reviews.llvm.org/D26488 llvm-svn: 288370
*	[X86][SSE] Classify AND bitmasks as variable shuffle masks	Simon Pilgrim	2016-12-01	1	-0/+4
	They are loading the bitmasks from the constant pool so the cost is similar to loading a shuffle mask. llvm-svn: 288367
*	[X86][SSE] Add support for combining AND bitmasks to shuffles.	Simon Pilgrim	2016-12-01	1	-0/+11
	llvm-svn: 288365
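An AND with a constant mask acts like a shuffle when every mask element is all-ones or all-zeros: each lane either keeps its own element or becomes zero. The hypothetical `andMaskToShuffle` below assumes 32-bit elements and uses -2 as a "zero this lane" marker; it illustrates the idea rather than reproducing the X86 combine.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical marker for "this lane becomes zero".
const int kZeroLane = -2;

// Convert a per-element AND mask into shuffle indices: all-ones keeps lane i,
// all-zeros zeroes it. Returns false if any element is a partial mask.
bool andMaskToShuffle(const std::vector<uint32_t> &mask, std::vector<int> &shuffle) {
  shuffle.clear();
  for (std::size_t i = 0; i < mask.size(); ++i) {
    if (mask[i] == 0xFFFFFFFFu)
      shuffle.push_back(static_cast<int>(i));
    else if (mask[i] == 0u)
      shuffle.push_back(kZeroLane);
    else
      return false;
  }
  return true;
}
```

A mask of {0xFFFFFFFF, 0, 0, 0xFFFFFFFF} becomes the shuffle {0, zero, zero, 3}, while a mask with a partial element such as 0x00FF00FF cannot be expressed this way.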