bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
...
*	[X86][AVX512] Improve lowering of AVX512 compare intrinsics (remove ↵	Ayman Musa	2017-06-15	4	-50/+13490
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	redundant shift left+right instructions). AVX512 compare instructions return v*i1 types. In cases where the number of elements in the returned value is less than 8, clang adds zeroes to get a mask of v8i1 type. Later on it's replaced with CONCAT_VECTORS, which is then lowered to many DAG nodes including insert/extract element and shift right/left nodes. The fact that AVX512 compare instructions put the result in a k register and zero all its upper bits allows us to remove the extra nodes simply by copying the result to the required register class. When lowering, identify these cases and transform them into an INSERT_SUBVECTOR node (marked legal), then catch this pattern in the instruction selection phase and transform it into one avx512 cmp instruction. Differential Revision: https://reviews.llvm.org/D33188 llvm-svn: 305465
*	Align definition of DW_OP_plus with DWARF spec [3/3]	Florian Hahn	2017-06-14	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: This patch is part of 3 patches that together form a single patch, but must be introduced in stages in order not to break things. The way that LLVM interprets DW_OP_plus in DIExpression nodes is basically that of the DW_OP_plus_uconst operator since LLVM expects an unsigned constant operand. This unnecessarily restricts the DW_OP_plus operator, preventing it from being used to describe the evaluation of runtime values on the expression stack. These patches try to align the semantics of DW_OP_plus and DW_OP_minus with that of the DWARF definition, which pops two elements off the expression stack, performs the operation and pushes the result back on the stack. This is done in three stages: • The first patch (LLVM) adds support for DW_OP_plus_uconst. • The second patch (Clang) changes all its uses of DW_OP_plus to DW_OP_plus_uconst. • The third patch (LLVM) changes the semantics of DW_OP_plus and DW_OP_minus to be in line with their DWARF meaning. This patch includes the bitcode upgrade from legacy DIExpressions. Patch by Sander de Smalen. Reviewers: echristo, pcc, aprantl Reviewed By: aprantl Subscribers: fhahn, javed.absar, aprantl, llvm-commits Differential Revision: https://reviews.llvm.org/D33894 llvm-svn: 305386
*	[AVX-512] Mark masked VPCMP instructions as commutable.	Craig Topper	2017-06-13	1	-0/+13
\| \| \| \|	llvm-svn: 305276
*	[AVX-512] Mark masked version of vpcmpeq as being commutable.	Craig Topper	2017-06-13	1	-0/+14
\| \| \| \|	llvm-svn: 305275
*	[X86] Add masked integer compare instructions to load folding tables.	Craig Topper	2017-06-13	1	-0/+28
\| \| \| \|	llvm-svn: 305274
*	[x86] regenerate checks with update_llc_test_checks.py	Sanjay Patel	2017-06-12	28	-160/+33
\| \| \| \| \| \| \| \| \| \| \|	The dream of a unified check-line auto-generator for all phases of compilation is dead. The llc script has already diverged to be better at its goal, so having 2 scripts that do almost the same thing is just causing confusion. We can rip out the llc ability in update_test_checks.py next and rename it, so it will be clear that we have one script for llc check auto-generation and another for opt. llvm-svn: 305206
*	[SelectionDAG] Allow sin/cos -> sincos optimization on GNU triples w/ just ↵	Geoff Berry	2017-06-12	1	-43/+94
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	-fno-math-errno Summary: This change enables the sin(x) cos(x) -> sincos(x) optimization on GNU target triples. This optimization was being inhibited when -ffast-math wasn't set because sincos in GLibC does not set errno, while sin and cos do. However, this optimization will only run if the attributes on the sin/cos calls include readnone, which is how clang represents the fact that it doesn't care about the errno values set by these functions (via the -fno-math-errno flag). Reviewers: hfinkel, bogner Subscribers: mcrosier, javed.absar, llvm-commits, paul.redmond Differential Revision: https://reviews.llvm.org/D32921 llvm-svn: 305204
*	[x86] regenerate checks with update_llc_test_checks.py	Sanjay Patel	2017-06-12	8	-121/+192
\| \| \| \| \| \| \| \| \| \|	The dream of a unified check-line auto-generator for all phases of compilation is dead. The llc script has already diverged to be better at its goal, so having 2 scripts that do almost the same thing is just causing confusion for newcomers. I plan to fix up more x86 tests in a next commit. We can rip out the llc ability in update_test_checks.py after that. llvm-svn: 305202
*	StackColoring: smarter check for slot overlap	Than McIntosh	2017-06-12	1	-0/+64
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The old check for slot overlap treated 2 slots `S` and `T` as overlapping if there existed a CFG node in which both of the slots could possibly be active. That is overly conservative and caused stack blowups in Rust programs. Instead, check whether there is a single CFG node in which both of the slots are possibly active together. Fixes PR32488. Patch by Ariel Ben-Yehuda <ariel.byd@gmail.com> Reviewers: thanm, nagisa, llvm-commits, efriedma, rnk Reviewed By: thanm Subscribers: dotdash Differential Revision: https://reviews.llvm.org/D31583 llvm-svn: 305193
*	[AVX-512] Add VPCONFLICT and VPLZCNT to load folding tables.	Craig Topper	2017-06-12	2	-2/+110
\| \| \| \|	llvm-svn: 305180
*	[x86] use vperm2f128 rather than vinsertf128 when there's a chance to fold a ↵	Sanjay Patel	2017-06-11	2	-41/+23
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	32-byte load I was looking closer at the x86 test diffs in D33866, and the first change seems like it shouldn't happen in the first place. So this patch will resolve that. Using Agner's tables and AMD docs, vperm2f128 and vinsertf128 have identical timing for any given CPU model, so we should be able to interchange those without affecting perf. But as we can see in some of the diffs here, using vperm2f128 allows load folding, so we should take that opportunity to reduce code size and register pressure. A secondary advantage is making AVX1 and AVX2 codegen more similar. Given that vperm2f128 was introduced with AVX1, we should be selecting it in all of the same situations that we would with AVX2. If there's some reason that an AVX1 CPU would not want to use this instruction, that should be fixed up in a later pass. Differential Revision: https://reviews.llvm.org/D33938 llvm-svn: 305171
*	[DAGCombine] Make sure we check the ResNo from UADDO before combining	Amaury Sechet	2017-06-11	1	-0/+24
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: UADDO has 2 results, and one must check the result number (ResNo) before doing any kind of combine. Without that check, the transform is invalid. Reviewers: joerg Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D34088 llvm-svn: 305162
*	[X86][SSE] Extended PR32368 to SSE/AVX1/AVX2	Simon Pilgrim	2017-06-10	1	-8/+142
\| \| \| \|	llvm-svn: 305154
*	[X86][AVX512] Added test case for PR32368	Simon Pilgrim	2017-06-10	1	-0/+19
\| \| \| \|	llvm-svn: 305153
*	[X86][SSE] Add support for PACKSS nodes to faux shuffle extraction	Simon Pilgrim	2017-06-09	1	-273/+265
\| \| \| \| \| \|	If the inputs won't saturate during packing then we can treat the PACKSS as a truncation shuffle llvm-svn: 305091
*	Prevent RemoveDeadNodes from deleting an already deleted node.	Nirav Dave	2017-06-09	1	-0/+83
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This protects against assertion errors like PR32659, which occur when a replacement deletes a node after it has been added to the list argument of RemoveDeadNodes. The specific failure from PR32659 does not currently happen, but it is still potentially possible. The underlying cause is that the callers of the change function build up a list of nodes to delete after having moved their uses, and it is possible that a move of a later node will cause a node already on the list to be deleted. Reviewers: bkramer, spatel, davide Reviewed By: spatel Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D33731 llvm-svn: 305070
*	[x86] remove unused param from tests; NFC	Sanjay Patel	2017-06-08	1	-10/+10
\| \| \| \|	llvm-svn: 304989
*	Add scheduler classes to integer/float horizontal operations.	Andrew V. Tischenko	2017-06-08	1	-16/+16
\| \| \| \| \| \| \|	This patch will close PR32801. Differential Revision: https://reviews.llvm.org/D33203 llvm-svn: 304986
*	[x86] add tests for memcmp expansion; NFC	Sanjay Patel	2017-06-08	1	-37/+293
\| \| \| \| \| \| \| \| \| \| \| \|	We already had a test to demonstrate PR33325: https://bugs.llvm.org/show_bug.cgi?id=33325 I'm adding tests for general memcmp expansion (see D34005 / D33963) and: https://bugs.llvm.org/show_bug.cgi?id=33329 ...plus non-power-of-2 sizes, so we can see what that looks like currently or if expanded. llvm-svn: 304979
*	This patch closes PR28513: an optimization of multiplication by different ↵	Andrew V. Tischenko	2017-06-08	4	-360/+4253
\| \| \| \| \| \| \| \|	constants. The initial patch was rejected; I fixed the issue and re-applied it. llvm-svn: 304972
*	[X86] Add test to demonstrate inefficient lowering of v48i8 shuffle.	Guy Blank	2017-06-07	1	-0/+49
\| \| \| \|	llvm-svn: 304915
*	[x86] avoid flipping sign bits for vector icmp by using known bits	Sanjay Patel	2017-06-07	2	-104/+34
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	If we know that both operands of an unsigned integer vector comparison are non-negative, then it's safe to directly use a signed-compare-greater-than instruction (the only non-equality integer vector compare predicate provided by SSE/AVX). We're intentionally not changing the condition code to signed in order to preserve the existing transforms that use min/max/psubus below here. This should solve PR33276: https://bugs.llvm.org/show_bug.cgi?id=33276 Differential Revision: https://reviews.llvm.org/D33862 llvm-svn: 304909
*	[X86][SSE] Fix an issue with PEXTRW/PEXTRB indices during shuffle combining	Simon Pilgrim	2017-06-07	1	-40/+4
\| \| \| \| \| \|	We were checking that the index was in range of the destination vector type, not the (larger) source vector type llvm-svn: 304894
*	Added tests for X86InterleavedStore.	Evgeny Stupachenko	2017-06-06	1	-0/+93
\| \| \| \| \| \| \| \| \| \|	Reviewers: RKSimon, DavidKreitzer Differential Revision: https://reviews.llvm.org/D33684 Patch by: Aleen Farhana <Farhana.aleen@gmail.com> llvm-svn: 304834
*	Fix PR23384 (part 3 of 3)	Evgeny Stupachenko	2017-06-06	7	-67/+67
\| \| \| \| \| \| \| \| \| \| \| \| \|	Summary: The patch makes instruction count the highest priority for LSR solution for X86 (previously registers had highest priority). Reviewers: qcolombet Differential Revision: http://reviews.llvm.org/D30562 From: Evgeny Stupachenko <evstupac@gmail.com> llvm-svn: 304824
*	[X86][AVX1] Split 256-bit vector non-temporal FastISel loads to keep it ↵	Simon Pilgrim	2017-06-06	1	-6/+30
\| \| \| \| \| \| \| \|	non-temporal (PR32744) Extension to D33728 llvm-svn: 304798
*		Vivek Pandya	2017-06-06	34	-704/+704
\| \| \| \| \| \| \| \| \| \| \| \|	[Improve CodeGen Testing] This patch re-enables MIRPrinter printing of fields whose value equals their default. If the -simplify-mir option is passed, MIRPrinter will not print such fields. This change also required some lit test cases in the CodeGen directory to be changed. Reviewed By: MatzeB Differential Revision: https://reviews.llvm.org/D32304 llvm-svn: 304779
*	[x86] Stop this test from dirtying the source tree when run.	Chandler Carruth	2017-06-06	1	-1/+1
\| \| \| \| \| \|	The output isn't used anyways. llvm-svn: 304766
*	[x86] Add the test for folding stack spills into pextrw.	Chandler Carruth	2017-06-06	1	-2/+15
\| \| \| \| \| \| \| \|	This is a negative test as pextrw doesn't write to all 32 bits of the spilled GPR. This fold ended up happening when D32684 was landed, and the test covers the regression that motivated reverting it in r304762. llvm-svn: 304763
*	[x86] Revert the X86FoldTablesEmitter due to more miscompiles.	Chandler Carruth	2017-06-06	3	-16/+29
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	In testing, we've found yet another miscompile caused by the new tables. And this one is even less clear how to fix (we could teach it to fold a 16-bit load instead of the 32-bit load it wants, or block folding entirely). Also, the approach to excluding instructions seems increasingly to not scale well. I have left a more detailed analysis on the review log for the original patch (https://reviews.llvm.org/D32684) along with suggested path forward. I will land an additional test case that I wrote which covers the code that was miscompiling (folding into the output of `pextrw`) in a subsequent commit to keep this a pure revert. For each commit reverted here, I've restricted the revert to the non-test code touching the x86 fold table emission until the last commit where I did revert the test updates. This means the new test cases added for `insertps` and `xchg` remain untouched (and continue to pass). Reverted commits: r304540: [X86] Don't fold into memory operands into insertps in the ... r304347: [TableGen] Adapt more places to getValueAsString now ... r304163: [X86] Don't fold away the memory operand of an xchg. r304123: Don't capture a temporary std::string in a StringRef. r304122: Resubmit "[X86] Adding new LLVM TableGen backend that ..." Original commit was in r304088, and after a string of fixes was reverted previously in r304121 to fix build bots, and then re-landed in r304122. llvm-svn: 304762
*	CodeGen/LLVMTargetMachine: Refactor ISel pass construction; NFCI	Matthias Braun	2017-06-06	1	-1/+1
\| \| \| \| \| \| \| \| \| \| \| \|	- Move ISel (and pre-isel) pass construction into TargetPassConfig - Extract AsmPrinter construction into a helper function Putting the ISel code into TargetPassConfig seems a lot more natural, and both changes together make it easier to build custom pipelines involving .mir in an upcoming commit. This moves MachineModuleInfo to an earlier place in the pass pipeline, which shouldn't have any effect. llvm-svn: 304754
*	[x86] fix over-specific triple; NFC	Sanjay Patel	2017-06-06	1	-18/+18
\| \| \| \| \| \| \| \|	There's nothing darwin-specific in these tests, and using that setting causes extra phantom diffs when the auto-generated check lines are regenerated today. llvm-svn: 304753
*	[SelectionDAG] Update the dominator after splitting critical edges.	Davide Italiano	2017-06-05	1	-0/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Running `llc -verify-dom-info` on the attached testcase results in a crash in the verifier, due to a stale dominator tree. i.e. DominatorTree is not up to date! Computed: =============================-------------------------------- Inorder Dominator Tree: [1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,7} [2] %lor.lhs.false.i61.i.i.i {1,2} [2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,6} [3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5} Actual: =============================-------------------------------- Inorder Dominator Tree: [1] %safe_mod_func_uint8_t_u_u.exit.i.i.i {0,9} [2] %lor.lhs.false.i61.i.i.i {1,2} [2] %safe_mod_func_int8_t_s_s.exit.i.i.i {3,8} [3] %safe_div_func_int64_t_s_s.exit66.i.i.i {4,5} [3] %safe_mod_func_int8_t_s_s.exit.i.i.i.lor.lhs.false.i61.i.i.i_crit_edge {6,7} This is because in `SelectionDAGIsel` we split critical edges without updating the corresponding dominator for the function (and we claim in `MachineFunctionPass::getAnalysisUsage()` that the domtree is preserved). We could either stop preserving the domtree in `getAnalysisUsage` or tell `splitCriticalEdge()` to update it. As the second option is easy to implement, that's the one I chose. Differential Revision: https://reviews.llvm.org/D33800 llvm-svn: 304742
*	[X86][SSE41] Non-temporal loads shouldn't be folded if it can be avoided ↵	Simon Pilgrim	2017-06-05	1	-96/+252
\| \| \| \| \| \| \| \| \| \|	(PR32743) Missed SSE41 non-temporal load case in previous commit Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304722
*	[X86][AVX1] Split 256-bit vector non-temporal loads to keep it non-temporal ↵	Simon Pilgrim	2017-06-05	2	-100/+202
\| \| \| \| \| \| \| \|	(PR32744) Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304718
*	[X86][SSE] Non-temporal loads shouldn't be folded if it can be avoided (PR32743)	Simon Pilgrim	2017-06-05	1	-65/+148
\| \| \| \| \| \|	Differential Revision: https://reviews.llvm.org/D33728 llvm-svn: 304717
*	[X86][SSE] Change BUILD_VECTOR interleaving ordering to improve ↵	Simon Pilgrim	2017-06-04	19	-1297/+1240
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	coalescing/combine opportunities We currently generate BUILD_VECTOR as a tree of UNPCKL shuffles of the same type: e.g. for v4f32: Step 1: unpcklps 0, 2 ==> X: <?, ?, 2, 0> : unpcklps 1, 3 ==> Y: <?, ?, 3, 1> Step 2: unpcklps X, Y ==> <3, 2, 1, 0> The issue is that, because we are not placing sequential vector elements together early enough, we fail to recognise many combinable patterns - consecutive scalar loads, extractions etc. Instead, this patch unpacks progressively larger sequential vector elements together: e.g. for v4f32: Step 1: unpcklps 0, 1 ==> X: <?, ?, 1, 0> : unpcklps 2, 3 ==> Y: <?, ?, 3, 2> Step 2: unpcklpd X, Y ==> <3, 2, 1, 0> This does mean that we are creating UNPCKL shuffles of different value types, but the relevant combines that benefit from this are quite capable of handling the additional BITCASTs that are now included in the shuffle tree. Differential Revision: https://reviews.llvm.org/D33864 llvm-svn: 304688
*	[GlobalISel][X86] merge irtranslator-call test files. NFC	Igor Breger	2017-06-04	3	-57/+34
\| \| \| \|	llvm-svn: 304683
*	Regenerate expectations for trunc-to-bool.ll . NFC	Amaury Sechet	2017-06-03	1	-10/+60
\| \| \| \|	llvm-svn: 304660
*	[X86][SSE] Add SCALAR_TO_VECTOR(PEXTRW/PEXTRB) support to faux shuffle combining	Simon Pilgrim	2017-06-03	1	-39/+5
\| \| \| \| \| \|	Generalized existing SCALAR_TO_VECTOR(EXTRACT_VECTOR_ELT) code to support AssertZext + PEXTRW/PEXTRB cases as well. llvm-svn: 304659
*	[x86] fix over-specific triple; NFC	Sanjay Patel	2017-06-02	1	-204/+204
\| \| \| \| \| \| \| \|	There's nothing darwin-specific in these tests, and using that setting causes extra phantom diffs when the auto-generated check lines are regenerated today. llvm-svn: 304614
*	Canonicalize a test via utils/update_test_checks.py	Philip Reames	2017-06-02	1	-31/+91
\| \| \| \| \| \|	Turns out I might not have further changes to make here, but with the way I'd written the tests, even I couldn't tell that. :( llvm-svn: 304613
*	[x86] add tests for unsigned vector compares with known signbits; NFC (PR33276)	Sanjay Patel	2017-06-02	1	-0/+519
\| \| \| \|	llvm-svn: 304612
*	RegisterScavenging: Add ScavengerTest pass	Matthias Braun	2017-06-02	1	-0/+54
\| \| \| \| \| \| \| \| \|	This pass allows running register scavenging independently of PrologEpilogInserter, which enables targeted testing. Also adds some basic register scavenging tests. llvm-svn: 304606
*	[X86] Correctly broadcast NaN-like integers as float on AVX.	Ahmed Bougacha	2017-06-02	1	-0/+32
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Since r288804, we try to lower build_vectors on AVX using broadcasts of float/double. However, when we broadcast integer values that happen to have a NaN float bitpattern, we lose the NaN payload, thereby changing the integer value being broadcast. This is caused by ConstantFP::get, to which we pass the splat i32 as a float (by bitcasting it using bitsToFloat). ConstantFP::get takes a double parameter, so we end up lossily converting a single-precision NaN to double-precision. Instead, avoid any kinds of conversions by directly building an APFloat from the splatted APInt. Note that this also fixes another piece of code (broadcast of subvectors), that currently isn't susceptible to the same problem. Also note that we could really just use APInt and ConstantInt throughout: the constant pool type doesn't matter much. Still, for consistency, use the appropriate type. llvm-svn: 304590
*	Regenerate expectation for wide-fma-contraction.ll . NFC	Amaury Sechet	2017-06-02	1	-16/+38
\| \| \| \|	llvm-svn: 304586
*	Add placeholder for more extensive verification of pseudo ops	Philip Reames	2017-06-02	12	-13/+13
\| \| \| \| \| \| \| \| \| \|	This initial patch doesn't actually do much that is useful. It's just to show where the new code goes. Once this is in, I'll extend the verification logic to check more useful properties. For those curious, the more complicated version of this patch already found one very suspicious thing. Differential Revision: https://reviews.llvm.org/D33819 llvm-svn: 304564
*	Update select.ll expected results. NFC	Amaury Sechet	2017-06-02	1	-0/+31
\| \| \| \|	llvm-svn: 304557
*	Regenerate sse3.ll test results. NFC	Amaury Sechet	2017-06-02	1	-0/+18
\| \| \| \|	llvm-svn: 304548
*	Regenerate and-sink.ll test results. NFC	Amaury Sechet	2017-06-02	1	-39/+94
\| \| \| \|	llvm-svn: 304547