path: root/llvm/test
Commit message | Author | Age | Files | Lines
* [X86] Separate BSWAP32r and BSWAP64r scheduling data in SandyBridge/Haswell/Broadwell/Skylake scheduler models. | Craig Topper | 2018-04-04 | 1 | -6/+6
  The BSWAP64r version is 2 uops and BSWAP32r is only 1 uop. The regular expressions also looked for a non-existent BSWAP16r.
  llvm-svn: 329211
* [Power9] Legalize and emit code for quad-precision fma instructions | Lei Huang | 2018-04-04 | 1 | -0/+203
  Legalize and emit code for the following quad-precision fma:
    * xsmaddqp
    * xsnmaddqp
    * xsmsubqp
    * xsnmsubqp
  Differential Revision: https://reviews.llvm.org/D44843
  llvm-svn: 329206
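As a rough reference for what the four opcodes compute, here is a hedged Python model. It assumes the usual PowerPC naming: "m" for multiply, "add"/"sub" for adding or subtracting the addend, and a leading "n" for negating the final result. Exact `Fraction` arithmetic stands in for the single rounding of a fused operation. Rounding to IEEE binary128 and the instructions' register-operand order are not modeled.

```python
from fractions import Fraction

# Exact rational arithmetic stands in for "a*b+c computed with one rounding";
# the real xs*qp instructions round the exact result once to IEEE binary128.
def xsmaddqp(a, b, c):
    return a * b + c          # multiply-add

def xsmsubqp(a, b, c):
    return a * b - c          # multiply-subtract

def xsnmaddqp(a, b, c):
    return -(a * b + c)       # negated multiply-add

def xsnmsubqp(a, b, c):
    return -(a * b - c)       # negated multiply-subtract

if __name__ == "__main__":
    a, b, c = Fraction(3, 2), Fraction(4), Fraction(1, 3)
    for fn in (xsmaddqp, xsmsubqp, xsnmaddqp, xsnmsubqp):
        print(fn.__name__, fn(a, b, c))
```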
* Re-commit r329179 after fixing build and test issues | Pavel Labath | 2018-04-04 | 8 | -5/+1878
  - MSVC was not OK with a static_assert referencing a non-static member variable, even though it was just in a sizeof(expression). I moved the assert into the emit function, where it is probably more useful.
  - Tests were failing in builds which did not have the X86 target configured. Since this functionality is not target-specific, I have removed the target specifiers from the .ll files.
  llvm-svn: 329201
* [InstCombine] [NFC] Add tests for getting rid of select of bittest (PR36950 / PR17564) | Roman Lebedev | 2018-04-04 | 1 | -0/+464
  Summary: See PR36950 (https://bugs.llvm.org/show_bug.cgi?id=36950), PR17564 (https://bugs.llvm.org/show_bug.cgi?id=17564), D45065, D45108
  Reviewers: spatel, craig.topper
  Reviewed By: spatel
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D45107
  llvm-svn: 329198
* [AMDGPU][MC] Enabled instruction TBUFFER_LOAD_FORMAT_XYZ for SI/CI | Dmitry Preobrazhensky | 2018-04-04 | 1 | -0/+4
  See bug 36958: https://bugs.llvm.org/show_bug.cgi?id=36958
  Differential Revision: https://reviews.llvm.org/D45099
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329197
* [SLPVectorizer][X86] Regenerate some tests. NFCI | Simon Pilgrim | 2018-04-04 | 5 | -74/+187
  llvm-svn: 329196
* [X86][Btver2] Strip unnecessary check prefixes from resources tests | Simon Pilgrim | 2018-04-04 | 11 | -11/+11
  llvm-svn: 329192
* Revert r329179 (and follow-up unsuccessful fix attempts 329184, 329186); it doesn't build. | Nico Weber | 2018-04-04 | 8 | -1897/+5
  llvm-svn: 329190
* [AMDGPU][MC] Added support of 3-element addresses for MIMG instructions | Dmitry Preobrazhensky | 2018-04-04 | 1 | -0/+63
  See bug 35999: https://bugs.llvm.org/show_bug.cgi?id=35999
  Differential Revision: https://reviews.llvm.org/D45084
  Reviewers: artem.tamazov, arsenm, timcorringham
  llvm-svn: 329187
* [CodeGen] Generate DWARF v5 Accelerator Tables | Pavel Labath | 2018-04-04 | 8 | -5/+1897
  Summary: This patch adds a DwarfAccelTableEmitter class, which generates an accelerator table as specified in the DWARF v5 standard. At the moment it only generates a DIE offset column and (if we are indexing more than one compile unit) a CU column. Indexing type units is not currently supported, as we don't even have the ability to generate DWARF v5-compatible compile units.
  The implementation is not data-source agnostic like the one generating apple tables. This was not necessary as we currently only have one user of this code, and without a second user it was not obvious to me how best to abstract this. (The difference between these tables and the apple ones is that they need a lot more metadata about the debug info they are indexing.)
  The generation is triggered by the --accel-tables argument, which supersedes the --dwarf-accel-tables arg. The latter was a simple on-off switch, but now we can choose between the two kinds of accelerator tables we can generate.
  This is tested by parsing the generated tables with llvm-dwarfdump and the DWARFVerifier, and I've also checked that GNU readelf is able to make sense of the tables.
  Differential Revision: https://reviews.llvm.org/D43286
  llvm-svn: 329179
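The name index of a DWARF v5 accelerator table (`.debug_names`) places each indexed name into a hash bucket. Below is a hedged Python sketch of that step. It assumes the 32-bit DJB hash (start at 5381, then `h = h * 33 + byte`) over the name's UTF-8 bytes, with bucket index = hash modulo bucket count. It does not model any case-folding normalization the specification may require, so it should only be trusted for lowercase ASCII names like those used here.

```python
def djb_hash(name: str) -> int:
    """32-bit DJB hash over the UTF-8 bytes of `name`."""
    h = 5381
    for byte in name.encode("utf-8"):
        h = (h * 33 + byte) & 0xFFFFFFFF  # keep the value in 32 bits
    return h

def build_buckets(names, bucket_count):
    """Group (hash, name) pairs by bucket index = hash % bucket_count."""
    buckets = [[] for _ in range(bucket_count)]
    for name in names:
        h = djb_hash(name)
        buckets[h % bucket_count].append((h, name))
    return buckets

if __name__ == "__main__":
    for index, entries in enumerate(build_buckets(["main", "foo", "bar"], 4)):
        print(index, entries)
```

A reader looking up a name recomputes the same hash, jumps to its bucket, and compares full hashes before comparing strings. This is why the emitter and consumers such as llvm-dwarfdump must agree exactly on the hash function.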
* [X86][CostModel] Use generic SSE levels instead of particular CPUs for shuffle costs | Simon Pilgrim | 2018-04-04 | 1 | -5/+5
  llvm-svn: 329168
* AMDGPU: Dimension-aware image intrinsics | Nicolai Haehnle | 2018-04-04 | 8 | -0/+1438
  Summary: These new image intrinsics contain the texture type as part of their name and have each component of the address/coordinate as individual parameters.
  This is a preparatory step for implementing the A16 feature, where coordinates are passed as half-floats or -ints, but the Z compare value and texel offsets are still full dwords, making it difficult or impossible to distinguish between A16 on or off in the old-style intrinsics.
  Additionally, these intrinsics pass the 'texfailpolicy' and 'cachectrl' as i32 bit fields to reduce operand clutter and allow for future extensibility.
  v2:
    - gather4 supports 2darray images
    - fix a bug with 1D images on SI
  Change-Id: I099f309e0a394082a5901ea196c3967afb867f04
  Reviewers: arsenm, rampitec, b-sumner
  Subscribers: kzhuravl, wdng, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D44939
  llvm-svn: 329166
* StructurizeCFG: Test for branch divergence correctly | Nicolai Haehnle | 2018-04-04 | 1 | -0/+82
  Fixes cases like the new test @nonuniform. In that test, %cc itself is a uniform value; however, when reading it after the end of the loop in basic block %if, its value is effectively non-uniform, so the branch is non-uniform.
  This problem was encountered in https://bugs.freedesktop.org/show_bug.cgi?id=103743; however, this change in itself is not sufficient to fix that bug, as there is another issue in the AMDGPU backend.
  As discovered after committing an earlier version of this change, this exposes a subtle interaction between this pass and DivergenceAnalysis: since we remove and re-create branch instructions, we can no longer rely on DivergenceAnalysis for branches in subregions that were already processed by the pass.
  Explicitly remove branch instructions from DivergenceAnalysis to avoid dangling pointers as a matter of defensive programming, and change how we detect non-uniform subregions.
  Change-Id: I32bbffece4a32f686fab54964dae1a5dd72949d4
  Differential Revision: https://reviews.llvm.org/D43743
  llvm-svn: 329165
* AMDGPU: Fix copying i1 value out of loop with non-uniform exit | Nicolai Haehnle | 2018-04-04 | 1 | -0/+48
  Summary: When an i1-value is defined inside of a loop and used outside of it, we cannot simply use the SGPR bitmask from the loop's last iteration.
  There are also useful and correct cases of an i1-value being copied between basic blocks, e.g. when a condition is computed outside of a loop and used inside it. The concept of dominators is not sufficient to capture what is going on, so I propose the notion of "lane-dominators".
  Fixes a bug encountered in Nier: Automata.
  Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103743
  Change-Id: If37b969ddc71d823ab3004aeafb9ea050e45bd9a
  Reviewers: arsenm, rampitec
  Subscribers: kzhuravl, wdng, mgorny, yaxunl, dstuttard, tpr, llvm-commits, t-tye
  Differential Revision: https://reviews.llvm.org/D40547
  llvm-svn: 329164
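To see why the last iteration's bitmask is not enough, here is a toy wave simulation in Python (all names are hypothetical). Each lane runs the loop for its own trip count. The loop body defines an i1 that is identical for every active lane in a given iteration. Lanes that have already exited are assumed to contribute 0 bits to a mask written in a later iteration.

```python
def simulate(trip_counts):
    """Return (correct_mask, last_iteration_mask) for an i1 defined in a loop.

    trip_counts[lane] is the number of iterations that lane executes (>= 1);
    different trip counts mean the loop exit is non-uniform.
    """
    lanes = range(len(trip_counts))
    value_at_exit = [False] * len(trip_counts)
    last_mask = 0
    for k in range(max(trip_counts)):
        cond = (k % 2 == 0)  # uniform across the lanes active in this iteration
        active = [lane for lane in lanes if k < trip_counts[lane]]
        # Mask written in this iteration: only active lanes can set their bit.
        last_mask = sum(1 << lane for lane in active) if cond else 0
        for lane in active:
            value_at_exit[lane] = cond  # each lane keeps its own latest value
    correct_mask = sum(1 << lane for lane in lanes if value_at_exit[lane])
    return correct_mask, last_mask
```

With equal trip counts the two masks agree. With `simulate([1, 2, 3])`, lane 0 should still see the value from its only iteration, but the final iteration's mask no longer carries it. This is the non-uniform-exit situation that the commit message describes.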
* [AArch64] Add patterns matching (fabs (fsub x y)) to (fabd x y) | John Brawn | 2018-04-04 | 3 | -0/+106
  Differential Revision: https://reviews.llvm.org/D44573
  llvm-svn: 329163
* [DAGCombine] Improve ReduceLoadWidth for SRL | Sam Parker | 2018-04-04 | 2 | -22/+114
  Recommitting rL321259. Previously this caused an issue with PPCBE but I didn't receive a reproducer and didn't have the time to follow up. If the issue appears again, please provide a reproducer so I can fix it.
  Original commit message:
  If the SRL node is only used by an AND, we may be able to set the ExtVT to the width of the mask, making the AND redundant. To support this, another check has been added in isLegalNarrowLoad which queries whether the load is valid.
  Differential Revision: https://reviews.llvm.org/D41350
  llvm-svn: 329160
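The SRL-then-AND narrowing described above can be sanity-checked with a small Python model. This is a sketch of the equivalence, not LLVM's ReduceLoadWidth code. It assumes a low-bits mask whose width is a whole number of bytes. Under that assumption, `(load >> shift) & mask` equals a narrower load at some byte offset, and that offset depends on the target's byte order.

```python
def narrow_load_offset(shift_bits, mask_bits, load_bytes, little_endian):
    """Byte offset of a narrow load equivalent to (wide_load >> shift) & mask.

    Assumes shift_bits and mask_bits are multiples of 8 and the mask keeps
    exactly the low mask_bits bits.
    """
    if little_endian:
        return shift_bits // 8
    return load_bytes - (shift_bits + mask_bits) // 8

def wide_then_mask(mem, shift_bits, mask_bits, little_endian):
    """Load the whole word, shift right, then mask (the original pattern)."""
    order = "little" if little_endian else "big"
    word = int.from_bytes(mem, order)
    return (word >> shift_bits) & ((1 << mask_bits) - 1)

def narrow(mem, shift_bits, mask_bits, little_endian):
    """Load only mask_bits // 8 bytes at the computed offset (the narrowed pattern)."""
    order = "little" if little_endian else "big"
    off = narrow_load_offset(shift_bits, mask_bits, len(mem), little_endian)
    return int.from_bytes(mem[off:off + mask_bits // 8], order)
```

For example, with the bytes `11 22 33 44`, `(load >> 8) & 0xFF` becomes a one-byte load at offset 1 on a little-endian target but at offset 2 on a big-endian one.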
* [ARM] Do not convert some vmov instructions | Mikhail Maltsev | 2018-04-04 | 1 | -0/+4
  Summary: Patch https://reviews.llvm.org/D44467 implements conversion of invalid vmov instructions into valid ones. It turned out that some valid instructions also get converted, for example
    vmov.i64 d2, #0xff00ff00ff00ff00 -> vmov.i16 d2, #0xff00
  Such behavior is incorrect because, according to the ARM ARM section F2.7.7 "Modified immediate constants in T32 and A32 Advanced SIMD instructions", "On assembly, the data type must be matched in the table if possible."
  This patch fixes the isNEONmovReplicate check so that the above instruction is not modified any more.
  Reviewers: rengolin, olista01
  Reviewed By: rengolin
  Subscribers: javed.absar, kristof.beyls, rogfer01, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44678
  llvm-svn: 329158
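Why the example above is valid as written: the `vmov.i64` modified-immediate form encodes a 64-bit value in which every byte is either 0x00 or 0xFF. Below is a hedged Python sketch of the two checks involved, with hypothetical helper names (this is not the isNEONmovReplicate code). The value 0xff00ff00ff00ff00 is both a legal i64 byte mask and a replicated 16-bit pattern, so following the "match the data type if possible" rule means it must not be rewritten.

```python
def is_vmov_i64_byte_mask(value):
    """True if each byte of the 64-bit immediate is 0x00 or 0xFF (the vmov.i64 form)."""
    return all(((value >> (8 * i)) & 0xFF) in (0x00, 0xFF) for i in range(8))

def replicated_element(value, elem_bits):
    """Return the element if `value` is one elem_bits-wide pattern repeated, else None."""
    mask = (1 << elem_bits) - 1
    elem = value & mask
    for shift in range(0, 64, elem_bits):
        if (value >> shift) & mask != elem:
            return None
    return elem

def may_rewrite_i64(value):
    """Hypothetical policy: rewrite a vmov.i64 only if its immediate is NOT encodable as i64."""
    return not is_vmov_i64_byte_mask(value)
```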
* [SCEV] Prove implications for SCEVUnknown Phis | Max Kazantsev | 2018-04-04 | 2 | -0/+181
  This patch teaches SCEV how to prove implications for SCEVUnknown nodes that are Phis. If we need to prove `Pred` for `LHS, RHS`, and `LHS` is a Phi with possible incoming values `L1, L2, ..., LN`, and we can prove `Pred` for `(L1, RHS), (L2, RHS), ..., (LN, RHS)`, then we can also prove it for `(LHS, RHS)`. If both `LHS` and `RHS` are Phis from the same block, it is sufficient to prove the predicate for values that come from the same predecessor block.
  The typical case that it handles is that we sometimes need to prove that `Phi(Len, Len - 1) >= 0` given that `Len > 0`. The new logic was added to `isImpliedViaOperations` and only uses it and non-recursive reasoning to prove the facts we need, so it should not hurt compile time a lot.
  Differential Revision: https://reviews.llvm.org/D44001
  Reviewed By: anna
  llvm-svn: 329150
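A toy Python rendering of both rules, under stated simplifications. Incoming values are modeled as `Len + c` over unbounded integers (no wrap-around), and a concrete comparison function stands in for SCEV's predicate proofs. The helper names are hypothetical.

```python
def implied_ge_zero(offsets, len_min):
    """Prove Phi(Len + c1, Len + c2, ...) >= 0 from the fact Len >= len_min.

    Len + c >= 0 holds for every Len >= len_min exactly when len_min + c >= 0,
    and the phi inherits the fact only if every incoming value satisfies it.
    """
    return all(len_min + c >= 0 for c in offsets)

def implied_for_same_block_phis(lhs_by_pred, rhs_by_pred, pred):
    """Both phis live in the same block: compare values from the same predecessor only."""
    return all(pred(lhs_by_pred[bb], rhs_by_pred[bb]) for bb in lhs_by_pred)
```

With `Len > 0` (so `len_min = 1`), `implied_ge_zero([0, -1], 1)` proves the `Phi(Len, Len - 1) >= 0` case from the message. Consider the pairwise rule with phis `{entry: 0, latch: 5}` and `{entry: 1, latch: 6}`: `<` holds per predecessor, even though it fails across mismatched pairs such as `5 < 1`.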
* [SimplifyCFG] Teach merge conditional stores to handle cases where the PostBB has more than 2 predecessors by inserting a new block for the store. | Craig Topper | 2018-04-04 | 1 | -0/+39
  Summary: Currently merge conditional stores can't handle cases where PostBB (the block we need to move the store to) has more than 2 predecessors.
  This patch removes that restriction by creating a new block with only the 2 predecessors we care about and an unconditional branch to the original block. This provides a place to put the store.
  Reviewers: efriedma, jmolloy, ABataev
  Reviewed By: efriedma
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D39760
  llvm-svn: 329142
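A toy version of the CFG edit in Python. A dict maps each block to its successor list, and the block names are hypothetical. The two predecessors of interest are redirected to a fresh block that branches unconditionally to PostBB. The merged store then has a block whose only predecessors are those two, while any other predecessor still reaches PostBB directly.

```python
def split_preds(succs, post_bb, keep_preds, new_bb):
    """Insert new_bb between keep_preds and post_bb; other predecessors are untouched."""
    for p in keep_preds:
        succs[p] = [new_bb if s == post_bb else s for s in succs[p]]
    succs[new_bb] = [post_bb]  # unconditional branch to the original block
    return succs

def predecessors(succs, bb):
    """All blocks with an edge to bb, sorted for stable output."""
    return sorted(p for p, targets in succs.items() if bb in targets)

if __name__ == "__main__":
    cfg = {"A": ["Post"], "B": ["Post"], "C": ["Post"], "Post": []}
    split_preds(cfg, "Post", ["A", "B"], "StoreBB")
    print(predecessors(cfg, "StoreBB"), predecessors(cfg, "Post"))
```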
* Add the ShadowCallStack pass | Vlad Tsyrklevich | 2018-04-04 | 3 | -0/+206
  Summary: The ShadowCallStack pass instruments functions marked with the shadowcallstack attribute. The instrumented prolog saves the return address to [gs:offset] where offset is stored and updated in [gs:0]. The instrumented epilog loads/updates the return address from [gs:0] and checks that it matches the return address on the stack before returning.
  Reviewers: pcc, vitalybuka
  Reviewed By: pcc
  Subscribers: cryptoad, eugenis, craig.topper, mgorny, llvm-commits, kcc
  Differential Revision: https://reviews.llvm.org/D44802
  llvm-svn: 329139
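The prolog/epilog protocol can be pictured with a small Python model of the gs-relative region. This is a sketch of the described behavior, not the emitted x86 sequence. It assumes 8-byte slots and that the offset in slot 0 is bumped before the return address is stored. Both details are assumptions, and the class names are hypothetical.

```python
class ShadowCallStackViolation(Exception):
    pass

class ShadowRegion:
    """Toy model of the gs-relative region: slot 0 holds the current offset."""

    SLOT = 8  # assumed 8-byte slots for 64-bit return addresses

    def __init__(self):
        self.mem = {0: 0}

    def prolog(self, return_address):
        self.mem[0] += self.SLOT                # update the offset kept in [gs:0]
        self.mem[self.mem[0]] = return_address  # save the return address at [gs:offset]

    def epilog(self, return_address_on_stack):
        offset = self.mem[0]
        saved = self.mem[offset]                # load the saved return address
        self.mem[0] = offset - self.SLOT        # pop the shadow entry
        if saved != return_address_on_stack:    # mismatch: the stack copy was overwritten
            raise ShadowCallStackViolation(hex(return_address_on_stack))
```

Nested calls push and pop in order. An epilog that sees a return address on the stack different from the shadow copy raises, which stands in for the abort that instrumented code would perform.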
* [MachineOutliner] Test for X86FI->getUsesRedZone() as well as Attribute::NoRedZone | Jessica Paquette | 2018-04-03 | 1 | -0/+70
  This commit is similar to r329120, but uses the existing getUsesRedZone() function in X86MachineFunctionInfo.
  This teaches the outliner to look at whether or not a function *truly* uses a redzone instead of just the noredzone attribute on a function.
  Thus, after this commit, it's possible to outline from x86 without using -mno-red-zone and still get outlining results.
  This also adds a new test for the new redzone behaviour.
  llvm-svn: 329134
* [AMDGPU] performMinMaxCombine should not optimize patterns of vectors to min3/max3. | Farhana Aleen | 2018-04-03 | 2 | -0/+58
  Summary: There are no packed instructions for min3 or max3. So, performMinMaxCombine should not optimize vectors of f16 to min3/max3.
  Author: FarhanaAleen
  Reviewed By: arsenm
  Subscribers: llvm-commits, AMDGPU
  Differential Revision: https://reviews.llvm.org/D45219
  llvm-svn: 329131
* [InstCombine] allow more fmul folds with 'reassoc' | Sanjay Patel | 2018-04-03 | 1 | -25/+28
  The tests marked with 'FIXME' require loosening the check in SimplifyAssociativeOrCommutative() to optimize completely; that's still checking isFast() in Instruction::isAssociative().
  llvm-svn: 329121
* [MachineOutliner] Keep track of fns that use a redzone in AArch64FunctionInfo | Jessica Paquette | 2018-04-03 | 1 | -0/+47
  This patch adds a hasRedZone() function to AArch64MachineFunctionInfo. It returns true if the function is known to use a redzone, false if it is known to not use a redzone, and no value otherwise.
  This removes the requirement to pass -mno-red-zone when outlining for AArch64.
  https://reviews.llvm.org/D45189
  llvm-svn: 329120
* Revert "MSG" | Farhana Aleen | 2018-04-03 | 2 | -24/+0
  This reverts commit 9a0ce889d1c39c74d69ecad5ce9c875155ae55de.
  This was committed by mistake.
  llvm-svn: 329119
* [MachineOutliner][NFC] Make outlined functions have internal linkage | Jessica Paquette | 2018-04-03 | 4 | -32/+40
  The linkage type on outlined functions was private before. This meant that if you set a breakpoint in an outlined function, the debugger wouldn't be able to give a sane name to the outlined function.
  This commit changes the linkage type to internal and updates any tests that relied on the prefixes on the names of outlined functions.
  llvm-svn: 329116
* MSG | Farhana Aleen | 2018-04-03 | 2 | -0/+24
  llvm-svn: 329114
* [coroutines] Respect alloca alignment requirements when building coroutine frame | Gor Nishanov | 2018-04-03 | 1 | -0/+61
  Summary: If an alloca needs to be stored in the coroutine frame, and it has an alignment specified that does not match the natural alignment of the alloca type, insert appropriate padding into the coroutine frame to make sure that it gets the requested alignment.
  For example, for a packed type (whose natural alignment is 1) with an alloca alignment of 8, we may need to insert a padding field with the required number of bytes to make sure it is properly aligned.
  ```
  %PackedStruct = type <{ i64 }>
  ...
  %data = alloca %PackedStruct, align 8
  ```
  If the previous field in the coroutine frame had alignment 2, we would have [6 x i8] inserted before %PackedStruct in the coroutine frame:
  ```
  %f.Frame = type { ..., i16, [6 x i8], %PackedStruct }
  ```
  Reviewers: rnk, lewissbaker, modocache
  Reviewed By: modocache
  Subscribers: EricWF, llvm-commits
  Differential Revision: https://reviews.llvm.org/D45221
  llvm-svn: 329112
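The padding arithmetic can be checked with a short Python sketch. The leading `i64` field is a hypothetical stand-in for the `...` in the frame type, chosen so that the `i16` ends at byte offset 10. Because a packed struct's natural alignment is 1, ordinary struct layout would add no padding before it. The sketch therefore inserts explicit `[N x i8]` filler to reach the requested alignment of 8.

```python
def frame_layout(fields):
    """Lay out (type_name, size, align) fields, inserting explicit [N x i8] padding.

    Returns the list of frame element types and the total size in bytes.
    """
    elems, offset = [], 0
    for type_name, size, align in fields:
        pad = (-offset) % align  # bytes needed to reach the next multiple of align
        if pad:
            elems.append(f"[{pad} x i8]")
            offset += pad
        elems.append(type_name)
        offset += size
    return elems, offset

if __name__ == "__main__":
    print(frame_layout([("i64", 8, 8), ("i16", 2, 2), ("%PackedStruct", 8, 8)]))
```

The `i16` occupies bytes 8 and 9, so 6 bytes of padding move `%PackedStruct` to offset 16, which reproduces the `[6 x i8]` field from the commit message.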
* [LoopInterchange] Add remark for calls preventing interchanging. | Florian Hahn | 2018-04-03 | 1 | -44/+33
  It also updates test/Transforms/LoopInterchange/call-instructions.ll to use accesses where we can prove dependence after D35430.
  Reviewers: sebpop, karthikthecool, blitz.opensource
  Reviewed By: sebpop
  Differential Revision: https://reviews.llvm.org/D45206
  llvm-svn: 329111
* Add the ShadowCallStack attribute | Vlad Tsyrklevich | 2018-04-03 | 1 | -2/+9
  Summary: Introduce the ShadowCallStack function attribute. It's added to functions compiled with -fsanitize=shadow-call-stack in order to mark functions to be instrumented by a ShadowCallStack pass to be submitted in a separate change.
  Reviewers: pcc, kcc, kubamracek
  Reviewed By: pcc, kcc
  Subscribers: cryptoad, mehdi_amini, javed.absar, llvm-commits, kcc
  Differential Revision: https://reviews.llvm.org/D44800
  llvm-svn: 329108
* [x86] add tests for convert-FP-to-integer with constants; NFC | Sanjay Patel | 2018-04-03 | 1 | -0/+133
  We don't constant fold any of these, but we could; if we do, we must produce the right answer.
  Unlike the IR fptosi instruction or its DAG node counterpart ISD::FP_TO_SINT, these are not undef for an out-of-range input.
  llvm-svn: 329100
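On x86 the truncating conversions give out-of-range and NaN inputs a defined result: the "integer indefinite" value 0x80000000 for 32-bit results. This assumes the invalid-operation exception is masked, which is the default. Below is a hedged Python model of the 32-bit truncating double-to-int conversion that any constant folder would have to match. The function name is illustrative and is not an LLVM API.

```python
import math

INT32_MIN, INT32_MAX = -(2 ** 31), 2 ** 31 - 1
INTEGER_INDEFINITE = INT32_MIN  # 0x80000000 reinterpreted as a signed 32-bit value

def cvttsd2si32(x: float) -> int:
    """Truncating double -> int32 conversion with x86 'integer indefinite' semantics."""
    if math.isnan(x) or math.isinf(x):
        return INTEGER_INDEFINITE
    t = math.trunc(x)  # round toward zero
    if t < INT32_MIN or t > INT32_MAX:
        return INTEGER_INDEFINITE  # out of range: invalid (assumed masked) -> indefinite
    return t
```

A folder that treated out-of-range inputs as undef, as is allowed for IR `fptosi`, could pick any value. Folding these target-specific conversions instead has to reproduce 0x80000000.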
* Disable a test using environment variables that requires a real shell | David Blaikie | 2018-04-03 | 1 | -0/+1
  llvm-svn: 329096
* [DEBUGINFO] Add option that allows disabling emission of flags in .loc directives. | Alexey Bataev | 2018-04-03 | 1 | -0/+73
  Summary: Some targets do not support the extended format of the .loc directive and support only the simple format: .loc <FileID> <Line> <Column>. This patch adds an MCAsmInfo flag and an option that allow emitting the .loc directive without additional flags.
  Reviewers: echristo
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D45184
  llvm-svn: 329089
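The difference between the two directive formats is easiest to see in a tiny formatter. This is a hedged Python sketch. The `allow_flags` parameter is a hypothetical stand-in for the new MCAsmInfo setting, whose real name is not given here. Flags such as `prologue_end` and `is_stmt 0` are examples of the extended-format options that get dropped.

```python
def format_loc(file_id, line, column, flags=(), allow_flags=True):
    """Format a .loc directive, dropping extra flags when the target only
    supports the simple '.loc <FileID> <Line> <Column>' form.

    `allow_flags` is a hypothetical stand-in for the MCAsmInfo setting.
    """
    parts = [".loc", str(file_id), str(line), str(column)]
    if allow_flags:
        parts.extend(flags)
    return " ".join(parts)

if __name__ == "__main__":
    print(format_loc(1, 3, 5, ["prologue_end", "is_stmt 0"]))
    print(format_loc(1, 3, 5, ["prologue_end", "is_stmt 0"], allow_flags=False))
```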
* [InstCombine] Fold compare of int constant against a splatted vector of ints | Daniel Neilson | 2018-04-03 | 1 | -0/+127
  Summary: Folding patterns like:
    %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
    %cast = bitcast <4 x i8> %vec to i32
    %cond = icmp eq i32 %cast, 0
  into:
    %ext = extractelement <4 x i8> %insvec, i32 0
    %cond = icmp eq i8 %ext, 0
  Combined with existing rules, this allows us to fold patterns like:
    %insvec = insertelement <4 x i8> undef, i8 %val, i32 0
    %vec = shufflevector <4 x i8> %insvec, <4 x i8> undef, <4 x i32> zeroinitializer
    %cast = bitcast <4 x i8> %vec to i32
    %cond = icmp eq i32 %cast, 0
  into:
    %cond = icmp eq i8 %val, 0
  When we construct a splat vector via a shuffle and bitcast the vector into an integer type for comparison against an integer constant, we can simplify the comparison to compare the splatted value against the integer constant.
  Reviewers: spatel, anna, mkazantsev
  Reviewed By: spatel
  Subscribers: efriedma, rengolin, llvm-commits
  Differential Revision: https://reviews.llvm.org/D44997
  llvm-svn: 329087
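The equivalence behind the fold can be checked exhaustively in Python for i8 elements. The commit's examples compare against 0. The general-constant case below is only an illustration of why a splat can equal a constant just when the constant is itself a byte splat. It is not a claim about exactly which constants the patch handles, and the helper names are hypothetical.

```python
def bitcast_splat_to_i32(byte_value):
    """Bitcast a <4 x i8> splat of byte_value to i32 (byte order is irrelevant for a splat)."""
    return int.from_bytes(bytes([byte_value] * 4), "little")

def folded_eq(byte_value, c):
    """Scalar test equivalent to (bitcast (splat byte_value)) == c for a 32-bit constant c."""
    b = c & 0xFF
    if bitcast_splat_to_i32(b) != c:
        return False  # c is not itself a byte splat, so no splat can equal it
    return byte_value == b
```

For `c == 0`, `folded_eq(v, 0)` reduces to `v == 0`. This is the `icmp eq i8 %val, 0` form shown in the commit message.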
* [SLP] Fix PR36481: vectorize reassociated instructions. | Alexey Bataev | 2018-04-03 | 9 | -186/+126
  Summary: If the load/extractelement/extractvalue instructions are not originally consecutive, the SLP vectorizer is unable to vectorize them. This patch allows reordering of such instructions.
  This patch does not support reordering of repeated instructions; this must be handled in a separate patch.
  Reviewers: RKSimon, spatel, hfinkel, mkuper, Ayal, ashahid
  Subscribers: llvm-commits
  Differential Revision: https://reviews.llvm.org/D43776
  llvm-svn: 329085
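A simplified Python picture of the reordering idea, not the SLP vectorizer's code. Given the byte offsets of a bundle of loads, find a permutation that makes them consecutive, or give up. Mirroring the limitation stated in the message, repeated accesses are rejected rather than handled. The function name is hypothetical.

```python
def consecutive_order(offsets, elem_size):
    """Return a permutation of indices that makes the accesses consecutive, or None."""
    if len(set(offsets)) != len(offsets):
        return None  # repeated accesses are not handled
    order = sorted(range(len(offsets)), key=lambda i: offsets[i])
    base = offsets[order[0]]
    for rank, i in enumerate(order):
        if offsets[i] != base + rank * elem_size:
            return None  # a gap: the bundle is not consecutive even after reordering
    return order
```

For example, loads at offsets `[8, 0, 12, 4]` with 4-byte elements become consecutive under the order `[1, 3, 0, 2]`. The vectorizer can then emit one wide load followed by a shuffle back to the original lane order.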
* [llvm-mca] Move the logic that prints register file statistics to its own view. NFCI | Andrea Di Biagio | 2018-04-03 | 5 | -5/+5
  Before this patch, the "BackendStatistics" view was responsible for printing the register file usage (as well as many other statistics).
  Now users can enable register file usage statistics using the command line flag `-register-file-stats`. By default, the tool doesn't print register file statistics.
  llvm-svn: 329083
* [LoopInterchange] Update tests so DA can handle access after D35430. | Florian Hahn | 2018-04-03 | 8 | -325/+365
  I have taken the opportunity to simplify some tests slightly and move parts around. It also brings back a few IR checks for interchangeable loops.
  Reviewers: karthikthecool, sebpop, grosser
  Reviewed By: sebpop
  Differential Revision: https://reviews.llvm.org/D45207
  llvm-svn: 329081
* [SLP] Added tests for checks of reordering of the repeated instructions, NFC. | Alexey Bataev | 2018-04-03 | 1 | -0/+129
  llvm-svn: 329080
* [Hexagon] Remove unneeded attributes from lit test | Krzysztof Parzyszek | 2018-04-03 | 1 | -1/+1
  llvm-svn: 329078
* Revert "[SLP] Fix PR36481: vectorize reassociated instructions." | Benjamin Kramer | 2018-04-03 | 8 | -122/+183
  This reverts commit r328980 and r329046. Makes the vectorizer crash.
  llvm-svn: 329071
* [MC][Tablegen] Allow the definition of processor register files in the scheduling model for llvm-mca | Andrea Di Biagio | 2018-04-03 | 5 | -13/+217
  This patch allows the description of register files in processor scheduling models. This addresses PR36662.
  A new tablegen class named 'RegisterFile' has been added to TargetSchedule.td. Targets can optionally describe register files for their processors using that class. In particular, class RegisterFile allows one to specify:
    - The total number of physical registers.
    - Which target registers are accessible through the register file.
    - The cost of allocating a register at the register renaming stage.
  Example (from this patch - see file X86/X86ScheduleBtVer2.td):
    def FpuPRF : RegisterFile<72, [VR64, VR128, VR256], [1, 1, 2]>
  Here, FpuPRF describes a register file for MMX/XMM/YMM registers. On Jaguar (btver2), a YMM register definition consumes 2 physical registers, while MMX/XMM register definitions only cost 1 physical register.
  The syntax allows specifying an empty set of register classes. An empty set of register classes means: this register file models all the registers specified by the Target. For each register class, users can specify an optional register cost. By default, register costs default to 1. A value of 0 for the number of physical registers means: "this register file has an unbounded number of physical registers".
  This patch is structured in two parts.
  * Part 1 - MC/Tablegen *
  A first part adds the tablegen definition of RegisterFile, and teaches the SubtargetEmitter how to emit information related to register files.
  Information about register files is accessible through an instance of MCExtraProcessorInfo. The idea behind this design is to logically partition the processor description which is only used by external tools (like llvm-mca) from the processor information used by the llvm machine schedulers. I think that this design would make it easier for targets to get rid of the extra processor information if they don't want it.
  * Part 2 - llvm-mca related *
  The second part of this patch is related to changes to llvm-mca. The main differences are:
    1) class RegisterFile now needs to take into account the "cost of a register" when allocating physical registers at the register renaming stage.
    2) Point 1. triggered a minor refactoring which led to the removal of the "maximum 32 register files" restriction.
    3) The BackendStatistics view has been updated so that we can print out extra details related to each register file implemented by the processor.
  The effect of point 3. is also visible in tests register-files-[1..5].s.
  Differential Revision: https://reviews.llvm.org/D44980
  llvm-svn: 329067
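To make the cost semantics concrete, here is a toy register-renaming model in Python that mirrors the FpuPRF example. It is a hedged sketch with hypothetical names, not llvm-mca's RegisterFile class. Register classes are plain strings, capacity 0 means unbounded, omitted costs default to 1, and an empty class list is treated as covering every register at cost 1.

```python
class RegisterFileModel:
    """Toy register-renaming model of a processor register file.

    num_phys_regs == 0 means an unbounded number of physical registers.
    reg_classes lists the classes this file covers (empty: every register);
    costs gives the physical registers consumed per definition (default 1).
    """

    def __init__(self, num_phys_regs, reg_classes, costs=None):
        self.capacity = num_phys_regs
        costs = costs if costs is not None else [1] * len(reg_classes)
        self.cost_of = dict(zip(reg_classes, costs))
        self.in_use = 0

    def cost(self, reg_class):
        if not self.cost_of:  # empty class list models every register
            return 1
        return self.cost_of[reg_class]

    def try_allocate(self, reg_class):
        c = self.cost(reg_class)
        if self.capacity and self.in_use + c > self.capacity:
            return False  # renaming would stall: not enough free physical registers
        self.in_use += c
        return True

    def release(self, reg_class):
        self.in_use -= self.cost(reg_class)
```

With `RegisterFileModel(72, ["VR64", "VR128", "VR256"], [1, 1, 2])`, 36 YMM definitions exhaust the file, and the next XMM definition must wait until a register is released.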
* [x86] Fix a pretty obvious think-o with my asm scrubbing. | Chandler Carruth | 2018-04-03 | 1 | -3959/+3959
  You have to in fact use regular expression syntax to use regular expressions. Should restore the bots. Sorry for the noise on this test.
  Thanks to Philip for spotting the bug!
  llvm-svn: 329057
* [x86] Clean up and enhance a test around eflags copying. | Chandler Carruth | 2018-04-03 | 1 | -25/+212
  This adds the basic test cases from all the EFLAGS bugs in more direct forms. It also switches to generated check lines, and includes both 32-bit and 64-bit variations.
  No functionality changing here, just setting things up to have a nice clean asm diff in my EFLAGS patch.
  llvm-svn: 329056
* [x86] Extend my goofy SP offset scrubbing for llc test cases to actually do explicit scrubbing of the offsets of stack spills and reloads. | Chandler Carruth | 2018-04-03 | 1 | -3959/+3959
  You can always turn this off in order to test specific stack slot usage. We were already hiding most of this, but the new logic hides it more generically. Notably, we should effectively hide stack slot churn in functions that have a frame pointer now, and should also hide it when changing a function from stack pointer to frame pointer. That transition already changes enough to be clearly noticed in the test case diff; showing *every* spill and reload is really noisy without benefit.
  See the test case I ran this on as a classic example.
  llvm-svn: 329055
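FileCheck only interprets text inside `{{...}}` as a regular expression. Below is a hypothetical Python scrubber in the same spirit; it is not the actual logic of LLVM's check-line update script. It replaces the numeric displacement of x86 stack-pointer- or frame-pointer-relative operands with a FileCheck regex, so that spill-slot churn stops showing up in test diffs.

```python
import re

# Hypothetical pattern: a signed decimal displacement before (%esp), (%rsp), (%ebp), (%rbp), etc.
STACK_OFFSET = re.compile(r"-?\d+\((%[re]?sp|%[re]?bp)\)")

def scrub_stack_offsets(asm_line: str) -> str:
    """Replace stack-slot displacements with a FileCheck {{...}} regex."""
    return STACK_OFFSET.sub(lambda m: "{{[-0-9]+}}(" + m.group(1) + ")", asm_line)

if __name__ == "__main__":
    print(scrub_stack_offsets("movl %eax, 12(%esp)"))
    print(scrub_stack_offsets("movq -8(%rbp), %rax"))
```

Operands without a displacement, such as `(%esp)`, are left alone, so a test can still pin down accesses at the very top of the stack.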
* MSan: introduce the conservative assembly handling mode. | Alexander Potapenko | 2018-04-03 | 1 | -0/+83
  The default assembly handling mode may introduce false positives in the cases when MSan doesn't understand that the assembly call initializes the memory pointed to by one of its arguments.
  We introduce the conservative mode, which initializes the first |sizeof(type)| bytes for every |type*| pointer passed into the assembly statement.
  llvm-svn: 329054
* [SCEV] Make computeExitLimit more simple and more powerful | Max Kazantsev | 2018-04-03 | 1 | -0/+34
  The current implementation of `computeExitLimit` has a big piece of code whose only purpose is to prove that after the execution of this block the latch will be executed. What it currently checks is actually a subset of the situations where the exiting block dominates the latch.
  This patch replaces all these checks for simple particular cases with a domination check over the loop's latch, which is the only necessary condition for taking the exiting block into consideration. This change allows us to calculate the exact loop taken count for simple loops like
    for (int i = 0; i < 100; i++) {
      if (cond) {...} else {...}
      if (i > 50) break;
      . . .
    }
  Differential Revision: https://reviews.llvm.org/D44677
  Reviewed By: efriedma
  llvm-svn: 329047
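The new criterion is "consider an exiting block only if it dominates the latch". It can be illustrated with a textbook iterative dominator computation in Python over a hand-built CFG for the loop above. The block names are hypothetical, and this is not SCEV's implementation.

```python
def dominators(cfg, entry):
    """Iterative dataflow: dom(n) = {n} | intersection of dom(p) over predecessors p."""
    nodes = set(cfg)
    preds = {n: set() for n in nodes}
    for n, succs in cfg.items():
        for s in succs:
            preds[s].add(n)
    dom = {n: set(nodes) for n in nodes}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            if preds[n]:
                new = set.intersection(*(dom[p] for p in preds[n])) | {n}
            else:
                new = {n}
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom

def exits_to_consider(cfg, entry, loop_blocks, latch):
    """Exiting blocks (with a successor outside the loop) that dominate the latch."""
    dom = dominators(cfg, entry)
    return sorted(b for b in loop_blocks
                  if any(s not in loop_blocks for s in cfg[b]) and b in dom[latch])

# Hand-built CFG for: for (i = 0; i < 100; i++) { if (cond) ... else ...; if (i > 50) break; }
CFG = {
    "entry": ["header"],
    "header": ["cond", "exit"],   # i < 100 ?
    "cond": ["then", "else"],     # if (cond)
    "then": ["check"],
    "else": ["check"],
    "check": ["latch", "exit"],   # if (i > 50) break;
    "latch": ["header"],          # i++
    "exit": [],
}
LOOP = {"header", "cond", "then", "else", "check", "latch"}
```

Both the header test and the `i > 50` test dominate the latch, so both exits are usable. If `then` also branched out of the loop, it would be excluded because control can reach the latch through `else` without passing through it.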
* bpf: fix incorrect SELECT_CC lowering | Yonghong Song | 2018-04-03 | 3 | -3/+3
  Commit 37962a331c77 ("bpf: Improve expanding logic in LowerSELECT_CC") intended to improve code quality for certain jmp conditions. The commit, however, has a couple of issues:
    (1) In the code, just swapping is not enough; the ConditionalCode CC should also be swapped, otherwise incorrect code will be generated.
    (2) The ConditionalCode swap should be subject to getHasJmpExt(). If getHasJmpExt() is false, certain conditional codes will not be supported and the swap may generate incorrect code.
  The original goal for this patch is to optimize jmp operations which do not have JmpExt turned on. If JmpExt is on, better code could be generated. For example, the test select_ri.ll is introduced to demonstrate the optimization. The same result can be achieved with the -mcpu=v2 flag.
  Signed-off-by: Yonghong Song <yhs@fb.com>
  Acked-by: Alexei Starovoitov <ast@kernel.org>
  llvm-svn: 329043
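Issue (1) is the familiar rule that exchanging a comparison's operands requires mirroring its condition code. Inside LLVM this mapping is provided by `ISD::getSetCCSwappedOperands`. Below is a small Python check of the signed and equality cases, with unsigned variants omitted. It verifies that the swapped operands paired with the mirrored code always select the same value, while swapping operands alone does not.

```python
import operator

CONDS = {"SETGT": operator.gt, "SETGE": operator.ge, "SETLT": operator.lt,
         "SETLE": operator.le, "SETEQ": operator.eq, "SETNE": operator.ne}

# Condition code to use after exchanging LHS and RHS.
SWAPPED = {"SETGT": "SETLT", "SETLT": "SETGT", "SETGE": "SETLE",
           "SETLE": "SETGE", "SETEQ": "SETEQ", "SETNE": "SETNE"}

def select_cc(lhs, rhs, cc, true_val, false_val):
    """Model of SELECT_CC: pick true_val if (lhs cc rhs) holds."""
    return true_val if CONDS[cc](lhs, rhs) else false_val
```

For example, `select_cc(1, 2, "SETGT", 1, 0)` gives 0, while naively swapping only the operands, `select_cc(2, 1, "SETGT", 1, 0)`, gives 1.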
* peel loops with runtime small trip counts | Ikhlas Ajbar | 2018-04-03 | 1 | -0/+37
  For Hexagon, peeling loops with small runtime trip counts is beneficial for our benchmarks. We set PeelCount in HexagonTargetInfo.cpp and we use the PeelCount set by the target for computing the desired peel count.
  Differential Revision: https://reviews.llvm.org/D44880
  llvm-svn: 329042
* [x86] Tidy up test case, generate check lines with script. NFC. | Chandler Carruth | 2018-04-03 | 1 | -141/+412
  Just adds basic block labels and tidies up where comments go in the test case and then generates fresh CHECK lines with the script. This way, the check lines are much easier to maintain. They were already close to this but not quite there.
  llvm-svn: 329040
* [SLP] Distinguish "demanded and shrinkable" from "demanded and not shrinkable" values when determining the minimum bitwidth | Haicheng Wu | 2018-04-03 | 2 | -19/+28
  We use two approaches for determining the minimum bitwidth.
    * Demanded bits
    * Value tracking
  If demanded bits doesn't result in a narrower type, we then try value tracking. We need this if we want to root SLP trees with the indices of getelementptr instructions since all the bits of the indices are demanded.
  But there is a missing piece. We need to be able to distinguish "demanded and shrinkable" from "demanded and not shrinkable". For example, the bits of %i in
    %i = sext i32 %e1 to i64
    %gep = getelementptr inbounds i64, i64* %p, i64 %i
  are demanded, but we can shrink %i's type to i32 because it won't change the result of the getelementptr. On the other hand, in
    %tmp15 = sext i32 %tmp14 to i64
    %tmp16 = insertvalue { i64, i64 } undef, i64 %tmp15, 0
  it doesn't make sense to shrink %tmp15 and we can skip the value tracking.
  Ideas are from Matthew Simpson!
  Differential Revision: https://reviews.llvm.org/D44868
  llvm-svn: 329035