summaryrefslogtreecommitdiffstats
path: root/llvm/lib/CodeGen
Commit message (Collapse)AuthorAgeFilesLines
* [GlobalISel][IRTranslator] Fix crash on translation of fneg.Amara Emerson2019-01-261-1/+1
| | | | | | | When the fneg IR instruction was added the code to do translation wasn't tested, and tried to get an invalid operand. llvm-svn: 352296
* AMDGPU/GlobalISel: Legalize more bit opsMatt Arsenault2019-01-261-0/+3
| | | | llvm-svn: 352295
* Revert r352255 "[SelectionDAG][X86] Don't use SEXTLOAD for promoting masked ↵Craig Topper2019-01-261-1/+1
| | | | | | | | loads in the type legalizer" This might be breaking an lldb windows buildbot. llvm-svn: 352268
* [SelectionDAG][X86] Don't use SEXTLOAD for promoting masked loads in the ↵Craig Topper2019-01-261-1/+1
| | | | | | | | | | | | | | | | | | | | | type legalizer Summary: I'm not sure why we were using SEXTLOAD. EXTLOAD seems more appropriate since we don't care about the upper bits. This patch changes this and then modifies the X86 post legalization combine to emit a extending shuffle instead of a sign_extend_vector_inreg. Could maybe use an any_extend_vector_inreg, but I just did what we already do in LowerLoad. I think we can actually get rid of this code entirely if we switch to -x86-experimental-vector-widening-legalization. On AVX512 targets I think we might be able to use a masked vpmovzx and not have to expand this at all. Reviewers: RKSimon, spatel Reviewed By: RKSimon Subscribers: llvm-commits Differential Revision: https://reviews.llvm.org/D57186 llvm-svn: 352255
* [NFC] Test commit : fix typo.Alexey Lapshin2019-01-251-1/+1
| | | | llvm-svn: 352248
* [MBP] Don't move bottom block before header if it can't reduce taken branchesGuozhi Wei2019-01-251-0/+38
| | | | | | | | | | | | | | | | | | If bottom of block BB has only one successor OldTop, in most cases it is profitable to move it before OldTop, except the following case: -->OldTop<- | . | | . | | . | ---Pred | | | BB----- Move BB before OldTop can't reduce the number of taken branches, this patch detects this case and prevent the moving. Differential Revision: https://reviews.llvm.org/D57067 llvm-svn: 352236
* [MC] Teach the MachO object writer about N_FUNC_COLDVedant Kumar2019-01-251-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | N_FUNC_COLD is a new MachO symbol attribute. It's a hint to the linker to order a symbol towards the end of its section, to improve locality. Example: ``` void a1() {} __attribute__((cold)) void a2() {} void a3() {} int main() { a1(); a2(); a3(); return 0; } ``` A linker that supports N_FUNC_COLD will order _a2 to the end of the text section. From `nm -njU` output, we see: ``` _a1 _a3 _main _a2 ``` Differential Revision: https://reviews.llvm.org/D57190 llvm-svn: 352227
* [TEST][COMMIT] - fix comment typo in AsmPrinter/DwarfDebug.cpp - NFCTom Weaver2019-01-251-1/+1
| | | | llvm-svn: 352214
* Fix gcc -Wparentheses warning. NFCI.Simon Pilgrim2019-01-251-2/+2
| | | | llvm-svn: 352191
* AMDGPU/GlobalISel: Scalarize add/subMatt Arsenault2019-01-251-0/+1
| | | | llvm-svn: 352167
* GlobalISel: fewerElementsVector for more cast typesMatt Arsenault2019-01-251-0/+5
| | | | llvm-svn: 352166
* GlobalISel: fewerElementsVector for a few more trivial opsMatt Arsenault2019-01-251-0/+6
| | | | llvm-svn: 352165
* AMDGPU/GlobalISel: Legalize smulh/umulh and scalarize mulMatt Arsenault2019-01-251-0/+3
| | | | llvm-svn: 352162
* GlobalISel: Support fewerElementsVector for icmp/fcmpMatt Arsenault2019-01-251-3/+75
| | | | | | Also legalize 64-bit compares for AMDGPU llvm-svn: 352157
* GlobalISel: Implement fewerElementsVector for extensionsMatt Arsenault2019-01-251-0/+54
| | | | llvm-svn: 352155
* GlobalISel: Add convenience mutatations to scalarizeMatt Arsenault2019-01-252-0/+12
| | | | llvm-svn: 352143
* RegBankSelect: Fix use after free in r352123Matt Arsenault2019-01-241-1/+1
| | | | llvm-svn: 352130
* [GISel]: Change how CSE is enabled by default for each passAditya Nandakumar2019-01-243-6/+12
| | | | | | | | | | | | | | | https://reviews.llvm.org/D57178 Now add a hook in TargetPassConfig to query if CSE needs to be enabled. By default this hook returns false only for O0 opt level but this can be overridden by the target. As a consequence of the default of enabled for non O0, a few tests needed to be updated to not use CSE (by passing in -O0) to the run line. reviewed by: arsenm llvm-svn: 352126
* RegBankSelect: Support some more complex part mappingsMatt Arsenault2019-01-242-25/+87
| | | | llvm-svn: 352123
* [GlobalISel][AArch64] Add isel support for FP16 vector @llvm.ceilJessica Paquette2019-01-241-1/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | This patch adds support for vector @llvm.ceil intrinsics when full 16 bit floating point support isn't available. To do this, this patch... - Implements basic isel for G_UNMERGE_VALUES - Teaches the legalizer about 16 bit floats - Teaches AArch64RegisterBankInfo to respect floating point registers on G_BUILD_VECTOR and G_UNMERGE_VALUES - Teaches selectCopy about 16-bit floating point vectors It also adds - A legalizer test for the 16-bit vector ceil which verifies that we create a G_UNMERGE_VALUES and G_BUILD_VECTOR when full fp16 isn't supported - An instruction selection test which makes sure we lower to G_FCEIL when full fp16 is supported - A test for selecting G_UNMERGE_VALUES And also updates arm64-vfloatintrinsics.ll to show that the new ceiling types work as expected. https://reviews.llvm.org/D56682 llvm-svn: 352113
* Fix emission of _fltused for MSVC.James Y Knight2019-01-244-26/+30
| | | | | | | | | | | | | | | | | | | | | | | | | | It should be emitted when any floating-point operations (including calls) are present in the object, not just when calls to printf/scanf with floating point args are made. The difference caused by this is very subtle: in static (/MT) builds, on x86-32, in a program that uses floating point but doesn't print it, the default x87 rounding mode may not be set properly upon initialization. This commit also removes the walk of the types pointed to by pointer arguments in calls. (To assist in opaque pointer types migration -- eventually the pointee type won't be available.) That latter implies that it will no longer consider a call like `scanf("%f", &floatvar)` as sufficient to emit _fltused on its own. And without _fltused, `scanf("%f")` will abort with error R6002. This new behavior is unlikely to bite anyone in practice (you'd have to read a float, and do nothing with it!), and also, is consistent with MSVC. Differential Revision: https://reviews.llvm.org/D56548 llvm-svn: 352076
* [SelectionDAGBuilder] Simplify HasSideEffect calculation. NFC.Nirav Dave2019-01-241-13/+7
| | | | llvm-svn: 352067
* [InlineAsm] Don't calculate registers for inline asm memory operands. NFCI.Nirav Dave2019-01-241-0/+4
| | | | llvm-svn: 352066
* [TargetLowering] Rename getExpandedFixedPointMultiplication to ↵Simon Pilgrim2019-01-242-3/+2
| | | | | | | | expandFixedPointMul. NFCI. Match the (much shorter) name used in various legalization methods. llvm-svn: 352056
* [SelectionDAGBuilder] Fuse inline asm input operand loops passes. NFCI.Nirav Dave2019-01-241-13/+9
| | | | llvm-svn: 352053
* [X86] Update SelectionDAGDumper to print the extension type and expanding ↵Craig Topper2019-01-241-0/+30
| | | | | | flag for masked loads. Add truncating and compressing for masked stores. llvm-svn: 352029
* DebugInfo: Use assembly label arithmetic for address pool size for easier ↵David Blaikie2019-01-242-9/+17
| | | | | | | | | reading/editing Recommits 350048, 350050 That broke buildbots because of some typos in the test case. llvm-svn: 352019
* [ADT] Notify ilist traits about in-list transfersReid Kleckner2019-01-231-2/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | Summary: Previously no client of ilist traits has needed to know about transfers of nodes within the same list, so as an optimization, ilist doesn't call transferNodesFromList in that case. However, now there are clients that want to use ilist traits to cache instruction ordering information to optimize dominance queries of instructions in the same basic block. This change updates the existing ilist traits users to detect in-list transfers and do nothing in that case. After this change, we can start caching instruction ordering information in LLVM IR data structures. There are two main ways to do that: - by putting an order integer into the Instruction class - by maintaining order integers in a hash table on BasicBlock I plan to implement and measure both, but I wanted to commit this change first to enable other out of tree ilist clients to implement this optimization as well. Reviewers: lattner, hfinkel, chandlerc Subscribers: hiraditya, dexonsmith, llvm-commits Differential Revision: https://reviews.llvm.org/D57120 llvm-svn: 351992
* [MC][X86] Correctly model additional operand latency caused by transfer ↵Andrea Di Biagio2019-01-231-2/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | delays from the integer to the floating point unit. This patch adds a new ReadAdvance definition named ReadInt2Fpu. ReadInt2Fpu allows x86 scheduling models to accurately describe delays caused by data transfers from the integer unit to the floating point unit. ReadInt2Fpu currently defaults to a delay of zero cycles (i.e. no delay) for all x86 models excluding BtVer2. That means, this patch is only a functional change for the Jaguar cpu model only. Tablegen definitions for instructions (V)PINSR* have been updated to account for the new ReadInt2Fpu. That read is mapped to the the GPR input operand. On Jaguar, int-to-fpu transfers are modeled as a +6cy delay. Before this patch, that extra delay was added to the opcode latency. In practice, the insert opcode only executes for 1cy. Most of the actual latency is actually contributed by the so-called operand-latency. According to the AMD SOG for family 16h, (V)PINSR* latency is defined by expression f+1, where f is defined as a forwarding delay from the integer unit to the fpu. When printing instruction latency from MCA (see InstructionInfoView.cpp) and LLC (only when flag -print-schedule is speified), we now need to account for any extra forwarding delays. We do this by checking if scheduling classes declare any negative ReadAdvance entries. Quoting a code comment in TargetSchedule.td: "A negative advance effectively increases latency, which may be used for cross-domain stalls". When computing the instruction latency for the purpose of our scheduling tests, we now add any extra delay to the formula. This avoids regressing existing codegen and mca schedule tests. It comes with the cost of an extra (but very simple) hook in MCSchedModel. Differential Revision: https://reviews.llvm.org/D57056 llvm-svn: 351965
* [DAGCombine] Enable more pre-indexed storesSam Parker2019-01-231-1/+7
| | | | | | | | | | | The current check in CombineToPreIndexedLoadStore is too conversative, preventing a pre-indexed store when the base pointer is a predecessor of the value being stored. Instead, we should check the pointer operand of the store. Differential Revision: https://reviews.llvm.org/D56719 llvm-svn: 351933
* [Pipeliner] Add two pragmas to control software pipelining optimizationBrendon Cahoon2019-01-231-7/+76
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | #pragma clang loop pipeline(disable) Disable SWP optimization for the next loop. “disable” is the only possible value. #pragma clang loop pipeline_initiation_interval(number) Set value of initiation interval for SWP optimization to specified number value for the next loop. Number is the positive value greater than 0. These pragmas could be used for debugging or reducing compile time purposes. It is possible to disable SWP for concrete loops to save compilation time or to find bugs by not doing SWP to certain loops. It is possible to set value of initiation interval to concrete number to save compilation time by not doing extra pipeliner passes or to check created schedule for specific initiation interval. That is llvm part of the fix Clang part of fix: https://reviews.llvm.org/D55710 Patch by Alexey Lapshin! Differential Revision: https://reviews.llvm.org/D56403 llvm-svn: 351923
* [CodeView] Allow empty types in member functionsJosh Stone2019-01-231-1/+4
| | | | | | | | | | | | | | | | | | | | | Summary: `CodeViewDebug::lowerTypeMemberFunction` used to default to a `Void` return type if the function's type array was empty. After D54667, it started blindly indexing the 0th item for the return type, which fails in `getOperand` for empty arrays if assertions are enabled. This patch restores the `Void` return type for empty type arrays, and adds a test generated by Rust in line-only debuginfo mode. Reviewers: zturner, rnk Reviewed By: rnk Subscribers: hiraditya, JDevlieghere, llvm-commits Differential Revision: https://reviews.llvm.org/D57070 llvm-svn: 351910
* [LegalizeTypes] Add debug prints to the top of PromoteFloatOperand and ↵Craig Topper2019-01-221-0/+12
| | | | | | | | | | PromoteFloatResult. Also add debug prints in the default case of the switches in these routines. Most if not all of the type legalization handlers already do this so this makes promoting floats consistent llvm-svn: 351890
* GlobalISel: Allow shift amount to be a different typeMatt Arsenault2019-01-221-17/+47
| | | | | | | | | For AMDGPU the shift amount is never 64-bit, and this needs to use a 32-bit shift. X86 uses i8, but seemed to be hacking around this before. llvm-svn: 351882
* GlobalISel: Make buildConstant handle vectorsMatt Arsenault2019-01-221-4/+38
| | | | | | | Produce a splat build_vector similar to how SelectionDAG::getConstant does. llvm-svn: 351880
* GlobalISel: Implement widen for extract_vector_elt elt typeMatt Arsenault2019-01-221-1/+16
| | | | llvm-svn: 351871
* GlobalISel: Implement fewerElementsVector for basic FP opsMatt Arsenault2019-01-221-7/+37
| | | | llvm-svn: 351866
* GlobalISel: Support narrowing zextload/sextloadMatt Arsenault2019-01-221-0/+27
| | | | llvm-svn: 351856
* [SelectionDAGBuilder] Defer C_Register Assignments to be in line withNirav Dave2019-01-221-13/+3
| | | | | | those of C_RegisterClass. NFCI. llvm-svn: 351854
* GlobalISel: Disallow vectors for G_CONSTANT/G_FCONSTANTMatt Arsenault2019-01-222-4/+12
| | | | llvm-svn: 351853
* Codegen support for atomicrmw fadd/fsubMatt Arsenault2019-01-223-0/+5
| | | | llvm-svn: 351851
* Reapply "IR: Add fp operations to atomicrmw"Matt Arsenault2019-01-221-0/+6
| | | | | | | This reapplies commits r351778 and r351782 with RISCV test fixes. llvm-svn: 351850
* [DEBUG_INFO, NVPTX] Fix relocation info.Alexey Bataev2019-01-223-22/+54
| | | | | | | | | | | | Summary: Initial function labels must follow the debug location for the correct relocation info generation. Reviewers: tra, jlebar, echristo Subscribers: jholewinski, llvm-commits Differential Revision: https://reviews.llvm.org/D45784 llvm-svn: 351843
* [DAGCombiner] narrow vector binop with 2 insert subvector operandsSanjay Patel2019-01-221-1/+24
| | | | | | | | | | | | | | | | | | vecbo (insertsubv undef, X, Z), (insertsubv undef, Y, Z) --> insertsubv VecC, (vecbo X, Y), Z This is another step in generic vector narrowing. It's also a step towards more horizontal op formation specifically for x86 (although we still failed to match those in the affected tests). The scalarization cases are also not optimal (we should be scalarizing those), but it's still an improvement to use a narrower vector op when we know part of the result must be constant because both inputs are undef in some vector lanes. I think a similar match but checking for a constant operand might help some of the cases in D51553. Differential Revision: https://reviews.llvm.org/D56875 llvm-svn: 351825
* Revert r351778: IR: Add fp operations to atomicrmwChandler Carruth2019-01-221-6/+0
| | | | | | | | | | | | | This broke the RISCV build, and even with that fixed, one of the RISCV tests behaves surprisingly differently with asserts than without, leaving there no clear test pattern to use. Generally it seems bad for hte IR to differ substantially due to asserts (as in, an alloca is used with asserts that isn't needed without!) and nothing I did simply would fix it so I'm reverting back to green. This also required reverting the RISCV build fix in r351782. llvm-svn: 351796
* IR: Add fp operations to atomicrmwMatt Arsenault2019-01-221-0/+6
| | | | | | Add just fadd/fsub for now. llvm-svn: 351778
* GlobalISel: Fix out of bounds crashes in verifierMatt Arsenault2019-01-221-3/+8
| | | | llvm-svn: 351769
* [DAGCombiner] fix crash when converting build vector to shuffleSanjay Patel2019-01-211-5/+11
| | | | | | | | | | The regression test is reduced from the example shown in D56281. This does raise a question as noted in the test file: do we want to handle this pattern? I don't have a motivating example for that on x86 yet, but it seems like we could have that pattern there too, so we could avoid the back-and-forth using a shuffle. llvm-svn: 351753
* GlobalISel: Add isPointer legality predicatesMatt Arsenault2019-01-201-0/+14
| | | | llvm-svn: 351699
* GlobalISel: Implement widenScalar for basic FP opsMatt Arsenault2019-01-201-4/+13
| | | | llvm-svn: 351696
OpenPOWER on IntegriCloud