bcm5719-llvm - Project Ortega BCM5719 LLVM

	Commit message (Collapse)	Author	Age	Files	Lines
*	[Test] Split up TestIntegerTypes.py	Jonas Devlieghere	2019-11-21	10	-217/+297
\| \| \| \| \|	The unsplit test is timing out on GreenDragon's sanitized bot. By splitting the test we avoid this issue and increase parallelism.
*	Debug info: Emit objc_direct methods as members of their containing class	Adrian Prantl	2019-11-21	3	-24/+50
\| \| \| \| \| \| \| \| \|	even in DWARF 4 and earlier. This allows the debugger to recognize them as direct functions as opposed to Objective-C methods. <rdar://problem/57327663> Differential Revision: https://reviews.llvm.org/D70544
*	[cmake] Explicitly mark libraries defined in lib/ as "Component Libraries"	Tom Stellard	2019-11-21	139	-154/+156
\|	Summary: Most libraries are defined in the lib/ directory but there are also a few libraries defined in tools/ e.g. libLLVM, libLTO. I'm defining "Component Libraries" as libraries defined in lib/ that may be included in libLLVM.so. Explicitly marking the libraries in lib/ as component libraries allows us to remove some fragile checks that attempt to differentiate between lib/ libraries and tools/ libraries: 1. In tools/llvm-shlib, because llvm_map_components_to_libnames(LIB_NAMES "all") returned a list of all libraries defined in the whole project, there was custom code needed to filter out libraries defined in tools/, none of which should be included in libLLVM.so. This code assumed that any library defined as static was from lib/ and everything else should be excluded. With this change, llvm_map_components_to_libnames(LIB_NAMES "all") only returns libraries that have been added to the LLVM_COMPONENT_LIBS global cmake property, so this custom filtering logic can be removed. Doing this also fixes the build with BUILD_SHARED_LIBS=ON and LLVM_BUILD_LLVM_DYLIB=ON. 2. There was some code in llvm_add_library that assumed that libraries defined in lib/ would not have LLVM_LINK_COMPONENTS or ARG_LINK_COMPONENTS set. This is only true because libraries defined in lib/ use LLVMBuild.txt and don't set these values. This code has now been fixed to check whether the library has been explicitly marked as a component library, which should make it easier to remove LLVMBuild at some point in the future. 
I have tested this patch on Windows, MacOS and Linux with release builds and the following combinations of CMake options: - "" (No options) - -DLLVM_BUILD_LLVM_DYLIB=ON - -DLLVM_LINK_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_BUILD_LLVM_DYLIB=ON - -DBUILD_SHARED_LIBS=ON -DLLVM_LINK_LLVM_DYLIB=ON Reviewers: beanz, smeenai, compnerd, phosek Reviewed By: beanz Subscribers: wuzish, jholewinski, arsenm, dschuff, jyknight, dylanmckay, sdardis, nemanjai, jvesely, nhaehnle, mgorny, mehdi_amini, sbc100, jgravelle-google, hiraditya, aheejin, fedor.sergeev, asb, rbar, johnrusso, simoncook, apazos, sabuasal, niosHD, jrtc27, MaskRay, zzheng, edward-jones, atanasyan, steven_wu, rogfer01, MartinMosbeck, brucehoult, the_o, dexonsmith, PkmX, jocewei, jsji, dang, Jim, lenary, s.egerton, pzheng, sameer.abuasal, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70179
*	Broaden the definition of a "widenable branch"	Philip Reames	2019-11-21	7	-26/+455
\|	As a reminder, a "widenable branch" is the pattern "br i1 (and i1 X, WC()), label %taken, label %untaken" where "WC" is the widenable condition intrinsic. The semantics of such a branch (derived from the semantics of WC) is that a new condition can be added into the condition arbitrarily without violating legality. Broaden the definition in two ways: (1) allow swapped operands to the br (and X, WC()) form; (2) allow a widenable branch with a trivial condition (i.e. true), which takes the form br i1 WC(). The former is just general robustness (e.g. for X = non-instruction this is what instcombine produces). The latter is specifically important as partial unswitching of a widenable range check produces exactly this form above the loop. Differential Revision: https://reviews.llvm.org/D70502
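The broadened pattern described in the commit above can be illustrated with a small matcher. This is only a Python sketch over a toy expression model, not the LLVM implementation: the classes `WC`, `Value`, and `And` and the function `parse_widenable_branch` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class WC:
    """Toy stand-in for a call to the widenable condition intrinsic."""


@dataclass(frozen=True)
class Value:
    """Toy stand-in for any other i1 value, identified by name."""
    name: str


@dataclass(frozen=True)
class And:
    """Toy stand-in for `and i1 lhs, rhs`."""
    lhs: "Cond"
    rhs: "Cond"


Cond = Union[WC, Value, And]


def parse_widenable_branch(cond):
    """Return (matched, X) for a branch condition.

    X is the non-WC operand, or None when the condition is a bare WC(),
    i.e. the trivial (true) condition.
    """
    if isinstance(cond, WC):
        # Newly accepted: br i1 WC()
        return True, None
    if isinstance(cond, And):
        if isinstance(cond.rhs, WC):
            # Original form: br i1 (and X, WC())
            return True, cond.lhs
        if isinstance(cond.lhs, WC):
            # Newly accepted: swapped operands, br i1 (and WC(), X)
            return True, cond.rhs
    return False, None
```

In this model, `And(Value("x"), WC())`, `And(WC(), Value("x"))`, and a bare `WC()` all match, while a condition without a `WC` operand does not.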
*	[Tests] Autogenerate a bunch of SCEV trip count tests for readability. Will ↵	Philip Reames	2019-11-21	9	-260/+443
\| \| \| \|	likely merge some of these files soon.
*	[OPENMP50]Add device/kind context selector support.	Alexey Bataev	2019-11-21	14	-74/+905
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Added basic parsing/sema support for device/kind context selector. Reviewers: jdoerfert Subscribers: rampitec, aheejin, fedor.sergeev, simoncook, guansong, s.egerton, hfinkel, kkwli0, caomhin, cfe-commits Tags: #clang Differential Revision: https://reviews.llvm.org/D70245
*	[SCEV] Add a mode to skip classification when printing analysis	Philip Reames	2019-11-21	2	-168/+64
\|	For the various trip-count tests, the classification isn't useful and makes the auto-generated tests super verbose. By skipping it, we make the auto-gen tests closer to the manually written ones. Up next: auto-genning a bunch of the existing tests.
*	[scudo][standalone] Minor optimization & improvements	Kostya Kortchinsky	2019-11-21	3	-11/+31
\|	Summary: A few small improvements and optimizations: - when refilling the free list, push back the last batch and return the front one: this keeps the allocations towards the front of the region; - instead of using 48 entries in the shuffle array, use a multiple of `MaxNumCached`; - make the maximum number of batches to create on refill a constant; ultimately it should be configurable, but that's for later; - `initCache` doesn't need to zero out the cache, it's already done. - it turns out that when using `\|\|` or `&&`, the compiler is adamant on adding a short circuit for every part of the expression, which ends up making somewhat annoying asm with lots of tests and conditional jumps. I am changing that to bitwise `\|` or `&` in two places so that the generated code looks better. Added comments since it might feel weird to people. This yields some small performance gains overall, nothing drastic though. Reviewers: hctim, morehouse, cferris, eugenis Subscribers: #sanitizers, llvm-commits Tags: #sanitizers, #llvm Differential Revision: https://reviews.llvm.org/D70452
*	[Docs] Generate the LLDB man page with Sphinx	Jonas Devlieghere	2019-11-21	5	-156/+324
\|	This patch replaces the existing out-of-date man page for lldb with an RST file from which Sphinx generates the actual troff file. This is similar to how man pages are generated for the rest of the LLVM utilities. The man page is generated by building the `docs-lldb-man` target. Differential revision: https://reviews.llvm.org/D70514
*	[SCEV] Be robust against IR generated by simple-loop-unswitch	Philip Reames	2019-11-21	2	-48/+74
\| \| \| \| \| \|	Simple loop unswitch likes to leave around unsimplified and/or/xors. SCEV today bails out on these idioms which is unfortunate in general, and specifically for the unswitch interaction. Differential Revision: https://reviews.llvm.org/D70459
*	[ELF] Error if -Ttext-segment is specified	Fangrui Song	2019-11-21	4	-8/+19
\|	In GNU ld, -Ttext sets the address of the .text section and -Ttext-segment sets the address of the text segment (RX). gold only supports the -Ttext-segment semantics and treats -Ttext as an alias for -Ttext-segment. lld only supports the -Ttext semantics and treats -Ttext-segment as an alias for -Ttext. The text segment will be assigned to an address less than the specified -Ttext-segment value. This patch drops the -Ttext-segment alias. The text segment is traditionally the first segment. Users who specify -Ttext-segment may actually want to specify --image-base, the lld way to express this. Unfortunately, this is currently supported by GNU ld's COFF port but not by its ELF port. gold does not support this option. With -z separate-code, the behavior of GNU ld -Ttext-segment is weird (see https://sourceware.org/bugzilla/show_bug.cgi?id=25207). rL289827 introduced the alias for linking qemu's non-pie user mode binaries. As explained previously, this actually assigns the text segment to an address less than 0x60000000. I feel that a better fix is on the qemu side: https://lists.nongnu.org/archive/html/qemu-devel/2019-11/msg02480.html Reviewed By: grimar, ruiu Differential Revision: https://reviews.llvm.org/D70468
*	[llvm-objcopy][MachO] Implement --strip-debug	Fangrui Song	2019-11-21	2	-3/+41
\| \| \| \| \| \|	Reviewed By: alexshap Differential Revision: https://reviews.llvm.org/D70476
*	[llvm-objcopy][MachO] Fix symbol order in the symbol table	Fangrui Song	2019-11-21	3	-3/+11
\| \| \| \| \| \| \| \| \| \|	Only consider isUndefinedSymbol() when the symbol is not local. This fixes an assert failure when copying the symbol table, if a n_type=0x20 symbol is followed by a n_type=0x64 symbol. Reviewed By: alexshap, seiya Differential Revision: https://reviews.llvm.org/D70475
*	[BranchFolding] Fix PR43964 about branch folder not being debug invariant	Bjorn Pettersson	2019-11-21	2	-95/+183
\|	Summary: The fix in BranchFolder for debug-invariance problems, made in commit ec32dff0b075055, actually introduced some new problems with debug invariance. Before that patch ComputeCommonTailLength would move iterators back, past debug instructions, in order to make ProfitableToMerge give consistent answers "when one block differs from the other only by whether debugging pseudos are present at the beginning". But the changes in ec32dff0b075055 undid that by moving the iterators forward again. This patch refactors ComputeCommonTailLength. The function was really complex, considering that the SkipTopCFIAndReturn part always moved the iterators forward to the first "real" instruction in the found tail after ec32dff0b075055. The patch also restores the logic to "back past possible debugging pseudos at beginning of block" to make sure ProfitableToMerge gives consistent answers independent of DBG_VALUE instructions before the tail. That is now done by ProfitableToMerge instead of being hidden as a side-effect in ComputeCommonTailLength. Reviewers: probinson, yechunliang, jmorse Reviewed By: jmorse Subscribers: Orlando, mehdi_amini, dexonsmith, aprantl, hiraditya, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70091
*	Fix compilation warning. NFC.	Michael Liao	2019-11-21	1	-1/+1
\|
*	[NFC] Refactor and improve comments in CommandObjectTarget	Adrian McCarthy	2019-11-21	1	-145/+141
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Made small improvements while debugging through CommandObjectTarget::AddModuleSymbols. 1. Refactored error case for an early out, reducing the indentation of the rest of this long function. 2. Clarified some comments by correcting spelling and punctuation. 3. Reduced duplicate code at the end of the function. Tested with `ninja check-lldb` Differential Review: https://reviews.llvm.org/D70458
*	Reduce the number of iterations in testcase. (NFC)	Adrian Prantl	2019-11-21	1	-2/+2
\|
*	[OPENMP]Fix datasharing checks for if clause in parallel taskloop	Alexey Bataev	2019-11-21	3	-3/+27
\| \| \| \| \| \| \| \|	directives. If the default datasharing is set to none, the datasharing attributes for variables in the condition of the if clause for the inner taskloop directive must be verified.
*	[InstCombine] add assert in SimplifyDemandedVectorElts and improve ↵	Sanjay Patel	2019-11-21	1	-19/+22
\| \| \| \|	readability; NFC
*	Fix unused variable warning. NFCI.	Simon Pilgrim	2019-11-21	1	-1/+1
\|
*	LLD: Don't use the stderrOS stream in link before it's reassigned.	James Y Knight	2019-11-21	9	-33/+30
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove the lld::enableColors function, as it just obscures which stream it's affecting, and replace with explicit calls to the stream's enable_colors. Also, assign the stderrOS and stdoutOS globals first in link function, just to ensure nothing might use them. (Either change individually fixes the issue of using the old stream, but both together seems best.) Follow-up to b11386f9be9b2dc7276a758d64f66833da10bdea. Differential Revision: https://reviews.llvm.org/D70492
*	[Hexagon] Remove incorrect intrinsic definition and invalid testcase	Krzysztof Parzyszek	2019-11-21	2	-36/+0
\| \| \| \| \| \| \| \| \|	The intrinsic int_hexagon_S2_asr_i_vh was mapped to S2_asr_r_vh, which is wrong. The testcase vasrh.select.ll was using an invalid immediate for that intrinsic. This is not a proper testcase, since at the MIR level such use of this intrinsic should never appear. Together with 824b25fc02, this completes the fix for llvm.org/PR44090.
*	[OPENMP50]Add if clause in for simd directive.	Alexey Bataev	2019-11-21	7	-107/+319
\|	According to OpenMP 5.0, the if clause can be used in the for simd directive. If the condition in the if clause is false, the non-vectorized version of the loop must be executed.
*	[mips] Add a 'generic' Mips CPU	Miloš Stojanović	2019-11-21	2	-0/+5
\| \| \| \| \| \| \|	Having a generic CPU removes a warning when creating a subtarget without the CPU being explicitly specified. Differential Revision: https://reviews.llvm.org/D70490
*	[DAGCombiner] Use the right thumbv7meb triple for ARM big-endian test.	Clement Courbet	2019-11-21	1	-93/+71
\|
*	[LV] PreferPredicateOverEpilog respecting option	Sjoerd Meijer	2019-11-21	2	-1/+23
\| \| \| \| \| \| \| \|	Follow-up of cb47b8783: don't query TTI->preferPredicateOverEpilogue when option -prefer-predicate-over-epilog is set to false, i.e. when we prefer not to predicate the loop. Differential Revision: https://reviews.llvm.org/D70382
*	[lldb][NFC] Modernize string handling in ↵	Raphael Isemann	2019-11-21	1	-11/+8
\| \| \| \|	ClangExpressionDeclMap::FindExternalVisibleDecl
*	[lldb][NFC] Move searching functions in ClangExpressionDeclMap to own function	Raphael Isemann	2019-11-21	2	-89/+121
\|
*	[DeclCXX] Remove unknown external linkage specifications	Ehud Katz	2019-11-21	8	-44/+7
\|	Partial revert of r372681 "Support for DWARF-5 C++ language tags". The change introduced new external linkage languages ("C++11" and "C++14") which are not supported in C++. It also changed the definition of the existing enum to use the DWARF constants. The problem is that "LinkageSpecDeclBits.Language" (the field that holds this enum) is actually defined as a 3-bit bitfield, which cannot contain the new DWARF constants. Defining the enum as integer literals is more appropriate for maintaining valid values. Differential Revision: https://reviews.llvm.org/D69935
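The arithmetic behind the 3-bit problem in the commit above can be checked directly. The following is a Python emulation, not clang code: it assumes an unsigned bitfield that keeps only the low bits on assignment, and the helper names `store_in_bitfield` and `fits` are hypothetical. The language codes are the values defined in the DWARF specification.

```python
BITFIELD_WIDTH = 3  # width of the field named in the commit message

# DWARF language codes, as defined in the DWARF specification.
DW_LANG_C = 0x0002
DW_LANG_C_plus_plus = 0x0004
DW_LANG_C_plus_plus_11 = 0x001A
DW_LANG_C_plus_plus_14 = 0x0021


def store_in_bitfield(value, width=BITFIELD_WIDTH):
    """Emulate assigning to an unsigned bitfield: only the low `width` bits survive."""
    return value & ((1 << width) - 1)


def fits(value, width=BITFIELD_WIDTH):
    """True when the value round-trips through the bitfield unchanged."""
    return store_in_bitfield(value, width) == value


for name, code in [
    ("DW_LANG_C", DW_LANG_C),
    ("DW_LANG_C_plus_plus", DW_LANG_C_plus_plus),
    ("DW_LANG_C_plus_plus_11", DW_LANG_C_plus_plus_11),
    ("DW_LANG_C_plus_plus_14", DW_LANG_C_plus_plus_14),
]:
    print(f"{name}: {code:#06x} -> stored {store_in_bitfield(code):#x}, fits={fits(code)}")
```

Under this emulation the two older codes fit in 3 bits, while 0x1A truncates to 0x2 (the same value as `DW_LANG_C`) and 0x21 truncates to 0x1, so the stored value no longer identifies the language.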
*	[Debuginfo][NFC] removes redundant semicolon.	Alexey Lapshin	2019-11-21	1	-1/+1
\|
*	[lldb][NFC] Reduce scope of some variables in ↵	Raphael Isemann	2019-11-21	1	-5/+3
\| \| \| \|	ClangExpressionDeclMap::FindExternalVisibleDecls
*	Make coding standards document more inclusive	Dmitri Gribenko	2019-11-21	2	-332/+180
\| \| \| \| \| \| \| \| \| \| \| \|	Summary: Patch by Doug Gregor, Tres Popp, and Dmitri Gribenko. Reviewers: chandlerc Subscribers: hfinkel, bmcreusillet, arsenm, doug.gregor, mgrang, arphaman, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D69354
*	Revert "[RISCV] Support mutilib in baremetal environment"	Zakk Chen	2019-11-21	29	-201/+11
\|	This reverts commit df876a026981b7a125b31bbb85ba4b1144edb0f9. Clang::riscv32-toolchain.c and Clang::riscv64-toolchain.c fail on Windows.
*	[DAGCombiner] Add tests for thumb load-combine.	Clement Courbet	2019-11-21	2	-0/+593
\|
*	Statistic - Fix MSVC shadow warning against global PrintOnExit static ↵	Simon Pilgrim	2019-11-21	2	-3/+3
\| \| \| \|	variable. NFC.
*	Fix Wshadow warning against global None variable. NFC.	Simon Pilgrim	2019-11-21	1	-2/+2
\|
*	[lldb][NFC] Remove test directory completely	Tatyana Krasnukha	2019-11-21	3	-2/+0
\| \| \| \| \| \|	The test was moved to "completion-in-lambda-and-unnamed-class" by D66175. + Fix typo in the directory name.
*	[lldb][NFC] Move searching local variables into own function	Raphael Isemann	2019-11-21	2	-39/+74
\|
*	[Driver] Fix a shadowing warning. NFC	Ilya Biryukov	2019-11-21	1	-7/+7
\| \| \| \| \|	Found by the following buildbot: http://lab.llvm.org:8011/builders/clang-cmake-x86_64-sde-avx512-linux/builds/30084
*	[lldb][NFC] Move searching the ClangModulesDeclVendor into own function	Raphael Isemann	2019-11-21	2	-72/+86
\|
*	Reland 9f3fdb0d7fab: [Driver] Use VFS to check if sanitizer blacklists exist	Ilya Biryukov	2019-11-21	13	-29/+188
\|	With updates to various LLVM tools that use SpecialCaseList. It was tempting to use RealFileSystem as the default, but that makes it too easy to accidentally forget passing VFS in clang code.
*	dwarfdump --statistics: Use new location list api	Pavel Labath	2019-11-21	3	-40/+147
\|	Summary: This patch removes manual location list handling in the statistics code and replaces it with the new DWARFDie api, which provides access to a "cooked" location list. This has the following effects: - the code now properly handles split-dwarf location lists - it will automatically support dwarf5 location lists once support for those is added - it properly handles location lists with base address selection entries - it fixes a bug where the location list code was using the first DW_AT_ranges range as a "base address" of the compile unit (it should have used DW_AT_low_pc instead). The effect of this was that the computation of the start address of a variable in its scope was broken for these kinds of compile units. This only manifested itself on linked files, since in object files the first DW_AT_ranges range normally starts at 0. Since pretty much every kind of location list was broken in some way, it's hard to verify that the new implementation is correct -- the output will be different in all non-trivial cases, and mostly with good reason. Most of the existing statistics tests continue to pass though, and a visual inspection of the statistics for non-trivial inputs shows that the data is more "reasonable" now. I have updated the "dwo statistics" test to include the new numbers, as the previous ones were completely bogus, and I have added a targeted test for the "base address" bug. Reviewers: dblaikie, cmtice, vsk Subscribers: aprantl, SouraVX, JDevlieghere, djtodoro, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70444
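The "base address" bug described in the commit above comes down to which value is added to the offsets stored in a location list. Below is a minimal Python sketch of that resolution step, assuming DWARF 4 style entries whose begin/end values are offsets from the current base address. The tuple encoding, the function `resolve_location_list`, and all addresses are hypothetical and chosen only for illustration; this is not the llvm-dwarfdump code.

```python
def resolve_location_list(entries, cu_base):
    """Turn base-relative location list entries into absolute address ranges.

    entries: sequence of ("base", address) base address selection entries
             and ("range", begin_offset, end_offset) entries (toy encoding).
    cu_base: the compile unit's base address (per the commit, DW_AT_low_pc).
    """
    base = cu_base
    resolved = []
    for entry in entries:
        if entry[0] == "base":
            # A base address selection entry replaces the current base.
            base = entry[1]
        else:
            _, begin, end = entry
            resolved.append((base + begin, base + end))
    return resolved


# Hypothetical linked file: DW_AT_low_pc is 0 and the first DW_AT_ranges
# range starts at 0x401000, so the stored offsets are already absolute.
entries = [("range", 0x401010, 0x401020)]

correct = resolve_location_list(entries, cu_base=0x0)      # DW_AT_low_pc
buggy = resolve_location_list(entries, cu_base=0x401000)   # first range start

print([(hex(b), hex(e)) for b, e in correct])
print([(hex(b), hex(e)) for b, e in buggy])
```

In an object file the first range would normally start at 0, so both choices of base give the same result there, which is consistent with the commit's remark that the bug only showed up on linked files.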
*	[mips] Rename test case. NFC	Simon Atanasyan	2019-11-21	1	-0/+0
\|
*	[mips] Remove unused `IsPCRelativeLoad` MIPS instructions attribute. NFC	Simon Atanasyan	2019-11-21	3	-11/+3
\| \| \| \|	This attribute is always set to zero.
*	[mips] Remove addresses from the test case. NFC	Simon Atanasyan	2019-11-21	1	-40/+40
\|	This reduces the diff when more tests are added in the future.
*	Revert "[DependenceAnalysis] Dependecies for loads marked with ↵	Benjamin Kramer	2019-11-21	1	-19/+6
\|	"ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151" Summary: Revert "[DependenceAnalysis] Dependecies for loads marked with "ivnariant.load" should not be shared with general accesses. Fix for https://bugs.llvm.org/show_bug.cgi?id=42151" This reverts commit 5f026b6d9e882941fde9b7e5dc0a2d807f7f24f5. 
We're (tensorflow.org/xla team) seeing some misscompiles with the new change, only at -O3, with fast math disabled. I'm still trying to come up with a useful/small/external example, but for now, the following IR: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" @0 = private unnamed_addr constant [4 x i8] c"\DB\0F\C9@" @1 = private unnamed_addr constant [4 x i8] c"\00\00\00?" ; Function Attrs: uwtable define void @jit_wrapped_fun.31(i8* %retval, i8* noalias %run_options, i8 noalias %params, i8 noalias %buffer_table, i64* noalias %prof_counters) #0 { entry: %fusion.invar_address.dim.2 = alloca i64 %fusion.invar_address.dim.1 = alloca i64 %fusion.invar_address.dim.0 = alloca i64 %fusion.1.invar_address.dim.2 = alloca i64 %fusion.1.invar_address.dim.1 = alloca i64 %fusion.1.invar_address.dim.0 = alloca i64 %0 = getelementptr inbounds i8, i8* %buffer_table, i64 1 %1 = load i8, i8* %0, !invariant.load !0, !dereferenceable !1, !align !2 %parameter.3 = bitcast i8* %1 to [2 x [1 x [4 x float]]]* %2 = getelementptr inbounds i8, i8* %buffer_table, i64 5 %3 = load i8, i8* %2, !invariant.load !0, !dereferenceable !1, !align !2 %fusion.1 = bitcast i8* %3 to [2 x [1 x [4 x float]]]* store i64 0, i64* %fusion.1.invar_address.dim.0 br label %fusion.1.loop_header.dim.0 fusion.1.loop_header.dim.0: ; preds = %fusion.1.loop_exit.dim.1, %entry %fusion.1.indvar.dim.0 = load i64, i64* %fusion.1.invar_address.dim.0 %4 = icmp uge i64 %fusion.1.indvar.dim.0, 2 br i1 %4, label %fusion.1.loop_exit.dim.0, label %fusion.1.loop_body.dim.0 fusion.1.loop_body.dim.0: ; preds = %fusion.1.loop_header.dim.0 store i64 0, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_header.dim.1: ; preds = %fusion.1.loop_exit.dim.2, %fusion.1.loop_body.dim.0 %fusion.1.indvar.dim.1 = load i64, i64* 
%fusion.1.invar_address.dim.1 %5 = icmp uge i64 %fusion.1.indvar.dim.1, 1 br i1 %5, label %fusion.1.loop_exit.dim.1, label %fusion.1.loop_body.dim.1 fusion.1.loop_body.dim.1: ; preds = %fusion.1.loop_header.dim.1 store i64 0, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_header.dim.2: ; preds = %fusion.1.loop_body.dim.2, %fusion.1.loop_body.dim.1 %fusion.1.indvar.dim.2 = load i64, i64* %fusion.1.invar_address.dim.2 %6 = icmp uge i64 %fusion.1.indvar.dim.2, 4 br i1 %6, label %fusion.1.loop_exit.dim.2, label %fusion.1.loop_body.dim.2 fusion.1.loop_body.dim.2: ; preds = %fusion.1.loop_header.dim.2 %7 = load float, float* bitcast ([4 x i8]* @0 to float) %8 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]] %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %9 = load float, float* %8, !invariant.load !0, !noalias !3 %10 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 %11 = load float, float* %10, !invariant.load !0, !noalias !3 %12 = fmul float %9, %11 %13 = fmul float %7, %12 %14 = call float @llvm.log.f32(float %13) %15 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %fusion.1.indvar.dim.0, i64 0, i64 %fusion.1.indvar.dim.2 store float %14, float* %15, !alias.scope !7, !noalias !8 %invar.inc2 = add nuw nsw i64 %fusion.1.indvar.dim.2, 1 store i64 %invar.inc2, i64* %fusion.1.invar_address.dim.2 br label %fusion.1.loop_header.dim.2 fusion.1.loop_exit.dim.2: ; preds = %fusion.1.loop_header.dim.2 %invar.inc1 = add nuw nsw i64 %fusion.1.indvar.dim.1, 1 store i64 %invar.inc1, i64* %fusion.1.invar_address.dim.1 br label %fusion.1.loop_header.dim.1 fusion.1.loop_exit.dim.1: ; preds = %fusion.1.loop_header.dim.1 %invar.inc = add nuw nsw i64 %fusion.1.indvar.dim.0, 1 store i64 %invar.inc, i64* %fusion.1.invar_address.dim.0 br 
label %fusion.1.loop_header.dim.0 fusion.1.loop_exit.dim.0: ; preds = %fusion.1.loop_header.dim.0 %16 = getelementptr inbounds i8, i8* %buffer_table, i64 4 %17 = load i8, i8* %16, !invariant.load !0, !dereferenceable !9, !align !2 %parameter.1 = bitcast i8* %17 to float* %18 = getelementptr inbounds i8, i8* %buffer_table, i64 2 %19 = load i8, i8* %18, !invariant.load !0, !dereferenceable !10, !align !2 %parameter.2 = bitcast i8* %19 to [3 x [1 x float]]* %20 = getelementptr inbounds i8, i8* %buffer_table, i64 0 %21 = load i8, i8* %20, !invariant.load !0, !dereferenceable !11, !align !2 %fusion = bitcast i8* %21 to [2 x [3 x [4 x float]]]* store i64 0, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_header.dim.0: ; preds = %fusion.loop_exit.dim.1, %fusion.1.loop_exit.dim.0 %fusion.indvar.dim.0 = load i64, i64* %fusion.invar_address.dim.0 %22 = icmp uge i64 %fusion.indvar.dim.0, 2 br i1 %22, label %fusion.loop_exit.dim.0, label %fusion.loop_body.dim.0 fusion.loop_body.dim.0: ; preds = %fusion.loop_header.dim.0 store i64 0, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_header.dim.1: ; preds = %fusion.loop_exit.dim.2, %fusion.loop_body.dim.0 %fusion.indvar.dim.1 = load i64, i64* %fusion.invar_address.dim.1 %23 = icmp uge i64 %fusion.indvar.dim.1, 3 br i1 %23, label %fusion.loop_exit.dim.1, label %fusion.loop_body.dim.1 fusion.loop_body.dim.1: ; preds = %fusion.loop_header.dim.1 store i64 0, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_header.dim.2: ; preds = %fusion.loop_body.dim.2, %fusion.loop_body.dim.1 %fusion.indvar.dim.2 = load i64, i64* %fusion.invar_address.dim.2 %24 = icmp uge i64 %fusion.indvar.dim.2, 4 br i1 %24, label %fusion.loop_exit.dim.2, label %fusion.loop_body.dim.2 fusion.loop_body.dim.2: ; preds = %fusion.loop_header.dim.2 %25 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %26 = add nuw nsw i64 0, %25 %27 = udiv i64 %26, 4 %28 = mul nuw nsw i64 
%fusion.indvar.dim.0, 1 %29 = add nuw nsw i64 0, %28 %30 = udiv i64 %29, 2 %31 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %fusion.1, i64 0, i64 %29, i64 0, i64 %26 %32 = load float, float* %31, !alias.scope !7, !noalias !8 %33 = mul nuw nsw i64 %fusion.indvar.dim.1, 1 %34 = add nuw nsw i64 0, %33 %35 = udiv i64 %34, 3 %36 = load float, float* %parameter.1, !invariant.load !0, !noalias !3 %37 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %parameter.2, i64 0, i64 %34, i64 0 %38 = load float, float* %37, !invariant.load !0, !noalias !3 %39 = fsub float %36, %38 %40 = fmul float %39, %39 %41 = mul nuw nsw i64 %fusion.indvar.dim.2, 1 %42 = add nuw nsw i64 0, %41 %43 = udiv i64 %42, 4 %44 = mul nuw nsw i64 %fusion.indvar.dim.0, 1 %45 = add nuw nsw i64 0, %44 %46 = udiv i64 %45, 2 %47 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %48 = load float, float* %47, !invariant.load !0, !noalias !3 %49 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %parameter.3, i64 0, i64 %45, i64 0, i64 %42 %50 = load float, float* %49, !invariant.load !0, !noalias !3 %51 = fmul float %48, %50 %52 = fdiv float %40, %51 %53 = fadd float %32, %52 %54 = fneg float %53 %55 = load float, float* bitcast ([4 x i8]* @1 to float) %56 = fmul float %54, %55 %57 = getelementptr inbounds [2 x [3 x [4 x float]]], [2 x [3 x [4 x float]]] %fusion, i64 0, i64 %fusion.indvar.dim.0, i64 %fusion.indvar.dim.1, i64 %fusion.indvar.dim.2 store float %56, float* %57, !alias.scope !8, !noalias !12 %invar.inc5 = add nuw nsw i64 %fusion.indvar.dim.2, 1 store i64 %invar.inc5, i64* %fusion.invar_address.dim.2 br label %fusion.loop_header.dim.2 fusion.loop_exit.dim.2: ; preds = %fusion.loop_header.dim.2 %invar.inc4 = add nuw nsw i64 %fusion.indvar.dim.1, 1 store i64 %invar.inc4, i64* %fusion.invar_address.dim.1 br label %fusion.loop_header.dim.1 fusion.loop_exit.dim.1: ; preds 
= %fusion.loop_header.dim.1 %invar.inc3 = add nuw nsw i64 %fusion.indvar.dim.0, 1 store i64 %invar.inc3, i64* %fusion.invar_address.dim.0 br label %fusion.loop_header.dim.0 fusion.loop_exit.dim.0: ; preds = %fusion.loop_header.dim.0 %58 = getelementptr inbounds i8, i8* %buffer_table, i64 3 %59 = load i8, i8* %58, !invariant.load !0, !dereferenceable !2, !align !2 %tuple.30 = bitcast i8* %59 to [1 x i8] %60 = bitcast [2 x [3 x [4 x float]]]* %fusion to i8* %61 = getelementptr inbounds [1 x i8], [1 x i8]* %tuple.30, i64 0, i64 0 store i8* %60, i8** %61, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare float @llvm.log.f32(float) #1 attributes #0 = { uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` gets (correctly) optimized to the one below without the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8 noalias nocapture readnone %params, i8 noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8, i8* %buffer_table, i64 1 %1 = bitcast i8 %0 to [2 x [1 x [4 x float]]] %2 = load [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr 
inbounds i8, i8* %buffer_table, i64 5 %4 = bitcast i8 %3 to [2 x [1 x [4 x float]]] %5 = load [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8, i8* %buffer_table, i64 4 %21 = bitcast i8 %20 to float %22 = load float, float* %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8, i8* %buffer_table, i64 2 %24 = bitcast i8 %23 to [3 x [1 x float]] %25 = load [3 x [1 x float]], [3 x [1 x float]]* %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8, i8* %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x 
float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle30 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, <4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = fmul <4 x float> %7, %7 %shuffle31 = shufflevector <4 x float> %38, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %39 = fdiv <8 x float> %shuffle30, %shuffle31 %40 = fadd <8 x float> %shuffle, %39 %41 = fmul <8 x float> %40, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %42 = bitcast i8* %26 to <8 x float>* store <8 x float> %41, <8 x float>* %42, align 8, !alias.scope !8, !noalias !12 %43 = getelementptr inbounds i8, i8* %26, i64 32 %44 = fdiv <4 x float> %37, %38 %45 = fadd <4 x float> %10, %44 %46 = fmul <4 x float> %45, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %47 = bitcast i8* %43 to <4 x float>* store <4 x float> %46, <4 x float>* %47, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = getelementptr 
inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %48 = bitcast float* %.phi.trans.insert to <4 x float>* %49 = load <4 x float>, <4 x float>* %48, align 8, !alias.scope !7, !noalias !8 %50 = bitcast float* %.phi.trans.insert12 to <4 x float>* %51 = load <4 x float>, <4 x float>* %50, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %49, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %52 = getelementptr inbounds i8, i8* %26, i64 48 %53 = fmul <4 x float> %51, %51 %shuffle31.1 = shufflevector <4 x float> %53, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %54 = fdiv <8 x float> %shuffle30, %shuffle31.1 %55 = fadd <8 x float> %shuffle.1, %54 %56 = fmul <8 x float> %55, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %57 = bitcast i8* %52 to <8 x float>* store <8 x float> %56, <8 x float>* %57, align 8, !alias.scope !8, !noalias !12 %58 = getelementptr inbounds i8, i8* %26, i64 80 %59 = fdiv <4 x float> %37, %53 %60 = fadd <4 x float> %49, %59 %61 = fmul <4 x float> %60, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %62 = bitcast i8* %58 to <4 x float>* store <4 x float> %61, <4 x float>* %62, align 8, !alias.scope !8, !noalias !12 %63 = getelementptr inbounds i8, i8* %buffer_table, i64 3 %64 = bitcast i8** %63 to [1 x i8]* %65 = load [1 x i8], [1 x i8]* %64, align 8, !invariant.load !0, !dereferenceable !2, !align !2 %66 = getelementptr inbounds [1 x i8], [1 x i8]* %65, i64 0, i64 0 store i8* %26, i8** %66, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) #1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes #1 = { 
nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` and (incorrectly) optimized to the one below with the change: ``` ; ModuleID = '__compute_module' source_filename = "__compute_module" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128" target triple = "x86_64-grtev4-linux-gnu" ; Function Attrs: nofree nounwind uwtable define void @jit_wrapped_fun.31(i8* nocapture readnone %retval, i8* noalias nocapture readnone %run_options, i8 noalias nocapture readnone %params, i8 noalias nocapture readonly %buffer_table, i64* noalias nocapture readnone %prof_counters) local_unnamed_addr #0 { entry: %0 = getelementptr inbounds i8, i8* %buffer_table, i64 1 %1 = bitcast i8 %0 to [2 x [1 x [4 x float]]] %2 = load [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %1, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %3 = getelementptr inbounds i8, i8* %buffer_table, i64 5 %4 = bitcast i8 %3 to [2 x [1 x [4 x float]]] %5 = load [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %4, align 8, !invariant.load !0, !dereferenceable !1, !align !2 %6 = bitcast [2 x [1 x [4 x float]]]* %2 to <4 x float>* %7 = load <4 x float>, <4 x float>* %6, align 8, !invariant.load !0, !noalias !3 %8 = fmul <4 x float> %7, %7 %9 = fmul <4 x float> %8, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %10 = call <4 x float> @llvm.log.v4f32(<4 x float> %9) %11 = bitcast [2 x [1 x [4 x float]]]* %5 to <4 x float>* store <4 x float> %10, <4 x float>* %11, align 8, !alias.scope !7, !noalias !8 %12 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, 
i64 0, i64 1, i64 0, i64 0 %13 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %14 = bitcast float* %12 to <4 x float>* %15 = load <4 x float>, <4 x float>* %14, align 8, !invariant.load !0, !noalias !3 %16 = fmul <4 x float> %15, %15 %17 = fmul <4 x float> %16, <float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000, float 0x401921FB60000000> %18 = call <4 x float> @llvm.log.v4f32(<4 x float> %17) %19 = bitcast float* %13 to <4 x float>* store <4 x float> %18, <4 x float>* %19, align 8, !alias.scope !7, !noalias !8 %20 = getelementptr inbounds i8, i8* %buffer_table, i64 4 %21 = bitcast i8 %20 to float %22 = load float, float* %21, align 8, !invariant.load !0, !dereferenceable !9, !align !2 %23 = getelementptr inbounds i8, i8* %buffer_table, i64 2 %24 = bitcast i8 %23 to [3 x [1 x float]] %25 = load [3 x [1 x float]], [3 x [1 x float]]* %24, align 8, !invariant.load !0, !dereferenceable !10, !align !2 %26 = load i8, i8* %buffer_table, align 8, !invariant.load !0, !dereferenceable !11, !align !2 %27 = load float, float* %22, align 8, !invariant.load !0, !noalias !3 %.phi.trans.insert28 = getelementptr inbounds [3 x [1 x float]], [3 x [1 x float]]* %25, i64 0, i64 2, i64 0 %.pre29 = load float, float* %.phi.trans.insert28, align 8, !invariant.load !0, !noalias !3 %28 = bitcast [3 x [1 x float]]* %25 to <2 x float>* %29 = load <2 x float>, <2 x float>* %28, align 8, !invariant.load !0, !noalias !3 %30 = insertelement <2 x float> undef, float %27, i32 0 %31 = shufflevector <2 x float> %30, <2 x float> undef, <2 x i32> zeroinitializer %32 = fsub <2 x float> %31, %29 %33 = fmul <2 x float> %32, %32 %shuffle32 = shufflevector <2 x float> %33, <2 x float> undef, <8 x i32> <i32 0, i32 0, i32 0, i32 0, i32 1, i32 1, i32 1, i32 1> %34 = fsub float %27, %.pre29 %35 = fmul float %34, %34 %36 = insertelement <4 x float> undef, float %35, i32 0 %37 = shufflevector <4 x float> %36, <4 x float> undef, 
<4 x i32> zeroinitializer %shuffle = shufflevector <4 x float> %10, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %38 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 0, i64 0, i64 3 %39 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 0, i64 0, i64 3 %40 = fmul <4 x float> %7, %7 %41 = shufflevector <4 x float> %40, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 undef, i32 undef, i32 undef, i32 undef> %42 = fdiv <8 x float> %shuffle32, %41 %43 = fadd <8 x float> %shuffle, %42 %44 = fmul <8 x float> %43, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %45 = bitcast i8* %26 to <8 x float>* store <8 x float> %44, <8 x float>* %45, align 8, !alias.scope !8, !noalias !12 %46 = extractelement <4 x float> %10, i32 0 %47 = getelementptr inbounds i8, i8* %26, i64 32 %48 = extractelement <4 x float> %10, i32 1 %49 = extractelement <4 x float> %10, i32 2 %50 = load float, float* %38, align 4, !alias.scope !7, !noalias !8 %51 = load float, float* %39, align 4, !invariant.load !0, !noalias !3 %52 = fmul float %51, %51 %53 = insertelement <4 x float> undef, float %52, i32 3 %54 = fdiv <4 x float> %37, %53 %55 = insertelement <4 x float> undef, float %46, i32 0 %56 = insertelement <4 x float> %55, float %48, i32 1 %57 = insertelement <4 x float> %56, float %49, i32 2 %58 = insertelement <4 x float> %57, float %50, i32 3 %59 = fadd <4 x float> %58, %54 %60 = fmul <4 x float> %59, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %61 = bitcast i8* %47 to <4 x float>* store <4 x float> %60, <4 x float>* %61, align 8, !alias.scope !8, !noalias !12 %.phi.trans.insert = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 0 %.phi.trans.insert12 = 
getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 0 %62 = bitcast float* %.phi.trans.insert to <4 x float>* %63 = load <4 x float>, <4 x float>* %62, align 8, !alias.scope !7, !noalias !8 %64 = bitcast float* %.phi.trans.insert12 to <4 x float>* %65 = load <4 x float>, <4 x float>* %64, align 8, !invariant.load !0, !noalias !3 %shuffle.1 = shufflevector <4 x float> %63, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %66 = getelementptr inbounds i8, i8* %26, i64 48 %67 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %5, i64 0, i64 1, i64 0, i64 3 %68 = getelementptr inbounds [2 x [1 x [4 x float]]], [2 x [1 x [4 x float]]]* %2, i64 0, i64 1, i64 0, i64 3 %69 = fmul <4 x float> %65, %65 %70 = shufflevector <4 x float> %69, <4 x float> undef, <8 x i32> <i32 0, i32 1, i32 2, i32 3, i32 0, i32 1, i32 2, i32 3> %71 = fdiv <8 x float> %shuffle32, %70 %72 = fadd <8 x float> %shuffle.1, %71 %73 = fmul <8 x float> %72, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %74 = bitcast i8* %66 to <8 x float>* store <8 x float> %73, <8 x float>* %74, align 8, !alias.scope !8, !noalias !12 %75 = extractelement <4 x float> %69, i32 0 %76 = extractelement <4 x float> %63, i32 0 %77 = getelementptr inbounds i8, i8* %26, i64 80 %78 = extractelement <4 x float> %69, i32 1 %79 = extractelement <4 x float> %63, i32 1 %80 = extractelement <4 x float> %69, i32 2 %81 = extractelement <4 x float> %63, i32 2 %82 = load float, float* %67, align 4, !alias.scope !7, !noalias !8 %83 = load float, float* %68, align 4, !invariant.load !0, !noalias !3 %84 = fmul float %83, %83 %85 = insertelement <4 x float> undef, float %75, i32 0 %86 = insertelement <4 x float> %85, float %78, i32 1 %87 = insertelement <4 x float> %86, float %80, i32 2 %88 = insertelement <4 x 
float> %87, float %84, i32 3 %89 = fdiv <4 x float> %37, %88 %90 = insertelement <4 x float> undef, float %76, i32 0 %91 = insertelement <4 x float> %90, float %79, i32 1 %92 = insertelement <4 x float> %91, float %81, i32 2 %93 = insertelement <4 x float> %92, float %82, i32 3 %94 = fadd <4 x float> %93, %89 %95 = fmul <4 x float> %94, <float -5.000000e-01, float -5.000000e-01, float -5.000000e-01, float -5.000000e-01> %96 = bitcast i8* %77 to <4 x float>* store <4 x float> %95, <4 x float>* %96, align 8, !alias.scope !8, !noalias !12 %97 = getelementptr inbounds i8, i8* %buffer_table, i64 3 %98 = bitcast i8** %97 to [1 x i8]* %99 = load [1 x i8], [1 x i8]* %98, align 8, !invariant.load !0, !dereferenceable !2, !align !2 %100 = getelementptr inbounds [1 x i8], [1 x i8]* %99, i64 0, i64 0 store i8* %26, i8** %100, align 8, !alias.scope !14, !noalias !8 ret void } ; Function Attrs: nounwind readnone speculatable willreturn declare <4 x float> @llvm.log.v4f32(<4 x float>) #1 attributes #0 = { nofree nounwind uwtable "no-frame-pointer-elim"="false" } attributes #1 = { nounwind readnone speculatable willreturn } !0 = !{} !1 = !{i64 32} !2 = !{i64 8} !3 = !{!4, !6} !4 = !{!"buffer: {index:0, offset:0, size:96}", !5} !5 = !{!"XLA global AA domain"} !6 = !{!"buffer: {index:5, offset:0, size:32}", !5} !7 = !{!6} !8 = !{!4} !9 = !{i64 4} !10 = !{i64 12} !11 = !{i64 96} !12 = !{!13, !6} !13 = !{!"buffer: {index:3, offset:0, size:8}", !5} !14 = !{!13} ``` This results in bad numerical answers when used through XLA. 
Again, it's not that easy to give a small fully-reproducible example, but the miscompare is: ``` Expected literal: ( f32[2,3,4] { { { nan, -inf, -3181.35, -inf }, { nan, -inf, -28.2577019, -inf }, { nan, -inf, -28.2577019, -inf } }, { { -inf, -inf, -inf, -inf }, { -6.60753046e+28, -1.47314833e+23, -inf, -inf }, { -2.43504347e+30, -5.42892693e+24, -inf, -inf } } } ) Actual literal: ( f32[2,3,4] { { { nan, -inf, -3181.35, -inf }, { nan, -inf, -inf, -inf }, { inf, -inf, -28.2577019, -inf } }, { { -inf, -inf, -inf, -inf }, { -6.60753046e+28, -1.47314833e+23, -inf, -inf }, { -2.43504347e+30, -5.42892693e+24, -inf, -inf } } } ) ``` Reviewers: sanjoy.google, sanjoy, ebrevnov, jdoerfert, reames, chandlerc Subscribers: hiraditya, Charusso, llvm-commits Tags: #llvm Differential Revision: https://reviews.llvm.org/D70516
*	[OpenCL] Fix address space for base method call (PR43145)	Sven van Haastregt	2019-11-21	2	-2/+35
	Clang was creating an UncheckedDerivedToBase ImplicitCastExpr that was also casting between address spaces. Insert an ImplicitCastExpr node for doing the address space conversion. Differential Revision: https://reviews.llvm.org/D69810
*	Atomics: support min/max orthogonally	Tim Northover	2019-11-21	9	-31/+193
	We seem to have been gradually growing support for atomic min/max operations (exposing longstanding IR atomicrmw instructions). But until now there have been gaps in the expected intrinsics. This adds support for the C11-style intrinsics (i.e. taking _Atomic, rather than individually blessed by C11 standard), and the variants that return the new value instead of the original one. That way, people won't be misled by trying one form and it not working, and the front-end is more friendly to people using _Atomic types, as we recommend.
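	The difference between the two families mentioned above (return the original value versus return the new value) can be sketched portably. This is only an illustration of the semantics using a compare-exchange loop on std::atomic; it is not the Clang intrinsics themselves, and the helper names fetch_min, fetch_max, min_fetch, max_fetch and demo_min_max are hypothetical.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: atomically store min(current, value) and
// return the ORIGINAL value that was in `target`.
template <typename T>
T fetch_min(std::atomic<T>& target, T value) {
    T observed = target.load(std::memory_order_relaxed);
    // On failure, compare_exchange_weak refreshes `observed`, so the
    // loop re-checks whether a store is still needed.
    while (value < observed &&
           !target.compare_exchange_weak(observed, value,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
    }
    return observed;
}

// Hypothetical helper: same as fetch_min, but for the maximum.
template <typename T>
T fetch_max(std::atomic<T>& target, T value) {
    T observed = target.load(std::memory_order_relaxed);
    while (observed < value &&
           !target.compare_exchange_weak(observed, value,
                                         std::memory_order_seq_cst,
                                         std::memory_order_relaxed)) {
    }
    return observed;
}

// Variants that return the NEW value left in `target` by this operation.
template <typename T>
T min_fetch(std::atomic<T>& target, T value) {
    T original = fetch_min(target, value);
    return value < original ? value : original;
}

template <typename T>
T max_fetch(std::atomic<T>& target, T value) {
    T original = fetch_max(target, value);
    return original < value ? value : original;
}

// Small usage walk-through.
inline void demo_min_max() {
    std::atomic<int> counter{10};
    assert(fetch_min(counter, 3) == 10);  // returns the original value
    assert(min_fetch(counter, 1) == 1);   // returns the new value
    assert(max_fetch(counter, 8) == 8);   // returns the new value
}
```

	Having both forms available means a caller does not need to recompute the minimum or maximum by hand after the atomic update.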
*	Revert "[Driver] Use VFS to check if sanitizer blacklists exist"	Ilya Biryukov	2019-11-21	9	-175/+23
	This reverts commit ba6f906854263375cff3257d22d241a8a259cf77. Commit caused compilation errors on llvm tests. Will fix and re-land.
*	[COFF] Widen PE32Header fields to fit 64 bit versions	Martin Storsjö	2019-11-21	2	-6/+8
	The PE32Header struct is only used by COFFYAML, for intermediate storage. The struct doesn't match the on-disk struct layout as it uses native integers instead of e.g. support::ulittle32_t, so just widen the fields to fit values for object::pe32plus_header, in addition to object::pe32_header. This avoids truncating the 64 bit ImageBase for 64 bit executables. Differential Revision: https://reviews.llvm.org/D70464
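	The truncation described above is easy to reproduce in isolation. The sketch below is an assumption-laden illustration, not the COFFYAML code: NarrowHeader, WideHeader, storeNarrow and storeWide are hypothetical stand-ins for an intermediate struct with a 32-bit versus a 64-bit ImageBase field, and 0x140000000 is used as a representative image base that does not fit in 32 bits.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an intermediate header with a 32-bit field.
struct NarrowHeader {
    std::uint32_t ImageBase;
};

// Hypothetical stand-in for the widened intermediate header.
struct WideHeader {
    std::uint64_t ImageBase;
};

// Store a 64-bit image base into the narrow struct and read it back.
// The explicit cast makes the loss of the upper 32 bits visible.
inline std::uint32_t storeNarrow(std::uint64_t imageBase) {
    NarrowHeader header{static_cast<std::uint32_t>(imageBase)};
    return header.ImageBase;
}

// Store the same value into the widened struct and read it back.
inline std::uint64_t storeWide(std::uint64_t imageBase) {
    WideHeader header{imageBase};
    return header.ImageBase;
}

inline void demo_image_base() {
    const std::uint64_t imageBase = 0x140000000ULL;
    assert(storeNarrow(imageBase) == 0x40000000u);  // upper bits lost
    assert(storeWide(imageBase) == imageBase);      // value preserved
}
```

	Because the struct is only intermediate storage and never mirrors the on-disk layout, widening the field costs nothing in compatibility while keeping 32-bit values representable as before.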