SSLHandshakeException with MOE 1.10

We are trying to upgrade our project to MOE 1.10, and have found an issue making HTTPS requests with Java’s URLConnection. Does someone know what could have changed in this release to cause this issue?

I’ve put together a repro using the Calculator demo: moe-samples-java/Calculator/common/src/main/java/org/moe/samples/calculator/common/CalcOperations.java at tls · mcosand-caltopo/moe-samples-java · GitHub
When you add two numbers, the app makes a request to https://google.com and returns the response code as the result of the sum. On exception, it prints the stack trace to the Xcode log and displays -1.0.
The repro works on MOE 1.9, but produces the following stack trace on 1.10. Any ideas?

javax.net.ssl.SSLHandshakeException: java.lang.RuntimeException: error:1006706B:elliptic curve routines:ec_GFp_simple_oct2point:point is not on curve
	at com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:344)
	at com.android.okhttp.Connection.connectTls(Connection.java:235)
	at com.android.okhttp.Connection.connectSocket(Connection.java:199)
	at com.android.okhttp.Connection.connect(Connection.java:172)
	at com.android.okhttp.Connection.connectAndSetOwner(Connection.java:367)
	at com.android.okhttp.OkHttpClient$1.connectAndSetOwner(OkHttpClient.java:130)
	at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:329)
	at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:246)
	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:442)
	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:393)
	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:506)
	at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105)
	at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25)
	at org.moe.samples.calculator.common.CalcOperations.sum(CalcOperations.java:41)
	at org.moe.samples.calculator.common.CalcOperations.calculate(CalcOperations.java:85)
	at org.moe.samples.calculator.common.CalculatorAdapter.calculateAndPrepare(CalculatorAdapter.java:258)
	at org.moe.samples.calculator.common.CalculatorAdapter.sendNewSymbol(CalculatorAdapter.java:166)
	at org.moe.samples.calculator.ios.ui.AppViewController.buttonEqPressed(AppViewController.java:176)
	at apple.uikit.c.UIKit.UIApplicationMain(Native Method)
	at org.moe.samples.calculator.ios.Main.main(Main.java:47)
	at java.lang.reflect.Method.invoke(Native Method)
	at org.moe.IOSLauncher.main(IOSLauncher.java:34)
Caused by: java.security.cert.CertificateException: java.lang.RuntimeException: error:1006706B:elliptic curve routines:ec_GFp_simple_oct2point:point is not on curve
	at com.android.org.conscrypt.OpenSSLSocketImpl.verifyCertificateChain(OpenSSLSocketImpl.java:593)
	at com.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
	at com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:340)
	... 21 more
Caused by: java.lang.RuntimeException: error:1006706B:elliptic curve routines:ec_GFp_simple_oct2point:point is not on curve
	at com.android.org.conscrypt.NativeCrypto.X509_get_pubkey(Native Method)
	at com.android.org.conscrypt.OpenSSLX509Certificate.getPublicKey(OpenSSLX509Certificate.java:429)
	at com.android.org.conscrypt.ChainStrengthAnalyzer.checkKeyLength(ChainStrengthAnalyzer.java:52)
	at com.android.org.conscrypt.ChainStrengthAnalyzer.checkCert(ChainStrengthAnalyzer.java:47)
	at com.android.org.conscrypt.ChainStrengthAnalyzer.check(ChainStrengthAnalyzer.java:42)
	at com.android.org.conscrypt.TrustManagerImpl.checkTrusted(TrustManagerImpl.java:324)
	at com.android.org.conscrypt.TrustManagerImpl.checkServerTrusted(TrustManagerImpl.java:219)
	at com.android.org.conscrypt.Platform.checkServerTrusted(Platform.java:120)
	at com.android.org.conscrypt.OpenSSLSocketImpl.verifyCertificateChain(OpenSSLSocketImpl.java:572)
	... 23 more
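
For readers without access to the repro branch, the modified sum method boils down to roughly the following. This is a minimal sketch, not the actual code from the repository: the class name HttpsRepro, the method name responseCodeOrMinusOne, and the timeouts are assumptions made for illustration.

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class HttpsRepro {

    // Hypothetical helper: returns the HTTP response code for the URL,
    // or -1.0 if anything goes wrong (the stack trace is printed first).
    public static double responseCodeOrMinusOne(String url) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(10000); // assumed timeout, in milliseconds
            conn.setReadTimeout(10000);    // assumed timeout, in milliseconds
            // For https URLs this triggers the TLS handshake.
            return conn.getResponseCode();
        } catch (Exception e) {
            e.printStackTrace();
            return -1.0;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(responseCodeOrMinusOne("https://google.com"));
    }
}
```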

Hi @mcosand ,

The issue is that 1.10.0 was built with a newer clang version, which somehow builds OpenSSL incorrectly. I haven’t figured out the cause yet, or what exactly breaks; compiler bugs are hard to track down.
In the meantime, does the server support TLSv1.2? If so, you can hack around this by doing:

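import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Delegating factory that restricts every socket to one cipher suite and TLSv1.2.
public class OverrideCipherSuiteSSLSocketFactory extends SSLSocketFactory {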

    private final SSLSocketFactory delegate;

    public OverrideCipherSuiteSSLSocketFactory(SSLSocketFactory delegate) {
        this.delegate = delegate;
    }

    @Override
    public String[] getDefaultCipherSuites() {

        return new String[]{"TLS_DHE_RSA_WITH_AES_256_CBC_SHA"};
    }

    @Override
    public String[] getSupportedCipherSuites() {
        return new String[]{"TLS_DHE_RSA_WITH_AES_256_CBC_SHA"};
    }

    @Override
    public Socket createSocket(String arg0, int arg1) throws IOException, UnknownHostException {

        Socket socket = this.delegate.createSocket(arg0, arg1);
        ((SSLSocket)socket).setEnabledCipherSuites(new String[]{"TLS_DHE_RSA_WITH_AES_256_CBC_SHA"});
        ((SSLSocket)socket).setEnabledProtocols(new String[] { "TLSv1.2" });

        return socket;
    }

    @Override
    public Socket createSocket(InetAddress arg0, int arg1) throws IOException {

        Socket socket = this.delegate.createSocket(arg0, arg1);
        ((SSLSocket)socket).setEnabledCipherSuites(new String[]{"TLS_DHE_RSA_WITH_AES_256_CBC_SHA"});
        ((SSLSocket)socket).setEnabledProtocols(new String[] { "TLSv1.2" });
        return socket;
    }

    @Override
    public Socket createSocket(Socket arg0, String arg1, int arg2, boolean arg3)
            throws IOException {

        Socket socket = this.delegate.createSocket(arg0, arg1, arg2, arg3);
        ((SSLSocket)socket).setEnabledCipherSuites(new String[]{"TLS_DHE_RSA_WITH_AES_256_CBC_SHA"});
        ((SSLSocket)socket).setEnabledProtocols(new String[] { "TLSv1.2" });
        return socket;
    }

    @Override
    public Socket createSocket(String arg0, int arg1, InetAddress arg2, int arg3)
            throws IOException, UnknownHostException {

        Socket socket = this.delegate.createSocket(arg0, arg1, arg2, arg3);
        ((SSLSocket)socket).setEnabledCipherSuites(new String[]{"TLS_DHE_RSA_WITH_AES_256_CBC_SHA"});
        ((SSLSocket)socket).setEnabledProtocols(new String[] { "TLSv1.2" });
        return socket;
    }

    @Override
    public Socket createSocket(InetAddress arg0, int arg1, InetAddress arg2,
            int arg3) throws IOException {

        Socket socket = this.delegate.createSocket(arg0, arg1, arg2, arg3);
        ((SSLSocket)socket).setEnabledCipherSuites(new String[]{"TLS_DHE_RSA_WITH_AES_256_CBC_SHA"});
        ((SSLSocket)socket).setEnabledProtocols(new String[] { "TLSv1.2" });
        return socket;
    }
}

and then you can do this on startup:

SSLSocketFactory preferredCipherSuiteSSLSocketFactory = new OverrideCipherSuiteSSLSocketFactory((SSLSocketFactory)SSLSocketFactory.getDefault());
HttpsURLConnection.setDefaultSSLSocketFactory(preferredCipherSuiteSSLSocketFactory);

or per connection

SSLSocketFactory preferredCipherSuiteSSLSocketFactory = new OverrideCipherSuiteSSLSocketFactory((SSLSocketFactory)SSLSocketFactory.getDefault());
conn.setSSLSocketFactory(preferredCipherSuiteSSLSocketFactory);
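
If you need to try different cipher suites or protocols, the same delegation idea can be written with the values passed in rather than hard-coded. The following is only a sketch under that assumption: the class name ConfigurableSSLSocketFactory and its constructor parameters are hypothetical and not part of MOE.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical variant of the workaround: suites and protocols are parameters.
public class ConfigurableSSLSocketFactory extends SSLSocketFactory {

    private final SSLSocketFactory delegate;
    private final String[] cipherSuites;
    private final String[] protocols;

    public ConfigurableSSLSocketFactory(SSLSocketFactory delegate,
                                        String[] cipherSuites,
                                        String[] protocols) {
        this.delegate = delegate;
        this.cipherSuites = cipherSuites.clone();
        this.protocols = protocols.clone();
    }

    // Applies the restricted protocols and suites to a socket from the delegate.
    private Socket configure(Socket socket) {
        if (socket instanceof SSLSocket) {
            SSLSocket ssl = (SSLSocket) socket;
            ssl.setEnabledProtocols(protocols);
            ssl.setEnabledCipherSuites(cipherSuites);
        }
        return socket;
    }

    @Override
    public String[] getDefaultCipherSuites() {
        return cipherSuites.clone();
    }

    @Override
    public String[] getSupportedCipherSuites() {
        return cipherSuites.clone();
    }

    @Override
    public Socket createSocket() throws IOException {
        return configure(delegate.createSocket());
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException {
        return configure(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return configure(delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(Socket s, String host, int port, boolean autoClose)
            throws IOException {
        return configure(delegate.createSocket(s, host, port, autoClose));
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException {
        return configure(delegate.createSocket(host, port, localHost, localPort));
    }

    @Override
    public Socket createSocket(InetAddress address, int port,
                               InetAddress localAddress, int localPort) throws IOException {
        return configure(delegate.createSocket(address, port, localAddress, localPort));
    }
}
```

It is installed the same way as above, for example by passing new ConfigurableSSLSocketFactory((SSLSocketFactory) SSLSocketFactory.getDefault(), new String[] { "TLS_DHE_RSA_WITH_AES_256_CBC_SHA" }, new String[] { "TLSv1.2" }) to HttpsURLConnection.setDefaultSSLSocketFactory.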

Quick reply!
I couldn’t get the cipher you picked to work on my phone.
I iterated through all the supported ciphers of the default SSLSocketFactory, and was able to get a couple of RSA ciphers to connect to google.com. I didn’t find any ciphers that work with our primary host (an AWS application load balancer).
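
The iteration was along these lines. This is a rough sketch rather than the exact code I ran: the class name CipherSuiteProbe, the helper names, and the 5 second timeout are made up for illustration. It enables one supported suite at a time and records which ones complete a handshake.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class CipherSuiteProbe {

    // Returns true if a TLS handshake to host:port completes with only
    // the given cipher suite enabled; any failure is logged and reported as false.
    public static boolean handshakeSucceeds(SSLSocketFactory factory,
                                            String host, int port, String suite) {
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.setSoTimeout(5000); // assumed read timeout, in milliseconds
            socket.setEnabledCipherSuites(new String[] { suite });
            socket.startHandshake();
            return true;
        } catch (IOException | RuntimeException e) {
            System.out.println(suite + " failed: " + e);
            return false;
        }
    }

    // Tries every suite the factory claims to support and collects the ones that work.
    public static List<String> workingSuites(SSLSocketFactory factory, String host, int port) {
        List<String> working = new ArrayList<>();
        for (String suite : factory.getSupportedCipherSuites()) {
            if (handshakeSucceeds(factory, host, port, suite)) {
                working.add(suite);
            }
        }
        return working;
    }

    public static void main(String[] args) {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        String host = args.length > 0 ? args[0] : "google.com";
        System.out.println(workingSuites(factory, host, 443));
    }
}
```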

I’ve found a couple of random posts across the internet that discuss openssl, clang, and various compiler options that result in similar failures. I’ll see if I can get a local build running on my machine and figure out what levels and knobs I can adjust.

Are the build instructions at GitHub - multi-os-engine/multi-os-engine at moe-master still close to correct?

@mcosand
I have just updated them real quick! But other README files could still be out of date.

The xcodeproject/build of the OpenSSL project that is breaking is the one under: moe-core/moe.apple/moe.core.native/android.external.openssl
The source code is under aosp/external/openssl

Let me know if you have any other questions!

New instructions got me past my last error.
I’m not sure yet what the process is for running a build and including it in my project.

I’m trying to build the SDK with ./gradlew :tools:moe-sdk:devsdk. Besides the documented prereqs, I also needed a JDK8 install, which I ended up getting with brew install openjdk@8 under Rosetta. My current error:

> Task :prebuilts:external:libffi:prebuild_macos FAILED
Full rror log available at /Users/mcosand/code/repos/r/moe/prebuilts/external/libffi/build/macos-build.log
--------- COMMAND LOG START ---------


COMMAND >>> [hdiutil, attach, -nomount, ram://262144]
/dev/disk28                                             


COMMAND >>> [diskutil, erasevolume, HFS+, build-20241011-110617, /dev/disk28]
Started erase on disk28
Unmounting disk
Erasing
Initialized /dev/rdisk28 as a 128 MB case-insensitive HFS Plus volume
Mounting disk
Finished erase on disk28 (build-20241011-110617)


COMMAND >>> [rsync, -r, --exclude=.git, /Users/mcosand/code/repos/r/external/libffi/, /Volumes/build-20241011-110617/]


COMMAND >>> [bash, moe-prebuild-macos.sh]
MOE_PREBUILTS_DIR=/Users/mcosand/code/repos/r/moe/prebuilts
MOE_PREBUILTS_TARGET_DIR=external/libffi/build/macos

<snip>

checking xargs -n works... yes
checking for arm-apple-darwin11-gcc... xcrun -sdk iphoneos clang -arch armv7
checking whether the C compiler works... no
configure: error: in '/Volumes/build-20241011-110617/build_iphoneos-armv7':
configure: error: C compiler cannot create executables
See 'config.log' for more details
Traceback (most recent call last):
  File "/Volumes/build-20241011-110617/generate-darwin-source-and-headers.py", line 244, in <module>
    generate_source_and_headers(generate_osx=not args.disable_osx, generate_ios=not args.disable_ios, generate_tvos=False)
  File "/Volumes/build-20241011-110617/generate-darwin-source-and-headers.py", line 221, in generate_source_and_headers
    build_target(ios_device_platform, platform_headers)
  File "/Volumes/build-20241011-110617/generate-darwin-source-and-headers.py", line 185, in build_target
    subprocess.check_call(['../configure', '-host', platform.triple], env=env)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../configure', '-host', 'arm-apple-darwin11']' returned non-zero exit status 77.


COMMAND >>> [diskutil, unmountDisk, /dev/disk28]
Unmount of all volumes on disk28 was successful

--------- COMMAND LOG END ---------

FAILURE: Build failed with an exception.

I’m running on an M1 machine, and updated yesterday to macOS 15.0.1, Xcode 16.0, and clang 16.0.0.

@mcosand
clang 16 broke things again (basically, every new major clang version breaks something).
I pushed a fix that should address this.

If you build a devsdk, you can point your project to it with the moe.sdk.localbuild gradle property.
Otherwise, if you publish to mavenLocal, then you just need to specify your own custom version number in your build.gradle file.

progress?

  • I can build the SDK

  • I can get the calculator demo to run with my SDK (a System.out.println call in aosp/external/okhttp/okhttp-urlconnection/src/main/java/com/squareup/okhttp/internal/huc/DelegatingHttpsURLConnection.java prints in the calculator console).

  • If I put garbage in one of the aosp/external/openssl/ files, the SDK gradle task fails. Maybe this suggests the build is not using a cached artifact?

Now I’m at a point where the SSLHandshakeException stack trace includes a reference to /Users/runner/work/moe-gha/moe-gha/moe-ci/aosp/external/openssl/crypto/ec/ecp_oct.c:421. This path is not on my machine, and I have line 421 commented out in my aosp/external/openssl version of the file, so I suspect the app is using a different version of the library.

Any more pointers on where this code path is coming from?

@mcosand
That is very odd.
The Android OpenSSL is linked into the MOE framework in moe-core/moe.apple/moe.core.native/moe.sdk.
This framework then gets placed under sdk/iphoneos/ in the SDK.

While building, the SDK gets symlinked to build/moe/sdk, from where Xcode picks it up.

So my main assumption is that Xcode incorrectly links/includes the framework.

I would recommend removing the build/moe/ folder and cleaning the Xcode build folder under “Product → Clean Build Folder”.
I hope this resolves the caching issue!

If not, you can try to manually check all checksums along the path of the framework to see where the wrong one is picked up.
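
A small helper for that comparison could look like the sketch below. It is only an illustration: the class name FrameworkChecksum is made up, and since the exact location of the framework binary depends on your setup, the files to compare are passed as arguments. It resolves symlinks first, so a link such as build/moe/sdk shows where it really points.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class FrameworkChecksum {

    // Returns the SHA-256 digest of the file as a lowercase hex string.
    // Reads the whole file into memory, which is fine for a one-off check.
    public static String sha256Hex(Path file) throws IOException, NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(Files.readAllBytes(file));
        StringBuilder hex = new StringBuilder();
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        // Each argument is a path to a copy of the same binary at a different
        // stage (local SDK build, installed SDK, app bundle).
        for (String arg : args) {
            Path real = Paths.get(arg).toRealPath(); // follows symlinks
            System.out.println(sha256Hex(real) + "  " + arg + " -> " + real);
        }
    }
}
```

Copies that print different digests were built from different inputs, which narrows down where the stale framework comes from.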

I found the calculator app had a symlink build/moe/sdk -> /Users/me/.moe/moe-sdk-1.10.0. I manually re-linked it to <repo>/moe/tools/moe.sdk.publisher/build/dev-sdk, and now it looks like the calculator app is running my openssl changes.

I am able to create a build that is able to create an HTTPS connection to our AWS application load balancer.

Trying to reduce my change set, I’m finding that I can get a good build with no source changes in openssl. If I add the following change, run ./gradlew :tools:moe-sdk:devsdk, and rebuild the app, I can get it to fail. Removing the change and rebuilding the SDK and app works again.

diff --git a/crypto/bn/bn_nist.c b/crypto/bn/bn_nist.c
index abb1570..e8c9ba0 100644
--- a/crypto/bn/bn_nist.c
+++ b/crypto/bn/bn_nist.c
@@ -294,7 +294,7 @@ static void nist_cp_bn_0(BN_ULONG *dst, const BN_ULONG *src, int top, int max)
        OPENSSL_assert(top <= max);
 #endif
        for (i = 0; i < top; i++)
-               dst[i] = src[i];
+               dst[i] = src[i] + 1;
        for (; i < max; i++)
                dst[i] = 0;
        }

I’m going to keep checking my changes, but it seems that the updates to work under clang 16 (and other Mac/XCode updates?) might be sufficient.

@mcosand
Is it possible that clang 15 introduced a bug that got fixed in clang 16?
The CI currently runs on XCode 15: moe-gha/.github/workflows/gradle-publish.yml at b9bb93d248f95258f263b873bb1fd7c7e47cf5e7 · Berstanio/moe-gha · GitHub

If you want I can update the CI to XCode 16 and make a snapshot build to test, just let me know.

We would love to be able to stay on an official release instead of forking MOE. If it’s not a hassle to upgrade the CI build to clang 16, we’d be happy to give it a shake down.

… still trying to figure out caching in the build process. Let me do a few more things on my end before cutting a new snapshot build.

Okay. I am more confident now that I can do a good build without code changes in the MOE SDK. I would like to see how a clang 16 CI build would work.

@mcosand
I published a new MOE 1.10.1-SNAPSHOT build that was built with clang 16.
The snapshot is hosted at https://oss.sonatype.org/content/repositories/snapshots

Well, now things get interesting.

I have updated the calculator app to use 1.10.1-SNAPSHOT, and see the same SSL error as before.

It seems that I have corrupted something in my production app build. I don’t remember the last time I was able to build successfully, but I believe it was after I upgraded to macOS 15.0.1 and Xcode 16.0. Now, regardless of which version of MOE I try to use (1.8.2, 1.10.1-SNAPSHOT, my local 1.10.0 build), I’m able to build the app but get a runtime error like:

*** Assertion failure in -[UIApplication _runWithMainScene:transitionContext:completion:], UIApplication.m:4735
*** Terminating app due to uncaught exception 'NSInternalInconsistencyException', reason: 'Application windows are expected to have a root view controller at the end of application launch'
*** First throw call stack:
(0x18f9ecf20 0x1878a6018 0x18eef3868 0x191f7c9dc 0x191f2fa0c 0x191e227c4 0x191ff52ac 0x191ff4fbc 0x191d8441c 0x191d51650 0x191d50f78 0x191d50924 0x191d4ff64 0x191e23d3c 0x191e22574 0x191e21ecc 0x191eece7c 0x191ee99e0 0x191ee9600 0x1a8669974 0x1a8669808 0x1a86656cc 0x1a8669cc4 0x103e527bc 0x103e561f0 0x1a8666d58 0x1a8666cd8 0x1a8666bb0 0x18f9bf834 0x18f9bf7c8 0x18f9bd298 0x18f9bc484 0x18f9bbcd8 0x1d486c1a8 0x191ff490c 0x1920a89d0 0x104a8bc38 0x10982e464 0x1b306de4c)
libc++abi: terminating due to uncaught exception of type NSException

Adding some print statements to my local build, it seems that in MOE.mm::run_moevm, get_oat_data is returning null.
I’ve tried cleaning the Xcode build folder, manually removing <ios>/build, and clearing ~/.gradle/cache and ~/Library/Developer/Xcode/DerivedData, with no success.
I do see a 122MB <ios>/build/moe/main/dex2oat/debug-arm64/application.oat, but am getting lost figuring out how that gets packaged into the app and why it might not be found by get_oat_data.

Whenever you switch versions, make sure to run the moeUpdateXcodeSettings task. This should (hopefully) fix this.
As background info, the oat data was moved into the __TEXT section in MOE 1.10.0, and moeUpdateXcodeSettings updates the Xcode settings so that the oat data is linked properly.

I have also had some reports of a similar issue when the iOS 18 SDK is installed, but I’m not sure whether it is the same issue here. I also don’t know yet what causes it.

That is unfortunate to hear. It is weird that it works with a local build.
I’m currently a bit busy to deep dive into the OpenSSL issue myself, sadly.

Just found that setting ENABLE_DEBUG_DYLIB=NO allows me to build the production app again on iOS 18, and setting it to YES fails again.
I don’t know the implications of this setting yet. I got the suggestion from this thread: https://forums.developer.apple.com/forums/thread/760543

Now I have our production app running against my local SDK, and have the original SSL error. I will see if I can get SSL traffic working again in this configuration.

I have our production app working now.

Thank you very much @mcosand for the investigation!
I will backport the suggested patch into MOE.

If I have some time, I will also dig a bit into ENABLE_DEBUG_DYLIB=NO.
But from the source you provided, I assume that changing the method to just retrieve the header pointer using
extern const struct mach_header_64 __dso_handle; instead might solve this. I think there is also no point in iterating over the images in current MOE, but I might be mistaken on that.

Anyway, thanks a lot again!