SSLHandshakeException with MOE 1.10

We are trying to upgrade our project to MOE 1.10, and have found an issue making HTTPS requests with Java’s URLConnection. Does someone know what could have changed in this release to cause this issue?

I’ve put together a repro using the Calculator demo: moe-samples-java/Calculator/common/src/main/java/org/moe/samples/calculator/common/CalcOperations.java at tls · mcosand-caltopo/moe-samples-java · GitHub
When you add two numbers, the app makes a request to https://google.com and returns the response code as the result of the sum. On exception, it prints the stack trace to the XCode log and displays -1.0.
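For readers who don't want to open the linked file, here is a minimal sketch of what such a repro could look like. It is not the code from the linked repository: the class name HttpsRepro, the method name responseCodeOrMinusOne, and the 10-second timeouts are all assumptions made for illustration.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical stand-in for the modified sum() in the repro; names are illustrative only.
class HttpsRepro {

    // Returns the HTTP status code as a double, or -1.0 if the request fails.
    static double responseCodeOrMinusOne(String address) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(address).openConnection();
            conn.setConnectTimeout(10_000); // assumed timeout, not from the thread
            conn.setReadTimeout(10_000);
            // For https:// URLs this call performs the TLS handshake.
            return conn.getResponseCode();
        } catch (IOException e) {
            // SSLHandshakeException is an IOException, so it ends up here.
            e.printStackTrace();
            return -1.0;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(responseCodeOrMinusOne("https://google.com"));
    }
}
```

A malformed address also takes the exception path and yields -1.0, which makes the failure branch easy to check without network access.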
The repro works on MOE 1.9, but returns the following stack on 1.10. Any ideas?

javax.net.ssl.SSLHandshakeException: java.lang.RuntimeException: error:1006706B:elliptic curve routines:ec_GFp_simple_oct2point:point is not on curve
	at com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:344)
	at com.android.okhttp.Connection.connectTls(Connection.java:235)
	at com.android.okhttp.Connection.connectSocket(Connection.java:199)
	at com.android.okhttp.Connection.connect(Connection.java:172)
	at com.android.okhttp.Connection.connectAndSetOwner(Connection.java:367)
	at com.android.okhttp.OkHttpClient$1.connectAndSetOwner(OkHttpClient.java:130)
	at com.android.okhttp.internal.http.HttpEngine.connect(HttpEngine.java:329)
	at com.android.okhttp.internal.http.HttpEngine.sendRequest(HttpEngine.java:246)
	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.execute(HttpURLConnectionImpl.java:442)
	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponse(HttpURLConnectionImpl.java:393)
	at com.android.okhttp.internal.huc.HttpURLConnectionImpl.getResponseCode(HttpURLConnectionImpl.java:506)
	at com.android.okhttp.internal.huc.DelegatingHttpsURLConnection.getResponseCode(DelegatingHttpsURLConnection.java:105)
	at com.android.okhttp.internal.huc.HttpsURLConnectionImpl.getResponseCode(HttpsURLConnectionImpl.java:25)
	at org.moe.samples.calculator.common.CalcOperations.sum(CalcOperations.java:41)
	at org.moe.samples.calculator.common.CalcOperations.calculate(CalcOperations.java:85)
	at org.moe.samples.calculator.common.CalculatorAdapter.calculateAndPrepare(CalculatorAdapter.java:258)
	at org.moe.samples.calculator.common.CalculatorAdapter.sendNewSymbol(CalculatorAdapter.java:166)
	at org.moe.samples.calculator.ios.ui.AppViewController.buttonEqPressed(AppViewController.java:176)
	at apple.uikit.c.UIKit.UIApplicationMain(Native Method)
	at org.moe.samples.calculator.ios.Main.main(Main.java:47)
	at java.lang.reflect.Method.invoke(Native Method)
	at org.moe.IOSLauncher.main(IOSLauncher.java:34)
Caused by: java.security.cert.CertificateException: java.lang.RuntimeException: error:1006706B:elliptic curve routines:ec_GFp_simple_oct2point:point is not on curve
	at com.android.org.conscrypt.OpenSSLSocketImpl.verifyCertificateChain(OpenSSLSocketImpl.java:593)
	at com.android.org.conscrypt.NativeCrypto.SSL_do_handshake(Native Method)
	at com.android.org.conscrypt.OpenSSLSocketImpl.startHandshake(OpenSSLSocketImpl.java:340)
	... 21 more
Caused by: java.lang.RuntimeException: error:1006706B:elliptic curve routines:ec_GFp_simple_oct2point:point is not on curve
	at com.android.org.conscrypt.NativeCrypto.X509_get_pubkey(Native Method)
	at com.android.org.conscrypt.OpenSSLX509Certificate.getPublicKey(OpenSSLX509Certificate.java:429)
	at com.android.org.conscrypt.ChainStrengthAnalyzer.checkKeyLength(ChainStrengthAnalyzer.java:52)
	at com.android.org.conscrypt.ChainStrengthAnalyzer.checkCert(ChainStrengthAnalyzer.java:47)
	at com.android.org.conscrypt.ChainStrengthAnalyzer.check(ChainStrengthAnalyzer.java:42)
	at com.android.org.conscrypt.TrustManagerImpl.checkTrusted(TrustManagerImpl.java:324)
	at com.android.org.conscrypt.TrustManagerImpl.checkServerTrusted(TrustManagerImpl.java:219)
	at com.android.org.conscrypt.Platform.checkServerTrusted(Platform.java:120)
	at com.android.org.conscrypt.OpenSSLSocketImpl.verifyCertificateChain(OpenSSLSocketImpl.java:572)
	... 23 more

Hi @mcosand ,

The issue is that 1.10.0 was built with a newer clang version, which somehow builds OpenSSL incorrectly. I haven’t figured out the cause yet, or what exactly breaks; compiler bugs are hard to track down.
In the meantime, does the server support TLSv1.2? If yes, you can hack around this by doing:

import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Wraps an existing SSLSocketFactory and restricts every socket it creates
// to a single cipher suite over TLSv1.2.
public class OverrideCipherSuiteSSLSocketFactory extends SSLSocketFactory {

    private static final String[] CIPHER_SUITES = {"TLS_DHE_RSA_WITH_AES_256_CBC_SHA"};
    private static final String[] PROTOCOLS = {"TLSv1.2"};

    private final SSLSocketFactory delegate;

    public OverrideCipherSuiteSSLSocketFactory(SSLSocketFactory delegate) {
        this.delegate = delegate;
    }

    // Applies the cipher suite and protocol restriction to a socket from the delegate.
    private static Socket configure(Socket socket) {
        SSLSocket sslSocket = (SSLSocket) socket;
        sslSocket.setEnabledCipherSuites(CIPHER_SUITES);
        sslSocket.setEnabledProtocols(PROTOCOLS);
        return socket;
    }

    @Override
    public String[] getDefaultCipherSuites() {
        return CIPHER_SUITES.clone();
    }

    @Override
    public String[] getSupportedCipherSuites() {
        return CIPHER_SUITES.clone();
    }

    @Override
    public Socket createSocket(String host, int port) throws IOException, UnknownHostException {
        return configure(this.delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(InetAddress host, int port) throws IOException {
        return configure(this.delegate.createSocket(host, port));
    }

    @Override
    public Socket createSocket(Socket socket, String host, int port, boolean autoClose)
            throws IOException {
        return configure(this.delegate.createSocket(socket, host, port, autoClose));
    }

    @Override
    public Socket createSocket(String host, int port, InetAddress localHost, int localPort)
            throws IOException, UnknownHostException {
        return configure(this.delegate.createSocket(host, port, localHost, localPort));
    }

    @Override
    public Socket createSocket(InetAddress address, int port, InetAddress localAddress,
            int localPort) throws IOException {
        return configure(this.delegate.createSocket(address, port, localAddress, localPort));
    }
}

and then you can do this on startup:

SSLSocketFactory preferredCipherSuiteSSLSocketFactory = new OverrideCipherSuiteSSLSocketFactory((SSLSocketFactory) SSLSocketFactory.getDefault());
HttpsURLConnection.setDefaultSSLSocketFactory(preferredCipherSuiteSSLSocketFactory);

or per connection:

// conn must be an HttpsURLConnection
SSLSocketFactory preferredCipherSuiteSSLSocketFactory = new OverrideCipherSuiteSSLSocketFactory((SSLSocketFactory) SSLSocketFactory.getDefault());
conn.setSSLSocketFactory(preferredCipherSuiteSSLSocketFactory);

Quick reply!
I couldn’t get the cipher you picked to work on my phone.
I iterated through all the supported ciphers of the default SSLSocketFactory, and was able to get a couple of RSA ciphers to connect to google.com. I didn’t find any ciphers that work with our primary host (an AWS application load balancer).
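A rough sketch of that kind of cipher sweep, in case it helps someone else hitting this. This is not the exact code I ran: the class name CipherProbe, the method names, and the 10-second socket timeout are assumptions for illustration. It enables one supported suite at a time on a fresh socket and records which ones complete a handshake.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

// Hypothetical helper: tries each supported cipher suite against one host.
class CipherProbe {

    // Returns true if a TLS handshake completes with only the given suite enabled.
    static boolean handshakeSucceeds(SSLSocketFactory factory, String host, int port, String suite) {
        try (SSLSocket socket = (SSLSocket) factory.createSocket(host, port)) {
            socket.setSoTimeout(10_000); // assumed timeout, not from the thread
            socket.setEnabledCipherSuites(new String[] {suite});
            socket.startHandshake();
            return true;
        } catch (IOException | RuntimeException e) {
            // Connection errors, handshake failures, and rejected suites all count as "not working".
            return false;
        }
    }

    // Collects every supported suite of the default factory that can connect to host:port.
    static List<String> workingSuites(String host, int port) {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        List<String> working = new ArrayList<>();
        for (String suite : factory.getSupportedCipherSuites()) {
            if (handshakeSucceeds(factory, host, port, suite)) {
                working.add(suite);
            }
        }
        return working;
    }
}
```

Calling CipherProbe.workingSuites("google.com", 443) would return the suites that complete a handshake with that host. A raw socket like this checks the certificate chain but does not perform hostname verification, so it is a diagnostic only.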

I’ve found a couple of random posts across the internet that discuss openssl, clang, and various compiler options that result in similar failures. I’ll see if I can get a local build running on my machine and figure out what levers and knobs I can adjust.

Are the build instructions at GitHub - multi-os-engine/multi-os-engine at moe-master still close to correct?

@mcosand
I have just updated them real quick! But other README files could still be out of date.

The xcodeproject/build of the OpenSSL project that is breaking is the one under: moe-core/moe.apple/moe.core.native/android.external.openssl
The source code is under aosp/external/openssl

Let me know if you have any other questions!

New instructions got me past my last error.
I’m not sure yet what the process is for running a build and including it as part of my project.

I’m trying to build the SDK with ./gradlew :tools:moe-sdk:devsdk. Besides the documented prereqs, I also needed a JDK8 install, which I ended up getting with brew install openjdk@8 under Rosetta. My current error:

> Task :prebuilts:external:libffi:prebuild_macos FAILED
Full rror log available at /Users/mcosand/code/repos/r/moe/prebuilts/external/libffi/build/macos-build.log
--------- COMMAND LOG START ---------


COMMAND >>> [hdiutil, attach, -nomount, ram://262144]
/dev/disk28                                             


COMMAND >>> [diskutil, erasevolume, HFS+, build-20241011-110617, /dev/disk28]
Started erase on disk28
Unmounting disk
Erasing
Initialized /dev/rdisk28 as a 128 MB case-insensitive HFS Plus volume
Mounting disk
Finished erase on disk28 (build-20241011-110617)


COMMAND >>> [rsync, -r, --exclude=.git, /Users/mcosand/code/repos/r/external/libffi/, /Volumes/build-20241011-110617/]


COMMAND >>> [bash, moe-prebuild-macos.sh]
MOE_PREBUILTS_DIR=/Users/mcosand/code/repos/r/moe/prebuilts
MOE_PREBUILTS_TARGET_DIR=external/libffi/build/macos

<snip>

checking xargs -n works... yes
checking for arm-apple-darwin11-gcc... xcrun -sdk iphoneos clang -arch armv7
checking whether the C compiler works... no
configure: error: in '/Volumes/build-20241011-110617/build_iphoneos-armv7':
configure: error: C compiler cannot create executables
See 'config.log' for more details
Traceback (most recent call last):
  File "/Volumes/build-20241011-110617/generate-darwin-source-and-headers.py", line 244, in <module>
    generate_source_and_headers(generate_osx=not args.disable_osx, generate_ios=not args.disable_ios, generate_tvos=False)
  File "/Volumes/build-20241011-110617/generate-darwin-source-and-headers.py", line 221, in generate_source_and_headers
    build_target(ios_device_platform, platform_headers)
  File "/Volumes/build-20241011-110617/generate-darwin-source-and-headers.py", line 185, in build_target
    subprocess.check_call(['../configure', '-host', platform.triple], env=env)
  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.9/lib/python3.9/subprocess.py", line 373, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['../configure', '-host', 'arm-apple-darwin11']' returned non-zero exit status 77.


COMMAND >>> [diskutil, unmountDisk, /dev/disk28]
Unmount of all volumes on disk28 was successful

--------- COMMAND LOG END ---------

FAILURE: Build failed with an exception.

I’m running on an M1 machine, and updated yesterday to MacOS 15.0.1, XCode 16.0, and clang 16.0.0.

@mcosand
clang 16 broke things again (basically, every new major clang version breaks something).
I pushed a fix that should address this:

If you build a devsdk, you can point your project to it with the moe.sdk.localbuild gradle property.
Otherwise, if you publish to mavenLocal, then you just need to specify your own custom version number in your build.gradle file.

progress?

  • I can build the SDK

  • I can get the calculator demo to run with my SDK (a System.out.println call in aosp/external/okhttp/okhttp-urlconnection/src/main/java/com/squareup/okhttp/internal/huc/DelegatingHttpsURLConnection.java prints in the calculator console).

  • If I put garbage in one of the aosp/external/openssl/ files, the SDK gradle task fails. Maybe this suggests the build is not using a cached artifact?

Now I’m at a point where the SSLHandshakeException stack trace includes a reference to /Users/runner/work/moe-gha/moe-gha/moe-ci/aosp/external/openssl/crypto/ec/ecp_oct.c:421. This path is not on my machine, and I have line 421 commented out in my aosp/external/openssl version of the file, so I suspect the app is using a different version of the library.

Any more pointers on where this code path is coming from?

@mcosand
That is very odd.
The android openssl is linked into the MOE framework in moe-core/moe.apple/moe.core.native/moe.sdk.
This framework then gets placed under sdk/iphoneos/ in the SDK.

While building, the SDK gets symlinked to build/moe/sdk from where XCode picks it up.

So my main assumption is that XCode incorrectly links/includes the framework.

I would recommend removing the build/moe/ folder and cleaning the XCode build folder under “Product → Clean Build Folder”.
I hope this resolves the caching issue!

If not, you can manually compare checksums of the framework at each step along that path to see where the wrong one is picked up.
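The checksum comparison can be done with any hashing tool. As one possible sketch in Java (the class name FrameworkChecksum is made up for illustration, and the files to pass in are whatever framework binaries you find at each location), this prints the SHA-256 of each file along with its resolved real path, which also reveals where a symlink actually points:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for comparing copies of a binary by SHA-256.
class FrameworkChecksum {

    // Returns the lowercase hex SHA-256 digest of the given bytes.
    static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is not available", e);
        }
    }

    // Usage: java FrameworkChecksum <file> [<file> ...]
    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            Path real = Paths.get(arg).toRealPath(); // follows symlinks
            System.out.println(sha256Hex(Files.readAllBytes(real)) + "  " + real);
        }
    }
}
```

Run it against the framework binary in the SDK build output, in build/moe/sdk, and in the built app. The first location whose hash or resolved path differs from the freshly built SDK is where the stale copy comes from.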

I found the calculator app had a symlink build/moe/sdk -> /Users/me/.moe/moe-sdk-1.10.0. I manually re-linked it to <repo>/moe/tools/moe.sdk.publisher/build/dev-sdk, and now it looks like the calculator app is running my openssl changes.

I can now create a build that successfully opens an HTTPS connection to our AWS application load balancer.

Trying to reduce my change set, I’m finding that I can get a good build with no source changes in openssl. If I add the following change, run ./gradlew :tools:moe-sdk:devsdk, and rebuild the app, I can get it to fail. Removing the change and rebuilding the SDK and app works again.

diff --git a/crypto/bn/bn_nist.c b/crypto/bn/bn_nist.c
index abb1570..e8c9ba0 100644
--- a/crypto/bn/bn_nist.c
+++ b/crypto/bn/bn_nist.c
@@ -294,7 +294,7 @@ static void nist_cp_bn_0(BN_ULONG *dst, const BN_ULONG *src, int top, int max)
        OPENSSL_assert(top <= max);
 #endif
        for (i = 0; i < top; i++)
-               dst[i] = src[i];
+               dst[i] = src[i] + 1;
        for (; i < max; i++)
                dst[i] = 0;
        }

I’m going to keep checking my changes, but it seems that the updates to work under clang 16 (and other Mac/XCode updates?) might be sufficient.

@mcosand
It might be possible that clang 15 introduced a bug that got fixed in clang 16?
The CI currently runs on XCode 15: moe-gha/.github/workflows/gradle-publish.yml at b9bb93d248f95258f263b873bb1fd7c7e47cf5e7 · Berstanio/moe-gha · GitHub

If you want I can update the CI to XCode 16 and make a snapshot build to test, just let me know.

We would love to be able to stay on an official release instead of forking MOE. If it’s not a hassle to upgrade the CI build to clang 16, we’d be happy to give it a shake down.

… still trying to figure out caching in the build process. Let me do a few more things on my end before cutting a new snapshot build.

Okay, I am now more confident that I can do a good build without code changes in the MOE SDK. I would like to see how a clang 16 CI build would work.

@mcosand
I published a new MOE 1.10.1-SNAPSHOT snapshot that was built with clang 16.
The snapshot is hosted at https://oss.sonatype.org/content/repositories/snapshots