Wednesday, April 12, 2017

Lessons learned Docker microservices architecture with Spring Boot

Introduction

During my last project consisting of a Docker microservices architecture, built with Spring Boot, using RabbitMQ as communication channel, I learned a bunch of lessons, here's a summary of them.

Architecture

Below is a high level overview of the architecture that was used.


Docker

  • Run 1 process/service/application per docker container (or put stuff in init.d but that's not intended use of docker)

  • Starting background processes in the CMD cause container to exit. So either have a script waiting at the end (e.g tail -f /dev/null) or keep the process (i.e the one prefixed with CMD) running in the foreground. Other useful Dockerfile tips you can find here

  • As far as I can tell Docker checks if Dockerfile has changed, and if so, creates a new image instance (diffs only?)

  • Basic example to start RabbitMq docker image, as used in the build tool:

    $ docker pull 172.18.19.20/project/rabbitmq:latest
    docker rm -f build-server-rabbitmq
    $ # Map the RabbitMQ regular and console ports
    $ docker run -d -p 5672:5672 -p 15672:15672 --name build-server-rabbitmq 172.18.19.20/rabbitmq:latest

  • If there's no docker0 interface (check by running command ifconfig) then probably there are ^M characters in the config file at /etc/default/docker/docker.config. To fix it, perform a dos2unix on that file.

  • Check for errors at startup of docker in /var/log/upstart/docker.log

  • If your docker push <image> asks for a login (and you don't expect that) or it returns some weird html like "</html>" then you're probably missing the host in front of the image name, e.g: 172.18.19.20:6000/projectname/some-service:latest

  • Stuff like /var/log/messages is not visible in a Docker container, but is in its host! So look there for example to find out why a process is not starting/gets killed at startup without any useful logging (like we had with clamd)

  • How to remove old dangling unused docker images: docker rmi $(docker images --filter "dangling=true" -q --no-trunc)

Spring Boot

  • Some jackson2 dependencies were missing from the generated Spring Initializr project, noticed when creating unittests. These dependencies were additionally needed in scope test:

    <dependency>
      <groupid>com.fasterxml.jackson.core</groupid>
      <artifactid>jackson-databind</artifactid>
      <version>2.5.0</version></dependency>
    <dependency>
      <groupid>com.fasterxml.jackson.core</groupid>
      <artifactid>jackson-annotations</artifactid>
      <version>2.5.0</version></dependency>
    <dependency>
      <groupid>com.fasterxml.jackson.core</groupid>
      <artifactid>jackson-core</artifactid>
      <version>2.5.0</version></dependency>


    Not sure anymore why these then didn't get <scope>test</scope> then... Guess it was needed also in some regular code... :)

  • In the Spring Boot AMQP Quick Start the last param has name .with(queueName) during binding, but that's the topic key! (which is related to the binding key used at sending), so not the queue name.

  • Spring Boot Actuator's /health will check all related dependencies! So if you have a dependency in your pom.xml to a project which uses spring-boot-starter-amqp, /health will check now for an AMQP queue being up! So add a for those if you don't want that.

  • Spring Boot's default AppAplicationIT probably needs a @DirtiesContext for your tests, otherwise the tests might re-use or create more beans than you think (we saw that in our message receiver tests helper class).

  • @Transactional in Spring: by default only for unchecked exceptions!! It's documented but still a thing to watch out for.

  • And of course: Spring's @Transactional does not work on private methods (due to proxy stuff it creates)

  • To see in Spring Boot the transaction logging, put this in application.properties:

    logging.level.org.springframework.jdbc=TRACE
    logging.level.org.springframework.transaction=TRACE


    Note that by default @Transactional just rolls back, it does not log anything, so if you don't log your runtime exceptions, you won't see much in your logs.

  • mockMvc from spring is not really invoking from "outside", our spring sec context filter (for which you can use @Secured(role)) was allowing calls while no authentication was provided for. RestTemplate seems to work from "the outside".

  • Scan order can mess up @ControllerAdvice error handler it seems. Had to change the order sometimes:

    Setup:
    - Controller is in: com.company.request.web.
    - General error controller is in com.company.common package.

    Had to change
    @ComponentScan(value = {"com.company.security", "com.company.common", "com.company.cassandra", "com.company.module", "com.company.request"})

    to

    @ComponentScan(value = {"com.company.security", "com.company.cassandra", "com.company.module", "com.company.request", "com.company.common"})

    Note that the general error controller has now been put in last

  • Spring Boot footprint seems relatively big especially for microservices. At least 500MB or something is needed, so we have quite big machines for about 20 services. Maybe plain Spring (iso Spring boot) might be more lightweight...

Bamboo build server

  • When Bamboo gets slow and the CPU seems quite busy and memory availability on its server seems fine, increase the Xmss and Xmsx (or related). Found this out because the java Bamboo process was running out of heap sometimes, increasing heap also fixed performance.

  • To have Bamboo builds fail on quality gates not met in SonarQube, install in Sonar the build breaker plugin. See the plugin docs and Update Center. This FAQ says so.

Stash

  • The Stash (now called Bitbucket) API: in /rest/git/1.0/projects/{projectKey}/repos/{repositorySlug}/tags a 'slug' is just a repository name. 

Microservices with event based architecture

  • When you do microservices, IMMEDIATELY take into account during coding + reviews that multiple instances can do concurrent access to database.

    This has affect on your queries. Most likely correct implementation for uniqueness check on inserts:
    1- add unique constraint
    2- run insert
    3- catch uniqueness exception --> you know it already exists. Solution with SELECT NOT EXISTS is not guaranteed unique.

  • Also take deleting of data (e.g user deletes himself) into account from the start. Especially when using events and/or eventual consistency in combination with an account-balance or similar. Because what if one services in the whole chain of things to execute for a delete fails? Has the user still some money left on his/her account then? In short: take care of CRUD.

  • Multiple services are sending the same event? That can indicate 2 services are doing the same thing --> Not good probably.

  • Microservices advantages:

    - Forces you to better think about where to put stuff in comparison to monolith where you more often can be tempted to "just do a quick fix".
    - language independency for service implementation: choose the best language for the job

    Disadvantages:
    - more time needed for design
    - eventual consistency is quite tough to understand & work with, also conceptually
    - infrastructure is more complex including all communication between services

    More cons can be found here.

Tomcat

  • Limit the maximum size of what can be posted to a servlet is not as easy as it seems for REST services:

    - maxPostSize in Tomcat is enforced only for specific contenttype: Tomcat only enforces that limit if the content type is application/x-www-form-urlencoded

    - And the other 3 below XML options are for multipart only:

    <multipart-config>
      <!-- 52MB max -->
      <max-file-size>52428800</max-file-size>
      <max-request-size>52428800</max-request-size>
      <file-size-threshold>0</file-size-threshold></multipart-config>


    So that one won't work for uploading just a byte[]. The only solution is in the servlet (e.g Spring @Controller) you'll have to check for the limit you want to allow.

  • maxthreads seems set to be unlimited by default or something. 50 seems to perform better. (workerthreads) 

Security

  • To securely generate a random number: SecureRandom randomGenerator = SecureRandom.getInstance("NativePRNG");

  • Good explanation of secure use of a salt to use for hashing can be found here

Cassandra

  • Unique constraints are not possible in Cassandra, so there you will even have to implement unique constraints in the business logic (and make it eventually consistent)

  • CassandraOperations query for one field:

    Select select = QueryBuilder.select(MultiplePaymentRequestRequesterEntityKey.ID).from(MultiplePaymentRequestRequesterEntity.TABLE_NAME);
    select.setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM);
    select.where(QueryBuilder.eq(MultiplePaymentRequestRequesterEntityKey.REQUESTER, requester));
    return cassandraTemplate.queryForList(select, UUID.class);


    See also here.

  • Note below two keys don't seem to get picked up by Cassandra in the Spring Data Cassandra version 1.1.4.RELEASE:  

    <groupid>org.springframework.data</groupid>
    <artifactid>spring-data-cassandra</artifactid>

    @PrimaryKeyColumn(name = OTHER_USER_ID, ordinal = 0, type = PrimaryKeyType.PARTITIONED)
    @CassandraType(type = DataType.Name.UUID)
    private UUID meUserId;

    @PrimaryKeyColumn(name = ME_USER_ID, ordinal = 1, type = PrimaryKeyType.CLUSTERED)
    @CassandraType(type = DataType.Name.UUID)
    private UUID meId;

    This *does* get picked up: put it into a separate class:

    @Data
    @AllArgsConstructor
    @PrimaryKeyClass
    public class HistoryKey implements Serializable {

      @PrimaryKeyColumn(name = HistoryEntity.ME_USER_ID, ordinal = 0, type = PrimaryKeyType.PARTITIONED)
      @CassandraType(type = DataType.Name.UUID)
      private UUID meUserId;

      @PrimaryKeyColumn(name = HistoryEntity.OTHER_USER_ID, ordinal = 1, type = PrimaryKeyType.PARTITIONED)
      @CassandraType(type = DataType.Name.UUID)
      private UUID otherUserId;

      @PrimaryKeyColumn(name = HistoryEntity.CREATED, ordinal = 2, type = PrimaryKeyType.CLUSTERED, ordering = Ordering.DESCENDING)
      private Date created;

    }

  • Don't use Cassandra for all types of use cases. An RDMS still has its value, e.g for ACID requirements. Cassandra is eventually consistent.

Kubernetes

Miscellaneous

  • Use dig for DNS resolving problems

  • Use pgAdmin III for PostgreSQL GUI

  • To stop SonarQube complaining about unused private fields when using Lombok @Data annotation: add to each of those classes @SuppressWarnings("PMD.UnusedPrivateField")

  • Managed to not need transactions nor XA transactions for message publishing, message reading, store db, message sending, by using the confirm + ack mechanism.
    And allow message to be read again. DB then sees: oh already stored (or do an upsert).
    So,when processing message from the queue:
    1- store in db
    2- send message on queue
    3- only then ack back to queue that read was successful

  • Performance: instantiate the Jackson2 ObjectMapper once as static, not in each call, so:
    private static final ObjectMapper mapper = new ObjectMapper();
  • Javascript: when an exception occurs in a callback and it is not handled, processing just ends. Promises have better error handling.

  • clamd would not start correctly; it would try to start but then show 'Killed' when started via the commandline. Turns out it runs out of memory when starting up.  Though we had enough RAM (16G total, 3G free), it turns out clamd needs swap configured!

  • Linux bash shell script to loop through projects for tagging with projects with spaces in their name:

    PROJECTS="
      project1
      project space2
    ";
    IFS=$'\n'
    for PROJECT in $PROJECTS
    do
      TRIM_LEADING_SPACE_PROJECT="$(echo -e "${PROJECT}" | sed -e 's/^[[:space:]]*//')"
      echo "Cloning '$TRIM_LEADING_SPACE_PROJECT'"
      git clone --depth=1 http://$USER:$GITPASSWD@github.com/projects/$TRIM_LEADING_SPACE_PROJECT.git
    done

  • OpenVPN in Windows 10: Sometimes it hangs on "Connecting..."  It doesn't show the popup to enter username/pwd. Go to View logs. Then when you see: Enter management password in the logs: ???? you have to kill the OpenVPN Daemon under Processes tab (windows taskmanager). The service is stopped when exiting the app but that's not enough!

  • javascript/nodejs log every call that comes in:

    app.use(function (req, res, next) {
      console.log('Incoming request = ' + new Date(), req.method, req.url);
      logger.debug('log at debug level');
      next()
    }

  • If ever your mouse is suddenly not working anymore your VirtualBox guest, kill the process in your guest-machine mentioned in comment 5 here. After that the mouse works again in your vbox guest.

  • Fix Firefox to version 45.0.0 for selenium driver tests:

    sudo apt-get install -y firefox=45.0.2+build1-0ubuntu1
    sudo apt-mark hold firefox

  • Setting the cookie attribute Secure (indicating cookie should only be sent over httpS) can be seen when using curl to request the URL(s) that should send that cookie plus the new attribute, even when using HTTP. See also my previous post.

    But when using a browser and HTTP, you probably won't see the secure cookie appear in the cookie store. This is (probably) because the browser knows not to store it in that case because it's HTTP being used.

  • Idempotency within services is key for resilience and be able to resend an event or perform an API call again.

Wednesday, January 18, 2017

OpenVPN how to route all IPV4 traffic through the OpenVPN tunnel

Introduction

Originally I was connected from a Windows 10 machine via OpenVPN to a network (segment?) for "our" project. I could access all servers and websites related to it.  But when switching to another project (using the same OpenVPN settings) I could only access the new project's servers when at the premise of that project. At home or from any other place, I could not get to the servers, e.g Jenkins. The error shown was "This site can't be reached" in Chrome. See screenshot below for the exact error:



But I could get to the microservices pods directly by IP address, e.g 172.18.33.xyz (xyz are not the same in below example IP addresses, just obfuscators). So quite strange.

The administrator of the OpenVPN server didn't know how to fix the problem either. Suggested was to make sure "to route all IPV4 traffic through VPN". That made me search on the interwebs and I found below solution to work, without having to change any server settings. (I did not even have access to those server settings.)

Analyzing the problem

A) Trying the website with the hostname:
C:\Users\moi>tracert website.eu
Tracing route to website.eu [183.45.163.xyz] over a maximum of 30 hops:
  1     1 ms     1 ms     1 ms  MODEM [192.169.178.x]
  2    20 ms    19 ms    20 ms  d13.xs4all.com [195.109.5.xyz]
  3    22 ms    22 ms    22 ms  3d13.xs4all.com [195.109.7.xyz]
...


B) Trying the well-known google gateway:
C:\Users\moi>tracert 8.8.8.8
Tracing route to google-public-dns-a.google.com [8.8.8.8] over a maximum of 30 hops:
  1     1 ms     1 ms     1 ms  MODEM [192.169.178.x]
  2    21 ms    20 ms    21 ms  d12.xs4all.com [195.109.5.xyz]
...

Hmm so its route goes via the same initial gateway for both external IPs and the hostname, so not via the VPN.

C) Trying with the IP that works (note not the IP for the hostname from above!):
C:\Users\moi>tracert 172.18.33.xyz
Tracing route to 172.18.33.xyz over a maximum of 30 hops
  1    97 ms    21 ms    20 ms  192.169.200.xyz
  2    45 ms    98 ms    29 ms  172.16.11.xyz
  3   130 ms    65 ms    68 ms  172.16.11.xyz
...

As you can see, the first entrypoint gateway is a different one, and most likely the wrong one.

The solution

The solution was to add this to the .ovpn OpenVPN configuration file:

route-method exe
route-delay 2
redirect-gateway def1

For me even only the last line (redirect-gateway def1) was sufficient, but for others the other two lines had to be added too.

D) After adding the setting, you can see the IP of the gateway changed to, the what turns out to, be the correct one:
C:\Users\moi>tracert website.eu
Tracing route to website.eu [183.45.163.yyy] over a maximum of 30 hops:
  1   143 ms    31 ms    21 ms  192.169.200.xyz
  2    21 ms    20 ms    21 ms  static.services.de [88.20.160.xyz]
  3    21 ms    21 ms    25 ms  10.31.17.xyz
  4    25 ms    21 ms    91 ms  10.31.17.xyz
...

References used:
- http://superuser.com/questions/120069/routing-all-traffic-through-openvpn-tunnel
- http://askubuntu.com/questions/665394/how-to-route-all-traffic-through-openvpn-using-network-manager