

The Utopia of Unique Identity

The Identity Zoo

The subject of Unique Patient Identity pops up with clockwork regularity in the healthcare discourse.

A recent article in the New England Journal of Medicine (NEJM) points out that HIPAA initially mandated it. Reason prevailed and the requirement was abandoned. The article goes on to list, correctly, all the issues related to duplicates and split records and the dire consequences from financial costs to potential loss of life.

Just a few short years ago AHIMA made a petition citing many of the same issues and proposing a voluntary solution. That didn’t get much traction either.

While the problems cited are very real, it is not at all clear how a unique identifier would solve them, in spite of the very bold claims. Nor is there any precedent in history for pulling off something like this (the NEJM article points to efforts in countries with a population half that of NYC without its suburbs).

The problem is not one of identity, but of trusted identity. To be more accurate, it’s not about the identity, but about the claims made about the identity. I will not discuss the fact that a unique identifier only shifts the burden to a bureaucracy that is expected to be timely and correct. I will address the trust aspect instead. The problem is certainly not new, and it is more prevalent in industry and in life than one might think. We learned that (some) humans lie and cannot be trusted. So how can I be sure, with a degree of confidence above a reasonable threshold (it’s always relative, never absolute), that people are who they say they are? If we’re not gonna trust them, we have to trust somebody else. And that’s the Identity Issuer or Certification Authority.

As an example, say I meet Roger at a conference and he introduces me to John. I will trust John to be John because Roger said so: my trust in Roger transfers to the claim he made about John’s identity. I will have reasonable confidence (never 100%) that John is John, enough to engage in some activities with him, but not others.

Let’s take another example: when we travel abroad, we need to prove our identity. Enter the US Government issued Passport. It has an ID, which is not unique to the person: when one renews a Passport, one is issued a new one with another ID number. What is important is that foreign governments, who need to know who is entering their country, have no way to establish that individual’s identity on their own. So they trust that the US Government has verified that person’s identity (or the foreign government makes one come in person to the consulate to obtain a visa). First they need to know that the proof of identity is valid, hence the security features of the passport. Then the proof of id makes a number of claims, certified by the issuer, the US Government, such as name, birthdate, height, etc., and even a photo. So what the verifier does is assess the validity of the proof of id and then compare the features she observes herself against the claims made by the proof of id.

So the burden of determining the identity rests with the verifier, and the process can be more involved or simpler. At the very minimum, simple possession of a proof of id may be sufficient. My son would likely be able to check out a book from the local library with my library card, but may be less successful using my proof of id at a liquor store checking out other items.

The point of all this is that the goal of a unique patient identifier is at best unrealistic. It may solve some problems, but it will surely create others. The problem so accurately described by the proponents of the unique id is really the lack of a trust model (trust the issuer), together with the lack and inconsistency of claims made about identities, which is what makes correlation so hard and error prone.

In my opinion, the solution involves a larger number of trusted Identity Issuers that use a standardized set and format for claims on the proof of id. Kinda like the guys who issue driver licenses (and in no way am I implying that States, or God forbid, the DMV should do that for healthcare). Complementing that, linked identities would in most cases provide sufficient metadata to address the problem of correlating identities across different EMR silos, a problem EMPI services are struggling with.

In future posts we will explore a few concrete solutions based on integrating existing standards and technologies.


A JMS Ping Utility

JMS Ping

Most companies that do integration at scale use some variation of an asynchronous messaging service. JMS is by far the preferred choice, whether using commercial products like IBM WebSphere or Oracle WebLogic (to name the more popular commercial ones, there are a few others), or open source alternatives, of which Apache ActiveMQ is by far the most popular. So popular, in fact, that Amazon included it a few months ago in its cloud services menu.

Regardless of whether one operates a messaging service on premises or in the cloud, it is important to constantly monitor and assess the health of the service. Unfortunately, tools that support operations of messaging services are virtually nonexistent, or have a limited scope at best. Therefore, every organization comes up with its own solution. Over the years, we developed a nifty tool set that we are moving to open source under the business friendly Apache License v2.

A simple and very useful tool is a JMS ping utility. It is intended for ActiveMQ, but the code only uses the javax.jms package. With minor changes (mostly dependencies) it can be adapted for use with other messaging products.

Assuming you have a broker running, you can execute something like:

$ amix-ping tcp://localhost:60001
Completed: 1 messages sent
ping: destination=queue://jms.ping.GRqSOy, size=100 bytes, time=47 ms
Completed: 1 messages received

The tcp://localhost:60001 address is the connection URL for the broker. If you don’t have a broker running, you can easily run a local broker from the test-brokers directory in the project by running:

$ mvn -P jms-broker

The jms-broker profile from the maven project runs a local minimal broker.

The JMS ping utility is very useful in measuring the health of a broker. One may run it occasionally in development, but I find it extremely useful for production operations to measure, more or less continuously, the response time of one or multiple brokers. The help output of the amix-ping tool is:

$ amix-ping --help
usage: amix-ping [options] <broker-url> [destination]

Options:
 -h,--help
 -u,--user <user>      User for JMS connection
 -p,--password <pass>  Password for JMS connection
 -l,--length <len>     Send message of 'len' bytes with random content
 -c <count>            Stop after sending 'count' messages.
 -i <interval>         Wait 'interval' seconds between sending messages
 -a,--async            Send next ping before receiving reply

 broker-url            JMS connection URL
 destination           JMS destination (either queue or topic) as url
                       (e.g. "queue://queue.name" "topic://topic.name")
                       default: "queue://jms.ping.<random>"

For more information visit https://docs.silkmq.com

The tool works with both authenticated and anonymous connections. For authenticated connections the -u and -p parameters are mandatory. Although the majority of brokers I’ve seen in production environments use, surprisingly, anonymous connections, I strongly recommend using authenticated connections in production. Yes, credential management may be a pain, but the added security is well worth it.

The -c argument allows multiple pings to be sent. The default is 1, as seen in the sample output above, but in general we want to send more pings and measure the average time.
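The tool prints one line per ping, so the average has to be computed from the output. A minimal sketch of one way to do that with awk follows; the ping_stats function name is made up for this post, and it assumes the output keeps the "ping: ... time=<n> ms" layout shown in the samples.

```shell
#!/bin/sh
# ping_stats (hypothetical helper): read amix-ping output on stdin and
# print a one-line summary. Only lines starting with "ping:" are used;
# the round-trip time is taken from the "time=<n>" field.
ping_stats() {
  LC_ALL=C awk '
    /^ping:/ {
      # Find the field that starts with "time=" and strip the prefix.
      for (i = 1; i <= NF; i++) {
        if ($i ~ /^time=/) {
          t = substr($i, 6) + 0
          if (n == 0 || t < min) min = t
          if (n == 0 || t > max) max = t
          sum += t
          n++
        }
      }
    }
    END {
      if (n == 0) { print "no replies"; exit 1 }
      printf "count=%d min=%d avg=%.2f max=%d ms\n", n, min, sum / n, max
    }
  '
}
```

It would be used as a filter, for example: amix-ping tcp://localhost:60001 -c 20 | ping_stats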

The -i argument throttles the pings to the specified interval, in seconds. By default, messages are still throttled to one every 20 ms (a maximum of 50 msg/sec) to avoid flooding the broker.

The -l,--length option is an interesting one. Predictably, it allows one to send random payloads of the specified size (in bytes), the default being 100. What is interesting is that this allows one to measure the response of the broker for payloads of various sizes. As an example (bursts of 4 messages):
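Taken together, -c and -i make it easy to script a periodic health check, for instance from cron or a monitoring agent. The sketch below is only an illustration: the check_latency name, the threshold argument, the messages and the exit codes (0 ok, 1 slow, 2 no replies) are conventions assumed here, not something the tool provides.

```shell
#!/bin/sh
# check_latency <threshold-ms> (hypothetical helper): read amix-ping output
# on stdin. Returns 0 if the average round trip is within the threshold,
# 1 if it is above it, and 2 if no ping replies were seen at all.
check_latency() {
  LC_ALL=C awk -v limit="$1" '
    /^ping:/ && match($0, /time=[0-9]+/) {
      # "time=" is 5 characters long; the digits follow it.
      sum += substr($0, RSTART + 5, RLENGTH - 5)
      n++
    }
    END {
      if (n == 0) { print "CRITICAL: no ping replies"; exit 2 }
      avg = sum / n
      if (avg > limit) { printf "WARNING: avg %.1f ms > %d ms\n", avg, limit; exit 1 }
      printf "OK: avg %.1f ms\n", avg
    }
  '
}
```

For example, ten pings one second apart checked against a 50 ms budget: amix-ping tcp://localhost:60001 -c 10 -i 1 | check_latency 50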

$ amix-ping tcp://localhost:60001 -c 4
ping: destination=queue://jms.ping.nLZEdR, size=100 bytes, time=6 ms
ping: destination=queue://jms.ping.nLZEdR, size=100 bytes, time=1 ms
ping: destination=queue://jms.ping.nLZEdR, size=100 bytes, time=1 ms
Completed: 4 messages sent
ping: destination=queue://jms.ping.nLZEdR, size=100 bytes, time=1 ms
Completed: 4 messages received

$ amix-ping tcp://localhost:60001 -c 4 -l 1024
ping: destination=queue://jms.ping.mlGmGN, size=1024 bytes, time=6 ms
ping: destination=queue://jms.ping.mlGmGN, size=1024 bytes, time=1 ms
ping: destination=queue://jms.ping.mlGmGN, size=1024 bytes, time=2 ms
Completed: 4 messages sent
ping: destination=queue://jms.ping.mlGmGN, size=1024 bytes, time=2 ms
Completed: 4 messages received

$ amix-ping tcp://localhost:60001 -c 4 -l 1048576
ping: destination=queue://jms.ping.FAvRBK, size=1048576 bytes, time=58 ms
ping: destination=queue://jms.ping.FAvRBK, size=1048576 bytes, time=22 ms
ping: destination=queue://jms.ping.FAvRBK, size=1048576 bytes, time=37 ms
Completed: 4 messages sent
ping: destination=queue://jms.ping.FAvRBK, size=1048576 bytes, time=9 ms
Completed: 4 messages received

$ amix-ping tcp://localhost:60001 -c 4 -l 10485760
ping: destination=queue://jms.ping.ZprkcU, size=10485760 bytes, time=177 ms
ping: destination=queue://jms.ping.ZprkcU, size=10485760 bytes, time=124 ms
ping: destination=queue://jms.ping.ZprkcU, size=10485760 bytes, time=107 ms
Completed: 4 messages sent
ping: destination=queue://jms.ping.ZprkcU, size=10485760 bytes, time=103 ms
Completed: 4 messages received

What we see is that the first message takes longer, which is not surprising given how ActiveMQ internals work. The 1 ms time is because the broker runs on the same host with no persistence. Note that there is no significant difference between 100-byte and 1 KB payloads, but once the payload size increases, so does the ping round-trip time. This is not a surprise either; what matters is knowing by how much, so that broker performance can be fine-tuned.
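Repeating the runs above by hand gets tedious, so the size sweep can be scripted. The following is a rough sketch rather than part of the tool set: sweep_sizes and the AMIX_PING variable are names invented here, the count of 5 is arbitrary, and the first reply of each run is dropped as a warm-up, in line with the observation that the first message takes longer.

```shell
#!/bin/sh
# AMIX_PING names the ping command. Making it overridable is an assumption
# of this sketch (handy for wrappers), not an option of the tool itself.
: "${AMIX_PING:=amix-ping}"

# sweep_sizes <broker-url> <size>... (hypothetical helper): ping with each
# payload size in bytes and print "<size> <average ms>" per line, ignoring
# the first (warm-up) reply of every run.
sweep_sizes() {
  url=$1
  shift
  for len in "$@"; do
    "$AMIX_PING" "$url" -c 5 -l "$len" |
      LC_ALL=C awk -v len="$len" '
        /^ping:/ && match($0, /time=[0-9]+/) {
          seen++
          if (seen > 1) { sum += substr($0, RSTART + 5, RLENGTH - 5); n++ }
        }
        END { if (n > 0) printf "%d %.1f\n", len, sum / n }
      '
  done
}
```

A call such as sweep_sizes tcp://localhost:60001 100 1024 1048576 10485760 then yields one line per payload size, which is easy to plot or compare between broker configurations.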

Running the JMS ping utility against a production broker, like SilkMQ, gives more realistic results. SilkMQ is apifocal’s managed ActiveMQ network of brokers service that currently runs distributed among multiple datacenters on 2 continents, including AWS.

$ amix-ping -u sC6tyqFP -p secret 'failover://(nio+ssl://mqs.silkmq.com:61616)' -c 4
ping: destination=queue://jms.ping.KLKswe, size=100 bytes, time=36 ms
ping: destination=queue://jms.ping.KLKswe, size=100 bytes, time=13 ms
ping: destination=queue://jms.ping.KLKswe, size=100 bytes, time=13 ms
Completed: 4 messages sent
ping: destination=queue://jms.ping.KLKswe, size=100 bytes, time=20 ms
Completed: 4 messages received

The ping time for a broker on the US east coast is around 15 ms for small payloads, but this utility allows me to measure different payload sizes and, more importantly, broker saturation. For that I run the ping from multiple consoles on my laptop, or from computers in different geographies. If I ping from two consoles in parallel, I will probably not see a degradation in response time compared with running a single ping instance. As I increase the number of parallel ping sessions, at some point I start noticing a slight performance degradation, and eventually a serious one as the number of sessions grows. This is very helpful for tuning the performance of the brokers, in this case by configuring the ActiveMQ producer flow control.
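Instead of opening many consoles, the parallel sessions can be started from a single script. This is a sketch under assumptions: parallel_pings, the AMIX_PING variable and the log file naming are invented for illustration, and it simply launches the same ping command several times in the background.

```shell
#!/bin/sh
# AMIX_PING names the ping command; overriding it is an assumption of this
# sketch, not an option of the tool itself.
: "${AMIX_PING:=amix-ping}"

# parallel_pings <sessions> <log-dir> <amix-ping args>... (hypothetical helper)
# Start <sessions> concurrent ping runs, wait for all of them to finish, and
# leave one log file per session in <log-dir> for later comparison.
parallel_pings() {
  sessions=$1
  logdir=$2
  shift 2
  mkdir -p "$logdir" || return 1
  i=1
  while [ "$i" -le "$sessions" ]; do
    "$AMIX_PING" "$@" > "$logdir/session-$i.log" 2>&1 &
    i=$((i + 1))
  done
  wait
}
```

Running it with 2, 4, 8 and more sessions against the same broker (for example parallel_pings 8 /tmp/ping-run tcp://localhost:60001 -c 100) and comparing the times in the logs gives a rough picture of where the degradation starts.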

One thing you may have noticed in the output is the destination name. From the usage text above, we see that the destination name may be provided as an optional parameter. The default is queue://jms.ping, but the actual destination has a random 6-character string appended to the name. The reason is that the ping utility uses both a producer and a consumer, and time is measured by using the JMSTimestamp property, so we want to make sure we don’t receive a message, and its timestamp, from another computer running a ping at the same time (the time difference would be meaningless, right?).

Since we mentioned JMS message properties, there is one more thing worth mentioning. The ping utility also sets the TTL of the message to 2 minutes, so there is no need to flush destinations in case something goes wrong. The broker will automatically discard pings after 2 minutes, freeing up resources.

At the time of writing, the first version of the activemq-mix open source utilities is not yet released (although we have used the code in production for a long time). If you find this post after the release, the README will provide more up-to-date instructions. Otherwise, some scripts may be missing, including the amix-ping one seen above. In the meantime, you can get by with a shell script with the content below placed on the $PATH.

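#!/bin/sh
# Wrapper that runs the amix-ping jar from the local Maven repository.

VER=1.0.0-SNAPSHOT
# "$@" is quoted so arguments with spaces or wildcard characters pass through intact.
java -jar ~/.m2/repository/org/apifocal/amix/tools/amix-ping/${VER}/amix-ping-${VER}-jar-with-dependencies.jar "$@"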

Enjoy, and if you find this utility helpful, drop us a note. Comments and constructive feedback encouraged.