Smart HAT Engine (SHE) is the HAT's capability to run arbitrary algorithms on data within the HAT without leaking it anywhere else. It extends the core HAT functionality to support algorithms ranging from simple summaries of HAT data to personal AI.
Containerised applications appear to be the obvious choice, since they can be written in any language and provide isolation guarantees. The rest can be controlled through a well-defined interface between the HAT itself and the Smart HAT Engine. In SHE, algorithms run in such an isolated environment with no ability to communicate with the outside world, enforced through firewalls and security policies. An algorithm runs only reactively, in response to a request from a HAT: it processes the received data and returns results in the response. The downsides of this approach are that it does not allow for accumulating data over longer periods of time (the HAT does this itself), it does not allow for aggregating data across multiple users, and the algorithms that can be executed are limited to those fixed ahead of deployment, whether traditional code or pre-trained Machine Learning models. Serverless environments (such as AWS Lambda) meet the remaining goals of elasticity, on-demand use and ease of deployment.
A current limitation of the AWS Lambda environment is that it provides little detail and no guarantees about how a specific container instance gets reused, which opens up the possibility of timing-related attacks. Specifically, a common optimisation is to retain some state in a given container (in the sense of caching rather than storage, as there is no guarantee that the same container will be used again); however, that state can also contain data previously received from a HAT. And although interactions with a given function are driven by HATs rather than by the functions themselves, and functions are unable to communicate with the outside world, a function could still return crafted responses to a specific HAT controlled by the perpetrator. This, too, is mitigated through metadata logging, but additional controls around function scheduling and execution could eliminate the risk.
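To make the container-reuse concern concrete, here is a minimal Python sketch of how module-level state can carry one caller's data into a later invocation. The event shape (a `records` key) and both handler names are hypothetical; calling a handler twice in the same process stands in for AWS Lambda reusing a warm container.

```python
# Module-level state: in AWS Lambda this survives between invocations that
# happen to land on the same warm execution environment.
_cache = {}


def leaky_handler(event, context=None):
    """Unsafe: keeps the previous caller's records in module-level state.

    The event shape ({"records": [...]}) is a hypothetical example.
    """
    previous = _cache.get("last_records", [])
    records = event.get("records", [])
    _cache["last_records"] = records  # still held after this response returns
    # A malicious function could smuggle `previous` (possibly another HAT's
    # data) into a response sent to a HAT controlled by the attacker.
    return {"count": len(records), "previous_count": len(previous)}


def safe_handler(event, context=None):
    """Safer: all per-request data lives in local variables only."""
    records = event.get("records", [])
    return {"count": len(records)}
```

Keeping per-request data out of module-level variables, as `safe_handler` does, narrows what an honest function could accidentally retain, but it cannot stop a deliberately malicious one; that is what the logging and scheduling controls described above address.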
SHE functions are currently standard AWS Lambda functions, so you can draw on the wealth of existing information on how to build such functions.
While it is an over-simplification, it is fair to say that you can simply drop in an algorithm you have already written, or write a new one in any major language and framework:
Furthermore, the HAT uses the industry-standard JSON format for handling data, so what your algorithm receives is simply a bundle of JSON records (sometimes called documents) matching your specific Data Bundle query (see the guide on Data Bundles for more details).
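As an illustration of consuming such a bundle, the Python sketch below walks every record and yields its payload. The bundle shape it assumes (a JSON object mapping each bundle property name to a list of records with `endpoint`, `recordId` and `data` fields) is an assumption for this example; check the Data Bundles guide for the actual structure your query returns.

```python
import json
from typing import Dict, Iterator, List


def iter_records(bundle_json: str) -> Iterator[dict]:
    """Yield the `data` payload of every record in a bundle response.

    Assumed (hypothetical) shape:
        {"<property>": [{"endpoint": ..., "recordId": ..., "data": {...}}, ...]}
    """
    bundle: Dict[str, List[dict]] = json.loads(bundle_json)
    for records in bundle.values():
        for record in records:
            # Records without a payload yield an empty dict.
            yield record.get("data", {})
```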
Your function needs to do 3 things:
untilDate query parameters in ISO8601 format).
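For the fromDate and untilDate query parameters mentioned above, a minimal Python sketch of reading and validating them might look like the following. It assumes the function sits behind an API Gateway Lambda proxy integration (which delivers query parameters in `queryStringParameters`, set to null when there are none); treating offset-less timestamps as UTC is also an assumption.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_iso8601(value: Optional[str]) -> Optional[datetime]:
    """Parse an ISO8601 timestamp, accepting a trailing 'Z' for UTC."""
    if not value:
        return None
    # datetime.fromisoformat only accepts a 'Z' suffix from Python 3.11 onwards.
    if value.endswith("Z"):
        value = value[:-1] + "+00:00"
    parsed = datetime.fromisoformat(value)
    # Assumption: timestamps without an offset are interpreted as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def date_range(event: dict) -> Tuple[Optional[datetime], Optional[datetime]]:
    """Extract fromDate/untilDate from an API Gateway proxy-style event."""
    params = event.get("queryStringParameters") or {}
    start = parse_iso8601(params.get("fromDate"))
    end = parse_iso8601(params.get("untilDate"))
    if start and end and start > end:
        raise ValueError("fromDate must not be after untilDate")
    return start, end
```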
A common recommendation is to keep your algorithm separate from the Lambda handler code, as this makes testing and debugging a lot simpler. Try to develop your entire algorithm outside the HAT (the serverless framework includes a helpful set of tools for that), exposing the three steps above as separate API Gateway endpoints. You should then be able to feed the generated Data Bundle definition into the HAT you use for development, and feed the data extracted from that HAT using the bundle into your algorithm for processing.
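The separation recommended above can be sketched in Python as a pure algorithm function plus a thin Lambda adapter. The `category` field, the function names and the request body format (a JSON list of records) are all hypothetical; the response uses the `statusCode`/`body` shape expected by an API Gateway proxy integration.

```python
import json
from collections import Counter
from typing import Dict, List


def summarise(records: List[dict]) -> Dict[str, int]:
    """Pure algorithm: count records per (hypothetical) 'category' field.

    Contains no AWS-specific code, so it can be unit-tested locally.
    """
    return dict(Counter(r.get("category", "unknown") for r in records))


def handler(event, context=None):
    """Thin Lambda adapter: decode the request, call the algorithm, encode."""
    body = json.loads(event.get("body") or "[]")
    result = summarise(body)
    return {"statusCode": 200, "body": json.dumps(result)}
```

Because `summarise` never touches the event or context objects, it can be tested with plain Python lists, while `handler` only needs a smoke test against a sample event.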
Everything else is down to your own implementation!
AWS Lambda functions, and by extension SHE functions, also have some limitations worth noting:
Each SHE function available in a particular HAT cluster is registered in the HAT's static configuration, which provides the function's ID along with the version to be used, the endpoint the function is allowed to publish data to, and the details the HAT needs in order to invoke it.
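A registry entry carrying the fields listed above could be modelled as in the following Python sketch. The class, field names, function ID, endpoint and URL are hypothetical illustrations, not the HAT's actual configuration format; the invocation details are reduced to a single URL for simplicity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SheFunctionConfig:
    """One entry of a hypothetical static SHE function registry."""
    function_id: str      # ID of the function
    version: str          # version to be used
    output_endpoint: str  # the only endpoint it may publish data to
    invoke_url: str       # invocation details (assumed: a single HTTP URL)


# Hypothetical example entry; all values are placeholders.
REGISTRY: Dict[str, SheFunctionConfig] = {
    "activity-counter": SheFunctionConfig(
        function_id="activity-counter",
        version="1.0.0",
        output_endpoint="she/insights/activity-counter",
        invoke_url="https://example.com/activity-counter/1.0.0",
    ),
}


def may_publish(function_id: str, endpoint: str) -> bool:
    """Allow a function to write only to its registered output endpoint."""
    config = REGISTRY.get(function_id)
    return config is not None and config.output_endpoint == endpoint
```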
The HAT internally tracks data "events", and uses incoming data events to determine which functions may need to be invoked on the data. The current approach is rather straightforward: the HAT accumulates a batch of events and checks which endpoints they were for. It then compares that set of endpoints against the functions enabled for the HAT and, if there is an overlap, checks the trigger details for each such function. A new function execution, covering all data matching the bundle since the last execution, is started when the trigger is either
individual (the function should run for every individual data record), or
period, and at least the specified period of time has passed since the last invocation.
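The triggering logic described above can be sketched in Python as follows: intersect the endpoints touched by a batch of events with each enabled function's endpoints, then apply the individual or period rule. The class names and the `"individual"`/`"period"` strings are hypothetical labels, and the sketch only decides which functions are due; it does not split work per record for individual triggers.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional, Set


@dataclass
class FunctionTrigger:
    kind: str                           # hypothetical labels: "individual" or "period"
    period: Optional[timedelta] = None  # only used when kind == "period"


@dataclass
class EnabledFunction:
    name: str
    endpoints: Set[str]                 # endpoints the function's bundle reads from
    trigger: FunctionTrigger
    last_run: Optional[datetime] = None


def functions_to_run(event_endpoints: Iterable[str],
                     functions: List[EnabledFunction],
                     now: datetime) -> List[str]:
    """Return names of functions due to run for a batch of data events."""
    touched = set(event_endpoints)
    due = []
    for fn in functions:
        if not touched & fn.endpoints:  # no overlap: new data is irrelevant
            continue
        trigger = fn.trigger
        if trigger.kind == "individual":
            due.append(fn.name)
        elif trigger.kind == "period" and trigger.period is not None:
            # Assumption: a function that has never run is due immediately.
            if fn.last_run is None or now - fn.last_run >= trigger.period:
                due.append(fn.name)
    return due
```

Because the check only runs when events arrive, a function whose period has elapsed still waits for the next batch of relevant data, which matches the behaviour described in the following paragraph.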
Note that, unless triggered manually via an API endpoint, a HAT's functions will not run if no new data is coming in, since it is incoming data that generates the data events which in turn trigger functions. In a completely inactive HAT, such functions would never be executed.
Every time the HAT decides it needs to execute a SHE function, it performs three steps:
This results in the generated data becoming available to the HAT owner and other applications in the same way as any other data, with no need to deal with the complexities of running algorithms, managing dependencies between components or running dedicated infrastructure.