To Nha Notes | June 12, 2023, 10:27 p.m.
Avro is an open-source data serialization and RPC framework originally developed for use with Apache Hadoop. It uses schemas defined in JSON to serialize data into a compact binary format. The serialized data can be sent to any destination (e.g., an application or program) and easily deserialized there because, in an Avro container file, the schema is stored alongside the data.
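As a small illustration of how the schema travels with the data: an Avro object container file starts with the four magic bytes `O`, `b`, `j`, 0x01, followed by file metadata that carries the writer's schema. The sketch below only checks those magic bytes. The helper name `looksLikeAvroContainer` is made up for this note, and the function assumes it receives a Node.js `Buffer`.

```javascript
// Magic bytes that open every Avro object container file: 'O', 'b', 'j', 0x01.
const AVRO_MAGIC = Buffer.from([0x4f, 0x62, 0x6a, 0x01]);

// Hypothetical helper: returns true when `bytes` (a Buffer) starts with the magic.
// It does not parse the header metadata or validate the rest of the file.
function looksLikeAvroContainer(bytes) {
  return bytes.length >= 4 && bytes.subarray(0, 4).equals(AVRO_MAGIC);
}

console.log(looksLikeAvroContainer(Buffer.from('Obj\u0001...'))); // true
console.log(looksLikeAvroContainer(Buffer.from('{"a": 1}')));     // false
```

In practice the buffer would come from the first few bytes of a file such as `./values.avro`.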
An Avro schema consists of a JSON string, object, or array that defines the type of schema and the data attributes (field names, data types, etc.) for the schema type. The attributes differ depending on the schema type. Complex data types such as arrays and maps are supported.
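To make the three schema forms concrete, here is a minimal sketch written as plain JavaScript values. The record name `CityStats` and its fields are invented for illustration, not taken from any real dataset.

```javascript
// A schema can be a JSON string naming a primitive type...
const primitiveSchema = 'string';

// ...a JSON array, which declares a union (here: null or string)...
const unionSchema = ['null', 'string'];

// ...or a JSON object, whose attributes depend on the schema type.
const recordSchema = {
  type: 'record',
  name: 'CityStats', // hypothetical record name
  fields: [
    {name: 'city', type: 'string'},
    // Complex type: array of strings.
    {name: 'zipCodes', type: {type: 'array', items: 'string'}},
    // Complex type: map with string keys and int values.
    {name: 'visitsByYear', type: {type: 'map', values: 'int'}},
    // Optional field expressed as a union, defaulting to null.
    {name: 'nickname', type: unionSchema, default: null}
  ]
};

// The schema is ordinary JSON, so it can be stored in a file or sent over the wire.
const schemaJson = JSON.stringify(recordSchema, null, 2);
console.log(schemaJson);
```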
Snowflake reads Avro data into a single VARIANT column. You can query the data in a VARIANT column just as you would JSON data, using similar commands and functions.
View Avro files online:
https://dataformat.net/avro/viewer-and-converter
https://extendsclass.com/avro-viewer.html
avsc is a pure JavaScript implementation of the Avro specification. Install it with npm:
$ npm install avsc
avsc is compatible with all versions of Node.js since 0.11.
Inside a Node.js module, or using Browserify:
const avro = require('avsc');
Encode and decode values from a known schema:
const type = avro.Type.forSchema({
  type: 'record',
  name: 'Pet',
  fields: [
    {
      name: 'kind',
      type: {type: 'enum', name: 'PetKind', symbols: ['CAT', 'DOG']}
    },
    {name: 'name', type: 'string'}
  ]
});
const buf = type.toBuffer({kind: 'CAT', name: 'Albert'}); // Encoded buffer.
const val = type.fromBuffer(buf); // = {kind: 'CAT', name: 'Albert'}
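To see why the encoded buffer is so compact, the following sketch hand-encodes the same Pet value with the rules from the Avro specification: ints and longs use variable-length zig-zag coding, an enum is written as the int index of its symbol, a string is a length followed by its UTF-8 bytes, and a record is simply its fields concatenated in order. This is not how avsc is implemented. The helper names are made up, and `encodeLong` here only handles values that fit in 32 bits.

```javascript
// Zig-zag maps signed ints to unsigned ones (0→0, -1→1, 1→2, ...), then the
// result is written 7 bits at a time, low bits first, with the high bit of
// each byte marking "more bytes follow". Sketch: 32-bit inputs only.
function encodeLong(n) {
  let z = ((n << 1) ^ (n >> 31)) >>> 0;
  const bytes = [];
  while (z > 0x7f) {
    bytes.push((z & 0x7f) | 0x80);
    z >>>= 7;
  }
  bytes.push(z);
  return Buffer.from(bytes);
}

// A string is its UTF-8 byte length (as a long) followed by the bytes.
function encodeString(s) {
  const utf8 = Buffer.from(s, 'utf8');
  return Buffer.concat([encodeLong(utf8.length), utf8]);
}

// A Pet record is its fields in schema order: enum index, then name.
function encodePet(pet) {
  const symbols = ['CAT', 'DOG'];
  return Buffer.concat([
    encodeLong(symbols.indexOf(pet.kind)),
    encodeString(pet.name)
  ]);
}

const manual = encodePet({kind: 'CAT', name: 'Albert'});
console.log(manual.toString('hex')); // 000c416c62657274
```

The eight bytes are the enum index 0, the zig-zag-encoded length 6 (0x0c), and the six bytes of "Albert". No field names or type tags are written, which is why a reader needs the schema to decode the data.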
Infer a value's schema and encode similar values:
const type = avro.Type.forValue({
  city: 'Cambridge',
  zipCodes: ['02138', '02139'],
  visits: 2
});
// We can use `type` to encode any values with the same structure:
const bufs = [
  type.toBuffer({city: 'Seattle', zipCodes: ['98101'], visits: 3}),
  type.toBuffer({city: 'NYC', zipCodes: [], visits: 0})
];
Get a readable stream of decoded values from an Avro container file compressed using Snappy (see the BlockDecoder API for an example including checksum validation):
const snappy = require('snappy'); // Or your favorite Snappy library.
const codecs = {
  snappy: function (buf, cb) {
    // Avro appends checksums to compressed blocks, which we skip here.
    return snappy.uncompress(buf.slice(0, buf.length - 4), cb);
  }
};
avro.createFileDecoder('./values.avro', {codecs})
  .on('metadata', function (type) { /* `type` is the writer's type. */ })
  .on('data', function (val) { /* Do something with the decoded value. */ });
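The four bytes skipped above are, per the Avro specification, the big-endian CRC32 checksum of the uncompressed block. Below is a minimal sketch of what validating it could look like, using a hand-rolled CRC32 so it needs no extra dependency. The helper names `splitSnappyBlock` and `checksumMatches` are made up for this note, and the Snappy decompression step itself is left out.

```javascript
// Bitwise CRC-32 (the common IEEE variant, reflected polynomial 0xEDB88320).
function crc32(bytes) {
  let crc = 0xffffffff;
  for (const byte of bytes) {
    crc ^= byte;
    for (let i = 0; i < 8; i++) {
      // Shift right; XOR in the polynomial when the dropped bit was 1.
      crc = (crc >>> 1) ^ (0xedb88320 & -(crc & 1));
    }
  }
  return (crc ^ 0xffffffff) >>> 0;
}

// Hypothetical helper: separates the compressed payload from the trailing
// 4-byte big-endian checksum of a Snappy-coded Avro block (a Buffer).
function splitSnappyBlock(block) {
  return {
    compressed: block.subarray(0, block.length - 4),
    checksum: block.readUInt32BE(block.length - 4)
  };
}

// Hypothetical helper: call this with the data obtained after decompression.
function checksumMatches(uncompressed, expectedChecksum) {
  return crc32(uncompressed) === expectedChecksum;
}

console.log(crc32(Buffer.from('123456789')).toString(16)); // cbf43926
```

Inside the `snappy` codec function, one could decompress `compressed`, then pass an error to `cb` when `checksumMatches` returns false.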
Implement a TCP server for an IDL-defined protocol:
// We first generate a protocol from its IDL specification.
const protocol = avro.readProtocol(`
  protocol LengthService {
    /** Endpoint which returns the length of the input string. */
    int stringLength(string str);
  }
`);
// We then create a corresponding server, implementing our endpoint.
const server = avro.Service.forProtocol(protocol)
  .createServer()
  .onStringLength(function (str, cb) { cb(null, str.length); });
// Finally, we use our server to respond to incoming TCP connections!
require('net').createServer()
  .on('connection', (con) => { server.createChannel(con); })
  .listen(24950);
https://www.npmjs.com/package/avsc
https://medium.com/@frankbaele/avro-containers-in-nodejs-95b72cbdde90
https://github.com/frankbaele/examples/tree/master
https://github.com/mtth/avsc/issues/81
https://docs.snowflake.com/en/user-guide/semistructured-data-formats#avro