Pure JavaScript implementation of the Avro specification

To Nha Notes | June 12, 2023, 10:27 p.m.

What is Avro?

Avro is an open-source data serialization and RPC framework originally developed for use with Apache Hadoop. It uses schemas defined in JSON to produce serialized data in a compact binary format. The serialized data can be sent to any destination (i.e. an application or program) and easily deserialized there, because Avro container files embed the writer's schema alongside the data.

An Avro schema consists of a JSON string, object, or array that defines the schema's type and its data attributes (field names, data types, etc.); which attributes apply depends on the schema type. Complex data types such as arrays and maps are supported.
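
For example, a minimal record schema that uses both an array and a map might look like the following (an illustrative schema, not taken from any particular dataset):

    {
      "type": "record",
      "name": "City",
      "fields": [
        {"name": "name", "type": "string"},
        {"name": "zipCodes", "type": {"type": "array", "items": "string"}},
        {"name": "attractions", "type": {"type": "map", "values": "string"}}
      ]
    }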

Snowflake reads Avro data into a single VARIANT column. You can query the data in a VARIANT column just as you would JSON data, using similar commands and functions.

Avro sample data 

View Avro files online:

https://dataformat.net/avro/viewer-and-converter

https://extendsclass.com/avro-viewer.html

The avsc library, a pure JavaScript implementation of the Avro specification

Features

  • Fast! Typically twice as fast as JSON with much smaller encodings.
  • Full Avro support, including recursive schemas, sort order, and evolution.
  • Serialization of arbitrary JavaScript objects via logical types.
  • Unopinionated 64-bit integer compatibility.
  • No dependencies, avsc even runs in the browser.

Installation

$ npm install avsc

avsc is compatible with all versions of node.js since 0.11.

Examples

Inside a node.js module, or using browserify:

const avro = require('avsc');

  • Encode and decode values from a known schema (a schema-evolution follow-up is sketched after this list):

    const type = avro.Type.forSchema({
      type: 'record',
      name: 'Pet',
      fields: [
        {
          name: 'kind',
          type: {type: 'enum', name: 'PetKind', symbols: ['CAT', 'DOG']}
        },
        {name: 'name', type: 'string'}
      ]
    });
    
    const buf = type.toBuffer({kind: 'CAT', name: 'Albert'}); // Encoded buffer.
    const val = type.fromBuffer(buf); // = {kind: 'CAT', name: 'Albert'}
  • Infer a value's schema and encode similar values:

    const type = avro.Type.forValue({
      city: 'Cambridge',
      zipCodes: ['02138', '02139'],
      visits: 2
    });
    
    // We can use `type` to encode any values with the same structure:
    const bufs = [
      type.toBuffer({city: 'Seattle', zipCodes: ['98101'], visits: 3}),
      type.toBuffer({city: 'NYC', zipCodes: [], visits: 0})
    ];
  • Get a readable stream of decoded values from an Avro container file compressed using Snappy (see the BlockDecoder API for an example including checksum validation; a write-side counterpart is sketched after this list):

    const snappy = require('snappy'); // Or your favorite Snappy library.
    const codecs = {
      snappy: function (buf, cb) {
        // Avro appends checksums to compressed blocks, which we skip here.
        return snappy.uncompress(buf.slice(0, buf.length - 4), cb);
      }
    };
    
    avro.createFileDecoder('./values.avro', {codecs})
      .on('metadata', function (type) { /* `type` is the writer's type. */ })
      .on('data', function (val) { /* Do something with the decoded value. */ });
  • Implement a TCP server for an IDL-defined protocol (a matching client is sketched after this list):

    // We first generate a protocol from its IDL specification.
    const protocol = avro.readProtocol(`
      protocol LengthService {
        /** Endpoint which returns the length of the input string. */
        int stringLength(string str);
      }
    `);
    
    // We then create a corresponding server, implementing our endpoint.
    const server = avro.Service.forProtocol(protocol)
      .createServer()
      .onStringLength(function (str, cb) { cb(null, str.length); });
    
    // Finally, we use our server to respond to incoming TCP connections!
    require('net').createServer()
      .on('connection', (con) => { server.createChannel(con); })
      .listen(24950);
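
Following up on the first example, avsc also supports schema evolution via resolvers: a reader type can decode values written with an older type. Below is a minimal sketch reusing `type` and `buf` from that example; the `v2` schema and its `age` field are illustrative:

    // Suppose a newer version of `Pet` adds an `age` field with a default.
    const v2 = avro.Type.forSchema({
      type: 'record',
      name: 'Pet',
      fields: [
        {
          name: 'kind',
          type: {type: 'enum', name: 'PetKind', symbols: ['CAT', 'DOG']}
        },
        {name: 'name', type: 'string'},
        {name: 'age', type: 'int', default: 0} // New field, filled from the default.
      ]
    });
    
    const resolver = v2.createResolver(type); // `type` is the writer's type above.
    const pet = v2.fromBuffer(buf, resolver, true); // = {kind: 'CAT', name: 'Albert', age: 0}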
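
As a counterpart to the Snappy decoding example, values can also be written out to a container file. A short sketch using `avro.createFileEncoder` and the `type` from the first example (the file name is illustrative; this writes an uncompressed file, since no codec is configured):

    // Write a couple of records to a new container file.
    const encoder = avro.createFileEncoder('./pets.avro', type);
    encoder.write({kind: 'CAT', name: 'Albert'});
    encoder.write({kind: 'DOG', name: 'Rex'});
    encoder.end();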
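
Finally, a matching client for the TCP server above might look as follows. This is a hedged sketch: it assumes avsc's `createClient` accepts a duplex stream via the `transport` option and generates one method per protocol message, so check the avsc API documentation for your version:

    // Connect to the server above and call its endpoint.
    const client = avro.Service.forProtocol(protocol)
      .createClient({transport: require('net').connect(24950)});
    
    client.stringLength('hello', function (err, len) {
      if (err) { throw err; }
      console.log(len); // 5
    });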

References

https://www.npmjs.com/package/avsc

https://github.com/mtth/avsc

https://avro.apache.org/

https://parquet.apache.org/

https://medium.com/@frankbaele/avro-containers-in-nodejs-95b72cbdde90

https://github.com/frankbaele/examples/tree/master

https://github.com/mtth/avsc/issues/81

https://docs.snowflake.com/en/user-guide/semistructured-data-formats#avro