wiki.techinc.nl/includes/debug/logger/monolog/AvroFormatter.php

Produce monolog messages through kafka+avro

This allows a logging channel to be configured to write directly to Kafka. Logs can be serialized either to JSON blobs or the more compact Apache Avro format.

The Kafka handler for Monolog needs a list of one or more Kafka servers to query cluster metadata from. This should be able to use any Monolog formatter, although some, like JsonFormatter, require you to disable formatBatch, as the Kafka protocol would prefer to encode each record independently.

This requires the nmred/kafka-php library, version >= 1.3.0.

Adds a new formatter which serializes to the Apache Avro format. This is a compact binary format which uses predefined schemas. This initial implementation is very simple and takes the plain schemas as a constructor argument.

Adds a new option to MonologSpi to wrap handlers in a BufferHandler. This doesn't flush until the request shuts down and prevents any network requests in the logger from adding latency to web requests.

Related mediawiki/vendor update: Ibfe4bd2036ae8e998e2973f07bd9a6f057691578

The necessary config is something like:

    array(
        'loggers' => array(
            'CirrusSearchRequests' => array(
                'handlers' => array( 'kafka' ),
            ),
        ),
        'handlers' => array(
            'kafka' => array(
                'factory' => '\\MediaWiki\\Logger\\Monolog\\KafkaHandler::factory',
                'args' => array( 'localhost:9092' ),
                'formatter' => 'avro',
                'buffer' => true,
            ),
        ),
        'formatters' => array(
            'avro' => array(
                'class' => '\\MediaWiki\\Logger\\Monolog\\AvroFormatter',
                'args' => array(
                    array(
                        'CirrusSearchRequests' => array(
                            'type' => 'record',
                            'name' => 'CirrusSearchRequests',
                            'fields' => array( ... )
                        ),
                    ),
                ),
            ),
        ),
    )

Bug: T106256
Change-Id: I6ee744b3e5306af0bed70811b558a543eed22840
2015-08-04 18:02:47 +00:00
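As a rough illustration of the framing that `format()` in the file below applies to each message: the serialized Avro body is prefixed with one magic byte (`0x0`) and the schema revision id packed as a 64-bit big-endian integer. The sketch below reproduces only that 9-byte prefix and shows how a consumer might split it off again. The function names `buildAvroHeader` and `splitAvroMessage` are hypothetical and not part of MediaWiki, the payload is a placeholder string rather than real Avro data, and 64-bit PHP 7 or later is assumed.

```php
<?php
// Hypothetical helpers: only the byte layout (magic byte, then a 64-bit
// big-endian revision id, then the payload) is taken from AvroFormatter.

/**
 * Build the 9-byte prefix: magic byte 0x0 followed by the revision id
 * as two 32-bit big-endian halves. Assumes 64-bit PHP.
 */
function buildAvroHeader( int $revId ): string {
	$high = ( $revId >> 32 ) & 0xffffffff;
	$low = $revId & 0xffffffff;
	return chr( 0x0 ) . pack( 'NN', $high, $low );
}

/**
 * Split a framed message into [ revision id, payload ].
 */
function splitAvroMessage( string $message ): array {
	if ( strlen( $message ) < 9 || ord( $message[0] ) !== 0x0 ) {
		throw new InvalidArgumentException( 'Not a framed Avro message' );
	}
	// 'N' reads an unsigned 32-bit big-endian integer.
	$parts = unpack( 'Nhigh/Nlow', substr( $message, 1, 8 ) );
	$revId = ( $parts['high'] << 32 ) | $parts['low'];
	return [ $revId, substr( $message, 9 ) ];
}

// Placeholder payload standing in for the Avro-encoded record context.
$framed = buildAvroHeader( 5 ) . 'payload-bytes';
list( $revId, $payload ) = splitAvroMessage( $framed );
var_dump( $revId, $payload );
```

Keeping the revision id outside the Avro body lets a consumer pick the matching schema before it attempts to decode the payload.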
<?php
/**
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 */

namespace MediaWiki\Logger\Monolog;

use AvroIODatumWriter;
use AvroIOBinaryEncoder;
use AvroIOTypeException;
use AvroSchema;
use AvroStringIO;
use AvroValidator;
use Monolog\Formatter\FormatterInterface;

/**
 * Log message formatter that uses the apache Avro format.
 *
 * @since 1.26
 * @author Erik Bernhardson <ebernhardson@wikimedia.org>
 * @copyright © 2015 Erik Bernhardson and Wikimedia Foundation.
 */
class AvroFormatter implements FormatterInterface {
	/**
	 * @var int Magic byte to encode schema revision id.
	 */
	const MAGIC = 0x0;

	/**
	 * @var array Map from schema name to schema definition
	 */
	protected $schemas;

	/**
	 * @var AvroStringIO
	 */
	protected $io;

	/**
	 * @var AvroIOBinaryEncoder
	 */
	protected $encoder;

	/**
	 * @var AvroIODatumWriter
	 */
	protected $writer;

	/**
	 * @param array $schemas Map from Monolog channel to schema configuration.
	 *  Each entry holds a 'revision' id and a 'schema'; the schema can be
	 *  either the JSON string or decoded into PHP arrays.
	 */
	public function __construct( array $schemas ) {
		$this->schemas = $schemas;
		$this->io = new AvroStringIO( '' );
		$this->encoder = new AvroIOBinaryEncoder( $this->io );
		$this->writer = new AvroIODatumWriter();
	}

	/**
	 * Formats the record context into a binary string per the
	 * schema configured for the record's channel.
	 *
	 * @param array $record
	 * @return string|null The serialized record, or null if
	 *  the record is not valid for the selected schema.
	 */
	public function format( array $record ) {
		$this->io->truncate();
		$schema = $this->getSchema( $record['channel'] );
		$revId = $this->getSchemaRevisionId( $record['channel'] );
		if ( $schema === null || $revId === null ) {
			trigger_error( "The schema for channel '{$record['channel']}' is not available" );
			return null;
		}
		try {
			$this->writer->write_data( $schema, $record['context'], $this->encoder );
		} catch ( AvroIOTypeException $e ) {
			$errors = AvroValidator::getErrors( $schema, $record['context'] );
			$json = json_encode( $errors );
			trigger_error( "Avro failed to serialize record for {$record['channel']} : {$json}" );
			return null;
		}
		return chr( self::MAGIC ) . $this->encodeLong( $revId ) . $this->io->string();
	}

	/**
	 * Format a set of records into a list of binary strings
	 * conforming to the configured schema.
	 *
	 * @param array $records
	 * @return string[]
	 */
	public function formatBatch( array $records ) {
		$result = [];
		foreach ( $records as $record ) {
			$message = $this->format( $record );
			if ( $message !== null ) {
				$result[] = $message;
			}
		}
		return $result;
	}

	/**
	 * Get the schema for the named channel
	 *
	 * @param string $channel Name of the schema to fetch
	 * @return \AvroSchema|null
	 */
	protected function getSchema( $channel ) {
		if ( !isset( $this->schemas[$channel] ) ) {
			return null;
		}
		if ( !isset( $this->schemas[$channel]['revision'], $this->schemas[$channel]['schema'] ) ) {
			return null;
		}
		if ( !$this->schemas[$channel]['schema'] instanceof AvroSchema ) {
			$schema = $this->schemas[$channel]['schema'];
			if ( is_string( $schema ) ) {
				$this->schemas[$channel]['schema'] = AvroSchema::parse( $schema );
			} else {
				$this->schemas[$channel]['schema'] = AvroSchema::real_parse(
					$schema
				);
			}
		}
		return $this->schemas[$channel]['schema'];
	}

	/**
	 * Get the schema revision id for the named channel
	 *
	 * @param string $channel Name of the schema
	 * @return int|null
	 */
	public function getSchemaRevisionId( $channel ) {
		if ( isset( $this->schemas[$channel]['revision'] ) ) {
			return (int)$this->schemas[$channel]['revision'];
		}
		return null;
	}

	/**
	 * Convert an integer to a 64-bit big-endian long (Java compatible).
	 * NOTE: only compatible with 64-bit PHP.
	 *
	 * @param int $id
	 * @return string the binary representation of $id
	 */
	private function encodeLong( $id ) {
		$high = ( $id & 0xffffffff00000000 ) >> 32;
		$low = $id & 0x00000000ffffffff;
		return pack( 'NN', $high, $low );
	}
}