For a large-scale site such as LinkedIn, tracking metrics accurately and efficiently is an important task. For example, imagine we need a dashboard that shows the number of visitors to every page on the site over the last 30 days. To keep this dashboard up to date, we can schedule a query that runs daily and gathers the stats for the last 30 days. However, this simple implementation would be w
As we can see, the DataFu version provides a noticeable improvement in both metrics. Glad to know that work wasn't all for naught.

Creating a custom purpose UDF

Many UDFs, such as those presented in the previous section, are general purpose. DataFu serves to collect these UDFs and make sure they are tested and easily available. If you are writing such a UDF, then we will happily accept contribution
This document discusses tools and techniques used at LinkedIn for building data products, including Pig, Java, R, Hive, Voldemort, Kafka, and Azkaban. It describes how LinkedIn uses Pig extensively for its conciseness and expressiveness. DataFu is introduced as an open source library of user-defined functions (UDFs) for Pig that was created to share useful UDFs developed by different teams at Link