[B! algorithm][hyperLogLog] manboubird's bookmarks

manboubird id:manboubird

manboubird's bookmarks on algorithm and hyperLogLog (15)

Count-distinct problem - Wikipedia
manboubird 2020/11/14
hyperLogLog

algorithm

countDistinctProblem

approximateQuery

dataSketch
Link
GitHub - google/zetasketch: A collection of libraries for single-pass, distributed, sublinear-space approximate aggregation and sketching algorithms. Currently: HyperLogLog++; more to come.
manboubird 2020/11/14
hyperLogLog

approximateQuery

zetasketch

sketch

google

algorithm

dataSketch
Link
Data Sketching - ACM Queue
May 31, 2017 Volume 15, issue 2 PDF Data Sketching The approximate approach is often faster and more efficient. Graham Cormode Do you ever feel overwhelmed by an unending stream of information? It can seem like a barrage of new email and text messages demands constant attention, and there are also phone calls to pick up, articles to read, and knocks on the door to answer. Putting these pieces toge
manboubird 2019/03/09
dataSketching

sql

acmqueue

BloomFilter

probabilisticDataStructure

hyperLogLog

algorithm
Link
https://dl.acm.org/citation.cfm?doid=2452376.2452456
manboubird 2015/07/15
HyperLogLog in practice: algorithmic engineering of a state of the art cardinality estimation algorithm

paper

edbt

hyperLogLog

algorithm
Link
The Latest on Randomized Data Structures: Recent Advances in MinHash and HyperLogLog
manboubird 2014/08/27
hyperLogLog

slide

algorithm
Link
https://tech.nextroll.com/media/hllminhash.pdf
manboubird 2013/09/28
HyperLogLog and MinHash A Union for Intersections, Andrew Pascoe

hyperLogLog

minhash

algorithm

paper

adRoll
Link
GitHub - MLnick/hive-udf: Approximate cardinality estimation with HyperLogLog, as a Hive function
manboubird 2013/02/07
hive

hyperLogLog

algorithm

udf
Link
Probabilistic Data Structures for Web Analytics and Data Mining
Statistical analysis and mining of huge multi-terabyte data sets is a common task nowadays, especially in the areas like web analytics and Internet advertising. Analysis of such large data sets often requires powerful distributed data stores like Hadoop and heavy data processing with techniques like MapReduce. This approach often leads to heavyweight high-latency analytical processes and poor appl
manboubird 2013/02/07
hyperLogLog

algorithm

analytics

countSketch
Link
http://research.google.com/pubs/archive/40671.pdf
manboubird 2013/02/07
paper

hyperLogLog

algorithm

google
Link
HyperLogLog++: Google’s Take On Engineering HLL
Matt Abrams recently pointed me to Google’s excellent paper “HyperLogLog in Practice: Algorithmic Engineering of a State of The Art Cardinality Estimation Algorithm” [UPDATE: changed the link to the paper version without typos] and I thought I’d share my take on it and explain a few points that I had trouble getting through the first time. The paper offers a few interesting improvements that are w
manboubird 2013/02/07
hyperLogLog

algorithm

google

paper
Link
Set Operations On HLLs of Different Sizes
Introduction Here at AK, we’re in the business of storing huge amounts of information in the form of 64 bit keys. As shown in other blog posts and in the HLL post by Matt, one efficient way of getting an estimate of the size of the set of these keys is by using the HyperLogLog (HLL) algorithm. There are two important decisions one has to make when implementing this algorithm. The first is how ma
manboubird 2013/02/07
hyperLogLog

algorithm
Link
Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure
Intro: In the Zipfian world of AK, the HyperLogLog distinct value (DV) sketch reigns supreme. This DV sketch is the workhorse behind the majority of our DV counters (and we’re not alone) and enables us to have a real-time, in-memory data store with incredibly high throughput. HLL was conceived of by Flajolet et al. in the ph
manboubird 2013/02/07
hyperLogLog

algorithm
Link
https://algo.inria.fr/flajolet/Publications/DuFl03.pdf
manboubird 2013/02/07
paper

hyperLogLog

algorithm
Link
HyperLogLog: the analysis of a near-optimal cardinality estimation algorithm (PDF format)
manboubird 2013/02/07
paper

hyperLogLog

algorithm
Link
Big Data Counting: How to count a billion distinct objects using only 1.5KB of Memory - High Scalability
The table shows that we can count the words with a 3% error rate using only 512 bytes of space. Compare that to a perfect count using a HashMap that requires nearly 10 megabytes of space and you can easily see why cardinality estimators are useful. In applications where accuracy is not paramount, which is true for most web scale and network counting scenarios, using a probabilistic count
manboubird 2012/04/28
aggregation

implementation

cardinalityEstimation

hyperLogLog

algorithm
Link