[B! algorithms][datamining] fcicq's bookmarks

fcicq id:fcicq

fcicq's bookmarks on algorithms and datamining (18)

GitHub - DwangoMediaVillage/pqkmeans: Fast and memory-efficient clustering
fcicq 2017/09/17
algorithms

c++

python

datamining
Link
GitHub - hillbig/redsvd: Automatically exported from code.google.com/p/redsvd
fcicq 2016/03/29
randomized svd, by PFI (hillbig).

datamining

algorithms
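Link
Extracting Patterns with Low Redundancy and High Importance (1) - sfchaos's blog
Pattern mining is one of the representative techniques of data mining; examples such as "beer and diapers," which apply association rules, are especially well known. Recently, data analysis tools such as R also provide libraries that run algorithms such as Apriori and Eclat (frequent pattern mining) and CSPADE (sequential pattern mining), so the barrier to running pattern mining has become relatively low. Pattern mining generally extracts an enormous number of patterns. This stems from the huge number of combinations and permutations of items, and it is not at all unusual for a large number of patterns to be extracted from a small number of transactions*1. Against this background, extracting the important patterns from those that were mined is arguably one of the major technical challenges. The extracted patterns are enormous in number ... what was explained above ...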
fcicq 2014/03/25
Extracting redundancy-aware top-k patterns. https://github.com/sfchaos/RedTopK

algorithms

datamining

tools
Link
Streaming Data Mining Tutorial slides (and more)
Jelani Nelson and Edo Liberty just released an important tutorial they gave at KDD 12 on the state of the art and practical algorithms used in mining streaming data; it is entitled Streaming Data Mining. I personally marvel at the development of these deep algorithms which, because of the constraints of large data streams, get to redefine what it means to do seemingly simple functions such as count
fcicq 2012/10/07
should we also check compressive sensing?

algorithms

analysis

***

resources

presentation

datamining
Link
The MMDS 2012 Slides are out! Workshop on Algorithms for Modern Massive Data Sets
In case you have to take your mind off tomorrow's suspense-filled and technologically challenging landing of Curiosity on Mars (see 7 minutes of Terror, a blockbuster taking place on Mars this Summer), Michael Mahoney, Alex Shkolnik, Gunnar Carlsson, Petros Drineas, the organizers of Workshop on Algorithms for Modern Massive Data Sets (MMDS 2012), just made available the slides of the meeting. Oth
fcicq 2012/08/06
resources

datamining

algorithms

presentation
Link
Home - Metamarkets
Radical Transparency for Your Programmatic Data. Our interactive analytics tools put the full power of data navigation and visualization into the hands of marketers. Let us Tell You Why, not What. Traditional analytics tools tell you what is happening in your marketplace with predetermined data sets. Metamarkets is the only interactive analytics platform that gives you real-time
fcicq 2012/08/06
they also have a nice blog @ http://metamarkets.com/blog/ , similar to the paper "Processing a Trillion Cells per Mouse Click" by Google

algorithms

database

datamining

***

commercial
Link
A C++ Frequent Itemset Mining Template Library
Frequent Itemset Mining (FIM) is the most researched field of frequent pattern mining. Over one hundred FIM algorithms were proposed, the majority claiming to be the most efficient. To clarify this chaos and the contradictions, two FIMI competitions were organized. They not only resulted in some excellent implementations, but the community also gained a better understanding (limits, drawbacks, advantages
fcicq 2012/05/11
fp-growth

datamining

algorithms

library
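Link
Streaming k-means approximation - tsubosaka's diary
An introduction to a paper I read on the train while heading back to my parents' home. Overview: k-means is a very basic clustering technique. However, k-means has to repeat the labeling of all the data multiple times and has to store every data point, so the paper proposes a method that finds the k cluster centers, the output of k-means, in a single pass. The cluster centers obtained this way are an O(log k) approximation compared with the optimum. On streaming algorithms: "streaming" in this paper means an algorithm that reads the input from the front and does not store all of it. Unlike a so-called online algorithm, it does not produce some result every time an input arrives. It also assumes that the length of the stream is finite. What is k-means? k-means, for data points X = {x_1 , ... x_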
fcicq 2012/03/21
k-means# from NIPS2009, Streaming k-means approximation

algorithms

datamining
Link
What are some good resources for learning about stream mining? Why?
fcicq 2012/01/18
algorithms

datamining

resources

toread
Link
Kaggle: Your Machine Learning and Data Science Community
Kaggle is the world’s largest data science community with powerful tools and resources to help you achieve your data science goals.
fcicq 2011/12/20
algorithms

datamining

toread
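Link
A Brief Explanation of Approximate Nearest Neighbor Search Using Product Quantization
"aka motsu-nabe" by chatani. Overview: The winter cold has grown even harsher. It is the season when one longs for oden and hot pot. I have finally finished a big piece of work and can now write long-winded articles again. So this time I would like to briefly introduce "Product quantization for nearest neighbor search," a paper on approximate nearest neighbor search published in TPAMI in 2011. This paper is the decisive nearest neighbor search algorithm. Despite its simple theory, it performs well enough to beat existing methods, is very fast, and is memory-efficient, so it can even be put on a smartphone and run in real time. This method was apparently presented earlier at the CV study group @ Kanto, but pages that introduce it concretely (naturally, since it is so recent) currently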
fcicq 2011/11/28
the paper needs review.

datamining

algorithms

toread
Link
Efficient Similarity Query Processing (Previously Efficient Exact Similarity Join)
Efficient Similarity Query Processing (Previously Efficient Exact Similarity Join). Given a similarity function and two sets of objects, a similarity join returns all pairs of objects (one from each set) such that their similarity value satisfies a given criterion. A typical example is to find pairs of documents such that their cosine similarity is above a constant threshold (say, 0.95),
fcicq 2011/11/19
algorithms

datamining

toread
Link
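The similarity join defined in the entry above can be illustrated with a brute-force sketch. This is not the method from the linked page, which concerns efficient algorithms; it only shows the definition, comparing every pair and keeping those whose cosine similarity reaches the threshold. The function names `cosine_similarity` and `similarity_join` are hypothetical, and inclusive comparison (`>=`) is an assumption.

```python
import math
from itertools import product


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_a * norm_b)


def similarity_join(left, right, threshold=0.95):
    """Return index pairs (i, j) with cosine(left[i], right[j]) >= threshold.

    Brute force: O(len(left) * len(right)) comparisons.
    """
    return [
        (i, j)
        for (i, a), (j, b) in product(enumerate(left), enumerate(right))
        if cosine_similarity(a, b) >= threshold
    ]


if __name__ == "__main__":
    left = [[1, 0], [1, 1]]
    right = [[2, 0], [0, 3], [1, 1.1]]
    print(similarity_join(left, right))
```

The quadratic cost of this loop is what makes the pruning techniques studied under this topic worthwhile on large collections.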
Locality-sensitive hashing - Wikipedia
In computer science, locality-sensitive hashing (LSH) is a fuzzy hashing technique that hashes similar input items into the same "buckets" with high probability.[1] (The number of buckets is much smaller than the universe of possible input items.)[1] Since similar items end up in the same buckets, this technique can be used for data clustering and nearest neighbor search. It differs from conventio
fcicq 2011/10/30
algorithms

datamining

hash
Link
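To make the bucket idea in the entry above concrete, here is a minimal sketch of one LSH family, random hyperplanes for cosine similarity. The excerpt does not name a particular scheme, so this choice is an assumption, and the names `make_hyperplanes`, `signature`, and `build_buckets` are hypothetical. Each bit of a signature records which side of a random hyperplane a vector falls on, so vectors separated by a small angle tend to share a signature.

```python
import random
from collections import defaultdict


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors of dimension dim."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, hyperplanes):
    """One bit per hyperplane: 1 if vec lies on its non-negative side."""
    return tuple(
        1 if sum(h * x for h, x in zip(plane, vec)) >= 0 else 0
        for plane in hyperplanes
    )


def build_buckets(vectors, hyperplanes):
    """Group item names by signature; each distinct signature is a bucket."""
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        buckets[signature(vec, hyperplanes)].append(name)
    return dict(buckets)


if __name__ == "__main__":
    planes = make_hyperplanes(dim=3, n_bits=8, seed=1)
    items = {
        "a": [1.0, 2.0, 3.0],
        "b": [2.0, 4.0, 6.0],     # same direction as "a"
        "c": [-1.0, -2.0, -3.0],  # opposite direction
    }
    for sig, names in build_buckets(items, planes).items():
        print(sig, names)
```

A nearest neighbor query would then hash the query vector and compare it only against the items in its bucket. Using more bits yields smaller, purer buckets at the price of more missed neighbors.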
Simple Simhashing - Ryan Moulton's Articles
fcicq 2011/09/01
algorithms

datamining
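Link
"Extracting Similar Data with MPJoin (Algorithm Series 1)"
This is Hattori. At the start of an article I wrote earlier, I casually put in the line "Next time I want to write a series of entries!", and now I really have to do it. One characteristic of our engineering organization is that when you raise your hand or speak up, the answer comes back as "OK then, go do it," and this time was no exception... Sob... So, what should I write about... In my case I can probably only talk about algorithms, so each time I will introduce, bit by bit, minor topics that appeal to only a very small number of people. For this first installment, I would like to introduce "MPJoin," an algorithm related to SimilarityJoin. ■ What is Similarity Join? First, the definition of SimilarityJoin [1]: roughly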
fcicq 2011/02/16
they opensourced their implementation

algorithms

datamining

library
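Link
Counting Words in Large-Scale Data - ny23's diary
Notes on methods for counting the frequency of items (n-grams, etc.) from large-scale data in one pass. For the past several years, methods for using the statistics of ultra-large-scale n-grams in a space- and time-efficient way have been proposed almost every year. A recent example is Storing the Web in Memory: Space Efficient Language Models with Constant Time Retrieval (EMNLP 2010). This paper carefully assembles fine-grained techniques such as minimal perfect hash functions and compression of frequency representations that takes the power law into account; with the refinements becoming this detailed, my impression is that research on compressing n-gram frequency information, which started around the log-frequency Bloom filter (ACL 2007), is finally converging (just before reading the paper, Section 7 of this paper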
fcicq 2011/01/27
log

algorithms

datamining

nlp

toread
Link
Mining of Massive Datasets
The book has a new Web site www.mmds.org. This page will no longer be maintained. Your browser should be automatically redirected to the new site in 10 seconds. The book has now been published by Cambridge University Press. The publisher is offering a 20% discount to anyone who buys the hardcopy Here. By agreement with the publisher, you can still download it free from this page. Cambridge Press d
fcicq 2011/01/03
datamining

book

algorithms

***

toread
Link
Stanford CS246: Mining Massive Data Sets
Course information: This course is the first part in a two part sequence CS246/CS341 replacing CS345A. CS246 will discuss methods and algorithms for mining massive data sets, while CS341 (Advanced Topics in Data Mining) will be a project-focused advanced class. Instructor: Jure Leskovec Office Hours: TBD, Gates 418 Room: Mon,Wed 9:30-10:45 in 420-041 (Jordan Hall, room 041) Teaching assistants: Ad
fcicq 2010/12/25
algorithms

datamining

hadoop
Link