Reading Notes on "Data Algorithms", Part 8: Common Friends

Latest recommended article published on 2021-02-23 18:55:15

License: CC 4.0 BY-SA

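This post describes a MapReduce algorithm for computing the common friends of users in a social network. The map phase turns each user and their friend list into key-value pairs keyed by user pair, and the reduce phase intersects the lists to produce the common-friend list for each pair.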

1. Requirement

Find the common friends for every pair of users.

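2. Approach

The task is to find the common friends between users in a social network. Two possible solutions:

01. Use a caching strategy and keep the common friends in a cache
02. Use MapReduce to compute everyone's common friends once a day and store the results

Let U be the set of all users: {U1, U2, ..., Un}. The goal is to find the common friends for every pair (Ui, Uj) with i != j.
The MapReduce solution comes in the following variants:

01. MapReduce/Hadoop with basic data types
02. MapReduce/Hadoop with custom data types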

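How would you compute the common friends without MapReduce?

A simple Java demo program is enough to show the idea.
One point I do not fully understand is the claim that the simple algorithm improves performance by iterating over the smaller set.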
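
One plausible reading of the small-set remark is sketched below: when intersecting two friend sets held in hash sets, looping over the smaller set and probing the larger one needs only min(|A|, |B|) lookups. The class and method names (`CommonFriendsLocal`, `commonFriends`) are made up for this illustration and are not taken from the book.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class CommonFriendsLocal {
    /** Returns the IDs present in both friend sets, in sorted order. */
    static Set<String> commonFriends(Set<String> a, Set<String> b) {
        // Loop over the smaller set and probe the larger one:
        // the number of lookups is min(|a|, |b|).
        Set<String> small = a.size() <= b.size() ? a : b;
        Set<String> large = (small == a) ? b : a;
        Set<String> common = new TreeSet<String>();
        for (String id : small) {
            if (large.contains(id)) {
                common.add(id);
            }
        }
        return common;
    }

    public static void main(String[] args) {
        Set<String> friendsOf200 = new HashSet<String>(Arrays.asList("100", "300", "400"));
        Set<String> friendsOf400 = new HashSet<String>(Arrays.asList("100", "200", "300"));
        System.out.println(commonFriends(friendsOf200, friendsOf400)); // prints [100, 300]
    }
}
```

With a user who has five friends and another who has five thousand, this loop runs five times instead of five thousand, which is probably the saving the remark refers to.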

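2.2 Solving it with MapReduce

The main steps of the MapReduce solution to the common-friends problem:

step 01. The map input is one user's friend list.
step 02. The map output key is a pair of users who may have common friends, for example (u1,u2), so each key-value pair is known to feed the common-friends computation for (u1,u2).
map function. Input: (key1, list<F>). Output: (<key1,Fi>, list<F>). Here key1 is the user, each Fi is one of key1's friends, and list<F> is key1's friend list (in the examples below, with Fi itself left out).
step 03. The reducer iterates over the candidate lists it receives for a pair. This iteration is cheap because it only touches the friend lists of those two users, so the data volume is small. The resulting common-friend list is written as the reduce output value.
reduce function
Input: (<key1,Fi>, list<F>), with one candidate list from each of the two users
Output: (<key,f>, list<f>)
In the output, <key,f> is the user pair and list<f> is the list of their common friends.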

The steps are worked through below for a concrete input.
Suppose the input file is:

200, 100 300 400 
400, 100 200 300

To find the common friends of the pair <200,400>, the analysis proceeds record by record:
200, 100 300 400
=>

<200,100> [300 400]
<200,300> [100 400]
<200,400> [100 300]

400, 100 200 300
=>

<400, 100> [200 300]
<400, 200> [100 300]
<400, 300> [100 200]

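<200,100> [300 400] means: the candidate common friends of user_id = 200 and user_id = 100 are [300, 400].
So how are the actual common friends obtained?
It is enough to compare the friend lists in <200,400> [100 300] and <400, 200> [100 300]. But MapReduce groups values by key, and <200,400> and <400,200> are different keys, so the two lists would never reach the same reducer. The fix is simple: write <400,200> as <200,400>, that is, order the two IDs in the Mapper before building the key.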
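
The walk-through above can be replayed in memory without Hadoop. The following is a minimal sketch under that assumption: a `TreeMap` stands in for the shuffle phase, and the names `PairKeySketch`, `pairKey`, `map` and `reduce` are hypothetical helpers for this illustration, not the classes from section 4.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class PairKeySketch {
    /** Builds the pair key with the smaller ID first, so <400,200> becomes <200,400>. */
    static String pairKey(String u1, String u2) {
        return u1.compareTo(u2) < 0 ? "<" + u1 + "," + u2 + ">" : "<" + u2 + "," + u1 + ">";
    }

    /** "Map" step: for each friend, emit (pair key, the user's other friends). */
    static void map(String record, Map<String, List<List<String>>> shuffle) {
        String[] t = record.trim().split("[,\\s]+");
        for (int i = 1; i < t.length; i++) {
            List<String> others = new ArrayList<String>();
            for (int j = 1; j < t.length; j++) {
                if (j != i) others.add(t[j]);
            }
            String key = pairKey(t[0], t[i]);
            if (!shuffle.containsKey(key)) shuffle.put(key, new ArrayList<List<String>>());
            shuffle.get(key).add(others);
        }
    }

    /** "Reduce" step: intersect the candidate lists collected for one pair. */
    static TreeSet<String> reduce(List<List<String>> lists) {
        // A pair seen from only one side has candidates but no confirmed common friends.
        if (lists.size() < 2) return new TreeSet<String>();
        TreeSet<String> common = new TreeSet<String>(lists.get(0));
        for (List<String> l : lists.subList(1, lists.size())) common.retainAll(l);
        return common;
    }

    public static void main(String[] args) {
        Map<String, List<List<String>>> shuffle = new TreeMap<String, List<List<String>>>();
        for (String r : Arrays.asList("200, 100 300 400", "400, 100 200 300")) map(r, shuffle);
        System.out.println(reduce(shuffle.get("<200,400>"))); // prints [100, 300]
    }
}
```

Because both records emit the same normalized key `<200,400>`, their two candidate lists end up in the same group, which is exactly what the sorting step is for.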

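2.3 Points to watch:

Whether the friend relationship is bidirectional.
The common-friends feature itself (how many friends do you and Mary have in common?)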
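
The first point matters because a user pair only gets two candidate lists when each user lists the other. As a rough sketch, an input file could be checked for one-way links before the job runs. The class `FriendshipSymmetryCheck` and its methods are hypothetical names for this illustration, and the record format is assumed to be `user, friend1 friend2 ...`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class FriendshipSymmetryCheck {
    /** Parses "user, f1 f2 ..." records into a user -> friend-set map. */
    static Map<String, Set<String>> parse(List<String> records) {
        Map<String, Set<String>> graph = new TreeMap<String, Set<String>>();
        for (String record : records) {
            String[] t = record.trim().split("[,\\s]+");
            graph.put(t[0], new TreeSet<String>(Arrays.asList(t).subList(1, t.length)));
        }
        return graph;
    }

    /** Returns "a->b" for every link where a lists b but b does not list a. */
    static List<String> oneWayLinks(Map<String, Set<String>> graph) {
        List<String> oneWay = new ArrayList<String>();
        for (Map.Entry<String, Set<String>> e : graph.entrySet()) {
            for (String friend : e.getValue()) {
                Set<String> back = graph.get(friend);
                if (back == null || !back.contains(e.getKey())) {
                    oneWay.add(e.getKey() + "->" + friend);
                }
            }
        }
        return oneWay;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("200, 400", "400, 200 300");
        System.out.println(oneWayLinks(parse(records))); // prints [400->300]
    }
}
```

An empty result means every friendship is listed from both sides, which is the situation the reducer in this post relies on.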

3. Sample test

3.1 Test data 1

<person>,<friend1>,<friend2>,<friend3>...
In general every user is identified by an id, so the input file typically looks like this:

100, 200 300 400 500 600 
200, 100 300 400 
300, 100 200 400 500
400, 100 200 300
500, 100 300
600,100

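Let {A1, A2, A3, ..., Am} be the friend set of u1 and {B1, B2, B3, ..., Bn} the friend set of u2. The common friends of u1 and u2 can then be defined as the intersection of the two sets (the elements they share).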

4. Code

4.1 `PubFriendJobDriver`

package data_algorithm.chapter_8;

import data_algorithm.utils.HdfsUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class PubFriendJobDriver extends Configured implements Tool  {
    @Override
    public int run(String[] args) throws Exception {
        HdfsUtils.deletePath(args[1]);
        // Reuse the configuration that ToolRunner filled in from generic options (-D, -conf, ...).
        Configuration conf = getConf();
        Job job = Job.getInstance(conf);
        Path inpath = new Path(args[0]);
        Path outpath = new Path(args[1]);

        job.setJarByClass(PubFriendJobDriver.class);
        job.setMapperClass(PubFriendMapper.class);
        job.setReducerClass(PublicFriendReducer.class);

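        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(Text.class);
        // Declare the reducer's output types explicitly as well.
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);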

        FileInputFormat.setInputPaths(job, inpath);
        FileOutputFormat.setOutputPath(job,outpath);

        int status = job.waitForCompletion(true)?0:1;
        return status;
    }

    public static void main(String[] args) throws Exception {
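        if (args.length != 2) {
            System.err.println("Usage: PubFriendJobDriver <input path> <output path>");
            System.exit(1);
        }
        // Propagate the job status as the process exit code.
        System.exit(ToolRunner.run(new PubFriendJobDriver(), args));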
    }
}

4.2 `PubFriendMapper`

package data_algorithm.chapter_8;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class PubFriendMapper extends Mapper<LongWritable,Text,Text,Text> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        // Split on commas and/or whitespace so both "200, 100 300 400" and "600,100" parse.
        String[] line = value.toString().trim().split("[,\\s]+");
        String curUser = line[0];
        String combineKey ;
        for(int i = 1;i < line.length;i++) {
            String friendList = "";
            if (sortUser(curUser, line[i]) < 0) {
                combineKey = "<" + curUser + "," + line[i] + ">";
            } else {
                combineKey = "<" + line[i] + "," + curUser + ">";
            }

            for(int j = 1;j< line.length ;j++) {
                if (j != i) {
                    friendList += line[j] ;
                    friendList += " ";
                }
            }

            // Emit (normalized pair key, the current user's other friends).
            context.write(new Text(combineKey), new Text(friendList.trim()));
        }
    }

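    // Lexicographic comparison of the two IDs; any consistent ordering is enough to normalize the pair key.
    public int sortUser(String user_id,String friend) {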
        return user_id.compareTo(friend);
    }

    public static void main(String[] args) {
        String str = "100,200,300,400,500,600";
        String[] line = str.split("[,]");
        String curUser = line[0];
        String combineKey ;
        for(int i = 1;i < line.length;i++) {
            String friendList = "";
            if (curUser.compareTo(line[i]) < 0) {
                combineKey = "<" + curUser + "," + line[i] + ">";
            } else {
                combineKey = "<" + line[i] + "," + curUser + ">";
            }

            for(int j = 1;j< line.length ;j++) {
                if (j != i) {
                    friendList += line[j] ;
                    friendList += " ";
                }
            }
            System.out.println("combineKey = "+combineKey);
            System.out.println("friendList = "+friendList);
        }
    }
}

4.3 `PublicFriendReducer`

package data_algorithm.chapter_8;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

public class PublicFriendReducer extends Reducer<Text,Text,Text,Text> {
    @Override
    protected void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
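        // IDs already seen in an earlier candidate list for this pair.
        Set<String> seen = new HashSet<String>();
        StringBuilder sb = new StringBuilder();

        // A user pair normally receives two candidate lists, one from each user's record.
        // An ID that shows up a second time is therefore a friend of both users.
        for (Text fr : values) {
            for (String frId : fr.toString().trim().split("\\s+")) {
                if (frId.isEmpty()) {
                    continue; // an empty candidate list yields a single empty token
                }
                if (!seen.add(frId)) {
                    sb.append(frId).append(",");
                }
            }
        }
        context.write(key, new Text(sb.toString()));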
    }
}

4.4 Output

<100,200>	300,400,
<100,300>	200,400,500,
<100,400>	200,300,
<100,500>	300,
<100,600>	
<200,300>	100,400,
<200,400>	100,300,
<300,400>	100,200,
<300,500>	100,
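
To answer the "how many friends do we have in common" question from these lines, a consumer has to cope with the trailing comma and with empty lists such as `<100,600>`. Below is a small sketch, assuming the key and value are separated by a tab (the default for Hadoop's text output); `ResultLineParser` is a hypothetical helper, not part of the job above.

```java
import java.util.ArrayList;
import java.util.List;

class ResultLineParser {
    /** Splits one output line "<u1,u2>\tf1,f2," into its common-friend IDs. */
    static List<String> commonFriends(String line) {
        List<String> ids = new ArrayList<String>();
        int tab = line.indexOf('\t');
        if (tab < 0) {
            return ids; // no value part on this line
        }
        for (String id : line.substring(tab + 1).split(",")) {
            if (!id.trim().isEmpty()) {
                ids.add(id.trim());
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        System.out.println(commonFriends("<100,300>\t200,400,500,").size()); // prints 3
        System.out.println(commonFriends("<100,600>\t").size());             // prints 0
    }
}
```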