Contents
- Create a new project
- Select Maven
- Click Next directly
- The default GroupId is fine (modify it as needed)
- Clicking Finish produces the error: Cannot resolve plugin org.apache.maven.plugins:maven-clean-plugin:2.5
- How to fix the error
- Create the scala directories
- Configure pom.xml
- Reload the project
- Configure the Scala environment
- Create a data directory and a word.txt file inside it
- Create WordCount.scala under src/main/scala
Create a new project
Select Maven
Click Next directly
The default GroupId is fine (modify it as needed)
Clicking Finish produces the error: Cannot resolve plugin org.apache.maven.plugins:maven-clean-plugin:2.5
This error occurs because the local Maven settings file and the repository path configured in IDEA do not match.
How to fix the error
- Step 1: File -> Settings
- Step 2: Click Build, Execution, Deployment
- Step 3: Select Build Tools -> Maven and notice that the local Maven settings file and the repository path do not match
Go to C:\Users\32429\.m2 and note that it contains only the repository folder and no settings.xml, as shown below:
First, copy the repository folder from C:\Users\32429\.m2 into E:\maven\apache-maven-3.6.3-bin\apache-maven-3.6.3, as shown below:
Then locate the conf directory of the locally downloaded Maven installation.
Find settings.xml in the conf directory and copy it up one level,
i.e. copy settings.xml from E:\maven\apache-maven-3.6.3-bin\apache-maven-3.6.3\conf into E:\maven\apache-maven-3.6.3-bin\apache-maven-3.6.3.
Finally, update the local Maven settings file and repository paths in IDEA to point at these locations.
After clicking OK, the problem is solved.
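If Maven still resolves dependencies into the wrong location, the copied settings.xml can also point at the repository explicitly. A minimal sketch, assuming the repository path used in the steps above:

<!-- settings.xml: explicitly set the local repository copied above -->
<settings xmlns="http://maven.apache.org/SETTINGS/1.0.0">
    <localRepository>E:\maven\apache-maven-3.6.3-bin\apache-maven-3.6.3\repository</localRepository>
</settings>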
Create the scala directories
Create a scala directory under src/main and mark it as Sources Root; then create another scala directory under src/test and mark it as Test Sources Root.
Configure pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>Testspark</artifactId>
    <version>1.0-SNAPSHOT</version>

    <properties>
        <maven.compiler.source>15</maven.compiler.source>
        <maven.compiler.target>15</maven.compiler.target>
        <scala.version>2.12.10</scala.version>
        <hadoop.version>3.2.0</hadoop.version>
        <spark.version>3.1.1</spark.version>
        <hanlp.version>portable-1.8.0</hanlp.version>
        <scopt.version>3.3.0</scopt.version>
        <slf4j-api.version>1.7.21</slf4j-api.version>
        <slf4j-log4j12.version>1.7.21</slf4j-log4j12.version>
        <log4j.version>1.2.17</log4j.version>
        <junit.version>4.12</junit.version>
    </properties>

    <dependencies>
        <!-- Scala environment; can be omitted once the Spark dependencies are present -->
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-library</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-compiler</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <dependency>
            <groupId>org.scala-lang</groupId>
            <artifactId>scala-reflect</artifactId>
            <version>${scala.version}</version>
        </dependency>
        <!-- Spark environment -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-streaming -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_2.12</artifactId>
            <version>${spark.version}</version>
            <scope>provided</scope>
        </dependency>
        <!-- Hadoop -->
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>${hadoop.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-sql -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib -->
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_2.12</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <!-- HanLP Chinese NLP package -->
        <dependency>
            <groupId>com.hankcs</groupId>
            <artifactId>hanlp</artifactId>
            <version>${hanlp.version}</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>${junit.version}</version>
            <scope>test</scope>
        </dependency>
        <dependency>
            <groupId>org.slf4j</groupId>
            <artifactId>slf4j-api</artifactId>
            <version>${slf4j-api.version}</version>
        </dependency>
        <!-- Logging framework -->
        <dependency>
            <groupId>log4j</groupId>
            <artifactId>log4j</artifactId>
            <version>${log4j.version}</version>
        </dependency>
        <!-- https://mvnrepository.com/artifact/org.ansj/ansj_seg -->
        <dependency>
            <groupId>org.ansj</groupId>
            <artifactId>ansj_seg</artifactId>
            <version>5.1.6</version>
        </dependency>
    </dependencies>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>
        <plugins>
            <plugin>
                <groupId>net.alchim31.maven</groupId>
                <artifactId>scala-maven-plugin</artifactId>
                <version>4.5.1</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>compile</goal>
                            <goal>testCompile</goal>
                        </goals>
                        <configuration>
                            <args>
                                <arg>-dependencyfile</arg>
                                <arg>${project.build.directory}/.scala_dependencies</arg>
                            </args>
                        </configuration>
                    </execution>
                </executions>
            </plugin>
        </plugins>
    </build>
</project>
Make sure the directory structure is consistent with this configuration; the expected layout is sketched below.
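For reference, this is the layout the pom.xml above assumes (the project name Testspark comes from the artifactId; word.txt and WordCount.scala are created in the later steps):

Testspark
├── data
│   └── word.txt
├── pom.xml
└── src
    ├── main
    │   └── scala
    │       └── WordCount.scala
    └── test
        └── scala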
Reload the project
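Besides reloading in IDEA, the setup can also be checked from a terminal (assuming the Maven installation under E:\maven is on the PATH):

mvn clean compile

If the settings.xml and repository paths are correct, the build resolves the remaining plugins and the maven-clean-plugin error no longer appears.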
Configure the Scala environment
Pay close attention here: the Scala SDK version chosen (the Ivy download version) must correspond to the Scala version of spark-core; spark-core_2.12 requires a 2.12.x SDK.
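To double-check the match, a tiny throwaway object (hypothetical, not part of the original project) can print the Scala version the code actually compiles and runs against:

// VersionCheck.scala: hypothetical helper; with spark-core_2.12 this should print a 2.12.x version
object VersionCheck {
  def main(args: Array[String]): Unit = {
    println(scala.util.Properties.versionString) // e.g. "version 2.12.10"
  }
}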
Create a data directory and a word.txt file inside it
Create WordCount.scala under src/main/scala
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // 1. Create the SparkConf object and set the appName and the master address.
    //    local[2] means the program runs locally with 2 simulated threads.
    val conf = new SparkConf().setMaster("local[2]").setAppName("wordcount")
    // 2. Create the SparkContext object, the source of all task computation;
    //    it creates the DAGScheduler and the TaskScheduler.
    //    The sc object is the execution entry point of every Spark program.
    val sc = new SparkContext(conf)
    // 3. Read the data file. An RDD can be loosely understood as a collection
    //    whose elements here are Strings (one per line).
    val inputRdd = sc.textFile("./data/word.txt")
    // 4. Split each line to get all the words.
    val splitRdd = inputRdd.flatMap(_.split(" "))
    // 5. Map each word to the pair (word, 1).
    val pairRDD = splitRdd.map(x => (x, 1))
    // 6. Sum the counts of identical words; the first underscore is the
    //    accumulated value, the second is the next value.
    val resultRdd = pairRDD.reduceByKey(_ + _)
    // 7. Collect and print the result data.
    resultRdd.collect().foreach(println)
    // 8. Shut down the SparkContext object.
    sc.stop()
  }
}
The result is as follows:
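For example, assuming word.txt contains the hypothetical lines

hello spark
hello scala

the job prints one (word, count) pair per distinct word (the order may vary):

(hello,2)
(spark,1)
(scala,1)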