博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
SQL Server报表生成器中的R脚本词云
阅读量:2510 次
发布时间:2019-05-11

本文共 10500 字,大约阅读时间需要 35 分钟。

什么是R (What is R)

R is a very popular data programing language. R is especially used in data analysis, statistics calculations, predictions, data mining and machine learning. R is used by data scientist, economist, genetic scientists, and statisticians. R has very wide usage in real life. Healthcare, finance, marketing, and manufacturing are some of them.

R是一种非常流行的数据编程语言。 R特别用于数据分析,统计计算,预测,数据挖掘和机器学习。 R由数据科学家,经济学家,遗传科学家和统计学家使用。 R在现实生活中用途非常广泛。 医疗保健,金融,市场营销和制造业就是其中一些。

In short, R is/has:

简而言之,R是/具有:

  • open source

    开源的
  • flexible for statistics calculation

    灵活的统计计算
  • huge chart ability

    巨大的图表能力
  • suitable for big data analyses

    适用于大数据分析
  • a large community

    大型社区
  • the ability to integrate with many programs (Microsoft SQL Server , SAP HANA Oracle vs.)

    与许多程序集成的能力(Microsoft SQL Server,SAP HANA Oracle与)

The capabilities of the R program improve with additional packages. You can easily find or develop additional libraries for R.

R程序的功能通过附加软件包来改进。 您可以轻松找到或开发R的其他库。

什么是“词云” (What is a “Word cloud”)

A “Word cloud” is a technique that allows us to focus on the most repetitive words in a text, article or any word sequence. Using this technique, we can analyze keywords in a word sequence.

“词云”是一种技术,使我们可以专注于文本,文章或任何词序中最重复的词。 使用这种技术,我们可以分析单词序列中的关键字。

To give a simple example, if we analyze product comments of an online shopping website, we can find most common words. These words can tell us what our customers think about this product. Now we will make our article word cloud. As shown in below, our article keywords are “SQL” and “server” if we combine the two words, our article is about probably about “SQL Server”.

举一个简单的例子,如果我们分析一个在线购物网站的产品评论,我们可以找到最常用的词。 这些话可以告诉我们客户对这款产品的看法。 现在,我们将使文章词云化。 如下所示,如果我们将这两个词结合使用,则我们的文章关键字为“ SQL”和“ server”,我们的文章可能涉及“ SQL Server”。

Microsoft BI和SQL Server R服务 (Microsoft BI and SQL Server R Services)

Microsoft is one of the key players in Business Intelligence market. Gartner positions Microsoft as a leader in Business Intelligence and analytics . Microsoft Business Intelligence platform includes a lot of capable tools (Power BI, Power View, Reporting Service, Microsoft SQL Server Mobile Report, Analysis Services and Integration Services).

微软是商业智能市场的主要参与者之一。 Gartner将Microsoft定位为商业智能和分析的领导者。 Microsoft商业智能平台包括许多功能强大的工具(Power BI,Power View,Reporting Service,Microsoft SQL Server移动报表,Analysis Services和Integration Services)。

Specifically, Microsoft has developed some very strong features for SQL Server. The first come to mind include columnstore index, R script integration, Python integration and graph database. Azure Machine Learning is an impressive tool for predictive analytics. PolyBase allows us to process data outside of the database via T-SQL. With PolyBase we can access Hadoop data. When we look at all these features, the Microsoft BI stack without a doubt, very impressive.

具体地说,Microsoft为SQL Server开发了一些非常强大的功能。 首先想到的是列存储索引,R脚本集成,Python集成和图形数据库。 Azure机器学习是用于预测分析的出色工具。 PolyBase允许我们通过T-SQL处理数据库外部的数据。 使用PolyBase,我们可以访问Hadoop数据。 当我们查看所有这些功能时,毫无疑问,Microsoft BI堆栈非常令人印象深刻。

In SQL Server 2016, Microsoft added support for the R language. This capability opens the door to the magical world or R script. With this feature, SQL Server users can now easily calculate using advanced statistical data and predictions.

在SQL Server 2016中,Microsoft添加了对R语言的支持。 此功能为神奇的世界或R脚本打开了大门。 借助此功能,SQL Server用户现在可以使用高级统计数据和预测轻松进行计算。

Let’s give a real-life example now. You are working with a dataset and you are looking for the relationship between two numerical columns. One of the easiest ways to find this relationship is a linear regression (linear regression is a statistical method that summarizes the relationship between numerical ). Now let’s see how we can find this statistical relationship. We are passing dataset and columns to the “sp_execute_external_script” procedure. In R script we are using the linearMod function and then we are getting summary of linear. In the example below we are searching relations between UnitPrice and order quantity.

现在让我们举一个真实的例子。 您正在使用数据集,并且正在寻找两个数值列之间的关系。 查找此关系的最简单方法之一是线性回归(线性回归是一种统计方法,用于总结数值之间的关系)。 现在让我们看看如何找到这种统计关系。 我们将数据集和列传递给“ sp_execute_external_script”过程。 在R脚本中,我们使用linearModule函数,然后获得linear的摘要。 在下面的示例中,我们正在搜索单价和订单数量之间的关系。

Now, in this article we will create word cloud in SQL Server Report Builder. If you want you can deploy this report to SQL Server Reporting Service.

现在,在本文中,我们将在SQL Server报表生成器中创建词云。 如果需要,可以将该报表部署到SQL Server报表服务。

要求 (Requirements)

入门 (Getting started )

We will perform these steps to show our word cloud on SQL Server Report Builder;

我们将执行以下步骤,以在SQL Server Report Builder上显示词云;

  • Enable external scripts option

    启用外部脚本选项
  • Test R script SQL Server Services

    测试R脚本SQL Server服务
  • Install R script packages

    安装R脚本包
  • Create word cloud script on R with SQL Server Management Studio

    使用SQL Server Management Studio在R上创建词云脚本
  • Connect R word cloud script to SQL Server Report Builder

    将R字云脚本连接到SQL Server报表生成器

Enable external scripts option

启用外部脚本选项

In SSMS you will need to check configuration for “external scripts enables”. If this option is “0” we have to change it to 1.

在SSMS中,您需要检查“外部脚本启用”的配置。 如果此选项为“ 0”,我们必须将其更改为1。

 EXEC SP_CONFIGURE 'external scripts enabled';   

Now we will enable R script run value.

现在,我们将启用R脚本运行值。

 EXEC SP_CONFIGURE 'external scripts enabled',1GORECONFIGUREGOEXEC SP_CONFIGURE 'external scripts enabled' 

We enabled external scripts. SQL Server 2016 only supports R script.

我们启用了外部脚本。 SQL Server 2016仅支持R脚本。

SQL Server 2017 supports R script and Python. We will test R script execution on SQL Server Management Studio.

SQL Server 2017支持R脚本和Python。 我们将在SQL Server Management Studio上测试R脚本的执行情况。

 EXEC sp_execute_external_script    @language = N'R'    ,@script = N'print("Hello SQLShack")	            print("R Script is working")' 

Now we will install word cloud packages on the SQL Server R Service;

现在,我们将在SQL Server R Service上安装词云包;

  • You can find it in this folder
    您可以在此文件夹中找到它
    “C:\Program Files\Microsoft SQL Server\MSSQL13.MSSQLSERVER\R_SERVICES\bin”
    “ C:\ Program Files \ Microsoft SQL Server \ MSSQL13.MSSQLSERVER \ R_SERVICES \ bin”
  • Run console with administrator rights.

    以管理员权限运行控制台。

We will install word cloud and other packages. This process will take a little longer.

我们将安装词云和其他软件包。 此过程将花费一些时间。

install.packages(“tm”) # for text mining

install.packages(“SnowballC”) # for text stemming
install.packages(“wordcloud”) # word-cloud generator
install.packages(“RColorBrewer”) # color palettes

install.packages(“ tm”)#用于文本挖掘

install.packages(“ SnowballC”)#用于文本提取
install.packages(“ wordcloud”)#词云生成器
install.packages(“ RColorBrewer”)#调色板

In this step we will create data for our R script word cloud.

在这一步中,我们将为R脚本词云创建数据。

DROP TABLE IF  EXISTS FruitListCREATE TABLE FruitList(FruitName VARCHAR(2000)) INSERT INTO FruitList VALUES('Orange Lemon Melon Banana Pear Apple Peach Pear Apple Peach Orange Papaya Pear Apple Peach Banana') 

Now we will generate R script for the word cloud.

现在,我们将为单词cloud生成R脚本。

 EXECUTE sp_execute_external_script @language = N'R' , @script = N'library("tm")library("SnowballC")library("wordcloud")library("RColorBrewer") imageDir <- ''C:\\temp'';image_filename = tempfile(pattern = "plot_", tmpdir = imageDir, fileext = ".jpg")print(image_filename);jpeg(filename=image_filename,  width=600, height = 800); text <- InputDataSet$FruitNamedocs <- Corpus(VectorSource(text)) inspect(docs)dtm <- TermDocumentMatrix(docs)m <- as.matrix(dtm)v <- sort(rowSums(m),decreasing=TRUE)d <- data.frame(word = names(v),freq=v)head(d, 10)set.seed(1234)aaa <- wordcloud(words = d$word, freq = d$freq , min.freq = 1 ,          max.words=200, random.order=FALSE, rot.per=0.35)     dev.off();     OutputDataSet <- data.frame(data=readBin(file(image_filename, "rb"), what=raw(), n=1e6))'  , @input_data_1 = N'select FruitName from FruitList'WITH RESULT SETS ((plot varbinary(max)));

We will analyze R script code, piece by piece

我们将逐段分析R脚本代码

This part of script will create a jpeg file.

脚本的这一部分将创建一个jpeg文件。

imageDir <- ''C:\\temp'';image_filename = tempfile(pattern = "plot_", tmpdir = imageDir, fileext = ".jpg")print(image_filename);jpeg(filename=image_filename,  width=600, height = 800);

In this part, we will send our row data to the R script

在这一部分,我们将把行数据发送到R脚本

text <- InputDataSet$FruitName

This will create our word cloud if you run this script in R studio you can see word cloud. But you have to set some values to text variable.

如果您在R studio中运行此脚本,则将创建词云。 但是您必须为文本变量设置一些值。

docs <- Corpus(VectorSource(text)) inspect(docs)dtm <- TermDocumentMatrix(docs)m <- as.matrix(dtm)v <- sort(rowSums(m),decreasing=TRUE)d <- data.frame(word = names(v),freq=v)head(d, 10)set.seed(1234)aaa <- wordcloud(words = d$word, freq = d$freq , min.freq = 1 ,          max.words=200, random.order=FALSE, rot.per=0.35)

And the finally. this part of code will generate our binary output and we will see below output.

最后。 这部分代码将生成二进制输出,我们将在下面看到输出。

dev.off();     OutputDataSet <- data.frame(data=readBin(file(image_filename, "rb"), what=raw(), n=1e6))

In this part of our article we will connect our word cloud to SQL Server 2016 Report Builder.

在本文的这一部分中,我们将词云连接到SQL Server 2016 Report Builder。

We will open report builder and create new report.

我们将打开报告生成器并创建新报告。

We will create a blank report.

我们将创建一个空白报告。

Right click data source and add data source.

右键单击数据源并添加数据源。

We will select Use a connection embedded in my report then click build button enter server name or IP and last step step select or enter a database name. Next test the connection.

我们将选择“使用我报表中嵌入的连接”,然后单击“构建”按钮输入服务器名称或IP,最后一步选择或输入数据库名称。 接下来测试连接。

We will add a data set for the executed query.

我们将为执行的查询添加一个数据集。

Now we will select the Use a dataset embedded in my report then select Data source and paste our query.

现在,我们将选择“ 使用嵌入到我的报表中的数据集”,然后选择“数据源”并粘贴我们的查询。

Now we will add an image to our report. Select the image and add the image to report.

现在,我们将图像添加到我们的报告中。 选择图像并添加图像以进行报告。

Select the image source Database and select image data. Change the image size and select Orginal size

选择图像源数据库,然后选择图像数据。 更改图像尺寸并选择原始尺寸

Finally, we will run the report and view our word cloud!

最后,我们将运行报告并查看我们的词云!

结论 (Conclusion )

SQL Server R script Services is a powerful tool. It provides a gateway to the R script world to SQL Server and Microsoft BI tools. In this article we looked R script SQL Server Integration, created R script on SQL Server Manegment Studio and we generated a word cloud report.

SQL Server R脚本服务是一个功能强大的工具。 它为R脚本世界提供了通往SQL Server和Microsoft BI工具的网关。 在本文中,我们研究了R脚本SQL Server集成,在SQL Server Manegment Studio上创建了R脚本,并生成了词云报告。

翻译自:

转载地址:http://gmnwd.baihongyu.com/

你可能感兴趣的文章
C# MySql 连接
查看>>
sk_buff Structure
查看>>
oracle的级联更新、删除
查看>>
多浏览器开发需要注意的问题之一
查看>>
Maven配置
查看>>
HttpServletRequest /HttpServletResponse
查看>>
SAM4E单片机之旅——24、使用DSP库求向量数量积
查看>>
从远程库克隆库
查看>>
codeforces Unusual Product
查看>>
hdu4348 - To the moon 可持久化线段树 区间修改 离线处理
查看>>
springMVC中一个class中的多个方法
查看>>
cxx signal信号捕获
查看>>
《Android开发艺术探索》读书笔记——Cha3.2.3改变布局参数实现View的滑动
查看>>
python闭包与装饰器
查看>>
Acegi 源码解释
查看>>
Activity的几种启动跳转方式
查看>>
LCA最近公共祖先Tarjan(离线)
查看>>
牛客练习赛16 E求值
查看>>
matlab rank
查看>>
Asp.net系列--基础篇(三)
查看>>