Published: 2025-08-18 02:37:38

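# Data Parsing, Text Processing, and Applications of Regular Expressions
## 1. Parsing comma-separated data
### 1.1 Problem
When processing data, we often encounter strings or files that contain comma-separated values (CSV). Many Windows-based spreadsheets and some databases export data in CSV format. Our task is to read this CSV data.
### 1.2 Solution
Use a custom CSV class or a regular expression.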
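To see why a dedicated parser or a carefully built regular expression is needed, it helps to look at what a plain `String.split(",")` does with a quoted field that itself contains a comma. The following is only an illustrative sketch; the class name `NaiveSplitDemo` and the sample line are made up for this example.

```java
import java.util.Arrays;
import java.util.List;

class NaiveSplitDemo {
    /** Splits on every comma and ignores quoting (deliberately naive). */
    static List<String> naiveSplit(String line) {
        return Arrays.asList(line.split(","));
    }

    public static void main(String[] args) {
        // Two logical fields: a quoted name containing a comma, then a number.
        String line = "\"Smith, John\",42";
        List<String> fields = naiveSplit(line);
        // The quoted field is cut in two, so three pieces come back.
        System.out.println(fields.size() + " pieces");
        for (String f : fields) {
            System.out.println("[" + f + "]");
        }
    }
}
```

The quotes are also left in place, so the caller would still have to strip them. Both problems are what the approaches in the rest of this section address.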
### 1.3 Java implementation
#### 1.3.1 CSVSimple class example
```java
import java.util.List;

/* Simple demo of CSV parser class. */
public class CSVSimple {
    public static void main(String[] args) {
        CSV parser = new CSV();
        // Each element of the returned list is one field, as a String.
        List<String> list = parser.parse(
            "\"LU\",86.25,\"11/4/1998\",\"2:19PM\",+4.0625");
        for (String field : list) {
            System.out.println(field);
        }
    }
}
```
Running this program produces the following output:
```plaintext
> java CSVSimple
LU
86.25
11/4/1998
2:19PM
+4.0625
>
```
#### 1.3.2 CSV class implementation
```java
import java.util.ArrayList;
import java.util.List;

// Debug comes from the com.darwinsys utility library (not part of the JDK).
import com.darwinsys.util.Debug;

/** Parse comma-separated values (CSV), a common Windows file format.
 * Sample input: "LU",86.25,"11/4/1998","2:19PM",+4.0625
 * <p>
 * Inner logic adapted from a C++ original that was
 * Copyright (C) 1999 Lucent Technologies
 * Excerpted from 'The Practice of Programming'
 * by Brian W. Kernighan and Rob Pike.
 * <p>
 * Included by permission of the http://tpop.awl.com/ web site,
 * which says:
 * "You may use this code for any purpose, as long as you leave
 * the copyright notice and book citation attached." I have done so.
 * @author Brian W. Kernighan and Rob Pike (C++ original)
 * @author Ian F. Darwin (translation into Java and removal of I/O)
 * @author Ben Ballard (rewrote advQuoted to handle '""' and for readability)
 */
public class CSV {

    public static final char DEFAULT_SEP = ',';

    /** Construct a CSV parser, with the default separator (','). */
    public CSV() {
        this(DEFAULT_SEP);
    }

    /** Construct a CSV parser with a given separator.
     * @param sep The single char for the separator (not a list of
     * separator characters)
     */
    public CSV(char sep) {
        fieldSep = sep;
    }

    /** The fields in the current String */
    protected List<String> list = new ArrayList<>();

    /** The separator char for this parser */
    protected char fieldSep;

    /** parse: break the input String into fields
     * @return java.util.List containing each field
     * from the original as a String, in order. The same List instance
     * is reused (and cleared) on every call.
     */
    public List<String> parse(String line) {
        StringBuilder sb = new StringBuilder();
        list.clear();           // recycle to initial state
        int i = 0;

        if (line.length() == 0) {
            list.add(line);
            return list;
        }

        do {
            sb.setLength(0);
            if (i < line.length() && line.charAt(i) == '"')
                i = advQuoted(line, sb, ++i);   // skip quote
            else
                i = advPlain(line, sb, i);
            list.add(sb.toString());
            Debug.println("csv", sb.toString());
            i++;                // step past the separator
        } while (i < line.length());

        return list;
    }

    /** advQuoted: quoted field; return index of next separator */
    protected int advQuoted(String s, StringBuilder sb, int i) {
        int j;
        int len = s.length();
        for (j = i; j < len; j++) {
            if (s.charAt(j) == '"' && j + 1 < len) {
                if (s.charAt(j + 1) == '"') {
                    j++;        // skip escape char
                } else if (s.charAt(j + 1) == fieldSep) {  // next delimiter
                    j++;        // skip end quotes
                    break;
                }
            } else if (s.charAt(j) == '"' && j + 1 == len) {
                break;          // end quotes at end of line: done
            }
            sb.append(s.charAt(j));     // regular character
        }
        return j;
    }

    /** advPlain: unquoted field; return index of next separator */
    protected int advPlain(String s, StringBuilder sb, int i) {
        int j = s.indexOf(fieldSep, i);     // look for separator
        Debug.println("csv", "i = " + i + ", j = " + j);
        if (j == -1) {          // none found: rest of line is the field
            sb.append(s.substring(i));
            return s.length();
        } else {
            sb.append(s.substring(i, j));
            return j;
        }
    }
}
```
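One behavior of the `parse()` loop above is worth noting: because the loop ends as soon as the index reaches the end of the line, an input such as `a,b,` yields two fields, and the empty field after the final separator is not reported. If that trailing field matters, a single-pass scanner with an explicit "inside quotes" flag is one possible alternative. The sketch below is not the CSV class above; `MiniCsv` and `split` are hypothetical names, and it assumes a quote only opens a quoted section at the very start of a field.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-pass splitter, shown for comparison only. */
class MiniCsv {
    static List<String> split(String line, char sep) {
        List<String> fields = new ArrayList<>();
        StringBuilder sb = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c != '"') {
                    sb.append(c);               // ordinary char, separators included
                } else if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    sb.append('"');             // "" inside quotes is a literal quote
                    i++;
                } else {
                    inQuotes = false;           // closing quote
                }
            } else if (c == '"' && sb.length() == 0) {
                inQuotes = true;                // opening quote at start of field
            } else if (c == sep) {
                fields.add(sb.toString());      // field finished
                sb.setLength(0);
            } else {
                sb.append(c);
            }
        }
        fields.add(sb.toString());              // last field, possibly empty
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(split("\"LU\",86.25,\"11/4/1998\",\"2:19PM\",+4.0625", ','));
        System.out.println(split("\"say \"\"hi\"\"\",x,", ',').size());
    }
}
```

For the sample line from section 1.3.1 this sketch produces the same five fields as `CSV.parse()`. The difference shows up only on inputs that end with a separator, where it adds one empty string at the end of the list.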
### 1.4 Regular expression implementation
```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/* Simple demo of CSV matching using Regular Expressions.
 * Does NOT use the "CSV" class defined in the Java CookBook, but uses
 * a regex pattern simplified from Chapter 7 of <em>Mastering Regular
 * Expressions</em> (p. 205, first edn.)
 * @version $Id: ch03,v 1.3 2004/05/04 18:03:14 ian Exp $
 */
public class CSVRE {
    /** The rather involved pattern used to match CSV's consists of three
     * alternations: the first matches a quoted field, the second unquoted,
     * the third a null field.
     */
    public static final String CSV_PATTERN = "\"([^\"]+?)\",?|([^,]+),?|,";

    private static Pattern csvRE;

    public static void main(String[] argv) throws IOException {
        System.out.println(CSV_PATTERN);
        new CSVRE().process(new BufferedReader(new InputStreamReader(System.in)));
    }

    /** Construct a regex-based CSV parser. */
    public CSVRE() {
        csvRE = Pattern.compile(CSV_PATTERN);
    }

    /** Process one file. Delegates to parse() a line at a time */
    public void process(BufferedReader in) throws IOException {
        String line;
        // For each line...
        while ((line = in.readLine()) != null) {
            System.out.println("line = `"
```
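The listing above breaks off inside `process()`, so the per-line `parse()` step it refers to is not shown. As a rough illustration of how the `CSV_PATTERN` string could be applied to a single line, here is a minimal sketch. It is an assumption, not the original author's code: the class name `CsvRegexSketch`, the use of capture groups to strip quotes and trailing commas, and the choice to represent an empty field as `""` are all decisions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical per-line parser built on the same pattern string. */
class CsvRegexSketch {
    // Same pattern text as CSV_PATTERN in the listing above.
    static final Pattern CSV =
        Pattern.compile("\"([^\"]+?)\",?|([^,]+),?|,");

    static List<String> parse(String line) {
        List<String> fields = new ArrayList<>();
        Matcher m = CSV.matcher(line);
        // Each successful find() consumes one field plus its trailing comma.
        while (m.find()) {
            if (m.group(1) != null) {
                fields.add(m.group(1));     // quoted field, quotes excluded
            } else if (m.group(2) != null) {
                fields.add(m.group(2));     // unquoted field
            } else {
                fields.add("");             // lone comma: empty field
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        for (String f : parse("\"LU\",86.25,\"11/4/1998\",\"2:19PM\",+4.0625")) {
            System.out.println(f);
        }
    }
}
```

Because the first alternative requires at least one character between the quotes, an empty quoted field (`""`) is picked up by the second alternative instead and comes back with its quotes still attached. Doubled quotes inside a quoted field are not handled by this pattern either, which is one reason to prefer the hand-written `CSV` class for such input.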