Fast loading of elements into an array/list with fixed indexes and no duplicates

Posted on 2025-01-02 06:22:58

My requirement is to add strings to an array only if they are not already in it. I also need to maintain fixed indexes, because this array will be used alongside other data structures that have a one-to-one relation with each index. At present I am using the ArrayList class: I first check with the indexOf() method whether the string exists, and if it does not, I add it to the list with the one-argument add() method. I am not familiar with the Java classes, so I could not work out how to implement this with a HashMap or something else (a trie, for example) that would make the loading process faster.

Does indexOf() in ArrayList perform a sequential search?
My goal is to reduce the processing time when loading words into the array, without inserting duplicates, while keeping each element's index fixed. If a word being tested is already in the array, I need the index at which it was inserted, because that index is used to index into another structure and do some processing. Any suggestions to make this process better?

UPDATE

I have a word array and some documents; I need to scan each word and find the unique words in the documents. I also need to count the number of duplicates. Stated another way, I need to count the term frequencies of the unique terms occurring in the documents. I am maintaining an ArrayList<Integer[]> of term frequencies (number of terms x number of docs). I fetch one word and check whether it is in the word list with the indexOf() method. If it is not present, I insert the word into the list, allocate a new row in the 2D array (the ArrayList<Integer[]>), and set the count for that term in the current document to 1. If the word is already in the word list, I use its index to select the row of the ArrayList<Integer[]> matrix, use the number of the document currently being processed to reach the cell, and increment the count.

My question is how to reduce the time spent in indexOf() for each word. I need to get the index of the word in the word array if it is already there, and if it is not, I need to insert it into the array dynamically.

Sample Code

import java.io.*;
import java.util.ArrayList;
import static java.lang.Math.log;


class DocumentRepresentation
{
  private String dirPath;
  private ArrayList<String> fileNameVector;
  private ArrayList<String> termVector;
  private ArrayList<Integer[]> tf; /* store it in natural 2d array */
  private Integer df[]; /* do normal 1d array */
  private Double idf[]; /* do normal 1d array */
  private Double tfIdf[][]; /* do normal 2d array */

  DocumentRepresentation (String dirPath)
  {
    this.dirPath = dirPath;
    fileNameVector = new ArrayList<String> ();
    termVector = new ArrayList<String> ();
    tf = new ArrayList<Integer[]> ();
  }

  /* Later separate the internal works */
  public int start ()
  {
    /* Load the files, and populate the fileNameVector string */
    File fileDir = new File (dirPath);
    int fileCount = 0;
    int index;

    if (fileDir.isDirectory () == false)
    {
      return -1;
    }

    File fileList[] = fileDir.listFiles ();

    for (int i=0; i<fileList.length; i++)
    {
      if (fileList[i].isFile () == true)
      {
        fileNameVector.add (fileList[i].getName ());
        //      System.out.print ("File Name " + (i + 1) + ": " + fileList[i].getName () + "\n");
      }
    }

    fileCount = fileNameVector.size ();
    for (int i=0;i<fileNameVector.size (); i++)
    {
      System.out.print ("Name " + (i+1) + ": " + fileNameVector.get (i) + "\n");
    }

    /* Bind the files with a buffered reader */
    BufferedReader fileReaderVector[] = new BufferedReader [fileCount];
    for (int i=0; i<fileCount; i++)
    {
      try
      {
        fileReaderVector[i] = new BufferedReader (new FileReader (fileList[i]));
      }
      /* Not handled */
      catch (FileNotFoundException e)
      {
        System.out.println (e);
      }
    }

    /* Scan the term frequencies in the tf 2d array */
    for (int i=0; i<fileCount; i++)
    {
      String line;

      try
      {
            /*** THIS IS THE PLACE OF MY QUESTION **/
        while ((line = fileReaderVector[i].readLine ()) != null)
        {
          String words[] = line.split ("[\\W]");

          for (int j=0;j<words.length;j++)
          { 
            if ((index = termVector.indexOf (words[j])) != -1)
            {
              tf.get (index)[i]++;
              /* increase the tf count */
            }
            else
            {
              termVector.add (words[j]);
              Integer temp[] = new Integer [fileCount];

              for (int k=0; k<fileCount; k++)
              {
                temp[k] = new Integer (0);
              }
              temp[i] = 1;
              tf.add (temp);
              index = termVector.indexOf (words[j]);
            }

            System.out.println (words[j]);
          }
        }
      }
      /* Not handled */
      catch (IOException e)
      {
        System.out.println (e);
      }
    }

    return 0;
  }
}

class DocumentRepresentationTest
{
  public static void main (String args[])
  {
    DocumentRepresentation docSet = new DocumentRepresentation (args[0]);
    docSet.start ();
    System.out.print ("\n");
  }
}

Note: code is snipped to keep the focus on the question


Comments (3)

染墨丶若流云 2025-01-09 06:22:58

LinkedHashMap can satisfy all your requirements at once, with good performance characteristics.

The keys would be your items and the values would be the indices. If you insert the elements in the order of increasing indices, then iterating over the map would also return the elements in the order of increasing indices.

Here is some sample code:

LinkedHashMap<Item,Integer> map = new LinkedHashMap<Item,Integer>();

Get the item's index:

Integer index = map.get(item);
if (index != null) {
  // already in the map; use `index'
} else {
  // not in the map
}

Add item with the next index:

if (!map.containsKey(item)) {
  map.put(item, map.size());
}

Iterate over the elements in the order of increasing indices:

for (Map.Entry<Item,Integer> e : map.entrySet()) {
  Item item = e.getKey();
  int index = e.getValue();
  ...
}

What this can't do efficiently is get the item stored at a specific index, but my reading of your question indicates that you don't actually need this.
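Applied to the term-frequency loop from the question, the idea might look like the sketch below. This is only an illustration under assumptions: the names TermFrequencyTable, indexFor, count and frequency are hypothetical and not from the question's code, the rows are kept as int[] rather than Integer[], and words are split on runs of non-word characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: gives each distinct term a fixed index and keeps per-document counts. */
class TermFrequencyTable {
    private final Map<String, Integer> termIndex = new LinkedHashMap<>();
    private final List<int[]> tf = new ArrayList<>(); // row = term index, column = document number
    private final int docCount;

    TermFrequencyTable(int docCount) {
        this.docCount = docCount;
    }

    /** Returns the term's fixed index, assigning the next free index the first time it is seen. */
    int indexFor(String term) {
        Integer index = termIndex.get(term);   // hash lookup instead of a linear indexOf() scan
        if (index == null) {
            index = termIndex.size();
            termIndex.put(term, index);
            tf.add(new int[docCount]);         // a new int[] is already zero-filled
        }
        return index;
    }

    /** Counts one occurrence of term in document doc and returns the term's index. */
    int count(String term, int doc) {
        int index = indexFor(term);
        tf.get(index)[doc]++;
        return index;
    }

    /** Returns how often term occurred in document doc, or 0 if the term is unknown. */
    int frequency(String term, int doc) {
        Integer index = termIndex.get(term);
        return index == null ? 0 : tf.get(index)[doc];
    }

    int size() {
        return termIndex.size();
    }
}

class TermFrequencyDemo {
    public static void main(String[] args) {
        String[] docs = { "a rose is a rose", "a daisy" };
        TermFrequencyTable table = new TermFrequencyTable(docs.length);
        for (int d = 0; d < docs.length; d++) {
            for (String word : docs[d].split("\\W+")) {
                if (!word.isEmpty()) {         // split() can yield a leading empty string
                    table.count(word, d);
                }
            }
        }
        System.out.println(table.frequency("a", 0));    // 2
        System.out.println(table.frequency("rose", 0)); // 2
        System.out.println(table.frequency("a", 1));    // 1
    }
}
```

Because indexes are handed out as map.size() at insertion time and nothing is ever removed, a term's index never changes, which is the property the other index-aligned structures rely on.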

避讳 2025-01-09 06:22:58

ArrayList.indexOf() does a linear search, so it's O(n).

If it really has to go into an ArrayList, you could create two collections, an ArrayList and a HashSet. Add elements to, and remove them from, both collections. Before adding, call HashSet.contains() to see if the element already exists.

Encapsulate your ArrayList and HashSet in a class of their own.
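A minimal sketch of such a wrapper, assuming the elements have sound equals()/hashCode() (as String does). The class name UniqueList and its method names are made up for illustration. Removal is left out on purpose: removing from an ArrayList shifts the later elements, which would break the fixed indexes the question asks for. Note also that the HashSet only speeds up the membership test; finding the position of an element that is already present still goes through ArrayList.indexOf().

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical wrapper that keeps an ArrayList and a HashSet in sync. */
class UniqueList<T> {
    private final List<T> items = new ArrayList<>(); // preserves insertion order and indexes
    private final Set<T> seen = new HashSet<>();     // fast membership test

    /** Appends item only if it is absent; returns true when it was added. */
    boolean addIfAbsent(T item) {
        if (seen.contains(item)) {
            return false;
        }
        seen.add(item);
        items.add(item);
        return true;
    }

    boolean contains(T item) {
        return seen.contains(item);
    }

    T get(int index) {
        return items.get(index);
    }

    /** Skips the scan for unknown items, but is still linear for items that are present. */
    int indexOf(T item) {
        return seen.contains(item) ? items.indexOf(item) : -1;
    }

    int size() {
        return items.size();
    }
}
```

Keeping both collections private means callers cannot add to one without the other, so the two cannot drift out of sync.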

凉墨 2025-01-09 06:22:58

If you want to keep the ArrayList, you can back it with a HashSet, at the cost of roughly double the memory.

You can call HashSet.add(); if it returns true, add the element to the ArrayList as well.
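As a rough illustration of that pattern (the class and method names here are hypothetical), Set.add() reports whether the element was newly inserted, so a single call both tests for a duplicate and records the element:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AddOnceDemo {
    /** Returns the distinct words in the order they were first seen. */
    static List<String> distinctInOrder(String[] words) {
        Set<String> seen = new HashSet<>();
        List<String> ordered = new ArrayList<>();
        for (String word : words) {
            if (seen.add(word)) {      // add() returns false if the word was already present
                ordered.add(word);     // its index in the list stays fixed from now on
            }
        }
        return ordered;
    }

    public static void main(String[] args) {
        System.out.println(distinctInOrder(new String[] { "b", "a", "b", "c", "a" }));
        // prints [b, a, c]
    }
}
```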
