GetFormFieldNames does not always work

Posted 2024-10-21 14:01:22

I am trying to find out which form an element belongs to. I am working from the code on this website:

http://www.cryer.co.uk/brian/delphi/twebbrowser/read_write_form_elements.htm

It contains this function:

function GetFormFieldNames(fromForm: IHTMLFormElement): TStringList;
var
  index: integer;
  field: IHTMLElement;
  input: IHTMLInputElement;
  select: IHTMLSelectElement;
  text: IHTMLTextAreaElement;
begin
  result := TStringList.Create;
  for index := 0 to fromForm.length - 1 do
  begin
    field := fromForm.Item(index,'') as IHTMLElement;
    if Assigned(field) then
    begin
      if field.tagName = 'INPUT' then
      begin
        // Input field.
        input := field as IHTMLInputElement;
        result.Add(input.name);
      end
      else if field.tagName = 'SELECT' then
      begin
        // Select field.
        select := field as IHTMLSelectElement;
        result.Add(select.name);
      end
      else if field.tagName = 'TEXTAREA' then
      begin
        // TextArea field.
        text := field as IHTMLTextAreaElement;
        result.Add(text.name);
      end;
    end;
  end;

end;

This function seems to work fine for most sites. However, there are a few websites where it fails, such as this one:

http://service.mail.com/registration.html#.1258-bluestripe-product1-undef

By using that code and comparing the field names with the active id, I can find the form the element is in. However, it does not work for that website. I suspect it has to do with IHTMLDocument3, and that this code is written for IHTMLDocument2, but I am not sure.

So my question is: how can I extract a TStringList containing all the element names from this website? Hope you can help!
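One way to find the form an element belongs to, without comparing name lists at all, is to walk up the element's parents until a FORM element is reached. The following is a minimal sketch under that assumption; FindOwningForm is a hypothetical helper name, not part of Cryer's code, and it relies only on the parentElement and tagName properties of IHTMLElement.

```pascal
uses
  SysUtils, MSHTML;

// Hypothetical helper: walks up from the given element until a FORM
// ancestor is found. Returns nil when the element is not inside a form.
function FindOwningForm(element: IHTMLElement): IHTMLFormElement;
var
  current: IHTMLElement;
begin
  Result := nil;
  current := element;
  while Assigned(current) do
  begin
    // Supports sets Result to nil when the query fails, so a failed
    // match simply continues with the parent element.
    if SameText(current.tagName, 'FORM') and
       Supports(current, IHTMLFormElement, Result) then
      Exit;
    current := current.parentElement;
  end;
end;
```

It could be called with the focused element, for example FindOwningForm((WebBrowser1.Document as IHTMLDocument2).activeElement), assuming the document has finished loading and WebBrowser1 is the TWebBrowser on the form.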

Edited: Added some code

begin
  theForm := GetFormByNumber(webbrowser1.document as IHTMLDocument2, 0);
  fields := GetFormFieldNames(theForm);
  num := fields.IndexOf(theid);
end;
until (num <> -1);
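Note that GetFormFieldNames collects each field's name, while the fragment above searches that list for theid. If theid holds an element id rather than a name, the lookup can fail on pages where the two differ. Whether that is what happens on the site in question is an assumption, not something confirmed here. A sketch of a variant that collects ids instead (GetFormFieldIds is a hypothetical name):

```pascal
uses
  Classes, SysUtils, MSHTML;

// Hypothetical variant of GetFormFieldNames: collects the id of every
// element in the form. Elements without an id are skipped.
// The caller owns the returned list and must free it.
function GetFormFieldIds(fromForm: IHTMLFormElement): TStringList;
var
  index: Integer;
  disp: IDispatch;
  field: IHTMLElement;
begin
  Result := TStringList.Create;
  for index := 0 to fromForm.length - 1 do
  begin
    disp := fromForm.item(index, '');
    // Supports returns False for nil, so missing items are ignored.
    if Supports(disp, IHTMLElement, field) and (field.id <> '') then
      Result.Add(field.id);
  end;
end;
```

Searching both the name list and the id list for theid would cover either case.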

Comments (2)

暗喜 2024-10-28 14:01:22

One complication with locating form elements in a web page is that the page may contain frames and there may be forms in any of the frames. Basically, you have to iterate through all the frames and the forms in each frame. Once you get the form as an IHTMLFormElement, use Cryer's function to get the form element names.

The example link you gave does not have any frames, and you should have had no problems getting your list of form elements, unless you tried to get the form by name, because it has no name assigned. I had no problem getting the form element names and values using the following procedure:

procedure GetForms(doc1: IHTMLDocument2; var sl: TStringList);
var
  i, j, n: integer;
  docForm: IHTMLFormElement;
  slt:  TStringList;
  s: string;
begin
  if doc1 = nil then
  begin
    ShowMessage('doc1 is empty [GetForms]');
    Exit;
  end;

  n := NumberOfForms(doc1);
  sl.Add('Forms: ' + IntToStr(n));
  for i := 0 to n - 1 do
  begin
    docForm := GetFormByNumber(doc1, i);
    sl.Add('Form Name: ' + docForm.Name);
    // GetFormFieldNames creates a new list on every call,
    // so free it once per form to avoid leaking it.
    slt := GetFormFieldNames(docForm);
    try
      for j := 0 to slt.Count - 1 do
      begin
        s := GetFieldValue(docForm, slt[j]);
        sl.Add('Field Name: ' + slt[j] + '  value: "' + s + '"');
      end;
    finally
      slt.Free;
    end;
  end;
  sl.Add('');
end;

Cryer's example for navigating a frameset will not work for all web sites; see http://support.microsoft.com/support/kb/articles/Q196/3/40.ASP. The following function successfully extracts a frame as an IHTMLDocument2 on all sites I have tried:

function GetFrameByNumber(Doc:IHTMLDocument2; n:integer):IHTMLDocument2;
var
  Container: IOleContainer;
  Enumerator: ActiveX.IEnumUnknown;
  Unknown: IUnknown;
  Browser: IWebBrowser2;
  Fetched: Longint;
  NewDoc: IHTMLDocument2;
  i : integer;
begin
  // We cannot use the document's frames collection here, because
  // it does not work in every case (e.g. documents from a foreign domain).
  // From: http://support.microsoft.com/support/kb/articles/Q196/3/40.ASP
  Result := nil; // returned when frame n does not exist
  i := 0;
  if (Supports(Doc, IOleContainer, Container)) and
     (Container.EnumObjects(OLECONTF_EMBEDDINGS, Enumerator) = S_OK) then
  begin
    while Enumerator.Next(1, Unknown, @Fetched) = S_OK do
    begin
      if (Supports(Unknown, IWebBrowser2, Browser)) and
         (Supports(Browser.Document, IHTMLDocument2, NewDoc)) then
      begin
        // Here, NewDoc is an IHTMLDocument2 that you can query for
        // all the links, text edits, etc.
        if i=n then
        begin
          Result := NewDoc;
          Exit;
        end;
        i := i+1;
      end;
    end;
  end;
end;
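As a rough sketch of how GetFrameByNumber and GetForms might be combined, the procedure below lists the forms of a document and then recurses into each embedded frame. ListFormsInAllFrames is a hypothetical name, and the loop assumes that GetFrameByNumber returns nil once n runs past the last frame, which requires Result to be initialised to nil at the top of that function.

```pascal
// Hypothetical helper: appends the forms of doc and of every nested
// frame to sl. Relies on GetForms and GetFrameByNumber from this answer.
procedure ListFormsInAllFrames(doc: IHTMLDocument2; sl: TStringList);
var
  n: Integer;
  frameDoc: IHTMLDocument2;
begin
  if doc = nil then
    Exit;
  GetForms(doc, sl); // forms that live in the document itself
  n := 0;
  frameDoc := GetFrameByNumber(doc, n);
  while frameDoc <> nil do
  begin
    sl.Add('--Frame: ' + IntToStr(n));
    ListFormsInAllFrames(frameDoc, sl); // recurse into nested frames
    Inc(n);
    frameDoc := GetFrameByNumber(doc, n);
  end;
end;
```

Each call to GetFrameByNumber restarts the enumeration, so this is quadratic in the number of frames; for the handful of frames on a typical page that is unlikely to matter.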

Here is an example of how I have used GetForms and GetFrameByNumber:

// from the TForm1 declaration
    { Public declarations }
    wdoc: IHTMLDocument2;


procedure TForm1.btnAnalyzeClick(Sender: TObject);
begin
  wdoc := WebBrowser.Document as IHTMLDocument2;
  GetDoc(wdoc);
end;

procedure TForm1.GetDoc(doc1: IHTMLDocument2);
var
  i, n: integer;
  doc2: IHTMLDocument2;
  frame_dispatch: IDispatch;
  frame_win: IHTMLWindow2;
  ole_index: olevariant;
  sl: TStringList;
begin
  if doc1 = nil then
  begin
    ShowMessage('Web doc is empty');
    Exit;
  end;
  Form2.Memo1.Lines.Clear;
  sl := TStringList.Create;

  n := doc1.frames.length;
  sl.Add('Frames: ' + IntToStr(n));
  // check each frame for the data
  if n = 0 then
    GetForms(doc1, sl)
  else
    for i := 0 to n - 1 do
    begin
      sl.Add('--Frame: ' + IntToStr(i));
      ole_index := i;
      frame_dispatch := doc1.Frames.Item(ole_index);
      if frame_dispatch <> nil then
      begin
        frame_win := frame_dispatch as IHTMLWindow2;
        doc2 := frame_win.document;
//        sl.Add(doc2.body.outerHTML);
        GetForms(doc2,sl);
        GetDoc(doc2);
      end;
    end;

// Form2 just contains a TMemo
  Form2.Memo1.Lines.AddStrings(sl);
  Form2.Show;
  sl.Free;
end;

The logic in your example is faulty: 1. when there is only one form on the web page, the list of form elements is never extracted; 2. the repeat loop will result in an access violation unless the tag in "theid" is found.

Here is your example cut down to successfully extract the form elements:

var
  i : integer;
  nforms : integer;
  document : IHTMLDocument2;
  theForm : IHTMLFormElement;
  fields : TStringList;
  theform1 : integer;
  num : integer;
  theid : string;
begin
  // fields is created by GetFormFieldNames below, so it is not created here
  theid := 'xx';

// original code follows
i := -1;
//    nforms := NumberOfForms(webbrowser1.document as IHTMLDocument2);
//    document := webbrowser1.document as IHTMLDocument2;
//    if nforms = 1 then
//    begin
//      theForm := GetFormByNumber(webbrowser1.document as IHTMLDocument2, 0);
//      theform1 := 0;
//    end
//    else
    begin
//              repeat
              begin
                inc(i);
                theForm := GetFormByNumber(webbrowser1.document as IHTMLDocument2,
                  i);
                fields := GetFormFieldNames(theForm);
                num := fields.IndexOf(theid);
                theform1 := i;
              end;
//              until (num <> -1);
    end;
// end of original code

  Memo1.Lines.Text := fields.Text;
  fields.Free;
end;
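The access violation mentioned above comes from the repeat loop running past the last form when "theid" is never found. A bounded search avoids that. The sketch below assumes Cryer's NumberOfForms, GetFormByNumber and GetFormFieldNames helpers with the signatures implied by their use in this thread; FindFormContainingField is a hypothetical name.

```pascal
// Hypothetical helper: returns the index of the first form whose field
// list contains fieldName, or -1 when no form on the page contains it.
function FindFormContainingField(doc: IHTMLDocument2;
  const fieldName: string): Integer;
var
  i: Integer;
  fields: TStringList;
begin
  Result := -1;
  if doc = nil then
    Exit;
  // The loop is bounded by the number of forms, so it can never ask
  // for a form that does not exist.
  for i := 0 to NumberOfForms(doc) - 1 do
  begin
    fields := GetFormFieldNames(GetFormByNumber(doc, i));
    try
      if fields.IndexOf(fieldName) <> -1 then
      begin
        Result := i;
        Exit; // the finally block still frees the list
      end;
    finally
      fields.Free;
    end;
  end;
end;
```

A return value of -1 then means the field was not found in any form of that document, and the caller can go on to check the frames.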

玩物 2024-10-28 14:01:22

Hm, are you sure this link contains any form elements? At least I did not see any visible ones. Perhaps they are hidden; I did not check this myself, however.

Michael
