Set policy for LLM-generated tests #202

Open
jgraham opened this issue Jul 8, 2024 · 6 comments

@jgraham
Contributor

jgraham commented Jul 8, 2024

CC @web-platform-tests/wpt-core-team

I was recently asked about the policy for using LLMs to generate tests that are submitted to wpt. Currently we don't have any explicit policy on this, but I think we need one, so it's clear to test authors what's permissible, and to downstream consumers whether wpt is in line with any other policies they might have with regard to LLMs.

To be clear, I expect that any policy here would not affect review requirements, i.e. we'd still require human review for all tests to ensure they're correct and easy to follow.

@Ms2ger
Contributor

Ms2ger commented Jul 10, 2024

Given their tendency to create output that looks plausible but isn't actually correct, I'd lean towards "don't allow" or at least increase review requirements on them. We've been relatively permissive in accepting tests based on the fact that a human has already reasoned their way through the spec while writing; if that's no longer a valid assumption, we should take that into account.

@tabatkins

Yes, I'm overall on the "disallow officially" stance. If they are LLM-generated, but are good/correct enough we can't tell, then fine, but in my experience people who use LLMs like this are not particularly concerned about the quality in the first place.

@jgraham
Contributor Author

jgraham commented Jul 10, 2024

We also need to consider cases where the test is partially generated but human-modified (e.g. with Copilot-like tools).

The practical arguments I've heard in favour of allowing LLM-[partially-]generated tests are basically:

  • People are going to do it anyway, and it will be difficult to police.
  • For people who do follow the policy, being forbidden from using productivity tooling for wpt will make them more likely to write non-wpt tests.

Speaking personally, I'm definitely worried about the potential for lower-quality tests being written using these technologies, but as a practical matter I don't know how much we can enforce, so I'd only want to ban them if there's some strong non-technical reason (e.g. vendors who would be unable to use wpt). However, I'd be interested in ideas for marking such tests (assuming good-faith contributors) so that we could verify whether they cause more problems in the longer term compared to purely human-generated tests.

@WeizhongX
Contributor

Maybe we need to clarify the licensing issues associated with LLM-generated code? One of the reasons LLMs are banned at some corporations is that the license of LLM-generated code can be unclear, and thus the code is not safe to use.

What is the policy regarding the license now for WPTs contributed by a human?

@jgraham
Contributor Author

jgraham commented Aug 6, 2024

I think broadly it's the submitter's responsibility to ensure that they're able to contribute it under the terms of the 3-Clause BSD License.

@zcorpan
Member

zcorpan commented Aug 8, 2024

I have thought about using an LLM to generate a testing plan (giving it https://web-platform-tests.org/writing-tests/making-a-testing-plan.html + the relevant part of the spec to test), and just tested it now with ChatGPT 4o for https://quirks.spec.whatwg.org/#the-html-element-fills-the-viewport-quirk and https://quirks.spec.whatwg.org/#the-body-element-fills-the-html-element-quirk

After clarifying that I want a list of 100 tests with short descriptions, it came up with various reasonable cases to test. I then asked it to write the first test, and while it made a few mistakes (it included a standards-mode doctype, unnecessary margin calculations, and unnecessary boilerplate such as setup()), after fixing those the test is basically correct and useful (it passes in Chrome and Safari, and fails in Firefox due to bug 1259924).

As a reviewer I would probably like to know when LLMs were used and whether the contributor put in some effort to clean up and verify the correctness.
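
The three mistakes described above (a standards-mode doctype, redundant margin arithmetic, and unneeded setup() boilerplate) are regular enough that a contributor could screen for them before asking for review. The following is only a rough sketch: findQuirksTestIssues is a hypothetical helper, not part of wpt or testharness.js, and its regular expressions are assumptions tuned to the generated test quoted in this thread. It does not replace human review.

```javascript
// Hypothetical helper (not part of wpt): flags the three problems described
// above in the source text of a test that is meant to run in quirks mode.
function findQuirksTestIssues(source) {
  const issues = [];

  // A leading <!DOCTYPE html> selects no-quirks mode, so the quirk under test
  // would never apply. This sketch only recognizes that plain doctype form.
  if (/^\s*<!doctype\s+html\s*>/i.test(source)) {
    issues.push("standards-mode doctype");
  }

  // Reading margins back from computedStyle is redundant when the test's own
  // style sheet already sets them to 0.
  if (/computedStyle\.margin(Left|Right|Top|Bottom)/.test(source)) {
    issues.push("unnecessary margin calculation");
  }

  // explicit_done is not needed for a single synchronous test().
  if (/setup\(\s*\{\s*explicit_done\s*:\s*true\s*\}\s*\)/.test(source)) {
    issues.push("unnecessary setup() boilerplate");
  }

  return issues;
}

// Usage: an abbreviated stand-in for the first generated test.
const generated = [
  "<!DOCTYPE html>",
  "var marginLeft = parseInt(computedStyle.marginLeft);",
  "setup({explicit_done: true});",
].join("\n");
console.log(findQuirksTestIssues(generated));
```

A check like this can only catch patterns that are already known; it says nothing about whether the assertion itself matches the spec.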

Prompts and output
  • Please read this to learn how to generate a testing plan for a specification. [attachment: making-a-testing-plan.md]
  • Now make a testing plan for the attached spec. [attachment: quirks-html-body.bs]
  • 5 tests is not sufficient to test all interesting cases. I expect something on the order of 100 tests. Please only generate a list where each bullet point briefly explains a single test. Example: "Test float: left on <body>, verify the border box size in the block flow direction"

Certainly! Here is an extensive list of test cases to thoroughly cover the specification for quirks mode behavior of the <html> and <body> elements:

Tests for <html> Element

Vertical Writing Mode with width: auto

  1. Test width: auto on <html>, verify the border box size in vertical writing mode.
  2. Test margin-left and margin-right on <html>, verify the border box size in vertical writing mode.
  3. Test margin-left: 10px; margin-right: 20px on <html>, verify the border box size in vertical writing mode.
  4. Test border: 5px solid on <html>, verify the border box size in vertical writing mode.
  5. Test padding: 10px on <html>, verify the border box size in vertical writing mode.
  6. Test width: auto; writing-mode: vertical-rl on <html>, verify the border box size.

Horizontal Writing Mode with height: auto

  1. Test height: auto on <html>, verify the border box size in horizontal writing mode.
  2. Test margin-top and margin-bottom on <html>, verify the border box size in horizontal writing mode.
  3. Test margin-top: 10px; margin-bottom: 20px on <html>, verify the border box size in horizontal writing mode.
  4. Test border: 5px solid on <html>, verify the border box size in horizontal writing mode.
  5. Test padding: 10px on <html>, verify the border box size in horizontal writing mode.
  6. Test height: auto; writing-mode: horizontal-tb on <html>, verify the border box size.

Tests for <body> Element

Vertical Writing Mode with width: auto

  1. Test width: auto on <body>, verify the border box size in vertical writing mode.
  2. Test margin-left and margin-right on <body>, verify the border box size in vertical writing mode.
  3. Test margin-left: 10px; margin-right: 20px on <body>, verify the border box size in vertical writing mode.
  4. Test border: 5px solid on <body>, verify the border box size in vertical writing mode.
  5. Test padding: 10px on <body>, verify the border box size in vertical writing mode.
  6. Test position: static on <body>, verify the border box size in vertical writing mode.
  7. Test position: relative on <body>, verify the border box size in vertical writing mode.
  8. Test position: absolute on <body>, verify the border box size in vertical writing mode.
  9. Test position: fixed on <body>, verify the border box size in vertical writing mode.
  10. Test float: none on <body>, verify the border box size in vertical writing mode.
  11. Test float: left on <body>, verify the border box size in vertical writing mode.
  12. Test float: right on <body>, verify the border box size in vertical writing mode.
  13. Test inline-level <body>, verify the border box size in vertical writing mode.
  14. Test spanning <body> (multi-column), verify the border box size in vertical writing mode.

Horizontal Writing Mode with height: auto

  1. Test height: auto on <body>, verify the border box size in horizontal writing mode.
  2. Test margin-top and margin-bottom on <body>, verify the border box size in horizontal writing mode.
  3. Test margin-top: 10px; margin-bottom: 20px on <body>, verify the border box size in horizontal writing mode.
  4. Test border: 5px solid on <body>, verify the border box size in horizontal writing mode.
  5. Test padding: 10px on <body>, verify the border box size in horizontal writing mode.
  6. Test position: static on <body>, verify the border box size in horizontal writing mode.
  7. Test position: relative on <body>, verify the border box size in horizontal writing mode.
  8. Test position: absolute on <body>, verify the border box size in horizontal writing mode.
  9. Test position: fixed on <body>, verify the border box size in horizontal writing mode.
  10. Test float: none on <body>, verify the border box size in horizontal writing mode.
  11. Test float: left on <body>, verify the border box size in horizontal writing mode.
  12. Test float: right on <body>, verify the border box size in horizontal writing mode.
  13. Test inline-level <body>, verify the border box size in horizontal writing mode.
  14. Test spanning <body> (multi-column), verify the border box size in horizontal writing mode.

Interaction Between <html> and <body>

Different Writing Modes

  1. Test <html> horizontal and <body> vertical, verify the border box size for both.
  2. Test <html> vertical and <body> horizontal, verify the border box size for both.

Same Writing Modes with Different Properties

  1. Test <html> and <body> both vertical, different margins.
  2. Test <html> and <body> both vertical, different paddings.
  3. Test <html> and <body> both vertical, different borders.
  4. Test <html> and <body> both horizontal, different margins.
  5. Test <html> and <body> both horizontal, different paddings.
  6. Test <html> and <body> both horizontal, different borders.

Additional Scenarios for Edge Cases

Complex Combinations

  1. Test <html> with width: auto, <body> with height: auto.
  2. Test <html> with height: auto, <body> with width: auto.
  3. Test <html> with padding: 20px, <body> with margin: 30px.
  4. Test <html> with border: 5px solid, <body> with padding: 15px.
  5. Test <html> with margin: 10px, <body> with border: 10px solid.

Nested Elements and Layouts

  1. Test nested <div> elements inside <body> affecting the border box size.
  2. Test nested <table> elements inside <body> affecting the border box size.
  3. Test nested <iframe> elements inside <body> affecting the border box size.
  4. Test nested <section> elements inside <body> affecting the border box size.
  5. Test nested <article> elements inside <body> affecting the border box size.

Special CSS Combinations

  1. Test <html> with display: flex, <body> with flex-direction: column.
  2. Test <html> with display: grid, <body> with grid-template-rows.
  3. Test <html> with display: block, <body> with display: inline.
  4. Test <html> with display: inline-block, <body> with display: block.

Margin and Padding Variations

  1. Test <body> with varying margin-top: 10px to 50px.
  2. Test <body> with varying margin-bottom: 10px to 50px.
  3. Test <body> with varying margin-left: 10px to 50px.
  4. Test <body> with varying margin-right: 10px to 50px.
  5. Test <body> with varying padding-top: 10px to 50px.
  6. Test <body> with varying padding-bottom: 10px to 50px.
  7. Test <body> with varying padding-left: 10px to 50px.
  8. Test <body> with varying padding-right: 10px to 50px.

Tests for Interaction with Other Elements

Interaction with <header> and <footer>

  1. Test <body> with a <header> element affecting the border box size.
  2. Test <body> with a <footer> element affecting the border box size.

Interaction with Forms and Inputs

  1. Test <body> with a <form> element affecting the border box size.
  2. Test <body> with multiple <input> elements affecting the border box size.

Interaction with Multimedia Elements

  1. Test <body> with an <img> element affecting the border box size.
  2. Test <body> with a <video> element affecting the border box size.
  3. Test <body> with an <audio> element affecting the border box size.

Special Cases and Edge Conditions

Handling of Overflow

  1. Test <body> with overflow: auto, verify the border box size.
  2. Test <body> with overflow: hidden, verify the border box size.
  3. Test <body> with overflow: scroll, verify the border box size.

Extreme Values

  1. Test <body> with extremely large margin values.
  2. Test <body> with extremely small padding values.

JavaScript Modifications

  1. Test modifying <body> styles via JavaScript, verify the border box size.
  2. Test modifying <html> styles via JavaScript, verify the border box size.

Responsive and Adaptive Design

  1. Test <body> with media queries affecting width and height.
  2. Test <body> with viewport changes, verify border box size adjustments.

Testing in Different Browsers

  1. Test <body> behavior in Chrome, verify quirks mode behavior.
  2. Test <body> behavior in Firefox, verify quirks mode behavior.
  3. Test <body> behavior in Safari, verify quirks mode behavior.
  4. Test <body> behavior in Edge, verify quirks mode behavior.

Integration with Modern CSS Features

  1. Test <body> with CSS Variables affecting size properties.
  2. Test <body> with CSS Custom Properties.
  3. Test <body> with CSS Grid and Flexbox properties.

Legacy and Compatibility Modes

  1. Test <body> with <!DOCTYPE html> and no doctype, verify differences.
  2. Test <body> with <!DOCTYPE html SYSTEM "about:legacy-compat">.

Additional Custom Scenarios

  1. Test custom styles on <body> with user-defined classes.
  2. Test combining multiple CSS properties on <body> simultaneously.
  3. Test <body> with pseudo-classes like :hover and :active.
  4. Test <body> with pseudo-elements like ::before and ::after.
  5. Test <body> with different character encodings affecting layout.

By covering these tests, you will comprehensively ensure the correct behavior of the <html> and <body> elements in quirks mode under various conditions and configurations.

  • Create the first test using testharness.js
<!DOCTYPE html>
<html>
<head>
    <title>Quirks Mode Test: HTML Width Auto in Vertical Writing Mode</title>
    <script src="/resources/testharness.js"></script>
    <script src="/resources/testharnessreport.js"></script>
    <style>
        html {
            width: auto;
            writing-mode: vertical-rl;
            margin: 0;
            padding: 0;
            border: 0;
        }
        body {
            margin: 0;
            padding: 0;
            border: 0;
        }
    </style>
</head>
<body>
    <script>
        // Define the test
        test(function() {
            // Retrieve the HTML element
            var htmlElement = document.documentElement;

            // Get the computed style of the HTML element
            var computedStyle = getComputedStyle(htmlElement);

            // Calculate the expected border box size
            var viewportWidth = window.innerWidth;
            var marginLeft = parseInt(computedStyle.marginLeft);
            var marginRight = parseInt(computedStyle.marginRight);
            var expectedBorderBoxWidth = Math.max(viewportWidth - marginLeft - marginRight, htmlElement.clientWidth);

            // Assert that the border box width matches the expected value
            assert_equals(htmlElement.offsetWidth, expectedBorderBoxWidth, "HTML element's border box width should be correctly calculated.");
        }, "Test HTML element with width: auto in vertical writing mode");

        // Report the results
        setup({explicit_done: true});
        add_completion_callback(function() {
            done();
        });
    </script>
</body>
</html>
  • Please make these changes:
    • remove the doctype
    • remove marginLeft and marginRight calculations (can assume those are 0)
    • remove "// Report the results" and the rest of the script after that
<html>
<head>
    <title>Quirks Mode Test: HTML Width Auto in Vertical Writing Mode</title>
    <script src="/resources/testharness.js"></script>
    <script src="/resources/testharnessreport.js"></script>
    <style>
        html {
            width: auto;
            writing-mode: vertical-rl;
            margin: 0;
            padding: 0;
            border: 0;
        }
        body {
            margin: 0;
            padding: 0;
            border: 0;
        }
    </style>
</head>
<body>
    <script>
        // Define the test
        test(function() {
            // Retrieve the HTML element
            var htmlElement = document.documentElement;

            // Get the computed style of the HTML element
            var computedStyle = getComputedStyle(htmlElement);

            // Calculate the expected border box size
            var viewportWidth = window.innerWidth;
            var expectedBorderBoxWidth = Math.max(viewportWidth, htmlElement.clientWidth);

            // Assert that the border box width matches the expected value
            assert_equals(htmlElement.offsetWidth, expectedBorderBoxWidth, "HTML element's border box width should be correctly calculated.");
        }, "Test HTML element with width: auto in vertical writing mode");
    </script>
</body>
</html>
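
The expectation in the two generated tests above comes down to one expression: the larger of the space left after subtracting margins and the element's normal size. The sketch below isolates that arithmetic so it can be checked by hand. The function and parameter names are illustrative assumptions, and the formula mirrors the Math.max(...) expressions in the generated tests rather than quoting the Quirks Mode spec, so it should be verified against the spec text before being relied on.

```javascript
// Sketch of the expectation the generated tests encode. Names are
// illustrative; they do not come from the spec or from testharness.js.
function expectedBorderBoxSize(containingBlockSize, marginStart, marginEnd, normalSize) {
  // Space left in the block flow direction after subtracting the margins.
  const available = containingBlockSize - marginStart - marginEnd;
  // The element fills that space unless its normal size is already larger.
  return Math.max(available, normalSize);
}

// Usage with made-up numbers:
console.log(expectedBorderBoxSize(800, 0, 0, 0));     // 800
console.log(expectedBorderBoxSize(800, 10, 20, 0));   // 770
console.log(expectedBorderBoxSize(800, 0, 0, 1000));  // 1000
```

With both margins at 0, as in the corrected test, the first argument passes through unchanged, which is why the margin calculations could be dropped.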
5 participants